TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Litigation Funding

May 8th, 2012

An internet search on the phrase “litigation funding” returns thousands of hits.  There are an incredible number of companies and persons “out there” who will buy equity shares in a lawsuit.  Hedge funds are actively seeking opportunities to invest in lawsuits.

Putting aside the concerns about champerty and maintenance, I wonder whether defense counsel are doing enough with this issue at trial.  Assuming that these websites really are engaged in the practice they describe, shouldn’t defense counsel include questions about investments in lawsuits in their voir dire of the jury panel?

Obviously, if potential jurors owned stock in the defendant company, they would be disqualified.  Equity ownership in a chose in action surely is just as relevant to counsel’s evaluation of a prospective juror’s impartiality.  Even if the prospective juror is not invested in the particular lawsuit on trial, the question is important.  If the investment is in the litigation generally, or in other plaintiffs’ cases within the same litigation, then a jury verdict in favor of the plaintiff would likely benefit the juror by increasing the settlement value of the other cases.  Even investments in unrelated personal injury litigation have the potential to prejudice the juror against the defense.  For instance, if the juror has invested in another personal injury lawsuit, returning a large verdict in the present case could benefit that investment by persuading the company defending against the juror’s chose in action that trying cases in the particular venue was too dangerous to risk, and that those claims should be settled.

Certainly, the existence and extent of investment by others in a lawsuit is a worthy line of discovery to pursue in mass tort litigation.

Are we doing enough to stop this insanity?

Haack Attack on Legal Probabilism

May 6th, 2012

Last year, Professor Susan Haack presented a lecture on “legal probabilism,” at a conference on Standards of Proof and Scientific Evidence, held at the University of Girona, in Spain.  The lecture can be viewed on-line, and a manuscript of Haack’s paper is available, as well.  Susan Haack, “Legal Probabilism:  An Epistemological Dissent” (2011) (cited here as “Haack”).   Professor Haack has marked her paper as a draft, with an admonition “do not cite without permission,” an imperative that has no moral or legal force.  Her imperative certainly has no epistemic warrant.  We will ignore it.

As I have noted previously, here and there, Professor Haack is a professor of philosophy and of law at the University of Miami, in Florida.  She has written widely on the philosophy of science, in the spirit of Peirce’s pragmatism.  Despite her frequent untutored judgments about legal matters, much of what she has written is a useful corrective to formalistic writings on “the scientific method,” and is worthy of study by lawyers interested in the intersection of science and the law.

The video of Professor Haack’s presentation is worth watching to get an idea of how ad hominem her style is.  I won’t repeat her aspersions and pejorative comments here.  They are not in her paper, and I will take her paper, which she posted online, as the expression of her mature thinking.

Invoking Lord Russell and Richard von Mises, Haack criticizes the reduction of epistemology to a calculus of probability.  Russell, for instance, cautioned against confusing the credibility of a claim with the probability that the claim is true:

“[I]t is clear that some things are almost certain, while others are matters of hazardous conjecture. For a rational man, there is a scale of doubtfulness, from simple logical and arithmetical propositions and perceptive judgments, at one end, to such questions as what language the Myceneans spoke or ‘what song the Sirens sang’ at the other…. [T]he rational man, who attaches to each proposition the right degree of credibility, will be guided by the mathematical theory of probability when it is applicable…. The concept ‘degree of credibility’, however, is applicable much more widely than that of mathematical probability.”

Bertrand Russell, Human Knowledge, Its Scope and Limits 381 (N.Y. 1948)(quoted in Haack, supra, at 1).   Haack argues that ordinary language is beguiling.  We use “probably” to hedge our commitment to the truth of a prediction or a proposition of fact.  We insert the adverb “probably” to recognize that our statement might turn out false, although we have no idea of how likely, and no way of quantifying the probability of error.  Thus,

“[w]e commonly use the language of probability or likelihood when we talk about the credibility or warrant of a claim – about how likely it is, given this evidence, that the claim is true, or, unconditionally, about how probable the claim is.”

Haack at 14.

Epistemology is the “thing,” and psychology, not.  Haack admits that legal language is inconsistent:  sometimes the law appears to embrace psychological states of mind as relevant criteria for decisions; sometimes the law is expressly looking at epistemic warrant for the truth of a claim.  Flipping the philosophical bird to Derrida and Feyerabend, Haack argues that trials are searches for the truth, and that our notions of substantial justice require replacement of psychological standards of proof, to the extent that they are merely subjective and non-epistemic, with a clear theory of epistemic warrant.  Haack at 6 (citing Tehan v. United States, 383 U.S. 406, 416 (1966) (“the purpose of a trial is to determine the truth”)); id. at 7 (citing In re Winship, 397 U.S. 358, 368, 370 (1970) (Harlan, J., concurring) (the standard of proof is meant to “instruct the factfinder concerning the degree of confidence our society thinks he should have in the correctness of factual conclusions for a particular type of adjudication”)).

Haack points out that there are instances where evidence seems to matter more than subjective state of mind, although the law sometimes equivocates.  She cautions us that “we shouldn’t simply assume, just because the word ‘probable’ or ‘probability’ occurs in legal contexts, that we are dealing with mathematical, rather than epistemological, probabilities.”  Haack at 16 (citing and quoting Thomas Starkie, et al., A Practical Treatise of the Law of Evidence and Digest of Proofs in Civil and Criminal Proceedings vol. I, 579 (Philadelphia 1842) (“That … moral probabilities … could ever be represented by numbers … and thus be subject to numerical analysis … cannot but be regarded as visionary and chimerical.”)).  Thus the criminal standard, “beyond a reasonable doubt,” seems to be about state of mind, but it is described, at least some of the time, as about the quality and strength of the evidence needed to attain such a state of mind.  The standards of “preponderance of the evidence” and “clear and convincing evidence,” on the other hand, appear to be directly related to the strength of the evidentiary display offered by the party with the burden of proof.

An example that Haack might have used, but did not, is the requirement that an expert witness express an opinion to a “reasonable degree of medical or scientific certainty.”  The law is not particularly concerned about the psychological state of certainty possessed by the witness:  the witness may be a dogmatist with absolute certainty but no epistemic warrant; and that simply will not do.

Of course, the preponderance standard is alternatively expressed as the burden to show that the disputed fact is “more likely than not” correct, and that brings us back to explicit probabilisms in the law.  Haack’s argument would be bolstered by acknowledging the work of Professor Kahneman, who makes the interesting point, in several places, that experts, or for that matter anyone making decisions, are not necessarily expert at determining their own level of certainty.  Can someone really say that he believes one set of claims has been shown to be 50.1% true, and have an intelligent discussion with another person, who adamantly believes that the claims have been shown to be only 49.9% true?  Do they resolve their differences by splitting the difference?  Unless we are dealing with an explicit set of frequencies or proportions, the language of probability is metaphorical.

Haack appropriates the term “warrant” for her epistemologic theory, but the usage seems much older, and not novel with Haack.  In any event, Haack sets out her theory of warrant:

“(i) How supportive the evidence is; analogue: how well a crossword entry fits with the clue and intersecting completed entries. Evidence may be supportive (positive, favorable), undermining (negative, unfavorable), or neutral (irrelevant) with respect to some conclusion.

(ii) How secure the reasons are, independent of the claim in question; analogue:  how reasonable the completed intersecting entries are, independent of the entry in question. The better the independent security of positive reasons, the more warranted the conclusion, but the better the independent security of negative reasons, the less warranted the conclusion.

(iii) How comprehensive the evidence is, i.e., how much of the relevant evidence it includes; analogue: how much of the crossword has been completed. More comprehensive evidence gives more warrant to a conclusion than less comprehensive evidence does iff the additional evidence is at least as favorable as the rest.”

Haack at 18 (internal citation omitted).  According to Haack, the calculus of probabilities does not help in computing degrees of epistemic warrant.  Id. at 20. Her reasons are noteworthy:

  • “since quality of evidence has several distinct dimensions (supportiveness, independent security, comprehensiveness), and there is no way to rank relative success and failure across these different factors, there is no guarantee even of a linear ordering of degrees of warrant;
  • while the probability of p and the probability of not-p must add up to 1, when there is no evidence, or only very weak evidence, either way, neither p nor not-p may be warranted to any degree; and
  • while the probability of p and q (for independent p and q) is the product of the two, and hence, unless both are 1, less than the probability of either, the warrant of a conjunction may be higher than the warrant of its components”

Id. at 20-21.  The third bullet appears to have been a misfire.  If we were to use Bayes’ theorem, the two pieces of evidence would require sequential adjustments to our posterior odds or probability; we would not multiply the two probabilities directly.
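A short worked example (with illustrative numbers of my own, not Haack’s) shows what the sequential adjustment looks like.  For conditionally independent items of evidence, Bayes’ theorem combines likelihood ratios against the prior odds:

\text{posterior odds} = \text{prior odds} \times LR_1 \times LR_2

Starting from even prior odds of 1 : 1, two items of evidence, each with a likelihood ratio of 3, yield posterior odds of 1 × 3 × 3 = 9, or a posterior probability of 9/10 = 0.90.  Either item alone yields odds of 3, or a probability of 0.75.  The conjoined evidence thus warrants the conclusion more strongly than either item does alone, without any violation of the probability calculus.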

Haack’s attack on legal probabilism blinds her to the reality that sometimes all there is in a legal case is probabilistic evidence.  For instance, in the litigation over claims that asbestos causes colorectal cancer, plaintiffs had only a relative risk statistic to support their desired inference that asbestos had caused their colorectal cancers.  There was no other evidence.  (On general causation, the animal studies failed to find colorectal cancer from asbestos ingestion, and the “weight of evidence” was against an association in any event.)  Nonetheless, Haack cites one case as a triumph of her anti-probabilistic viewpoint:

“Here I am deliberately echoing the words of the Supreme Court of New Jersey in Landrigan, rejecting the idea that epidemiological evidence of a doubling of risk is sufficient to establish specific causation in a toxic-tort case: ‘a relative risk of 2.0 is not so much a password to a finding of causation as one piece of evidence among many’. This gets the key epistemological point right.”

Landrigan v. Celotex Corp., 127 N.J. 405, 419, 605 A.2d 1079 (1992).  Well, not really.  Had Haack read the Landrigan decision, including the lower courts’ opinions, she would be aware that there were no other pieces of evidence.  There were no biomarkers, no “fingerprints” of causation; no evidence of Mr. Landrigan’s individual, special vulnerability.  The case went up to the New Jersey Supreme Court, along with a companion case, as a result of directed verdicts.  Caterinicchio v. Pittsburgh Corning Corp., 127 N.J. 428, 605 A.2d 1092 (1992). The plaintiffs had put in their cases and rested; the trial courts were required to assume that the facts were as presented by the plaintiffs.  All the plaintiffs had offered, however, of any possible relevance, was a relative risk statistic.
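The arithmetic behind the risk-doubling inference deserves mention (a standard attributable-fraction calculation, supplied here for illustration; it appears nowhere in Haack’s paper).  If a relative risk reflects a causal relationship, the probability that a given exposed case is attributable to the exposure is approximately

P(\text{attributable}) = \frac{RR - 1}{RR} = \frac{2.0 - 1}{2.0} = 0.5

A relative risk of exactly 2.0 thus sits at the bare “more likely than not” threshold, which is why the relative risk statistic was doing all of the work in a case in which it was the only evidence offered.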

Haack’s fervent anti-probabilism obscures the utility of probability concepts, especially when probabilities are all we have.   In another jarring example, Haack seems to equate any use of Bayes’ theorem, or any legal analysis that invokes an assessment of probability, with misguided “legal probabilism.”  For instance, Haack writes:

“Mr. Raymond Easton was arrested for a robbery on the basis of a DNA “cold hit”; statistically, the probability was very low that the match between Mr. Easton’s DNA (on file after an arrest for domestic violence) and DNA found at the crime scene was random. But Mr. Easton, who suffered from Parkinson’s disease, was too weak to dress himself or walk more than a few yards – let alone to drive to the crime scene, or to commit the crime.”

Haack at 37 (internal citation omitted).  Bayes’ theorem, which requires that a base rate, or prior probability, be included in the analysis, provides the complete answer to Haack’s misguided example of the DNA cold hit.
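A worked example (with hypothetical figures, not the Easton facts) shows how the prior probability does the work.  Suppose a random match probability of one in a million, and a population of one million men who, before the database trawl, were equally plausible sources.  The prior odds that any given man is the source are 1 : 999,999, and the likelihood ratio of the match is 1,000,000, so the posterior odds are

\frac{1}{999{,}999} \times 1{,}000{,}000 \approx 1 : 1

a posterior probability of roughly 0.5 – nothing remotely like proof.  Evidence of Mr. Easton’s incapacity then enters the same calculation as a further likelihood ratio, driving the posterior probability toward zero.  The cold-hit fallacy lies in mistaking the small random match probability for the probability of innocence; the example is an argument for doing the probability correctly, not for abandoning it.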

 

Philadelphia Plaintiff’s Claims Against Fixodent Prove Toothless

May 2nd, 2012

In Milward, Martyn Smith got a pass from the U.S. Court of Appeals for the First Circuit on his “weight of the evidence” (WOE) approach to formulating an opinion as an expert witness.  Last week, Smith’s WOE did not fare so well.  The Honorable Sandra Mazer Moss, in one of her last rulings as judge presiding over the Philadelphia Court of Common Pleas mass tort program, sprinkled some cheer to dispel WOE in Jacoby v. Rite Aid, PCCP (Order of April 27, 2012; Opinion of April 12, 2012).

Applying Pennsylvania’s Frye standard, Judge Moss upheld Procter & Gamble’s challenge to Dr. Martyn Smith, as well as to two other plaintiff’s expert witnesses, Dr. Ebbing Lautenbach and Dr. Frederick Askari.  The plaintiff, Mr. Mark Jacoby, used Fixodent for six years before he first experienced paresthesias and numbness in his hands and feet.  Jacoby’s expert witnesses claimed that Fixodent contains zinc compounds, which are released upon use, and are absorbed into the blood stream.  Very high zinc levels suppress copper levels, and cause a copper deficiency myeloneuropathy.  Finding that the plaintiff’s causal claims were toothless in the face of sound science, Judge Moss excluded the reports and proffered testimony of Drs. Smith, Askari, and Lautenbach.

Although Pennsylvania courts follow a Frye standard, Judge Moss followed the lead of a federal judge, who had previously examined the same body of evidence, and who excluded plaintiffs’ expert witnesses, under Federal Rule of Evidence 702, in In re Denture Cream Prods. Liab. Litig., 795 F. Supp. 2d 1345 (S.D. Fla. 2011).  Without explication, Judge Moss stated that Judge Altonaga’s reasoning and conclusions, reached under federal law, were “very persuasive” under Frye.  Moss Opinion at 5.  In particular, Judge Moss appeared to be impressed by the lack of baseline incidence data on copper deficiency myeloneuropathy, the lack of exposure-response information, and the lack of risk ratios for any level of use of Fixodent.  Id. at 6-10.

Judge Moss accepted at face value Martyn Smith’s claims that WOE can be used to demonstrate causation when no individual study is conclusive.  Her Honor did, however, look more critically at the component parts of Smith’s particular application of WOE in the Jacoby case.  Smith used various steps of extrapolation, dose-response analysis, and differential diagnosis in applying WOE, but these steps were woefully unsound.  Id. at 9.  There was no evidence of how low, and for how long, a person’s copper levels must drop before injury results.  Having attacked Procter & Gamble’s pharmacokinetic studies, the plaintiff’s expert witnesses had no basis for inferring zinc or copper levels for any plaintiff.  Furthermore, the plaintiff’s witnesses had no baseline incidence data, and no risk ratios to apply for any level of exposure to, or use of, the defendant’s product.

Predictably, plaintiffs invoked the pass that Smith received in Milward, but Judge Moss easily distinguished Milward as having involved baseline rates and risk ratios (even if Smith may have imagined the data needed to calculate those ratios).

Another plaintiff witness, Dr. Askari, used a method he called the “totality of the evidence” (TOE) approach.  In short, TOE is WOE is NO good, as applied in this case.  Id. at 10 -11.

Finally, another plaintiff’s witness, Dr. Lautenbach, applied the Naranjo Adverse Drug Reaction Probability Scale, by which he purported to transmute case reports and case series into a conclusion of causality.  Actually, Lautenbach seems to have claimed that the lack of analytical epidemiologic studies supporting an association between Fixodent and myeloneuropathy did not refute the existence of a causal relationship.  Of course, this lack of evidence hardly supports the causal relationship.  Judge Moss assumed that Lautenbach was actually asserting a causal relationship, but because he was relying upon the same woefully, toefully flawed body of evidence, Her Honor excluded Dr. Lautenbach as well.  Id. at 12.

WOE-fully Inadequate Methodology – An Ipse Dixit By Another Name

May 1st, 2012

Take all the evidence, throw it into the hopper, close your eyes, open your heart, and guess the weight.  You could be a lucky winner!  The weight of the evidence suggests that the weight-of-the-evidence (WOE) method is little more than subjective opinion, but why care if it helps you to get to a verdict?

The scientific community has never been seriously impressed by the so-called weight of the evidence (WOE) approach to determining causality.  The phrase is vague and ambiguous; its use, inconsistent. See, e.g., V.H. Dale, G.R. Biddinger, M.C. Newman, J.T. Oris, G.W. Suter II, T. Thompson, et al., “Enhancing the ecological risk assessment process,” 4 Integrated Envt’l Assess. Management 306 (2008) (“An approach to interpreting lines of evidence and weight of evidence is critically needed for complex assessments, and it would be useful to develop case studies and/or standards of practice for interpreting lines of evidence.”); Igor Linkov, Drew Loney, Susan M. Cormier, F. Kyle Satterstrom & Todd Bridges, “Weight-of-evidence evaluation in environmental assessment: review of qualitative and quantitative approaches,” 407 Science of the Total Env’t 5199 (2009); Douglas L. Weed, “Weight of Evidence: A Review of Concept and Methods,” 25 Risk Analysis 1545 (2005) (noting the vague, ambiguous, indefinite nature of the concept of “weight of evidence” review); R.G. Stahl Jr., “Issues addressed and unaddressed in EPA’s ecological risk guidelines,” 17 Risk Policy Report 35 (1998) (noting that the U.S. Environmental Protection Agency’s guidelines for ecological weight-of-evidence approaches to risk assessment fail to provide guidance); Glenn W. Suter II & Susan M. Cormier, “Why and how to combine evidence in environmental assessments:  Weighing evidence and building cases,” 409 Science of the Total Env’t 1406, 1406 (2011) (noting the arbitrariness and subjectivity of WOE “methodology”).

 

General Electric v. Joiner

Most savvy judges quickly figured out that weight of the evidence (WOE) was suspect methodology, woefully lacking, and indeed, not really a methodology at all.

The WOE method was part of the hand waving in Joiner by plaintiffs’ expert witnesses, including the frequent testifier Rabbi Teitelbaum.  The majority recognized that Rabbi Teitelbaum’s WOE weighed in at less than a peppercorn, and affirmed the district court’s exclusion of his opinions.  The Joiner Court’s assessment provoked a dissent from Justice Stevens, who was troubled by the Court’s undressing of the WOE methodology:

“Dr. Daniel Teitelbaum elaborated on that approach in his deposition testimony: ‘[A]s a toxicologist when I look at a study, I am going to require that that study meet the general criteria for methodology and statistical analysis, but that when all of that data is collected and you ask me as a patient, Doctor, have I got a risk of getting cancer from this? That those studies don’t answer the question, that I have to put them all together in my mind and look at them in relation to everything I know about the substance and everything I know about the exposure and come to a conclusion. I think when I say, “To a reasonable medical probability as a medical toxicologist, this substance was a contributing cause,” … to his cancer, that that is a valid conclusion based on the totality of the evidence presented to me. And I think that that is an appropriate thing for a toxicologist to do, and it has been the basis of diagnosis for several hundred years, anyway’.

* * * *

Unlike the District Court, the Court of Appeals expressly decided that a ‘weight of the evidence’ methodology was scientifically acceptable. To this extent, the Court of Appeals’ opinion is persuasive. It is not intrinsically “unscientific” for experienced professionals to arrive at a conclusion by weighing all available scientific evidence—this is not the sort of ‘junk science’ with which Daubert was concerned. After all, as Joiner points out, the Environmental Protection Agency (EPA) uses the same methodology to assess risks, albeit using a somewhat different threshold than that required in a trial.  Petitioners’ own experts used the same scientific approach as well. And using this methodology, it would seem that an expert could reasonably have concluded that the study of workers at an Italian capacitor plant, coupled with data from Monsanto’s study and other studies, raises an inference that PCB’s promote lung cancer.”

General Electric v. Joiner, 522 U.S. 136, 152-54 (1997) (Stevens, J., dissenting) (internal citations omitted) (confusing critical assessment of studies with WOE, and quoting Rabbi Teitelbaum’s attempt to conflate diagnosis with etiological attribution).  Justice Stevens could reach his assessment only by ignoring the serious lack of internal and external validity in the studies relied upon by Rabbi Teitelbaum.  Those studies did not support his opinion, individually or collectively.

Justice Stevens was wrong as well about the claimed scientific adequacy of WOE.  Courts have long understood that precautionary, preventive judgments of regulatory agencies are different from scientific conclusions that are admissible in civil and criminal litigation.  See Allen v. Pennsylvania Engineering Corp., 102 F.3d 194 (5th Cir. 1996)(WOE, although suitable for regulatory risk assessment, is not appropriate in civil litigation).  Justice Stevens’ characterization of WOE was little more than judicial ipse dixit, and it was, in any event, not the law; it was the argument of a dissenter.

 

Milward v. Acuity Specialty Products

Admittedly, dissents can sometimes help lower court judges chart a path of evasion and avoidance of a higher court’s holding.  In Milward, Justice Stevens’ mischaracterization of WOE and scientific method was adopted as the legal standard for expert witness testimony by a panel of the United States Court of Appeals for the First Circuit.  Milward v. Acuity Specialty Products Group, Inc., 664 F. Supp. 2d 137 (D. Mass. 2009), rev’d, 639 F.3d 11 (1st Cir. 2011), cert. denied sub nom. U.S. Steel Corp. v. Milward, ___ U.S. ___, 2012 WL 33303 (2012).

Mr. Milward claimed that he was exposed to benzene as a refrigerator technician, and developed acute promyelocytic leukemia (APL) as a result.  664 F. Supp. 2d at 140. In support of his claim, Mr. Milward offered the testimony of Dr. Martyn T. Smith, a toxicologist, who testified that the “weight of the evidence” supported his opinion that benzene exposure causes APL. Id. Smith, in his litigation report, described his methodology as an application of WOE:

“The term WOE has come to mean not only a determination of the statistical and explanatory power of any individual study (or the combined power of all the studies) but the extent to which different types of studies converge on the hypothesis. In assessing whether exposure to benzene may cause APL, I have applied the Hill considerations. Nonetheless, application of those factors to a particular causal hypothesis, and the relative weight to assign each of them, is both context dependent and subject to the independent judgment of the scientist reviewing the available body of data. For example, some WOE approaches give higher weight to mechanistic information over epidemiological data.”

Smith Report at ¶¶19, 21 (citing Sheldon Krimsky, “The Weight of Scientific Evidence in Policy and Law,” 95(S1) Am. J. Public Health 5130, 5130-31 (2005))(March 9, 2009).  Smith marshaled several bodies of evidence, which he claimed collectively supported his opinion that benzene causes APL.  Milward, 664 F. Supp. 2d at 143.

Milward also offered the testimony of a philosophy professor, Carl F. Cranor, for the opinion that WOE was an acceptable methodology, and that all scientific inference is subject to judgment.  This is the same Cranor who, advocating for open admission of all putative scientific opinions, showcased his confusion between statistical significance probability and the posterior probability involved in a conclusion of causality.  Carl F. Cranor, Regulating Toxic Substances: A Philosophy of Science and the Law at 33-34 (Oxford 1993) (“One can think of α, β (the chances of type I and type II errors, respectively) and 1 − β as measures of the ‘risk of error’ or ‘standards of proof.’”) See also id. at 44, 47, 55, 72-76.
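The confusion that Cranor exemplifies can be stated precisely (the notation is standard, not Cranor’s).  The significance level is a probability of the data conditioned on the null hypothesis, not the converse:

\alpha = P(\text{reject } H_0 \mid H_0 \text{ true}) \neq P(H_0 \text{ true} \mid \text{reject } H_0)

Converting an error rate into a posterior probability that a causal claim is true requires a prior probability and Bayes’ theorem; α, β, and 1 − β are properties of a testing procedure, not “standards of proof.”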

After a four-day evidentiary hearing, the district court found that Martyn Smith’s opinion was merely a plausible hypothesis, and not admissible.  Milward, 664 F. Supp. 2d at 149.  The Court of Appeals, in an opinion by Chief Judge Lynch, however, reversed, and ruled that an inference of general causation based on a WOE methodology satisfied the reliability requirement for admission under Federal Rule of Evidence 702.  639 F.3d at 26.  According to the Circuit, WOE methodology was scientifically sound.  Id. at 22-23.

 

WOE Cometh

Because the WOE methodology is not well described, either in the published literature or in Martyn Smith’s litigation report, it is difficult to understand exactly what the First Circuit approved by reversing Smith’s exclusion.  Usually the burden is on the proponent of the opinion testimony, and one would have thought that the vagueness of the described methodology would count against admissibility.  It is hard to escape the conclusion that the Circuit elevated a poorly described method, best characterized as hand waving, into a description of scientific method.

The Panel appeared to have been misled by Carl F. Cranor, who described “inference to the best explanation” as requiring a scientist to “consider all of the relevant evidence” and “integrate the evidence using professional judgment to come to a conclusion about the best explanation.”  Id. at 18. The available explanations are then weighed, and a would-be expert witness is free to embrace the one he feels offers the “best” explanation.  The appellate court’s opinion takes WOE, combined with Cranor’s “inference to the best explanation,” to hold that an expert witness need only opine that he has considered the range of plausible explanations for the association, and that he believes that the causal explanation is the best or “most plausible.”  Id. at 20 (upholding this approach as “methodologically reliable”).

What is missing of course is the realization that plausible does not mean established, reasonably certain, or even more likely than not.  The Circuit’s invocation of plausibility also obscures the indeterminacy of the available data for supporting a reliable conclusion of causation in many cases.

Curiously, the Panel likened WOE to the use of differential diagnosis, which is a method for inferring the specific cause of a particular patient’s disease or disorder.  Id. at 18.  This is a serious confusion between a method concerned with general causation and one concerned with specific causation.  By the principle of charity, we might allow that the First Circuit was thinking of some process of differential etiology rather than differential diagnosis, given that diagnoses (other than for infectious diseases and a few pathognomonic disorders) do not usually carry with them information about unique etiologic agents.  But even such a process of differential etiology is a well-structured disjunctive syllogism of the form:

A ∨ B ∨ C

¬A ∧ ¬B

∴ C

There is nothing subjective about assigning weights or drawing inferences in applying such a syllogism.  In the Milward case, one of the propositional facts that might well have explained the available evidence was chance, but plaintiff’s expert witness Smith could not and did not rule out chance, because the studies upon which he relied were not statistically significant.  Smith could thus never get past “therefore” in any syllogism, or in any other recognizable process of reasoning.
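Indeed, the validity of the syllogism is purely formal.  A minimal sketch in the Lean proof assistant (my own illustration; nothing of the sort appears in the opinions) checks the inference mechanically:

-- Disjunctive syllogism: from A ∨ B ∨ C, ¬A, and ¬B, conclude C.
example (A B C : Prop) (h : A ∨ B ∨ C) (hA : ¬A) (hB : ¬B) : C :=
  (h.resolve_left hA).resolve_left hB

Whatever subjectivity there is in a differential-etiology opinion lies in establishing the premises – the exhaustiveness of the disjunction and the warrant for each negation – not in drawing the inference.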

The Circuit Court provides no insight into the process Smith used to weigh the available evidence, and it failed to address the analytical gaps and evidentiary insufficiencies addressed by the trial court, other than to invoke the mantra that all these issues go to “the weight, not the admissibility” of Smith’s opinions.  This, of course, is a conclusion, not an explanation or a legal theory.

There is also a cute semantic trick lurking in plaintiffs’ position in Milward, which results from their witnesses describing their methodology as “WOE.”  Since the jury is charged with determining the “weight of the evidence,” any evaluation of the WOE would be an invasion of the province of the jury.  Milward, 639 F.3d at 20. QED, by the semantic device of deliberately conflating the name of the putative scientific methodology with the term traditionally used to describe jury fact finding.

In any event, the Circuit’s chastisement of the district court for evaluating Smith’s implementation of the WOE methodology – his logical, mathematical, and epidemiological errors, and his result-driven reinterpretation of study data – threatens to read an Act of Congress, the Federal Rules of Evidence, and especially Rules 702 and 703, out of existence by judicial fiat.  The Circuit’s approach is also at odds with Supreme Court precedent (now codified in Rule 702) on the importance and the requirement of evaluating opinion testimony for analytical gaps and the ipse dixit of expert witnesses.  General Electric Co. v. Joiner, 522 U.S. 136, 146 (1997).

 

Smith’s Errors in Recalculating Odds Ratios of Published Studies

In the district court, the defendants presented testimony of an epidemiologist, Dr. David H. Garabrant, who took Smith to task for calculating risk ratios incorrectly.  Smith did not have any particular expertise in epidemiology, and his faulty calculations were problematic from the perspective of both Rule 702 and Rule 703.  The district court found the criticisms of Smith’s calculations convincing, 664 F. Supp. 2d at 149, but the appellate court held that the technical dispute was for the jury; “both experts’ opinions are supported by evidence and sound scientific reasoning.”  Milward, 639 F.3d at 24.  This ruling is incomprehensible.  Plaintiffs had the burden of showing the admissibility of Smith’s opinion generally, but also the reasonableness of his reliance upon the calculated odds ratio.  The defendants had no burden of persuasion on the issue of Smith’s calculations, but they presented testimony, which apparently carried the day.  The appellate court had no basis for reversing the specific ruling with respect to the erroneously calculated risk ratio.

 

Smith’s Reliance upon Statistically Insignificant Studies

Smith relied upon studies that were not statistically significant at any accepted level.  An opinion of causality requires a showing that chance, bias, and confounding have been excluded in assessing an existing association.  Smith failed to exclude chance as an explanation for the association, and the burden to make this exclusion was on the plaintiffs.  This failure was not something that could readily be patched by adverting to other evidence from studies in animals or in test tubes.  The Court of Appeals excused the important analytical gap in plaintiffs’ witness’s opinion because APL is rare, and data collection is difficult in the United States.  Id. at 24.  Evidence “consistent with” and “suggestive of” the challenged witness’s opinion thus suffices.  This is a remarkable homeopathic dilution of both legal and scientific causation.  Now we have a rule of law that excuses plaintiffs from having to prove their case with reliable evidence if they allege a rare disease for which they lack evidence.
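To make concrete what “not statistically significant at any accepted level” means, consider a hypothetical study (the numbers are mine, not Smith’s) reporting an odds ratio of 1.8 with a standard error of 0.5 on the log scale.  The conventional 95% confidence interval is

e^{\ln(1.8) \pm 1.96 \times 0.5} \approx (0.68,\ 4.8)

Because the interval comfortably includes 1.0, chance alone remains a ready explanation of the observed association – a gap that no quantity of “consistent with” animal or test-tube evidence can close.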

 

Leveling the Hierarchy of Evidence

Imagine trying to bring a medication to market with a small case-control study, with a non-statistically significant odds ratio!  Oh, but these clinical trials are so difficult and expensive; and they take such a long time.  Like a moment’s thought, when thinking is so hard and a moment such a long time.  We would be quite concerned if the FDA abridged the standard for causal efficacy in the licensing of new medications; we should be just as concerned about judicial abridgments of standards for causation of harm in tort actions.

Leveling the hierarchy of evidence has been an explicit or implicit goal of several law professors.  Some of the leveling efforts even show up in the new Reference Manual on Scientific Evidence (RMSE 3d ed. 2011).  See “New-Age Levellers – Flattening Hierarchy of Evidence.”

The Circuit, in Milward, quoted a symposium article by Michele Carbone and others who suggest that there should be no hierarchy, but the Court ignored a huge body of literature that explains and defends the need for recognizing that not all study designs or types are equal.  Interestingly, the RMSE chapter on epidemiology by Professor Green (see more below) cites the same article.  RMSE 3d at 564 & n.48 (citing and quoting a symposium paper that “[t]here should be no hierarchy [among different types of scientific methods to determine cancer causation]. Epidemiology, animal, tissue culture and molecular pathology should be seen as integrating evidences in the determination of human carcinogenicity.” Michele Carbone et al., “Modern Criteria to Establish Human Cancer Etiology,” 64 Cancer Res. 5518, 5522 (2004)).  Carbone, of course, is best known for his advocacy of a viral cause (SV40) of human mesothelioma, a claim unsupported, and indeed contradicted, by epidemiologic studies.  Carbone’s statement does not support the RMSE chapter’s leveling of epidemiology and toxicology, and Carbone is, in any event, an unlikely source to cite.

The First Circuit, in Milward, studiously ignored a mountain of literature on evidence-based medicine, including the RMSE 3d chapter, “Reference Guide on Medical Testimony,” which teaches that leveling of study designs and types is inappropriate. The RMSE chapter devotes several pages to explaining the role of study design in assessing an etiological issue:

3. Hierarchy of medical evidence

With the explosion of available medical evidence, increased emphasis has been placed on assembling, evaluating, and interpreting medical research evidence.  A fundamental principle of evidence-based medicine (see also Section IV.C.5, infra) is that the strength of medical evidence supporting a therapy or strategy is hierarchical.

When ordered from strongest to weakest, systematic review of randomized trials (meta-analysis) is at the top, followed by single randomized trials, systematic reviews of observational studies, single observational studies, physiological studies, and unsystematic clinical observations. An analysis of the frequency with which various study designs are cited by others provides empirical evidence supporting the influence of meta-analysis followed by randomized controlled trials in the medical evidence hierarchy. Although they are at the bottom of the evidence hierarchy, unsystematic clinical observations or case reports may be the first signals of adverse events or associations that are later confirmed with larger or controlled epidemiological studies (e.g., aplastic anemia caused by chloramphenicol, or lung cancer caused by asbestos). Nonetheless, subsequent studies may not confirm initial reports (e.g., the putative association between coffee consumption and pancreatic cancer).

John B. Wong, Lawrence O. Gostin, and Oscar A. Cabrera, “Reference Guide on Medical Testimony,” RMSE 3d 687, 723-24 (2011).   The implication that there is no hierarchy of evidence in causal inference, and that tissue culture studies are as relevant as epidemiology, is patently absurd. The Circuit not only went out on a limb; it managed to saw the limb off, while “out there.”

 

Milward – Responses Critical and Otherwise

The First Circuit’s decision in Milward made an immediate impression upon those writers who have worked hard to dismantle or marginalize Rule 702.  The Circuit’s decision was mysteriously cited with obvious approval by Professor Margaret Berger, even though she had died before the decision was published!  Margaret A. Berger, “The Admissibility of Expert Testimony,” RMSE 3d at 20 & n.51 (2011).  Professor Michael Green, one of the reporters for the ALI’s Restatement (Third) of Torts, hyperbolically called Milward “[o]ne of the most significant toxic tort causation cases in recent memory.”  Michael D. Green, “Introduction: Restatement of Torts as a Crystal Ball,” 37 Wm. Mitchell L. Rev. 993, 1009 n.53 (2011).

The WOE approach, and its embrace in Milward, obscures the reality that sometimes the evidence does not logically or analytically support the offered conclusion, and at other times, the best explanation is uncertainty.  By adopting the WOE approach, vague and ambiguous as it is, the Milward Court was beguiled into holding that WOE determinations are for the jury.  The lack of meaningful content of WOE means that decisions such as Milward effectively remove the gatekeeping function, or permit that function to be minimally satisfied by accepting an expert witness’s claim to have employed WOE.  The epistemic warrant required by Rule 702 is diluted if not destroyed.  Scientific hunch and speculation, proper in their place, can be passed off for scientific knowledge to gullible or result-oriented judges and juries.

Admissibility versus Sufficiency of Expert Witness Evidence

April 18th, 2012

Professors Michael Green and Joseph Sanders are two of the longest serving interlocutors in the never-ending discussion and debate about the nature and limits of expert witness testimony on scientific questions about causation.  Both have made important contributions to the conversation, and both have been influential in academic and judicial circles.  Professor Green has served as the co-reporter for the American Law Institute’s Restatement (Third) of Torts: Liability for Physical Harm.  Whether wrong or right, new publications about expert witness issues by Green or Sanders call for close attention.

Early last month, Professors Green and Sanders presented together at a conference on “Admissibility Versus Sufficiency: Controlling the Quality of Expert Witness Testimony in the United States.” Video and audio of their presentation can be found online.  The authors posted a manuscript of their draft article on expert witness testimony to the Social Science Research Network.  See Michael D. Green & Joseph Sanders, “Admissibility Versus Sufficiency: Controlling the Quality of Expert Witness Testimony in the United States” <downloaded on March 25, 2012>.

The authors argue that most judicial exclusions of expert witness causal opinion testimony are based upon a judgment that the challenged witness’s opinion rests upon insufficient evidence.  They point to litigations, such as the Bendectin and silicone gel breast implant cases, where the defense challenges were supported in part by a body of “exonerative” epidemiologic studies.  Legal theory construction is always fraught with danger: a theory either stands to be readily refuted by counterexample, or it is put forward as a normative, prescriptive tool to change the world, and thus lacks any descriptive or explanatory component.  Green and Sanders, however, seem to be earnest in suggesting that their reductionist approach is both descriptive and elucidative of actual judicial practice.

The authors’ reductionist approach in this area, and especially as applied to the Bendectin and silicone decisions, however, ignores that even before the so-called exonerative epidemiology on Bendectin and silicone was available, the plaintiffs’ expert witnesses were presenting opinions on general and specific causation, based upon studies and evidence of dubious validity. Given that the silicone litigation erupted before Daubert was decided, and Bendectin cases pretty much ended with Daubert, neither litigation really permits a clean before-and-after picture.  Before Daubert, courts struggled with how to handle both the invalidity and the insufficiency (once the impermissible inferences were stripped away) in the Bendectin cases.  And before Daubert, all silicone cases went to the jury.  Even after Daubert, for some time, silicone cases resulted in jury verdicts, which were upheld on appeal.  It took defendants some time to uncover the nature and extent of the invalidity of plaintiffs’ expert witnesses’ opinions, the invalidity of the studies upon which these witnesses relied, and the unreasonableness of the witnesses’ reliance upon various animal and in vitro toxicologic and immunologic studies. And it took trial courts a few years after the Supreme Court’s 1993 Daubert decision to warm up to their new assignment.  Indeed, Green and Sanders get a good deal of mileage in their reductionist approach from trial and appellate courts that were quite willing to collapse the distinction between reliability or validity, on the one hand, and sufficiency, on the other.  Some of those “back benching” courts used consensus statements and reviews, which both marshaled the contrary evidence and documented the invalidity of the would-be affirmative evidence.  This judicial reliance upon external sources that encompassed both sufficiency and reliability should not be understood to mean that reliability (or validity) is nothing other than sufficiency.

A post-Daubert line of cases is more revealing:  the claim that the ethyl mercury vaccine preservative, thimerosal, causes autism.  Professors Green and Sanders touch briefly upon this litigation.  See Blackwell v. Wyeth, 971 A.2d 235 (Md. 2009).  Plaintiff’s expert witness, David Geier, had published several articles in which he claimed to have supported a causal nexus between thimerosal and autism.  Green and Sanders dutifully note that the Maryland courts ultimately rejected the claims based upon Geier’s data as wholly inadequate, standing alone, to support the inference he zealously urged to be drawn.  Id. at 32.  Whether this is a matter of sufficiency, or of the invalidity of his ultimate inference of causation from an inadequate data set, perhaps can be debated, but surely the validity concerns should not be lost in the shuffle of evaluating the evidence available.  Of course, exculpatory epidemiologic studies ultimately were published, based upon high quality data and inferences, but strictly speaking, these studies were not necessary to the process of ruling Geier’s advocacy science out of bounds for valid scientific discourse and legal proceedings.

Some additional comments.

 

1. Questionable reductionism.  The authors describe the thrust of their argument as a need to understand judicial decisions on expert witness admissibility as “sufficiency judgments.”  While their analysis simplifies the gatekeeping decisions, it also abridges the process in a way that omits important determinants of the law and its application.  Sufficiency, or the lack thereof, is often involved as a fatal deficiency in expert witness opinion testimony on causal issues, but the authors’ attempt to reduce many exclusionary decisions to insufficiency determinations ignores the many ways that expert witnesses (and scientists in the real world outside of courtrooms) go astray.  The authors’ reductionism seems a weak, if not flawed, predictive, explanatory, and normative theory of expert witness gatekeeping.  Furthermore, this reductionism holds a false allure to judges who may be tempted to oversimplify their gatekeeping task by conflating gatekeeping with the jury’s role:  exclude the proffered expert witness opinion testimony because, considering all the available evidence, the testimony is probably wrong.

 

2. Weakness of peer review, publication, and general acceptance in predicting gatekeeping decisions.  The authors further describe a “sufficiency approach” as openly acknowledging the relative unimportance of peer review, publication, and general acceptance.  Id. at 39.  These factors do not lack importance because they are unrelated to sufficiency; they are unimportant because they are weak proxies for validity.  Their presence or absence does not really help predict whether the causal opinion offered is invalid or otherwise unreliable.  The existence of published, high-quality, peer-reviewed systematic reviews does, however, bear on sufficiency of the evidence.  At least in some cases, courts consider such reviews and rely upon them heavily in reaching a decision on Rule 702, but we should ask to what extent the court has simply avoided the hard work of thinking through the problem on its own.

 

3. Questionable indictment of juries and the adversarial system for the excesses of expert witnesses.  Professors Green and Sanders describe the development of common law, and rules, to control expert witness testimony as “a judicial attempt to moderate the worst consequences of two defining characteristics of United States civil trials:  party control of experts and the widespread use of jury decision makers.” Id. at 2.  There is no doubt that these are two contributing factors in some of the worst excesses, but the authors really offer no support for their causal judgment.  The experience of courts in Europe, where civil juries and party control of expert witnesses are often absent from the process, raises questions about Green and Sanders’ attribution. See, e.g., R. Meester, M. Collins, R.D. Gill & M. van Lambalgen, “On the (ab)use of statistics in the legal case against the nurse Lucia de B.,” 5 Law, Probability and Risk 233 (2007) (describing the conviction of Nurse Lucia de Berk in the Netherlands, based upon shabby statistical evidence).

Perhaps a more general phenomenon is at play, such as an epistemologic pathology of expert witnesses who feel empowered and unconstrained by speaking in court, to untutored judges or juries.  The thrill of power, the arrogance of asserted opinion, the advancement of causes and beliefs, the lure of lucre, the freedom from contradiction, and a whole array of personality quirks are strong inducements for expert witnesses, in many countries, to outrun their scientific headlights.  See Judge Jack B. Weinstein, “Preliminary Reflections on Administration of Complex Litigation” 2009 Cardozo L. Rev. de novo 1, 14 (2009) (describing plaintiffs’ expert witnesses in silicone litigation as “charlatans”; “[t]he breast implant litigation was largely based on a litigation fraud. … Claims—supported by medical charlatans—that enormous damages to women’s systems resulted could not be supported.”)

In any event, there have been notoriously bad verdicts in cases decided by trial judges as the finders of fact.  See, e.g., Wells v. Ortho Pharmaceutical Corp., 615 F. Supp. 262 (N.D. Ga. 1985), aff’d and rev’d in part on other grounds, 788 F.2d 741 (11th Cir.), cert. denied, 479 U.S. 950 (1986); Barrow v. Bristol-Myers Squibb Co., 1998 WL 812318, at *23 (M.D. Fla., Oct. 29, 1998) (finding for breast implant plaintiff whose claims were supported by dubious scientific studies), aff’d, 190 F.3d 541 (11th Cir. 1999).  Bad things can happen in the judicial process even without the participation of lay juries.

Green and Sanders are correct to point out that juries are often confused by scientific evidence, and lack the time, patience, education, and resources to understand it.  The same is true for judges.  The real difference is that the decisions of judges are public.  Judges are expected to explain their reasoning, and there is some, even if limited, appellate review of judicial gatekeeping decisions. In this vein, Green and Sanders dismiss the hand wringing over disagreements among courts on admissibility decisions by noting that similar disagreements over evidentiary sufficiency issues fill the appellate reporters.  Id. at 37.  Green and Sanders might well add that at least the disagreements are out in the open, advanced with supporting reasoning, for public discussion and debate, unlike the unimpeachable verdicts of juries and their cloistered, secretive reasoning, or lack thereof.

In addition, Green and Sanders fail to mention a considerable problem:  the admission of weak, pathologic, or overstated scientific opinion undermines confidence in judicial judgments based upon verdicts that come out of a process featuring the dubious opinions of expert witnesses.  The public embarrassment of the court system over its judgments, based upon questionable expert witness opinion testimony, was a strong inducement to changing the libertine, pre-Daubert laissez-faire approach.

 

4.  Failure to consider the important role of Rule 703, which is quite independent of any “sufficiency” considerations, in the gatekeeping process.  Green and Sanders properly acknowledge the historical role that Rule 703, of the Federal Rules of Evidence, played in judicial attempts to regain some semblance of control over expert witness opinion.  They do not pursue the issue of its present role, which is often neglected and underemphasized.  In part, Rule 703, with its requirement that courts screen expert witness reliance upon independently inadmissible evidence (which means virtually all epidemiologic and animal studies and their data analyses), goes to the heart of gatekeeping by requiring judges to examine the quality of study data, and the reasonableness of reliance upon such data, by testifying expert witnesses.  See Schachtman, RULE OF EVIDENCE 703 — Problem Child of Article VII (Sept. 19, 2011).  Curiously, the authors try to force Rule 703 into their sufficiency pigeonhole even though it calls for a specific inquiry into the reasonableness (vel non) of reliance upon specific (hearsay or otherwise inadmissible) studies.  In my view, Rule 703 is predominantly a validity, and not a sufficiency, inquiry.

Judge Weinstein’s use of Rule 703, in In re Agent Orange, to strip out the most egregiously weak evidence did not predominantly speak to the evidentiary insufficiency of the plaintiffs’ expert witnesses’ reliance materials; nor did it look to the defendants’ expert witnesses’ reliance upon contradicting evidence.  Judge Weinstein was troubled by the plaintiffs’ expert witnesses’ reliance upon hearsay statements, from biased witnesses, about the plaintiffs’ medical condition.  Judge Weinstein did, of course, famously apply sufficiency criteria, including relative risks too low to permit an inference of specific causation, and the insubstantial totality of the evidence, but Judge Weinstein’s judicial philosophy then was to reject Rule 702 as a quality-control procedure for expert witness opinion testimony.  See In re Agent Orange Product Liab. Litig., 597 F. Supp. 740, 785, 817 (E.D.N.Y. 1984) (plaintiffs must prove at least a two-fold increase in rate of disease allegedly caused by the exposure), aff’d, 818 F.2d 145, 150-51 (2d Cir. 1987) (approving district court’s analysis), cert. denied sub nom. Pinkney v. Dow Chemical Co., 484 U.S. 1004 (1988); see also In re “Agent Orange” Prod. Liab. Litig., 611 F. Supp. 1223, 1240, 1262 (E.D.N.Y. 1985), aff’d, 818 F.2d 187 (2d Cir. 1987), cert. denied, 487 U.S. 1234 (1988).  A decade later, in the breast implant litigation, Judge Weinstein adhered to his rejection of Rule 702, declining to make explicit expert witness validity rulings or sufficiency determinations, and instead granting summary judgment on the entire evidentiary display.  This assessment of sufficiency was not, however, driven by the rules of evidence; it was based firmly upon Federal Rule of Civil Procedure 56’s empowerment of the trial judge to make an overall assessment that plaintiffs lacked a submissible case.  See In re Breast Implant Cases, 942 F. Supp. 958 (E. & S.D.N.Y. 1996) (granting summary judgment because of insufficiency of plaintiffs’ evidence, but specifically declining to rule on defendants’ Rule 702 and Rule 703 motions).  Within a few years, court-appointed expert witnesses, and the Institute of Medicine, weighed in with withering criticisms of plaintiffs’ attempted scientific case.  Given that there was so little valid evidence, sufficiency really never was at issue for these experts, but Judge Weinstein chose to frame the issue as sufficiency to avoid ruling on the pending motions under Rule 702.

 

5. Re-analyzing Re-analysis.  In the Bendectin litigation, some of the plaintiffs’ expert witnesses sought to offer various re-analyses of published papers.  Defendant Merrell Dow objected, and appeared to have framed its objections in general terms to unpublished re-analyses of published papers.  Green and Sanders properly note that some of the defense arguments, to the extent stated generally as prohibitions against re-analyses, were overblown and overstated.  Re-analyses can take so many forms, and the quality of peer reviewed papers is so variable, it would be foolhardy to frame a judicial rule as a prohibition against re-analyzing data in published studies.  Indeed, so many studies are published with incorrect statistical analyses that parties and expert witnesses have an obligation to call the problems to the courts’ attention, and to correct the problems when possible.

The notion that peer review was important in any way to serve as a proxy for reliability or validity has not been borne out.  Similarly, the suggestion that reanalyses of existing data from published papers were presumptively suspect was also not well considered.  Id. at 13.

 

6. Comments dismissive of statistical significance and methodological rigor.  Judgments of causality are, at the end of the day, qualitative judgments, but is it really true that:

“Ultimately, of course, regardless of how rigorous the methodology of more probative studies, the magnitude of any result and whether it is statistically significant, judgment and inference is required as to whether the available research supports an inference of causation.”

Id. at 16 (citing among other sources a particularly dubious case, Milward v. Acuity Specialty Prods. Group, Inc., 639 F.3d 11 (1st Cir. 2011), cert. denied, ___ U.S. ___ (2012)). Can the authors really intend to say that the judgment of causal inference is or should be made “regardless” of the rigor of the methodology, regardless of statistical significance, and regardless of a hierarchy of study probativeness?  Perhaps the authors simply meant to say that, at the end of the day, judgments of causal inference are qualitative judgments.  As much as I would like to extend the principle of charity to the authors, their own labeling of appellate decisions contrary to Milward as “silly” makes the benefit of the doubt seem inappropriate.

 

7.  The shame of scientists and physicians opining on specific causation.  Green and Sanders acknowledge that judgments of specific causation – the causation of harm in a specific person – are often uninformed by scientific considerations, and that Daubert criteria are unhelpful.

“Unfortunately, outside the context of litigation this is an inquiry to which most doctors devote very little time.  True, they frequently serve as expert witnesses in such cases (because the law demands evidence on this issue) but there is no accepted scientific methodology for determining the cause of an individual’s disease and, therefore, the error rate is simply unknown and unquantifiable.”

Id. at 18. (Professor Green’s comments at the conference seemed even more apodictic.) The authors, however, seem to have no sense of outrage that expert witnesses offer opinions on this topic, for which the witnesses have no epistemic warrant, and that courts accept these facile, if not fabricated, judgments.  Furthermore, specific causation is very much a scientific issue.  Scientists may, as a general matter, concentrate on population studies that show associations, which may be found to be causal, but some scientists have worked on gene associations that define extremely high-risk sub-populations, which determine the overall population risk.  As Green and Sanders acknowledge, when the relative risks are extremely high (say > 100), we do not need to use any fancy math to know that most cases in the exposed group would not have occurred but for their exposure.  A tremendous amount of scientific work has been done to identify biomarkers of increased risk, and to tie the increased risk to an agent-specific causal mechanism.  See, e.g., Gregory L. Erexson, James L. Wilmer, and Andrew D. Kligerman, “Sister Chromatid Exchange Induction in Human Lymphocytes Exposed to Benzene and Its Metabolites in Vitro,” 45 Cancer Research 2471 (1985).
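The arithmetic behind that acknowledgment is simple.  Below is a minimal sketch of the standard attributable-fraction calculation; the relative risks are hypothetical, chosen only for illustration:

```python
# A minimal sketch of the attributable-fraction arithmetic discussed above.
# The relative risks below are hypothetical, chosen only for illustration.

def attributable_fraction(relative_risk: float) -> float:
    """Fraction of cases among the exposed attributable to the exposure,
    assuming a valid, unconfounded relative risk: (RR - 1) / RR."""
    if relative_risk <= 1.0:
        return 0.0
    return (relative_risk - 1.0) / relative_risk

for rr in (1.5, 2.0, 10.0, 100.0):
    print(f"RR = {rr:>5}: {attributable_fraction(rr):.0%} of exposed cases attributable")

# RR = 2 is the break-even point for "more likely than not" (50%); at
# RR = 100, roughly 99% of exposed cases would not have occurred but for
# the exposure.
```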


8. Sufficiency versus admissibility.  Green and Sanders opine that many gatekeeping decisions, such as the Bendectin and breast implant cases, should be understood as sufficiency decisions that have incorporated the significant exculpatory epidemiologic evidence offered by defendants.  Id. at 20.  The “mature epidemiologic evidence” overwhelmed the plaintiffs’ meager evidence to the point that a jury verdict was not sustainable as a matter of law.  Id.  The authors’ approach requires a weighing of the complete evidentiary display, “entirely apart from the [plaintiffs’] expert’s testimony, to determine the overall sufficiency and reasonableness of the claimed inference of causation.”  Id. at 21.  What is missing from this approach, however, is that even without the defendants’ mature or solid body of epidemiologic evidence, the plaintiffs’ expert witnesses were urging an inference of causation based upon fairly insubstantial evidence.  Green and Sanders are concerned, no doubt, about the disconnect that would arise, if sufficiency were the main driver of exclusionary rulings, between the appellate standard of review for expert witness opinion admissibility, which is reversed only for an “abuse of discretion” by the trial court, and the standard of review for typical grants of summary judgment, which appellate courts evaluate “de novo.”  Green and Sanders hint that the expert witness decisions, which they see as mainly sufficiency judgments, may not be appropriate for the non-searching “abuse of discretion” standard.  See id. at 40-41 (citing the asymmetric “hard look” approach taken in In re Paoli RR Yard PCB Litig., 35 F.3d 717, 749-50 (3d Cir. 1994), and in the intermediate appellate court in Joiner itself).  Of course, the Supreme Court’s decision in Joiner was an abandonment of something akin to de novo, hard-look appellate review, lopsidedly applied to exclusions only.  Decisions to admit did not lead to summary dispositions without trial, and thus were never given any meaningful appellate review.

Elsewhere, Green and Sanders note that they do not necessarily share the doubts of the “hand wringers” over the inconsistent exclusionary rulings that result from an abuse-of-discretion standard.  At the end of their article, however, the authors note that viewing expert witness opinion exclusions as “sufficiency determinations” raises the question whether appellate courts should review these determinations de novo, as they would review ordinary factual “no evidence” or “insufficient evidence” grants of summary judgment.  Id. at 40.  There are reasonable arguments both ways, but it is worth pointing out that appellate decisions affirming rulings going both ways on the same expert witnesses, opining about the same litigated causal issue, are different from jury verdicts going both ways on causation.  First, the reasoning of the courts is, we hope, set out for public consumption, discussion, and debate, in a way that a jury’s deliberations are not.  Second, the fact of decisions “going both ways” is a statement that the courts view the issue as close and subject to debate.  Third, if the scientific and legal communities are paying attention, as they should, they can weigh in on the disparity, and on the stated reasons.  Assuming that courts are amenable to good reasons, they may have the opportunity to revisit the issue in a way that juries, which serve only once on the causal issue, can never do.  We might hope that the better reasoned decisions, especially those supported by the disinterested scientific community, would have some persuasive authority.


9.  Abridgment of Rule 702’s approach to gatekeeping.  The authors’ approach to sufficiency also suffers from ignoring not only Rule 703’s requirement that reliance upon individual studies be reasonable, but also Rule 702(c) and (d), which require that:

(c) the testimony is the product of reliable principles and methods; and

(d) the expert has reliably applied the principles and methods to the facts of the case.

These subsections of Rule 702 do not readily allow the use of proxy or substitute measures of validity or reliability; they require the trial court to assess the expert witnesses’ reasoning from data to conclusions. In large part, Green and Sanders have been misled by the instincts of courts to retreat to proxies for validity in the form of “general acceptance,” “peer review,” and contrary evidence that makes the challenged opinion appear “insubstantial.”

There is a substantial danger that Green and Sanders’s reductionist approach, and their equation of admissibility with sufficiency, will undermine trial courts’ willingness to assess the more demanding, and time-consuming, validity claims that are inherent in all expert witness causation opinions.


10. Weight-of-the-evidence (WOE) reasoning.  The authors appear captivated by the use of so-called weight-of-the-evidence (WOE) reasoning, questionably featured in some recent judicial decisions.  The so-called WOE method is really not much of a method at all, but rather a hand-waving process that often excuses the poverty of data and valid analysis.  See, e.g., Douglas L. Weed, “Weight of Evidence: A Review of Concept and Methods,” 25 Risk Analysis 1545 (2005) (noting the vague, ambiguous, indefinite nature of the concept of “weight of evidence” review).  See also Schachtman, “Milward — Unhinging the Courthouse Door to Dubious Scientific Evidence” (Sept. 2, 2011).

In Allen v. Pennsylvania Engineering Corp., 102 F.3d 194 (5th Cir. 1996), the appellate court disparaged WOE as a regulatory tool for making precautionary judgments, not fit for civil litigation, which requires a showing of actual causation as opposed to “as if” judgments.  Green and Sanders pejoratively label the Allen court’s approach as “silly”:

“The idea that a regulatory agency would make a carcinogenicity determination if it were not the best explanation of the evidence, i.e., more likely than not, is silly.”

Id. at 29 n.82 (emphasis added).  But silliness is as silliness does.  Only a few pages later in their paper, Green and Sanders admit that:

“As some courts have noted, the regulatory threshold is lower than required in tort claims. With respect to the decision of the FDA to withdraw approval of Parlodel, the court in Glastetter v. Novartis Pharmaceuticals Corp., 107 F. Supp. 2d 1015 (E.D. Mo. 2000), judgment aff’d, 252 F.3d 986 (8th Cir. 2001), commented that the FDA’s withdrawal statement, “does not establish that the FDA had concluded that bromocriptine can cause an ICH [intracerebral hemorrhage]; instead, it indicates that in light of the limited social utility of bromocriptine in treating lactation and the reports of possible adverse effects, the drug should no longer be used for that purpose. For these reasons, the court does not believe that the FDA statement alone establishes the reliability of plaintiffs’ experts’ causation testimony.” Glastetter v. Novartis Pharmaceuticals Corp., 107 F. Supp. 2d 1015 (E.D. Mo. 2000), aff’d, 252 F.3d 986 (8th Cir. 2001).”

Id. at 34 n.101. Not only do the authors appear to contradict themselves on the burden of persuasion for regulatory decisions, they offer no support for their silliness indictment.  Certainly, regulatory decisions, and not only the FDA’s, are frequently based upon precautionary principles that involve applying uncertain, ambiguous, or confusing data analyses to the process of formulating protective rules and regulations in the absence of scientific knowledge.  Unlike regulatory agencies, which operate under the Administrative Procedure Act, federal courts, and many state courts, operate under Rules 702 and 703’s requirements that expert witness opinion have the epistemic warrant of “knowledge,” not hunch, conjecture, or speculation.

Judge Posner’s Digression on Regression

April 6th, 2012

Cases that deal with linear regression are not particularly exciting except to a small band of “quant” lawyers who see such things “differently.”  Judge Posner, the author of several books, including Economic Analysis of Law (8th ed. 2011), is a judge who sees things differently as well.

In a case decided late last year, Judge Posner took the occasion to chide the district court and the parties’ legal counsel for failing to assess critically a regression analysis offered by an expert witness on the quantum of damages in a contract case.  ATA Airlines Inc. (ATA), a subcontractor of Federal Express Corporation, sued FedEx for breaching an alleged contract to include ATA in a lucrative U.S. military deal.

Remarkably, the contract claim was a non-starter; the panel of the Seventh Circuit reversed the judgment in favor of the plaintiff, and rendered judgment for FedEx.  There never was a contract, and so the case should never have gone to trial.  ATA Airlines, Inc. v. Federal Exp. Corp., 665 F.3d 882, 888-89 (7th Cir. 2011).

End of Story?

In a diversity case, based upon state law, with no liability, you might think that the panel would, and perhaps should, stop once it reached the conclusion that there was no contract upon which to predicate liability.  Anything more would be, of course, pure obiter dictum, but Judge Posner could not resist the teaching moment, for the trial judge below, the parties, their counsel, and the bar:

“But we do not want to ignore the jury’s award of damages, which presents important questions that have been fully briefed and are bound to arise in future cases.”

Id. at 889. That award of damages was based upon plaintiff’s expert witness’s regression analysis.  Judge Posner was perhaps generous in suggesting that the damages issue, as it involved a regression analysis, had been fully briefed.  Neither party addressed the regression with the level of scrutiny given by Judge Posner and his colleagues, Judges Wood and Easterbrook.

The Federal Express defense lawyers were not totally asleep at the wheel; they did object on Rule 702 grounds to the regression analysis offered by plaintiff’s witness, Lawrence D. Morriss, a forensic accountant.

“There were, as we’re about to see, grave questions concerning the reliability of Morriss’s application of regression analysis to the facts. Yet in deciding that the analysis was admissible, all the district judge said was that FedEx’s objections ‘that there is no objective test performed, and that [Morriss] used a subjective test, and [gave] no explanation why he didn’t consider objective criteria’, presented issues to be explored on cross-examination at trial, and that ‘regression analysis is accepted, so this is not “junk science.” [Morriss] appears to have applied it. Although defendants disagree, he has applied it and come up with a result, which apparently is acceptable in some areas under some models. Simple regression analysis is an accepted model.”

Id. (quoting District Judge Richard L. Young).

Apparently it is not enough for trial judges within the Seventh Circuit to wave their hands and proclaim that objections go to weight not admissibility; nor is it sufficient to say that a generally accepted technique was involved in formulating an opinion without exploring whether the technique was employed properly and reliably.  Judge Posner’s rebuke was short on subtlety and tact in describing the district judge’s response to FedEx’s Rule 702 objections:

“This cursory, and none too clear, response to FedEx’s objections to Morriss’s regression analysis did not discharge the duty of a district judge to evaluate in advance of trial a challenge to the admissibility of an expert’s proposed testimony. The evaluation of such a challenge may not be easy; the ‘principles and methods’ used by expert witnesses will often be difficult for a judge to understand. But difficult is not impossible. The judge can require the lawyer who wants to offer the expert’s testimony to explain to the judge in plain English what the basis and logic of the proposed testimony are, and the judge can likewise require the opposing counsel to explain his objections in plain English.”

Id. The lawyers, including Federal Express’s lawyers, also came in for admonishment:

“This might not have worked in the present case; neither party’s lawyers, judging from the trial transcript and the transcript of the Rule 702 hearing and the briefs and oral argument in this court, understand regression analysis; or if they do understand it they are unable to communicate their understanding in plain English.”

Id.

The court and counsel are not without resources, as Judge Posner pointed out.  The trial court can appoint its own expert to assist in evaluating the parties’ expert witnesses’ opinions.  Alternatively, the trial judge could roll up his sleeves and read the chapter on regression analysis in the Reference Manual on Scientific Evidence (3d ed. 2011). Id. at 889-890.  Judge Posner’s opinion makes clear that had the trial court taken any of these steps, Morriss’s regression analysis would not have survived the Rule 702 challenge.

Morriss’s analysis was, to be sure, rather peculiar:  a regression of costs on revenues.  Inexplicably, ATA’s witness made cost the dependent variable, with revenue the independent variable.  Common sense would have told the judge that revenue (gained or lost) should have been the dependent term in the analysis.  ATA’s expert witness attempted to justify this peculiar regression by claiming that data on the more plausible variables that make up costs (personnel, labor, fuel, equipment) were not available.  Judge Posner would have none of this incredible excuse mongering:

“In any event, a plaintiff’s failure to maintain adequate records is not a justification for an irrational damages theory.”

Id. at 893.

Judge Posner proceeded to dissect Morriss’s regression in detail, both in terms of its design and implementation.  Interestingly, FedEx had a damages expert witness, who was not called at trial.  Judge Posner correctly observed that defendants frequently do not call their damages witnesses at trial lest the jury infer that they are less than sincere in their protestations about no liability.  The FedEx damages expert, however, had calculated a 95 percent confidence interval for Morriss’s prediction for ATA’s costs in a year after the alleged breach of contract.  (It is unclear whether the interval calculated was truly a confidence interval, or a prediction interval, which would have been wider.)  In any event, the interval included costs at the high end, which would have resulted in net losses, rather than net profits, as Morriss had opined.  “All else aside, the confidence interval is so wide that there can be no reasonable confidence in the jury’s damages award.”  Id. at 896.
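The difference between the two kinds of interval is easy to see in a toy regression.  The sketch below uses simulated numbers of my own invention, not ATA’s data; it shows only that, for the same fitted model, the prediction interval for a single new observation is wider than the confidence interval for the regression mean:

```python
# A sketch contrasting a confidence interval for the regression mean with a
# prediction interval for a single new observation. All numbers are
# simulated; nothing here reproduces Morriss's actual analysis.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
revenue = rng.uniform(50.0, 150.0, size=30)                   # hypothetical revenues
cost = 20.0 + 0.6 * revenue + rng.normal(0.0, 8.0, size=30)   # hypothetical costs

X = sm.add_constant(revenue)
fit = sm.OLS(cost, X).fit()

# Interval estimates at a new revenue value of 100:
frame = fit.get_prediction([[1.0, 100.0]]).summary_frame(alpha=0.05)
print(frame[["mean", "mean_ci_lower", "mean_ci_upper"]])  # CI for the regression mean
print(frame[["obs_ci_lower", "obs_ci_upper"]])            # prediction interval: wider
```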

After summarizing the weirdness of Morriss’s regression analysis, Judge Posner delivered his coup de grâce:

“This is not nitpicking. Morriss’s regression had as many bloody wounds as Julius Caesar when he was stabbed 23 times by the Roman Senators led by Brutus. We have gone on at such length about the deficiencies of the regression analysis in order to remind district judges that, painful as it may be, it is their responsibility to screen expert testimony, however technical; we have suggested aids to the discharge of that responsibility. The responsibility is especially great in a jury trial, since jurors on average have an even lower comfort level with technical evidence than judges. The examination and cross-examination of Morriss were perfunctory and must have struck most, maybe all, of the jurors as gibberish. It became apparent at the oral argument of the appeal that even ATA’s lawyer did not understand Morriss’s analysis; he could not answer our questions about it but could only refer us to Morriss’s testimony. And like ATA’s lawyer, FedEx’s lawyer, both at the trial and in his appellate briefs and at argument, could only parrot his expert.

***

If a party’s lawyer cannot understand the testimony of the party’s own expert, the testimony should be withheld from the jury. Evidence unintelligible to the trier or triers of fact has no place in a trial. See Fed.R.Evid. 403, 702.”

Id. at 896.  Ouch!  Even being the victor can be a joyless occasion before Judge Posner.  For those who are interested in such things, the appellate briefs of the parties can be found online, both for ATA and for FedEx.

It is interesting to compare Judge Posner’s close scrutiny and analysis of the plaintiff’s expert witness’s regression with how the United States Supreme Court treated a challenge to the use of multiple regression in a race discrimination case in the mid-1980s.  In Bazemore v. Friday, 478 U.S. 385 (1986), the defendant criticized the plaintiffs’ regression on grounds that it omitted variables for major factors in any fair, sensible model of salary.  The Fourth Circuit had treated the omissions as fatal, but the Supreme Court excused the omissions by shifting the burden of producing a sensible, reliable regression model to the defense:

“The Court of Appeals erred in stating that petitioners’ regression analyses were ‘unacceptable as evidence of discrimination’, because they did not include ‘all measurable variables thought to have an effect on salary level’. The court’s view of the evidentiary value of the regression analysis was plainly incorrect. While the omission of variables from a regression analysis may render the analysis less probative than it otherwise might be, it can hardly be said, absent some other infirmity, that an analysis which accounts for the major factors ‘must be considered unacceptable as evidence of discrimination’. Ibid. Normally, failure to include variables will affect the analysis’ probativeness, not its admissibility.”

Id. at 400.  The Court, buried in a footnote, made an abstract concession that “there may, of course, be some regressions so incomplete as to be inadmissible as irrelevant; but such was clearly not the case here.” Id. at 400 n.15.  When the Court decided Bazemore, the federal courts were still enthralled with their libertine approach to expert witness evidence.  It is unclear whether a straightforward analysis of the plaintiffs’ regression analyses in Bazemore under current Rule 702, without the incendiary claims of racism, would have permitted a more dispassionate analysis of the proffered evidence.

The New York Times Goes to War Against Generic Drug Manufacturers

March 25th, 2012

Last week marked the launch of the New York Times’ rhetorically fevered, legally sophomoric campaign against generic drug preemption.  Saturday saw an editorial, “A Bizarre Outcome on Generic Drugs,” New York Times (March 24, 2012), which screamed, “Bizarre!”  “Outrageous!”

The New York Times editorialists have their knickers in a knot over the inability of people, who are allegedly harmed by adverse drug reactions from generic medications, to sue the generic manufacturers.  The editorial follows a front-page article, from earlier last week, which decried the inability to sue generic drug sellers. See Katie Thomas, “Generic Drugs Proving Resistant to Damage Suits,” New York Times (Mar. 21, 2012).

The Times‘ writers think that it is “bizarre” and “outrageous” that these people are out of court due to federal preemption of state court tort laws that might have provided a remedy.

In particular, the Times suggests that the law is irrational for allowing Ms. Diana Levine to recover against Wyeth for the loss of her arm to gangrene after receiving Phenergan by intravenous push, while another plaintiff, Ms. Schork, cannot recover for a similar injury, from a generic manufacturer of promethazine, the same medication.  Wyeth v. Levine, 555 U.S. 555 (2009).  See also Brief of Petitioner Wyeth, in Wyeth v. Levine (May 2008).

Of course, both Ms. Levine and Ms. Schork received compensation from their healthcare providers, who deviated from their standard of care when they carelessly injected the medication into arteries, contrary to clear instructions.   At the time that Levine received her treatment, the Phenergan package insert contained four separate warnings about the risk of gangrene from improper injection of the medication into an artery.  For instance, the “Adverse Reactions” section of the Phenergan label indicated: “INTRA-ARTERIAL INJECTION [CAN] RESULT IN GANGRENE OF THE AFFECTED EXTREMITY.”

As any tort practitioner knows, a plaintiff can almost always second guess the adequacy of a warning, but the Levine case did not involve any real failure to warn, or failure to warn adequately.  Healthcare providers and the FDA are well aware of the dangers of intra-arterial injection of Phenergan, but plaintiffs’ counsel can always play the hindsight bias game, and ask for something more.

What distorts tort law and creates irrational, disparate outcomes often has more to do with who has the deep pocket.  The Physician’s Assistant who screwed up and caused Ms. Levine’s gangrene settled, but Ms. Levine was left free to pursue the manufacturer despite the negligence of the P.A.  According to the Times article, Ms. Schork, who was injured from a careless injection of the generic version, pursued a similar strategy, and obtained a “limited settlement” from her healthcare provider (amount not specified).

Many states have caps on medical malpractice awards, and often plaintiffs’ counsel are reluctant to push hard against healthcare providers who will be viewed as sympathetic by juries.  Punitive damages are virtually impossible to obtain against physicians, physician assistants, and nurses, but are commonplace against large pharmaceutical companies.  So tort law creates a perverse set of incentives to ignore the truly culpable parties, or “to go lightly” on them, and to pursue the manufacturing defendant on the vagaries of “adequacy” of warning. The incentives are so corrupting that in some states, the healthcare providers, afraid of insurance premium hikes, turn to plaintiffs’ counsel to offer support against the manufacturing defendant.

What is bizarre and outrageous is that anyone would have a remedy against the manufacturer in the facts of the Levine case, or in similar cases.

The Times notes that federal preemption of state tort law in lawsuits against generics follows from the generic manufacturers’ lack of control over package insert labels.  Shilling for the lawsuit industry, the Times fails to mention that it is the original sponsor of the medication, the proprietary pharmaceutical manufacturer, which prepared the new drug application, and which has had the longest experience with post-marketing surveillance of the medication.

Professor Deborah Mayo asks whether anyone is responsible for updating the label after a medication has lost its patent protection.  See “Generic Drugs Resistant to Lawsuits” (Mar. 22, 2012).

If the question is raised in the context of the sponsor’s continued sale of the medication, then the sponsor still must fulfill its product stewardship.  The FDA has a considerable interest in maintaining consistency across labels, and the agency has generally been reluctant to permit one manufacturer of a drug, or in a class of drugs, to deviate substantially from what the agency believes is the appropriate level of warning in the context of the indicated use.  Of course, in the case of Phenergan, none of these considerations was really involved, because the FDA was well aware of the danger of gangrene from arterial injection, and the FDA-approved label contained multiple references to the hazard.

Tort law is the wrong hobby horse to ride into battle for an equality of results.  Two patients may have equally careless healthcare providers, but one of them gets lucky and accidentally manages to inject the IV into a vein.  The other is unfortunate, and suffers grievous injury because her physician negligently injected the medication into an artery.  Tort law provides a remedy only against the careless provider whose negligence causes harm.  The other negligent provider goes free, to inflict malpractice on another victim.

Or consider two equally meritorious plaintiffs who sue two companies under almost identical facts.  One of the companies is insolvent. One of the plaintiffs will thus go uncompensated.

Or suppose we have two plaintiffs, each of whom receives a particular medication from his physician and is harmed by a well-recognized adverse drug reaction.  Assume further that the harm, although well understood by the medical profession, is not adequately addressed in the package insert.  One plaintiff’s physician acknowledges that he was well aware of the reaction; the other plaintiff’s physician is incompetent and says he never heard of the problem.  The learned intermediary doctrine will properly cut off the claim of the plaintiff with the knowledgeable physician; the plaintiff whose physician was incompetent may still have a claim against the manufacturer.  Lucky plaintiff, to have had such an ill-trained physician.

Similarly, if a plaintiff’s physician testifies that he never reads package inserts, then the plaintiff may lose any remedy because regardless whether the warning label was adequate or not, the prescribing physician’s carelessness prevented the actual labeling, and any hypothetical proposed labeling, from influencing his prescribing practice.

The Times quotes Michael Johnson, a lawyer who represented Gladys Mensing, whose case gave rise to the Supreme Court’s ruling on generic preemption: “Your pharmacists aren’t telling you, hey, when we fill this with your generic, you are giving up all of your legal remedies.”  Well, if you have a well-trained physician, who is aware of the risks of a medication, as he or she should be, you may already have given up rights that you would have had if your physician were an ignoramus, willfully ignorant of the medication’s unavoidable risks.  Pliva, Inc. v. Mensing, 131 S. Ct. 2567 (2011).

The point is that tort law is not, and has never been, a system of generating equal outcomes; nor is tort law a system of insuring people against bad results.  Generic medications cost less.  People who are looking to buy lawsuits may choose to buy proprietary pharmaceutical manufacturers’ drugs.  People who are looking to buy good healthcare may choose to select their healthcare providers more carefully, and look to a system that better places the cost of harm on those parties that can best avoid the bad outcome.  In the Levine and Schork cases, the most efficient, and the fairest, cost and risk avoiders were the women’s healthcare providers.

Is it absurd for recovery to be limited against generics when allowed against proprietary pharmaceutical manufacturers?  Yes, because the proprietary manufacturers should have similar protection from the welter of state tort laws.  The irrational result in Levine is not to be imitated.

Confidence in Intervals and Diffidence in the Courts

March 4th, 2012

Next year, the Supreme Court’s Daubert decision will turn 20.  The decision, in interpreting Federal Rule of Evidence 702, dramatically changed the landscape of expert witness testimony.  Still, there are many who would turn the clock back by disabling the gatekeeping function.  In past posts, I have identified scholars, such as Erica Beecher-Monas and the late Margaret Berger, who tried to eviscerate judicial gatekeeping.  Recently a student note argued for the complete abandonment of all judicial control of expert witness testimony.  See Note, “Admitting Doubt: A New Standard for Scientific Evidence,” 123 Harv. L. Rev. 2021 (2010) (arguing that courts should admit all relevant evidence).

One advantage that comes from requiring trial courts to serve as gatekeepers is that the expert witnesses’ reasoning is approved or disapproved in an open, transparent, and rational way.  Trial courts subject themselves to public scrutiny in a way that jury decision making does not permit.  The critics of Daubert often engage in a cynical attempt to remove all controls over expert witnesses in order to empower juries to act on their populist passions and prejudices.  When courts misinterpret statistical and scientific evidence, there is some hope of changing subsequent decisions by pointing out their errors.  Jury errors on the other hand, unless they involve determinations of issues for which there were “no evidence,” are immune to institutional criticism or correction.

Despite my whining, not all courts butcher statistical concepts.  There are many astute judges out there who see error and call it error.  Take, for instance, the trial judge who was confronted with this typical argument:

“While Giles admits that a p-value of .15 is three times higher than what scientists generally consider statistically significant—that is, a p-value of .05 or lower—she maintains that this ‘‘represents 85% certainty, which meets any conceivable concept of preponderance of the evidence.’’ (Doc. 103 at 16).”

Giles v. Wyeth, Inc., 500 F.Supp. 2d 1048, 1056-57 (S.D.Ill. 2007), aff’d, 556 F.3d 596 (7th Cir. 2009).  Despite having case law cited to it (such as In re Ephedra), the trial court looked to the Reference Manual on Scientific Evidence, a resource that seems to be ignored by many federal judges, and rejected the bogus argument.  Unfortunately, the lawyers who made the bogus argument still are licensed, and at large, to incite the same error in other cases.

This business perhaps would be amenable to an empirical analysis.  An enterprising sociologist of the law could conduct survey research on the science and math training of the federal judiciary, on whether federal judges have read chapters of the Reference Manual before deciding cases involving statistics or science, and on whether federal judges have expressed the need for further education.  This survey evidence could be capped by an analysis of the prevalence of certain kinds of basic errors, such as the transpositional fallacy committed by so many judges (but decisively rejected in the Giles case).  Perhaps such an empirical analysis would advance our understanding of whether we need specialty science courts.

One of the reasons that the Reference Manual on Scientific Evidence is worthy of so much critical attention is that the volume has the imprimatur of the Federal Judicial Center, and now the National Academies.  Putting aside the idiosyncratic chapter by the late Professor Berger, the Manual clearly presents guidance on many important issues.  To be sure, there are gaps, inconsistencies, and mistakes, but the statistics chapter should be a must-read for federal (and state) judges.

Unfortunately, the Manual has competition from lesser authors whose work obscures, misleads, and confuses important issues.  Consider an article by two would-be expert witnesses, who testify for plaintiffs, and confidently misstate the meaning of a confidence interval:

“Thus, a RR [relative risk] of 1.8 with a confidence interval of 1.3 to 2.9 could very likely represent a true RR of greater than 2.0, and as high as 2.9 in 95 out of 100 repeated trials.”

Richard W. Clapp & David Ozonoff, “Environment and Health: Vital Intersection or Contested Territory?” 30 Am. J. L. & Med. 189, 210 (2004).  This misstatement was then cited and quoted with obvious approval by Professor Beecher-Monas, in her text on scientific evidence.  Erica Beecher-Monas, Evaluating Scientific Evidence: An Interdisciplinary Framework for Intellectual Due Process 60-61 n. 17 (2007).   Beecher-Monas goes on, however, to argue that confidence interval coefficients are not the same as burdens of proof, but then implies that scientific standards of proof are different from the legal preponderance of the evidence.  She provides no citation or support for the higher burden of scientific proof:

“Some commentators have attributed the causation conundrum in the courts to the differing burdens of proof in science and law.28 In law, the civil standard of ‘more probable than not’ is often characterized as a probability greater than 50 percent.29 In science, on the other hand, the most widely used standard is a 95 percent confidence interval (corresponding to a 5 percent level of significance, or p-level).30 Both sound like probabilistic assessment. As a result, the argument goes, civil judges should not exclude scientific testimony that fails scientific validity standards because the civil legal standards are much lower. The transliteration of the ‘more probable than not’ standard of civil factfinding into a quantitative threshold of statistical evidence is misconceived. The legal and scientific standards are fundamentally different. They have different goals and different measures.  Therefore, one cannot justifiably argue that evidence failing to meet the scientific standards nonetheless should be admissible because the scientific standards are too high for preponderance determinations.”

Id. at 65.  This seems to be on the right track, although Beecher-Monas does not state clearly whether she subscribes to the notion that the burdens of proof in science and law differ.  The argument then takes a wrong turn:

“Equating confidence intervals with burdens of persuasion is simply incoherent. The goal of the scientific standard – the 95 percent confidence interval – is to avoid claiming an effect when there is none (i.e., a false positive).31

Id. at 66.  But this is a crazy error; confidence intervals are not burdens of persuasion, legal or scientific.  Beecher-Monas is not, however, content to leave this alone:

“Scientists using a 95 percent confidence interval are making a prediction about the results being due to something other than chance.”

Id. at 66 (emphasis added).  Other than chance?  Well, this implies causality, as well as bias and confounding, but the confidence interval, like the p-value, addresses only random or sampling error.  Beecher-Monas’s error is neither random nor scientific.  Indeed, she perpetuates the same error committed by the Fifth Circuit in a frequently cited Bendectin case, which interpreted the confidence interval as resolving questions of the role of matters “other than chance,” such as bias and confounding.  Brock v. Merrell Dow Pharmaceuticals, Inc., 874 F.2d 307, 311-12 (5th Cir. 1989) (“Fortunately, we do not have to resolve any of the above questions [as to bias and confounding], since the studies presented to us incorporate the possibility of these factors by the use of a confidence interval.”) (emphasis in original).  See, e.g., David H. Kaye, David E. Bernstein, and Jennifer L. Mnookin, The New Wigmore – A Treatise on Evidence: Expert Evidence § 12.6.4, at 546 (2d ed. 2011); Michael O. Finkelstein, Basic Concepts of Probability and Statistics in the Law 86-87 (2009) (criticizing the overinterpretation of confidence intervals by the Brock court).

Clapp, Ozonoff, and Beecher-Monas are not alone in offering bad advice to judges who must help resolve statistical issues.  Déirdre Dwyer, a prominent scholar of expert evidence in the United Kingdom, manages to bundle up the transpositional fallacy and a misstatement of the meaning of the confidence interval into one succinct exposition:

“By convention, scientists require a 95 per cent probability that a finding is not due to chance alone. The risk ratio (e.g. ‘2.2’) represents a mean figure. The actual risk has a 95 per cent probability of lying somewhere between upper and lower limits (e.g. 2.2 ±0.3, which equals a risk somewhere between 1.9 and 2.5) (the ‘confidence interval’).”

Déirdre Dwyer, The Judicial Assessment of Expert Evidence 154-55 (Cambridge Univ. Press 2008).

Of course, Clapp, Ozonoff, Beecher-Monas, and Dwyer build upon a long tradition of academics’ giving errant advice to judges on this very issue.  See, e.g., Christopher B. Mueller, “Daubert Asks the Right Questions: Now Appellate Courts Should Help Find the Right Answers,” 33 Seton Hall L. Rev. 987, 997 (2003) (describing the 95% confidence interval as “the range of outcomes that would be expected to occur by chance no more than five percent of the time”); Arthur H. Bryant & Alexander A. Reinert, “The Legal System’s Use of Epidemiology,” 87 Judicature 12, 19 (2003) (“The confidence interval is intended to provide a range of values within which, at a specified level of certainty, the magnitude of association lies.”) (incorrectly citing the first edition of Rothman & Greenland, Modern Epidemiology 190 (Philadelphia 1998)); John M. Conley & David W. Peterson, “The Science of Gatekeeping: The Federal Judicial Center’s New Reference Manual on Scientific Evidence,” 74 N.C. L. Rev. 1183, 1212 n.172 (1996) (“a 95% confidence interval … means that we can be 95% certain that the true population average lies within that range”).

Who has prevailed?  The statistically correct authors of the statistics chapter of the Reference Manual on Scientific Evidence, or the errant commentators?  It would be good to have some empirical evidence to help evaluate the judiciary’s competence. Here are some cases, many drawn from the Manual‘s discussions, arranged chronologically, before and after the first appearance of the Manual:

Before First Edition of the Reference Manual on Scientific Evidence:

DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941, 948 (3d Cir. 1990)(“A 95% confidence interval is constructed with enough width so that one can be confident that it is only 5% likely that the relative risk attained would have occurred if the true parameter, i.e., the actual unknown relationship between the two studied variables, were outside the confidence interval.   If a 95% confidence interval thus contains ‘1’, or the null hypothesis, then a researcher cannot say that the results are ‘statistically significant’, that is, that the null hypothesis has been disproved at a .05 level of significance.”)(internal citations omitted)(citing in part, D. Barnes & J. Conley, Statistical Evidence in Litigation § 3.15, at 107 (1986), as defining a CI as “a limit above or below or a range around the sample mean, beyond which the true population is unlikely to fall”).

United States ex rel. Free v. Peters, 806 F. Supp. 705, 713 n.6 (N.D. Ill. 1992) (“A 99% confidence interval, for instance, is an indication that if we repeated our measurement 100 times under identical conditions, 99 times out of 100 the point estimate derived from the repeated experimentation will fall within the initial interval estimate … .”), rev’d in part, 12 F.3d 700 (7th Cir. 1993)

DeLuca v. Merrell Dow Pharms., Inc., 791 F. Supp. 1042, 1046 (D.N.J. 1992) (“A 95% confidence interval means that there is a 95% probability that the ‘true’ relative risk falls within the interval”), aff’d, 6 F.3d 778 (3d Cir. 1993)

Turpin v. Merrell Dow Pharms., Inc., 959 F.2d 1349, 1353-54 & n.1 (6th Cir. 1992)(describing a 95% CI of 0.8 to 3.10, to mean that “random repetition of the study should produce, 95 percent of the time, a relative risk somewhere between 0.8 and 3.10”)

Hilao v. Estate of Marcos, 103 F.3d 767, 787 (9th Cir. 1996)(Rymer, J., dissenting and concurring in part).

After the first publication of the Reference Manual on Scientific Evidence:

American Library Ass’n v. United States, 201 F.Supp. 2d 401, 439 & n.11 (E.D.Pa. 2002), rev’d on other grounds, 539 U.S. 194 (2003)

SmithKline Beecham Corp. v. Apotex Corp., 247 F.Supp.2d 1011, 1037-38 (N.D. Ill. 2003)(“the probability that the true value was between 3 percent and 7 percent, that is, within two standard deviations of the mean estimate, would be 95 percent”)(also confusing attained significance probability with posterior probability: “This need not be a fatal concession, since 95 percent (i.e., a 5 percent probability that the sign of the coefficient being tested would be observed in the test even if the true value of the sign was zero) is an  arbitrary measure of statistical significance.  This is especially so when the burden of persuasion on an issue is the undemanding ‘preponderance’ standard, which  requires a confidence of only a mite over 50 percent. So recomputing Niemczyk’s estimates as significant only at the 80 or 85 percent level need not be thought to invalidate his findings.”), aff’d on other grounds, 403 F.3d 1331 (Fed. Cir. 2005)

In re Silicone Gel Breast Implants Prods. Liab. Litig, 318 F.Supp.2d 879, 897 (C.D. Cal. 2004) (interpreting a relative risk of 1.99, in a subgroup of women who had had polyurethane foam covered breast implants, with a 95% CI that ran from 0.5 to 8.0, to mean that “95 out of 100 a study of that type would yield a relative risk somewhere between on 0.5 and 8.0.  This huge margin of error associated with the PUF-specific data (ranging from a potential finding that implants make a woman 50% less likely to develop breast cancer to a potential finding that they make her 800% more likely to develop breast cancer) render those findings meaningless for purposes of proving or disproving general causation in a court of law.”)(emphasis in original)

Ortho–McNeil Pharm., Inc. v. Kali Labs., Inc., 482 F.Supp. 2d 478, 495 (D.N.J. 2007) (“Therefore, a 95 percent confidence interval means that if the inventors’ mice experiment was repeated 100 times, roughly 95 percent of results would fall within the 95 percent confidence interval ranges.”) (apparently relying upon a party’s expert witness’s report), aff’d in part, vacated in part, sub nom. Ortho McNeil Pharm., Inc. v. Teva Pharms Indus., Ltd., 344 Fed. Appx. 595 (Fed. Cir. 2009)

Eli Lilly & Co. v. Teva Pharms, USA, 2008 WL 2410420, *24 (S.D.Ind. 2008)(stating incorrectly that “95% percent of the time, the true mean value will be contained within the lower and upper limits of the confidence interval range”)

Benavidez v. City of Irving, 638 F.Supp. 2d 709, 720 (N.D. Tex. 2009)(interpreting a 90% CI to mean that “there is a 90% chance that the range surrounding the point estimate contains the truly accurate value.”)

Estate of George v. Vermont League of Cities and Towns, 993 A.2d 367, 378 n.12 (Vt. 2010)(erroneously describing a confidence interval to be a “range of values within which the results of a study sample would be likely to fall if the study were repeated numerous times”)

Correct Statements

There is no reason for any of these courts to have struggled so with the concept of statistical significance or of the confidence interval.  These concepts are well elucidated in the Reference Manual on Scientific Evidence (RMSE):

“To begin with, ‘confidence’ is a term of art. The confidence level indicates the percentage of the time that intervals from repeated samples would cover the true value. The confidence level does not express the chance that repeated estimates would fall into the confidence interval.91

* * *

According to the frequentist theory of statistics, probability statements cannot be made about population characteristics: Probability statements apply to the behavior of samples. That is why the different term ‘confidence’ is used.”

RMSE 3d at 247 (2011).

Even before the Manual, many capable authors tried to reach the judiciary, to help judges learn and apply statistical concepts more confidently.  Professors Michael Finkelstein and Bruce Levin, of Columbia University’s Law School and Mailman School of Public Health, respectively, have worked hard to educate lawyers and judges in the important concepts of statistical analyses:

“It is the confidence limits PL and PU that are random variables based on the sample data. Thus, a confidence interval (PL, PU ) is a random interval, which may or may not contain the population parameter P. The term ‘confidence’ derives from the fundamental property that, whatever the true value of P, the 95% confidence interval will contain P within its limits 95% of the time, or with 95% probability. This statement is made only with reference to the general property of confidence intervals and not to a probabilistic evaluation of its truth in any particular instance with realized values of PL and PU. “

Michael O. Finkelstein & Bruce Levin, Statistics for Lawyers at 169-70 (2d ed. 2001)
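That “fundamental property” is easy to verify by simulation.  The sketch below, with arbitrary parameters of my own choosing, estimates the long-run coverage of the textbook 95% interval for a proportion:

```python
# A minimal simulation of the coverage property described by Finkelstein &
# Levin: the interval, not the parameter, is random. Parameters arbitrary.
import numpy as np

rng = np.random.default_rng(42)
true_p, n, trials, z = 0.3, 200, 10_000, 1.96

covered = 0
for _ in range(trials):
    sample = rng.binomial(1, true_p, size=n)
    p_hat = sample.mean()
    se = np.sqrt(p_hat * (1 - p_hat) / n)
    if p_hat - z * se <= true_p <= p_hat + z * se:
        covered += 1

print(f"{covered / trials:.1%} of the 10,000 intervals covered the true value")
# Expect roughly 95%: the statement concerns the procedure's long-run
# behavior, not any single realized interval containing the truth.
```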

Courts have no doubt been confused to some extent between the operational definition of a confidence interval and the role of the sample point estimate as an estimator of the population parameter.  In some instances, the sample statistic may be the best estimate of the population parameter, but that estimate may be rather crummy because of the sampling error involved.  See, e.g., Kenneth J. Rothman, Sander Greenland, Timothy L. Lash, Modern Epidemiology 158 (3d ed. 2008) (“Although a single confidence interval can be much more informative than a single P-value, it is subject to the misinterpretation that values inside the interval are equally compatible with the data, and all values outside it are equally incompatible. * * *  A given confidence interval is only one of an infinite number of ranges nested within one another. Points nearer the center of these ranges are more compatible with the data than points farther away from the center.”); Nicholas P. Jewell, Statistics for Epidemiology 23 (2004)(“A popular interpretation of a confidence interval is that it provides values for the unknown population proportion that are ‘compatible’ with the observed data.  But we must be careful not to fall into the trap of assuming that each value in the interval is equally compatible.”); Charles Poole, “Confidence Intervals Exclude Nothing,” 77 Am. J. Pub. Health 492, 493 (1987)(“It would be more useful to the thoughtful reader to acknowledge the great differences that exist among the p-values corresponding to the parameter values that lie within a confidence interval … .”).
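Poole’s point can be made concrete with a short computation.  Given an invented relative risk estimate and standard error (RR = 2.0, with a 95% interval of roughly 1.35 to 2.96), the two-sided p-values for candidate null values across the interval differ dramatically:

```python
# A sketch of Poole's point: parameter values inside a confidence interval
# are not all equally compatible with the data. Numbers are invented.
import numpy as np
from scipy import stats

log_rr_hat, se = np.log(2.0), 0.2   # hypothetical estimate: RR 2.0, 95% CI ~1.35-2.96

for rr0 in (1.0, 1.35, 1.6, 2.0, 2.5, 2.96):
    z = (log_rr_hat - np.log(rr0)) / se
    p = 2 * stats.norm.sf(abs(z))
    print(f"null RR = {rr0:<4}: two-sided p = {p:.4f}")

# The center (RR = 2.0) yields p = 1.0; the interval limits yield p ~ 0.05;
# RR = 1.0 yields p ~ 0.0005. Compatibility declines smoothly, so the
# interval is not a region of equally supported values.
```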

Admittedly, I have given an impressionistic account, and I have used anecdotal methods, to explore the question whether the courts have improved in their statistical assessments in the 20 years since the Supreme Court decided Daubert.  Many decisions go unreported, and perhaps many errors are cut off from the bench in the course of testimony or argument.  I personally doubt that judges exercise greater care in their comments from the bench than they do in published opinions.  Still, the quality of care exercised by the courts would be a worthy area of investigation by the Federal Judicial Center, or perhaps by other sociologists of the law.

Relative Risk > Two in the Courts – Updated

March 3rd, 2012

See the updated case law on the issue of using relative and attributable risks to satisfy a plaintiff’s burden of showing, more likely than not, that an exposure or condition caused the plaintiff’s disease or injury.

Scientific illiteracy among the judiciary

February 29th, 2012

Ken Feinberg, speaking at a symposium on mass torts, asks what legal challenges mass torts confront in the federal courts.  The answer seems obvious.

Pharmaceutical cases that warrant federal court multi-district litigation (MDL) treatment typically involve complex scientific and statistical issues.  The public deserves to have MDL cases assigned to judges who have special experience and competence to preside over cases in which these complex issues predominate.  There appears to be no procedural device to ensure that the judges selected in the MDL process have the necessary experience and competence, and there is a good deal of evidence to suggest that the MDL judges are not up to the task at hand.

In the aftermath of the Supreme Court’s decision in Daubert, the Federal Judicial Center assumed responsibility for producing science and statistics tutorials to help judges grapple with technical issues in their cases.  The Center has produced videotaped lectures as well as the Reference Manual on Scientific Evidence, now in its third edition.  Despite the Center’s best efforts, many federal judges have shown themselves to be incorrigible.  It is time to revive the discussions and debates about implementing a “science court.”

The following three federal MDLs all involved pharmaceutical products, well-respected federal judges, and a fundamental error in statistical inference.

Avandia

Avandia is a prescription oral anti-diabetic medication licensed by GlaxoSmithKline (GSK).  Concerns over Avandia’s association with excess heart attack risk resulted in regulatory revisions of its availability, as well as thousands of lawsuits.  In a decision that affected virtually all of those several thousand claims, aggregated for pretrial handling in a federal MDL, a federal judge, in ruling on a Rule 702 motion, described a clinical trial with a risk ratio greater than 1.0, and a p-value of 0.08, as follows:

“The DREAM and ADOPT studies were designed to study the impact of Avandia on prediabetics and newly diagnosed diabetics. Even in these relatively low-risk groups, there was a trend towards an adverse outcome for Avandia users (e.g., in DREAM, the p-value was .08, which means that there is a 92% likelihood that the difference between the two groups was not the result of mere chance).FN72

In re Avandia Marketing, Sales Practices and Product Liability Litigation, 2011 WL 13576, *12 (E.D. Pa. 2011)(Rufe, J.).  This is a remarkable error by a trial judge given the responsibility for pre-trial handling of so many cases.  There are many things you can argue about a p-value of 0.08, but Judge Rufe’s interpretation is not an argument; it is error.  That such an error, explicitly warned against in the Reference Manual on Scientific Evidence, could be made by an MDL judge, over 15 years since the first publication of the Manual, highlights the seriousness and the extent of the illiteracy problem.

What possible basis could the Avandia MDL court have to support this clearly erroneous interpretation of crucial studies in the litigation?  Footnote 72 in Judge Rufe’s opinion references a report by plaintiffs’ expert witness, Allan D. Sniderman, M.D, “a cardiologist, medical researcher, and professor at McGill University.” Id. at *10.  The trial court goes on to note that:

“GSK does not challenge Dr. Sniderman’s qualifications as a cardiologist, but does challenge his ability to analyze and draw conclusions from epidemiological research, since he is not an epidemiologist. GSK’s briefs do not elaborate on this challenge, and in any event the Court finds it unconvincing given Dr. Sniderman’s credentials as a researcher and published author, as well as clinician, and his ability to analyze the epidemiological research, as demonstrated in his report.”

Id.

What more evidence could the Avandia MDL trial court possibly have needed to show that Sniderman was incompetent to give statistical and epidemiologic testimony?  Fundamentally at odds with the Manual on an uncontroversial point, Sniderman had given the court a baseless, incorrect interpretation of a p-value.  Everything else he might have to say on the subject was likely suspect.  If, as the court suggested, GSK did not elaborate upon its challenge with specific examples, then shame on GSK. The trial court, however, could have readily determined that Sniderman was speaking nonsense by reading the chapter on statistics in the Reference Manual on Scientific Evidence.  For all my complaints about gaps in coverage in the Manual, the text, on this issue is clear and concise. It really is not too much to expect an MDL trial judge to be conversant with the basic concepts of scientific and statistical evidence set out in the Manual, which is prepared to help federal judges.
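To see why the court’s reading fails, consider what turning a p-value into a probability that “the difference was not the result of chance” would actually require.  The toy Bayesian calculation below uses an invented prior and an invented likelihood under the alternative, solely to make the point concrete:

```python
# A toy Bayes calculation showing why a p-value of 0.08 is not "92%
# likelihood the difference was not chance." The prior and the likelihood
# under the alternative are invented; a p-value alone supplies neither.
prior_h0 = 0.5          # hypothetical prior: effect equally likely real or not
p_data_given_h0 = 0.08  # roughly the p-value: P(data this extreme | H0)
p_data_given_h1 = 0.30  # hypothetical power-like term: P(data this extreme | H1)

posterior_h0 = (p_data_given_h0 * prior_h0) / (
    p_data_given_h0 * prior_h0 + p_data_given_h1 * (1 - prior_h0)
)
print(f"P(H0 | data) = {posterior_h0:.2f}")   # ~0.21, not 0.08

# The court's 92% figure would follow only if P(H0 | data) equaled the
# p-value, which is precisely what the transposition fallacy assumes.
```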

Phenylpropanolamine (PPA) Litigation

Litigation over phenylpropanolamine was aggregated, within the federal system, before Judge Barbara Rothstein.  Judge Rothstein is not only a respected federal trial judge; she was also the director of the Federal Judicial Center, which produces the Reference Manual on Scientific Evidence.  Her involvement in overseeing the preparation of the third edition of the Manual, however, did not keep Judge Rothstein from badly misunderstanding and misstating the meaning of a p-value in the PPA litigation.  See In re Phenylpropanolamine (PPA) Prods. Liab. Litig., 289 F.Supp. 2d 1230, 1236 n.1 (W.D. Wash. 2003) (“P-values measure the probability that the reported association was due to chance… .”).  Tellingly, Judge Rothstein denied, in large part, the defendants’ Rule 702 challenges.  Juries, however, overwhelmingly rejected the claims that PPA had caused plaintiffs’ strokes.

Ephedra Litigation

Judge Rakoff, of the Southern District of New York, notoriously committed the transposition fallacy in the Ephedra litigation:

“Generally accepted scientific convention treats a result as statistically significant if the P-value is not greater than .05. The expression ‘P=.05’ means that there is one chance in twenty that a result showing increased risk was caused by a sampling error—i.e., that the randomly selected sample accidentally turned out to be so unrepresentative that it falsely indicates an elevated risk.”

In re Ephedra Prods. Liab. Litig., 393 F.Supp. 2d 181, 191 (S.D.N.Y. 2005).

Judge Rakoff then fallaciously argued that the use of a critical significance probability of 5% increased the “more likely than not” burden of proof upon a civil litigant.  Id. at 188, 193.  See Michael O. Finkelstein, Basic Concepts of Probability and Statistics in the Law 65 (2009).

Judge Rakoff may well have had help in confusing the probability used to characterize the plaintiff’s burden of proof with the probability of attained significance.  At least one of the defense expert witnesses in the Ephedra cases gave an erroneous definition of “statistically significant association,” which may have invited the judicial error:

“A statistically significant association is an association between exposure and disease that meets rigorous mathematical criteria demonstrating that the finding is unlikely to be the result of chance.”

Report of John Concato, MD, MS, MPH, at 7, ¶29 (Sept. 13, 2004).  Dr. Concato’s error was picked up and repeated in the defense briefing of its motion to preclude:

“The likelihood that an observed association could occur by chance alone is evaluated using tests for statistical significance.”

Memorandum of Law in Support of Motion by Ephedra Defendants to Exclude Expert Opinions of Charles Buncher, [et alia] …That Ephedra Causes Hemorrhagic Stroke, Ischemic Stroke, Seizure, Myocardial Infarction, Sudden Cardiac Death, and Heat-Related Illnesses at 9 (Dec. 3, 2004).

Judge Rakoff’s insistence that requiring “statistical significance” at the customary 5% level would change the plaintiffs’ burden of proof, and require greater certitude for epidemiologists than for other expert witnesses who opine in less “rigorous” fields of learning, is wrong as a matter of fact.  His Honor’s comparison also ignores the Supreme Court’s observation that the point of Rule 702 is:

‘‘to make certain that an expert, whether basing testimony upon professional studies or personal experience, employs in the courtroom the same level of intellectual rigor that characterizes the practice of an expert in the relevant field.’’

Kumho Tire Co. v. Carmichael, 526 U.S. 137, 152 (1999).

Judge Rakoff not only ignored the conditional nature of significance probability, but he overinterpreted the role of significance testing in arriving at a conclusion of causality.  Statistical significance may answer the question of the strength of the evidence for ruling out chance as having produced the data observed, on the assumption of no increased risk, but it does not alone answer the question whether the study result shows an increased risk.  Bias and confounding must be considered, along with the other Bradford Hill factors.

Even if the p-value could be turned into a posterior probability of the null hypothesis, there would be many other probabilities that would necessarily diminish that probability.  Some of the other factors (which could be expressed as objective or subjective probabilities) include:

  • accuracy of the data reporting
  • data collection
  • data categorization
  • data cleaning
  • data handling
  • data analysis
  • internal validity of the study
  • external validity of the study
  • credibility of study participants
  • credibility of study researchers
  • credibility of the study authors
  • accuracy of the study authors’ expression of their research
  • accuracy of the editing process
  • accuracy of the testifying expert witness’s interpretation
  • credibility of the testifying expert witness
  • other available studies, and their respective data and analysis factors
  • all the other Bradford Hill factors

If these largely independent factors each had a probability or accuracy of 95%, the conjunction of their probabilities would fall below the needed feather weight on top of 50%: with the 17 factors listed above, 0.95 raised to the 17th power is roughly 0.42.  In sum, Judge Rakoff’s worry rested upon a confusion of significance probability with the posterior probability of the null hypothesis; requiring statistical significance does not subvert the usual standards of proof in civil cases.  See also Sander Greenland, “Null Misinterpretation in Statistical Testing and Its Impact on Health Risk Assessment,” 53 Preventive Medicine 225 (2011).
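
The arithmetic of conjunction is easily checked.  A minimal sketch, with the 95% per-factor figure assumed purely for illustration:

```python
# Minimal sketch: a conjunction of many nearly-certain, roughly independent
# factors quickly falls below the civil preponderance threshold (> 0.5).
# The 95%-per-factor figure is an assumption made purely for illustration.
factors = 17                  # the factors bulleted above
per_factor = 0.95
joint = per_factor ** factors
print(f"{per_factor} ** {factors} = {joint:.3f}")   # ~0.418, below 0.5
```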

Whence Comes This Error?

As a matter of intellectual history, I wonder where this error entered into the judicial system.  As a general matter, there was not much judicial discussion of statistical evidence before the 1970s.  The earliest manifestation of the transposition fallacy in connection with scientific and statistical evidence appears in an opinion of the United States Court of Appeals for the District of Columbia Circuit.  Ethyl Corp. v. EPA, 541 F.2d 1, 28 n.58 (D.C. Cir.), cert. denied, 426 U.S. 941 (1976).  The Circuit’s language is worth examining carefully:

“Petitioners demand sole reliance on scientific facts, on evidence that reputable scientific techniques certify as certain.

Typically, a scientist will not so certify evidence unless the probability of error, by standard statistical measurement, is less than 5%. That is, scientific fact is at least 95% certain.  Such certainty has never characterized the judicial or the administrative process. It may be that the ‘beyond a reasonable doubt’ standard of criminal law demands 95% certainty.  Cf. McGill v. United States, 121 U.S.App.D.C. 179, 185 n.6, 348 F.2d 791, 797 n.6 (1965). But the standard of ordinary civil litigation, a preponderance of the evidence, demands only 51% certainty. A jury may weigh conflicting evidence and certify as adjudicative (although not scientific) fact that which it believes is more likely than not. ***”

Id.  The 95% certainty appears to derive from the use of 95% confidence intervals, but “confidence” is a technical term in statistics, and it most certainly does not mean the probability that the alternative hypothesis under consideration is true.  Similarly, the “probability of error” of less than 5% is not the probability that a belief in the null hypothesis of no difference between observations and expectations is mistaken; it is the probability of observing data at least as extreme as the data actually observed, on the assumption that the observed equals the expected.  The District of Columbia Circuit thus created a strawman:  scientific certainty is 95%, whereas civil and administrative law certainty is 51%.  This is rubbish; it confuses the frequentist probability generated by hypothesis testing with a subjective probability of belief in a fact.
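
“Confidence” describes the long-run coverage of the interval-generating procedure, not the probability that any particular interval, or any hypothesis, is correct.  A minimal simulation sketch of my own, with the true mean, standard deviation, and sample size all assumed for illustration:

```python
# Minimal sketch: "95% confidence" is a property of the procedure -- about
# 95% of intervals built this way cover the true mean -- and not a 95%
# probability that any hypothesis is true.  All parameters are assumed.
import random
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)        # ~1.96 for a two-sided 95% interval
true_mean, sd, n, trials = 10.0, 2.0, 50, 10_000
covered = 0
for _ in range(trials):
    sample = [random.gauss(true_mean, sd) for _ in range(n)]
    mean = sum(sample) / n
    half_width = z * sd / n ** 0.5     # known-sigma interval, for simplicity
    if mean - half_width <= true_mean <= mean + half_width:
        covered += 1

print(f"Coverage over {trials:,} intervals: {covered / trials:.3f}")  # ~0.95
```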

The transposition fallacy has a good pedigree, but pedigree does not make it correct.  Only a lawyer would suggest that a mistake once made is somehow binding upon future litigants.  The following collection of citations and references illustrates how widespread this fundamental misunderstanding of statistical inference is, in the courts, in the academy, and at the bar.  If courts cannot deliver fair, accurate adjudication of scientific facts, then it is time to reform the system.


Courts

U.S. Supreme Court

Vasquez v. Hillery, 474 U.S. 254, 259 n.3 (1986) (“the District Court . . . accepted . . . a probability of 2 in 1,000 that the phenomenon was attributable to chance”)

U.S. Court of Appeals

First Circuit

Fudge v. Providence Fire Dep’t, 766 F.2d 650, 658 (1st Cir. 1985) (“Widely accepted statistical techniques have been developed to determine the likelihood an observed disparity resulted from mere chance.”)

Second Circuit

Nat’l Abortion Fed. v. Ashcroft, 330 F. Supp. 2d 436 (S.D.N.Y. 2004), aff’d in part, 437 F.3d 278 (2d Cir. 2006), vacated, 224 Fed. App’x 88 (2d Cir. 2007) (reporting an expert witness’s interpretation of a p-value of 0.30 to mean that there was a 30% probability that the study results were due to chance alone)

Smith v. Xerox Corp., 196 F.3d 358, 366 (2d Cir. 1999) (“If an obtained result varies from the expected result by two standard deviations, there is only about a .05 probability that the variance is due to chance.”)

Waisome v. Port Auth., 948 F.2d 1370, 1376 (2d Cir. 1991) (“about one chance in 20 that the explanation for a deviation could be random”)

Ottaviani v. State Univ. of New York at New Paltz, 875 F.2d 365, 372 n.7 (2d Cir. 1989)

Murphy v. General Elec. Co., 245 F. Supp. 2d 459, 467 (N.D.N.Y. 2003) (“less than a 5% probability that age was related to termination by chance”)

Third Circuit

United States v. State of Delaware, 2004 WL 609331, *10 n.27 (D. Del. 2004) (“there is a 5% (or 1 in 20) chance that the relationship observed is purely random”)

Magistrini v. One Hour Martinizing Dry Cleaning, 180 F. Supp. 2d 584, 605 n.26 (D.N.J. 2002) (“only 5% probability that an observed association is due to chance”)

Fifth Circuit

EEOC v. Olson’s Dairy Queens, Inc., 989 F.2d 165, 167 (5th Cir. 1993) (“Dr. Straszheim concluded that the likelihood that [the] observed hiring patterns resulted from truly race-neutral hiring practices was less than one chance in ten thousand.”)

Capaci v. Katz & Besthoff, Inc., 711 F.2d 647, 652 (5th Cir. 1983) (“the highest probability of unbiased hiring was 5.367 × 10⁻²⁰”), cert. denied, 466 U.S. 927 (1984)

Rivera v. City of Wichita Falls, 665 F.2d 531, 545 n.22 (5th Cir. 1982) (“A variation of two standard deviations would indicate that the probability of the observed outcome occurring purely by chance would be approximately five out of 100; that is, it could be said with a 95% certainty that the outcome was not merely a fluke. Sullivan, Zimmer & Richards, supra n.9 at 74.”)

Vuyanich v. Republic Nat’l Bank, 505 F. Supp. 224, 272 (N.D. Tex. 1980) (“the chances are less than one in 20 that the true coefficient is actually zero”), judgment vacated, 723 F.2d 1195 (5th Cir. 1984).

Seventh Circuit

Adams v. Ameritech Services, Inc., 231 F.3d 414, 424, 427 (7th Cir. 2000) (“it is extremely unlikely (that is, there is less than a 5% probability) that the disparity is due to chance.”)

Sheehan v. Daily Racing Form, Inc., 104 F.3d 940, 941 (7th Cir. 1997) (“An affidavit by a statistician . . . states that the probability that the retentions . . . are uncorrelated with age is less than 5 percent.”)

Eighth Circuit

Craik v. Minnesota State Univ. Bd., 731 F.2d 465, 476 n.13 (8th Cir. 1984) (“Statistical significance is a measure of the probability that an observed disparity is not due to chance. Baldus & Cole, Statistical Proof of Discrimination § 9.02, at 290 (1980). A finding that a disparity is statistically significant at the 0.05 or 0.01 level means that there is a 5 per cent. or 1 per cent. probability, respectively, that the disparity is due to chance.”)

Ninth Circuit

Good v. Fluor Daniel Corp., 222 F. Supp. 2d 1236, 1241 n.9 (E.D. Wash. 2002) (describing “statistical tools to calculate the probability that the difference seen is caused by random variation”)

D.C. Circuit

National Lime Ass’n v. EPA, 627 F.2d 416, 453 (D.C. Cir. 1980)

Federal Circuit

Hodges v. Secretary Dep’t Health & Human Services, 9 F.3d 958, 967 (Fed. Cir. 1993) (Newman, J., dissenting) (“Scientists as well as judges must understand: ‘the reality that the law requires a burden of proof, or confidence level, other than the 95 percent confidence level that is often used by scientists to reject the possibility that chance alone accounted for observed differences’.”) (citing and quoting the Report of the Carnegie Commission on Science, Technology, and Government, Science and Technology in Judicial Decision Making 28 (1993)).


Regulatory Guidance

OSHA’s Guidance for Compliance with the Hazard Communication Standard:

“Statistical significance is a mathematical determination of the confidence in the outcome of a test. The usual criterion for establishing statistical significance is the p-value (probability value). A statistically significant difference in results is generally indicated by p < 0.05, meaning there is less than a 5% probability that the toxic effects observed were due to chance and were not caused by the chemical. Another way of looking at it is that there is a 95% probability that the effect is real, i.e., the effect seen was the result of the chemical exposure.”

U.S. Dep’t of Labor, Guidance for Hazard Determination for Compliance with the OSHA Hazard Communication Standard (29 CFR § 1910.1200) Section V (July 6, 2007).


Academic Commentators

Lucinda M. Finley, “Guarding the Gate to the Courthouse:  How Trial Judges Are Using Their Evidentiary Screening Role to Remake Tort Causation Rules,” 49 DePaul L. Rev. 335, 348 n.49 (1999):

“Courts also require that the risk ratio in a study be ‘statistically significant,’ which is a statistical measurement of the likelihood that any detected association has occurred by chance, or is due to the exposure. Tests of statistical significance are intended to guard against what are called ‘Type I’ errors, or falsely ascribing a relationship when there in fact is not one (a false positive).  See SANDERS, supra note 5, at 51. The discipline of epidemiology is inherently conservative in making causal ascriptions, and regards Type I errors as more serious than Type II errors, or falsely assuming no association when in fact there is one (false negative). Thus, epidemiology conventionally requires a 95% level of statistical significance, i.e. that in statistical terms it is 95% likely that the association is due to exposure, rather than to chance. See id. at 50-52; Thompson, supra note 3, at 256-58. Despite courts’ use of statistical significance as an evidentiary screening device, this measurement has nothing to do with causation. It is most reflective of a study’s sample size, the relative rarity of the disease being studied, and the variance in study populations. Thompson, supra note 3, at 256.”

Erica Beecher-Monas, Evaluating Scientific Evidence: An Interdisciplinary Framework for Intellectual Due Process 42 n. 30 (2007):

“‘By rejecting a hypothesis only when the test is statistically significant, we have placed an upper bound, .05, on the chance of rejecting a true hypothesis’. Fienberg et al., p. 22. Another way of explaining this is that it describes the probability that the procedure produced the observed effect by chance.”

Professor Fienberg stated the matter correctly, but Beecher-Monas goes on to restate the matter in her own words, erroneously.  Later, she repeats her incorrect interpretation:

“Statistical significance is a statement about the frequency with which a particular finding is likely to arise by chance.19”

Id. at 61 (citing a paper by Sander Greenland, who correctly stated the definition).

Mark G. Haug, “Minimizing Uncertainty in Scientific Evidence,” in Cynthia H. Cwik & Helen E. Witt, eds., Scientific Evidence Review:  Current Issues at the Crossroads of Science, Technology, and the Law – Monograph No. 7, at 87 (2006)

Carl F. Cranor, Regulating Toxic Substances: A Philosophy of Science and the Law at 33-34 (Oxford 1993) (one can think of α, β (the chances of Type I and Type II errors, respectively) and 1 − β as measures of the “risk of error” or “standards of proof”). See also id. at 44, 47, 55, 72-76.

Arnold Barnett, “An Underestimated Threat to Multiple Regression Analyses Used in Job Discrimination Cases,” 5 Indus. Rel. L.J. 156, 168 (1982) (“The most common rule is that evidence is compelling if and only if the probability the pattern obtained would have arisen by chance alone does not exceed five percent.”)

David W. Barnes, Statistics as Proof: Fundamentals of Quantitative Evidence 162 (1983) (“Briefly, however, the findings of statistical significance at the P < .05, P < .04, and P < .02 levels indicate that the court can be 95%, 96%, and 98% certain, respectively, that the null hypotheses involved in the specific tests carried out … should be rejected.”)

Wayne Roth-Nelson & Kathey Verdeal, “Risk Evidence in Toxic Torts,” 2 Envt’l Lawyer 405, 415-16 (1996) (confusing the burden of proof with the standard for hypothesis testing, and apparently endorsing the erroneous views given by Judge Newman, dissenting in Hodges). Caveat: Roth-Nelson is now a “forensic” toxicologist, who testifies in civil and criminal trials.

Steven R. Weller, “Book Review: Regulating Toxic Substances: A Philosophy of Science and Law,” 6 Harv. J. L. & Tech. 435, 436, 437-38 (1993) (“only when the statistical evidence gathered from studies shows that it is more than ninety-five percent likely that a test substance causes cancer will the substance be characterized scientifically as carcinogenic … to determine legal causality, the plaintiff need only establish that the probability with which it is true that the substance in question causes cancer is at least fifty percent, rather than the ninety-five percent to prove scientific causality”).

The Carnegie Commission on Science, Technology, and Government, Report on Science and Technology in Judicial Decision Making 28 (1993) (“The reality is that courts often decide cases not on the scientific merits, but on concepts such as burden of proof that operate differently in the legal and scientific realms. Scientists may misperceive these decisions as based on a misunderstanding of the science, when in actuality the decision may simply result from applying a different norm, one that, for the judiciary, is appropriate.  Much, for instance, has been written about ‘junk science’ in the courtroom. But judicial decisions that appear to be based on ‘bad’ science may actually reflect the reality that the law requires a burden of proof, or confidence level, other than the 95 percent confidence level that is often used by scientists to reject the possibility that chance alone accounted for observed differences.”).


Plaintiffs’ Counsel

Steven Rotman, “Don’t Know Much About Epidemiology?” Trial (Sept. 2007) (Author’s question answered in the affirmative:  “P values.  These measure the probability that a reported association between a drug and condition was due to chance.  A P-value of 0.05, which is generally considered the standard for statistical significance, means there is a 5 percent probability that the association was due to chance.”)

Defense Counsel

Bruce R. Parker & Anthony F. Vittoria, “Debunking Junk Science: Techniques for Effective Use of Biostatistics,” 65 Defense Csl. J. 35, 44 (2002) (“a P value of .01 means the researcher can be 99 percent sure that the result was not due to chance”).

The opinions, statements, and asseverations expressed on Tortini are my own, or those of invited guests, and these writings do not necessarily represent the views of clients, friends, or family, even when supported by good and sufficient reason.