In writing previously about the Avandia MDL Court’s handling of the defendants’ Daubert motion, I noted the trial court’s erroneous interpretation of statistical evidence. See “Learning to Embrace Flawed Evidence – The Avandia MDL’s Daubert Opinion” (Jan. 10, 2011). In fact, the Avandia court badly misinterpreted the meaning of a p-value, a basic concept in statistics:
“The DREAM and ADOPT studies were designed to study the impact of Avandia on prediabetics and newly diagnosed diabetics. Even in these relatively low-risk groups, there was a trend towards an adverse outcome for Avandia users (e.g., in DREAM, the p-value was .08, which means that there is a 92% likelihood that the difference between the two groups was not the result of mere chance).”
In re Avandia Marketing, Sales Practices and Product Liability Litigation, 2011 WL 13576, at *12 (E.D. Pa. 2011) (internal citation omitted). The Avandia MDL court was not, however, the first to commit this howler. Professor David Kaye collected examples of statistical blunders from published cases in a 1986 law review article, and again, in his chapter on statistical evidence in the Federal Judicial Center’s Reference Manual on Scientific Evidence, he compiled a list of erroneous interpretations:
United States v. Georgia Power Co., 474 F.2d 906, 915 (5th Cir. 1973);
National Lime Ass’n v. EPA, 627 F.2d 416, 453 (D.C. Cir. 1980);
Rivera v. City of Wichita Falls, 665 F.2d 531, 545 n.22 (5th Cir. 1982) (“A variation of two standard deviations would indicate that the probability of the observed outcome occurring purely by chance would be approximately five out of 100; that is, it could be said with a 95% certainty that the outcome was not merely a fluke.”);
Vuyanich v. Republic Nat’l Bank, 505 F. Supp. 224, 272 (N.D. Tex. 1980) (“[I]f a 5% level of significance is used, a sufficiently large t-statistic for the coefficient indicates that the chances are less than one in 20 that the true coefficient is actually zero.”), vacated, 723 F.2d 1195 (5th Cir. 1984);
Craik v. Minnesota State Univ. Bd., 731 F.2d 465, 476 n.13 (8th Cir. 1984) (“[a] finding that a disparity is statistically significant at the 0.05 or 0.01 level means that there is a 5 per cent or 1 per cent probability, respectively, that the disparity is due to chance.”). See also id. at 510 (Swygert, J., dissenting) (stating that coefficients were statistically significant at the 1% level, allowing him to say that “we can be 99% confident that each was different from zero”);
Sheehan v. Daily Racing Form, Inc., 104 F.3d 940, 941 (7th Cir. 1997) (“An affidavit by a statistician . . . states that the probability that the retentions . . . are uncorrelated with age is less than 5 percent.”);
Waisome v. Port Authority, 948 F.2d 1370, 1376 (2d Cir. 1991) (“Social scientists consider a finding of two standard deviations significant, meaning there is about one chance in 20 that the explanation for a deviation could be random . . . .”).
David H. Kaye & David A. Freedman, “Reference Guide on Statistics,” in Reference Manual on Scientific Evidence 83, 122-24 (2d ed. 2000); David H. Kaye, “Is Proof of Statistical Significance Relevant?” 61 Wash. L. Rev. 1333, 1347 (1986) (pointing out that before 1970, there were virtually no references to “statistical significance” or p-values in reported state or federal cases).
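Each of these formulations transposes the conditional. A p-value is computed on the assumption that chance alone is at work: a p-value of .08 means that, if there were truly no difference between the groups, a disparity at least as large as the one observed would turn up roughly 8% of the time. It is not the probability that the result was due to chance, and its complement is not the probability that the association is real. For readers who want to see the point concretely, here is a minimal simulation sketch in Python; the sample sizes and event rates are illustrative assumptions only, not data from DREAM, ADOPT, or any of the cited cases:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-arm trial in which the null hypothesis is TRUE by
# construction: both arms share the same event rate. All numbers here
# are illustrative assumptions, not data from any actual study.
n_per_arm = 2_000
event_rate = 0.05
n_trials = 100_000  # repeat the all-chance experiment many times

drug = rng.binomial(n_per_arm, event_rate, n_trials)
placebo = rng.binomial(n_per_arm, event_rate, n_trials)

# Two-proportion z-statistic for each simulated trial.
p1, p2 = drug / n_per_arm, placebo / n_per_arm
pooled = (drug + placebo) / (2 * n_per_arm)
se = np.sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
z = (p1 - p2) / se

# |z| >= 1.75 corresponds to a two-sided p-value of about .08. Roughly
# 8% of these trials clear that bar even though chance is, by
# construction, the only explanation in every one of them.
print((np.abs(z) >= 1.75).mean())  # ~0.08
```

Because every simulated trial is generated under the null, the 8% of near-significant results cannot be effects that are “92% likely real”; the probability statement attaches to the data given an assumption, not to the hypothesis.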
Notwithstanding the educational efforts of the Federal Judicial Center, the innumeracy continues, and with the ascent of the MDL model for addressing mass torts, many recent howlers have come from trial judges given responsibility for overseeing the pretrial coordination of thousands of lawsuits. In addition to the Avandia MDL court’s misstatement, here are some other recent erroneous statements that can be added to Professor Kaye’s lists:
“Scientific convention defines statistical significance as “P ≤ .05,” i.e., no more than one chance in twenty of finding a false association due to sampling error. Plaintiffs, however, need only prove that causation is more-probable-than-not.”
In re Ephedra Prods. Liab. Litig., 393 F. Supp. 2d 181, 193 (S.D.N.Y. 2005) (confusing the standard for Type I statistical error with the burden of proof).
“More-probable-than-not might be likened to P < .5, so that preponderance of the evidence is nearly ten times less significant (whatever that might mean) than the scientific standard.”
Id. at 193 n.9 (same).
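The Ephedra court’s comparison treats the 5% significance level, a conditional error rate (the probability of seeing extreme data when there is no real effect), as if it were the probability that the plaintiff’s causal claim is true, which is the quantity the preponderance standard addresses. The two are not on the same scale, as a toy application of Bayes’ theorem shows. Every input below is a hypothetical assumption chosen only for illustration:

```python
# Toy Bayes calculation: P(claim true | significant finding) depends on
# the prior probability and the study's power, so a .05 significance
# level cannot be compared directly to the .5 preponderance standard.
# All inputs are hypothetical assumptions for illustration only.
prior = 0.10   # assumed prior probability that the causal claim is true
power = 0.80   # assumed P(significant result | claim true)
alpha = 0.05   # P(significant result | claim false): the ".05" convention

posterior = (power * prior) / (power * prior + alpha * (1 - prior))
print(round(posterior, 2))  # 0.64: significant at .05, yet far from 95%
```

On these assumed inputs, a statistically significant finding leaves the claim only about 64% probable; with a more skeptical prior, it could fall below the preponderance threshold altogether.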
In the Phenylpropanolamine litigation, the error was even more clearly stated, for both p-values and confidence intervals:
“P-values measure the probability that the reported association was due to chance… .”
“… while confidence intervals indicate the range of values within which the true odds ratio is likely to fall.”
In re Phenylpropanolamine Products Liab. Litig., 289 F. Supp. 2d 1230, 1236 n.1 (W.D. Wash. 2003).
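The confidence-interval statement commits a parallel error. A 95% confidence interval is a statement about a procedure: across repeated samples, intervals constructed this way will cover the fixed, true value about 95% of the time. It does not assign a 95% probability that the true odds ratio lies within the one interval actually computed. A minimal coverage simulation, again with purely illustrative numbers:

```python
import numpy as np

rng = np.random.default_rng(1)

# Repeatedly sample from a population with a KNOWN, fixed mean, and
# build a nominal 95% confidence interval for the mean each time.
# All parameters are illustrative assumptions.
true_mean, sd, n, reps = 10.0, 2.0, 50, 100_000
samples = rng.normal(true_mean, sd, (reps, n))
means = samples.mean(axis=1)
sems = samples.std(axis=1, ddof=1) / np.sqrt(n)
lo, hi = means - 1.96 * sems, means + 1.96 * sems

# About 95% of the INTERVALS cover the fixed true value. The 95%
# attaches to the interval-generating procedure over repetitions, not
# to the claim that any single realized interval contains the truth.
print(((lo <= true_mean) & (true_mean <= hi)).mean())  # ~0.95
```

The true value is fixed; it is the intervals that vary from sample to sample, which is why the “likely to fall” language inverts the logic.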
These misstatements raise important questions about judicial competence in gatekeeping; about the selection, education, and training of judges; about the assignment of MDL cases to individual trial judges; and about the aggregation of Rule 702 motions before a single trial judge, for a one-time decision that will control hundreds, if not thousands, of cases.
Recently, a student published a bold note that argued for the dismantling of judicial gatekeeping. Note, “Admitting Doubt: A New Standard for Scientific Evidence,” 123 Harv. L. Rev. 2021 (2010). With all the naiveté of someone who has never tried a case to a jury, the student argued that juries are at least as good as judges, if not better, at handling technical questions. The empirical evidence for this suggestion is slim, and the argument ignores the geographic variability in jury pools. The above instances of erroneous statistical interpretation might seem to support the student’s thesis, but that reading would miss two important points:
- judges’ errors are put on display for all to see, and for commentators to note and correct, whereas jury verdicts, which come unexplained, obscure their mistakes; and
- judges can be singled out for their technical competencies and given appropriate assignments (which hardly ever happens at present), and judges can be required to participate in continuing legal education, which might well include training in technical areas to improve their decision making.
The Federal Judicial Center, and its state court counterparts, have work to do. Lawyers, too, have an obligation to help courts get difficult technical issues right. Finally, courts, lawyers, and commentators need to rethink how the so-called Daubert process works, and does not work, especially in the high-stakes arena of multi-district litigation.