Over 25 years ago, the United States Supreme Court answered a narrow legal question about whether the so-called Frye rule was incorporated into Rule 702 of the Federal Rules of Evidence. Plaintiffs in Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993), appealed a Ninth Circuit ruling that the Frye rule survived, and was incorporated into, the enactment of a statutory evidentiary rule, Rule 702. As most legal observers can now discern, plaintiffs won the battle and lost the war. The Court held that the plain language of Rule 702 does not memorialize Frye; rather the rule requires an epistemic warrant for the opinion testimony of expert witnesses.
Many of the sub-issues of the Daubert case are now so much water over the dam. The case involved claims of birth defects from maternal use of an anti-nausea medication, Bendectin. Litigation over Bendectin is long over, and the medication is now approved for use in pregnant women, on the basis of a full new drug application, supported by clinical trial evidence.
In revisiting Daubert, therefore, we might imagine that legal scholars and scientists would be interested in the anatomy of the errors that led Bendectin plaintiffs stridently to maintain their causal claims. The oral argument before the Supreme Court is telling with respect to some of the sources of error. Two law professors, Michael H. Gottesman, for plaintiffs, and Charles Fried, for the defense, squared off one Tuesday morning in March 1993. A review of Gottesman’s argument reveals several fallacious lines of argument, which are still relevant today:
A. Regulation is Based Upon Scientific Determinations of Causation
In his oral argument, Gottesman asserted that regulators (as opposed to the scientific community) are in charge of determining causation,1 and environmental regulations are based upon scientific causation determinations.2 By the time that the Supreme Court heard argument in the Daubert case, this conflation of scientific and regulatory standards for causal conclusions was fairly well debunked.3 Gottesman’s attempt to mislead the Court failed, but the effort continues in courtrooms around the United States.
B. Similar Chemical Structures Have the Same Toxicities
Gottesman asserted that human teratogenicity can be determined from similarity in chemical structures with other established teratogens.4 Close may count in horseshoes, but in chemical structural activities, small differences in chemical structures can result in huge differences in toxicologic or pharmacologic properties. A silly little methyl group on a complicated hydrocarbon ring structure can make a world of difference, as in the difference between estrogen and testosterone.
C. All Animals React the Same to Any Given Substance
Gottesman, in his oral argument, maintained that human teratogenicity can be determined from teratogenicity in non-human, non-primate, murine species.5 The Court wasted little time on this claim, the credibility of which has continued to decline in the last 25 years.
D. The Transposition Fallacy
Perhaps of greatest interest to me was Gottesman’s claim that the probability of the claimed causal association can be determined from the p-value or from the coefficient of confidence taken from the observational epidemiologic studies of birth defects among children of women who ingested Bendectin in pregancy; a.k.a. the transposition fallacy.6
All these errors are still in play in American courtrooms, despite efforts of scientists and scientific organizations to disabuse judges and lawyers. The transposition fallacy, which has been addressed in these pages and elsewhere at great length seems especially resilient to educational efforts. Still, the fallacy was as well recognized at the time of the Daubert argument as it is today, and it is noteworthy that the law professor who argued the plaintiffs’ case, in the highest court of the land, advanced this fallacious argument, and that the scientific and statistical community did little to nothing to correct the error.7
Although Professor Gottesman’s meaning in the oral argument is not entirely clear, on multiple occasions, he appeared to have conflated the coefficient of confidence, from confidence intervals, with the posterior probability that attaches to the alternative hypothesis of some association:
“What the lower courts have said was yes, but prove to us to a degree of statistical certainty which would give us 95 percent confidence that the human epidemiological data is reflective, that these higher numbers for the mothers who used Bendectin were not the product of random chance but in fact are demonstrating the linkage between this drug and the symptoms observed.”8
* * * * *
“… what was demonstrated by Shanna Swan was that if you used a degree of confidence lower than 95 percent but still sufficient to prove the point as likelier than not, the epidemiological evidence is positive… .”9
* * * * *
“The question is, how confident can we be that that is in fact probative of causation, not at a 95 percent level, but what Drs. Swan and Glassman said was applying the Rothman technique, a published technique and doing the arithmetic, that you find that this does link causation likelier than not.”10
Professor Fried’s oral argument for the defense largely refused or failed to engage with plaintiffs’ argument on statistical inference. With respect to the “Rothman” approach, Fried pointed out that plaintiffs’ statistical expert witness, Shanna swan, never actually employed “the Rothman principle.”11
With respect to plaintiffs’ claim that individual studies had low power to detect risk ratios of two, Professor Fried missed the opportunity to point out that such post-hoc power calculations, whatever validity they might possess, embrace the concept of statistical significance at the customary 5% level. Fried did note that a meta-analysis, based upon all the epidemiologic studies, rendered plaintiffs’ power complaint irrelevant.12
Some readers may believe that judging advocates speaking extemporaneously about statistical concepts might be overly harsh. How well then did the lawyers explain and represent statistical concepts in their written briefs in the Daubert case?
Petitioners’ Briefs
Petitioners’ Opening Brief
The petitioners’ briefs reveal that Gottesman’s statements at oral argument represent a consistent misunderstanding of statistical concepts. The plaintiffs consistently conflated significance probability or the coefficient of confidence with the civil burden of proof probability:
“The crux of the disagreement between Merrell’s experts and those whose testimony is put forward by plaintiffs is that the latter are prepared to find causation more probable than not when the epidemiological evidence is strongly positive (albeit not at a 95% confidence level) and when it is buttressed with animal and chemical evidence predictive of causation, while the former are unwilling to find causation in the absence of an epidemiological study that satisfies the 95% confidence level.”13
After giving a reasonable fascimile of a definition of statistical significance, the plaintiffs’ brief proceeds to confuse the complement of alpha, or the coefficient of confidence (typically 95%), with probability that the observed risk ratio in a sample is the actual population parameter of risk:
“But in toxic tort lawsuits, the issue is not whether it is certain that a chemical caused a result, but rather whether it is likelier than not that it did. It is not self-evident that the latter conclusion would require eliminating the null hypothesis (i.e. non-causation) to a confidence level of 95%.30” 14
The plaintiffs’ brief cited heavily to Rothman’s textbook, Modern Epidemiology, with the specious claim that the textbook supported the plaintiffs’ use of the coefficient of confidence to derive a posterior probability (> 50%) of the correctness of an elevated risk ratio for birth defects in children born to mothers who had taken Bendectin in their first trimesters of pregnancy:
“An alternative mechanism has been developed by epidemiologists in recent years to give a somewhat more informative picture of what the statistics mean. At any given confidence level (e.g. 95%) a confidence interval can be constructed. The confidence interval identifies the range of relative risks that collectively comprise the 95% universe. Additional confidence levels are then constructed exhibiting the range at other confidence levels, e.g., at 90%, 80%, etc. From this set of nested confidence intervals the epidemiologist can make assessments of how likely it is that the statistics are showing a true association. Rothman, Tab 9, pp. 122-25. By calculating nested confidence intervals for the data in the Bendectin studies, Dr. Swan was able to determine that it is far more likely than not that a true association exists between Bendectin and human limb reduction birth defects. Swan, Tab 12, at 3618-28.”15
The heavy reliance upon Rothman’s textbook at first blush appears confusing. Modern Epidemiology makes one limited mention of nested confidence intervals, and certainly never suggests that such intervals can provide a posterior probability of the correctness of the hypothesis. Rothman’s complaints about reliance upon “statistical significance,” however, are well-known, and Rothman himself submitted an amicus brief16 in Daubert, a brief that has its own problems.17
In direct response to the Rothman Brief,18 Professor Alvin Feinstein filed an amicus brief in Daubert, wherein he acknowledged that meta-analyses and re-analyses can be valid, but these techniques are subject to many sources of invalidity, and their employment by careful practitioners in some instances should not be a blank check to professional witnesses who are supported by plaintiffs’ counsel. Similarly, Feinstein acknowledged that standards of statistical significance:
“should be appropriately flexible, but they must exist if science is to preserve its tradition of intellectual discipline and high quality research.”19
Petitioners’ Reply Brief
The plaintiffs’ statistical misunderstandings are further exemplified in their Reply Brief, where they reassert the transposition fallacy and alternatively state that associations with p-values greater than 5%, or 95% confidence intervals that include the risk ratio of 1.0, do not show the absence of an association.20 The latter point was, of course irrelevant in the Daubert case, in which plaintiffs had the burden of persuasion. As in their oral argument through Professor Gottesman, the plaintiffs’ appellate briefs misunderstand the crucial point that confidence intervals are conditioned upon the data observed from a particular sample, and do not provide posterior probabilities for the correctness of a claimed hypothesis.
Defense Brief
The defense brief spent little time on the statistical issue or plaintiffs’ misstatements, but dispatched the issue in a trenchant footnote:
“Petitioners stress the controversy some epidemiologists have raised about the standard use by epidemiologists of a 95% confidence level as a condition of statistical significance. Pet. Br. 8-10. See also Rothman Amicus Br. It is hard to see what point petitioners’ discussion establishes that could help their case. Petitioners’ experts have never developed and defended a detailed analysis of the epidemiological data using some alternative well-articulated methodology. Nor, indeed, do they show (or could they) that with some other plausible measure of confidence (say, 90%) the many published studies would collectively support an inference that Bendectin caused petitioners’ limb reduction defects. At the very most, all that petitioners’ theoretical speculations do is question whether these studies – as the medical profession and regulatory authorities in many countries have concluded – affirmatively prove that Bendectin is not a teratogen.”21
The defense never responded to the specious argument, stated or implied within the plaintiffs’ briefs, and in Gottesman’s oral argument, that a coefficient of confidence of 51% would have generated confidence intervals that routinely excluded the null hypothesis of risk ratio of 1.0. The defense did, however, respond to plaintiffs’ power argument by adverting to a meta-analysis that failed to find a statistically significant association.22
The defense also advanced two important arguments to which the plaintiffs’ briefs never meaningfully responded. First, the defense detailed the “cherry picking” or selective reliance engaged in by plaintiffs’ expert witnesses.23 Second, the defense noted that plaintiffs’ had a specific causation problem in that their expert witnesses had been attempting to infer specific causation based upon relative risks well below 2.0.24
To some extent, the plaintiffs’ statistical misstatements were taken up by an amicus brief submitted by the United States government, speaking through the office of the Solicitor General.25 Drawing upon the Supreme Court’s decisions in race discrimination cases,26 the government asserted that epidemiologists “must determine” whether a finding of an elevated risk ratio “could have arisen due to chance alone.”27
Unfortunately, the government’s brief butchered the meaning of confidence intervals. Rather than describe the confidence interval as showing what point estimates of risk ratios are reasonable compatible with the sample result, the government stated that confidence intervals show “how close the real population percentage is likely to be to the figure observed in the sample”:
“since there is a 95 percent chance that the ‘true’ value lies within two standard deviations of the sample figure, that particular ‘confidence interval’ (i.e., two standard deviations) is therefore said to have a ‘confidence level’ of about 95 percent.” 28
The Solicitor General’s office seemed to have had some awareness that it was giving offense with the above definition because it quickly added:
“While it is customary (and, in many cases, easier) to speak of ‘a 95 percent chance’ that the actual population percentage is within two standard deviations of the figure obtained from the sample, ‘the chances are in the sampling procedure, not in the parameter’.”29
Easier perhaps but clearly erroneous to speak that way, and customary only among the unwashed. The government half apologized for misleading the Court when it followed up with a better definition from David Freedman’s textbook, but sadly the government lawyers were not content to let the matter sit there. The Solicitor General office’s brief obscured the textbook definition with a further inaccurate and false précis:
“if the sampling from the general population were repeated numerous times, the ‘real’ population figure would be within the confidence interval 95 percent of the time. The ‘real’ figure would be outside that interval the remaining five percent of the time.”30
The lawyers in the Solicitor General’s office thus made the rookie mistake of forgetting that in the long run, after numerous repeated samples, there would be numerous confidence intervals, not one. The 95% probability of containing the true population value belongs to the set of the numerous confidence intervals, not “the confidence interval” obtained in the first go around.
The Daubert case has been the subject of nearly endless scholarly comment, but few authors have chosen to revisit the parties’ briefs. Two authors have published a paper that reviewed the scientists’ amici briefs in Daubert.31 The Rothman brief was outlined in detail; the Feinstein rebuttal was not substantively discussed. The plaintiffs’ invocation of the transposition fallacy in Daubert has apparently gone unnoticed.
1 Oral Argument in Daubert v. Merrell Dow Pharmaceuticals, Inc., U.S. Supreme Court no. 92-102, 1993 WL 754951, *5 (Tuesday, March 30, 1993) [Oral Arg.]
2 Oral Arg. at *6.
3 In re Agent Orange Product Liab. Litig., 597 F. Supp. 740, 781 (E.D.N.Y.1984) (“The distinction between avoidance of risk through regulation and compensation for injuries after the fact is a fundamental one.”), aff’d in relevant part, 818 F.2d 145 (2d Cir. 1987), cert. denied sub nom. Pinkney v. Dow Chemical Co., 484 U.S. 1004 (1988).
4 Org. Arg. at *19.
5 Oral Arg. at *18-19.
6 Oral Arg. at *19.
7 See, e.g., “Sander Greenland on ‘The Need for Critical Appraisal of Expert Witnesses in Epidemiology and Statistics’” (Feb. 8, 2015) (noting biostatistician Sander Greenland’s publications, which selectively criticize only defense expert witnesses and lawyers for statistical misstatements); see also “Some High-Value Targets for Sander Greenland in 2018” (Dec. 27, 2017).
8 Oral Arg. at *19.
9 Oral Arg. at *20
10 Oral Arg. at *44. At the oral argument, this last statement was perhaps Gottesman’s clearest misstatement of statistical principles, in that he directly suggested that the coefficient of confidence translates into a posterior probability of the claimed association at the observed size.
11 Oral Arg. at *37.
12 Oral Arg. at *32.
13 Petitioner’s Brief in Daubert v. Merrell Dow Pharmaceuticals, Inc., U.S. Supreme Court No. 92-102, 1992 WL 12006442, *8 (U.S. Dec. 2, 1992) [Petitioiner’s Brief].
14 Petitioner’s Brief at *9.
15 Petitioner’s Brief at *n. 36.
16 Brief Amici Curiae of Professors Kenneth Rothman, Noel Weiss, James Robins, Raymond Neutra and Steven Stellman, in Support of Petitioners, 1992 WL 12006438, Daubert v. Merrell Dow Pharmaceuticals, Inc., U.S. S. Ct. No. 92-102 (Dec. 2, 1992).
17 See “The ‘Rothman’ Amicus Brief in Daubert v. Merrill Dow Pharmaceuticals” (Nov. 17, 2018).
18 Brief Amicus Curiae of Professor Alvan R. Feinstein in Support of Respondent, in Daubert v. Merrell Dow Pharmaceuticals, Inc., U.S. Supreme Court no. 92-102, 1993 WL 13006284, at *2 (U.S., Jan. 19, 1993) [Feinstein Brief].
19 Feinstein Brief at *19.
20 Petitioner’s Reply Brief in Daubert v. Merrell Dow Pharmaceuticals, Inc., U.S. Supreme Court No. 92-102, 1993 WL 13006390, at *4 (U.S., Feb. 22, 1993).
21 Respondent’s Brief in Daubert v. Merrell Dow Pharmaceuticals, Inc., U.S. Supreme Court No. 92-102, 1993 WL 13006277, at n. 32 (U.S., Jan. 19, 1993) [Respondent Brief].
22 Respondent Brief at *4.
23 Respondent Brief at *42 n.32 and 47.
24 Respondent Brief at *40-41 (citing DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941, 958 (3d Cir. 1990)).
25 Brief for the United States as Amicus Curiae Supporting Respondent in Daubert v. Merrell Dow Pharmaceuticals, Inc., U.S. Supreme Court No. 92-102, 1993 WL 13006291 (U.S., Jan. 19, 1993) [U.S. Brief].
26 See, e.g., Hazelwood School District v. United States, 433 U.S. 299, 308-312
(1977); Castaneda v. Partida, 430 U.S. 482, 495-499 & nn.16-18 (1977) (“As a general rule for such large samples, if the difference between the expected value and the observed number is greater than two or three standard deviations, then the hypothesis that the jury drawing was random would be suspect to a social scientist.”).
27 U.S. Brief at *3-4. Over two decades later, when politically convenient, the United States government submitted an amicus brief in a case involving alleged securities fraud for failing to disclose adverse events of an over-the-counter medication. In Matrixx Initiatives Inc. v. Siracusano, 131 S. Ct. 1309 (2011), the securities fraud plaintiffs contended that they need not plead “statistically significant” evidence for adverse drug effects. The Solicitor General’s office, along with counsel for the Food and Drug Division of the Department of Health & Human Services, in their zeal to assist plaintiffs disclaimed the necessity, or even the importance, of statistical significance:
“[w]hile statistical significance provides some indication about the validity of a correlation between a product and a harm, a determination that certain data are not statistically significant … does not refute an inference of causation.”
Brief for the United States as Amicus Curiae Supporting Respondents, in Matrixx Initiatives, Inc. v. Siracusano, 2010 WL 4624148, at *14 (Nov. 12, 2010).
28 U.S. Brief at *5.
29 U.S. Brief at *5-6 (citing David Freedman, Freedman, R. Pisani, R. Purves & A. Adhikari, Statistics 351, 397 (2d ed. 1991)).
30 U.S. Brief at *6 (citing Freedman’s text at 351) (emphasis added).
31 See Joan E. Bertin & Mary S. Henifin, Science, Law, and the Search for Truth in the Courtroom: Lessons from Dauburt v. Menell Dow,” 22 J. Law, Medicine & Ethics 6 (1994); Joan E. Bertin & Mary Sue Henifin, “Scientists Talk to Judges: Reflections on Daubert v. Merrell Dow,” 4(3) New Solutions 3 (1994). The authors’ choice of the New Solutions journal is interesting and curious. New Solutions: A journal of Environmental and Occupational Health Policy was published by the Oil, Chemical and Atomic Workers International Union, under the control of Anthony Mazzocchi (June 13, 1926 – Oct. 5, 2002), who was the union’s secretary-treasurer. Anthony Mazzocchi, “Finding Common Ground: Our Commitment to Confront the Issues,” 1 New Solutions 3 (1990); see also Steven Greenhouse, “Anthony Mazzocchi, 76, Dies; Union Officer and Party Father,” N.Y. Times (Oct. 9, 2002). Even a cursory review of this journal’s contents reveals how concerned, even obsessed, the union was interested and invested in the litigation industry and that industry’s expert witnesses.