TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Lipitor Diabetes MDL’s Inexact Analysis of Fisher’s Exact Test

March 23rd, 2019

Muriel Bristol was a biologist who studied algae at the Rothamsted Experimental Station in England, after World War I.  In addition to her knowledge of plant biology, Bristol claimed the ability to tell whether tea had been added to milk, or the tea poured first and then milk had been added.  Bristol, as a scientist and a proper English woman, preferred the latter.

Ronald Fisher, who also worked at Rothamsted, expressed his skepticism over Dr. Bristol’s claim. Fisher set about to design a randomized experiment that would efficiently and effectively test her claim. Bristol was presented with eight cups of tea, four of which were prepared with milk added to tea, and four prepared with tea added to milk. Bristol, of course, was blinded to which was which, but was required to label each according to its manner of preparation. Fisher saw his randomized experiment as a 2 x 2 contingency table, from which he could calculate the probability of the observed outcome (and of any more extreme outcomes) using the assumption of fixed marginal rates and the hypergeometric probability distribution. Fisher’s Exact Test was born at tea time.[1]

Fisher described the origins of his Exact Test in one of his early texts, but he neglected to report whether his experiment vindicated Bristol’s claim. According to David Salsburg, H. Fairfield Smith, one of Fisher’s colleagues, acknowledged that Bristol nailed Fisher’s Exact test, with all eight cups correctly identified. The test has gone on to become an important tool in the statistician’s armamentarium.
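The arithmetic of the tea-cup experiment is easily reproduced. Under the null hypothesis that Bristol was merely guessing, with four cups of each preparation fixed in advance, the chance of labeling all eight cups correctly is 1/C(8,4) = 1/70, or about 0.014. A minimal sketch in Python (using scipy, and assuming the all-correct outcome that Salsburg reports):

```python
from math import comb

from scipy.stats import fisher_exact

# Bristol's reported result: all four "milk-first" cups correctly identified.
#             guessed milk-first   guessed tea-first
# milk-first           4                   0
# tea-first            0                   4
table = [[4, 0], [0, 4]]

# One-sided Fisher's Exact Test: probability of doing at least this well by chance.
_, p_one_sided = fisher_exact(table, alternative="greater")

print(p_one_sided)     # 0.014285... = 1/70
print(1 / comb(8, 4))  # the same value, from counting the hypergeometric outcomes
```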

Fisher’s Exact, like any statistical test, has model assumptions and preconditions. For one thing, the test is designed for categorical data, with binary outcomes. The test allows us to evaluate whether an observed difference between two proportions can plausibly be explained by chance alone, by calculating the probability of the observed outcome, as well as of more extreme outcomes.

The calculation of an exact attained significance probability, using Fisher’s approach, provides a one-sided p-value, with no unique solution for calculating a two-sided attained significance probability. In discrimination cases, the one-sided p-value may well be more appropriate for the issue at hand, and so the test’s one-sidedness poses no particular problem there.[2] Fisher’s Exact Test has thus played an important role in showing the judiciary that small sample size need not be an insuperable barrier to meaningful statistical analysis.

The difficulty of using Fisher’s Exact for small sample sizes is that the hypergeometric distribution, upon which the test is based, is highly asymmetric. The observed one-sided p-value does not measure the probability of a result equally extreme in the opposite direction. There are at least three ways to calculate a two-sided p-value, each illustrated in the sketch after this list:

  • Double the one-sided p-value.
  • Add the point probabilities from the opposite tail that are more extreme than the observed point probability.
  • Use the mid-p value; that is, add all point probabilities more extreme (smaller) than the observed point probability, from both sides of the distribution, PLUS ½ of the observed point probability.
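For readers who want to see the three approaches side by side, here is a minimal sketch in Python. The 2 × 2 table is hypothetical (it is not Jewell’s Lipitor data), the observed count is assumed to sit in the upper tail, and ties are handled crudely:

```python
from scipy.stats import hypergeom

def two_sided_p_values(table):
    """Three two-sided p-values for a 2x2 table, per the list above."""
    (a, b), (c, d) = table
    n1, n2 = a + b, c + d            # row totals (fixed margins)
    k = a + c                        # first-column total
    total = n1 + n2

    # Under fixed margins, the top-left cell follows a hypergeometric law.
    support = range(max(0, k - n2), min(k, n1) + 1)
    pmf = {x: hypergeom.pmf(x, total, n1, k) for x in support}
    p_obs = pmf[a]

    # One-sided p-value, assuming the observed count sits in the upper tail.
    p_one = sum(p for x, p in pmf.items() if x >= a)

    p_doubled = min(1.0, 2 * p_one)  # method 1: double the one-sided value
    # Method 2: sum every outcome no more probable than the one observed
    # (the observed tail plus the more extreme opposite-tail outcomes).
    p_opposite = sum(p for p in pmf.values() if p <= p_obs + 1e-12)
    # Method 3: mid-p -- strictly more extreme outcomes from both tails,
    # plus half the observed point probability.
    p_mid = sum(p for p in pmf.values() if p < p_obs - 1e-12) + 0.5 * p_obs
    return p_doubled, p_opposite, p_mid

print(two_sided_p_values([[6, 4], [2, 8]]))
```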

Some software programs will proceed in one of these ways by default, but their doing so does not guarantee the most accurate measure of the two-tailed significance probability.

In the Lipitor MDL for diabetes litigation, Judge Gergel generally used sharp analyses to cut through the rancid fat of litigation claims, to get to the heart of the matter. By and large, he appears to have done a splendid job. In the course of gatekeeping under Federal Rule of Evidence 702, however, Judge Gergel may have misunderstood the nature of Fisher’s Exact Test.

Nicholas Jewell is a well-credentialed statistician at the University of California. In the courtroom, Jewell is a well-known expert witness for the litigation industry. He is no novice at generating unreliable opinion testimony. See In re Zoloft Prods. Liab. Litig., No. 12–md–2342, 2015 WL 7776911 (E.D. Pa. Dec. 2, 2015) (excluding Jewell’s opinions as scientifically unwarranted and methodologically flawed); In re Zoloft Prods. Liab. Litig., MDL No. 2342, 12-MD-2342, 2016 WL 1320799 (E.D. Pa. April 5, 2016) (granting summary judgment after excluding Dr. Jewell). See “The Education of Judge Rufe – The Zoloft MDL” (April 9, 2016).

In the Lipitor cases, some of Jewell’s opinions seemed outlandish indeed, and Judge Gergel generally excluded them. See In re Lipitor Marketing, Sales Practices and Prods. Liab. Litig., 145 F.Supp. 3d 573 (D.S.C. 2015), reconsideration den’d, 2016 WL 827067 (D.S.C. Feb. 29, 2016). As Judge Gergel explained, Jewell calculated a relative risk for abnormal blood glucose in a Lipitor group to be 3.0 (95% C.I., 0.9 to 9.6), using STATA software. Also using STATA, Jewell obtained an attained significance probability of 0.0654, based upon Fisher’s Exact Test. Lipitor Jewell at *7.

Judge Gergel did not report whether Jewell’s reported p-value of 0.0654 was one- or two-sided, but he did state that the attained probability “indicates a lack of statistical significance.” Id. & n.15. The rest of His Honor’s discussion of the challenged opinion, however, makes clear that the 0.0654 must have been a two-sided value. If it had been a one-sided p-value, then there would have been no way of invoking the mid-p to generate a two-sided p-value below 5%. The mid-p will always be larger than the one-tailed exact p-value generated by Fisher’s Exact Test.

The court noted that Dr. Jewell had testified that he believed that STATA generated this confidence interval by “flip[ping]” the Taylor series approximation. The STATA website notes that it calculates confidence intervals for odds ratios (which are different from the relative risk that Jewell testified he computed) by inverting the Fisher exact test.[3] Id. at *7 & n.17. Jewell’s description, of course, suggests that the confidence interval he obtained was not based upon exact methods.

STATA does not provide a mid-p calculation, and so Jewell used an on-line calculator to obtain a mid-p value of 0.04, which he declared statistically significant. The court took Jewell to task for using the mid-p value as though it were a different analysis or test. Id. at *8. Because the mid-p value will always be larger than the one-sided exact p-value from Fisher’s Exact Test, the court’s explanation does not really make sense:

“Instead, Dr. Jewell turned to the mid-p test, which would ‘[a]lmost surely’ produce a lower p-value than the Fisher exact test.”

Id. at *8. The mid-p, however, is not a test different from Fisher’s Exact; rather, it is simply a way of dealing with the asymmetrical distribution that underlies Fisher’s Exact, to arrive at a two-tailed p-value that more accurately captures the rate of Type I error.

The MDL court acknowledged that the mid-p approach was not inherently unreliable, but questioned Jewell’s inconsistent, selective use of the approach for only one test.[4] Jewell certainly did not help the plaintiffs’ cause, or his own standing, by discarding the analyses that were not incorporated into his report, thus leaving the MDL court to guess at how much selection went on in his process of generating opinions. Id. at *9 & n.19.

None of Jewell’s other calculated p-values involved the mid-p approach, but the court’s criticism raises the question whether the other p-values came from a Fisher’s Exact Test with small sample size, or from some other highly asymmetrical distribution. Id. at *8. Although Jewell had shown himself willing to engage in other dubious, result-oriented analyses, Jewell’s use of the mid-p for this one comparison may have been within acceptable bounds after all.

The court also noted that Jewell had obtained the “exact p-value and that this p-value was not significant.” Id. The court’s notation here, however, does not report the important detail whether that exact, unreported p-value was merely the doubling of the one-sided p-value given by Fisher’s Exact Test. As the STATA website, cited by the MDL court, explains:

“The test naturally gives a one-sided p-value, and there are at least four different ways to convert it to a two-sided p-value (Agresti 2002, 93). One way, not implemented in Stata, is to double the one-sided p-value; doubling is simple but can result in p-values larger than one.”

Wesley Eddings, “Fisher’s exact test two-sided idiosyncrasy” (Jan. 2009) (citing Alan Agresti, Categorical Data Analysis 93 (2d ed. 2002)).

On plaintiffs’ motion for reconsideration, the MDL court reaffirmed its findings with respect to Jewell’s use of the mid-p.  Lipitor Jewell Reconsidered at *3. In doing so, the court insisted that the one instance in which Jewell used the mid-p stood in stark contrast to all the other instances in which he had used Fisher’s Exact Test.  The court then cited to the record to identify 21 other instances in which Jewell used a p-value rather than a mid-p value.  The court, however, did not provide the crucial detail whether these 21 other instances actually involved small-sample applications of Fisher’s Exact Test.  As result-oriented as Jewell can be, it seems safe to assume that not all his statistical analyses involved Fisher’s Exact Test, with its attendant ambiguity for how to calculate a two-tailed p-value.


[1] Sir Ronald A. Fisher, The Design of Experiments at chapter 2 (1935); see also Stephen Senn, “Tea for three: Of infusions and inferences and milk in first,” Significance 30 (Dec. 2012); David Salsburg, The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century  (2002).

[2] See, e.g., Dendy v. Washington Hosp. Ctr., 431 F. Supp. 873 (D.D.C. 1977) (denying preliminary injunction), rev’d, 581 F.2d 99 (D.C. Cir. 1978) (reversing denial of relief, and remanding for reconsideration). See also National Academies of Science, Reference Manual on Scientific Evidence 255 n.108 (3d ed. 2011) (“Well-known small sample techniques [for testing significance and calculating p-values] include the sign test and Fisher’s exact test.”).

[3] See Wesley Eddings, “Fisher’s exact test two-sided idiosyncrasy” (Jan. 2009), available at <http://www.stata.com/support/faqs/statistics/fishers-exact-test/>, last visited April 19, 2016 (“Stata’s exact confidence interval for the odds ratio inverts Fisher’s exact test.”). This article by Eddings contains a nice discussion of why the Fisher’s Exact Test attained significance probability disagrees with the calculated confidence interval. Eddings points out the asymmetry of the hypergeometric distribution, which complicates arriving at an exact p-value for a two-sided test.

[4] See Barber v. United Airlines, Inc., 17 Fed. Appx. 433, 437 (7th Cir. 2001) (“Because in formulating his opinion Dr. Hynes cherry-picked the facts he considered to render an expert opinion, the district court correctly barred his testimony because such a selective use of facts fails to satisfy the scientific method and Daubert.”).

ASA Statement Goes to Court – Part 2

March 7th, 2019

It has been almost three years since the American Statistical Association (ASA) issued its statement on statistical significance. Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” 70 The American Statistician 129 (2016) [ASA Statement]. Before the ASA’s Statement, courts and lawyers from all sides routinely misunderstood, misstated, and misrepresented the meaning of statistical significance.1 These errors were pandemic despite the efforts of the Federal Judicial Center and the National Academies of Science to educate judges and lawyers, through their Reference Manuals on Scientific Evidence and seminars. The interesting question is whether the ASA’s Statement has improved, or will improve, the unfortunate situation.2

The ASA Statement on Testosterone

“Ye blind guides, who strain out a gnat and swallow a camel!”
Matthew 23:24

To capture the state of the art, or the state of correct and flawed interpretations of the ASA Statement, reviewing a recent, but now resolved, large so-called mass tort litigation may be illustrative. Pharmaceutical products liability cases almost always turn on evidence from pharmaco-epidemiologic studies that compare the rate of an outcome of interest among patients taking a particular medication with the rate among similar, untreated patients. These studies compare the observed with the expected rates, and invariably assess the difference, whether expressed as a “risk ratio” or a “risk difference,” both for its magnitude and for the “significance probability” of observing a rate at least as large as that seen in the exposed group, given the assumptions that the medication did not change the rate and that the data follow a given probability distribution. In these alleged “health effects” cases, claims and counterclaims of misuse of significance probability have been pervasive. After the ASA Statement was released, some lawyers began to modify their arguments to suggest that their adversaries’ arguments offend the ASA’s pronouncements.
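For concreteness, with a events among n medicated patients and c events among m similar untreated patients (generic notation, mine rather than any brief’s), the two measures are:

$$
\mathrm{RR} = \frac{a/n}{c/m}, \qquad \mathrm{RD} = \frac{a}{n} - \frac{c}{m}.
$$

The significance probability then asks how often chance alone, on the assumption that the medication does not change the rate, would produce a difference at least as large as the one observed.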

One litigation that showcases the use and misuse of the ASA Statement arose from claims that AbbVie, Inc.’s transdermal testosterone medication (TRT) causes heart attacks, strokes, and venous thromboembolism. The FDA had reviewed the plaintiffs’ claims, made in a Public Citizen complaint, and resoundingly rejected the causal interpretation of two dubious observational studies, and an incomplete meta-analysis that used an off-beat composite end point.3 The Public Citizen petition probably did succeed in pushing the FDA to convene an Advisory Committee meeting, which again resulted in a rejection of the causal claims. The FDA did, however, modify the class labeling for TRT with respect to indication and a possible association with cardiovascular outcomes. And then the litigation came.

Notwithstanding the FDA’s determination that a causal association had not been shown, thousands of plaintiffs sued several companies, with most of the complaints falling on AbbVie, Inc., which had the largest presence in the market. The ASA Statement came up occasionally in pre-trial depositions, but became a major brouhaha, when AbbVie moved to exclude plaintiffs’ causation expert witnesses.4

The Defense’s Anticipatory Parry of the ASA Statement

As AbbVie described the situation:

“Plaintiffs’ experts uniformly seek to abrogate the established methods and standards for determining … causal factors in favor of precisely the kind of subjective judgments that Daubert was designed to avoid. Tests for statistical significance are characterized as ‘misleading’ and rejected [by plaintiffs’ expert witnesses] in favor of non-statistical ‘estimates’, ‘clinical judgment’, and ‘gestalt’ views of the evidence.”5

AbbVie’s brief in support of excluding plaintiffs’ expert witnesses barely mentioned the ASA Statement, but in a footnote, the defense anticipated that the Plaintiffs’ opposition would rest on a rejection of the importance of statistical significance testing, and on the claim that this rejection was somehow supported by the ASA Statement:

“The statistical community is currently debating whether scientists who lack expertise in statistics misunderstand p-values and overvalue significance testing. [citing ASA Statement] The fact that there is a debate among professional statisticians on this narrow issue does not validate Dr. Gerstman’s [plaintiffs’ expert witness’s] rejection of the importance of statistical significance testing, or undermine Defendants’ reliance on accepted methods for determining association and causation.”6

In its brief in support of excluding causation opinions, the defense took pains to define statistical significance, and managed to do so, painfully, or at least in ways that the ASA conferees would have found objectionable:

“Any association found must be tested for its statistical significance. Statistical significance testing measures the likelihood that the observed association could be due to chance variation among samples. Scientists evaluate whether an observed effect is due to chance using p-values and confidence intervals. The prevailing scientific convention requires that there be 95% probability that the observed association is not due to chance (expressed as a p-value < 0.05) before reporting a result as “statistically significant.” * * * This process guards against reporting false positive results by setting a ceiling for the probability that the observed positive association could be due to chance alone, assuming that no association was actually present.”7

AbbVie’s brief proceeded to characterize the confidence interval as a tool of significance testing, again in a way that misstates the mathematical meaning and importance of the interval:

“The determination of statistical significance can be described equivalently in terms of the confidence interval calculated in connection with the association. A confidence interval indicates the level of uncertainty that exists around the measured value of the association (i.e., the OR or RR). A confidence interval defines the range of possible values for the actual OR or RR that are compatible with the sample data, at a specified confidence level, typically 95% under the prevailing scientific convention. Reference Manual, at 580 (Ex. 14) (“If a 95% confidence interval is specified, the range encompasses the results we would expect 95% of the time if samples for new studies were repeatedly drawn from the same population.”). * * * If the confidence interval crosses 1.0, this means there may be no difference between the treatment group and the control group, therefore the result is not considered statistically significant.”8

Perhaps AbbVie’s counsel should be permitted a plea in mitigation for having cited to, and quoted from, the Reference Manual on Scientific Evidence’s chapter on epidemiology, which was also wide of the mark in its description of the confidence interval. Counsel would have been better served by the Manual’s more rigorous and accurate chapter on statistics. Even so, the above-quoted statements give an inappropriate interpretation of random error as a probability about the hypothesis being tested.9 Particularly dangerous, in terms of failing to advance AbbVie’s own objectives, was the characterization of the confidence interval as measuring the level of uncertainty, as though there were no sources of uncertainty other than random error in the measurement of the risk ratio.
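The conventional meaning is easy to demonstrate by simulation. In the sketch below (hypothetical rates and sample sizes, not data from any TRT study), the 95% describes the long-run performance of the interval-generating procedure over repeated samples; it is not the probability that any single computed interval contains the true risk ratio:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical truth: 10% risk in the exposed, 5% in the unexposed (true RR = 2.0).
p1, p0, n = 0.10, 0.05, 500
true_log_rr = np.log(p1 / p0)

trials, covered = 10_000, 0
for _ in range(trials):
    a = rng.binomial(n, p1)               # events among the exposed
    c = rng.binomial(n, p0)               # events among the unexposed
    log_rr = np.log((a / n) / (c / n))
    se = np.sqrt(1/a - 1/n + 1/c - 1/n)   # Wald standard error of the log RR
    covered += (log_rr - 1.96 * se <= true_log_rr <= log_rr + 1.96 * se)

# Roughly 95% of the intervals cover the true RR; the probability belongs to
# the repeated-sampling procedure, not to any single computed interval.
print(covered / trials)
```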

The Plaintiffs’ Attack on Significance Testing

The Plaintiffs, of course, filed an opposition brief that characterized the defense position as an attempt to:

“elevate statistical significance, as measured by confidence intervals and so-called p-values, to the status of an absolute requirement to the establishment of causation.”10

Tellingly, the plaintiffs’ brief fails to point to any modern-era example of a scientific determination of causation based upon epidemiologic evidence, in which the pertinent studies were not assessed for, and found to show, statistical significance.

After citing a few judicial opinions that underplayed the importance of statistical significance, the Plaintiffs’ opposition turned to the ASA Statement for what it perceived to be support for its loosey-goosey approach to causal inference.11 The Plaintiffs’ opposition brief quoted a series of propositions from the ASA Statement, without the ASA’s elaborations and elucidations, and without much in the way of explanation or commentary. At the very least, the Plaintiffs’ heavy reliance upon, despite their distortions of, the ASA Statement helped them to define key statistical concepts more carefully than had AbbVie in its opening brief.

The ASA Statement, however, was not immune from being misrepresented in the Plaintiffs’ opposition brief. Many of the quoted propositions were quite beside the points of the dispute over the validity and reliability of Plaintiffs’ expert witnesses’ conclusions of causation about testosterone and heart attacks, conclusions not reached or shared by the FDA, any consensus statement from medical organizations, or any serious published systematic review:

“P-values do not measure the probability that the studied hypothesis is true, … .”12

This proposition from the ASA Statement is true, but trivially true. (Of course, this ASA principle is relevant to the many judicial decisions that have managed to misstate what p-values measure.) The above-quoted proposition follows from the definition and meaning of the p-value; only someone who did not understand significance probability would confuse it with the probability of the truth of the studied hypothesis. P-values’ not measuring the probability of the null hypothesis, or any alternative hypothesis, is not a flaw in p-values, but arguably their strength.

“A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.”13

Again, true, true, and immaterial. The existence of other importance metrics, such as the magnitude of an association or correlation, hardly detracts from the importance of assessing the random error in an observed statistic. The need to assess clinical or practical significance of an association or correlation also does not detract from the importance of the assessed random error in a measured statistic.

“By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.”14

The Plaintiffs’ opposition attempted to spin the above ASA statement as a criticism of p-values, but the attempt involves an ignoratio elenchi. Once again, the p-value assumes a probability model and a null hypothesis, and so it cannot provide a “measure” of the model or hypothesis’s probability.

The Plaintiffs’ final harrumph on the ASA Statement was their claim that the ASA Statement’s conclusion was “especially significant” to the testosterone litigation:

“Good statistical practice, as an essential component of good scientific practice, emphasizes principles of good study design and conduct, a variety of numerical and graphical summaries of data, understanding of the phenomenon under study, interpretation of results in context, complete reporting and proper logical and quantitative understanding of what data summaries mean. No single index should substitute for scientific reasoning.”15

The existence of other important criteria in the evaluation and synthesis of a complex body of studies does not erase or supersede the importance of assessing stochastic error in the epidemiologic studies. Plaintiffs’ Opposition Brief asserted that the Defense’s attempt:

“to substitute the single index, the p-value, for scientific reasoning in the reports of Plaintiffs’ experts should be rejected.”16

Some of the defense’s opening brief could indeed be read as reducing causal inference to the determination of statistical significance. A sympathetic reading of the entire AbbVie brief, however, shows that it had criticized the threats to validity in the observational epidemiologic studies, as well as some of the clinical trials, and other rampant flaws in the Plaintiffs’ expert witnesses’ reasoning. The Plaintiffs’ citations to the ASA Statement’s “negative” propositions about p-values (to emphasize what they are not) appeared to be the stuffing of a strawman, used to divert attention from other failings of their own claims and proffered analyses. In other words, the substance of the Rule 702 application had much more to do with data quality and study validity than statistical significance.

What did the trial court make of this back and forth about statistical significance and the ASA Statement? For the most part, the trial court denied both sides’ challenges to proffered expert witness testimony on causation and statistical issues. In sorting out the controversy over the ASA Statement, the trial court apparently misunderstood key statistical concepts and paid little attention to threats to validity other than random variability in study results.17 The trial court summarized the controversy as follows:

“In arguing that the scientific literature does not support a finding that TRT is associated with the alleged injuries, AbbVie emphasize [sic] the importance of considering the statistical significance of study results. Though experts for both AbbVie and plaintiffs agree that statistical significance is a widely accepted concept in the field of statistics and that there is a conventional method for determining the statistical significance of a study’s findings, the parties and their experts disagree about the conclusions one may permissibly draw from a study result that is deemed to possess or lack statistical significance according to conventional methods of making that determination.”18

Of course, there was never a controversy presented to the court about drawing a conclusion from “a study.” By the time the briefs were filed, both sides had multiple observational studies, clinical trials, and meta-analyses to synthesize into opinions for or against causal claims.

Ironically, AbbVie might claim to have prevailed in having the trial court adopt its misleading definitions of p-values and confidence intervals:

“Statisticians test for statistical significance to determine the likelihood that a study’s findings are due to chance. *** According to conventional statistical practice, such a result *** would be considered statistically significant if there is a 95% probability, also expressed as a “p-value” of <0.05, that the observed association is not the product of chance. If, however, the p-value were greater than 0.05, the observed association would not be regarded as statistically significant, according to prevailing conventions, because there is a greater than 5% probability that the association observed was the result of chance.”19

The MDL court similarly appeared to accept AbbVie’s dubious description of the confidence interval:

“A confidence interval consists of a range of values. For a 95% confidence interval, one would expect future studies sampling the same population to produce values within the range 95% of the time. So if the confidence interval ranged from 1.2 to 3.0, the association would be considered statistically significant, because one would expect, with 95% confidence, that future studies would report a ratio above 1.0 – indeed, above 1.2.”20

The court’s opinion clearly evidences the danger in stating the importance of statistical significance without placing equal emphasis on the need to exclude bias and confounding. Having found an observational study and one meta-analysis of clinical trial safety outcomes that were statistically significant, the trial court held that any dispute over the probativeness of the studies was for the jury to assess.

Some but not all of AbbVie’s brief might have encouraged this lax attitude by failing to emphasize study validity at the same time as emphasizing the importance of statistical significance. In any event, the trial court continued with its précis of the plaintiffs’ argument that:

“a study reporting a confidence interval ranging from 0.9 to 3.5, for example, should certainly not be understood as evidence that there is no association and may actually be understood as evidence in favor of an association, when considered in light of other evidence. Thus, according to plaintiffs’ experts, even studies that do not show a statistically significant association between TRT and the alleged injuries may plausibly bolster their opinions that TRT is capable of causing such injuries.”21

Of course, a single study that reported a risk ratio greater than 1.0, with a confidence interval of 0.9 to 3.5, might be reasonably incorporated into a meta-analysis that in turn could support, or not support, a causal inference, as sketched below. In the TRT litigation, however, the well-conducted, most up-to-date meta-analyses did not report statistically significant elevated rates of cardiovascular events among users of TRT. The court’s insistence that a study with a confidence interval of 0.9 to 3.5 cannot be interpreted as evidence of no association is, of course, correct. Equally correct would be to say that the interval shows that the study failed to show an association. The trial court never grappled with the reality that the best conducted meta-analyses failed to show statistically significant increases in the rates of cardiovascular events.
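A fixed-effect, inverse-variance pooling shows how such a study contributes to a synthesis. The three studies below are hypothetical stand-ins, with the 1.8 (95% CI, 0.9–3.5) playing the part of the court’s example:

```python
import numpy as np

# Hypothetical studies as (RR, 95% CI lower, 95% CI upper); the first entry,
# 1.8 (0.9-3.5), stands in for the non-significant study in the court's example.
studies = [(1.8, 0.9, 3.5), (1.1, 0.7, 1.7), (1.4, 0.9, 2.2)]

log_rrs, weights = [], []
for rr, lo, hi in studies:
    se = (np.log(hi) - np.log(lo)) / (2 * 1.96)  # recover the SE from the CI
    log_rrs.append(np.log(rr))
    weights.append(1 / se**2)                    # inverse-variance weight

pooled = np.average(log_rrs, weights=weights)
pooled_se = 1 / np.sqrt(sum(weights))
lo, hi = np.exp(pooled - 1.96 * pooled_se), np.exp(pooled + 1.96 * pooled_se)
print(f"pooled RR = {np.exp(pooled):.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```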

The American Statistical Association and its members would likely have been deeply disappointed by how both parties used the ASA Statement for their litigation objectives. AbbVie’s suggestion that the ASA Statement reflects a debate about “whether scientists who lack expertise in statistics misunderstand p-values and overvalue significance testing” would appear to have no support in the Statement itself or any other commentary to come out of the meeting leading up to the Statement. The Plaintiffs’ argument that p-values properly understood are unimportant and misleading similarly finds no support in the ASA Statement. Conveniently, the Plaintiffs’ brief ignored the Statement’s insistence upon transparency in pre-specification of analyses and outcomes, and in handling of multiple comparisons:

“P-values and related analyses should not be reported selectively. Conducting multiple analyses of the data and reporting only those with certain p-values (typically those passing a significance threshold) renders the reported p-values essentially uninterpretable. Cherrypicking promising findings, also known by such terms as data dredging, significance chasing, significance questing, selective inference, and ‘p-hacking’, leads to a spurious excess of statistically significant results in the published literature and should be vigorously avoided.”22

Most if not all of the plaintiffs’ expert witnesses’ reliance materials would have been eliminated under this principle set forth by the ASA Statement.


1 See, e.g., In re Ephedra Prods. Liab. Litig., 393 F.Supp. 2d 181, 191 (S.D.N.Y. 2005). See also “Confidence in Intervals and Diffidence in the Courts” (March 4, 2012); “Scientific illiteracy among the judiciary” (Feb. 29, 2012).

3 Letter of Janet Woodcock, Director of FDA’s Center for Drug Evaluation and Research, to Sidney Wolfe, Director of Public Citizen’s Health Research Group (July 16, 2014) (denying citizen petition for “black box” warning).

4 Defendants’ (AbbVie, Inc.’s) Motion to Exclude Plaintiffs Expert Testimony on the Issue of Causation, and for Summary Judgment, and Memorandum of Law in Support, Case No. 1:14-CV-01748, MDL 2545, Document #: 1753, 2017 WL 1104501 (N.D. Ill. Feb. 20, 2017) [AbbVie Brief].

5 AbbVie Brief at 3; see also id. at 7-8 (“Depending upon the expert, even the basic tests of statistical significance are simply ignored, dismissed as misleading… .”). AbbVie’s definitions of statistical significance occasionally wandered off track and into the transposition fallacy, but generally its point was understandable.

6 AbbVie Brief at 63 n.16 (emphasis in original).

7 AbbVie Brief at 13 (emphasis in original).

8 AbbVie Brief at 13-14 (emphasis in original).

9 The defense brief further emphasized statistical significance almost as though it were a sufficient basis for inferring causality from observational studies: “Regardless of this debate, courts have routinely found the traditional epidemiological method—including bedrock principles of significance testing—to be the most reliable and accepted way to establish general causation. See, e.g., In re Zoloft, 26 F. Supp. 3d 449, 455; see also Rosen v. Ciba-Geigy Corp., 78 F.3d 316, 319 (7th Cir. 1996) (‘The law lags science; it does not lead it.’).” AbbVie Brief at 63-64 & n.16. The defense’s language about “including bedrock principles of significance testing” absolves it of having totally ignored other necessary considerations, but still the defense might have advantageously pointed out the other considerations needed for causal inference at the same time.

10 Plaintiffs’ Steering Committee’s Memorandum of Law in Opposition to Motion of AbbVie Defendants to Exclude Plaintiffs’ Expert Testimony on the Issue of Causation, and for Summary Judgment at p.34, Case No. 1:14-CV-01748, MDL 2545, Document No. 1753 (N.D. Ill. Mar. 23, 2017) [Opp. Brief].

11 Id. at 35 (appending the ASA Statement and the commentary of more than two dozen interested commentators).

12 Id. at 38 (quoting from the ASA Statement at 131).

13 Id. at 38 (quoting from the ASA Statement at 132).

14 Id. at 38 (quoting from the ASA Statement at 132).

15 Id. at 38 (quoting from the ASA Statement at 132).

16 Id. at 38.

17 In re Testosterone Replacement Therapy Prods. Liab. Litig., MDL No. 2545, C.M.O. No. 46, 2017 WL 1833173 (N.D. Ill. May 8, 2017) [In re TRT].

18 In re TRT at *4.

19 In re TRT at *4.

20 Id.

21 Id. at *4.

22 ASA Statement at 131-32.

Daubert Retrospective – Statistical Significance

January 5th, 2019

The holiday break was an opportunity and an excuse to revisit the briefs filed in the Supreme Court by parties and amici, in the Daubert case. The 22 amicus briefs in particular provided a wonderful basis upon which to reflect on how far we have come, and also on how far we have to go, to achieve real evidence-based fact finding in technical and scientific litigation. Twenty-five years ago, Rules 702 and 703 vied for control over errant and improvident expert witness testimony. With Daubert decided, Rule 702 emerged as the winner. Sadly, most courts seem to ignore or forget about Rule 703, perhaps because of its awkward wording. Rule 702, however, received the judicial imprimatur to support the policing and gatekeeping of dysepistemic claims in the federal courts.

As noted last week,1 the petitioners (plaintiffs) in Daubert advanced several lines of fallacious and specious argument, some of which were lost in the shuffle and page limitations of the Supreme Court briefings. The plaintiffs’ transposition fallacy received barely a mention, although it did bring forth at least a footnote in an important and overlooked amicus brief filed by the American Medical Association (AMA), the American College of Physicians, and over a dozen other medical specialty organizations,2 all of which emphasized both the importance of statistical significance in interpreting epidemiologic studies, and the fallacy of interpreting 95% confidence intervals as providing a measure of certainty about the estimated association as a parameter. The language of these associations’ amicus brief is noteworthy and still relevant to today’s controversies.

The AMA’s amicus brief, like the brief filed by the National Academies of Science and the American Association for the Advancement of Science, strongly endorsed a gatekeeping role for trial courts to exclude testimony not based upon rigorous scientific analysis:

“The touchstone of Rule 702 is scientific knowledge. Under this Rule, expert scientific testimony must adhere to the recognized standards of good scientific methodology including rigorous analysis, accurate and statistically significant measurement, and reproducibility.”3

Having incorporated the term “scientific knowledge,” Rule 702 could not permit anything less in expert witness testimony, lest it pollute federal courtrooms across the land.

Elsewhere, the AMA elaborated upon its reference to “statistically significant measurement”:

“Medical researchers acquire scientific knowledge through laboratory investigation, studies of animal models, human trials, and epidemiological studies. Such empirical investigations frequently demonstrate some correlation between the intervention studied and the hypothesized result. However, the demonstration of a correlation does not prove the hypothesized result and does not constitute scientific knowledge. In order to determine whether the observed correlation is indicative of a causal relationship, scientists necessarily rely on the concept of “statistical significance.” The requirement of statistical reliability, which tends to prove that the relationship is not merely the product of chance, is a fundamental and indispensable component of valid scientific methodology.”4

And then again, the AMA spelled out its position, in case the Court missed its other references to the importance of statistical significance:

“Medical studies, whether clinical trials or epidemiologic studies, frequently demonstrate some correlation between the action studied … . To determine whether the observed correlation is not due to chance, medical scientists rely on the concept of ‘statistical significance’. A ‘statistically significant’ correlation is generally considered to be one in which statistical analysis suggests that the observed relationship is not the result of chance. A statistically significant correlation does not ‘prove’ causation, but in the absence of such a correlation, scientific causation clearly is not proven.”5

In its footnote 9, in the above quoted section of the brief, the AMA called out the plaintiffs’ transposition fallacy, without specifically citing to plaintiffs’ briefs:

“It is misleading to compare the 95% confidence level used in empirical research to the 51% level inherent in the preponderance of the evidence standard.”6

Actually, the plaintiffs’ ruse was much worse than misleading. The plaintiffs did not compare the two probabilities; they equated them. Some might call this ruse an outright fraud on the court. In any event, the AMA amicus brief remains an available, citable source for opposing this fraud and the casual dismissal of the importance of statistical significance.

One other amicus brief touched on the plaintiffs’ statistical shenanigans. The Product Liability Advisory Council, National Association of Manufacturers, Business Roundtable, and Chemical Manufacturers Association jointly filed an amicus brief to challenge some of the excesses of the plaintiffs’ submissions.7 Plaintiffs’ expert witness, Shanna Swan, had calculated type II error rates and post-hoc power for some selected epidemiologic studies relied upon by the defense. Swan’s complaint had been that some studies had only 20% probability (power) to detect a statistically significant doubling of limb reduction risk, with significance at p < 5%.8

The PLAC Brief pointed out that power calculations must assume an alternative hypothesis, and that the doubling-of-risk hypothesis had no basis in the evidentiary record. Although the PLAC complaint was correct, it missed the plaintiffs’ point that the defense had set exceeding a risk ratio of 2.0 as an important benchmark for specific causation attributability. Swan’s calculation of post-hoc power would have yielded an even lower probability for detecting risk ratios of 1.2 or so. More to the point, PLAC noted that other studies had much greater power, and that collectively, all the available studies would have had much greater power to have at least one study achieve statistical significance without dodgy re-analyses. A simulation sketch of such a post-hoc power calculation follows this paragraph.
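Post-hoc power of the sort Swan computed can be approximated by simulation. The design numbers below are hypothetical rather than drawn from the Bendectin record; the point is only that any such power figure presupposes the very 5% significance criterion the plaintiffs elsewhere disparaged:

```python
import numpy as np
from scipy.stats import fisher_exact

rng = np.random.default_rng(1)

# Hypothetical design: 1,500 exposed and 1,500 unexposed births, a baseline
# limb-reduction risk of 1 per 1,000, and an alternative hypothesis of RR = 2.0.
n, p0, rr, alpha = 1500, 0.001, 2.0, 0.05

sims, rejections = 2000, 0
for _ in range(sims):
    a = rng.binomial(n, p0 * rr)     # exposed events under the alternative
    c = rng.binomial(n, p0)          # unexposed events
    _, p = fisher_exact([[a, n - a], [c, n - c]])
    rejections += (p < alpha)

# The estimated power is the rejection rate; note that the entire calculation
# is built on the 5% significance criterion it presupposes.
print(rejections / sims)
```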


1 “The Advocates’ Errors in Daubert” (Dec. 28, 2018).

2 American Academy of Allergy and Immunology, American Academy of Dermatology, American Academy of Family Physicians, American Academy of Neurology, American Academy of Orthopaedic Surgeons, American Academy of Pain Medicine, American Association of Neurological Surgeons, American College of Obstetricians and Gynecologists, American College of Pain Medicine, American College of Physicians, American College of Radiology, American Society of Anesthesiologists, American Society of Plastic and Reconstructive Surgeons, American Urological Association, and College of American Pathologists.

3 Brief of the American Medical Association, et al., as Amici Curiae, in Support of Respondent, in Daubert v. Merrell Dow Pharmaceuticals, Inc., U.S. Supreme Court no. 92-102, 1993 WL 13006285, at *27 (U.S., Jan. 19, 1993)[AMA Brief].

4 AMA Brief at *4-*5 (emphasis added).

5 AMA Brief at *14-*15 (emphasis added).

6 AMA Brief at *15 & n.9.

7 Brief of the Product Liability Advisory Council, Inc., National Association of Manufacturers, Business Roundtable, and Chemical Manufacturers Association, as Amici Curiae in Support of Respondent, in Daubert v. Merrell Dow Pharmaceuticals, Inc., U.S. Supreme Court no. 92-102, 1993 WL 13006288 (U.S., Jan. 19, 1993) [PLAC Brief].

8 PLAC Brief at *21.

The Advocates’ Errors in Daubert

December 28th, 2018

Over 25 years ago, the United States Supreme Court answered a narrow legal question about whether the so-called Frye rule was incorporated into Rule 702 of the Federal Rules of Evidence. Plaintiffs in Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993), appealed a Ninth Circuit ruling that the Frye rule survived, and was incorporated into, the enactment of a statutory evidentiary rule, Rule 702. As most legal observers can now discern, plaintiffs won the battle and lost the war. The Court held that the plain language of Rule 702 does not memorialize Frye; rather the rule requires an epistemic warrant for the opinion testimony of expert witnesses.

Many of the sub-issues of the Daubert case are now so much water over the dam. The case involved claims of birth defects from maternal use of an anti-nausea medication, Bendectin. Litigation over Bendectin is long over, and the medication is now approved for use in pregnant women, on the basis of a full new drug application, supported by clinical trial evidence.

In revisiting Daubert, therefore, we might imagine that legal scholars and scientists would be interested in the anatomy of the errors that led Bendectin plaintiffs stridently to maintain their causal claims. The oral argument before the Supreme Court is telling with respect to some of the sources of error. Two law professors, Michael H. Gottesman, for plaintiffs, and Charles Fried, for the defense, squared off one Tuesday morning in March 1993. A review of Gottesman’s argument reveals several fallacious lines of argument, which are still relevant today:

A. Regulation is Based Upon Scientific Determinations of Causation

In his oral argument, Gottesman asserted that regulators (as opposed to the scientific community) are in charge of determining causation,1 and environmental regulations are based upon scientific causation determinations.2 By the time that the Supreme Court heard argument in the Daubert case, this conflation of scientific and regulatory standards for causal conclusions was fairly well debunked.3 Gottesman’s attempt to mislead the Court failed, but the effort continues in courtrooms around the United States.

B. Similar Chemical Structures Have the Same Toxicities

Gottesman asserted that human teratogenicity can be determined from similarity in chemical structures with other established teratogens.4 Close may count in horseshoes, but in chemical structural activities, small differences in chemical structures can result in huge differences in toxicologic or pharmacologic properties. A silly little methyl group on a complicated hydrocarbon ring structure can make a world of difference, as in the difference between estrogen and testosterone.

C. All Animals React the Same to Any Given Substance

Gottesman, in his oral argument, maintained that human teratogenicity can be determined from teratogenicity in non-human, non-primate, murine species.5 The Court wasted little time on this claim, the credibility of which has continued to decline in the last 25 years.

D. The Transposition Fallacy

Perhaps of greatest interest to me was Gottesman’s claim that the probability of the claimed causal association can be determined from the p-value or from the coefficient of confidence taken from the observational epidemiologic studies of birth defects among children of women who ingested Bendectin in pregnancy; a.k.a. the transposition fallacy.6

All these errors are still in play in American courtrooms, despite efforts of scientists and scientific organizations to disabuse judges and lawyers. The transposition fallacy, which has been addressed in these pages and elsewhere at great length, seems especially resilient to educational efforts. Still, the fallacy was as well recognized at the time of the Daubert argument as it is today, and it is noteworthy that the law professor who argued the plaintiffs’ case, in the highest court of the land, advanced this fallacious argument, and that the scientific and statistical community did little to nothing to correct the error.7
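The fallacy can be put compactly in symbols. A p-value is computed on the assumption that the null hypothesis is true; the probability of the null hypothesis, given the data, is a different conditional probability, unobtainable without a prior:

$$
p = \Pr(\text{data at least as extreme} \mid H_0), \qquad
\Pr(H_0 \mid \text{data}) = \frac{\Pr(\text{data} \mid H_0)\,\Pr(H_0)}{\Pr(\text{data})}.
$$

Reading a 95% coefficient of confidence as a 95% posterior probability of the claimed association simply transposes the two conditionals.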

Although Professor Gottesman’s meaning in the oral argument is not entirely clear, on multiple occasions, he appeared to have conflated the coefficient of confidence, from confidence intervals, with the posterior probability that attaches to the alternative hypothesis of some association:

“What the lower courts have said was yes, but prove to us to a degree of statistical certainty which would give us 95 percent confidence that the human epidemiological data is reflective, that these higher numbers for the mothers who used Bendectin were not the product of random chance but in fact are demonstrating the linkage between this drug and the symptoms observed.”8

* * * * *

“… what was demonstrated by Shanna Swan was that if you used a degree of confidence lower than 95 percent but still sufficient to prove the point as likelier than not, the epidemiological evidence is positive… .”9

* * * * *

“The question is, how confident can we be that that is in fact probative of causation, not at a 95 percent level, but what Drs. Swan and Glassman said was applying the Rothman technique, a published technique and doing the arithmetic, that you find that this does link causation likelier than not.”10

Professor Fried’s oral argument for the defense largely refused or failed to engage with plaintiffs’ argument on statistical inference. With respect to the “Rothman” approach, Fried pointed out that plaintiffs’ statistical expert witness, Shanna Swan, never actually employed “the Rothman principle.”11

With respect to plaintiffs’ claim that individual studies had low power to detect risk ratios of two, Professor Fried missed the opportunity to point out that such post-hoc power calculations, whatever validity they might possess, embrace the concept of statistical significance at the customary 5% level. Fried did note that a meta-analysis, based upon all the epidemiologic studies, rendered plaintiffs’ power complaint irrelevant.12

Some readers may believe that judging advocates speaking extemporaneously about statistical concepts might be overly harsh. How well then did the lawyers explain and represent statistical concepts in their written briefs in the Daubert case?

Petitioners’ Briefs

Petitioners’ Opening Brief

The petitioners’ briefs reveal that Gottesman’s statements at oral argument represent a consistent misunderstanding of statistical concepts. The plaintiffs consistently conflated significance probability or the coefficient of confidence with the civil burden of proof probability:

“The crux of the disagreement between Merrell’s experts and those whose testimony is put forward by plaintiffs is that the latter are prepared to find causation more probable than not when the epidemiological evidence is strongly positive (albeit not at a 95% confidence level) and when it is buttressed with animal and chemical evidence predictive of causation, while the former are unwilling to find causation in the absence of an epidemiological study that satisfies the 95% confidence level.”13

After giving a reasonable facsimile of a definition of statistical significance, the plaintiffs’ brief proceeds to confuse the complement of alpha, or the coefficient of confidence (typically 95%), with the probability that the observed risk ratio in a sample is the actual population parameter of risk:

“But in toxic tort lawsuits, the issue is not whether it is certain that a chemical caused a result, but rather whether it is likelier than not that it did. It is not self-evident that the latter conclusion would require eliminating the null hypothesis (i.e. non-causation) to a confidence level of 95%.”14

The plaintiffs’ brief cited heavily to Rothman’s textbook, Modern Epidemiology, with the specious claim that the textbook supported the plaintiffs’ use of the coefficient of confidence to derive a posterior probability (> 50%) of the correctness of an elevated risk ratio for birth defects in children born to mothers who had taken Bendectin in their first trimesters of pregnancy:

“An alternative mechanism has been developed by epidemiologists in recent years to give a somewhat more informative picture of what the statistics mean. At any given confidence level (e.g. 95%) a confidence interval can be constructed. The confidence interval identifies the range of relative risks that collectively comprise the 95% universe. Additional confidence levels are then constructed exhibiting the range at other confidence levels, e.g., at 90%, 80%, etc. From this set of nested confidence intervals the epidemiologist can make assessments of how likely it is that the statistics are showing a true association. Rothman, Tab 9, pp. 122-25. By calculating nested confidence intervals for the data in the Bendectin studies, Dr. Swan was able to determine that it is far more likely than not that a true association exists between Bendectin and human limb reduction birth defects. Swan, Tab 12, at 3618-28.”15

The heavy reliance upon Rothman’s textbook at first blush appears confusing. Modern Epidemiology makes one limited mention of nested confidence intervals, and certainly never suggests that such intervals can provide a posterior probability of the correctness of the hypothesis. Rothman’s complaints about reliance upon “statistical significance,” however, are well-known, and Rothman himself submitted an amicus brief16 in Daubert, a brief that has its own problems.17
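The ‘nested intervals’ maneuver is easy to reproduce, and equally easy to see through. With hypothetical numbers (an RR of 1.8, with a 95% interval of 0.9 to 3.5), shrinking the confidence level shrinks the interval, but no level of the nesting yields a posterior probability that a true association exists:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical study result: RR = 1.8 with a 95% CI of 0.9-3.5.
log_rr = np.log(1.8)
se = (np.log(3.5) - np.log(0.9)) / (2 * norm.ppf(0.975))

for level in (0.95, 0.90, 0.80, 0.51):
    z = norm.ppf(0.5 + level / 2)    # critical value for the stated level
    lo, hi = np.exp(log_rr - z * se), np.exp(log_rr + z * se)
    print(f"{level:.0%} CI: {lo:.2f}-{hi:.2f}")

# The intervals nest, and the 51% interval may exclude 1.0 even when the 95%
# interval does not; but no interval states the probability that the true RR
# lies within it, so no posterior probability emerges from this exercise.
```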

In direct response to the Rothman Brief,18 Professor Alvan Feinstein filed an amicus brief in Daubert, wherein he acknowledged that meta-analyses and re-analyses can be valid, but that these techniques are subject to many sources of invalidity, and that their employment by careful practitioners in some instances should not be a blank check to professional witnesses who are supported by plaintiffs’ counsel. Similarly, Feinstein acknowledged that standards of statistical significance:

“should be appropriately flexible, but they must exist if science is to preserve its tradition of intellectual discipline and high quality research.”19

Petitioners’ Reply Brief

The plaintiffs’ statistical misunderstandings are further exemplified in their Reply Brief, where they reassert the transposition fallacy and alternatively state that associations with p-values greater than 5%, or 95% confidence intervals that include the risk ratio of 1.0, do not show the absence of an association.20 The latter point was, of course, irrelevant in the Daubert case, in which plaintiffs had the burden of persuasion. As in their oral argument through Professor Gottesman, the plaintiffs’ appellate briefs misunderstand the crucial point that confidence intervals are conditioned upon the data observed from a particular sample, and do not provide posterior probabilities for the correctness of a claimed hypothesis.

Defense Brief

The defense brief spent little time on the statistical issue or plaintiffs’ misstatements, but dispatched the issue in a trenchant footnote:

“Petitioners stress the controversy some epidemiologists have raised about the standard use by epidemiologists of a 95% confidence level as a condition of statistical significance. Pet. Br. 8-10. See also Rothman Amicus Br. It is hard to see what point petitioners’ discussion establishes that could help their case. Petitioners’ experts have never developed and defended a detailed analysis of the epidemiological data using some alternative well-articulated methodology. Nor, indeed, do they show (or could they) that with some other plausible measure of confidence (say, 90%) the many published studies would collectively support an inference that Bendectin caused petitioners’ limb reduction defects. At the very most, all that petitioners’ theoretical speculations do is question whether these studies – as the medical profession and regulatory authorities in many countries have concluded – affirmatively prove that Bendectin is not a teratogen.”21

The defense never responded to the specious argument, stated or implied within the plaintiffs’ briefs, and in Gottesman’s oral argument, that a coefficient of confidence of 51% would have generated confidence intervals that routinely excluded the null hypothesis of risk ratio of 1.0. The defense did, however, respond to plaintiffs’ power argument by adverting to a meta-analysis that failed to find a statistically significant association.22

The defense also advanced two important arguments to which the plaintiffs’ briefs never meaningfully responded. First, the defense detailed the “cherry picking” or selective reliance engaged in by plaintiffs’ expert witnesses.23 Second, the defense noted that plaintiffs had a specific causation problem in that their expert witnesses had been attempting to infer specific causation based upon relative risks well below 2.0.24

To some extent, the plaintiffs’ statistical misstatements were taken up by an amicus brief submitted by the United States government, speaking through the office of the Solicitor General.25 Drawing upon the Supreme Court’s decisions in race discrimination cases,26 the government asserted that epidemiologists “must determine” whether a finding of an elevated risk ratio “could have arisen due to chance alone.”27

Unfortunately, the government’s brief butchered the meaning of confidence intervals. Rather than describe the confidence interval as showing what point estimates of risk ratios are reasonably compatible with the sample result, the government stated that confidence intervals show “how close the real population percentage is likely to be to the figure observed in the sample”:

“since there is a 95 percent chance that the ‘true’ value lies within two standard deviations of the sample figure, that particular ‘confidence interval’ (i.e., two standard deviations) is therefore said to have a ‘confidence level’ of about 95 percent.”28

The Solicitor General’s office seemed to have had some awareness that it was giving offense with the above definition because it quickly added:

“While it is customary (and, in many cases, easier) to speak of ‘a 95 percent chance’ that the actual population percentage is within two standard deviations of the figure obtained from the sample, ‘the chances are in the sampling procedure, not in the parameter’.”29

Easier perhaps, but clearly erroneous to speak that way, and customary only among the unwashed. The government half apologized for misleading the Court when it followed up with a better definition from David Freedman’s textbook, but sadly the government lawyers were not content to let the matter sit there. The Solicitor General’s brief obscured the textbook definition with a further inaccurate and false précis:

“if the sampling from the general population were repeated numerous times, the ‘real’ population figure would be within the confidence interval 95 percent of the time. The ‘real’ figure would be outside that interval the remaining five percent of the time.”30

The lawyers in the Solicitor General’s office thus made the rookie mistake of forgetting that in the long run, after numerous repeated samples, there would be numerous confidence intervals, not one. The 95% probability of containing the true population value belongs to the set of the numerous confidence intervals, not “the confidence interval” obtained in the first go around.

The Daubert case has been the subject of nearly endless scholarly comment, but few authors have chosen to revisit the parties’ briefs. Two authors have published a paper that reviewed the scientists’ amici briefs in Daubert.31 The Rothman brief was outlined in detail; the Feinstein rebuttal was not substantively discussed. The plaintiffs’ invocation of the transposition fallacy in Daubert has apparently gone unnoticed.


1 Oral Argument in Daubert v. Merrell Dow Pharmaceuticals, Inc., U.S. Supreme Court no. 92-102, 1993 WL 754951, *5 (Tuesday, March 30, 1993) [Oral Arg.].

2 Oral Arg. at *6.

3 In re Agent Orange Product Liab. Litig., 597 F. Supp. 740, 781 (E.D.N.Y.1984) (“The distinction between avoidance of risk through regulation and compensation for injuries after the fact is a fundamental one.”), aff’d in relevant part, 818 F.2d 145 (2d Cir. 1987), cert. denied sub nom. Pinkney v. Dow Chemical Co., 484 U.S. 1004 (1988).

4 Oral Arg. at *19.

5 Oral Arg. at *18-19.

6 Oral Arg. at *19.

7 See, e.g., “Sander Greenland on ‘The Need for Critical Appraisal of Expert Witnesses in Epidemiology and Statistics’” (Feb. 8, 2015) (noting biostatistician Sander Greenland’s publications, which selectively criticize only defense expert witnesses and lawyers for statistical misstatements); see also “Some High-Value Targets for Sander Greenland in 2018” (Dec. 27, 2017).

8 Oral Arg. at *19.

9 Oral Arg. at *20.

10 Oral Arg. at *44. At the oral argument, this last statement was perhaps Gottesman’s clearest misstatement of statistical principles, in that he directly suggested that the coefficient of confidence translates into a posterior probability of the claimed association at the observed size.

11 Oral Arg. at *37.

12 Oral Arg. at *32.

13 Petitioner’s Brief in Daubert v. Merrell Dow Pharmaceuticals, Inc., U.S. Supreme Court No. 92-102, 1992 WL 12006442, *8 (U.S. Dec. 2, 1992) [Petitioner’s Brief].

14 Petitioner’s Brief at *9.

15 Petitioner’s Brief at *n. 36.

16 Brief Amici Curiae of Professors Kenneth Rothman, Noel Weiss, James Robins, Raymond Neutra and Steven Stellman, in Support of Petitioners, 1992 WL 12006438, Daubert v. Merrell Dow Pharmaceuticals, Inc., U.S. S. Ct. No. 92-102 (Dec. 2, 1992).

18 Brief Amicus Curiae of Professor Alvan R. Feinstein in Support of Respondent, in Daubert v. Merrell Dow Pharmaceuticals, Inc., U.S. Supreme Court no. 92-102, 1993 WL 13006284, at *2 (U.S., Jan. 19, 1993) [Feinstein Brief].

19 Feinstein Brief at *19.

20 Petitioner’s Reply Brief in Daubert v. Merrell Dow Pharmaceuticals, Inc., U.S. Supreme Court No. 92-102, 1993 WL 13006390, at *4 (U.S., Feb. 22, 1993).

21 Respondent’s Brief in Daubert v. Merrell Dow Pharmaceuticals, Inc., U.S. Supreme Court No. 92-102, 1993 WL 13006277, at n. 32 (U.S., Jan. 19, 1993) [Respondent Brief].

22 Respondent Brief at *4.

23 Respondent Brief at *42 n.32 and 47.

24 Respondent Brief at *40-41 (citing DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941, 958 (3d Cir. 1990)).

25 Brief for the United States as Amicus Curiae Supporting Respondent in Daubert v. Merrell Dow Pharmaceuticals, Inc., U.S. Supreme Court No. 92-102, 1993 WL 13006291 (U.S., Jan. 19, 1993) [U.S. Brief].

26 See, e.g., Hazelwood School District v. United States, 433 U.S. 299, 308-312 (1977); Castaneda v. Partida, 430 U.S. 482, 495-499 & nn.16-18 (1977) (“As a general rule for such large samples, if the difference between the expected value and the observed number is greater than two or three standard deviations, then the hypothesis that the jury drawing was random would be suspect to a social scientist.”).

27 U.S. Brief at *3-4. Over two decades later, when politically convenient, the United States government submitted an amicus brief in a case involving alleged securities fraud for failing to disclose adverse events of an over-the-counter medication. In Matrixx Initiatives Inc. v. Siracusano, 131 S. Ct. 1309 (2011), the securities fraud plaintiffs contended that they need not plead “statistically significant” evidence for adverse drug effects. The Solicitor General’s office, along with counsel for the Food and Drug Division of the Department of Health & Human Services, in their zeal to assist plaintiffs, disclaimed the necessity, or even the importance, of statistical significance:

“[w]hile statistical significance provides some indication about the validity of a correlation between a product and a harm, a determination that certain data are not statistically significant … does not refute an inference of causation.”

Brief for the United States as Amicus Curiae Supporting Respondents, in Matrixx Initiatives, Inc. v. Siracusano, 2010 WL 4624148, at *14 (Nov. 12, 2010).

28 U.S. Brief at *5.

29 U.S. Brief at *5-6 (citing D. Freedman, R. Pisani, R. Purves & A. Adhikari, Statistics 351, 397 (2d ed. 1991)).

30 U.S. Brief at *6 (citing Freedman’s text at 351) (emphasis added).

31 See Joan E. Bertin & Mary S. Henifin, “Science, Law, and the Search for Truth in the Courtroom: Lessons from Daubert v. Merrell Dow,” 22 J. Law, Medicine & Ethics 6 (1994); Joan E. Bertin & Mary Sue Henifin, “Scientists Talk to Judges: Reflections on Daubert v. Merrell Dow,” 4(3) New Solutions 3 (1994). The authors’ choice of the New Solutions journal is interesting and curious. New Solutions: A Journal of Environmental and Occupational Health Policy was published by the Oil, Chemical and Atomic Workers International Union, under the control of Anthony Mazzocchi (June 13, 1926 – Oct. 5, 2002), who was the union’s secretary-treasurer. Anthony Mazzocchi, “Finding Common Ground: Our Commitment to Confront the Issues,” 1 New Solutions 3 (1990); see also Steven Greenhouse, “Anthony Mazzocchi, 76, Dies; Union Officer and Party Father,” N.Y. Times (Oct. 9, 2002). Even a cursory review of this journal’s contents reveals how interested and invested, even obsessed, the union was in the litigation industry and that industry’s expert witnesses.

 

The “Rothman” Amicus Brief in Daubert v. Merrell Dow Pharmaceuticals

November 17th, 2018

“Then time will tell just who fell
And who’s been left behind”

                  Dylan, “Most Likely You Go Your Way” (1966)

 

When the Daubert case headed to the Supreme Court, it had 22 amicus briefs in tow. Today that number is routine for an appeal to the high court, but in 1992, it signaled intense interest in the case among both the scientific and legal communities. To the litigation industry, the prospect of judicial gatekeeping of expert witness testimony was anathema. To the manufacturing industry, the prospect was a precious safeguard against specious claiming.

With the benefit of 25 years of hindsight, a look at some of those amicus briefs reveals a good deal about the scientific and legal acumen of the “friends of the court.” Not all amicus briefs in the case were equal; not all have held up well over time. The amicus brief of the American Association for the Advancement of Science and the National Academy of Sciences was a good example of advocacy for the full implementation of gatekeeping on scientific principles of valid inference.1 Other amici urged an anything-goes approach to judicial oversight of expert witnesses.

One amicus brief often praised by Plaintiffs’ counsel was submitted by Professor Kenneth Rothman and colleagues.2 This amicus brief is still cited by parties who find support in it for their excuses for not having consistent, valid, strong, and statistically significant evidence to support their claims of causation. To be sure, Rothman did target statistical significance as a strict criterion of causal inference, but there is little support in the brief for the loosey-goosey style of causal claiming that is so prevalent among lawyers for the litigation industry. Unlike the brief filed by the AAAS and the National Academy of Sciences, Rothman’s brief abstained from addressing the social policies implicated by judicial gatekeeping or its rejection. Instead, Rothman’s brief set out to make three narrow points:

(1) courts should not rely upon strict statistical significance testing for admissibility determinations;

(2) peer review is not an appropriate touchstone for the validity of an expert witness’s opinion; and

(3) unpublished, non-peer-reviewed “reanalysis” of studies is a routine part of the scientific process, and regularly practiced by epidemiologists and other scientists.

Rothman was encouraged to target these three issues by the lower courts’ opinions in the Daubert case, in which the courts made blanket statements about the absence of statistical significance, the role of peer review, and the illegitimacy of “re-analyses” of published studies.

Professor Rothman has made many admirable contributions to epidemiologic practice, but the amicus brief submitted by him and his colleagues falls into the trap of making the sort of blanket general statements that they condemned in the lower courts’ opinions. Of the brief’s three points, the first, about statistical significance, is the most important for epidemiologic and legal practice. Despite reports of an odd journal here or there “abolishing” p-values, most medical journals continue to require the presentation of either p-values or confidence intervals. In the majority of medical journals, 95% confidence intervals that exclude a null hypothesis risk ratio of 1.0, or risk difference of 0, are labelled “statistically significant,” sometimes improvidently in the presence of multiple comparisons and lack of pre-specification of outcome.

For over three decades, Rothman has criticized the prevailing practice on statistical significance. Professor Rothman is also well known for his advocacy for the superiority of confidence intervals over p-values in conveying important information about what range of values are reasonably compatible with the observed data.3 His criticisms of p-values and his advocacy for estimation with intervals have pushed biomedical publishing to embrace confidence intervals as more informative than just p-values. Still, his views on statistical significance have never gained complete acceptance at most clinical journals. Biomedical scientists continue to interpret 95% confidence intervals, at least in part, as to whether they show “significance” by excluding the null hypothesis value of no risk difference or of risk ratios equal to 1.0.

The first point in Rothman’s amicus brief is styled:

“THE LOWER COURTS’ FOCUS ON SIGNIFICANCE TESTING IS BASED ON THE INACCURATE ASSUMPTION THAT ‘STATISTICAL SIGNIFICANCE’ IS REQUIRED IN ORDER TO DRAW INFERENCES FROM EPIDEMIOLOGICAL INFORMATION”

The challenge by Rothman and colleagues to the “assumption” that statistical significance is necessary is what, of course, has endeared this brief to the litigation industry. A close read of the brief, however, shows that Rothman’s critique of the assumption is equivocal. Rothman et amici characterized the lower courts as having given:

“blind deference to inappropriate and arcane publication standards and ‘significance testing’.”4

The brief is silent about what might be knowing deference, or appropriate publication standards. To be sure, judges have often poorly expressed their reasoning for deciding scientific evidentiary issues, and perhaps poor communication or laziness by judges was responsible for Rothman’s interest in joining the Daubert fray. Putting aside the unclear, rhetorical, and somewhat hyperbolic use of “arcane” in the quote above, the suggestion of inappropriate blind deference is itself expressed in equivocal terms in the brief. At times the authors rail at the use of statistical significance as the “sole” criterion, and at times, they seem to criticize its use at all.

At least twice in their brief, Rothman and friends declare that the lower court:

“misconstrues the validity and appropriateness of significance testing as a decision making tool, apparently deeming it the sole test of epidemiological hypotheses.”5

* * * * * *

“this Court should reject significance testing as the sole acceptable criterion of scientific validity in epidemiology.”6

Characterizing “statistical significance” as not the sole test or criterion of scientific inference is hardly controversial, and it implies that statistical significance is one test, criterion, or factor among others. This position is consistent with the current ASA Statement on Significance Testing.7 There is, of course, much more to evaluate in a study or a body of studies, than simply whether they individually or collectively help us to exclude chance as an explanation for their findings.

Statistical Significance Is Not Necessary At All

Elsewhere, Rothman and friends take their challenge to statistical significance testing beyond merely suggesting that such testing is only one test or criterion among others. Indeed, their brief in other places states their opinion that significance testing is not necessary at all:

“Testing for significance, however, is often mistaken for a sine qua non of scientific inference.”8

And at other times, Rothman and friends go further yet and claim not only that significance is not necessary, but that it is not even appropriate or useful:

“Significance testing, however, is neither necessary nor appropriate as a requirement for drawing inferences from epidemiologic data.”9

Rothman compares statistical significance testing with “scientific inference,” which is not a mechanical, mathematical procedure, but rather a “thoughtful evaluation[] of possible explanations for what is being observed.”10 Significance testing, in contrast, is “merely a statistical tool,” used inappropriately “in the process of developing inferences.”11 Rothman suggests that the term “statistical significance” could be eliminated from scientific discussions without loss of meaning, as though this linguistic legerdemain showed that the concept is unimportant in science and in law.12 Rothman’s suggestion, however, ignores that causal assessments have always required an evaluation of the play of chance, especially for putative causes, which are neither necessary nor sufficient, and which modify underlying stochastic processes by increasing or decreasing the probability of a specified outcome. The brief’s assertion that statistical significance is misleading because it never describes the size of an association is like complaining that color terms tell us nothing about the mass of a body.

The Rothman brief does make the salutary point that labeling a study outcome as not “statistically significant” carries the danger that the study’s data will be treated as having no value, or that the study will be taken to have refuted the hypothesized association. In 1992, such an interpretation may have been more common, but today, in the face of the proliferation of meta-analyses, the risk of such interpretations of single study outcomes is remote.

Questionable History of Statistics

Rothman suggests that the development of statistical hypothesis testing occurred in the context of agricultural and quality-control experiments, which required yes-no answers for future action.13 This suggestion clearly points at Sir Ronald Fisher and Jerzy Neyman, and their foundational work on frequentist statistical theory and practice. In part, the amici correctly identified the experimental milieu in which Fisher worked, but the description of Fisher’s work is neither accurate nor fair. Fisher spent a lifetime thinking and writing about statistical tests, in much more nuanced ways than implied by the claim that such testing arose in the context of agricultural and quality-control experiments. Although Fisher worked on agricultural experiments, his writings acknowledged that when statistical tests and analyses were applied to observational studies, much more searching analyses of bias and confounding were required. Fisher’s and Berkson’s reactions to the observational studies of Hill and Doll on smoking and lung cancer are telling in this regard. These statisticians criticized the early smoking and lung cancer studies, not for lack of statistical significance, but for failing to address confounding by a potential common genetic propensity to smoke and to develop lung cancer.

Questionable History of Drug Development

Twice in Rothman’s amicus brief, the authors suggest that “undue reliance” on statistical significance has resulted in overlooking “effective new treatments” because observed benefits were considered “not significant,” despite an “indication” of efficacy.14 The brief never explained what separates due from undue reliance on statistical significance, although the criticism of “undue reliance” implies that there are modes or instances of “due reliance.” The brief also fails to inform readers exactly what “effective new treatments” have been overlooked because their outcomes were considered “not significant.” The omission is regrettable because it leaves the reader with only abstract recommendations, and unfortunate because Rothman almost certainly could have marshalled concrete examples. Recently, Rothman tweeted just such an example:15

“30% ↓ in cancer risk from Vit D/Ca supplements ignored by authors & editorial. Why? P = 0.06. http://bit.ly/2oanl6w http://bit.ly/2p0CRj7. The 95% confidence interval for the risk ratio was 0.42–1.02.”

Of course, this was a large, carefully reported randomized clinical trial, with a narrow confidence interval that just missed “statistical significance.” It is not an example that would have given succor to Bendectin plaintiffs, who were attempting to prove an association by identifying flaws in noisy observational studies that generally failed to show an association.

Readers of the 1992 amicus brief can only guess at what might be “indications of efficacy”; no explanation or examples are provided.16 The reality of FDA approvals of new drugs is that a pre-specified 5% level of statistical significance is virtually always enforced.17 If a drug sponsor has an “indication of efficacy,” it is, of course, free to follow up with an additional, larger, better-designed clinical trial. Rothman’s recent tweet about the vitamin D clinical trial does provide some context and meaning to what the amici may have meant over 25 years ago by indication of efficacy. The tweet also illustrates Rothman’s acknowledgment of the need to address random variability in a data set, whether by p-value or confidence interval, or both. Clearly, Rothman was criticizing the authors of the vitamin D trial for stopping short of claiming that they had shown (or “demonstrated”) a cancer prevention benefit. There is, however, a rich literature on vitamin D and cancer outcomes, and such a claim could be made, perhaps, in the context of a meta-analysis or meta-regression of multiple clinical trials, with a synthesis of other experimental and observational data.18

Questionable History of Statistical Analyses in Epidemiology

Rothman’s amicus brief deserves the dubious credit of introducing a misinterpretation of Sir Austin Bradford Hill’s famous paper on inferring causal associations, a misinterpretation that has become catechism in the briefs of plaintiffs in pharmaceutical and other products liability cases:

“No formal tests of significance can answer those questions. Such tests can, and should, remind us of the effects that the play of chance can create, and they will instruct us in the likely magnitude of those effects. Beyond that they contribute nothing to the ‘proof’ of our hypothesis.”

Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 299 (1965) (quoted at Rothman Brief at *6).

As exegesis of Hill’s views, this quote is misleading. The language quoted above was used by Hill in the context of his nine causal viewpoints or criteria. The Rothman brief ignores Hill’s admonition to his readers, that before reaching the nine criteria, there is a serious, demanding predicate that must be shown:

“Disregarding then any such problem in semantics we have this situation. Our observations reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance. What aspects of that association should we especially consider before deciding that the most likely interpretation of it is causation?”

Id. at 295 (emphasis added). Rothman and co-authors did not have to invoke the prestige and authority of Sir Austin, but once they did, they were obligated to quote him fully and with accurate context. Elsewhere, in his famous textbook, Hill expressed his view that common sense was insufficient to interpret data, and that the statistical method was necessary to interpret data in medical studies.19

Rothman complains that statistical significance focuses the reader on conjecture about the role of chance in the observed data rather than on the information conveyed by the data themselves.20 The “incompleteness” of statistical analysis for arriving at causal conclusions, however, is not an argument against its necessity.

The Rothman brief does make the helpful point that statistical significance cannot be sufficient to support a conclusion of causation because many statistically significant associations or correlations will be non-causal. They give a trivial example of wearing dresses and breast cancer, but the point is well-taken. Associations, even when statistically significant, do not necessarily warrant causal conclusions. Who ever suggested otherwise, other than expert witnesses for the litigation industry?

Unnecessary Fears

The motivation for Rothman’s challenge to the assumption that statistical significance is necessary is revealed at the end of the argument on Point I. The authors plainly express their concern that false negatives will shut down important research:

“To give weight to the failure of epidemiological studies to meet strict ‘statistical significant’ standards — to use such studies to close the door on further inquiry — is not good science.”21

The relevance of this concern to the proceedings is a mystery. The judicial decisions in the case were not referenda on funding initiatives. Scientists were as free in 1993, after Daubert was decided, as they were in 1992, when Rothman wrote, to pursue the hypothesis that Bendectin caused birth defects. The decision had the potential to shut down tort claims, but it left scientists to their tasks.

Reanalyses Are Appropriate Scientific Tools to Assess and Evaluate Data, and to Forge Causal Opinions

The Rothman brief took issue with the lower courts’ dismissal of plaintiffs’ expert witnesses’ re-analyses of data in published studies. The authors argued that reanalyses were part of the scientific method, and not “an arcane or specialized enterprise” deserving of heightened or skeptical scrutiny.22

Remarkably, the Rothman brief, had the Supreme Court accepted it on the re-analysis point, would have led to the sort of unthinking blanket acceptance of a methodology that the brief’s authors condemned in the context of significance testing. The brief covertly urges “blind deference” to its authors on the blanket approval of re-analyses.

Although amici have tight page limits, the brief’s authors made clear that they were offering no substantive opinions on the data involved in the published epidemiologic studies on Bendectin, or on the plaintiffs’ expert witnesses’ re-analyses. With the benefit of hindsight, we can see that the sweeping language used by the Ninth Circuit on re-analyses might have been taken to foreclose important and valid meta-analyses or similar approaches. The Rothman brief is not terribly explicit on what re-analysis techniques were part of the scientific method, but meta-analyses surely had been on the authors’ minds:

“by focusing on inappropriate criteria applied to determine what conclusions, if any, can be reached from any one study, the trial court forecloses testimony about inferences that can be drawn from the combination of results reported by many such studies, even when those studies, standing alone, might not justify such inferences.”23

The plaintiffs’ statistical expert witness in Daubert had proffered a re-analysis of at least one study by substituting a different control sample, as well as a questionable meta-analysis. By failing to engage on the propriety of the specific analyses at issue in Daubert, the Rothman brief failed to offer meaningful guidance to the appellate court.

Reanalyses Are Not Invalid Just Because They Have Not Been Published

Rothman was certainly correct that the value of peer review was overstated by the defense in Bendectin litigation.24 The quality of pre-publication peer review is spotty, at best. Predatory journals deploy a pay-to-play scheme, which makes a mockery of scientific publishing. Even at respectable journals, peer review cannot effectively guard against fraud, or ensure that statistical analyses have been appropriately done.25 At best, peer review is a weak proxy for study validity, and an unreliable one at that.

The Rothman brief may have moderated the Supreme Court’s reaction to the defense’s argument that peer review is a requirement for studies, or “re-analyses,” relied upon by expert witnesses. The Court in Daubert opined, in dicta, that peer review is a non-dispositive consideration:

“The fact of publication (or lack thereof) in a peer reviewed journal … will be a relevant, though not dispositive, consideration in assessing the scientific validity of a particular technique or methodology on which an opinion is premised.”26

To the extent that Rothman and colleagues might have been disappointed in this outcome, they missed some important context of the Bendectin cases. Most of the cases had been resolved by a consolidated causation issues trial, but many opt-out cases had to be tried in state and federal courts around the country.27 The expert witnesses challenged in Daubert (Drs. Swan and Done) participated in many of these opt-out cases, and in each case, they opined that Bendectin was a public health hazard. The failure of these witnesses to publish their analyses and re-analyses spoke volumes about their bona fides. Courts (and juries, if the testimony proffered by Swan and Done was admissible) could certainly draw negative inferences from the plaintiffs’ expert witnesses’ failure to publish their opinions and re-analyses.

The Fate of the “Rothman Approach” in the Courts

The so-called “Rothman approach” was urged by Bendectin plaintiffs in opposing summary judgment in a case pending in federal court, in New Jersey, before the Supreme Court decided Daubert. Plaintiffs resisted exclusion of their expert witnesses, who had relied upon inconsistent and statistically non-significant studies on the supposed teratogenicity of Bendectin. The trial court excluded the plaintiffs’ witnesses, and granted summary judgment.28

On appeal, the Third Circuit reversed and remanded the DeLucas’s case for a hearing under Rule 702:

“by directing such an overall evaluation, however, we do not mean to reject at this point Merrell Dow’s contention that a showing of a .05 level of statistical significance should be a threshold requirement for any statistical analysis concluding that Bendectin is a teratogen regardless of the presence of other indicia of reliability. That contention will need to be addressed on remand. The root issue it poses is what risk of what type of error the judicial system is willing to tolerate. This is not an easy issue to resolve and one possible resolution is a conclusion that the system should not tolerate any expert opinion rooted in statistical analysis where the results of the underlying studies are not significant at a .05 level.”29

After remand, the district court excluded the DeLuca plaintiffs’ expert witnesses, and granted summary judgment, based upon the dubious methods employed by plaintiffs’ expert witnesses in cherry picking data, recalculating risk ratios in published studies, and ignoring bias and confounding in studies. The Third Circuit affirmed the judgment for Merrell Dow.30

In the end, the decisions in the DeLuca case never endorsed the Rothman approach, although Professor Rothman can take credit perhaps for forcing the trial court, on remand, to come to grips with the informational content of the study data, and the many threats to validity, which severely undermined the relied-upon studies and the plaintiffs’ expert witnesses’ opinions.

More recently, in litigation over alleged causation of birth defects in offspring of mothers who used Zoloft during pregnancy, plaintiffs’ counsel attempted to resurrect, through their expert witnesses, the Rothman approach. The multidistrict court saw through counsel’s assertions that the Rothman approach had been adopted in DeLuca, or that it had become generally accepted.31 After protracted litigation in the Zoloft cases, the district court excluded plaintiffs’ expert witnesses and entered summary judgment for the defense. The Third Circuit found that the district court’s handling of the statistical significance issues was fully consistent with the Circuit’s previous pronouncements on the issue of statistical significance.32


1 The brief, filed in Daubert v. Merrell Dow Pharmaceuticals, Inc., U.S. Supreme Court No. 92-102 (Jan. 19, 1993), was submitted by Richard A. Meserve and Lars Noah, of Covington & Burling, and by Bert Black, and is reprinted at 12 Biotechnology Law Report 198 (No. 2, March-April 1993); see “Daubert’s Silver Anniversary – Retrospective View of Its Friends and Enemies” (Oct. 21, 2018).

2 Brief Amici Curiae of Professors Kenneth Rothman, Noel Weiss, James Robins, Raymond Neutra and Steven Stellman, in Support of Petitioners, 1992 WL 12006438, Daubert v. Merrell Dow Pharmaceuticals, Inc., U.S. S. Ct. No. 92-102 (Dec. 2, 1992). [Rothman Brief].

3 Id. at *7.

4 Rothman Brief at *2.

5 Id. at *2-*3 (emphasis added).

6 Id. at *7 (emphasis added).

7 See Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” 70 The American Statistician 129 (2016).

8 Id. at *3.

9 Id. at *2.

10 Id. at *3 – *4.

11 Id. at *3.

12 Id. at *3.

13 Id. at *4 -*5.

14 Id. at*5, *6.

15 See Rothman’s tweet at <https://twitter.com/ken_rothman/status/855784253984051201> (April 21, 2017). The tweet pointed to: Joan Lappe, Patrice Watson, Dianne Travers-Gustafson, Robert Recker, Cedric Garland, Edward Gorham, Keith Baggerly, and Sharon L. McDonnell, “Effect of Vitamin D and Calcium Supplementation on Cancer Incidence in Older Women: A Randomized Clinical Trial,” 317 J. Am. Med. Ass’n 1234 (2017).

16 In the case of United States v. Harkonen, Professors Ken Rothman and Tim Lash, and I made common cause in support of Dr. Harkonen’s petition to the United States Supreme Court. The circumstances of Dr. Harkonen’s indictment and conviction provide a concrete example of what Dr. Rothman probably was referring to as “indication of efficacy.” I supported Dr. Harkonen’s appeal because I agreed that there had been a suggestion of efficacy, even if Harkonen had overstated what his clinical trial, standing alone, had shown. (There had been a previous clinical trial, which demonstrated a robust survival benefit.) From my perspective, the facts of the case supported Dr. Harkonen’s exercise of speech in a press release, but it would hardly have justified FDA approval for the indication that Dr. Harkonen was discussing. If Harkonen had indeed committed “wire fraud,” as claimed by the federal prosecutors, then I had (and still have) a rather long list of expert witnesses who stand in need of criminal penalties and rehabilitation for their overreaching opinions in court cases.

17 Robert Temple, “How FDA Currently Makes Decisions on Clinical Studies,” 2 Clinical Trials 276, 281 (2005); Lee Kennedy-Shaffer, “When the Alpha is the Omega: P-Values, ‘Substantial Evidence’, and the 0.05 Standard at FDA,” 72 Food & Drug L.J. 595 (2017); see also “The 5% Solution at the FDA” (Feb. 24, 2018).

18 See, e.g., Stefan Pilz, Katharina Kienreich, Andreas Tomaschitz, Eberhard Ritz, Elisabeth Lerchbaum, Barbara Obermayer-Pietsch, Veronika Matzi, Joerg Lindenmann, Winfried Marz, Sara Gandini, and Jacqueline M. Dekker, “Vitamin D and cancer mortality: systematic review of prospective epidemiological studies,” 13 Anti-Cancer Agents in Medicinal Chem. 107 (2013).

19 Austin Bradford Hill, Principles of Medical Statistics at 2, 10 (4th ed. 1948) (“The statistical method is required in the interpretation of figures which are at the mercy of numerous influences, and its object is to determine whether individual influences can be isolated and their effects measured.”) (emphasis added).

20 Id. at *6 -*7.

21 Id. at *9.

22 Id.

23 Id. at *10.

24 Rothman Brief at *12.

25 See William Childs, “Peering Behind The Peer Review Curtain,” Law360 (Aug. 17, 2018).

26 Daubert v. Merrell Dow Pharms., 509 U.S. 579, 594 (1993).

27 See “Diclegis and Vacuous Philosophy of Science” (June 24, 2015).

28 DeLuca v. Merrell Dow Pharms., Inc., 131 F.R.D. 71 (D.N.J. 1990).

29 DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941, 955 (3d Cir. 1990).

30 DeLuca v. Merrell Dow Pharms., Inc., 791 F. Supp. 1042 (D.N.J. 1992), aff’d, 6 F.3d 778 (3d Cir. 1993).

31 In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., MDL No. 2342; 12-md-2342, 2015 WL 314149 (E.D. Pa. Jan. 23, 2015) (Rufe, J.) (denying PSC’s motion for reconsideration), aff’d, 858 F.3d 787 (3d Cir. 2017) (affirming exclusion of plaintiffs’ expert witnesses’ dubious opinions, which involved multiple methodological flaws and failures to follow any methodology faithfully). See generallyZoloft MDL Relieves Matrixx Depression” (Jan. 30, 2015); “WOE — Zoloft Escapes a MDL While Third Circuit Creates a Conceptual Muddle” (July 31, 2015).

32 See Pritchard v. Dow Agro Sciences, 430 F. App’x 102, 104 (3d Cir. 2011) (excluding Concussion hero, Dr. Bennet Omalu).

Passing Hypotheses Off as Causal Conclusions – Allen v. Martin Surfacing

November 11th, 2018

The November 2018 issue of the American Bar Association Journal (ABAJ) featured an exposé-style article on the hazards of our chemical environment, worthy of Mother Jones, or the International Journal of Health Nostrums, by a lawyer, Alan Bell.1 Alan Bell, according to his website, is a self-described “environmental health warrior.” Channeling Chuck McGill, Bell also describes himself as a:

“[v]ictim, survivor, advocate and avenger. This former organized crime prosecutor almost died from an environmentally linked illness. He now devotes his life to giving a voice for those too weak or sick to fight for themselves.”

Bell apparently is not so ill that he cannot also serve as “a fierce advocate” for victims of chemicals. Here is how Mr. Bell described his own “environmentally linked illness” (emphasis added):

“Over the following months, Alan developed high fevers, sore throats, swollen glands and impaired breathing. Eventually, he experienced seizures and could barely walk. His health continued to worsen until he became so ill he was forced to stop working. Despite being examined and tested by numerous world-renowned doctors, none of them could help. Finally, a doctor diagnosed him with Multiple Chemical Sensitivity, a devastating illness caused by exposure to environmental toxins. The medical profession had no treatment to offer Alan: no cure, and no hope. Doctors could only advise him to avoid all synthetic chemicals and live in complete isolation within a totally organic environment.”

Multiple chemical sensitivity (MCS)? Does anyone still remember “clinical ecology”? Despite the strident advocacy of support groups and self-proclaimed victims, MCS is not recognized as a chemically caused illness by the World Health Organization, the American Medical Association, the American Academy of Allergy and Immunology, and the American College of Physicians.2 Double-blinded, placebo-controlled clinical trials have shown that putative MCS patients respond to placebo as strongly as they react to chemicals.3

Still, Bell’s claims must be true; Bell has written a book, Poisoned, about his ordeal and that of others.4 After recounting his bizarre medical symptoms, he describes his miraculous cure in a sterile bubble in the Arizona desert. Safe within his bubble, Bell has managed to create the “Environmental Health Foundation,” which is difficult if not impossible to find on the internet, although there are some cheesy endorsements to be found on YouTube.

According to Bell’s narrative, Daniel Allen, the football coach of the College of the Holy Cross, was experiencing neurological signs and symptoms that could not be explained by physicians in the Boston area, home to some of the greatest teaching hospitals in the world. Allen and his wife, Laura, reached out to Bell through his Foundation. Bell describes how he put the Allens in touch with Marcia Ratner, who sits on the Scientific Advisory Board of his Environmental Health Foundation. Bell sent the Allens to see the “world renowned” Ratner, who diagnosed Mr. Allen with amyotrophic lateral sclerosis (ALS). Bell’s story may strike some as odd, considering that Ratner is not a physician. Ratner could not provide a cure for Mr. Allen’s tragic disease, but she could help provide the Allens with a lawsuit.

According to Bell:

“Testimony from a sympathetic widow, combined with powerful evidence that the chemicals Dan was exposed to caused him to die long before his time, would smash their case to bits. The defense opted to seek a settlement. The case settled in 2009.”5

The ABAJ article on the Allen case is a reprise of chapter 15 of Bell’s book, “Chemicals Take Down a Football Coach.” Shame on the A.B.A. for not marking the article as unpaid advertising. More shame on the A.B.A. for not fact checking the glib causal claims made in the article, some of which have been the subject of a recently published “case report” in the red journal, the American Journal of Industrial Medicine, by Dr. Ratner and some, but not all, of the other expert witnesses for Mr. Allen’s litigation team.6 Had the editors of the ABAJ compared Mr. Bell’s statements and claims about the Allen case with that case report, they would have seen that Dr. Ratner, et al., ten years after beating back the defendants’ Daubert motion in the Allen case, described their literature review and assessment of Mr. Allen’s case as merely “hypothesis generating”:

“This literature review and clinical case report about a 45-year-old man with no family history of motor neuron disease who developed overt symptoms of a neuromuscular disorder in close temporal association with his unwitting occupational exposure to volatile organic compounds (VOCs) puts forth the hypothesis that exposure to VOCs such as toluene, which disrupt motor function and increase oxidative stress, can unmask latent ALS type neuromuscular disorder in susceptible individuals.”7

         * * * * * * *

“In conclusion, this hypothesis generating case report provides additional support for the suggestion that exposure to chemicals that share common mechanisms of action with those implicated in the pathogenesis of ALS type neuromuscular disorders can unmask latent disease in susceptible persons. Further research is needed to elucidate these relationships.”8

So in 2018, the Allen case was merely a “hypothesis generating” case report. Ten years earlier, however, in 2008, when Ratner, Abou-Donia, Oliver, Ewing, and Clapp gave solemn oaths and testified under penalty of perjury to a federal district judge, the facts of the same case warranted a claim to scientific knowledge, under Rule 702. Judges, lawyers, and legal reformers should take note of how expert witnesses will characterize facile opinions as causal conclusions when speaking as paid witnesses, and as mere hypotheses in need of evidentiary support when speaking in professional journals to scientists. You’re shocked, eh?

Sometimes when federal courts permit dubious causation opinion testimony over Rule 702 objections, the culprit is bad lawyering by the opponent of the proffered testimony. The published case report by Ratner helps demonstrate that Allen v. Martin Surfacing, 263 F.R.D. 47 (D. Mass. 2009), was the result of litigation overreach by plaintiffs’ counsel and their paid expert witnesses, and a failure of organized skepticism by defense counsel and the judiciary.

Marcia H. Ratner, Ph.D.

I first encountered Dr. Ratner as an expert witness for the litigation industry in cases involving manganese-containing welding rods. Plaintiffs’ counsel, Dickie Scruggs, et al., withdrew her before the defense could conduct an examination before trial. When I came across the Daubert decision in the Allen case, I was intrigued because I had read Ratner’s dissertation9 and her welding litigation report, and saw what appeared to be fallacies10 similar to those that plagued the research of Dr. Brad Racette, who also had worked with Scruggs in conducting screenings, from which he extracted “data” for a study, which for a while became the centerpiece of Scruggs’ claims.11

The Allen case provoked some research on my part, and then a blog post about that case and Dr. Ratner.12 Dr. Ratner took umbrage at my blog post; and in email correspondence, she threatened to sue me for tortious interference with her prospective business opportunities. She also felt that the blog post had put her in a bad light by commenting upon her criminal conviction for unlawful gun possession.13 As a result of our correspondence, and seeing that Dr. Ratner was no stranger to the courtroom,14 I wrote a post-script to add some context and her perspective on my original post.15

One fact Dr. Ratner wished me to include in the blog post-script was that plaintiffs’ counsel in the Allen case had pressured her to opine that toluene and isocyanates caused Mr. Allen’s ALS, and that she had refused. Dr. Ratner of course was making a virtue of necessity, since there was, and is, a mountain of authoritative, well-supported medical opinion that there is no known cause of sporadic ALS.16 Dr. Ratner was very proud, however, of having devised a work-around, by proffering an opinion that toluene caused the acceleration of Mr. Allen’s ALS. This causal claim about accelerated onset could have been tested with an observational study, but the litigation claim about earlier onset was as lacking in evidential support as the more straightforward claim of causation.

Bell’s article in the ABAJ – or rather his advertisement17 – cited an unpublished write-up of the Allen case, by Ratner, The Allen Case: Our Daubert Strategy, Victory, and Its Legal and Medical Landmark Ramifications, in which she kvelled about how the Allen case was cited in the Reference Manual on Scientific Evidence. The Manual’s citations, however, were about the admissibility of the industrial hygienist’s proffered testimony on exposure, based in turn on Mr. Allen’s account of acute-onset symptoms.18 The Manual does not address the dubious acceleration aspect of Ratner’s causal opinion in the Allen case.

The puff piece in the ABAJ caused me to look again at Dr. Ratner’s activities. According to the Better Business Bureau, Dr. Marcia Ratner is a medical consultant in occupational and environmental toxicology. Since early 2016, she has been the sole proprietor of a consulting firm, Neurotoxicants.com, located in Mendon, Vermont. The firm’s website advertises that:

“The Principals and Consultants of Neurotoxicants.com provide expert consulting in neurotoxicology and the relationships between neurotoxic chemical exposures and neurodegenerative disease onset and progression.”

Only Ratner is identified as providing consulting through the firm. According to the LinkedIn entry for Neurotoxicants.com, Ratner is also the founder and director of Medical-Legal Research at Neurotoxicants.com. Ratner’s website advertises her involvement in occupational exposure litigation as an expert witness for claimants.19 Previously, Ratner was the Vice President and Director of Research at Chemical Safety Net, Inc., another consulting firm that she had founded with the late Robert G. Feldman, MD.

Conflict of Interest

The authors of the published Allen case report gave a curious conflict-of-interest disclosure at the end of their article:

“The authors have no current specific competing interests to declare. However, Drs. Ratner, Abou-Donia and Oliver, and Mr. Ewing all served as expert witnesses in this case which settled favorably for the patient over 10 years ago with an outcome that is a fully disclosed matter of public record. Drs. Ratner, Abou-Donia and Oliver and Mr. Ewing are occasionally asked to serve as expert witnesses and/or consultants in occupational and environmental chemical exposure injury cases.”20

The disclosure conveniently omitted that Dr. Ratner owns a business that she set up to provide medico-legal consulting, and that Dr. Oliver testifies with some frequency in asbestos cases. None of the authors was, or is, an expert in the neuroepidemiology of ALS. Dr. Ratner’s conflict-of-interest disclosure in the Allen case report was, however, better than her efforts in previous publications that touched on the subject matter of her commercial consulting practice.21


1 Alan Bell, “Devastated by office chemicals, an attorney helps others fight toxic torts,” Am. Bar Ass’n J. (Nov. 2018).

2 See, e.g., American Academy of Allergy, Asthma and Immunology, “Idiopathic environmental intolerances,” 103 J. Allergy Clin. Immunol. 36 (1999).

3 See Susanne Bornschein, Constanze Hausteiner, Horst Römmelt, Dennis Nowak, Hans Förstl, and Thomas Zilker, “Double-blind placebo-controlled provocation study in patients with subjective Multiple Chemical Sensitivity and matched control subjects,” 46 Clin. Toxicol. 443 (2008); Susanne Bornschein, Hans Förstl, and Thomas Zilker, “Idiopathic environmental intolerances (formerly multiple chemical sensitivity) psychiatric perspectives,” 250 J. Intern. Med. 309 (2001).

4 Poisoned: How a Crime-Busting Prosecutor Turned His Medical Mystery into a Crusade for Environmental Victims (Skyhorse Publishing 2017).

5 Steven H. Foskett Jr., “Late Holy Cross coach’s family, insurers settle lawsuit for $681K,” Telegram & Gazette (Oct. 1, 2009). Obviously, the settlement amount represented a deep compromise over any plaintiff’s verdict.

6 Marcia H. Ratner, Joe F. Jabre, William M. Ewing, Mohamed Abou-Donia, and L. Christine Oliver, “Amyotrophic lateral sclerosis—A case report and mechanistic review of the association with toluene and other volatile organic compounds,” 61 Am. J. Ind. Med. 251 (2018).

7 Id. at 251.

8 Id. at 258 (emphasis added).

9 Marcia Hillary Ratner, Age at Onset of Parkinson’s Disease Among Subjects Occupationally Exposed to Metals and Pesticides; Doctoral Dissertation, UMI Number 3125932, Boston University (2004). Neither Ratner’s dissertation supervisor nor her three readers were epidemiologists.

11 See Brad A. Racette, S.D. Tabbal, D. Jennings, L. Good, Joel S. Perlmutter, and Brad Evanoff, “Prevalence of parkinsonism and relationship to exposure in a large sample of Alabama welders,” 64 Neurology 230 (2005).

13 See “Quincy District Court News,” Patriot Ledger (June 9, 2010) (reporting that Ratner pleaded guilty to criminal possession of mace and a firearm).

14 Ratner v. Village Square at Pico Condominium Owners Ass’n, Inc., No. 91-2-11 Rdcv (Teachout, J., Aug. 28, 2012).

17 Bell is a client of the Worthy Marketing Group.

18 RMSE3d at 505-06 n.5, 512-13 n. 26, 540 n.88; see also Allen v. Martin Surfacing, 2009 WL 3461145, 2008 U.S. Dist. LEXIS 111658, 263 F.R.D. 47 (D. Mass. 2008) (holding that an industrial hygienist was qualified to testify about the concentration and duration of plaintiffs’ exposure to toluene and isocyanates).

20 Id. at 259. One of the plaintiffs’ expert witnesses, Richard W. Clapp, opted out of co-author status on this publication.

21 See Marcia H. Ratner & Edward Fitzgerald, “Understanding of the role of manganese in parkinsonism and Parkinson disease,” 88 Neurology 338 (2017) (claiming no relevant conflicts of interest); Marcia H. Ratner, David H. Farb, Josef Ozer, Robert G. Feldman, and Raymon Durso, “Younger age at onset of sporadic Parkinson’s disease among subjects occupationally exposed to metals and pesticides,” 7 Interdiscip. Toxicol. 123 (2014) (failing to make any disclosure of conflicts of interest). In one short case report written with Dr. Jonathan Rutchik, another expert witness who has actively participated for the plaintiffs’ litigation industry in welding fume cases, Dr. Ratner let on that she “occasionally” is asked to serve as an expert witness, but she failed to disclose that she has a business enterprise set up to commercialize her expert witness work. Jonathan Rutchik & Marcia H. Ratner, “Is it Possible for Late-Onset Schizophrenia to Masquerade as Manganese Psychosis?” 60 J. Occup. & Envt’l Med. E207 (2018) (“The authors have no current specific competing interests to declare. However, Dr. Rutchik served as expert witnesses [sic] in this case. Drs. Rutchik and Ratner are occasionally asked to serve as expert witnesses and/or consultants in occupational and environmental chemical exposure injury cases.”).

Confounding in Daubert, and Daubert Confounded

November 4th, 2018

ABERRANT DECISIONS

The Daubert trilogy and the statutory revisions to Rule 702 have not brought universal enlightenment. Many decisions reflect a curmudgeonly and dismissive approach to gatekeeping.

The New Jersey Experience

Until recently, New Jersey law looked as though it favored vigorous gatekeeping of invalid expert witness opinion testimony. The law as applied, however, was another matter, with most New Jersey judges keen to find ways to escape the logical and scientific implications of the articulated standards, at least in civil cases.1 For example, in Grassis v. Johns-Manville Corp., 248 N.J. Super. 446, 591 A.2d 671, 675 (App. Div. 1991), the intermediate appellate court discussed the possibility that confounders may lead to an erroneous inference of a causal relationship. Plaintiffs’ counsel claimed that occupational asbestos exposure causes colorectal cancer, but the available studies, inconsistent as they were, failed to assess the role of smoking, family history, and dietary factors. The court essentially shrugged its judicial shoulders and let a plaintiffs’ verdict stand, even though it was supported by expert witness testimony that had relied upon seriously flawed and confounded studies. Not surprisingly, 15 years after the Grassis case, the scientific community acknowledged what should have been obvious in 1991: the studies did not support a conclusion that asbestos causes colorectal cancer.2

This year, however, saw the New Jersey Supreme Court step in to help extricate the lower courts from their gatekeeping doldrums. In a case that involved the dismissal of plaintiffs’ expert witnesses’ testimony in over 2,000 Accutane cases, the New Jersey Supreme Court demonstrated how to close the gate on testimony that is based upon flawed studies and involves tenuous and unreliable inferences.3 There were other remarkable aspects of the Supreme Court’s Accutane decision. For instance, the Court put its weight behind the common-sense and accurate interpretation of Sir Austin Bradford Hill’s famous articulation of factors for causal judgment, which requires that sampling error, bias, and confounding be eliminated before assessing whether the observed association is strong, consistent, plausible, and the like.4

Cook v. Rockwell International

The litigation over radioactive contamination from the Colorado Rocky Flats nuclear weapons plant is illustrative of the retrograde tendency in some federal courts. The defense objected to plaintiffs’ expert witness, Dr. Clapp, whose study failed to account for known confounders.5 Judge Kane denied the challenge, claiming that the defense could:

“cite no authority, scientific or legal, that compliance with all, or even one, of these factors is required for Dr. Clapp’s methodology and conclusions to be deemed sufficiently reliable to be admissible under Rule 702. The scientific consensus is, in fact, to the contrary. It identifies Defendants’ list of factors as some of the nine factors or lenses that guide epidemiologists in making judgments about causation. Ref. Guide on Epidemiology at 375.”6

In Cook, the trial court or the parties or both missed the obvious references in the Reference Manual to the need to control for confounding. Certainly many other scientific sources could be cited as well. Judge Kane apparently took a defense expert witness’s statement that ecological studies do not account for confounders to mean that the presence of confounding does not render such studies unscientific. Id. True but immaterial. Ecological studies may be “scientific,” but they do not warrant inferences of causation. Some so-called scientific studies are merely hypothesis generating, preliminary, tentative, or data-dredging exercises. Judge Kane employed the flaws-are-features approach, and opined that ecological studies are merely “less probative” than other studies, and that the relative weights of studies do not render them inadmissible.7 This approach is, of course, a complete abdication of gatekeeping responsibility. First, studies themselves are not admissible; it is the expert witness whose testimony is challenged. The witness’s reliance upon studies is relevant to the Rule 702 and 703 analyses, but the studies’ admissibility is not the issue. Second, Rule 702 requires that the proffered opinion be “scientific knowledge,” and ecological studies simply lack the necessary epistemic warrant to support a causal conclusion. Third, the trial court in Cook had to ignore the federal judiciary’s own reference manual’s warnings about the inability of ecological studies to support causal inferences.8 The Cook case is part of an unfortunate trend to regard all studies as “flawed,” and their relative weights simply a matter of argument and debate for the litigants.9

Abilify

Another example of sloppy reasoning about confounding can be found in a recent federal trial court decision, In re Abilify Products Liability Litigation,10 where the trial court advanced a futility analysis: all observational studies have potential confounding, and so confounding is not an error but a feature. Given this simplistic position, it follows that failure to control for every imaginable potential confounder does not invalidate an epidemiologic study.11 From this nihilistic starting point, the trial court readily found that an expert witness could reasonably dispense with controlling for psychiatric confounding in studies of a putative association between the antipsychotic medication Abilify and gambling disorders.12

Under this sort of “reasoning,” some criminal defense lawyers might argue that since all human beings are “flawed,” we have no basis to distinguish sinners from saints. We have a long way to go before our courts are part of the evidence-based world.


1 In the context of a “social justice” issue such as whether race disparities exist in death penalty cases, the New Jersey courts have carefully considered confounding in their analyses. See In re Proportionality Review Project (II), 165 N.J. 206, 757 A.2d 168 (2000) (noting that bivariate analyses of race and capital sentences were confounded by missing important variables). Unlike the New Jersey courts (until the recent decision in Accutane), the Texas courts were quick to adopt the principles and policies of gatekeeping expert witness opinion testimony. See Merrell Dow Pharms., Inc. v. Havner, 953 S.W.2d 706, 714, 724 (Tex. 1997) (reviewing court should consider whether the studies relied upon were scientifically reliable, including consideration of the presence of confounding variables). Even some so-called Frye jurisdictions “get it.” See, e.g., Porter v. SmithKline Beecham Corp., No. 3516 EDA 2015, 2017 WL 1902905, at *6 (Phila. Super., May 8, 2017) (unpublished) (affirming exclusion of plaintiffs’ expert witness on epidemiology, under the Frye test, for relying upon an epidemiologic study that failed to exclude confounding as an explanation for a putative association), affirming Mem. Op., No. 03275, 2015 WL 5970639 (Phila. Ct. Com. Pl. Oct. 5, 2015) (Bernstein, J.), and Op. sur Appellate Issues (Phila. Ct. Com. Pl., Feb. 10, 2016) (Bernstein, J.).

3 In re Accutane Litig., ___ N.J. ___, ___ A.3d ___, 2018 WL 3636867 (2018); see “N.J. Supreme Court Uproots Weeds in Garden State’s Law of Expert Witnesses” (Aug. 8, 2018).

4 2018 WL 3636867, at *20 (citing the Reference Manual 3d ed., at 597-99).

5 Cook v. Rockwell Internat’l Corp., 580 F. Supp. 2d 1071, 1098 (D. Colo. 2006) (“Defendants next claim that Dr. Clapp’s study and the conclusions he drew from it are unreliable because they failed to comply with four factors or criteria for drawing causal interferences from epidemiological studies: accounting for known confounders … .”), rev’d and remanded on other grounds, 618 F.3d 1127 (10th Cir. 2010), cert. denied, ___ U.S. ___ (May 24, 2012). For another example of a trial court refusing to see through important qualitative differences between and among epidemiologic studies, see In re Welding Fume Prods. Liab. Litig., 2006 WL 4507859, *33 (N.D. Ohio 2006) (reducing all studies to one level, and treating all criticisms as though they rendered all studies invalid).

6 Id.   

7 Id.

8 RMSE3d at 561-62 (“[ecological] studies may be useful for identifying associations, but they rarely provide definitive causal answers”) (internal citations omitted); see also David A. Freedman, “Ecological Inference and the Ecological Fallacy,” in Neil J. Smelser & Paul B. Baltes, eds., 6 Internat’l Encyclopedia of the Social and Behavioral Sciences 4027 (2001).

9 See also McDaniel v. CSX Transportation, Inc., 955 S.W.2d 257 (Tenn. 1997) (considering confounding but holding that it was a jury issue); Perkins v. Origin Medsystems Inc., 299 F. Supp. 2d 45 (D. Conn. 2004) (striking reliance upon a study with uncontrolled confounding, but allowing the expert witness to testify anyway).

10 In re Abilify (Aripiprazole) Prods. Liab. Litig., 299 F. Supp. 3d 1291 (N.D. Fla. 2018).

11 Id. at 1322-23 (citing Bazemore as a purported justification for the court’s nihilistic approach); see Bazemore v. Friday, 478 U.S. 385, 400 (1986) (“Normally, failure to include variables will affect the analysis’ probativeness, not its admissibility.”).

12 Id. at 1325.


Appendix – Some Federal Court Decisions on Confounding

1st Circuit

Bricklayers & Trowel Trades Internat’l Pension Fund v. Credit Suisse Sec. (USA) LLC, 752 F.3d 82, 85 (1st Cir. 2014) (affirming exclusion of expert witness whose event study and causal conclusion failed to consider relevant confounding variables and information that entered market on the event date)

2d Circuit

In re “Agent Orange” Prod. Liab. Litig., 597 F. Supp. 740, 783 (E.D.N.Y. 1984) (noting that confounding had not been sufficiently addressed in a study of U.S. servicemen exposed to Agent Orange), aff’d, 818 F.2d 145 (2d Cir. 1987) (approving district court’s analysis), cert. denied sub nom. Pinkney v. Dow Chemical Co., 484 U.S. 1004 (1988)

3d Circuit

In re Zoloft Prods. Liab. Litig., 858 F.3d 787, 793, 799 (3d Cir. 2017) (acknowledging that statistically significant findings can occur in the presence of inadequately controlled confounding or bias; affirming the exclusion of statistical expert witness Nicholas Jewell, in part for using an admittedly non-rigorous approach to adjusting for confounding by indication)

4th Circuit

Gross v. King David Bistro, Inc., 83 F. Supp. 2d 597 (D. Md. 2000) (excluding expert witness who opined shigella infection caused fibromyalgia, given the existence of many confounding factors that muddled the putative association)

5th Circuit

Kelley v. American Heyer-Schulte Corp., 957 F. Supp. 873 (W.D. Tex. 1997) (noting that observed association may be causal or spurious, and that confounding factors must be considered to distinguish spurious from real associations)

Brock v. Merrell Dow Pharms., Inc., 874 F.2d 307, 311 (5th Cir. 1989) (noting that “[o]ne difficulty with epidemiologic studies is that often several factors can cause the same disease.”)

6th Circuit

Nelson v. Tennessee Gas Pipeline Co., 1998 WL 1297690, at *4 (W.D. Tenn. Aug. 31, 1998) (excluding an expert witness who failed to take into consideration confounding factors), aff’d, 243 F.3d 244, 252 (6th Cir. 2001), cert. denied, 534 U.S. 822 (2001)

Adams v. Cooper Indus. Inc., 2007 WL 2219212, 2007 U.S. Dist. LEXIS 55131 (E.D. Ky. 2007) (differential diagnosis includes ruling out confounding causes of plaintiffs’ disease).

7th Circuit

People Who Care v. Rockford Bd. of Educ., 111 F.3d 528, 537-38 (7th Cir. 1997) (Posner, J.) (“a statistical study that fails to correct for salient explanatory variables, or even to make the most elementary comparisons, has no value as causal explanation and is therefore inadmissible in a federal court”) (multiple regression of educational achievement)

Sheehan v. Daily Racing Form, Inc., 104 F.3d 940 (7th Cir. 1997) (holding that expert witness’s opinion, which failed to correct for any potential explanatory variables other than age, was inadmissible)

Allgood v. General Motors Corp., 2006 WL 2669337, at *11 (S.D. Ind. 2006) (noting that confounding factors must be carefully addressed; holding that selection bias rendered expert testimony inadmissible)

9th Circuit

In re Bextra & Celebrex Marketing Sales Practices & Prod. Liab. Litig., 524 F. Supp. 2d 1166, 1178-79 (N.D. Cal. 2007) (noting plaintiffs’ expert witnesses’ inconsistent criticism of studies for failing to control for confounders; excluding opinions that Celebrex at 200 mg/day can cause heart attacks, as failing to satisfy Rule 702)

Avila v. Willits Envt’l Remediation Trust, 2009 WL 1813125, 2009 U.S. Dist. LEXIS 67981 (N.D. Cal. 2009) (excluding expert witness’s opinion in part because of his failure to rule out confounding exposures and risk factors for the outcomes of interest), aff’d in relevant part, 633 F.3d 828 (9th Cir.), cert. denied, 132 S. Ct. 120 (2011)

Hendricksen v. ConocoPhillips Co., 605 F. Supp. 2d 1142, 1158 (E.D. Wash. 2009) (“In general, epidemiology studies are probative of general causation: a relative risk greater than 1.0 means the product has the capacity to cause the disease. Where the study properly accounts for potential confounding factors and concludes that exposure to the agent is what increases the probability of contracting the disease, the study has demonstrated general causation – that exposure to the agent is capable of causing [the illness at issue] in the general population.”) (internal quotation marks and citation omitted)

Valentine v. Pioneer Chlor Alkali Co., Inc., 921 F. Supp. 666, 677 (D. Nev. 1996) (“In summary, Dr. Kilburn’s study suffers from very serious flaws. He took no steps to eliminate selection bias in the study group, he failed to identify the background rate for the observed disorders in the Henderson community, he failed to control for potential recall bias, he simply ignored the lack of reliable dosage data, he chose a tiny sample size, and he did not attempt to eliminate so-called confounding factors which might have been responsible for the incidence of neurological disorders in the subject group.”)

Claar v. Burlington N. R.R., 29 F.3d 499 (9th Cir. 1994) (affirming exclusion of plaintiffs’ expert witnesses, and grant of summary judgment, when plaintiffs’ witnesses concluded that the plaintiffs’ injuries were caused by exposure to toxic chemicals, without investigating any other possible causes).

10th Circuit

Hollander v. Sandoz Pharms. Corp., 289 F.3d 1193, 1213 (10th Cir. 2002) (affirming exclusion in Parlodel case involving stroke; confounding makes case reports inappropriate bases for causal inferences, and even observational epidemiologic studies must be evaluated carefully for confounding)

D.C. Circuit

American Farm Bureau Fed’n v. EPA, 559 F.3d 512 (D.C. Cir. 2009) (noting that, in setting particulate matter standards addressing visibility, the agency should avoid relying upon data that failed to control for the confounding effects of humidity)

Confounding in the Courts

November 2nd, 2018

Confounding in the Lower Courts

To some extent, lower courts, especially in the federal court system, got the message: Rule 702 required them to think about the evidence, and to consider threats to validity. Institutionally, there were signs of resistance to the process. Most judges were clearly much more comfortable with proxies of validity, such as qualification, publication, peer review, and general acceptance. Unfortunately for them, the Supreme Court had spoken, and then, in 2000, the Rules Committee and Congress spoke by revising Rule 702 to require a searching review of the studies upon which challenged expert witnesses were relying. Some of the cases involving confounding of one sort or another follow.

Confounding and Statistical Significance

Some courts and counsel confuse statistical significance with confounding, and suggest that a showing of statistical significance eliminates concern over confounding. This is, as several commentators have indicated, quite wrong.1 Despite the widespread criticism of this mistake in the Brock opinion, lawyers continue to repeat the mistake. One big-firm defense lawyer, for instance, claimed that “a statistically significant confidence interval helps ensure that the findings of a particular study are not due to chance or some other confounding factors.”2
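The fallacy can be made concrete with a small computational sketch. In the hypothetical counts below (invented solely for illustration), the crude comparison of exposed with unexposed subjects is highly “statistically significant” by Fisher’s Exact Test, and yet the association is manufactured entirely by a lurking variable; within each stratum of the confounder, the exposed and unexposed groups have identical risks:

```python
# Hypothetical counts illustrating a "significant" but wholly confounded
# association. Each table row is (cases, non-cases).
from scipy.stats import fisher_exact

# Stratum 1 (say, smokers): 30% risk in exposed and unexposed alike.
exposed_1, unexposed_1 = (150, 350), (75, 175)
# Stratum 2 (say, nonsmokers): 10% risk in exposed and unexposed alike.
exposed_2, unexposed_2 = (25, 225), (50, 450)

# The crude table collapses over the confounder.
crude = [
    [exposed_1[0] + exposed_2[0], exposed_1[1] + exposed_2[1]],
    [unexposed_1[0] + unexposed_2[0], unexposed_1[1] + unexposed_2[1]],
]

for label, table in [("crude", crude),
                     ("stratum 1", [exposed_1, unexposed_1]),
                     ("stratum 2", [exposed_2, unexposed_2])]:
    odds_ratio, p = fisher_exact(table)
    print(f"{label:>9}: OR = {odds_ratio:.2f}, two-sided p = {p:.3f}")

# Typical output: the crude table gives OR = 1.52 with p ~ 0.001, while
# each stratum gives OR = 1.00 with p = 1.0. The "significant" result
# reflects only the confounder's uneven distribution across the groups.
```

A significance test speaks only to the play of chance under the model’s assumptions; it says nothing about whether the comparison was confounded in the first place.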

Confounding and “Effect Size”

Study “effect size” has a role in evaluating potential invalidity due to confounding, but that role is frequently more nuanced than courts acknowledge. The phrase “effect size,” of course, is misleading in that it refers to the magnitude of an association, which may or may not be causal. This is one among many instances of sloppy terminology in statistical and epidemiologic science. Nonetheless, the magnitude of the relative risk may play a role in evaluating whether observational analytical epidemiologic studies can support a causal inference.

Small Effect Size

If the so-called effect size is low, say about 2.0 or less, then actual, potential, or residual confounding (or bias) may well account for the entirety of the association.3 Many other well-known authors have concurred, with some setting the bar considerably higher, asking for risk ratios of three or more before accepting that a “clear-cut” association, unthreatened by confounding, has been shown.4

Large Effect Size

Some courts have acknowledged that a strong association, with a high relative risk (without committing to what is “high”), increases the likelihood of a causal relationship, even while proceeding to ignore the effects of confounding.5 The Reference Manual suggests that a large effect size, such as for smoking and lung cancer (greater than ten-fold, and often higher than 30-fold), eliminates the need to worry about confounding:

“Many confounders have been proposed to explain the association between smoking and lung cancer, but careful epidemiological studies have ruled them out, one after the other.”6

*  *  *  *  *  *

“A relative risk of 10, as seen with smoking and lung cancer, is so high that it is extremely difficult to imagine any bias or confounding factor that might account for it. The higher the relative risk, the stronger the association and the lower the chance that the effect is spurious. Although lower relative risks can reflect causality, the epidemiologist will scrutinize such associations more closely because there is a greater chance that they are the result of uncontrolled confounding or biases.”7

The point about “difficult to imagine” is fair enough in the context of smoking and lung cancer, but only because no other putative confounder presents such a high relative risk in most studies. In studying other epidemiologic associations of high magnitude, the absence of competing risks or correlations from lurking variables would need to be shown independently, rather than assumed by analogy to the “case study” of smoking and lung cancer.
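The contrast between small and large risk ratios can be quantified. One published sensitivity measure, the “E-value” of VanderWeele and Ding, asks how strongly an unmeasured confounder would have to be associated, on the risk-ratio scale, with both exposure and outcome to explain away an observed association entirely. A minimal sketch, applying their formula E = RR + √(RR × (RR − 1)), and nothing any of the courts discussed here performed:

```python
# A minimal sketch of the E-value (VanderWeele & Ding 2017): the minimum
# strength of association, on the risk-ratio scale, that an unmeasured
# confounder would need with both exposure and outcome to explain away
# an observed risk ratio completely.
import math

def e_value(rr: float) -> float:
    """E-value for an observed risk ratio greater than 1."""
    if rr <= 1:
        raise ValueError("for protective associations, apply to 1/RR")
    return rr + math.sqrt(rr * (rr - 1))

for rr in (1.5, 2.0, 10.0):
    print(f"observed RR = {rr:>4}: E-value = {e_value(rr):.2f}")

# observed RR =  1.5: E-value = 2.37
# observed RR =  2.0: E-value = 3.41
# observed RR = 10.0: E-value = 19.49
```

A confounder of quite ordinary strength could thus account for a risk ratio of 1.5 or 2.0, but explaining away a ten-fold risk would require a confounder with associations of a magnitude rarely seen outside of smoking itself.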

Regression and Other Statistical Analyses

The failure to include a lurking or confounding variable may render a regression analysis invalid and meaningless. The Supreme Court, however, in Bazemore, a case decided before its own decision in Daubert, and before Rule 702 was statutorily modified,8 issued a Supreme ipse dixit, holding that the selection or omission of variables in multiple regression raises an issue that affects only the weight, not the admissibility, of the analysis:

“Normally, failure to include variables will affect the analysis’ probativeness, not its admissibility.”9

The Supreme Court did, however, acknowledge in Bazemore that:

“There may, of course, be some regressions so incomplete as to be inadmissible as irrelevant; but such was clearly not the case here.”10

The footnote in Bazemore is telling; the majority could imagine or hypothesize a multiple regression so incomplete that it would be irrelevant, but it never thought to ask whether a relevant regression could be so incomplete as to be unreliable or invalid. The invalidity of the regression in Bazemore does not appear to have been raised as an evidentiary issue under Rule 702. None of the briefs in the Supreme Court or the judicial opinions cited or discussed Rule 702.
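A short simulation, with hypothetical data (nothing drawn from the Bazemore record), shows why an omitted lurking variable is a validity problem and not a mere matter of “weight”: when a variable correlated with both the predictor of interest and the outcome is left out of the model, the estimated coefficient can be badly biased, here away from a true value of zero:

```python
# Hypothetical illustration of omitted-variable bias. The outcome y
# depends only on the lurking variable z; x has no true effect but is
# correlated with z. Omitting z from the regression makes x look causal.
import numpy as np

rng = np.random.default_rng(1986)   # seed chosen as the year of Bazemore
n = 10_000

z = rng.normal(size=n)              # the lurking (omitted) variable
x = 0.8 * z + rng.normal(size=n)    # x is correlated with z
y = 2.0 * z + rng.normal(size=n)    # y depends on z alone; x effect = 0

def ols(design, outcome):
    """Ordinary least squares coefficients via numpy's lstsq."""
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs

ones = np.ones(n)
short_model = ols(np.column_stack([ones, x]), y)     # z omitted
full_model = ols(np.column_stack([ones, x, z]), y)   # z included

print(f"x coefficient, z omitted:  {short_model[1]:.3f}")   # ~0.98, spurious
print(f"x coefficient, z included: {full_model[1]:.3f}")    # ~0.00, correct
```

Whether a real regression is “so incomplete as to be inadmissible” is exactly the question that a validity inquiry under Rule 702 would ask, and that a weight-only rule forecloses.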

Despite Bazemore’s doubtful authority after the Court decided Daubert, many lower court decisions have treated Bazemore as dispositive of reliability challenges to regression analyses, without any meaningful discussion.11 In the last several years, however, the appellate courts have awakened, on occasion, to their responsibility to ensure that the opinions of statistical expert witnesses, based upon regression analyses, are evaluated through the lens of Rule 702.12


1 Brock v. Merrell Dow Pharms., Inc., 874 F.2d 307, 311-12 (5th Cir. 1989) (“Fortunately, we do not have to resolve any of the above questions [as to bias and confounding], since the studies presented to us incorporate the possibility of these factors by the use of a confidence interval.”). See, e.g., David Kaye, David Bernstein & Jennifer Mnookin, The New Wigmore – A Treatise on Evidence: Expert Evidence § 12.6.4, at 546 (2d ed. 2011); Michael O. Finkelstein, Basic Concepts of Probability and Statistics in the Law 86-87 (2009) (criticizing the blatantly incorrect interpretation of confidence intervals by the Brock court).

2 Zach Hughes, “The Legal Significance of Statistical Significance,” 28 Westlaw Journal: Pharmaceutical 1, 2 (Mar. 2012).

3 See Norman E. Breslow & N. E. Day, “Statistical Methods in Cancer Research,” in The Analysis of Case-Control Studies 36 (IARC Pub. No. 32, 1980) (“[r]elative risks of less than 2.0 may readily reflect some unperceived bias or confounding factor”); David A. Freedman & Philip B. Stark, “The Swine Flu Vaccine and Guillain-Barré Syndrome: A Case Study in Relative Risk and Specific Causation,” 64 Law & Contemp. Probs. 49, 61 (2001) (“If the relative risk is near 2.0, problems of bias and confounding in the underlying epidemiologic studies may be serious, perhaps intractable.”).

4 See, e.g., Richard Doll & Richard Peto, The Causes of Cancer 1219 (1981) (“when relative risk lies between 1 and 2 … problems of interpretation may become acute, and it may be extremely difficult to disentangle the various contributions of biased information, confounding of two or more factors, and cause and effect.”); Ernst L. Wynder & Geoffrey C. Kabat, “Environmental Tobacco Smoke and Lung Cancer: A Critical Assessment,” in H. Kasuga, ed., Indoor Air Quality 5, 6 (1990) (“An association is generally considered weak if the odds ratio is under 3.0 and particularly when it is under 2.0, as is the case in the relationship of ETS and lung cancer. If the observed relative risk is small, it is important to determine whether the effect could be due to biased selection of subjects, confounding, biased reporting, or anomalies of particular subgroups.”); David A. Grimes & Kenneth F. Schulz, “False alarms and pseudo-epidemics: the limitations of observational epidemiology,” 120 Obstet. & Gynecol. 920 (2012) (“Most reported associations in observational clinical research are false, and the minority of associations that are true are often exaggerated. This credibility problem has many causes, including the failure of authors, reviewers, and editors to recognize the inherent limitations of these studies. This issue is especially problematic for weak associations, variably defined as relative risks (RRs) or odds ratios (ORs) less than 4.”); Ernst L. Wynder, “Epidemiological issues in weak associations,” 19 Internat’l J. Epidemiol. S5 (1990); Sharon E. Straus, W. Scott Richardson, Paul Glasziou & R. Brian Haynes, Evidence-Based Medicine: How to Practice and Teach EBM (3d ed. 2005); Samuel Shapiro, “Bias in the evaluation of low-magnitude associations: an empirical perspective,” 151 Am. J. Epidemiol. 939 (2000); Samuel Shapiro, “Looking to the 21st century: have we learned from our mistakes, or are we doomed to compound them?” 13 Pharmacoepidemiol. & Drug Safety 257 (2004); Muin J. Khoury, Levy M. James, W. Dana Flanders & David J. Erickson, “Interpretation of recurring weak associations obtained from epidemiologic studies of suspected human teratogens,” 46 Teratology 69 (1992); Mark Parascandola, Douglas L. Weed & Abhijit Dasgupta, “Two Surgeon General’s reports on smoking and cancer: a historical investigation of the practice of causal inference,” 3 Emerging Themes in Epidemiol. 1 (2006); David Sackett, R. Brian Haynes, Gordon Guyatt & Peter Tugwell, Clinical Epidemiology: A Basic Science for Clinical Medicine (2d ed. 1991); Gary Taubes, “Epidemiology Faces Its Limits,” 269 Science 164, 168 (July 14, 1995) (quoting Marcia Angell, former editor of the New England Journal of Medicine: “[a]s a general rule of thumb, we are looking for a relative risk of 3 or more [before accepting a paper for publication], particularly if it is biologically implausible or if it’s a brand new finding.”) (quoting John C. Bailar: “If you see a 10-fold relative risk and it’s replicated and it’s a good study with biological backup, like we have with cigarettes and lung cancer, you can draw a strong inference. * * * If it’s a 1.5 relative risk, and it’s only one study and even a very good one, you scratch your chin and say maybe.”); Lynn Rosenberg, “Induced Abortion and Breast Cancer: More Scientific Data Are Needed,” 86 J. Nat’l Cancer Instit. 1569, 1569 (1994) (“A typical difference in risk (50%) is small in epidemiologic terms and severely challenges our ability to distinguish if it reflects cause and effect or if it simply reflects bias.”) (commenting upon Janet R. Daling, K. E. Malone, L. F. Voigt, E. White & Noel S. Weiss, “Risk of breast cancer among young women: relationship to induced abortion,” 86 J. Nat’l Cancer Instit. 1584 (1994)); Linda Anderson, “Abortion and possible risk for breast cancer: analysis and inconsistencies” (Wash. D.C., Nat’l Cancer Institute, Oct. 26, 1994) (“In epidemiologic research, relative risks of less than 2 are considered small and are usually difficult to interpret. Such increases may be due to chance, statistical bias, or effects of confounding factors that are sometimes not evident.”); Washington Post (Oct. 27, 1994) (quoting Dr. Eugenia Calle, Director of Analytic Epidemiology for the American Cancer Society: “Epidemiological studies, in general are probably not able, realistically, to identify with any confidence any relative risks lower than 1.3 (that is a 30% increase in risk); in that context, the 1.5 [reported relative risk of developing breast cancer after abortion] is a modest elevation compared to some other risk factors that we know cause disease.”). See also “General Causation and Epidemiologic Measures of Risk Size” (Nov. 24, 2012). Even expert witnesses for the litigation industry have agreed that small risk ratios (under two) are questionable for potential and residual confounding. David F. Goldsmith & Susan G. Rose, “Establishing Causation with Epidemiology,” in Tee L. Guidotti & Susan G. Rose, eds., Science on the Witness Stand: Evaluating Scientific Evidence in Law, Adjudication, and Policy 57, 60 (2001) (“There is no clear consensus in the epidemiology community regarding what constitutes a ‘strong’ relative risk, although, at a minimum, it is likely to be one where the RR is greater than two; i.e., one in which the risk among the exposed is at least twice as great as among the unexposed.”).

5 See King v. Burlington Northern Santa Fe Railway Co., 762 N.W.2d 24, 40 (Neb. 2009) (“the higher the relative risk, the greater the likelihood that the relationship is causal”).

6 RMSE3d at 219.

7 RMSE3d at 602. See Landrigan v. Celotex Corp., 127 N.J. 404, 605 A.2d 1079, 1086 (1992) (“The relative risk of lung cancer in cigarette smokers as compared to nonsmokers is on the order of 10:1, whereas the relative risk of pancreatic cancer is about 2:1. The difference suggests that cigarette smoking is more likely to be a causal factor for lung cancer than for pancreatic cancer.”).

8 See Federal Rule of Evidence 702, Pub. L. 93–595, § 1, Jan. 2, 1975, 88 Stat. 1937, amended Apr. 17, 2000, eff. Dec. 1, 2000, and Apr. 26, 2011, eff. Dec. 1, 2011.

9 Bazemore v. Friday, 478 U.S. 385, 400 (1986) (reversing the Court of Appeals’ decision that would have disallowed a multiple regression analysis that omitted important variables).

10 Id. at 400 n. 10.

11 See, e.g., Manpower, Inc. v. Insurance Company of the State of Pennsylvania, 732 F.3d 796, 799 (7th Cir. 2013) (“the Supreme Court and this Circuit have confirmed on a number of occasions that the selection of the variables to include in a regression analysis is normally a question that goes to the probative weight of the analysis rather than to its admissibility.”); Cullen v. Indiana Univ. Bd. of Trustees, 338 F.3d 693, 701-02 & n.4 (7th Cir. 2003) (citing Bazemore in rejecting challenge to expert witness’s omission of variables in regression analysis); In re High Fructose Corn Syrup Antitrust Litig., 295 F.3d 651, 660-61 (7th Cir. 2002) (refusing to exclude expert witness opinion testimony based upon regression analyses flawed by omission of key variables); Adams v. Ameritech Servs., Inc., 231 F.3d 414, 423 (7th Cir. 2000) (relying upon Bazemore to affirm statistical analysis based upon correlation, with no regression analysis). See also “The Seventh Circuit Regresses on Rule 702” (Oct. 29, 2013).

12 See, e.g., ATA Airlines, Inc. v. Fed. Express Corp., 665 F.3d 882, 888–89 (7th Cir. 2011) (Posner, J.) (reversing on grounds that plaintiff’s regression analysis should never have been admitted), cert. denied, 2012 WL 189940 (Oct. 7, 2012); Zenith Elec. Corp. v. WH-TV Broad. Corp., 395 F.3d 416 (7th Cir.) (affirming exclusion of expert witness opinion whose extrapolations were mere “ipse dixit”), cert. denied, 125 S. Ct. 2978 (2005); Sheehan v. Daily Racing Form, Inc., 104 F.3d 940 (7th Cir. 1997) (Posner, J.) (discussing specification error). See also Munoz v. Orr, 200 F.3d 291 (5th Cir. 2000). For a more enlightened and educated view of regression and the scope and application of Rule 702, from another Seventh Circuit panel, Judge Posner’s decision in ATA Airlines, supra, is a good starting place. See “Judge Posner’s Digression on Regression” (April 6, 2012).

Rule 702 Requires Courts to Sort Out Confounding

October 31st, 2018

CONFOUNDING1

Back in 2000, several law professors wrote an essay, in which they detailed some of the problems courts experienced in expert witness gatekeeping. Their article noted that judges easily grasped the problem of generalizing from animal evidence to human experience, and thus they simplistically emphasized human (epidemiologic) data. But in their emphasis on the problems in toxicological evidence, the judges missed problems of internal validity, such as confounding, in epidemiologic studies:

“Why do courts have such a preference for human epidemiological studies over animal experiments? Probably because the problem of external validity (generalizability) is one of the most obvious aspects of research methodology, and therefore one that non-scientists (including judges) are able to discern with ease – and then give excessive weight to (because whether something generalizes or not is an empirical question; sometimes things do and other times they do not). But even very serious problems of internal validity are harder for the untrained to see and understand, so judges are slower to exclude inevitably confounded epidemiological studies (and give insufficient weight to that problem). Sophisticated students of empirical research see the varied weaknesses, want to see the varied data, and draw more nuanced conclusions.”2

I am not sure that the problems are related in the fashion suggested by the authors, but their assessment, that judges may be reluctant to break the seal on the black box of epidemiology, and that judges frequently lack the ability to make nuanced evaluations of the studies on which expert witnesses rely, seems fair enough. Judges continue to miss important validity issues, perhaps because the adversarial process levels all studies to debating points in litigation.3

The frequent existence of validity issues undermines the partisan suggestion that Rule 702 exclusions are merely about “sufficiency of the evidence.” Sometimes, there is just too much of nothing to rise even to a problem of insufficiency. Some studies are “not even wrong.”4 Similarly, validity issues are an embarrassment to those authors who argue that we must assemble all the evidence and consider the entirety under ethereal standards, such as “weight of the evidence,” or “inference to the best explanation.” Sometimes, some or much of the available evidence does not warrant inclusion in the data set at all, and any causal inference is unacceptable.

Threats to validity come in many forms, but confounding is a particularly dangerous one. In claims that substances such as diesel fume or crystalline silica cause lung cancer, confounding is a huge problem. The proponents of the claims suggest relative risks in the range of 1.1 to 1.6 for such substances, but tobacco smoking results in relative risks in excess of 20, and some claim that passive smoking at home or in the workplace results in relative risks of the same magnitude as the risk ratios claimed for diesel particulate or silica. Furthermore, the studies behind these claims frequently involve exposures to other known or suspected lung carcinogens, such as arsenic, radon, dietary factors, and asbestos, among others.
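Simple arithmetic, with hypothetical numbers, shows how easily smoking alone could manufacture a risk ratio in that range. Suppose smokers face twenty times the lung cancer risk of nonsmokers, and that 60 percent of the occupationally exposed group smokes, against 40 percent of the comparison group. Even if the occupational exposure itself does nothing:

```python
# Hypothetical arithmetic: differential smoking prevalence alone can
# generate a spurious risk ratio for an occupational exposure.
RR_SMOKING = 20.0    # assumed relative risk of lung cancer in smokers
P_EXPOSED = 0.60     # assumed smoking prevalence, exposed workers
P_UNEXPOSED = 0.40   # assumed smoking prevalence, comparison group

def group_rate(p_smokers: float) -> float:
    """A group's average lung cancer rate, relative to nonsmokers."""
    return p_smokers * RR_SMOKING + (1 - p_smokers) * 1.0

crude_rr = group_rate(P_EXPOSED) / group_rate(P_UNEXPOSED)
print(f"risk ratio from confounding alone: {crude_rr:.2f}")   # 1.44
```

On these assumptions, a wholly null occupational exposure presents with a risk ratio of about 1.44, comfortably within the range claimed for diesel particulate and silica.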

Definition of Confounding

Confounding results from the presence of a so-called confounding (or lurking) variable, helpfully defined in the chapter on statistics in the Reference Manual on Scientific Evidence:

“confounding variable; confounder. A confounder is correlated with the independent variable and the dependent variable. An association between the dependent and independent variables in an observational study may not be causal, but may instead be due to confounding. See controlled experiment; observational study.”5

This definition suggests that the confounder need not be known to cause the dependent variable or outcome; the confounder need only be correlated with the outcome and with an independent variable, such as exposure. Furthermore, the confounder may operate to increase or decrease the estimated relationship between dependent and independent variables. A confounder known to be present typically is referred to as an “actual” confounder, as opposed to one that may be at work, known as a “potential” confounder. Even after exhausting known and potential confounders, studies may be affected by “residual” confounding, especially when the total array of causes of the outcome of interest is not understood, and these unknown causes are not randomly distributed between exposed and unexposed groups in epidemiologic studies. Litigation frequently involves diseases or outcomes with unknown causes, and so the reality of unidentified residual confounders is unavoidable.

In some instances, especially in studies of pharmaceutical adverse outcomes, there is the danger that the hypothesized outcome is also a feature of the underlying disease being treated. This phenomenon is known as confounding by indication, or indication bias.6

Kaye and Freedman’s statistics chapter notes that confounding is a particularly important consideration when evaluating observational studies. In randomized clinical trials, one goal of the randomization is the elimination of the role of bias and confounding by the random assignment of exposures:

2. Randomized controlled experiments

“In randomized controlled experiments, investigators assign subjects to treatment or control groups at random. The groups are therefore likely to be comparable, except for the treatment. This minimizes the role of confounding.”7
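A simulation with invented data shows the mechanism at work. When treatment is assigned by coin flip, a strong risk factor winds up nearly balanced between the arms, and the crude comparison is approximately unconfounded; when subjects sort themselves into “treated” and “untreated” groups in a way correlated with the risk factor, the crude comparison is biased:

```python
# Hypothetical simulation: randomization balances a strong risk factor
# across groups; self-selection does not.
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
risk_factor = rng.random(n) < 0.3                 # 30% carry a risk factor
p_disease = np.where(risk_factor, 0.30, 0.05)     # treatment has no effect

# Observational assignment: carriers are likelier to end up "treated".
observational = rng.random(n) < np.where(risk_factor, 0.7, 0.3)
# Randomized assignment: a coin flip, independent of the risk factor.
randomized = rng.random(n) < 0.5

disease = rng.random(n) < p_disease

for label, treated in [("observational", observational),
                       ("randomized", randomized)]:
    rr = disease[treated].mean() / disease[~treated].mean()
    print(f"{label:>13}: crude risk ratio = {rr:.2f}")

# Typical output: the observational comparison shows a risk ratio near
# 2.0 for a treatment with no effect at all; the randomized comparison
# hovers around 1.0.
```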

In observational studies, confounding may completely invalidate an association. Kaye and Freedman give an example from the epidemiologic literature:

“Confounding remains a problem to reckon with, even for the best observational research. For example, women with herpes are more likely to develop cervical cancer than other women. Some investigators concluded that herpes caused cancer: In other words, they thought the association was causal. Later research showed that the primary cause of cervical cancer was human papilloma virus (HPV). Herpes was a marker of sexual activity. Women who had multiple sexual partners were more likely to be exposed not only to herpes but also to HPV. The association between herpes and cervical cancer was due to other variables.”8

The problem identified as confounding by Freedman and Kaye cannot be dismissed as an issue that merely goes to the “weight” of the study; the confounding goes to the heart of the ability of the herpes studies to show an association that can be interpreted as causal. Invalidity from confounding renders the studies “weightless” in any “weight of the evidence” approach. There are, of course, many ways to address confounding in studies: stratification, multivariate analyses, multiple regression, propensity scores, and the like; the sketch below illustrates the first of these. Consideration of the propriety and efficacy of these methods is a whole other level of analysis, which does not arise unless and until the threshold question of confounding is addressed.
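For a taste of stratification, the following sketch computes a Mantel-Haenszel summary risk ratio, a standard stratified estimator, on invented counts patterned after the herpes example: a crude association between herpes and cervical cancer that vanishes once the analysis is stratified on HPV status:

```python
# Invented counts patterned on the herpes/HPV example. Each stratum is
# (exposed_cases, exposed_total, unexposed_cases, unexposed_total),
# where "exposed" means herpes-positive.
strata = {
    "HPV positive": (40, 200, 20, 100),   # 20% cancer risk in both groups
    "HPV negative": (2, 100, 4, 200),     # 2% cancer risk in both groups
}

def mantel_haenszel_rr(strata) -> float:
    """Mantel-Haenszel summary risk ratio across strata."""
    numerator = denominator = 0.0
    for a, n1, b, n0 in strata.values():
        total = n1 + n0
        numerator += a * n0 / total
        denominator += b * n1 / total
    return numerator / denominator

cases_1 = sum(s[0] for s in strata.values())    # exposed cases
total_1 = sum(s[1] for s in strata.values())    # exposed subjects
cases_0 = sum(s[2] for s in strata.values())    # unexposed cases
total_0 = sum(s[3] for s in strata.values())    # unexposed subjects

crude_rr = (cases_1 / total_1) / (cases_0 / total_0)
print(f"crude risk ratio:           {crude_rr:.2f}")                    # 1.75
print(f"Mantel-Haenszel risk ratio: {mantel_haenszel_rr(strata):.2f}")  # 1.00
```

The crude analysis shows herpes nearly doubling the risk; the stratified analysis shows no association at all, because herpes merely travels with HPV.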

Reference Manual on Scientific Evidence

The epidemiology chapter of the Second Edition of the Manual described ruling out confounding as an obligation of the expert witness who chooses to rely upon a study.9 Although the same chapter in the Third Edition occasionally waffles, its authors come down on the side of describing confounding as a threat to validity, which must be ruled out before the study can be relied upon. In one place, the authors indicate that “care” is required, and that analysis for random error, confounding, and bias “should be conducted”:

“Although relative risk is a straightforward concept, care must be taken in interpreting it. Whenever an association is uncovered, further analysis should be conducted to assess whether the association is real or a result of sampling error, confounding, or bias. These same sources of error may mask a true association, resulting in a study that erroneously finds no association.”10

Elsewhere in the same chapter, the authors note that “chance, bias, and confounding” must be examined, but again they stop short of stating that these threats to validity must be eliminated:

“Three general categories of phenomena can result in an association found in a study to be erroneous: chance, bias, and confounding. Before any inferences about causation are drawn from a study, the possibility of these phenomena must be examined.”11

                *  *  *  *  *  *  *  *

“To make a judgment about causation, a knowledgeable expert must consider the possibility of confounding factors.”12

Eventually, however, the epidemiology chapter takes a stand, and an important one:

“When researchers find an association between an agent and a disease, it is critical to determine whether the association is causal or the result of confounding.”13

Mandatory Not Precatory

The better reasoned cases decided under Federal Rule of Evidence 702, and state-court analogues, follow the Reference Manual in making clear that confounding factors must be carefully addressed and eliminated. Failure to rule out the role of confounding renders a conclusion of causation, reached in reliance upon confounded studies, invalid.14

The inescapable mandate of Rules 702 and 703 is to require judges to evaluate the bases of a challenged expert witness’s opinion. Threats to internal validity, such as confounding, may make reliance upon any given study, or an entire set of studies, unreasonable, thus implicating Rule 703. Importantly, stacking up more invalid studies does not overcome the problem; a heap of evidence incompetent to show anything still shows nothing.

Pre-Daubert

Before the Supreme Court decided Daubert, few federal or state courts were willing to roll up their sleeves to evaluate the internal validity of relied upon epidemiologic studies. Issues of bias and confounding were typically dismissed by courts as issues that went to “weight, not admissibility.”

Judge Weinstein’s handling of the Agent Orange litigation, in the mid-1980s, marked a milestone in judicial sophistication and willingness to think critically about the evidence that was being funneled into the courtroom.15 The Bendectin litigation also was an important proving ground in which the defendant pushed courts to keep their eyes and minds open to issues of random error, bias, and confounding, when evaluating scientific evidence, on both pre-trial and on post-trial motions.16

Post-Daubert

When the United States Supreme Court addressed the admissibility of plaintiffs’ expert witnesses in Daubert, its principal focus was on the continuing applicability of the so-called Frye rule after the enactment of the Federal Rules of Evidence. The Court left the details of applying the then newly clarified “Daubert” standard to the facts of the case on remand to the intermediate appellate court. The Ninth Circuit, upon reconsidering the case, re-affirmed the trial court’s previous grant of summary judgment, on grounds of the plaintiffs’ failure to show specific causation.

A few years later, the Supreme Court itself engaged with the actual evidentiary record on appeal, in a lung cancer claim that had been dismissed by the district court. Confounding was one among several validity issues in the studies relied upon by plaintiffs’ expert witnesses. The Court concluded that the plaintiffs’ expert witnesses’ bases did not individually or collectively support their conclusions of causation in a reliable way. With respect to one particular epidemiologic study, the Supreme Court observed that a study that looked at workers who “had been exposed to numerous potential carcinogens” could not show that PCBs cause lung cancer. General Elec. Co. v. Joiner, 522 U.S. 136, 146 (1997).17


1 An earlier version of this post can be found at “Sorting Out Confounded Research – Required by Rule 702” (June 10, 2012).

2 David Faigman, David Kaye, Michael Saks & Joseph Sanders, “How Good is Good Enough? Expert Evidence Under Daubert and Kumho,” 50 Case Western Reserve L. Rev. 645, 661 n.55 (2000).

3 See, e.g., In re Welding Fume Prods. Liab. Litig., 2006 WL 4507859, *33 (N.D. Ohio 2006) (reducing all studies to one level, and treating all criticisms as though they rendered all studies invalid).

4 R. Peierls, “Wolfgang Ernst Pauli, 1900-1958,” 5 Biographical Memoirs of Fellows of the Royal Society 186 (1960) (quoting Wolfgang Pauli’s famous dismissal of a particularly bad physics paper).

5 David Kaye & David Freedman, “Reference Guide on Statistics,” in Reference Manual on Scientific Evidence 211, 285 (3d ed. 2011) [hereafter RMSE3d].

6 See, e.g., R. Didham, et al., “Suicide and Self-Harm Following Prescription of SSRIs and Other Antidepressants: Confounding By Indication,” 60 Br. J. Clinical Pharmacol. 519 (2005).

7 RMSE3d at 220.

8 RMSE3d at 219 (internal citations omitted).

9 Reference Guide on Epidemiology, at 369-70 (2d ed. 2000) (“Even if an association is present, epidemiologists must still determine whether the exposure causes the disease or if a confounding factor is wholly or partly responsible for the development of the outcome.”).

10 RMSE3d at 567-68 (internal citations omitted).

11 RMSE3d at 572.

12 RMSE3d at 591 (internal citations omitted).

13 RMSE3d at 591.

14 Similarly, an exonerative conclusion of no association might be vitiated by confounding with a protective factor, not accounted for in a multivariate analysis. Practically, such confounding seems less prevalent than confounding that generates a positive association.

15 In re “Agent Orange” Prod. Liab. Litig., 597 F. Supp. 740, 783 (E.D.N.Y. 1984) (noting that confounding had not been sufficiently addressed in a study of U.S. servicemen exposed to Agent Orange), aff’d, 818 F.2d 145 (2d Cir. 1987) (approving district court’s analysis), cert. denied sub nom. Pinkney v. Dow Chemical Co., 484 U.S. 1004 (1988).

16 Brock v. Merrell Dow Pharms., Inc., 874 F.2d 307, 311, modified on reh’g, 884 F.2d 166 (5th Cir. 1989) (noting that “[o]ne difficulty with epidemiologic studies is that often several factors can cause the same disease.”).

17 The Court’s discussion related to the reliance of plaintiffs’ expert witnesses upon, among other studies, Kuratsune, Nakamura, Ikeda, & Hirohata, “Analysis of Deaths Seen Among Patients with Yusho – A Preliminary Report,” 16 Chemosphere 2085 (1987).

Ruling Out Bias & Confounding is Necessary to Evaluate Expert Witness Causation Opinions

October 29th, 2018

In 2000, Congress amended the Federal Rules of Evidence to clarify, among other things, that Rule 702 had grown past the Supreme Court’s tentative, preliminary statement in Daubert, to include over a decade and a half of further judicial experience and scholarly comment. One point of clarification in the 2000 amendments, carried forward since, was that expert witness testimony is admissible only if “the testimony is based on sufficient facts or data.” Rule 702(b). In other words, an expert witness’s opinions can fail the legal requirement of reliability and validity by lacking sufficient facts or data.

The American Law Institute (ALI), in its 2010 revision to the Restatement of Torts, purported to address the nature and quantum of evidence for causation in so-called toxic tort cases as a matter of substantive law only, without addressing admissibility of expert witness opinion testimony, by noting that the Restatement did “not address any other requirements for the admissibility of an expert witness’s testimony, including qualifications, expertise, investigation, methodology, or reasoning.” Restatement (Third) of Torts: Liability for Physical and Emotional Harm § 28, cmt. e (2010). The qualifying language seems to have come from a motion advanced by ALI member Larry S. Stewart.

The Restatement, however, was not faithful to its own claim; nor could it be. Rule 702(b) made sufficiency an explicit part of the admissibility calculus in 2000. The ALI should have known better than to claim that its Restatement would not delve, and had not wandered, into the area of expert witness admissibility. The strategic goal of ignoring a key part of Rule 702 seems to have been to redefine expert witness reliability and validity as a “sufficiency” or “weight of the evidence” question, which the trial court would be required to leave to the finder of fact (usually a lay jury) to resolve. The Restatement’s pretense of avoiding the admissibility of expert witness opinion turns on the incorrect assumption that sufficiency plays no role in judicial gatekeeping of opinion testimony.

At the time of the release of the Restatement (Third) of Torts: Liability for Physical and Emotional Harm, one of its Reporters, Michael D. Green, published an article in Trial, the glossy journal of the Association of Trial Lawyers of America (now known by the self-congratulatory name of the American Association for Justice), the trade organization for the litigation industry in the United States. Professor Green’s co-author was Larry S. Stewart, a former president of the plaintiffs’ lawyers’ group, and the ALI member who pressed the motion that led to the comment e language quoted above. Their article indecorously touted the then-new Restatement as a toolbox for plaintiffs’ lawyers.1

According to Green and Stewart, “Section 28, comment c [of the Restatement], seeks to clear the air.” Green at 46. These authors suggest that the Restatement sought to avoid “bright-line rules,” by recognizing that causal inference is a

“matter of informed judgment, not scientific certainty; scientific analysis is informed by numerous factors (commonly known as the Hill criteria); and, in some cases, reasonable scientists can come to differing conclusions.”

Id.

There are several curious aspects to these pronouncements. First, the authors concede that the comment e caveat was violated, because the Hill criteria certainly involve the causation expert witness’s methodology and reasoning. Second, the authors’ claim to have avoided “bright-line” rules is muddled by their purported bifurcation of “informed judgment” from “scientific certainty.” The latter phrase, “scientific certainty,” is not a requirement in science or the law, which makes the comparison with informed judgment confusing. Understandably, Green and Stewart wished to note that in some cases scientists could reasonably come to different conclusions about causation from a given data set, but their silence about the many cases in which scientists, outside the courtroom, do not reach the causal conclusion contended for by party-advocate expert witnesses is telling, given the obvious pro-litigation bias of their audience.

Perhaps the most problematic aspect of the authors’ approach to causal analysis is their reductionist statement that “scientific analysis is informed by numerous factors (commonly known as the Hill criteria).” The nine Hill criteria, to be sure, are important, but they follow an assessment of whether the prerequisites for applying the criteria have been met,2 namely an “association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance.”3

The problematic aspects of this litigation-industry magazine article raise the question whether the Restatement itself provides similarly erroneous guidance. The relevant discussion occurs in Chapter 5, on “Factual Cause,” in § 28, comment c(3), on general causation. At one place, the comment seems to elevate the Hill criteria into the entirety of the relevant consideration:

“Observational group studies are subject to a variety of errors — sampling error, bias, and confounding — and may, as a result, find associations that are spurious and not causal. Only after an evaluative judgment, based on the Hill criteria, that the association is likely to be causal rather than spurious, is a study valid evidence of general causation and specific causation.”

Restatement at 449b.

This passage, like the Green and Stewart article, appears to treat the Hill criteria as the be-all and end-all of the evaluative judgment, leaving out the need to assess and eliminate “sampling error, bias, and confounding” before measuring the available evidence against the Hill criteria. The first sentence, however, does suggest that addressing sampling error, bias, and confounding is part of causal inference, at least if spurious associations are to be avoided. Indeed, earlier in comment c, the Reporters describe the examination of whether an association is truly causal, or instead explained by random error or bias, as scientifically required:

“when epidemiology finds an association, the observational (rather than experimental) nature of these studies requires an examination of whether the association is truly causal or spurious and due to random error or deficiencies in the study (bias).”

Restatement at 440b (emphasis added). This crucial explanation was omitted from the Green and Stewart article.

An earlier draft of comment c offered the following observation:

“Epidemiologists use statistical methods to estimate the range of error that sampling error could produce; assessing the existence and impact of biases and uncorrected confounding is usually qualitative. Whether an inference of causation based on an association is appropriate is a matter of informed judgment, not scientific inquiry, as is a judgment whether a study that finds no association is exonerative or inconclusive.”

Fortunately, this observation was removed in the drafting process. The reason for the deletion is unclear, but its removal was well advised. The struck language would have been at best misleading in suggesting that the assessment of bias and confounding is “usually qualitative.” Elimination of confounding is the goal of multivariate analyses such as logistic regression and propensity score matching models, among other approaches, all of which are quantitative methods. Assessing bias quantitatively has been the subject of book-length treatment in the field of epidemiology.4
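To give a flavor of what a quantitative, rather than merely qualitative, treatment of unmeasured confounding looks like, the sketch below applies the textbook “external adjustment” formula from the bias-analysis literature, with invented parameter values: given an unmeasured confounder’s assumed prevalence among the exposed and unexposed and its assumed risk ratio for the outcome, one can compute how much of an observed association the confounder would account for:

```python
# A minimal sketch of external adjustment for an unmeasured binary
# confounder, a basic quantitative bias analysis. Parameter values are
# hypothetical.
def bias_factor(rr_cd: float, p1: float, p0: float) -> float:
    """Confounding bias in an observed risk ratio, for a confounder with
    risk ratio rr_cd for disease and prevalence p1 (exposed) and p0
    (unexposed)."""
    return (p1 * (rr_cd - 1) + 1) / (p0 * (rr_cd - 1) + 1)

observed_rr = 1.5    # hypothetical observed exposure-disease risk ratio
rr_cd = 10.0         # confounder's assumed risk ratio for the disease
p1, p0 = 0.5, 0.3    # assumed confounder prevalence, exposed vs. unexposed

b = bias_factor(rr_cd, p1, p0)
print(f"bias factor:            {b:.2f}")                 # 1.49
print(f"confounder-adjusted RR: {observed_rr / b:.2f}")   # 1.01
```

On these assumptions, the entire observed association would be an artifact of the unmeasured confounder. The analysis is only as good as its inputs, but it is quantitative through and through.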

In comment c as published, the Reporters acknowledged that confounding can be identified and analyzed:

“The observational nature of epidemiologic studies virtually always results in concerns about the results being skewed by biases or unidentified confounders. * * * Sometimes potential confounders can be identified and data gathered that permits analysis of whether confounding exists. Unidentified confounders, however, cannot be analyzed. Often potential biases can be identified, but assessing the extent to which they affected the study’s outcome is problematical. * * * Thus, interpreting the results of epidemiologic studies requires informed judgment and is subject to uncertainty. Unfortunately, contending adversarial experts, because of the pressures of the adversarial system, rarely explore this uncertainty and provide the best, objective assessment of the scientific evidence.”

Restatement at 448a.

It would be a very poorly done epidemiologic study that failed to identify and analyze confounding variables in a multivariate analysis. The key question will be whether the authors have done that analysis with due care, and with all the appropriate covariates needed to address confounding thoroughly. The Restatement comment acknowledges that expert witnesses in our courtrooms often fail to explore the uncertainty created by bias and confounding. Given the pressures on witnesses claiming causal associations, we might well expect that this failure will not be equally distributed among all expert witnesses.


1 Michael D. Green & Larry S. Stewart, “The New Restatement’s Top 10 Tort Tools,” Trial 44 (April 2010) [cited as Green]. See “The Top Reason that the ALI’s Restatement of Torts Should Steer Clear of Partisan Conflicts.”

2 See Frank C. Woodside, III & Allison G. Davis, “The Bradford Hill Criteria: The Forgotten Predicate,” 35 Thomas Jefferson L. Rev. 103 (2013); see also “Woodside & Davis on the Bradford Hill Considerations” (Aug. 23, 2013).

3 Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295 (1965).

4 See, e.g., Timothy L. Lash, Matthew P. Fox, and Aliza K. Fink, Applying Quantitative Bias Analysis to Epidemiologic Data (2009).