TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Paraquat Shape-Shifting Expert Witness Quashed

April 24th, 2024

Another multi-district litigation (MDL) has hit a jarring speed bump. Claims for Parkinson’s disease (PD), allegedly caused by exposure to paraquat dichloride (paraquat), were consolidated, in June 2021, for pre-trial coordination in MDL No. 3004, in the Southern District of Illinois, before Chief Judge Nancy J. Rosenstengel. Like many health-effects litigation claims, the plaintiffs’ claims in these paraquat cases turn on epidemiologic evidence. To make their causation case in the first MDL trial cases, plaintiffs’ counsel nominated a statistician, Martin T. Wells, to present their causation case. Last week, Judge Rosenstengel found Wells’ opinion so infected by invalid methodologies and inferences as to be inadmissible under the most recent version of Rule 702.[1] Summary judgment in the trial cases followed.[2]

Back in the 1980s, paraquat gained some legal notoriety in one of the most retrograde Rule 702 decisions.[3] Both the herbicide and Rule 702 survived, however, and they both remain in wide use. For the last two decades, there has been a widespread challenges to the safety of paraquat, and in particular there have been claims that paraquat can cause PD or parkinsonism under some circumstances.  Despite this background, the plaintiffs’ counsel in MDL 3004 began with four problems.

First, paraquat is closely regulated for agricultural use in the United States. Under federal law, paraquat can be used to control the growth of weeds only “by or under the direct supervision of a certified applicator.”[4] The regulatory record created an uphill battle for plaintiffs.[5] Under the Federal Insecticide, Fungicide, and Rodenticide Act (“FIFRA”), the U.S. EPA has regulatory and enforcement authority over the use, sale, and labeling of paraquat.[6] As part of its regulatory responsibilities, in 2019, the EPA systematically reviewed available evidence to assess whether there was an association between paraquat and PD. The agency’s review concluded that “there is limited, but insufficient epidemiologic evidence at this time to conclude that there is a clear associative or causal relationship between occupational paraquat exposure and PD.”[7] In 2021, the EPA issued its Interim Registration Review Decision, and reapproved the registration of paraquat. In doing so, the EPA concluded that “the weight of evidence was insufficient to link paraquat exposure from pesticidal use of U.S. registered products to Parkinson’s disease in humans.”[8]

Second, beyond the EPA, there were no other published reviews, systematic or otherwise, which reached a conclusion that paraquat causes PD.[9]

Third, the plaintiffs claims faced another serious impediment. Their counsel placed their reliance upon Professor Martin Wells, a statistician on the faculty of Cornell University. Unfortunately for plaintiffs, Wells has been known to operate as a “cherry picker,” and his methodology has been previously reviewed in an unfavorable light. Another MDL court, which reviewed a review and meta-analysis propounded by Wells, found that his reports “were marred by a selective review of data and inconsistent application of inclusion criteria.”[10]

Fourth, the plaintiffs’ claims were before Chief Judge Nancy J. Rosenstengel, who was willing to do the hard work required under Rule 702, specially as it has been recently amended for clarification and emphasis of the gatekeeper’s responsibilities to evaluate validity issues in the proffered opinions of expert witnesses. As her 97 page decision evinces, Judge Rosenstengel conducted four days of hearings, which included viva voce testimony from Martin Wells, and she obviously read the underlying papers, reviews, as well as the briefs and the Reference Manual on Scientific Evidence, with great care. What followed did not go well for Wells or the plaintiffs’ claims.[11] Judge Rosenstengel has written an opinion that may be the first careful judicial consideration of the basic requirements of systematic review.

The court noted that systematic reviewers carefully define a research question and what kinds of empirical evidence will be reviewed, and then collect, summarize, and, if feasible, synthesize the available evidence into a conclusion.[12] The court emphasized that systematic reviewers should “develop a protocol for the review before commencement and adhere to the protocol regardless of the results of the review.”[13]

Wells proffered a meta-analysis, and a “weight of the evidence” (WOE) review from which he concluded that paraquat causes PD and nearly triples the risk of the disease among workers exposed to the herbicide.[14] In his reports, Wells identified a universe of at least 36 studies, but included seven in his meta-analysis. The defense had identified another two studies that were germane.[15]

Chief Judge Rosenstengel’s opinion is noteworthy for its fine attention to detail, detail that matters to the validity of the expert witness’s enterprise. Martin Wells set out to do a meta-analysis, which was all fine and good. With a universe of 36 studies, with sub-findings, alternative analyses, and changing definitions of relevant exposure, the devil lay in the details.

The MDL court was careful to point out that it was not gainsaying Wells’ decision to limit his meta-analysis to case-control studies, or to his grading of any particular study as being of low quality. Systematic reviews and meta-analyses are generally accepted techniques that are part of a scientific approach to causal inference, but each has standards, predicates, and requirements for valid use. Expert witnesses must not only use a reliable methodology, Rule 702(d) requires that they must reliably apply their chosen methodology to the facts at hand in reaching their conclusions.[16]

The MDL court concluded that Wells’ meta-analysis was not sufficiently reliable under Rule 702 because he failed faithfully and reliably to apply his own articulated methodology. The court followed Wells’ lead in identifying the source and content of his chosen methodology, and simply examined his proffered opinion for compliance with that methodology.[17] The basic principles of validity for conducting meta-analyses were not, in any event, really contested. These principles and requirements were clearly designed to ensure and enhance the reliability of meta-analyses by pre-empting results-driven, reverse-engineered summary estimates of association.

The court found that Wells failed clearly to pre-specify his eligibility criteria. He then proceeded to redefine exposure criteria and study inclusion or eligibility criteria, and study quality criteria, after looking at the evidence. He also inconsistently applied his stated criteria, all in an apparently desired effort to exclude less favorable study outcomes. These ad hoc steps were some of Wells’ deviations from the standards to which he played lip service.

The court did not exclude Wells because it disagreed with his substantive decisions to include or exclude any particular study, or his quality grading of any study. Rather, Dr. Wells’ meta-analysis does not pass muster under Rule 702 because its methodology was unclear, inconsistently applied, not replicable, and at times transparently reverse-engineered.[18]

The court’s evaluation of Wells was unflinchingly critical. Wells’ proffered opinions “required several methodological contortions and outright violations of the scientific standards he professed to apply.”[19] From his first involvement in this litigation, Wells had violated the basic rules of conducting systematic reviews and meta-analyses.[20] His definition of “occupational” exposure meandered to suit his desire to include one study (with low variance) that might otherwise have been excluded.[21] Rather than pre-specifying his review process, his study inclusion criteria, and his quality scores, Wells engaged in an unwritten “holistic” review process, which he conceded was not objectively replicable. Wells’ approach left him free to include studies he wanted in his meta-analysis, and then provide post hoc justifications.[22] His failure to identify his inclusion/exclusion criteria was a “methodological red flag” in Dr. Wells’ meta-analysis, which suggested his reverse engineering of the whole analysis, the “very antithesis of a systematic review.”[23]

In what the court described as “methodological shapeshifting,” Wells blatantly and inconsistently graded studies he wanted to include, and had already decided to include in his meta-analysis, to be of higher quality.[24] The paraquat MDL court found, unequivocally, that Wells had “failed to apply the same level of intellectual rigor to his work in the four trial selection cases that would be required of him and his peers in a non-litigation setting.”[25]

It was also not lost upon the MDL court that Wells had shifted from a fixed effect to a random effects meta-analysis, between his principal and rebuttal reports.[26] Basic to the meta-analytical enterprise is a predicate systematic review, properly done, with pre-specification of inclusion and exclusion criteria for what studies would go into any meta-analysis. The MDL court noted that both sides had cited Borenstein’s textbook on meta-analysis,[27] and that Wells had himself cited the Cochrane Handbook[28] for the basic proposition that that objective and scientifically valid study selection criteria should be clearly stated in advance to ensure the objectivity of the analysis.

There was of course legal authority for this basic proposition about prespecification. Given that the selection of studies that go into a systematic review and meta-analysis can be dispositive of its conclusion, undue subjectivity or ad hoc inclusion can easily arrange a desired outcome.[29] Furthermore, meta-analysis carries with it the opportunity to mislead a lay jury with a single (and inflated) risk ratio,[30] which is obtained by the operator’s manipulation of inclusion and exclusion criteria. This opportunity required the MDL court to examine the methodological rigor of the proffered meta-analysis carefully to evaluate whether it reflects a valid pooling of data or it was concocted to win a case.[31]

Martin Wells had previously acknowledged the dangers of manipulation and subjective selectivity inherent in systematic reviews and meta-analyses. The MDL court quoted from Wells’ testimony in Martin v. Actavis:

QUESTION: You would certainly agree that the inclusion-exclusion criteria should be based upon objective criteria and not simply because you were trying to get to a particular result?

WELLS: No, you shouldn’t load the – sort of cook the books.

QUESTION: You should have prespecified objective criteria in advance, correct?

WELLS: Yes.[32]

The MDL court also picked up on a subtle but important methodological point about which odds ratio to use in a meta-analysis when a study provides multiple analyses of the same association. In his first paraquat deposition, Wells cited the Cochrane Handbook, for the proposition that if a crude risk ratio and a risk ratio from a multivariate analysis are both presented in a given study, then the adjusted risk ratio (and its corresponding measure of standard error seen in its confidence interval) is generally preferable to reduce the play of confounding.[33] Wells violated this basic principle by ignoring the multivariate analysis in the study that dominated his meta-analysis (Liou) in favor of the unadjusted bivariate analysis. Given that Wells accepted this basic principle, the MDL court found that Wells likely selected the minimally adjusted odds ratio over the multiviariate adjusted odds ratio for inclusion in his meta-analysis in order to have the smaller variance (and thus greater weight) from the former. This maneuver was disqualifying under Rule 702.[34]

All in all, the paraquat MDL court’s Rule 702 ruling was a convincing demonstration that non-expert generalist judges, with assistance from subject-matter experts, treatises, and legal counsel, can evaluate and identify deviations from methodological standards of care.


[1] In re Paraquat Prods. Prods. Liab. Litig., Case No. 3:21-md-3004-NJR, MDL No. 3004, Slip op., ___ F.3d ___ (S.D. Ill. Apr. 17, 2024) [Slip op.]

[2] In re Paraquat Prods. Prods. Liab. Litig., Op. sur motion for judgment, Case No. 3:21-md-3004-NJR, MDL No. 3004 (S.D. Ill. Apr. 17, 2024). See also Brendan Pierson, “Judge rejects key expert in paraquat lawsuits, tosses first cases set for trial,” Reuters (Apr. 17, 2024); Hailey Konnath, “Trial-Ready Paraquat MDL Cases Tossed After Testimony Axed,” Law360 (Apr. 18, 2024).

[3] Ferebee v. Chevron Chem. Co., 552 F. Supp. 1297 (D.D.C. 1982), aff’d, 736 F.2d 1529 (D.C. Cir.), cert. denied, 469 U.S. 1062 (1984). SeeFerebee Revisited,” Tortini (Dec. 28, 1017).

[4] See 40 C.F.R. § 152.175.

[5] Slip op. at 31.

[6] 7 U.S.C. § 136w; 7 U.S.C. § 136a(a); 40 C.F.R. § 152.175. The agency must periodically review the registration of the herbicide. 7 U.S.C. § 136a(g)(1)(A). See Ruckelshaus v. Monsanto Co., 467 U.S. 986, 991-92 (1984).

[7] See Austin Wray & Aaron Niman, Memorandum, Paraquat Dichloride: Systematic review of the literature to evaluate the relationship between paraquat dichloride exposure and Parkinson’s disease at 35 (June 26, 2019).

[8] See also Jeffrey Brent and Tammi Schaeffer, “Systematic Review of Parkinsonian Syndromes in Short- and Long-Term Survivors of Paraquat Poisoning,” 53 J. Occup. & Envt’l Med. 1332 (2011) (“An analysis the world’s entire published experience found no connection between high-dose paraquat exposure in humans and the development of parkinsonism.”).

[9] Douglas L. Weed, “Does paraquat cause Parkinson’s disease? A review of reviews,” 86 Neurotoxicology 180, 180 (2021).

[10] In re Incretin-Based Therapies Prods. Liab. Litig., 524 F.Supp. 3d 1007, 1038, 1043 (S.D. Cal. 2021), aff’d, No. 21-55342, 2022 WL 898595 (9th Cir. Mar. 28, 2022) (per curiam). SeeMadigan’s Shenanigans and Wells Quelled in Incretin-Mimetic CasesTortini (July 15, 2022).

[11] The MDL court obviously worked hard to learn the basics principles of epidemiology. The court relied extensively upon the epidemiology chapter in the Reference Manual on Scientific Evidence. Much of that material is very helpful, but its exposition on statistical concepts is at times confused and erroneous. It is unfortunate that courts do not pay more attention to the more precise and accurate exposition in the chapter on statistics. Citing the epidemiology chapter, the MDL court gave an incorrect interpretation of the p-value: “A statistically significant result is one that is unlikely the product of chance. Slip op. at 17 n. 11. And then again, citing the Reference Manual, the court declared that “[a] p-value of .1 means that there is a 10% chance that values at least as large as the observed result could have been the product of random error. Id.” Id. Similarly, the MDL court gave an incorrect interpretation of the confidence interval. In a footnote, the court tells us that “[r]esearchers ordinarily assert a 95% confidence interval, meaning that ‘there is a 95% chance that the “true” odds ratio value falls within the confidence interval range’. In re Zoloft (Sertraline Hydrochloride) Prod. Liab. Litig., MDL No. 2342, 2015 WL 7776911, at *2 (E.D. Pa. Dec. 2, 2015).” Slip op. at 17n.12.  Citing another court for the definition of a statistical concept is a risky business.

[12] Slip op. at 20, citing Lisa A. Bero, “Evaluating Systematic Reviews and Meta-Analyses,” 14 J.L. & Pol’y 569, 570 (2006).

[13] Slip op. at 21, quoting Bero, at 575.

[14] Slip op. at 3.

[15] The nine studies at issue were as follows: (1) H.H. Liou, et al., “Environmental risk factors and Parkinson’s disease; A case-control study in Taiwan,” 48 Neurology 1583 (1997); (2) Caroline M. Tanner, et al.,Rotenone, Paraquat and Parkinson’s Disease,” 119 Envt’l Health Persps. 866 (2011) (a nested case-control study within the Agricultural Health Study (“AHS”)); (3) Clyde Hertzman, et al., “A Case-Control Study of Parkinson’s Disease in a Horticultural Region of British Columbia,” 9 Movement Disorders 69 (1994); (4) Anne-Maria Kuopio, et al., “Environmental Risk Factors in Parkinson’s Disease,” 14 Movement Disorders 928 (1999); (5) Katherine Rugbjerg, et al., “Pesticide exposure and risk of Parkinson’s disease – a population-based case-control study evaluating the potential for recall bias,” 37 Scandinavian J. of Work, Env’t & Health 427 (2011); (6) Jordan A. Firestone, et al., “Occupational Factors and Risk of Parkinson’s Disease: A Population-Based Case-Control Study,” 53 Am. J. of Indus. Med. 217 (2010); (7) Amanpreet S. Dhillon,“Pesticide / Environmental Exposures and Parkinson’s Disease in East Texas,” 13 J. of Agromedicine 37 (2008); (8) Marianne van der Mark, et al., “Occupational exposure to pesticides and endotoxin and Parkinson’s disease in the Netherlands,” 71 J. Occup. & Envt’l Med. 757 (2014); (9) Srishti Shrestha, et al., “Pesticide use and incident Parkinson’s disease in a cohort of farmers and their spouses,” Envt’l Research 191 (2020).

[16] Slip op. at 75.

[17] Slip op. at 73.

[18] Slip op. at 75, citing In re Mirena IUS Levonorgestrel-Related Prod. Liab. Litig. (No. II), 341 F. Supp. 3d 213, 241 (S.D.N.Y. 2018) (“Opinions that assume a conclusion and reverse-engineer a theory to fit that conclusion are . . . inadmissible.”) (internal citation omitted), aff’d, 982 F.3d 113 (2d Cir. 2020); In re Zoloft (Sertraline Hydrochloride) Prod. Liab. Litig., No. 12-md-2342, 2015 WL 7776911, at *16 (E.D. Pa. Dec. 2, 2015) (excluding expert’s opinion where he “failed to consistently apply the scientific methods he articulat[ed], . . . deviated from or downplayed certain well established principles of his field, and . . . inconsistently applied methods and standards to the data so as to support his a priori opinion.”), aff’d, 858 F.3d 787 (3d Cir. 2017).

[19] Slip op. at 35.

[20] Slip op. at 58.

[21] Slip op. at 55.

[22] Slip op. at 41, 64.

[23] Slip op. at 59-60, citing In re Lipitor (Atorvastatin Calcium) Mktg., Sales Pracs. & Prod. Liab. Litig., 892 F.3d 624, 634 (4th Cir. 2018) (“Result-driven analysis, or cherry-picking, undermines principles of the scientific method and is a quintessential example of applying methodologies (valid or otherwise) in an unreliable fashion.”).

[24] Slip op. at 67, 69-70, citing In re Zoloft (Sertraline Hydrochloride) Prod. Liab. Litig., 858 F.3d 787, 795-97 (3d Cir. 2017) (“[I]f an expert applies certain techniques to a subset of the body of evidence and other techniques to another subset without explanation, this raises an inference of unreliable application of methodology.”); In re Bextra and Celebrex Mktg. Sales Pracs. & Prod. Liab. Litig., 524 F. Supp. 2d 1166, 1179 (N.D. Cal. 2007) (excluding an expert witness’s causation opinion because of his result-oriented, inconsistent evaluation of data sources).

[25] Slip op. at 40.

[26] Slip op. at 61 n.44.

[27] Michael Borenstein, Larry V. Hedges, Julian P. T. Higgins, and Hannah R. Rothstein, Introduction to Meta-Analysis (2d ed. 2021).

[28] Jacqueline Chandler, James Thomas, Julian P. T. Higgins, Matthew J. Page, Miranda Cumpston, Tianjing Li, Vivian A. Welch, eds., Cochrane Handbook for Systematic Reviews of Interventions (2ed 2023).

[29] Slip op. at 56, citing In re Zimmer Nexgen Knee Implant Prod. Liab. Litig., No. 11 C 5468, 2015 WL 5050214, at *10 (N.D. Ill. Aug. 25, 2015).

[30] Slip op. at 22. The court noted that the Reference Manual on Scientific Evidence cautions that “[p]eople often tend to have an inordinate belief in the validity of the findings when a single number is attached to them, and many of the difficulties that may arise in conducting a meta-analysis, especially of observational studies such as epidemiological ones, may consequently be overlooked.” Id., quoting from Manual, at 608.

[31] Slip op. at 57, citing Deutsch v. Novartis Pharms. Corp., 768 F. Supp. 2d 420, 457-58 (E.D.N.Y. 2011) (“[T]here is a strong risk of prejudice if a Court permits testimony based on an unreliable meta-analysis because of the propensity for juries to latch on to the single number.”).

[32] Slip op. at 64, quoting from Notes of Testimony of Martin Wells, in In re Testosterone Replacement Therapy Prod. Liab. Litig., Nos. 1:14-cv-1748, 15-cv-4292, 15-cv-426, 2018 WL 7350886 (N.D. Ill. Apr. 2, 2018).

[33] Slip op. at 70.

[34] Slip op. at 71-72, citing People Who Care v. Rockford Bd. of Educ., 111 F.3d 528, 537-38 (7th Cir. 1997) (“[A] statistical study that fails to correct for salient explanatory variables . . . has no value as causal explanation and is therefore inadmissible in federal court.”); In re Roundup Prod. Liab. Litig., 390 F. Supp. 3d 1102, 1140 (N.D. Cal. 2018). Slip op. at 17 n. 12.

Excluding Epidemiologic Evidence under Federal Rule of Evidence 702

August 26th, 2023

We are 30-plus years into the “Daubert” era, in which federal district courts are charged with gatekeeping the relevance and reliability of scientific evidence. Not surprisingly, given the lawsuit industry’s propensity on occasion to use dodgy science, the burden of awakening the gatekeepers from their dogmatic slumber often falls upon defense counsel in civil litigation. It therefore behooves defense counsel to speak carefully and accurately about the grounds for Rule 702 exclusion of expert witness opinion testimony.

In the context of medical causation opinions based upon epidemiologic evidence, the first obvious point is that whichever party is arguing for exclusion should distinguish between excluding an expert witness’s opinion and prohibiting an expert witness from relying upon a particular study.  Rule 702 addresses the exclusion of opinions, whereas Rule 703 addresses barring an expert witness from relying upon hearsay facts or data unless they are reasonably relied upon by experts in the appropriate field. It would be helpful for lawyers and legal academics to refrain from talking about “excluding epidemiological evidence under FRE 702.”[1] Epidemiologic studies are rarely admissible themselves, but come into the courtroom as facts and data relied upon by expert witnesses. Rule 702 is addressed to the admissibility vel non of opinion testimony, some of which may rely upon epidemiologic evidence.

Another common lawyer mistake is the over-generalization that epidemiologic research provides “gold standard” of general causation evidence.[2] Although epidemiology is often required, it not “the medical science devoted to determining the cause of disease in human beings.”[3] To be sure, epidemiologic evidence will usually be required because there is no genetic or mechanistic evidence that will support the claimed causal inference, but counsel should be cautious in stating the requirement. Glib statements by courts that epidemiology is not always required are often simply an evasion of their responsibility to evaluate the validity of the proffered expert witness opinions. A more careful phrasing of the role of epidemiology will make such glib statements more readily open to rebuttal. In the absence of direct biochemical, physiological, or genetic mechanisms that can be identified as involved in bringing about the plaintiffs’ harm, epidemiologic evidence will be required, and it may well be the “gold standard” in such cases.[4]

When epidemiologic evidence is required, counsel will usually be justified in adverting to the “hierarchy of epidemiologic evidence.” Associations are shown in studies of various designs with vastly differing degrees of validity; and of course, associations are not necessarily causal. There are thus important nuances in educating the gatekeeper about this hierarchy. First, it will often be important to educate the gatekeeper about the distinction between descriptive and analytic studies, and the inability of descriptive studies such as case reports to support causal inferences.[5]

There is then the matter of confusion within the judiciary and among “scholars” about whether a hierarchy even exists. The chapter on epidemiology in the Reference Manual on Scientific Evidence appears to suggest the specious position that there is no hierarchy.[6] The chapter on medical testimony, however, takes a different approach in identifying a normative hierarchy of evidence to be considered in evaluating causal claims.[7] The medical testimony chapter specifies that meta-analyses of randomized controlled trials sit atop the hierarchy. Yet, there are divergent opinions about what should be at the top of the hierarchical evidence pyramid. Indeed, the rigorous, large randomized trial will often replace a meta-analysis of smaller trials as the more definitive evidence.[8] Back in 2007, a dubious meta-analysis of over 40 clinical trials led to a litigation frenzy over rosiglitazone.[9] A mega-trial of rosiglitazone showed that the 2007 meta-analysis was wrong.[10]

In any event, courts must purge their beliefs that once there is “some” evidence in support of a claim, their gatekeeping role is over. Randomized controlled trials really do trump observational studies, which virtually always have actual or potential confounding in their final analyses.[11] While disclaimers about the unavailability of randomized trials for putative toxic exposures are helpful, it is not quite accurate to say that it is “unethical to intentionally expose people to a potentially harmful dose of a suspected toxin.”[12] Such trials are done all the time when there is an expected therapeutic benefit that creates at least equipoise between the overall benefit and harm at the outset of the trial.[13]

At this late date, it seems shameful that courts must be reminded that evidence of associations does not suffice to show causation, but prudence dictates giving the reminder.[14] Defense counsel will generally exhibit a Pavlovian reflex to state that causality based upon epidemiology must be viewed through a lens of “Bradford Hill criteria.”[15] Rhetorically, this reflex seems wrong given that Sir Austin himself noted that his nine different considerations were “viewpoints,” not criteria. Taking a position that requires an immediate retreat seems misguided. Similarly, urging courts to invoke and apply the Bradford Hill considerations must be accompanied the caveat that courts must first apply Bradford Hill’s predicate[16] for the nine considerations:

“Disregarding then any such problem in semantics we have this situation. Our observations reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance. What aspects of that association should we especially consider before deciding that the most likely interpretation of it is causation?”[17]

Courts should be mindful that the language from the famous, often-cited paper was part of an after-dinner address, in which Sir Austin was speaking informally. Scientists will understand that he was setting out a predicate that calls for

(1) an association, which is

(2) “perfectly clear cut,” such that bias and confounding are excluded, and

(3) “beyond what we would care to attribute to the play of chance,” with random error kept to an acceptable level, before advancing to further consideration of the nine viewpoints commonly recited.

These predicate findings are the basis for advancing to investigate Bradford Hill’s nine viewpoints; the viewpoints do not replace or supersede the predicates.[18]

Within the nine viewpoints, not all are of equal importance. Consistency among studies, a particularly important consideration, implies that isolated findings in a single observational study will rarely suffice to support causal conclusions. Another important consideration, the strength of the association, has nothing to do with “statistical significance,” which is a predicate consideration, but reminds us that large risk ratios or risk differences provides some evidence that the association does not result from unmeasured confounding. Eliminating confounding, however, is one of the predicate requirements for applying the nine factors. As with any methodology, the Bradford Hill factors are not self-executing. The annals of litigation provide all-too-many examples of undue selectivity, “cherry picking,” and other deviations from the scientist’s standard of care.

Certainly lawyers must steel themselves against recommending the “carcinogen” hazard identifications advanced by the International Agency for Research on Cancer (IARC). There are several problematic aspects to the methods of IARC, not the least of which is IARC’s fanciful use of the word “probable.” According to the IARC Preamble, “probable” has no quantitative meaning.[19] In common legal parlance, “probable” typically conveys a conclusion that is more likely than not. Another problem arises from the IARC’s labeling of “probable human carcinogens” made in some cases without any real evidence of carcinogenesis in humans. Regulatory pronouncements are even more diluted and often involved little more than precautionary principle wishcasting.[20]


[1] Christian W. Castile & and Stephen J. McConnell, “Excluding Epidemiological Evidence Under FRE 702,” For The Defense 18 (June 2023) [Castile]. Although these authors provide an interesting overview of the subject, they fall into some common errors, such as failing to address Rule 703. The article is worth reading for its marshaling recent case law on the subject, but I detail of its errors here in the hopes that lawyers will speak more precisely about the concepts involved in challenging medical causation opinions.

[2] Id. at 18. In re Zantac (Ranitidine) Prods. Liab. Litig., No. 2924, 2022 U.S. Dist. LEXIS 220327, at *401 (S.D. Fla. Dec. 6, 2022); see also Horwin v. Am. Home Prods., No. CV 00-04523 WJR (Ex), 2003 U.S. Dist. LEXIS 28039, at *14-15 (C.D. Cal. May 9, 2003) (“epidemiological studies provide the primary generally accepted methodology for demonstrating a causal relation between a chemical compound and a set of symptoms or disease” *** “The lack of epidemiological studies supporting Plaintiffs’ claims creates a high bar to surmount with respect to the reliability requirement, but it is not automatically fatal to their case.”).

[3] See, e.g., Siharath v. Sandoz Pharm. Corp., 131 F. Supp. 2d 1347, 1356 (N.D. Ga. 2001) (“epidemiology is the medical science devoted to determining the cause of disease in human beings”).

[4] See, e.g., Lopez v. Wyeth-Ayerst Labs., No. C 94-4054 CW, 1996 U.S. Dist. LEXIS 22739, at *1 (N.D. Cal. Dec. 13, 1996) (“Epidemiological evidence is one of the most valuable pieces of scientific evidence of causation”); Horwin v. Am. Home Prods., No. CV 00-04523 WJR (Ex), 2003 U.S. Dist. LEXIS 28039, at *15 (C.D. Cal. May 9, 2003) (“The lack of epidemiological studies supporting Plaintiffs’ claims creates a high bar to surmount with respect to the reliability requirement, but it is not automatically fatal to their case”).

[5] David A. Grimes & Kenneth F. Schulz, “Descriptive Studies: What They Can and Cannot Do,” 359 Lancet 145 (2002) (“…epidemiologists and clinicians generally use descriptive reports to search for clues of cause of disease – i.e., generation of hypotheses. In this role, descriptive studies are often a springboard into more rigorous studies with comparison groups. Common pitfalls of descriptive reports include an absence of a clear, specific, and reproducible case definition, and interpretations that overstep the data. Studies without a comparison group do not allow conclusions about cause of disease.”).

[6] Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” Reference Manual on Scientific Evidence 549, 564n.48 (citing a paid advertisement by a group of scientists, and misleadingly referring to the publication as a National Cancer Institute symposium) (citing Michele Carbone et al., “Modern Criteria to Establish Human Cancer Etiology,” 64 Cancer Res. 5518, 5522 (2004) (National Cancer Institute symposium [sic] concluding that “[t]here should be no hierarchy [among different types of scientific methods to determine cancer causation]. Epidemiology, animal, tissue culture and molecular pathology should be seen as integrating evidences in the determination of human carcinogenicity.”).

[7] John B. Wong, Lawrence O. Gostin & Oscar A. Cabrera, “Reference Guide on Medical Testimony,” in Reference Manual on Scientific Evidence 687, 723 (3d ed. 2011).

[8] See, e.g., J.M. Elwood, Critical Appraisal of Epidemiological Studies and Clinical Trials 342 (3d ed. 2007).

[9] See Steven E. Nissen & Kathy Wolski, “Effect of Rosiglitazone on the Risk of Myocardial Infarction and Death from Cardiovascular Causes,” 356 New Engl. J. Med. 2457 (2007). See also “Learning to Embrace Flawed Evidence – The Avandia MDL’s Daubert Opinion” (Jan. 10, 2011).

[10] Philip D. Home, et al., “Rosiglitazone evaluated for cardiovascular outcomes in oral agent combination therapy for type 2 diabetes (RECORD): a multicentre, randomised, open-label trial,” 373 Lancet 2125 (2009).

[11] In re Zantac (Ranitidine) Prods. Liab. Litig., No. 2924, 2022 U.S. Dist. LEXIS 220327, at *402 (S.D. Fla. Dec. 6, 2022) (“Unlike experimental studies in which subjects are randomly assigned to exposed and placebo groups, observational studies are subject to bias due to the possibility of differences between study populations.”)

[12] Castile at 20.

[13] See, e.g., Benjamin Freedman, “Equipoise and the ethics of clinical research,” 317 New Engl. J. Med. 141 (1987).

[14] See, e.g., In Re Onglyza (Saxagliptin) & Kombiglyze Xr (Saxagliptin & Metformin) Prods. Liab. Litig., No. 5:18-md-2809-KKC, 2022 U.S. Dist. LEXIS 136955, at *127 (E.D. Ky. Aug. 2, 2022); Burleson v. Texas Dep’t of Criminal Justice, 393 F.3d 577, 585-86 (5th Cir. 2004) (affirming exclusion of expert causation testimony based solely upon studies showing a mere correlation between defendant’s product and plaintiff’s injury); Beyer v. Anchor Insulation Co., 238 F. Supp. 3d 270, 280-81 (D. Conn. 2017); Ambrosini v. Labarraque, 101 F.3d 129, 136 (D.C. Cir. 1996).

[15] Castile at 21. See In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., 26 F. Supp. 3d 449, 454-55 (E.D. Pa. 2014).

[16]Bradford Hill on Statistical Methods” (Sept. 24, 2013); see also Frank C. Woodside, III & Allison G. Davis, “The Bradford Hill Criteria: The Forgotten Predicate,” 35 Thomas Jefferson L. Rev. 103 (2013). 

[17] Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 295 (1965).

[18] Castile at 21. See, e.g., In re Onglyza (Saxagliptin) & Kombiglyze XR (Saxagliptin & Metformin) Prods. Liab. Litig., No. 5:18-md-2809-KKC, 2022 U.S. Dist. LEXIS 1821, at *43 (E.D. Ky. Jan. 5, 2022) (“The analysis is meant to apply when observations reveal an association between two variables. It addresses the aspects of that association that researchers should analyze before deciding that the most likely interpretation of [the association] is causation”); Hoefling v. U.S. Smokeless Tobacco Co., LLC, 576 F. Supp. 3d 262, 273 n.4 (E.D. Pa. 2021) (“Nor would it have been appropriate to apply them here: scientists are to do so only after an epidemiological association is demonstrated”).

[19] IARC Monographs on the Identification of Carcinogenic Hazards to Humans – Preamble 31 (2019) (“The terms probably carcinogenic and possibly carcinogenic have no quantitative significance and are used as descriptors of different strengths of evidence of carcinogenicity in humans.”).

[20]Improper Reliance upon Regulatory Risk Assessments in Civil Litigation” (Mar. 19, 2023).

Judicial Flotsam & Jetsam – Retractions

June 12th, 2023

In scientific publishing, when scientists make a mistake, they publish an erratum or a corrigendum. If the mistake vitiates the study, then the erring scientists retract their article. To be sure, sometimes the retraction comes after an obscene delay, with the authors kicking and screaming.[1] Sometimes the retraction comes at the request of the authors, better late than never.[2]

Retractions in the biomedical journals, whether voluntary or not, are on the rise.[3] The process and procedures for retraction of articles often lack transparency. Many articles are retracted without explanation or disclosure of specific problems about the data or the analysis.[4] Sadly, however, misconduct in the form of plagiarism and data falsification is a frequent reason for retractions.[5] The lack of transparency for retractions, and sloppy scholarship, combine to create Zombie papers, which are retracted but continue to be cited in subsequent publications.[6]

LEGAL RETRACTIONS

The law treats errors very differently. Being a judge usually means that you never have to say you are sorry. Judge Andrew Hurwitz has argued that that our legal system would be better served if judges could and did “freely acknowledged and transparently corrected the occasional ‘goof’.”[7] Alas, as Judge Hurwitz notes, very few published decisions acknowledge mistakes.[8]

In the world of scientific jurisprudence, the judicial reticence to acknowledge mistakes is particularly dangerous, and it leads directly to the proliferation of citations to cases that make egregious mistakes. In the niche area of judicial assessment of scientific and statistical evidence, the proliferation of erroneous statements is especially harmful because it interferes with thinking clearly about the issues before courts. Judges believe that they have argued persuasively for a result, not by correctly marshaling statistical and scientific concepts, but by relying upon precedents erroneously arrived at by other judges in earlier cases. Regardless of how many cases are cited (and there are many possible “precedents”), the true parameter does not have a 95% probability of lying within the interval given by a given 95% confidence interval.[9] Similarly, as much as judges would like p-values and confidence intervals to eliminate the need to worry about systematic error, their saying so cannot make it so.[10] Even a mighty federal judge cannot make the p-value probability, or its complement, substitute for the posterior probability of a causal claim.[11]

Some cases in the books are so egregiously decided that it is truly remarkable that they would be cited for any proposition. I call these scientific Dred Scott cases, which illustrate that sometimes science has no criteria of validity that the law is bound to respect. One such Dred Scott case was the result of a bench trial in a federal district court in Atlanta, in Wells v. Ortho Pharmaceutical Corporation.[12]

Wells was notorious for its poor assessment of all the determinants of scientific causation.[13] The decision was met with a storm of opprobrium from the legal and medical community.[14] No scientists or legal scholars offered a serious defense of Wells on the scientific merits. Even the notorious plaintiffs’ expert witness, Carl Cranor, could muster only a distanced agnosticism:

“In Wells v. Ortho Pharmaceutical Corp., which involved a claim that birth defects were caused by a spermicidal jelly, the U.S. Court of Appeals for the 11th Circuit followed the principles of Ferebee and affirmed a plaintiff’s verdict for about five million dollars. However, some members of the medical community chastised the legal system essentially for ignoring a well-established scientific consensus that spermicides are not teratogenic. We are not in a position to judge this particular issue, but the possibility of such results exists.”[15]

Cranor apparently could not bring himself to note that it was not just scientific consensus that was ignored; the Wells case ignored the basic scientific process of examining relevant studies for both internal and external validity.

Notwithstanding this scholarly consensus and condemnation, we have witnessed the repeated recrudescence of the Wells decision. In Matrixx Initiatives, Inc. v. Siracusano,[16] in 2011, the Supreme Court, speaking through Justice Sotomayor, wandered into a discussion, irrelevant to its holding, whether statistical significance was necessary for a determination of the causality of an association:

“We note that courts frequently permit expert testimony on causation based on evidence other than statistical significance. Seee.g.Best v. Lowe’s Home Centers, Inc., 563 F. 3d 171, 178 (6th Cir 2009); Westberry v. Gislaved Gummi AB, 178 F. 3d 257, 263–264 (4th Cir. 1999) (citing cases); Wells v. Ortho Pharmaceutical Corp., 788 F. 2d 741, 744–745 (11th Cir. 1986). We need not consider whether the expert testimony was properly admitted in those cases, and we do not attempt to define here what constitutes reliable evidence of causation.”[17]

The quoted language is remarkable for two reasons. First, the Best and Westberry cases did not involve statistics at all. They addressed specific causation inferences from what is generally known as differential etiology. Second, the citation to Wells was noteworthy because the case has nothing to do with adverse event reports or the lack of statistical significance.

Wells involved a claim of birth defects caused by the use of spermicidal jelly contraceptive, which had been the subject of several studies, one of which at least yielded a nominally statistically significant increase in detected birth defects over what was expected.

Wells could thus hardly be an example of a case in which there was a judgment of causation based upon a scientific study that lacked statistical significance in its findings. Of course, finding statistical significance is just the beginning of assessing the causality of an association. The most remarkable and disturbing aspect of the citation to Wells, however, was that the Court was unaware of, or ignored, the case’s notoriety, and the scholarly and scientific consensus that criticized the decision for its failure to evaluate the entire evidentiary display, as well as for its failure to rule out bias and confounding in the studies relied upon by the plaintiff.

Justice Sotomayor’s decision for a unanimous Court is not alone in its failure of scholarship and analysis in embracing the dubious precedent of Wells. Many other courts have done much the same, both in state[18] and in federal courts,[19] and both before and after the Supreme Court decided Daubert, and even after Rule 702 was amended in 2000.[20] Perhaps even more disturbing is that the current edition of the Reference Manual on Scientific Evidence glibly cites to the Wells case, for the dubious proposition that

“Generally, researchers are conservative when it comes to assessing causal relationships, often calling for stronger evidence and more research before a conclusion of causation is drawn.”[21]

We are coming up on the 40th anniversary of the Wells judgment. It is long past time to stop citing the case. Perhaps we have reached the stage of dealing with scientific evidence at which errant and aberrant cases should be retracted, and clearly marked as retracted in the official reporters, and in the electronic legal databases. Certainly the technology exists to link the scholarly criticism with a case citation, just as we link subsequent judicial treatment by overruling, limiting, and criticizing.


[1] Laura Eggertson, “Lancet retracts 12-year-old article linking autism to MMR vaccines,” 182 Canadian Med. Ass’n J. E199 (2010).

[2] Notice of retraction for Teng Zeng & William Mitch, “Oral intake of ranitidine increases urinary excretion of N-nitrosodimethylamine,” 37 Carcinogenesis 625 (2016), published online (May 4, 2021) (retraction requested by authors with an acknowledgement that they had used incorrect analytical methods for their study).

[3] Tianwei He, “Retraction of global scientific publications from 2001 to 2010,” 96 Scientometrics 555 (2013); Bhumika Bhatt, “A multi-perspective analysis of retractions in life sciences,” 126 Scientometrics 4039 (2021); Raoul R.Wadhwa, Chandruganesh Rasendran, Zoran B. Popovic, Steven E. Nissen, and Milind Y. Desai, “Temporal Trends, Characteristics, and Citations of Retracted Articles in Cardiovascular Medicine,” 4 JAMA Network Open e2118263 (2021); Mario Gaudino, N. Bryce Robinson, Katia Audisio, Mohamed Rahouma, Umberto Benedetto, Paul Kurlansky, Stephen E. Fremes, “Trends and Characteristics of Retracted Articles in the Biomedical Literature, 1971 to 2020,” 181 J. Am. Med. Ass’n Internal Med. 1118 (2021); Nicole Shu Ling Yeo-Teh & Bor Luen Tang, “Sustained Rise in Retractions in the Life Sciences Literature during the Pandemic Years 2020 and 2021,” 10 Publications 29 (2022).

[4] Elizabeth Wager & Peter Williams, “Why and how do journals retract articles? An analysis of Medline retractions 1988-2008,” 37 J. Med. Ethics 567 (2011).

[5] Ferric C. Fanga, R. Grant Steen, and Arturo Casadevall, “Misconduct accounts for the majority of retracted scientific publications,” 109 Proc. Nat’l Acad. Sci. 17028 (2012); L.M. Chambers, C.M. Michener, and T. Falcone, “Plagiarism and data falsification are the most common reasons for retracted publications in obstetrics and gynaecology,” 126 Br. J. Obstetrics & Gyn. 1134 (2019); M.S. Marsh, “Separating the good guys and gals from the bad,” 126 Br. J. Obstetrics & Gyn. 1140 (2019).

[6] Tzu-Kun Hsiao and Jodi Schneider, “Continued use of retracted papers: Temporal trends in citations and (lack of) awareness of retractions shown in citation contexts in biomedicine,” 2 Quantitative Science Studies 1144 (2021).

[7] Andrew D. Hurwitz, “When Judges Err: Is Confession Good for the Soul?” 56 Ariz. L. Rev. 343, 343 (2014).

[8] See id. at 343-44 (quoting Justice Story who dealt with the need to contradict a previously published opinion, and who wrote “[m]y own error, however, can furnish no ground for its being adopted by this Court.” U.S. v. Gooding, 25 U.S. 460, 478 (1827)).

[9] See, e.g., DeLuca v. Merrell Dow Pharms., Inc., 791 F. Supp. 1042, 1046 (D.N.J. 1992) (”A 95% confidence interval means that there is a 95% probability that the ‘true’ relative risk falls within the interval”) , aff’d, 6 F.3d 778 (3d Cir. 1993); In re Silicone Gel Breast Implants Prods. Liab. Litig, 318 F.Supp.2d 879, 897 (C.D. Cal. 2004); Eli Lilly & Co. v. Teva Pharms, USA, 2008 WL 2410420, *24 (S.D.Ind. 2008) (stating incorrectly that “95% percent of the time, the true mean value will be contained within the lower and upper limits of the confidence interval range”). See also Confidence in Intervals and Diffidence in the Courts” (Mar. 4, 2012).

[10] See, e.g., Brock v. Merrill Dow Pharmaceuticals, Inc., 874 F.2d 307, 311-12 (5th Cir. 1989) (“Fortunately, we do not have to resolve any of the above questions [as to bias and confounding], since the studies presented to us incorporate the possibility of these factors by the use of a confidence interval.”). This howler has been widely acknowledged in the scholarly literature. See David Kaye, David Bernstein, and Jennifer Mnookin, The New Wigmore – A Treatise on Evidence: Expert Evidence § 12.6.4, at 546 (2d ed. 2011); Michael O. Finkelstein, Basic Concepts of Probability and Statistics in the Law 86-87 (2009) (criticizing the blatantly incorrect interpretation of confidence intervals by the Brock court).

[11] In re Ephedra Prods. Liab. Litig., 393 F.Supp. 2d 181, 191 (S.D.N.Y. 2005) (Rakoff, J.) (“Generally accepted scientific convention treats a result as statistically significant if the P-value is not greater than .05. The expression ‘P=.05’ means that there is one chance in twenty that a result showing increased risk was caused by a sampling error — i.e., that the randomly selected sample accidentally turned out to be so unrepresentative that it falsely indicates an elevated risk.”); see also In re Phenylpropanolamine (PPA) Prods. Liab. Litig., 289 F.Supp. 2d 1230, 1236 n.1 (W.D. Wash. 2003) (“P-values measure the probability that the reported association was due to chance… .”). Although the erroneous Ephedra opinion continues to be cited, it has been debunked in the scholarly literature. See Michael O. Finkelstein, Basic Concepts of Probability and Statistics in the Law 65 (2009); Nathan A. Schachtman, “Statistical Evidence in Products Liability Litigation,” at 28-13, chap. 28, in Stephanie A. Scharf, George D. Sax, & Sarah R. Marmor, eds., Product Liability Litigation: Current Law, Strategies and Best Practices (2d ed. 2021).

[12] Wells v. Ortho Pharm. Corp., 615 F. Supp. 262 (N.D. Ga.1985), aff’d & modified in part, remanded, 788 F.2d 741 (11th Cir.), cert. denied, 479 U.S. 950 (1986).

[13] I have discussed the Wells case in a series of posts, “Wells v. Ortho Pharm. Corp., Reconsidered,” (2012), part one, two, three, four, five, and six.

[14] See, e.g., James L. Mills and Duane Alexander, “Teratogens and ‘Litogens’,” 15 New Engl. J. Med. 1234 (1986); Samuel R. Gross, “Expert Evidence,” 1991 Wis. L. Rev. 1113, 1121-24 (1991) (“Unfortunately, Judge Shoob’s decision is absolutely wrong. There is no scientifically credible evidence that Ortho-Gynol Contraceptive Jelly ever causes birth defects.”). See also Editorial, “Federal Judges v. Science,” N.Y. Times, December 27, 1986, at A22 (unsigned editorial) (“That Judge Shoob and the appellate judges ignored the best scientific evidence is an intellectual embarrassment.”);  David E. Bernstein, “Junk Science in the Courtroom,” Wall St. J. at A 15 (Mar. 24,1993) (pointing to Wells as a prominent example of how the federal judiciary had embarrassed American judicial system with its careless, non-evidence based approach to scientific evidence); Bert Black, Francisco J. Ayala & Carol Saffran-Brinks, “Science and the Law in the Wake of Daubert: A New Search for Scientific Knowledge,” 72 Texas L. Rev. 715, 733-34 (1994) (lawyers and leading scientist noting that the district judge “found that the scientific studies relied upon by the plaintiffs’ expert were inconclusive, but nonetheless held his testimony sufficient to support a plaintiffs’ verdict. *** [T]he court explicitly based its decision on the demeanor, tone, motives, biases, and interests that might have influenced each expert’s opinion. Scientific validity apparently did not matter at all.”) (internal citations omitted); Bert Black, “A Unified Theory of Scientific Evidence,” 56 Fordham L. Rev. 595, 672-74 (1988); Paul F. Strain & Bert Black, “Dare We Trust the Jury – No,” 18 Brief  7 (1988); Bert Black, “Evolving Legal Standards for the Admissibility of Scientific Evidence,” 239 Science 1508, 1511 (1988); Diana K. Sheiness, “Out of the Twilight Zone: The Implications of Daubert v. Merrill Dow Pharmaceuticals, Inc.,” 69 Wash. L. Rev. 481, 493 (1994); David E. Bernstein, “The Admissibility of Scientific Evidence after Daubert v. Merrell Dow Pharmacueticals, Inc.,” 15 Cardozo L. Rev. 2139, 2140 (1993) (embarrassing decision); Troyen A. Brennan, “Untangling Causation Issues in Law and Medicine: Hazardous Substance Litigation,” 107 Ann. Intern. Med. 741, 744-45 (1987) (describing the result in Wells as arising from the difficulties created by the Ferebee case; “[t]he Wells case can be characterized as the court embracing the hypothesis when the epidemiologic study fails to show any effect”); Troyen A. Brennan, “Causal Chains and Statistical Links: Some Thoughts on the Role of Scientific Uncertainty in Hazardous Substance Litigation,” 73 Cornell L. Rev. 469, 496-500 (1988); David B. Brushwood, “Drug induced birth defects: difficult decisions and shared responsibilities,” 91 W. Va. L. Rev. 51, 74 (1988); Kenneth R. Foster, David E. Bernstein, and Peter W. Huber, eds., Phantom Risk: Scientific Inference and the Law 28-29, 138-39 (1993) (criticizing Wells decision); Peter Huber, “Medical Experts and the Ghost of Galileo,” 54 Law & Contemp. Problems 119, 158 (1991); Edward W. Kirsch, “Daubert v. Merrell Dow Pharmaceuticals: Active Judicial Scrutiny of Scientific Evidence,” 50 Food & Drug L.J. 213 (1995) (“a case in which a court completely ignored the overwhelming consensus of the scientific community”); Hans Zeisel & David Kaye, Prove It With Figures: Empirical Methods in Law and Litigation § 6.5, at 93(1997) (noting the multiple comparisons in studies of birth defects among women who used spermicides, based upon the many reported categories of birth malformations, and the large potential for even more unreported categories); id. at § 6.5 n.3, at 271 (characterizing Wells as “notorious,” and noting that the case became a “lightning rod for the legal system’s ability to handle expert evidence.”); Edward K. Cheng , “Independent Judicial Research in the ‘Daubert’ Age,” 56 Duke L. J. 1263 (2007) (“notoriously concluded”); Edward K. Cheng, “Same Old, Same Old: Scientific Evidence Past and Present,” 104 Michigan L. Rev. 1387, 1391 (2006) (“judge was fooled”); Harold P. Green, “The Law-Science Interface in Public Policy Decisionmaking,” 51 Ohio St. L.J. 375, 390 (1990); Stephen L. Isaacs & Renee Holt, “Drug regulation, product liability, and the contraceptive crunch: Choices are dwindling,” 8 J. Legal Med. 533 (1987); Neil Vidmar & Shari S. Diamond, “Juries and Expert Evidence,” 66 Brook. L. Rev. 1121, 1169-1170 (2001); Adil E. Shamoo, “Scientific evidence and the judicial system,” 4 Accountability in Research 21, 27 (1995); Michael S. Davidson, “The limitations of scientific testimony in chronic disease litigation,” 10 J. Am. Coll. Toxicol. 431, 435 (1991); Charles R. Nesson & Yochai Benkler, “Constitutional Hearsay: Requiring Foundational Testing and Corroboration under the Confrontation Clause,” 81 Virginia L. Rev. 149, 155 (1995); Stephen D. Sugarman, “The Need to Reform Personal Injury Law Leaving Scientific Disputes to Scientists,” 248 Science 823, 824 (1990); Jay P. Kesan, “A Critical Examination of the Post-Daubert Scientific Evidence Landscape,” 52 Food & Drug L. J. 225, 225 (1997); Ora Fred Harris, Jr., “Communicating the Hazards of Toxic Substance Exposure,” 39 J. Legal Ed. 97, 99 (1989) (“some seemingly horrendous decisions”); Ora Fred Harris, Jr., “Complex Product Design Litigation: A Need for More Capable Fact-Finders,” 79 Kentucky L. J. 510 & n.194 (1991) (“uninformed judicial decision”); Barry L. Shapiro & Marc S. Klein, “Epidemiology in the Courtroom: Anatomy of an Intellectual Embarrassment,” in Stanley A. Edlavitch, ed., Pharmacoepidemiology 87 (1989); Marc S. Klein, “Expert Testimony in Pharmaceutical Product Liability Actions,” 45 Food, Drug, Cosmetic L. J. 393, 410 (1990); Michael S. Lehv, “Medical Product Liability,” Ch. 39, in Sandy M. Sanbar & Marvin H. Firestone, eds., Legal Medicine 397, 397 (7th ed. 2007); R. Ryan Stoll, “A Question of Competence – Judicial Role in Regulation of Pharmaceuticals,” 45 Food, Drug, Cosmetic L. J. 279, 287 (1990); Note, “A Question of Competence: The Judicial Role in the Regulation of Pharmaceuticals,” Harvard L. Rev. 773, 781 (1990); Peter H. Schuck, “Multi-Culturalism Redux: Science, Law, and Politics,” 11 Yale L. & Policy Rev. 1, 13 (1993); Howard A. Denemark, “Improving Litigation Against Drug Manufacturers for Failure to Warn Against Possible Side  Effects: Keeping Dubious Lawsuits from Driving Good Drugs off the Market,” 40 Case Western Reserve L.  Rev. 413, 438-50 (1989-90); Howard A. Denemark, “The Search for Scientific Knowledge in Federal Courts in the Post-Frye Era: Refuting the Assertion that Law Seeks Justice While Science Seeks Truth,” 8 High Technology L. J. 235 (1993)

[15] Carl Cranor & Kurt Nutting, “Scientific and Legal Standards of Statistical Evidence in Toxic Tort and Discrimination Suits,” 9 Law & Philosophy 115, 123 (1990) (internal citations omitted).

[16] 131 S.Ct. 1309 (2011) [Matrixx]

[17] Id. at 1319.

[18] Baroldy v. Ortho Pharmaceutical Corp., 157 Ariz. 574, 583, 760 P.2d 574 (Ct. App. 1988); Earl v. Cryovac, A Div. of WR Grace, 115 Idaho 1087, 772 P. 2d 725, 733 (Ct. App. 1989); Rubanick v. Witco Chemical Corp., 242 N.J. Super. 36, 54, 576 A. 2d 4 (App. Div. 1990), aff’d in part, 125 N.J. 421, 442, 593 A. 2d 733 (1991); Minnesota Min. & Mfg. Co. v. Atterbury, 978 S.W. 2d 183, 193 n.7 (Tex. App. 1998); E.I. Dupont de Nemours v. Castillo ex rel. Castillo, 748 So. 2d 1108, 1120 (Fla. Dist. Ct. App. 2000); Bell v. Lollar, 791 N.E.2d 849, 854 (Ind. App. 2003; King v. Burlington Northern & Santa Fe Ry, 277 Neb. 203, 762 N.W.2d 24, 35 & n.16 (2009).

[19] City of Greenville v. WR Grace & Co., 827 F. 2d 975, 984 (4th Cir. 1987); American Home Products Corp. v. Johnson & Johnson, 672 F. Supp. 135, 142 (S.D.N.Y. 1987); Longmore v. Merrell Dow Pharms., Inc., 737 F. Supp. 1117, 1119 (D. Idaho 1990); Conde v. Velsicol Chemical Corp., 804 F. Supp. 972, 1019 (S.D. Ohio 1992); Joiner v. General Elec. Co., 864 F. Supp. 1310, 1322 (N.D. Ga. 1994) (which case ultimately ended up in the Supreme Court); Bowers v. Northern Telecom, Inc., 905 F. Supp. 1004, 1010 (N.D. Fla. 1995); Pick v. American Medical Systems, 958 F. Supp. 1151, 1158 (E.D. La. 1997); Baker v. Danek Medical, 35 F. Supp. 2d 875, 880 (N.D. Fla. 1998).

[20] Rider v. Sandoz Pharms. Corp., 295 F. 3d 1194, 1199 (11th Cir. 2002); Kilpatrick v. Breg, Inc., 613 F. 3d 1329, 1337 (11th Cir. 2010); Siharath v. Sandoz Pharms. Corp., 131 F. Supp. 2d 1347, 1359 (N.D. Ga. 2001); In re Meridia Prods. Liab. Litig., Case No. 5:02-CV-8000 (N.D. Ohio 2004); Henricksen v. ConocoPhillips Co., 605 F. Supp. 2d 1142, 1177 (E.D. Wash. 2009); Doe v. Northwestern Mutual Life Ins. Co., (D. S.C. 2012); In re Chantix (Varenicline) Prods. Liab. Litig., 889 F. Supp. 2d 1272, 1286, 1288, 1290 (N.D. Ala. 2012); Farmer v. Air & Liquid Systems Corp. at n.11 (M.D. Ga. 2018); In re Abilify Prods. Liab. Litig., 299 F. Supp. 3d 1291, 1306 (N.D. Fla. 2018).

[21] Michael D. Green, D. Michal Freedman & Leon Gordis, “Reference Guide on Epidemiology,” 549, 599 n.143, in Federal Judicial Center, National Research Council, Reference Manual on Scientific Evidence (3d ed. 2011).

Reference Manual – Desiderata for 4th Edition – Part VI – Rule 703

February 17th, 2023

One of the most remarkable, and objectionable, aspects of the third edition was its failure to engage with Federal Rule of Evidence of 703, and the need for courts to assess the validity of individual studies relied upon. The statistics chapter has a brief, but important discussion of Rule 703, as does the chapter on survey evidence. The epidemiology chapter mentions Rule 703 only in a footnote.[1]

Rule 703 appears to be the red-headed stepchild of the Federal Rules, and it is often ignored and omitted from so-called Daubert briefs.[2] Perhaps part of the problem is that Rule 703 (“Bases of an Expert”) is one of the mostly poorly drafted rules in the Federal Rules of Evidence:

“An expert may base an opinion on facts or data in the case that the expert has been made aware of or personally observed. If experts in the particular field would reasonably rely on those kinds of facts or data in forming an opinion on the subject, they need not be admissible for the opinion to be admitted. But if the facts or data would otherwise be inadmissible, the proponent of the opinion may disclose them to the jury only if their probative value in helping the jury evaluate the opinion substantially outweighs their prejudicial effect.”

Despite its tortuous wording, the rule is clear enough in authorizing expert witnesses to rely upon studies that are themselves inadmissible, and allowing such witnesses to disclose the studies that they have relied upon, when there has been the requisite showing of probative value that outweighs any prejudice.

The statistics chapter in the third edition, nonetheless, confusingly suggested that

“a particular study may use a method that is entirely appropriate but that is so poorly executed that it should be inadmissible under Federal Rules of Evidence 403 and 702. Or, the method may be inappropriate for the problem at hand and thus lack the ‘fit’ spoken of in Daubert. Or the study might rest on data of the type not reasonably relied on by statisticians or substantive experts and hence run afoul of Federal Rule of Evidence 703.”[3]

Particular studies, even when beautifully executed, are not admissible. And particular studies are not subject to evaluation under Rule 702, apart from the gatekeeping of expert witness opinion testimony that is based upon the particular studies. To be sure, the reference to Rule 703 is important and welcomed counter to the suggestion, elsewhere in the third edition, that courts should not look at individual studies. The independent review of individual studies is occasionally lost in the shuffle of litigation, and the statistics chapter is correct to note an evidentiary concern whether each individual study may or may not be reasonably relied upon by an expert witness. In any event, reasonably relied upon studies do not ipso facto become admissible.

The third edition’s chapter on Survey Research contains the most explicit direction on Rule 703, in terms of courts’ responsibilities.  In that chapter, the authors instruct that Rule 703:

“redirect[ed] attention to the ‘validity of the techniques employed’. The inquiry under Rule 703 focuses on whether facts or data are ‘of a type reasonably relied upon by experts in the particular field in forming opinions or inferences upon the subject’.”[4]

Although Rule 703 is clear enough on admissibility, the epidemiology chapter described epidemiologic studies broadly as admissible if sufficiently rigorous:

“An epidemiologic study that is sufficiently rigorous to justify a conclusion that it is scientifically valid should be admissible, as it tends to make an issue in dispute more or less likely.”[5]

The authors of the epidemiology chapter acknowledge, in a footnote, “that [h]earsay concerns may limit the independent admissibility of the study, but the study could be relied on by an expert in forming an opinion and may be admissible pursuant to Fed. R. Evid. 703 as part of the underlying facts or data relied on by the expert.”[6]

This footnote is curious, and incorrect. There is no question that hearsay “concerns” “may limit” admissibility of a study; hearsay is inadmissible unless there is a statutory exception.[7] Rule 703 is not one of the exceptions to the rule against hearsay in Article VIII of the Federal Rules of Evidence. An expert witness’s reliance upon a study does not make the study admissible. The authors cite two cases,[8] but neither case held that reasonable reliance by expert witnesses transmuted epidemiologic studies into admissible evidence. The text of Rule 703 itself, and the overwhelming weight of case law interpreting and applying the rule,[9]  makes clear that the rule does not render scientific studies admissible. The two cases cited by the epidemiology chapter, Kehm and Ellis, both involved “factual findings” in public investigative or evaluative reports, which were independently admissible under Federal Rule of Evidence 803(8)(C).[10] As such, the cases failed to support the chapter’s suggestion that Rule 703 is a rule of admissibility for epidemiologic studies. The third edition thus, in one sentence, confused Rule 703 with an exception to the rule against hearsay, which would prevent the statistically based epidemiologic studies from being received in evidence. The point was reasonably clear, however, that studies “may be offered” to explain an expert witness’s opinion. Under Rule 705, that offer may also be refused.

The Reference Manual was certainly not alone in advancing the notion that studies are themselves admissible. Other well-respected evidence scholars have misstated the law on this issue.[11] The fourth edition would do well to note that scientific studies, and especially epidemiologic studies, involve multiple levels of hearsay. A typical epidemiologic study may contain hearsay leaps from patient to clinician, to laboratory technicians, to specialists interpreting test results, back to the clinician for a diagnosis, to a nosologist for disease coding, to a national or hospital database, to a researcher querying the database, to a statistician analyzing the data, to a manuscript that details data, analyses, and results, to editors and peer reviewers, back to study authors, and on to publication. Those leaps do not mean that the final results are thus untrustworthy or not reasonably relied upon, but they do raise well-nigh insuperable barriers to admissibility. The inadmissibility of scientific studies is generally not problematic because Rule 703 permits testifying expert witnesses to formulate opinions based upon facts and data, which are not themselves admissible in evidence. The distinction between relied upon, and admissible, studies is codified in the Federal Rules of Evidence, and in virtually every state’s evidence law.

The fourth edition might well also note that under Rule 104(a), the Rules of Evidence themselves do not govern a trial court’s preliminary determination, under Rules 702 or 703, of the admissibility of an expert witness’s opinion, or the appropriateness of reliance upon a particular study. Although Rule 705 may allow disclosure of facts and data described in studies, it is not an invitation to permit testifying expert witnesses to become a conduit for off-hand comments and opinions in the introduction or discussion sections of relied upon articles.[12] The wholesale admission of such hearsay opinions undermines the court’s control over opinion evidence. Rule 703 authorizes reasonable reliance upon “facts and data,” not every opinion that creeps into the published literature.

Reference Manual’s Disregard of Study Validity in Favor of the “Whole Tsumish”

The third edition evidence considerable ambivalence in whether trial judges should engage in resolving disputes about the validity of individual studies relied upon by expert witnesses. Since 2000, Rule 702 clearly required such engagement, which made the Manual’s hesitancy, on the whole, unjustifiable.  The ambivalence with respect to study validity, however, was on full display in the late Professor Margaret Berger’s chapter, “The Admissibility of Expert Testimony.”[13] Berger’s chapter criticized “atomization,” or looking at individual studies in isolation, a process she described pejoratively as “slicing-and-dicing.”[14]

Drawing on the publications of Daubert-critic Susan Haack, Berger appeared to reject the notion that courts should examine the reliability of each study independently.[15] Berger described the “proper” scientific method, as evidenced by works of the International Agency for Research on Cancer (IARC), the Institute of Medicine, the National Institute of Health, the National Research Council, and the National Institute for Environmental Health Sciences, “is to consider all the relevant available scientific evidence, taken as a whole, to determine which conclusion or hypothesis regarding a causal claim is best supported by the body of evidence.”[16]

Berger’s description of the review process, however, was profoundly misleading in its incompleteness. Of course, scientists undertaking a systematic review identify all the relevant studies, but some of the “relevant” studies may well be insufficiently reliable (because of internal or external validity issues) to answer the research question at hand. All the cited agencies, and other research organizations and researchers, exclude studies that are fundamentally flawed, whether as a result of bias, confounding, erroneous data analyses, or related problems. Berger cited no support for her remarkable suggestion that scientists do not make “reliability” judgments about available studies when assessing the “totality of the evidence.”[17]

Professor Berger, who had a distinguished career as a law professor and evidence scholar, died in November 2010, before the third edition was published. She was no friend of Daubert,[18] but her antipathy remarkably outlived her. Berger’s critical discussion of “atomization” cited the notorious decision in Milward v. Acuity Specialty Products Group, Inc., 639 F.3d 11, 26 (1st Cir. 2011), which was decided four months after her passing.[19]

Professor Berger’s contention about the need to avoid assessments of individual studies in favor of the whole “tsumish” must also be rejected because Federal Rule of Evidence 703 requires that each study considered by an expert witness “qualify” for reasonable reliance by virtue of the study’s containing facts or data that are “of a type reasonably relied upon by experts in the particular field forming opinions or inferences upon the subject.” One of the deeply troubling aspects of the Milward decision is that it reversed the trial court’s sensible decision to exclude a toxicologist, Dr. Martyn Smith, who outran his headlights on issues having to do with a field in which he was clearly inexperienced – epidemiology.

Another curious omission in the third edition’s discussions of Milward is the dark ethical cloud of misconduct that hovers over the First Circuit’s reversal of the trial court’s exclusions of Martyn Smith and Carl Cranor. On appeal, the Council for Education and Research on Toxics (CERT) filed an amicus brief in support of reversing the exclusion of Smith and Cranor. The CERT amicus brief, however, never disclosed that CERT was founded by Smith and Cranor, and that CERT funded Smith’s research.[20]

Rule 702 requires courts to pay attention to, among other things, the sufficiency of the facts and data relied upon by expert witnesses. Rule 703’s requirement that individual studies must be reasonably relied upon is an important additional protreptic against the advice given by Professor Berger, in the third edition.


[1] The index notes the following page references for Rule 703: 214, 361, 363-364, and 610 n.184.

[2] See David E. Bernstein & Eric G. Lasker,“Defending Daubert: It’s Time to Amend Federal Rule of Evidence 702,” 57 William & Mary L. Rev. 1, 32 (2015) (“Rule 703 is frequently ignored in Daubert analyses”);  Schachtman, “Rule 703 – The Problem Child of Article VII,” 17 Proof 3 (Spring 2009); Schachtman “The Effective Presentation of Defense Expert Witnesses and Cross-examination of Plaintiffs’ Expert Witnesses”; at the ALI-ABA Course on Opinion and Expert Witness Testimony in State and Federal Courts (February 14-15, 2008). See also Julie E. Seaman, “Triangulating Testimonial Hearsay: The Constitutional Boundaries of Expert Opinion Testimony,” 96 Georgetown L.J. 827 (2008); “RULE OF EVIDENCE 703 — Problem Child of Article VII” (Sept. 19, 2011); “Giving Rule 703 the Cold Shoulder” (May 12, 2012); “New Reference Manual on Scientific Evidence Short Shrifts Rule 703,” (Oct. 16, 2011).

[3] RMSE3d at 214.

[4] RMSE3d at 364 (internal citations omitted).

[5] RMSE 3d at 610 (internal citations omitted).

[6] RSME3d at 601 n.184.

[7] Rule 802 (“Hearsay Rule”) “Hearsay is not admissible except as provided by these rules or by other rules prescribed by the Supreme Court pursuant to statutory authority or by Act of Congress.”

[8] Kehm v. Procter & Gamble Co., 580 F. Supp. 890, 902 (N.D. Iowa 1982) (“These [epidemiologic] studies were highly probative on the issue of causation—they all concluded that an association between tampon use and menstrually related TSS [toxic shock syndrome] cases exists.”), aff’d, 724 F.2d 613 (8th Cir. 1984); Ellis v. International Playtex, Inc., 745 F.2d 292, 303 (4th Cir. 1984). The chapter also cited another the en banc decision in Christophersen for the proposition that “[a]s a general rule, questions relating to the bases and sources of an expert’s opinion affect the weight to be assigned that opinion rather than its admissibility. . . . ” In the Christophersen case, the Fifth Circuit was clearly addressing the admissibility of the challenged expert witness’s opinions, not the admissibility of relied-upon studies. Christophersen v. Allied-Signal Corp., 939 F.2d 1106, 1111, 1113-14 (5th Cir. 1991) (en banc) (per curiam) (trial court may exclude opinion of expert witness whose opinion is based upon incomplete or inaccurate exposure data), cert. denied, 112 S. Ct. 1280 (1992).

[9] Interestingly, the authors of this chapter abandoned their suggestion, advanced in the second edition, that studies relied upon “might qualify for the learned treatise exception to the hearsay rule, Fed. R. Evid. 803(18), or possibly the catchall exceptions, Fed. R. Evid. 803(24) & 804(5).” which was part of their argument in the Second Edition. RMSE 2d at 335 (2000). See also RMSE 3d at 214 (discussing statistical studies as generally “admissible,” but acknowledging that admissibility may be no more than permission to explain the basis for an expert’s opinion, which is hardly admissibility at all).

[10] See Ellis, 745 F.2d at 299-303; Kehm, 724 F.2d at 617-18. These holdings predated the Supreme Court’s 1993 decision in Daubert, and the issue whether they are subject to Rule 702 has not been addressed.  Federal agency factual findings have been known to be invalid, on occasion.

[11] David L. Faigman, et al., Modern Scientific Evidence: The Law and Science of Expert Testimony v.1, § 23:1,at 206 (2009) (“Well conducted studies are uniformly admitted.”).

[12] Montori, et al., “Users’ guide to detecting misleading claims in clinical research reports,” 329 Br. Med. J. 1093, 1093 (2004) (advising readers on how to avoid being misled by published literature, and counseling readers to “Read only the Methods and Results sections; bypass the Discussion section.”)  (emphasis added).

[13] RSME 3d 11 (2011).

[14] Id. at 19.

[15] Id. at 20 & n. 51 (citing Susan Haack, “An Epistemologist in the Bramble-Bush: At the Supreme Court with Mr. Joiner,” 26 J. Health Pol. Pol’y & L. 217–37 (1999).

[16] Id. at 19-20 & n.52.

[17] See Berger, “The Admissibility of Expert Testimony,” RSME 3d 11 (2011).  Professor Berger never mentions Rule 703 at all!  Gone and forgotten.

[18] Professor Berger filed an amicus brief on behalf of plaintiffs, in Rider v. Sandoz Pharms. Corp., 295 F.3d 1194 (11th Cir. 2002).

[19] Id. at 20 n.51. (The editors note that the published chapter was Berger’s last revision, with “a few edits to respond to suggestions by reviewers.”) The addition of the controversial Milward decision cannot seriously be considered an “edit.”

[20]From Here to CERT-ainty” (June 28, 2018); “ THE COUNCIL FOR EDUCATION AND RESEARCH ON TOXICS” (July 9, 2013).

Reference Manual – Desiderata for 4th Edition – Part V – Specific Tortogens

February 14th, 2023

Examples are certainly helpful to explain and to show judges how real scientists reach causal conclusions. The Reference Manual should certainly give such examples of how scientists determine whether a claim has been adequately tested, and whether the claim has eliminated the myriad kinds of error that threaten such claims and require us to withhold our assent. The third edition of the Manual, however, advances some dodgy examples, without any data or citations. I have already pointed out that the third edition’s reference to clear cell adenocarcinoma of the vagina in young women as a “signal” disease caused only by DES is incorrect.[1] There are, alas, other troubling examples in the third edition, which are due for pruning.

Claimed Interaction Between Asbestos and Tobacco Risks for Lung Cancer

The third edition’s chapter on epidemiology discusses the complexities raised by potential interaction between multiple exposures. The discussion is appropriately suggesting that a relative risk cannot be used to determine the probability of individual causation “if the agent interacts with another cause in a way that results in an increase in disease beyond merely the sum of the increased incidence due to each agent separately.” The suggestion is warranted, although the chapter then is mum on whether there are other approaches that can be invoked to derive probabilities of causation when multiple exposures interact in a known way. Then the authors provided an example:

“For example, the relative risk of lung cancer due to smoking is around 10, while the relative risk for asbestos exposure is approximately 5. The relative risk for someone exposed to both is not the arithmetic sum of the two relative risks, that is, 15, but closer to the product (50- to 60-fold), reflecting an interaction between the two.200 Neither of the individual agent’s relative risks can be employed to estimate the probability of causation in someone exposed to both asbestos and cigarette smoke.”[2]

Putting aside for the moment the general issue of interaction, the chapter’s use of the Mt. Sinai catechism, of 5-10-50, for asbestos and tobacco smoking and lung cancer, is a poor choice. The evidence for multiplicative interaction was advanced by the late Irving Selikoff, and frankly the evidence was never very good. The supposed “non-smokers” were really “never smoked regularly,” and the smoking histories were taken by postcard surveys. The cohort of asbestos insulators was well aware of the study hypothesis, in that many of its members had compensations claims, and they had an interest in downplaying their smoking.  Indeed, the asbestos workers’ union helped fund Selikoff’s work, and Selikoff had served as a testifying expert witness for claimants.

Given that “never smoked regularly” is not the same as never having smoked, and given that the ten-fold risk from smoking-alone was already an underestimate of lung cancer risk from smoking alone, the multiplicative model never was on a firm basis.  The smoking-alone risk ratio was doubled in the American Cancer Society’s Cancer Prevention Survey Numbers One and Two, but the Mt. Sinai physicians, who frequently testified in lawsuits for claimants steadfastly held to their outdated statistical control group.[3] It is thus disturbing that the third edition’s authors trotted out a summary of asbestos / smoking lung cancer risks based upon Selikoff’s dodgy studies of asbestos insulation workers. The 5-10-50 dogma was already incorrect when the first edition went to press.

Not only were Selikoff’s study probably incorrect when originally published, updates to the insulation worker cohort published after his death, specifically undermine the multiplicative claim. In a 2013 publication by Selikoff’s successors, asbestos and smoking failed to show multiplicative interaction.  Indeed, occupational asbestos exposure that had not manifested in clinically apparent asbestosis did not show any interaction with smoking.  Only in a subgroup of insulators with clinically detectable asbestosis did the asbestosis and smoking show “supra-additive” (but not multiplicative) interaction.[4]

Manganese and Parkinson’s Disease

Table 1, of the toxicology chapter in the third edition, presented a “Sample of Selected Toxicological End Points and Examples of Agents of Concern in Humans.” The authors cautioned that the table was “not an exhaustive or inclusive list of organs, end points, or agents. Absence from this list does not indicate a relative lack of evidence for a causal relation as to any agent of concern.”[5] Among the examples presented in this Table 1 was neurotoxicity in the form of “Parkinson’s disease and manganese”[6]

The presence of this example of this example in Table 1 is curious on a number of fronts. First, one of the members of the Development Committee for the third edition was Judge Kathleen O’Malley, who presided over a multi-district litigation involving claims for parkinsonism and Parkinson’s disease against manufacturers of welding rods. It seemed unlikely that Judge O’Malley would have overlooked this section. See, e.g., In re Welding Fume Prods. Liab. Litig., 245 F.R.D. 279 (N.D. Ohio 2007) (exposure to manganese fumes allegedly increased the risk of later developing brain damage). More important, however, the authors’ inclusion of Parkinson’s disease as an outcome from manganese exposure is remarkable because that putative relationship has been extensively studied and rejected by leading researchers in the field of movement disorders.[7] In 2010, neuro-epidemiologists published a comprehensive meta-analysis that confirmed the absence of a relationship between manganese exposure and Parkinson’s disease.[8] The inclusion in Table 1 of a highly controversial relationship, manganese-Parkinson’s disease, suggests either undisclosed partisanship or ignorance of the relevant scientific evidence.

Mesothelioma

The toxicology chapter of the third edition also weighed in on mesothelioma as a supposed signature disease of asbestos exposure. The chapter’s authors described mesothelioma as “almost always caused by asbestos,”[9] which was no doubt true when mesothelioma was first identified as caused by fibrous amphibole minerals.[10] The last two decades, however, has seen a shift in the incidence of mesothelioma among industrially exposed workers, which reveals more cases without asbestos exposure and with other potential causes. Leading scientists in the field have acknowledged non-asbestos causes,[11] and recently researchers have identified genetic mutations that completely account for the causation of individual cases of mesothelioma.[12] It is time for the fourth edition to acknowledge other causes of mesothelioma, and to offer judges and lawyers guidance on genetic causes of sporadic diseases.


[1] SeeReference Manual – Desiderata for the Fourth Edition – Signature Disease” (Jan. 30, 2023).

[2] RMSE3d at 615 & n. 200. The chapter fails to cite support for the 5-10-50 dogma, but it is readily recognizable as the Mt. Sinai Catechism that was endlessly repeated by Irving Selikoff and his protégés.

[3] Michael J. Thun, Cathy A. Day-Lally, Eugenia E. Calle, W. Dana Flanders, and Clark W Heath, “Excess mortality among cigarette smokers: Changes in a 20-year interval,” 85 Am. J. Public Health 1223 (1995).

[4] Steve Markowitz, Stephen Levin, Albert Miller, and Alfredo Morabia, “Asbestos, Asbestosis, Smoking and Lung Cancer: New Findings from the North American Insulator Cohort,” 188 Am. J. Respir. & Critical Care Med. 90 (2013); seeThe Mt. Sinai Catechism” (June 7, 2013).

[5] RMSE3d at 653-54.

[6] Reference Manual at 653.

[7] See e.g., Karin Wirdefeldt, Hans-Olaf Adami, Philip Cole, Dimitrios Trichopoulos, and Jack Mandel, “Epidemiology and etiology of Parkinson’s disease: a review of the evidence. 26 European J. Epidemiol. S1, S20-21 (2011); Tomas R. Guilarte, “Manganese and Parkinson’s Disease: A Critical Review and New Findings,” 118 Environ Health Perspect. 1071, 1078 (2010) (“The available evidence from human and nonhuman primate studies using behavioral, neuroimaging, neurochemical, and neuropathological end points provides strong support to the hypothesis that, although excess levels of [manganese] accumulation in the brain results in an atypical form of parkinsonism, this clinical outcome is not associated with the degeneration of nigrostriatal dopaminergic neurons as is the case in PD [Parkinson’s disease].”)

[8] James Mortimer, Amy Borenstein, and Lorene Nelson, “Associations of welding and manganese exposure with Parkinson disease: Review and meta-analysis,” 79 Neurology 1174 (2012).

[9] Bernard D. Goldstein & Mary Sue Henifin, “Reference Guide on Toxicology,” RMSE3d 633, 635 (2011).

[10] See J. Christopher Wagner, C.A. Sleggs, and Paul Marchand, “Diffuse pleural mesothelioma and asbestos exposure in the North Western Cape Province,” 17 Br. J. Indus. Med. 260 (1960); J. Christopher Wagner, “The discovery of the association between blue asbestos and mesotheliomas and the aftermath,” 48 Br. J. Indus. Med. 399 (1991); see also Harriet Hardy, M.D., Challenging Man-Made Disease:  The Memoirs of Harriet L. Hardy, M.D. 95 (1983); “Harriet Hardy’s Views on Asbestos Issues” (Mar. 13, 2013).

[11] Richard L. Attanoos, Andrew Churg, Allen R. Gibbs, and Victor L. Roggli, “Malignant Mesothelioma and Its Non-Asbestos Causes,” 142 Arch. Pathol. & Lab. Med. 753 (2018).

[12] Angela Bononia, Qian Wangb, Alicia A. Zolondick, Fang Baib, Mika Steele-Tanjia, Joelle S. Suareza , Sandra Pastorinoa, Abigail Sipesa, Valentina Signoratoa, Angelica Ferroa, Flavia Novellia , Jin-Hee Kima, Michael Minaaia,d, Yasutaka Takinishia, Laura Pellegrinia, Andrea Napolitanoa, Ronghui Xua , Christine Farrara , Chandra Goparajua, Cristian Bassig, Massimo Negrinig, Ian Paganoa , Greg Sakamotoa, Giovanni Gaudinoa, Harvey I. Pass, José N. Onuchic , Haining Yang, and Michele Carbone, “BAP1 is a novel regulator of HIF-1α,” 120 Proc. Nat’l Acad. Sci. e2217840120 (2023).

Reference Manual – Desiderata for 4th Edition – Part IV – Confidence Intervals

February 10th, 2023

Putting aside the idiosyncratic chapter by the late Professor Berger, most of the third edition of the Reference Manual presented guidance on many important issues.  To be sure, there are gaps, inconsistencies, and mistakes, but the statistics chapter should be a must-read for federal (and state) judges. On several issues, especially statistical in nature, the fourth edition could benefit from an editor to ensure that the individual chapters, written by different authors, actually agree on key concepts.  One such example is the third edition’s treatment of confidence intervals.[1]

The “DNA Identification” chapter noted that the meaning of a confidence interval is subtle,[2] but I doubt that the authors, David Kaye and George Sensabaugh, actually found it subtle or difficult. In the third edition’s chapter on statistics, David Kaye and co-author, the late David A. Freedman, gave a reasonable definition of confidence intervals in their glossary:

confidence interval. An estimate, expressed as a range, for a parameter. For estimates such as averages or rates computed from large samples, a 95% confidence interval is the range from about two standard errors below to two standard errors above the estimate. Intervals obtained this way cover the true value about 95% of the time, and 95% is the confidence level or the confidence coefficient.”[3]

Intervals, not the interval, which is correct. This chapter made clear that it was the procedure of obtaining multiple samples with intervals that yielded the 95% coverage. In the substance of their chapter, Kaye and Freedman are explicit about how intervals are constructed, and that:

“the confidence level does not give the probability that the unknown parameter lies within the confidence interval.”[4]

Importantly, the authors of the statistics chapter named names; that is, they cited some cases that butchered the concept of the confidence interval.[5] The fourth edition will have a more difficult job because, despite the care taken in the statistics chapter, many more decisions have misstated or misrepresented the meaning of a confidence interval.[6] Citing more cases perhaps will disabuse federal judges of their reliance upon case law for the meaning of statistical concepts.

The third edition’s chapter on multiple regression defined confidence interval in its glossary:

confidence interval. An interval that contains a true regression parameter with a given degree of confidence.”[7]

The chapter avoided saying anything obviously wrong only by giving a very circular definition. When the chapter substantively described a confidence interval, it ended up giving an erroneous one:

“In general, for any parameter estimate b, the expert can construct an interval around b such that there is a 95% probability that the interval covers the true parameter. This 95% confidence interval is given by: b ± 1.96 (SE of b).”[8]

The formula provided is correct, but the interpretation of a 95% probability that the interval covers the true parameter is unequivocably wrong.[9]

The third edition’s chapter by Shari Seidman Diamond on survey research, on the other hand, gave an anodyne example and a definition:

“A survey expert could properly compute a confidence interval around the 20% estimate obtained from this sample. If the survey were repeated a large number of times, and a 95% confidence interval was computed each time, 95% of the confidence intervals would include the actual percentage of dentists in the entire population who would believe that Goldgate was manufactured by the makers of Colgate.

                 *  *  *  *

Traditionally, scientists adopt the 95% level of confidence, which means that if 100 samples of the same size were drawn, the confidence interval expected for at least 95 of the samples would be expected to include the true population value.”[10]

Similarly, the third edition’s chapter on epidemiology correctly defined the confidence interval operationally as a process of iterative intervals that collectively cover the true value in 95% of all the intervals:

“A confidence interval provides both the relative risk (or other risk measure) found in the study and a range (interval) within which the risk likely would fall if the study were repeated numerous times.”[11]

Not content to leave it well said, the chapter’s authors returned to the confidence interval and provided another, more problematic definition, a couple of pages later in the text:

“A confidence interval is a range of possible values calculated from the results of a study. If a 95% confidence interval is specified, the range encompasses the results we would expect 95% of the time if samples for new studies were repeatedly drawn from the same population.”[12]

The first sentence refers to “a study”; that is, one study, one range of values. The second sentence then tells us that “the range” (singular, presumably referring back to the single “a study”), will capture 95% of the results from many resamplings from the same population. Now the definition is not framed with respect to the true population parameter, but the results from many other samples. The authors seem to have given the first sample’s confidence interval the property of including 95% of all future studies, and that is incorrect. From reviewing the case law, courts remarkably have gravitated to the second, incorrect definition.

The glossary to the third edition’s epidemiology chapter clearly, however, runs into the ditch:

“confidence interval. A range of values calculated from the results of a study within which the true value is likely to fall; the width of the interval reflects random error. Thus, if a confidence level of .95 is selected for a study, 95% of similar studies would result in the true relative risk falling within the confidence interval.”[13]

Note that the sentence before the semicolon talked of “a study” with “a range of values,” and that there is a likelihood of that range including the “true value.” This definition thus used the singular to describe the study and to describe the range of values.  The definition seemed to be saying, clearly but wrongly, that a single interval from a single study has a likelihood of containing the true value. The second full sentence ascribed a probability, 95%, to the true relative risk’s falling within “the interval.” To point out the obvious, “the interval,” is singular, and refers back to “a study,” also singular. At best, this definition was confusing; at worst, it was wrong.

The Reference Manual has a problem beyond its own inconsistencies, and the refractory resistance of the judiciary to statistical literacy. There are any number of law professors and even scientists who have held out incorrect definitions and interpretations of confidence intervals.  It would be helpful for the fourth edition to caution its readers, both bench and bar, to the prevalent misunderstandings.

Here, for instance, is an example of a well-credentialed statistician, who gave a murky definition in a declaration filed in federal court:

“If a 95% confidence interval is specified, the range encompasses the results we would expect 95% of the time if samples for new studies were repeatedly drawn from the same population.”[14]

The expert witness correctly identifies the repeated sampling, but specifies a 95% probability to “the range,” which leaves unclear whether it is the range of all intervals or “a 95% confidence interval,” which is in the antecedent of the statement.

Much worse was a definition proffered in a recent law review article by well-known, respected authors:

“A 95% confidence interval, in contrast, is a one-sided or two-sided interval from a data sample with 95% probability of bounding a fixed, unknown parameter, for which no nondegenerate probability distribution is conceived, under specified assumptions about the data distribution.”[15]

The phrase “for which no nondegenerate probability distribution is conceived,” is unclear as to whether the quoted phrase refers to the confidence interval or to the unknown parameter. It seems that the phrase modifies the noun closest to it in the sentence, the “fixed, unknown parameter,” which suggests that these authors were simply trying to emphasize that they were giving a frequentist interpretation and not conceiving of the parameter as a random variable as Bayesians would. The phrase “no nondegenerate” appears to be a triple negative, since a degenerate distribution is one that does not have a variation. The phrase makes the definition obscure, and raises questions what is being excluded by the phrase.

The more concerning aspect of the quoted footnote is its obfuscation of the important distinction between the procedure of repeatedly calculating confidence intervals (which procedure has a 95% success rate in the long run) and the probability that any given instance of the procedure, in a single confidence interval, contains the parameter. The latter probability is either zero or one.

The definition’s reference to “a” confidence interval, based upon “a” data sample, actually leaves the reader with no way of understanding the definition to be referring to the repeated process of sampling, and the set of resulting intervals. The upper and lower interval bounds are themselves random variables that need to be taken into account, but by referencing a single interval from a single data sample, the authors misrepresent the confidence interval and invite a Bayesian interpretation.[16]

Sadly, there is a long tradition of scientists and academics in giving errant definitions and interpretations of the confidence interval.[17] Their error is not harmless because they invite the attribution of a high level of probability to the claim that the “true” population measure is within the reported confidence interval. The error encourages readers to believe that the confidence interval is not conditioned upon the single sample result, and it misleads readers into believing that not only random error, but systematic and data errors are accounted for in the posterior probability.[18] 


[1]Confidence in Intervals and Diffidence in the Courts” (Mar. 4, 2012).

[2] David H. Kaye & George Sensabaugh, “Reference Guide on DNA Identification Evidence” 129, 165 n.76.

[3] David H. Kaye & David A. Freedman, “Reference Guide on Statistics” 211, 284-5 (Glossary).

[4] Id. at 247.

[5] Id. at 247 n.91 & 92 (citing DeLuca v. Merrell Dow Pharms., Inc., 791 F. Supp. 1042, 1046 (D.N.J. 1992), aff’d, 6 F.3d 778 (3d Cir. 1993); SmithKline Beecham Corp. v. Apotex Corp., 247 F. Supp. 2d 1011, 1037 (N.D. Ill. 2003), aff’d on other grounds, 403 F.3d 1331 (Fed. Cir. 2005); In re Silicone Gel Breast Implants Prods. Liab. Litig, 318 F. Supp. 2d 879, 897 (C.D. Cal. 2004) (“a margin of error between 0.5 and 8.0 at the 95% confidence level . . . means that 95 times out of 100 a study of that type would yield a relative risk value somewhere between 0.5 and 8.0.”).

[6] See, e.g., Turpin v. Merrell Dow Pharm., Inc., 959 F.2d 1349, 1353–54 & n.1 (6th Cir. 1992) (erroneously describing a 95% CI of 0.8 to 3.10, to mean that “random repetition of the study should produce, 95 percent of the time, a relative risk somewhere between 0.8 and 3.10”); American Library Ass’n v. United States, 201 F.Supp. 2d 401, 439 & n.11 (E.D.Pa. 2002), rev’d on other grounds, 539 U.S. 194 (2003); Ortho–McNeil Pharm., Inc. v. Kali Labs., Inc., 482 F.Supp. 2d 478, 495 (D.N.J.2007) (“Therefore, a 95 percent confidence interval means that if the inventors’ mice experiment was repeated 100 times, roughly 95 percent of results would fall within the 95 percent confidence interval ranges.”) (apparently relying party’s expert witness’s report), aff’d in part, vacated in part, sub nom. Ortho McNeil Pharm., Inc. v. Teva Pharms Indus., Ltd., 344 Fed.Appx. 595 (Fed. Cir. 2009); Eli Lilly & Co. v. Teva Pharms, USA, 2008 WL 2410420, *24 (S.D. Ind. 2008) (stating incorrectly that “95% percent of the time, the true mean value will be contained within the lower and upper limits of the confidence interval range”); Benavidez v. City of Irving, 638 F.Supp. 2d 709, 720 (N.D. Tex. 2009) (interpreting a 90% CI to mean that “there is a 90% chance that the range surrounding the point estimate contains the truly accurate value.”); Pritchard v. Dow Agro Sci., 705 F. Supp. 2d 471, 481, 488 (W.D. Pa. 2010) (excluding Dr. Bennet Omalu who assigned a 90% probability that an 80% confidence interval excluded relative risk of 1.0), aff’d, 430 F. App’x 102 (3d Cir.), cert. denied, 132 S. Ct. 508 (2011); Estate of George v. Vermont League of Cities and Towns, 993 A.2d 367, 378 n.12 (Vt. 2010) (erroneously describing a confidence interval to be a “range of values within which the results of a study sample would be likely to fall if the study were repeated numerous times”); Garcia v. Tyson Foods, 890 F. Supp. 2d 1273, 1285 (D. Kan. 2012) (quoting expert witness Robert G. Radwin, who testified that a 95% confidence interval in a study means “if I did this study over and over again, 95 out of a hundred times I would expect to get an average between that interval.”); In re Chantix (Varenicline) Prods. Liab. Litig., 889 F. Supp. 2d 1272, 1290n.17 (N.D. Ala. 2012); In re Zoloft Products, 26 F. Supp. 3d 449, 454 (E.D. Pa. 2014) (“A 95% confidence interval means that there is a 95% chance that the ‘‘true’’ ratio value falls within the confidence interval range.”), aff’d, 858 F.3d 787 (3d Cir. 2017); Duran v. U.S. Bank Nat’l Ass’n, 59 Cal. 4th 1, 36, 172 Cal. Rptr. 3d 371, 325 P.3d 916 (2014) (“Statisticians typically calculate margin of error using a 95 percent confidence interval, which is the interval of values above and below the estimate within which one can be 95 percent certain of capturing the ‘true’ result.”); In re Accutane Litig., 451 N.J. Super. 153, 165 A.3d 832, 842 (2017) (correctly quoting an incorrect definition from the third edition at p.580), rev’d on other grounds, 235 N.J. 229, 194 A.3d 503 (2018); In re Testosterone Replacement Therapy Prods. Liab., No. 14 C 1748, MDL No. 2545, 2017 WL 1833173, *4 (N.D. Ill. May 8, 2017) (“A confidence interval consists of a range of values. For a 95% confidence interval, one would expect future studies sampling the same population to produce values within the range 95% of the time.”); Maldonado v. Epsilon Plastics, Inc., 22 Cal. App. 5th 1308, 1330, 232 Cal. Rptr. 3d 461 (2018) (“The 95 percent ‘confidence interval’, as used by statisticians, is the ‘interval of values above and below the estimate within which one can be 95 percent certain of capturing the “true” result’.”); Escheverria v. Johnson & Johnson, 37 Cal. App. 5th 292, 304, 249 Cal. Rptr. 3d 642 (2019) (quoting uncritically and with approval one of plaintiff’s expert witnesses, Jack Siemiatycki, who gave the jury an example of a study with a relative risk of 1.2, with a “95 percent probability that the true estimate is between 1.1 and 1.3.” According to the court, Siemiatycki went on to explain that this was “a pretty tight interval, and we call that a confidence interval. We call it a 95 percent confidence interval when we calculate it in such a way that it covers 95 percent of the underlying relative risks that are compatible with this estimate from this study.”); In re Viagra (Sildenafil Citrate) & Cialis (Tadalafil) Prods. Liab. Litig., 424 F.Supp.3d 781, 787 (N.D. Cal. 2020) (“For example, a given study could calculate a relative risk of 1.4 (a 40 percent increased risk of adverse events), but show a 95 percent “confidence interval” of .8 to 1.9. That confidence interval means there is 95 percent chance that the true value—the actual relative risk—is between .8 and 1.9.”); Rhyne v. United States Steel Corp., 74 F. Supp. 3d 733, 744 (W.D.N.C. 2020) (relying upon, and quoting, one of the more problematic definitions given in the third edition at p.580: “If a 95% confidence interval is specified, the range encompasses the results we would expect 95% of the time if samples for new studies were repeatedly drawn from the population.”); Wilant v. BNSF Ry., C.A. No. N17C-10-365 CEB, (Del. Super. Ct. May 13, 2020) (citing third edition at p.573, “a confidence interval provides ‘a range (interval) within which the risk likely would fall if the study were repeated numerous times’.”; “[s]o a 95% confidence interval indicates that the range of results achieved in the study would be achieved 95% of the time when the study is replicated from the same population.”); Germaine v. Sec’y Health & Human Servs., No. 18-800V, (U.S. Fed. Ct. Claims July 29, 2021) (giving an incorrect definition directly from the third edition, at p.621; “[a] “confidence interval” is “[a] range of values … within which the true value is likely to fall[.]”).

[7] Daniel Rubinfeld, “Reference Guide on Multiple Regression” 303, 352.

[8] Id. at 342.

[9] See Sander Greenland, Stephen J. Senn, Kenneth J. Rothman, John B. Carlin, Charles Poole, Steven N. Goodman, and Douglas G. Altman, “Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations,” 31 Eur. J. Epidemiol. 337, 343 (2016).

[10] Shari Seidman Diamond, “Reference Guide on Survey Research” 359, 381.

[11] Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” 549, 573.

[12] Id. at 580.

[13] Id. at 621.

[14] In re Testosterone Replacement Therapy Prods. Liab. Litig., Declaration of Martin T. Wells, Ph.D., at 2-3 (N.D. Ill., Oct. 30, 2016). 

[15] Joseph Sanders, David Faigman, Peter Imrey, and A. Philip Dawid, “Differential Etiology: Inferring Specific Causation in the Law from Group Data in Science,” 63 Arizona L. Rev. 851, 898 n.173 (2021).

[16] The authors are well-credentialed lawyers and scientists. Peter Imrey, was trained in, and has taught, mathematical statistics, biostatistics, and epidemiology. He is a professor of medicine in the Cleveland Clinic Lerner College of Medicine. A. Philip Dawid is a distinguished statistician, an Emeritus Professor of Statistics, Cambridge University, Darwin College, and a Fellow of the Royal Society. David Faigman is the Chancellor & Dean, and the John F. Digardi Distinguished Professor of Law at the University of California Hastings College of the Law. Joseph Sanders is the A.A. White Professor, at the University of Houston Law Center. I have previously pointed this problem in these authors’ article. “Differential Etiologies – Part One – Ruling In” (June 19, 2022).

[17] See, e.g., Richard W. Clapp & David Ozonoff, “Environment and Health: Vital Intersection or Contested Territory?” 30 Am. J. L. & Med. 189, 210 (2004) (“Thus, a RR [relative risk] of 1.8 with a confidence interval of 1.3 to 2.9 could very likely represent a true RR of greater than 2.0, and as high as 2.9 in 95 out of 100 repeated trials.”); Erica Beecher-Monas, Evaluating Scientific Evidence: An Interdisciplinary Framework for Intellectual Due Process 60-61 n. 17 (2007) (quoting Clapp and Ozonoff with obvious approval); Déirdre DwyerThe Judicial Assessment of Expert Evidence 154-55 (Cambridge Univ. Press 2008) (“By convention, scientists require a 95 per cent probability that a finding is not due to chance alone. The risk ratio (e.g. ‘2.2’) represents a mean figure. The actual risk has a 95 per cent probability of lying somewhere between upper and lower limits (e.g. 2.2 ±0.3, which equals a risk somewhere between 1.9 and 2.5) (the ‘confidence interval’).”); Frank C. Woodside, III & Allison G. Davis, “The Bradford Hill Criteria: The Forgotten Predicate,” 35 Thomas Jefferson L. Rev. 103, 110 (2013) (“A confidence interval provides both the relative risk found in the study and a range (interval) within which the risk would likely fall if the study were repeated numerous times.”); Christopher B. Mueller, “Daubert Asks the Right Questions:  Now Appellate Courts Should Help Find the Right Answers,” 33 Seton Hall L. Rev. 987, 997 (2003) (describing the 95% confidence interval as “the range of outcomes that would be expected to occur by chance no more than five percent of the time”); Arthur H. Bryant & Alexander A. Reinert, “The Legal System’s Use of Epidemiology,” 87 Judicature 12, 19 (2003) (“The confidence interval is intended to provide a range of values within which, at a specified level of certainty, the magnitude of association lies.”) (incorrectly citing the first edition of Rothman & Greenland, Modern Epidemiology 190 (Philadelphia 1998);  John M. Conley & David W. Peterson, “The Science of Gatekeeping: The Federal Judicial Center’s New Reference Manual on Scientific Evidence,” 74 N.C.L.Rev. 1183, 1212 n.172 (1996) (“a 95% confidence interval … means that we can be 95% certain that the true population average lies within that range”).

[18] See Brock v. Merrill Dow Pharm., Inc., 874 F.2d 307, 311–12 (5th Cir. 1989) (incorrectly stating that the court need not resolve questions of bias and confounding because “the studies presented to us incorporate the possibility of these factors by the use of a confidence interval”). Bayesian credible intervals can similarly be misleading when the interval simply reflects sample results and sample variance, but not the myriad other ways the estimate may be wrong.

Reference Manual – Desiderata for 4th Edition – Part III – Differential Etiology

February 1st, 2023

Admittedly, I am playing the role of the curmudgeon here by pointing out errors or confusions in the third edition of the Reference Manual.  To be sure, there are many helpful and insightful discussions throughout the Manual, but they do not need to be revised.  Presumably, the National Academies and the Federal Judicial Center are undertaking the project of producing a fourth edition because they understand that revisions, updates, and corrections are needed. Otherwise, why bother?

To be sure, there are aspects of the third edition’s epidemiology chapter that get some important points right. 

(1) The chapter at least acknowledges that small relative risks (1 < RR <3) may be insufficient to support causal inferences.[1]

(2) The chapter correctly notes that the method known as “differential etiology” addresses only specific causation, and that the method presupposes that general causation has been established.[2]

(3) The third edition correctly observes that clinicians generally are not concerned with etiology as much as with diagnosis of disease.[3] The authors of the epidemiology chapter correctly observe that “[f]or many health conditions, the cause of the disease or illness has no relevance to its treatment, and physicians, therefore, do not employ this term or pursue that question.”[4] This observation alone should help trial courts question whether many clinicians have even the pretense of expertise to offer expert causation opinions.[5]

(4) With respect to so-called differential etiology, the third edition correctly states that this mode of reasoning is a logically valid argument if premises are true; that is, general causation must be established for each “differential etiology.” The epidemiology chapter observes that “like any scientific methodology, [differential etiology] can be performed in an unreliable manner.”[6]

(5) The third edition reports that the differential etiology argument as applied in litigation is often invalid because not all the differentials other than the litigation claim have been ruled out.[7]

(6) The third edition properly notes that for diseases for which the causes are largely unknown, such as most birth defects, a differential etiology is of little benefit.[8] Unfortunately, the third edition offered no meaningful guidance for how courts should consider differential etiologies offered when idiopathic cases make up something less “than largely,” (0% < Idiopathic < 10%, 20%, 30%, 40, 50%, etc.).The chapter acknowledges that:

“Although differential etiologies are a sound methodology in principle, this approach is only valid if … a substantial proportion of competing causes are known. Thus, for diseases for which the causes are largely unknown, such as most birth defects, a differential etiology is of little benefit.”[9]

Accordingly, many cases reject proffered expert witness testimony on differential etiology, when the witnesses failed to rule out idiopathic causes in the case at issue. What is a substantial proportion?  Unfortunately, the third edition did not attempt to quantify or define “substantial.” The inability to rule out unknown etiologies remains the fatal flaw in much expert witness opinion testimony on specific causation.

Errant Opinions on Differential Etiology

The third edition’s treatment of differential etiology does leave room for improvement. One glaring error is the epidemiology chapter’s assertion that “differential etiology is a legal invention not used by physicians.”[10] Indeed, the third edition provides a definition for “differential etiology” that reinforces the error:

differential etiology. Term used by the court or witnesses to establish or refute external causation for a plaintiff’s condition. For physicians, etiology refers to cause.”[11]

The third edition’s assertion about legal provenance and exclusivity can be quickly dispelled by a search on “differential etiology” in the National Library of Medicine’s PubMed database, which shows up dozens of results, going back to the early 1960s. Some citations are supplied in the notes.[12] A Google Ngram for “differential etiology” in American English shows prevalent usage well before any of the third edition’s cited cases:

The third edition’s erroneous assertion about the provenance of “differential etiology” has been echoed by other law professors. David Faigman, for instance, has claimed that in advancing differential etiologies, expert witnesses were inventing wholesale an approach that had no foundation or acceptance in their scientific disciplines:

“Differential etiology is ostensibly a scientific methodology, but one not developed by, or even recognized by, physicians or scientists. As described, it is entirely logical, but has no scientific methods or principles underlying it. It is a legal invention and, as such, has analytical heft, but it is entirely bereft of empirical grounding. Courts and commentators have so far merely described the logic of differential etiology; they have yet to define what that methodology is.”[13]

Faigman’s claim that courts and commentators have not defined the methodology underlying differential etiology is wrong. Just as hypothesis testing is predicated upon a probabilistic version of modus tollens, differential etiology is based upon “iterative disjunctive syllogism,” or modus tollendo ponens. Basic propositional logic recognizes that such syllogisms are valid arguments,[14] in which one of its premises is a disjunction (P v Q), and the other premise is the negation of one of the disjuncts:

P v Q

~P­­­_____

∴ Q

If we expand the disjunctive premise to more than one disjunction, we can repeat the inference (iteratively), eliminating one disjunct at a time, until we arrive at a conclusion that is a simple, affirmative proposition, without any disjunctions in it.

P v Q v R

~P­­­_____

∴ Q v R

     ~Q­­­_____

∴ R

Hence, the term “iterative disjunctive syllogism.” Sherlock Holmes’ fans, of course, will recognize that iterative disjunctive syllogism is nothing other than the process of elimination, as explained by the hero of Sir Arthur Conan Doyle’s short stories.[15]

The fourth edition should correct the error of the third edition, and it should dispel the strange notion that differential etiology is not used by scientists or clinicians themselves.

Supreme Nonsense on Differential Etiology

In 2011, the Supreme Court addressed differential etiology in a case, Matrixx Initiatives, in stunningly irrelevant and errant dicta. The third edition did not discuss this troublesome case, in which the defense improvidently moved to dismiss a class action complaint for securities violations allegedly arising from the failure to disclose multiple adverse event reports of anosmia from the use of the defendant’s product, Zicam. The basic reason for the motion on the pleadings was that the plaintiffs’ failed to allege a statistically significant and causally related increased risk of anosmia.  The Supreme Court made short work of the defense argument because material events, such as an FDA recall, did not require the existence of a causal relationship between Zicam use and anosmia. The defense complaints about statistical significance, causation, and their absence, were thus completely beside the point of the case.  Nonetheless, it became the Court’s turn for improvidence in addressing statistical and causation issues not properly before it. With respect to causation, the Court offered this by way of obiter dictum:

“We note that courts frequently permit expert testimony on causation based on evidence other than statistical significance. Seee.g.Best v. Lowe’s Home Centers, Inc., 563 F. 3d 171, 178 (6th Cir 2009); Westberry v. Gislaved Gummi AB, 178 F. 3d 257, 263–264 (4th Cir. 1999) (citing cases); Wells v. Ortho Pharmaceutical Corp., 788 F. 2d 741, 744–745 (11th Cir. 1986). We need not consider whether the expert testimony was properly admitted in those cases, and we do not attempt to define here what constitutes reliable evidence of causation.”[16]

This part of the Court’s opinion was stunningly wrong about the Court of Appeals’ decisions on statistical significance[17] and on causation. The Best and the Westberry decisions were both cases that turned on specific, not general, causation.  Statistical significance this was not part of the reasoning or rationale of the cited cases on specific caustion. Both cases assumed that general causation was established, and inquired into whether expert witnesses could reasonably and validly attribute the health outcome in the case to the exposures that were established causes of such outcomes.  The Court’s selection of these cases, quite irrelevant to its discussion, appears to have come from the Solicitor General’s amicus brief in Matrixx, but mindlessly adopted by the Court.

Although cited for an irrelevant proposition, the Supreme Court’s selection of the Best’s case was puzzling because the Sixth Circuit’s discussion of the issue is particularly muddled. Here is the relevant language from Best:

“[A] doctor’s differential diagnosis is reliable and admissible where the doctor

(1) objectively ascertains, to the extent possible, the nature of the patient’s injury…,

(2) ‘rules in’ one or more causes of the injury using a valid methodology,

and

(3) engages in ‘standard diagnostic techniques by which doctors normally rule out alternative causes” to reach a conclusion as to which cause is most likely’.”[18]

Of course, as the authors of the third edition’s epidemiology chapter correctly note, physicians rarely use this iterative process to arrive at causes of diseases in an individual; they use it to identify the disease or disease process that is responsible for the patient’s signs and symptoms.[19] The Best court’s description does not make sense in that it characterizes the process as ruling in “one or more” causes, and then ruling out alternative causes.  If an expert had ruled in only one cause, then there would be no need or opportunity to rule out an alternative cause.  If the one ruled-in cause was ruled out for other reasons, then the expert witness would be left with a case of idiopathic disease.

In any event, differential etiology was irrelevant to the general causation issue raised by the defense in Matrixx Initiatives. After the Supreme Court correctly recognized that causation was largely irrelevant to the securities fraud claim, it had no reason to opine on general causation.  Certainly, the Supreme Court had no reason to cite two cases on differential etiology in a case that did not even require allegations of general causation. The fourth edition of the Reference Manual should put Matrixx Initatives in its proper (and very limited) place.


[1] RMSE3d at 612 & n.193 (noting that “one commentator contends that, because epidemiology is sufficiently imprecise to accurately measure small increases in risk, in general, studies that find a relative risk less than 2.0 should not be sufficient for causation. The concern is not with specific causation but with general causation and the likelihood that an association less than 2.0 is noise rather than reflecting a true causal relationship. See Michael D. Green, “The Future of Proportional Liability,” in Exploring Tort Law (Stuart Madden ed., 2005); see also Samuel M. Lesko & Allen A. Mitchell, “The Use of Randomized Controlled Trials for Pharmacoepidemiology Studies,” in Pharmacoepidemiology 599, 601 (Brian Strom ed., 4th ed. 2005) (“it is advisable to use extreme caution in making causal inferences from small relative risks derived from observational studies”); Gary Taubes, “Epidemiology Faces Its Limits,” 269 Science 164 (1995) (explaining views of several epidemiologists about a threshold relative risk of 3.0 to seriously consider a causal relationship); N.E. Breslow & N.E. Day, “Statistical Methods in Cancer Research,” in The Analysis of Case-Control Studies 36 (IARC Pub. No. 32, 1980) (“[r]elative risks of less than 2.0 may readily reflect some unperceived bias or confounding factor”); David A. Freedman & Philip B. Stark, “The Swine Flu Vaccine and Guillain-Barré Syndrome: A Case Study in Relative Risk and Specific Causation,” 64 Law & Contemp. Probs. 49, 61 (2001) (“If the relative risk is near 2.0, problems of bias and confounding in the underlying epidemiologic studies may be serious, perhaps intractable.”). For many other supporting comments and observations, see “Small Relative Risks and Causation” (June 28, 2022).

[2] RMSE3d. at 618 (“Although differential etiologies are a sound methodology in principle, this approach is only valid if general causation exists … .”). In the case of a novel putative cause, the case may give rise to a hypothesis that the putative cause can cause the outcome, in general, and did so in the specific case.  That hypothesis must, of course, then be tested and supported by appropriate analytical methods before it can be accepted for general causation and as a putative specific cause in a particular individual.

[3] RMSE3d at 617.

[4] RMSE3d at 617 & n. 211 (citing Zandi v. Wyeth, Inc., No. 27-CV-06-6744, 2007 WL 3224242 (D. Minn. Oct. 15, 2007) (observing that physicians do assess the cause of patients’ breast cancers)).

[5] See, e.g., Tamraz v. BOC Group Inc., No. 1:04-CV-18948, 2008 WL 2796726 (N.D.Ohio July 18, 2008)(denying Rule 702 challenge to treating physician’s causation opinion), rev’d sub nomTamraz v. Lincoln Elec. Co., 620 F.3d 665 (6th Cir. 2010)(carefully reviewing record of trial testimony of plaintiffs’ treating physician; reversing judgment for plaintiff based in substantial part upon treating physician’s speculative causal assessment created by plaintiffs’ counsel), cert. denied, ___ U.S. ___ , 131 S. Ct. 2454 (2011).

[6] RMSE3d at 617-18 & n. 215.

[7] See, e.g, Milward v. Acuity Specialty Products Group, Inc., Civil Action No. 07–11944–DPW, 2013 WL 4812425 (D. Mass. Sept. 6, 2013) (excluding plaintiffs’ expert witnesses on specific causation), aff’d sub nom., Milward v. Rust-Oleum Corp., 820 F.3d 469 (1st Cir. 2016). Interestingly, the earlier appellate journey taken by the Milward litigants resulted in a reversal of a Rule 702 exclusion of plaintiff’s general causation expert witnesses. That reversal meant that there was no longer a final judgment.  The exclusion of specific causation witnesses was affirmed by the First Circuit, and the general causation opinion was no longer necessary to the final judgment. See Differential Diagnosis in Milward v. Acuity Specialty Products Group” (Sept. 26, 2013); “Differential Etiology and Other Courtroom Magic” (June 23, 2014).

[8] RMSE3d at 617-18 & n. 214.

[9] See RMSE at 618 (internal citations omitted).

[10] RMSE3d at 691 (emphasis added).

[11] RMSE3d at 743.

[12] See, e.g., Kløve & D. Doehring, “MMPI in epileptic groups with differential etiology,” 18 J. Clin. Psychol. 149 (1962); Kløve & C. Matthews, “Psychometric and adaptive abilities in epilepsy with differential etiology,” 7 Epilepsia 330 (1966); Teuber & K. Usadel, “Immunosuppression in juvenile diabetes mellitus? Critical viewpoint on the treatment with cyclosporin A with consideration of the differential etiology,” 103  Fortschr. Med. 707 (1985); G.May & W. May, “Detection of serum IgA antibodies to varicella zoster virus (VZV)–differential etiology of peripheral facial paralysis. A case report,” 74 Laryngorhinootologie 553 (1995); Alan Roberts, “Psychiatric Comorbidity in White and African-American Illicity Substance Abusers” Evidence for Differential Etiology,” 20 Clinical Psych. Rev. 667 (2000); Mark E. Mullinsa, Michael H. Leva, Dawid Schellingerhout, Gilberto Gonzalez, and Pamela W. Schaefera, “Intracranial Hemorrhage Complicating Acute Stroke: How Common Is Hemorrhagic Stroke on Initial Head CT Scan and How Often Is Initial Clinical Diagnosis of Acute Stroke Eventually Confirmed?” 26 Am. J. Neuroradiology 2207 (2005); Qiang Fua, et al., “Differential Etiology of Posttraumatic Stress Disorder with Conduct Disorder and Major Depression in Male Veterans,” 62 Biological Psychiatry 1088 (2007); Jesse L. Hawke, et al., “Etiology of reading difficulties as a function of gender and severity,” 20 Reading and Writing 13 (2007); Mastrangelo, “A rare occupation causing mesothelioma: mechanisms and differential etiology,” 105 Med. Lav. 337 (2014).

[13] David L. Faigman & Claire Lesikar, “Organized Common Sense: Some Lessons from Judge Jack Weinstein’s Uncommonly Sensible Approach to Expert Evidence,” 64 DePaul L. Rev. 421, 439, 444 (2015). See alsoDavid Faigman’s Critique of G2i Inferences at Weinstein Symposium” (Sept. 25, 2015).

[14] See Irving Copi & Carl Cohen Introduction to Logic at 362 (2005).

[15] See, e.g., Doyle, The Blanched Soldier (“…when you have eliminated all which is impossible, then whatever remains, however improbable, must be the truth.”); Doyle, The Beryl Coronet (“It is an old maxim of mine that when you have excluded the impossible, whatever remains, however improbable, must be the truth.”); Doyle, The Hound of the Baskervilles (1902) (“We balance probabilities and choose the most likely. It is the scientific use of the imagination.”); Doyle, The Sign of the Four, ch 6 (1890)(“‘You will not apply my precept’, he said, shaking his head. ‘How often have I said to you that when you have eliminated the impossible, whatever remains, however improbable, must be the truth? We know that he did not come through the door, the window, or the chimney. We also know that he could not have been concealed in the room, as there is no concealment possible. When, then, did he come?”)

[16] Matrixx Initiatives, Inc. v. Siracusano, 131 S. Ct. 1309, 1319 (2011). 

[17] The citation to Wells was clearly wrong in that the plaintiffs in that case had, in fact, relied upon studies that were nominally statistically significant, and so the Wells court could not have held that statistical significance was unnecessary.

[18] Best v. Lowe’s Home Centers, Inc., 563 F.3d 171, 179, 183-84 (6th Cir. 2009).

[19] See generally Harold C. Sox, Michael C. Higgins, and Douglas K. Owens, Medical Decision Making (2d ed. 2014). 

Reference Manual – Desiderata for 4th Edition – Part I – Signature Diseases

January 30th, 2023

The fourth edition of the Reference Manual on Scientific Evidence is by all accounts under way. Each of the first three editions represented an improvement over previous editions, but the last edition continued to have substantive problems. The bar, the judiciary, and the scientific community hopefully await an improved fourth edition. Although I have posted previously about issues in the third edition, I am updating and adding to what I have written.[1]  There were only a few reviews and acknowledgments of the third edition.[2] The editorial staff provided little to no opportunity for comments in advance of the third edition, and to date, there has been no call for public comment about the pending fourth edition. I hope there will be more opportunity for the legal and scientific community to comment in the production of the fourth edition.

There are several issues raised by the third edition’s treatment of specific causation, which I hope will be improved in the fourth edition. One such issue is the epidemiology chapter’s brief discussion of so-called signature diseases. The chapter takes the curious position that epidemiology has nothing to say about individual or specific causation, a position I will discuss in later posts. The chapter, however, carves out a limited exception to its (questionable) edict that epidemiology does not concern itself with specific causation.  The chapter tells us, uncontroversially, that some diseases do not occur without exposure to a specific chemical or substance. In my view, the authors of this chapter then go astray in telling us that “[a]bestosis is a signature disease for asbestos, and vaginal adenocarcinoma (in young adult women) is a signature disease for in utero DES exposure.”

Now, by definition, only asbestos can cause asbestosis, but asbestosis presents clinically in a way that is indistinguishable in many cases from idiopathic pulmonary fibrosis and other interstitial fibrotic diseases of the lungs. Over the years, the diagnostic criteria for asbestosis have changed, but these criteria have always had a specificity and sensitivity less than 100%. Saying that a case of asbestosis must have been caused by asbestos begs the clinical question whether the case really is asbestosis.

The chapter’s characterization of vaginal adenocarcinoma as a signature disease of in utero DES exposure is also not correct.  Although this cancer in young women is extremely rare, there is a baseline risk that allows the calculation of relative risks for young women exposed in utero. In older women, the relative risks are lower because the baseline risks are higher, and because the effect of DES is diminished for older onset cases.[3] The disease was known before the use of DES in pregnant women began after World War II.[4]

For support of their discussion of “signal diseases,” the authors of the epidemiology chapter chose, remarkably, to cite an article that was over 25 years old (now over 35 years old) at the time the third edition was published.[5] The referenced passage asks us to:

“Consider tort claims for what have come to be called signature disease. These are diseases characteristically caused by only a few substances – such as the vaginal adenocarcinoma usually associated with exposure to DES in utero – and mesothelioma, a cancer of the pleura caused almost exclusively by exposure to asbestos fibers in the air.”[6]

Well, “usually associated” does equal signature disease.[7] The relative risks for smoking and some kinds of lung cancer are higher than for DES in utero and clear cell vaginal adenocarcinoma, but no one calls lung cancer a signature disease of smoking. (Admittedly, smoking is the major cause and perhaps the most preventable cause of lung cancer in Western countries.)

The third edition’s reference to a source that describes mesothelioma as “caused almost exclusively by exposure to asbestos fibers” is also out of date.[8] Recognizing that casual comments and citations can influence credulous judges, the authors of the fourth edition should strive for greater accuracy in their discussions of such scientific issues. It may be time to find new examples of signature disease.


[1]Reference Manual on Scientific Evidence v4.0” (Feb. 28, 2021); “Reference Manual on Scientific Evidence – 3rd Edition is Past Its Expiry” (Oct. 17, 2021). 

[2] See, e.g., Adam Dutkiewicz, “Book Review: Reference Manual on Scientific Evidence, Third Edition,” 28 Thomas M. Cooley L. Rev. 343 (2011); John A. Budny, “Book Review: Reference Manual on Scientific Evidence, Third Edition,” 31 Internat’l J. Toxicol. 95 (2012); James F. Rogers, Jim Shelson, and Jessalyn H. Zeigler, “Changes in the Reference Manual on Scientific Evidence (Third Edition),” Internat’l Ass’n Def. Csl. Drug, Device & Biotech. Comm. Newsltr. (June 2012). See Schachtman “New Reference Manual’s Uneven Treatment of Conflicts of Interest” (Oct. 12, 2011).

[3] Janneke Verloop, Flora E. van Leeuwen, Theo J. M. Helmerhorst, Hester H. van Boven, and Matti A. Rookus, “Cancer risk in DES daughters,” 21 Cancer Causes & Control 999 (2010).

[4] See “Risk Factors for Vaginal Cancer,” American Cancer Soc’y website (last visited Jan. 29, 2023).

[5] Kenneth S. Abraham & Richard A. Merrill, Scientific Uncertainty in the Courts, 2 Issues Sci. & Tech. 93, 101 (Winter 1986).

[6] Id.

[7] See, e.g., Kadir Güzin, Semra Kayataş Eserm, Ayşe Yiğit, and Ebru Zemheri, “Primary clear cell carcinoma of the vagina that is not related to in utero diethylstilbestrol use,” 3 Gynecol. Surg. 281 (2006).

[8] Michele Carbone, Harvey I. Pass, Guntulu Ak, H. Richard Alexander Jr., Paul Baas, Francine Baumann, Andrew M. Blakely, Raphael Bueno, Aleksandra Bzura, Giuseppe Cardillo, Jane E. Churpek, Irma Dianzani, Assunta De Rienzo, Mitsuru Emi, Salih Emri, Emanuela Felley-Bosco, Dean A. Fennell, Raja M. Flores, Federica Grosso, Nicholas K. Hayward, Mary Hesdorffer, Chuong D. Hoang, Peter A. Johansson, Hedy L. Kindler, Muaiad Kittaneh, Thomas Krausz, Aaron Mansfield, Muzaffer Metintas, Michael Minaai, Luciano Mutti, Maartje Nielsen, Kenneth O’Byrne, Isabelle Opitz, Sandra Pastorino, Francesca Pentimalli, Marc de Perrot, Antonia Pritchard, Robert Taylor Ripley, Bruce Robinson, and Valerie Rusch, “Medical and Surgical Care of Patients With Mesothelioma and Their Relatives Carrying Germline BAP1 Mutations,” 17 J. Thoracic Oncol. 873 (2022). See also Mitchell Cheung, Yuwaraj Kadariya, Eleonora Sementino, Michael J. Hall, Ilaria Cozzi, Valeria Ascoli, Jill A. Ohar, and Joseph R. Testa, “Novel LRRK2 mutations and other rare, non-BAP1-related candidate tumor predisposition gene variants in high-risk cancer families with mesothelioma and other tumors,” 30 Human Molecular Genetics 1750 (2021); Thomas Wiesner, Isabella Fried, Peter Ulz, Elvira Stacher, Helmut Popper, Rajmohan Murali, Heinz Kutzner, Sigurd Lax, Freya Smolle-Jüttner, Jochen B. Geigl, and Michael R. Speicher, “Toward an Improved Definition of the Tumor Spectrum Associated With BAP1 Germline Mutations,” 30 J. Clin. Oncol. e337 (2012); Alexandra M. Haugh, BA1; Ching-Ni Njauw, MS2,3; Jeffrey A. Bubley, et al., “Genotypic and Phenotypic Features of BAP1 Cancer Syndrome: A Report of 8 New Families and Review of Cases in the Literature,” 153 J.Am. Med. Ass’n Dermatol. 999 (2017).

Improper Reliance upon Regulatory Risk Assessments in Civil Litigation

March 19th, 2022

Risk assessments would seemingly be about assessing risks, but they are not. The Reference Manual on Scientific Evidence defines “risk” as “[a] probability that an event will occur (e.g., that an individual will become ill or die within a stated period of time or by a certain age).”[1] The risk in risk assessment, however, may be zero, or uncertain, or even a probability of benefit. Agencies that must assess risks and set “action levels,” or “permissible exposure limits,” or “acceptable intakes,” often work under great uncertainty, with inspired guesswork, using unproven assumptions.

The lawsuit industry has thus often embraced the false equivalence between agency pronouncements on harmful medicinal, environmental, or occupational exposures and civil litigation adjudication of tortious harms. In the United States, federal agencies such as the Occupational Safety and Health Administration (OSHA), or the Environmental Protection Agency (EPA), and their state analogues, regularly set exposure standards that could not and should not hold up in a common-law tort case. 

Remarkably, there are state and federal court judges who continue to misunderstand and misinterpret regulatory risk assessments, notwithstanding efforts to educate the judiciary. The second edition of the Reference Manual on Scientific Evidence contained a chapter by the late Professor Margaret Berger, who took pains to point out the difference between agency assessments and the adjudication of causal claims in court:

[p]roof of risk and proof of causation entail somewhat different questions because risk assessment frequently calls for a cost-benefit analysis. The agency assessing risk may decide to bar a substance or product if the potential benefits are outweighed by the possibility of risks that are largely unquantifiable because of presently unknown contingencies. Consequently, risk assessors may pay heed to any evidence that points to a need for caution, rather than assess the likelihood that a causal relationship in a specific case is more likely than not.[2]

In March 2003, Professor Berger organized a symposium,[3] the first Science for Judges program (and the last), where the toxicologist Dr. David L. Eaton presented on the differences in the use of toxicology in regulatory pronouncements as opposed to causal assessments in civil actions. As Dr. Eaton noted:

“regulatory levels are of substantial value to public health agencies charged with ensuring the protection of the public health, but are of limited value in judging whether a particular exposure was a substantial contributing factor to a particular individual’s disease or illness.”[4]

The United States Environmental Protection Agency (EPA) acknowledges that estimating “risk” from low level exposures based upon laboratory animal data is fraught because of inter-specie differences in longevity, body habitus and size, genetics, metabolism, excretion patterns, genetic homogeneity of laboratory animals, dosing levels and regimens. The EPA’s assumptions in conducting and promulgating regulatory risk assessments are intended to predict the upper bound of theoretical risk, while fully acknowledging that there may be no actual risk in humans:

“It should be emphasized that the linearized multistage [risk assessment] procedure leads to a plausible upper limit to the risk that is consistent with some proposed mechanisms of carcinogenesis. Such an estimate, however, does not necessarily give a realistic prediction of the risk. The true value of the risk is unknown, and may be as low as zero.”[5]

The approach of the U.S. Food and Drug Administration (FDA) with respect to mutagenic impurities in medications provides an illustrative example of how theoretical and hypothetical risk assessment can be.[6] The FDA’s risk assessment approach is set out in a “Guidance” document, which like all such FDA guidances, describes itself as containing non-binding recommendations, which do not preempt alternative approaches.[7] The agency’s goal is devise a control strategy for any mutagenic impurity to keep it at or below an “acceptable cancer risk level,” even if the risk or the risk level is completely hypothetical.

The FDA guidance advances the concept of a “Threshold of Toxicological Concern (TTC),” to set an “acceptable intake,” for chemical impurities that pose negligible risks of toxicity or carcinogenicity.[8] The agency describes its risk assessment methodology as “very conservative,” given the frequently unproven assumptions made to reach a quantification of an “acceptable intake”:

“The methods upon which the TTC is based are generally considered to be very conservative since they involve a simple linear extrapolation from the dose giving a 50% tumor incidence (TD50) to a 1 in 10-6 incidence, using TD50 data for the most sensitive species and most sensitive site of tumor induction. For application of a TTC in the assessment of acceptable limits of mutagenic impurities in drug substances and drug products, a value of 1.5 micrograms (µg)/day corresponding to a theoretical 10-5 excess lifetime risk of cancer can be justified.”

For more potent mutagenic carcinogens, such as aflatoxin-like-, N-nitroso-, and alkyl-azoxy compounds, the acceptable intake or permissible daily exposure (PDE) is set lower, based upon available animal toxicologic data.

The important divide between regulatory practice and the litigation of causal claims in civil actions arises from the theoretical nature of the risk assessment enterprise. The FDA acknowledges, for instance, that the acceptable intake is set to mark “a small theoretical increase in risk,” and a “highly hypothetical concept that should not be regarded as a realistic indication of the actual risk,” and thus not an actual risk.[9] The corresponding hypothetical or theoretical risk to the acceptable intake level is clearly small when compared with the human’s lifetime probability of developing cancer (which the FDA states is greater than 1/3, but probably now approaches 40%).

Although the TTC concept allows a calculation of an estimated “safe exposure,” the FDA points out that:

“exceeding the TTC is not necessarily associated with an increased cancer risk given the conservative assumptions employed in the derivation of the TTC value. The most likely increase in cancer incidence is actually much less than 1 in 100,000. *** Based on all the above considerations, any exposure to an impurity that is later identified as a mutagen is not necessarily associated with an increased cancer risk for patients already exposed to the impurity. A risk assessment would determine whether any further actions would be taken.”

In other words the FDA’s risk assessment exists to guide agency action, not to determine a person’s risk or medical status.[10]

As small and theoretical as the risks are, they are frequently based upon demonstrably incorrect assumptions, such as:

  1. humans are as sensitive as the most sensitive species;
  2. all organs are as sensitive as the most sensitive organ of the most sensitive species;
  3. the dose-response in the most sensitive species is a simple linear relationship;
  4. the linear relationship runs from zero exposure and zero risk to the exposure that yields the so-called TD50, the exposure that yields tumors in 50% of the experimental animal model;
  5. the TD-50 is calculated based upon the point estimate in the animal model study, regardless of any confidence interval around the point estimate;
  6. the inclusion, in many instances, of non-malignant tumors as part of the assessment of the TD50 exposure;
  7. there is some increased risk for any exposure, no matter how small; that is, there is no threshold below which there is no increased risk; and
  8. the medication with the mutagenic impurity was used daily for 70 years, by a person who weights 50 kg.

Although the FDA acknowledges that there may be some instances in which a “less than lifetime level” (LTL) may be appropriate, it places the burden on manufacturers to show the appropriateness of higher LTLs. The FDA’s M7 Guidance observes that

“[s]tandard risk assessments of known carcinogens assume that cancer risk increases as a function of cumulative dose. Thus, cancer risk of a continuous low dose over a lifetime would be equivalent to the cancer risk associated with an identical cumulative exposure averaged over a shorter duration.”[11]

Similarly, the agency acknowledges that there may be a “practical threshold,” as result of bodily defense mechanisms, such as DNA repair, which counter any ill effects from lower level exposures.[12]

“The existence of mechanisms leading to a dose response that is non-linear or has a practical threshold is increasingly recognized, not only for compounds that interact with non-DNA targets but also for DNA-reactive compounds, whose effects may be modulated by, for example, rapid detoxification before coming into contact with DNA, or by effective repair of induced damage. The regulatory approach to such compounds can be based on the identification of a No-Observed Effect Level (NOEL) and use of uncertainty factors (see ICH Q3C(R5), Ref. 7) to calculate a permissible daily exposure (PDE) when data are available.”

Expert witnesses often attempt to bootstrap their causation opinions by reference to determinations of regulatory agencies that are couched in similar language, but which use different quality and quantity of evidence than is required in the scientific community or in civil courts.

Supreme Court

Industrial Union Dep’t v. American Petroleum Inst., 448 U.S. 607, 656 (1980) (“OSHA is not required to support its finding that a significant risk exists with anything approaching scientific certainty” and “is free to use conservative assumptions in interpreting the data with respect to carcinogens, risking error on the side of overprotection, rather than underprotection.”).

Matrixx Initiatives, Inc. v. Siracusano, 563 U.S. 27, 131 S.Ct. 1309, 1320 (2011) (regulatory agency often makes regulatory decisions based upon evidence that gives rise only to a suspicion of causation) 

First Circuit

Sutera v. Perrier Group of America, Inc., 986 F. Supp. 655, 664-65, 667 (D. Mass. 1997) (a regulatory agency’s “threshold of proof is reasonably lower than that in tort law”; “substances are regulated because of what they might do at given levels, not because of what they will do. . . . The fact of regulation does not imply scientific certainty. It may suggest a decision to err on the side of safety as a matter of regulatory policy rather than the existence of scientific fact or knowledge. . . . The mere fact that substances to which [plaintiff] was exposed may be listed as carcinogenic does not provide reliable evidence that they are capable of causing brain cancer, generally or specifically, in [plaintiff’s] case.”); id. at 660 (warning against the danger that a jury will “blindly accept an expert’s opinion that conforms with their underlying fears of toxic substances without carefully understanding or examining the basis for that opinion.”). Sutera is an important precedent, which involved a claim that exposure to an IARC category I carcinogen, benzene, caused plaintiffs’ leukemia. The plaintiff’s expert witness, Robert Jacobson, espousing a “linear, no threshold” theory, and relying upon an EPA regulation, which he claimed supported his opinion that even trace amounts of benzene can cause leukemia.

In re Neurontin Mktg., Sales Practices, and Prod. Liab. Litig., 612 F. Supp. 2d 116, 136 (D. Mass. 2009) (‘‘It is widely recognized that, when evaluating pharmaceutical drugs, the FDA often uses a different standard than a court does to evaluate evidence of causation in a products liability action. Entrusted with the responsibility of protecting the public from dangerous drugs, the FDA regularly relies on a risk-utility analysis, balancing the possible harm against the beneficial uses of a drug. Understandably, the agency may choose to ‘err on the side of caution,’ … and take regulatory action such as revising a product label or removing a drug from the marketplace ‘upon a lesser showing of harm to the public than the preponderance-of-the-evidence or more-like-than-not standard used to assess tort liability’.’’) (internal citations omitted) 

Whiting v. Boston Edison Co., 891 F. Supp. 12, 23-24 (D. Mass. 1995) (criticizing the linear no-threshold hypothesis, common to regulatory risk assessments, because it lacks any known or potential error rate, and it cannot be falsified as would any scientific theory)

Second Circuit

Wills v. Amerada Hess Corp., No. 98 CIV. 7126(RPP), 2002 WL 140542 (S.D.N.Y. Jan. 31, 2002), aff’d, 379 F.3d 32 (2d Cir. 2004) (Sotomayor, J.). In this Jones Act case, the plaintiff claimed that her husband’s exposure to benzene and polycyclic aromatic hydrocarbons on board ship caused his squamous cell lung cancer. Plaintiff’s expert witness relied heavily upon the IARC categorization of benzene as a “known” carcinogen, and an “oncogene” theory of causation that claimed there was no safe level of exposure because a single molecule could induce cancer. According to the plaintiff’s expert witness, the oncogene theory dispensed with the need to quantify exposure. Then Judge Sotomayor, citing Sutera, rejected plaintiff’s no-threshold theory, and the argument that exposure that exceeded OHSA permissible exposure level supported the causal claim.

Mancuso v. Consolidated Edison Co., 967 F. Supp. 1437, 1448 (S.D.N.Y. 1997) (“recommended or prescribed precautionary standards cannot provide legal causation”; “[f]ailure to meet regulatory standards is simply not sufficient” to establish liability)

In re Agent Orange Product Liab. Litig., 597 F. Supp. 740, 781 (E.D.N.Y. 1984) (Weinstein, J.) (“The distinction between avoidance of risk through regulation and compensation for injuries after the fact is a fundamental one.”), aff’d in relevant part, 818 F.2d 145 (2d Cir.1987), cert. denied sub nom. Pinkney v. Dow Chemical Co., 484 U.S. 1004 (1988). Judge Weinstein explained that regulatory action would not by itself support imposing liability for an individual plaintiff.  Id. at 782. “A government administrative agency may regulate or prohibit the use of toxic substances through rulemaking, despite a very low probability of any causal relationship.  A court, in contrast, must observe the tort law requirement that a plaintiff establish a probability of more than 50% that the defendant’s action injured him.” Id. at 785.

In re Ephedra Prods. Liab. Litig., 393 F. Supp. 2d 181, 189 (S.D.N.Y. 2005) (improvidently relying in part upon FDA ban despite “the absence of definitive scientific studies establishing causation”)

Third Circuit

Gates v. Rohm & Haas Co., 655 F.3d 255, 268 (3d Cir. 2011) (affirming the denial of class certification for medical monitoring) (‘‘plaintiffs could not carry their burden of proof for a class of specific persons simply by citing regulatory standards for the population as a whole’’).

In re Schering-Plough Corp. Intron/Temodar Consumer Class Action, 2009 WL 2043604, at *13 (D.N.J. July 10, 2009)(“[T]here is a clear and decisive difference between allegations that actually contest the safety or effectiveness of the Subject Drugs and claims that merely recite violations of the FDCA, for which there is no private right of action.”)

Rowe v. E.I. DuPont de Nemours & Co., Civ. No. 06-1810 (RMB), 2008 U.S. Dist. LEXIS 103528, *46-47 (D.N.J. Dec. 23, 2008) (rejecting reliance upon regulatory findings and risk assessments in which “the basic goal underlying risk assessments . . . is to determine a level that will protect the most sensitive members of the population.”)  (quoting David E. Eaton, “Scientific Judgment and Toxic Torts – A Primer in Toxicology for Judges and Lawyers,” 12 J.L. & Pol’y 5, 34 (2003) (“a number of protective, often ‘worst case’ assumptions . . . the resulting regulatory levels . . . generally overestimate potential toxicity levels for nearly all individuals.”)

Soldo v. Sandoz Pharms. Corp., 244 F. Supp. 2d 434, 543 (W.D. Pa. 2003) (finding FDA regulatory proceedings and adverse event reports not adequate or helpful in determining causation; the FDA “ordinarily does not attempt to prove that the drug in fact causes a particular adverse effect.”)Wade-Greaux v. Whitehall Laboratories, Inc., 874 F. Supp. 1441, 1464 (D.V.I.) (“assumption[s that] may be useful in a regulatory risk-benefit context … ha[ve] no applicability to issues of causation-in-fact”), aff’d, 46 F.3d 1120 (3d  Cir. 1994)

O’Neal v. Dep’t of the Army, 852 F. Supp. 327, 333 (M.D. Pa. 1994) (administrative risk figures are “appropriate for regulatory purposes in which the goal is to be particularly cautious [but] overstate the actual risk and, so, are inappropriate for use in determining” civil liability)

Fourth Circuit

Dunn v. Sandoz Pharmaceuticals Corp., 275 F. Supp. 2d 672, 684 (M.D.N.C. 2003) (FDA “risk benefit analysis” “does not demonstrate” causation in any particular plaintiff)

Yates v. Ford Motor Co., 113 F. Supp. 3d 841, 857 (E.D.N.C. 2015) (“statements from regulatory and official agencies … are not bound by standards for causation found in toxic tort law”)

Meade v. Parsley, No. 2:09-cv-00388, 2010 U.S. Dist. LEXIS 125217, * 25 (S.D.W. Va. Nov. 24, 2010) (‘‘Inasmuch as the cost-benefit balancing employed by the FDA differs from the threshold standard for establishing causation in tort actions, this court likewise concludes that the FDA-mandated [black box] warnings cannot establish general causation in this case.’’)

Rhodes v. E.I. du Pont de Nemours & Co., 253 F.R.D. 365, 377 (S.D. W.Va. 2008) (rejecting the relevance of regulatory assessments, which are precautionary and provide no information about actual risk).

Fifth Circuit

Moore v. Ashland Chemical Co., 126 F.3d 679, 708 (5th Cir. 1997) (holding that expert witness could rely upon a material safety data sheet (MSDS) because mandated by the Hazard Communication Act, 29 C.F.R. § 1910.1200), vacated 151 F.3d 269 (5th Cir. 1998) (affirming trial court’s exclusion of expert witness who had relied upon MSDS).

Johnson v. Arkema Inc., 685 F.3d 452, 464 (5th Cir. 2012) (per curiam) (affirming exclusion of expert witness who upon regulatory pronouncements; noting the precautionary nature of such statements, and the absence of specificity for the result claimed at the exposures experienced by plaintiff)

Allen v. Pennsylvania Eng’g Corp., 102 F.3d 194, 198-99 (5th Cir. 1996) (“Scientific knowledge of the harmful level of exposure to a chemical, plus knowledge that the plaintiff was exposed to such quantities, are minimal facts necessary to sustain the plaintiffs’ burden in a toxic tort case”; regulatory agencies, charged with protecting public health, employ a lower standard of proof in promulgating regulations than that used in tort cases). The Allen court explained that it was “also unpersuaded that the “weight of the evidence” methodology these experts use is scientifically acceptable for demonstrating a medical link. . . .  Regulatory and advisory bodies. . .utilize a “weight of the evidence” method to assess the carcinogenicity of various substances in human beings and suggest or make prophylactic rules governing human exposure.  This methodology results from the preventive perspective that the agencies adopt in order to reduce public exposure to harmful substances.  The agencies’ threshold of proof is reasonably lower than that appropriate in tort law, which traditionally makes more particularized inquiries into cause and effect and requires a plaintiff to prove that it is more likely than not that another individual has caused him or her harm.” Id.

Burst v. Shell Oil Co., C. A. No. 14–109, 2015 WL 3755953, *8 (E.D. La. June 16, 2015) (explaining Fifth Circuit’s rejection of regulatory “weight of the evidence” approaches to evaluating causation)

Sprankle v. Bower Ammonia & Chem. Co., 824 F.2d 409, 416 (5th Cir. 1987) (affirmed Rule 403 exclusion evidence of OSHA violations in claim of respiratory impairment in a non-employee who experienced respiratory impairment after exposure to anhydrous ammonia; court found that the jury likely be confused by regulatory pronouncements)

Cano v. Everest Minerals Corp., 362 F. Supp. 2d 814, 825 (W.D. Tex. 2005) (noting that a product that “has been classified as a carcinogen by agencies responsible for public health regulations is not probative of” common-law specific causation) (finding that the linear no-threshold opinion of the plaintiffs’ expert witness, Malin Dollinger, lacked a satisfactory scientific basis)

Burleson v. Glass, 268 F. Supp. 2d 699, 717 (W.D. Tex. 2003) (“the mere fact that [the product] has been classified by certain regulatory organizations as a carcinogen is not probative on the issue of whether [plaintiff’s] exposure. . .caused his. . .cancers”), aff’d, 393 F.3d 577 (5th Cir. 2004)

Newton v. Roche Labs., Inc., 243 F. Supp. 2d 672, 677, 683 (W.D. Tex. 2002) (FDA’s precautionary decisions on labeling are not a determination of causation of specified adverse events) (“Although evidence of an association may … be important in the scientific and regulatory contexts…, tort law requires a higher standard of causation.”)

Molden v. Georgia Gulf Corp., 465 F. Supp. 2d 606, 611 (M.D. La. 2006) (“regulatory and advisory bodies make prophylactic rules governing human exposure based on proof that is reasonably lower than that appropriate in tort law”)

Sixth Circuit

Nelson v. Tennessee Gas Pipeline Co., 243 F.3d 244, 252-53 (6th Cir. 2001) (exposure above regulatory levels is insufficient to establish causation)

Stites v Sundstrand Heat Transfer, Inc., 660 F. Supp. 1516, 1525 (W.D. Mich. 1987) (rejecting use of regulatory standards to support claim of increased risk, noting the differences in goals and policies between regulation and litigation)

Mann v. CSX Transportation, Inc., case no. 1:07-Cv-3512, 2009 U.S. Dist. Lexis 106433 (N.D. Ohio Nov. 10, 2009) (rejecting expert testimony that relied upon EPA action levels, and V.A. compensation for dioxin exposure, as basis for medical monitoring opinions)

Baker v. Chevron USA, Inc., 680 F. Supp. 2d 865, 880 (S.D. Ohio 2010) (“[R]egulatory agencies are charged with protecting public health and thus reasonably employ a lower threshold of proof in promulgating their regulations than is used in tort cases.”) (“[t]he mere fact that Plaintiffs were exposed to [the product] in excess of mandated limits is insufficient to establish causation”; rejecting Dr. Dahlgren’s opinion and its reliance upon a “one-hit” or “no threshold” theory of causation in which exposure to one molecule of a cancer-causing agent has some finite possibility of causing a genetic mutation leading to cancer, a theory that may be accepted for purposes of setting regulatory standards, but not as reliable scientific knowledge)

Adams v. Cooper Indus., 2007 WL 2219212 at *7 (E.D. KY 2007).

Seventh Circuit

Wood v. Textron, Inc., No. 3:10 CV 87, 2014 U.S. Dist. LEXIS 34938 (N.D. Ind. Mar. 17, 2014); 2014 U.S. Dist. LEXIS 141593, at *11 (N.D. Ind. Oct. 3, 2014), aff’d, 807 F.3d 827 (7th Cir. 2015). Dahlgren based his opinions upon the children’s water supply containing vinyl chloride in excess of regulatory levels set by state and federal agencies, including the EPA. Similarly, Ryer-Powder relied upon exposure levels’ exceeding regulatory permissible limits for her causation opinions. The district court, with the approval now of the Seventh Circuit would have none of this nonsense. Exceeding governmental regulatory exposure limits does not prove causation. The con-compliance does not help the fact finder without knowing “the specific dangers” that led the agency to set the permissible level, and thus the regulations are not relevant at all without this information. Even with respect to specific causation, the regulatory infraction may be weak or null evidence for causation. (citing Cunningham v. Masterwear Corp., 569 F.3d 673, 674–75 (7th Cir. 2009)

Eighth Circuit

Glastetter v. Novartis Pharms. Corp., 107 F. Supp. 2d 1015, 1036 (E.D. Mo. 2000) (“[T]he [FDA’s] statement fails to affirmatively state that a connection exists between [the drug] and the type of injury in this case.  Instead, it states that the evidence received by the FDA calls into question [drug’s] safety, that [the drug] may be an additional risk factor. . .and that the FDA had new evidence suggesting that therapeutic use of [the drug] may lead to serious adverse experiences.  Such language does not establish that the FDA had concluded that [the drug] can cause [the injury]; instead, it indicates that in light of the limited social utility of [the drug for the use at issue] and the reports of possible adverse effects, the drug should no longer be used for that purpose.”) (emphasis in original), aff’d, 252 F.3d 986, 991 (8th Cir. 2001) (FDA’s precautionary decisions on labeling are not a determination of causation of specified adverse events; “methodology employed by a government agency results from the preventive perspective that the agencies adopt”)( “The FDA will remove drugs from the marketplace upon a lesser showing of harm to the public than the preponderance-of-the-evidence or the more-like-than-not standard used to assess tort liability . . . . [Its] decision that [the drug] can cause [the injury] is unreliable proof of medical causation.”), aff’d, 252 F.3d 986 (8th Cir. 2001)

Wright v. Willamette Indus., Inc., 91 F.3d 1105, 1107 (8th Cir. 1996) (rejecting claim that plaintiffs were not required to show individual exposure levels to formaldehyde from wood particles). The Wright court elaborated upon the difference between adjudication and regulation of harm:

“Whatever may be the considerations that ought to guide a legislature in its determination of what the general good requires, courts and juries, in deciding cases, traditionally make more particularized inquiries into matters of cause and effect.  Actions in tort for damages focus on the question of whether to transfer money from one individual to another, and under common-law principles (like the ones that Arkansas law recognizes) that transfer can take place only if one individual proves, among other things, that it is more likely than not that another individual has caused him or her harm.  It is therefore not enough for a plaintiff to show that a certain chemical agent sometimes causes the kind of harm that he or she is complaining of.  At a minimum, we think that there must be evidence from which the factfinder can conclude that the plaintiff was exposed to levels of that agent that are known to cause the kind of harm that the plaintiff claims to have suffered. See Abuan v. General Elec. Co., 3 F.3d at 333.  We do not require a mathematically precise table equating levels of exposure with levels of harm, but there must be evidence from which a reasonable person could conclude that a defendant’s emission has probablycaused a particular plaintiff the kind of harm of which he or she complains before there can be a recovery.”

Gehl v. Soo Line RR, 967 F.2d 1204, 1208 (8th Cir. 1992).

Nelson v. Am. Home Prods. Corp., 92 F. Supp. 2d 954, 958 (W.D. Mo. 2000) (FDA’s precautionary decisions on labeling are not a determination of causation of specified adverse events)

National Bank of Commerce v. Associated Milk Producers, Inc., 22 F. Supp. 2d 942, 961 (E.D.Ark. 1998), aff’d, 191 F.3d 858 (8th Cir. 1999) 

Junk v. Terminix Internat’l Co., 594 F. Supp. 2d 1062, 1071 (S.D. Iowa 2008) (“government agency regulatory standards are irrelevant to [plaintiff’s] burden of proof in a toxic tort cause of action because of the agency’s preventative perspective”)

Ninth Circuit

Henrickson v. ConocoPhillips Co., 605 F. Supp. 2d 1142, 1156 (E.D. Wash. 2009) (excluding expert witness causation opinions in case involving claims that benzene exposure caused leukemia) 

Lopez v. Wyeth-Ayerst Labs., Inc., 1998 WL 81296, at *2 (9th Cir. Feb. 25, 1998) (FDA’s precautionary decisions on labeling are not a determination of causation of specified adverse events)

In re Epogen & Aranesp Off-Label Marketing & Sales Practices Litig., 2009 WL 1703285, at *5 (C.D. Cal. June 17, 2009) (“have not been proven” allegations are an improper “FDA approval” standard; the FDA’s determination to require warning changes without establishing causation is established does not permit a court or jury, bound by common-law standards, to impose such a duty to warn when common-law causation requirements are not met).

In re Hanford Nuclear Reservation Litig., 1998 U.S. Dist. Lexis 15028 (E.D. Wash. 1998) (radiation and chromium VI), rev’d on other grounds, 292 F.3d 1124 (9th Cir. 2002).

Tenth Circuit

Hollander v. Shandoz Pharm. Corp., 95 F. Supp. 2d 1230, 1239 (W.D. Okla. 2000) (distinguishing FDA’s threshold of proof as lower than appropriate in tort law), aff’d in relevant part, 289 F.3d 1193, 1215 (10th Cir. 2002)

Mitchell v. Gencorp Inc., 165 F.3d 778, 783 n.3 (10th Cir. 1999) (benzene and CML) (quoting Allen, 102 F.3d at 198) (state administrative finding that product was a carcinogen was based upon lower administrative standard than tort standard) (“The methodology employed by a government agency “results from the preventive perspective that the agencies adopt in order to reduce public exposure to harmful substances.  The agencies’ threshold of proof is reasonably lower than that appropriate in tort law, which traditionally makes more particularized inquiries into cause and effect and requires a plaintiff to prove it is more likely than not that another individual has caused him or her harm.”)

In re Breast Implant Litig., 11 F. Supp. 2d 1217, 1229 (D.Colo. 1998)

Johnston v. United States, 597 F. Supp. 374, 393-394 (D. Kan.1984) (noting that the linear no-threshold hypothesis is based upon a prudent assumption designed to overestimate risk; speculative hypotheses are not appropriate in determining whether one person has harmed another)

Eleventh Circuit

Rider v. Sandoz Pharmaceuticals Corp., 295 F.3d 1194, 1201 (11th Cir. 2002) (FDA may take regulatory action, such as revising warning labels or withdrawing drug from the market ‘‘upon a lesser showing of harm to the public than the preponderance-of-the-evidence or more-likely-than-not standard used to assess tort liability’’) (“A regulatory agency such as the FDA may choose to err on the side of caution. Courts, however, are required by the Daubert trilogy to engage in objective review of the evidence to determine whether it has sufficient scientific basis to be considered reliable.”)

McClain v. Metabolife Internat’l, Inc., 401 F.3d 1233, 1248-1250 (11th Cir. 2005) (ephedra) (allowing that regulators “may pay heed to any evidence that points to a need for caution,” and apply “a much lower standard than that which is demanded by a court of law”) (“[U]se of FDA data and recommendations raises a more subtle methodological issue in a toxic tort case. The issue involves identifying and contrasting the type of risk assessment that a government agency follows for establishing public health guidelines versus an expert analysis of toxicity and causation in a toxic tort case.”)

In re Seroquel Products Liab. Litig., 601 F. Supp. 2d 1313, 1315 (M.D. Fla. 2009) (noting that administrative agencies “impose[] different requirements and employ[] different labeling and evidentiary standards” because a “regulatory system reflects a more prophylactic approach” than the common law)

Siharath v. Sandoz Pharmaceuticals Corp., 131 F. Supp. 2d 1347, 1370 (N.D. Ga. 2001) (“The standard by which the FDA deems a drug harmful is much lower than is required in a court of law.  The FDA’s lesser standard is necessitated by its prophylactic role in reducing the public’s exposure to potentially harmful substances.”), aff’d, 295 F.3d 1194 330 (11th Cir. 2002)

In re Accutane Products Liability, 511 F.Supp.2d 1288, 1291-92 (M.D. Fla. 2007)(acknowledging that regulatory risk assessments are not necessarily realistic in human populations because they are often based upon animal studies, and that the important differences between experimental animals and humans are substantial in various health outcomes).

Kilpatrick v. Breg, Inc., 2009 WL 2058384 at * 6-7 (S.D. Fla. 2009) (excluding plaintiff’s expert witness), aff’d, 613 F.3d 1329 (11th Cir. 2010)

District of Columbia Circuit

Ethyl Corp. v. E.P.A., 541 F.2d 1, 28 & n. 58 (D.C. Cir. 1976) (detailing the precautionary nature of agency regulations that may be based upon suspicions)

STATE COURTS

Arizona

Lofgren v. Motorola, 1998 WL 299925 (Ariz. Super. Ct. 1998) (finding plaintiffs’ expert witnesses’ testimony that TCE caused cancer to be not generally accepted; “it is appropriate public policy for health organizations such as IARC and the EPA to make judgments concerning the health and safety of the population based on evidence which would be less than satisfactory to support a specific plaintiff’s tort claim for damages in a court of law”)

Colorado

Salazar v. American Sterilizer Co., 5 P.3d 357 (Colo. Ct. App. 2000) (allowing testimony about harmful ethylene oxide exposure based upon OSHA regulations)

Georgia

Butler v. Union Carbide Corp., 712 S.E.2d 537, 552 & n.37 (Ga. App. 2011) (distinguishing risk assessment from causation assessment; citing the New York Court of Appeals decision in Parker for correctly rejecting reliance on regulatory pronouncements for causation determinations)

Illinois

La Salle Nat’l Bank v. Malik, 705 N.E.2d 938 (Ill. App. 3d) (reversing trial court’s exclusion of OSHA PEL for ethylene oxide), writ pet’n den’d, 714 N.E.2d 527 (Ill. 2d 1999)

New York

Parker v. Mobil Oil Corp., 7 N.Y.3d 434, 450, 857 N.E.2d 1114, 1122, 824 N.Y.S.2d 584 (N.Y. 2006) (noting that regulatory agency standards usually represent precautionary principle efforts deliberately to err on side of prevention; “standards promulgated by regulatory agencies as protective measures are inadequate to demonstrate legal causation.” 

In re Bextra & Celebrex, 2008 N.Y. Misc. LEXIS 720, *20, 239 N.Y.L.J. 27 (2008) (characterizing FDA Advisory Panel recommendations as regulatory standard and protective measure).

Juni v. A.O. Smith Water Products Co., 48 Misc. 3d 460, 11 N.Y.S.3d 416, 432, 433 (N.Y. Cty. 2015) (“the reports and findings of governmental agencies [declaring there to be no safe dose of asbestos] are irrelevant as they constitute insufficient proof of causation”), aff’d, 32 N.Y.3d 1116, 116 N.E.3d 75, 91 N.Y.S.3d 784 (2018)

Ohio

Valentine v. PPG Industries, Inc., 821 N.E.2d 580, 597-98 (Ohio App. 2004), aff’d, 850 N.E.2d 683 (Ohio 2006). 

Pennsylvania

Betz v. Pneumo Abex LLC, 44 A. 3d 27 (Pa. 2012).

Texas

Borg-Warner Corp., 232 S.W.3d 765, 770 (Tex. 2007)

Exxon Corp. v. Makofski, 116 S.W.3d 176, 187-88 (Tex. App. 2003) (describing “standards used by OSHA [and] the EPA” as inadequate for causal determinations)


[1] Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” in Reference Manual on Scientific Evidence 549, 627 (3d ed. 2011).

[2] Margaret A. Berger, “The Supreme Court’s Trilogy on the Admissibility of Expert Testimony,” in Reference Manual On Scientific Evidence at 33 (Fed. Jud. Center 2d. ed. 2000).

[3] Margaret A. Berger, “Introduction to the Symposium,” 12 J. L. & Pol’y 1 (2003). Professor Berger described the symposium as a “felicitous outgrowth of a grant from the Common Benefit Trust established in the Silicone Breast Implant Products Liability Litigation to hold a series of conferences at Brooklyn Law School.” Id. at 1. Ironically, that “Trust” was nothing more than the walking-around money of plaintiffs’ lawyers from the Silicone-Gel Breast Implant MDL 926. Although Professor Berger was often hostile the causation requirement in tort law, her symposium included some well-qualified scientists who amplified her point from the Reference Manual about the divide between regulatory risk assessment and scientific causal assessments.

[4] David L. Eaton, Scientific Judgment and Toxic Torts- A Primer in Toxicology for Judges and Lawyers, 12 J.L. & Pol’y 5, 36 (2003). See also Joseph V. Rodricks and Susan H. Rieth, “Toxicological risk assessment in the courtroom: are available methodologies suitable for evaluating toxic tort and product liability claims?” 27 Regul. Toxicol. & Pharmacol. 21, 27 (1998) (“The public health-oriented resolution of scientific uncertainty [used by regulators] is not especially helpful to the problem faced by a court.”)

[5] EPA “Guidelines for Carcinogen Risk Assessment” at 13 (1986).

[6] The approach is set out in FDA, M7 (R1) Assessment and Control of DNA Reactive (Mutagenic) Impurities in Pharmaceuticals to Limit Potential Carcinogenic Risk: Guidance for Industry (2018) [FDA M7]. This FDA guidance is essentially an adoption of the M7 document of the Expert Working Group (Multidisciplinary) of the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH).

[7] FDA M7 at 3.

[8] FDA M7 at 5.

[9] FDA M7 at 5 (emphasis added).

[10] See Labeling of Diphenhydramine Containing Drug Products for Over-the-Counter Human Use, 67 Fed. Reg. 72,555, at 72,556 (Dec. 6, 2002) (“FDA’s decision to act in an instance such as this one need not meet the standard of proof required to prevail in a private tort action. . .. To mandate a warning or take similar regulatory action, FDA need not show, nor do we allege, actual causation.”) (citing Glastetter).

[11] FDA M7 at “Acceptable Intakes in Relation to Less-Than-Lifetime (LTL) Exposure (7.3).”

[12] FDA M7 at 12 (“Mutagenic Impurities With Evidence for a Practical Threshold (7.2.2)”).

Confounded by Confounding in Unexpected Places

December 12th, 2021

In assessing an association for causality, the starting point is “an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance.”[1] In other words, before we even embark on consideration of Bradford Hill’s nine considerations, we should have ruled out chance, bias, and confounding as an explanation for the claimed association.[2]

Although confounding is sometimes considered as a type of systematic bias, its importance warrants its own category. Historically, courts have been rather careless in addressing confounding. The Supreme Court, in a case decided before Daubert and the statutory modifications to Rule 702, ignored the role of confounding in a multiple regression model used to support racial discrimination claims. In language that would be reprised many times to avoid and evade the epistemic demands of Rule 702, the Court held, in Bazemore, that the omission of variables in multiple regression models raises an issue that affects “the  analysis’ probativeness, not its admissibility.”[3]

When courts have not ignored confounding,[4] they have sidestepped its consideration by imparting magical abilities to confidence intervals to take care of problem posed by lurking variables.[5]

The advent of the Reference Manual on Scientific Manual allowed a ray of hope to shine on health effects litigation. Several important cases have been decided by judges who have taken note of the importance of assessing studies for confounding.[6] As a new, fourth edition of the Manual is being prepared, its editors and authors should not lose sight of the work that remains to be done.

The Third Edition of the Federal Judicial Center’s and the National Academies of Science, Engineering & Medicine’s Reference Manual on Scientific Evidence (RMSE3d 2011) addressed confounding in several chapters, not always consistently. The chapter on statistics defined “confounder” in terms of correlation between both the independent and dependent variables:

“[a] confounder is correlated with the independent variable and the dependent variable. An association between the dependent and independent variables in an observational study may not be causal, but may instead be due to confounding”[7]

The chapter on epidemiology, on the other hand, defined a confounder as a risk factor for both the exposure and disease outcome of interest:

“A factor that is both a risk factor for the disease and a factor associated with the exposure of interest. Confounding refers to a situation in which an association between an exposure and outcome is all or partly the result of a factor that affects the outcome but is unaffected by the exposure.”[8]

Unfortunately, the epidemiology chapter never defined “risk factor.” The term certainly seems much less neutral than a “correlated” variable, which lacks any suggestion of causality. Perhaps there is some implied help from the authors of the epidemiology chapter when they described a case of confounding by “known causal risk factors,” which suggests that some risk factors may not be causal.[9] To muck up the analysis, however, the epidemiology chapter went on to define “risk” as “[a] probability that an event will occur (e.g., that an individual will become ill or die within a stated period of time or by a certain age).”[10]

Both the statistics and the epidemiology chapters provide helpful examples of confounding and speak to the need for excluding confounding as the basis for an observed association. The statistics chapter, for instance, described confounding as a threat to “internal validity,”[11] and the need to inquire whether the adjustments in multivariate studies were “sensible and sufficient.”[12]

The epidemiology chapter in one passage instructed that when “an association is uncovered, further analysis should be conducted to assess whether the association is real or a result of sampling error, confounding, or bias.[13] Elsewhere in the same chapter, the precatory becomes mandatory.[14]

Legally Unexplored Source of Substantial Confounding

As the Reference Manual implies, attempting to control for confounding is not adequate.  The controlling must be carefully and sufficiently done. Under the heading of sufficiency and due care, there are epidemiologic studies that purport to control for confounding, but fail rather dramatically. The use of administrative databases, whether based upon national healthcare or insurance claims, has become a common place in chronic disease epidemiology. Their large size obviates many concerns about power to detect rare disease outcomes. Unfortunately, there is often a significant threat to the validity of such studies, which are based upon data sets that characterize patients as diabetic, hypertensive, obese, or smokers vel non. By dichotomizing what are continuous variables, the categorization extracts a significant price in multivariate models used in epidemiology.

Of course, physicians frequently create guidelines for normal versus abnormal, and these divisions or categories show up in medical records, in databases, and ultimately in epidemiologic studies. The actual measurements are not always available, and the use of a categorical variable may appear to simplify the statistical analysis of the dataset. Unfortunately, the results can be quite misleading. Consider the measurements of blood pressure in a study that is evaluating whether an exposure variable (such as medication use or environmental contaminant) is associated with an outcome such as cardiovascular or renal disease. Hypertension, if present, would clearly be a confounder, but the use of a categorical variable for hypertension would greatly undermine the validity of the study. If many of the study participants with hypertension had their condition well controlled by medication, then the categorical variable will dilute the adjustment for the role of hypertension in driving the association between the exposure and outcome variables of interest. Even if none of the hypertensive patients had good control, the reduction of all hypertension to a category, rather than a continuous measurement, is a path of the loss of information and the creation of bias.

Almost 40 years ago, Jacob Cohen showed that dichotomization of continuous variables results in a loss of power.[15] Twenty years later, Peter Austin showed in a Monte Carlo simulation that categorizing a continuous variable in a logistic regression results in inflating the rate of finding false positive associations.[16] The type I (false-positive) error rates increases with sample size, with increasing correlation between the confounding variable and outcome of interest, and the number of categories used for the continuous variables. Of course, the national databases often have huge sample sizes, which only serves to increase the bias from the use of categorical variables for confounding variables.

The late Douglas Altman, who did so much to steer the medical literature toward greater validity, warned that dichotomizing continuous variables was known to cause loss of information, statistical power, and reliability in medical research.[17]

In the field of pharmaco-epidemiology, the bias created by dichotomization of a continous variable is harmful from both the perspective of statistical estimation and hypothesis testing.[18] While readers are misled into believing that the study adjusts for important co-variates, the study will have lost information and power, with the result of presenting false-positive results that have the false-allure of a fully adjusted model. Indeed, this bias from inadequate control of confounding infects several pending pharmaceutical multi-district litigations.


Supreme Court

General Electric Co. v. Joiner, 522 U.S. 136, 145-46 (1997) (holding that an expert witness’s reliance on a study was misplaced when the subjects of the study “had been exposed to numerous potential carcinogens”)

First Circuit

Bricklayers & Trowel Trades Internat’l Pension Fund v. Credit Suisse Securities (USA) LLC, 752 F.3d 82, 89 (1st Cir. 2014) (affirming exclusion of expert witness who failed to account for confounding in event studies), aff’g 853 F. Supp. 2d 181, 188 (D. Mass. 2012)

Second Circuit

Wills v. Amerada Hess Corp., 379 F.3d 32, 50 (2d Cir. 2004) (holding expert witness’s specific causation opinion that plaintiff’s squamous cell carcinoma had been caused by polycyclic aromatic hydrocarbons was unreliable, when plaintiff had smoked and drunk alcohol)

Deutsch v. Novartis Pharms. Corp., 768 F.Supp. 2d 420, 432 (E.D.N.Y. 2011) (“When assessing the reliability of a epidemiologic study, a court must consider whether the study adequately accounted for “confounding factors.”)

Schwab v. Philip Morris USA, Inc., 449 F. Supp. 2d 992, 1199–1200 (E.D.N.Y. 2006), rev’d on other grounds, 522 F.3d 215 (2d Cir. 2008) (describing confounding in studies of low-tar cigarettes, where authors failed to account for confounding and assessing healthier life styles in users)

Third Circuit

In re Zoloft Prods. Liab. Litig., 858 F.3d 787, 793 (3d Cir. 2017) (affirming exclusion of causation expert witness)

Magistrini v. One Hour Martinizing Dry Cleaning, 180 F. Supp. 2d 584, 591 (D.N.J. 2002), aff’d, 68 Fed. Appx. 356 (3d Cir. 2003)(bias, confounding, and chance must be ruled out before an association  may be accepted as showing a causal association)

Soldo v. Sandoz Pharms. Corp., 244 F. Supp. 2d 434 (W.D.Pa. 2003) (excluding expert witnesses in Parlodel case; noting that causality assessments and case reports fail to account for confounding)

Wade-Greaux v. Whitehall Labs., Inc., 874 F. Supp. 1441 (D.V.I. 1994) (unanswered questions about confounding required summary judgment  against plaintiff in Primatene Mist birth defects case)

Fifth Circuit

Knight v. Kirby Inland Marine, Inc., 482 F.3d 347, 353 (5th Cir. 2007) (affirming exclusion of expert witnesses) (“Of all the organic solvents the study controlled for, it could not determine which led to an increased risk of cancer …. The study does not provide a reliable basis for the opinion that the types of chemicals appellants were exposed to could cause their particular injuries in the general population.”)

Burst v. Shell Oil Co., C. A. No. 14–109, 2015 WL 3755953, *7 (E.D. La. June 16, 2015) (excluding expert witness causation opinion that failed to account for other confounding exposures that could have accounted for the putative association), aff’d, 650 F. App’x 170 (5th Cir. 2016)

LeBlanc v. Chevron USA, Inc., 513 F. Supp. 2d 641, 648-50 (E.D. La. 2007) (excluding expert witness testimony that purported to show causality between plaintiff’s benzene ezposure and myelofibrosis), vacated, 275 Fed. App’x 319 (5th Cir. 2008) (remanding case for consideration of new government report on health effects of benzene)

Castellow v. Chevron USA, 97 F. Supp. 2d 780 (S.D. Tex. 2000) (discussing confounding in passing; excluding expert witness causation opinion in gasoline exposure AML case)

Kelley v. American Heyer-Schulte Corp., 957 F. Supp. 873 (W.D. Tex. 1997) (confounding in breast implant studies)

Sixth Circuit

Pluck v. BP Oil Pipeline Co., 640 F.3d 671 (6th Cir. 2011) (affirming exclusion of specific causation opinion that failed to rule out confounding factors)

Nelson v. Tennessee Gas Pipeline Co., 243 F.3d 244, 252-54 (6th Cir. 2001) (rewrite: expert’s failure to account for confounding factors in cohort study of alleged PCB exposures rendered his opinion unreliable)

Turpin v. Merrell Dow Pharms., Inc., 959 F. 2d 1349, 1355 -57 (6th Cir. 1992) (discussing failure of some studies to evaluate confounding)

Adams v. Cooper Indus. Inc., 2007 WL 2219212, 2007 U.S. Dist. LEXIS 55131 (E.D. Ky. 2007) (differential diagnosis includes ruling out confounding causes of plaintiffs’ disease).

Seventh Circuit

People Who Care v. Rockford Bd. of Educ., 111 F.3d 528, 537–38 (7th Cir. 1997) (noting importance of considering role of confounding variables in educational achievement);

Caraker v. Sandoz Pharms. Corp., 188 F. Supp. 2d 1026, 1032, 1036 (S.D. Ill 2001) (noting that “the number of dechallenge/rechallenge reports is too scant to reliably screen out other causes or confounders”)

Eighth Circuit

Penney v. Praxair, Inc., 116 F.3d 330, 333-334 (8th Cir. 1997) (affirming exclusion of expert witness who failed to account of the confounding effects of age, medications, and medical history in interpreting PET scans)

Marmo v. Tyson Fresh Meats, Inc., 457 F.3d 748, 758 (8th Cir. 2006) (affirming exclusion of specific causation expert witness opinion)

Ninth Circuit

Coleman v. Quaker Oats Co., 232 F.3d 1271, 1283 (9th Cir. 2000) (p-value of “3 in 100 billion” was not probative of age discrimination when “Quaker never contend[ed] that the disparity occurred by chance, just that it did not occur for discriminatory reasons. When other pertinent variables were factored in, the statistical disparity diminished and finally disappeared.”)

In re Viagra & Cialis Prods. Liab. Litig., 424 F.Supp. 3d 781 (N.D. Cal. 2020) (excluding causation opinion on grounds including failure to account properly for confounding)

Avila v. Willits Envt’l Remediation Trust, 2009 WL 1813125, 2009 U.S. Dist. LEXIS 67981 (N.D. Cal. 2009) (excluding expert witness opinion that failed to rule out confounding factors of other sources of exposure or other causes of disease), aff’d in relevant part, 633 F.3d 828 (9th Cir. 2011)

In re Phenylpropanolamine Prods. Liab. Litig., 289 F.Supp.2d 1230 (W.D.Wash. 2003) (ignoring study validity in a litigation arising almost exclusively from a single observational study that had multiple internal and external validity problems; relegating assessment of confounding to cross-examination)

In re Bextra and Celebrex Marketing Sales Practice, 524 F. Supp. 2d 1166, 1172 – 73 (N.D. Calif. 2007) (discussing invalidity caused by confounding in epidemiologic studies)

In re Silicone Gel Breast Implants Products Liab. Lit., 318 F.Supp. 2d 879, 893 (C.D.Cal. 2004) (observing that controlling for potential confounding variables is required, among other findings, before accepting epidemiologic studies as demonstrating causation).

Henricksen v. ConocoPhillips Co., 605 F. Supp. 2d 1142 (E.D. Wash. 2009) (noting that confounding must be ruled out)

Valentine v. Pioneer Chlor Alkali Co., Inc., 921 F. Supp. 666 (D. Nev. 1996) (excluding plaintiffs’ expert witnesses, including Dr. Kilburn, for reliance upon study that failed to control for confounding)

Tenth Circuit

Hollander v. Sandoz Pharms. Corp., 289 F.3d 1193, 1213 (10th Cir. 2002) (noting importance of accounting for confounding variables in causation of stroke)

In re Breast Implant Litig., 11 F. Supp. 2d 1217, 1233 (D. Colo. 1998) (alternative explanations, such confounding, should be ruled out before accepting causal claims).

Eleventh Circuit

In re Abilify (Aripiprazole) Prods. Liab. Litig., 299 F.Supp. 3d 1291 (N.D.Fla. 2018) (discussing confounding in studies but credulously accepting challenged explanations from David Madigan) (citing Bazemore, a pre-Daubert, decision that did not address a Rule 702 challenge to opinion testimony)

District of Columbia Circuit

American Farm Bureau Fed’n v. EPA, 559 F.3d 512 (D.C. Cir. 2009) (noting that data relied upon in setting particulate matter standards addressing visibility should avoid the confounding effects of humidity)

STATES

Delaware

In re Asbestos Litig., 911 A.2d 1176 (New Castle Cty., Del. Super. 2006) (discussing confounding; denying motion to exclude plaintiffs’ expert witnesses’ chrysotile causation opinions)

Minnesota

Goeb v. Tharaldson, 615 N.W.2d 800, 808, 815 (Minn. 2000) (affirming exclusion of Drs. Janette Sherman and Kaye Kilburn, in Dursban case, in part because of expert witnesses’ failures to consider confounding adequately).

New Jersey

In re Accutane Litig., 234 N.J. 340, 191 A.3d 560 (2018) (affirming exclusion of plaintiffs’ expert witnesses’ causation opinions; deprecating reliance upon studies not controlled for confounding)

In re Proportionality Review Project (II), 757 A.2d 168 (N.J. 2000) (noting the importance of assessing the role of confounders in capital sentences)

Grassis v. Johns-Manville Corp., 591 A.2d 671, 675 (N.J. Super. Ct. App. Div. 1991) (discussing the possibility that confounders may lead to an erroneous inference of a causal relationship)

Pennsylvania

Porter v. SmithKline Beecham Corp., No. 3516 EDA 2015, 2017 WL 1902905 (Pa. Super. May 8, 2017) (affirming exclusion of expert witness causation opinions in Zoloft birth defects case; discussing the importance of excluding confounding)

Tennessee

McDaniel v. CSX Transportation, Inc., 955 S.W.2d 257 (Tenn. 1997) (affirming trial court’s refusal to exclude expert witness opinion that failed to account for confounding)


[1] Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 295 (1965) (emphasis added).

[2] See, e.g., David A. Grimes & Kenneth F. Schulz, “Bias and Causal Associations in Observational Research,” 359 The Lancet 248 (2002).

[3] Bazemore v. Friday, 478 U.S. 385, 400 (1986) (reversing Court of Appeal’s decision that would have disallowed a multiple regression analysis that omitted important variables). Buried in a footnote, the Court did note, however, that “[t]here may, of course, be some regressions so incomplete as to be inadmissible as irrelevant; but such was clearly not the case here.” Id. at 400 n.10. What the Court missed, of course, is that the regression may be so incomplete as to be unreliable or invalid. The invalidity of the regression in Bazemore does not appear to have been raised as an evidentiary issue under Rule 702. None of the briefs in the Supreme Court or the judicial opinions cited or discussed Rule 702.

[4]Confounding in the Courts” (Nov. 2, 2018).

[5] See, e.g., Brock v. Merrill Dow Pharmaceuticals, Inc., 874 F.2d 307, 311-12 (5th Cir. 1989) (“Fortunately, we do not have to resolve any of the above questions [as to bias and confounding], since the studies presented to us incorporate the possibility of these factors by the use of a confidence interval.”). This howler has been widely acknowledged in the scholarly literature. See David Kaye, David Bernstein, and Jennifer Mnookin, The New Wigmore – A Treatise on Evidence: Expert Evidence § 12.6.4, at 546 (2d ed. 2011); Michael O. Finkelstein, Basic Concepts of Probability and Statistics in the Law 86-87 (2009) (criticizing the blatantly incorrect interpretation of confidence intervals by the Brock court).

[6]On Praising Judicial Decisions – In re Viagra” (Feb. 8, 2021); See “Ruling Out Bias and Confounding Is Necessary to Evaluate Expert Witness Causation Opinions” (Oct. 28, 2018); “Rule 702 Requires Courts to Sort Out Confounding” (Oct. 31, 2018).

[7] David H. Kaye and David A. Freedman, “Reference Guide on Statistics,” in RMSE3d 211, 285 (3ed 2011). 

[8] Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” in RMSE3d 549, 621.

[9] Id. at 592.

[10] Id. at 627.

[11] Id. at 221.

[12] Id. at 222.

[13] Id. at 567-68 (emphasis added).

[14] Id. at 572 (describing chance, bias, and confounding, and noting that “[b]efore any inferences about causation are drawn from a study, the possibility of these phenomena must be examined”); id. at 511 n.22 (observing that “[c]onfounding factors must be carefully addressed”).

[15] Jacob Cohen, “The cost of dichotomization,” 7 Applied Psychol. Measurement 249 (1983).

[16] Peter C. Austin & Lawrence J. Brunner, “Inflation of the type I error rate when a continuous confounding variable is categorized in logistic regression analyses,” 23 Statist. Med. 1159 (2004).

[17] See, e.g., Douglas G. Altman & Patrick Royston, “The cost of dichotomising continuous variables,” 332 Brit. Med. J. 1080 (2006); Patrick Royston, Douglas G. Altman, and Willi Sauerbrei, “Dichotomizing continuous predictors in multiple regression: a bad idea,” 25 Stat. Med. 127 (2006). See also Robert C. MacCallum, Shaobo Zhang, Kristopher J. Preacher, and Derek D. Rucker, “On the Practice of Dichotomization of Quantitative Variables,” 7 Psychological Methods 19 (2002); David L. Streiner, “Breaking Up is Hard to Do: The Heartbreak of Dichotomizing Continuous Data,” 47 Can. J. Psychiatry 262 (2002); Henian Chen, Patricia Cohen, and Sophie Chen, “Biased odds ratios from dichotomization of age,” 26 Statist. Med. 3487 (2007); Carl van Walraven & Robert G. Hart, “Leave ‘em Alone – Why Continuous Variables Should Be Analyzed as Such,” 30 Neuroepidemiology 138 (2008); O. Naggara, J. Raymond, F. Guilbert, D. Roy, A. Weill, and Douglas G. Altman, “Analysis by Categorizing or Dichotomizing Continuous Variables Is Inadvisable,” 32 Am. J. Neuroradiol. 437 (Mar 2011); Neal V. Dawson & Robert Weiss, “Dichotomizing Continuous Variables in Statistical Analysis: A Practice to Avoid,” Med. Decision Making 225 (2012); Phillippa M Cumberland, Gabriela Czanner, Catey Bunce, Caroline J Doré, Nick Freemantle, and Marta García-Fiñana, “Ophthalmic statistics note: the perils of dichotomising continuous variables,” 98 Brit. J. Ophthalmol. 841 (2014).

[18] Valerii Fedorov, Frank Mannino1, and Rongmei Zhang, “Consequences of dichotomization,” 8 Pharmaceut. Statist. 50 (2009).