TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Paraquat Shape-Shifting Expert Witness Quashed

April 24th, 2024

Another multi-district litigation (MDL) has hit a jarring speed bump. Claims for Parkinson’s disease (PD), allegedly caused by exposure to paraquat dichloride (paraquat), were consolidated, in June 2021, for pre-trial coordination in MDL No. 3004, in the Southern District of Illinois, before Chief Judge Nancy J. Rosenstengel. Like many health-effects litigation claims, the plaintiffs’ claims in these paraquat cases turn on epidemiologic evidence. To present their causation case in the first MDL trial cases, plaintiffs’ counsel nominated a statistician, Martin T. Wells. Last week, Judge Rosenstengel found Wells’ opinion so infected by invalid methodologies and inferences as to be inadmissible under the most recent version of Rule 702.[1] Summary judgment in the trial cases followed.[2]

Back in the 1980s, paraquat gained some legal notoriety in one of the most retrograde Rule 702 decisions.[3] Both the herbicide and Rule 702 survived, however, and both remain in wide use. For the last two decades, there have been widespread challenges to the safety of paraquat, and in particular claims that paraquat can cause PD or parkinsonism under some circumstances. Despite this background, the plaintiffs’ counsel in MDL 3004 began with four problems.

First, paraquat is closely regulated for agricultural use in the United States. Under federal law, paraquat can be used to control the growth of weeds only “by or under the direct supervision of a certified applicator.”[4] The regulatory record created an uphill battle for plaintiffs.[5] Under the Federal Insecticide, Fungicide, and Rodenticide Act (“FIFRA”), the U.S. EPA has regulatory and enforcement authority over the use, sale, and labeling of paraquat.[6] As part of its regulatory responsibilities, in 2019, the EPA systematically reviewed available evidence to assess whether there was an association between paraquat and PD. The agency’s review concluded that “there is limited, but insufficient epidemiologic evidence at this time to conclude that there is a clear associative or causal relationship between occupational paraquat exposure and PD.”[7] In 2021, the EPA issued its Interim Registration Review Decision, and reapproved the registration of paraquat. In doing so, the EPA concluded that “the weight of evidence was insufficient to link paraquat exposure from pesticidal use of U.S. registered products to Parkinson’s disease in humans.”[8]

Second, beyond the EPA’s review, no other published review, systematic or otherwise, had concluded that paraquat causes PD.[9]

Third, the plaintiffs’ claims faced another serious impediment. Their counsel placed their reliance upon Professor Martin Wells, a statistician on the faculty of Cornell University. Unfortunately for plaintiffs, Wells has been known to operate as a “cherry picker,” and his methodology has previously been reviewed in an unfavorable light. Another MDL court, which evaluated a systematic review and meta-analysis propounded by Wells, found that his reports “were marred by a selective review of data and inconsistent application of inclusion criteria.”[10]

Fourth, the plaintiffs’ claims were before Chief Judge Nancy J. Rosenstengel, who was willing to do the hard work required under Rule 702, especially as it has recently been amended to clarify and emphasize the gatekeeper’s responsibility to evaluate validity issues in the proffered opinions of expert witnesses. As her 97-page decision evinces, Judge Rosenstengel conducted four days of hearings, which included viva voce testimony from Martin Wells, and she obviously read the underlying papers and reviews, as well as the briefs and the Reference Manual on Scientific Evidence, with great care. What followed did not go well for Wells or the plaintiffs’ claims.[11] Judge Rosenstengel has written an opinion that may be the first careful judicial consideration of the basic requirements of a systematic review.

The court noted that systematic reviewers carefully define a research question and what kinds of empirical evidence will be reviewed, and then collect, summarize, and, if feasible, synthesize the available evidence into a conclusion.[12] The court emphasized that systematic reviewers should “develop a protocol for the review before commencement and adhere to the protocol regardless of the results of the review.”[13]

Wells proffered a meta-analysis, and a “weight of the evidence” (WOE) review from which he concluded that paraquat causes PD and nearly triples the risk of the disease among workers exposed to the herbicide.[14] In his reports, Wells identified a universe of at least 36 studies, but included seven in his meta-analysis. The defense had identified another two studies that were germane.[15]

Chief Judge Rosenstengel’s opinion is noteworthy for its fine attention to detail, detail that matters to the validity of the expert witness’s enterprise. Martin Wells set out to do a meta-analysis, which was all well and good. But with a universe of 36 studies, with sub-findings, alternative analyses, and changing definitions of relevant exposure, the devil lay in the details.

The MDL court was careful to point out that it was not gainsaying Wells’ decision to limit his meta-analysis to case-control studies, or his grading of any particular study as being of low quality. Systematic reviews and meta-analyses are generally accepted techniques that are part of a scientific approach to causal inference, but each has standards, predicates, and requirements for valid use. Expert witnesses must not only use a reliable methodology; Rule 702(d) requires that they reliably apply their chosen methodology to the facts at hand in reaching their conclusions.[16]

The MDL court concluded that Wells’ meta-analysis was not sufficiently reliable under Rule 702 because he failed faithfully and reliably to apply his own articulated methodology. The court followed Wells’ lead in identifying the source and content of his chosen methodology, and simply examined his proffered opinion for compliance with that methodology.[17] The basic principles of validity for conducting meta-analyses were not, in any event, really contested. These principles and requirements were clearly designed to ensure and enhance the reliability of meta-analyses by pre-empting results-driven, reverse-engineered summary estimates of association.

The court found that Wells failed clearly to pre-specify his eligibility criteria. He then proceeded to redefine exposure criteria, study inclusion or eligibility criteria, and study quality criteria after looking at the evidence. He also inconsistently applied his stated criteria, all in an apparent effort to exclude less favorable study outcomes. These ad hoc steps were some of Wells’ deviations from the standards to which he paid lip service.

The court did not exclude Wells because it disagreed with his substantive decisions to include or exclude any particular study, or with his quality grading of any study. Rather, Dr. Wells’ meta-analysis did not pass muster under Rule 702 because its methodology was unclear, inconsistently applied, not replicable, and at times transparently reverse-engineered.[18]

The court’s evaluation of Wells was unflinchingly critical. Wells’ proffered opinions “required several methodological contortions and outright violations of the scientific standards he professed to apply.”[19] From his first involvement in this litigation, Wells had violated the basic rules of conducting systematic reviews and meta-analyses.[20] His definition of “occupational” exposure meandered to suit his desire to include one study (with low variance) that might otherwise have been excluded.[21] Rather than pre-specifying his review process, his study inclusion criteria, and his quality scores, Wells engaged in an unwritten “holistic” review process, which he conceded was not objectively replicable. Wells’ approach left him free to include studies he wanted in his meta-analysis, and then provide post hoc justifications.[22] His failure to identify his inclusion/exclusion criteria was a “methodological red flag” in Dr. Wells’ meta-analysis, which suggested his reverse engineering of the whole analysis, the “very antithesis of a systematic review.”[23]

In what the court described as “methodological shapeshifting,” Wells blatantly and inconsistently graded studies he wanted to include, and had already decided to include in his meta-analysis, to be of higher quality.[24] The paraquat MDL court found, unequivocally, that Wells had “failed to apply the same level of intellectual rigor to his work in the four trial selection cases that would be required of him and his peers in a non-litigation setting.”[25]

It was also not lost upon the MDL court that Wells had shifted from a fixed-effect to a random-effects meta-analysis between his principal and rebuttal reports.[26] Basic to the meta-analytical enterprise is a predicate systematic review, properly done, with pre-specification of inclusion and exclusion criteria for what studies would go into any meta-analysis. The MDL court noted that both sides had cited Borenstein’s textbook on meta-analysis,[27] and that Wells had himself cited the Cochrane Handbook[28] for the basic proposition that objective and scientifically valid study selection criteria should be clearly stated in advance to ensure the objectivity of the analysis.
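
For readers unfamiliar with the distinction, the following minimal sketch (with hypothetical log odds ratios, not Wells’ actual study data) shows how a fixed-effect pool differs from a DerSimonian-Laird random-effects pool, and why the switch matters when a single low-variance study dominates:

```python
# Fixed-effect vs. DerSimonian-Laird random-effects pooling.
# The log odds ratios and standard errors below are hypothetical.
import numpy as np

log_or = np.array([1.03, 0.26, 0.47, 0.18, -0.11, 0.34, 0.59])
se = np.array([0.20, 0.45, 0.50, 0.40, 0.55, 0.35, 0.60])

w_fixed = 1 / se**2                                   # inverse-variance weights
mu_fixed = np.sum(w_fixed * log_or) / np.sum(w_fixed)

# DerSimonian-Laird estimate of the between-study variance (tau^2)
Q = np.sum(w_fixed * (log_or - mu_fixed) ** 2)
C = np.sum(w_fixed) - np.sum(w_fixed**2) / np.sum(w_fixed)
tau2 = max(0.0, (Q - (len(log_or) - 1)) / C)

w_rand = 1 / (se**2 + tau2)                           # random-effects weights
mu_rand = np.sum(w_rand * log_or) / np.sum(w_rand)

print(f"fixed-effect pooled OR:   {np.exp(mu_fixed):.2f}")
print(f"random-effects pooled OR: {np.exp(mu_rand):.2f}")
# Under the random-effects model, weight is spread more evenly across studies,
# so the single low-variance study pulls the summary estimate much less.
```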

There was, of course, legal authority for this basic proposition about prespecification. Given that the selection of studies that go into a systematic review and meta-analysis can be dispositive of its conclusion, undue subjectivity or ad hoc inclusion can easily arrange a desired outcome.[29] Furthermore, a meta-analysis carries with it the opportunity to mislead a lay jury with a single (and inflated) risk ratio,[30] obtained by the operator’s manipulation of inclusion and exclusion criteria. This opportunity required the MDL court to examine the methodological rigor of the proffered meta-analysis carefully, to evaluate whether it reflected a valid pooling of data or was concocted to win a case.[31]

Martin Wells had previously acknowledged the dangers of manipulation and subjective selectivity inherent in systematic reviews and meta-analyses. The MDL court quoted from Wells’ testimony in Martin v. Actavis:

QUESTION: You would certainly agree that the inclusion-exclusion criteria should be based upon objective criteria and not simply because you were trying to get to a particular result?

WELLS: No, you shouldn’t load the – sort of cook the books.

QUESTION: You should have prespecified objective criteria in advance, correct?

WELLS: Yes.[32]

The MDL court also picked up on a subtle but important methodological point about which odds ratio to use in a meta-analysis when a study provides multiple analyses of the same association. In his first paraquat deposition, Wells cited the Cochrane Handbook for the proposition that if a crude risk ratio and a risk ratio from a multivariate analysis are both presented in a given study, then the adjusted risk ratio (and its corresponding measure of standard error, seen in its confidence interval) is generally preferable, to reduce the play of confounding.[33] Wells violated this basic principle by ignoring the multivariate analysis in the study that dominated his meta-analysis (Liou) in favor of the unadjusted bivariate analysis. Given that Wells accepted this basic principle, the MDL court found that Wells likely selected the minimally adjusted odds ratio over the multivariate adjusted odds ratio for inclusion in his meta-analysis in order to have the smaller variance (and thus greater weight) from the former. This maneuver was disqualifying under Rule 702.[34]
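
The variance-weighting mechanics behind this finding are simple to show. In an inverse-variance meta-analysis, a study enters the pool with weight 1/SE²; the sketch below (using hypothetical standard errors, not the Liou figures) shows how picking the estimate with the smaller standard error inflates a study’s pull on the summary result:

```python
# Why the choice between a crude and an adjusted estimate changes a study's
# weight in an inverse-variance pool. Standard errors are hypothetical.
se_crude, se_adjusted = 0.25, 0.45          # on the log odds ratio scale

w_crude = 1 / se_crude**2                   # 16.0
w_adjusted = 1 / se_adjusted**2             # ~4.9
print(f"weight using crude OR:    {w_crude:.1f}")
print(f"weight using adjusted OR: {w_adjusted:.1f}")
# Selecting the minimally adjusted estimate here more than triples the study's
# weight, which is why the choice must be prespecified, not made after the fact.
```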

All in all, the paraquat MDL court’s Rule 702 ruling was a convincing demonstration that non-expert generalist judges, with assistance from subject-matter experts, treatises, and legal counsel, can evaluate and identify deviations from methodological standards of care.


[1] In re Paraquat Prods. Liab. Litig., Case No. 3:21-md-3004-NJR, MDL No. 3004, Slip op., ___ F. Supp. 3d ___ (S.D. Ill. Apr. 17, 2024) [Slip op.].

[2] In re Paraquat Prods. Liab. Litig., Op. sur motion for judgment, Case No. 3:21-md-3004-NJR, MDL No. 3004 (S.D. Ill. Apr. 17, 2024). See also Brendan Pierson, “Judge rejects key expert in paraquat lawsuits, tosses first cases set for trial,” Reuters (Apr. 17, 2024); Hailey Konnath, “Trial-Ready Paraquat MDL Cases Tossed After Testimony Axed,” Law360 (Apr. 18, 2024).

[3] Ferebee v. Chevron Chem. Co., 552 F. Supp. 1297 (D.D.C. 1982), aff’d, 736 F.2d 1529 (D.C. Cir.), cert. denied, 469 U.S. 1062 (1984). See “Ferebee Revisited,” Tortini (Dec. 28, 2017).

[4] See 40 C.F.R. § 152.175.

[5] Slip op. at 31.

[6] 7 U.S.C. § 136w; 7 U.S.C. § 136a(a); 40 C.F.R. § 152.175. The agency must periodically review the registration of the herbicide. 7 U.S.C. § 136a(g)(1)(A). See Ruckelshaus v. Monsanto Co., 467 U.S. 986, 991-92 (1984).

[7] See Austin Wray & Aaron Niman, Memorandum, Paraquat Dichloride: Systematic review of the literature to evaluate the relationship between paraquat dichloride exposure and Parkinson’s disease at 35 (June 26, 2019).

[8] See also Jeffrey Brent and Tammi Schaeffer, “Systematic Review of Parkinsonian Syndromes in Short- and Long-Term Survivors of Paraquat Poisoning,” 53 J. Occup. & Envt’l Med. 1332 (2011) (“An analysis the world’s entire published experience found no connection between high-dose paraquat exposure in humans and the development of parkinsonism.”).

[9] Douglas L. Weed, “Does paraquat cause Parkinson’s disease? A review of reviews,” 86 Neurotoxicology 180, 180 (2021).

[10] In re Incretin-Based Therapies Prods. Liab. Litig., 524 F.Supp. 3d 1007, 1038, 1043 (S.D. Cal. 2021), aff’d, No. 21-55342, 2022 WL 898595 (9th Cir. Mar. 28, 2022) (per curiam). See “Madigan’s Shenanigans and Wells Quelled in Incretin-Mimetic Cases,” Tortini (July 15, 2022).

[11] The MDL court obviously worked hard to learn the basic principles of epidemiology. The court relied extensively upon the epidemiology chapter in the Reference Manual on Scientific Evidence. Much of that material is very helpful, but its exposition of statistical concepts is at times confused and erroneous. It is unfortunate that courts do not pay more attention to the more precise and accurate exposition in the chapter on statistics. Citing the epidemiology chapter, the MDL court gave an incorrect interpretation of the p-value: “A statistically significant result is one that is unlikely the product of chance.” Slip op. at 17 n.11. And then again, citing the Reference Manual, the court declared that “[a] p-value of .1 means that there is a 10% chance that values at least as large as the observed result could have been the product of random error.” Id. Similarly, the MDL court gave an incorrect interpretation of the confidence interval. In a footnote, the court tells us that “[r]esearchers ordinarily assert a 95% confidence interval, meaning that ‘there is a 95% chance that the “true” odds ratio value falls within the confidence interval range’. In re Zoloft (Sertraline Hydrochloride) Prod. Liab. Litig., MDL No. 2342, 2015 WL 7776911, at *2 (E.D. Pa. Dec. 2, 2015).” Slip op. at 17 n.12. Citing another court for the definition of a statistical concept is a risky business.

[12] Slip op. at 20, citing Lisa A. Bero, “Evaluating Systematic Reviews and Meta-Analyses,” 14 J.L. & Pol’y 569, 570 (2006).

[13] Slip op. at 21, quoting Bero, at 575.

[14] Slip op. at 3.

[15] The nine studies at issue were as follows: (1) H.H. Liou, et al., “Environmental risk factors and Parkinson’s disease; A case-control study in Taiwan,” 48 Neurology 1583 (1997); (2) Caroline M. Tanner, et al., “Rotenone, Paraquat and Parkinson’s Disease,” 119 Envt’l Health Persps. 866 (2011) (a nested case-control study within the Agricultural Health Study (“AHS”)); (3) Clyde Hertzman, et al., “A Case-Control Study of Parkinson’s Disease in a Horticultural Region of British Columbia,” 9 Movement Disorders 69 (1994); (4) Anne-Maria Kuopio, et al., “Environmental Risk Factors in Parkinson’s Disease,” 14 Movement Disorders 928 (1999); (5) Katherine Rugbjerg, et al., “Pesticide exposure and risk of Parkinson’s disease – a population-based case-control study evaluating the potential for recall bias,” 37 Scandinavian J. of Work, Env’t & Health 427 (2011); (6) Jordan A. Firestone, et al., “Occupational Factors and Risk of Parkinson’s Disease: A Population-Based Case-Control Study,” 53 Am. J. of Indus. Med. 217 (2010); (7) Amanpreet S. Dhillon, “Pesticide / Environmental Exposures and Parkinson’s Disease in East Texas,” 13 J. of Agromedicine 37 (2008); (8) Marianne van der Mark, et al., “Occupational exposure to pesticides and endotoxin and Parkinson’s disease in the Netherlands,” 71 J. Occup. & Envt’l Med. 757 (2014); (9) Srishti Shrestha, et al., “Pesticide use and incident Parkinson’s disease in a cohort of farmers and their spouses,” 191 Envt’l Research (2020).

[16] Slip op. at 75.

[17] Slip op. at 73.

[18] Slip op. at 75, citing In re Mirena IUS Levonorgestrel-Related Prod. Liab. Litig. (No. II), 341 F. Supp. 3d 213, 241 (S.D.N.Y. 2018) (“Opinions that assume a conclusion and reverse-engineer a theory to fit that conclusion are . . . inadmissible.”) (internal citation omitted), aff’d, 982 F.3d 113 (2d Cir. 2020); In re Zoloft (Sertraline Hydrochloride) Prod. Liab. Litig., No. 12-md-2342, 2015 WL 7776911, at *16 (E.D. Pa. Dec. 2, 2015) (excluding expert’s opinion where he “failed to consistently apply the scientific methods he articulat[ed], . . . deviated from or downplayed certain well established principles of his field, and . . . inconsistently applied methods and standards to the data so as to support his a priori opinion.”), aff’d, 858 F.3d 787 (3d Cir. 2017).

[19] Slip op. at 35.

[20] Slip op. at 58.

[21] Slip op. at 55.

[22] Slip op. at 41, 64.

[23] Slip op. at 59-60, citing In re Lipitor (Atorvastatin Calcium) Mktg., Sales Pracs. & Prod. Liab. Litig., 892 F.3d 624, 634 (4th Cir. 2018) (“Result-driven analysis, or cherry-picking, undermines principles of the scientific method and is a quintessential example of applying methodologies (valid or otherwise) in an unreliable fashion.”).

[24] Slip op. at 67, 69-70, citing In re Zoloft (Sertraline Hydrochloride) Prod. Liab. Litig., 858 F.3d 787, 795-97 (3d Cir. 2017) (“[I]f an expert applies certain techniques to a subset of the body of evidence and other techniques to another subset without explanation, this raises an inference of unreliable application of methodology.”); In re Bextra and Celebrex Mktg. Sales Pracs. & Prod. Liab. Litig., 524 F. Supp. 2d 1166, 1179 (N.D. Cal. 2007) (excluding an expert witness’s causation opinion because of his result-oriented, inconsistent evaluation of data sources).

[25] Slip op. at 40.

[26] Slip op. at 61 n.44.

[27] Michael Borenstein, Larry V. Hedges, Julian P. T. Higgins, and Hannah R. Rothstein, Introduction to Meta-Analysis (2d ed. 2021).

[28] Jacqueline Chandler, James Thomas, Julian P. T. Higgins, Matthew J. Page, Miranda Cumpston, Tianjing Li, Vivian A. Welch, eds., Cochrane Handbook for Systematic Reviews of Interventions (2d ed. 2023).

[29] Slip op. at 56, citing In re Zimmer Nexgen Knee Implant Prod. Liab. Litig., No. 11 C 5468, 2015 WL 5050214, at *10 (N.D. Ill. Aug. 25, 2015).

[30] Slip op. at 22. The court noted that the Reference Manual on Scientific Evidence cautions that “[p]eople often tend to have an inordinate belief in the validity of the findings when a single number is attached to them, and many of the difficulties that may arise in conducting a meta-analysis, especially of observational studies such as epidemiological ones, may consequently be overlooked.” Id., quoting from Manual, at 608.

[31] Slip op. at 57, citing Deutsch v. Novartis Pharms. Corp., 768 F. Supp. 2d 420, 457-58 (E.D.N.Y. 2011) (“[T]here is a strong risk of prejudice if a Court permits testimony based on an unreliable meta-analysis because of the propensity for juries to latch on to the single number.”).

[32] Slip op. at 64, quoting from Notes of Testimony of Martin Wells, in In re Testosterone Replacement Therapy Prod. Liab. Litig., Nos. 1:14-cv-1748, 15-cv-4292, 15-cv-426, 2018 WL 7350886 (N.D. Ill. Apr. 2, 2018).

[33] Slip op. at 70.

[34] Slip op. at 71-72, citing People Who Care v. Rockford Bd. of Educ., 111 F.3d 528, 537-38 (7th Cir. 1997) (“[A] statistical study that fails to correct for salient explanatory variables . . . has no value as causal explanation and is therefore inadmissible in federal court.”); In re Roundup Prod. Liab. Litig., 390 F. Supp. 3d 1102, 1140 (N.D. Cal. 2018).

How Access to a Protocol and Underlying Data Gave Yale Researchers a Big Black Eye

April 13th, 2024

Prelude to Litigation

Phenylpropanolamine (PPA) was a widely used direct α-adrenergic agonist used as a medication to control cold symptoms and to suppress appetite for weight loss.[1] In 1972, an over-the-counter (OTC) Advisory Review Panel considered the safety and efficacy of PPA-containing nasal decongestant medications, leading, in 1976, to a recommendation that the agency label these medications as “generally recognized as safe and effective.” Several years later, another Panel recommended that PPA-containing weight control products also be recognized as safe and effective.

Six years later, in 1982, another FDA panel recommended that PPA be considered safe and effective for appetite suppression in dieting. Two epidemiologic studies of PPA and hemorrhagic stroke (HS) were conducted in the 1980s. The results of one study by Hershel Jick and colleagues, presented as a letter to the editor, reported a relative risk of 0.58, with a 95% exact confidence interval of 0.03 – 2.9.[2] A year later, two researchers, reporting a study based upon Medicaid databases, found no significant association between HS and PPA.[3]

The FDA, however, did not approve a final monograph for PPA, with recognition of its “safe and effective” status because of occasional reports of hemorrhagic stroke that occurred in patients who used PPA-containing medications, mostly young women who had used PPA appetite suppressants for dieting. In 1982, the FDA requested information on the effects of PPA on blood pressure, particularly with respect to weight-loss medications. The agency deferred a proposed 1985 final monograph because of the blood pressure issue.

The FDA deemed the data inadequate to answer its safety concerns. Congressional and agency hearings in the early 1990s amplified some public concern, but in 1990, the Director of Cardio-Renal Drug Products, at the Center for Drug Evaluation and Research, found several well-supported facts, based upon robust evidence. Blood pressure studies in humans showed a biphasic response. PPA initially causes blood pressure to rise above baseline (a pressor effect), and then to fall below baseline (a depressor effect). These blood pressure responses are dose-related, and diminish with repeated use. Patients develop tolerance to the pressor effects within a few hours. The Center concluded that at doses of 50 mg of PPA and below, the pressor effects of the medication are smaller than normal daily variations in basal blood pressure. Humans develop tolerance to the pressor effects quickly, within the time frame of a single dose. The only time period in which even a theoretical risk might exist is within a few hours, or less, of a patient’s taking the first dose of PPA medication. Doses of 25 mg of immediate-release PPA, in the Center’s words, posed no “absolute safety risk and have a reasonable safety margin.”[4]

In 1991, Dr. Heidi Jolson, an FDA scientist, wrote that the agency’s spontaneous adverse event reporting system “suggested” that PPA appetite suppressants increased the risk of cerebrovascular accidents. A review of stroke data, including the adverse event reports, by epidemiology consultants failed to support a causal association between PPA and HS. The reviewers, however, acknowledged that the available data did not permit them to rule out a risk of HS. The FDA adopted the reviewers’ recommendation for a prospective, large case-control study designed to take into account the known physiological effects of PPA on blood pressure.[5]

What emerged from this regulatory indecision was a decision to conduct another epidemiologic study. In November 1992, a manufacturers’ group, now known as the Consumer Healthcare Products Association (CHPA) proposed a case-control study that would become known as the Hemorrhagic Stroke Project (HSP). In March 1993, the group submitted a proposed protocol, and a suggestion that the study be conducted by several researchers at Yale University. After feedback from the public and the Yale researchers, the group submitted a final protocol in April 1994. Both the researchers and the sponsors agreed to a scientific advisory group that would operate independently and oversee the study. The study began in September 1994. The FDA deferred action on a final monograph for PPA, and product marketing continued.

The Yale HSP authors delivered their final report on their case-control study to the FDA in May 2000.[6] The HSP was a case-control study of 702 HS cases and 1,376 controls, men and women, ages 18 to 49. The report authors concluded that “the results of the HSP suggest that PPA increases the risk for hemorrhagic stroke.”[7] The study had taken over five years to design, conduct, and analyze. In September 2000, the FDA’s Office of Post-Marketing Drug Risk Assessment released the results, with its own interpretation and conclusion that dramatically exceeded the HSP authors’ own interpretation.[8] The FDA’s Non-Prescription Drug Advisory Committee then voted, on October 19, 2000, to recommend that PPA be reclassified as “unsafe.” The Committee’s meeting, however, was attended by several leading epidemiologists who pointed to important methodological problems and limitations in the design and execution of the HSP.[9]

In November 2000, the FDA’s Nonprescription Drugs Advisory Committee determined that there was a significant association between PPA and HS, and recommended that PPA not be considered safe for OTC use. The FDA never addressed causality; nor did it have to do so under governing law. The FDA’s actions led the drug companies voluntarily to withdraw PPA-containing products.

The December 21, 2000, issue of The New England Journal of Medicine featured a revised version of the HSP report as its lead article.[10] Under the journal’s guidelines for statistical reporting, the authors were required to present two-tailed p-values or confidence intervals. Results from the HSP Final Report looked considerably less impressive after the obtained significance probabilities were doubled. Only the finding for appetite suppressant use was branded an independent risk factor:

“The results suggest that phenylpropanolamine in appetite suppressants, and possibly in cough and cold remedies, is an independent risk factor for hemorrhagic stroke in women.”[11]

The HSP had multiple pre-specified aims, and several other statistical comparisons and analyses were added along the way. No statistical adjustment was made for these multiple comparisons, but their presence in the study must be considered. Perhaps that is why the authors merely suggest that PPA in appetite suppressants was an independent risk factor for HS in women. Under current statistical guidelines for the New England Journal of Medicine, this suggestion might require even further qualification and weakening.[12]
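
The arithmetic of multiple comparisons is worth pausing over. If each comparison is tested at the conventional 0.05 level, the chance of at least one spurious “significant” finding grows quickly with the number of independent tests, as this back-of-the-envelope sketch shows:

```python
# Probability of at least one false-positive result among k independent
# comparisons, each tested at alpha = 0.05, when every null hypothesis is true.
alpha = 0.05
for k in (1, 5, 10, 20):
    print(f"k = {k:2d}: P(at least one 'significant' result) = "
          f"{1 - (1 - alpha)**k:.2f}")
# k = 1 -> 0.05; k = 5 -> 0.23; k = 10 -> 0.40; k = 20 -> 0.64
```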

The HSP study faced difficult methodological issues. The detailed and robust identification of PPA’s blood pressure effects in humans focused attention on the crucial timing of an HS in relation to ingestion of a PPA medication. Any use, or any use within the last seven or 30 days, would be fairly irrelevant to the pathophysiology of a cerebral hemorrhage. The HSP authors settled on a definition of “first use” as any use of a PPA product within 24 hours, and no other uses in the previous two weeks.[13] Given the rapid onset of pressor and depressor effects, and the adaptation response, this definition of first use was generous and likely included many irrelevant exposed cases, but at least the definition attempted to incorporate the phenomena of short-lived effect and adaptation. The appetite suppressant association did not involve any “first use,” which makes the one “suggested” increased risk much less certain and relevant.

Under the alternative definition of exposure, in addition to “first use,” ingestion of the PPA-containing medication counted if it took place on “the index day before the focal time and the preceding three calendar days.” Again, given the known pharmacokinetics and physiological effects of PPA, this three-day (plus) window seems doubtfully relevant.

All instances of “first use” occurred among men and women who used a cough or cold remedy, with an adjusted OR of 3.14, and a 95% confidence interval (CI) of 0.96–10.28, p = 0.06. The very wide confidence interval, in excess of an order of magnitude, reveals the fragility of the statistical inference. There were but 8 first-use exposed stroke cases (out of 702), and 5 exposed controls (out of 1,376).
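
Those counts can be checked directly. The sketch below recomputes the crude odds ratio and a Woolf (log-based) confidence interval from the 2×2 counts reported above; the published 3.14 is an adjusted figure, so the crude value differs slightly:

```python
# Crude odds ratio and Woolf 95% CI from the reported counts:
# 8 of 702 cases and 5 of 1,376 controls were "first use" exposed.
import math

a, b = 8, 702 - 8          # exposed, unexposed cases
c, d = 5, 1376 - 5         # exposed, unexposed controls

or_crude = (a * d) / (b * c)
se_log = math.sqrt(1/a + 1/b + 1/c + 1/d)
lo = math.exp(math.log(or_crude) - 1.96 * se_log)
hi = math.exp(math.log(or_crude) + 1.96 * se_log)
print(f"crude OR = {or_crude:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
# ~3.16 (1.03, 9.70), close to the reported adjusted 3.14 (0.96, 10.28).
# Thirteen exposed subjects in total cannot constrain the estimate to anything
# narrower than an order of magnitude.
```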

When this first use analysis is broken down between men and women, the result becomes even more fragile. Among men, there was only one first use exposure in 319 male HS patients, and one first use exposure in 626 controls, for an adjusted OR of 2.95, CI 0.15 – 59.59, and p = 0.48. Among women, there were 7 first use exposures among 383 female HS patients, and 4 first use exposures among 750 controls, with an adjusted OR of 3.13, CI 0.86 – 11.46, p = 0.08.

The small numbers of actual first exposure events speak loudly for the inconclusiveness and fragility of the study results, and the sensitivity of the results to any methodological deviations or irregularities. Of course, for the one “suggested” association, for appetite suppressant use among women, the results were even more fragile. None of the appetite suppressant cases were “first use,” which raises serious questions whether anything meaningful was measured. There were six (non-first use) exposed among 383 female HS patients, with only a single exposed female control among 750. The authors presented an adjusted OR of 15.58, with a p-value of 0.02. The CI, however, spanned more than two orders of magnitude, 1.51 – 182.21, which makes the result well-nigh uninterpretable. One of the six appetite suppressant cases was also a user of cough-cold remedies, and she was double counted in the study’s analyses. This double-counted case had a body-mass index of 19, which is certainly not overweight, and at the low end of normal.[14] The one appetite suppressant control was obese.

For the more expansive any-use analysis of PPA cough-cold medication, the results were decidedly unimpressive. There were six exposed male cases among 319 male HS cases, and 13 exposed controls, for an adjusted odds ratio of 0.62, CI 0.20 – 1.92, p = 0.41. Although not an inverse association, the sample results for men were incompatible with a hypothetical doubling of risk. For women, on the expansive exposure definition, there were 16 exposed cases among 383 female cases, with 19 exposed controls out of 750 female controls. The odds ratio for female PPA cough-cold medication use was 1.54, CI 0.76 – 3.14, p = 0.23.

Aside from doubts whether the HSP measured meaningful exposures, the small number of exposed cases and controls present insuperable interpretative difficulties for the study. First, working with a case-control design and odds ratios, there should be some acknowledgment that odds ratios always exaggerate the observed association size compared with a relative risk.[15] Second, the authors knew that confounding would be an important consideration in evaluating any observed association. Known and suspected risk factors were consistently more prevalent among cases than controls.[16]
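
The first point can be made with a two-line calculation. Because the odds p/(1−p) grows faster than the risk p, the odds ratio sits farther from 1.0 than the corresponding relative risk whenever the outcome is not rare (illustrative numbers below, not HSP data):

```python
# Odds ratio vs. relative risk for illustrative risks of 10% and 20%.
def odds(p):
    return p / (1 - p)

p_unexposed, p_exposed = 0.10, 0.20
rr = p_exposed / p_unexposed                # relative risk = 2.00
orr = odds(p_exposed) / odds(p_unexposed)   # odds ratio    = 2.25
print(f"RR = {rr:.2f}, OR = {orr:.2f}")
# OR = RR * (1 - p_unexposed) / (1 - p_exposed), so the OR exceeds the RR
# whenever the exposure raises the risk, and the gap widens as the outcome
# becomes more common.
```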

The HSP authors valiantly attempted to control for confounding in two ways. They selected controls by a technique known as random digit dialing, to find two controls for each case, matched on telephone exchange, sex, age, and race. The HSP authors, however, used imperfectly matched controls rather than lose the corresponding case from their study.[17] For other co-variates, the authors used multivariate logistic regression to provide odds ratios that were adjusted for potential confounding from the measured covariates. At least two of the co-variates, alcohol and cocaine use, involved potential legal or moral judgment in this sample of persons under age 50, which almost certainly would have skewed interview results.

An even more important threat to methodological validity was that key co-variates, such as smoking, alcohol use, hypertension, and cocaine use, were incorporated into the adjustment regression as dichotomous variables; body mass index was entered as a polychotomous variable. Monte Carlo simulation shows that categorizing a continuous variable in logistic regression inflates the rate of false positive associations.[18] The type I (false-positive) error rate increases with sample size, with increasing correlation between the confounding variable and the outcome of interest, and with the number of categories used for the continuous variables. Numerous authors have warned of the cost and danger of dichotomizing continuous variables, in losing information, statistical power, and reliability.[19] In the field of pharmaco-epidemiology, the bias created by dichotomization of a continuous variable is harmful from both the perspective of statistical estimation and hypothesis testing.[20] Readers may be misled into believing that a study has adjusted for important co-variates by the false allure of a fully adjusted model.
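
The inflation from categorizing a continuous confounder is easy to reproduce. The following Monte Carlo sketch (my own construction, not the published simulations cited above) adjusts a truly null exposure for a continuous confounder, either as measured or split at its median, and tallies false positives:

```python
# Dichotomizing a continuous confounder leaves residual confounding, which
# shows up as an inflated false-positive rate on a truly null exposure.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

def one_trial(dichotomize, n=2000):
    c = rng.normal(size=n)                          # continuous confounder
    x = (c + rng.normal(size=n) > 0).astype(float)  # null exposure, correlated with c
    p = 1.0 / (1.0 + np.exp(-(c - 0.5)))            # outcome depends only on c
    y = rng.binomial(1, p)
    adj = (c > 0).astype(float) if dichotomize else c
    X = sm.add_constant(np.column_stack([x, adj]))
    fit = sm.Logit(y, X).fit(disp=0)
    return fit.pvalues[1] < 0.05                    # false positive on exposure

trials = 500
fp_dich = np.mean([one_trial(True) for _ in range(trials)])
fp_cont = np.mean([one_trial(False) for _ in range(trials)])
print(f"false-positive rate, dichotomized adjustment: {fp_dich:.2f}")  # well above 0.05
print(f"false-positive rate, continuous adjustment:   {fp_cont:.2f}")  # ~0.05
```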

Finally, with respect to the use of logistic regression to control confounding and provide adjusted odds ratios, there is the problem of the small number of events. Although the overall sample size is adequate for logistic regression, cell sizes of one, two, or three raise serious questions about the use of large-sample statistical methods to analyze the HSP results.[21]

A Surfeit of Sub-Groups

The study protocol identified three (really four or five) specific goals, to estimate the associations: (1) between PPA use and HS; (2) between HS and type of PPA use (cough-cold remedy or appetite suppression); and (3) in women, between PPA appetite suppressant use and HS, and between PPA first use and HS.[22]

With two different definitions of “exposure,” and some modifications added along the way, with two sexes, two different indications (cold remedy and appetite suppression), and with non-pre-specified analyses such as men’s cough-cold PPA use, there was ample opportunity to inflate the Type I error rate. As the authors of the HSP final report acknowledged, they were able to identify only 60 “exposed” cases and controls.[23] In the context of a large case-control study, the authors were able to identify some nominally statistically significant outcomes (PPA appetite suppressant use and HS), but these were based upon very small numbers (six and one exposed, cases and controls, respectively), which made the results very uncertain considering the potential biases and confounding.

Design and Implementation Problems

Case-control studies always present some difficulty of obtaining controls that are similar to cases except that they did not experience the outcome of interest. As noted, controls were selected using “random digit dialing” in the same area code as the cases. The investigators were troubled by poor response rates from potential controls. They deviated from standard methodology for enrolling controls through random digit dialing by enrolling the first eligible control who agreed to participate, while failing to call back candidates who had asked to speak at another time.[24]

The exposure prevalence rate among controls was considerably lower than PPA-product marketing research would suggest. This low reported exposure rate among controls raises further questions, because under-ascertained control exposure would inflate any observed odds ratios. Of course, it seems eminently reasonable to predict that persons who were suffering from head colds or the flu might not answer their phones or might request a call back. People who are obese might be reluctant to tell a stranger on the telephone that they are using a medication to suppress their appetite.

In the face of this obvious opportunity for selection bias, there was also ample room for recall bias. Cases were asked about medication use just before an unforgettable catastrophic event in their lives. Controls were asked about medication use before a day within the range of the previous week. More controls were interviewed by phone than were cases. Given the small number of exposed cases and controls, recall bias, created by the differential circumstances and interview settings and procedures, was never excluded.

Lumpen Epidemiology: ICH vs. SAH

Every epidemiologic study or clinical trial has an exposure and outcome of interest, in a population of interest. The point is to compare exposed and unexposed persons, of relevant age, gender, and background, with comparable risk factors other than the exposure of interest, to determine if the exposure makes any difference in the rate of events of the outcome of interest.

Composite end points represent “lumping” together different individual end points for consideration as a single outcome. The validity of composite end points depends upon assumptions, which will have to be made at the time investigators design their study and write their protocol.  After the data are collected and analyzed, the assumptions may or may not be supported.

Lumping may offer some methodological benefits, such as increasing statistical power or reducing sample size requirements. Standard epidemiologic practice, however, as reflected in numerous textbooks and methodology articles, requires the reporting of the individual constitutive end points, along with the composite result. Even when the composite end point was employed based upon a view that the component end points are sufficiently related, that view must itself ultimately be tested by showing that the individual end points are, in fact, concordant, with risk ratios in the same direction.
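
A toy numerical example (hypothetical counts, not HSP data) makes the concordance point concrete: a composite odds ratio can land between component estimates that point in opposite directions:

```python
# How a composite end point can mask discordant component results.
# All counts are hypothetical; cases of each type share one control group.
def odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)

ctrl = (10, 90)            # shared controls: exposed, unexposed
ich_cases = (20, 80)       # component 1: OR vs. controls = 2.25
sah_cases = (8, 92)        # component 2: OR vs. controls = 0.78
comp_cases = (28, 172)     # composite = component 1 + component 2 cases

for name, cases in [("component 1", ich_cases), ("component 2", sah_cases),
                    ("composite  ", comp_cases)]:
    print(f"{name} OR = {odds_ratio(*cases, *ctrl):.2f}")
# The composite OR (~1.47) conceals that the two components point in opposite
# directions, which is why each end point should be reported separately.
```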

The medical literature contains many clear statements cautioning consumers of medical studies against misleading claims based upon composite end points. In 2004, the British Medical Journal published a useful paper, “Users’ guide to detecting misleading claims in clinical research reports.” One of the authors’ suggestions to readers was:

“Beware of composite endpoints.”[25]

The one methodological point on which virtually all writers agree is that authors should report the results for each component of a composite end point separately, to permit readers to evaluate the individual results.[26] A leading biostatistical methodologist, the late Douglas Altman, cautioned readers against assuming that the overall estimate of association can be interpreted for each individual end point, and advised authors to provide “[a] clear listing of the individual endpoints and the number of participants experiencing them” to permit a more meaningful interpretation of composite outcomes.[27]

The HSP authors used a composite of hemorrhagic strokes, which was composed of both intracerebral hemorrhages (ICH) and subarachnoid hemorrhages (SAH). In their New England Journal of Medicine article, the authors presented the composite end point, but not the risk ratios for the two individual end points. Before they published the article, one of the authors wrote his fellow authors to advise them that because ICH and SAH are very different medical phenomena, they should present the individual end points in their analysis.[28]

The HSP researchers eventually did publish an analysis of SAH and PPA use.[29] The authors identified 425 SAH cases, of which 312 met the criteria for aneurysmal SAH. They looked at many potential risk factors such as smoking (OR = 5.07), family history (OR = 3.1), marijuana (OR = 2.38), cocaine (OR = 24.97), hypertension (OR = 2.39), aspirin (OR = 1.24), alcohol (OR = 2.95), education, as well as PPA.

Only a bivariate analysis was presented for PPA, with an odds ratio of 1.15, p = 0.87. No confidence intervals were presented. The authors were a bit more forthcoming about the potential role of bias and confounding in this publication than they were in their earlier 2000 HSP paper: “Biases that might have affected this analysis of the HSP include selection and recall bias.”[30]

Judge Rothstein’s Rule 702 opinion reports that the “Defendants assert that this article demonstrates the lack of an association between PPA and SAHs resulting from the rupture of an aneurysm.”[31] If the defendants actually claimed a “demonstration” of “the lack of association,” then shame, and more shame, on them! First, the cited study provided only a bivariate analysis for PPA and SAH. The odds ratio of 1.15 pales in comparison with the risk ratios reported for many other common exposures. We can only speculate what would happen to the 1.15 if the PPA exposure were placed in a fully adjusted model for all important covariates. Second, the p-value of 0.87 does not tell us that the 1.15 is unreal or due to chance. The HSP reported a 15% increase in the odds ratio, which is very compatible with no risk at all. Perhaps if the defendants had been more modest in their characterization, they would not have given the court the basis to find that “defendants distort and misinterpret the Stroke Article.”[32]

Rejecting the defendants’ characterization, the court drew upon an affidavit from plaintiffs’ expert witness, Kenneth Rothman, who explained that a p-value cannot provide evidence of lack of an effect.[33] A high p-value, with its corresponding 95% confidence interval that includes 1.0, can, however, show that the sample data are compatible with the null hypothesis. What Judge Rothstein missed, and the defendants may not have said effectively, is that the statistical analysis was a test of a hypothesis, and the test failed to allow us to reject the null hypothesis. The plaintiffs were left with an indeterminate analysis, from which they really could not honestly claim an association between PPA use and aneurysmal SAH.
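
The indeterminacy can be quantified. A rough back-calculation (assuming a Wald test and a two-sided p-value, since the paper omitted the interval) recovers the confidence interval implied by an OR of 1.15 with p = 0.87:

```python
# Back-calculating the omitted 95% CI from OR = 1.15 and two-sided p = 0.87,
# assuming Wald statistics on the log odds ratio scale.
import math
from scipy.stats import norm

or_hat, p = 1.15, 0.87
z = norm.isf(p / 2)                   # |z| implied by the two-sided p-value
se = math.log(or_hat) / z             # implied standard error of log(OR)
lo = math.exp(math.log(or_hat) - 1.96 * se)
hi = math.exp(math.log(or_hat) + 1.96 * se)
print(f"implied 95% CI: ({lo:.2f}, {hi:.2f})")
# Roughly (0.22, 6.1): compatible with no effect and with large effects alike,
# i.e., an indeterminate result, not a demonstration of "no association."
```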

I Once Was Blind, But Now I See

The HSP protocol called for interviewers to be blinded to the study hypothesis, but this guard against bias was abandoned.[34]  The HSP report acknowledged that “[b]linding would have provided extra protection against unequal ascertainment of PPA exposure in case subjects compared with control subjects.”[35]

The study was conducted out of four sites, and at least one of the sites violated protocol by informing cases that they were participating in a study designed to evaluate PPA and HS.[36] The published article in the New England Journal of Medicine misleadingly claimed that study participants were blinded to its research hypothesis.[37] Although the plaintiffs’ expert witnesses tried to slough off this criticism, the lack of blinding among interviewers and study subjects amplifies recall biases, especially when study subjects and interviewers may have been reluctant to discuss fully several of the co-variate exposures, such as cocaine, marijuana, and alcohol use.[38]

No Causation At All

Scientists and the general population alike have been conditioned to view the controversy over tobacco smoking and lung cancer as a contrivance of the tobacco industry. What is lost in this conditioning is the context of Sir Austin Bradford Hill’s triumphant 1965 Royal Society of Medicine presidential address. Hill and his colleague Sir Richard Doll were not overly concerned with the tobacco industry, but rather with the important methodological criticisms posited by three leading statistical scientists, Joseph Berkson, Jerzy Neyman, and Sir Ronald Fisher. Hill and Doll’s success in showing that tobacco smoking causes lung cancer required a sufficient rebuttal to these critics. The 1965 speech is often cited for its articulation of nine factors to consider in evaluating an association, but its necessary condition is often overlooked. In his speech, Hill identified the situation that must obtain before the nine factors come into play:

“Disregarding then any such problem in semantics we have this situation. Our observations reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance. What aspects of that association should we especially consider before deciding that the most likely interpretation of it is causation?”[39]

The starting point, before the Bradford Hill nine factors come into play, requires a “clear-cut” association, which is “beyond what we would care to attribute to the play of chance.” What is a “clear-cut” association? The most reasonable interpretation of Bradford Hill is that the starting point is an association that is not the result of chance, bias, or confounding.

Looking at the state of the science after the HSP was published, there were two studies that failed to find any association between PPA and HS. The HSP authors “suggested” an association between PPA appetite suppressant and HS, but with six cases and one control, this was hardly beyond the play of chance. And none of the putative associations were “clear cut” in removing bias and confounding as an explanation for the observations.

And Then Litigation Cometh

A tsunami of state and federal cases followed the publication of the HSP study.[40] The Judicial Panel on Multi-district Litigation gave Judge Barbara Rothstein, in the Western District of Washington, responsibility for the pre-trial management of the federal PPA cases. Given the problems with the HSP, the defense unsurprisingly lodged Rule 702 challenges to plaintiffs’ expert witnesses’ opinions, and Rule 703 challenges to reliance upon the HSP.[41]

In June 2003, Judge Rothstein issued her decision on the defense motions. After reviewing a selective regulatory history of PPA, the court turned to epidemiology and its statistical analysis. Although misunderstanding of p-values and confidence intervals is endemic among the judiciary, the descriptions provided by Judge Rothstein portended a poor outcome:

“P-values measure the probability that the reported association was due to chance, while confidence intervals indicate the range of values within which the true odds ratio is likely to fall.”[42]

Both descriptions are seriously incorrect,[43] which is especially concerning given that Judge Rothstein would go on, in 2003, to become the director of the Federal Judicial Center, where she would oversee work on the Reference Manual on Scientific Evidence.
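
The correct frequentist reading is that the 95% attaches to the interval-generating procedure over repeated sampling, not to any single reported interval. A minimal simulation makes the point:

```python
# What a 95% confidence interval actually promises: the interval-generating
# procedure covers the true value in about 95% of repeated samples.
import numpy as np

rng = np.random.default_rng(1)
true_mean, sigma, n, trials = 10.0, 2.0, 50, 10_000
hits = 0
for _ in range(trials):
    sample = rng.normal(true_mean, sigma, size=n)
    m = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(n)
    if m - 1.96 * se <= true_mean <= m + 1.96 * se:
        hits += 1
print(f"coverage over repeated samples: {hits / trials:.3f}")  # ~0.95
# Any single reported interval either contains the true value or it does not;
# the 95% describes the long-run behavior of the procedure, not a 95% chance
# that the truth lies inside one particular interval.
```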

The MDL court also managed to make a mash out of the one-tailed test used in the HSP report. That report was designed to inform regulatory action, where actual conclusions of causation are not necessary. When the HSP authors submitted their paper to the New England Journal of Medicine, they of course had to comply with the standards of that journal, and they doubled their reported p-values to comply with the journal’s requirement of using a two-tailed test. Some key results of the HSP no longer had p-values below 5 percent, as the defense was keen to point out in its briefings.
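
The mechanics of the doubling are straightforward, and the reported cough-cold “first use” numbers can illustrate them. The inputs below are back-calculated from the published OR of 3.14 and CI of 0.96–10.28 (assuming Wald statistics), so they are approximations, not the HSP’s actual computations:

```python
# One-tailed vs. two-tailed p-values for the same test statistic, with the
# standard error back-calculated from the published OR and 95% CI.
import math
from scipy.stats import norm

or_hat, ci_lo, ci_hi = 3.14, 0.96, 10.28
se = (math.log(ci_hi) - math.log(ci_lo)) / (2 * 1.96)
z = math.log(or_hat) / se

p_one_tailed = norm.sf(z)            # tests only "OR > 1"
p_two_tailed = 2 * norm.sf(abs(z))   # tests a departure in either direction
print(f"z = {z:.2f}, one-tailed p = {p_one_tailed:.3f}, "
      f"two-tailed p = {p_two_tailed:.3f}")
# Doubling the one-tailed value (~0.03) yields the ~0.06 reported in the
# journal article, pushing the result above the conventional 0.05 line.
```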

From the sources it cited, the court clearly did not understand the issue, which was the need to control for random error. The court declared that it had found:

“that the HSP’s one-tailed statistical analysis complies with proper scientific methodology, and concludes that the difference in the expression of the HSP’s findings [and in the published article] falls far short of impugning the study’s reliability.”[44]

This finding ignores the very different contexts between regulatory action and causation in civil litigation. The court’s citation to an early version of the Reference Manual on Scientific Evidence further illustrates its confusion:

“Since most investigators of toxic substances are only interested in whether the agent increases the incidence of disease (as distinguished from providing protection from the disease), a one-tailed test is often viewed as appropriate.”

*****

“a rigid rule [requiring a two-tailed test] is not required if p-values and significance levels are used as clues rather than as mechanical rules for statistical proof.”[45]

In a sense, given the prevalence of advocacy epidemiology, many researchers are interested only in showing an increased risk. Nonetheless, the point of evaluating p-values is to assess the random error involved in sampling a population; sampling generates a rate of error even when the null hypothesis is assumed to be absolutely correct. Random error can go in either direction, resulting in risk ratios above or below 1.0. Indeed, the probability of observing a risk ratio of exactly 1.0, in a large study, is incredibly small even if the null hypothesis is correct. The risk ratio for men who had used a PPA product was below 1.0, which also recommends a two-tailed test. Trading on the confusion between regulatory and litigation findings, the court proceeded to mischaracterize the parties’ interests in designing the HSP as limited to whether PPA increased the risk of stroke. In the MDL, the parties did not want “clues,” or help on what FDA policy should be; they wanted a test of the causal hypothesis.

In a footnote, the court pointed to testimony of Dr. Ralph Horwitz, one of the HSP investigators, who stated that “[a]ll parties involved in designing the HSP were interested solely in testing whether PPA increased the risk of stroke.” The parties, of course, were not designing the HSP to support litigation claims.[46] The court also cited, in this footnote, a then-recent case that found a one-tailed p-value inappropriate “where that analysis assumed the very fact in dispute.” The plaintiffs’ reliance upon the one-sided p-values in the unpublished HSP report did exactly that.[47] The court tried to excuse the failure to rule out random error by pointing to language in the published HSP article, where the authors stated that inconclusive findings raised “concern regarding safety.”[48]

In analyzing the defense challenge to the opinions based upon the HSP, Judge Rothstein committed both legal and logical fallacies. First, citing Professor David Faigman’s treatise for the proposition that epidemiology is widely accepted because the “general techniques are valid,” the court found that the HSP, and reliance upon it, was valid, despite the identified problems. The issue was not whether epidemiological techniques are valid, but whether the techniques used in the HSP were valid. The devilish details of the HSP in particular largely went ignored.[49] From a legal perspective, Judge Rothstein’s opinion can be seen to place a burden upon the defense to show invalidity, by invoking a presumption of validity. This shifting of the burden was then, and is now, contrary to the law.

Perhaps the most obvious dodge of the court’s gatekeeping responsibility came with the conclusory assertion that the “Defendants’ ex post facto dissection of the HSP fails to undermine its reliability. Scientific studies almost invariably contain flaws.”[50] Perhaps it is sobering to consider that all human beings have flaws, and yet somehow we distinguish between sinners and saints, and between criminals and heroes. The court shirked its responsibility to look at the identified flaws to determine whether they threatened the HSP’s internal validity, as well as its external validity in the plaintiffs’ claims for hemorrhagic strokes in each of the many subgroups considered in the HSP, as well as outcomes not considered, such as myocardial infarction and ischemic stroke. Given that there was but one key epidemiologic study relied upon for support of the plaintiffs’ extravagant causal claims, the identified flaws might be expected to lead to some epistemic humility.

The PPA MDL court exhibited a willingness to cherry pick HSP results to support its low-grade gatekeeping. For instance, the court recited that “[b]ecause no men reported use of appetite suppressants and only two reported first use of a PPA-containing product, the investigators could not determine whether PPA posed an increased risk for hemorrhagic stroke in men.”[51] There was, of course, another definition of PPA exposure that yielded a total of 19 exposed men, about one-third of all exposed cases and controls. All exposed men used OTC PPA cough-cold remedies: six men with HS, and 13 controls, with a reported odds ratio of 0.62 (95% C.I., 0.20 – 1.92; p = 0.41). Although the result for men was not statistically significant, the point estimate for the sample was a risk ratio below one, with a confidence interval that excluded a doubling of the risk based upon this sample statistic. The number of male HS exposed cases was the same as the number of female HS appetite suppressant cases, which somehow did not disturb the court.

Superficially, the PPA MDL court appeared to place great weight on the fact of peer review publication in a prestigious journal, by well-credentialed scientists and clinicians. Given that “[t]he prestigious NEJM published the HSP results,” the court concluded that the “research bears the indicia of good science.”[52] Although Professor Susan Haack’s writings on law and science are often errant, her analysis of this kind of blind reliance on peer review is noteworthy:

“though peer-reviewed publication is now standard practice at scientific and medical journals, I doubt that many working scientists imagine that the fact that a work has been accepted for publication after peer review is any guarantee that it is good stuff, or that it’s not having been published necessarily undermines its value. The legal system, however, has come to invest considerable epistemic confidence in peer-reviewed publication  — perhaps for no better reason than that the law reviews are not peer-reviewed!”[53]

Ultimately, the PPA MDL court revealed that it was quite inattentive to the validity concerns of the HSP. Among the cases filed in the federal court were heart attack and ischemic stroke claims. The HSP did not address those claims, and the MDL court was perfectly willing to green-light the claims on the basis of case reports and expert witness hand-waving about “plausibility.” Not only was this reliance upon case reports plus biological plausibility against the weight of legal authority, it was against the weight of scientific opinion, as expressed by the HSP authors themselves:

“Although the case reports called attention to a possible association between the use of phenylpropanolamine and the risk of hemorrhagic stroke, the absence of control subjects meant that these studies could not produce evidence that meets the usual criteria for valid scientific inference.”[54]

If no epidemiology at all was necessary for the ischemic stroke and myocardial infarction claims, then a deeply flawed epidemiologic study was even better than nothing. And peer review and prestige were merely window dressing.

The HSP study was subjected to much greater analysis in actual trial litigation.  Before the MDL court concluded its abridged gatekeeping, the defense successfully sought the underlying data to the HSP. Plaintiffs’ counsel and the Yale investigators resisted and filed motions to quash the defense subpoenas. The MDL court denied the motions and required the parties to collaborate on redaction of medical records to be produced.[55]

In a law review article published a few years after the PPA Rule 702 decision, Judge Rothstein immodestly described the PPA MDL as a “model mass tort,” and without irony characterized herself as having taken “an aggressive role in determining the admissibility of scientific evidence [].”[56]

The MDL court’s PPA decision stands as a landmark of judicial incuriousness and credulity. The court conducted hearings and entertained extensive briefings on the reliability of the plaintiffs’ expert witnesses’ opinions, which were based largely upon one epidemiologic study, the Yale Hemorrhagic Stroke Project. In the end, publication in a prestigious peer-reviewed journal proved to be a proxy for independent review and an excuse not to exercise critical judgment: “The prestigious NEJM published the HSP results, further substantiating that the research bears the indicia of good science.” Id. at 1239 (citing Daubert II for the proposition that peer review shows the research meets the minimal criteria for good science). The admissibility challenges were denied.

Exuberant Praise for Judge Rothstein

In 2009, at an American Law Institute – American Bar Association continuing legal education seminar on expert witnesses and environmental litigation, Anthony Roisman presented on “Daubert & Its Progeny – Finding & Selecting Experts – Direct & Cross-Examination.” Roisman has been active in various plaintiff advocacy organizations, including serving as the head of the American Trial Lawyers’ Association Section on Toxic, Environmental & Pharmaceutical Torts (STEP). In his 2009 lecture, Roisman praised Rothstein’s PPA Rule 702 decision as “the way Daubert should be interpreted.” More concerning was Roisman’s revelation that Judge Rothstein wrote the PPA decision “fresh from a seminar conducted by the Tellus Institute, which is an organization set up of scientists to try to bring some common sense to the courts’ interpretation of science, which is what is going on in a Daubert case.”[57]

Roisman’s endorsement of the PPA decision may have been purely result-oriented jurisprudence, but what of his enthusiasm for the “learning” that Judge Rothstein received fresh from the Tellus Institute? What exactly is, or was, the Tellus Institute?

In June 2003, the same month as Judge Rothstein’s PPA decision, the Tellus Institute supported a group known as Scientific Knowledge and Public Policy (SKAPP), in publishing an attack on the Daubert decision. The Tellus-SKAPP paper, “Daubert: The Most Influential Supreme Court Ruling You’ve Never Heard Of,” appeared online in 2003.[58]

David Michaels, a plaintiffs’ expert in chemical exposure cases, and a founder of SKAPP, has typically described his organization as having been funded by the Common Benefit Trust, “a fund established pursuant to a court order in the Silicone Gel Breast Implant Liability litigation.”[59] What Michaels hides is that this “Trust” is nothing other than the common-benefit fund set up in MDL 926, as in most MDLs, to permit plaintiffs’ counsel to retain and present expert witnesses in the common proceedings. In other words, it was the plaintiffs’ lawyers’ walking-around money. The Tellus Institute was clearly aligned with its sister organization, SKAPP. Richard Clapp, who was a testifying expert witness for PPA plaintiffs, was an active member of the Tellus Institute at the time of the judicial educational seminar for Judge Rothstein.[60] Clapp is listed as a member of the planning committee responsible for preparing the anti-Daubert pamphlet. In 2005, as director of the Federal Judicial Center, Judge Rothstein attended another conference, the Coronado Conference, which was sponsored by SKAPP.[61]

Roisman’s revelation in 2009, after the dust had settled on the PPA litigation, may well put Judge Rothstein in the same category as Judge James Kelly, against whom the U.S. Court of Appeals for the Third Circuit issued a writ of mandamus for recusal. Judge Kelly was invited to attend a conference on asbestos medical issues, set up by Dr. Irving Selikoff with scientists who testified for plaintiffs’ counsel. The conference was funded by plaintiffs’ counsel. The co-conspirators, Selikoff and plaintiffs’ counsel, paid for Judge Kelly’s transportation and lodgings, without revealing the source of the funding.[62]

In the case of Selikoff’s and Motley’s effort to subvert the neutrality of Judge James M. Kelly in the school district asbestos litigation, and to pervert the course of justice, the conspiracy was detected in time for a successful recusal effort. In the PPA litigation, there was no disclosure of the efforts by the anti-Daubert advocacy group, the Tellus Institute, to undermine the neutrality of a federal judge.

Aftermath of Failed MDL Gatekeeping

Ultimately, the HSP study received much more careful analysis before juries. Although the cases that went to trial involved plaintiffs with catastrophic injuries, and plaintiffs had the benefit of a high-profile article in the New England Journal of Medicine, the jury verdicts were overwhelmingly in favor of the defense.[63]

In the first case that went to trial (but second to verdict), the defense presented a thorough scientific critique of the HSP. The underlying data and medical records that had been produced in response to a Rule 45 subpoena in the MDL allowed juries to see that the study investigators had deviated from the protocol in ways that increased the number of exposed cases, with the obvious result of increasing the reported odds ratios. Juries were ultimately much more curious about the evidence and testimony on reclassifications of exposure that drove up the odds ratios for PPA use than they were about the performance of linear logistic regressions.

The HSP investigators were well aware of the potential for medication use to occur after the onset of stroke symptoms (headache), which may have sent a person to the medicine chest for an OTC cold remedy. Case 71-0039 was just such a case, as shown by the medical records and the HSP investigators’ initial classification of the case. On dubious grounds, however, the investigators reclassified the time of stroke onset to after the PPA-medication use, a change they knew would increase their chances of finding an association.

The reclassification of Case 20-0092 was even more egregious. The patient was originally diagnosed as having experienced a transient ischemic attack (TIA), after a CT scan of the head showed no bleed. As originally diagnosed, Case 20-0092 was thus not a case at all. For the TIA, the patient was given heparin, an appropriate therapy but one known to cause bleeding. The following day, an MRI of the head revealed a hemorrhagic stroke. The HSP nonetheless classified Case 20-0092 as a case.

In Case 18-0025, the patient experienced a headache in the morning, and took a PPA-medication (Contac) for relief. The stroke was already underway when the Contac was taken, but the HSP reversed the order of events.

Case 62-0094 presented an interesting medical history that included an event no one in the HSP considered including in the interview protocol. In addition to a history of heavy smoking, alcohol, cocaine, heroin, and marijuana use, and a history of seizure disorder, Case 62-0094 suffered a traumatic head injury immediately before developing a subarachnoid hemorrhage (SAH). Treating physicians ascribed the SAH to the traumatic injury, but understandably no controls were identified with a similar head injury within the exposure period.

Each side of the PPA litigation accused the other of “hacking at the A cell,” but juries seemed to understand that the hacking had started before the paper was published.
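The arithmetic behind the “A cell” charge is worth spelling out. In a case-control study, the results sit in a two-by-two table: exposed cases (the “A cell,” a), unexposed cases (b), exposed controls (c), and unexposed controls (d). The odds ratio is the cross-product of the cells:

$$ \mathrm{OR} = \frac{a \times d}{b \times c} $$

To illustrate with purely hypothetical counts (not the actual HSP data): with a = 6, b = 694, c = 4, and d = 696, the odds ratio is (6 × 696) / (694 × 4) ≈ 1.5. Reclassify just two subjects into the A cell, so that a = 8 and b = 692, and the odds ratio becomes (8 × 696) / (692 × 4) ≈ 2.0. When the exposure is rare, a handful of reclassified cases can move the reported association substantially, which is why the exposure reclassifications, more than the regression machinery, deserved the juries’ attention.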

In a Los Angeles case involving two plaintiffs, where the jury heard the details of how the HSP cases were analyzed, the jury returned two defense verdicts. In post-trial motions, plaintiffs’ counsel challenged the defendant’s reliance upon underlying data in the HSP, which went behind the peer-reviewed publication, and which showed that the peer review failed to prevent serious errors. In essence, plaintiffs’ counsel claimed that the defense’s scrutiny of the underlying data and of the investigators’ misclassifications was itself not a “generally accepted” method, and thus inadmissible. The trial court rejected the plaintiffs’ claim and their request for a new trial, and spoke to the value of challenging the superficial imprimatur of peer review of the key study relied upon by plaintiffs in the PPA litigation:

“I mean, you could almost say that there was some unethical activity with that Yale Study.  It’s real close.  I mean, I — I am very, very concerned at the integrity of those researchers.

********

Yale gets — Yale gets a big black eye on this.”[64]

Epidemiologist Charles Hennekens, who had been a consultant to PPA-medication manufacturers, published a critique of the HSP study in 2006. The Hennekens critique included many of the criticisms he had lodged, along with epidemiologists Lewis Kuller, Noel Weiss, and Brian Strom, back in an October 2000 FDA meeting, before the HSP was published. Richard Clapp, Tellus Institute activist and expert witness for PPA plaintiffs, and Michael Williams, lawyer for PPA claimants, wrote a letter criticizing Hennekens.[65] David Michaels, an expert witness for plaintiffs in other chemical exposure cases, and a founder of SKAPP, which collaborated with the Tellus Institute on its anti-Daubert campaign, wrote a letter accusing Hennekens of “mercenary epidemiology,” for engaging in re-analysis of a published study. Michaels never complained about the litigation-inspired re-analyses put forward by plaintiffs’ witnesses in the Bendectin litigation. Plaintiffs’ lawyers and their expert witnesses had much to gain by starting the litigation and trying to expand its reach; defense lawyers and their expert witnesses effectively put themselves out of business by shutting it down.[66]


[1] Rachel Gorodetsky, “Phenylpropanolamine,” in Philip Wexler, ed., 7 Encyclopedia of Toxicology 559 (4th ed. 2024).

[2] Hershel Jick, Pamela Aselton, and Judith R. Hunter,  “Phenylpropanolamine and Cerebral Hemorrhage,” 323 Lancet 1017 (1984).

[3] Robert R. O’Neill & Stephen W. Van de Carr, “A Case-Control Study of Adrenergic  Decongestants and Hemorrhagic CVA Using a Medicaid Data Base” m.s. (1985).

[4] Raymond Lipicky, Center for Drug Evaluation and Research, PPA, Safety Summary at 29 (Aug. 9, 1900).

[5] Center for Drug Evaluation and Research, US Food and Drug Administration, “Epidemiologic Review of Phenylpropanolamine Safety Issues” (April 30, 1991).

[6] Ralph I. Horwitz, Lawrence M. Brass, Walter N. Kernan, Catherine M. Viscoli, “Phenylpropanolamine & Risk of Hemorrhagic Stroke – Final Report of the Hemorrhagic Stroke Project” (May 10, 2000).

[7] Id. at 3, 26.

[8] Lois La Grenade & Parivash Nourjah, “Review of study protocol, final study report and raw data regarding the incidence of hemorrhagic stroke associated with the use of phenylpropanolamine,” Division of Drug Risk Assessment, Office of Post-Marketing Drug Risk Assessment (OPDRA) (Sept. 27, 2000). These authors concluded that the HSP report provided “compelling evidence of increased risk of hemorrhagic stroke in young people who use PPA-containing appetite suppressants. This finding, taken in association with evidence provided by spontaneous reports and case reports published in the medical literature leads us to recommend that these products should no longer be available for over the counter use.”

[9] Among those who voiced criticisms of the design, methods, and interpretation of the HSP study were Noel Weiss, Lewis Kuller, Brian Strom, and Janet Daling. Many of the criticisms would prove to be understated in the light of post-publication review.

[10] Walter N. Kernan, Catherine M. Viscoli, Lawrence M. Brass, J.P. Broderick, T. Brott, and Edward Feldmann, “Phenylpropanolamine and the risk of hemorrhagic stroke,” 343 New Engl. J. Med. 1826 (2000) [cited as Kernan].

[11] Kernan, supra note 10, at 1826 (emphasis added).

[12] David Harrington, Ralph B. D’Agostino, Sr., Constantine Gatsonis, Joseph W. Hogan, David J. Hunter, Sharon-Lise T. Normand, Jeffrey M. Drazen, and Mary Beth Hamel, “New Guidelines for Statistical Reporting in the Journal,” 381 New Engl. J. Med. 285 (2019).

[13] Kernan, supra note 10, at 1827.

[14] Transcript of Meeting on Safety Issues of Phenylpropanolamine (PPA) in Over-the-Counter Drug Products 117 (Oct. 19, 2000).

[15] See, e.g., Huw Talfryn Oakley Davies, Iain Kinloch Crombie, and Manouche Tavakoli, “When can odds ratios mislead?” 316 Brit. Med. J. 989 (1998); Thomas F. Monaghan, Rahman, Christina W. Agudelo, Alan J. Wein, Jason M. Lazar, Karel Everaert, and Roger R. Dmochowski, “Foundational Statistical Principles in Medical Research: A Tutorial on Odds Ratios, Relative Risk, Absolute Risk, and Number Needed to Treat,” 18 Internat’l J. Envt’l Research & Public Health 5669 (2021).

[16] Kernan, supra note 10, at 1829, Table 2.

[17] Kernan, supra note 10, at 1827.

[18] Peter C. Austin & Lawrence J. Brunner, “Inflation of the type I error rate when a continuous confounding variable is categorized in logistic regression analyses,” 23 Statist. Med. 1159 (2004).

[19] See, e.g., Douglas G. Altman & Patrick Royston, “The cost of dichotomising continuous variables,” 332 Brit. Med. J. 1080 (2006); Patrick Royston, Douglas G. Altman, and Willi Sauerbrei, “Dichotomizing continuous predictors in multiple regression: a bad idea,” 25 Stat. Med. 127 (2006). See also Robert C. MacCallum, Shaobo Zhang, Kristopher J. Preacher, and Derek D. Rucker, “On the Practice of Dichotomization of Quantitative Variables,” 7 Psychological Methods 19 (2002); David L. Streiner, “Breaking Up is Hard to Do: The Heartbreak of Dichotomizing Continuous Data,” 47 Can. J. Psychiatry 262 (2002); Henian Chen, Patricia Cohen, and Sophie Chen, “Biased odds ratios from dichotomization of age,” 26 Statist. Med. 3487 (2007); Carl van Walraven & Robert G. Hart, “Leave ‘em Alone – Why Continuous Variables Should Be Analyzed as Such,” 30 Neuroepidemiology 138 (2008); O. Naggara, J. Raymond, F. Guilbert, D. Roy, A. Weill, and Douglas G. Altman, “Analysis by Categorizing or Dichotomizing Continuous Variables Is Inadvisable,” 32 Am. J. Neuroradiol. 437 (Mar 2011); Neal V. Dawson & Robert Weiss, “Dichotomizing Continuous Variables in Statistical Analysis: A Practice to Avoid,” Med. Decision Making 225 (2012); Phillippa M Cumberland, Gabriela Czanner, Catey Bunce, Caroline J Doré, Nick Freemantle, and Marta García-Fiñana, “Ophthalmic statistics note: the perils of dichotomising continuous variables,” 98 Brit. J. Ophthalmol. 841 (2014).

[20] Valerii Fedorov, Frank Mannino, and Rongmei Zhang, “Consequences of dichotomization,” 8 Pharmaceut. Statist. 50 (2009).

[21] Peter Peduzzi, John Concato, Elizabeth Kemper, Theodore R. Holford, and Alvan R. Feinstein, “A simulation study of the number of events per variable in logistic regression analysis,” 49 J. Clin. Epidem. 1373 (1996).

[22] HSP Final Report at 5.

[23] HSP Final Report at 26.

[24] Byron G. Stier & Charles H. Hennekens, “Phenylpropanolamine and Hemorrhagic Stroke in the Hemorrhagic Stroke Project: A Reappraisal in the Context of Science, the Food and Drug Administration, and the Law,” 16 Ann. Epidem. 49, 50 (2006) [cited as Stier & Hennekens].

[25] Victor M. Montori, Roman Jaeschke, Holger J. Schünemann, Mohit Bhandari, Jan L Brozek, P. J. Devereaux, and Gordon H. Guyatt, “Users’ guide to detecting misleading claims in clinical research reports,” 329 Brit. Med. J. 1093 (2004). 

[26] Wolfgang Ahrens & Iris Pigeot, eds., Handbook of Epidemiology 1840 (2d ed. 2014) (47.5.8 Use of Composite Endpoints); Stuart J. Pocock, John J. V. McMurray, and Tim J. Collier, “Statistical Controversies in Reporting of Clinical Trials: Part 2 of a 4-Part Series on Statistics for Clinical Trials,” 66 J. Am. Coll. Cardiol. 2648, 2650-51 (2015) (“Interpret composite endpoints carefully.”); Schulz & Grimes, “Multiplicity in randomized trials I:  endpoints and treatments,” 365 Lancet 1591, 1595 (2005).

[27] Eric Lim, Adam Brown, Adel Helmy, Shafi Mussa & Douglas Altman, “Composite Outcomes in Cardiovascular Research: A Survey of Randomized Trials,” 149 Ann. Intern. Med. 612 (2008).

[28] See, e.g., Thomas Brott email to Walter Kernan (Sept. 10, 2000).

[29] Joseph P. Broderick, Catherine M. Viscoli, Thomas Brott, Walter N. Kernan, Lawrence M. Brass, Edward Feldmann, Lewis B. Morgenstern, Janet Lee Wilterdink, and Ralph I. Horwitz, “Major Risk Factors for Aneurysmal Subarachnoid Hemorrhage in the Young Are Modifiable,” 34 Stroke 1375 (2003).

[30] Id. at 1379.

[31] Id. at 1243.

[32] Id. at 1243.

[33] Id., citing Rothman Affidavit, ¶ 7; Kenneth J. Rothman, Epidemiology:  An Introduction at 117 (2002).

[34] HSP Final Report at 26 (“HSP interviewers were not blinded to the case-control status of study subjects and some were aware of the study purpose.”); Walter Kernan Dep. at 473-74, In re PPA Prods. Liab. Litig., MDL 1407 (W.D. Wash.) (Sept. 19, 2002).

[35] HSP Final Report at 26.

[36] Stier & Hennekens, note 24 supra, at 51.

[37] NEJM at 1831.

[38] See Christopher T. Robertson & Aaron S. Kesselheim, Blinding as a Solution to Bias – Strengthening Biomedical Science, Forensic Science, and the Law 53 (2016); Sandy Zabell, “The Virtues of Being Blind,” 29 Chance 32 (2016).

[39] Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 295 (1965).

[40] See Barbara J. Rothstein, Francis E. McGovern, and Sarah Jael Dion, “A Model Mass Tort: The PPA Experience,” 54 Drake L. Rev. 621 (2006); Linda A. Ash, Mary Ross Terry, and Daniel E. Clark, Matthew Bender Drug Product Liability § 15.86 PPA (2003).

[41] In re Phenylpropanolamine Prods. Liab. Litig., 289 F.Supp. 2d 1230 (W.D. Wash. 2003).

[42] Id. at 1236 n.1.

[43] Michael O. Finkelstein & Bruce Levin, Statistics for Lawyers 171, 173-74 (3rd ed. 2015). See also Sander Greenland, Stephen J. Senn, Kenneth J. Rothman, John B. Carlin, Charles Poole, Steven N. Goodman, and Douglas G. Altman, “Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations,” 31 Eur. J. Epidem. 337 (2016).

[44] In re Phenylpropanolamine Prods. Liab. Litig., 289 F.Supp. 2d 1230, 1241 (W.D. Wash. 2003).

[45] Id. (citing Reference Manual at 126-27, 358 n.69). The edition of the Manual was not identified by the court.

[46] Id. at n.9, citing deposition of Ralph Horowitz [sic].

[47] Id., citing Good v. Fluor Daniel Corp., 222 F.Supp. 2d 1236, 1242-43 (E.D. Wash. 2002).

[48] Id. at 1241, citing Kernan at 183.

[49] In re Phenylpropanolamine Prods. Liab. Litig., 289 F.Supp. 2d 1230, 1239 (W.D. Wash. 2003) (citing 2 Modern Scientific Evidence: The Law and Science of Expert Testimony § 28-1.1, at 302-03 (David L. Faigman, et al., eds., 1997) (“Epidemiologic studies have been well received by courts trying mass tort suits. Well-conducted studies are uniformly admitted. The widespread acceptance of epidemiology is based in large part on the belief that the general techniques are valid.”)).

[50] Id. at 1240. The court cited the Reference Manual on Scientific Evidence 337 (2d ed. 2000), for this universal attribution of flaws to epidemiology studies (“It is important to recognize that most studies have flaws. Some flaws are inevitable given the limits of technology and resources.”) Of course, when technology and resources are limited, expert witnesses are permitted to say “I cannot say.” The PPA MDL court also cited another MDL court, which declared that “there is no such thing as a perfect epidemiological study.” In re Orthopedic Bone Screw Prods. Liab. Litig., MDL No. 1014, 1997 WL 230818, at *8-9 (E.D.Pa. May 5, 1997).

[51] Id. at 1236.

[52] Id. at 1239.

[53] Susan Haack, “Irreconcilable Differences? The Troubled Marriage of Science and Law,” 72 Law & Contemp. Problems 1, 19 (2009) (internal citations omitted). It may be telling that Haack has come to publish much of her analysis in law reviews. See Nathan Schachtman, “Misplaced Reliance On Peer Review to Separate Valid Science From Nonsense,” Tortini (Aug. 14, 2011).

[54] Kernan, supra note 10, at 1831.

[55] In re Phenylpropanolamine Prods. Liab. Litig., MDL 1407, Order re Motion to Quash Subpoenas re Yale Study’s Hospital Records (W.D. Wash. Aug. 16, 2002). Two of the HSP investigators wrote an article, over a decade later, to complain about litigation efforts to obtain data from ongoing studies. They did not mention the PPA case. Walter N. Kernan, Catherine M. Viscoli, and Mathew C. Varughese, “Litigation Seeking Access to Data From Ongoing Clinical Trials: A Threat to Clinical Research,” 174 J. Am. Med. Ass’n Intern. Med. 1502 (2014).

[56] Barbara J. Rothstein, Francis E. McGovern, and Sarah Jael Dion, “A Model Mass Tort: The PPA Experience,” 54 Drake L. Rev. 621, 638 (2006).

[57] Anthony Roisman, “Daubert & Its Progeny – Finding & Selecting Experts – Direct & Cross-Examination,” ALI-ABA 2009. Roisman’s remarks about the role of Tellus Institute start just after minute 8, on the recording, available from the American Law Institute, and the author.

[58] See Daubert: The Most Influential Supreme Court Ruling You’ve Never Heard Of; A Publication of the Project on Scientific Knowledge and Public Policy, coordinated by the Tellus Institute” (2003).

[59] See, e.g., David Michaels, Doubt is Their Product: How Industry’s War on Science Threatens Your Health 267 (2008).

[60] See Richard W. Clapp & David Ozonoff, “Environment and Health: Vital Intersection or Contested Territory?” 30 Am. J. L. & Med. 189, 189 (2004) (“This Article also benefited from discussions with colleagues in the project on Scientific Knowledge and Public Policy at Tellus Institute, in Boston, Massachusetts.”).

[61] See Barbara Rothstein, “Bringing Science to Law,” 95 Am. J. Pub. Health S1 (2005) (“The Coronado Conference brought scientists and judges together to consider these and other tensions that arise when science is introduced in courts.”).

[62] In re School Asbestos Litigation, 977 F.2d 764 (3d Cir. 1992). See Cathleen M. Devlin, “Disqualification of Federal Judges – Third Circuit Orders District Judge James McGirr Kelly to Disqualify Himself So As To Preserve ‘The Appearance of Justice’ Under 28 U.S.C. § 455 – In re School Asbestos Litigation (1992),” 38 Villanova L. Rev. 1219 (1993); Bruce A. Green, “May Judges Attend Privately Funded Educational Programs? Should Judicial Education Be Privatized?: Questions of Judicial Ethics and Policy,” 29 Fordham Urb. L.J. 941, 996-98 (2002).

[63] Alison Frankel, “A Line in the Sand,” The Am. Lawyer – Litigation (2005); Alison Frankel, “The Mass Tort Bonanza That Wasn’t,” The Am. Lawyer (Jan. 6, 2006).

[64] O’Neill v. Novartis AG, California Superior Court, Los Angeles Cty., Transcript of Oral Argument on Post-Trial Motions, at 46-47 (March 18, 2004) (Hon. Anthony J. Mohr), aff’d sub nom. O’Neill v. Novartis Consumer Health, Inc., 147 Cal. App. 4th 1388, 55 Cal. Rptr. 3d 551, 558-61 (2007).

[65] Richard Clapp & Michael L. Williams, Regarding ‘‘Phenylpropanolamine and Hemorrhagic Stroke in the Hemorrhagic Stroke Project,’’ 16 Ann. Epidem. 580 (2006).

[66] David Michaels, “Regarding ‘Phenylpropanolamine and Hemorrhagic Stroke in the Hemorrhagic Stroke Project’: Mercenary Epidemiology – Data Reanalysis and Reinterpretation for Sponsors with Financial Interest in the Outcome,” 16 Ann. Epidem. 583 (2006). Hennekens responded to these letters. Stier & Hennekens, note 24, supra.

Dipak Panigrahy – Expert Witness & Putative Plagiarist

March 27th, 2024

Citing an IARC monograph may be in itself questionable, given the IARC’s deviations from good systematic review practice. Taking the language of an IARC monograph and passing it off as your own, without citation or attribution, while leaving out the qualifications and limitations stated in the monograph, should be disqualifying for an expert witness.

And in one federal court, it is.

Last week, on March 18, Senior Judge Roy Bale Dalton, Jr., of Orlando, Florida, granted defendant Lockheed Martin’s Rule 702 motion to exclude the proffered testimony of Dr. Dipak Panigrahy.[1] Panigrahy had opined in his Rule 26 report that seven substances[2] present in the Orlando factory cause eight different types of cancer[3] in 22 of the plaintiffs. Lockheed’s motion asserted that Panigrahy copied IARC verbatim, except for its qualifications and limitations. Judge Dalton reportedly found Panigrahy’s conduct so “blatant that it represents deliberate lack of candor” and an “unreliable methodology.” Although Judge Dalton’s opinion is not yet posted on Westlaw or Google Scholar,[4] the report from Legal Newsline quoted the opinion extensively:

“Here, there is no question that Dr. Panigrahy extensively plagiarized his report… .”

“And his deposition made the plagiarism appear deliberate, as he repeatedly outright refused to acknowledge the long swaths of his report that quote other work verbatim without any quotation marks at all – instead stubbornly insisting that he cited over 1,100 references, as if that resolves the attribution issue (it does not).”

“Indeed, the plagiarism is so ubiquitous throughout the report that it is frankly overwhelming to try to make heads or tails of just what is Dr. Panigrahy’s own work – a task that neither he nor Plaintiffs’ counsel even attempts to tackle.”

There is a wide range of questionable research practices and dubious inferences that lead to the exclusion of expert witnesses under Rule 702, but I would have thought that Panigrahy was the first witness to have been excluded for plagiarism. Judge Dalton did, however, cite cases involving plagiarism by expert witnesses.[5] Although plagiarism might be framed as a credibility issue, the extent of the plagiarism by Panigrahy represented such an egregious lack of candor that it may justify exclusion under Rule 702.

Judge Dalton’s gatekeeping analysis, however, did not stop with the finding of blatant plagiarism from the IARC monograph. Panigrahy’s report was further methodologically marred by his reliance upon the IARC, and his confusion of the IARC hazard evaluation with the required determination of causation in the law of torts. Judge Dalton explained that

“the plagiarism here reflects even deeper methodological problems because the report lifts a great deal of its analysis from IARC in particular. As the Court discussed in the interim causation Order, research agencies like IARC are, understandably, focused on protecting public health and recommending protective standards, rather than evaluating causation from an expert standpoint in the litigation context. IARC determines qualitatively whether substances are carcinogenic to humans; its descriptors have “no quantitative significance” such as more likely than not. Troublingly, Dr. Panigrahy did not grasp this crucial distinction between IARC’s classifications and the general causation preponderance standard. Because so much of Dr. Panigrahy’s report is merely a wholesale adoption of IARC’s findings under the guise of his own expertise, and IARC’s findings in and of themselves are insufficient, he fails to reliably establish general causation.”[6]

Dr. Panigrahy was accepted into medical school at the age of 17. His accelerated education may have left him without a firm understanding of the ethical requirements of scholarship.

Earlier this month, Senior Judge Dalton excluded another expert witness’s opinion testimony, from Dr. Donald Mattison, on autism, multiple sclerosis, and Parkinson’s disease, while permitting opinions on the causation of various birth defects.[7] Judge Dalton’s decisions arise from a group of companion cases, brought by more than 60 claimants against Lockheed Martin for various health conditions alleged to have been caused by Lockheed’s supposed contamination of the air, soil, and groundwater with chemicals from its weapons manufacturing plant.

The unreliability of Panigrahy’s report led to the entry of summary judgment against the 22 plaintiffs whose cases turned on it.

The putative plagiarist, Dr. Panigrahy, is an assistant professor of pathology at Harvard Medical School, in the department of pathology of Beth Israel Deaconess Medical Center, in Boston. Panigrahy has a profile at the “Expert Institute,” a sort of employment agency for expert witnesses. His opinions were excluded in the federal multi-district litigation concerning Zantac/ranitidine.[8] Very similar opinions were permitted over defense challenges, in a short, perfunctory order, even shorter on reasoning, in the valsartan multi-district litigation.[9]


[1] John O’Brien, “‘A mess’: Expert in Florida toxic tort plagiarizes cancer research of others, tries to submit it to court,” Legal News Line (Mar. 25, 2024).

[2] trichloroethylene, tetrachloroethylene, formaldehyde, arsenic, hexavalent chromium, trichloroethylene, and styrene.

[3] cancers of the kidney, breast, thyroid, pancreas, liver and bile duct, testicles, and anus, as well as Hodgkin’s lymphoma, non-Hodgkin’s lymphoma, and leukemia.

[4] Henderson v. Lockheed Martin Corp., case no. 6:21-cv-1363-RBD-DCI, document 399 (M.D. Fla. Mar. 18, 2024) (Dalton, S.J.).

[5] Henderson Order at 6, citing Moore v. BASF Corp., No. CIV.A. 11-1001, 2012 WL 6002831, at *7 (E.D. La. Nov. 30, 2012) (excluding expert testimony from Bhaskar Kura), aff’d, 547 F. App’x 513 (5th Cir. 2013); Spiral Direct, Inc. v. Basic Sports Apparel, Inc., No. 6:15-cv-641, 2017 WL 11457208, at *2 (M.D. Fla. Apr. 13, 2017); 293 F. Supp. 3d 1334, 1363 n. 20 (2017); Legier & Materne v. Great Plains Software, Inc., No. CIV.A. 03-0278, 2005 WL 2037346, at *4 (E.D. La. Aug. 3, 2005) (denying motion to exclude proffered testimony because expert witness plagiarized a paragraph in his report).

[6] Henderson Order at 8 -10 (internal citations omitted), citing McClain v. Metabolife Internat’l, Inc., 401 F.3d 1233, 1249 (11th Cir. 2005) (distinguishing agency assessment of risk from judicial assessment of causation); Williams v. Mosaic Fertilizer, LLC, 889 F.3d 1239, 1247 (11th Cir. 2018) (identifying “methodological perils” in relying extensively on regulatory agencies’ precautionary standards to determine causation); Allen v. Pennsylvania Eng’g Corp., 102 F.3d 194, 198 (5th Cir. 1996) (noting that IARC’s “threshold of proof is reasonably lower than that appropriate in tort law, which traditionally makes more particularized inquiries into cause and effect and requires a plaintiff to prove that it is more likely than not that another individual has caused him or her harm”); In re Roundup Prods. Liab. Litig., 390 F. Supp. 3d 1102, 1109 (N.D. Cal. 2018) (“IARC classification is insufficient to get the plaintiffs over the general causation hurdle.”), aff’d, 997 F.3d 941 (9th Cir. 2021).

[7] John O’Brien, “Autism plaintiffs rejected from Florida Lockheed Martin toxic tort,” Legal Newsline (Mar. 15, 2024).

[8] In re Zantac (ranitidine) Prods. Liab. Litig., MDL No. 2924, 644 F. Supp. 3d 1075, 1100 (S.D. Fla. 2022).

[9] In re Valsartan, Losartan, and Irbesartan Prods. Liab. Litig., Case 1:19-md-02875-RBK-SAK, document 1958 (D.N.J. Mar. 4, 2022).

A Π-Day Celebration of Irrational Numbers and Other Things – Philadelphia Glyphosate Litigation

March 14th, 2024

Science can often be more complicated and nuanced than we might like. Back in 1897, the Indiana legislature attempted to establish that π was equal to 3.2.[1] Sure, that was simpler and easier to use in calculations, but also wrong. The irreducible fact is that π is an irrational number, and Indiana’s attempt to change that fact was, well, irrational. And to celebrate irrationality, consider the lawsuit industry’s jihad against glyphosate, including its efforts to elevate a dodgy IARC evaluation, while suppressing evidence of glyphosate’s scientific exonerations.


After Bayer lost three consecutive glyphosate cases in Philadelphia last year, observers were scratching their heads over why the company had lost when the scientific evidence strongly supports the defense. The Philadelphia Court of Common Pleas, not to be confused with Common Fleas, can be a rough place for corporate defendants. The local newspapers, to the extent people still read newspapers, are insufferably slanted in their coverage of health claims.

The plaintiffs’ verdicts garnered a good deal of local media coverage in Philadelphia.[2] Defense verdicts generally receive no ink from sensationalist newspapers such as the Philadelphia Inquirer. Regardless, media accounts, both lay and legal, are generally inadequate to tell us what happened, or what went wrong, in the courtroom. The defense losses could be attributable to partial judges or juries, or to the difficulty of communicating subtle issues of scientific validity. Plaintiffs’ expert witnesses may seem more sure of themselves than defense experts, or plaintiffs’ counsel may connect better with juries primed by fear-mongering media. Without being in the courtroom, or at least studying trial transcripts, outside observers are challenged to explain fully jury verdicts that go against the scientific evidence. One thing jury verdicts are not, however, is a valid assessment of the strength of scientific evidence, inferences, and conclusions.

Although Philadelphia juries can be rough, they like to see a fight. (Remember Rocky.) It is not a place for genteel manners or delicate and subtle distinctions. Last week, Bayer broke its Philadelphia losing streak, with a win in Kline v. Monsanto Co.[3] Mr. Kline claimed that he developed non-Hodgkin’s lymphoma (NHL) from his long-term use of Roundup. The two-week trial, before Judge Ann Butchart, went to the jury, which deliberated two hours before returning a unanimous defense verdict. The jury found that the defendants, Monsanto and Nouryon Chemicals LLC, were not negligent, and that the plaintiff’s use of Roundup was not a factual cause of his lymphoma.[4]

Law360 reported that the Kline verdict was the first to follow a ruling on Valentine’s Day, February 14, 2024, which excluded any courtroom reference to the hazard evaluation of glyphosate by the International Agency for Research on Cancer (IARC). The Law360 article indicated that the IARC found that glyphosate can cause cancer, except of course that IARC has never reached such a conclusion.

The IARC working group evaluated the evidence for glyphosate and classified the substance as a category IIA carcinogen, which it labels as “probably” causing human cancer. This label sounds close to what might be useful in a courtroom, except that the IARC declares that “probably,” as used in its IIA classification, does not mean what people generally, and lawyers and judges specifically, mean by the word probably. For IARC, “probable” has no quantitative meaning. In other words, for IARC, probability, a quantitative concept, which everyone understands to be measured on a scale from 0 to 1, or from 0% to 100%, is not quantitative. An IARC IIA classification could thus represent a posterior probability of 1% in favor of carcinogenicity (and 99% probable not a carcinogen). On whether glyphosate causes cancer in humans, IARC thus says maybe, in its own made-up epistemic modality.

To find the idiosyncratic definition of “probable,” a diligent reader must go outside the monograph of interest, to the so-called Preamble, a separate document last revised in 2019. The first time the jury will hear of the IARC pronouncement will be in the plaintiff’s case, and if the defense wishes to inform the jury of the special, idiosyncratic meaning of IARC “probable,” it must do so on cross-examination of hostile plaintiffs’ witnesses, or wait until it presents its own witnesses. Disclosing the IARC IIA classification hurts because the “probable” language lines up with what the trial judges will instruct the juries at the end of the case, when the jurors are told that they need not believe that the plaintiff has eliminated all doubt; they need only find that the plaintiff has shown that each element of his case is “probable,” or more likely than not, in order to prevail. Once the jury has heard “probable,” the defense will have a hard time putting the toothpaste back in the tube. Of course, this is why the lawsuit industry loves IARC evaluations, with their fallacies of semantical distortion.[5]

Although identifying the causes of a jury verdict is more difficult than even determining carcinogenicity, Rosemary Pinto, one of plaintiff Kline’s lawyers, suggested that the exclusion of the IARC evaluation sank her case:

“We’re very disappointed in the jury verdict, which we plan to appeal, based upon adverse rulings in advance of the trial that really kept core components of the evidence out of the case. These included the fact that the EPA safety evaluation of Roundup has been vacated, who IARC (the International Agency for Research on Cancer) is and the relevance of their finding that Roundup is a probable human carcinogen [sic], and also the allowance into evidence of findings by foreign regulatory agencies disguised as foreign scientists. All of those things collectively, we believe, tilted the trial in Monsanto’s favor, and it was inconsistent with the rulings in previous Roundup trials here in Philadelphia and across the country.”[6]

Pinto was involved in the case, and so she may have some insight into why the jury ruled as it did. Still, issuing this pronouncement before interviewing the jurors seems little more than wishcasting. As philosopher Harry Frankfurt explained, “the production of bullshit is stimulated whenever a person’s obligations or opportunities to speak about some topic exceed his knowledge of the facts that are relevant to that topic.”[7] Pinto’s real aim was revealed in her statement that the IARC review was “crucial evidence that juries should be hearing.”[8]  

What is the genesis of Pinto’s complaint about the exclusion of IARC’s conclusions? The Valentine’s Day Order, issued by Judge Joshua H. Roberts, who heads up the Philadelphia County mass tort court, provided that:

AND NOW, this 14th day of February, 2024, upon consideration of Defendants’ Motion to Clarify the Court’s January 4, 2024 Order on Plaintiffs Motion in Limine No. 5 to Exclude Foreign Regulatory Registrations and/or Approvals of Glyphosate, GBHs, and/or Roundup, Plaintiffs’ Response, and after oral argument, it is ORDERED as follows:

  1. The Court’s Order of January 4, 2024, is AMENDED to read as follows: [ … ] it is ORDERED that the Motion is GRANTED without prejudice to a party’s introduction of foreign scientific evidence, provided that the evidence is introduced through an expert witness who has been qualified pursuant to Pa. R. E. 702.

  2. The Court specifically amends its Order of January 4, 2024, to exclude reference to IARC, and any other foreign agency and/or foreign regulatory agency.

  3. The Court reiterates that no party may introduce any testimony or evidence regarding a foreign agency and/or foreign regulatory agency which may result in a mini-trial regarding the protocols, rules, and/or decision making process of the foreign agency and/or foreign regulatory agency. [fn1]

  4. The trial judge shall retain full discretion to make appropriate evidentiary rulings on the issues covered by this Order based on the testimony and evidence elicited at trial, including but not limited to whether a party or witness has “opened the door.”[9]

What was not covered in the legal media accounts was the curious irony that the exclusion of the IARC evaluation resulted from the plaintiffs’ own motion, an own goal of sorts. In previous Philadelphia trials, plaintiffs’ counsel vociferously objected to defense counsel’s and experts’ references to the determinations by foreign regulators, such as the European Union Assessment Group on Glyphosate (2017, 2022), Health Canada (2017), the European Food Safety Authority (2017, 2023), the Australian Pesticides and Veterinary Medicines Authority (2017), the German Federal Institute for Risk Assessment (2019), and others, which rejected the IARC evaluation and reported that glyphosate has not been shown to be carcinogenic.[10]

The gravamen of the plaintiffs’ objection was that such regulatory determinations were hearsay, and that they resulted from various procedures, using various criteria, which would require explanation, and would be subject to litigants’ challenges.[11] In other words, for each regulatory agency’s determination, there would be a “mini-trial,” or a “trial within a trial,” about the validity and accuracy of the foreign agency’s assessment.

In the earlier Philadelphia trials, the plaintiffs’ objections were largely sustained, which created a significant evidentiary bias in the courtrooms. Plaintiffs’ expert witnesses could freely discuss the IARC glyphosate evaluation, but the defense and its experts could not discuss the many determinations of the safety of glyphosate. Jurors were apparently left with the erroneous impression that the IARC evaluation was a consensus view of the entire world’s scientific community.

Now plaintiffs’ objection has a point, even though it seems to prove too much and must ultimately fail. In a trial, each side has expert witnesses who can offer an opinion about the key causal issue, whether glyphosate can cause NHL, and whether it caused this plaintiff’s NHL. Each expert witness will have written a report that identifies the facts and data relied upon, and that explains the inferences drawn and conclusions reached. The adversary can challenge the validity of the data, inferences, and conclusions because the opposing expert witness will be subject to cross-examination.

The facts and data relied upon will, however, be “hearsay,” coming from published studies not written by the expert witnesses at trial. There will be many aspects of the relied-upon studies that will be taken on faith, without the testimony of the study participants, their healthcare providers, or the scientists who collected the data, chose how to analyze the data, conducted the statistical and scientific analyses, and wrote up the methods and study findings. Permitting reliance upon any study thus allows for a “mini-trial,” or a “trial within a trial,” on each study cited and relied upon by the testifying expert witnesses. This complexity in expert witness opinion testimony is one of the foundational reasons for Rule 702’s gatekeeping regime in federal court and most state courts, a regime that is usually conspicuously absent in Pennsylvania courtrooms.

Furthermore, the plaintiffs’ objections to foreign regulatory determinations would apply to any review paper, and, more important, they would apply to the IARC glyphosate monograph itself. After all, if expert witnesses are supposed to have reviewed the underlying studies themselves, to be competent to do so, and to have arrived at an opinion in some reliable way from the facts and data available, then they would have no need to advert to the IARC’s review on the general causation issue. If an expert witness were allowed to invoke the IARC conclusion, presumably to bolster his or her own causation opinion, then the jury would need to resolve questions about:

  • who was on the working group;
  • how were working group members selected, or excluded;
  • how the working group arrived at its conclusion;
  • what did the working group rely upon, or not rely upon, and why;
  • what was the group’s method for synthesizing facts and data to reach its conclusion;
  • was the working group faithful to its stated methodology;
  • did the working group commit any errors of statistical or scientific judgment along the way;
  • what potential biases did the working group members have;
  • what is the basis for the IARC’s classificatory scheme; and
  • how are IARC’s key terms such as “sufficient,” “limited,” “probable,” “possible,” etc., defined and used by working groups.

Indeed, a very substantial trial could be had on the bona fides and methods of the IARC, and the glyphosate IARC working group in particular.

The curious irony behind the Valentine’s Day order is that plaintiffs’ counsel were generally winning their objections to the defense’s references to foreign regulatory determinations. But pigs get fat, while hogs get slaughtered. Last year, plaintiffs’ counsel moved to “exclude foreign regulatory registrations and/or approvals of glyphosate.”[12] To be sure, plaintiffs’ counsel were not seeking merely the exclusion of glyphosate registrations, but also of the scientific evaluations of regulatory agencies and their staff scientists and consulting scientists. Plaintiffs wanted trials in which juries would hear only about IARC, as though it represented a scientific consensus. The many scientific regulatory considerations and rejections of the IARC evaluation would be purged from the courtroom.

On January 4, 2024, plaintiffs’ counsel obtained what they sought, an order that memorialized the tilted playing field they had largely been enjoying in Philadelphia courtrooms. Judge Roberts’ order was short and somewhat ambiguous:

“upon consideration of plaintiff’s motion in limine no. 5 to exclude foreign regulatory registrations and/or approvals of glyphosate, GBHs, and/or Roundup, any response thereto, the supplements of the parties, and oral argument, it is ORDERED that the motion is GRANTED without prejudice to a party’s introduction of foreign scientific evidence including, but not limited to, evidence from the International Agency for Research on Cancer (IARC), provided that such introduction does not refer to foreign regulatory agencies.”

The courtroom “real world” outcome after Judge Roberts’ order was an obscene verdict in the McKivison case. Again, there may have been many contributing causes to the McKivison verdict, including Pennsylvania’s murky and retrograde law of expert witness opinion testimony.[13] Mr. McKivison was in remission from NHL and had sustained no economic damages, and yet, on January 26, 2024, a jury in his case returned a punitive compensatory damages award of $250 million, and an even more punitive punitive damages award of $2 billion.[14] It seems at least plausible that the imbalance between admitting the IARC evaluation while excluding foreign regulatory assessments helped create a false narrative that scientists and regulators everywhere had determined glyphosate to be unsafe.

On February 2, 2024, the defense moved for a clarification of Judge Roberts’ January 4, 2024 order, which applied globally in the Philadelphia glyphosate litigation. The defendants complained that in their previous trial, after Judge Roberts’ Order of January 4, 2024, they were severely prejudiced by being prohibited from referring to the conclusions and assessments of foreign scientists who worked for regulatory agencies. The complaint seems well founded. If a hearsay evaluation of glyphosate by an IARC working group is relevant and admissible, the conclusions of foreign scientists about glyphosate are relevant and admissible, whether or not they are employed by foreign regulatory agencies. Indeed, plaintiffs’ counsel routinely complained about Monsanto/Bayer’s “influence” over the United States Environmental Protection Agency, but the suggestion that the European Union’s regulators are in the pockets of Bayer is pretty farfetched. Moreover, the complaint about bias is peculiar coming from plaintiffs’ counsel, who command an out-sized influence within the Collegium Ramazzini,[15] which in turn often dominates IARC working groups. Every agency and scientific group, including the IARC, has its “method,” its classificatory schemes, its definitions, and the like. By privileging the IARC conclusion, while excluding the many other agencies and groups, and by allowing plaintiffs’ counsel to argue that there is no real-world debate over glyphosate, Philadelphia courts play a malignant role in helping to generate the huge verdicts seen in glyphosate litigation.

The defense motion for clarification also stressed that the issue whether glyphosate causes NHL or other human cancer is not the probandum for which foreign agency and scientific group statements are relevant.  Pennsylvania has a most peculiar, idiosyncratic law of strict liability, under which such statements may not be relevant to liability questions. Plaintiffs’ counsel, in glyphosate and most tort litigations, however, routinely assert negligence as well as punitive damages claims. Allowing plaintiffs’ counsel to create a false and fraudulent narrative that Monsanto has flouted the consensus of the entire scientific and regulatory community in failing to label Roundup with cancer warnings is a travesty of the rule of law.

What was too clever by half in the plaintiffs’ litigation approach was that their complaints about foreign regulatory assessments applied equally, if not more so, to the IARC glyphosate hazard evaluation. The glyphosate litigation may not be as interminable as π, but it is just as irrational.

*      *     *      *      *     * 

Post Script.  Ten days after the verdict in Kline, and one day after the above post, the Philadelphia Inquirer released a story about the defense verdict. See Nick Vadala, “Monsanto wins first Roundup court case in recent string of Philadelphia lawsuits,” Phila. Inq. (Mar. 15, 2024).


[1] Bill 246, Indiana House of Representatives (1897); Petr Beckmann, A History of π at 174 (1971).

[2] See Robert Moran, “Philadelphia jury awards $175 million after deciding 83-year-old man got cancer from Roundup weed killer,” Phila. Inq. (Oct. 27, 2023); Nick Vadala, “Philadelphia jury awards $2.25 billion to man who claimed Roundup weed killer gave him cancer,” Phila. Inq. (Jan. 29, 2024).

[3] Phila. Ct. C.P. 2022-01641.

[4] George Woolston, “Monsanto Nabs 1st Win In Philly’s Roundup Trial Blitz,” Law360 (Mar. 5, 2024); Nicholas Malfitano, “After three initial losses, Roundup manufacturers get their first win in Philly courtroom,” Pennsylvania Record (Mar. 6, 2024).

[5] See David Hackett Fischer, “Fallacies of Semantical Distortion,” chap. 10, in Historians’ Fallacies: Toward a Logic of Historical Thought (1970); see also “IARC’s Fundamental Distinction Between Hazard and Risk – Lost in the Flood” (Feb. 1, 2024); “The IARC-hy of Evidence – Incoherent & Inconsistent Classification of Carcinogenicity” (Sept. 19, 2023).

[6] Malfitano, note 4 (quoting Pinto); see also Woolston, note 4 (quoting Pinto).

[7] Harry Frankfurt, On Bullshit at 63 (2005); seeThe Philosophy of Bad Expert Witness Opinion Testimony” (Oct. 2, 2010).

[8] See Malfitano, note 4 (quoting Pinto).

[9] In re Roundup Prods. Litig., Phila. Cty. Ct. C.P., May Term 2022-0550, Control No. 24020394 (Feb. 14, 2024) (Roberts, J.). In a footnote, the court explained that “an expert may testify that foreign scientists have concluded that Roundup and glyphosate can be used safely and they do not cause cancer. In the example provided, there is no specific reference to an agency or regulatory body, and the jury is free to make a credibility determination based on the totality of the expert’s testimony. It is, however, impossible for this Court, in a pre-trial posture, to anticipate every iteration of a question asked or answer provided; it remains within the discretion of the trial judge to determine whether a question or answer is appropriate based on the context and the trial circumstances.”

[10] See National Ass’n of Wheat Growers v. Bonta, 85 F.4th 1263, 1270 (9th Cir. 2023) (“A significant number of . . . organizations disagree with IARC’s conclusion that glyphosate is a probable carcinogen”; … “[g]lobal studies from the European Union, Canada, Australia, New Zealand, Japan, and South Korea have all concluded that glyphosate is unlikely to be carcinogenic to humans.”).

[11] See, e.g., In re Seroquel, 601 F. Supp. 2d 1313, 1318 (M.D. Fla. 2009) (noting that references to foreign regulatory actions or decisions “without providing context concerning the regulatory schemes and decision-making processes involved would strip the jury of any framework within which to evaluate the meaning of that evidence”).

[12] McKivison v. Monsanto Co., Phila. Cty. Ct. C.P., No. 2022-00337, Plaintiff’s Motion in Limine No. 5 to Exclude Foreign Regulatory Registration and/or Approvals of Glyphosate, GBHs and/or Roundup.

[13] See Sherman Joyce, “New Rule 702 Helps Judges Keep Bad Science Out Of Court,” Law360 (Feb. 13, 2024) (noting Pennsylvania’s outlier status on evidence law that enables dodgy opinion testimony).

[14] P.J. D’Annunzio, “Monsanto Fights $2.25B Verdict After Philly Roundup Trial,” Law360 (Feb. 8, 2024).

[15]Collegium Ramazzini & Its Fellows – The Lobby” (Nov. 19, 2023).

IARC’S Fundamental Distinction Between Hazard and Risk – Lost in the Flood

February 1st, 2024

Socrates viewed philosophy as beginning in wonder,[1] but he and his philosophic heirs recognized that philosophy does not get down to business until it starts to clarify the terms of discussion. By the middle of the last century, the failure to understand the logic of language had replaced wonder as the beginning of philosophy.[2] Even if philosophy could not cure all conceptual pathology, most writers came to see that clarifying terms, concepts, and usage was an essential starting point in thinking clearly about a subject.[3]

Hazard versus Risk

Precision in scientific exposition often follows from the use of measurements, with agreed-upon quantitative units and accepted, accurate, reliable procedures for measurement. When scientists substitute qualitative measures for what are inherently quantitative ones, they frequently lapse into error. For example, beware of rodent studies that proclaim harms at “low doses,” where the doses turn out to be low only in comparison with other rodent studies, but orders of magnitude greater than the exposures experienced by human beings.

Risk is a quantitative term meaning a rate of some specified outcome. A Dictionary of Epidemiology, for instance, defines risk as:

“The probability of an adverse or beneficial event in a defined population over a specified time interval. In epidemiology and in clinical research it is commonly measured through the cumulative incidence and the incidence proportion.”[4]

An increased risk thus requires a measurement of a rate or probability of an outcome greater than expected in the absence of the exposure of interest. We might be uncertain of the precise measure of the risk, or of an increased risk, but conceptually a risk connotes a rate or a probability that is, at least in theory, measurable.
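A worked example, with hypothetical numbers chosen only for illustration, makes the quantitative character of risk concrete. Cumulative incidence over a specified interval is

$$ \text{Risk} = \frac{\text{new cases during the interval}}{\text{persons at risk at the start of the interval}} $$

If 20 of 10,000 exposed persons develop the disease over five years, their risk is 0.002; if 10 of 10,000 unexposed persons do, their risk is 0.001; and the relative risk is 0.002 / 0.001 = 2.0. A hazard claim, by contrast, asserts only that an exposure can cause the disease, without any such measurement.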

Hazard is a categorical concept; something is, or is not, a hazard, without regard to the rate or incidence of harm. The Dictionary of Epidemiology provides a definition of “hazard” that captures the non-quantitative, categorical nature of an exposure’s being a hazard:

“The inherent capability of a natural or human-made agent or process to adversely affect human life, health, property, or activity, with the potential to cause a disease.”[5]

The International Agency for Research on Cancer (IARC) purports to set out a classification scheme for human cancer hazards. As used by IARC, its classification scheme involves a set of epistemic modal terms: “known,” “probably,” “possibly,” and “indeterminate.” These epistemic modalities characterize the strength of the evidence that an agent is carcinogenic, and not the magnitude of any quantitative risk of cancer from exposure at a given level. The IARC Preamble, which attempts to describe the Agency’s methodology, explains that the distinction between hazard and risk is “fundamental”:

“A cancer hazard is an agent that is capable of causing cancer, whereas a cancer risk is an estimate of the probability that cancer will occur given some level of exposure to a cancer hazard. The Monographs assess the strength of evidence that an agent is a cancer hazard. The distinction between hazard and risk is fundamental. The Monographs identify cancer hazards even when risks appear to be low in some exposure scenarios. This is because the exposure may be widespread at low levels, and because exposure levels in many populations are not known or documented.”[6]

This attempted explanation reveals an important problem in IARC’s project, as stated in the Preamble. There is an unproven assumption that there will be cancer hazards regardless of the exposure levels. The IARC contemplates that there may be circumstances of low levels of risk from low levels of exposure, but it elides the important issue of thresholds of exposure, below which there is no risk. The Preamble suggests that IARC does not attempt to provide evidence for or against meaningful thresholds of hazardousness, and this failure greatly undermines the project. Exposure circumstances may be such that there is no hazard at all, and so the risk is zero.

The purported distinction between hazard and risk, supposedly fundamental, is often blurred by the Agency, in the monographs produced by working groups on specific exposure circumstances. Consider for instance how a working group characterized the “hazard” of inhalation of respirable crystalline silica:

“In making the overall evaluation, the Working Group noted that carcinogenicity in humans was not detected in all industrial circumstances studied. Carcinogenicity may be dependent on inherent characteristics of the crystalline silica or on external factors affecting its biological activity or distribution of its polymorphs.

Crystalline silica inhaled in the form of quartz or cristobalite from occupational sources is carcinogenic to humans (Group 1).”[7]

So some IARC classifications actually do specify that exposure to a substance is not a hazard in all circumstances, a qualification implying that the same exposure, in some exposure circumstances, is not a hazard, and so presents zero risk.

We know something about the deliberations of the crystalline silica working group. The members were deadlocked for some time, and the switch of one vote ultimately gave a bare majority for reclassifying crystalline silica as a Group 1 exposure. Here is how working group member Corbett McDonald described the situation:

“The IARC Working Group, in 1997, had considerable difficulty in reaching a decision and might well not have done so had it not been made clear that it was concerned with hazard identification, not risk.”[8]

It was indeed Professor McDonald who changed his vote based upon this linguistic distinction between hazard and risk. His own description of the dataset, however, suggests that the elderly McDonald was railroaded by younger, more strident members of the group:

“Of the many studies reviewed by the Working Group … nine were identified as providing the least confounded evidence. Four studies which we considered positive included two of refractory brick workers, one in the diatomite industry and our own in pottery workers; the five which seemed negative or equivocal included studies of South Dakota gold miners, Danish stone workers, US stone workers and US granite workers. This further example that the truth is seldom pure and never simple underlines the difficulty of establishing a rational control policy for some carcinogenic materials.”[9]

In defense of his vote, McDonald meekly offered that

“[s]ome equally expert panel of scientists presented with the same information on another occasion could of course have reached a different verdict. The evidence was conflicting and difficult to assess and such judgments are essentially subjective.”

Of course, when the evidence is conflicting, it cannot be said to be sufficient. Not only was the epidemiologic evidence conflicting, but so was the whole-animal toxicology, which found a risk of tumors in rats, but not in mice or hamsters.

Aside from endorsing a Group 1 classification for crystalline silica, the working group ignored the purportedly fundamental distinction between hazard and risk by noting that not all exposure circumstances posed a hazard of cancer. The same working group did even greater violence to the supposed distinction between risk and hazard in its evaluation of coal dust exposure and human cancer. Coal miners have been studied extensively for cancer risk, and the working group reviewed and evaluated the nature of their exposures and their cancer outcomes. Coal dust virtually always contains crystalline silica, which often makes up a sizable percentage (40% or so) of the total inhalational exposure to coal dust.[10] And yet, when the group studied the cancer rates among coal miners, and in animals, it concluded that there was “inadequate evidence” for carcinogenicity, both “in humans” and “in experimental animals.” The same working group that agreed, on a divided vote, to place crystalline silica in Group 1 voted that “[c]oal dust cannot be classified as to its carcinogenicity to humans (Group 3).”[11]

The conceptual confusion between hazard and risk is compounded by the IARC’s use of epistemic modalities – known, probably, possibly, and indeterminate – to characterize the existence of a hazard. The Preamble, in Table 4, summarizes the categories and the “stream of evidence” needed to place any particular exposure in one epistemic modal class or another. What is inexplicable is how and why a single substance such as crystalline silica can be a known cancer hazard in some unspecified occupational settings, while its putative carcinogenicity becomes indeterminate when it makes up 40% of the inhaled dust in a coal mine.


The conceptual difficulty created by IARC’s fundamental distinction between hazard and risk is that risk might well vary across exposure circumstances, but there is no basis for varying the epistemic modality of the hazard assessment simply because coal dust is only, say, 40% crystalline silica. Some of the exposure circumstances evaluated for the Group 1 silica hazard classification actually involved lower silica content than coal dust. Granite quarrying, for example, involves exposure to rock dust that is roughly 30% crystalline silica.
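Using the approximate percentages just quoted (illustrative figures from the text, not measured values), the arithmetic makes the incoherence plain. At any common total dust concentration \(C\):

$$ \text{silica exposure in coal mining} \approx 0.40\,C \;>\; 0.30\,C \approx \text{silica exposure in granite quarrying} $$

At equal dust levels, the coal miner inhales more crystalline silica than the granite quarrier; yet the quarrier’s exposure circumstance falls within the Group 1 silica classification, while the miner’s coal dust sits in Group 3.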

The conceptual and epistemic confusion caused by IARC’s treatment of the same substance in different exposure circumstances is hardly unique to its treatment of crystalline silica and coal dust. Benzene has long been classified as a Group 1 human carcinogen, for its ability to cause a specific form of leukemia.[12] Gasoline contains, on average, about one percent benzene, and so gasoline exposure inevitably involves benzene exposure. And yet gasoline, in the form of inhaled gasoline fumes, is classified as only a “possible” human carcinogen, Group 2B.[13]

Similarly, in 2018, the IARC classified the evidence for the human carcinogenicity of coffee as “indeterminate,” Group 3.[14] And yet every drop of coffee inevitably contains acrylamide, which is, according to IARC, a Group 2A “probable human carcinogen.”[15] Rent-seeking groups, such as the Council for Education and Research on Toxics (founded by Carl Cranor and Martyn Smith), have tried shamelessly to weaponize the IARC 2A classification for acrylamide by claiming a bounty against coffee sellers such as Starbucks in California Proposition 65 litigation.[16]

Similarly confusing, IARC designates acetaldehyde on its own a “possible” human carcinogen, Group 2B, even though acetaldehyde is invariably produced in the metabolism of ethyl alcohol, which is itself a Group 1 human carcinogen.[17] There may well be other instances of such confusions, and I would welcome examples from readers.

These disparate conclusions strain credulity and undermine confidence that the hazard-risk distinction does any work at all. Hazard and risk do have different meanings, and I would not want to be viewed as anti-semantic. IARC’s use of the hazard-risk distinction, however, lends itself to the interpretation that hazard is simply risk without the quantification. This usage is actually worse than having no distinction at all, because it ignores the existence of thresholds below which exposure carries no risk, as well as routes of exposure and exposure circumstances that carry no risk at all. The vague and unquantified categorical determination that a substance is a hazard allows fear mongers to substitute subjective, emotive, and unscientific judgments for scientific assessment of carcinogenicity under realistic conditions of use and exposure.


[1] Plato, Theaetetus 155d (Fowler transl. 1921).

[2] Ludwig Wittgenstein, Philosophical Investigations (1953).

[3] See, e.g., Richard M. Rorty, ed., The Linguistic Turn: Essays in Philosophical Method (1992); Nicholas Rescher, Concept Audits: A Philosophical Method (2016); Timothy Williamson, Philosophical Method: A Very Short Introduction 32 (2020) (discussing the need to clarify terms).

[4] Miquel Porta, Sander Greenland, Miguel Hernán, Isabel dos Santos Silva, John M. Last, and Andrea Burón, A Dictionary of Epidemiology 250 (6th ed. 2014).

[5] Id. at 128.

[6] IARC Monographs on the Identification of Carcinogenic Hazards to Humans – Preamble (2019) (emphasis added).

[7] IARC Monograph on the Evaluation of Carcinogenic Risks to Humans: Volume 68, Silica, Some Silicates, Coal Dust, and para-Aramid Fibrils 210-211 (1997).

[8] Corbett McDonald & Nicola Cherry, “Crystalline Silica and Lung Cancer: The Problem of Conflicting Evidence,” 8 Indoor Built Environment 121, 121 (1999).

[9] Id.

[10] IARC Monograph on the Evaluation of Carcinogenic Risks to Humans: Volume 68, Silica, Some Silicates, Coal Dust, and para-Aramid Fibrils 340 (1997).

[11] Id. at 393.

[12] IARC Monograph, Volume 120: Benzene (2018).

[13] IARC Monographs on the Evaluation of Carcinogenic Risks to Humans: Volume 45, Occupational Exposures in Petroleum Refining; Crude Oil and Major Petroleum Fuels 194 (1989).

[14] IARC Monograph No. 116, Drinking Coffee, Mate, and Very Hot Beverages (2018).

[15] IARC Monograph No. 60, Some Industrial Chemicals (1994).

[16] See “Coffee with Cream, Sugar & a Dash of Acrylamide” (June 9, 2018); “The Council for Education and Research on Toxics” (July 9, 2013).

[17] IARC Monographs on the Evaluation of Carcinogenic Risks to Humans, Volume 96, at 1278 (2010).

The Proper Study of Mankind

December 24th, 2023

“Know then thyself, presume not God to scan;

The proper study of Mankind is Man.”[1]


Kristen Ranges recently earned her law degree from the University of Miami School of Law, and her doctorate in Environmental Science and Policy, from the University of Miami Rosenstiel School of Marine, Atmospheric, and Earth Science. Ranges’ dissertation was titled Animals Aiding Justice: The Deepwater Horizon Oil Spill and Ensuing Neurobehavioral Impacts as a Case Study for Using Animal Models in Toxic Tort Litigation – A Dissertation.[2] At first blush, Ranges would seem to be a credible interlocutor in the never-ending dispute over the role of whole animal toxicology (and in vitro toxicology) in determining human causation in tort litigation. Her dissertation title is, however, as Martin Short would say, a bit of a tell. Zebrafish become sad when exposed to oil spills, as do we all.

Ranges recently published a spin-off of her dissertation as a law review article with one of her professors. “Vermin of Proof: Arguments for the Admissibility of Animal Model Studies as Proof of Causation in Toxic Tort Litigation.”[3] Arguments for; no arguments against. We can thus understand this is an advocacy piece, which is fair enough. The paper was not designed or titled to mislead anyone into thinking it would be a consideration of arguments for and against extrapolation from (non-human) animal studies to human beings. Perhaps you will think it churlish of me to point out that animal studies will rarely be admissible as evidence. They come into consideration in legal cases only through expert witnesses’ reliance upon them. So the issue is not whether animal studies are admissible, but rather whether expert witness opinion testimony that relies solely or excessively on animal studies for purposes of inferring causation is admissible under the relevant evidentiary rules. Talking about the admissibility of animal model studies signals, if nothing else, a serious lack of familiarity with the relevant evidentiary rules.

Ranges’ law review article is clearly, and without subtlety, an advocacy piece. She argues:

“However, judges, scholars, and other legal professionals are skeptical of the use of animal studies because of scientific and legal concerns, which range from interspecies disparities to prejudice of juries. These concerns are either unfounded or exaggerated. Animal model studies can be both reliable and relevant in toxic tort cases. Given the Federal Rules of Evidence, case law relevant to scientific evidence, and one of the goals of tort law – justice – judges should more readily admit these types of studies as evidence to help plaintiffs meet the burden of proof in toxic tort litigation.”[4]

For those of you who labor in this vineyard, I would suggest you read Ranges’ article and judge for yourself. What I see is a serious lack of scientific evidence for her claims, and a serious misunderstanding of the relevant law. One might, for starters, and putting aside the Agency’s epistemic dilution, ask whether there are any IARC Group 1 (“known”) carcinogens based solely upon animal evidence. Or has the U.S. Food & Drug Administration ever approved a medication as reasonably safe and effective based upon animal studies alone?

Every dog owner and lover has likely been told by a veterinarian, or the Humane Society, to resist their lupine entreaties and withhold chocolate, raisins, walnuts, avocados, and certain other human foods. Despite their obvious intelligence and capacity for affection, when it comes to toxicology, dogs are not people, although some people act like the less reputable varieties of dogs.

Back in 1985, in connection with the Agent Orange litigation, the late Judge Jack Weinstein wrote what was correct then, and is even more so today: “laboratory animal studies are generally viewed with more suspicion than epidemiological studies, because they require making the assumption that chemicals behave similarly in different species.”[5] Judge Weinstein was no push-over for strident defense counsel or expert witnesses, but the legal consequences were nonetheless obvious to him when he looked carefully at the animal studies that plaintiffs’ expert witnesses claimed supported their opinions. “[A]nimal studies are of so little probative value as to be inadmissible. They cannot be a predicate for an opinion under Rule 703.”[6] One of the several disconnects between the plaintiffs’ expert witnesses’ animal studies and the human diseases claimed was the disparity of dose and duration between the relied-upon studies and the servicemen claimants. Judge Weinstein observed that, when the hand waving stopped, “[t]here is no evidence that plaintiffs were exposed to the far higher concentrations involved in both animal and industrial exposure studies.”[7]

Ranges and Owley unfairly deprecate the Supreme Court’s treatment of animal evidence in the 1997 Joiner opinion.[8] Mr. Joiner had been an electrician for a small city in Georgia, where he experienced dermal exposure, over several years, to polychlorinated biphenyls (PCBs), chemicals found in electrical transformer coolant. He alleged that he had developed small-cell lung cancer from his occasional occupational exposure. In the district court, a careful judge excluded the plaintiffs’ expert witnesses, who relied heavily upon animal studies and who cherry picked and distorted the available epidemiology.[9] The Court of Appeals reversed, in an unsigned, non-substantive opinion that interjected an asymmetric standard of review.[10]

After granting review, the Supreme Court engaged with the substantive validity issues passed over by the intermediate appellate court. In addressing the plaintiff’s expert witnesses’ reliance upon animal studies, the Court was struck by an extrapolation from a different species, a different route of administration, a different dose, a different duration of exposure, and a different disease.[11] Joiner was an adult human whose alleged exposure to PCBs was far less than the exposure of the baby mice that received injections of PCBs in high concentration. The mice developed alveologenic adenomas, a rare tumor that is usually benign, not malignant.[12] The Joiner Court recognized that these multiple extrapolations were a bridge to nowhere, reversed the Court of Appeals, and reinstated the judgment of the district court. What is particularly salient about the Joiner decision, and about which you will find no discussion in the law review paper by Ranges and Owley, is how well the Joiner opinion has held up over the quarter century that has passed. Today, in the waning moments of 2023, there is still no valid, scientifically sound support for the claim that the sort of exposure Mr. Joiner had can cause small-cell lung cancer.[13]

Perhaps the most egregious lapses in scholarship occur when Ranges, a newly minted scientist, and her co-author, a full professor of law, write:

“For example, Bendectin, an antinausea medication prescribed to pregnant women, caused a slew of birth defects (hence its nickname ‘The Second Thalidomide’).”[14]

I had to re-read this sentence many times to make sure I was not hallucinating. Ranges’ and Owley’s statement is, of course, demonstrably false. A double whopper, at least, and a jarring deviation from the standard of scholarly care.

But their statement is footnoted, you say. Here is what the cited article, footnote 40 in “Vermin of Proof,” says:

“RESULTS: The temporal trends in prevalence rates for specific birth defects examined from 1970 through 1992 did not show changes that reflected the cessation of Bendectin use over the 1980–84 period. Further, the NVP hospitalization rate doubled when Bendectin use ceased.

CONCLUSIONS: The population results of the ecological analyses complement the person-specific results of the epidemiological analyses in finding no evidence of a teratogenic effect from the use of Bendectin.”[15]

So the cited source actually says the exact opposite of what the authors assert. Apparently, students on law review at Georgetown University Law Center do not check citations for accuracy. Not only was the statement wrong in 1993, when the Supreme Court decided the famous Daubert case; it was wrong 20 years later, in 2013, when the United States Food and Drug Administration (FDA) approved Diclegis, a combination of doxylamine succinate and pyridoxine hydrochloride, the essential ingredients in Bendectin, for sale in the United States, for pregnant women experiencing nausea and vomiting.[16] The return of Bendectin to the market, although under a different name, was nothing less than a triumph of science over the will of the lawsuit industry.[17]

Channeling the likes of plaintiffs’ expert witness Carl Cranor (whom they cite liberally and credulously), Ranges and Owley argue for a vague “weight of the evidence” (WOE) methodology, in which several inconclusive and lighter-than-air pieces of evidence somehow magically combine, as in cold fusion, to warrant a conclusion of causation. Others have gone down this dubious path before, but these authors’ embrace of the plaintiffs’ expert witnesses’ opinions in the Bendectin litigation reveals the insubstantiality and invalidity of their method.[18] As Professor Ronald Allen put the matter:

“Given the weight of evidence in favor of Bendectin’s safety, it seems peculiar to argue for mosaic evidence [WOE] from a case in which it would have plainly been misleading.”[19]

It surely seems like a reductio ad absurdum of the proposed methodology.

One thing these authors get right is that most courts disparage and exclude expert witness opinion that relies exclusively or excessively upon animal toxicology.[20] They wrongly chastise these courts, however, for ignoring scientific opinion. In 2005, the Teratology Society issued a position paper on causation in teratology-related litigation,[21] in which the Society specifically addressed the authors’ claims:

“6. Human data are required for conclusions that there is a causal relationship between an exposure and an outcome in humans. Experimental animal data are commonly and appropriately used in establishing regulatory exposure limits and are useful in addressing biologic plausibility and mechanism questions, but are not by themselves sufficient to establish causation in a lawsuit. In vitro data may be helpful in exploring mechanisms of toxicity but are not by themselves evidence of causation.”[22]

Ranges and Owley are flummoxed that courts exclude expert witnesses who have relied upon animal studies when regulatory agencies use such studies with abandon. The case law distinguishing precautionary standards in regulation from causation standards in tort law is clear, and explains the difference in approach, but these authors are determined to ignore the obvious difference.[23] The Teratology Society emphasized what should be hornbook law; namely, that regulatory standards for testing and warnings are not particularly germane to tort law standards for causation:

“2. The determination of causation in a lawsuit is not the same as a regulatory determination of a protective level of exposure. If a government agency has determined a regulatory exposure level for a chemical, the existence of that level is not evidence that the chemical produces toxicity in humans at that level or any other level. Regulatory levels use default assumptions that are improper in lawsuits. One such assumption is that humans will be as sensitive to the toxicity of a chemical as is the most sensitive experimental animal species. This assumption may be very useful in regulation but is not evidence that exposure to that chemical caused an adverse outcome in an individual plaintiff. Regulatory levels often incorporate uncertainty factors or margins of exposure. These factors may result in a regulatory level much lower than an exposure level shown to be harmful in any organism and are an additional reason for the lack of utility of regulatory levels in causation considerations.”[24]

The suggestion from Ranges and Owley that the judicial treatment of reliance upon animal studies rests upon ossified, ancient precedent, prejudice, and uncritical acceptance of defense counsel’s unsupported arguments is simply wrong. There are numerous discussions of the difficulty of extrapolating teratogenicity from animal data to humans,[25] and ample basis for criticism of the glib extension of rodent carcinogenicity to humans.[26]

Ranges and Owley ignore the extensive scientific literature questioning extrapolation from high-exposure rodent models to much lower exposures in humans.[27] The invalidity of extrapolation can result in both false positives and false negatives. Indeed, the thalidomide case is a compelling example of the failure of animal testing. Thalidomide was tested on pregnant rats and rabbits without detecting teratogenicity; most animal species do not metabolize thalidomide or exhibit teratogenicity as seen in humans. Animal models simply do not have a sufficient positive predictive value to justify a conclusion of causation in humans, even if we accept a precautionary-principle rationale for such animal testing for regulatory purposes.[28]

As improvident as Ranges’ pronouncements may be, finding her message amplified by Professor Ed Cheng on his podcast series, Excited Utterance, was even more disturbing. In November 2023, Cheng interviewed Kristen Ranges in an episode of his podcast, Vermin of Proof, in which he gave Ranges a chance to reprise her complaints about the judiciary’s handling of animal evidence, without much in the way of specificity, and with some credulous cheerleading to aid and abet. In his epilogue, Cheng wondered why toxicologic evidence is disfavored when such evidence is routinely used by scientists and regulators. What Cheng misses is that regulators use toxicologic evidence for regulation, not for assessments of human causation, and that the two enterprises are quite different. The regulatory exercise goes something like asking about the stall speed of a pig. It does not matter that pigs cannot fly; we skip that fact and press on to ask what the pig’s take-off and stall speeds are.

Seventy years ago, no less an authority than Sir Austin Bradford Hill observed:

“We may subject mice, or other laboratory animals, to such an atmosphere of tobacco smoke that they can — like the old man in the fairy story — neither sleep nor slumber; they can neither breed nor eat. And lung cancers may or may not develop to a significant degree. What then? We may have thus strengthened the evidence, we may even have narrowed the search, but we must, I believe, invariably return to man for the final proof or proofs.”[29]


[1] Alexander Pope, “An Essay on Man” (1733), in Robin Sowerby, ed., Alexander Pope: Selected Poetry and Prose at 153 (1988).

[2] Kristen Ranges, Animals Aiding Justice: The Deepwater Horizon Oil Spill and Ensuing Neurobehavioral Impacts as a Case Study for Using Animal Models in Toxic Tort Litigation – A Dissertation (2023).

[3] Kristen Ranges & Jessica Owley, “Vermin of Proof: Arguments for the Admissibility of Animal Model Studies as Proof of Causation in Toxic Tort Litigation,” 34 Georgetown Envt’l L. Rev. 303 (2022) [Vermin]

[4] Vermin at 303.

[5] In re Agent Orange Prod. Liab. Litig., 611 F. Supp. 1223, 1241 (E.D.N.Y. 1985), aff’d, 818 F.2d 187 (2d Cir. 1987), cert. denied, 487 U.S. 1234 (1988).

[6] Id.

[7] Id.

[8] General Elec. Co. v. Joiner, 522 U.S. 136, 144 (1997) [Joiner]

[9] Joiner v. General Electric Co., 864 F. Supp. 1310 (N.D. Ga. 1994).

[10] Joiner v. General Electric Co., 78 F.3d 524 (11th Cir. 1996) (per curiam).

[11] Joiner, 522 U.S. at 144-45.

[12] See Leonid Roshkovan, Jeffrey C. Thompson, Sharyn I. Katz, Charuhas Deshpande, Taylor Jenkins, Anna K. Nowak, Rosyln Francis, Carole Dennie, Dominique Fabre, Sunil Singhal, and Maya Galperin-Aizenberg, “Alveolar adenoma of the lung: multidisciplinary case discussion and review of the literature,” 12 J. Thoracic Dis. 6847 (2020).

[13] See “How Have Important Rule 702 Holdings Held Up With Time?” (Mar. 20, 2015); “The Joiner Finale” (Mar. 23, 2015).

[14] Vermin at 312.

[15] Jeffrey S. Kutcher, Arnold Engle, Jacqueline Firth & Steven H. Lamm, “Bendectin and Birth Defects II: Ecological Analyses,” 67 Birth Defects Research Part A: Clinical and Molecular Teratology 88, 88 (2003).

[16] See FDA News Release, “FDA approves Diclegis for pregnant women experiencing nausea and vomiting,” (April 8, 2013).

[17] See Gideon Koren, “The Return to the USA of the Doxylamine-Pyridoxine Delayed Release Combination (Diclegis®) for Morning Sickness — A New Morning for American Women,” 20 J. Popul. Ther. Clin. Pharmacol. e161 (2013).

[18] Michael D. Green, “Pessimism About Milward,” 3 Wake Forest J. Law & Policy 41, 62-63 (2013); Susan Haack, “Irreconcilable Differences? The Troubled Marriage of Science and Law,” 72 Law & Contemporary Problems 1, 17 (2009); Susan Haack, “Proving Causation: The Holism of Warrant and the Atomism of Daubert,” 4 J. Health & Biomedical Law 273, 274-78 (2008).

[19] Ronald J. Allen & Esfand Nafisi, “Daubert and its Discontents,” 76 Brooklyn L. Rev. 132, 148 (2010). 

[20] See In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., 26 F. Supp. 3d 466, 475 (E.D. Pa. 2014) (noting that “causation opinions based primarily upon in vitro and live animal studies are unreliable and do not meet the Daubert standard.”), aff’d, 858 F.3d 787 (3rd Cir. 2017); Chapman v. Procter & Gamble Distrib., LLC, 766 F.3d 1296, 1308 (11th Cir. 2014) (affirming exclusion of testimony based on “secondary methodologies,” including animal studies, which offer “insufficient proof of general causation.”); The Sugar Ass’n v. McNeil-PPC, Inc., 2008 WL 11338092, *3 (C.D. Calif. July 21, 2008) (finding that plaintiffs’ expert witnesses, including Dr. Abou-Donia, “failed to provide the requisite analytical support for the extrapolation of their Five Opinions from rats to humans”); In re Silicone Gel Breast Implants Prods. Liab. Litig., 318 F. Supp. 2d 879, 891 (C.D. Cal. 2004) (observing that failure to compare similarities and differences across animals and humans could lead to the exclusion of opinion evidence); Cagle v. The Cooper Companies, 318 F. Supp. 2d 879, 891 (C.D. Calif. 2004) (citing Joiner for the observation that animal studies are not generally admissible when contrary epidemiologic studies are available; and detailing significant disadvantages in relying upon animal studies, such as (1) differences in absorption, distribution, and metabolism; (2) the unrealistic, non-physiological exposures used in animal studies; and (3) the use of unverified assumptions about dose-response); Wills v. Amerada Hess Corp., No. 98 CIV. 7126(RPP), 2002 WL 140542, at *12 (S.D.N.Y. Jan. 31, 2002) (faulting expert’s reliance on animal studies because there was no evidence plaintiff had injected suspected carcinogen in same manner as studied animals, or at same dosage levels), aff’d, 379 F.3d 32 (2nd Cir. 2004) (Sotomayor, J.); Bourne v. E.I. du Pont de Nemours & Co., 189 F. Supp. 2d 482, 501 (S.D. W.Va. 2002) (benlate and birth defects), aff’d, 85 F. App’x 964 (4th Cir.), cert. denied, 543 U.S. 917 (2004); Magistrini v. One Hour Martinizing Dry Cleaning, 180 F. Supp. 2d 584, 593 (D.N.J. 2002) (noting that “[a]nimal bioassays are of limited use in determining whether a particular chemical causes a particular disease, or type of cancer, in humans”); Soutiere v. BetzDearborn, Inc., No. 2:99-CV-299, 2002 WL 34381147, at *4 (D. Vt. July 24, 2002) (holding expert’s evidence inadmissible when “[a]t best there are animal studies that suggest a link between massive doses of [the substance in question] and the development of certain kinds of cancers, such that [the substance in question] is listed as a ‘suspected’ or ‘probable’ human carcinogen”); Glastetter v. Novartis Pharms. Corp., 252 F.3d 986, 991 (8th Cir. 2001); Hollander v. Sandoz Pharm. Corp., 95 F. Supp. 2d 1230, 1238 (W.D. Okla. 2000), aff’d, 289 F.3d 1193, 1209 (10th Cir. 2002) (rejecting the relevance of animal studies to causation arguments in the circumstances of the case); Allison v. McGhan Medical Corp., 184 F.3d 1300, 1313–14 (11th Cir. 1999); Raynor v. Merrell Pharms. Inc., 104 F.3d 1371, 1375-1377 (D.C. Cir. 1997) (observing that animal studies are unreliable, especially when “sound epidemiological studies produce opposite results from non-epidemiological ones, the rate of error of the latter is likely to be quite high”); Lust v. Merrell Dow Pharms., Inc., 89 F.3d 594, 598 (9th Cir. 1996); Barrett v. Atlantic Richfield Co., 95 F.3d 375 (5th Cir. 1996) (extrapolation from a rat study was speculation); Nat’l Bank of Comm. v. Dow Chem. Co., 965 F. Supp. 1490, 1527 (E.D. Ark. 1996) (“because of the difference in animal species, the methods and routes of administration of the suspect chemical agent, maternal metabolisms and other factors, animal studies, taken alone, are unreliable predictors of causation in humans”), aff’d, 133 F.3d 1132 (8th Cir. 1998); Hall v. Baxter Healthcare Corp., 947 F. Supp. 1387, 1410-11 (D. Or. 1996) (with the help of court-appointed technical advisors, observing that animal studies taken alone fail to predict human disease reliably); Daubert v. Merrell Dow Pharms., Inc., 43 F.3d 1311, 1322 (9th Cir. 1995) (on remand from Supreme Court with directions to apply an epistemic standard derived from Rule 702 itself); Sorensen v. Shaklee Corp., 31 F.3d 638, 650 (8th Cir. 1994) (affirming exclusion of expert witness opinions based upon animal mutagenicity data not germane to the claimed harm); Elkins v. Richardson-Merrell, Inc., 8 F.3d 1068, 1073 (6th Cir. 1993); Wade-Greaux v. Whitehall Labs., Inc., 874 F. Supp. 1441, 1482 (D.V.I. 1994), aff’d, 46 F.3d 1120 (3d Cir. 1994) (per curiam); Renaud v. Martin Marietta Corp., Inc., 972 F.2d 304, 307 (10th Cir. 1992) (“The etiological evidence proffered by the plaintiff was not sufficiently reliable, being drawn from tests on non-human subjects without confirmatory epidemiological data.”) (“Dr. Jackson performed no calculations to determine whether the dose or route of administration of antidepressants to rats and monkeys in the papers that she cited in her report was equivalent to or substantially similar to human beings taking prescribed doses of Prozac.”); Bell v. Swift Adhesives, Inc., 804 F. Supp. 1577, 1579–81 (S.D. Ga. 1992) (excluding expert opinion of Dr. Janette Sherman, who opined that methylene chloride caused liver cancer, based largely upon animal studies); Conde v. Velsicol Chem. Corp., 804 F. Supp. 972, 1025-26 (S.D. Ohio 1992) (noting that epidemiology is “the primary generally accepted methodology for demonstrating a causal relation between a chemical compound and a set of symptoms or a disease”), aff’d, 24 F.3d 809 (6th Cir. 1994); Turpin v. Merrell Dow Pharm., Inc., 959 F.2d 1349, 1360-61 (6th Cir. 1992) (“The analytical gap between the [animal study] evidence presented and the inferences to be drawn on the ultimate issue of human birth defects is too wide. Under such circumstances, a jury should not be asked to speculate on the issue of causation.”); Brock v. Merrell Dow Pharm., 874 F.2d 307, 313 (5th Cir. 1989) (noting the “very limited usefulness of animal studies when confronted with questions of toxicity”); Richardson v. Richardson-Merrell, Inc., 857 F.2d 823, 830 (D.C. Cir. 1988) (“Positive results from in vitro studies may provide a clue signaling the need for further research, but alone do not provide a satisfactory basis for opining about causation in the human context.”); Lynch v. Merrell-Nat’l Labs., 830 F.2d 1190, 1194 (1st Cir. 1987) (“Studies of this sort [animal studies], singly or in combination, do not have the capability of proving causation in human beings in the absence of any confirmatory epidemiological data.”). See also Merrell Dow Pharms., Inc. v. Havner, 953 S.W.2d 706, 730 (Tex. 1997); DePyper v. Navarro, No. 83-303467-NM, 1995 WL 788828, at *34 (Mich. Cir. Ct. Nov. 27, 1995), aff’d, No. 191949, 1998 WL 1988927 (Mich. Ct. App. Nov. 6, 1998); Nelson v. American Sterilizer Co., 566 N.W.2d 671 (Mich. Ct. App. 1997) (high-dose animal studies not reliable). But see Ambrosini v. Labarraque, 101 F.3d 129, 137-140 (D.C. Cir. 1996); Dyson v. Winfield, 113 F. Supp. 2d 44, 50-51 (D.D.C. 2000).

[21] Teratology Society Public Affairs Committee, “Position Paper: Causation in Teratology-Related Litigation,” 73 Birth Defects Research (Part A) 421 (2005) [Teratology Position Paper]

[22] Id. at 423.

[23] See “Improper Reliance Upon Regulatory Risk Assessments in Civil Litigation” (Mar. 19, 2023) (collecting cases).

[24] Teratology Position Paper at 422-423.

[25] See, e.g., Gideon Koren, Anne Pastuszak & Shinya Ito, “Drugs in Pregnancy,” 338 New England J. Med. 1128, 1131 (1998); Louis Lasagna, “Predicting Human Drug Safety from Animal Studies: Current Issues,” 12 J. Toxicological Sci. 439, 442-43 (1987).

[26] Bruce N. Ames & Lois S. Gold, “Too Many Rodent Carcinogens: Mitogenesis Increases Mutagenesis,” 249 Science 970, 970 (1990) (noting that chronic irritation induced by many chemicals at high exposures is itself a cause of cancer in rodent models); Bruce N. Ames & Lois Swirsky Gold, “Environmental Pollution and Cancer: Some Misconceptions,” in Jay H. Lehr, ed., Rational Readings on Environmental Concerns 151, 153 (1992); Mary Eubanks, “The Danger of Extrapolation: Humans and Rodents Differ in Response to PCBs,” 112 Envt’l Health Persps. A113 (2004).

[27] Andrea Gawrylewski, “The Trouble with Animal Models: Why did human trials fail?” 21 The Scientist 44 (2007); Michael B. Bracken, “Why animal studies are often poor predictors of human reactions to exposure,” 101 J. Roy. Soc. Med. 120 (2008); Fiona Godlee, “How predictive and productive is animal research?” 348 Brit. Med. J. g3719 (2014); John P. A. Ioannidis, “Extrapolating from Animals to Humans,” 4 Science Translational Med. 15 (2012); Pandora Pound & Michael Bracken, “Is animal research sufficiently evidence based to be a cornerstone of biomedical research?” 348 Brit. Med. J. g3387 (2014); Pandora Pound, Shah Ebrahim, Peter Sandercock, Michael B. Bracken, and Ian Roberts, “Where is the evidence that animal research benefits humans?” 328 Brit. Med. J. 514 (2004) (writing on behalf of the Reviewing Animal Trials Systematically (RATS) Group).

[28] See Ray Greek, Niall Shanks, and Mark J. Rice, “The History and Implications of Testing Thalidomide on Animals,” 11 J. Philosophy, Sci. & Law 1, 19 (2011).

[29] Austin Bradford Hill, “Observation and Experiment,” 248 New Engl. J. Med. 995, 999 (1953).

The IARC-hy of Evidence – Incoherent & Inconsistent Classifications of Carcinogenicity

September 19th, 2023

Recently, two lawyers wrote an article in a legal trade magazine about excluding epidemiologic evidence in civil litigation.[1] The article was wildly wide of the mark, with several conceptual and practical errors.[2] For starters, the authors discussed Rule 702 as excluding epidemiologic studies and evidence, when the rule addresses the admissibility of expert witness opinion testimony. The authors’ most egregious recommendation, however, was that counsel urge the classifications of chemicals with respect to carcinogenicity, by the International Agency for Research on Cancer (IARC) and by regulatory agencies, as probative for or against causation.

The project of evaluating the evidence for, or against, the carcinogenicity of the myriad natural and synthetic agents to which humans are exposed is certainly important, and IARC has taken the project seriously. There have, however, been problems with IARC’s classifications of specific chemicals, pharmaceuticals, and exposure circumstances, and a basic problem with the classifications begins with the classes themselves. Classification requires defined classes. I do not mean to be anti-semantic, but IARC’s definitions and its hierarchy of carcinogenicity are not entirely coherent.

The agency was established in 1965, and by the early 1970s, found itself in the business of preparing “monographs on the evaluation of carcinogenic risk of chemicals to man.” Originally, the IARC set out to classify the carcinogenicity of chemicals, but over the years, its scope increased to include complex mixtures, physical agents such as different forms of radiation, and biological organisms. To date, there have been 134 IARC monographs, addressing 1,045 “agents” (either substances or exposure circumstances).

From its beginnings, the IARC has conducted its classifications through working groups that meet to review and evaluate evidence, and classify the cancer hazards of “agents” under discussion. The breakdown of IARC’s classifications among four groups currently is:

Group 1 – Carcinogenic to humans (127 agents)

Group 2A – Probably carcinogenic to humans (95 agents)

Group 2B – Possibly carcinogenic to humans (323 agents)

Group 3 – Not classifiable as to its carcinogenicity to humans (500 agents)

Previously, the IARC classification included a Group 4, for agents probably not carcinogenic to humans. After decades of review, the IARC placed only a single agent, caprolactam, in Group 4, apparently because the agency found everything else in the world to be presumptively a cause of cancer. The IARC could not find sufficiently strong evidence even for water, air, or basic foods to declare that they do not cause cancer in humans. Ultimately, the IARC abandoned Group 4, in favor of a presumption of universal carcinogenicity.

The IARC describes its carcinogen classification procedures, requirements, and rationales in a document known as “The Preamble.” Any discussion of IARC classifications, whether in scientific publications or in legal briefs, without reference to this document should be suspect. The Preamble seeks to define many of the words in the classificatory scheme, some in ways that are not intuitive. This document has been amended over time, and the most recent iteration can be found online at the IARC website.[3]

IARC claims to build its classifications upon “consensus” evaluations, based in turn upon considerations of

(a) the strength of evidence of carcinogenicity in humans,

(b) the evidence of carcinogenicity in experimental (non-human) animals, and

(c) the mechanistic evidence of carcinogenicity.

IARC further claims that its evaluations turn on the use of “transparent criteria and descriptive terms.”[4] This last claim, for some terms, is falsifiable.

The working groups are described as engaged in consensus evaluations, although past evaluations have been reached on a simple majority vote of the working group. The working groups are charged with considering the three lines of evidence, described above, for any given agent, and with reaching a synthesis in the form of the IARC classificatory scheme. The chart from the Preamble, below, roughly describes how working groups may “mix and match” lines of evidence of varying degrees of robustness and validity (vel non) to reach a classification.

[Preamble Table 4: the combinations of human, animal, and mechanistic evidence that correspond to each classification group]

Agents placed in Group 1 are thus “carcinogenic to humans.” Interestingly, IARC does not refer to Group 1 carcinogens as “known” carcinogens, although many commentators are prone to do so. The implication of calling Group 1 agents “known carcinogens” is to distinguish Groups 2A, 2B, and 3 as agents “not known to cause cancer.” Rather than “known,” the adjective IARC uses is “sufficient,” as in sufficient evidence in humans; but IARC also allows an agent to reach Group 1 with “limited,” or even “inadequate,” human evidence, if the other lines of evidence, in experimental animals or from mechanistic evidence in humans, are sufficient.

In describing “sufficient” evidence, the IARC’s Preamble does not refer to epidemiologic evidence as potentially “conclusive” or “definitive”; rather its use of “sufficient” implies, perhaps non-transparently, that its labels of “limited” or “inadequate” evidence in humans refer to insufficient evidence. IARC gives an unscientific, inflated weight and understanding to “limited evidence of carcinogenicity,” by telling us that

“[a] causal interpretation of the positive association observed in the body of evidence on exposure to the agent and cancer is credible, but chance, bias, or confounding could not be ruled out with reasonable confidence.”[5]

Remarkably, for IARC, credible interpretations of causality can be based upon evidentiary displays that are confounded or biased. In other words, non-credible associations may support IARC’s conclusions of causality. Causal interpretations of epidemiologic evidence are “credible,” according to IARC, even though Sir Austin’s predicate of a valid association is absent.[6]

The IARC studiously avoids, however, stating that any classification is based upon “insufficient” evidence, even when that evidence is less than sufficient, that is, “limited” or “inadequate.” A close look at Table 4 reveals that some Group 1 classifications, and all Group 2A, 2B, and 3 classifications, are based upon insufficient evidence of carcinogenicity in humans.

Non-Probable Probabilities

The classification immediately below Group 1 is Group 2A, for agents “probably carcinogenic to humans.” The IARC’s use of “probably” is problematic. Group 1 carcinogens require only “sufficient” evidence of human carcinogenicity, and there is no suggestion that any aspect of a Group 1 evaluation requires apodictic, conclusive, or even “definitive” evidence. Accordingly, the determination of Group 1 carcinogens will be based upon evidence that is essentially probabilistic. Group 2A is also defined as having only “limited evidence of carcinogenicity in humans”; in other words, insufficient evidence of carcinogenicity in humans, or epidemiologic studies with uncontrolled confounding and biases.

Importing IARC 2A classifications into legal or regulatory arenas will allow judgments or regulations based upon “limited evidence” in humans, which, as we have seen, can rest upon inconsistent observational studies, and upon studies that fail to measure and adjust for known and potential confounding risk factors and systematic biases. The 2A classification thus requires little, substantively or semantically, and many 2A classifications leave juries and judges to determine whether a chemical or medication caused a human being’s cancer when the basic predicates for Sir Austin Bradford Hill’s factors for causal judgment have not been met.[7]

An IARC evaluation of Group 2A, or “probably carcinogenic to humans,” would seem to satisfy the legal system’s requirement that an exposure to the agent of interest more likely than not causes the harm in question. Appearances and word usage in different contexts, however, can be deceiving. Probability is a continuous quantitative scale from zero to one. In Bayesian analyses, zero and one are unavailable as starting points, because if either were our prior, no amount of evidence could ever change our judgment of the probability of causation (Cromwell’s Rule). The IARC informs us that its use of “probably” is purely idiosyncratic; the probability that a Group 2A agent causes cancer has “no quantitative” meaning. All the IARC intends is that a Group 2A classification “signifies a greater strength of evidence than possibly carcinogenic.”[8] Group 2A classifications are thus consistent with posterior probabilities of carcinogenicity below 0.5 (or 50 percent). A working group could judge the probability that a substance or process is carcinogenic to humans to be greater than zero, but no more than, say, ten percent, and still vote for a 2A classification, in keeping with the IARC Preamble. This low probability threshold converts the judgment of “probably carcinogenic” into little more than a precautionary prescription, rendered when the most probable assessment is either ignorance or a lack of causality. There is thus a practical certainty, close to 100%, that a 2A classification will confuse judges and juries, as well as the scientific community.
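A one-line application of Bayes’ theorem shows why the endpoints of the probability scale are unavailable (a standard illustration, added here; the Preamble contains no such formula):

$$ P(H \mid E) \;=\; \frac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)} $$

If the prior probability \(P(H)\) is zero, the numerator is zero, and the posterior remains zero no matter how strong the evidence \(E\); if \(P(H) = 1\), the second term in the denominator vanishes, and the posterior remains one. Hence Cromwell’s Rule: reserve probabilities of zero and one for logical truths and falsehoods, not for empirical claims of carcinogenicity.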

In addition to being based upon limited, that is insufficient, evidence of human carcinogenicity, Group 2A evaluations of “probable human carcinogenicity” connote “sufficient evidence” in experimental animals. An agent can be classified 2A even when the sufficient evidence of carcinogenicity occurs in only one of several non-human animal species, with the other animal species failing to show carcinogenicity. IARC 2A classifications can thus raise the thorny question in court whether a claimant is more like a rat or a mouse.

Courts should, because of the incoherent and diluted criteria for “probably carcinogenic,” exclude expert witness opinions based upon IARC 2A classifications as scientifically insufficient.[9] Given the distortion of ordinary language in its use of defined terms such as “sufficient,” “limited,” and “probable,” any evidentiary value to IARC 2A classifications, and expert witness opinion based thereon, is “substantially outweighed by a danger of … unfair prejudice, confusing the issues, [and] misleading the jury….”[10]

Everything is Possible

Group 2B denotes “possibly carcinogenic.” This year, the IARC announced that a working group had concluded that aspartame, an artificial sugar substitute, was “possibly carcinogenic.”[11] Such an evaluation, however, tells us nothing. If there are no studies at all of an agent, the agent could be said to be possibly carcinogenic. If there are inconsistent studies, even if the better designed studies are exculpatory, scientists could still say that the agent of interest was possibly carcinogenic. The 2B classification does not tell us anything, because everything is possible until there is sufficient evidence to inculpate an agent as a cause of cancer in humans, or to exculpate it.

It’s a Hazard, Not a Risk

IARC’s classification does not include an assessment of exposure levels. Consequently, there is no consideration of the dose or exposure level at which an agent becomes carcinogenic. IARC’s evaluations are limited to whether the agent is, or is not, carcinogenic. The IARC explicitly concedes that exposure to a carcinogenic agent may carry little risk, but it cannot bring itself to say no risk, or even a benefit, at low exposures.

As noted, the IARC classification scheme refers to the strength of the evidence that an agent is carcinogenic, and not to the quantitative risk of cancer from exposure at a given level. The Preamble explains the distinction as fundamental:

“A cancer hazard is an agent that is capable of causing cancer, whereas a cancer risk is an estimate of the probability that cancer will occur given some level of exposure to a cancer hazard. The Monographs assess the strength of evidence that an agent is a cancer hazard. The distinction between hazard and risk is fundamental. The Monographs identify cancer hazards even when risks appear to be low in some exposure scenarios. This is because the exposure may be widespread at low levels, and because exposure levels in many populations are not known or documented.”[12]

This attempted explanation reveals important aspects of IARC’s project. First, there is an unproven assumption that there will be cancer hazards regardless of the exposure levels. The IARC contemplates that there may be circumstances of low risk from low levels of exposure, but it elides the important issue of thresholds. Second, IARC’s distinction between hazard and risk is obscured by its own classifications. For instance, when IARC evaluated crystalline silica and classified it in Group 1, it did so only for “occupational exposures.”[13] And yet, when IARC evaluated the hazard of coal exposure, it placed coal dust in Group 3, even though coal dust contains crystalline silica.[14] Similarly, in 2018, the IARC classified coffee in Group 3,[15] even though every drop of coffee contains acrylamide, which is, according to IARC, a Group 2A “probable human carcinogen.”[16]


[1] Christian W. Castile & Stephen J. McConnell, “Excluding Epidemiological Evidence Under FRE 702,” For The Defense 18 (June 2023) [Castile].

[2] “Excluding Epidemiologic Evidence Under Federal Rule of Evidence 702” (Aug. 26, 2023).

[3] IARC Monographs on the Identification of Carcinogenic Hazards to Humans – Preamble (2019).

[4] Jonathan M. Samet, Weihsueh A. Chiu, Vincent Cogliano, Jennifer Jinot, David Kriebel, Ruth M. Lunn, Frederick A. Beland, Lisa Bero, Patience Browne, Lin Fritschi, Jun Kanno, Dirk W. Lachenmeier, Qing Lan, Gerard Lasfargues, Frank Le Curieux, Susan Peters, Pamela Shubat, Hideko Sone, Mary C. White, Jon Williamson, Marianna Yakubovskaya, Jack Siemiatycki, Paul A. White, Kathryn Z. Guyton, Mary K. Schubauer-Berigan, Amy L. Hall, Yann Grosse, Veronique Bouvard, Lamia Benbrahim-Tallaa, Fatiha El Ghissassi, Beatrice Lauby-Secretan, Bruce Armstrong, Rodolfo Saracci, Jiri Zavadil, Kurt Straif, and Christopher P. Wild, “The IARC Monographs: Updated Procedures for Modern and Transparent Evidence Synthesis in Cancer Hazard Identification,” 112 J. Nat’l Cancer Inst. djz169 (2020).

[5] Preamble at 31.

[6] See Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295 (1965) (noting that only when “[o]ur observations reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance,” do we move on to consider the nine articulated factors for determining whether an association is causal).

[7] Id.

[8] IARC Monographs on the Identification of Carcinogenic Hazards to Humans – Preamble 31 (2019) (“The terms probably carcinogenic and possibly carcinogenic have no quantitative significance and are used as descriptors of different strengths of evidence of carcinogenicity in humans.”).

[9] See “Is the IARC lost in the weeds” (Nov. 30, 2019); “Good Night Styrene” (Apr. 18, 2019).

[10] Fed. R. Evid. 403.

[11] Elio Riboli, et al., “Carcinogenicity of aspartame, methyleugenol, and isoeugenol,” 24 The Lancet Oncology 848-850 (2023); IARC, “Aspartame hazard and risk assessment results released” (2023).

[12] Preamble at 2.

[13] IARC Monograph 68, at 41 (1997) (“For these reasons, the Working Group therefore concluded that overall the epidemiological findings support increased lung cancer risks from inhaled crystalline silica (quartz and cristobalite) resulting from occupational exposure.”).

[14] IARC Monograph 68, at 337 (1997).

[15] IARC Monograph No. 116, Drinking Coffee, Mate, and Very Hot Beverages (2018).

[16] IARC Monograph No. 60, Some Industrial Chemicals (1994).

Excluding Epidemiologic Evidence under Federal Rule of Evidence 702

August 26th, 2023

We are 30-plus years into the “Daubert” era, in which federal district courts are charged with gatekeeping the relevance and reliability of scientific evidence. Not surprisingly, given the lawsuit industry’s propensity on occasion to use dodgy science, the burden of awakening the gatekeepers from their dogmatic slumber often falls upon defense counsel in civil litigation. It therefore behooves defense counsel to speak carefully and accurately about the grounds for Rule 702 exclusion of expert witness opinion testimony.

In the context of medical causation opinions based upon epidemiologic evidence, the first obvious point is that whichever party is arguing for exclusion should distinguish between excluding an expert witness’s opinion and prohibiting an expert witness from relying upon a particular study. Rule 702 addresses the exclusion of opinions, whereas Rule 703 addresses barring an expert witness from relying upon hearsay facts or data unless they are reasonably relied upon by experts in the appropriate field. It would be helpful for lawyers and legal academics to refrain from talking about “excluding epidemiological evidence under FRE 702.”[1] Epidemiologic studies are rarely admissible themselves, but come into the courtroom as facts and data relied upon by expert witnesses. Rule 702 is addressed to the admissibility vel non of opinion testimony, some of which may rely upon epidemiologic evidence.

Another common lawyer mistake is the over-generalization that epidemiologic research provides the “gold standard” of general causation evidence.[2] Although epidemiology is often required, it is not “the medical science devoted to determining the cause of disease in human beings.”[3] To be sure, epidemiologic evidence will usually be required because there is no genetic or mechanistic evidence that will support the claimed causal inference, but counsel should be cautious in stating the requirement. Glib statements by courts that epidemiology is not always required are often simply an evasion of their responsibility to evaluate the validity of the proffered expert witness opinions. A more careful phrasing of the role of epidemiology will make such glib statements more readily open to rebuttal. In the absence of direct biochemical, physiological, or genetic mechanisms that can be identified as involved in bringing about the plaintiffs’ harm, epidemiologic evidence will be required, and it may well be the “gold standard” in such cases.[4]

When epidemiologic evidence is required, counsel will usually be justified in adverting to the “hierarchy of epidemiologic evidence.” Associations are shown in studies of various designs with vastly differing degrees of validity; and of course, associations are not necessarily causal. There are thus important nuances in educating the gatekeeper about this hierarchy. First, it will often be important to educate the gatekeeper about the distinction between descriptive and analytic studies, and the inability of descriptive studies such as case reports to support causal inferences.[5]

There is then the matter of confusion within the judiciary and among “scholars” about whether a hierarchy even exists. The chapter on epidemiology in the Reference Manual on Scientific Evidence appears to suggest the specious position that there is no hierarchy.[6] The chapter on medical testimony, however, takes a different approach in identifying a normative hierarchy of evidence to be considered in evaluating causal claims.[7] The medical testimony chapter specifies that meta-analyses of randomized controlled trials sit atop the hierarchy. Yet, there are divergent opinions about what should be at the top of the hierarchical evidence pyramid. Indeed, the rigorous, large randomized trial will often replace a meta-analysis of smaller trials as the more definitive evidence.[8] Back in 2007, a dubious meta-analysis of over 40 clinical trials led to a litigation frenzy over rosiglitazone.[9] A mega-trial of rosiglitazone showed that the 2007 meta-analysis was wrong.[10]

In any event, courts must purge their beliefs that once there is “some” evidence in support of a claim, their gatekeeping role is over. Randomized controlled trials really do trump observational studies, which virtually always have actual or potential confounding in their final analyses.[11] While disclaimers about the unavailability of randomized trials for putative toxic exposures are helpful, it is not quite accurate to say that it is “unethical to intentionally expose people to a potentially harmful dose of a suspected toxin.”[12] Such trials are done all the time when there is an expected therapeutic benefit that creates at least equipoise between the overall benefit and harm at the outset of the trial.[13]

At this late date, it seems shameful that courts must be reminded that evidence of associations does not suffice to show causation, but prudence dictates giving the reminder.[14] Defense counsel will generally exhibit a Pavlovian reflex to state that causality based upon epidemiology must be viewed through the lens of the “Bradford Hill criteria.”[15] Rhetorically, this reflex seems wrong, given that Sir Austin himself noted that his nine different considerations were “viewpoints,” not criteria. Taking a position that requires an immediate retreat seems misguided. Similarly, urging courts to invoke and apply the Bradford Hill considerations must be accompanied by the caveat that courts must first apply Bradford Hill’s predicate[16] for the nine considerations:

“Disregarding then any such problem in semantics we have this situation. Our observations reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance. What aspects of that association should we especially consider before deciding that the most likely interpretation of it is causation?”[17]

Courts should be mindful that the language from the famous, often-cited paper was part of an after-dinner address, in which Sir Austin was speaking informally. Scientists will understand that he was setting out a predicate that calls for

(1) an association, which is

(2) “perfectly clear-cut,” such that bias and confounding are excluded, and

(3) “beyond what we would care to attribute to the play of chance,” with random error kept to an acceptable level, before advancing to further consideration of the nine viewpoints commonly recited.

These predicate findings are the basis for advancing to investigate Bradford Hill’s nine viewpoints; the viewpoints do not replace or supersede the predicates.[18]
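For the third predicate, the working arithmetic is the familiar point estimate with a confidence interval. A minimal sketch follows, computing a risk ratio and a conventional 95 percent confidence interval by the standard log-transform method; all of the counts are hypothetical. Note that the interval speaks only to random error; it says nothing about the bias and confounding that the second predicate separately requires be excluded.

```python
import math

# A minimal sketch of the "play of chance" predicate: a risk ratio with a
# conventional 95% confidence interval on the log scale. Counts are invented.

a, n1 = 30, 1000    # cases and total among the exposed (hypothetical)
c, n0 = 15, 1000    # cases and total among the unexposed (hypothetical)

rr = (a / n1) / (c / n0)
se_log_rr = math.sqrt(1/a - 1/n1 + 1/c - 1/n0)   # std. error of log(RR)
lo = math.exp(math.log(rr) - 1.96 * se_log_rr)
hi = math.exp(math.log(rr) + 1.96 * se_log_rr)

print(f"RR = {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")

# A lower confidence bound above 1.0 addresses only random error; it says
# nothing about whether bias or confounding, rather than the exposure,
# produced the observed association.
```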

Within the nine viewpoints, not all are of equal importance. Consistency among studies, a particularly important consideration, implies that isolated findings in a single observational study will rarely suffice to support causal conclusions. Another important consideration, the strength of the association, has nothing to do with “statistical significance,” which is a predicate consideration; rather, it reminds us that large risk ratios or risk differences provide some assurance that the association does not result from unmeasured confounding. Eliminating confounding, however, is one of the predicate requirements for applying the nine factors. As with any methodology, the Bradford Hill factors are not self-executing. The annals of litigation provide all-too-many examples of undue selectivity, “cherry picking,” and other deviations from the scientist’s standard of care.

Certainly lawyers must steel themselves against endorsing the “carcinogen” hazard identifications advanced by the International Agency for Research on Cancer (IARC). There are several problematic aspects to IARC’s methods, not the least of which is its fanciful use of the word “probable.” According to the IARC Preamble, “probable” has no quantitative meaning.[19] In common legal parlance, by contrast, “probable” typically conveys a conclusion that is more likely than not. Another problem arises from IARC’s labeling of some agents as “probable human carcinogens” without any real evidence of carcinogenesis in humans. Regulatory pronouncements are even more diluted, and often involve little more than precautionary-principle wishcasting.[20]


[1] Christian W. Castile & Stephen J. McConnell, “Excluding Epidemiological Evidence Under FRE 702,” For The Defense 18 (June 2023) [Castile]. Although these authors provide an interesting overview of the subject, they fall into some common errors, such as failing to address Rule 703. The article is worth reading for its marshaling of recent case law on the subject, but I detail some of its errors here in the hope that lawyers will speak more precisely about the concepts involved in challenging medical causation opinions.

[2] Id. at 18. In re Zantac (Ranitidine) Prods. Liab. Litig., No. 2924, 2022 U.S. Dist. LEXIS 220327, at *401 (S.D. Fla. Dec. 6, 2022); see also Horwin v. Am. Home Prods., No. CV 00-04523 WJR (Ex), 2003 U.S. Dist. LEXIS 28039, at *14-15 (C.D. Cal. May 9, 2003) (“epidemiological studies provide the primary generally accepted methodology for demonstrating a causal relation between a chemical compound and a set of symptoms or disease” *** “The lack of epidemiological studies supporting Plaintiffs’ claims creates a high bar to surmount with respect to the reliability requirement, but it is not automatically fatal to their case.”).

[3] See, e.g., Siharath v. Sandoz Pharm. Corp., 131 F. Supp. 2d 1347, 1356 (N.D. Ga. 2001) (“epidemiology is the medical science devoted to determining the cause of disease in human beings”).

[4] See, e.g., Lopez v. Wyeth-Ayerst Labs., No. C 94-4054 CW, 1996 U.S. Dist. LEXIS 22739, at *1 (N.D. Cal. Dec. 13, 1996) (“Epidemiological evidence is one of the most valuable pieces of scientific evidence of causation”); Horwin v. Am. Home Prods., No. CV 00-04523 WJR (Ex), 2003 U.S. Dist. LEXIS 28039, at *15 (C.D. Cal. May 9, 2003) (“The lack of epidemiological studies supporting Plaintiffs’ claims creates a high bar to surmount with respect to the reliability requirement, but it is not automatically fatal to their case”).

[5] David A. Grimes & Kenneth F. Schulz, “Descriptive Studies: What They Can and Cannot Do,” 359 Lancet 145 (2002) (“…epidemiologists and clinicians generally use descriptive reports to search for clues of cause of disease – i.e., generation of hypotheses. In this role, descriptive studies are often a springboard into more rigorous studies with comparison groups. Common pitfalls of descriptive reports include an absence of a clear, specific, and reproducible case definition, and interpretations that overstep the data. Studies without a comparison group do not allow conclusions about cause of disease.”).

[6] Michael D. Green, D. Michal Freedman & Leon Gordis, “Reference Guide on Epidemiology,” in Reference Manual on Scientific Evidence 549, 564 n.48 (3d ed. 2011) (citing Michele Carbone et al., “Modern Criteria to Establish Human Cancer Etiology,” 64 Cancer Res. 5518, 5522 (2004), which the chapter misleadingly describes as a National Cancer Institute symposium although the publication was a paid advertisement by a group of scientists, for the conclusion that “[t]here should be no hierarchy [among different types of scientific methods to determine cancer causation]. Epidemiology, animal, tissue culture and molecular pathology should be seen as integrating evidences in the determination of human carcinogenicity.”).

[7] John B. Wong, Lawrence O. Gostin & Oscar A. Cabrera, “Reference Guide on Medical Testimony,” in Reference Manual on Scientific Evidence 687, 723 (3d ed. 2011).

[8] See, e.g., J.M. Elwood, Critical Appraisal of Epidemiological Studies and Clinical Trials 342 (3d ed. 2007).

[9] See Steven E. Nissen & Kathy Wolski, “Effect of Rosiglitazone on the Risk of Myocardial Infarction and Death from Cardiovascular Causes,” 356 New Engl. J. Med. 2457 (2007). See also “Learning to Embrace Flawed Evidence – The Avandia MDL’s Daubert Opinion” (Jan. 10, 2011).

[10] Philip D. Home, et al., “Rosiglitazone evaluated for cardiovascular outcomes in oral agent combination therapy for type 2 diabetes (RECORD): a multicentre, randomised, open-label trial,” 373 Lancet 2125 (2009).

[11] In re Zantac (Ranitidine) Prods. Liab. Litig., No. 2924, 2022 U.S. Dist. LEXIS 220327, at *402 (S.D. Fla. Dec. 6, 2022) (“Unlike experimental studies in which subjects are randomly assigned to exposed and placebo groups, observational studies are subject to bias due to the possibility of differences between study populations.”).

[12] Castile at 20.

[13] See, e.g., Benjamin Freedman, “Equipoise and the ethics of clinical research,” 317 New Engl. J. Med. 141 (1987).

[14] See, e.g., In re Onglyza (Saxagliptin) & Kombiglyze XR (Saxagliptin & Metformin) Prods. Liab. Litig., No. 5:18-md-2809-KKC, 2022 U.S. Dist. LEXIS 136955, at *127 (E.D. Ky. Aug. 2, 2022); Burleson v. Texas Dep’t of Criminal Justice, 393 F.3d 577, 585-86 (5th Cir. 2004) (affirming exclusion of expert causation testimony based solely upon studies showing a mere correlation between defendant’s product and plaintiff’s injury); Beyer v. Anchor Insulation Co., 238 F. Supp. 3d 270, 280-81 (D. Conn. 2017); Ambrosini v. Labarraque, 101 F.3d 129, 136 (D.C. Cir. 1996).

[15] Castile at 21. See In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., 26 F. Supp. 3d 449, 454-55 (E.D. Pa. 2014).

[16] “Bradford Hill on Statistical Methods” (Sept. 24, 2013); see also Frank C. Woodside, III & Allison G. Davis, “The Bradford Hill Criteria: The Forgotten Predicate,” 35 Thomas Jefferson L. Rev. 103 (2013).

[17] Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 295 (1965).

[18] Castile at 21. See, e.g., In re Onglyza (Saxagliptin) & Kombiglyze XR (Saxagliptin & Metformin) Prods. Liab. Litig., No. 5:18-md-2809-KKC, 2022 U.S. Dist. LEXIS 1821, at *43 (E.D. Ky. Jan. 5, 2022) (“The analysis is meant to apply when observations reveal an association between two variables. It addresses the aspects of that association that researchers should analyze before deciding that the most likely interpretation of [the association] is causation”); Hoefling v. U.S. Smokeless Tobacco Co., LLC, 576 F. Supp. 3d 262, 273 n.4 (E.D. Pa. 2021) (“Nor would it have been appropriate to apply them here: scientists are to do so only after an epidemiological association is demonstrated”).

[19] IARC Monographs on the Identification of Carcinogenic Hazards to Humans – Preamble 31 (2019) (“The terms probably carcinogenic and possibly carcinogenic have no quantitative significance and are used as descriptors of different strengths of evidence of carcinogenicity in humans.”).

[20] “Improper Reliance upon Regulatory Risk Assessments in Civil Litigation” (Mar. 19, 2023).

Reference Manual – Desiderata for 4th Edition – Part VI – Rule 703

February 17th, 2023

One of the most remarkable, and objectionable, aspects of the third edition was its failure to engage with Federal Rule of Evidence 703, and with the need for courts to assess the validity of the individual studies relied upon. The statistics chapter has a brief but important discussion of Rule 703, as does the chapter on survey evidence. The epidemiology chapter mentions Rule 703 only in a footnote.[1]

Rule 703 appears to be the red-headed stepchild of the Federal Rules; it is often ignored and omitted from so-called Daubert briefs.[2] Perhaps part of the problem is that Rule 703 (“Bases of an Expert’s Opinion Testimony”) is one of the most poorly drafted rules in the Federal Rules of Evidence:

“An expert may base an opinion on facts or data in the case that the expert has been made aware of or personally observed. If experts in the particular field would reasonably rely on those kinds of facts or data in forming an opinion on the subject, they need not be admissible for the opinion to be admitted. But if the facts or data would otherwise be inadmissible, the proponent of the opinion may disclose them to the jury only if their probative value in helping the jury evaluate the opinion substantially outweighs their prejudicial effect.”

Despite its tortuous wording, the rule is clear enough in authorizing expert witnesses to rely upon studies that are themselves inadmissible, and in allowing such witnesses to disclose the relied-upon studies when the proponent has shown that their probative value in helping the jury evaluate the opinion substantially outweighs their prejudicial effect.

The statistics chapter in the third edition, nonetheless, confusingly suggested that

“a particular study may use a method that is entirely appropriate but that is so poorly executed that it should be inadmissible under Federal Rules of Evidence 403 and 702. Or, the method may be inappropriate for the problem at hand and thus lack the ‘fit’ spoken of in Daubert. Or the study might rest on data of the type not reasonably relied on by statisticians or substantive experts and hence run afoul of Federal Rule of Evidence 703.”[3]

Particular studies, even when beautifully executed, are not themselves admissible. And particular studies are not subject to evaluation under Rule 702, apart from the gatekeeping of expert witness opinion testimony that is based upon those studies. To be sure, the reference to Rule 703 is an important and welcome counter to the suggestion, elsewhere in the third edition, that courts should not look at individual studies. The independent review of individual studies is occasionally lost in the shuffle of litigation, and the statistics chapter is correct to note the evidentiary concern whether each individual study may or may not be reasonably relied upon by an expert witness. In any event, reasonably relied-upon studies do not ipso facto become admissible.

The third edition’s chapter on Survey Research contains the most explicit direction on Rule 703, in terms of courts’ responsibilities.  In that chapter, the authors instruct that Rule 703:

“redirect[ed] attention to the ‘validity of the techniques employed’. The inquiry under Rule 703 focuses on whether facts or data are ‘of a type reasonably relied upon by experts in the particular field in forming opinions or inferences upon the subject’.”[4]

Although Rule 703 is clear enough on admissibility, the epidemiology chapter described epidemiologic studies broadly as admissible if sufficiently rigorous:

“An epidemiologic study that is sufficiently rigorous to justify a conclusion that it is scientifically valid should be admissible, as it tends to make an issue in dispute more or less likely.”[5]

The authors of the epidemiology chapter acknowledge, in a footnote, that “[h]earsay concerns may limit the independent admissibility of the study, but the study could be relied on by an expert in forming an opinion and may be admissible pursuant to Fed. R. Evid. 703 as part of the underlying facts or data relied on by the expert.”[6]

This footnote is curious, and incorrect. There is no question that hearsay “concerns” “may limit” admissibility of a study; hearsay is inadmissible unless a statutory exception applies.[7] Rule 703 is not among the exceptions to the rule against hearsay in Article VIII of the Federal Rules of Evidence. An expert witness’s reliance upon a study does not make the study admissible. The authors cite two cases,[8] but neither case held that reasonable reliance by expert witnesses transmuted epidemiologic studies into admissible evidence. The text of Rule 703 itself, and the overwhelming weight of case law interpreting and applying the rule,[9] make clear that the rule does not render scientific studies admissible. The two cases cited by the epidemiology chapter, Kehm and Ellis, both involved “factual findings” in public investigative or evaluative reports, which were independently admissible under Federal Rule of Evidence 803(8)(C).[10] As such, the cases fail to support the chapter’s suggestion that Rule 703 is a rule of admissibility for epidemiologic studies. The third edition thus, in one sentence, confused Rule 703 with an exception to the rule against hearsay, the rule that would otherwise keep statistically based epidemiologic studies from being received in evidence. The chapter’s point was reasonably clear, however, that studies “may be offered” to explain an expert witness’s opinion. Under Rule 705, that offer may also be refused.

The Reference Manual was certainly not alone in advancing the notion that studies are themselves admissible. Other well-respected evidence scholars have misstated the law on this issue.[11] The fourth edition would do well to note that scientific studies, and especially epidemiologic studies, involve multiple levels of hearsay. A typical epidemiologic study may contain hearsay leaps from patient to clinician, to laboratory technicians, to specialists interpreting test results, back to the clinician for a diagnosis, to a nosologist for disease coding, to a national or hospital database, to a researcher querying the database, to a statistician analyzing the data, to a manuscript that details data, analyses, and results, to editors and peer reviewers, back to study authors, and on to publication. Those leaps do not mean that the final results are thus untrustworthy or not reasonably relied upon, but they do raise well-nigh insuperable barriers to admissibility. The inadmissibility of scientific studies is generally not problematic because Rule 703 permits testifying expert witnesses to formulate opinions based upon facts and data, which are not themselves admissible in evidence. The distinction between relied upon, and admissible, studies is codified in the Federal Rules of Evidence, and in virtually every state’s evidence law.

The fourth edition might well also note that under Rule 104(a), the Rules of Evidence themselves do not govern a trial court’s preliminary determination, under Rules 702 or 703, of the admissibility of an expert witness’s opinion, or the appropriateness of reliance upon a particular study. Although Rule 705 may allow disclosure of facts and data described in studies, it is not an invitation to permit testifying expert witnesses to become a conduit for off-hand comments and opinions in the introduction or discussion sections of relied upon articles.[12] The wholesale admission of such hearsay opinions undermines the court’s control over opinion evidence. Rule 703 authorizes reasonable reliance upon “facts and data,” not every opinion that creeps into the published literature.

Reference Manual’s Disregard of Study Validity in Favor of the “Whole Tsumish”

The third edition evidenced considerable ambivalence about whether trial judges should engage in resolving disputes about the validity of individual studies relied upon by expert witnesses. Since 2000, Rule 702 has clearly required such engagement, which made the Manual’s hesitancy, on the whole, unjustifiable. The ambivalence with respect to study validity, however, was on full display in the late Professor Margaret Berger’s chapter, “The Admissibility of Expert Testimony.”[13] Berger’s chapter criticized “atomization,” or looking at individual studies in isolation, a process she described pejoratively as “slicing-and-dicing.”[14]

Drawing on the publications of Daubert-critic Susan Haack, Berger appeared to reject the notion that courts should examine the reliability of each study independently.[15] According to Berger, the “proper” scientific method, as evidenced by the work of the International Agency for Research on Cancer (IARC), the Institute of Medicine, the National Institutes of Health, the National Research Council, and the National Institute of Environmental Health Sciences, “is to consider all the relevant available scientific evidence, taken as a whole, to determine which conclusion or hypothesis regarding a causal claim is best supported by the body of evidence.”[16]

Berger’s description of the review process, however, was profoundly misleading in its incompleteness. Of course, scientists undertaking a systematic review identify all the relevant studies, but some of the “relevant” studies may well be insufficiently reliable (because of internal or external validity issues) to answer the research question at hand. All the cited agencies, and other research organizations and researchers, exclude studies that are fundamentally flawed, whether as a result of bias, confounding, erroneous data analyses, or related problems. Berger cited no support for her remarkable suggestion that scientists do not make “reliability” judgments about available studies when assessing the “totality of the evidence.”[17]

Professor Berger, who had a distinguished career as a law professor and evidence scholar, died in November 2010, before the third edition was published. She was no friend of Daubert,[18] but her antipathy remarkably outlived her. Berger’s critical discussion of “atomization” cited the notorious decision in Milward v. Acuity Specialty Products Group, Inc., 639 F.3d 11, 26 (1st Cir. 2011), which was decided four months after her passing.[19]

Professor Berger’s contention that courts should avoid assessments of individual studies in favor of the whole “tsumish” must also be rejected because Federal Rule of Evidence 703 requires that each study considered by an expert witness “qualify” for reasonable reliance by virtue of containing facts or data “of a type reasonably relied upon by experts in the particular field in forming opinions or inferences upon the subject.” One of the deeply troubling aspects of the Milward decision is that it reversed the trial court’s sensible decision to exclude a toxicologist, Dr. Martyn Smith, who outran his headlights on issues in a field in which he was clearly inexperienced – epidemiology.

Another curious omission in the third edition’s discussions of Milward is any mention of the dark ethical cloud of misconduct that hovers over the First Circuit’s reversal of the trial court’s exclusions of Martyn Smith and Carl Cranor. On appeal, the Council for Education and Research on Toxics (CERT) filed an amicus brief in support of reversing the exclusions. The CERT amicus brief, however, never disclosed that CERT was founded by Smith and Cranor, or that CERT funded Smith’s research.[20]

Rule 702 requires courts to pay attention to, among other things, the sufficiency of the facts and data relied upon by expert witnesses. Rule 703’s requirement that individual studies must be reasonably relied upon is an important additional protreptic against the advice given by Professor Berger, in the third edition.


[1] The index notes the following page references for Rule 703: 214, 361, 363-364, and 610 n.184.

[2] See David E. Bernstein & Eric G. Lasker, “Defending Daubert: It’s Time to Amend Federal Rule of Evidence 702,” 57 William & Mary L. Rev. 1, 32 (2015) (“Rule 703 is frequently ignored in Daubert analyses”); Schachtman, “Rule 703 – The Problem Child of Article VII,” 17 Proof 3 (Spring 2009); Schachtman, “The Effective Presentation of Defense Expert Witnesses and Cross-examination of Plaintiffs’ Expert Witnesses,” ALI-ABA Course on Opinion and Expert Witness Testimony in State and Federal Courts (February 14-15, 2008). See also Julie E. Seaman, “Triangulating Testimonial Hearsay: The Constitutional Boundaries of Expert Opinion Testimony,” 96 Georgetown L.J. 827 (2008); “RULE OF EVIDENCE 703 — Problem Child of Article VII” (Sept. 19, 2011); “Giving Rule 703 the Cold Shoulder” (May 12, 2012); “New Reference Manual on Scientific Evidence Short Shrifts Rule 703” (Oct. 16, 2011).

[3] RMSE3d at 214.

[4] RMSE3d at 364 (internal citations omitted).

[5] RMSE 3d at 610 (internal citations omitted).

[6] RMSE3d at 610 n.184.

[7] Rule 802 (“Hearsay Rule”): “Hearsay is not admissible except as provided by these rules or by other rules prescribed by the Supreme Court pursuant to statutory authority or by Act of Congress.”

[8] Kehm v. Procter & Gamble Co., 580 F. Supp. 890, 902 (N.D. Iowa 1982) (“These [epidemiologic] studies were highly probative on the issue of causation—they all concluded that an association between tampon use and menstrually related TSS [toxic shock syndrome] cases exists.”), aff’d, 724 F.2d 613 (8th Cir. 1984); Ellis v. International Playtex, Inc., 745 F.2d 292, 303 (4th Cir. 1984). The chapter also cited the en banc decision in Christophersen for the proposition that “[a]s a general rule, questions relating to the bases and sources of an expert’s opinion affect the weight to be assigned that opinion rather than its admissibility. . . . ” In the Christophersen case, the Fifth Circuit was clearly addressing the admissibility of the challenged expert witness’s opinions, not the admissibility of relied-upon studies. Christophersen v. Allied-Signal Corp., 939 F.2d 1106, 1111, 1113-14 (5th Cir. 1991) (en banc) (per curiam) (trial court may exclude opinion of expert witness whose opinion is based upon incomplete or inaccurate exposure data), cert. denied, 112 S. Ct. 1280 (1992).

[9] Interestingly, the authors of this chapter abandoned the suggestion, advanced in the second edition, that studies relied upon “might qualify for the learned treatise exception to the hearsay rule, Fed. R. Evid. 803(18), or possibly the catchall exceptions, Fed. R. Evid. 803(24) & 804(5).” RMSE 2d at 335 (2000). See also RMSE 3d at 214 (discussing statistical studies as generally “admissible,” but acknowledging that admissibility may be no more than permission to explain the basis for an expert’s opinion, which is hardly admissibility at all).

[10] See Ellis, 745 F.2d at 299-303; Kehm, 724 F.2d at 617-18. These holdings predated the Supreme Court’s 1993 decision in Daubert, and whether such agency findings are subject to Rule 702 has not been addressed. Federal agency factual findings have been known, on occasion, to be invalid.

[11] David L. Faigman, et al., Modern Scientific Evidence: The Law and Science of Expert Testimony v.1, § 23:1, at 206 (2009) (“Well conducted studies are uniformly admitted.”).

[12] Montori, et al., “Users’ guide to detecting misleading claims in clinical research reports,” 329 Br. Med. J. 1093, 1093 (2004) (advising readers on how to avoid being misled by published literature, and counseling readers to “Read only the Methods and Results sections; bypass the Discussion section.”)  (emphasis added).

[13] RMSE 3d 11 (2011).

[14] Id. at 19.

[15] Id. at 20 & n.51 (citing Susan Haack, “An Epistemologist in the Bramble-Bush: At the Supreme Court with Mr. Joiner,” 26 J. Health Pol. Pol’y & L. 217–37 (1999)).

[16] Id. at 19-20 & n.52.

[17] See Berger, “The Admissibility of Expert Testimony,” RMSE 3d 11 (2011).  Professor Berger never mentions Rule 703 at all!  Gone and forgotten.

[18] Professor Berger filed an amicus brief on behalf of plaintiffs, in Rider v. Sandoz Pharms. Corp., 295 F.3d 1194 (11th Cir. 2002).

[19] Id. at 20 n.51. (The editors note that the published chapter was Berger’s last revision, with “a few edits to respond to suggestions by reviewers.”) The addition of the controversial Milward decision cannot seriously be considered an “edit.”

[20] “From Here to CERT-ainty” (June 28, 2018); “The Council for Education and Research on Toxics” (July 9, 2013).

Reference Manual – Desiderata for 4th Edition – Part V – Specific Tortogens

February 14th, 2023

Examples are helpful to explain and to show judges how real scientists reach causal conclusions. The Reference Manual should certainly give such examples of how scientists determine whether a claim has been adequately tested, and whether the claim has eliminated the myriad kinds of error that threaten such claims and require us to withhold our assent. The third edition of the Manual, however, advances some dodgy examples, without any data or citations. I have already pointed out that the third edition’s reference to clear cell adenocarcinoma of the vagina in young women as a “signal” disease caused only by DES is incorrect.[1] There are, alas, other troubling examples in the third edition, which are due for pruning.

Claimed Interaction Between Asbestos and Tobacco Risks for Lung Cancer

The third edition’s chapter on epidemiology discusses the complexities raised by potential interaction between multiple exposures. The discussion appropriately suggests that a relative risk cannot be used to determine the probability of individual causation “if the agent interacts with another cause in a way that results in an increase in disease beyond merely the sum of the increased incidence due to each agent separately.” The suggestion is warranted, although the chapter is then mum on whether other approaches can be invoked to derive probabilities of causation when multiple exposures interact in a known way. The authors then provided an example:

“For example, the relative risk of lung cancer due to smoking is around 10, while the relative risk for asbestos exposure is approximately 5. The relative risk for someone exposed to both is not the arithmetic sum of the two relative risks, that is, 15, but closer to the product (50- to 60-fold), reflecting an interaction between the two.200 Neither of the individual agent’s relative risks can be employed to estimate the probability of causation in someone exposed to both asbestos and cigarette smoke.”[2]

Putting aside for the moment the general issue of interaction, the chapter’s use of the Mt. Sinai catechism of 5-10-50, for asbestos, tobacco smoking, and lung cancer, is a poor choice. The evidence for multiplicative interaction was advanced by the late Irving Selikoff, and frankly the evidence was never very good. The supposed “non-smokers” were really those who had “never smoked regularly,” and the smoking histories were taken by postcard surveys. The cohort of asbestos insulators was well aware of the study hypothesis; many of its members had compensation claims, and they had an interest in downplaying their smoking. Indeed, the asbestos workers’ union helped fund Selikoff’s work, and Selikoff had served as a testifying expert witness for claimants.

Given that “never smoked regularly” is not the same as never having smoked, and given that the ten-fold risk ratio was already an underestimate of the lung cancer risk from smoking alone, the multiplicative model never rested on a firm basis. The smoking-alone risk ratio roughly doubled between the American Cancer Society’s Cancer Prevention Studies One and Two, but the Mt. Sinai physicians, who frequently testified in lawsuits for claimants, steadfastly held to their outdated statistical control group.[3] It is thus disturbing that the third edition’s authors trotted out a summary of asbestos and smoking lung cancer risks based upon Selikoff’s dodgy studies of asbestos insulation workers. The 5-10-50 dogma was already incorrect when the first edition went to press.

Not only were Selikoff’s studies probably incorrect when originally published, but updates to the insulation worker cohort published after his death specifically undermined the multiplicative claim. In a 2013 publication by Selikoff’s successors, asbestos and smoking failed to show multiplicative interaction. Indeed, occupational asbestos exposure that had not manifested in clinically apparent asbestosis did not show any interaction with smoking. Only in a subgroup of insulators with clinically detectable asbestosis did asbestos exposure and smoking show “supra-additive” (but not multiplicative) interaction.[4]
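The arithmetic of the competing “no interaction” reference models is simple enough to set out explicitly. The sketch below uses the chapter’s round numbers (smoking RR = 10, asbestos RR = 5); note that the additive null for two relative risks is their sum less one (here 14), not the bare sum of 15 recited in the chapter.

```python
# A worked sketch of the two standard "no interaction" reference models,
# using the chapter's round numbers for smoking and asbestos.
rr_smoking, rr_asbestos = 10.0, 5.0

# Additive null: excess risks add, so the expected joint RR is
# RR1 + RR2 - 1 (not the bare sum of the two relative risks).
rr_joint_additive = rr_smoking + rr_asbestos - 1        # 14.0

# Multiplicative null: the relative risks multiply.
rr_joint_multiplicative = rr_smoking * rr_asbestos      # 50.0

print(f"expected joint RR, additive null:       {rr_joint_additive}")
print(f"expected joint RR, multiplicative null: {rr_joint_multiplicative}")

# An observed joint RR well above 14 but short of 50 -- "supra-additive" --
# exceeds the additive null without reaching the multiplicative one, which
# is the pattern the 2013 cohort update reported for insulators with
# clinically detectable asbestosis.
```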

Manganese and Parkinson’s Disease

Table 1 of the toxicology chapter in the third edition presented a “Sample of Selected Toxicological End Points and Examples of Agents of Concern in Humans.” The authors cautioned that the table was “not an exhaustive or inclusive list of organs, end points, or agents. Absence from this list does not indicate a relative lack of evidence for a causal relation as to any agent of concern.”[5] Among the examples presented in Table 1 was neurotoxicity in the form of “Parkinson’s disease and manganese.”[6]

The presence of this example in Table 1 is curious on a number of fronts. First, one of the members of the Development Committee for the third edition was Judge Kathleen O’Malley, who presided over a multi-district litigation involving claims for parkinsonism and Parkinson’s disease against manufacturers of welding rods. It seemed unlikely that Judge O’Malley would have overlooked this section. See, e.g., In re Welding Fume Prods. Liab. Litig., 245 F.R.D. 279 (N.D. Ohio 2007) (exposure to manganese fumes allegedly increased the risk of later developing brain damage). More important, however, the authors’ inclusion of Parkinson’s disease as an outcome of manganese exposure is remarkable because that putative relationship has been extensively studied and rejected by leading researchers in the field of movement disorders.[7] In 2012, neuro-epidemiologists published a comprehensive meta-analysis that confirmed the absence of a relationship between manganese exposure and Parkinson’s disease.[8] The inclusion in Table 1 of a highly controversial relationship, manganese and Parkinson’s disease, suggests either undisclosed partisanship or ignorance of the relevant scientific evidence.

Mesothelioma

The toxicology chapter of the third edition also weighed in on mesothelioma as a supposed signature disease of asbestos exposure. The chapter’s authors described mesothelioma as “almost always caused by asbestos,”[9] which was no doubt true when mesothelioma was first identified as caused by fibrous amphibole minerals.[10] The last two decades, however, have seen a shift in the incidence of mesothelioma among industrially exposed workers, revealing more cases without asbestos exposure and with other potential causes. Leading scientists in the field have acknowledged non-asbestos causes,[11] and researchers have recently identified genetic mutations that completely account for the causation of individual cases of mesothelioma.[12] It is time for the fourth edition to acknowledge other causes of mesothelioma, and to offer judges and lawyers guidance on genetic causes of sporadic diseases.


[1] See “Reference Manual – Desiderata for the Fourth Edition – Signature Disease” (Jan. 30, 2023).

[2] RMSE3d at 615 & n. 200. The chapter fails to cite support for the 5-10-50 dogma, but it is readily recognizable as the Mt. Sinai Catechism that was endlessly repeated by Irving Selikoff and his protégés.

[3] Michael J. Thun, Cathy A. Day-Lally, Eugenia E. Calle, W. Dana Flanders, and Clark W. Heath, “Excess mortality among cigarette smokers: Changes in a 20-year interval,” 85 Am. J. Public Health 1223 (1995).

[4] Steve Markowitz, Stephen Levin, Albert Miller, and Alfredo Morabia, “Asbestos, Asbestosis, Smoking and Lung Cancer: New Findings from the North American Insulator Cohort,” 188 Am. J. Respir. & Critical Care Med. 90 (2013); see “The Mt. Sinai Catechism” (June 7, 2013).

[5] RMSE3d at 653-54.

[6] RMSE3d at 653.

[7] See, e.g., Karin Wirdefeldt, Hans-Olaf Adami, Philip Cole, Dimitrios Trichopoulos, and Jack Mandel, “Epidemiology and etiology of Parkinson’s disease: a review of the evidence,” 26 European J. Epidemiol. S1, S20-21 (2011); Tomas R. Guilarte, “Manganese and Parkinson’s Disease: A Critical Review and New Findings,” 118 Environ. Health Perspect. 1071, 1078 (2010) (“The available evidence from human and nonhuman primate studies using behavioral, neuroimaging, neurochemical, and neuropathological end points provides strong support to the hypothesis that, although excess levels of [manganese] accumulation in the brain results in an atypical form of parkinsonism, this clinical outcome is not associated with the degeneration of nigrostriatal dopaminergic neurons as is the case in PD [Parkinson’s disease].”).

[8] James Mortimer, Amy Borenstein, and Lorene Nelson, “Associations of welding and manganese exposure with Parkinson disease: Review and meta-analysis,” 79 Neurology 1174 (2012).

[9] Bernard D. Goldstein & Mary Sue Henifin, “Reference Guide on Toxicology,” RMSE3d 633, 635 (2011).

[10] See J. Christopher Wagner, C.A. Sleggs, and Paul Marchand, “Diffuse pleural mesothelioma and asbestos exposure in the North Western Cape Province,” 17 Br. J. Indus. Med. 260 (1960); J. Christopher Wagner, “The discovery of the association between blue asbestos and mesotheliomas and the aftermath,” 48 Br. J. Indus. Med. 399 (1991); see also Harriet Hardy, M.D., Challenging Man-Made Disease:  The Memoirs of Harriet L. Hardy, M.D. 95 (1983); “Harriet Hardy’s Views on Asbestos Issues” (Mar. 13, 2013).

[11] Richard L. Attanoos, Andrew Churg, Allen R. Gibbs, and Victor L. Roggli, “Malignant Mesothelioma and Its Non-Asbestos Causes,” 142 Arch. Pathol. & Lab. Med. 753 (2018).

[12] Angela Bononi, Qian Wang, Alicia A. Zolondick, Fang Bai, Mika Steele-Tanji, Joelle S. Suarez, Sandra Pastorino, Abigail Sipes, Valentina Signorato, Angelica Ferro, Flavia Novelli, Jin-Hee Kim, Michael Minaai, Yasutaka Takinishi, Laura Pellegrini, Andrea Napolitano, Ronghui Xu, Christine Farrar, Chandra Goparaju, Cristian Bassi, Massimo Negrini, Ian Pagano, Greg Sakamoto, Giovanni Gaudino, Harvey I. Pass, José N. Onuchic, Haining Yang, and Michele Carbone, “BAP1 is a novel regulator of HIF-1α,” 120 Proc. Nat’l Acad. Sci. e2217840120 (2023).