TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Zhang’s Glyphosate Meta-Analysis Succumbs to Judicial Scrutiny

August 5th, 2024

Back in March 2015, the International Agency for Research on Cancer (IARC) issued its working group’s monograph on glyphosate weed killer. The report classified glyphosate as a “probable carcinogen,” a label that is highly misleading. For IARC, “probable” does not mean more likely than not; indeed, the term carries no quantitative meaning at all. The all-important statement of IARC methods, “The Preamble,” makes this clear.[1]

In the case of glyphosate, the IARC working group concluded that the epidemiologic evidence for an association between glyphosate exposure and cancer, specifically non-Hodgkin’s lymphoma (NHL), was “limited,” which is IARC’s euphemism for insufficient. Instead of epidemiology, IARC’s glyphosate conclusion rested largely upon rodent studies, but even the animal evidence relied upon by IARC was dubious. The IARC working group cherry-picked a few arguably “positive” rodent study results with increases in tumors, while ignoring exculpatory rodent studies with decreasing tumor yield.[2]

Although the IARC hazard classification was uncritically embraced by the lawsuit industry, most regulatory agencies, even indulging precautionary-principle reasoning, rejected the claim of carcinogenicity. The United States Environmental Protection Agency (EPA), the European Food Safety Authority, the Food and Agriculture Organization (in conjunction with the World Health Organization), the European Chemicals Agency, Health Canada, and the German Federal Institute for Risk Assessment, among others, found that the scientific evidence did not support the claim that glyphosate causes NHL. Very quickly after publication, the IARC monograph became the proximate cause of a huge litigation effort by the lawsuit industry against Monsanto.

The personal injury cases against Monsanto, filed in federal court, were aggregated for pre-trial proceedings before Judge Vince Chhabria, of the Northern District of California, as MDL 2741. Judge Chhabria denied Monsanto’s early Rule 702 motions, and the cases proceeded to trial, with mixed results.

In 2019, the Zhang study, a curious meta-analysis of some of the available glyphosate epidemiologic studies, appeared in Mutation Research / Reviews in Mutation Research, a toxicology journal that seemed an unlikely venue for a meta-analysis of epidemiologic studies. The authors combined selected results from one large cohort study, the Agricultural Health Study, with five case-control studies, to reach a summary relative risk of 1.41 (95% confidence interval, 1.13-1.75).[3] According to the authors, their “current meta-analysis of human epidemiological studies suggests a compelling link between exposures to GBHs [glyphosate-based herbicides] and increased risk for NHL.”

The Zhang meta-analysis was not well reviewed in regulatory and scientific circles. The EPA found that Zhang had used inappropriate methods in her meta-analysis.[4] Academic authors also panned the Zhang meta-analysis, in both scholarly[5] and popular articles.[6] The senior author of the Zhang paper, Lianne Sheppard, a professor in the University of Washington Departments of Environmental and Occupational Health Sciences, and of Biostatistics, attempted to defend the study in Forbes.[7] Professor Geoffrey Kabat very adeptly showed that this defense was futile.[8] Despite the very serious and real objections to the validity of the Zhang meta-analysis, plaintiffs’ expert witnesses continued to rely upon it; Beate Ritz, an epidemiologist at U.C.L.A., for example, testified that she trusted and relied upon the analysis.[9]

For five years, the Zhang study was a debating point for lawyers and expert witnesses in the glyphosate litigation, without significant judicial gatekeeping. It took the entrance of Luoping Zhang herself as an expert witness in the glyphosate litigation, and the procedural oddity of her placing exclusive reliance upon her own meta-analysis, to bring the meta-analysis into the unforgiving light of judicial scrutiny.

Zhang is a biochemist and toxicologist at the University of California, Berkeley. Along with two other co-authors of her 2019 meta-analysis paper, she had been a member of the EPA’s 2016 scientific advisory panel on glyphosate. After plaintiffs’ counsel designated Zhang as an expert witness, she disclosed her anticipated testimony, as required by Federal Rule of Civil Procedure 26, by attaching and adopting by reference the contents of two of her published papers. The first paper was her 2019 meta-analysis; the other discussed putative mechanisms. Neither paper concluded that glyphosate causes NHL. Zhang’s disclosure did not add materially to her 2019 published analysis of six epidemiologic studies on glyphosate and NHL.

The defense challenged the validity of Dr. Zhang’s proffered opinions, and her exclusive reliance upon her own 2019 meta-analysis required the MDL court to confront the failings of that paper, which had previously escaped critical judicial scrutiny. In June 2024, after an oral hearing in Bulone v. Monsanto, at which Dr. Zhang testified, Judge Chhabria ruled that Zhang’s proffered testimony, resting as it did upon her own meta-analysis, was “junk science.”[10]

Judge Chhabria, perhaps encouraged by the recent fortifying amendment to Rule 702, issued a remarkable opinion that paid close attention to the indicia of validity of an expert witness’s opinion and the underlying meta-analysis. He quickly spotted the disconnect between Zhang’s published papers and what is required for an admissible causation opinion. The mechanism paper did not address the extant epidemiology, and both sides in the MDL had emphasized that the epidemiology was critically important for determining whether there was, or was not, causation.

Zhang’s meta-analysis did evaluate some, but not all, of the available epidemiology, but the paper’s conclusion stopped considerably short of the needed opinion on causation. Zhang and colleagues had concluded that there was a “compelling link” between exposures to glyphosate-based herbicides and increased risk for NHL. In their paper’s key figure, showcasing the summary estimate of relative risk of 1.41 (95% C.I., 1.13-1.75), Zhang and her co-authors concluded only that exposure was “associated with an increased risk of NHL.” According to Judge Chhabria, in incorporating her 2019 paper into her Rule 26 report, Zhang failed to add a proper holistic causation analysis, as other expert witnesses had done by considering the Bradford Hill predicates and considerations.

Judge Chhabria picked up on another problem, one with both legal and scientific implications. A meta-analysis becomes out of date as soon as a subsequent epidemiologic study that would have satisfied its inclusion criteria becomes available. Since Zhang published her meta-analysis in 2019, additional studies had in fact been published. At the hearing, Dr. Zhang acknowledged that several of them would qualify for inclusion in the meta-analysis under her own stated methods. Her failure to update the meta-analysis made her report incomplete and inadmissible in a court matter in 2024.

Judge Chhabria might have stopped there, but he took a closer look at the meta-analysis to explore whether it was a valid analysis, on its own terms. Much as Chief Judge Nancy Rosenstengel had done with the made-for-litigation meta-analysis concocted by Martin Wells in the paraquat litigation,[11] Judge Chhabria examined whether Zhang had been faithful to her own stated methods. Like Chief Judge Rosenstengel’s analysis, Judge Chhabria’s analysis stands as a strong rebuttal to the uncharitable opinion of Professor Edward Cheng, who has asserted that judges lack the expertise to evaluate the “expert opinions” before them.[12]

Judge Chhabria accepted the intellectual challenge that Rule 702 mandates. With the EPA memorandum lighting the way, Judge Chhabria readily discerned that “the challenged meta-analysis was not reliably performed.” He declared that the Zhang meta-analysis was “junk science,” with “deep methodological problems.”

Zhang claimed that she was basing the meta-analysis on the subgroups of six studies with the heaviest glyphosate exposure. This claim was undermined by the absence of any exposure-response gradient in the study deemed by Zhang to be of the highest quality. Furthermore, of the remaining five studies, three provided no exposure-dependent analysis at all, only a comparison of NHL rates between “ever” and “never” glyphosate exposure. As a result of this heterogeneity, Zhang used all the data from the studies without exposure characterizations, but only limited data from the studies that analyzed NHL by exposure level. And because the highest-quality study was among those that provided exposure-level analyses, Zhang’s meta-analysis used only some of its data.

The analytical problems created by Zhang’s meta-analytical approach were compounded by the included studies’ having measured glyphosate exposures differently, with different cut-points for what counted as heavy exposure. Some of the excluded study participants would have had heavier exposure than some of those included in the summary analysis.

In the universe of included studies, some provided adjusted results from multivariate analyses that accounted for other pesticide exposures. Other studies reported only unadjusted results. Even though Zhang’s stated method expressed a preference for adjusted analyses, she inexplicably failed to use the adjusted data from one study that provided both adjusted and unadjusted results.

As shown in Judge Chhabria’s review, Zhang’s methodological errors created an incoherent analysis, with methods that could not be justified. Even on its own stated methodology, the meta-analysis was an exercise in cherry picking. In the court’s terms, it was, without qualification, “junk science.”

After the filing of briefs, Judge Chhabria provided the parties an oral hearing, with an opportunity for viva voce testimony. Dr. Zhang thus had a full opportunity to defend her meta-analysis. The hearing, however, did not go well for her. Zhang could not speak intelligently about the studies included, or about how they defined high exposure. Zhang’s lack of familiarity with her own opinion and published paper was yet another reason for excluding her testimony.

As might be expected, plaintiffs’ counsel attempted to hide behind peer review. They tried to shut down Rule 702 scrutiny of the Zhang meta-analysis by suggesting that the trial court had no business digging into validity concerns, given that Zhang had published her meta-analysis in what was apparently a peer-reviewed journal. Judge Chhabria would have none of it. In his opinion, publication in a peer-reviewed journal cannot obscure the glaring methodological defects of the relied-upon meta-analysis. The court observed that “[p]re-publication editorial peer review, just by itself, is far from a guarantee of scientific reliability.”[13] The EPA memorandum was thus a more telling indicator of the validity issues than was publication in a nominally peer-reviewed journal.

Contrary to some law professors who now seek to dismantle expert witness gatekeeping as beyond a judge’s competence, Judge Chhabria dismissed the suggestion that he lacked the expertise to adjudicate the validity issues. Indeed, he displayed a better understanding of the meta-analytic process than did Dr. Zhang. As the court observed, one of the goals of MDL assignments is to permit a single trial judge the time to engage with the scientific issues and to develop “fluency” in the relevant scientific studies. When MDL judges have the fluency in scientific concepts needed to address Rule 702 or 703 issues, it would be criminal for them not to use it.

The Bulone opinion should encourage lawyers to get “into the weeds” of expert witness opinions. There is nothing that a little clear thinking – and glyphosate – cannot clear away. Indeed, now that the weeds of Zhang’s meta-analysis are cleared away, it is hard to fathom that any other expert witness can rely upon it without running afoul of both Federal Rules of Evidence 702 and 703.

There were a few issues not addressed in Bulone. As her oral hearing testimony suggested, Zhang probably lacked the qualifications to proffer the meta-analysis. The bar for qualification as an expert witness, however, is sadly very low. One other issue that might well have been addressed is Zhang’s use of a fixed-effect model for her meta-analysis. Considering that she was pooling data from cohort and case-control studies, some with and some without adjustments for confounders, with different measures of exposure, and some with and some without exposure-dependent analyses, Zhang and her co-authors were not justified in using a fixed-effect model to arrive at a summary estimate of relative risk. Admittedly, this error could easily have been lost in the flood of others.
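The stakes of that modeling choice are easy to illustrate. The sketch below uses hypothetical study results, not Zhang’s actual data, to contrast an inverse-variance fixed-effect pool with a DerSimonian-Laird random-effects pool of relative risks recovered from reported confidence intervals. With heterogeneous inputs, the two models distribute the study weights differently and can produce different summary relative risks.

import math

# Hypothetical inputs: (relative risk, lower 95% CI, upper 95% CI) per study.
studies = [
    (2.0, 1.1, 3.6),   # small case-control, "high exposure" subgroup
    (1.5, 0.9, 2.5),   # case-control, ever/never exposure only
    (1.8, 1.0, 3.2),   # case-control, unadjusted
    (1.1, 0.8, 1.5),   # large cohort, adjusted
    (1.4, 0.7, 2.8),   # case-control
    (1.3, 0.6, 2.8),   # case-control
]

def log_rr_and_variance(rr, lo, hi):
    # Recover the log relative risk and its variance from the reported 95% CI.
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
    return math.log(rr), se ** 2

y, v = zip(*[log_rr_and_variance(*s) for s in studies])

# Fixed-effect model: weight each study by 1/variance (assumes one true effect).
w_fe = [1 / vi for vi in v]
pooled_fe = sum(wi * yi for wi, yi in zip(w_fe, y)) / sum(w_fe)

# DerSimonian-Laird random-effects model: add the between-study variance tau^2.
q = sum(wi * (yi - pooled_fe) ** 2 for wi, yi in zip(w_fe, y))
c = sum(w_fe) - sum(wi ** 2 for wi in w_fe) / sum(w_fe)
tau2 = max(0.0, (q - (len(studies) - 1)) / c)
w_re = [1 / (vi + tau2) for vi in v]
pooled_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)

print(f"fixed-effect pooled RR:   {math.exp(pooled_fe):.2f}")
print(f"random-effects pooled RR: {math.exp(pooled_re):.2f} (tau^2 = {tau2:.3f})")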

Postscript

Glyphosate is not merely a scientific issue. Its manufacturer, Monsanto, is the frequent target of media outlets (such as Telesur) from autocratic countries, such as Communist China and its client state, Venezuela.[14]

Long live the heroes of Tiananmen Square.


[1] “The IARC-hy of Evidence – Incoherent & Inconsistent Classifications of Carcinogenicity,” Tortini (Sept. 19, 2023).

[2] Robert E Tarone, “On the International Agency for Research on Cancer classification of glyphosate as a probable human carcinogen,” 27 Eur. J. Cancer Prev. 82 (2018).

[3] Luoping Zhang, Iemaan Rana, Rachel M. Shaffer, Emanuela Taioli, Lianne Sheppard, “Exposure to glyphosate-based herbicides and risk for non-Hodgkin lymphoma: A meta-analysis and supporting evidence,” 781 Mutation Research/Reviews in Mutation Research 186 (2019).

[4] David J. Miller, Acting Chief, Toxicology and Epidemiology Branch, Health Effects Division, U.S. Environmental Protection Agency, Memorandum to Christine Olinger, Chief, Risk Assessment Branch I, “Glyphosate: Epidemiology Review of Zhang et al. (2019) and Leon et al. (2019) publications for Response to Comments on the Proposed Interim Decision” (Jan. 6, 2020).

[5] Geoffrey C. Kabat, William J. Price, Robert E. Tarone, “On recent meta-analyses of exposure to glyphosate and risk of non-Hodgkin’s lymphoma in humans,” 32 Cancer Causes & Control 409 (2021).

[6] Geoffrey Kabat, “Paper Claims A Link Between Glyphosate And Cancer But Fails To Show Evidence,” Science 2.0 (Feb. 18, 2019).

[7] Lianne Sheppard, “Glyphosate Science is Nuanced. Arguments about it on the Internet? Not so much,” Forbes (Feb. 20, 2020).

[8] Geoffrey Kabat, “EPA Refuted A Meta-Analysis Claiming Glyphosate Can Cause Cancer And Senior Author Lianne Sheppard Doubled Down,” Science 2.0 (Feb. 26, 2020).

[9] Maria Dinzeo, “Jurors Hear of New Study Linking Roundup to Cancer,” Courthouse News Service (April 8, 2019).

[10] Bulone v. Monsanto Co., Case No. 16-md-02741-VC, MDL 2741 (N.D. Cal. June 20, 2024). See Hank Campbell, “Glyphosate legal update: Meta-study used by ambulance-chasing tort lawyers targeting Bayer’s Roundup as carcinogenic deemed ‘junk science nonsense’ by trial judge,” Genetic Literacy Project (June 24, 2024).

[11] In re Paraquat Prods. Liab. Litig., No. 3:21-MD-3004-NJR, 2024 WL 1659687 (S.D. Ill. Apr. 17, 2024) (opinion sur Rule 702 motion), appealed sub nom., Fuller v. Syngenta Crop Protection, LLC, No. 24-1868 (7th Cir. May 17, 2024). See “Paraquat Shape-Shifting Expert Witness Quashed,” Tortini (April 24, 2024).

[12] Edward K. Cheng, “The Consensus Rule: A New Approach to Scientific Evidence,” 75 Vanderbilt L. Rev. 407 (2022). See “Cheng’s Proposed Consensus Rule for Expert Witnesses,” Tortini (Sept. 15, 2022); “Further thoughts on Cheng’s Consensus Rule,” Tortini (Oct. 3, 2022).

[13] Bulone, citing Valentine v. Pioneer Chlor Alkali Co., 921 F. Supp. 666, 674-76 (D. Nev. 1996), for its distinction between “editorial peer review” and “true peer review,” with the latter’s inclusion of post-publication assessment of a paper as really important for Rule 702 purposes.

[14] Anne Applebaum, Autocracy, Inc.: The Dictators Who Want to Run the World 66 (2024).

Paraquat Shape-Shifting Expert Witness Quashed

April 24th, 2024

Another multi-district litigation (MDL) has hit a jarring speed bump. Claims for Parkinson’s disease (PD), allegedly caused by exposure to paraquat dichloride (paraquat), were consolidated, in June 2021, for pre-trial coordination in MDL No. 3004, in the Southern District of Illinois, before Chief Judge Nancy J. Rosenstengel. Like many health-effects litigation claims, the plaintiffs’ claims in these paraquat cases turn on epidemiologic evidence. To make their causation case in the first MDL trial cases, plaintiffs’ counsel nominated a statistician, Martin T. Wells. Last week, Judge Rosenstengel found Wells’ opinion so infected by invalid methodologies and inferences as to be inadmissible under the most recent version of Rule 702.[1] Summary judgment in the trial cases followed.[2]

Back in the 1980s, paraquat gained some legal notoriety in one of the most retrograde Rule 702 decisions.[3] Both the herbicide and Rule 702 survived, however, and both remain in wide use. For the last two decades, there have been widespread challenges to the safety of paraquat, and in particular claims that paraquat can cause PD or parkinsonism under some circumstances. Despite this background, the plaintiffs’ counsel in MDL 3004 began with four problems.

First, paraquat is closely regulated for agricultural use in the United States. Under federal law, paraquat can be used to control the growth of weeds only “by or under the direct supervision of a certified applicator.”[4] The regulatory record created an uphill battle for plaintiffs.[5] Under the Federal Insecticide, Fungicide, and Rodenticide Act (“FIFRA”), the U.S. EPA has regulatory and enforcement authority over the use, sale, and labeling of paraquat.[6] As part of its regulatory responsibilities, in 2019, the EPA systematically reviewed available evidence to assess whether there was an association between paraquat and PD. The agency’s review concluded that “there is limited, but insufficient epidemiologic evidence at this time to conclude that there is a clear associative or causal relationship between occupational paraquat exposure and PD.”[7] In 2021, the EPA issued its Interim Registration Review Decision, and reapproved the registration of paraquat. In doing so, the EPA concluded that “the weight of evidence was insufficient to link paraquat exposure from pesticidal use of U.S. registered products to Parkinson’s disease in humans.”[8]

Second, beyond the EPA, there were no other published reviews, systematic or otherwise, which reached a conclusion that paraquat causes PD.[9]

Third, the plaintiffs’ claims faced another serious impediment. Their counsel placed their reliance upon Professor Martin Wells, a statistician on the faculty of Cornell University. Unfortunately for plaintiffs, Wells has been known to operate as a “cherry picker,” and his methodology had previously been reviewed in an unfavorable light. Another MDL court, which had examined a review and meta-analysis propounded by Wells, found that his reports “were marred by a selective review of data and inconsistent application of inclusion criteria.”[10]

Fourth, the plaintiffs’ claims were before Chief Judge Nancy J. Rosenstengel, who was willing to do the hard work required under Rule 702, especially as it has been recently amended to clarify and emphasize the gatekeeper’s responsibility to evaluate validity issues in the proffered opinions of expert witnesses. As her 97-page decision evinces, Judge Rosenstengel conducted four days of hearings, which included viva voce testimony from Martin Wells, and she obviously read the underlying papers and reviews, as well as the briefs and the Reference Manual on Scientific Evidence, with great care. What followed did not go well for Wells or the plaintiffs’ claims.[11] Judge Rosenstengel has written an opinion that may be the first careful judicial consideration of the basic requirements of a systematic review.

The court noted that systematic reviewers carefully define a research question and what kinds of empirical evidence will be reviewed, and then collect, summarize, and, if feasible, synthesize the available evidence into a conclusion.[12] The court emphasized that systematic reviewers should “develop a protocol for the review before commencement and adhere to the protocol regardless of the results of the review.”[13]

Wells proffered a meta-analysis, and a “weight of the evidence” (WOE) review from which he concluded that paraquat causes PD and nearly triples the risk of the disease among workers exposed to the herbicide.[14] In his reports, Wells identified a universe of at least 36 studies, but included seven in his meta-analysis. The defense had identified another two studies that were germane.[15]

Chief Judge Rosenstengel’s opinion is noteworthy for its fine attention to detail, detail that matters to the validity of the expert witness’s enterprise. Martin Wells set out to do a meta-analysis, which was all fine and good. With a universe of 36 studies, with sub-findings, alternative analyses, and changing definitions of relevant exposure, the devil lay in the details.

The MDL court was careful to point out that it was not gainsaying Wells’ decision to limit his meta-analysis to case-control studies, or to his grading of any particular study as being of low quality. Systematic reviews and meta-analyses are generally accepted techniques that are part of a scientific approach to causal inference, but each has standards, predicates, and requirements for valid use. Expert witnesses must not only use a reliable methodology, Rule 702(d) requires that they must reliably apply their chosen methodology to the facts at hand in reaching their conclusions.[16]

The MDL court concluded that Wells’ meta-analysis was not sufficiently reliable under Rule 702 because he failed faithfully and reliably to apply his own articulated methodology. The court followed Wells’ lead in identifying the source and content of his chosen methodology, and simply examined his proffered opinion for compliance with that methodology.[17] The basic principles of validity for conducting meta-analyses were not, in any event, really contested. These principles and requirements were clearly designed to ensure and enhance the reliability of meta-analyses by pre-empting results-driven, reverse-engineered summary estimates of association.

The court found that Wells failed clearly to pre-specify his eligibility criteria. He then proceeded to redefine his exposure, study inclusion, and study quality criteria after looking at the evidence. He also inconsistently applied his stated criteria, all in an apparent effort to exclude less favorable study outcomes. These ad hoc steps were some of Wells’ deviations from the standards to which he paid lip service.

The court did not exclude Wells because it disagreed with his substantive decisions to include or exclude any particular study, or with his quality grading of any study. Rather, Wells’ meta-analysis did not pass muster under Rule 702 because its methodology was unclear, inconsistently applied, not replicable, and at times transparently reverse-engineered.[18]

The court’s evaluation of Wells was unflinchingly critical. Wells’ proffered opinions “required several methodological contortions and outright violations of the scientific standards he professed to apply.”[19] From his first involvement in this litigation, Wells had violated the basic rules of conducting systematic reviews and meta-analyses.[20] His definition of “occupational” exposure meandered to suit his desire to include one study (with low variance) that might otherwise have been excluded.[21] Rather than pre-specifying his review process, his study inclusion criteria, and his quality scores, Wells engaged in an unwritten “holistic” review process, which he conceded was not objectively replicable. Wells’ approach left him free to include studies he wanted in his meta-analysis, and then provide post hoc justifications.[22] His failure to identify his inclusion/exclusion criteria was a “methodological red flag” in Dr. Wells’ meta-analysis, which suggested his reverse engineering of the whole analysis, the “very antithesis of a systematic review.”[23]

In what the court described as “methodological shapeshifting,” Wells blatantly and inconsistently graded studies he wanted to include, and had already decided to include in his meta-analysis, to be of higher quality.[24] The paraquat MDL court found, unequivocally, that Wells had “failed to apply the same level of intellectual rigor to his work in the four trial selection cases that would be required of him and his peers in a non-litigation setting.”[25]

It was also not lost upon the MDL court that Wells had shifted from a fixed-effect to a random-effects meta-analysis between his principal and rebuttal reports.[26] Basic to the meta-analytical enterprise is a predicate systematic review, properly done, with pre-specification of inclusion and exclusion criteria for what studies will go into any meta-analysis. The MDL court noted that both sides had cited Borenstein’s textbook on meta-analysis,[27] and that Wells had himself cited the Cochrane Handbook[28] for the basic proposition that objective and scientifically valid study selection criteria should be clearly stated in advance to ensure the objectivity of the analysis.

There was of course legal authority for this basic proposition about prespecification. Given that the selection of studies that go into a systematic review and meta-analysis can be dispositive of its conclusion, undue subjectivity or ad hoc inclusion can easily arrange a desired outcome.[29] Furthermore, meta-analysis carries with it the opportunity to mislead a lay jury with a single (and inflated) risk ratio,[30] obtained by the operator’s manipulation of inclusion and exclusion criteria. This opportunity required the MDL court to examine carefully the methodological rigor of the proffered meta-analysis, to evaluate whether it reflected a valid pooling of data or was concocted to win a case.[31]

Martin Wells had previously acknowledged the dangers of manipulation and subjective selectivity inherent in systematic reviews and meta-analyses. The MDL court quoted from Wells’ testimony in Martin v. Actavis:

QUESTION: You would certainly agree that the inclusion-exclusion criteria should be based upon objective criteria and not simply because you were trying to get to a particular result?

WELLS: No, you shouldn’t load the – sort of cook the books.

QUESTION: You should have prespecified objective criteria in advance, correct?

WELLS: Yes.[32]

The MDL court also picked up on a subtle but important methodological point about which odds ratio to use in a meta-analysis when a study provides multiple analyses of the same association. In his first paraquat deposition, Wells cited the Cochrane Handbook for the proposition that if a study presents both a crude risk ratio and a risk ratio from a multivariate analysis, then the adjusted risk ratio (and its corresponding measure of standard error, seen in its confidence interval) is generally preferable, to reduce the play of confounding.[33] Wells violated this basic principle by ignoring the multivariate analysis in the study that dominated his meta-analysis (Liou) in favor of the unadjusted bivariate analysis. Given that Wells accepted this basic principle, the MDL court found that Wells likely selected the minimally adjusted odds ratio over the multivariate adjusted odds ratio for inclusion in his meta-analysis in order to have the smaller variance (and thus greater weight) from the former. This maneuver was disqualifying under Rule 702.[34]
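The connection between variance and weight is easy to see in a small sketch. In an inverse-variance weighted meta-analysis, the estimate with the narrower confidence interval has the smaller variance and therefore the larger weight; the numbers below are hypothetical and are not the Liou data.

import math

def weight_from_ci(lo, hi):
    # Inverse-variance weight of a log odds ratio recovered from its 95% CI.
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
    return 1 / se ** 2

# (odds ratio, lower 95% CI, upper 95% CI) -- illustrative values only.
unadjusted = (3.2, 2.0, 5.1)   # crude, minimally adjusted analysis
adjusted   = (1.6, 0.7, 3.7)   # same study after multivariate adjustment

for label, (or_, lo, hi) in [("unadjusted", unadjusted), ("adjusted", adjusted)]:
    print(f"{label:>10}: OR = {or_}, meta-analytic weight = {weight_from_ci(lo, hi):.1f}")

# With these illustrative numbers, the unadjusted estimate, with its narrower
# interval, carries roughly three times the weight of the adjusted estimate,
# while giving up the adjusted analysis's control for confounding.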

All in all, the paraquat MDL court’s Rule 702 ruling was a convincing demonstration that non-expert generalist judges, with assistance from subject-matter experts, treatises, and legal counsel, can evaluate and identify deviations from methodological standards of care.


[1] In re Paraquat Prods. Liab. Litig., Case No. 3:21-md-3004-NJR, MDL No. 3004, Slip op., ___ F. Supp. 3d ___ (S.D. Ill. Apr. 17, 2024) [Slip op.].

[2] In re Paraquat Prods. Liab. Litig., Op. sur motion for judgment, Case No. 3:21-md-3004-NJR, MDL No. 3004 (S.D. Ill. Apr. 17, 2024). See also Brendan Pierson, “Judge rejects key expert in paraquat lawsuits, tosses first cases set for trial,” Reuters (Apr. 17, 2024); Hailey Konnath, “Trial-Ready Paraquat MDL Cases Tossed After Testimony Axed,” Law360 (Apr. 18, 2024).

[3] Ferebee v. Chevron Chem. Co., 552 F. Supp. 1297 (D.D.C. 1982), aff’d, 736 F.2d 1529 (D.C. Cir.), cert. denied, 469 U.S. 1062 (1984). See “Ferebee Revisited,” Tortini (Dec. 28, 2017).

[4] See 40 C.F.R. § 152.175.

[5] Slip op. at 31.

[6] 7 U.S.C. § 136w; 7 U.S.C. § 136a(a); 40 C.F.R. § 152.175. The agency must periodically review the registration of the herbicide. 7 U.S.C. § 136a(g)(1)(A). See Ruckelshaus v. Monsanto Co., 467 U.S. 986, 991-92 (1984).

[7] See Austin Wray & Aaron Niman, Memorandum, Paraquat Dichloride: Systematic review of the literature to evaluate the relationship between paraquat dichloride exposure and Parkinson’s disease at 35 (June 26, 2019).

[8] See also Jeffrey Brent and Tammi Schaeffer, “Systematic Review of Parkinsonian Syndromes in Short- and Long-Term Survivors of Paraquat Poisoning,” 53 J. Occup. & Envt’l Med. 1332 (2011) (“An analysis the world’s entire published experience found no connection between high-dose paraquat exposure in humans and the development of parkinsonism.”).

[9] Douglas L. Weed, “Does paraquat cause Parkinson’s disease? A review of reviews,” 86 Neurotoxicology 180, 180 (2021).

[10] In re Incretin-Based Therapies Prods. Liab. Litig., 524 F.Supp. 3d 1007, 1038, 1043 (S.D. Cal. 2021), aff’d, No. 21-55342, 2022 WL 898595 (9th Cir. Mar. 28, 2022) (per curiam). See “Madigan’s Shenanigans and Wells Quelled in Incretin-Mimetic Cases,” Tortini (July 15, 2022).

[11] The MDL court obviously worked hard to learn the basic principles of epidemiology. The court relied extensively upon the epidemiology chapter in the Reference Manual on Scientific Evidence. Much of that material is very helpful, but its exposition of statistical concepts is at times confused and erroneous. It is unfortunate that courts do not pay more attention to the more precise and accurate exposition in the chapter on statistics. Citing the epidemiology chapter, the MDL court gave an incorrect interpretation of the p-value: “A statistically significant result is one that is unlikely the product of chance.” Slip op. at 17 n.11. And then again, citing the Reference Manual, the court declared that “[a] p-value of .1 means that there is a 10% chance that values at least as large as the observed result could have been the product of random error.” Id. Similarly, the MDL court gave an incorrect interpretation of the confidence interval. In a footnote, the court tells us that “[r]esearchers ordinarily assert a 95% confidence interval, meaning that ‘there is a 95% chance that the “true” odds ratio value falls within the confidence interval range’. In re Zoloft (Sertraline Hydrochloride) Prod. Liab. Litig., MDL No. 2342, 2015 WL 7776911, at *2 (E.D. Pa. Dec. 2, 2015).” Slip op. at 17 n.12. Citing another court for the definition of a statistical concept is a risky business.

[12] Slip op. at 20, citing Lisa A. Bero, “Evaluating Systematic Reviews and Meta-Analyses,” 14 J.L. & Pol’y 569, 570 (2006).

[13] Slip op. at 21, quoting Bero, at 575.

[14] Slip op. at 3.

[15] The nine studies at issue were as follows: (1) H.H. Liou, et al., “Environmental risk factors and Parkinson’s disease; A case-control study in Taiwan,” 48 Neurology 1583 (1997); (2) Caroline M. Tanner, et al., “Rotenone, Paraquat and Parkinson’s Disease,” 119 Envt’l Health Persps. 866 (2011) (a nested case-control study within the Agricultural Health Study (“AHS”)); (3) Clyde Hertzman, et al., “A Case-Control Study of Parkinson’s Disease in a Horticultural Region of British Columbia,” 9 Movement Disorders 69 (1994); (4) Anne-Maria Kuopio, et al., “Environmental Risk Factors in Parkinson’s Disease,” 14 Movement Disorders 928 (1999); (5) Katherine Rugbjerg, et al., “Pesticide exposure and risk of Parkinson’s disease – a population-based case-control study evaluating the potential for recall bias,” 37 Scandinavian J. of Work, Env’t & Health 427 (2011); (6) Jordan A. Firestone, et al., “Occupational Factors and Risk of Parkinson’s Disease: A Population-Based Case-Control Study,” 53 Am. J. of Indus. Med. 217 (2010); (7) Amanpreet S. Dhillon, “Pesticide / Environmental Exposures and Parkinson’s Disease in East Texas,” 13 J. of Agromedicine 37 (2008); (8) Marianne van der Mark, et al., “Occupational exposure to pesticides and endotoxin and Parkinson’s disease in the Netherlands,” 71 J. Occup. & Envt’l Med. 757 (2014); (9) Srishti Shrestha, et al., “Pesticide use and incident Parkinson’s disease in a cohort of farmers and their spouses,” Envt’l Research 191 (2020).

[16] Slip op. at 75.

[17] Slip op. at 73.

[18] Slip op. at 75, citing In re Mirena IUS Levonorgestrel-Related Prod. Liab. Litig. (No. II), 341 F. Supp. 3d 213, 241 (S.D.N.Y. 2018) (“Opinions that assume a conclusion and reverse-engineer a theory to fit that conclusion are . . . inadmissible.”) (internal citation omitted), aff’d, 982 F.3d 113 (2d Cir. 2020); In re Zoloft (Sertraline Hydrochloride) Prod. Liab. Litig., No. 12-md-2342, 2015 WL 7776911, at *16 (E.D. Pa. Dec. 2, 2015) (excluding expert’s opinion where he “failed to consistently apply the scientific methods he articulat[ed], . . . deviated from or downplayed certain well established principles of his field, and . . . inconsistently applied methods and standards to the data so as to support his a priori opinion.”), aff’d, 858 F.3d 787 (3d Cir. 2017).

[19] Slip op. at 35.

[20] Slip op. at 58.

[21] Slip op. at 55.

[22] Slip op. at 41, 64.

[23] Slip op. at 59-60, citing In re Lipitor (Atorvastatin Calcium) Mktg., Sales Pracs. & Prod. Liab. Litig., 892 F.3d 624, 634 (4th Cir. 2018) (“Result-driven analysis, or cherry-picking, undermines principles of the scientific method and is a quintessential example of applying methodologies (valid or otherwise) in an unreliable fashion.”).

[24] Slip op. at 67, 69-70, citing In re Zoloft (Sertraline Hydrochloride) Prod. Liab. Litig., 858 F.3d 787, 795-97 (3d Cir. 2017) (“[I]f an expert applies certain techniques to a subset of the body of evidence and other techniques to another subset without explanation, this raises an inference of unreliable application of methodology.”); In re Bextra and Celebrex Mktg. Sales Pracs. & Prod. Liab. Litig., 524 F. Supp. 2d 1166, 1179 (N.D. Cal. 2007) (excluding an expert witness’s causation opinion because of his result-oriented, inconsistent evaluation of data sources).

[25] Slip op. at 40.

[26] Slip op. at 61 n.44.

[27] Michael Borenstein, Larry V. Hedges, Julian P. T. Higgins, and Hannah R. Rothstein, Introduction to Meta-Analysis (2d ed. 2021).

[28] Jacqueline Chandler, James Thomas, Julian P. T. Higgins, Matthew J. Page, Miranda Cumpston, Tianjing Li, Vivian A. Welch, eds., Cochrane Handbook for Systematic Reviews of Interventions (2d ed. 2023).

[29] Slip op. at 56, citing In re Zimmer Nexgen Knee Implant Prod. Liab. Litig., No. 11 C 5468, 2015 WL 5050214, at *10 (N.D. Ill. Aug. 25, 2015).

[30] Slip op. at 22. The court noted that the Reference Manual on Scientific Evidence cautions that “[p]eople often tend to have an inordinate belief in the validity of the findings when a single number is attached to them, and many of the difficulties that may arise in conducting a meta-analysis, especially of observational studies such as epidemiological ones, may consequently be overlooked.” Id., quoting from Manual, at 608.

[31] Slip op. at 57, citing Deutsch v. Novartis Pharms. Corp., 768 F. Supp. 2d 420, 457-58 (E.D.N.Y. 2011) (“[T]here is a strong risk of prejudice if a Court permits testimony based on an unreliable meta-analysis because of the propensity for juries to latch on to the single number.”).

[32] Slip op. at 64, quoting from Notes of Testimony of Martin Wells, in In re Testosterone Replacement Therapy Prod. Liab. Litig., Nos. 1:14-cv-1748, 15-cv-4292, 15-cv-426, 2018 WL 7350886 (N.D. Ill. Apr. 2, 2018).

[33] Slip op. at 70.

[34] Slip op. at 71-72, citing People Who Care v. Rockford Bd. of Educ., 111 F.3d 528, 537-38 (7th Cir. 1997) (“[A] statistical study that fails to correct for salient explanatory variables . . . has no value as causal explanation and is therefore inadmissible in federal court.”); In re Roundup Prod. Liab. Litig., 390 F. Supp. 3d 1102, 1140 (N.D. Cal. 2018).

An Opinion to SAVOR

November 11th, 2022

The saxagliptin medications are valuable treatments for type 2 diabetes mellitus (T2DM). The SAVOR (Saxagliptin Assessment of Vascular Outcomes Recorded in Patients with Diabetes Mellitus) study was a randomized controlled trial, undertaken by manufacturers at the request of the FDA.[1] As a large (over sixteen thousand patients randomized) double-blinded cardiovascular outcomes trial, SAVOR collected data on many different end points in patients with T2DM, at high risk of cardiovascular disease, over a median of 2.1 years. The primary end point was a composite of cardiac death, non-fatal myocardial infarction, and non-fatal stroke. Secondary end points included each constituent of the composite, as well as hospitalizations for heart failure, coronary revascularization, or unstable angina, along with other safety outcomes.

The SAVOR trial found no association between saxagliptin use and the primary end point, or any of the constituents of the primary end point.  The trial did, however, find a modest association between saxagliptin and one of the several secondary end points, hospitalization for heart failure (hazard ratio, 1.27; 95% C.I., 1.07 to 1.51; p = 0.007). The SAVOR authors urged caution in interpreting their unexpected finding for heart failure hospitalizations, given the multiple end points considered.[2] Notwithstanding the multiplicity, in 2016, the FDA, which does not require a showing of causation for adding warnings to a drug’s labeling, added warnings about the “risk” of hospitalization for heart failure from the use of saxagliptin medications.

And the litigation came.

The litigation evidentiary display grew to include, in addition to SAVOR, observational studies, meta-analyses, and randomized controlled trials of other DPP-4 inhibitor medications that are in the same class as saxagliptin. The SAVOR finding for heart failure was not supported by any of the other relevant human study evidence. The lawsuit industry, however, armed with an FDA warning, pressed its cases. A multi-district litigation (MDL 2809) was established. Rule 702 motions were filed by both plaintiffs’ and defendants’ counsel.

When the dust settled in the saxagliptin litigation, the court found that the defendants’ expert witnesses satisfied the relevance and reliability requirements of Rule 702, whereas the proffered opinions of plaintiffs’ expert witness, Dr. Parag Goyal, a cardiologist at Weill Cornell Medicine in New York, did not.[3] The court’s task was certainly made easier by the lack of any other expert witness or published opinion concluding that saxagliptin actually causes heart failure serious enough to result in hospitalization.

The saxagliptin litigation presented an interesting array of facts for a Rule 702 showdown. First, there was an RCT that reported a nominally statistically significant association between medication use and a harm, hospitalization for heart failure. The SAVOR finding, however, was in a secondary end point, and its statistical significance was unimpressive when considered in light of the multiple testing that takes place in a cardiovascular outcomes trial.

Second, the heart failure increase was not seen in the original registration trials. Third, there was an effort to find corroboration in observational studies and meta-analyses, without success. Fourth, there was no apparent mechanism for the putative effect. Fifth, there was no support from trials or observational studies of other medications in the class of DPP-4 inhibitors.

Dr. Goyal testified that the heart failure finding in SAVOR “should be interpreted as cause and effect unless there is compelling evidence to prove otherwise.” On this record, the MDL court excluded Dr. Goyal’s causation opinions. Dr. Goyal purported to conduct a Bradford Hill analysis, but the MDL court appeared troubled by his glib dismissal of the threat to validity in SAVOR from multiple testing, and his ignoring the consistency prong of the Hill factors. SAVOR was the only heart failure finding in humans, with the remaining observational studies, meta-analyses, and other trials of DPP-4 inhibitors failing to provide supporting evidence.

The challenged defense expert witnesses defended the validity of their opinions, and ultimately the MDL court had little difficulty in permitting them through the judicial gate. The plaintiffs’ challenges to Suneil Koliwad, a physician with a doctorate in molecular physiology, Eric Adler, a cardiologist, and Todd Lee, a pharmaco-epidemiologist, were all denied. The plaintiffs challenged, among other things, whether Dr. Adler was qualified to apply a Bonferroni correction to the SAVOR results, and whether Dr. Lee was obligated to obtain and statistically analyze the data from the trials and studies ab initio. The MDL court quickly dispatched these frivolous challenges.
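The multiplicity point lends itself to a simple illustration. The sketch below applies a Bonferroni correction to the nominal p-value of 0.007 reported in SAVOR for heart-failure hospitalization; the numbers of end points are assumed purely for illustration and are not taken from the trial protocol.

# Nominal p-value reported in SAVOR for heart-failure hospitalization.
nominal_p = 0.007

# Hypothetical numbers of end points tested; the actual count is not assumed here.
for k in (1, 5, 7, 10):
    adjusted_p = min(1.0, nominal_p * k)
    verdict = "significant" if adjusted_p < 0.05 else "not significant"
    print(f"{k:>2} end points: Bonferroni-adjusted p = {adjusted_p:.3f} ({verdict})")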

The saxagliptin MDL decision is an important reminder that litigants should remain vigilant about inaccurate assertions of “statistical significance,” even in premier, peer-reviewed journals. Not all journals are as careful as the New England Journal of Medicine in requiring qualification of claims of statistical significance in the face of multiple testing.

One legal hiccup in the court’s decision was its improvident citation to Daubert for the proposition that the gatekeeping inquiry must focus “solely on principles and methodology, not on the conclusions they generate.”[4] That piece of obiter dictum did not survive past the Supreme Court’s 1997 decision in Joiner,[5] and it was clearly superseded by the 2000 amendment to Rule 702. Surely it is time to stop citing Daubert for this dictum.


[1] Benjamin M. Scirica, Deepak L. Bhatt, Eugene Braunwald, Gabriel Steg, Jaime Davidson, et al., for the SAVOR-TIMI 53 Steering Committee and Investigators, “Saxagliptin and Cardiovascular Outcomes in Patients with Type 2 Diabetes Mellitus,” 369 New Engl. J. Med. 1317 (2013).

[2] Id. at 1324.

[3] In re Onglyza & Kombiglyze XR Prods. Liab. Litig., MDL 2809, 2022 WL 43244 (E.D. Ky. Jan. 5, 2022).

[4] Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579, 595 (1993).

[5] General Electric Co. v. Joiner, 522 U.S. 136 (1997).

Madigan’s Shenanigans & Wells Quelled in Incretin-Mimetic Cases

July 15th, 2022

The incretin-mimetic litigation involved claims that the use of Byetta, Januvia, Janumet, and Victoza medications causes pancreatic cancer. All four medications treat diabetes mellitus through incretin hormones, which stimulate or support insulin production, which in turn lowers blood sugar. On Planet Earth, the only scientists who contend that these medications cause pancreatic cancer are those hired by the lawsuit industry.

The cases against the manufacturers of the incretin-mimetic medications were consolidated for pre-trial proceedings in federal court, pursuant to the multi-district litigation (MDL) statute, 28 U.S.C. § 1407. After years of MDL proceedings, the trial court dismissed the cases as barred by the doctrine of federal preemption, and for good measure, excluded plaintiffs’ medical causation expert witnesses from testifying.[1] If there were any doubt about the false claiming in this MDL, the district court’s dismissals were affirmed by the Ninth Circuit.[2]

The district court’s application of Federal Rule of Evidence 702 to the plaintiffs’ expert witnesses’ opinions is an important essay in patho-epistemology. The challenged expert witnesses provided many examples of invalid study design and interpretation. Of particular interest, two of the plaintiffs’ high-volume statistician testifiers, David Madigan and Martin Wells, proffered their own meta-analyses of clinical trial safety data. Although the current edition of the Reference Manual on Scientific Evidence[3] provides virtually no guidance to judges for assessing the validity of meta-analyses, judges and counsel now have other readily available sources, such as the FDA’s draft guidance on meta-analyses of clinical trial safety outcomes.[4] Luckily for the incretin-mimetics pancreatic cancer MDL judge, the misuse of meta-analysis methodology by Madigan and Wells was intuitively obvious.

Madigan and Wells had a large set of clinical trials at their disposal, with adverse safety outcomes assiduously collected. As is the case for many clinical trial safety outcomes, the trialists often have a procedure for blinded or unblinded adjudication of safety events, such as a pancreatic cancer diagnosis.

At deposition, Madigan testified that he counted only adjudicated cases of pancreatic cancer in his meta-analyses, which seems reasonable enough. As discovery revealed, however, Madigan applied the restrictive inclusion criterion of adjudicated pancreatic cancer only to the placebo group, not to the experimental group. Restricting only the placebo group had the effect of excluding several non-adjudicated events from that group, with an obvious spurious inflation of risk ratios. The MDL court thus found with relative ease that Madigan’s “unequal application of criteria among the two groups inevitably skews the data and critically undermines the reliability of his analysis.” The unexplained, unjustified change in methodology revealed Madigan’s unreliable “cherry-picking” and lack of scientific rigor, producing result-driven meta-analyses.[5]
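A small numerical sketch, using hypothetical counts rather than the actual trial data, shows how applying an adjudication filter to only the comparator arm inflates a risk ratio.

# Hypothetical counts (not the actual trial data).
treated_n = placebo_n = 5000                     # patients per arm
treated = {"adjudicated": 8, "non_adjudicated": 4}
placebo = {"adjudicated": 7, "non_adjudicated": 4}

def risk_ratio(treated_events, placebo_events):
    return (treated_events / treated_n) / (placebo_events / placebo_n)

# Same counting rule applied to both arms.
rr_symmetric = risk_ratio(sum(treated.values()), sum(placebo.values()))
# All events kept in the treated arm, but only adjudicated events kept in the
# placebo arm -- the asymmetry the court found in Madigan's analysis.
rr_asymmetric = risk_ratio(sum(treated.values()), placebo["adjudicated"])

print(f"same rule in both arms:         RR = {rr_symmetric:.2f}")   # 12/11 -> 1.09
print(f"filter applied to placebo only: RR = {rr_asymmetric:.2f}")  # 12/7  -> 1.71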

The MDL court similarly found that Wells’ reports “were marred by a selective review of data and inconsistent application of inclusion criteria.”[6] Like Madigan, Wells cherry picked studies. For instance, he excluded one study, EXSCEL, on grounds that it reported “a high pancreatic cancer event rate in the comparison group as compared to background rate in the general population….”[7] Wells’ explanation blatantly failed, however, given that the entire patient population of the clinical trial had diabetes, a known risk factor for pancreatic cancer.[8]

As Professor Ioannidis and others have noted, we are awash in misleading meta-analyses:

“Currently, there is massive production of unnecessary, misleading, and conflicted systematic reviews and meta-analyses. Instead of promoting evidence-based medicine and health care, these instruments often serve mostly as easily produced publishable units or marketing tools.  Suboptimal systematic reviews and meta-analyses can be harmful given the major prestige and influence these types of studies have acquired.  The publication of systematic reviews and meta-analyses should be realigned to remove biases and vested interests and to integrate them better with the primary production of evidence.”[9]

Whether created for litigation, like the Madigan-Wells meta-analyses, or published in the “peer-reviewed” literature, courts will have to up their game in assessing the validity of such studies. Published meta-analyses have grown exponentially from the 1990s to the present. To date, 248,886 meta-analyses have been published, according to the National Library of Medicine’s Pub-Med database. Last year saw over 35,000 meta-analyses published. So far this year, 20,416 meta-analyses have been published, and we appear to be on track for a bumper crop.

The data analytics from Pub-Med provide a helpful visual representation of the growth of meta-analyses in biomedical science.

 

[Figure: Count of Publications with Keyword Meta-analysis in Pub-Med Database]
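For readers who want to check or update counts of this kind, the sketch below queries PubMed through NCBI’s E-utilities. The search term, filtering on the “Meta-Analysis” publication type and the date of publication, is my assumption about one reasonable way to generate such counts; it is not a description of how the figure above was produced.

import json
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_count(term):
    # Return the number of PubMed records matching a search term.
    url = EUTILS + "?" + urllib.parse.urlencode(
        {"db": "pubmed", "term": term, "retmode": "json"}
    )
    with urllib.request.urlopen(url) as resp:
        return int(json.load(resp)["esearchresult"]["count"])

for year in (1979, 1990, 2000, 2010, 2021):
    print(year, pubmed_count(f"meta-analysis[pt] AND {year}[dp]"))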

In 1979, the year I started law school, one meta-analysis was published. Lawyers could still legitimately argue that meta-analyses involved novel methodology that had not been generally accepted. The novelty of meta-analysis wore off sometime between 1988, when Judge Robert Kelly excluded William Nicholson’s meta-analysis of health outcomes among PCB-exposed workers, on grounds that such analyses were “novel,” and 1990, when the Third Circuit reversed Judge Kelly, with instructions to assess study validity.[10] Fortunately, or not, depending upon your point of view, plaintiffs dropped Nicholson’s meta-analysis in subsequent proceedings. A close look at Nicholson’s non-peer-reviewed calculations shows that he failed to standardize for age or sex, and that he merely added observed and expected cases across studies, without weighting by individual study variance. The trial court never had the opportunity to assess the validity vel non of Nicholson’s ersatz meta-analysis.[11] Today, trial courts must take up the challenge of assessing the validity of meta-analyses relied upon by expert witnesses, regulatory agencies, and systematic reviews.
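Simply adding observed and expected cases across studies implicitly weights each study by its expected count, which is not the same as weighting each study by the inverse of its variance. The sketch below, with made-up numbers and the usual rough approximation that the variance of a log SMR is 1/observed, shows that the two approaches can point in different directions when the study-specific ratios are heterogeneous.

import math

# (observed cases, expected cases) per study -- made-up numbers.
studies = [(10, 4.0), (8, 3.0), (30, 45.0)]

# Naive pooling: add numerators and denominators across studies.
crude = sum(o for o, e in studies) / sum(e for o, e in studies)

# Inverse-variance pooling of log(SMR), taking var(log SMR) roughly as 1/observed.
weights = [o for o, e in studies]
pooled_log = sum(w * math.log(o / e) for w, (o, e) in zip(weights, studies)) / sum(weights)

print(f"crude summed ratio:           {crude:.2f}")                 # about 0.92
print(f"variance-weighted pooled SMR: {math.exp(pooled_log):.2f}")  # about 1.11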


[1] In re Incretin-Based Therapies Prods. Liab. Litig., 524 F.Supp.3d 1007 (S.D. Cal. 2021).

[2] In re Incretin-Based Therapies Prods. Liab. Litig., No. 21-55342, 2022 WL 898595 (9th Cir. Mar. 28, 2022) (per curiam)

[3]  “The Treatment of Meta-Analysis in the Third Edition of the Reference Manual on Scientific Evidence” (Nov. 15, 2011).

[4] Food and Drug Administration, Center for Drug Evaluation and Research, “Meta-Analyses of Randomized Controlled Clinical Trials to Evaluate the Safety of Human Drugs or Biological Products – (Draft) Guidance for Industry” (Nov. 2018); Jonathan J. Deeks, Julian P.T. Higgins, Douglas G. Altman, “Analysing data and undertaking meta-analyses,” Chapter 10, in Julian P.T. Higgins, James Thomas, Jacqueline Chandler, Miranda Cumpston, Tianjing Li, Matthew J. Page, and Vivian Welch, eds., Cochrane Handbook for Systematic Reviews of Interventions (version 6.3 updated February 2022); Donna F. Stroup, Jesse A. Berlin, Sally C. Morton, Ingram Olkin, G. David Williamson, Drummond Rennie, David Moher, Betsy J. Becker, Theresa Ann Sipe, Stephen B. Thacker, “Meta-Analysis of Observational Studies: A Proposal for Reporting,” 283 J. Am. Med. Ass’n 2008 (2000); David Moher, Alessandro Liberati, Jennifer Tetzlaff, and Douglas G Altman, “Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement,” 6 PLoS Med e1000097 (2009).

[5] In re Incretin-Based Therapies Prods. Liab. Litig., 524 F.Supp.3d 1007, 1037 (S.D. Cal. 2021). See In re Lipitor (Atorvastatin Calcium) Mktg., Sales Practices & Prods. Liab. Litig. (No. II) MDL2502, 892 F.3d 624, 634 (4th Cir. 2018) (“Result-driven analysis, or cherry-picking, undermines principles of the scientific method and is a quintessential example of applying methodologies (valid or otherwise) in an unreliable fashion.”).

[6] In re Incretin-Based Therapies Prods. Liab. Litig., 524 F.Supp.3d 1007, 1043 (S.D. Cal. 2021).

[7] Id. at 1038.

[8] See, e.g., Albert B. Lowenfels & Patrick Maisonneuve, “Risk factors for pancreatic cancer,” 95 J. Cellular Biochem. 649 (2005).

[9] John P. Ioannidis, “The mass production of redundant, misleading, and conflicted systematic reviews and meta-analyses,” 94 Milbank Quarterly 485 (2016).

[10] In re Paoli R.R. Yard PCB Litig., 706 F. Supp. 358, 373 (E.D. Pa. 1988), rev’d and remanded, 916 F.2d 829, 856-57 (3d Cir. 1990), cert. denied, 499 U.S. 961 (1991). See also Hines v. Consol. Rail Corp., 926 F.2d 262, 273 (3d Cir. 1991).

[11]The Shmeta-Analysis in Paoli” (July 11, 2019). See  James A. Hanley, Gilles Thériault, Ralf Reintjes and Annette de Boer, “Simpson’s Paradox in Meta-Analysis,” 11 Epidemiology 613 (2000); H. James Norton & George Divine, “Simpson’s paradox and how to avoid it,” Significance 40 (Aug. 2015); George Udny Yule, “Notes on the theory of association of attributes in statistics,” 2 Biometrika 121 (1903).

Reference Manual on Scientific Evidence – 3rd Edition is Past Its Expiry

October 17th, 2021

INTRODUCTION

The new, third edition of the Reference Manual on Scientific Evidence was released to the public in September 2011, as a joint production of the National Academies of Science, and the Federal Judicial Center. Within a year of its publication, I wrote that the Manual needed attention on several key issues. Now that there is a committee working on the fourth edition, I am reprising the critique, slightly modified, in the hope that it may make a difference for the fourth edition.

The Development Committee for the third edition was co-chaired by Professor Jerome Kassirer, of Tufts University School of Medicine, and the Hon. Gladys Kessler, of the District Court for the District of Columbia. The members of the Development Committee included:

  • Ming W. Chin, Associate Justice, The Supreme Court of California
  • Pauline Newman, Judge, Court of Appeals for the Federal Circuit
  • Kathleen O’Malley, Judge, Court of Appeals for the Federal Circuit (formerly a district judge on the Northern District of Ohio)
  • Jed S. Rakoff, Judge, Southern District of New York
  • Channing Robertson, Professor of Engineering, Stanford University
  • Joseph V. Rodricks, Principal, Environ
  • Allen Wilcox, Senior Investigator, National Institute of Environmental Health Sciences
  • Sandy L. Zabell, Professor of Statistics and Mathematics, Weinberg College of Arts and Sciences, Northwestern University

Joe S. Cecil, Project Director, Program on Scientific and Technical Evidence, in the Federal Judicial Center’s Division of Research, who shepherded the first two editions, served as consultant to the Committee.

With over 1,000 pages, there is much to digest in the third edition of the Reference Manual on Scientific Evidence (RMSE 3d). Much of what is covered is solid information on the individual scientific and technical disciplines addressed. Although the information is readily available from other sources, there is some value in collecting the material in a single volume for the convenience of judges and lawyers. Of course, given that this information is provided to judges from an ostensibly neutral, credible source, lawyers will naturally focus on what is doubtful or controversial in the RMSE. To date, there have been only a few reviews and acknowledgments of the new edition.[1]

As in previous editions, the substantive scientific areas are covered in discrete chapters, written by subject-matter specialists, often along with a lawyer who addresses the legal implications and judicial treatment of that subject matter. From my perspective, the chapters on statistics, epidemiology, and toxicology are the most important in my practice and in teaching, and I have focused on issues raised by those chapters.

The strengths of the chapter on statistical evidence, updated from the second edition, remained, as did some of the strengths and flaws of the chapter on epidemiology.  In addition, there was a good deal of overlap among the chapters on statistics, epidemiology, and medical testimony.  This overlap was at first blush troubling because the RMSE has the potential to confuse and obscure issues by having multiple authors address them inconsistently.  This is an area where reviewers of the upcoming edition should pay close attention.

I. Reference Manual’s Disregard of Study Validity in Favor of the “Whole Tsumish”

There was a deep discordance among the chapters in the third Reference Manual as to how judges should approach scientific gatekeeping issues. The third edition vacillated between encouraging judges to look at scientific validity, and discouraging them from any meaningful analysis by emphasizing inaccurate proxies for validity, such as conflicts of interest.[2]

The Third Edition featured an updated version of the late Professor Margaret Berger’s chapter from the second edition, “The Admissibility of Expert Testimony.”[3] Berger’s chapter criticized “atomization,” a process she described pejoratively as a “slicing-and-dicing” approach.[4] Drawing on the publications of Daubert-critic Susan Haack, Berger rejected the notion that courts should examine the reliability of each study independently.[5] Berger contended that the “proper” scientific method, as evidenced by works of the International Agency for Research on Cancer, the Institute of Medicine, the National Institutes of Health, the National Research Council, and the National Institute for Environmental Health Sciences, “is to consider all the relevant available scientific evidence, taken as a whole, to determine which conclusion or hypothesis regarding a causal claim is best supported by the body of evidence.”[6]

Berger’s contention, however, was profoundly misleading.  Of course, scientists undertaking a systematic review should identify all the relevant studies, but some of the “relevant” studies may well be insufficiently reliable (because of internal or external validity issues) to answer the research question at hand. All the cited agencies, and other research organizations and researchers, exclude studies that are fundamentally flawed, whether as a result of bias, confounding, erroneous data analyses, or related problems.  Berger cited no support for her remarkable suggestion that scientists do not make “reliability” judgments about available studies when assessing the “totality of the evidence.”

Professor Berger, who had a distinguished career as a law professor and evidence scholar, died in November 2010.  She was no friend of Daubert,[7] but remarkably her antipathy had outlived her.  Berger’s critical discussion of “atomization” cited the notorious decision in Milward v. Acuity Specialty Products Group, Inc., 639 F.3d 11, 26 (1st Cir. 2011), which was decided four months after her passing.[8]

Professor Berger’s contention about the need to avoid assessments of individual studies in favor of the whole “tsumish” must also be rejected because Federal Rule of Evidence 703 requires that each study considered by an expert witness “qualify” for reasonable reliance by virtue of the study’s containing facts or data that are “of a type reasonably relied upon by experts in the particular field in forming opinions or inferences upon the subject.”  One of the deeply troubling aspects of the Milward decision is that it reversed the trial court’s sensible decision to exclude a toxicologist, Dr. Martyn Smith, who outran his headlights on issues having to do with a field in which he was clearly inexperienced – epidemiology.

Scientific studies, and especially epidemiologic studies, involve multiple levels of hearsay.  A typical epidemiologic study may contain hearsay leaps from patient to clinician, to laboratory technicians, to specialists interpreting test results, back to the clinician for a diagnosis, to a nosologist for disease coding, to a national or hospital database, to a researcher querying the database, to a statistician analyzing the data, to a manuscript that details data, analyses, and results, to editors and peer reviewers, back to study authors, and on to publication.  Those leaps do not mean that the final results are untrustworthy, only that the study itself is not likely admissible in evidence.

The inadmissibility of scientific studies is not problematic because Rule 703 permits testifying expert witnesses to formulate opinions based upon facts and data, which are not independently admissible in evidence. The distinction between relied upon and admissible studies is codified in the Federal Rules of Evidence, and in virtually every state’s evidence law.

Referring to studies, without qualification, as admissible in themselves is usually wrong as a matter of evidence law.  The error has the potential to encourage carelessness in gatekeeping expert witnesses’ opinions for their reliance upon inadmissible studies.  The error is doubly wrong if this approach to expert witness gatekeeping is taken as license to permit expert witnesses to rely upon any marginally relevant study of their choosing.  It is therefore disconcerting that the RMSE 3d failed to make the appropriate distinction between admissibility of studies and admissibility of expert witness opinion that has reasonably relied upon appropriate studies.

Consider the following statement from the chapter on epidemiology:

“An epidemiologic study that is sufficiently rigorous to justify a conclusion that it is scientifically valid should be admissible, as it tends to make an issue in dispute more or less likely.”[9]

Curiously, the advice from the authors of the epidemiology chapter, by speaking to a single study’s validity, was at odds with Professor Berger’s caution against slicing and dicing. The authors of the epidemiology chapter seemed to be stressing that scientifically valid studies should be admissible.  Their footnote emphasized and confused the point:

See DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941, 958 (3d Cir. 1990); cf. Kehm v. Procter & Gamble Co., 580 F. Supp. 890, 902 (N.D. Iowa 1982) (“These [epidemiologic] studies were highly probative on the issue of causation—they all concluded that an association between tampon use and menstrually related TSS [toxic shock syndrome] cases exists.”), aff’d, 724 F.2d 613 (8th Cir. 1984). Hearsay concerns may limit the independent admissibility of the study, but the study could be relied on by an expert in forming an opinion and may be admissible pursuant to Fed. R. Evid. 703 as part of the underlying facts or data relied on by the expert. In Ellis v. International Playtex, Inc., 745 F.2d 292, 303 (4th Cir. 1984), the court concluded that certain epidemiologic studies were admissible despite criticism of the methodology used in the studies. The court held that the claims of bias went to the studies’ weight rather than their admissibility. Cf. Christophersen v. Allied-Signal Corp., 939 F.2d 1106, 1109 (5th Cir. 1991) (“As a general rule, questions relating to the bases and sources of an expert’s opinion affect the weight to be assigned that opinion rather than its admissibility. . . .”).”[10]

This footnote’s suggestion, however, that studies relied upon by an expert in forming an opinion may be admissible pursuant to Rule 703, was unsupported by, and contrary to, Rule 703 and the overwhelming weight of case law interpreting and applying the rule.[11] The citation to a pre-Daubert decision, Christophersen, was doubtful as a legal argument, and managed to engender much confusion.

Furthermore, Kehm and Ellis, the cases cited in this footnote by the authors of the epidemiology chapter, both involved “factual findings” in public investigative or evaluative reports, which were independently admissible under Federal Rule of Evidence 803(8)(C). See Ellis, 745 F.2d at 299-303; Kehm, 724 F.2d at 617-18.  As such, the cases hardly support the chapter’s suggestion that Rule 703 is a rule of admissibility for epidemiologic studies.

Here the RMSE 3d, in one sentence, confused Rule 703 with an exception to the rule against hearsay, which would prevent the statistically based epidemiologic studies from being received in evidence.  The point is reasonably clear, however, that the studies “may be offered” in testimony to explain an expert witness’s opinion. Under Rule 705, that offer may also be refused. The offer, however, is to “explain,” not to have the studies admitted in evidence.  The RMSE 3d was certainly not alone in advancing this notion that studies are themselves admissible.  Other well-respected evidence scholars have lapsed into this error.[12]

Evidence scholars should not conflate admissibility of the epidemiologic (or other) studies with the ability of an expert witness to advert to a study to explain his or her opinion.  The testifying expert witness really should not be allowed to become a conduit for off-hand comments and opinions in the introduction or discussion section of relied upon articles, and the wholesale admission of such hearsay opinions undermines the trial court’s control over opinion evidence.  Rule 703 authorizes reasonable reliance upon “facts and data,” not every opinion that creeps into the published literature.

II. Toxicology for Judges

The toxicology chapter, “Reference Guide on Toxicology,” in RMSE 3d was written by Professor Bernard D. Goldstein, of the University of Pittsburgh Graduate School of Public Health, and Mary Sue Henifin, a partner in the Princeton, New Jersey office of Buchanan Ingersoll, P.C.

  1. Conflicts of Interest

At the question and answer session of the Reference Manual’s public release ceremony, in September 2011, one gentleman rose to note that some of the authors were lawyers with big firm affiliations, which he supposed must mean that they represent mostly defendants.  Based upon his premise, he asked what the review committee had done to ensure that conflicts of interest did not skew or distort the discussions in the affected chapters.  Dr. Kassirer and Judge Kessler responded by pointing out that the chapters were peer reviewed by outside reviewers, and reviewed by members of the supervising review committee.  The questioner seemed reassured, but now that I have looked at the toxicology chapter, I am not so sure.

The questioner’s premise that a member of a large firm will represent mostly defendants, and thus have a pro-defense bias, was probably a common perception among unsophisticated lay observers, but the premise is too simple.  For instance, some large firms represent insurance companies intent upon denying coverage to product manufacturers.  These counsel for insurance companies often take the plaintiffs’ side of the underlying disputed issue in order to claim an exclusion to the contract of insurance, under a claim that the harm was “expected or intended.”  Similarly, the common perception ignores the reality of lawyers’ true conflict:  although gatekeeping helps the defense lawyers’ clients, it takes away legal work from firms that represent defendants in the litigations that are pretermitted by effective judicial gatekeeping.  Erosion of gatekeeping concepts, however, inures to the benefit of plaintiffs, their counsel, and the expert witnesses engaged on behalf of plaintiffs in litigation.

The questioner’s supposition in the case of the toxicology chapter, however, is doubly flawed.  If he had known more about the authors, he would probably not have asked his question.  First, the lawyer author, Ms. Henifin, despite her large firm affiliation, has taken some aggressive positions contrary to the interests of manufacturers.[13]  As for the scientist author of the toxicology chapter, Professor Goldstein, the casual reader of the chapter may want to know that he has testified in any number of toxic tort cases, almost invariably on the plaintiffs’ side.  Unlike the defense lawyer, who loses business revenue when courts shut down unreliable claims, plaintiffs’ testifying or consulting expert witnesses stand to gain by minimalist expert witness opinion gatekeeping.  Given the economic asymmetries, the reader may well want to know that Professor Goldstein was excluded as an expert witness in some high-profile toxic tort cases.[14]  There do not appear to be any disclosures of Professor Goldstein’s (or any other scientist author’s) conflicts of interests in RMSE 3d.  Having pointed out this conflict, I would note that financial conflicts of interest are nothing really compared with ideological conflicts of interest, which often propel scientists into service as expert witnesses to advance their political agenda.

  2. Hormesis

One way that ideological conflicts might be revealed is to look for imbalances in the presentation of toxicologic concepts.  Most lawyers who litigate cases that involve exposure-response issues are familiar with the “linear no threshold” (LNT) concept that is used frequently in regulatory risk assessments, and which has metastasized to toxic tort litigation, where LNT often has no proper place.

LNT is a dubious assumption because it claims to “know” the dose response at very low exposure levels in the absence of data.  There is a thin plausibility for LNT for genotoxic chemicals claimed to be carcinogens, but even that plausibility evaporates when one realizes that there are DNA defense and repair mechanisms to genotoxicity, which must first be saturated, overwhelmed, or inhibited, before there can be a carcinogenic response. The upshot is that low exposures that do not swamp DNA repair and tumor suppression proteins will not cause cancer.

Hormesis is today an accepted concept that describes a dose-response relationship showing benefit at low doses, but harm at high doses. The toxicology chapter in the Reference Manual has several references to LNT but none to hormesis.  That font of all knowledge, Wikipedia, reports that hormesis is controversial, but so is LNT.  This is the sort of imbalance that may well reflect an ideological bias.

One of the leading textbooks on toxicology describes hormesis[15]:

“There is considerable evidence to suggest that some non-nutritional toxic substances may also impart beneficial or stimulatory effects at low doses but that, at higher doses, they produce adverse effects. This concept of “hormesis” was first described for radiation effects but may also pertain to most chemical responses.”

Similarly, the Encyclopedia of Toxicology describes hormesis as an important phenomenon in toxicologic science[16]:

“This type of dose–response relationship is observed in a phenomenon known as hormesis, with one explanation being that exposure to small amounts of a material can actually confer resistance to the agent before frank toxicity begins to appear following exposures to larger amounts.  However, analysis of the available mechanistic studies indicates that there is no single hormetic mechanism. In fact, there are numerous ways for biological systems to show hormetic-like biphasic dose–response relationship. Hormetic dose–response has emerged in recent years as a dose–response phenomenon of great interest in toxicology and risk assessment.”
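Because the quoted descriptions are verbal, a toy numerical contrast may help. The sketch below compares an LNT curve with a hormetic (biphasic) curve; the functional forms and parameter values are my own illustrative assumptions, not drawn from the Reference Manual or from any toxicologic dataset.

```python
# Illustrative only: toy dose-response functions contrasting the LNT
# assumption with a hormetic (biphasic) curve.  The functional forms and
# parameter values are hypothetical, chosen solely to show the shapes at issue.
import math

def lnt_excess_risk(dose, slope=0.02):
    """Linear no-threshold: any dose above zero adds risk in proportion to dose."""
    return slope * dose

def hormetic_excess_risk(dose, benefit=0.05, decay=0.5, slope=0.02, threshold=10.0):
    """Biphasic curve: modest benefit (negative excess risk) at low doses,
    with harm emerging only after the dose exceeds a threshold."""
    low_dose_benefit = -benefit * dose * math.exp(-decay * dose)
    high_dose_harm = slope * max(0.0, dose - threshold)
    return low_dose_benefit + high_dose_harm

if __name__ == "__main__":
    for dose in (0, 1, 2, 5, 10, 20, 40):
        print(f"dose={dose:>3}  LNT excess risk={lnt_excess_risk(dose):+.3f}  "
              f"hormetic excess risk={hormetic_excess_risk(dose):+.3f}")
```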

One might think that hormesis would also be of great interest to federal judges, but they will not learn about it from reading the Reference Manual.

Hormesis research has come into its own.  The International Dose-Response Society, which “focus[es] on the dose-response in the low-dose zone,” publishes a journal, Dose-Response, and a newsletter, BELLE:  Biological Effects of Low Level Exposure.  In 2009, two leading researchers in the area of hormesis published a collection of important papers:  Mark P. Mattson and Edward J. Calabrese, eds., Hormesis: A Revolution in Biology, Toxicology and Medicine (2009).

A check in PubMed shows that LNT has more “hits” than “hormesis” or “hormetic,” but still the latter terms exceed 1,267 references, hardly insubstantial.  In actuality, there are many more hormetic relationships identified in the scientific literature, which often fails to identify the relationship by the term hormesis or hormetic.[17]

The Reference Manual’s omission of hormesis was regrettable.  Its inclusion of references to LNT but not to hormesis suggests a biased treatment of the subject.

  3. Questionable Substantive Opinions

Readers and litigants would fondly hope that the toxicology chapter would not put forward partisan substantive positions on issues that are currently the subject of active litigation.  Fervently, we would hope that any substantive position advanced would at least be well documented.

For at least one issue, the toxicology chapter disappointed significantly.  Table 1 in the chapter presents a “Sample of Selected Toxicological End Points and Examples of Agents of Concern in Humans.” No documentation or citations are provided for this table.  Most of the exposure agent/disease outcome relationships in the table are well accepted, but curiously at least one agent-disease pair, which is the subject of current litigation, is wildly off the mark:

“Parkinson’s disease and manganese”[18]

If the chapter’s authors had looked, they would have found that Parkinson’s disease is almost universally accepted to have no known cause, at least outside court rooms.  They would also have found that the issue has been addressed carefully and the claimed relationship or “concern” has been rejected by the leading researchers in the field (who have no litigation ties).[19]  Table 1 suggests a certain lack of objectivity, and its inclusion of a highly controversial relationship, manganese-Parkinson’s disease, suggests a good deal of partisanship.

  4. When All You Have Is a Hammer, Everything Looks Like a Nail

The substantive area author, Professor Goldstein, is not a physician; nor is he an epidemiologist.  His professional focus on animal and cell research appeared to color and bias the opinions offered in this chapter:[20]

“In qualitative extrapolation, one can usually rely on the fact that a compound causing an effect in one mammalian species will cause it in another species. This is a basic principle of toxicology and pharmacology.  If a heavy metal, such as mercury, causes kidney toxicity in laboratory animals, it is highly likely to do so at some dose in humans.”

Such extrapolations may make sense in regulatory contexts, where precautionary judgments are of interest, but they hardly can be said to be generally accepted in controversies in scientific communities, or in civil actions over actual causation.  There are too many counterexamples to cite, but consider crystalline silica, silicon dioxide.  Silica causes something resembling lung cancer in rats, but not in mice, guinea pigs, or hamsters.  It hardly makes sense to ask juries to decide whether the plaintiff is more like a rat than a mouse.

For a sober second opinion to the toxicology chapter, one may consider the views of some well-known authors:

“Whereas the concordance was high between cancer-causing agents initially discovered in humans and positive results in animal studies (Tomatis et al., 1989; Wilbourn et al., 1984), the same could not be said for the reverse relationship: carcinogenic effects in animals frequently lacked concordance with overall patterns in human cancer incidence (Pastoor and Stevens, 2005).”[21]

III. New Reference Manual’s Uneven Treatment of Causation and of Conflicts of Interest

The third edition of the Reference Manual on Scientific Evidence (RMSE) appeared to get off to a good start in the Preface by Judge Kessler and Dr. Kassirer, when they noted that the Supreme Court mandated federal courts to:

“examine the scientific basis of expert testimony to ensure that it meets the same rigorous standard employed by scientific researchers and practitioners outside the courtroom.”

RMSE at xiii.  The preface faltered, however, on two key issues, causation and conflicts of interest, which are taken up as an introduction to the third edition.

  1. Causation

The authors reported in somewhat squishy terms that causal assessments are judgments:

“Fundamentally, the task is an inferential process of weighing evidence and using judgment to conclude whether or not an effect is the result of some stimulus. Judgment is required even when using sophisticated statistical methods. Such methods can provide powerful evidence of associations between variables, but they cannot prove that a causal relationship exists. Theories of causation (evolution, for example) lose their designation as theories only if the scientific community has rejected alternative theories and accepted the causal relationship as fact. Elements that are often considered in helping to establish a causal relationship include predisposing factors, proximity of a stimulus to its putative outcome, the strength of the stimulus, and the strength of the events in a causal chain.”[22]

The authors left the inferential process as a matter of “weighing evidence,” but without saying anything about how the scientific community does its “weighing.” Language about “proving” causation is also unclear because “proof” in scientific parlance connotes a demonstration, of the sort we typically find in logic or in mathematics. Treating empirical propositions as requiring such “proof” sets the bar so high that courts will inevitably acquiesce in a very low threshold of evidence.  The question, of course, is how low judges can and will go to admit evidence.

The authors thus introduced hand waving and excuses for why evidence can be weighed differently in court proceedings from the world of science:

“Unfortunately, judges may be in a less favorable position than scientists to make causal assessments. Scientists may delay their decision while they or others gather more data. Judges, on the other hand, must rule on causation based on existing information. Concepts of causation familiar to scientists (no matter what stripe) may not resonate with judges who are asked to rule on general causation (i.e., is a particular stimulus known to produce a particular reaction) or specific causation (i.e., did a particular stimulus cause a particular consequence in a specific instance). In the final analysis, a judge does not have the option of suspending judgment until more information is available, but must decide after considering the best available science.”[23]

But the “best available science” may be pretty crummy, and the temptation to turn desperation into evidence (“well, it’s the best we have now”) is often severe.  The authors of the Preface thus remarkably signaled that “inconclusive” is not a judgment open to judges charged with expert witness gatekeeping.  If the authors truly meant to suggest that judges should go with whatever is dished out as “the best available science,” then they have overlooked the obvious:  Rule 702 opens the door to “scientific, technical, or other specialized knowledge,” not to hunches, suggestive but inconclusive evidence, and wishful thinking about how the science may turn out when further along.  Courts have a choice to exclude expert witness opinion testimony that is based upon incomplete or inconclusive evidence. The authors went fairly far afield to suggest, erroneously, that the incomplete and the inconclusive are good enough and should be admitted.

  2. Conflicts of Interest

Surprisingly, given the scope of the scientific areas covered in the RMSE, the authors discussed conflicts of interest (COI) at some length.  Conflicts of interest are a fact of life in all endeavors, and it is understandable to counsel judges and juries to try to identify, assess, and control them.  COIs, however, are weak proxies for unreliability.  The emphasis given here was undue because federal judges were enticed into thinking that they can discern unreliability from COI, when they should be focused on the data, inferences, and analyses.

What becomes fairly clear is that the authors of the Preface set out to use COI as a basis for giving litigation plaintiffs a pass, and for holding back studies sponsored by corporate defendants.

“Conflict of interest manifests as bias, and given the high stakes and adversarial nature of many courtroom proceedings, bias can have a major influence on evidence, testimony, and decisionmaking. Conflicts of interest take many forms and can be based on religious, social, political, or other personal convictions. The biases that these convictions can induce may range from serious to extreme, but these intrinsic influences and the biases they can induce are difficult to identify. Even individuals with such prejudices may not appreciate that they have them, nor may they realize that their interpretations of scientific issues may be biased by them. Because of these limitations, we consider here only financial conflicts of interest; such conflicts are discoverable. Nonetheless, even though financial conflicts can be identified, having such a conflict, even one involving huge sums of money, does not necessarily mean that a given individual will be biased. Having a financial relationship with a commercial entity produces a conflict of interest, but it does not inevitably evoke bias. In science, financial conflict of interest is often accompanied by disclosure of the relationship, leaving to the public the decision whether the interpretation might be tainted. Needless to say, such an assessment may be difficult. The problem is compounded in scientific publications by obscure ways in which the conflicts are reported and by a lack of disclosure of dollar amounts.

Judges and juries, however, must consider financial conflicts of interest when assessing scientific testimony. The threshold for pursuing the possibility of bias must be low. In some instances, judges have been frustrated in identifying expert witnesses who are free of conflict of interest because entire fields of science seem to be co-opted by payments from industry. Judges must also be aware that the research methods of studies funded specifically for purposes of litigation could favor one of the parties. Though awareness of such financial conflicts in itself is not necessarily predictive of bias, such information should be sought and evaluated as part of the deliberations.”[24]

All in all, rather misleading advice.  Financial conflicts are not the only conflicts that can be “discovered.”  Often expert witnesses will have political and organizational alignments, which will show deep-seated ideological alignments with the party for which they are testifying.  For instance, in one silicosis case, an expert witness in the field of history of medicine testified, at an examination before trial, that his father suffered from a silica-related disease.  This witness’s alignment with Marxist historians and his identification with radical labor movements made his non-financial conflicts obvious, although these COI would not necessarily have been apparent from his scholarly publications alone.

How low will the bar be set for discovering COI?  If testifying expert witnesses are relying upon textbooks, articles, and essays, will federal courts open the authors/hearsay declarants up to searching discovery of their finances? What really is at stake here is that the issues of accuracy, precision, and reliability are lost in the ad hominem project of discovering COIs.

Also misleading was the suggestion that “entire fields of science seem to be co-opted by payments from industry.”  Do the authors mean to exclude the plaintiffs’ lawyer lawsuit industry, which has become one of the largest rent-seeking organizations, and one of the most politically powerful groups in this country?  In litigations in which I have been involved, I have certainly seen plaintiffs’ counsel, or their proxies – labor unions, federal agencies, or “victim support groups” provide substantial funding for studies.  The Preface authors themselves show an untoward bias by their pointing out industry payments without giving balanced attention to other interested parties’ funding of scientific studies.

The attention to COI was also surprising given that one of the key chapters, for toxic tort practitioners, was written by Dr. Bernard D. Goldstein, who has testified in toxic tort cases, mostly (but not exclusively) for plaintiffs.[25]  In one such case, Makofsky, Dr. Goldstein’s participation was particularly revealing because he was forced to explain why he was willing to opine that benzene caused acute lymphocytic leukemia, despite the plethora of published studies finding no statistically significant relationship.  Dr. Goldstein resorted to the inaccurate notion that scientific “proof” of causation requires 95 percent certainty, whereas he imposed only a 51 percent certainty for his medico-legal testimonial adventures.[26] Dr. Goldstein also attempted to justify the discrepancy from the published literature by adverting to the lower standards used by federal regulatory agencies and treating physicians.  

These explanations were particularly concerning because they reflect basic errors in statistics and in causal reasoning.  The 95 percent derives from the use of the coefficient of confidence in confidence intervals, but the probability involved there is not the probability of the association’s being correct, and it has nothing to do with the probability in the belief that an association is real or is causal.  (Thankfully the RMSE chapter on statistics got this right, but my fear is that judges will skip over the more demanding chapter on statistics and place undue weight on the toxicology chapter.)  The reference to federal agencies (OSHA, EPA, etc.) and to treating physicians was meant, no doubt, to invoke precautionary principle concepts as a justification for some vague, ill-defined, lower standard of causal assessment.  These references were really covert invitations to shift the burden of proof.
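The coverage interpretation of the 95 percent figure can be demonstrated with a short simulation. The sketch below, with an assumed true proportion and sample size chosen purely for illustration, shows that 95 percent describes how often the interval-generating procedure captures the true value over repeated samples; it is not the probability that any particular association is real, and it is not a burden of proof.

```python
# A minimal simulation (illustrative assumptions only): the "95%" in a 95%
# confidence interval is the long-run frequency with which the procedure
# captures the true parameter over repeated samples, not the probability
# that any given study's finding is correct or causal.
import math
import random

def wald_ci(successes, n, z=1.96):
    """Approximate 95% Wald confidence interval for a proportion."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

def coverage(true_p=0.3, n=500, trials=10_000, seed=12345):
    random.seed(seed)
    covered = 0
    for _ in range(trials):
        successes = sum(random.random() < true_p for _ in range(n))
        lo, hi = wald_ci(successes, n)
        covered += (lo <= true_p <= hi)
    return covered / trials

if __name__ == "__main__":
    print(f"Empirical coverage over repeated samples: {coverage():.3f}")  # close to 0.95
```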

The Preface authors might well have taken their own counsel and conducted a more searching assessment of COI among the authors of the Reference Manual.  Better yet, the authors might have focused the judiciary on the data and the analysis.

IV. Reference Manual on Scientific Evidence (3d edition) on Statistical Significance

How does the new Reference Manual on Scientific Evidence treat statistical significance?  Inconsistently and at times incoherently.

  1. Professor Berger’s Introduction

In her introductory chapter, the late Professor Margaret A. Berger raised the question what role statistical significance should play in evaluating a study’s support for causal conclusions[27]:

“What role should statistical significance play in assessing the value of a study? Epidemiological studies that are not conclusive but show some increased risk do not prove a lack of causation. Some courts find that they therefore have some probative value,62 at least in proving general causation.63

This seems rather backwards.  Berger’s suggestion that inconclusive studies do not prove lack of causation seems nothing more than a tautology. Certainly the failure to rule out causation is not probative of causation. How can that tautology support the claim that inconclusive studies “therefore” have some probative value? Berger’s argument seems obviously invalid, or perhaps text that badly needed a posthumous editor.  And what epidemiologic studies are conclusive?  Are the studies individually or collectively conclusive?  Berger introduced a tantalizing concept, which was not spelled out anywhere in the Manual.

Berger’s chapter raised other, serious problems. If the relied-upon studies are not statistically significant, how should we understand the testifying expert witness to have ruled out random variability as an explanation for the disparity observed in the study or studies?  Berger did not answer these important questions, but her rhetoric elsewhere suggested that trial courts should not look too hard at the statistical support (or its lack) for what expert witness testimony is proffered.

Berger’s citations in support were curiously inaccurate.  Footnote 62 cites the Cook case:

“62. See Cook v. Rockwell Int’l Corp., 580 F. Supp. 2d 1071 (D. Colo. 2006) (discussing why the court excluded expert’s testimony, even though his epidemiological study did not produce statistically significant results).”

Berger’s citation was disturbingly incomplete.[28] The expert witness in Cook, Dr. Clapp, did rely upon his own study, which did not obtain a statistically significant result, but the trial court admitted the expert witness’s testimony; the court denied the Rule 702 challenge to Clapp, and permitted him to testify about a statistically non-significant ecological study. The parenthetical thus described the ruling backwards, and the judgment of the district court was, in any event, later reversed on appeal.

Footnote 63 is no better:

“63. In re Viagra Prods., 572 F. Supp. 2d 1071 (D. Minn. 2008) (extensive review of all expert evidence proffered in multidistricted product liability case).”

With respect to the concept of statistical significance, the Viagra case centered on the motion to exclude plaintiffs’ expert witness, Gerald McGwin, who relied upon three studies, none of which obtained a statistically significant result in its primary analysis.  The Viagra court’s review was hardly extensive; the court did not report, discuss, or consider the appropriate point estimates in most of the studies, the confidence intervals around those point estimates, or any aspect of systematic error in the three studies.  At best, the court’s review was perfunctory.  When the defendant brought to light the lack of data integrity in McGwin’s own study, the Viagra MDL court reversed itself, and granted the motion to exclude McGwin’s testimony.[29]  Berger’s chapter omitted the cautionary tale of McGwin’s serious, pervasive errors, and how they led to his ultimate exclusion. Berger’s characterization of the review was incorrect, and her failure to cite the subsequent procedural history, misleading.

  2. Chapter on Statistics

The Third Edition’s chapter on statistics was relatively free of value judgments about significance probability, and, therefore, an improvement over Berger’s introduction.  The authors carefully described significance probability and p-values, and explained[30]:

“Small p-values argue against the null hypothesis. Statistical significance is determined by reference to the p-value; significance testing (also called hypothesis testing) is the technique for computing p-values and determining statistical significance.”

Although the chapter conflated the positions often taken to be Fisher’s interpretation of p-values and Neyman’s conceptualization of hypothesis testing as a dichotomous decision procedure, this treatment was unfortunately fairly standard in introductory textbooks.  The authors may have felt that presenting multiple interpretations of p-values was asking too much of judges and lawyers, but the oversimplification invited a false sense of certainty about the inferences that can be drawn from statistical significance.
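For the curious, the arithmetic behind the quoted definition is not mysterious. The sketch below uses hypothetical cohort counts, which are my own and not from any study discussed here, to compute a relative risk, a two-sided p-value from the usual normal approximation on the log scale, and the conventional significance call at the 0.05 level.

```python
# Hypothetical cohort counts, for illustration only: a two-sided test of the
# null hypothesis RR = 1, using the normal approximation on the log scale.
import math

def rr_significance_test(cases_exp, n_exp, cases_unexp, n_unexp, alpha=0.05):
    rr = (cases_exp / n_exp) / (cases_unexp / n_unexp)
    # Large-sample standard error of log(RR).
    se = math.sqrt(1/cases_exp - 1/n_exp + 1/cases_unexp - 1/n_unexp)
    z = math.log(rr) / se
    # Two-sided p-value from the standard normal distribution.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return rr, z, p, p < alpha

if __name__ == "__main__":
    rr, z, p, significant = rr_significance_test(cases_exp=30, n_exp=1000,
                                                 cases_unexp=20, n_unexp=1000)
    print(f"RR = {rr:.2f}, z = {z:.2f}, p = {p:.3f}, "
          f"significant at the 0.05 level: {significant}")
```

The p-value reports how surprising data at least as extreme as those observed would be if the null hypothesis and the statistical model were true; the dichotomous “significant or not” label is a further, conventional decision rule layered on top.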

Kaye and Freedman, however, did offer some important qualifications to the untoward consequences of using significance testing as a dichotomous outcome[31]:

“Artifacts from multiple testing are commonplace. Because research that fails to uncover significance often is not published, reviews of the literature may produce an unduly large number of studies finding statistical significance.111 Even a single researcher may examine so many different relationships that a few will achieve statistical significance by mere happenstance. Almost any large dataset—even pages from a table of random digits—will contain some unusual pattern that can be uncovered by diligent search. Having detected the pattern, the analyst can perform a statistical test for it, blandly ignoring the search effort. Statistical significance is bound to follow.

There are statistical methods for dealing with multiple looks at the data, which permit the calculation of meaningful p-values in certain cases.112 However, no general solution is available, and the existing methods would be of little help in the typical case where analysts have tested and rejected a variety of models before arriving at the one considered the most satisfactory (see infra Section V on regression models). In these situations, courts should not be overly impressed with claims that estimates are significant. Instead, they should be asking how analysts developed their models.113

This important qualification to statistical significance was omitted from the overlapping discussion in the chapter on epidemiology, where it was very much needed.
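Kaye and Freedman’s warning about multiple looks at the data is easily demonstrated. In the simulation below, every “relationship” examined is pure noise, yet a handful of comparisons still cross the 0.05 threshold. The number of tests and the sample sizes are assumptions chosen only for illustration.

```python
# Pure-noise illustration of multiple-testing artifacts: with enough
# comparisons, some "statistically significant" findings appear by chance.
import math
import random
import statistics

def two_sample_p(xs, ys):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = math.sqrt(statistics.variance(xs) / len(xs) + statistics.variance(ys) / len(ys))
    z = (statistics.mean(xs) - statistics.mean(ys)) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def spurious_hits(n_tests=100, n=50, alpha=0.05, seed=7):
    random.seed(seed)
    hits = 0
    for _ in range(n_tests):
        xs = [random.gauss(0, 1) for _ in range(n)]
        ys = [random.gauss(0, 1) for _ in range(n)]  # same distribution: no real effect
        hits += two_sample_p(xs, ys) < alpha
    return hits

if __name__ == "__main__":
    print(f"'Significant' results out of 100 null comparisons: {spurious_hits()}")
    # Roughly five such results are expected by chance alone.
```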

  3. Chapter on Multiple Regression

The chapter on regression did not add much to the earlier and later discussions.  The author asked rhetorically what the appropriate level of statistical significance is, and answered:

“In most scientific work, the level of statistical significance required to reject the null hypothesis (i.e., to obtain a statistically significant result) is set conventionally at 0.05, or 5%.47

Daniel Rubinfeld, “Reference Guide on Multiple Regression,” in RMSE3d 303, 320.

  4. Chapter on Epidemiology

The chapter on epidemiology[32] mostly muddled the discussion set out in Kaye and Freedman’s chapter on statistics.

“The two main techniques for assessing random error are statistical significance and confidence intervals. A study that is statistically significant has results that are unlikely to be the result of random error, although any criterion for ‘significance’ is somewhat arbitrary. A confidence interval provides both the relative risk (or other risk measure) found in the study and a range (interval) within which the risk likely would fall if the study were repeated numerous times.”

The suggestion that a statistically significant study has results unlikely to be the result of random error, without reminding the reader that the finding is predicated on the assumptions that there is no association and that the probability distribution used was correct, came close to crossing the line into the transposition fallacy so nicely described and warned against in the statistics chapter. The problem was that “results” is ambiguous as between the data being at least as extreme as what was observed and the point estimate of the mean or proportion in the sample, and the assumptions that lead to a p-value were not disclosed.

The suggestion that alpha is “arbitrary” was “somewhat” correct, but this truncated discussion was distinctly unhelpful to judges who are likely to take “arbitrary” to mean “I will get reversed.”  The selection of alpha is conventional to some extent, and arbitrary in the sense that the law’s setting an age of majority or a voting age is arbitrary.  Some young adults, age 17.8 years old, may be better educated, better engaged in politics, and better informed about current events than 35 year olds, but the law must set a cut off.  Two year olds are demonstrably unfit, and 82 year olds are surely past the threshold of maturity requisite for political participation. A court might admit an opinion based upon a study of rare diseases, with tight control of bias and confounding, when p = 0.051, but that is hardly a justification for ignoring random error altogether, or for admitting an opinion based upon a study in which the disparity observed had a p = 0.15.

The epidemiology chapter correctly called out judicial decisions that confuse “effect size” with statistical significance[33]:

“Understandably, some courts have been confused about the relationship between statistical significance and the magnitude of the association. See Hyman & Armstrong, P.S.C. v. Gunderson, 279 S.W.3d 93, 102 (Ky. 2008) (describing a small increased risk as being considered statistically insignificant and a somewhat larger risk as being considered statistically significant.); In re Pfizer Inc. Sec. Litig., 584 F. Supp. 2d 621, 634–35 (S.D.N.Y. 2008) (confusing the magnitude of the effect with whether the effect was statistically significant); In re Joint E. & S. Dist. Asbestos Litig., 827 F. Supp. 1014, 1041 (S.D.N.Y. 1993) (concluding that any relative risk less than 1.50 is statistically insignificant), rev’d on other grounds, 52 F.3d 1124 (2d Cir. 1995).”

Actually this confusion is not understandable at all.  The distinction has been the subject of teaching since the first edition of the Reference Manual, and two of the cited cases post-date the second edition.  The Southern District of New York asbestos case, of course, predated the first Manual.  To be sure, courts have on occasion badly misunderstood significance probability and significance testing.   The authors of the epidemiology chapter could well have added In re Viagra, to the list of courts that confused effect size with statistical significance.[34]

The epidemiology chapter appropriately chastised courts for confusing significance probability with the probability that the null hypothesis, or its complement, is correct[35]:

“A common error made by lawyers, judges, and academics is to equate the level of alpha with the legal burden of proof. Thus, one will often see a statement that using an alpha of .05 for statistical significance imposes a burden of proof on the plaintiff far higher than the civil burden of a preponderance of the evidence (i.e., greater than 50%).  See, e.g., In re Ephedra Prods. Liab. Litig., 393 F. Supp. 2d 181, 193 (S.D.N.Y. 2005); Marmo v. IBP, Inc., 360 F. Supp. 2d 1019, 1021 n.2 (D. Neb. 2005) (an expert toxicologist who stated that science requires proof with 95% certainty while expressing his understanding that the legal standard merely required more probable than not). But see Giles v. Wyeth, Inc., 500 F. Supp. 2d 1048, 1056–57 (S.D. Ill. 2007) (quoting the second edition of this reference guide).

Comparing a selected p-value with the legal burden of proof is mistaken, although the reasons are a bit complex and a full explanation would require more space and detail than is feasible here. Nevertheless, we sketch out a brief explanation: First, alpha does not address the likelihood that a plaintiff’s disease was caused by exposure to the agent; the magnitude of the association bears on that question. See infra Section VII. Second, significance testing only bears on whether the observed magnitude of association arose  as a result of random chance, not on whether the null hypothesis is true. Third, using stringent significance testing to avoid false-positive error comes at a complementary cost of inducing false-negative error. Fourth, using an alpha of .5 would not be equivalent to saying that the probability the association found is real is 50%, and the probability that it is a result of random error is 50%.”

The footnotes went on to explain further the difference between alpha probability and burden of proof probability, but somewhat misleadingly asserted that “significance testing only bears on whether the observed magnitude of association arose as a result of random chance, not on whether the null hypothesis is true.”[36]  The significance probability does not address the probability that the observed statistic is the result of random chance; rather it describes the probability of observing at least as large a departure from the expected value if the null hypothesis is true.  Of course, if this cumulative probability is sufficiently low, then the null hypothesis is rejected, and this would seem to bear upon whether the null hypothesis is true.  Kaye and Freedman’s chapter on statistics did much better at describing p-values and avoiding the transposition fallacy.
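A simple, and admittedly stylized, Bayes calculation shows why the transposition is fallacious. The inputs below, the prior probability that an effect is real, the study’s power, and alpha, are assumed for illustration; the point is that the probability that an association is real, given a “statistically significant” result, depends upon those inputs and cannot be read off the significance level.

```python
# Illustrative Bayes calculation (all inputs assumed): the probability that an
# association is real, given a "statistically significant" finding, is not
# 1 - p; it depends on the prior plausibility of a real effect and on power.
def posterior_prob_real(prior_real, power=0.80, alpha=0.05):
    """P(effect is real | significant result), by Bayes' theorem."""
    p_sig_given_real = power
    p_sig_given_null = alpha
    p_sig = prior_real * p_sig_given_real + (1 - prior_real) * p_sig_given_null
    return prior_real * p_sig_given_real / p_sig

if __name__ == "__main__":
    for prior in (0.5, 0.1, 0.01):
        print(f"prior P(real) = {prior:>4}:  "
              f"P(real | significant at 0.05) = {posterior_prob_real(prior):.2f}")
```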

When they stayed on message, the authors of the epidemiology chapter were certainly correct that significance probability cannot be translated into an assessment of the probability that the null hypothesis, or the obtained sampling statistic, is correct.  What these authors omitted, however, was a clear statement that the many courts and counsel who have misstated this fact do not create any worthwhile precedent, persuasive or binding.

The epidemiology chapter ultimately failed to help judges in assessing statistical significance:

“There is some controversy among epidemiologists and biostatisticians about the appropriate role of significance testing.85 To the strictest significance testers, any study whose p-value is not less than the level chosen for statistical significance should be rejected as inadequate to disprove the null hypothesis. Others are critical of using strict significance testing, which rejects all studies with an observed p-value below that specified level. Epidemiologists have become increasingly sophisticated in addressing the issue of random error and examining the data from a study to ascertain what information they may provide about the relationship between an agent and a disease, without the necessity of rejecting all studies that are not statistically significant.86 Meta-analysis, as well, a method for pooling the results of multiple studies, sometimes can ameliorate concerns about random error.87  Calculation of a confidence interval permits a more refined assessment of appropriate inferences about the association found in an epidemiologic study.88

Id. at 578-79.  Mostly true, but again rather unhelpful to judges and lawyers.  Some of the controversy, to be sure, was instigated by statisticians and epidemiologists who would elevate Bayesian methods, and eliminate the use of significance probability and testing altogether. As for those scientists who still work within the dominant frequentist statistical paradigm, the chapter authors divided the world up into “strict” testers and those critical of “strict” testing.  Where, however, is the boundary? Does criticism of “strict” testing imply embrace of “non-strict” testing, or of no testing at all?  I can sympathize with a judge who permits reliance upon a series of studies that all go in the same direction, with each having a confidence interval that just misses excluding the null hypothesis.  Meta-analysis in such a situation might not just ameliorate concerns about random error, it might eliminate them.  But what of those scientists critical of strict testing?  This certainly does not suggest or imply that courts can or should ignore random error; yet that is exactly what happened in the early going in In re Viagra Products Liab. Litig.[37]  The epidemiology chapter’s reference to confidence intervals was correct in part; they permit a more refined assessment because they permit a more direct assessment of the extent of random error in terms of magnitude of association, as well as the point estimate of the association obtained from and conditioned on the sample.  Confidence intervals, however, do not eliminate the need to interpret the extent of random error; rather they provide a more direct assessment and measurement of the standard error.
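A small example shows what the “more refined assessment” amounts to. The two hypothetical studies below, with counts invented for illustration, share the same relative risk point estimate, but the narrower interval from the larger study conveys how much less random error attends its estimate.

```python
# Hypothetical counts, for illustration: two studies with the same relative
# risk point estimate but different sample sizes.  The width of the 95%
# confidence interval, not the significance label alone, conveys the extent
# of random error.
import math

def rr_with_ci(cases_exp, n_exp, cases_unexp, n_unexp, z=1.96):
    rr = (cases_exp / n_exp) / (cases_unexp / n_unexp)
    se_log_rr = math.sqrt(1/cases_exp - 1/n_exp + 1/cases_unexp - 1/n_unexp)
    return (rr,
            math.exp(math.log(rr) - z * se_log_rr),
            math.exp(math.log(rr) + z * se_log_rr))

if __name__ == "__main__":
    studies = {"small study": rr_with_ci(15, 500, 10, 500),
               "large study": rr_with_ci(150, 5000, 100, 5000)}
    for label, (rr, lo, hi) in studies.items():
        print(f"{label}: RR = {rr:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```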

V. Power in the Reference Manual on Scientific Evidence

The Third Edition treated statistical power in three of its chapters, those on statistics, epidemiology, and medical testimony.  Unfortunately, the treatments were not always consistent.

The chapter on statistics has been consistently among the most frequently ignored content of the three editions of the Reference Manual.  The third edition offered a good introduction to basic concepts of sampling, random variability, significance testing, and confidence intervals.[38]  Kaye and Freedman provided an acceptable non-technical definition of statistical power[39]:

“More precisely, power is the probability of rejecting the null hypothesis when the alternative hypothesis … is right. Typically, this probability will depend on the values of unknown parameters, as well as the preset significance level α. The power can be computed for any value of α and any choice of parameters satisfying the alternative hypothesis. Frequentist hypothesis testing keeps the risk of a false positive to a specified level (such as α = 5%) and then tries to maximize power. Statisticians usually denote power by the Greek letter beta (β). However, some authors use β to denote the probability of accepting the null hypothesis when the alternative hypothesis is true; this usage is fairly standard in epidemiology. Accepting the null hypothesis when the alternative holds true is a false negative (also called a Type II error, a missed signal, or a false acceptance of the null hypothesis).”

The definition was not, however, without problems.  First, it introduced a nomenclature issue likely to be confusing for judges and lawyers. Kaye and Freedman used β to denote statistical power, but they acknowledged that epidemiologists use β to denote the probability of a Type II error.  And indeed, both the chapters on epidemiology and on medical testimony used β to refer to the Type II error rate, and thus denoted power as the complement of β, or (1 − β).[40]

Second, the reason for introducing the confusion about β was doubtful.  Kaye and Freedman suggested that statisticians usually denote power by β, but they offered no citations.  A quick review (not necessarily complete or even a random sample) suggests that many modern statistics texts denote power as (1 − β).[41]  At the end of the day, there really was no reason for the conflicting nomenclature and the likely confusion it would engender.  Indeed, the duplicative handling of statistical power, and of other concepts, suggested that it is time to eliminate the repetitive discussions, in favor of one clear, thorough discussion in the statistics chapter.

Third, Kaye and Freedman problematically referred to β as the probability of accepting the null hypothesis, when elsewhere they more carefully instructed that a non-significant finding results in not rejecting the null hypothesis, as opposed to accepting the null.  Id. at 253.[42]

Fourth, Kaye and Freedman’s discussion of power, unlike most of their chapter, offered advice that is controversial and unclear:

“On the other hand, when studies have a good chance of detecting a meaningful association, failure to obtain significance can be persuasive evidence that there is nothing much to be found.”[43]

Note that the authors left open what a legal or clinically meaningful association is, and thus offered no real guidance to judges on how to evaluate power after data are collected and analyzed.  As Professor Sander Greenland has argued, in legal contexts, this reliance upon observed power (as opposed to power as a guide in determining appropriate sample size in the planning stages of a study) was arbitrary and “unsalvageable as an analytic tool.”[44]

The chapter on epidemiology offered similar controversial advice on the use of power[45]:

“When a study fails to find a statistically significant association, an important question is whether the result tends to exonerate the agent’s toxicity or is essentially inconclusive with regard to toxicity.93 The concept of power can be helpful in evaluating whether a study’s outcome is exonerative or inconclusive.94  The power of a study is the probability of finding a statistically significant association of a given magnitude (if it exists) in light of the sample sizes used in the study. The power of a study depends on several factors: the sample size; the level of alpha (or statistical significance) specified; the background incidence of disease; and the specified relative risk that the researcher would like to detect.95  Power curves can be constructed that show the likelihood of finding any given relative risk in light of these factors. Often, power curves are used in the design of a study to determine what size the study populations should be.96

Although the authors correctly emphasized the need to specify an alternative hypothesis, their discussion and advice were empty of how that alternative should be selected in legal contexts.  The suggestion that power curves can be constructed was, of course, true, but irrelevant unless courts know where on the power curve they should be looking.  The authors were also correct that power is used to determine adequate sample size under specified conditions; but again, the use of power curves in this setting is today rather uncommon.  Investigators select a level of power corresponding to an acceptable Type II error rate, and an alternative hypothesis that would be clinically meaningful for their research, in order to determine their sample size. Translating clinical into legal meaningfulness is not always straightforward.
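To give some content to these abstractions, the following sketch approximates the power of a hypothetical cohort study to detect a specified relative risk, as a function of sample size. The baseline risk, the target relative risk, and the sample sizes are all assumptions, and the normal approximation is crude, but the exercise shows how “power” depends upon the alternative hypothesis the analyst chooses.

```python
# Illustrative power calculation (all inputs assumed): approximate power of a
# hypothetical cohort study to detect a specified relative risk, using the
# normal approximation for log(RR).  Not a substitute for careful study design.
import math

def normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def power_for_rr(n_per_arm, baseline_risk, target_rr, z_alpha=1.96):
    """Approximate power to detect target_rr versus RR = 1 at two-sided alpha = 0.05."""
    p0 = baseline_risk
    p1 = baseline_risk * target_rr
    # Expected standard error of log(RR) under the alternative hypothesis.
    se = math.sqrt((1 - p1) / (p1 * n_per_arm) + (1 - p0) / (p0 * n_per_arm))
    return normal_cdf(abs(math.log(target_rr)) / se - z_alpha)

if __name__ == "__main__":
    for n in (500, 2000, 8000):
        power = power_for_rr(n_per_arm=n, baseline_risk=0.02, target_rr=1.5)
        print(f"n per arm = {n:>5}:  power to detect RR = 1.5 is roughly {power:.2f}")
```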

In a footnote, the authors of the epidemiology chapter noted that Professor Rothman has been “one of the leaders in advocating the use of confidence intervals and rejecting strict significance testing.”[46] What the chapter failed, however, to mention is that Rothman has also been outspoken in rejecting post-hoc power calculations that the epidemiology chapter seemed to invite:

“Standard statistical advice states that when the data indicate a lack of significance, it is important to consider the power of the study to detect as significant a specific alternative hypothesis. The power of a test, however, is only an indirect indicator of precision, and it requires an assumption about the magnitude of the effect. In planning a study, it is reasonable to make conjectures about the magnitude of an effect to compute study-size requirements or power. In analyzing data, however, it is always preferable to use the information in the data about the effect to estimate it directly, rather than to speculate about it with study-size or power calculations (Smith and Bates, 1992; Goodman and Berlin, 1994; Hoening and Heisey, 2001). Confidence limits and (even more so) P-value functions convey much more of the essential information by indicating the range of values that are reasonably compatible with the observations (albeit at a somewhat arbitrary alpha level), assuming the statistical model is correct. They can also show that the data do not contain the information necessary for reassurance about an absence of effect.”[47]

The selective, incomplete scholarship of the epidemiology chapter on the issue of statistical power was not only unfortunate, but it distorted the authors’ evaluation of the sparse case law on the issue of power.  For instance, they noted:

“Even when a study or body of studies tends to exonerate an agent, that does not establish that the agent is absolutely safe. See Cooley v. Lincoln Elec. Co., 693 F. Supp. 2d 767 (N.D. Ohio 2010). Epidemiology is not able to provide such evidence.”[48]

Here the authors, Green, Freedman, and Gordis, shifted the burden to the defendant, and then went to an even further extreme of making the burden of proof one of absolute certainty in the product’s safety.  This is not, and never has been, a legal standard. The cases they cited amplified the error. In Cooley, for instance, the defense expert would have opined that welding fume exposure did not cause parkinsonism or Parkinson’s disease.  Although the expert witness had not conducted a meta-analysis, he had reviewed the confidence intervals around the point estimates of the available studies.  Many of the point estimates were at or below 1.0, and in some cases, the upper bound of the confidence interval excluded 1.0.  The trial court expressed its concern that the expert witness had inferred “evidence of absence” from “absence of evidence.”  Cooley v. Lincoln Elec. Co., 693 F. Supp. 2d 767, 773 (N.D. Ohio 2010).  This concern, however, was misguided given that many studies had tested the claimed association, and that virtually every case-control and cohort study had found risk ratios at or below 1.0, or very close to 1.0.  What the court in Cooley, and the authors of the epidemiology chapter in the third edition, have lost sight of is that when the hypothesis is repeatedly tested, with failure to reject the null hypothesis, and with point estimates at or very close to 1.0, and with narrow confidence intervals, then the claimed association is probably incorrect.[49]

The Cooley court’s comments might have had some validity when applied to a single study, but not to the impressive body of exculpatory epidemiologic evidence that pertained to welding fume and Parkinson’s disease.  Shortly after the Cooley case was decided, a published meta-analysis of welding fume or manganese exposure demonstrated a reduced level of risk for Parkinson’s disease among persons occupationally exposed to welding fumes or manganese.[50]

VI. The Treatment of Meta-Analysis in the Third Edition

Meta-analysis is a statistical procedure for aggregating data and statistics from individual studies into a single summary statistical estimate of the population measurement of interest.  The first meta-analysis is typically attributed to Karl Pearson, circa 1904, who sought a method to overcome the limitations of small sample size and low statistical power.  Statistical methods for meta-analysis in epidemiology and the social sciences, however, did not mature until the 1970s.  Even then, the biomedical scientific community remained skeptical of, if not outright hostile to, meta-analysis until relatively recently.

The hostility to meta-analysis, especially in the context of observational epidemiologic studies, was colorfully expressed by two capable epidemiologists, Samuel Shapiro and Alvan Feinstein, as late as the 1990s:

“Meta-analysis begins with scientific studies….  [D]ata from these studies are then run through computer models of bewildering complexity which produce results of implausible precision.”

* * * *

“I propose that the meta-analysis of published non-experimental data should be abandoned.”[51]

The professional skepticism about meta-analysis was reflected in some of the early judicial assessments of meta-analysis in court cases.  In the 1980s and early 1990s, some trial judges erroneously dismissed meta-analysis as a flawed statistical procedure that claimed to make something out of nothing.[52]

In In re Paoli Railroad Yard PCB Litigation, Judge Robert Kelly excluded plaintiffs’ expert witness Dr. William Nicholson and his testimony based upon his unpublished meta-analysis of health outcomes among PCB-exposed workers.  Judge Kelly found that the meta-analysis was a novel technique, and that Nicholson’s meta-analysis was not peer reviewed.  Furthermore, the meta-analysis assessed health outcomes not experienced by any of the plaintiffs before the trial court.[53]

The Court of Appeals for the Third Circuit reversed the exclusion of Dr. Nicholson’s testimony, and remanded for reconsideration with instructions.[54]  The Circuit noted that meta-analysis was not novel, and that the lack of peer-review was not an automatic disqualification.  Acknowledging that a meta-analysis could be performed poorly using invalid methods, the appellate court directed the trial court to evaluate the validity of Dr. Nicholson’s work on his meta-analysis. On remand, however, it seems that plaintiffs chose – wisely – not to proceed with Nicholson’s meta-analysis.[55]

In one of many skirmishes over colorectal cancer claims in asbestos litigation, Judge Sweet in the Southern District of New York was unimpressed by efforts to aggregate data across studies.  Judge Sweet declared that:

“no matter how many studies yield a positive but statistically insignificant SMR for colorectal cancer, the results remain statistically insignificant. Just as adding a series of zeros together yields yet another zero as the product, adding a series of positive but statistically insignificant SMRs together does not produce a statistically significant pattern.”[56]

The plaintiffs’ expert witness who had offered the unreliable testimony, Dr. Steven Markowitz, like Nicholson, another foot soldier in Dr. Irving Selikoff’s litigation machine, did not offer a formal meta-analysis to justify his assessment that multiple non-significant studies, taken together, rule out chance as a likely explanation for an aggregate finding of an increased risk.

Judge Sweet was quite justified in rejecting this back-of-the-envelope, non-quantitative meta-analysis.  His suggestion, however, that multiple non-significant studies could never collectively serve to rule out chance as an explanation for an overall increased rate of disease in the exposed groups is completely wrong.  Judge Sweet would have done better to focus on the validity issues in key studies, the presence of bias and confounding, and the completeness of the proffered meta-analysis.  The Second Circuit reversed the entry of summary judgment, and remanded the colorectal cancer claim for trial.[57]  Over a decade later, with even more accumulated studies and data, the Institute of Medicine found the evidence for asbestos plaintiffs’ colorectal cancer claims to be scientifically insufficient.[58]

Courts continue to go astray with an erroneous belief that multiple studies, all without statistically significant results, cannot yield a statistically significant summary estimate of increased risk.  See, e.g., Baker v. Chevron USA, Inc., 2010 WL 99272, *14-15 (S.D.Ohio 2010) (addressing a meta-analysis by Dr. Infante on multiple myeloma outcomes in studies of benzene-exposed workers).  There were many sound objections to Infante’s meta-analysis, but the suggestion that multiple studies without statistical significance could not yield a summary estimate of risk with statistical significance was not one of them.
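
The fallacy is easy to expose with a small worked example. The sketch below uses the simplest way of combining standardized mortality ratios (SMRs): sum the observed and expected events across studies and compute an approximate confidence interval for the pooled SMR. The counts and the function name are hypothetical, chosen only for illustration; each cohort on its own has a confidence interval that includes 1.0, yet the pooled estimate does not.

```python
import math

def pooled_smr(observed, expected, z=1.96):
    """Pool SMRs by summing observed and expected counts across studies,
    with an approximate (Byar) 95% confidence interval for the pooled SMR."""
    O, E = sum(observed), sum(expected)
    smr = O / E
    # Byar's approximation to the Poisson confidence limits for the count O.
    lower = O * (1 - 1 / (9 * O) - z / (3 * math.sqrt(O))) ** 3 / E
    upper = (O + 1) * (1 - 1 / (9 * (O + 1)) + z / (3 * math.sqrt(O + 1))) ** 3 / E
    return smr, (lower, upper)

# Five hypothetical cohorts, each with a modestly elevated SMR (~1.4) and a
# 95% interval that, taken alone, includes 1.0 (i.e., "not significant").
obs = [12, 15, 14, 16, 18]              # observed cases
exp = [8.5, 10.8, 10.0, 11.6, 13.0]     # expected cases
print(pooled_smr(obs, exp))             # pooled SMR ~1.39, 95% CI ~(1.09, 1.74)
```

Summing observed and expected counts assumes a roughly common underlying ratio across the cohorts; the arithmetic nonetheless suffices to show that Judge Sweet’s zeros are not zeros at all.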

In the last two decades, meta-analysis has emerged as an important technique for addressing random variation in studies, as well as some of the limitations of frequentist statistical methods.  In the 1980s, articles reporting meta-analyses were rare to non-existent.  In 2009, there were over 2,300 articles with “meta-analysis” in their title, or in their keywords, indexed in the PubMed database of the National Library of Medicine.[59]

The techniques for aggregating data have been studied, refined, and employed extensively in thousands of methods and application papers in the last decade. Consensus guideline papers have been published for meta-analyses of clinical trials as well as observational studies.[60]  Meta-analyses, of observational studies and of randomized clinical trials, routinely are relied upon by expert witnesses in pharmaceutical and so-called toxic tort litigation.[61]

The second edition of the Reference Manual on Scientific Evidence gave very little attention to meta-analysis; the third edition did not add very much to the discussion.  The time has come for the next edition to address meta-analyses, and criteria for their validity or invalidity.

  1. Statistics Chapter

The statistics chapter of the third edition gave scant attention to meta-analysis.  The chapter noted, in a footnote, that there are formal procedures for aggregating data across studies, and that the power of the aggregated data will exceed the power of the individual, included studies.  The footnote then cautioned that meta-analytic procedures “have their own weakness,”[62] without detailing what that weakness is. The time has come to spell out the weaknesses so that trial judges can evaluate opinion testimony based upon meta-analyses.
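
The footnote’s premise, that aggregated data have greater power than any single included study, is easy to illustrate. The back-of-the-envelope sketch below uses a normal approximation and hypothetical numbers of my own choosing; it is a simplification, not a recipe, but it shows how pooling five comparable studies can turn an underpowered question into a well-powered one.

```python
import math

def power_two_sided(log_rr, se, z_alpha=1.96):
    """Approximate power of a two-sided test to detect a true log risk ratio
    log_rr when the estimator has standard error se (normal approximation)."""
    phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return phi(abs(log_rr) / se - z_alpha)

true_rr = 1.4                                  # hypothetical true relative risk
se_single = 0.25                               # hypothetical SE of one study
se_pooled = se_single / math.sqrt(5)           # five comparable studies pooled

print(f"single-study power: {power_two_sided(math.log(true_rr), se_single):.0%}")
print(f"pooled power:       {power_two_sided(math.log(true_rr), se_pooled):.0%}")
```

The corresponding weakness, which the footnote left unstated, is that the pooled estimate is only as good as the studies, and the selection of studies, that go into it.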

The glossary at the end of the statistics chapter offers a definition of meta-analysis:

“meta-analysis. Attempts to combine information from all studies on a certain topic. For example, in the epidemiological context, a meta-analysis may attempt to provide a summary odds ratio and confidence interval for the effect of a certain exposure on a certain disease.”[63]

This definition was inaccurate in ways that could yield serious mischief.  Virtually all meta-analyses are, or should be, built upon a systematic review that sets out to collect all available studies on a research issue of interest.  It is a rare meta-analysis, however, that includes “all” studies in its quantitative analysis.  The meta-analytic process involves a pre-specification of inclusionary and exclusionary criteria for the quantitative analysis of the summary estimate of risk.  Those criteria may limit the quantitative analysis to randomized trials, or to analytical epidemiologic studies.  Furthermore, meta-analyses frequently and appropriately have pre-specified exclusionary criteria that relate to study design or quality.

On a more technical note, the offered definition suggests that the summary estimate of risk will be an odds ratio, which may or may not be true.  Meta-analyses may yield summary estimates in terms of odds ratios, relative risks, hazard ratios, or even risk differences.  A meta-analysis may also combine means rather than proportions.

  2. Epidemiology Chapter

The chapter on epidemiology delved into meta-analysis in greater detail than the statistics chapter, and offered apparently inconsistent advice.  The overall gist of the chapter, however, can perhaps best be summarized by the definition offered in this chapter’s glossary:

“meta-analysis. A technique used to combine the results of several studies to enhance the precision of the estimate of the effect size and reduce the plausibility that the association found is due to random sampling error.  Meta-analysis is best suited to pooling results from randomly controlled experimental studies, but if carefully performed, it also may be useful for observational studies.”[64]

It is now time to tell trial judges what “careful” means in the context of conducting and reporting and relying upon meta-analyses.

The epidemiology chapter appropriately noted that meta-analysis can help address concerns over random error in small studies.[65]  Having told us that properly conducted meta-analyses of observational studies can be helpful, the chapter then proceeded to hedge considerably[66]:

“Meta-analysis is most appropriate when used in pooling randomized experimental trials, because the studies included in the meta-analysis share the most significant methodological characteristics, in particular, use of randomized assignment of subjects to different exposure groups. However, often one is confronted with nonrandomized observational studies of the effects of possible toxic substances or agents. A method for summarizing such studies is greatly needed, but when meta-analysis is applied to observational studies – either case-control or cohort – it becomes more controversial.174 The reason for this is that often methodological differences among studies are much more pronounced than they are in randomized trials. Hence, the justification for pooling the results and deriving a single estimate of risk, for example, is problematic.175

The stated objection to pooling results for observational studies was certainly correct, but many research topics have sufficient studies available to allow for appropriate selectivity in framing inclusionary and exclusionary criteria to address the objection.  The chapter went on to credit the critics of meta-analyses of observational studies.  As they did in the second edition of the Reference Manual, the authors in the third edition repeated their cites to, and quotes from, early papers by John Bailar, who was then critical of such meta-analyses:

“Much has been written about meta-analysis recently and some experts consider the problems of meta-analysis to outweigh the benefits at the present time. For example, John Bailar has observed:

‘[P]roblems have been so frequent and so deep, and overstatements of the strength of conclusions so extreme, that one might well conclude there is something seriously and fundamentally wrong with the method. For the present . . . I still prefer the thoughtful, old-fashioned review of the literature by a knowledgeable expert who explains and defends the judgments that are presented. We have not yet reached a stage where these judgments can be passed on, even in part, to a formalized process such as meta-analysis.’

John Bailar, “Assessing Assessments,” 277 Science 528, 529 (1997).”[67]

Bailar’s subjective preference for “old-fashioned” reviews, which often cherry-picked the included studies, is, well, “old fashioned.”  More to the point, it is questionable science, and a distinctly minority viewpoint in the light of substantial improvements in the conduct and reporting of systematic reviews and meta-analyses of observational studies.  Bailar may be correct that some meta-analyses should never have left the protocol stage, but the third edition of the Reference Manual failed to provide the judiciary with the tools to appreciate the distinction between good and bad meta-analyses.

This categorical rejection, cited with apparent approval, was amplified by a recitation of some real or apparent problems with meta-analyses of observational studies.  What is missing is any discussion of how many of these problems can be, and are, dealt with in contemporary practice[68]:

“A number of problems and issues arise in meta-analysis. Should only published papers be included in the meta-analysis, or should any available studies be used, even if they have not been peer reviewed? Can the results of the meta-analysis itself be reproduced by other analysts? When there are several meta-analyses of a given relationship, why do the results of different meta-analyses often disagree? The appeal of a meta-analysis is that it generates a single estimate of risk (along with an associated confidence interval), but this strength can also be a weakness, and may lead to a false sense of security regarding the certainty of the estimate. A key issue is the matter of heterogeneity of results among the studies being summarized.  If there is more variance among study results than one would expect by chance, this creates further uncertainty about the summary measure from the meta-analysis. Such differences can arise from variations in study quality, or in study populations or in study designs. Such differences in results make it harder to trust a single estimate of effect; the reasons for such differences need at least to be acknowledged and, if possible, explained.176 People often tend to have an inordinate belief in the validity of the findings when a single number is attached to them, and many of the difficulties that may arise in conducting a meta-analysis, especially of observational studies such as epidemiologic ones, may consequently be overlooked.177

The epidemiology chapter authors were entitled to their opinion, but their discussion left the judiciary uninformed about current practice, and best practices, in epidemiology.  A categorical rejection of meta-analyses of observational studies is at odds with the chapter’s own claim that such meta-analyses can be helpful if properly performed. What was needed, and is missing, is a meaningful discussion to help the judiciary determine whether a meta-analysis of observational studies was properly performed.
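
To make the heterogeneity concern quoted above concrete: whether study results vary more than chance alone would predict is not merely a matter of impression; it is routinely quantified. The sketch below, with hypothetical study results and a function name of my own, computes Cochran’s Q statistic and the derived I² measure that contemporary meta-analyses report; large values signal that a single summary estimate should be interpreted cautiously, if at all.

```python
import math

def heterogeneity(log_rrs, ses):
    """Cochran's Q and I-squared for study estimates on the log scale."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * x for w, x in zip(weights, log_rrs)) / sum(weights)
    q = sum(w * (x - pooled) ** 2 for w, x in zip(weights, log_rrs))
    df = len(log_rrs) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2

# Four hypothetical studies with discordant results.
rrs = [0.9, 1.2, 1.8, 2.4]
ses = [0.15, 0.20, 0.25, 0.30]
q, df, i2 = heterogeneity([math.log(r) for r in rrs], ses)
print(f"Q = {q:.1f} on {df} df; I^2 = {i2:.0f}%")   # substantial heterogeneity
```

A future edition of the Manual could do worse than to tell judges to ask whether a proffered meta-analysis reported such measures, and what the proponent did about them.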

  3. Medical Testimony Chapter

The chapter on medical testimony is the third pass at meta-analysis in the third edition of the Reference Manual.  The second edition’s chapter on medical testimony ignored meta-analysis completely; the new edition addresses meta-analysis in the context of the hierarchy of study designs[69]:

“Other circumstances that set the stage for an intense focus on medical evidence included

(1) the development of medical research, including randomized controlled trials and other observational study designs;

(2) the growth of diagnostic and therapeutic interventions;141

(3) interest in understanding medical decision making and how physicians reason;142 and

(4) the acceptance of meta-analysis as a method to combine data from multiple randomized trials.143

This language from the medical testimony chapter curiously omitted observational studies, but the footnote reference (note 143) then inconsistently discussed two meta-analyses of observational, rather than experimental, studies.[70]  The chapter then provided even further confusion by giving a more detailed listing of the hierarchy of medical evidence in the form of different study designs[71]:

3. Hierarchy of medical evidence

With the explosion of available medical evidence, increased emphasis has been placed on assembling, evaluating, and interpreting medical research evidence.  A fundamental principle of evidence-based medicine (see also Section IV.C.5, infra) is that the strength of medical evidence supporting a therapy or strategy is hierarchical.  When ordered from strongest to weakest, systematic review of randomized trials (meta-analysis) is at the top, followed by single randomized trials, systematic reviews of observational studies, single observational studies, physiological studies, and unsystematic clinical observations.150 An analysis of the frequency with which various study designs are cited by others provides empirical evidence supporting the influence of meta-analysis followed by randomized controlled trials in the medical evidence hierarchy.151 Although they are at the bottom of the evidence hierarchy, unsystematic clinical observations or case reports may be the first signals of adverse events or associations that are later confirmed with larger or controlled epidemiological studies (e.g., aplastic anemia caused by chloramphenicol,152 or lung cancer caused by asbestos153). Nonetheless, subsequent studies may not confirm initial reports (e.g., the putative association between coffee consumption and pancreatic cancer).154

This discussion further muddied the waters by using a parenthetical to suggest that meta-analyses of randomized clinical trials are equivalent to systematic reviews of such studies — “systematic review of randomized trials (meta-analysis).” Of course, systematic reviews are not meta-analyses, although they are usually a necessary precondition for conducting a proper meta-analysis.  The relationship between the procedures for a systematic review and those for a meta-analysis is in need of clarification, but the judiciary will not find it in the third edition of the Reference Manual.

CONCLUSION

The idea of the Reference Manual was an important one: to support trial judges’ efforts to engage in gatekeeping in unfamiliar subject matter areas. In its third incarnation, the Manual has become a standard starting place for discussion, but on several crucial issues, the third edition was unclear, imprecise, contradictory, or muddled. The organizational committee and authors for the fourth edition have a fair amount of work on their hands; there is clearly room for improvement.


[1] Adam Dutkiewicz, “Book Review: Reference Manual on Scientific Evidence, Third Edition,” 28 Thomas M. Cooley L. Rev. 343 (2011); John A. Budny, “Book Review: Reference Manual on Scientific Evidence, Third Edition,” 31 Internat’l J. Toxicol. 95 (2012); James F. Rogers, Jim Shelson, and Jessalyn H. Zeigler, “Changes in the Reference Manual on Scientific Evidence (Third Edition),” Internat’l Ass’n Def. Csl. Drug, Device & Biotech. Comm. Newsltr. (June 2012).  See Schachtman “New Reference Manual’s Uneven Treatment of Conflicts of Interest.” (Oct. 12, 2011).

[2] The Manual did not do quite so well in addressing its own conflicts of interest.  See, e.g., infra at notes 7, 20.

[3] RMSE 3d 11 (2011).

[4] Id. at 19.

[5] Id. at 20 & n.51 (citing Susan Haack, “An Epistemologist in the Bramble-Bush: At the Supreme Court with Mr. Joiner,” 26 J. Health Pol. Pol’y & L. 217–37 (1999)).

[6] Id. at 19-20 & n.52.

[7] Professor Berger filed an amicus brief on behalf of plaintiffs, in Rider v. Sandoz Pharms. Corp., 295 F.3d 1194 (11th Cir. 2002).

[8] Id. at 20 n.51. (The editors noted misleadingly that the published chapter was Berger’s last revision, with “a few edits to respond to suggestions by reviewers.”). I have written elsewhere of the ethical cloud hanging over this Milward decision. See “Carl Cranor’s Inference to the Best Explanation” (Feb. 12, 2021); “From here to CERT-ainty” (June 28, 2018); “The Council for Education & Research on Toxics” (July 9, 2013) (CERT amicus brief filed without any disclosure of conflict of interest). See also NAS, “Carl Cranor’s Conflicted Jeremiad Against Daubert” (Sept. 23, 2018).

[9] RMSE 3d at 610 (internal citations omitted).

[10] RMSE 3d at 610 n.184 (emphasis, in bold, added).

[11] Interestingly, the authors of this chapter seem to abandon their suggestion that studies relied upon “might qualify for the learned treatise exception to the hearsay rule, Fed. R. Evid. 803(18), or possibly the catchall exceptions, Fed. R. Evid. 803(24) & 804(5),” which was part of their argument in the Second Edition.  RMSE 2d at 335 (2000).  See also RMSE 3d at 214 (discussing statistical studies as generally “admissible,” but acknowledging that admissibility may be no more than permission to explain the basis for an expert’s opinion, which is hardly admissibility at all).

[12] David L. Faigman, et al., Modern Scientific Evidence:  The Law and Science of Expert Testimony v.1, § 23:1,at 206 (2009) (“Well conducted studies are uniformly admitted.”).

[13] See Richard M. Lynch and Mary S. Henifin, “Causation in Occupational Disease: Balancing Epidemiology, Law and Manufacturer Conduct,” 9 Risk: Health, Safety & Environment 259, 269 (1998) (conflating distinct causal and liability concepts, and arguing that legal and scientific causal criteria should be abrogated when manufacturing defendant has breached a duty of care).

[14]  See, e.g., Parker v. Mobil Oil Corp., 7 N.Y.3d 434, 857 N.E.2d 1114, 824 N.Y.S.2d 584 (2006) (dismissing leukemia (AML) claim based upon claimed low-level benzene exposure from gasoline), aff’g 16 A.D.3d 648 (App. Div. 2d Dep’t 2005).  No; you will not find the Parker case cited in the Manual‘s chapter on toxicology. (Parker is, however, cited in the chapter on exposure science even though it is a state court case.).

[15] Curtis D. Klaassen, Casarett & Doull’s Toxicology: The Basic Science of Poisons 23 (7th ed. 2008) (internal citations omitted).

[16] Philip Wexler, Bethesda, et al., eds., 2 Encyclopedia of Toxicology 96 (2005).

[17] See Edward J. Calabrese and Robyn B. Blain, “The hormesis database: The occurrence of hormetic dose responses in the toxicological literature,” 61 Regulatory Toxicology and Pharmacology 73 (2011) (reviewing about 9,000 dose-response relationships for hormesis, to create a database of various aspects of hormesis).  See also Edward J. Calabrese and Robyn B. Blain, “The occurrence of hormetic dose responses in the toxicological literature, the hormesis database: An overview,” 202 Toxicol. & Applied Pharmacol. 289 (2005) (earlier effort to establish hormesis database).

[18] Reference Manual at 653

[19] See, e.g., Karin Wirdefeldt, Hans-Olaf Adami, Philip Cole, Dimitrios Trichopoulos, and Jack Mandel, “Epidemiology and etiology of Parkinson’s disease: a review of the evidence,” 26 European J. Epidemiol. S1, S20-21 (2011); Tomas R. Guilarte, “Manganese and Parkinson’s Disease: A Critical Review and New Findings,” 118 Environ Health Perspect. 1071, 1078 (2010) (“The available evidence from human and nonhuman primate studies using behavioral, neuroimaging, neurochemical, and neuropathological end points provides strong support to the hypothesis that, although excess levels of [manganese] accumulation in the brain results in an atypical form of parkinsonism, this clinical outcome is not associated with the degeneration of nigrostriatal dopaminergic neurons as is the case in PD [Parkinson’s disease].”).

[20] RMSE3ed at 646.

[21] Hans-Olov Adami, Sir Colin L. Berry, Charles B. Breckenridge, Lewis L. Smith, James A. Swenberg, Dimitrios Trichopoulos, Noel S. Weiss, and Timothy P. Pastoor, “Toxicology and Epidemiology: Improving the Science with a Framework for Combining Toxicological and Epidemiological Evidence to Establish Causal Inference,” 122 Toxicological Sciences 223, 224 (2011).

[22] RMSE3d at xiv.

[23] RMSE3d at xiv.

[24] RMSE3d at xiv-xv.

[25] See, e.g., Parker v. Mobil Oil Corp., 7 N.Y.3d 434, 857 N.E.2d 1114, 824 N.Y.S.2d 584 (2006); Exxon Corp. v. Makofski, 116 SW 3d 176 (Tex. Ct. App. 2003).

[26] Goldstein here and elsewhere has confused significance probability with the posterior probability required by courts and scientists.

[27] Margaret A. Berger, “The Admissibility of Expert Testimony,” in RMSE3d 11, 24 (2011).

[28] Cook v. Rockwell Int’l Corp., 580 F. Supp. 2d 1071, 1122 (D. Colo. 2006), rev’d and remanded on other grounds, 618 F.3d 1127 (10th Cir. 2010), cert. denied, ___ U.S. ___ (May 24, 2012).

[29] In re Viagra Products Liab. Litig., 658 F. Supp. 2d 936, 945 (D. Minn. 2009). 

[31] Id. at 256-57.

[32] Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” in RMSE3d 549, 573.

[33] Id. at 573 n.68.

[34] See In re Viagra Products Liab. Litig., 572 F. Supp. 2d 1071, 1081 (D. Minn. 2008).

[35] RMSE3d at 577 n.81.

[36] Id.

[37] 572 F. Supp. 2d 1071, 1081 (D. Minn. 2008).

[38] David H. Kaye & David A. Freedman, “Reference Guide on Statistics,” in RMSE3ed 209 (2011).

[39] Id. at 254 n.106

[40] See Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” in RMSE3ed 549, 582, 626; John B. Wong, Lawrence O. Gostin, and Oscar A. Cabrera, Abogado, “Reference Guide on Medical Testimony,” in RMSE3ed 687, 724.  This confusion in nomenclature is regrettable, given the difficulty many lawyers and judges seem to have in following discussions of statistical concepts.

[41] See, e.g., Richard D. De Veaux, Paul F. Velleman, and David E. Bock, Intro Stats 545-48 (3d ed. 2012); Rand R. Wilcox, Fundamentals of Modern Statistical Methods 65 (2d ed. 2010).

[42] See also Daniel Rubinfeld, “Reference Guide on Multiple Regression,” in RMSE3d 303, 321 (describing a p-value > 5% as leading to failing to reject the null hypothesis).

[43] RMSE3d at 254.

[44] See Sander Greenland, “Nonsignificance Plus High Power Does Not Imply Support Over the Alternative,” 22 Ann. Epidemiol. 364, 364 (2012).

[45] Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” RMSE3ed 549, 582.

[46] RMSE3d at 579 n.88.

[47] Kenneth Rothman, Sander Greenland, and Timothy Lash, Modern Epidemiology 160 (3d ed. 2008).  See also Kenneth J. Rothman, “Significance Questing,” 105 Ann. Intern. Med. 445, 446 (1986) (“[Simon] rightly dismisses calculations of power as a weak substitute for confidence intervals, because power calculations address only the qualitative issue of statistical significance and do not take account of the results already in hand.”).

[48] RMSE3d at 582 n.93; id. at 582 n.94 (“Thus, in Smith v. Wyeth-Ayerst Labs. Co., 278 F.Supp. 2d 684, 693 (W.D.N.C. 2003), and Cooley v. Lincoln Electric Co., 693 F. Supp. 2d 767, 773 (N.D. Ohio 2010), the courts recognized that the power of a study was critical to assessing whether the failure of the study to find a statistically significant association was exonerative of the agent or inconclusive.”).

[49] See, e.g., Anthony J. Swerdlow, Maria Feychting, Adele C. Green, Leeka Kheifets, David A. Savitz, International Commission for Non-Ionizing Radiation Protection Standing Committee on Epidemiology, “Mobile Phones, Brain Tumors, and the Interphone Study: Where Are We Now?” 119 Envt’l Health Persp. 1534, 1534 (2011) (“Although there remains some uncertainty, the trend in the accumulating evidence is increasingly against the hypothesis that mobile phone use can cause brain tumors in adults.”).

[50] James Mortimer, Amy Borenstein, and Lorene Nelson, “Associations of welding and manganese exposure with Parkinson disease: Review and meta-analysis,” 79 Neurology 1174 (2012).

[51] Samuel Shapiro, “Meta-analysis/Shmeta-analysis,” 140 Am. J. Epidem. 771, 777 (1994).  See also Alvan Feinstein, “Meta-Analysis: Statistical Alchemy for the 21st Century,” 48 J. Clin. Epidem. 71 (1995).

[52] Allen v. Int’l Bus. Mach. Corp., No. 94-264-LON, 1997 U.S. Dist. LEXIS 8016, at *71–*74 (suggesting that meta-analysis of observational studies was controversial among epidemiologists).

[53] 706 F. Supp. 358, 373 (E.D. Pa. 1988).

[54] In re Paoli R.R. Yard PCB Litig., 916 F.2d 829, 856-57 (3d Cir. 1990), cert. denied, 499 U.S. 961 (1991); Hines v. Consol. Rail Corp., 926 F.2d 262, 273 (3d Cir. 1991).

[55] See “The Shmeta-Analysis in Paoli” (July 11, 2019).

[56] In re Joint E. & S. Dist. Asbestos Litig., 827 F. Supp. 1014, 1042 (S.D.N.Y. 1993).

[57] 52 F.3d 1124 (2d Cir. 1995).

[58] Institute of Medicine, Asbestos: Selected Cancers (Wash. D.C. 2006).

[59] See Michael O. Finkelstein and Bruce Levin, “Meta-Analysis of ‘Sparse’ Data: Perspectives from the Avandia CasesJurimetrics J. (2011).

[60] See Donna Stroup, et al., “Meta-analysis of Observational Studies in Epidemiology: A Proposal for Reporting,” 283 J. Am. Med. Ass’n 2008 (2000) (MOOSE statement); David Moher, Deborah Cook, Susan Eastwood, Ingram Olkin, Drummond Rennie, and Donna Stroup, “Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement,” 354 Lancet 1896 (1999).  See also Jesse Berlin & Carin Kim, “The Use of Meta-Analysis in Pharmacoepidemiology,” in Brian Strom, ed., Pharmacoepidemiology 681, 683–84 (4th ed. 2005); Zachary Gerbarg & Ralph Horwitz, “Resolving Conflicting Clinical Trials: Guidelines for Meta-Analysis,” 41 J. Clin. Epidemiol. 503 (1988).

[61] See Finkelstein & Levin, supra at note 59. See also In re Bextra and Celebrex Marketing Sales Practices and Prod. Liab. Litig., 524 F. Supp. 2d 1166, 1174, 1184 (N.D. Cal. 2007) (holding that reliance upon “[a] meta-analysis of all available published and unpublished randomized clinical trials” was reasonable and appropriate, and criticizing the expert witnesses who urged the complete rejection of meta-analysis of observational studies).

[62] RMSE 3d at 254 n.107.

[63] Id. at 289.

[64] Reference Guide on Epidemiology, RMSE3d at 624.  See also id. at 581 n. 89 (“Meta-analysis is better suited to combining results from randomly controlled experimental studies, but if carefully performed it may also be helpful for observational studies, such as those in the epidemiologic field.”).

[65] Id. at 579; see also id. at 607 n. 171.

[66] Id. at 607.

[67] Id. at 607 n.177.

[68] Id. at 608.

[69] RMSE 3d at 722-23.

[70] Id. at 723 n.143 (“143. … Video Software Dealers Ass’n v. Schwarzenegger, 556 F.3d 950, 963 (9th Cir. 2009) (analyzing a meta-analysis of studies on video games and adolescent behavior); Kennecott Greens Creek Min. Co. v. Mine Safety & Health Admin., 476 F.3d 946, 953 (D.C. Cir. 2007) (reviewing the Mine Safety and Health Administration’s reliance on epidemiological studies and two meta-analyses).”).

[71] Id. at 723-24.

David Madigan’s Graywashed Meta-Analysis in Taxotere MDL

June 12th, 2020

Once again, a meta-analysis is advanced as a basis for an expert witness’s causation opinion, and once again, the opinion is the subject of a Rule 702 challenge. The litigation is In re Taxotere (Docetaxel) Products Liability Litigation, a multi-district litigation (MDL) proceeding before Judge Jane Triche Milazzo, who sits on the United States District Court for the Eastern District of Louisiana.

Taxotere is the brand name for docetaxel, a chemotherapeutic medication used either alone or in conjunction with other chemotherapies to treat a number of different cancers. Hair loss is a side effect of Taxotere, but in the MDL, plaintiffs claim that they experienced permanent hair loss, about which, in their view, they were not adequately warned. The litigation thus involved issues of exactly what “permanent” means, medical causation, adequacy of warnings in the Taxotere package insert, and warnings causation.

Defendant Sanofi challenged plaintiffs’ statistical expert witness, David Madigan, a frequent testifier for the lawsuit industry. In its Rule 702 motion, Sanofi argued that Madigan had relied upon two randomized clinical trials (TAX 316 and GEICAM 9805) that evaluated “ongoing alopecia” to reach conclusions about “permanent alopecia.” Sanofi made the point that “ongoing” is not “permanent,” and that trial participants who had ongoing alopecia may have had their hair grow back. Madigan’s reliance upon an end point different from what plaintiffs complained about made his analysis irrelevant. The MDL court rejected Sanofi’s argument, with the observation that Madigan’s analysis was not irrelevant for using the wrong end point, only less persuasive, and that Sanofi’s criticism was one that “Sanofi can highlight for the jury on cross-examination.”[1]

Did Judge Milazzo engage in judicial dodging in rejecting the relevancy argument and emphasizing the truism that Sanofi could highlight the discrepancy on cross-examination?  To the extent that the disconnect can easily be shown by highlighting the different event rates for the differently defined alopecia outcomes, the Sanofi argument seems like one that a jury could readily grasp. The judicial shrug, however, raises the question why the defendant should have to address a data analysis that does not support the plaintiffs’ contention about “permanence.” The federal rules are supposed to advance the finding of the truth and the fair, speedy resolution of cases.

Sanofi’s more interesting argument, from the perspective of Rule 702 case law, was its claim that Madigan had relied upon a flawed methodology in analyzing the two clinical trials:

“Sanofi emphasizes that the results of each study individually produced no statistically significant results. Sanofi argues that Dr. Madigan cannot now combine the results of the studies to achieve statistical significance. The Court rejects Sanofi’s argument and finds that Sanofi’s concern goes to the weight of Dr. Madigan’s testimony, not to its admissibility.34”[2]

There seems to be a lot going on in the Rule 702 challenge that is not revealed in the cryptic language of the MDL district court. First, the court deployed the jurisprudentially horrific, conclusory language to dismiss a challenge that “goes to the weight …, not to … admissibility.” As discussed elsewhere, this judicial locution is rarely true, fails to explain the decision, and shows a lack of engagement with the actual challenge.[3] Of course, aside from the inanity of the expression, and the failure to explain or justify the denial of the Rule 702 challenge, the MDL court may have been able to provide a perfectly adequate explanation.

Second, the footnote in the quoted language, number 34, was to the infamous Milward case,[4] with the explanatory parenthetical that the First Circuit had reversed a district court for excluding testimony of an expert witness who had sought to “draw conclusions based on combination of studies, finding that alleged flaws identified by district court go to weight of testimony not admissibility.”[5] As discussed previously, the widespread use of the “weight not admissibility” locution, even by the Court of Appeals, does not justify it. More important, however, the invocation of Milward suggests that any alleged flaws in combining study results in a meta-analysis are always matters for the jury, no matter how arcane, technical, or threatening to validity they may be.

So was Judge Milazzo engaged in judicial dodging in Her Honor’s opinion in Taxotere? Although the citation to Milward tends to inculpate, the cursory description of the challenge raises questions about whether the challenge itself was valid in the first place. Fortunately, in this era of electronic dockets, finding the actual Rule 702 motion is not very difficult, and we can inspect the challenge to see whether it was dodged or given short shrift. Remarkably, the reality is much more complicated than the simplistic rejection by the MDL court would suggest.

Sanofi’s brief attacked three separate analyses proffered by David Madigan, and not surprisingly, the MDL court did not address every point made by Sanofi.[6] Sanofi’s point about the inappropriateness of conducting the meta-analysis was the third in its supporting brief:

“Third, Dr. Madigan conducted a statistical analysis on the TAX316 and GEICAM9805/TAX301 clinical trials separately and combined them to do a ‘meta-analysis’. But Dr. Madigan based his analysis on unproven assumptions, rendering his methodology unreliable. Even without those assumptions, Dr. Madigan did not find statistical significance for either of the clinical trials independently, making this analysis unhelpful to the trier of fact.”[7]

This introductory statement of the issue is itself not particularly helpful because it fails to explain why a meta-analysis combining two individual randomized clinical trials (“RCTs”), neither of which had “statistically significant” results, would be unhelpful. Sanofi’s brief identified other problems with Madigan’s analyses, but eventually returned to the meta-analysis issue, with the heading:

“Dr. Madigan’s analysis of the individual clinical trials did not result in statistical significance, thus is unhelpful to the jury and will unfairly prejudice Sanofi.”[8]

After a discussion of some of the case law about statistical significance, Sanofi pressed its case against Madigan. Madigan’s statistical analysis of each of two RCTs apparently did not reach statistical significance, and Sanofi complained that permitting Madigan to present these two analyses with results that were “not statistically very impressive,” would confuse and mislead the jury.[9]

“Dr. Madigan tried to avoid that result here [of having two statistically non-significant results] by conducting a ‘meta-analysis’ — a greywashed term meaning that he combined two statistically insignificant results to try to achieve statistical significance. Madigan Report at 20 ¶ 53. Courts have held that meta-analyses are admissible, but only when used to reduce the numerical instability on existing statistically significant differences, not as a means to achieve statistical significance where it does not exist. RMSE at 361–362, fn76.”

Now the claims here are quite unsettling, especially considering that they were lodged in a defense brief, in an MDL, with many cases at stake, made on behalf of an important pharmaceutical company, represented by two large, capable national or international law firms.

First, what does the defense brief signify by placing ‘meta-analysis’ in quotes? Are these scare quotes to suggest that Madigan was passing off something as a meta-analysis that failed to be one? If so, nothing in the remainder of the brief explains such an interpretation. Meta-analysis has been around for decades, and the reporting of meta-analyses of observational or experimental studies has been the subject of numerous consensus and standard-setting papers over the last two decades. Furthermore, the FDA has now issued a draft guidance for the use of meta-analyses in pharmacoepidemiology. Scare quotes are at best unexplained, and at worst inappropriate. If the authors had something else in mind, they did not explain the meaning of using quotes around meta-analysis.

Second, the defense lawyers referred to meta-analysis as a “greywashed” term. I am always eager to expand my vocabulary, and so I looked up the word in various dictionaries of statistical and epidemiologic terms. Nothing there. Perhaps it was not a technical term, so I checked with the venerable Oxford English Dictionary. No relevant entries.

Pushed to the wall, I checked the font of all knowledge – the internet. To be sure, I found definitions, but nothing that could explain this odd locution in a brief filed in an important motion:

gray-washing: “noun In calico-bleaching, an operation following the singeing, consisting of washing in pure water in order to wet out the cloth and render it more absorbent, and also to remove some of the weavers’ dressing.”

graywashed: “adj. adopting all the world’s cultures but not really belonging to any of them; in essence, liking a little bit of everything but not everything of a little bit.”

Those definitions do not appear pertinent.

Another website offered a definition based upon the “blogsphere”:

Graywash: “A fairly new term in the blogsphere, this means an investigation that deals with an offense strongly, but not strongly enough in the eyes of the speaker.”

Hmmm. Still not on point.

Another one from “Urban Dictionary” might capture something of what was being implied:

Graywashing: “The deliberate, malicious act of making art having characters appear much older and uglier than they are in the book, television, or video game series.”

Still, I am not sure how this is an argument that a federal judge can respond to in a motion affecting many cases.

Perhaps, you say, I am quibbling with word choices, and I am not sufficiently in tune with the way people talk in the Eastern District of Louisiana. I plead guilty to both counts. But the third, and most important point, is the defense assertion that meta-analyses are only admissible “when used to reduce the numerical instability on existing statistically significant differences, not as a means to achieve statistical significance where it does not exist.”

This assertion is truly puzzling. Meta-analyses involve so many layers of hearsay that they will virtually never be admissible. Admissibility of the meta-analyses is virtually never the issue. When an expert witness has conducted a meta-analysis, or has relied upon one, the important legal question is whether the witness may reasonably rely upon the meta-analysis (under Rule 703) for an inference that satisfies Rule 702. The meta-analysis itself does not come into evidence, and does not go out to the jury for its deliberations.

But what about the defense brief’s “only when” language that clearly implies that courts have held that expert witnesses may rely upon meta-analyses only to reduce “numerical instability on existing statistically significant differences”? This seems clearly wrong because achieving statistical significance from studies that have no “instability” in their point estimates but individually lack statistical significance is a perfectly legitimate and valid goal. Consider a situation in which, for some reason, sample size in each study is limited by the available observations, but we have 10 studies, each with a point estimate of 1.5, and each with a 95% confidence interval of (0.88, 2.5). This hypothetical situation presents no instability of point estimates, and the meta-analysis would shrink the confidence interval around the summary point estimate so that the lower bound excluded 1.0, in a perfectly valid analysis. In the real world, meta-analyses are conducted on studies with point estimates of risk that vary, because of random and non-random error, but there is no reason that meta-analyses cannot reduce random error to show that the summary point estimate is statistically significant at a pre-specified alpha, even though no constituent study was statistically significant.
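
The arithmetic of the hypothetical is easy to check. The minimal sketch below assumes each study’s interval is symmetric on the log scale (the numbers above are rounded) and pools the ten identical studies with a fixed-effect calculation; the resulting 95% interval comfortably excludes 1.0.

```python
import math

# Hypothetical: ten studies, each with RR = 1.5 and a 95% CI of roughly (0.88, 2.5).
rr, upper, n, z = 1.5, 2.5, 10, 1.96

log_rr = math.log(rr)
se = (math.log(upper) - log_rr) / z      # per-study standard error, ~0.26
pooled_se = se / math.sqrt(n)            # identical studies: SE shrinks by sqrt(n)
lo = math.exp(log_rr - z * pooled_se)
hi = math.exp(log_rr + z * pooled_se)
print(f"pooled RR = {rr}, 95% CI ({lo:.2f}, {hi:.2f})")   # about (1.28, 1.76)
```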

Sanofi’s lawyers did not cite to any case for the remarkable proposition they advanced, but they did cite the Reference Manual on Scientific Evidence (RMSE). Earlier in the brief, the defense cited to this work in its third edition (2011), and so I turned to the cited page (“RMSE at 361–362, fn76”) only to find the introduction to the chapter on survey research, with footnotes 1 through 6.

After a diligent search through the third edition, I could not find any other language remotely supportive of the assertion by Sanofi’s counsel. There are important discussions about how a poorly conducted meta-analysis, or a meta-analysis that was heavily weighted in a direction by a methodologically flawed study, could render an expert witness’s opinion inadmissible under Rule 702.[10] Indeed, the third edition has a more sustained discussion of meta-analysis under the heading “VI. What Methods Exist for Combining the Results of Multiple Studies,”[11] but nothing in that discussion comes close to supporting the remarkable assertion by defense counsel.

On a hunch, I checked the second edition of RMSE, published in the year 2000. There was indeed a footnote 76, on page 361, which discussed meta-analysis. The discussion comes in the midst of the superseded edition’s chapter on epidemiology. Nothing, however, in the text or in the cited footnote appears to support the defense’s contention that meta-analyses are appropriate only when each included clinical trial has independently reported a statistically significant result.

If this analysis is correct, the MDL court was fully justified in rejecting the defense argument that combining two statistically non-significant clinical trials to yield a statistically significant result was methodologically infirm. No cases were cited, and the Reference Manual does not support the contention. Furthermore, no statistical text or treatise on meta-analysis supports the Sanofi claim. Sanofi did not support its motion with any affidavits of experts on meta-analysis.

Now there were other arguments advanced in support of excluding David Madigan’s testimony. Indeed, there was a very strong methodological challenge to Madigan’s decision to include the two RCTs in his meta-analysis, quite apart from those RCTs’ lack of statistical significance on the end point at issue. In the words of the Sanofi brief:

“Both TAX clinical trials examined two different treatment regimens, TAC (docetaxel in combination with doxorubicin and cyclophosphamide) versus FAC (5-fluorouracil in combination with doxorubicin and cyclophosphamide). Madigan Report at 18–19 ¶¶ 47–48. Dr. Madigan admitted that TAC is not Taxotere alone, Madigan Dep. 305:21–23 (Ex. B); however, he did not rule out doxorubicin or cyclophosphamide in his analysis. Madigan Dep. 284:4–12 (“Q. You can’t rule out other chemotherapies as causes of irreversible alopecia? … A. I can’t rule out — I do not know, one way or another, whether other chemotherapy agents cause irreversible alopecia.”).”[12]

Now unlike the statistical significance argument, this argument is rather straightforward and turns on the clinical heterogeneity of the two trials, which seems to point clearly to the invalidity of a meta-analysis of them.  Sanofi’s lawyers could have easily supported this point with statements from standard textbooks and non-testifying experts (but alas did not).  Sanofi did support its challenge, however, with citations to an important litigation and Fifth Circuit precedent.[13]

This closer look at the actual challenge to David Madigan’s opinions suggests that Sanofi’s counsel may have diluted very strong arguments about heterogeneity in the exposure variable, and in the outcome variable, by advancing what seems a very doubtful argument based upon the lack of statistical significance of the individual studies in the Madigan meta-analysis.

Sanofi advanced two very strong points, first about the irrelevant outcome variable definitions used by Madigan, and second about the complexity of Taxotere’s being used with other, and different, chemotherapeutic agents in each of the two trials that Madigan combined.[14] The MDL court addressed the first point in a perfunctory and ultimately unsatisfactory fashion, but did not address the second point at all.

Ultimately, the result was that Madigan was given a pass to offer extremely tenuous opinions in an MDL on causation. Given that Madigan has proffered tendentious opinions in the past, and has been characterized as “an expert on a mission,” whose opinions are “conclusion driven,”[15] the missteps in the briefing, and the MDL court’s abridgement of the gatekeeping process, are regrettable. Also regrettable is that the merits or demerits of a Rule 702 challenge cannot be fairly evaluated from cursory, conclusory judicial decisions riddled with meaningless verbiage such as “the challenge goes to the weight and not the admissibility of the witness.” Access to the actual Rule 702 motion helped shed important light on the inadequacy of one point in the motion, but also on the complexity and fullness of the challenge that was not fully addressed in the MDL court’s decision. It is possible that a Reply or a Supplemental brief, or oral argument, may have filled in gaps, corrected errors, or modified the motion, and that the above analysis missed some important aspect of what happened in the Taxotere MDL. If so, all the more reason that we need better judicial gatekeeping, especially when a decision can affect thousands of pending cases.[16]


[1]  In re Taxotere (Docetaxel) Prods. Liab. Litig., 2019 U.S. Dist. LEXIS 143642, at *13 (E.D. La. Aug. 23, 2019) [Op.]

[2]  Op. at *13-14.

[3]  “Judicial Dodgers – Weight not Admissibility” (May 28, 2020).

[4]  Milward v. Acuity Specialty Prods. Grp., Inc., 639 F.3d 11, 17-22 (1st Cir. 2011).

[5]  Op. at *13-14 (quoting and citing Milward, 639 F.3d at 17-22).

[6]  Memorandum in Support of Sanofi Defendants’ Motion to Exclude Expert Testimony of David Madigan, Ph.D., Document 6144, in In re Taxotere (Docetaxel) Prods. Liab. Litig. (E.D. La. Feb. 8, 2019) [Brief].

[7]  Brief at 2; see also Brief at 14 (restating without initially explaining why combining two statistically non-significant RCTs by meta-analysis would be unhelpful).

[8]  Brief at 16.

[9]  Brief at 17 (quoting from Madigan Dep. 256:14–15).

[10]  Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” at 581 n.89, in Fed. Jud. Center, Reference Manual on Scientific Evidence (3d ed. 2011).

[11]  Id. at 606.

[12]  Brief at 14.

[13]  Brief at 14, citing Burst v. Shell Oil Co., C. A. No. 14–109, 2015 WL 3755953, at *7 (E.D. La. June 16, 2015) (Vance, J.) (quoting LeBlanc v. Chevron USA, Inc., 396 F. App’x 94, 99 (5th Cir. 2010)) (“[A] study that notes ‘that the subjects were exposed to a range of substances and then nonspecifically note[s] increases in disease incidence’ can be disregarded.”), aff’d, 650 F. App’x 170 (5th Cir. 2016). See “The One Percent Non-solution – Infante Fuels His Own Exclusion in Gasoline Leukemia Case” (June 25, 2015).

[14]  Brief at 14-16.

[15]  In re Accutane Litig., 2015 WL 753674, at *19 (N.J.L.Div., Atlantic Cty., Feb. 20, 2015), aff’d, 234 N.J. 340, 191 A.3d 560 (2018). See “Johnson of Accutane – Keeping the Gate in the Garden State” (Mar. 28, 2015); “N.J. Supreme Court Uproots Weeds in Garden State’s Law of Expert Witnesses” (Aug. 8, 2018).

[16]  Cara Salvatore, “Sanofi Beats First Bellwether In Chemo Drug Hair Loss MDL,” Law360 (Sept. 27, 2019).

Judicial Gatekeeping Cures Claims That Viagra Can Cause Melanoma

January 24th, 2020

The phosphodiesterase type 5 inhibitor medications (PDE5i) seem to arouse the litigation propensities of the lawsuit industry. The PDE5i medications (sildenafil, tadalafil, etc.) have multiple indications, but they are perhaps best known for their ability to induce penile erections, which in some situations can be a very useful outcome.

The launch of Viagra in 1998 was followed by litigation that claimed the drug caused heart attacks, and not the romantic kind. The only broken hearts, however, were those of the plaintiffs’ lawyers and their expert witnesses who saw their litigation claims excluded and dismissed.[1]

Then came claims that the PDE5i medications caused non-arteritic anterior ischemic optic neuropathy (“NAION”), based upon a dubious epidemiologic study by Dr. Gerald McGwin. This litigation demonstrated, if anything, that while love may be blind, erections need not be.[2] The NAION cases were consolidated in a multi-district litigation (MDL) in front of Judge Paul Magnuson, in the District of Minnesota. After considerable back and forth, Judge Magnuson ultimately concluded that the McGwin study was untrustworthy, and the NAION claims were dismissed.[3]

In 2014, the American Medical Association’s internal medicine journal published an observational epidemiologic study of sildenafil (Viagra) use and melanoma.[4] The authors of the study interpreted their study modestly, concluding:

“[s]ildenafil use may be associated with an increased risk of developing melanoma. Although this study is insufficient to alter clinical recommendations, we support a need for continued investigation of this association.”

Although the Li study eschewed causal conclusions and new clinical recommendations in view of the need for more research into the issue, the litigation industry filed lawsuits, claiming causality.[5]

In the new natural order of things, as soon as the litigation industry cranks out more than a few complaints, an MDL results, and the PDE5i-melanoma claims were no exception. By spring 2016, plaintiffs’ counsel had collected ten cases, a minyan, sufficient for an MDL.[6] The MDL plaintiffs, on behalf of putative victims, named as defendants the manufacturers of sildenafil and tadalafil, two of the more widely prescribed PDE5i medications.

While the MDL cases were winding their way through discovery and possible trials, additional studies and meta-analyses were published. None of the subsequent studies, including the systematic reviews and meta-analyses, concluded that there was a causal association. Most scientists who were publishing on the issue opined that systematic error (generally confounding) prevented a causal interpretation of the data.[7]

Many of the observational studies found statistically significant increases in relative risk of about 1.1 to 1.2 (10 to 20 percent), typically with upper bounds of the 95% confidence intervals below 2.0. The only scientists who inferred general causation from the available evidence were those who had been recruited and retained by plaintiffs’ counsel. As plaintiffs’ expert witnesses, they contended that the Li study, and the several studies that became available afterwards, collectively showed that PDE5i drugs cause melanoma in humans.

Not surprisingly, given the absence of any non-litigation experts endorsing the causal conclusion, the defendants challenged plaintiffs’ proffered expert witnesses under Federal Rule of Evidence 702. Plaintiffs’ counsel also embraced judicial gatekeeping and challenged the defense experts. The MDL trial judge, the Hon. Richard Seeborg, held hearings with four days of viva voce testimony from four of plaintiffs’ expert witnesses (two on biological plausibility, and two on epidemiology), and three of the defense’s experts. Last week, Judge Seeborg ruled by granting in part, and denying in part, the parties’ motions.[8]

The Decision

The MDL trial judge’s opinion is noteworthy in many respects. First, Judge Richard Seeborg cited and applied Rule 702, a statute, and not dicta from case law that predates the most recent statutory version of the rule. As a legal process matter, this respect for judicial process and for the difference in legal authority between statutory and common law was refreshing. Second, the judge framed the Rule 702 issue, in line with the statute and Ninth Circuit precedent, as an inquiry into whether expert witnesses deviated from the standard of care governing how scientists “conduct their research and reach their conclusions.”[9]

Biological Plausibility

Plaintiffs proffered three expert witnesses on biological plausibility, Drs. Rizwan Haq, Anand Ganesan, and Gary Piazza. All were subject to motions to exclude under Rule 702. Judge Seeborg denied the defense motions against all three of plaintiffs’ plausibility witnesses.[10]

The MDL judge determined that biological plausibility is neither necessary nor sufficient for inferring causation in science or in the law. The defense argued that the plausibility witnesses relied upon animal and cell culture studies that were unrealistic models of the human experience.[11] The MDL court, however, found that the standard for opinions on biological plausibility is relatively forgiving, and that the testimony of all three of plaintiffs’ proffered witnesses was admissible.

The subjective nature of opinions about biological plausibility is widely recognized in medical science.[12] Plausibility determinations are typically “Just So” stories, offered in the absence of hard evidence that postulated mechanisms are actually involved in a real causal pathway in human beings.

Causal Association

The real issue in the MDL hearings was the conclusion reached by plaintiffs’ expert witnesses that the PDE5i medications cause melanoma. The MDL court did not have to determine whether epidemiologic studies were necessary for such a causal conclusion. Plaintiffs’ counsel had proffered three expert witnesses with more or less expertise in epidemiology: Drs. Rehana Ahmed-Saucedo, Sonal Singh, and Feng Liu-Smith. All of plaintiffs’ epidemiology witnesses, and certainly all of defendants’ experts, implicitly if not explicitly embraced the proposition that analytical epidemiology was necessary to determine whether PDE5i medications can cause melanoma.

In their motions to exclude Ahmed-Saucedo, Singh, and Liu-Smith, the defense pointed out that, although many of the studies yielded statistically significant estimates of melanoma risk, none of the available studies adequately accounted for systematic bias in the form of confounding. Although the plaintiffs’ plausibility expert witnesses advanced “Just-So” stories about PDE5i and melanoma, the available studies showed an almost identical increased risk of basal cell carcinoma of the skin, which would be explained by confounding, but not by plaintiffs’ postulated mechanisms.[13]

The MDL court acknowledged that whether epidemiologic studies “adequately considered” confounding was “central” to the Rule 702 inquiry. Without any substantial analysis, however, the court gave its own ipse dixit that the existence vel non of confounding was an issue for cross-examination and the jury’s resolution.[14] Whether there was a reasonably valid association between PDE5i and melanoma was a jury question. This judicial refusal to engage with the issue of confounding was one of the disappointing aspects of the decision.

The MDL court was less forgiving when it came to the plaintiffs’ epidemiology expert witnesses’ assessment of the association as causal. All the parties’ epidemiology witnesses invoked Sir Austin Bradford Hill’s viewpoints or factors for judging whether associations were causal.[15] Although they embraced Hill’s viewpoints on causation, the plaintiffs’ epidemiologic expert witnesses had a much more difficult time faithfully applying them to the evidence at hand. The MDL court concluded that the plaintiffs’ witnesses deviated from their own professional standard of care in their analysis of the data.[16]

Hill’s first enumerated factor was “strength of association,” which is typically expressed epidemiologically as a risk ratio or a risk difference. The MDL court noted that the extant epidemiologic studies generally showed relative risks around 1.2 for PDE5i and melanoma, which was “undeniably” not a strong association.[17]

The plaintiffs’ epidemiology witnesses were at sea on how to explain away the lack of strength in the putative association. Dr. Ahmed-Saucedo retreated into an emphasis on how all or most of the studies found some increased risk, but the MDL court correctly found that this ruse was merely a conflation of strength with consistency of the observed associations. Dr. Ahmed-Saucedo’s dismissal of a dose-response relationship, another Hill factor, as unimportant sealed her fate. The MDL court found that her Bradford Hill analysis was “unduly results-driven,” and that her proffered testimony was not admissible.[18] The MDL court found that Dr. Feng Liu-Smith similarly conflated strength of association with consistency, an error that was too great a deviation from the professional standard of care.[19]

Dr. Sonal Singh fared no better after he contradicted his own prior testimony that there is an order of importance to the Hill factors, with “strength of association” at or near the top. In the face of a set of studies, none of which showed a strong association, Dr. Singh abandoned his own interpretative principle to suit the litigation needs of the case. His analysis placed the greatest weight on the Li study, which had the highest risk ratio, but he failed to advance any persuasive reason for his emphasis on one of the smallest studies available. The MDL court found Dr. Singh’s claim to have weighed strength of association heavily, despite the obvious absence of strong associations, puzzling, and too great an analytical gap to abide.[20]

Judge Seeborg thus concluded that while the plaintiffs’ expert witnesses could opine that there was an association, which was arguably plausible, they could not, under Rule 702, contend that the association was causal. In attempting to advance an argument that the association met Bradford Hill’s factors for causality, the plaintiffs’ witnesses had ignored, misrepresented, or confused one of the most important factors, strength of the association, in a way that revealed their analyses to be results-driven and unfaithful to the methodology they claimed to have followed. Judge Seeborg emphasized a feature of the revised Rule 702, which often is ignored by his fellow federal judges:[21]

“Under the amendment, as under Daubert, when an expert purports to apply principles and methods in accordance with professional standards, and yet reaches a conclusion that other experts in the field would not reach, the trial court may fairly suspect that the principles and methods have not been faithfully applied. See Lust v. Merrell Dow Pharmaceuticals, Inc., 89 F.3d 594, 598 (9th Cir. 1996). The amendment specifically provides that the trial court must scrutinize not only the principles and methods used by the expert, but also whether those principles and methods have been properly applied to the facts of the case.”

Given that the plaintiffs’ witnesses purported to apply a generally accepted methodology, Judge Seeborg was left to question why they would conclude causality when no one else in their field had done so.[22] The epidemiologic issue had been around for several years, and had been addressed not just in observational studies, but in systematic reviews and meta-analyses as well. The absence of published causal conclusions was not just an absence of evidence, but evidence of absence of expert support for how plaintiffs’ expert witnesses applied the Bradford Hill factors.

Reliance Upon Studies That Did Not Conclude Causation Existed

Parties challenging causal claims will sometimes point to the absence of a causal conclusion in the publication of individual epidemiologic studies that are the main basis for the causal claim. In the PDE5i-melanoma cases, the defense advanced this argument unsuccessfully. The MDL court rejected the defense argument on the ground that an individual study rarely undertakes any comprehensive review of all the pertinent evidence for or against causality; study authors are mostly concerned with conveying the results of their own study.[23] The authors may have a short discussion of other study results as the rationale for their own study, but such discussions are often limited in scope and purpose. Judge Seeborg, in this latest round of PDE5i litigation, thus did not fault plaintiffs’ witnesses’ reliance upon epidemiologic or mechanistic studies, which individually did not assert causal conclusions; rather it was the absence of causal conclusions in systematic reviews, meta-analyses, narrative reviews, regulatory agency pronouncements, or clinical guidelines that ultimately raised the fatal inference that the plaintiffs’ witnesses were not faithfully deploying a generally accepted methodology.

The defense argument that pointed to the individual epidemiologic studies themselves derives some legal credibility from the Supreme Court’s opinion in General Electric Co. v. Joiner, 522 U.S. 136 (1997). In Joiner, the SCOTUS took plaintiffs’ expert witnesses to task for drawing stronger conclusions than were offered in the papers upon which they relied. Chief Justice Rehnquist gave considerable weight to the fact that the plaintiffs’ expert witnesses relied upon studies whose authors explicitly refused to interpret their results as supporting a conclusion of human disease causation.[24]

Joiner’s criticisms of the reliance upon studies that do not themselves reach causal conclusions have gained a foothold in the case law interpreting Rule 702. The Fifth Circuit, for example, has declared:[25]

“It is axiomatic that causation testimony is inadmissible if an expert relies upon studies or publications, the authors of which were themselves unwilling to conclude that causation had been proven.”

This aspect of Joiner may properly limit the over-interpretation or misinterpretation of an individual study, which seems fine.[26] The Joiner case may, however, perpetuate an authority-based view of science to the detriment of requiring good and sufficient reasons to support the testifying expert witnesses’ opinions. The problem with Joiner’s suggestion that expert witness opinion should not be admissible if it disagrees with the study authors’ discussion section is that sometimes study authors grossly over-interpret their data. When it comes to scientific studies written by “political scientists” (scientists who see their work as advancing a political cause or agenda), the discussion section often becomes a fertile source of unreliable, speculative opinions that should not be given credence in Rule 104(a) contexts, and certainly should not be admissible in trials. In other words, the misuse of non-rigorous comments in published articles can cut both ways.

There have been, and will continue to be, occasions in which published studies contain data, relevant and important to the causation issue, but which studies also contain speculative, personal opinions expressed in the Introduction and Discussion sections.  The parties’ expert witnesses may disagree with those opinions, but such disagreements hardly reflect poorly upon the testifying witnesses.  Neither side’s expert witnesses should be judged by those out-of-court opinions.  Perhaps the hearsay discussion section may be considered under Rule 104(a), which suspends the application of the Rules of Evidence, but it should hardly be a dispositive factor, other than raising questions for the reviewing court.

In exercising their gatekeeping function, trial judges should exercise care in how they assess expert witnesses’ reliance upon study data and analyses, when they disagree with the hearsay authors’ conclusions or discussions.  Given how many journals cater to advocacy scientists, and how variable the quality of peer review is, testifying expert witnesses should, in some instances,  have the expertise to interpret the data without substantial reliance upon, or reference to, the interpretative comments in the published literature.

Judge Seeborg sensibly seems to have distinguished between the absence of causal conclusions in individual epidemiologic studies and the absence of causal conclusions in any reputable medical literature.[27] He refused to be ensnared in the Joiner argument because:[28]

“Epidemiology studies typically only expressly address whether an association exists between agents such as sildenafil and tadalafil and outcomes like melanoma progression. As explained in In re Roundup Prod. Liab. Litig., 390 F. Supp. 3d 1102, 1116 (N.D. Cal. 2018), ‘[w]hether the agents cause the outcomes, however, ordinarily cannot be proven by epidemiological studies alone; an evaluation of causation requires epidemiologists to exercise judgment about the import of those studies and to consider them in context’.”

This new MDL opinion, relying upon the Advisory Committee Notes to Rule 702, is thus a more felicitous statement of the goals of gatekeeping.

Confidence Intervals

As welcome as some aspects of Judge Seeborg’s opinion are, the decision is not without mistakes. The district judge, like so many of his judicial colleagues, trips over the proper interpretation of a confidence interval:[29]

“When reviewing the results of a study it is important to consider the confidence interval, which, in simple terms, is the ‘margin of error’. For example, a given study could calculate a relative risk of 1.4 (a 40 percent increased risk of adverse events), but show a 95 percent ‘confidence interval’ of .8 to 1.9. That confidence interval means there is 95 percent chance that the true value—the actual relative risk—is between .8 and 1.9.”

This statement is inescapably wrong. The 95 percent probability attaches to the capturing of the true parameter – the actual relative risk – in the long run of repeated confidence intervals that result from repeated sampling of the same sample size, in the same manner, from the same population. In Judge Seeborg’s example, the next sample might give a relative risk point estimate of 1.9, and that new estimate will have a confidence interval that may run from just below 1.0 to over 3. A third sample might turn up a relative risk estimate of 0.8, with a confidence interval that runs from, say, 0.3 to 1.4. Neither the second nor the third sample would be reasonably incompatible with the first. A more accurate assessment of the true parameter is that it will be somewhere between 0.3 and 3, a considerably broader range than the single reported 95 percent interval suggests.
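For readers who prefer to see the point computationally, the following is a minimal simulation sketch, with entirely hypothetical numbers (a true relative risk of 1.4, a five percent baseline risk, and 1,000 subjects per group). It shows that the 95 percent figure describes the long-run behavior of the interval-generating procedure across repeated samples, not the probability that any single reported interval contains the true relative risk.

```python
# Minimal simulation sketch (hypothetical numbers, not from the opinion):
# the 95 percent figure describes long-run coverage across repeated samples.
import numpy as np

rng = np.random.default_rng(0)
true_rr = 1.4          # hypothetical true relative risk
p_unexposed = 0.05     # hypothetical baseline risk
p_exposed = true_rr * p_unexposed
n = 1000               # subjects per group in each simulated "study"
n_studies = 10_000

covered = 0
for _ in range(n_studies):
    a = rng.binomial(n, p_exposed)      # cases among the exposed
    c = rng.binomial(n, p_unexposed)    # cases among the unexposed
    log_rr = np.log((a / n) / (c / n))
    se = np.sqrt(1 / a - 1 / n + 1 / c - 1 / n)   # SE of the log relative risk
    lo, hi = np.exp(log_rr - 1.96 * se), np.exp(log_rr + 1.96 * se)
    covered += (lo <= true_rr <= hi)

print(f"Intervals covering the true relative risk: {covered / n_studies:.3f}")
# Roughly 95 percent of the intervals cover the true value in the long run;
# any single interval either contains it or does not.
```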

Judge Seeborg’s error is sadly all too common. Whenever I see the error, I wonder whence it came. Often the error is in briefs of both plaintiffs’ and defense counsel. In this case, I did not see the erroneous assertion about confidence intervals made in plaintiffs’ or defendants’ briefs.


[1]  Brumley  v. Pfizer, Inc., 200 F.R.D. 596 (S.D. Tex. 2001) (excluding plaintiffs’ expert witness who claimed that Viagra caused heart attack); Selig v. Pfizer, Inc., 185 Misc. 2d 600 (N.Y. Cty. S. Ct. 2000) (excluding plaintiff’s expert witness), aff’d, 290 A.D. 2d 319, 735 N.Y.S. 2d 549 (2002).

[2]  “Love is Blind but What About Judicial Gatekeeping of Expert Witnesses? – Viagra Part I” (July 7, 2012); “Viagra, Part II — MDL Court Sees The Light – Bad Data Trump Nuances of Statistical Inference” (July 8, 2012).

[3]  In re Viagra Prods. Liab. Litig., 572 F.Supp. 2d 1071 (D. Minn. 2008), 658 F. Supp. 2d 936 (D. Minn. 2009), and 658 F. Supp. 2d 950 (D. Minn. 2009).

[4]  Wen-Qing Li, Abrar A. Qureshi, Kathleen C. Robinson, and Jiali Han, “Sildenafil use and increased risk of incident melanoma in US men: a prospective cohort study,” 174 J. Am. Med. Ass’n Intern. Med. 964 (2014).

[5]  See, e.g., Herrara v. Pfizer Inc., Complaint in 3:15-cv-04888 (N.D. Calif. Oct. 23, 2015); Diana Novak Jones, “Viagra Increases Risk Of Developing Melanoma, Suit Says,” Law360 (Oct. 26, 2015).

[6]  See In re Viagra (Sildenafil Citrate) Prods. Liab. Litig., 176 F. Supp. 3d 1377, 1378 (J.P.M.L. 2016).

[7]  See, e.g., Jenny Z. Wang, Stephanie Le , Claire Alexanian, Sucharita Boddu, Alexander Merleev, Alina Marusina, and Emanual Maverakis, “No Causal Link between Phosphodiesterase Type 5 Inhibition and Melanoma,” 37 World J. Men’s Health 313 (2019) (“There is currently no evidence to suggest that PDE5 inhibition in patients causes increased risk for melanoma. The few observational studies that demonstrated a positive association between PDE5 inhibitor use and melanoma often failed to account for major confounders. Nonetheless, the substantial evidence implicating PDE5 inhibition in the cyclic guanosine monophosphate (cGMP)-mediated melanoma pathway warrants further investigation in the clinical setting.”); Xinming Han, Yan Han, Yongsheng Zheng, Qiang Sun, Tao Ma, Li Dai, Junyi Zhang, and Lianji Xu, “Use of phosphodiesterase type 5 inhibitors and risk of melanoma: a meta-analysis of observational studies,” 11 OncoTargets & Therapy 711 (2018).

[8]  In re Viagra (Sildenafil Citrate) and Cialis (Tadalafil) Prods. Liab. Litig., Case No. 16-md-02691-RS, Order Granting in Part and Denying in Part Motions to Exclude Expert Testimony (N.D. Calif. Jan. 13, 2020) [cited as Opinion].

[9]  Opinion at 8 (“determin[ing] whether the analysis undergirding the experts’ testimony falls within the range of accepted standards governing how scientists conduct their research and reach their conclusions”), citing Daubert v. Merrell Dow Pharm., Inc. (Daubert II), 43 F.3d 1311, 1317 (9th Cir. 1995).

[10]  Opinion at 11.

[11]  Opinion at 11-13.

[12]  See Kenneth J. Rothman, Sander Greenland, and Timothy L. Lash, “Introduction,” chap. 1, in Kenneth J. Rothman, et al., eds., Modern Epidemiology at 29 (3d ed. 2008) (“no approach can transform plausibility into an objective causal criterion).

[13]  Opinion at 15-16.

[14]  Opinion at 16-17.

[15]  See Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295 (1965); see also “Woodside & Davis on the Bradford Hill Considerations” (April 23, 2013).

[16]  Opinion at 17-21.

[17]  Opinion at 18. The MDL court cited In re Silicone Gel Breast Implants Prod. Liab. Litig., 318 F. Supp. 2d 879, 893 (C.D. Cal. 2004), for the proposition that relative risks greater than 2.0 permit the inference that the agent under study “was more likely than not responsible for a particular individual’s disease.”

[18]  Opinion at 18.

[19]  Opinion at 20.

[20]  Opinion at 19.

[21]  Opinion at 21, quoting from Rule 702, Advisory Committee Notes (emphasis in Judge Seeborg’s opinion).

[22]  Opinion at 21.

[23]  See “Follow the Data, Not the Discussion” (May 2, 2010).

[24]  Joiner, 522 U.S. at 145-46 (noting that the PCB studies at issue did not support expert witnesses’ conclusion that PCB exposure caused cancer because the study authors, who conducted the research, were not willing to endorse a conclusion of causation).

[25]  Huss v. Gayden, 571 F.3d 442 (5th Cir. 2009) (citing Vargas v. Lee, 317 F.3d 498, 501-01 (5th Cir. 2003) (noting that studies that did not themselves embrace causal conclusions undermined the reliability of the plaintiffs’ expert witness’s testimony that trauma caused fibromyalgia)); see also McClain v. Metabolife Internat’l, Inc., 401 F.3d 1233, 1247-48 (11th Cir. 2005) (expert witnesses’ reliance upon studies that did not reach causal conclusions about ephedrine supported the challenge to the reliability of their proffered opinions); Happel v. Walmart, 602 F.3d 820, 826 (7th Cir. 2010) (observing that it “is axiomatic that causation testimony is inadmissible if an expert relies upon studies or publications, the authors of which were themselves unwilling to conclude that causation had been proven”).

[26]  In re Accutane Prods. Liab. Litig., 511 F. Supp. 2d 1288, 1291 (M.D. Fla. 2007) (“When an expert relies on the studies of others, he must not exceed the limitations the authors themselves place on the study. That is, he must not draw overreaching conclusions.”) (internal citations omitted).

[27]  See Rutigliano v. Valley Bus. Forms, 929 F. Supp. 779, 785 (D.N.J. 1996), aff’d, 118 F.3d 1577 (3d Cir. 1997) (“law warns against use of medical literature to draw conclusions not drawn in the literature itself …. Reliance upon medical literature for conclusions not drawn therein is not an accepted scientific methodology.”).

[28]  Opinion at 14.

[29]  Opinion at 4-5.

The Shmeta-Analysis in Paoli

July 11th, 2019

In the Paoli Railroad yard litigation, plaintiffs claimed injuries and increased risk of future cancers from environmental exposure to polychlorinated biphenyls (PCBs). This massive litigation showed up before federal district judge Hon. Robert F. Kelly,[1] in the Eastern District of Pennsylvania, who may well have been the first judge to grapple with a litigation attempt to use meta-analysis to show a causal association.

One of the plaintiffs’ expert witnesses was the late William J. Nicholson, who was a professor at Mt. Sinai School of Medicine, and a colleague of Irving Selikoff. Nicholson was trained in physics, and had no professional training in epidemiology. Nonetheless, Nicholson was Selikoff’s go-to colleague for performing epidemiologic studies. After Selikoff withdrew from active testifying for plaintiffs in tort litigation, Nicholson was one of his colleagues who jumped into the fray as a surrogate advocate for Selikoff.[2]

For his opinion that PCBs were causally associated with liver cancer in humans,[3] Nicholson relied upon a report he wrote for the Ontario Ministry of Labor. [cited here as “Report”].[4] Nicholson described his report as a “study of the data of all the PCB worker epidemiological studies that had been published,” from which he concluded that there was “substantial evidence for a causal association between excess risk of death from cancer of the liver, biliary tract, and gall bladder and exposure to PCBs.”[5]

The defense challenged the admissibility of Nicholson’s meta-analysis, on several grounds. The trial court decided the challenge based upon the Downing case, which was the law in the Third Circuit, before the Supreme Court decided Daubert.[6] The Downing case allowed some opportunity for consideration of reliability and validity concerns; there is, however, disappointingly little discussion of any actual validity concerns in the courts’ opinions.

The defense challenge to Nicholson’s proffered testimony on liver cancer turned on its characterization of meta-analysis as a “novel” technique, which is generally unreliable, and its claim that Nicholson’s meta-analysis in particular was unreliable. None of the individual studies that contributed data showed any “connection” between PCBs and liver cancer; nor did any individual study conclude that there was a causal association.

Of course, the appropriate response to this situation, with no one study finding a statistically significant association, or concluding that there was a causal association, should have been “so what?” One of the reasons to do a meta-analysis is that no available study was sufficiently large to find a statistically significant association, if one were there. As for drawing conclusions of causal associations, it is not the role or place of an individual study to synthesize all the available evidence into a principled conclusion of causation.

In any event, the trial court concluded that the proffered novel technique lacked sufficient reliability, that the meta-analysis would “overwhelm, confuse, or mislead the jury,” and that the proffered meta-analysis on liver cancer was not sufficiently relevant to the facts of the case (in which no plaintiff had developed, or had died of, liver cancer). The trial court noted that the Report had not been peer-reviewed, and that it had not been accepted or relied upon by the Ontario government for any finding or policy decision. The trial court also expressed its concern that the proffered testimony along the lines of the Report would possibly confuse the jury because it appeared to be “scientific” and because Nicholson appeared to be qualified.

The Appeal

The Court of Appeals for the Third Circuit, in an opinion by Judge Becker, reversed Judge Kelly’s exclusion of the Nicholson Report, in an opinion that is still sometimes cited, even though Downing is no longer good law in the Circuit or anywhere else.[7] The Court was ultimately not persuaded that the trial court had handled the exclusion of Nicholson’s Report and its meta-analysis correctly, and it remanded the case for a do-over analysis.

Judge Becker described Nicholson’s Report as a “meta-analysis,” which pooled or “combined the results of numerous epidemiologic surveys in order to achieve a larger sample size, adjusted the results for differences in testing techniques, and drew his own scientific conclusions.”[8] Through this method, Nicholson claimed to have shown that “exposure to PCBs can cause liver, gall bladder and biliary tract disorders … even though none of the individual surveys supports such a conclusion when considered in isolation.”[9]

Validity

The appellate court gave no weight to the possibility that a meta-analysis would confuse a jury, or that its “scientific nature” or Nicholson’s credentials would lead a jury to give it more weight than it deserved.[10] The Court of Appeals conceded, however, that exclusion would have been appropriate if the methodology used itself was invalid. The appellate opinion further acknowledged that the defense had offered opposition to Nicholson’s Report in which it documented his failure to include data that were inconsistent with his conclusions, and that “Nicholson had produced a scientifically invalid study.”[11]

Judge Becker’s opinion for a panel of the Third Circuit provided no details about the cherry picking. The opinion never analyzed why this charge of cherry-picking and manipulation of the dataset did not invalidate the meta-analytic method generally, or Nicholson’s method as applied. The opinion gave no suggestion that this counter-affidavit was ever answered by the plaintiffs.

Generally, Judge Becker’s opinion dodged engagement with the specific threats to validity in Nicholson’s Report, and took refuge in the indisputable fact that hundreds of meta-analyses were published annually, and that the defense expert witnesses did not question the general reliability of meta-analysis.[12] These facts undermined the defense claim that meta-analysis was novel.[13] The reality, however, was that meta-analysis was in its infancy in bio-medical research.

When it came to the specific meta-analysis at issue, the court did not discuss or analyze a single pertinent detail of the Report. Despite its lack of engagement with the specifics of the Report’s meta-analysis, the court astutely observed that prevalent errors and flaws do not mean that a particular meta-analysis is “necessarily in error.”[14] Of course, without bothering to look, the court would not know whether the proffered meta-analysis was “actually in error.”

The appellate court would have given Nicholson’s Report a “pass” if it were an application of an accepted methodology. The defense’s remedy under this condition would be to cross-examine the opinion in front of a jury. If, on the other hand, Nicholson had altered an accepted methodology to skew its results, then the court’s gatekeeping responsibility under Downing would be invoked.

The appellate court went on to fault the trial court for failing to make sufficiently explicit findings as to whether the questioned meta-analysis was unreliable. From its perspective, the Court of Appeals saw the trial court as resolving the reliability issue upon the greater credibility of the defense expert witnesses in branding the disputed meta-analysis as unreliable. Credibility determinations are for the jury, but the court left room for a challenge on reliability itself:[15]

“Assuming that Dr. Nicholson’s meta-analysis is the proper subject of Downing scrutiny, the district court’s decision is wanting, because it did not make explicit enough findings on the reliability of Dr. Nicholson’s meta-analysis to satisfy Downing. We decline to define the exact level at which a district court can exclude a technique as sufficiently unreliable. Reliability indicia vary so much from case to case that any attempt to define such a level would most likely be pointless. Downing itself lays down a flexible rule. What is not flexible under Downing is the requirement that there be a developed record and specific findings on reliability issues. Those are absent here. Thus, even if it may be possible to exclude Dr. Nicholson’s testimony under Downing, as an unreliable, skewed meta-analysis, we cannot make such a determination on the record as it now stands. Not only was there no hearing, in limine or otherwise, at which the bases for the opinions of the contesting experts could be evaluated, but the experts were also not even deposed. All of the expert evidence was based on affidavits.”

Peer Review

Understandably, the defense attacked Nicholson’s Report as not having been peer reviewed. Without any scrutiny of the scientific bona fides of the workers’ compensation agency, the appellate court acquiesced in Nicholson’s self-serving characterization of his Report as having been reviewed by “cooperating researchers” and the Panel of the Ontario Workers’ Compensation agency. Another partisan expert witness characterized Nicholson’s Report as a “balanced assessment,” and this seemed to appease the Third Circuit, which was wary of requiring peer review in the first place.[16]

Relevancy Prong

The defense had argued that Nicholson’s Report was irrelevant because no individual plaintiff claimed liver cancer.[17] The trial court largely accepted this argument, but the appellate court disagreed because of conclusory language in Nicholson’s affidavit, in which he asserted that “proof of an increased risk of liver cancer is probative of an increased risk of other forms of cancer.” The court seemed unfazed by the ipse dixit, asserted without any support. Indeed, Nicholson’s assertion was contradicted by his own Report, in which he reported that there were fewer cancers among PCB-exposed male capacitor manufacturing workers than expected,[18] and that the rate for all cancers for both men and women was lower than expected, with 132 observed and 139.40 expected.[19]

The trial court had also agreed with the defense’s suggestion that Nicholson’s report, and its conclusion of causality between PCB exposure and liver cancer, were irrelevant because the Report “could not be the basis for anyone to say with reasonable degree of scientific certainty that some particular person’s disease, not cancer of the liver, biliary tract or gall bladder, was caused by PCBs.”[20]

Analysis

It would likely have been lost on Judge Becker and his colleagues, but Nicholson presented SMRs (standardized mortality ratios) throughout his Report, and for the all cancers statistic, he gave an SMR of 95. What Nicholson clearly did in this, and in all other instances, was simply divide the observed number by the expected, and multiply by 100. This crude, simplistic calculation fails to present a standardized mortality ratio, which requires taking into account the age distribution of the exposed and the unexposed groups, and a weighting of the contribution of cases within each age stratum. Nicholson’s presentation of data was nothing short of false and misleading. And in case anyone remembers General Electric v. Joiner, Nicholson’s summary estimate of risk for lung cancer in men was below the expected rate.[21]
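For readers curious about what a properly standardized mortality ratio entails, the following sketch of indirect age standardization uses invented age strata, reference rates, person-years, and death counts; nothing in it comes from Nicholson’s Report. The expected count is built stratum by stratum from age-specific reference rates applied to the cohort’s own age distribution, which supplies the age weighting that a crude observed-over-expected division omits.

```python
# Hypothetical sketch of an indirectly standardized mortality ratio (SMR).
# Age strata, reference rates, person-years, and deaths are all invented.
reference_rates = {"40-49": 0.001, "50-59": 0.004, "60-69": 0.012}    # deaths per person-year
person_years    = {"40-49": 20_000, "50-59": 15_000, "60-69": 5_000}  # cohort person-years
observed_deaths = {"40-49": 25, "50-59": 70, "60-69": 65}             # cohort deaths by stratum

# The expected count is built stratum by stratum, so the cohort's own age
# distribution weights each age group's contribution.
expected = sum(reference_rates[age] * person_years[age] for age in reference_rates)
observed = sum(observed_deaths.values())
smr = 100 * observed / expected
print(f"Observed = {observed}, Expected = {expected:.1f}, SMR = {smr:.0f}")
```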

Nicholson’s Report was replete with many other methodological sins. He used a composite of three organs (liver, gall bladder, bile duct) without any biological rationale. His analysis combined male and female results, and still his analysis of the composite outcome was based upon only seven cases. Of those seven cases, some of the cases were not confirmed as primary liver cancer, and at least one case was confirmed as not being a primary liver cancer.[22]

Nicholson failed to standardize the analysis for the age distribution of the observed and expected cases, and he failed to present meaningful analysis of random or systematic error. When he did present p-values, he presented one-tailed values, and he made no corrections for his many comparisons from the same set of data.

Finally, and most egregiously, Nicholson’s meta-analysis was meta-analysis in name only. What he had done was simply to add “observed” and “expected” events across studies to arrive at totals, and to recalculate a bogus risk ratio, which he fraudulently called a standardized mortality ratio. Adding events across studies is not a valid meta-analysis; indeed, it is a well-known example of how to generate a Simpson’s Paradox, which can change the direction or magnitude of any association.[23]
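A small numerical illustration, with wholly invented counts, shows how adding events across studies can manufacture a Simpson’s Paradox: each hypothetical study shows an elevated risk among the exposed, yet the ratio computed from the naively summed totals points in the opposite direction.

```python
# Hypothetical counts illustrating Simpson's Paradox when events are simply
# added across studies instead of being combined in a proper meta-analysis.
studies = [
    # (exposed cases, exposed total, unexposed cases, unexposed total)
    (90, 100, 800, 1000),   # high-background-risk study
    (30, 1000, 2, 100),     # low-background-risk study
]

for i, (a, n1, c, n0) in enumerate(studies, 1):
    rr = (a / n1) / (c / n0)
    print(f"Study {i}: risk ratio = {rr:.2f}")   # both are above 1.0

# Naively summing events and denominators across the two studies:
A = sum(s[0] for s in studies); N1 = sum(s[1] for s in studies)
C = sum(s[2] for s in studies); N0 = sum(s[3] for s in studies)
print(f"Pooled-by-addition risk ratio = {(A / N1) / (C / N0):.2f}")  # below 1.0
# The crude pooled ratio reverses direction because the exposed subjects are
# concentrated in the low-risk study; a stratified or inverse-variance
# weighted analysis would not make this mistake.
```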

Some may be tempted to criticize the defense for having focused its challenge on the “novelty” of Nicholson’s approach in Paoli. The problem of course was the invalidity of Nicholson’s work, but both the trial court’s exclusion of Nicholson, and the Court of Appeals’ reversal and remand of the exclusion decision, illustrate the problem in getting judges, even well-respected judges, to accept their responsibility to engage with questioned scientific evidence.

Even in Paoli, no amount of ketchup could conceal the unsavoriness of Nicholson’s scrapple analysis. When the Paoli case reached the Court of Appeals again in 1994, Nicholson’s analysis was absent.[24] Apparently, the plaintiffs’ counsel had second thoughts about the whole matter. Today, under the revised Rule 702, there can be little doubt that Nicholson’s so-called meta-analysis should have been excluded.


[1]  Not to be confused with the Judge Kelly of the same district, who was unceremoniously disqualified after attending an ex parte conference with plaintiffs’ lawyers and expert witnesses, at the invitation of Dr. Irving Selikoff.

[2]  Pace Philip J. Landrigan & Myron A. Mehlman, “In Memoriam – William J. Nicholson,” 40 Am. J. Indus. Med. 231 (2001). Landrigan and Mehlman assert, without any support, that Nicholson was an epidemiologist. Their own description of his career, his undergraduate work at MIT, his doctorate in physics from the University of Washington, his employment at the Watson Laboratory, before becoming a staff member in Irving Selikoff’s department in 1969, all suggest that Nicholson brought little to no experience in epidemiology to his work on occupational and environmental exposure epidemiology.

[3]  In re Paoli RR Yard Litig., 706 F. Supp. 358, 372-73 (E.D. Pa. 1988).

[4]  William Nicholson, Report to the Workers’ Compensation Board on Occupational Exposure to PCBs and Various Cancers, for the Industrial Disease Standards Panel (ODP); IDSP Report No. 2 (Toronto, Ontario Dec. 1987).

[5]  Id. at 373.

[6]  United States v. Downing, 753 F.2d 1224 (3d Cir. 1985).

[7]  In re Paoli RR Yard PCB Litig., 916 F.2d 829 (3d Cir. 1990), cert. denied sub nom. General Elec. Co. v. Knight, 111 S.Ct. 1584 (1991).

[8]  Id. at 845.

[9]  Id.

[10]  Id. at 841, 848.

[11]  Id. at 845.

[12]  Id. at 847-48.

[13]  See, e.g., Robert Rosenthal, Judgment studies: Design, analysis, and meta-analysis (1987); Richard J. Light & David B. Pillemer, Summing Up: the Science of Reviewing Research (1984); Thomas A. Louis, Harvey V. Fineberg & Frederick Mosteller, “Findings for Public Health from Meta-Analyses,” 6 Ann. Rev. Public Health 1 (1985); Kristan A. L’abbé, Allan S. Detsky & Keith O’Rourke, “Meta-analysis in clinical research,” 107 Ann. Intern. Med. 224 (1987).

[14]  Id. at 857.

[15]  Id. at 858.

[16]  Id. at 858.

[17]  Id. at 845.

[18]  Report, Table 16.

[19]  Report, Table 18.

[20]  In re Paoli, 916 F.2d at 847.

[21]  See General Electric v. Joiner, 522 U.S. 136 (1997); NAS, “How Have Important Rule 702 Holdings Held Up With Time?” (March 20, 2015).

[22]  Report, Table 22.

[23]  James A. Hanley, Gilles Thériault, Ralf Reintjes and Annette de Boer, “Simpson’s Paradox in Meta-Analysis,” 11 Epidemiology 613 (2000); H. James Norton & George Divine, “Simpson’s paradox and how to avoid it,” Significance 40 (Aug. 2015); George Udny Yule, “Notes on the theory of association of attributes in Statistics,” 2 Biometrika 121 (1903).

[24]  In re Paoli RR Yard Litig., 35 F.3d 717 (3d Cir. 1994).

N.J. Supreme Court Uproots Weeds in Garden State’s Law of Expert Witnesses

August 8th, 2018

The United States Supreme Court’s decision in Daubert is now over 25 years old. The idea of judicial gatekeeping of expert witness opinion testimony is even older in New Jersey state courts. The New Jersey Supreme Court articulated a reliability standard before the Daubert case was even argued in Washington, D.C. See Landrigan v. Celotex Corp., 127 N.J. 404, 414 (1992); Rubanick v. Witco Chem. Corp., 125 N.J. 421, 447 (1991). Articulating a standard, however, is something very different from following a standard, and in many New Jersey trial courts, until very recently, the standard was pretty much anything goes.

One counter-example to the general rule of dog-eat-dog in New Jersey was Judge Nelson Johnson’s careful review and analysis of the proffered causation opinions in cases in which plaintiffs claimed that their use of the anti-acne medication isotretinoin (Accutane) caused Crohn’s disease. Judge Johnson, who sits in the Law Division of the New Jersey Superior Court for Atlantic County, held a lengthy hearing, and reviewed the expert witnesses’ reliance materials.1 Judge Johnson found that the plaintiffs’ expert witnesses had employed undue selectivity in choosing what to rely upon. Perhaps even more concerning, Judge Johnson found that these witnesses had refused to rely upon reasonably well-conducted epidemiologic studies, while embracing unpublished, incomplete, and poorly conducted studies and anecdotal evidence. In re Accutane, No. 271(MCL), 2015 WL 753674, 2015 BL 59277 (N.J.Super. Law Div., Atlantic Cty. Feb. 20, 2015). In response, Judge Johnson politely but firmly closed the gate to conclusion-driven duplicitous expert witness causation opinions in over 2,000 personal injury cases. “Johnson of Accutane – Keeping the Gate in the Garden State” (Mar. 28, 2015).

Aside from resolving over 2,000 pending cases, Judge Johnson’s judgment was of intense interest to all who are involved in pharmaceutical and other products liability litigation. Judge Johnson had conducted a pretrial hearing, sometimes called a Kemp hearing in New Jersey, after the New Jersey Supreme Court’s opinion in Kemp v. The State of New Jersey, 174 N.J. 412 (2002). At the hearing and in his opinion that excluded plaintiffs’ expert witnesses’ causation opinions, Judge Johnson demonstrated a remarkable aptitude for analyzing data and inferences in the gatekeeping process.

When the courtroom din quieted, the trial court ruled that the proffered testimony of Dr. Arthur Kornbluth and Dr. David Madigan did not meet the liberal New Jersey test for admissibility. In re Accutane, No. 271(MCL), 2015 WL 753674, 2015 BL 59277 (N.J.Super. Law Div. Atlantic Cty. Feb. 20, 2015). And in closing the gate, Judge Johnson protected the judicial process from several bogus and misleading “lines of evidence,” which have become standard ploys to mislead juries in courthouses where the gatekeepers are asleep. Recognizing that not all evidence is on the same analytical plane, Judge Johnson gave case reports short shrift.

“[u]nsystematic clinical observations or case reports and adverse event reports are at the bottom of the evidence hierarchy.”

Id. at *16. Adverse event reports, largely driven by the very litigation in his courtroom, received little credit and were labeled as “not evidentiary in a court of law.” Id. at *14 (quoting FDA’s description of FAERS).

Judge Johnson recognized that there was a wide range of identified “risk factors” for inflammatory bowel disease, such as prior appendectomy, breast-feeding as an infant, stress, Vitamin D deficiency, tobacco or alcohol use, refined sugars, dietary animal fat, and fast food. In re Accutane, 2015 WL 753674, at *9. The court also noted that there were four medications generally acknowledged to be potential risk factors for inflammatory bowel disease: aspirin, nonsteroidal anti-inflammatory medications (NSAIDs), oral contraceptives, and antibiotics. Understandably, Judge Johnson was concerned that the plaintiffs’ expert witnesses preferred studies unadjusted for potential confounding co-variables and studies that had involved “cherry picking the subjects.” Id. at *18.

Judge Johnson had found that both sides in the isotretinoin cases conceded the relative unimportance of animal studies, but the plaintiffs’ expert witnesses nonetheless invoked the animal studies in the face of the artificial absence of epidemiologic studies that had been created by their cherry-picking strategies. Id.

Plaintiffs’ expert witnesses had reprised a common claimants’ strategy; namely, they claimed that all the epidemiology studies lacked statistical power. Their arguments often ignored that statistical power calculations depend upon a specified level of statistical significance, a concept to which many plaintiffs’ counsel have virulent antibodies, as well as upon an arbitrarily selected alternative hypothesis about the size of the association. Furthermore, the plaintiffs’ arguments ignored the actual point estimates, most of which were favorable to the defense, and the observed confidence intervals, most of which were reasonably narrow.
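The dependence of power calculations on both the chosen significance level and an assumed alternative can be made concrete with a short sketch; the baseline risk, sample size, and candidate relative risks below are hypothetical, chosen only to show how a “power” claim shifts with the assumed effect size.

```python
# A minimal sketch (hypothetical numbers) of why "power" claims depend on the
# chosen significance level and on an assumed alternative effect size.
from math import sqrt
from scipy.stats import norm

def power_two_proportions(p0, rr, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions,
    where the alternative is a relative risk of `rr` over baseline `p0`."""
    p1 = rr * p0
    z_a = norm.ppf(1 - alpha / 2)
    pbar = (p0 + p1) / 2
    se_null = sqrt(2 * pbar * (1 - pbar) / n_per_group)
    se_alt = sqrt(p0 * (1 - p0) / n_per_group + p1 * (1 - p1) / n_per_group)
    return norm.cdf(((p1 - p0) - z_a * se_null) / se_alt)

# The same hypothetical study has very different "power" depending on the
# relative risk the analyst chooses to assume under the alternative:
for assumed_rr in (1.2, 1.5, 2.0):
    print(assumed_rr, round(power_two_proportions(0.01, assumed_rr, 5000), 2))
```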

The defense responded to the bogus statistical arguments by presenting an extremely capable clinical and statistical expert witness, Dr. Stephen Goodman, to present a meta-analysis of the available epidemiologic evidence.

Meta-analysis has become an important facet of pharmaceutical and other products liability litigation. Fortunately for Judge Johnson, Dr. Goodman was able to explain meta-analysis generally, as well as the two meta-analyses he had performed on isotretinoin and inflammatory bowel disease outcomes.

Dr. Goodman explained that the plaintiffs’ witnesses’ failure to perform a meta-analysis was telling, given that a meta-analysis could obviate the plaintiffs’ hyperbolic statistical complaints:

“the strength of the meta-analysis is that no one feature, no one study, is determinant. You don’t throw out evidence except when you absolutely have to.”

In re Accutane, 2015 WL 753674, at *8.

Judge Johnson’s judicial handiwork received non-deferential appellate review from a three-judge panel of the Appellate Division, which reversed the exclusion of Kornbluth and Madigan. In re Accutane Litig., 451 N.J. Super. 153, 165 A.3d 832 (App. Div. 2017). The New Jersey Supreme Court granted the isotretinoin defendants’ petition for appellate review, and the issues were joined over the appropriate standard of appellate review for expert witness opinion exclusions, and the appropriateness of Judge Johnson’s exclusions of Kornbluth and Madigan. A bevy of amici curiae joined in the fray.2

Last week, the New Jersey Supreme Court issued a unanimous opinion, which reversed the Appellate Division’s holding that Judge Johnson had “mistakenly exercised” discretion. Applying its own precedents from Rubanick, Landrigan, and Kemp, and the established abuse-of-discretion standard, the Court concluded that the trial court’s ruling to exclude Kornbluth and Madigan was “unassailable.” In re Accutane Litig., ___ N.J. ___, 2018 WL 3636867 (2018), Slip op. at 79.3

The high court graciously acknowledged that defendants and amici had “good reason” to seek clarification of New Jersey law. Slip op. at 67. In abandoning abuse-of-discretion as its standard of review, the Appellate Division had relied upon a criminal case that involved the application of the Frye standard, which is applied as a matter of law. Id. at 70-71. The high court also appeared to welcome the opportunity to grant review, reverse the intermediate court, and reinforce “the rigor expected of the trial court” in its gatekeeping role. Id. at 67. The Supreme Court, however, did not articulate a new standard; rather it demonstrated at length that Judge Johnson had appropriately applied the legal standards that had been previously announced in New Jersey Supreme Court cases.4

In attempting to defend the Appellate Division’s decision, plaintiffs sought to characterize New Jersey law as somehow different from, and more “liberal” than, the United States Supreme Court’s decision in Daubert. The New Jersey Supreme Court acknowledged that it had never formally adopted the dicta from Daubert about factors that could be considered in gatekeeping, slip op. at 10, but the Court went on to note what disinterested observers had long understood, that the so-called Daubert factors simply flowed from a requirement of sound methodology, and that there was “little distinction” and “not much light” between the Landrigan and Rubanick principles and the Daubert case or its progeny. Id. at 10, 80.

Curiously, the New Jersey Supreme Court announced that the Daubert factors should be incorporated into the New Jersey Rules 702 and 703 and their case law, but it stopped short of declaring New Jersey a “Daubert” jurisdiction. Slip op. at 82. In part, the Court’s hesitance followed from New Jersey’s bifurcation of expert witness standards for civil and criminal cases, with the Frye standard still controlling in the criminal docket. At another level, it makes no sense to describe any jurisdiction as a “Daubert” state because the relevant aspects of the Daubert decision were dicta, and the Daubert decision and its progeny were superseded by the revision of the controlling statute in 2000.5

There were other remarkable aspects of the Supreme Court’s Accutane decision. For instance, the Court put its weight behind the common-sense and accurate interpretation of Sir Austin Bradford Hill’s famous articulation of factors for causal judgment, which requires that sampling error, bias, and confounding be eliminated before assessing whether the observed association is strong, consistent, plausible, and the like. Slip op. at 20 (citing the Reference Manual at 597-99), 78.

The Supreme Court relied extensively on the National Academies’ Reference Manual on Scientific Evidence.6 That reliance is certainly preferable to judicial speculations and fabulations of scientific method. The reliance is also positive, considering that the Court did not look only at the problematic epidemiology chapter, but adverted also to the chapters on statistical evidence and on clinical medicine.

The Supreme Court recognized that the Appellate Division had essentially sanctioned an anything goes abandonment of gatekeeping, an approach that has been all-too-common in some of New Jersey’s lower courts. Contrary to the previously prevailing New Jersey zeitgeist, the Court instructed that gatekeeping must be “rigorous” to “prevent[] the jury’s exposure to unsound science through the compelling voice of an expert.” Slip op. at 68-9.

Not all evidence is equal. “[C]ase reports are at the bottom of the evidence hierarchy.” Slip op. at 73. Extrapolation from non-human animal studies is fraught with external validity problems, and such studies are “far less probative in the face of a substantial body of epidemiologic evidence.” Id. at 74 (internal quotations omitted).

Perhaps most chilling for the lawsuit industry will be the Supreme Court’s strident denunciation of expert witnesses’ selectivity in choosing lesser evidence in the face of a large body of epidemiologic evidence, id. at 77, and their unprincipled cherry picking among the extant epidemiologic publications. Like the trial court, the Supreme Court found that the plaintiffs’ expert witnesses’ inconsistent use of methodological criteria and their selective reliance upon studies (disregarding eight of the nine epidemiologic studies) that favored their task masters was the antithesis of sound methodology. Id. at 73, citing with approval, In re Lipitor, ___ F.3d ___ (4th Cir. 2018) (slip op. at 16) (“Result-driven analysis, or cherry-picking, undermines principles of the scientific method and is a quintessential example of applying methodologies (valid or otherwise) in an unreliable fashion.”).

An essential feature of the Supreme Court’s decision is that it was not willing to engage in the common reductionism that “all epidemiologic studies are flawed,” a trope that privileges cherry picking. Not all disagreements between expert witnesses can be framed as differences in interpretation. In re Accutane will likely stand as a bulwark against flawed expert witness opinion testimony in the Garden State for a long time.


1 Judge Nelson Johnson is also the author of Boardwalk Empire: The Birth, High Times, and Corruption of Atlantic City (2010), a spell-binding historical novel about political and personal corruption.

2 In support of the defendants’ positions, amicus briefs were filed by the New Jersey Business & Industry Association, Commerce and Industry Association of New Jersey, and New Jersey Chamber of Commerce; by law professors Kenneth S. Broun, Daniel J. Capra, Joanne A. Epps, David L. Faigman, Laird Kirkpatrick, Michael M. Martin, Liesa Richter, and Stephen A. Saltzburg; by medical associations the American Medical Association, Medical Society of New Jersey, American Academy of Dermatology, Society for Investigative Dermatology, American Acne and Rosacea Society, and Dermatological Society of New Jersey, by the Defense Research Institute; by the Pharmaceutical Research and Manufacturers of America; and by New Jersey Civil Justice Institute. In support of the plaintiffs’ position and the intermediate appellate court’s determination, amicus briefs were filed by political action committee the New Jersey Association for Justice; by the Ironbound Community Corporation; and by plaintiffs’ lawyer Allan Kanner.

3 Nothing in the intervening scientific record called question upon Judge Johnson’s trial court judgment. See, e.g., I.A. Vallerand, R.T. Lewinson, M.S. Farris, C.D. Sibley, M.L. Ramien, A.G.M. Bulloch, and S.B. Patten, “Efficacy and adverse events of oral isotretinoin for acne: a systematic review,” 178 Brit. J. Dermatol. 76 (2018).

4 Slip op. at 9, 14-15, citing Landrigan v. Celotex Corp., 127 N.J. 404, 414 (1992); Rubanick v. Witco Chem. Corp., 125 N.J. 421, 447 (1991) (“We initially took that step to allow the parties in toxic tort civil matters to present novel scientific evidence of causation if, after the trial court engages in rigorous gatekeeping when reviewing for reliability, the proponent persuades the court of the soundness of the expert’s reasoning.”).

5 The Court did acknowledge that Federal Rule of Evidence 702 had been amended in 2000, to reflect the Supreme Court’s decision in Daubert, Joiner, and Kumho Tire, but the Court did not deal with the inconsistencies between the present rule and the 1993 Daubert case. Slip op. at 64, citing Calhoun v. Yamaha Motor Corp., U.S.A., 350 F.3d 316, 320-21, 320 n.8 (3d Cir. 2003).

6 See Accutane slip op. at 12-18, 24, 73-74, 77-78. With respect to meta-analysis, the Reference Manual’s epidemiology chapter is still stuck in the 1980s and the prevalent resistance to poorly conducted, often meaningless meta-analyses. See “The Treatment of Meta-Analysis in the Third Edition of the Reference Manual on Scientific Evidence” (Nov. 14, 2011) (The Reference Manual fails to come to grips with the prevalence and importance of meta-analysis in litigation, and fails to provide meaningful guidance to trial judges).

Slemp Trial Part 4 – Graham Colditz

July 22nd, 2017

The Witness

Somehow, in opposition to two epidemiologists presented by the plaintiff in Slemp, the defense managed to call none. The first of the plaintiffs’ two epidemiology expert witnesses was Graham A. Colditz, a physician with doctoral level training in epidemiology. For many years, Colditz was a professor at the Harvard School of Public Health. Colditz left Harvard to become the Niess-Gain Professor at Washington University St. Louis School of Medicine, where he is also the Associate Director for Prevention and Control at the Alvin J. Siteman Cancer Center.

Colditz is a senior epidemiologist, with many book and article publications to his credit. Although he has not published a causal analysis of ovarian cancer and talc, Colditz was an investigator on the well-known Nurses’ Health Study. One of Colditz’s publications on the Nurses’ cohort featured an analysis of talc use and ovarian cancer outcomes.

Although he is not a frequent testifying expert witness, Colditz is no stranger to the courtroom. He was a regular protagonist in the estrogen-progestin hormone replacement therapy (HRT) litigation, which principally involves claims of female breast cancer. Colditz has a charming Australian accent, with a voice tremor that makes him sound older than 63, and perhaps even more distinguished. He charges $1,500 per hour for his testimonial efforts, but is quick to point out that he has given thousands to charity. At his hourly rate, we can be sure he needs tax deductions of some kind.

In discussing his own qualifications, Colditz was low-key and modest except for what seemed like a strange claim that his HRT litigation work for plaintiffs led the FDA to require a boxed warning of breast cancer risk on the package insert for HRT medications. This claim is certainly false, and an extreme instance of post hoc ergo propter hoc. Colditz gilded the lily by claiming that he does not get involved unless he believes that general causation exists between the exposure or medication and the disease claimed. Since he has only been a plaintiffs’ expert witness, this self-serving claim is quite circular.

The Examinations

The direct and cross-examinations of Dr. Colditz were long and tedious. Most lawyers are reluctant to have an epidemiologist testify at all, and try to limit the length of their examinations when they must present epidemiologic testimony. Indeed, the defense in Slemp may have opted to present a clinician based upon the prejudice against epidemiologists testifying about quantitative data and analysis. In any event, Colditz’s direct examination went not hours, but days, as did the defense’s cross-examination.

The tedium of the direct examination was exacerbated by the shameless use of leading, loaded, and argumentative questions by plaintiff’s counsel, Allen Smith. A linguistic analysis might well show that Smith spoke 25 to 30 words for every one word spoken by Colditz on direct examination. Even aside from the niceties of courtroom procedure, the direct examination was lacking in aesthetic qualities. Still, it is hard to argue with a $110 million verdict, which cries out for explanation.

There were virtually no objections to Smith’s testifying in lieu of Colditz, with Colditz reduced to just “yes.” Sometimes, Colditz waxed loquacious, and answered, “yes, sir.” From judicial responses to other objections, however, it was clear that the trial court would have provided little control of the leading and argumentative questions.

Smith’s examination also took Colditz beyond the scope of his epidemiologic expertise into ethics, social policy, and legal requirements of warnings, again without judicial management or control. We learned, over objection, from Colditz, of all witnesses, that the determination of causation has nothing to do with whether a warning should be given.

The Subject Matter

Colditz was clearly familiar with the subject matter, and allowed Smith to testify for him on a fairly simplistic level. The testimony was a natural outgrowth of his professional interests, and Colditz must have appeared to have been a credible expert witness, especially in a St. Louis courtroom, given that he was in a leadership role at the leading cancer center in that city.

With Smith’s lead, Colditz broached technical issues of bias evaluation, meta-analysis, and pooling, which would never be addressed later by a defense expert witness at an equal level of expertise, sophistication, and credibility. Colditz offered criticisms of the Gonzalez (Sister Study) cohort, including the latency built into its observation period, and he introduced the concept of Berkson bias in some of the case-control studies. Neither of these particular criticisms was rebutted in the defense case, again raising the question whether the defense expert witness, Dr. Huh, a clinician specializing in gynecologic oncology, was an appropriate foil for the line-up of plaintiffs’ expert witnesses. Dr. Colditz was able to talk authoritatively (and in some cases misleadingly) about issues that Dr. Huh could not contradict effectively, even if he were to have tried.

Colditz characterized his involvement in the talc cases as starting with his conducting a systematic review, undertaken for litigation, but still systematic. As a professor of epidemiology, Colditz should know what a systematic review is, although he never fully described the process on either direct or cross-examinations. No protocol for the systematic review was adduced into evidence. Sadly, the defense expert witness, Dr. Huh, never stated that he had done a systematic review; nor did he offer any criticisms of Dr. Colditz’s systematic review. Indeed, Huh admitted that he had not read Colditz’s testimony. In general, observing Colditz’s testimony after having watched Dr. Huh testify shouted MISMATCH.

The Issues

Statistical Significance

The beginning point of a case such as Slemp, involving a claim that talc causes ovarian cancer, and that it caused her ovarian cancer, is whether there is supporting epidemiology for the claim. As Sir Austin Bradford Hill put it over 50 years ago:

“Disregarding then any such problem in semantics we have this situation. Our observations reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance. What aspects of that association should we especially consider before deciding that the most likely interpretation of it is causation?”

Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 295 (1965). Colditz, and plaintiff’s counsel, did not run away from the challenge; they embraced statistical significance and presented an argument for why the association was “clear-cut” (not created by bias or confounding).

In one of his lengthy, leading questions, plaintiffs’ counsel attempted to suggest that statistical significance, or a confidence interval that excluded a risk ratio of 1.0, excluded bias as well as chance. Colditz, to his credit, broke from the straitjacket of “yes, sirs,” and disagreed as to bias. Smith, perhaps chastised, then took a chance and asked an open-ended question about what a confidence interval was. With the bit in his mouth, Colditz managed to describe the observed confidence interval incorrectly as providing the range within which the point estimate would fall 95% of the time if the same study were repeated many times! There is a distribution of 95% confidence intervals, which cover the true parameter 95% of the time, assuming a correct statistical model, random sampling, and no bias or confounding. For any one observed confidence interval, the true value is either included or not. Perhaps Colditz was thinking of a prediction interval, but Smith had asked for a definition of a confidence interval, and the jury got nonsense.
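A short simulation, again with invented numbers, shows why Colditz’s description does not hold: across repeated pairs of an “original” study and an independent replication, the replication’s point estimate falls inside the original study’s 95 percent interval well below 95 percent of the time.

```python
# Simulation sketch (invented numbers): how often does a replication's point
# estimate fall inside an original study's 95% confidence interval?
import numpy as np

rng = np.random.default_rng(1)
p_exposed, p_unexposed, n = 0.06, 0.05, 2000   # hypothetical cohort parameters

def log_rr_and_se():
    a = rng.binomial(n, p_exposed)
    c = rng.binomial(n, p_unexposed)
    log_rr = np.log((a / n) / (c / n))
    se = np.sqrt(1 / a - 1 / n + 1 / c - 1 / n)
    return log_rr, se

pairs, captured = 20_000, 0
for _ in range(pairs):
    orig, orig_se = log_rr_and_se()          # an "original" study
    rep, _ = log_rr_and_se()                 # an independent replication
    lo, hi = orig - 1.96 * orig_se, orig + 1.96 * orig_se
    captured += (lo <= rep <= hi)

print(f"Replication estimates inside the original 95% CI: {captured / pairs:.2f}")
# Under these assumptions the fraction comes out near 0.83, not 0.95: the 95
# percent describes long-run coverage of the true parameter by the interval,
# not capture of future point estimates.
```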

Dose Response

Colditz parsed the remaining Bradford Hill factors, and opined that an exposure gradient, or dose-response relationship, was good to have but not necessary to support a causal conclusion. Colditz opined, with respect to whether the statistical assessment of a putative dose-response relationship should include non-exposed women, that the non-exposed women should be excluded. This was one of the few technical issues that Dr. Huh engaged with, in the defense case, but Dr. Colditz was not confronted with any textbooks or writings that cast doubt on his preference for excluding non-users.

Plausibility

Plaintiff’s counsel spent a great deal of time, mostly reading lengthy passages of articles on this or that plausible mechanism for talc’s causing human ovarian cancer, only to have Colditz, with little or no demonstrated expertise in biological mechanism, say “yes.” Some articles discussed that talc use was a modifiable risk and that avoiding perineal talc use “may” reduce ovarian cancer risk. Smith would read (accurately) and then ask Colditz whether he agreed that avoiding talc use would reduce ovarian cancer in women. Colditz himself caught and corrected Smith sometimes, but not always.

Smith read from an article that invoked a claim that asbestos (with definition as to what mineral) causes ovarian cancer. Colditz agreed. Smith testified that talc has asbestos in it, and Colditz agreed. Smith read from an article that stated vaguely that talc is chemically similar to asbestos, and that this similarity creates plausibility for a causal connection between talc and cancer. Colditz agreed, without any suggestion that he understood whether or not talc is morphologically similar to asbestos. It seems unlikely that Colditz had any real expertise to offer here, but Smith could not resist touching all bases with Colditz; and the defense did not object or follow up on these excesses.

Smith and Colditz, well, mostly Smith, testified that tubal ligation reduces the otherwise observed increased risk of ovarian cancer from talc use. Smith here entrusted Colditz with providing the common-sense explanation. There was no meaningful cross-examination on this “jury-friendly” point.

Consistency

Colditz testified that the studies, both case-control and cohort studies, were consistent in showing an increased risk of ovarian cancer in association with talc use. Indeed, the studies are mostly consistent; the issue is whether they are consistently biased or consistently showing the true population risk. The defense chose to confront Colditz with the lack of statistical significance in some studies (with elevated risk ratios) as though these studies were inconsistent with the studies that found similar risk ratios, with p-values less than 5%. This confrontation did not go well for the defense, either on cross-examination of Colditz, or on direct examination of Dr. Huh. Colditz backed up his opinion on consistency with the available meta-analyses, which find very low p-values for the summary estimates of the risk ratio for talc use and ovarian cancer.

Unlike the Zoloft case,[1] in which consistency was generated across different end points by cherry picking, the consistency in the talc case was evidenced by a consistent elevation of risk ratios for the same end point, across studies. When subgroups of ovarian cell or tumor types were examined, statistical significance was sometimes lost, but the direction of the risk ratio above one was maintained. Meta-analyses generated summary point estimates with very low p-values.
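For readers who want to see what lies behind a summary estimate “with a very low p-value,” here is a bare-bones DerSimonian-Laird random-effects meta-analysis. The study-level risk ratios are placeholders, not the actual talc results; the sketch shows only the mechanics of combining consistently elevated estimates into a summary risk ratio and p-value.

```python
# A compact sketch of a DerSimonian-Laird random-effects meta-analysis of
# study-level risk ratios.  The inputs are placeholders, not real estimates.
import math

# (risk ratio, lower 95% CI, upper 95% CI) for each hypothetical study
studies = [(1.3, 1.0, 1.7), (1.4, 1.1, 1.8), (1.2, 0.9, 1.6), (1.5, 1.1, 2.0)]

log_rr  = [math.log(rr) for rr, lo, hi in studies]
se      = [(math.log(hi) - math.log(lo)) / (2 * 1.96) for rr, lo, hi in studies]
w_fixed = [1 / s**2 for s in se]

# DerSimonian-Laird estimate of the between-study variance tau^2
mean_fixed = sum(w * y for w, y in zip(w_fixed, log_rr)) / sum(w_fixed)
q  = sum(w * (y - mean_fixed) ** 2 for w, y in zip(w_fixed, log_rr))
df = len(studies) - 1
c  = sum(w_fixed) - sum(w**2 for w in w_fixed) / sum(w_fixed)
tau2 = max(0.0, (q - df) / c)

# Random-effects summary estimate, confidence interval, and two-sided p-value
w_rand    = [1 / (s**2 + tau2) for s in se]
mean_rand = sum(w * y for w, y in zip(w_rand, log_rr)) / sum(w_rand)
se_rand   = math.sqrt(1 / sum(w_rand))
z = mean_rand / se_rand
p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(f"Summary RR = {math.exp(mean_rand):.2f} "
      f"(95% CI {math.exp(mean_rand - 1.96*se_rand):.2f}-"
      f"{math.exp(mean_rand + 1.96*se_rand):.2f}), p = {p:.2g}")
```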

The Gold Standard

Colditz further gilded the consistency lily by claiming that the Terry study,[2] a pooled analysis of available case-control studies, was the “gold standard” in this area of observational epidemiology. Smith and Colditz presented at some length as to how the Cochrane Collaboration has labeled combined “individual patient data” (IPD) analyses the gold standard. Colditz skimmed over the fact that Cochrane’s endorsement of IPD analyses was made in the context of systematic reviews involving primarily randomized clinical trials, for which IPD analyses allow time-to-event measurements, which can substantially modify observed risk ratios, and even reverse their direction. The case-control studies in the Terry pooled analysis did not have anything like the kind of prospectively collected individual patient data that would warrant holding the Terry paper up as a “gold standard,” and Terry and her co-authors never made such a claim for their analysis. Colditz’s claim about the Terry study cried out for strong rebuttal, which never came.
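The time-to-event point deserves a moment’s unpacking. The toy numbers below are hypothetical, but they illustrate why individual-level follow-up data matter: two groups with identical crude risks can have very different event rates once person-time is taken into account, which is why Cochrane prizes IPD in the clinical-trial setting.

```python
# A toy illustration (hypothetical numbers) of the time-to-event point:
# identical crude risks, very different rates per person-year of follow-up.
events_a, n_a, person_years_a = 50, 1_000, 2_000   # short follow-up
events_b, n_b, person_years_b = 50, 1_000, 8_000   # long follow-up

crude_risk_ratio = (events_a / n_a) / (events_b / n_b)
rate_ratio = (events_a / person_years_a) / (events_b / person_years_b)

print(f"Crude risk ratio (ignoring follow-up time): {crude_risk_ratio:.2f}")  # 1.00
print(f"Rate ratio (per person-year):               {rate_ratio:.2f}")        # 4.00
```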

The defense should have known that this hyperbolic testimony would be forthcoming, but it seemed not to have planned a rebuttal, other than dismissing case-control studies generally as smaller than cohort studies. Rather than “getting into the weeds” about the merits of pooled analyses of observational studies, as opposed to clinical trials, the defense continued with its bizarre stance that the cohort studies were better because they were larger, while ignoring that those studies were smaller with respect to the number of ovarian cancer cases, and thus less precise than the case-control studies. See “New Jersey Kemps Ovarian Cancer – Talc Cases” (Sept. 16, 2016). The defense also largely ignored Colditz’s testimony that the exposure data collected in the available cohort studies were of limited value because they lacked details about frequency and intensity of use and, in some cases, were collected on only one occasion.
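The precision point is a matter of arithmetic. The 2×2 tables below are hypothetical, but they show why the standard error of a log odds ratio is governed by its smallest cells, so that a cohort with tens of thousands of participants but few ovarian cancer cases can yield a wider confidence interval than a much smaller case-control study with many cases.

```python
# A back-of-the-envelope sketch (hypothetical counts) of why precision is
# driven by the number of cases rather than total study size: the standard
# error of a log odds ratio is sqrt(1/a + 1/b + 1/c + 1/d), so the smallest
# cells dominate.
import math

def odds_ratio_ci(a, b, c, d):
    """2x2 table: a, b = exposed/unexposed cases; c, d = exposed/unexposed non-cases."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)
    lo = math.exp(math.log(or_) - 1.96 * se)
    hi = math.exp(math.log(or_) + 1.96 * se)
    return or_, lo, hi

# Large cohort, ~60,000 subjects but only 100 ovarian cancer cases (hypothetical)
print("Cohort:       OR = %.2f (95%% CI %.2f-%.2f)" % odds_ratio_ci(60, 40, 30_000, 29_970))
# Case-control study, ~3,500 subjects but 1,500 cases (hypothetical)
print("Case-control: OR = %.2f (95%% CI %.2f-%.2f)" % odds_ratio_ci(900, 600, 1_000, 1_000))
```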

Specific Causation

Colditz disclaimed the ability or intention to offer a specific causation opinion about Ms. Slemp’s ovarian cancer. Nonetheless, Colditz volunteered that “cancer is multifactorial,” which says very little because it says so much. In plaintiffs’ counsel’s hands, this characterization became a smokescreen for indicting every risk factor present as playing a part in the actual causation of a particular case, such as Ms. Slemp’s. No matter that the plaintiff was massively obese and a smoker; every risk factor present must be, by fiat, in the “causal pie.”

But this would seem not to be Colditz’s own opinion. Graham Colditz has elsewhere asserted that an increased risk of disease cannot be translated into the “but-for” standard of causation[3]:

“Knowledge that a factor is associated with increased risk of disease does not translate into the premise that a case of disease will be prevented if a specific individual eliminates exposure to that risk factor. Disease pathogenesis at the individual level is extremely complex.”

Just because a risk factor (assuming it is real and causal) is present does not mean that it belongs in the causal set for a particular patient’s disease.
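One way to see why an increased risk does not translate into but-for causation is the textbook attributable-fraction calculation. Under the simple (and contestable) assumptions that the risk ratio is valid, causal, and uniform across exposed individuals, the share of exposed cases attributable to the exposure is (RR - 1)/RR; the risk ratios below are illustrative only, not findings from the talc record.

```python
# Under textbook (and contestable) assumptions, the share of exposed cases
# attributable to the exposure is (RR - 1) / RR.  Illustrative RR values only.
for rr in (1.2, 1.4, 2.0, 3.0):
    af = (rr - 1) / rr
    print(f"RR = {rr:.1f} -> attributable fraction among the exposed = {af:.0%}")
# Only when RR exceeds 2.0 does this simple calculation exceed 50%,
# the usual gloss on "more likely than not."
```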

Cross-Examination

The direct examination of Graham Colditz included scurrilous attacks on J & J’s lobbying, payment of FDA user fees, and other corporate conduct, based upon documents of which Colditz had no personal knowledge. Colditz was reduced to nothing more than a backboard off which plaintiff’s counsel could make his shots. On cross, the defense carefully dissected this direct examination and obtained disavowals from Colditz that he had suggested any untoward conduct by J & J. The jury could have been spared the waste of its valuable time by a trial judge who did not allow the scurrilous, collateral attacks in the first place.

The defense also tried to diminish Dr. Colditz’s testimony as an opinion coming from a non-physician. The problem, however, was that Colditz is a physician, who understands the biological issues, even if he is not a pathologist, toxicologist, or oncologist. Colditz did not offer opinions about Slemp’s medical treatment, and there was nothing in this line of cross-examination that lessened the impact of Colditz’s general causation testimony.

Generally, the cross-examination did not hurt Dr. Colditz’s strongly stated opinion that talc causes ovarian cancer. The defense (and plaintiff’s counsel before them) spent an inordinate amount of time on why Dr. Colditz had not updated his website to state publicly that talc causes ovarian cancer. Colditz blamed the “IT” guys, a rather disingenuous excuse. His explanation on direct, and on cross, as to why he could not post his opinion on his public-service website was so convoluted, however, that there was no clear admission or inference of dereliction. Colditz was permitted to bill his opinion, never posted to his institution’s website, as a “consensus opinion,” endorsed by several researchers, based upon hearsay emails and oral conversations.


[1] See In re Zoloft Prod. Liab. Litig., No. 16-2247, __ F.3d __, 2017 WL 2385279, 2017 U.S. App. LEXIS 9832 (3d Cir. June 2, 2017) (affirming exclusion of dodgy opinion, which involved changing subgroup end points across studies of maternal sertraline use and infant cardiac birth defects).

[2] Kathryn L. Terry, et al., “Genital powder use and risk of ovarian cancer: a pooled analysis of 8,525 cases and 9,859 controls,” 6 Cancer Prev. & Research 811 (2013).

[3] Graham A. Colditz, “From epidemiology to cancer prevention: implications for the 21st Century,” 18 Cancer Causes Control 117, 118 (2007).