TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Zhang’s Glyphosate Meta-Analysis Succumbs to Judicial Scrutiny

August 5th, 2024

Back in March 2015, the International Agency for Research on Cancer (IARC) issued its working group’s monograph on glyphosate weed killer. The report classified glyphosate as a “probable carcinogen,” a label that is highly misleading. For IARC, “probable” does not mean more likely than not; indeed, for IARC, the term has no quantitative meaning at all. The all-important statement of IARC methods, “The Preamble,” makes this clear.[1]

In the case of glyphosate, the IARC working group concluded that the epidemiologic evidence for an association between glyphosate exposure and cancer (specifically non-Hodgkin’s lymphoma (NHL)) was “limited,” which is IARC’s euphemism for insufficient. Instead of epidemiology, IARC’s glyphosate conclusion rested largely upon rodent studies, but even the animal evidence relied upon by IARC was dubious. The IARC working group cherry-picked a few arguably “positive” rodent study results with increased tumor counts, while ignoring exculpatory rodent studies with decreased tumor yields.[2]

Although the IARC hazard classification was uncritically embraced by the lawsuit industry, most regulatory agencies, even those indulging precautionary-principle reasoning, rejected the claim of carcinogenicity. The United States Environmental Protection Agency (EPA), the European Food Safety Authority, the Food and Agriculture Organization (in conjunction with the World Health Organization), the European Chemicals Agency, Health Canada, and the German Federal Institute for Risk Assessment, among others, found that the scientific evidence did not support the claim that glyphosate causes NHL. Very soon after its publication, however, the IARC monograph became the proximate cause of a huge litigation effort by the lawsuit industry against Monsanto.

The personal injury cases against Monsanto, filed in federal court, were aggregated for pretrial proceedings before Judge Vince Chhabria of the Northern District of California, as MDL 2741. Judge Chhabria denied Monsanto’s early Rule 702 motions, and the cases proceeded to trial, with mixed results.

In 2019, the Zhang study, a curious meta-analysis of some of the available glyphosate epidemiologic studies, appeared in Mutation Research/Reviews in Mutation Research, a toxicology journal that seemed an unlikely venue for a meta-analysis of epidemiologic studies. The authors combined selected results from one large cohort study, the Agricultural Health Study, with results from five case-control studies, to reach a summary relative risk of 1.41 (95% confidence interval 1.13-1.75).[3] According to the authors, their “current meta-analysis of human epidemiological studies suggests a compelling link between exposures to GBHs [glyphosate] and increased risk for NHL.”
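
For readers unfamiliar with how such a summary estimate is generated, the sketch below shows generic inverse-variance, fixed-effect pooling of relative risks. The six study results in the sketch are hypothetical placeholders, not the data Zhang and her colleagues actually used; the point is only to illustrate the arithmetic that produces a figure such as 1.41 with its confidence interval.

```python
# Minimal sketch of inverse-variance (fixed-effect) pooling of relative risks,
# the generic technique behind a summary estimate such as 1.41 (95% CI 1.13-1.75).
# The six study estimates below are hypothetical placeholders, not the actual
# data relied upon by Zhang and colleagues.
import math

studies = [
    # (relative risk, lower 95% CI, upper 95% CI) -- illustrative values only
    (1.1, 0.8, 1.5),
    (1.9, 1.1, 3.2),
    (1.3, 0.9, 1.9),
    (2.0, 1.0, 4.0),
    (1.2, 0.7, 2.1),
    (1.5, 0.9, 2.5),
]

weights, weighted_logs = [], []
for rr, lo, hi in studies:
    log_rr = math.log(rr)
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)  # SE recovered from the CI
    w = 1 / se**2                                    # inverse-variance weight
    weights.append(w)
    weighted_logs.append(w * log_rr)

pooled_log = sum(weighted_logs) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
print(f"pooled RR = {math.exp(pooled_log):.2f} "
      f"(95% CI {math.exp(pooled_log - 1.96 * pooled_se):.2f}-"
      f"{math.exp(pooled_log + 1.96 * pooled_se):.2f})")
```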

The Zhang meta-analysis was not well received in regulatory and scientific circles. The EPA found that Zhang had used inappropriate methods in her meta-analysis.[4] Academic authors also panned the Zhang meta-analysis in both scholarly[5] and popular articles.[6] The senior author of the Zhang paper, Lianne Sheppard, a professor in the University of Washington Departments of Environmental and Occupational Health Sciences, and of Biostatistics, attempted to defend the study in Forbes.[7] Professor Geoffrey Kabat very adeptly showed that this defense was futile.[8] Despite the very serious and real objections to the validity of the Zhang meta-analysis, plaintiffs’ expert witnesses, such as Beate Ritz, an epidemiologist at U.C.L.A., testified that they trusted and relied upon the analysis.[9]

For five years, the Zhang study was a debating point for lawyers and expert witnesses in the glyphosate litigation, without significant judicial gatekeeping. It took the entrance of Luoping Zhang herself as an expert witness in the glyphosate litigation, and the procedural oddity of her placing exclusive reliance upon her own meta-analysis, to bring the meta-analysis into the unforgiving light of judicial scrutiny.

Zhang is a biochemist and toxicologist at the University of California, Berkeley. Along with two other co-authors of her 2019 meta-analysis paper, she had served on the EPA’s 2016 scientific advisory panel on glyphosate. After plaintiffs’ counsel designated Zhang as an expert witness, she disclosed her anticipated testimony, as required by Federal Rule of Civil Procedure 26, by attaching and adopting by reference the contents of two of her published papers. The first was her 2019 meta-analysis; the other discussed putative mechanisms. Neither paper concluded that glyphosate causes NHL. Zhang’s disclosure did not add materially to her 2019 published analysis of six epidemiologic studies on glyphosate and NHL.

The defense challenged the validity of Dr. Zhang’s proffered opinions, and her exclusive reliance upon her own 2019 meta-analysis required the MDL court to pay attention to the failings of that paper, which had previously escaped critical judicial scrutiny. In June 2024, after an oral hearing in Bulone v. Monsanto, at which Dr. Zhang testified, Judge Chhabria ruled that Zhang’s proffered testimony, and the meta-analysis upon which it exclusively relied, amounted to “junk science.”[10]

Judge Chhabria, perhaps encouraged by the recent fortifying amendment to Rule 702, issued a remarkable opinion that paid close attention to the indicia of validity of an expert witness’s opinion and the underlying meta-analysis. He quickly spotted the disconnect between Zhang’s published papers and what is required for an admissible causation opinion. The mechanism paper did not address the extant epidemiology, and both sides in the MDL had emphasized that the epidemiology was critically important for determining whether there was, or was not, causation.

Zhang’s meta-analysis did evaluate some, but not all, of the available epidemiology, yet the paper’s conclusion stopped considerably short of the needed opinion on causation. Zhang and colleagues had concluded that there was a “compelling link” between exposures to [glyphosate-based herbicides] and increased risk for NHL. In their paper’s key figure, showcasing the summary estimate of relative risk of 1.41 (95% C.I., 1.13-1.75), Zhang and her co-authors concluded only that exposure was “associated with an increased risk of NHL.” According to Judge Chhabria, in incorporating her 2019 paper into her Rule 26 report, Zhang failed to add the sort of holistic causation analysis that other expert witnesses had provided by working through the Bradford Hill considerations.

Judge Chhabria picked up on another problem, one with both legal and scientific implications. A meta-analysis is out of date as soon as a subsequent epidemiologic study that would have satisfied its inclusion criteria becomes available. Since Zhang published her meta-analysis in 2019, additional studies had in fact been published. At the hearing, Dr. Zhang acknowledged that several of them would qualify for inclusion in the meta-analysis under her own stated methods. Her failure to update the meta-analysis made her report incomplete and inadmissible for a court matter in 2024.

Judge Chhabria might have stopped there, but he took a closer look at the meta-analysis to explore whether it was a valid analysis, on its own terms. Much as Chief Judge Nancy Rosenstengel had done with the made-for-litigation meta-analysis concocted by Martin Wells in the paraquat litigation,[11] Judge Chhabria examined whether Zhang had been faithful to her own stated methods. Like Chief Judge Rosenstengel’s analysis, Judge Chhabria’s analysis stands as a strong rebuttal to the uncharitable opinion of Professor Edward Cheng, who has asserted that judges lack the expertise to evaluate the “expert opinions” before them.[12]

Judge Chhabria accepted the intellectual challenge that Rule 702 mandates. With the EPA memorandum lighting the way, Judge Chhabria readily discerned that “the challenged meta-analysis was not reliably performed.” He declared that the Zhang meta-analysis was “junk science,” with “deep methodological problems.”

Zhang claimed that she had based the meta-analysis on the subgroups, within the six studies, with the heaviest glyphosate exposure. This claim was undermined by the absence of any exposure-response gradient in the study deemed by Zhang to be of the highest quality. Furthermore, of the remaining five studies, three failed to provide any exposure-dependent analysis beyond a comparison of NHL rates among “ever” versus “never” glyphosate exposure. As a result of this heterogeneity, Zhang used all the data from the studies without exposure characterizations, but only limited data from the studies that analyzed NHL by exposure level. And because the highest-quality study was among those that provided exposure-level analyses, Zhang’s meta-analysis used only some of its data.

The analytical problems created by Zhang’s meta-analytical approach were compounded by the included studies’ having measured glyphosate exposure differently, with different cut-points for inclusion as heavily exposed. Some of the excluded study participants would have had heavier exposure than some of those included in the summary analysis.

In the universe of included studies, some provided adjusted results from multivariate analyses that accounted for other pesticide exposures. Other studies reported only unadjusted results. Even though Zhang’s method stated a preference for adjusted analyses, she inexplicably failed to use adjusted data for one study that provided both adjusted and unadjusted results.

As shown in Judge Chhabria’s review, Zhang’s methodological errors created an incoherent analysis, with methods that could not be justified. Even taken on its own stated methodology, the meta-analysis was an exercise in cherry-picking. In the court’s terms, it was, without qualification, “junk science.”

After the filing of briefs, Judge Chhabria provided the parties an oral hearing, with an opportunity for viva voce testimony. Dr. Zhang thus had a full opportunity to defend her meta-analysis. The hearing, however, did not go well for her. Zhang could not talk intelligently about the studies included, or how they defined high exposure. Zhang’s lack of familiarity with her own opinion and published paper was yet a further reason for excluding her testimony.

As might be expected, plaintiffs’ counsel attempted to hide behind peer review, suggesting that the trial court had no business digging into validity concerns given that Zhang had published her meta-analysis in what was apparently a peer-reviewed journal. Judge Chhabria would have none of it. In his opinion, publication in a peer-reviewed journal cannot obscure the glaring methodological defects of the relied-upon meta-analysis. The court observed that “[p]re-publication editorial peer review, just by itself, is far from a guarantee of scientific reliability.”[13] The EPA memorandum was thus a more telling indicator of the validity problems than publication in a nominally peer-reviewed journal.

Contrary to some law professors who are now seeking to dismantle expert witness gatekeeping as beyond a judge’s competence, Judge Chhabria dismissed the suggestion that he lacked the expertise to adjudicate the validity issues. Indeed, he displayed a better understanding of the meta-analytic process than did Dr. Zhang. As the court observed, one of the goals of MDL assignments was to permit a single trial judge to have time to engage with the scientific issues and to develop “fluency” in the relevant scientific studies. Indeed, when MDL judges have the fluency in the scientific concepts to address Rule 702 or 703 issues, it would be criminal for them to ignore it.

The Bulone opinion should encourage lawyers to get “into the weeds” of expert witness opinions. There is nothing that a little clear thinking – and glyphosate – cannot clear away. Indeed, now that the weeds of Zhang’s meta-analysis are cleared away, it is hard to fathom that any other expert witness can rely upon it without running afoul of both Federal Rules of Evidence 702 and 703.

There were a few issues not addressed in Bulone. As seen in her oral hearing testimony, Zhang probably lacked the qualifications to proffer the meta-analysis. The bar for qualification as an expert witness, however, is sadly very low. One other issue that might well have been addressed is Zhang’s use of a fixed-effect model for her meta-analysis. Considering that she was pooling data from cohort and case-control studies, some with and some without adjustments for confounders, with different measures of exposure, and some with and some without exposure-dependent analyses, Zhang and her co-authors were not justified in using a fixed-effect model to arrive at a summary estimate of relative risk. Admittedly, this error could easily have been lost in the flood of others.
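
The sketch below illustrates, with invented study results, why the modeling choice matters: under heterogeneity, a DerSimonian-Laird random-effects model yields a wider confidence interval than a fixed-effect model, which assumes a single common effect and thus understates the uncertainty. The numbers are hypothetical; nothing here reproduces Zhang’s actual data or computations.

```python
# Sketch contrasting a fixed-effect summary with a DerSimonian-Laird
# random-effects summary for the same hypothetical, heterogeneous studies.
# Under heterogeneity, the random-effects interval is wider; a fixed-effect
# model understates the uncertainty in the pooled estimate.
import math

# (log relative risk, standard error) -- illustrative values only
studies = [(0.10, 0.15), (0.64, 0.30), (0.26, 0.20), (0.69, 0.35), (0.18, 0.28)]

def pooled(weights, logs):
    est = sum(w * y for w, y in zip(weights, logs)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return est, se

logs = [y for y, _ in studies]
fe_w = [1 / se**2 for _, se in studies]          # fixed-effect weights
fe_est, fe_se = pooled(fe_w, logs)

# DerSimonian-Laird estimate of between-study variance (tau^2)
q = sum(w * (y - fe_est) ** 2 for w, y in zip(fe_w, logs))
c = sum(fe_w) - sum(w**2 for w in fe_w) / sum(fe_w)
tau2 = max(0.0, (q - (len(studies) - 1)) / c)

re_w = [1 / (se**2 + tau2) for _, se in studies]  # random-effects weights
re_est, re_se = pooled(re_w, logs)

for label, est, se in [("fixed effect", fe_est, fe_se),
                       ("random effects", re_est, re_se)]:
    lo, hi = math.exp(est - 1.96 * se), math.exp(est + 1.96 * se)
    print(f"{label}: RR {math.exp(est):.2f} (95% CI {lo:.2f}-{hi:.2f})")
```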

Postscript

Glyphosate is not merely a scientific issue. Its manufacturer, Monsanto, is the frequent target of media outlets (such as Telesur) from autocratic countries, such as Communist China and its client state, Venezuela.[14]

Long live the heroes of Tiananmen Square.


[1] “The IARC-hy of Evidence – Incoherent & Inconsistent Classifications of Carcinogenicity,” Tortini (Sept. 19, 2023).

[2] Robert E Tarone, “On the International Agency for Research on Cancer classification of glyphosate as a probable human carcinogen,” 27 Eur. J. Cancer Prev. 82 (2018).

[3] Luoping Zhang, Iemaan Rana, Rachel M. Shaffer, Emanuela Taioli, Lianne Sheppard, “Exposure to glyphosate-based herbicides and risk for non-Hodgkin lymphoma: A meta-analysis and supporting evidence,” 781 Mutation Research/Reviews in Mutation Research 186 (2019).

[4] David J. Miller, Acting Chief, Toxicology and Epidemiology Branch, Health Effects Division, U.S. Environmental Protection Agency, Memorandum to Christine Olinger, Chief, Risk Assessment Branch I, “Glyphosate: Epidemiology Review of Zhang et al. (2019) and Leon et al. (2019) publications for Response to Comments on the Proposed Interim Decision” (Jan. 6, 2020).

[5] Geoffrey C. Kabat, William J. Price, Robert E. Tarone, “On recent meta-analyses of exposure to glyphosate and risk of non-Hodgkin’s lymphoma in humans,” 32 Cancer Causes & Control 409 (2021).

[6] Geoffrey Kabat, “Paper Claims A Link Between Glyphosate And Cancer But Fails To Show Evidence,” Science 2.0 (Feb. 18, 2019).

[7] Lianne Sheppard, “Glyphosate Science is Nuanced. Arguments about it on the Internet? Not so much,” Forbes (Feb. 20, 2020).

[8] Geoffrey Kabat, “EPA Refuted A Meta-Analysis Claiming Glyphosate Can Cause Cancer And Senior Author Lianne Sheppard Doubled Down,” Science 2.0 (Feb. 26, 2020).

[9] Maria Dinzeo, “Jurors Hear of New Study Linking Roundup to Cancer,” Courthouse News Service (April 8, 2019).

[10] Bulone v. Monsanto Co., Case No. 16-md-02741-VC, MDL 2741 (N.D. Cal. June 20, 2024). See Hank Campbell, “Glyphosate legal update: Meta-study used by ambulance-chasing tort lawyers targeting Bayer’s Roundup as carcinogenic deemed ‘junk science nonsense’ by trial judge,” Genetic Literacy Project (June 24, 2024).

[11] In re Paraquat Prods. Liab. Litig., No. 3:21-MD-3004-NJR, 2024 WL 1659687 (S.D. Ill. Apr. 17, 2024) (opinion sur Rule 702 motion), appealed sub nom., Fuller v. Syngenta Crop Protection, LLC, No. 24-1868 (7th Cir. May 17, 2024). See “Paraquat Shape-Shifting Expert Witness Quashed,” Tortini (April 24, 2024).

[12] Edward K. Cheng, “The Consensus Rule: A New Approach to Scientific Evidence,” 75 Vanderbilt L. Rev. 407 (2022). See “Cheng’s Proposed Consensus Rule for Expert Witnesses,” Tortini (Sept. 15, 2022); “Further thoughts on Cheng’s Consensus Rule,” Tortini (Oct. 3, 2022).

[13] Bulone, citing Valentine v. Pioneer Chlor Alkali Co., 921 F. Supp. 666, 674-76 (D. Nev. 1996), for its distinction between “editorial peer review” and “true peer review,” the latter of which includes post-publication assessment of a paper, a consideration of particular importance for Rule 702 purposes.

[14] Anne Applebaum, Autocracy, Inc.: The Dictators Who Want to Run the World 66 (2024).

Access to a Study Protocol & Underlying Data Reveals a Nuclear Non-Proliferation Test

April 8th, 2024

The limits of peer review ultimately make it a poor proxy for the validity tests posed by Rules 702 and 703. Published, peer-reviewed articles simply do not permit a very searching evaluation of the facts and data of a study. In the wake of the Daubert decision, expert witnesses quickly saw that they could obscure the search for validity by relying upon published studies, and thereby frustrate the goals of judicial gatekeeping. As a practical matter, the burden shifts to the party that wishes to challenge the relied-upon facts and data to learn more about the cited studies, in order to show that the facts and data are not sufficient under Rule 702(b), and that the testimony is not the product of reliable methods under Rule 702(c). Obtaining study protocols, and in some instances underlying data, is necessary for due process in the gatekeeping process. A couple of case studies may illustrate the power of looking under the hood of published studies, even ones that were peer reviewed.

When the Supreme Court decided the Daubert case in June 1993, two recent verdicts in silicone-gel breast implant cases were fresh in memory.[1] The verdicts were large by the standards of the time, and the evidence presented for the claims that silicone caused autoimmune disease was extremely weak. The verdicts set off a feeding frenzy, not only in the lawsuit industry, but also in the shady entrepreneurial world of supposed medical tests for “silicone sensitivity.”

The plaintiffs’ litigation theory lacked any meaningful epidemiologic support, and so there were fulsome presentations of putative, hypothetical mechanisms. One such mechanism involved the supposed in vivo degradation of silicone to silica (silicon dioxide), with silica then inducing an immunogenic reaction, which then, somehow, induced autoimmunity and autoimmune connective tissue disease. The degradation claim would ultimately prove baseless,[2] and the nuclear magnetic resonance evidence put forward to support degradation would turn out to be instrumental artifact and deception. The immunogenic mechanism had a few lines of potential support, the most prominent at the time coming from the laboratories of Douglas Radford Shanklin and his colleague, David L. Smalley, both of whom were testifying expert witnesses for claimants.

The Daubert decision held out some opportunity to challenge the admissibility of testimony that silicone implants led to either the production of a silicone-specific antibody, or the induction of t-cell mediated immunogenicity from silicone (or resulting silica) exposure. The initial tests of the newly articulated standard for admissibility of opinion testimony in silicone litigation did not go well.[3] Peer review, which was absent in the re-analyses relied upon in the Bendectin litigation, was superficially present in the studies relied upon in the silicone litigation. The absence of supportive epidemiology was excused with hand waving that there was a “credible” mechanism, and that epidemiology took too long and was too expensive. In the early going after Daubert, federal courts were quick to excuse the absence of epidemiology for a novel claim.

The initial Rule 702 challenges to plaintiffs’ expert witnesses thus focused on immunogenicity as the putative mechanism, which, if true, might lend some plausibility to their causal claim. Ultimately, plaintiffs’ expert witnesses would have to show that the mechanism was real by showing, through epidemiologic studies, that silicone exposure causes autoimmune disease.

One of the more persistent purveyors of a “test” for detecting alleged silicone sensitivity was the team of Smalley and Shanklin, then at the University of Tennessee. These authors exploited the fears of implant recipients and the greed of lawyers by marketing a “silicone sensitivity test” (SILS). For a price, Smalley and Shanklin would test mailed-in blood specimens sent directly by lawyers or by physicians, and provide ready-for-litigation reports that claimants had suffered an immune system response to silicone exposure. Starting in 1995, Smalley and Shanklin also cranked out a series of articles in supposedly peer-reviewed journals, which purported to identify a specific immune response to crystalline silica in women who had silicone gel breast implants.[4] These studies had two obvious goals. First, they promoted the test to the “silicone sisters,” various support groups of claimants, as well as to their lawyers and a network of supporting rheumatologists and plastic surgeons. Second, by identifying a putative causal mechanism, Shanklin could add a meretricious patina of scientific validity to the claim that silicone breast implants cause autoimmune disease, a claim that Shanklin, as a testifying expert witness, needed to survive Rule 702 challenges.

The plaintiffs’ strategy had been to paper over the huge analytical gaps in their causal theory with complicated, speculative research, which had been peer reviewed and published. Although the quality of the journals was often suspect, and the nature of the peer review obscure, the strategy had been initially successful in deflecting any meaningful scrutiny.

Many of the silicone cases were pending in a multi-district litigation, MDL 926, before Judge Sam Pointer, in the Northern District of Alabama. Judge Pointer, however, did not believe that ruling on expert witness admissibility was a function of an MDL court, and by 1995, he had started to remand cases to the transferor courts, for those courts to do what they thought appropriate under Rules 702 and 703. Some of the first remanded cases went to the District of Oregon, where they landed in front of Judge Robert E. Jones. In early 1996, Judge Jones invited briefing on expert witness challenges, and in the face of the complex immunology and toxicology issues, and the emerging epidemiologic studies, he decided to appoint four technical advisors to assist him in deciding the challenges.

The addition of scientific advisors to the gatekeeper’s bench made a huge difference in the sophistication and detail of the challenges that could be lodged against the relied-upon studies. In June 1996, Judge Jones held extensive hearings with viva voce testimony from both challenged witnesses and subject-matter experts on topics such as immunology and nuclear magnetic resonance spectroscopy. Judge Jones invited final argument in the form of videotaped presentations from counsel, so that the videotapes could be distributed to his technical advisors later in the summer. The contrived complexity of plaintiffs’ case dissipated, and the huge analytical gaps became visible. In December 1996, Judge Jones issued his decision excluding the plaintiffs’ expert witnesses’ proposed testimony on the ground that it failed to satisfy the requirements of Rule 702.[5]

In October 1996, while Judge Jones was studying the record and writing his opinion in the Hall case, Judge Weinstein, sitting with a judge from the Southern District of New York and a justice of the New York state trial court, conducted a two-week Rule 702 hearing in Brooklyn. Judge Weinstein announced at the outset that he had studied the record from the Hall case, and that he would incorporate it into his record for the cases remanded to the Southern and Eastern Districts of New York.

Curious gaps in the articles claiming silicone immunogenicity, and the lack of success in earlier Rule 702 challenges, motivated the defense to obtain the study protocols and underlying data from studies such as those published by Shanklin and Smalley. Shanklin and Smalley were frequently listed as expert witnesses in individual cases, but when requests or subpoenas for their protocols and raw data were filed, plaintiffs’ counsel stonewalled or withdrew them as witnesses. Eventually, the defense was able to enforce a subpoena and obtain the protocol and some data. The respondents claimed that the control data no longer existed, and that, inexplicably, a good part of the experimental data had been destroyed. Enough was revealed, however, to see that the published articles were not what they claimed to be.[6]

In addition to litigation discovery, in March 1996, a surgeon published the results of his test of the Shanklin-Smalley silicone sensitivity test (“SILS”).[7] Dr. Leroy Young sent the Shanklin laboratory several blood samples from women with and without silicone implants. For six women who had never had implants, Dr. Young submitted a fabricated medical history that included silicone implants and symptoms of “silicone-associated disease.” All six samples were reported back as “positive”; indeed, these results were more strongly positive than those for the blood samples from the women who actually had silicone implants. Dr. Young suggested that perhaps the SILS test was akin to cold fusion.

By the time counsel assembled in Judge Weinstein’s courtroom, in October 1996, some epidemiologic studies had become available, and much more was known about the supposedly supportive mechanistic studies upon which plaintiffs’ expert witnesses had previously relied. Not too surprisingly, plaintiffs’ counsel chose not to call the entrepreneurial Dr. Shanklin, but instead called Donard S. Dwyer, a young, earnest immunologist who had done some contract work on an unrelated matter for Bristol-Myers Squibb, a defendant in the litigation. Dr. Dwyer had previously filed an affidavit in the Oregon federal litigation, in which he gave blanket approval to the methods and conclusions of the Smalley-Shanklin research:

“Based on a thorough review of these extensive materials which are more than adequate to evaluate Dr. Smalley’s test methodology, I formed the following conclusions. First, the experimental protocols that were used are standard and acceptable methods for measuring T Cell proliferation. The results have been reproducible and consistent in this laboratory. Second, the conclusion that there are differences between patients with breast implants and normal controls with respect to the proliferative response to silicon dioxide appears to be justified from the data.”[8]

Dwyer maintained this position even after the defense obtained the study protocol and underlying data, and various immunologists on the defense side filed scathing evaluations of the Smalley-Shanklin work. On direct examination at the hearings in Brooklyn, Dwyer vouched for the challenged t-cell studies, and opined that the work was peer reviewed and sufficiently reliable.[9]

The charade fell apart on cross-examination. Dwyer refused to endorse the studies that claimed to have found an anti-silicone antibody. Researchers at leading universities had attempted to reproduce the findings of such antibodies, without success.[10] The real controversy was over the claimed finding of silicone antigenicity as shown in t-cell, or cell-mediated, specific immune responses. On direct examination, plaintiffs’ counsel had elicited Dwyer’s support for the soundness of the scientific studies that purported to establish such antigenicity, with little attention to the critiques that had been filed before the hearing.[11] Dwyer stuck to the unqualified support he had expressed previously in his affidavit for the Oregon cases.[12]

The problematic aspect of Dwyer’s direct examination testimony was that he had seen the protocol and the partial data produced by Smalley and Shanklin.[13] Dwyer, therefore, could not dispute some basic facts about their work. First, the Shanklin data failed to support a dose-response relationship.[14] Second, the blood samples from women with silicone implants had been mailed to Smalley’s laboratory, whereas the control samples were collected locally. The disparity ensured that the silicone blood samples would be older than the controls, a departure from treating exposed and control samples in the same way.[15] Third, the experiment was done unblinded; the laboratory technical personnel and the investigators knew which blood samples were silicone-exposed and which were controls (except for the samples sent by Dr. Leroy Young).[16] Fourth, Shanklin’s laboratory procedures deviated from the standardized procedure set out in the National Institutes of Health’s Current Protocols in Immunology.[17]

The SILS study protocol and the data produced by Shanklin and Smalley made clear that each sample was to be tested in triplicate for t-cell proliferation in response to silica, to a positive control mitogen (Con A), and to a negative control blank. The published papers all claimed that each sample had in fact been tested in triplicate for each of these three conditions (silica, mitogen, and blank).[18] These statements were, however, untrue and never corrected.[19]

The study protocol called for the tests to be run in triplicate, but it instructed the laboratory that two counts could be used if one count did not match the others, a judgment to be made by a technical specialist on a “case-by-case” basis. Of the data that were supposed to be reported in triplicate, fully one third had only two data points, and 10 percent had but one data point.[20] No criteria were provided to the technical specialist for deciding which data to discard.[21] Not only had Shanklin excluded data, but he had discarded and destroyed the data, so that no one could go back and assess whether the data should have been excluded.[22]
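
A toy simulation can illustrate why a criterion-free discard rule matters. The sketch below uses synthetic counts and does not purport to reconstruct the destroyed Shanklin-Smalley data; it shows only that selectively dropping one of three replicates, without a pre-specified rule, can shift the reported averages even when there is no true difference to detect.

```python
# Toy simulation of the bias introduced by discarding a "discordant" replicate.
# The counts are synthetic; nothing here reconstructs the actual Shanklin-Smalley
# data, which were destroyed. The point is only that an unblinded, criterion-free
# rule for dropping one of three replicates can shift the reported averages.
import random
import statistics

random.seed(1)

def reported_mean(triplicate, drop_discordant):
    if drop_discordant:
        # drop the replicate farthest from the median ("technical specialist" rule)
        med = statistics.median(triplicate)
        triplicate = sorted(triplicate, key=lambda x: abs(x - med))[:2]
    return statistics.mean(triplicate)

# simulate triplicate counts drawn from the same true mean (no real effect)
true_mean, noise = 1000, 300
full, trimmed = [], []
for _ in range(10_000):
    counts = [random.gauss(true_mean, noise) for _ in range(3)]
    all_three = reported_mean(counts, drop_discordant=False)
    full.append(all_three)
    # suppose the discard rule is applied mainly when it nudges the result upward
    candidate = reported_mean(counts, drop_discordant=True)
    trimmed.append(candidate if candidate > all_three else all_three)

print(f"mean with all replicates:       {statistics.mean(full):7.1f}")
print(f"mean with selective discarding: {statistics.mean(trimmed):7.1f}")
```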

Dwyer agreed that this exclusion and discarding of data was not at all a good method.[23] He proclaimed that he had not come to Brooklyn to defend this aspect of the Shanklin work, and that it was not defensible at all. He conceded that “the interpretation of the data and collection of the data are flawed.”[24] Dwyer nonetheless tried to stake out an incoherent position, asserting that there was “nothing inherently wrong with the method,” while conceding that discarding data was problematic.[25] The judges presiding over the hearing could readily see that the Shanklin research was bent.

At this point, the lead plaintiffs’ counsel, Michael Williams, sought an off-ramp. He jumped to his feet and exclaimed, “I’m informed that no witness in this case will rely on Dr. Smalley’s [and Shanklin’s] work in any respect.”[26] Judge Weinstein’s eyes lit up with the prospect that the Smalley-Shanklin work, by agreement, would never be mentioned again in New York state or federal cases. Given how central the claim of silicone antigenicity was to plaintiffs’ cases, the defense resisted a stipulation about research that it would continue to face in other state and federal courts. The defense was saved, however, by the obstinacy of a lawyer from the Weitz & Luxenberg firm, who rose to report that her firm intended to call Drs. Shanklin and Smalley as witnesses, and that it would not stipulate to the exclusion of their work. Judge Weinstein rolled his eyes, and waved me to continue.[27] The proliferation of the t-cell test was over. The hearing before Judges Weinstein and Baer, and Justice Lobis, continued for several more days, with several other dramatic moments.[28]

In short order, on October 23, 1996, Judge Weinstein issued a brief, published opinion, in which he granted partial summary judgment on the claims of systemic disease for all cases pending in federal court in New York.[29] What was curious was that the defendants had not moved for summary judgment. There were, of course, pending motions to exclude plaintiffs’ expert witnesses, but Judge Weinstein effectively ducked those motions, and let it be known that he was never a fan of Rule 702. It would be many years before Judge Weinstein allowed his own judicial assessment to see the light of day. More than two decades later, in a law review article, Judge Weinstein gave his judgment that

“[t]he breast implant litigation was largely based on a litigation fraud. …  Claims—supported by medical charlatans—that enormous damages to women’s systems resulted could not be supported.”[30]

Judge Weinstein’s opinion was truly a judgment from which there can be no appeal. Shanklin and Smalley continued to publish papers for another decade. None of the published articles by Shanklin and others have been retracted.


[1] Reuters, “Record $25 Million Awarded In Silicone-Gel Implants Case,” N.Y. Times at A13 (Dec. 24, 1992) (describing the verdict returned in Harris County, Texas, in Johnson v. Medical Engineering Corp.); Associated Press, “Woman Wins Implant Suit,” N.Y. Times at A16 (Dec. 17, 1991) (reporting a verdict in Hopkins v. Dow Corning, for $840,000 in compensatory and $6.5 million in punitive damages); see Hopkins v. Dow Corning Corp., 33 F.3d 1116 (9th Cir. 1994) (affirming judgment with minimal attention to Rule 702 issues).

[2] William E. Hull, “A Critical Review of MR Studies Concerning Silicone Breast Implants,” 42 Magnetic Resonance in Medicine 984, 984 (1999) (“From my viewpoint as an analytical spectroscopist, the result of this exercise was disturbing and disappointing. In my judgement as a referee, none of the Garrido group’s papers (1–6) should have been published in their current form.”). See also N.A. Schachtman, “Silicone Data – Slippery & Hard to Find, Part 2,” Tortini (July 5, 2015). Many of the material science claims in the breast implant litigation were as fraudulent as the health effects claims. See, e.g., John Donley, “Examining the Expert,” 49 Litigation 26 (Spring 2023) (discussing his encounters with frequent testifier Pierre Blais, in silicone litigation).

[3] See, e.g., Hopkins v. Dow Corning Corp., 33 F.3d 1116 (9th Cir. 1994) (affirming judgment for plaintiff over Rule 702 challenges), cert. denied, 115 S.Ct. 734 (1995). See Donald A. Lawson, “Note, Hopkins v. Dow Corning Corporation: Silicone and Science,” 37 Jurimetrics J. 53 (1996) (concluding that Hopkins was wrongly decided).

[4] See David L. Smalley, Douglas R. Shanklin, Mary F. Hall, and Michael V. Stevens, “Detection of Lymphocyte Stimulation by Silicon Dioxide,” 4 Internat’l J. Occup. Med. & Toxicol. 63 (1995); David L. Smalley, Douglas R. Shanklin, Mary F. Hall, Michael V. Stevens, and Aram Hanissian, “Immunologic stimulation of T lymphocytes by silica after use of silicone mammary implants,” 9 FASEB J. 424 (1995); David L. Smalley, J. J. Levine, Douglas R. Shanklin, Mary F. Hall, Michael V. Stevens, “Lymphocyte response to silica among offspring of silicone breast implant recipients,” 196 Immunobiology 567 (1996); David L. Smalley, Douglas R. Shanklin, “T-cell-specific response to silicone gel,” 98 Plastic Reconstr. Surg. 915 (1996); and Douglas R. Shanklin, David L. Smalley, Mary F. Hall, Michael V. Stevens, “T cell-mediated immune response to silica in silicone breast implant patients,” 210 Curr. Topics Microbiol. Immunol. 227 (1996). Shanklin was also no stranger to making his case in the popular media. See, e.g., Douglas Shanklin, “More Research Needed on Breast Implants,” Kitsap Sun at 2 (Aug. 29, 1995) (“Widespread silicone sickness is very real in women with past and continuing exposure to silicone breast implants.”) (writing for Scripps Howard News Service). Even after the Shanklin studies were discredited in court, Shanklin and his colleagues continued to publish their claims that silicone implants led to silica antigenicity. David L. Smalley, Douglas R. Shanklin, and Mary F. Hall, “Monocyte-dependent stimulation of human T cells by silicon dioxide,” 66 Pathobiology 302 (1998); Douglas R. Shanklin and David L. Smalley, “The immunopathology of siliconosis. History, clinical presentation, and relation to silicosis and the chemistry of silicon and silicone,” 18 Immunol. Res. 125 (1998); Douglas Radford Shanklin, David L. Smalley, “Pathogenetic and diagnostic aspects of siliconosis,” 17 Rev. Environ Health 85 (2002), and “Erratum,” 17 Rev Environ Health. 248 (2002); Douglas Radford Shanklin & David L Smalley, “Kinetics of T lymphocyte responses to persistent antigens,” 80 Exp. Mol. Pathol. 26 (2006). Douglas Shanklin died in 2013. Susan J. Ainsworth, “Douglas R. Shanklin,” 92 Chem. & Eng’g News (April 7, 2014). Dr. Smalley appears to be still alive. In 2022, he sued the federal government to challenge his disqualification from serving as a laboratory director of any clinical directory in the United States, under 42 U.S.C. § 263a(k). He lost. Smalley v. Becerra, Case No. 4:22CV399 HEA (E.D. Mo. July 6, 2022).

[5] Hall v. Baxter Healthcare Corp., 947 F. Supp. 1387 (D. Ore. 1996); see Joseph Sanders & David H. Kaye, “Expert Advice on Silicone Implants: Hall v. Baxter Healthcare Corp., 37 Jurimetrics J. 113 (1997); Laurens Walker & John Monahan, “Scientific Authority: The Breast Implant Litigation and Beyond,” 86 Virginia L. Rev. 801 (2000); Jane F. Thorpe, Alvina M. Oelhafen, and Michael B. Arnold, “Court-Appointed Experts and Technical Advisors,” 26 Litigation 31 (Summer 2000); Laural L. Hooper, Joe S. Cecil & Thomas E. Willging, “Assessing Causation in Breast Implant Litigation: The Role of Science Panels,” 64 Law & Contemp. Problems 139 (2001); Debra L. Worthington, Merrie Jo Stallard, Joseph M. Price & Peter J. Goss, “Hindsight Bias, Daubert, and the Silicone Breast Implant Litigation: Making the Case for Court-Appointed Experts in Complex Medical and Scientific Litigation,” 8 Psychology, Public Policy &  Law 154 (2002).

[6] Judge Jones’ technical advisor on immunology reported that the studies offered in support of the alleged connection between silicone implantation and silicone-specific T cell responses, including the published papers by Shanklin and Smalley, “have a number of methodological shortcomings and thus should not form the basis of such an opinion.” Mary Stenzel-Poore, “Silicone Breast Implant Cases–Analysis of Scientific Reasoning and Methodology Regarding Immunological Studies” (Sept. 9, 1996). This judgment was seconded, over three years later, in the proceedings before MDL 926 and its Rule 706 court-appointed immunology expert witness. See Report of Dr. Betty A. Diamond, in MDL 926, at 14-15 (Nov. 30, 1998). Other expert witnesses who published studies on the supposed immunogenicity of silicone came up with some creative excuses to avoid producing their underlying data. Eric Gershwin consistently testified that his data were with a co-author in Israel, and that he could not produce them. N.A. Schachtman, “Silicone Data – Slippery and Hard to Find, Part I,” Tortini (July 4, 2015). Nonetheless, the court-appointed technical advisors were highly critical of Dr. Gershwin’s results. Dr. Stenzel-Poore, the immunologist on Judge Jones’ panel of advisors, found Gershwin’s claims “not well substantiated.” Hall v. Baxter Healthcare Corp., 947 F. Supp. 1387 (D. Ore. 1996). Similarly, Judge Pointer’s appointed expert immunologist, Dr. Betty A. Diamond, was unshakeable in her criticisms of Gershwin’s work and his conclusions. Testimony of Dr. Betty A. Diamond, in MDL 926 (April 23, 1999). And the Institute of Medicine committee, charged with reviewing the silicone claims, found Gershwin’s work inadequate and insufficient to justify the extravagant claims that plaintiffs were making for immunogenicity and for causation of autoimmune disease. Stuart Bondurant, Virginia Ernster, and Roger Herdman, eds., Safety of Silicone Breast Implants 256 (1999). Another testifying expert witness who relied upon his own data, Nir Kossovsky, resorted to a seismic excuse; he claimed that the Northridge Quake destroyed his data. N.A. Schachtman, “Earthquake Induced Data Loss – We’re All Shook Up,” Tortini (June 26, 2015); Kossovsky, along with his wife, Beth Brandegee, and his father, Ram Kossowsky, sought to commercialize an ELISA-based silicone “antibody” biomarker diagnostic test, Detecsil. Although the early Rule 702 decisions declined to take a hard look at Kossovsky’s study, the U.S. Food and Drug Administration eventually shut down the Kossovsky Detecsil test. Lillian J. Gill, FDA Acting Director, Office of Compliance, Letter to Beth S. Brandegee, President, Structured Biologicals (SBI) Laboratories: Detecsil Silicone Sensitivity Test (July 15, 1994); see Gary Taubes, “Silicone in the System: Has Nir Kossovsky really shown anything about the dangers of breast implants?” Discover Magazine (Dec. 1995).

[7] Leroy Young, “Testing the Test: An Analysis of the Reliability of the Silicone Sensitivity Test (SILS) in Detecting Immune-Mediated Responses to Silicone Breast Implants,” 97 Plastic & Reconstr. Surg. 681 (1996).

[8] Affid. of Donard S. Dwyer, at para. 6 (Dec. 1, 1995), filed in In re Breast Implant Litig. Pending in U.S. D. Ct, D. Oregon (Groups 1,2, and 3).

[9] Notes of Testimony of Dr. Donard Dwyer, Nyitray v. Baxter Healthcare Corp., CV 93-159 (E. & S.D.N.Y. and N.Y. Sup. Ct., N.Y. Cty. Oct. 8, 9, 1996) (Weinstein, J., Baer, J., Lobis, J., Pollak, M.J.).

[10] Id. at N.T. 238-239 (Oct. 8, 1996).

[11] Id. at N.T. 240.

[12] Id. at N.T. 241-42.

[13] Id. at N.T. 243-44; 255:22-256:3.

[14] Id. at 244-45.

[15] Id. at N.T. 259.

[16] Id. at N.T. 258:20-22.

[17] Id. at N.T. 254.

[18] Id. at N.T. 252:16-254.

[19] Id. at N.T. 254:19-255:2.

[20] Id. at N.T. 269:18-269:14.

[21] Id. at N.T. 261:23-262:1.

[22] Id. at N.T. 269:18-270.

[23] Id. at N.T. 256:3-16.

[24] Id. at N.T. 262:15-17.

[25] Id. at N.T. 247:3-5.

[26] Id. at N.T. 260:2-3.

[27] Id. at N.T. 261:5-8.

[28] One of the more interesting and colorful moments came when the late James Conlon cross-examined plaintiffs’ pathology expert witness, Saul Puszkin, about questionable aspects of his curriculum vitae. The examination revealed such questionable conduct that Judge Weinstein stopped it and directed Dr. Puszkin not to continue without legal counsel of his own.

[29] In re Breast Implant Cases, 942 F. Supp. 958 (E.& S.D.N.Y. 1996). The opinion did not specifically address the Rule 702 and 703 issues that were the subject of pending motions before the court.

[30] Hon. Jack B. Weinstein, “Preliminary Reflections on Administration of Complex Litigation” 2009 Cardozo L. Rev. de novo 1, 14 (2009) (emphasis added).

Peer Review, Protocols, and QRPs

April 3rd, 2024

In Daubert, the Supreme Court decided a legal question about the proper interpretation of a statute, Rule 702, and then remanded the case to the Court of Appeals for the Ninth Circuit for further proceedings. The Court did, however, weigh in with dicta about several considerations relevant to admissibility decisions. In particular, the Court identified four non-dispositive factors: whether the challenged opinion has been empirically tested; whether it has been published and peer reviewed; whether the underlying scientific technique or method supporting the opinion has an acceptable rate of error; and whether it has gained general acceptance.[1]

The context in which peer review was discussed in Daubert is of some importance to understanding why the Court held peer review out as a consideration. One of the bases for the defense challenges to some of the plaintiffs’ expert witnesses’ opinions in Daubert was their reliance upon re-analyses of published studies to suggest that there was indeed an increased risk of birth defects, if only the publications’ authors had used some other control group or taken some other analytical approach. Re-analyses can be important, but these re-analyses of published Bendectin studies were post hoc, litigation-driven, and obviously result-oriented. The Court’s discussion of peer review reveals that it was not simply creating a box to be checked before a trial court could admit an expert witness’s opinions. Peer review was suggested as a consideration because:

“submission to the scrutiny of the scientific community is a component of “good science,” in part because it increases the likelihood that substantive flaws in methodology will be detected. The fact of publication (or lack thereof) in a peer reviewed journal thus will be a relevant, though not dispositive, consideration in assessing the scientific validity of a particular technique or methodology on which an opinion is premised.”[2]

Peer review, or the lack thereof, for the challenged expert witnesses’ re-analyses was called out because its absence raised suspicions about their validity. Nothing in Daubert, or in later decisions, or more importantly in Rule 702 itself, supports admitting expert witness testimony just because the witness relied upon peer-reviewed studies, especially when the studies are invalid or are based upon questionable research practices. The Court was careful to point out that peer-reviewed publication was “not a sine qua non of admissibility; it does not necessarily correlate with reliability, … .”[3] The Court thus showed that it was well aware that well-grounded (and thus admissible) opinions may not have been previously published, and that the existence of peer review was simply a potential aid in answering the essential question, whether the proponent of a proffered opinion has shown “the scientific validity of a particular technique or methodology on which an opinion is premised.”[4]

Since 1993, much has changed in the world of bio-science publishing. The wild proliferation of journals, including predatory and “pay-to-play” journals, has disabused most observers of the notion that peer review provides evidence of the validity of methods. Along with the exponential growth in publications has come an exponential growth in expressions of concern and outright retractions of articles, as chronicled and detailed at Retraction Watch.[5] Some journals encourage authors to nominate the peer reviewers for their manuscripts; some journals let authors block certain scientists from serving as peer reviewers of their submitted manuscripts. If the Supreme Court were writing today, it might well note that peer review is often a feature of bad science, advanced by scientists who know that peer-reviewed publication is the price of admission to the advocacy arena.

Since the Supreme Court decided Daubert, the Federal Judicial Center and the National Academies have provided a Reference Manual on Scientific Evidence, now in its third edition, with a fourth edition on the horizon, to assist judges and lawyers involved in the litigation of scientific issues. Professor Goodstein, in his chapter “How Science Works” in the third edition, provides the most extensive discussion of peer review in the Manual, and emphasizes that peer review “works very poorly in catching cheating or fraud.”[6] Goodstein invokes his own experience as a peer reviewer to note that “peer review referees and editors limit their assessment of submitted articles to such matters as style, plausibility, and defensibility; they do not duplicate experiments from scratch or plow through reams of computer-generated data in order to guarantee accuracy or veracity or certainty.”[7] Indeed, Goodstein’s essay in the Reference Manual characterizes the ability of peer review to warrant study validity as a “myth”:

Myth: The institution of peer review assures that all published papers are sound and dependable.

Fact: Peer review generally will catch something that is completely out of step with majority thinking at the time, but it is practically useless for catching outright fraud, and it is not very good at dealing with truly novel ideas. … It certainly does not ensure that the work has been fully vetted in terms of the data analysis and the proper application of research methods.[8]

Goodstein’s experience as a peer reviewer is hardly idiosyncratic. One standard text on the ethical conduct of research reports that peer review is often ineffective or incompetent, and that it may not even catch simple statistical or methodological errors.[9] According to the authors, Shamoo and Resnik:

“[p]eer review is not good at detecting data fabrication or falsification partly because reviewers usually do not have access to the material they would need to detect fraud, such as the original data, protocols, and standard operating procedures.”[10]

Indeed, without access to protocols, statistical analysis plans, and original data, peer review often cannot identify good-faith or negligent deviations from the standard of scientific care. There is some evidence to support this negative assessment of peer review from a test of the counterfactual: reviewers were able to detect questionable, selective reporting when they had access to the study authors’ research protocols.[11]

Study Protocol

The study protocol provides the scientific rationale for a study, clearly defines the research question, describes the data collection process, defines the key exposure and outcome variables, and specifies the methods to be applied, all before data collection commences.[12] The protocol also typically pre-specifies the statistical data analysis. The epidemiology chapter of the current edition of the Reference Manual on Scientific Evidence offers only the bland observation that epidemiologists attempt to minimize bias in observational studies with “data collection protocols.”[13] Epidemiologists and statisticians are much clearer in emphasizing the importance, indeed the necessity, of having a study protocol before commencing data collection. Back in 1988, John Bailar and Frederick Mosteller explained that it was critical, in reporting statistical analyses, to inform readers about how and when the authors devised the study design, and whether they set the design criteria out in writing before they began to collect data.[14]

The necessity of a study protocol is “self-evident,”[15] and a protocol is essential to research integrity.[16] The International Society for Pharmacoepidemiology has issued guidelines for “Good Pharmacoepidemiology Practices,”[17] which call for every study to have a written protocol. Among the requirements set out in these guidelines are descriptions of the research method, the study design, operational definitions of exposure and outcome variables, and the projected study sample size. The guidelines provide that a detailed statistical analysis plan may be specified after data collection begins, but before any analysis commences.
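
A minimal illustration of the elements such guidelines expect a protocol to pre-specify is sketched below as a plain data structure. The field names are illustrative shorthand, not language quoted from the ISPE Guidelines; the point is only that each element is fixed, in writing, before the data arrive.

```python
# Illustrative skeleton of the elements a written study protocol pre-specifies.
# Field names are shorthand for this sketch, not quoted from the ISPE Guidelines.
protocol = {
    "rationale": "Scientific background and reason for conducting the study",
    "research_question": "Primary hypothesis, stated before data collection",
    "study_design": "e.g., cohort or case-control, with the source population",
    "exposure_definition": "Operational definition of exposure, incl. cut-points",
    "outcome_definition": "Operational definition of the outcome (e.g., incident NHL)",
    "sample_size": "Projected study size and power considerations",
    "data_collection": "Sources, instruments, and quality-control procedures",
    "statistical_analysis_plan": {
        "primary_analysis": "Pre-specified model and effect measure",
        "confounder_adjustment": "Covariates chosen in advance",
        "subgroup_analyses": "Listed, and labeled pre-specified or exploratory",
        "missing_data_handling": "Rule stated before any analysis begins",
    },
}

# A reader (or reviewer, or court) can then ask of the published paper:
# which of these elements were fixed in advance, and which changed afterward?
for element in protocol:
    print(element)
```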

Expert witness opinions on health effects are built upon studies, and so it behooves legal counsel to identify the methodological strengths and weaknesses of key studies by asking whether they have protocols, whether the protocols were methodologically appropriate, and whether the researchers faithfully followed their protocols and statistical analysis plans. Determining the peer-review status of a publication, on the other hand, will often not advance a challenge based upon improvident methodology.

In some instances, a published study will describe its methods and data in sufficient detail that readers, even lawyers, can evaluate its scientific validity or reliability (vel non). In other cases, however, readers will be no better off than the peer reviewers who were deprived of access to protocols, statistical analysis plans, and original data. When a particular study is crucial support for an adversary’s expert witness, a reasonable litigation goal may well be to obtain the protocol and statistical analysis plan, and if need be, the original underlying data. The decision to undertake such discovery is difficult. Discovery of non-party scientists can be expensive and protracted; it will almost certainly be contentious. But when expert witnesses rely upon one or a few studies that appear, on their face, to be internally valid, this litigation strategy may provide the strongest evidence against the studies’ being reasonably relied upon, or their providing “sufficient facts and data” to support an admissible expert witness opinion.


[1] Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579, 593-594 (1993).

[2] Id. at 594 (internal citations omitted) (emphasis added).

[3] Id.

[4] Id. at 593-94.

[5] Retraction Watch, at https://retractionwatch.com/.

[6] Reference Manual on Scientific Evidence at 37, 44-45 (3rd ed. 2011) [Manual].

[7] Id. at 44-45 n.11.

[8] Id. at 48 (emphasis added).

[9] Adil E. Shamoo and David B. Resnik, Responsible Conduct of Research 133 (4th ed. 2022).

[10] Id.

[11] An-Wen Chan, Asbjørn Hróbjartsson, Mette T. Haahr, Peter C. Gøtzsche, and David G. Altman, “Empirical evidence for selective reporting of outcomes in randomized trials: Comparison of protocols to published articles,” 291 J. Am. Med. Ass’n 2457 (2004).

[12] Wolfgang Ahrens & Iris Pigeot, eds., Handbook of Epidemiology 477 (2nd ed. 2014).

[13] Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” in Reference Manual on Scientific Evidence 573 (3rd ed. 2011) (“Study designs are developed before they begin gathering data.”).

[14] John Bailar & Frederick Mosteller, “Guidelines for Statistical Reporting in Articles for Medical Journals,” 108 Ann. Intern. Med. 266, 268 (1988).

[15] Wolfgang Ahrens & Iris Pigeot, eds., Handbook of Epidemiology 477 (2nd ed. 2014).

[16] Sandra Alba, et al., “Bridging research integrity and global health epidemiology statement: guidelines for good epidemiological practice,” 5 BMJ Global Health e003236, at p.3 & passim (2020).

[17] See “The ISPE Guidelines for Good Pharmacoepidemiology Practices (GPP),” available at <https://www.pharmacoepi.org/resources/policies/guidelines-08027/>.

QRPs in Science and in Court

April 2nd, 2024

Lay juries usually function well in assessing the relevance of an expert witness’s credentials, experience, command of the facts, likeability, physical demeanor, confidence, and ability to communicate. Lay juries can understand and respond to arguments about personal bias, which no doubt is why trial lawyers spend so much time and effort to emphasize the size of fees and consulting income, and the propensity to testify only for one side. For procedural and practical reasons, however, lay juries do not function very well in assessing the actual merits of scientific controversies. And with respect to methodological issues that underlie the merits, juries barely function at all. The legal system imposes no educational or experiential qualifications for jurors, and trials are hardly the occasion to teach jurors the methodology, skills, and information needed to resolve methodological issues that underlie a scientific dispute.

Scientific studies, reviews, and meta-analyses are virtually never directly admissible in evidence in courtrooms in the United States. As a result, juries do not have the opportunity to read and ponder these sources, or to assess their strengths and weaknesses. The working assumption of our courts is that juries are not qualified to engage directly with the primary sources of scientific evidence, and so expert witnesses are called upon to deliver opinions based upon a scientific record not directly in evidence. In the litigation of scientific disputes, our courts thus rely upon the opinion testimony of expert witnesses. Not only must juries, the usual triers of fact in our courts, assess the credibility of expert witnesses; they must also assess whether those witnesses are accurately describing studies that the jurors cannot read in their entirety.

The convoluted path by which science enters the courtroom supports the liberal and robust gatekeeping process outlined under Rules 702 and 703 of the Federal Rules of Evidence. The court, not the jury, must make a preliminary determination, under Rule 104, that the facts and data of a study are reasonably relied upon by an expert witness (Rule 703). And the court, not the jury, again under Rule 104, must determine that expert witnesses possess appropriate qualifications and relevant expertise, and that they have proffered opinions sufficiently supported by facts or data, based upon reliable principles and methods, and reliably applied to the facts of the case (Rule 702). There is no constitutional right to bamboozle juries with inconclusive, biased, confounded, or otherwise crummy studies, or with selective and incomplete assessments of the available facts and data. Back in the days of “easy admissibility,” opinions could be tested on cross-examination, but the limited time and acumen of counsel, courts, and juries cry out for meaningful scientific due process along the lines set out in Rules 702 and 703.

The evolutionary development of Rules 702 and 703 has promoted a salutary convergence between science and law. According to one historical overview of systematic reviews in science, the foundational period for such reviews (1970-1989) overlaps with the enactment of Rules 702 and 703, and the institutionalization of such reviews (1990-2000) coincides with the development of these Rules in a way that introduced some methodological rigor into scientific opinions that are admitted into evidence.[1]

The convergence between legal admissibility and scientific validity considerations has had the further result that scientific concerns over the quality and sufficiency of underlying data, over the validity of study design, analysis, reporting, and interpretation, and over the adequacy and validity of data synthesis, interpretation, and conclusions have become integral to the gatekeeping process. This convergence has the welcome potential to keep legal judgments more in line with best scientific evidence and practice.

The science-law convergence also means that courts must be apprised of, and take seriously, the problems of study reproducibility, and more broadly, the problems raised by questionable research practices (QRPs), or what might be called the patho-epistemology of science. The development of the systematic review in the 1970s, and its subsequent evolution, represented the scientific community’s rejection of old-school narrative reviews that selected a few of all available studies to support a pre-existing conclusion. Similarly, the scientific community’s embarrassment, in the 1980s and 1990s, over the irreproducibility of study results has in this century grown into an existential crisis over study reproducibility in the biomedical sciences.

In 2005, John Ioannidis published an article that brought the concern over “reproducibility” of scientific findings in bio-medicine to an ebullient boil.[2] Ioannidis pointed to several factors that, alone or in combination, rendered most published medical findings likely false. Among the publication practices responsible for this unacceptably high error rate, Ioannidis identified small sample sizes, data-dredging and p-hacking techniques, and poor or inadequate statistical analysis, all in the context of undue flexibility in research design, conflicts of interest, motivated reasoning, fads and prejudices, and pressure to publish “positive” results. The results, often with small putative effect sizes and from an inadequate number of studies, are then hyped by lay and technical media, as well as by the public relations offices of universities and advocacy groups, only to be further misused by advocates, and further distorted to serve the goals of policy wonks. Social media then reduces all the nuances of a scientific study to an insipid meme.
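
The core of Ioannidis’s argument is arithmetic about positive predictive value. The sketch below is not taken from his paper; it is a hypothetical calculation, in which the prior probability that a tested hypothesis is true, the study power, and the significance level are all assumed values, offered only to show how a low prior and low power conspire to make many “significant” findings false.

```python
# A hypothetical illustration of the arithmetic behind the "most findings are false"
# argument; the prior, power, and alpha values below are assumptions, not data.
def positive_predictive_value(prior, power, alpha):
    """Fraction of statistically significant results that reflect true effects."""
    true_positives = prior * power          # true hypotheses correctly detected
    false_positives = (1 - prior) * alpha   # false hypotheses "detected" by chance
    return true_positives / (true_positives + false_positives)

# If 1 in 10 tested hypotheses is true, with 80% power and a 5% significance level:
print(round(positive_predictive_value(prior=0.10, power=0.80, alpha=0.05), 2))  # 0.64
# With the low power (say, 20%) typical of small studies, barely a third hold up:
print(round(positive_predictive_value(prior=0.10, power=0.20, alpha=0.05), 2))  # 0.31
```

Ioannidis went further, modeling the effects of bias and of multiple teams chasing the same question, both of which push the predictive value lower still.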

Ioannidis’ critique resonated with lawyers. We who practice in health effects litigation are no strangers to dubious research methods, lack of accountability, herd-like behavior, and a culture of generating positive results, often out of political or economic sympathies. Although we must prepare to confront dodgy methods in front of a jury, asking for scientific due process that intervenes and decides the methodological issues with well-reasoned, written opinions in advance of trial does not seem like too much to ask.

The sense that we are awash in false-positive studies was heightened by subsequent papers. In 2011, Joseph Simmons, Leif Nelson, and Uri Simonsohn showed, by simulating various combinations of QRPs in psychological science, that researchers could attain a 61% false-positive rate for research outcomes.[3] The following year saw scientists at Amgen attempt replication of 53 important studies in hematology and oncology. They succeeded in replicating only six.[4] Also in 2012, Dr. Janet Woodcock, director of the Center for Drug Evaluation and Research at the Food and Drug Administration, “estimated that as much as 75 per cent of published biomarker associations are not replicable.”[5] In 2016, the journal Nature reported that over 70% of scientists who responded to a survey had unsuccessfully attempted to replicate another scientist’s experiments, and more than half had failed to replicate their own work.[6] Of the respondents, 90% agreed that there was a replication problem, and a majority of those believed that the problem was significant.
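
A small Monte Carlo sketch, in the spirit of (but not reproducing) the Simmons, Nelson, and Simonsohn demonstration, shows how modest-sounding flexibility inflates error rates. The two questionable practices simulated here (reporting either of two correlated outcomes, and adding subjects after an unsuccessful peek at the data), and all of the parameter values, are illustrative assumptions rather than anything drawn from their paper.

```python
# Illustrative simulation (assumed parameters) of two common QRPs applied to a
# truly null effect: reporting either of two correlated outcomes, and optional
# stopping (adding subjects after a first, non-significant look at the data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
COV = [[1.0, 0.5], [0.5, 1.0]]  # two outcomes per subject, correlated at about 0.5

def any_significant(treat, ctrl, alpha=0.05):
    """Declare 'success' if either outcome differs significantly between groups."""
    p1 = stats.ttest_ind(treat[:, 0], ctrl[:, 0]).pvalue
    p2 = stats.ttest_ind(treat[:, 1], ctrl[:, 1]).pvalue
    return min(p1, p2) < alpha

def one_flexible_study(n_initial=20, n_added=10):
    """One simulated 'study' of a null effect, analyzed with researcher flexibility."""
    treat = rng.multivariate_normal([0, 0], COV, size=n_initial)
    ctrl = rng.multivariate_normal([0, 0], COV, size=n_initial)
    if any_significant(treat, ctrl):
        return True
    # the first look "didn't work," so add more subjects and test again
    treat = np.vstack([treat, rng.multivariate_normal([0, 0], COV, size=n_added)])
    ctrl = np.vstack([ctrl, rng.multivariate_normal([0, 0], COV, size=n_added)])
    return any_significant(treat, ctrl)

false_positive_rate = np.mean([one_flexible_study() for _ in range(5000)])
print(false_positive_rate)  # substantially above the nominal 0.05
```

Add a few more degrees of freedom, such as optional covariates and dropped conditions, and the simulated rate climbs toward the figures Simmons and his colleagues reported.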

The scientific community reacted to the perceived replication crisis in a variety of ways, from conceptual clarification of the very notion of reproducibility,[7] to identification of improper uses and interpretations of key statistical concepts,[8] to guidelines for improved conduct and reporting of studies.[9]

Entire books dedicated to identifying the sources of, and the correctives for, undue researcher flexibility in the design, conduct, and analysis of studies, have been published.[10] In some ways, the Rule 702 and 703 case law is like the collected works of the Berenstain Bears, on how not to do studies.

The consequences of the replication crisis are real and serious. Badly conducted and interpreted science leads to research wastage,[11] loss of confidence in scientific expertise,[12] contemptible legal judgments, and distortion of public policy.

The proposed correctives to QRPs deserve the careful study of lawyers and judges who have a role in health effects litigation.[13] Whether as the proponent of an expert witness or as the challenger, several of the recurrent proposals, such as the call for greater data sharing and for pre-registration of protocols and statistical analysis plans,[14] have real-world litigation salience. In many instances, they can and should direct lawyers’ efforts at discovering and challenging the scientific studies relied upon in litigation.


[1] Quan Nha Hong & Pierre Pluye, “Systematic Reviews: A Brief Historical Overview,” 34 Education for Information 261 (2018); Mike Clarke & Iain Chalmers, “Reflections on the history of systematic reviews,” 23 BMJ Evidence-Based Medicine 122 (2018); Cynthia Farquhar & Jane Marjoribanks, “A short history of systematic reviews,” 126 Brit. J. Obstetrics & Gynaecology 961 (2019); Edward Purssell & Niall McCrae, “A Brief History of the Systematic Review,” chap. 2, in Edward Purssell & Niall McCrae, How to Perform a Systematic Literature Review: A Guide for Healthcare Researchers, Practitioners and Students 5 (2020).

[2] John P. A. Ioannidis, “Why Most Published Research Findings Are False,” 2 PLoS Med e124 (2005).

[3] Joseph P. Simmons, Leif D. Nelson, and Uri Simonsohn, “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant,” 22 Psychological Sci. 1359 (2011).

[4] C. Glenn Begley and Lee M. Ellis, “Drug development: Raise standards for preclinical cancer research,” 483 Nature 531 (2012).

[5] Edward R. Dougherty, “Biomarker Development: Prudence, risk, and reproducibility,” 34 Bioessays 277, 279 (2012); Turna Ray, “FDA’s Woodcock says personalized drug development entering ‘long slog’ phase,” Pharmacogenomics Reporter (Oct. 26, 2011).

[6] Monya Baker, “Is there a reproducibility crisis,” 533 Nature 452 (2016).

[7] Steven N. Goodman, Daniele Fanelli, and John P. A. Ioannidis, “What does research reproducibility mean?,” 8 Science Translational Medicine 341 (2016); Felipe Romero, “Philosophy of science and the replicability crisis,” 14 Philosophy Compass e12633 (2019); Fiona Fidler & John Wilcox, “Reproducibility of Scientific Results,” Stanford Encyclopedia of Philosophy (2018), available at https://plato.stanford.edu/entries/scientific-reproducibility/.

[8] Andrew Gelman and Eric Loken, “The Statistical Crisis in Science,” 102 Am. Scientist 460 (2014); Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” 70 The Am. Statistician 129 (2016); Yoav Benjamini, Richard D. DeVeaux, Bradley Efron, Scott Evans, Mark Glickman, Barry Graubard, Xuming He, Xiao-Li Meng, Nancy Reid, Stephen M. Stigler, Stephen B. Vardeman, Christopher K. Wikle, Tommy Wright, Linda J. Young, and Karen Kafadar, “The ASA President’s Task Force Statement on Statistical Significance and Replicability,” 15 Annals of Applied Statistics 1084 (2021).

[9] The International Society for Pharmacoepidemiology issued its first Guidelines for Good Pharmacoepidemiology Practices in 1996. The most recent revision, the third, was issued in June 2015. See “The ISPE Guidelines for Good Pharmacoepidemiology Practices (GPP),” available at https://www.pharmacoepi.org/resources/policies/guidelines-08027/. See also Erik von Elm, Douglas G. Altman, Matthias Egger, Stuart J. Pocock, Peter C. Gøtzsche, and Jan P. Vandenbroucke, for the STROBE Initiative, “The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement Guidelines for Reporting Observational Studies,” 18 Epidem. 800 (2007); Jan P. Vandenbroucke, Erik von Elm, Douglas G. Altman, Peter C. Gøtzsche, Cynthia D. Mulrow, Stuart J. Pocock, Charles Poole, James J. Schlesselman, and Matthias Egger, for the STROBE initiative, “Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): Explanation and Elaboration,” 147 Ann. Intern. Med. W-163 (2007); Shah Ebrahim & Mike Clarke, “STROBE: new standards for reporting observational epidemiology, a chance to improve,” 36 Internat’l J. Epidem. 946 (2007); Matthias Egger, Douglas G. Altman, and Jan P Vandenbroucke of the STROBE group, “Commentary: Strengthening the reporting of observational epidemiology—the STROBE statement,” 36 Internat’l J. Epidem. 948 (2007).

[10] See, e.g., Lee J. Jussim, Jon A. Krosnick, and Sean T. Stevens, eds., Research Integrity: Best Practices for the Social and Behavioral Sciences (2022); Joel Faintuch & Salomão Faintuch, eds., Integrity of Scientific Research: Fraud, Misconduct and Fake News in the Academic, Medical and Social Environment (2022); William O’Donohue, Akihiko Masuda & Scott Lilienfeld, eds., Avoiding Questionable Research Practices in Applied Psychology (2022); Klaas Sijtsma, Never Waste a Good Crisis: Lessons Learned from Data Fraud and Questionable Research Practices (2023).

[11] See, e.g., Iain Chalmers, Michael B Bracken, Ben Djulbegovic, Silvio Garattini, Jonathan Grant, A Metin Gülmezoglu, David W Howells, John P A Ioannidis, and Sandy Oliver, “How to increase value and reduce waste when research priorities are set,” 383 Lancet 156 (2014); John P A Ioannidis, Sander Greenland, Mark A Hlatky, Muin J Khoury, Malcolm R Macleod, David Moher, Kenneth F Schulz, and Robert Tibshirani, “Increasing value and reducing waste in research design, conduct, and analysis,” 383 Lancet 166 (2014).

[12] See, e.g., Friederike Hendriks, Dorothe Kienhues, and Rainer Bromme, “Replication crisis = trust crisis? The effect of successful vs failed replications on laypeople’s trust in researchers and research,” 29 Public Understanding Sci. 270 (2020).

[13] R. Barker Bausell, The Problem with Science: The Reproducibility Crisis and What to Do About It (2021).

[14] See, e.g., Brian A. Nosek, Charles R. Ebersole, Alexander C. DeHaven, and David T. Mellor, “The preregistration revolution,” 115 Proc. Nat’l Acad. Sci. 2600 (2018); Michael B. Bracken, “Preregistration of Epidemiology Protocols: A Commentary in Support,” 22 Epidemiology 135 (2011); Timothy L. Lash & Jan P. Vandenbroucke, “Should Preregistration of Epidemiologic Study Protocols Become Compulsory? Reflections and a Counterproposal,” 23 Epidemiology 184 (2012).

The Proper Study of Mankind

December 24th, 2023

“Know then thyself, presume not God to scan;

The proper study of Mankind is Man.”[1]

 

Kristen Ranges recently earned her law degree from the University of Miami School of Law, and her doctorate in Environmental Science and Policy, from the University of Miami Rosenstiel School of Marine, Atmospheric, and Earth Science. Ranges’ dissertation was titled Animals Aiding Justice: The Deepwater Horizon Oil Spill and Ensuing Neurobehavioral Impacts as a Case Study for Using Animal Models in Toxic Tort Litigation – A Dissertation.[2] At first blush, Ranges would seem to be a credible interlocutor in the never-ending dispute over the role of whole animal toxicology (and in vitro toxicology) in determining human causation in tort litigation. Her dissertation title is, however, as Martin Short would say, a bit of a tell. Zebrafish become sad when exposed to oil spills, as do we all.

Ranges recently published a spin-off of her dissertation as a law review article with one of her professors: “Vermin of Proof: Arguments for the Admissibility of Animal Model Studies as Proof of Causation in Toxic Tort Litigation.”[3] Arguments for; no arguments against. We can thus understand this to be an advocacy piece, which is fair enough. The paper was not designed or titled to mislead anyone into thinking it would be a consideration of arguments both for and against extrapolation from (non-human) animal studies to human beings. Perhaps you will think it churlish of me to point out that animal studies will rarely be admissible as evidence. They come into consideration in legal cases only through expert witnesses’ reliance upon them. So the issue is not whether animal studies are admissible, but rather whether expert witness opinion testimony that relies solely or excessively upon animal studies for purposes of inferring causation is admissible under the relevant evidentiary rules. Talking about the admissibility of animal model studies signals, if nothing else, a serious lack of familiarity with the relevant evidentiary rules.

Ranges’ law review article is clearly, and without subtlety, an advocacy piece. She argues:

“However, judges, scholars, and other legal professionals are skeptical of the use of animal studies because of scientific and legal concerns, which range from interspecies disparities to prejudice of juries. These concerns are either unfounded or exaggerated. Animal model studies can be both reliable and relevant in toxic tort cases. Given the Federal Rules of Evidence, case law relevant to scientific evidence, and one of the goals of tort law – justice – judges should more readily admit these types of studies as evidence to help plaintiffs meet the burden of proof in toxic tort litigation.”[4]

For those of you who labor in this vineyard, I would suggest you read Ranges’ article and judge for yourself. What I see is a serious lack of scientific evidence for her claims, and a serious misunderstanding of the relevant law. One might, for starters, putting aside the Agency’s epistemic dilution, ask whether there are any I.A.R.C. category I (“known”) carcinogens based solely upon animal evidence. Or has the U.S. Food & Drug Administration ever approved a medication as reasonably safe and effective based upon only animal studies?

Every dog owner and lover has likely been told by a veterinarian, or the Humane Society, that we should resist their lupine entreaties and withhold chocolate, raisins, walnuts, avocados, and certain other human foods. Despite their obvious intelligence and capacity for affection, when it comes to toxicology, dogs are not people, although some people act like the less reputable varieties of dogs.

Back in 1985, in connection with the Agent Orange litigation, the late Judge Jack Weinstein wrote what was correct then, and even more so today, that “laboratory animal studies are generally viewed with more suspicion than epidemiological studies, because they require making the assumption that chemicals behave similarly in different species.”[5] Judge Weinstein was no push-over for strident defense counsel or expert witnesses, but the legal consequences were nonetheless obvious to him when he looked carefully at the animal studies that plaintiffs’ expert witnesses claimed supported their opinions: “[A]nimal studies are of so little probative value as to be inadmissible. They cannot be a predicate for an opinion under Rule 703.”[6] One of the several disconnects between the plaintiffs’ expert witnesses’ animal studies and the human diseases claimed was the disparity in dose and duration between the relied-upon studies and the servicemen claimants’ exposures. Judge Weinstein observed that when the hand waving stopped, “[t]here is no evidence that plaintiffs were exposed to the far higher concentrations involved in both animal and industrial exposure studies.”[7]

Ranges and Owley unfairly deprecate the Supreme Court’s treatment of animal evidence in the 1997 Joiner opinion.[8] Mr. Joiner had been an electrician employed by a small city in Georgia, where he experienced dermal exposure, over several years, to polychlorinated biphenyls (PCBs), chemicals found in electrical transformer coolant. He alleged that he had developed small-cell lung cancer from his occasional occupational exposure. In the district court, a careful judge excluded the plaintiffs’ expert witnesses, who relied heavily upon animal studies and who cherry picked and distorted the available epidemiology.[9] The Court of Appeals reversed, in an unsigned, non-substantive opinion that interjected an asymmetric standard of review.[10]

After granting review, the Supreme Court engaged with the substantive validity issues passed over by the intermediate appellate court. In addressing the plaintiff’s expert witnesses’ reliance upon animal studies, the Court was struck by an extrapolation from a different species, different route of administration, different dose, different duration of exposure, and different disease.[11] Joiner was an adult human whose alleged exposure to PCBs was far less than the exposure in the baby mice that received injections of PCBs in a high concentration. The mice developed alveologenic adenomas, a rare tumor that is usually benign, not malignant.[12] The Joiner Court recognized that these multiple extrapolations were a bridge to nowhere, reversed the Court of Appeals, and reinstated the judgment of the district court. What is particularly salient about the Joiner decision, and about which you will find no discussion in the law review paper by Ranges and Owley, is how well the Joiner opinion has held up over the quarter of a century that has passed. Today, in the waning moments of 2023, there is still no valid, scientifically sound support for the claim that the sort of exposure Mr. Joiner had can cause small-cell lung cancer.[13]

Perhaps the most egregious lapses in scholarship occur when Ranges, a newly minted scientist, and her co-author, a full professor of law, write:

“For example, Bendectin, an antinausea medication prescribed to pregnant women, caused a slew of birth defects (hence its nickname ‘The Second Thalidomide’).49”[14]

I had to re-read this sentence many times to make sure I was not hallucinating. Ranges’ and Owley’s statement is, of course, demonstrably false. A double whopper, at least, and a jarring deviation from the standard of scholarly care.

But their statement is footnoted, you say. Here is what the cited article, footnote 40 in “Vermin of Proof,” says:

“RESULTS: The temporal trends in prevalence rates for specific birth defects examined from 1970 through 1992 did not show changes that reflected the cessation of Bendectin use over the 1980–84 period. Further, the NVP hospitalization rate doubled when Bendectin use ceased.

CONCLUSIONS: The population results of the ecological analyses complement the person-specific results of the epidemiological analyses in finding no evidence of a teratogenic effect from the use of Bendectin.”[15]

So the cited source actually says the exact opposite of what the authors assert. Apparently, students on law review at Georgetown University Law Center do not check citations for accuracy. Not only was the statement wrong in 1993, when the Supreme Court decided the famous Daubert case, it was wrong 20 years later, in 2013, when the United States Food and Drug Administration (FDA) approved  Diclegis, a combination of doxylamine succinate and pyridoxine hydrochloride, the essential ingredients in Bendectin, for sale in the United States, for pregnant women experiencing nausea and vomiting.[16] The return of Bendectin to the market, although under a different name, was nothing less than a triumph of science over the will of the lawsuit industry.[17] 

Channeling the likes of plaintiffs’ expert witness Carl Cranor (whom they cite liberally and credulously), Ranges and Owley argue for a vague “weight of the evidence” (WOE) methodology, in which several inconclusive and lighter-than-air pieces of evidence somehow magically combine, as in cold fusion, to warrant a conclusion of causation. Others have gone down this dubious path before, but these authors’ embrace of the plaintiffs’ expert witnesses’ opinions in the Bendectin litigation reveals the insubstantiality and the invalidity of their method.[18] As Professor Ronald Allen put the matter:

“Given the weight of evidence in favor of Bendectin’s safety, it seems peculiar to argue for mosaic evidence [WOE] from a case in which it would have plainly been misleading.”[19]

It surely seems like a reductio ad absurdum of the proposed methodology.

One thing these authors get right is that most courts disparage and exclude expert witness opinion that relies exclusively or excessively upon animal toxicology.[20] They wrongly chastise these courts, however, for ignoring scientific opinion. In 2005, the Teratology Society issued a position paper on causation in teratology-related litigation,[21] in which the Society specifically addressed the authors’ claims:

“6. Human data are required for conclusions that there is a causal relationship between an exposure and an outcome in humans. Experimental animal data are commonly and appropriately used in establishing regulatory exposure limits and are useful in addressing biologic plausibility and mechanism questions, but are not by themselves sufficient to establish causation in a lawsuit. In vitro data may be helpful in exploring mechanisms of toxicity but are not by themselves evidence of causation.”[22]

Ranges and Owley are flummoxed that courts exclude expert witnesses who have relied upon animal studies when regulatory agencies use such studies with abandon. The case law on the distinction between precautionary standards in regulation and causation standards in tort law is clear, and it explains the difference in approach, but these authors are determined to ignore the obvious difference.[23] The Teratology Society emphasized what should be hornbook law; namely, that regulatory standards for testing and warnings are not particularly germane to tort law standards for causation:

“2. The determination of causation in a lawsuit is not the same as a regulatory determination of a protective level of exposure. If a government agency has determined a regulatory exposure level for a chemical, the existence of that level is not evidence that the chemical produces toxicity in humans at that level or any other level. Regulatory levels use default assumptions that are improper in lawsuits. One such assumption is that humans will be as sensitive to the toxicity of a chemical as is the most sensitive experimental animal species. This assumption may be very useful in regulation but is not evidence that exposure to that chemical caused an adverse outcome in an individual plaintiff. Regulatory levels often incorporate uncertainty factors or margins of exposure. These factors may result in a regulatory level much lower than an exposure level shown to be harmful in any organism and are an additional reason for the lack of utility of regulatory levels in causation considerations.”[24]

The suggestion from Ranges and Owley that the judicial treatment of reliance upon animal studies is based upon ossified, ancient precedent, prejudice, and uncritical acceptance of defense counsel’s unsupported arguments is simply wrong. There are numerous discussions of the difficulty of extrapolating teratogenicity from animal data to humans,[25] and ample basis for criticism of the glib extension of rodent carcinogenicity findings to humans.[26]

Ranges and Owley ignore the extensive scientific literature questioning extrapolation from high-exposure rodent models to much lower exposures in humans.[27] The invalidity of extrapolation can result in both false positives and false negatives. Indeed, the thalidomide case is a compelling example of the failure of animal testing. Thalidomide was tested on pregnant rats and rabbits without detecting teratogenicity; indeed, most animal species do not metabolize thalidomide or exhibit teratogenicity as seen in humans. Animal models simply do not have a sufficient positive predictive value to justify a conclusion of causation in humans, even if we accept a precautionary-principle recognition of such animal testing for regulatory purposes.[28]

As improvident as Ranges’ pronouncements may be, finding her message amplified by Professor Ed Cheng on his podcast series, Excited Utterance, was even more disturbing. In November 2023, Cheng interviewed Kristen Ranges in an episode of his podcast, “Vermin of Proof,” in which he gave Ranges a chance to reprise her complaints about the judiciary’s handling of animal evidence, without much in the way of specificity, and with some credulous cheerleading to aid and abet. In his epilogue, Cheng wondered why toxicologic evidence is disfavored when such evidence is routinely used by scientists and regulators. What Cheng misses is that regulators use toxicologic evidence for regulation, not for assessments of human causation, and that the two enterprises are quite different. The regulatory exercise goes something like asking about the stall speed of a pig. It does not matter that pigs cannot fly; we skip that fact and press on to ask what the pig’s takeoff and stall speeds are.

Seventy years ago, no less a figure than Sir Austin Bradford Hill observed:

“We may subject mice, or other laboratory animals, to such an atmosphere of tobacco smoke that they can — like the old man in the fairy story — neither sleep nor slumber; they can neither breed nor eat. And lung cancers may or may not develop to a significant degree. What then? We may have thus strengthened the evidence, we may even have narrowed the search, but we must, I believe, invariably return to man for the final proof or proofs.”[29]


[1] Alexander Pope, “An Essay on Man” (1733), in Robin Sowerby, ed., Alexander Pope: Selected Poetry and Prose at 153 (1988).

[2] Kristen Ranges, Animals Aiding Justice: The Deepwater Horizon Oil Spill and Ensuing Neurobehavioral Impacts as a Case Study for Using Animal Models in Toxic Tort Litigation – A Dissertation (2023).

[3] Kristen Ranges & Jessica Owley, “Vermin of Proof: Arguments for the Admissibility of Animal Model Studies as Proof of Causation in Toxic Tort Litigation,” 34 Georgetown Envt’l L. Rev. 303 (2022) [Vermin]

[4] Vermin at 303.

[5] In re Agent Orange Prod. Liab. Litig., 611 F. Supp. 1223, 1241 (E.D.N.Y. 1985), aff’d, 818 F.2d 187 (2d Cir. 1987), cert. denied, 487 U.S. 1234 (1988).

[6] Id.

[7] Id.

[8] General Elec. Co. v. Joiner, 522 U.S. 136, 144 (1997) [Joiner]

[9] Joiner v. General Electric Co., 864 F. Supp. 1310 (N.D. Ga. 1994).

[10] Joiner v. General Electric Co., 134 F.3d 1457 (11th Cir. 1998) (per curiam). 

[11] Joiner, 522 U.S. at 144-45.

[12] See Leonid Roshkovan, Jeffrey C. Thompson, Sharyn I. Katz, Charuhas Deshpande, Taylor Jenkins, Anna K. Nowak, Rosyln Francis, Carole Dennie, Dominique Fabre, Sunil Singhal, and Maya Galperin-Aizenberg, “Alveolar adenoma of the lung: multidisciplinary case discussion and review of the literature,” 12 J. Thoracic Dis. 6847 (2020).

[13] See “How Have Important Rule 702 Holdings Held Up With Time?” (Mar. 20, 2015); “The Joiner Finale” (Mar. 23, 2015).

[14] Vermin at 312.

[15] Jeffrey S. Kutcher, Arnold Engle, Jacqueline Firth & Steven H. Lamm, “Bendectin and Birth Defects II: Ecological Analyses,” 67 Birth Defects Research Part A: Clinical and Molecular Teratology 88, 88 (2003).

[16] See FDA News Release, “FDA approves Diclegis for pregnant women experiencing nausea and vomiting,” (April 8, 2013).

[17] See Gideon Koren, “The Return to the USA of the Doxylamine-Pyridoxine Delayed Release Combination (Diclegis®) for Morning Sickness — A New Morning for American Women,” 20 J. Popul. Ther. Clin. Pharmacol. e161 (2013).

[18] Michael D. Green, “Pessimism About Milward,” 3 Wake Forest J. Law & Policy 41, 62-63 (2013); Susan Haack, “Irreconcilable Differences? The Troubled Marriage of Science and Law,” 72 Law & Contemporary Problems 1, 17 (2009); Susan Haack, “Proving Causation: The Holism of Warrant and the Atomism of Daubert,” 4 J. Health & Biomedical Law 273, 274-78 (2008).

[19] Ronald J. Allen & Esfand Nafisi, “Daubert and its Discontents,” 76 Brooklyn L. Rev. 132, 148 (2010). 

[20] See In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., 26 F. Supp. 3d 466, 475 (E.D. Pa. 2014) (noting that “causation opinions based primarily upon in vitro and live animal studies are unreliable and do not meet the Daubert standard.”), aff’d, 858 F.3d 787 (3rd Cir. 2017); Chapman v. Procter & Gamble Distrib., LLC, 766 F.3d 1296, 1308 (11th Cir. 2014) (affirming exclusion of testimony based on “secondary methodologies,” including animal studies, which offer “insufficient proof of general causation.”); The Sugar Ass’n v. McNeil-PPC, Inc., 2008 WL 11338092, *3 (C.D. Calif. July 21, 2008) (finding that plaintiffs’ expert witnesses, including Dr. Abou-Donia, “failed to provide the requisite analytical support for the extrapolation of their Five Opinions from rats to humans”); In re Silicone Gel Breast Implants Prods. Liab. Litig., 318 F. Supp. 2d 879, 891 (C.D. Cal. 2004) (observing that failure to compare similarities and differences across animals and humans could lead to the exclusion of opinion evidence); Cagle v. The Cooper Companies, 318 F. Supp. 2d 879, 891 (C.D. Calif. 2004) (citing Joiner for the observation that animal studies are not generally admissible when contrary epidemiologic studies are available; and detailing significant disadvantages in relying upon animal studies, such as (1) differences in absorption, distribution, and metabolism; (2) the unrealistic, non-physiological exposures used in animal studies; and (3) the use of unverified assumptions about dose-response); Wills v. Amerada Hess Corp., No. 98 CIV. 7126(RPP), 2002 WL 140542, at *12 (S.D.N.Y. Jan. 31, 2002) (faulting expert’s reliance on animal studies because there was no evidence plaintiff had injected suspected carcinogen in same manner as studied animals, or at same dosage levels), aff’d, 379 F.3d 32 (2nd Cir. 2004) (Sotomayor, J.); Bourne v. E.I. du Pont de Nemours & Co., 189 F. Supp. 2d 482, 501 (S.D. W.Va. 2002) (benlate and birth defects), aff’d, 85 F. App’x 964 (4th Cir.), cert. denied, 543 U.S. 917 (2004); Magistrini v. One Hour Martinizing Dry Cleaning, 180 F. Supp. 2d 584, 593 (D.N.J. 2002) (noting that “[a]nimal bioassays are of limited use in determining whether a particular chemical causes a particular disease, or type of cancer, in humans.”); Soutiere v. BetzDearborn, Inc., No. 2:99-CV-299, 2002 WL 34381147, at *4 (D. Vt. July 24, 2002) (holding expert’s evidence inadmissible when “[a]t best there are animal studies that suggest a link between massive doses of [the substance in question] and the development of certain kinds of cancers, such that [the substance in question] is listed as a ‘suspected’ or ‘probable’ human carcinogen”); Glastetter v. Novartis Pharms. Corp., 252 F.3d 986, 991 (8th Cir. 2001); Hollander v. Sandoz Pharm. Corp., 95 F. Supp. 2d 1230, 1238 (W.D. Okla. 2000), aff’d, 289 F.3d 1193, 1209 (10th Cir. 2002) (rejecting the relevance of animal studies to causation arguments in the circumstances of the case); Allison v. McGhan Medical Corp., 184 F.3d 1300, 1313–14 (11th Cir. 1999); Raynor v. Merrell Pharms. Inc., 104 F.3d 1371, 1375-1377 (D.C. Cir. 1997) (observing that animal studies are unreliable, especially when “sound epidemiological studies produce opposite results from non-epidemiological ones, the rate of error of the latter is likely to be quite high”); Lust v. Merrell Dow Pharms., Inc., 89 F.3d 594, 598 (9th Cir. 1996); Barrett v. Atlantic Richfield Co., 95 F.3d 375 (5th Cir. 1996) (extrapolation from a rat study was speculation); Nat’l Bank of Comm. v. Dow Chem. Co., 965 F. Supp. 1490, 1527 (E.D. Ark. 1996) (“because of the difference in animal species, the methods and routes of administration of the suspect chemical agent, maternal metabolisms and other factors, animal studies, taken alone, are unreliable predictors of causation in humans”), aff’d, 133 F.3d 1132 (8th Cir. 1998); Hall v. Baxter Healthcare Corp., 947 F. Supp. 1387, 1410-11 (D. Or. 1996) (with the help of court-appointed technical advisors, observing that animal studies taken alone fail to predict human disease reliably); Daubert v. Merrell Dow Pharms., Inc., 43 F.3d 1311, 1322 (9th Cir. 1995) (on remand from Supreme Court with directions to apply an epistemic standard derived from Rule 702 itself); Sorensen v. Shaklee Corp., 31 F.3d 638, 650 (8th Cir. 1994) (affirming exclusion of expert witness opinions based upon animal mutagenicity data not germane to the claimed harm); Elkins v. Richardson-Merrell, Inc., 8 F.3d 1068, 1073 (6th Cir. 1993); Wade-Greaux v. Whitehall Labs., Inc., 874 F. Supp. 1441, 1482 (D.V.I. 1994), aff’d, 46 F.3d 1120 (3d Cir. 1994) (per curiam); Renaud v. Martin Marietta Corp., Inc., 972 F.2d 304, 307 (10th Cir. 1992) (“The etiological evidence proffered by the plaintiff was not sufficiently reliable, being drawn from tests on non-human subjects without confirmatory epidemiological data.”) (“Dr. Jackson performed no calculations to determine whether the dose or route of administration of antidepressants to rats and monkeys in the papers that she cited in her report was equivalent to or substantially similar to human beings taking prescribed doses of Prozac.”); Bell v. Swift Adhesives, Inc., 804 F. Supp. 1577, 1579–81 (S.D. Ga. 1992) (excluding expert opinion of Dr. Janette Sherman, who opined that methylene chloride caused liver cancer, based largely upon animal studies); Conde v. Velsicol Chem. Corp., 804 F. Supp. 972, 1025-26 (S.D. Ohio 1992) (noting that epidemiology is “the primary generally accepted methodology for demonstrating a causal relation between a chemical compound and a set of symptoms or a disease”), aff’d, 24 F.3d 809 (6th Cir. 1994); Turpin v. Merrell Dow Pharm., Inc., 959 F.2d 1349, 1360-61 (6th Cir. 1992) (“The analytical gap between the [animal study] evidence presented and the inferences to be drawn on the ultimate issue of human birth defects is too wide. Under such circumstances, a jury should not be asked to speculate on the issue of causation.”); Brock v. Merrell Dow Pharm., 874 F.2d 307, 313 (5th Cir. 1989) (noting the “very limited usefulness of animal studies when confronted with questions of toxicity”); Richardson v. Richardson-Merrell, Inc., 857 F.2d 823, 830 (D.C. Cir. 1988) (“Positive results from in vitro studies may provide a clue signaling the need for further research, but alone do not provide a satisfactory basis for opining about causation in the human context.”); Lynch v. Merrell-Nat’l Labs., 830 F.2d 1190, 1194 (1st Cir. 1987) (“Studies of this sort [animal studies], singly or in combination, do not have the capability of proving causation in human beings in the absence of any confirmatory epidemiological data.”). See also Merrell Dow Pharms., Inc. v. Havner, 953 S.W.2d 706, 730 (Tex. 1997); DePyper v. Navarro, No. 83-303467-NM, 1995 WL 788828, at *34 (Mich. Cir. Ct. Nov. 27, 1995), aff’d, No. 191949, 1998 WL 1988927 (Mich. Ct. App. Nov. 6, 1998); Nelson v. American Sterilizer Co., 566 N.W.2d 671 (Mich. Ct. App. 1997) (high-dose animal studies not reliable). But see Ambrosini v. Labarraque, 101 F.3d 129, 137-140 (D.C. Cir. 1996); Dyson v. Winfield, 113 F. Supp. 2d 44, 50-51 (D.D.C. 2000).

[21] Teratology Society Public Affairs Committee, “Position Paper: Causation in Teratology-Related Litigation,” 73 Birth Defects Research (Part A) 421 (2005) [Teratology Position Paper]

[22] Id. at 423.

[23] See “Improper Reliance Upon Regulatory Risk Assessments in Civil Litigation” (Mar. 19, 2023) (collecting cases).

[24] Teratology Position Paper at 422-423.

[25] See, e.g., Gideon Koren, Anne Pastuszak & Shinya Ito, “Drugs in Pregnancy,” 338 New England J. Med. 1128, 1131 (1998); Louis Lasagna, “Predicting Human Drug Safety from Animal Studies: Current Issues,” 12 J. Toxicological Sci. 439, 442-43 (1987).

[26] Bruce N. Ames & Lois S. Gold, “Too Many Rodent Carcinogens: Mitogenesis Increases Mutagenesis,” 249 Science 970, 970 (1990) (noting that chronic irritation induced by many chemicals at high exposures is itself a cause of cancer in rodent models); Bruce N. Ames & Lois Swirsky Gold, “Environmental Pollution and Cancer: Some Misconceptions,” in Jay H. Lehr, ed., Rational Readings on Environmental Concerns 151, 153 (1992); Mary Eubanks, “The Danger of Extrapolation: Humans and Rodents Differ in Response to PCBs,” 112 Envt’l Health Persps. A113 (2004).

[27] Andrea Gawrylewski, “The Trouble with Animal Models: Why did human trials fail?” 21 The Scientist 44 (2007); Michael B. Bracken, “Why animal studies are often poor predictors of human reactions to exposure,” 101 J. Roy. Soc. Med. 120 (2008); Fiona Godlee, “How predictive and productive is animal research?” 348 Brit. Med. J. g3719 (2014); John P. A. Ioannidis, “Extrapolating from Animals to Humans,” 4 Science Translational Med. 15 (2012); Pandora Pound & Michael Bracken, “Is animal research sufficiently evidence based to be a cornerstone of biomedical research?” 348 Brit. Med. J. g3387 (2014); Pandora Pound, Shah Ebrahim, Peter Sandercock, Michael B. Bracken, and Ian Roberts, “Where is the evidence that animal research benefits humans?” 328 Brit. Med. J. 514 (2004) (writing on behalf of the Reviewing Animal Trials Systematically (RATS) Group).

[28] See Ray Greek, Niall Shanks, and Mark J. Rice, “The History and Implications of Testing Thalidomide on Animals,” 11 J. Philosophy, Sci. & Law 1, 19 (2011).

[29] Austin Bradford Hill, “Observation and Experiment,” 248 New Engl. J. Med. 995, 999 (1953).

The Role of Peer Review in Rule 702 and 703 Gatekeeping

November 19th, 2023

“There is no expedient to which man will not resort to avoid the real labor of thinking.”
              Sir Joshua Reynolds (1723-92)

Some courts appear to duck the real labor of thinking, and the duty to gatekeep expert witness opinions, by deferring to expert witnesses who advert to their reliance upon peer-reviewed published studies. Does the law really support such deference, especially when problems with the relied-upon studies are revealed in discovery? A careful reading of the Supreme Court’s decision in Daubert, and of the Reference Manual on Scientific Evidence, provides no support for admitting expert witness opinion testimony that relies upon peer-reviewed published studies when the studies are invalid or are based upon questionable research practices.[1]

In Daubert v. Merrell Dow Pharmaceuticals, Inc.,[2] the Supreme Court suggested that peer review of studies relied upon by a challenged expert witness should be a factor in determining the admissibility of that expert witness’s opinion. In thinking about the role of peer-review publication in expert witness gatekeeping, it is helpful to remember the context of how and why the Supreme Court was talking about peer review in the first place. In the trial court, the Daubert plaintiff had proffered an expert witness opinion that featured reliance upon an unpublished reanalysis of published studies. On the defense motion, the trial court excluded the claimant’s witness,[3] and the Ninth Circuit affirmed.[4] The intermediate appellate court expressed its view that unpublished, non-peer-reviewed reanalyses were deviations from generally accepted scientific discourse, and that other appellate courts, considering the alleged risks of Bendectin, had refused to admit opinions based upon unpublished, non-peer-reviewed reanalyses of epidemiologic studies.[5] The Circuit expressed its view that reanalyses are generally accepted by scientists when they have been verified and scrutinized by others in the field. Unpublished reanalyses done solely for litigation would be an insufficient foundation for expert witness opinion.[6]

The Supreme Court, in Daubert, evaded the difficult issues involved in evaluating a statistical analysis that has not been published by deciding the case on the ground that the lower courts had applied the wrong standard. The so-called Frye test, or what I call the “twilight zone” test, comes from the heralded 1923 case excluding opinion testimony based upon a lie detector:

“Just when a scientific principle or discovery crosses the line between the experimental and demonstrable stages is difficult to define. Somewhere in this twilight zone the evidential force of the principle must be recognized, and while the courts will go a long way in admitting expert testimony deduced from a well recognized scientific principle or discovery, the thing from which the deduction is made must be sufficiently established to have gained general acceptance in the particular field in which it belongs.”[7]

The Supreme Court, in Daubert, held that with the promulgation of the Federal Rules of Evidence in 1975, the twilight zone test was no longer legally valid. The guidance for admitting expert witness opinion testimony lay in Federal Rule of Evidence 702, which outlined an epistemic test for “knowledge” that would be helpful to the trier of fact. The Court then proceeded to articulate several non-definitive factors of “good science,” which might guide trial courts in applying Rule 702, such as testability or falsifiability and a showing of a known or potential error rate. General acceptance carried over from Frye as another consideration.[8] Courts have continued to build on this foundation to identify other relevant considerations in gatekeeping.[9]

One of the Daubert Court’s pertinent considerations was “whether the theory or technique has been subjected to peer review and publication.”[10] The Court, speaking through Justice Blackmun, provided a reasonably cogent, but probably now outdated, discussion of peer review:

 “Publication (which is but one element of peer review) is not a sine qua non of admissibility; it does not necessarily correlate with reliability, see S. Jasanoff, The Fifth Branch: Science Advisors as Policymakers 61-76 (1990), and in some instances well-grounded but innovative theories will not have been published, see Horrobin, “The Philosophical Basis of Peer Review and the Suppression of Innovation,” 263 JAMA 1438 (1990). Some propositions, moreover, are too particular, too new, or of too limited interest to be published. But submission to the scrutiny of the scientific community is a component of “good science,” in part because it increases the likelihood that substantive flaws in methodology will be detected. See J. Ziman, Reliable Knowledge: An Exploration of the Grounds for Belief in Science 130-133 (1978); Relman & Angell, “How Good Is Peer Review?,” 321 New Eng. J. Med. 827 (1989). The fact of publication (or lack thereof) in a peer reviewed journal thus will be a relevant, though not dispositive, consideration in assessing the scientific validity of a particular technique or methodology on which an opinion is premised.”[11]

To the extent that peer review was touted by Justice Blackmun, it was because the peer-review process advanced the ultimate consideration of the scientific validity of the opinion or claim under consideration. Validity was the thing; peer review was just a crude proxy.

If the Court were writing today, it might well have written that peer review is often a feature of bad science, advanced by scientists who know that peer-reviewed publication is the price of admission to the advocacy arena. And of course, the wild proliferation of journals, including the “pay-to-play” journals, facilitates the festschrift.

Reference Manual on Scientific Evidence

Certainly, judicial thinking has evolved since 1993 and the decision in Daubert. Other considerations for gatekeeping have been added. Importantly, Daubert involved the interpretation of a statute, and in 2000, the statute was amended.

Since the Daubert decision, the Federal Judicial Center and the National Academies of Sciences have weighed in with what is intended to be guidance for judges and lawyers litigating scientific and technical issues. The Reference Manual on Scientific Evidence is currently in a third edition, but a fourth edition is expected in 2024.

How does the third edition[12] treat peer review?

An introduction by the now-retired Associate Justice Stephen Breyer blandly reports the Daubert considerations, without elaboration.[13]

The most revealing and important chapter in the Reference Manual is the one on scientific method and procedure, and sociology of science, “How Science Works,” by Professor David Goodstein.[14] This chapter’s treatment is not always consistent. In places, the discussion of peer review is trenchant. At other places, it can be misleading. Goodstein’s treatment, at first, appears to be a glib endorsement of peer review as a substitute for critical thinking about a relied-upon published study:

“In the competition among ideas, the institution of peer review plays a central role. Scientific articles submitted for publication and proposals for funding often are sent to anonymous experts in the field, in other words, to peers of the author, for review. Peer review works superbly to separate valid science from nonsense, or, in Kuhnian terms, to ensure that the current paradigm has been respected.11 It works less well as a means of choosing between competing valid ideas, in part because the peer doing the reviewing is often a competitor for the same resources (space in prestigious journals, funds from government agencies or private foundations) being sought by the authors. It works very poorly in catching cheating or fraud, because all scientists are socialized to believe that even their toughest competitor is rigorously honest in the reporting of scientific results, which makes it easy for a purposefully dishonest scientist to fool a referee. Despite all of this, peer review is one of the venerated pillars of the scientific edifice.”[15]

A more nuanced and critical view emerges in footnote 11, from the above-quoted passage, when Goodstein discusses how peer review was framed by some amici curiae in the Daubert case:

“The Supreme Court received differing views regarding the proper role of peer review. Compare Brief for Amici Curiae Daryl E. Chubin et al. at 10, Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579 (1993) (No. 92-102) (“peer review referees and editors limit their assessment of submitted articles to such matters as style, plausibility, and defensibility; they do not duplicate experiments from scratch or plow through reams of computer-generated data in order to guarantee accuracy or veracity or certainty”), with Brief for Amici Curiae New England Journal of Medicine, Journal of the American Medical Association, and Annals of Internal Medicine in Support of Respondent, Daubert v. Merrell Dow Pharm., Inc., 509 U.S. 579 (1993) (No. 92-102) (proposing that publication in a peer-reviewed journal be the primary criterion for admitting scientific evidence in the courtroom). See generally Daryl E. Chubin & Edward J. Hackett, Peerless Science: Peer Review and U.S. Science Policy (1990); Arnold S. Relman & Marcia Angell, How Good Is Peer Review? 321 New Eng. J. Med. 827–29 (1989). As a practicing scientist and frequent peer reviewer, I can testify that Chubin’s view is correct.”[16]

So, if, as Professor Goodstein attests, Chubin is correct that peer review does not “guarantee accuracy or veracity or certainty,” the basis for veneration is difficult to fathom.

Later in Goodstein’s chapter, in a section entitled “V. Some Myths and Facts about Science,” the gloves come off:[17]

Myth: The institution of peer review assures that all published papers are sound and dependable.

Fact: Peer review generally will catch something that is completely out of step with majority thinking at the time, but it is practically useless for catching outright fraud, and it is not very good at dealing with truly novel ideas. Peer review mostly assures that all papers follow the current paradigm (see comments on Kuhn, above). It certainly does not ensure that the work has been fully vetted in terms of the data analysis and the proper application of research methods.”[18]

Goodstein is not a post-modern nihilist. He acknowledges that “real” science can be distinguished from “not real science.” But he can hardly be seen to have given a full-throated endorsement of peer review as satisfying the gatekeeper’s obligation to evaluate whether a study can reasonably be relied upon, whether reliance upon a particular peer-reviewed study constitutes sufficient support to render an expert witness’s opinion helpful, or whether a reliable methodology has been reliably applied.

Goodstein cites, with apparent approval, the amicus brief filed by the New England Journal of Medicine and other journals, which advised the Supreme Court that “good science” requires “a rigorous trilogy of publication, replication and verification before it is relied upon.”[19]

“Peer review’s ‘role is to promote the publication of well-conceived articles so that the most important review, the consideration of the reported results by the scientific community, may occur after publication.’”[20]

Outside of Professor Goodstein’s chapter, the Reference Manual devotes very little ink or analysis to the role of peer review in assessing Rule 702 or 703 challenges to witness opinions or specific studies.  The engineering chapter acknowledges that “[t]he topic of peer review is often raised concerning scientific and technical literature,” and helpfully supports Goodstein’s observations by noting that peer review “does not ensure accuracy or validity.”[21]

The chapter on neuroscience is one of the few chapters in the Reference Manual, other than Professor Goodstein’s, to address the limitations of peer review. Peer review, if absent, is highly suspicious, but its presence is only the beginning of an evaluation process that continues after publication:

Daubert’s stress on the presence of peer review and publication corresponds nicely to scientists’ perceptions. If something is not published in a peer-reviewed journal, it scarcely counts. Scientists only begin to have confidence in findings after peers, both those involved in the editorial process and, more important, those who read the publication, have had a chance to dissect them and to search intensively for errors either in theory or in practice. It is crucial, however, to recognize that publication and peer review are not in themselves enough. The publications need to be compared carefully to the evidence that is proffered.[22]

The neuroscience chapter goes on to discuss peer review in the narrower context of functional magnetic resonance imaging (fMRI). The authors note that fMRI, as a medical procedure, has been the subject of thousands of peer-reviewed studies, but those peer reviews do little to validate the use of fMRI as a high-tech lie detector.[23] The mental health chapter notes in a brief footnote that the science of memory is now well accepted and has been subjected to peer review, and that “[c]areful evaluators” use only tests that have had their “reliability and validity confirmed in peer-reviewed publications.”[24]

Echoing other chapters, the engineering chapter also mentions peer review briefly in connection with qualifying as an expert witness, and in validating the value of accrediting societies.[25]  Finally, the chapter points out that engineering issues in litigation are often sufficiently novel that they have not been explored in peer-reviewed literature.[26]

Most of the other chapters of the Reference Manual, third edition, discuss peer review only in the context of qualifications and membership in professional societies.[27] The chapter on exposure science discusses peer review only in the narrow context of a claim that EPA guidance documents on exposure assessment are peer reviewed and are considered “authoritative.”[28]

Other chapters discuss peer review briefly and again only in very narrow contexts. For instance, the epidemiology chapter discusses peer review in connection with two very narrow issues peripheral to Rule 702 gatekeeping. First, the chapter raises the question (without providing a clear answer) whether non-peer-reviewed studies should be included in meta-analyses.[29] Second, the chapter asserts that “[c]ourts regularly affirm the legitimacy of employing differential diagnostic methodology,” to determine specific causation, on the basis of several factors, including the questionable claim that the methodology “has been subjected to peer review.”[30] There appears to be no discussion in this key chapter about whether, and to what extent, peer review of published studies can or should be considered in the gatekeeping of epidemiologic testimony. There is certainly nothing in the epidemiology chapter, or for that matter elsewhere in the Reference Manual, to suggest that reliance upon a peer-reviewed published study pretermits analysis of that study to determine whether it is indeed internally valid or reasonably relied upon by expert witnesses in the field.


[1] See Jop de Vrieze, “Large survey finds questionable research practices are common: Dutch study finds 8% of scientists have committed fraud,” 373 Science 265 (2021); Yu Xie, Kai Wang, and Yan Kong, “Prevalence of Research Misconduct and Questionable Research Practices: A Systematic Review and Meta-Analysis,” 27 Science & Engineering Ethics 41 (2021).

[2] 509 U.S. 579 (1993).

[3]  Daubert v. Merrell Dow Pharmaceuticals, Inc., 727 F.Supp. 570 (S.D.Cal.1989).

[4] 951 F. 2d 1128 (9th Cir. 1991).

[5]  951 F. 2d, at 1130-31.

[6] Id. at 1131.

[7] Frye v. United States, 293 F. 1013, 1014 (D.C. Cir. 1923) (emphasis added).

[8]  Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579, 590 (1993).

[9] See, e.g., In re TMI Litig. II, 911 F. Supp. 775, 787 (M.D. Pa. 1995) (considering the relationship of the technique to methods that have been established to be reliable, the uses of the method in the actual scientific world, the logical or internal consistency and coherence of the claim, the consistency of the claim or hypothesis with accepted theories, and the precision of the claimed hypothesis or theory).

[10] Id. at  593.

[11] Id. at 593-94.

[12] National Research Council, Reference Manual on Scientific Evidence (3rd ed. 2011) [RMSE]

[13] Id., “Introduction” at 1, 13

[14] David Goodstein, “How Science Works,” RMSE 37.

[15] Id. at 44-45.

[16] Id. at 44-45 n. 11 (emphasis added).

[17] Id. at 48 (emphasis added).

[18] Id. at 49 n.16 (emphasis added)

[19] David Goodstein, “How Science Works,” RMSE 64 n.45 (citing Brief for the New England Journal of Medicine, et al., as Amici Curiae supporting Respondent, 1993 WL 13006387 at *2, in Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579 (1993)).

[20] Id. (citing Brief for the New England Journal of Medicine, et al., 1993 WL 13006387 *3)

[21] Channing R. Robertson, John E. Moalli, David L. Black, “Reference Guide on Engineering,” RMSE 897, 938 (emphasis added).

[22] Henry T. Greely & Anthony D. Wagner, “Reference Guide on Neuroscience,” RMSE 747, 786.

[23] Id. at 776, 777.

[24] Paul S. Appelbaum, “Reference Guide on Mental Health Evidence,” RMSE 813, 866, 886.

[25] Channing R. Robertson, John E. Moalli, David L. Black, “Reference Guide on Engineering,” RMSE 897, 901, 931.

[26] Id. at 935.

[27] Daniel Rubinfeld, “Reference Guide on Multiple Regression,” RMSE 303, 328 (“[w]ho should be qualified as an expert?”); Shari Seidman Diamond, “Reference Guide on Survey Research,” RMSE 359, 375; Bernard D. Goldstein & Mary Sue Henifin, “Reference Guide on Toxicology,” RMSE 633, 677, 678 (noting that membership in some toxicology societies turns in part on having published in peer-reviewed journals).

[28] Joseph V. Rodricks, “Reference Guide on Exposure Science,” RMSE 503, 508 (noting that EPA guidance documents on exposure assessment often are issued after peer review).

[29] Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” RMSE 549, 608.

[30] Id. at 617-18 n.212.

Excluding Epidemiologic Evidence under Federal Rule of Evidence 702

August 26th, 2023

We are 30-plus years into the “Daubert” era, in which federal district courts are charged with gatekeeping the relevance and reliability of scientific evidence. Not surprisingly, given the lawsuit industry’s propensity on occasion to use dodgy science, the burden of awakening the gatekeepers from their dogmatic slumber often falls upon defense counsel in civil litigation. It therefore behooves defense counsel to speak carefully and accurately about the grounds for Rule 702 exclusion of expert witness opinion testimony.

In the context of medical causation opinions based upon epidemiologic evidence, the first obvious point is that whichever party is arguing for exclusion should distinguish between excluding an expert witness’s opinion and prohibiting an expert witness from relying upon a particular study.  Rule 702 addresses the exclusion of opinions, whereas Rule 703 addresses barring an expert witness from relying upon hearsay facts or data unless they are reasonably relied upon by experts in the appropriate field. It would be helpful for lawyers and legal academics to refrain from talking about “excluding epidemiological evidence under FRE 702.”[1] Epidemiologic studies are rarely admissible themselves, but come into the courtroom as facts and data relied upon by expert witnesses. Rule 702 is addressed to the admissibility vel non of opinion testimony, some of which may rely upon epidemiologic evidence.

Another common lawyer mistake is the over-generalization that epidemiologic research provides the “gold standard” of general causation evidence.[2] Although epidemiology is often required, it is not “the medical science devoted to determining the cause of disease in human beings.”[3] To be sure, epidemiologic evidence will usually be required because there is no genetic or mechanistic evidence that will support the claimed causal inference, but counsel should be cautious in stating the requirement. Glib statements by courts that epidemiology is not always required are often simply an evasion of their responsibility to evaluate the validity of the proffered expert witness opinions. A more careful phrasing of the role of epidemiology will make such glib statements more readily open to rebuttal. In the absence of direct biochemical, physiological, or genetic mechanisms that can be identified as involved in bringing about the plaintiffs’ harm, epidemiologic evidence will be required, and it may well be the “gold standard” in such cases.[4]

When epidemiologic evidence is required, counsel will usually be justified in adverting to the “hierarchy of epidemiologic evidence.” Associations are shown in studies of various designs with vastly differing degrees of validity; and of course, associations are not necessarily causal. There are thus important nuances in educating the gatekeeper about this hierarchy. First, it will often be important to educate the gatekeeper about the distinction between descriptive and analytic studies, and the inability of descriptive studies such as case reports to support causal inferences.[5]

There is then the matter of confusion within the judiciary and among “scholars” about whether a hierarchy even exists. The chapter on epidemiology in the Reference Manual on Scientific Evidence appears to suggest the specious position that there is no hierarchy.[6] The chapter on medical testimony, however, takes a different approach in identifying a normative hierarchy of evidence to be considered in evaluating causal claims.[7] The medical testimony chapter specifies that meta-analyses of randomized controlled trials sit atop the hierarchy. Yet, there are divergent opinions about what should be at the top of the hierarchical evidence pyramid. Indeed, the rigorous, large randomized trial will often replace a meta-analysis of smaller trials as the more definitive evidence.[8] Back in 2007, a dubious meta-analysis of over 40 clinical trials led to a litigation frenzy over rosiglitazone.[9] A mega-trial of rosiglitazone showed that the 2007 meta-analysis was wrong.[10]
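
The mechanics behind such summary estimates are worth seeing once. Below is a minimal sketch, in Python, of a fixed-effect, inverse-variance meta-analysis, using entirely hypothetical study results rather than the data of any actual meta-analysis; it shows that a pooled risk ratio is just a precision-weighted average of the studies fed into it, so the summary can be no more trustworthy than the selections and analytic choices that produced its inputs.

import math

# Hypothetical (log relative risk, standard error) pairs from four imaginary studies.
studies = [(0.18, 0.20), (0.35, 0.25), (0.10, 0.15), (0.40, 0.30)]

weights = [1 / se ** 2 for _, se in studies]          # inverse-variance weights
pooled_log_rr = sum(w * lrr for (lrr, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

rr = math.exp(pooled_log_rr)
lo = math.exp(pooled_log_rr - 1.96 * pooled_se)
hi = math.exp(pooled_log_rr + 1.96 * pooled_se)
print(f"Pooled RR = {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")

A random-effects model, different inclusion criteria, or the arrival of one large, precise trial can change the summary materially, which is the point: the pooled number inherits every weakness of the studies, and the study selections, behind it.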

In any event, courts must purge their beliefs that once there is “some” evidence in support of a claim, their gatekeeping role is over. Randomized controlled trials really do trump observational studies, which virtually always have actual or potential confounding in their final analyses.[11] While disclaimers about the unavailability of randomized trials for putative toxic exposures are helpful, it is not quite accurate to say that it is “unethical to intentionally expose people to a potentially harmful dose of a suspected toxin.”[12] Such trials are done all the time when there is an expected therapeutic benefit that creates at least equipoise between the overall benefit and harm at the outset of the trial.[13]
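
A toy simulation, with every number assumed for illustration, makes the confounding point concrete: when an unmeasured factor raises both the chance of exposure and the chance of disease, the crude risk ratio in the “observational” arm looks elevated even though the exposure does nothing, while randomizing the exposure dissolves the apparent effect.

import random

random.seed(1)

def crude_risk_ratio(randomized, n=200_000):
    exposed_cases = exposed_n = unexposed_cases = unexposed_n = 0
    for _ in range(n):
        c = random.random() < 0.3                      # unmeasured confounder
        if randomized:
            e = random.random() < 0.5                  # exposure assigned by coin flip
        else:
            e = random.random() < (0.7 if c else 0.3)  # confounder drives exposure
        d = random.random() < (0.10 if c else 0.02)    # confounder, not exposure, causes disease
        if e:
            exposed_n += 1
            exposed_cases += d
        else:
            unexposed_n += 1
            unexposed_cases += d
    return (exposed_cases / exposed_n) / (unexposed_cases / unexposed_n)

print("crude RR, observational:", round(crude_risk_ratio(False), 2))  # well above 1.0
print("crude RR, randomized:   ", round(crude_risk_ratio(True), 2))   # approximately 1.0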

At this late date, it seems shameful that courts must be reminded that evidence of associations does not suffice to show causation, but prudence dictates giving the reminder.[14] Defense counsel will generally exhibit a Pavlovian reflex to state that causality based upon epidemiology must be viewed through a lens of “Bradford Hill criteria.”[15] Rhetorically, this reflex seems wrong given that Sir Austin himself noted that his nine different considerations were “viewpoints,” not criteria. Taking a position that requires an immediate retreat seems misguided. Similarly, urging courts to invoke and apply the Bradford Hill considerations must be accompanied by the caveat that courts must first apply Bradford Hill’s predicate[16] for the nine considerations:

“Disregarding then any such problem in semantics we have this situation. Our observations reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance. What aspects of that association should we especially consider before deciding that the most likely interpretation of it is causation?”[17]

Courts should be mindful that the language from the famous, often-cited paper was part of an after-dinner address, in which Sir Austin was speaking informally. Scientists will understand that he was setting out a predicate that calls for

(1) an association, which is

(2) “perfectly clear cut,” such that bias and confounding are excluded, and

(3) “beyond what we would care to attribute to the play of chance,” with random error kept to an acceptable level, before advancing to further consideration of the nine viewpoints commonly recited.

These predicate findings are the basis for advancing to investigate Bradford Hill’s nine viewpoints; the viewpoints do not replace or supersede the predicates.[18]
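
A short numerical sketch, with made-up cohort counts, shows what the predicate looks like in practice. The confidence interval below speaks only to the play of chance; it says nothing about bias or confounding, which must be addressed before the nine viewpoints are even reached.

import math

# Hypothetical cohort: 60 cases among 1,000 exposed; 40 cases among 1,000 unexposed.
a, n1 = 60, 1000       # exposed cases, exposed total
c, n0 = 40, 1000       # unexposed cases, unexposed total

rr = (a / n1) / (c / n0)
se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)    # standard error of the log risk ratio
lo = math.exp(math.log(rr) - 1.96 * se_log_rr)
hi = math.exp(math.log(rr) + 1.96 * se_log_rr)
print(f"RR = {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")       # RR = 1.50, 95% CI roughly (1.02, 2.22)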

Within the nine viewpoints, not all are of equal importance. Consistency among studies, a particularly important consideration, implies that isolated findings in a single observational study will rarely suffice to support causal conclusions. Another important consideration, the strength of the association, has nothing to do with “statistical significance,” which is a predicate consideration, but reminds us that large risk ratios or risk differences provide some evidence that the association does not result from unmeasured confounding, a point the sketch below makes concrete. Eliminating confounding, however, is one of the predicate requirements for applying the nine factors. As with any methodology, the Bradford Hill factors are not self-executing. The annals of litigation provide all-too-many examples of undue selectivity, “cherry picking,” and other deviations from the scientist’s standard of care.
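
One way to quantify that intuition is the E-value of VanderWeele and Ding, a tool that postdates Bradford Hill and is offered here only as an illustrative sketch with hypothetical numbers: it asks how strong an unmeasured confounder would have to be, on the risk-ratio scale, with both the exposure and the outcome, to explain away an observed association entirely.

import math

def e_value(rr):
    # E-value for an observed risk ratio greater than 1.0.
    return rr + math.sqrt(rr * (rr - 1))

observed_rr = 1.4          # hypothetical observed association
lower_limit = 1.1          # hypothetical lower 95% confidence limit
print(round(e_value(observed_rr), 2))   # about 2.15 to explain away the point estimate
print(round(e_value(lower_limit), 2))   # about 1.43 to shift the interval to include 1.0

Small observed risk ratios thus require only modest confounding to be explained away, which is why strength of association carries real evidential weight even though a large ratio is no guarantee against bias.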

Certainly lawyers must steel themselves against recommending the “carcinogen” hazard identifications advanced by the International Agency for Research on Cancer (IARC). There are several problematic aspects to the methods of IARC, not the least of which is IARC’s fanciful use of the word “probable.” According to the IARC Preamble, “probable” has no quantitative meaning.[19] In common legal parlance, “probable” typically conveys a conclusion that is more likely than not. Another problem arises from the IARC’s labeling of “probable human carcinogens” made in some cases without any real evidence of carcinogenesis in humans. Regulatory pronouncements are even more diluted and often involve little more than precautionary principle wishcasting.[20]


[1] Christian W. Castile & Stephen J. McConnell, “Excluding Epidemiological Evidence Under FRE 702,” For The Defense 18 (June 2023) [Castile]. Although these authors provide an interesting overview of the subject, they fall into some common errors, such as failing to address Rule 703. The article is worth reading for its marshaling of recent case law on the subject, but I detail some of its errors here in the hopes that lawyers will speak more precisely about the concepts involved in challenging medical causation opinions.

[2] Id. at 18. In re Zantac (Ranitidine) Prods. Liab. Litig., No. 2924, 2022 U.S. Dist. LEXIS 220327, at *401 (S.D. Fla. Dec. 6, 2022); see also Horwin v. Am. Home Prods., No. CV 00-04523 WJR (Ex), 2003 U.S. Dist. LEXIS 28039, at *14-15 (C.D. Cal. May 9, 2003) (“epidemiological studies provide the primary generally accepted methodology for demonstrating a causal relation between a chemical compound and a set of symptoms or disease” *** “The lack of epidemiological studies supporting Plaintiffs’ claims creates a high bar to surmount with respect to the reliability requirement, but it is not automatically fatal to their case.”).

[3] See, e.g., Siharath v. Sandoz Pharm. Corp., 131 F. Supp. 2d 1347, 1356 (N.D. Ga. 2001) (“epidemiology is the medical science devoted to determining the cause of disease in human beings”).

[4] See, e.g., Lopez v. Wyeth-Ayerst Labs., No. C 94-4054 CW, 1996 U.S. Dist. LEXIS 22739, at *1 (N.D. Cal. Dec. 13, 1996) (“Epidemiological evidence is one of the most valuable pieces of scientific evidence of causation”); Horwin v. Am. Home Prods., No. CV 00-04523 WJR (Ex), 2003 U.S. Dist. LEXIS 28039, at *15 (C.D. Cal. May 9, 2003) (“The lack of epidemiological studies supporting Plaintiffs’ claims creates a high bar to surmount with respect to the reliability requirement, but it is not automatically fatal to their case”).

[5] David A. Grimes & Kenneth F. Schulz, “Descriptive Studies: What They Can and Cannot Do,” 359 Lancet 145 (2002) (“…epidemiologists and clinicians generally use descriptive reports to search for clues of cause of disease – i.e., generation of hypotheses. In this role, descriptive studies are often a springboard into more rigorous studies with comparison groups. Common pitfalls of descriptive reports include an absence of a clear, specific, and reproducible case definition, and interpretations that overstep the data. Studies without a comparison group do not allow conclusions about cause of disease.”).

[6] Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” Reference Manual on Scientific Evidence 549, 564 n.48 (citing a paid advertisement by a group of scientists, and misleadingly referring to the publication as a National Cancer Institute symposium) (citing Michele Carbone et al., “Modern Criteria to Establish Human Cancer Etiology,” 64 Cancer Res. 5518, 5522 (2004) (National Cancer Institute symposium [sic] concluding that “[t]here should be no hierarchy [among different types of scientific methods to determine cancer causation]. Epidemiology, animal, tissue culture and molecular pathology should be seen as integrating evidences in the determination of human carcinogenicity.”)).

[7] John B. Wong, Lawrence O. Gostin & Oscar A. Cabrera, “Reference Guide on Medical Testimony,” in Reference Manual on Scientific Evidence 687, 723 (3d ed. 2011).

[8] See, e.g., J.M. Elwood, Critical Appraisal of Epidemiological Studies and Clinical Trials 342 (3d ed. 2007).

[9] See Steven E. Nissen & Kathy Wolski, “Effect of Rosiglitazone on the Risk of Myocardial Infarction and Death from Cardiovascular Causes,” 356 New Engl. J. Med. 2457 (2007). See also “Learning to Embrace Flawed Evidence – The Avandia MDL’s Daubert Opinion” (Jan. 10, 2011).

[10] Philip D. Home, et al., “Rosiglitazone evaluated for cardiovascular outcomes in oral agent combination therapy for type 2 diabetes (RECORD): a multicentre, randomised, open-label trial,” 373 Lancet 2125 (2009).

[11] In re Zantac (Ranitidine) Prods. Liab. Litig., No. 2924, 2022 U.S. Dist. LEXIS 220327, at *402 (S.D. Fla. Dec. 6, 2022) (“Unlike experimental studies in which subjects are randomly assigned to exposed and placebo groups, observational studies are subject to bias due to the possibility of differences between study populations.”).

[12] Castile at 20.

[13] See, e.g., Benjamin Freedman, “Equipoise and the ethics of clinical research,” 317 New Engl. J. Med. 141 (1987).

[14] See, e.g., In Re Onglyza (Saxagliptin) & Kombiglyze Xr (Saxagliptin & Metformin) Prods. Liab. Litig., No. 5:18-md-2809-KKC, 2022 U.S. Dist. LEXIS 136955, at *127 (E.D. Ky. Aug. 2, 2022); Burleson v. Texas Dep’t of Criminal Justice, 393 F.3d 577, 585-86 (5th Cir. 2004) (affirming exclusion of expert causation testimony based solely upon studies showing a mere correlation between defendant’s product and plaintiff’s injury); Beyer v. Anchor Insulation Co., 238 F. Supp. 3d 270, 280-81 (D. Conn. 2017); Ambrosini v. Labarraque, 101 F.3d 129, 136 (D.C. Cir. 1996).

[15] Castile at 21. See In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., 26 F. Supp. 3d 449, 454-55 (E.D. Pa. 2014).

[16] “Bradford Hill on Statistical Methods” (Sept. 24, 2013); see also Frank C. Woodside, III & Allison G. Davis, “The Bradford Hill Criteria: The Forgotten Predicate,” 35 Thomas Jefferson L. Rev. 103 (2013).

[17] Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 295 (1965).

[18] Castile at 21. See, e.g., In re Onglyza (Saxagliptin) & Kombiglyze XR (Saxagliptin & Metformin) Prods. Liab. Litig., No. 5:18-md-2809-KKC, 2022 U.S. Dist. LEXIS 1821, at *43 (E.D. Ky. Jan. 5, 2022) (“The analysis is meant to apply when observations reveal an association between two variables. It addresses the aspects of that association that researchers should analyze before deciding that the most likely interpretation of [the association] is causation”); Hoefling v. U.S. Smokeless Tobacco Co., LLC, 576 F. Supp. 3d 262, 273 n.4 (E.D. Pa. 2021) (“Nor would it have been appropriate to apply them here: scientists are to do so only after an epidemiological association is demonstrated”).

[19] IARC Monographs on the Identification of Carcinogenic Hazards to Humans – Preamble 31 (2019) (“The terms probably carcinogenic and possibly carcinogenic have no quantitative significance and are used as descriptors of different strengths of evidence of carcinogenicity in humans.”).

[20] “Improper Reliance upon Regulatory Risk Assessments in Civil Litigation” (Mar. 19, 2023).

Consensus Rule – Shadows of Validity

April 26th, 2023

Back in 2011, at a Fourth Circuit Judicial Conference, Chief Justice John Roberts took a cheap shot at law professors and law reviews when he intoned:

“Pick up a copy of any law review that you see, and the first article is likely to be, you know, the influence of Immanuel Kant on evidentiary approaches in 18th Century Bulgaria, or something, which I’m sure was of great interest to the academic that wrote it, but isn’t of much help to the bar.”[1]

Anti-intellectualism is in vogue these days. No doubt, Roberts was jocularly indulging in an over-generalization, but for anyone who tries to keep up with the law reviews, he has a small point. Other judges have rendered similar judgments. Back in 1993, in a cranky opinion piece – in a law review – then Judge Richard A. Posner channeled the liar paradox by criticizing law review articles for “the many silly titles, the many opaque passages, the antic proposals, the rude polemics, [and] the myriad pretentious citations.”[2] In a speech back in 2008, Justice Stephen Breyer noted that “[t]here is evidence that law review articles have left terra firma to soar into outer space.”[3]

The temptation to rationalize, and the urge to advocate for reflective equilibrium between the law as it exists and the law as we think it should be, combine to produce some silly and harmful efforts to rewrite the law as we know it. Jeremy Bentham, Mr. Nonsense-on-Stilts, who sits stuffed in a hallway of University College London, ushered in a now venerable tradition of rejecting tradition and common sense in proposing all sorts of law reforms.[4] In the early 1800s, Bentham, without much in the way of actual courtroom experience, deviled the English bench and bar with sweeping proposals to place evidence law on what he thought was a rational foundation. As with his naïve utilitarianism, Bentham’s contributions to jurisprudence often ignored the realities of human experience and decision making. The Benthamite tradition of anti-tradition is certainly alive and well in the law reviews.

Still, I have a soft place in my heart for law reviews.  Although not peer reviewed, law reviews provide law students a tremendous opportunity to learn about writing and scholarship through publishing the work of legal scholars, judges, thoughtful lawyers, and other students. Not all law review articles are nonsense on stilts, but we certainly should have our wits about us when we read immodest proposals from the law professoriate.

*   *   *   *   *   *   *   *   *   *

Professor Edward Cheng has written broadly and insightfully about evidence law, and he certainly has the educational training to do so. Recently, Cheng has been bemused by the expert paradox, which wonders how lay persons, without expertise, can evaluate and judge issues of the admissibility, validity, and correctness of expert opinion. The paradox has long haunted evidence law, and it is at center stage in the adjudication of expert admissibility issues, as well as the trial of technical cases. Cheng has now proposed a radical overhaul to the law of evidence, which would require that we stop asking courts to act as gatekeepers, and stop asking juries to determine the validity and correctness of expert witness opinions before them. Cheng’s proposal would revert to the nose-counting process of Frye and permit consideration only of whether there is an expert witness consensus to support the proffered opinion for any claim or defense.[5] Or, in Plato’s allegory of the cave, we would need to learn to be content with shadows on the wall rather than striving to know the real thing.

When Cheng’s proposal first surfaced, I wrote briefly about why it was a bad idea.[6] Since his initial publication, a law review symposium was assembled to address and perhaps to celebrate the proposal.[7] The papers from that symposium are now in print.[8] Unsurprisingly, the papers are both largely sympathetic (but not completely) to Cheng’s proposal, and virtually devoid of references to actual experiences of gatekeeping or trials of technical issues.

Cheng contends that the so-called Daubert framework for addressing the admissibility of expert witness opinion is wrong. He does not argue that the existing law, in the form of Federal Rules of Evidence 702 and 703, fails to call for an epistemic standard both for admitting opinion testimony and for the fact-finders’ assessments. There is no effort to claim that somehow four Supreme Court cases, and thousands of lower courts, have erroneously viewed the whole process. Rather, Cheng simply asserts that non-expert judges cannot evaluate the reliability (validity) of expert witness opinions, and that non-expert jurors cannot “reach independent, substantive conclusions about specialized facts.”[9] The law must change to accommodate his judgment.

In his symposium contribution, Cheng expands upon his previous articulation of his proposed “consensus rule.”[10] What is conspicuously absent, however, is any example of failed gatekeeping that excluded valid expert witness opinion. The one example Cheng does give, the appellate decision in Rosen v. Ciba-Geigy Corporation,[11] is illustrative of his project. The expert witness, whose opinion was excluded, was on the faculty of the University of Chicago medical school; Richard Posner, the appellate judge who wrote the opinion that affirmed the expert witness’s exclusion, was on the faculty of that university’s law school. Without any discussion of the reports, depositions, hearings, or briefs, Cheng concludes that “the very idea that a law professor would tell medical school colleagues that their assessments were unreliable seems both breathtakingly arrogant and utterly ridiculous.”[12]

Except, of course, very well qualified scientists and physicians advance invalid and incorrect claims all the time. What strikes me as breathtakingly arrogant and utterly ridiculous is the judgment of a law professor, with little to no experience trying or defending Rule 702 and 703 issues, labeling the “very idea” as arrogant and ridiculous. Aside from its being a petitio principii, we could probably add that the reaction is emotive, uninformed, and uninformative, and that it fails to support the author’s suggestion that “Daubert has it all wrong,” and that “[w]e need a different approach.”

Judges and jurors obviously will never fully understand the scientific issues before them. If and when this lack of epistemic competence is problematic, we should honestly acknowledge that we are beyond the realm of the Constitution’s Seventh Amendment. Since Cheng is fantasizing about what the law should be, why not fantasize about not allowing lay people to decide complex scientific issues? Verdicts from jurors who do not have to give reasons for their decisions, and who are not in any sense peers of the scientists whose work they judge, are normatively problematic.

Professor Cheng likens his consensus rule to how the standard of care is decided in medical malpractice litigation. The analogy is interesting, but hardly compelling, in that it ignores the “two schools of thought” doctrine.[13] In litigation of claims of professional malpractice, the “two schools of thought” doctrine is a complete defense. As explained by the Pennsylvania Supreme Court,[14] physicians may defend against claims that they deviated from the standard of care, or committed professional malpractice, by adverting to support for their treatment from a minority of professionals in their field:

“Where competent medical authority is divided, a physician will not be held responsible if in the exercise of his judgment he followed a course of treatment advocated by a considerable number of recognized and respected professionals in his given area of expertise.”[15]

The analogy to medical malpractice litigation seems inapt.

Professor Cheng advertises that he will be giving full-length book treatment to his proposal, and so perhaps my critique is uncharitable in looking at a preliminary (antic?) law review article. Still, his proposal seems to ignore that “general acceptance” renders consensus, when it truly exists, relevant to both the court’s gatekeeping decisions, and the fact finders’ determination of the facts and issues in dispute. Indeed, I have never seen a Rule 702 hearing that did not involve, to some extent, the assertion of a consensus, or the lack thereof.

To the extent that we remain committed to trials of scientific claims, we can see that judges and jurors often can detect inconsistencies, cherry picking, unproven assumptions, and other aspects of the patho-epistemology of expert witness opinions. It takes a community of scientists and engineers to build a space rocket, but any Twitter moron can determine when a rocket blows up on launch. Judges in particular have (and certainly should have) the competence to determine deviations from the scientific and statistical standards of care that pertain to litigants’ claims.

Cheng’s proposal also ignores how difficult and contentious it is to ascertain the existence, scope, and actual content of scientific consensus. In some areas of science, such as occupational and environmental epidemiology and medicine, faux consensuses are set up by would-be expert witnesses for both claimants and defendants. A search of the word “consensus” in the PubMed database yields over a quarter of a million hits. The race to the bottom is on. Replacing epistemic validity with sociological and survey navel gazing seems like a fool’s errand.

Perhaps the most disturbing aspect of Cheng’s proposal is what happens in the absence of consensus.  Pretty much anything goes, a situation that Cheng finds “interesting,” and I find horrifying:

“if there is no consensus, the legal system’s options become a bit more interesting. If there is actual dissensus, meaning that the community is fractured in substantial numbers, then the non-expert can arguably choose from among the available theories. If the expert community cannot agree, then one cannot possibly expect non-experts to do any better.”[16]

Cheng reports that textbooks and other documents “may be both more accurate and more efficient” evidence of consensus.[17] Maybe; maybe not. Textbooks are often dated by the time they arrive on the shelves, and contentious scientists are not beyond manufacturing certainty or doubt in the form of falsely claimed consensus.

Of course, often, if not most of the time, there will be no identifiable, legitimate consensus for a litigant’s claim at trial. What would Professor Cheng do in this default situation? Here Cheng, fully indulging the frolic, tells us that we

“should hypothetically ask what the expert community is likely to conclude, rather than try to reach conclusions on their own.”[18]

So the default situation transforms jurors into tea-leaf readers of what an expert community, unknown to them, will do if and when there is evidence of a quantum and quality to support a consensus, or when that community gets around to articulating what the consensus is. Why not just toss claims that lack consensus support?


[1] Debra Cassens Weiss, “Law Prof Responds After Chief Justice Roberts Disses Legal Scholarship,” Am. Bar Ass’n J. (July 7, 2011).

[2] Richard A. Posner, “Legal Scholarship Today,” 45 Stanford L. Rev. 1647, 1655 (1993), quoted in Walter Olson, “Abolish the Law Reviews!” The Atlantic (July 5, 2012); see also Richard A. Posner, “Against the Law Reviews: Welcome to a world where inexperienced editors make articles about the wrong topics worse,” Legal Affairs (Nov. 2004).

[3] Brent Newton, “Scholar’s highlight: Law review articles in the eyes of the Justices,” SCOTUS Blog (April 30, 2012); “Fixing Law Reviews,” Inside Higher Education (Nov. 19, 2012).

[4] “More Antic Proposals for Expert Witness Testimony – Including My Own Antic Proposals” (Dec. 30, 2014).

[5] Edward K. Cheng, “The Consensus Rule: A New Approach to Scientific Evidence,” 75 Vanderbilt L. Rev. 407 (2022).

[6] “Cheng’s Proposed Consensus Rule for Expert Witnesses” (Sept. 15, 2022); “Further Thoughts on Cheng’s Consensus Rule” (Oct. 3, 2022).

[7] Norman J. Shachoy Symposium, The Consensus Rule: A New Approach to the Admissibility of Scientific Evidence, 67 Villanova L. Rev. (2022).

[8] David S. Caudill, “The ‘Crisis of Expertise’ Reaches the Courtroom: An Introduction to the Symposium on, and a Response to, Edward Cheng’s Consensus Rule,” 67 Villanova L. Rev. 837 (2022); Harry Collins, “The Owls: Some Difficulties in Judging Scientific Consensus,” 67 Villanova L. Rev. 877 (2022); Robert Evans, “The Consensus Rule: Judges, Jurors, and Admissibility Hearings,” 67 Villanova L. Rev. 883 (2022); Martin Weinel, “The Adversity of Adversarialism: How the Consensus Rule Reproduces the Expert Paradox,” 67 Villanova L. Rev. 893 (2022); Wendy Wagner, “The Consensus Rule: Lessons from the Regulatory World,” 67 Villanova L. Rev. 907 (2022); Edward K. Cheng, Elodie O. Currier & Payton B. Hampton, “Embracing Deference,” 67 Villanova L. Rev. 855 (2022).

[9] Embracing Deference at 876.

[10] Edward K. Cheng, Elodie O. Currier & Payton B. Hampton, “Embracing Deference,” 67 Villanova L. Rev. 855 (2022) [Embracing Deference].

[11] Rosen v. Ciba-Geigy Corp., 78 F.3d 316 (7th Cir. 1996).

[12] Embracing Deference at 859.

[13] “Two Schools of Thought” (May 25, 2013).

[14] Jones v. Chidester, 531 Pa. 31, 40, 610 A.2d 964 (1992).

[15] Id. at 40. See also Fallon v. Loree, 525 N.Y.S.2d 93, 93 (N.Y. App. Div. 1988) (“one of several acceptable techniques”); Dailey, “The Two Schools of Thought and Informed Consent Doctrine in Pennsylvania,” 98 Dickinson L. Rev. 713 (1994); Douglas Brown, “Panacea or Pandora’s Box: The Two Schools of Medical Thought Doctrine after Jones v. Chidester,” 44 J. Urban & Contemp. Law 223 (1993).

[16] Embracing Deference at 861.

[17] Embracing Deference at 866.

[18] Embracing Deference at 876.

Reference Manual – Desiderata for 4th Edition – Part VI – Rule 703

February 17th, 2023

One of the most remarkable, and objectionable, aspects of the third edition was its failure to engage with Federal Rule of Evidence 703, and the need for courts to assess the validity of individual studies relied upon. The statistics chapter has a brief, but important, discussion of Rule 703, as does the chapter on survey evidence. The epidemiology chapter mentions Rule 703 only in a footnote.[1]

Rule 703 appears to be the red-headed stepchild of the Federal Rules, and it is often ignored and omitted from so-called Daubert briefs.[2] Perhaps part of the problem is that Rule 703 (“Bases of an Expert’s Opinion Testimony”) is one of the most poorly drafted rules in the Federal Rules of Evidence:

“An expert may base an opinion on facts or data in the case that the expert has been made aware of or personally observed. If experts in the particular field would reasonably rely on those kinds of facts or data in forming an opinion on the subject, they need not be admissible for the opinion to be admitted. But if the facts or data would otherwise be inadmissible, the proponent of the opinion may disclose them to the jury only if their probative value in helping the jury evaluate the opinion substantially outweighs their prejudicial effect.”

Despite its tortuous wording, the rule is clear enough in authorizing expert witnesses to rely upon studies that are themselves inadmissible, and allowing such witnesses to disclose the studies that they have relied upon, when there has been the requisite showing of probative value that outweighs any prejudice.

The statistics chapter in the third edition, nonetheless, confusingly suggested that

“a particular study may use a method that is entirely appropriate but that is so poorly executed that it should be inadmissible under Federal Rules of Evidence 403 and 702. Or, the method may be inappropriate for the problem at hand and thus lack the ‘fit’ spoken of in Daubert. Or the study might rest on data of the type not reasonably relied on by statisticians or substantive experts and hence run afoul of Federal Rule of Evidence 703.”[3]

Particular studies, even when beautifully executed, are not themselves admissible. And particular studies are not subject to evaluation under Rule 702, apart from the gatekeeping of expert witness opinion testimony that is based upon the particular studies. To be sure, the reference to Rule 703 is important and a welcome counter to the suggestion, elsewhere in the third edition, that courts should not look at individual studies. The independent review of individual studies is occasionally lost in the shuffle of litigation, and the statistics chapter is correct to note an evidentiary concern whether each individual study may or may not be reasonably relied upon by an expert witness. In any event, reasonably relied upon studies do not ipso facto become admissible.

The third edition’s chapter on Survey Research contains the most explicit direction on Rule 703, in terms of courts’ responsibilities.  In that chapter, the authors instruct that Rule 703:

“redirect[ed] attention to the ‘validity of the techniques employed’. The inquiry under Rule 703 focuses on whether facts or data are ‘of a type reasonably relied upon by experts in the particular field in forming opinions or inferences upon the subject’.”[4]

Although Rule 703 is clear enough on admissibility, the epidemiology chapter described epidemiologic studies broadly as admissible if sufficiently rigorous:

“An epidemiologic study that is sufficiently rigorous to justify a conclusion that it is scientifically valid should be admissible, as it tends to make an issue in dispute more or less likely.”[5]

The authors of the epidemiology chapter acknowledge, in a footnote, that “[h]earsay concerns may limit the independent admissibility of the study, but the study could be relied on by an expert in forming an opinion and may be admissible pursuant to Fed. R. Evid. 703 as part of the underlying facts or data relied on by the expert.”[6]

This footnote is curious, and incorrect. There is no question that hearsay “concerns” “may limit” admissibility of a study; hearsay is inadmissible unless there is a statutory exception.[7] Rule 703 is not one of the exceptions to the rule against hearsay in Article VIII of the Federal Rules of Evidence. An expert witness’s reliance upon a study does not make the study admissible. The authors cite two cases,[8] but neither case held that reasonable reliance by expert witnesses transmuted epidemiologic studies into admissible evidence. The text of Rule 703 itself, and the overwhelming weight of case law interpreting and applying the rule,[9] make clear that the rule does not render scientific studies admissible. The two cases cited by the epidemiology chapter, Kehm and Ellis, both involved “factual findings” in public investigative or evaluative reports, which were independently admissible under Federal Rule of Evidence 803(8)(C).[10] As such, the cases failed to support the chapter’s suggestion that Rule 703 is a rule of admissibility for epidemiologic studies. The third edition thus, in one sentence, confused Rule 703 with an exception to the rule against hearsay, the rule that would otherwise prevent statistically based epidemiologic studies from being received in evidence. The point was reasonably clear, however, that studies “may be offered” to explain an expert witness’s opinion. Under Rule 705, that offer may also be refused.

The Reference Manual was certainly not alone in advancing the notion that studies are themselves admissible. Other well-respected evidence scholars have misstated the law on this issue.[11] The fourth edition would do well to note that scientific studies, and especially epidemiologic studies, involve multiple levels of hearsay. A typical epidemiologic study may contain hearsay leaps from patient to clinician, to laboratory technicians, to specialists interpreting test results, back to the clinician for a diagnosis, to a nosologist for disease coding, to a national or hospital database, to a researcher querying the database, to a statistician analyzing the data, to a manuscript that details data, analyses, and results, to editors and peer reviewers, back to study authors, and on to publication. Those leaps do not mean that the final results are thus untrustworthy or not reasonably relied upon, but they do raise well-nigh insuperable barriers to admissibility. The inadmissibility of scientific studies is generally not problematic because Rule 703 permits testifying expert witnesses to formulate opinions based upon facts and data, which are not themselves admissible in evidence. The distinction between relied upon, and admissible, studies is codified in the Federal Rules of Evidence, and in virtually every state’s evidence law.

The fourth edition might well also note that under Rule 104(a), the Rules of Evidence themselves do not govern a trial court’s preliminary determination, under Rules 702 or 703, of the admissibility of an expert witness’s opinion, or the appropriateness of reliance upon a particular study. Although Rule 705 may allow disclosure of facts and data described in studies, it is not an invitation to permit testifying expert witnesses to become a conduit for off-hand comments and opinions in the introduction or discussion sections of relied upon articles.[12] The wholesale admission of such hearsay opinions undermines the court’s control over opinion evidence. Rule 703 authorizes reasonable reliance upon “facts and data,” not every opinion that creeps into the published literature.

Reference Manual’s Disregard of Study Validity in Favor of the “Whole Tsumish”

The third edition evidenced considerable ambivalence about whether trial judges should engage in resolving disputes over the validity of individual studies relied upon by expert witnesses. Since 2000, Rule 702 has clearly required such engagement, which made the Manual’s hesitancy, on the whole, unjustifiable. The ambivalence with respect to study validity, however, was on full display in the late Professor Margaret Berger’s chapter, “The Admissibility of Expert Testimony.”[13] Berger’s chapter criticized “atomization,” or looking at individual studies in isolation, a process she described pejoratively as “slicing-and-dicing.”[14]

Drawing on the publications of Daubert-critic Susan Haack, Berger appeared to reject the notion that courts should examine the reliability of each study independently.[15] According to Berger, the “proper” scientific method, as evidenced by the work of the International Agency for Research on Cancer (IARC), the Institute of Medicine, the National Institutes of Health, the National Research Council, and the National Institute for Environmental Health Sciences, “is to consider all the relevant available scientific evidence, taken as a whole, to determine which conclusion or hypothesis regarding a causal claim is best supported by the body of evidence.”[16]

Berger’s description of the review process, however, was profoundly misleading in its incompleteness. Of course, scientists undertaking a systematic review identify all the relevant studies, but some of the “relevant” studies may well be insufficiently reliable (because of internal or external validity issues) to answer the research question at hand. All the cited agencies, and other research organizations and researchers, exclude studies that are fundamentally flawed, whether as a result of bias, confounding, erroneous data analyses, or related problems. Berger cited no support for her remarkable suggestion that scientists do not make “reliability” judgments about available studies when assessing the “totality of the evidence.”[17]

Professor Berger, who had a distinguished career as a law professor and evidence scholar, died in November 2010, before the third edition was published. She was no friend of Daubert,[18] but her antipathy remarkably outlived her. Berger’s critical discussion of “atomization” cited the notorious decision in Milward v. Acuity Specialty Products Group, Inc., 639 F.3d 11, 26 (1st Cir. 2011), which was decided four months after her passing.[19]

Professor Berger’s contention about the need to avoid assessments of individual studies in favor of the whole “tsumish” must also be rejected because Federal Rule of Evidence 703 requires that each study considered by an expert witness “qualify” for reasonable reliance by virtue of the study’s containing facts or data that are “of a type reasonably relied upon by experts in the particular field in forming opinions or inferences upon the subject.” One of the deeply troubling aspects of the Milward decision is that it reversed the trial court’s sensible decision to exclude a toxicologist, Dr. Martyn Smith, who outran his headlights on issues having to do with a field in which he was clearly inexperienced – epidemiology.

Another curious omission in the third edition’s discussions of Milward is the dark ethical cloud of misconduct that hovers over the First Circuit’s reversal of the trial court’s exclusions of Martyn Smith and Carl Cranor. On appeal, the Council for Education and Research on Toxics (CERT) filed an amicus brief in support of reversing the exclusion of Smith and Cranor. The CERT amicus brief, however, never disclosed that CERT was founded by Smith and Cranor, and that CERT funded Smith’s research.[20]

Rule 702 requires courts to pay attention to, among other things, the sufficiency of the facts and data relied upon by expert witnesses. Rule 703’s requirement that individual studies must be reasonably relied upon is an important additional protreptic against the advice given by Professor Berger, in the third edition.


[1] The index notes the following page references for Rule 703: 214, 361, 363-364, and 610 n.184.

[2] See David E. Bernstein & Eric G. Lasker, “Defending Daubert: It’s Time to Amend Federal Rule of Evidence 702,” 57 William & Mary L. Rev. 1, 32 (2015) (“Rule 703 is frequently ignored in Daubert analyses”); Schachtman, “Rule 703 – The Problem Child of Article VII,” 17 Proof 3 (Spring 2009); Schachtman, “The Effective Presentation of Defense Expert Witnesses and Cross-examination of Plaintiffs’ Expert Witnesses,” at the ALI-ABA Course on Opinion and Expert Witness Testimony in State and Federal Courts (February 14-15, 2008). See also Julie E. Seaman, “Triangulating Testimonial Hearsay: The Constitutional Boundaries of Expert Opinion Testimony,” 96 Georgetown L.J. 827 (2008); “RULE OF EVIDENCE 703 — Problem Child of Article VII” (Sept. 19, 2011); “Giving Rule 703 the Cold Shoulder” (May 12, 2012); “New Reference Manual on Scientific Evidence Short Shrifts Rule 703” (Oct. 16, 2011).

[3] RMSE3d at 214.

[4] RMSE3d at 364 (internal citations omitted).

[5] RMSE 3d at 610 (internal citations omitted).

[6] RMSE3d at 601 n.184.

[7] Rule 802 (“Hearsay Rule”): “Hearsay is not admissible except as provided by these rules or by other rules prescribed by the Supreme Court pursuant to statutory authority or by Act of Congress.”

[8] Kehm v. Procter & Gamble Co., 580 F. Supp. 890, 902 (N.D. Iowa 1982) (“These [epidemiologic] studies were highly probative on the issue of causation—they all concluded that an association between tampon use and menstrually related TSS [toxic shock syndrome] cases exists.”), aff’d, 724 F.2d 613 (8th Cir. 1984); Ellis v. International Playtex, Inc., 745 F.2d 292, 303 (4th Cir. 1984). The chapter also cited the en banc decision in Christophersen for the proposition that “[a]s a general rule, questions relating to the bases and sources of an expert’s opinion affect the weight to be assigned that opinion rather than its admissibility. . . . ” In the Christophersen case, the Fifth Circuit was clearly addressing the admissibility of the challenged expert witness’s opinions, not the admissibility of relied-upon studies. Christophersen v. Allied-Signal Corp., 939 F.2d 1106, 1111, 1113-14 (5th Cir. 1991) (en banc) (per curiam) (trial court may exclude opinion of expert witness whose opinion is based upon incomplete or inaccurate exposure data), cert. denied, 112 S. Ct. 1280 (1992).

[9] Interestingly, the authors of this chapter abandoned the suggestion, advanced in the second edition, that studies relied upon “might qualify for the learned treatise exception to the hearsay rule, Fed. R. Evid. 803(18), or possibly the catchall exceptions, Fed. R. Evid. 803(24) & 804(5).” RMSE 2d at 335 (2000). See also RMSE 3d at 214 (discussing statistical studies as generally “admissible,” but acknowledging that admissibility may be no more than permission to explain the basis for an expert’s opinion, which is hardly admissibility at all).

[10] See Ellis, 745 F.2d at 299-303; Kehm, 724 F.2d at 617-18. These holdings predated the Supreme Court’s 1993 decision in Daubert, and the issue whether they are subject to Rule 702 has not been addressed.  Federal agency factual findings have been known to be invalid, on occasion.

[11] David L. Faigman, et al., Modern Scientific Evidence: The Law and Science of Expert Testimony v.1, § 23:1, at 206 (2009) (“Well conducted studies are uniformly admitted.”).

[12] Montori, et al., “Users’ guide to detecting misleading claims in clinical research reports,” 329 Br. Med. J. 1093, 1093 (2004) (advising readers on how to avoid being misled by published literature, and counseling readers to “Read only the Methods and Results sections; bypass the Discussion section.”)  (emphasis added).

[13] RMSE 3d 11 (2011).

[14] Id. at 19.

[15] Id. at 20 & n.51 (citing Susan Haack, “An Epistemologist in the Bramble-Bush: At the Supreme Court with Mr. Joiner,” 26 J. Health Pol. Pol’y & L. 217–37 (1999)).

[16] Id. at 19-20 & n.52.

[17] See Berger, “The Admissibility of Expert Testimony,” RMSE 3d 11 (2011). Professor Berger never mentions Rule 703 at all! Gone and forgotten.

[18] Professor Berger filed an amicus brief on behalf of plaintiffs, in Rider v. Sandoz Pharms. Corp., 295 F.3d 1194 (11th Cir. 2002).

[19] Id. at 20 n.51. (The editors note that the published chapter was Berger’s last revision, with “a few edits to respond to suggestions by reviewers.”) The addition of the controversial Milward decision cannot seriously be considered an “edit.”

[20] “From Here to CERT-ainty” (June 28, 2018); “THE COUNCIL FOR EDUCATION AND RESEARCH ON TOXICS” (July 9, 2013).

People Get Ready – There’s a Reference Manual a Comin’

July 16th, 2021

Science is the key …

Back in February, I wrote about a National Academies’ workshop that featured some outstanding members of the scientific and statistical world, and which gave participants an opportunity to identify new potential subjects for inclusion in a proposed fourth edition of the Reference Manual on Scientific Evidence.[1] Funding for that new edition is now secured, and the National Academies has published a précis of the February workshop. National Academies of Sciences, Engineering, and Medicine, Emerging Areas of Science, Engineering, and Medicine for the Courts: Proceedings of a Workshop – in Brief (Washington, DC 2021). The Rapporteurs for these proceedings provide a helpful overview of the meeting, which was not generally covered in the legal media.[2]

The goal of the workshop, which was supported by a planning committee, the Committee on Science, Technology, and Law, the National Academies, the Federal Judicial Center, and the National Science Foundation, was, of course, to identify chapters for a new, fourth edition of the Reference Manual on Scientific Evidence. The workshop was co-chaired by Dr. Thomas D. Albright, of the Salk Institute for Biological Studies, and the Hon. Kathleen McDonald O’Malley, Judge on the U.S. Court of Appeals for the Federal Circuit.

The Rapporteurs duly noted Judge O’Malley’s workshop comments that she hoped the reconsideration of the Reference Manual could help close the gap between science and the law. It is thus encouraging that the Rapporteurs focused a large part of their summary on the presentation of Professor Xiao-Li Meng[3] on selection bias, which “can come from cherry picking data, which alters the strength of the evidence.” Meng identified the

“7 S’(ins)” of selection bias:

(1) selection of target/hypothesis (e.g., subgroup analysis);

(2) selection of data (e.g., deleting ‘outliers’ or using only ‘complete cases’);

(3) selection of methodologies (e.g., choosing tests to pass the goodness-of-fit);

(4) selective due diligence and debugging (e.g., triple checking only when the outcome seems undesirable);

(5) selection of publications (e.g., only when p-value <0.05);

(6) selections in reporting/summary (e.g., suppressing caveats); and

(7) selections in understanding and interpretation (e.g., our preference for deterministic, ‘common sense’ interpretation).”

Meng also addressed the problem of analyzing subgroup findings after not finding an association in the full sample, dubious algorithms, selection bias in publishing “splashy” and nominally “statistically significant” results, and media bias and incompetence in disseminating study results. Meng discussed how these selection effects can undermine the accuracy, validity, and reliability of the research findings that are relied upon by expert witnesses in court cases.
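
A rough simulation, assuming twenty independent subgroups and no true effect anywhere, illustrates the first of Meng’s sins: scanning subgroups for p < 0.05 will turn up at least one nominally “significant” finding far more often than the advertised five percent.

import random
from statistics import NormalDist

random.seed(2)
Z_CRIT = NormalDist().inv_cdf(0.975)          # two-sided 0.05 critical value, about 1.96

def any_subgroup_significant(n_subgroups=20):
    # Under the null hypothesis, each subgroup's test statistic is standard normal.
    return any(abs(random.gauss(0, 1)) > Z_CRIT for _ in range(n_subgroups))

simulations = 10_000
hits = sum(any_subgroup_significant() for _ in range(simulations))
print(f"At least one 'significant' subgroup in {hits / simulations:.0%} of null datasets")
# Expected to be roughly 1 - 0.95**20, or about 64%, versus 5% for a single pre-specified test.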

The Rapporteurs’ emphasis on Professor Meng’s presentation was noteworthy because the current edition of the Reference Manual is generally lacking in a serious exploration of systematic bias and confounding. To be sure, the concepts are superficially addressed in the Manual’s chapter on epidemiology, but in a way that has allowed many district judges to shrug off serious questions of invalidity with the shibboleth that such questions “go to the weight, not the admissibility,” of challenged expert witness opinion testimony. Perhaps the pending revision to Rule 702 will help improve fidelity to the spirit and text of Rule 702.

Questions of bias and noise have come to receive more attention in the professional statistical and epidemiologic literature. In 2009, Professor Timothy Lash published an important book-length treatment of quantitative bias analysis.[4] Last year, statistician David Hand published a comprehensive, but readily understandable, book on “Dark Data,” and the ways statistical and scientific inference are derailed.[5] One of the presenters at the February workshop, Nobel laureate Daniel Kahneman, published a book on “noise” just a few weeks ago.[6]
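
To give a flavor of what Lash and his co-authors mean by quantitative bias analysis, here is a minimal sketch of its simplest form, external adjustment for a single binary unmeasured confounder, with every input assumed purely for illustration rather than drawn from any real study.

def bias_adjusted_rr(rr_observed, rr_confounder_disease, prev_exposed, prev_unexposed):
    # Divide the observed risk ratio by the bias factor implied by the assumed
    # confounder-disease risk ratio and the assumed confounder prevalences
    # among the exposed and the unexposed.
    bias = ((rr_confounder_disease * prev_exposed + (1 - prev_exposed)) /
            (rr_confounder_disease * prev_unexposed + (1 - prev_unexposed)))
    return rr_observed / bias

# Hypothetical inputs: observed RR of 1.4; a confounder that triples disease risk and
# is twice as common among the exposed (40%) as among the unexposed (20%).
print(round(bias_adjusted_rr(1.4, 3.0, 0.40, 0.20), 2))   # about 1.09 after adjustment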

David Hand’s book, Dark Data (Chapter 10), sets out a useful taxonomy of the ways that data can be subverted by what the consumers of data do not know. The taxonomy would provide a useful organizational map for a new chapter of the Reference Manual:

A Taxonomy of Dark Data

Type 1: Data We Know Are Missing

Type 2: Data We Don’t Know Are Missing

Type 3: Choosing Just Some Cases

Type 4: Self- Selection

Type 5: Missing What Matters

Type 7: Changes with Time

Type 8: Definitions of Data

Type 9: Summaries of Data

Type 11: Feedback and Gaming

Type 12: Information Asymmetry

Type 13: Intentionally Darkened Data

Type 14: Fabricated and Synthetic Data

Type 15: Extrapolating beyond Your Data

Providing guidance not only on “how we know,” but also on how we go astray (patho-epistemology), would be helpful for judges and lawyers. Hand’s book is really just a beginning toward helping gatekeepers appreciate how superficially plausible health-effects claims are invalidated by the data relied upon by proffered expert witnesses.
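
A toy example, with assumed numbers, shows one of Hand’s failure modes in miniature: when the chance that a record is observed depends on its value, the average of the “complete cases” drifts away from the truth, and nothing in the observed data announces the problem.

import random

random.seed(3)
values = [random.gauss(100, 15) for _ in range(100_000)]       # true mean is about 100

# Records above 100 are observed 90% of the time; records below 100 only 40% of the time.
observed = [v for v in values if random.random() < (0.9 if v > 100 else 0.4)]

print(round(sum(values) / len(values), 1))       # about 100.0, the full data
print(round(sum(observed) / len(observed), 1))   # about 104 to 105, the data we actually see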

* * * * * * * * * * * *

“There ain’t no room for the hopeless sinner
Who would hurt all mankind, just to save his own, believe me now
Have pity on those whose chances grow thinner”


[1] “Reference Manual on Scientific Evidence v4.0” (Feb. 28, 2021).

[2] Steven Kendall, Joe S. Cecil, Jason A. Cantone, Meghan Dunn, and Aaron Wolf.

[3] Prof. Meng is the Whipple V. N. Jones Professor of Statistics at Harvard University. (“Seeking simplicity in statistics, complexity in wine, and everything else in fortune cookies.”)

[4] Timothy L. Lash, Matthew P. Fox, and Aliza K. Fink, Applying Quantitative Bias Analysis to Epidemiologic Data (2009).

[5] David J. Hand, Dark Data: Why What You Don’t Know Matters (2020).

[6] Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein, Noise: A Flaw in Human Judgment (2021).