TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Manufacturing Consensus

December 9th, 2024

The lawsuit industry is fond of claiming that it is victimized by manufactured doubt;[1] its response has often been to manufacture consensus.[2] Doubt and assent are real psychological phenomena that are removed from the more important epistemic question whether the propositions doubted or agreed to are true, or worthy of belief.

Since at least the adoption of Federal Rule of Evidence 702, the law of evidence in federal courts, and in some state courts, has come to realize that expert witness opinion testimony must be judged by epistemic criteria. Implicit in such judging is that reviewing courts, and finders of fact, must assess the validity of facts and data relied upon, and inferences drawn, by expert witnesses.

Professor Edward Cheng has argued that judges and jurors are epistemically incompetent to engage in the tasks required of them by Rule 702.[3] Cheng would replace Rule 702 with what he calls a consensus rule that requires judges and jurors to assess only whether there is a scientific consensus on general scientific propositions such as claims of causality between a particular exposure and a specific disease outcome.

Cheng’s proposal is not the law; it never has been the law; and it will never be the law. Yet, law professors must make a living, and novelty is often the coin of the academic realm.[4] Cheng teaches at Vanderbilt Law School, and a few years ago, he started a podcast, Excited Utterances, which features some insightful and some antic proposals from the law school professoriate. The podcast is hosted by Cheng, or sometimes by his protégé, G. Alexander Nunn (“Alex”), who is now Associate Professor of Law at Texas A&M University School of Law.

Cheng’s consensus rule has not gained any traction in the law, but it has attracted support from a few like-minded academics. David Caudill, a Professor of Law at the Villanova University Charles Widger School of Law, has sponsored a symposium of supporters.[5] This year, Caudill has published another article that largely endorses Cheng’s consensus rule.[6]

Back in October 2024, Cheng hosted Caudill on Excited Utterances, to talk about his support for Cheng’s consensus rule. The podcast website blurb describes Caudill as having critiqued and improved upon Cheng’s “proposal to have courts defer to expert consensus rather than screening expert evidence through Daubert.” [This is, of course, incorrect. Daubert was one case that interpreted a statute that has since been substantively revised twice. The principle of charity suggests that Nunn meant Federal Rule of Evidence 702.] Alex Nunn conducted the interview of Caudill, which was followed by some comments from Cheng.

If you are averse to reading law review articles, you may find Nunn’s interview of Caudill a more digestible, and time-saving, way to hear a précis of the Cheng-Caudill assault on scientific fact finding in court. You will have to tolerate, however, Nunn’s irrational exuberance over how the consensus rule is “cutting edge,” and “wide ranging,” and Caudill’s endorsement of the consensus rule as “really cool,” and his dismissal of the Daubert case as “infamous.”

Like Cheng, Caudill believes that we can escape the pushing and shoving over data and validity by becoming nose counters. The task, however, will not be straightforward. Many litigations begin before there is any consensus on one side or the other. No one seems to agree how to handle such situations. Some litigations begin with an apparent consensus, but then shift dramatically with the publication of a mega-trial or a definitive systematic review. Some scientific issues remain intractable to easy resolution, and the only consensuses exist within partisan enclaves.

Tellingly, Caudill moves from the need to discern “consensus” to mere “majority rule.” Having litigated health effects claims for 40 years or so, I have no idea of how we tally support for one view over another. Worse yet, Caudill acknowledges that judges and jurors will need expert assistance in identifying consensus. Perhaps litigants will indeed be reduced to calling librarians, historians, and sociologists of science, but such witnesses will not necessarily be able to access, interpret, and evaluate the underlying facts, data, and inferences relevant to the controversy. Cheng and Caudill appear to view this willful blindness as a feature, not a bug, but their whole enterprise works in derogation of the goal of evidence law to determine the truth of the matter.[7]

Robust consensus that exists over an extended period of time – in the face of severe testing of the challenged claim – may have some claim to track the truth of the matter. Cheng and Caudill, however, fail to deal with the situation that results when the question is called among the real experts, and the tally is 51 to 49 percent. Or worse yet, 40% versus 38%, with 22% disclaiming having looked at the issue sufficiently. Cheng and Caudill are left with asking the fact finder to guess what the consensus will be when the scientific community sees the evidence that it has not yet studied or that does not yet exist.

Perhaps the most naïve feature of the Cheng-Caudill agenda is the notion that consensus bubbles up from the pool of real experts without partisan motivations. As though there is not already enough incentive to manufacture consensus, Cheng’s and Caudill’s approach will cause a proliferation of conferences that label themselves “consensus” forming meetings, which will result in self-serving declarations of – you guessed it – consensuses.[8]

Perhaps more important from a jurisprudential view is that the whole process of identifying a consensus has the normative goal of pushing listeners into believing that the consensus has the correct understanding so that they do not have to think very hard. We do not really care about the consensus; we care about the issue that underlies the alleged consensus. At best, when it exists, consensus is a proxy for truth. Without evidence, Caudill asserts that the proxy will be correct virtually all the time. In any event, a carefully reasoned and stated consensus view would virtually always make its way to the finder of fact in litigation in the form of a “learned treatise,” with which partisan expert witnesses would disagree at their peril.


[1] See, e.g., David Michaels, Doubt is Their Product: How Industry’s War on Science Threatens Your Health (2008); David Michaels, “Manufactured Uncertainty: Protecting Public Health in the Age of Contested Science and Product Defense,” 1076 Ann. N.Y. Acad. Sci. 149 (2006); David Michaels, “Mercenary Epidemiology – Data Reanalysis and Reinterpretation for Sponsors with Financial Interest in the Outcome,” 16 Ann. Epidemiol. 583 (2006); David Michaels & Celeste Monforton, “Manufacturing Uncertainty: Contested Science and the Protection of the Public’s Health and Environment,” 95 Amer. J. Public Health S39 (2005); David Michaels, “Doubt is their Product,” 292 Sci. Amer. 74 (June 2005).

[2] See generally Edward S. Herman & Noam Chomsky, Manufacturing Consent (1988); Schachtman, “The Rise of Agnothology as Conspiracy Theory,” Tortini (Aug. 21, 2022).

[3] Edward K. Cheng, “The Consensus Rule: A New Approach to Scientific Evidence,” 75 Vanderbilt L. Rev. 407 (2022); see Schachtman, “Cheng’s Proposed Consensus Rule for Expert Witnesses,” Tortini (Sept. 15, 2022); “Further Thoughts on Cheng’s Consensus Rule,” Tortini (Oct. 3, 2022); “Consensus Rule – Shadows of Validity,” Tortini (Apr. 26, 2023); “Consensus is Not Science,” Tortini (Nov. 8, 2023).

[4] Of possible interest, David Madigan, a statistician who has frequently been involved in litigation for the lawsuit industry, and who has proffered some particularly dodgy analyses, was Professor Cheng’s doctoral dissertation advisor. See Schachtman, “Madigan’s Shenanigans & Wells Quelled in Incretin-Mimetic Cases,” Tortini (July 19, 2022); “David Madigan’s Graywashed Meta-Analysis in Taxotere MDL,” Tortini (June 19, 2020); “Disproportionality Analyses Misused by Lawsuit Industry,” Tortini (April 20, 2020); “Johnson of Accutane – Keeping the Gate in the Garden State,” Tortini (March 28, 2015).

[5] David S. Caudill, “The ‘Crisis of Expertise’ Reaches the Courtroom: An Introduction to the Symposium on, and a Response to, Edward Cheng’s Consensus Rule,” 67 Villanova L. Rev. 837 (2023).

[6] David S. Caudill, Harry Collins & Robert Evans, “Judges Should Be Discerning Consensus, Not Evaluating Scientific Expertise,” 92 Univ. Cinn. L. Rev. 1031 (2024).

[7] See, e.g., Jorge R Barrio, “Consensus Science and the Peer Review,” 11 Molecular Imaging & Biol. 293 (2009) (“scientific reviewers of journal articles or grant applications – typically in biomedical research – may use the term (e.g., ‘….it is the consensus in the field…’) often as a justification for shutting down ideas not associated with their beliefs.”); Yehoshua Socol, Yair Y Shaki & Moshe Yanovskiy, “Interests, Bias, and Consensus in Science and Regulation,” 17 Dose Response 1 (2019) (“While appealing to scientific consensus is a legitimate tool in public debate and regulatory decisions, such an appeal is illegitimate in scientific discussion itself.”); Neelay Trivedi, “Science is about Evidence, not Consensus,” The Stanford Rev. (Feb. 25, 2021).

[8] For a tendentious example of such a claim of manufactured consensus, see David Healy, “Manufacturing Consensus,” 34 The Hastings Center Report 52 (2004); David Healy, “Manufacturing Consensus,” 30 Culture, Medicine & Psychiatry 135 (2006).

Professor Lahav’s Radically Misguided Treatment of Chancy Tort Causation

September 27th, 2024

In the 19th and early 20th century, scientists and lay people usually conceptualized causation as “deterministic.” Their model of science was perhaps what was called Newtonian, in which observations were invariably described in terms of identifiable forces that acted upon antecedent phenomena. The universe was akin to a pool table, with the movement of the billiard balls fully explained by their previous positions, mass, and movements. There was little need for probability to describe events or outcomes in such a universe.

The 20th century ushered in probabilistic concepts and models in physics and biology. Because tort law is so focused on claims of bodily integrity and harms, I am focused here on claimed health effects. Departing from the Koch-Henle postulates and our understanding of pathogen-based diseases, the latter half of the 20th century saw the rise of observational epidemiology and scientific conclusions about stochastic processes and effects that could best be understood in terms of probabilities, with statistical inferences from samples of populations. The language of deterministic physics failed to do justice to epidemiologic evidence or conclusions. Modern medicine and biology invoked notions of base rates for chronic diseases, which rates might be modified by environmental exposures.

In the wake of the emerging science of epidemiology, the law experienced a new horizon on which many claimed tortogens did not involve exposures uniquely tied to the harms alleged. Rather, the harms asserted were often diseases of ordinary life, but with evidence suggesting that the harms were quantitatively more prevalent or incident among people exposed to the alleged tortogen. Of course, the backwaters of tort law saw reactionary world views on trial, as with claims of trauma-induced cancer cases, which are with us still. Nonetheless, slowly but not always steadily, the law came to grips with probability and statistical evidence.

In law, as in science, a key component of causal attribution is counterfactual analysis. If A causes B, then in the same world, ceteris paribus, if we do not have A, then we do not have B. Counterfactual analysis applies as much to stochastic processes that are causally influenced by rate changes as it does to the Newtonian world of billiard balls. Some writers in the legal academy, however, would opportunistically use the advent of probabilistic analyses of health effects to dispose of science altogether. No one has more explicitly exploited the opportunity than Professor Alexandra Lahav.

In an essay published in 2022, Professor Lahav advanced extraordinary claims about probabilistic causation, or what she called “chancy causation.”[1] The proffered definition of chancy causation is bumfuzzling. Lahav provides an example of an herbicide that is “associated” with the type of cancer that the heavily exposed plaintiff developed. She tells us that:

“[t]here is a chance that the exposure caused his cancer, and a chance that it did not. Probability follows certain rules, or tendencies, but these regular laws do not abolish chance. This is a common problem in modern life, where much of what we know about medicines, interventions, and the chemicals to which we are exposed is probabilistic. Following the philosophical literature, I call this phenomenon chancy causation.”[2]

So the rules of probability do not abolish chance? It is hard to know what Lahav is trying to say here. Probability quantifies chance, and gives us an understanding of phenomena and their predictability. When we can model an empirical process with a probability distribution, such as one generating independent and identically distributed observations, we can often make and test quantitative inferences about the anticipated phenomena.

Lahav vaguely acknowledges that her term, “chancy causation” is borrowed, but she does not give credit to the many authors who have used it before.[3] Lahav does note that the concept of probabilistic causation used in modern-day risk factor epidemiology is different from the deterministic causal claims that dominated tort law in the 19th and the first half of the 20th century. Lahav claims that chancy causation is inconsistent with counterfactual analysis, but she cites no support for her claim, which is demonstrably false. If we previously saw the counterfactual of if A then B, as key to causality, we can readily restate the counterfactual as a probability: A probably causes B. On a counterfactual analysis, then if we do not have A as an antecedent, then we probably do not have B. For a classic tortogen such as tobacco smoking, we can say confidently that tobacco smoking probably causes lung cancer. And for a given instance of lung cancer, we can say based upon the entire evidentiary display, that if a person did not smoke tobacco, he would probably not have developed lung cancer. Of course, the correspondence is not 100 percent, which is only to say that it is probabilistic. There are highly penetrant genetic mutations that may be the cause of a given lung cancer case. We know, however, that such mutations do not cause or explain the large majority of lung cancer cases.

Contrary to Lahav’s ipse dixits, tort law can incorporate, and has accommodated, both general and specific causation in terms of probabilistic counterfactuals. The modification requires us, of course, to address the baseline situation as a rate or frequency of events, and the post-exposure world as one with a modified rate or frequency. Without confusion or embarrassment, we can say that the exposure is the cause of the change in event rates. Modern physics similarly addresses whether we must be content with probability statements, rather than precise deterministic “billiard ball” physics, which is so useful in a game of snooker, but less so in describing the position of sub-atomic particles. In the first half of the 20th century, the biological sciences learned with some difficulty that it must embrace probabilistic models, in genetic science, as well as in epidemiology. Many biological causation models are completely stated in terms of probabilities that are modified by specified conditions.

Lahav intends for her rejection of counterfactual causality to do a lot of work in her post-modern program. By falsely claiming that chancy causation has no factual basis, Lahav jumps to the conclusion that what the law calls for is nothing but “policy,”[4] and “normative decision.”[5] Having fabricated the demise of but-for causation in the context of probabilistic relationships, Lahav suggests that tort law can pretend that the causation question is nothing more than a normative analysis of the defendant’s conduct. (Perhaps it is more than a tad revealing that she does not see that the plaintiff’s conduct is involved in the normative judgment.) Of course, tort law already has ample room for policy and normative considerations built into the concepts of duty and breach of duty.

As we saw with the lung cancer example above, the claim that tobacco smoking probably caused the smoker to develop lung cancer can be entirely factual, and supported by a probabilistic judgment. Lahav calls her erroneous move “pragmatic,” although it has no relationship to the philosophical pragmatism of Peirce or Quine. Lahav’s move is a misrepresentation of probability and of epidemiologic science in the name of compensation free-for-alls. Obtaining a heads in the flip of a fair coin has a probability of 50%; that is a fact, not a normative decision, even though it is, to use Lahav’s vocabulary, “chancy.”

Lahav’s argument is not always easy to follow. In one place, she uses “chancy” to refer to the posterior probability of the correctness of the causal claim:

“the counterfactual standard can be successfully defended against by the introduction of chance. The more conflicting studies, the “more chancy” the causation. By that I do not mean proving a lower probability (although this is a good result from a defense point of view) but rather that more, different study results create the impression of irreducible chanciness, which in turn dictates that the causal relation cannot be definitively proven.”[6]

This usage, which clearly refers to the posterior probability of a claim, is not necessarily limited to so-called non-deterministic phenomena. People could refer to any conclusion, based upon conflicting evidence of deterministic phenomena, as “chancy.”

Lurking in her essay is a further confusion between the posterior probability we might assign to a claim, or to an inference from probabilistic evidence, and the probability of random error. In an interview conducted by Felipe Jiménez,[7] Lahav was more transparent in her confusion, and she explicitly committed the transpositional fallacy with her suggestion that customary statistical standards (statistical significance) ensure that even small increased risks, say of 30%, are known to a high degree of certainty.

Despite these confusions, it seems fairly clear that Lahav is concerned with stochastic causal processes, and most of her examples evidence that concern. Lahav poses a hypothetical in which epidemiologic studies show smokers have a 20% increased risk of developing lung cancer compared with non-smokers.[8] Given that typical smoking histories convey relative risks of 20 to 30, or increased risks of 2,000 to 3,000%, Lahav’s hypothetical may make readers think she is shilling for tobacco companies. In any event, in the face of a 20% increased risk (or relative risk of 1.2), Lahav acknowledges that the probability of a smoker’s developing lung cancer is higher than that of a non-smoker, but “in any particular case the question whether a patient’s lung cancer was caused by smoking is uncertain.” This assertion, however, is untrue; the question is not “uncertain.” She has provided a certain quantification of the increased risk. Furthermore, her hypothetical gives us a good deal of information on which we can say that smoking probably did not result in the patient’s lung cancer. Causation may be chancy because it is based upon a probabilistic inference, but the chances are actually known, and they are low.
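Lahav’s own numbers make the point concrete. Under the conventional attributable-fraction formula, AF = (RR − 1)/RR, a relative risk of 1.2 implies a low but perfectly knowable probability of attribution, while a realistic smoking relative risk of 20 implies a very high one. A minimal sketch (the function name is mine; the formula is the standard one):

```python
# Probability that an exposed case is an "excess" case, using the
# conventional attributable-fraction formula AF = (RR - 1) / RR.
def attributable_fraction(rr: float) -> float:
    if rr <= 1:
        return 0.0  # no excess risk, nothing to attribute
    return (rr - 1) / rr

# Lahav's hypothetical 20% increased risk (RR = 1.2):
print(round(attributable_fraction(1.2), 3))   # 0.167 -- low, and known
# A realistic smoking relative risk of 20:
print(round(attributable_fraction(20.0), 3))  # 0.95 -- high, and known
```

The probabilities are facts about the hypothetical, not normative choices; only the magnitude of the relative risk drives them.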

Lahav posits a more interesting hypothetical when she considers a case in which there is an 80% chance that a person’s lung cancer is attributable to smoking.[9] We can understand this hypothetical better if we reframe it as a classic urn probability problem. In a given (large) population of non-smokers, we expect 100 lung cancers per year. In a population of smokers, otherwise just like the population of non-smokers, we observe 500 lung cancers. Of the observed number, 100 were “expected” because they happen without exposure to the putative causal agent, and 400 are “excess.” The relative risk would be 5, or a 400% increased risk, still well below the actual measure of risk from long-term smoking, but the attributable risk would be [(RR-1)/RR] or 0.8 (or 80%). If we imagine an urn with 100 white “expected” balls, and 400 red “excess” balls added, then any given draw from the urn, with replacement, yields an 80% probability of a red ball, or an excess case. Of course, if we can see the color, we may come to a consensus judgment that the ball is actually red. But on our analogy to discerning the cause of a given lung cancer, we have at present nothing by way of evidence with which to call the question, and so it remains “chancy” or probabilistic. The question is not, however, in any way normative. The answer is different quantitatively in the 20% and in the 400% hypotheticals.

Lahav asserts that we are in a state of complete ignorance once a smoker has lung cancer.[10] This is not, however, true. We have the basis for a probabilistic judgment that will probably be true. It may well be true that the probability of attribution will be affected by the probability that the relative risk = 5 is correct. If the posterior probability for the claim that smoking causes lung cancer by increasing its risk 400% is only 30%, then of course, we could not make the attribution in a given case with an 80% probability of correctness. In actual litigation, the argument is often framed on an assumption arguendo that the increased risk is greater than two, so that only the probability of attribution is involved. If the posterior probability of the claim that exposure to the tortogen increased risk by 400% or 20,000% was only 0.49, then the plaintiff would lose. If the posterior probability of the increased risk was greater than 0.5, the finder of fact could find that the specific causation claim had been carried if the magnitude of the relative risk, and the attributable risk, were sufficiently large. This inference on specific causation would not be a normative judgment; it would be guided by factual evidence about the magnitude of the relevant increased risk.

Lahav advances a perverse skepticism that any inferences about individuals can be drawn from information about rates or frequencies in groups of similar individuals. Yes, there may always be some debate about what is “similar,” but successive studies may well draw the net tighter around what is the appropriate class. Lahav’s skepticism and her outright denialism about inferences from general causation to specific causation are common among some in the legal academy, but they ignore that group-to-individual inferences are drawn in epidemiology in multiple contexts. Regressions for disease prediction are based upon individual data within groups, and the regression equations are then applied to future individuals to help predict those individuals’ probability of future disease (such as heart attack or breast cancer), or their probability of cancer-free survival after a specific therapy. Group-to-individual inferences are, of course, also the basis for prescribing decisions in clinical medicine. These are not normative inferences; they are based upon evidence-based causal thinking about probabilistic inferences.

In the early tobacco litigation, defendants denied that tobacco smoking caused lung cancer, but they argued that even if it did, and the relative risk were 20, then the specific causation inference in this case was still insecure because the epidemiologic study tells us nothing about the particular case. Lahav seems to be channeling the tobacco-company argument, which has long since been rejected in the substantive law of causation. Indeed, as noted, epidemiologists do draw inferences about individual cases from population-based studies when they invoke clinical prediction models such as the Framingham cardiovascular risk event model, or the Gail breast cancer prediction model. Physicians base important clinical interventions, both pharmacologic and surgical, for individuals upon population studies. Lahav asserts, without evidence, that the only difference between an intervention based upon an 80% or a 30% probability is a “normative implication.”[11] The difference is starkly factual, not normative, and describes a long-term likelihood of success, as well as an individual probability of success.

Post-Modern Causation

What we have in Lahav’s essay is the ultimate post-modern program, which asserts, without evidence, that when causation is “chancy,” or indeterminate, courts leave the realm of science and step into the twilight zone of “normative decisions.” Lahav suggests that there is an extreme plasticity to the very concept of causation such that causation can be whatever judges want it to be. I for one sincerely doubt it. And if judges make up some Lahav-inspired concept of normative causation, the scientific community would rightfully scoff.

Establishing causation can be difficult, and many so-called mass tort litigations have failed for want of sufficient, valid evidence supporting causal claims. The late Professor Margaret Berger reacted to this difficulty in a more forthright way by arguing for the abandonment of general causation, or cause-in-fact, as an element of tort claims under the law.[12] Berger’s antipathy to requiring causation manifested in her hostility to judicial gatekeeping of the validity of expert witness opinions. Her animus against requiring causation and gatekeeping under Rule 702 was so strong that it exceeded her lifespan. Berger’s chapter in the third edition of the Reference Manual on Scientific Evidence, which came out almost one year after her death, embraced the First Circuit’s notorious anti-Daubert decision in Milward, which also post-dated her passing.[13]

Professor Lahav has previously expressed a disdain for the causation requirement in tort law. In an earlier paper, “The Knowledge Remedy,” Lahav argued for an extreme, radical precautionary principle approach to causation.[14] Lahav believes that the likes of David Michaels have “demonstrated” that manufactured uncertainty is a genuine problem, but not one that affects her main claims. Remarkably, Lahav sees no problem with manufactured certainty in the advocacy science of many authors for the lawsuit industry.[15] In “Chancy Causation,” Lahav thus credulously repeats Michaels’ arguments, and goes so far as to describe Rule 702 challenges to causal claims as having the “negative effect” of producing “incentives to sow doubt about epidemiologic studies using methodological battles, a strategy pioneered by the tobacco companies … .”[16] Lahav’s agenda is revealed by the absence of any corresponding concern about the negative effect of producing incentives to overstate the findings, or the validity of inferences, in order to obtain unwarranted and unsafe verdicts for claimants.


[1] Alexandra D. Lahav, “Chancy Causation in Tort,” 15 J. Tort L. 109 (2022) [hereafter Chancy Causation].

[2] Chancy Causation at 110.

[3] See, e.g., David K. Lewis, Philosophical Papers: Volume 2 175 (1986); Mark Parascandola, “Evidence and Association: Epistemic Confusion in Toxic Tort Law,” 63 Phil. Sci. S168 (1996).

[4] Chancy Causation at 109.

[5] Chancy Causation at 110-11.

[6] Chancy Causation at 129.

[7] Felipe Jiménez, “Alexandra Lahav on Chancy Causation in Tort,” The Private Law Podcast (Mar. 29, 2021).

[8] Chancy Causation at 115.

[9] Chancy Causation at 116-17.

[10] Chancy Causation at 117.

[11] Chancy Causation at 119.

[12] Margaret A. Berger, “Eliminating General Causation: Notes towards a New Theory of Justice and Toxic Torts,” 97 Colum. L. Rev. 2117 (1997).

[13] Milward v. Acuity Specialty Products Group, Inc., 639 F.3d 11 (1st Cir. 2011), cert. denied sub nom., U.S. Steel Corp. v. Milward, 132 S. Ct. 1002 (2012).

[14] Alexandra D. Lahav, “The Knowledge Remedy,” 98 Texas L. Rev. 1361 (2020). See “The Knowledge Remedy Proposal,” Tortini (Nov. 14, 2020).

[15] Chancy Causation at 118 (citing plaintiffs’ expert witness David Michaels, The Triumph of Doubt: Dark Money and the Science of Deception (2020), among others).

[16] Chancy Causation at 129.

800 Plaintiffs Fail to Show that Glyphosate Caused Their NHL

September 11th, 2024

Last week, Barbara Billauer, at the American Council on Science and Health[1] website, reported on the Australian court that found insufficient scientific evidence to support plaintiffs’ claims that they had developed non-Hodgkin’s lymphoma (NHL) from their exposure to Monsanto’s glyphosate product. The judgment had previously been reported by the Genetic Literacy Project,[2] which republished an Australian news report from July.[3] European news media seemed more astute in reporting the judgment, with The Guardian[4] and Reuters reporting the court decision in July.[5] The judgment was noteworthy because the mainstream and legal media in the United States generally ignored the development.  The Old Gray Lady and the WaPo in the United States, both of which have covered previous glyphosate cases in the United States, sayeth naught. Crickets at Law360.

On July 24, 2024, Justice Michael Lee, for the Federal Court of Australia, ruled that there was insufficient evidence to support the claims of 800 plaintiffs that their NHL had been caused by glyphosate exposure.[6] Because plaintiffs’ claims were aggregated in a class, the judgment against the class of 800 or so claimants, was the most significant judgment in glyphosate litigation to date.

Justice Lee’s opinion is over 300 pages long, and I have had a chance only to skim it. Regardless of how the Australian court handled various issues, one thing is indisputable: the court has given a written record of its decision processes for the world to assess, critique, validate, or refute. Jury trials provide no similar opportunity to evaluate the reasoning processes (vel non) of the decision maker. The absence of transparency, and an opportunity to evaluate the soundness of verdicts in complex medical causation, raises the question whether jury trials really satisfy the legal due process requirements of civil adjudication.


[1] Barbara Pfeffer Billauer, “The RoundUp Judge Who Got It,” ACSH (Aug. 29, 2024).

[2] Kristian Silva, “Insufficient evidence that glyphosate causes cancer: Australian court tosses 800-person class action lawsuit,” ABC News (Australia) (July 26, 2024).

[3] Kristian Silva, “Major class action thrown out as Federal Court finds insufficient evidence to prove weedkiller Roundup causes cancer,” ABC Australian News (July 25, 2024).

[4] Australian Associated Press, “Australian judge dismisses class action claiming Roundup causes cancer,” The Guardian (July 25, 2024).

[5] Peter Hobson and Alasdair Pal, “Australian judge dismisses lawsuit claiming Bayer weedkiller causes blood cancer,” Reuters (July 25, 2024).

[6] McNickle v. Huntsman Chem. Co. Australia Pty Ltd (Initial Trial) [2024] FCA 807.

Zhang’s Glyphosate Meta-Analysis Succumbs to Judicial Scrutiny

August 5th, 2024

Back in March 2015, the International Agency for Research on Cancer (IARC) issued its working group’s monograph on glyphosate weed killer. The report classified glyphosate as a “probable carcinogen,” which is highly misleading. For IARC, the term “probable” does not mean more likely than not, or for that matter, probable does not have any quantitative meaning. The all-important statement of IARC methods, “The Preamble,” makes this clear.[1] 

In the case of glyphosate, the IARC working group concluded that the epidemiologic evidence for an association between glyphosate exposure and cancer (specifically non-Hodgkin lymphoma (NHL)) was “limited,” which is IARC’s euphemism for insufficient. Instead of epidemiology, IARC’s glyphosate conclusion rested largely upon rodent studies, but even the animal evidence relied upon by IARC was dubious. The IARC working group cherry-picked a few arguably “positive” rodent study results with increases in tumors, while ignoring exculpatory rodent studies with decreasing tumor yield.[2]

Although the IARC hazard classification was uncritically embraced by the lawsuit industry, most regulatory agencies, even indulging precautionary principle reasoning, rejected the claim of carcinogenicity. The United States Environmental Protection Agency (EPA), the European Food Safety Authority, the Food and Agriculture Organization (in conjunction with the World Health Organization), the European Chemicals Agency, Health Canada, and the German Federal Institute for Risk Assessment, among others, found that the scientific evidence did not support the claim that glyphosate causes NHL. Very quickly after publication, the IARC monograph became the proximate cause of a huge litigation effort by the lawsuit industry against Monsanto.

The personal injury cases against Monsanto, filed in federal court, were aggregated for pre-trial hearing, before Judge Vince Chhabria, of the Northern District of California, as MDL 2741. Judge Chhabria denied Monsanto’s early Rule 702 motions, and thus cases proceeded to trial, with mixed results.

In 2019, the Zhang study, a curious meta-analysis of some of the available glyphosate epidemiologic studies, appeared in Mutation Research / Reviews in Mutation Research, a toxicology journal that seemed an unlikely venue for a meta-analysis of epidemiologic studies. The authors combined selected results from one large cohort study, the Agricultural Health Study, with five case-control studies, to reach a summary relative risk of 1.41 (95% confidence interval, 1.13-1.75).[3] According to the authors, their “current meta-analysis of human epidemiological studies suggests a compelling link between exposures to GBHs [glyphosate-based herbicides] and increased risk for NHL.”
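As a sanity check on the published numbers (and nothing more), the reported summary estimate can be examined on the log scale, where a Wald-type confidence interval should sit roughly symmetrically about the point estimate. The sketch below uses only the figures reported in the Zhang paper:

```python
import math

rr, lo, hi = 1.41, 1.13, 1.75  # Zhang et al. summary relative risk and 95% CI

# On the log scale, a Wald-type confidence interval is symmetric
# about the point estimate, so the two half-widths should agree.
log_rr = math.log(rr)
half_lower = log_rr - math.log(lo)
half_upper = math.log(hi) - log_rr
print(f"half-widths: {half_lower:.3f} vs. {half_upper:.3f}")

# The implied standard error of the log relative risk.
se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
print(f"implied SE of log RR: {se:.3f}")
```

The half-widths agree to within rounding, which is what one would expect of an inverse-variance summary estimate reported with its Wald interval.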

The Zhang meta-analysis was not well received in regulatory and scientific circles. The EPA found that Zhang used inappropriate methods in her meta-analysis.[4] Academic authors also panned the Zhang meta-analysis, in both scholarly[5] and popular articles.[6] The senior author of the Zhang paper, Lianne Sheppard, a Professor in the University of Washington Departments of Environmental and Occupational Health Sciences, and Biostatistics, attempted to defend the study in Forbes.[7] Professor Geoffrey Kabat very adeptly showed that this defense was futile.[8] Despite the very serious and real objections to the validity of the Zhang meta-analysis, plaintiffs’ expert witnesses, such as Beate Ritz, an epidemiologist at U.C.L.A., testified that she trusted and relied upon the analysis.[9]

For five years, the Zhang study was a debating point for lawyers and expert witnesses in the glyphosate litigation, without significant judicial gatekeeping. It took the entrance of Luoping Zhang herself as an expert witness in the glyphosate litigation, and the procedural oddity of her placing exclusive reliance upon her own meta-analysis, to bring the meta-analysis into the unforgiving light of judicial scrutiny.

Zhang is a biochemist and toxicologist at the University of California, Berkeley. Along with two other co-authors of her 2019 meta-analysis paper, she had been a member of the EPA’s 2016 scientific advisory panel on glyphosate. After plaintiffs’ counsel disclosed Zhang as an expert witness, she disclosed her anticipated testimony, as required by Federal Rule of Civil Procedure 26, by attaching and adopting by reference the contents of two of her published papers. The first paper was her 2019 meta-analysis; the other paper discussed putative mechanisms. Neither paper concluded that glyphosate causes NHL. Zhang’s disclosure did not add materially to her 2019 published analysis of six epidemiologic studies on glyphosate and NHL.

The defense challenged the validity of Dr. Zhang’s proffered opinions, and her exclusive reliance upon her own 2019 meta-analysis required the MDL court to pay attention to the failings of that paper, which had previously escaped critical judicial scrutiny. In June 2024, after an oral hearing in Bulone v. Monsanto, at which Dr. Zhang testified, Judge Chhabria ruled that Zhang’s proffered testimony, and her reliance upon her own meta-analysis was “junk science.”[10]

Judge Chhabria, perhaps encouraged by the recent fortifying amendment to Rule 702, issued a remarkable opinion that paid close attention to the indicia of validity of an expert witness’s opinion and the underlying meta-analysis. Judge Chhabria quickly spotted the disconnect between Zhang’s published papers and what is required for an admissible causation opinion. The mechanism paper did not address the extant epidemiology, and both sides in the MDL had emphasized that the epidemiology was critically important for determining whether there was, or was not, causation.

Zhang’s meta-analysis did evaluate some, but not all, of the available epidemiology, but the paper’s conclusion stopped considerably short of the needed opinion on causation. Zhang and colleagues had concluded that there was a “compelling link” between exposures to [glyphosate-based herbicides] and increased risk for NHL. In their paper’s key figure, showcasing the summary estimate of relative risk of 1.41 (95% C.I., 1.13-1.75), Zhang and her co-authors concluded only that exposure was “associated with an increased risk of NHL.” According to Judge Chhabria, in incorporating her 2019 paper into her Rule 26 report, Zhang failed to add a proper holistic causation analysis, as had other expert witnesses who had considered the Bradford Hill predicates and considerations.

Judge Chhabria picked up on another problem, one with both legal and scientific implications. A meta-analysis is out of date as soon as a subsequent epidemiologic study that would have satisfied its inclusion criteria becomes available. Since the publication of her meta-analysis in 2019, additional studies had in fact appeared. At the hearing, Dr. Zhang acknowledged that several of them would qualify for inclusion in the meta-analysis, per her own stated methods. Her failure to update the meta-analysis made her report incomplete and inadmissible for a court matter in 2024.

Judge Chhabria might have stopped there, but he took a closer look at the meta-analysis to explore whether it was a valid analysis, on its own terms. Much as Chief Judge Nancy Rosenstengel had done with the made-for-litigation meta-analysis concocted by Martin Wells in the paraquat litigation,[11] Judge Chhabria examined whether Zhang had been faithful to her own stated methods. Like Chief Judge Rosenstengel’s analysis, Judge Chhabria’s analysis stands as a strong rebuttal to the uncharitable opinion of Professor Edward Cheng, who has asserted that judges lack the expertise to evaluate the “expert opinions” before them.[12]

Judge Chhabria accepted the intellectual challenge that Rule 702 mandates. With the EPA memorandum lighting the way, Judge Chhabria readily discerned that “the challenged meta-analysis was not reliably performed.” He declared that the Zhang meta-analysis was “junk science,” with “deep methodological problems.”

Zhang claimed that she was basing the meta-analysis on the subgroups of six studies with the heaviest glyphosate exposure. This claim was undermined by the absence of any exposure-response gradient in the study deemed by Zhang to be of the highest quality. Furthermore, of the remaining five studies, three studies failed to provide any exposure-dependent analysis other than a comparison of NHL rates among “ever” versus “never” glyphosate exposure. As a result of this heterogeneity, Zhang used all the data from studies without exposure characterizations, but only limited data from the other studies that analyzed NHL by exposure levels. And because the highest quality study was among those that provided exposure level correlations, Zhang’s meta-analysis used only some of the data from it.

The analytical problems created by Zhang’s meta-analytical approach were compounded by the included studies’ having measured glyphosate exposures differently, with different cut-points for inclusion as heavily exposed. Some of the excluded study participants would have had heavier exposure than those included in the summary analysis.

In the universe of included studies, some provided adjusted results from multivariate analyses that included other pesticide exposures. Other studies reported only unadjusted results. Even though Zhang’s method stated a preference for adjusted analyses, she inexplicably failed to use adjusted data in the case of one study that provided both adjusted and unadjusted results.

As shown in Judge Chhabria’s review, Zhang’s methodological errors created an incoherent analysis, with methods that could not be justified. Even accepting its own stated methodology, the meta-analysis was an exercise in cherry picking. In the court’s terms, it was, without qualification, “junk science.”

After the filing of briefs, Judge Chhabria provided the parties an oral hearing, with an opportunity for viva voce testimony. Dr. Zhang thus had a full opportunity to defend her meta-analysis. The hearing, however, did not go well for her. Zhang could not talk intelligently about the studies included, or how they defined high exposure. Zhang’s lack of familiarity with her own opinion and published paper was yet a further reason for excluding her testimony.

As might be expected, plaintiffs’ counsel attempted to hide behind peer review, arguing that the trial court had no business digging into validity concerns because Zhang had published her meta-analysis in what was apparently a peer-reviewed journal. Judge Chhabria would have none of it. In his opinion, publication in a peer-reviewed journal cannot obscure the glaring methodological defects of the relied-upon meta-analysis. The court observed that “[p]re-publication editorial peer review, just by itself, is far from a guarantee of scientific reliability.”[13] The EPA memorandum was thus a more telling indicator of the validity issues than publication in a nominally peer-reviewed journal.

Contrary to some law professors who are now seeking to dismantle expert witness gatekeeping as beyond a judge’s competence, Judge Chhabria dismissed the suggestion that he lacked the expertise to adjudicate the validity issues. Indeed, he displayed a better understanding of the meta-analytic process than did Dr. Zhang. As the court observed, one of the goals of MDL assignments was to permit a single trial judge to have time to engage with the scientific issues and to develop “fluency” in the relevant scientific studies. Indeed, when MDL judges have the fluency in the scientific concepts to address Rule 702 or 703 issues, it would be criminal for them to ignore it.

The Bulone opinion should encourage lawyers to get “into the weeds” of expert witness opinions. There is nothing that a little clear thinking – and glyphosate – cannot clear away. Indeed, now that the weeds of Zhang’s meta-analysis are cleared away, it is hard to fathom that any other expert witness can rely upon it without running afoul of both Federal Rules of Evidence 702 and 703.

There were a few issues not addressed in Bulone. As seen in her oral hearing testimony, Zhang probably lacked the qualifications to proffer the meta-analysis. The bar for qualification as an expert witness, however, is sadly very low. One other issue that might well have been addressed is Zhang’s use of a fixed effect model for her meta-analysis. Considering that she was pooling data from cohort and case-control studies, some with and some without adjustments for confounders, with different measures of exposure, and some with and some without exposure-dependent analyses, Zhang and her co-authors were not justified in using a fixed effect model for arriving at a summary estimate of relative risk. Admittedly, this error could easily have been lost in the flood of others.
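The fixed-versus-random-effects point can be made concrete with a small sketch. The numbers below are invented for illustration, not the actual glyphosate study data: a fixed effect model weights each study solely by the inverse of its sampling variance, while a DerSimonian-Laird random effects model adds an estimated between-study variance to every study’s variance, tempering the dominance of the most precise study when the studies are heterogeneous:

```python
import math

# Invented log relative risks and standard errors for six studies;
# illustrative only, not the actual glyphosate study data.
log_rr = [math.log(x) for x in (2.36, 1.12, 1.08, 1.85, 1.25, 1.01)]
se = [0.35, 0.15, 0.20, 0.40, 0.30, 0.10]

# Fixed effect model: weight by inverse variance, assuming one common effect.
w_fixed = [1 / s**2 for s in se]
pooled_fixed = sum(w * y for w, y in zip(w_fixed, log_rr)) / sum(w_fixed)

# DerSimonian-Laird random effects model: estimate the between-study
# variance (tau^2) from Cochran's Q, and add it to each study's variance.
q = sum(w * (y - pooled_fixed) ** 2 for w, y in zip(w_fixed, log_rr))
c = sum(w_fixed) - sum(w**2 for w in w_fixed) / sum(w_fixed)
tau2 = max(0.0, (q - (len(log_rr) - 1)) / c)
w_rand = [1 / (s**2 + tau2) for s in se]
pooled_rand = sum(w * y for w, y in zip(w_rand, log_rr)) / sum(w_rand)

print(f"fixed effect summary RR:   {math.exp(pooled_fixed):.2f}")
print(f"random effects summary RR: {math.exp(pooled_rand):.2f}")
```

With heterogeneous inputs such as these, tau² comes out positive and the random effects weights are more nearly equal across studies, which is precisely why the choice of model can change the summary estimate.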

Postscript

Glyphosate is not merely a scientific issue. Its manufacturer, Monsanto, is the frequent target of media outlets (such as Telesur) from autocratic countries, such as Communist China and its client state, Venezuela.[14]

Long live the heroes of Tiananmen Square.


[1] “The IARC-hy of Evidence – Incoherent & Inconsistent Classifications of Carcinogenicity,” Tortini (Sept. 19, 2023).

[2] Robert E Tarone, “On the International Agency for Research on Cancer classification of glyphosate as a probable human carcinogen,” 27 Eur. J. Cancer Prev. 82 (2018).

[3] Luoping Zhang, Iemaan Rana, Rachel M. Shaffer, Emanuela Taioli, Lianne Sheppard, “Exposure to glyphosate-based herbicides and risk for non-Hodgkin lymphoma: A meta-analysis and supporting evidence,” 781 Mutation Research/Reviews in Mutation Research 186 (2019).

[4] David J. Miller, Acting Chief Toxicology and Epidemiology Branch Health Effects Division, U.S. Environmental Protection Agency, Memorandum to Christine Olinger, Chief Risk Assessment Branch I, “Glyphosate: Epidemiology Review of Zhang et al. (2019) and Leon et al. (2019) publications for Response to Comments on the Proposed Interim Decision” (Jan. 6, 2020).

[5] Geoffrey C. Kabat, William J. Price, Robert E. Tarone, “On recent meta-analyses of exposure to glyphosate and risk of non-Hodgkin’s lymphoma in humans,” 32 Cancer Causes & Control 409 (2021).

[6] Geoffrey Kabat, “Paper Claims A Link Between Glyphosate And Cancer But Fails To Show Evidence,” Science 2.0 (Feb. 18, 2019).

[7] Lianne Sheppard, “Glyphosate Science is Nuanced. Arguments about it on the Internet? Not so much,” Forbes (Feb. 20, 2020).

[8] Geoffrey Kabat, “EPA Refuted A Meta-Analysis Claiming Glyphosate Can Cause Cancer And Senior Author Lianne Sheppard Doubled Down,” Science 2.0 (Feb. 26, 2020).

[9] Maria Dinzeo, “Jurors Hear of New Study Linking Roundup to Cancer,” Courthouse News Service (April 8, 2019).

[10] Bulone v. Monsanto Co., Case No. 16-md-02741-VC, MDL 2741 (N.D. Cal. June 20, 2024). See Hank Campbell, “Glyphosate legal update: Meta-study used by ambulance-chasing tort lawyers targeting Bayer’s Roundup as carcinogenic deemed ‘junk science nonsense’ by trial judge,” Genetic Literacy Project (June 24, 2024).

[11] In re Paraquat Prods. Liab. Litig., No. 3:21-MD-3004-NJR, 2024 WL 1659687 (S.D. Ill. Apr. 17, 2024) (opinion sur Rule 702 motion), appealed sub nom., Fuller v. Syngenta Crop Protection, LLC, No. 24-1868 (7th Cir. May 17, 2024). See “Paraquat Shape-Shifting Expert Witness Quashed,” Tortini (April 24, 2024).

[12] Edward K. Cheng, “The Consensus Rule: A New Approach to Scientific Evidence,” 75 Vanderbilt L. Rev. 407 (2022). See “Cheng’s Proposed Consensus Rule for Expert Witnesses,” Tortini (Sept. 15, 2022); “Further thoughts on Cheng’s Consensus Rule,” Tortini (Oct. 3, 2022).

[13] Bulone, citing Valentine v. Pioneer Chlor Alkali Co., 921 F. Supp. 666, 674-76 (D. Nev. 1996), for its distinction between “editorial peer review” and “true peer review,” with the latter’s inclusion of post-publication assessment of a paper as especially important for Rule 702 purposes.

[14] Anne Applebaum, Autocracy, Inc.: The Dictators Who Want to Run the World 66 (2024).

Paraquat Shape-Shifting Expert Witness Quashed

April 24th, 2024

Another multi-district litigation (MDL) has hit a jarring speed bump. Claims for Parkinson’s disease (PD), allegedly caused by exposure to paraquat dichloride (paraquat), were consolidated, in June 2021, for pre-trial coordination in MDL No. 3004, in the Southern District of Illinois, before Chief Judge Nancy J. Rosenstengel. Like many health-effects litigation claims, the plaintiffs’ claims in these paraquat cases turn on epidemiologic evidence. To present their causation case in the first MDL trial cases, plaintiffs’ counsel nominated a statistician, Martin T. Wells. Last week, Judge Rosenstengel found Wells’ opinion so infected by invalid methodologies and inferences as to be inadmissible under the most recent version of Rule 702.[1] Summary judgment in the trial cases followed.[2]

Back in the 1980s, paraquat gained some legal notoriety in one of the most retrograde Rule 702 decisions.[3] Both the herbicide and Rule 702 survived, however, and both remain in wide use. For the last two decades, there have been widespread challenges to the safety of paraquat, and in particular claims that paraquat can cause PD or parkinsonism under some circumstances. Despite this background, the plaintiffs’ counsel in MDL 3004 began with four problems.

First, paraquat is closely regulated for agricultural use in the United States. Under federal law, paraquat can be used to control the growth of weeds only “by or under the direct supervision of a certified applicator.”[4] The regulatory record created an uphill battle for plaintiffs.[5] Under the Federal Insecticide, Fungicide, and Rodenticide Act (“FIFRA”), the U.S. EPA has regulatory and enforcement authority over the use, sale, and labeling of paraquat.[6] As part of its regulatory responsibilities, in 2019, the EPA systematically reviewed available evidence to assess whether there was an association between paraquat and PD. The agency’s review concluded that “there is limited, but insufficient epidemiologic evidence at this time to conclude that there is a clear associative or causal relationship between occupational paraquat exposure and PD.”[7] In 2021, the EPA issued its Interim Registration Review Decision, and reapproved the registration of paraquat. In doing so, the EPA concluded that “the weight of evidence was insufficient to link paraquat exposure from pesticidal use of U.S. registered products to Parkinson’s disease in humans.”[8]

Second, beyond the EPA, there were no other published reviews, systematic or otherwise, that reached a conclusion that paraquat causes PD.[9]

Third, the plaintiffs’ claims faced another serious impediment. Their counsel placed their reliance upon Professor Martin Wells, a statistician on the faculty of Cornell University. Unfortunately for plaintiffs, Wells has been known to operate as a “cherry picker,” and his methodology has previously been reviewed in an unfavorable light. Another MDL court, which reviewed a review and meta-analysis propounded by Wells, found that his reports “were marred by a selective review of data and inconsistent application of inclusion criteria.”[10]

Fourth, the plaintiffs’ claims were before Chief Judge Nancy J. Rosenstengel, who was willing to do the hard work required under Rule 702, especially as it has recently been amended to clarify and emphasize the gatekeeper’s responsibility to evaluate validity issues in the proffered opinions of expert witnesses. As her 97-page decision evinces, Judge Rosenstengel conducted four days of hearings, which included viva voce testimony from Martin Wells, and she obviously read the underlying papers and reviews, as well as the briefs and the Reference Manual on Scientific Evidence, with great care. What followed did not go well for Wells or the plaintiffs’ claims.[11] Judge Rosenstengel has written an opinion that may be the first careful judicial consideration of the basic requirements of a systematic review.

The court noted that systematic reviewers carefully define a research question and what kinds of empirical evidence will be reviewed, and then collect, summarize, and, if feasible, synthesize the available evidence into a conclusion.[12] The court emphasized that systematic reviewers should “develop a protocol for the review before commencement and adhere to the protocol regardless of the results of the review.”[13]

Wells proffered a meta-analysis, and a “weight of the evidence” (WOE) review from which he concluded that paraquat causes PD and nearly triples the risk of the disease among workers exposed to the herbicide.[14] In his reports, Wells identified a universe of at least 36 studies, but included seven in his meta-analysis. The defense had identified another two studies that were germane.[15]

Chief Judge Rosenstengel’s opinion is noteworthy for its fine attention to detail, detail that matters to the validity of the expert witness’s enterprise. Martin Wells set out to do a meta-analysis, which was all fine and good. With a universe of 36 studies, with sub-findings, alternative analyses, and changing definitions of relevant exposure, the devil lay in the details.

The MDL court was careful to point out that it was not gainsaying Wells’ decision to limit his meta-analysis to case-control studies, or his grading of any particular study as being of low quality. Systematic reviews and meta-analyses are generally accepted techniques that are part of a scientific approach to causal inference, but each has standards, predicates, and requirements for valid use. Expert witnesses must not only use a reliable methodology; Rule 702(d) requires that they also reliably apply their chosen methodology to the facts at hand in reaching their conclusions.[16]

The MDL court concluded that Wells’ meta-analysis was not sufficiently reliable under Rule 702 because he failed faithfully and reliably to apply his own articulated methodology. The court followed Wells’ lead in identifying the source and content of his chosen methodology, and simply examined his proffered opinion for compliance with that methodology.[17] The basic principles of validity for conducting meta-analyses were not, in any event, really contested. These principles and requirements were clearly designed to ensure and enhance the reliability of meta-analyses by pre-empting results-driven, reverse-engineered summary estimates of association.

The court found that Wells failed clearly to pre-specify his eligibility criteria. He then proceeded to redefine his exposure criteria, his study inclusion or eligibility criteria, and his study quality criteria after looking at the evidence. He also inconsistently applied his stated criteria, all in an apparent effort to exclude less favorable study outcomes. These ad hoc steps were some of Wells’ deviations from the standards to which he paid lip service.

The court did not exclude Wells because it disagreed with his substantive decisions to include or exclude any particular study, or with his quality grading of any study. Rather, Wells’ meta-analysis did not pass muster under Rule 702 because its methodology was unclear, inconsistently applied, not replicable, and at times transparently reverse-engineered.[18]

The court’s evaluation of Wells was unflinchingly critical. Wells’ proffered opinions “required several methodological contortions and outright violations of the scientific standards he professed to apply.”[19] From his first involvement in this litigation, Wells had violated the basic rules of conducting systematic reviews and meta-analyses.[20] His definition of “occupational” exposure meandered to suit his desire to include one study (with low variance) that might otherwise have been excluded.[21] Rather than pre-specifying his review process, his study inclusion criteria, and his quality scores, Wells engaged in an unwritten “holistic” review process, which he conceded was not objectively replicable. Wells’ approach left him free to include studies he wanted in his meta-analysis, and then provide post hoc justifications.[22] His failure to identify his inclusion/exclusion criteria was a “methodological red flag” in Dr. Wells’ meta-analysis, which suggested his reverse engineering of the whole analysis, the “very antithesis of a systematic review.”[23]

In what the court described as “methodological shapeshifting,” Wells blatantly and inconsistently graded studies he wanted to include, and had already decided to include in his meta-analysis, to be of higher quality.[24] The paraquat MDL court found, unequivocally, that Wells had “failed to apply the same level of intellectual rigor to his work in the four trial selection cases that would be required of him and his peers in a non-litigation setting.”[25]

It was also not lost upon the MDL court that Wells had shifted from a fixed effect to a random effects meta-analysis, between his principal and rebuttal reports.[26] Basic to the meta-analytical enterprise is a predicate systematic review, properly done, with pre-specification of inclusion and exclusion criteria for what studies would go into any meta-analysis. The MDL court noted that both sides had cited Borenstein’s textbook on meta-analysis,[27] and that Wells had himself cited the Cochrane Handbook[28] for the basic proposition that objective and scientifically valid study selection criteria should be clearly stated in advance to ensure the objectivity of the analysis.

There was of course legal authority for this basic proposition about prespecification. Given that the selection of studies that go into a systematic review and meta-analysis can be dispositive of its conclusion, undue subjectivity or ad hoc inclusion can easily arrange a desired outcome.[29] Furthermore, meta-analysis carries with it the opportunity to mislead a lay jury with a single (and inflated) risk ratio,[30] obtained by the operator’s manipulation of inclusion and exclusion criteria. This opportunity required the MDL court to examine the methodological rigor of the proffered meta-analysis carefully, to evaluate whether it reflected a valid pooling of data or was concocted to win a case.[31]

Martin Wells had previously acknowledged the dangers of manipulation and subjective selectivity inherent in systematic reviews and meta-analyses. The MDL court quoted from Wells’ testimony in Martin v. Actavis:

QUESTION: You would certainly agree that the inclusion-exclusion criteria should be based upon objective criteria and not simply because you were trying to get to a particular result?

WELLS: No, you shouldn’t load the – sort of cook the books.

QUESTION: You should have prespecified objective criteria in advance, correct?

WELLS: Yes.[32]

The MDL court also picked up on a subtle but important methodological point about which odds ratio to use in a meta-analysis when a study provides multiple analyses of the same association. In his first paraquat deposition, Wells cited the Cochrane Handbook for the proposition that if a crude risk ratio and a risk ratio from a multivariate analysis are both presented in a given study, then the adjusted risk ratio (and its corresponding measure of standard error, seen in its confidence interval) is generally preferable, to reduce the play of confounding.[33] Wells violated this basic principle by ignoring the multivariate analysis in the study that dominated his meta-analysis (Liou) in favor of the unadjusted bivariate analysis. Given that Wells accepted this basic principle, the MDL court found that Wells likely selected the minimally adjusted odds ratio over the multivariate adjusted odds ratio for inclusion in his meta-analysis in order to have the smaller variance (and thus greater weight) from the former. This maneuver was disqualifying under Rule 702.[34]
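The variance-and-weight point is simple arithmetic: in inverse-variance pooling, the narrower a result’s confidence interval, the larger its weight in the summary estimate. A minimal sketch with invented interval endpoints (not the actual figures from the Liou study):

```python
import math

def weight_from_ci(lo, hi):
    """Inverse-variance weight implied by a 95% CI on a risk ratio."""
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
    return 1 / se**2

# Hypothetical crude (unadjusted) estimate with a narrow interval versus a
# multivariate-adjusted estimate from the same study with a wider interval.
w_unadjusted = weight_from_ci(1.5, 5.6)
w_adjusted = weight_from_ci(0.8, 6.0)

# The narrower, unadjusted interval dominates an inverse-variance pooling.
print(w_unadjusted > w_adjusted)  # prints True
```

Choosing the unadjusted estimate thus buys both a smaller variance and a greater share of the pooled weight, which is the incentive the court identified.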

All in all, the paraquat MDL court’s Rule 702 ruling was a convincing demonstration that non-expert generalist judges, with assistance from subject-matter experts, treatises, and legal counsel, can evaluate and identify deviations from methodological standards of care.


[1] In re Paraquat Prods. Liab. Litig., Case No. 3:21-md-3004-NJR, MDL No. 3004, slip op., ___ F. Supp. 3d ___ (S.D. Ill. Apr. 17, 2024) [Slip op.].

[2] In re Paraquat Prods. Liab. Litig., Op. sur motion for judgment, Case No. 3:21-md-3004-NJR, MDL No. 3004 (S.D. Ill. Apr. 17, 2024). See also Brendan Pierson, “Judge rejects key expert in paraquat lawsuits, tosses first cases set for trial,” Reuters (Apr. 17, 2024); Hailey Konnath, “Trial-Ready Paraquat MDL Cases Tossed After Testimony Axed,” Law360 (Apr. 18, 2024).

[3] Ferebee v. Chevron Chem. Co., 552 F. Supp. 1297 (D.D.C. 1982), aff’d, 736 F.2d 1529 (D.C. Cir.), cert. denied, 469 U.S. 1062 (1984). See “Ferebee Revisited,” Tortini (Dec. 28, 2017).

[4] See 40 C.F.R. § 152.175.

[5] Slip op. at 31.

[6] 7 U.S.C. § 136w; 7 U.S.C. § 136a(a); 40 C.F.R. § 152.175. The agency must periodically review the registration of the herbicide. 7 U.S.C. § 136a(g)(1)(A). See Ruckelshaus v. Monsanto Co., 467 U.S. 986, 991-92 (1984).

[7] See Austin Wray & Aaron Niman, Memorandum, Paraquat Dichloride: Systematic review of the literature to evaluate the relationship between paraquat dichloride exposure and Parkinson’s disease at 35 (June 26, 2019).

[8] See also Jeffrey Brent and Tammi Schaeffer, “Systematic Review of Parkinsonian Syndromes in Short- and Long-Term Survivors of Paraquat Poisoning,” 53 J. Occup. & Envt’l Med. 1332 (2011) (“An analysis [of] the world’s entire published experience found no connection between high-dose paraquat exposure in humans and the development of parkinsonism.”).

[9] Douglas L. Weed, “Does paraquat cause Parkinson’s disease? A review of reviews,” 86 Neurotoxicology 180, 180 (2021).

[10] In re Incretin-Based Therapies Prods. Liab. Litig., 524 F. Supp. 3d 1007, 1038, 1043 (S.D. Cal. 2021), aff’d, No. 21-55342, 2022 WL 898595 (9th Cir. Mar. 28, 2022) (per curiam). See “Madigan’s Shenanigans and Wells Quelled in Incretin-Mimetic Cases,” Tortini (July 15, 2022).

[11] The MDL court obviously worked hard to learn the basic principles of epidemiology. The court relied extensively upon the epidemiology chapter in the Reference Manual on Scientific Evidence. Much of that material is very helpful, but its exposition on statistical concepts is at times confused and erroneous. It is unfortunate that courts do not pay more attention to the more precise and accurate exposition in the chapter on statistics. Citing the epidemiology chapter, the MDL court gave an incorrect interpretation of the p-value: “A statistically significant result is one that is unlikely the product of chance.” Slip op. at 17 n.11. And then again, citing the Reference Manual, the court declared that “[a] p-value of .1 means that there is a 10% chance that values at least as large as the observed result could have been the product of random error.” Id. Similarly, the MDL court gave an incorrect interpretation of the confidence interval. In a footnote, the court tells us that “[r]esearchers ordinarily assert a 95% confidence interval, meaning that ‘there is a 95% chance that the “true” odds ratio value falls within the confidence interval range’. In re Zoloft (Sertraline Hydrochloride) Prod. Liab. Litig., MDL No. 2342, 2015 WL 7776911, at *2 (E.D. Pa. Dec. 2, 2015).” Slip op. at 17 n.12. Citing another court for the definition of a statistical concept is a risky business.

[12] Slip op. at 20, citing Lisa A. Bero, “Evaluating Systematic Reviews and Meta-Analyses,” 14 J.L. & Pol’y 569, 570 (2006).

[13] Slip op. at 21, quoting Bero, at 575.

[14] Slip op. at 3.

[15] The nine studies at issue were as follows: (1) H.H. Liou, et al., “Environmental risk factors and Parkinson’s disease: A case-control study in Taiwan,” 48 Neurology 1583 (1997); (2) Caroline M. Tanner, et al., “Rotenone, Paraquat and Parkinson’s Disease,” 119 Envt’l Health Persps. 866 (2011) (a nested case-control study within the Agricultural Health Study (“AHS”)); (3) Clyde Hertzman, et al., “A Case-Control Study of Parkinson’s Disease in a Horticultural Region of British Columbia,” 9 Movement Disorders 69 (1994); (4) Anne-Maria Kuopio, et al., “Environmental Risk Factors in Parkinson’s Disease,” 14 Movement Disorders 928 (1999); (5) Katherine Rugbjerg, et al., “Pesticide exposure and risk of Parkinson’s disease – a population-based case-control study evaluating the potential for recall bias,” 37 Scandinavian J. of Work, Env’t & Health 427 (2011); (6) Jordan A. Firestone, et al., “Occupational Factors and Risk of Parkinson’s Disease: A Population-Based Case-Control Study,” 53 Am. J. of Indus. Med. 217 (2010); (7) Amanpreet S. Dhillon, “Pesticide/Environmental Exposures and Parkinson’s Disease in East Texas,” 13 J. of Agromedicine 37 (2008); (8) Marianne van der Mark, et al., “Occupational exposure to pesticides and endotoxin and Parkinson’s disease in the Netherlands,” 71 J. Occup. & Envt’l Med. 757 (2014); (9) Srishti Shrestha, et al., “Pesticide use and incident Parkinson’s disease in a cohort of farmers and their spouses,” 191 Envt’l Research (2020).

[16] Slip op. at 75.

[17] Slip op. at 73.

[18] Slip op. at 75, citing In re Mirena IUS Levonorgestrel-Related Prod. Liab. Litig. (No. II), 341 F. Supp. 3d 213, 241 (S.D.N.Y. 2018) (“Opinions that assume a conclusion and reverse-engineer a theory to fit that conclusion are . . . inadmissible.”) (internal citation omitted), aff’d, 982 F.3d 113 (2d Cir. 2020); In re Zoloft (Sertraline Hydrochloride) Prod. Liab. Litig., No. 12-md-2342, 2015 WL 7776911, at *16 (E.D. Pa. Dec. 2, 2015) (excluding expert’s opinion where he “failed to consistently apply the scientific methods he articulat[ed], . . . deviated from or downplayed certain well established principles of his field, and . . . inconsistently applied methods and standards to the data so as to support his a priori opinion.”), aff’d, 858 F.3d 787 (3d Cir. 2017).

[19] Slip op. at 35.

[20] Slip op. at 58.

[21] Slip op. at 55.

[22] Slip op. at 41, 64.

[23] Slip op. at 59-60, citing In re Lipitor (Atorvastatin Calcium) Mktg., Sales Pracs. & Prod. Liab. Litig., 892 F.3d 624, 634 (4th Cir. 2018) (“Result-driven analysis, or cherry-picking, undermines principles of the scientific method and is a quintessential example of applying methodologies (valid or otherwise) in an unreliable fashion.”).

[24] Slip op. at 67, 69-70, citing In re Zoloft (Sertraline Hydrochloride) Prod. Liab. Litig., 858 F.3d 787, 795-97 (3d Cir. 2017) (“[I]f an expert applies certain techniques to a subset of the body of evidence and other techniques to another subset without explanation, this raises an inference of unreliable application of methodology.”); In re Bextra and Celebrex Mktg. Sales Pracs. & Prod. Liab. Litig., 524 F. Supp. 2d 1166, 1179 (N.D. Cal. 2007) (excluding an expert witness’s causation opinion because of his result-oriented, inconsistent evaluation of data sources).

[25] Slip op. at 40.

[26] Slip op. at 61 n.44.

[27] Michael Borenstein, Larry V. Hedges, Julian P. T. Higgins, and Hannah R. Rothstein, Introduction to Meta-Analysis (2d ed. 2021).

[28] Jacqueline Chandler, James Thomas, Julian P. T. Higgins, Matthew J. Page, Miranda Cumpston, Tianjing Li, Vivian A. Welch, eds., Cochrane Handbook for Systematic Reviews of Interventions (2d ed. 2023).

[29] Slip op. at 56, citing In re Zimmer Nexgen Knee Implant Prod. Liab. Litig., No. 11 C 5468, 2015 WL 5050214, at *10 (N.D. Ill. Aug. 25, 2015).

[30] Slip op. at 22. The court noted that the Reference Manual on Scientific Evidence cautions that “[p]eople often tend to have an inordinate belief in the validity of the findings when a single number is attached to them, and many of the difficulties that may arise in conducting a meta-analysis, especially of observational studies such as epidemiological ones, may consequently be overlooked.” Id., quoting from Manual, at 608.

[31] Slip op. at 57, citing Deutsch v. Novartis Pharms. Corp., 768 F. Supp. 2d 420, 457-58 (E.D.N.Y. 2011) (“[T]here is a strong risk of prejudice if a Court permits testimony based on an unreliable meta-analysis because of the propensity for juries to latch on to the single number.”).

[32] Slip op. at 64, quoting from Notes of Testimony of Martin Wells, in In re Testosterone Replacement Therapy Prod. Liab. Litig., Nos. 1:14-cv-1748, 15-cv-4292, 15-cv-426, 2018 WL 7350886 (N.D. Ill. Apr. 2, 2018).

[33] Slip op. at 70.

[34] Slip op. at 71-72, citing People Who Care v. Rockford Bd. of Educ., 111 F.3d 528, 537-38 (7th Cir. 1997) (“[A] statistical study that fails to correct for salient explanatory variables . . . has no value as causal explanation and is therefore inadmissible in federal court.”); In re Roundup Prod. Liab. Litig., 390 F. Supp. 3d 1102, 1140 (N.D. Cal. 2018).

How Access to a Protocol and Underlying Data Gave Yale Researchers a Big Black Eye

April 13th, 2024

Prelude to Litigation

Phenylpropanolamine (PPA) was a direct α-adrenergic agonist widely used in medications to control cold symptoms and to suppress appetite for weight loss.[1] In 1972, an over-the-counter (OTC) Advisory Review Panel considered the safety and efficacy of PPA-containing nasal decongestant medications, leading, in 1976, to a recommendation that the FDA label these medications as “generally recognized as safe and effective.” Several years later, another Panel recommended that PPA-containing weight control products also be recognized as safe and effective.

Six years later, in 1982, another FDA panel recommended that PPA be considered safe and effective for appetite suppression in dieting. Two epidemiologic studies of PPA and hemorrhagic stroke were conducted in the 1980s. One study, by Hershel Jick and colleagues, presented as a letter to the editor, reported a relative risk of 0.58, with a 95% exact confidence interval of 0.03 – 2.9.[2] A year later, two researchers, reporting a study based upon Medicaid databases, found no significant association between PPA and hemorrhagic stroke.[3]

The FDA, however, did not approve a final monograph recognizing PPA’s “safe and effective” status, because of occasional reports of hemorrhagic stroke in patients who used PPA-containing medications, mostly young women who had used PPA appetite suppressants for dieting. In 1982, the FDA requested information on the effects of PPA on blood pressure, particularly with respect to weight-loss medications. The agency deferred a proposed 1985 final monograph because of the blood pressure issue.

The FDA deemed the data inadequate to answer its safety concerns. Congressional and agency hearings in the early 1990s amplified some public concern, but in 1990, the Director of Cardio-Renal Drug Products, at the Center for Drug Evaluation and Research, found several well-supported facts, based upon robust evidence. Blood pressure studies in humans showed a biphasic response: PPA initially causes blood pressure to rise above baseline (a pressor effect), and then to fall below baseline (a depressor effect). These blood pressure responses are dose-related, and diminish with repeated use; patients develop tolerance to the pressor effects within a few hours, well within the time frame of a single dose. The Center concluded that at doses of 50 mg of PPA and below, the pressor effects of the medication are smaller than normal daily variations in basal blood pressure. The only time period in which even a theoretical risk might exist is within a few hours, or less, of a patient’s taking the first dose of PPA medication. Doses of 25 mg of immediate-release PPA could not realistically be considered to pose any absolute safety risk, and “have a reasonable safety margin.”[4]

In 1991, Dr. Heidi Jolson, an FDA scientist, wrote that the agency’s spontaneous adverse event reporting system “suggested” that PPA appetite suppressants increased the risk of cerebrovascular accidents. A review of stroke data, including the adverse event reports, by epidemiology consultants failed to support a causal association between PPA and hemorrhagic stroke (HS). The reviewers, however, acknowledged that the available data did not permit them to rule out a risk of HS. The FDA adopted the reviewers’ recommendation for a prospective, large case-control study designed to take into account the known physiological effects of PPA on blood pressure.[5]

What emerged from this regulatory indecision was a decision to conduct another epidemiologic study. In November 1992, a manufacturers’ group, now known as the Consumer Healthcare Products Association (CHPA) proposed a case-control study that would become known as the Hemorrhagic Stroke Project (HSP). In March 1993, the group submitted a proposed protocol, and a suggestion that the study be conducted by several researchers at Yale University. After feedback from the public and the Yale researchers, the group submitted a final protocol in April 1994. Both the researchers and the sponsors agreed to a scientific advisory group that would operate independently and oversee the study. The study began in September 1994. The FDA deferred action on a final monograph for PPA, and product marketing continued.

The Yale HSP authors delivered the final report on their case-control study to the FDA in May 2000.[6] The HSP enrolled 702 HS cases and 1,376 controls, men and women, ages 18 to 49. The report authors concluded that “the results of the HSP suggest that PPA increases the risk for hemorrhagic stroke.”[7] The study had taken over five years to design, conduct, and analyze. In September 2000, the FDA’s Office of Post-Marketing Drug Risk Assessment released the results, with its own interpretation and conclusion that dramatically exceeded the HSP authors’ own interpretation.[8] The FDA’s Non-Prescription Drug Advisory Committee then voted, on October 19, 2000, to recommend that PPA be reclassified as “unsafe.” The Committee’s meeting, however, was attended by several leading epidemiologists who pointed to important methodological problems and limitations in the design and execution of the HSP.[9]

In November 2000, the FDA’s Nonprescription Drugs Advisory Committee determined that there was a significant association between PPA and HS, and recommended that PPA not be considered safe for OTC use. The FDA never addressed causality; nor did it have to do so under governing law. The FDA’s actions led the drug companies voluntarily to withdraw PPA-containing products.

The December 21, 2000, issue of The New England Journal of Medicine featured a revised version of the HSP report as its lead article.[10] Under the journal’s guidelines for statistical reporting, the authors were required to present two-tailed p-values or confidence intervals. Results from the HSP Final Report looked considerably less impressive after the obtained significance probabilities were doubled. Only the finding for appetite suppressant use was branded an independent risk factor:

“The results suggest that phenylpropanolamine in appetite suppressants, and possibly in cough and cold remedies, is an independent risk factor for hemorrhagic stroke in women.”[11]

The HSP had multiple pre-specified aims, and several other statistical comparisons and analyses were added along the way. No statistical adjustment was made for these multiple comparisons, but their presence in the study must be considered. Perhaps that is why the authors merely suggest that PPA in appetite suppressants was an independent risk factor for HS in women. Under current statistical guidelines for the New England Journal of Medicine, this suggestion might require even further qualification and weakening.[12]
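The inflation from unadjusted multiple comparisons is easy to quantify under a simplifying assumption of independent tests; the arithmetic below is purely illustrative (the HSP’s comparisons were not independent):

```python
# Family-wise error rate for k independent tests, each at alpha = 0.05.
# Illustrative arithmetic only, not HSP data.
alpha = 0.05
for k in (1, 5, 10, 20):
    fwer = 1 - (1 - alpha) ** k   # chance of at least one false positive
    print(k, round(fwer, 3))
```

With ten independent comparisons, the chance of at least one spurious “significant” finding rises from 5 percent to roughly 40 percent, which is why unadjusted multiplicity must at least be acknowledged.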

The HSP study faced difficult methodological issues. The detailed and robust identification of PPA’s blood pressure effects in humans focused attention on the crucial timing of an HS in relation to ingestion of a PPA medication. Any use, or any use within the last seven or 30 days, would be fairly irrelevant to the pathophysiology of a cerebral hemorrhage. The HSP authors settled on a definition of “first use” as any use of a PPA product within 24 hours, and no other uses in the previous two weeks.[13] Given the rapid onset of pressor and depressor effects, and the adaptation response, this definition of first use was generous and likely included many irrelevant exposed cases, but at least the definition attempted to incorporate the phenomena of short-lived effect and adaptation. The appetite suppressant association did not involve any “first use,” which makes the one “suggested” increased risk much less certain and relevant.

Under the alternative definition of exposure, in addition to “first use,” the ingestion of the PPA-containing medication could have taken place on “the index day before the focal time and the preceding three calendar days.” Again, given the known pharmacokinetics and physiological effects of PPA, this three-day (plus) window seems doubtfully relevant.

All instances of “first use” occurred among men and women who used a cough or cold remedy, with an adjusted OR of 3.14, and a 95% confidence interval (CI) of 0.96–10.28, p = 0.06. The very wide confidence interval, spanning more than an order of magnitude, reveals the fragility of the statistical inference. There were but 8 first-use exposed stroke cases (out of 702), and 5 exposed controls (out of 1,376).

When this first use analysis is broken down between men and women, the result becomes even more fragile. Among men, there was only one first use exposure in 319 male HS patients, and one first use exposure in 626 controls, for an adjusted OR of 2.95, CI 0.15 – 59.59, and p = 0.48. Among women, there were 7 first use exposures among 383 female HS patients, and 4 first use exposures among 750 controls, with an adjusted OR of 3.13, CI 0.86 – 11.46, p = 0.08.
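The crude (unadjusted) odds ratio and a standard Wald 95% interval can be recomputed directly from the cell counts reported above; a minimal sketch (the published figures are covariate-adjusted, so they differ slightly from these crude values):

```python
from math import exp, log, sqrt

def crude_or(exposed_cases, n_cases, exposed_controls, n_controls):
    """Crude odds ratio and Wald 95% CI from a 2x2 table."""
    a, b = exposed_cases, n_cases - exposed_cases
    c, d = exposed_controls, n_controls - exposed_controls
    or_ = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # ln-scale standard error
    return or_, exp(log(or_) - 1.96 * se), exp(log(or_) + 1.96 * se)

# All first-use exposures: 8 of 702 cases, 5 of 1,376 controls.
or_, lo, hi = crude_or(8, 702, 5, 1376)
print(round(or_, 2), round(lo, 2), round(hi, 2))   # 3.16 1.03 9.7
```

The crude 3.16 sits next to the reported adjusted 3.14 (0.96 – 10.28); the enormous width of either interval is driven by the tiny exposed cell counts (8 and 5).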

The small numbers of actual first exposure events speak loudly for the inconclusiveness and fragility of the study results, and the sensitivity of the results to any methodological deviations or irregularities. Of course, for the one “suggested” association for appetite suppressant use among women, the results were even more fragile. None of the appetite suppressant cases were “first use,” which raises serious questions whether anything meaningful was measured. There were six (non-first use) exposed among 383 female HS patients, with only a single exposed female control among 750. The authors presented an adjusted OR of 15.58, with a p-value of 0.02. The CI, however, spanned more than two orders of magnitude, 1.51 – 182.21, which makes the result well-nigh uninterpretable. One of the six appetite suppressant cases was also a user of cough-cold remedies, and she was double counted in the study’s analyses. This double-counted case had a body-mass index of 19, which is certainly not overweight, and at the low end of normal.[14] The one appetite suppressant control was obese.

For the more expansive any-exposure analysis for use of PPA cough-cold medication, the results were decidedly unimpressive. There were six exposed male cases among 391 male HS cases, and 13 exposed controls, for an adjusted odds ratio of 0.62, CI 0.20 – 1.92, p = 0.41. Although not a statistically significant inverse association, the sample results for men were incompatible with a hypothetical doubling of risk. For women, on the expansive exposure definition, there were 16 exposed cases among 383 female cases, with 19 exposed controls out of 750 female controls. The odds ratio for female PPA cough-cold medication use was 1.54, CI 0.76 – 3.14, p = 0.23.

Aside from doubts whether the HSP measured meaningful exposures, the small number of exposed cases and controls present insuperable interpretative difficulties for the study. First, working with a case-control design and odds ratios, there should be some acknowledgment that odds ratios always exaggerate the observed association size compared with a relative risk.[15] Second, the authors knew that confounding would be an important consideration in evaluating any observed association. Known and suspected risk factors were consistently more prevalent among cases than controls.[16]
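The first point is simple arithmetic; a hypothetical example (the 20% and 10% risks are invented for illustration, not HSP data) shows the odds ratio overstating the corresponding risk ratio whenever the outcome is not rare:

```python
# Hypothetical risks, not HSP data: 20% among exposed, 10% among unexposed.
p1, p0 = 0.20, 0.10
rr = p1 / p0                             # relative risk: 2.0
or_ = (p1 / (1 - p1)) / (p0 / (1 - p0))  # odds ratio: 2.25
print(rr, or_)
```

The gap narrows as the outcome becomes rare, but for a positive association the odds ratio always sits at least as far from the null as the risk ratio.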

The HSP authors valiantly attempted to control for confounding in two ways. They selected controls by a technique known as random digit dialing, to find two controls for each case, matched on telephone exchange, sex, age, and race. The HSP authors, however, used imperfectly matched controls rather than lose the corresponding case from their study.[17] For other co-variates, the authors used multivariate logistic regression to provide odds ratios that were adjusted for potential confounding from the measured covariates. At least two of the co-variates, alcohol and cocaine use, in a study sample under age 50, involved potential legal or moral judgment, which almost certainly would have skewed interview results.

An even more important threat to methodological validity was that key co-variates, such as smoking, alcohol use, hypertension, and cocaine use, were incorporated into the adjustment regression as dichotomous variables; body mass index was entered as a polychotomous variable. Monte Carlo simulation shows that categorizing a continuous variable in logistic regression inflates the rate of finding false positive associations.[18] The type I (false-positive) error rate increases with sample size, with increasing correlation between the confounding variable and the outcome of interest, and with the number of categories used for the continuous variables. Numerous authors have warned of the cost and danger of dichotomizing continuous variables, in losing information, statistical power, and reliability.[19] In the field of pharmaco-epidemiology, the bias created by dichotomization of a continuous variable is harmful from both the perspective of statistical estimation and hypothesis testing.[20] Readers will be misled into believing that a study has adjusted for important co-variates by the false allure of a fully adjusted model.
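A minimal Monte Carlo sketch of the phenomenon, substituting a Mantel-Haenszel stratified analysis for logistic regression and using an invented data-generating model (none of these numbers come from the HSP): the exposure has no effect on the outcome once the continuous confounder is fully controlled, yet “adjusting” by a median split leaves a spuriously elevated odds ratio.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

z = rng.normal(size=n)                        # continuous confounder
x = rng.random(n) < sigmoid(1.5 * z)          # exposure depends on z
y = rng.random(n) < sigmoid(-1.0 + 1.5 * z)   # outcome depends on z only: true OR for x is 1.0

# "Adjust" by dichotomizing z at its median, then pool with Mantel-Haenszel.
num = den = 0.0
for stratum in (z < np.median(z), z >= np.median(z)):
    a = np.sum(x & y & stratum)               # exposed cases
    b = np.sum(~x & y & stratum)              # unexposed cases
    c = np.sum(x & ~y & stratum)              # exposed controls
    d = np.sum(~x & ~y & stratum)             # unexposed controls
    t = a + b + c + d
    num += a * d / t
    den += b * c / t

mh_or = num / den        # residual confounding: well above the true value of 1.0
print(round(mh_or, 2))
```

Because the confounder still varies within each crude stratum, the dichotomized “adjustment” leaves a clearly elevated pooled odds ratio for an exposure with no true effect.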

Finally, with respect to the use of logistic regression to control confounding and provide adjusted odds ratios, there is the problem of the small number of events. Although the overall sample size is adequate for logistic regression, cell sizes of one, two, or three raise serious questions about the use of large-sample statistical methods to analyze the HSP results.[21]

A Surfeit of Sub-Groups

The study protocol identified three (really four or five) specific goals, to estimate the associations: (1) between PPA use and HS; (2) between HS and type of PPA use (cough-cold remedy or appetite suppression); and (3) in women, between PPA appetite suppressant use and HS, and between PPA first use and HS.[22]

With two different definitions of “exposure,” and some modifications added along the way, with two sexes, two different indications (cold remedy and appetite suppression), and with non-pre-specified analyses such as men’s cough-cold PPA use, there was ample opportunity to inflate the Type I error rate. As the authors of the HSP final report acknowledged, they were able to identify only 60 “exposed” cases and controls.[23] In the context of a large case-control study, the authors were able to identify some nominally statistically significant outcomes (PPA appetite suppressant and HS), but these were based upon very small numbers (six and one exposed, cases and controls, respectively), which made the results very uncertain considering the potential biases and confounding.

Design and Implementation Problems

Case-control studies always present some difficulty in obtaining controls that are similar to cases except that they did not experience the outcome of interest. As noted, controls were selected using “random digit dialing” in the same area code as the cases. The investigators were troubled by poor response rates from potential controls. They deviated from standard methodology for enrolling controls through random digit dialing by enrolling the first eligible control who agreed to participate, while failing to call back candidates who had asked to speak at another time.[24]

The exposure prevalence rate among controls was considerably lower than shown by PPA-product marketing research, which raises questions about the low reported exposure rates among controls; under-ascertained exposure among controls would inflate any observed odds ratios. It seems eminently reasonable to predict that persons suffering from head colds or the flu might not answer their phones or might request a call back. People who are obese might be reluctant to tell a stranger on the telephone that they are using a medication to suppress their appetite.

In the face of this obvious opportunity for selection bias, there was also ample room for recall bias. Cases were asked about medication use just before an unforgettable catastrophic event in their lives. Controls were asked about medication use before a day within the range of the previous week. More controls were interviewed by phone than were cases. Given the small number of exposed cases and controls, recall bias, created by the differential circumstances, interview settings, and procedures, was never excluded.

Lumpen Epidemiology: ICH vs. SAH

Every epidemiologic study or clinical trial has an exposure and outcome of interest, in a population of interest. The point is to compare exposed and unexposed persons, of relevant age, gender, and background, with comparable risk factors other than the exposure of interest, to determine whether the exposure makes any difference in the rate of the outcome of interest.

Composite end points represent “lumping” together different individual end points for consideration as a single outcome. The validity of composite end points depends upon assumptions, which will have to be made at the time investigators design their study and write their protocol.  After the data are collected and analyzed, the assumptions may or may not be supported.

Lumping may offer some methodological benefits, such as increasing statistical power or reducing sample size requirements. Standard epidemiologic practice, however, as reflected in numerous textbooks and methodology articles, requires the reporting of the individual constitutive end points, along with the composite result. Even when a composite end point is employed based upon a view that the component end points are sufficiently related, that view must itself ultimately be tested by showing that the individual end points are, in fact, concordant, with risk ratios in the same direction.
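A toy numerical example (the counts below are hypothetical, not HSP data) shows how a composite can mask discordant components:

```python
def odds_ratio(exposed_cases, n_cases, exposed_controls, n_controls):
    """Crude odds ratio from a 2x2 table."""
    a, b = exposed_cases, n_cases - exposed_cases
    c, d = exposed_controls, n_controls - exposed_controls
    return (a * d) / (b * c)

# Hypothetical: a real association for ICH, none at all for SAH.
ich = odds_ratio(12, 200, 6, 400)         # about 4.19
sah = odds_ratio(3, 200, 6, 400)          # exactly 1.00
composite = odds_ratio(15, 400, 12, 800)  # about 2.56: looks uniformly elevated, but is not
print(round(ich, 2), round(sah, 2), round(composite, 2))
```

A reader shown only the composite 2.56 would never suspect that one of its two components carries the entire association.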

The medical literature contains many clear statements cautioning the consumers of medical studies against claims that may mislead because they are based upon composite end points. In 2004, the British Medical Journal published a useful paper, “Users’ guide to detecting misleading claims in clinical research reports.” One of the authors’ suggestions to readers was:

“Beware of composite endpoints.”[25]

The one methodological point on which virtually all writers agree is that authors should report the results for the components of a composite end point separately, to permit readers to evaluate the individual results.[26] A leading biostatistical methodologist, the late Douglas Altman, cautioned readers against assuming that the overall estimate of association can be interpreted for each individual end point, and advised authors to provide “[a] clear listing of the individual endpoints and the number of participants experiencing them” to permit a more meaningful interpretation of composite outcomes.[27]

The HSP authors used a composite of hemorrhagic strokes, which was composed of both intracerebral hemorrhages (ICH) and subarachnoid hemorrhages (SAH). In their New England Journal of Medicine article, the authors presented the composite end point, but not the risk ratios for the two individual end points. Before they published the article, one of the authors wrote his fellow authors to advise them that because ICH and SAH are very different medical phenomena, they should present the individual end points in their analysis.[28]

The HSP researchers eventually did publish an analysis of SAH and PPA use.[29] The authors identified 425 SAH cases, of which 312 met the criteria for aneurysmal SAH. They looked at many potential risk factors such as smoking (OR = 5.07), family history (OR = 3.1), marijuana (OR = 2.38), cocaine (OR = 24.97), hypertension (OR = 2.39), aspirin (OR = 1.24), alcohol (OR = 2.95), education, as well as PPA.

Only a bivariate analysis was presented for PPA, with an odds ratio of 1.15, p = 0.87. No confidence intervals were presented. The authors were a bit more forthcoming about the potential role of bias and confounding in this publication than they were in their earlier 2000 HSP paper. “Biases that might have affected this analysis of the HSP include selection and recall bias.”[30]

Judge Rothstein’s Rule 702 opinion reports that the “Defendants assert that this article demonstrates the lack of an association between PPA and SAHs resulting from the rupture of an aneurysm.”[31] If the defendants actually claimed a “demonstration” of “the lack of association,” then shame, and more shame, on them! First, the cited study provided only a bivariate analysis for PPA and SAH. The odds ratio of 1.15 pales in comparison with the risk ratios reported for many other common exposures. We can only speculate what would happen to the 1.15 if the PPA exposure were placed in a fully adjusted model for all important covariates. Second, the p-value of 0.87 does not tell us that the 1.15 is unreal or due to chance. The HSP reported a 15% increase in the odds ratio, which is very compatible with no risk at all. Perhaps if the defendants had been more modest in their characterization, they would not have given the court the basis to find that “defendants distort and misinterpret the Stroke Article.”[32]

Rejecting the defendants’ characterization, the court drew upon an affidavit from plaintiffs’ expert witness, Kenneth Rothman, who explained that a p-value cannot provide evidence of lack of an effect.[33] A high p-value, with its corresponding 95% confidence interval that includes 1.0, can, however, show that the sample data are compatible with the null hypothesis. What Judge Rothstein missed, and the defendants may not have said effectively, is that the statistical analysis was a test of an hypothesis, and the test failed to allow us to reject the null hypothesis. The plaintiffs were left with an indeterminate analysis, from which they really could not honestly claim an association between PPA use and aneurysmal SAH.
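The indeterminacy can be made concrete by recovering the confidence interval implied by the reported OR of 1.15 and two-sided p of 0.87 (a stdlib-only sketch assuming a Wald test on the log odds ratio):

```python
from math import erf, exp, log, sqrt

def normal_cdf(t):
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def z_from_two_sided_p(p):
    # Invert the standard normal CDF by bisection (stdlib only).
    lo, hi = 0.0, 10.0
    while hi - lo > 1e-10:
        mid = (lo + hi) / 2.0
        if 2.0 * (1.0 - normal_cdf(mid)) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

z = z_from_two_sided_p(0.87)     # about 0.16
se = log(1.15) / z               # ln-scale standard error implied by OR 1.15, p 0.87
ci = (exp(log(1.15) - 1.96 * se), exp(log(1.15) + 1.96 * se))
print(ci)   # roughly (0.22, 6.1): compatible with no effect and with large effects alike
```

An interval running from well below 1.0 to above 6 is the signature of an indeterminate analysis, not a demonstration of either risk or safety.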

I Once Was Blind, But Now I See

The HSP protocol called for interviewers to be blinded to the study hypothesis, but this guard against bias was abandoned.[34]  The HSP report acknowledged that “[b]linding would have provided extra protection against unequal ascertainment of PPA exposure in case subjects compared with control subjects.”[35]

The study was conducted out of four sites, and at least one of the sites violated protocol by informing cases that they were participating in a study designed to evaluate PPA and HS.[36] The published article in the New England Journal of Medicine misleadingly claimed that study participants were blinded to its research hypothesis.[37] Although the plaintiffs’ expert witnesses tried to slough off this criticism, the lack of blinding among interviewers and study subjects amplifies recall biases, especially when study subjects and interviewers may have been reluctant to discuss fully several of the co-variate exposures, such as cocaine, marijuana, and alcohol use.[38]

No Causation At All

Scientists and the general population alike have been conditioned to view the controversy over tobacco smoking and lung cancer as a contrivance of the tobacco industry. What is lost in this conditioning is the context of Sir Austin Bradford Hill’s triumphant 1965 Royal Society of Medicine presidential address. Hill, along with his colleague Sir Richard Doll, was not overly concerned with the tobacco industry, but rather with the important methodological criticisms posited by three leading statistical scientists, Joseph Berkson, Jerzy Neyman, and Sir Ronald Fisher. Hill and Doll’s success in showing that tobacco smoking causes lung cancer required a sufficient rebuttal of these critics. The 1965 speech is often cited for its articulation of nine factors to consider in evaluating an association, but the necessary condition is often overlooked. In his speech, Hill identified the situation before the nine factors come into play:

“Disregarding then any such problem in semantics we have this situation. Our observations reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance. What aspects of that association should we especially consider before deciding that the most likely interpretation of it is causation?”[39]

The starting point, before the Bradford Hill nine factors come into play, requires a “clear-cut” association, which is “beyond what we would care to attribute to the play of chance.” What is a “clear-cut” association? The most reasonable interpretation of Bradford Hill is that the starting point is an association that is not the result of chance, bias, or confounding.

Looking at the state of the science after the HSP was published, there were two studies that failed to find any association between PPA and HS. The HSP authors “suggested” an association between PPA appetite suppressants and HS, but with six exposed cases and one exposed control, this was hardly beyond the play of chance. And none of the putative associations were “clear cut” in removing bias and confounding as an explanation for the observations.

And Then Litigation Cometh

A tsunami of state and federal cases followed the publication of the HSP study.[40] The Judicial Panel on Multi-district Litigation gave Judge Barbara Rothstein, in the Western District of Washington, responsibility for the pre-trial management of the federal PPA cases. Given the problems with the HSP, the defense unsurprisingly lodged Rule 702 challenges to plaintiffs’ expert witnesses’ opinions, and Rule 703 challenges to reliance upon the HSP.[41]

In June 2003, Judge Rothstein issued her decision on the defense motions. After reviewing a selective regulatory history of PPA, the court turned to epidemiology, and its statistical analysis.  Although misunderstanding of p-values and confidence intervals is endemic among the judiciary, the descriptions provided by Judge Rothstein portended a poor outcome:

“P-values measure the probability that the reported association was due to chance, while confidence intervals indicate the range of values within which the true odds ratio is likely to fall.”[42]

Both descriptions are seriously incorrect,[43] which is especially concerning given that Judge Rothstein would go on, in 2003, to become the director of the Federal Judicial Center, where she would oversee work on the Reference Manual on Scientific Evidence.
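The 95% of a confidence interval attaches to the procedure that generates intervals, not to any single realized interval; a quick simulation (numpy, hypothetical normal data) illustrates the correct reading:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, n, reps = 0.0, 50, 10_000       # hypothetical true mean, sample size, repetitions

samples = rng.normal(mu, 1.0, size=(reps, n))
means = samples.mean(axis=1)
ses = samples.std(axis=1, ddof=1) / np.sqrt(n)

# Fraction of the 10,000 intervals that capture the fixed true mean.
covered = np.mean((means - 1.96 * ses <= mu) & (mu <= means + 1.96 * ses))
print(covered)   # close to 0.95, but any one realized interval either contains mu or does not
```

Roughly 95 percent of the intervals cover the true value over many repetitions; no probability statement attaches to the one interval a single study reports, which is precisely what the quoted judicial description gets wrong.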

The MDL court also managed to make a hash of the one-tailed test used in the HSP report. That report was designed to inform regulatory action, where actual conclusions of causation are not necessary. When the HSP authors submitted their paper to the New England Journal of Medicine, they of course had to comply with the standards of that journal, and they doubled their reported p-values to comply with the journal’s requirement of using a two-tailed test. Some key results of the HSP no longer had p-values below 5 percent, as the defense was keen to point out in its briefings.
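The mechanics of that doubling can be checked against the published adjusted OR and CI (a sketch assuming a Wald test on the log scale):

```python
from math import erf, log, sqrt

def p_values_from_ci(or_, lo, hi):
    """One- and two-tailed p-values recovered from an OR and its 95% CI."""
    se = (log(hi) - log(lo)) / (2 * 1.96)            # ln-scale standard error
    z = abs(log(or_)) / se
    two_sided = 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))
    return two_sided / 2, two_sided                   # one-tailed is exactly half

# Reported first-use cough-cold result: OR 3.14 (0.96 - 10.28).
one, two = p_values_from_ci(3.14, 0.96, 10.28)
print(round(one, 3), round(two, 3))   # about 0.029 and 0.059
```

The same result that looks “significant” at a one-tailed 0.029 fails the conventional threshold once doubled to roughly 0.06, which is exactly what the journal’s two-tailed requirement exposed.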

From the sources it cited, the court clearly did not understand the issue, which was the need to control for random error. The court declared that it had found:

“that the HSP’s one-tailed statistical analysis complies with proper scientific methodology, and concludes that the difference in the expression of the HSP’s findings [and in the published article] falls far short of impugning the study’s reliability.”[44]

This finding ignores the very different contexts between regulatory action and causation in civil litigation. The court’s citation to an early version of the Reference Manual on Scientific Evidence further illustrates its confusion:

“Since most investigators of toxic substances are only interested in whether the agent increases the incidence of disease (as distinguished from providing protection from the disease), a one-tailed test is often viewed as appropriate.”

*****

“a rigid rule [requiring a two-tailed test] is not required if p-values and significance levels are used as clues rather than as mechanical rules for statistical proof.”[45]

In a sense, given the prevalence of advocacy epidemiology, many researchers are interested only in showing an increased risk. Nonetheless, the point of evaluating p-values is to assess the random error involved in sampling a population, and that sampling generates a rate of error even when the null hypothesis is assumed to be absolutely correct. Random error can go in either direction, resulting in risk ratios above or below 1.0. Indeed, the probability of observing a risk ratio of exactly 1.0, in a large study, is incredibly small even if the null hypothesis is correct. The risk ratio for men who had used a PPA product was below 1.0, which also recommends a two-tailed test. Trading on the confusion of regulatory and litigation findings, the court proceeded to mischaracterize the parties’ interests in designing the HSP, as only whether PPA increased the risk of stroke. In the MDL, the parties did not want “clues,” or help on what FDA policy should be; they wanted a test of the causal hypothesis.

In a footnote, the court pointed to testimony of Dr. Ralph Horwitz, one of the HSP investigators, who stated that “[a]ll parties involved in designing the HSP were interested solely in testing whether PPA increased the risk of stroke.” The parties, of course, were not designing the HSP to support litigation claims.[46] The court also cited, in this footnote, a then-recent case that found a one-tailed p-value inappropriate “where that analysis assumed the very fact in dispute.” The plaintiffs’ reliance upon the one-sided p-values in the unpublished HSP report did exactly that.[47] The court tried to excuse the failure to rule out random error by pointing to language in the published HSP article, where the authors stated that inconclusive findings raised “concern regarding safety.”[48]

In analyzing the defense challenge to the opinions based upon the HSP, Judge Rothstein committed both legal and logical fallacies. First, citing Professor David Faigman’s treatise for the proposition that epidemiology is widely accepted because the “general techniques are valid,” the court found that the HSP, and reliance upon it, was valid, despite the identified problems. The issue, however, was not whether epidemiologic techniques in general are valid, but whether the particular techniques used in the HSP were valid. The devilish details of the HSP largely went ignored.[49] From a legal perspective, Judge Rothstein’s opinion placed a burden upon the defense to show invalidity, by invoking a presumption of validity. This shifting of the burden was then, and is now, contrary to the law.

Perhaps the most obvious dodge of the court’s gatekeeping responsibility came with the conclusory assertion that the “Defendants’ ex post facto dissection of the HSP fails to undermine its reliability. Scientific studies almost invariably contain flaws.”[50] It is perhaps sobering to consider that all human beings have flaws, and yet somehow we distinguish between sinners and saints, and between criminals and heroes. The court shirked its responsibility to examine the identified flaws and determine whether they threatened the HSP’s internal validity, or its external validity as applied to the plaintiffs’ claims for hemorrhagic strokes in each of the many subgroups considered in the HSP, and to outcomes the study never considered, such as myocardial infarction and ischemic stroke. Given that there was but one key epidemiologic study relied upon to support the plaintiffs’ extravagant causal claims, the identified flaws might have been expected to instill some epistemic humility.

The PPA MDL court exhibited a willingness to cherry-pick HSP results to support its low-grade gatekeeping. For instance, the court recited that “[b]ecause no men reported use of appetite suppressants and only two reported first use of a PPA-containing product, the investigators could not determine whether PPA posed an increased risk for hemorrhagic stroke in men.”[51] There was, of course, another definition of PPA exposure that yielded a total of 19 exposed men, about one-third of all exposed cases and controls. All exposed men had used OTC PPA cough-cold remedies: six men with HS, and 13 controls, with a reported odds ratio of 0.62 (95% C.I., 0.20–1.92; p = 0.41). Although the result for men was not statistically significant, the point estimate for the sample was a risk ratio below one, with a confidence interval that excludes a doubling of the risk based upon this sample statistic. The number of male exposed HS cases was the same as the number of female HS appetite-suppressant cases, which somehow did not disturb the court.
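These reported figures hang together arithmetically, as a quick check shows. Assuming the interval was a Wald-type interval computed on the log scale (an assumption; the HSP's exact method is not stated in the text), the point estimate should be the geometric mean of the interval endpoints, and the implied standard error should recover the reported p-value. Only the three reported numbers (0.62, 0.20–1.92, p = 0.41) are taken from the text.

```python
import math
from statistics import NormalDist

# Reported male-subgroup result: OR 0.62; 95% CI 0.20-1.92; p = 0.41.
or_hat, lo, hi = 0.62, 0.20, 1.92

# A Wald interval is symmetric about the estimate on the log scale,
# so the point estimate equals the geometric mean of the endpoints.
assert abs(math.sqrt(lo * hi) - or_hat) < 0.01

# Recover the implied standard error and two-sided p-value.
se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
z = math.log(or_hat) / se
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))   # ~0.41, matching the report

# The upper bound sits below 2.0: this sample excludes a doubling of risk.
assert hi < 2.0
```

The check also makes concrete the textual point: the male point estimate lies below 1.0, and even the upper confidence bound falls short of a doubled risk.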

Superficially, the PPA MDL court appeared to place great weight on the fact of peer-reviewed publication in a prestigious journal, by well-credentialed scientists and clinicians. In the court’s words, “[t]he prestigious NEJM published the HSP results … research bears the indicia of good science.”[52] Although Professor Susan Haack’s writings on law and science are often errant, her analysis of this kind of blind reliance on peer review is noteworthy:

“though peer-reviewed publication is now standard practice at scientific and medical journals, I doubt that many working scientists imagine that the fact that a work has been accepted for publication after peer review is any guarantee that it is good stuff, or that its not having been published necessarily undermines its value. The legal system, however, has come to invest considerable epistemic confidence in peer-reviewed publication — perhaps for no better reason than that the law reviews are not peer-reviewed!”[53]

Ultimately, the PPA MDL court revealed that it was quite inattentive to the validity concerns of the HSP. Among the cases filed in the federal court were heart attack and ischemic stroke claims. The HSP did not address those claims, and the MDL court was perfectly willing to green-light them on the basis of case reports and expert witness hand waving about “plausibility.” Not only was this reliance upon case reports plus biological plausibility against the weight of legal authority, it was against the weight of scientific opinion, as expressed by the HSP authors themselves:

“Although the case reports called attention to a possible association between the use of phenylpropanolamine and the risk of hemorrhagic stroke, the absence of control subjects meant that these studies could not produce evidence that meets the usual criteria for valid scientific inference”[54]

If no epidemiology at all was necessary for the ischemic stroke and myocardial infarction claims, then a deeply flawed epidemiologic study was even better than nothing. Peer review and prestige were merely window dressing.

The HSP study was subjected to much greater analysis in actual trial litigation.  Before the MDL court concluded its abridged gatekeeping, the defense successfully sought the underlying data to the HSP. Plaintiffs’ counsel and the Yale investigators resisted and filed motions to quash the defense subpoenas. The MDL court denied the motions and required the parties to collaborate on redaction of medical records to be produced.[55]

In a law review article published a few years after the PPA Rule 702 decision, Judge Rothstein immodestly described the PPA MDL as a “model mass tort,” and without irony characterized herself as having taken “an aggressive role in determining the admissibility of scientific evidence [].”[56]

The MDL court’s PPA decision stands as a landmark of judicial incuriousness and credulity. The court conducted hearings and entertained extensive briefing on the reliability of plaintiffs’ expert witnesses’ opinions, which were based largely upon one epidemiologic study, the Yale Hemorrhagic Stroke Project (HSP). In the end, publication in a prestigious peer-reviewed journal proved to be a proxy for independent review and an excuse not to exercise critical judgment: “The prestigious NEJM published the HSP results, further substantiating that the research bears the indicia of good science.” Id. at 1239 (citing Daubert II for the proposition that peer review shows the research meets the minimal criteria for good science). The admissibility challenges were denied.

Exuberant Praise for Judge Rothstein

In 2009, at an American Law Institute – American Bar Association continuing legal education seminar on expert witnesses and environmental litigation, Anthony Roisman presented on “Daubert & Its Progeny – Finding & Selecting Experts – Direct & Cross-Examination.” Roisman has been active in various plaintiff advocacy organizations, including serving as the head of the American Trial Lawyers’ Association Section on Toxic, Environmental & Pharmaceutical Torts (STEP). In his 2009 lecture, Roisman praised Judge Rothstein’s PPA Rule 702 decision as “the way Daubert should be interpreted.” More concerning was Roisman’s revelation that Judge Rothstein wrote the PPA decision “fresh from a seminar conducted by the Tellus Institute, which is an organization set up of scientists to try to bring some common sense to the courts’ interpretation of science, which is what is going on in a Daubert case.”[57]

Roisman’s endorsement of the PPA decision may have been purely result-oriented jurisprudence, but what of his enthusiasm for the “learning” that Judge Rothstein received fresh from the Tellus Institute? What exactly is, or was, the Tellus Institute?

In June 2003, the same month as Judge Rothstein’s PPA decision, the Tellus Institute supported a group known as Scientific Knowledge and Public Policy (SKAPP), in publishing an attack on the Daubert decision. The Tellus-SKAPP paper, “Daubert: The Most Influential Supreme Court Ruling You’ve Never Heard Of,” appeared online in 2003.[58]

David Michaels, a plaintiffs’ expert in chemical exposure cases, and a founder of SKAPP, has typically described his organization as having been funded by the Common Benefit Trust, “a fund established pursuant to a court order in the Silicone Gel Breast Implant Liability litigation.”[59] What Michaels hides is that this “Trust” is nothing other than the common benefits fund set up in MDL 926, as in most MDLs, to permit plaintiffs’ counsel to retain and present expert witnesses in the common proceedings. In other words, it was the plaintiffs’ lawyers’ walking-around money. The Tellus Institute, SKAPP’s sister organization, was clearly aligned with SKAPP. Alas, Richard Clapp, who was a testifying expert witness for PPA plaintiffs, was an active member of the Tellus Institute at the time of the judicial educational seminar attended by Judge Rothstein.[60] Clapp is listed as a member of the planning committee responsible for preparing the anti-Daubert pamphlet. In 2005, as director of the Federal Judicial Center, Judge Rothstein attended another conference, the Coronado Conference, which was sponsored by SKAPP.[61]

Roisman’s revelation in 2009, after the dust had settled on the PPA litigation, may well put Judge Rothstein in the same category as Judge James Kelly, against whom the U.S. Court of Appeals for the Third Circuit issued a writ of mandamus for recusal. Judge Kelly was invited to attend a conference on asbestos medical issues, set up by Dr. Irving Selikoff with scientists who testified for plaintiffs’ counsel. The conference was funded by plaintiffs’ counsel. The co-conspirators, Selikoff and plaintiffs’ counsel, paid for Judge Kelly’s transportation and lodgings, without revealing the source of the funding.[62]

In the case of Selikoff and Motley’s effort to subvert the neutrality of Judge James M. Kelly in the school district asbestos litigation, and pervert the course of justice, the conspiracy was detected in time for a successful recusal effort. In the PPA litigation, there was no disclosure of the efforts by the anti-Daubert advocacy group, the Tellus Institute, to undermine the neutrality of a federal judge. 

Aftermath of Failed MDL Gatekeeping

Ultimately, the HSP study received much more careful analysis before juries. Although the cases that went to trial involved plaintiffs with catastrophic injuries, and a high-profile article in the New England Journal of Medicine, the jury verdicts were overwhelmingly in favor of the defense.[63]

In the first case that went to trial (but second to verdict), the defense presented a thorough scientific critique of the HSP. The underlying data and medical records that had been produced in response to a Rule 45 subpoena in the MDL allowed juries to see that the study investigators had deviated from the protocol in ways to increase the number of exposed cases, with the obvious result of increasing the odds ratios reported. Juries were ultimately much more curious about evidence and testimony on reclassifications of exposure that drove up the odds ratios for PPA use, than they were about the performance of linear logistic regressions.

The HSP investigators were well aware of the potential for medication use to occur after the onset of stroke symptoms (headache), which may have sent a person to the medicine chest for an OTC cold remedy. Case 71-0039 was just such a case, as shown by the medical records and the HSP investigators’ initial classification of the case. On dubious grounds, however, the study reclassified the time of stroke onset to after the PPA-medication use, a change that the investigators knew increased their chances of finding an association.

The reclassification of Case 20-0092 was even more egregious. The patient was originally diagnosed as having experienced a transient ischemic attack (TIA), after a CT of the head showed no bleed; at that point, Case 20-0092 was not a case. For the TIA, the patient was given heparin, an appropriate therapy, but one that is known to cause bleeding. The following day, an MRI of the head revealed a HS. The HSP nonetheless classified Case 20-0092 as a case.

In Case 18-0025, the patient experienced a headache in the morning, and took a PPA-medication (Contac) for relief. The stroke was already underway when the Contac was taken, but the HSP reversed the order of events.

Case 62-0094 presented an interesting medical history that included an event no one in the HSP considered including in the interview protocol. In addition to a history of heavy smoking, alcohol, cocaine, heroin, and marijuana use, and a history of seizure disorder, Case 62-0094 suffered a traumatic head injury immediately before developing a SAH. Treating physicians ascribed the SAH to the traumatic injury, but, not surprisingly, no controls were identified with a similar head injury within the exposure window.

Both sides of the PPA litigation accused the other of “hacking at the A cell” (the exposed-case cell of the two-by-two table), but juries seemed to understand that the hacking had started before the paper was published.
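Why the fight centered on the "A cell" can be shown with a toy computation. The counts below are entirely hypothetical, not the HSP's actual figures; the point is only that, with few exposed subjects, moving a handful of cases from the unexposed to the exposed column swings a crude odds ratio from below 1.0 to well above it.

```python
def crude_odds_ratio(a, b, c, d):
    """Crude OR for a 2x2 table: a/b = exposed/unexposed cases,
    c/d = exposed/unexposed controls."""
    return (a * d) / (b * c)

# Entirely hypothetical counts, for illustration only.
a, b, c, d = 6, 696, 13, 1363
before = crude_odds_ratio(a, b, c, d)         # below 1.0

# Reclassify three cases from unexposed to exposed (b -> a).
after = crude_odds_ratio(a + 3, b - 3, c, d)  # above 1.0
```

With sparse exposure, each reclassified case moves the numerator of the odds ratio by a large fraction, which is why exposure and onset-time reclassifications mattered so much more to the juries than the logistic regressions did.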

In a case involving two plaintiffs, in Los Angeles, where the jury heard the details of how the HSP cases were analyzed, the jury returned two defense verdicts. In post-trial motions, plaintiffs’ counsel challenged the defendant’s reliance upon underlying data in the HSP, which went behind the peer-reviewed publication, and which showed that the peer review had failed to prevent serious errors. In essence, plaintiffs’ counsel claimed that the defense’s scrutiny of the underlying data and of the investigators’ misclassifications was itself not a “generally accepted” method, and thus inadmissible. The trial court rejected the plaintiffs’ claim and their request for a new trial, and spoke to the value of looking behind the superficial assurance of peer review of the key study relied upon by plaintiffs in the PPA litigation:

“I mean, you could almost say that there was some unethical activity with that Yale Study.  It’s real close.  I mean, I — I am very, very concerned at the integrity of those researchers.

********

Yale gets — Yale gets a big black eye on this.”[64]

Epidemiologist Charles Hennekens, who had been a consultant to PPA-medication manufacturers, published a critique of the HSP study in 2006. The Hennekens critique included many of the criticisms lodged by Hennekens himself, as well as by epidemiologists Lewis Kuller, Noel Weiss, and Brian Strom, back at an October 2000 FDA meeting, before the HSP was published. Richard Clapp, Tellus Institute activist and expert witness for PPA plaintiffs, and Michael Williams, lawyer for PPA claimants, wrote a letter criticizing Hennekens.[65] David Michaels, an expert witness for plaintiffs in other chemical exposure cases, and a founder of SKAPP, which collaborated with the Tellus Institute on its anti-Daubert campaign, wrote a letter accusing Hennekens of “mercenary epidemiology,” for engaging in re-analysis of a published study. Michaels never complained about the litigation-inspired re-analyses put forward by plaintiffs’ witnesses in the Bendectin litigation. Plaintiffs’ lawyers and their expert witnesses had much to gain by starting the litigation and trying to expand its reach. Defense lawyers and their expert witnesses effectively put themselves out of business by shutting it down.[66]


[1] Rachel Gorodetsky, “Phenylpropanolamine,” in Philip Wexler, ed., 7 Encyclopedia of Toxicology 559 (4th ed. 2024).

[2] Hershel Jick, Pamela Aselton, and Judith R. Hunter,  “Phenylpropanolamine and Cerebral Hemorrhage,” 323 Lancet 1017 (1984).

[3] Robert R. O’Neill & Stephen W. Van de Carr, “A Case-Control Study of Adrenergic  Decongestants and Hemorrhagic CVA Using a Medicaid Data Base” m.s. (1985).

[4] Raymond Lipicky, Center for Drug Evaluation and Research, PPA, Safety Summary at 29 (Aug. 9, 1900).

[5] Center for Drug Evaluation and Research, US Food and Drug Administration, “Epidemiologic Review of Phenylpropanolamine Safety Issues” (April 30, 1991).

[6] Ralph I. Horwitz, Lawrence M. Brass, Walter N. Kernan, Catherine M. Viscoli, “Phenylpropanolamine & Risk of Hemorrhagic Stroke – Final Report of the Hemorrhagic Stroke Project” (May 10, 2000).

[7] Id. at 3, 26.

[8] Lois La Grenade & Parivash Nourjah, “Review of study protocol, final study report and raw data regarding the incidence of hemorrhagic stroke associated with the use of phenylpropanolamine,” Division of Drug Risk Assessment, Office of Post-Marketing Drug Risk Assessment (OPDRA) (Sept. 27, 2000). These authors concluded that the HSP report provided “compelling evidence of increased risk of hemorrhagic stroke in young people who use PPA-containing appetite suppressants. This finding, taken in association with evidence provided by spontaneous reports and case reports published in the medical literature leads us to recommend that these products should no longer be available for over the counter use.”

[9] Among those who voiced criticisms of the design, methods, and interpretation of the HSP study were Noel Weiss, Lewis Kuller, Brian Strom, and Janet Daling. Many of the criticisms would prove to be understated in the light of post-publication review.

[10] Walter N. Kernan, Catherine M. Viscoli, Lawrence M. Brass, J.P. Broderick, T. Brott, and Edward Feldmann, “Phenylpropanolamine and the risk of hemorrhagic stroke,” 343 New Engl. J. Med. 1826 (2000) [cited as Kernan].

[11] Kernan, supra note 10, at 1826 (emphasis added).

[12] David Harrington, Ralph B. D’Agostino, Sr., Constantine Gatsonis, Joseph W. Hogan, David J. Hunter, Sharon-Lise T. Normand, Jeffrey M. Drazen, and Mary Beth Hamel, “New Guidelines for Statistical Reporting in the Journal,” 381 New Engl. J. Med. 285 (2019).

[13] Kernan, supra note 10, at 1827.

[14] Transcript of Meeting on Safety Issues of Phenylpropanolamine (PPA) in Over-the-Counter Drug Products 117 (Oct. 19, 2000).

[15] See, e.g., Huw Talfryn Oakley Davies, Iain Kinloch Crombie, and Manouche Tavakoli, “When can odds ratios mislead?” 316 Brit. Med. J. 989 (1998); Thomas F. Monaghan, Rahman, Christina W. Agudelo, Alan J. Wein, Jason M. Lazar, Karel Everaert, and Roger R. Dmochowski, “Foundational Statistical Principles in Medical Research: A Tutorial on Odds Ratios, Relative Risk, Absolute Risk, and Number Needed to Treat,” 18 Internat’l J. Envt’l Research & Public Health 5669 (2021).

[16] Kernan, supra note 10, at 1829, Table 2.

[17] Kernan, supra note 10, at 1827.

[18] Peter C. Austin & Lawrence J. Brunner, “Inflation of the type I error rate when a continuous confounding variable is categorized in logistic regression analyses,” 23 Statist. Med. 1159 (2004).

[19] See, e.g., Douglas G. Altman & Patrick Royston, “The cost of dichotomising continuous variables,” 332 Brit. Med. J. 1080 (2006); Patrick Royston, Douglas G. Altman, and Willi Sauerbrei, “Dichotomizing continuous predictors in multiple regression: a bad idea,” 25 Stat. Med. 127 (2006). See also Robert C. MacCallum, Shaobo Zhang, Kristopher J. Preacher, and Derek D. Rucker, “On the Practice of Dichotomization of Quantitative Variables,” 7 Psychological Methods 19 (2002); David L. Streiner, “Breaking Up is Hard to Do: The Heartbreak of Dichotomizing Continuous Data,” 47 Can. J. Psychiatry 262 (2002); Henian Chen, Patricia Cohen, and Sophie Chen, “Biased odds ratios from dichotomization of age,” 26 Statist. Med. 3487 (2007); Carl van Walraven & Robert G. Hart, “Leave ‘em Alone – Why Continuous Variables Should Be Analyzed as Such,” 30 Neuroepidemiology 138 (2008); O. Naggara, J. Raymond, F. Guilbert, D. Roy, A. Weill, and Douglas G. Altman, “Analysis by Categorizing or Dichotomizing Continuous Variables Is Inadvisable,” 32 Am. J. Neuroradiol. 437 (Mar 2011); Neal V. Dawson & Robert Weiss, “Dichotomizing Continuous Variables in Statistical Analysis: A Practice to Avoid,” Med. Decision Making 225 (2012); Phillippa M Cumberland, Gabriela Czanner, Catey Bunce, Caroline J Doré, Nick Freemantle, and Marta García-Fiñana, “Ophthalmic statistics note: the perils of dichotomising continuous variables,” 98 Brit. J. Ophthalmol. 841 (2014).

[20] Valerii Fedorov, Frank Mannino, and Rongmei Zhang, “Consequences of dichotomization,” 8 Pharmaceut. Statist. 50 (2009).

[21] Peter Peduzzi, John Concato, Elizabeth Kemper, Theodore R. Holford, and Alvan R. Feinstein, “A simulation study of the number of events per variable in logistic regression analysis,” 49 J. Clin. Epidem. 1373 (1996).

[22] HSP Final Report at 5.

[23] HSP Final Report at 26.

[24] Byron G. Stier & Charles H. Hennekens, “Phenylpropanolamine and Hemorrhagic Stroke in the Hemorrhagic Stroke Project: A Reappraisal in the Context of Science, the Food and Drug Administration, and the Law,” 16 Ann. Epidem. 49, 50 (2006) [cited as Stier & Hennekens].

[25] Victor M. Montori, Roman Jaeschke, Holger J. Schünemann, Mohit Bhandari, Jan L Brozek, P. J. Devereaux, and Gordon H. Guyatt, “Users’ guide to detecting misleading claims in clinical research reports,” 329 Brit. Med. J. 1093 (2004). 

[26] Wolfgang Ahrens & Iris Pigeot, eds., Handbook of Epidemiology 1840 (2d ed. 2014) (47.5.8 Use of Composite Endpoints); Stuart J. Pocock, John J. V. McMurray, and Tim J. Collier, “Statistical Controversies in Reporting of Clinical Trials: Part 2 of a 4-Part Series on Statistics for Clinical Trials,” 66 J. Am. Coll. Cardiol. 2648, 2650-51 (2015) (“Interpret composite endpoints carefully.”); Schulz & Grimes, “Multiplicity in randomized trials I:  endpoints and treatments,” 365 Lancet 1591, 1595 (2005).

[27] Eric Lim, Adam Brown, Adel Helmy, Shafi Mussa & Douglas Altman, “Composite Outcomes in Cardiovascular Research: A Survey of Randomized Trials,” 149 Ann. Intern. Med. 612 (2008).

[28] See, e.g., Thomas Brott email to Walter Kernan (Sept. 10, 2000).

[29] Joseph P. Broderick, Catherine M. Viscoli, Thomas Brott, Walter N. Kernan, Lawrence M. Brass, Edward Feldmann, Lewis B. Morgenstern, Janet Lee Wilterdink, and Ralph I. Horwitz, “Major Risk Factors for Aneurysmal Subarachnoid Hemorrhage in the Young Are Modifiable,” 34 Stroke 1375 (2003).

[30] Id. at 1379.

[31] Id. at 1243.

[32] Id. at 1243.

[33] Id., citing Rothman Affidavit, ¶ 7; Kenneth J. Rothman, Epidemiology:  An Introduction at 117 (2002).

[34] HSP Final Report at 26 (“HSP interviewers were not blinded to the case-control status of study subjects and some were aware of the study purpose.”); Walter Kernan Dep. at 473-74, In re PPA Prods. Liab. Litig., MDL 1407 (W.D. Wash.) (Sept. 19, 2002).

[35] HSP Final Report at 26.

[36] Stier & Hennekens, note 24 supra, at 51.

[37] NEJM at 1831.

[38] See Christopher T. Robertson & Aaron S. Kesselheim, Blinding as a Solution to Bias – Strengthening Biomedical Science, Forensic Science, and the Law 53 (2016); Sandy Zabell, “The Virtues of Being Blind,” 29 Chance 32 (2016).

[39] Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 295 (1965).

[40] See Barbara J. Rothstein, Francis E. McGovern, and Sarah Jael Dion, “A Model Mass Tort: The PPA Experience,” 54 Drake L. Rev. 621 (2006); Linda A. Ash, Mary Ross Terry, and Daniel E. Clark, Matthew Bender Drug Product Liability § 15.86 PPA (2003).

[41] In re Phenylpropanolamine Prods. Liab. Litig., 289 F.Supp. 2d 1230 (W.D. Wash. 2003).

[42] Id. at 1236 n.1.

[43] Michael O. Finkelstein & Bruce Levin, Statistics for Lawyers 171, 173-74 (3rd ed. 2015). See also Sander Greenland, Stephen J. Senn, Kenneth J. Rothman, John B. Carlin, Charles Poole, Steven N. Goodman, and Douglas G. Altman, “Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations,” 31 Eur. J. Epidem. 337 (2016).

[44] In re Phenylpropanolamine Prods. Liab. Litig., 289 F.Supp. 2d 1230, 1241 (W.D. Wash. 2003).

[45] Id. (citing Reference Manual at 126-27, 358 n. 69). The edition of the Manual was not identified by the court.

[46] Id. at n.9, citing deposition of Ralph Horowitz [sic].

[47] Id., citing Good v. Fluor Daniel Corp., 222 F.Supp. 2d 1236, 1242-43 (E.D. Wash. 2002).

[48] Id. at 1241, citing Kernan at 183.

[49] In re Phenylpropanolamine Prods. Liab. Litig., 289 F.Supp. 2d 1230, 1239 (W.D. Wash. 2003) (citing 2 Modern Scientific Evidence: The Law and Science of Expert Testimony § 28-1.1, at 302-03 (David L. Faigman,  et al., eds., 1997) (“Epidemiologic studies have been well received by courts trying mass tort suits. Well-conducted studies are uniformly admitted. The widespread acceptance of epidemiology is based in large part on the belief that the general techniques are valid.”).

[50] Id. at 1240. The court cited the Reference Manual on Scientific Evidence 337 (2d ed. 2000), for this universal attribution of flaws to epidemiology studies (“It is important to recognize that most studies have flaws. Some flaws are inevitable given the limits of technology and resources.”) Of course, when technology and resources are limited, expert witnesses are permitted to say “I cannot say.” The PPA MDL court also cited another MDL court, which declared that “there is no such thing as a perfect epidemiological study.” In re Orthopedic Bone Screw Prods. Liab. Litig., MDL No. 1014, 1997 WL 230818, at *8-9 (E.D.Pa. May 5, 1997).

[51] Id. at 1236.

[52] Id. at 1239.

[53] Susan Haack, “Irreconcilable Differences? The Troubled Marriage of Science and Law,” 72 Law & Contemp. Problems 1, 19 (2009) (internal citations omitted). It may be telling that Haack has come to publish much of her analysis in law reviews. See Nathan Schachtman, “Misplaced Reliance On Peer Review to Separate Valid Science From Nonsense,” Tortini (Aug. 14, 2011).

[54] Kernan, supra note 10, at 1831.

[55] In re Phenylpropanolamine Prods. Litig., MDL 1407, Order re Motion to Quash Subpoenas re Yale Study’s Hospital Records (W.D. Wash. Aug. 16, 2002). Two of the HSP investigators wrote an article, over a decade later, to complain about litigation efforts to obtain data from ongoing studies. They did not mention the PPA case. Walter N. Kernan, Catherine M. Viscoli, and Mathew C. Varughese, “Litigation Seeking Access to Data From Ongoing Clinical Trials: A Threat to Clinical Research,” 174 J. Am. Med. Ass’n Intern. Med. 1502 (2014).

[56] Barbara J. Rothstein, Francis E. McGovern, and Sarah Jael Dion, “A Model Mass Tort: The PPA Experience,” 54 Drake L. Rev. 621, 638 (2006).

[57] Anthony Roisman, “Daubert & Its Progeny – Finding & Selecting Experts – Direct & Cross-Examination,” ALI-ABA 2009. Roisman’s remarks about the role of Tellus Institute start just after minute 8, on the recording, available from the American Law Institute, and the author.

[58] See “Daubert: The Most Influential Supreme Court Ruling You’ve Never Heard Of; A Publication of the Project on Scientific Knowledge and Public Policy, coordinated by the Tellus Institute” (2003).

[59] See, e.g., David Michaels, Doubt is Their Product: How Industry’s War on Science Threatens Your Health 267 (2008).

[60] See Richard W. Clapp & David Ozonoff, “Environment and Health: Vital Intersection or Contested Territory?” 30 Am. J. L. & Med. 189, 189 (2004) (“This Article also benefited from discussions with colleagues in the project on Scientific Knowledge and Public Policy at Tellus Institute, in Boston, Massachusetts.”).

[61] See Barbara Rothstein, “Bringing Science to Law,” 95 Am. J. Pub. Health S1 (2005) (“The Coronado Conference brought scientists and judges together to consider these and other tensions that arise when science is introduced in courts.”).

[62] In re School Asbestos Litigation, 977 F.2d 764 (3d Cir. 1992). See Cathleen M. Devlin, “Disqualification of Federal Judges – Third Circuit Orders District Judge James McGirr Kelly to Disqualify Himself So As To Preserve ‘The Appearance of Justice’ Under 28 U.S.C. § 455 – In re School Asbestos Litigation (1992),” 38 Villanova L. Rev. 1219 (1993); Bruce A. Green, “May Judges Attend Privately Funded Educational Programs? Should Judicial Education Be Privatized?: Questions of Judicial Ethics and Policy,” 29 Fordham Urb. L.J. 941, 996-98 (2002).

[63] Alison Frankel, “A Line in the Sand,” The Am. Lawyer – Litigation (2005); Alison Frankel, “The Mass Tort Bonanza That Wasn’t,” The Am. Lawyer (Jan. 6, 2006).

[64] O’Neill v. Novartis AG, California Superior Court, Los Angeles Cty., Transcript of Oral Argument on Post-Trial Motions, at 46-47 (March 18, 2004) (Hon. Anthony J. Mohr), aff’d sub nom. O’Neill v. Novartis Consumer Health, Inc., 147 Cal. App. 4th 1388, 55 Cal. Rptr. 3d 551, 558-61 (2007).

[65] Richard Clapp & Michael L. Williams, “Regarding ‘Phenylpropanolamine and Hemorrhagic Stroke in the Hemorrhagic Stroke Project’,” 16 Ann. Epidem. 580 (2006).

[66] David Michaels, “Regarding ‘Phenylpropanolamine and Hemorrhagic Stroke in the Hemorrhagic Stroke Project’: Mercenary Epidemiology – Data Reanalysis and Reinterpretation for Sponsors with Financial Interest in the Outcome,” 16 Ann. Epidem. 583 (2006). Hennekens responded to these letters. Stier & Hennekens, note 24, supra.

Access to a Study Protocol & Underlying Data Reveals a Nuclear Non-Proliferation Test

April 8th, 2024

The limits of peer review ultimately make it a poor proxy for the validity tests posed by Rules 702 and 703. Published, peer-reviewed articles simply do not permit a very searching evaluation of the facts and data of a study. In the wake of the Daubert decision, expert witnesses quickly saw that they could obscure the search for validity by relying upon published studies, and thereby frustrate the goals of judicial gatekeeping. As a practical matter, the burden shifts to the challenging party to learn more about the cited studies, in order to show that the facts and data relied upon are not sufficient under Rule 702(b), and that the testimony is not the product of reliable methods under Rule 702(c). Obtaining study protocols, and in some instances underlying data, is necessary for due process in the gatekeeping process. A couple of case studies may illustrate the power of looking under the hood of published studies, even peer-reviewed ones.

When the Supreme Court decided the Daubert case in June 1993, two recent verdicts in silicone-gel breast implant cases were fresh in memory.[1] The verdicts were large by the standards of the time, and the evidence presented for the claims that silicone caused autoimmune disease was extremely weak. The verdicts set off a feeding frenzy, not only in the lawsuit industry, but also in the shady entrepreneurial world of supposed medical tests for “silicone sensitivity.”

The plaintiffs’ litigation theory lacked any meaningful epidemiologic support, and so there were fulsome presentations of putative, hypothetical mechanisms. One such mechanism involved the supposed in vivo degradation of silicone to silica (silicon dioxide), with silica then inducing an immunogenic reaction, which then, somehow, induced autoimmunity and the induction of autoimmune connective tissue disease. The degradation claim would ultimately prove baseless,[2] and the nuclear magnetic resonance evidence put forward to support degradation would turn out to be instrumental artifact and deception. The immunogenic mechanism had a few lines of potential support, with the most prominent at the time coming from the laboratories of Douglas Radford Shanklin, and his colleague, David L. Smalley, both of whom were testifying expert witnesses for claimants.

The Daubert decision held out some opportunity to challenge the admissibility of testimony that silicone implants led either to the production of a silicone-specific antibody, or to the induction of t-cell mediated immunogenicity from silicone (or resulting silica) exposure. The initial tests of the newly articulated standard for admissibility of opinion testimony in silicone litigation did not go well.[3] Peer review, which was absent in the re-analyses relied upon in the Bendectin litigation, was superficially present in the studies relied upon in the silicone litigation. The absence of supportive epidemiology was excused with hand-waving that there was a “credible” mechanism, and that epidemiology took too long and was too expensive. Initially, post-Daubert, federal courts were quick to excuse the absence of epidemiology for a novel claim.

The initial Rule 702 challenges to plaintiffs’ expert witnesses thus focused on immunogenicity as the putative mechanism, which, if true, might lend some plausibility to their causal claim. Ultimately, plaintiffs’ expert witnesses would have to show that the mechanism was real by showing, through epidemiologic studies, that silicone exposure causes autoimmune disease.

One of the more persistent purveyors of a “test” for detecting alleged silicone sensitivity came from Smalley and Shanklin, then at the University of Tennessee. These authors exploited the fears of implant recipients and the greed of lawyers by marketing a “silicone sensitivity test (SILS).” For a price, Smalley and Shanklin would test mailed-in blood specimens sent directly by lawyers or by physicians, and provide ready-for-litigation reports that claimants had suffered an immune system response to silicone exposure. Starting in 1995, Smalley and Shanklin also cranked out a series of articles in supposedly peer-reviewed journals, which purported to identify a specific immune response to crystalline silica in women who had silicone gel breast implants.[4] These studies had two obvious goals. First, the studies promoted their product to the “silicone sisters,” various support groups of claimants, as well as their lawyers, and a network of supporting rheumatologists and plastic surgeons. Second, by identifying a putative causal mechanism, Shanklin could add a meretricious patina of scientific validity to the claim that silicone breast implants cause autoimmune disease, a claim that Shanklin, as a testifying expert witness, needed to survive Rule 702 challenges.

The plaintiffs’ strategy had been to paper over the huge analytical gaps in their causal theory with complicated, speculative research, which had been peer reviewed and published. Although the quality of the journals was often suspect, and the nature of the peer review obscure, the strategy had been initially successful in deflecting any meaningful scrutiny.

Many of the silicone cases were pending in a multi-district litigation, MDL 926, before Judge Sam Pointer, in the Northern District of Alabama. Judge Pointer, however, did not believe that ruling on expert witness admissibility was a function of an MDL court, and by 1995, he started to remand cases to the transferor courts, for those courts to do what they thought appropriate under Rules 702 and 703. Some of the first remanded cases went to the District of Oregon, where they landed in front of Judge Robert E. Jones. In early 1996, Judge Jones invited briefing on expert witness challenges, and in the face of the complex immunology and toxicology issues, and the emerging epidemiologic studies, he decided to appoint four technical advisors to assist him in deciding the challenges.

The addition of scientific advisors to the gatekeeper’s bench made a huge difference in the sophistication and detail of the challenges that could be lodged to the relied-upon studies. In June 1996, Judge Jones entertained extensive hearings with viva voce testimony from both challenged witnesses and subject-matter experts on topics, such as immunology and nuclear magnetic resonance spectroscopy. Judge Jones invited final argument in the form of videotaped presentations from counsel so that the videotapes could be distributed to his technical advisors later in the summer. The contrived complexity of plaintiffs’ case dissipated, and the huge analytical gaps became visible. In December 1996, Judge Jones issued his decision that excluded the plaintiffs’ expert witnesses’ proposed testimony on grounds that it failed to satisfy the requirements of Rule 702.[5]

In October 1996, while Judge Jones was studying the record, and writing his opinion in the Hall case, Judge Weinstein, sitting with a judge from the Southern District of New York and a justice of the New York state trial court, conducted a two-week Rule 702 hearing in Brooklyn. Judge Weinstein announced at the outset that he had studied the record from the Hall case, and that he would incorporate it into his record for the cases remanded to the Southern and Eastern Districts of New York.

Curious gaps in the articles claiming silicone immunogenicity, and the lack of success in earlier Rule 702 challenges, motivated the defense to obtain the study protocols and underlying data from studies such as those published by Shanklin and Smalley. Shanklin and Smalley were frequently listed as expert witnesses in individual cases, but when requests or subpoenas for their protocols and raw data were filed, plaintiffs’ counsel stonewalled or withdrew them as witnesses. Eventually, the defense was able to enforce a subpoena and obtain the protocol and some data. The respondents claimed that the control data no longer existed, and inexplicably a good part of the experimental data had been destroyed. Enough was revealed, however, to see that the published articles were not what they claimed to be.[6]

In addition to litigation discovery, in March 1996, a surgeon published the results of his test of the Shanklin-Smalley silicone sensitivity test (“SILS”).[7] Dr. Leroy Young sent the Shanklin laboratory several blood samples from women with and without silicone implants. For six women who never had implants, Dr. Young submitted a fabricated medical history that included silicone implants and symptoms of “silicone-associated disease.” All six samples were reported back as “positive”; indeed, these results were more positive than the blood samples from the women who actually had silicone implants. Dr. Young suggested that perhaps the SILS test was akin to cold fusion.

By the time counsel assembled in Judge Weinstein’s courtroom, in October 1996, some epidemiologic studies had become available and much more information was available on the supposedly supportive mechanistic studies upon which plaintiffs’ expert witnesses had previously relied. Not too surprisingly, plaintiffs’ counsel chose not to call the entrepreneurial Dr. Shanklin, but instead called Donard S. Dwyer, a young, earnest immunologist who had done some contract work on an unrelated matter for Bristol-Myers Squibb, a defendant in the litigation.  Dr. Dwyer had filed an affidavit previously in the Oregon federal litigation, in which he gave blanket approval to the methods and conclusions of the Smalley-Shanklin research:

“Based on a thorough review of these extensive materials which are more than adequate to evaluate Dr. Smalley’s test methodology, I formed the following conclusions. First, the experimental protocols that were used are standard and acceptable methods for measuring T Cell proliferation. The results have been reproducible and consistent in this laboratory. Second, the conclusion that there are differences between patients with breast implants and normal controls with respect to the proliferative response to silicon dioxide appears to be justified from the data.”[8]

Dwyer maintained this position even after the defense obtained the study protocol and underlying data, and various immunologists on the defense side filed scathing evaluations of the Smalley-Shanklin work. On direct examination at the hearings in Brooklyn, Dwyer vouched for the challenged t-cell studies, and opined that the work was peer reviewed and sufficiently reliable.[9]

The charade fell apart on cross-examination. Dwyer refused to endorse the studies that claimed to have found an anti-silicone antibody. Researchers at leading universities had attempted to reproduce the findings of such antibodies, without success.[10] The real controversy was over the claimed finding of silicone antigenicity as shown in the t-cell, or cell-mediated, specific immune response. On direct examination, plaintiffs’ counsel elicited Dwyer’s support for the soundness of the scientific studies that purported to establish such antigenicity, with little attention to the critiques that had been filed before the hearing.[11] Dwyer stuck to the unqualified support he had expressed previously in his affidavit for the Oregon cases.[12]

The problematic aspect of Dwyer’s direct examination testimony was that he had seen the protocol and the partial data produced by Smalley and Shanklin.[13] Dwyer, therefore, could not deny some basic facts about their work. First, the Shanklin data failed to support a dose-response relationship.[14] Second, the blood samples from women with silicone implants had been mailed to Smalley’s laboratory, whereas the control samples were collected locally. The disparity ensured that the silicone blood samples would be older than the controls, a departure from treating exposed and control samples in the same way.[15] Third, the experiment was done unblinded; the laboratory technical personnel and the investigators knew which blood samples were silicone exposed and which were controls (except for samples sent by Dr. Leroy Young).[16] Fourth, Shanklin’s laboratory procedures deviated from the standardized procedure set out in the National Institutes of Health’s Current Protocols in Immunology.[17]

The SILS study protocol and the data produced by Shanklin and Smalley made clear that each sample was to be tested in triplicate for t-cell proliferation in response to silica, to a positive control mitogen (Con A), and to a negative control blank. The published papers all claimed that each sample had in fact been tested in triplicate for each of these three response conditions (silica, mitogen, and nothing).[18] These statements were, however, untrue, and they were never corrected.[19]

The study protocol called for the tests to be run in triplicate, but Shanklin and Smalley instructed the laboratory that two counts could be used if one count did not match the others, a judgment to be made by a technical specialist on a “case-by-case” basis. Of the data that were supposed to be reported in triplicate, fully one third had only two data points, and 10 percent had but one data point.[20] No criteria were provided to the technical specialist for deciding which data to discard.[21] Not only had Shanklin excluded data; he had discarded and destroyed the excluded data, so that no one could go back and assess whether the exclusions were justified.[22]
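The bias that this sort of criterion-free exclusion can introduce is easy to demonstrate. The following sketch is purely hypothetical — the counts, the exclusion rule, and the threshold are invented for illustration, and are not taken from the Shanklin-Smalley protocol — but it shows how discarding the lowest of three replicate counts whenever the replicates "disagree" inflates the apparent proliferation signal, even when every sample comes from the same null distribution:

```python
import random
import statistics

def biased_mean(triplicate, threshold=0.5):
    """Hypothetical exclusion rule: if any replicate deviates from the
    median by more than `threshold` (as a fraction of the median), drop
    the LOWEST count and average the remaining two. This mimics an
    unblinded, criterion-free exclusion that favors a higher signal."""
    med = statistics.median(triplicate)
    if any(abs(x - med) > threshold * med for x in triplicate):
        kept = sorted(triplicate)[1:]  # discard the lowest count
        return statistics.mean(kept)
    return statistics.mean(triplicate)

random.seed(3)
full_means, culled_means = [], []
for _ in range(1000):
    # three replicate counts drawn from the same (null) distribution
    trip = [random.gauss(100, 30) for _ in range(3)]
    full_means.append(statistics.mean(trip))
    culled_means.append(biased_mean(trip))

print("mean of complete triplicates:", round(statistics.mean(full_means), 1))
print("mean after selective culling:", round(statistics.mean(culled_means), 1))
```

Under this invented rule, the average of the culled data exceeds the average of the complete data even though nothing real distinguishes the samples, which is why blinded, pre-specified exclusion criteria, and retention of the excluded data, matter.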

Dwyer agreed that this exclusion and discarding of data was not at all a good method.[23] Dwyer proclaimed that he had not come to Brooklyn to defend this aspect of the Shanklin work, and that it was not defensible at all. He conceded that “the interpretation of the data and collection of the data are flawed.”[24] Dwyer tried to stake out an incoherent position, asserting that there was “nothing inherently wrong with the method,” while conceding that the discarding of data was problematic.[25] The judges presiding over the hearing could readily see that the Shanklin research was bent.

At this point, the lead plaintiffs’ counsel, Michael Williams, sought an off-ramp. He jumped to his feet and exclaimed “I’m informed that no witness in this case will rely on Dr. Smalley’s [and Shanklin’s] work in any respect.” [26] Judge Weinstein’s eyes lit up with the prospect that the Smalley-Shanklin work, by agreement, would never be mentioned again in New York state or federal cases. Given how central the claim of silicone antigenicity was to plaintiffs’ cases, the defense resisted a stipulation about research that it would continue to face in other state and federal courts. The defense was saved, however, by the obstinacy of a lawyer from the Weitz & Luxenberg firm, who rose to report that her firm intended to call Drs. Shanklin and Smalley as witnesses, and that they would not stipulate to the exclusion of their work. Judge Weinstein rolled his eyes, and waved me to continue.[27] The proliferation of the t-cell test was over. The hearing before Judges Weinstein and Baer, and Justice Lobis, continued for several more days, with several other dramatic moments.[28]

In short order, on October 23, 1996, Judge Weinstein issued a short, published opinion, in which he granted partial summary judgment on the claims of systemic disease for all cases pending in federal court in New York.[29] What was curious was that the defendants had not moved for summary judgment. There were, of course, pending motions to exclude plaintiffs’ expert witnesses, but Judge Weinstein effectively ducked those motions, and let it be known that he was never a fan of Rule 702. It would be many years before Judge Weinstein allowed his judicial assessment to see the light of day. More than two decades later, in a law review article, Judge Weinstein gave his judgment that

“[t]he breast implant litigation was largely based on a litigation fraud. …  Claims—supported by medical charlatans—that enormous damages to women’s systems resulted could not be supported.”[30]

Judge Weinstein’s opinion was truly a judgment from which there can be no appeal. Shanklin and Smalley continued to publish papers for another decade. None of the published articles by Shanklin and others have been retracted.


[1] Reuters, “Record $25 Million Awarded In Silicone-Gel Implants Case,” N.Y. Times at A13 (Dec. 24, 1992) (describing the verdict returned in Harris County, Texas, in Johnson v. Medical Engineering Corp.); Associated Press, “Woman Wins Implant Suit,” N.Y. Times at A16 (Dec. 17, 1991) (reporting a verdict in Hopkins v. Dow Corning, for $840,000 in compensatory and $6.5 million in punitive damages); see Hopkins v. Dow Corning Corp., 33 F.3d 1116 (9th Cir. 1994) (affirming judgment with minimal attention to Rule 702 issues).

[2] William E. Hull, “A Critical Review of MR Studies Concerning Silicone Breast Implants,” 42 Magnetic Resonance in Medicine 984, 984 (1999) (“From my viewpoint as an analytical spectroscopist, the result of this exercise was disturbing and disappointing. In my judgement as a referee, none of the Garrido group’s papers (1–6) should have been published in their current form.”). See also N.A. Schachtman, “Silicone Data – Slippery & Hard to Find, Part 2,” Tortini (July 5, 2015). Many of the material science claims in the breast implant litigation were as fraudulent as the health effects claims. See, e.g., John Donley, “Examining the Expert,” 49 Litigation 26 (Spring 2023) (discussing his encounters with frequent testifier Pierre Blais, in silicone litigation).

[3] See, e.g., Hopkins v. Dow Corning Corp., 33 F.3d 1116 (9th Cir. 1994) (affirming judgment for plaintiff over Rule 702 challenges), cert. denied, 115 S.Ct. 734 (1995). See Donald A. Lawson, “Note, Hopkins v. Dow Corning Corporation: Silicone and Science,” 37 Jurimetrics J. 53 (1996) (concluding that Hopkins was wrongly decided).

[4] See David L. Smalley, Douglas R. Shanklin, Mary F. Hall, and Michael V. Stevens, “Detection of Lymphocyte Stimulation by Silicon Dioxide,” 4 Internat’l J. Occup. Med. & Toxicol. 63 (1995); David L. Smalley, Douglas R. Shanklin, Mary F. Hall, Michael V. Stevens, and Aram Hanissian, “Immunologic stimulation of T lymphocytes by silica after use of silicone mammary implants,” 9 FASEB J. 424 (1995); David L. Smalley, J. J. Levine, Douglas R. Shanklin, Mary F. Hall, Michael V. Stevens, “Lymphocyte response to silica among offspring of silicone breast implant recipients,” 196 Immunobiology 567 (1996); David L. Smalley, Douglas R. Shanklin, “T-cell-specific response to silicone gel,” 98 Plastic Reconstr. Surg. 915 (1996); and Douglas R. Shanklin, David L. Smalley, Mary F. Hall, Michael V. Stevens, “T cell-mediated immune response to silica in silicone breast implant patients,” 210 Curr. Topics Microbiol. Immunol. 227 (1996). Shanklin was also no stranger to making his case in the popular media. See, e.g., Douglas Shanklin, “More Research Needed on Breast Implants,” Kitsap Sun at 2 (Aug. 29, 1995) (“Widespread silicone sickness is very real in women with past and continuing exposure to silicone breast implants.”) (writing for Scripps Howard News Service). Even after the Shanklin studies were discredited in court, Shanklin and his colleagues continued to publish their claims that silicone implants led to silica antigenicity. David L. Smalley, Douglas R. Shanklin, and Mary F. Hall, “Monocyte-dependent stimulation of human T cells by silicon dioxide,” 66 Pathobiology 302 (1998); Douglas R. Shanklin and David L. Smalley, “The immunopathology of siliconosis. History, clinical presentation, and relation to silicosis and the chemistry of silicon and silicone,” 18 Immunol. Res. 125 (1998); Douglas Radford Shanklin, David L. Smalley, “Pathogenetic and diagnostic aspects of siliconosis,” 17 Rev. Environ. Health 85 (2002), and “Erratum,” 17 Rev. Environ. Health 248 (2002); Douglas Radford Shanklin & David L. Smalley, “Kinetics of T lymphocyte responses to persistent antigens,” 80 Exp. Mol. Pathol. 26 (2006). Douglas Shanklin died in 2013. Susan J. Ainsworth, “Douglas R. Shanklin,” 92 Chem. & Eng’g News (April 7, 2014). Dr. Smalley appears to be still alive. In 2022, he sued the federal government to challenge his disqualification from serving as a laboratory director of any clinical laboratory in the United States, under 42 U.S.C. § 263a(k). He lost. Smalley v. Becerra, Case No. 4:22CV399 HEA (E.D. Mo. July 6, 2022).

[5] Hall v. Baxter Healthcare Corp., 947 F. Supp. 1387 (D. Ore. 1996); see Joseph Sanders & David H. Kaye, “Expert Advice on Silicone Implants: Hall v. Baxter Healthcare Corp.,” 37 Jurimetrics J. 113 (1997); Laurens Walker & John Monahan, “Scientific Authority: The Breast Implant Litigation and Beyond,” 86 Virginia L. Rev. 801 (2000); Jane F. Thorpe, Alvina M. Oelhafen, and Michael B. Arnold, “Court-Appointed Experts and Technical Advisors,” 26 Litigation 31 (Summer 2000); Laural L. Hooper, Joe S. Cecil & Thomas E. Willging, “Assessing Causation in Breast Implant Litigation: The Role of Science Panels,” 64 Law & Contemp. Problems 139 (2001); Debra L. Worthington, Merrie Jo Stallard, Joseph M. Price & Peter J. Goss, “Hindsight Bias, Daubert, and the Silicone Breast Implant Litigation: Making the Case for Court-Appointed Experts in Complex Medical and Scientific Litigation,” 8 Psychology, Public Policy & Law 154 (2002).

[6] Judge Jones’ technical advisor on immunology reported that the studies offered in support of the alleged connection between silicone implantation and silicone-specific T cell responses, including the published papers by Shanklin and Smalley, “have a number of methodological shortcomings and thus should not form the basis of such an opinion.” Mary Stenzel-Poore, “Silicone Breast Implant Cases–Analysis of Scientific Reasoning and Methodology Regarding Immunological Studies” (Sept. 9, 1996). This judgment was seconded, over three years later, in the proceedings before MDL 926 and its Rule 706 court-appointed immunology expert witness. See Report of Dr. Betty A. Diamond, in MDL 926, at 14-15 (Nov. 30, 1998). Other expert witnesses who published studies on the supposed immunogenicity of silicone came up with some creative excuses to avoid producing their underlying data. Eric Gershwin consistently testified that his data were with a co-author in Israel, and that he could not produce them. N.A. Schachtman, “Silicone Data – Slippery and Hard to Find, Part I,” Tortini (July 4, 2015). Nonetheless, the court-appointed technical advisors were highly critical of Dr. Gershwin’s results. Dr. Stenzel-Poore, the immunologist on Judge Jones’ panel of advisors, found Gershwin’s claims “not well substantiated.” Hall v. Baxter Healthcare Corp., 947 F.Supp. 1387 (D. Ore. 1996). Similarly, Judge Pointer’s appointed expert immunologist, Dr. Betty A. Diamond, was unshakeable in her criticisms of Gershwin’s work and his conclusions. Testimony of Dr. Betty A. Diamond, in MDL 926 (April 23, 1999). And the Institute of Medicine committee, charged with reviewing the silicone claims, found Gershwin’s work inadequate and insufficient to justify the extravagant claims that plaintiffs were making for immunogenicity and for causation of autoimmune disease. Stuart Bondurant, Virginia Ernster, and Roger Herdman, eds., Safety of Silicone Breast Implants 256 (1999).
Another testifying expert witness who relied upon his own data, Nir Kossovsky, resorted to a seismic excuse; he claimed that the Northridge Quake destroyed his data. N.A. Schachtman, “Earthquake Induced Data Loss – We’re All Shook Up,” Tortini (June 26, 2015). Kossovsky, along with his wife, Beth Brandegee, and his father, Ram Kossowsky, sought to commercialize an ELISA-based silicone “antibody” biomarker diagnostic test, Detecsil. Although the early Rule 702 decisions declined to take a hard look at Kossovsky’s study, the U.S. Food and Drug Administration eventually shut down the Kossovsky Detecsil test. Lillian J. Gill, FDA Acting Director, Office of Compliance, Letter to Beth S. Brandegee, President, Structured Biologicals (SBI) Laboratories: Detecsil Silicone Sensitivity Test (July 15, 1994); see Gary Taubes, “Silicone in the System: Has Nir Kossovsky really shown anything about the dangers of breast implants?” Discover Magazine (Dec. 1995).

[7] Leroy Young, “Testing the Test: An Analysis of the Reliability of the Silicone Sensitivity Test (SILS) in Detecting Immune-Mediated Responses to Silicone Breast Implants,” 97 Plastic & Reconstr. Surg. 681 (1996).

[8] Affid. of Donard S. Dwyer, at para. 6 (Dec. 1, 1995), filed in In re Breast Implant Litig. Pending in U.S. D. Ct, D. Oregon (Groups 1,2, and 3).

[9] Notes of Testimony of Dr. Donard Dwyer, Nyitray v. Baxter Healthcare Corp., CV 93-159 (E. & S.D.N.Y. and N.Y. Sup. Ct., N.Y. Cty. Oct. 8, 9, 1996) (Weinstein, J., Baer, J., Lobis, J., Pollak, M.J.).

[10] Id. at N.T. 238-239 (Oct. 8, 1996).

[11] Id. at N.T. 240.

[12] Id. at N.T. 241-42.

[13] Id. at N.T. 243-44; 255:22-256:3.

[14] Id. at 244-45.

[15] Id. at N.T. 259.

[16] Id. at N.T. 258:20-22.

[17] Id. at N.T. 254.

[18] Id. at N.T. 252:16-254.

[19] Id. at N.T. 254:19-255:2.

[20] Id. at N.T. 269:18-269:14.

[21] Id. at N.T. 261:23-262:1.

[22] Id. at N.T. 269:18-270.

[23] Id. at N.T. 256:3-16.

[24] Id. at N.T. 262:15-17.

[25] Id. at N.T. 247:3-5.

[26] Id. at N.T. 260:2-3.

[27] Id. at N.T. 261:5-8.

[28] One of the more interesting and colorful moments came when the late James Conlon cross-examined plaintiffs’ pathology expert witness, Saul Puszkin, about questionable aspects of his curriculum vitae. The examination revealed such questionable conduct that Judge Weinstein stopped the examination and directed Dr. Puszkin not to continue without legal counsel of his own.

[29] In re Breast Implant Cases, 942 F. Supp. 958 (E.& S.D.N.Y. 1996). The opinion did not specifically address the Rule 702 and 703 issues that were the subject of pending motions before the court.

[30] Hon. Jack B. Weinstein, “Preliminary Reflections on Administration of Complex Litigation” 2009 Cardozo L. Rev. de novo 1, 14 (2009) (emphasis added).

Peer Review, Protocols, and QRPs

April 3rd, 2024

In Daubert, the Supreme Court decided a legal question about the proper interpretation of a statute, Rule 702, and then remanded the case to the Court of Appeals for the Ninth Circuit for further proceedings. The Court did, however, weigh in with dicta about several considerations relevant to admissibility decisions. In particular, the Court identified four non-dispositive factors: whether the challenged opinion has been empirically tested, whether it has been published and peer reviewed, and whether the underlying scientific technique or method supporting the opinion has an acceptable rate of error and has gained general acceptance.[1]

The context in which peer review was discussed in Daubert is of some importance to understanding why the Court held out peer review as a consideration. One of the bases for the defense challenges to some of the plaintiffs’ expert witnesses’ opinions in Daubert was their reliance upon re-analyses of published studies to suggest that there was indeed an increased risk of birth defects, if only the publication authors had used some other control group, or taken some other analytical approach. Re-analyses can be important, but these re-analyses of published Bendectin studies were post hoc, litigation driven, and obviously result oriented. The Court’s discussion of peer review reveals that it was not simply creating a box to be checked before a trial court could admit an expert witness’s opinions. Peer review was suggested as a consideration because:

“submission to the scrutiny of the scientific community is a component of “good science,” in part because it increases the likelihood that substantive flaws in methodology will be detected. The fact of publication (or lack thereof) in a peer reviewed journal thus will be a relevant, though not dispositive, consideration in assessing the scientific validity of a particular technique or methodology on which an opinion is premised.”[2]

Peer review, or the lack thereof, for the challenged expert witnesses’ re-analyses was called out because it raised suspicions of lack of validity. Nothing in Daubert, or in later decisions, or more importantly in Rule 702 itself, supports admitting expert witness testimony just because the witness relied upon peer-reviewed studies, especially when the studies are invalid or are based upon questionable research practices. The Court was careful to point out that peer-reviewed publication was “not a sine qua non of admissibility; it does not necessarily correlate with reliability, … .”[3] The Court thus showed that it was well aware that well-grounded (and thus admissible) opinions may not have been previously published, and that the existence of peer review was simply a potential aid in answering the essential question, whether the proponent of a proffered opinion has shown “the scientific validity of a particular technique or methodology on which an opinion is premised.”[4]

Since 1993, much has changed in the world of bio-science publishing. The wild proliferation of journals, including predatory and “pay-to-play” journals, has disabused most observers of the notion that peer review provides evidence of the validity of methods. Along with the exponential growth in publications has come an exponential growth in expressions of concern and outright retractions of articles, as chronicled and detailed at Retraction Watch.[5] Some journals encourage authors to nominate the peer reviewers for their manuscripts; some journals let authors block certain scientists as peer reviewers of their submitted manuscripts. If the Supreme Court were writing today, it might well note that peer review is often a feature of bad science, advanced by scientists who know that peer-reviewed publication is the price of admission to the advocacy arena.

Since the Supreme Court decided Daubert, the Federal Judicial Center and the National Academies of Sciences have provided a Reference Manual on Scientific Evidence, now in its third edition, and with a fourth edition on the horizon, to assist judges and lawyers involved in the litigation of scientific issues. Professor Goodstein, in his chapter “How Science Works,” in the third edition, provides the most extensive discussion of peer review in the Manual, and emphasizes that peer review “works very poorly in catching cheating or fraud.”[6] Goodstein invokes his own experience as a peer reviewer to note that “peer review referees and editors limit their assessment of submitted articles to such matters as style, plausibility, and defensibility; they do not duplicate experiments from scratch or plow through reams of computer-generated data in order to guarantee accuracy or veracity or certainty.”[7] Indeed, Goodstein’s essay in the Reference Manual characterizes the ability of peer review to warrant study validity as a “myth”:

Myth: The institution of peer review assures that all published papers are sound and dependable.

Fact: Peer review generally will catch something that is completely out of step with majority thinking at the time, but it is practically useless for catching outright fraud, and it is not very good at dealing with truly novel ideas. … It certainly does not ensure that the work has been fully vetted in terms of the data analysis and the proper application of research methods.[8]

Goodstein’s experience as a peer reviewer is hardly idiosyncratic. One standard text on the ethical conduct of research reports that peer review is often ineffective or incompetent, and that it may not even catch simple statistical or methodological errors.[9] According to the authors, Shamoo and Resnik:

“[p]eer review is not good at detecting data fabrication or falsification partly because reviewers usually do not have access to the material they would need to detect fraud, such as the original data, protocols, and standard operating procedures.”[10]

Indeed, without access to protocols, statistical analysis plans, and original data, peer review often cannot identify good-faith or negligent deviations from the standard of scientific care. There is some evidence for this negative assessment of peer review from testing the counterfactual: reviewers were able to detect questionable, selective reporting when they had access to the study authors’ research protocols.[11]

Study Protocol

The study protocol provides the scientific rationale for a study; it clearly defines the research question, describes the data collection process, defines the key exposure and outcome variables, and sets out the methods to be applied, all before data collection commences.[12] The protocol also typically pre-specifies the statistical data analysis. The epidemiology chapter of the current edition of the Reference Manual on Scientific Evidence offers blandly only that epidemiologists attempt to minimize bias in observational studies with “data collection protocols.”[13] Epidemiologists and statisticians are much clearer in emphasizing the importance, indeed the necessity, of having a study protocol before commencing data collection. Back in 1988, John Bailar and Frederick Mosteller explained that it was critical in reporting statistical analyses to inform readers about how and when the authors devised the study design, and whether they set the design criteria out in writing before they began to collect data.[14]

The necessity of a study protocol is “self-evident,”[15] and essential to research integrity.[16] The International Society for Pharmacoepidemiology has issued Guidelines for “Good Pharmacoepidemiology Practices,”[17] which call for every study to have a written protocol. Among the requirements set out in the Guidelines are descriptions of the research method, study design, operational definitions of exposure and outcome variables, and projected study sample size. The Guidelines provide that a detailed statistical analysis plan may be specified after data collection begins, but before any analysis commences.

Expert witness opinions on health effects are built upon studies, and so it behooves legal counsel to identify the methodological strengths and weaknesses of key studies through questioning whether they have protocols, whether the protocols were methodologically appropriate, and whether the researchers faithfully followed their protocols and their statistical analysis plans. Determining the peer review status of a publication, on the other hand, will often not advance a challenge based upon improvident methodology.

In some instances, a published study will describe its methods and data in sufficient detail that readers, even lawyers, can evaluate its scientific validity or reliability (vel non). In other cases, however, readers will be no better off than the peer reviewers who were deprived of access to protocols, statistical analysis plans, and original data. When a particular study is crucial support for an adversary’s expert witness, a reasonable litigation goal may well be to obtain the protocol and statistical analysis plan, and if need be, the original underlying data. The decision to undertake such discovery is difficult; discovery of non-party scientists can be expensive and protracted, and it will almost certainly be contentious. But when expert witnesses rely upon one or a few studies whose internal validity is open to question, this litigation strategy may provide the strongest evidence against the studies’ being reasonably relied upon, or their providing “sufficient facts and data” to support an admissible expert witness opinion.


[1] Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579, 593-594 (1993).

[2] Id. at 594 (internal citations omitted) (emphasis added).

[3] Id.

[4] Id. at 593-94.

[5] Retraction Watch, at https://retractionwatch.com/.

[6] Reference Manual on Scientific Evidence at 37, 44-45 (3rd ed. 2011) [Manual].

[7] Id. at 44-45 n.11.

[8] Id. at 48 (emphasis added).

[9] Adil E. Shamoo and David B. Resnik, Responsible Conduct of Research 133 (4th ed. 2022).

[10] Id.

[11] An-Wen Chan, Asbjørn Hróbjartsson, Mette T. Haahr, Peter C. Gøtzsche, and David G. Altman, “Empirical evidence for selective reporting of outcomes in randomized trials: Comparison of protocols to published articles,” 291 J. Am. Med. Ass’n 2457 (2004).

[12] Wolfgang Ahrens & Iris Pigeot, eds., Handbook of Epidemiology 477 (2nd ed. 2014).

[13] Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” in Reference Manual on Scientific Evidence 573 (3rd ed. 2011) (“Study designs are developed before they begin gathering data.”).

[14] John Bailar & Frederick Mosteller, “Guidelines for Statistical Reporting in Articles for Medical Journals,” 108 Ann. Intern. Med. 266, 268 (1988).

[15] Wolfgang Ahrens & Iris Pigeot, eds., Handbook of Epidemiology 477 (2nd ed. 2014).

[16] Sandra Alba, et al., “Bridging research integrity and global health epidemiology statement: guidelines for good epidemiological practice,” 5 BMJ Global Health e003236, at p.3 & passim (2020).

[17] See “The ISPE Guidelines for Good Pharmacoepidemiology Practices (GPP),” available at <https://www.pharmacoepi.org/resources/policies/guidelines-08027/>.

QRPs in Science and in Court

April 2nd, 2024

Lay juries usually function well in assessing the relevance of an expert witness’s credentials, experience, command of the facts, likeability, physical demeanor, confidence, and ability to communicate. Lay juries can understand and respond to arguments about personal bias, which no doubt is why trial lawyers spend so much time and effort to emphasize the size of fees and consulting income, and the propensity to testify only for one side. For procedural and practical reasons, however, lay juries do not function very well in assessing the actual merits of scientific controversies. And with respect to methodological issues that underlie the merits, juries barely function at all. The legal system imposes no educational or experiential qualifications for jurors, and trials are hardly the occasion to teach jurors the methodology, skills, and information needed to resolve methodological issues that underlie a scientific dispute.

Scientific studies, reviews, and meta-analyses are virtually never directly admissible in evidence in courtrooms in the United States. As a result, juries do not have the opportunity to read these sources and ponder their strengths and weaknesses. The working assumption of our courts is that juries are not qualified to engage directly with the primary sources of scientific evidence, and so expert witnesses are called upon to deliver opinions based upon a scientific record not directly in evidence. Juries, the usual triers of fact in our courts, must therefore not only assess the credibility of expert witnesses, but also assess whether those witnesses are accurately describing studies that the juries cannot read in their entirety.

The convoluted path by which science enters the courtroom supports the liberal and robust gatekeeping process outlined under Rules 702 and 703 of the Federal Rules of Evidence. The court, not the jury, must make a preliminary determination, under Rule 104, that the facts and data of a study are reasonably relied upon by an expert witness (Rule 703). And the court, not the jury, again under Rule 104, must determine that expert witnesses possess appropriate qualifications for relevant expertise, and that these witnesses have proffered opinions sufficiently supported by facts or data, based upon reliable principles and methods, and reliably applied to the facts of the case (Rule 702). There is no constitutional right to bamboozle juries with inconclusive, biased, and confounded or crummy studies, or selective and incomplete assessments of the available facts and data. Back in the days of “easy admissibility,” opinions could be tested on cross-examination, but the limited time and acumen of counsel, court, and jury cry out for meaningful scientific due process along the lines set out in Rules 702 and 703.

The evolutionary development of Rules 702 and 703 has promoted a salutary convergence between science and law. According to one historical overview of systematic reviews in science, the foundational period for such reviews (1970-1989) overlaps with the enactment of Rules 702 and 703, and the institutionalization of such reviews (1990-2000) coincides with the development of these Rules in a way that introduced some methodological rigor into scientific opinions that are admitted into evidence.[1]

The convergence between legal admissibility and scientific validity considerations has had the further result that scientific concerns over the quality and sufficiency of underlying data, over the validity of study design, analysis, reporting, and interpretation, and over the adequacy and validity of data synthesis, interpretation, and conclusions have become integral to the gatekeeping process. This convergence has the welcome potential to keep legal judgments more in line with best scientific evidence and practice.

The science-law convergence also means that courts must be apprised of, and take seriously, the problems of study reproducibility, and more broadly, the problems raised by questionable research practices (QRPs), or what might be called the patho-epistemology of science. The development, in the 1970s, and the subsequent evolution, of the systematic review represented the scientific community’s rejection of the old-school narrative reviews that selected a few of all studies to support a pre-existing conclusion. Similarly, the scientific community’s embarrassment, in the 1980s and 1990s, over the irreproducibility of study results, has in this century grown into an existential crisis over study reproducibility in the biomedical sciences.

In 2005, John Ioannidis published an article that brought the concern over “reproducibility” of scientific findings in bio-medicine to an ebullient boil.[2] Ioannidis pointed to several factors that, alone or in combination, rendered most published medical findings likely false. Among the publication practices responsible for this unacceptably high error rate, Ioannidis identified small sample sizes, data-dredging and p-hacking techniques, and poor or inadequate statistical analysis, all in the context of undue flexibility in research design, conflicts of interest, motivated reasoning, fads and prejudices, and pressure to publish “positive” results. The results, often with small putative effect sizes, across an inadequate number of studies, are then hyped by lay and technical media, as well as the public relations offices of universities and advocacy groups, only to be further misused by advocates, and further distorted to serve the goals of policy wonks. Social media then reduces all the nuances of a scientific study to an insipid meme.

Ioannidis’ critique resonated with lawyers. We who practice in health effects litigation are no strangers to dubious research methods, lack of accountability, herd-like behavior, and a culture of generating positive results, often out of political or economic sympathies. Although we must prepare to confront dodgy methods in front of a jury, asking for scientific due process, in which courts intervene and decide the methodological issues with well-reasoned, written opinions in advance of trial, does not seem like too much.

The sense that we are awash in false-positive studies was heightened by subsequent papers. In 2011, Joseph Simmons, Leif Nelson, and Uri Simonsohn showed, using simulations of various combinations of QRPs in psychological science, that researchers could attain a 61% false-positive rate for research outcomes.[3] The following year saw scientists at Amgen attempt to replicate 53 important studies in hematology and oncology; they succeeded in replicating only six.[4] Also in 2012, Dr. Janet Woodcock, director of the Center for Drug Evaluation and Research at the Food and Drug Administration, “estimated that as much as 75 per cent of published biomarker associations are not replicable.”[5] In 2016, the journal Nature reported that over 70% of scientists who responded to a survey had unsuccessfully attempted to replicate another scientist’s experiments, and more than half had failed to replicate their own work.[6] Of the respondents, 90% agreed that there was a replication problem, and a majority of those believed the problem was significant.
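The mechanism behind such simulation results is easy to demonstrate. The following is a minimal, illustrative sketch, not the actual Simmons-Nelson-Simonsohn simulation: it combines just two questionable practices, reporting whichever of two correlated outcome measures reaches significance, and optional stopping (peeking at the data and collecting more if the result is not yet significant), on data with no true effect at all. The sample sizes, correlation, and trial counts are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def flexible_study(rng, n1=20, n_extra=10):
    """One null study (no true effect) analyzed with two QRPs."""
    # Two groups, two correlated outcome measures (r = .5), no true difference.
    cov = [[1.0, 0.5], [0.5, 1.0]]
    a = rng.multivariate_normal([0.0, 0.0], cov, size=n1 + n_extra)
    b = rng.multivariate_normal([0.0, 0.0], cov, size=n1 + n_extra)

    def any_significant(n):
        # QRP #1: report whichever of the two outcomes "works" at p < .05.
        return any(stats.ttest_ind(a[:n, k], b[:n, k]).pvalue < 0.05
                   for k in (0, 1))

    # QRP #2: optional stopping -- peek at n1 subjects per group,
    # and if nothing is significant, add n_extra more and test again.
    return any_significant(n1) or any_significant(n1 + n_extra)

trials = 2000
false_positives = sum(flexible_study(rng) for _ in range(trials)) / trials
print(f"nominal alpha 5%; simulated false-positive rate: {false_positives:.1%}")
```

Even these two modest liberties push the realized false-positive rate well above the nominal 5%; stacking further flexibilities (dropping conditions, covariate adjustment after peeking) is how the published simulations reach rates above 60%.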

The scientific community reacted to the perceived replication crisis in a variety of ways, from conceptual clarification of the very notion of reproducibility,[7] to identification of improper uses and interpretations of key statistical concepts,[8] to guidelines for improved conduct and reporting of studies.[9]

Entire books dedicated to identifying the sources of, and the correctives for, undue researcher flexibility in the design, conduct, and analysis of studies, have been published.[10] In some ways, the Rule 702 and 703 case law is like the collected works of the Berenstain Bears, on how not to do studies.

The consequences of the replication crisis are real and serious. Badly conducted and interpreted science leads to research wastage,[11] loss of confidence in scientific expertise,[12] contemptible legal judgments, and distortion of public policy.

The proposed correctives to QRPs deserve the careful study of lawyers and judges who have a role in health effects litigation.[13] Whether as the proponent of an expert witness, or the challenger, several of the recurrent proposals, such as the call for greater data sharing and pre-registration of protocols and statistical analysis plans,[14] have real-world litigation salience. In many instances, they can and should direct lawyers’ efforts at discovery and challenging of the relied upon scientific studies in litigation.


[1] Quan Nha Hong & Pierre Pluye, “Systematic Reviews: A Brief Historical Overview,” 34 Education for Information 261 (2018); Mike Clarke & Iain Chalmers, “Reflections on the history of systematic reviews,” 23 BMJ Evidence-Based Medicine 122 (2018); Cynthia Farquhar & Jane Marjoribanks, “A short history of systematic reviews,” 126 Brit. J. Obstetrics & Gynaecology 961 (2019); Edward Purssell & Niall McCrae, “A Brief History of the Systematic Review,” chap. 2, in Edward Purssell & Niall McCrae, How to Perform a Systematic Literature Review: A Guide for Healthcare Researchers, Practitioners and Students 5 (2020).

[2] John P. A. Ioannidis, “Why Most Published Research Findings Are False,” 2 PLoS Med e124 (2005).

[3] Joseph P. Simmons, Leif D. Nelson, and Uri Simonsohn, “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant,” 22 Psychological Sci. 1359 (2011).

[4] C. Glenn Begley and Lee M. Ellis, “Drug development: Raise standards for preclinical cancer research,” 483 Nature 531 (2012).

[5] Edward R. Dougherty, “Biomarker Development: Prudence, risk, and reproducibility,” 34 Bioessays 277, 279 (2012); Turna Ray, “FDA’s Woodcock says personalized drug development entering ‘long slog’ phase,” Pharmacogenomics Reporter (Oct. 26, 2011).

[6] Monya Baker, “Is there a reproducibility crisis?,” 533 Nature 452 (2016).

[7] Steven N. Goodman, Daniele Fanelli, and John P. A. Ioannidis, “What does research reproducibility mean?,” 8 Science Translational Medicine 341 (2016); Felipe Romero, “Philosophy of science and the replicability crisis,” 14 Philosophy Compass e12633 (2019); Fiona Fidler & John Wilcox, “Reproducibility of Scientific Results,” Stanford Encyclopedia of Philosophy (2018), available at https://plato.stanford.edu/entries/scientific-reproducibility/.

[8] Andrew Gelman and Eric Loken, “The Statistical Crisis in Science,” 102 Am. Scientist 460 (2014); Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” 70 The Am. Statistician 129 (2016); Yoav Benjamini, Richard D. DeVeaux, Bradley Efron, Scott Evans, Mark Glickman, Barry Graubard, Xuming He, Xiao-Li Meng, Nancy Reid, Stephen M. Stigler, Stephen B. Vardeman, Christopher K. Wikle, Tommy Wright, Linda J. Young, and Karen Kafadar, “The ASA President’s Task Force Statement on Statistical Significance and Replicability,” 15 Annals of Applied Statistics 1084 (2021).

[9] The International Society for Pharmacoepidemiology issued its first Guidelines for Good Pharmacoepidemiology Practices in 1996. The most recent revision, the third, was issued in June 2015. See “The ISPE Guidelines for Good Pharmacoepidemiology Practices (GPP),” available at https://www.pharmacoepi.org/resources/policies/guidelines-08027/. See also Erik von Elm, Douglas G. Altman, Matthias Egger, Stuart J. Pocock, Peter C. Gøtzsche, and Jan P. Vandenbroucke, for the STROBE Initiative, “The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement Guidelines for Reporting Observational Studies,” 18 Epidem. 800 (2007); Jan P. Vandenbroucke, Erik von Elm, Douglas G. Altman, Peter C. Gøtzsche, Cynthia D. Mulrow, Stuart J. Pocock, Charles Poole, James J. Schlesselman, and Matthias Egger, for the STROBE initiative, “Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): Explanation and Elaboration,” 147 Ann. Intern. Med. W-163 (2007); Shah Ebrahim & Mike Clarke, “STROBE: new standards for reporting observational epidemiology, a chance to improve,” 36 Internat’l J. Epidem. 946 (2007); Matthias Egger, Douglas G. Altman, and Jan P Vandenbroucke of the STROBE group, “Commentary: Strengthening the reporting of observational epidemiology—the STROBE statement,” 36 Internat’l J. Epidem. 948 (2007).

[10] See, e.g., Lee J. Jussim, Jon A. Krosnick, and Sean T. Stevens, eds., Research Integrity: Best Practices for the Social and Behavioral Sciences (2022); Joel Faintuch & Salomão Faintuch, eds., Integrity of Scientific Research: Fraud, Misconduct and Fake News in the Academic, Medical and Social Environment (2022); William O’Donohue, Akihiko Masuda & Scott Lilienfeld, eds., Avoiding Questionable Research Practices in Applied Psychology (2022); Klaas Sijtsma, Never Waste a Good Crisis: Lessons Learned from Data Fraud and Questionable Research Practices (2023).

[11] See, e.g., Iain Chalmers, Michael B Bracken, Ben Djulbegovic, Silvio Garattini, Jonathan Grant, A Metin Gülmezoglu, David W Howells, John P A Ioannidis, and Sandy Oliver, “How to increase value and reduce waste when research priorities are set,” 383 Lancet 156 (2014); John P A Ioannidis, Sander Greenland, Mark A Hlatky, Muin J Khoury, Malcolm R Macleod, David Moher, Kenneth F Schulz, and Robert Tibshirani, “Increasing value and reducing waste in research design, conduct, and analysis,” 383 Lancet 166 (2014).

[12] See, e.g., Friederike Hendriks, Dorothe Kienhues, and Rainer Bromme, “Replication crisis = trust crisis? The effect of successful vs failed replications on laypeople’s trust in researchers and research,” 29 Public Understanding Sci. 270 (2020).

[13] R. Barker Bausell, The Problem with Science: The Reproducibility Crisis and What to Do About It (2021).

[14] See, e.g., Brian A. Nosek, Charles R. Ebersole, Alexander C. DeHaven, and David T. Mellor, “The preregistration revolution,” 115 Proc. Nat’l Acad. Sci. 2600 (2018); Michael B. Bracken, “Preregistration of Epidemiology Protocols: A Commentary in Support,” 22 Epidemiology 135 (2011); Timothy L. Lash & Jan P. Vandenbroucke, “Should Preregistration of Epidemiologic Study Protocols Become Compulsory? Reflections and a Counterproposal,” 23 Epidemiology 184 (2012).

Nullius in verba

March 29th, 2024

The 1975 codification of the law of evidence, in the Federal Rules of Evidence, introduced a subtle, aspirational criterion for expert witness opinion – knowledge. As originally enacted, Rule 702 read:

“If scientific, technical, or other specialized knowledge will assist the trier of fact to understand the evidence or to determine a fact in issue, a witness qualified as an expert by knowledge, skill, experience, training, or education, may testify thereto in the form of an opinion or otherwise.”[1]

In case anyone missed the point, the Advisory Committee Note for the original Rule 702 emphasized that the standard was an epistemic standard:

“An intelligent evaluation of facts is often difficult or impossible without the application of some scientific, technical, or other specialized knowledge. The most common source of this knowledge is the expert witness, although there are other techniques for supplying it.”[2]

Perhaps we should not be too surprised that the epistemic standard was missed by most judges, and even by most lawyers. For a very long time, the common law set out a minimal test for expert witness opinion testimony. The expert witness had to be qualified by training, experience, or education, and the opinion proffered had to be logically and legally relevant to the issues in the case.[3] The enactment of Rule 702, in 1975, barely made a dent in the regime of easy admissibility.

Before the Federal Rules of Evidence, there was, of course, the famous Frye case, which involved an appeal from the exclusion of expert witness opinion testimony based upon William Marston’s polygraph machine. In 1923, the court in Frye affirmed the exclusion, based upon the lack of general acceptance of the device’s reliability, with its famous twilight-zone language:[4]

“Just when a scientific principle or discovery crosses the line between the experimental and demonstrable stages is difficult to define. Somewhere in this twilight zone the evidential force of the principle must be recognized, and while courts will go a long way in admitting expert testimony deduced from a well-recognized scientific principle or discovery, the thing from which the deduction is made must be sufficiently established to have gained general acceptance in the particular field in which it belongs.”

With the explosion of tort litigation fueled by strict products liability doctrine, lawyers pressed Frye’s requirement of general acceptance into service as a bulwark against unreliable scientific opinions. Many courts, however, limited Frye to novel devices, and in 1993, the Supreme Court, in Daubert,[5] rejected the legal claim that Rule 702 had incorporated the common law “general acceptance” test. Looking to the language of the rule itself, the Supreme Court discerned that the rule laid down an epistemic test, not a call for sociological surveys about the prevalence of beliefs.

Resistance to the spirit and text of Rule 702 has been widespread and deep-seated. After Daubert, the Supreme Court decided three more cases to emphasize that the epistemic standard was “exacting” and that it would not go away.[6] Rule 702 was then amended substantively, in 2000, to incorporate some of the essence of the Supreme Court’s quartet,[7] requiring the proponent of expert witness opinion to establish that the proffered testimony is based upon sufficient facts or data, is the product of reliable principles and methods, and is the result of reliably applying those principles and methods to the facts of the case.

The change in the law of expert witnesses, in the 1990s, left some academic commentators well-nigh apoplectic. One professor of evidence law at a large law school complained that the law was a “conceptual muddle containing within it a threat to liberty and popular participation in government.”[8] Many federal district and intermediate appellate courts responded by ignoring the language of Rule 702, by reverting to pre-Daubert precedent, or by inventing new standards and shifting the burden to the party challenging the expert witness opinion’s admissibility. For many commentators, lawyers, and judges, science had no validity concerns that the law was bound to respect.

The judicial evasion and avoidance of the requirements of Rule 702 did not go unnoticed. Professor David Bernstein and practicing lawyer Eric Lasker wrote a paper in 2015, to call attention to the judicial disregard of the requirements of Rule 702.[9]  Several years of discussion and debate ensued before the Judicial Conference Advisory Committee on Evidence Rules (AdCom), in 2021, acknowledged that “in a fair number of cases, the courts have found expert testimony admissible even though the proponent has not satisfied the Rule 702(b) and (d) requirements by a preponderance of the evidence.”[10] This frank acknowledgement led the AdCom to propose amending Rule 702, “to clarify and emphasize” that gatekeeping requires determining whether the proponent has demonstrated to the court “that it is more likely than not that the proffered testimony meets the admissibility requirements set forth in the rule.”[11]  The Proposed Committee Note written in support of amending Rule 702 observed that “many courts have held that the critical questions of the sufficiency of an expert’s basis, and the application of the expert’s methodology, are questions of weight and not admissibility. These rulings are an incorrect application of Rules 702 and 104(a).”[12]

The proposed new Rule 702 is now law,[13] with its remedial clarification that the proponent of expert witness opinion must show the court that the opinion is sufficiently supported by facts or data,[14] that the opinion is “the product of reliable principles and methods,”[15] and that the opinion “reflects a reliable application of the principles and methods to the facts of the case.”[16] The Rule prohibits deferring the evaluation of sufficiency of support, or reliability of application of method, to the trier of fact; there is no statutory support for suggesting that these inquiries always or usually go to “weight and not admissibility,” or that there is a presumption of admissibility.

We may not have reached the Age of Aquarius, but the days of “easy admissibility” should be confined to the dustbin of legal history. Rule 702 is quickly approaching its 50th birthday, with the last 30 years witnessing the implementation of the promise and potential of an epistemic standard of trustworthiness for expert witness opinion testimony. In its present form, the Rule should go a long way towards putting validity questions squarely before the court. Nullius in verba[17] has been the motto of the Royal Society since 1660; it should now guide expert witness practice in federal court as well.


[1] Pub. L. 93–595, §1, Jan. 2, 1975, 88 Stat. 1937 (emphasis added).

[2] Notes of Advisory Committee on Proposed Rules (1975) (emphasis added).

[3] See Charles T. McCormick, Handbook of the Law of Evidence 28-29, 363 (1954) (“Any relevant conclusions which are supported by a qualified expert witness should be received unless there are other reasons for exclusion.”)

[4] Frye v. United States, 293 F. 1013, 1014 (D.C. Cir. 1923).

[5] Daubert v. Merrell Dow Pharm., Inc., 509 U.S. 579 (1993).

[6] General Electric Co. v. Joiner, 522 U.S. 136 (1997); Kumho Tire Co. v. Carmichael, 526 U.S. 137 (1999); Weisgram v. Marley Co., 528 U.S. 440 (2000).

[7] See notes 5, 6, supra.

[8] John H. Mansfield, “An Embarrassing Episode in the History of the Law of Evidence,” 34 Seton Hall L. Rev. 77, 77 (2003); see also John H. Mansfield, “Scientific Evidence Under Daubert,” 28 St. Mary’s L.J. 1, 23 (1996). Professor Mansfield was the John H. Watson, Jr., Professor of Law, at the Harvard Law School. Many epithets were thrown in the heat of battle to establish meaningful controls over expert witness testimony. See, e.g., Kenneth Chesebro, “Galileo’s Retort: Peter Huber’s Junk Scholarship,” 42 Am. Univ. L. Rev. 1637 (1993). Mr. Chesebro was counsel of record for plaintiffs-appellants in Daubert, well before he became a convicted racketeer in Georgia.

[9] David Bernstein & Eric Lasker, “Defending Daubert: It’s Time to Amend Federal Rules of Evidence 702,” 57 Wm. & Mary L Rev. 1 (2015).

[10] Report of AdCom (May 15, 2021), at https://www.uscourts.gov/rules-policies/archives/committee-reports/advisory-committee-evidence-rules-may-2021. See also AdCom, Minutes of Meeting at 4 (Nov. 13, 2020) (“[F]ederal cases . . . revealed a pervasive problem with courts discussing expert admissibility requirements as matters of weight.”), at https://www.uscourts.gov/rules-policies/archives/meeting-minutes/advisory-committee-evidence-rules-november-2020.

[11] Proposed Committee Note, Summary of Proposed New and Amended Federal Rules of Procedure (Oct. 19, 2022), at https://www.uscourts.gov/sites/default/files/2022_scotus_package_0.pdf

[12] Id. (emphasis added).

[13] In April 2023, Chief Justice Roberts transmitted the proposed Rule 702 to Congress, under the Rules Enabling Act, and highlighted that the amendment “shall take effect on December 1, 2023, and shall govern in all proceedings thereafter commenced and, insofar as just and practicable all proceedings then pending.” S. Ct. Order, at 3 (Apr. 24, 2023), https://www.supremecourt.gov/orders/courtorders/frev23_5468.pdf; S.Ct. Transmittal Package (Apr. 24, 2023), <https://www.uscourts.gov/sites/default/files/2022_scotus_package_0.pdf>.

[14] Rule 702(b).

[15] Rule 702(c).

[16] Rule 702(d).

[17] Take no one’s word for it.