TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Professor Lahav’s Radically Misguided Treatment of Chancy Tort Causation

September 27th, 2024

In the 19th and early 20th centuries, scientists and lay people usually conceptualized causation as “deterministic.” Their model of science was, roughly speaking, Newtonian, in which observations were invariably described in terms of identifiable forces that acted upon antecedent phenomena. The universe was akin to a pool table, with the movement of the billiard balls fully explained by their previous positions, masses, and movements. There was little need for probability to describe events or outcomes in such a universe.

The 20th century ushered in probabilistic concepts and models in physics and biology. Because tort law is so concerned with claims of bodily integrity and harm, I focus here on claimed health effects. Departing from the Koch-Henle postulates and our understanding of pathogen-based diseases, the latter half of the 20th century saw the rise of observational epidemiology and scientific conclusions about stochastic processes and effects that could best be understood in terms of probabilities, with statistical inferences from samples of populations. The language of deterministic physics failed to do justice to epidemiologic evidence or conclusions. Modern medicine and biology invoked notions of base rates for chronic diseases, which rates might be modified by environmental exposures.

In the wake of the emerging science of epidemiology, the law confronted a new horizon on which many claimed tortogens did not involve exposures uniquely tied to the harms alleged. Rather, the harms asserted were often diseases of ordinary life, accompanied by evidence suggesting that those harms were quantitatively more prevalent or incident among people exposed to the alleged tortogen. Of course, the backwaters of tort law saw reactionary world views on trial, as with claims of trauma-induced cancer, which are with us still. Nonetheless, slowly but not always steadily, the law came to grips with probability and statistical evidence.

In law, as in science, a key component of causal attribution is counterfactual analysis. If A causes B, then if in the same world, ceteris paribus, we do not have A, then we do not have B. Counterfactual analysis applies as much to stochastic processes that are causally influenced by rate changes as it does to the Newtonian world of billiard balls. Some writers in the legal academy, however, would opportunistically use the advent of probabilistic analyses of health effects to dispose of science altogether. No one has more explicitly exploited the opportunity than Professor Alexandra Lahav.

In an essay published in 2022, Professor Lahav advanced extraordinary claims about probabilistic causation, or what she called “chancy causation.”[1] The proffered definition of chancy causation is bumfuzzling. Lahav provides an example of an herbicide that is “associated” with the type of cancer that the heavily exposed plaintiff developed. She tells us that:

“[t]here is a chance that the exposure caused his cancer, and a chance that it did not. Probability follows certain rules, or tendencies, but these regular laws do not abolish chance. This is a common problem in modern life, where much of what we know about medicines, interventions, and the chemicals to which we are exposed is probabilistic. Following the philosophical literature, I call this phenomenon chancy causation.”[2]

So the rules of probability do not abolish chance? It is hard to know what Lahav is trying to say here. Probability quantifies chance, and gives us an understanding of phenomena and their predictability. When we can model an empirical process with a probability distribution, such as one generating independent and identically distributed observations, we can often make and test quantitative inferences about the anticipated phenomena.

Lahav vaguely acknowledges that her term, “chancy causation,” is borrowed, but she does not give credit to the many authors who have used it before.[3] Lahav does note that the concept of probabilistic causation used in modern-day risk factor epidemiology is different from the deterministic causal claims that dominated tort law in the 19th and the first half of the 20th century. Lahav claims that chancy causation is inconsistent with counterfactual analysis, but she cites no support for her claim, which is demonstrably false. If we previously saw the counterfactual (if A, then B) as key to causality, we can readily restate the counterfactual as a probability: A probably causes B. On a counterfactual analysis, if we do not have A as an antecedent, then we probably do not have B. For a classic tortogen such as tobacco smoking, we can say confidently that tobacco smoking probably causes lung cancer. And for a given instance of lung cancer, we can say, based upon the entire evidentiary display, that if a person did not smoke tobacco, he would probably not have developed lung cancer. Of course, the correspondence is not 100 percent, which is only to say that it is probabilistic. There are highly penetrant genetic mutations that may be the cause of a given lung cancer case. We know, however, that such mutations do not cause or explain the large majority of lung cancer cases.

Contrary to Lahav’s ipse dixits, tort law can incorporate, and has accommodated, both general and specific causation in terms of probabilistic counterfactuals. The modification requires us, of course, to address the baseline situation as a rate or frequency of events, and the post-exposure world as one with a modified rate or frequency. Without confusion or embarrassment, we can say that the exposure is the cause of the change in event rates. Modern physics similarly accepts that we must often be content with probability statements, rather than precise deterministic “billiard ball” physics, which is so useful in a game of snooker, but less so in describing the position of sub-atomic particles. In the first half of the 20th century, the biological sciences learned with some difficulty that they must embrace probabilistic models, in genetic science as well as in epidemiology. Many biological causation models are completely stated in terms of probabilities that are modified by specified conditions.

Lahav intends for her rejection of counterfactual causality to do a lot of work in her post-modern program. By falsely claiming that chancy causation has no factual basis, Lahav jumps to the conclusion that what the law calls for is nothing but “policy,”[4] and “normative decision.”[5] Having fabricated the demise of but-for causation in the context of probabilistic relationships, Lahav suggests that tort law can pretend that the causation question is nothing more than a normative analysis of the defendant’s conduct. (Perhaps it is more than a tad revealing that she does not see that the plaintiff’s conduct is involved in the normative judgment.) Of course, tort law already has ample room for policy and normative considerations built into the concepts of duty and breach of duty.

As we saw with the lung cancer example above, the claim that tobacco smoking probably caused the smoker to develop lung cancer can be entirely factual, and supported by a probabilistic judgment. Lahav calls her erroneous move “pragmatic,” although it has no relationship to the philosophical pragmatism of Peirce or Quine. Lahav’s move is a misrepresentation of probability and of epidemiologic science in the name of compensation free-for-alls. Obtaining heads in the flip of a fair coin has a probability of 50%; that is a fact, not a normative decision, even though it is, to use Lahav’s vocabulary, “chancy.”

Lahav’s argument is not always easy to follow. In one place, she uses “chancy” to refer to the posterior probability of the correctness of the causal claim:

“the counterfactual standard can be successfully defended against by the introduction of chance. The more conflicting studies, the “more chancy” the causation. By that I do not mean proving a lower probability (although this is a good result from a defense point of view) but rather that more, different study results create the impression of irreducible chanciness, which in turn dictates that the causal relation cannot be definitively proven.”[6]

This usage, which clearly refers to the posterior probability of a claim, is not necessarily limited to so-called non-deterministic phenomena. People could refer to any conclusion, based upon conflicting evidence of deterministic phenomena, as “chancy.”

Lurking in her essay is a further confusion between the posterior probability we might assign to a claim, or to an inference from probabilistic evidence, and the probability of random error. In an interview conducted by Felipe Jiménez,[7] Lahav was more transparent in her confusion, and she explicitly committed the transpositional fallacy with her suggestion that customary statistical standards (statistical significance) ensure that even small increased risks, say of 30%, are known to a high degree of certainty.
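To see why the transposition is fallacious, consider a minimal Bayesian sketch. The prior, power, and significance level below are illustrative assumptions of mine, not figures drawn from Lahav or from the statistical literature; the point is only that a “statistically significant” result does not by itself make the causal claim nearly certain:

# Hypothetical illustration of the transpositional fallacy: a statistically
# significant finding (p < 0.05) does not mean the causal claim is 95% certain.
prior = 0.10    # assumed prior probability that the claimed association is real
power = 0.80    # assumed probability of a significant result if the association is real
alpha = 0.05    # probability of a significant result if the association is not real
posterior = (power * prior) / (power * prior + alpha * (1 - prior))
print(f"Posterior probability given a significant result: {posterior:.2f}")
# Prints roughly 0.64 -- far from the high degree of certainty the fallacy assumes.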

Despite these confusions, it seems fairly clear that Lahav is concerned with stochastic causal processes, and most of her examples evidence that concern. Lahav poses a hypothetical in which epidemiologic studies show smokers have a 20% increased risk of developing lung cancer compared with non-smokers.[8] Given that typical smoking histories convey relative risks of 20 to 30, or increased risks of 2,000 to 3,000%, Lahav’s hypothetical may lead readers to think she is shilling for tobacco companies. In any event, in the face of a 20% increased risk (or relative risk of 1.2), Lahav acknowledges that the probability of a smoker’s developing lung cancer is higher than that of a non-smoker, but “in any particular case the question whether a patient’s lung cancer was caused by smoking is uncertain.” This assertion, however, is untrue; the question is not “uncertain.” She has provided a certain quantification of the increased risk. Furthermore, her hypothetical gives us a good deal of information on which we can say that smoking probably did not result in the patient’s lung cancer. Causation may be chancy because it is based upon a probabilistic inference, but the chances are actually known, and they are low.
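A back-of-the-envelope calculation shows why the chances in this hypothetical are not merely “uncertain,” but quantifiable and low. The sketch below assumes, solely for illustration, that Lahav’s hypothetical relative risk of 1.2 is valid, unconfounded, and applicable to the patient:

# Probability of attribution under the 20% increased-risk hypothetical.
relative_risk = 1.2
attributable_fraction = (relative_risk - 1) / relative_risk   # (RR - 1) / RR
print(f"Probability the smoker's cancer is attributable to smoking: {attributable_fraction:.0%}")
# Prints roughly 17%; the cancer probably was not caused by the exposure.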

Lahav posits a more interesting hypothetical when she considers a case in which there is an 80% chance that a person’s lung cancer is attributable to smoking.[9] We can understand this hypothetical better if we reframe it as a classic urn probability problem. In a given (large) population of non-smokers, we expect 100 lung cancers per year. In a population of smokers, otherwise just like the population of non-smokers, we observe 500 lung cancers. Of the observed number, 100 were “expected” because they happen without exposure to the putative causal agent, and 400 are “excess.” The relative risk would be 5, or a 400% increased risk, still well below the actual measure of risk from long-term smoking, but the attributable risk would be (RR - 1)/RR, or 0.8 (80%). If we imagine an urn with 100 white “expected” balls, and 400 red “excess” balls added, then any given draw from the urn, with replacement, yields an 80% probability of a red ball, or an excess case. Of course, if we can see the color, we may come to a consensus judgment that the ball is actually red. But on our analogy to discerning the cause of a given lung cancer, we have at present nothing by way of evidence with which to call the question, and so it remains “chancy” or probabilistic. The question is not, however, in any way normative. The answer is different quantitatively in the 20% and in the 400% hypotheticals.
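The urn analogy lends itself to a simple simulation. The following sketch (a toy model of my own construction, not anything offered in Lahav’s essay) draws repeatedly, with replacement, from an urn of 100 “expected” and 400 “excess” cases, and confirms that the long-run frequency of drawing an excess case converges on the 80% attributable fraction:

import random

# Toy urn model for the relative-risk-of-5 hypothetical:
# 100 "expected" (white) lung cancers plus 400 "excess" (red) lung cancers.
urn = ["expected"] * 100 + ["excess"] * 400

draws = 100_000
excess_draws = sum(random.choice(urn) == "excess" for _ in range(draws))
print(f"Simulated probability of drawing an excess case: {excess_draws / draws:.3f}")

# Analytic check: attributable fraction = (RR - 1) / RR = (5 - 1) / 5 = 0.8
print(f"Analytic attributable fraction: {(5 - 1) / 5:.3f}")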

Lahav asserts that we are in a state of complete ignorance once a smoker has lung cancer.[10] This is not, however, true. We have the basis for a probabilistic judgment that will probably be true. It may well be true that the probability of attribution will be affected by the probability that the relative risk of 5 is correct. If the posterior probability for the claim that smoking causes lung cancer by increasing its risk 400% is only 30%, then of course, we could not make the attribution in a given case with an 80% probability of correctness. In actual litigation, the argument is often framed on an assumption arguendo that the increased risk is greater than two, so that only the probability of attribution is involved. If the posterior probability of the claim that exposure to the tortogen increased risk by 400% or 20,000% were only 0.49, then the plaintiff would lose. If the posterior probability of the increased risk were greater than 0.5, the finder of fact could find that the specific causation claim had been carried if the magnitude of the relative risk, and the attributable risk, were sufficiently large. This inference on specific causation would not be a normative judgment; it would be guided by factual evidence about the magnitude of the relevant increased risk.
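One way to make this paragraph’s arithmetic concrete is the simplifying assumption (mine, not Lahav’s) that the probability of specific causation can be approximated as the posterior probability of general causation multiplied by the attributable fraction:

# Hypothetical illustration only: treat the probability of specific causation as
# P(the general causation claim is correct) * attributable fraction.
def specific_causation_probability(p_general: float, relative_risk: float) -> float:
    attributable_fraction = (relative_risk - 1) / relative_risk
    return p_general * attributable_fraction

# If the posterior probability that the relative risk of 5 is correct is only 30%:
print(specific_causation_probability(0.30, 5))   # 0.24 -- well below 0.5
# If the general causation claim is virtually certain:
print(specific_causation_probability(1.00, 5))   # 0.80 -- above 0.5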

Lahav advances a perverse skepticism that any inferences about individuals can be drawn from information about rates or frequencies in groups of similar individuals. Yes, there may always be some debate about what is “similar,” but successive studies may well draw the net tighter around what is the appropriate class. Lahav’s skepticism, and her outright denialism about inferences from general causation to specific causation, are common among some in the legal academy, but they ignore that group-to-individual inferences are drawn in epidemiology in multiple contexts. Regressions for disease prediction are based upon individual data within groups, and the regression equations are then applied to future individuals to help predict those individuals’ probability of future disease (such as heart attack or breast cancer), or their probability of cancer-free survival after a specific therapy. Group-to-individual inferences are, of course, also the basis for prescribing decisions in clinical medicine. These are not normative inferences; they are based upon evidence-based causal thinking about probabilistic inferences.

In the early tobacco litigation, defendants denied that tobacco smoking caused lung cancer, but they argued that even if it did, and the relative risk were 20, the specific causation inference in a given case was still insecure because the epidemiologic study told us nothing about the particular case. Lahav seems to be channeling the tobacco-company argument, which has long since been rejected in the substantive law of causation. Indeed, as noted, epidemiologists do draw inferences about individual cases from population-based studies when they invoke clinical prediction models such as the Framingham cardiovascular risk event model, or the Gail breast cancer prediction model. Physicians base important clinical interventions, both pharmacologic and surgical, for individuals upon population studies. Lahav asserts, without evidence, that the only difference between an intervention based upon an 80% or a 30% probability is a “normative implication.”[11] The difference is starkly factual, not normative, and describes a long-term likelihood of success, as well as an individual probability of success.

Post-Modern Causation

What we have in Lahav’s essay is the ultimate post-modern program, which asserts, without evidence, that when causation is “chancy,” or indeterminate, courts leave the realm of science and step into the twilight zone of “normative decisions.” Lahav suggests that there is an extreme plasticity to the very concept of causation such that causation can be whatever judges want it to be. I for one sincerely doubt it. And if judges make up some Lahav-inspired concept of normative causation, the scientific community would rightfully scoff.

Establishing causation can be difficult, and many so-called mass tort litigations have failed for want of sufficient, valid evidence supporting causal claims. The late Professor Margaret Berger reacted to this difficulty in a more forthright way by arguing for the abandonment of general causation, or cause-in-fact, as an element of tort claims under the law.[12] Berger’s antipathy to requiring causation manifested in her hostility to judicial gatekeeping of the validity of expert witness opinions. Her animus against requiring causation and gatekeeping under Rule 702 was so strong that it exceeded her lifespan. Berger’s chapter in the third edition of the Reference Manual on Scientific Evidence, which came out almost one year after her death, embraced the First Circuit’s notorious anti-Daubert decision in Milward, which also post-dated her passing.[13]

Professor Lahav has previously expressed a disdain for the causation requirement in tort law. In an earlier paper, “The Knowledge Remedy,” Lahav argued for an extreme, radical precautionary principle approach to causation.[14] Lahav believes that the likes of David Michaels have “demonstrated” that manufactured uncertainty is a genuine problem, but not one that affects her main claims. Remarkably, Lahav sees no problem with manufactured certainty in the advocacy science of many authors or of the lawsuit industry.[15] In “Chancy Causation,” Lahav thus credulously repeats Michaels’ arguments, and goes so far as to describe Rule 702 challenges to causal claims as having the “negative effect” of producing “incentives to sow doubt about epidemiologic studies using methodological battles, a strategy pioneered by the tobacco companies … .”[16] Lahav’s agenda is revealed by the absence of any corresponding concern about the negative effect of producing incentives to overstate the findings, or the validity of inferences, in order to obtain unwarranted and unsafe verdicts for claimants.


[1] Alexandra D. Lahav, “Chancy Causation in Tort,” 15 J. Tort L. 109 (2022) [hereafter Chancy Causation].

[2] Chancy Causation at 110.

[3] See, e.g., David K. Lewis, Philosophical Papers: Volume 2 175 (1986); Mark Parascandola, “Evidence and Association: Epistemic Confusion in Toxic Tort Law,” 63 Phil. Sci. S168 (1996).

[4] Chancy Causation at 109.

[5] Chancy Causation at 110-11.

[6] Chancy Causation at 129.

[7] Felipe Jiménez, “Alexandra Lahav on Chancy Causation in Tort,” The Private Law Podcast (Mar. 29, 2021).

[8] Chancy Causation at 115.

[9] Chancy Causation at 116-17.

[10] Chancy Causation at 117.

[11] Chancy Causation at 119.

[12] Margaret A. Berger, “Eliminating General Causation: Notes towards a New Theory of Justice and Toxic Torts,” 97 Colum. L. Rev. 2117 (1997).

[13] Milward v. Acuity Specialty Products Group, Inc., 639 F.3d 11 (1st Cir. 2011), cert. denied sub nom., U.S. Steel Corp. v. Milward, 132 S. Ct. 1002 (2012).

[14] Alexandra D. Lahav, “The Knowledge Remedy,” 98 Texas L. Rev. 1361 (2020). See “The Knowledge Remedy Proposal,” Tortini (Nov. 14, 2020).

[15] Chancy Causation at 118 (citing plaintiffs’ expert witness David Michaels, The Triumph of Doubt: Dark Money and the Science of Deception (2020), among others).

[16] Chancy Causation at 129.

The Role of Peer Review in Rule 702 and 703 Gatekeeping

November 19th, 2023

“There is no expedient to which man will not resort to avoid the real labor of thinking.”
              Sir Joshua Reynolds (1723-92)

Some courts appear to duck the real labor of thinking, and the duty to gatekeep expert witness opinions,  by deferring to expert witnesses who advert to their reliance upon peer-reviewed published studies. Does the law really support such deference, especially when problems with the relied-upon studies are revealed in discovery? A careful reading of the Supreme Court’s decision in Daubert, and of the Reference Manual on Scientific Evidence provides no support for admitting expert witness opinion testimony that relies upon peer-reviewed published studies, when the studies are invalid or are based upon questionable research practices.[1]

In Daubert v. Merrell Dow Pharmaceuticals, Inc.,[2] the Supreme Court suggested that peer review of studies relied upon by a challenged expert witness should be a factor in determining the admissibility of that expert witness’s opinion. In thinking about the role of peer-review publication in expert witness gatekeeping, it is helpful to remember the context of how and why the Supreme Court was talking about peer review in the first place. In the trial court, the Daubert plaintiff had proffered an expert witness opinion that featured reliance upon an unpublished reanalysis of published studies. On the defense motion, the trial court excluded the claimant’s witness,[3] and the Ninth Circuit affirmed.[4] The intermediate appellate court expressed its view that unpublished, non-peer-reviewed reanalyses were deviations from generally accepted scientific discourse, and that other appellate courts, considering the alleged risks of Bendectin, refused to admit opinions based upon unpublished, non-peer-reviewed reanalyses of epidemiologic studies.[5] The Circuit expressed its view that reanalyses are generally accepted by scientists when they have been verified and scrutinized by others in the field. Unpublished reanalyses done solely for litigation would be an insufficient foundation for expert witness opinion.[6]

The Supreme Court, in Daubert, evaded the difficult issues involved in evaluating a statistical analysis that has not been published by deciding the case on the ground that the lower courts had applied the wrong standard. The so-called Frye test, or what I call the “twilight zone” test, comes from the heralded 1923 case excluding opinion testimony based upon a lie detector:

“Just when a scientific principle or discovery crosses the line between the experimental and demonstrable stages is difficult to define. Somewhere in this twilight zone the evidential force of the principle must be recognized, and while the courts will go a long way in admitting expert testimony deduced from a well recognized scientific principle or discovery, the thing from which the deduction is made must be sufficiently established to have gained general acceptance in the particular field in which it belongs.”[7]

The Supreme Court, in Daubert, held that with the promulgation of the Federal Rules of Evidence in 1975, the twilight zone test was no longer legally valid. The guidance for admitting expert witness opinion testimony lay in Federal Rule of Evidence 702, which outlined an epistemic test for “knowledge” that would be helpful to the trier of fact. The Court then proceeded to articulate several non-definitive factors for “good science,” which might guide trial courts in applying Rule 702, such as testability or falsifiability, and a showing of a known or potential error rate. General acceptance carried over from Frye as another consideration.[8] Courts have continued to build on this foundation to identify other relevant considerations in gatekeeping.[9]

One of the Daubert Court’s pertinent considerations was “whether the theory or technique has been subjected to peer review and publication.”[10] The Court, speaking through Justice Blackmun, provided a reasonably cogent, but probably now out-dated discussion of peer review:

 “Publication (which is but one element of peer review) is not a sine qua non of admissibility; it does not necessarily correlate with reliability, see S. Jasanoff, The Fifth Branch: Science Advisors as Policymakers 61-76 (1990), and in some instances well-grounded but innovative theories will not have been published, see Horrobin, “The Philosophical Basis of Peer Review and the Suppression of Innovation,” 263 JAMA 1438 (1990). Some propositions, moreover, are too particular, too new, or of too limited interest to be published. But submission to the scrutiny of the scientific community is a component of “good science,” in part because it increases the likelihood that substantive flaws in methodology will be detected. See J. Ziman, Reliable Knowledge: An Exploration of the Grounds for Belief in Science 130-133 (1978); Relman & Angell, “How Good Is Peer Review?,” 321 New Eng. J. Med. 827 (1989). The fact of publication (or lack thereof) in a peer reviewed journal thus will be a relevant, though not dispositive, consideration in assessing the scientific validity of a particular technique or methodology on which an opinion is premised.”[11]

To the extent that peer review was touted by Justice Blackmun, it was because the peer-review process advanced the ultimate consideration of the scientific validity of the opinion or claim under consideration. Validity was the thing; peer review was just a crude proxy.

If the Court were writing today, it might well have written that peer review is often a feature of bad science, advanced by scientists who know that peer-reviewed publication is the price of admission to the advocacy arena. And of course, the wild proliferation of journals, including the “pay-to-play” journals, facilitates the festschrift.

Reference Manual on Scientific Evidence

Certainly, judicial thinking has evolved since 1993 and the decision in Daubert. Other considerations for gatekeeping have been added. Importantly, Daubert involved the interpretation of a statute, and in 2000, the statute was amended.

Since the Daubert decision, the Federal Judicial Center and the National Academies of Science have weighed in with what is intended to be guidance for judges and lawyers litigating scientific and technical issue. The Reference Manual on Scientific Evidence is currently in a third edition, but a fourth edition is expected in 2024.

How does the third edition[12] treat peer review?

An introduction by now retired Associate Justice Stephen Breyer blandly reports the Daubert considerations, without elaboration.[13]

The most revealing and important chapter in the Reference Manual is the one on scientific method and procedure, and sociology of science, “How Science Works,” by Professor David Goodstein.[14] This chapter’s treatment is not always consistent. In places, the discussion of peer review is trenchant. At other places, it can be misleading. Goodstein’s treatment, at first, appears to be a glib endorsement of peer review as a substitute for critical thinking about a relied-upon published study:

“In the competition among ideas, the institution of peer review plays a central role. Scientific articles submitted for publication and proposals for funding often are sent to anonymous experts in the field, in other words, to peers of the author, for review. Peer review works superbly to separate valid science from nonsense, or, in Kuhnian terms, to ensure that the current paradigm has been respected.11 It works less well as a means of choosing between competing valid ideas, in part because the peer doing the reviewing is often a competitor for the same resources (space in prestigious journals, funds from government agencies or private foundations) being sought by the authors. It works very poorly in catching cheating or fraud, because all scientists are socialized to believe that even their toughest competitor is rigorously honest in the reporting of scientific results, which makes it easy for a purposefully dishonest scientist to fool a referee. Despite all of this, peer review is one of the venerated pillars of the scientific edifice.”[15]

A more nuanced and critical view emerges in footnote 11, from the above-quoted passage, when Goodstein discusses how peer review was framed by some amici curiae in the Daubert case:

“The Supreme Court received differing views regarding the proper role of peer review. Compare Brief for Amici Curiae Daryl E. Chubin et al. at 10, Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579 (1993) (No. 92-102) (“peer review referees and editors limit their assessment of submitted articles to such matters as style, plausibility, and defensibility; they do not duplicate experiments from scratch or plow through reams of computer-generated data in order to guarantee accuracy or veracity or certainty”), with Brief for Amici Curiae New England Journal of Medicine, Journal of the American Medical Association, and Annals of Internal Medicine in Support of Respondent, Daubert v. Merrell Dow Pharm., Inc., 509 U.S. 579 (1993) (No. 92-102) (proposing that publication in a peer-reviewed journal be the primary criterion for admitting scientific evidence in the courtroom). See generally Daryl E. Chubin & Edward J. Hackett, Peerless Science: Peer Review and U.S. Science Policy (1990); Arnold S. Relman & Marcia Angell, How Good Is Peer Review? 321 New Eng. J. Med. 827–29 (1989). As a practicing scientist and frequent peer reviewer, I can testify that Chubin’s view is correct.”[16]

So, if, as Professor Goodstein attests, Chubin is correct that peer review does not “guarantee accuracy or veracity or certainty,” the basis for veneration is difficult to fathom.

Later in Goodstein’s chapter, in a section entitled “V. Some Myths and Facts about Science,” the gloves come off:[17]

“Myth: The institution of peer review assures that all published papers are sound and dependable.

Fact: Peer review generally will catch something that is completely out of step with majority thinking at the time, but it is practically useless for catching outright fraud, and it is not very good at dealing with truly novel ideas. Peer review mostly assures that all papers follow the current paradigm (see comments on Kuhn, above). It certainly does not ensure that the work has been fully vetted in terms of the data analysis and the proper application of research methods.”[18]

Goodstein is not a post-modern nihilist. He acknowledges that “real” science can be distinguished from “not real science.” He can hardly be seen to have given a full-throated endorsement to peer review as satisfying the gatekeeper’s obligation to evaluate whether a study can be reasonably relied upon, whether reliance upon a particular peer-reviewed study can constitute sufficient evidence to render an expert witness’s opinion helpful, or whether it reflects the reliable application of a methodology.

Goodstein cites, with apparent approval, the amicus brief filed by the New England Journal of Medicine, and other journals, which advised the Supreme Court that “good science” requires “a rigorous trilogy of publication, replication and verification before it is relied upon.”[19]

“Peer review’s ‘role is to promote the publication of well-conceived articles so that the most important review, the consideration of the reported results by the scientific community, may occur after publication.’”[20]

Outside of Professor Goodstein’s chapter, the Reference Manual devotes very little ink or analysis to the role of peer review in assessing Rule 702 or 703 challenges to witness opinions or specific studies.  The engineering chapter acknowledges that “[t]he topic of peer review is often raised concerning scientific and technical literature,” and helpfully supports Goodstein’s observations by noting that peer review “does not ensure accuracy or validity.”[21]

The chapter on neuroscience is one of the few chapters in the Reference Manual, other than Professor Goodstein’s, to address the limitations of peer review. Peer review, if absent, is highly suspicious, but its presence is only the beginning of an evaluation process that continues after publication:

Daubert’s stress on the presence of peer review and publication corresponds nicely to scientists’ perceptions. If something is not published in a peer-reviewed journal, it scarcely counts. Scientists only begin to have confidence in findings after peers, both those involved in the editorial process and, more important, those who read the publication, have had a chance to dissect them and to search intensively for errors either in theory or in practice. It is crucial, however, to recognize that publication and peer review are not in themselves enough. The publications need to be compared carefully to the evidence that is proffered.[22]

The neuroscience chapter goes on to discuss peer review also in the narrow context of functional magnetic resonance imaging (fMRI). The authors note that fMRI, as a medical procedure, has been the subject of thousands of peer-reviewed publications, but those peer reviews do little to validate the use of fMRI as a high-tech lie detector.[23] The mental health chapter notes in a brief footnote that the science of memory is now well accepted and has been subjected to peer review, and that “[c]areful evaluators” use only tests that have had their “reliability and validity confirmed in peer-reviewed publications.”[24]

Echoing other chapters, the engineering chapter also mentions peer review briefly in connection with qualifying as an expert witness, and in validating the value of accrediting societies.[25]  Finally, the chapter points out that engineering issues in litigation are often sufficiently novel that they have not been explored in peer-reviewed literature.[26]

Most of the other chapters of the Reference Manual, third edition, discuss peer review only in the context of qualifications and membership in professional societies.[27] The chapter on exposure science discusses peer review only in the narrow context of a claim that EPA guidance documents on exposure assessment are peer reviewed and are considered “authoritative.”[28]

Other chapters discuss peer review briefly and again only in very narrow contexts. For instance, the epidemiology chapter discusses peer review in connection with two very narrow issues peripheral to Rule 702 gatekeeping. First, the chapter raises the question (without providing a clear answer) whether non-peer-reviewed studies should be included in meta-analyses.[29] Second, the chapter asserts that “[c]ourts regularly affirm the legitimacy of employing differential diagnostic methodology,” to determine specific causation, on the basis of several factors, including the questionable claim that the methodology “has been subjected to peer review.”[30] There appears to be no discussion in this key chapter about whether, and to what extent, peer review of published studies can or should be considered in the gatekeeping of epidemiologic testimony. There is certainly nothing in the epidemiology chapter, or for that matter elsewhere in the Reference Manual, to suggest that reliance upon a peer-reviewed published study pretermits analysis of that study to determine whether it is indeed internally valid or reasonably relied upon by expert witnesses in the field.


[1] See Jop de Vrieze, “Large survey finds questionable research practices are common: Dutch study finds 8% of scientists have committed fraud,” 373 Science 265 (2021); Yu Xie, Kai Wang, and Yan Kong, “Prevalence of Research Misconduct and Questionable Research Practices: A Systematic Review and Meta-Analysis,” 27 Science & Engineering Ethics 41 (2021).

[2] 509 U.S. 579 (1993).

[3]  Daubert v. Merrell Dow Pharmaceuticals, Inc., 727 F.Supp. 570 (S.D.Cal.1989).

[4] 951 F. 2d 1128 (9th Cir. 1991).

[5]  951 F. 2d, at 1130-31.

[6] Id. at 1131.

[7] Frye v. United States, 293 F. 1013, 1014 (D.C. Cir. 1923) (emphasis added).

[8]  Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579, 590 (1993).

[9] See, e.g., In re TMI Litig. II, 911 F. Supp. 775, 787 (M.D. Pa. 1995) (considering the relationship of the technique to methods that have been established to be reliable, the uses of the method in the actual scientific world, the logical or internal consistency and coherence of the claim, the consistency of the claim or hypothesis with accepted theories, and the precision of the claimed hypothesis or theory).

[10] Id. at  593.

[11] Id. at 593-94.

[12] National Research Council, Reference Manual on Scientific Evidence (3rd ed. 2011) [RMSE].

[13] Id., “Introduction” at 1, 13.

[14] David Goodstein, “How Science Works,” RMSE 37.

[15] Id. at 44-45.

[16] Id. at 44-45 n. 11 (emphasis added).

[17] Id. at 48 (emphasis added).

[18] Id. at 49 n.16 (emphasis added)

[19] David Goodstein, “How Science Works,” RMSE 64 n.45 (citing Brief for the New England Journal of Medicine, et al., as Amici Curiae supporting Respondent, 1993 WL 13006387 at *2, in Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579 (1993)).

[20] Id. (citing Brief for the New England Journal of Medicine, et al., 1993 WL 13006387 *3).

[21] Channing R. Robertson, John E. Moalli, David L. Black, “Reference Guide on Engineering,” RMSE 897, 938 (emphasis added).

[22] Henry T. Greely & Anthony D. Wagner, “Reference Guide on Neuroscience,” RMSE 747, 786.

[23] Id. at 776, 777.

[24] Paul S. Appelbaum, “Reference Guide on Mental Health Evidence,” RMSE 813, 866, 886.

[25] Channing R. Robertson, John E. Moalli, David L. Black, “Reference Guide on Engineering,” RMSE 897, 901, 931.

[26] Id. at 935.

[27] Daniel Rubinfeld, “Reference Guide on Multiple Regression,” RMSE 303, 328 (“[w]ho should be qualified as an expert?”); Shari Seidman Diamond, “Reference Guide on Survey Research,” RMSE 359, 375; Bernard D. Goldstein & Mary Sue Henifin, “Reference Guide on Toxicology,” RMSE 633, 677, 678 (noting that membership in some toxicology societies turns in part on having published in peer-reviewed journals).

[28] Joseph V. Rodricks, “Reference Guide on Exposure Science,” RMSE 503, 508 (noting that EPA guidance documents on exposure assessment often are issued after peer review).

[29] Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” RMSE 549, 608.

[30] Id. at 617-18 n.212.

Consensus is Not Science

November 8th, 2023

Ted Simon, a toxicologist and a fellow board member at the Center for Truth in Science, has posted an intriguing piece in which he labels scientific consensus as a fool’s errand.[1]  Ted begins his piece by channeling the late Michael Crichton, who famously derided consensus in science, in his 2003 Caltech Michelin Lecture:

“Let’s be clear: the work of science has nothing whatever to do with consensus. Consensus is the business of politics. Science, on the contrary, requires only one investigator who happens to be right, which means that he or she has results that are verifiable by reference to the real world. In science, consensus is irrelevant. What is relevant is reproducible results. The greatest scientists in history are great precisely because they broke with the consensus.

* * * *

There is no such thing as consensus science. If it’s consensus, it isn’t science. If it’s science, it isn’t consensus. Period.”[2]

Crichton’s (and Simon’s) critique of consensus is worth remembering in the face of recent proposals by Professor Edward Cheng,[3] and others,[4] to make consensus the touchstone for the admissibility of scientific opinion testimony.

Consensus or general acceptance can be a proxy for conclusions drawn from valid inferences, within reliably applied methodologies, based upon sufficient evidence, quantitatively and qualitatively. When expert witnesses opine contrary to a consensus, they raise serious questions regarding how they came to their conclusions. Carl Sagan declaimed that “extraordinary claims require extraordinary evidence,” but his principle was hardly novel. Some authors quote the French polymath Pierre Simon Marquis de Laplace, who wrote in 1810: “[p]lus un fait est extraordinaire, plus il a besoin d’être appuyé de fortes preuves,”[5] but as the Quote Investigator documents,[6] the basic idea is much older, going back at least another century to a church rector who expressed his skepticism of a contemporary’s claim of direct communication with the almighty: “Sure, these Matters being very extraordinary, will require a very extraordinary Proof.”[7]

Ted Simon’s essay is also worth consulting because he notes that many sources of apparent consensus are really faux consensus, nothing more than self-appointed intellectual authoritarians who systematically have excluded some points of view, while turning a blind eye to their own positional conflicts.

Lawyers, courts, and academics should be concerned that Cheng’s “consensus principle” will change the focus from evidence, methodology, and inference, to a surrogate or proxy for validity. And the sociological notion of consensus will then require litigation of whether some group really has announced a consensus. Consensus statements in some areas abound, but inquiring minds may want to know whether they are the result of rigorous, systematic reviews of the pertinent studies, and whether the available studies can support the claimed consensus.

Professor Cheng is hard at work on a book-length explication of his proposal, and some criticism will have to await the event.[8] Perhaps Cheng will overcome the objections placed against his proposal.[9] Some of the examples Professor Cheng has given, however, do not inspire confidence. Consider his errant, dramatic misreading of the American Statistical Association’s 2016 p-value consensus statement to represent, in Cheng’s words:

“[w]hile historically used as a rule of thumb, statisticians have now concluded that using the 0.05 [p-value] threshold is more distortive than helpful.”[10]

The 2016 Statement said no such thing, although a few statisticians attempted to distort the statement in the way that Cheng suggests. In 2021, a select committee of leading statisticians, appointed by the President of the ASA, issued a statement to make clear that the ASA had not embraced the Cheng misinterpretation.[11] This one example alone does not bode well for the viability of Cheng’s consensus principle.


[1] Ted Simon, “Scientific consensus is a fool’s errand made worse by IARC” (Oct. 2023).

[2] Michael Crichton, “Aliens Cause Global Warming,” Caltech Michelin Lecture (Jan. 17, 2003).

[3] Edward K. Cheng, “The Consensus Rule: A New Approach to Scientific Evidence,” 75 Vanderbilt L. Rev. 407 (2022) [Consensus Rule]

[4] See Norman J. Shachoy Symposium, The Consensus Rule: A New Approach to the Admissibility of Scientific Evidence (2022), 67 Villanova L. Rev. (2022); David S. Caudill, “The ‘Crisis of Expertise’ Reaches the Courtroom: An Introduction to the Symposium on, and a Response to, Edward Cheng’s Consensus Rule,” 67 Villanova L. Rev. 837 (2022); Harry Collins, “The Owls: Some Difficulties in Judging Scientific Consensus,” 67 Villanova L. Rev. 877 (2022); Robert Evans, “The Consensus Rule: Judges, Jurors, and Admissibility Hearings,” 67 Villanova L. Rev. 883 (2022); Martin Weinel, “The Adversity of Adversarialism: How the Consensus Rule Reproduces the Expert Paradox,” 67 Villanova L. Rev. 893 (2022); Wendy Wagner, “The Consensus Rule: Lessons from the Regulatory World,” 67 Villanova L. Rev. 907 (2022); Edward K. Cheng, Elodie O. Currier & Payton B. Hampton, “Embracing Deference,” 67 Villanova L. Rev. 855 (2022).

[5] Pierre-Simon Laplace, Théorie analytique des probabilités (1812) (“The more extraordinary a fact, the more it needs to be supported by strong proofs.”). See Tressoldi, “Extraordinary Claims Require Extraordinary Evidence: The Case of Non-Local Perception, a Classical and Bayesian Review of Evidences,” 2 Frontiers Psych. 117 (2011); Charles Coulston Gillispie, Pierre-Simon Laplace, 1749-1827: a life in exact science (1997).

[6] “Extraordinary Claims Require Extraordinary Evidence” (Dec. 5, 2021).

[7] Benjamin Bayly, An Essay on Inspiration 362, part 2 (2nd ed. 1708).

[8] The Consensus Principle, under contract with the University of Chicago Press.

[9] SeeCheng’s Proposed Consensus Rule for Expert Witnesses” (Sept. 15, 2022);
Further Thoughts on Cheng’s Consensus Rule” (Oct. 3, 2022); “Consensus Rule – Shadows of Validity” (Apr. 26, 2023).

[10] Consensus Rule at 424 (citing but not quoting Ronald L. Wasserstein & Nicole A. Lazar, “The ASA Statement on p-Values: Context, Process, and Purpose,” 70 Am. Statistician 129, 131 (2016)).

[11] Yoav Benjamini, Richard D. DeVeaux, Bradley Efron, Scott Evans, Mark Glickman, Barry Graubard, Xuming He, Xiao-Li Meng, Nancy Reid, Stephen M. Stigler, Stephen B. Vardeman, Christopher K. Wikle, Tommy Wright, Linda J. Young, and Karen Kafadar, “The ASA President’s Task Force Statement on Statistical Significance and Replicability,” 15 Annals of Applied Statistics 1084 (2021); see also “A Proclamation from the Task Force on Statistical Significance” (June 21, 2021).

Is the Scientific Method Fascist?

June 14th, 2023

Just before the pandemic, when our country seemed to have gone tits up, there was a studied effort to equate any emphasis on scientific method, and the valuation of “[o]bjective, rational linear thinking,” “[c]ause and effect relationships,” and “[q]uantitative emphasis,” with white privilege and microaggression against non-white people.

I am not making up this claim; I am not creative enough. Indeed, for a while, the Smithsonian National Museum of African American History & Culture featured a graphic that included “emphasis on scientific method” as an aspect of white culture, and implied it was an unsavory aspect of “white privilege.”[1]

Well, as it turns out, scientific method is not only racist, but fascist as well.

With pretentious citations to Deleuze,[2] Foucault,[3] and Lyotard,[4] a group of Canadian authors[5] set out to decolonize science and medicine from the fascist grip of scientific methodology and organizations such as the Cochrane Group. The grand insight is that the health sciences have been “colonized” by a scientific research “paradigm” that is “outrageously exclusionary and dangerously normative with regards to scientific knowledge.” By excluding “alternative forms of knowledge,” evidence-based medicine acts as a “fascist structure.” The Cochrane Group in particular is singled out for having created an exclusionary and non-egalitarian hierarchy of evidence. Intolerance for non-approved modes of inference and thinking amounts, in these authors’ view, to “manifestations of fascism,” which are more “pernicious,” even if less brutal, than the fascism practiced by Hitler and Mussolini.[6]

Clutch the pearls!

Never mind that “deconstruction” itself sounds a bit fascoid,[7] not to mention being a rather vague concept. The authors seem intent on promoting multiple ways of knowing without epistemic content. Indeed, our antifa authors do not attempt to show that evidence-based medicine leads regularly to incorrect results, or that their unspecified alternatives have greater predictive value. Nonetheless, decolonization of medicine and deconstruction of hierarchical methodology remain key for them to achieve an egalitarian epistemology, by which everyone is equally informed and equally stupid. In the inimitable words of the authors, “many scientists find themselves interpellated by hegemonic discourses and come to disregard all others.”[8]

These epistemic freedom fighters want to divorce the idea of evidence from objective reality, and make evidence bend to “values.”[9] Apparently, the required deconstruction of the “knowing subject” is that the subject is “implicitly male, white, Western and heterosexual.” Medicine’s fixation on binaries such as normal and pathological, male and female, shows that evidence-based medicine is simply not queer enough. Our intrepid authors must be credited for having outed the “hidden political agenda” of those who pretend simply to find the truth, but who salivate over imposing their “hegemonic norms,” asserted in the “name of ‘truth’.”

These Canadian authors leave us with a battle cry: “scholars have not only a scientific duty, but also an ethical obligation to deconstruct these regimes of power.”[10] Scientists of the world, you have nothing to lose but your socially constructed non-sensical conception of scientific truth.

Although it is easy to make fun of post-modernist pretensions,[11] there is a point about the force of argument and evidence. The word “valid” comes to us from the 16th century French word “valide,” which in turn comes from the Latin validus, meaning strong. Similarly, we describe a well-conducted study with robust findings as compelling our belief.

I recall the late Robert Nozick, back in the 1970s, expressing the view that someone who embraced a contradiction might pop out of existence, the way an electron and a positron might cancel each other. If only it were so, we might have people exercising more care in their thinking and speaking.


[1] “Is Your Daubert Motion Racist?” (July 17, 2020). The Smithsonian has since seen fit to remove the chart reproduced here, but we know what they really believe.

[2] Gilles Deleuze and Félix Guattari, Anti-oedipus: Capitalism and Schizophrenia (1980); Gilles Deleuze and Félix Guattari, A Thousand Plateaus: Capitalism and Schizophrenia (1987). This dross enjoyed funding from the Canadian Institutes of Health Research, and the Social Science and Humanities Research Council of Canada.

[3] Michel Foucault, The Birth of the Clinic: An Archaeology of Medical Perception (1973); Michel Foucault, The History of Sexuality, Volume 1: An Introduction (trans. Robert Hurley 1978); Michel Foucault, Society Must Be Defended: Lectures at the Collège de France, 1975–1976 (2003); Michel Foucault, Power/Knowledge: Selected Interviews and Other Writings, 1972–1977 (1980); Michel Foucault, Fearless Speech (2001).

[4] Jean-François Lyotard, The Postmodern Condition: A Report on Knowledge (1984).

[5] Dave Holmes, Stuart J Murray, Amélie Perron, and Geneviève Rail, “Deconstructing the evidence-based discourse in health sciences: truth, power and fascism,” 4 Internat’l J. Evidence-Based Health 180 (2006) [Deconstructing]

[6] Deconstructing at 181.

[7] Pace David Frum

[8] Deconstructing at 182.

[9] Deconstructing at 183.

[10] Deconstructing  at 180-81.

[11] Alan D. Sokal, “Transgressing the Boundaries: Toward a Transformative Hermeneutics of Quantum Gravity,” 46 Social Text 217 (1994).

Consensus Rule – Shadows of Validity

April 26th, 2023

Back in 2011, at a Fourth Circuit Judicial Conference, Chief Justice John Roberts took a cheap shot at law professors and law reviews when he intoned:

“Pick up a copy of any law review that you see, and the first article is likely to be, you know, the influence of Immanuel Kant on evidentiary approaches in 18th Century Bulgaria, or something, which I’m sure was of great interest to the academic that wrote it, but isn’t of much help to the bar.”[1]

Anti-intellectualism is in vogue these days. No doubt, Roberts was jocularly indulging in an over-generalization, but for anyone who tries to keep up with the law reviews, he has a small point. Other judges have rendered similar judgments. Back in 1993, in a cranky opinion piece – in a law review – then Judge Richard A. Posner channeled the liar paradox by criticizing law review articles for “the many silly titles, the many opaque passages, the antic proposals, the rude polemics, [and] the myriad pretentious citations.”[2] In a speech back in 2008, Justice Stephen Breyer noted that “[t]here is evidence that law review articles have left terra firma to soar into outer space.”[3]

The temptation to rationalize, and the urge to advocate for reflective equilibrium between the law as it exists and the law as we think it should be, combine to produce some silly and harmful efforts to rewrite the law as we know it. Jeremy Bentham, Mr. Nonsense-on-Stilts, who sits stuffed in a hallway of University College London, ushered in a now venerable tradition of rejecting tradition and common sense in proposing all sorts of law reforms.[4] In the early 1800s, Bentham, without much in the way of actual courtroom experience, deviled the English bench and bar with sweeping proposals to place evidence law on what he thought was a rational foundation. As with his naïve utilitarianism, Bentham’s contributions to jurisprudence often ignored the realities of human experience and decision making. The Benthamite tradition of anti-tradition is certainly alive and well in the law reviews.

Still, I have a soft spot in my heart for law reviews. Although not peer reviewed, law reviews provide law students a tremendous opportunity to learn about writing and scholarship through publishing the work of legal scholars, judges, thoughtful lawyers, and other students. Not all law review articles are nonsense on stilts, but we certainly should have our wits about us when we read immodest proposals from the law professoriate.

*   *   *   *   *   *   *   *   *   *

Professor Edward Cheng has written broadly and insightfully about evidence law, and he certainly has the educational training to do so. Recently, Cheng has been bemused by the expert paradox, which asks how lay persons, without expertise, can evaluate and judge issues of the admissibility, validity, and correctness of expert opinion. The paradox has long haunted evidence law, and it is at center stage in the adjudication of expert admissibility issues, as well as the trial of technical cases. Cheng has now proposed a radical overhaul to the law of evidence, which would require that we stop asking courts to act as gatekeepers, and stop asking juries to determine the validity and correctness of expert witness opinions before them. Cheng’s proposal would revert to the nose-counting process of Frye and permit consideration only of whether there is an expert witness consensus to support the proffered opinion for any claim or defense.[5] Or, in Plato’s allegory of the cave, we would have to learn to be content with shadows on the wall rather than striving to know the real thing.

When Cheng’s proposal first surfaced, I wrote briefly about why it was a bad idea.[6] Since his initial publication, a law review symposium was assembled to address and perhaps to celebrate the proposal.[7] The papers from that symposium are now in print.[8] Unsurprisingly, the papers are both largely sympathetic (but not completely) to Cheng’s proposal, and virtually devoid of references to actual experiences of gatekeeping or trials of technical issues.

Cheng contends that the so-called Daubert framework for addressing the admissibility of expert witness opinion is wrong. He does not argue that the existing law, in the form of Federal Rules of Evidence 702 and 703, fails to call for an epistemic standard, both for admitting opinion testimony and for the fact-finders’ assessments. There is no effort to claim that somehow four Supreme Court cases, and thousands of lower courts, have erroneously viewed the whole process. Rather, Cheng simply asserts that non-expert judges cannot evaluate the reliability (validity) of expert witness opinions, and that non-expert jurors cannot “reach independent, substantive conclusions about specialized facts.”[9] The law must change to accommodate his judgment.

In his symposium contribution, Cheng expands upon his previous articulation of his proposed “consensus rule.”[10] What is conspicuously absent, however, is any example of failed gatekeeping that excluded valid expert witness opinion. One example, the appellate decision in Rosen v. Ciba-Geigy Corporation,[11] which Cheng does give, is illustrative of Cheng’s project. The expert witness, whose opinion was excluded, was on the faculty of the University of Chicago medical school; Richard Posner, the appellate judge who wrote the opinion that affirmed the expert witness’s exclusion, was on the faculty of that university’s law school. Without any discussion of the reports, depositions, hearings, or briefs, Cheng concludes that “the very idea that a law professor would tell medical school colleagues that their assessments were unreliable seems both breathtakingly arrogant and utterly ridiculous.”[12]

Except, of course, very well qualified scientists and physicians advance invalid and incorrect claims all the time. What strikes me as breathtakingly arrogant and utterly ridiculous is the judgment of a law professor, who has little to no experience trying or defending Rule 702 and 703 issues, labeling the “very idea” as arrogant and ridiculous. Aside from its being a petitio principii, we could probably add that the reaction is emotive, uninformed, and uninformative, and that it fails to support the author’s suggestion that “Daubert has it all wrong,” and that “[w]e need a different approach.”

Judges and jurors obviously will never fully understand the scientific issues before them. If and when this lack of epistemic competence is problematic, we should honestly acknowledge that we are beyond the realm of the Constitution’s Seventh Amendment. Since Cheng is fantasizing about what the law should be, why not fantasize about not allowing lay people to decide complex scientific issues? Verdicts from jurors who do not have to give reasons for their decisions, and who are not in any sense peers of the scientists whose work they judge, are normatively problematic.

Professor Cheng likens his consensus rule to how the standard of care is decided in medical malpractice litigation. The analogy is interesting, but hardly compelling, in that it ignores the “two schools of thought” doctrine.[13] In litigation of claims of professional malpractice, the “two schools of thought” doctrine is a complete defense. As explained by the Pennsylvania Supreme Court,[14] physicians may defend against claims that they deviated from the standard of care, or of professional malpractice, by adverting to support for their treatment by a minority of professionals in their field:

“Where competent medical authority is divided, a physician will not be held responsible if in the exercise of his judgment he followed a course of treatment advocated by a considerable number of recognized and respected professionals in his given area of expertise.”[15]

The analogy to medical malpractice litigation seems inapt.

Professor Cheng advertises that he will be giving full-length book treatment to his proposal, and so perhaps my critique is uncharitable in looking at a preliminary (antic?) law review article. Still, his proposal seems to ignore that “general acceptance” renders consensus, when it truly exists, relevant both to the court’s gatekeeping decisions and to the fact finders’ determination of the facts and issues in dispute. Indeed, I have never seen a Rule 702 hearing that did not involve, to some extent, the assertion of a consensus, or the lack thereof.

To the extent that we remain committed to trials of scientific claims, we can see that judges and jurors often can detect inconsistencies, cherry picking, unproven assumptions, and other aspects of the patho-epistemology of expert witness opinions. It takes a community of scientists and engineers to build a space rocket, but any Twitter moron can determine when a rocket blows up on launch. Judges in particular have (and certainly should have) the competence to determine deviations from the scientific and statistical standards of care that pertain to litigants’ claims.

Cheng’s proposal also ignores how difficult and contentious it is to ascertain the existence, scope, and actual content of scientific consensus. In some areas of science, such as occupational and environmental epidemiology and medicine, faux consensuses are set up by would-be expert witnesses for both claimants and defendants. A search of the word “consensus” in the PubMed database yields over a quarter of a million hits. The race to the bottom is on. Replacing epistemic validity with sociological and survey navel gazing seems like a fool’s errand.

Perhaps the most disturbing aspect of Cheng’s proposal is what happens in the absence of consensus.  Pretty much anything goes, a situation that Cheng finds “interesting,” and I find horrifying:

“if there is no consensus, the legal system’s options become a bit more interesting. If there is actual dissensus, meaning that the community is fractured in substantial numbers, then the non-expert can arguably choose from among the available theories. If the expert community cannot agree, then one cannot possibly expect non-experts to do any better.”[16]

Cheng reports that textbooks and other documents “may be both more accurate and more efficient” evidence of consensus.[17] Maybe; maybe not. Textbooks are often dated by the time they arrive on the shelves, and contentious scientists are not beyond manufacturing certainty or doubt in the form of falsely claimed consensus.

Of course, often, if not most of the time, there will be no identifiable, legitimate consensus for a litigant’s claim at trial. What would Professor Cheng do in this default situation? Here Cheng, fully indulging the frolic, tells us that non-experts

“should hypothetically ask what the expert community is likely to conclude, rather than try to reach conclusions on their own.”[18]

So the default situation transforms jurors into tea-leaf readers of what an expert community, unknown to them, will do if and when there is evidence of a quantum and quality to support a consensus, or when that community gets around to articulating what the consensus is. Why not just toss claims that lack consensus support?


[1] Debra Cassens Weiss, “Law Prof Responds After Chief Justice Roberts Disses Legal Scholarship,” Am. Bar Ass’n J. (July 7, 2011).

[2] Richard A. Posner, “Legal Scholarship Today,” 45 Stanford L. Rev. 1647, 1655 (1993), quoted in Walter Olson, “Abolish the Law Reviews!” The Atlantic (July 5, 2012); see also Richard A. Posner, “Against the Law Reviews: Welcome to a world where inexperienced editors make articles about the wrong topics worse,” Legal Affairs (Nov. 2004).

[3] Brent Newton, “Scholar’s highlight: Law review articles in the eyes of the Justices,” SCOTUS Blog (April 30, 2012); “Fixing Law Reviews,” Inside Higher Education (Nov. 19, 2012).

[4] “More Antic Proposals for Expert Witness Testimony – Including My Own Antic Proposals” (Dec. 30, 2014).

[5] Edward K. Cheng, “The Consensus Rule: A New Approach to Scientific Evidence,” 75 Vanderbilt L. Rev. 407 (2022).

[6] “Cheng’s Proposed Consensus Rule for Expert Witnesses” (Sept. 15, 2022); “Further Thoughts on Cheng’s Consensus Rule” (Oct. 3, 2022).

[7] Norman J. Shachoy Symposium, The Consensus Rule: A New Approach to the Admissibility of Scientific Evidence (2022), 67 Villanova L. Rev. (2022).

[8] David S. Caudill, “The ‘Crisis of Expertise’ Reaches the Courtroom: An Introduction to the Symposium on, and a Response to, Edward Cheng’s Consensus Rule,” 67 Villanova L. Rev. 837 (2022); Harry Collins, “The Owls: Some Difficulties in Judging Scientific Consensus,” 67 Villanova L. Rev. 877 (2022); Robert Evans, “The Consensus Rule: Judges, Jurors, and Admissibility Hearings,” 67 Villanova L. Rev. 883 (2022); Martin Weinel, “The Adversity of Adversarialism: How the Consensus Rule Reproduces the Expert Paradox,” 67 Villanova L. Rev. 893 (2022); Wendy Wagner, “The Consensus Rule: Lessons from the Regulatory World,” 67 Villanova L. Rev. 907 (2022); Edward K. Cheng, Elodie O. Currier & Payton B. Hampton, “Embracing Deference,” 67 Villanova L. Rev. 855 (2022).

[9] Embracing Deference at 876.

[10] Edward K. Cheng, Elodie O. Currier & Payton B. Hampton, “Embracing Deference,” 67 Villanova L. Rev. 855 (2022) [Embracing Deference]

[11] Rosen v. Ciba-Geigy Corp., 78 F.3d 316 (7th Cir. 1996).

[12] Embracing Deference at 859.

[13] “Two Schools of Thought” (May 25, 2013).

[14] Jones v. Chidester, 531 Pa. 31, 40, 610 A.2d 964 (1992).

[15] Id. at 40. See also Fallon v. Loree, 525 N.Y.S.2d 93, 93 (N.Y. App. Div. 1988) (“one of several acceptable techniques”); Dailey, “The Two Schools of Thought and Informed Consent Doctrine in Pennsylvania,” 98 Dickinson L. Rev. 713 (1994); Douglas Brown, “Panacea or Pandora’s Box: The Two Schools of Medical Thought Doctrine after Jones v. Chidester,” 44 J. Urban & Contemp. Law 223 (1993).

[16] Embracing Deference at 861.

[17] Embracing Deference at 866.

[18] Embracing Deference at 876.

Reference Manual – Desiderata for 4th Edition – Part VI – Rule 703

February 17th, 2023

One of the most remarkable, and objectionable, aspects of the third edition was its failure to engage with Federal Rule of Evidence 703, and with the need for courts to assess the validity of individual studies relied upon. The statistics chapter has a brief but important discussion of Rule 703, as does the chapter on survey evidence. The epidemiology chapter mentions Rule 703 only in a footnote.[1]

Rule 703 appears to be the red-headed stepchild of the Federal Rules, and it is often ignored and omitted from so-called Daubert briefs.[2] Perhaps part of the problem is that Rule 703 (“Bases of an Expert’s Opinion Testimony”) is one of the most poorly drafted rules in the Federal Rules of Evidence:

“An expert may base an opinion on facts or data in the case that the expert has been made aware of or personally observed. If experts in the particular field would reasonably rely on those kinds of facts or data in forming an opinion on the subject, they need not be admissible for the opinion to be admitted. But if the facts or data would otherwise be inadmissible, the proponent of the opinion may disclose them to the jury only if their probative value in helping the jury evaluate the opinion substantially outweighs their prejudicial effect.”

Despite its tortuous wording, the rule is clear enough in authorizing expert witnesses to rely upon studies that are themselves inadmissible, and in allowing such witnesses to disclose the relied-upon studies to the jury, when there has been the requisite showing that their probative value in helping the jury evaluate the opinion substantially outweighs their prejudicial effect.

The statistics chapter in the third edition, nonetheless, confusingly suggested that

“a particular study may use a method that is entirely appropriate but that is so poorly executed that it should be inadmissible under Federal Rules of Evidence 403 and 702. Or, the method may be inappropriate for the problem at hand and thus lack the ‘fit’ spoken of in Daubert. Or the study might rest on data of the type not reasonably relied on by statisticians or substantive experts and hence run afoul of Federal Rule of Evidence 703.”[3]

Particular studies, even when beautifully executed, are not admissible. And particular studies are not subject to evaluation under Rule 702, apart from the gatekeeping of expert witness opinion testimony that is based upon those studies. To be sure, the reference to Rule 703 is important, and a welcome counter to the suggestion, elsewhere in the third edition, that courts should not look at individual studies. The independent review of individual studies is occasionally lost in the shuffle of litigation, and the statistics chapter is correct to note the evidentiary concern over whether each individual study may be reasonably relied upon by an expert witness. In any event, reasonably relied upon studies do not ipso facto become admissible.

The third edition’s chapter on Survey Research contains the most explicit direction on Rule 703, in terms of courts’ responsibilities.  In that chapter, the authors instruct that Rule 703:

“redirect[ed] attention to the ‘validity of the techniques employed’. The inquiry under Rule 703 focuses on whether facts or data are ‘of a type reasonably relied upon by experts in the particular field in forming opinions or inferences upon the subject’.”[4]

Although Rule 703 is clear enough on admissibility, the epidemiology chapter described epidemiologic studies broadly as admissible if sufficiently rigorous:

“An epidemiologic study that is sufficiently rigorous to justify a conclusion that it is scientifically valid should be admissible, as it tends to make an issue in dispute more or less likely.”[5]

The authors of the epidemiology chapter acknowledge, in a footnote, that “[h]earsay concerns may limit the independent admissibility of the study, but the study could be relied on by an expert in forming an opinion and may be admissible pursuant to Fed. R. Evid. 703 as part of the underlying facts or data relied on by the expert.”[6]

This footnote is curious, and incorrect. There is no question that hearsay “concerns” “may limit” admissibility of a study; hearsay is inadmissible unless an exception applies.[7] Rule 703 is not one of the exceptions to the rule against hearsay in Article VIII of the Federal Rules of Evidence. An expert witness’s reliance upon a study does not make the study admissible. The authors cite two cases,[8] but neither case held that reasonable reliance by expert witnesses transmuted epidemiologic studies into admissible evidence. The text of Rule 703 itself, and the overwhelming weight of case law interpreting and applying the rule,[9] make clear that the rule does not render scientific studies admissible. The two cases cited by the epidemiology chapter, Kehm and Ellis, both involved “factual findings” in public investigative or evaluative reports, which were independently admissible under Federal Rule of Evidence 803(8)(C).[10] As such, the cases failed to support the chapter’s suggestion that Rule 703 is a rule of admissibility for epidemiologic studies. The third edition thus, in one sentence, confused Rule 703 with an exception to the rule against hearsay, the rule that would otherwise keep statistically based epidemiologic studies from being received in evidence. The chapter’s point was reasonably clear, however, that studies “may be offered” to explain an expert witness’s opinion. Under Rule 705, that offer may also be refused.

The Reference Manual was certainly not alone in advancing the notion that studies are themselves admissible. Other well-respected evidence scholars have misstated the law on this issue.[11] The fourth edition would do well to note that scientific studies, and especially epidemiologic studies, involve multiple levels of hearsay. A typical epidemiologic study may contain hearsay leaps from patient to clinician, to laboratory technicians, to specialists interpreting test results, back to the clinician for a diagnosis, to a nosologist for disease coding, to a national or hospital database, to a researcher querying the database, to a statistician analyzing the data, to a manuscript that details data, analyses, and results, to editors and peer reviewers, back to study authors, and on to publication. Those leaps do not mean that the final results are thus untrustworthy or not reasonably relied upon, but they do raise well-nigh insuperable barriers to admissibility. The inadmissibility of scientific studies is generally not problematic because Rule 703 permits testifying expert witnesses to formulate opinions based upon facts and data, which are not themselves admissible in evidence. The distinction between relied upon, and admissible, studies is codified in the Federal Rules of Evidence, and in virtually every state’s evidence law.

The fourth edition might well also note that under Rule 104(a), the Rules of Evidence themselves do not govern a trial court’s preliminary determination, under Rules 702 or 703, of the admissibility of an expert witness’s opinion, or the appropriateness of reliance upon a particular study. Although Rule 705 may allow disclosure of facts and data described in studies, it is not an invitation to permit testifying expert witnesses to become a conduit for off-hand comments and opinions in the introduction or discussion sections of relied upon articles.[12] The wholesale admission of such hearsay opinions undermines the court’s control over opinion evidence. Rule 703 authorizes reasonable reliance upon “facts and data,” not every opinion that creeps into the published literature.

Reference Manual’s Disregard of Study Validity in Favor of the “Whole Tsumish”

The third edition evidenced considerable ambivalence about whether trial judges should engage in resolving disputes about the validity of individual studies relied upon by expert witnesses. Since 2000, Rule 702 has clearly required such engagement, which made the Manual’s hesitancy, on the whole, unjustifiable. The ambivalence with respect to study validity, however, was on full display in the late Professor Margaret Berger’s chapter, “The Admissibility of Expert Testimony.”[13] Berger’s chapter criticized “atomization,” or looking at individual studies in isolation, a process she described pejoratively as “slicing-and-dicing.”[14]

Drawing on the publications of Daubert-critic Susan Haack, Berger appeared to reject the notion that courts should examine the reliability of each study independently.[15] For Berger, the “proper” scientific method, as evidenced by the work of the International Agency for Research on Cancer (IARC), the Institute of Medicine, the National Institutes of Health, the National Research Council, and the National Institute of Environmental Health Sciences, “is to consider all the relevant available scientific evidence, taken as a whole, to determine which conclusion or hypothesis regarding a causal claim is best supported by the body of evidence.”[16]

Berger’s description of the review process, however, was profoundly misleading in its incompleteness. Of course, scientists undertaking a systematic review identify all the relevant studies, but some of the “relevant” studies may well be insufficiently reliable (because of internal or external validity issues) to answer the research question at hand. All the cited agencies, and other research organizations and researchers, exclude studies that are fundamentally flawed, whether as a result of bias, confounding, erroneous data analyses, or related problems. Berger cited no support for her remarkable suggestion that scientists do not make “reliability” judgments about available studies when assessing the “totality of the evidence.”[17]

Professor Berger, who had a distinguished career as a law professor and evidence scholar, died in November 2010, before the third edition was published. She was no friend of Daubert,[18] but her antipathy remarkably outlived her. Berger’s critical discussion of “atomization” cited the notorious decision in Milward v. Acuity Specialty Products Group, Inc., 639 F.3d 11, 26 (1st Cir. 2011), which was decided four months after her passing.[19]

Professor Berger’s contention about the need to avoid assessments of individual studies in favor of the whole “tsumish” must also be rejected because Federal Rule of Evidence 703 requires that each study considered by an expert witness “qualify” for reasonable reliance by virtue of the study’s containing facts or data that are “of a type reasonably relied upon by experts in the particular field in forming opinions or inferences upon the subject.” One of the deeply troubling aspects of the Milward decision is that it reversed the trial court’s sensible decision to exclude a toxicologist, Dr. Martyn Smith, who outran his headlights on issues having to do with a field in which he was clearly inexperienced – epidemiology.

Another curious omission in the third edition’s discussions of Milward is the dark ethical cloud of misconduct that hovers over the First Circuit’s reversal of the trial court’s exclusions of Martyn Smith and Carl Cranor. On appeal, the Council for Education and Research on Toxics (CERT) filed an amicus brief in support of reversing the exclusion of Smith and Cranor. The CERT amicus brief, however, never disclosed that CERT was founded by Smith and Cranor, and that CERT funded Smith’s research.[20]

Rule 702 requires courts to pay attention to, among other things, the sufficiency of the facts and data relied upon by expert witnesses. Rule 703’s requirement that individual studies must be reasonably relied upon is an important additional protreptic against the advice given by Professor Berger, in the third edition.


[1] The index notes the following page references for Rule 703: 214, 361, 363-364, and 610 n.184.

[2] See David E. Bernstein & Eric G. Lasker, “Defending Daubert: It’s Time to Amend Federal Rule of Evidence 702,” 57 William & Mary L. Rev. 1, 32 (2015) (“Rule 703 is frequently ignored in Daubert analyses”); Schachtman, “Rule 703 – The Problem Child of Article VII,” 17 Proof 3 (Spring 2009); Schachtman, “The Effective Presentation of Defense Expert Witnesses and Cross-examination of Plaintiffs’ Expert Witnesses,” at the ALI-ABA Course on Opinion and Expert Witness Testimony in State and Federal Courts (February 14-15, 2008). See also Julie E. Seaman, “Triangulating Testimonial Hearsay: The Constitutional Boundaries of Expert Opinion Testimony,” 96 Georgetown L.J. 827 (2008); “RULE OF EVIDENCE 703 — Problem Child of Article VII” (Sept. 19, 2011); “Giving Rule 703 the Cold Shoulder” (May 12, 2012); “New Reference Manual on Scientific Evidence Short Shrifts Rule 703” (Oct. 16, 2011).

[3] RMSE3d at 214.

[4] RMSE3d at 364 (internal citations omitted).

[5] RMSE 3d at 610 (internal citations omitted).

[6] RMSE3d at 610 n.184.

[7] Rule 802 (“Hearsay Rule”) “Hearsay is not admissible except as provided by these rules or by other rules prescribed by the Supreme Court pursuant to statutory authority or by Act of Congress.”

[8] Kehm v. Procter & Gamble Co., 580 F. Supp. 890, 902 (N.D. Iowa 1982) (“These [epidemiologic] studies were highly probative on the issue of causation—they all concluded that an association between tampon use and menstrually related TSS [toxic shock syndrome] cases exists.”), aff’d, 724 F.2d 613 (8th Cir. 1984); Ellis v. International Playtex, Inc., 745 F.2d 292, 303 (4th Cir. 1984). The chapter also cited another the en banc decision in Christophersen for the proposition that “[a]s a general rule, questions relating to the bases and sources of an expert’s opinion affect the weight to be assigned that opinion rather than its admissibility. . . . ” In the Christophersen case, the Fifth Circuit was clearly addressing the admissibility of the challenged expert witness’s opinions, not the admissibility of relied-upon studies. Christophersen v. Allied-Signal Corp., 939 F.2d 1106, 1111, 1113-14 (5th Cir. 1991) (en banc) (per curiam) (trial court may exclude opinion of expert witness whose opinion is based upon incomplete or inaccurate exposure data), cert. denied, 112 S. Ct. 1280 (1992).

[9] Interestingly, the authors of this chapter abandoned the suggestion, advanced in the second edition, that studies relied upon “might qualify for the learned treatise exception to the hearsay rule, Fed. R. Evid. 803(18), or possibly the catchall exceptions, Fed. R. Evid. 803(24) & 804(5).” RMSE 2d at 335 (2000). See also RMSE 3d at 214 (discussing statistical studies as generally “admissible,” but acknowledging that admissibility may be no more than permission to explain the basis for an expert’s opinion, which is hardly admissibility at all).

[10] See Ellis, 745 F.2d at 299-303; Kehm, 724 F.2d at 617-18. These holdings predated the Supreme Court’s 1993 decision in Daubert, and the issue whether they are subject to Rule 702 has not been addressed.  Federal agency factual findings have been known to be invalid, on occasion.

[11] David L. Faigman, et al., Modern Scientific Evidence: The Law and Science of Expert Testimony v.1, § 23:1,at 206 (2009) (“Well conducted studies are uniformly admitted.”).

[12] Montori, et al., “Users’ guide to detecting misleading claims in clinical research reports,” 329 Br. Med. J. 1093, 1093 (2004) (advising readers on how to avoid being misled by published literature, and counseling readers to “Read only the Methods and Results sections; bypass the Discussion section.”)  (emphasis added).

[13] RMSE 3d 11 (2011).

[14] Id. at 19.

[15] Id. at 20 & n. 51 (citing Susan Haack, “An Epistemologist in the Bramble-Bush: At the Supreme Court with Mr. Joiner,” 26 J. Health Pol. Pol’y & L. 217–37 (1999)).

[16] Id. at 19-20 & n.52.

[17] See Berger, “The Admissibility of Expert Testimony,” RMSE 3d 11 (2011). Professor Berger never mentions Rule 703 at all! Gone and forgotten.

[18] Professor Berger filed an amicus brief on behalf of plaintiffs, in Rider v. Sandoz Pharms. Corp., 295 F.3d 1194 (11th Cir. 2002).

[19] Id. at 20 n.51. (The editors note that the published chapter was Berger’s last revision, with “a few edits to respond to suggestions by reviewers.”) The addition of the controversial Milward decision cannot seriously be considered an “edit.”

[20] “From Here to CERT-ainty” (June 28, 2018); “THE COUNCIL FOR EDUCATION AND RESEARCH ON TOXICS” (July 9, 2013).

Reference Manual – Desiderata for 4th Edition – Part IV – Confidence Intervals

February 10th, 2023

Putting aside the idiosyncratic chapter by the late Professor Berger, most of the third edition of the Reference Manual presented guidance on many important issues. To be sure, there are gaps, inconsistencies, and mistakes, but the statistics chapter should be a must-read for federal (and state) judges. On several issues, especially those statistical in nature, the fourth edition could benefit from an editor to ensure that the individual chapters, written by different authors, actually agree on key concepts. One such example is the third edition’s treatment of confidence intervals.[1]

The “DNA Identification” chapter noted that the meaning of a confidence interval is subtle,[2] but I doubt that the authors, David Kaye and George Sensabaugh, actually found it subtle or difficult. In the third edition’s chapter on statistics, David Kaye and co-author, the late David A. Freedman, gave a reasonable definition of confidence intervals in their glossary:

confidence interval. An estimate, expressed as a range, for a parameter. For estimates such as averages or rates computed from large samples, a 95% confidence interval is the range from about two standard errors below to two standard errors above the estimate. Intervals obtained this way cover the true value about 95% of the time, and 95% is the confidence level or the confidence coefficient.”[3]

Intervals, plural, not the interval, which is the correct framing. This chapter made clear that it is the procedure of computing an interval from each of many samples that yields the 95% coverage. In the substance of their chapter, Kaye and Freedman are explicit about how intervals are constructed, and about the fact that:

“the confidence level does not give the probability that the unknown parameter lies within the confidence interval.”[4]
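The coverage interpretation that Kaye and Freedman describe is easy to demonstrate by simulation. The following minimal sketch (in Python, using a hypothetical normal population whose true mean is assumed, for illustration only, to be 10) shows that the procedure of computing a two-standard-error interval from each of many samples covers the true value in roughly 95% of the samples, even though any one realized interval either contains the true value or does not:

import numpy as np

rng = np.random.default_rng(1)                      # assumed seed, for reproducibility
true_mean, sd, n, trials = 10.0, 2.0, 50, 10_000    # hypothetical population and sample size

covered = 0
for _ in range(trials):
    sample = rng.normal(true_mean, sd, n)
    m = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(n)
    lo, hi = m - 1.96 * se, m + 1.96 * se           # the textbook two-standard-error interval
    covered += (lo <= true_mean <= hi)

print(covered / trials)                             # approximately 0.95, the long-run success rate

The 95% figure describes the long-run performance of the interval-generating procedure, not the probability that any single reported interval contains the parameter.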

Importantly, the authors of the statistics chapter named names; that is, they cited some cases that butchered the concept of the confidence interval.[5] The fourth edition will have a more difficult job because, despite the care taken in the statistics chapter, many more decisions have misstated or misrepresented the meaning of a confidence interval.[6] Citing more cases perhaps will disabuse federal judges of their reliance upon case law for the meaning of statistical concepts.

The third edition’s chapter on multiple regression defined confidence interval in its glossary:

confidence interval. An interval that contains a true regression parameter with a given degree of confidence.”[7]

The chapter avoided saying anything obviously wrong only by giving a very circular definition. When the chapter substantively described a confidence interval, it ended up giving an erroneous one:

“In general, for any parameter estimate b, the expert can construct an interval around b such that there is a 95% probability that the interval covers the true parameter. This 95% confidence interval is given by: b ± 1.96 (SE of b).”[8]

The formula provided is correct, but the interpretation of a 95% probability that the interval covers the true parameter is unequivocally wrong.[9]
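To restate the standard frequentist reading, as a sketch only, the probability statement attaches to the random interval before the data are observed:

\[
\Pr\bigl( b - 1.96\,\mathrm{SE}(b) \;\le\; \beta \;\le\; b + 1.96\,\mathrm{SE}(b) \bigr) = 0.95,
\]

where $b$ and $\mathrm{SE}(b)$ are random variables that vary from sample to sample. Once a particular estimate and standard error have been calculated, the realized interval either contains the true parameter $\beta$ or it does not; no 95% probability attaches to that single interval.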

The third edition’s chapter by Shari Seidman Diamond on survey research, on the other hand, gave an anodyne example and a definition:

“A survey expert could properly compute a confidence interval around the 20% estimate obtained from this sample. If the survey were repeated a large number of times, and a 95% confidence interval was computed each time, 95% of the confidence intervals would include the actual percentage of dentists in the entire population who would believe that Goldgate was manufactured by the makers of Colgate.

                 *  *  *  *

Traditionally, scientists adopt the 95% level of confidence, which means that if 100 samples of the same size were drawn, the confidence interval expected for at least 95 of the samples would be expected to include the true population value.”[10]

Similarly, the third edition’s chapter on epidemiology correctly defined the confidence interval operationally as a process of iterative intervals that collectively cover the true value in 95% of all the intervals:

“A confidence interval provides both the relative risk (or other risk measure) found in the study and a range (interval) within which the risk likely would fall if the study were repeated numerous times.”[11]

Not content to leave it well said, the chapter’s authors returned to the confidence interval and provided another, more problematic definition, a couple of pages later in the text:

“A confidence interval is a range of possible values calculated from the results of a study. If a 95% confidence interval is specified, the range encompasses the results we would expect 95% of the time if samples for new studies were repeatedly drawn from the same population.”[12]

The first sentence refers to “a study”; that is, one study, one range of values. The second sentence then tells us that “the range” (singular, presumably referring back to the single “a study”) will capture 95% of the results we would expect from repeated sampling from the same population. Now the definition is framed not with respect to the true population parameter, but with respect to the results of many other samples. The authors seem to have given the first sample’s confidence interval the property of including 95% of all future studies, and that is incorrect. A review of the case law shows that courts, remarkably, have gravitated to this second, incorrect definition.
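The error is not merely semantic. In a minimal simulation sketch (Python, hypothetical normal data; the true mean, sample size, and seed are assumptions for illustration), the first study’s 95% confidence interval captures the point estimates of later, identically designed studies much less often than 95% of the time, because both the first interval and the later estimates vary from sample to sample:

import numpy as np

rng = np.random.default_rng(2)                      # assumed seed
true_mean, sd, n, trials = 10.0, 2.0, 50, 10_000    # hypothetical population and sample size

first = rng.normal(true_mean, sd, n)                # the single "first study"
m = first.mean()
se = first.std(ddof=1) / np.sqrt(n)
lo, hi = m - 1.96 * se, m + 1.96 * se               # its 95% confidence interval

captured = sum(lo <= rng.normal(true_mean, sd, n).mean() <= hi for _ in range(trials))
print(captured / trials)                            # usually well under 0.95 (roughly 0.83 on average
                                                    # across repeated "first studies"), and it depends on
                                                    # where the first interval happened to land

A single study’s confidence interval simply does not have the property that this second definition ascribes to it.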

The glossary to the third edition’s epidemiology chapter clearly, however, runs into the ditch:

“confidence interval. A range of values calculated from the results of a study within which the true value is likely to fall; the width of the interval reflects random error. Thus, if a confidence level of .95 is selected for a study, 95% of similar studies would result in the true relative risk falling within the confidence interval.”[13]

Note that the sentence before the semicolon talked of “a study” with “a range of values,” and that there is a likelihood of that range including the “true value.” This definition thus used the singular to describe the study and to describe the range of values.  The definition seemed to be saying, clearly but wrongly, that a single interval from a single study has a likelihood of containing the true value. The second full sentence ascribed a probability, 95%, to the true relative risk’s falling within “the interval.” To point out the obvious, “the interval,” is singular, and refers back to “a study,” also singular. At best, this definition was confusing; at worst, it was wrong.

The Reference Manual has a problem beyond its own inconsistencies, and the refractory resistance of the judiciary to statistical literacy. Any number of law professors, and even scientists, have held out incorrect definitions and interpretations of confidence intervals. It would be helpful for the fourth edition to caution its readers, both bench and bar, about the prevalent misunderstandings.

Here, for instance, is an example of a well-credentialed statistician, who gave a murky definition in a declaration filed in federal court:

“If a 95% confidence interval is specified, the range encompasses the results we would expect 95% of the time if samples for new studies were repeatedly drawn from the same population.”[14]

The expert witness correctly identifies the repeated sampling, but assigns a 95% probability to “the range,” which leaves unclear whether the range is the set of all intervals or the single “95% confidence interval” that is the antecedent of the statement.

Much worse was a definition proffered in a recent law review article by well-known, respected authors:

“A 95% confidence interval, in contrast, is a one-sided or two-sided interval from a data sample with 95% probability of bounding a fixed, unknown parameter, for which no nondegenerate probability distribution is conceived, under specified assumptions about the data distribution.”[15]

The phrase “for which no nondegenerate probability distribution is conceived” is unclear as to whether it refers to the confidence interval or to the unknown parameter. It seems that the phrase modifies the noun closest to it in the sentence, the “fixed, unknown parameter,” which suggests that these authors were simply trying to emphasize that they were giving a frequentist interpretation and not conceiving of the parameter as a random variable, as Bayesians would. The phrase “no nondegenerate” appears to be a triple negative, since a degenerate distribution is one that has no variation. The phrase makes the definition obscure, and raises questions about what it is meant to exclude.

The more concerning aspect of the quoted footnote is its obfuscation of the important distinction between the procedure of repeatedly calculating confidence intervals (which procedure has a 95% success rate in the long run) and the probability that any given instance of the procedure, in a single confidence interval, contains the parameter. The latter probability is either zero or one.

The definition’s reference to “a” confidence interval, based upon “a” data sample, actually leaves the reader with no way of understanding the definition to be referring to the repeated process of sampling, and the set of resulting intervals. The upper and lower interval bounds are themselves random variables that need to be taken into account, but by referencing a single interval from a single data sample, the authors misrepresent the confidence interval and invite a Bayesian interpretation.[16]

Sadly, there is a long tradition of scientists and academics giving errant definitions and interpretations of the confidence interval.[17] The error is not harmless, because it invites the attribution of a high level of probability to the claim that the “true” population measure lies within the reported confidence interval. The error encourages readers to believe that the confidence interval is not conditioned upon the single sample result, and it misleads readers into believing that not only random error, but also systematic and data errors, are accounted for in the posterior probability.[18]


[1] “Confidence in Intervals and Diffidence in the Courts” (Mar. 4, 2012).

[2] David H. Kaye & George Sensabaugh, “Reference Guide on DNA Identification Evidence” 129, 165 n.76.

[3] David H. Kaye & David A. Freedman, “Reference Guide on Statistics” 211, 284-5 (Glossary).

[4] Id. at 247.

[5] Id. at 247 n.91 & 92 (citing DeLuca v. Merrell Dow Pharms., Inc., 791 F. Supp. 1042, 1046 (D.N.J. 1992), aff’d, 6 F.3d 778 (3d Cir. 1993); SmithKline Beecham Corp. v. Apotex Corp., 247 F. Supp. 2d 1011, 1037 (N.D. Ill. 2003), aff’d on other grounds, 403 F.3d 1331 (Fed. Cir. 2005); In re Silicone Gel Breast Implants Prods. Liab. Litig., 318 F. Supp. 2d 879, 897 (C.D. Cal. 2004) (“a margin of error between 0.5 and 8.0 at the 95% confidence level . . . means that 95 times out of 100 a study of that type would yield a relative risk value somewhere between 0.5 and 8.0.”)).

[6] See, e.g., Turpin v. Merrell Dow Pharm., Inc., 959 F.2d 1349, 1353–54 & n.1 (6th Cir. 1992) (erroneously describing a 95% CI of 0.8 to 3.10, to mean that “random repetition of the study should produce, 95 percent of the time, a relative risk somewhere between 0.8 and 3.10”); American Library Ass’n v. United States, 201 F.Supp. 2d 401, 439 & n.11 (E.D.Pa. 2002), rev’d on other grounds, 539 U.S. 194 (2003); Ortho–McNeil Pharm., Inc. v. Kali Labs., Inc., 482 F.Supp. 2d 478, 495 (D.N.J.2007) (“Therefore, a 95 percent confidence interval means that if the inventors’ mice experiment was repeated 100 times, roughly 95 percent of results would fall within the 95 percent confidence interval ranges.”) (apparently relying party’s expert witness’s report), aff’d in part, vacated in part, sub nom. Ortho McNeil Pharm., Inc. v. Teva Pharms Indus., Ltd., 344 Fed.Appx. 595 (Fed. Cir. 2009); Eli Lilly & Co. v. Teva Pharms, USA, 2008 WL 2410420, *24 (S.D. Ind. 2008) (stating incorrectly that “95% percent of the time, the true mean value will be contained within the lower and upper limits of the confidence interval range”); Benavidez v. City of Irving, 638 F.Supp. 2d 709, 720 (N.D. Tex. 2009) (interpreting a 90% CI to mean that “there is a 90% chance that the range surrounding the point estimate contains the truly accurate value.”); Pritchard v. Dow Agro Sci., 705 F. Supp. 2d 471, 481, 488 (W.D. Pa. 2010) (excluding Dr. Bennet Omalu who assigned a 90% probability that an 80% confidence interval excluded relative risk of 1.0), aff’d, 430 F. App’x 102 (3d Cir.), cert. denied, 132 S. Ct. 508 (2011); Estate of George v. Vermont League of Cities and Towns, 993 A.2d 367, 378 n.12 (Vt. 2010) (erroneously describing a confidence interval to be a “range of values within which the results of a study sample would be likely to fall if the study were repeated numerous times”); Garcia v. Tyson Foods, 890 F. Supp. 2d 1273, 1285 (D. Kan. 2012) (quoting expert witness Robert G. Radwin, who testified that a 95% confidence interval in a study means “if I did this study over and over again, 95 out of a hundred times I would expect to get an average between that interval.”); In re Chantix (Varenicline) Prods. Liab. Litig., 889 F. Supp. 2d 1272, 1290n.17 (N.D. Ala. 2012); In re Zoloft Products, 26 F. Supp. 3d 449, 454 (E.D. Pa. 2014) (“A 95% confidence interval means that there is a 95% chance that the ‘‘true’’ ratio value falls within the confidence interval range.”), aff’d, 858 F.3d 787 (3d Cir. 2017); Duran v. U.S. Bank Nat’l Ass’n, 59 Cal. 4th 1, 36, 172 Cal. Rptr. 3d 371, 325 P.3d 916 (2014) (“Statisticians typically calculate margin of error using a 95 percent confidence interval, which is the interval of values above and below the estimate within which one can be 95 percent certain of capturing the ‘true’ result.”); In re Accutane Litig., 451 N.J. Super. 153, 165 A.3d 832, 842 (2017) (correctly quoting an incorrect definition from the third edition at p.580), rev’d on other grounds, 235 N.J. 229, 194 A.3d 503 (2018); In re Testosterone Replacement Therapy Prods. Liab., No. 14 C 1748, MDL No. 2545, 2017 WL 1833173, *4 (N.D. Ill. May 8, 2017) (“A confidence interval consists of a range of values. For a 95% confidence interval, one would expect future studies sampling the same population to produce values within the range 95% of the time.”); Maldonado v. Epsilon Plastics, Inc., 22 Cal. App. 5th 1308, 1330, 232 Cal. Rptr. 
3d 461 (2018) (“The 95 percent ‘confidence interval’, as used by statisticians, is the ‘interval of values above and below the estimate within which one can be 95 percent certain of capturing the “true” result’.”); Escheverria v. Johnson & Johnson, 37 Cal. App. 5th 292, 304, 249 Cal. Rptr. 3d 642 (2019) (quoting uncritically and with approval one of plaintiff’s expert witnesses, Jack Siemiatycki, who gave the jury an example of a study with a relative risk of 1.2, with a “95 percent probability that the true estimate is between 1.1 and 1.3.” According to the court, Siemiatycki went on to explain that this was “a pretty tight interval, and we call that a confidence interval. We call it a 95 percent confidence interval when we calculate it in such a way that it covers 95 percent of the underlying relative risks that are compatible with this estimate from this study.”); In re Viagra (Sildenafil Citrate) & Cialis (Tadalafil) Prods. Liab. Litig., 424 F.Supp.3d 781, 787 (N.D. Cal. 2020) (“For example, a given study could calculate a relative risk of 1.4 (a 40 percent increased risk of adverse events), but show a 95 percent “confidence interval” of .8 to 1.9. That confidence interval means there is 95 percent chance that the true value—the actual relative risk—is between .8 and 1.9.”); Rhyne v. United States Steel Corp., 74 F. Supp. 3d 733, 744 (W.D.N.C. 2020) (relying upon, and quoting, one of the more problematic definitions given in the third edition at p.580: “If a 95% confidence interval is specified, the range encompasses the results we would expect 95% of the time if samples for new studies were repeatedly drawn from the population.”); Wilant v. BNSF Ry., C.A. No. N17C-10-365 CEB, (Del. Super. Ct. May 13, 2020) (citing third edition at p.573, “a confidence interval provides ‘a range (interval) within which the risk likely would fall if the study were repeated numerous times’.”; “[s]o a 95% confidence interval indicates that the range of results achieved in the study would be achieved 95% of the time when the study is replicated from the same population.”); Germaine v. Sec’y Health & Human Servs., No. 18-800V, (U.S. Fed. Ct. Claims July 29, 2021) (giving an incorrect definition directly from the third edition, at p.621; “[a] “confidence interval” is “[a] range of values … within which the true value is likely to fall[.]”).

[7] Daniel Rubinfeld, “Reference Guide on Multiple Regression” 303, 352.

[8] Id. at 342.

[9] See Sander Greenland, Stephen J. Senn, Kenneth J. Rothman, John B. Carlin, Charles Poole, Steven N. Goodman, and Douglas G. Altman, “Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations,” 31 Eur. J. Epidemiol. 337, 343 (2016).

[10] Shari Seidman Diamond, “Reference Guide on Survey Research” 359, 381.

[11] Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” 549, 573.

[12] Id. at 580.

[13] Id. at 621.

[14] In re Testosterone Replacement Therapy Prods. Liab. Litig., Declaration of Martin T. Wells, Ph.D., at 2-3 (N.D. Ill., Oct. 30, 2016). 

[15] Joseph Sanders, David Faigman, Peter Imrey, and A. Philip Dawid, “Differential Etiology: Inferring Specific Causation in the Law from Group Data in Science,” 63 Arizona L. Rev. 851, 898 n.173 (2021).

[16] The authors are well-credentialed lawyers and scientists. Peter Imrey was trained in, and has taught, mathematical statistics, biostatistics, and epidemiology. He is a professor of medicine in the Cleveland Clinic Lerner College of Medicine. A. Philip Dawid is a distinguished statistician, an Emeritus Professor of Statistics, Cambridge University, Darwin College, and a Fellow of the Royal Society. David Faigman is the Chancellor & Dean, and the John F. Digardi Distinguished Professor of Law, at the University of California Hastings College of the Law. Joseph Sanders is the A.A. White Professor at the University of Houston Law Center. I have previously pointed out this problem in these authors’ article. “Differential Etiologies – Part One – Ruling In” (June 19, 2022).

[17] See, e.g., Richard W. Clapp & David Ozonoff, “Environment and Health: Vital Intersection or Contested Territory?” 30 Am. J. L. & Med. 189, 210 (2004) (“Thus, a RR [relative risk] of 1.8 with a confidence interval of 1.3 to 2.9 could very likely represent a true RR of greater than 2.0, and as high as 2.9 in 95 out of 100 repeated trials.”); Erica Beecher-Monas, Evaluating Scientific Evidence: An Interdisciplinary Framework for Intellectual Due Process 60-61 n. 17 (2007) (quoting Clapp and Ozonoff with obvious approval); Déirdre Dwyer, The Judicial Assessment of Expert Evidence 154-55 (Cambridge Univ. Press 2008) (“By convention, scientists require a 95 per cent probability that a finding is not due to chance alone. The risk ratio (e.g. ‘2.2’) represents a mean figure. The actual risk has a 95 per cent probability of lying somewhere between upper and lower limits (e.g. 2.2 ±0.3, which equals a risk somewhere between 1.9 and 2.5) (the ‘confidence interval’).”); Frank C. Woodside, III & Allison G. Davis, “The Bradford Hill Criteria: The Forgotten Predicate,” 35 Thomas Jefferson L. Rev. 103, 110 (2013) (“A confidence interval provides both the relative risk found in the study and a range (interval) within which the risk would likely fall if the study were repeated numerous times.”); Christopher B. Mueller, “Daubert Asks the Right Questions: Now Appellate Courts Should Help Find the Right Answers,” 33 Seton Hall L. Rev. 987, 997 (2003) (describing the 95% confidence interval as “the range of outcomes that would be expected to occur by chance no more than five percent of the time”); Arthur H. Bryant & Alexander A. Reinert, “The Legal System’s Use of Epidemiology,” 87 Judicature 12, 19 (2003) (“The confidence interval is intended to provide a range of values within which, at a specified level of certainty, the magnitude of association lies.”) (incorrectly citing the first edition of Rothman & Greenland, Modern Epidemiology 190 (Philadelphia 1998)); John M. Conley & David W. Peterson, “The Science of Gatekeeping: The Federal Judicial Center’s New Reference Manual on Scientific Evidence,” 74 N.C.L.Rev. 1183, 1212 n.172 (1996) (“a 95% confidence interval … means that we can be 95% certain that the true population average lies within that range”).

[18] See Brock v. Merrell Dow Pharm., Inc., 874 F.2d 307, 311–12 (5th Cir. 1989) (incorrectly stating that the court need not resolve questions of bias and confounding because “the studies presented to us incorporate the possibility of these factors by the use of a confidence interval”). Bayesian credible intervals can similarly be misleading when the interval simply reflects sample results and sample variance, but not the myriad other ways the estimate may be wrong.

Amicus Curious – Gelbach’s Foray into Lipitor Litigation

August 25th, 2022

Professor Schauer’s discussion of statistical significance, covered in my last post,[1] is curious for its disclaimer that “there is no claim here that measures of statistical significance map easily onto measures of the burden of proof.” Having made the disclaimer, Schauer proceeds to fall into the transposition fallacy, which contradicts his disclaimer, and which, generally speaking, is not a good thing for a law professor eager to advance the understanding of “The Proof” to do.

Perhaps more curious than Schauer’s error is his citation support for his disclaimer.[2] The cited paper by Jonah B. Gelbach is one of several of Gelbach’s papers that advance the claim that the p-value does indeed map onto posterior probability and the burden of proof. Gelbach’s claim has also been the centerpiece of his role as an advocate in support of plaintiffs in the Lipitor (atorvastatin) multi-district litigation (MDL) over claims that ingestion of atorvastatin causes diabetes mellitus.

Gelbach’s intervention as plaintiffs’ amicus is peculiar on many fronts. At the time of the Lipitor litigation, Sonal Singh was an epidemiologist and Assistant Professor of Medicine at the Johns Hopkins University. The MDL trial court initially held that Singh’s proffered testimony was inadmissible because of his failure to consider daily dose.[3] In a second attempt, Singh offered an opinion for the 10 mg daily dose of atorvastatin, based largely upon the results of a clinical trial known as ASCOT-LLA.[4]

The ASCOT-LLA trial randomized 19,342 participants with hypertension and at least three other cardiovascular risk factors to two different anti-hypertensive medications. A subgroup with total cholesterol levels less than or equal to 6.5 mmol/L was randomized to either daily 10 mg atorvastatin or placebo. The investigators planned to follow up for five years, but they stopped after 3.3 years because of clear benefit on the primary composite end point of non-fatal myocardial infarction and fatal coronary heart disease. At the time of stopping, there were 100 events of the primary pre-specified outcome in the atorvastatin group, compared with 154 events in the placebo group (hazard ratio 0.64 [95% CI 0.50 – 0.83], p = 0.0005).

The atorvastatin component of ASCOT-LLA had, in addition to its primary pre-specified outcome, seven secondary end points, and seven tertiary end points.  The emergence of diabetes mellitus in this trial population, which clearly was at high risk of developing diabetes, was one of the tertiary end points. Primary, secondary, and tertiary end points were reported in ASCOT-LLA without adjustment for the obvious multiple comparisons. In the treatment group, 3.0% developed diabetes over the course of the trial, whereas 2.6% developed diabetes in the placebo group. The unadjusted hazard ratio was 1.15 (0.91 – 1.44), p = 0.2493.[5] Given the 15 trial end points, an adjusted p-value for this particular hazard ratio, for diabetes, might well exceed 0.5, and even approach 1.0.
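To see how far the multiplicity problem can reach, consider a rough sketch of a Bonferroni correction, which is admittedly conservative, and which treats the 15 end points as if they were independent (they are not, as noted below):

reported_p = 0.2493          # unadjusted two-sided p-value for the diabetes end point
end_points = 15              # 1 primary + 7 secondary + 7 tertiary end points
bonferroni_p = min(1.0, end_points * reported_p)
print(bonferroni_p)          # 1.0 -- consistent with an adjusted p-value approaching 1.0

Less conservative adjustments would yield smaller values, but the direction of the correction is not in doubt.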

On this record, Dr. Singh honestly acknowledged that statistical significance was important, and that the diabetes finding in ASCOT-LLA might have been the result of low statistical power or of no association at all. Based upon the trial data alone, he testified that “one can neither confirm nor deny that atorvastatin 10 mg is associated with significantly increased risk of type 2 diabetes.”[6] The trial court excluded Dr. Singh’s 10mg/day causal opinion, but admitted his 80mg/day opinion. On appeal, the Fourth Circuit affirmed the MDL district court’s rulings.[7]

Jonah Gelbach is a professor of law at the University of California at Berkeley. He attended Yale Law School, and received his doctorate in economics from MIT.

Professor Gelbach entered the Lipitor fray to present a single issue: whether statistical significance at conventionally demanding levels such as 5 percent is an appropriate basis for excluding expert testimony based on statistical evidence from a single study that did not achieve statistical significance.

Professor Gelbach is no stranger to antic proposals.[8] As amicus curious in the Lipitor litigation, Gelbach asserts that plaintiffs’ expert witness, Dr. Singh, was wrong in his testimony about not being able to confirm the ASCOT-LLA association because he, Gelbach, could confirm the association.[9] Ultimately, the Fourth Circuit did not discuss Gelbach’s contentions, which is not surprising considering that the asserted arguments and alleged factual considerations were not only dehors the record, but in contradiction of the record.

Gelbach’s curious claim is that any time a risk ratio, for an exposure and an outcome of interest, is greater than 1.0, with a p-value < 0.5,[10] the evidence should be not only admissible, but sufficient to support a conclusion of causation. Gelbach states his claim in the context of discussing a single randomized controlled trial (ASCOT-LLA), but his broad pronouncements are carelessly framed such that others may take them to apply to a single observational study, with its greater threats to internal validity.

Contra Kumho Tire

To get to his conclusion, Gelbach attempts to remove the constraints of traditional standards of significance probability. Kumho Tire teaches that expert witnesses must “employ[] in the courtroom the same level of intellectual rigor that characterizes the practice of an expert in the relevant field.”[11] For Gelbach, this “eminently reasonable admonition” does not impose any constraints on statistical inference in the courtroom. Statistical significance at traditional levels (p < 0.05) is for elitist scholarly work, not for the “practical” rent-seeking work of the tort bar. According to Gelbach, the inflation of the significance level ten-fold to p < 0.5 is merely a matter of “weight” and not admissibility of any challenged opinion testimony.

Likelihood Ratios and Posterior Probabilities

Gelbach maintains that any evidence that has a likelihood ratio (LR > 1) greater than one is relevant, and should be admissible under Federal Rule of Evidence 401.[12] This argument ignores the other operative Federal Rules of Evidence, namely 702 and 703, which impose additional criteria of admissibility for expert witness opinion testimony.

With respect to variance and random error, Gelbach tells us that any evidence that generates an LR > 1 should be admitted when “the statistical evidence is statistically significant below the 50 percent level, which will be true when the p-value is less than 0.5.”[13]

At times, Gelbach seems to be discussing the admissibility of the ASCOT-LLA study itself, and not the proffered opinion testimony of Dr. Singh. The study itself would not be admissible, although it is clearly the sort of hearsay an expert witness in the field may consider. If Dr. Singh had reframed and recalculated the statistical comparisons, then the Rule 703 requirement of “reasonable reliance” by scientists in the field of interest might not have been satisfied.

Gelbach also generates a posterior probability (0.77), which is based upon his calculations from data in the ASCOT-LLA trial, and not the posterior probability of Dr. Singh’s opinion. The posterior probability, as calculated, is problematic on many fronts.

Gelbach does not present his calculations – for the sake of brevity he says – but he tells us that the ASCOT-LLA data yield a likelihood ratio of roughly 1.9, and a p-value of 0.126.[14] What the clinical trialists reported was a hazard ratio of 1.15, which is a weak association on most researchers’ scales, with a two-sided p-value of 0.25, which is five times higher than the usual 5 percent. Gelbach does not explain how or why his calculated p-value for the likelihood ratio is roughly half the unadjusted, two-sided p-value for the tertiary outcome from ASCOT-LLA.

As noted, the reported diabetes hazard ratio of 1.15 was a tertiary outcome for the ASCOT trial, one of 15 calculated by the trialists, with p-values unadjusted for multiple comparisons.  The failure to adjust is perhaps excusable in that some (but certainly not all) of the outcome variables are overlapping or correlated. A sophisticated reader would not be misled; only when someone like Gelbach attempts to manufacture an inflated posterior probability without accounting for the gross underestimate in variance is there an insult to statistical science. Gelbach’s recalculated p-value for his LR, if adjusted for the multiplicity of comparisons in this trial, would likely exceed 0.5, rendering all his arguments nugatory.

Using the statistics as presented by the published ASCOT-LLA trial to generate a posterior probability also ignores the potential biases (systematic errors) in data collection, the unadjusted hazard ratios, the potential for departures from random sampling, errors in administering the participant recruiting and inclusion process, and other errors in measurements, data collection, data cleaning, and reporting.

Gelbach correctly notes that there is nothing methodologically inappropriate in advocating likelihood ratios, but he is less than forthcoming in explaining that such ratios translate into a posterior probability only if he posits a prior probability of 0.5.[15] His pretense to having simply stated “mathematical facts” unravels when we consider his extreme, unrealistic, and unscientific assumptions.
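The arithmetic is simple, and it shows how much work the assumed prior does. The following sketch (Python) uses Gelbach’s reported likelihood ratio of roughly 1.9 purely for illustration; it does not attempt to reproduce his unpublished calculations or his reported posterior of 0.77:

def posterior(prior_prob, likelihood_ratio):
    prior_odds = prior_prob / (1.0 - prior_prob)
    post_odds = prior_odds * likelihood_ratio        # Bayes' theorem in odds form
    return post_odds / (1.0 + post_odds)

for prior in (0.5, 0.25, 0.1, 0.01):
    print(prior, round(posterior(prior, 1.9), 3))
# 0.5 -> 0.655;  0.25 -> 0.388;  0.1 -> 0.174;  0.01 -> 0.019

The same likelihood ratio yields wildly different posterior probabilities depending upon the prior one is willing to posit; the likelihood ratio alone decides nothing.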

The Problematic Prior

Gelbach glibly assumes that the starting point, the prior probability, for his analysis of Dr. Singh’s opinion is 50%. This is an old and common mistake,[16] long since debunked.[17] Gelbach’s assumption is part of an old controversy, which surfaced in early cases concerning disputed paternity. The assumption, however, is wrong legally and philosophically.

The law simply does not hand out 0.5 prior probability to both parties at the beginning of a trial. As Professor Jaffee noted almost 35 years ago:

“In the world of Anglo-American jurisprudence, every defendant, civil and criminal, is presumed not liable. So, every claim (civil or criminal) starts at ground zero (with no legal probability) and depends entirely upon proofs actually adduced.”[18]

Gelbach assumes that assigning “equal prior probability” to two adverse parties is fair, because the fact-finder would not start hearing evidence with any notion of which party’s contentions are correct. The 0.5/0.5 starting point, however, is neither fair nor is it the law.[19] The even odds prior is also not good science.

The defense is entitled to a presumption that it is not liable, and the plaintiff must start at zero.  Bayesians understand that this is the death knell of their beautiful model.  If the prior probability is zero, then Bayes’ Theorem tells us mathematically that no evidence, no matter how large a likelihood ratio, can move the prior probability of zero towards one. Bayes’ theorem may be a correct statement about inverse probabilities, but still be an inadequate or inaccurate model for how factfinders do, or should, reason in determining the ultimate facts of a case.
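Stated in the odds form of Bayes’ theorem, the point is immediate:

\[
\text{posterior odds} \;=\; \text{likelihood ratio} \times \text{prior odds},
\qquad\text{so}\qquad
LR \times 0 = 0
\]

for any finite likelihood ratio. A prior probability of zero yields prior odds of zero, and no quantum of evidence can move the posterior off zero.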

We can see how unrealistic and unfair Gelbach’s implied prior probability is if we visualize the proof process as a football field. To win, plaintiffs do not need to score a touchdown; they need only cross the mid-field 50-yard line. Rather than making plaintiffs start at the zero-yard line, however, Gelbach would put them right on the 50-yard line. Since one toe over the mid-field line is victory, the plaintiff is spotted 99.99+% of its burden of having to present evidence to build up a 50% probability. Instead of starting at the zero-yard line, plaintiffs begin right at mid-field, claiming success, where even the slightest breeze might give them winning cases. Somehow, in the model, plaintiffs no longer have to present evidence to traverse the first half of the field.

The even odds starting point is completely unrealistic in terms of the events upon which the parties are wagering. The ASCOT-LLA study might have shown a protective association between atorvastatin and diabetes, or it might have shown no association at all, or it might have shown a larger hazard ratio than measured in this particular sample. Recall that the confidence interval for the diabetes hazard ratio ran from 0.91 to 1.44. In other words, parameters from 0.91 (protective association), to 1.0 (no association), to 1.44 (harmful association) were all reasonably compatible with the observed statistic, based upon this one study’s data. The potential outcomes are not binary, which makes the even odds starting point inappropriate.[20]


[1] “Schauer’s Long Footnote on Statistical Significance” (Aug. 21, 2022).

[2] Frederick Schauer, The Proof: Uses of Evidence in Law, Politics, and Everything Else 54-55 (2022) (citing Michelle M. Burtis, Jonah B. Gelbach, and Bruce H. Kobayashi, “Error Costs, Legal Standards of Proof, and Statistical Significance,” 25 Supreme Court Economic Rev. 1 (2017)).

[3] In re Lipitor Mktg., Sales Practices & Prods. Liab. Litig., MDL No. 2:14–mn–02502–RMG, 2015 WL 6941132, at *1  (D.S.C. Oct. 22, 2015).

[4] Peter S. Sever, et al., “Prevention of coronary and stroke events with atorvastatin in hypertensive patients who have average or lower-than-average cholesterol concentrations, in the Anglo-Scandinavian Cardiac Outcomes Trial Lipid Lowering Arm (ASCOT-LLA): a multicentre randomised controlled trial,” 361 Lancet 1149 (2003). [cited here as ASCOT-LLA]

[5] ASCOT-LLA at 1153 & Table 3.

[6] In re Lipitor Mktg., Sales Practices & Prods. Liab. Litig., 174 F.Supp. 3d 911, 921 (D.S.C. 2016) (quoting Dr. Singh’s testimony).

[7] In re Lipitor Mktg., Sales Practices & Prods. Liab. Litig., 892 F.3d 624, 638-39 (4th Cir. 2018) (affirming MDL trial court’s exclusion in part of Dr. Singh).

[8] See “Expert Witness Mining – Antic Proposals for Reform” (Nov. 4, 2014).

[9] Brief for Amicus Curiae Jonah B. Gelbach in Support of Plaintiffs-Appellants, In re Lipitor Mktg., Sales Practices & Prods. Liab. Litig., 2017 WL 1628475 (April 28, 2017). [Cited as Gelbach]

[10] Gelbach at *2.

[11] Kumho Tire Co. v. Carmichael, 526 U.S. 137, 152 (1999).

[12] Gelbach at *5.

[13] Gelbach at *2, *6.

[14] Gelbach at *15.

[15] Gelbach at *19-20.

[16] See Richard A. Posner, “An Economic Approach to the Law of Evidence,” 51 Stanford L. Rev. 1477, 1514 (1999) (asserting that the “unbiased fact-finder” should start hearing a case with even odds; “[I]deally we want the trier of fact to work from prior odds of 1 to 1 that the plaintiff or prosecutor has a meritorious case. A substantial departure from this position, in either direction, marks the trier of fact as biased.”).

[17] See, e.g., Richard D. Friedman, “A Presumption of Innocence, Not of Even Odds,” 52 Stan. L. Rev. 874 (2000). [Friedman]

[18] Leonard R. Jaffee, “Prior Probability – A Black Hole in the Mathematician’s View of the Sufficiency and Weight of Evidence,” 9 Cardozo L. Rev. 967, 986 (1988).

[19] Id. at 994 & n.35.

[20] Friedman at 877.

Schauer’s Long Footnote on Statistical Significance

August 21st, 2022

One of the reasons that the American Statistical Association (ASA) issued, in 2016 and for the first time in its history, a consensus statement on p-values was the persistent and sometimes deliberate misstatement and misrepresentation of the meaning of the p-value. Indeed, of the six principles articulated by the ASA, several were little more than definitional, designed to clear away misunderstandings. Notably, “Principle Two” addresses one persistent misunderstanding and states:

“P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.

Researchers often wish to turn a p-value into a statement about the truth of a null hypothesis, or about the probability that random chance produced the observed data. The p-value is neither. It is a statement about data in relation to a specified hypothetical explanation, and is not a statement about the explanation itself.”[1]

The ASA consensus statement followed on the heels of an important article written by seven leading figures in the fields of statistics and epidemiology.[2] One statistician,[3] who frequently shows up as an expert witness for multi-district litigation plaintiffs, described the article’s authors as the “A-Team” of statistics. In any event, the seven thought leaders identified common statistical misunderstandings, including the belief that:

“2. The P value for the null hypothesis is the probability that chance alone produced the observed association; for example, if the P value for the null hypothesis is 0.08, there is an 8% probability that chance alone produced the association. No!”[4]

This is all basic statistics.
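
A toy calculation shows why the distinction matters. Suppose, purely hypothetically, that 90 percent of the associations a field tests are truly null, that tests are run at the conventional 0.05 level, and that studies have 80 percent power to detect the real associations. Among the “statistically significant” results, the share actually produced by chance alone is then nowhere near 5 percent:

```python
# Hypothetical illustration of ASA Principle Two: the p-value is not the
# probability that chance alone produced the observed association.
# The base rate and power figures below are assumptions, chosen only to
# illustrate the arithmetic.
share_true_nulls = 0.90   # assumed fraction of tested associations that are truly null
alpha = 0.05              # conventional significance level
power = 0.80              # assumed probability of p < 0.05 when a real association exists

# Among all "statistically significant" findings, the fraction arising from
# true nulls (i.e., produced by chance alone):
significant_from_nulls = share_true_nulls * alpha
significant_from_real = (1 - share_true_nulls) * power
fraction_chance = significant_from_nulls / (significant_from_nulls + significant_from_real)

print(f"P(chance alone | p < 0.05) = {fraction_chance:.0%}")  # roughly 36%, not 5%
```

On these assumptions, more than a third of the “significant” findings come from true nulls, even though every one of them cleared the 0.05 threshold. The p-value by itself cannot supply that probability; the answer depends on information the p-value does not contain.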

Frederick Schauer is the David and Mary Harrison Distinguished Professor of Law at the University of Virginia. Schauer has contributed prolifically to legal scholarship, and his publications are often well-written and thoughtful analyses. Schauer’s recent book, The Proof: Uses of Evidence in Law, Politics, and Everything Else, published by Harvard University Press, is a contribution to the literature of “legal epistemology,” and to the foundations of evidence that lie beneath many of our everyday and courtroom approaches to resolving disputes.[5] Schauer’s book might be a useful addition to an undergraduate’s reading list for a course in practical epistemology, or for a law school course on evidence. The language of The Proof is clear and lively, but at times wanders into objectionable and biased political correctness. For example, Schauer channels Naomi Oreskes and her critique of manufacturing industry in his own discussion of “manufactured evidence,”[6] but studiously avoids any number of examples of explicit manufacturing of fraudulent evidence in litigation by the lawsuit industry.[7] Perhaps the most serious omission in this book on evidence is its failure to discuss the relative quality and hierarchy of evidence in science, medicine, and policy. Readers will not find any mention of the methodology of systematic reviews or meta-analyses in Schauer’s work.

At the end of his chapter on burdens of proof, Schauer adds “A Long Footnote on Statistical Significance,” in which he expresses surprise that the subject of statistical significance is controversial. Schauer might well have brushed up on the statistical concepts he wanted to discuss.

Schauer’s treatment of statistical significance is both distinctly unbalanced, as well as misstated. In an endnote,[8] Schauer cites some of the controversialists who have criticized significance tests, but none of the statisticians who have defended their use.[9]

As for conceptual accuracy, after giving a serviceable definition of the p-value, Schauer immediately goes astray:

“And this likelihood is conventionally described in terms of a p-value, where the p-value is the probability that positive results—rejection of the “null hypothesis” that there is no connection between the examined variables—were produced by chance.”[10]

And again, for emphasis, Schauer tells us:

“A p-value of greater than .05 – a greater than 5 percent probability that the same results would have been the result of chance – has been understood to mean that the results are not statistically significant.”[11]

And then once more for emphasis, in the context of an emotionally laden hypothetical about an experimental drug that “cures” a dread, incurable disease (p = 0.20), Schauer tells us that he suspects most people would want to take the medication:

“recognizing that an 80 percent likelihood that the rejection of ineffectiveness was still good enough, at least if there were no other alternatives.”

Schauer wants to connect his discussion of statistical significance to degrees or varying strengths of evidence, but his discursion into statistical significance largely conflates precision with strength. Evidence can be statistically robust but not be very strong. If we imagine a very large randomized clinical trial that found that a medication lowered systolic blood pressure by 1mm of mercury, p < 0.05, we would not consider that finding to constitute strong evidence for therapeutic benefit. If the observation of lowering blood pressure by 1mm came from an observational study, p < 0.05, the finding might not even qualify as evidence in the views of sophisticated cardiovascular physicians and researchers.
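
The point can be made concrete with a back-of-the-envelope calculation. The sketch below uses hypothetical numbers (20,000 patients per arm and a standard deviation of 15 mm Hg, chosen only for illustration) to show how a clinically trivial 1 mm Hg difference becomes overwhelmingly “statistically significant” once the sample is large enough:

```python
import math
from statistics import NormalDist

# Hypothetical very large two-arm trial: a 1 mm Hg mean reduction in systolic
# blood pressure, with an assumed standard deviation of 15 mm Hg in each arm.
n_per_arm = 20_000
mean_difference = 1.0   # mm Hg
sd = 15.0               # mm Hg

# Standard error of the difference in means, and the corresponding z-statistic.
se = sd * math.sqrt(2.0 / n_per_arm)
z = mean_difference / se

# Two-sided p-value from the standard normal distribution.
p_value = 2.0 * (1.0 - NormalDist().cdf(z))

print(f"z = {z:.2f}, two-sided p = {p_value:.1e}")
# With these assumed numbers, z is about 6.7 and p is on the order of 1e-11:
# overwhelmingly "significant," yet a 1 mm Hg reduction remains clinically
# negligible. Precision is not the same thing as strength.
```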

Earlier in the chapter, Schauer points to instances in which substantial evidence for a conclusion is downplayed because it is not “conclusive,” or “definitive.” He is obviously keen to emphasize that evidence that is not “conclusive” may still be useful in some circumstances. In this context, Schauer yet again misstates the meaning of significance probability, when he tells us that:

“[j]ust as inconclusive or even weak evidence may still be evidence, and may still be useful evidence for some purposes, so too might conclusions – rejections of the null hypothesis – that are more than 5 percent likely to have been produced by chance still be valuable, depending on what follows from those conclusions.”[12]

And while Schauer is right that weak evidence may still be evidence, he seems loath to admit that weak evidence may be pretty crummy support for a conclusion. Take, for instance, a fair coin. We have an expected value on ten flips of five heads and five tails. We flip the coin ten times, but we observe six heads and four tails. Do we now have “evidence” that the expected value and the expected outcome are wrong? Not really. The probability of observing the expected outcome, on the binomial model that most people would endorse for the thought experiment, is 24.6%. The probability of not observing the expected value in ten flips is three times greater. If we look at an epidemiologic study, with a sizable number of participants, the “expected value” of 1.0, embodied in the null hypothesis, is an outcome that we would rarely expect to see exactly, even if the null hypothesis is correct. Schauer seems to have missed this basic lesson of probability and statistics.
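
The binomial arithmetic is easy to verify:

```latex
% Probability of observing exactly the expected outcome (5 heads) in 10 flips
% of a fair coin:
\[
P(\text{exactly 5 heads}) = \binom{10}{5}\left(\tfrac{1}{2}\right)^{10}
  = \frac{252}{1024} \approx 0.246 .
\]
% The probability of any other outcome is about three times greater:
\[
P(\text{not exactly 5 heads}) = 1 - 0.246 \approx 0.754 \approx 3 \times 0.246 .
\]
```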

Perhaps even more disturbing is that Schauer fails to distinguish the other determinants of study validity and the warrants for inferring a conclusion at any level of certainty. There is a distinct danger that his comments about p-values will be taken to apply to various study designs, descriptive, observational, and experimental. And there is a further danger that incorrect definitions of the p-value and statistical significance probabilities will be used to misrepresent p-values as relating to posterior probabilities. Surely, a distinguished professor of law, at a top law school, in a book published by a prestigious publisher (Belknap Press) can do better. The message for legal practitioners is clear. If you need to define or discuss statistical concepts in a brief, seek out a good textbook on statistics. Do not rely upon other lawyers, even distinguished law professors, or judges, for accurate working definitions.


[1] Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” 70 The Am. Statistician 129, 131 (2016).

[2] Sander Greenland, Stephen J. Senn, Kenneth J. Rothman, John B. Carlin, Charles Poole, Steven N. Goodman, and Douglas G. Altman, “Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations,” 31 European J. Epidemiol. 337 (2016). [cited as “Seven Sachems”]

[3] Martin T. Wells.

[4] Seven Sachems at 340 (emphasis added).

[5] Frederick Schauer, The Proof: Uses of Evidence in Law, Politics, and Everything Else (2022). [Schauer] One nit: Schauer cites a paper by A. Philip Dawid, “Statistical Risk,” 194 Synthese 3445 (2017). The title of the paper is “On individual risk.”

[6] Naomi Oreskes & Erik M. Conway, Merchants of Doubt: How a Handful of Scientists Obscured the Truth on Issues from Tobacco Smoke to Climate Change (2010).

[7] See, e.g., In re Silica Prods. Liab. Litig., 398 F.Supp. 2d 563 (S.D.Tex. 2005); Transcript of Daubert Hearing at 23 (Feb. 17, 2005) (“great red flags of fraud”).

[8] See Schauer endnote 44 to Chapter 3, “The Burden of Proof,” citing Valentin Amrhein, Sander Greenland, and Blake McShane, “Scientists Rise Up against Statistical Significance,” www.nature.com (March 20, 2019), which in turn commented upon Blakeley B. McShane, David Gal, Andrew Gelman, Christian Robert, and Jennifer L. Tackett, “Abandon Statistical Significance,” 73 American Statistician 235 (2019).

[9] Yoav Benjamini, Richard D. DeVeaux, Bradley Efron, Scott Evans, Mark Glickman, Barry Graubard, Xuming He, Xiao-Li Meng, Nancy Reid, Stephen M. Stigler, Stephen B. Vardeman, Christopher K. Wikle, Tommy Wright, Linda J. Young, and Karen Kafadar, “The ASA President’s Task Force Statement on Statistical Significance and Replicability,” 15 Annals of Applied Statistics 1084 (2021); see also “A Proclamation from the Task Force on Statistical Significance” (June 21, 2021).

[10] Schauer at 55. To be sure, Schauer, in endnote 43 to Chapter 3, disclaims any identification of p-values or measures of statistical significance with posterior probabilities or probabilistic measures of the burden of proof. Nonetheless, in the text, he proceeds to do exactly what he disclaimed in the endnote.

[11] Schauer at 55.

[12] Schauer at 56.

Rule 702 is Liberal, Not Libertine; Epistemic, Not Mechanical

October 4th, 2021

One common criticism of expert witness gatekeeping after the Supreme Court’s Daubert decision has been that the decision contravenes the claimed “liberal thrust” of the Federal Rules of Evidence. The criticism has been repeated so often as to become a cliché, but its frequent repetition by lawyers and law professors hardly makes it true. The criticism fails to do justice to the range of interpretations of “liberal” in the English language, the context of expert witness common law, and the language of Rule 702, both before and after the Supreme Court’s Daubert decision.

The first problem with the criticism is that the word “liberal,” or the phrase “liberal thrust,” does not appear in the Federal Rules of Evidence. The drafters of the Rules did, however, set out the underlying purpose of the federal codification of common law evidence in Rule 102, with some care:

“These rules should be construed so as to administer every proceeding fairly, eliminate unjustifiable expense and delay, and promote the development of evidence law, to the end of ascertaining the truth and securing a just determination.”

Nothing could promote ascertaining truth and achieving just determinations more than eliminating weak and invalid scientific inference in the form of expert witness opinion testimony. Barring speculative, unsubstantiated, and invalid opinion testimony before trial certainly has the tendency to eliminate full trials, with their expense and delay. And few people would claim unfairness in deleting invalid opinions from litigation. If there is any “liberal thrust” in the purpose of the Federal Rules of Evidence, it serves to advance the truth-finding function of trials.

And yet some legal commentators go so far as to claim that Daubert was wrongly decided because it offends the “liberal thrust” of the federal rules.[1] Of course, it is true that the Supreme Court spoke of the basic standard of relevance in the Federal Rules as being a “liberal” standard.[2] And in holding that Rule 702 did not incorporate the so-called Frye general acceptance rule,[3] the Daubert Court observed that the drafting history of Rule 702 failed to mention Frye, just before invoking the liberal-thrust cliché:

“[A] rigid ‘general acceptance’ requirement would be at odds with the ‘liberal thrust’ of the Federal Rules and their ‘general approach of relaxing the traditional barriers to ‘opinion testimony’.”[4]

The court went on to cite one district court judge famously hostile to expert witness gatekeeping,[5] and to the “permissive backdrop” of the Rules, in holding that the Rules did not incorporate Frye,[6] which it characterized as an “austere” standard.[7]

While the Frye standard may have been “austere,” it was also widely criticized. It was also true that the Frye standard was largely applied to scientific devices and not to the scientific process of causal inference. The Frye case itself addressed the admissibility of a systolic blood pressure deception test, an early attempt by William Marston to design a lasso of truth. When courts distinguished the Frye cases on grounds that they involved devices, not causal inferences, they left no meaningful standard in place.

As a procedural matter, the Frye general acceptance standard made little sense in the context of causal opinions. If the opinion itself was generally accepted, then of course it would have to be admitted. Indeed, if the proponent sought judicial notice of the opinion, a trial court would likely have to admit the opinion, and then bar any contrary opinion as not generally accepted.

To be sure, before the Daubert decision, defense counsel attempted to invoke the Frye standard in challenges to the underlying methodology used by expert witnesses to draw causal inferences. There were, however, few such applications. Although not exactly how Frye operated, the Supreme Court might have imagined that the Frye standard required all expert witness opinion testimony to be based on “sufficiently established and accepted scientific methods.” The actual language of the 1923 Frye case provides some ambivalent support with its twilight zone standard:

“Just when a scientific principle or discovery crosses the line between the experimental and demonstrable stages is difficult to define. Somewhere in this twilight zone the evidential force of the principle must be recognized, and while the courts will go a long way in admitting expert testimony deduced from a well-recognized scientific principle or discovery, the thing from which the deduction is made must be sufficiently established to have gained general acceptance in the particular field in which it belongs.”[8]

There was always an interpretative difficulty in how exactly a trial court was supposed to poll the world’s scientific community to ascertain “general acceptance.” Moreover, the rule actually before the Daubert Court, Rule 702, spoke of “knowledge.” At best, “general acceptance,” whether of methods or of conclusions, was merely a proxy, and often a very inaccurate one for an epistemic basis for disputed claims or conclusions at issue in litigation.

In cases involving causal claims before Daubert, expert witness opinions received scant attention from trial judges as long as the proffered expert witness met the very minimal standard of expertise needed to qualify to give an opinion. Furthermore, Rule 705 relieved expert witnesses of having to provide any bases for their opinions on direct examination. The upshot was that the standard for admissibility was authoritarian, not epistemic. If the proffered witness had a reasonable pretense to expertise, then the proffering party could parade him or her as an “authority,” on whose opinion the jury could choose to rely in its fact finding. Given this context, any epistemic standard would be “liberal” in freeing the jury or fact finder from the yoke of authoritarian expert witness ipse dixit.

And what exactly is the “liberal” in all this thrusting over Rule 702? Most dictionaries report that the word “liberal” traces back to the Latin liber, meaning “free.” The Latin word is thus the root of both liberty and libertine. One of the major, early uses of the adjective liberal was in the phrase “liberal arts,” meant to denote courses of study freed from authority, dogmas, and religious doctrine. The primary definition provided by the Oxford English Dictionary emphasizes this specific meaning:

“1. Originally, the distinctive epithet of those ‘arts’ or ‘sciences’ (see art 7) that were considered ‘worthy of a free man’; opposed to servile or mechanical.  … . Freq. in liberal arts.”

The Frye general acceptance standard was servile in the sense of its deference to others who were the acceptors, and it was mechanical in its reducing a rule that called for “knowledge” into a formula for nose-counting among the entire field in which an expert witness was testifying. In this light, the Daubert Court’s decision is obvious.

To be sure, the OED provides other subordinate or secondary definitions for “liberal,” such as definition 3c:

“Of construction or interpretation: Inclining to laxity or indulgence; not rigorous.”

Perhaps this definition would suggest that a liberal interpretation of Rule 702 would lead to rejection of the Frye standard, because Frye rigidly determined admissibility by a proxy that was not necessarily tied to the rule’s requirement of knowledge. Of course, knowledge or epistemic criteria in the Rule imply a different sort of rigor, one that is not servile or mechanical.

The epistemic criterion built into the original Rule 702, and carried forward in every amendment, accords with the secondary meanings given by the OED:

“4. a. Free from narrow prejudice; open-minded, candid.

b. esp. Free from bigotry or unreasonable prejudice in favour of traditional opinions or established institutions; open to the reception of new ideas or proposals of reform.”

The Daubert case represented a step in the direction of the classically liberal goal of advancing the truth-finding function of trials. The counter-revolution of “let it all in,” whether under the guise of treating challenges to expert witness opinions as going to “weight, not admissibility,” or of inventing “presumptions of admissibility,” should be seen for what it is: a retrograde and illiberal retreat from jurisprudential progress.


[1] See, e.g., Michael H. Graham, “The Expert Witness, Predicament: Determining ‘Reliable’ Under the Gatekeeping Test of Daubert, Kumho, and Proposed Amended Rule 702 of the Federal Rules of Evidence,” 54 U. Miami L. Rev. 317, 321 (2000) (“Daubert is a very incomplete case, if not a very bad decision. It did not, in any way, accomplish what it was meant to, i.e., encourage more liberal admissibility of expert witness evidence.”)

[2] Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579, 587 (1993).

[3] Frye v. United States, 293 F. 1013 (D.C. Cir. 1923).

[4] Id. at 588, citing Beech Aircraft Corp. v. Rainey, 488 U.S. 153, 169 (citing Rules 701 to 705); see also Edward J. Imwinkelried, “A Brief Defense of the Supreme Court’s Approach to the Interpretation of the Federal Rules of Evidence,” Indiana L. Rev. 267, 294 (1993) (writing of the “liberal structural design” of the Federal Rules).

[5] Jack B. Weinstein, “Rule 702 of the Federal Rules of Evidence is Sound; It Should Not Be Amended,” 138 F. R. D. 631 (1991) (“The Rules were designed to depend primarily upon lawyer-adversaries and sensible triers of fact to evaluate conflicts”).

[6] Daubert at 589.

[7] Id.

[8] Frye v. United States, 54 App. D.C. 46, 293 F. 1013 (1923).