TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Amicus Curious – Gelbach’s Foray into Lipitor Litigation

August 25th, 2022

Professor Schauer’s discussion of statistical significance, covered in my last post,[1] is curious for its disclaimer that “there is no claim here that measures of statistical significance map easily onto measures of the burden of proof.” Having made the disclaimer, Schauer proceeds to falls into the transposition fallacy, which contradicts his disclaimer, and, generally speaking, is not a good thing for a law professor eager to advance the understanding of “The Proof,” to do.

Perhaps more curious than Schauer’s error is his citation support for his disclaimer.[2] The cited paper by Jonah B. Gelbach is one of several of Gelbach’s papers that advances the claim that the p-value does indeed map onto posterior probability and the burden of proof. Gelbach’s claim has also been the center piece in his role as an advocate in support of plaintiffs in the Lipitor (atorvastatin) multi-district litigation (MDL) over claims that ingestion of atorvastatin causes diabetes mellitus.

Gelbach’s intervention as plaintiffs’ amicus is peculiar on many fronts. At the time of the Lipitor litigation, Sonal Singh was an epidemiologist and Assistant Professor of Medicine, at the Johns Hopkins University. The MDL trial court initially held that Singh’s proffered testimony was inadmissible because of his failure to consider daily dose.[3] In a second attempt, Singh offered an opinion for 10 mg daily dose of atorvastatin, based largely upon the results of a clinical trial known as ASCOT-LLA.[4]

The ASCOT-LLA trial randomized 19,342 participants with hypertension and at least three other cardiovascular risk factors to two different anti-hypertensive medications. A subgroup with total cholesterol levels less than or equal to 6.5 mmol./l. were randomized to either daily 10 mg. atorvastatin or placebo.  The investigators planned to follow up for five years, but they stopped after 3.3 years because of clear benefit on the primary composite end point of non-fatal myocardial infarction and fatal coronary heart disease. At the time of stopping, there were 100 events of the primary pre-specified outcome in the atorvastatin group, compared with 154 events in the placebo group (hazard ratio 0.64 [95% CI 0.50 – 0.83], p = 0.0005).

The atorvastatin component of ASCOT-LLA had, in addition to its primary pre-specified outcome, seven secondary end points, and seven tertiary end points.  The emergence of diabetes mellitus in this trial population, which clearly was at high risk of developing diabetes, was one of the tertiary end points. Primary, secondary, and tertiary end points were reported in ASCOT-LLA without adjustment for the obvious multiple comparisons. In the treatment group, 3.0% developed diabetes over the course of the trial, whereas 2.6% developed diabetes in the placebo group. The unadjusted hazard ratio was 1.15 (0.91 – 1.44), p = 0.2493.[5] Given the 15 trial end points, an adjusted p-value for this particular hazard ratio, for diabetes, might well exceed 0.5, and even approach 1.0.

On this record, Dr. Singh honestly acknowledged that statistical significance was important, and that the diabetes finding in ASCOT-LLA might have been the result of low statistical power or of no association at all. Based upon the trial data alone, he testified that “one can neither confirm nor deny that atorvastatin 10 mg is associated with significantly increased risk of type 2 diabetes.”[6] The trial court excluded Dr. Singh’s 10mg/day causal opinion, but admitted his 80mg/day opinion. On appeal, the Fourth Circuit affirmed the MDL district court’s rulings.[7]

Jonah Gelbach is a professor of law at the University of California at Berkeley. He attended Yale Law School, and received his doctorate in economics from MIT.

Professor Gelbach entered the Lipitor fray to present a single issue: whether statistical significance at conventionally demanding levels such as 5 percent is an appropriate basis for excluding expert testimony based on statistical evidence from a single study that did not achieve statistical significance.

Professor Gelbach is no stranger to antic proposals.[8] As amicus curious in the Lipitor litigation, Gelbach asserts that plaintiffs’ expert witness, Dr. Singh, was wrong in his testimony about not being able to confirm the ASCOT-LLA association because he, Gelbach, could confirm the association.[9] Ultimately, the Fourth Circuit did not discuss Gelbach’s contentions, which is not surprising considering that the asserted arguments and alleged factual considerations were not only dehors the record, but in contradiction of the record.

Gelbach’s curious claim is that any time a risk ratio, for an exposure and an outcome of interest, is greater than 1.0, with a p-value < 0.5,[10] the evidence should be not only admissible, but sufficient to support a conclusion of causation. Gelbach states his claim in the context of discussing a single randomized controlled trial (ASCOT-LLA), but his broad pronouncements are carelessly framed such that others may take them to apply to a single observational study, with its greater threats to internal validity.

Contra Kumho Tire

To get to his conclusion, Gelbach attempts to remove the constraints of traditional standards of significance probability. Kumho Tire teaches that expert witnesses must “employ[] in the courtroom the same level of intellectual rigor that characterizes the practice of an expert in the relevant field.”[11] For Gelbach, this “eminently reasonable admonition” does not impose any constraints on statistical inference in the courtroom. Statistical significance at traditional levels (p < 0.05) is for elitist scholarly work, not for the “practical” rent-seeking work of the tort bar. According to Gelbach, the inflation of the significance level ten-fold to p < 0.5 is merely a matter of “weight” and not admissibility of any challenged opinion testimony.

Likelihood Ratios and Posterior Probabilities

Gelbach maintains that any evidence that has a likelihood ratio (LR > 1) greater than one is relevant, and should be admissible under Federal Rule of Evidence 401.[12] This argument ignores the other operative Federal Rules of Evidence, namely 702 and 703, which impose additional criteria of admissibility for expert witness opinion testimony.

With respect to variance and random error, Gelbach tells us that any evidence that generates a LR > 1, should be admitted when “the statistical evidence is statistically significant below the 50 percent level, which will be true when the p-value is less than 0.5.”[13]

At times, Gelbach seems to be discussing the admissibility of the ASCOT-LLA study itself, and not the proffered opinion testimony of Dr. Singh. The study itself would not be admissible, although it is clearly the sort of hearsay an expert witness in the field may consider. If Dr. Singh were to have reframed and recalculated the statistical comparisons, then the Rule 703 requirement of “reasonable reliance” by scientists in the field of interest may not have been satisfied.

Gelbach also generates a posterior probability (0.77), which is based upon his calculations from data in the ASCOT-LLA trial, and not the posterior probability of Dr. Singh’s opinion. The posterior probability, as calculated, is problematic on many fronts.

Gelbach does not present his calculations – for the sake of brevity he says – but he tells us that the ASCOT-LLA data yield a likelihood ratio of roughly 1.9, and a p-value of 0.126.[14] What the clinical trialists reported was a hazard ratio of 1.15, which is a weak association on most researchers’ scales, with a two-sided p-value of 0.25, which is five times higher than the usual 5 percent. Gelbach does not explain how or why his calculated p-value for the likelihood ratio is roughly half the unadjusted, two-sided p-value for the tertiary outcome from ASCOT-LLA.

As noted, the reported diabetes hazard ratio of 1.15 was a tertiary outcome for the ASCOT trial, one of 15 calculated by the trialists, with p-values unadjusted for multiple comparisons.  The failure to adjust is perhaps excusable in that some (but certainly not all) of the outcome variables are overlapping or correlated. A sophisticated reader would not be misled; only when someone like Gelbach attempts to manufacture an inflated posterior probability without accounting for the gross underestimate in variance is there an insult to statistical science. Gelbach’s recalculated p-value for his LR, if adjusted for the multiplicity of comparisons in this trial, would likely exceed 0.5, rendering all his arguments nugatory.

Using the statistics as presented by the published ASCOT-LLA trial to generate a posterior probability also ignores the potential biases (systematic errors) in data collection, the unadjusted hazard ratios, the potential for departures from random sampling, errors in administering the participant recruiting and inclusion process, and other errors in measurements, data collection, data cleaning, and reporting.

Gelbach correctly notes that there is nothing methodologically inappropriate in advocating likelihood ratios, but he is less than forthcoming in explaining that such ratios translate into a posterior probability only if he posits a prior probability of 0.5.[15] His pretense to having simply stated “mathematical facts” unravels when we consider his extreme, unrealistic, and unscientific assumptions.

The Problematic Prior

Gelbach’s glibly assumes that the starting point, the prior probability, for his analysis of Dr. Singh’s opinion is 50%. This is an old and common mistake,[16] long since debunked.[17] Gelbach’s assumption is part of an old controversy, which surfaced in early cases concerning disputed paternity. The assumption, however, is wrong legally and philosophically.

The law simply does not hand out 0.5 prior probability to both parties at the beginning of a trial. As Professor Jaffee noted almost 35 years ago:

“In the world of Anglo-American jurisprudence, every defendant, civil and criminal, is presumed not liable. So, every claim (civil or criminal) starts at ground zero (with no legal probability) and depends entirely upon proofs actually adduced.”[18]

Gelbach assumes that assigning “equal prior probability” to two adverse parties is fair, because the fact-finder would not start hearing evidence with any notion of which party’s contentions are correct. The 0.5/0.5 starting point, however, is neither fair nor is it the law.[19] The even odds prior is also not good science.

The defense is entitled to a presumption that it is not liable, and the plaintiff must start at zero.  Bayesians understand that this is the death knell of their beautiful model.  If the prior probability is zero, then Bayes’ Theorem tells us mathematically that no evidence, no matter how large a likelihood ratio, can move the prior probability of zero towards one. Bayes’ theorem may be a correct statement about inverse probabilities, but still be an inadequate or inaccurate model for how factfinders do, or should, reason in determining the ultimate facts of a case.

We can see how unrealistic and unfair Gelbach’s implied prior probability is if we visualize the proof process as a football field.  To win, plaintiffs do not need to score a touchdown; they need only cross the mid-field 50-yard line. Rather than making plaintiffs start at the zero-yard line, however, Gelbach would put them right on the 50-yard line. Since one toe over the mid-field line is victory, the plaintiff is spotted 99.99+% of its burden of having to present evidence to build up 50% probability. Instead, plaintiffs are allowed to scoot from the zero yard line right up claiming success, where even the slightest breeze might give them winning cases. Somehow, in the model, plaintiffs no longer have to present evidence to traverse the first half of the field.

The even odds starting point is completely unrealistic in terms of the events upon which the parties are wagering. The ASCOT-LLA study might have shown a protective association between atorvastatin and diabetes, or it might have shown no association at all, or it might have show a larger hazard ratio than measured in this particular sample. Recall that the confidence interval for hazard ratios for diabetes ran from 0.91 to 1.44. In other words, parameters from 0.91 (protective association) to 1.0 (no association), to 1.44 (harmful association) were all reasonably compatible with the observed statistic, based upon this one study’s data. The potential outcomes are not binary, which makes the even odds starting point inappropriate.[20]


[1]Schauer’s Long Footnote on Statistical Significance” (Aug. 21, 2022).

[2] Frederick Schauer, The Proof: Uses of Evidence in Law, Politics, and Everything Else 54-55 (2022) (citing Michelle M. Burtis, Jonah B. Gelbach, and Bruce H. Kobayashi, “Error Costs, Legal Standards of Proof, and Statistical Significance,” 25 Supreme Court Economic Rev. 1 (2017).

[3] In re Lipitor Mktg., Sales Practices & Prods. Liab. Litig., MDL No. 2:14–mn–02502–RMG, 2015 WL 6941132, at *1  (D.S.C. Oct. 22, 2015).

[4] Peter S. Sever, et al., “Prevention of coronary and stroke events with atorvastatin in hypertensive patients who have average or lower-than-average cholesterol concentrations, in the Anglo-Scandinavian Cardiac Outcomes Trial Lipid Lowering Arm (ASCOT-LLA): a multicentre randomised controlled trial,” 361 Lancet 1149 (2003). [cited here as ASCOT-LLA]

[5] ASCOT-LLA at 1153 & Table 3.

[6][6] In re Lipitor Mktg., Sales Practices & Prods. Liab. Litig., 174 F.Supp. 3d 911, 921 (D.S.C. 2016) (quoting Dr. Singh’s testimony).

[7] In re Lipitor Mktg., Sales Practices & Prods. Liab. Litig., 892 F.3d 624, 638-39 (2018) (affirming MDL trial court’s exclusion in part of Dr. Singh).

[8] SeeExpert Witness Mining – Antic Proposals for Reform” (Nov. 4, 2014).

[9] Brief for Amicus Curiae Jonah B. Gelbach in Support of Plaintiffs-Appellants, In re Lipitor Mktg., Sales Practices & Prods. Liab. Litig., 2017 WL 1628475 (April 28, 2017). [Cited as Gelbach]

[10] Gelbach at *2.

[11] Kumho Tire Co. v. Carmichael, 526 U.S. 137, 152 (1999).

[12] Gelbach at *5.

[13] Gelbach at *2, *6.

[14] Gelbach at *15.

[15] Gelbach at *19-20.

[16] See Richard A. Posner, “An Economic Approach to the Law of Evidence,” 51 Stanford L. Rev. 1477, 1514 (1999) (asserting that the “unbiased fact-finder” should start hearing a case with even odds; “[I]deally we want the trier of fact to work from prior odds of 1 to 1 that the plaintiff or prosecutor has a meritorious case. A substantial departure from this position, in either direction, marks the trier of fact as biased.”).

[17] See, e.g., Richard D. Friedman, “A Presumption of Innocence, Not of Even Odds,” 52 Stan. L. Rev. 874 (2000). [Friedman]

[18] Leonard R. Jaffee, “Prior Probability – A Black Hole in the Mathematician’s View of the Sufficiency and Weight of Evidence,” 9 Cardozo L. Rev. 967, 986 (1988).

[19] Id. at p.994 & n.35.

[20] Friedman at 877.

Schauer’s Long Footnote on Statistical Significance

August 21st, 2022

One of the reasons that, in 2016, the American Statistical Association (ASA) issued, for the first time in its history, a consensus statement on p-values, was the persistent and sometimes deliberate misstatements and misrepresentations about the meaning of the p-value. Indeed, of the six principles articulated by the ASA, several were little more than definitional, designed to clear away misunderstandings.  Notably, “Principle Two” addresses one persistent misunderstanding and states:

“P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.

Researchers often wish to turn a p-value into a statement about the truth of a null hypothesis, or about the probability that random chance produced the observed data. The p-value is neither. It is a statement about data in relation to a specified hypothetical explanation, and is not a statement about the explanation itself.”[1]

The ASA consensus statement followed on the heels of an important published article, written by seven important authors in the fields of statistics and epidemiology.[2] One statistician,[3] who frequently shows up as an expert witness for multi-district litigation plaintiffs, described the article’s authors as the “A-Team” of statistics. In any event, the seven prominent thought leaders identified common statistical misunderstandings, including the belief that:

“2. The P value for the null hypothesis is the probability that chance alone produced the observed association; for example, if the P value for the null hypothesis is 0.08, there is an 8% probability that chance alone produced the association. No![4]

This is all basic statistics.

Frederick Schauer is the David and Mary Harrison Distinguished Professor of Law at the University of Virginia. Schauer has had contributed prolifically to legal scholarship, and his publications are often well written and thoughtful analyses. Schauer’s recent book, The Proof: Uses of Evidence in Law, Politics, and Everything Else, published by the Harvard University Press is a contribution to the literature of “legal epistemology,” and the foundations of evidence that lie beneath many of our everyday and courtroom approaches to resolving disputes.[5] Schauer’s book might be a useful addition to an undergraduate’s reading list for a course in practical epistemology, or for a law school course on evidence. The language of The Proof is clear and lively, but at times wanders into objectionable and biased political correctness. For example, Schauer channels Naomi Oreskes and her critique of manufacturing industry in his own discussion of “manufactured evidence,”[6] but studiously avoids any number of examples of explicit manufacturing of fraudulent evidence in litigation by the lawsuit industry.[7] Perhaps the most serious omission in this book on evidence is its failure to discuss the relative quality and hierarchy of evidence in science, medicine, and in policy.  Readers will not find any mention of the methodology of systematic reviews or meta-analyses in Schauer’s work.

At the end of his chapter on burdens of proof, Schauer adds “A Long Footnote on Statistical Significance,” in which he expresses surprise that the subject of statistical significance is controversial. Schauer might well have brushed up on the statistical concepts he wanted to discuss.

Schauer’s treatment of statistical significance is both distinctly unbalanced, as well as misstated. In an endnote,[8] Schauer cites some of the controversialists who have criticized significance tests, but none of the statisticians who have defended their use.[9]

As for conceptual accuracy, after giving a serviceable definition of the p-value, Schauer immediately goes astray:

And this likelihood is conventionally described in terms of a p-value, where the p-value is the probability that positive results—rejection of the “null hypothesis” that there is no connection between the examined variables—were produced by chance.”[10]

And again, for emphasis, Schauer tells us:

“A p-value of greater than .05 – a greater than 5 percent probability that the same results would have been the result of chance – has been understood to mean that the results are not statistically significant.”[11]

And then once more for emphasis, in the context of an emotionally laden hypothetical about an experimental drug “cures” a dread, incurable disease, p = 0.20, Schauer tells us that he suspects most people would want to take the medication:

“recognizing that an 80 percent likelihood that the rejection of ineffectiveness was still good enough, at least if there were no other alternatives.”

Schauer wants to connect his discussion of statistical significance to degrees or varying strengths of evidence, but his discursion into statistical significance largely conflates precision with strength. Evidence can be statistically robust but not be very strong. If we imagine a very large randomized clinical trial that found that a medication lowered systolic blood pressure by 1mm of mercury, p < 0.05, we would not consider that finding to constitute strong evidence for therapeutic benefit. If the observation of lowering blood pressure by 1mm came from an observational study, p < 0.05, the finding might not even qualify as evidence in the views of sophisticated cardiovascular physicians and researchers.

Earlier in the chapter, Schauer points to instances in which substantial evidence for a conclusion is downplayed because it is not “conclusive,” or “definitive.” He is obviously keen to emphasize that evidence that is not “conclusive” may still be useful in some circumstances. In this context, Schauer yet again misstates the meaning of significance probability, when he tells us that:

“[j]ust as inconclusive or even weak evidence may still be evidence, and may still be useful evidence for some purposes, so too might conclusions – rejections of the null hypothesis – that are more than 5 percent likely to have been produced by chance still be valuable, depending on what follows from those conclusions.”[12]

And while Schauer is right that weak evidence may still be evidence, he seems loathe to admit that weak evidence may be pretty crummy support for a conclusion. Take, for instance, a fair coin.  We have an expected value on ten flips of five heads and five tails.  We flip the coin ten times, but we observe six heads and four tails.  Do we now have “evidence” that the expected value and the expected outcome are wrong?  Not really. The probability of observing the expected outcome on the binomial model that most people would endorse for the thought experiment is 24.6%. The probability of not observing the expected value in ten flips is three times greater. If we look at an epidemiologic study, with a sizable number of participants, the “expected value” of 1.0, embodied in the null hypothesis, is an outcome that we would rarely expect to see, even if the null hypothesis is correct.  Schauer seems to have missed this basic lesson of probability and statistics.

Perhaps even more disturbing is that Schauer fails to distinguish the other determinants of study validity and the warrants for inferring a conclusion at any level of certainty. There is a distinct danger that his comments about p-values will be taken to apply to various study designs, descriptive, observational, and experimental. And there is a further danger that incorrect definitions of the p-value and statistical significance probabilities will be used to misrepresent p-values as relating to posterior probabilities. Surely, a distinguished professor of law, at a top law school, in a book published by a prestigious  publisher (Belknap Press) can do better. The message for legal practitioners is clear. If you need to define or discuss statistical concepts in a brief, seek out a good textbook on statistics. Do not rely upon other lawyers, even distinguished law professors, or judges, for accurate working definitions.


[1] Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” 70 The Am. Statistician 129, 131 (2016).

[2] Sander Greenland, Stephen J. Senn, Kenneth J. Rothman, John B. Carlin, Charles Poole, Steven N. Goodman, and Douglas G. Altman, “Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations,” 31 European J. Epidemiol. 337 (2016).[cited as “Seven Sachems”]

[3] Martin T. Wells.

[4] Seven Sachems at 340 (emphasis added).

[5] Frederick Schauer, The Proof: Uses of Evidence in Law, Politics, and Everything Else (2022). [Schauer] One nit: Schauer cites a paper by A. Philip Dawid, “Statistical Risk,” 194 Synthese 3445 (2017). The title of the paper is “On individual risk.”

[6] Naomi Oreskes & Erik M. Conway, Merchants of Doubt: How a Handful of Scientists Obscured the Truth on Issues from Tobacco Smoke to Climate Change (2010).

[7] See, e.g., In re Silica Prods. Liab. Litig., 398 F.Supp. 2d 563 (S.D.Tex. 2005); Transcript of Daubert Hearing at 23 (Feb. 17, 2005) (“great red flags of fraud”).

[8] See Schauer endnote 44 to Chapter 3, “The Burden of Proof,” citing Valentin Amrhein, Sander Greenland, and Blake McShane, “Scientists Rise Up against Statistical Significance,” www .nature .com (March 20, 2019), which in turn commented upon Blakey B. McShane, David Gal, Andrew Gelman, Christian Robert, and Jennifer L. Tackett, “Abandon Statistical Significance,” 73 American Statistician 235 (2019).

[9] Yoav Benjamini, Richard D. DeVeaux, Bradly Efron, Scott Evans, Mark Glickman, Barry Braubard, Xuming He, Xiao Li Meng, Nancy Reid, Stephen M. Stigler, Stephen B. Vardeman, Christopher K. Wikle, Tommy Wright, Linda J. Young, and Karen Kafadar, “The ASA President’s Task Force Statement on Statistical Significance and Replicability,” 15 Annals of Applied Statistics 1084 (2021); see alsoA Proclamation from the Task Force on Statistical Significance” (June 21, 2021).

[10] Schauer at 55. To be sure, Schauer, in endnote 43 to Chapter 3, disclaims any identification of p-values or measures of statistical significance with posterior probabilities or probabilistic measures of the burden of proof. Nonetheless, in the text, he proceeds to do exactly what he disclaimed in the endnote.

[11] Schauer at 55.

[12] Schauer at 56.

Differential Etiologies – Part One – Ruling In

June 17th, 2022

You put your right foot in

You put your right foot out

You put your right foot in

And you shake it all about

You do the Hokey Pokey and you turn yourself around

That’s what it’s all about!

 

Ever since the United States Supreme Court decided Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993), legal scholars, judges, and lawyers have struggled with the structure and validity of expert opinion on specific causation. Professor David Faigman and others have attempted to articulate the scientific basis (if any) for opinion testimony in health-effects litigation that a give person’s disease has been caused by an exposure or condition.

In 2015, as part of a tribute to the late Judge Jack Weinstein, Professor Faigman offered the remarkable suggestion that in advancing differential etiologies, expert witnesses were inventing wholesale an approach that had no foundation or acceptance in their scientific disciplines:

 “Differential etiology is ostensibly a scientific methodology, but one not developed by, or even recognized by, physicians or scientists. As described, it is entirely logical, but has no scientific methods or principles underlying it. It is a legal invention and, as such, has analytical heft, but it is entirely bereft of empirical grounding. Courts and commentators have so far merely described the logic of differential etiology; they have yet to define what that methodology is.”[1]

Faigman is correct that courts often have left unarticulated exactly what the methodology is, but he does not quite make sense when he writes that the method of differential etiology is “entirely logical,” but has no “scientific methods or principles underlying it.” After all, Faigman starts off his essay with a quotation from Thomas Huxley that “science is nothing but trained and organized common sense.”[2] As I have written elsewhere, the form of reasoning involved in differential diagnosis is nothing other than iterative disjunctive syllogism.[3] Either-or reasoning occurs throughout the physical and biological sciences; it is not clear why Faigman declares it un- or extra-scientific.

The strength of Faigman’s claim about the made-up nature of differential etiology appears to be undermined and contradicted by an example that he provides from clinical allergy and immunology:

“Allergists, for example, attempt to identify the etiology of allergic reactions in order to treat them (or to advise the patient to avoid what caused them), though it might still be possible to treat the allergic reactions without knowing their etiology.”

Faigman at 437. Of course, not only allergists try to determine the cause of an individual patient’s disease. Psychiatrists, in the psychoanalytic tradition, certain do so as well. Physicians who use predictive regression models use group data, in multivariate analyses, to predict outcomes, risk, and mortality in individual patients. Faigman’s claim is similarly undermined by the existence of a few diseases (other than infectious diseases) that are defined by the causative exposure. Silicosis and manganism have played a large role in often bogus litigation, but they represent instances in which a differential diagnosis and puzzle may also be an etiological diagnosis and a puzzle. Of course, to the extent that a disease is defined in terms of causative exposures, there may be serious and even intractable problems caused by the lack of specificity and accuracy in the diagnostic criteria for the supposedly pathognomonic disease.

As I noted at the time of Faigman’s 2015 essay, his suggestion that the concept of “differential etiology” was not used in the sciences themselves, was demonstrably flawed and historically inaccurate.[4]

A year earlier, in a more sustained analysis of specific causation, Professor Faigman went astray in a different direction, this time by stating that:

“it is not customary in the ordinary practice of sociology, epidemiology, anthropology, and related fields (for example, cognitive and social psychology) for professionals to make individual diagnostic judgments derived from group-based data.”[5]

Faigman’s invocation of “ordinary practice” of epidemiology was seriously wide of the mark. Medical practitioners and scientists frequently use epidemiologic data, based upon “group-based data” to make individual diagnostic judgments. The inferences from group data to individual range abound in the diagnostic process itself, where the specificity and sensitivity of disease signs and symptoms are measured by group data. Physicians must rely upon group data to make prognoses for individual patients, and they rely upon group data to predict future disease risks for individual patients. Future disease risks, as in the Framingham risk score for hard coronary heart disease, or the Gale model for breast cancer risk, are, of course, based upon “group-based data.” Medical decisions to intervene, surgically, pharmacologically, or by some other method, all involve applying group data to the individual patient.

Faigman’s 2014 law review article was certainly correct, however, in noting that specific causation inferences and conclusions were often left “profoundly underdefined,” with glib identifications of risk with cause.[6] There was thus plenty of room for further elucidation of specific causation decisions, and I welcome Faigman’s most recent effort to nail conceptual jello to the wall, in a law review article that was published last year.[7]

This new article, “Differential Etiology: Inferring Specific Causation in the Law from Group Data in Science,” is the collaborative product of Professor Faigman and three other academics. Joseph Sanders will be immediately recognizable to the legal community as someone who long pondered causation issues, both general and specific, and who has contributed greatly to the law review literature on causation of health outcomes. In addition to the law professors, Peter B. Imrey, a professor of medicine at the Cleveland Clinic Lerner College of Medicine of Case Western Reserve University, and Philip Dawid, an emeritus professor of statistics in Cambridge University, have joined the effort to make sense of specific causation in the law. The addition of medical and statistical expertise has added greatly to Faigman’s previous efforts, and it has corrected some of his earlier errors and added much nuance to the discussion. The resulting law review article is well worth reading for practitioners. In this post, however, I have not detailed every important insight, but rather I have tried to point out some of the continuing and new errors in the analysis.

The Sanders-Faigman-Imbrey-Dawid analysis begins with a lament that:

“there is no body of science to which experts can turn when addressing this issue. Ultimately, much of the evidence that can be brought to bear on this causal question is the same group-level data employed to prove general causation. Consequently, the expert testimony often feels jerry-rigged, an improvisation designed to get through a tough patch.”[8]

As an assessment of the judicial decisions on specific causation, there can be no dissent or appeal from the judgment of these authors. The authors use of the term “jerry-rigged” is curious. I had first I thought they were straining to avoid using the common phrase “jury rigged” or to avoid inventing a neologism such as “judge rigged.” The American Heritage and Merriam Webster dictionaries, however, describe the phrase “jerry-rigged” as a conflation of “jury-rigged,” a nautical term for a temporary but adequate repair, with “jerry-rigged,” a war-time pejorative term for makeshift devices put together by Germans. So jerry-rigged it is, and the authors are off and running to try to describe, clarify, and justify the process of drawing specific causation inferences by differential etiology. They might have called what passes for judicial decision making in this area as the “hokey pokey.”

The authors begin their analysis of specific causation with a brief acknowledgement that our legal system could abandon any effort to set standards or require rigorous thinking on the matter by simply leaving the matter to the jury.[9] After all, this laissez-faire approach had been the rule of law for centuries. Nevertheless, despite occasional retrograde, recidivist judicial opinions,[10] the authors realize that the law has evolved to a point that some judicial control over specific causation opinions is required. And if judges are going to engage in gatekeeping of specific-causation opinions, they need to explain and justify their decisions in a coherent and cogent fashion.

Having thus dispatched legal nihilism, the authors turn their attention to what they boldly describe as “the first full-scale effort to bring scientific sensibilities – and rigorous statistical thinking – to the legally imperative concept of specific causation.”[11] The claim is remarkable claim given that tort law has been dealing with the issue for decades, but probably correct given how frequently judges have swept the issue under a judicial rug of inpenetrable verbiage and shaggy thinking. The authors also walk back some of Faigman’s earlier claims that there is no science in the assessment of specific causation, although they acknowledge the obvious, that policy issues sometimes play a role in deciding both general and specific causation decisions. The authors also offer the insight, for which they claim novelty, that some of the Bradford Hill guidelines, although stated as part of assessing general causation, have some relevancy to decisions concerning specific causation.[12] Their insight is indeed important, although hardly novel.

Drawing upon some of the clearer judicial decisions, the authors identify three necessary steps to reach a conclusion of specific causation:

“(a) making a proper diagnosis;

(b) supporting (“ruling in”) the plausibility of the alleged cause of the injury on the basis of general evidence and logic; and

(c) particularization, i.e., excluding (‘ruling out’) competing causes in the specific instance under consideration.”[13]

Although this article is ostensibly about specific causation, the authors do not reach a serious discussion of the matter until roughly the 42nd page of a 72 page article. Having described a three-step approach, the authors feel compelled to discuss step one (describing or defining the “diagnosis,” or the outcome of interest), and step two, the “ruling in” process that requires an assessment of general causation.

Although ascertaining general causation is not the focus of this article, the authors give an extensive discourse on it. Indeed, the authors have some useful things to say about steps one and two, and I commend the article to readers for some of its learning. As much as the lawsuit industry might wish to do away with the general causation step, it is not going anywhere soon.[14] The authors also manage to say some things that range from wrong to not even wrong. One example of professoriate wish casting is the following assertion:

“Other things being equal, when the evidence for general causation is strong, and especially when the strength of the exposure–disease relationship as demonstrated in a body of research is substantial, the plaintiff faces a lower threshold in establishing the substance as the cause in a particular case than when the relationship is weaker.”[15]

This assertion appears, sans citation or analysis. The generalization fails in the face of counterexamples. The causal role for estrogen in many breast cancers is extremely strong. The International Agency for Cancer Research classifies estrogen as a Category I, known human carcinogen for breast cancer, even though estrogen is made naturally in the human female, and male, body. In the Women’s Health Initiative clinical trial, researchers reported a hazard ratio of 1.2,[16] but plaintiffs struggled to prevail on specific causation in litigation involving claims of breast cancer caused by post-menopausal hormone therapy. Perhaps the authors meant, by strength of exposure relationship, a high relative risk as well, but that point is taken up when the authors address the “ruling in” step of the three-step approach. In any event, the strength of the case for general causation is quite independent of the specific causation inference, especially in the face of small effect sizes.

On general causation itself, the authors begin their discussion with “threats to validity,” a topic that they characterize as mostly implicit in the Bradford Hill guidelines. But their suggestion that validity is merely implicit in the guidelines is belied by their citation to Dr. Woodside’s helpful article on the “forgotten predicate” to the nine Bradford Hill guidelines.[17] Bradford Hill explicitly noted that the starting point for considering an association to be causal occurred when “[o]ur observations reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance.”[18] Sir Austin told us in no uncertain terms that there is no need to consider the nine guidelines until random and systematic error have been rejected.[19]

In this article’s discussion of general causation, Professor’s Dawid’s influence can be seen in the unusual care to describe and define the p-value.[20] But the discussion devolves into more wish casting, when the authors state that p-values are not the only way to assess random error in research results.

They double down by stating that “[m]any prominent statisticians and other scientists have questioned it, and the need for change is increasingly accepted.”[21] The source for their statement, the American Statistical Association (ASA) 2016 p-value Statement, did not questioned the utility of the p-value for assessing random error, and this law review provides no other support for other unidentified methods to assess random error. For the most part, the ASA Statement identified misuses and misstatements of p-values, with the caveat that “[s]cientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.” This is hardly questioning the importance or utility of p-values in assessing random error.

When one of the cited authors, Ronald Wasserstein, published an editorial in 2019, proclaiming that it was time to move past the p-value, the then president of the ASA, Professor Karen Kafadar, commissioned a task force on the matter. That task force, consisting of many of the world’s leading statisticians, issued a short, but pointed rejection of Wasserstein’s advocacy, and by implication, the position asserted in this law review.[22] Several of the leading biomedical journals that were lobbied by Wasserstein to abandon statistical significance testing reassessed their statistical guidelines and reaffirmed the use of p-values and tests.[23]

Similarly, this law review’s statements that alternatives to frequentist tests (p-values) such as Bayesian inference are “ascendant” have no supporting citations, and generally are an inaccurate assessment of what most biomedical journals are currently publishing.

Despite the care with which this law review article has defined p-values, the authors run off the road when defining a confidence interval:

A 95% confidence interval … is a one-sided or two-sided interval from a data sample with 95% probability of bounding a fixed, unknown parameter, for which no nondegenerate probability distribution is conceived, under specified assumptions about the data distribution.”[24]

The emphasis added is to point out that the authors assigned a single confidence interval with the property of bounding the true parameter with 95% probability. That property, however, belongs to the infinite set of confidence intervals based upon repeated sampling of the same size from the same population, and constant variance. There is no probability statement to be made for the true parameter, as either in or not in a given confidence interval.

In an issue that is relevant to general and specific causation, the authors offer some ipse dixit on the issue of “thresholds”:

“with respect to some substance/injury relationships, it is thought that there is no safe threshold. Cancer is the injury for which it is most frequently thought that there is no safe threshold, but even here the mechanism of injury may lead to a different conclusion.”[25]

Here as elsewhere, the authors are repeating dogma, not science, and they ignore the substantial body of scientific evidence that undermines the so-called linear no threshold dose-response curve. The only citation offered is a judicial citation to a case that rejected the no threshold position![26]

So much for “ruling in.” In the next post, I will turn my attention to this law review’s handling of the “ruling out” step of differential etiology.


[1] David L. Faigman & Claire Lesikar, “Organized Common Sense: Some Lessons from Judge Jack Weinstein’s Uncommonly Sensible Approach to Expert Evidence,” 64 DePaul L. Rev. 421, 444 (2015).

[2] Thomas H. Huxley, “On the Education Value of the Natural History Sciences” (1854), in Lay Sermons, Addresses and Reviews 77 (1915).

[3] See, e.g., “Differential Etiology and Other Courtroom Magic” (June 23, 2014) (collecting cases); “Differential Diagnosis in Milward v. Acuity Specialty Products Group” (Sept. 26, 2013).

[4] See David Faigman’s Critique of G2i Inferences at Weinstein Symposium (Sept. 11, 2015); Kløve & D. Doehring, “MMPI in epileptic groups with differential etiology,” 18 J. Clin. Psychol. 149 (1962); Kløve & C. Matthews, “Psychometric and adaptive abilities in epilepsy with differential etiology,” 7 Epilepsia 330 (1966); Teuber & K. Usadel, “Immunosuppression in juvenile diabetes mellitus? Critical viewpoint on the treatment with cyclosporin A with consideration of the differential etiology,” 103 Fortschr. Med. 707 (1985); G.May & W. May, “Detection of serum IgA antibodies to varicella zoster virus (VZV)–differential etiology of peripheral facial paralysis. A case report,” 74 Laryngorhinootologie 553 (1995); Alan Roberts, “Psychiatric Comorbidity in White and African-American Illicity Substance Abusers” Evidence for Differential Etiology,” 20 Clinical Psych. Rev. 667 (2000); Mark E. Mullinsa, Michael H. Leva, Dawid Schellingerhout, Gilberto Gonzalez, and Pamela W. Schaefera, “Intracranial Hemorrhage Complicating Acute Stroke: How Common Is Hemorrhagic Stroke on Initial Head CT Scan and How Often Is Initial Clinical Diagnosis of Acute Stroke Eventually Confirmed?” 26 Am. J. Neuroradiology 2207 (2005);Qiang Fua, et al., “Differential Etiology of Posttraumatic Stress Disorder with Conduct Disorder and Major Depression in Male Veterans,” 62 Biological Psychiatry 1088 (2007); Jesse L. Hawke, et al., “Etiology of reading difficulties as a function of gender and severity,” 20 Reading and Writing 13 (2007); Mastrangelo, “A rare occupation causing mesothelioma: mechanisms and differential etiology,” 105 Med. Lav. 337 (2014).

[5] David L. Faigman, John Monahan & Christopher Slobogin, “Group to Individual (G2i) Inference in Scientific Expert Testimony,” 81 Univ. Chi. L. Rev. 417, 465 (2014).

[6] Id. at 448.

[7] Joseph Sanders, David L. Faigman, Peter B. Imrey, and Philip Dawid, “Differential Etiology: Inferring Specific Causation in the Law from Group Data in Science,” 63 Ariz. L. Rev. 851 (2021) [Differential Etiology]. I am indebted to Kirk Hartley for calling this new publication to my attention.

[8] Id. at 851, 855.

[9] Id. at 855 & n. 8 (citing A. Philip Dawid, David L. Faigman & Stephen E. Fienberg, “Fitting Science into Legal Contexts: Assessing Effects of Causes or Causes of Effects?,” 43 Sociological Methods & Research 359, 363–64 (2014). See also Barbara Pfeffer Billauer, “The Causal Conundrum: Examining the Medical-Legal Disconnect in Toxic Tort Cases from a Cultural Perspective or How the Law Swallowed the Epidemiologist and Grew Long Legs and a Tail,” 51 Creighton L. Rev. 319 (2018) (arguing for a standard-less approach that allows clinicians to offer their ipse dixit opinions on specific causation).

[10] Differential Etiology at 915 & n.231, 919 & n.244 (citing In re Round-Up Prods. Liab. Litig., 358 F. Supp. 3d 956, 960 (N.D. Cal. 2019).

[11] Differential Etiology at 856 (emphasis added).

[12] Differential Etiology at 857.

[13] Differential Etiology at 857 & n.14 (citing Best v. Lowe’s Home Ctrs., Inc., 563 F.3d 171, 180 (6th Cir. 2009)).

[14] See Margaret Berger, “Eliminating General Causation: Notes Toward a New Theory of Justice and Toxic Torts,” 97 Colum L. Rev. 2117 (1997).

[15] Differential Etiology at 864.

[16] Jacques E. Rossouw, et al.,Risks and benefits of estrogen plus progestin in healthy postmenopausal women: Principal results from the Women’s Health Initiative randomized controlled trial,” 288 J. Am. Med. Ass’n 321 (2002).

[17] Differential Etiology at 884 & n.104, citing Frank Woodside & Allison Davis, “The Bradford Hill Criteria: The Forgotten Predicate,” 35 Thomas Jefferson L. Rev. 103 (2013).

[18] Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 295 (1965).  

[19] Differential Etiology at 865.

[20] Differential Etiology at 869.

[21] Differential Etiology at 872, citing Ronald L. Wasserstein and Nicole A. Lazar, “The ASA Statement on p-Values: Context, Process, and Purpose,” 72 Am. Statistician 129 (2016).

[22] Yoav Benjamini, Richard D. De Veaux, Bradley Efron, Scott Evans, Mark Glickman, Barry I. Graubard, Xuming He, Xiao-Li Meng, Nancy M. Reid, Stephen M. Stigler, Stephen B. Vardeman, Christopher K. Wikle, Tommy Wright, Linda J. Young, and Karen Kafadar, “ASA President’s Task Force Statement on Statistical Significance and Replicability,” 15 Ann. Applied Statistics 1084 (2021), 34 Chance 10 (2021).

[23] See “Statistical Significance at the New England Journal of Medicine” (July 19, 2019); See also Deborah G. Mayo, “The NEJM Issues New Guidelines on Statistical Reporting: Is the ASA P-Value Project Backfiring?” Error Statistics Philosophy  (July 19, 2019).

[24] Differential Etiology at 898 n.173 (emphasis added).

[25] Differential Etiology at 890.

[26] Differential Etiology at n.134, citing Chlorine Chemistry Council v. Envt’l Protection Agency, 206 F.3d 1286 (D.C. Cir. 2000), which rejected the agency’s assumption that the carcinogenic effects of chloroform in drinking water lacked a threshold.

Improper Reliance upon Regulatory Risk Assessments in Civil Litigation

March 19th, 2022

Risk assessments would seemingly be about assessing risks, but they are not. The Reference Manual on Scientific Evidence defines “risk” as “[a] probability that an event will occur (e.g., that an individual will become ill or die within a stated period of time or by a certain age).”[1] The risk in risk assessment, however, may be zero, or uncertain, or even a probability of benefit. Agencies that must assess risks and set “action levels,” or “permissible exposure limits,” or “acceptable intakes,” often work under great uncertainty, with inspired guesswork, using unproven assumptions.

The lawsuit industry has thus often embraced the false equivalence between agency pronouncements on harmful medicinal, environmental, or occupational exposures and civil litigation adjudication of tortious harms. In the United States, federal agencies such as the Occupational Safety and Health Administration (OSHA), or the Environmental Protection Agency (EPA), and their state analogues, regularly set exposure standards that could not and should not hold up in a common-law tort case. 

Remarkably, there are state and federal court judges who continue to misunderstand and misinterpret regulatory risk assessments, notwithstanding efforts to educate the judiciary. The second edition of the Reference Manual on Scientific Evidence contained a chapter by the late Professor Margaret Berger, who took pains to point out the difference between agency assessments and the adjudication of causal claims in court:

[p]roof of risk and proof of causation entail somewhat different questions because risk assessment frequently calls for a cost-benefit analysis. The agency assessing risk may decide to bar a substance or product if the potential benefits are outweighed by the possibility of risks that are largely unquantifiable because of presently unknown contingencies. Consequently, risk assessors may pay heed to any evidence that points to a need for caution, rather than assess the likelihood that a causal relationship in a specific case is more likely than not.[2]

In March 2003, Professor Berger organized a symposium,[3] the first Science for Judges program (and the last), where the toxicologist Dr. David L. Eaton presented on the differences in the use of toxicology in regulatory pronouncements as opposed to causal assessments in civil actions. As Dr. Eaton noted:

“regulatory levels are of substantial value to public health agencies charged with ensuring the protection of the public health, but are of limited value in judging whether a particular exposure was a substantial contributing factor to a particular individual’s disease or illness.”[4]

The United States Environmental Protection Agency (EPA) acknowledges that estimating “risk” from low level exposures based upon laboratory animal data is fraught because of inter-specie differences in longevity, body habitus and size, genetics, metabolism, excretion patterns, genetic homogeneity of laboratory animals, dosing levels and regimens. The EPA’s assumptions in conducting and promulgating regulatory risk assessments are intended to predict the upper bound of theoretical risk, while fully acknowledging that there may be no actual risk in humans:

“It should be emphasized that the linearized multistage [risk assessment] procedure leads to a plausible upper limit to the risk that is consistent with some proposed mechanisms of carcinogenesis. Such an estimate, however, does not necessarily give a realistic prediction of the risk. The true value of the risk is unknown, and may be as low as zero.”[5]

The approach of the U.S. Food and Drug Administration (FDA) with respect to mutagenic impurities in medications provides an illustrative example of how theoretical and hypothetical risk assessment can be.[6] The FDA’s risk assessment approach is set out in a “Guidance” document, which like all such FDA guidances, describes itself as containing non-binding recommendations, which do not preempt alternative approaches.[7] The agency’s goal is devise a control strategy for any mutagenic impurity to keep it at or below an “acceptable cancer risk level,” even if the risk or the risk level is completely hypothetical.

The FDA guidance advances the concept of a “Threshold of Toxicological Concern (TTC),” to set an “acceptable intake,” for chemical impurities that pose negligible risks of toxicity or carcinogenicity.[8] The agency describes its risk assessment methodology as “very conservative,” given the frequently unproven assumptions made to reach a quantification of an “acceptable intake”:

“The methods upon which the TTC is based are generally considered to be very conservative since they involve a simple linear extrapolation from the dose giving a 50% tumor incidence (TD50) to a 1 in 10-6 incidence, using TD50 data for the most sensitive species and most sensitive site of tumor induction. For application of a TTC in the assessment of acceptable limits of mutagenic impurities in drug substances and drug products, a value of 1.5 micrograms (µg)/day corresponding to a theoretical 10-5 excess lifetime risk of cancer can be justified.”

For more potent mutagenic carcinogens, such as aflatoxin-like-, N-nitroso-, and alkyl-azoxy compounds, the acceptable intake or permissible daily exposure (PDE) is set lower, based upon available animal toxicologic data.

The important divide between regulatory practice and the litigation of causal claims in civil actions arises from the theoretical nature of the risk assessment enterprise. The FDA acknowledges, for instance, that the acceptable intake is set to mark “a small theoretical increase in risk,” and a “highly hypothetical concept that should not be regarded as a realistic indication of the actual risk,” and thus not an actual risk.[9] The corresponding hypothetical or theoretical risk to the acceptable intake level is clearly small when compared with the human’s lifetime probability of developing cancer (which the FDA states is greater than 1/3, but probably now approaches 40%).

Although the TTC concept allows a calculation of an estimated “safe exposure,” the FDA points out that:

“exceeding the TTC is not necessarily associated with an increased cancer risk given the conservative assumptions employed in the derivation of the TTC value. The most likely increase in cancer incidence is actually much less than 1 in 100,000. *** Based on all the above considerations, any exposure to an impurity that is later identified as a mutagen is not necessarily associated with an increased cancer risk for patients already exposed to the impurity. A risk assessment would determine whether any further actions would be taken.”

In other words the FDA’s risk assessment exists to guide agency action, not to determine a person’s risk or medical status.[10]

As small and theoretical as the risks are, they are frequently based upon demonstrably incorrect assumptions, such as:

  1. humans are as sensitive as the most sensitive species;
  2. all organs are as sensitive as the most sensitive organ of the most sensitive species;
  3. the dose-response in the most sensitive species is a simple linear relationship;
  4. the linear relationship runs from zero exposure and zero risk to the exposure that yields the so-called TD50, the exposure that yields tumors in 50% of the experimental animal model;
  5. the TD-50 is calculated based upon the point estimate in the animal model study, regardless of any confidence interval around the point estimate;
  6. the inclusion, in many instances, of non-malignant tumors as part of the assessment of the TD50 exposure;
  7. there is some increased risk for any exposure, no matter how small; that is, there is no threshold below which there is no increased risk; and
  8. the medication with the mutagenic impurity was used daily for 70 years, by a person who weights 50 kg.

Although the FDA acknowledges that there may be some instances in which a “less than lifetime level” (LTL) may be appropriate, it places the burden on manufacturers to show the appropriateness of higher LTLs. The FDA’s M7 Guidance observes that

“[s]tandard risk assessments of known carcinogens assume that cancer risk increases as a function of cumulative dose. Thus, cancer risk of a continuous low dose over a lifetime would be equivalent to the cancer risk associated with an identical cumulative exposure averaged over a shorter duration.”[11]

Similarly, the agency acknowledges that there may be a “practical threshold,” as result of bodily defense mechanisms, such as DNA repair, which counter any ill effects from lower level exposures.[12]

“The existence of mechanisms leading to a dose response that is non-linear or has a practical threshold is increasingly recognized, not only for compounds that interact with non-DNA targets but also for DNA-reactive compounds, whose effects may be modulated by, for example, rapid detoxification before coming into contact with DNA, or by effective repair of induced damage. The regulatory approach to such compounds can be based on the identification of a No-Observed Effect Level (NOEL) and use of uncertainty factors (see ICH Q3C(R5), Ref. 7) to calculate a permissible daily exposure (PDE) when data are available.”

Expert witnesses often attempt to bootstrap their causation opinions by reference to determinations of regulatory agencies that are couched in similar language, but which use different quality and quantity of evidence than is required in the scientific community or in civil courts.

Supreme Court

Industrial Union Dep’t v. American Petroleum Inst., 448 U.S. 607, 656 (1980) (“OSHA is not required to support its finding that a significant risk exists with anything approaching scientific certainty” and “is free to use conservative assumptions in interpreting the data with respect to carcinogens, risking error on the side of overprotection, rather than underprotection.”).

Matrixx Initiatives, Inc. v. Siracusano, 563 U.S. 27, 131 S.Ct. 1309, 1320 (2011) (regulatory agency often makes regulatory decisions based upon evidence that gives rise only to a suspicion of causation) 

First Circuit

Sutera v. Perrier Group of America, Inc., 986 F. Supp. 655, 664-65, 667 (D. Mass. 1997) (a regulatory agency’s “threshold of proof is reasonably lower than that in tort law”; “substances are regulated because of what they might do at given levels, not because of what they will do. . . . The fact of regulation does not imply scientific certainty. It may suggest a decision to err on the side of safety as a matter of regulatory policy rather than the existence of scientific fact or knowledge. . . . The mere fact that substances to which [plaintiff] was exposed may be listed as carcinogenic does not provide reliable evidence that they are capable of causing brain cancer, generally or specifically, in [plaintiff’s] case.”); id. at 660 (warning against the danger that a jury will “blindly accept an expert’s opinion that conforms with their underlying fears of toxic substances without carefully understanding or examining the basis for that opinion.”). Sutera is an important precedent, which involved a claim that exposure to an IARC category I carcinogen, benzene, caused plaintiffs’ leukemia. The plaintiff’s expert witness, Robert Jacobson, espousing a “linear, no threshold” theory, and relying upon an EPA regulation, which he claimed supported his opinion that even trace amounts of benzene can cause leukemia.

In re Neurontin Mktg., Sales Practices, and Prod. Liab. Litig., 612 F. Supp. 2d 116, 136 (D. Mass. 2009) (‘‘It is widely recognized that, when evaluating pharmaceutical drugs, the FDA often uses a different standard than a court does to evaluate evidence of causation in a products liability action. Entrusted with the responsibility of protecting the public from dangerous drugs, the FDA regularly relies on a risk-utility analysis, balancing the possible harm against the beneficial uses of a drug. Understandably, the agency may choose to ‘err on the side of caution,’ … and take regulatory action such as revising a product label or removing a drug from the marketplace ‘upon a lesser showing of harm to the public than the preponderance-of-the-evidence or more-like-than-not standard used to assess tort liability’.’’) (internal citations omitted) 

Whiting v. Boston Edison Co., 891 F. Supp. 12, 23-24 (D. Mass. 1995) (criticizing the linear no-threshold hypothesis, common to regulatory risk assessments, because it lacks any known or potential error rate, and it cannot be falsified as would any scientific theory)

Second Circuit

Wills v. Amerada Hess Corp., No. 98 CIV. 7126(RPP), 2002 WL 140542 (S.D.N.Y. Jan. 31, 2002), aff’d, 379 F.3d 32 (2d Cir. 2004) (Sotomayor, J.). In this Jones Act case, the plaintiff claimed that her husband’s exposure to benzene and polycyclic aromatic hydrocarbons on board ship caused his squamous cell lung cancer. Plaintiff’s expert witness relied heavily upon the IARC categorization of benzene as a “known” carcinogen, and an “oncogene” theory of causation that claimed there was no safe level of exposure because a single molecule could induce cancer. According to the plaintiff’s expert witness, the oncogene theory dispensed with the need to quantify exposure. Then Judge Sotomayor, citing Sutera, rejected plaintiff’s no-threshold theory, and the argument that exposure that exceeded OHSA permissible exposure level supported the causal claim.

Mancuso v. Consolidated Edison Co., 967 F. Supp. 1437, 1448 (S.D.N.Y. 1997) (“recommended or prescribed precautionary standards cannot provide legal causation”; “[f]ailure to meet regulatory standards is simply not sufficient” to establish liability)

In re Agent Orange Product Liab. Litig., 597 F. Supp. 740, 781 (E.D.N.Y. 1984) (Weinstein, J.) (“The distinction between avoidance of risk through regulation and compensation for injuries after the fact is a fundamental one.”), aff’d in relevant part, 818 F.2d 145 (2d Cir.1987), cert. denied sub nom. Pinkney v. Dow Chemical Co., 484 U.S. 1004 (1988). Judge Weinstein explained that regulatory action would not by itself support imposing liability for an individual plaintiff.  Id. at 782. “A government administrative agency may regulate or prohibit the use of toxic substances through rulemaking, despite a very low probability of any causal relationship.  A court, in contrast, must observe the tort law requirement that a plaintiff establish a probability of more than 50% that the defendant’s action injured him.” Id. at 785.

In re Ephedra Prods. Liab. Litig., 393 F. Supp. 2d 181, 189 (S.D.N.Y. 2005) (improvidently relying in part upon FDA ban despite “the absence of definitive scientific studies establishing causation”)

Third Circuit

Gates v. Rohm & Haas Co., 655 F.3d 255, 268 (3d Cir. 2011) (affirming the denial of class certification for medical monitoring) (‘‘plaintiffs could not carry their burden of proof for a class of specific persons simply by citing regulatory standards for the population as a whole’’).

In re Schering-Plough Corp. Intron/Temodar Consumer Class Action, 2009 WL 2043604, at *13 (D.N.J. July 10, 2009)(“[T]here is a clear and decisive difference between allegations that actually contest the safety or effectiveness of the Subject Drugs and claims that merely recite violations of the FDCA, for which there is no private right of action.”)

Rowe v. E.I. DuPont de Nemours & Co., Civ. No. 06-1810 (RMB), 2008 U.S. Dist. LEXIS 103528, *46-47 (D.N.J. Dec. 23, 2008) (rejecting reliance upon regulatory findings and risk assessments in which “the basic goal underlying risk assessments . . . is to determine a level that will protect the most sensitive members of the population.”)  (quoting David E. Eaton, “Scientific Judgment and Toxic Torts – A Primer in Toxicology for Judges and Lawyers,” 12 J.L. & Pol’y 5, 34 (2003) (“a number of protective, often ‘worst case’ assumptions . . . the resulting regulatory levels . . . generally overestimate potential toxicity levels for nearly all individuals.”)

Soldo v. Sandoz Pharms. Corp., 244 F. Supp. 2d 434, 543 (W.D. Pa. 2003) (finding FDA regulatory proceedings and adverse event reports not adequate or helpful in determining causation; the FDA “ordinarily does not attempt to prove that the drug in fact causes a particular adverse effect.”)Wade-Greaux v. Whitehall Laboratories, Inc., 874 F. Supp. 1441, 1464 (D.V.I.) (“assumption[s that] may be useful in a regulatory risk-benefit context … ha[ve] no applicability to issues of causation-in-fact”), aff’d, 46 F.3d 1120 (3d  Cir. 1994)

O’Neal v. Dep’t of the Army, 852 F. Supp. 327, 333 (M.D. Pa. 1994) (administrative risk figures are “appropriate for regulatory purposes in which the goal is to be particularly cautious [but] overstate the actual risk and, so, are inappropriate for use in determining” civil liability)

Fourth Circuit

Dunn v. Sandoz Pharmaceuticals Corp., 275 F. Supp. 2d 672, 684 (M.D.N.C. 2003) (FDA “risk benefit analysis” “does not demonstrate” causation in any particular plaintiff)

Yates v. Ford Motor Co., 113 F. Supp. 3d 841, 857 (E.D.N.C. 2015) (“statements from regulatory and official agencies … are not bound by standards for causation found in toxic tort law”)

Meade v. Parsley, No. 2:09-cv-00388, 2010 U.S. Dist. LEXIS 125217, * 25 (S.D.W. Va. Nov. 24, 2010) (‘‘Inasmuch as the cost-benefit balancing employed by the FDA differs from the threshold standard for establishing causation in tort actions, this court likewise concludes that the FDA-mandated [black box] warnings cannot establish general causation in this case.’’)

Rhodes v. E.I. du Pont de Nemours & Co., 253 F.R.D. 365, 377 (S.D. W.Va. 2008) (rejecting the relevance of regulatory assessments, which are precautionary and provide no information about actual risk).

Fifth Circuit

Moore v. Ashland Chemical Co., 126 F.3d 679, 708 (5th Cir. 1997) (holding that expert witness could rely upon a material safety data sheet (MSDS) because mandated by the Hazard Communication Act, 29 C.F.R. § 1910.1200), vacated 151 F.3d 269 (5th Cir. 1998) (affirming trial court’s exclusion of expert witness who had relied upon MSDS).

Johnson v. Arkema Inc., 685 F.3d 452, 464 (5th Cir. 2012) (per curiam) (affirming exclusion of expert witness who upon regulatory pronouncements; noting the precautionary nature of such statements, and the absence of specificity for the result claimed at the exposures experienced by plaintiff)

Allen v. Pennsylvania Eng’g Corp., 102 F.3d 194, 198-99 (5th Cir. 1996) (“Scientific knowledge of the harmful level of exposure to a chemical, plus knowledge that the plaintiff was exposed to such quantities, are minimal facts necessary to sustain the plaintiffs’ burden in a toxic tort case”; regulatory agencies, charged with protecting public health, employ a lower standard of proof in promulgating regulations than that used in tort cases). The Allen court explained that it was “also unpersuaded that the “weight of the evidence” methodology these experts use is scientifically acceptable for demonstrating a medical link. . . .  Regulatory and advisory bodies. . .utilize a “weight of the evidence” method to assess the carcinogenicity of various substances in human beings and suggest or make prophylactic rules governing human exposure.  This methodology results from the preventive perspective that the agencies adopt in order to reduce public exposure to harmful substances.  The agencies’ threshold of proof is reasonably lower than that appropriate in tort law, which traditionally makes more particularized inquiries into cause and effect and requires a plaintiff to prove that it is more likely than not that another individual has caused him or her harm.” Id.

Burst v. Shell Oil Co., C. A. No. 14–109, 2015 WL 3755953, *8 (E.D. La. June 16, 2015) (explaining Fifth Circuit’s rejection of regulatory “weight of the evidence” approaches to evaluating causation)

Sprankle v. Bower Ammonia & Chem. Co., 824 F.2d 409, 416 (5th Cir. 1987) (affirmed Rule 403 exclusion evidence of OSHA violations in claim of respiratory impairment in a non-employee who experienced respiratory impairment after exposure to anhydrous ammonia; court found that the jury likely be confused by regulatory pronouncements)

Cano v. Everest Minerals Corp., 362 F. Supp. 2d 814, 825 (W.D. Tex. 2005) (noting that a product that “has been classified as a carcinogen by agencies responsible for public health regulations is not probative of” common-law specific causation) (finding that the linear no-threshold opinion of the plaintiffs’ expert witness, Malin Dollinger, lacked a satisfactory scientific basis)

Burleson v. Glass, 268 F. Supp. 2d 699, 717 (W.D. Tex. 2003) (“the mere fact that [the product] has been classified by certain regulatory organizations as a carcinogen is not probative on the issue of whether [plaintiff’s] exposure. . .caused his. . .cancers”), aff’d, 393 F.3d 577 (5th Cir. 2004)

Newton v. Roche Labs., Inc., 243 F. Supp. 2d 672, 677, 683 (W.D. Tex. 2002) (FDA’s precautionary decisions on labeling are not a determination of causation of specified adverse events) (“Although evidence of an association may … be important in the scientific and regulatory contexts…, tort law requires a higher standard of causation.”)

Molden v. Georgia Gulf Corp., 465 F. Supp. 2d 606, 611 (M.D. La. 2006) (“regulatory and advisory bodies make prophylactic rules governing human exposure based on proof that is reasonably lower than that appropriate in tort law”)

Sixth Circuit

Nelson v. Tennessee Gas Pipeline Co., 243 F.3d 244, 252-53 (6th Cir. 2001) (exposure above regulatory levels is insufficient to establish causation)

Stites v Sundstrand Heat Transfer, Inc., 660 F. Supp. 1516, 1525 (W.D. Mich. 1987) (rejecting use of regulatory standards to support claim of increased risk, noting the differences in goals and policies between regulation and litigation)

Mann v. CSX Transportation, Inc., case no. 1:07-Cv-3512, 2009 U.S. Dist. Lexis 106433 (N.D. Ohio Nov. 10, 2009) (rejecting expert testimony that relied upon EPA action levels, and V.A. compensation for dioxin exposure, as basis for medical monitoring opinions)

Baker v. Chevron USA, Inc., 680 F. Supp. 2d 865, 880 (S.D. Ohio 2010) (“[R]egulatory agencies are charged with protecting public health and thus reasonably employ a lower threshold of proof in promulgating their regulations than is used in tort cases.”) (“[t]he mere fact that Plaintiffs were exposed to [the product] in excess of mandated limits is insufficient to establish causation”; rejecting Dr. Dahlgren’s opinion and its reliance upon a “one-hit” or “no threshold” theory of causation in which exposure to one molecule of a cancer-causing agent has some finite possibility of causing a genetic mutation leading to cancer, a theory that may be accepted for purposes of setting regulatory standards, but not as reliable scientific knowledge)

Adams v. Cooper Indus., 2007 WL 2219212 at *7 (E.D. KY 2007).

Seventh Circuit

Wood v. Textron, Inc., No. 3:10 CV 87, 2014 U.S. Dist. LEXIS 34938 (N.D. Ind. Mar. 17, 2014); 2014 U.S. Dist. LEXIS 141593, at *11 (N.D. Ind. Oct. 3, 2014), aff’d, 807 F.3d 827 (7th Cir. 2015). Dahlgren based his opinions upon the children’s water supply containing vinyl chloride in excess of regulatory levels set by state and federal agencies, including the EPA. Similarly, Ryer-Powder relied upon exposure levels’ exceeding regulatory permissible limits for her causation opinions. The district court, with the approval now of the Seventh Circuit would have none of this nonsense. Exceeding governmental regulatory exposure limits does not prove causation. The con-compliance does not help the fact finder without knowing “the specific dangers” that led the agency to set the permissible level, and thus the regulations are not relevant at all without this information. Even with respect to specific causation, the regulatory infraction may be weak or null evidence for causation. (citing Cunningham v. Masterwear Corp., 569 F.3d 673, 674–75 (7th Cir. 2009)

Eighth Circuit

Glastetter v. Novartis Pharms. Corp., 107 F. Supp. 2d 1015, 1036 (E.D. Mo. 2000) (“[T]he [FDA’s] statement fails to affirmatively state that a connection exists between [the drug] and the type of injury in this case.  Instead, it states that the evidence received by the FDA calls into question [drug’s] safety, that [the drug] may be an additional risk factor. . .and that the FDA had new evidence suggesting that therapeutic use of [the drug] may lead to serious adverse experiences.  Such language does not establish that the FDA had concluded that [the drug] can cause [the injury]; instead, it indicates that in light of the limited social utility of [the drug for the use at issue] and the reports of possible adverse effects, the drug should no longer be used for that purpose.”) (emphasis in original), aff’d, 252 F.3d 986, 991 (8th Cir. 2001) (FDA’s precautionary decisions on labeling are not a determination of causation of specified adverse events; “methodology employed by a government agency results from the preventive perspective that the agencies adopt”)( “The FDA will remove drugs from the marketplace upon a lesser showing of harm to the public than the preponderance-of-the-evidence or the more-like-than-not standard used to assess tort liability . . . . [Its] decision that [the drug] can cause [the injury] is unreliable proof of medical causation.”), aff’d, 252 F.3d 986 (8th Cir. 2001)

Wright v. Willamette Indus., Inc., 91 F.3d 1105, 1107 (8th Cir. 1996) (rejecting claim that plaintiffs were not required to show individual exposure levels to formaldehyde from wood particles). The Wright court elaborated upon the difference between adjudication and regulation of harm:

“Whatever may be the considerations that ought to guide a legislature in its determination of what the general good requires, courts and juries, in deciding cases, traditionally make more particularized inquiries into matters of cause and effect.  Actions in tort for damages focus on the question of whether to transfer money from one individual to another, and under common-law principles (like the ones that Arkansas law recognizes) that transfer can take place only if one individual proves, among other things, that it is more likely than not that another individual has caused him or her harm.  It is therefore not enough for a plaintiff to show that a certain chemical agent sometimes causes the kind of harm that he or she is complaining of.  At a minimum, we think that there must be evidence from which the factfinder can conclude that the plaintiff was exposed to levels of that agent that are known to cause the kind of harm that the plaintiff claims to have suffered. See Abuan v. General Elec. Co., 3 F.3d at 333.  We do not require a mathematically precise table equating levels of exposure with levels of harm, but there must be evidence from which a reasonable person could conclude that a defendant’s emission has probablycaused a particular plaintiff the kind of harm of which he or she complains before there can be a recovery.”

Gehl v. Soo Line RR, 967 F.2d 1204, 1208 (8th Cir. 1992).

Nelson v. Am. Home Prods. Corp., 92 F. Supp. 2d 954, 958 (W.D. Mo. 2000) (FDA’s precautionary decisions on labeling are not a determination of causation of specified adverse events)

National Bank of Commerce v. Associated Milk Producers, Inc., 22 F. Supp. 2d 942, 961 (E.D.Ark. 1998), aff’d, 191 F.3d 858 (8th Cir. 1999) 

Junk v. Terminix Internat’l Co., 594 F. Supp. 2d 1062, 1071 (S.D. Iowa 2008) (“government agency regulatory standards are irrelevant to [plaintiff’s] burden of proof in a toxic tort cause of action because of the agency’s preventative perspective”)

Ninth Circuit

Henrickson v. ConocoPhillips Co., 605 F. Supp. 2d 1142, 1156 (E.D. Wash. 2009) (excluding expert witness causation opinions in case involving claims that benzene exposure caused leukemia) 

Lopez v. Wyeth-Ayerst Labs., Inc., 1998 WL 81296, at *2 (9th Cir. Feb. 25, 1998) (FDA’s precautionary decisions on labeling are not a determination of causation of specified adverse events)

In re Epogen & Aranesp Off-Label Marketing & Sales Practices Litig., 2009 WL 1703285, at *5 (C.D. Cal. June 17, 2009) (“have not been proven” allegations are an improper “FDA approval” standard; the FDA’s determination to require warning changes without establishing causation is established does not permit a court or jury, bound by common-law standards, to impose such a duty to warn when common-law causation requirements are not met).

In re Hanford Nuclear Reservation Litig., 1998 U.S. Dist. Lexis 15028 (E.D. Wash. 1998) (radiation and chromium VI), rev’d on other grounds, 292 F.3d 1124 (9th Cir. 2002).

Tenth Circuit

Hollander v. Shandoz Pharm. Corp., 95 F. Supp. 2d 1230, 1239 (W.D. Okla. 2000) (distinguishing FDA’s threshold of proof as lower than appropriate in tort law), aff’d in relevant part, 289 F.3d 1193, 1215 (10th Cir. 2002)

Mitchell v. Gencorp Inc., 165 F.3d 778, 783 n.3 (10th Cir. 1999) (benzene and CML) (quoting Allen, 102 F.3d at 198) (state administrative finding that product was a carcinogen was based upon lower administrative standard than tort standard) (“The methodology employed by a government agency “results from the preventive perspective that the agencies adopt in order to reduce public exposure to harmful substances.  The agencies’ threshold of proof is reasonably lower than that appropriate in tort law, which traditionally makes more particularized inquiries into cause and effect and requires a plaintiff to prove it is more likely than not that another individual has caused him or her harm.”)

In re Breast Implant Litig., 11 F. Supp. 2d 1217, 1229 (D.Colo. 1998)

Johnston v. United States, 597 F. Supp. 374, 393-394 (D. Kan.1984) (noting that the linear no-threshold hypothesis is based upon a prudent assumption designed to overestimate risk; speculative hypotheses are not appropriate in determining whether one person has harmed another)

Eleventh Circuit

Rider v. Sandoz Pharmaceuticals Corp., 295 F.3d 1194, 1201 (11th Cir. 2002) (FDA may take regulatory action, such as revising warning labels or withdrawing drug from the market ‘‘upon a lesser showing of harm to the public than the preponderance-of-the-evidence or more-likely-than-not standard used to assess tort liability’’) (“A regulatory agency such as the FDA may choose to err on the side of caution. Courts, however, are required by the Daubert trilogy to engage in objective review of the evidence to determine whether it has sufficient scientific basis to be considered reliable.”)

McClain v. Metabolife Internat’l, Inc., 401 F.3d 1233, 1248-1250 (11th Cir. 2005) (ephedra) (allowing that regulators “may pay heed to any evidence that points to a need for caution,” and apply “a much lower standard than that which is demanded by a court of law”) (“[U]se of FDA data and recommendations raises a more subtle methodological issue in a toxic tort case. The issue involves identifying and contrasting the type of risk assessment that a government agency follows for establishing public health guidelines versus an expert analysis of toxicity and causation in a toxic tort case.”)

In re Seroquel Products Liab. Litig., 601 F. Supp. 2d 1313, 1315 (M.D. Fla. 2009) (noting that administrative agencies “impose[] different requirements and employ[] different labeling and evidentiary standards” because a “regulatory system reflects a more prophylactic approach” than the common law)

Siharath v. Sandoz Pharmaceuticals Corp., 131 F. Supp. 2d 1347, 1370 (N.D. Ga. 2001) (“The standard by which the FDA deems a drug harmful is much lower than is required in a court of law.  The FDA’s lesser standard is necessitated by its prophylactic role in reducing the public’s exposure to potentially harmful substances.”), aff’d, 295 F.3d 1194 330 (11th Cir. 2002)

In re Accutane Products Liability, 511 F.Supp.2d 1288, 1291-92 (M.D. Fla. 2007)(acknowledging that regulatory risk assessments are not necessarily realistic in human populations because they are often based upon animal studies, and that the important differences between experimental animals and humans are substantial in various health outcomes).

Kilpatrick v. Breg, Inc., 2009 WL 2058384 at * 6-7 (S.D. Fla. 2009) (excluding plaintiff’s expert witness), aff’d, 613 F.3d 1329 (11th Cir. 2010)

District of Columbia Circuit

Ethyl Corp. v. E.P.A., 541 F.2d 1, 28 & n. 58 (D.C. Cir. 1976) (detailing the precautionary nature of agency regulations that may be based upon suspicions)

STATE COURTS

Arizona

Lofgren v. Motorola, 1998 WL 299925 (Ariz. Super. Ct. 1998) (finding plaintiffs’ expert witnesses’ testimony that TCE caused cancer to be not generally accepted; “it is appropriate public policy for health organizations such as IARC and the EPA to make judgments concerning the health and safety of the population based on evidence which would be less than satisfactory to support a specific plaintiff’s tort claim for damages in a court of law”)

Colorado

Salazar v. American Sterilizer Co., 5 P.3d 357 (Colo. Ct. App. 2000) (allowing testimony about harmful ethylene oxide exposure based upon OSHA regulations)

Georgia

Butler v. Union Carbide Corp., 712 S.E.2d 537, 552 & n.37 (Ga. App. 2011) (distinguishing risk assessment from causation assessment; citing the New York Court of Appeals decision in Parker for correctly rejecting reliance on regulatory pronouncements for causation determinations)

Illinois

La Salle Nat’l Bank v. Malik, 705 N.E.2d 938 (Ill. App. 3d) (reversing trial court’s exclusion of OSHA PEL for ethylene oxide), writ pet’n den’d, 714 N.E.2d 527 (Ill. 2d 1999)

New York

Parker v. Mobil Oil Corp., 7 N.Y.3d 434, 450, 857 N.E.2d 1114, 1122, 824 N.Y.S.2d 584 (N.Y. 2006) (noting that regulatory agency standards usually represent precautionary principle efforts deliberately to err on side of prevention; “standards promulgated by regulatory agencies as protective measures are inadequate to demonstrate legal causation.” 

In re Bextra & Celebrex, 2008 N.Y. Misc. LEXIS 720, *20, 239 N.Y.L.J. 27 (2008) (characterizing FDA Advisory Panel recommendations as regulatory standard and protective measure).

Juni v. A.O. Smith Water Products Co., 48 Misc. 3d 460, 11 N.Y.S.3d 416, 432, 433 (N.Y. Cty. 2015) (“the reports and findings of governmental agencies [declaring there to be no safe dose of asbestos] are irrelevant as they constitute insufficient proof of causation”), aff’d, 32 N.Y.3d 1116, 116 N.E.3d 75, 91 N.Y.S.3d 784 (2018)

Ohio

Valentine v. PPG Industries, Inc., 821 N.E.2d 580, 597-98 (Ohio App. 2004), aff’d, 850 N.E.2d 683 (Ohio 2006). 

Pennsylvania

Betz v. Pneumo Abex LLC, 44 A. 3d 27 (Pa. 2012).

Texas

Borg-Warner Corp., 232 S.W.3d 765, 770 (Tex. 2007)

Exxon Corp. v. Makofski, 116 S.W.3d 176, 187-88 (Tex. App. 2003) (describing “standards used by OSHA [and] the EPA” as inadequate for causal determinations)


[1] Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” in Reference Manual on Scientific Evidence 549, 627 (3d ed. 2011).

[2] Margaret A. Berger, “The Supreme Court’s Trilogy on the Admissibility of Expert Testimony,” in Reference Manual On Scientific Evidence at 33 (Fed. Jud. Center 2d. ed. 2000).

[3] Margaret A. Berger, “Introduction to the Symposium,” 12 J. L. & Pol’y 1 (2003). Professor Berger described the symposium as a “felicitous outgrowth of a grant from the Common Benefit Trust established in the Silicone Breast Implant Products Liability Litigation to hold a series of conferences at Brooklyn Law School.” Id. at 1. Ironically, that “Trust” was nothing more than the walking-around money of plaintiffs’ lawyers from the Silicone-Gel Breast Implant MDL 926. Although Professor Berger was often hostile the causation requirement in tort law, her symposium included some well-qualified scientists who amplified her point from the Reference Manual about the divide between regulatory risk assessment and scientific causal assessments.

[4] David L. Eaton, Scientific Judgment and Toxic Torts- A Primer in Toxicology for Judges and Lawyers, 12 J.L. & Pol’y 5, 36 (2003). See also Joseph V. Rodricks and Susan H. Rieth, “Toxicological risk assessment in the courtroom: are available methodologies suitable for evaluating toxic tort and product liability claims?” 27 Regul. Toxicol. & Pharmacol. 21, 27 (1998) (“The public health-oriented resolution of scientific uncertainty [used by regulators] is not especially helpful to the problem faced by a court.”)

[5] EPA “Guidelines for Carcinogen Risk Assessment” at 13 (1986).

[6] The approach is set out in FDA, M7 (R1) Assessment and Control of DNA Reactive (Mutagenic) Impurities in Pharmaceuticals to Limit Potential Carcinogenic Risk: Guidance for Industry (2018) [FDA M7]. This FDA guidance is essentially an adoption of the M7 document of the Expert Working Group (Multidisciplinary) of the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH).

[7] FDA M7 at 3.

[8] FDA M7 at 5.

[9] FDA M7 at 5 (emphasis added).

[10] See Labeling of Diphenhydramine Containing Drug Products for Over-the-Counter Human Use, 67 Fed. Reg. 72,555, at 72,556 (Dec. 6, 2002) (“FDA’s decision to act in an instance such as this one need not meet the standard of proof required to prevail in a private tort action. . .. To mandate a warning or take similar regulatory action, FDA need not show, nor do we allege, actual causation.”) (citing Glastetter).

[11] FDA M7 at “Acceptable Intakes in Relation to Less-Than-Lifetime (LTL) Exposure (7.3).”

[12] FDA M7 at 12 (“Mutagenic Impurities With Evidence for a Practical Threshold (7.2.2)”).

When the American Medical Association Woke Up

November 17th, 2021

“You are more than entitled not to know what the word ‘performative’ means. It is a new word and an ugly word, and perhaps it does not mean anything very much. But at any rate there is one thing in its favor, it is not a profound word.”

J.L. Austin, “Performative Utterances,” in Philosophical Papers 233 (2nd ed. 1970).

John Langshaw Austin, J.L. to his friends, was a English philosopher who focused on language and how it actually worked in the real world. Austin developed the concept of performative utterances, which have since come to be known as “speech acts.” Little did J.L. know that performative utterances would come to dominate politics and social media.

The key aspect of spoken words that function as speech acts is that they do not simply communicate information, which might have some truth value, and some epistemic basis. Speech acts consist of actual conduct, such as promising, commanding, apologizing, etc.[1] The law has long implicitly recognized the distinction between factual assertions or statements and speech acts. The Federal Rules of Evidence, for instance, limits the rule against hearsay to “statements,” meaning written assertions or nonverbal conduct (such as nodding in agreement) that is intended as an assertion.[2]

When persons in wedding ceremonies say “I do,” at the appropriate moments, they are married, by virtue of their speech acts. Similarly for contracts and other promising under circumstances that give rise to enforceable contracts. A witness’s recounting another’s vows or promises is not hearsay because the witness is offering a recollection only for the fact that the utterance was made, and not to prove the truth of a matter asserted.[3]

The notion of a speech act underlies much political behavior these days. When people palaver about Q, or some QAnon conspiracy, the principle of charity requires us to understand them as not speaking words that can be true or false, but simply signaling their loyalty to a lost cause, usually associated with the loser of the 2020 presidential election. By exchanging ridiculous and humiliating utterances, fellow cultists are signaling loyalty, not making a statement about the world. Their “speech acts” are similar to rituals of exchanging blood with pledges of fraternity.

Of course, there are morons who show up at concerts expecting John F. Kennedy, Jr., to appear, or who show up at pizza places in Washington, D.C., armed with semiautomatic rifles, because their credulity outstripped the linguistic nuances of performative utterances about the Clintons. In days past, members of a cult would get a secret tatoo or wear a special piece of jewelry. Now, the way to show loyalty is to say stupid things in public, and not to laugh when your fellow cultists say similar things.

Astute observers of political systems, on both the left (George Orwell) and the right (Eric Voegelin) have long recognized that ideologies destroy language, including speech acts and performative utterances. The destructive capacities of ideologies are especially disturbing when they invade science and medicine. Alas, the ideology of the Woke has arrived in the halls of the American Medical Association (AMA).

Last month, AMA issued its guide to politically correct language, designed to advance health “equity”: “Advancing Health Equity: A Guide to Language, Narrative and Concepts (Nov. 2, 2021).” The 54 page guide is, at times, worthy of a MAD magazine parody, but the document quickly transcends parody to take us into an Orwellian nightmare of thought-control in the name of neo-Marxist “social justice” goals.[4]

In its guide to language best practices, the AMA urges us to promote health equity by adding progressive political language to what were once simple statements of fact. The AMA document begins with what seems affected, insincere humility:

“We share this document with humility. We recognize that language evolves, and we are mindful that context always matters. This guide is not and cannot be a check list of correct answers. Instead, we hope that this guide will stimulate critical thinking about language, narrative and concepts—helping readers to identify harmful phrasing in their own work and providing alternatives that move us toward racial justice and health equity.”

This pretense at humility quickly evaporates as the document’s tone become increasingly censorious and strident. The AMA seems less concerned with truth, evidence-based conclusions, or dialogue, than with conformity to social justice norms of the Woke mob.

In Table 1, the AMA introduces some “Key Principles and Associated Terms.” “Avoid use of adjectives such as vulnerable, marginalized and high-risk,” at least as to persons. Why? The AMA tells us that the use of such terms to describe individuals is “stigmatizing.” The terms are vague and imply (to the AMA) that the condition is inherent to the group rather than the actual root cause, which seems to be mostly, in the AMA’s view, the depredations of white cis-gendered men. To cure the social injustice, the AMA urges us to speak in terms of groups and communities (never individuals) that “have been historically marginalized or made vulnerable, or underserved, or under-resourced [sic], or experience disadvantage [sic].” The squishy passive voice pervades the AMA Guide, but the true subject – the oppressor – is easy to discern.

Putting aside the recurrent, barbarous use of the passive voice, we now must have medical articles that are sociological treatises. The AMA appears to be especially sensitive, perhaps hypersensitive, to what it considers “unintentional blaming.” For example, rather than discuss “[w]orkers who do not use PPE [personal protective equipment” or “people who do not seek healthcare,” the AMA instructs authors, without any apparent embarrassment or shame, to “try” substituting “workers under-resourced with” PPE, or “people with limited access to” healthcare.

Aside from assuaging the AMA’s social justice warriors, the substitutions are not remotely synonymous. There have been, there are, and there will likely always be workers and others who do not use protective equipment. There have been, there are, and there will likely always be persons who do not seek healthcare. For example, anti-vaxxing yutzballs can be found in all social strata and walks of life. Access to equipment or healthcare is a completely independent issue and concern. The AMA’s effort to hide these facts with the twisted passive-voice contortions assaults our language and our common sense.

Table 2 of the AMA Guide provides a list of commonly used words and phrases and the “equity-focused alternatives.”

“Disadvantaged” in Woke Speak becomes “historically and intentionally excluded.” The aspirational goal of “equality” is recast as “equity.” After all, mere equality, or treating everyone alike:

“ignores the historical legacy of disinvestment and deprivation through policy of historically marginalized and minoritized [sic] communities as well as contemporary forms of discrimination that limit opportunities. Through systematic oppression and deprivation from ethnocide, genocide, forced removal from land and slavery, Indigenous and Black people have been relegated to the lowest socioeconomic ranks of this country. The ongoing xenophobic treatment of undocumented brown people and immigrants (including Indigenous people disposed of their land in other countries) is another example. Intergenerational wealth has mainly benefited and exists for white families.”

In other words, treating people equally is racist. Non-racist is also racist. “Fairness” must also be banished; the equity-focused AMA requires “Social Justice.” Mere fairness pays “no attention” to power relations, and enforced distribution outcomes.

Illegal immigrants are, per AMA guidelines, transformed into “undocumented Immigrant,” because “illegal” is “a dehumanizing, derogatory term,” and because ‘[n]o human being is illegal.” The latter is a lovely sentiment, but human beings can be in countries unlawfully, just as they can be in the Capitol Building illegally.

“Non-compliance” is transmuted into “non-adherence,” because the former term “places blame for treatment failure solely on patients.” The latter term is suggested to exculpate patients, even though patients can be solely responsible for failing to follow prescribed treatment. The AMA wants, however, to remind us that non-adherence may result from “frustration and legitimate mistrust of health care, structural barriers that limit availability and accessibility of medications (including cost, insurance barriers and pharmacy deserts), time and resource constraints (including work hours, family responsibilities), and lack of effective communication about severity of disease or symptoms.” All true, but why not add sloth, stupidity, and superstition? We are still in a pandemic that has been fueled by non-compliance that largely warrants blame on the non-compliant.

The AMA wanders into fraught territory when it tells us impassively that identifying a “social problem” is now a sign of insensitivity. The AMA Woke Guide advises that social problems are really “social injustices.” Referring to a phenomenon as a social problem risks blaming people for their own “marginalization.” The term “marginalization” is part of the Social Justice jargon, and it occurs throughout the AMA Woke Guide. A handy glossary at the end of the document is provided for those of us who have not grown up in Woke culture:

“Marginalization: Process experienced by those under- or unemployed or in poverty, unable to participate economically or socially in society, including the labor market, who thereby suffer material as well as social deprivation.”[5]

The Woke apparently know that calling something a mere “social problem” makes it “seem less serious than social injustice,” and there is some chance that labeling a social phenomenon as a social problem risks “potentially blaming people for their own marginalization.” And yet not every social problem is a social injustice. Underage drinking and unprotected sex are social problems, as is widespread obesity and prevalent diabetes. Alcoholism is a social problem that is prevalent in all social strata; hardly a social injustice.

At page 23 of the Woke Guide, the AMA’s political hostility to individual agency and autonomy breaks through in a screed against meritocracy:

“Among these ideas is the concept of meritocracy, a social system in which advancement in society is based on an individual’s capabilities and merits rather than on the basis of family, wealth or social background. Individualism is problematic in obscuring the dynamics of group domination, especially socioeconomic privilege and racism. In health care, this narrative appears as an over-emphasis on changing individuals and individual behavior instead of the institutional and structural causes of disease.”

Good grief, now physicians cannot simply treat a person for a disease, they must treat entire tribes!

Table 5

Some of the most egregious language of the Woke Guide can be seen in its Table 5, entitled “Contrasting Conventional (Well-intentioned) Phrasing with Equity-focused Language that Acknowledges Root Causes of Inequities.” Table 5 makes clear that the AMA is working from a sociological program that is supported by implicit claims of knowledge for the “root causes” of inequities, a claim that should give everyone serious pause. After all, even if often disappointed, the readers of AMA journals expect rigorous scientific studies, carefully written and edited, which contribute to evidence-based medicine. There is nothing, however, in the AMA Guide, other than its ipse dixit, to support its claimed social justice etiologies.

Table 5 of the AMA Guide provides some of its most far-reaching efforts to impose a political vision through semantic legerdemain. Despite the lack of support for its claimed root causes, the AMA would force writers to assign Social Justice approved narratives and causation. A seemingly apolitical, neutral statement, such as:

“Low-income people have the highest level of coronary artery disease in the United States.”

now must be recast into sanctimonious cant that would warm the cockles of a cold Stalinist’s heart:

“People underpaid and forced into poverty as a result of banking policies, real estate developers gentrifying neighborhoods, and corporations weakening the power of labor movements, among others, have the highest level of coronary artery disease in the United States.”

Banks, corporations, and real estate developers have agency; people do not. With such verbiage, it will be hard to enforce page limits on manuscripts submitted to AMA journals. More important, however, is that the “root cause” analysis is not true in many cases. In countries where property is banned and labor owns the means of production, low-income people have higher rates of disease. The socio-economic variable is important, and consistent, across the globe, even in democratic socialist countries such as Sweden, or in Marxist paradises such as the People’s Republic of China and the former Soviet Union. The bewildered may wonder whether the AMA has ever heard of a control group. Maybe, just maybe, the increased incidence of coronary artery disease among the poor has more to do with Cheez Doodles than the ravages of capitalism.

CRITICAL REACTIONS

The AMA’s guide to linguistic etiquette is a transparent effort to advance a political agenda under the guise of language mandates. The AMA is not merely prescribing thoughtful substitutions for common phrases; the AMA guide is nothing less than an attempt to impose a “progressive” ideology with fulsome apologies. The AMA not only embraces, unquestioningly, the ideology of “white fragility, Ibram Kendi, and Robin DiAngelo; the AMA at times appears on the verge of medicalizing the behaviors of those who question or reject its Woke ideology. Is a psychiatric gulag the next step?

Dr. Michelle Cretella, the executive director of the American College of Pediatricians, expressed her concern that the AMA’s “social justice” plans are “rooted not in science and the medical ethics of the Hippocratic Oath, but in a host of Marxist ideologies that devalue the lives of our most vulnerable patients and seek to undermine the nuclear family which is the single most critical institution to child well-being.”[6]

Journalist Jesse Singal thinks that the AMA has gone berserk.[7] And Matt Bai, at the Washington Post, saw the AMA’s co-opting of language and narratives as having an Orwellian tone, resembling Mao’s “Little Red Book.”[8] The Post writer raised the interesting question why the AMA was even in the business of admonishing physicians and scientists about acceptable language. After all, the editors of Fowler’s Modern English Usage have managed for decades to eschew offering guidance on performing surgery. The Post opinion piece expresses a realistic concern that proposing “weird language” will worsen the current fraying of the social fabric, and pave the way for a Trump Restoration. Perhaps the AMA should stick to medicine rather than “mandating versions of history and their own lists of acceptable terminology.”

AMA Woke Speak has its antecedents,[9] and it will likely have its followers. For lawyers who work with expert witnesses, the AMA guide risks subjecting their medical witnesses to embarrassment, harassment, and impeachment for failing to comply with the new ideological orthodoxy. Just say no.


[1] See generally John L. Austin, How to Do Things with Words: The William James Lectures delivered at Harvard University in 1955 (1962).

[2] See Fed. R. Evid. Rule 801(a) & Notes of Advisory Comm. Definitions That Apply to This Article; Exclusions from Hearsay (defining statement).


[3] See, e.g., Emich Motors Corp. v. General Motors Corp., 181 F.2d 70 (7th Cir. 1950), rev’d on other grounds 340 U.S. 558 (1951).

[4] Harriet Hall, “The AMA’s Guide to Politically Correct Language: Advancing Health Equity,” Science Based Medicine (Nov. 2, 2021).

[5] Citing, Foster Osei Baah, Anne M Teitelman & Barbara Riegel, “Marginalization: Conceptualizing patient vulnerabilities in the framework of social determinants of health-An integrative review,” 26 Nurs Inq. e12268 (2019).

[6] Jeff Johnston, “Woke Medicine: ‘The AMA’s Strategic Plan to Embed Racial Justice and Advance Health Equity’,” The Daily Citizen (May 21, 2021) .

[7] Jesse Singal, “The AMA jumps the Woke Shark, introduces Medspeak,” Why Evolution is True (Nov. 1, 2021).

[8] Matt Bai, “Paging Dr. Orwell. The American Medical Association takes on the politics of language,” Wash. Post (Nov. 3, 2021).

[9] Office of Minority Health, U.S. Department of Health and Human Services, “National Standards for Culturally and Linguistically Appropriate Services in Health and Health Care: A Blueprint for Advancing and Sustaining CLAS

Policy and Practice” (2013); Association of State and Territorial Health Officials, “Health equity terms” (2018).

Reference Manual on Scientific Evidence – 3rd Edition is Past Its Expiry

October 17th, 2021

INTRODUCTION

The new, third edition of the Reference Manual on Scientific Evidence was released to the public in September 2011, as a joint production of the National Academies of Science, and the Federal Judicial Center. Within a year of its publication, I wrote that the Manual needed attention on several key issues. Now that there is a committee working on the fourth edition, I am reprising the critique, slightly modified, in the hope that it may make a difference for the fourth edition.

The Development Committee for the third edition included Co-Chairs, Professor Jerome Kassirer, of Tufts University School of Medicine, and the Hon. Gladys Kessler, who sits on the District Court for the District of Columbia.  The members of the Development Committee included:

  • Ming W. Chin, Associate Justice, The Supreme Court of California
  • Pauline Newman, Judge, Court of Appeals for the Federal Circuit
  • Kathleen O’Malley, Judge, Court of Appeals for the Federal Circuit (formerly a district judge on the Northern District of Ohio)
  • Jed S. Rakoff, Judge, Southern District of New York
  • Channing Robertson, Professor of Engineering, Stanford University
  • Joseph V. Rodricks, Principal, Environ
  • Allen Wilcox, Senior Investigator, Institute of Environmental Health Sciences
  • Sandy L. Zabell, Professor of Statistics and Mathematics, Weinberg College of Arts and Sciences, Northwestern University

Joe S. Cecil, Project Director, Program on Scientific and Technical Evidence, in the Federal Judicial Center’s Division of Research, who shepherded the first two editions, served as consultant to the Committee.

With over 1,000 pages, there was much to digest in the third edition of the Reference Manual on Scientific Evidence (RMSE 3d).  Much of what is covered was solid information on the individual scientific and technical disciplines covered.  Although the information is easily available from other sources, there is some value in collecting the material in a single volume for the convenience of judges and lawyers.  Of course, given that this information is provided to judges from an ostensibly neutral, credible source, lawyers will naturally focus on what is doubtful or controversial in the RMSE. To date, there have been only a few reviews and acknowledgments of the new edition.[1]

Like previous editions, the substantive scientific areas were covered in discrete chapters, written by subject matter specialists, often along with a lawyer who addresses the legal implications and judicial treatment of that subject matter.  From my perspective, the chapters on statistics, epidemiology, and toxicology were the most important in my practice and in teaching, and I have focused on issues raised by these chapters.

The strengths of the chapter on statistical evidence, updated from the second edition, remained, as did some of the strengths and flaws of the chapter on epidemiology.  In addition, there was a good deal of overlap among the chapters on statistics, epidemiology, and medical testimony.  This overlap was at first blush troubling because the RMSE has the potential to confuse and obscure issues by having multiple authors address them inconsistently.  This is an area where reviewers of the upcoming edition should pay close attention.

I. Reference Manual’s Disregard of Study Validity in Favor of the “Whole Tsumish”

There was a deep discordance among the chapters in the third Reference Manual as to how judges should approach scientific gatekeeping issues. The third edition vacillated between encouraging judges to look at scientific validity, and discouraging them from any meaningful analysis by emphasizing inaccurate proxies for validity, such as conflicts of interest.[2]

The Third Edition featured an updated version of the late Professor Margaret Berger’s chapter from the second edition, “The Admissibility of Expert Testimony.”[3]  Berger’s chapter criticized “atomization,” a process she describes pejoratively as a “slicing-and-dicing” approach.[4]  Drawing on the publications of Daubert-critic Susan Haack, Berger rejected the notion that courts should examine the reliability of each study independently.[5]  Berger contended that the “proper” scientific method, as evidenced by works of the International Agency for Research on Cancer, the Institute of Medicine, the National Institute of Health, the National Research Council, and the National Institute for Environmental Health Sciences, “is to consider all the relevant available scientific evidence, taken as a whole, to determine which conclusion or hypothesis regarding a causal claim is best supported by the body of evidence.”[6]

Berger’s contention, however, was profoundly misleading.  Of course, scientists undertaking a systematic review should identify all the relevant studies, but some of the “relevant” studies may well be insufficiently reliable (because of internal or external validity issues) to answer the research question at hand. All the cited agencies, and other research organizations and researchers, exclude studies that are fundamentally flawed, whether as a result of bias, confounding, erroneous data analyses, or related problems.  Berger cited no support for her remarkable suggestion that scientists do not make “reliability” judgments about available studies when assessing the “totality of the evidence.”

Professor Berger, who had a distinguished career as a law professor and evidence scholar, died in November 2010.  She was no friend of Daubert,[7] but remarkably her antipathy had outlived her.  Berger’s critical discussion of “atomization” cited the notorious decision in Milward v. Acuity Specialty Products Group, Inc., 639 F.3d 11, 26 (1st Cir. 2011), which was decided four months after her passing.[8]

Professor Berger’s contention about the need to avoid assessments of individual studies in favor of the whole “tsumish” must also be rejected because Federal Rule of Evidence 703 requires that each study considered by an expert witness “qualify” for reasonable reliance by virtue of the study’s containing facts or data that are “of a type reasonably relied upon by experts in the particular field forming opinions or inferences upon the subject.”  One of the deeply troubling aspects of the Milward decision is that it reversed the trial court’s sensible decision to exclude a toxicologist, Dr. Martyn Smith, who outran his headlights on issues having to do with a field in which he was clearly inexperienced – epidemiology.

Scientific studies, and especially epidemiologic studies, involve multiple levels of hearsay.  A typical epidemiologic study may contain hearsay leaps from patient to clinician, to laboratory technicians, to specialists interpreting test results, back to the clinician for a diagnosis, to a nosologist for disease coding, to a national or hospital database, to a researcher querying the database, to a statistician analyzing the data, to a manuscript that details data, analyses, and results, to editors and peer reviewers, back to study authors, and on to publication.  Those leaps do not mean that the final results are untrustworthy, only that the study itself is not likely admissible in evidence.

The inadmissibility of scientific studies is not problematic because Rule 703 permits testifying expert witnesses to formulate opinions based upon facts and data, which are not independently admissible in evidence. The distinction between relied upon and admissible studies is codified in the Federal Rules of Evidence, and in virtually every state’s evidence law.

Referring to studies, without qualification, as admissible in themselves is usually wrong as a matter of evidence law.  The error has the potential to encourage carelessness in gatekeeping expert witnesses’ opinions for their reliance upon inadmissible studies.  The error is doubly wrong if this approach to expert witness gatekeeping is taken as license to permit expert witnesses to rely upon any marginally relevant study of their choosing.  It is therefore disconcerting that the RMSE 3d failed to make the appropriate distinction between admissibility of studies and admissibility of expert witness opinion that has reasonably relied upon appropriate studies.

Consider the following statement from the chapter on epidemiology:

“An epidemiologic study that is sufficiently rigorous to justify a conclusion that it is scientifically valid should be admissible, as it tends to make an issue in dispute more or less likely.”[9]

Curiously, the advice from the authors of the epidemiology chapter, by speaking to a single study’s validity, was at odds with Professor Berger’s caution against slicing and dicing. The authors of the epidemiology chapter seemed to be stressing that scientifically valid studies should be admissible.  Their footnote emphasized and confused the point:

See DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941, 958 (3d Cir. 1990); cf. Kehm v. Procter & Gamble Co., 580 F. Supp. 890, 902 (N.D. Iowa 1982) (“These [epidemiologic] studies were highly probative on the issue of causation—they all concluded that an association between tampon use and menstrually related TSS [toxic shock syndrome] cases exists.”), aff’d, 724 F.2d 613 (8th Cir. 1984). Hearsay concerns may limit the independent admissibility of the study, but the study could be relied on by an expert in forming an opinion and may be admissible pursuant to Fed. R. Evid. 703 as part of the underlying facts or data relied on by the expert. In Ellis v. International Playtex, Inc., 745 F.2d 292, 303 (4th Cir. 1984), the court concluded that certain epidemiologic studies were admissible despite criticism of the methodology used in the studies. The court held that the claims of bias went to the studies’ weight rather than their admissibility. Cf. Christophersen v. Allied-Signal Corp., 939 F.2d 1106, 1109 (5th Cir. 1991) (“As a general rule, questions relating to the bases and sources of an expert’s opinion affect the weight to be assigned that opinion rather than its admissibility. . . .”).”[10]

This footnote, however, that studies relied upon by an expert in forming an opinion may be admissible pursuant to Rule 703, was unsupported by and contrary to Rule 703 and the overwhelming weight of case law interpreting and applying the rule.[11] The citation to a pre-Daubert decision, Christophersen, was doubtful as a legal argument, and managed to engender much confusion

Furthermore, Kehm and Ellis, the cases cited in this footnote by the authors of the epidemiology chapter, both involved “factual findings” in public investigative or evaluative reports, which were independently admissible under Federal Rule of Evidence 803(8)(C). See Ellis, 745 F.2d at 299-303; Kehm, 724 F.2d at 617-18.  As such, the cases hardly support the chapter’s suggestion that Rule 703 is a rule of admissibility for epidemiologic studies.

Here the RMSE 3d, in one sentence, confused Rule 703 with an exception to the rule against hearsay, which would prevent the statistically based epidemiologic studies from being received in evidence.  The point is reasonably clear, however, that the studies “may be offered” in testimony to explain an expert witness’s opinion. Under Rule 705, that offer may also be refused. The offer, however, is to “explain,” not to have the studies admitted in evidence.  The RMSE 3d was certainly not alone in advancing this notion that studies are themselves admissible.  Other well-respected evidence scholars have lapsed into this error.[12]

Evidence scholars should not conflate admissibility of the epidemiologic (or other) studies with the ability of an expert witness to advert to a study to explain his or her opinion.  The testifying expert witness really should not be allowed to become a conduit for off-hand comments and opinions in the introduction or discussion section of relied upon articles, and the wholesale admission of such hearsay opinions undermines the trial court’s control over opinion evidence.  Rule 703 authorizes reasonable reliance upon “facts and data,” not every opinion that creeps into the published literature.

II. Toxicology for Judges

The toxicology chapter, “Reference Guide on Toxicology,” in RMSE 3d was written by Professor Bernard D. Goldstein, of the University of Pittsburgh Graduate School of Public Health, and Mary Sue Henifin, a partner in the Princeton, New Jersey office of Buchanan Ingersoll, P.C.

  1. Conflicts of Interest

At the question and answer session of the Reference Manual’s public release ceremony, in September 2011, one gentleman rose to note that some of the authors were lawyers with big firm affiliations, which he supposed must mean that they represent mostly defendants.  Based upon his premise, he asked what the review committee had done to ensure that conflicts of interest did not skew or distort the discussions in the affected chapters.  Dr. Kassirer and Judge Kessler responded by pointing out that the chapters were peer reviewed by outside reviewers, and reviewed by members of the supervising review committee.  The questioner seemed reassured, but now that I have looked at the toxicology chapter, I am not so sure.

The questioner’s premise that a member of a large firm will represent mostly defendants and thus have a pro-defense bias was probably a common perception among unsophisticated lay observers.  For instance, some large firms represent insurance companies intent upon denying coverage to product manufacturers.  These counsel for insurance companies often take the plaintiffs’ side of the underlying disputed issue in order to claim an exclusion to the contract of insurance, under a claim that the harm was “expected or intended.”  Similarly, the common perception ignores the reality of lawyers’ true conflict:  although gatekeeping helps the defense lawyers’ clients, it takes away legal work from firms that represent defendants in the litigations that are pretermitted by effective judicial gatekeeping.  Erosion of gatekeeping concepts, however, inures to the benefit of plaintiffs, their counsel, as well as the expert witnesses engaged on behalf of plaintiffs in litigation.

The questioner’s supposition in the case of the toxicology chapter, however, is doubly flawed.  If he had known more about the authors, he would probably not have asked his question.  First, the lawyer author, Ms. Henifin, despite her large firm affiliation, has taken some aggressive positions contrary to the interests of manufacturers.[13]  As for the scientist author of the toxicology chapter, Professor Goldstein, the casual reader of the chapter may want to know that he has testified in any number of toxic tort cases, almost invariably on the plaintiffs’ side.  Unlike the defense lawyer, who loses business revenue, when courts shut down unreliable claims, plaintiffs’ testifying or consulting expert witnesses stand to gain by minimalist expert witness opinion gatekeeping.  Given the economic asymmetries, the reader must thus want to know that Professor Goldstein was excluded as an expert witness in some high-profile toxic tort cases.[14]  There do not appear to be any disclosures of Professor Goldstein’s (or any other scientist author’s) conflicts of interests in RMSE 3d.  Having pointed out this conflict, I would note that financial conflicts of interest are nothing really compared with ideological conflicts of interest, which often propel scientists into service as expert witnesses to advance their political agenda.

  1. Hormesis

One way that ideological conflicts might be revealed is to look for imbalances in the presentation of toxicologic concepts.  Most lawyers who litigate cases that involve exposure-response issues are familiar with the “linear no threshold” (LNT) concept that is used frequently in regulatory risk assessments, and which has metastasized to toxic tort litigation, where LNT often has no proper place.

LNT is a dubious assumption because it claims to “know” the dose response at very low exposure levels in the absence of data.  There is a thin plausibility for LNT for genotoxic chemicals claimed to be carcinogens, but even that plausibility evaporates when one realizes that there are DNA defense and repair mechanisms to genotoxicity, which must first be saturated, overwhelmed, or inhibited, before there can be a carcinogenic response. The upshot is that low exposures that do not swamp DNA repair and tumor suppression proteins will not cause cancer.

Hormesis is today an accepted concept that describes a dose-response relationship that shows a benefit at low doses, but harm at high doses. The toxicology chapter in the Reference Manual has several references to LNT but none to hormesis.  That font of all knowledge, Wikipedia reports that hormesis is controversial, but so is LNT.  This is the sort of imbalance that may well reflect an ideological bias.

One of the leading textbooks on toxicology describes hormesis[15]:

“There is considerable evidence to suggest that some non-nutritional toxic substances may also impart beneficial or stimulatory effects at low doses but that, at higher doses, they produce adverse effects. This concept of “hormesis” was first described for radiation effects but may also pertain to most chemical responses.”

Similarly, the Encyclopedia of Toxicology describes hormesis as an important phenomenon in toxicologic science[16]:

“This type of dose–response relationship is observed in a phenomenon known as hormesis, with one explanation being that exposure to small amounts of a material can actually confer resistance to the agent before frank toxicity begins to appear following exposures to larger amounts.  However, analysis of the available mechanistic studies indicates that there is no single hormetic mechanism. In fact, there are numerous ways for biological systems to show hormetic-like biphasic dose–response relationship. Hormetic dose–response has emerged in recent years as a dose–response phenomenon of great interest in toxicology and risk assessment.”

One might think that hormesis would also be of great interest to federal judges, but they will not learn about it from reading the Reference Manual.

Hormesis research has come into its own.  The International Dose-Response Society, which “focus[es] on the dose-response in the low-dose zone,” publishes a journal, Dose-Response, and a newsletter, BELLE:  Biological Effects of Low Level Exposure.  In 2009, two leading researchers in the area of hormesis published a collection of important papers:  Mark P. Mattson and Edward J. Calabrese, eds., Hormesis: A Revolution in Biology, Toxicology and Medicine (2009).

A check in PubMed shows that LNT has more “hits” than “hormesis” or “hermetic,” but still the latter phrases exceed 1,267 references, hardly insubstantial.  In actuality, there are many more hermetic relationships identified in the scientific literature, which often fails to identify the relationship by the term hormesis or hermetic.[17]

The Reference Manual’s omission of hormesis was regrettable.  Its inclusion of references to LNT but not to hormesis suggests a biased treatment of the subject.

  1. Questionable Substantive Opinions

Readers and litigants would fondly hope that the toxicology chapter would not put forward partisan substantive positions on issues that are currently the subject of active litigation.  Fervently, we would hope that any substantive position advanced would at least be well documented.

For at least one issue, the toxicology chapter disappointed significantly.  Table 1 in the chapter presents a “Sample of Selected Toxicological End Points and Examples of Agents of Concern in Humans.” No documentation or citations are provided for this table.  Most of the exposure agent/disease outcome relationships in the table are well accepted, but curiously at least one agent-disease pair, which is the subject of current litigation, is wildly off the mark:

“Parkinson’s disease and manganese[18]

If the chapter’s authors had looked, they would have found that Parkinson’s disease is almost universally accepted to have no known cause, at least outside court rooms.  They would also have found that the issue has been addressed carefully and the claimed relationship or “concern” has been rejected by the leading researchers in the field (who have no litigation ties).[19]  Table 1 suggests a certain lack of objectivity, and its inclusion of a highly controversial relationship, manganese-Parkinson’s disease, suggests a good deal of partisanship.

  1. When All You Have Is a Hammer, Everything Looks Like a Nail

The substantive area author, Professor Goldstein, is not a physician; nor is he an epidemiologist.  His professional focus on animal and cell research appeared to color and bias the opinions offered in this chapter:[20]

“In qualitative extrapolation, one can usually rely on the fact that a compound causing an effect in one mammalian species will cause it in another species. This is a basic principle of toxicology and pharmacology.  If a heavy metal, such as mercury, causes kidney toxicity in laboratory animals, it is highly likely to do so at some dose in humans.”

Such extrapolations may make sense in regulatory contexts, where precauationary judgments are of interest, but they hardly can be said to be generally accepted in controversies in scientific communities, or in civil actions over actual causation.  There are too many counterexamples to cite, but consider crystalline silica, silicon dioxide.  Silica causes something resembling lung cancer in rats, but not in mice, guinea pigs, or hamsters.  It hardly makes sense to ask juries to decide whether the plaintiff is more like a rat than a mouse.

For a sober second opinion to the toxicology chapter, one may consider the views of some well-known authors:

“Whereas the concordance was high between cancer-causing agents initially discovered in humans and positive results in animal studies (Tomatis et al., 1989; Wilbourn et al., 1984), the same could not be said for the reverse relationship: carcinogenic effects in animals frequently lacked concordance with overall patterns in human cancer incidence (Pastoor and Stevens, 2005).”[21]

III. New Reference Manual’s Uneven Treatment of Causation and of Conflicts of Interest

The third edition of the Reference Manual on Scientific Evidence (RMSE) appeared to get off to a good start in the Preface by Judge Kessler and Dr. Kassirer, when they noted that the Supreme Court mandated federal courts to:

“examine the scientific basis of expert testimony to ensure that it meets the same rigorous standard employed by scientific researchers and practitioners outside the courtroom.”

RMSE at xiii.  The preface faltered, however, on two key issues, causation and conflicts of interest, which are taken up as an introduction to the third edition.

  1. Causation

The authors reported in somewhat squishy terms that causal assessments are judgments:

“Fundamentally, the task is an inferential process of weighing evidence and using judgment to conclude whether or not an effect is the result of some stimulus. Judgment is required even when using sophisticated statistical methods. Such methods can provide powerful evidence of associations between variables, but they cannot prove that a causal relationship exists. Theories of causation (evolution, for example) lose their designation as theories only if the scientific community has rejected alternative theories and accepted the causal relationship as fact. Elements that are often considered in helping to establish a causal relationship include predisposing factors, proximity of a stimulus to its putative outcome, the strength of the stimulus, and the strength of the events in a causal chain.”[22]

The authors left the inferential process as a matter of “weighing evidence,” but without saying anything about how the scientific community does its “weighing.” Language about “proving” causation is also unclear because “proof” in scientific parlance connotes a demonstration, which we typically find in logic or in mathematics. Proving empirical propositions suggests a bar set so high such that the courts must inevitably acquiesce in a very low threshold of evidence.  The question, of course, is how low can and will judges go to admit evidence.

The authors thus introduced hand waving and excuses for why evidence can be weighed differently in court proceedings from the world of science:

“Unfortunately, judges may be in a less favorable position than scientists to make causal assessments. Scientists may delay their decision while they or others gather more data. Judges, on the other hand, must rule on causation based on existing information. Concepts of causation familiar to scientists (no matter what stripe) may not resonate with judges who are asked to rule on general causation (i.e., is a particular stimulus known to produce a particular reaction) or specific causation (i.e., did a particular stimulus cause a particular consequence in a specific instance). In the final analysis, a judge does not have the option of suspending judgment until more information is available, but must decide after considering the best available science.”[23]

But the “best available science” may be pretty crummy, and the temptation to turn desperation into evidence (“well, it’s the best we have now”) is often severe.  The authors of the Preface thus remarkable signalled that “inconclusive” is not a judgment open to judges charged with expert witness gatekeeping.  If the authors truly meant to suggest that judges should go with whatever is dished out as “the best available science,” then they have overlooked the obvious:  Rule 702 opens the door to “scientific, technical, or other specialized knowledge,” not to hunches, suggestive but inconclusive evidence, and wishful thinking about how the science may turn out when further along.  Courts have a choice to exclude expert witness opinion testimony that is based upon incomplete or inconclusive evidence. The authors went fairly far afield to suggest, erroneously, that the incomplete and the inconclusive are good enough and should be admitted.

  1. Conflicts of Interest

Surprisingly, given the scope of the scientific areas covered in the RMSE, the authors discussed conflicts of interest (COI) at some length.  Conflicts of interest are a fact of life in all endeavors, and it is understandable counsel judges and juries to try to identify, assess, and control them.  COIs, however, are weak proxies for unreliability.  The emphasis given here was, however, undue because federal judges were enticed into thinking that they can discern unreliability from COI, when they should be focused on the data, inferences, and analyses.

What becomes fairly clear is that the authors of the Preface set out to use COI as a basis for giving litigation plaintiffs a pass, and for holding back studies sponsored by corporate defendants.

“Conflict of interest manifests as bias, and given the high stakes and adversarial nature of many courtroom proceedings, bias can have a major influence on evidence, testimony, and decisionmaking. Conflicts of interest take many forms and can be based on religious, social, political, or other personal convictions. The biases that these convictions can induce may range from serious to extreme, but these intrinsic influences and the biases they can induce are difficult to identify. Even individuals with such prejudices may not appreciate that they have them, nor may they realize that their interpretations of scientific issues may be biased by them. Because of these limitations, we consider here only financial conflicts of interest; such conflicts are discoverable. Nonetheless, even though financial conflicts can be identified, having such a conflict, even one involving huge sums of money, does not necessarily mean that a given individual will be biased. Having a financial relationship with a commercial entity produces a conflict of interest, but it does not inevitably evoke bias. In science, financial conflict of interest is often accompanied by disclosure of the relationship, leaving to the public the decision whether the interpretation might be tainted. Needless to say, such an assessment may be difficult. The problem is compounded in scientific publications by obscure ways in which the conflicts are reported and by a lack of disclosure of dollar amounts.

Judges and juries, however, must consider financial conflicts of interest when assessing scientific testimony. The threshold for pursuing the possibility of bias must be low. In some instances, judges have been frustrated in identifying expert witnesses who are free of conflict of interest because entire fields of science seem to be co-opted by payments from industry. Judges must also be aware that the research methods of studies funded specifically for purposes of litigation could favor one of the parties. Though awareness of such financial conflicts in itself is not necessarily predictive of bias, such information should be sought and evaluated as part of the deliberations.”[24]

All in all, rather misleading advice.  Financial conflicts are not the only conflicts that can be “discovered.”  Often expert witnesses will have political and organizational alignments, which will show deep-seated ideological alignments with the party for which they are testifying.  For instance, in one silicosis case, an expert witness in the field of history of medicine testified, at an examination before trial, that his father suffered from a silica-related disease.  This witness’s alignment with Marxist historians and his identification with radical labor movements made his non-financial conflicts obvious, although these COI would not necessarily have been apparent from his scholarly publications alone.

How low will the bar be set for discovering COI?  If testifying expert witnesses are relying upon textbooks, articles, essays, will federal courts open the authors/hearsay declarants up to searching discovery of their finances? What really is at stake here is that the issues of accuracy, precision, and reliability are lost in the ad hominem project of discovery COIs.

Also misleading was the suggestion that “entire fields of science seem to be co-opted by payments from industry.”  Do the authors mean to exclude the plaintiffs’ lawyer lawsuit industry, which has become one of the largest rent-seeking organizations, and one of the most politically powerful groups in this country?  In litigations in which I have been involved, I have certainly seen plaintiffs’ counsel, or their proxies – labor unions, federal agencies, or “victim support groups” provide substantial funding for studies.  The Preface authors themselves show an untoward bias by their pointing out industry payments without giving balanced attention to other interested parties’ funding of scientific studies.

The attention to COI was also surprising given that one of the key chapters, for toxic tort practitioners, was written by Dr. Bernard D. Goldstein, who has testified in toxic tort cases, mostly (but not exclusively) for plaintiffs.[25]  In one such case, Makofsky, Dr. Goldstein’s participation was particularly revealing because he was forced to explain why he was willing to opine that benzene caused acute lymphocytic leukemia, despite the plethora of published studies finding no statistically significant relationship.  Dr. Goldstein resorted to the inaccurate notion that scientific “proof” of causation requires 95 percent certainty, whereas he imposed only a 51 percent certainty for his medico-legal testimonial adventures.[26] Dr. Goldstein also attempted to justify the discrepancy from the published literature by adverting to the lower standards used by federal regulatory agencies and treating physicians.  

These explanations were particularly concerning because they reflect basic errors in statistics and in causal reasoning.  The 95 percent derives from the use of the coefficient of confidence in confidence intervals, but the probability involved there is not the probability of the association’s being correct, and it has nothing to do with the probability in the belief that an association is real or is causal.  (Thankfully the RMSE chapter on statistics got this right, but my fear is that judges will skip over the more demanding chapter on statistics and place undue weight on the toxicology chapter.)  The reference to federal agencies (OSHA, EPA, etc.) and to treating physicians was meant, no doubt, to invoke precautionary principle concepts as a justification for some vague, ill-defined, lower standard of causal assessment.  These references were really covert invitations to shift the burden of proof.

The Preface authors might well have taken their own counsel and conducted a more searching assessment of COI among authors of Reference Manual.  Better yet, the authors might have focused the judiciary on the data and the analysis.

  1. Reference Manual on Scientific Evidence (3d edition) on Statistical Significance

How does the new Reference Manual on Scientific Evidence treat statistical significance?  Inconsistently and at times incoherently.

  1. Professor Berger’s Introduction

In her introductory chapter, the late Professor Margaret A. Berger raised the question what role statistical significance should play in evaluating a study’s support for causal conclusions[27]:

“What role should statistical significance play in assessing the value of a study? Epidemiological studies that are not conclusive but show some increased risk do not prove a lack of causation. Some courts find that they therefore have some probative value,62 at least in proving general causation.63

This seems rather backwards.  Berger’s suggestion that inconclusive studies do not prove lack of causation seems nothing more than a tautology. Certainly the failure to rule out causation is not probative of causation. How can that tautology support the claim that inconclusive studies “therefore” have some probative value? Berger’s argument seems obviously invalid, or perhaps text that badly needed a posthumous editor.  And what epidemiologic studies are conclusive?  Are the studies individually or collectively conclusive?  Berger introduced a tantalizing concept, which was not spelled out anywhere in the Manual.

Berger’s chapter raised other, serious problems. If the relied-upon studies are not statistically significant, how should we understand the testifying expert witness to have ruled out random variability as an explanation for the disparity observed in the study or studies?  Berger did not answer these important questions, but her rhetoric elsewhere suggested that trial courts should not look too hard at the statistical support (or its lack) for what expert witness testimony is proffered.

Berger’s citations in support were curiously inaccurate.  Footnote 62 cites the Cook case:

“62. See Cook v. Rockwell Int’l Corp., 580 F. Supp. 2d 1071 (D. Colo. 2006) (discussing why the court excluded expert’s testimony, even though his epidemiological study did not produce statistically significant results).”

Berger’s citation was disturbingly incomplete.[28] The expert witness, Dr. Clapp, in Cook did rely upon his own study, which did not obtain a statistically significant result, but the trial court admitted the expert witness’s testimony; the court denied the Rule 702 challenge to Clapp, and permitted him to testify about a statistically non-significant ecological study. Given that the judgment of the district court was reversed

Footnote 63 is no better:

“63. In re Viagra Prods., 572 F. Supp. 2d 1071 (D. Minn. 2008) (extensive review of all expert evidence proffered in multidistricted product liability case).”

With respect to the concept of statistical significance, the Viagra case centered around the motion to exclude plaintiffs’ expert witness, Gerald McGwin, who relied upon three studies, none of which obtained a statistically significant result in its primary analysis.  The Viagra court’s review was hardly extensive; the court did not report, discuss, or consider the appropriate point estimates in most of the studies, the confidence intervals around those point estimates, or any aspect of systematic error in the three studies.  At best, the court’s review was perfunctory.  When the defendant brought to light the lack of data integrity in McGwin’s own study, the Viagra MDL court reversed itself, and granted the motion to exclude McGwin’s testimony.[29]  Berger’s chapter omitted the cautionary tale of McGwin’s serious, pervasive errors, and how they led to his ultimate exclusion. Berger’s characterization of the review was incorrect, and her failure to cite the subsequent procedural history, misleading.

  1. Chapter on Statistics

The Third Edition’s chapter on statistics was relatively free of value judgments about significance probability, and, therefore, an improvement over Berger’s introduction.  The authors carefully described significance probability and p-values, and explain[30]:

“Small p-values argue against the null hypothesis. Statistical significance is determined by reference to the p-value; significance testing (also called hypothesis testing) is the technique for computing p-values and determining statistical significance.”

Although the chapter conflated the positions often taken to be Fisher’s interpretation of p-values and Neyman’s conceptualization of hypothesis testing as a dichotomous decision procedure, this treatment was unfortunately fairly standard in introductory textbooks.  The authors may have felt that presenting multiple interpretations of p-values was asking too much of judges and lawyers, but the oversimplification invited a false sense of certainty about the inferences that can be drawn from statistical significance.

Kaye and Freedman, however, did offer some important qualifications to the untoward consequences of using significance testing as a dichotomous outcome[31]:

“Artifacts from multiple testing are commonplace. Because research that fails to uncover significance often is not published, reviews of the literature may produce an unduly large number of studies finding statistical significance.111 Even a single researcher may examine so many different relationships that a few will achieve statistical significance by mere happenstance. Almost any large dataset—even pages from a table of random digits—will contain some unusual pattern that can be uncovered by diligent search. Having detected the pattern, the analyst can perform a statistical test for it, blandly ignoring the search effort. Statistical significance is bound to follow.

There are statistical methods for dealing with multiple looks at the data, which permit the calculation of meaningful p-values in certain cases.112 However, no general solution is available, and the existing methods would be of little help in the typical case where analysts have tested and rejected a variety of models before arriving at the one considered the most satisfactory (see infra Section V on regression models). In these situations, courts should not be overly impressed with claims that estimates are significant. Instead, they should be asking how analysts developed their models.113

This important qualification to statistical significance was omitted from the overlapping discussion in the chapter on epidemiology, where it was very much needed.

  1. Chapter on Multiple Regression

The chapter on regression did not add much to the earlier and later discussions.  The author asked rhetorically what is the appropriate level of statistical significance, and answers:

“In most scientific work, the level of statistical significance required to reject the null hypothesis (i.e., to obtain a statistically significant result) is set conventionally at 0.05, or 5%.47

Daniel Rubinfeld, “Reference Guide on Multiple Regression,” in RMSE3d 303, 320.

  1. Chapter on Epidemiology

The chapter on epidemiology[32] mostly muddled the discussion set out in Kaye and Freedman’s chapter on statistics.

“The two main techniques for assessing random error are statistical significance and confidence intervals. A study that is statistically significant has results that are unlikely to be the result of random error, although any criterion for ‘significance’ is somewhat arbitrary. A confidence interval provides both the relative risk (or other risk measure) found in the study and a range (interval) within which the risk likely would fall if the study were repeated numerous times.”

The suggestion that a statistically significant study has results unlikely due to chance, without reminding the reader that the finding is predicated on the assumption that there is no association, and that the probability distribution was correct, and came close to crossing the line in committing the transposition fallacy so nicely described and warned against in the statistics chapter. The problem was that “results” is ambiguous as between the data as extreme or more so than what was observed, and the point estimate of the mean or proportion in the sample, and the assumptions that lead to a p-value were not disclosed.

The suggestion that alpha is “arbitrary,” was “somewhat” correct, but this truncated discussion was distinctly unhelpful to judges who are likely to take “arbitrary“ to mean “I will get reversed.”  The selection of alpha is conventional to some extent, and arbitrary in the sense that the law’s setting an age of majority or a voting age is arbitrary.  Some young adults, age 17.8 years old, may be better educated, better engaged in politics, better informed about current events, than 35 year olds, but the law must set a cut off.  Two year olds are demonstrably unfit, and 82 year olds are surely past the threshold of maturity requisite for political participation. A court might admit an opinion based upon a study of rare diseases, with tight control of bias and confounding, when p = 0.051, but that is hardly a justification for ignoring random error altogether, or admitting an opinion based upon a study, in which the disparity observed had a p = 0.15.

The epidemiology chapter correctly called out judicial decisions that confuse “effect size” with statistical significance[33]:

“Understandably, some courts have been confused about the relationship between statistical significance and the magnitude of the association. See Hyman & Armstrong, P.S.C. v. Gunderson, 279 S.W.3d 93, 102 (Ky. 2008) (describing a small increased risk as being considered statistically insignificant and a somewhat larger risk as being considered statistically significant.); In re Pfizer Inc. Sec. Litig., 584 F. Supp. 2d 621, 634–35 (S.D.N.Y. 2008) (confusing the magnitude of the effect with whether the effect was statistically significant); In re Joint E. & S. Dist. Asbestos Litig., 827 F. Supp. 1014, 1041 (S.D.N.Y. 1993) (concluding that any relative risk less than 1.50 is statistically insignificant), rev’d on other grounds, 52 F.3d 1124 (2d Cir. 1995).”

Actually this confusion is not understandable at all.  The distinction has been the subject of teaching since the first edition of the Reference Manual, and two of the cited cases post-date the second edition.  The Southern District of New York asbestos case, of course, predated the first Manual.  To be sure, courts have on occasion badly misunderstood significance probability and significance testing.   The authors of the epidemiology chapter could well have added In re Viagra, to the list of courts that confused effect size with statistical significance.[34]

The epidemiology chapter appropriately chastised courts for confusing significance probability with the probability that the null hypothesis, or its complement, is correct[35]:

“A common error made by lawyers, judges, and academics is to equate the level of alpha with the legal burden of proof. Thus, one will often see a statement that using an alpha of .05 for statistical significance imposes a burden of proof on the plaintiff far higher than the civil burden of a preponderance of the evidence (i.e., greater than 50%).  See, e.g., In re Ephedra Prods. Liab. Litig., 393 F. Supp. 2d 181, 193 (S.D.N.Y. 2005); Marmo v. IBP, Inc., 360 F. Supp. 2d 1019, 1021 n.2 (D. Neb. 2005) (an expert toxicologist who stated that science requires proof with 95% certainty while expressing his understanding that the legal standard merely required more probable than not). But see Giles v. Wyeth, Inc., 500 F. Supp. 2d 1048, 1056–57 (S.D. Ill. 2007) (quoting the second edition of this reference guide).

Comparing a selected p-value with the legal burden of proof is mistaken, although the reasons are a bit complex and a full explanation would require more space and detail than is feasible here. Nevertheless, we sketch out a brief explanation: First, alpha does not address the likelihood that a plaintiff’s disease was caused by exposure to the agent; the magnitude of the association bears on that question. See infra Section VII. Second, significance testing only bears on whether the observed magnitude of association arose  as a result of random chance, not on whether the null hypothesis is true. Third, using stringent significance testing to avoid false-positive error comes at a complementary cost of inducing false-negative error. Fourth, using an alpha of .5 would not be equivalent to saying that the probability the association found is real is 50%, and the probability that it is a result of random error is 50%.”

The footnotes went on to explain further the difference between alpha probability and burden of proof probability, but somewhat misleadingly asserted that “significance testing only bears on whether the observed magnitude of association arose as a result of random chance, not on whether the null hypothesis is true.”[36]  The significance probability does not address the probability that the observed statistic is the result of random chance; rather it describes the probability of observing at least as large a departure from the expected value if the null hypothesis is true.  Of course, if this cumulative probability is sufficiently low, then the null hypothesis is rejected, and this would seem to bear upon whether the null hypothesis is true.  Kaye and Freedman’s chapter on statistics did much better at describing p-values and avoiding the transposition fallacy.

When they stayed on message, the authors of the epidemiology chapter were certainly correct that significance probability cannot be translated into an assessment of the probability that the null hypothesis, or the obtained sampling statistic, is correct.  What these authors omitted, however, was a clear statement that the many courts and counsel who have misstated this fact do not create any worthwhile precedent, persuasive or binding.

The epidemiology chapter ultimately failed to help judges in assessing statistical significance:

“There is some controversy among epidemiologists and biostatisticians about the appropriate role of significance testing.85 To the strictest significance testers, any study whose p-value is not less than the level chosen for statistical significance should be rejected as inadequate to disprove the null hypothesis. Others are critical of using strict significance testing, which rejects all studies with an observed p-value below that specified level. Epidemiologists have become increasingly sophisticated in addressing the issue of random error and examining the data from a study to ascertain what information they may provide about the relationship between an agent and a disease, without the necessity of rejecting all studies that are not statistically significant.86 Meta-analysis, as well, a method for pooling the results of multiple studies, sometimes can ameliorate concerns about random error.87  Calculation of a confidence interval permits a more refined assessment of appropriate inferences about the association found in an epidemiologic study.88

Id. at 578-79.  Mostly true, but again rather unhelpful to judges and lawyers.  Some of the controversy, to be sure, was instigated by statisticians and epidemiologists who would elevate Bayesian methods, and eliminate the use of significance probability and testing altogether. As for those scientists who still work within the dominant frequentist statistical paradigm, the chapter authors divided the world up into “strict” testers and those critical of “strict” testing.  Where, however, is the boundary? Does criticism of “strict” testing imply embrace of “non-strict” testing, or of no testing at all?  I can sympathize with a judge who permits reliance upon a series of studies that all go in the same direction, with each having a confidence interval that just misses excluding the null hypothesis.  Meta-analysis in such a situation might not just ameliorate concerns about random error, it might eliminate them.  But what of those scientists critical of strict testing?  This certainly does not suggest or imply that courts can or should ignore random error; yet that is exactly what happened in the early going in In re Viagra Products Liab. Litig.[37]  The epidemiology chapter’s reference to confidence intervals was correct in part; they permit a more refined assessment because they permit a more direct assessment of the extent of random error in terms of magnitude of association, as well as the point estimate of the association obtained from and conditioned on the sample.  Confidence intervals, however, do not eliminate the need to interpret the extent of random error; rather they provide a more direct assessment and measurement of the standard error.

V. Power in the Reference Manual for Scientific Evidence

The Third Edition treated statistical power in three of its chapters, those on statistics, epidemiology, and medical testimony.  Unfortunately, the treatments were not always consistent.

The chapter on statistics has been consistently among the most frequently ignored content of the three editions of the Reference Manual.  The third edition offered a good introduction to basic concepts of sampling, random variability, significance testing, and confidence intervals.[38]  Kaye and Freedman provided an acceptable non-technical definition of statistical power[39]:

“More precisely, power is the probability of rejecting the null hypothesis when the alternative hypothesis … is right. Typically, this probability will depend on the values of unknown parameters, as well as the preset significance level α. The power can be computed for any value of α and any choice of parameters satisfying the alternative hypothesis. Frequentist hypothesis testing keeps the risk of a false positive to a specified level (such as α = 5%) and then tries to maximize power. Statisticians usually denote power by the Greek letter beta (β). However, some authors use β to denote the probability of accepting the null hypothesis when the alternative hypothesis is true; this usage is fairly standard in epidemiology. Accepting the null hypothesis when the alternative holds true is a false negative (also called a Type II error, a missed signal, or a false acceptance of the null hypothesis).”

The definition was not, however, without problems.  First, it introduced a nomenclature issue likely to be confusing for judges and lawyers. Kaye and Freeman used β to denote statistical power, but they acknowledge that epidemiologists use β to denote the probability of a Type II error.  And indeed, both the chapters on epidemiology and medical testimony used β to reference Type II error rate, and thus denote power as the complement of β, or (1- β).[40]

Second, the reason for introducing the confusion about β was doubtful.  Kaye and Freeman suggested that statisticians usually denote power by β, but they offered no citations.  A quick review (not necessarily complete or even a random sample) suggests that many modern statistics texts denote power as (1- β).[41]   At the end of the day, there really was no reason for the conflicting nomenclature and the likely confusion it would engenders.  Indeed, the duplicative handling of statistical power, and other concepts, suggested that it is time to eliminate the repetitive discussions, in favor of one, clear, thorough discussion in the statistics chapter.

Third, Kaye and Freeman problematically refer to β as the probability of accepting the null hypothesis when elsewhere they more carefully instructed that a non-significant finding results in not rejecting the null hypothesis as opposed to accepting the null.  Id. at 253.[42]

Fourth, Kaye and Freeman’s discussion of power, unlike most of their chapter, offered advice that is controversial and unclear:

“On the other hand, when studies have a good chance of detecting a meaningful association, failure to obtain significance can be persuasive evidence that there is nothing much to be found.”[43]

Note that the authors left open what a legal or clinically meaningful association is, and thus offered no real guidance to judges on how to evaluate power after data are collected and analyzed.  As Professor Sander Greenland has argued, in legal contexts, this reliance upon observed power (as opposed to power as a guide in determining appropriate sample size in the planning stages of a study) was arbitrary and “unsalvageable as an analytic tool.”[44]

The chapter on epidemiology offered similar controversial advice on the use of power[45]:

“When a study fails to find a statistically significant association, an important question is whether the result tends to exonerate the agent’s toxicity or is essentially inconclusive with regard to toxicity.93 The concept of power can be helpful in evaluating whether a study’s outcome is exonerative or inconclusive.94  The power of a study is the probability of finding a statistically significant association of a given magnitude (if it exists) in light of the sample sizes used in the study. The power of a study depends on several factors: the sample size; the level of alpha (or statistical significance) specified; the background incidence of disease; and the specified relative risk that the researcher would like to detect.95  Power curves can be constructed that show the likelihood of finding any given relative risk in light of these factors. Often, power curves are used in the design of a study to determine what size the study populations should be.96

Although the authors correctly emphasized the need to specify an alternative hypothesis, their discussion and advice were empty of how that alternative should be selected in legal contexts.  The suggestion that power curves can be constructed was, of course, true, but irrelevant unless courts know where on the power curve they should be looking.  The authors were also correct that power is used to determine adequate sample size under specified conditions; but again, the use of power curves in this setting is today rather uncommon.  Investigators select a level of power corresponding to an acceptable Type II error rate, and an alternative hypothesis that would be clinically meaningful for their research, in order to determine their sample size. Translating clinical into legal meaningfulness is not always straightforward.

In a footnote, the authors of the epidemiology chapter noted that Professor Rothman has been “one of the leaders in advocating the use of confidence intervals and rejecting strict significance testing.”[46] What the chapter failed, however, to mention is that Rothman has also been outspoken in rejecting post-hoc power calculations that the epidemiology chapter seemed to invite:

“Standard statistical advice states that when the data indicate a lack of significance, it is important to consider the power of the study to detect as significant a specific alternative hypothesis. The power of a test, however, is only an indirect indicator of precision, and it requires an assumption about the magnitude of the effect. In planning a study, it is reasonable to make conjectures about the magnitude of an effect to compute study-size requirements or power. In analyzing data, however, it is always preferable to use the information in the data about the effect to estimate it directly, rather than to speculate about it with study-size or power calculations (Smith and Bates, 1992; Goodman and Berlin, 1994; Hoening and Heisey, 2001). Confidence limits and (even more so) P-value functions convey much more of the essential information by indicating the range of values that are reasonably compatible with the observations (albeit at a somewhat arbitrary alpha level), assuming the statistical model is correct. They can also show that the data do not contain the information necessary for reassurance about an absence of effect.”[47]

The selective, incomplete scholarship of the epidemiology chapter on the issue of statistical power was not only unfortunate, but it distorted the authors’ evaluation of the sparse case law on the issue of power.  For instance, they noted:

“Even when a study or body of studies tends to exonerate an agent, that does not establish that the agent is absolutely safe. See Cooley v. Lincoln Elec. Co., 693 F. Supp. 2d 767 (N.D. Ohio 2010). Epidemiology is not able to provide such evidence.”[48]

Here the authors, Green, Freedman, and Gordis, shifted the burden to the defendant and then go to an even further extreme of making the burden of proof one of absolute certainty in the product’s safety.  This is not, and never has been, a legal standard. The cases they cited amplified the error. In Cooley, for instance, the defense expert would have opined that welding fume exposure did not cause parkinsonism or Parkinson’s disease.  Although the expert witness had not conducted a meta-analysis, he had reviewed the confidence intervals around the point estimates of the available studies.  Many of the point estimates were at or below 1.0, and in some cases, the upper bound of the confidence interval excluded 1.0.  The trial court expressed its concern that the expert witness had inferred “evidence of absence” from “absence of evidence.”  Cooley v. Lincoln Elec. Co., 693 F. Supp. 2d 767, 773 (N.D. Ohio 2010).  This concern, however, was misguided given that many studies had tested the claimed association, and that virtually every case-control and cohort study had found risk ratios at or below 1.0, or very close to 1.0.  What the court in Cooley, and the authors of the epidemiology chapter in the third edition have lost sight of, is that when the hypothesis is repeatedly tested, with failure to reject the null hypothesis, and with point estimates at or very close to 1.0, and with narrow confidence intervals, then the claimed association is probably incorrect.[49]

The Cooley court’s comments might have had some validity when applied to a single study, but not to the impressive body of exculpatory epidemiologic evidence that pertained to welding fume and Parkinson’s disease.  Shortly after the Cooley case was decided, a published meta-analysis of welding fume or manganese exposure demonstrated a reduced level of risk for Parkinson’s disease among persons occupationally exposed to welding fumes or manganese.[50]

VI. The Treatment of Meta-Analysis in the Third Edition

Meta-analysis is a statistical procedure for aggregating data and statistics from individual studies into a single summary statistical estimate of the population measurement of interest.  The first meta-analysis is typically attributed to Karl Pearson, circa 1904, who sought a method to overcome the limitations of small sample size and low statistical power.  Statistical methods for meta-analysis in epidemiology and the social sciences, however, did not mature until the 1970s.  Even then, the biomedical scientific community remained skeptical of, if not out rightly hostile to, meta-analysis until relatively recently.

The hostility to meta-analysis, especially in the context of observational epidemiologic studies, was colorfully expressed by two capable epidemiologists, Samuel Shapiro and Alvan Feinstein, as late as the 1990s:

“Meta-analysis begins with scientific studies….  [D]ata from these studies are then run through computer models of bewildering complexity which produce results of implausible precision.”

* * * *

“I propose that the meta-analysis of published non-experimental data should be abandoned.”[51]

The professional skepticism about meta-analysis was reflected in some of the early judicial assessments of meta-analysis in court cases.  In the 1980s and early 1990s, some trial judges erroneously dismissed meta-analysis as a flawed statistical procedure that claimed to make something out of nothing.[52]

In In re Paoli Railroad Yard PCB Litigation, Judge Robert Kelly excluded plaintiffs’ expert witness Dr. William Nicholson and his testimony based upon his unpublished meta-analysis of health outcomes among PCB-exposed workers.  Judge Kelly found that the meta-analysis was a novel technique, and that Nicholson’s meta-analysis was not peer reviewed.  Furthermore, the meta-analysis assessed health outcomes not experienced by any of the plaintiffs before the trial court.[53]

The Court of Appeals for the Third Circuit reversed the exclusion of Dr. Nicholson’s testimony, and remanded for reconsideration with instructions.[54]  The Circuit noted that meta-analysis was not novel, and that the lack of peer-review was not an automatic disqualification.  Acknowledging that a meta-analysis could be performed poorly using invalid methods, the appellate court directed the trial court to evaluate the validity of Dr. Nicholson’s work on his meta-analysis. On remand, however, it seems that plaintiffs chose – wisely – not to proceed with Nicholson’s meta-analysis.[55]

In one of many squirmishes over colorectal cancer claims in asbestos litigation, Judge Sweet in the Southern District of New York was unimpressed by efforts to aggregate data across studies.  Judge Sweet declared that:

“no matter how many studies yield a positive but statistically insignificant SMR for colorectal cancer, the results remain statistically insignificant. Just as adding a series of zeros together yields yet another zero as the product, adding a series of positive but statistically insignificant SMRs together does not produce a statistically significant pattern.”[56]

The plaintiffs’ expert witness who had offered the unreliable testimony, Dr. Steven Markowitz, like Nicholson, another foot soldier in Dr. Irving Selikoff’s litigation machine, did not offer a formal meta-analysis to justify his assessment that multiple non-significant studies, taken together, rule out chance as a likely explanation for an aggregate finding of an increased risk.

Judge Sweet was quite justified in rejecting this back of the envelope, non-quantitative meta-analysis.  His suggestion, however, that multiple non-significant studies could never collectively serve to rule out chance as an explanation for an overall increased rate of disease in the exposed groups is completely wrong.  Judge Sweet would have better focused on the validity issues in key studies, the presence of bias and confounding, and the completeness of the proffered meta-analysis.  The Second Circuit reversed the entry of summary judgment, and remanded the colorectal cancer claim for trial.[57]  Over a decade later, with even more accumulated studies and data, the Institute of Medicine found the evidence for asbestos plaintiffs’ colorectal cancer claims to be scientifically insufficient.[58]

Courts continue to go astray with an erroneous belief that multiple studies, all without statistically significant results, cannot yield a statistically significant summary estimate of increased risk.  See, e.g., Baker v. Chevron USA, Inc., 2010 WL 99272, *14-15 (S.D.Ohio 2010) (addressing a meta-analysis by Dr. Infante on multiple myeloma outcomes in studies of benzene-exposed workers).  There were many sound objections to Infante’s meta-analysis, but the suggestion that multiple studies without statistical significance could not yield a summary estimate of risk with statistical significance was not one of them.

In the last two decades, meta-analysis has emerged as an important technique for addressing random variation in studies, as well as some of the limitations of frequentist statistical methods.  In 1980s, articles reporting meta-analyses were rare to non-existent.  In 2009, there were over 2,300 articles with “meta-analysis” in their title, or in their keywords, indexed in the PubMed database of the National Library of Medicine.[59]

The techniques for aggregating data have been studied, refined, and employed extensively in thousands of methods and application papers in the last decade. Consensus guideline papers have been published for meta-analyses of clinical trials as well as observational studies.[60]  Meta-analyses, of observational studies and of randomized clinical trials, routinely are relied upon by expert witnesses in pharmaceutical and so-called toxic tort litigation.[61]

The second edition of the Reference Manual on Scientific Evidence gave very little attention to meta-analysis; the third edition did not add very much to the discussion.  The time has come for the next edition to address meta-analyses, and criteria for their validity or invalidity.

  1. Statistics Chapter

The statistics chapter of the third edition gave scant attention to meta-analysis.  The chapter noted, in a footnote, that there are formal procedures for aggregating data across studies, and that the power of the aggregated data will exceed the power of the individual, included studies.  The footnote then cautioned that meta-analytic procedures “have their own weakness,”[62] without detailing what that weakness is. The time has come to spell out the weaknesses so that trial judges can evaluate opinion testimony based upon meta-analyses.

The glossary at the end of the statistics chapter offers a definition of meta-analysis:

“meta-analysis. Attempts to combine information from all studies on a certain topic. For example, in the epidemiological context, a meta-analysis may attempt to provide a summary odds ratio and confidence interval for the effect of a certain exposure on a certain disease.”[63]

This definition was inaccurate in ways that could yield serious mischief.  Virtually all meta-analyses are, or should be, built upon a systematic review that sets out to collect all available studies on a research issue of interest.  It is a rare meta-analysis, however, that includes “all” studies in its quantitative analysis.  The meta-analytic process involves a pre-specification of inclusionary and exclusionary criteria for the quantitative analysis of the summary estimate of risk.  Those criteria may limit the quantitative analysis to randomized trials, or to analytical epidemiologic studies.  Furthermore, meta-analyses frequently and appropriately have pre-specified exclusionary criteria that relate to study design or quality.

On a more technical note, the offered definition suggests that the summary estimate of risk will be an odds ratio, which may or may not be true.  Meta-analyses of risk ratios may yield summary estimates of risk in terms of relative risk or hazard ratios, or even of risk differences.  The meta-analysis may combine data of means rather than proportions as well.

  1. Epidemiology Chapter

The chapter on epidemiology delved into meta-analysis in greater detail than the statistics chapter, and offered apparently inconsistent advice.  The overall gist of the chapter, however, can perhaps best be summarized by the definition offered in this chapter’s glossary:

“meta-analysis. A technique used to combine the results of several studies to enhance the precision of the estimate of the effect size and reduce the plausibility that the association found is due to random sampling error.  Meta-analysis is best suited to pooling results from randomly controlled experimental studies, but if carefully performed, it also may be useful for observational studies.”[64]

It is now time to tell trial judges what “careful” means in the context of conducting and reporting and relying upon meta-analyses.

The epidemiology chapter appropriately noted that meta-analysis can help address concerns over random error in small studies.[65]  Having told us that properly conducted meta-analyses of observational studies can be helpful, the chapter then proceeded to hedge considerably[66]:

“Meta-analysis is most appropriate when used in pooling randomized experimental trials, because the studies included in the meta-analysis share the most significant methodological characteristics, in particular, use of randomized assignment of subjects to different exposure groups. However, often one is confronted with nonrandomized observational studies of the effects of possible toxic substances or agents. A method for summarizing such studies is greatly needed, but when meta-analysis is applied to observational studies – either case-control or cohort – it becomes more controversial.174 The reason for this is that often methodological differences among studies are much more pronounced than they are in randomized trials. Hence, the justification for pooling the results and deriving a single estimate of risk, for example, is problematic.175

The stated objection to pooling results for observational studies was certainly correct, but many research topics have sufficient studies available to allow for appropriate selectivity in framing inclusionary and exclusionary criteria to address the objection.  The chapter went on to credit the critics of meta-analyses of observational studies.  As they did in the second edition of the Reference Manual, the authors in the third edition repeated their cites to, and quotes from, early papers by John Bailar, who was then critical of such meta-analyses:

“Much has been written about meta-analysis recently and some experts consider the problems of meta-analysis to outweigh the benefits at the present time. For example, John Bailar has observed:

‘[P]roblems have been so frequent and so deep, and overstatements of the strength of conclusions so extreme, that one might well conclude there is something seriously and fundamentally wrong with the method. For the present . . . I still prefer the thoughtful, old-fashioned review of the literature by a knowledgeable expert who explains and defends the judgments that are presented. We have not yet reached a stage where these judgments can be passed on, even in part, to a formalized process such as meta-analysis.’

John Bailar, “Assessing Assessments,” 277 Science 528, 529 (1997).”[67]

Bailar’s subjective preference for “old-fashioned” reviews, which often cherry picked the included studies is, well, “old fashioned.”  More to the point, it is questionable science, and a distinctly minority viewpoint in the light of substantial improvements in the conduct and reporting of systematic reviews and meta-analyses of observational studies.  Bailar may be correct that some meta-analyses should have never left the protocol stage, but the third edition of the Reference Manual failed to provide the judiciary with the tools to appreciate the distinction between good and bad meta-analyses.

This categorical rejection, cited with apparent approval, is amplified by a recitation of some real or apparent problems with meta-analyses of observational studies.  What is missing is a discussion of how many of these problems can be and are dealt with in contemporary practice[68]:

“A number of problems and issues arise in meta-analysis. Should only published papers be included in the meta-analysis, or should any available studies be used, even if they have not been peer reviewed? Can the results of the meta-analysis itself be reproduced by other analysts? When there are several meta-analyses of a given relationship, why do the results of different meta-analyses often disagree? The appeal of a meta-analysis is that it generates a single estimate of risk (along with an associated confidence interval), but this strength can also be a weakness, and may lead to a false sense of security regarding the certainty of the estimate. A key issue is the matter of heterogeneity of results among the studies being summarized.  If there is more variance among study results than one would expect by chance, this creates further uncertainty about the summary measure from the meta-analysis. Such differences can arise from variations in study quality, or in study populations or in study designs. Such differences in results make it harder to trust a single estimate of effect; the reasons for such differences need at least to be acknowledged and, if possible, explained.176 People often tend to have an inordinate belief in the validity of the findings when a single number is attached to them, and many of the difficulties that may arise in conducting a meta-analysis, especially of observational studies such as epidemiologic ones, may consequently be overlooked.177

The epidemiology chapter authors were entitled to their opinion, but their discussion left the judiciary uninformed about current practice, and best practices, in epidemiology.  A categorical rejection of meta-analyses of observational studies is at odds with the chapter’s own claim that such meta-analyses can be helpful if properly performed. What was needed, and is missing, is a meaningful discussion to help the judiciary determine whether a meta-analysis of observational studies was properly performed.

  1. Medical Testimony Chapter

The chapter on medical testimony is the third pass at meta-analysis in the third edition of the Reference Manual.  The second edition’s chapter on medical testimony ignored meta-analysis completely; the new edition addresses meta-analysis in the context of the hierarchy of study designs[69]:

“Other circumstances that set the stage for an intense focus on medical evidence included

(1) the development of medical research, including randomized controlled trials and other observational study designs;

(2) the growth of diagnostic and therapeutic interventions;141

(3) interest in understanding medical decision making and how physicians reason;142 and

(4) the acceptance of meta-analysis as a method to combine data from multiple randomized trials.143

This language from the medical testimony chapter curiously omitted observational studies, but the footnote reference (note 143) then inconsistently discussed two meta-analyses of observational, rather than experimental, studies.[70]  The chapter then provided even further confusion by giving a more detailed listing of the hierarchy of medical evidence in the form of different study designs[71]:

3. Hierarchy of medical evidence

With the explosion of available medical evidence, increased emphasis has been placed on assembling, evaluating, and interpreting medical research evidence.  A fundamental principle of evidence-based medicine (see also Section IV.C.5, infra) is that the strength of medical evidence supporting a therapy or strategy is hierarchical.  When ordered from strongest to weakest, systematic review of randomized trials (meta-analysis) is at the top, followed by single randomized trials, systematic reviews of observational studies, single observational studies, physiological studies, and unsystematic clinical observations.150 An analysis of the frequency with which various study designs are cited by others provides empirical evidence supporting the influence of meta-analysis followed by randomized controlled trials in the medical evidence hierarchy.151 Although they are at the bottom of the evidence hierarchy, unsystematic clinical observations or case reports may be the first signals of adverse events or associations that are later confirmed with larger or controlled epidemiological studies (e.g., aplastic anemia caused by chloramphenicol,152 or lung cancer caused by asbestos153). Nonetheless, subsequent studies may not confirm initial reports (e.g., the putative association between coffee consumption and pancreatic cancer).154

This discussion further muddied the water by using a parenthetical to suggest that meta-analyses of randomized clinical trials are equivalent to systematic reviews of such studies — “systematic review of randomized trials (meta-analysis).” Of course, systematic reviews are not meta-analyses, although they are usually a necessary precondition for conducting a proper meta-analysis.  The relationship between the procedures for a systematic review and a meta-analysis are in need of clarification, but the judiciary will not find it in the third edition of the Reference Manual.

CONCLUSION

The idea of the Reference Manual was important to support trial judge’s efforts to engage in gatekeeping in unfamiliar subject matter areas. In its third incarnation, the Manual has become a standard starting place for discussion, but on several crucial issues, the third edition was unclear, imprecise, contradictory, or muddled. The organizational committee and authors for the fourth edition have a fair amount of work on their hands. There is clearly room for improvement in the Fourth Edition.


[1] Adam Dutkiewicz, “Book Review: Reference Manual on Scientific Evidence, Third Edition,” 28 Thomas M. Cooley L. Rev. 343 (2011); John A. Budny, “Book Review: Reference Manual on Scientific Evidence, Third Edition,” 31 Internat’l J. Toxicol. 95 (2012); James F. Rogers, Jim Shelson, and Jessalyn H. Zeigler, “Changes in the Reference Manual on Scientific Evidence (Third Edition),” Internat’l Ass’n Def. Csl. Drug, Device & Biotech. Comm. Newsltr. (June 2012).  See Schachtman “New Reference Manual’s Uneven Treatment of Conflicts of Interest.” (Oct. 12, 2011).

[2] The Manual did not do quite so well in addressing its own conflicts of interest.  See, e.g., infra at notes 7, 20.

[3] RSME 3d 11 (2011).

[4] Id. at 19.

[5] Id. at 20 & n. 51 (citing Susan Haack, “An Epistemologist in the Bramble-Bush: At the Supreme Court with Mr. Joiner,” 26 J. Health Pol. Pol’y & L. 217–37 (1999).

[6] Id. at 19-20 & n.52.

[7] Professor Berger filed an amicus brief on behalf of plaintiffs, in Rider v. Sandoz Pharms. Corp., 295 F.3d 1194 (11th Cir. 2002).

[8] Id. at 20 n.51. (The editors noted misleadingly that the published chapter was Berger’s last revision, with “a few edits to respond to suggestions by reviewers.”). I have written elsewhere of the ethical cloud hanging over this Milward decision. SeeCarl Cranor’s Inference to the Best Explanation” (Feb. 12, 2021); “From here to CERT-ainty” (June 28, 2018); “The Council for Education & Research on Toxics” (July 9, 2013) (CERT amicus brief filed without any disclosure of conflict of interest). See also NAS, “Carl Cranor’s Conflicted Jeremiad Against Daubert” (Sept. 23, 2018).

[9] RMSE 3d at 610 (internal citations omitted).

[10] RMSE 3d at 610 n.184 (emphasis, in bold, added).

[11] Interestingly, the authors of this chapter seem to abandon their suggestion that studies relied upon “might qualify for the learned treatise exception to the hearsay rule, Fed. R. Evid. 803(18), or possibly the catchall exceptions, Fed. R. Evid. 803(24) & 804(5),” which was part of their argument in the Second Edition.  RMSE 2d at 335 (2000).  See also RMSE 3d at 214 (discussing statistical studies as generally “admissible,” but acknowledging that admissibility may be no more than permission to explain the basis for an expert’s opinion, which is hardly admissibility at all).

[12] David L. Faigman, et al., Modern Scientific Evidence:  The Law and Science of Expert Testimony v.1, § 23:1,at 206 (2009) (“Well conducted studies are uniformly admitted.”).

[13] See Richard M. Lynch and Mary S. Henifin, “Causation in Occupational Disease: Balancing Epidemiology, Law and Manufacturer Conduct,” 9 Risk: Health, Safety & Environment 259, 269 (1998) (conflating distinct causal and liability concepts, and arguing that legal and scientific causal criteria should be abrogated when manufacturing defendant has breached a duty of care).

[14]  See, e.g., Parker v. Mobil Oil Corp., 7 N.Y.3d 434, 857 N.E.2d 1114, 824 N.Y.S.2d 584 (2006) (dismissing leukemia (AML) claim based upon claimed low-level benzene exposure from gasoline), aff’g 16 A.D.3d 648 (App. Div. 2d Dep’t 2005).  No; you will not find the Parker case cited in the Manual‘s chapter on toxicology. (Parker is, however, cited in the chapter on exposure science even though it is a state court case.).

[15] Curtis D. Klaassen, Casarett & Doull’s Toxicology: The Basic Science of Poisons 23 (7th ed. 2008) (internal citations omitted).

[16] Philip Wexler, Bethesda, et al., eds., 2 Encyclopedia of Toxicology 96 (2005).

[17] See Edward J. Calabrese and Robyn B. Blain, “The hormesis database: The occurrence of hormetic dose responses in the toxicological literature,” 61 Regulatory Toxicology and Pharmacology 73 (2011) (reviewing about 9,000 dose-response relationships for hormesis, to create a database of various aspects of hormesis).  See also Edward J. Calabrese and Robyn B. Blain, “The occurrence of hormetic dose responses in the toxicological literature, the hormesis database: An overview,” 202 Toxicol. & Applied Pharmacol. 289 (2005) (earlier effort to establish hormesis database).

[18] Reference Manual at 653

[19] See e.g., Karin Wirdefeldt, Hans-Olaf Adami, Philip Cole, Dimitrios Trichopoulos, and Jack Mandel, “Epidemiology and etiology of Parkinson’s disease: a review of the evidence.  26 European J. Epidemiol. S1, S20-21 (2011); Tomas R. Guilarte, “Manganese and Parkinson’s Disease: A Critical Review and New Findings,” 118 Environ Health Perspect. 1071, 1078 (2010) (“The available evidence from human and non­human primate studies using behavioral, neuroimaging, neurochemical, and neuropathological end points provides strong sup­port to the hypothesis that, although excess levels of [manganese] accumulation in the brain results in an atypical form of parkinsonism, this clini­cal outcome is not associated with the degen­eration of nigrostriatal dopaminergic neurons as is the case in PD [Parkinson’s disease].”)

[20] RMSE3ed at 646.

[21] Hans-Olov Adami, Sir Colin L. Berry, Charles B. Breckenridge, Lewis L. Smith, James A. Swenberg, Dimitrios Trichopoulos, Noel S. Weiss, and Timothy P. Pastoor, “Toxicology and Epidemiology: Improving the Science with a Framework for Combining Toxicological and Epidemiological Evidence to Establish Causal Inference,” 122 Toxciological Sciences 223, 224 (2011).

[22] RMSE3d at xiv.

[23] RMSE3d at xiv.

[24] RMSE3d at xiv-xv.

[25] See, e.g., Parker v. Mobil Oil Corp., 7 N.Y.3d 434, 857 N.E.2d 1114, 824 N.Y.S.2d 584 (2006); Exxon Corp. v. Makofski, 116 SW 3d 176 (Tex. Ct. App. 2003).

[26] Goldstein here and elsewhere has confused significance probability with the posterior probability required by courts and scientists.

[27] Margaret A. Berger, “The Admissibility of Expert Testimony,” in RMSE3d 11, 24 (2011).

[28] Cook v. Rockwell Int’l Corp., 580 F. Supp. 2d 1071, 1122 (D. Colo. 2006), rev’d and remanded on other grounds, 618 F.3d 1127 (10th Cir. 2010), cert. denied, ___ U.S. ___ (May 24, 2012).

[29] In re Viagra Products Liab. Litig., 658 F. Supp. 2d 936, 945 (D. Minn. 2009). 

[31] Id. at 256 -57.

[32] Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” in RMSE3d 549, 573.

[33] Id. at 573n.68.

[34] See In re Viagra Products Liab. Litig., 572 F. Supp. 2d 1071, 1081 (D. Minn. 2008).

[35] RSME3d at 577 n81.

[36] Id.

[37] 572 F. Supp. 2d 1071, 1081 (D. Minn. 2008).

[38] David H. Kaye & David A. Freedman, “Reference Guide on Statistics,” in RMSE3ed 209 (2011).

[39] Id. at 254 n.106

[40] See Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” in RMSE3ed 549, 582, 626 ; John B. Wong, Lawrence O. Gostin, and Oscar A. Cabrera, Abogado, “Reference Guide on Medical Testimony,” in RMSE3ed 687, 724.  This confusion in nomenclature is regrettable, given the difficulty many lawyers and judges seem have in following discussions of statistical concepts.

[41] See, e.g., Richard D. De Veaux, Paul F. Velleman, and David E. Bock, Intro Stats 545-48 (3d ed. 2012); Rand R. Wilcox, Fundamentals of Modern Statistical Methods 65 (2d ed. 2010).

[42] See also Daniel Rubinfeld, “Reference Guide on Multiple Regression,“ in RMSE3d 303, 321 (describing a p-value > 5% as leading to failing to reject the null hypothesis).

[43] RMSE3d at 254.

[44] See Sander Greenland, “Nonsignificance Plus High Power Does Not Imply Support Over the Alternative,” 22 Ann. Epidemiol. 364, 364 (2012).

[45] Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” RMSE3ed 549, 582.

[46] RMSE3d at 579 n.88.

[47] Kenneth Rothman, Sander Greenland, and Timothy Lash, Modern Epidemiology 160 (3d ed. 2008).  See also Kenneth J. Rothman, “Significance Questing,” 105 Ann. Intern. Med. 445, 446 (1986) (“[Simon] rightly dismisses calculations of power as a weak substitute for confidence intervals, because power calculations address only the qualitative issue of statistical significance and do not take account of the results already in hand.”).

[48] RMSE3d at 582 n.93; id. at 582 n.94 (“Thus, in Smith v. Wyeth-Ayerst Labs. Co., 278 F.Supp. 2d 684, 693 (W.D.N.C. 2003), and Cooley v. Lincoln Electric Co., 693 F. Supp. 2d 767, 773 (N.D. Ohio 2010), the courts recognized that the power of a study was critical to assessing whether the failure of the study to find a statistically significant association was exonerative of the agent or inconclusive.”).

[49] See, e.g., Anthony J. Swerdlow, Maria Feychting, Adele C. Green, Leeka Kheifets, David A. Savitz, International Commission for Non-Ionizing Radiation Protection Standing Committee on Epidemiology, “Mobile Phones, Brain Tumors, and the Interphone Study: Where Are We Now?” 119 Envt’l Health Persp. 1534, 1534 (2011) (“Although there remains some uncertainty, the trend in the accumulating evidence is increasingly against the hypothesis that mobile phone use can cause brain tumors in adults.”).

[50] James Mortimer, Amy Borenstein, and Lorene Nelson, “Associations of welding and manganese exposure with Parkinson disease: Review and meta-analysis,” 79 Neurology 1174 (2012).

[51] Samuel Shapiro, “Meta-analysis/Smeta-analysis,” 140 Am. J. Epidem. 771, 777 (1994).  See also Alvan Feinstein, “Meta-Analysis: Statistical Alchemy for the 21st Century,” 48 J. Clin. Epidem. 71 (1995).

[52] Allen v. Int’l Bus. Mach. Corp., No. 94-264-LON, 1997 U.S. Dist. LEXIS 8016, at *71–*74 (suggesting that meta-analysis of observational studies was controversial among epidemiologists).

[53] 706 F. Supp. 358, 373 (E.D. Pa. 1988).

[54] In re Paoli R.R. Yard PCB Litig., 916 F.2d 829, 856-57 (3d Cir. 1990), cert. denied, 499 U.S. 961 (1991); Hines v. Consol. Rail Corp., 926 F.2d 262, 273 (3d Cir. 1991).

[55] SeeThe Shmeta-Analysis in Paoli,” (July 11, 2019).

[56] In re Joint E. & S. Dist. Asbestos Litig., 827 F. Supp. 1014, 1042 (S.D.N.Y. 1993).

[57] 52 F.3d 1124 (2d Cir. 1995).

[58] Institute of Medicine, Asbestos: Selected Cancers (Wash. D.C. 2006).

[59] See Michael O. Finkelstein and Bruce Levin, “Meta-Analysis of ‘Sparse’ Data: Perspectives from the Avandia CasesJurimetrics J. (2011).

[60] See Donna Stroup, et al., “Meta-analysis of Observational Studies in Epidemiology: A Proposal for Reporting,” 283 J. Am. Med. Ass’n 2008 (2000) (MOOSE statement); David Moher, Deborah Cook, Susan Eastwood, Ingram Olkin, Drummond Rennie, and Donna Stroup, “Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement,” 354 Lancet 1896 (1999).  See also Jesse Berlin & Carin Kim, “The Use of Meta-Analysis in Pharmacoepidemiology,” in Brian Strom, ed., Pharmacoepidemiology 681, 683–84 (4th ed. 2005); Zachary Gerbarg & Ralph Horwitz, “Resolving Conflicting Clinical Trials: Guidelines for Meta-Analysis,” 41 J. Clin. Epidemiol. 503 (1988).

[61] See Finkelstein & Levin, supra at note 59. See also In re Bextra and Celebrex Marketing Sales Practices and Prod. Liab. Litig., 524 F. Supp. 2d 1166, 1174, 1184 (N.D. Cal. 2007) (holding that reliance upon “[a] meta-analysis of all available published and unpublished randomized clinical trials” was reasonable and appropriate, and criticizing the expert witnesses who urged the complete rejection of meta-analysis of observational studies).

[62] RMSE 3d at 254 n.107.

[63] Id. at 289.

[64] Reference Guide on Epidemiology, RSME3d at 624.  See also id. at 581 n. 89 (“Meta-analysis is better suited to combining results from randomly controlled experimental studies, but if carefully performed it may also be helpful for observational studies, such as those in the epidemiologic field.”).

[65] Id. at 579; see also id. at 607 n. 171.

[66] Id. at 607.

[67] Id. at 607 n.177.

[68] Id. at 608.

[69] RMSE 3d at 722-23.

[70] Id. at 723 n.143 (“143. … Video Software Dealers Ass’n v. Schwarzenegger, 556 F.3d 950, 963 (9th Cir. 2009) (analyzing a meta-analysis of studies on video games and adolescent behavior); Kennecott Greens Creek Min. Co. v. Mine Safety & Health Admin., 476 F.3d 946, 953 (D.C. Cir. 2007) (reviewing the Mine Safety and Health Administration’s reliance on epidemiological studies and two meta-analyses).”).

[71] Id. at 723-24.

Expert Witness Reports Are Not Admissible

August 23rd, 2021

The tradition of antic proposals to change the law of evidence is old and venerable in the common law. In the early 19th century, Jeremy Bentham deviled the English bench and bar with sweeping proposals to place evidence law on a rationale foundation. Bentham’s contributions to his contributions to jurisprudence, like his utilitarianism, often ignored the realities of human experience and decision making. Although Bentham contributed little to the actual workings of courtroom law and procedure, he gave rise to a tradition of antic proposals that have long entertained law professors and philosophers.[1]

Bentham seemingly abhorred tradition, but his writings have given rise to a tradition of antic proposals in the law. Expert witness testimony was uncommon in the early 19th century, but today, hardly a case is tried without expert witnesses. We should not be surprised, therefore, by the rise of antic proposals for reforming the evidence law of expert witness opinion testimony.[2]

A key aspect of the Bentham tradition is ignore the actual experience and conduct of human affairs. And so now we have a proposal to shorten trials by foregoing direct examination of expert witnesses, and admitting the expert witnesses’ reports into evidence.[3] The argument contends that since the Rule 26 report requires disclosure of all the expert witnesses’ substantive opinions and all bases for their opinions, the witnesses’ viva voce testimony is merely a recital of the report. The argument proceeds that reports can be helpful in understanding complex issues and in moving trials along more efficiently.

As much as all lawyers want to promote “understanding,” and make trials more efficient, the argument fails on multiple levels. First, judges can read the expert witness reports, in bench or in jury trials, to help themselves prepare for trial, without admitting the reports into evidence. Second, the rules of evidence, which are binding upon trial judges in both bench and jury trials, require that the testimony be helpful, not the reports. Third, the argument ignores that for the last several years, the federal rules have allowed lawyers to draft reports to a large extent, without any discovery into whose phraseology appears in a final report.

Even before the federal rules created an immunity to discovery into who drafted specific language of an expert report, it was not uncommon to find that there at least some parts of an expert witness’s report that did not accurately summarize the witness’s views at the time he or she gave testimony. Often the process of discovery caused expert witnesses to modify their reports, whether through skillful inquiry at deposition, or through the submission of adversarial reports, or through changes in the evidentiary display between drafting the report and testifying at trial.

In other words, expert witnesses’ testimony rarely comes out exactly as it appears in words in Rule 26 reports. Furthermore, reports may be full of argumentative characterization of facts, which fail to survive routine objections and cross-examination. What is represented as a fact or a factual predicate of an opinion may never be cited in testimony because the expert’s representation was always false or hyperbolic. The expert witnesses are typically not percipient witnesses, and any alleged fact would not be admissible, under Rule 703, simply because it appeared in an expert witness’s report. Indeed, Rule 703 makes clear that expert witnesses can rely upon inadmissible hearsay as long as experts in their fields reasonably would do so in the ordinary course of their professions.

Voir dire of charts, graphs, and underlying data may result in large portions of an expert report becoming inadmissible. Not every objection will be submitted as a motion in limine; and not every objection rises to the level of a Rule 702 or 703 pre-trial motion to exclude the expert witness. Foundational lapses or gaps may render some parts of reports to be inadmissible.

The argument for admitting reports as evidence reflects a trend toward blowsy, frowsy jurisprudence. Judges should be listening carefully to testimony, both direct and cross, from expert witnesses. They will have transcripts at their disposal. Although the question and answer format of direct examination may take some time, it ensures the orderly presentation of admissible testimony.

Given that testimony often turns out differently from the unqualified statements in a pre-trial report, the proposed admissibility of reports will create evidentiary chaos when there a disparity between report and testimony, or there is a failure to elicit as testimony something that is stated in the report. Courts and litigants need an unequivocal record of what is in evidence when moving for striking testimony, or for directed verdicts, new trials, or judgments notwithstanding the verdict.

The proposed abridgement of expert witness direct examinations would allow further gaming by not calling an expert witness once the witness’s report has been filed. Expert witnesses may conveniently become unavailable, after their reports have been admitted into evidence.

In multi-district litigations, the course of litigation may take years and even decades. Reports filed early on may not reflect current views or the current state of the science. Deeming filed reports “admissible” could have a significant potential to subvert accurate fact finding.

In Ake v. General Motors Corp.[4], Chief Judge Larimer faced a plaintiff who sought to offer in evidence a report written by plaintiffs’ expert witness, who was scheduled to testify at trial. The trial court held, however, that the report was inadmissible hearsay, for which no exception was available.[5] The report at issue was not a business record, which might be admissible under Rule 803(6), in that it did not record events made at or near the event at issue, and the event did not involve the expert witness’s regularly conducted business activity.

There are plenty of areas of the law in which reforms are helpful and necessary. The formality of presenting an expert witness’s actual opinions, under oath, in open court, subject to objections and challenges, needs no abridgement.


[1] See, e.g., William Twining, “Bentham’s Theory of Evidence: Setting a Context,” 20 J. Bentham Studies 18 (2019); Kenneth M. Ehrenberg, “Less Evidence, Better Knowledge,” 2 McGill L.J. 173 (2015); Laird C. Kirkpatrick, “Scholarly and Institutional Challenges to the Law of Evidence: From Bentham to the ADR Movement,” 25 Loyola L.A. L. Rev. 837 (1992); Frederick N. Judson, “A Modern View of the Law Reforms of Jeremy Bentham,” 10 Columbia L. Rev. 41 (1910).

[2] SeeExpert Witness Mining – Antic Proposals for Reform” (Nov. 4, 2014).

[3] Roger J. Marzulla, “Expert Reports: Objectionable Hearsay or Admissible Evidence in a Bench Trial?” A.B.A.(May 17, 2021).

[4] 942 F.Supp. 869 (W.D.N.Y. 1996).

[5] Ake v. General Motors Corp., 942 F.Supp. 869, 877 (W.D.N.Y. 1996).

Epistemic Virtue – Dropping the Dime on Tenpenny

July 18th, 2021

When Marjorie Taylor Greene came under fire for propagating lies about Jewish space lasers and other fantastical conspiracy theories, she did not apologize. Rather she turned the opproprium into a grievance about being “allowed” to believe the lies. Blaming the media, Greene complained: “I was allowed to believe things that weren’t true… .”[1]

In a stunning show of bad faith, Greene attempted to redirect fault to the media. Beneath the failed attempt was a stratagem that appears to have prevalent appeal in this day of electronic and social media. There are some people who believe that telling a lie may be a moral failing, but believing a lie simply means you have been victimized. And being a victim is the ticket for admission into our grievance society.

Greene’s transparent attempt to foist blame on those who would allow her to believe hateful and crazy sidesteps her personal responsibility for her beliefs, and ignores that she chose to propagate the pernicious claims. Greene’s metaphor of passivity is essentially false in failing to come to grips with how we form beliefs, curate them, test, and verify them, even before we take to the social media “airways” to publish or re-publish them.

For the last few years, there has been scholarly and popular criticism of social media for its ability to propagate falsehoods, lies, conspiracy theories, and dis-, mis-, and mal-information.[2] Clearly, social media can do these things, but is it really surprising that social media can be an information cesspool? Descriptively, we can acknowledge that people are influenced by false claims made on social media platforms. Prescriptively, we can, and should, hold people to higher standards.

Earlier this week, the United States Surgeon General, Dr. Vivek Murthy proclaimed health misinformation on social media to be “urgent threat.”[3] Dr. Murthy stated that tech and social media companies needed to fight information rot more aggressively, and the Surgeon General’s office issued an advisory about “building a healthy information environment.”[4] Later last week, President Biden criticized social media companies for their failure to control misinformation, and announced a plan for government to participate in fact checking claims made on social media.[5] Biden’s initiative may be creating the state action needed for the yutzballs on the right and the left to make out state action in their claims of unconstitutional censorship.

I hate to play the “what about” game that was made so popular during the Trump Administration, but I have moments of weakness. What about governmental platforms for speech? After centuries of allowing any willing, able, and marginally qualified person, with a reasonable pretense to expertise, to give opinions in court, the federal judicial system cracked down on unsound, poorly supported expert witness opinion testimony. Most state courts dragged their judicial feet, but at least uttered in dicta that they were concerned.

Legislative platforms for speech have no gatekeeper. Any quack can show up, and she does. Take Sherri Jane Tenpenny.  Please.

Sherri Tenpenny is an osteopathic physician who is a well-known, virulent disease vector of disinformation. In its March 2021 report, The Disinformation Dozenthe Center for Countering Digital Hate identified Tenpenny as a top anti-vaccination shyster. As a social media vector, she is ranked in the top dozen “influencers.”[6]

Tenpenny is an anti-vaccination osteopathic physician, who shakes down fearful parents at vaccination bootcamps, and hangs out with internet hoodlums such as Alex Jones, and the plumped-up pillow purveyor, Mike Lindell. She is the author of the 2008 book, Saying No to Vaccines: A Resource Guide for all Ages, where you can find hyperbolic claims, such as “[t]he skyrocketing autism epidemic, controversy surrounding mercury and thimerosal, and the rampant childhood epidemics — asthma, allergies, eczema, attention deficit disorders (ADD), attention deficit hyperactivity disorders (ADHD) and cancer — have been linked to vaccines.”

In federal court, Tenpenny has been blocked from disseminating her malarkey at the gate. In one case, Tenpenny served as an expert witness in support of a claim that a man’s receipt of a hepatitis B vaccination caused him to develop Guillain-Barré syndrome. The Special Master incorrectly wrote that the law required him to presume the admissibility of Tenpenny’s proffered testimony. The law actually requires the proponent to show the admissibility of his expert witness’s opinion testimony. But even with the non-existent presumption, Tenpenny’s opinion was ultimately found to be worth less than a plugged nickel, when the Special Master found her methodology “so divergent from the scientific method as to be nonsensical and confusing.”[7]

In other branches of government, a Tenpenny can go a lot further. Last month, the Ohio legislature invited Tennpenny to testify in support of House Bill 248, Enact Vaccine Choice and Anti-Discrimination Act (June 8, 2021). Introduced into the Ohio House of Representatives by Republican member Jennifer Gross, Bill 248 would “prohibit mandatory vaccinations and vaccination status disclosures.” Indeed, the proposed legislation would prohibit requiring, or creating incentives for, any vaccines, not just vaccinations against SARS-CoV-2. Tenpenny’s testimony did not fail to disappoint.

Tenpenny claimed that vaccines “magnetize” people, such that keys and spoons will stick to their bodies:

“I’m sure you’ve seen the pictures all over the Internet of people who have had these shots and now they’re magnetized. They can put a key on their forehead. It sticks. They can put spoons and forks all over them and they can stick, because now we think that there’s a metal piece to that.”

Tenpenny did not, however, discuss the obvious issue of polarity, and whether people would line up “north” to “south,” when together in a crowd. She vaguely suggested that “[t]here’s been people who have long suspected that there’s been some sort of an interface, yet-to-be-defined interface, between what’s being injected in these shots and all of the 5G towers.”[8]

The fallout from the Tenpenny testimony has been amusing. After the hearing, another Republican, Representative Scott Lipps, blamed Gross for having invited Tenpenny. During the hearing, however, none of the legislators strongly pushed back. Republican legislators thanked her for testifying, and praised her work as “enlightening.” The bill sponsor, Jennifer Gross, who trained as a nurse, told Tennpenny that it was “an honor to have you here.” According to some media reports (sorry), Gross previously compared businesses’ requiring vaccination to the Holocaust. Importantly, none of the legislators asked her for the studies upon which she relied.

Why would anyone think that Facebook, Twitter, or YouTube would act with more epistemic virtue than the Ohio Legislature? The Tenpenny phenomenon raises other interesting and important questions. Tenpenny has been licensed in Ohio as a “D.O.” (Doctor of Osteopathy), no. 34.003789, since 1984. Her online record shows no “board actions” taken or pending. Apparently, the state of Ohio, the American Osteopathic Association, and other professional and regulatory bodies do not see a problem with Tenpenny’s performance in the Ohio House of Representatives.

The American Medical Association (AMA) recognizes that medical evidence in legal and administrative proceedings is critical, and that physicians have a duty to assist.[9] Testifying for a legislative committee would certainly qualify for a legal proceeding. Testifying is the practice of medicine, and physicians who testify must do so “honestly,” with “continuous self-examination to ensure that their testimony represents the facts of the case,” and “only in areas in which they have appropriate training and recent, substantive experience and knowledge.”[10] The AMA Ethical Guidelines further provide that a testifying physician has a responsibility to ensure that his or her testimony “reflects current scientific thought and standards of care that have gained acceptance among peers in the relevant field.”[11]

Perhaps most important, the AMA Ethical Guidelines specify that medical societies and medical licensing boards are responsible for maintaining high standards for medical testimony, and must assess “claims of false or misleading testimony.” When the testimony is false or misleading, these bodies should discipline the offender “as appropriate.”[12]

Where are the adults in the room?


[1] Josh K. Elliott, “GOP’s Marjorie Taylor Greene regrets being ‘allowed’ to believe hoaxes,” Global News Canada (Feb. 4, 2021).

[2] See, e.g., Catherine D. Tan, “Defending ‘snake oil’: The preservation of contentious knowledge and practices,” 51 Social Studies of Science 538 (2021).

[3] Sheryl Gay Stolberg & Davey Alba, “Surgeon General Assails Tech Companies Over Misinformation on Covid-19,” N.Y. Times (July 15, 2021).

[4] Vivek H. Murthy, Health Misinformation: The U.S. Surgeon General’s Advisory on

Building a Healthy Information Environment (2021).

[5] The Associated Press, “Biden Slams Social Media Companies for Pandemic Misinformation,” N.Y. Times (July 16, 2021).

[6] Jonathan Jarry, “A Dozen Misguided Influencers Spread Most of the Anti-Vaccination Content on Social Media: The Disinformation Dozen generates two thirds of anti-vaccination content on Facebook and Twitter,” McGill Univ. Office for Science & Soc’y (Mar. 31, 2021).

[7] Shaw v. Sec’y Health & Human Servs., No. 01-707V, 2009 U.S. Claims LEXIS 534, *84 n.40 (Fed. Cl. Spec. Mstr. Aug. 31, 2009).

[8] Andrea Salcedo, “A doctor falsely told lawmakers vaccines magnetize people: ‘They can put a key on their forehead. It sticks.’,” Wash. Post (June 9, 2021); Andy Downing, “What an exceedingly dumb time to be alive,” Columbus Alive (June 10, 2021); Jake Zuckerman, “She says vaccines make you magnetized. This West Chester lawmaker invited her testimony, chair says,” Ohio Capital Journal (July 14, 2021).

[9] A.M.A. Code of Medical Ethics Opinion 9.7.1.

[10] Id.

[11] Id.

[12] Id.

People Get Ready – There’s a Reference Manual a Comin’

July 16th, 2021

Science is the key …

Back in February, I wrote about a National Academies’ workshop that featured some outstanding members of the scientific and statistical world, and which gave participants to identify new potential subjects for inclusion in a proposed fourth edition of the Reference Manual on Scientific Evidence.[1] Funding for that new edition is now secured, and the National Academies has published a précis of the February workshop. National Academies of Sciences, Engineering, and Medicine, Emerging Areas of Science, Engineering, and Medicine for the Courts: Proceedings of a Workshop – in Brief (Washington, DC 2021). The Rapporteurs for these proceedings provide a helpful overview for this meeting, which was not generally covered in the legal media.[2]

The goal of the workshop, which was supported by a planning committee, the Committee on Science, Technology, and Law, the National Academies, the Federal Judicial Center, and the National Science Foundation, was, of course, to identify chapters for a new, fourth edition, of Reference Manual on Scientific Evidence. The workshop was co-chaired by Dr. Thomas D. Albright, of the Salk Institute for Biological Studies, and the Hon. Kathleen McDonald O’Malley, Judge on the U.S. Court of Appeals for the Federal Circuit.

The Rapporteurs duly noted Judge O’Malley’s Workshop comments that she hoped that the reconsideration of the Reference Manual can help close the gap between science and the law. It is thus encouraging that the Rapporteurs focused a large part of their summary on the presentation of Professor Xiao-Li Meng[3] on selection bias, which “can come from cherry picking data, which alters the strength of the evidence.” Meng identified the

“7 S’(ins)” of selection bias:

(1) selection of target/hypothesis (e.g., subgroup analysis);

(2) selection of data (e.g., deleting ‘outliers’ or using only ‘complete cases’);

(3) selection of methodologies (e.g., choosing tests to pass the goodness-of-fit); (4) selective due diligence and debugging (e.g., triple checking only when the outcome seems undesirable);

(5) selection of publications (e.g., only when p-value <0.05);

(6) selections in reporting/summary (e.g., suppressing caveats); and

(7) selections in understanding and interpretation (e.g., our preference for deterministic, ‘common sense’ interpretation).”

Meng also addressed the problem of analyzing subgroup findings after not finding an association in the full sample, dubious algorithms, selection bias in publishing “splashy” and nominally “statistically significant” results, and media bias and incompetence in disseminating study results. Meng discussed how these biases could affect the accuracy of research findings, and how these biases obviously affect the accuracy, validity, and reliability of research findings that are relied upon by expert witnesses in court cases.

The Rapporteurs’ emphasis on Professor Meng’s presentation was noteworthy because the current edition of the Reference Manual is generally lacking in a serious exploration of systematic bias and confounding. To be sure, the concepts are superficially addressed in the Manual’s chapter on epidemiology, but in a way that has allowed many district judges to shrug off serious questions of invalidity with the shibboleth that such questions “to to the weight, not the admissibility,” of challenged expert witness opinion testimony. Perhaps the pending revision to Rule 702 will help improve fidelity to the spirit and text of Rule 702.

Questions of bias and noise have come to receive more attention in the professional statistical and epidemiologic literature. In 2009, Professor Timothy Lash published an important book-length treatment of quantitative bias analysis.[4] Last year, statistician David Hand published a comprehensive, but readily understandable, book on “Dark Data,” and the ways statistical and scientific interference are derailed.[5] One of the presenters at the February workshop, nobel laureate, Daniel Kahneman, published a book on “noise,” just a few weeks ago.[6]

David Hand’s book, Dark Data, (Chapter 10) sets out a useful taxonomy of the ways that data can be subverted by what the consumers of data do not know. The taxonomy would provide a useful organizational map for a new chapter of the Reference Manual:

A Taxonomy of Dark Data

Type 1: Data We Know Are Missing

Type 2: Data We Don’t Know Are Missing

Type 3: Choosing Just Some Cases

Type 4: Self- Selection

Type 5: Missing What Matters

Type 7: Changes with Time

Type 8: Definitions of Data

Type 9: Summaries of Data

Type 11: Feedback and Gaming

Type 12: Information Asymmetry

Type 13: Intentionally Darkened Data

Type 14: Fabricated and Synthetic Data

Type 15: Extrapolating beyond Your Data

Providing guidance not only on “how we know,” but also on how we go astray, patho-epistemology, would be helpful for judges and lawyers. Hand’s book really just a beginning to helping gatekeepers appreciate how superficially plausible health-effects claims are invalidated by the data relied upon by proffered expert witnesses.

* * * * * * * * * * * *

“There ain’t no room for the hopeless sinner
Who would hurt all mankind, just to save his own, believe me now
Have pity on those whose chances grow thinner”


[1]Reference Manual on Scientific Evidence v4.0” (Feb. 28, 2021).

[2] Steven Kendall, Joe S. Cecil, Jason A. Cantone, Meghan Dunn, and Aaron Wolf.

[3] Prof. Meng is the Whipple V. N. Jones Professor of Statistics, in Harvard University. (“Seeking simplicity in statistics, complexity in wine, and everything else in fortune cookies.”)

[4] Timothy L. Lash, Matthew P. Fox, and Aliza K. Fink, Applying Quantitative Bias Analysis to Epidemiologic Data (2009).

[5] David J. Hand, Dark data : why what you don’t know matters (2020).

[6] Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein, Noise: A Flaw in Human Judgment (2021).

Slide Rule 702

June 26th, 2021

Note: A “fatal error,” caused by an old theme has disrupted the layout of my website. I am working on it

The opposition to Daubert’s regime of gatekeeping by the lawsuit industry has been fierce. From the beginning, the resistance has found allies on the bench, who have made the application of Rule 702 to expert witnesses, in both civil and criminal, uneven at best. Back in 2015, Professor David Bernstein and Eric Lasker wrote an exposé, about the unlawful disregard of the statutory language of Rule 702, and they called for and proposed an amendment to the rule.[1] At the time, I was skeptical of unleashing a change through the rules committee, given the uncertainty of where any amendment might ultimately look like.[2]

In the last several years, there have been some notable applications of Rule 702 in litigation involving sertraline, atorvastatin, sildenafil, and other medications, but aberrant decisions have continued to upend the rule of law in the area of expert witness gatekeeping. Last year, I noted that I had come to see the wisdom of Professor Bernstein’s proposal,[3] in the light of continued judicial dodging of Rule 702.[4] Numerous lawyers and legal organizations have chimed in to urge a revision to Rule 702.[5]

Earlier this week, the Committee on Rules of Practice & Procedure rolled out a proposed draft of an amended Rule 702.[6] The proposed new rule looks very much like the current rule:[7]

  1. Rule Testimony by Expert Witnesses
  2. A witness   who   is   qualified   as   an   expert   by
  3. knowledge, skill,  experience,  training,  or  education may
  4. testify in the form of an opinion or otherwise if the proponent
  5. has demonstrated by a preponderance of the evidence that:
  6. (a) the  expert’s  scientific,  technical,  or  other
  7. specialized knowledge will help the trier of
  8. fact  to   understand   the   evidence   or   to
  9. determine a fact in issue;
  10. (b) the testimony is based on sufficient facts or
  11. data;
  12. (c) the  testimony  is  the  product  of  reliable
  13. principles and methods; and
  14. (d) the  expert  has  reliably  applied  expert’s
  15. opinion reflects a reliable application of the
  16. principles and methods to the facts of the
  17. case

Despite what looks like minor linguistic changes, the Rules Committee’s note suggests otherwise. First, the amendment is intended to emphasize that the burden of showing the admissibility requirements rests on the proponent of the challenged expert witness testimony. Of course, the burden of course has always been with the proponent, but some courts have deployed various strategems to shift the burden with conclusory assessments that the challenge “goes to the weight not the admissibility,” thereby dodging the judicial responsibility for gatekeeping. The Committee now would make clear that many courts have erred by having treated the “critical questions of the sufficiency of an expert’s basis, and the application of the expert’s methodology” as going to weight and not admissibility.[8]

The Committee appears, however, to be struggling to provide guidance on when challenges do raise “matters of weight rather than admissibility.” For instance, the Committee Note suggests that:

“nothing in the amendment requires the court to nitpick an expert’s opinion in order to reach a perfect expression of what the basis and methodology can support. The Rule 104(a) standard does not require perfection. On the other hand, it does not permit the expert to make extravagant claims that are unsupported by the expert’s basis and methodology.”[9]

Somehow, I fear that the mantra of “weight not admissibility” has been or will be replaced by refusals to nitpick an expert’s opinion. How many nits does it take to make a causal claim “extravant”?

Perhaps I am the one nitpicking now. The Committee has recognized the essential weakness of gatekeeping as frequently practiced in federal courts by emphasizing that judicial gatekeeping is “essential” and required by the institutional incompetence of jurors to determine whether expert witnesses have reliably applied sound methodology to the facts of the case:

“a trial judge must exercise gatekeeping authority with respect to the opinion ultimately expressed by a testifying expert. A testifying expert’s opinion must stay within the bounds of what can be concluded by a reliable application of the expert’s basis and methodology. Judicial gatekeeping is essential because just as jurors may be unable to evaluate meaningfully the reliability of scientific and other methods underlying expert opinion, jurors may also be unable to assess the conclusions of an expert that go beyond what the expert’s basis and methodology may reliably support.”[10]

If the sentiment of the Rule Committee’s draft note carries through to the Committee Note that accompanies the amended rule, then perhaps some good will come of this effort.


[1] David E. Bernstein & Eric G. Lasker,“Defending Daubert: It’s Time to Amend Federal Rule of Evidence 702,” 57 William & Mary L. Rev. 1 (2015).

[2]On Amending Rule 702 of the Federal Rules of Evidence” (Oct. 17, 2015).

[3]Should Federal Rule of Evidence 702 Be Amended?” (May 8, 2020).

[4]Dodgy Data Duck Daubert Decisions” (April 2, 2020); “Judicial Dodgers – The Crossexamination Excuse for Denying Rule 702 Motions” (May 11, 2020); “Judicial Dodgers – Reassigning the Burden of Proof on Rule 702” (May 13, 2020); “Judicial Dodgers – Weight not Admissibility” (May 28, 2020); “Judicial Dodgers – Rule 702 Tie Does Not Go to Proponent” (June 2, 2020).

[5] See, e.g., Daniel Higginbotham, “The Proposed Amendment to Federal Rule of

Evidence 702 – Will it Work?” IADC Products Liability Newsletter (March 2021); Cary Silverman, “Fact or Fiction: Ensuring the Integrity of Expert Testimony,” U.S. Chamber Instit. Leg. Reform (Feb. 2021); Thomas D. Schroeder, “Federal Courts, Practice & Procedure: Toward a More Apparent Approach to Considering the Admission of Expert Testimony,” 95 Notre Dame L. Rev. 2039, 2043 (2020); Lee Mickus, “Gatekeeping Reorientation: Amend Rule 702 To Correct Judicial Misunderstanding about Expert Evidence,” Wash. Leg. Foundation (May 2020)

13-18 (noting numerous cases that fail to honor the spirit and language of Rule 702); Lawyers for Civil Justice, “Comment to the Advisory Comm. on Evidence Rules and its Rule 702 Subcommittee; A Note about the Note: Specific Rejection of Errant Case Law is Necessary for the Success of an Amendment Clarifying Rule 702’s Admissibility Requirements 1 (Feb. 8, 2021) (arguing that “[t]he only unambiguous way for the Note to convey the intent of the amendment is to reject the specific offending caselaw by name.”).

[6] Committee on Rules of Practice & Procedure Agenda Book (June 22, 2021). See Email Cara Salvatore, “Court Rules Committee Moves to Stiffen Expert Standard,” Law360 (June 23, 2021).

[7] Id. at 836. The proposal has been the subject of submissions and debate for a while. See Jim Beck, “Civil Rules Committee Proposes to Toughen Rule 702,” Drug & Device Law (May 4, 2021).

[8] Committee on Rules of Practice & Procedure Agenda Book at 839 (June 22, 2021).

[9] Id.

[10] Id. at 838-39.