TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

The American Tort Law Museum

March 14th, 2022

Last year, Professor Christopher J. Robinette wrote a blog post about the American Tort Law Museum. I had not heard of it, but I was curious. I have stopped by the Museum’s website on a few occasions to learn more.

The Museum’s website describes it as “the nationally acclaimed American Museum of Tort Law,” which seems hyperbolic. I suppose as long as it is the only museum of tort law, it might as well call itself “the” museum of tort law.

Other than Professor Robinette, I have not read anything about this museum, but perhaps I was somehow left in the dark. The museum’s physical location is in Winsted, Connecticut, about 40 km northwest of downtown Hartford, in the middle of nowhere. Hardly a place for a nationally acclaimed museum, although Congressman John B. Larson is apparently very happy to have it in the boondocks of Connecticut.[1]

The website states that the museum seeks to “educate, inform and inspire Americans about two things: Trial by jury; and the benefits of tort law.” Well, “trial by jury” is like God and apple pie, but I am an atheist and I prefer blueberry pie. Trial by jury is great when the Crown is trying to take your property or your life, but I am a skeptic when it comes to juries’ deciding technical and scientific issues. And the “benefits of tort law”? Well, there are some, but does the museum inform about the many detriments and harms of tort law?

Browsing the website quickly answers these questions. There are case studies of what at least plaintiffs’ tort lawyers might consider the benefits ($$$) of tort law, with call-outs to notable cases that resulted in large awards, and perhaps a few that may have led to safer products. The “nationally acclaimed” museum has nothing, at least in its online presence, about the detriments, irrationality, or failures of tort law. You will not find anything about crime and fraud among the ranks of plaintiffs’ lawyers; nor will you find anything about successful defenses that shut down entire litigations. Nothing here about Dickie Scruggs in prison garb, or about John Edwards’ love child. Hmm, you may be getting a sense that this is a lopsided, partisan effort. Indeed, the museum is a temple to the Lawsuit Industry, and with the exception of one anomalous defense lawyer, its “founders” are the muckety mucks of the plaintiffs’ bar.

Among the founders are Peter Angelos, F. Scott Baldwin, Frederick Baron, Thomas V. Girardi, Robert L. Habush, James F. Humphreys, Tommy Jacks, Joseph D. Jamail Jr., and various rent-seeking organizations, such as Center for Study of Responsive Law, Public Citizen, Public Safety Institute, and Safety Systems Foundation.

You can see who else is associated with this propaganda effort. For education about civics and the right to a jury trial, I prefer the House of Terror, in Budapest.


[1] John B. Larson, “Recognizing the American Museum of Tort Law’s Second Anniversary,” Cong. Rec. E1475 (Nov. 1, 2017).

Hindsight Bias – In Science & in the Law

February 27th, 2022

In the early 1970s, Amos Tversky and Daniel Kahneman raised awareness of hindsight bias as a pervasive phenomenon in human judgment.[1] Although these insights seemed eponymously obvious in hindsight, Baruch Fischhoff directly tested the existence and extent of hindsight bias in a now-classic paper.[2] The lack of awareness of how hindsight bias affects our historical judgments seriously limits our ability to judge the past.

Kahneman’s participation in the planning phase of a new, fourth edition of the Reference Manual on Scientific Evidence is a hopeful sign that his insights and the research of many psychologists will gain fuller recognition in the law. Hindsight bias afflicts judges, lawyers, jurors, expert witnesses, scientists, physicians, and children of all ages.[3]

Hindsight Bias in the Law

Sixth Amendment Challenges

Challenges to the effectiveness of legal counsel are a mainstay of habeas petitions filed by convicted felons. In hindsight, their lawyers’ conduct seems woefully inadequate. In judging such claims of ineffectiveness, the United States Supreme Court acknowledged the role and influence of hindsight bias in judging trial counsel’s strategic decisions:

“A fair assessment of attorney performance requires that every effort be made to eliminate the distorting effects of hindsight, to reconstruct the circumstances of counsel’s challenged conduct, and to evaluate the conduct from counsel’s perspective at the time. Because of the difficulties inherent in making the evaluation, a court must indulge a strong presumption that counsel’s conduct falls within the wide range of reasonable professional assistance; that is, the defendant must overcome the presumption that, under the circumstances, the challenged action might be considered sound trial strategy.”[4]

This decision raises the interesting question of why there is no strong presumption of reasonableness in other legal contexts, such as the “reasonableness” of physician judgments, or of adequate warnings.

Medical Malpractice

There is little doubt that retrospective judgments of the reasonableness of medical decisions are infected, distorted, and corrupted by hindsight bias.[5] In the words of one paper on the subject:

“There is evidence that hindsight bias, which may cause the expert to simplify, trivialise and criticise retrospectively the decisions of the treating doctor, is inevitable when the expert knows there has been an adverse outcome.”[6]

Requiring the finder of fact to assess the reasonableness of complex medical judgments in hindsight, with knowledge of the real-world outcomes of the prior judgments, poses a major threat to fairness in the trial process, in both bench and jury trials. Curiously, lawyers receive a “strong presumption” of reasonableness, but physicians and manufacturers do not.

Patent Litigation

Hindsight bias plays a large role in challenges to patent validity. The works of genius seem obvious with hindsight. In the context of judging patent criteria such as non-obviousness, the Supreme Court has emphasized that:

“A factfinder should be aware, of course, of the distortion caused by hindsight bias and must be cautious of arguments reliant upon ex post reasoning.”[7]

Certainly, factfinders in every kind of litigation, not just intellectual property cases, should be made aware of the distortion caused by hindsight bias.

Remedies

Hindsight bias can probably never be fully corrected. At a minimum, factfinders should be educated about the phenomenon. In criminal cases, defendants have called psychologists to testify about the inherent difficulties of eyewitness and cross-race identification.[8] In New Jersey, trial courts must give a precautionary instruction in criminal cases that involve eyewitness identification.[9] In some but not all discrimination cases, courts have permitted expert witness opinion testimony about “implicit bias.”[10] In “long-tail” litigation, in which jurors must consider the reasonableness of warning decisions, or claims of failure to test, made decades before the trial, defendants may well want to consider calling a psychologist to testify about the reality of hindsight bias, and how it leads to incorrect judgments about past events.

Another, independent remedy would be for the trial court to give a jury instruction on hindsight bias.  After all, the Supreme Court has clearly stated that “[a] factfinder should be aware, of course, of the distortion caused by hindsight bias and must be cautious of arguments reliant upon ex post reasoning.” The trial judge should set the stage for a proper consideration of past events, by alerting jurors to the reality and seductiveness of hindsight bias. What follows is a first attempt at such an instruction. I would love to hear from anyone who has submitted a proposed instruction on the issue.

Members of the jury, this case will require you to determine what scientists knew or should have known at a time in the past. At the same time that you try to make this determination, you will have been made aware of what is now known. Psychological research clearly shows that all human beings, regardless of their age, education, or life circumstances, have what is known as hindsight bias. Having this bias means that we all tend to assume that people in times past should have known what we now in fact know. Calling it a bias is a way of saying that this assumption is wrong. To decide this case fairly, you must try to determine what people, including experts in the field, actually knew and did before there were more recent discoveries, and without reference to what is now known and accepted.


[1] Amos Tversky & Daniel Kahneman, “Judgment under Uncertainty: Heuristics and Biases,” 185 Science 1124 (1974). See also “People Get Ready – There’s a Reference Manual a Comin’” (June 6, 2021).

[2] Baruch Fischhoff, “Hindsight ≠ foresight: the effect of outcome knowledge on judgment under uncertainty,” 1 J. Experimental Psychology: Human Perception & Performance 288, 288 (1975), reprinted in 12 Quality & Safety Health Care 304 (2003); Baruch Fischhoff & Ruth Beyth, “I knew it would happen: Remembered probabilities of once-future things?” 13 Organizational Behavior & Human Performance 1 (1975); see Baruch Fischhoff, “An Early History of Hindsight Research,” 25 Social Cognition 10 (2007).

[3] See Daniel M. Bernstein, Edgar Erdfelder, Andrew N. Meltzoff, William Peria & Geoffrey R. Loftus, “Hindsight Bias from 3 to 95 Years of Age,” 37 J. Experimental Psychol., Learning, Memory & Cognition, 378 (2011).

[4] Strickland v. Washington, 466 U.S. 668, 689, 104 S.Ct. 2052, 2052 (1984); see also Feldman v. Thaler, 695 F.3d 372, 378 (5th Cir. 2012).

[5] Edward Banham-Hall & Sian Stevens, “Hindsight bias critically impacts on clinicians’ assessment of care quality in retrospective case note review,” 19 Clinical Medicine 16 (2019); Thom Petty, Lucy Stephenson, Pierre Campbell & Terence Stephenson, “Outcome Bias in Clinical Negligence Medicolegal Cases,” 26 J.Law & Med. 825 (2019); Leonard Berlin, “Malpractice Issues and Radiology – Hindsight Bias” 175 Am. J. Radiol. 597 (2000); Leonard Berlin, “Outcome Bias,” 183 Am. J. Radiol. 557 (2004); Thomas B. Hugh & Sidney W. A. Dekker, “Hindsight bias and outcome bias in the social construction of medical negligence: a review,” 16 J. Law. Med. 846 (2009).

[6] Thomas B. Hugh & G. Douglas Tracy, “Hindsight Bias in Medicolegal Expert Reports,” 176 Med. J. Australia 277 (2002).

[7] KSR International Co. v. Teleflex Inc., 550 U.S. 398, 127 S.Ct. 1727, 1742 (2007) (emphasis added; internal citations omitted).

[8] See Commonwealth v. Walker, 92 A.3d 766 (Pa. 2014) (Todd, J.) (rejecting per se inadmissibility of eyewitness expert witness opinion testimony).

[9] State v. Henderson, 208 N.J. 208, 27 A.3d 872 (2011).

[10] Samaha v. Wash. State Dep’t of Transp., No. cv-10-175-RMP, 2012 WL 11091843, at *4 (E.D. Wash. Jan. 3, 2012) (holding that an expert witness’s proferred opinions about the “concepts of implicit bias and stereotypes is relevant to the issue of whether an employer intentionally discriminated against an employee.”).

Of Significance, Error, Confidence & Confusion – In Law & Statistics

February 27th, 2022

A version of this post appeared previously on Professor Deborah Mayo’s blog, Error Statistics Philosophy. The post was invited as a comment on Professor Mayo’s article in Conservation Biology, which is cited and discussed below. Other commentators had important, insightful comments that can be found at Error Statistics Philosophy.[1] These commentators and many others participated in a virtual special session of Professor Mayo’s “Phil Stat Forum,” on January 11, 2022. This session, “Statistical Significance Test Anxiety,” was moderated by David Hand, and included presentations by Deborah Mayo and Yoav Benjamini. The presenters’ slides, as well as a video of the session, are now online.

*      *     *     *     *     *     *     *

The metaphor of law as an “empty vessel” is frequently invoked to describe the law generally, as well as pejoratively to describe lawyers. The metaphor rings true at least in describing how the factual content of legal judgments comes from outside the law. In many varieties of litigation, not only the facts and data, but the scientific and statistical inferences must be added to the “empty vessel” to obtain a correct and meaningful outcome.

Once upon a time, the expertise component of legal judgments came from so-called expert witnesses, who were free to opine about claims of causality solely by showing that they had more expertise than the lay jurors. In Pennsylvania, for instance, the standard for qualifying witnesses to give “expert opinions” was to show that they had “a reasonable pretense to expertise on the subject.”

In the 19th and the first half of the 20th century, causal claims, whether of personal injuries, discrimination, or whatever, virtually always turned on a conception of causation as necessary and sufficient to bring about the alleged harm. In discrimination claims, plaintiffs pointed to the “inexorable zero,” in cases in which no Black citizen had ever been seated on a grand jury in a particular county since the demise of Reconstruction. In health claims, the mode of reasoning usually followed something like Koch’s postulates.

The second half of the 20th century was marked by the rise of stochastic models in our understanding of the world. The consequence is that statistical inference made its way into the empty vessel. The rapid introduction of statistical thinking into the law did not always go well. In a seminal 1977 discrimination case, Castaneda v. Partida,[2] in an opinion by Associate Justice Blackmun, the Court calculated a binomial probability for observing the sample result (rather than a result at least as extreme), and mislabeled the measurement “standard deviations” rather than standard errors:

“As a general rule for such large samples, if the difference between the expected value and the observed number is greater than two or three standard deviations, then the hypothesis that the jury drawing was random would be suspect to a social scientist. The 11-year data here reflect a difference between the expected and observed number of Mexican-Americans of approximately 29 standard deviations. A detailed calculation reveals that the likelihood that such a substantial departure from the expected value would occur by chance is less than 1 in 10¹⁴⁰.”[3]

Justice Blackmun was graduated from Harvard College, summa cum laude, with a major in mathematics.

Despite the extreme statistical disparity in the 11-year run of grand juries, Justice Blackmun’s opinion provoked a robust rejoinder, not only on the statistical analysis, but on the Court’s failure to account for obvious omitted confounding variables in its simplistic analysis. And then there were the inconvenient facts that Mr. Partida was a rapist, indicted by a grand jury (50% with “Hispanic” names), which was appointed by jury commissioners (3/5 Hispanic). Partida was convicted by a petit jury (7/12 Hispanic), in front of a trial judge who was Hispanic, and he was denied a writ of habeas corpus by Judge Garza, who went on to become a member of the Court of Appeals. In any event, Justice Blackmun’s dictum about “two or three” standard deviations soon shaped the outcome of many thousands of discrimination cases, and was translated into a necessary p-value of 5%.
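
The arithmetic behind the passage is easy to reproduce. What follows is a minimal sketch in Python (using scipy), offered only as an illustration: the population share, the number summoned, and the observed count are the figures recounted in the opinion; the z-score is the normal approximation that the Court described as “standard deviations”; and the last two lines contrast the point probability of the observed count with the probability of a result at least as extreme, which is the quantity that should have been reported.

  # A sketch of the Castaneda arithmetic; the population share, the number of
  # persons summoned, and the number of Mexican-Americans summoned are the
  # figures recounted in the opinion (see its footnote 17).
  from scipy.stats import binom
  p = 0.791        # Mexican-American share of the county population
  n = 870          # persons summoned for grand jury service over 11 years
  observed = 339   # Mexican-Americans among those summoned
  expected = n * p                        # roughly 688
  std_error = (n * p * (1 - p)) ** 0.5    # standard error of the count, about 12
  z = (expected - observed) / std_error   # the Court's "29 standard deviations"
  point_prob = binom.pmf(observed, n, p)  # probability of exactly 339
  tail_prob = binom.cdf(observed, n, p)   # probability of 339 or fewer
  print(f"z = {z:.1f}")
  print(f"P(X = {observed}) = {point_prob:.2e}")
  print(f"P(X <= {observed}) = {tail_prob:.2e}")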

Beginning in the early 1960s, statistical inference became an important feature of tort cases that involved claims based upon epidemiologic evidence. In such health-effects litigation, the judicial handling of concepts such as p-values and confidence intervals often went off the rails.  In 1989, the United States Court of Appeals for the Fifth Circuit resolved an appeal involving expert witnesses who relied upon epidemiologic studies by concluding that it did not have to resolve questions of bias and confounding because the studies relied upon had presented their results with confidence intervals.[4] Judges and expert witnesses persistently interpreted single confidence intervals from one study as having a 95 percent probability of containing the actual parameter.[5] Similarly, many courts and counsel committed the transposition fallacy in interpreting p-values as posterior probabilities for the null hypothesis.[6]
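
The confidence-interval error is easier to see in a toy simulation than in prose. The following sketch (Python with numpy; the data are hypothetical and tied to no particular study) repeats the 95 percent interval procedure over many samples: roughly 95 percent of the intervals generated cover the true mean, but any single interval, once computed, either contains the parameter or does not.

  # Coverage is a property of the interval-generating procedure over repeated
  # sampling, not of any one already-computed interval.
  import numpy as np
  rng = np.random.default_rng(1989)
  true_mean, sigma, n, trials, z95 = 10.0, 2.0, 50, 10_000, 1.96
  covered = 0
  for _ in range(trials):
      sample = rng.normal(true_mean, sigma, n)
      half_width = z95 * sample.std(ddof=1) / np.sqrt(n)
      lo = sample.mean() - half_width
      hi = sample.mean() + half_width
      covered += (lo <= true_mean <= hi)
  print(f"coverage over {trials:,} repetitions: {covered / trials:.3f}")
  # Any single interval printed from one sample has no 95% probability of
  # containing true_mean; it simply does, or does not, contain it.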

Against this backdrop of mistaken and misrepresented interpretation of p-values, the American Statistical Association’s p-value statement was a helpful and understandable restatement of basic principles.[7] Within a few weeks, however, citations to the p-value Statement started to show up in the briefs and examinations of expert witnesses, to support contentions that p-values (or any procedure to evaluate random error) were unimportant, and should be disregarded.[8]

In 2019, Ronald Wasserstein, the ASA executive director, along with two other authors, wrote an editorial that explicitly called for abandoning the use of “statistical significance.”[9] Although the piece was labeled “editorial,” the journal provided no disclaimer that Wasserstein was not speaking ex cathedra.

The absence of a disclaimer provoked much confusion. Indeed, Brian Tarran, the editor of Significance, published jointly by the ASA and the Royal Statistical Society, wrote an editorial interpreting the Wasserstein editorial as an official ASA “recommendation.” Tarran ultimately retracted his interpretation, but only in response to a pointed letter to the editor.[10] Tarran adverted to a misleading press release from the ASA as the source of his confusion. Inquiring minds might wonder why the ASA allowed such misleading press releases to go out.

In addition to press releases, some people in the ASA started to send emails to journal editors, to nudge them to abandon statistical significance testing on the basis of what seemed like an ASA recommendation. For the most part, this campaign was unsuccessful in the major biomedical journals.[11]

While this controversy was unfolding, then-President Karen Kafadar of the ASA stepped into the breach to state definitively that the Executive Director was not speaking for the ASA.[12] In November 2019, the ASA board of directors approved a motion to create a “Task Force on Statistical Significance and Replicability.” Its charge was “to develop thoughtful principles and practices that the ASA can endorse and share with scientists and journal editors. The task force will be appointed by the ASA President with advice and participation from the ASA Board.”

Professor Mayo’s editorial has done the world of statistics, as well as the legal world of judges, lawyers, and legal scholars, a service in calling attention to the peculiar intellectual conflicts of interest that played a role in the editorial excesses of some of the ASA’s leadership. From a lawyer’s perspective, it is clear that courts have been misled and distracted by some ASA officials, who seem to have worked to undermine a consensus position paper on p-values.[13]

Curiously, the Task Force’s report did not find a home in any of the ASA’s several scholarly publications. Instead, “The ASA President’s Task Force Statement on Statistical Significance and Replicability”[14] appeared in The Annals of Applied Statistics, where it is accompanied by an editorial by former ASA President Karen Kafadar.[15] In November 2021, the ASA’s official “magazine,” Chance, also published the Task Force’s Statement.[16]

Judges and litigants who must navigate claims of statistical inference need guidance on the standard of care scientists and statisticians should use in evaluating such claims. Although the Task Force did not elaborate, it advanced five basic propositions, which had been obscured by many of the recent glosses on the ASA’s 2016 p-value statement and the 2019 editorial discussed above:

  1. “Capturing the uncertainty associated with statistical summaries is critical.”
  2. “Dealing with replicability and uncertainty lies at the heart of statistical science. Study results are replicable if they can be verified in further studies with new data.”
  3. “The theoretical basis of statistical science offers several general strategies for dealing with uncertainty.”
  4. “Thresholds are helpful when actions are required.”
  5. “P-values and significance tests, when properly applied and interpreted, increase the rigor of the conclusions drawn from data.”

Although the Task Force’s Statement will not end the debate or the “wars,” it will go a long way to correct the contentions made in court about the insignificance of significance testing, while giving courts a truer sense of the professional standard of care with respect to statistical inference in evaluating claims of health effects.


[1] Commentators included John Park, MD; Brian Dennis, Ph.D.; Philip B. Stark, Ph.D.; Kent Staley, Ph.D.; Yudi Pawitan, Ph.D.; Brian Hennig, Ph.D.; Brian Haig, Ph.D.; and Daniël Lakens, Ph.D.

[2] Castaneda v. Partida, 430 U.S. 482 (1977).

[3] Id. at 496 n.17.

[4] Brock v. Merrell Dow Pharmaceuticals, Inc., 874 F.2d 307, 311-12 (5th Cir. 1989).

[5] Richard W. Clapp & David Ozonoff, “Environment and Health: Vital Intersection or Contested Territory?” 30 Am. J. L. & Med. 189, 210 (2004) (“Thus, a RR [relative risk] of 1.8 with a confidence interval of 1.3 to 2.9 could very likely represent a true RR of greater than 2.0, and as high as 2.9 in 95 out of 100 repeated trials.”) (Both authors testify for claimants in cases involving alleged environmental and occupational harms.); Schachtman, “Confidence in Intervals and Diffidence in the Courts” (Mar. 4, 2012) (collecting numerous examples of judicial offenders).

[6] See, e.g., In re Ephedra Prods. Liab. Litig., 393 F.Supp. 2d 181, 191, 193 (S.D.N.Y. 2005) (Rakoff, J.) (credulously accepting counsel’s argument that the use of a critical value of less than 5% of significance probability increased the “more likely than not” burden of proof upon a civil litigant). The decision has been criticized in the scholarly literature, but it is still widely cited without acknowledging its error. See Michael O. Finkelstein, Basic Concepts of Probability and Statistics in the Law 65 (2009).

[7] Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” 70 The Am. Statistician 129 (2016); see “The American Statistical Association’s Statement on and of Significance” (March 17, 2016). The commentary beyond the “bold faced” principles was at times less helpful in suggesting that there was something inherently inadequate in using p-values. With the benefit of hindsight, this commentary appears to represent editorializing by the authors, and not the sense of the expert committee that agreed to the six principles.

[8] Schachtman, “The American Statistical Association Statement on Significance Testing Goes to Court, Part I” (Nov. 13, 2018), “Part II” (Mar. 7, 2019).

[9] Ronald L. Wasserstein, Allen L. Schirm, and Nicole A. Lazar, “Editorial: Moving to a World Beyond ‘p < 0.05’,” 73 Am. Statistician S1, S2 (2019); see Schachtman, “Has the American Statistical Association Gone Post-Modern?” (Mar. 24, 2019).

[10] Brian Tarran, “THE S WORD … and what to do about it,” Significance (Aug. 2019); Donald Macnaughton, “Who Said What,” Significance 47 (Oct. 2019).

[11] See, e.g., David Harrington, Ralph B. D’Agostino, Sr., Constantine Gatsonis, Joseph W. Hogan, David J. Hunter, Sharon-Lise T. Normand, Jeffrey M. Drazen, and Mary Beth Hamel, “New Guidelines for Statistical Reporting in the Journal,” 381 New Engl. J. Med. 285 (2019); Jonathan A. Cook, Dean A. Fergusson, Ian Ford, Mithat Gonen, Jonathan Kimmelman, Edward L. Korn, and Colin B. Begg, “There is still a place for significance testing in clinical trials,” 16 Clin. Trials 223 (2019).

[12] Karen Kafadar, “The Year in Review … And More to Come,” AmStat News 3 (Dec. 2019); see also Kafadar, “Statistics & Unintended Consequences,” AmStat News 3,4 (June 2019).

[13] Deborah Mayo, “The statistics wars and intellectual conflicts of interest,” 36 Conservation Biology (2022) (in-press, online Dec. 2021).

[14] Yoav Benjamini, Richard D. DeVeaux, Bradley Efron, Scott Evans, Mark Glickman, Barry Graubard, Xuming He, Xiao-Li Meng, Nancy Reid, Stephen M. Stigler, Stephen B. Vardeman, Christopher K. Wikle, Tommy Wright, Linda J. Young, and Karen Kafadar, “The ASA President’s Task Force Statement on Statistical Significance and Replicability,” 15 Annals of Applied Statistics (2021) (in press).

[15] Karen Kafadar, “Editorial: Statistical Significance, P-Values, and Replicability,” 15 Annals of Applied Statistics (2021).

[16] Yoav Benjamini, Richard D. De Veaux, Bradley Efron, Scott Evans, Mark Glickman, Barry I. Graubard, Xuming He, Xiao-Li Meng, Nancy M. Reid, Stephen M. Stigler, Stephen B. Vardeman, Christopher K. Wikle, Tommy Wright, Linda J. Young & Karen Kafadar, “ASA President’s Task Force Statement on Statistical Significance and Replicability,” 34 Chance 10 (2021).

Confounded by Confounding in Unexpected Places

December 12th, 2021

In assessing an association for causality, the starting point is “an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance.”[1] In other words, before we even embark on consideration of Bradford Hill’s nine considerations, we should have ruled out chance, bias, and confounding as an explanation for the claimed association.[2]

Although confounding is sometimes considered as a type of systematic bias, its importance warrants its own category. Historically, courts have been rather careless in addressing confounding. The Supreme Court, in a case decided before Daubert and the statutory modifications to Rule 702, ignored the role of confounding in a multiple regression model used to support racial discrimination claims. In language that would be reprised many times to avoid and evade the epistemic demands of Rule 702, the Court held, in Bazemore, that the omission of variables in multiple regression models raises an issue that affects “the analysis’ probativeness, not its admissibility.”[3]

When courts have not ignored confounding,[4] they have sidestepped its consideration by imparting magical abilities to confidence intervals to take care of the problems posed by lurking variables.[5]

The advent of the Reference Manual on Scientific Evidence allowed a ray of hope to shine on health effects litigation. Several important cases have been decided by judges who have taken note of the importance of assessing studies for confounding.[6] As a new, fourth edition of the Manual is being prepared, its editors and authors should not lose sight of the work that remains to be done.

The Third Edition of the Federal Judicial Center’s and the National Academies of Sciences, Engineering & Medicine’s Reference Manual on Scientific Evidence (RMSE3d 2011) addressed confounding in several chapters, not always consistently. The chapter on statistics defined “confounder” in terms of correlation with both the independent and dependent variables:

“[a] confounder is correlated with the independent variable and the dependent variable. An association between the dependent and independent variables in an observational study may not be causal, but may instead be due to confounding”[7]

The chapter on epidemiology, on the other hand, defined a confounder as a risk factor for both the exposure and disease outcome of interest:

“A factor that is both a risk factor for the disease and a factor associated with the exposure of interest. Confounding refers to a situation in which an association between an exposure and outcome is all or partly the result of a factor that affects the outcome but is unaffected by the exposure.”[8]

Unfortunately, the epidemiology chapter never defined “risk factor.” The term certainly seems much less neutral than a “correlated” variable, which lacks any suggestion of causality. Perhaps there is some implied help from the authors of the epidemiology chapter when they described a case of confounding by “known causal risk factors,” which suggests that some risk factors may not be causal.[9] To muck up the analysis, however, the epidemiology chapter went on to define “risk” as “[a] probability that an event will occur (e.g., that an individual will become ill or die within a stated period of time or by a certain age).”[10]

Both the statistics and the epidemiology chapters provide helpful examples of confounding and speak to the need for excluding confounding as the basis for an observed association. The statistics chapter, for instance, described confounding as a threat to “internal validity,”[11] and the need to inquire whether the adjustments in multivariate studies were “sensible and sufficient.”[12]
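
In the same spirit as those examples, a short simulation makes the abstraction concrete. The sketch below (Python with numpy; the confounder, exposure, and outcome are entirely hypothetical) constructs a data set in which a lurking variable drives both exposure and outcome. The crude risk ratio looks impressive even though the exposure does nothing; stratifying on the confounder makes the apparent association vanish.

  # Hypothetical illustration of confounding: a third variable (say, age)
  # raises both the probability of exposure and the probability of the
  # outcome, while the exposure itself has no effect.
  import numpy as np
  rng = np.random.default_rng(42)
  n = 100_000
  confounder = rng.random(n) < 0.5                  # e.g., "older" vs. "younger"
  exposed = rng.random(n) < np.where(confounder, 0.7, 0.2)
  outcome = rng.random(n) < np.where(confounder, 0.30, 0.05)
  def risk(mask):
      return outcome[mask].mean()
  print(f"crude risk ratio: {risk(exposed) / risk(~exposed):.2f}")  # well above 1
  for level, label in [(True, "confounder present"), (False, "confounder absent")]:
      stratum = confounder == level
      rr = risk(stratum & exposed) / risk(stratum & ~exposed)
      print(f"risk ratio, {label}: {rr:.2f}")                       # each near 1.0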

The epidemiology chapter in one passage instructed that when “an association is uncovered, further analysis should be conducted to assess whether the association is real or a result of sampling error, confounding, or bias.”[13] Elsewhere in the same chapter, the precatory becomes mandatory.[14]

Legally Unexplored Source of Substantial Confounding

As the Reference Manual implies, attempting to control for confounding is not by itself adequate; the controlling must be done carefully and sufficiently. Under the heading of sufficiency and due care, there are epidemiologic studies that purport to control for confounding, but fail rather dramatically. The use of administrative databases, whether based upon national healthcare or insurance claims, has become commonplace in chronic disease epidemiology. Their large size obviates many concerns about power to detect rare disease outcomes. Unfortunately, there is often a significant threat to the validity of such studies, which are based upon data sets that characterize patients as diabetic, hypertensive, obese, or smokers vel non. Dichotomizing what are continuous variables extracts a significant price in the multivariate models used in epidemiology.

Of course, physicians frequently create guidelines for normal versus abnormal, and these divisions or categories show up in medical records, in databases, and ultimately in epidemiologic studies. The actual measurements are not always available, and the use of a categorical variable may appear to simplify the statistical analysis of the dataset. Unfortunately, the results can be quite misleading. Consider the measurement of blood pressure in a study that is evaluating whether an exposure variable (such as medication use or an environmental contaminant) is associated with an outcome such as cardiovascular or renal disease. Hypertension, if present, would clearly be a confounder, but the use of a categorical variable for hypertension would greatly undermine the validity of the study. If many of the study participants with hypertension had their condition well controlled by medication, then the categorical variable will dilute the adjustment for the role of hypertension in driving the association between the exposure and outcome variables of interest. Even if none of the hypertensive patients had good control, the reduction of all hypertension to a category, rather than a continuous measurement, is a path to the loss of information and the creation of bias.

Almost 40 years ago, Jacob Cohen showed that dichotomization of continuous variables results in a loss of power.[15] Twenty years later, Peter Austin showed in a Monte Carlo simulation that categorizing a continuous variable in a logistic regression results in inflating the rate of finding false-positive associations.[16] The type I (false-positive) error rate increases with sample size, with increasing correlation between the confounding variable and the outcome of interest, and with the number of categories used for the continuous variables. Of course, the national databases often have huge sample sizes, which only serves to increase the bias from the use of categorical variables for confounding variables.
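
The inflation Austin described can be illustrated with a short simulation, shown below in Python (numpy and statsmodels; all parameters are hypothetical and chosen only for illustration, and the sketch is not a reproduction of Austin’s design). A continuous confounder drives both the exposure and the outcome, while the exposure has no true effect. Adjusting for the confounder as measured keeps the false-positive rate near the nominal five percent; adjusting only for a dichotomized version of the same variable leaves residual confounding and a markedly higher false-positive rate.

  # Hypothetical sketch of residual confounding from dichotomizing a
  # continuous confounder in logistic regression (not Austin's actual design).
  import numpy as np
  import statsmodels.api as sm
  rng = np.random.default_rng(7)
  n, sims, alpha = 2_000, 200, 0.05
  false_pos = {"continuous": 0, "dichotomized": 0}
  for _ in range(sims):
      confounder = rng.normal(size=n)                      # e.g., blood pressure
      exposure = (confounder + rng.normal(size=n) > 0).astype(float)
      p_outcome = 1 / (1 + np.exp(-(-1.0 + confounder)))   # depends only on confounder
      outcome = (rng.random(n) < p_outcome).astype(float)
      adjustments = {"continuous": confounder,
                     "dichotomized": (confounder > 0).astype(float)}
      for label, adj in adjustments.items():
          X = sm.add_constant(np.column_stack([exposure, adj]))
          p_exposure = sm.Logit(outcome, X).fit(disp=0).pvalues[1]
          false_pos[label] += (p_exposure < alpha)
  for label, count in false_pos.items():
      print(f"adjusting for {label} confounder: false-positive rate {count / sims:.2f}")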

The late Douglas Altman, who did so much to steer the medical literature toward greater validity, warned that dichotomizing continuous variables was known to cause loss of information, statistical power, and reliability in medical research.[17]

In the field of pharmaco-epidemiology, the bias created by dichotomization of a continuous variable is harmful from the perspectives of both statistical estimation and hypothesis testing.[18] While readers are misled into believing that the study adjusts for important co-variates, the study will have lost information and power, with the result of presenting false-positive results that have the false allure of a fully adjusted model. Indeed, this bias from inadequate control of confounding infects several pending pharmaceutical multi-district litigations.


Supreme Court

General Electric Co. v. Joiner, 522 U.S. 136, 145-46 (1997) (holding that an expert witness’s reliance on a study was misplaced when the subjects of the study “had been exposed to numerous potential carcinogens”)

First Circuit

Bricklayers & Trowel Trades Internat’l Pension Fund v. Credit Suisse Securities (USA) LLC, 752 F.3d 82, 89 (1st Cir. 2014) (affirming exclusion of expert witness who failed to account for confounding in event studies), aff’g 853 F. Supp. 2d 181, 188 (D. Mass. 2012)

Second Circuit

Wills v. Amerada Hess Corp., 379 F.3d 32, 50 (2d Cir. 2004) (holding expert witness’s specific causation opinion that plaintiff’s squamous cell carcinoma had been caused by polycyclic aromatic hydrocarbons was unreliable, when plaintiff had smoked and drunk alcohol)

Deutsch v. Novartis Pharms. Corp., 768 F.Supp. 2d 420, 432 (E.D.N.Y. 2011) (“When assessing the reliability of a epidemiologic study, a court must consider whether the study adequately accounted for “confounding factors.”)

Schwab v. Philip Morris USA, Inc., 449 F. Supp. 2d 992, 1199–1200 (E.D.N.Y. 2006), rev’d on other grounds, 522 F.3d 215 (2d Cir. 2008) (describing confounding in studies of low-tar cigarettes, in which authors failed to account for confounding by healthier life styles among users)

Third Circuit

In re Zoloft Prods. Liab. Litig., 858 F.3d 787, 793 (3d Cir. 2017) (affirming exclusion of causation expert witness)

Magistrini v. One Hour Martinizing Dry Cleaning, 180 F. Supp. 2d 584, 591 (D.N.J. 2002), aff’d, 68 Fed. Appx. 356 (3d Cir. 2003) (bias, confounding, and chance must be ruled out before an association may be accepted as causal)

Soldo v. Sandoz Pharms. Corp., 244 F. Supp. 2d 434 (W.D.Pa. 2003) (excluding expert witnesses in Parlodel case; noting that causality assessments and case reports fail to account for confounding)

Wade-Greaux v. Whitehall Labs., Inc., 874 F. Supp. 1441 (D.V.I. 1994) (unanswered questions about confounding required summary judgment  against plaintiff in Primatene Mist birth defects case)

Fifth Circuit

Knight v. Kirby Inland Marine, Inc., 482 F.3d 347, 353 (5th Cir. 2007) (affirming exclusion of expert witnesses) (“Of all the organic solvents the study controlled for, it could not determine which led to an increased risk of cancer …. The study does not provide a reliable basis for the opinion that the types of chemicals appellants were exposed to could cause their particular injuries in the general population.”)

Burst v. Shell Oil Co., C. A. No. 14–109, 2015 WL 3755953, *7 (E.D. La. June 16, 2015) (excluding expert witness causation opinion that failed to account for other confounding exposures that could have accounted for the putative association), aff’d, 650 F. App’x 170 (5th Cir. 2016)

LeBlanc v. Chevron USA, Inc., 513 F. Supp. 2d 641, 648-50 (E.D. La. 2007) (excluding expert witness testimony that purported to show causality between plaintiff’s benzene exposure and myelofibrosis), vacated, 275 Fed. App’x 319 (5th Cir. 2008) (remanding case for consideration of new government report on health effects of benzene)

Castellow v. Chevron USA, 97 F. Supp. 2d 780 (S.D. Tex. 2000) (discussing confounding in passing; excluding expert witness causation opinion in gasoline exposure AML case)

Kelley v. American Heyer-Schulte Corp., 957 F. Supp. 873 (W.D. Tex. 1997) (confounding in breast implant studies)

Sixth Circuit

Pluck v. BP Oil Pipeline Co., 640 F.3d 671 (6th Cir. 2011) (affirming exclusion of specific causation opinion that failed to rule out confounding factors)

Nelson v. Tennessee Gas Pipeline Co., 243 F.3d 244, 252-54 (6th Cir. 2001) (expert witness’s failure to account for confounding factors in a cohort study of alleged PCB exposures rendered his opinion unreliable)

Turpin v. Merrell Dow Pharms., Inc., 959 F.2d 1349, 1355-57 (6th Cir. 1992) (discussing failure of some studies to evaluate confounding)

Adams v. Cooper Indus. Inc., 2007 WL 2219212, 2007 U.S. Dist. LEXIS 55131 (E.D. Ky. 2007) (differential diagnosis includes ruling out confounding causes of plaintiffs’ disease).

Seventh Circuit

People Who Care v. Rockford Bd. of Educ., 111 F.3d 528, 537–38 (7th Cir. 1997) (noting importance of considering role of confounding variables in educational achievement);

Caraker v. Sandoz Pharms. Corp., 188 F. Supp. 2d 1026, 1032, 1036 (S.D. Ill 2001) (noting that “the number of dechallenge/rechallenge reports is too scant to reliably screen out other causes or confounders”)

Eighth Circuit

Penney v. Praxair, Inc., 116 F.3d 330, 333-334 (8th Cir. 1997) (affirming exclusion of expert witness who failed to account for the confounding effects of age, medications, and medical history in interpreting PET scans)

Marmo v. Tyson Fresh Meats, Inc., 457 F.3d 748, 758 (8th Cir. 2006) (affirming exclusion of specific causation expert witness opinion)

Ninth Circuit

Coleman v. Quaker Oats Co., 232 F.3d 1271, 1283 (9th Cir. 2000) (p-value of “3 in 100 billion” was not probative of age discrimination when “Quaker never contend[ed] that the disparity occurred by chance, just that it did not occur for discriminatory reasons. When other pertinent variables were factored in, the statistical disparity diminished and finally disappeared.”)

In re Viagra & Cialis Prods. Liab. Litig., 424 F.Supp. 3d 781 (N.D. Cal. 2020) (excluding causation opinion on grounds including failure to account properly for confounding)

Avila v. Willits Envt’l Remediation Trust, 2009 WL 1813125, 2009 U.S. Dist. LEXIS 67981 (N.D. Cal. 2009) (excluding expert witness opinion that failed to rule out confounding factors of other sources of exposure or other causes of disease), aff’d in relevant part, 633 F.3d 828 (9th Cir. 2011)

In re Phenylpropanolamine Prods. Liab. Litig., 289 F.Supp.2d 1230 (W.D.Wash. 2003) (ignoring study validity in a litigation arising almost exclusively from a single observational study that had multiple internal and external validity problems; relegating assessment of confounding to cross-examination)

In re Bextra and Celebrex Marketing Sales Practice, 524 F. Supp. 2d 1166, 1172 – 73 (N.D. Calif. 2007) (discussing invalidity caused by confounding in epidemiologic studies)

In re Silicone Gel Breast Implants Products Liab. Lit., 318 F.Supp. 2d 879, 893 (C.D.Cal. 2004) (observing that controlling for potential confounding variables is required, among other findings, before accepting epidemiologic studies as demonstrating causation).

Henricksen v. ConocoPhillips Co., 605 F. Supp. 2d 1142 (E.D. Wash. 2009) (noting that confounding must be ruled out)

Valentine v. Pioneer Chlor Alkali Co., Inc., 921 F. Supp. 666 (D. Nev. 1996) (excluding plaintiffs’ expert witnesses, including Dr. Kilburn, for reliance upon study that failed to control for confounding)

Tenth Circuit

Hollander v. Sandoz Pharms. Corp., 289 F.3d 1193, 1213 (10th Cir. 2002) (noting importance of accounting for confounding variables in causation of stroke)

In re Breast Implant Litig., 11 F. Supp. 2d 1217, 1233 (D. Colo. 1998) (alternative explanations, such as confounding, should be ruled out before accepting causal claims).

Eleventh Circuit

In re Abilify (Aripiprazole) Prods. Liab. Litig., 299 F.Supp. 3d 1291 (N.D.Fla. 2018) (discussing confounding in studies but credulously accepting challenged explanations from David Madigan) (citing Bazemore, a pre-Daubert, decision that did not address a Rule 702 challenge to opinion testimony)

District of Columbia Circuit

American Farm Bureau Fed’n v. EPA, 559 F.3d 512 (D.C. Cir. 2009) (noting that data relied upon in setting particulate matter standards addressing visibility should avoid the confounding effects of humidity)

STATES

Delaware

In re Asbestos Litig., 911 A.2d 1176 (New Castle Cty., Del. Super. 2006) (discussing confounding; denying motion to exclude plaintiffs’ expert witnesses’ chrysotile causation opinions)

Minnesota

Goeb v. Tharaldson, 615 N.W.2d 800, 808, 815 (Minn. 2000) (affirming exclusion of Drs. Janette Sherman and Kaye Kilburn, in Dursban case, in part because of expert witnesses’ failures to consider confounding adequately).

New Jersey

In re Accutane Litig., 234 N.J. 340, 191 A.3d 560 (2018) (affirming exclusion of plaintiffs’ expert witnesses’ causation opinions; deprecating reliance upon studies not controlled for confounding)

In re Proportionality Review Project (II), 757 A.2d 168 (N.J. 2000) (noting the importance of assessing the role of confounders in capital sentences)

Grassis v. Johns-Manville Corp., 591 A.2d 671, 675 (N.J. Super. Ct. App. Div. 1991) (discussing the possibility that confounders may lead to an erroneous inference of a causal relationship)

Pennsylvania

Porter v. SmithKline Beecham Corp., No. 3516 EDA 2015, 2017 WL 1902905 (Pa. Super. May 8, 2017) (affirming exclusion of expert witness causation opinions in Zoloft birth defects case; discussing the importance of excluding confounding)

Tennessee

McDaniel v. CSX Transportation, Inc., 955 S.W.2d 257 (Tenn. 1997) (affirming trial court’s refusal to exclude expert witness opinion that failed to account for confounding)


[1] Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 295 (1965) (emphasis added).

[2] See, e.g., David A. Grimes & Kenneth F. Schulz, “Bias and Causal Associations in Observational Research,” 359 The Lancet 248 (2002).

[3] Bazemore v. Friday, 478 U.S. 385, 400 (1986) (reversing the Court of Appeals’ decision that would have disallowed a multiple regression analysis that omitted important variables). Buried in a footnote, the Court did note, however, that “[t]here may, of course, be some regressions so incomplete as to be inadmissible as irrelevant; but such was clearly not the case here.” Id. at 400 n.10. What the Court missed, of course, is that the regression may be so incomplete as to be unreliable or invalid. The invalidity of the regression in Bazemore does not appear to have been raised as an evidentiary issue under Rule 702. None of the briefs in the Supreme Court or the judicial opinions cited or discussed Rule 702.

[4] “Confounding in the Courts” (Nov. 2, 2018).

[5] See, e.g., Brock v. Merrell Dow Pharmaceuticals, Inc., 874 F.2d 307, 311-12 (5th Cir. 1989) (“Fortunately, we do not have to resolve any of the above questions [as to bias and confounding], since the studies presented to us incorporate the possibility of these factors by the use of a confidence interval.”). This howler has been widely acknowledged in the scholarly literature. See David Kaye, David Bernstein, and Jennifer Mnookin, The New Wigmore – A Treatise on Evidence: Expert Evidence § 12.6.4, at 546 (2d ed. 2011); Michael O. Finkelstein, Basic Concepts of Probability and Statistics in the Law 86-87 (2009) (criticizing the blatantly incorrect interpretation of confidence intervals by the Brock court).

[6] “On Praising Judicial Decisions – In re Viagra” (Feb. 8, 2021); see “Ruling Out Bias and Confounding Is Necessary to Evaluate Expert Witness Causation Opinions” (Oct. 28, 2018); “Rule 702 Requires Courts to Sort Out Confounding” (Oct. 31, 2018).

[7] David H. Kaye and David A. Freedman, “Reference Guide on Statistics,” in RMSE3d 211, 285 (3d ed. 2011).

[8] Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” in RMSE3d 549, 621.

[9] Id. at 592.

[10] Id. at 627.

[11] Id. at 221.

[12] Id. at 222.

[13] Id. at 567-68 (emphasis added).

[14] Id. at 572 (describing chance, bias, and confounding, and noting that “[b]efore any inferences about causation are drawn from a study, the possibility of these phenomena must be examined”); id. at 511 n.22 (observing that “[c]onfounding factors must be carefully addressed”).

[15] Jacob Cohen, “The cost of dichotomization,” 7 Applied Psychol. Measurement 249 (1983).

[16] Peter C. Austin & Lawrence J. Brunner, “Inflation of the type I error rate when a continuous confounding variable is categorized in logistic regression analyses,” 23 Statist. Med. 1159 (2004).

[17] See, e.g., Douglas G. Altman & Patrick Royston, “The cost of dichotomising continuous variables,” 332 Brit. Med. J. 1080 (2006); Patrick Royston, Douglas G. Altman, and Willi Sauerbrei, “Dichotomizing continuous predictors in multiple regression: a bad idea,” 25 Stat. Med. 127 (2006). See also Robert C. MacCallum, Shaobo Zhang, Kristopher J. Preacher, and Derek D. Rucker, “On the Practice of Dichotomization of Quantitative Variables,” 7 Psychological Methods 19 (2002); David L. Streiner, “Breaking Up is Hard to Do: The Heartbreak of Dichotomizing Continuous Data,” 47 Can. J. Psychiatry 262 (2002); Henian Chen, Patricia Cohen, and Sophie Chen, “Biased odds ratios from dichotomization of age,” 26 Statist. Med. 3487 (2007); Carl van Walraven & Robert G. Hart, “Leave ‘em Alone – Why Continuous Variables Should Be Analyzed as Such,” 30 Neuroepidemiology 138 (2008); O. Naggara, J. Raymond, F. Guilbert, D. Roy, A. Weill, and Douglas G. Altman, “Analysis by Categorizing or Dichotomizing Continuous Variables Is Inadvisable,” 32 Am. J. Neuroradiol. 437 (Mar 2011); Neal V. Dawson & Robert Weiss, “Dichotomizing Continuous Variables in Statistical Analysis: A Practice to Avoid,” Med. Decision Making 225 (2012); Phillippa M Cumberland, Gabriela Czanner, Catey Bunce, Caroline J Doré, Nick Freemantle, and Marta García-Fiñana, “Ophthalmic statistics note: the perils of dichotomising continuous variables,” 98 Brit. J. Ophthalmol. 841 (2014).

[18] Valerii Fedorov, Frank Mannino, and Rongmei Zhang, “Consequences of dichotomization,” 8 Pharmaceut. Statist. 50 (2009).

When the American Medical Association Woke Up

November 17th, 2021

“You are more than entitled not to know what the word ‘performative’ means. It is a new word and an ugly word, and perhaps it does not mean anything very much. But at any rate there is one thing in its favor, it is not a profound word.”

J.L. Austin, “Performative Utterances,” in Philosophical Papers 233 (2nd ed. 1970).

John Langshaw Austin, J.L. to his friends, was an English philosopher who focused on language and how it actually worked in the real world. Austin developed the concept of performative utterances, which have since come to be known as “speech acts.” Little did J.L. know that performative utterances would come to dominate politics and social media.

The key aspect of spoken words that function as speech acts is that they do not simply communicate information, which might have some truth value and some epistemic basis. Speech acts consist of actual conduct, such as promising, commanding, apologizing, etc.[1] The law has long implicitly recognized the distinction between factual assertions or statements and speech acts. The Federal Rules of Evidence, for instance, limit the rule against hearsay to “statements,” meaning oral or written assertions, or nonverbal conduct (such as nodding in agreement) that is intended as an assertion.[2]

When persons in wedding ceremonies say “I do” at the appropriate moments, they are married by virtue of their speech acts. The same is true of contracts and of other promises made under circumstances that give rise to enforceable obligations. A witness’s recounting another’s vows or promises is not hearsay, because the witness is offering a recollection only for the fact that the utterance was made, and not to prove the truth of a matter asserted.[3]

The notion of a speech act underlies much political behavior these days. When people palaver about Q, or some QAnon conspiracy, the principle of charity requires us to understand them as not speaking words that can be true or false, but simply signaling their loyalty to a lost cause, usually associated with the loser of the 2020 presidential election. By exchanging ridiculous and humiliating utterances, fellow cultists are signaling loyalty, not making a statement about the world. Their “speech acts” are similar to rituals of exchanging blood with pledges of fraternity.

Of course, there are morons who show up at concerts expecting John F. Kennedy, Jr., to appear, or who show up at pizza places in Washington, D.C., armed with semiautomatic rifles, because their credulity outstripped the linguistic nuances of performative utterances about the Clintons. In days past, members of a cult would get a secret tattoo or wear a special piece of jewelry. Now, the way to show loyalty is to say stupid things in public, and not to laugh when your fellow cultists say similar things.

Astute observers of political systems, on both the left (George Orwell) and the right (Eric Voegelin) have long recognized that ideologies destroy language, including speech acts and performative utterances. The destructive capacities of ideologies are especially disturbing when they invade science and medicine. Alas, the ideology of the Woke has arrived in the halls of the American Medical Association (AMA).

Last month, the AMA issued its guide to politically correct language, designed to advance health “equity”: “Advancing Health Equity: A Guide to Language, Narrative and Concepts” (Nov. 2, 2021). The 54-page guide is, at times, worthy of a MAD magazine parody, but the document quickly transcends parody to take us into an Orwellian nightmare of thought control in the name of neo-Marxist “social justice” goals.[4]

In its guide to language best practices, the AMA urges us to promote health equity by adding progressive political language to what were once simple statements of fact. The AMA document begins with what seems an affected, insincere humility:

“We share this document with humility. We recognize that language evolves, and we are mindful that context always matters. This guide is not and cannot be a check list of correct answers. Instead, we hope that this guide will stimulate critical thinking about language, narrative and concepts—helping readers to identify harmful phrasing in their own work and providing alternatives that move us toward racial justice and health equity.”

This pretense at humility quickly evaporates as the document’s tone becomes increasingly censorious and strident. The AMA seems less concerned with truth, evidence-based conclusions, or dialogue than with conformity to the social justice norms of the Woke mob.

In Table 1, the AMA introduces some “Key Principles and Associated Terms.” “Avoid use of adjectives such as vulnerable, marginalized and high-risk,” at least as to persons. Why? The AMA tells us that the use of such terms to describe individuals is “stigmatizing.” The terms are vague and imply (to the AMA) that the condition is inherent to the group rather than the actual root cause, which seems to be mostly, in the AMA’s view, the depredations of white cis-gendered men. To cure the social injustice, the AMA urges us to speak in terms of groups and communities (never individuals) that “have been historically marginalized or made vulnerable, or underserved, or under-resourced [sic], or experience disadvantage [sic].” The squishy passive voice pervades the AMA Guide, but the true subject – the oppressor – is easy to discern.

Putting aside the recurrent, barbarous use of the passive voice, we now must have medical articles that are sociological treatises. The AMA appears to be especially sensitive, perhaps hypersensitive, to what it considers “unintentional blaming.” For example, rather than discuss “[w]orkers who do not use PPE [personal protective equipment]” or “people who do not seek healthcare,” the AMA instructs authors, without any apparent embarrassment or shame, to “try” substituting “workers under-resourced with” PPE, or “people with limited access to” healthcare.

Aside from assuaging the AMA’s social justice warriors, the substitutions are not remotely synonymous. There have been, there are, and there will likely always be workers and others who do not use protective equipment. There have been, there are, and there will likely always be persons who do not seek healthcare. For example, anti-vaxxing yutzballs can be found in all social strata and walks of life. Access to equipment or healthcare is a completely independent issue and concern. The AMA’s effort to hide these facts with the twisted passive-voice contortions assaults our language and our common sense.

Table 2 of the AMA Guide provides a list of commonly used words and phrases and the “equity-focused alternatives.”

“Disadvantaged” in Woke Speak becomes “historically and intentionally excluded.” The aspirational goal of “equality” is recast as “equity.” After all, mere equality, or treating everyone alike:

“ignores the historical legacy of disinvestment and deprivation through policy of historically marginalized and minoritized [sic] communities as well as contemporary forms of discrimination that limit opportunities. Through systematic oppression and deprivation from ethnocide, genocide, forced removal from land and slavery, Indigenous and Black people have been relegated to the lowest socioeconomic ranks of this country. The ongoing xenophobic treatment of undocumented brown people and immigrants (including Indigenous people disposed of their land in other countries) is another example. Intergenerational wealth has mainly benefited and exists for white families.”

In other words, treating people equally is racist. Non-racist is also racist. “Fairness” must also be banished; the equity-focused AMA requires “Social Justice.” Mere fairness pays “no attention” to power relations, and enforced distribution outcomes.

Illegal immigrants are, per the AMA guidelines, transformed into “undocumented immigrants,” because “illegal” is “a dehumanizing, derogatory term,” and because “[n]o human being is illegal.” The latter is a lovely sentiment, but human beings can be in countries unlawfully, just as they can be in the Capitol Building illegally.

“Non-compliance” is transmuted into “non-adherence,” because the former term “places blame for treatment failure solely on patients.” The latter term is suggested to exculpate patients, even though patients can be solely responsible for failing to follow prescribed treatment. The AMA wants, however, to remind us that non-adherence may result from “frustration and legitimate mistrust of health care, structural barriers that limit availability and accessibility of medications (including cost, insurance barriers and pharmacy deserts), time and resource constraints (including work hours, family responsibilities), and lack of effective communication about severity of disease or symptoms.” All true, but why not add sloth, stupidity, and superstition? We are still in a pandemic that has been fueled by non-compliance that largely warrants blame on the non-compliant.

The AMA wanders into fraught territory when it tells us impassively that identifying a “social problem” is now a sign of insensitivity. The AMA Woke Guide advises that social problems are really “social injustices.” Referring to a phenomenon as a social problem risks blaming people for their own “marginalization.” The term “marginalization” is part of the Social Justice jargon, and it occurs throughout the AMA Woke Guide. A handy glossary at the end of the document is provided for those of us who have not grown up in Woke culture:

“Marginalization: Process experienced by those under- or unemployed or in poverty, unable to participate economically or socially in society, including the labor market, who thereby suffer material as well as social deprivation.”[5]

The Woke apparently know that calling something a mere “social problem” makes it “seem less serious than social injustice,” and there is some chance that labeling a social phenomenon as a social problem risks “potentially blaming people for their own marginalization.” And yet not every social problem is a social injustice. Underage drinking and unprotected sex are social problems, as are widespread obesity and prevalent diabetes. Alcoholism is a social problem that is prevalent in all social strata; it is hardly a social injustice.

At page 23 of the Woke Guide, the AMA’s political hostility to individual agency and autonomy breaks through in a screed against meritocracy:

“Among these ideas is the concept of meritocracy, a social system in which advancement in society is based on an individual’s capabilities and merits rather than on the basis of family, wealth or social background. Individualism is problematic in obscuring the dynamics of group domination, especially socioeconomic privilege and racism. In health care, this narrative appears as an over-emphasis on changing individuals and individual behavior instead of the institutional and structural causes of disease.”

Good grief, now physicians cannot simply treat a person for a disease, they must treat entire tribes!

Table 5

Some of the most egregious language of the Woke Guide can be seen in its Table 5, entitled “Contrasting Conventional (Well-intentioned) Phrasing with Equity-focused Language that Acknowledges Root Causes of Inequities.” Table 5 makes clear that the AMA is working from a sociological program that is supported by implicit claims of knowledge for the “root causes” of inequities, a claim that should give everyone serious pause. After all, even if often disappointed, the readers of AMA journals expect rigorous scientific studies, carefully written and edited, which contribute to evidence-based medicine. There is nothing, however, in the AMA Guide, other than its ipse dixit, to support its claimed social justice etiologies.

Table 5 of the AMA Guide provides some of its most far-reaching efforts to impose a political vision through semantic legerdemain. Despite the lack of support for its claimed root causes, the AMA would force writers to assign Social Justice approved narratives and causation. A seemingly apolitical, neutral statement, such as:

“Low-income people have the highest level of coronary artery disease in the United States.”

now must be recast into sanctimonious cant that would warm the cockles of a cold Stalinist’s heart:

“People underpaid and forced into poverty as a result of banking policies, real estate developers gentrifying neighborhoods, and corporations weakening the power of labor movements, among others, have the highest level of coronary artery disease in the United States.”

Banks, corporations, and real estate developers have agency; people do not. With such verbiage, it will be hard to enforce page limits on manuscripts submitted to AMA journals. More important, however, is that the “root cause” analysis is untrue in many cases. In countries where private property is banned and labor owns the means of production, low-income people still have higher rates of disease. The socio-economic variable is important, and consistent, across the globe, even in democratic socialist countries such as Sweden, or in Marxist paradises such as the People’s Republic of China and the former Soviet Union. The bewildered may wonder whether the AMA has ever heard of a control group. Maybe, just maybe, the increased incidence of coronary artery disease among the poor has more to do with Cheez Doodles than with the ravages of capitalism.

CRITICAL REACTIONS

The AMA’s guide to linguistic etiquette is a transparent effort to advance a political agenda under the guise of language mandates. The AMA is not merely prescribing thoughtful substitutions for common phrases; the AMA guide is nothing less than an attempt to impose a “progressive” ideology with fulsome apologies. The AMA not only embraces, unquestioningly, the ideology of “white fragility,” Ibram Kendi, and Robin DiAngelo; the AMA at times appears on the verge of medicalizing the behaviors of those who question or reject its Woke ideology. Is a psychiatric gulag the next step?

Dr. Michelle Cretella, the executive director of the American College of Pediatricians, expressed her concern that the AMA’s “social justice” plans are “rooted not in science and the medical ethics of the Hippocratic Oath, but in a host of Marxist ideologies that devalue the lives of our most vulnerable patients and seek to undermine the nuclear family which is the single most critical institution to child well-being.”[6]

Journalist Jesse Singal thinks that the AMA has gone berserk.[7] And Matt Bai, at the Washington Post, saw the AMA’s co-opting of language and narratives as having an Orwellian tone, resembling Mao’s “Little Red Book.”[8] The Post writer raised the interesting question of why the AMA was even in the business of admonishing physicians and scientists about acceptable language. After all, the editors of Fowler’s Modern English Usage have managed for decades to eschew offering guidance on performing surgery. The Post opinion piece expresses a realistic concern that proposing “weird language” will worsen the current fraying of the social fabric, and pave the way for a Trump Restoration. Perhaps the AMA should stick to medicine rather than “mandating versions of history and their own lists of acceptable terminology.”

AMA Woke Speak has its antecedents,[9] and it will likely have its followers. For lawyers who work with expert witnesses, the AMA guide risks subjecting their medical witnesses to embarrassment, harassment, and impeachment for failing to comply with the new ideological orthodoxy. Just say no.


[1] See generally John L. Austin, How to Do Things with Words: The William James Lectures delivered at Harvard University in 1955 (1962).

[2] See Fed. R. Evid. Rule 801(a) & Notes of Advisory Comm. Definitions That Apply to This Article; Exclusions from Hearsay (defining statement).


[3] See, e.g., Emich Motors Corp. v. General Motors Corp., 181 F.2d 70 (7th Cir. 1950), rev’d on other grounds 340 U.S. 558 (1951).

[4] Harriet Hall, “The AMA’s Guide to Politically Correct Language: Advancing Health Equity,” Science Based Medicine (Nov. 2, 2021).

[5] Citing Foster Osei Baah, Anne M. Teitelman & Barbara Riegel, “Marginalization: Conceptualizing patient vulnerabilities in the framework of social determinants of health – an integrative review,” 26 Nurs. Inq. e12268 (2019).

[6] Jeff Johnston, “Woke Medicine: ‘The AMA’s Strategic Plan to Embed Racial Justice and Advance Health Equity’,” The Daily Citizen (May 21, 2021).

[7] Jesse Singal, “The AMA jumps the Woke Shark, introduces Medspeak,” Why Evolution is True (Nov. 1, 2021).

[8] Matt Bai, “Paging Dr. Orwell. The American Medical Association takes on the politics of language,” Wash. Post (Nov. 3, 2021).

[9] Office of Minority Health, U.S. Department of Health and Human Services, “National Standards for Culturally and Linguistically Appropriate Services in Health and Health Care: A Blueprint for Advancing and Sustaining CLAS Policy and Practice” (2013); Association of State and Territorial Health Officials, “Health equity terms” (2018).

Reference Manual on Scientific Evidence – 3rd Edition is Past Its Expiry

October 17th, 2021

INTRODUCTION

The new, third edition of the Reference Manual on Scientific Evidence was released to the public in September 2011, as a joint production of the National Academies of Science, and the Federal Judicial Center. Within a year of its publication, I wrote that the Manual needed attention on several key issues. Now that there is a committee working on the fourth edition, I am reprising the critique, slightly modified, in the hope that it may make a difference for the fourth edition.

The Development Committee for the third edition included Co-Chairs, Professor Jerome Kassirer, of Tufts University School of Medicine, and the Hon. Gladys Kessler, who sits on the District Court for the District of Columbia.  The members of the Development Committee included:

  • Ming W. Chin, Associate Justice, The Supreme Court of California
  • Pauline Newman, Judge, Court of Appeals for the Federal Circuit
  • Kathleen O’Malley, Judge, Court of Appeals for the Federal Circuit (formerly a district judge on the Northern District of Ohio)
  • Jed S. Rakoff, Judge, Southern District of New York
  • Channing Robertson, Professor of Engineering, Stanford University
  • Joseph V. Rodricks, Principal, Environ
  • Allen Wilcox, Senior Investigator, National Institute of Environmental Health Sciences
  • Sandy L. Zabell, Professor of Statistics and Mathematics, Weinberg College of Arts and Sciences, Northwestern University

Joe S. Cecil, Project Director, Program on Scientific and Technical Evidence, in the Federal Judicial Center’s Division of Research, who shepherded the first two editions, served as consultant to the Committee.

With over 1,000 pages, there was much to digest in the third edition of the Reference Manual on Scientific Evidence (RMSE 3d).  Much of what is covered was solid information on the individual scientific and technical disciplines covered.  Although the information is easily available from other sources, there is some value in collecting the material in a single volume for the convenience of judges and lawyers.  Of course, given that this information is provided to judges from an ostensibly neutral, credible source, lawyers will naturally focus on what is doubtful or controversial in the RMSE. To date, there have been only a few reviews and acknowledgments of the new edition.[1]

Like previous editions, the substantive scientific areas were covered in discrete chapters, written by subject matter specialists, often along with a lawyer who addresses the legal implications and judicial treatment of that subject matter.  From my perspective, the chapters on statistics, epidemiology, and toxicology were the most important in my practice and in teaching, and I have focused on issues raised by these chapters.

The strengths of the chapter on statistical evidence, updated from the second edition, remained, as did some of the strengths and flaws of the chapter on epidemiology.  In addition, there was a good deal of overlap among the chapters on statistics, epidemiology, and medical testimony.  This overlap was at first blush troubling because the RMSE has the potential to confuse and obscure issues by having multiple authors address them inconsistently.  This is an area where reviewers of the upcoming edition should pay close attention.

I. Reference Manual’s Disregard of Study Validity in Favor of the “Whole Tsumish”

There was a deep discordance among the chapters in the third Reference Manual as to how judges should approach scientific gatekeeping issues. The third edition vacillated between encouraging judges to look at scientific validity, and discouraging them from any meaningful analysis by emphasizing inaccurate proxies for validity, such as conflicts of interest.[2]

The Third Edition featured an updated version of the late Professor Margaret Berger’s chapter from the second edition, “The Admissibility of Expert Testimony.”[3]  Berger’s chapter criticized “atomization,” a process she described pejoratively as a “slicing-and-dicing” approach.[4]  Drawing on the publications of Daubert-critic Susan Haack, Berger rejected the notion that courts should examine the reliability of each study independently.[5]  Berger contended that the “proper” scientific method, as evidenced by works of the International Agency for Research on Cancer, the Institute of Medicine, the National Institutes of Health, the National Research Council, and the National Institute for Environmental Health Sciences, “is to consider all the relevant available scientific evidence, taken as a whole, to determine which conclusion or hypothesis regarding a causal claim is best supported by the body of evidence.”[6]

Berger’s contention, however, was profoundly misleading.  Of course, scientists undertaking a systematic review should identify all the relevant studies, but some of the “relevant” studies may well be insufficiently reliable (because of internal or external validity issues) to answer the research question at hand. All the cited agencies, and other research organizations and researchers, exclude studies that are fundamentally flawed, whether as a result of bias, confounding, erroneous data analyses, or related problems.  Berger cited no support for her remarkable suggestion that scientists do not make “reliability” judgments about available studies when assessing the “totality of the evidence.”

Professor Berger, who had a distinguished career as a law professor and evidence scholar, died in November 2010.  She was no friend of Daubert,[7] but, remarkably, her antipathy outlived her.  Berger’s critical discussion of “atomization” cited the notorious decision in Milward v. Acuity Specialty Products Group, Inc., 639 F.3d 11, 26 (1st Cir. 2011), which was decided four months after her passing.[8]

Professor Berger’s contention about the need to avoid assessments of individual studies in favor of the whole “tsumish” must also be rejected because Federal Rule of Evidence 703 requires that each study considered by an expert witness “qualify” for reasonable reliance by virtue of the study’s containing facts or data that are “of a type reasonably relied upon by experts in the particular field in forming opinions or inferences upon the subject.”  One of the deeply troubling aspects of the Milward decision is that it reversed the trial court’s sensible decision to exclude a toxicologist, Dr. Martyn Smith, who outran his headlights on issues having to do with a field in which he was clearly inexperienced – epidemiology.

Scientific studies, and especially epidemiologic studies, involve multiple levels of hearsay.  A typical epidemiologic study may contain hearsay leaps from patient to clinician, to laboratory technicians, to specialists interpreting test results, back to the clinician for a diagnosis, to a nosologist for disease coding, to a national or hospital database, to a researcher querying the database, to a statistician analyzing the data, to a manuscript that details data, analyses, and results, to editors and peer reviewers, back to study authors, and on to publication.  Those leaps do not mean that the final results are untrustworthy, only that the study itself is not likely admissible in evidence.

The inadmissibility of scientific studies is not problematic because Rule 703 permits testifying expert witnesses to formulate opinions based upon facts and data, which are not independently admissible in evidence. The distinction between relied upon and admissible studies is codified in the Federal Rules of Evidence, and in virtually every state’s evidence law.

Referring to studies, without qualification, as admissible in themselves is usually wrong as a matter of evidence law.  The error has the potential to encourage carelessness in gatekeeping expert witnesses’ opinions for their reliance upon inadmissible studies.  The error is doubly wrong if this approach to expert witness gatekeeping is taken as license to permit expert witnesses to rely upon any marginally relevant study of their choosing.  It is therefore disconcerting that the RMSE 3d failed to make the appropriate distinction between admissibility of studies and admissibility of expert witness opinion that has reasonably relied upon appropriate studies.

Consider the following statement from the chapter on epidemiology:

“An epidemiologic study that is sufficiently rigorous to justify a conclusion that it is scientifically valid should be admissible, as it tends to make an issue in dispute more or less likely.”[9]

Curiously, the advice from the authors of the epidemiology chapter, by speaking to a single study’s validity, was at odds with Professor Berger’s caution against slicing and dicing. The authors of the epidemiology chapter seemed to be stressing that scientifically valid studies should be admissible.  Their footnote emphasized and confused the point:

See DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941, 958 (3d Cir. 1990); cf. Kehm v. Procter & Gamble Co., 580 F. Supp. 890, 902 (N.D. Iowa 1982) (“These [epidemiologic] studies were highly probative on the issue of causation—they all concluded that an association between tampon use and menstrually related TSS [toxic shock syndrome] cases exists.”), aff’d, 724 F.2d 613 (8th Cir. 1984). Hearsay concerns may limit the independent admissibility of the study, but the study could be relied on by an expert in forming an opinion and may be admissible pursuant to Fed. R. Evid. 703 as part of the underlying facts or data relied on by the expert. In Ellis v. International Playtex, Inc., 745 F.2d 292, 303 (4th Cir. 1984), the court concluded that certain epidemiologic studies were admissible despite criticism of the methodology used in the studies. The court held that the claims of bias went to the studies’ weight rather than their admissibility. Cf. Christophersen v. Allied-Signal Corp., 939 F.2d 1106, 1109 (5th Cir. 1991) (“As a general rule, questions relating to the bases and sources of an expert’s opinion affect the weight to be assigned that opinion rather than its admissibility. . . .”).”[10]

This footnote’s suggestion, however, that studies relied upon by an expert in forming an opinion may be admissible pursuant to Rule 703, was unsupported by, and contrary to, Rule 703 and the overwhelming weight of case law interpreting and applying the rule.[11] The citation to a pre-Daubert decision, Christophersen, was doubtful as a legal argument, and it managed to engender much confusion.

Furthermore, Kehm and Ellis, the cases cited in this footnote by the authors of the epidemiology chapter, both involved “factual findings” in public investigative or evaluative reports, which were independently admissible under Federal Rule of Evidence 803(8)(C). See Ellis, 745 F.2d at 299-303; Kehm, 724 F.2d at 617-18.  As such, the cases hardly support the chapter’s suggestion that Rule 703 is a rule of admissibility for epidemiologic studies.

Here the RMSE 3d, in one sentence, confused Rule 703 with an exception to the rule against hearsay, which would prevent the statistically based epidemiologic studies from being received in evidence.  The point is reasonably clear, however, that the studies “may be offered” in testimony to explain an expert witness’s opinion. Under Rule 705, that offer may also be refused. The offer, however, is to “explain,” not to have the studies admitted in evidence.  The RMSE 3d was certainly not alone in advancing this notion that studies are themselves admissible.  Other well-respected evidence scholars have lapsed into this error.[12]

Evidence scholars should not conflate admissibility of the epidemiologic (or other) studies with the ability of an expert witness to advert to a study to explain his or her opinion.  The testifying expert witness really should not be allowed to become a conduit for off-hand comments and opinions in the introduction or discussion section of relied upon articles, and the wholesale admission of such hearsay opinions undermines the trial court’s control over opinion evidence.  Rule 703 authorizes reasonable reliance upon “facts and data,” not every opinion that creeps into the published literature.

II. Toxicology for Judges

The toxicology chapter, “Reference Guide on Toxicology,” in RMSE 3d was written by Professor Bernard D. Goldstein, of the University of Pittsburgh Graduate School of Public Health, and Mary Sue Henifin, a partner in the Princeton, New Jersey office of Buchanan Ingersoll, P.C.

  1. Conflicts of Interest

At the question and answer session of the Reference Manual’s public release ceremony, in September 2011, one gentleman rose to note that some of the authors were lawyers with big firm affiliations, which he supposed must mean that they represent mostly defendants.  Based upon his premise, he asked what the review committee had done to ensure that conflicts of interest did not skew or distort the discussions in the affected chapters.  Dr. Kassirer and Judge Kessler responded by pointing out that the chapters were peer reviewed by outside reviewers, and reviewed by members of the supervising review committee.  The questioner seemed reassured, but now that I have looked at the toxicology chapter, I am not so sure.

The questioner’s premise that a member of a large firm will represent mostly defendants, and thus have a pro-defense bias, is probably a common perception among unsophisticated lay observers, but it is flawed.  For instance, some large firms represent insurance companies intent upon denying coverage to product manufacturers.  These counsel for insurance companies often take the plaintiffs’ side of the underlying disputed issue in order to claim an exclusion to the contract of insurance, under a claim that the harm was “expected or intended.”  Similarly, the common perception ignores the reality of lawyers’ true conflict:  although gatekeeping helps the defense lawyers’ clients, it takes away legal work from firms that represent defendants in the litigations that are pretermitted by effective judicial gatekeeping.  Erosion of gatekeeping concepts, however, inures to the benefit of plaintiffs, their counsel, and the expert witnesses engaged on behalf of plaintiffs in litigation.

The questioner’s supposition in the case of the toxicology chapter, however, is doubly flawed.  If he had known more about the authors, he would probably not have asked his question.  First, the lawyer author, Ms. Henifin, despite her large firm affiliation, has taken some aggressive positions contrary to the interests of manufacturers.[13]  As for the scientist author of the toxicology chapter, Professor Goldstein, the casual reader of the chapter may want to know that he has testified in any number of toxic tort cases, almost invariably on the plaintiffs’ side.  Unlike the defense lawyer, who loses business revenue when courts shut down unreliable claims, plaintiffs’ testifying or consulting expert witnesses stand to gain from minimalist expert witness opinion gatekeeping.  Given the economic asymmetries, the reader may well want to know that Professor Goldstein was excluded as an expert witness in some high-profile toxic tort cases.[14]  There do not appear to be any disclosures of Professor Goldstein’s (or any other scientist author’s) conflicts of interest in RMSE 3d.  Having pointed out this conflict, I would note that financial conflicts of interest are nothing really compared with ideological conflicts of interest, which often propel scientists into service as expert witnesses to advance their political agenda.

  2. Hormesis

One way that ideological conflicts might be revealed is to look for imbalances in the presentation of toxicologic concepts.  Most lawyers who litigate cases that involve exposure-response issues are familiar with the “linear no threshold” (LNT) concept that is used frequently in regulatory risk assessments, and which has metastasized to toxic tort litigation, where LNT often has no proper place.

LNT is a dubious assumption because it claims to “know” the dose response at very low exposure levels in the absence of data.  There is a thin plausibility for LNT for genotoxic chemicals claimed to be carcinogens, but even that plausibility evaporates when one realizes that there are DNA defense and repair mechanisms to genotoxicity, which must first be saturated, overwhelmed, or inhibited, before there can be a carcinogenic response. The upshot is that low exposures that do not swamp DNA repair and tumor suppression proteins will not cause cancer.

Hormesis is today an accepted concept that describes a dose-response relationship showing a benefit at low doses but harm at high doses. The toxicology chapter in the Reference Manual has several references to LNT but none to hormesis.  That font of all knowledge, Wikipedia, reports that hormesis is controversial; but so is LNT.  This is the sort of imbalance that may well reflect an ideological bias.

One of the leading textbooks on toxicology describes hormesis[15]:

“There is considerable evidence to suggest that some non-nutritional toxic substances may also impart beneficial or stimulatory effects at low doses but that, at higher doses, they produce adverse effects. This concept of “hormesis” was first described for radiation effects but may also pertain to most chemical responses.”

Similarly, the Encyclopedia of Toxicology describes hormesis as an important phenomenon in toxicologic science[16]:

“This type of dose–response relationship is observed in a phenomenon known as hormesis, with one explanation being that exposure to small amounts of a material can actually confer resistance to the agent before frank toxicity begins to appear following exposures to larger amounts.  However, analysis of the available mechanistic studies indicates that there is no single hormetic mechanism. In fact, there are numerous ways for biological systems to show hormetic-like biphasic dose–response relationship. Hormetic dose–response has emerged in recent years as a dose–response phenomenon of great interest in toxicology and risk assessment.”

One might think that hormesis would also be of great interest to federal judges, but they will not learn about it from reading the Reference Manual.

Hormesis research has come into its own.  The International Dose-Response Society, which “focus[es] on the dose-response in the low-dose zone,” publishes a journal, Dose-Response, and a newsletter, BELLE:  Biological Effects of Low Level Exposure.  In 2009, two leading researchers in the area of hormesis published a collection of important papers:  Mark P. Mattson and Edward J. Calabrese, eds., Hormesis: A Revolution in Biology, Toxicology and Medicine (2009).

A check in PubMed shows that LNT has more “hits” than “hormesis” or “hormetic,” but the latter terms still return over 1,267 references, hardly insubstantial. In actuality, there are many more hormetic relationships identified in the scientific literature, which often fails to label the relationship with the term hormesis or hormetic.[17]

The Reference Manual’s omission of hormesis was regrettable.  Its inclusion of references to LNT but not to hormesis suggests a biased treatment of the subject.

  3. Questionable Substantive Opinions

Readers and litigants would fondly hope that the toxicology chapter would not put forward partisan substantive positions on issues that are currently the subject of active litigation.  Fervently, we would hope that any substantive position advanced would at least be well documented.

For at least one issue, the toxicology chapter disappointed significantly.  Table 1 in the chapter presents a “Sample of Selected Toxicological End Points and Examples of Agents of Concern in Humans.” No documentation or citations are provided for this table.  Most of the exposure agent/disease outcome relationships in the table are well accepted, but curiously at least one agent-disease pair, which is the subject of current litigation, is wildly off the mark:

“Parkinson’s disease and manganese”[18]

If the chapter’s authors had looked, they would have found that Parkinson’s disease is almost universally accepted to have no known cause, at least outside court rooms.  They would also have found that the issue has been addressed carefully and the claimed relationship or “concern” has been rejected by the leading researchers in the field (who have no litigation ties).[19]  Table 1 suggests a certain lack of objectivity, and its inclusion of a highly controversial relationship, manganese-Parkinson’s disease, suggests a good deal of partisanship.

  4. When All You Have Is a Hammer, Everything Looks Like a Nail

The substantive area author, Professor Goldstein, is not a physician; nor is he an epidemiologist.  His professional focus on animal and cell research appeared to color and bias the opinions offered in this chapter:[20]

“In qualitative extrapolation, one can usually rely on the fact that a compound causing an effect in one mammalian species will cause it in another species. This is a basic principle of toxicology and pharmacology.  If a heavy metal, such as mercury, causes kidney toxicity in laboratory animals, it is highly likely to do so at some dose in humans.”

Such extrapolations may make sense in regulatory contexts, where precautionary judgments are of interest, but they can hardly be said to be generally accepted in controversies within scientific communities, or in civil actions over actual causation.  There are too many counterexamples to cite, but consider crystalline silica, silicon dioxide.  Silica causes something resembling lung cancer in rats, but not in mice, guinea pigs, or hamsters.  It hardly makes sense to ask juries to decide whether the plaintiff is more like a rat than a mouse.

For a sober second opinion to the toxicology chapter, one may consider the views of some well-known authors:

“Whereas the concordance was high between cancer-causing agents initially discovered in humans and positive results in animal studies (Tomatis et al., 1989; Wilbourn et al., 1984), the same could not be said for the reverse relationship: carcinogenic effects in animals frequently lacked concordance with overall patterns in human cancer incidence (Pastoor and Stevens, 2005).”[21]

III. New Reference Manual’s Uneven Treatment of Causation and of Conflicts of Interest

The third edition of the Reference Manual on Scientific Evidence (RMSE) appeared to get off to a good start in the Preface by Judge Kessler and Dr. Kassirer, when they noted that the Supreme Court mandated federal courts to:

“examine the scientific basis of expert testimony to ensure that it meets the same rigorous standard employed by scientific researchers and practitioners outside the courtroom.”

RMSE at xiii.  The preface faltered, however, on two key issues, causation and conflicts of interest, which are taken up as an introduction to the third edition.

  1. Causation

The authors reported in somewhat squishy terms that causal assessments are judgments:

“Fundamentally, the task is an inferential process of weighing evidence and using judgment to conclude whether or not an effect is the result of some stimulus. Judgment is required even when using sophisticated statistical methods. Such methods can provide powerful evidence of associations between variables, but they cannot prove that a causal relationship exists. Theories of causation (evolution, for example) lose their designation as theories only if the scientific community has rejected alternative theories and accepted the causal relationship as fact. Elements that are often considered in helping to establish a causal relationship include predisposing factors, proximity of a stimulus to its putative outcome, the strength of the stimulus, and the strength of the events in a causal chain.”[22]

The authors left the inferential process as a matter of “weighing evidence,” but without saying anything about how the scientific community does its “weighing.” Language about “proving” causation is also unclear because “proof” in scientific parlance connotes a demonstration, of the sort we typically find in logic or mathematics. Speaking of “proving” empirical propositions suggests a bar set so high that courts will inevitably acquiesce in a very low threshold of evidence.  The question, of course, is how low judges can and will go to admit evidence.

The authors thus introduced hand waving and excuses for why evidence can be weighed differently in court proceedings from the world of science:

“Unfortunately, judges may be in a less favorable position than scientists to make causal assessments. Scientists may delay their decision while they or others gather more data. Judges, on the other hand, must rule on causation based on existing information. Concepts of causation familiar to scientists (no matter what stripe) may not resonate with judges who are asked to rule on general causation (i.e., is a particular stimulus known to produce a particular reaction) or specific causation (i.e., did a particular stimulus cause a particular consequence in a specific instance). In the final analysis, a judge does not have the option of suspending judgment until more information is available, but must decide after considering the best available science.”[23]

But the “best available science” may be pretty crummy, and the temptation to turn desperation into evidence (“well, it’s the best we have now”) is often severe.  The authors of the Preface thus remarkably signalled that “inconclusive” is not a judgment open to judges charged with expert witness gatekeeping.  If the authors truly meant to suggest that judges should go with whatever is dished out as “the best available science,” then they have overlooked the obvious:  Rule 702 opens the door to “scientific, technical, or other specialized knowledge,” not to hunches, suggestive but inconclusive evidence, and wishful thinking about how the science may turn out when further along.  Courts have a choice to exclude expert witness opinion testimony that is based upon incomplete or inconclusive evidence. The authors went fairly far afield to suggest, erroneously, that the incomplete and the inconclusive are good enough and should be admitted.

  2. Conflicts of Interest

Surprisingly, given the scope of the scientific areas covered in the RMSE, the authors discussed conflicts of interest (COI) at some length.  Conflicts of interest are a fact of life in all endeavors, and it is understandable to counsel judges and juries to try to identify, assess, and control them.  COIs, however, are weak proxies for unreliability.  The emphasis given here was undue, because federal judges were enticed into thinking that they can discern unreliability from COI, when they should be focused on the data, inferences, and analyses.

What becomes fairly clear is that the authors of the Preface set out to use COI as a basis for giving litigation plaintiffs a pass, and for holding back studies sponsored by corporate defendants.

“Conflict of interest manifests as bias, and given the high stakes and adversarial nature of many courtroom proceedings, bias can have a major influence on evidence, testimony, and decisionmaking. Conflicts of interest take many forms and can be based on religious, social, political, or other personal convictions. The biases that these convictions can induce may range from serious to extreme, but these intrinsic influences and the biases they can induce are difficult to identify. Even individuals with such prejudices may not appreciate that they have them, nor may they realize that their interpretations of scientific issues may be biased by them. Because of these limitations, we consider here only financial conflicts of interest; such conflicts are discoverable. Nonetheless, even though financial conflicts can be identified, having such a conflict, even one involving huge sums of money, does not necessarily mean that a given individual will be biased. Having a financial relationship with a commercial entity produces a conflict of interest, but it does not inevitably evoke bias. In science, financial conflict of interest is often accompanied by disclosure of the relationship, leaving to the public the decision whether the interpretation might be tainted. Needless to say, such an assessment may be difficult. The problem is compounded in scientific publications by obscure ways in which the conflicts are reported and by a lack of disclosure of dollar amounts.

Judges and juries, however, must consider financial conflicts of interest when assessing scientific testimony. The threshold for pursuing the possibility of bias must be low. In some instances, judges have been frustrated in identifying expert witnesses who are free of conflict of interest because entire fields of science seem to be co-opted by payments from industry. Judges must also be aware that the research methods of studies funded specifically for purposes of litigation could favor one of the parties. Though awareness of such financial conflicts in itself is not necessarily predictive of bias, such information should be sought and evaluated as part of the deliberations.”[24]

All in all, rather misleading advice.  Financial conflicts are not the only conflicts that can be “discovered.”  Often expert witnesses will have political and organizational alignments, which will show deep-seated ideological alignments with the party for which they are testifying.  For instance, in one silicosis case, an expert witness in the field of history of medicine testified, at an examination before trial, that his father suffered from a silica-related disease.  This witness’s alignment with Marxist historians and his identification with radical labor movements made his non-financial conflicts obvious, although these COI would not necessarily have been apparent from his scholarly publications alone.

How low will the bar be set for discovering COI?  If testifying expert witnesses are relying upon textbooks, articles, and essays, will federal courts open the authors/hearsay declarants up to searching discovery of their finances? What really is at stake here is that the issues of accuracy, precision, and reliability are lost in the ad hominem project of discovering COIs.

Also misleading was the suggestion that “entire fields of science seem to be co-opted by payments from industry.”  Do the authors mean to exclude the plaintiffs’ lawyer lawsuit industry, which has become one of the largest rent-seeking organizations, and one of the most politically powerful groups in this country?  In litigations in which I have been involved, I have certainly seen plaintiffs’ counsel, or their proxies – labor unions, federal agencies, or “victim support groups” – provide substantial funding for studies.  The Preface authors themselves show an untoward bias by pointing out industry payments without giving balanced attention to other interested parties’ funding of scientific studies.

The attention to COI was also surprising given that one of the key chapters, for toxic tort practitioners, was written by Dr. Bernard D. Goldstein, who has testified in toxic tort cases, mostly (but not exclusively) for plaintiffs.[25]  In one such case, Makofsky, Dr. Goldstein’s participation was particularly revealing because he was forced to explain why he was willing to opine that benzene caused acute lymphocytic leukemia, despite the plethora of published studies finding no statistically significant relationship.  Dr. Goldstein resorted to the inaccurate notion that scientific “proof” of causation requires 95 percent certainty, whereas he imposed only a 51 percent certainty for his medico-legal testimonial adventures.[26] Dr. Goldstein also attempted to justify the discrepancy from the published literature by adverting to the lower standards used by federal regulatory agencies and treating physicians.  

These explanations were particularly concerning because they reflect basic errors in statistics and in causal reasoning.  The 95 percent derives from the coefficient of confidence used in confidence intervals, but the probability involved there is not the probability that the association is correct, and it has nothing to do with the degree of belief that an association is real or causal.  (Thankfully the RMSE chapter on statistics got this right, but my fear is that judges will skip over the more demanding chapter on statistics and place undue weight on the toxicology chapter.)  The references to federal agencies (OSHA, EPA, etc.) and to treating physicians were meant, no doubt, to invoke precautionary principle concepts as a justification for some vague, ill-defined, lower standard of causal assessment.  These references were really covert invitations to shift the burden of proof.
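To make the statistical point concrete, consider a minimal simulation, with invented numbers and a deliberately simple model (normally distributed data with known variance). The 95 percent figure describes the long-run performance of the interval-generating procedure over many repetitions; it is not the probability that any particular estimate, much less any causal claim, is correct:

import random
from statistics import NormalDist

def coverage_of_95_percent_intervals(true_mean=10.0, sd=2.0, n=25,
                                     trials=20_000, seed=3):
    """Repeatedly sample, compute a 95% confidence interval for the mean
    each time, and count how often the interval covers the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.975)
    covered = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        xbar = sum(sample) / n
        half_width = z * sd / n ** 0.5
        covered += (xbar - half_width) <= true_mean <= (xbar + half_width)
    return covered / trials

print(round(coverage_of_95_percent_intervals(), 3))   # close to 0.95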

The Preface authors might well have taken their own counsel and conducted a more searching assessment of COI among authors of Reference Manual.  Better yet, the authors might have focused the judiciary on the data and the analysis.

IV. Reference Manual on Scientific Evidence (3d edition) on Statistical Significance

How does the new Reference Manual on Scientific Evidence treat statistical significance?  Inconsistently and at times incoherently.

  1. Professor Berger’s Introduction

In her introductory chapter, the late Professor Margaret A. Berger raised the question what role statistical significance should play in evaluating a study’s support for causal conclusions[27]:

“What role should statistical significance play in assessing the value of a study? Epidemiological studies that are not conclusive but show some increased risk do not prove a lack of causation. Some courts find that they therefore have some probative value,62 at least in proving general causation.63

This seems rather backwards.  Berger’s suggestion that inconclusive studies do not prove a lack of causation is nothing more than a tautology. Certainly the failure to rule out causation is not probative of causation. How can that tautology support the claim that inconclusive studies “therefore” have some probative value? Berger’s argument seems obviously invalid, or perhaps it is text that badly needed a posthumous editor.  And what epidemiologic studies are conclusive?  Are the studies individually or collectively conclusive?  Berger introduced a tantalizing concept, which was not spelled out anywhere in the Manual.

Berger’s chapter raised other, serious problems. If the relied-upon studies are not statistically significant, how should we understand the testifying expert witness to have ruled out random variability as an explanation for the disparity observed in the study or studies?  Berger did not answer these important questions, but her rhetoric elsewhere suggested that trial courts should not look too hard at the statistical support (or its lack) for what expert witness testimony is proffered.

Berger’s citations in support were curiously inaccurate.  Footnote 62 cites the Cook case:

“62. See Cook v. Rockwell Int’l Corp., 580 F. Supp. 2d 1071 (D. Colo. 2006) (discussing why the court excluded expert’s testimony, even though his epidemiological study did not produce statistically significant results).”

Berger’s citation was disturbingly incomplete.[28] The expert witness in Cook, Dr. Clapp, did rely upon his own study, which did not obtain a statistically significant result, but the trial court admitted his testimony; the court denied the Rule 702 challenge to Clapp, and permitted him to testify about a statistically non-significant ecological study. The footnote’s parenthetical thus had the ruling backwards, and the judgment of the district court was, in any event, later reversed on appeal.

Footnote 63 is no better:

“63. In re Viagra Prods., 572 F. Supp. 2d 1071 (D. Minn. 2008) (extensive review of all expert evidence proffered in multidistricted product liability case).”

With respect to the concept of statistical significance, the Viagra case centered on the motion to exclude plaintiffs’ expert witness, Gerald McGwin, who relied upon three studies, none of which obtained a statistically significant result in its primary analysis.  The Viagra court’s review was hardly extensive; the court did not report, discuss, or consider the appropriate point estimates in most of the studies, the confidence intervals around those point estimates, or any aspect of systematic error in the three studies.  At best, the court’s review was perfunctory.  When the defendant brought to light the lack of data integrity in McGwin’s own study, the Viagra MDL court reversed itself, and granted the motion to exclude McGwin’s testimony.[29]  Berger’s chapter omitted the cautionary tale of McGwin’s serious, pervasive errors, and how they led to his ultimate exclusion. Berger’s characterization of the review was incorrect, and her failure to cite the subsequent procedural history, misleading.

  2. Chapter on Statistics

The Third Edition’s chapter on statistics was relatively free of value judgments about significance probability, and, therefore, an improvement over Berger’s introduction.  The authors carefully described significance probability and p-values, and explained[30]:

“Small p-values argue against the null hypothesis. Statistical significance is determined by reference to the p-value; significance testing (also called hypothesis testing) is the technique for computing p-values and determining statistical significance.”

Although the chapter conflated the positions often taken to be Fisher’s interpretation of p-values and Neyman’s conceptualization of hypothesis testing as a dichotomous decision procedure, this treatment was unfortunately fairly standard in introductory textbooks.  The authors may have felt that presenting multiple interpretations of p-values was asking too much of judges and lawyers, but the oversimplification invited a false sense of certainty about the inferences that can be drawn from statistical significance.
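For concreteness, here is a minimal sketch, in Python, of the dichotomous procedure the chapter describes. All of the numbers are invented for illustration and come from neither the Manual nor any actual study; the calculation uses the ordinary pooled normal approximation for comparing two proportions:

from statistics import NormalDist

def two_proportion_p_value(cases_exp, n_exp, cases_unexp, n_unexp):
    """Two-sided p-value for a difference between two proportions,
    using the usual pooled normal approximation."""
    p1, p2 = cases_exp / n_exp, cases_unexp / n_unexp
    pooled = (cases_exp + cases_unexp) / (n_exp + n_unexp)
    se = (pooled * (1 - pooled) * (1 / n_exp + 1 / n_unexp)) ** 0.5
    z = (p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical study: 30 of 1,000 exposed and 18 of 1,000 unexposed develop disease.
p = two_proportion_p_value(30, 1000, 18, 1000)
print(round(p, 3))    # about 0.08 with these invented numbers
print(p < 0.05)       # False: "not statistically significant" at alpha = 0.05

The p-value describes the probability of data at least as extreme as those observed, assuming the null hypothesis and the statistical model are correct; the further step of declaring the result “significant” or “not significant” at alpha = 0.05 is the dichotomous decision procedure discussed in the text.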

Kaye and Freedman, however, did offer some important qualifications to the untoward consequences of using significance testing as a dichotomous outcome[31]:

“Artifacts from multiple testing are commonplace. Because research that fails to uncover significance often is not published, reviews of the literature may produce an unduly large number of studies finding statistical significance.111 Even a single researcher may examine so many different relationships that a few will achieve statistical significance by mere happenstance. Almost any large dataset—even pages from a table of random digits—will contain some unusual pattern that can be uncovered by diligent search. Having detected the pattern, the analyst can perform a statistical test for it, blandly ignoring the search effort. Statistical significance is bound to follow.

There are statistical methods for dealing with multiple looks at the data, which permit the calculation of meaningful p-values in certain cases.112 However, no general solution is available, and the existing methods would be of little help in the typical case where analysts have tested and rejected a variety of models before arriving at the one considered the most satisfactory (see infra Section V on regression models). In these situations, courts should not be overly impressed with claims that estimates are significant. Instead, they should be asking how analysts developed their models.113

This important qualification to statistical significance was omitted from the overlapping discussion in the chapter on epidemiology, where it was very much needed.
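Kaye and Freedman’s caution about multiple testing is easy to demonstrate. The following sketch assumes, purely for illustration, that a researcher examines 20 independent associations, none of which is real; under the null hypothesis each p-value is uniformly distributed between 0 and 1:

import random

def chance_of_at_least_one_hit(n_tests=20, alpha=0.05, trials=100_000, seed=1):
    """Probability of obtaining at least one 'statistically significant'
    result when every null hypothesis is true and the tests are independent."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        p_values = [rng.random() for _ in range(n_tests)]   # null p-values are uniform
        hits += min(p_values) < alpha
    return hits / trials

print(round(chance_of_at_least_one_hit(), 2))   # about 0.64, i.e., 1 - 0.95 ** 20

Roughly two times out of three, the diligent search will turn up at least one “statistically significant” association even though there is nothing to find, which is why courts should be asking how the analysts developed their models rather than be impressed with bare claims of significance.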

  3. Chapter on Multiple Regression

The chapter on regression did not add much to the earlier and later discussions.  The author asked rhetorically what the appropriate level of statistical significance is, and answered:

“In most scientific work, the level of statistical significance required to reject the null hypothesis (i.e., to obtain a statistically significant result) is set conventionally at 0.05, or 5%.47

Daniel Rubinfeld, “Reference Guide on Multiple Regression,” in RMSE3d 303, 320.

  4. Chapter on Epidemiology

The chapter on epidemiology[32] mostly muddled the discussion set out in Kaye and Freedman’s chapter on statistics.

“The two main techniques for assessing random error are statistical significance and confidence intervals. A study that is statistically significant has results that are unlikely to be the result of random error, although any criterion for ‘significance’ is somewhat arbitrary. A confidence interval provides both the relative risk (or other risk measure) found in the study and a range (interval) within which the risk likely would fall if the study were repeated numerous times.”

The suggestion that a statistically significant study has results that are unlikely to be due to chance, without reminding the reader that the finding is predicated on the assumptions that there is no association and that the probability model is correct, came close to crossing the line into the transposition fallacy so nicely described and warned against in the statistics chapter. The problem is that “results” is ambiguous as between the data (as extreme as, or more extreme than, what was observed) and the point estimate of the mean or proportion in the sample; and the assumptions that lead to a p-value were not disclosed.

The suggestion that alpha is “arbitrary” was “somewhat” correct, but this truncated discussion was distinctly unhelpful to judges, who are likely to take “arbitrary” to mean “I will get reversed.”  The selection of alpha is conventional to some extent, and arbitrary in the sense that the law’s setting an age of majority or a voting age is arbitrary.  Some young adults, age 17.8 years, may be better educated, better engaged in politics, and better informed about current events than 35-year-olds, but the law must set a cut-off.  Two-year-olds are demonstrably unfit, and 82-year-olds are surely past the threshold of maturity requisite for political participation. A court might admit an opinion based upon a study of rare diseases, with tight control of bias and confounding, when p = 0.051, but that is hardly a justification for ignoring random error altogether, or for admitting an opinion based upon a study in which the disparity observed had a p = 0.15.

The epidemiology chapter correctly called out judicial decisions that confuse “effect size” with statistical significance[33]:

“Understandably, some courts have been confused about the relationship between statistical significance and the magnitude of the association. See Hyman & Armstrong, P.S.C. v. Gunderson, 279 S.W.3d 93, 102 (Ky. 2008) (describing a small increased risk as being considered statistically insignificant and a somewhat larger risk as being considered statistically significant.); In re Pfizer Inc. Sec. Litig., 584 F. Supp. 2d 621, 634–35 (S.D.N.Y. 2008) (confusing the magnitude of the effect with whether the effect was statistically significant); In re Joint E. & S. Dist. Asbestos Litig., 827 F. Supp. 1014, 1041 (S.D.N.Y. 1993) (concluding that any relative risk less than 1.50 is statistically insignificant), rev’d on other grounds, 52 F.3d 1124 (2d Cir. 1995).”

Actually, this confusion is not understandable at all.  The distinction has been the subject of teaching since the first edition of the Reference Manual, and two of the cited cases post-date the second edition.  The Southern District of New York asbestos case, of course, predated the first Manual.  To be sure, courts have on occasion badly misunderstood significance probability and significance testing.  The authors of the epidemiology chapter could well have added In re Viagra to the list of courts that confused effect size with statistical significance.[34]

The epidemiology chapter appropriately chastised courts for confusing significance probability with the probability that the null hypothesis, or its complement, is correct[35]:

“A common error made by lawyers, judges, and academics is to equate the level of alpha with the legal burden of proof. Thus, one will often see a statement that using an alpha of .05 for statistical significance imposes a burden of proof on the plaintiff far higher than the civil burden of a preponderance of the evidence (i.e., greater than 50%).  See, e.g., In re Ephedra Prods. Liab. Litig., 393 F. Supp. 2d 181, 193 (S.D.N.Y. 2005); Marmo v. IBP, Inc., 360 F. Supp. 2d 1019, 1021 n.2 (D. Neb. 2005) (an expert toxicologist who stated that science requires proof with 95% certainty while expressing his understanding that the legal standard merely required more probable than not). But see Giles v. Wyeth, Inc., 500 F. Supp. 2d 1048, 1056–57 (S.D. Ill. 2007) (quoting the second edition of this reference guide).

Comparing a selected p-value with the legal burden of proof is mistaken, although the reasons are a bit complex and a full explanation would require more space and detail than is feasible here. Nevertheless, we sketch out a brief explanation: First, alpha does not address the likelihood that a plaintiff’s disease was caused by exposure to the agent; the magnitude of the association bears on that question. See infra Section VII. Second, significance testing only bears on whether the observed magnitude of association arose  as a result of random chance, not on whether the null hypothesis is true. Third, using stringent significance testing to avoid false-positive error comes at a complementary cost of inducing false-negative error. Fourth, using an alpha of .5 would not be equivalent to saying that the probability the association found is real is 50%, and the probability that it is a result of random error is 50%.”

The footnotes went on to explain further the difference between alpha probability and burden of proof probability, but somewhat misleadingly asserted that “significance testing only bears on whether the observed magnitude of association arose as a result of random chance, not on whether the null hypothesis is true.”[36]  The significance probability does not address the probability that the observed statistic is the result of random chance; rather it describes the probability of observing at least as large a departure from the expected value if the null hypothesis is true.  Of course, if this cumulative probability is sufficiently low, then the null hypothesis is rejected, and this would seem to bear upon whether the null hypothesis is true.  Kaye and Freedman’s chapter on statistics did much better at describing p-values and avoiding the transposition fallacy.
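The difference is not merely academic. A back-of-the-envelope simulation, resting entirely on invented assumptions (a one-sample z-test, a 50 percent prior chance that a modest real effect exists, and a low-powered design), shows that the share of “statistically significant” findings arising from true null hypotheses need not be anywhere near five percent:

import random
from statistics import NormalDist

def false_positive_share(trials=200_000, prior_real=0.5, effect=0.2,
                         n=50, alpha=0.05, seed=4):
    """Among results with p < alpha, the fraction that come from studies in
    which the null hypothesis was in fact true.  Every input here is an
    illustrative assumption, not an empirical claim."""
    rng = random.Random(seed)
    significant = significant_with_true_null = 0
    for _ in range(trials):
        effect_is_real = rng.random() < prior_real
        mu = effect if effect_is_real else 0.0
        xbar = rng.gauss(mu, 1 / n ** 0.5)              # sample mean of unit-variance data
        p = 2 * (1 - NormalDist().cdf(abs(xbar) * n ** 0.5))
        if p < alpha:
            significant += 1
            significant_with_true_null += not effect_is_real
    return significant_with_true_null / significant

print(round(false_positive_share(), 2))   # roughly 0.15 under these assumptions

The result moves with the assumed prior and with the study’s power, which is precisely why alpha cannot be read as the probability that a “significant” association is spurious, nor its complement as the probability that the association is real.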

When they stayed on message, the authors of the epidemiology chapter were certainly correct that significance probability cannot be translated into an assessment of the probability that the null hypothesis, or the obtained sampling statistic, is correct.  What these authors omitted, however, was a clear statement that the many courts and counsel who have misstated this fact do not create any worthwhile precedent, persuasive or binding.

The epidemiology chapter ultimately failed to help judges in assessing statistical significance:

“There is some controversy among epidemiologists and biostatisticians about the appropriate role of significance testing.85 To the strictest significance testers, any study whose p-value is not less than the level chosen for statistical significance should be rejected as inadequate to disprove the null hypothesis. Others are critical of using strict significance testing, which rejects all studies with an observed p-value below that specified level. Epidemiologists have become increasingly sophisticated in addressing the issue of random error and examining the data from a study to ascertain what information they may provide about the relationship between an agent and a disease, without the necessity of rejecting all studies that are not statistically significant.86 Meta-analysis, as well, a method for pooling the results of multiple studies, sometimes can ameliorate concerns about random error.87  Calculation of a confidence interval permits a more refined assessment of appropriate inferences about the association found in an epidemiologic study.88

Id. at 578-79.  Mostly true, but again rather unhelpful to judges and lawyers.  Some of the controversy, to be sure, was instigated by statisticians and epidemiologists who would elevate Bayesian methods, and eliminate the use of significance probability and testing altogether. As for those scientists who still work within the dominant frequentist statistical paradigm, the chapter authors divided the world up into “strict” testers and those critical of “strict” testing.  Where, however, is the boundary? Does criticism of “strict” testing imply embrace of “non-strict” testing, or of no testing at all?  I can sympathize with a judge who permits reliance upon a series of studies that all go in the same direction, with each having a confidence interval that just misses excluding the null hypothesis.  Meta-analysis in such a situation might not just ameliorate concerns about random error, it might eliminate them.  But what of those scientists critical of strict testing?  This certainly does not suggest or imply that courts can or should ignore random error; yet that is exactly what happened in the early going in In re Viagra Products Liab. Litig.[37]  The epidemiology chapter’s reference to confidence intervals was correct in part; they permit a more refined assessment because they permit a more direct assessment of the extent of random error in terms of magnitude of association, as well as the point estimate of the association obtained from and conditioned on the sample.  Confidence intervals, however, do not eliminate the need to interpret the extent of random error; rather they provide a more direct assessment and measurement of the standard error.
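A small worked example, again with invented counts, shows what the more refined assessment looks like in practice, using the familiar log-scale approximation for a risk ratio:

from math import exp, log, sqrt
from statistics import NormalDist

def relative_risk_with_ci(cases_exp, n_exp, cases_unexp, n_unexp, level=0.95):
    """Relative risk, a confidence interval computed on the log scale,
    and the two-sided p-value for the null value RR = 1."""
    rr = (cases_exp / n_exp) / (cases_unexp / n_unexp)
    se_log = sqrt(1 / cases_exp - 1 / n_exp + 1 / cases_unexp - 1 / n_unexp)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lo, hi = exp(log(rr) - z * se_log), exp(log(rr) + z * se_log)
    p = 2 * (1 - NormalDist().cdf(abs(log(rr)) / se_log))
    return rr, lo, hi, p

# Hypothetical cohort: 40 of 2,000 exposed versus 25 of 2,000 unexposed.
rr, lo, hi, p = relative_risk_with_ci(40, 2000, 25, 2000)
print(round(rr, 2), round(lo, 2), round(hi, 2), round(p, 3))
# RR = 1.6, 95% CI roughly (0.97, 2.63), p about 0.06

The interval conveys the point estimate and the span of associations reasonably compatible with the data on the model’s assumptions; the bare verdict “not statistically significant” conveys far less, and neither dispenses with the need to consider bias and confounding.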

V. Power in the Reference Manual for Scientific Evidence

The Third Edition treated statistical power in three of its chapters, those on statistics, epidemiology, and medical testimony.  Unfortunately, the treatments were not always consistent.

The chapter on statistics has been consistently among the most frequently ignored content of the three editions of the Reference Manual.  The third edition offered a good introduction to basic concepts of sampling, random variability, significance testing, and confidence intervals.[38]  Kaye and Freedman provided an acceptable non-technical definition of statistical power[39]:

“More precisely, power is the probability of rejecting the null hypothesis when the alternative hypothesis … is right. Typically, this probability will depend on the values of unknown parameters, as well as the preset significance level α. The power can be computed for any value of α and any choice of parameters satisfying the alternative hypothesis. Frequentist hypothesis testing keeps the risk of a false positive to a specified level (such as α = 5%) and then tries to maximize power. Statisticians usually denote power by the Greek letter beta (β). However, some authors use β to denote the probability of accepting the null hypothesis when the alternative hypothesis is true; this usage is fairly standard in epidemiology. Accepting the null hypothesis when the alternative holds true is a false negative (also called a Type II error, a missed signal, or a false acceptance of the null hypothesis).”

The definition was not, however, without problems.  First, it introduced a nomenclature issue likely to be confusing for judges and lawyers. Kaye and Freedman used β to denote statistical power, but they acknowledged that epidemiologists use β to denote the probability of a Type II error.  And indeed, the chapters on both epidemiology and medical testimony used β for the Type II error rate, and thus denoted power as the complement of β, or (1 − β).[40]

Second, the reason for introducing the confusion about β was doubtful.  Kaye and Freedman suggested that statisticians usually denote power by β, but they offered no citations.  A quick review (not necessarily complete or even a random sample) suggests that many modern statistics texts denote power as (1 − β).[41]   At the end of the day, there really was no reason for the conflicting nomenclature and the likely confusion it would engender.  Indeed, the duplicative handling of statistical power, and of other concepts, suggests that it is time to eliminate the repetitive discussions, in favor of one clear, thorough discussion in the statistics chapter.
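
To keep the nomenclature straight, the more common convention, and the one the epidemiology and medical testimony chapters followed, is:

\beta = \Pr(\text{fail to reject } H_0 \mid \text{alternative true}), \qquad \text{power} = 1 - \beta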

Third, Kaye and Freedman problematically referred to β as the probability of accepting the null hypothesis, when elsewhere they more carefully instructed that a non-significant finding results in not rejecting the null hypothesis, as opposed to accepting it.  Id. at 253.[42]

Fourth, Kaye and Freedman’s discussion of power, unlike most of their chapter, offered advice that was controversial and unclear:

“On the other hand, when studies have a good chance of detecting a meaningful association, failure to obtain significance can be persuasive evidence that there is nothing much to be found.”[43]

Note that the authors left open what a legal or clinically meaningful association is, and thus offered no real guidance to judges on how to evaluate power after data are collected and analyzed.  As Professor Sander Greenland has argued, in legal contexts, this reliance upon observed power (as opposed to power as a guide in determining appropriate sample size in the planning stages of a study) was arbitrary and “unsalvageable as an analytic tool.”[44]

The chapter on epidemiology offered similar controversial advice on the use of power[45]:

“When a study fails to find a statistically significant association, an important question is whether the result tends to exonerate the agent’s toxicity or is essentially inconclusive with regard to toxicity.93 The concept of power can be helpful in evaluating whether a study’s outcome is exonerative or inconclusive.94  The power of a study is the probability of finding a statistically significant association of a given magnitude (if it exists) in light of the sample sizes used in the study. The power of a study depends on several factors: the sample size; the level of alpha (or statistical significance) specified; the background incidence of disease; and the specified relative risk that the researcher would like to detect.95  Power curves can be constructed that show the likelihood of finding any given relative risk in light of these factors. Often, power curves are used in the design of a study to determine what size the study populations should be.96

Although the authors correctly emphasized the need to specify an alternative hypothesis, their discussion and advice said nothing about how that alternative should be selected in legal contexts.  The suggestion that power curves can be constructed was, of course, true, but irrelevant unless courts know where on the power curve they should be looking.  The authors were also correct that power is used to determine adequate sample size under specified conditions; but again, the use of power curves in that setting is today rather uncommon.  Investigators select a level of power corresponding to an acceptable Type II error rate, and an alternative hypothesis that would be clinically meaningful for their research, in order to determine their sample size. Translating clinical meaningfulness into legal meaningfulness is not always straightforward.
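
For illustration only, and with entirely hypothetical numbers, the planning-stage calculation runs along these lines. The sketch uses the ordinary normal approximation for comparing two proportions (and the scipy library for the normal distribution); the answer turns entirely on the alternative the investigator chooses to detect:

import math
from scipy.stats import norm

def power_two_proportions(p0, p1, n_per_group, alpha=0.05):
    """Approximate power of a two-sided comparison of two proportions
    (normal approximation); a planning-stage sketch, not a definitive method."""
    p_bar = (p0 + p1) / 2
    se_null = math.sqrt(2 * p_bar * (1 - p_bar) / n_per_group)                      # SE under the null
    se_alt = math.sqrt(p0 * (1 - p0) / n_per_group + p1 * (1 - p1) / n_per_group)   # SE under the alternative
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.cdf((abs(p1 - p0) - z_crit * se_null) / se_alt)

# hypothetical: 1% background risk, and an alternative of a doubled risk (RR = 2)
for n in (500, 2000, 5000):
    print(n, round(power_two_proportions(0.01, 0.02, n), 2))   # 0.25, 0.74, 0.98

On these made-up numbers, the same design has roughly 25% power at 500 subjects per group and 98% power at 5,000; shrinking the alternative from a relative risk of 2.0 to 1.5 would cut the power sharply at every sample size. That sensitivity to the chosen alternative is precisely why after-the-fact appeals to “power” tell a court so little.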

In a footnote, the authors of the epidemiology chapter noted that Professor Rothman has been “one of the leaders in advocating the use of confidence intervals and rejecting strict significance testing.”[46] What the chapter failed to mention, however, is that Rothman has also been outspoken in rejecting the post-hoc power calculations that the epidemiology chapter seemed to invite:

“Standard statistical advice states that when the data indicate a lack of significance, it is important to consider the power of the study to detect as significant a specific alternative hypothesis. The power of a test, however, is only an indirect indicator of precision, and it requires an assumption about the magnitude of the effect. In planning a study, it is reasonable to make conjectures about the magnitude of an effect to compute study-size requirements or power. In analyzing data, however, it is always preferable to use the information in the data about the effect to estimate it directly, rather than to speculate about it with study-size or power calculations (Smith and Bates, 1992; Goodman and Berlin, 1994; Hoening and Heisey, 2001). Confidence limits and (even more so) P-value functions convey much more of the essential information by indicating the range of values that are reasonably compatible with the observations (albeit at a somewhat arbitrary alpha level), assuming the statistical model is correct. They can also show that the data do not contain the information necessary for reassurance about an absence of effect.”[47]

The selective, incomplete scholarship of the epidemiology chapter on the issue of statistical power was not only unfortunate, but it distorted the authors’ evaluation of the sparse case law on the issue of power.  For instance, they noted:

“Even when a study or body of studies tends to exonerate an agent, that does not establish that the agent is absolutely safe. See Cooley v. Lincoln Elec. Co., 693 F. Supp. 2d 767 (N.D. Ohio 2010). Epidemiology is not able to provide such evidence.”[48]

Here the authors, Green, Freedman, and Gordis, shifted the burden to the defendant and then went to an even further extreme by making the burden of proof one of absolute certainty in the product’s safety.  This is not, and never has been, a legal standard. The cases they cited amplified the error. In Cooley, for instance, the defense expert would have opined that welding fume exposure did not cause parkinsonism or Parkinson’s disease.  Although the expert witness had not conducted a meta-analysis, he had reviewed the confidence intervals around the point estimates of the available studies.  Many of the point estimates were at or below 1.0, and in some cases, the upper bound of the confidence interval excluded 1.0.  The trial court expressed its concern that the expert witness had inferred “evidence of absence” from “absence of evidence.”  Cooley v. Lincoln Elec. Co., 693 F. Supp. 2d 767, 773 (N.D. Ohio 2010).  This concern, however, was misguided given that many studies had tested the claimed association, and that virtually every case-control and cohort study had found risk ratios at or below 1.0, or very close to 1.0.  What the court in Cooley, and the authors of the epidemiology chapter in the third edition, lost sight of is that when a hypothesis is repeatedly tested, with failure to reject the null hypothesis, with point estimates at or very close to 1.0, and with narrow confidence intervals, the claimed association is probably incorrect.[49]

The Cooley court’s comments might have had some validity when applied to a single study, but not to the impressive body of exculpatory epidemiologic evidence that pertained to welding fume and Parkinson’s disease.  Shortly after the Cooley case was decided, a published meta-analysis of welding fume or manganese exposure demonstrated a reduced level of risk for Parkinson’s disease among persons occupationally exposed to welding fumes or manganese.[50]

VI. The Treatment of Meta-Analysis in the Third Edition

Meta-analysis is a statistical procedure for aggregating data and statistics from individual studies into a single summary statistical estimate of the population measurement of interest.  The first meta-analysis is typically attributed to Karl Pearson, circa 1904, who sought a method to overcome the limitations of small sample size and low statistical power.  Statistical methods for meta-analysis in epidemiology and the social sciences, however, did not mature until the 1970s.  Even then, the biomedical scientific community remained skeptical of, if not outright hostile to, meta-analysis until relatively recently.

The hostility to meta-analysis, especially in the context of observational epidemiologic studies, was colorfully expressed by two capable epidemiologists, Samuel Shapiro and Alvan Feinstein, as late as the 1990s:

“Meta-analysis begins with scientific studies….  [D]ata from these studies are then run through computer models of bewildering complexity which produce results of implausible precision.”

* * * *

“I propose that the meta-analysis of published non-experimental data should be abandoned.”[51]

The professional skepticism about meta-analysis was reflected in some of the early judicial assessments of meta-analysis in court cases.  In the 1980s and early 1990s, some trial judges erroneously dismissed meta-analysis as a flawed statistical procedure that claimed to make something out of nothing.[52]

In In re Paoli Railroad Yard PCB Litigation, Judge Robert Kelly excluded plaintiffs’ expert witness Dr. William Nicholson and his testimony based upon his unpublished meta-analysis of health outcomes among PCB-exposed workers.  Judge Kelly found that the meta-analysis was a novel technique, and that Nicholson’s meta-analysis was not peer reviewed.  Furthermore, the meta-analysis assessed health outcomes not experienced by any of the plaintiffs before the trial court.[53]

The Court of Appeals for the Third Circuit reversed the exclusion of Dr. Nicholson’s testimony, and remanded for reconsideration with instructions.[54]  The Circuit noted that meta-analysis was not novel, and that the lack of peer-review was not an automatic disqualification.  Acknowledging that a meta-analysis could be performed poorly using invalid methods, the appellate court directed the trial court to evaluate the validity of Dr. Nicholson’s work on his meta-analysis. On remand, however, it seems that plaintiffs chose – wisely – not to proceed with Nicholson’s meta-analysis.[55]

In one of many skirmishes over colorectal cancer claims in asbestos litigation, Judge Sweet in the Southern District of New York was unimpressed by efforts to aggregate data across studies.  Judge Sweet declared that:

“no matter how many studies yield a positive but statistically insignificant SMR for colorectal cancer, the results remain statistically insignificant. Just as adding a series of zeros together yields yet another zero as the product, adding a series of positive but statistically insignificant SMRs together does not produce a statistically significant pattern.”[56]

The plaintiffs’ expert witness who had offered the unreliable testimony, Dr. Steven Markowitz (like Nicholson, another foot soldier in Dr. Irving Selikoff’s litigation machine), did not offer a formal meta-analysis to justify his assessment that multiple non-significant studies, taken together, ruled out chance as a likely explanation for an aggregate finding of an increased risk.

Judge Sweet was quite justified in rejecting this back-of-the-envelope, non-quantitative meta-analysis.  His suggestion, however, that multiple non-significant studies could never collectively serve to rule out chance as an explanation for an overall increased rate of disease in the exposed groups was completely wrong.  Judge Sweet would have done better to focus on the validity issues in key studies, the presence of bias and confounding, and the completeness of the proffered meta-analysis.  The Second Circuit reversed the entry of summary judgment, and remanded the colorectal cancer claim for trial.[57]  Over a decade later, with even more accumulated studies and data, the Institute of Medicine found the evidence for asbestos plaintiffs’ colorectal cancer claims to be scientifically insufficient.[58]

Courts continue to go astray with an erroneous belief that multiple studies, all without statistically significant results, cannot yield a statistically significant summary estimate of increased risk.  See, e.g., Baker v. Chevron USA, Inc., 2010 WL 99272, *14-15 (S.D.Ohio 2010) (addressing a meta-analysis by Dr. Infante on multiple myeloma outcomes in studies of benzene-exposed workers).  There were many sound objections to Infante’s meta-analysis, but the suggestion that multiple studies without statistical significance could not yield a summary estimate of risk with statistical significance was not one of them.
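
A simple numerical sketch, with wholly hypothetical studies, shows why the belief is erroneous. Four studies, each with a 95% confidence interval that includes 1.0, can combine, here by fixed-effect, inverse-variance weighting of the log risk ratios, into a summary estimate whose interval excludes 1.0:

import math

# hypothetical studies: (risk ratio, lower 95% bound, upper 95% bound)
studies = [
    (1.30, 0.90, 1.90),
    (1.20, 0.80, 1.80),
    (1.40, 0.95, 2.05),
    (1.25, 0.85, 1.85),
]

weights, log_rrs = [], []
for rr, lo, hi in studies:
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)   # back out the SE from the interval
    weights.append(1 / se**2)                         # inverse-variance weight
    log_rrs.append(math.log(rr))

pooled_log = sum(w * x for w, x in zip(weights, log_rrs)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
lo = math.exp(pooled_log - 1.96 * pooled_se)
hi = math.exp(pooled_log + 1.96 * pooled_se)
print(f"pooled RR = {math.exp(pooled_log):.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
# pooled RR = 1.29, 95% CI (1.06, 1.56): the summary interval excludes 1.0

On these made-up numbers, none of the four studies standing alone is “statistically significant,” yet the pooled estimate is. Whether any particular pooled estimate deserves credence is, of course, a separate question of validity, bias, and heterogeneity.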

In the last two decades, meta-analysis has emerged as an important technique for addressing random variation in studies, as well as some of the limitations of frequentist statistical methods.  In the 1980s, articles reporting meta-analyses were rare to non-existent.  In 2009, there were over 2,300 articles with “meta-analysis” in their title, or in their keywords, indexed in the PubMed database of the National Library of Medicine.[59]

The techniques for aggregating data have been studied, refined, and employed extensively in thousands of methods and application papers in the last decade. Consensus guideline papers have been published for meta-analyses of clinical trials as well as observational studies.[60]  Meta-analyses, of observational studies and of randomized clinical trials, routinely are relied upon by expert witnesses in pharmaceutical and so-called toxic tort litigation.[61]

The second edition of the Reference Manual on Scientific Evidence gave very little attention to meta-analysis; the third edition did not add very much to the discussion.  The time has come for the next edition to address meta-analyses, and criteria for their validity or invalidity.

  1. Statistics Chapter

The statistics chapter of the third edition gave scant attention to meta-analysis.  The chapter noted, in a footnote, that there are formal procedures for aggregating data across studies, and that the power of the aggregated data will exceed the power of the individual, included studies.  The footnote then cautioned that meta-analytic procedures “have their own weakness,”[62] without detailing what that weakness is. The time has come to spell out the weaknesses so that trial judges can evaluate opinion testimony based upon meta-analyses.

The glossary at the end of the statistics chapter offered a definition of meta-analysis:

“meta-analysis. Attempts to combine information from all studies on a certain topic. For example, in the epidemiological context, a meta-analysis may attempt to provide a summary odds ratio and confidence interval for the effect of a certain exposure on a certain disease.”[63]

This definition was inaccurate in ways that could yield serious mischief.  Virtually all meta-analyses are, or should be, built upon a systematic review that sets out to collect all available studies on a research issue of interest.  It is a rare meta-analysis, however, that includes “all” studies in its quantitative analysis.  The meta-analytic process involves a pre-specification of inclusionary and exclusionary criteria for the quantitative analysis of the summary estimate of risk.  Those criteria may limit the quantitative analysis to randomized trials, or to analytical epidemiologic studies.  Furthermore, meta-analyses frequently and appropriately have pre-specified exclusionary criteria that relate to study design or quality.

On a more technical note, the offered definition suggests that the summary estimate of risk will be an odds ratio, which may or may not be true.  Meta-analyses of risk ratios may yield summary estimates of risk in terms of relative risks or hazard ratios, or even of risk differences.  A meta-analysis may also combine estimates of means rather than proportions.

  2. Epidemiology Chapter

The chapter on epidemiology delved into meta-analysis in greater detail than the statistics chapter, and offered apparently inconsistent advice.  The overall gist of the chapter, however, can perhaps best be summarized by the definition offered in this chapter’s glossary:

“meta-analysis. A technique used to combine the results of several studies to enhance the precision of the estimate of the effect size and reduce the plausibility that the association found is due to random sampling error.  Meta-analysis is best suited to pooling results from randomly controlled experimental studies, but if carefully performed, it also may be useful for observational studies.”[64]

It is now time to tell trial judges what “careful” means in the context of conducting and reporting and relying upon meta-analyses.

The epidemiology chapter appropriately noted that meta-analysis can help address concerns over random error in small studies.[65]  Having told us that properly conducted meta-analyses of observational studies can be helpful, the chapter then proceeded to hedge considerably[66]:

“Meta-analysis is most appropriate when used in pooling randomized experimental trials, because the studies included in the meta-analysis share the most significant methodological characteristics, in particular, use of randomized assignment of subjects to different exposure groups. However, often one is confronted with nonrandomized observational studies of the effects of possible toxic substances or agents. A method for summarizing such studies is greatly needed, but when meta-analysis is applied to observational studies – either case-control or cohort – it becomes more controversial.174 The reason for this is that often methodological differences among studies are much more pronounced than they are in randomized trials. Hence, the justification for pooling the results and deriving a single estimate of risk, for example, is problematic.175

The stated objection to pooling results for observational studies was certainly correct, but many research topics have sufficient studies available to allow for appropriate selectivity in framing inclusionary and exclusionary criteria to address the objection.  The chapter went on to credit the critics of meta-analyses of observational studies.  As they did in the second edition of the Reference Manual, the authors in the third edition repeated their cites to, and quotes from, early papers by John Bailar, who was then critical of such meta-analyses:

“Much has been written about meta-analysis recently and some experts consider the problems of meta-analysis to outweigh the benefits at the present time. For example, John Bailar has observed:

‘[P]roblems have been so frequent and so deep, and overstatements of the strength of conclusions so extreme, that one might well conclude there is something seriously and fundamentally wrong with the method. For the present . . . I still prefer the thoughtful, old-fashioned review of the literature by a knowledgeable expert who explains and defends the judgments that are presented. We have not yet reached a stage where these judgments can be passed on, even in part, to a formalized process such as meta-analysis.’

John Bailar, “Assessing Assessments,” 277 Science 528, 529 (1997).”[67]

Bailar’s subjective preference for “old-fashioned” reviews, which often cherry-picked the included studies, is, well, “old fashioned.”  More to the point, it is questionable science, and a distinctly minority viewpoint in light of substantial improvements in the conduct and reporting of systematic reviews and meta-analyses of observational studies.  Bailar may be correct that some meta-analyses should never have left the protocol stage, but the third edition of the Reference Manual failed to provide the judiciary with the tools to appreciate the distinction between good and bad meta-analyses.

This categorical rejection, cited with apparent approval, was amplified by a recitation of some real or apparent problems with meta-analyses of observational studies.  What is missing is a discussion of how many of these problems can be, and are, dealt with in contemporary practice[68]:

“A number of problems and issues arise in meta-analysis. Should only published papers be included in the meta-analysis, or should any available studies be used, even if they have not been peer reviewed? Can the results of the meta-analysis itself be reproduced by other analysts? When there are several meta-analyses of a given relationship, why do the results of different meta-analyses often disagree? The appeal of a meta-analysis is that it generates a single estimate of risk (along with an associated confidence interval), but this strength can also be a weakness, and may lead to a false sense of security regarding the certainty of the estimate. A key issue is the matter of heterogeneity of results among the studies being summarized.  If there is more variance among study results than one would expect by chance, this creates further uncertainty about the summary measure from the meta-analysis. Such differences can arise from variations in study quality, or in study populations or in study designs. Such differences in results make it harder to trust a single estimate of effect; the reasons for such differences need at least to be acknowledged and, if possible, explained.176 People often tend to have an inordinate belief in the validity of the findings when a single number is attached to them, and many of the difficulties that may arise in conducting a meta-analysis, especially of observational studies such as epidemiologic ones, may consequently be overlooked.177

The epidemiology chapter authors were entitled to their opinion, but their discussion left the judiciary uninformed about current practice, and best practices, in epidemiology.  A categorical rejection of meta-analyses of observational studies is at odds with the chapter’s own claim that such meta-analyses can be helpful if properly performed. What was needed, and what remains missing, is a meaningful discussion to help the judiciary determine whether a meta-analysis of observational studies was properly performed.
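
The heterogeneity concern the chapter raised, for example, has standard quantitative measures that a careful meta-analyst reports and that a court could be taught to ask about. A minimal sketch, with hypothetical inputs, of Cochran’s Q and the I² statistic:

import math

def cochran_q_i2(log_estimates, std_errors):
    """Cochran's Q and the I-squared statistic for log-scale study estimates
    pooled by fixed-effect inverse-variance weighting (illustrative sketch only)."""
    weights = [1 / se**2 for se in std_errors]
    pooled = sum(w * x for w, x in zip(weights, log_estimates)) / sum(weights)
    q = sum(w * (x - pooled) ** 2 for w, x in zip(weights, log_estimates))
    df = len(log_estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

# hypothetical log risk ratios and standard errors from four studies
q, i2 = cochran_q_i2([0.26, 0.18, 0.34, 0.22], [0.19, 0.21, 0.20, 0.20])
print(f"Q = {q:.2f} on 3 degrees of freedom; I² = {i2:.0f}%")

Low values of Q and I² suggest that the included studies are telling roughly the same story; high values are the warning sign the chapter worried about, and a signal that a single pooled number should not be taken at face value.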

  3. Medical Testimony Chapter

The chapter on medical testimony was the third pass at meta-analysis in the third edition of the Reference Manual.  The second edition’s chapter on medical testimony ignored meta-analysis completely; the third edition addressed meta-analysis in the context of the hierarchy of study designs[69]:

“Other circumstances that set the stage for an intense focus on medical evidence included

(1) the development of medical research, including randomized controlled trials and other observational study designs;

(2) the growth of diagnostic and therapeutic interventions;141

(3) interest in understanding medical decision making and how physicians reason;142 and

(4) the acceptance of meta-analysis as a method to combine data from multiple randomized trials.143

This language from the medical testimony chapter curiously omitted observational studies, but the footnote reference (note 143) then inconsistently discussed two meta-analyses of observational, rather than experimental, studies.[70]  The chapter then provided even further confusion by giving a more detailed listing of the hierarchy of medical evidence in the form of different study designs[71]:

3. Hierarchy of medical evidence

With the explosion of available medical evidence, increased emphasis has been placed on assembling, evaluating, and interpreting medical research evidence.  A fundamental principle of evidence-based medicine (see also Section IV.C.5, infra) is that the strength of medical evidence supporting a therapy or strategy is hierarchical.  When ordered from strongest to weakest, systematic review of randomized trials (meta-analysis) is at the top, followed by single randomized trials, systematic reviews of observational studies, single observational studies, physiological studies, and unsystematic clinical observations.150 An analysis of the frequency with which various study designs are cited by others provides empirical evidence supporting the influence of meta-analysis followed by randomized controlled trials in the medical evidence hierarchy.151 Although they are at the bottom of the evidence hierarchy, unsystematic clinical observations or case reports may be the first signals of adverse events or associations that are later confirmed with larger or controlled epidemiological studies (e.g., aplastic anemia caused by chloramphenicol,152 or lung cancer caused by asbestos153). Nonetheless, subsequent studies may not confirm initial reports (e.g., the putative association between coffee consumption and pancreatic cancer).154

This discussion further muddied the water by using a parenthetical to suggest that meta-analyses of randomized clinical trials are equivalent to systematic reviews of such studies — “systematic review of randomized trials (meta-analysis).” Of course, systematic reviews are not meta-analyses, although they are usually a necessary precondition for conducting a proper meta-analysis.  The relationship between the procedures for a systematic review and a meta-analysis is in need of clarification, but the judiciary will not find it in the third edition of the Reference Manual.

CONCLUSION

The idea of the Reference Manual was important: to support trial judges’ efforts to engage in gatekeeping in unfamiliar subject matter areas. In its third incarnation, the Manual has become a standard starting place for discussion, but on several crucial issues, the third edition was unclear, imprecise, contradictory, or muddled. The organizational committee and authors for the fourth edition have a fair amount of work on their hands; there is clearly room for improvement.


[1] Adam Dutkiewicz, “Book Review: Reference Manual on Scientific Evidence, Third Edition,” 28 Thomas M. Cooley L. Rev. 343 (2011); John A. Budny, “Book Review: Reference Manual on Scientific Evidence, Third Edition,” 31 Internat’l J. Toxicol. 95 (2012); James F. Rogers, Jim Shelson, and Jessalyn H. Zeigler, “Changes in the Reference Manual on Scientific Evidence (Third Edition),” Internat’l Ass’n Def. Csl. Drug, Device & Biotech. Comm. Newsltr. (June 2012).  See Schachtman “New Reference Manual’s Uneven Treatment of Conflicts of Interest.” (Oct. 12, 2011).

[2] The Manual did not do quite so well in addressing its own conflicts of interest.  See, e.g., infra at notes 7, 20.

[3] RMSE 3d 11 (2011).

[4] Id. at 19.

[5] Id. at 20 & n. 51 (citing Susan Haack, “An Epistemologist in the Bramble-Bush: At the Supreme Court with Mr. Joiner,” 26 J. Health Pol. Pol’y & L. 217–37 (1999)).

[6] Id. at 19-20 & n.52.

[7] Professor Berger filed an amicus brief on behalf of plaintiffs, in Rider v. Sandoz Pharms. Corp., 295 F.3d 1194 (11th Cir. 2002).

[8] Id. at 20 n.51. (The editors noted misleadingly that the published chapter was Berger’s last revision, with “a few edits to respond to suggestions by reviewers.”). I have written elsewhere of the ethical cloud hanging over this Milward decision. See “Carl Cranor’s Inference to the Best Explanation” (Feb. 12, 2021); “From here to CERT-ainty” (June 28, 2018); “The Council for Education & Research on Toxics” (July 9, 2013) (CERT amicus brief filed without any disclosure of conflict of interest). See also NAS, “Carl Cranor’s Conflicted Jeremiad Against Daubert” (Sept. 23, 2018).

[9] RMSE 3d at 610 (internal citations omitted).

[10] RMSE 3d at 610 n.184 (emphasis, in bold, added).

[11] Interestingly, the authors of this chapter seem to abandon their suggestion that studies relied upon “might qualify for the learned treatise exception to the hearsay rule, Fed. R. Evid. 803(18), or possibly the catchall exceptions, Fed. R. Evid. 803(24) & 804(5),” which was part of their argument in the Second Edition.  RMSE 2d at 335 (2000).  See also RMSE 3d at 214 (discussing statistical studies as generally “admissible,” but acknowledging that admissibility may be no more than permission to explain the basis for an expert’s opinion, which is hardly admissibility at all).

[12] David L. Faigman, et al., Modern Scientific Evidence:  The Law and Science of Expert Testimony v.1, § 23:1,at 206 (2009) (“Well conducted studies are uniformly admitted.”).

[13] See Richard M. Lynch and Mary S. Henifin, “Causation in Occupational Disease: Balancing Epidemiology, Law and Manufacturer Conduct,” 9 Risk: Health, Safety & Environment 259, 269 (1998) (conflating distinct causal and liability concepts, and arguing that legal and scientific causal criteria should be abrogated when manufacturing defendant has breached a duty of care).

[14]  See, e.g., Parker v. Mobil Oil Corp., 7 N.Y.3d 434, 857 N.E.2d 1114, 824 N.Y.S.2d 584 (2006) (dismissing leukemia (AML) claim based upon claimed low-level benzene exposure from gasoline), aff’g 16 A.D.3d 648 (App. Div. 2d Dep’t 2005).  No; you will not find the Parker case cited in the Manual‘s chapter on toxicology. (Parker is, however, cited in the chapter on exposure science even though it is a state court case.).

[15] Curtis D. Klaassen, Casarett & Doull’s Toxicology: The Basic Science of Poisons 23 (7th ed. 2008) (internal citations omitted).

[16] Philip Wexler, Bethesda, et al., eds., 2 Encyclopedia of Toxicology 96 (2005).

[17] See Edward J. Calabrese and Robyn B. Blain, “The hormesis database: The occurrence of hormetic dose responses in the toxicological literature,” 61 Regulatory Toxicology and Pharmacology 73 (2011) (reviewing about 9,000 dose-response relationships for hormesis, to create a database of various aspects of hormesis).  See also Edward J. Calabrese and Robyn B. Blain, “The occurrence of hormetic dose responses in the toxicological literature, the hormesis database: An overview,” 202 Toxicol. & Applied Pharmacol. 289 (2005) (earlier effort to establish hormesis database).

[18] Reference Manual at 653.

[19] See, e.g., Karin Wirdefeldt, Hans-Olaf Adami, Philip Cole, Dimitrios Trichopoulos, and Jack Mandel, “Epidemiology and etiology of Parkinson’s disease: a review of the evidence,” 26 European J. Epidemiol. S1, S20-21 (2011); Tomas R. Guilarte, “Manganese and Parkinson’s Disease: A Critical Review and New Findings,” 118 Environ Health Perspect. 1071, 1078 (2010) (“The available evidence from human and non­human primate studies using behavioral, neuroimaging, neurochemical, and neuropathological end points provides strong sup­port to the hypothesis that, although excess levels of [manganese] accumulation in the brain results in an atypical form of parkinsonism, this clini­cal outcome is not associated with the degen­eration of nigrostriatal dopaminergic neurons as is the case in PD [Parkinson’s disease].”)

[20] RMSE3ed at 646.

[21] Hans-Olov Adami, Sir Colin L. Berry, Charles B. Breckenridge, Lewis L. Smith, James A. Swenberg, Dimitrios Trichopoulos, Noel S. Weiss, and Timothy P. Pastoor, “Toxicology and Epidemiology: Improving the Science with a Framework for Combining Toxicological and Epidemiological Evidence to Establish Causal Inference,” 122 Toxicological Sciences 223, 224 (2011).

[22] RMSE3d at xiv.

[23] RMSE3d at xiv.

[24] RMSE3d at xiv-xv.

[25] See, e.g., Parker v. Mobil Oil Corp., 7 N.Y.3d 434, 857 N.E.2d 1114, 824 N.Y.S.2d 584 (2006); Exxon Corp. v. Makofski, 116 SW 3d 176 (Tex. Ct. App. 2003).

[26] Goldstein here and elsewhere has confused significance probability with the posterior probability required by courts and scientists.

[27] Margaret A. Berger, “The Admissibility of Expert Testimony,” in RMSE3d 11, 24 (2011).

[28] Cook v. Rockwell Int’l Corp., 580 F. Supp. 2d 1071, 1122 (D. Colo. 2006), rev’d and remanded on other grounds, 618 F.3d 1127 (10th Cir. 2010), cert. denied, ___ U.S. ___ (May 24, 2012).

[29] In re Viagra Products Liab. Litig., 658 F. Supp. 2d 936, 945 (D. Minn. 2009). 

[31] Id. at 256-57.

[32] Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” in RMSE3d 549, 573.

[33] Id. at 573 n.68.

[34] See In re Viagra Products Liab. Litig., 572 F. Supp. 2d 1071, 1081 (D. Minn. 2008).

[35] RMSE3d at 577 n.81.

[36] Id.

[37] 572 F. Supp. 2d 1071, 1081 (D. Minn. 2008).

[38] David H. Kaye & David A. Freedman, “Reference Guide on Statistics,” in RMSE3ed 209 (2011).

[39] Id. at 254 n.106.

[40] See Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” in RMSE3ed 549, 582, 626; John B. Wong, Lawrence O. Gostin, and Oscar A. Cabrera, Abogado, “Reference Guide on Medical Testimony,” in RMSE3ed 687, 724.  This confusion in nomenclature is regrettable, given the difficulty many lawyers and judges seem to have in following discussions of statistical concepts.

[41] See, e.g., Richard D. De Veaux, Paul F. Velleman, and David E. Bock, Intro Stats 545-48 (3d ed. 2012); Rand R. Wilcox, Fundamentals of Modern Statistical Methods 65 (2d ed. 2010).

[42] See also Daniel Rubinfeld, “Reference Guide on Multiple Regression,” in RMSE3d 303, 321 (describing a p-value > 5% as leading to failing to reject the null hypothesis).

[43] RMSE3d at 254.

[44] See Sander Greenland, “Nonsignificance Plus High Power Does Not Imply Support Over the Alternative,” 22 Ann. Epidemiol. 364, 364 (2012).

[45] Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” RMSE3ed 549, 582.

[46] RMSE3d at 579 n.88.

[47] Kenneth Rothman, Sander Greenland, and Timothy Lash, Modern Epidemiology 160 (3d ed. 2008).  See also Kenneth J. Rothman, “Significance Questing,” 105 Ann. Intern. Med. 445, 446 (1986) (“[Simon] rightly dismisses calculations of power as a weak substitute for confidence intervals, because power calculations address only the qualitative issue of statistical significance and do not take account of the results already in hand.”).

[48] RMSE3d at 582 n.93; id. at 582 n.94 (“Thus, in Smith v. Wyeth-Ayerst Labs. Co., 278 F.Supp. 2d 684, 693 (W.D.N.C. 2003), and Cooley v. Lincoln Electric Co., 693 F. Supp. 2d 767, 773 (N.D. Ohio 2010), the courts recognized that the power of a study was critical to assessing whether the failure of the study to find a statistically significant association was exonerative of the agent or inconclusive.”).

[49] See, e.g., Anthony J. Swerdlow, Maria Feychting, Adele C. Green, Leeka Kheifets, David A. Savitz, International Commission for Non-Ionizing Radiation Protection Standing Committee on Epidemiology, “Mobile Phones, Brain Tumors, and the Interphone Study: Where Are We Now?” 119 Envt’l Health Persp. 1534, 1534 (2011) (“Although there remains some uncertainty, the trend in the accumulating evidence is increasingly against the hypothesis that mobile phone use can cause brain tumors in adults.”).

[50] James Mortimer, Amy Borenstein, and Lorene Nelson, “Associations of welding and manganese exposure with Parkinson disease: Review and meta-analysis,” 79 Neurology 1174 (2012).

[51] Samuel Shapiro, “Meta-analysis/Shmeta-analysis,” 140 Am. J. Epidem. 771, 777 (1994).  See also Alvan Feinstein, “Meta-Analysis: Statistical Alchemy for the 21st Century,” 48 J. Clin. Epidem. 71 (1995).

[52] Allen v. Int’l Bus. Mach. Corp., No. 94-264-LON, 1997 U.S. Dist. LEXIS 8016, at *71–*74 (suggesting that meta-analysis of observational studies was controversial among epidemiologists).

[53] 706 F. Supp. 358, 373 (E.D. Pa. 1988).

[54] In re Paoli R.R. Yard PCB Litig., 916 F.2d 829, 856-57 (3d Cir. 1990), cert. denied, 499 U.S. 961 (1991); Hines v. Consol. Rail Corp., 926 F.2d 262, 273 (3d Cir. 1991).

[55] See “The Shmeta-Analysis in Paoli” (July 11, 2019).

[56] In re Joint E. & S. Dist. Asbestos Litig., 827 F. Supp. 1014, 1042 (S.D.N.Y. 1993).

[57] 52 F.3d 1124 (2d Cir. 1995).

[58] Institute of Medicine, Asbestos: Selected Cancers (Wash. D.C. 2006).

[59] See Michael O. Finkelstein and Bruce Levin, “Meta-Analysis of ‘Sparse’ Data: Perspectives from the Avandia CasesJurimetrics J. (2011).

[60] See Donna Stroup, et al., “Meta-analysis of Observational Studies in Epidemiology: A Proposal for Reporting,” 283 J. Am. Med. Ass’n 2008 (2000) (MOOSE statement); David Moher, Deborah Cook, Susan Eastwood, Ingram Olkin, Drummond Rennie, and Donna Stroup, “Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement,” 354 Lancet 1896 (1999).  See also Jesse Berlin & Carin Kim, “The Use of Meta-Analysis in Pharmacoepidemiology,” in Brian Strom, ed., Pharmacoepidemiology 681, 683–84 (4th ed. 2005); Zachary Gerbarg & Ralph Horwitz, “Resolving Conflicting Clinical Trials: Guidelines for Meta-Analysis,” 41 J. Clin. Epidemiol. 503 (1988).

[61] See Finkelstein & Levin, supra at note 59. See also In re Bextra and Celebrex Marketing Sales Practices and Prod. Liab. Litig., 524 F. Supp. 2d 1166, 1174, 1184 (N.D. Cal. 2007) (holding that reliance upon “[a] meta-analysis of all available published and unpublished randomized clinical trials” was reasonable and appropriate, and criticizing the expert witnesses who urged the complete rejection of meta-analysis of observational studies).

[62] RMSE 3d at 254 n.107.

[63] Id. at 289.

[64] Reference Guide on Epidemiology, RMSE3d at 624.  See also id. at 581 n. 89 (“Meta-analysis is better suited to combining results from randomly controlled experimental studies, but if carefully performed it may also be helpful for observational studies, such as those in the epidemiologic field.”).

[65] Id. at 579; see also id. at 607 n. 171.

[66] Id. at 607.

[67] Id. at 607 n.177.

[68] Id. at 608.

[69] RMSE 3d at 722-23.

[70] Id. at 723 n.143 (“143. … Video Software Dealers Ass’n v. Schwarzenegger, 556 F.3d 950, 963 (9th Cir. 2009) (analyzing a meta-analysis of studies on video games and adolescent behavior); Kennecott Greens Creek Min. Co. v. Mine Safety & Health Admin., 476 F.3d 946, 953 (D.C. Cir. 2007) (reviewing the Mine Safety and Health Administration’s reliance on epidemiological studies and two meta-analyses).”).

[71] Id. at 723-24.

State-of-the-Art Legal Defenses and Shifty Paradigms

October 16th, 2021

The essence of a failure-to-warn claim is that (1) a manufacturer knows, or should know, about a harmful aspect of its product, (2) which knowledge is not appreciated by customers, (3) the manufacturer fails to warn adequately of this known harm, and (4) the manufacturer’s failure to warn causes the plaintiff to sustain the particular harm of which the manufacturer had knowledge, actual or constructive.

There are myriad problems with assessing the knowledge component in failure-to-warn claims. Some formulations impute to manufacturers the knowledge of an expert in the field. First, which expert’s claim to knowledge counts for or against the existence of a duty? The typical formulation begs the question which expert’s understanding will control when experts in the field disagree. Second, and equally problematic, knowledge has a temporal aspect. There are causal relationships we “know” today, which we did not know in times past. This temporal component becomes even more refractory for failure-to-warn claims when the epistemic criteria for claims of knowledge change over time.

In the early 20th century, infectious disease epidemiology, with its reliance upon Koch’s postulates, dominated the model of causation used in public and scientific discourse. The very nature of Koch’s postulates made the identification of a specific pathogen necessary to the causation of a specific disease. Later in the first half of the 20th century, epidemiologists and clinicians came to realize that the specific pathogen may be necessary but not sufficient for inducing a particular infectious disease. Still, there was some comfort in having causal associations predicated upon necessary relationships. Clinicians and clinical scientists did not have to worry too much about probability theory or statistics.

The development of causal models in which the putative cause was neither necessary nor sufficient for bringing about the outcome of interest was a substantial shock to the system. In the absence of one-to-one specificity, scientists had to account for confounding variables in ways that they had not done previously. The implications for legal state-of-the-art defenses could not be more profound. In the first half of the 20th century, case reports and series were frequently seen as adequate for suggesting and establishing causal relationships. By the end of the 1940s, scientists were well aware of the methodological inappropriateness of relying upon case reports and series, and of the need for analytical epidemiologic studies to support causal claims.

Several historians of science have addressed the changing causal paradigm, which ultimately would permit and even encourage scientists to identify causal associations, even when the exposures studied were neither necessary nor sufficient to bring about the end point of interest. In 2011, Mark Parascandola, while he was an epidemiologist in the National Cancer Institute’s Tobacco Control Research Branch, wrote an important history of this paradigm shift and its implications in epidemiology.[1] His paper should be required reading for all lawyers who work on “long-tail” litigation, involving claims that risks were known to manufacturers even before World War II.

In Parascandola’s history, epidemiology and clinical science focused largely on infectious diseases in the early 20th century, and as a result, causal association was seen through the lens of Koch’s postulates with its implied model of necessary and sufficient conditions for causal attribution. Not until after World War II did “risk factor” epidemiology emerge to address the causal role of exposures – such as tobacco smoking – that were neither necessary nor sufficient for causing an outcome of interest.[2]

The shift from infectious to chronic diseases, such as cancer and cardiovascular disease, occurred in the 1950s, and brought with it acceptance of different concepts of causation, which involved stochastic events, indeterminism, multi-factorial contributions, and confounding of observations by independent but correlated causes. The causal criteria for infectious disease were generally unhelpful in supporting causal claims about chronic diseases.

Parascandola characterizes the paradigm shift as a “radical change,” influenced by developments in statistics, quantum mechanics, and causal theory.[3] Edward Cuyler Hammond, an epidemiologist with the American Cancer Society, for example, wrote in 1955 that:

“[t]he cause of an effect has sometimes been defined as a single factor which antecedes, which is necessary, and which is sufficient to produce the effect. Clearly this definition is inadequate for the study of biologic phenomena, which are produced by a complex pattern of environmental conditions interacting with the highly complex and variable make-up of living organisms.”[4]

The shift in causal models within epidemiologic thinking and research introduced new complexity with important practical implications. Gone was the one-to-one connection between pathogens (or pathogenic exposures) and specific diseases. Specificity was an important victim of the new model of causation. Causal models had to account for multi-factorial contributions to disease.[5] Confounding, the correlation between exposures of interest and other exposures that were truly driving the observations, became a substantial threat to validity. The discerning lens of analytical epidemiology was able to identify tobacco smoking as a cause of lung cancer only because of the large increased risks, ten-fold and greater, observed in multiple studies. There were no competing but independent risks of that magnitude, at hand, which could eliminate or reverse the observed tobacco risks.

Parascandola notes that in the 1950s, the criteria for causal assessment were in flux and the subject of debate:

“Previous informal rules or guides for inference, such as Koch’s postulates, were not adequate to identify partial causes of chronic disease based on a combination of epidemiologic and laboratory evidence.”[6]

As noted above, the legal implications of Parascandola’s historical analysis are hugely important.  Scientists and statisticians were scrambling to develop appropriate methodologies to accommodate the changed causal models and causal criteria. Mistakes were made along the way as the models and criteria changed. In Sir Richard Doll’s famous 1955 study of lung cancer among asbestos factory workers, the statistical methods were surprisingly primitive by the standards of modern epidemiology. Even more stunning, Sir Richard failed to incorporate smoking histories or to account for confounding from smoking before concluding that lung cancer was associated with long-term asbestos factory work that had induced asbestosis.[7]

Not until the late 1950s and early 1960s did statisticians develop multivariate models to help assess potential confounding.[8] Perhaps the most cited paper in epidemiology was published by Nathan Mantel (the pride of the Brooklyn Hebrew Orphan Asylum) and William Haenszel in 1959. Its approach to stratification of sample analyses was further elaborated upon by the authors and others all through the 1960s and into the 1970s.[9]
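
The core of the Mantel-Haenszel approach can be stated simply: stratify the data on the suspected confounder, and combine the stratum-specific two-by-two tables into one summary odds ratio. A minimal sketch, with hypothetical data (not Mantel and Haenszel’s own):

def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel summary odds ratio across 2x2 strata.
    Each stratum is (a, b, c, d): exposed cases, exposed controls,
    unexposed cases, unexposed controls. Illustrative sketch only."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# hypothetical case-control data, stratified by smoking status (the confounder)
strata = [
    (40, 60, 20, 80),   # smokers
    (10, 90, 5, 95),    # non-smokers
]
print(f"Mantel-Haenszel odds ratio = {mantel_haenszel_odds_ratio(strata):.2f}")

Within each stratum the comparison is smokers against smokers, or non-smokers against non-smokers, so the summary estimate (about 2.5 on these made-up numbers) is no longer driven by an imbalance in smoking between the exposed and unexposed groups.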

Similarly, the evolution of criteria for causal attribution based upon risk factor epidemiology required decades of discussion and debate. Reasonably well defined criteria did not emerge until the mid-1960s, with the famous Public Health Service report on smoking and lung cancer,[10] and Sir Austin Bradford Hill’s famous after-dinner talk to the Royal Society of Medicine.[11]

Several years before Parascandola published his historical analysis, three historians of science published a paper with a very similar thesis.[12] These authors noted that there was, indeed, a legitimate controversy over whether tobacco smoking caused lung cancer in the 1950s and early 1960s, as the mechanistic Koch’s postulates gave way to the statistical methods of risk-factor epidemiology. The historians’ paper observed that by the 1950s, infectious diseases such as tuberculosis were in retreat, and the public health community’s focus was on chronic diseases such as lung cancer. The lung cancer controversy of the 1950s pushed scientists to revise their conceptions of causation,[13] and ultimately led to the strengthening and legitimizing of the field of epidemiology.[14] The growing acceptance of epidemiologic methods for identifying causes that are neither necessary nor sufficient pushed aside the attachment to Koch’s postulates and the skepticism over statistical reasoning.

Interestingly, this historians’ paper was funded completely by the Rollins School of Public Health of Emory University. Two of the authors had been sought out by a recruiting agency for the tobacco industry, but fell out with the agency and the tobacco companies when they realized that they could not support the litigation goals. In a footnote, the authors emphasized that their factual analysis and argument contradicted the industry’s desired defense.[15]

Reaching back even farther in time, there is the redoubtable Irving John Selikoff, who wrote in 1991:

“We are inevitably bound by the knowledge of the time in which we live. An example may be given. During the 1930s and 1940s, random instances of lung cancer occurring among workers exposed to asbestos were reported and attention was called to these by the collection of cases both in registers and in review papers. With the continued growth of the asbestos industry, it was deemed wise to epidemiologically examine the proposed association. This was done in an elegant, innovative, well-considered study by Richard Doll, a study which any one of us would have been proud to report in 1955.”[16]

What is ironic is that Dr. Selikoff had testified for plaintiffs’ counsel as an expert witness specifically on state of the art, or the question of when defendants should have known and warned that asbestos caused lung cancer.[17] Dr. Selikoff ultimately withdrew from testifying, in large part because his views on this matter were not particularly helpful to plaintiffs.

The shift in causal criteria, and the rejection of case reports and case series, can be seen in the suggestion, in the 1930s, by a few pathologists that silicosis caused lung cancer. The few scientists who made this causal claim relied heavily upon anecdotal and uncontrolled necropsy series.[18]

After World War II, these causal claims fell into disrepute as not properly supported by valid scientific methodology. Dr. Madge Thurlow Macklin, a female pioneer in clinical medicine and epidemiology,[19] and one of the early adopters of statistical methodology in her work, debunked the causal claims:

“If silicosis is being considered as a causative agent in lung cancer, the control group should be as nearly like the experimental or observed group as possible in sex, age distribution, race, facilities for diagnosis, other possible carcinogenic factors, etc. The only point in which the control group should differ in an ideal study would be that they were not exposed to free silica, whereas the experimental group was. The incidence of lung cancer could then be compared in the two groups of patients.

This necessity is often ignored; and a ‘random’ control group is obtained for comparison on the assumption that any group taken at random is a good group for comparison. Fallacious results based on such studies are discussed briefly.”[20]

Macklin’s advice sounds like standard-operating procedure today, but in the 1940s, it was viewed as radical and wrong by many physicians and clinical scientists.

Of course, the change over time in the knowledge of, and techniques for, diagnostic methods, quantitative measurements, and disease definitions also affect litigated issues. The change in epistemic standards and causal criteria, however, fundamentally changed legal standards for tort liability. The shift from deterministic models of necessary and sufficient causation to risk factor causation had, and continues to have, enormous ramifications for the legal adjudication of questions concerning when companies, held to the knowledge of an expert in the field, should have started to warn about the risks created by their products. Mind the gap!


[1] Mark Parascandola, “The epidemiologic transition and changing concepts of causation and causal inference,” 64 Revue d’histoire des sciences 243 (2011).

[2] Id. at 245.

[3] Id. at 248.

[4] Id. at 252, citing Edward Cuyler Hammond, “Cause and Effect,” in Ernest L. Wynder, ed., The Biologic Effects of Tobacco (1955).

[5] Id. at 257.

[6] Id.

[7] Richard Doll, “Mortality from Lung Cancer in Asbestos Workers,” 12 Brit. J. Indus. Med. 81 (1955).

[8] See Parascandola at 258.

[9] Nathan Mantel & William Haenszel, “Statistical aspects of the analysis of data from retrospective studies of disease,” 22 J. Nat’l Cancer Instit. 19 (1959). See Mervyn Susser, “Epidemiology in the United States after World War II: The Evolution of Technique,” 7 Epidemiology Reviews 147 (1985).

[10] Surgeon General, Smoking and Health: Report of the Advisory Committee to the Surgeon General of the Public Health Service, PHS publication No. 1103 (1964).

[11] Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 295 (1965).

[12] Colin Talley, Howard I. Kushner & Claire E. Sterk, “Lung Cancer, Chronic Disease Epidemiology, and Medicine, 1948-1964,” 59 J. History Med. & Allied Sciences 329 (2004) [Talley]. Parascandola appeared not to have been aware of this article; at least he did not cite it.

[13] Id. at 374.

[14] Id. at 334.

[15] Id. at 329.

[16] Irving John Selikoff, “Statistical Compassion,” 44 J. Clin. Epidemiol. 141S, 142S (1991) (internal citations omitted) (emphasis added).

[17]Selikoff and the Mystery of the Disappearing Testimony,” (Dec. 3, 2010). See also Peter W.J. Bartrip, “Irving John Selikoff and the Strange Case of the Missing Medical Degrees,” 58 J. History Med. 3, 27 & n.88-92 (2003) (quoting insulator union President Andrew Haas, as saying “[w]e all owe a great debt of thanks for often and expert testimony on behalf of our members … .” Andrew Haas, Comments from the General President, 18 Asbestos Worker (Nov. 1972)).

[18] See, e.g., Max O. Klotz, “The Association Silicosis & Carcinoma of Lung 1939,” 35 Cancer Research 38 (1939); C.S. Anderson & J. Heney Dible, “Silicosis and carcinoma of the lung,” 38 J. Hygiene 185 (1938).

[19] Barry Mehler, “Madge Thurlow Macklin,” from Barbara Sicherman and Carl Hurd Green, eds., Notable American Women: The Modern Period 451-52 (1980); Laura Lynn Windsor, Women in Medicine: An Encyclopedia 134 (2002).

[20] Madge Thurlow Macklin, “Pitfalls in Dealing with Cancer Statistics, Especially as Related to Cancer of the Lung,” 14 Diseases Chest 525, 532-33, 529-30 (1948). See also “History of Silica Litigation – the Lung Cancer Angle” (Feb. 3, 2019); “The Unreasonable Success of Asbestos Litigation” (July 25, 2015); “Careless Scholarship about Silica History” (July 21, 2014) (discussing David Egilman); “Silicosis, Lung Cancer, and Evidence-Based Medicine in North America” (July 4, 2014).

Rule 702 is Liberal, Not Libertine; Epistemic, Not Mechanical

October 4th, 2021

One common criticism of expert witness gatekeeping after the Supreme Court’s Daubert decision has been that the decision contravenes the claimed “liberal thrust” of the Federal Rules of Evidence. The criticism has been repeated so often as to become a cliché, but its frequent repetition by lawyers and law professors hardly makes it true. The criticism fails to do justice to the range of interpretations of “liberal” in the English language, the context of expert witness common law, and the language of Rule 702, both before and after the Supreme Court’s Daubert decision.

The first problem with the criticism is that the word “liberal,” or the phrase “liberal thrust,” does not appear in the Federal Rules of Evidence. The drafters of the Rules did, however, set out the underlying purpose of the federal codification of common law evidence in Rule 102, with some care:

“These rules should be construed so as to administer every proceeding fairly, eliminate unjustifiable expense and delay, and promote the development of evidence law, to the end of ascertaining the truth and securing a just determination.”

Nothing could promote ascertaining truth and achieving just determinations more than eliminating weak and invalid scientific inference in the form of expert witness opinion testimony. Barring speculative, unsubstantiated, and invalid opinion testimony before trial certainly has the tendency to eliminate full trials, with their expense and delay. And few people would claim unfairness in deleting invalid opinions from litigation. If there is any “liberal thrust” in the purpose of the Federal Rules of Evidence, it serves to advance the truth-finding function of trials.

And yet some legal commentators go so far as to claim that Daubert was wrongly decided because it offends the “liberal thrust” of the federal rules.[1] Of course, it is true that the Supreme Court spoke of the basic standard of relevance in the Federal Rules as being a “liberal” standard.[2] And in holding that Rule 702 did not incorporate the so-called Frye general acceptance rule,[3] the Daubert Court observed that the drafting history of Rule 702 failed to mention Frye, just before invoking the liberal-thrust cliché:

“[A] rigid ‘general acceptance’ requirement would be at odds with the ‘liberal thrust’ of the Federal Rules and their ‘general approach of relaxing the traditional barriers to ‘opinion testimony’.”[4]

The Court went on to cite one district court judge famously hostile to expert witness gatekeeping,[5] and to point to the “permissive backdrop” of the Rules, in holding that the Rules did not incorporate Frye,[6] which it characterized as an “austere” standard.[7]

While the Frye standard may have been “austere,” it was also widely criticized. It was also true that the Frye standard was largely applied to scientific devices and not to the scientific process of causal inference. The Frye case itself addressed the admissibility of a systolic blood pressure deception test, an early attempt by William Marston to design a lasso of truth. When courts distinguished the Frye cases on grounds that they involved devices, not causal inferences, they left no meaningful standard in place.

As a procedural matter, the Frye general acceptance standard made little sense in the context of causal opinions. If the opinion itself was generally accepted, then of course it would have to be admitted. Indeed, if the proponent sought judicial notice of the opinion, a trial court would likely have to admit the opinion, and then bar any contrary opinion as not generally accepted.

To be sure, before the Daubert decision, defense counsel attempted to invoke the Frye standard in challenges to the underlying methodology used by expert witnesses to draw causal inferences. There were, however, few such applications. Although that is not exactly how Frye operated, the Supreme Court might have imagined that the Frye standard required all expert witness opinion testimony to be based on “sufficiently established” and accepted scientific methods. The actual language of the 1923 Frye case provides some ambivalent support with its twilight zone standard:

“Just when a scientific principle or discovery crosses the line between the experimental and demonstrable stages is difficult to define. Somewhere in this twilight zone the evidential force of the principle must be recognized, and while the courts will go a long way in admitting expert testimony deduced from a well-recognized scientific principle or discovery, the thing from which the deduction is made must be sufficiently established to have gained general acceptance in the particular field in which it belongs.”[8]

There was always an interpretative difficulty in how exactly a trial court was supposed to poll the world’s scientific community to ascertain “general acceptance.” Moreover, the rule actually before the Daubert Court, Rule 702, spoke of “knowledge.” At best, “general acceptance,” whether of methods or of conclusions, was merely a proxy, and often a very inaccurate one, for the epistemic basis of the disputed claims or conclusions at issue in litigation.

In cases involving causal claims before Daubert, expert witness opinions received scant attention from trial judges as long as the proffered expert witness met the very minimal standard of expertise needed to qualify to give an opinion. Furthermore, Rule 705 relieved expert witnesses of having to provide any bases for their opinions on direct examination. The upshot was that the standard for admissibility was authoritarian, not epistemic. If the proffered witness had a reasonable pretense to expertise, then the proffering party could parade him or her as an “authority,” on whose opinion the jury could choose to rely in its fact finding. Given this context, any epistemic standard would be “liberal” in freeing the jury or fact finder from the yoke of authoritarian expert witness ipse dixit.

And what exactly is the “liberal” in all this thrusting over Rule 702? Most dictionaries report that the word “liberal” traces back to the Latin liber, meaning “free.” The Latin word is thus the root of both liberty and libertine. One of the major, early uses of the adjective liberal was in the phrase “liberal arts,” meant to denote courses of study freed from authority, dogmas, and religious doctrine. The primary definition provided by the Oxford English Dictionary emphasizes this specific meaning:

“1. Originally, the distinctive epithet of those ‘arts’ or ‘sciences’ (see art 7) that were considered ‘worthy of a free man’; opposed to servile or mechanical.  … . Freq. in liberal arts.”

The Frye general acceptance standard was servile in the sense of its deference to others who were the acceptors, and it was mechanical in its reducing a rule that called for “knowledge” into a formula for nose-counting among the entire field in which an expert witness was testifying. In this light, the Daubert Court’s decision is obvious.

To be sure, the OED provides other subordinate or secondary definitions for “liberal,” such as 3c:

“Of construction or interpretation: Inclining to laxity or indulgence; not rigorous.”

Perhaps this definition would suggest that a liberal interpretation of Rule 702 would lead to rejecting the Frye standard because Frye was indeed rigorous, but its rigor lay in determining admissibility by a rigid proxy that was not necessarily tied to the rule’s requirement of knowledge. Of course, knowledge or epistemic criteria in the Rule imply a different sort of rigor, one that is not servile or mechanical.

The epistemic criterion built into the original Rule 702, and carried forward in every amendment, accords with the secondary meanings given by the OED:

“4. a. Free from narrow prejudice; open-minded, candid.

  b. esp. Free from bigotry or unreasonable prejudice in favour of traditional opinions or established institutions; open to the reception of new ideas or proposals of reform.”

The Daubert case represented a step in the direction of the classically liberal goal of advancing the truth-finding function of trials. The counter-revolution of “let it all in,” whether under the guise of treating challenges to expert witness opinions as going to “weight not admissibility,” or of inventing “presumptions of admissibility,” should be seen for what it is: a retrograde and illiberal movement against jurisprudential progress.


[1] See, e.g., Michael H. Graham, “The Expert Witness Predicament: Determining ‘Reliable’ Under the Gatekeeping Test of Daubert, Kumho, and Proposed Amended Rule 702 of the Federal Rules of Evidence,” 54 U. Miami L. Rev. 317, 321 (2000) (“Daubert is a very incomplete case, if not a very bad decision. It did not, in any way, accomplish what it was meant to, i.e., encourage more liberal admissibility of expert witness evidence.”).

[2] Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579, 587 (1993).

[3] Frye v. United States, 293 F. 1013 (D.C. Cir. 1923).

[4] Id. at 588, citing Beech Aircraft Corp. v. Rainey, 488 U.S. 153, 169 (citing Rules 701 to 705); see also Edward J. Imwinkelried, “A Brief Defense of the Supreme Court’s Approach to the Interpretation of the Federal Rules of Evidence,” Indiana L. Rev. 267, 294 (1993) (writing of the “liberal structural design” of the Federal Rules).

[5] Jack B. Weinstein, “Rule 702 of the Federal Rules of Evidence is Sound; It Should Not Be Amended,” 138 F.R.D. 631 (1991) (“The Rules were designed to depend primarily upon lawyer-adversaries and sensible triers of fact to evaluate conflicts”).

[6] Daubert at 589.

[7] Id.

[8] Frye v. United States, 54 App. D.C. 46, 293 F. 1013 (1923).

Expert Witness Reports Are Not Admissible

August 23rd, 2021

The tradition of antic proposals to change the law of evidence is old and venerable in the common law. In the early 19th century, Jeremy Bentham deviled the English bench and bar with sweeping proposals to place evidence law on a rational foundation. Bentham’s contributions to jurisprudence, like his utilitarianism, often ignored the realities of human experience and decision making. Although Bentham contributed little to the actual workings of courtroom law and procedure, he gave rise to a tradition of antic proposals that have long entertained law professors and philosophers.[1]

Bentham seemingly abhorred tradition, but his writings have given rise to a tradition of antic proposals in the law. Expert witness testimony was uncommon in the early 19th century, but today, hardly a case is tried without expert witnesses. We should not be surprised, therefore, by the rise of antic proposals for reforming the evidence law of expert witness opinion testimony.[2]

A key aspect of the Bentham tradition is to ignore the actual experience and conduct of human affairs. And so now we have a proposal to shorten trials by forgoing direct examination of expert witnesses and admitting the expert witnesses’ reports into evidence.[3] The argument contends that since the Rule 26 report requires disclosure of all the expert witnesses’ substantive opinions and all bases for their opinions, the witnesses’ viva voce testimony is merely a recital of the report. The argument proceeds that reports can be helpful in understanding complex issues and in moving trials along more efficiently.

As much as all lawyers want to promote “understanding,” and make trials more efficient, the argument fails on multiple levels. First, judges can read the expert witness reports, in bench or in jury trials, to help themselves prepare for trial, without admitting the reports into evidence. Second, the rules of evidence, which are binding upon trial judges in both bench and jury trials, require that the testimony be helpful, not the reports. Third, the argument ignores that for the last several years, the federal rules have allowed lawyers to draft reports to a large extent, without any discovery into whose phraseology appears in a final report.

Even before the federal rules created an immunity to discovery into who drafted specific language of an expert report, it was not uncommon to find at least some parts of an expert witness’s report that did not accurately summarize the witness’s views at the time he or she gave testimony. Often the process of discovery caused expert witnesses to modify their reports, whether through skillful inquiry at deposition, or through the submission of adversarial reports, or through changes in the evidentiary display between drafting the report and testifying at trial.

In other words, expert witnesses’ testimony rarely comes out exactly as it appears in Rule 26 reports. Furthermore, reports may be full of argumentative characterizations of facts, which fail to survive routine objections and cross-examination. What is represented as a fact or a factual predicate of an opinion may never be cited in testimony because the expert’s representation was always false or hyperbolic. The expert witnesses are typically not percipient witnesses, and any alleged fact would not be admissible, under Rule 703, simply because it appeared in an expert witness’s report. Indeed, Rule 703 makes clear that expert witnesses can rely upon inadmissible hearsay as long as experts in their fields reasonably would do so in the ordinary course of their professions.

Voir dire of charts, graphs, and underlying data may result in large portions of an expert report becoming inadmissible. Not every objection will be submitted as a motion in limine; and not every objection rises to the level of a Rule 702 or 703 pre-trial motion to exclude the expert witness. Foundational lapses or gaps may render some parts of reports inadmissible.

The argument for admitting reports as evidence reflects a trend toward blowsy, frowsy jurisprudence. Judges should be listening carefully to testimony, both direct and cross, from expert witnesses. They will have transcripts at their disposal. Although the question and answer format of direct examination may take some time, it ensures the orderly presentation of admissible testimony.

Given that testimony often turns out differently from the unqualified statements in a pre-trial report, the proposed admissibility of reports will create evidentiary chaos when there is a disparity between report and testimony, or when there is a failure to elicit as testimony something that is stated in the report. Courts and litigants need an unequivocal record of what is in evidence when moving to strike testimony, or for directed verdicts, new trials, or judgments notwithstanding the verdict.

The proposed abridgement of expert witness direct examinations would allow further gaming by not calling an expert witness once the witness’s report has been filed. Expert witnesses may conveniently become unavailable, after their reports have been admitted into evidence.

In multi-district litigations, the course of litigation may take years and even decades. Reports filed early on may not reflect current views or the current state of the science. Deeming filed reports “admissible” could have a significant potential to subvert accurate fact finding.

In Ake v. General Motors Corp.,[4] Chief Judge Larimer faced a plaintiff who sought to offer in evidence a report written by the plaintiff’s expert witness, who was scheduled to testify at trial. The trial court held, however, that the report was inadmissible hearsay, for which no exception was available.[5] The report at issue was not a business record, which might have been admissible under Rule 803(6), in that it was not a record made at or near the time of the events at issue, and the events did not involve the expert witness’s regularly conducted business activity.

There are plenty of areas of the law in which reforms are helpful and necessary. The formality of presenting an expert witness’s actual opinions, under oath, in open court, subject to objections and challenges, needs no abridgement.


[1] See, e.g., William Twining, “Bentham’s Theory of Evidence: Setting a Context,” 20 J. Bentham Studies 18 (2019); Kenneth M. Ehrenberg, “Less Evidence, Better Knowledge,” 2 McGill L.J. 173 (2015); Laird C. Kirkpatrick, “Scholarly and Institutional Challenges to the Law of Evidence: From Bentham to the ADR Movement,” 25 Loyola L.A. L. Rev. 837 (1992); Frederick N. Judson, “A Modern View of the Law Reforms of Jeremy Bentham,” 10 Columbia L. Rev. 41 (1910).

[2] See “Expert Witness Mining – Antic Proposals for Reform” (Nov. 4, 2014).

[3] Roger J. Marzulla, “Expert Reports: Objectionable Hearsay or Admissible Evidence in a Bench Trial?” A.B.A. (May 17, 2021).

[4] 942 F.Supp. 869 (W.D.N.Y. 1996).

[5] Ake v. General Motors Corp., 942 F.Supp. 869, 877 (W.D.N.Y. 1996).

Crying Wolf Projected

August 10th, 2021

Over the years, I have written about David Rosner and Gerald Markowitz, two academic historians, who testify a lot for the lawsuit industry, mostly in asbestos cases, but also in cases involving exposures to lead, silica, and vinyl chloride. Rosner and fellow-traveller Markowitz, or Rosnowitz for short, are fond of telling two stories: (1) how some suspect organization tried to recruit them to testify for hire for defendants in litigation, and (2) how I had the audacity to criticize their suspect historical scholarship about silica, silicosis, and silica litigation.[1]

I was shocked (really) to find that Rosner and Markowitz were at the center of an effort to recruit historians for hire to write attacks on opponents of their socialist ideology: both historians sit, or have sat, on the Project Advisory Board of the Cry Wolf Project. Back in 2010, this “project” was engaged in hiring historians to write white papers (or should they be “rainbow papers”?) to stop or discredit opposition to “progressive policy” options.[2] Imagine that: historians for hire by the Left.

Lest you think that the Cry Wolf Project is some innocent group of social justice warriors, you should know that the project has a Nixonian or Stalinist (take your pick) enemies list of “culprits,” including:

Academics
American Medical Association
American Petroleum Institute
American Textile Manufacturers Institute
Business Roundtable
Chamber of Commerce
Conservative media
Democrats
Energy Industry
Financial Institutions
Food Industry
Mainstream media
National Association of Manufacturers (NAM)
National Federation of Independent Business (NFIB)
National Grain and Feed Association
Republicans
Think tanks

No surprise, but the Cry Wolf Project is the darling of socialist academicians. Jake Blumgart, a researcher for the Cry Wolf Project, attempted to explain:

“Progressives need to construct a counter-narrative that demonstrates that in many cases these claims [of conservatives] have been, and continue to be, grossly exaggerated. The Cry Wolf Project wants media, opinion leaders, and policy makers to respond ‘There they go again!’ when industry ‘cries wolf.’ Such a refrain will undermine the credibility and arguments of organizations.”[3]

Ah, attacking the messenger; manufacturing doubt; and projecting bad motives and psychological weaknesses upon opponents. Almost full-bore Trumpism. In our current tribalist politics, the extent to which both sides impute their own motives to other tribes is fascinating.

And who is this “Talking Union,” for which Jake Blumgart writes? According to its website, Talking Union is:

“a project of the labor network of Democratic Socialists of America. We will report on the activities and views of DSA and Young Democratic Socialists of America labor activists. We seek to be a place for a broad range of labor activists to discuss ideas for the renewal and strengthening of the labor movement.”

And in this daisy-chain of institutional affiliations, who are the “Democratic Socialists of America”? With thanks to Al Gore for having invented the internet, we can find an answer quickly. The Democratic Socialists of America is an organization; indeed, it is:

“the largest socialist organization in the United States, with over 92,000 members and chapters in all 50 states. We believe that working people should run both the economy and society democratically to meet human needs, not to make profits for a few.

We are a political and activist organization, not a party; through campus and community-based chapters, DSA members use a variety of tactics, from legislative to direct action, to fight for reforms that empower working people.

The Democratic Socialists of America is the largest socialist organization in the United States because we’re a member-driven mass organization. We believe that working people should run both the economy and civil society, and we show our commitment to this principle by being an organization of, by, and for the working class.”

I have quoted at length from the Democratic Socialists’ website to make clear that this is not simply a group of “progressives”; they are activists who are engaged in what they conceive of as class warfare. In their own words, they would limit democracy to those people who fit their definition of working people, and they hold that the interests of the “working class” are paramount. At times, there may be only a thin line between trying to tame the excesses of capitalism, such as employers’ failures to protect workers, and outright communism. The Democratic Socialists are quite open about which side of the line they occupy. The apparent commitment to democracy appears to be a sham; not everyone is entitled to run the economy and society, only “working people” are.

There is no democracy in the worldview of the “Democratic Socialists”; the line between their stated goals and those of Marxism is imaginary.[4] Just as Trump has a man crush on Putin, socialist George Bernard Shaw had one on Stalin,[5] Kulaks be damned.

From the Crying Wolf Project, with its counter-narratives, we have traced the ideology to the Talking Union, to the Democratic Socialists of America, to Marxism.

Well, I have had friends who were Marxists, and I would not advocate that Marxists should be kept from teaching in universities, or that Marxists should not enjoy the same freedom of speech and association that we all enjoy. Marxists, however, have an ideological commitment to historical materialism, by which everything can be, and must be, explained by class conflict. Given these commitments, can Marxist historians testify impartially in litigation that involves what they perceive to be class interests and an opportunity to “empower” working-class claimants? It would seem that positional commitments to the interests of the “working class” create conscious and unconscious biases when exploring historical issues that touch on labor-management relations.

Lawyers are accustomed to, and know how to exploit, bias that results from money, institutional loyalties, and friendships.[6] And yet, there are real conflicts of interest generated by scientists’ affiliations with advocacy groups, labor unions, or the lawsuit industry, not to mention their deeply held political commitments.[7] The ideological commitments revealed by the writings of the website sponsored by the Democratic Socialists of America should raise questions about expert witnesses who have deep ties to the group.

Historians would seem particularly vulnerable to biased assessments of whether knowledge of hazards was shared by industry and labor, as well as by their respective industrial hygiene advisors, governmental actors, academia, and the medical community. Nonetheless, the case books are notably devoid of precedents about discovery into political commitments, whereas the cases about discovery of fees, income, and percentages of defense versus plaintiffs’ work are legion.


[1] “Succès de scandale – With Thanks to Rosner & Markowitz” (Mar. 26, 2017). See D. Rosner & G. Markowitz, “The Trials and Tribulations of Two Historians: Adjudicating Responsibility for Pollution and Personal Harm,” 53 Medical History 271, 280-81 (2009); D. Rosner & G. Markowitz, “L’histoire au prétoire. Deux historiens dans les procès des maladies professionnelles et environnementales,” 56 Revue D’Histoire Moderne & Contemporaine 227, 238-39 (2009); David Rosner, “Trials and Tribulations: What Happens When Historians Enter the Courtroom,” 72 Law & Contemporary Problems 137, 152 (2009); David Rosner & Gerald Markowitz, “The Historians of Industry,” Academe (Nov. 2010).

[2] “Counter Narratives for Hire” (Dec. 13, 2010). Other members of the Project Advisory Board include Robert Kuttner (co-founder & co-editor, American Prospect), Alice O’Connor (Univ. California, Santa Barbara), Janice Fine (Rutgers Univ.), Andrea M. Hricko (Southern California Envt’l Health Sciences Center), Jennifer Klein (Yale Univ.), Meg Jacobs (Mass. Instit. Tech.), William Forbath (Univ. Texas Law School), Tom Sugrue (Univ. Pennsylvania), and Lizabeth Cohen (Harvard Univ.).

[3] Jake Blumgart, “Introducing The Cry Wolf Project,” Talking Union (June 17, 2011).

[4] Staff, “Academia’s latest propaganda factory, the ‘Cry Wolf’ project,” San Francisco Examiner (June 11, 2010).

[5] Fintan O’Toole, “Why George Bernard Shaw Had a Crush on Stalin,” N.Y. Times (Sept. 11, 2017).

[6] Sahana Pal, “Establishing Bias in an Expert Witness: The What, Why and How,” 14 Internat’l Commentary on Evid. 43 (2016); Anthony F. Della Pelle & Richard P. De Angelis, Jr., “Proving Positional Bias: How much discovery should be permitted of an expert witness’s financial interests?A.B.A. Litigation Comm. (April 20, 2011); Michael H. Graham, “Impeaching the Professional Expert Witness by a Showing of Financial Interest,” 53 Indiana L. J. 35 (1977).

[7] “Can Expert Bias and Prejudice Disqualify a Witness From Testifying?” (Oct. 11, 2014).

The opinions, statements, and asseverations expressed on Tortini are my own, or those of invited guests, and these writings do not necessarily represent the views of clients, friends, or family, even when supported by good and sufficient reason.