TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

The Hazard of Composite End Points – More Lumpenepidemiology in the Courts

October 20th, 2018

One of the challenges of epidemiologic research is selecting the right outcome of interest to study. What seems like a simple and obvious choice can often be the most complicated aspect of the design of clinical trials or studies.1 Lurking in this choice of end point is a particular threat to validity in the use of composite end points, when the real outcome of interest is one constituent among multiple end points aggregated into the composite. There may, for instance, be strong evidence in favor of one of the constituents of the composite, but using the composite end point results to support a causal claim for a different constituent begs the question that needs to be answered, whether in science or in law.

The dangers of extrapolating from one disease outcome to another are well recognized in the medical literature. Remarkably, however, the problem received no meaningful discussion in the Reference Manual on Scientific Evidence (3d ed. 2011). The handbook designed to help judges decide threshold issues of admissibility of expert witness opinion testimony discusses extrapolation from sample to population, from in vitro to in vivo, from one species to another, from high to low dose, and from long to short duration of exposure. The Manual, however, has no discussion of “lumping,” or of the appropriate (and inappropriate) use of composite or combined end points.

Composite End Points

Composite end points are typically defined, perhaps circularly, as a single group of health outcomes, which group is made up of constituent or single end points. Curtis Meinert defined a composite outcome as “an event that is considered to have occurred if any of several different events or outcomes is observed.”2 Similarly, Montori defined composite end points as “outcomes that capture the number of patients experiencing one or more of several adverse events.”3 Composite end points are also sometimes referred to as combined or aggregate end points.

Many composite end points are clearly defined for a clinical trial, and the component end points are specified. In some instances, the composite nature of an outcome may be subtle or be glossed over by the study’s authors. In the realm of cardiovascular studies, for example, investigators may look at stroke as a single endpoint, without acknowledging that there are important clinical and pathophysiological differences between ischemic strokes and hemorrhagic strokes (intracerebral or subarachnoid). The Fletchers’ textbook4 on clinical epidemiology gives the example:

“In a study of cardiovascular disease, for example, the primary outcomes might be the occurrence of either fatal coronary heart disease or non-fatal myocardial infarction. Composite outcomes are often used when the individual elements share a common cause and treatment. Because they comprise more outcome events than the component outcomes alone, they are more likely to show a statistical effect.”

Utility of Composite End Points

The quest for statistical “power” is often cited as a basis for using composite end points. Improvements in medical care have reduced the rates of “events,” such as myocardial infarction (MI), observed in studies and clinical trials. These low event rates have caused power problems for clinical trialists, who have responded by turning to composite end points to capture more events. Composite end points permit smaller sample sizes and shorter follow-up times without sacrificing power, that is, the probability of detecting, as statistically significant at a prespecified Type I error rate, an increased rate of a prespecified size. Increasing study power, while reducing sample size or observation time, is perhaps the most frequently cited rationale for using composite end points.
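To make the power rationale concrete, here is a minimal Python sketch using the standard normal approximation for comparing two proportions. Every input is hypothetical and chosen only for illustration: a 2% baseline rate for MI alone, a 6% baseline rate for a composite (say, MI, stroke, and cardiovascular death), a relative risk of 1.5 in the exposed group, and 2,000 patients per group. The function name `two_proportion_power` is an illustrative helper, not any statistical package's API.

```python
from statistics import NormalDist


def two_proportion_power(p_control: float, relative_risk: float,
                         n_per_arm: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test comparing event proportions in
    two equal-sized groups (normal approximation)."""
    z = NormalDist()  # standard normal distribution
    p_exposed = p_control * relative_risk
    diff = abs(p_exposed - p_control)

    # Standard error of the difference under the null (pooled proportion)
    p_bar = (p_control + p_exposed) / 2
    se_null = (2 * p_bar * (1 - p_bar) / n_per_arm) ** 0.5

    # Standard error of the difference under the alternative
    se_alt = ((p_control * (1 - p_control)
               + p_exposed * (1 - p_exposed)) / n_per_arm) ** 0.5

    z_crit = z.inv_cdf(1 - alpha / 2)
    # Probability that the test statistic clears the critical value in the
    # direction of the true effect (the opposite tail is negligible here)
    return z.cdf((diff - z_crit * se_null) / se_alt)


if __name__ == "__main__":
    # Hypothetical inputs: relative risk 1.5, 2,000 patients per group
    single = two_proportion_power(0.02, 1.5, 2000)    # MI alone, 2% baseline
    combined = two_proportion_power(0.06, 1.5, 2000)  # composite, 6% baseline
    print(f"single end point power: {single:.2f}")
    print(f"composite end point power: {combined:.2f}")
```

Under these assumptions the same relative risk yields power of roughly one-half for MI alone but about 0.95 for the composite. The extra power comes entirely from adding events, which is precisely why a significant composite result says little about any one component.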

Competing Risks

Another reason sometimes offered in support of using composite end points is that composites provide a strategy to avoid the problem of competing risks.5 Death from any cause is sometimes added to a distinct clinical morbidity because patients who are taken out of the trial by death are “unavailable” to experience the morbidity outcome.

Multiple Testing

By aggregating several individual end points into a single pre-specified outcome, trialists can avoid corrections for multiple testing. Trials that seek data on multiple outcomes, or on multiple subgroups, inevitably raise concerns about the appropriate significance level (alpha) for the statistical test used to decide whether to reject the null hypothesis. According to some authors, “[c]omposite endpoints alleviate multiplicity concerns”:

“If designated a priori as the primary outcome, the composite obviates the multiple comparisons associated with testing of the separate components. Moreover, composite outcomes usually lead to high event rates thereby increasing power or reducing sample size requirements. Not surprisingly, investigators frequently use composite endpoints.”6

Other authors have similarly acknowledged that the need to avoid false positive results from multiple testing is an important rationale for composite end points:

“Because the likelihood of observing a statistically significant result by chance alone increases with the number of tests, it is important to restrict the number of tests undertaken and limit the type 1 error to preserve the overall error rate for the trial.”7
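The arithmetic behind this concern can be sketched in a few lines of Python. The sketch assumes the tests are independent, which end points measured in the same trial usually are not, so the familywise figure is illustrative only; the Bonferroni bound, by contrast, holds without independence. The helper names are hypothetical.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Probability of at least one false-positive finding among k independent
    tests, each run at significance level alpha, when every null is true."""
    return 1 - (1 - alpha) ** k


def bonferroni_alpha(alpha: float, k: int) -> float:
    """Per-test significance level that keeps the familywise error rate at or
    below alpha (Bonferroni correction)."""
    return alpha / k


if __name__ == "__main__":
    for k in (1, 4, 10):
        print(f"{k:>2} tests: familywise error {familywise_error_rate(0.05, k):.3f}, "
              f"Bonferroni per-test alpha {bonferroni_alpha(0.05, k):.4f}")
```

With four end points each tested at 0.05, the chance of at least one spurious “significant” finding rises to about 0.185, and a Bonferroni correction would demand 0.0125 per test. A single pre-specified composite, tested once, stays at 0.05, which is the multiplicity rationale in a nutshell.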

Indecision about an Appropriate Single Outcome

The International Conference on Harmonization suggests that the inability to select a single outcome variable may lead to the adoption of a composite outcome:

“If a single primary variable cannot be selected …, another useful strategy is to integrate or combine the multiple measurements into a single or composite variable.”8

The “indecision” rationale has also been criticized as “generally not a good reason to use a composite end point.”9

Validity of Composite End Points

The validity of composite end points depends upon methodological assumptions, which will have to be made at the time of the study design and protocol creation. After the data are collected and analyzed, the assumptions may or may not be supported. Among the supporting assumptions about the validity of using composites are:10

  • similarity in patient importance for included component end points,

  • similarity of association size of the components, and

  • similarity in the number of events across the components.

The use of composite end points can sometimes be appropriate in the “first look” at a class of diseases or disorders, with the understanding that further research will sort out and refine the associated end point. Research into the causes of human birth defects, for instance, often starts out with a look at “all major malformations,” before focusing in on specific organ and tissue systems. To some extent, the legal system, in its gatekeeping function, has recognized the dangers and invalidity of lumping in the epidemiology of birth defects.11 The Frischhertz decision, for instance, acknowledged that, given the clear evidence that different birth defects arise at different times, based upon interference with different embryological processes, “lumping” of end points was methodologically inappropriate. 2012 U.S. Dist. LEXIS 181507, at *8 (citing Chamber v. Exxon Corp., 81 F. Supp. 2d 661 (M.D. La. 2000), aff’d, 247 F.3d 240 (5th Cir. 2001) (unpublished)).

The Chamber decision involved a challenge to the causation opinion of frequent litigation industry witness Peter Infante,12 who attempted to defend his opinion that benzene causes chronic myelogenous leukemia (CML) on the basis of epidemiology of benzene and acute myelogenous leukemia (AML). Plaintiffs’ witnesses and counsel sought to evade the burden of producing evidence of a CML association by pointing to a study that reported “excess leukemias,” without specifying the relevant type. Chamber, 81 F. Supp. 2d at 664. The trial court, however, perspicaciously recognized the claimants’ failure to identify relevant evidence of the specific association needed to support the causal claim.

The Frischhertz and Chamber cases are hardly unique. Several state and federal courts have concurred in the context of cancer causation claims.13 In the context of birth defects litigation, the Public Affairs Committee of the Teratology Society has weighed in with strong guidance that counsels against extrapolation between different birth defects in litigation:

“Determination of a causal relationship between a chemical and an outcome is specific to the outcome at issue. If an expert witness believes that a chemical causes malformation A, this belief is not evidence that the chemical causes malformation B, unless malformation B can be shown to result from malformation A. In the same sense, causation of one kind of reproductive adverse effect, such as infertility or miscarriage, is not proof of causation of a different kind of adverse effect, such as malformation.”14

The threat to validity in attributing a suggested risk for a composite end point to all included component end points is not, unfortunately, recognized by all courts. The trial court, in Ruff v. Ensign-Bickford Industries, Inc.,15 permitted plaintiffs’ expert witness to reanalyze a study by grouping together two previously distinct cancer outcomes to generate a statistically significant result. The result in Ruff is disappointing, but not uncommon. The result is also surprising, considering the guidance provided by the American Law Institute’s Restatement:

“Even when satisfactory evidence of general causation exists, such evidence generally supports proof of causation only for a specific disease. The vast majority of toxic agents cause a single disease or a series of biologically-related diseases. (Of course, many different toxic agents may be combined in a single product, such as cigarettes.) When biological-mechanism evidence is available, it may permit an inference that a toxic agent caused a related disease. Otherwise, proof that an agent causes one disease is generally not probative of its capacity to cause other unrelated diseases. Thus, while there is substantial scientific evidence that asbestos causes lung cancer and mesothelioma, whether asbestos causes other cancers would require independent proof. Courts refusing to permit use of scientific studies that support general causation for diseases other than the one from which the plaintiff suffers unless there is evidence showing a common biological mechanism include Christophersen v. Allied-Signal Corp., 939 F.2d 1106, 1115-1116 (5th Cir. 1991) (applying Texas law) (epidemiologic connection between heavy-metal agents and lung cancer cannot be used as evidence that same agents caused colon cancer); Cavallo v. Star Enters., 892 F. Supp. 756 (E.D. Va. 1995), aff’d in part and rev’d in part, 100 F.3d 1150 (4th Cir. 1996); Boyles v. Am. Cyanamid Co., 796 F. Supp. 704 (E.D.N.Y. 1992). In Austin v. Kerr-McGee Ref. Corp., 25 S.W.3d 280, 290 (Tex. Ct. App. 2000), the plaintiff sought to rely on studies showing that benzene caused one type of leukemia to prove that benzene caused a different type of leukemia in her decedent. Quite sensibly, the court insisted that before plaintiff could do so, she would have to submit evidence that both types of leukemia had a common biological mechanism of development.”

Restatement (Third) of Torts § 28 cmt. c, at 406 (2010). Notwithstanding some of the Restatement’s excesses on other issues, the guidance on composites seems sane and consonant with the scientific literature.

Role of Mechanism in Justifying Composite End Points

A composite end point may make sense when the individual end points are biologically related, and the investigators can reasonably expect that the individual end points would be affected in the same direction, and approximately to the same extent:16

“Confidence in a composite end point rests partly on a belief that similar reductions in relative risk apply to all the components. Investigators should therefore construct composite endpoints in which the biology would lead us to expect similar effects across components.”

The important point, missed by some investigators and many courts, is that the assumption of similar “effects” must be tested by examining the individual component end points, and especially the end point that is the harm claimed by plaintiffs in a given case.
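A small Python sketch shows why the component-by-component check matters. The counts below are invented for illustration, and the sketch assumes the components are mutually exclusive (no patient counted twice) and share the same exposed and unexposed groups; `Counts` and `composite` are hypothetical helpers, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Event counts and group sizes for one end point in a two-group study."""
    exposed_events: int
    exposed_n: int
    unexposed_events: int
    unexposed_n: int

    def risk_ratio(self) -> float:
        exposed_risk = self.exposed_events / self.exposed_n
        unexposed_risk = self.unexposed_events / self.unexposed_n
        return exposed_risk / unexposed_risk


def composite(parts: list[Counts]) -> Counts:
    """Pool component counts into a composite end point.

    Assumes (hypothetically) that the components are mutually exclusive, so no
    patient is counted twice, and that every component shares the same groups.
    """
    return Counts(
        exposed_events=sum(p.exposed_events for p in parts),
        exposed_n=parts[0].exposed_n,
        unexposed_events=sum(p.unexposed_events for p in parts),
        unexposed_n=parts[0].unexposed_n,
    )


# Hypothetical counts for illustration only: 1,000 patients per group.
components = {
    "component A": Counts(40, 1000, 20, 1000),
    "component B": Counts(10, 1000, 10, 1000),
}

if __name__ == "__main__":
    for name, counts in components.items():
        print(f"{name}: RR = {counts.risk_ratio():.2f}")
    print(f"composite: RR = {composite(list(components.values())).risk_ratio():.2f}")
```

The script prints a risk ratio of 2.00 for component A, 1.00 for component B, and 1.67 for the composite. The elevated composite is driven entirely by component A; a claimant whose injury is component B finds no support in these numbers, even though the composite “shows” an increased risk.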

Methodological Issues

The acceptability of composite end points is often a delicate balance between the statistical power and efficiency gained and the reliability concerns raised by using the composite. As with any statistical or interpretative tool, the key questions turn on how the tool is used, and for what purpose. The reliability issues raised by the use of composites are likely to be highly contextual.

For instance, there is an important asymmetry between justifying the use of a composite for measuring efficacy and the use of the same composite for safety outcomes. A biological improvement in type 2 diabetes might be expected to lead to a reduction in all the macrovascular complications of that disease, but a medication for type 2 diabetes might have a very specific toxicity or drug interaction, which affects only one constituent end point among all macrovascular complications, such as myocardial infarction. The asymmetry between efficacy and safety outcomes is specifically addressed by cardiovascular epidemiologists in an important methodological paper:17

“Varying definitions of composite end points, such as MACE, can lead to substantially different results and conclusions. Therefore, the term MACE, in particular, should not be used, and when composite study end points are desired, researchers should focus separately on safety and effectiveness outcomes, and construct separate composite end points to match these different clinical goals.”

There are many clear, published statements that caution consumers of medical studies against being misled by claims based upon composite end points. Several years ago, for example, the British Medical Journal published a paper with six methodological suggestions for consumers of studies, one of which deals explicitly with composite end points:18

“Guide to avoid being misled by biased presentation and interpretation of data

1. Read only the Methods and Results sections; bypass the Discussion section

2. Read the abstract reported in evidence based secondary publications

3. Beware faulty comparators

4. Beware composite endpoints

5. Beware small treatment effects

6. Beware subgroup analyses”

The paper elaborates on the problems that arise from the use of composite end points:19

“Problems in the interpretation of these trials arise when composite end points include component outcomes to which patients attribute very different importance… .”

“Problems may also arise when the most important end point occurs infrequently or when the apparent effect on component end points differs.”

“When the more important outcomes occur infrequently, clinicians should focus on individual outcomes rather than on composite end points. Under these circumstances, inferences about the end points (which because they occur infrequently will have very wide confidence intervals) will be weak.”

Authors generally acknowledge that “[w]hen large variations exist between components the composite end point should be abandoned.”20

Methodological Issues Concerning Causal Inferences from Composite End Points to Individual End Points

Several authors have criticized pharmaceutical companies for using composite end points to “game” their trials. Composites allow smaller sample sizes, but they lend themselves to broader claims for outcomes included within the composite. The same criticism applies to attempts to infer a risk of an individual end point from a showing of harm in the composite end point.

“If a trial report specifies a composite endpoint, the components of the composite should be in the well-known pathophysiology of the disease. The researchers should interpret the composite endpoint in aggregate rather than as showing efficacy of the individual components. However, the components should be specified as secondary outcomes and reported beside the results of the primary analysis.”21

Virtually the entire field of epidemiology and clinical trial study has urged caution in inferring risk for a component end point from suggested risk in a composite end point:

“In summary, evaluating trials that use composite outcome requires scrutiny in regard to the underlying reasons for combining endpoints and its implications and has impact on medical decision-making (see below in Sect. 47.8). Composite endpoints are credible only when the components are of similar importance and the relative effects of the intervention are similar across components (Guyatt et al. 2008a).”22

Not only do important methodologists urge caution in the interpretation of composite end points,23 they emphasize a basic point of scientific (and legal) relevancy:

“[A] positive result for a composite outcome applies only to the cluster of events included in the composite and not to the individual components.”24

Even regularly testifying expert witnesses for the litigation industry insist upon the “principle of full disclosure”:

“The analysis of the effect of therapy on the combined end point should be accompanied by a tabulation of the effect of the therapy for each of the component end points.”25

Gatekeepers in our judicial system need to be more vigilant against bait-and-switch inferences based upon composite end points. The quest for statistical power hardly justifies larding up an end point with irrelevant data points.


1 See, e.g., Milton Packer, “Unbelievable! Electrophysiologists Embrace ‘Alternative Facts’,” MedPage (May 16, 2018) (describing clinical trialists’ abandoning pre-specified intention-to-treat analysis).

2 Curtis Meinert, Clinical Trials Dictionary (Johns Hopkins Center for Clinical Trials 1996).

3 Victor M. Montori, et al., “Validity of composite end points in clinical trials,” 330 Brit. Med. J. 594, 596 (2005).

4 R. Fletcher & S. Fletcher, Clinical Epidemiology: The Essentials at 109 (4th ed. 2005).

5 Neaton, et al., “Key issues in end point selection for heart failure trials: composite end points,” 11 J. Cardiac Failure 567, 569a (2005).

6 Schulz & Grimes, “Multiplicity in randomized trials I: endpoints and treatments,” 365 Lancet 1591, 1593a (2005).

7 Freemantle & Calvert, “Composite and surrogate outcomes in randomized controlled trials,” 334 Brit. Med. J. 756, 756a – b (2007).

8 International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use; “ICH harmonized tripartite guideline: statistical principles for clinical trials,” 18 Stat. Med. 1905 (1999).

9 Neaton, et al., “Key issues in end point selection for heart failure trials: composite end points,” 11 J. Cardiac Failure 567, 569b (2005).

10 Montori, et al., “Validity of composite end points in clinical trials,” 330 Brit. Med. J. 594, 596, Summary Point No. 2 (2005).

11 See “Lumpenepidemiology” (Dec. 24, 2012), discussing Frischhertz v. SmithKline Beecham Corp., 2012 U.S. Dist. LEXIS 181507 (E.D. La. 2012). Frischhertz was decided in the same month that a New York City trial judge ruled Dr. Shira Kramer out of bounds in the commission of similarly invalid lumping, in Reeps v. BMW of North America, LLC, 2012 NY Slip Op 33030(U), N.Y.S.Ct., Index No. 100725/08 (New York Cty. Dec. 21, 2012) (York, J.), 2012 WL 6729899, aff’d on rearg., 2013 WL 2362566, aff’d, 115 A.D.3d 432, 981 N.Y.S.2d 514 (2013), aff’d sub nom. Sean R. v. BMW of North America, LLC, ___ N.E.3d ___, 2016 WL 527107 (2016). See also “New York Breathes Life Into Frye Standard – Reeps v. BMW” (Mar. 5, 2013).

12 “Infante-lizing the IARC” (May 13, 2018).

13 Knight v. Kirby Inland Marine, 363 F. Supp. 2d 859, 864 (N.D. Miss. 2005), aff’d, 482 F.3d 347 (5th Cir. 2007) (excluding opinion of B.S. Levy on Hodgkin’s disease based upon studies of other lymphomas and myelomas); Allen v. Pennsylvania Eng’g Corp., 102 F.3d 194, 198 (5th Cir. 1996) (noting that evidence suggesting a causal connection between ethylene oxide and human lymphatic cancers is not probative of a connection with brain cancer); Current v. Atochem North America, Inc., 2001 WL 36101283, at *3 (W.D. Tex. Nov. 30, 2001) (excluding expert witness opinion of Michael Gochfeld, who asserted that arsenic causes rectal cancer on the basis of studies that show association with lung and bladder cancer; Hill’s consistency factor in causal inference does not apply to cancers generally); Exxon Corp. v. Makofski, 116 S.W.3d 176, 184-85 (Tex. App. Houston 2003) (“While lumping distinct diseases together as ‘leukemia’ may yield a statistical increase as to the whole category, it does so only by ignoring proof that some types of disease have a much greater association with benzene than others.”).

14 The Public Affairs Committee of the Teratology Society, “Teratology Society Public Affairs Committee Position Paper Causation in Teratology-Related Litigation,” 73 Birth Defects Research (Part A) 421, 423 (2005).

15 168 F. Supp. 2d 1271, 1284–87 (D. Utah 2001).

16 Montori, et al., “Validity of composite end points in clinical trials,” 330 Brit. Med. J. 594, 595b (2005).

17 Kevin Kip, et al., “The problem with composite end points in cardiovascular studies,” 51 J. Am. Coll. Cardiol. 701, 701 (2008) (Abstract – Conclusions) (emphasis in original).

18 Montori, et al., “Users’ guide to detecting misleading claims in clinical research reports,” 329 Brit. Med. J. 1093 (2004) (emphasis added).

19 Id. at 1094b, 1095a.

20 Montori, et al., “Validity of composite end points in clinical trials,” 330 Brit. Med. J. 594, 596 (2005).

21 Schulz & Grimes, “Multiplicity in randomized trials I: endpoints and treatments,” 365 Lancet 1591, 1595a (2005) (emphasis added). These authors acknowledge that composite end points often lack clinical relevancy, and that the gain in statistical efficiency comes at the high cost of interpretational difficulties. Id. at 1593.

22 Wolfgang Ahrens & Iris Pigeot, eds., Handbook of Epidemiology 1840 (2d ed. 2014) (47.5.8 Use of Composite Endpoints).

23 See, e.g., Stuart J. Pocock, John J.V. McMurray, and Tim J. Collier, “Statistical Controversies in Reporting of Clinical Trials: Part 2 of a 4-Part Series on Statistics for Clinical Trials,” 66 J. Am. Coll. Cardiol. 2648, 2650-51 (2015) (“Interpret composite endpoints carefully.”) (“COMPOSITE ENDPOINTS. These are commonly used in CV RCTs to combine evidence across 2 or more outcomes into a single primary endpoint. But, there is a danger of oversimplifying the evidence by putting too much emphasis on the composite, without adequate inspection of the contribution from each separate component.”); Eric Lim, Adam Brown, Adel Helmy, Shafi Mussa, and Douglas G. Altman, “Composite Outcomes in Cardiovascular Research: A Survey of Randomized Trials,” 149 Ann. Intern. Med. 612, 612, 615-16 (2008) (“Individual outcomes do not contribute equally to composite measures, so the overall estimate of effect for a composite measure cannot be assumed to apply equally to each of its individual outcomes.”) (“Therefore, readers are cautioned against assuming that the overall estimate of effect for the composite outcome can be interpreted to be the same for each individual outcome.”); Freemantle, et al., “Composite outcomes in randomized trials: Greater precision but with greater uncertainty,” 289 J. Am. Med. Ass’n 2554, 2559a (2003) (“To avoid the burying of important components of composite primary outcomes for which on their own no effect is concerned, . . . the components of a composite outcome should always be declared as secondary outcomes, and the results described alongside the result for the composite outcome.”).

24 Freemantle & Calvert, “Composite and surrogate outcomes in randomized controlled trials,” 334 Brit. Med. J. 756, 757a (2007).

25 Lem Moyé, “Statistical Methods for Cardiovascular Researchers,” 118 Circulation Research 439, 451 (2016).

The Judicial Labyrinth for Scientific Evidence

October 3rd, 2018

The real Daedalus (not the musician), as every school child knows, was the creator of the Cretan Labyrinth, where the Minotaur resided. The Labyrinth had been the undoing of many Greeks and barbarians, until an Athenian, Theseus, took up the challenge of slaying the Minotaur. With the help of Ariadne’s thread, Theseus solved the labyrinthine puzzle and slew the Minotaur.

Theseus and the Minotaur on 6th-century black-figure pottery (Wikimedia Commons 2005)

Dædalus is also the journal of the American Academy of Arts and Sciences. The Academy has, for over 230 years, addressed issues in both the humanities and the sciences. In the fall 2018 issue of Dædalus (volume 147, No. 4), the Academy published a dozen essays by noted scholars who report on the murky interface of science and law in the courtrooms of the United States. Several of the essays focus on the sorry state of forensic “science” in the criminal justice system, which has been the subject of several critical official investigations, only to have their findings dismissed and downplayed by both the Obama and Trump administrations. Other essays address the equally sorry state of judicial gatekeeping in civil actions, with some limited suggestions on how the process of scientific fact finding might be improved. In any event, this issue, “Science & the Legal System,” is worth reading even if you do not agree with the diagnoses or the proposed therapies. There is still room for a collaboration between a modern-day Daedalus and Ariadne to help us find the way out of this labyrinth.

Introduction

Shari Seidman Diamond & Richard O. Lempert, “Introduction” (pp. 5–14)

Connecting Science and Law

Sheila Jasanoff, “Science, Common Sense & Judicial Power in U.S. Courts” (pp. 15–27)

Linda Greenhouse, “The Supreme Court & Science: A Case in Point” (pp. 28–40)

Shari Seidman Diamond & Richard O. Lempert, “When Law Calls, Does Science Answer? A Survey of Distinguished Scientists & Engineers” (pp. 41–60)

Accommodation or Collision: When Science and Law Meet

Jules Lobel & Huda Akil, “Law & Neuroscience: The Case of Solitary Confinement” (pp. 61–75)

Rebecca S. Eisenberg & Robert Cook-Deegan, “Universities: The Fallen Angels of Bayh-Dole?” (pp. 76–89)

Jed S. Rakoff & Elizabeth F. Loftus, “The Intractability of Inaccurate Eyewitness Identification” (pp. 90–98)

Jennifer L. Mnookin, “The Uncertain Future of Forensic Science” (pp. 99–118)

Joseph B. Kadane & Jonathan J. Koehler, “Certainty & Uncertainty in Reporting Fingerprint Evidence” (pp. 119–134)

Communicating Science in Court

Nancy Gertner & Joseph Sanders, “Alternatives to Traditional Adversary Methods of Presenting Scientific Expertise in the Legal System” (pp. 135–151)

Daniel L. Rubinfeld & Joe S. Cecil, “Scientists as Experts Serving the Court” (pp. 152–163)

Valerie P. Hans & Michael J. Saks, “Improving Judge & Jury Evaluation of Scientific Evidence” (pp. 164–180)

Continuing the Dialogue

David Baltimore, David S. Tatel & Anne-Marie Mazza, “Bridging the Science-Law Divide” (pp. 181–194)

Carl Cranor’s Conflicted Jeremiad Against Daubert

September 23rd, 2018

It seems that authors who have the most intense and refractory conflicts of interest (COI) often fail to see their own conflicts and are the most vociferous critics of others for failing to identify COIs. Consider the spectacle of having anti-tobacco activists and tobacco plaintiffs’ expert witnesses assert that the American Law Institute had an ethical problem because Institute members included some tobacco defense lawyers.1 Somehow these authors overlooked their own positional and financial conflicts, as well as the obvious fact that the Institute’s members included some tobacco plaintiffs’ lawyers as well. Still, the complaint was instructive because it typifies the abuse of asymmetrical ethical standards, as well as ethical blind spots.2

Recently, Raymond Richard Neutra, Carl F. Cranor, and David Gee published a paper on the litigation use of Sir Austin Bradford Hill’s considerations for evaluating whether an association is causal or not.3 See Raymond Richard Neutra, Carl F. Cranor, and David Gee, “The Use and Misuse of Bradford Hill in U.S. Tort Law,” 58 Jurimetrics 127 (2018) [cited here as Cranor]. Their paper provides a startling example of hypocritical and asymmetrical assertions of conflicts of interest.

Neutra is a self-styled public health advocate4 and the Chief of the Division of Environmental and Occupational Disease Control (DEODC) of the California Department of Health Services (CDHS). David Gee, not to be confused with the English artist or the Australian coin forger, is with the European Environment Agency, in Copenhagen, Denmark. He is perhaps best known for his precautionary principle advocacy and his work with trade unions.5

Carl Cranor is with the Center for Progressive Reform, and he teaches philosophy at one of the University of California campuses. Although he is neither a lawyer nor a scientist, he participates with some frequency, as a consultant and as an expert witness, in lawsuits on behalf of claimants. Perhaps Cranor’s most notorious appearance as an expert witness resulted in the decision of Milward v. Acuity Specialty Products Group, Inc., 639 F.3d 11 (1st Cir. 2011), cert. denied sub nom. U.S. Steel Corp. v. Milward, 132 S. Ct. 1002 (2012). Probably less generally known is that Cranor was one of the founders of an organization, the Council for Education and Research on Toxics (CERT), which recently was the complaining party in a California case in which CERT sought money damages for Starbucks’ failure to warn, on each cup of coffee sold, that it was known to the State of California to cause cancer.6 Having a so-called not-for-profit corporation can also be pretty handy, especially when it holds itself out as a scientific organization and files amicus briefs in support of reversing Daubert exclusions of the corporation’s founding members, as CERT did on behalf of its founding member in the Milward case.7 The conflict of interest in such an amicus brief, however, is no longer potential or subtle, and it violates the duty of candor to the court.

In this recent article on Hill’s considerations for judging causality, Cranor followed CERT’s lead from Milward. Cranor failed to disclose that he has been a party expert witness for plaintiffs, in cases in which he was advocating many of the same positions put forward in the Jurimetrics article, including the Milward case, in which he was excluded from testifying by the trial court. Cranor’s lack of candor with the readers of the Jurimetrics article is all the more remarkable in that Cranor and his co-authors give conflicts of interest outsize importance in substantive interpretations of scholarship:

“the desired reliability for evidence evaluation requires that biases that derive from the financial interests and ideological commitments of the investigators and editors that control the gateways to publication be considered in a way that Hill did not address.”

Cranor at 137 & n.59. Well, we could add that Cranor’s financial interests and ideological commitments might well be considered in evaluating the reliability of the opinions and positions advanced in this most recent work by Cranor and colleagues. If you believe that COIs disqualify a speaker from addressing important issues, then you have all the reason you need to avoid reading Cranor’s recent article.

Dubious Scholarship

The more serious problem with Cranor’s article is not his ethically strained pronouncements about financial interests, but the dubious scholarship he and his colleagues advance to thwart judicial gatekeeping of even more dubious expert witness opinion testimony. To begin with, the authors disparage the training and abilities of federal judges to assess the epistemic warrant and reliability of proffered causation opinions:

“With their enhanced duties to review scientific and technical testimony federal judges, typically not well prepared by legal education for these tasks, have struggled to assess the scientific support for—and the reliability and relevance of—expert testimony.”

Cranor at 147. Their assessment is fair but hides the authors’ cynical agenda to remove gatekeeping and leave the assessment to lay juries, who are less well prepared for the task, and whose function ensures no institutional accountability, review, or public evaluation.

Similarly, the authors note the temporal context and limitations of Bradford Hill’s 1965 paper, which date and limit the advice provided over 50 years ago in a discipline that has changed dramatically with the advancement of biological, epidemiologic, and genetic science.8 Even at the time of its original publication in 1965, Bradford Hill’s paper, which was based upon an informal lecture, was not designed or intended to be a definitive treatment of causal inference. Cranor and his colleagues make no effort to review Bradford Hill’s many other publications, both before and after his 1965 dinner speech, for evidence of his views on the factors for causal inference, including the role of statistical testing and inference.

Nonetheless, Bradford Hill’s 1965 paper has become a landmark, even if dated, because of its author’s iconic status in the world of public health, earned for his showing that tobacco smoking causes lung cancer,9 and for advancing the role of double-blind randomized clinical trials.10 Cranor and his colleagues made no serious effort to engage with the large body of Bradford Hill’s writings, including his immensely important textbook, The Principles of Medical Statistics, which started as a series of articles in The Lancet, and went through 12 editions in print.11 Hill’s reputation will no doubt survive Cranor’s bowdlerized version of Sir Austin’s views.

Epidemiology is Dispensable When It Fails to Support Causal Claims

The egregious aspect of Cranor’s article is its bill of particulars against the federal judiciary for allegedly errant gatekeeping, which for these authors really means any gatekeeping at all. Cranor at 144-45. Indeed, the authors provide not a single example of a “proper” exclusion of an expert witness who was contending for some doubtful causal claim. Perhaps they have never seen a proper exclusion, but doesn’t that speak volumes about their agenda and their biases?

High on the authors’ list of claimed gatekeeping errors is the requirement that a causal claim be supported with epidemiologic evidence. Although some causal claims may be supported by strong evidence of a biological process with mechanistic evidence, such claims are not common in United States tort litigation.

In support of the claim that epidemiology is dispensable, Cranor suggests that:

“Some courts have recognized this, and distinguished scientific committees often do not require epidemiological studies to infer harm to humans. For example, the International Agency for Research on Cancer (IRAC) [sic], the National Toxicology Program, and California’s Proposition 65 Scientific Advisory Panel, among others, do not require epidemiological data to support findings that a substance is a probable or—in some cases—a known human carcinogen, but it is welcomed if available.”

Cranor at 149. California’s Proposition 65!??? Even IARC is hard to take seriously these days with its capture by consultants for the litigation industry, but if we were to accept IARC as an honest broker of causal inferences, what substance “known” to IARC to cause cancer in humans (Group 1) was branded as a “known carcinogen” without the support of epidemiologic studies? Inquiring minds might want to know, but they will not learn the answer from Cranor and his co-authors.

When it comes to adverting to legal decisions that supposedly support the authors’ claim that epidemiology is unnecessary, their scholarship is equally wanting. The paper cites the notorious Wells case, which was so roundly condemned in scientific circles that it probably helped ensure that a decision such as Daubert would ultimately be handed down by the Supreme Court. The authors seemingly cannot read, understand, and interpret even the most straightforward legal decisions. Here is how they cite Wells as support for their views:

“Wells v. Ortho Pharm. Corp., 788 F.2d 741, 745 (11th Cir. 1986) (reviewing a district court’s decision deciding not to require the use of epidemiological evidence and instead allowing expert testimony).”

Cranor at 149-50 n.122. The trial judge in Wells never made such a decision; indeed, the case was tried by the bench, before the Supreme Court decided Daubert. There was no gatekeeping involved at all. More important, however, and contrary to Cranor’s explanatory parenthetical, both sides presented epidemiologic evidence in support of their positions.12

Cranor and his co-authors similarly misread and misrepresent the trial court’s decision in the litigation over maternal sertraline use and infant birth defects. Twice they cite the Multi-District Litigation trial court’s decision that excluded plaintiffs’ expert witnesses:

“In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., 26 F. Supp. 3d 449, 455 (E.D. Pa. 2014) (expert may not rely on nonstatistically significant studies to which to apply the [Bradford Hill] factors).”

Cranor at 144 n.85; 158 n.179. The MDL judge, Judge Rufe, decidedly never held that an expert witness may not rely upon a statistically non-significant study in a “Bradford Hill” analysis, and the Third Circuit, which affirmed the exclusions of the plaintiffs’ expert witnesses’ testimony, was equally careful to avoid making such a pronouncement.13

Who Needs Statistical Significance

Part of Cranor’s post-science agenda is to intimidate judges into believing that statistical significance is unnecessary and a wrong-headed criterion for judging the validity of relied-upon research. In their article, Cranor and friends suggest that Hill agreed with their radical approach, but nothing could be further from the truth. Although these authors parse almost every word of Hill’s 1965 article, they conveniently omit Hill’s views about the necessary predicates for applying his nine considerations for causal inference:

“Disregarding then any such problem in semantics we have this situation. Our observations reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance. What aspects of that association should we especially consider before deciding that the most likely interpretation of it is causation?”

Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 295 (1965). Cranor’s radicalism leaves no room for assessing whether a putative association is “beyond what we would care to attribute to the play of chance,” and his poor scholarship ignores Hill’s insistence that this statistical analysis be carried out.14

Hill’s work certainly acknowledged the limitations of statistical method, which could not compensate for poorly designed research:

“It is a serious mistake to rely upon the statistical method to eliminate disturbing factors at the completion of the work. No statistical method can compensate for a badly planned experiment.”

Austin Bradford Hill, Principles of Medical Statistics at 4 (4th ed. 1948). Hill was equally clear, however, that the limits on statistical methods did not imply that statistical methods are not needed to interpret a properly planned experiment or study. In the summary section of his textbook’s first chapter, Hill removed any doubt about his view of the importance, and the necessity, of statistical methods:

“The statistical method is required in the interpretation of figures which are at the mercy of numerous influences, and its object is to determine whether individual influences can be isolated and their effects measured.”

Id. at 10 (emphasis added).

In his efforts to eliminate judicial gatekeeping of expert witness testimony, Cranor has struggled to understand statistical inference and testing.15 In an early writing, a 1993 book, Cranor suggested that we “can think of type I and II error rates as ‘standards of proof,’” which begs the question whether they are appropriately used to assess significance or posterior probabilities.16 Indeed, Cranor went further in confusing significance and posterior probabilities when he described the usual level of alpha (5%) as the “95%” rule, and claimed that regulatory agencies require something akin to proof “beyond a reasonable doubt” when they require two “statistically significant” studies.17

Cranor has persisted in this fallacious analysis in his writings. In a 2006 book, he erroneously equated the 95% coefficient of statistical confidence with 95% certainty of knowledge.18 Later in this same text, Cranor again asserted his nonsense that agency regulations are written when supported by “beyond a reasonable doubt.”19 Given that Cranor has consistently confused significance and posterior probability, he really should not be giving advice to anyone about statistical or scientific inference. Cranor’s persistent misunderstandings of basic statistical concepts do, however, explain his motivation for advocating the elimination of statistical significance testing, even if these misunderstandings make his enterprise intellectually unacceptable.
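The difference between a significance level and a posterior probability can be made concrete with Bayes’ theorem. The short sketch below is illustrative only: the function name is hypothetical, and the prior probabilities and the 80% power figure are assumed inputs, not values taken from Cranor’s books or from any study. It shows that setting α at 5% does not make us “95% certain” that a rejected null hypothesis was correctly rejected; that probability depends on the prior and on power.

```python
def prob_null_given_significant(prior_null: float, alpha: float, power: float) -> float:
    """Posterior probability that the null hypothesis is true, given a
    statistically significant result, computed with Bayes' theorem.

    prior_null -- assumed prior probability that the null is true (hypothetical)
    alpha      -- significance level, P(significant | null true)
    power      -- P(significant | alternative true), i.e. 1 - beta
    """
    false_positives = prior_null * alpha
    true_positives = (1.0 - prior_null) * power
    return false_positives / (false_positives + true_positives)


# Same alpha (0.05) and same power (0.8); only the hypothetical prior changes.
for prior_null in (0.5, 0.8, 0.9):
    p = prob_null_given_significant(prior_null, alpha=0.05, power=0.8)
    print(f"prior P(null) = {prior_null:.1f}  ->  P(null | significant) = {p:.2f}")
```

With these assumed inputs, the probability that the null hypothesis is nonetheless true ranges from about 6% to 36%, even though α stays fixed at 5%.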

Cranor and company fall into a similar muddle when they offer advice on post-hoc power calculations, which advice ignores standard statistical learning for interpreting completed studies.20 Another measure of the authors’ failed scholarship is their omission of any discussion of recent efforts by many in the scientific community to lower the threshold for statistical significance, based upon the belief that the customary 5% p-value is an order of magnitude too high.21


Relative Risks Greater Than Two

There are other tendentious arguments and treatments in Cranor’s brief against gatekeeping, but I will stop with one last example. The inference of specific causation from study risk ratios has provoked a torrent of verbiage from Sander Greenland (who is cited copiously by Cranor). Cranor, however, does not even scratch the surface of the issue and fails to cite the work of epidemiologists, such as Duncan C. Thomas, who have defended the use of probabilities of (specific) causation. More important, however, Cranor fails to speak out against the abuse of using any relative risk greater than 1.0 to support an inference of specific causation, when the nature of the causal relationship is neither necessary nor sufficient. In this context, Kenneth Rothman has reminded us that someone can be exposed to, or have, a risk, and then develop the related outcome, without there being any specific causation:

“An elementary but essential principle to keep in mind is that a person may be exposed to an agent and then develop disease without there being any causal connection between the exposure and the disease. For this reason, we cannot consider the incidence proportion or the incidence rate among exposed people to measure a causal effect.”

Kenneth J. Rothman, Epidemiology: An Introduction at 57 (2d ed. 2012).
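The arithmetic behind the “greater than two” threshold can be sketched in a few lines. Under strong assumptions, stated in the code comments and criticized by Greenland among others, the excess fraction of cases among the exposed is (RR - 1)/RR, which exceeds one-half only when the relative risk exceeds 2. The function name is a hypothetical label for illustration, not a term drawn from Cranor’s article.

```python
def excess_fraction_among_exposed(rr: float) -> float:
    """(RR - 1) / RR: the share of exposed cases in excess of the baseline rate.

    Reading this as a 'probability of specific causation' assumes, among other
    things, an unbiased and unconfounded RR and an exposure that only adds new
    cases rather than hastening cases that would have occurred anyway.
    """
    if rr <= 0:
        raise ValueError("relative risk must be positive")
    return max(0.0, (rr - 1.0) / rr)


for rr in (1.2, 1.5, 2.0, 3.0):
    print(f"RR = {rr}: excess fraction among exposed = {excess_fraction_among_exposed(rr):.2f}")
```

On these assumptions, a relative risk of 1.2 yields an excess fraction of about 0.17, meaning roughly five of every six exposed cases would have occurred without the exposure, which is Rothman’s point about exposure followed by disease without causation.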

The danger in Cranor’s article in Jurimetrics is that some readers will not realize the extreme partisanship in its ipse dixit, and erroneous, pronouncements. Caveat lector.


1 Elizabeth Laposata, Richard Barnes & Stanton Glantz, “Tobacco Industry Influence on the American Law Institute’s Restatements of Torts and Implications for Its Conflict of Interest Policies,” 98 Iowa L. Rev. 1 (2012).

2 The American Law Institute responded briefly. See Roberta Cooper Ramo & Lance Liebman, “The ALI’s Response to the Center for Tobacco Control Research & Education,” 98 Iowa L. Rev. Bull. 1 (2013), and the original authors’ self-serving last word. Elizabeth Laposata, Richard Barnes & Stanton Glantz, “The ALI Needs to Implement Modern Conflict of Interest Policies,” 98 Iowa L. Rev. Bull. 17 (2013).

3 Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295 (1965).

4 Raymond Richard Neutra, “Epidemiology Differs from Public Health Practice,” 7 Epidemiology 559 (1996).

7 “From Here to CERT-ainty” (June 28, 2018).

8 Kristen Fedak, Autumn Bernal, Zachary Capshaw, and Sherilyn A. Gross, “Applying the Bradford Hill Criteria in the 21st Century: How Data Integration Has Changed Causal Inference in Molecular Epidemiology,” Emerging Themes in Epidemiol. 12:14 (2015); John P. A. Ioannidis, “Exposure Wide Epidemiology, Revisiting Bradford Hill,” 35 Stats. Med. 1749 (2016).

9 Richard Doll & Austin Bradford Hill, “Smoking and Carcinoma of the Lung,” 2(4682) Brit. Med. J. (1950).

10 Geoffrey Marshall (chairman), “Streptomycin Treatment of Pulmonary Tuberculosis: A Medical Research Council Investigation,” 2 Brit. Med. J. 769, 769–71 (1948).

11 Vern Farewell & Anthony Johnson, “The origins of Austin Bradford Hill’s classic textbook of medical statistics,” 105 J. Royal Soc’y Med. 483 (2012). See also Hilary E. Tillett, “Bradford Hill’s Principles of Medical Statistics,” 108 Epidemiol. Infect. 559 (1992).

13 In re Zoloft Prod. Liab. Litig., No. 16-2247, __ F.3d __, 2017 WL 2385279, 2017 U.S. App. LEXIS 9832 (3d Cir. June 2, 2017) (affirming exclusion of biostatistician Nicholas Jewell’s dodgy opinions, which involved multiple methodological flaws and failures to follow any methodology faithfully).

14 See “Bradford Hill on Statistical Methods” (Sept. 24, 2013).

16 Carl F. Cranor, Regulating Toxic Substances: A Philosophy of Science and the Law at 33-34 (1993) (arguing incorrectly that one can think of α, β (the chances of type I and type II errors, respectively), and 1 - β as measures of the “risk of error” or “standards of proof.”); see also id. at 44, 47, 55, 72-76. At least one astute reviewer called Cranor on his statistical solecisms. Michael D. Green, “Science Is to Law as the Burden of Proof is to Significance Testing: Book Review of Cranor, Regulating Toxic Substances: A Philosophy of Science and the Law,” 37 Jurimetrics J. 205 (1997) (taking Cranor to task for confusing significance and posterior (burden of proof) probabilities).

17 Id. (squaring 0.05 to arrive at “the chances of two such rare events occurring” as 0.0025, which impermissibly assumes independence between the two studies).

18 Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice 100 (2006) (incorrectly asserting that “[t]he practice of setting α = .05 I call the ‘95% rule,’ for researchers want to be 95% certain that when knowledge is gained [a study shows new results] and the null hypothesis is rejected, it is correctly rejected.”).

19 Id. at 266.

21 See, e.g., John P. A. Ioannidis, “The Proposal to Lower P Value Thresholds to .005,” 319 J. Am. Med. Ass’n 1429 (2018); Daniel J. Benjamin, James O. Berger, Valen E. Johnson, et al., “Redefine statistical significance,” 2 Nature Human Behaviour 6 (2018).

Ninth Circuit’s Difficulty with Process of Elimination

September 16th, 2018

Differential etiology is a highfalutin’ term given to a simple disjunctive syllogism in which all disjuncts in the premise but one are eliminated. The syllogism would be a persuasive argument for the one remaining disjunct, but only if all the other disjuncts are effectively eliminated. Otherwise, we are left with competing disjuncts that remain, without any way of embracing the “one” for which someone is contending.

Over 100 years ago, the United States Supreme Court recognized the need for eliminating all but the claimed cause in a simple FELA negligence action. In a unanimous decision, the Court declared:

“And where the testimony leaves the matter uncertain and shows that any one of half a dozen things may have brought about the injury, for some of which the employer is responsible and for some of which he is not, it is not for the jury to guess between these half a dozen causes and find that the negligence of the employer was the real cause, when there is no satisfactory foundation in the testimony for that conclusion. If the employe is unable to adduce sufficient evidence to show negligence on the part of the employer, it is only one of the many cases in which the plaintiff fails in his testimony, and no mere sympathy for the unfortunate victim of an accident justifies any departure from settled rules of proof resting upon all plaintiffs.”

Patton v. Texas & Pacific RR, 179 U.S. 658, 663-64 (1901).
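The disjunctive syllogism at the heart of differential etiology is simple enough to express as a few lines of code. In this minimal sketch, the function name and the candidate causes are hypothetical labels for illustration. The conclusion follows only when exactly one candidate survives elimination; if an unknown (“idiopathic”) cause cannot be ruled out, nothing follows at all, which is the Patton Court’s point about half a dozen possible causes.

```python
from typing import Optional


def differential_etiology(candidates: set[str], ruled_out: set[str]) -> Optional[str]:
    """Disjunctive syllogism: A or B or C; not B; not C; therefore A.

    Returns the lone surviving candidate cause, or None when elimination
    leaves more than one candidate (or none), so that no conclusion follows.
    """
    remaining = candidates - ruled_out
    return next(iter(remaining)) if len(remaining) == 1 else None


# Hypothetical candidate causes, for illustration only.
causes = {"exposure at issue", "known alternative cause", "idiopathic"}

# All alternatives eliminated: the syllogism yields the remaining disjunct.
print(differential_etiology(causes, {"known alternative cause", "idiopathic"}))

# The unknown (idiopathic) cause cannot be ruled out: no single cause survives.
print(differential_etiology(causes, {"known alternative cause"}))
```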

Recently, the United States Court of Appeals for the Ninth Circuit recognized the need to rule out alternative factual explanations before a court could enter judgment on a claim of copyright infringement.1 Cobbler Nevada, LLC v. Thomas Gonzales, No. 17-35041 (9th Cir., Aug. 27, 2018). The facts of Cobbler Nevada are illustrative.

Someone with access to an IP address registered to Thomas Gonzales used BitTorrent to download a copy of “The Cobbler,” an Adam Sandler movie. Cobbler Nevada LLC sued Mr. Gonzales, not for bad taste, but for infringing on its copyright to the movie. Mr. Gonzales, however, was the owner of an adult foster home, in which several other people had access to Gonzales’ IP address. Cobbler Nevada had no evidence that eliminated the possibility of downloading by other people in the home.

An amended complaint accused Mr. Gonzales of directly infringing the copyright and, alternatively, of contributing to the infringement by not policing his own internet connection.

The panel affirmed the rejection of the infringement claim because the claimant had failed to rule out downloading by someone other than Gonzales:

“The direct infringement claim fails because Gonzales’ status as the registered subscriber of an infringing IP address, standing alone, does not create a reasonable inference that he is also the infringer… .”

Id. The panel reasoned that others in the household could have accessed Gonzales’ internet connection, and that the law did not impose a duty to secure the connection from a “frugal” neighbor.

In personal injury cases, the Ninth Circuit takes a very different and thoroughly illogical approach from its astute reasoning in Cobbler Nevada. In one Ninth Circuit case, the plaintiff claimed, without much of any supporting evidence, that he had sustained a drug-induced disease, when over 70 percent of cases of that disease were idiopathic. The trial court accurately diagnosed the situation as an impossible proof problem for the plaintiff because the differential etiology method could not eliminate idiopathic causes in the case before the court. Rule 702 led to the exclusion of plaintiffs’ proffered opinions, and the trial court entered summary judgment for the defendants. The Ninth Circuit reversed in an ipse dixit judgment that threw logic to the wind. Wendell v. Johnson & Johnson, No. 09-cv-04124, 2014 WL 2943572, at *5 (N.D. Cal. June 30, 2014), rev’d sub nom. Wendell v. GlaxoSmithKline LLC, 858 F.3d 1227 (9th Cir. 2017).2

The two cases, Wendell and Cobbler Nevada, cannot be reconciled. The aberrant and costive reasoning of Wendell will give rise to unflattering speculation about the Circuit’s motivation. Perhaps the next edition of the Reference Manual on Scientific Evidence should have a chapter on elementary logic, to help avoid such embarrassing situations.


1 Jason Tashea, “9th Circuit rules that sharing IP address is insufficient for copyright infringement,” Am. Bar. Ass’n J. (Sept. 4, 2018).

2 For a lively vivisection of the Ninth Circuit’s decision in Wendell, see David L. Faigman & Jennifer Mnookin, “The Curious Case of Wendell v. GlaxoSmithKline LLC,” 48 Seton Hall L. Rev. 607 (2018).

N.J. Supreme Court Uproots Weeds in Garden State’s Law of Expert Witnesses

August 8th, 2018

The United States Supreme Court’s decision in Daubert is now over 25 years old. The idea of judicial gatekeeping of expert witness opinion testimony is even older in New Jersey state courts. The New Jersey Supreme Court articulated a reliability standard before the Daubert case was even argued in Washington, D.C. See Landrigan v. Celotex Corp., 127 N.J. 404, 414 (1992); Rubanick v. Witco Chem. Corp., 125 N.J. 421, 447 (1991). Articulating a standard, however, is something very different from following a standard, and in many New Jersey trial courts, until very recently, the standard was pretty much anything goes.

One counter-example to the general rule of dog-eat-dog in New Jersey was Judge Nelson Johnson’s careful review and analysis of the proffered causation opinions in cases in which plaintiffs claimed that their use of the anti-acne medication isotretinoin (Accutane) caused Crohn’s disease. Judge Johnson, who sits in the Law Division of the New Jersey Superior Court for Atlantic County, held a lengthy hearing and reviewed the expert witnesses’ reliance materials.1 Judge Johnson found that the plaintiffs’ expert witnesses had employed undue selectivity in choosing what to rely upon. Perhaps even more concerning, Judge Johnson found that these witnesses had refused to rely upon reasonably well-conducted epidemiologic studies, while embracing unpublished, incomplete, and poorly conducted studies and anecdotal evidence. In re Accutane, No. 271(MCL), 2015 WL 753674, 2015 BL 59277 (N.J.Super. Law Div., Atlantic Cty. Feb. 20, 2015). In response, Judge Johnson politely but firmly closed the gate to conclusion-driven duplicitous expert witness causation opinions in over 2,000 personal injury cases. “Johnson of Accutane – Keeping the Gate in the Garden State” (Mar. 28, 2015).

Aside from resolving over 2,000 pending cases, Judge Johnson’s judgment was of intense interest to all who are involved in pharmaceutical and other products liability litigation. Judge Johnson had conducted a pretrial hearing, sometimes called a Kemp hearing in New Jersey, after the New Jersey Supreme Court’s opinion in Kemp v. The State of New Jersey, 174 N.J. 412 (2002). At the hearing and in his opinion that excluded plaintiffs’ expert witnesses’ causation opinions, Judge Johnson demonstrated a remarkable aptitude for analyzing data and inferences in the gatekeeping process.

When the courtroom din quieted, the trial court ruled that the proffered testimony of Dr. Arthur Kornbluth and Dr. David Madigan did not meet the liberal New Jersey test for admissibility. In re Accutane, No. 271(MCL), 2015 WL 753674, 2015 BL 59277 (N.J.Super. Law Div. Atlantic Cty. Feb. 20, 2015). And in closing the gate, Judge Johnson protected the judicial process from several bogus and misleading “lines of evidence,” which have become standard ploys to mislead juries in courthouses where the gatekeepers are asleep. Recognizing that not all evidence is on the same analytical plane, Judge Johnson gave case reports short shrift.

“[u]nsystematic clinical observations or case reports and adverse event reports are at the bottom of the evidence hierarchy.”

Id. at *16. Adverse event reports, largely driven by the very litigation in his courtroom, received little credit and were labeled as “not evidentiary in a court of law.” Id. at *14 (quoting FDA’s description of FAERS).

Judge Johnson recognized that there was a wide range of identified “risk factors” for inflammatory bowel disease, such as prior appendectomy, breast-feeding as an infant, stress, Vitamin D deficiency, tobacco or alcohol use, refined sugars, dietary animal fat, and fast food. In re Accutane, 2015 WL 753674, at *9. The court also noted that there were four medications generally acknowledged to be potential risk factors for inflammatory bowel disease: aspirin, nonsteroidal anti-inflammatory medications (NSAIDs), oral contraceptives, and antibiotics. Understandably, Judge Johnson was concerned that the plaintiffs’ expert witnesses preferred studies unadjusted for potential confounding co-variables and studies that had involved “cherry picking the subjects.” Id. at *18.

Judge Johnson had found that both sides in the isotretinoin cases conceded the relative unimportance of animal studies, but the plaintiffs’ expert witnesses nonetheless invoked the animal studies in the face of the artificial absence of epidemiologic studies that had been created by their cherry-picking strategies. Id.

Plaintiffs’ expert witnesses had reprised a common claimants’ strategy; namely, they claimed that all the epidemiology studies lacked statistical power. Their arguments often ignored that statistical power calculations depend upon a pre-specified level of statistical significance (a concept to which many plaintiffs’ counsel have virulent antibodies), as well as upon an arbitrarily selected alternative hypothesis of association size. Furthermore, the plaintiffs’ arguments ignored the actual point estimates, most of which were favorable to the defense, and the observed confidence intervals, most of which were reasonably narrow.
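The dependence of power on both the significance level and the chosen alternative hypothesis can be seen in a standard normal-approximation calculation. This is a sketch under stated assumptions: the function name is hypothetical, it uses a two-sided Wald test on the log relative risk, and the standard error of 0.2 is an invented study precision, not a figure from the isotretinoin studies.

```python
from math import log
from statistics import NormalDist


def approx_power(rr_alt: float, se_log_rr: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided Wald test of H0: RR = 1.

    rr_alt    -- the alternative relative risk the analyst chooses to test against
    se_log_rr -- standard error of the log relative risk (hypothetical precision)
    alpha     -- the pre-specified significance level
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)        # critical value for the chosen alpha
    shift = abs(log(rr_alt)) / se_log_rr     # expected z-statistic under rr_alt
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


se = 0.2  # hypothetical precision of a single study
for rr_alt in (1.5, 2.0):
    for alpha in (0.05, 0.005):
        print(f"RR_alt = {rr_alt}, alpha = {alpha}: power = {approx_power(rr_alt, se, alpha):.2f}")
```

The same study has more or less “power” depending entirely on which alternative relative risk and which α the analyst picks; at RR_alt = 1 the function returns α itself, a useful sanity check.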

The defense responded to the bogus statistical arguments by calling an extremely capable clinical and statistical expert witness, Dr. Stephen Goodman, to present a meta-analysis of the available epidemiologic evidence.

Meta-analysis has become an important facet of pharmaceutical and other products liability litigation. Dr. Goodman explained meta-analysis generally, as well as two meta-analyses he had performed on isotretinoin and inflammatory bowel outcomes.

Dr. Goodman explained that the plaintiffs’ witnesses’ failure to perform a meta-analysis was telling, because meta-analysis can obviate the plaintiffs’ hyperbolic statistical complaints:

“the strength of the meta-analysis is that no one feature, no one study, is determinant. You don’t throw out evidence except when you absolutely have to.”

In re Accutane, 2015 WL 753674, at *8.
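Why pooling answers complaints about individually “underpowered” studies can be shown with one common technique, the inverse-variance fixed-effect meta-analysis. This sketch is not a reconstruction of Dr. Goodman’s analyses, which the opinion does not detail here; the function name and the study figures are hypothetical. Each study is weighted by its precision, so no one study is determinant, and the pooled confidence interval is narrower than that of any contributing study. A random-effects model is the usual alternative when studies are heterogeneous.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def fixed_effect_meta(studies, ci_level=0.95):
    """Inverse-variance (fixed-effect) pooling of relative risks.

    studies -- list of (rr, lower, upper) tuples taken from each study's
               reported confidence interval at ci_level.
    Returns (pooled_rr, pooled_lower, pooled_upper).
    """
    z = NormalDist().inv_cdf(0.5 + ci_level / 2)
    log_rrs, weights = [], []
    for rr, lower, upper in studies:
        se = (log(upper) - log(lower)) / (2 * z)  # SE of log RR from CI width
        log_rrs.append(log(rr))
        weights.append(1.0 / se**2)              # more precise studies weigh more
    pooled = sum(w * x for w, x in zip(weights, log_rrs)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    return exp(pooled), exp(pooled - z * pooled_se), exp(pooled + z * pooled_se)


# Hypothetical, individually imprecise studies (not the isotretinoin data).
studies = [(0.9, 0.5, 1.6), (1.1, 0.55, 2.2), (0.8, 0.4, 1.5), (1.0, 0.6, 1.7)]
rr, lower, upper = fixed_effect_meta(studies)
print(f"pooled RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```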

Judge Johnson’s judicial handiwork received non-deferential appellate review from a three-judge panel of the Appellate Division, which reversed the exclusion of Kornbluth and Madigan. In re Accutane Litig., 451 N.J. Super. 153, 165 A.3d 832 (App. Div. 2017). The New Jersey Supreme Court granted the isotretinoin defendants’ petition for appellate review, and the issues were joined over the appropriate standard of appellate review for expert witness opinion exclusions, and the appropriateness of Judge Johnson’s exclusions of Kornbluth and Madigan. A bevy of amici curiae joined in the fray.2

Last week, the New Jersey Supreme Court issued a unanimous opinion, which reversed the Appellate Division’s holding that Judge Johnson had “mistakenly exercised” discretion. Applying its own precedents from Rubanick, Landrigan, and Kemp, and the established abuse-of-discretion standard, the Court concluded that the trial court’s ruling to exclude Kornbluth and Madigan was “unassailable.” In re Accutane Litig., ___ N.J. ___, 2018 WL 3636867 (2018), Slip op. at 79.3

The high court graciously acknowledged that defendants and amici had “good reason” to seek clarification of New Jersey law. Slip op. at 67. In abandoning abuse-of-discretion as its standard of review, the Appellate Division had relied upon a criminal case that involved the application of the Frye standard, which is applied as a matter of law. Id. at 70-71. The high court also appeared to welcome the opportunity to grant review and reverse the intermediate court, in order to reinforce “the rigor expected of the trial court” in its gatekeeping role. Id. at 67. The Supreme Court, however, did not articulate a new standard; rather, it demonstrated at length that Judge Johnson had appropriately applied the legal standards that had been previously announced in New Jersey Supreme Court cases.4

In attempting to defend the Appellate Division’s decision, plaintiffs sought to characterize New Jersey law as somehow different from, and more “liberal” than, the United States Supreme Court’s decision in Daubert. The New Jersey Supreme Court acknowledged that it had never formally adopted the dicta from Daubert about factors that could be considered in gatekeeping, slip op. at 10, but the Court went on to note what disinterested observers had long understood, that the so-called Daubert factors simply flowed from a requirement of sound methodology, and that there was “little distinction” and “not much light” between the Landrigan and Rubanick principles and the Daubert case or its progeny. Id. at 10, 80.

Curiously, the New Jersey Supreme Court announced that the Daubert factors should be incorporated into the New Jersey Rules 702 and 703 and their case law, but it stopped short of declaring New Jersey a “Daubert” jurisdiction. Slip op. at 82. In part, the Court’s hesitance followed from New Jersey’s bifurcation of expert witness standards for civil and criminal cases, with the Frye standard still controlling in the criminal docket. At another level, it makes no sense to describe any jurisdiction as a “Daubert” state because the relevant aspects of the Daubert decision were dicta, and the Daubert decision and its progeny were superseded by the revision of the controlling statute in 2000.5

There were other remarkable aspects of the Supreme Court’s Accutane decision. For instance, the Court put its weight behind the common-sense and accurate interpretation of Sir Austin Bradford Hill’s famous articulation of factors for causal judgment, which requires that sampling error, bias, and confounding be eliminated before assessing whether the observed association is strong, consistent, plausible, and the like. Slip op. at 20 (citing the Reference Manual at 597-99), 78.

The Supreme Court relied extensively on the National Academies’ Reference Manual on Scientific Evidence.6 That reliance is certainly preferable to judicial speculations and fabulations of scientific method. The reliance is also positive, considering that the Court did not look only at the problematic epidemiology chapter, but adverted also to the chapters on statistical evidence and on clinical medicine.

The Supreme Court recognized that the Appellate Division had essentially sanctioned an anything goes abandonment of gatekeeping, an approach that has been all-too-common in some of New Jersey’s lower courts. Contrary to the previously prevailing New Jersey zeitgeist, the Court instructed that gatekeeping must be “rigorous” to “prevent[] the jury’s exposure to unsound science through the compelling voice of an expert.” Slip op. at 68-9.

Not all evidence is equal. “[C]ase reports are at the bottom of the evidence hierarchy.” Slip op. at 73. Extrapolation from non-human animal studies is fraught with external validity problems, and such studies are “far less probative in the face of a substantial body of epidemiologic evidence.” Id. at 74 (internal quotations omitted).

Perhaps most chilling for the lawsuit industry will be the Supreme Court’s strident denunciation of expert witnesses’ selectivity in choosing lesser evidence in the face of a large body of epidemiologic evidence, id. at 77, and their unprincipled cherry picking among the extant epidemiologic publications. Like the trial court, the Supreme Court found that the plaintiffs’ expert witnesses’ inconsistent use of methodological criteria and their selective reliance upon studies (disregarding eight of the nine epidemiologic studies) that favored their task masters was the antithesis of sound methodology. Id. at 73, citing with approval, In re Lipitor, ___ F.3d ___ (4th Cir. 2018) (slip op. at 16) (“Result-driven analysis, or cherry-picking, undermines principles of the scientific method and is a quintessential example of applying methodologies (valid or otherwise) in an unreliable fashion.”).

An essential feature of the Supreme Court’s decision is that it was not willing to engage in the common reductionism that holds “all epidemiologic studies are flawed,” and thus privileges cherry picking. Not all disagreements between expert witnesses can be framed as differences in interpretation. In re Accutane will likely stand as a bulwark against flawed expert witness opinion testimony in the Garden State for a long time.


1 Judge Nelson Johnson is also the author of Boardwalk Empire: The Birth, High Times, and Corruption of Atlantic City (2010), a spellbinding history of political and personal corruption.

2 In support of the defendants’ positions, amicus briefs were filed by the New Jersey Business & Industry Association, Commerce and Industry Association of New Jersey, and New Jersey Chamber of Commerce; by law professors Kenneth S. Broun, Daniel J. Capra, Joanne A. Epps, David L. Faigman, Laird Kirkpatrick, Michael M. Martin, Liesa Richter, and Stephen A. Saltzburg; by medical associations the American Medical Association, Medical Society of New Jersey, American Academy of Dermatology, Society for Investigative Dermatology, American Acne and Rosacea Society, and Dermatological Society of New Jersey; by the Defense Research Institute; by the Pharmaceutical Research and Manufacturers of America; and by the New Jersey Civil Justice Institute. In support of the plaintiffs’ position and the intermediate appellate court’s determination, amicus briefs were filed by the New Jersey Association for Justice, a political action committee; by the Ironbound Community Corporation; and by plaintiffs’ lawyer Allan Kanner.

3 Nothing in the intervening scientific record called into question Judge Johnson’s trial court judgment. See, e.g., I.A. Vallerand, R.T. Lewinson, M.S. Farris, C.D. Sibley, M.L. Ramien, A.G.M. Bulloch, and S.B. Patten, “Efficacy and adverse events of oral isotretinoin for acne: a systematic review,” 178 Brit. J. Dermatol. 76 (2018).

4 Slip op. at 9, 14-15, citing Landrigan v. Celotex Corp., 127 N.J. 404, 414 (1992); Rubanick v. Witco Chem. Corp., 125 N.J. 421, 447 (1991) (“We initially took that step to allow the parties in toxic tort civil matters to present novel scientific evidence of causation if, after the trial court engages in rigorous gatekeeping when reviewing for reliability, the proponent persuades the court of the soundness of the expert’s reasoning.”).

5 The Court did acknowledge that Federal Rule of Evidence 702 had been amended in 2000 to reflect the Supreme Court’s decisions in Daubert, Joiner, and Kumho Tire, but the Court did not deal with the inconsistencies between the present rule and the 1993 Daubert case. Slip op. at 64, citing Calhoun v. Yamaha Motor Corp., U.S.A., 350 F.3d 316, 320-21, 320 n.8 (3d Cir. 2003).

6 See Accutane slip op. at 12-18, 24, 73-74, 77-78. With respect to meta-analysis, the Reference Manual’s epidemiology chapter is still stuck in the 1980s, and in the then-prevalent resistance to poorly conducted, often meaningless meta-analyses. See “The Treatment of Meta-Analysis in the Third Edition of the Reference Manual on Scientific Evidence” (Nov. 14, 2011) (the Reference Manual fails to come to grips with the prevalence and importance of meta-analysis in litigation, and fails to provide meaningful guidance to trial judges).

P-Values: Pernicious or Perspicacious?

May 12th, 2018

Professor Kingsley R. Browne, of the Wayne State University Law School, recently published a paper that criticized the use of p-values and significance testing in discrimination litigation. Kingsley R. Browne, “Pernicious P-Values: Statistical Proof of Not Very Much,” 42 Univ. Dayton L. Rev. 113 (2017) (cited below as Browne). Browne amply documents the obvious and undeniable: judges, lawyers, and even some ill-trained expert witnesses are congenitally unable to describe and interpret p-values properly. Most of Browne’s examples are from the world of anti-discrimination law, but he also cites a few from health effects litigation. He further draws upon many of the criticisms of p-values in the psychology and other social science literature.

Browne’s efforts to correct judicial innumeracy are welcome, but they take a peculiar turn in this law review article. From the well-known state of affairs of widespread judicial refusal or inability to discuss statistical concepts accurately, Browne argues for two seemingly incongruous, inconsistent responses. Rejecting the glib suggestion of former Judge Posner that evidence law is not “fussy” about evidence, Browne argues that federal evidence law requires courts to be “fussy” about evidence, and that Rule 702 requires courts to exclude expert witnesses whose opinions fail to “employ[] in the courtroom the same level of intellectual rigor that characterizes the practice of an expert in the relevant field.” Browne at 143 (quoting Kumho Tire Co. v. Carmichael, 526 U.S. 137, 152 (1999)). Browne tells us, with apparently appropriate intellectual rigor, that “[i]f a disparity that does not provide a p-value of less than 0.05 would not be accepted as meaningful in the expert’s discipline, it is not clear that the expert should be allowed to testify – on the basis of his expertise in that discipline – that the disparity is, in fact, meaningful.” Id.

In a volte-face, Browne then argues that p-values do “not tell us much,” basically because they are dependent upon sample size. Browne suggests that the quantitative disparity between expected value and observed proportion or average can be assessed without the use of p-values, and that measuring a p-value “adds virtually nothing and just muddies the water.” Id. at 152. The prevalent confusion among judges and lawyers seems sufficient in Browne’s view to justify his proposal, as well as his further suggestion that Rule 403 should be invoked to exclude p-values:

“The ease with which reported p-values cause a trier of fact to slip into the transposition fallacy and the difficulty of avoiding that lapse of logic, coupled with the relatively sparse information actually provided by the p-value, make p-values prime candidates for exclusion under Federal Rule of Evidence 403. *** If judges, not to mention the statistical experts they rely on, cannot use the information without falling into fallacious reasoning, the likelihood that the jury will misunderstand the evidence is very high. Since the p-value actually provides little useful relevant information, the high risk of misleading the jury greatly exceeds its scant probative value, so it simply should not be presented to the jury.”

Id. at 152-53.
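The transposition fallacy confuses two different conditional probabilities: the probability of data at least this extreme if the null hypothesis is true (what a p-value addresses), and the probability that the null hypothesis is true given a significant result. A minimal Python sketch, using Bayes’ theorem with purely hypothetical inputs (a prior probability for the null, the significance level, and the study’s power, none of which come from Browne’s article), shows that the second quantity depends on information the p-value itself does not supply. The function name is illustrative only.

```python
def prob_null_given_significant(prior_null, alpha, power):
    """Bayes' theorem: P(null true | significant result).

    alpha is P(significant | null true); power is P(significant | alternative true).
    All inputs are hypothetical assumptions supplied by the analyst.
    """
    false_positives = alpha * prior_null          # significant results when the null is true
    true_positives = power * (1 - prior_null)     # significant results when the alternative is true
    return false_positives / (false_positives + true_positives)

# Hypothetical inputs: even prior odds on the null, alpha = 0.05, power = 0.5
print(prob_null_given_significant(0.5, 0.05, 0.5))
```

With these assumed inputs, the probability that the null is true given a significant result is about 0.09, not 0.05; with a different prior or power it would be different again.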

And yet, elsewhere in the same article, Browne ridicules one court and several expert witnesses who argued in favor of conclusions based upon p-values as high as 50%.1 The concept of p-values cannot be so flexible as to straddle the extremes of having no probative value and yet being capable of rendering an expert witness’s opinions ludicrous. P-values quantify an estimate of random error, even if that error varies with sample size. To be sure, the measure of random error depends upon the specified model and the assumption of a null hypothesis, but the crucial point is that an estimate (whether a mean, proportion, risk ratio, or risk difference) is rather meaningless without some further estimate of its random variability. Of course, random error is not the only type of error, but the existence of other potential systematic errors is hardly a reason to ignore random error.

In the science of health effects, many applications of p-values have given way to the use of confidence intervals, which arguably provide more direct assessments of both the sample estimates and the ranges of values reasonably compatible with those estimates. Remarkably, Browne never substantively discusses confidence intervals in his article.
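To illustrate why an estimate needs an accompanying measure of random variability, here is a minimal Python sketch. It assumes a simple comparison of an observed proportion with an expected proportion, using the normal (Wald) approximation; the function name and the 12%-versus-15% figures are hypothetical. Holding the observed disparity fixed, the confidence interval narrows and the p-value shrinks as the sample grows, which is the sample-size dependence Browne emphasizes, while the interval also reports the range of proportions reasonably compatible with the data.

```python
from statistics import NormalDist

def proportion_summary(observed, n, expected, level=0.95):
    """Return (estimate, CI low, CI high, two-sided p-value) for an observed
    count out of n, compared with an expected proportion (normal approximation)."""
    std_normal = NormalDist()
    p_hat = observed / n
    # Wald confidence interval centered on the sample estimate
    crit = std_normal.inv_cdf(0.5 + level / 2)
    se_hat = (p_hat * (1 - p_hat) / n) ** 0.5
    low, high = p_hat - crit * se_hat, p_hat + crit * se_hat
    # z statistic under the null hypothesis that the true proportion is `expected`
    se_null = (expected * (1 - expected) / n) ** 0.5
    z_stat = (p_hat - expected) / se_null
    p_value = 2 * std_normal.cdf(-abs(z_stat))
    return p_hat, low, high, p_value

# Hypothetical: 12% observed versus 15% expected, at three sample sizes
for n in (100, 1_000, 10_000):
    est, low, high, p = proportion_summary(n * 12 // 100, n, 0.15)
    print(f"n={n:>6}: estimate={est:.3f}, 95% CI=({low:.3f}, {high:.3f}), p={p:.2g}")
```

The Wald interval is the simplest choice; other intervals (Wilson, exact binomial) behave better for small samples or extreme proportions.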

Under the heading of other problems with p-values and significance testing, Browne advances four additional putative problems with p-values. First, Browne asserts with little to no support that “[t]he null hypothesis is unlikely a priori.” Id. at 155. He fails to tell us why the null hypothesis of no disparity is not a reasonable starting place in the absence of objective evidence of a prior estimate. Furthermore, a null hypothesis of no difference will have legal significance in claims of health effects, or of unlawful discrimination.

Second, Browne argues that significance testing will lead to “[c]onflation of statistical and practical (or legal) significance” in the minds of judges and jurors. Id. at 156-58. This charge is difficult to sustain. The actors in legal cases can probably appreciate practical significance, and its separation from statistical significance, most readily. If a large class action showed that the expected value of a minority’s proportion was 15%, and the observed proportion was 14.8%, p < 0.05, even innumerate judges and jurors would likely sense that this disparity was unimportant, and that no employer would fine tune its discriminatory activities so closely as to achieve such a meaningless difference.
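The arithmetic behind the 14.8%-versus-15% hypothetical can be sketched in Python, assuming a one-sample normal-approximation test of the observed proportion against the expected 15% and a two-sided 0.05 threshold. The helper names are illustrative. Under those assumptions, the 0.2-percentage-point shortfall becomes “statistically significant” only once the sample exceeds a little over 122,000, a reminder that a large enough class can make a trivial disparity significant.

```python
from math import ceil
from statistics import NormalDist

def two_sided_p(p_hat, expected, n):
    """Normal-approximation two-sided p-value for a sample proportion p_hat."""
    se = (expected * (1 - expected) / n) ** 0.5
    return 2 * NormalDist().cdf(-abs(p_hat - expected) / se)

def smallest_significant_n(p_hat, expected, alpha=0.05):
    """Smallest sample size at which |p_hat - expected| reaches two-sided significance."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil(crit ** 2 * expected * (1 - expected) / (p_hat - expected) ** 2)

n_star = smallest_significant_n(0.148, 0.15)
print(n_star, two_sided_p(0.148, 0.15, n_star))
```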

Third, Browne reminds us that the validity and the interpretation of a p-value turns on the assumption that the statistical model is perfectly specified. Id. at 158-59. His reminder is correct, but again, this aspect of p-values (or confidence intervals) is relatively easy to explain, as well as to defend or challenge. To be sure, there may be legitimate disputes about whether an appropriate model was used (say binomial versus hypergeometric), but such disputes are hardly the most arcane issues that judges and jurors will face.
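The choice between a binomial and a hypergeometric model can be made concrete with a short Python sketch using only the standard library’s `math.comb`. The hiring figures (100 applicants, 30 from the group of interest, 20 hires, 2 from the group) are hypothetical. The binomial model treats each hire as an independent draw with a constant 30% probability; the hypergeometric model treats the hires as drawn without replacement from the finite applicant pool. The two models yield different tail probabilities for the same observed count, which is the sort of dispute that can be explained to a factfinder.

```python
from math import comb

def binomial_lower_tail(k, n, p):
    """P(X <= k) when X ~ Binomial(n, p): independent selections with constant probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def hypergeometric_lower_tail(k, population, successes, draws):
    """P(X <= k) when `draws` are taken without replacement from a finite pool
    of `population` members, `successes` of whom belong to the group of interest."""
    favorable = sum(comb(successes, i) * comb(population - successes, draws - i)
                    for i in range(k + 1))
    return favorable / comb(population, draws)

# Hypothetical: 100 applicants, 30 from the group; 20 hired, 2 from the group
print(binomial_lower_tail(2, 20, 30 / 100))
print(hypergeometric_lower_tail(2, 100, 30, 20))
```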

Fourth, Browne claims that “the alternative hypothesis is seldom properly specified.” Id. at 159-62. Unless analysts are focused on measuring pre-test power or type II error, however, they need not advance an alternative hypothesis. Furthermore, it is hardly a flaw with significance testing that it does not account for systematic bias or confounding.

Browne does not offer an affirmative response such as urging courts to adopt a Bayesian program. A Bayesian response to prevalent blunders in interpreting statistical significance would introduce perhaps even more arcane and hard-to-discern blunders in court proceedings. Browne also leaves courts without a meaningful approach to evaluate random error other than to engage in crude comparisons between two means or proportions. The recommendations in this law review article appear to be a giant step, backwards, into an epistemic void.


1 See Browne at 146, citing In re Photochromic Lens Antitrust Litig., 2014 WL 1338605 (M.D. Fla. April 3, 2014) (reversing magistrate judge’s exclusion of an expert witness who had advanced claims based upon a p-value of 0.50); id. at 147 n. 116, citing In re High-Tech Employee Antitrust Litig., 2014 WL 1351040 (N.D. Cal. 2014).

Failed Gatekeeping in Ambrosini v. Labarraque (1996)

December 28th, 2017

The Ambrosini case straddled the Supreme Court’s 1993 Daubert decision. The case began before the Supreme Court clarified the federal standard for expert witness gatekeeping, and ended in the Court of Appeals for the District of Columbia Circuit, after the high court adopted the curious notion that scientific claims should be based upon reliable evidence and valid inferences. That notion has only slowly and inconsistently trickled down to the lower courts.

Given that Ambrosini was litigated in the District of Columbia, where the docket is dominated by regulatory controversies, frequently involving dubious scientific claims, no one should be surprised that the D.C. Circuit did not see that the Supreme Court had read “an exacting standard” into Federal Rule of Evidence 702. And so, in Ambrosini, we see this Court of Appeals citing and purportedly applying its own pre-Daubert decision in Ferebee v. Chevron Chem. Co., 552 F. Supp. 1297 (D.D.C. 1982), aff’d, 736 F.2d 1529 (D.C. Cir.), cert. denied, 469 U.S. 1062 (1984).1 In 2000, Federal Rule of Evidence 702 was revised in a way that extinguishes the precedential value of Ambrosini and the broad dicta of Ferebee, but some courts and commentators have failed to stay abreast of the law.

Escolastica Ambrosini was using a synthetic progestin birth control, Depo-Provera, as well as an anti-nausea medication, Bendectin, when she became pregnant. The child that resulted from this pregnancy, Teresa Ambrosini, was born with malformations of her face, eyes, and ears, cleft lip and palate, and vertebral malformations. About three percent of all live births in the United States have a major malformation. Perhaps because the Divine Being has sovereign immunity, Escolastica sued the manufacturers of Bendectin and Depo-Provera, as well as the prescribing physician.

The causal claims were controversial when made, and they still are. The progestin at issue, medroxyprogesterone acetate (MPA), was embryotoxic in the cynomolgus monkey2, but not in the baboon3. The evidence in humans was equivocal at best, and involved mostly genital malformations4; the epidemiologic evidence for the MPA causal claim to this day remains unconvincing5.

At the close of discovery in Ambrosini, Upjohn (the manufacturer of the progestin) moved for summary judgment, with a supporting affidavit of a physician and geneticist, Dr. Joe Leigh Simpson. In his affidavit, Simpson discussed three epidemiologic studies, as well as other published papers, in support of his opinion that the progestin at issue did not cause the types of birth defects manifested by Teresa Ambrosini.

Ambrosini had disclosed two expert witnesses, Dr. Allen S. Goldman and Dr. Brian Strom. Neither Goldman nor Strom bothered to identify the papers, studies, data, or methodology used in arriving at an opinion on causation. Not surprisingly, the district judge was unimpressed with their opposition, and granted summary judgment for the defendant. Ambrosini v. Labarraque, 966 F.2d 1462, 1466 (D.C. Cir. 1992).

The plaintiffs appealed on the remarkable ground that Goldman’s and Strom’s crypto-evidence satisfied Federal Rule of Evidence 703. Even more remarkably, the Circuit, in a strikingly unscholarly opinion by Judge Mikva, opined that disclosure of relied-upon studies was not required for expert witnesses under Rules 703 and 705. Judge Mikva seemed to forget that the opinions being challenged were not given in testimony, but in (late-filed) affidavits that had to satisfy the requirements of Federal Rule of Civil Procedure 26. Id. at 1468-69. At trial, an expert witness may express an opinion without identifying its bases, but of course the adverse party may compel disclosure of those bases. In discovery, however, the proffered expert witness must supply all opinions and the evidence relied upon in reaching them. In any event, the Circuit remanded the case for a hearing and further proceedings, at which the two challenged expert witnesses, Goldman and Strom, would have to identify the bases of their opinions. Id. at 1471.

Not long after the case landed back in the district court, the Supreme Court decided Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993). With an order to produce entered, plaintiffs’ counsel could no longer hide Goldman and Strom’s evidentiary bases, and their scientific inferences came under judicial scrutiny.

Upjohn moved again to exclude Goldman and Strom’s opinions. The district court upheld Upjohn’s challenges, and granted summary judgment in favor of Upjohn for the second time. The Ambrosinis appealed again, but the second case in the D.C. Circuit resulted in a split decision, with the majority holding that the exclusion of Goldman and Strom’s opinions under Rule 702 was erroneous. Ambrosini v. Labarraque, 101 F.3d 129 (D.C. Cir. 1996).

Although issued two decades ago, the majority’s opinion remains noteworthy as an example of judicial resistance to the existence and meaning of the Supreme Court’s Daubert opinion. The majority opinion uncritically cited the notorious Ferebee6 and other pre-Daubert decisions. The court embraced the Daubert dictum limiting gatekeeping to methodologic considerations, and then proceeded to interpret methodology as superficially as necessary to sustain admissibility. If an expert witness claimed to have looked at epidemiologic studies, and epidemiology was an accepted methodology, then the opinion of the expert witness must satisfy the legal requirements of Daubert, or so it would seem from the opinion of the U.S. Court of Appeals for the District of Columbia Circuit.

Despite the majority’s hand waving, a careful reader will discern that there must have been substantial gaps and omissions in the explanations and evidence cited by plaintiffs’ expert witnesses. Seeing anything clearly in the Circuit’s opinion is made difficult, however, by careless and imprecise language, such as its descriptions of studies as showing, or not showing “causation,” when it could have meant only that such studies showed associations, with more or less random and systematic error.

Dr. Strom’s report addressed only general causation, and even so, he apparently did not address general causation of the specific malformations manifested by the plaintiffs’ child. Strom claimed to have relied upon the “totality of the data,” but his methodologic approach seems to have required him to dismiss studies that failed to show an association.

“Dr. Strom first set forth the reasoning he employed that led him to disagree with those studies finding no causal relationship [sic] between progestins and birth defects like Teresa’s. He explained that an epidemiologist evaluates studies based on their ‘statistical power’. Statistical power, he continued, represents the ability of a study, based on its sample size, to detect a causal relationship. Conventionally, in order to be considered meaningful, negative studies, that is, those which allege the absence of a causal relationship, must have at least an 80 to 90 percent chance of detecting a causal link if such a link exists; otherwise, the studies cannot be considered conclusive. Based on sample sizes too small to be reliable, the negative studies at issue, Dr. Strom explained, lacked sufficient statistical power to be considered conclusive.”

Id. at 136.7

Putting aside the problem of suggesting that an observational study detects a “causal relationship,” as opposed to an association in need of further causal evaluation, the Court’s précis of Strom’s testimony on power is troublesome, and typical of how other courts have misunderstood and misapplied the concept of statistical power. Statistical power is the probability that a study will observe a statistically significant association, at a prespecified significance level, when an association of a specified size truly exists. The calculation of statistical power does indeed turn on sample size, but also on the significance probability preselected for “statistical significance,” an assumed probability distribution of the sample, and, critically, an alternative hypothesis. Without a specified alternative hypothesis, the notion of statistical power is meaningless, regardless of what probability (80% or 90% or some other percentage) is sought for finding the alternative hypothesis. Furthermore, the notion that the defense must adduce studies with “sufficient statistical power to be considered conclusive” creates an unscientific standard that can never be met, while subverting the law’s requirement that the claimant establish causation.
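Because power cannot be computed without an alternative hypothesis, a short Python sketch may help. It assumes a comparison of outcome risk in exposed and unexposed groups of equal size, a two-sided z-test at α = 0.05, and the usual normal approximation; the 3% baseline risk and 1,000 subjects per group are hypothetical inputs, not figures from the studies in Ambrosini. The same study has very different power depending on which relative risk is posited as the alternative, and at a relative risk of 1.0 (no association) the “power” collapses to the significance level itself.

```python
from statistics import NormalDist

def power_two_proportions(p0, relative_risk, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing outcome risk in exposed
    versus unexposed groups, for a specified alternative relative risk."""
    std_normal = NormalDist()
    p1 = p0 * relative_risk                    # exposed-group risk under the alternative
    crit = std_normal.inv_cdf(1 - alpha / 2)
    p_bar = (p0 + p1) / 2
    se_null = (2 * p_bar * (1 - p_bar) / n_per_group) ** 0.5            # SE if no difference
    se_alt = ((p0 * (1 - p0) + p1 * (1 - p1)) / n_per_group) ** 0.5     # SE under the alternative
    d = p1 - p0
    # Probability that the observed difference lands beyond either critical boundary
    return (std_normal.cdf((d - crit * se_null) / se_alt)
            + std_normal.cdf((-d - crit * se_null) / se_alt))

# Hypothetical study: 1,000 exposed and 1,000 unexposed, baseline risk 3%
for rr in (1.0, 1.5, 2.0, 3.0):
    print(rr, round(power_two_proportions(0.03, rr, 1_000), 3))
```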

The suggestion that the studies that failed to find an association cannot be considered conclusive because they “lacked sufficient statistical power” is troublesome because it distorts and misapplies the very notion of statistical power. No attempt was made to describe the confidence intervals surrounding the point estimates of the null studies; nor was there any discussion of whether the studies could be aggregated to increase their power to rule out meaningful associations.
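Aggregating null studies is one standard way to increase precision. The following Python sketch assumes inverse-variance, fixed-effect pooling of risk ratios on the log scale, with each study’s standard error backed out of its reported 95% confidence interval. The three studies are hypothetical and are not the studies cited in Ambrosini. Each interval alone is wide, but the pooled interval is narrower than any of them, which is how a collection of individually “underpowered” studies can rule out larger associations.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def pool_fixed_effect(studies, level=0.95):
    """Inverse-variance fixed-effect pooling of (risk ratio, 95% CI low, 95% CI high)
    tuples. Each standard error is backed out of the reported interval on the log scale."""
    crit95 = NormalDist().inv_cdf(0.975)
    weights, weighted_logs = [], []
    for rr, low, high in studies:
        se = (log(high) - log(low)) / (2 * crit95)
        w = 1 / se**2                          # weight = inverse of the variance
        weights.append(w)
        weighted_logs.append(w * log(rr))
    pooled_log = sum(weighted_logs) / sum(weights)
    pooled_se = 1 / sqrt(sum(weights))
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return (exp(pooled_log),
            exp(pooled_log - crit * pooled_se),
            exp(pooled_log + crit * pooled_se))

# Three hypothetical null studies, each individually imprecise
studies = [(1.10, 0.60, 2.02), (0.90, 0.45, 1.80), (1.05, 0.55, 2.00)]
print(pool_fixed_effect(studies))
```

A fixed-effect model assumes the studies estimate a common underlying effect; where the studies are heterogeneous, a random-effects model is the usual alternative.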

The Circuit’s scientific jurisprudence was thus seriously flawed. Without a discussion of the end points observed, the relevant point estimates of risk ratios, and the confidence intervals, the reader cannot assess the strength of the claims made by Goldman and Strom, or by defense expert Simpson, in their reports. Without identifying the study end points, the reader cannot evaluate whether the plaintiffs’ expert witnesses relied upon relevant outcomes in formulating their opinions. The court viewed the subject matter from 30,000 feet, passing over at 600 mph, without engagement or care. A strong dissent, however, suggested serious mischaracterizations of the plaintiffs’ evidence by the majority.

The only specific causation testimony to support plaintiff’s claims came from Goldman, in what appears to have been a “differential etiology.” Goldman purported to rule out a genetic cause, even though he had not conducted a critical family history or ordered a state-of-the-art chromosomal study. Id. at 140. Of course, nothing in a differential etiology approach would allow a physician to rule out “unknown” causes, which, for birth defects, make up the most prevalent and likely causes to explain any particular case. The majority acknowledged that these were shortcomings, but rhetorically characterized them as substantive, not methodologic, and therefore as issues for cross-examination, not for consideration by a judicial gatekeeper. All this is magical thinking, but it continues to infect judicial approaches to specific causation. See, e.g., Green Mountain Chrysler Plymouth Dodge Jeep v. Crombie, 508 F. Supp. 2d 295, 311 (D. Vt. 2007) (citing Ambrosini for the proposition that “the possibility of uneliminated causes goes to weight rather than admissibility, provided that the expert has considered and reasonably ruled out the most obvious”). In Ambrosini, however, Dr. Goldman had not ruled out much of anything.

Circuit Judge Karen LeCraft Henderson dissented in a short, but pointed opinion that carefully marshaled the record evidence. Drs. Goldman and Strom had relied upon a study by Greenberg and Matsunaga, whose data failed to show a statistically significant association between MPA and cleft lip and palate, when the crucial issue of timing of exposure was taken into consideration. Ambrosini, 101 F.3d at 142.

Beyond the specific claims and evidence, Judge Henderson anticipated the subsequent Supreme Court decisions in Joiner, Kumho Tire, and Weisgram, and the year 2000 revision of Rule 702, in noting that the majority’s acceptance of glib claims to have used a “traditional methodology” would render Daubert nugatory. Id. at 143-45 (characterizing Strom and Goldman’s methodologies as “wispish”). Even more importantly, Judge Henderson refused to indulge the assumption that somehow the length of Goldman’s C.V. substituted for evidence that his methods satisfied the legal (or scientific) standard of reliability. Id.

The good news is that little or nothing in Ambrosini survives the 2000 amendment to Rule 702. The bad news is that not all federal judges seem to have noticed, and that some commentators continue to cite the case approvingly.

Probably no commentator has promiscuously embraced Ambrosini as warmly as Carl Cranor, a philosopher, and occasional expert witness for the lawsuit industry, in several publications and presentations.8 Cranor has been particularly enthusiastic about Ambrosini’s approval of an expert witness’s testimony that failed to address “the relative risk between exposed and unexposed populations of cleft lip and palate, or any other of the birth defects from which [the child] suffers,” as well as differential etiologies that exclude nothing.9 Somehow Cranor, like the majority in Ambrosini, believes that testimony that fails to identify the magnitude of the point estimate of relative risk can “assist the trier of fact to understand the evidence or to determine a fact in issue.”10 Of course, without that magnitude given, the trier of fact could not evaluate the strength of the alleged association; nor could the trier assess the probability of individual causation to the plaintiff. Cranor also has written approvingly of lumping unrelated end points, which defeats the assessment of biological plausibility and coherence by the trier of fact. When the defense expert witness in Ambrosini adverted to the point estimates for relevant end points, the majority, with Cranor’s approval, rejected the null findings as “too small to be significant.”11 If the null studies were, in fact, too small to be useful tests of the plaintiffs’ claims, intellectual and scientific honesty required an acknowledgement that the evidentiary display was not one from which a reasonable scientist would draw a causal conclusion.
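The connection between the magnitude of a relative risk and the probability of individual causation is commonly expressed, in the legal literature, through the attributable fraction among the exposed, 1 - 1/RR. A minimal Python sketch follows; it assumes, as that formula does, an unbiased and unconfounded relative risk that applies to the individual plaintiff, and the function name is illustrative. Under those assumptions, a relative risk of 2.0 corresponds to a probability of 0.5, which is one reason the magnitude of the point estimate matters to the trier of fact.

```python
def probability_of_causation(relative_risk):
    """Attributable fraction among the exposed, 1 - 1/RR, often offered as an estimate
    of the probability that exposure caused an exposed person's disease.
    Assumes an unbiased, unconfounded RR applicable to the individual; returns 0 for RR <= 1."""
    if relative_risk <= 1:
        return 0.0
    return 1 - 1 / relative_risk

for rr in (1.5, 2.0, 3.0):
    print(rr, round(probability_of_causation(rr), 3))
```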


1 Ambrosini v. Labarraque, 101 F.3d 129, 138-39 (D.C. Cir. 1996) (citing and applying Ferebee), cert. dismissed sub nom. Upjohn Co. v. Ambrosini, 117 S. Ct. 1572 (1997). See also David E. Bernstein, “The Misbegotten Judicial Resistance to the Daubert Revolution,” 89 Notre Dame L. Rev. 27, 31 (2013).

2 S. Prahalada, E. Carroad, M. Cukierski, and A.G. Hendrickx, “Embryotoxicity of a single dose of medroxyprogesterone acetate (MPA) and maternal serum MPA concentrations in cynomolgus monkey (Macaca fascicularis),” 32 Teratology 421 (1985).

3 S. Prahalada, E. Carroad, and A.G. Hendrickx, “Embryotoxicity and maternal serum concentrations of medroxyprogesterone acetate (MPA) in baboons (Papio cynocephalus),” 32 Contraception 497 (1985).

4 See, e.g., Z. Katz, M. Lancet, J. Skornik, J. Chemke, B.M. Mogilner, and M. Klinberg, “Teratogenicity of progestogens given during the first trimester of pregnancy,” 65 Obstet. Gynecol. 775 (1985); J.L. Yovich, S.R. Turner, and R. Draper, “Medroxyprogesterone acetate therapy in early pregnancy has no apparent fetal effects,” 38 Teratology 135 (1988).

5 G. Saccone, C. Schoen, J.M. Franasiak, R.T. Scott, and V. Berghella, “Supplementation with progestogens in the first trimester of pregnancy to prevent miscarriage in women with unexplained recurrent miscarriage: a systematic review and meta-analysis of randomized, controlled trials,” 107 Fertil. Steril. 430 (2017).

6 Ferebee v. Chevron Chemical Co., 736 F.2d 1529, 1535 (D.C. Cir.), cert. denied, 469 U.S. 1062 (1984).

7 Dr. Strom was also quoted as having provided a misleading definition of statistical significance: “whether there is a statistically significant finding at greater than 95 percent chance that it’s not due to random error.” Ambrosini, 101 F.3d at 136. Given the majority’s inadequate description of the record, the description of witness testimony may not be accurate, and error cannot properly be allocated.

8 Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice 320, 327-28 (2006); see also Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice 238 (2d ed. 2016).

9 Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice 320 (2006).

10 Id.

11 Id.; see also Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice 238 (2d ed. 2016).

Ferebee Revisited

December 28th, 2017

The following post was originally published on November 8, 2012, but was hacked, no doubt by the lawsuit industry, and replaced with mindless fluff as is its wont. It is now restored.

I used to think of the infamous Ferebee decision as the Dred Scott decision of scientific evidence, in essence declaring that science has no validity issues that the law is bound to respect. Ferebee v. Chevron Chem. Co., 552 F. Supp. 1297 (D.D.C. 1982), aff’d, 736 F.2d 1529 (D.C. Cir.), cert. denied, 469 U.S. 1062 (1984). The rhetoric on expert witnesses from the district and circuit courts in this case is sometimes jarring, but the facts of the case make the holding, as opposed to the expansive dicta, not so unreasonable under all the circumstances.

On rereading Ferebee, I was struck by several aspects of the case that rarely are discussed when Ferebee is cited. On sober second thought, Ferebee may not be such a bad decision, especially considering that it has no continuing validity as a rule of decision for expert witness admissibility in federal court.

1. Ferebee is a government negligence case.

The plaintiff worked for the federal government when he was exposed to the herbicide paraquat. Richard Ferebee began working for the Department of Agriculture’s Beltsville Agricultural Research Center (BARC), in Beltsville, Maryland. He started spraying paraquat in the summer of 1977, and used the herbicide regularly through the time he was diagnosed with pulmonary fibrosis, in November 1979. 736 F.2d at 1531-32. Ferebee brought a failure to warn claim against the supplier of paraquat, Chevron Chemical Company. The allegations of actual or constructive knowledge of a hazard, however, could just as readily be asserted against the federal government, which owned the BARC facility, employed Ferebee, controlled and supervised his use of paraquat, and failed to comply with Chevron’s instructions. The federal government further regulated the sale and use of paraquat extensively, first by the Department of Agriculture, and later by the Environmental Protection Agency. Id. at 1532.

2. The exposure.

Ferebee filed suit in 1981; he died in 1982. His case was tried twice. In the first trial, the jury deadlocked; in the second trial, the jury returned a verdict in favor of his estate, and for his family, for $60,000. In his deposition testimony, Ferebee described how he sprayed paraquat in the summer of 1977. The chemical was diluted for use, per Chevron’s instructions. There was no evidence that Ferebee ever had direct contact with undiluted paraquat, or that the paraquat he was exposed to was not diluted according to the proportions recommended on Chevron’s label. 552 F. Supp. at 1295 & n. 3. Ferebee frequently got the chemical on his hands. 552 F. Supp. at 1294-95. Ferebee further described an occasion when he was drenched with paraquat when he walked behind a tractor that was spraying the chemical, and another incident when he used a defective sprayer that leaked paraquat “all over his pants.” 736 F.2d at 1532. On both occasions, Ferebee did not wash, and apparently went home contaminated, where he fell asleep, tired and dizzy, without showering. Id. As we will see, the exposure that Ferebee described would not have occurred had his federal employer followed the instructions on the label that it mandated. In 1978, the federal Occupational Safety and Health Administration published guidelines on the need for protective clothing, respirators, immediate washing of contaminated skin, etc. Ferebee’s federal employer recklessly disregarded its own guidelines.

3. The warnings.

Paraquat could be sold in the United States only when labeled in accordance with EPA regulations, promulgated pursuant to the Federal Insecticide, Fungicide, and Rodenticide Act, 7 U.S.C. § 136, et seq. (FIFRA). The statute bars EPA from allowing sale of regulated herbicides, such as paraquat, unless the chemicals, as labeled, will not cause “unreasonable adverse effects on the environment.” 7 U.S.C. § 136a(c)(5)(C). Such effects are in turn defined as “any unreasonable risk to man or the environment, taking into account the economic, social, and environmental costs and benefits of the use of [the] pesticide.” 7 U.S.C. § 136(bb). FIFRA further directs the EPA to require labeling that is “adequate to protect health and the environment” and that is “likely to be read and understood.” 7 U.S.C. § 136(q)(1)(E). 736 F.2d at 1539-40.

Unfortunately, the courts failed to provide the complete warning label and the material safety data sheets. The opinions provide only “snippets,” which make clear that the federal government was largely to blame for failing to comply with the directions required under FIFRA. For instance, the district court, in a footnote, acknowledged:

“For example, the label advised the user spraying paraquat to wear waterproof clothing and goggles, to avoid working in spray mist, and to wash splashes on the skin or eyes immediately with water.”

552 F. Supp. at 1304 n.40. The Court of Appeals reported that the label, in large bold letters, states:

DANGER

CAN KILL IF SWALLOWED

HARMFUL TO THE EYES AND SKIN

736 F.2d at 1536. The label also informed users to wash any exposed areas immediately, and to remove contaminated clothing. Id.

4. The Stipulation.

A key fact, rarely described or explained in discussions of the Ferebee case, is the parties’ stipulation

“that Mr. Ferebee’s only significant exposure to paraquat was on his intact skin; i.e., there was no evidence that Mr. Ferebee swallowed or inhaled paraquat, or that he spilled or sprayed it on an area of his skin upon which he had any apparent cuts or scrapes. The jury was not, of course, precluded from concluding that a person engaged in Mr. Ferebee’s line of work could have had some, or even many, minor cuts or abrasions not readily discernible to the naked eye or likely to be remembered some time later.”

552 F. Supp. at 1295 & n. 3.

Why did the plaintiffs try to present their case solely as a dermal exposure case? As we will see, this stratagem made their medical causation case more difficult, but it avoided serious issues of misuse and lack of proximate cause. Ferebee had been instructed by his co-workers and supervisors that paraquat was extremely dangerous if swallowed, and probably also if inhaled. The warning label was unequivocal in detailing the dangers and the need to avoid ingestion. (Without the full label, it is difficult to evaluate how well the label warned against inhalation, but the 1978 OSHA guidelines address the use of a proper respirator for situations in which paraquat may be inhaled.) On the other hand, the label had a weakness, which could be exploited as long as the preemption defense could be held at bay: the label urged protective clothing, goggles, and immediate washing of contaminated skin, but it failed to describe any consequence of dermal exposure other than irritation. By claiming that his exposure was only dermal, Ferebee could thus sidestep both his own culpable conduct and a sophisticated intermediary defense.

Why did Chevron agree to the stipulation? The defendant probably felt sanguine about its preemption defense, and thus also about the adequacy of its warnings overall. The stipulation limited the plaintiff’s medical causation case to a route of exposure that arguably made his illness a “first instance” case report. Chevron stood to gain a claim of “lack of notice,” and thus lack of actual or constructive knowledge of the risk of lung disease from dilute dermal exposure. The clinical presentation itself differed from many of the cases of known paraquat poisoning, see infra, and Chevron probably believed that it could deal with the medical causation claim better if exposure was limited to transdermal absorption. Curiously, Chevron did not argue that Ferebee must have had some inhalational exposure, which he almost certainly did. I suspect that Chevron’s position on inhalation was hedged because its warning label did not specify respirator usage for ordinary work exposures of applicators (as opposed to workers who handled undiluted paraquat, worked in confined spaces, etc.).

5. Medical causation

Chevron took a strident position, standing on the fact that there had been no previous documented cases of pulmonary fibrosis in workers exposed to diluted paraquat through their skin. The following facts were uncontroverted:

  • Paraquat causes pulmonary fibrosis in humans.
  • The evidence that established paraquat as a cause of pulmonary fibrosis was largely case series of acute onset of pulmonary fibrosis after ingestion.
  • Paraquat induces pulmonary fibrosis relatively rapidly.
  • Paraquat can be absorbed through the skin.
  • The parties agreed that any type of exposure – ingestion, inhalation, or dermal absorption – could cause lung damage. 552 F. Supp. at 1300 & n.28.
  • Once paraquat is ingested, inhaled, or absorbed, it can travel to the lungs.
  • Lung fibrosis caused by dermal absorption of paraquat had previously been described only in cases that also involved skin lesions, appearing before or after the lung injury. 736 F.2d at 1538.
  • The lungs are the target organ for paraquat.
  • There are numerous causes of pulmonary fibrosis (such as asbestosis, scleroderma, rheumatoid arthritis, etc.).
  • The variants of pulmonary fibrosis do not all look alike, present alike, or progress alike.
  • Mr. Ferebee had no known other disease or exposure that could account for his pulmonary fibrosis.
  • There are cases of pulmonary fibrosis with no identifiable cause, known as idiopathic pulmonary fibrosis (IPF).
  • IPF is relatively rare; it too has a rapid onset and progression, although not as fast as the cases described after exposure to undiluted paraquat.
  • Mr. Ferebee’s medical history was largely unhelpful in explaining his clinical course.
  • Ferebee had some shortness of breath before starting to use paraquat. 552 F. Supp. at 1295.
  • Ferebee used paraquat occasionally over three years before he was diagnosed with pulmonary fibrosis.

Some observations about these facts. General causation in a sense was not contested. Paraquat causes pulmonary fibrosis. The issue was whether dilute dermal exposure over three years causes pulmonary fibrosis. Chevron stridently asserted that the “scientific method” required controlled experimental or observational (epidemiologic) studies. The problem with Chevron’s position was that general causation had already been established, and not by analytical epidemiologic studies.

6. The expert witnesses.

Ferebee was initially treated by Dr. Muhammed Yusuf, a pulmonary specialist, who diagnosed pulmonary fibrosis. Dr. Yusuf referred Ferebee to the National Institutes of Health (NIH), where he came under the care of Dr. Ronald G. Crystal of the National Heart, Lung, and Blood Institute. (Dr. Crystal is now at Cornell-Weill, where he is Chairman of Genetic Medicine, and he practices pulmonary medicine.)

Chevron called Dr. Carrington, who diagnosed Ferebee with IPF. Dr. Carrington challenged the plaintiffs’ expert witnesses’ opinions for lacking reliance upon controlled observational or experimental studies. 552 F. Supp. at 1301. Dr. Carrington, however, acknowledged that dermal cases are too rare for observational epidemiologic analysis, but emphasized that no animal studies of sufficient size had been done to support plaintiffs’ hypothesis. Chevron also called a Dr. Fisher, who presented a toxicokinetic (TK) analysis of Ferebee’s dermal absorption. Based upon his TK analysis, Dr. Fisher concluded that the maximal amount of paraquat absorbed by Ferebee was too small, based upon known cases and animal studies, to have caused paraquat toxicity. Id.

7. Chevron’s challenge to plaintiffs’ expert witnesses’ causation opinion.

None of the defendant’s expert witnesses examined Ferebee. The courts thought this was relevant, but they never articulated what would have been observed on physical examination that was important to resolving the differential diagnosis of paraquat toxicity versus IPF. There was no dispute that Ferebee had rapidly progressing pulmonary fibrosis. The expert witnesses on both sides evaluated Ferebee’s clinical data, presentation, and clinical course, and arrived at different diagnoses. The plaintiffs’ expert witnesses’ diagnosis, however, involved a causal attribution to paraquat exposure.

The Ferebee case was litigated under Maryland law because federal statutory law requires state law to control in a wrongful death action arising out of the neglect or wrongful act of another on a federal enclave. 16 U.S.C. § 457. 736 F.2d at 1533. (Maryland law is actually favorable to a sophisticated intermediary defense, although the key decisions post-date Ferebee.) Chevron appears to have relied upon Maryland’s articulation of the Frye general acceptance doctrine, and the courts analyzed Chevron’s arguments as a Frye challenge. 552 F. Supp. at 1301; 736 F.2d at 1535. Although the use of Maryland law to determine an evidentiary issue seems suspect, Chevron apparently pressed its challenge in terms of Maryland’s version of Frye, and not based upon Federal Rule of Evidence 702. The infamous language used by both the district and the circuit courts was, therefore, not an interpretation of federal law. Rule 702 was never cited or discussed in either the trial or the appellate court’s opinion.

My re-reading of Ferebee has softened my criticisms of state courts that had relied upon the case, even after the Supreme Court’s decision in Daubert. Softened, but not eliminated: Ferebee is still a case largely confined to its facts, and the language quoted as a standard of admissibility is really a statement of the appellate standard of review for the jury’s determination of medical causation.

8. The judicial resolution of Chevron’s Frye challenge

The district court insightfully recognized that Chevron was demanding a level of evidence that had never been required to establish paraquat’s generally accepted ability to cause pulmonary fibrosis. This recognition led to the district court’s colorful language:

“It is true that medical expert testimony must be grounded in proper scientific methodology, but the extremely stringent standard that defendant suggests is beyond reason. Product liability law, especially as it relates to relatively new products or those with a relatively rare yet significant danger, would be rendered next to meaningless if a plaintiff could prove he was injured by a product only after a ‘statistically significant’ number of other people were also injured. A civilized legal system does not require that much human sacrifice before it can intervene. The fact that this is the first case of this exact type-or at least the first of its exact type in which the involvement of paraquat was discovered by alert doctors — cannot be enough by itself to shield defendant from liability. Defendant’s experts were not able to fault Dr. Crystal for his basic diagnostic methodology; in fact, they used the same kinds of test results, consultations, and other tools that he did. What they disagreed with chiefly were his conclusions.”

552 F. Supp. at 1301. The important observation is that general causation had been established by case series and case reports of human exposure. No statistical evidence had ever been evaluated for “significance” to establish general causation for undiluted paraquat, and the trial court refused, under Maryland law, to require such evidence for general causation for diluted paraquat. In this context, we can see that the trial court’s suggestion that statistical significance was not required has little bearing upon cases in which general causation could be established only with epidemiologic evidence and its attendant statistical inferences.

Of course, the matter only became worse when Chevron persisted in its argument and presented it to a liberal panel of the D.C. Circuit. (Judge Mikva wrote the opinion for a panel that included Judge Wald, and Senior Judge Bazelon.) The panel’s decision ratcheted up the rhetoric:

“Thus, a cause-effect relationship need not be clearly established by animal or epidemiological studies before a doctor can testify that, in his opinion, such a relationship exists. As long as the basic methodology employed to reach such a conclusion is sound, such as use of tissue samples, standard tests, and patient examination, product liability does not preclude recovery until a ‘statistically significant’ number of people have been injured or until science has had the time and resources to complete sophisticated laboratory studies of the chemical. In a courtroom, the test for allowing a plaintiff to recover is not scientific certainty, but legal sufficiency; if reasonable jurors could conclude from the expert testimony that paraquat more likely than not caused Ferebee’s injury, the fact that another jury might reach the opposite conclusion or that science would require more evidence before conclusively considering the causation question resolved is irrelevant. That Ferebee’s case may have been the first of its exact type, or that his doctors may have been the first alert enough to recognize such a case, does not mean that the testimony of those doctors, who are concededly well qualified in their fields, should not have been admitted.”

736 F.2d at 1535-36 (emphasis in original).

Again, the dismissive attitude towards statistically significant evidence is limited to the context of a causal analysis that had been made, to everyone’s satisfaction, for undiluted paraquat, without the need for epidemiologic, statistical evidence. Statistical significance was never at issue. In this way, Ferebee resembles the untoward language on statistical significance from Matrixx Initiatives Inc. v. Siracusano. In both cases, statistical significance was never really at issue. In Ferebee, there was no statistical evidence needed or used to reach causal conclusions about paraquat’s ability to induce pulmonary fibrosis. In Matrixx Initiatives, allegations of statistical significance and causation were not necessary because the plaintiffs needed only to allege materiality of the facts suppressed by the company in order to plead a securities fraud case. Materiality could be established without causation, and thus neither causation nor statistical significance needed to be alleged.

As for Chevron’s Frye challenge, the district court rejected the implied call for a vote on the general acceptance of Dr. Crystal’s reasoning. Frye may require “vote counting” of some sort, but the process becomes irrelevant when virtually no one has registered to vote. Otherwise, the defense and the plaintiffs’ expert witnesses appeared to be using the same technique of arguing by analogy to accepted cases of paraquat poisoning or IPF. Dr. Crystal opined that Ferebee’s case was “similar” to three other cases he had identified. Dr. Carrington argued that Ferebee’s case was more like IPF cases, although IPF cases themselves have some clinical heterogeneity as well. Paraquat cases described onset to death as a very rapid process. Ferebee did not present with significant symptoms for three years after his first exposure, and then he survived for another two plus years. Ferebee did not report skin lesions, which had been reported in previous cases of dermal exposure leading up to pulmonary fibrosis. The case presented, on the diagnostic level, a difficult call, but it is easy to see the courts’ impatience with the defendant’s insistence upon more stringent criteria and evidence than was used to establish the causal connection with undiluted paraquat.

9. Expert witness qualifications.

Chevron never challenged Dr. Yusuf’s or Dr. Crystal’s qualifications. The oft-quoted comments about expert witness qualifications were made in the context of describing the appellate court’s standard of review, and the court’s role in not assessing credibility or weighing the evidence:

“These admonitions apply with special force in the context of the present action, in which an admittedly dangerous chemical is alleged through long-term exposure to have caused disease. Judges, both trial and appellate, have no special competence to resolve the complex and refractory causal issues raised by the attempt to link low-level exposure to toxic chemicals with human disease. On questions such as these, which stand at the frontier of current medical and epidemiological inquiry, if experts are willing to testify that such a link exists, it is for the jury to decide whether to credit such testimony.”

736 F.2d at 1534.

This procedural posture is obviously very different from the initial determination of admissibility. As far as credentials are concerned, Drs. Yusuf and Crystal were hardly “hired guns”; both physicians were well qualified. Dr. Crystal had outstanding qualifications, and Chevron wisely never challenged them. Remarkably, this language has been mistakenly invoked as a standard for trial courts to use in determining the admissibility of expert witness opinion testimony. It is no such thing.

10. Preemption and Warnings Causation.

Ultimately, Chevron’s preemption defense was rejected by both the district and the circuit court. FIFRA preemption has had its ups and downs; no surprise there. More interesting is the emphasis that both courts gave to the important role of the employer in the case. The evidence overwhelmingly showed that Ferebee had never read the warning label, and thus the element of proximate causation between allegedly inadequate warning and harm was in jeopardy of going unproved. The courts, however, emphasized the role that the employer, through its supervisors and responsible co-workers, plays in the complex organizational setting of a modern workplace:

“Mr. Ferebee’s situation was quite different, however. He did not purchase paraquat for his personal use; rather, it was provided to him by his employer for use on the job. The evidence showed that his principal source of information about paraquat was the oral instructions of his supervisors and co-workers, not the written label. He learned from them how to mix the product and how to spray it. It was also from this source that he learned of the danger of getting the product in his mouth: one of his co-workers warned him that if he accidently swallowed paraquat, it would ‘get in his blood’ and poison him. This is a common pattern of instruction and use of occupational materials in the workplace. Learning by doing and learning by oral instruction are tried and true methods of educating manual workers in their jobs. Therefore, although it is crucial to plaintiff’s case that someone would have read the label, it was not necessary for Mr. Ferebee to have done so. And it is obvious that one or more employees at BARC did read the label, since information did reach Mr. Ferebee about the proportions for diluting the product and about the dangers about which the label did warn. It was appropriate for the jury to infer that a warning about the danger of fatal lung disease from dermal exposure would also have been communicated to Mr. Ferebee. See Restatement (Second) of Torts § 388 comment n (seller normally entitled to assume that adequate warning will be passed on by purchaser to ultimate user); cf. Chambers v. G.D. Searle & Co., 441 F.Supp. at 381 (in product liability case involving prescription drug, relevant warning is the one given to doctor, not patient).”

552 F. Supp. at 1303-04 (internal citations omitted). So here we have Ferebee, the subject of so much derision and aspersion from defense counsel, embracing Section 388, comment n, and applying learned intermediary principles to a case not involving prescription drugs. The appellate court waxed enthusiastic about the principles of Section 388, and went so far as to cite Victor Schwartz in support:

“We live in an organizational society in which traditional common-law limitations on an actor’s duty must give way to the realities of society. *** In this case, Mr. Ferebee did not purchase the paraquat for his personal use, and there was substantial evidence that workplace communication about the dangers associated with various chemicals usually took the form of oral instructions from supervisors to workers, the latter of whom then retransmitted the information to co-workers. This, rather than individual reading of product warnings, is a typical method by which information is disseminated in the modern workplace. See Schwartz & Driver, “Warnings in the Workplace: The Need for a Synthesis of Law and Communication Theory,” 52 U. Cinn. L. Rev. 38, 66-83 (1983). The requirement that an improper warning proximately ‘cause’ the injury should be elaborated against this background. We believe Maryland would construe its tort law in this case to require only that someone in the workplace have read the label, not that Mr. Ferebee personally have read it. Because there is no dispute that one or more employees at BARC did read the label, we hold that the jury could properly have inferred that, had a warning about the danger of disease from dermal exposure been included on the label, that warning would have been communicated to Mr. Ferebee and that he would as a result have acted differently. Alternatively, the jury could have inferred that an adequate warning would have led Ferebee’s employers to undertake steps that would have protected him from paraquat poisoning-for example, provision of showers for use after spraying.”

736 F.2d at 1539 (emphasis in original; internal citation omitted). Judge Mikva’s prediction, of course, was absolutely accurate; Maryland tort law did, soon thereafter, embrace the sophisticated intermediary defense to exculpate the defendant in such remote supplier situations. See, e.g., Kennedy v. Mobay Corp., 84 Md. App. 397 (1990) (applying sophisticated user defense to bar claims against manufacturers of toluene diisocyanate), aff’d, 325 Md. 385 (1992); Higgins v. E.I. DuPont de Nemours, Inc., 671 F. Supp. 1055 (D. Md. 1987) (Maryland law; holding that manufacturer of paint was in better position than bulk supplier to communicate warnings to customers’ employees), aff’d, 863 F.2d 1162 (4th Cir. 1988). The principle invoked to excuse plaintiff from reading the warning label also works to exculpate the defendant when that warning label is otherwise adequate, or when the intermediary knows of the hazard in any event.

Some High-Value Targets for Sander Greenland in 2018

December 27th, 2017

A couple of years ago, Sander Greenland and I had an interesting exchange on Deborah Mayo’s website. I tweaked Sander for his practice of calling out defense expert witnesses for statistical errors, while ignoring whoppers made by plaintiffs’ expert witnesses. See “Significance Levels Made a Whipping Boy on Climate-Change Evidence: Is p < 0.05 Too Strict?” Error Statistics (Jan. 6, 2015).1 Sander acknowledged that he received a biased sample of expert reports through his service as a plaintiffs’ expert witness, but protested that defense counsel avoided him like the plague. In an effort to be helpful, I directed Sander to an example of bad statistical analysis that had been proffered by Dr. Bennett Omalu, in a Dursban case, Pritchard v. Dow Agro Sciences, 705 F. Supp. 2d 471 (W.D. Pa. 2010), aff’d, 430 F. App’x 102, 104 (3d Cir. 2011).2

Sander was unimpressed with my example of Dr. Omalu; he found the example “a bit disappointing though because [Omalu] was merely a county medical examiner, and his junk analysis was duly struck. The expert I quoted in my citations was a full professor of biostatistics at a major public university, a Fellow of the American Statistical Association, a holder of large NIH grants, and his analysis (more subtle in its transgressions) was admitted” (emphasis added). Sander expressed an interest in finding “examples involving similarly well-credentialed, professionally accomplished plaintiff experts whose testimony was likewise admitted… .”

Although it was heartening to read Sander’s concurrence in the assessment of Omalu’s analysis as “junk,” Sander’s rejection of Dr. Omalu as merely a low-value target was disappointing, given that Omalu also has a master’s degree in public health, from the University of Pittsburgh, where he claims he studied with Professor Lew Kuller. Omalu has also gained some fame and notoriety for his claim to have identified the problem of chronic traumatic encephalopathy (CTE) among professional football players. After all, even Sander Greenland has not been the subject of a feature-length movie (Concussion), as has Omalu.

I lost track of our exchange in 2015, until recently I was reminded of it when reading an expert report by Professor Martin Wells. Unlike Omalu, Wells meets all the Greenland criteria for high-value targets. He is not only a full, chaired professor but also the statistics department chairman at an Ivy League school, Cornell University. Wells is a fellow of both the American Statistical Association and the Royal Statistical Society, but most important, Wells is a frequent plaintiffs’ expert witness, who is well known to Sander Greenland. Both Wells and Greenland served, side by side, as plaintiffs’ expert witnesses in the pain pump litigation.

So here is the passage in Wells’s report that is worthy of Greenland’s attention:

“If a 95% confidence interval is specified, the range encompasses the results we would expect 95% of the time if samples for new studies were repeatedly drawn from the same population.”

In re Testosterone Replacement Therapy Prods. Liab. Litig., Declaration of Martin T. Wells, Ph.D., at 2-3 (N.D. Ill., Oct. 30, 2016). Unlike the Dursban litigation involving Bennett Omalu, where the “junk analysis” was excluded, in the litigation against AbbVie for its manufacture and sale of prescription testosterone supplementation, Wells’ opinions were not excluded or limited. In re Testosterone Replacement Therapy Prods. Liab. Litig., No. 14 C 1748, MDL No. 2545, 2017 WL 1833173 (N.D. Ill. May 8, 2017) (denying Rule 702 motions).

Now this statement by Wells surely offends the guidance provided by Greenland and colleagues.3 And it was exactly the sort of misrepresentation that led to a confabulation of the American Statistical Association, and that Association’s consensus statement on statistical significance.4
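A short simulation shows why Wells’s gloss misstates what a confidence interval is. The sketch below is only an illustration under simplifying assumptions of my own (normally distributed data, a known standard deviation, and hypothetical values for the mean, spread, and sample size); it is not a reanalysis of anything in the testosterone litigation, and the function names are hypothetical. It estimates two different quantities: how often the 95% interval procedure covers the true population mean, which is the correct frequentist reading, and how often one study’s interval captures the mean of a fresh replication study, which is closer to what Wells described.

```python
import random
from statistics import NormalDist, fmean

Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96

def ci_95(sample, sigma):
    """95% z-interval for a mean, assuming the population SD (sigma) is known."""
    m = fmean(sample)
    half_width = Z_975 * sigma / len(sample) ** 0.5
    return m - half_width, m + half_width

def coverage_vs_capture(true_mean=10.0, sigma=2.0, n=25, reps=10_000, seed=1):
    """Return (coverage of the true mean, capture of a replication study's mean).

    All parameter values are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)

    def draw():
        return [rng.gauss(true_mean, sigma) for _ in range(n)]

    covered = captured = 0
    for _ in range(reps):
        lo, hi = ci_95(draw(), sigma)
        # Correct reading: does the interval contain the fixed parameter?
        covered += lo <= true_mean <= hi
        # Wells-style reading: does it contain a new study's result?
        captured += lo <= fmean(draw()) <= hi
    return covered / reps, captured / reps

coverage, capture = coverage_vs_capture()
print(f"coverage of true mean: {coverage:.3f}")
print(f"capture of replication mean: {capture:.3f}")
```

Under these assumptions, coverage should come out close to 95%, while the replication capture rate should hover around 83%, because the original study’s mean and the replication’s mean are both subject to sampling error. The 95% describes the long-run behavior of the interval procedure with respect to the fixed parameter, not the share of future study results that a given interval will contain.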

And here is another example, which occurs not in a distorting litigation forum, but on the pages of an occupational health journal, where the editor in chief, Anthony L. Kiorpes, ranted about the need for better statistical editing and writing in his own journal. See Anthony L. Kiorpes, “Lies, damned lies, and statistics,” 33 Toxicol. & Indus. Health 885 (2017). Kiorpes decried the misuse of statistics:

“I am not implying that it is the intent of the scientists who publish in these pages to mislead readers by their use of statistics, but I submit that the misuse of statistics, whether intentional or otherwise, creates confusion and error.”

Id. at 885. Kiorpes then proceeded to hold himself up as Exhibit A to his screed:

“Remember that p values are estimates of the probability that the null hypothesis (no difference) is true.”

Id. Uggh; we seem to be backsliding after the American Statistical Association’s consensus statement.
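Kiorpes’s gloss inverts the conditional. A p value is computed on the assumption that the null hypothesis is true; it is not the probability that the null is true, given the data. A rough simulation makes the difference concrete. The sketch below assumes, purely for illustration, a body of studies in which half test a true null and half test a real effect whose expected z statistic is 2.0; both numbers, and the function name, are hypothetical choices of mine. It then asks what share of the studies reporting a p value between 0.04 and 0.05 actually involved a true null.

```python
import random
from statistics import NormalDist

def share_true_null(prior_null=0.5, effect_z=2.0, band=(0.04, 0.05),
                    n_studies=200_000, seed=7):
    """Among simulated studies whose two-sided p value falls within `band`,
    return the fraction that were testing a true null hypothesis.

    prior_null and effect_z are hypothetical illustration values.
    """
    rng = random.Random(seed)
    phi = NormalDist().cdf
    in_band = nulls_in_band = 0
    for _ in range(n_studies):
        is_null = rng.random() < prior_null
        # Test statistic: N(0, 1) under the null, N(effect_z, 1) otherwise.
        z = rng.gauss(0.0 if is_null else effect_z, 1.0)
        p = 2 * (1 - phi(abs(z)))  # two-sided p value
        if band[0] <= p <= band[1]:
            in_band += 1
            nulls_in_band += is_null
    return nulls_in_band / in_band

print(f"share of true nulls among 0.04 <= p <= 0.05: {share_true_null():.2f}")
```

Under these assumptions, roughly one in five of the studies with p between 0.04 and 0.05 involved a true null, several times the reported p value; with a larger share of true nulls, the proportion climbs higher still. The answer depends on the mix of hypotheses being tested and on statistical power, neither of which the p value alone encodes.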

“Almost all scientists have stated (or have been tempted to state) something like ‘the mean of Group A was greater than that of Group B, but the difference was not statistically significant’. With very few exceptions (which I will mention below), this statement is nonsense.”

* * * * *

“What the statistics are indicating when the p-value is greater than 0.05 is that there is ‘no difference’ between group A and group B.”

Id. at 886.
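Kiorpes’s second claim fares no better. A p value above 0.05 means only that the data are not sufficiently incompatible with the null hypothesis at that threshold; it does not show that the groups are the same. The sketch below assumes, hypothetically, that Group A’s true mean really is 0.5 standard deviations higher than Group B’s, with 20 subjects per group and a known standard deviation, and counts how often a two-sided z-test nonetheless returns p > 0.05. The parameter values and function name are my own illustrative choices.

```python
import random
from statistics import NormalDist, fmean

def share_not_significant(true_diff=0.5, sigma=1.0, n_per_group=20,
                          reps=5_000, alpha=0.05, seed=3):
    """Fraction of simulated two-group studies with p > alpha,
    even though the true mean difference is `true_diff` (hypothetical values)."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    se = sigma * (2 / n_per_group) ** 0.5  # SE of the difference in means (sigma known)
    not_sig = 0
    for _ in range(reps):
        group_a = [rng.gauss(true_diff, sigma) for _ in range(n_per_group)]
        group_b = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        z = (fmean(group_a) - fmean(group_b)) / se
        p = 2 * (1 - phi(abs(z)))  # two-sided z-test
        not_sig += p > alpha
    return not_sig / reps

print(f"share of studies with p > 0.05: {share_not_significant():.2f}")
```

With these assumed numbers, the test has only about 35% power, so roughly two out of three simulated studies report a “non-significant” difference even though a real difference was built into every one of them. Reading any of those results as showing “no difference” would be wrong.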

Let’s hope that this gets Sander Greenland away from his biased sampling of expert witnesses, off the backs of defense expert witnesses, and on to some of the real culprits out there, in the new year.


1 See also “Sander Greenland on ‘The Need for Critical Appraisal of Expert Witnesses in Epidemiology and Statistics’” (Feb. 8, 2015).

2 See also “Pritchard v. Dow Agro – Gatekeeping Exemplified” (Aug. 25, 2014); “Omalu and Science — A Bad Weld” (Oct. 22, 2016); Brian v. Association of Independent Oil Distributors, No. 2011-3413, Westmoreland Cty. Ct. Common Pleas, Order of July 18, 2016 (excluding Dr. Omalu’s testimony on welding and solvents and Parkinson’s disease).

3 See, e.g., Sander Greenland, Stephen J. Senn, Kenneth J. Rothman, John B. Carlin, Charles Poole, Steven N. Goodman, and Douglas G. Altman, “Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations,” 31 Eur. J. Epidem. 337 (2016).

4 Ronald L. Wasserstein & Nicole A. Lazar, “American Statistical Association Statement on statistical significance and p values,” 70 Am. Statistician 129 (2016).

Gatekeeping of Expert Witnesses Needs a Bair Hug

December 20th, 2017

For every Rule 702 (“Daubert”) success story, there are multiple gatekeeping failures. See David E. Bernstein, “The Misbegotten Judicial Resistance to the Daubert Revolution,” 89 Notre Dame L. Rev. 27 (2013).1 Exemplars of inadequate expert witness gatekeeping in state or federal court abound, and overwhelm the bar. The only solace one might find is that the abuse-of-discretion appellate standard of review keeps the bad decisions from precedentially outlawing the good ones.

Judge Joan Ericksen recently provided another Berenstain Bears’ example of how not to keep the expert witness gate, in litigation over claims that the Bair Hugger forced air warming devices (“Bair Huggers”) cause infections. In re Bair Hugger Forced Air Warming, MDL No. 15-2666, 2017 WL 6397721 (D. Minn. Dec. 13, 2017). Although Her Honor properly cited and quoted Rule 702 (2000), a new standard is announced in a bold heading:

“Under Federal Rule of Evidence 702, the Court need only exclude expert testimony that is so fundamentally unsupported that it can offer no assistance to the jury.”

Id. at *1. This new standard thus permits largely unsupported opinion that can offer bad assistance to the jury. As Judge Ericksen demonstrates, this new standard, which has no warrant in the statutory text of Rule 702 or its advisory committee notes, allows expert witnesses to rely upon studies that have serious internal and external validity flaws.

Jonathan Samet, a specialist in pulmonary medicine, not infectious disease or statistics, is one of the plaintiffs’ principal expert witnesses. Samet relies in large measure upon an observational study2, which purports to find an increased odds ratio for use of the Bair Hugger among infection cases in one particular hospital. The defense epidemiologist, Jonathan B. Borak, criticized the McGovern observational study on several grounds, including that the study was highly confounded by the presence of other known infection risks. Id. at *6. Judge Ericksen characterized Borak’s opinion as an assertion that the McGovern study was an “insufficient basis” for the plaintiffs’ claims. A fair reading of even Judge Ericksen’s précis of Borak’s proffered testimony requires the conclusion that Borak’s opinion was that the McGovern study was invalid because of data collection errors and confounding. Id.

Judge Ericksen’s judicial assessment, taken from the disagreement between Samet and Borak, is that there are issues with the McGovern study, which go to “weight of the evidence.” This finding obscures, however, that there were strong challenges to the internal and external validity of the study. Drawing causal inferences from an invalid observational study is a methodological issue, not a weight-of-the-evidence problem for the jury to resolve. This MDL opinion never addresses the Rule 703 issue, whether an epidemiologic expert would reasonably rely upon such a confounded study.
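A toy example shows why confounding is a question of validity rather than mere weight. The counts below are entirely hypothetical, invented for illustration; they are not the McGovern data, and the class and function names are my own. Suppose forced-air warming was used more often among patients who also carried some other infection risk, with a higher baseline infection rate. Within each stratum of that other risk, the odds of infection are identical with or without the device, yet the crude, pooled odds ratio suggests a more than threefold increase. A Mantel-Haenszel odds ratio, which pools the stratum-specific comparisons, removes the spurious association.

```python
from dataclasses import dataclass

@dataclass
class Stratum:
    """2x2 table for one stratum of the confounder (hypothetical counts)."""
    exp_cases: int       # exposed, infected
    exp_noncases: int    # exposed, not infected
    unexp_cases: int     # unexposed, infected
    unexp_noncases: int  # unexposed, not infected

    @property
    def total(self) -> int:
        return self.exp_cases + self.exp_noncases + self.unexp_cases + self.unexp_noncases

def odds_ratio(s: Stratum) -> float:
    return (s.exp_cases * s.unexp_noncases) / (s.exp_noncases * s.unexp_cases)

def crude_odds_ratio(strata: list[Stratum]) -> float:
    """Collapse the strata into a single table, ignoring the confounder."""
    pooled = Stratum(
        sum(s.exp_cases for s in strata),
        sum(s.exp_noncases for s in strata),
        sum(s.unexp_cases for s in strata),
        sum(s.unexp_noncases for s in strata),
    )
    return odds_ratio(pooled)

def mantel_haenszel_or(strata: list[Stratum]) -> float:
    """Mantel-Haenszel summary odds ratio, adjusting for the stratifying variable."""
    num = sum(s.exp_cases * s.unexp_noncases / s.total for s in strata)
    den = sum(s.exp_noncases * s.unexp_cases / s.total for s in strata)
    return num / den

# Hypothetical data: the device is used mostly in the high-risk stratum
# (10% infection rate there, versus 1% in the low-risk stratum).
high_risk = Stratum(exp_cases=10, exp_noncases=90, unexp_cases=2, unexp_noncases=18)
low_risk = Stratum(exp_cases=2, exp_noncases=198, unexp_cases=8, unexp_noncases=792)
strata = [high_risk, low_risk]

print([odds_ratio(s) for s in strata])  # stratum-specific ORs: [1.0, 1.0]
print(crude_odds_ratio(strata))         # crude OR: 3.375
print(mantel_haenszel_or(strata))       # adjusted OR: 1.0
```

In this constructed example, the entire crude association comes from the uneven distribution of the other risk factor. Whether anything similar happened in a real study is an empirical question about that study’s data, and it is the sort of methodological question that a Rule 702 inquiry would need to confront.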

The defense proffered the opinion of Theodore R. Holford, who criticized Dr. Samet for drawing causal inferences from the McGovern observational study. Holford, a professor of biostatistics at Yale University’s School of Public Health, analyzed the raw data behind the McGovern study. Id. at *8. The plaintiffs challenged Holford’s opinions on the ground that he relied on data in “non-final” form, from a temporally expanded dataset. Even more intriguingly, given that the plaintiffs did not present a statistician expert witness, plaintiffs argued that Holford’s opinions should be excluded because

(1) he insufficiently justified his use of a statistical test, and

(2) he “emphasizes statistical significance more than he would in his professional work.”

Id.

The MDL court dismissed the plaintiffs’ challenge on the mistaken conclusion that the alleged contradictions between Holford’s practice and his testimony impugn his credibility at most.” If there were truly such a deviation from the statistical standard of care, the issue is methodological, not a credibility issue of whether Holford was telling the truth. And as for the alleged over-emphasis on statistical significance, the MDL court again falls back to the glib conclusions that the allegation goes to the weight, not the admissibility of expert witness opinion testimony, and that plaintiffs can elicit testimony from Dr. Samet as to how and why Professor Holford over-emphasized statistical significance. Id. Inquiring minds, at the bar, and in the academy, are left with no information about what the real issues are in the case.

Generally, both sides’ challenges to expert witnesses were denied.3 The real losers, however, were the scientific and medical communities, the bench, the bar, and the general public. The MDL court glibly and incorrectly treated methodological issues as “credibility” issues, confused sufficiency with validity, and banished methodological failures to the trier of fact for consideration as matters of “weight.” Confounding was mistreated as simply a debating point between the parties’ expert witnesses. The reader of Judge Ericksen’s opinion never learns what statistical test Professor Holford used, what justification was needed but allegedly absent for the test, why the justification was contested, and what other test plaintiffs claimed would have been a “better” statistical test. As for the emphasis given to statistical significance, the reader is left in the dark about exactly what that emphasis was, how it led to Holford’s conclusions and opinions, and what the proper emphasis should have been.

Eventually, appellate review of the Bair Hugger MDL decision must turn on whether the district court abused its discretion. Although appellate courts give trial judges discretion to resolve Rule 702 issues, the appellate courts cannot reach reasoned decisions when the lower courts fail to give even a cursory description of what the issues were, and how and why they were resolved as they were.


2 P. D. McGovern, M. Albrecht, K. G. Belani, C. Nachtsheim, P. F. Partington, I. Carluke, and M. R. Reed, “Forced-Air Warming and Ultra-Clean Ventilation Do Not Mix: An Investigation of Theatre Ventilation, Patient Warming and Joint Replacement Infection in Orthopaedics,” 93-B J. Bone & Joint Surg. (Br.) 1537 (2011). The article as published contains no disclosures of potential or actual conflicts of interest. A persistent rumor has it that the investigators were funded by a commercial rival to the manufacturer of the Bair Hugger at issue in Judge Ericksen’s MDL. See generally Melissa D. Kellam, Loraine S. Dieckmann, and Paul N. Austin, “Forced-Air Warming Devices and the Risk of Surgical Site Infections,” 98 Ass’n periOperative Registered Nurses (AORN) J. 354 (2013).

3 A challenge to plaintiffs’ expert witness Yadin David was sustained to the extent he sought to offer opinions about the defendant’s state of mind. Id. at *5.