For your delectation and delight, desultory dicta on the law of delicts.

Confounding in Daubert, and Daubert Confounded

November 4th, 2018


The Daubert trilogy and the statutory revisions to Rule 702 have not brought universal enlightenment. Many decisions reflect a curmudgeonly and dismissive approach to gatekeeping.

The New Jersey Experience

Until recently, New Jersey law looked as though it favored vigorous gatekeeping of invalid expert witness opinion testimony. The law as applied, however, was another matter, with most New Jersey judges keen to find ways to escape the logical and scientific implications of the articulated standards, at least in civil cases.1 For example, in Grassis v. Johns-Manville Corp., 248 N.J. Super. 446, 591 A.2d 671, 675 (App. Div. 1991), the intermediate appellate court discussed the possibility that confounders may lead to an erroneous inference of a causal relationship. Plaintiffs’ counsel claimed that occupational asbestos exposure causes colorectal cancer, but the available studies, inconsistent as they were, failed to assess the role of smoking, family history, and dietary factors. The court essentially shrugged its judicial shoulders and let a plaintiffs’ verdict stand, even though it was supported by expert witness testimony that had relied upon seriously flawed and confounded studies. Not surprisingly, 15 years after the Grassis case, the scientific community acknowledged what should have been obvious in 1991: the studies did not support a conclusion that asbestos causes colorectal cancer.2

This year, however, saw the New Jersey Supreme Court step in to help extricate the lower courts from their gatekeeping doldrums. In a case that involved the dismissal of plaintiffs’ expert witnesses’ testimony in over 2,000 Accutane cases, the New Jersey Supreme Court demonstrated how to close the gate on testimony that is based upon flawed studies and involves tenuous and unreliable inferences.3 There were other remarkable aspects of the Supreme Court’s Accutane decision. For instance, the Court put its weight behind the common-sense and accurate interpretation of Sir Austin Bradford Hill’s famous articulation of factors for causal judgment, which requires that sampling error, bias, and confounding be eliminated before assessing whether the observed association is strong, consistent, plausible, and the like.4

Cook v. Rockwell International

The litigation over radioactive contamination from the Colorado Rocky Flats nuclear weapons plant is illustrative of the retrograde tendency in some federal courts. The defense objected to plaintiffs’ expert witness, Dr. Clapp, whose study failed to account for known confounders.5 Judge Kane denied the challenge, claiming that the defense could:

cite no authority, scientific or legal, that compliance with all, or even one, of these factors is required for Dr. Clapp’s methodology and conclusions to be deemed sufficiently reliable to be admissible under Rule 702. The scientific consensus is, in fact, to the contrary. It identifies Defendants’ list of factors as some of the nine factors or lenses that guide epidemiologists in making judgments about causation. Ref. Guide on Epidemiolog at 375.).”6

In Cook, the trial court or the parties or both missed the obvious references in the Reference Manual to the need to control for confounding. Certainly many other scientific sources could be cited as well. Judge Kane apparently took a defense expert witness’s statement that ecological studies do not account for confounders to mean that the presence of confounding does not render such studies unscientific. Id. True but immaterial. Ecological studies may be “scientific,” but they do not warrant inferences of causation. Some so-called scientific studies are merely hypothesis generating, preliminary, tentative, or data-dredging exercises. Judge Kane employed the flaws-are-features approach, and opined that ecological studies are merely “less probative” than other studies, and the relative weights of studies do not render them inadmissible.7 This approach is, of course, a complete abdication of gatekeeping responsibility. First, studies themselves are not admissible; it is the expert witness, whose testimony is challenged. The witness’s reliance upon studies is relevant to the Rule 702 and 703 analyses, but admissibility is not the issue. Second, Rule 702 requires that the proffered opinion be “scientific knowledge,” and ecological studies simply lack the necessary epistemic warrant to support a causal conclusion. Third, the trial court in Cook had to ignore the federal judiciary’s own reference manual’s warnings about the inability of ecological studies to provide causal inferences.8 The Cook case is part of an unfortunate trend to regard all studies as “flawed,” and their relative weights simply a matter of argument and debate for the litigants.9


Another example of sloppy reasoning about confounding can be found in a recent federal trial court decision, In re Abilify Products Liability Litigation,10 where the trial court advanced a futility analysis. All observational studies have potential confounding, and so confounding is not an error but a feature. Given this simplistic position, it follows that failure to control for every imaginable potential confounder does not invalidate an epidemiologic study.11 From its nihilistic starting point, the trial court readily found that an expert witness could reasonably dispense with controlling for confounding factors of psychiatric conditions in studies of a putative association between the antipsychotic medication Abilify and gambling disorders.12

Under this sort of “reasoning,” some criminal defense lawyers might argue that since all human beings are “flawed,” we have no basis to distinguish sinners from saints. We have a long way to go before our courts are part of the evidence-based world.

1 In the context of a “social justice” issue such as whether race disparities exist in death penalty cases, New Jersey court has carefully considered confounding in its analyses. See In re Proportionality Review Project (II), 165 N.J. 206, 757 A.2d 168 (2000) (noting that bivariate analyses of race and capital sentences were confounded by missing important variables). Unlike the New Jersey courts (until the recent decision in Accutane), the Texas courts were quick to adopt the principles and policies of gatekeeping expert witness opinion testimony. See Merrell Dow Pharms., Inc. v. Havner, 953 S.W.2d 706, 714, 724 (Tex.1997) (reviewing court should consider whether the studies relied upon were scientifically reliable, including consideration of the presence of confounding variables).  Even some so-called Frye jurisdictions “get it.” See, e.g., Porter v. SmithKline Beecham Corp., No. 3516 EDA 2015, 2017 WL 1902905 *6 (Phila. Super., May 8, 2017) (unpublished) (affirming exclusion of plaintiffs’ expert witness on epidemiology, under Frye test, for relying upon an epidemiologic study that failed to exclude confounding as an explanation for a putative association), affirming, Mem. Op., No. 03275, 2015 WL 5970639 (Phila. Ct. Com. Pl. Oct. 5, 2015) (Bernstein, J.), and Op. sur Appellate Issues (Phila. Ct. Com. Pl., Feb. 10, 2016) (Bernstein, J.).

3 In re Accutane Litig., ___ N.J. ___, ___ A.3d ___, 2018 WL 3636867 (2018); see N.J. Supreme Court Uproots Weeds in Garden State’s Law of Expert Witnesses(Aug. 8, 2018).

2018 WL 3636867, at *20 (citing the Reference Manual 3d ed., at 597-99).

5 Cook v. Rockwell Internat’l Corp., 580 F. Supp. 2d 1071, 1098 (D. Colo. 2006) (“Defendants next claim that Dr. Clapp’s study and the conclusions he drew from it are unreliable because they failed to comply with four factors or criteria for drawing causal interferences from epidemiological studies: accounting for known confounders … .”), rev’d and remanded on other grounds, 618 F.3d 1127 (10th Cir. 2010), cert. denied, ___ U.S. ___ (May 24, 2012). For another example of a trial court refusing to see through important qualitative differences between and among epidemiologic studies, see In re Welding Fume Prods. Liab. Litig., 2006 WL 4507859, *33 (N.D. Ohio 2006) (reducing all studies to one level, and treating all criticisms as though they rendered all studies invalid).

6 Id.   

7 Id.

8 RMSE3d at 561-62 (“[ecological] studies may be useful for identifying associations, but they rarely provide definitive causal answers”) (internal citations omitted); see also David A. Freedman, “Ecological Inference and the Ecological Fallacy,” in Neil J. Smelser & Paul B. Baltes, eds., 6 Internat’l Encyclopedia of the Social and Behavioral Sciences 4027 (2001).

9 See also McDaniel v. CSX Transportation, Inc., 955 S.W.2d 257 (Tenn. 1997) (considering confounding but holding that it was a jury issue); Perkins v. Origin Medsystems Inc., 299 F. Supp. 2d 45 (D. Conn. 2004) (striking reliance upon a study with uncontrolled confounding, but allowing expert witness to testify anyway)

10 In re Abilifiy (Aripiprazole) Prods. Liab. Litig., 299 F. Supp. 3d 1291 (N.D. Fla. 2018).

11 Id. at 1322-23 (citing Bazemore as a purported justification for the court’s nihilistic approach); see Bazemore v. Friday, 478 U.S. 385, 400 (1986) (“Normally, failure to include variables will affect the analysis’ probativeness, not its admissibility.).

12 Id. at 1325.

Appendix – Some Federal Court Decisions on Confounding

1st Circuit

Bricklayers & Trowel Trades Internat’l Pension Fund v. Credit Suisse Sec. (USA) LLC, 752 F.3d 82, 85 (1st Cir. 2014) (affirming exclusion of expert witness whose event study and causal conclusion failed to consider relevant confounding variables and information that entered market on the event date)

2d Circuit

In re “Agent Orange” Prod. Liab. Litig., 597 F. Supp. 740, 783 (E.D.N.Y. 1984) (noting that confounding had not been sufficiently addressed in a study of U.S. servicemen exposed to Agent Orange), aff’d, 818 F.2d 145 (2d Cir. 1987) (approving district court’s analysis), cert. denied sub nom. Pinkney v. Dow Chemical Co., 484 U.S. 1004 (1988)

3d Circuit

In re Zoloft Prods. Liab. Litig., 858 F.3d 787, 793, 799 (2017) (acknowledging that statistically significant findings occur in the presence of inadequately controlled confounding or bias; affirming the exclusion of statistical expert witness, Nicholas Jewell, in part for using an admittedly non-rigorous approach to adjusting for confouding by indication)

4th Circuit

Gross v. King David Bistro, Inc., 83 F. Supp. 2d 597 (D. Md. 2000) (excluding expert witness who opined shigella infection caused fibromyalgia, given the existence of many confounding factors that muddled the putative association)

5th Circuit

Kelley v. American Heyer-Schulte Corp., 957 F. Supp. 873 (W.D. Tex. 1997) (noting that observed association may be causal or spurious, and that confounding factors must be considered to distinguish spurious from real associations)

Brock v. Merrell Dow Pharms., Inc., 874 F.2d 307, 311 (5th Cir. 1989) (noting that “[o]ne difficulty with epidemiologic studies is that often several factors can cause the same disease.”)

6th Circuit

Nelson v. Tennessee Gas Pipeline Co., WL 1297690, at *4 (W.D. Tenn. Aug. 31, 1998) (excluding an expert witness who failed to take into consideration confounding factors), aff’d, 243 F.3d 244, 252 (6th Cir. 2001), cert. denied, 534 U.S. 822 (2001)

Adams v. Cooper Indus. Inc., 2007 WL 2219212, 2007 U.S. Dist. LEXIS 55131 (E.D. Ky. 2007) (differential diagnosis includes ruling out confounding causes of plaintiffs’ disease).

7th Circuit

People Who Care v. Rockford Bd. of Educ., 111 F.3d 528, 537-38 (7th Cir. 1997) (Posner, J.) (“a statistical study that fails to correct for salient explanatory variables, or even to make the most elementary comparisons, has no value as causal explanation and is therefore inadmissible in a federal court”) (educational achievement in multiple regression);

Sheehan v. Daily Racing Form, Inc., 104 F.3d 940 (7th Cir. 1997) (holding that expert witness’s opinion, which failed to correct for any potential explanatory variables other than age, was inadmissible)

Allgood v. General Motors Corp., 2006 WL 2669337, at *11 (S.D. Ind. 2006) (noting that confounding factors must be carefully addressed; holding that selection bias rendered expert testimony inadmissible)

9th Circuit

In re Bextra & Celebrex Marketing Celebrex Sales Practices & Prod. Liab. Litig., 524 F.Supp. 2d 1166, 1178-79 (N.D. Cal. 2007) (noting plaintiffs’ expert witnesses’ inconsistent criticism of studies for failing to control for confounders; excluding opinions that Celebrex at 200 mg/day can cause heart attacks, as failing to satisfy Rule 702)

Avila v. Willits Envt’l Remediation Trust, 2009 WL 1813125, 2009 U.S. Dist. LEXIS 67981 (N.D. Cal. 2009) (excluding expert witness’s opinion in part because of his failure to rule out confounding exposures and risk factors for the outcomes of interest), aff’d in relevant part, 633 F.3d 828 (9th Cir.), cert denied, 132 S.Ct. 120 (2011)

Hendricksen v. ConocoPhillips Co., 605 F. Supp. 2d 1142, 1158 (E.D. Wash. 2009) (“In general, epidemiology studies are probative of general causation: a relative risk greater than 1.0 means the product has the capacity to cause the disease. “Where the study properly accounts for potential confounding factors and concludes that exposure to the agent is what increases the probability of contracting the disease, the study has demonstrated general causation – that exposure to the agent is capable of causing [the illness at issue] in the general population.’’) (internal quotation marks and citation omitted)

Valentine v. Pioneer Chlor Alkali Co., Inc., 921 F. Supp. 666, 677 (D. Nev. 1996) (‘‘In summary, Dr. Kilburn’s study suffers from very serious flaws. He took no steps to eliminate selection bias in the study group, he failed to identify the background rate for the observed disorders in the Henderson community, he failed to control for potential recall bias, he simply ignored the lack of reliable dosage data, he chose a tiny sample size, and he did not attempt to eliminate so-called confounding factors which might have been responsible for the incidence of neurological disorders in the subject group.’’)

Claar v. Burlington No. RR, 29 F.3d 499 (9th Cir. 1994) (affirming exclusion of plaintiffs’ expert witnesses, and grant of summary judgment, when plaintiffs’ witnesses concluded that the plaintiffs’ injuries were caused by exposure to toxic chemicals, without investigating any other possible causes).

10th Circuit

Hollander v. Sandoz Pharms. Corp., 289 F.3d 1193, 1213 (10th Cir. 2002) (affirming exclusion in Parlodel case involving stroke; confounding makes case reports inappropriate bases for causal inferences, and even observational epidemiologic studies must evaluated carefully for confounding)

D.C. Circuit

American Farm Bureau Fed’n v. EPA, 559 F.3d 512 (2009) (noting that in setting particulate matter standards addressing visibility, agency should avoid relying upon data that failed to control for the confounding effects of humidity)

Rule 702 Requires Courts to Sort Out Confounding

October 31st, 2018


Back in 2000, several law professors wrote an essay, in which they detailed some of the problems courts experienced in expert witness gatekeeping. Their article noted that judges easily grasped the problem of generalizing from animal evidence to human experience, and thus they simplistically emphasized human (epidemiologic) data. But in their emphasis on the problems in toxicological evidence, the judges missed problems of internal validity, such as confounding, in epidemiologic studies:

Why do courts have such a preference for human epidemiological studies over animal experiments? Probably because the problem of external validity (generalizability) is one of the most obvious aspects of research methodology, and therefore one that non-scientists (including judges) are able to discern with ease – and then give excessive weight to (because whether something generalizes or not is an empirical question; sometimes things do and other times they do not). But even very serious problems of internal validity are harder for the untrained to see and understand, so judges are slower to exclude inevitably confounded epidemiological studies (and give insufficient weight to that problem). Sophisticated students of empirical research see the varied weaknesses, want to see the varied data, and draw more nuanced conclusions.”2

I am not sure that the problems are dependent in the fashion suggested by the authors, but their assessment that judges may be reluctant to break the seal on the black box of epidemiology, and that judges frequently lack the ability to make nuanced evaluations of the studies on which expert witnesses rely seems fair enough. Judges continue to miss important validity issues, perhaps because the adversarial process levels all studies to debating points in litigation.3

The frequent existence of validity issues undermines the partisan suggestion that Rule 702 exclusions are merely about “sufficiency of the evidence.” Sometimes, there is just too much of nothing to rise even to a problem of insufficiency. Some studies are “not even wrong.”4 Similarly, validity issues are an embarrassment to those authors who argue that we must assemble all the evidence and consider the entirety under ethereal standards, such as “weight of the evidence,” or “inference to the best explanation.” Sometimes, some or much of the available evidence does not warrant inclusion in the data set at all, and any causal inference is unacceptable.

Threats to validity come in many forms, but confounding is a particularly dangerous one. In claims that substances such as diesel fume or crystalline silica cause lung cancer, confounding is a huge problem. The proponents of the claims suggest relative risks in the range of 1.1 to 1.6 for such substances, but tobacco smoking results in relative risks in excess of 20, and some claim that passive smoking at home or in the workplace results in relative risks of the same magnitude as the risk ratios claimed for diesel particulate or silica. Furthermore the studies behind these claims frequently involve exposures to other known or suspected lung carcinogens, such as arsenic, radon, dietary factors, asbestos, and others.

Definition of Confounding

Confounding results from the presence of a so-called confounding (or lurking) variable, helpfully defined in the chapter on statistics in the Reference Manual on Scientific Evidence:

confounding variable; confounder. A confounder is correlated with the independent variable and the dependent variable. An association between the dependent and independent variables in an observational study may not be causal, but may instead be due to confounding. See controlled experiment; observational study.”5

This definition suggests that the confounder need not be known to cause the dependent variable/outcome; the confounder need be only correlated with the outcome and an independent variable, such as exposure. Furthermore, the confounder may be actually involved in such a way as to increase or decrease the estimated relationship between dependent and independent variables. A confounder that is known to be present typically is referred to as a an “actual” confounder, as opposed to one that may be at work, and known as a “potential” confounder. Furthermore, even after exhausting known and potential confounders, studies of may be affected by “residual” confounding, especially when the total array of causes of the outcome of interest is not understood, and these unknown causes are not randomly distributed between exposed and unexposed groups in epidemiologic studies. Litigation frequently involves diseases or outcomes with unknown causes, and so the reality of unidentified residual confounders is unavoidable.

In some instances, especially in studies pharmaceutical adverse outcomes, there is the danger that the hypothesized outcome is also a feature of the underlying disease being treated. This phenomenon is known as confounding by indication, or as indication bias.6

Kaye and Freedman’s statistics chapter notes that confounding is a particularly important consideration when evaluating observational studies. In randomized clinical trials, one goal of the randomization is the elimination of the role of bias and confounding by the random assignment of exposures:

2. Randomized controlled experiments

In randomized controlled experiments, investigators assign subjects to treatment or control groups at random. The groups are therefore likely to be comparable, except for the treatment. This minimizes the role of confounding.”7

In observational studies, confounding may completely invalidate an association. Kaye and Freedman give an example from the epidemiologic literature:

Confounding remains a problem to reckon with, even for the best observational research. For example, women with herpes are more likely to develop cervical cancer than other women. Some investigators concluded that herpes caused cancer: In other words, they thought the association was causal. Later research showed that the primary cause of cervical cancer was human papilloma virus (HPV). Herpes was a marker of sexual activity. Women who had multiple sexual partners were more likely to be exposed not only to herpes but also to HPV. The association between herpes and cervical cancer was due to other variables.”8

The problem identified as confounding by Freedman and Kaye cannot be dismissed as an issue that goes to the “weight” of the study issue; the confounding goes to the heart of the ability of the herpes studies to show an association that can be interpreted to be causal. Invalidity from confounding renders the studies “weightless” in any “weight of the evidence” approach. There are, of course, many ways to address confounding in studies: stratification, multivariate analyses, multiple regression, propensity scores, etc. Consideration of the propriety and efficacy of these methods is a whole other level of analysis, which does not arise unless and until the threshold question of confounding is addressed.

Reference Manual on Scientific Evidence

The epidemiology chapter of the Second Edition of the Manual stated that ruling out of confounding as an obligation of the expert witness who chooses to rely upon the study.9 Although the same chapter in the Third Edition occasionally waffles, its authors come down on the side of describing confounding as a threat to validity, which must be ruled out before the study can be relied upon. In one place, the authors indicate “care” is required, and that analysis for random error, confounding, bias “should be conducted”:

Although relative risk is a straightforward concept, care must be taken in interpreting it. Whenever an association is uncovered, further analysis should be conducted to assess whether the association is real or a result of sampling error, confounding, or bias. These same sources of error may mask a true association, resulting in a study that erroneously finds no association.”10

Elsewhere in the same chapter, the authors note that “chance, bias, and confounding” must be looked at, but again, the authors stop short of noting that these threats to validity must be eliminated:

Three general categories of phenomena can result in an association found in a study to be erroneous: chance, bias, and confounding. Before any inferences about causation are drawn from a study, the possibility of these phenomena must be examined.”11

                *  *  *  *  *  *  *  *

To make a judgment about causation, a knowledgeable expert must consider the possibility of confounding factors.”12

Eventually, however, the epidemiology chapter takes a stand, and an important one:

When researchers find an association between an agent and a disease, it is critical to determine whether the association is causal or the result of confounding.”13

Mandatory Not Precatory

The better reasoned cases decided under Federal Rule of Evidence 702, and state-court analogues, follow the Reference Manual in making clear that confounding factors must be carefully addressed and eliminated. Failure to rule out the role of confounding renders a conclusion of causation, reached in reliance upon confounded studies, invalid.14

The inescapable mandate of Rules 702 and 703 is to require judges to evaluate the bases of a challenged expert witness’s opinion. Threats to internal validity, such as confounding, in a study may make reliance upon any given study, or an entire set of studies, unreasonable, which thus implicates Rule 703. Importantly, stacking up more invalid studies does not overcome the problem by presenting a heap of evidence, incompetent to show anything.


Before the Supreme Court decided Daubert, few federal or state courts were willing to roll up their sleeves to evaluate the internal validity of relied upon epidemiologic studies. Issues of bias and confounding were typically dismissed by courts as issues that went to “weight, not admissibility.”

Judge Weinstein’s handling of the Agent Orange litigation, in the mid-1980s, marked a milestone in judicial sophistication and willingness to think critically about the evidence that was being funneled into the courtroom.15 The Bendectin litigation also was an important proving ground in which the defendant pushed courts to keep their eyes and minds open to issues of random error, bias, and confounding, when evaluating scientific evidence, on both pre-trial and on post-trial motions.16


When the United States Supreme Court addressed the admissibility of plaintiffs’ expert witnesses in Daubert, its principal focus was on the continuing applicability of the so-called Frye rule after the enactment of the Federal Rules of Evidence. The Court left the details of applying the then newly clarified “Daubert” standard to the facts of the case on remand to the intermediate appellate court. The Ninth Circuit, upon reconsidering the case, re-affirmed the trial court’s previous grant of summary judgment, on grounds of the plaintiffs’ failure to show specific causation.

A few years later, the Supreme Court itself engaged with the actual evidentiary record on appeal, in a lung cancer claim, which had been dismissed by the district court. Confounding was one among several validity issues in the studies relied upon by plaintiffs” expert witnesses. The Court concluded that the plaintiffs’ expert witnesses’ bases did not individually or collectively support their conclusions of causation in a reliable way. With respect to one particular epidemiologic study, the Supreme Court observed that a study that looked at workers who “had been exposed to numerous potential carcinogens” could not show that PCBs cause lung cancer. General Elec. Co. v. Joiner, 522 U.S. 136, 146 (1997).17

1 An earlier version of this post can be found at “Sorting Out Confounded Research – Required by Rule 702” (June 10, 2012).

2 David Faigman, David Kaye, Michael Saks, and Joseph Sanders, “How Good is Good Enough? Expert Evidence Under Daubert andKumho,” 50Case Western Reserve L. Rev. 645, 661 n.55 (2000).

3 See, e.g., In re Welding Fume Prods. Liab. Litig., 2006 WL 4507859, *33 (N.D.Ohio 2006) (reducing all studies to one level, and treating all criticisms as though they rendered all studies invalid).

4 R. Peierls, “Wolfgang Ernst Pauli, 1900-1958,” 5Biographical Memoirs of Fellows of the Royal Society 186 (1960) (quoting Wolfgang Pauli’s famous dismissal of a particularly bad physics paper).

5 David Kaye & David Freedman, “Reference Guide on Statistics,” inReference Manual on Scientific Evidence 211, 285 (3d ed. 2011)[hereafter theRMSE3d].

6 See, e.g., R. Didham, et al., “Suicide and Self-Harm Following Prescription of SSRIs and Other Antidepressants: Confounding By Indication,” 60Br. J. Clinical Pharmacol. 519 (2005).

7 RMSE3d at 220.

8 RMSE3d at 219 (internal citations omitted).

9 Reference Guide on Epidemiology at 369 -70 (2ed 2000) (“Even if an association is present, epidemiologists must still determine whether the exposure causes the disease or if a confounding factor is wholly or partly responsible for the development of the outcome.”).

10 RMSE3d at 567-68 (internal citations omitted).

11 RMSE3d at 572.

12 RMSE3d at 591 (internal citations omitted).

13 RMSE3d at 591

14 Similarly, an exonerative conclusion of no association might be vitiated by confounding with a protective factor, not accounted for in a multivariate analysis. Practically, such confounding seems less prevalent than confounding that generates a positive association.

15 In re “Agent Orange” Prod. Liab. Litig., 597 F. Supp. 740, 783 (E.D.N.Y. 1984) (noting that confounding had not been sufficiently addressed in a study of U.S. servicemen exposed to Agent Orange), aff’d, 818 F.2d 145 (2d Cir. 1987) (approving district court’s analysis), cert. denied sub nom. Pinkney v. Dow Chemical Co., 484 U.S. 1004 (1988).

16 Brock v. Merrell Dow Pharms., Inc., 874 F.2d 307, 311 , modified on reh’g, 884 F.2d 166 (5th Cir. 1989) (noting that “[o]ne difficulty with epidemiologic studies is that often several factors can cause the same disease.”)

17 The Court’s discussion related to the reliance of plaintiffs’ expert witnesses upon, among other studies, Kuratsune, Nakamura, Ikeda, & Hirohata, “Analysis of Deaths Seen Among Patients with Yusho – A Preliminary Report,” 16 Chemosphere 2085 (1987).

The Hazard of Composite End Points – More Lumpenepidemiology in the Courts

October 20th, 2018

One of the challenges of epidemiologic research is selecting the right outcome of interest to study. What seems like a simple and obvious choice can often be the most complicated aspect of the design of clinical trials or studies.1 Lurking in this choice of end point is a particular threat to validity in the use of composite end points, when the real outcome of interest is one constituent among multiple end points aggregated into the composite. There may, for instance, be strong evidence in favor of one of the constituents of the composite, but using the composite end point results to support a causal claim for a different constituent begs the question that needs to be answered, whether in science or in law.

The dangers of extrapolating from one disease outcome to another is well-recognized in the medical literature. Remarkably, however, the problem received no meaningful discussion in the Reference Manual on Scientific Evidence (3d ed. 2011). The handbook designed to help judges decide threshold issues of admissibility of expert witness opinion testimony discusses the extrapolation from sample to population, from in vitro to in vivo, from one species to another, from high to low dose, and from long to short duration of exposure. The Manual, however, has no discussion of “lumping,” or on the appropriate (and inappropriate) use of composite or combined end points.

Composite End Points

Composite end points are typically defined, perhaps circularly, as a single group of health outcomes, which group is made up of constituent or single end points. Curtis Meinert defined a composite outcome as “an event that is considered to have occurred if any of several different events or outcomes is observed.”2 Similarly, Montori defined composite end points as “outcomes that capture the number of patients experiencing one or more of several adverse events.”3 Composite end points are also sometimes referred to as combined or aggregate end points.

Many composite end points are clearly defined for a clinical trial, and the component end points are specified. In some instances, the composite nature of an outcome may be subtle or be glossed over by the study’s authors. In the realm of cardiovascular studies, for example, investigators may look at stroke as a single endpoint, without acknowledging that there are important clinical and pathophysiological differences between ischemic strokes and hemorrhagic strokes (intracerebral or subarachnoid). The Fletchers’ textbook4 on clinical epidemiology gives the example:

In a study of cardiovascular disease, for example, the primary outcomes might be the occurrence of either fatal coronary heart disease or non-fatal myocardial infarction. Composite outcomes are often used when the individual elements share a common cause and treatment. Because they comprise more outcome events than the component outcomes alone, they are more likely to show a statistical effect.”

Utility of Composite End Points

The quest for statistical “power” is often cited as a basis for using composite end points. Reduction in the number of “events,” such as myocardial infarction (MI), through improvements in medical care has led to decreased rates of MI in studies and clinical trials. These low event rates have caused power issues for clinical trialists, who have responded by turning to composite end points to capture more events. Composite end points permit smaller sample sizes and shorter follow-up times, without sacrificing power, the ability to detect a statistically significant increased rate of a prespecified size and Type I error. Increasing study power, while reducing sample size or observation time, is perhaps the most frequently cited rationale for using composite end points.

Competing Risks

Another reason sometimes offered in support of using composite end points is composites provide a strategy to avoid the problem of competing risks.5 Death (any cause) is sometimes added to a distinct clinical morbidity because patients who are taken out of the trial by death are “unavailable” to experience the morbidity outcome.

Multiple Testing

By aggregating several individual end points into a single pre-specified outcome, trialists can avoid corrections for multiple testing. Trials that seek data on multiple outcomes, or on multiple subgroups, inevitably raise concerns about the appropriate choice of the measure for the statistical test (alpha) to determine whether to reject the null hypothesis. According to some authors, “[c]omposite endpoints alleviate multiplicity concerns”:

If designated a priori as the primary outcome, the composite obviates the multiple comparisons associated with testing of the separate components. Moreover, composite outcomes usually lead to high event rates thereby increasing power or reducing sample size requirements. Not surprisingly, investigators frequently use composite endpoints.”6

Other authors have similarly acknowledged that the need to avoid false positive results from multiple testing is an important rationale for composite end points:

Because the likelihood of observing a statistically significant result by chance alone increases with the number of tests, it is important to restrict the number of tests undertaken and limit the type 1 error to preserve the overall error rate for the trial.”7

Indecision about an Appropriate Single Outcome

The International Conference on Harmonization suggests that the inability to select a single outcome variable may lead to the adoption of a composite outcome:

If a single primary variable cannot be selected …, another useful strategy is to integrate or combine the multiple measurements into a single or composite variable.”8

The “indecision” rationale has also been criticized as “generally not a good reason to use a composite end point.”9

Validity of Composite End Points

The validity of composite end points depends upon methodological assumptions, which will have to be made at the time of the study design and protocol creation. After the data are collected and analyzed, the assumptions may or may not be supported. Among the supporting assumptions about the validity of using composites are:10

  • similarity in patient importance for included component end points,

  • similarity of association size of the components, and

  • number of events across the components.

The use of composite end points can sometimes be appropriate in the “first look” at a class of diseases or disorders, with the understanding that further research will sort out and refine the associated end point. Research into the causes of human birth defects, for instance, often starts out with a look at “all major malformations,” before focusing in on specific organ and tissue systems. To some extent, the legal system, in its gatekeeping function, has recognized the dangers and invalidity of lumping in the epidemiology of birth defects.11 The Frischhertz decision, for instance, clearly acknowledged that given the clear evidence that different birth defects arise at different times, based upon interference with different embryological processes, “lumping” of end points was methodologically inappropriate. 2012 U.S. Dist. LEXIS 181507, at *8 (citing Chamber v. Exxon Corp., 81 F. Supp. 2d 661 (M.D. La. 2000), aff’d, 247 F.3d 240 (5th Cir. 2001) (unpublished)).

The Chamber decision involved a challenge to the causation opinion of frequent litigation industry witness, Peter Infante,12 who attempted to defend his opinion about benzene and chronic myelogenous leukemia, based upon epidemiology of benzene and acute myelogenous leukemia. Plaintiffs’ witnesses and counsel sought to evade the burden of producing evidence of an AML association by pointing to a study that reported “excess leukemias,” without specifying the relevant type. Chamber, 81 F. Supp. 2d at 664. The trial court, however, perspicaciously recognized the claimants’ failure to identify relevant evidence of the specific association needed to support the causal claim.

The Frischhertz and Chamber cases are hardly unique. Several state and federal courts have concurred in the context of cancer causation claims.13 In the context of birth defects litigation, the Public Affairs Committee of the Teratology Society has weighed in with strong guidance that counsels against extrapolation between different birth defects in litigation:

Determination of a causal relationship between a chemical and an outcome is specific to the outcome at issue. If an expert witness believes that a chemical causes malformation A, this belief is not evidence that the chemical causes malformation B, unless malformation B can be shown to result from malformation A. In the same sense, causation of one kind of reproductive adverse effect, such as infertility or miscarriage, is not proof of causation of a different kind of adverse effect, such as malformation.”14

The threat to validity in attributing a suggested risk for a composite end point to all included component end points is not, unfortunately, recognized by all courts. The trial court, in Ruff v. Ensign-Bickford Industries, Inc.,15 permitted plaintiffs’ expert witness to reanalyze a study by grouping together two previously distinct cancer outcomes to generate a statistically significant result. The result in Ruff is disappointing, but not uncommon. The result is also surprising, considering the guidance provided by the American Law Institute’s Restatement:

Even when satisfactory evidence of general causation exists, such evidence generally supports proof of causation only for a specific disease. The vast majority of toxic agents cause a single disease or a series of biologically-related diseases. (Of course, many different toxic agents may be combined in a single product, such as cigarettes.) When biological-mechanism evidence is available, it may permit an inference that a toxic agent caused a related disease. Otherwise, proof that an agent causes one disease is generally not probative of its capacity to cause other unrelated diseases. Thus, while there is substantial scientific evidence that asbestos causes lung cancer and mesothelioma, whether asbestos causes other cancers would require independent proof. Courts refusing to permit use of scientific studies that support general causation for diseases other than the one from which the plaintiff suffers unless there is evidence showing a common biological mechanism include Christophersen v. Allied-Signal Corp., 939 F.2d 1106, 1115-1116 (5th Cir. 1991) (applying Texas law) (epidemiologic connection between heavy-metal agents and lung cancer cannot be used as evidence that same agents caused colon cancer); Cavallo v. Star Enters., 892 F. Supp. 756 (E.D. Va. 1995), aff’d in part and rev’d in part, 100 F.3d 1150 (4th Cir. 1996); Boyles v. Am. Cyanamid Co., 796 F. Supp. 704 (E.D.N.Y. 1992). In Austin v. Kerr-McGee Ref. Corp., 25 S.W.3d 280, 290 (Tex. Ct. App. 2000), the plaintiff sought to rely on studies showing that benzene caused one type of leukemia to prove that benzene caused a different type of leukemia in her decedent. Quite sensibly, the court insisted that before plaintiff could do so, she would have to submit evidence that both types of leukemia had a common biological mechanism of development.”

Restatement (Third) of Torts § 28 cmt. c, at 406 (2010). Notwithstanding some of the Restatement’s excesses on other issues, the guidance on composites, seems sane and consonant with the scientific literature.

Role of Mechanism in Justifying Composite End Points

A composite end point may make sense when the individual end points are biologically related, and the investigators can reasonably expect that the individual end points would be affected in the same direction, and approximately to the same extent:16

Confidence in a composite end point rests partly on a belief that similar reductions in relative risk apply to all the components. Investigators should therefore construct composite endpoints in which the biology would lead us to expect similar effects across components.”

The important point, missed by some investigators and many courts, is that the assumption of similar “effects” must be tested by examining the individual component end points, and especially the end point that is the harm claimed by plaintiffs in a given case.

Methodological Issues

The acceptability of composite end points is often a delicate balance between the statistical power and efficiency gained and the reliability concerns raised by using the composite. As with any statistical or interpretative tool, the key questions turn on how the tool is used, and for what purpose. The reliability issues raised by the use of composites are likely to be highly contextual.

For instance, there is an important asymmetry between justifying the use of a composite for measuring efficacy and the use of the same composite for safety outcomes. A biological improvement in type 2 diabetes might be expected to lead to a reduction in all the macrovascular complications of that disease, but a medication for type 2 diabetes might have a very specific toxicity or drug interaction, which affects only one constituent end point among all macrovascular complications, such as myocardial infarction. The asymmetry between efficacy and safety outcomes is specifically addressed by cardiovascular epidemiologists in an important methodological paper:17

Varying definitions of composite end points, such as MACE, can lead to substantially different results and conclusions. There, the term MACE, in particular, should not be used, and when composite study end points are desired, researchers should focus separately on safety and effectiveness outcomes, and construct separate composite end points to match these different clinical goals.”

There are many clear, published statements that caution consumers of medical studies against being misled by claims based upon composite end points. Several years ago, for example, the British Medical Journal published a paper with six methodological suggestions for consumers of studies, one of which deals explicitly with composite end points:18

“Guide to avoid being misled by biased presentation and interpretation of data

1. Read only the Methods and Results sections; bypass the Discuss section

2. Read the abstract reported in evidence based secondary publications

3. Beware faulty comparators

4. Beware composite endpoints

5. Beware small treatment effects

6. Beware subgroup analyses”

The paper elaborates on the problems that arise from the use of composite end points:19

Problems in the interpretation of these trials arise when composite end points include component outcomes to which patients attribute very different importance… .”

Problems may also arise when the most important end point occurs infrequently or when the apparent effect on component end points differs.”

When the more important outcomes occur infrequently, clinicians should focus on individual outcomes rather than on composite end points. Under these circumstances, inferences about the end points (which because they occur infrequently will have very wide confidence intervals) will be weak.”

Authors generally acknowledge that “[w]hen large variations exist between components the composite end point should be abandoned.”20

Methodological Issues Concerning Causal Inferences from Composite End Points to Individual End Points

Several authors have criticized pharmaceutical companies for using composite end points to “game” their trials. Composites allow smaller sample size, but they lend themselves to broader claims for outcomes included within the composite. The same criticism applies to attempts to infer that there is risk of an individual endpoint based upon a showing of harm in the composite endpoint.

If a trial report specifies a composite endpoint, the components of the composite should be in the well-known pathophysiology of the disease. The researchers should interpret the composite endpoint in aggregate rather than as showing efficacy of the individual components. However, the components should be specified as secondary outcomes and reported beside the results of the primary analysis.”21

Virtually the entire field of epidemiology and clinical trial study has urged caution in inferring risk for a component end point from suggested risk in a composite end point:

In summary, evaluating trials that use composite outcome requires scrutiny in regard to the underlying reasons for combining endpoints and its implications and has impact on medical decision-making (see below in Sect. 47.8). Composite endpoints are credible only when the components are of similar importance and the relative effects of the intervention are similar across components (Guyatt et al. 2008a).”22

Not only do important methodologists urge caution in the interpretation of composite end points,23 they emphasize a basic point of scientific (and legal) relevancy:

[A] positive result for a composite outcome applies only to the cluster of events included in the composite and not to the individual components.”24

Even regular testifying expert witnesses for the litigation industry insist upon the “principle of full disclosure”:

The analysis of the effect of therapy on the combined end point should be accompanied by a tabulation of the effect of the therapy for each of the component end points.”25

Gatekeepers in our judicial system need to be more vigilant against bait-and-switch inferences based upon composite end points. The quest for statistical power hardly justifies larding up an end point with irrelevant data points.

1 See, e.g., Milton Packer, “Unbelievable! Electrophysiologists Embrace ‘Alternative Facts’,” MedPage (May 16, 2018) (describing clinical trialists’ abandoning pre-specified intention-to-treat analysis).

2 Curtis Meinert, Clinical Trials Dictionary (Johns Hopkins Center for Clinical Trials 1996).

3 Victor M. Montori, et al., “Validity of composite end points in clinical trials.” 300 Brit. Med. J. 594, 596 (2005).

4 R. Fletcher & S. Fletcher, Clinical Epidemiology: The Essentials at 109 (4th ed. 2005).

5 Neaton, et al., “Key issues in end point selection for heart failure trials: composite end points,” 11 J. Cardiac Failure 567, 569a (2005).

6 Schulz & Grimes, “Multiplicity in randomized trials I: endpoints and treatments,” 365 Lancet 1591, 1593a (2005).

7 Freemantle & Calvert, “Composite and surrogate outcomes in randomized controlled trials,” 334 Brit. Med. J. 756, 756a – b (2007).

8 International Conference on Harmonisation of Technical Requrements for Registration of Pharmaceuticals for Human Use; “ICH harmonized tripartite guideline: statistical principles for clinical trials,” 18 Stat. Med. 1905 (1999).

9 Neaton, et al., “Key issues in end point selection for heart failure trials: composite end points,” 11 J. Cardiac Failure 567, 569b (2005).

10 Montori, et al., “Validity of composite end points in clinical trials.” 300 Brit. Med. J. 594, 596, Summary Point No. 2 (2005).

11 SeeLumpenepidemiology” (Dec. 24, 2012), discussing Frischhertz v. SmithKline Beecham Corp., 2012 U.S. Dist. LEXIS 181507 (E.D. La. 2012).Frischhertz was decided in the same month that a New York City trial judge ruled Dr. Shira Kramer out of bounds in the commission of similarly invalid lumping, in Reeps v. BMW of North America, LLC, 2012 NY Slip Op 33030(U), N.Y.S.Ct., Index No. 100725/08 (New York Cty. Dec. 21, 2012) (York, J.), 2012 WL 6729899, aff’d on rearg., 2013 WL 2362566, aff’d, 115 A.D.3d 432, 981 N.Y.S.2d 514 (2013), aff’d sub nom. Sean R. v. BMW of North America, LLC, ___ N.E.3d ___, 2016 WL 527107 (2016). See also New York Breathes Life Into Frye Standard – Reeps v. BMW(Mar. 5, 2013).

12Infante-lizing the IARC” (May 13, 2018).

13 Knight v. Kirby Inland Marine, 363 F.Supp. 2d 859, 864 (N.D. Miss. 2005), aff’d, 482 F.3d 347 (5th Cir. 2007) (excluding opinion of B.S. Levy on Hodgkin’s disease based upon studies of other lymphomas and myelomas); Allen v. Pennsylvania Eng’g Corp., 102 F.3d 194, 198 (5th Cir. 1996) (noting that evidence suggesting a causal connection between ethylene oxide and human lymphatic cancers is not probative of a connection with brain cancer);Current v. Atochem North America, Inc., 2001 WL 36101283, at *3 (W.D. Tex. Nov. 30, 2001) (excluding expert witness opinion of Michael Gochfeld, who asserted that arsenic causes rectal cancer on the basis of studies that show association with lung and bladder cancer; Hill’s consistency factor in causal inference does not apply to cancers generally); Exxon Corp. v. Makofski, 116 S.W.3d 176, 184-85 (Tex. App. Houston 2003) (“While lumping distinct diseases together as ‘leukemia’ may yield a statistical increase as to the whole category, it does so only by ignoring proof that some types of disease have a much greater association with benzene than others.”).

14The Public Affairs Committee of the Teratology Society, “Teratology Society Public Affairs Committee Position Paper Causation in Teratology-Related Litigation,” 73 Birth Defects Research (Part A) 421, 423 (2005).

15 168 F. Supp. 2d 1271, 1284–87 (D. Utah 2001).

16 Montori, et al., “Validity of composite end points in clinical trials.” 300 Brit. Med. J. 594, 595b (2005).

17 Kevin Kip, et al., “The problem with composite end points in cardiovascular studies,” 51 J. Am. Coll. Cardiol. 701, 701 (2008) (Abstract – Conclusions) (emphasis in original).

18 Montori, et al., “Users’ guide to detecting misleading claims in clinical research reports,” 329 Brit. Med. J. 1093 (2004) (emphasis added).

19 Id. at 1094b, 1095a.

20 Montori, et al., “Validity of composite end points in clinical trials.” 300 Brit. Med. J. 594, 596 (2005).

21 Schulz & Grimes, “Multiplicity in randomized trials I: endpoints and treatments,” 365 Lancet 1591, 1595a (2005) (emphasis added). These authors acknowledge that composite end points often lack clinical relevancy, and that the gain in statistical efficiency comes at the high cost of interpretational difficulties. Id. at 1593.

22 Wolfgang Ahrens & Iris Pigeot, eds., Handbook of Epidemiology 1840 (2d ed. 2014) (47.5.8 Use of Composite Endpoints).

23 See, e.g., Stuart J. Pocock, John J.V. McMurray, and Tim J. Collier, “Statistical Controversies in Reporting of Clinical Trials: Part 2 of a 4-Part Series on Statistics for Clinical Trials,” 66 J. Am. Coll. Cardiol. 2648, 2650-51 (2015) (“Interpret composite endpoints carefully.”)(“COMPOSITE ENDPOINTS. These are commonly used in CV RCTs to combine evidence across 2 or more outcomes into a single primary endpoint. But, there is a danger of oversimplifying the evidence by putting too much emphasis on the composite, without adequate inspection of the contribution from each separate component.”); Eric Lim, Adam Brown, Adel Helmy, Shafi Mussa, and Douglas G. Altman, “Composite Outcomes in Cardiovascular Research: A Survey of Randomized Trials,” 149 Ann. Intern. Med. 612, 612, 615-16 (2008) (“Individual outcomes do not contribute equally to composite measures, so the overall estimate of effect for a composite measure cannot be assumed to apply equally to each of its individual outcomes.”) (“Therefore, readers are cautioned against assuming that the overall estimate of effect for the composite outcome can be interpreted to be the same for each individual outcome.”); Freemantle, et al., “Composite outcomes in randomized trials: Greater precision but with greater uncertainty.” 289 J. Am. Med. Ass’n 2554, 2559a (2003) (“To avoid the burying of important components of composite primary outcomes for which on their own no effect is concerned, . . . the components of a composite outcome should always be declared as secondary outcomes, and the results described alongside the result for the composite outcome.”).

24 Freemantle & Calvert, “Composite and surrogate outcomes in randomized controlled trials.” 334 Brit. Med. J. 757a (2007).

25 Lem Moyé, “Statistical Methods for Cardiovascular Researchers,” 118 Circulation Research 439, 451 (2016).

The Judicial Labyrinth for Scientific Evidence

October 3rd, 2018

The real Daedalus (not the musician), as every school child knows, was the creator of the Cretan Labyrinth, where the Minotaur resided. The Labyrinth had been the undoing of many Greeks and barbarians, until an Athenian, Theseus, took up the challenge of slaying the Minotaur. With the help of Ariadne’s thread, Theseus solved the labyrinthic puzzle and slayed the Minotaur.

Theseus and the Minotaur on 6th-century black-figure pottery (Wikimedia Commons 2005)

Dædalus is also the Journal of the American Academy of Arts and Sciences. The Academy has been, for over 230 years, addressing issues issues in both the humanities and in the sciences. In the fall 2018 issue of Dædalus (volume 147, No. 4), the Academy has published a dozen essays by noted scholars in the field, who report on the murky interface of science and law in the courtrooms of the United States. Several of the essays focus on sorry state of forensic “science” in the criminal justice system, which has been the subject of several critical official investigations, only to be dismissed and downplayed by both the Obama and Trump administrations. Other essays address the equally sorry state of judicial gatekeeping in civil actions, with some limited suggestions on how the process of scientific fact finding might be improved. In any event, this issue, Science & the Legal System,” is worth reading even if you do not agree with the diagnoses or the proposed therapies. There is still room for a collaboration between a modern day Daedalus and Ariadne to help us find the way out of this labyrinth.


Shari Seidman Diamond & Richard O. Lempert, “Introduction” (pp. 5–14)

Connecting Science and Law

Sheila Jasanoff, “Science, Common Sense & Judicial Power in U.S. Courts” (pp. 15-27)

Linda Greenhouse, “The Supreme Court & Science: A Case in Point,” (pp. 28–40)

Shari Seidman Diamond & Richard O. Lempert, “When Law Calls, Does Science Answer? A Survey of Distinguished Scientists & Engineers,” (pp. 41–60)

Accomodation or Collision: When Science and Law Meet

Jules Lobel & Huda Akil, “Law & Neuroscience: The Case of Solitary Confinement,” (pp. 61–75)

Rebecca S. Eisenberg & Robert Cook-Deegan, “Universities: The Fallen Angels of Bayh-Dole?” (pp. 76–89)

Jed S. Rakoff & Elizabeth F. Loftus, “The Intractability of Inaccurate Eyewitness Identification” (pp. 90–98)

Jennifer L. Mnookin, “The Uncertain Future of Forensic Science” (pp. 99–118)

Joseph B. Kadane and Jonathan J. Koehler, “Certainty & Uncertainty in Reporting Fingerprint Evidence” (pp. 119–134)

Communicating Science in Court

Nancy Gertner & Joseph Sanders, “Alternatives to Traditional Adversary Methods of Presenting Scientific Expertise in the Legal System” (pp. 135–151)

Daniel L. Rubinfeld & Joe S. Cecil, “Scientists as Experts Serving the Court” (pp. 152–163)

Valerie P. Hans and Michael J. Saks, “Improving Judge & Jury Evaluation of Scientific Evidence” (pp. 164–180)

Continuing the Dialogue

David Baltimore, David S. Tatel & Anne-Marie Mazza, “Bridging the Science-Law Divide” (pp. 181–194)