Confounding in the Courts

Confounding in the Lower Courts

To some extent, lower courts, especially in the federal court system, got the message: Rule 702 required them to think about the evidence, and to consider threats to validity. Institutionally, there were signs of resistance to the process. Most judges were clearly much more comfortable with proxies of validity, such as qualification, publication, peer review, and general acceptance. Unfortunately for them, the Supreme Court had spoken, and then, in 2000, the Rules Committee and Congress spoke by revising Rule 702 to require a searching review of the studies upon which challenged expert witnesses were relying. Some of the cases involving confounding of one sort or another follow.

Confounding and Statistical Significance

Some courts and counsel confuse statistical significance with confounding, and suggest that a showing of statistical significance eliminates concern over confounding. This is, as several commentators have indicated, quite wrong.1 Despite widespread criticism of this mistake in the Brock opinion, lawyers continue to repeat it. One big-firm defense lawyer, for instance, claimed that “a statistically significant confidence interval helps ensure that the findings of a particular study are not due to chance or some other confounding factors.”2
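A small arithmetic sketch may make the distinction concrete. The counts are invented for illustration and come from no study, and the names `risk_ratio_ci`, `strata`, and `crude_counts` are hypothetical. Within each level of a lurking variable, the exposure has no effect at all, yet the crude analysis yields a risk ratio well above 1.0 with a 95 percent confidence interval that excludes 1.0.

```python
import math

def risk_ratio_ci(cases_exp, n_exp, cases_unexp, n_unexp, z=1.96):
    """Risk ratio with a Wald-type confidence interval on the log scale."""
    rr = (cases_exp / n_exp) / (cases_unexp / n_unexp)
    se_log = math.sqrt(1 / cases_exp - 1 / n_exp + 1 / cases_unexp - 1 / n_unexp)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Hypothetical cohort, stratified by a lurking variable (assumed counts).
# Each tuple: (cases among exposed, exposed, cases among unexposed, unexposed)
strata = {
    "confounder present": (800, 8000, 200, 2000),  # risk is 10% in both groups
    "confounder absent": (40, 2000, 160, 8000),    # risk is 2% in both groups
}

for name, counts in strata.items():
    rr, lower, upper = risk_ratio_ci(*counts)
    print(f"{name}: RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")

# Collapse the strata, as a crude analysis that ignores the confounder would.
crude_counts = tuple(sum(column) for column in zip(*strata.values()))
crude_rr, crude_lower, crude_upper = risk_ratio_ci(*crude_counts)
print(f"crude: RR = {crude_rr:.2f} (95% CI {crude_lower:.2f} to {crude_upper:.2f})")
```

On these assumed counts, each stratum shows a risk ratio of 1.0, while the crude risk ratio is about 2.3 with an interval of roughly 2.1 to 2.6. The interval speaks only to random error; the entire "statistically significant" association is produced by the uneven distribution of the lurking variable.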

Confounding and “Effect Size”

Study “effect size” has a role in evaluating potential invalidity due to confounding, but that role is frequently more nuanced than courts acknowledge. The phrase “effect size,” of course, is misleading in that it is used to refer to the magnitude of an association, which may or may not be causal. This is one among many instances of sloppy terminology in statistical and epidemiologic science. Nonetheless, the magnitude of the relative risk may play a role in evaluating observational analytical epidemiologic studies for their ability to support a causal inference.

Small Effect Size

If the so-called effect size is low, say about 2.0 or less, then actual, potential, or residual confounding (or bias) may well account for the entirety of the association.3 Many other well-known authors have concurred, with some setting the bar considerably higher, asking for risk ratios of three or more before accepting that a “clear-cut” association has been shown, unthreatened by confounding.4
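The arithmetic behind this concern can be sketched with a standard textbook formula for confounding by a single binary factor. Nothing below is drawn from the cited sources; the function name `spurious_rr` and all input values are hypothetical. The sketch assumes that the exposure has no effect whatsoever, and asks what apparent risk ratio a confounder would produce, given the confounder's own risk ratio for the disease and its prevalence among exposed and unexposed persons.

```python
def spurious_rr(rr_confounder, p_exposed, p_unexposed):
    """Apparent exposure-disease risk ratio produced by one binary confounder
    when the exposure itself has no effect (true risk ratio of 1.0)."""
    numerator = 1 + p_exposed * (rr_confounder - 1)
    denominator = 1 + p_unexposed * (rr_confounder - 1)
    return numerator / denominator

# Hypothetical scenarios: (confounder's own risk ratio, prevalence among
# the exposed, prevalence among the unexposed). All values are assumptions.
scenarios = [
    (3.0, 0.60, 0.30),
    (5.0, 0.50, 0.20),
    (10.0, 0.40, 0.10),
]

for rr_c, p1, p0 in scenarios:
    print(f"confounder RR {rr_c:>4}: apparent RR = {spurious_rr(rr_c, p1, p0):.2f}")
```

Under these assumed inputs, the apparent risk ratios run from about 1.4 to about 2.4, with no true effect at all. A single uncontrolled factor of moderate strength and moderately uneven distribution is thus enough to manufacture an association in the range that the commentators describe as weak.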

Large Effect Size

Some courts have acknowledged that a strong association, with a high relative risk (without committing to what is “high”), increases the likelihood of a causal relationship, even while proceeding to ignore the effects of confounding.5 The Reference Manual suggests that a large effect size, such as for smoking and lung cancer (greater than ten-fold, and often higher than 30-fold), eliminates the need to worry about confounding:

“Many confounders have been proposed to explain the association between smoking and lung cancer, but careful epidemiological studies have ruled them out, one after the other.”6

*  *  *  *  *  *

“A relative risk of 10, as seen with smoking and lung cancer, is so high that it is extremely difficult to imagine any bias or confounding factor that might account for it. The higher the relative risk, the stronger the association and the lower the chance that the effect is spurious. Although lower relative risks can reflect causality, the epidemiologist will scrutinize such associations more closely because there is a greater chance that they are the result of uncontrolled confounding or biases.”7

The point about “difficult to imagine” is fair enough in the context of smoking and lung cancer, but that is because no other putative confounder presents such a high relative risk in most studies. In studying other epidemiologic associations of high magnitude, the absence of competing risk or correlation from lurking variables would need to be independently shown, rather than assumed from the “case study” of smoking and lung cancer.
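One way to put a number on “difficult to imagine” is the E-value proposed by VanderWeele and Ding in 2017, a sensitivity measure that none of the sources quoted here uses; it is offered only as an illustrative sketch. For an observed risk ratio, the E-value is the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need to have with both the exposure and the outcome to explain the observed association away entirely. The function name `e_value` and the list of observed risk ratios are hypothetical.

```python
import math

def e_value(rr):
    """E-value for an observed risk ratio: rr + sqrt(rr * (rr - 1)).
    Risk ratios below 1.0 are inverted first, so the result is symmetric."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))

# Illustrative observed risk ratios, from weak to very strong associations.
for observed in (1.5, 2.0, 3.0, 10.0, 30.0):
    print(f"observed RR {observed:>4}: confounder would need RR of at least {e_value(observed):.1f}")
```

On this measure, a risk ratio of 1.5 could be explained away by a confounder associated with exposure and outcome at a risk ratio of roughly 2.4, whereas a risk ratio of 10 would require a confounder at roughly 19.5. The calculation shows only how strong a confounder would have to be; whether such a confounder exists for a given exposure and outcome remains an empirical question for each association.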

Regression and Other Statistical Analyses

The failure to include a lurking or confounding variable may render a regression analysis invalid and meaningless. The Supreme Court, however, in Bazemore, a case decided before its own decision in Daubert, and before Rule 702 was statutorily modified,8 issued a Supreme ipse dixit, to hold that the selection or omission of variables in multiple regression raises an issue that affects the weight of the analysis:

“Normally, failure to include variables will affect the analysis’ probativeness, not its admissibility.”9

The Supreme Court did, however, acknowledge in Bazemore that:

“There may, of course, be some regressions so incomplete as to be inadmissible as irrelevant; but such was clearly not the case here.”10

The footnote in Bazemore is telling; the majority could imagine or hypothesize a multiple regression so incomplete that it would be irrelevant, but it never thought to ask whether a relevant regression could be so incomplete as to be unreliable or invalid. The invalidity of the regression in Bazemore does not appear to have been raised as an evidentiary issue under Rule 702. None of the briefs in the Supreme Court or the judicial opinions cited or discussed Rule 702.
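How an omitted variable can render a regression not merely incomplete but misleading can be shown with a toy example. The data are invented and are not the Bazemore data; the variable names (`group`, `experience`, `salary`) and helper functions are hypothetical. Salary is constructed to depend only on years of experience, with no group effect, but the two groups differ in average experience.

```python
def centered(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def cross(a, b):
    """Sum of products of two centered series."""
    return sum(x * y for x, y in zip(a, b))

def simple_slope(x, y):
    """OLS slope of y on a single predictor x (with intercept)."""
    xc, yc = centered(x), centered(y)
    return cross(xc, yc) / cross(xc, xc)

def two_predictor_slopes(x1, x2, y):
    """OLS slopes of y on x1 and x2 (with intercept), via the normal equations."""
    a, b, yc = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = cross(a, a), cross(b, b), cross(a, b)
    s1y, s2y = cross(a, yc), cross(b, yc)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Hypothetical payroll: salary (in $1,000s) is 30 plus 2 per year of experience.
group = [0, 0, 0, 0, 1, 1, 1, 1]
experience = [6, 8, 10, 12, 2, 4, 6, 8]
salary = [30 + 2 * years for years in experience]

gap_omitting_experience = simple_slope(group, salary)
gap_with_experience, per_year = two_predictor_slopes(group, experience, salary)

print(f"group coefficient, experience omitted:  {gap_omitting_experience:.1f}")
print(f"group coefficient, experience included: {gap_with_experience:.1f}")
print(f"experience coefficient:                 {per_year:.1f}")
```

With experience omitted, the regression attributes a gap of 8 (thousand dollars) to group membership; with experience included, the group coefficient is zero. The omitted variable here does not just reduce the precision of the estimate; it produces an estimate of an effect that, by construction, does not exist. Omitted variables can of course bias a coefficient in either direction, and can mask a real disparity as readily as they can create a spurious one.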

Despite the inappropriateness of considering the Bazemore precedent after the Court decided Daubert, many lower court decisions have treated Bazemore as dispositive of reliability challenges to regression analyses, without any meaningful discussion.11 In the last several years, however, the appellate courts have awakened on occasion to their responsibilities to ensure that opinions of statistical expert witnesses, based upon regression analyses, are evaluated through the lens of Rule 702.12


1 Brock v. Merrell Dow Pharmaceuticals, Inc., 874 F.2d 307, 311-12 (5th Cir. 1989) (“Fortunately, we do not have to resolve any of the above questions [as to bias and confounding], since the studies presented to us incorporate the possibility of these factors by the use of a confidence interval.”). See, e.g., David Kaye, David Bernstein, and Jennifer Mnookin, The New Wigmore – A Treatise on Evidence: Expert Evidence § 12.6.4, at 546 (2d ed. 2011); Michael O. Finkelstein, Basic Concepts of Probability and Statistics in the Law 86-87 (2009) (criticizing the blatantly incorrect interpretation of confidence intervals by the Brock court).

2 Zach Hughes, “The Legal Significance of Statistical Significance,” 28 Westlaw Journal: Pharmaceutical 1, 2 (Mar. 2012).

3 See Norman E. Breslow & N. E. Day, “Statistical Methods in Cancer Research,” in The Analysis of Case-Control Studies 36 (IARC Pub. No. 32, 1980) (“[r]elative risks of less than 2.0 may readily reflect some unperceived bias or confounding factor”); David A. Freedman & Philip B. Stark, “The Swine Flu Vaccine and Guillain-Barré Syndrome: A Case Study in Relative Risk and Specific Causation,” 64 Law & Contemp. Probs. 49, 61 (2001) (“If the relative risk is near 2.0, problems of bias and confounding in the underlying epidemiologic studies may be serious, perhaps intractable.”).

4 See, e.g., Richard Doll & Richard Peto, The Causes of Cancer 1219 (1981) (“when relative risk lies between 1 and 2 … problems of interpretation may become acute, and it may be extremely difficult to disentangle the various contributions of biased information, confounding of two or more factors, and cause and effect.”); Ernst L. Wynder & Geoffrey C. Kabat, “Environmental Tobacco Smoke and Lung Cancer: A Critical Assessment,” in H. Kasuga, ed., Indoor Air Quality 5, 6 (1990) (“An association is generally considered weak if the odds ratio is under 3.0 and particularly when it is under 2.0, as is the case in the relationship of ETS and lung cancer. If the observed relative risk is small, it is important to determine whether the effect could be due to biased selection of subjects, confounding, biased reporting, or anomalies of particular subgroups.”); David A. Grimes & Kenneth F. Schulz, “False alarms and pseudo-epidemics: the limitations of observational epidemiology,” 120 Obstet. & Gynecol. 920 (2012) (“Most reported associations in observational clinical research are false, and the minority of associations that are true are often exaggerated. This credibility problem has many causes, including the failure of authors, reviewers, and editors to recognize the inherent limitations of these studies. This issue is especially problematic for weak associations, variably defined as relative risks (RRs) or odds ratios (ORs) less than 4.”); Ernst L. Wynder, “Epidemiological issues in weak associations,” 19 Internat’l J. Epidemiol. S5 (1990); Straus S, Richardson W, Glasziou P, Haynes R., Evidence-Based Medicine. How to Teach and Practice EBM (3d ed. 2005); Samuel Shapiro, “Bias in the evaluation of low-magnitude associations: an empirical perspective,” 151 Am. J. Epidemiol. 939 (2000); Samuel Shapiro, “Looking to the 21st century: have we learned from our mistakes, or are we doomed to compound them?” 13 Pharmacoepidemiol. & Drug Safety 257 (2004); Muin J. Khoury, Levy M. James, W. Dana Flanders, and David J. Erickson, “Interpretation of recurring weak associations obtained from epidemiologic studies of suspected human teratogens,” 46 Teratology 69 (1992); Mark Parascandola, Douglas L Weed & Abhijit Dasgupta, “Two Surgeon General’s reports on smoking and cancer: a historical investigation of the practice of causal inference,” 3 Emerging Themes in Epidemiol. 1 (2006); David Sackett, R. Haynes, Gordon Guyatt, and Peter Tugwell, Clinical Epidemiology: A Basic Science for Clinical Medicine (2d ed. 1991); Gary Taubes, “Epidemiology Faces Its Limits,” 269 Science 164, 168 (July 14, 1995) (quoting Marcia Angell, former editor of the New England Journal of Medicine, as stating that “[a]s a general rule of thumb, we are looking for a relative risk of 3 or more [before accepting a paper for publication], particularly if it is biologically implausible or if it’s a brand new finding.”) (quoting John C. Bailar: “If you see a 10-fold relative risk and it’s replicated and it’s a good study with biological backup, like we have with cigarettes and lung cancer, you can draw a strong inference. * * * If it’s a 1.5 relative risk, and it’s only one study and even a very good one, you scratch your chin and say maybe.”); Lynn Rosenberg, “Induced Abortion and Breast Cancer: More Scientific Data Are Needed,” 86 J. Nat’l Cancer Instit. 1569, 1569 (1994) (“A typical difference in risk (50%) is small in epidemiologic terms and severely challenges our ability to distinguish if it reflects cause and effect or if it simply reflects bias.”) (commenting upon Janet R. Daling, K. E. Malone, L. F. Voigt, E. White, and Noel S. Weiss, “Risk of breast cancer among young women: relationship to induced abortion,” 86 J. Nat’l Cancer Instit. 1584 (1994)); Linda Anderson, “Abortion and possible risk for breast cancer: analysis and inconsistencies,” (Wash. D.C., Nat’l Cancer Institute, Oct. 26, 1994) (“In epidemiologic research, relative risks of less than 2 are considered small and are usually difficult to interpret. Such increases may be due to chance, statistical bias, or effects of confounding factors that are sometimes not evident.”); Washington Post (Oct 27, 1994) (quoting Dr. Eugenia Calle, Director of Analytic Epidemiology for the American Cancer Society: “Epidemiological studies, in general are probably not able, realistically, to identify with any confidence any relative risks lower than 1.3 (that is a 30% increase in risk) in that context, the 1.5 [reported relative risk of developing breast cancer after abortion] is a modest elevation compared to some other risk factors that we know cause disease.”). See also “General Causation and Epidemiologic Measures of Risk Size” (Nov. 24, 2012). Even expert witnesses for the litigation industry have agreed that small risk ratios (under two) are questionable for potential and residual confounding. David F. Goldsmith & Susan G. Rose, “Establishing Causation with Epidemiology,” in Tee L. Guidotti & Susan G. Rose, eds., Science on the Witness Stand: Evaluating Scientific Evidence in Law, Adjudication, and Policy 57, 60 (2001) (“There is no clear consensus in the epidemiology community regarding what constitutes a ‘strong’ relative risk, although, at a minimum, it is likely to be one where the RR is greater than two; i.e., one in which the risk among the exposed is at least twice as great as among the unexposed.”).

5 See King v. Burlington Northern Santa Fe Railway Co., 762 N.W.2d 24, 40 (Neb. 2009) (“the higher the relative risk, the greater the likelihood that the relationship is causal”).

6 RMSE3d at 219.

7 RMSE3d at 602. See Landrigan v. Celotex Corp., 127 N.J. 404, 605 A.2d 1079, 1086 (1992) (“The relative risk of lung cancer in cigarette smokers as compared to nonsmokers is on the order of 10:1, whereas the relative risk of pancreatic cancer is about 2:1. The difference suggests that cigarette smoking is more likely to be a causal factor for lung cancer than for pancreatic cancer.”).

8 See Federal Rule of Evidence 702 (Pub. L. 93–595, § 1, Jan. 2, 1975, 88 Stat. 1937; Apr. 17, 2000, eff. Dec. 1, 2000; Apr. 26, 2011, eff. Dec. 1, 2011).

9 Bazemore v. Friday, 478 U.S. 385, 400 (1986) (reversing the Court of Appeals’ decision that would have disallowed a multiple regression analysis that omitted important variables).

10 Id. at 400 n. 10.

11 See, e.g., Manpower, Inc. v. Insurance Company of the State of Pennsylvania, 732 F.3d 796, 799 (7th Cir. 2013) (“the Supreme Court and this Circuit have confirmed on a number of occasions that the selection of the variables to include in a regression analysis is normally a question that goes to the probative weight of the analysis rather than to its admissibility.”); Cullen v. Indiana Univ. Bd. of Trustees, 338 F.3d 693, 701‐02 & n.4 (7th Cir. 2003) (citing Bazemore in rejecting challenge to expert witness’s omission of variables in regression analysis); In re High Fructose Corn Syrup Antitrust Litigation, 295 F.3d 651, 660‐61 (7th Cir. 2002) (refusing to exclude expert witness opinion testimony based upon regression analyses, flawed by omission of key variables); Adams v. Ameritech Servs., Inc., 231 F.3d 414, 423 (7th Cir. 2000) (relying upon Bazemore to affirm statistical analysis based upon correlation with no regression analysis). See also “The Seventh Circuit Regresses on Rule 702” (Oct. 29, 2013).

12 See, e.g., ATA Airlines, Inc. v. Fed. Express Corp., 665 F.3d 882, 888–89 (7th Cir. 2011) (Posner, J.) (reversing on grounds that plaintiff’s regression analysis should never have been admitted), cert. denied, 2012 WL 189940 (Oct. 7, 2012); Zenith Elec. Corp. v. WH-TV Broad. Corp., 395 F.3d 416 (7th Cir.) (affirming exclusion of expert witness opinion whose extrapolations were mere “ipse dixit”), cert. denied, 125 S. Ct. 2978 (2005); Sheehan v. Daily Racing Form, Inc., 104 F.3d 940 (7th Cir. 1997) (Posner, J.) (discussing specification error). See also Munoz v. Orr, 200 F.3d 291 (5th Cir. 2000). For a more enlightened and educated view of regression and the scope and application of Rule 702, from another Seventh Circuit panel, Judge Posner’s decision in ATA Airlines, supra, is a good starting place. See “Judge Posner’s Digression on Regression” (April 6, 2012).