The gatekeeper’s door really must swing both ways on causal analysis. For decades, the courts allowed almost anything, as long as the speaker was “an expert witness” who uttered the magic words “reasonable medical certainty.” For the most part, this willingness to tolerate all sorts of nonsense favored plaintiffs. In the backlash against this libertine judicial approach, some courts, such as those in Texas, have embraced a principle that unfairly favors defendants. Abridgment of scientific method and reasoning is offensive regardless of who is being favored.
The Texas courts have adopted a rule that plaintiffs must offer a statistically significant study, with a risk ratio (RR) greater than two, to show general causation. An RR ≤ 2 can be a strong practical argument against specific causation in many cases. See Courts and Commentators on Relative Risks to Infer Specific Causation; Relative Risks and Individual Causal Attribution; and Risk and Causation in the Law. But an RR > 2 threshold has little, in theory, to do with general causation. There are any number of well-established causal relationships in which the magnitude of the ex ante risk in an exposed population is greater than 1 but no greater than 2; the magnitude of cardiovascular disease risk from smoking is one well-known example. As I noted in “Confusion Over Causation in Texas” (Aug. 27, 2011), the Texas Supreme Court managed to confuse general and specific causation concepts in its decision in Merck & Co. v. Garza, 347 S.W.3d 256 (Tex. 2011).
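The arithmetic behind the specific-causation heuristic is worth making explicit. Under the standard (and contestable) assumption that the study’s risk estimate fairly applies to the individual plaintiff, the attributable fraction among the exposed follows directly from the risk ratio:

```latex
% Attributable fraction among the exposed, as a function of RR:
\[
  AF_{\text{exposed}} \;=\; \frac{RR - 1}{RR},
  \qquad
  AF_{\text{exposed}} > \tfrac{1}{2} \;\iff\; RR > 2 .
\]
```

Only when the RR exceeds 2 does the exposure account for more than half of the cases among the exposed, which is the usual “more likely than not” reading. Nothing in that identity, however, speaks to whether the association is causal in the first place, which is the general-causation question.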
Still, the search for an RR threshold for general causation does have some basis in the practice of epidemiology. When assessing general causation from only observational epidemiologic studies, where residual confounding and bias may be lurking, it is prudent to require an RR > 2, as a measure of strength of the association that can help rule out the role of systematic error. As the cardiovascular disease/smoking example illustrates, however, there is clearly no scientific requirement that the RR be greater than 2 to establish general causation. Courts should recognize that there are spurious associations with RR >> 2, and true, causal associations with RR < 2. Much will depend upon the number of studies, and the potential for bias or confounding in the body of evidence. If the other important Bradford Hill factors are present – dose-response, consistency, coherence, etc. – then risk ratios ≤ 2, from observational studies, may suffice to show general causation. A requirement of RR > 2 thus makes little sense as a criterion for general causation; at best, it is a much weaker consideration for general causation than it is for specific causation.
Randomization and double blinding are major steps in controlling confounding and bias, but they are not guarantees that systematic bias has been eliminated. Similarly, despite the confusion and errors of lawyers and judges, statistical significance does not address bias or confounding. See, e.g., Zach Hughes, “The Legal Significance of Statistical Significance,” 28 Westlaw Journal: Pharmaceutical 1, 2 (Mar. 2012) (erroneously describing the meaning and function of significance testing; “Stated simply, a statistically significant confidence interval helps ensure that the findings of a particular study are not due to chance or some other confounding factors.”).
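The point is easy to demonstrate with a toy simulation, sketched below with invented prevalence figures: an exposure with no causal effect whatever can still yield a highly “significant” risk ratio when a lurking confounder drives both exposure and disease.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical confounder (think smoking) that raises both the
# probability of exposure and the probability of disease; the
# exposure itself has no causal effect on disease at all.
conf = rng.random(n) < 0.3
exposed = rng.random(n) < np.where(conf, 0.6, 0.2)
disease = rng.random(n) < np.where(conf, 0.10, 0.02)

a = int(np.sum(exposed & disease))    # exposed, diseased
b = int(np.sum(exposed & ~disease))   # exposed, healthy
c = int(np.sum(~exposed & disease))   # unexposed, diseased
d = int(np.sum(~exposed & ~disease))  # unexposed, healthy

rr = (a / (a + b)) / (c / (c + d))
se = (1/a - 1/(a + b) + 1/c - 1/(c + d)) ** 0.5   # SE of ln(RR)
p = 2 * norm.sf(abs(np.log(rr)) / se)

print(f"crude RR = {rr:.2f}, two-sided p = {p:.2g}")
# Prints a 'statistically significant' RR near 1.9 -- an artifact
# of the confounder, to which the p-value is blind.
```

The p-value here is vanishingly small, yet the association is entirely spurious; only knowledge of the confounder, not any significance test, reveals the problem.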
A double-blinded, placebo-controlled, randomized clinical trial (RCT) will usually afford less opportunity for bias and confounding to play a role. Imposing an RR > 2 requirement for general causation thus makes even less sense when the risk ratios come from RCTs. The Garza Court, however, went a dictum too far by describing RR > 2 as a requirement that applied to general causation:
“Havner holds, and we reiterate, that when parties attempt to prove general causation using epidemiological evidence, a threshold requirement of reliability is that the evidence demonstrate a statistically significant doubling of the risk. In addition, Havner requires that a plaintiff show ‘that he or she is similar to [the subjects] in the studies’ and that ‘other plausible causes of the injury or condition that could be negated [are excluded] with reasonable certainty’.”
347 S.W.3d at 265 (quoting Merrell Dow Pharmaceuticals, Inc. v. Havner, 953 S.W.2d 706, 720 (Tex. 1997)). See Merck’s Appellant’s Brief to the Texas Court of Appeals at 16-17 (July 16, 2007) (citing the Havner case as providing a “rational basis for inferring causation”; “To prove general causation, the Garzas were required to introduce at least two statistically significant scientific studies showing that Vioxx at the same dose and duration as taken by Mr. Garza more than doubled the risk of heart attack. Havner, 953 S.W.2d at 718-23, 727.”).
Imposing RR > 2 as a requirement for general causation, in the context of risk ratios from clinical trials, was particularly unwarranted. If general causation were truly the issue, it would be hard to see why the dose and duration used in the study had to match those experienced by the specific plaintiff. General causation was not the dispositive issue in Garza, and so this language should be treated as dictum. The confusion between general and specific causation is unfortunate.
What is the source of the Garza court’s notion about RR and general causation? One popular article from Science, in the 1990s, gave some credence to the notion of a minimal RR for general causation. Gary Taubes, “Epidemiology Faces Its Limits,” 269 Science 164 (July 14, 1995) [cited as Taubes]. Taubes collected quotes (or sound bites) from various authors about the relevance of the magnitude of observed associations. For instance, Taubes quoted Marcia Angell, a former editor of the New England Journal of Medicine, as articulating a general rule:
“As a general rule of thumb, we are looking for a relative risk of 3 or more [before accepting a paper for publication], particularly if it is biologically implausible or if it’s a brand new finding.”
Taubes at 168. John Bailar, a professor emeritus at the University of Chicago, was quoted by Taubes as rejecting any reliable dividing line, thus taking a more nuanced approach:
“If you see a 10-fold relative risk and it’s replicated and it’s a good study with biological backup, like we have with cigarettes and lung cancer, you can draw a strong inference. * * * If it’s a 1.5 relative risk, and it’s only one study and even a very good one, you scratch your chin and say maybe.”
Taubes at 168. Taubes described Harvard epidemiologist Dimitrios Trichopoulos as suggesting that a study should show a four-fold increased risk, and the late Sir Richard Doll of Oxford University as suggesting that a single epidemiologic study would not be persuasive unless the lower limit of its 95% confidence interval excluded 3.0. Id.
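For readers who want the mechanics behind Doll’s suggested test, the usual large-sample 95% interval for a risk ratio is computed on the log scale. A minimal sketch, with invented cohort counts:

```python
import math

# Hypothetical 2x2 cohort counts: a/b = diseased/healthy among
# the exposed; c/d = diseased/healthy among the unexposed.
a, b, c, d = 60, 940, 15, 985

rr = (a / (a + b)) / (c / (c + d))                  # risk ratio
se = math.sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))   # SE of ln(RR)
lower = math.exp(math.log(rr) - 1.96 * se)
upper = math.exp(math.log(rr) + 1.96 * se)

print(f"RR = {rr:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
# -> RR = 4.00, 95% CI (2.29, 7.00)
```

On these made-up numbers, even a four-fold risk carries a lower confidence bound below 3.0, so a single study of this size would fail Doll’s suggested criterion despite satisfying Trichopoulos’s.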
Even if Taubes’ quotes are accurate, there is a risk that they were stripped of important nuance provided by the scientists he interviewed. There are, however, other, more credible sources for scientists who have insisted on treating the size of an RR as a consideration in evaluating the causality of an association, especially for observational studies. For example, Breslow and Day, two respected cancer researchers, noted in a publication of the World Health Organization that
“[r]elative risks of less than 2.0 may readily reflect some unperceived bias or confounding factor, those over 5.0 are unlikely to do so.”
Norman E. Breslow & Nicholas E. Day, Statistical Methods in Cancer Research. Volume I – The Analysis of Case-Control Studies at 36 (Lyon, International Agency for Research on Cancer Scientific Publications No. 32, 1980). The caveat makes sense, but it clearly was never intended to be some sort of bright-line rule for people too lazy to look at the actual studies and data. Unfortunately, not all epidemiologists are as capable as Breslow and Day, and there are plenty of examples of spurious RRs > 5 arising from biased or confounded studies.
Sir Richard Doll and Sir Richard Peto expressed a similarly skeptical view of RRs less than 2 in assessing the causality of associations:
“when relative risk lies between 1 and 2 … problems of interpretation may become acute, and it may be extremely difficult to disentangle the various contributions of biased information, confounding of two or more factors, and cause and effect.”
Richard Doll & Richard Peto, The Causes of Cancer 1219 (Oxford Univ. Press 1981).
More recently, plaintiffs’ testifying expert witness David Goldsmith expressed the view that an RR > 2 marks the minimum for a “strong” RR, the sort of association that is a likely candidate for causality. David F. Goldsmith & Susan G. Rose, “Establishing Causation with Epidemiology,” in Tee L. Guidotti & Susan G. Rose, eds., Science on the Witness Stand: Evaluating Scientific Evidence in Law, Adjudication, and Policy 57, 60 (OEM Press 2001) (“There is no clear consensus in the epidemiology community regarding what constitutes a ‘strong’ relative risk, although, at a minimum, it is likely to be one where the RR is greater than two; i.e., one in which the risk among the exposed is at least twice as great as among the unexposed.”); see also Ernst L. Wynder & Geoffrey C. Kabat, “Environmental Tobacco Smoke and Lung Cancer: A Critical Assessment,” in H. Kasuga, ed., Indoor Air Quality 5, 6 (Berlin: Springer Verlag, 1990) (“An association is generally considered weak if the odds ratio is under 3.0 and particularly when it is under 2.0, as is the case in the relationship of ETS and lung cancer. If the observed relative risk is small, it is important to determine whether the effect could be due to biased selection of subjects, confounding, biased reporting, or anomalies of particular subgroups.”).
In the 1990s, Dr. Janet Daling and her colleagues published an observational epidemiologic study on whether abortion was related to later breast cancer. Janet R. Daling, K.E. Malone, L.F. Voigt, E. White & Noel S. Weiss, “Risk of breast cancer among young women: relationship to induced abortion,” 86 J. Nat’l Cancer Instit. 1584 (1994). Several scientists, concerned that Dr. Daling’s findings would be distorted by religious propagandists, wrote that the small RRs in the Daling study could not support a causal interpretation of the data. In an editorial that accompanied the article, Dr. Lynn Rosenberg, of the Boston University School of Medicine, wrote:
“A typical difference in risk (50%) is small in epidemiologic terms and severely challenges our ability to distinguish if it reflects cause and effect or if it simply reflects bias.”
Lynn Rosenberg, “Induced Abortion and Breast Cancer: More Scientific Data Are Needed,” 86 J. Nat’l Cancer Instit. 1569, 1569 (1994). Rosenberg’s caution was picked up and repeated in an official statement of the National Cancer Institute (NCI). Linda Anderson of the NCI Press Office issued a press release to stifle fears raised by Dr. Daling’s abortion research:
“In epidemiologic research, relative risks of less than 2 are considered small and are usually difficult to interpret. Such increases may be due to chance, statistical bias, or effects of confounding factors that are sometimes not evident.”
Linda Anderson, “Abortion and possible risk for breast cancer: analysis and inconsistencies” (Wash. D.C., NCI Oct. 26, 1994). In the lay media, an American Cancer Society epidemiologist was quoted in reference to the Daling study:
“Epidemiological studies, in general are probably not able, realistically, to identify with any confidence any relative risks lower than 1.3 (that is a 30% increase in risk) in that context, the 1.5 [reported relative risk of developing breast cancer after abortion] is a modest elevation compared to some other risk factors that we know cause disease.”
Washington Post (Oct. 27, 1994) (Dr. Eugenia Calle, Director of Analytic Epidemiology for the ACS).
Not surprisingly, tobacco companies, embattled by claims of cancer from environmental tobacco smoke (ETS), cried political correctness when the NCI and the ACS announced a skeptical view of whether RRs between 1 and 2 could show a causal relationship between abortion and breast cancer, while endorsing a low RR as real in the case of ETS and lung cancer.
What the tobacconists missed, however, was that Daling’s association was a relatively novel finding. Subsequent studies failed to corroborate the association, which now lives on only through the efforts of theocratic regimes in some of the United States. The NCI’s reaction to the Daling study was in line with the quotes from Taubes’ article, above.
Recently, two epidemiologists reviewed the issue of minimal reliable risk, and concluded:
“There is no single number for a minimal reliable risk that pertains to all studies.”
Mark J. Nicolich & John F. Gamble, “What is the Minimum Risk that can be Estimated from an Epidemiology Study?,” in Anca Moldoveanu, ed., Advanced Topics in Environmental Health and Air Pollution Case Studies, at § 4.1.1, Point 1 (2011). Of course, this pronouncement by Nicolich and Gamble is precisely the sort of call for sound judgment that lawyers fear, because it requires engagement with the studies, their methods, and their data. The potential for bias and confounding is not constant across all studies; it varies with the nature of the exposure and the outcome under investigation, the design of the study, and the myriad particulars of the studies involved. As Nicolich and Gamble explained:
“Theoretically, there is no relative risk that is too small to be estimated. The relative risk is a construct or a concept, not a physical reality. Since it is a mathematically defined concept it can be mathematically estimated to any degree of precision. However, we have shown in this paper that (1) there are many assumptions that must be met to make certain that the RR estimate is accurate and precise; and (2) the significance level or uncertainty associated with the RR estimate has its own set of assumptions that must be met. So, while there may be no theoretical minimum RR that can be estimated, in practice there is a minimum risk and [it] varies depending on uncertainties present in the context of each study.
An analogy in the physical world of estimating a RR is to measure the length of an object. A meterstick is precise enough to determine the width of a table to see if it will fit through a doorway, but a meterstick is not precise enough to measure the diameter of a shaft in an automobile engine with a tolerance of ±1.0 mm. To measure the shaft diameter one would use a micrometer. The micrometer while sufficiently precise to measure the shaft is not adequate to determine the size of a dust mite, usually in the range of 200 to 300 μm. The analogy can be carried through to the size of molecules, to the wavelength of visible light, and to the diameter of an electron. The conclusion is that while all the tasks involve measuring length and there is no practical ‘minimum length’, different tools and considerations are needed depending on the object to be measured and the precision required.”
Id. at 21.
Nicolich and Gamble concluded:

“We agree with Wynder (1987) that epidemiology is able to correctly interpret relatively small relative risks, but only if the best epidemiological methodology is applied and only if the data are fully evaluated by examining all judgment criteria, especially those of biological plausibility. As RRs become smaller, the need for close adherence to these basic principles becomes greater. If these ideas are applied, a conclusion of no risk should reassure society. And when a risk is reported as positive, appropriate preventive measures to reduce avoidable illness can be used to successfully reach the ultimate goal of epidemiology and preventive medicine.”
Id. at 22.
Nicolich and Gamble probably provide more nuance than most courts want, but it is what scientists, policy makers, and lawyers need to hear. Simplistic rules, such as a requirement of two statistically significant studies with RR > 2, do not enhance the credibility of judicial judgments. Such a requirement is both over- and under-inclusive; it screens out real causal associations while letting stand spurious associations that are almost certainly the product of bias or confounding.