Take all the evidence, throw it into the hopper, close your eyes, open your heart, and guess the weight. You could be a lucky winner! The weight of the evidence suggests that the weight-of-the-evidence (WOE) method is little more than subjective opinion, but why care if it helps you to get to a verdict?
The scientific community has never been seriously impressed by the so-called weight of the evidence (WOE) approach to determining causality. The phrase is vague and ambiguous; its use, inconsistent. See, e.g., V. H. Dale, G.R. Biddinger, M.C. Newman, J.T. Oris, G.W. Suter II, T. Thompson, et al., “Enhancing the ecological risk assessment process,” 4 Integrated Envt’l Assess. Management 306 (2008)(“An approach to interpreting lines of evidence and weight of evidence is critically needed for complex assessments, and it would be useful to develop case studies and/or standards of practice for interpreting lines of evidence.”); Igor Linkov, Drew Loney, Susan M. Cormier, F. Kyle Satterstrom, Todd Bridges, “Weight-of-evidence evaluation in environmental assessment: review of qualitative and quantitative approaches,” 407 Science of the Total Environment 5199–205 (2009); Douglas L. Weed, “Weight of Evidence: A Review of Concept and Methods,” 25 Risk Analysis 1545 (2005)(noting the vague, ambiguous, indefinite nature of the concept of “weight of evidence” review); R.G. Stahl Jr., “Issues addressed and unaddressed in EPA’s ecological risk guidelines,” 17 Risk Policy Report 35 (1998)(noting that the U.S. Environmental Protection Agency’s guidelines for ecological weight-of-evidence approaches to risk assessment fail to provide guidance); Glenn W. Suter II & Susan M. Cormier, “Why and how to combine evidence in environmental assessments: Weighing evidence and building cases,” 409 Science of the Total Environment 1406, 1406 (2011)(noting arbitrariness and subjectivity of WOE “methodology”).
General Electric v. Joiner
Most savvy judges quickly figured out that weight of the evidence (WOE) was suspect methodology, woefully lacking, and indeed, not really a methodology at all.
The WOE method was part of the hand waving in Joiner by plaintiffs’ expert witnesses, including the frequent testifier Rabbi Teitelbaum. The majority recognized that Rabbi Teitelbaum’s WOE weighed in at less than a peppercorn, and affirmed the district court’s exclusion of his opinions. The Joiner Court’s assessment provoked a dissent from Justice Stevens, who was troubled by the Court’s undressing of the WOE methodology:
“Dr. Daniel Teitelbaum elaborated on that approach in his deposition testimony: ‘[A]s a toxicologist when I look at a study, I am going to require that that study meet the general criteria for methodology and statistical analysis, but that when all of that data is collected and you ask me as a patient, Doctor, have I got a risk of getting cancer from this? That those studies don’t answer the question, that I have to put them all together in my mind and look at them in relation to everything I know about the substance and everything I know about the exposure and come to a conclusion. I think when I say, “To a reasonable medical probability as a medical toxicologist, this substance was a contributing cause,” … to his cancer, that that is a valid conclusion based on the totality of the evidence presented to me. And I think that that is an appropriate thing for a toxicologist to do, and it has been the basis of diagnosis for several hundred years, anyway’.
* * * *
Unlike the District Court, the Court of Appeals expressly decided that a ‘weight of the evidence’ methodology was scientifically acceptable. To this extent, the Court of Appeals’ opinion is persuasive. It is not intrinsically “unscientific” for experienced professionals to arrive at a conclusion by weighing all available scientific evidence—this is not the sort of ‘junk science’ with which Daubert was concerned. After all, as Joiner points out, the Environmental Protection Agency (EPA) uses the same methodology to assess risks, albeit using a somewhat different threshold than that required in a trial. Petitioners’ own experts used the same scientific approach as well. And using this methodology, it would seem that an expert could reasonably have concluded that the study of workers at an Italian capacitor plant, coupled with data from Monsanto’s study and other studies, raises an inference that PCB’s promote lung cancer.”
General Electric Co. v. Joiner, 522 U.S. 136, 152-54 (1997)(Stevens, J., dissenting)(internal citations omitted)(confusing critical assessment of studies with WOE, and quoting Rabbi Teitelbaum’s attempt to conflate diagnosis with etiological attribution). Justice Stevens could reach his assessment only by ignoring the serious lack of internal and external validity in the studies relied upon by Rabbi Teitelbaum. Those studies did not support his opinion individually or collectively.
Justice Stevens was wrong as well about the claimed scientific adequacy of WOE. Courts have long understood that precautionary, preventive judgments of regulatory agencies are different from scientific conclusions that are admissible in civil and criminal litigation. See Allen v. Pennsylvania Engineering Corp., 102 F.3d 194 (5th Cir. 1996)(WOE, although suitable for regulatory risk assessment, is not appropriate in civil litigation). Justice Stevens’ characterization of WOE was little more than judicial ipse dixit, and it was, in any event, not the law; it was the argument of a dissenter.
Milward v. Acuity Specialty Products
Admittedly, dissents can sometimes help lower court judges chart a path of evasion and avoidance of a higher court’s holding. In Milward, Justice Stevens’ mischaracterization of WOE and scientific method was adopted as the legal standard for expert witness testimony by a panel of the United States Court of Appeals for the First Circuit. Milward v. Acuity Specialty Products Group, Inc., 664 F. Supp. 2d 137 (D. Mass. 2009), rev’d, 639 F.3d 11 (1st Cir. 2011), cert. denied sub nom. U.S. Steel Corp. v. Milward, ___ U.S. ___, 2012 WL 33303 (2012).
Mr. Milward claimed that he was exposed to benzene as a refrigerator technician, and that he developed acute promyelocytic leukemia (APL) as a result. 664 F. Supp. 2d at 140. In support of his claim, Mr. Milward offered the testimony of Dr. Martyn T. Smith, a toxicologist, who testified that the “weight of the evidence” supported his opinion that benzene exposure causes APL. Id. Smith, in his litigation report, described his methodology as an application of WOE:
“The term WOE has come to mean not only a determination of the statistical and explanatory power of any individual study (or the combined power of all the studies) but the extent to which different types of studies converge on the hypothesis. In assessing whether exposure to benzene may cause APL, I have applied the Hill considerations. Nonetheless, application of those factors to a particular causal hypothesis, and the relative weight to assign each of them, is both context dependent and subject to the independent judgment of the scientist reviewing the available body of data. For example, some WOE approaches give higher weight to mechanistic information over epidemiological data.”
Smith Report at ¶¶ 19, 21 (citing Sheldon Krimsky, “The Weight of Scientific Evidence in Policy and Law,” 95(S1) Am. J. Public Health S130, S130-31 (2005))(March 9, 2009). Smith marshaled several bodies of evidence, which he claimed collectively supported his opinion that benzene causes APL. Milward, 664 F. Supp. 2d at 143.
Milward also offered the testimony of a philosophy professor, Carl F. Cranor, for the opinion that WOE was an acceptable methodology, and that all scientific inference is subject to judgment. This is the same Cranor who, advocating for open admission of all putative scientific opinions, showcased his confusion between statistical significance probability and the posterior probability involved in a conclusion of causality. Carl F. Cranor, Regulating Toxic Substances: A Philosophy of Science and the Law at 33-34 (Oxford 1993)(“One can think of α, β (the chances of type I and type II errors, respectively) and 1 − β as measures of the ‘risk of error’ or ‘standards of proof.’”). See also id. at 44, 47, 55, 72-76.
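The distinction Cranor blurred can be made concrete with a back-of-the-envelope Bayesian calculation. The sketch below uses purely hypothetical numbers for the prior probability, statistical power, and significance level; it is offered only to show that a “statistically significant” result at α = 0.05 is not the same thing as a 95% posterior probability that the causal hypothesis is true:

```python
# Illustrative only; the numbers are assumptions, not data from Joiner or Milward.
alpha = 0.05   # significance level: P(significant result | no true effect)
power = 0.80   # 1 - beta: P(significant result | true effect)
prior = 0.10   # assumed prior probability that the causal hypothesis is true

# Bayes' theorem: posterior probability of causation given a "significant" study
posterior = (power * prior) / (power * prior + alpha * (1 - prior))
print(f"P(causal hypothesis | significant result) = {posterior:.2f}")  # 0.64, not 0.95
```

On these assumptions, a single “significant” study leaves the posterior probability of causation well below the 95% figure that a naive reading of the significance level might suggest.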
After a four-day evidentiary hearing, the district court found that Martyn Smith’s opinion was merely a plausible hypothesis, and not admissible. Milward, 664 F. Supp. 2d at 149. The Court of Appeals, however, in an opinion by Chief Judge Lynch, reversed and ruled that an inference of general causation based on a WOE methodology satisfied the reliability requirement for admission under Federal Rule of Evidence 702. 639 F.3d at 26. According to the Circuit, WOE methodology was scientifically sound. Id. at 22-23.
WOE Cometh
Because the WOE methodology is not well described, either in the published literature or in Martyn Smith’s litigation report, it is difficult to understand exactly what the First Circuit approved by reversing Smith’s exclusion. Usually the burden is on the proponent of the opinion testimony, and one would have thought that the vagueness of the described methodology would count against admissibility. It is hard to escape the conclusion that the Circuit elevated a poorly described method, best characterized as hand waving, into a description of scientific method.
The Panel appeared to have been misled by Carl F. Cranor, who described “inference to the best explanation” as requiring a scientist to “consider all of the relevant evidence” and “integrate the evidence using professional judgment to come to a conclusion about the best explanation.” Id. at 18. The available explanations are then weighed, and a would-be expert witness is free to embrace the one he feels offers the “best” explanation. The appellate court’s opinion takes WOE, combined with Cranor’s “inference to the best explanation,” to hold that an expert witness need only opine that he has considered the range of plausible explanations for the association, and that he believes that the causal explanation is the best or “most plausible.” Id. at 20 (upholding this approach as “methodologically reliable”).
What is missing, of course, is the realization that plausible does not mean established, reasonably certain, or even more likely than not. The Circuit’s invocation of plausibility also obscures how often the available data are too indeterminate to support a reliable conclusion of causation.
Curiously, the Panel likened WOE to the use of differential diagnosis, which is a method for inferring the specific cause of a particular patient’s disease or disorder. Id. at 18. This is a serious confusion between a method concerned with general causation and one concerned with specific causation. By the principle of charity, we might allow that the First Circuit was really thinking of some process of differential etiology rather than differential diagnosis, given that diagnoses (other than for infectious diseases and a few pathognomonic disorders) do not usually carry with them information about unique etiologic agents. But even such a process of differential etiology is a well-structured disjunctive syllogism of the form:
A ∨ B ∨ C
¬A ∧ ¬B
∴ C
There is nothing subjective about applying such a syllogism; no weighing is involved in drawing the inference. In the Milward case, one of the propositions that might well have explained the available evidence was chance, but plaintiff’s expert witness Smith could not and did not rule out chance, because the studies upon which he relied were not statistically significant. Smith could thus never get past “therefore” in any syllogism, or in any other recognizable process of reasoning.
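The deductive character of the inference can even be checked by machine, which underscores how little “weighing” the syllogism involves. A minimal illustration in the Lean proof assistant (purely illustrative, and obviously no part of the Milward record):

```lean
-- Disjunctive syllogism: if A, B, or C explains the evidence, and A and B
-- are ruled out, then C follows. The proof is mechanical; nothing is "weighed."
example (A B C : Prop) (h : A ∨ B ∨ C) (hA : ¬A) (hB : ¬B) : C :=
  match h with
  | Or.inl a          => absurd a hA
  | Or.inr (Or.inl b) => absurd b hB
  | Or.inr (Or.inr c) => c
```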
The Circuit provided no insight into the process Smith used to weigh the available evidence, and it failed to address the analytical gaps and evidentiary insufficiencies identified by the trial court, other than to invoke the mantra that all these issues go to “the weight, not the admissibility” of Smith’s opinions. This, of course, is a conclusion, not an explanation or a legal theory.
There is also a cute semantic trick lurking in plaintiffs’ position in Milward, which results from their witnesses describing their methodology as “WOE.” Since the jury is charged with determining the “weight of the evidence,” any evaluation of the WOE would be an invasion of the province of the jury. Milward, 639 F.3d at 20. QED, by the semantic device of deliberately conflating the name of the putative scientific methodology with the term traditionally used to describe jury fact finding.
In any event, the Circuit’s chastisement of the district court for evaluating Smith’s implementation of the WOE methodology, his logical, mathematical, and epidemiological errors, and his result-driven reinterpretation of study data, threatens to read an Act of Congress — the Federal Rules of Evidence, and especially Rules 702 and 703 — out of existence by judicial fiat. The Circuit’s approach is also at odds with Supreme Court precedent (now codified in Rule 702) on the importance and the requirement of evaluating opinion testimony for analytical gaps and the ipse dixit of expert witnesses. General Electric Co. v. Joiner, 522 U.S. 136, 146 (1997).
Smith’s Errors in Recalculating Odds Ratios of Published Studies
In the district court, the defendants presented the testimony of an epidemiologist, Dr. David H. Garabrant, who took Smith to task for calculating risk ratios incorrectly. Smith did not have any particular expertise in epidemiology, and his faulty calculations were problematic from the perspective of both Rule 702 and Rule 703. The district court found the criticisms of Smith’s calculations convincing, 664 F. Supp. 2d at 149, but the appellate court held that the technical dispute was for the jury; “both experts’ opinions are supported by evidence and sound scientific reasoning.” Milward, 639 F.3d at 24. This ruling is incomprehensible. Plaintiffs had the burden of showing the admissibility of Smith’s opinion generally, but also the reasonableness of his reliance upon the calculated odds ratio. The defendants had no burden of persuasion on the issue of Smith’s calculations, but they presented testimony, which apparently carried the day. The appellate court had no basis for reversing the specific ruling with respect to the erroneously calculated risk ratio.
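For readers unfamiliar with the arithmetic at issue, the odds ratio in a case-control study is just the cross-product of a two-by-two table of exposure and disease. The sketch below uses made-up counts, not the data from any study in the Milward record, simply to show the calculation that was in dispute:

```python
# Hypothetical 2x2 table; the counts are invented for illustration only.
exposed_cases, unexposed_cases = 12, 88        # cases with and without benzene exposure
exposed_controls, unexposed_controls = 7, 93   # controls with and without exposure

# Odds ratio = (odds of exposure among cases) / (odds of exposure among controls),
# which reduces to the familiar cross-product (a*d)/(b*c).
odds_ratio = (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)
print(f"odds ratio = {odds_ratio:.2f}")  # 1.81 on these invented counts
```

The arithmetic itself is trivial; the dispute was over whether Smith had done it correctly.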
Smith’s Reliance upon Statistically Insignificant Studies
Smith relied upon studies that were not statistically significant at any accepted level. An opinion of causality requires a showing that chance, bias, and confounding have been excluded in assessing an existing association. Smith failed to exclude chance as an explanation for the association, and the burden to make this exclusion was on the plaintiffs. This failure was not something that could readily be patched by adverting to other evidence from studies in animals or in test tubes. The Court of Appeals excused this important analytical gap in plaintiffs’ witness’s opinion because APL is rare, and data collection is difficult in the United States. Id. at 24. Evidence “consistent with” and “suggestive of” the challenged witness’s opinion thus suffices. This is a remarkable homeopathic dilution of both legal and scientific causation. Now we have a rule of law that excuses plaintiffs from having to prove their case with reliable evidence whenever they allege a rare disease for which they lack evidence.
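To see what “not statistically significant at any accepted level” means in practice, consider the conventional benchmark: chance is not excluded unless the 95% confidence interval around the odds ratio excludes 1.0 (equivalently, p < 0.05). A minimal sketch, continuing with the invented counts used above (again, not data from the Milward record):

```python
import math

# Hypothetical 2x2 table: a = exposed cases, b = exposed controls,
# c = unexposed cases, d = unexposed controls. Invented counts, for illustration.
a, b, c, d = 12, 7, 88, 93

odds_ratio = (a * d) / (b * c)
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)      # Woolf's standard error of ln(OR)
lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"OR = {odds_ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
# Prints roughly: OR = 1.81, 95% CI 0.68 to 4.81. The interval spans 1.0,
# so on these numbers chance would not be excluded at the 0.05 level.
```

An elevated point estimate with an interval that straddles 1.0 is exactly the sort of result that is “consistent with” or “suggestive of” causation without excluding chance.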
Leveling the Hierarchy of Evidence
Imagine trying to bring a medication to market with a small case-control study, with a non-statistically significant odds ratio! Oh, but these clinical trials are so difficult and expensive; and they take such a long time. Like a moment’s thought, when thinking is so hard and a moment such a long time. We would be quite concerned if the FDA abridged the standard for causal efficacy in the licensing of new medications; we should be just as concerned about judicial abridgments of standards for causation of harm in tort actions.
Leveling the hierarchy of evidence has been an explicit or implicit goal of several law professors. Some of the leveling efforts even show up in the new Reference Manual for Scientific Evidence (RMSE 3d ed. 2011). See “New-Age Levellers – Flattening Hierarchy of Evidence.”
The Circuit, in Milward, quoted a symposium paper by Michele Carbone and others who suggest that there should be no hierarchy of evidence, but the Court ignored a huge body of literature that explains and defends the need to recognize that not all study designs or types are equal. Interestingly, the RMSE chapter on epidemiology by Professor Green (see more below) cites the same paper. RMSE 3d at 564 & n.48 (citing and quoting symposium paper that “[t]here should be no hierarchy [among different types of scientific methods to determine cancer causation]. Epidemiology, animal, tissue culture and molecular pathology should be seen as integrating evidences in the determination of human carcinogenicity.” Michele Carbone et al., “Modern Criteria to Establish Human Cancer Etiology,” 64 Cancer Res. 5518, 5522 (2004).) Carbone, of course, is best known for his advocacy of a viral cause (SV40) of human mesothelioma, a claim unsupported, and indeed contradicted, by epidemiologic studies. Carbone’s statement does not support the RMSE chapter’s leveling of epidemiology and toxicology, and Carbone is, in any event, an unlikely source to cite.
The First Circuit, in Milward, studiously ignored a mountain of literature on evidence-based medicine, including the RMSE 3d chapter, “Reference Guide on Medical Testimony,” which teaches that leveling of study designs and types is inappropriate. The RMSE chapter devotes several pages to explaining the role of study design in assessing an etiological issue:
“3. Hierarchy of medical evidence
With the explosion of available medical evidence, increased emphasis has been placed on assembling, evaluating, and interpreting medical research evidence. A fundamental principle of evidence-based medicine (see also Section IV.C.5, infra) is that the strength of medical evidence supporting a therapy or strategy is hierarchical.
When ordered from strongest to weakest, systematic review of randomized trials (meta-analysis) is at the top, followed by single randomized trials, systematic reviews of observational studies, single observational studies, physiological studies, and unsystematic clinical observations. An analysis of the frequency with which various study designs are cited by others provides empirical evidence supporting the influence of meta-analysis followed by randomized controlled trials in the medical evidence hierarchy. Although they are at the bottom of the evidence hierarchy, unsystematic clinical observations or case reports may be the first signals of adverse events or associations that are later confirmed with larger or controlled epidemiological studies (e.g., aplastic anemia caused by chloramphenicol, or lung cancer caused by asbestos). Nonetheless, subsequent studies may not confirm initial reports (e.g., the putative association between coffee consumption and pancreatic cancer).”
John B. Wong, Lawrence O. Gostin, and Oscar A. Cabrera, “Reference Guide on Medical Testimony,” RMSE 3d 687, 723-24 (2011). The implication of the Milward opinion, that there is no hierarchy of evidence in causal inference and that tissue culture studies are as probative as epidemiology, is patently absurd. The Circuit not only went out on a limb; it managed to saw the limb off while “out there.”
Milward – Responses Critical and Otherwise
The First Circuit’s decision in Milward made an immediate impression upon those writers who have worked hard to dismantle or marginalize Rule 702. The Circuit’s decision was mysteriously cited with obvious approval by Professor Margaret Berger, even though she had died before the decision was published! Margaret A. Berger, “The Admissibility of Expert Testimony,” RMSE 3d at 20 & n.51 (2011). Professor Michael Green, one of the reporters for the ALI’s Restatement (Third) of Torts, hyperbolically called Milward “[o]ne of the most significant toxic tort causation cases in recent memory.” Michael D. Green, “Introduction: Restatement of Torts as a Crystal Ball,” 37 Wm. Mitchell L. Rev. 993, 1009 n.53 (2011).
The WOE approach, and its embrace in Milward, obscures the reality that sometimes the evidence does not logically or analytically support the offered conclusion, and that at other times the best explanation is uncertainty. By adopting the WOE approach, vague and ambiguous as it is, the Milward Court was beguiled into holding that WOE determinations are for the jury. The lack of meaningful content in WOE means that decisions such as Milward effectively remove the gatekeeping function, or permit that function to be minimally satisfied by accepting an expert witness’s claim to have employed WOE. The epistemic warrant required by Rule 702 is diluted, if not destroyed. Scientific hunch and speculation, proper in their place, can be passed off as scientific knowledge to gullible or result-oriented judges and juries.