Milward — Unhinging the Courthouse Door to Dubious Scientific Evidence

It has been an interesting year in the world of expert witnesses.  We have seen David Egilman attempt a personal appeal of a district court’s order excluding him as an expert.  Stephen Ziliak has prattled on about how he steered the Supreme Court from the brink of disaster by helping them to avoid the horrors of statistical significance.  And then we had a philosophy professor turned expert witness, Carl Cranor, publicly touting an appellate court’s decision that held his testimony admissible.  Cranor, under the banner of the Center for Progressive Reform (CPR), hails the First Circuit’s opinion as the greatest thing since Sir Isaac Newton.   Carl Cranor, “Milward v. Acuity Specialty Products: How the First Circuit Opened Courthouse Doors for Wronged Parties to Present Wider Range of Scientific Evidence” (July 25, 2011).

Philosophy Professor Carl Cranor has been trying for decades to dilute the scientific approach to causal conclusions to permit the precautionary principle to find its way into toxic tort cases.  Cranor, along with others, has also criticized federal court expert witness gatekeeping for deconstructing individual studies, showing that the individual studies are weak, and ignoring the overall pattern of evidence from different disciplines.  This criticism has some theoretical merit, but the criticism is typically advanced as an excuse for “manufacturing certainty” from weak, inconsistent, and incoherent scientific evidence.  The criticism also ignores the actual text of the relevant rule – Rule 702, which does not limit the gatekeeping court to assessing individual “pieces” of evidence.  The scientific community acknowledges that there are times when a weaker epidemiologic dataset may be supplemented by strong experimental evidence that leads appropriately to a conclusion of causation.  See, e.g., Hans-Olov Adami, Sir Colin L. Berry, Charles B. Breckenridge, Lewis L. Smith, James A. Swenberg, Dimitrios Trichopoulos, Noel S. Weiss, and Timothy P. Pastoor, “Toxicology and Epidemiology: Improving the Science with a Framework for Combining Toxicological and Epidemiological Evidence to Establish Causal Inference,” 122 Toxicological Sci. 223 (2011) (noting the lack of a systematic, transparent way to integrate toxicologic and epidemiologic data to support conclusions of causality; proposing a “grid” to permit disparate lines of evidence to be integrated into more straightforward conclusions).

For the most part, Cranor’s publications have been ignored in the Rule 702 gatekeeping process.  Perhaps that is why he shrugged off his academic regalia and took on the mantle of the expert witness, in Milward v. Acuity Specialty Products, a case involving a claim that benzene exposure caused plaintiff’s acute promyelocytic leukemia (APL), one of several types of acute myeloid leukemia.  Milward v. Acuity Specialty Products Group, Inc., 664 F.Supp. 2d 137 (D.Mass. 2009) (O’Toole, J.).

Philosophy might seem like the wrong discipline to help a court or a jury decide general and specific causation of a rare cancer, with an incidence of less than 8 cases per million per year.  (A PubMed search on leukemia and Cranor yielded no hits.)  Cranor supplemented the other, more traditional testimony from a toxicologist, by attempting to show that the toxicologist’s testimony was based upon sound scientific method.  Cranor was particularly intent to show that the toxicologist, Dr. Martyn Smith, had used sound method to reach a scientific conclusion, even though he lacked strong epidemiologic studies to support his opinion.

The district court excluded Cranor’s testimony, along with plaintiff’s scientific expert witnesses.  The Court of Appeals, however, reversed, and remanded with instructions that plaintiff’s scientific expert witnesses’ opinions were admissible.  639 F.3d 11 (1st Cir. 2011).  Hence Cranor’s and the CPR’s hyperbole about the opening of the courthouse doors.

The district court was appropriately skeptical about plaintiff’s expert witnesses’ reliance upon epidemiologic studies, the results of which were not statistically significant.  Before reaching the issue of statistical significance, however, the district court found that Dr. Smith had relied upon studies that did not properly support his opinion.  664 F.Supp. 2d at 148.  The defense presented Dr. David Garabrant, an expert witness with substantial qualifications and accomplishments in epidemiologic science.  Dr. Garabrant persuaded the Court that Dr. Smith had relied upon some studies that tended to show no association, and others that presented faulty statistical analyses.  Other studies, relied upon by Dr. Smith, presented data on AML, but Dr. Smith speculated that these AML cases could have been APL cases.  Id.

None of the studies relied upon by plaintiff’s Dr. Smith had a statistically significant result for APL.  Id. at 144. The district court pointed out that scientists typically take care to rely only upon data that show “statistical significance,” and Dr. Smith (plaintiff’s expert witness) deviated from sound scientific method in attempting to support his conclusion with studies that had not ruled out chance as an explanation for their increased risk ratios.  Id.  The district court did not summarize the studies’ results, and so the unsoundness of plaintiff’s method is difficult to evaluate.  Rather than engaging in hand waving and speculating about “trends” and suggestions, those witnesses could have performed a meta-analysis to increase the statistical precision of a summary point estimate beyond what was achieved in any single, small study.  Neither the plaintiff nor the district court addressed the issue of aggregating study results to address the role of chance in producing the observed results.
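To make the point about aggregation concrete, here is a minimal sketch of a fixed-effect, inverse-variance meta-analysis of odds ratios. The three study results are invented for illustration; they are not taken from the Milward record, and the function name `pooled_odds_ratio` is hypothetical. The sketch assumes each study reports an odds ratio with a 95% confidence interval that is symmetric on the log scale, and it ignores heterogeneity between studies, which a real analysis would have to examine.

```python
import math

def pooled_odds_ratio(studies, z=1.96):
    """Fixed-effect, inverse-variance pooling of odds ratios.

    studies: list of (odds_ratio, ci_lower, ci_upper), each with a 95% CI.
    Returns (pooled_or, pooled_ci_lower, pooled_ci_upper).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for odds_ratio, lower, upper in studies:
        log_or = math.log(odds_ratio)
        # Recover the standard error of log(OR) from the width of the CI.
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        weight = 1.0 / se ** 2
        weighted_sum += weight * log_or
        total_weight += weight
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - z * pooled_se),
        math.exp(pooled_log + z * pooled_se),
    )

# Hypothetical inputs: three small studies, each with OR = 2.0 and a
# 95% CI of 0.8 to 5.0, so none is statistically significant on its own.
hypothetical_studies = [(2.0, 0.8, 5.0)] * 3
pooled, pooled_lower, pooled_upper = pooled_odds_ratio(hypothetical_studies)
```

With these invented inputs, each study's interval includes 1.0, but the pooled interval is narrower and its lower bound sits above 1.0. Whether the actual benzene studies would behave this way is exactly what no one in the case appears to have checked.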

The inability to show a statistically significant result was not surprising given how rare the APL subtype of AML is.  Sample size might legitimately interfere with the ability of epidemiologic studies to detect a statistically significant association that really existed.  If this were truly the case, the lack of a statistically significant association could not be interpreted to mean the absence of an association without potentially committing a type II error. In any event, the district court in Milward was willing to credit the plaintiffs’ claim that epidemiologic evidence may not always be essential for establishing causality.  If causality does exist, however, epidemiologic studies are usually required to confirm the existence of the causal relationship.  Id. at 148.
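The sample-size problem can be illustrated with a rough power calculation. The sketch below assumes a one-sided exact Poisson test on the number of cases observed in an exposed cohort, a baseline rate of 8 cases per million person-years (the upper bound mentioned above), and a purely hypothetical cohort of 10,000 workers followed for 20 years with a true rate ratio of 2. None of these cohort figures comes from the case, and the function names are illustrative only.

```python
import math

def poisson_upper_tail(k, mean):
    """P(X >= k) for X ~ Poisson(mean)."""
    below = sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k))
    return 1.0 - below

def critical_count(expected_null, alpha=0.05):
    """Smallest case count c with P(X >= c) <= alpha under the null."""
    c = 0
    while poisson_upper_tail(c, expected_null) > alpha:
        c += 1
    return c

def power(expected_null, rate_ratio, alpha=0.05):
    """Probability of reaching the critical count if the true rate is
    rate_ratio times the baseline rate."""
    c = critical_count(expected_null, alpha)
    return poisson_upper_tail(c, expected_null * rate_ratio)

# Assumed inputs (hypothetical cohort, baseline rate from the text above).
baseline_rate = 8 / 1_000_000      # cases per person-year
person_years = 10_000 * 20         # hypothetical cohort size and follow-up
expected_cases = baseline_rate * person_years
needed = critical_count(expected_cases)
study_power = power(expected_cases, rate_ratio=2.0)
type_ii_error = 1.0 - study_power
```

Under these assumptions the cohort expects fewer than two cases absent any effect, and the chance of detecting a true doubling of risk is well under one half. That is the sense in which a non-significant result from a single small study cannot be read as evidence of no association.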

The district court also took a close look at Smith’s mechanistic biological evidence, and found it equally speculative.  Although plausibility is a desirable feature of a causal hypothesis, it only sets the stage for actual data:

“Dr. Smith’s opinion is that ‘[s]ince benzene is clastogenic and has the capability of breaking and rearranging chromosomes, it is biologically plausible for benzene to cause’ the t(15;17) translocation. (Smith Decl. ¶ 28.b.) This is a kind of ‘bull in the china shop’ generalization: since the bull smashes the teacups, it must also smash the crystal. Whether that is so, of course, would depend on the bull having equal access to both teacups and crystal.”

Id. at 146.

“Since general extrapolation is not justified and since there is no direct observational evidence that benzene causes the t(15;17) translocation, Dr. Smith’s opinion — that because benzene is an agent that can cause some chromosomal mutations, it is ‘plausible’ that it causes the one critical to APL—is simply an hypothesis, not a reliable scientific conclusion.”

Id. at 147.

Judge O’Toole’s opinion is a careful, detailed consideration of the facts and data upon which Dr. Smith relied, but the First Circuit found an abuse of discretion, and reversed. 639 F.3d 11 (1st Cir. 2011).

The Circuit incorrectly suggested that Smith’s opinion was based upon a “weight of the evidence” methodology described by “the world-renowned epidemiologist Sir Arthur Bradford Hill in his seminal methodological article on inferences of causality. See Arthur Bradford Hill, The Environment and Disease: Association or Causation?, 58 Proc. Royal Soc’y Med. 295 (1965).” Id. at 17.  This suggestion is remarkable because everyone knows that it was Arthur’s much smarter brother, Austin, who wrote the seminal article and gave the Bradford Hill name to the famous presidential address published by the Royal Society of Medicine.  Arthur Bradford Hill was not even a knight if he existed at all.

The Circuit’s suggestion is also remarkable for confusing a vague “weight of the evidence” methodology with the statistical and epidemiologic approach of one of the 20th century’s great methodologists.  Sir Austin is known for having conducted the first double-blinded randomized clinical trial, as well as having shown, with fellow knight Sir Richard Doll, the causal relationship between smoking and lung cancer.  Sir Austin wrote one of the first texts on medical statistics, Principles of Medical Statistics (London 1937).  Sir Austin no doubt was turning in his grave when he was associated with Cranor’s loosey-goosey “weight of the evidence” methodology.  See, e.g., Douglas L. Weed, “Weight of Evidence: A Review of Concept and Methods,” 25 Risk Analysis 1545 (2005) (noting the vague, ambiguous, indefinite nature of the concept of “weight of evidence” review).

The Circuit adopted a dismissive attitude towards epidemiology in general, citing to an opinion piece by several cancer tumor biologists, whom the court described as a group from the National Cancer Institute (NCI).  The group was actually a workshop sponsored by the NCI, with participants from many institutions.  Id. at 17 (citing Michele Carbon[e] et al., “Modern Criteria to Establish Human Cancer Etiology,” 64 Cancer Res. 5518, 5522 (2004)).  The cited article did report some suggestions for modifying Bradford Hill’s criteria in the light of modern molecular biology, as well as a sense of the group that there was no “hierarchy” in which epidemiology was at the top.  (The group definitely did not address the established concept that some types of epidemiologic studies are analytically more powerful to support inferences of causality than others — the hierarchy of epidemiologic evidence.)

The Circuit then proceeded to evaluate Dr. Smith’s consideration of the available epidemiologic studies.  The Circuit mistakenly defined an “odds ratio” as “the difference in the incidence of a disease between a population that has been exposed to benzene and one that has not.”  Id. at 24. Having failed to engage with the evidence sufficiently to learn what an odds ratio was, the Circuit Court then proceeded to state that the difference between Dr. Garabrant and Dr. Smith, as to how to calculate the odds ratio in some of the studies, was a mere difference in opinion between experts, and Dr. Garabrant’s criticisms of Dr. Smith’s approach went to the weight, not the admissibility, of the evidence.  These sparse words are, of course, a legal conclusion, not an explanation, and the Circuit leaves us without any real understanding of how Dr. Smith may have gone astray, but still have been advancing a legitimate opinion within epidemiology, which was not his discipline.  Id. at 22. If Dr. Smith’s idea of an odds ratio was as incorrect as the Circuit’s, his calculation may have had no validity whatsoever, and thus his opinions derived from his flawed ideas may have clearly failed the requirements of Rule 702.  The Circuit’s opinion is not terribly helpful in understanding anything other than its summary rejection of the district court’s more detailed analysis.
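For readers who want the distinction spelled out: an odds ratio is a ratio of odds, not a difference in incidence. The sketch below computes an odds ratio, a risk ratio, and a risk difference from a single 2x2 table. The counts are invented for illustration and come from no study in the case; the function names are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds of disease among the exposed divided by the odds among the
    unexposed: (a/b) / (c/d), which simplifies to (a*d) / (b*c)."""
    return (a * d) / (b * c)

def risk_ratio(a, b, c, d):
    """Incidence among the exposed divided by incidence among the unexposed."""
    return (a / (a + b)) / (c / (c + d))

def risk_difference(a, b, c, d):
    """Incidence among the exposed minus incidence among the unexposed.
    This is the quantity the Circuit's definition actually describes."""
    return a / (a + b) - c / (c + d)

# Hypothetical 2x2 table:
#              cases   non-cases
# exposed      a = 20    b = 80
# unexposed    c = 10    d = 90
table = (20, 80, 10, 90)
or_value = odds_ratio(*table)       # 2.25
rr_value = risk_ratio(*table)       # 2.0
rd_value = risk_difference(*table)  # 0.1
```

The same table yields three different numbers with three different interpretations. In a case-control study, where subjects are sampled by disease status, incidence generally cannot be estimated directly at all, which is one reason the odds ratio is the measure such studies report.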

The Circuit also advanced the “impossibility” defense for Dr. Smith’s failure to rely upon epidemiologic studies with statistically significant results.  Id. at 24. As noted above, such studies fail to rule out chance for their finding of risk ratios above or below 1.0 (the measure of no association).  Because the likelihood of obtaining a risk ratio of exactly 1.0 is vanishingly small, epidemiologic science must and does consider the role of chance in explaining data that diverges from a measure of no association.  Dr. Smith’s hand waving about the large size of the studies needed to show an increased risk may have some validity in the context of benzene exposure and APL, but it does not explain or justify the failure to use aggregative techniques such as meta-analysis.  The hand waving also does nothing to rule out the role of chance in producing the results he relied upon in court.
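What it means for a study to "fail to rule out chance" can be shown with a confidence interval. The sketch below uses Woolf's logit method for an approximate 95% interval around an odds ratio. The table counts are invented to mimic a small study of a rare outcome and are not drawn from any study Dr. Smith cited; the function name is hypothetical, and the normal approximation is itself shaky at counts this small.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate 95% CI (Woolf's logit method).

    a, b: exposed cases and exposed non-cases
    c, d: unexposed cases and unexposed non-cases
    All four cells must be non-zero for this approximation.
    """
    point = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(point) - z * se_log)
    upper = math.exp(math.log(point) + z * se_log)
    return point, lower, upper

# Hypothetical small study: 4 cases among 100 exposed, 2 among 100 unexposed.
estimate, ci_lower, ci_upper = odds_ratio_with_ci(4, 96, 2, 98)
chance_not_ruled_out = ci_lower <= 1.0 <= ci_upper
```

Here the point estimate is about 2, but the interval runs from well below 1.0 to well above 10. An elevated point estimate of this kind is compatible with a doubling of risk, with no effect, and with a protective effect, which is why it cannot carry a causal conclusion by itself.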

The Circuit Court appeared to misunderstand the very nature of the need for statistical evaluation of stochastic biological events, such as APL incidence in a population.  According to the Circuit, Dr. Smith’s reliance upon epidemiologic data was merely

“meant to challenge the theory that benzene exposure could not cause APL, and to highlight that the limited data available was consistent with the conclusions that he had reached on the basis of other bodies of evidence. He stated that ‘[i]f epidemiologic studies of benzene-exposed workers were devoid of workers who developed APL, one could hypothesize that benzene does not cause this particular subtype of AML.’ The fact that, on the  contrary, ‘APL is seen in studies of workers exposed to benzene where the subtypes of AML have been separately analyzed and has been found at higher levels than expected’ suggested to him that the limited epidemiological evidence was at the very least consistent with, and suggestive of, the conclusion that benzene can cause APL.

* * *

Dr. Smith did not infer causality from this suggestion alone, but rather from the accumulation of multiple scientifically acceptable inferences from different bodies of evidence.”

Id. at 25.

But challenging the theory that benzene exposure does not cause APL does not help show the validity of the studies relied upon, or the inferences drawn from them.  This was plaintiffs’ and Dr. Smith’s burden under Rule 702, and the Circuit seemed to lose sight of the law and the science with Professor Cranor’s and Dr. Smith’s sleight of hand.  As for the Circuit’s suggestion that scraps of evidence from different kinds of scientific studies can establish scientific knowledge, this approach was rejected by the great mathematician, physicist, and philosopher of science, Henri Poincaré:

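“[O]n fait la science avec des faits comme une maison avec des pierres; mais une accumulation de faits n’est pas plus une science qu’un tas de pierres n’est une maison.”  [Science is built with facts as a house is built with stones; but an accumulation of facts is no more a science than a heap of stones is a house.]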

Henri Poincaré, La Science et l’Hypothèse (1905) (chapter 9, Les Hypothèses en Physique).  Litigants, either plaintiff or defendant, should not be allowed to pick out isolated findings in a variety of studies, and throw them together as if that were science.

As unclear and dubious as the Circuit’s opinion is, the court did not throw out the last 18 years of Rule 702 law.  The Court distinguished the Milward case, with its sparse epidemiologic studies, from those cases “in which the available epidemiological studies found that there is no causal link.”  Id. at 24 (citing Norris v. Baxter Healthcare Corp., 397 F.3d 878, 882 (10th Cir.2005), and Allen v. Pa. Eng’g Corp., 102 F.3d 194, 197 (5th Cir.1996)).  The Court, however, provided no insight into why the epidemiologic studies must rise to the level of showing no causal link before an expert can torture weak, inconsistent, and contradictory data to claim such a link.  This legal sleight of hand is simply a shifting of the burden of proof, which should have been on plaintiffs and Dr. Smith.  Desperation is not a substitute for adequate scientific evidence to support a scientific conclusion.

The Court’s failure to engage more directly with the actual data, facts, and inferences, however, is likely to cause mischief in federal cases around the country.