Rhetorical Strategy in Characterizing Scientific Burdens of Proof

The recent opinion piece by Kevin Elliott and David Resnik exemplifies a rhetorical strategy that idealizes and elevates a burden of proof in science, and then declares it is different from legal and regulatory burdens of proof. Kevin C. Elliott and David B. Resnik, “Science, Policy, and the Transparency of Values,” 122 Envt’l Health Persp. 647 (2014) [Elliott & Resnik]. What is astonishing about this strategy is the lack of support for the claim that “science” imposes such a high burden of proof that we can safely ignore it when making “practical” legal or regulatory decisions. Here is how the authors state their claim:

“Very high standards of evidence are typically expected in order to infer causal relationships or to approve the marketing of new drugs. In other social contexts, such as tort law and chemical regulation, weaker standards of evidence are sometimes acceptable to protect the public (Cranor 2008).”

Id.[1] Remarkably, the authors cite no statute, no case law, and no legal treatise for the proposition that the tort law standard for causation is somehow lower than the standard for a scientific claim of causality. Similarly, the authors cite no support for their claim that regulatory pronouncements are judged under a lower burden. One need only consider the burden a sponsor faces in establishing medication efficacy and safety in a New Drug Application before the Food and Drug Administration. Of course, when agencies assess causal claims about safety, they often act under regulations and guidances that lessen the burden of proof from what would be required in a tort action.[2]

And most important, Elliott and Resnik fail to cite any work by scientists for the claim that scientists require a greater burden of proof before accepting a causal claim. When these authors’ claims of differential burdens of proof were challenged by a scientist, Dr. David Schwartz, in a letter to the editors, the authors insisted that they were correct, again citing Carl Cranor, a non-lawyer, non-scientist:

“we caution against equating the standards of evidence expected in tort law with those expected in more traditional scientific contexts. The tort system requires only a preponderance of evidence (> 50% likelihood) to win a case; this is much weaker evidence than scientists typically demand when presenting or publishing results, and confusion about these differing standards has led to significant legal controversies (Cranor 2006).”

Reply to Dr. Schwartz. The only thing the authors added to the discussion was a citation to the same work by Carl Cranor[3], with only the date of the book changed.

Whence comes the assertion that science has a heavier burden of proof? Elliott and Resnik cite Cranor for their remarkable proposition, so where did Cranor find support for it? In his 1993 book, Cranor suggests that we “can think of type I and II error rates as ‘standards of proof’,” which raises the question whether those error rates are appropriately used to assess significance probabilities or posterior probabilities[4]. Cranor goes so far in his 1993 book as to describe the usual level of alpha as the “95%” rule, and to claim that regulatory agencies require something akin to proof “beyond a reasonable doubt” when they require two “statistically significant” studies[5]. Thus Cranor’s opinion has its origins in his commission of the transposition fallacy[6].
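The distance between a 5% significance level and “95% certainty” can be made concrete with a short Bayes-rule calculation. The sketch below is illustrative only; the prior, power, and alpha values are assumptions chosen for the example, not figures drawn from Cranor or from Elliott and Resnik.

```python
# Illustrative only: prior, power, and alpha are assumed values.
# The transposition fallacy reads "p < 0.05" as "95% probable that the
# effect is real." Bayes' rule shows the posterior probability depends
# on the prior and the test's power, not on alpha alone.

def posterior_given_significant(prior, power, alpha):
    """P(effect is real | statistically significant result)."""
    true_pos = power * prior            # real effect, and detected
    false_pos = alpha * (1 - prior)     # no effect, but p < 0.05 anyway
    return true_pos / (true_pos + false_pos)

# Suppose only 1 in 10 candidate causal claims is true (prior = 0.10),
# studies have 80% power, and alpha is the conventional 0.05.
p = posterior_given_significant(prior=0.10, power=0.80, alpha=0.05)
print(round(p, 2))  # 0.64 -- far from the supposed "95% certainty"
```

On these assumed inputs, a “statistically significant” result leaves the causal claim only 64% probable; the 95% figure attaches to the testing procedure, not to the hypothesis.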

Cranor has persisted in this fallacious analysis in his later books. In his 2006 book, he erroneously equates the 95% coefficient of statistical confidence with 95% certainty of knowledge[7]. Later in the text, he asserts that agency regulations are written only when supported by proof “beyond a reasonable doubt.”[8]

To be fair, it is possible to find regulators stating something close to what Cranor asserts, but only when they themselves are committing the transposition fallacy:

“Statistical significance is a mathematical determination of the confidence in the outcome of a test. The usual criterion for establishing statistical significance is the p-value (probability value). A statistically significant difference in results is generally indicated by p < 0.05, meaning there is less than a 5% probability that the toxic effects observed were due to chance and were not caused by the chemical. Another way of looking at it is that there is a 95% probability that the effect is real, i.e., the effect seen was the result of the chemical exposure.”

U.S. Dep’t of Labor, Guidance for Hazard Determination for Compliance with the OSHA Hazard Communication Standard (29 CFR § 1910.1200) Section V (July 6, 2007).

And it is similarly possible to find policy wonks expressing similar views. In 1993, the Carnegie Commission published a report in which it tried to explain away junk science as simply the discrepancy in burdens of proof between law and science, but its reasoning clearly points to the Commission’s commission of the transposition fallacy:

“The reality is that courts often decide cases not on the scientific merits, but on concepts such as burden of proof that operate differently in the legal and scientific realms. Scientists may misperceive these decisions as based on a misunderstanding of the science, when in actuality the decision may simply result from applying a different norm, one that, for the judiciary, is appropriate.  Much, for instance, has been written about ‘junk science’ in the courtroom. But judicial decisions that appear to be based on ‘bad’ science may actually reflect the reality that the law requires a burden of proof, or confidence level, other than the 95 percent confidence level that is often used by scientists to reject the possibility that chance alone accounted for observed differences.”

The Carnegie Commission on Science, Technology, and Government, Report on Science and Technology in Judicial Decision Making 28 (1993)[9].

Resnik and Cranor’s rhetoric is commonplace in the courtroom. Here is how the strategy plays out at trial. Plaintiffs’ counsel elicit concessions from defense expert witnesses that they are using the “norms” and standards of science in presenting their opinions. Counsel then argue to the finder of fact that the defense experts are wonderful, but irrelevant, because the fact finder must decide the case under a lower standard. Support for this stratagem can be found in the writings of plaintiffs’ counsel and their expert witnesses[10]. The stratagem also shows up in the writings of law professors who are critical of the law’s embrace of scientific scruples in the courtroom[11].

This cacophony of error, from advocates and commentators alike, has led the courts into frequent error on the subject. Thus, Judge Pauline Newman, who sits on the United States Court of Appeals for the Federal Circuit, and who was a member of the Committee on the Development of the Third Edition of the Reference Manual on Scientific Evidence, wrote in one of her appellate opinions[12]:

“Scientists as well as judges must understand: ‘the reality that the law requires a burden of proof, or confidence level, other than the 95 percent confidence level that is often used by scientists to reject the possibility that chance alone accounted for observed differences’.”

Reaching back even further into the judiciary’s wrestling with the issue of the difference between legal and scientific standards of proof, we have one of the clearest and clearly incorrect statements of the matter[13]:

“Petitioners demand sole reliance on scientific facts, on evidence that reputable scientific techniques certify as certain. Typically, a scientist will not so certify evidence unless the probability of error, by standard statistical measurement, is less than 5%. That is, scientific fact is at least 95% certain.  Such certainty has never characterized the judicial or the administrative process. It may be that the ‘beyond a reasonable doubt’ standard of criminal law demands 95% certainty.  Cf. McGill v. United States, 121 U.S.App. D.C. 179, 185 n.6, 348 F.2d 791, 797 n.6 (1965). But the standard of ordinary civil litigation, a preponderance of the evidence, demands only 51% certainty. A jury may weigh conflicting evidence and certify as adjudicative (although not scientific) fact that which it believes is more likely than not. ***”

The 95% certainty appears to derive from 95% confidence intervals, although “confidence” is a technical term in statistics, and it most certainly does not mean the probability of the alternative hypothesis under consideration. Similarly, the probability that is less than 5% is not the probability that the null hypothesis is correct. The United States Court of Appeals for the District of Columbia Circuit thus fell for the rhetorical gambit, accepting the strawman that scientific certainty is 95%, whereas civil and administrative law require certainty only a smidgen above 50%.
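What a 95% confidence procedure actually guarantees is long-run coverage: across many repetitions, about 95% of the intervals it produces will contain the true parameter. A brief simulation (illustrative, with assumed normal data and a known standard deviation) shows that the 95% figure is a property of the procedure, not a statement that any particular hypothesis is 95% probable.

```python
# Illustrative simulation: "95% confidence" describes the procedure's
# long-run coverage frequency, not a 95% probability that any given
# hypothesis is true. All parameters below are assumed for the example.
import random

random.seed(0)
TRUE_MEAN, SD, N, TRIALS = 10.0, 2.0, 50, 10_000
Z = 1.96  # two-sided 95% normal critical value

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, SD) for _ in range(N)]
    mean = sum(sample) / N
    half_width = Z * SD / N ** 0.5      # known-sigma interval for simplicity
    if mean - half_width <= TRUE_MEAN <= mean + half_width:
        covered += 1

print(covered / TRIALS)  # close to 0.95 -- a frequency, not a certainty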

We should not be too surprised that courts have erroneously described burdens of proof in the realm of science. Even within purely legal contexts, judges have a very difficult time articulating exactly how different verbal formulations of the burden of proof translate into probability statements. In one of his published decisions, Judge Jack Weinstein reported an informal survey of judges of the Eastern District of New York on what they believed were the correct quantifications of the legal burdens of proof. The results confirm that judges, who must deal with burdens of proof first as lawyers and then as “umpires” on the bench, have no settled idea of how to translate verbal formulations into mathematical quantities:

U.S. v. Fatico, 458 F.Supp. 388 (E.D.N.Y. 1978). Thus one judge believed that “clear, unequivocal and convincing” required a higher level of proof (90%) than “beyond a reasonable doubt,” and no judge placed “beyond a reasonable doubt” above 95%. A majority of the judges polled placed the criminal standard below 90%.

In running down Elliott, Resnik, and Cranor’s assertions about burdens of proof, all I could find was the commonplace error involved in moving from 95% confidence to 95% certainty. Otherwise, I found scientists declaring that the burden of proof should rest with the scientist who is making the novel causal claim. Carl Sagan famously declaimed, “extraordinary claims require extraordinary evidence[14],” but he appears never to have succumbed to the temptation to provide a quantification of the posterior probability that would cinch the claim.

If anyone has evidence supporting Resnik’s claim, other than the transposition fallacy or the confusion of certainty with the coefficient of statistical confidence, please share it.


[1] The authors’ citation is to Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice (NY 2008). Professor Cranor teaches philosophy at one of the University of California campuses. He is neither a lawyer nor a scientist, but he does participate with some frequency as a consultant, and as an expert witness, in lawsuits, on behalf of claimants.

[2] See, e.g., In re Agent Orange Product Liab. Litig., 597 F. Supp. 740, 781 (E.D.N.Y. 1984) (Weinstein, J.) (“The distinction between avoidance of risk through regulation and compensation for injuries after the fact is a fundamental one.”), aff’d 818 F.2d 145 (2d Cir. 1987) (approving district court’s analysis), cert. denied sub nom. Pinkney v. Dow Chemical Co., 487 U.S. 1234 (1988).

[3] Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice (NY 2006).

[4] Carl F. Cranor, Regulating Toxic Substances: A Philosophy of Science and the Law at 33-34 (Oxford 1993) (“One can think of α, β (the chances of type I and type II errors, respectively), and 1 − β as measures of the ‘risk of error’ or ‘standards of proof’.”). See also id. at 44, 47, 55, 72-76.

[5] Id. (squaring 0.05 to arrive at “the chances of two such rare events occurring” as 0.0025).

[6] Michael D. Green, “Science Is to Law as the Burden of Proof is to Significance Testing: Book Review of Cranor, Regulating Toxic Substances: A Philosophy of Science and the Law,” 37 Jurimetrics J. 205 (1997) (taking Cranor to task for confusing significance and posterior (burden of proof) probabilities). At least one other reviewer was not as discerning as Professor Green and fell for Cranor’s fallacious analysis. Steven R. Weller, “Book Review: Regulating Toxic Substances: A Philosophy of Science and Law,” 6 Harv. J. L. & Tech. 435, 436, 437-38 (1993) (“only when the statistical evidence gathered from studies shows that it is more than ninety-five percent likely that a test substance causes cancer will the substance be characterized scientifically as carcinogenic … to determine legal causality, the plaintiff need only establish that the probability with which it is true that the substance in question causes cancer is at least fifty percent, rather than the ninety-five percent to prove scientific causality”).

[7] Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice 100 (2006) (incorrectly asserting, without further support, that “[t]he practice of setting α =.05 I call the “95% rule,” for researchers want to be 95% certain that when knowledge is gained [a study shows new results] and the null hypothesis is rejected, it is correctly rejected.”).

[8] Id. at 266.

[9] There were some scientists on the Commission’s Task Force, but most of the members were lawyers.

[10] Jan Beyea & Daniel Berger, “Scientific misconceptions among Daubert gatekeepers: the need for reform of expert review procedures,” 64 Law & Contemporary Problems 327, 328 (2001) (“In fact, Daubert, as interpreted by ‛logician’ judges, can amount to a super-Frye test requiring universal acceptance of the reasoning in an expert’s testimony. It also can, in effect, raise the burden of proof in science-dominated cases from the acceptable “more likely than not” standard to the nearly impossible burden of ‛beyond a reasonable doubt’.”).

[11] Lucinda M. Finley, “Guarding the Gate to the Courthouse:  How Trial Judges Are Using Their Evidentiary Screening Role to Remake Tort Causation Rules,” 49 DePaul L. Rev. 335, 348 n. 49 (1999) (“Courts also require that the risk ratio in a study be ‘statistically significant,’ which is a statistical measurement of the likelihood that any detected association has occurred by chance, or is due to the exposure. Tests of statistical significance are intended to guard against what are called ‘Type I’ errors, or falsely ascribing a relationship when there in fact is not one (a false positive).” Finley erroneously ignores the conditioning of the significance probability on the null hypothesis, and she suggests that statistical significance is sufficient for ascribing causality); Erica Beecher-Monas, Evaluating Scientific Evidence: An Interdisciplinary Framework for Intellectual Due Process 42 n. 30, 61 (2007) (“Another way of explaining this is that it describes the probability that the procedure produced the observed effect by chance.”) (“Statistical significance is a statement about the frequency with which a particular finding is likely to arise by chance.”).

[12] Hodges v. Secretary Dep’t Health & Human Services, 9 F.3d 958, 967 (Fed. Cir. 1993) (Newman, J., dissenting) (citing and quoting from the Report of the Carnegie Commission on Science, Technology, and Government, Science and Technology in Judicial Decision Making 28 (1993)).

[13] Ethyl Corp. v. EPA, 541 F.2d 1, 28 n.58 (D.C. Cir.), cert. denied, 426 U.S. 941 (1976).

[14] Carl Sagan, Broca’s Brain: Reflections on the Romance of Science 93 (1979).