Reinventing the Burden of Proof

If lawyers make antic claims that keep the courtrooms busy, law professors make antic proposals to suggest that the law is conceptually confused and misguided, to keep law reviews full.

A few years ago, an article by Professor Edward Cheng claimed that common law courts have failed to grasp the true meaning of burdens of proof. Edward K. Cheng, “Reconceptualizing the Burden of Proof,” 122 Yale L. J. 1254 (2013) [Cheng]. Every law student knows that the preponderance-of-the-evidence standard requires the party with the burden of proof to establish each element of the claim or defense to a probability greater than 50%. Cheng acknowledges that courts know this as well (citations omitted), but then he goes on to make some remarkable assertions.

First, Cheng suggests that the legal system has engaged in a “casual recharacterization of the burden of proof into p > 0.5 and p > 0.95.” Cheng at 1258. Being charitable, let’s say “characterization” rather than “recharacterization,” for Cheng cites nothing for his suggestion that there was some prior characterization that the law mischievously changed. Cheng at 1258.

Second, Cheng claims that the failure to deal with quantified posterior probabilities is the result of an educational or psychological deficiency of judges and lawyers:

“By comparison, the criminal beyond-a-reasonable-doubt standard is akin to a probability greater than 0.9 or 0.95. Perhaps, as most courts have ruled, the prosecution is not allowed to quantify ‘reasonable doubt’, but that is only an odd quirk of the math-phobic legal system.”

Cheng at 1256 (internal citations omitted). Cheng’s “recharacterization” has given way to his own mischaracterization of the legal system. There is a pandemic math phobia in the legal system, but the refusal to quantify the burden of proof in criminal cases has nothing to do with fear or mathematical incompetence. Most cases simply do not permit any rational or principled quantification of posterior probabilities. And even if they were to allow such a cognitive maneuver, most people, and even judges, cannot map practical certainty, or something like “beyond a reasonable doubt,” onto a probability scale of 0 to 1. No less than Judge Jack Weinstein, certainly a friend to the notion that “all evidence is probabilistic,” showed in his informal survey of federal judges of the Eastern District of New York that judges have no idea of what probability corresponds to the criminal burden of proof:

[Figure: U.S. v. Fatico, burden of proof]

U.S. v. Fatico, 458 F.Supp. 388 (E.D.N.Y. 1978). Judge Weinstein’s informal survey showed well enough that there is no real understanding of how to map reasonable doubt or its complement onto a scale of 0 to 1. Furthermore, for the vast majority of cases, there is simply no way to assign meaningful probabilities to events, causes, and states of mind, which make up the elements of claims and defenses in our legal system.

Third, Cheng makes much of the non-existence of absolute probabilities in legal contexts. The word “absolute” is used 14 times in his essay. This point is confusing as stated because no one, to my knowledge, has claimed that the burden of proof is an absolute probability that is stated or arrived at independently of evidence in the case. Plaintiffs and defendants can both have burdens of proof, on claims and defenses respectively, but for the sake of simplicity, let’s follow Cheng and describe the civil burden of proof as the plaintiff’s burden. The relevant probability is not the absolute probability P(Hπ), but rather the conditional posterior probability: P(Hπ | E).

Fourth, Cheng’s principal innovation, the introduction of a probability ratio as the true meaning and model of the burden of proof, has little or no support in case law or in evidence theory. Cheng cites virtually no cases, and only a few selected publications from the world of law reviews. Cheng proposes to recast burdens of proof as a ratio of conditional probabilities of the plaintiff’s and defendant’s “stories.” If the posterior probability of the plaintiff’s story at trial’s end is P(Hπ | E) [1], and the posterior probability of the defendant’s story is P(Hδ | E), then Cheng argues that the plaintiff has carried his burden of proof whenever

P(Hπ | E) / P(Hδ | E) > 1.0

This innovation seems fundamentally wrong for several reasons. Again, assuming that the plaintiff or the State has the burden of proof, the defendant has none. If the plaintiff presents no evidence, then the numerator will be zero, and the ratio will be zero. The defendant prevails, and Cheng’s theory holds. But if the plaintiff presents some evidence and the defendant presents none, then the denominator is zero and the ratio is undefined. Alternatively, we may see the ratio in this situation as approaching infinity as a limit, as the probability of the defendant’s “story” based upon his evidence approaches zero. On either interpretation of this scenario, the ratio Cheng invents is huge, and yet the plaintiff may well lose, as for instance when the plaintiff’s case is insufficient as a matter of law.

Cheng’s ratio theory thus fails as a descriptive theory. The theory appears to fail prescriptively as well. In most civil and criminal cases, the finder of fact is instructed that the defendant has no burden of proof and need not present any evidence at all. Even when the defendant has remained silent, and the plaintiff has presented a legally sufficient case, the fact finder may return a verdict for the defendant when P(Hπ | E) seems too low with respect to the burden of proof.

Let’s consider an example, perhaps not too far-fetched in some American courtrooms. The plaintiff claims that drug A has caused him to develop Syndrome Z. Plaintiff has no clinical trial, analytical epidemiologic, or animal evidence to support his claim. All the plaintiff can adduce is a so-called disproportionality analysis based upon the reporting of adverse events to the FDA. The defendant does not present any evidence of safety. The end point of interest in the lawsuit, Syndrome Z, was not observed in the trials, and was never looked for in any epidemiologic or toxicologic study. The defendant thus has no affirmative evidence of safety that counts for P(Hδ | E).

Assuming that the trial court does not toss this claim pretrial on a Rule 702 motion, or on a directed verdict, the defendant must address the plaintiff’s claim and the assertion that P(Hπ | E) > 0. The plaintiff supports his claim and assertion by presenting an expert witness who endorses the validity, accuracy, and probativeness of the disproportionality analysis. The defendant confronts this evidence solely on cross-examination, and not by trying to suggest that the plaintiff’s expert witness’s analysis is actually evidence of safety. The point of the cross-examination is to show that the proffered analysis is not a valid tool, and that it lacks accuracy and probativeness.

In this situation, the plaintiff’s P(Hπ | E) might have been greater than 0.5 at the end of direct examination, but if defense counsel has done his job, then by the end of the cross-examination P(Hπ | E) has fallen below 0.5. Perhaps at this stage of the proceedings, P(Hπ | E) < 0.01.

The defendant, having no affirmative evidence of safety, rests without presenting any evidence. P(Hδ | E) = 0. Alas, we cannot say that P(Hδ | E) is the complement of P(Hπ | E). In most cases, there is far too much room for ignorance, and for an indeterminate or unknown probability of Hδ. In this hypothetical, however, there is no evidence adduced for safety at all, only very weak and unreliable evidence of harm. The ratio is undefined, but the law would allow the dismissal of the plaintiff’s case, or would affirm a rational fact finder’s return of a defense verdict. And the law should do those things.
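The hypothetical can be put in a few lines of Python. This is only a minimal sketch under stated assumptions: the function names `ratio_rule` and `threshold_rule` are invented for illustration, the value 0.01 stands in for the "perhaps P(Hπ | E) < 0.01" of the hypothetical, and a zero denominator is treated on the limiting reading as an unbounded ratio. It is not Cheng's own formalization.

```python
import math


def ratio_rule(p_plaintiff: float, p_defendant: float) -> str:
    """Ratio reading: plaintiff prevails when P(Hpi|E) / P(Hdelta|E) > 1.0."""
    if p_defendant == 0.0:
        # Assumption: a zero denominator is read as a limit. The ratio is
        # unbounded if the plaintiff has any support, and indeterminate (NaN)
        # if neither side has any.
        ratio = math.inf if p_plaintiff > 0.0 else math.nan
    else:
        ratio = p_plaintiff / p_defendant
    # Any comparison with NaN is False, so the indeterminate case falls
    # to the defendant.
    return "plaintiff" if ratio > 1.0 else "defendant"


def threshold_rule(p_plaintiff: float, threshold: float = 0.5) -> str:
    """Conventional reading: plaintiff prevails only when P(Hpi|E) > 0.5."""
    return "plaintiff" if p_plaintiff > threshold else "defendant"


# Hypothetical values from the drug A / Syndrome Z example:
# weak, impeached evidence of harm and no affirmative evidence of safety.
p_harm, p_safety = 0.01, 0.0

print(ratio_rule(p_harm, p_safety))   # plaintiff
print(threshold_rule(p_harm))         # defendant
```

On these assumed inputs the two readings part ways: the ratio reading hands the case to the plaintiff on evidence worth perhaps one chance in a hundred, while the conventional threshold reading returns the defense verdict that the law would in fact allow.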

Fifth, Cheng commits other errors along the way to arriving at his ratio theory. In one instance, he commits a serious category mistake:

“Looking at the statistical world, we immediately see that characterizing any decision rule as a 0.5 probability threshold is odd. Statisticians rarely attempt to prove the truth of a proposition or hypothesis by using its absolute probability. Instead, hypothesis testing is usually comparative. There is a null hypothesis and an alternative hypothesis, and one is rejected in favor of the other depending on the evidence observed and the consistency of that evidence with the two hypotheses.”

Cheng at 1259 (internal citations omitted; emphasis added).

Again, Cheng is correct insofar as he suggests that statisticians do not often use absolute probabilities. Attained significance probabilities, whether used in hypothesis testing or otherwise, are conditional probabilities that describe the probability of observing the sample statistic, or one more extreme, based upon the statistical model and posited null hypothesis. Indeed, many methodologically rigorous statisticians and scientists would resist placing a quantified posterior probability on the truth of a proposition or hypothesis. The measures of probability may be helpful in identifying uncertainties due to random error, or even on occasion due to bias, but these measures do not translate into assigning the quantified posterior probabilities that Cheng wants and needs to make his ratio theory work. There is nothing, however, odd about using the quantified posterior probability of greater than 50% as a metaphor.

But whence comes rejecting one hypothesis “in favor of” another, as a matter of statistics? The null hypothesis is not accepted in the hypothesis test; rather it was assumed in order to conduct the test. The inference Cheng describes would be improper. In a footnote, Cheng asserts that “classical hypothesis testing strongly favors the null hypothesis,” but this conflates attained level of significance with posterior probabilities. Cheng at 1259 n. 12. Cheng states that “the null hypothesis can be given no specific preference,” in legal contexts, id., but this statement seems to ignore what it means for a party to have a burden of proving facts needed to establish its claim or defense.

Of course, over the course of multiple studies, which look at the issue repeatedly with increasingly precise and valid experiments and studies, and which consistently fail to reject a given null hypothesis, we sometimes do, as a matter of judgment, accept the null hypothesis. This situation has little to do with Cheng’s ratio theory, however.
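The distinction drawn above between an attained significance probability and a posterior probability can be illustrated with a toy calculation. The sketch below rests entirely on assumed, illustrative inputs that appear nowhere in Cheng's article: a coin-flip experiment with 8 heads in 10 tosses, a null hypothesis of a fair coin, a single point alternative of 0.8, and two arbitrary prior probabilities for the null. The helper names are likewise hypothetical.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def one_sided_p_value(k: int, n: int, p_null: float) -> float:
    """P(at least k successes | null): the observed data or more extreme."""
    return sum(binom_pmf(i, n, p_null) for i in range(k, n + 1))


def posterior_null(k: int, n: int, p_null: float, p_alt: float,
                   prior_null: float) -> float:
    """Bayes' theorem for the null against one point alternative."""
    weight_null = prior_null * binom_pmf(k, n, p_null)
    weight_alt = (1 - prior_null) * binom_pmf(k, n, p_alt)
    return weight_null / (weight_null + weight_alt)


n, k = 10, 8  # assumed data: 8 heads in 10 tosses

# The p-value is computed on the assumption that the null is true.
# No prior enters the calculation.
p_value = one_sided_p_value(k, n, 0.5)

# The posterior probability of the null changes with the assumed prior.
post_even_prior = posterior_null(k, n, 0.5, 0.8, prior_null=0.5)
post_skeptical_prior = posterior_null(k, n, 0.5, 0.8, prior_null=0.9)

print(f"p-value:                    {p_value:.4f}")
print(f"posterior, prior 0.5:       {post_even_prior:.4f}")
print(f"posterior, prior 0.9:       {post_skeptical_prior:.4f}")
```

The same data yield one p-value but different posterior probabilities for the null, depending on a prior and an alternative that the significance test never supplies. That is the sense in which a significance probability cannot simply be read as the posterior probability that a ratio theory requires.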


[1] Here P stands for probability, Hπ for the plaintiff’s “story,” and Hδ for the defendant’s story. P(Hπ | E) represents the posterior probability at trial’s end of the plaintiff’s story given the evidence, and P(Hδ | E) represents the posterior probability at trial’s end of the defendant’s story given the evidence.