So now that the new, fourth edition of the Reference Manual on Scientific Evidence[1] has been released, inquiring minds may want to know whether it has corrected the errors of the previous, third edition.[2] The authors of the new edition have had 14 years to ponder those errors and to correct them.
Judges and lawyers look to the Manual for guidance and understanding of basic concepts, and the first three editions contained significant errors in addressing statistical concepts. There is probably no better place to jump in to see whether the new edition has corrected the prevalent mistakes in defining the statistical concept of a confidence interval, which was botched in several chapters in the third edition.[3] The concept of a confidence interval is important in many statistical applications, but it is especially important in the interpretation of epidemiologic studies.
Contrition is good for the soul. The new edition, in places, evinces an awareness that earlier editions had misled readers, and that the fourth edition needed to do better. And in several key places, including in particular the statistics chapter, the fourth edition has improved its discussion of confidence intervals.
Professor David Kaye has two chapters in the new edition, one on DNA evidence, and another chapter, with Professor Hal Stern, on statistical evidence.[4] Kaye is a careful writer with substantial statistical expertise. His contributions to the third edition were anodyne treatments of statistical concepts, and his chapters in the new edition seem excellent as well upon first reading. In his chapter on DNA evidence, Kaye alludes to the misunderstandings and misrepresentations of the confidence interval,[5] and in his chapter on statistical evidence, Kaye, along with Stern, gives careful definitions and explications of confidence intervals.
Kaye and Stern call out several cases, frequently cited, for having given clearly incorrect definitions of confidence intervals. This sort of candor to the court is necessary if judges, and lawyers, are going to correct bad practices.[6] The statistics chapter in the fourth edition also does not shy away from calling out the authors of another chapter [epidemiology] in the Reference Manual’s third edition for having given erroneous definitions:
“Language from another reference guide in the previous edition of this Reference Manual that is often quoted may inadvertently convey the incorrect impression that a confidence coefficient such as 95% refers to the percentage of results in (hypothetically) repeated studies that would be expected to lie within the interval reported in the study before the court.”[7]
A very gentle criticism indeed; the epidemiology chapter was manifestly incorrect, and we can all agree that its error was negligent, not intentional. The epidemiology chapter from the third edition did not merely convey the incorrect impression; that chapter contained erroneous definitions of confidence intervals.
Kaye and Stern correctly note that a given confidence interval “does not give the probability that the unknown parameter lies within the confidence interval.”[8] And they helpfully point out that the true value has no tendency to lie closer to the point estimate at the center of a confidence interval than to any other value within the interval.[9]
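The correct, frequentist reading that Kaye and Stern give can be made concrete with a short simulation (my own sketch, with made-up numbers, not anything drawn from the Manual): the 95% figure describes the long-run behavior of the interval-generating procedure across repeated studies, not the probability that any single computed interval contains the parameter.

```python
import math
import random

random.seed(1)

TRUE_MEAN = 10.0   # the "unknown" parameter (knowable here only because we simulate)
SIGMA = 2.0        # known population standard deviation, for simplicity
N = 50             # observations per simulated "study"
STUDIES = 10_000

se = SIGMA / math.sqrt(N)
covered = 0
for _ in range(STUDIES):
    # each simulated study yields its own point estimate and its own interval
    xbar = sum(random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)) / N
    if xbar - 1.96 * se <= TRUE_MEAN <= xbar + 1.96 * se:
        covered += 1

# the long-run coverage of the procedure is about 95%
print(f"coverage across {STUDIES} simulated studies: {covered / STUDIES:.3f}")
```

Once a particular interval is computed, it either contains the true value or it does not; the 95% belongs to the procedure, not to that interval.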
The authors of the new edition’s chapter on epidemiology obviously got the message from Professors Kaye and Stern.[10] Fourth time is a charm. The epidemiology chapter in the third edition had been a mess on statistical issues.[11] Without any acknowledgment or confession of error committed in the first three editions, the authors of the epidemiology chapter in the fourth edition now note:
“Just as the p-value does not provide the probability that the risk estimate found in a study is correct, the confidence interval does not provide the range within which the true risk is likely to lie. In other words, it is a misconception to interpret a 95% confidence interval as representing an interval within which the true value has a 95% probability of being found.”[12]
Unfortunately, in the glossary at the end of the new edition’s epidemiology chapter, the erroneous definition of confidence interval was carried forward from the third edition, without change or correction:
“confidence interval. A range of values that reflects random error. Thus, if a confidence level of 0.95 is selected for a study, 95% of similar studies would result in the true relative risk falling within the confidence interval.”[13]
What the authors no doubt meant to write was that:
“95% of similar studies would result in the true relative risk falling within the confidence intervals.”
By putting “interval” in the singular, the authors fell into the trap described by Professors Kaye and Stern, and into the error committed by the epidemiology chapters of the previous editions.
The new edition of the Reference Manual appears to suffer, at least on this statistical issue, from the lack of high-level editing across chapters. The interaction between authors of the statistics and the epidemiology chapters sorted out a serious error, but the error pops up in new chapters. Michael Weisberg and Anastasia Thanukos have an introductory chapter on How Science Works, which crudely and incorrectly describes confidence intervals:
“Uncertainty and error are generally expressed as a range, within which we are confident that, if the study were repeated, the new result would fall. Scientists often use a 95% confidence interval for this purpose.”[14]
Confidence intervals model only random error, and the “range” around one point estimate does not give us “confidence” that the next point estimate would fall into that range.
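Indeed, the “next result would fall” reading fails even on its own terms. A quick simulation (a sketch under idealized normal assumptions, in the spirit of the Cumming & Maillardet article that the statistics chapter cites) shows that a replication’s point estimate lands inside the original study’s 95% confidence interval only about 83% of the time, not 95%:

```python
import math
import random

random.seed(2)

TRUE_MEAN, SIGMA, N = 0.0, 1.0, 40
se = SIGMA / math.sqrt(N)   # standard error of each study's point estimate
PAIRS = 20_000

captured = 0
for _ in range(PAIRS):
    m1 = random.gauss(TRUE_MEAN, se)   # original study's point estimate
    m2 = random.gauss(TRUE_MEAN, se)   # replication's point estimate
    # does the replication fall inside the ORIGINAL study's 95% CI?
    if m1 - 1.96 * se <= m2 <= m1 + 1.96 * se:
        captured += 1

# averages roughly 0.83, not 0.95, because both estimates vary
print(f"replications inside the original 95% CI: {captured / PAIRS:.3f}")
```

The shortfall arises because the replication’s estimate and the original interval are both subject to sampling error; the 95% was never a promise about where the next result would fall.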
The chapter on regression analyses in the third edition of the Reference Manual incorrectly defined confidence intervals.[15] Alas, the fourth edition did not auto-correct:
“Loosely speaking, a confidence interval represents an interval of values in which the true value of a regression coefficient falls within some pre-specified probability (where the true value is the estimate that would be obtained from the same model with a very large sample).”[16]
Why the authors of a highly technical chapter chose to speak loosely, rather than accurately, is a mystery. All the authors of the regression chapter had to do was refer to the accurate, helpful definitions in the statistics chapter.
Why should we care about the Reference Manual’s misleading, incorrect definitions of confidence intervals (or p-values for that matter)? The erroneous definitions and misuses typically place a Bayesian interpretation upon the confidence interval by claiming that the coefficient of confidence (typically 95% when alpha is set at 0.05) states the probability that the parameter, the true population measure, falls within the interval around the point estimate. This misinterpretation might suffice for a Bayesian 95% credible interval, but almost invariably the calculation under discussion is the point estimate ± 1.96 standard errors. Good statistics, like good grammar, costs nothing.
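The calculation at issue, point estimate ± 1.96 standard errors, can be sketched for a relative risk, the measure most relevant to the epidemiology chapter. The interval is conventionally computed on the log scale and then exponentiated; the 2×2 study counts below are hypothetical, chosen only for illustration:

```python
import math

# Hypothetical cohort-study counts (not from any cited study)
a, b = 30, 970    # exposed: cases, non-cases
c, d = 20, 980    # unexposed: cases, non-cases

rr = (a / (a + b)) / (c / (c + d))   # relative risk point estimate

# approximate standard error of log(RR)
se_log_rr = math.sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))

# point estimate ± 1.96 standard errors, on the log scale, then back-transformed
lo = math.exp(math.log(rr) - 1.96 * se_log_rr)
hi = math.exp(math.log(rr) + 1.96 * se_log_rr)

print(f"RR = {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")  # → RR = 1.50, 95% CI (0.86, 2.62)
```

Nothing in that arithmetic assigns a 95% probability to the parameter’s lying between 0.86 and 2.62; a Bayesian credible interval would require a prior and a different calculation altogether.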
Whether the conflation of confidence intervals with credible intervals results from ignorance or willful efforts to mislead, it is wrong. And the conflation is part of a long-running rhetorical campaign to mislead about the meaning of the burden of proof and statistical significance in order to abandon statistical tests, and to green-light precautionary principle judgments as “scientific.”[17]
In past posts, I have cited and quoted any number of scientists and lawyers who have engaged in the effort, either intentional or negligent, to mislead readers about the nature of science, by idealizing and falsely elevating the burden of proof in science, and declaring it to be different from the legal and regulatory burden of proof.[18]
To pick one particularly notorious author, consider junk science writer Naomi Oreskes.[19] In her 2010 book, Oreskes declares:
“The 95 percent confidence standard means that there is only 1 chance in 20 that you believe something that isn’t true.
* * * * *
That is a very high bar. It reflects a scientific worldview in which skepticism is a virtue, credulity is not.”[20]
In fact, in statistics, science, and law, the confidence interval has nothing to do with the burden of proof; rather, it reflects the precision of a single point estimate. Truth is a virtue that may be lost on the likes of Naomi Oreskes, but it is essential to litigating scientific issues. Given that many lawyers in the past had cited the Reference Manual’s chapter on epidemiology for its incorrect definitions of the statistical confidence interval, we should rejoice that this one error has been corrected.
[1] National Academies of Sciences, Engineering, and Medicine & Federal Judicial Center, REFERENCE MANUAL ON SCIENTIFIC EVIDENCE (4th ed. 2025) (cited as RMSE 4th ed.).
[2] National Academies of Sciences, Engineering, and Medicine & Federal Judicial Center, REFERENCE MANUAL ON SCIENTIFIC EVIDENCE (3rd ed. 2011) (cited as RMSE 3rd ed.).
[3] See Nathan Schachtman, Reference Manual – Desiderata for 4th Edition – Part IV – Confidence Intervals, TORTINI (Feb. 10, 2023).
[4] In RMSE 3rd ed., Professor Kaye, along with David Freedman, wrote the chapter on statistical evidence; the two gave careful definitions and explications of confidence intervals. Professor Freedman sadly died before the third edition was released, and he is replaced by Hal Stern in the chapter on statistics in the fourth edition.
[5] David H. Kaye, Reference Guide on Human DNA Identification Evidence in RMSE 4th ed. at 261, (noting that “the meaning of a confidence interval is subtle, and the estimate commonly is misconstrued”).
[6] See Kaye & Stern, RMSE 4th ed. at 511 n.125 (citing Turpin v. Merrell Dow Pharm., Inc., 959 F.2d 1349, 1353 (6th Cir. 1992) (“If a confidence interval of ‘95 percent between 0.8 and 3.10 is cited, this means that random repetition of the study should produce, 95 percent of the time, a relative risk somewhere between 0.8 and 3.10.”); Garcia v. Tyson Foods, Inc., 890 F. Supp. 2d 1273, 1285 (D. Kan. 2012) (“Dr. Radwin testified that his study was conducted within a confidence interval of 95 — that is ‘if I did this study over and over again, 95 out of a hundred times I would expect to get an average between that interval.’”); In re Silicone Gel Breast Implants Prods. Liab. Litig., 318 F. Supp. 2d 879, 897 (C.D. Cal. 2004) (“a margin of error between 0.5 and 8.0 at the 95% confidence level . . . means that 95 times out of 100 a study of that type would yield a relative risk value somewhere between 0.5 and 8.0”)).
[7] See Kaye & Stern, RMSE 4th ed. at 511 n.125 (citing Rhyne v. U.S. Steel Corp., 474 F. Supp. 3d 733, 744 (W.D.N.C. 2020) (“‘If a 95% confidence interval is specified, the range encompasses the results we would expect 95% of the time if samples for new studies were repeatedly drawn from the population.’ Reference Guide on Epidemiology, at 580.”)).
[8] Kaye & Stern, RMSE 4th ed. at 512 & n.126 (citing additional errant judicial decisions, and Geoff Cumming & Robert Maillardet, Confidence Intervals and Replication: Where Will the Next Mean Fall?, 11 PSYCH. METHODS 217 (2006)).
[9] Id. at 512.
[10] Steve C. Gold, Michael D. Green, Jonathan Chevrier, & Brenda Eskenazi, Reference Guide on Epidemiology, in RMSE 4th ed. at 897.
[11] Michael D. Green, D. Michal Freedman & Leon Gordis, Reference Guide on Epidemiology, 549, 573, 580, in RMSE 3rd ed.
[12] Steve C. Gold, Michael D. Green, Jonathan Chevrier, & Brenda Eskenazi, Reference Guide on Epidemiology, RMSE 4th ed. at 897, 939.
[13] Id. at 1011.
[14] Michael Weisberg & Anastasia Thanukos, How Science Works, in RMSE 4th ed. at 47, 90.
[15] Daniel Rubinfeld, Reference Guide on Multiple Regression, RMSE 3rd ed. at 303, 342, 352.
[16] Daniel Rubinfeld & David Card, Reference Guide on Multiple Regression and Advanced Statistical Models, in RMSE 4th ed. at 577, 613.
[17] Schachtman, Rhetorical Strategy in Characterizing Scientific Burdens of Proof, TORTINI (Nov. 11, 2014).
[18] See, e.g., Kevin C. Elliott & David B. Resnik, Science, Policy, and the Transparency of Values, 122 ENVT’L HEALTH PERSP. 647 (2014) (exemplifying the rhetorical strategy that idealizes and elevates a burden of proof in science, and then declaring it to be different from legal and regulatory burdens of proof).
[19] Schachtman, Playing Dumb on Statistical Significance, TORTINI (Jan. 4, 2015); The Rhetoric of Playing Dumb on Statistical Significance – Further Comments on Oreskes, TORTINI (Jan. 17, 2015).
[20] Naomi Oreskes & Erik M. Conway, MERCHANTS OF DOUBT: HOW A HANDFUL OF SCIENTISTS OBSCURED THE TRUTH ON ISSUES FROM TOBACCO SMOKE TO GLOBAL WARMING at 156-57 (2010).
