The Rhetoric of Playing Dumb on Statistical Significance – Further Comments on Oreskes

As a matter of policy, I leave the comment field turned off on this blog. I don’t have the time or patience to moderate discussions, but that is not to say that I don’t value feedback. Many readers have written, with compliments, concurrences, criticisms, and corrections. Some correspondents have given me valuable suggestions and materials. I believe I can say that aside from a few scurrilous emails, the feedback generally has been constructive, and welcomed.

My last post was on Naomi Oreskes’ opinion piece in the Sunday New York Times[1]. Professor Deborah Mayo asked me for permission to re-post the substance of this post, and to link to the original[2]. Mayo’s blog does allow for comments, and much to my surprise, the posts drew a great deal of attention, links, comment, and twittering. The number and intensity of the comments, as well as the other blog posts and tweets, seemed out of proportion to the point I was trying to make about misinterpreting confidence intervals and other statistical concepts. I suspect that some climate skeptics received my criticisms of Oreskes with a degree of schadenfreude, and that some who criticized me did so because they fear any challenge to Oreskes as a climate-change advocate. So be it. As I made clear in my post, I was not seeking to engage Oreskes on climate change or her judgments on that issue. What I saw in Oreskes’ article was the same rhetorical move made in the courtroom, and in scientific publications, in which plaintiffs and environmentalists attempt to claim a scientific imprimatur for their conclusions without adhering to the rigor required for scientific judgments[3].

Some of the comments about Professor Oreskes caused me to take a look at her recent book, Naomi Oreskes & Erik M. Conway, Merchants of Doubt: How a Handful of Scientists Obscured the Truth on Issues from Tobacco Smoke to Global Warming (N.Y. 2010). Interestingly, much of the substance of Oreskes’ newspaper article comes directly from this book. In the context of reporting on the dispute over the EPA’s meta-analysis of studies on passive smoking and lung cancer, Oreskes addressed the 95 percent issue:

“There’s nothing magic about 95 percent. It could be 80 percent. It could be 51 percent. In Vegas if you play a game with 51 percent odds in your favor, you’ll still come out ahead if you play long enough. The 95 percent confidence level is a social convention, a value judgment. And the value it reflects is one that says that the worst mistake a scientist can make is to fool herself: to think an effect is real when it is not. Statisticians call this a type I error. You can think of it as being gullible, naive, or having undue faith in your own ideas.89 To avoid it, scientists place the burden of proof on the person claiming a cause and effect. But there’s another kind of error – type 2 – where you miss effects that are really there. You can think of that as being excessively skeptical or overly cautious. Conventional statistics is set up to be skeptical and avoid type I errors. The 95 percent confidence standard means that there is only 1 chance in 20 that you believe something that isn’t true. That is a very high bar. It reflects a scientific worldview in which skepticism is a virtue, credulity is not.90 As one Web site puts it, ‘A type I error is often considered to be more serious, and therefore more important to avoid, than a type II error’.91 In fact, some statisticians claim that type 2 errors aren’t really errors at all, just missed opportunities.92”

Id. at 156-57 (emphasis added). Oreskes’ statement of the confidence interval, from her book, introduces further ambiguity by not specifying what the “something” is that one might wrongly believe to be true. Of course, if it is the assumed parameter, then she has made the same error as she did in the Times.
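A minimal formal statement of the distinction may help; the notation below is mine, not hers:

```latex
% Correct (frequentist) reading: the 95 percent attaches to the
% interval-generating procedure, with the parameter \theta fixed and
% unknown, and the interval endpoints random:
\[ \Pr\bigl( L(X) \le \theta \le U(X) \bigr) = 0.95 \]
% Incorrect reading: for a particular interval [l, u] computed from the
% data, the statement
\[ \Pr\bigl( l \le \theta \le u \bigr) = 0.95 \]
% is not licensed. The realized interval either contains \theta or it
% does not; the 95 percent is the long-run coverage frequency of the
% procedure, not the probability that this one interval caught \theta.
```

Oreskes’ further discussion of the EPA environmental tobacco smoke meta-analysis makes her meaning clearer, and her interpretation of statistical significance less defensible: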

“Even if 90 percent is less stringent than 95 percent, it still means that there is a 9 in 10 chance that the observed results did not occur by chance. Think of it this way. If you were nine-tenths sure about a crossword puzzle answer, wouldn’t you write it in?94”

Id. Throughout her discussion, Oreskes fails to acknowledge that the p-value assumes the correctness of the null hypothesis in order to assess the strength of the specific data as evidence against the null. As I have pointed out elsewhere, this misinterpretation of significance testing is a rhetorical strategy to evade its rigor, as well as to obscure the role of bias and confounding in accounting for data that differ from an expected value. A short simulation makes the point concrete.
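The script below is my own illustration; the sample sizes, effect size, and base rate of real effects are assumptions chosen for the sketch, and nothing in it comes from Oreskes’ book or the EPA meta-analysis. It shows why a test run at the 5 percent level cannot be read as giving “a 9 in 10 chance” (or 19 in 20) that an observed result is real: the p-value is computed on the assumption that the null is true, and the probability that a significant result reflects a real effect depends on how often real effects occur, which the p-value cannot know.

```python
import numpy as np
from scipy import stats

# All numbers below are assumptions chosen for illustration only.
rng = np.random.default_rng(0)
n_experiments = 20_000  # simulated studies
n_per_group = 50        # subjects per arm in each study
true_effect = 0.5       # standardized effect size when an effect is real
prop_real = 0.10        # assumed base rate of real effects

false_alarms = 0  # significant results where the null was in fact true
true_hits = 0     # significant results where an effect was in fact real

for _ in range(n_experiments):
    effect_is_real = rng.random() < prop_real
    mu = true_effect if effect_is_real else 0.0
    treated = rng.normal(mu, 1.0, n_per_group)
    control = rng.normal(0.0, 1.0, n_per_group)
    _, p = stats.ttest_ind(treated, control)
    if p < 0.05:
        if effect_is_real:
            true_hits += 1
        else:
            false_alarms += 1

# Significance testing controls Pr(p < 0.05 | null true) at 5 percent.
# It says nothing about Pr(null true | p < 0.05), which is what the
# "9 in 10 chance" reading would require.
share = false_alarms / (false_alarms + true_hits)
print(f"Share of significant results that are false alarms: {share:.2f}")
```

With these assumed inputs, the printed share comes out on the order of 40 percent: every study was tested at the conventional 5 percent level, and yet far more than 1 in 20 of the “significant” findings are spurious, because most of the hypotheses tested were null to begin with.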

Oreskes also continues to maintain that a failure to reject the null is playing “dumb” and placing:

“the burden of proof on the victim, rather than, for example, the manufacturer of a harmful product, and we may fail to protect some people who are really getting hurt.”

Id. So again, the same petitio principii as we saw in the Times. Victimhood is exactly what remains to be established. Oreskes cannot assume it, and then criticize time-tested methods that fail to deliver a confirmatory judgment.

There are endnotes in her book, but the authors fail to cite any serious statistics text. The only statistics reference cited, and a dubious one at that, is a University of Michigan Press book, Stephen T. Ziliak & Deirdre N. McCloskey, The Cult of Statistical Significance (2008). Enough said[4].

With a little digging, I learned that Oreskes and Conway are science fiction writers, and perhaps we should judge them by literary rather than scientific standards. See Naomi Oreskes & Erik M. Conway, “The Collapse of Western Civilization: A View from the Future,” 142 Dædalus 41 (2013). I do not imply any pejorative judgment of Oreskes for advancing her apocalyptic vision of the future of Earth’s environment as a work of fiction. Her literary work is a worthy thought experiment that has the potential to lead us to accept her precautionary judgments; and at least her publication in Dædalus is clearly labeled science fiction.

Oreskes’ future fantasy is, not surprisingly, exactly what Oreskes, the historian of science, now predicts in terms of catastrophic environmental change. Looking back from the future, the science fiction authors attempt to explore the historical origins of the catastrophe, only to discover that it is the fault of everyone who disagreed with Naomi Oreskes in the early 21st century. Heavy blame is laid at the feet of the ancestor scientists (Oreskes’ contemporaries) who insisted upon scientific and statistical standards for inferring conclusions from observational data. Implicit in the science fiction tale is the welcome acknowledgment that science should make accurate predictions.

In Oreskes’ science fiction, these scientists of yesteryear, today’s adversaries of climate-change advocates, were “almost childlike” in their felt need to adopt “strict” standards, and in their adherence to severe tests derived from their ancestors’ religious asceticism. In other words, significance testing is a form of self-flagellation. Lest you think I exaggerate, consider the actual words of Oreskes and Conway:

“In an almost childlike attempt to demarcate their practices from those of older explanatory traditions, scientists felt it necessary to prove to themselves and the world how strict they were in their intellectual standards. Thus, they placed the burden of proof on novel claims, including those about climate. Some scientists in the early twenty-first century, for example, had recognized that hurricanes were intensifying, but they backed down from this conclusion under pressure from their scientific colleagues. Much of the argument surrounded the concept of statistical significance. Given what we now know about the dominance of nonlinear systems and the distribution of stochastic processes, the then-dominant notion of a 95 percent confidence limit is hard to fathom. Yet overwhelming evidence suggests that twentieth-century scientists believed that a claim could be accepted only if, by the standards of Fisherian statistics, the possibility that an observed event could have happened by chance was less than 1 in 20. Many phenomena whose causal mechanisms were physically, chemically, or biologically linked to warmer temperatures were dismissed as ‘unproven’ because they did not adhere to this standard of demonstration.

“Historians have long argued about why this standard was accepted, given that it had no substantive mathematical basis. We have come to understand the 95 percent confidence limit as a social convention rooted in scientists’ desire to demonstrate their disciplinary severity. Just as religious orders of prior centuries had demonstrated moral rigor through extreme practices of asceticism in dress, lodging, behavior, and food–in essence, practices of physical self-denial–so, too, did natural scientists of the twentieth century attempt to demonstrate their intellectual rigor through intellectual self-denial.14 This practice led scientists to demand an excessively stringent standard for accepting claims of any kind, even those involving imminent threats.”

142 Dædalus at 44.

The science fiction piece in Dædalus has now morphed into a short book, which is billed within as a “haunting, provocative work of science-based fiction.” Naomi Oreskes & Erik M. Conway, The Collapse of Western Civilization: A View from the Future (N.Y. 2014). Under the cover of fiction, Oreskes and Conway provide their idiosyncratic, fictional definition of statistical significance, in a “Lexicon of Archaic Terms,” at the back of the book:

“statistical significance  The archaic concept that an observed phenomenon could only be accepted as true if the odds of it happening by chance were very small, typically taken to be no more than 1 in 20.”

Id. at 61-62. Of course, in writing fiction, you can make up anything you like. Caveat lector.

[1] See “Playing Dumb on Statistical Significance” (Jan. 4, 2015).

[2] See “Significance Levels are Made a Whipping Boy on Climate Change Evidence: Is .05 Too Strict? (Schachtman on Oreskes)” (Jan. 4, 2015).

[3] See “Rhetorical Strategy in Characterizing Scientific Burdens of Proof” (Nov. 15, 2014).

[4] See “The Will to Ummph” (Jan. 10, 2012).