Lawyers and judges pay close attention to standards, guidance documents, and consensus statements from respected and recognized professional organizations. Deviations from these standards may be presumptive evidence of malpractice or malfeasance in civil and criminal litigation, in regulatory matters, and in other contexts. One important, recurring situation arises when trial judges must act as gatekeepers of the admissibility of expert witness opinion testimony. In making this crucial judicial determination, judges will want to know whether a challenged expert witness has deviated from an accepted professional standard of care or practice.
In 2016, the American Statistical Association (ASA) published a consensus statement on p-values. The ASA statement grew out of a lengthy process that involved assembling experts of diverse viewpoints. In October 2015, the ASA convened a two-day meeting at which 20 experts discussed areas of core agreement. Over the following three months, the participating experts and the ASA Board members continued their discussions, which led to the ASA Executive Committee’s approval of the statement that was published in March 2016.[1]
The ASA 2016 Statement spelled out six relatively uncontroversial principles of basic statistical practice.[2] Far from rejecting statistical significance, the six principles embraced statistical tests as an important but insufficient basis for scientific conclusions:
“3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.”
Despite the fairly clear and careful statement of principles, legal actors did not take long to misrepresent the ASA principles.[3] What had been a prescription about the insufficiency of p-value thresholds was distorted into strident assertions that statistical significance was unnecessary for scientific conclusions.
Three years after the ASA published its p-value consensus document, ASA Executive Director Ronald Wasserstein and two other statisticians published an editorial in a supplemental issue of The American Statistician, in which they called for the abandonment of significance testing.[4] Although Wasserstein’s editorial was clearly labeled as such, his essay introduced the special journal issue, and it appeared without any disclaimer, over his name and his official title as ASA Executive Director.
Sowing further confusion, the editorial made the following pronouncement:[5]
“The [2016] ASA Statement on P-Values and Statistical Significance stopped just short of recommending that declarations of ‘statistical significance’ be abandoned. We take that step here. We conclude, based on our review of the articles in this special issue and the broader literature, that it is time to stop using the term “statistically significant” entirely. Nor should variants such as ‘significantly different’, ‘p < 0.05’, and ‘nonsignificant’ survive, whether expressed in words, by asterisks in a table, or in some other way.”
The ASA is a collective body, and its 2016 Statement was a statement from that body, which spoke after lengthy deliberation and debate. The language quoted above moves, within one paragraph, from the ASA Statement to the royal “We,” who are taking the step of abandoning the term “statistically significant.” Given the unqualified use of the collective first-person pronoun in the same paragraph that refers to the ASA, combined with Ronald Wasserstein’s official capacity, and the complete absence of a disclaimer that this pronouncement was simply a personal opinion, a reasonable reader could hardly avoid concluding that the pronouncement reflected ASA policy.
Your humble blogger, and others, read Wasserstein’s 2019 editorial as an ASA statement.[6] Although it is true that the 2019 paper is labeled “editorial,” and that the editorial does not describe a consensus process, there is no disclaimer such as is customary when someone in an official capacity publishes a personal opinion. Indeed, rather than the usual disclaimer, the Wasserstein editorial thanks the ASA Board of Directors “for generously and enthusiastically supporting the ‘p-values project’ since its inception in 2014.” This acknowledgement strongly suggests that the editorial is itself part of the “p-values project,” which is “enthusiastically” supported by the ASA Board of Directors.
If the editorial were not itself confusing enough, an unsigned email from “ASA <asamail@amstat.org>” was sent out in July 2019, in which the anonymous ASA author(s) took credit for changing statistical guidelines at the New England Journal of Medicine:[7]
From: ASA <asamail@amstat.org>
Date: Thu, Jul 18, 2019 at 1:38 PM
Subject: Major Medical Journal Updates Statistical Policy in Response to ASA Statement
To: <XXXX>
The email is itself an ambiguous piece of evidence as to what the ASA is claiming. The email says that the New England Journal of Medicine changed its guidelines “in response to the ASA Statement on P-values and Statistical Significance and the subsequent The American Statistician special issue on statistical inference.” Of course, the “special issue” was not just Wasserstein’s editorial, but also the 42 other papers. So this claim leaves open to doubt exactly what in the 2019 special issue the NEJM editors were responding to. Given that the 42 articles that followed Wasserstein’s editorial did not all agree with Wasserstein’s “steps taken,” or with each other, the only landmark in the special issue was the editorial over the name of the ASA’s Executive Director.
Moreover, a reading of the NEJM revised guidelines does not suggest that the journal’s editors were unduly influenced by the Wasserstein editorial or the 42 accompanying papers. The journal mostly responded to the ASA 2016 consensus paper by putting some teeth into its Principle 4, which dealt with multiplicity concerns in submitted manuscripts. The newly adopted (2019) NEJM author guidelines do not take the step urged by Wasserstein and colleagues; there is no general prohibition on p-values or statements of “statistical significance.”
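The multiplicity concern behind Principle 4 is easy to illustrate with a back-of-the-envelope calculation. Below is a minimal sketch, using hypothetical p-values, of the simplest family-wise correction (Bonferroni); the NEJM guidelines do not mandate this particular method, and the endpoints and numbers here are invented for illustration only:

```python
from math import isclose

def bonferroni(p_values, alpha=0.05):
    """Bonferroni control of the family-wise error rate: each p-value
    is compared against alpha divided by the number of tests run."""
    threshold = alpha / len(p_values)
    return [(p, p <= threshold) for p in p_values]

# Five hypothetical secondary endpoints tested in one trial.
# Unadjusted, three of these would be "significant" at 0.05;
# against the adjusted threshold of 0.05 / 5 = 0.01, only one survives.
results = bonferroni([0.003, 0.02, 0.04, 0.30, 0.81])
for p, significant in results:
    print(p, significant)
```

The point of the exercise is simply that testing many endpoints at an unadjusted 0.05 threshold inflates the chance of at least one spurious “significant” finding, which is the practice the revised NEJM guidelines police.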
The confusion propagated by the Wasserstein 2019 editorial has not escaped the attention of other ASA officials. An editorial in the June 2019 issue of AmStat News, by ASA President Karen Kafadar, noted the prevalent confusion and uneasiness over the 2019 The American Statistician special issue, the lack of consensus, and the need for healthy debate.[8]
In this month’s issue of AmStat News, President Kafadar returned to the issue of the confusion over the 2019 ASA special issue of The American Statistician, in her “President’s Corner.” Because Executive Director Wasserstein’s editorial language about “we now take this step” is almost certain to find its way into opportunistic legal briefs, Kafadar’s comments are worth noting in some detail:[9]
“One final challenge, which I hope to address in my final month as ASA president, concerns issues of significance, multiplicity, and reproducibility. In 2016, the ASA published a statement that simply reiterated what p-values are and are not. It did not recommend specific approaches, other than ‘good statistical practice … principles of good study design and conduct, a variety of numerical and graphical summaries of data, understanding of the phenomenon under study, interpretation of results in context, complete reporting and proper logical and quantitative understanding of what data summaries mean’.
The guest editors of the March 2019 supplement to The American Statistician went further, writing: ‘The ASA Statement on P-Values and Statistical Significance stopped just short of recommending that declarations of “statistical significance” be abandoned. We take that step here. … [I]t is time to stop using the term “statistically significant” entirely’.
Many of you have written of instances in which authors and journal editors – and even some ASA members – have mistakenly assumed this editorial represented ASA policy. The mistake is understandable: The editorial was coauthored by an official of the ASA. In fact, the ASA does not endorse any article, by any author, in any journal – even an article written by a member of its own staff in a journal the ASA publishes.”
Kafadar’s caveat should quash incorrect assertions about the ASA’s position on statistical significance testing. It is a safe bet, however, that such assertions will appear in trial and appellate briefs.
Statistical reasoning is difficult enough for most people, but the hermeneutics of American Statistical Association publications on statistical significance may require a doctorate of divinity degree. In a cleverly titled post, Professor Deborah Mayo argues that there is no other way to interpret the Wasserstein 2019 editorial except as laying down an ASA prescription. Deborah G. Mayo, “Les stats, c’est moi,” Error Statistics Philosophy (Dec. 13, 2019). I accept President Kafadar’s correction at face value, and accept that I, like many other readers, misinterpreted the Wasserstein editorial as having the imprimatur of the ASA. Mayo points out, however, that Kafadar’s correction in a newsletter may be insufficient at this point, and that a stronger disclaimer is required. Officers of the ASA are certainly entitled to their opinions and the opportunity to present them, but disclaimers would bring clarity and transparency to the published work of these officials.
Wasserstein’s 2019 editorial goes further to make a claim about how his “step” will ameliorate the replication crisis:
“In this world, where studies with ‘p < 0.05’ and studies with ‘p > 0.05’ are not automatically in conflict, researchers will see their results more easily replicated – and, even when not, they will better understand why.”
The editorial here seems to be attempting to define replication failure out of existence. This claim, as stated, is problematic. A sophisticated practitioner may think of the situation in which two studies, one with p = 0.048 and another with p = 0.052, might be said not to be in conflict. In real-world litigation, however, advocates will take Wasserstein’s statement about studies not in conflict (despite p-values above and below a threshold, say 5%) to extremes. We can anticipate claims that two similar studies with p-values above and below 5%, say one at 0.04 and the other at 0.40, will be described as not in conflict, with the second a replication of the first. It is hard to see how this possible interpretation of Wasserstein’s editorial, although consistent with its language, will advance sound, replicable science.[10]
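The arithmetic behind both scenarios can be sketched directly. The following is a minimal illustration, with hypothetical effect estimates and standard errors (say, on a log-odds scale) and a normal approximation; the numbers are invented to reproduce the p-values discussed above, not drawn from any actual study:

```python
from math import erf, sqrt

def p_value(estimate, se):
    """Two-sided p-value for H0: true effect = 0, normal approximation."""
    z = abs(estimate / se)
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

def conflict_p(est1, se1, est2, se2):
    """Two-sided p-value testing whether two estimates differ from
    each other (a crude formal test of 'conflict' between studies)."""
    return p_value(est1 - est2, sqrt(se1**2 + se2**2))

# Scenario 1: two studies with the same effect estimate and nearly the
# same precision; their p-values straddle 0.05 yet the evidence is
# nearly identical, so "not in conflict" is a fair description.
print(p_value(0.30, 0.151))  # just under 0.05
print(p_value(0.30, 0.155))  # just over 0.05

# Scenario 2: same point estimate, but one study is far less precise,
# giving p-values near 0.04 and 0.40. A formal difference test still
# reports "no conflict" (p = 1.0 here, since the estimates are equal),
# which is exactly the loophole litigation advocates could exploit.
print(p_value(0.30, 0.146))  # about 0.04
print(p_value(0.30, 0.357))  # about 0.40
print(conflict_p(0.30, 0.146, 0.30, 0.357))
```

The sketch shows why “not automatically in conflict” is defensible at the margin (0.048 versus 0.052) but becomes an evasion when stretched across very different quanta of evidence (0.04 versus 0.40).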
[1] Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” 70 The Am. Statistician 129 (2016).
[2] “The American Statistical Association’s Statement on and of Significance” (Mar. 17, 2016).
[3] See, e.g., “The Education of Judge Rufe – The Zoloft MDL” (April 9, 2016) (Zoloft litigation); “The ASA’s Statement on Statistical Significance – Buzzing from the Huckabees” (Mar. 19, 2016); “The American Statistical Association Statement on Significance Testing Goes to Court – Part I” (Nov. 13, 2018).
[4] Ronald L. Wasserstein, Allen L. Schirm, and Nicole A. Lazar, “Editorial: Moving to a World Beyond ‘p < 0.05’,” 73 Am. Statistician S1, S2 (2019).
[5] Id. at S2.
[6] See “Has the American Statistical Association Gone Post-Modern?” (Mar. 24, 2019); Deborah G. Mayo, “The 2019 ASA Guide to P-values and Statistical Significance: Don’t Say What You Don’t Mean,” Error Statistics Philosophy (June 17, 2019); B. Haig, “The ASA’s 2019 update on P-values and significance,” Error Statistics Philosophy (July 12, 2019).
[7] See “Statistical Significance at the New England Journal of Medicine” (July 19, 2019); See also Deborah G. Mayo, “The NEJM Issues New Guidelines on Statistical Reporting: Is the ASA P-Value Project Backfiring?” Error Statistics Philosophy (July 19, 2019).
[8] See Karen Kafadar, “Statistics & Unintended Consequences,” AmStat News 3, 4 (June 2019).
[9] Karen Kafadar, “The Year in Review … And More to Come,” AmStat News 3 (Dec. 2019).
[10] See also Deborah G. Mayo, “P‐value thresholds: Forfeit at your peril,” 49 Eur. J. Clin. Invest. e13170 (2019).