The American Statistical Association’s Statement on and of Significance

In scientific circles, some commentators have so zealously criticized the use of p-values that they have left uninformed observers with the impression that random error is not an interesting or important consideration in evaluating the results of a scientific study. In legal circles, counsel for the litigation industry and their expert witnesses have argued, duplicitously, that statistical significance is unimportant when absent, but conclusive of causation when present. The recently published Statement of the American Statistical Association (“ASA”) restores some sanity to the scientific and legal discussions of statistical significance and p-values. Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” The American Statistician (Mar. 7, 2016) (in press), DOI: 10.1080/00031305.2016.1154108, <http://dx.doi.org/10.1080/00031305.2016.1154108>.

Recognizing that sound statistical practice and communication affect research and public policy decisions, the ASA has published a statement of interpretative principles for statistical significance and p-values. The ASA’s statement first and foremost points out that the soundness of scientific conclusions turns on more than statistical methods alone. Study design, conduct, and evaluation often involve more than a statistical test result. And the ASA goes on to note, contrary to the contrarians, that “the p-value can be a useful statistical measure,” although this measure of attained significance probability “is commonly misused and misinterpreted.” ASA at 7. No news there.

The ASA’s statement puts forth six principles, all of which have substantial implications for how statistical evidence is received and interpreted in courtrooms. All are worthy of consideration by legal actors – legislatures, regulators, courts, lawyers, and juries.

1. “P-values can indicate how incompatible the data are with a specified statistical model.”

The ASA notes that a p-value shows the “incompatibility between a particular set of data and a proposed model for the data.” Although there are some in the statistical world who rail against null hypotheses of no association, the ASA reports that “[t]he most common context” for p-values consists of a statistical model that includes a set of assumptions, including a “null hypothesis,” which often postulates the absence of association between exposure and outcome under study. The ASA statement explains:

The smaller the p-value, the greater the statistical incompatibility of the data with the null hypothesis, if the underlying assumptions used to calculate the p-value hold. This incompatibility can be interpreted as casting doubt on or providing evidence against the null hypothesis or the underlying assumptions.
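By way of concrete illustration (my example, not the ASA’s), the short Python sketch below computes a p-value for a hypothetical two-group comparison; the measurements and group labels are invented for the example.

```python
# Hypothetical two-group comparison: the p-value measures how
# incompatible the observed data are with a "no difference" null model.
from scipy import stats

# Invented outcome measurements for an exposed and an unexposed group.
exposed = [4.1, 5.0, 4.8, 5.6, 4.9, 5.3, 4.7, 5.1]
unexposed = [4.0, 4.4, 4.2, 4.9, 4.3, 4.6, 4.1, 4.5]

# Two-sample t-test; the null hypothesis is that the groups share the
# same population mean, together with the test's usual assumptions
# (independent observations, roughly normal populations, equal variances).
t_stat, p_value = stats.ttest_ind(exposed, unexposed)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A small p-value signals incompatibility between the data and the null
# model; it does not, by itself, say which assumption failed or why.
```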

Some lawyers want to overemphasize statistical significance when present, but to minimize its importance when absent. They will find no support in the ASA’s statement.

2. “P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.”

Of course, there are those who would misinterpret the meaning of p-values, but the flaw lies in the interpreters, not in the statistical concept.
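The confusion can be made vivid with a simulation of my own devising; the 10% base rate of true effects, the effect size, and the sample sizes are all assumptions chosen for illustration.

```python
# A sketch of why p < 0.05 is not "95% probability the hypothesis is true."
# Assume (hypothetically) that only 10% of studied exposures have a real
# effect; simulate many studies and look at what ends up "significant."
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies, n_per_group = 5_000, 25
real_effect = rng.random(n_studies) < 0.10    # 10% true effects
true_shift = np.where(real_effect, 0.5, 0.0)  # modest effect when real

false_hits = true_hits = 0
for k in range(n_studies):
    a = rng.normal(0.0, 1.0, n_per_group)
    b = rng.normal(true_shift[k], 1.0, n_per_group)
    if stats.ttest_ind(a, b).pvalue < 0.05:
        if real_effect[k]:
            true_hits += 1
        else:
            false_hits += 1

print(f"significant results: {true_hits + false_hits}")
print(f"fraction of those arising from true nulls: "
      f"{false_hits / (true_hits + false_hits):.2f}")
# Under these assumed base rates and power, roughly half of the
# "significant" findings come from hypotheses that are false.
```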

3. “Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.”

Note that the ASA did not say that statistical significance is irrelevant to scientific conclusions. Of course, statistical significance is but one factor, which does not begin to account for study validity, data integrity, or model accuracy. The ASA similarly criticizes the use of statistical significance as a “bright line” mode of inference, without consideration of the contextual considerations of “the design of a study, the quality of the measurements, the external evidence for the phenomenon under study, and the validity of assumptions that underlie the data analysis.” Criticizing the use of “statistical significance” as singularly assuring the correctness of scientific judgment does not, however, mean that “statistical significance” is irrelevant or unimportant as a consideration in a much more complex decision process.
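A toy example (all numbers invented) shows why the bright line misleads: two datasets that differ in a single observation can fall on opposite sides of the 0.05 threshold while telling essentially the same scientific story.

```python
# Bright-line trouble: changing one observation moves p across 0.05,
# though the two hypothetical datasets are substantively alike.
from scipy import stats

control = [5.1, 4.8, 5.3, 4.9, 5.2, 5.0, 4.7, 5.1]
treated_a = [5.35, 5.05, 5.55, 5.15, 5.45, 5.25, 4.95, 5.35]
treated_b = treated_a[:-1] + [5.0]   # last observation changed

for label, treated in (("A", treated_a), ("B", treated_b)):
    p = stats.ttest_ind(control, treated).pvalue
    print(f"dataset {label}: p = {p:.3f}")
# With these invented numbers, A lands below 0.05 (roughly 0.03) and B
# above it (roughly 0.07) -- hardly a difference on which causation
# should turn.
```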

4. “Proper inference requires full reporting and transparency.”

The ASA explains that the proper inference from a p-value can be completely undermined by “multiple analyses” of study data, with selective reporting of sample statistics that have attractively low p-values, or cherry-picking of suggestive study findings. The ASA points out that such common practices of selective reporting compromise valid interpretation. Hence the correlative recommendation:

Researchers should disclose the number of hypotheses explored during the study, all data collection decisions, all statistical analyses conducted and all p-values computed. Valid scientific conclusions based on p-values and related statistics cannot be drawn without at least knowing how many and which analyses were conducted, and how those analyses (including p-values) were selected for reporting.

ASA Statement. See also “Courts Can and Must Acknowledge Multiple Comparisons in Statistical Analyses” (Oct. 14, 2014).
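The force of the recommendation is easy to demonstrate. In the sketch below (my simulation, with arbitrary choices of 20 tests and 30 subjects per group), every comparison is drawn from a true null, yet “significant” findings surface routinely.

```python
# Multiple comparisons: when a study runs 20 independent tests of true
# nulls, the chance of at least one p < 0.05 is 1 - 0.95**20, about 64%.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n_tests, n = 2_000, 20, 30
hits = 0
for _ in range(n_sims):
    pvals = [stats.ttest_ind(rng.normal(size=n), rng.normal(size=n)).pvalue
             for _ in range(n_tests)]
    if min(pvals) < 0.05:
        hits += 1

print(f"share of simulated studies with a 'significant' result: "
      f"{hits / n_sims:.2f}")   # close to 1 - 0.95**20, about 0.64
# Reporting only the winning comparison hides the 19 others that were run.
```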

5. “A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.”

The ASA notes the commonplace distinction between statistical and practical significance. The independence of statistical from practical significance does not, however, make statistical significance irrelevant, especially in legal and regulatory contexts, in which parties claim that a risk, however small, is relevant. Of course, we want the claimed magnitude of association to be relevant, but we also need the measured association to be accurate and precise.
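A worked example (invented figures, chosen by me) makes the distinction plain: with a million observations per group, a difference of one tenth of a point on a scale with a standard deviation of 15 is practically meaningless, yet overwhelmingly “significant.”

```python
# Statistical vs. practical significance: an enormous sample makes a
# trivial difference "significant." All numbers are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 1_000_000
a = rng.normal(100.0, 15.0, n)    # e.g., a test score with SD 15
b = rng.normal(100.1, 15.0, n)    # true difference of 0.1 points

result = stats.ttest_ind(a, b)
print(f"observed mean difference: {b.mean() - a.mean():.3f}")
print(f"p = {result.pvalue:.1e}")  # vanishingly small p, trivial effect
```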

6. “By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.”

Of course, a p-value cannot validate the statistical model that is assumed in calculating it. Contrary to the hyperbolic claims one sees in litigation, the ASA notes that “a p-value near 0.05 taken by itself offers only weak evidence against the null hypothesis.” And so the ASA counsels that “data analysis should not end with the calculation of a p-value when other approaches are appropriate and feasible.”
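One way to see just how weak, borrowed from the Sellke, Bayarri &amp; Berger calibration (my illustration, not the ASA’s): for p &lt; 1/e, the Bayes factor in favor of the null can be no smaller than -e·p·ln(p) under their assumptions about the class of alternatives.

```python
# Sellke-Bayarri-Berger bound: for p < 1/e, the Bayes factor in favor
# of the null satisfies BF >= -e * p * ln(p).
import math

for p in (0.05, 0.01, 0.001):
    bound = -math.e * p * math.log(p)
    print(f"p = {p:.3f}: BF for the null >= {bound:.3f} "
          f"(at most {1 / bound:.1f} : 1 odds against the null)")
# p = 0.05 can never amount to more than roughly 2.5 : 1 odds against
# the null -- weak evidence, just as the ASA says.
```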

What is important, however, is that the ASA never suggests that significance testing or measurement of significance probability is not an important and relevant part of the process. To be sure, the ASA notes that because of “the prevalent misuses of and misconceptions concerning p-values, some statisticians prefer to supplement or even replace p-values with other approaches.”

First among these other methods, unsurprisingly, is estimation with assessment of confidence intervals, although the ASA includes Bayesian and other methods as well. There are some who express irrational exuberance about the potential of Bayesian methods to restore confidence in scientific process and conclusions. Bayesian approaches are less manipulated than frequentist ones, largely because very few people use Bayesian methods, and even fewer people really understand them.
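As a sketch of what estimation looks like in practice (the data and the vague prior are inventions for the example), a frequentist confidence interval and a simple conjugate Bayesian interval can be computed side by side:

```python
# The "estimation" alternative: report an effect size with a confidence
# interval, and (if one is Bayesian) a posterior interval. The paired
# differences below are simulated, not real data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
diffs = rng.normal(0.4, 1.0, 40)   # hypothetical paired differences

# Frequentist: mean difference with a 95% confidence interval.
mean = diffs.mean()
sem = stats.sem(diffs)
lo, hi = stats.t.interval(0.95, df=len(diffs) - 1, loc=mean, scale=sem)
print(f"mean difference {mean:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")

# Bayesian (conjugate sketch): with a vague normal prior and the variance
# treated as known, the posterior for the mean is approximately normal
# around the sample mean, giving a credible interval of similar width.
post = stats.norm(loc=mean, scale=sem)
print(f"approx. 95% credible interval [{post.ppf(0.025):.2f}, "
      f"{post.ppf(0.975):.2f}]")
```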

In some ways, Bayesian statistical approaches are like Apple computers. The Mac OS is less vulnerable to viruses than Windows because its lower market share makes it less attractive to virus writers. As Apple’s OS has gained market share, its vulnerability has increased. (My Linux computer, on the other hand, is truly less vulnerable to viruses because of its system architecture, but also because Linux personal computers have almost no market share.) If Bayesian methods become more prevalent, my prediction is that they will be subject to as much abuse as frequentist methods. The ASA wisely recognized that the “reproducibility crisis” and the loss of confidence in scientific research are due mostly to bias, both systematic and cognitive, in how studies are done, interpreted, and evaluated.