The ASA’s Statement on Statistical Significance – Buzzing from the Huckabees

People say crazy things. In a radio interview, Evangelical Michael Huckabee argued that the Kentucky county clerk who refused to issue a marriage license to a same-sex couple was as justified in defying an unjust court decision as people are justified in disregarding Dred Scott v. Sandford, 60 U.S. 393 (1857), which Huckabee described as still the “law of the land.”1 Chief Justice Roger B. Taney would be proud of Huckabee’s use of faux history, precedent, and legal process to argue his cause. Definition of “huckabee”: a bogus factoid.

Consider the case of Sander Greenland, who attempted to settle a score with an adversary’s expert witness, who had opined in 2002 that Bayesian analyses were rarely used at the FDA for reviewing new drug applications. The adversary’s expert witness obviously got Greenland’s knickers in a knot, because Greenland wrote an article, in a law review of all places, in which he presented his attempt to “correct the record” and show how the statement of the opposing expert witness was “ludicrous.”2 To support his indictment on charges of ludicrousness, Greenland ignored the FDA’s actual behavior in reviewing new drug applications,3 and looked instead at the practice of the Journal of Clinical Oncology, a clinical journal that publishes 24 issues a year, with occasional supplements. Greenland found the word “Bayesian” 50 times in over 40,000 journal pages, and declared victory. According to Greenland, “several” (unquantified) articles had used Bayesian methods to explore, post hoc, statistically nonsignificant results.4

Given Greenland’s own evidence, the posterior odds that Greenland was correct in his charges seem to be disturbingly low, but he might have looked at the published papers that conducted more serious, careful surveys of the issue.5 This week, the Journal of the American Medical Association published yet another study by John Ioannidis and colleagues, which documented actual practice in the biomedical literature. And no surprise, Bayesian methods barely register in a systematic survey of the last 25 years of published studies. See David Chavalarias, Joshua David Wallach, Alvin Ho Ting Li, John P. A. Ioannidis, “Evolution of reporting P values in the biomedical literature, 1990-2015,” 315 J. Am. Med. Ass’n 1141 (2016). See also Demetrios N. Kyriacou, “The Enduring Evolution of the P Value,” 315 J. Am. Med. Ass’n 1113 (2016) (“Bayesian methods are not frequently used in most biomedical research analyses.”).

So what are we to make of Greenland’s animadversions in a law review article? It was a huckabee moment.

Recently, the American Statistical Association (ASA) issued a statement on the use of statistical significance and p-values. In general, the statement was quite moderate, and declined to move in the radical directions urged by some statisticians who attended the ASA’s meeting on the subject. Despite the ASA’s moderation, the ASA’s statement has been met with huckabee-like nonsense and hyperbole. One author, a pharmacologist trained at the University of Washington, with post-doctoral training at the University of California, Berkeley, and an editor of PLoS Biology, was moved to write:

“However, the ASA notes, the importance of the p-value has been greatly overstated and the scientific community has become over-reliant on this one – flawed – measure.”

Lauren Richardson, “Is the p-value pointless?” (Mar. 16, 2016). And yet, nowhere in the ASA’s statement does the group suggest that the p-value is a “flawed” measure. Richardson suffered a lapse and wrote a huckabee.

Not surprisingly, lawyers attempting to spin the ASA’s statement have unleashed entire hives of huckabees in an attempt to deflate the methodological points made by the ASA. Here is one example of a litigation-industry lawyer who argues that the American Statistical Association Statement shows the irrelevance of statistical significance for judicial gatekeeping of expert witnesses:

“To put it into the language of Daubert, debates over ‘p-values’ might be useful when talking about the weight of an expert’s conclusions, but they say nothing about an expert’s methodology.”

Max Kennerly, “Statistical Significance Has No Place In A Daubert Analysis” (Mar. 13, 2016) [cited as Kennerly].

But wait; the expert witness must be able to rule out chance, bias, and confounding when evaluating a putative association for causality. As Austin Bradford Hill explained, before even assessing an association for causality, scientists must first have observations that

“reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance.”

Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 295 (1965) (emphasis added).

The analysis of random error is an essential step in the methodological process. That a proper methodology requires consideration of non-statistical factors does not remove the statistical from the methodology. Ruling out chance as a likely explanation is a crucial first step in the methodology for reaching a causal conclusion when there is an “expected value” or base rate for the outcome of interest in the population being sampled.
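To make the point concrete, here is a minimal sketch, with entirely hypothetical numbers, of what “ruling out chance” against a base rate looks like in practice: an exact binomial test of whether the observed count of cases in a sampled cohort exceeds what the background rate alone would predict.

```python
# A minimal sketch (hypothetical numbers throughout): testing whether an
# observed count of adverse outcomes exceeds what the population base
# rate would predict by more than chance alone.
from scipy.stats import binomtest

n = 10_000        # hypothetical: size of the sampled cohort
observed = 130    # hypothetical: observed cases in the cohort
base_rate = 0.01  # hypothetical: expected (background) rate of the outcome

# One-sided exact binomial test: how surprising is the observed count
# if the true rate were just the base rate?
result = binomtest(observed, n, base_rate, alternative="greater")
print(f"expected cases: {n * base_rate:.0f}, observed: {observed}")
print(f"one-sided p-value: {result.pvalue:.4f}")
# A small p-value makes chance an unlikely explanation for the excess;
# it does not, by itself, rule out bias or confounding, or establish
# causation.
```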

Kennerly shakes his hive of huckabees:

“The erroneous belief in an ‘importance of statistical significance’ is exactly what the American Statistical Association was trying to get rid of when they said, ‘The widespread use of “statistical significance” (generally interpreted as p ≤ 0.05) as a license for making a claim of a scientific finding (or implied truth) leads to considerable distortion of the scientific process.’”

And yet, the ASA never urged that scientists “get rid of” statistical analyses and assessments of attained levels of significance probability. To be sure, the ASA cautioned against overinterpreting p-values, especially in the context of multiple comparisons, non-prespecified outcomes, and the like. The ASA criticized bright-line rules, which are often used by litigation-industry expert witnesses to over-endorse the results of studies with p-values less than 5%, often in the face of multiple comparisons, cherry-picked outcomes, and poorly and incompletely described methods and results. What the ASA described as a “considerable distortion of the scientific process” was claiming scientific truth on the basis of “p < 0.05.” As Bradford Hill pointed out in 1965, a clear-cut association, beyond that which we would care to attribute to chance, is the beginning of the analysis of an association for causality, not the end of it. Kennerly ignores who is claiming “truth” in the litigation context. Defense expert witnesses frequently opine no more than “not proven.” Litigation-industry expert witnesses must opine that there is causation, or else they are out of a job.
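The multiple-comparisons worry, incidentally, is easy to quantify. A short simulation (an illustration of the general statistical point, not anything drawn from the ASA statement itself) shows that 20 independent tests of true null hypotheses, each run at the conventional 5% level, will yield at least one nominally “significant” result roughly 64% of the time:

```python
# A minimal simulation (assumed setup): the false-positive inflation
# from running many comparisons, each at alpha = 0.05.
import numpy as np

rng = np.random.default_rng(0)
alpha, n_tests, n_trials = 0.05, 20, 10_000

# Under the null hypothesis, p-values are uniform on [0, 1].
p_values = rng.uniform(size=(n_trials, n_tests))
any_significant = (p_values < alpha).any(axis=1).mean()

print(f"analytic:  {1 - (1 - alpha) ** n_tests:.3f}")  # 1 - 0.95^20 ≈ 0.642
print(f"simulated: {any_significant:.3f}")
```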

The ASA explained that the distortion of the scientific process comes from making a claim of a scientific conclusion of causality, or its absence, when the appropriate claim is “we don’t know.” The ASA did not say, suggest, or imply that a claim of causality can be made in the absence of a finding of statistical significance, as well as validation of the statistical model on which it is based, and other factors. The ASA certainly did not say that the scientific process will be well served by reaching conclusions of causation without statistical significance. What is clear is that statistical significance should not be an abridgment of a much more expansive process. Reviewing the annals of the International Agency for Research on Cancer (even in its currently politicized state), or of the Institute of Medicine, an honest observer would be hard pressed to come up with examples of associations, for outcomes with known base rates, that were determined to be causal in the absence of studies exhibiting statistical significance, along with many other indicia of causality.

Some other choice huckabees from Kennerly:

“It’s time for courts to start seeing the phrase ‘statistically significant’ in a brief the same way they see words like ‘very,’ ‘clearly,’ and ‘plainly’. It’s an opinion that suggests the speaker has strong feelings about a subject. It’s not a scientific principle.”

Of course, this ignores the central limit theorems, the importance of random sampling, the pre-specification of hypotheses and level of Type I error, and the like. Stuff and nonsense.
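Contrary to the suggestion that “statistically significant” merely signals strong feelings, a pre-specified significance test has an operating characteristic that anyone can verify by simulation. Here is a minimal sketch (with assumed parameters) of a one-sample t-test run repeatedly on random samples from a population in which the null hypothesis is true:

```python
# A minimal sketch of why "statistically significant" is more than a
# feeling: under random sampling from a null population, a test run at
# a pre-specified alpha rejects at (close to) the nominal rate.
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(1)
alpha, n_trials, sample_size = 0.05, 10_000, 50

rejections = 0
for _ in range(n_trials):
    # Random sample from a null population (true mean really is 0).
    sample = rng.normal(loc=0.0, scale=1.0, size=sample_size)
    if ttest_1samp(sample, popmean=0.0).pvalue < alpha:
        rejections += 1

# The rejection rate hovers near the pre-specified Type I error rate,
# a consequence of the sampling distribution of the test statistic.
print(f"nominal alpha: {alpha}, empirical: {rejections / n_trials:.3f}")
```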

And then in a similar vein, from Kennerly:

“The problem is that many courts have been led astray by defendants who claim that ‘statistical significance’ is a threshold that scientific evidence must pass before it can be admitted into court.”

In my experience, it is litigation-industry lawyers who oversell statistical significance, not defense counsel, who may merely question reliance upon studies that lack it. Kennerly’s statement is not even wrong, however, because defense counsel knowledgeable of the rules of evidence would know that statistical studies themselves are rarely admitted into evidence. What is admitted, or not, is the opinion of expert witnesses, who offer opinions about whether associations are causal, not causal, or inconclusive.


1 Ben Mathis-Lilley, “Huckabee Claims Black People Aren’t Technically Citizens During Critique of Unjust Laws,” The Slatest (Sept. 11, 2015) (“[T]he Dred Scott decision of 1857 still remains to this day the law of the land, which says that black people aren’t fully human ….”).

2 Sander Greenland, “The Need for Critical Appraisal of Expert Witnesses in Epidemiology and Statistics,” 39 Wake Forest Law Rev. 291, 306 (2004). See “The Infrequency of Bayesian Analyses in Non-Forensic Court Decisions” (Feb. 16, 2014).

3 To be sure, six years after Greenland published this diatribe, the agency promulgated a guidance that set out recommended practices for Bayesian analyses in medical device trials. FDA Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials (February 5, 2010); 75 Fed. Reg. 6209 (February 8, 2010); see also Laura A. Thompson, “Bayesian Methods for Making Inferences about Rare Diseases in Pediatric Populations” (2010); Greg Campbell, “Bayesian Statistics at the FDA: The Trailblazing Experience with Medical Devices” (presentation given by the Director, Division of Biostatistics, Center for Devices and Radiological Health, at Rutgers Biostatistics Day, April 3, 2009). Even today, Bayesian analysis remains uncommon at the U.S. FDA.

4 39 Wake Forest Law Rev. at 306-07 & n.61 (citing only one paper, Lisa Licitra et al., Primary Chemotherapy in Resectable Oral Cavity Squamous Cell Cancer: A Randomized Controlled Trial, 21 J. Clin. Oncol. 327 (2003)).

5 See, e.g., J. Martin Bland & Douglas G. Altman, “Bayesians and frequentists,” 317 Brit. Med. J. 1151, 1151 (1998) (“almost all the statistical analyses which appear in the British Medical Journal are frequentist”); David S. Moore, “Bayes for Beginners? Some Reasons to Hesitate,” 51 The Am. Statistician 254, 254 (1997) (“Bayesian methods are relatively rarely used in practice”); J.D. Emerson & Graham Colditz, “Use of statistical analysis in the New England Journal of Medicine,” in John Bailar & Frederick Mosteller, eds., Medical Uses of Statistics 45 (1992) (surveying 115 original research studies for statistical methods used; no instances of Bayesian approaches counted); Douglas Altman, “Statistics in Medical Journals: Developments in the 1980s,” 10 Statistics in Medicine 1897 (1991); B.S. Everitt, “Statistics in Psychiatry,” 2 Statistical Science 107 (1987) (finding only one use of Bayesian methods in 441 papers with statistical methodology).