Subgroups — Subpar Statistical Practice versus Fraud

Several people have asked me why I do not enable comments on this blog.  Although some bloggers (e.g., Deborah Mayo’s Error Statistics site) have had great success in generating interesting and important discussions, I have seen too much spam on other websites, and I want to avoid having to police the untoward posts.  Still, I welcome comments and I try to respond to helpful criticism.  If and when I am wrong, I will gladly eat my words, which usually have been quite digestible.

Probably no posts here have generated more comments and criticism than those written about the prosecution of Dr. Harkonen.  In general, critics have argued that defending Harkonen and his press release was tantamount to condoning bad statistical practice.  I have tried to show that Dr. Harkonen’s press release was much more revealing than abbreviated accounts of his case portrayed it, and that the evidentiary support for his claim of efficacy in a subgroup was deeper and broader than acknowledged. The criticism and condemnation of Dr. Harkonen’s press release, in the face of prevalent statistical practice among leading journals and practitioners, is nothing short of hypocrisy and bad faith. If Dr. Harkonen deserves prison time for a press release that promised a full analysis and discussion in upcoming conference calls and presentations at scientific meetings, then we can only imagine what criminal sanction awaits the scientists and journal editors who publish purportedly definitive accounts of clinical trials and epidemiologic studies with subgroup analyses that were neither prespecified nor labeled as post hoc.

The prevalence of the practice does not transform Dr. Harkonen’s press release into “best practice,” but some allowance must be made for offering a causal opinion in the informal context of a press release rather than in a manuscript for submission to a journal.  And those critics, with prosecutorial temperaments, must recognize that, when the study was presented at conferences, and when the manuscript was written up and submitted to the New England Journal of Medicine, the authors did reveal the ad hoc nature of the subgroup.

The Harkonen case will remain important for several reasons. There is an important distinction in the Harkonen case, ignored and violated by the government’s position, between opinion and fact.  If Harkonen is guilty of wire fraud, then so is virtually every cleric, minister, priest, rabbi, imam, mullah, and other religious figure who makes supernatural claims and predictions.  Add in all the politicians, homeopaths, vaccine deniers, and others who reject evidence in favor of superstition; all of them are far more culpable than a scientist who accurately reports the actual data and p-value.

Then there is the disconnect between what expert witnesses are permitted to say and what resulted in Dr. Harkonen’s conviction. If any good could come from the government’s win, it would be the insistence upon “best practice” for gatekeeping of expert witness opinion testimony.

For better or worse, scientists often describe post-hoc subgroup findings as “demonstrated” effects. Although some scientists would disagree with this reporting, the practice is prevalent.  Some scientists would go further and contest the claim that pre-specified hypotheses are inherently more reliable than post-hoc hypotheses. See Timothy Lash & Jan Vandenbroucke, “Should Preregistration of Epidemiologic Study Protocols Become Compulsory?,” 23 Epidemiology 184 (2012).
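One reason pre-specification matters can be shown with a few lines of arithmetic: testing many subgroups, each at the conventional 0.05 level, inflates the chance of at least one spurious “significant” finding. The sketch below assumes independent tests, which real, overlapping subgroups are not, so the figures are illustrative of the direction of the effect, not of any particular trial.

```python
# Family-wise false-positive probability across k independent subgroup
# tests, each run at significance level alpha. Illustrative arithmetic
# only; real subgroups are correlated, so exact figures differ.

def familywise_error(alpha: float, k: int) -> float:
    """Probability of at least one false positive across k independent tests."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 10, 20):
    p = familywise_error(0.05, k)
    print(f"{k:2d} subgroup tests -> P(>=1 false positive) = {p:.2f}")
```

With ten independent subgroup tests at the 0.05 level, the chance of at least one spurious “statistically significant” result is roughly 40 percent, which is why an unlabeled post-hoc finding conveys far less evidential weight than a pre-specified one.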

One survey compared grant applications with later published papers and found that subgroup analyses were pre-specified in only a minority of cases; in a substantial majority (77%) of the subgroup analyses in the published papers, the analyses were not characterized as either pre-specified or post hoc. Chantal W. B. Boonacker, Arno W. Hoes, Karen van Liere-Visser, Anne G. M. Schilder, and Maroeska M. Rovers, “A Comparison of Subgroup Analyses in Grant Applications and Publications,” 174 Am. J. Epidem. 291, 291 (2011).  Indeed, this survey’s comparison between grant applications and published papers revealed that most of the published subgroup analyses were post hoc, and that the authors of the published papers rarely reported justifications for their post-hoc subgroup analyses. Id.

Again, for better or worse, the practice of presenting unplanned subgroup analyses is common in the biomedical literature. Several years ago, the New England Journal of Medicine reported a survey of publication practice in its own pages, with findings similar to those of Boonacker and colleagues. Rui Wang, Stephen W. Lagakos, James H. Ware, David J. Hunter, and Jeffrey M. Drazen, “Statistics in Medicine — Reporting of Subgroup Analyses in Clinical Trials,” 357 New Eng. J. Med. 2189 (2007).  In general, Wang and her colleagues were unable to determine the total number of subgroup analyses performed, and in the majority (68%) of the trials discussed, they could not determine whether the subgroup analyses were prespecified. Id. at 2192. Although the authors proposed guidelines for identifying subgroup analyses as prespecified or post hoc, they emphasized that their proposals were not “rules” that could be rigidly prescribed. Id. at 2194.

The Wang study is hardly unique; the Journal of the American Medical Association reported a similar set of results. An-Wen Chan, Asbjørn Hrobjartsson, Mette T. Haahr, Peter C. Gøtzsche, and Douglas G. Altman, “Empirical Evidence for Selective Reporting of Outcomes in Randomized Trials: Comparison of Protocols to Published Articles,” 291 J. Am. Med. Ass’n 2457 (2004).  Chan and colleagues set out to document and analyze “outcome reporting bias”; that is, the extent to which publications fail to report accurately the outcomes pre-specified in the protocols of randomized clinical trials.  The authors compared and analyzed protocols and published reports of randomized clinical trials conducted in Denmark in 1994 and 1995. Their findings document a large discrepancy between the idealized notion of pre-specification of study design, outcomes, and analyses, and the actual practice revealed by later publication.

Chan identified 102 clinical trials, with 3,736 outcomes, and found that 50% of efficacy outcomes and 65% of harm outcomes were incompletely reported. Statistically significant outcomes were significantly more likely to be fully reported than statistically insignificant ones (pooled odds ratio for efficacy outcomes = 2.4; 95% confidence interval, 1.4 – 4.0; pooled odds ratio for harm outcomes = 4.7; 95% confidence interval, 1.8 – 12.0). The comparison of protocols with later published articles revealed that a majority of trials (62%) had at least one primary outcome that was changed, introduced, or omitted in the published version. The authors concluded that published accounts of clinical trials were frequently incomplete, biased, and inconsistent with protocols.
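For readers unfamiliar with the pooled odds ratios and confidence intervals that Chan reports, the sketch below shows how an odds ratio and its 95% interval are conventionally computed from a 2×2 table using the log-odds (Woolf) method. The counts are hypothetical, chosen only for illustration; they are not Chan’s data, and Chan’s pooled estimates involve additional meta-analytic weighting not shown here.

```python
import math

def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with an approximate 95% CI via the log-odds (Woolf) method.

    a, b = outcome fully reported / not reported among significant results;
    c, d = the same among nonsignificant results (hypothetical labels).
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)   # standard error of ln(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Hypothetical 2x2 counts, for illustration only
or_, lo, hi = odds_ratio_ci(30, 20, 15, 35)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

An interval that excludes 1.0, as in this illustration and in Chan’s pooled estimates, is what underwrites the claim that full reporting is associated with statistical significance.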

This week, an international group of scientists published their analysis of agreement vel non between protocols and corresponding later publications of randomized clinical trials. Matthias Briel and the DISCO study group, “Subgroup analyses in randomised controlled trials: cohort study on trial protocols and journal publications,” 349 Brit. Med. J. g4539 (published 16 July 2014). Predictably, the authors found a good deal of sloppy practice, or worse.  Of the 515 journal articles identified, about half (246, or 47.8%) reported one or more subgroup analyses. Of the articles that reported subgroup analyses, 81 (32.9%) stated that the subgroup analyses were prespecified, but for 28 of these articles (34.6%), the corresponding protocols did not identify the subgroup analyses.

In 86 of the publications surveyed, the authors found that the articles claimed a subgroup “effect,” but only 36 of the corresponding protocols reported a planned subgroup analysis.  Briel and the DISCO study group concluded that protocols of randomized clinical trials insufficiently describe subgroup analyses. In over one-third of publications, the articles reported subgroup analyses not pre-specified in earlier protocols. The DISCO study group called for access to protocols and statistical analysis plans for all randomized clinical trials.

In view of these empirical data, the government’s claims against Dr. Harkonen stand out, at best, as vindictive, selective prosecution.