District Court Denies Writ of Coram Nobis to Dr. Harkonen

Courts are generally suspicious of convicted defendants who challenge the competency of their trial counsel on any grounds that might reflect strategic trial decisions. A convicted defendant can always speculate about how his trial might have gone better had some witnesses, who did not fare well at trial, not been called. Similarly, a convicted defendant might well speculate that his trial counsel could and should have called other or better witnesses. Still, sometimes, trial counsel really do screw up, especially when it comes to technical, scientific, or statistical issues.

The Harkonen case is a true comedy of errors – statistical, legal, regulatory, and practical. Indeed, some would say it is truly criminal to convict someone for an interpretation of a clinical trial result.[1] As discussed in several previous posts, Dr. W. Scott Harkonen was convicted under the wire fraud statute, 18 U.S.C. § 1343, for having distributed a faxed press release about InterMune’s clinical trial, in which he described the study as having “demonstrated” Actimmune’s survival benefit in patients with mild to moderate idiopathic pulmonary fibrosis (cryptogenic fibrosing alveolitis). The trial had not shown a statistically significant result on its primary outcome, and the significance probability on the secondary outcome of survival benefit was 0.08. Dr. Harkonen reported on a non-prespecified subgroup of patients with mild to moderate disease at randomization, in which subgroup, the trial showed better survival in the experimental therapy group, p-value of 0.004, compared with the placebo group.

Having exhausted his direct appeal, Dr. Harkonen petitioned for post-conviction relief in the form of a writ of coram nobis, on grounds of ineffective assistance of counsel. Last week, federal District Judge Richard Seeborg, in San Francisco, denied Dr. Harkonen’s petition. United States v. Harkonen, Case No. 08-cr-00164-RS-1, Slip op. (N.D. Cal. Aug. 21, 2015). See Dani Kass, “Ex-InterMune CEO’s Complaints Against Trial Counsel Nixed,” Law360 (Aug. 24, 2015). Judge Seeborg held that Dr. Harkonen had failed to explain why he had not raised the claim of ineffective assistance earlier, and that trial counsel’s tactical and strategic decisions, with respect to not calling statistical expert witnesses, were “not so beyond the pale of reasonable conduct as to warrant the finding of ineffective assistance.” Slip op. at 1.

To meet its burden at trial, the government presented Dr. Thomas Fleming, a statistician and “trialist,” who had served on the data safety and monitoring board of the clinical trial at issue.[2] Fleming took the rather extreme view that a clinical trial that “fails” to meet its primary pre-stated end point at the conventional p-value of less than 5 percent is an abject failure and provides no demonstration of any claim of efficacy. (Other experts might well say that the only failed clinical trial is one that was not done.) Judge Seeborg correctly discerned that Fleming’s testimony was in the form of an opinion, and that the law of wire fraud prohibits prosecution of scientific opinions about which reasonable scientists may differ. The government’s burden was thus to show, beyond a reasonable doubt, that no reasonable scientist could have reported the Actimmune clinical trial as having “demonstrated” a survival benefit in the mild to moderate disease subgroup. Slip op. at 2.

Remarkably, at trial, the government presented no expert witnesses, and Fleming testified as a fact witness. While acknowledging that the contested issue, whether anyone could fairly say that the Actimmune clinical trial had demonstrated efficacy in a non-prespecified subgroup, called for an opinion, Judge Seeborg gave the government a pass for not presenting expert witnesses to make out its case. Indeed, Judge Seeborg noted that the government had “stressed testimony from its experts touting the view that study results without sufficiently low p-values are inherently unreliable and meaningless.” Slip op. at 3 (emphasis added). Judge Seeborg’s description of Fleming as an expert witness is remarkable because the government never sought to qualify Dr. Fleming as an expert witness, and the trial judge never gave the jury an instruction on how to evaluate the testimony of an expert witness, including an explanation that the jury was free to accept some, all, or none of Fleming’s opinion testimony. After the jury returned its guilty verdict, Harkonen’s counsel filed a motion for judgment of acquittal, based in part upon the government’s failure to qualify Fleming as an expert witness in the field of biostatistics. The trial judge refused this motion on grounds that

(1) at one point Fleming had been listed as an expert witness;

(2) Fleming’s curriculum vitae had been marked and admitted into evidence; and

(3) “[m]ost damningly,” according to the trial judge, Harkonen’s lawyers had failed to object to Fleming’s holding forth on opinions about statistical theory and practice.

Slip op. at 7. Damning indeed as evidence of a potentially serious deviation from a reasonable standard of care and competence for trial practice! On the petition for coram nobis, Judge Seeborg curiously refers to Dr. Harkonen as not objecting, when the very issue before the court is the competency of his counsel in failing to object. Allowing a well-credentialed statistician, such as Fleming, to testify without requesting a limiting instruction on expert witness opinion testimony certainly seems “beyond the pale.” If there were some potential tactic involved in this default, Judge Seeborg does not identify it, and none comes to mind. And even if this charade of calling Fleming as a fact witness were some sort of tactical cat-and-mouse litigation game between government and defendant, the trial judge should certainly have taken control of the matter by barring a witness, not tendered as an expert witness, from offering opinion testimony on arcane statistical issues.

Having not objected to Fleming’s opinions, Dr. Harkonen’s counsel decided not to call its own defense expert witnesses. The post-conviction court makes much of the lesser credentials of the defense witnesses, and of a decision not to call expert witnesses that rested upon defense counsel’s apparent belief that it had undermined Fleming’s opinion on cross-examination. There is little in the cross-examination of Fleming to support the coram nobis court’s assessment. Fleming’s opinions were vulnerable in ways that trial counsel failed to exploit, and in ways that even a lesser credentialed expert witness could have made clear to a lay jury or the court. Even a journeyman statistician would have realized that Fleming had overstated the statistical orthodoxy that p-values are “magical numbers,” and could have pointed out that many statisticians and epidemiologists disagree with invoking statistical hypothesis testing as a rigid decision procedure, based upon p-values less than 0.05. Indeed, the idea of statistical testing as driven by a rigid, pre-selected level of acceptable Type 1 error rate was rejected by the very statistician who developed and advanced computations of the p-value. See Sir Ronald Fisher, Statistical Methods and Scientific Inference 42 (Hafner 1956) (ridiculing rigid hypothesis testing as “absurdly academic, for in fact no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas.”).

After the jury convicted on the wire fraud count, Dr. Harkonen changed counsel from Kasowitz Benson Torres & Friedman LLP to Mark Haddad at Sidley Austin LLP. Mr. Haddad was able, in relatively short order, to line up two outstanding statisticians, Professor Steven Goodman, of Stanford University’s Medical School, and Professor Donald Rubin, of Harvard University. Both Professors Goodman and Rubin robustly rejected Fleming’s orthodox positions in post-trial declarations, which were too late to affect the litigation of the merits, although their contributions may well have made it difficult for the trial judge to side with the government on its request for a Draconian ten-year prison sentence. From my own perspective, I can say it was not difficult to recruit two leading, capable epidemiologists, Professors Kenneth Rothman and Timothy Lash, to join in an amicus brief that criticized Fleming’s testimony in a way that would have been devastating had it been done at trial.

The entire Harkonen affair is marked by extraordinary governmental hypocrisy. As Judge Seeborg reports:

“[t]hroughout its case in chief, the government stressed testimony from Fleming and Crager who offered that, in the world of biostatistical analysis, a 0.05 p-value threshold is ‘somewhat of a magic number’; that the only meaningful p-value from a study is the one for its primary endpoint; and that data from post-hoc subgroup analyses cannot be reported upon accurately without information about the rest of the sampling context.”[3]

Slip op. at 4. And yet, in another case, when it was politically convenient to take the opposite position, the government proclaimed, through its Solicitor General, on behalf of the FDA, that statistical significance at any level is not necessary at all for demonstrating causation:

“[w]hile statistical significance provides some indication about the validity of a correlation between a product and a harm, a determination that certain data are not statistically significant … does not refute an inference of causation.”

Brief for the United States as Amicus Curiae Supporting Respondents, in Matrixx Initiatives, Inc. v. Siracusano, 2010 WL 4624148, at *14 (Nov. 12, 2010). The methods of epidemiology and data analysis are not, however, so amenable to political expedience. The government managed both to overstate the interpretation of p-values in Harkonen, and to understate them in Matrixx Initiatives.

Like many of the judges who previously have ruled on one or another issue in the Harkonen case, Judge Seeborg struggled with statistical concepts and gave a rather bizarre, erroneous definition of what exactly was at issue with the p-values in the Actimmune trial:

“In clinical trials, a p-value is a number between one and zero which represents the probability that the results establish a cause-and-effect relationship, rather than a random effect, between the drug and a positive health benefit. Because a p-value indicates the degree to which the tested drug does not explain observed benefits, the smaller the p-value, the larger a study’s significance.”

Slip op. at 2-3. Ultimately, this error was greatly overshadowed by a simpler error of overlooking, and condoning, trial counsel’s default in challenging the government’s failure to present credible expert witness opinion testimony on the crucial issue in the case.
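For contrast with the court’s definition, the conventional definition treats a p-value as the probability, computed on the assumption that the null hypothesis (no treatment effect) is true, of observing data at least as extreme as those actually observed. It is neither the probability that the results establish cause and effect, nor the degree to which the drug fails to explain the observed benefit. A minimal sketch in Python, using an exact one-sided binomial test, may help; the function name and the numbers are hypothetical illustrations, and have nothing to do with the Actimmune data.

```python
from math import comb


def one_sided_binomial_p_value(successes: int, trials: int, null_rate: float) -> float:
    """Return P(X >= successes) for X ~ Binomial(trials, null_rate).

    This is the probability, assuming the null hypothesis is true, of a
    result at least as extreme as the one observed. It says nothing
    directly about the probability that the null hypothesis is true.
    """
    return sum(
        comb(trials, k) * null_rate**k * (1 - null_rate) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical example: 15 of 20 patients improve, and the null hypothesis
# says each patient has a 50% chance of improving without treatment.
p = one_sided_binomial_p_value(15, 20, 0.5)
print(f"one-sided p-value = {p:.4f}")
```

A small p-value here means only that the observed count would be unusual if the null hypothesis were true. Moving from that statement to a probability that the treatment works requires further assumptions that the p-value itself does not supply.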

At the heart of the government’s complaint is that Dr. Harkonen’s press release does not explicitly disclose that the subgroup of mild and moderate disease patients was not pre-specified for analysis in the trial protocol and statistical analysis plan. Dr. Harkonen’s failure to disclose the ad hoc nature of the subgroup, while not laudable, hardly rose to the level of criminal fraud, especially when considered in the light of the available prior clinical trials on the same medication, and prevalent practice in not making the appropriate disclosure in press releases, and even in full, peer-reviewed publications of clinical trials and epidemiologic studies.

For better or worse, the practice of presenting unplanned subgroup analyses is quite common in the scientific community. Several years ago the New England Journal of Medicine published a survey of publication practice in its own pages, and documented the widespread failure to limit “demonstrated” findings to pre-specified analyses.[4] In general, the survey authors were unable to determine the total number of subgroup analyses performed; and in the majority (68%) of trials discussed, the authors could not determine whether the subgroup analyses were pre-specified.[5] Although the authors of this article proposed guidelines for identifying subgroup analyses as pre-specified or post-hoc, they emphasized that the proposals were not “rules” that could be rigidly prescribed.[6]

Of course, what was at issue in Dr. Harkonen’s case was not a peer-reviewed article in a prestigious journal, but a much more informal, less rigorous communication that is typical of press releases. Lack of rigor in this context is not limited to academic and industry press releases. Consider the press release recently issued by the National Institutes of Health (NIH) in connection with an NIH-funded clinical trial on age-related macular degeneration (AMD). NIH Press Release, “NIH Study Provides Clarity on Supplements for Protection against Blinding Eye Disease,” NIH News & Events Website (May 5, 2013) [last visited August 27, 2015]. The clinical trial studied a modified dietary supplement in common use to prevent or delay AMD. The NIH’s press release claimed that the study “provides clarity on supplements,” and announced a “finding” of “some benefits” when looking at just two of the subgroups. The press release does not use the words “post hoc” or “ad hoc” in connection with the subgroup analysis used to support the “finding” of benefit.

The clinical trial results were published the same day in a journal article that labeled the subgroup findings as post hoc subgroup findings.[7] The published paper also reported that the pre-specified endpoints of the clinical trial did not show statistically significant differences between therapies and placebo.

None of the p-values for any of the post-hoc subgroup analyses was adjusted for multiple comparisons. NIH webpages with Questions and Answers for the public and the media both fail to report the post-hoc nature of the subgroup findings.[8] By the standards imposed upon Dr. Harkonen in this case through Dr. Fleming’s testimony, and contrary to the NIH’s public representations, the NIH trial had “failed,” and no inferences could be drawn with respect to any endpoint because the primary endpoint did not yield a statistically significant result.
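The concern about unadjusted p-values can be made concrete with a little arithmetic. The sketch below, in Python, assumes that all the tests are independent and that every null hypothesis is true; it computes the chance of at least one nominally “significant” result arising by chance alone, and applies the Bonferroni correction, which is only one (conservative) adjustment among several in use. The function names and the example p-values are hypothetical, and are not taken from the NIH or the Actimmune trial.

```python
def family_wise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests independent
    tests, each run at level alpha, when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_adjust(p_values: list[float]) -> list[float]:
    """Multiply each p-value by the number of tests, capping at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# With ten independent subgroup tests at the 0.05 level, the chance of at
# least one spurious "significant" finding is far above 5 percent.
print(f"family-wise error rate: {family_wise_error_rate(0.05, 10):.3f}")

# Hypothetical unadjusted p-values from three subgroup analyses.
unadjusted = [0.01, 0.04, 0.30]
print("adjusted p-values:", [round(p, 3) for p in bonferroni_adjust(unadjusted)])
```

The adjustment does not settle whether a subgroup finding is real. It shows only why a reader needs to know how many analyses were run before making much of any one of them.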

There are, to be sure, hopeful signs that the prevalent practice is changing. A recent article documented an increasing number of “null” effect clinical trials that have been reported, perhaps as the result of better reporting of trials without dramatic successes, increasing willingness to publish such trial results, and greater availability of trial protocols in advance of, or with, peer-review publication of trial results.[9] Transparency in clinical and other areas of research is welcome and should be the norm, descriptively and prescriptively, but we should be wary of criminalizing lapses with indictments of wire fraud for conduct that can be found in most scientific journals and press releases.


[1] See, e.g., “Who Jumped the Shark in United States v. Harkonen”; “Multiplicity versus Duplicity – The Harkonen Conviction”; “The (Clinical) Trial by Franz Kafka”; “Further Musings on U.S. v. Harkonen”; and “Subgroups — Subpar Statistical Practice versus Fraud.” In the Supreme Court, two epidemiologists and a law school lecturer filed an Amicus Brief that criticized the government’s statistical orthodoxy. Brief by Scientists And Academics as Amici Curiae, in Harkonen v. United States, 2013 WL 5915131, 2013 WL 6174902 (Supreme Court Sept. 9, 2013).

[2] The government also presented the testimony of Michael Crager, an InterMune biostatistician. Reading between the lines, we may infer that Dr. Crager was induced to testify in exchange for not being prosecuted, and that his credibility was compromised.

[3] This testimony was particularly egregious because mortality or survival is often the most important outcome measure, but is frequently not made the primary trial end point because of concern over whether there would be a sufficient number of deaths over the course of the trial to assess efficacy in this outcome. In the context of the Actimmune trial, this concern was on full display, but as it turned out, when the data were collected, there was a survival benefit (p = 0.08, which shrank to 0.055 when the analysis was limited to patients who met entrance criteria, and shrank further to 0.004, when the analysis was limited plausibly to patients with only mild or moderate disease at randomization).

[4] Rui Wang, et al., “Statistics in Medicine – Reporting of Subgroup Analyses in Clinical Trials,” 357 New Eng. J. Med. 2189 (2007).

[5] Id. at 2192.

[6] Id. at 2194.

[7] Emily Chew, et al., “Lutein + Zeaxanthin and Omega-3 Fatty Acids for Age-Related Macular Degeneration,” 309 J. Am. Med. Ass’n 2005 (2013).

[8] See “For the Public: What the Age-Related Eye Disease Studies Mean for You” (May 2013) [last visited August 27, 2015]; “For the Media: Questions and Answers about AREDS2” (May 2013) [last visited August 27, 2015].

[9] See Robert M. Kaplan & Veronica L. Irvin, “Likelihood of Null Effects of Large NHLBI Clinical Trials Has Increased over Time,” 10 PLoS ONE e0132382 (2015); see also Editorial, “Trials register sees null results rise,” 524 Nature 269 (Aug. 20, 2015); Paul Basken, “When Researchers State Goals for Clinical Trials in Advance, Success Rates Plunge,” The Chronicle of Higher Education (Aug. 5, 2015).