The (Clinical) Trial by Franz Kafka

United States of America v. W. Scott Harkonen, MD — Part I

Last week, Mark Haddad, of Sidley Austin, argued Dr. W. Scott Harkonen’s appeal in the Ninth Circuit.   In 2009, Dr. Harkonen was convicted by a jury, before the Hon. Marilyn Hall Patel, on a single count of wire fraud, under 18 U.S.C. § 1343. The jury acquitted Dr. Harkonen of felony misbranding, 21 U.S.C. §§ 331(k), 333(a)(2), 352(a).  Dr. Harkonen’s crime?  Bad statistical practice!

Dr. Harkonen, a physician, was the President and CEO of InterMune, Inc., a biotechnology company that researches and develops medications. InterMune developed interferon gamma-1b (Actimmune®), which was licensed by the FDA for the treatment of two rare diseases, chronic granulomatous disease and severe, malignant osteopetrosis.  In 1999, Austrian researchers published the results of a small randomized clinical trial, which concluded that at 12 months, treatment with interferon gamma-1b (Actimmune®) plus prednisolone was associated with “substantial improvements in the conditions of patients with idiopathic pulmonary fibrosis [IPF] who had had no response to glucocorticoids alone.” Rolf Ziesche, Elisabeth Hofbauer, Karin Wittmann, Ventzislav Petkov, Lutz-Henning Block, 341 New Engl. J. Med. 1264 (1999).  Based upon this 1999 clinical trial, InterMune conducted another clinical trial, with a primary end point of “progression-free survival,” measured by a decrease in specific pulmonary function tests or death.  InterMune’s trial specified nine secondary end points, including survival time from randomization until the end of the trial.

InterMune’s trial failed to show an overall improvement in progression-free survival.  Patients on Actimmune did, however, experience improvements on the survival end point, which were not statistically significant at the pre-specified level of alpha (p < 0.05).  Although the difference was not statistically significant as defined, 28 of 168 patients on placebo died, while only 16 of 162 patients on Actimmune died, a relative reduction in mortality of about 40% on therapy (p = 0.084).  The relative reduction in mortality was greater (70%) for a non-prespecified subgroup that had mild-to-moderate IPF (by pulmonary function criteria) at the outset of the trial.

For a combined subgroup of all mild-to-moderate IPF patients (FVC>55%), making up 77% of all trial participants, only 6 patients on Actimmune (n = 126) died, compared to 21 on placebo (n = 128). For this non-prespecified subgroup, the relative reduction in mortality was about 70%, p = 0.004.
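The death counts reported above can be turned into risks with simple arithmetic. The sketch below is only an illustration of that arithmetic, not a reconstruction of InterMune’s analysis; the function name `mortality_summary` is hypothetical, and the code makes no attempt to reproduce the reported p-values, which depend on a test the source does not describe. It shows that the figures of roughly 40% and 70% are relative reductions in the proportion of patients who died, while the absolute differences in mortality are considerably smaller.

```python
def mortality_summary(deaths_treated, n_treated, deaths_control, n_control):
    """Crude mortality risks and risk reductions from raw death counts.

    Hypothetical helper for illustration; it ignores follow-up time.
    """
    risk_treated = deaths_treated / n_treated
    risk_control = deaths_control / n_control
    return {
        "risk_treated": risk_treated,
        "risk_control": risk_control,
        # Difference in the proportion who died (control minus treated).
        "absolute_reduction": risk_control - risk_treated,
        # Proportional drop in the risk of death relative to control.
        "relative_reduction": 1 - risk_treated / risk_control,
    }


# Counts as reported in the text: whole trial, then the mild-to-moderate subgroup.
overall = mortality_summary(16, 162, 28, 168)
subgroup = mortality_summary(6, 126, 21, 128)

for label, summary in (("overall", overall), ("mild-to-moderate", subgroup)):
    print(
        f"{label}: treated {summary['risk_treated']:.1%}, "
        f"control {summary['risk_control']:.1%}, "
        f"absolute reduction {summary['absolute_reduction']:.1%}, "
        f"relative reduction {summary['relative_reduction']:.1%}"
    )
```

On these counts, the overall relative reduction is about 41% against an absolute difference of under 7 percentage points, and the subgroup relative reduction is about 71% against an absolute difference of under 12 percentage points.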

In August 2002, Dr. Harkonen approved a press release, which carried a headline, “phase III data demonstrating survival benefit of Actimmune in IPF.” A subtitle announced the 70% relative reduction in mortality in patients with mild to moderate disease.  The text of the press release stated that the company’s view was based upon “preliminary” clinical trial data, which “demonstrate a significant survival benefit in patients with mild to moderate disease randomly assigned to Actimmune versus control treatment (p=0.004).” The press release also stated the results and associated p-value for the survival endpoint for the whole study population, as well as the results of the long-term follow-up study of the patients from the original study by Ziesche, et al. (which also showed a survival benefit for those randomized to Actimmune).  The remainder of the four-page press release acknowledged that the results of the primary end point did not reach statistical significance, and identified two upcoming medical conferences, as well as a conference call with the investment community that would be recorded and posted on the company’s website for two days, at which further details would be provided.

Dr. Harkonen was acquitted of misbranding, but convicted of wire fraud for having issued this press release.  The gravamen of his crime was stating that the clinical trial “demonstrated” prolonged survival for IPF patients.  The prosecution asserted that Dr. Harkonen engaged in data dredging, grasping for the right non-prespecified end point that had a low p-value attached. Such data dredging implicates the problem of multiple comparisons or tests, with the result of increasing the risk of a false-positive finding, notwithstanding the p-value below 0.05.
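The multiple-comparisons problem can be illustrated with a short calculation. The sketch below rests on two assumptions that are not in the trial record: that the tests are statistically independent (clinical end points usually are not), and that only the ten pre-specified end points (one primary plus nine secondary) are counted, which understates the total once non-prespecified subgroups are added. The function name `familywise_error_rate` is hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n_tests independent
    tests, each run at significance level alpha, when no real effect exists.

    Independence is an assumption made here for illustration only.
    """
    return 1 - (1 - alpha) ** n_tests


# One test, the ten pre-specified end points, and a larger hypothetical count.
for n_tests in (1, 10, 20):
    rate = familywise_error_rate(0.05, n_tests)
    print(f"{n_tests:2d} tests at alpha = 0.05 -> family-wise error rate {rate:.3f}")
```

Under these assumptions, ten tests at the 0.05 level carry roughly a 40% chance of at least one nominally significant result arising by chance alone, which is why a single p-value below 0.05 carries less weight when it is one of many.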

Supported by the testimony of Professor Thomas Fleming, who chaired the Data Safety Monitoring Board for the clinical trial in question, the government claimed that the trial results were “negative” because the p-values for all the pre-specified endpoints exceeded 0.05.  Shortly after the press release, Fleming sent InterMune a letter that strongly dissented from the language of the press release, which he characterized as misleading.  Because the primary and secondary end points were not statistically significant, and because the reported mortality benefit was found in a non-prespecified subgroup, the interpretation of the trial data required “greater caution,” and the press release was a “serious misrepresentation of results obtained from exploratory data subgroup analyses.”

The district court sentenced Dr. Harkonen to six months of home confinement, three years of probation, 200 hours of community service, and a fine of $20,000. Dr. Harkonen appealed on grounds that the federal fraud statutes do not permit the government to prosecute persons for expressing scientific opinions about which reasonable minds can differ.  On this view, if any reasonable person could find the defendant’s statement to be true, the trial court should dismiss the prosecution.  Statements that have support even from a minority of the scientific community should not be the basis for a fraud charge.  In Dr. Harkonen’s case, the government did not allege any misstatement of an objectively verifiable fact, but alleged falsity in his characterization of the data’s “demonstration” of an efficacy effect.  The government cross-appealed to complain about the leniency of the sentence.

Dr. Harkonen’s trial counsel did not present any expert witnesses, but he did elicit testimony from some of the government witnesses about the proper interpretation of the trial data and about controversy concerning the reliance upon a precise p-value for interpreting causality.  On appeal, for instance, Dr. Harkonen’s counsel quoted a government witness, Dr. Wayne Hockmeyer:

“Many times people have the impression that—that when you look at data, it’s immediately clear what conclusions you ought to draw from those data. . . . And sometimes that’s true. And sometimes there are gray areas. And it is not true all the time. And there’s a lot of vigorous debate that goes on amongst members of the scientific and medical community about the conclusions that one ought to draw from those data.” ER1085.

A panel of three judges, Judges Nelson, Tashima, and Murguia, heard Dr. Harkonen’s appeal.  The case presents obvious First Amendment issues, but the more curious issues involve whether the government can impose a statistical orthodoxy on pain of punishment under the wire fraud statutes.  There is much that can be said of Dr. Harkonen’s interpretation of the data.  Clearly, multiplicity was a problem that diluted the meaning of the reported p-value, but the government never presented evidence of what the p-value, corrected for multiple testing, might be.  If Dr. Harkonen committed a crime, then so have many biomedical journal editors, article authors, and government scientists for having over-interpreted evidence in communications that travel in the U.S. mails and over the internet.
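What a corrected p-value might look like depends heavily on how many tests are counted, which the record described here does not settle. The sketch below uses the Bonferroni adjustment, a simple and conservative method chosen only for illustration; nothing in the source says that either party used it. The test counts are hypothetical, and `bonferroni_adjusted_p` is an illustrative name.

```python
def bonferroni_adjusted_p(p_value, n_tests):
    """Bonferroni-adjusted p-value: the raw p-value times the number of
    tests, capped at 1. A conservative adjustment, used here as a sketch.
    """
    return min(1.0, p_value * n_tests)


reported_p = 0.004  # subgroup p-value quoted in the press release

# Hypothetical numbers of comparisons; the true count is not in the source.
for n_tests in (5, 10, 13, 20):
    adjusted = bonferroni_adjusted_p(reported_p, n_tests)
    verdict = "below" if adjusted < 0.05 else "not below"
    print(f"{n_tests:2d} tests -> adjusted p = {adjusted:.3f} ({verdict} 0.05)")
```

On this adjustment, the reported p = 0.004 stays below 0.05 if a dozen or fewer tests are counted and rises above it at thirteen or more, so the answer turns on a number that, on this account, the government never put into evidence.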