United States of America v. W. Scott Harkonen, MD — Part III
Background
The recent oral argument in United States v. Harkonen (see “The (Clinical) Trial by Franz Kafka” (Dec. 11, 2012)) pushed me to revisit the brief filed by the Solicitor General’s office in Matrixx Initiatives Inc. v. Siracusano, 131 S. Ct. 1309 (2011). One of Dr. Harkonen’s post-trial motions contended that the government’s failure to disclose its Matrixx amicus brief deprived him of a powerful argument that would have resulted from citing the language of the brief, which disparaged the necessity of statistical significance for “demonstrating” causal inferences. See “Multiplicity versus Duplicity – The Harkonen Conviction” (Dec. 11, 2012).
Matrixx Initiatives is a good example of how litigants make bad law when they press for rulings on bad facts. The Supreme Court ultimately held that pleading and proving causation were not necessary for a securities fraud action that turned on non-disclosure of information about health outcomes among users of the company’s medication. What is required is “materiality,” which may be satisfied upon a much lower showing than causation. Because Matrixx Initiatives contended that statistical significance was necessary to causation, which in turn was needed to show materiality, much of the briefing before the Supreme Court addressed statistical significance, but the reality is that the Court’s disposition obviated any discussion of the role of statistical inferences for causation. 131 S.Ct. at 1319.
Still, the Supreme Court, in a unanimous opinion, plowed forward and issued its improvident dicta about statistical significance. Taken at face value, the Court’s statement that “the premise that statistical significance is the only reliable indication of causation … is flawed,” is unexceptionable. Matrixx Initiatives, 131 S.Ct. at 1319. For one thing, the statement would be true if statistical significance were necessary but not sufficient to “indicate” causation. But more to the point, there are some cases in which statistical significance may not be part of the analytical toolkit for reaching a causal conclusion. For instance, the infamous Ferebee case, which did not involve Federal Rule of Evidence 702, is a good example of a case that did not involve epidemiologic or statistical evidence. See “Ferebee Revisited” (Nov. 8, 2012) (discussing the agreement of both parties that statistical evidence was not necessary to resolve general causation because of the acute onset, post-exposure, of an extremely uncommon medical outcome – severe diffuse interstitial pulmonary fibrosis).
Surely, there are other such cases, but in modern products liability law, many causation puzzles are based upon the interpretation of rate-driven processes, measured using epidemiologic studies, involving a measurable base-line risk and an observed higher or lower risk among a sample of an exposed population. In this context, some evaluation of the size of random error is, indeed, necessary. The Supreme Court’s muddled dicta, however, has confused the issues by painting with an extremely broad brush.
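The role of random error in such rate-driven evidence can be illustrated with a short calculation. What follows is only a sketch on hypothetical counts, not data from any case or study discussed here: it computes a risk ratio and an approximate 95% confidence interval (using the common log-scale normal approximation) for two imaginary studies that both observe a doubling of risk but differ tenfold in size. The function name and all numbers are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def risk_ratio_ci(exposed_cases, exposed_total,
                  unexposed_cases, unexposed_total, level=0.95):
    """Return (risk ratio, lower bound, upper bound).

    Uses the log-scale normal approximation for the interval,
    which is one conventional choice among several.
    """
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total  # the base-line risk
    rr = risk_exposed / risk_unexposed

    # Approximate standard error of log(risk ratio)
    se_log_rr = math.sqrt(
        1 / exposed_cases - 1 / exposed_total
        + 1 / unexposed_cases - 1 / unexposed_total
    )
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical studies: identical point estimate, different sample sizes
small = risk_ratio_ci(10, 1_000, 5, 1_000)
large = risk_ratio_ci(100, 10_000, 50, 10_000)

print("small study: RR=%.2f, 95%% CI %.2f to %.2f" % small)
print("large study: RR=%.2f, 95%% CI %.2f to %.2f" % large)
```

On these assumed counts, both studies report a risk ratio of 2.0, but the small study's interval runs from roughly 0.69 to 5.83 and so includes 1.0 (no difference), whereas the large study's interval runs from roughly 1.43 to 2.81. The point estimate alone cannot distinguish the two; some assessment of random error is needed.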
The dicta in Matrixx Initiatives has already led to judicial errors. The MDL court in the Chantix litigation provides one such instance. Plaintiffs claimed that Chantix, a medication that helps people stop smoking, causes suicide. Pfizer, the manufacturer, challenged plaintiffs’ general causation expert witnesses, for not meeting the standards of Federal Rule of Evidence 702, for various reasons, not the least of which was that the studies relied upon by plaintiffs’ witnesses did not show statistical significance. In re Chantix Prods. Liab. Litig., MDL 2092, 2012 U.S. Dist. LEXIS 130144 (Aug. 21, 2012). The Chantix MDL court, citing Matrixx Initiatives for a blanket rejection of the need to consider random error, denied the defendant’s challenge. Id. at *41-42 (citing Matrixx Initiatives, 131 S.Ct. at 1319).
The Supreme Court, in Matrixx, however, never stated or implied such a blanket rejection of the importance of considering random error in evidence that was essentially statistical in nature. Of course, if it had done so, it would have been wrong.
Within two weeks of the Chantix decision, a similar erroneous interpretation of Matrixx Initiatives surfaced in MDL litigation over fenfluramine. Cheek v. Wyeth Pharm. Inc., 2012 U.S. Dist. LEXIS 123485 (E.D. Pa. Aug. 30, 2012). Rejecting a Rule 702 challenge to plaintiffs’ expert witness’s opinion, the MDL trial judge cited Matrixx Initiatives for the assertion that:
“Daubert does not require that an expert opinion regarding causation be based on statistical evidence in order to be reliable. * * * In fact, many courts have recognized that medical professionals often base their opinions on data other than statistical evidence from controlled clinical trials or epidemiological studies.”
Id. at *22 (citing Matrixx Initiatives, 131 S. Ct. at 1319, 1320). While some causation opinions might be perfectly appropriately based upon other than statistical evidence, the Supreme Court specifically disclaimed any comment upon Rule 702, in Matrixx Initiatives, which was a case about proper pleading of materiality in a securities fraud case, not about proper foundations for actual evidence of causation, at trial, of a health-effects claim. The Cheek decision is thus remarkable for profoundly misunderstanding the Matrixx case. There was no resolution of any Rule 702 issue in Matrixx.
The Trial Court’s Denial of the Matrixx Motion in Harkonen
Dr. Harkonen argued that he was entitled to a new trial on the basis of “newly discovered evidence” in the form of the government’s amicus brief in Matrixx. The trial court denied this motion on several grounds. First, the government’s amicus brief was filed after the jury returned its verdict against Dr. Harkonen. Second, the language in the Solicitor General’s amicus brief was just “argument.” And third, the issue in Matrixx involved adverse events, not efficacy, and the FDA, as well as investors, would be concerned with lesser levels of evidence that did not “demonstrate” causation. United States v. Harkonen, Memorandum & Order re Defendant Harkonen’s Motions for a New Trial, No. C 08-00164 MHP (N.D. Calif. April 18, 2011). Perhaps the most telling ground might have been that the government’s amicus briefing about statistical significance, prompted by Matrixx Initiatives’ appellate theory, was irrelevant to the proper resolution of that Supreme Court case. Still, if these reasons are taken individually, or in combination, they fail to mitigate the unfairness of the government’s prosecution of Dr. Harkonen.
The Amicus Brief Behind the Matrixx Motion
Judge Patel’s denial of the motion raised serious problems. See “Multiplicity versus Duplicity – The Harkonen Conviction” (Dec. 11, 2012). It may thus be worth a closer look at the government’s amicus brief to evaluate Dr. Harkonen’s Matrixx motion. The distinction between efficacy and adverse effects is particularly unconvincing. Similarly, it does not seem fair to permit the government to take inconsistent positions, whether on facts or on inferences and arguments, when those inconsistencies confuse criminal defendants, prosecutors, civil litigants, and lower court judges. After all, Dr. Harkonen’s use of the key word, “demonstrate,” was an argument about the epistemic strength of the evidence at hand.
The government’s amicus brief was filed by the Solicitor General’s office, along with counsel for the Food and Drug Division of the Department of Health & Human Services. The government, in its brief, appeared to disclaim the necessity, or even the importance, of statistical significance:
“[w]hile statistical significance provides some indication about the validity of a correlation between a product and a harm, a determination that certain data are not statistically significant … does not refute an inference of causation.”
Brief for the United States as Amicus Curiae Supporting Respondents, in Matrixx Initiatives, Inc. v. Siracusano, 2010 WL 4624148, at *14 (Nov. 12, 2010). This statement, with its double negatives, is highly problematic. Validity of a correlation is really not what is at issue in a randomized clinical trial; rather it is the statistical reliability or stability of the measurement that is called into question when the result is not statistically significant. A statistically insignificant result may not refute causation, but it certainly does not thereby support an inference of causation. The Solicitor General’s brief made this statement without citation to any biostatistics text or treatise.
The government’s amicus brief introduces its discussion of statistical significance with a heading, entitled “Statistical significance is a limited and non-exclusive tool for inferring causation.” Id. at *13. In a footnote, the government elaborated that its position applied to both safety and efficacy outcomes:
“[t]he same principle applies to studies suggesting that a particular drug is efficacious. A study in which the cure rate for cancer patients who took a drug was twice the cure rate for those who took a placebo could generate meaningful interest even if the results were not statistically significant.”
Id. at *15 n.2. Judge Patel’s distinction between efficacy and adverse events thus cannot be sustained. Of course, “meaningful interest” is not exactly a sufficient basis for a causal conclusion. As a general matter, Dr. Harkonen’s motion seems well grounded. Although not a model of clarity, the amicus brief appears to disparage the necessity of statistical significance for supporting a causal conclusion. A criminal defendant being prosecuted for using the wrong verb to describe his characterization of the inference he drew from a clinical trial would certainly want to showcase these high-profile statements made by the Solicitor General’s office to the highest court of the land.
Solicitor General’s Good Advice
Much of the Solicitor General’s brief is directly on point for the Matrixx case. The amicus brief leads off by insisting that information that supports reasonable suspicions about adverse events may be material absent sufficient evidence of causation. Id. at 11. Of course, this is the dispositive argument, and it is stated well in the brief. The brief then wanders into scientific and statistical territory, with little or no authority, at times misciting important works such as the Reference Manual on Scientific Evidence.
The Solicitor General’s amicus brief homes in on the key issue: materiality, which does not necessarily involve causation:
“Second, a reasonable investor may consider information suggesting an adverse drug effect important even if it does not prove that the drug causes the effect.”
Brief for the United States as Amicus Curiae Supporting Respondents, in Matrixx Initiatives, Inc. v. Siracusano, 2010 WL 4624148, at *8.
“As explained above (see p. 19, supra), however, adverse event reports do not lend themselves to a statistical-significance analysis. At a minimum, the standard petitioners advocate would require the design of a scientific study able to capture the relative rates of incidence (either through a clinical trial or observational study); enough participants and data to perform such a study and make it powerful enough to detect any increased incidence of the adverse effect; and a researcher equipped and interested enough to conduct it.”
Id. at 23.
“As petitioners acknowledge (Br. 23), FDA does not apply any single metric for determining when additional inquiry or action is necessary, and it certainly does not insist upon ‘statistical significance.’ See Adverse Event Reporting 7. Indeed, statistical significance is not a scientifically appropriate or meaningful standard in evaluating adverse event data outside of carefully designed studies. Id. at 5; cf. Lempert 240 (‘it is meaningless to talk about receiving a statistically significant number of complaints’).”
Id. at 19. So statistical significance is unrelated to the case, and the kind of evidence of materiality, alleged by plaintiffs, does not even open itself to a measurement of statistical significance. At this point, the brief writers might have called it a day. The amicus brief, however, pushes on.
Solicitor General’s Ignoratio Elenchi
A good part of the government’s amicus brief in Matrixx presented argument irrelevant to the issues before the Court, even assuming that statistical significance was relevant to materiality.
“First, data showing a statistically significant association are not essential to establish a link between use of a drug and an adverse effect. As petitioners ultimately acknowledge (Br. 44 n.22), medical researchers, regulators, and courts consider multiple factors in assessing causation.”
Brief for the United States as Amicus Curiae Supporting Respondents, in Matrixx Initiatives, Inc. v. Siracusano, 2010 WL 4624148, at *12. This statement is a non-sequitur. The consideration of multiple factors in assessing causation does not make the need for a statistically significant association any more or less essential. Statistical significance could still be necessary but not sufficient in assessing causation. The government’s brief writers pick up the thread a few pages later:
“More broadly, causation can appropriately be inferred through consideration of multiple factors independent of statistical significance. In a footnote, petitioners acknowledge that critical fact: ‘[C]ourts permit an inference of causation on the basis of scientifically reliable evidence other than statistically significant epidemiological data. In such cases experts rely on a lengthy list of factors to draw reliable inferences, including, for example,
(1) the “strength” of the association, including “whether it is statistically significant”;
(2) temporal relationship between exposure and the adverse event;
(3) consistency across multiple studies;
(4) “biological plausibility”;
(5) “consideration of alternative explanations” (i.e., confounding);
(6) “specificity” (i.e., whether the specific chemical is associated with the specific disease at issue); and
(7) dose-response relationship (i.e., whether an increase in exposure yields an increase in risk).’ ”
Pet. Br. 44 n.22 (citations omitted). Those and other factors for inferring causation have been well recognized in the medical literature and by the courts of appeals. See, e.g., Reference Guide on Epidemiology 345-347 (discussing relevance of toxicologic studies), 375-379 (citing, e.g., Austin Bradford Hill, The Environment and Disease: Association or Causation?, 58 Proc. Royal Soc’y Med. 295 (1965))… .”
Id. at 15-16. These enumerated factors are obviously due to Sir Austin Bradford Hill. No doubt Matrixx Initiatives cited the Bradford Hill factors, but that was because the company was contending that statistical significance was necessary but not sufficient to show causation. As Bradford Hill showed by his famous conclusion that smoking causes lung cancer, these factors were considered after statistical significance was shown in several epidemiologic studies. The Supreme Court incorporated this non-argument into its opinion, even after disclaiming that causation was needed for materiality or that the Court was going to assess the propriety of causal findings in other cases.
The Solicitor General went on to cite three cases for the proposition that statistical significance is not necessary for assessing causation:
“Best v. Lowe’s Home Centers, Inc., 563 F.3d 171, 178 (6th Cir. 2009) (“an ‘overwhelming majority of the courts of appeals’ agree” that differential diagnosis, a process for medical diagnosis that does not entail statistical significance tests, informs causation) (quoting Westberry v. Gislaved Gummi AB, 178 F.3d 257, 263 (4th Cir. 1999)).”
Id. at 16. These two cases both involved so-called “differential diagnosis” or differential etiology, a process of ruling in, by ruling out. This method, which involves iterative disjunctive syllogism, starts from established causes, and reasons to a single cause responsible for a given case of the disease. The citation of these cases was irrelevant and bad scholarship by the government. The Solicitor General’s error here seems to have been responsible for the Supreme Court’s unthinking incorporation of these cases into its opinion.
The Solicitor General went on to cite a third case, the infamous Ferebee, for its suggestion that statistical significance was not necessary to establish causation:
“Ferebee v. Chevron Chem. Co., 736 F.2d 1529, 1536 (D.C. Cir.) (‘[P]roducts liability law does not preclude recovery until a “statistically significant” number of people have been injured’.), cert. denied, 469 U.S. 1062 (1984). As discussed below (see pp. 19-20, infra), FDA relies on a number of those factors in deciding whether to take regulatory action based on reports of an adverse drug effect.”
Id. at 16. Curiously, the Supreme Court departed from its reliance on the Solicitor General’s brief, with respect to Ferebee, and substituted its own citation to Wells v. Ortho Pharmaceutical Corp., 615 F. Supp. 262 (N.D. Ga. 1985), aff’d in relevant part, 788 F.2d 741 (11th Cir.), cert. denied, 479 U.S. 950 (1986). See “Wells v. Ortho Pharmaceutical Corp. Reconsidered – Part 1” (Nov. 12, 2012). The reliance upon the two differential etiology cases was “demonstrably” wrong, but citing Wells was even more bizarre because that case featured at least one statistically significant study relied upon by plaintiffs’ expert witnesses. Ferebee, on the other hand, involved an acute onset of a rare condition – severe pulmonary fibrosis – shortly after exposure to paraquat. Ferebee was thus a case in which the parties agreed that the causal relationship between paraquat and lung fibrosis had been established by non-analytical epidemiologic evidence. See “Ferebee Revisited.”
The government then pointed out in its amicus that sometimes statistical significance is hard to obtain:
“In some circumstances —e.g., where an adverse effect is subtle or has a low rate of incidence —an inability to obtain a data set of appropriate quality or quantity may preclude a finding of statistical significance. Ibid. That does not mean, however, that researchers have no basis on which to infer a plausible causal link between a drug and an adverse effect.”
Id. at 15. Biological plausibility is hardly a biologically established causal link. Inability to find an appropriate data set often translates into an inability to draw a causal conclusion; inappropriate data are not an excuse for jumping to unsupported conclusions.
Solicitor General’s Bad Advice – Crimen Falsi?
The government’s brief then manages to go from bad to worse. The government’s amicus brief in Matrixx raises serious concerns about criminalizing inappropriate statistical statements, inferences, or conclusions. If the Solicitor General’s office, with input from the Chief Counsel of the Food and Drug Division, of the Department of Health & Human Services, cannot correctly state basic definitions of statistical significance, then the government has no business prosecuting others for similar offenses.
“To assess statistical significance in the medical context, a researcher begins with the ‘null hypothesis’, i.e., that there is no relationship between the drug and the adverse effect. The researcher calculates a ‘p-value’, which is the probability that the association observed in the study would have occurred even if there were in fact no link between the drug and the adverse effect. If that p-value is lower than the ‘significance level’ selected for the study, then the results can be deemed statistically significant.”
Id. at 13. Here the government’s brief commits a common error that results when lawyers want to simplify the definition of a p-value. The p-value is a cumulative probability of observing a disparity at least as great as observed, given the assumption that there is no difference. Furthermore, the subjunctive is not appropriate to describe the basic assumption of significance probability.
“The significance level most commonly used in medical studies is 0.05. If the p-value is less than 0.05, there is less than a 5% chance that the observed association between the drug and the effect would have occurred randomly, and the results from such a study are deemed statistically significant. Conversely, if the p-value is greater than 0.05, there is greater than a 5% chance that the observed association would have occurred randomly, and the results are deemed not statistically significant. See Reference Guide on Epidemiology 357-358; David Kaye & David A. Freedman, Reference Guide on Statistics, in Reference Manual on Scientific Evidence 123, 123-125 (2d ed. 2000) (Reference Guide on Statistics).”
Id. at 14. Here the government’s brief drops the conditional of the significance probability; the p-value provides the probability that a disparity at least as large as observed would have occurred (based upon the assumed probability model), given the assumption that there really is no difference between the observed and expected results.
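The corrected definition can be made concrete with a small calculation. The sketch below is only an illustration under assumed facts, none of which comes from the brief or the Harkonen record: two equal-sized trial arms, 20 adverse events in total, 15 of them in the drug arm, and a null hypothesis under which each event is equally likely to fall in either arm. The function names are likewise hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k 'successes' in n trials, success chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail_p_value(observed, n, p_null=0.5):
    """One-sided p-value: probability, computed ON THE ASSUMPTION that the
    null hypothesis is true, of a count at least as large as the observed one.
    """
    return sum(binomial_pmf(k, n, p_null) for k in range(observed, n + 1))


# Assumed, illustrative data: 15 of 20 adverse events fall in the drug arm
n_events = 20
observed_in_drug_arm = 15

# Probability of exactly the observed split, given the null
point_probability = binomial_pmf(observed_in_drug_arm, n_events, 0.5)

# Cumulative probability of the observed split or anything more extreme
p_value = upper_tail_p_value(observed_in_drug_arm, n_events)

print(f"P(exactly 15 | null)  = {point_probability:.4f}")
print(f"P(15 or more | null)  = {p_value:.4f}")
```

On these assumed numbers, the probability of exactly 15 events in the drug arm is about 0.015, while the one-sided p-value, which cumulates the outcomes 15 through 20, is about 0.021. Both figures are conditional on the null hypothesis; neither is the probability that the result "occurred randomly," and neither is the probability that the null hypothesis is true.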
“While statistical significance provides some indication about the validity of a correlation between a product and a harm, a determination that certain data are not statistically significant – let alone, as here, the absence of any determination one way or the other — does not refute an inference of causation. See Michael D. Green, Expert Witnesses and Sufficiency of Evidence in Toxic Substances Litigation: The Legacy of Agent Orange and Bendectin Litigation, 86 Nw. U. L. Rev. 643, 682- 683 (1992).”
Id. at 14. Validity is probably the wrong word since most statisticians and scientific authors use validity to refer to features other than low random error.
“Take, for example, results from a study, with a p-value of 0.06, showing that those who take a drug develop a rare but serious adverse effect (e.g., permanent paralysis) three times as often as those who do not. Because the p-value exceeds 5%, the study’s results would not be considered statistically significant at the 0.05 level. But since the results indicate a 94% likelihood that the observed association between the drug and the effect would not have occurred randomly, the data would clearly bear on the drug’s safety. Upon release of such a study, “confidence in the safety of the drug in question should diminish, and if the drug were important enough to [the issuer’s] balance sheet, the price of its stock would be expected to decline.” Lempert 239.2”
Id. at 14-15. The citation to Lempert’s article is misleading. At the cited page, Professor Lempert is simply making the point that materiality in a securities fraud case will often be present when evidence for a causal conclusion is not. Richard Lempert, “The Significance of Statistical Significance: Two Authors Restate An Incontrovertible Caution. Why A Book?” 34 Law & Social Inquiry 225, 239 (2009). In so writing, Lempert anticipated the true holding of Matrixx Initiatives. The calculation of the 94% likelihood is also incorrect. The quantity (1 – [p-value]) yields a probability that describes the probability of obtaining a disparity no greater than the observed result, on the assumption that there is no difference at all between observed and expected results. There is, however, a larger point lurking in this passage of the amicus brief, which is that the difference between a p-value of 0.05 and 0.06 is not particularly large, and there is thus a degree of arbitrariness to treating it as too sharp a line.
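Both points in the preceding paragraph can be checked with a few lines of arithmetic. The sketch below assumes, purely for illustration, a two-sided test whose statistic follows a standard normal distribution under the null hypothesis; the brief does not specify the test behind its hypothetical p-value of 0.06, and the function names are invented here.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def z_for_two_sided_p(p_value):
    """Size of the test statistic that yields the given two-sided p-value."""
    return std_normal.inv_cdf(1 - p_value / 2)


def prob_smaller_disparity_under_null(z):
    """P(|Z| < |z|), computed on the assumption that the null is true."""
    return 2 * std_normal.cdf(abs(z)) - 1


z_05 = z_for_two_sided_p(0.05)
z_06 = z_for_two_sided_p(0.06)

# What "1 - p" actually measures for p = 0.06
complement = prob_smaller_disparity_under_null(z_06)

print(f"z at p = 0.05: {z_05:.3f}")
print(f"z at p = 0.06: {z_06:.3f}")
print(f"P(smaller disparity | null) at p = 0.06: {complement:.2f}")
```

Under these assumptions, the test statistics corresponding to p-values of 0.05 and 0.06 are roughly 1.96 and 1.88, a modest gap. The 0.94 figure emerges only as the probability, given the null hypothesis, of seeing a disparity smaller than the one observed. It says nothing directly about the likelihood that the association is real, which is how the brief used it.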
All in all, a distressingly poor performance by the Solicitor General’s office. With access to many talented statisticians, the government could at least have had a competent statistician review and approve the content of this amicus brief. I suspect that most judges and lawyers, however, would balk at drawing an inference that the Solicitor General intended to mislead the Court simply because the brief contained so many misstatements about statistical inference. This reluctance should have obvious implications for the government’s attempt to criminalize Dr. Harkonen’s statistical inferences.