TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

The Matrixx Motion in U.S. v. Harkonen

December 17th, 2012

United States of America v. W. Scott Harkonen, MD — Part III

Background

The recent oral argument in United States v. Harkonen (see “The (Clinical) Trial by Franz Kafka” (Dec. 11, 2012)) pushed me to revisit the brief filed by the Solicitor General’s office in Matrixx Initiatives Inc. v. Siracusano, 131 S. Ct. 1309 (2011).  One of Dr. Harkonen’s post-trial motions contended that the government’s failure to disclose its Matrixx amicus brief deprived him of a powerful argument that would have resulted from citing the language of the brief, which disparaged the necessity of statistical significance for “demonstrating” causal inferences. See “Multiplicity versus Duplicity – The Harkonen Conviction” (Dec. 11, 2012).

Matrixx Initiatives is a good example of how litigants make bad law when they press for rulings on bad facts.  The Supreme Court ultimately held that pleading and proving causation were not necessary for a securities fraud action that turned on non-disclosure of information about health outcomes among users of the company’s medication. What is required is “materiality,” which may be satisfied upon a much lower showing than causation.  Because Matrixx Initiatives contended that statistical significance was necessary to causation, which in turn was needed to show materiality, much of the briefing before the Supreme Court addressed statistical significance, but the reality is that the Court’s disposition obviated any discussion of the role of statistical inferences for causation. 131 S.Ct. at 1319.

Still, the Supreme Court, in a unanimous opinion, plowed forward and issued its improvident dicta about statistical significance. Taken at face value, the Court’s statement that “the premise that statistical significance is the only reliable indication of causation … is flawed,” is unexceptionable. Matrixx Initiatives, 131 S.Ct. at 1319.  For one thing, the statement would be true if statistical significance were necessary but not sufficient to “indicate” causation. But more to the point, there are some cases in which statistical significance may not be part of the analytical toolkit for reaching a causal conclusion. For instance, the infamous Ferebee case, which did not involve Federal Rule of Evidence 702, is a good example of a case that did not turn on epidemiologic or statistical evidence.  See “Ferebee Revisited” (Nov. 8, 2012) (discussing the agreement of both parties that statistical evidence was not necessary to resolve general causation because of the acute onset, post-exposure, of an extremely uncommon medical outcome – severe diffuse interstitial pulmonary fibrosis).

Surely, there are other such cases, but in modern products liability law, many causation puzzles are based upon the interpretation of rate-driven processes, measured using epidemiologic studies, involving a measurable baseline risk and an observed higher or lower risk among a sample of an exposed population. In this context, some evaluation of the size of random error is, indeed, necessary. The Supreme Court’s muddled dicta, however, have confused the issues by painting with an extremely broad brush.

The dicta in Matrixx Initiatives have already led to judicial errors. The MDL court in the Chantix litigation provides one such instance. Plaintiffs claimed that Chantix, a medication that helps people stop smoking, causes suicide. Pfizer, the manufacturer, challenged plaintiffs’ general causation expert witnesses for failing to meet the standards of Federal Rule of Evidence 702, for various reasons, not the least of which was that the studies relied upon by plaintiffs’ witnesses did not show statistical significance.  In re Chantix Prods. Liab. Litig., MDL 2092, 2012 U.S. Dist. LEXIS 130144 (Aug. 21, 2012).  The Chantix MDL court, citing Matrixx Initiatives for a blanket rejection of the need to consider random error, denied the defendant’s challenge. Id. at *41-42 (citing Matrixx Initiatives, 131 S.Ct. at 1319).

The Supreme Court, in Matrixx, however, never stated or implied such a blanket rejection of the importance of considering random error in evidence that was essentially statistical in nature. Of course, if it had done so, it would have been wrong.

Within two weeks of the Chantix decision, a similar erroneous interpretation of Matrixx Initiatives surfaced in MDL litigation over fenfluramine.  Cheek v. Wyeth Pharm. Inc., 2012 U.S. Dist. LEXIS 123485 (E.D. Pa. Aug. 30, 2012). Rejecting a Rule 702 challenge to plaintiffs’ expert witness’s opinion, the MDL trial judge cited Matrixx Initiatives for the assertion that:

“Daubert does not require that an expert opinion regarding causation be based on statistical evidence in order to be reliable. * * * In fact, many courts have recognized that medical professionals often base their opinions on data other than statistical evidence from controlled clinical trials or epidemiological studies.”

Id. at *22 (citing Matrixx Initiatives, 131 S. Ct. at 1319, 1320).  While some causation opinions might appropriately be based upon evidence other than statistical evidence, the Supreme Court specifically disclaimed any comment upon Rule 702 in Matrixx Initiatives, which was a case about the proper pleading of materiality in a securities fraud action, not about the proper foundations for actual evidence of causation, at trial, of a health-effects claim. The Cheek decision is thus remarkable for profoundly misunderstanding the Matrixx case. There was no resolution of any Rule 702 issue in Matrixx.

The Trial Court’s Denial of the Matrixx Motion in Harkonen

Dr. Harkonen argued that he is entitled to a new trial on the basis of “newly discovered evidence” in the form of the government’s amicus brief in Matrixx. The trial court denied this motion on several grounds.  First, the government’s amicus brief was filed after the jury returned its verdict against Dr. Harkonen.  Second, the language in the Solicitor General’s amicus brief was just “argument.”  And third, the issue in Matrixx involved adverse events, not efficacy, and the FDA, as well as investors, would be concerned with lesser levels of evidence that did not “demonstrate” causation.  United States v. Harkonen, Memorandum & Order re Defendant Harkonen’s Motions for a New Trial, No. C 08-00164 MHP (N.D. Calif. April 18, 2011). Perhaps the most telling ground might have been that the government’s amicus briefing about statistical significance, prompted by Matrixx Initiatives’ appellate theory, was irrelevant to the proper resolution of that Supreme Court case.  Still, if these reasons are taken individually, or in combination, they fail to mitigate the unfairness of the government’s prosecution of Dr. Harkonen.

The Amicus Brief Behind the Matrixx Motion

Judge Patel’s denial of the motion raised serious problems. See “Multiplicity versus Duplicity – The Harkonen Conviction” (Dec. 11, 2012).  It may thus be worth taking a closer look at the government’s amicus brief to evaluate Dr. Harkonen’s Matrixx motion. The distinction between efficacy and adverse effects is particularly unconvincing.  Similarly, it does not seem fair to permit the government to take inconsistent positions, whether on facts or on inferences and arguments, when those inconsistencies confuse criminal defendants, prosecutors, civil litigants, and lower court judges. After all, Dr. Harkonen’s use of the key word, “demonstrate,” was an argument about the epistemic strength of the evidence at hand.

The government’s amicus brief was filed by the Solicitor General’s office, along with counsel for the Food and Drug Division of the Department of Health & Human Services. The government, in its brief, appeared to disclaim the necessity, or even the importance, of statistical significance:

“[w]hile statistical significance provides some indication about the validity of a correlation between a product and a harm, a determination that certain data are not statistically significant … does not refute an inference of causation.”

Brief for the United States as Amicus Curiae Supporting Respondents, in Matrixx Initiatives, Inc. v. Siracusano, 2010 WL 4624148, at *14 (Nov. 12, 2010). This statement, with its double negatives, is highly problematic.  Validity of a correlation is really not what is at issue in a randomized clinical trial; rather it is the statistical reliability or stability of the measurement that is called into question when the result is not statistically significant.  A statistically insignificant result may not refute causation, but it certainly does not thereby support an inference of causation.  The Solicitor General’s brief made this statement without citation to any biostatistics text or treatise.

The government’s amicus brief introduced its discussion of statistical significance with a heading entitled “Statistical significance is a limited and non-exclusive tool for inferring causation.” Id. at *13.  In a footnote, the government elaborated that its position applied to both safety and efficacy outcomes:

“[t]he same principle applies to studies suggesting that a particular drug is efficacious. A study in which the cure rate for cancer patients who took a drug was twice the cure rate for those who took a placebo could generate meaningful interest even if the results were not statistically significant.”

Id. at *15 n.2.  Judge Patel’s distinction between efficacy and adverse events thus cannot be sustained. Of course, “meaningful interest” is not exactly a sufficient basis for a causal conclusion. As a general matter, Dr. Harkonen’s motion seems well grounded.  Although not a model of clarity, the amicus brief appears to disparage the necessity of statistical significance for supporting a causal conclusion. A criminal defendant being prosecuted for using the wrong verb to describe his characterization of the inference he drew from a clinical trial would certainly want to showcase these high-profile statements made by the Solicitor General’s office to the highest court of the land.

Solicitor General’s Good Advice

Much of the Solicitor General’s brief is directly on point for the Matrixx case. The amicus brief leads off by insisting that information that supports reasonable suspicions about adverse events may be material absent sufficient evidence of causation.  Id. at 11.  Of course, this is the dispositive argument, and it is stated well in the brief.  The brief then wanders into scientific and statistical territory, with little or no authority, at times misciting important works such as the Reference Manual on Scientific Evidence.

The Solicitor General’s amicus brief homes in on the key issue: materiality, which does not necessarily involve causation:

“Second, a reasonable investor may consider information suggesting an adverse drug effect important even if it does not prove that the drug causes the effect.”

Brief for the United States as Amicus Curiae Supporting Respondents, in Matrixx Initiatives, Inc. v. Siracusano, 2010 WL 4624148, at *8.

“As explained above (see p. 19, supra), however, adverse event reports do not lend themselves to a statistical-significance analysis. At a minimum, the standard petitioners advocate would require the design of a scientific study able to capture the relative rates of incidence (either through a clinical trial or observational study); enough participants and data to perform such a study and make it powerful enough to detect any increased incidence of the adverse effect; and a researcher equipped and interested enough to conduct it.”

Id. at 23.

“As petitioners acknowledge (Br. 23), FDA does not apply any single metric for determining when additional inquiry or action is necessary, and it certainly does not insist upon ‘statistical significance.’ See Adverse Event Reporting 7. Indeed, statistical significance is not a scientifically appropriate or meaningful standard in evaluating adverse event data outside of carefully designed studies. Id. at 5; cf. Lempert 240 (‘it is meaningless to talk about receiving a statistically significant number of complaints’).”

Id. at 19. So statistical significance is unrelated to the case, and the kind of evidence of materiality alleged by plaintiffs does not even lend itself to a measurement of statistical significance.  At this point, the brief writers might have called it a day.  The amicus brief, however, pushes on.

Solicitor General’s Ignoratio Elenchi

A good part of the government’s amicus brief in Matrixx presented argument irrelevant to the issues before the Court, even assuming that statistical significance was relevant to materiality.

“First, data showing a statistically significant association are not essential to establish a link between use of a drug and an adverse effect. As petitioners ultimately acknowledge (Br. 44 n.22), medical researchers, regulators, and courts consider multiple factors in assessing causation.”

Brief for the United States as Amicus Curiae Supporting Respondents, in Matrixx Initiatives, Inc. v. Siracusano, 2010 WL 4624148, at *12.  This statement is a non sequitur.  The consideration of multiple factors in assessing causation does not make the need for a statistically significant association more or less essential. Statistical significance could still be necessary but not sufficient in assessing causation.  The government’s brief writers pick up the thread a few pages later:

“More broadly, causation can appropriately be inferred through consideration of multiple factors independent of statistical significance. In a footnote, petitioners acknowledge that critical fact: ‘[C]ourts permit an inference of causation on the basis of scientifically reliable evidence other than statistically significant epidemiological data. In such cases experts rely on a lengthy list of factors to draw reliable inferences, including, for example,

(1) the “strength” of the association, including “whether it is statistically significant”;

(2) temporal relationship between exposure and the adverse event;

(3) consistency across multiple studies;

(4) “biological plausibility”;

(5) “consideration of alternative explanations” (i.e., confounding);

(6) “specificity” (i.e., whether the specific chemical is associated with the specific disease at issue); and

(7) dose-response relationship (i.e., whether an increase in exposure yields an increase in risk).’ ”

Pet. Br. 44 n.22 (citations omitted). Those and other factors for inferring causation have been well recognized in the medical literature and by the courts of appeals. See, e.g., Reference Guide on Epidemiology 345-347 (discussing relevance of toxicologic studies), 375-379 (citing, e.g., Austin Bradford Hill, The Environment and Disease: Association or Causation?, 58 Proc. Royal Soc’y Med. 295 (1965))… .”

Id. at 15-16. These enumerated factors are obviously due to Sir Austin Bradford Hill. No doubt Matrixx Initiatives cited the Bradford Hill factors, but that was because the company was contending that statistical significance was necessary but not sufficient to show causation.  As Bradford Hill showed by his famous conclusion that smoking causes lung cancer, these factors were considered after statistical significance was shown in several epidemiologic studies.  The Supreme Court incorporated this non-argument into its opinion, even after disclaiming that causation was needed for materiality or that the Court was going to assess the propriety of causal findings in other cases.

The Solicitor General went on to cite three cases for the proposition that statistical significance is not necessary for assessing causation:

“Best v. Lowe’s Home Centers, Inc., 563 F.3d 171, 178 (6th Cir. 2009) (“an ‘overwhelming majority of the courts of appeals’ agree” that differential diagnosis, a process for medical diagnosis that does not entail statistical significance tests, informs causation) (quoting Westberry v. Gislaved Gummi AB, 178 F.3d 257, 263 (4th Cir. 1999)).”

Id. at 16.  These two cases both involved so-called “differential diagnosis” or differential etiology, a process of ruling in, by ruling out.  This method, which involves iterative disjunctive syllogism, starts from established causes, and reasons to a single cause responsible for a given case of the disease.  The citation of these cases was irrelevant and bad scholarship by the government.  The Solicitor General’s error here seems to have been responsible for the Supreme Court’s unthinking incorporation of these cases into its opinion.

The Solicitor General went on to cite a third case, the infamous Ferebee, for its suggestion that statistical significance was not necessary to establish causation:

“Ferebee v. Chevron Chem. Co., 736 F.2d 1529, 1536 (D.C. Cir.) (‘[P]roducts liability law does not preclude recovery until a “statistically significant” number of people have been injured’.), cert. denied, 469 U.S. 1062 (1984). As discussed below (see pp. 19-20, infra), FDA relies on a number of those factors in deciding whether to take regulatory action based on reports of an adverse drug effect.”

Id. at 16.  Curiously, the Supreme Court departed from its reliance on the Solicitor General’s brief, with respect to Ferebee, and substituted its own citation to Wells v. Ortho Pharmaceutical Corp., 615 F. Supp. 262 (N.D. Ga. 1985), aff’d in relevant part, 788 F.2d 741 (11th Cir.), cert. denied, 479 U.S. 950 (1986). See “Wells v. Ortho Pharmaceutical Corp. Reconsidered – Part 1” (Nov. 12, 2012).  The reliance upon the two differential etiology cases was “demonstrably” wrong, but citing Wells was even more bizarre because that case featured at least one statistically significant study relied upon by plaintiffs’ expert witnesses. Ferebee, on the other hand, involved an acute onset of a rare condition – severe pulmonary fibrosis – shortly after exposure to paraquat.  Ferebee was thus a case in which the parties agreed that the causal relationship between paraquat and lung fibrosis had been established by non-analytical epidemiologic evidence.  See Ferebee Revisited.

The government then pointed out in its amicus brief that sometimes statistical significance is hard to obtain:

“In some circumstances —e.g., where an adverse effect is subtle or has a low rate of incidence —an inability to obtain a data set of appropriate quality or quantity may preclude a finding of statistical significance. Ibid. That does not mean, however, that researchers have no basis on which to infer a plausible causal link between a drug and an adverse effect.”

Id. at 15. Biological plausibility is hardly a biologically established causal link.  Inability to find an appropriate data set often translates into an inability to draw a causal conclusion; inappropriate data are not an excuse for jumping to unsupported conclusions.

Solicitor General’s Bad Advice – Crimen Falsi?

The government’s brief then manages to go from bad to worse. The government’s amicus brief in Matrixx raises serious concerns about criminalizing inappropriate statistical statements, inferences, or conclusions.  If the Solicitor General’s office, with input from the Chief Counsel of the Food and Drug Division of the Department of Health & Human Services, cannot correctly state basic definitions of statistical significance, then the government has no business prosecuting others for similar offenses.

“To assess statistical significance in the medical context, a researcher begins with the ‘null hypothesis’, i.e., that there is no relationship between the drug and the adverse effect. The researcher calculates a ‘p-value’, which is the probability that the association observed in the study would have occurred even if there were in fact no link between the drug and the adverse effect. If that p-value is lower than the ‘significance level’ selected for the study, then the results can be deemed statistically significant.”

Id. at 13. Here the government’s brief commits a common error that results when lawyers want to simplify the definition of a p-value. The p-value is a cumulative probability of observing a disparity at least as great as that observed, given the assumption that there is no difference.  Furthermore, the subjunctive mood (“would have occurred even if there were in fact no link”) is inapt: the p-value is calculated on the outright assumption that the null hypothesis is true, not on a counterfactual supposition.

“The significance level most commonly used in medical studies is 0.05. If the p-value is less than 0.05, there is less than a 5% chance that the observed association between the drug and the effect would have occurred randomly, and the results from such a study are deemed statistically significant. Conversely, if the p-value is greater than 0.05, there is greater than a 5% chance that the observed association would have occurred randomly, and the results are deemed not statistically significant. See Reference Guide on Epidemiology 357-358; David Kaye & David A. Freedman, Reference Guide on Statistics, in Reference Manual on Scientific Evidence 123, 123-125 (2d ed. 2000) (Reference Guide on Statistics).”

Id. at 14. Here the government’s brief drops the conditional of the significance probability; the p-value provides the probability that a disparity at least as large as observed would have occurred (based upon the assumed probability model), given the assumption that there really is no difference between the observed and expected results.
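The corrected, conditional definition can be made concrete with a small calculation. The numbers below are entirely hypothetical, chosen only for illustration; the point is that a p-value is a cumulative tail probability, computed on the assumption that the null hypothesis of no difference is true.

```python
# Sketch with hypothetical numbers: the p-value is the cumulative probability,
# under the null model, of a disparity at least as great as the one observed.
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): a sum over the whole tail, not one point."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Suppose 14 of 20 adverse events fell in the treated arm of a trial with
# equal-sized arms; under the null, each event lands in either arm with
# probability 0.5.
p_value = binomial_upper_tail(14, 20, 0.5)
print(round(p_value, 4))  # prints 0.0577 -- the null-model chance of 14 or more
```

The sum over the whole tail is what makes the probability “cumulative”; quoting the probability of exactly the observed count would understate how often chance alone produces results at least that extreme.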

“While statistical significance provides some indication about the validity of a correlation between a product and a harm, a determination that certain data are not statistically significant – let alone, as here, the absence of any determination one way or the other — does not refute an inference of causation. See Michael D. Green, Expert Witnesses and Sufficiency of Evidence in Toxic Substances Litigation: The Legacy of Agent Orange and Bendectin Litigation, 86 Nw. U. L. Rev. 643, 682- 683 (1992).”

Id. at 14. Validity is probably the wrong word since most statisticians and scientific authors use validity to refer to features other than low random error.

“Take, for example, results from a study, with a p-value of 0.06, showing that those who take a drug develop a rare but serious adverse effect (e.g., permanent paralysis) three times as often as those who do not. Because the p-value exceeds 5%, the study’s results would not be considered statistically significant at the 0.05 level. But since the results indicate a 94% likelihood that the observed association between the drug and the effect would not have occurred randomly, the data would clearly bear on the drug’s safety. Upon release of such a study, “confidence in the safety of the drug in question should diminish, and if the drug were important enough to [the issuer’s] balance sheet, the price of its stock would be expected to decline.” Lempert 239.”

Id. at 14-15. The citation to Lempert’s article is misleading. At the cited page, Professor Lempert is simply making the point that materiality in a securities fraud case will often be present when evidence for a causal conclusion is not. Richard Lempert, “The Significance of Statistical Significance:  Two Authors Restate An Incontrovertible Caution. Why A Book?” 34 Law & Social Inquiry 225, 239 (2009).  In so writing, Lempert anticipated the true holding of Matrixx Initiatives.  The calculation of the 94% likelihood is also incorrect.  The quantity (1 – [p-value]) yields the probability of obtaining a disparity no greater than the observed result, on the assumption that there is no difference at all between observed and expected results. There is, however, a larger point lurking in this passage of the amicus brief, which is that the difference between a p-value of 0.05 and one of 0.06 is not particularly large, and there is thus a degree of arbitrariness in treating 0.05 as too sharp a line.
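The misreading behind the 94% figure can be illustrated by simulation. The setup below is hypothetical and seeded for reproducibility: it generates trials under the null hypothesis, i.e., a drug with no effect at all, and counts how often chance alone produces a disparity at least as extreme as a supposed observation. The resulting fraction approximates a p-value; its complement describes the behavior of the null model, not the probability that the association is real.

```python
# Monte Carlo sketch (hypothetical numbers): under the null, each of 20
# adverse events is equally likely to land in either trial arm; count how
# often 14 or more land in the treated arm by chance alone.
import random

random.seed(2012)

def null_fraction_at_least(observed=14, n_events=20, n_sims=100_000):
    hits = 0
    for _ in range(n_sims):
        treated = sum(random.random() < 0.5 for _ in range(n_events))
        if treated >= observed:
            hits += 1
    return hits / n_sims

frac = null_fraction_at_least()
# frac approximates the p-value (roughly 0.06 here); (1 - frac) is the null
# model's chance of a *less* extreme disparity -- not evidence of causation.
print(frac, 1 - frac)
```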

All in all, a distressingly poor performance by the Solicitor General’s office.  With access to many talented statisticians, the government could at least have had a competent statistician review and approve the content of this amicus brief.  I suspect that most judges and lawyers, however, would balk at drawing an inference that the Solicitor General intended to mislead the Court simply because the brief contained so many misstatements about statistical inference.  This reluctance should have obvious implications for the government’s attempt to criminalize Dr. Harkonen’s statistical inferences.

Egilman Petitions the Supreme Court for Review of His Own Exclusion in Newkirk v. Conagra Foods

December 13th, 2012

Last year, the Ninth Circuit of the United States Court of Appeals affirmed a district judge’s decision to exclude Dr. David S. Egilman from testifying in a consumer-exposure diacetyl case.  Newkirk v. Conagra Foods Inc., 438 Fed. Appx. 607 (9th Cir. 2011).  The plaintiff moved on, but his expert witness could not let his exclusion go.

To get the full “flavor” of this diacetyl case, read the district court’s opinion, which excluded Egilman and other witnesses, and entered summary judgment for the defense. Newkirk v. Conagra Foods, Inc., 727 F. Supp. 2d 1006  (E.D. Wash. July 2, 2010).  Here is the language that had Dr. Egilman popping mad:

“In other parts of his reports and testimony, Dr. Egilman relies on existing data, mostly in the form of published studies, but draws conclusions far beyond what the study authors concluded, or Dr. Egilman manipulates the data from those studies to reach misleading conclusions of his own. See Daubert I, 509 U.S. at 592–93, 113 S.Ct. 2786.”

727 F. Supp. 2d at 1018.

This language cut Dr. Egilman to the kernel, and provoked him to lodge a personal appeal to the Ninth Circuit, based in part upon the economic harm done to his litigation consulting and testimonial practice. (See attached Egilman Motion Appeal Diacetyl Exclusion 2011 and Egilman Declaration Newkirk Diacetyl Appeal 2011.)  Not only did the exclusion hurt Dr. Egilman’s livelihood, but also his eleemosynary endeavors:

“The Daubert ruling eliminates my ability to testify in this case and in others. I will lose the opportunity to bill for services in this case and in others (although I generally donate most fees related to courtroom testimony to charitable organizations, the lack of opportunity to do so is an injury to me). Based on my experience, it is virtually certain that some lawyers will choose not to attempt to retain me as a result of this ruling. Some lawyers will be dissuaded from retaining my services because the ruling is replete with unsubstantiated pejorative attacks on my qualifications as a scientist and expert. The judge’s rejection of my opinion is primarily an ad hominem attack and not based on an actual analysis of what I said – in an effort to deflect the ad hominem nature of the attack the judge creates ‘straw man’ arguments and then knocks the straw men down, without ever addressing the substance of my positions.”

Egilman Declaration in Newkirk at Paragraph 11.

The Ninth Circuit affirmed Dr. Egilman’s exclusion. Newkirk v. Conagra Foods, Inc., 438 Fed. Appx. 607 (9th Cir. 2011).  See “Ninth Circuit Affirms Rule 702 Exclusion of Dr. David Egilman in Diacetyl Case.”

This year, the Ninth Circuit dismissed his personal appeal for lack of standing.  Egilman v. Conagra Foods, Inc., 2012 WL 3836100 (9th Cir. 2012). Previously, I suggested that the Ninth Circuit had issued a judgment from which there will be no appeal.  I may have been mistaken.  Last week, counsel for Dr. Egilman filed a petition for certiorari in the United States Supreme Court.  Smarting from the district court’s attack on his character and professionalism, Dr. Egilman is seeking the personal right to appeal an adverse Rule 702 ruling.  The Circuit split, which Dr. Egilman hopes will get him a hearing in the Supreme Court, involves the issue whether he, as a non-party witness, must intervene in the proceedings in order to preserve his right to appeal:

“Whether a nonparty to a district court proceeding has a right to appeal a decision that adversely affects his interest, as the Second, Sixth, and D.C. Circuits hold, or whether, as six other circuit courts hold, the nonparty must intervene or otherwise participate in the district court proceedings to have a right to appeal.”

Egilman Pet’n Cert Newkirk v Conagra SCOTUS at 5 (Dec. 2012).  Of course, there is also a split among courts about Dr. Egilman’s reliability.

And who represents Dr. Egilman?  Counsel of record is Alexander A. Reinert, who teaches at Cardozo Law School, here in New York.  Dr. Egilman and Reinert have published several articles together, within the scope of Dr. Egilman’s litigation-oriented practice.[i]  In the past, I have commented upon Reinert’s work.  See, e.g., Schachtman, “Confidence in Intervals and Diffidence in the Courts” (May 8, 2012) (Arthur H. Bryant & Alexander A. Reinert, “The Legal System’s Use of Epidemiology,” 87 Judicature 12, 19 (2003) (“The confidence interval is intended to provide a range of values within which, at a specified level of certainty, the magnitude of association lies.”) (incorrectly citing the first edition of Rothman & Greenland, Modern Epidemiology 190 (Philadelphia 1998)). It should be interesting to see what mischief Egilman & Reinert can make in the Supreme Court.


[i] David S. Egilman & Alexander A. Reinert, “Corruption of Previously Published Asbestos Research,” 55 Arch. Envt’l Health 75 (2000); David S. Egilman & Alexander A. Reinert, “Asbestos Exposure and Lung Cancer: Asbestosis Is Not Necessary,” 30 Am. J. Indus. Med. 398 (1996); David S. Egilman & Alexander A. Reinert, “The Asbestos TLV: Early Evidence of Inadequacy,” Am. J. Indus. Med. 369 (1996); David S. Egilman & Alexander A. Reinert, “The Origin and Development of the Asbestos Threshold Limit Value: Scientific Indifference and Corporate Influence,” 25 Internat’l J. Health Serv. 667 (1995).

Multiplicity versus Duplicity – The Harkonen Conviction

December 11th, 2012

United States of America v. W. Scott Harkonen, MD — Part II

The Alleged Fraud – “False as a matter of statistics”

The essence of the government’s case was that drawing an inference of causation from a statistically nonsignificant, post-hoc analysis was “false as a matter of statistics.” ER2498.  Dr. Harkonen’s trial counsel did not present any testimony from a statistician at trial.  In their final argument, his counsel explained that they had obtained sufficient concessions at trial to make their point.

In post-trial motions, new counsel for Dr. Harkonen submitted affidavits from Dr. Steven Goodman and Dr. Donald Rubin, two very capable and highly accomplished statisticians, who explained the diversity of views in their field about the role of p-values in interpreting study data and drawing causal inferences.  At trial, however, the government’s witnesses, Drs. Crager and Fleming, testified that p-values of [less than] 0.05 were “magic numbers.”  United States v. Harkonen, 2010 WL 2985257, at *5 (N.D. Calif. 2010) (Judge Patel’s opinion denying defendant’s post-trial motions to dismiss the indictment, for acquittal, or for a new trial).  Sometimes judges are looking for bright lines in the wrong places.

The Multiplicity Problem

The government argued that the proper interpretation of a given p-value requires information about the nature and context of the statistical test that gave rise to the p-value.  If many independent tests are run on the same set of data, some low p-values would be expected to occur by chance alone.  Multiple testing can inflate the rate of false-positive findings, or Type I errors.  The generation of these potentially false-positive results is sometimes called the “multiplicity problem”; in the face of multiple testing, a stated p-value can greatly understate the probability of a false-positive finding.

In the context of a randomized clinical trial, it is thus important to know what the prespecified primary and secondary end points were.  David Moher, Kenneth F. Schulz, and Douglas G. Altman, “The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomised trials,” 357 Lancet 1191 (2001). Post hoc data dredging can lead to the “Texas Sharpshooter Fallacy,” which results when an investigator draws a target around a hit, after the fact, and declares a bulls-eye.
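The inflation from multiple testing can be quantified. A minimal sketch, assuming independent tests of true null hypotheses, each conducted at the conventional 0.05 level: the family-wise chance of at least one spuriously “significant” result grows rapidly with the number of tests, which is why non-prespecified subgroup analyses demand caution.

```python
# Sketch of the multiplicity problem: probability of at least one false-positive
# "significant" finding across n independent tests of true null hypotheses.
def familywise_error_rate(n_tests, alpha=0.05):
    """P(at least one p-value < alpha) = 1 - P(no test crosses the line)."""
    return 1 - (1 - alpha) ** n_tests

for k in (1, 5, 10, 20):
    print(k, round(familywise_error_rate(k), 3))
# prints: 1 0.05 / 5 0.226 / 10 0.401 / 20 0.642
```

With twenty looks at the data, chance alone yields a “significant” result nearly two times in three, so a single low p-value from a data dredge carries far less evidentiary weight than the same p-value from a prespecified end point.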

Dr. Fleming thus had a limited point: namely, that the verb “demonstrate,” rather than “show” or “suggest,” was too strong if based solely upon InterMune’s clinical trial, given that the low p-value came in the context of a non-prespecified subgroup analysis. (The supposedly offensive press release issued by Dr. Harkonen did indicate that the data confirmed the results in a previously reported phase II trial.) Had the government engaged in counter-speech, asserting that Dr. Harkonen’s statements fell below an idealized “best statistical practice” in his use of “demonstrate,” many statisticians might well have agreed with the government.  Even this limited point would evaporate if Dr. Harkonen had stated that the phase III subgroup analysis, along with the earlier published clinical trial and clinical experience, “demonstrated” a survival benefit.  Had Dr. Harkonen issued this more scientifically felicitous statement, the government could not have claimed falsity in his use of the verb “to demonstrate” with a single p-value from a post hoc subgroup analysis.  Such a statement would have taken Dr. Harkonen’s analytical inference out of the purely statistical realm. Indeed, Dr. Harkonen’s press release did reference an earlier phase II trial, and notified readers that more detailed analyses would be presented at upcoming medical conferences.  Although Dr. Harkonen did use “demonstrate” to characterize the results of the phase III trial, standing alone, the entire press release made clear that the data were preliminary. It is difficult to imagine any reasonable physician prescribing Actimmune on the basis of the press release.

The prosecution and conviction of Dr. Harkonen thus raise the issue whether the State can criminalize an allegedly improper characterization of a study’s statistical result.  Clearly, the federal prosecutors were motivated by their perception that the alleged fraud was connected to an attempt to promote an off-label use of Actimmune.  The linguistic precision demanded of Dr. Harkonen, however, is widely flouted in the worlds of law and science.  Lawyers use the word “proofs” to describe real, demonstrative, and testimonial evidence, which often admits of inferences for either side.  A mathematician might be moved to prosecute all lawyers for fraudulent speech.  From the mathematician’s perspective, the lawyers have made a claim of certainty in using “proof,” which is totally out of place.  Even in the world of science, the verb “to demonstrate” is used in ways that do not imply the sort of certitude that purists might wish to reserve for the strongest of empirical inferences from clinical trials. See, e.g., William B. Wong, Vincent W. Lin, Denise Boudreau, and Emily Beth Devine, “Statins in the prevention of dementia and Alzheimer’s disease: A meta-analysis of observational studies and an assessment of confounding,” 21 Pharmacoepidemiology & Drug Safety in-press, at Abstract (2012) (“Studies demonstrate the potential for statins to prevent dementia and Alzheimer’s disease (AD), but the evidence is inconclusive.”) (emphasis added).

The Duplicity Problem – The Matrixx Motion

After the conviction, Dr. Harkonen’s counsel moved for a new trial on grounds of newly discovered evidence. Dr. Harkonen’s counsel hoisted the prosecutors with their own petard, by quoting the government’s amicus brief to the United States Supreme Court in Matrixx Initiatives Inc. v. Siracusano, 131 S. Ct. 1309 (2011).  In Matrixx, the securities fraud plaintiffs contended that they need not plead “statistically significant” evidence for adverse drug effects.  The Solicitor General’s office, along with counsel for the Food and Drug Division of the Department of Health & Human Services, in their zeal to assist plaintiffs in their claims against an over-the-counter pharmaceutical manufacturer, disclaimed the necessity, or even the importance, of statistical significance:

“[w]hile statistical significance provides some indication about the validity of a correlation between a product and a harm, a determination that certain data are not statistically significant … does not refute an inference of causation.”

Brief for the United States as Amicus Curiae Supporting Respondents, in Matrixx Initiatives, Inc. v. Siracusano, 2010 WL 4624148, at *14 (Nov. 12, 2010).

The government’s amicus brief introduces its discussion of this topic with a heading, entitled “Statistical significance is a limited and non-exclusive tool for inferring causation.” Id. at *13.  In a footnote, the government elaborated that its position applied to both safety and efficacy outcomes:

“[t]he same principle applies to studies suggesting that a particular drug is efficacious. A study  in which the cure rate for cancer patients who took a drug was twice the cure rate for those who took a placebo could generate meaningful interest even if the results were not statistically significant.”

Id. at *15 n.2.

The government might have suggested that Dr. Harkonen was parsing the amicus brief incorrectly.  After all, generating “meaningful interest” is not the same as generating a scientific conclusion, or as “demonstrating.” As I will show in a future post, the government, in its amicus brief, consistently misstated the meaning of statistical significance, and of significance probability.  The government’s inability to communicate these concepts correctly raises serious due process issues with a prosecution against someone for having used the wrong verb to describe a statistical inference.

SCOTUS

The government’s amicus brief was clearly influential before the Supreme Court. The Court cited to, and adopted in dictum, the claim that the absence of statistical significance did not mean that medical expert witnesses could not have a reliable basis for inferring causation between a drug and an adverse event.  Matrixx Initiatives, Inc. v. Siracusano, — U.S. –, 131 S.Ct. 1309, 1319-20 (2011) (“medical professionals and researchers do not limit the data they consider to the results of randomized clinical trials or to statistically significant evidence”).

In any event, the prosecutor, in Dr. Harkonen’s trial, argued in summation that InterMune’s clinical trial had “failed,” and no conclusions could be drawn from the trial.  If this argument was not flatly contradicted by the government’s Matrixx brief, then the argument was certainly undermined by the rhetorical force of the government’s amicus brief.

The district court denied Dr. Harkonen’s motion for a new trial, and explained that the government’s Matrixx amicus brief contained “argument” rather than “newly discovered evidence.” United States v. Harkonen, No. C 08-00164 MHP, Memorandum and Order re Defendant Harkonen’s Motions for a New Trial at 14 (N.D. Calif. April 18, 2011). This rationale seems particularly inapt because the interpretation of a statistical test and the drawing of an inference are both “arguments,” and it is a fact that the government contended that p < 0.05 was not necessary for drawing causal inferences. The district court also offered that Matrixx was distinguishable on the ground that the securities fraud in Matrixx involved a safety outcome rather than an efficacy conclusion. This distinction truly lacks a difference:  the standards for determining causation do not differ between establishing harm and establishing efficacy.  Of course, the FDA does employ a lesser, precautionary standard for regulating against harm, but this difference does not mean that the causal connections between drugs and harms are assessed under different standards.

On December 6th, the appeals in United States v. Harkonen were argued and submitted for decision.  Win or lose, Dr. Harkonen’s case is likely to make important law on how scientists and lawyers speak about statistical inferences.

The (Clinical) Trial by Franz Kafka

December 9th, 2012

United States of America v. W. Scott Harkonen, MD — Part I

Last week, Mark Haddad, of Sidley Austin, argued Dr. W. Scott Harkonen’s appeal in the Ninth Circuit.   In 2009, Dr. Harkonen was convicted by a jury, before the Hon. Marilyn Hall Patel, on a single count of wire fraud, under 18 U.S.C. § 1343. The jury acquitted Dr. Harkonen of felony misbranding, 21 U.S.C. §§ 331(k), 333(a)(2), 352(a).  Dr. Harkonen’s crime?  Bad statistical practice!

Dr. Harkonen, a physician, was the President and CEO of InterMune, Inc., a biotechnology company that researches and develops medications. InterMune developed interferon gamma-1b (Actimmune®), which was licensed by the FDA for the treatment of two rare diseases, chronic granulomatous disease and severe, malignant osteopetrosis.  In 1999, Austrian researchers published the results of a small randomized clinical trial, which concluded that at 12 months, treatment with interferon gamma-1b (Actimmune®) plus prednisolone was associated with “substantial improvements in the conditions of patients with idiopathic pulmonary fibrosis [IPF] who had had no response to glucocorticoids alone.” Rolf Ziesche, Elisabeth Hofbauer, Karin Wittmann, Ventzislav Petkov & Lutz-Henning Block, 341 New Engl. J. Med. 1264 (1999).  Based upon this 1999 clinical trial, InterMune conducted another clinical trial, with a primary end point of “progression-free survival,” measured by decline in specified pulmonary function tests or death.  InterMune’s trial specified nine secondary end points, including survival time from randomization until the end of the trial.

InterMune’s trial failed to show a benefit on the primary end point of progression-free survival.  Patients on Actimmune did, however, experience improvements on the survival end point, which were not statistically significant at the pre-specified level of alpha (p < 0.05).  Although not statistically significant as defined, 28 of 168 patients on placebo died, while only 16 of 162 patients on Actimmune died – a 40% relative reduction in mortality on therapy, p-value = 0.084.  The relative survival benefit was greater (70%) for a non-prespecified subgroup that had mild-to-moderate IPF (by pulmonary function criteria) at the outset of the trial.

For a combined subgroup of all mild-to-moderate IPF patients (FVC > 55%), making up 77% of all trial participants, there were only 6 deaths among patients on Actimmune (n = 126), compared with 21 on placebo (n = 128). For this non-prespecified subgroup, the relative improvement was 70%, p = 0.004.
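The reported p-values came from the trial’s own time-to-event analyses, which are not reproduced here. As a rough cross-check, though, a one-sided Fisher exact test can be run on the crude death counts alone, using only the Python standard library (this sketch ignores censoring and follow-up time, so it will not match the reported figures exactly):

```python
from math import comb

def fisher_one_sided(deaths_tx, n_tx, deaths_ctl, n_ctl):
    """P(deaths on treatment <= observed), under the hypergeometric null
    that the total deaths fall at random across the two arms."""
    total, deaths = n_tx + n_ctl, deaths_tx + deaths_ctl
    denom = comb(total, deaths)
    return sum(comb(n_tx, k) * comb(n_ctl, deaths - k)
               for k in range(deaths_tx + 1)) / denom

# Whole trial: 16 of 162 deaths on Actimmune vs. 28 of 168 on placebo
p_overall = fisher_one_sided(16, 162, 28, 168)
# Mild-to-moderate subgroup: 6 of 126 vs. 21 of 128
p_subgroup = fisher_one_sided(6, 126, 21, 128)
print(f"overall: p = {p_overall:.3f}; subgroup: p = {p_subgroup:.4f}")
```

The subgroup contrast is far more extreme than the whole-trial contrast, which is precisely what made the non-prespecified analysis so tempting, and so vulnerable to the multiplicity objection.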

In August 2002, Dr. Harkonen approved a press release, which carried a headline, “phase III data demonstrating survival benefit of Actimmune in IPF.” A subtitle announced the 70% relative reduction in patients with mild to moderate disease.  The text of the press release stated that the company’s view was based upon “preliminary” clinical trial data, which “demonstrate a significant survival benefit in patients with mild to moderate disease randomly assigned to Actimmune versus control treatment (p=0.004).” The press release also stated the results and associated p-value for the survival endpoint for the whole study population, as well as the results of the long-term follow-up study of the patients from the original study by Ziesche, et al. (which also showed a survival benefit for those randomized to Actimmune).  The remainder of the four-page press release acknowledged that the results of the primary end point did not reach statistical significance, and identified two upcoming medical conferences, as well as a conference call with the investment community that would be recorded and posted on the company’s website for two days, at which further details would be provided.

Dr. Harkonen was acquitted of misbranding, but convicted of wire fraud for having issued this press release.  The gravamen of his crime was stating that the clinical trial “demonstrated” prolonged survival for IPF patients.  The prosecution asserted that Dr. Harkonen engaged in data dredging, grasping for the right non-prespecified end point that had a low p-value attached. Such data dredging implicates the problem of multiple comparisons or tests, with the result of increasing the risk of a false-positive finding, notwithstanding the p-value below 0.05.

Supported by the testimony of Professor Thomas Fleming, who chaired the Data Safety Monitoring Board for the clinical trial in question, the government claimed that the trial results were “negative” because the p-values for all the pre-specified endpoints exceeded 0.05.  Shortly after the press release, Fleming sent InterMune a letter that strongly dissented from the language of the press release, which he characterized as misleading.  Because the primary and secondary end points were not statistically significant, and because the reported mortality benefit was found in a non-prespecified subgroup, the interpretation of the trial data required “greater caution,” and the press release was a “serious misrepresentation of results obtained from exploratory data subgroup analyses.”

The district court sentenced Dr. Harkonen to six months of home confinement, three years of probation, 200 hours of community service, and a fine of $20,000. Dr. Harkonen appealed on grounds that the federal fraud statutes do not permit the government to prosecute persons for expressing scientific opinions about which reasonable minds can differ.  If any reasonable scientist could find the defendant’s statement to be true, the trial court should dismiss the prosecution.  Statements that have support even from a minority of the scientific community should not be the basis for a fraud charge.  In Dr. Harkonen’s case, the government did not allege any misstatement of an objectively verifiable fact, but alleged falsity in his characterization of the data’s “demonstration” of an efficacy effect.  The government cross-appealed to complain about the leniency of the sentence.

Dr. Harkonen’s trial counsel did not present any expert witnesses, but he did elicit testimony from some of the government witnesses about the proper interpretation of the trial data and about the controversy over reliance upon a precise p-value for interpreting causality.  On appeal, for instance, Dr. Harkonen’s counsel quoted a government witness, Dr. Wayne Hockmeyer:

“Many times people have the impression that—that when you look at data, it’s immediately clear what conclusions you ought to draw from those data. . . . And sometimes that’s true. And sometimes there are gray areas. And it is not true all the time. And there’s a lot of vigorous debate that goes on amongst members of the scientific and medical community about the conclusions that one ought to draw from those data.” ER1085.

A panel of three judges, Judges Nelson, Tashima, and Murguia, heard Dr. Harkonen’s appeal.  The case presents obvious First Amendment issues, but the more curious issues involve whether the government can impose a statistical orthodoxy on pain of punishment under the wire fraud statutes.  There is much that can be said against Dr. Harkonen’s interpretation of the data.  Clearly, multiplicity was a problem that diluted the meaning of the reported p-value, but the government never presented evidence of what the p-value, corrected for multiple testing, might be.  If Dr. Harkonen committed a crime, then so have many biomedical journal editors, article authors, and government scientists for having over-interpreted evidence in communications that travel in the U.S. mails, and over the internet.

EPA Post Hoc Statistical Tests – One Tail vs Two

December 2nd, 2012

EPA 1992 Meta-Analysis of ETS & Lung Cancer – Part 2

In 1992, the U.S. Environmental Protection Agency (EPA) published a risk assessment of lung cancer (and other) risks from environmental tobacco smoke (ETS).  See Respiratory Health Effects of Passive Smoking: Lung Cancer and Other Disorders EPA/600/6-90/006F (1992).  The agency concluded that ETS causes about 3,000 lung cancer deaths each year among non-smoking adults.  See also EPA “Fact Sheet: Respiratory Health Effects of Passive Smoking,” Office of Research and Development, and Office of Air and Radiation, EPA Document Number 43-F-93-003 (Jan. 1993).

In my last post, I discussed how various plaintiffs, including tobacco companies, challenged the EPA’s conclusions as agency action that violated administrative and statutory procedures. “EPA Cherry Picking (WOE) – EPA 1992 Meta-Analysis of ETS & Lung Cancer – Part 1” (Dec. 2, 2012). The plaintiffs further claimed that the EPA had manufactured its methods to achieve the result it desired in advance of the analyses. A federal district court agreed with the methodological challenges to the EPA’s report, but the Court of Appeals reversed on grounds that the agency’s report was not reviewable agency action.  Flue-Cured Tobacco Cooperative Stabilization Corp. v. EPA, 4 F. Supp. 2d 435 (M.D.N.C. 1998), rev’d, 313 F.3d 852, 862 (4th Cir. 2002) (Widener, J.) (holding that the issuance of the report was not “final agency action”).

One of the grounds of the plaintiffs’ challenge was that the EPA had changed, without explanation, from a 95% to a 90% confidence interval.  The change in the specification of the coefficient of confidence was equivalent to a shift from a two-tailed to a one-tailed test of confidence, with alpha set at 5%.  This change, along with gerrymandering or “cherry picking” of studies, allowed the EPA to claim a statistically significant association between ETS and lung cancer. 4 F. Supp. 2d at 461.  The plaintiffs pointed to EPA’s own previous risk assessments, as well as statistical analyses by the World Health Organization (International Agency for Research on Cancer), the National Research Council, and the Surgeon General, all of which routinely use 95% intervals, and two-tailed tests of significance.  Id.

In its 1990 Draft ETS Risk Assessment, the EPA had used a 95% confidence interval, but in later drafts changed to a 90% interval.  One of the epidemiologists on the EPA’s Scientific Advisory Board, Geoffrey Kabat, criticized this post hoc change, noting that the use of 90% intervals is disfavored and that the post hoc change in statistical methodology created the appearance of an intent to influence the outcome of the analysis. Id. (citing Geoffrey Kabat, “Comments on EPA’s Draft Report: Respiratory Health Effects of Passive Smoking: Lung Cancer and Other Disorders,” II.SAB.9.15 at 6 (July 28, 1992) (JA 12,185)).

The EPA argued that its adoption of a one-tailed test of significance was justified on the basis of an a priori hypothesis that ETS is associated with lung cancer.  Id. at 451-52, 461 (citing to ETS Risk Assessment at 5–2). The court found this EPA argument hopelessly circular.  The agency postulated its a priori hypothesis, which it then took as license to dilute the statistical test for assessing the evidence.  The agency, therefore, had assumed what it wished to show, in order to achieve the result it sought.  Id. at 456.  The EPA claimed that the one-tailed test had more power, but with dozens of studies aggregated into a summary result, the court recognized that Type I error was a larger threat to the validity of the agency’s conclusions.

The EPA also advanced a muddled defense of its use of 90% confidence intervals by arguing that if it used a 95% interval, the results would have been incongruent with the one-tailed p-values.  The court recognized that this was really no discrepancy at all, but only a corollary of using either one-tailed 5% tests or 90% confidence intervals.  Id. at 461.

If the EPA had adhered to its normal methodology, there would have been no statistically significant association between ETS and lung cancer. With its post hoc methodological choice, and highly selective approach to study inclusions in its meta-analysis, the EPA was able to claim a weak statistically significant association between ETS and lung cancer.  Id. at 463.  The court found this to be a deviation from the legally required use of “best judgment possible based upon the available evidence.”  Id.

Of course, the EPA could have announced its one-tailed test from the inception of the risk assessment, and justified its use on grounds that it was attempting to reach only a precautionary judgment for purposes of regulation.  Instead, the agency tried to showcase its finding as a scientific conclusion, which only further supported the tobacco companies’ challenge to the post hoc change in plan for statistical analysis.

Although the validity issues in the EPA’s 1992 meta-analysis should have been superseded by later studies, and later meta-analyses, the government’s fraud case, before Judge Kessler, resurrected the issue:

“3344. Defendants criticized EPA’s meta-analysis of U.S. epidemiological studies, particularly its use of an ‘unconventional 90 percent confidence interval’. However, Dr. [David] Burns, who participated in the EPA Risk Assessment, testified that the EPA used a one-tailed 95% confidence interval, not a two-tailed 90% confidence interval. He also explained in detail why a one-tailed test was proper: The EPA did not use a 90% confidence interval. They used a traditional 95% confidence interval, but they tested for that interval only in one direction. That is, rather than testing for both the possibility that exposure to ETS increased risk and the possibility that it decreased risk, the EPA only tested for the possibility that it increased the risk. It tested for that possibility using the traditional 5% chance or a P value of 0.05. It did not test for the possibility that ETS protected those exposed from developing lung cancer at the direction of the advisory panel which made that decision based on its prior decision that the evidence established that ETS was a carcinogen. What was being tested was whether the exposure was sufficient to increase lung cancer risk, not whether the agent itself, that is cigarette smoke, had the capacity to cause lung cancer with sufficient exposure. The statement that a 90% confidence interval was used comes from the observation that if you test for a 5% probability in one direction the boundary is the same as testing for a 10% probability in two directions. Burns WD, 67:5-15. In fact, the EPA Risk Assessment stated, ‘Throughout this chapter, one-tailed tests of significance (p = 0.05) are used …’ .”

U.S. v. Philip Morris USA, Inc., 449 F. Supp. 2d 1, 702-03 (D.D.C., 2006) (Kessler, J.) (internal citations omitted).

Judge Kessler was misled by Dr. Burns, a frequent testifier for plaintiffs’ counsel in tobacco cases.  Burns should have known that, with respect to the lower bound of the confidence interval, which is what matters for determining whether the meta-analysis excludes a risk ratio of 1.0, there is no difference between a one-tailed 95% confidence interval and a two-tailed 90% interval.  Burns’ sophistry hardly saves the EPA’s error in changing its pre-specified end point and statistical analysis, or cures the danger of unduly increasing the risk of Type I error in the EPA meta-analysis. See “Pin the Tail on the Significance Test” (July 14th, 2012).
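The equivalence that Dr. Burns obscured is purely a matter of critical values: a one-tailed test at 5% and a two-tailed 90% confidence interval both rest on the 95th percentile of the standard normal distribution, so their lower bounds coincide exactly. A minimal check with the Python standard library:

```python
from statistics import NormalDist

norm = NormalDist()
z_one_tailed_05 = norm.inv_cdf(1 - 0.05)      # one-tailed test, alpha = 0.05
z_two_tailed_90 = norm.inv_cdf(1 - 0.10 / 2)  # two-tailed 90% interval
# Both critical values are ~1.645, so a "one-tailed 95%" lower bound and a
# "two-tailed 90%" lower bound are the same number.
print(f"{z_one_tailed_05:.4f} vs {z_two_tailed_90:.4f}")
```

Whatever label one puts on the procedure, the boundary that decides whether a risk ratio of 1.0 is excluded is identical.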

Post-script

Judge Widener wrote the opinion for a panel of the United States Court of Appeals for the Fourth Circuit, which reversed the district court’s judgment against the EPA’s report.  The Circuit’s decision did not address the scientific issues, but by holding that the agency action was not reviewable, it removed the basis for the district court’s review of the scientific and statistical issues.  For those pundits who see only self-interested behavior in judging, the author of the Circuit’s decision was a lifetime smoker, who grew Burley tobacco on his farm outside Abingdon, Virginia.  Judge Widener died on September 19, 2007, of lung cancer.

EPA Cherry Picking (WOE) – EPA 1992 Meta-Analysis of ETS & Lung Cancer – Part 1

December 2nd, 2012

Somehow, before the Supreme Court breathed life into Federal Rule of Evidence 702, parties sometimes found a way to challenge dubious scientific evidence in court.  One good example is the challenge to the United States Environmental Protection Agency’s risk assessment of passive smoking, also known as environmental tobacco smoke (ETS).  In 1992, the Environmental Protection Agency (EPA) published a risk assessment of lung cancer (and other) risks from ETS.  See Respiratory Health Effects of Passive Smoking: Lung Cancer and Other Disorders EPA/600/6-90/006F (1992).  The agency concluded that ETS causes about 3,000 lung cancer deaths each year among non-smoking adults in the United States.  See also EPA “Fact Sheet: Respiratory Health Effects of Passive Smoking,” Office of Research & Development; EPA Document Number 43-F-93-003 (Jan. 1993).

Various plaintiffs, including tobacco companies, challenged the EPA’s conclusions as agency action that violated administrative and statutory procedures.  The plaintiffs further claimed that the EPA had manufactured its methods to achieve the result it desired in advance of the analyses. In other words, plaintiffs asserted that the EPA’s issuance of the ETS report violated the Administrative Procedure Act’s procedural requirements, as well as the requirements of the specific enabling legislation, the Radon Gas and Indoor Air Quality Research Act, Pub. L. No. 99–499, 100 Stat. 1758–60 (1986) (codified at 42 U.S.C. § 7401 note (1994)).  A federal district court agreed with the methodological challenges to the EPA’s report, but the Court of Appeals reversed on grounds that the agency’s report was not reviewable agency action.  Flue-Cured Tobacco Cooperative Stabilization Corp. v. EPA, 4 F. Supp. 2d 435 (M.D.N.C. 1998), rev’d on other grounds, 313 F.3d 852, 862 (4th Cir. 2002) (Widener, J.) (holding that the issuance of the report was not “final agency action”). The district court’s assessment of the validity issues was not addressed by the appellate court.

Notwithstanding the district court’s findings, the EPA continues to claim that it had reached valid scientific conclusions using a “scientific approach”:

“EPA reached its conclusions concerning the potential for ETS to act as a human carcinogen based on an analysis of all of the available data, including more than 30 epidemiologic (human) studies looking specifically at passive smoking as well as information on active or direct smoking. In addition, EPA considered animal data, biological measurements of human uptake of tobacco smoke components and other available data. The conclusions were based on what is commonly known as the ‘total weight-of-evidence’ rather than on any one study or type of study.

The finding that ETS should be classified as a Group A carcinogen is based on the conclusive evidence of the dose-related lung carcinogenicity of mainstream smoke in active smokers and the similarities of mainstream and sidestream smoke given off by the burning end of the cigarette. The finding is bolstered by the statistically significant exposure-related increase in lung cancer in nonsmoking spouses of smokers which is found in an analysis of more than 30 epidemiology studies that examined the association between secondhand smoke and lung cancer.”

EPA “Fact Sheet: Respiratory Health Effects of Passive Smoking,”  Office of Research and Development; EPA Document Number 43-F-93-003, January 1993 (emphasis added).

A prominent feature of the EPA’s analysis was a meta-analysis of epidemiologic studies of ETS and lung cancer.  Interestingly, the tobacco industry plaintiffs did not appear to challenge the legitimacy of the basic meta-analytic enterprise, which was still controversial at the time.  See, e.g., Samuel Shapiro, “Meta-analysis/Shmeta-analysis,” 140 Am. J. Epidem. 771 (1994); Alvan Feinstein, “Meta-Analysis: Statistical Alchemy for the 21st Century,” 48 J. Clin. Epidem. 71 (1995).  Their challenge went straight to the validity of the EPA’s meta-analysis, and a documented post hoc change in the agency’s statistical plan for analyzing the meta-analysis results.  Only a few years earlier, the defense in polychlorinated biphenyl (PCB) litigation broadly challenged a plaintiffs’ expert witness’s use of meta-analysis of observational epidemiologic studies, only to have the Third Circuit reject the challenge and direct the district court to review the validity of the meta-analysis as conducted by the witness.  In re Paoli RR Yard PCB Litig., 706 F. Supp. 358, 373 (E.D. Pa. 1988), rev’d, 916 F.2d 829, 856-57 (3d Cir. 1990), cert. denied, 499 U.S. 961 (1991); see also Hines v. Consol. Rail Corp., 926 F.2d 262, 273 (3d Cir. 1991).

The EPA report was not the first attempt to use meta-analysis for the epidemiology of ETS and lung cancer.  In 1986, the National Academy of Sciences reported a meta-analysis on the subject.  See National Research Council, National Academy of Sciences,  Environmental tobacco smoke: measuring exposures and assessing health effects (Wash. DC 1986).  This earlier meta-analysis was also controversial.  Indeed, some of the early concerns over the use of meta-analysis for observational epidemiologic studies arose in the context of studies of ETS.  See, e.g., Joseph L. Fleiss & Alan J. Gross, “Meta-Analysis in Epidemiology, with Special Reference to Studies of the Association between Exposure to Environmental Tobacco Smoke and Lung Cancer:  A Critique,” 44 J. Clin. Epidem. 127 (1991) (criticizing the National Research Council 1986 meta-analysis of ETS and lung cancer studies as unwarranted based upon the low quality of the studies included).  These concerns were heightened by politicized use of meta-analyses in regulatory agencies to overclaim scientific conclusions from weak, inconclusive data.

In the EPA’s meta-analysis, statistical significance was achieved only by changing the criterion of significance, post hoc, from a two-tailed to a one-tailed 5% test.  Perhaps more disturbing was the scientific gerrymandering that took place as to which studies to include and exclude from the meta-analysis.
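To see how much work the one-tailed choice can do, take the EPA’s reported summary relative risk of 1.19 and suppose a standard error of about 0.095 on the log scale (a hypothetical value chosen purely for illustration; the agency’s actual standard error is not given here). A two-tailed 90% interval then just excludes 1.0, while a conventional two-tailed 95% interval does not:

```python
from math import exp, log
from statistics import NormalDist

rr = 1.19          # EPA's reported summary relative risk
se_log_rr = 0.095  # hypothetical standard error on the log scale (illustration only)

norm = NormalDist()
lower_90 = exp(log(rr) - norm.inv_cdf(0.95) * se_log_rr)   # two-tailed 90% CI
lower_95 = exp(log(rr) - norm.inv_cdf(0.975) * se_log_rr)  # two-tailed 95% CI

print(f"90% CI lower bound: {lower_90:.3f}")  # just above 1.0
print(f"95% CI lower bound: {lower_95:.3f}")  # just below 1.0
```

For an association this weak, the entire claim of statistical significance can turn on the post hoc choice of interval, which is why the court found the unexplained switch so troubling.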

In its first review of the EPA’s draft report, a committee of the agency’s Scientific Advisory Board, the IAQC [the Indoor Air Quality/Total Human Exposure Committee] found that the EPA’s ETS risk assessment violated one of the necessary criteria for a valid meta-analysis – a “precise definition of criteria used to include (or exclude) studies.”  4 F. Supp. 2d at 459 (citing EPA, An SAB Report: Review of Draft Environmental Tobacco Smoke Health Effects Document, EPA/SAB/IAQC/91/007 at 32–33 (1991) (SAB 1991 Review) (JA 9,497–98)).  The agency had not provided specific criteria for including studies. The IAQC also noted that it was important to evaluate the consequences of having excluded studies in the form of sensitivity studies. In a later review, in 1992, both the EPA and the IAQC dropped this critique of the agency’s meta-analysis, without explanation.  Id. at 459.

By the time the EPA released its ETS report in 1993, there were about 58 published epidemiologic studies available for inclusion in any meta-analysis.  The EPA included only 31.  The agency limited its analysis to nonsmoking women married to smoking spouses.  There were 33 studies of this exposed group; the EPA included 31 of the 33.  There were also available 12 studies of women exposed to ETS in their workplace, and 13 studies of women who were exposed to ETS as children.  Id. at 458. There were three late-breaking studies of women with spousal exposures, but the EPA excluded two, without explanation.  Id. at 459.

In reviewing the plaintiffs’ challenge, the district court noted that the EPA had given a bare, unconvincing explanation for excluding the childhood and workplace studies.  Id.  The EPA argued that there were less data in the childhood and workplace studies, but this assertion struck the court as an evasive rationale when one of the purposes of conducting a meta-analysis was to incorporate the data from smaller, less powerful studies.  Id. at 458-59.  The primary author of the disputed chapter of the EPA report, Kenneth Brown, called the disputed studies “inadequate,” without providing a rational basis or explanation.  The IAQC, in its earlier review of a 1991 draft report, recognized that the excluded studies provided less information, but concluded that “the report should review and comment on the data that do exist… .” Id. at 459.

The court found the EPA’s selection of studies for inclusion in a meta-analysis to be “disturbing”:

“First, there is evidence in the record supporting the accusation that EPA ‘cherry picked’ its data. Without criteria for pooling studies into a meta-analysis, the court cannot determine whether the exclusion of studies likely to disprove EPA’s a priori hypothesis was coincidence or intentional. Second, EPA’s excluding nearly half of the available studies directly conflicts with EPA’s purported purpose for analyzing the epidemiological studies and conflicts with EPA’s Risk Assessment Guidelines. See ETS Risk Assessment at 4–29 (“These data should also be examined in the interest of weighing all the available evidence, as recommended by EPA’s carcinogen risk assessment guidelines (U.S.EPA, 1986a) ….” (emphasis added)). Third, EPA’s selective use of data conflicts with the Radon Research Act. The Act states EPA’s program shall ‘‘gather data and information on all aspects of indoor air quality….’’ Radon Research Act § 403(a)(1) (emphasis added). In conducting a risk assessment under the Act, EPA deliberately refused to assess information on all aspects of indoor air quality.”

4 F. Supp. 2d at 460.

The court was no doubt impressed by the duplicity of the agency’s claim to have used a “total weight of the evidence” approach to the question of causality, and its censoring of the analysis in a way that appeared to game the result.  Id. at 454.  The EPA’s guidelines called for basing conclusions on all available evidence.  EPA’s Guidelines for Carcinogen Risk Assessment, 51 Fed. Reg. 33,996, 33,999-34,000 (1986).

Using evidence selectively, and adopting post hoc a one-tailed test of statistical significance, the EPA reported a summary risk estimate of 1.19, and categorized ETS as a “Group A” carcinogen. In most of its previous Group A classifications, the agency had based its decisions upon much higher relative risks.  Indeed, the agency had rejected Group A classifications when relative risks were found to be less than three.  4 F. Supp. 2d at 461.  The sum total of the agency’s methodological laxity was too much for the district court, which struck the offending chapters of the EPA report.  Four years later, the U.S. Court of Appeals for the Fourth Circuit reversed, on the ground that the EPA report was not reviewable agency action.
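The mechanics of the one-tailed move are worth spelling out. The sketch below uses hypothetical numbers, chosen only to resemble a weak summary estimate of 1.19 and not to reproduce the EPA’s actual confidence bounds, to show how an association can be “significant” under a one-tailed test at 0.05 while failing the conventional two-tailed test:

```python
from math import exp, log, sqrt, erf

def norm_cdf(z):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Hypothetical inputs: a summary RR of 1.19 with an assumed 90% CI of
# (1.01, 1.40).  Illustrative numbers only, not the EPA's reported bounds.
rr, lo90, hi90 = 1.19, 1.01, 1.40

# Recover the standard error of log(RR) from the 90% CI half-width (z = 1.645).
se = (log(hi90) - log(lo90)) / (2 * 1.645)
z = log(rr) / se

p_one_tailed = 1 - norm_cdf(z)        # one-tailed test of RR > 1
p_two_tailed = 2 * (1 - norm_cdf(z))  # conventional two-tailed test

# The corresponding two-tailed 95% CI straddles 1.0:
lo95, hi95 = exp(log(rr) - 1.96 * se), exp(log(rr) + 1.96 * se)
print(f"one-tailed p = {p_one_tailed:.3f}; two-tailed p = {p_two_tailed:.3f}")
print(f"95% CI: ({lo95:.2f}, {hi95:.2f})")
```

On these assumed numbers, the one-tailed p-value falls below 0.05 while the two-tailed p-value does not, and the 95 percent interval includes 1.0 – which is the nub of the critics’ complaint about the post hoc choice of test.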

The EPA report became a lightning rod for methodological criticism of meta-analysis for observational studies, and the EPA’s use of meta-analysis.  Critics argued that the EPA had succumbed to political pressure from the anti-tobacco lobby.  See, e.g., Gio B. Gori & John C. Luik, Passive Smoke: The EPA’s Betrayal of Science and Policy (Vancouver, BC: The Fraser Institute 1999); John C. Luik, “Pandora’s Box: The Dangers of Politically Corrupted Science for Democratic Public Policy,” Bostonia 54 (Winter 1999-94).  See also Elizabeth Fisher, “Case law analysis. Passive smoking and active courts: the nature and role of risk regulators in the US and UK.  Flue-cured Tobacco Co-op v US Environmental Protection Agency,” 12 J. Envt’l Law 79 (2000).

The federal government has been trying to defend the EPA’s 1992 report ever since.  In 1998, upon listing ETS as a known carcinogen, the Department of Health and Human Services noted that “[t]he individual studies were carefully summarized and evaluated”  in the 1992 EPA report.  U.S. Dep’t of Health & Human Services, National Toxicology Program, Final Report on Carcinogens – Background Document for Environmental Tobacco Smoke: Meeting of the NTP Board of Scientific Counselors – Report on Carcinogens Subcommittee at 24 (Research Triangle Park, NC 1998).  Anti-tobacco scientists, including scientists involved in the EPA report, have attacked the motives of the industry, and of the scientists who have challenged the report.  See, e.g., Jonathan M. Samet & Thomas A. Burke, “Turning Science Into Junk: The Tobacco Industry and Passive Smoking,” 91 Am. J. Pub. Health 1742 (2001); Monique E. Muggli, Richard D. Hurt, and James Repace, “The Tobacco Industry’s Political Efforts to Derail the EPA Report on ETS,” 26 Am. J. Prev. Med. 167 (2004); Deborah E. Barnes & Lisa A. Bero, “Why review articles on the health effects of passive smoking reach different conclusions,” 279 J. Am. Med. Ass’n 1566 (1998).

Of course, science did not stand still after 1992.  Later studies were published, and the controversy continued, such that the 1992 meta-analysis is now largely scientifically irrelevant.  See James Enstrom & Geoffrey Kabat, “Environmental tobacco smoke and tobacco related mortality in a prospective study of Californians, 1960-98,” 326 Br. Med. J. 1057 (2003); G. Davey Smith, “Effect of passive smoking on health: More information is available, but the controversy still persists,” 326 Br. Med. J. 1048–9 (2003).

A troubling implication of the attacks on the tobacco industry is that the industry should not have been permitted to raise methodological challenges to the EPA’s purported use of a scientific method.  The EPA’s defenders rarely engage with the specifics of the methodological challenge or of the district court’s review. Another implication is that the EPA’s meta-analysis remains a clear example of how a regulatory agency could have acted upon a precautionary principle, but instead chose to dress up its analysis as something it was not:  a scientific conclusion of causality. Given that the agency was not even engaged in reviewable agency action, and that it had ample biological plausibility for a precautionary finding that ETS causes lung cancer, the agency could easily have avoided the vitriolic debate it engendered with its 1992 report.

Bad Gatekeeping or Missed Opportunity – Allen v. Martin Surfacing

November 30th, 2012

Sometimes when federal courts permit dubious causation opinion testimony over Rule 702 objections, the culprit is bad lawyering by the opponent of the proffered testimony.  Allen v. Martin Surfacing, 263 F.R.D. 47 (D. Mass. 2009), may be an important example.

THE CLAIMS

Daniel Allen was the former football coach of the College of The Holy Cross, in Worcester, Massachusetts.  In the spring of 2001, defendant Martin Surfacing refinished the gymnasium floor at the college.  Coach Allen was exposed to solvent fumes, including toluene fumes, during the defendant’s work, and for a couple of months afterwards.   While exposed, Allen experienced “dizziness, headaches, and disorientation.” 263 F.R.D. at 51.  After the gym floor resurfacing was completed, Allen experienced other symptoms, such as fatigue, muscle weakness, and fasciculations in his lower limbs.  In January 2002, at the age of 45, Allen was diagnosed with amyotrophic lateral sclerosis (ALS).  Id. Allen’s condition progressed, and he died three years later, in May 2004.  Id. at 52.

Allen’s family sued for wrongful death.  The parties apparently agreed on the following:

  • ALS occurs in a sporadic form, as well as a familial form (“familial ALS”),
  • the cause of sporadic ALS is unknown,
  • Allen developed and died of sporadic ALS,
  • no air sampling established overexposure to any chemical,
  • there were no reliable exposure models to quantify Allen’s exposures,
  • there are no known causes of sporadic ALS, and
  • toluene did not cause Allen’s ALS

Remarkably, the defendant lost the Rule 702 challenge to the plaintiffs’ expert witnesses’ opinion testimony.  It is easy to suspect that the district judge was asleep at the gate, and that his gatekeeping was deficient.  A close read of the opinion supports the view that this was not Rule 702’s finest moment, but it also shows that much more was going on in the path to admissibility.

First, the plaintiffs’ counsel cleverly avoided running into a wall by not claiming that toluene caused Allen’s ALS. Instead, the plaintiffs claimed that toluene accelerated the onset of the disease.  This claim was equally dubious, but it allowed the expert witnesses to avoid a mountain of authoritative, well-supported medical opinion that there is no known cause of sporadic ALS.

Second, the plaintiffs’ counsel seized the initiative by filing an affirmative motion to admit the testimony of their expert witnesses.  Rather than ceding the field to the defendant, the plaintiffs had the first and last word on admissibility, and so were able to present and frame their witnesses’ opinions sympathetically rather than defensively.

Third, the plaintiffs had the good fortune of the defendant’s counsel’s apparent failure to find the key fallacies, invalidities, and flaws in plaintiffs’ questionable expert witness opinions.

The Allen case teaches that sometimes good lawyering can win a losing case.

The plaintiffs’ counsel retained and presented an array of expert witnesses who might otherwise be the usual suspects in a district court’s exclusion of expert witness testimony; each is discussed below.

None of these four expert witnesses was a specialist in ALS or ALS causation; none was a neurologist; none had ever addressed ALS causation in a peer-reviewed article.  All four witnesses were frequent testifiers in tort litigation, and some are repeat offenders when it comes to offering questionable or excludable opinion testimony.  Somehow, the defense squandered this opportunity by retaining only one expert, Dean M. Hashimoto, M.D., J.D., M.P.H., who was also not a specialist in ALS, who was not a neurologist, and who had never published anything on ALS.  And to make matters worse, the defense proceeded to challenge the plaintiffs’ expert witnesses for lack of qualifications!

The defense’s challenges to qualifications take up a good deal of Judge Saylor’s published opinion, which illustrates the maxim that judges have short attention spans, and that you should not waste the opportunity of a motion on an issue that is so easily decided against you.  The scientific issues are difficult, and the temptation to avoid them is great.  By leading with an issue that would almost certainly lose, the defense wasted a valuable advocacy opportunity to show the court the fallacious reasoning in the plaintiffs’ case.  By submitting reports from only one expert witness, who shared all the deficiencies the defense claimed to find in the plaintiffs’ witnesses, the defense exhibited a duplicity that must have seriously undermined its credibility on the entire set of Rule 702 motion issues.

THE WITNESSES

Dr. Christine Oliver has been testifying in asbestos and other occupational lung disease cases for decades.  She is a pulmonary physician on staff at Massachusetts General Hospital, in Boston, and an associate professor of clinical medicine at the Harvard Medical School.  She is board certified in internal medicine and in occupational medicine (American Board of Preventive Medicine), and her clinical interests are asthma, occupational lung disease, and the health hazards of construction work.  If the defense had presented real expert witnesses in ALS causation, Dr. Oliver’s expertise would have seemed quite irrelevant.  Dr. Oliver has, as far as I can determine, never researched or published on ALS causation.  She has, however, published on “multiple chemical sensitivity,” which should give a disinterested court some pause.  See L. Christine Oliver and Alison Johnson, “Multiple Chemical Sensitivity: Reflections” (Nov. 4, 2011).

Richard Clapp, professor emeritus at the Boston University School of Public Health, is a known purveyor of dubious courtroom testimony. See, e.g., Sutera v. The Perrier Group of America Inc., 986 F.Supp. 655 (D. Mass. 1997).  He is a frequent testifier and a charter member of the surreptitiously funded SKAPP organization.  Clapp is a non-physician epidemiologist, who has never published on ALS.

Marcia Ratner, Ph.D., may be best known for her possession of mace and an unlicensed gun, but she does occasionally show up in civil litigation as an expert witness.  See “Quincy District Court News,” Patriot Ledger (June 9, 2010) (reporting that Ratner pleaded guilty to criminal possession of mace and a firearm).

Ratner is a postdoctoral researcher at Boston University, where she works as a neurotoxicologist.  She does not appear to have ever published a peer-reviewed paper on ALS or ALS causation.  Plaintiffs’ counsel claimed that she was researching a new drug with therapeutic potential for ALS treatment, although they were quite sketchy about details.  Ratner does not appear to hold any NIH grants for ALS drug research.

[Please see update on the discussion of Dr. Ratner at http://schachtmanlaw.com/gatekeeping-in-allen-v-martin-surfacing-postscript/]

William Ewing, an industrial hygienist, frequently testifies in asbestos litigation.  He offered no opinion on causation.

Against this field of witnesses, the defense punted on presenting its own witness with relevant expertise. Dr. Dean M. Hashimoto, the defense’s sole witness on causation, is a physician and lawyer, with a master’s degree in occupational health.  Hashimoto has no specialized training in ALS or clinical neurology, although he serves on the Massachusetts Workers’ Compensation Board. A PubMed search shows that Hashimoto has never published on the neurology or causation of ALS.

CAUSATION

The plaintiffs had a huge problem to avoid:  ALS has no known cause.  Counsel table could be filled with textbooks and review articles, but perhaps the following, lengthy quote from the National Institute of Neurological Disorders and Stroke website suffices to make the point:

“What causes ALS?

The cause of ALS is not known, and scientists do not yet know why ALS strikes some people and not others. An important step toward answering that question came in 1993 when scientists supported by the National Institute of Neurological Disorders and Stroke (NINDS) discovered that mutations in the gene that produces the SOD1 enzyme were associated with some cases of familial ALS. This enzyme is a powerful antioxidant that protects the body from damage caused by free radicals. Free radicals are highly reactive molecules produced by cells during normal metabolism. If not neutralized, free radicals can accumulate and cause random damage to the DNA and proteins within cells. Although it is not yet clear how the SOD1 gene mutation leads to motor neuron degeneration, researchers have theorized that an accumulation of free radicals may result from the faulty functioning of this gene. In support of this, animal studies have shown that motor neuron degeneration and deficits in motor function accompany the presence of the SOD1 mutation.

Studies also have focused on the role of glutamate in motor neuron degeneration. Glutamate is one of the chemical messengers or neurotransmitters in the brain. Scientists have found that, compared to healthy people, ALS patients have higher levels of glutamate in the serum and spinal fluid. Laboratory studies have demonstrated that neurons begin to die off when they are exposed over long periods to excessive amounts of glutamate. Now, scientists are trying to understand what mechanisms lead to a buildup of unneeded glutamate in the spinal fluid and how this imbalance could contribute to the development of ALS.

Autoimmune responses—which occur when the body’s immune system attacks normal cells—have been suggested as one possible cause for motor neuron degeneration in ALS. Some scientists theorize that antibodies may directly or indirectly impair the function of motor neurons, interfering with the transmission of signals between the brain and muscles.

In searching for the cause of ALS, researchers have also studied environmental factors such as exposure to toxic or infectious agents. Other research has examined the possible role of dietary deficiency or trauma. However, as of yet, there is insufficient evidence to implicate these factors as causes of ALS.

Future research may show that many factors, including a genetic predisposition, are involved in the development of ALS.”

NINDS – “Amyotrophic Lateral Sclerosis (ALS) Fact Sheet.”

As a result, the plaintiffs adopted a strategy of confession and avoidance; they disclaimed any assertion of causation.  Instead, they insisted that they were “merely” claiming that toluene exposure had accelerated the onset of sporadic ALS in Coach Allen.  This “mere” claim, however, was actually a causal claim in disguise, and the district judge was taken in by the ruse.  If the plaintiffs were claiming that toluene can accelerate the onset of ALS by a meaningful period of time (years), then they were making a causal claim, legally and scientifically.  A shift in the age of onset of a sporadic disease is a causal claim, and it requires supporting evidence, not hand waving.

PLAUSIBLE MECHANISM

Any scientist can postulate a plausible mechanism, even for a sporadic disease.  Professional journals and textbooks are filled with such speculation.  These postulations are part of science in that they inform research hypotheses and funding, but they are not conclusions of causality.  The quote above from the NINDS discusses a defective antioxidant enzyme (SOD1) and glutamate toxicity as potential mechanisms in ALS, but even there, the authors are appropriately modest in avoiding any claim to know the pathogenesis of familial ALS.

The plaintiffs’ approach was to take the suggestion of a mechanism, misrepresent it as a known mechanism, and then claim that toluene activated glutamate toxicity and exerted an oxidizing effect on neurons. The plaintiffs’ team had no basis for claiming that short-term exposure to solvents, or to toluene specifically, translated into toxicity to the human motor neurons involved in ALS.  It is a long stretch from suggesting a mechanism to documenting that the mechanism is actually at work in producing, or accelerating, a disease in humans.

A typical statement, from the Yale School of Medicine, Division of Neurology, in 2012:

“Why the motor neurons begin to die is still unknown. Recent evidence, however, have implicated glutamate excitotoxicity, free radical toxicity, and mitochondrial dysfunction as possible mechanisms, and this is an area of active research.”

“Amyotrophic Lateral Sclerosis (ALS)” (emphasis added).   See also Adams and Victor’s Principles of Neurology 1157-58 (7th ed. 2001) (noting that the pathogenesis of ALS and similar motor neuron diseases is not known).

The district judge seemed mesmerized by Ratner’s having provided a biologically plausible theory for tying ALS progression to toluene exposure.  263 F.R.D. at 60.  Judge Saylor stated that the defense did not address any flaw in Ratner’s methodology other than to point out that her theory was not supported by epidemiology.  The court seemed to equate providing a plausible theory with establishing a scientific conclusion.  More to the point, the court was truly asleep at its gatekeeping task because Ratner’s theory actually presupposed that she knew that Coach Allen was going to develop ALS in any event, only not as early as 2001.  The court faulted the defense for not showing that Ratner’s (and the other plaintiffs’ witnesses’) theory was unreliable, but the burden was on the plaintiffs to show reliability.  Id.  The court not only faulted the defense for carrying a burden it did not have, but it overlooked the very telling criticisms of Ratner’s theories of acceleration and mechanism.

EXCUSES – EPIDEMIOLOGY

Plaintiffs’ expert witnesses had a welter of excuses for why there was no epidemiologic data to support their theories.  The absence of statistical significance, according to plaintiffs’ expert witnesses, does not mean that a study should be disregarded.  Id. at 58.  The claim is superficially true, but a study not disregarded does not necessarily support a causal inference, either alone or in conjunction with other such studies. Similarly, the plaintiffs’ claim that flawed studies should not be disregarded is also a half truth.  A flawed study may lead to a much better one, which can support valid inferences.  Flawed studies are thus part of the scientific process because they may lead to a self-correcting triangulation of the truth, but there is little to recommend relying upon flawed studies to support scientific conclusions of causality.  Nevertheless, the district court appeared to swallow these half truths, whole.

Ratner also advanced a claim that the acceleration theory had not been subjected to epidemiologic analysis because of “funding limitations, as most funding goes toward finding treatment or cures for the disease, not towards finding what accelerates the course of the disease.”  Id. at 59 n. 14.  The district court repeated this excuse without critical thought.  If a commonly used solvent such as toluene accelerated the onset of a terrible disease such as ALS by decades, such a putative effect would be amenable to epidemiologic analysis, and would be a source of intense concern and funding efforts by the NIH, NINDS, NIEHS, and other granting agencies and organizations.  Despite this excusifying verbiage, Ratner maintained that there were no epidemiologic data that refuted her novel acceleration theory.  Id. at 59.  Of course, if her excuses were taken seriously, then this absence of refutation was fairly irrelevant, but in any event, this supposed absence could not support the reliability of Ratner’s inferences or conclusions.

The defense focused on the lack of short-term exposures in epidemiologic studies, and also the lack of statistical significance in some studies.  What appears to have been missing from both sides was a comprehensive analysis of the available epidemiologic data.  If long-term exposure were associated with earlier age of onset of ALS, or even a greater risk of ALS, then it would have given some support to Ratner’s novel theory.  The defense appeared to punt on the epidemiology by claiming its irrelevance.  It might have been helpful to point out internal as well as external validity issues to the court.

As for both sides’ citing different studies, with neither side presenting a comprehensive view of the epidemiologic evidence, the court could have given some weight to the ethical implications of the incomplete presentations:

“Basis of Expert Medical Testimony

The testimony of an expert medical witness should be founded on a thorough and critical review of the pertinent medical and scientific facts, available data, and relevant literature.”

Ethical Guidelines for Occupational and Environmental Medicine Physicians Serving as Expert Witnesses (Oct. 25, 2007).

DIFFERENTIAL DIAGNOSIS

The plaintiffs’ claim that they were not asserting causation was disingenuous.  As noted above, acceleration of onset is a form of causation.  Of course, exposure to a neurotoxic material, with some symptoms, might have made Allen more aware of other symptoms, and so the time to diagnosis was abbreviated.  The plaintiffs, however, were claiming more than earlier ascertainment; they claimed the toluene exposure caused an underlying disease process to accelerate.

Oliver actually went further and performed an invalid differential etiologic analysis. Oliver reviewed medical records and claimed to have applied “differential diagnosis to the review.”  Id. at 63. This claim was quite bogus because there was no dispute that Allen had, and died of, ALS, but the district court was beguiled.  Having ruled out family history, Oliver claimed then to rule out other “putative causes” of ALS:  “pesticides and agricultural chemicals containing solvents, 60-hertz magnetic fields, and welding fumes.”  Id. at 63.  In one fell swoop, Oliver created several known causes to be ruled out, and then ruled them out in Allen’s case.  This is remarkable given that NINDS and most of medical science recognize no known or putative causes of sporadic ALS, and that Oliver failed to rule out the one potential cause that some scientists take seriously:  cigarette smoking.  See, e.g., Hao Wang, Éilis J. O’Reilly, Marc G. Weisskopf, Giancarlo Logroscino, Marji L. McCullough, Michael Thun, Arthur Schatzkin, Laurence N. Kolonel, Alberto Ascherio, “Smoking and risk of amyotrophic lateral sclerosis: a pooled analysis of 5 prospective cohorts” 68 Arch. Neurol. 207 (2011); A. Alonso, G. Logroscino, M.A. Hernán, “Smoking and the risk of amyotrophic lateral sclerosis: a systematic review and meta-analysis,” 81 J. Neurol. Neurosurg. & Psychiatry 1249 (2010); F. Fang & W. Ye, “Smoking may be considered an established risk factor for sporadic ALS,” 74 Neurology 1927 (2010).

Of course, Oliver, and the entire plaintiffs’ expert witness team failed to rule out the most obvious, most prevalent explanation for Allen’s ALS:  unknown.

GENETIC SUSCEPTIBILITY

Ratner testified “to a reasonable degree of scientific certainty that Allen was genetically predisposed to develop ALS and would have developed and died from ALS later in his life.”  263 F.R.D. at 61.  This assertion was truly an incredible, unsupported, unverifiable, and unfalsifiable statement.  If a drug company ever made a similarly unsupported claim in an electronically transmitted document, the Department of Justice would prosecute it for wire fraud.  United States v. Harkonen, 2010 WL 2985257 (N.D. Calif. 2010).

The parties had essentially stipulated that Allen did not suffer from familial ALS, and neither Ratner nor anyone else identified any gene that was responsible for his “susceptibility.”  The district court, of course, did not report how Ratner could possibly have known that Allen was going to develop ALS, only at some unspecified date later than the date when Allen first became aware of signs and symptoms of motor neuron disease.  The district court announced that plaintiffs’ expert witnesses were not propounding “junk science,” but perhaps the heavy perfume helped mask the garbage.

POST HOC ERGO PROPTER HOC

The court asserted, without explanation, that the temporal relationship between exposure and disease manifestation would allow a conclusion of causality:

“Finally, after interpreting the data within a chronological context, the clinician may conclude that the patient’s disease is a neurotoxic illness.”

Id. at 61.  The court appears to accept the temporal pattern as sufficient in itself, or with other information, to support the conclusion.  This reasoning is fallacious.

AGE OF ONSET

Allen developed ALS when he was 45 years old.  Ratner reasoned that the average age of onset was 60, and Allen developed his disease “much earlier than would be expected”; therefore toluene accelerated the onset of Allen’s disease.  Id. at 61. The problem is that there is no “therefore” that can reasonably be claimed in the court’s sentence.

Most publications put the mean and median age of ALS onset around 55 years, but even if the court were to accept Ratner’s reference to 60 as correct, surely the court should have recognized that roughly half the cases would occur before age 60.  The question, of course, is the variability in the age of onset, and the court’s opinion is silent about the scatter or distribution of the age-of-onset data.  Ratner’s reasoning was prima facie invalid unless there was additional information to show a very narrow distribution of age of onset around the mean.  It is difficult to discern whether the defense made this point, but Ratner could not have supported such a claim.
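The point about variability can be made concrete with a back-of-the-envelope normal approximation. All of the numbers below are assumptions for illustration (a mean onset in the mid-50s and a spread that the court’s opinion never reports); the sketch shows only that onset at 45 need not be unusual:

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mean_onset = 55.0  # assumed mean age of onset (many sources say mid-50s)
sd_onset = 11.0    # assumed standard deviation; the opinion reports none

p_by_45 = normal_cdf(45, mean_onset, sd_onset)
print(f"P(onset by age 45) ~ {p_by_45:.2f}")
```

On these assumptions, nearly one case in five would arise by age 45 with no toluene in sight. Without the distribution, an inference of acceleration from Allen’s age alone was empty.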

Here is what the ALS association has to say about the issue:

“Most people who develop ALS are between the ages of 40 and 70, with an average age of 55 at the time of diagnosis. However, cases of the disease do occur in persons in their twenties and thirties.”

“Who Gets ALS.”

Ratner essentially conceded that her argument was vacuous and invalid.  When confronted at her deposition about whether age of onset greater than the mean would have changed her opinion, she emphatically denied its relevance:

“My opinion would be the same even if that guy died at 60 instead of 75 and had history of this exposure … but you wouldn’t have bothered to depose me in that case… . Somebody else has moved down from where they are to here. But it may not result in a lawsuit, and I wouldn’t be here, because— I wouldn’t be here.”

Ratner Deposition at 172-3.

RULE 702 ANALYSIS

The district court recognized the novelty of Ratner’s analysis, but opined that Ratner, Oliver, and Clapp had provided sufficient cumulative evidence to support their theories.  263 F.R.D. at 61.  The trial court apparently conducted a Rule 702 hearing, over three days. Both sides filed what appears to have been extensive briefing and affidavits.  There are some huge gaps in the reasoning of the plaintiffs’ expert witnesses, and in the district court’s opinion.  Perhaps those gaps could be filled in with volumes of testimony.  My unscientific opinion is to doubt it. Although the plaintiffs should have had the burden of showing admissibility, the defendant had the practical burden of illustrating the analytical gaps, ipse dixit, fallacies, and invalid inferences that were before the court.  The defense may have indeed pointed out such problems, which were abundantly present, but the district court’s opinion does not report the obvious defense arguments.  Without more background information, it is difficult to evaluate comprehensively the court’s or the defense’s handling of the scientific issues that were clearly before the court on the Rule 702 motions.  What is clear from what the district court reports is, however, sufficient to document an unsatisfactory judicial review of the evidence discussed.

General Causation and Epidemiologic Measures of Risk Size

November 24th, 2012

The gatekeeper’s door really must swing both ways on causal analysis. For decades, the courts allowed almost anything so long as the speaker was “an expert witness” who uttered the magic words “reasonable medical certainty.”  For the most part, this willingness to tolerate all sorts of nonsense favored plaintiffs.  In the backlash against this libertine judicial approach, some courts, such as those in Texas, have embraced a principle that unfairly favors defendants.  Abridgment of scientific method and reasoning is offensive regardless of who is being favored.

The Texas courts have adopted a rule that plaintiffs must offer a statistically significant study, with a risk ratio (RR) greater than two, to show general causation.  An RR ≤ 2 can be a strong practical argument against specific causation in many cases. See Courts and Commentators on Relative Risks to Infer Specific Causation;  Relative Risks and Individual Causal Attribution; and  Risk and Causation in the Law.   But an RR > 2 threshold has little in theory to do with general causation.  There are any number of well-established causal relationships where the magnitude of the ex ante relative risk in an exposed population is > 1, but ≤ 2.  The modest relative risk of cardiovascular disease from smoking is one well-known example.  As I noted in “Confusion Over Causation in Texas” (Aug. 27, 2011), the Texas Supreme Court managed to confuse general and specific causation concepts in its decision in Merck & Co. v. Garza, 347 S.W.3d 256 (2011).
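The arithmetic behind the RR > 2 heuristic for specific causation is simple. Under strong assumptions (no bias or confounding, and a homogeneous effect across the exposed), the probability that the exposure caused a given exposed case is the attributable fraction, (RR - 1)/RR, which exceeds one-half only when RR exceeds two:

```python
def attributable_fraction(rr):
    """Attributable fraction among the exposed: (RR - 1) / RR."""
    if rr <= 1:
        return 0.0  # no excess risk to attribute
    return (rr - 1) / rr

for rr in (1.19, 1.5, 2.0, 3.0):
    print(f"RR = {rr}: attributable fraction = {attributable_fraction(rr):.2f}")
```

None of this arithmetic bears on general causation: an RR of 1.5 can reflect a perfectly real causal relationship, even though it would leave the more-probable-than-not attribution threshold unmet in an individual case.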

Still, the search for an RR threshold for general causation does have some basis in the practice of epidemiology. When assessing general causation from observational epidemiologic studies alone, where residual confounding and bias may be lurking, it is prudent to require an RR > 2, as a measure of strength of the association that can help us rule out the role of systematic error.  As the cardiovascular disease/smoking example illustrates, however, there is clearly no scientific requirement that the RR be greater than 2 to establish general causation.  Courts should recognize that there are spurious associations with RR >> 2, and true, causal associations with RR < 2. Much will depend upon the number of studies, and the potential for bias or confounding in the body of evidence.  If the other important Bradford Hill factors are present – dose-response, consistency, coherence, etc. – then risk ratios ≤ 2, from observational studies, may suffice to show general causation.  So an RR > 2 requirement does not make sense as a criterion for general causation; at best, RR > 2 is a much weaker consideration for general causation than it is for specific causation.

Randomization and double blinding are major steps in controlling confounding and bias, but they are not guarantees that systematic bias has been eliminated.  Similarly, despite the confusion and errors of lawyers and judges, statistical significance does not address bias or confounding.  See, e.g., Zach Hughes, “The Legal Significance of Statistical Significance,” 28 Westlaw Journal: Pharmaceutical 1, 2 (Mar. 2012) (erroneously describing the meaning and function of significance testing; “Stated simply, a statistically significant confidence interval helps ensure that the findings of a particular study are not due to chance or some other confounding factors.”).
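The point that significance testing does not speak to bias or confounding can be made concrete with a toy, entirely hypothetical example: an exposure with no effect at all, confounded by smoking, produces a crude risk ratio near 3 — which a study of this size would report as highly statistically significant — even though the stratum-specific ratios are exactly 1:

```python
def rr(a, n1, b, n0):
    """Risk ratio: (cases/total among exposed) / (cases/total among unexposed)."""
    return (a / n1) / (b / n0)

# Invented numbers, chosen only to illustrate confounding: the exposure
# does nothing, but the exposed smoke more, and smoking raises the
# disease risk tenfold.
# (exposed cases, exposed total, unexposed cases, unexposed total)
strata = {
    "smokers":    (80, 800, 20, 200),  # risk 0.10 whether exposed or not
    "nonsmokers": (2, 200, 8, 800),    # risk 0.01 whether exposed or not
}

# Within each smoking stratum, the RR is exactly 1.0 -- no effect.
for name, (a, n1, b, n0) in strata.items():
    print(name, rr(a, n1, b, n0))  # 1.0 in both strata

# Collapsing over smoking status yields a RR near 3, produced entirely
# by confounding; with 2,000 subjects it would be "significant."
a = sum(s[0] for s in strata.values())
n1 = sum(s[1] for s in strata.values())
b = sum(s[2] for s in strata.values())
n0 = sum(s[3] for s in strata.values())
print(round(rr(a, n1, b, n0), 2))  # 2.93
```

A p-value computed on the crude table would be tiny, yet tells us nothing about whether the association is real; that is precisely the error in the Hughes passage quoted above.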

A double-blinded, placebo-controlled, randomized clinical trial (RCT) will usually have less opportunity for bias and confounding to play a role.  Imposing a RR > 2 requirement for general causation thus makes less sense in the context of trying to infer general causation from the results of RCTs. The Garza Court, however, went a dictum too far by describing RR > 2 as a requirement that applied to general causation:

Havner holds, and we reiterate, that when parties attempt to prove general causation using epidemiological evidence, a threshold requirement of reliability is that the evidence demonstrate a statistically significant doubling of the risk. In addition, Havner requires that a plaintiff show ‘that he or she is similar to [the subjects] in the studies’ and that ‘other plausible causes of the injury or condition that could be negated [are excluded] with reasonable certainty’.40

347 S.W.3d at 265 (quoting from Merrell Dow Pharmaceuticals, Inc. v. Havner, 953 S.W.2d 706, 720 (Tex. 1997)).  See Merck’s Appellant’s Brief to the Texas Court of Appeals at 16, 17 (July 16, 2007) (citing the Havner case as providing a “rational basis for inferring causation”; “To prove general causation, the Garzas were required to introduce at least two statistically significant scientific studies showing that Vioxx at the same dose and duration as taken by Mr. Garza more than doubled the risk of heart attack. Havner, 953 S.W.2d at 718-23, 727.”).

Imposing RR > 2 as a requirement for general causation, in the context of risk ratios from clinical trials, was particularly unwarranted. If general causation were truly the issue, it would be difficult to explain why the dose and duration used in the studies had to be the same as those used by the specific plaintiff. General causation was not the dispositive issue in Garza, and so this language should be treated as dictum.  The confusion between general and specific causation is unfortunate.

What is the source of the Garza court’s notion about RR and general causation?  One popular article from Science, in the 1990s, gave some credence to the notion of a minimal RR for general causation. Gary Taubes, “Epidemiology Faces Its Limits,” 269 Science 164 (July 14, 1995) [cited as Taubes]. Taubes collected quotes (or sound bites) from various authors about the relevance of the magnitude of observed associations.  For instance, Taubes quoted Marcia Angell, a former editor of the New England Journal of Medicine, as articulating a general rule:

“As a general rule of thumb, we are looking for a relative risk of 3 or more [before accepting a paper for publication], particularly if it is biologically implausible or if it’s a brand new finding.”

Taubes at 168.  John Bailar, a professor emeritus at the University of Chicago, was quoted by Taubes as rejecting any reliable dividing line, thus taking a more nuanced approach:

“If you see a 10-fold relative risk and it’s replicated and it’s a good study with biological backup, like we have with cigarettes and lung cancer, you can draw a strong inference. * *  * If it’s a 1.5 relative risk, and it’s only one study and even a very good one, you scratch your chin and say maybe.”

Taubes at 168. Taubes described Harvard epidemiologist Dimitrios Trichopoulos as suggesting that a study should show a four-fold increased risk, and the late Sir Richard Doll of Oxford University as suggesting that a single epidemiologic study would not be persuasive unless the lower limit of its 95% confidence interval excluded 3.0.  Id.
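Doll's suggested criterion, as Taubes reported it, is easy to make concrete. Using hypothetical cohort counts of my own invention, and the standard large-sample (Katz log-scale) method for a risk-ratio confidence interval:

```python
import math

def rr_confidence_interval(a, n1, b, n0, z=1.96):
    """95% CI for a risk ratio by the standard large-sample (Katz)
    method: log RR is treated as approximately normal, with standard
    error sqrt(1/a - 1/n1 + 1/b - 1/n0)."""
    rr = (a / n1) / (b / n0)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n0)
    lo = math.exp(math.log(rr) - z * se_log)
    hi = math.exp(math.log(rr) + z * se_log)
    return rr, lo, hi

# Hypothetical cohort: 30 cases among 1,000 exposed; 10 among 1,000 unexposed.
rr, lo, hi = rr_confidence_interval(30, 1000, 10, 1000)
print(f"RR = {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
# The point estimate is RR = 3.0, but the lower limit (roughly 1.5)
# falls well below 3.0 -- so this single study, standing alone, would
# not satisfy the criterion Taubes attributed to Doll.
```

The example also shows why Doll's suggested rule is so demanding: even a tripling of risk in a study of 2,000 subjects leaves a lower confidence limit far short of 3.0.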

Even if Taubes’ quotes are accurate, there is a risk that they were stripped of important nuance provided by the scientists he interviewed.  There are, however, other, more credible sources for scientists who have insisted upon treating the size of a RR as a consideration in evaluating the causality of an association, especially for observational studies.  For example, Breslow and Day, two respected cancer researchers, noted in a publication of the World Health Organization that

“[r]elative risks of less than 2.0 may readily reflect some unperceived bias or confounding factor, those over 5.0 are unlikely to do so.”

Norman E. Breslow & Nicholas E. Day, Statistical Methods in Cancer Research. Volume I The Analysis of Case-Control Studies at 36 (Lyon, International Agency for Research on Cancer Scientific Publications No. 32, 1980).  The caveat makes sense, but it clearly was never intended to be some sort of bright-line rule for people too lazy to look at the actual studies and data.  Unfortunately, not all epidemiologists are as capable as Breslow and Day, and there are plenty of examples of spurious RR > 5, arising from biased or confounded studies.

Sir Richard Doll and Sir Richard Peto expressed a similarly skeptical view about RR < 2, in assessing the causality of associations:

“when relative risk lies between 1 and 2 … problems of interpretation may become acute, and it may be extremely difficult to disentangle the various contributions of biased information, confounding of two or more factors, and cause and effect.”

Richard Doll & Richard Peto, The Causes of Cancer 1219 (Oxford Univ. Press 1981).

More recently, plaintiffs’ testifying expert witness David Goldsmith expressed the view that a RR > 2 is the minimum marker of a strong association, and thus of a likely candidate for causality. David F. Goldsmith & Susan G. Rose, “Establishing Causation with Epidemiology,” in Tee L. Guidotti & Susan G. Rose, eds., Science on the Witness Stand:  Evaluating Scientific Evidence in Law, Adjudication, and Policy 57, 60 (OEM Press 2001) (“There is no clear consensus in the epidemiology community regarding what constitutes a ‘strong’ relative risk, although, at a minimum, it is likely to be one where the RR is greater than two; i.e., one in which the risk among the exposed is at least twice as great as among the unexposed.”); Ernst L. Wynder & Geoffrey C. Kabat, “Environmental Tobacco Smoke and Lung Cancer: A Critical Assessment,” in H. Kasuga, ed., Indoor Air Quality 5, 6 (Berlin Springer Verlag, 1990) (“An association is generally considered weak if the odds ratio is under 3.0 and particularly when it is under 2.0, as is the case in the relationship of ETS and lung cancer. If the observed relative risk is small, it is important to determine whether the effect could be due to biased selection of subjects, confounding, biased reporting, or anomalies of particular subgroups.”).

In the 1990’s, Dr. Janet Daling and her colleagues published an observational epidemiologic study on whether abortion was related to later breast cancer. Janet R. Daling, K.E. Malone, L.F. Voigt, E. White, Noel S. Weiss, “Risk of breast cancer among young women: relationship to induced abortion,” 86 J. Nat’l Cancer Instit. 1584 (1994). Several scientists, concerned that Dr. Daling’s findings would be distorted by religious propagandists, wrote that the small RRs in the Daling study could not support a causal interpretation of the data.  In an editorial that accompanied the article, Dr. Lynn Rosenberg, of the Boston University School of Medicine, wrote:

“A typical difference in risk (50%) is small in epidemiologic terms and severely challenges our ability to distinguish if it reflects cause and effect or if it simply reflects bias.”

Lynn Rosenberg, “Induced Abortion and Breast Cancer: More Scientific Data Are Needed,” 86 J. Nat’l Cancer Instit. 1569, 1569 (1994).  Rosenberg’s caution was picked up and repeated by an official statement of the National Cancer Institute (NCI).  Linda Anderson, of the NCI Press Office (NIH) issued a press release to stifle fears raised by Dr. Daling’s abortion research:

“In epidemiologic research, relative risks of less than 2 are considered small and are usually difficult to interpret. Such increases may be due to chance, statistical bias, or effects of confounding factors that are sometimes not evident.”

Linda Anderson, “Abortion and possible risk for breast cancer: analysis and inconsistencies,” (Wash. DC, NCI Oct. 26, 1994).  In the lay media, an American Cancer Society epidemiologist was quoted in reference to the Daling study:

“Epidemiological studies, in general are probably not able, realistically, to identify with any confidence any relative risks lower than 1.3 (that is a 30% increase in risk) in that context, the 1.5 [reported relative risk of developing breast cancer after abortion] is a modest elevation compared to some other risk factors that we know cause disease.”

Washington Post (Oct. 27, 1994) (Dr. Eugenia Calle, Director of Analytic Epidemiology for the ACS).

Not surprisingly, tobacco companies, embattled by claims of cancer from environmental tobacco smoke (ETS), cried political correctness when the NCI and the ACS announced a skeptical view of whether RRs between 1 and 2 could show a causal relationship between abortion and breast cancer, while endorsing a low RR as real in the case of ETS and lung cancer.

What the tobacconists missed, however, was that Daling’s association was a relatively novel finding.  Subsequent studies failed to corroborate the association, which now lives on only because of the efforts of theocratic regimes in some of the United States.  The NCI’s reaction to the Daling study was in line with the quotes from Taubes’ article, above.

Recently, two epidemiologists reviewed the issue of minimal reliable risk, and concluded:

“There is no single number for a minimal reliable risk that pertains to all studies.”

Mark J. Nicolich and John F. Gamble, “What is the Minimum Risk that can be Estimated from an Epidemiology Study?,” in Anca Moldoveanu, ed., Advanced Topics in Environmental Health and Air Pollution Case Studies, at 4.1.1, Point 1 (2011).   Of course, this pronouncement by Nicolich and Gamble is precisely the sort of call for sound judgment that lawyers fear because it involves engagement with the studies, their methods, and their data. The potential for bias and confounding is not constant across all studies.  The potential for such errors varies with the nature of the exposure and the outcome under investigation, the design of the study, and myriad particulars and details of the studies involved.  As Nicolich and Gamble explained:

“Theoretically, there is no relative risk that is too small to be estimated. The relative risk is a construct or a concept, not a physical reality. Since it is a mathematically defined concept it can be mathematically estimated to any degree of precision. However, we have shown in this paper that (1) there are many assumptions that must be met to make certain that the RR estimate is accurate and precise; and (2) the significance level or uncertainty associated with the RR estimate has its own set of assumptions that must be met. So, while there may be no theoretical minimum RR that can be estimated, in practice there is a minimum risk and varies depending on uncertainties present in the context of each study.

An analogy in the physical world of estimating a RR is to measure the length of an object. A meterstick is precise enough to determine the width of a table to see if it will fit through a doorway, but a meterstick is not precise enough to measure the diameter of a shaft in an automobile engine with a tolerance of ±1.0 mm. To measure the shaft diameter one would use a micrometer. The micrometer while sufficiently precise to measure the shaft is not adequate to determine the size of a dust mite, usually in the range of 200 to 300 μm. The analogy can be carried through to the size of molecules, to the wavelength of visible light, and to the diameter of an electron. The conclusion is that while all the tasks involve measuring length and there is no practical ‘minimum length’, different tools and considerations are needed depending on the object to be measured and the precision required.”

Id. at 21.

“We agree with Wynder (1987) that epidemiology is able to correctly interpret relatively small relative risks, but only if the best epidemiological methodology is applied and only if the data are fully evaluated by examining all judgment criteria, especially those of biological plausibility. As RRs become smaller, the need for close adherence to these basic principles becomes greater. If these ideas are applied, a conclusion of no risk should reassure society. And when a risk is reported as positive, appropriate preventive measures to reduce avoidable illness can be used to successfully reach the ultimate goal of epidemiology and preventive medicine.”

Id. at 22.

Nicolich and Gamble probably provide more nuance than most courts want, but it is what scientists, policy makers, and lawyers need to hear. Simplistic rules, such as a requirement of two statistically significant studies with RR > 2, do not enhance the credibility of judicial judgments. The requirement is over- and under-inclusive; it screens out real causal associations while allowing spurious associations, almost certainly the product of bias or confounding, to stand.

Wells v. Ortho Pharmaceutical Corp. Reconsidered – Part 6

November 21st, 2012

In 1984, before Judge Shoob gave his verdict in the Wells case, another firm filed a birth defects case against Ortho for failure to warn in connection with its non-ionic surfactant spermicides, in the same federal district court, the Northern District of Georgia. The mother in Smith used Ortho’s product about the same time as the mother in Wells (in 1980).  The case was assigned to Judge Shoob, who recused himself.  Smith v. Ortho Pharmaceutical Corp., 770 F. Supp. 1561, 1562 n.1 (N.D. Ga. 1991) (no reasons for the recusal provided).  The Smith case was reassigned to Judge Horace Ward, who entertained Ortho’s motion for summary judgment in July 1988.  Two and one-half years later, Judge Ward granted summary judgment to Ortho on grounds that the plaintiffs’ expert witnesses’ testimony was not based upon the type of data reasonably relied upon by experts in the field, and was thus inadmissible under Federal Rule of Evidence 703. 770 F. Supp. at 1681.

A prevalent interpretation of the split between Wells and Smith is that the scientific evidence developed with new studies, and that the scientific community’s views matured in the five years between the two district court opinions. The discussion in Modern Scientific Evidence is typical:

“As epidemiological evidence develops over time, courts may change their view as to whether testimony based on other evidence is admissible. In this regard it is worth comparing Wells v. Ortho Pharmaceutical Corp., 788 F.2d 741 (11th Cir. 1986), with Smith v. Ortho Pharmaceutical Corp., 770 F. Supp. 1561 (N.D. Ga. 1991). Both involve allegations that the use of spermicide caused a birth defect. At the time of the Wells case there was limited epidemiological evidence and this type of claim was relatively novel.  In a bench trial the court found for the plaintiff.  *** The Smith court, writing five years later, noted that, ‘The issue of causation with respect to spermicide and birth defects has been extensively researched since the Wells decision.’ Smith v. Ortho Pharmaceutical Corp., 770 F. Supp. 1561, 1563 (N.D. Ga. 1991).”

1 David L. Faigman, Michael J. Saks, Joseph Sanders, and Edward K. Cheng, Modern Scientific Evidence:  The Law and Science of Expert Testimony, “Chapter 23 – Epidemiology,” § 23:4, at 213 n.12 (West 2011) (internal citations omitted).

Although Judge Ward was being charitable to his judicial colleague, this attempt to reconcile Wells and Smith does a disservice to Judge Ward’s hard work in Smith, and Judge Shoob’s errors in Wells.

Even a casual reading of Smith and Wells reveals that the injuries were completely different.  Plaintiff Crystal Smith was born with a chromosomal defect known as Trisomy-18; Plaintiff Katie Wells was born with limb reduction deficits.   Some studies relevant to one injury had no information about the other.  Other studies, which addressed both injuries, yielded different results for the different injuries.  Although some additional studies were available to Judge Ward in 1988, the new studies are hardly the compelling difference between the two cases.

Perhaps the most important difference between the cases is that in Smith, biological plausibility that spermicides could cause a Trisomy-18 was completely absent.  The chromosomal defect arises from meiotic nondisjunction, an error in meiosis, the process by which germ cells are formed.  Simply put, spermicides arrive on the scene too late to cause a Trisomy-18.  Notwithstanding the profound differences between the injuries involved in Wells and Smith, the Smith plaintiffs sought the application of collateral estoppel.  Judge Ward refused this motion, on the basis of the factual differences in the cases, as well as the availability of new evidence.  770 F.Supp. at 1562.

The difference in injuries, however, was not the only important difference between these two cases.  Wells was actually tried, apparently without any challenge under Frye, or Rules 702 or 703, to the admissibility of expert witness testimony.  There is little to no discussion of scientific validity of studies, or analysis of the requisites for evaluating associations for causality.  It is difficult to escape the conclusion that Judge Shoob decided the Wells case on the basis of superficial appearances, and that he frequently ignored validity concerns in drawing invidious distinctions between plaintiffs’ and defendant’s expert witnesses and their “credibility.”  Smith, on the other hand, was never tried.  Judge Ward entertained and granted dispositive motions for summary judgment, on grounds that the plaintiffs’ expert witnesses’ testimony was inadmissible. Legally, the cases are light years apart.

In Smith, Judge Ward evaluated the same FDA reports and decisions seen by Judge Shoob.  Judge Ward did not, however, dismiss these agency materials simply because one or two of dozens of independent scientists involved had some fleeting connection with industry. 770 F.Supp. at 1563-64.

Judge Ward engaged with the structure and bases of the expert witnesses’ opinions, under Rules 702 and 703.  The Smith case thus turned on whether expert witness opinions were admissible, an issue not considered or discussed in Wells.  As was often the case before the Supreme Court decided Daubert in 1993, Judge Ward paid little attention to Rule 702’s requirement of helpfulness or knowledge.  The court’s 702 analysis was limited to qualifications.  Id. at 1566-67.  The qualifications of the plaintiffs’ witnesses were rather marginal.  They relied upon genetic and epidemiologic studies, but they had little training or experience in these disciplines. Finding the plaintiffs’ expert witnesses to meet the low threshold for qualification to offer an opinion in court, Judge Ward focused on Rule 703’s requirement that expert witnesses reasonably rely upon facts and data that are not otherwise admissible.

The trial court in Smith struggled with how it should analyze the underpinnings of plaintiffs’ witnesses’ proffered testimony.  The court acknowledged that conflicts between expert witnesses typically raise questions of weight, not admissibility.  Id. at 1569.  Ortho had, however, challenged plaintiffs’ witnesses for having given opinions that lacked a “sound underlying methodology.” Id.  The trial court found at least one Fifth Circuit case that suggested that Rule 703 requires trial courts to evaluate the reliability of expert witnesses’ sources.  Id. (citing Soden v. Freightliner Corp., 714 F.2d 498, 505 (5th Cir. 1983)). Elsewhere, the trial court also found precedent from Judge Weinstein’s opinion in Agent Orange, as well as Court of Appeals decisions involving Bendectin, all of which turned to Rule 703 as the legal basis for reviewing, and in some cases limiting or excluding expert witness opinion testimony.  Id.

The defendant’s argument under Rule 703 was strained; Ortho argued that the plaintiffs’

“experts’ selection and use of the epidemiological data is faulty and thus provides an insufficient basis upon which experts in the field of diagnosing the source of birth defects normally form their opinions. The defendant also contends that the plaintiffs’ experts’ data on genetics is not of the kind reasonably relied upon by experts in field of determining causation of birth defects.”

Id. at 1572.  Nothing in Rule 703 addresses the completeness or thoroughness of expert witnesses in their consideration of facts and data; nor does Rule 703 address the sufficiency of data or the validity vel non of inferences drawn from facts and data considered.  Nonetheless, the trial court in Smith took Rule 703 as its legal basis for exploring the epistemic warrant for plaintiffs’ witnesses’ causation opinions.

Although plaintiffs’ expert witnesses stated that they had relied upon epidemiologic studies and method, the trial court in Smith went beyond their asseverations.  The Smith trial court explored the credibility of these witnesses at a whole other level.  The court reviewed and discussed the basic structure of epidemiologic studies, and noted that the objective of such studies is to provide a statistical analysis:

“The objective of both case-control and cohort studies is to determine whether the difference observed in the two groups, if any, is ‘statistically significant’, (that is whether the difference found in the particular study did not occur by chance alone).40 However, statistical methods alone, or the finding of a statistically significant association in one study, do not establish a causal relationship.41 As one authority states:

‘Statistical methods alone cannot establish proof of a causal relationship in an association’.42

As a result, once a statistical association is found in an epidemiological study, that data must then be evaluated in a systematic manner to determine causation. If such an association is present, then the researcher looks for ‘bias’ in the study.  Bias refers to the existence of factors in the design of a study or in the manner in which the study was carried out which might distort the result.43

If a statistically significant association is found and there is no apparent ‘bias’, an inference is created that there may be a cause-and-effect relationship between the agent and the medical effect. To confirm or rebut that inference, an epidemiologist must apply five criteria in making judgments as to whether the associations found reflect a cause-and-effect relationship.44 The five criteria are:

1. The consistency of the association;

2. The strength of the association;

3. The specificity of the association;

4. The temporal relationship of the association; and,

5. The coherence of the association.

Assuming there is some statistical association, it is these five criteria that provide the generally accepted method of establishing causation between drugs or chemicals and birth defects.45

The Smith court acknowledged that there were differences of opinion in weighting these five factors, but that some of them were very important to drawing a reliable inference of causality.  Id. at 1575.

A major paradigm shift thus separates Wells and Smith.  The trial court in Wells contented itself with superficial and subjective indicia of witnesses’ personal credibility; the trial court in Smith delved into the methodology of drawing an appropriate scientific conclusion about causation.  Telling was the Smith court’s citation to Moultrie v. Martin, 690 F.2d 1078, 1082 (4th Cir. 1982) (“In borrowing from another discipline, a litigant cannot be selective in which principles are applied.”).  770 F.Supp. at 1575 & n.45.  Gone is the Wells retreat from engagement with science, and the dodge that the court must make a legal, not a scientific, decision.

Applying the relevant principles, the Smith court found that the plaintiffs’ expert witnesses had deviated from the scientific standards of reasoning and analysis:

“It is apparent to the court that the testimony of Doctors Bussey and Holbrook is insufficiently grounded in any reliable evidence. * * * The conclusions Doctors Bussey and Holbrook reach are also insufficient as a basis for a finding of causality because they fail to consider critical information, such as the most relevant epidemiologic studies and the other possible causes of disease.81

The court finds that the opinions of plaintiffs’ experts are not based upon the type of data reasonably relied upon by experts in determining the cause of birth defects. Experts in determining birth defects rely upon a consensus in genetic or epidemiological investigations or specific generally accepted studies in these fields. While a consensus in genetics or epidemiology is not a prerequisite to a finding of causation in any and all birth defect cases, Rule 703 requires some reliable evidence for the basis of an expert’s opinion.

Experts in determining birth defects also utilize methodologies and protocols not followed by plaintiffs’ experts. Without a well-founded methodology, opinions which run contrary to the consensus of the scientific community and are not supported by any reliable data are necessarily speculative and lacking in the type of foundation necessary to be admissible.

For the foregoing reasons, the court finds that plaintiffs have failed to produce admissible evidence sufficient to show that defendant’s product caused Crystal’s birth defects.”

Id. at 1581.  Rule 703 was forced into service to filter out methodologically specious opinions.

Not all was smooth sailing for Judge Ward.  Like Judge Shoob, Judge Ward seemed to think that a physical examination of the plaintiff provided helpful, relevant evidence, but he never articulated the basis for this view. (His Honor did note that the parties agreed that the physical examination offered no probative evidence about causation.  Id. at 1572 n.32.) No harm came of this opinion.  Judge Ward wrestled with the lack of peer review in some unpublished studies, and the existence of a study only in abstract form.  See, e.g., id. at 1579 (“a scientific study not subject to peer review has little probative value”); id. at 1578 (insightfully noting that an abstract had insufficient data to permit a reader to evaluate its conclusions).  The Smith court recognized the importance of statistical analysis, but it confused Bayesian posterior probabilities with significance probabilities:

“Because epidemiology involves evidence on causation derived from group based information, rather than specific conclusions regarding causation in an individual case, epidemiology will not conclusively prove or disprove that an agent or chemical causes a particular birth defect. Instead, its probative value lies in the statistical likelihood of a specific agent causing a specific defect. If the statistical likelihood is negligible, it establishes a reasonable degree of medical certainty that there is no cause-and-effect relationship absent some other evidence.”

The confusion here is hardly unique, but ultimately it did not prevent Judge Ward from reaching a sound result in Smith.
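The distinction the Smith court blurred can be shown with a toy Bayes computation, using invented numbers: a significance probability is the chance of the data assuming no effect, P(data | null), while the court's language describes a posterior probability, P(null | data). The two can diverge sharply:

```python
def posterior_prob_null(p_data_given_null, p_data_given_alt, prior_null):
    """Bayes' rule: P(null | data), from the likelihood of the data
    under the null, under the alternative, and a prior on the null.
    All three inputs here are invented for illustration."""
    num = p_data_given_null * prior_null
    return num / (num + p_data_given_alt * (1 - prior_null))

# Treat a result "significant at p = 0.05" crudely as a datum with
# likelihood 0.05 under the null and 0.26 under the alternative.
# Against a null that starts out quite plausible (prior 0.9), the
# null remains more likely than not despite the "significant" result.
print(round(posterior_prob_null(0.05, 0.26, prior_null=0.9), 2))  # 0.63
```

The treatment of a p-value as a single likelihood is itself a simplification, but the qualitative lesson holds: a small significance probability is not a small probability that there is no causal relationship.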

What intervened between Wells and Smith was not any major change in the scientific evidence on spermicides and birth defects; the sea change came in the form of judicial attitudes toward the judge’s role in evaluating expert witness opinion testimony.  In 1986, for instance, after the Court of Appeals affirmed the judgment in Wells, Judge Higginbotham, speaking for a panel of the Fifth Circuit, declared:

“Our message to our able trial colleagues: it is time to take hold of expert testimony in federal trials.”

 In re Air Crash Disaster at New Orleans, 795 F.2d 1230, 1234 (5th Cir. 1986).  By the time the motion for summary judgment in Smith was decided, that time had come.

Wells v. Ortho Pharmaceutical Corp. Reconsidered – Part 5

November 21st, 2012

While the trial court was preparing its findings of fact and conclusions of law, Ortho moved to reopen the evidence to permit additional testimony based upon three new articles.  Ortho’s motion came three months after the close of evidence, and Judge Shoob’s announcement of his verdict. The court denied this motion without mentioning what the new articles purported to show.  Wells v. Ortho Pharmaceutical Corp., 615 F. Supp. 262, 298 (N.D. Ga. 1985), aff’d and rev’d in part on other grounds, 788 F.2d 741 (11th Cir.), cert. denied, 479 U.S. 950 (1986).

What is remarkable in Wells, from the vantage point of current practice, is the absence of motions directed at the proffered expert witness opinion testimony.  On the basis of Judge Shoob’s opinion, there appears to have been no Frye motion, no motions to exclude expert witnesses based upon the Federal Rules of Evidence, and no motions to strike testimony after the fact for lack of a proper basis.

Having lost the verdict in a bench trial, Ortho had little chance for success in the Court of Appeals on a claim that the evidence supporting the plaintiffs’ verdict was legally insufficient.  The traditional standard, applied by the Court of Appeals, was to sustain the trier of fact’s decision as not “clearly erroneous” when there were two “permissible” views of the evidence. 788 F.2d 741, 743 (11th Cir. 1986).  Without some legal doctrine to filter out flawed, invalid, and inadequate expert witness opinion from permissible views of an evidentiary display, the Court of Appeals was left with only a rubber stamp, which it proceeded to use with alacrity.

Ortho attempted to turn its appellate argument about the sufficiency of the evidence into a legal principle about rejecting factual findings not based upon “scientifically reliable foundations.”  Id. at 744.  The appellate court framed the issue on appeal simply as a “battle of the experts,” which Ortho had lost.  Both sides had qualified expert witnesses, and thus, according to the appellate court, “the district court was forced to make credibility determinations to ‘decide the victor’.” Id. (citing Ferebee v. Chevron Chemical Co., 736 F.2d 1529, 1535 (D.C. Cir.), cert. denied, 469 U.S. 1062 (1984)).  The Court of Appeals thus acquiesced in Judge Shoob’s superficial analysis, which attempted to resolve a scientific issue by trial atmospherics, demeanor, and subjective impressions of witness confidence rather than the validity of the studies relied upon and inferences drawn therefrom.  The possibility that Judge Shoob might have evaluated the evidentiary basis underlying the expert witnesses’ opinions was not even acknowledged.

The Court of Appeals invoked the language from Ferebee on statistical significance, despite its irrelevance to the case before it:

“We recognize, as did the Ferebee court, that ‘a cause-effect relationship need not be clearly established by animal or epidemiological studies before a doctor can testify that, in his opinion, such a relationship exists. As long as the basic methodology employed to reach such a conclusion is sound, such as use of tissue samples, standard tests, and patient examination, products liability law does not preclude recovery until a “statistically significant” number of people have been injured or until science has had the time and resources to complete sophisticated laboratory studies of the chemical. Id. at 1535-36.”

Wells, 788 F.2d at 745 (quoting Ferebee). Ferebee involved an injury that all parties agreed could be attributed to paraquat exposure without the need for epidemiologic studies; statistical analysis was not particularly germane.  In Wells, on the other hand, both sides relied upon studies that required statistical analyses for any sensible interpretation, and some of the studies actually reported statistically significant results.  The appellate court’s rhetoric was empty and irrelevant.

(to be continued)

The opinions, statements, and asseverations expressed on Tortini are my own, or those of invited guests, and these writings do not necessarily represent the views of clients, friends, or family, even when supported by good and sufficient reason.