TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Manganese Meta-Analysis Further Undermines Reference Manual’s Toxicology Chapter

October 15th, 2012

Last October, when the ink was still wet on the Reference Manual on Scientific Evidence (3d ed. 2011), I dipped into the toxicology chapter only to find the treatment of a number of key issues to be partial and biased.  See “Toxicology for Judges – The New Reference Manual on Scientific Evidence” (Oct. 5, 2011).

The chapter, “Reference Guide on Toxicology,” was written by Professor Bernard D. Goldstein, of the University of Pittsburgh Graduate School of Public Health, and Mary Sue Henifin, a partner in the law firm of Buchanan Ingersoll, P.C.  In particular, I noted the authors’ conflicts of interest, both financial and ideological, which may have resulted in an incomplete and tendentious presentation of important concepts in the chapter.  Important concepts in toxicology, such as hormesis, were omitted completely from the chapter.  See, e.g., Mark P. Mattson and Edward J. Calabrese, eds., Hormesis: A Revolution in Biology, Toxicology and Medicine (N.Y. 2009); Curtis D. Klaassen, Casarett & Doull’s Toxicology: The Basic Science of Poisons 23 (7th ed. 2008) (“There is considerable evidence to suggest that some non-nutritional toxic substances may also impart beneficial or stimulatory effects at low doses but that, at higher doses, they produce adverse effects. This concept of “hormesis” was first described for radiation effects but may also pertain to most chemical responses.”)(internal citations omitted); Philip Wexler, et al., eds., 2 Encyclopedia of Toxicology 96 (2005) (“This type of dose–response relationship is observed in a phenomenon known as hormesis, with one explanation being that exposure to small amounts of a material can actually confer resistance to the agent before frank toxicity begins to appear following exposures to larger amounts.  However, analysis of the available mechanistic studies indicates that there is no single hormetic mechanism. In fact, there are numerous ways for biological systems to show hormetic-like biphasic dose–response relationship. Hormetic dose–response has emerged in recent years as a dose–response phenomenon of great interest in toxicology and risk assessment.”).

The financial conflicts are perhaps more readily appreciated.  Goldstein has testified in any number of so-called toxic tort cases, including several in which courts had excluded his testimony as methodologically unreliable.  These cases are not cited in the Manual.  See, e.g., Parker v. Mobil Oil Corp., 7 N.Y.3d 434, 857 N.E.2d 1114, 824 N.Y.S.2d 584 (2006) (dismissing leukemia (AML) claim based upon claimed low-level benzene exposure from gasoline), aff’g 16 A.D.3d 648 (App. Div. 2d Dep’t 2005); Exxon Corp. v. Makofski, 116 S.W.3d 176 (Tex. App.–Houston [14th Dist.] 2003, pet. denied) (benzene and ALL claim).

One of the disappointments of the toxicology chapter was its failure to remain neutral in substantive disputes, or at least to document its positions when it took sides.  Table 1 in the chapter presents, without documentation or citation, a “Sample of Selected Toxicological End Points and Examples of Agents of Concern in Humans.”  Although many of the agent/disease outcome relationships in the table are well accepted, one was curiously unsupported at the time: the claim that manganese causes Parkinson’s disease (PD).  Reference Manual at 653.  This tendentious claim undermines the Manual’s attempt to remain disinterested in what was then an ongoing litigation effort.  Last year, I noted that Goldstein’s scholarship was questionable at the time of publication because PD is generally accepted to have no known cause.  Claims that manganese can cause PD had been addressed in several reviews.  See, e.g., Karin Wirdefeldt, Hans-Olov Adami, Philip Cole, Dimitrios Trichopoulos, and Jack Mandel, “Epidemiology and etiology of Parkinson’s disease: a review of the evidence,” 26 European J. Epidemiol. S1, S20-21 (2011); Tomas R. Guilarte, “Manganese and Parkinson’s Disease: A Critical Review and New Findings,” 118 Environ Health Perspect. 1071, 1078 (2010) (“The available evidence from human and nonhuman primate studies using behavioral, neuroimaging, neurochemical, and neuropathological end points provides strong support to the hypothesis that, although excess levels of [manganese] accumulation in the brain results in an atypical form of parkinsonism, this clinical outcome is not associated with the degeneration of nigrostriatal dopaminergic neurons as is the case in PD.”).

More recently, three neuro-epidemiologists have published a systematic review and meta-analysis of the available analytical epidemiologic studies.  What they found was an inverse association between welding, a trade that involves manganese fume exposure, and Parkinson’s disease. James Mortimer, Amy Borenstein, and Lorene Nelson, “Associations of welding and manganese exposure with Parkinson disease: Review and meta-analysis,” 79 Neurology 1174 (2012).

The summary figures from the published meta-analysis are not reproduced here; they present the pooled estimates for the associations of welding and of manganese exposure with Parkinson’s disease.

The Fourth Edition should aim at a better integration of toxicology into the evolving science of human health effects.

Reference Manual on Scientific Evidence (3d edition) on Statistical Significance

July 8th, 2012

How does the new Reference Manual on Scientific Evidence (RMSE3d 2011) treat statistical significance?  Inconsistently and at times incoherently.

Professor Berger’s Introduction

In her introductory chapter, the late Professor Margaret A. Berger raises the question of the role statistical significance should play in evaluating a study’s support for causal conclusions:

“What role should statistical significance play in assessing the value of a study? Epidemiological studies that are not conclusive but show some increased risk do not prove a lack of causation. Some courts find that they therefore have some probative value,62 at least in proving general causation.63”

Margaret A. Berger, “The Admissibility of Expert Testimony,” in RMSE3d 11, 24 (2011).

This seems rather backwards.  Berger’s suggestion that inconclusive studies do not prove lack of causation is nothing more than a tautology.  And how can that tautology support the claim that inconclusive studies “therefore” have some probative value?  This is a fairly obvious logically invalid argument, or perhaps a passage badly in need of an editor.

Berger’s citations in support are curiously inaccurate.  Footnote 62 cites the Cook case:

“62. See Cook v. Rockwell Int’l Corp., 580 F. Supp. 2d 1071 (D. Colo. 2006) (discussing why the court excluded expert’s testimony, even though his epidemiological study did not produce statistically significant results).”

The expert witness in Cook, Dr. Clapp, did rely upon his own study, which did not obtain a statistically significant result, but the trial court admitted his testimony; the court denied the Rule 702 challenge to Clapp and permitted him to testify about a statistically non-significant ecological study.

Footnote 63 is no better:

“63. In re Viagra Prods., 572 F. Supp. 2d 1071 (D. Minn. 2008) (extensive review of all expert evidence proffered in multidistricted product liability case).”

With respect to the concept of statistical significance, the Viagra case centered on the motion to exclude plaintiffs’ expert witness, Gerald McGwin, who relied upon three studies, none of which obtained a statistically significant result in its primary analysis.  The Viagra court’s review was hardly extensive; the court did not report, discuss, or consider the appropriate point estimates in most of the studies, the confidence intervals around those point estimates, or any aspect of systematic error in the three studies.  When the defendant brought to light the lack of data integrity in McGwin’s own study, the Viagra MDL court reversed itself, and granted the motion to exclude McGwin’s testimony.  In re Viagra Products Liab. Litig., 658 F. Supp. 2d 936, 945 (D. Minn. 2009).  Berger’s characterization of the review is incorrect, and her failure to cite the subsequent procedural history is disturbing.

 

Chapter on Statistics

The RMSE’s chapter on statistics is relatively free of value judgments about significance probability, and, therefore, a great improvement upon Berger’s introduction.  The authors carefully describe significance probability and p-values, and explain:

“Small p-values argue against the null hypothesis. Statistical significance is determined by reference to the p-value; significance testing (also called hypothesis testing) is the technique for computing p-values and determining statistical significance.”

David H. Kaye and David A. Freedman, “Reference Guide on Statistics,” in RMSE3d 211, 241 (3d ed. 2011).  Although the chapter confuses and conflates the positions often taken to be Fisher’s interpretation of p-values and Neyman’s conceptualization of hypothesis testing as a dichotomous decision procedure, this treatment is unfortunately fairly standard in introductory textbooks.
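
To make the quoted definition concrete, here is a minimal sketch, in Python, of how a p-value is computed for a simple binomial example.  The numbers are hypothetical and chosen only for illustration; nothing in the Manual turns on them.

    # A p-value is the probability, computed on the assumption that the null
    # hypothesis is true, of observing data at least as extreme as the data
    # actually observed.  Hypothetical example: 60 events in 100 trials, with
    # a null hypothesis that the true event probability is 0.5.
    from math import comb

    n, k, p0 = 100, 60, 0.5

    def binom_pmf(x, n, p):
        return comb(n, x) * p**x * (1 - p)**(n - x)

    # One-sided p-value: probability of 60 or more events under the null.
    p_one_sided = sum(binom_pmf(x, n, p0) for x in range(k, n + 1))

    # Two-sided p-value for this symmetric null: double the one-sided tail.
    p_two_sided = min(1.0, 2 * p_one_sided)

    print(f"one-sided p ≈ {p_one_sided:.4f}")   # about 0.028
    print(f"two-sided p ≈ {p_two_sided:.4f}")   # about 0.057

Under the conventional 0.05 level discussed later in the Manual, the one-sided result would be labeled “statistically significant”; as discussed below, that label says nothing, standing alone, about the probability that the null hypothesis is true.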

Kaye and Freedman, however, do offer some important qualifications to the untoward consequences of using significance testing as a dichotomous outcome:

“Artifacts from multiple testing are commonplace. Because research that fails to uncover significance often is not published, reviews of the literature may produce an unduly large number of studies finding statistical significance.111 Even a single researcher may examine so many different relationships that a few will achieve statistical significance by mere happenstance. Almost any large dataset—even pages from a table of random digits—will contain some unusual pattern that can be uncovered by diligent search. Having detected the pattern, the analyst can perform a statistical test for it, blandly ignoring the search effort. Statistical significance is bound to follow.

There are statistical methods for dealing with multiple looks at the data, which permit the calculation of meaningful p-values in certain cases.112 However, no general solution is available, and the existing methods would be of little help in the typical case where analysts have tested and rejected a variety of models before arriving at the one considered the most satisfactory (see infra Section V on regression models). In these situations, courts should not be overly impressed with claims that estimates are significant. Instead, they should be asking how analysts developed their models.113 ”

Id. at 256-57.  This qualification is omitted from the overlapping discussion in the chapter on epidemiology, where it is very much needed.
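
Kaye and Freedman’s multiple-testing warning is easy to verify by simulation.  The following sketch, assuming the numpy and scipy libraries are available, uses entirely made-up data and an arbitrary choice of 100 comparisons; it runs significance tests on pure noise, and roughly five percent of them come out “significant” by happenstance.

    # Multiple comparisons on pure noise: with alpha = 0.05, about 5 of 100
    # tests of a true null hypothesis will be "statistically significant."
    # All data are simulated; no real study is involved.
    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(seed=1)
    n_tests, n_per_arm, alpha = 100, 50, 0.05

    false_positives = 0
    for _ in range(n_tests):
        treated = rng.normal(0, 1, n_per_arm)   # no true treatment effect
        control = rng.normal(0, 1, n_per_arm)
        _, p = ttest_ind(treated, control)
        if p < alpha:
            false_positives += 1

    print(f"{false_positives} of {n_tests} null comparisons reached p < {alpha}")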

 

Chapter on Multiple Regression

The chapter on regression does not add much to the earlier and later discussions.  The author asks rhetorically what is the appropriate level of statistical significance, and answers:

“In most scientific work, the level of statistical significance required to reject the null hypothesis (i.e., to obtain a statistically significant result) is set conventionally at 0.05, or 5%.47”

Daniel Rubinfeld, “Reference Guide on Multiple Regression,” in RMSE3d 303, 320.

 

Chapter on Epidemiology

The chapter on epidemiology mostly muddles the discussion set out in Kaye and Freedman’s chapter on statistics.

“The two main techniques for assessing random error are statistical significance and confidence intervals. A study that is statistically significant has results that are unlikely to be the result of random error, although any criterion for “significance” is somewhat arbitrary. A confidence interval provides both the relative risk (or other risk measure) found in the study and a range (interval) within which the risk likely would fall if the study were repeated numerous times.”

Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” in RMSE3d 549, 573.  The suggestion that a statistically significant study has results unlikely to be due to chance probably crosses the line into the transpositional fallacy so nicely described and warned against in the chapter on statistics.  The problem is that “results” is ambiguous as between the data (as extreme as, or more extreme than, what was observed) and the point estimate of the mean or proportion in the sample.  Furthermore, the chapter’s statement here omits the conditional nature of the probability, which depends upon the assumption that the null hypothesis is correct.

The suggestion that alpha is “arbitrary” is “somewhat” correct, but this truncated discussion is distinctly unhelpful to judges, who are likely to take “arbitrary” to mean “I will get reversed.”  The selection of alpha is conventional to some extent, and arbitrary in the sense that the law’s setting an age of majority or a voting age is arbitrary.  Some young adults, say 17.8 years old, may be better educated, better engaged in politics, and better informed about current events than 35-year-olds, but the law must set a cutoff.  Two-year-olds are demonstrably unfit, and 82-year-olds are surely past the threshold of maturity requisite for political participation.  A court might admit an opinion based upon a study of rare diseases, with tight control of bias and confounding, when p = 0.051, but that is hardly a justification for ignoring random error altogether, or for admitting an opinion based upon a study in which the disparity observed had a p = 0.15.

The epidemiology chapter correctly calls out judicial decisions that confuse “effect size” with statistical significance:

“Understandably, some courts have been confused about the relationship between statistical significance and the magnitude of the association. See Hyman & Armstrong, P.S.C. v. Gunderson, 279 S.W.3d 93, 102 (Ky. 2008) (describing a small increased risk as being considered statistically insignificant and a somewhat larger risk as being considered statistically significant.); In re Pfizer Inc. Sec. Litig., 584 F. Supp. 2d 621, 634–35 (S.D.N.Y. 2008) (confusing the magnitude of the effect with whether the effect was statistically significant); In re Joint E. & S. Dist. Asbestos Litig., 827 F. Supp. 1014, 1041 (S.D.N.Y. 1993) (concluding that any relative risk less than 1.50 is statistically insignificant), rev’d on other grounds, 52 F.3d 1124 (2d Cir. 1995).”

Id. at 573 n.68.  Actually, this confusion is not understandable at all, other than to emphasize that the cited courts badly misunderstood significance probability and significance testing.  The authors could well have added In re Viagra to the list of courts that confused effect size with statistical significance.  See In re Viagra Products Liab. Litig., 572 F. Supp. 2d 1071, 1081 (D. Minn. 2008).

The epidemiology chapter also chastises courts for confusing significance probability with the probability that the null hypothesis, or its complement, is correct:

“A common error made by lawyers, judges, and academics is to equate the level of alpha with the legal burden of proof. Thus, one will often see a statement that using an alpha of .05 for statistical significance imposes a burden of proof on the plaintiff far higher than the civil burden of a preponderance of the evidence (i.e., greater than 50%). See, e.g., In re Ephedra Prods. Liab. Litig., 393 F. Supp. 2d 181, 193 (S.D.N.Y. 2005); Marmo v. IBP, Inc., 360 F. Supp. 2d 1019, 1021 n.2 (D. Neb. 2005) (an expert toxicologist who stated that science requires proof with 95% certainty while expressing his understanding that the legal standard merely required more probable than not). But see Giles v. Wyeth, Inc., 500 F. Supp. 2d 1048, 1056–57 (S.D. Ill. 2007) (quoting the second edition of this reference guide).”

“Comparing a selected p-value with the legal burden of proof is mistaken, although the reasons are a bit complex and a full explanation would require more space and detail than is feasible here. Nevertheless, we sketch out a brief explanation: First, alpha does not address the likelihood that a plaintiff’s disease was caused by exposure to the agent; the magnitude of the association bears on that question. See infra Section VII. Second, significance testing only bears on whether the observed magnitude of association arose as a result of random chance, not on whether the null hypothesis is true. Third, using stringent significance testing to avoid false-positive error comes at a complementary cost of inducing false-negative error. Fourth, using an alpha of .5 would not be equivalent to saying that the probability the association found is real is 50%, and the probability that it is a result of random error is 50%.”

Id. at 577 n.81.  The footnote goes on to explain further the difference between alpha probability and burden of proof probability, but incorrectly asserts that “significance testing only bears on whether the observed magnitude of association arose as a result of random chance, not on whether the null hypothesis is true.”  Id.  The significance probability does not address the probability that the observed statistic is the result of random chance; rather it describes the probability of observing at least as large a departure from the expected value if the null hypothesis is true.  Kaye and Freedman’s chapter on statistics does much better at describing and avoiding the transpositional fallacy when describing p-values.
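
The difference between these two probabilities is not a quibble, and a short simulation can show why.  The sketch below, assuming numpy and scipy are available, uses arbitrary assumptions (half of the tested hypotheses truly null, a modest real effect otherwise); the only point is that the share of “significant” results for which the null hypothesis is actually true need not bear any resemblance to the significance level.

    # Why a p-value is not "the probability the result is due to chance":
    # simulate studies where half of the null hypotheses are true, and ask
    # how often the null is true among the "statistically significant" results.
    # The 50/50 mix and the assumed effect size are arbitrary illustrations.
    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(seed=2)
    n_studies, n_per_arm = 20_000, 50

    significant = 0
    null_true_and_significant = 0
    for _ in range(n_studies):
        null_is_true = rng.random() < 0.5
        effect = 0.0 if null_is_true else 0.3   # modest real effect when not null
        treated = rng.normal(effect, 1, n_per_arm)
        control = rng.normal(0.0, 1, n_per_arm)
        _, p = ttest_ind(treated, control)
        if p < 0.05:
            significant += 1
            null_true_and_significant += null_is_true

    # Alpha fixes P(p < 0.05 | null is true) at 5%; it does not fix
    # P(null is true | p < 0.05), which under these assumptions comes out
    # closer to 13%.
    print(f"P(null true | p < 0.05) ≈ {null_true_and_significant / significant:.2f}")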

When they are on message, the authors of the epidemiology chapter are certainly correct that significance probability cannot be translated into an assessment of the probability that the null hypothesis, or the obtained sampling statistic, is correct.  What these authors omit, however, is a clear statement that the many courts and counsel who misstate this fact do not create any worthwhile precedent, persuasive or binding.

The epidemiology chapter ultimately offers nothing to help judges in assessing statistical significance:

“There is some controversy among epidemiologists and biostatisticians about the appropriate role of significance testing.85 To the strictest significance testers, any study whose p-value is not less than the level chosen for statistical significance should be rejected as inadequate to disprove the null hypothesis. Others are critical of using strict significance testing, which rejects all studies with an observed p-value below that specified level. Epidemiologists have become increasingly sophisticated in addressing the issue of random error and examining the data from a study to ascertain what information they may provide about the relationship between an agent and a disease, without the necessity of rejecting all studies that are not statistically significant.86 Meta-analysis, as well, a method for pooling the results of multiple studies, sometimes can ameliorate concerns about random error.87

Calculation of a confidence interval permits a more refined assessment of appropriate inferences about the association found in an epidemiologic study.88”

Id. at 578-79.  Mostly true, but again rather unhelpful to judges and lawyers.  The authors divide the world up into “strict” testers and those critical of “strict” testing.  Where is the boundary?  Does criticism of “strict” testing imply embrace of “non-strict” testing, or of no testing at all?  I can sympathize with a judge who permits reliance upon a series of studies that all go in the same direction, with each having a confidence interval that just misses excluding the null hypothesis.  Meta-analysis in such a situation might not just ameliorate concerns about random error; it might eliminate them.  But what of those critical of strict testing?  This criticism certainly does not suggest or imply that courts can or should ignore random error; yet that is exactly what happened in In re Viagra Products Liab. Litig., 572 F. Supp. 2d 1071, 1081 (D. Minn. 2008).  The chapter’s reference to confidence intervals is correct in part; they permit a more refined assessment because they permit a more direct assessment of the extent of random error in terms of the magnitude of association, as well as the point estimate of the association obtained from the sample.  Confidence intervals, however, do not eliminate the need to interpret the extent of random error.

In the final analysis, the epidemiology chapter is unclear and imprecise.  I believe it confuses matters more than it clarifies.  There is clearly room for improvement in the Fourth Edition.

Meta-Meta-Analysis – Celebrex Litigation – The Claims – Part 2

June 25th, 2012

IMPUTATION

As I noted in part one, the tables were turned on imputation, with plaintiffs making the same accusation that G.E. made in the gadolinium litigation:  imputation involves adding “phantom events” or “imaginary events to each arm of ‘zero event’ trials.”  See Plaintiffs’ Reply Mem. of Law in Further Support of Their Motion to Exclude Expert Testimony by Defendants’ Expert Dr. Lee-Jen Wei at 8, 9 (May 5, 2010), in Securities Litig.

The plaintiffs claimed that Wei “created” an artifact of a risk ratio of 1.0 by using imputation in each of the zero-event trials.  The reality, however, is that each of those trials had zero risk difference, and the rates of events in the drug and placebo arms were both low and equal to one another.  The plaintiffs’ claim that Wei “diluted” the risk is little more than saying that he failed to inflate the risk by excluding zero-event trials.  But zero-event trials represent a test in which the risk of events in both arms is equal, and relatively low.

The plaintiffs seemed to make their point half-heartedly.  They admitted that “imputation in and of itself is a commonly used methodology,” id. at 10, but they claimed that “adding zero-event trials to a meta-analysis is debated among scientists.”  Id.  A debate over methodology in the realm of meta-analysis procedures hardly makes any one of the debated procedures “not generally accepted,” especially in the context of meta-analysis of uncommon adverse events arising in clinical trials designed for other outcomes.  After all, investigators do not design trials to assess a suspected causal association between a medication and an adverse outcome as their primary outcome.  The debate over the ethics of such a trial would be much greater than any gentle debate over whether to include zero-event trials by using either the risk difference or imputation procedures.

The gravamen of the plaintiffs’ complaint against Wei seems to be that he included too many zero-event trials, “skewing the numbers greatly, and notably cites to no publications in which the dominant portion of the meta-analysis was comprised of studies with no events.”  Id.  The plaintiffs further argued that Wei could have minimized the “distortion” created by imputation by using a fractional event, “a smaller number like .000000001 to each trial.”  Id.  The plaintiffs notably cited no texts or articles for this strategy.  In any event, if the zero-event trials are small, as they typically are, then they will have large study variances.  Because meta-analyses weight each trial by the inverse of the variance, studies with large variances have little weight in the summary estimate of association.  Including small studies with imputation methods will generally not affect the outcome very much, and their contribution may well reflect the reality of lower or non-differential risk from the medication.
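
The weighting point can be made concrete with a small sketch.  The three trials below are invented for illustration; in a standard inverse-variance, fixed-effect pooling of log odds ratios, the small zero-event trial (given the usual 0.5 continuity correction) contributes a large variance and therefore almost none of the weight.

    # Inverse-variance weighting in a fixed-effect meta-analysis of log odds
    # ratios.  All three trials are hypothetical; the zero-event trial gets a
    # 0.5 continuity correction and ends up with a trivial share of the weight.
    from math import exp, log

    # (events_drug, n_drug, events_placebo, n_placebo)
    trials = [
        (12, 1000, 6, 1000),   # large trial
        (5,  400,  4, 400),    # mid-sized trial
        (0,  30,   0, 30),     # small zero-event trial
    ]

    weights, weighted_logs = [], []
    for events_t, n_t, events_c, n_c in trials:
        a, b = events_t, n_t - events_t
        c, d = events_c, n_c - events_c
        if 0 in (a, b, c, d):                   # standard 0.5 continuity correction
            a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
        log_or = log((a * d) / (b * c))
        var = 1/a + 1/b + 1/c + 1/d             # variance of the log odds ratio
        weights.append(1 / var)
        weighted_logs.append(log_or / var)

    pooled_or = exp(sum(weighted_logs) / sum(weights))
    for w in weights:
        print(f"relative weight: {w / sum(weights):.1%}")
    print(f"pooled odds ratio ≈ {pooled_or:.2f}")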

Eliminating trials on the grounds that they had zero events has also been criticized for throwing away important data.  Charles H. Hennekens, David L. DeMets, C. Noel Bairey Merz, Steven L. Borzak, Jeffrey S. Borer, “Doing More Harm Than Good,” 122 Am. J. Med. 315 (2009) (criticizing Nissen’s meta-analysis of rosiglitazone, in which he excluded zero-event trials, as biased towards overestimating the magnitude of the summary estimate of association); George A. Diamond, L. Bax, S. Kaul, “Uncertain effects of rosiglitazone on the risk for myocardial infarction and cardiovascular death,” 147 Ann. Intern. Med. 578 (2007) (conducting sensitivity analyses on Nissen’s meta-analysis of rosiglitazone to show that Nissen’s findings lost statistical significance when continuity corrections were made for zero-event trials).

 

RISK DIFFERENCE

The plaintiffs are correct that the risk difference is not the predominant risk measure used in meta-analysis or in clinical trials for that matter.  Researchers prefer risk ratios because they reflect base rates in the ratio.  As one textbook explains:

“the limitation of the [risk difference] statistic is its insensitivity to base rates. For example, a risk that increases from 50% to 52% may be less important than one that increases from 2% to 4%, although in both instances RD = 0.02.”

Julia Littell, Jacqueline Corcoran, and Vijayan Pillai, Systematic Reviews and Meta-Analysis 85 (Oxford 2008).  This feature of the risk difference hardly makes its use unreliable, however.

Pfizer pointed out that at least one other case addressed the circumstances in which the risk difference would be superior to risk ratios in meta-analyses:

“The risk difference method is often used in meta-analyses where many of the individual studies (which are all being pooled together in one, larger analysis) do not contain any individuals who developed the investigated side effect.FN17  Whereas such studies would have to be excluded from an odds ratio calculation, they can be included in a risk difference calculation. FN18

FN17. This scenario is more likely to occur when studying a particularly rare event, such as suicide.

FN18. Studies where no individuals experienced the effect must be excluded from an odds ratio calculation because their inclusion would necessitate dividing by zero, which, as perplexed middle school math students come to learn, is impossible. The risk difference’s reliance on subtraction, rather than division, enables studies with zero incidences to remain in a meta-analysis. (Hr’g Tr. 310-11, June 20, 2008 (Gibbons.)).”

In re Neurontin Marketing, Sales Practices, and Products Liab. Litig., 612 F. Supp. 2d 116, 126 (D. Mass. 2009) (MDL 1629).  See Pfizer Defendants’ Mem. of Law in Opp. to Plaintiffs’ Motion to Exclude Expert Testimony by Defendants’ Expert Dr. Lee-Jen Wei (Sept. 8, 2009), in Securities Litig. (citing In re Neurontin).
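
Footnote 18’s arithmetic can be shown in a few lines.  A sketch with hypothetical counts: for a zero-event trial, the risk difference is computable (and equals zero), while the odds ratio calculation fails because it requires dividing by zero.

    # For a trial with no events in either arm, the risk difference is simply
    # 0 - 0 = 0, but the odds ratio is undefined.  Counts are hypothetical.
    events_drug, n_drug = 0, 100
    events_placebo, n_placebo = 0, 100

    risk_difference = events_drug / n_drug - events_placebo / n_placebo
    print(f"risk difference = {risk_difference}")     # 0.0

    try:
        odds_ratio = (events_drug * (n_placebo - events_placebo)) / (
            (n_drug - events_drug) * events_placebo
        )
    except ZeroDivisionError:
        print("odds ratio undefined: the calculation divides by zero")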

Pfizer also pointed out that Wei had employed both the risk ratio and the risk difference in conducting his meta-analyses, and that none of his summary estimates of association were statistically significant.  Id. at 19, 24.


EXACT CONFIDENCE INTERVALS

The plaintiffs argued that the use of “exact” confidence intervals was not scientifically reliable and could not have been used by Pfizer during the time period covered by the securities class’s allegations.  See Plaintiffs’ Reply Mem. of Law in Further Support of Their Motion to Exclude Expert Testimony by Defendants’ Expert Dr. Lee-Jen Wei at 15 (May 5, 2010).  Exact intervals, however, are hardly a novelty, and there is often no single way to calculate a confidence interval.  See E. B. Wilson, “Probable inference, the law of succession, and statistical inference,” 22 J. Am. Stat. Ass’n 209 (1927); C. Clopper, E. S. Pearson, “The use of confidence or fiducial limits illustrated in the case of the binomial,” 26 Biometrika 404 (1934).  Approximation methods are often used, despite their lack of precision, because of their ease in calculation.
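
For readers unfamiliar with the terminology, here is a minimal sketch contrasting the familiar normal-approximation (Wald) interval with a Clopper-Pearson “exact” interval for a single binomial proportion, along the lines of the Clopper & Pearson paper cited above.  The counts are hypothetical, and the sketch assumes the scipy library is available.

    # An approximate (Wald) confidence interval versus a Clopper-Pearson
    # "exact" interval for a binomial proportion.  Hypothetical sparse data:
    # 2 events among 40 subjects; 95% confidence level.
    from math import sqrt
    from scipy.stats import beta, norm

    k, n, conf = 2, 40, 0.95
    alpha = 1 - conf
    p_hat = k / n

    # Wald interval: easy to compute, but unreliable with sparse data.
    z = norm.ppf(1 - alpha / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    wald = (max(0.0, p_hat - half_width), p_hat + half_width)

    # Clopper-Pearson interval: inverts the binomial distribution itself.
    lower = 0.0 if k == 0 else beta.ppf(alpha / 2, k, n - k + 1)
    upper = 1.0 if k == n else beta.ppf(1 - alpha / 2, k + 1, n - k)

    print(f"Wald interval:            ({wald[0]:.3f}, {wald[1]:.3f})")
    print(f"Clopper-Pearson interval: ({lower:.3f}, {upper:.3f})")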

Plaintiffs further claimed that the combination of the risk difference and exact intervals is novel, not reliable, and not in existence during the class period.  Plaintiffs’ Reply Mem. at 15.  The plaintiffs’ argument traded on Wei’s having published on the use of exact intervals in conjunction with the risk difference for heart attacks in clinical trials of Avandia.  See L. Tian, T. Cai, M.A. Pfeffer, N. Piankov, P.Y. Cremieux, and L.J. Wei, “Exact and efficient inference procedure for meta-analysis and its application to the analysis of independent 2 x 2 tables with all available data but without artificial continuity correction,” 10 Biostatistics 275 (2009).  Their argument ignored that Wei combined two well-understood statistical techniques, in a transparent way, with empirical testing of the validity of his approach.  Contrary to plaintiffs’ innuendo, Wei did not develop his approach as an expert witness for GlaxoSmithKline; a version of the manuscript describing his approach was posted online well before he was ever contacted by GSK counsel.  (L.J. Wei, personal communication.)  Plaintiffs also claimed that Wei’s use of exact intervals for the risk difference showed no increased risk of heart attack for Avandia, contrary to a well-known meta-analysis by Dr. Steven Nissen.  See Steven E. Nissen, M.D., and Kathy Wolski, M.P.H., “Effect of Rosiglitazone on the Risk of Myocardial Infarction and Death from Cardiovascular Causes,” 356 New Engl. J. Med. 2457, 2457 (2007).  This claim, however, is a crude distortion of Wei’s paper, which showed that there was a positive risk difference for heart attacks in the same dataset used by Nissen, but the confidence intervals included zero (no risk difference), and thus chance could not be excluded as explaining Nissen’s result.

 

DURATION OF TRIALS

Pfizer was ultimately successful in defending the Celebrex litigation on the basis of the lack of risk associated with 200 mg/day use.  Pfizer also attempted to argue a duration effect, on grounds that, in one large trial that saw a statistically significant hazard ratio associated with higher doses, the result occurred for the first time among trial participants on medication at 33 months into the trial.  Judge Breyer rejected this challenge, without explanation.  In re Bextra & Celebrex Marketing Sales Practices & Prod. Liab. Litig., 524 F. Supp. 2d 1166, 1183 (N.D. Calif. 2007).  The reasonable inference, however, is that the meta-analyses showed statistically significant results across trials with shorter durations of use, for 400 mg and 800 mg/day use.

Clearly duration of use is a potential consideration unless the mechanism of causation is such that a causally related adverse event would occur from the first use or very short-term use of the medication.  See In re Vioxx Prods. Liab. Litig., MDL No. 1657, 414 F. Supp. 2d 574, 579 (E.D. La. 2006) (“A trial court may consider additional factors in assessing the scientific reliability of expert testimony . . . includ[ing] whether the expert’s opinion is based on incomplete or inaccurate dosage or duration data.”).  In the Celebrex litigation, plaintiffs’ counsel appeared to want to have duration effects both ways; they did not want to disenfranchise plaintiffs whose claims turned on short-term use, but at the same time, they criticized Professor Wei for including short-term trials of Celebrex.

One form that the plaintiffs’ criticism of Wei took was his failure to weight the trials included in his meta-analyses by duration.  In the plaintiffs’ words:

“Wei failed to utilize important information regarding the duration of the clinical trials that he analyzed, information that is critical to interpreting and understanding the Celebrex and Bextra safety information that is contained within those clinical trials.3 Because the types of cardiovascular events that are at issue in this case occur relatively rarely and are more likely to be observed after an extended period of exposure, the scientific community is in agreement that they would not be expected to appear in trials of very short duration.”

Plaintiffs’ Mem. of Law in Support of Their Motion to Exclude Expert Testimony by Defendants’ Expert Dr. Lee-Jen Wei at 2 (July 23, 2009), submitted in In re Pfizer, Inc. Securities Litig., Nos. 04 Civ. 9866(LTS)(JLC), 05 md 1688(LTS) (S.D.N.Y.) [hereafter Securities Litig.].  The plaintiffs maintained that Wei’s meta-analyses were “fatally flawed” because he ignored trial duration, such as would be factored in by performing the analyses in terms of patient years.  Id. at 3.

Many of the sources cited by plaintiffs do not support their argument.  For instance, the plaintiffs cited articles that noted that weighted averages should be used, but virtually all methods, including Wei’s, weight studies by their variance, which takes into account sample size.  Id. at 9 n.3, citing Egger, et al., “Meta-analysis: Principles and Procedures,” 315 Brit. Med. J. 1533 (1997) (an arithmetic average from all trials gives misleading results, as results from small studies are more subject to the play of chance and should be given less weight; meta-analyses use weighted results in which larger trials have more influence than smaller ones).  See also id. at 22.  True, true, and immaterial.  No one in the Celebrex cases was using an arithmetic average of risk across trials or studies.

Most of the short-term studies were small, and thus contributed little to the overall summary estimate of association.  Some of the plaintiffs’ citations actually supported using “individual patient data” in the form of time-to-event analyses, which was not possible with many of the clinical trials available.  Indeed, the article the plaintiffs cited, by Dahabreh, did not use time-to-event data for rosiglitazone, because such data were not generally available.  Id. at 9 n.3, citing Dahabreh, “Meta-Analysis Of Rare Events: An Update And Sensitivity Analysis Of Cardiovascular Events In Randomized Trials Of Rosiglitazone,” 5 Clinical Trials 116 (2008).

The plaintiffs’ claim was thus a fairly weak challenge to using simple 2 x 2 tables for the included studies in Wei’s meta-analysis. Both sides failed to mention that many published meta-analyses eschew “patient years” in favor of a simple odds ratio for dichotomous count data from each included study.  See, e.g., Steven E. Nissen, M.D., and Kathy Wolski, M.P.H., “Effect of Rosiglitazone on the Risk of Myocardial Infarction and Death from Cardiovascular Causes,” 356 New Engl. J. Med. 2457, 2457 (2007)(using Peto method with count data, for fixed effect model).  Patient years would be a crude tool to modify the fairly common 2 x 2 table.  The analysis for large studies, with a high number of patient years, would still not reveal whether the adverse events occurred early or late in the trials.  Only a time-to-event analysis could provide the missing information about “duration,” and neither side’s expert witnesses appeared to use a time-to-event analysis.

Interestingly, plaintiffs’ expert witness, Prof. Madigan, appears to have received the patient-level data from Pfizer’s clinical trials, but still did not conduct a time-to-event analysis.  Plaintiffs’ Mem. of Law in Support of Their Motion to Exclude Expert Testimony by Defendants’ Expert Dr. Lee-Jen Wei at 12 (July 23, 2009), submitted in Securities Litig. (noting that Madigan had examined all SAS data files produced by Pfizer, and that “[t]hese files contained voluminous information on each subject in the trials, including information about duration of exposure to the drug (or placebo), any adverse events experienced and a wide variety of other information.”).  Of course, even with time-to-event data from the Pfizer clinical trials, Madigan had the problem of whether to limit himself to just the Pfizer trials or use all the data, including non-Pfizer trials.  If he opted for completeness, he would have been forced to include trials for which he did not have underlying data.  In all likelihood, Madigan used patient-years in his analyses because he could not conduct a complete analysis with time-to-event data for all trials.

The plaintiffs’ point would appear well taken if the court were to assume that there really was a duration issue, but the plaintiffs’ theories were to the contrary, and Pfizer lost its attempt to limit claims to those events that appeared 33 months (or some other fixed time) after first ingestion.  It is certainly correct that patient-year analyses, in the absence of time-to-event analyses, are generally preferred.  Pfizer had used patient-year information to analyze combined trials in its submission to the FDA’s Advisory Committee.  See Pfizer’s Submission of Advisory Committee Briefing Document at 15 (January 12, 2005).  See also FDA Reviewer Guidance: Conducting a Clinical Safety Review of a New Product Application and Preparing a Report on the Review at 22 (2005); see also id. at 15 (“If there is a substantial difference in exposure across treatment groups, incidence rates should be calculated using person-time exposure in the denominator, rather than number of patients in the denominator.”); R. H. Friis & T. A. Sellers, Epidemiology for Public Health Practice at 105 (2008) (“To allow for varying periods of observation of the subjects, one uses a modification of the formula for incidence in which the denominator becomes person-time of observation”).
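
The person-time point quoted from the FDA guidance reduces to simple arithmetic.  In the sketch below, the exposure figures are invented; the only point is that when follow-up differs sharply between arms, events per patient and events per patient-year can point in opposite directions.

    # Events per patient versus events per patient-year when follow-up differs
    # across arms.  All numbers are invented for illustration.
    arms = {
        "drug":    dict(events=20, patients=4000, patient_years=6000),  # longer follow-up
        "placebo": dict(events=10, patients=4000, patient_years=2000),  # shorter follow-up
    }

    for name, arm in arms.items():
        per_patient = arm["events"] / arm["patients"]
        per_py = arm["events"] / arm["patient_years"]
        print(f"{name}: {per_patient:.4f} events/patient, {per_py:.4f} events/patient-year")

    # Per patient, the drug arm looks twice as risky (0.0050 vs. 0.0025);
    # per patient-year, the placebo arm has the higher rate (0.0050 vs. 0.0033).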

Professor Wei chose not to do a “patient-year” analysis because such a methodological commitment would have required him to drop over a dozen Celebrex clinical trials involving thousands of patients, and dozens of heart attack and stroke events of interest.  Madigan’s approach led him to disregard a large amount of data.  Wei could, of course, have stratified the summary estimates for clinical trials of different lengths, and analyzed whether there were differences as a function of trial duration.  Pfizer claimed that Wei conducted a variety of sensitivity analyses, but it is unclear whether he ever used this technique.  Wei should have been allowed in any event to take plaintiffs at their word that thrombotic events from Celebrex occurred shortly after first ingestion.  Pfizer Mem. of Law in Opp. to Plaintiffs’ Motion to Exclude Defendants’ Expert Dr. Lee-Jen Wei at 2 (Sept. 8, 2009), in Secur. Litig.

 

MADIGAN’S META-ANALYSIS

According to Pfizer, Professor Madigan reached different results from Wei’s largely because he had used different event counts and end points.  The defendants’ challenge to Madigan turned largely upon the unreliable way he went about counting events to include in his meta-analyses.

Data concerning unexpected adverse events in clinical trials often are collected as reports from treating physicians, whose descriptions may be incomplete, inaccurate, or inadequate.  When there is a suggestion that a particular adverse event – say heart attack – occurred more frequently in the medication arm as opposed to the placebo or comparator arms, the usual course of action is to have a panel of clinical experts review all the adverse event reports, and supporting medical charts, to provide diagnoses that can be used in a more complete statistical analysis.  Obviously, the reviewers should be blinded to the patients’ assignment to medication or placebo, and the reviewers should be experts in the clinical specialty implicated by the adverse event.  Cardiologists should be making the call for heart attacks.

In addition to event definition and adjudication, clinical trial interpretation sometimes leads to the use of “composite end points,” which consist of related diagnostic categories, aggregated in some way that makes biological sense.  For instance, if the concern is that a medication causes cardiovascular thrombotic events, a suitable cardiovascular composite end point might include heart attack and ischemic stroke.  Inclusion of hemorrhagic stroke, endocarditis, and valvular disease in the composite, however, would be inappropriate, given the concern over thrombosis.

Professor Madigan is a highly qualified statistician, but, as Pfizer argued, he had no clinical expertise to reassign diagnoses or determine appropriate composite end points.  The essence of the defendants’ challenges revolved around claims of flawed outcome and endpoint ascertainment and definitions.  According to Pfizer’s briefing, the event definition process was unblinded, and conducted by inexpert, partisan reviewers.  Madigan apparently relied upon the work of another plaintiffs’ witness, cardiologist Dr. Lawrence Baruch, as well as that of Dr. Curt Furberg.  Furberg was not a cardiologist; indeed, he has never been licensed to practice medicine in the United States, and he had not treated a patient in over 30 years.  Pfizer Mem. of Law in Opp. to Plaintiffs’ Motion to Exclude Defendants’ Expert Dr. Lee-Jen Wei at 29 (Sept. 8, 2009), in Secur. Litig.  Furthermore, Furberg was not familiar with current diagnostic criteria for heart attack.  Plaintiffs’ counsel asked Furberg to rework some but not all of Baruch’s classifications, but only for fatal events.  Baruch could not explain why Furberg made these reclassifications.  Furberg acknowledged that he had never used “one-line descriptions to classify events,” which he did in the Celebrex litigation, when he received the assignment from plaintiffs’ counsel on the eve of the Court’s deadline for disclosures.  Id.  According to Pfizer, if the plaintiffs’ witnesses had used appropriate end points and event counts, their meta-analyses would not have differed from Professor Wei’s work.  Id.

Pfizer pointed to Madigan’s testimony to claim that he had admitted that, based upon the impropriety of Furberg’s changing end point definitions, and his own changes made without the assistance of a clinician, he would not submit the earlier version of his meta-analysis for peer review.  Pfizer’s [Proposed] Findings of Fact and Conclusions of Law with Respect to Motion to Exclude Certain Plaintiffs’ Experts’ Opinions Regarding Celebrex and Bextra, and Plaintiffs’ Motion to Exclude Defendants’ Expert Dr. Lee-Jen Wei, Document 175, submitted in Securities Litig. (Dec. 4, 2009), at 33, 43.  The plaintiffs countered that Furberg’s reclassifications did not change Madigan’s reports, at least for certain years.  Plaintiffs’ Reply Mem. of Law in Further Support of Their Motion to Exclude Expert Testimony by Defendants’ Expert Dr. Lee-Jen Wei at 18 (May 5, 2010), in Securities Litig.

The trial court denied Pfizer’s challenges to Madigan’s meta-analysis in the securities fraud class action.  The court attributed any weakness in the classification of fatal adverse events by Baruch and Furberg to the limitations of the underlying data created and produced by Pfizer itself.  In re Pfizer Inc. Securities Litig., 2010 WL 1047618, *4 (S.D.N.Y. 2010).

 

Composites

Pfizer also argued that Madigan put together composite outcomes that did not make biological sense in view of the plaintiffs’ causal theories.  For instance, Madigan left out strokes in his composite, although he included both heart attack and stroke in his primary end point for his Vioxx litigation analysis, and he had no reason to distinguish Vioxx and Celebrex in terms of claimed thrombotic effects.  Pfizer’s [Proposed] Findings of Fact and Conclusions of Law with Respect to Motion to Exclude Certain Plaintiffs’ Experts’ Opinions Regarding Celebrex and Bextra, and Plaintiffs’ Motion to Exclude Defendants’ Expert Dr. Lee-Jen Wei, Document 175, submitted in Securities Litig. (Dec. 4, 2009), at 13-14, 18.  According to Pfizer, Madigan’s composite was novel and unvalidated by relevant, clinical opinion.  Id. at 29, 33.

The plaintiffs’ response is obscure.  The plaintiffs seemed to claim that Madigan was justified in excluding strokes because some kinds of stroke, hemorrhagic strokes, are unrelated to thrombosis.  Plaintiffs’ Reply Memorandum of Law in Further Support of Their Motion to Exclude Expert Testimony by Defendants’ Expert Dr. Lee-Jen Wei at 14 (May 5, 2010), in Securities Litig.  This argument is undermined by the facts: better than 85% of strokes are ischemic in origin, and even some hemorrhagic strokes start as a result of an ischemic event.

In any event, Pfizer’s argument about Madigan’s composite end points did not gain any traction with the trial judge in the securities fraud class action:

“Dr. Madigan’s written submissions and testimony described clearly and justified cogently his statistical methods, selection of endpoints, decisions regarding event classification, sources of data, as well as the conclusions he drew from his analysis. Indeed, Dr. Madigan’s meta-analysis was based largely on data and endpoints developed by Pfizer. All four of the endpoints that Dr. Madigan used in his analysis-Hard CHD, Myocardial Thromboembolic Events, Cardiovascular Thromboembolic Events, and CV Mortality-have been employed by Pfizer in its own research and analysis. The use of Hard CHD in the relevant literature combined with the use of the other three endpoints by Pfizer in its own 2005 meta-analysis will assist the trier of fact in determining Pfizer’s knowledge and understanding of the pre-December 17, 2004, cardiovascular safety profile of Celebrex.”

In re Pfizer Inc. Securities Litig., 2010 WL 1047618, *4 (S.D.N.Y. 2010).

Meta-Meta-Analysis – Celebrex Litigation – The Claims – Part One

June 21st, 2012

In the Celebrex/Bextra litigation, both sides acknowledged the general acceptance and validity of meta-analysis, for both observational studies and clinical trials, but attacked the other side’s witnesses’ meta-analyses on grounds specific to how they were conducted.  See, e.g., Pfizer Defendants’ Motion to Exclude Certain Plaintiffs’ Experts’ Causation Opinion Regarding Celebrex – Memorandum of Points and Authorities in Support Thereof at 14, 16 (describing meta-analysis as “appropriate” and a “useful way to evaluate the presence and consistency of an effect,” and “a valid technique for analyzing the results of both randomized clinical trials and observational studies”)(dated July 20, 2007), submitted in MDL 1699, In re Bextra and Celebrex Marketing Sales Practices & Prod. Liab. Litig., Case No. 05-CV-01699 CRB (N.D. Calif.) [hereafter MDL 1699]; Plaintiffs’ Memorandum of Law in Support of Their Motion to Exclude Expert Testimony by Defendants’ Expert Dr. Lee-Jen Wei at 2 (July 23, 2009) (“While use of a properly conducted meta-analysis is appropriate, there are underlying scientific principles and techniques to be used in meta-analysis that are widely accepted among biostatisticians and epidemiologists. Wei’s meta-analysis – which he acknowledges is based in part on an admittedly novel approach that is not generally recognized by the scientific community – fails to follow certain of these key principles.”), submitted in In re Pfizer, Inc. Securities Litig., Nos. 04 Civ. 9866(LTS)(JLC), 05 md 1688(LTS) (S.D.N.Y.)[hereafter Securities Litig.]

The plaintiffs and defendants expended a great deal of energy in attacking the other side’s meta-analyses as conducted.  With all the briefing in the federal MDL, the New York state cases, and the securities fraud class action, hundreds of pages were written on the suspected flaws in meta-analyses.  The courts, in both the products liability MDL cases and in the securities case, denied the challenges in a few sentences.  Indeed, it is difficult if not impossible to discern what the challenges were from reading the courts’ decisions.  In re Pfizer Inc. Securities Litig., 2010 WL 1047618 (S.D.N.Y. 2010); In re Bextra and Celebrex, 2008 N.Y. Misc. LEXIS 720, 239 N.Y.L.J. 27 (2008); In re Bextra and Celebrex Marketing Sales Practices and Product Liability Litig., MDL No. 1699, 524 F. Supp. 2d 1166 (N.D. Calif. 2007).

Although the issues shifted some over the course of these litigations, certain important themes recurred.  The plaintiffs focused their attack upon the meta-analyses conducted by defense expert witness, Lee-Jen Wei, a professor of biostatistics at the Harvard School of Public Health.

The plaintiffs maintained that Professor Wei’s meta-analyses should be excluded under Rule 702, or the New York case law, because of

  • inclusion of short-term clinical trials
  • failure to weight risk ratios by person years
  • inclusion of zero-event trials with use of imputation methods
  • use of risk difference instead of risk ratios
  • use of exact confidence intervals instead of estimated intervals

See generally Plaintiffs’ Memorandum of Law in Support of Their Motion to Exclude Expert Testimony by Defendants’ Expert Dr. Lee-Jen Wei (July 23, 2009), in Securities Litig.

The plaintiffs advanced meta-analyses conducted by David Madigan, Professor and Chair of the Department of Statistics at Columbia University.  The essence of the defendants’ challenges revolved around claims of flawed outcome and endpoint ascertainment and definitions:

  • invalid clinical endpoints
  • flawed data collection procedures
  • ad hoc changes in procedure and methods
  • novel methodologies “never used in the history of clinical research”
  • lack of documentation for classifying events
  • absence of expert clinical judgment in classifying event for inclusion in meta-analysis
  • creation of composite endpoints that included events unrelated to plaintiffs’ theory of thrombotic mechanism
  • lack of blinding to medication use when categorizing events
  • failure to adjust for multiple comparisons in meta-analyses

See generally Pfizer Defendants’ Motion to Exclude Certain Plaintiffs’ Experts’ Causation Opinion Regarding Celebrex – Memorandum of Points and Authorities in Support Thereof (dated July 20, 2007), in MDL 1699; Pfizer defendants’ [Proposed] Findings of Fact and Conclusions of Law with Respect to Motion to Exclude Certain Plaintiffs’ Experts’ Opinions Regarding Celebrex and Bextra, and Plaintiffs’ Motion to Exclude Defendants’ Expert Dr. Lee-Jen Wei, Document 175, submitted in Securities Litig. (Dec. 4, 2009).

Why did the three judges involved (Judge Breyer in the federal MDL; Justice Kornreich in the New York state cases; and Judge Swain in the federal securities putative class action) give such cursory attention to these Rule 702/Frye challenges?  The complexity of the issues, the lack of clarity in the lawyers’ briefings, and the stridency of both sides perhaps contributed to shortened judicial attention spans.  Some of the claims were simply untenable, and may have obliterated more telling critiques.

ZERO-EVENT TRIALS

Many of the Celebrex parties’ claims can be traced to a broader issue of what to include or exclude in a meta-analysis.  Consider, for instance, the plaintiffs’ challenge to Wei’s meta-analysis.  The plaintiffs faulted Wei for including short-term clinical trials in his meta-analysis, while sponsoring their own expert witness testimony that Celebrex could induce heart attack or stroke after first ingestion of the medication.  Having made the claim, the plaintiffs were hard pressed to exclude short-term trials, other than to argue that such trials frequently had zero adverse events in either the medication or placebo arms.  Many meta-analytic methods, which treat each included study as a 2 x 2 contingency table and calculate an odds ratio for each table, cannot accommodate zero-event data.

Whether or not hard pressed, the plaintiffs made the claim.  The plaintiffs analogized to the lack of reliability of underpowered clinical trials to provide evidence of safety.  See Plaintiffs’ Reply Memorandum of Law in Further Support of Their Motion to Exclude Expert Testimony by Defendants’ Expert Dr. Lee-Jen Wei at 6 (May 5, 2010), in Securities Litig. (citing In re Neurontin Mktg., Sales Practices, and Prod. Liab. Litig., 612 F. Supp. 2d 116, 141 (D. Mass. 2009) (noting that many of Pfizer’s studies were “underpowered” to detect the alleged connection between Neurontin and suicide)).  The power argument, however, does not make sense in the context of a meta-analysis, which aggregates data across studies to overcome the alleged lack of power in any single study.

Not surprisingly, clinical trials of a non-cardiac medication will often report no events of the outcome of interest, such as heart attack.  Such trials are referred to as “zero-event” trials; the zero can occur in one or both arms of a given trial.  Some researchers exclude these studies from a meta-analysis because of the impossibility of calculating an odds ratio without using imputation in the zero cells of the 2 x 2 tables.  Although there are methods to address zero-event trials, some researchers believe that the existence of several zero-event trials essentially means that the sparse data from rare outcomes deprive statistical tests of their usual meaning.  Traditional statistical standards of significance (p < 0.05) are described as “tenuous,” and too high, in this situation.  A.V. Hernandez, E. Walker, J. P. Ioannidis, M.W. Kattan, “Challenges in meta-analysis of randomized clinical trials for rare harmful cardiovascular events: the case of rosiglitazone,” 156 Am. Heart J. 23, 28 (2008).

The exclusion of zero-event trials from meta-analyses of rare outcomes can yield biased results.  See generally M.J. Bradburn, J.J. Deeks, J.A. Berlin, and A. Russell Localio, “Much ado about nothing: a comparison of the performance of meta-analytical methods with rare events,” 26 Statistics in Med. 53 (2007); M.J. Sweeting, A.J. Sutton, and P.C. Lambert, “What to add to nothing? Use and avoidance of continuity corrections in meta-analysis of sparse data,” 23 Statistics in Med. 1351 (2004) (erratum at 25 Statistics in Med. 2700 (2006)) (“Many routinely used summary methods provide widely ranging estimates when applied to sparse data with high imbalance between the size of the studies’ arms. A sensitivity analysis using several methods and continuity correction factors is advocated for routine practice.”).

Other researchers include zero-event trials as providing helpful information about the absence of risk.  Zero-event trials:

“provide relevant data by showing that event rates for both the intervention and control groups are low and relatively equal. Excluding such trial data potentially increases the risk of inflating the magnitude of the pooled treatment effect.”

J.O. Friedrich, N.K. Adhikari, J. Beyene, “Inclusion of zero total event trials in meta-analyses maintains analytic consistency and incorporates all available data,” 5 BMC Med. Res. Methodol. 2 (2007) [cited as Friedrich].  Zero-event trials can be included in meta-analyses by using a standard “continuity correction,” which involves imputing events, or fractional events, in all cells of the 2 x 2 table.  In one approach, the zero is replaced with 0.5, and all other numbers are increased by 0.5.  Friedrich at 7.
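
A minimal sketch of the continuity correction Friedrich and colleagues describe, applied to a hypothetical zero-event trial: adding 0.5 to every cell of the 2 x 2 table makes an odds ratio computable at all, and for a trial with no events in either arm the corrected odds ratio is exactly 1.0.

    # The standard 0.5 continuity correction: add 0.5 to every cell of a
    # 2 x 2 table that contains a zero, so an odds ratio can be computed.
    # The trial counts are hypothetical.
    def corrected_odds_ratio(events_drug, n_drug, events_placebo, n_placebo):
        a, b = events_drug, n_drug - events_drug
        c, d = events_placebo, n_placebo - events_placebo
        if 0 in (a, b, c, d):                   # zero cell: apply the correction
            a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
        return (a * d) / (b * c)

    # A zero-event trial: 0 events among 25 on drug, 0 among 25 on placebo.
    print(corrected_odds_ratio(0, 25, 0, 25))   # 1.0 -- equal (low) risk in both arms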

After examining the bias in several meta-analyses from excluding zero-event trials, Friedrich and colleagues recommended:

“We believe these trials [with zero events] should also be included if RR [relative risks] or OR [odds ratios] are the effect measures to provide a more conservative estimate of effect size (even if this change in effect size is very small for RR and OR), and to provide analytic consistency and include the same number of trials in the meta-analysis, regardless of the summary effect measure used. Inclusion of zero total event trials would enable the inclusion of all available randomized controlled data in a meta-analysis, thereby providing the most generalizable estimate of treatment effect.”

Friedrich at 5-6.

Wei addressed the problem of zero-event trials by using common imputation methods, not so different from what plaintiffs’ expert witness Dr. Ix used in the gadolinium litigation. See Meta-Meta-Analysis — The Gadolinium MDL — More Than Ix’se Dixit.  Given that plaintiffs advanced a mechanistic theory, which would explain cardiovascular thrombotic events almost immediately upon first ingestion of Celebrex, Professor Wei’s attempt to save the data inherent in zero-event trials by “continuity correction” or imputation methods seems reasonable and well within meta-analytic procedures.

 

RISK DIFFERENCE

Professor Wei did not limit himself to a single method or approach.  In addition to using imputation methods, Wei used the risk difference, rather than risk ratios, as the parameter of interest.  The risk difference is simply the difference between two risks: the risk or probability of an event in one group less the risk or probability of that event in another group.  Contrary to the plaintiffs’ claims, there is nothing novel or subversive about conducting a meta-analysis with the risk difference as the parameter of interest, rather than a risk ratio.  In the context of randomized clinical trials, the risk difference is a standard measure of absolute effect.  See generally Michael Borenstein, L. V. Hedges, J. P. T. Higgins, and H. R. Rothstein, Introduction to Meta-Analysis (2009); Julian P.T. Higgins and Sally Green, eds., Cochrane Handbook for Systematic Reviews of Interventions (2008).

Like risk ratios, the risk difference yields a confidence interval at any desired coefficient of confidence.  Confidence intervals for dichotomous events are often based upon approximate methods that build upon the normal approximation to the binomial distribution.  These approximate methods require assumptions about sample size that may not be met in cases involving sparse data.  With modern computers, calculating exact confidence intervals is not particularly difficult, and Professor Wei has published a methods paper in which he explains the desirability of using the risk difference with exact intervals in addressing meta-analyses of sparse data, such as was involved in the Celebrex litigation.  See L. Tian, T. Cai, M.A. Pfeffer, N. Piankov, P.Y. Cremieux, and L.J. Wei, “Exact and efficient inference procedure for meta-analysis and its application to the analysis of independent 2 x 2 tables with all available data but without artificial continuity correction,” 10 Biostatistics 275 (2009).

Plaintiffs attacked Wei’s approach as “novel” and not generally accepted.  Judge Swain appropriately dismissed this attack:

“Dr. Wei’s methodology, the validity of which Plaintiffs contest and the novelty of which Plaintiffs seek to highlight, appears to have survived the rigors of peer review at least once, and is subject to critique by virtue of its transparency. Dr. Wei’s report, supplemented by his declaration, is sufficient to meet Defendants’ burden of demonstrating that his testimony is the product of reliable principles and methods. He has explained his methods, which can be tested. Plaintiffs’ critiques of Dr. Wei’s choices regarding which trials to include in his own meta-analysis, the origins of the data he used, the date at which he undertook his meta-analysis, and at whose behest he performed his analysis all go to the weight of Dr. Wei’s testimony.”

In re Pfizer Inc. Securities Litig., 2010 WL 1047618, *7 (S.D.N.Y. 2010).  The approach taken by Wei is novel only in the sense that researchers have not previously tried to push the methodological envelope of meta-analysis to deploy the technique for rare outcomes and sparse data, with many zero-event trials.  The risk difference approach is well suited to the situation, and the use of exact confidence intervals is hardly novel or dubious.

Meta-Meta-Analysis — The Gadolinium MDL — More Than Ix’se Dixit

June 8th, 2012

There is a tendency, for better or worse, for legal bloggers to be partisan cheerleaders over litigation outcomes.  I admit that most often I am dismayed by judicial failures or refusals to exclude dubious plaintiffs’ expert witnesses’ opinion testimony, and I have been known to criticize such decisions.  Indeed, I wouldn’t mind seeing courts exclude dubious defendants’ expert witnesses as well.  I have written approvingly about cases in which judges have courageously engaged with difficult scientific issues, seen through the smoke screen, and properly assessed the validity of the opinions expressed.  The Gadolinium MDL (No. 1909) Daubert motions and decision offer a fascinating case study of a challenge to an expert witness’s meta-analysis, an effective defense of the meta-analysis, and a judicial decision to admit the testimony, based upon the meta-analysis.  In re Gadolinium-Based Contrast Agents Prods. Liab. Litig., 2010 WL 1796334 (N.D. Ohio May 4, 2010) [hereafter Gadolinium], reconsideration denied, 2010 WL 5173568 (June 18, 2010).

Plaintiffs proffered general causation opinions (that gadolinium-based contrast media cause Nephrogenic Systemic Fibrosis (“NSF”)) through a nephrologist, Joachim H. Ix, M.D., who has training in epidemiology.  Dr. Ix’s opinions were based in large part upon a meta-analysis he conducted on data in published observational studies.  Judge Dan Aaron Polster, the MDL judge, itemized the defendant’s challenges to Dr. Ix’s proposed testimony:

“The previously-used procedures GEHC takes issue with are:

(1) the failure to consult with experts about which studies to include;

(2) the failure to independently verify which studies to select for the meta-analysis;

(3) using retrospective and non-randomized studies;

(4) relying on studies with wide confidence intervals; and

(5) using a “more likely than not” standard for causation that would not pass scientific scrutiny.”

Gadolinium at *23.  Judge Polster confidently dispatched these challenges.  Dr. Ix, as a nephrologist, had subject-matter expertise with which to develop inclusionary and exclusionary criteria on his own.  The defendant never articulated what, if any, studies were inappropriately included or excluded.  The complaint that Dr. Ix had used retrospective and non-randomized studies also rang hollow in the absence of any showing that there were randomized clinical trials with pertinent data at hand.  Once a serious concern of nephrotoxicity arose, clinical trials were unethical, and the defendant never explained why observational studies were somehow inappropriate for inclusion in a meta-analysis.

Relying upon studies with wide confidence intervals can be problematic, but that is one of the reasons to conduct a meta-analysis, assuming the model assumptions for the meta-analysis can be verified.  The plaintiffs effectively relied upon a published meta-analysis, which pre-dated their expert witness’s litigation effort, in which the authors used less conservative inclusionary criteria, and reported a statistically significant summary estimate of risk, with an even wider confidence interval.  R. Agarwal, et al., ” Gadolinium-based contrast agents and nephrogenic systemic fibrosis: a systematic review and meta-analysis,” 24 Nephrol. Dialysis & Transplantation 856 (2009).  As the plaintiffs noted in their opposition to the challenge to Dr. Ix:

“Furthermore, while GEHC criticizes Dr. Ix’s CI from his meta-analysis as being “wide” at (5.18864 and 25.326) it fails to share with the court that the peer-reviewed Agarwal meta-analysis, reported a wider CI of (10.27–69.44)… .”

Plaintiff’s Opposition to GE Healthcare’s Motion to Exclude the Opinion Testimony of Joachim Ix at 28 (Mar. 12, 2010)[hereafter Opposition].

Wider confidence intervals certainly suggest greater levels of random error, but Dr. Ix’s intervals suggested statistical significance, and he had carefully considered statistical heterogeneity.  Opposition at 19. (Heterogeneity was never advanced by the defense as an attack on Dr. Ix’s meta-analysis.)  Remarkably, the defendant never advanced a sensitivity analysis to suggest or to show that reasonable changes to the evidentiary dataset could result in a loss of statistical significance, as might be expected from such wide intervals.  Rather, the defendant relied upon the fact that Dr. Ix had published other meta-analyses in which the confidence intervals were much narrower, and then claimed that he had “required” these narrower confidence intervals for his professional, published research.  Memorandum of Law of GE Healthcare’s Motion to Exclude Certain Testimony of Plaintiffs’ Generic Expert, Joachim H. Ix, MD, MAS, In re Gadolinium MDL No. 1909, Case: 1:08-gd-50000-DAP Doc #: 668 (Filed Feb. 12, 2010)[hereafter Challenge].  There never was, however, a showing that narrower intervals were required for publication, and the existence of the published Agarwal meta-analysis contradicted the suggestion.
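The sort of sensitivity analysis the defense might have offered is not exotic. A leave-one-out re-analysis simply recomputes the pooled estimate with each study removed in turn, to see whether statistical significance depends upon any single study. A rough sketch, using invented log odds ratios and standard errors rather than the actual study data:

import math

def pooled(log_ors, ses):
    """Fixed-effect, inverse-variance pooled log odds ratio with a 95% confidence interval."""
    weights = [1 / se ** 2 for se in ses]
    est = sum(w * lor for w, lor in zip(weights, log_ors)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return est, (est - 1.96 * se, est + 1.96 * se)

# Hypothetical study-level results (log odds ratios and their standard errors).
log_ors = [2.4, 2.6, 2.1, 2.9, 2.3]
ses = [0.9, 1.0, 0.8, 1.1, 0.9]

for i in range(len(log_ors)):
    est, (lo, hi) = pooled(log_ors[:i] + log_ors[i + 1:], ses[:i] + ses[i + 1:])
    print(f"without study {i + 1}: OR {math.exp(est):.1f}, 95% CI {math.exp(lo):.1f} to {math.exp(hi):.1f}")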

Interestingly, the defense did not call attention to Dr. Ix’s providing an incorrect definition of the confidence interval!  Here is how Dr. Ix described the confidence interval, in language quoted by plaintiffs in their Opposition:

“The horizontal lines display the “95% confidence interval” around this estimate. This 95% confidence interval reflects the range of odds ratios that would be observed 95 times if the study was repeated 100 times, thus the narrower these confidence intervals, the more precise the estimate.”

Opposition at 20.  The confidence interval does not provide a probability distribution for the parameter of interest, nor does it describe the range of estimates that would be observed on repetition of the study; rather, it is the interval-generating procedure that, over repeated studies, covers the hypothesized “true value” of the parameter with the stated frequency.
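The difference between the two readings is easy to demonstrate with a short simulation, using an invented “true” event probability: repeat the hypothetical study many times, compute a 95 percent interval each time, and count how often the intervals cover the true value. The 95 percent figure describes the long-run behavior of the interval-generating procedure, not the probability that any one interval contains the parameter.

import math, random

random.seed(1)
TRUE_RISK = 0.3      # invented "true" event probability
N = 200              # patients per simulated study
REPS = 10_000
covered = 0

for _ in range(REPS):
    events = sum(random.random() < TRUE_RISK for _ in range(N))
    p = events / N
    se = math.sqrt(p * (1 - p) / N)
    if p - 1.96 * se <= TRUE_RISK <= p + 1.96 * se:
        covered += 1

print(f"{covered / REPS:.3f} of the intervals covered the true risk")  # roughly 0.95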

Finally, the defendant never showed any basis for suggesting that a scientific opinion on causation requires something more than a “more likely than not” basis.

Judge Polster also addressed some more serious challenges:

“Defendants contend that Dr. Ix’s testimony should also be excluded because the methodology he utilized for his generic expert report, along with varying from his normal practice, was unreliable. Specifically, Defendants assert that:

(1) Dr. Ix could not identify a source he relied upon to conduct his meta-analysis;

(2) Dr. Ix imputed data into the study;

(3) Dr. Ix failed to consider studies not reporting an association between GBCAs and NSF; and

(4) Dr. Ix ignored confounding factors.”

Gadolinium at *24.

IMPUTATION

The first point, above – the alleged failure to identify a source for conducting the meta-analysis – rings fairly hollow, and Judge Polster easily deflected it.  The second point raised a more interesting challenge.  In the words of defense counsel:

“However, in arriving at this estimate, Dr. Ix imputed, i.e., added, data into four of the five studies.  (See Sept. 22 Ix Dep. Tr. (Ex. 20), at 149:10-151:4.)  Specifically, Dr. Ix added a single case of NSF without antecedent GBCA exposure to the patient data in the underlying studies.

* * *

During his deposition, Dr. Ix could not provide any authority for his decision to impute the additional data into his litigation meta-analysis.  (See Sept. 22 Ix Dep. Tr. (Ex. 20), at 149:10-151:4.)  When pressed for any authority supporting his decision, Dr. Ix quipped that ‘this may be a good question to ask a Ph.D level biostatistician about whether there are methods to [calculate an odds ratio] without imputing a case [of NSF without antecedent GBCA exposure]’.”

Challenge at 12-13.

The deposition reference suggests that the examiner had scored a debating point by catching Dr. Ix unprepared, but by the time the parties briefed the challenge, the plaintiffs had the issue well in hand, citing A. W. F. Edwards, “The Measure of Association in a 2 × 2 Table,” 126 J. Royal Stat. Soc. Series A 109 (1963); R.L. Plackett, “The Continuity Correction in 2 x 2 Tables,” 51 Biometrika 327 (1964).  Opposition at 36 (describing the process of imputation in the event of zero counts in the cells of a 2 x 2 table for odds ratios).  There are qualms to be stated about imputation, but the defense failed to make them.  As a result, the challenge overall lost momentum and credibility.  As the trial court stated the matter:

“Next, there is no dispute that Dr. Ix imputed data into his meta-analysis. However, as Defendants acknowledge, there are valid scientific reasons to impute data into a study. Here, Dr. Ix had a valid basis for imputing data. As explained by Plaintiffs, Dr. Ix’s imputed data is an acceptable technique for avoiding the calculation of an infinite odds ratio that does not accurately measure association.7 Moreover, Dr. Ix chose the most conservative of the widely accepted approaches for imputing data.8 Therefore, Dr. Ix’s decision to impute data does not call into question the reliability of his meta-analysis.”

Gadolinium at *24.
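The “infinite odds ratio” the court mentions is a matter of simple arithmetic. The odds ratio from a 2 x 2 table is the cross-product (a x d) / (b x c), and a zero count of unexposed cases puts a zero in the denominator. The counts below are invented, not Dr. Ix’s data, and the 0.5 value is the familiar textbook correction; an imputed whole case, as Dr. Ix apparently used, works the same way.

def odds_ratio(a, b, c, d, correction=0.0):
    """Cross-product odds ratio for a 2 x 2 table:
    a = exposed cases, b = unexposed cases, c = exposed controls, d = unexposed controls."""
    a, b, c, d = (x + correction for x in (a, b, c, d))
    return (a * d) / (b * c)

try:
    odds_ratio(12, 0, 40, 160)            # zero unexposed cases
except ZeroDivisionError:
    print("uncorrected odds ratio is undefined (infinite)")

print(odds_ratio(12, 0, 40, 160, correction=0.5))   # finite, though still large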

FAILURE TO CONSIDER NULL STUDIES

The defense’s challenge included a claim that Dr. Ix had arbitrarily excluded studies in which there was no reported incidence of NSF. The defense brief unfortunately does not describe the studies excluded, or what, if any, effect their inclusion in the meta-analysis would have had.  This was, after all, the crucial issue. The abstract nature of the defense claim left the matter ripe for misrepresentation by the plaintiffs:

“GEHC continues to misunderstand the role of a meta-analysis and the need for studies that included patients both that did or did not receive GBCAs and reported on the incidence of NSF, despite Dr. Ix’s clear elucidation during his deposition. (Ix Depo. TR [Exh.1] at 97-98).  Meta-analyses such as performed by Dr. Ix and Dr. Agarwal search for whether or not there is a statistically valid association between exposure and disease event. In order to ascertain the relationship between the exposure and event one must have an event to evaluate. In other words, if you have a study in which the exposed group consists of 10,000 people that are exposed to GBCAs and none develop NSF, compared to a non-exposed group of 10,000 who were not exposed to GBCAs and did not develop NSF, the study provides no information about the association between GBCAs and NSF or the relative risk of developing NSF.”

Opposition at 37 – 38 (emphasis in original).  What is fascinating about this particular challenge, and the plaintiffs’ response, is the methodological hypocrisy exhibited.  In essence, the plaintiffs argued that imputation was appropriate in a case-control study in which one cell contained a zero, but they would ignore entirely the data from a cohort study in which no events occurred.  To be sure, case-control studies are more efficient than cohort studies for identifying and assessing risk ratios for rare outcomes.  Nevertheless, the plaintiffs could easily have been hoisted with their own hypothetical petard.  No one among the 10,000 gadolinium-exposed patients developed NSF; and no one in the control group did either.  The hypothetical study suggests that the rate of NSF is low and no different in the exposed and in the unexposed patients.  A risk ratio could be obtained by imputing a small number into the cells containing zero, and a confidence interval could then be calculated.  The risk ratio, of course, would be 1.0.
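The arithmetic of the hypothetical bears this out. With zero events among 10,000 exposed and zero among 10,000 unexposed, imputing the same small value into each zero cell (whether 0.5 or a single hypothetical case makes no difference to the point) yields a risk ratio of exactly 1.0, around which an interval can then be computed:

imputed = 0.5                                  # any equal imputation gives the same ratio
risk_exposed = imputed / (10_000 + 2 * imputed)
risk_unexposed = imputed / (10_000 + 2 * imputed)
print(risk_exposed / risk_unexposed)           # 1.0: the zero-event study is informative, not worthless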

Unfortunately, the defense did not make this argument; nor did it explore where the meta-analysis might have come out had a more even-handed methodology been taken by Dr. Ix.  The gap allowed the trial court to brush the challenge aside:

“The failure to consider studies not reporting an association between GBCAs and NSF also does not render Dr. Ix’s meta-analysis unreliable. The purpose of Dr. Ix’s meta-analysis was to study the strength of the association between an exposure (receiving GBCA) and an outcome (development of NSF). In order to properly do this, Dr. Ix necessarily needed to examine studies where the exposed group developed NSF.”

Gadolinium at *24.  Judge Polster, with no help from the defense brief, missed the irony of Dr. Ix’s willingness to impute data in the case-control 2 x 2 contingency tables, but not in the relative risk tables.

CONFOUNDING

Defendants complained that Dr. Ix had ignored the possibility that confounding factors had contributed to the development of NSF.  Challenge at 13.  Defendants went so far as to charge Dr. Ix with misleading the court by failing to consider other possible causative exposures or conditions.  Id.

Defendants never identified the existence, source, or likely magnitude of any confounding factor.  As a result, the plaintiffs’ argument, based upon the Reference Manual, that confounding was an unlikely explanation for a very large risk ratio was enthusiastically embraced by the trial court, virtually verbatim from the plaintiffs’ Opposition (at 14):

“Finally, the Court rejects Defendants’ argument that Dr. Ix failed to consider confounding factors. Plaintiffs argued and Defendants did not dispute that, applying the Bradford Hill criteria, Dr. Ix calculated a pooled odds ratio of 11.46 for the five studies examined, which is higher than the 10 to 1 odds ratio of smoking and lung cancer that the Reference Manual on Scientific Evidence deemed to be “so high that it is extremely difficult to imagine any bias or confounding factor that may account for it.” Id. at 376.  Thus, from Dr. Ix’s perspective, the odds ratio was so high that a confounding factor was improbable. Additionally, in his deposition, Dr. Ix acknowledged that the cofactors that have been suggested are difficult to confirm and therefore he did not try to specifically quantify them. (Doc # : 772-20, at 27.) This acknowledgement of cofactors is essentially equivalent to the Agarwal article’s representation that “[t]here may have been unmeasured variables in the studies confounding the relationship between GBCAs and NSF,” cited by Defendants as a representative model for properly considering confounding factors. (See Doc # : 772, at 4-5.)”

Gadolinium at *24.

The real problem is that the defendant’s challenge pointed only to possible, unidentified causal agents.  The smoking/lung cancer analogy, provided by the Reference Manual, was inapposite.  Smoking is indeed a large risk factor for lung cancer, with relative risks over 20.  Although there are other human lung carcinogens, none is consistently in the same order of magnitude (not even asbestos), and as a result, confounding can generally be excluded as an explanation for the large risk ratios seen in smoking studies.  It would be easy to imagine that there are confounders for NSF, especially given that the condition has only relatively recently been identified, and that such confounders might be of the same magnitude as, or greater than, the risk suggested for the gadolinium contrast media.  The defense, however, failed to identify confounders that actually threatened the validity of any of the individual studies, or of the meta-analysis.

CONCLUSION

The defense hinted at the general unreliability of meta-analysis, with references to the Reference Manual on Scientific Evidence at 381 (2d ed. 2000) (noting problems with meta-analysis), and to other, relatively dated papers.  See, e.g., John Bailar, “Assessing Assessments,” 277 Science 529 (1997) (arguing that “problems have been so frequent and so deep, and overstatements of the strength of conclusions so extreme, that one might well conclude there is something seriously and fundamentally wrong with [meta-analysis].”).  The Reference Manual language, carried over into the third edition, is out of date, and its retention represents a failing of the new edition.  See “The Treatment of Meta-Analysis in the Third Edition of the Reference Manual on Scientific Evidence” (Nov. 14, 2011).

The plaintiffs came forward with some descriptive statistics of the prevalence of meta-analysis in contemporary biomedical literature.  The defendants offered mostly argument; there is a dearth of citation to defense expert witnesses, affidavits, consensus papers on meta-analysis, textbooks, papers by leading authors, and the like.  The defense challenge suffered from being diffuse and unfocused; it lost persuasiveness by including weak, collateral issues, such as the claims that Dr. Ix was opining “only” on a “more likely than not” basis, that he had not consulted with other experts, and that he had failed to use randomized trial data.  The defense was quick to attack perceived deficiencies, but it did not illustrate how or why the alleged deficiencies threatened the validity of Dr. Ix’s meta-analysis.  Indeed, even when the defense made strong points, such as the exclusion of zero-event cohort studies, it failed to document that such studies existed, and that their inclusion might have made a difference.

 

Haack’s Holism vs. Too Much of Nothing

May 24th, 2012

Professor Haack has been an unflagging critic of Daubert and its progeny.  Haack’s major criticism of the Daubert and Joiner cases is based upon the notion that the Supreme Court engaged in a “divide and conquer” strategy in its evaluation of plaintiffs’ evidence, when it should have considered the “whole gemish” (my phrase, not Haack’s).  See Susan Haack, “Warrant, Causation, and the Atomism of Evidence Law,” 5 Episteme 253, 261 (2008)[hereafter “Warrant“]; “Proving Causation: The Holism of Warrant and the Atomism of Daubert,” 4 J. Health & Biomedical Law 273, 304 (2008)[hereafter “Proving Causation“].

ATOMISM vs. HOLISM

Haack’s concern is that combined pieces of evidence, none individually sufficient to warrant an opinion of causation, may provide the warrant when considered jointly.  Haack reads Daubert to require courts to screen each piece of evidence relied upon by an expert witness for reliability, a process that can interfere with discerning the conclusion most warranted by the totality or “the mosaic” of the evidence:

“The epistemological analysis offered in this paper reveals that a combination of pieces of evidence, none of them sufficient by itself to warrant a causal conclusion to the legally required degree of proof, may do so jointly. The legal analysis offered here, interlocking with this, reveals that Daubert’s requirement that courts screen each item of scientific expert testimony for reliability can actually impede the process of arriving at the conclusion most warranted by the evidence proffered.”

Warrant at 253.

But there is nothing in Daubert, or its progeny, to support this crude characterization of the judicial gatekeeping function.  Indeed, there is another federal rule of evidence, Rule 703, which is directed at screening the reasonableness of reliance upon a single piece of evidence.

Surely there are times when a single study relied upon is one that an expert in the relevant field would not, and should not, rely upon because of the invalidity of its data, the conduct of the study, or the study’s analysis of the data.  Indeed, there may well be times, especially in litigation contexts, when an expert witness has relied upon a collection of studies, none of which is reasonably relied upon by experts in the discipline.

Rule 702, which Daubert was interpreting, was, and is, focused upon an expert witness’s opinion:

A witness who is qualified as an expert by knowledge, skill, experience, training, or education may testify in the form of an opinion or otherwise if:

(a) the expert’s scientific, technical, or other specialized knowledge will help the trier of fact to understand the evidence or to determine a fact in issue;

(b) the testimony is based on sufficient facts or data;

(c) the testimony is the product of reliable principles and methods; and

(d) the expert has reliably applied the principles and methods to the facts of the case.

To be sure, Chief Justice Rehnquist, in explicating why plaintiffs’ expert witnesses’ opinions must be excluded in Joiner, noted the wild, irresponsible, unwarranted inferential leaps made in interpreting specific pieces of evidence.  The plaintiffs’ expert witnesses’ interpretation of a study, involving massive injections of PCBs into the peritoneum of baby mice, with consequent alveologenic adenomas, provided an amusing example of how they, the putative experts, had outrun their scientific headlights by over-interpreting a study in a different species, at different stages of maturation, with different routes of exposure, and with different, non-cancerous outcomes.  These examples were effectively aimed at showing that the overall opinions advanced by Rabbi Teitelbaum and others, on behalf of plaintiffs in Joiner, were unreliable.  Haack, however, sees a philosophical kinship with Justice Stevens, who, in dissent, argued to give plaintiffs’ expert witnesses a “pass,” based upon the whole evidentiary display.  General Electric Co. v. Joiner, 522 U.S. 136, 153 (1997) (Justice Stevens, dissenting) (“It is not intrinsically ‘unscientific’ for experienced professionals to arrive at a conclusion by weighing all available evidence.”). The problem, of course, is that sometimes “all available evidence” includes a good deal of junk, irrelevant, or invalid studies.  Sometimes “all available evidence” is just too much of nothing.

Perhaps Professor Haack was hurt that she was not cited by Justice Blackmun in Daubert, along with Popper and Hempel.  Haack has written widely on philosophy of science, and on epistemology, and she clearly believes her theory of knowledge would provide a better guide to the difficult task of screening expert witness opinions.

When Professor Haack describes the “degree to which evidence warrants a conclusion,” she identifies three factors, which, in part, require assessment of the strength of individual studies:

(i) how strong the connection is between the evidence and the conclusion (supportiveness);

(ii) how solid each of the elements of the evidence is, independent of the conclusion (independent security); and

(iii) how much of the relevant evidence the evidence includes (comprehensiveness).

Warrant at 258.

Of course, supportiveness includes interconnectedness, but nothing in her theory of “warrant” excuses or omits rigorous examination of individual pieces of evidence in assessing a causal claim.

DONE WRONG

Haack seems enamored of the holistic approach taken by Dr. Done, plaintiffs’ expert witness in the Bendectin litigation. Done tried to justify his causal opinions based upon the entire “mosaic” of evidence. See, e.g., Oxendine v. Merrell Dow Pharms. Inc., 506 A.2d 1100, 1108 (D.C. 1986) (“[Dr. Done] conceded his inability to conclude that Bendectin is a teratogen based on any of the individual studies which he discussed, but he also made quite clear that all these studies must be viewed together, and that, so viewed, they supplied his conclusion”).

Haack tilts at windmills by trying to argue the plausibility of Dr. Done’s mosaic in some of the Bendectin cases.  She rightly points out that Done challenged the internal and external validity of the defendant’s studies.  Such challenges to the validity of either side’s studies are a legitimate part of scientific discourse, and certainly a part of legal argumentation, but attacks on the validity of null studies are not affirmative evidence of an association.  Haack correctly notes that “absence of evidence that p is just that — an absence of evidence of evidence; it is not evidence that not-p.”  Proving Causation at 300.  But the same point holds with respect to Done’s challenges to Merrell Dow’s studies.  If those studies are invalid, and Merrell Dow lacks evidence that “not-p,” this lack is not evidence for Done in favor of p.

Given the lack of supporting epidemiologic data in many studies, and the weak and invalid data relied upon, Done’s causal claims were suspect and have come to be discredited.  Professor Ronald Allen notes that invoking the Bendectin litigation in defense of a “mosaic theory” of evidentiary admissibility is a rather peculiar move for epistemology:

“[T]here were many such hints of risk at the time of litigation, but it is now generally accepted that those slight hints were statistical aberrations or the results of poorly conducted studies.76 Bendectin is still prescribed in many places in the world, including Europe, is endorsed by the World Health Organization as safe, and has been vindicated by meta-analyses and the support of a number of epidemiological studies.77 Given the weight of evidence in favor of Bendectin’s safety, it seems peculiar to argue for mosaic evidence from a case in which it would have plainly been misleading.”

Ronald J. Allen & Esfand Nafisi, “Daubert and its Discontents,” 76 Brooklyn L. Rev. 131, 148 (2010).

Screening each item of “expert evidence” for reliability may deprive the judge of “the mosaic,” but that is not all that the judicial gatekeepers were doing in Bendectin or other Rule 702 cases.   It is all well and good to speak metaphorically about mosaics, but the metaphor and its limits were long ago acknowledged in the philosophy of science.  The suggestion that scraps of evidence from different kinds of scientific studies can establish scientific knowledge was rejected by the great mathematician, physicist, and philosopher of science, Henri Poincaré:

“[O]n fait la science avec des faits comme une maison avec des pierres; mais une accumulation de faits n’est pas plus une science qu’un tas de pierres n’est une maison.”

Jules Henri Poincaré, La Science et l’Hypothèse (1905) (chapter 9, Les Hypothèses en Physique)( “Science is built up with facts, as a house is with stones. But a collection of facts is no more a science than a heap of stones is a house.”).  Poincaré’s metaphor is more powerful than Haack’s and Done’s “mosaic” because it acknowledges that interlocking pieces of evidence may cohere as a building, or they may be no more than a pile of rubble.  Poorly constructed walls may soon revert to the pile of stones from which they came.  Much more is required than simply invoking the “mosaic” theory to bless this mess as a “warranted” claim to knowledge.

Haack’s point about aggregation of evidence is, at one level, unexceptionable.  Surely, the individual pieces of evidence, each inconclusive alone, may be powerful when combined.  An easy example is a series of studies, each with a statistically non-significant finding of more disease than expected.  None of the studies alone can rule out chance as an explanation, and the defense might be tempted to argue that it is inappropriate to rely upon any of the studies because none is statistically significant.

The defense argument may be wrong in cases in which a valid meta-analysis can be deployed to combine the results into a summary estimate of association.  If a meta-analysis is appropriate, the studies collectively may allow the exclusion of chance as an explanation for the disparity from expected rates of disease in the observed populations.  [Haack misinterprets study “effect size” to be relevant to ruling out chance as explanation for the increased rate of the outcome of interest. Proving Causation at 297.]
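A compact sketch, with invented numbers, shows the arithmetic behind this point: each hypothetical study’s confidence interval includes 1.0, yet the fixed-effect, inverse-variance pooled estimate excludes it (assuming, as the text cautions, that the model assumptions for pooling hold).

import math

# Invented relative risks with 95% confidence intervals, none statistically significant alone.
studies = [(1.4, 0.9, 2.2), (1.5, 0.8, 2.8), (1.3, 0.9, 1.9), (1.6, 0.9, 2.9)]

weights, weighted_logs = [], []
for rr, lo, hi in studies:
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)   # recover the standard error from the interval
    weights.append(1 / se ** 2)
    weighted_logs.append(math.log(rr) / se ** 2)

pooled_log = sum(weighted_logs) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
lo, hi = math.exp(pooled_log - 1.96 * pooled_se), math.exp(pooled_log + 1.96 * pooled_se)
print(f"pooled RR {math.exp(pooled_log):.2f}, 95% CI {lo:.2f} to {hi:.2f}")   # interval excludes 1.0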

The availability of meta-analysis, in some cases, does not mean that hand waving about the “combined evidence” or “mosaics” automatically supports admissibility of the causal opinion.  The gatekeeper would still have to contend with the criteria of validity for meta-analysis, as well as with bias and confounding in the underlying studies.

NECESSITY OF JUDGMENT

Of course, unlike the meta-analysis example, most instances of evaluating an entire evidentiary display are not quantitative exercises.  Haack is troubled by the contrast between the qualitative, continuous nature of reliability and the binary, “in or out” aspect of ruling on the admissibility of expert witness opinion.  Warrant at 262.  The continuous nature of a reliability spectrum, however, does not preclude the practical need for a decision.  We distinguish young from old people, although we age imperceptibly by units of time that are continuous and capable of being specified with increasingly small magnitudes.  Differences of opinion or close cases are likely, but decisions are made in scientific contexts all the time.

FAGGOT FALLACY

Although Haack criticizes defendants for beguiling courts with the claimed “faggot fallacy,” she occasionally acknowledges that there simply is not sufficient valid evidence to support a conclusion.  Indeed, she makes the case for why, in legal contexts, we will frequently be dealing with “unwarranted” claims:

“Against this background, it isn’t hard to see why the legal system has had difficulties in handling scientific testimony. It often calls on the weaker areas of science and/or on weak or marginal scientists in an area; moreover, its adversarial character may mean that even solid scientific information gets distorted; it may suppress or sequester relevant data; it may demand scientific answers when none are yet well-warranted; it may fumble in applying general scientific findings to specific cases; and it may fail to adapt appropriately as a relevant scientific field progresses.”

Susan Haack, “Of Truth, in Science and in Law,” 73 Brooklyn L. Rev. 985, 1000 (2008).  It is difficult to imagine a more vigorous call for, and defense of, judicial gatekeeping of expert witness opinion testimony.

Haack seems to object to the scope and intensity of federal judicial gatekeeping, but her characterization of the legal context should awaken her to the need to resist admitting opinions on scientific issues when “none are yet well-warranted.” Id. at 1004 (noting that “the legal system quite often want[s] scientific answers when no warranted answers are available”).  The legal system, however, does not “want” unwarranted “scientific” answers; only an interested party on one side or the other wants such a thing.  The legal system wants a procedure for ensuring rejection of unwarranted claims, which may be passed off as properly warranted, due to the lack of sophistication of the intended audience.

TOO MUCH OF NOTHING

Despite her flirtation with Dr. Done’s holistic medicine, Haack acknowledges that sometimes a study or an entire line of studies is simply not valid, and they should not be part of the “gemish.”  For instance, in the context of meta-analysis, which requires pre-specified inclusionary and exclusionary criteria for studies, Haack acknowledges that a “well-designed and well-conducted meta-analysis” will include a determination “which studies are good enough to be included … and which are best disregarded.”  Proving Causation at 286.  Exactly correct.  Sometimes we simply must drill down to the individual study, and what we find may require us to exclude it from the meta-analysis.  The same could be said of any study that is excluded by appropriate exclusionary criteria.

Elsewhere, Haack acknowledges myriad considerations of validity or invalidity, which must be weighed as part of the gemish:

“The effects of S on animals may be different from its effects on humans. The effects of b when combined with a and c may be different from its effects alone, or when combined with x and/or y.52 Even an epidemiological study showing a strong association between exposure to S and elevated risk of D would be insufficient by itself: it might be poorly-designed and/or poorly-executed, for example (moreover, what constitutes a well-designed study – e.g., what controls are needed – itself depends on further information about the kinds of factor that might be relevant). And even an excellent epidemiological study may pick up, not a causal connection between S and D, but an underlying cause both of exposure to S and of D; or possibly reflect the fact that people in the very early stages of D develop a craving for S. Nor is evidence that the incidence of D fell after S was withdrawn sufficient by itself to establish causation – perhaps vigilance in reporting D was relaxed after S was withdrawn, or perhaps exposure to x, y, z was also reduced, and one or all of these cause D, etc.53

Proving Causation at 288.  These are precisely the sorts of reasons that make gatekeeping of expert witness opinions an important part of the judicial process in litigation.

RATS TO YOU

Similarly, Haack acknowledges that animal studies may be quite irrelevant to the issue at hand:

“The elements of E will also interlock more tightly the more physiologically similar the animals used in any animal studies are to human beings. The results of tests on hummingbirds or frogs would barely engage at all with epidemiological evidence of risk to humans, while the results of tests on mice, rats, guinea-pigs, or rabbits would interlock more tightly with such evidence, and the results of tests on primates more tightly yet. Of course, “similar” has to be understood as elliptical for “similar in the relevant respects;” and which respects are relevant may depend on, among other things, the mode of exposure: if humans are exposed to S by inhalation, for example, it matters whether the laboratory animals used have a similar rate of respiration. (Sometimes animal studies may themselves reveal relevant differences; for example, the rats on which Thalidomide was tested were immune to the sedative effect it had on humans; which should have raised suspicions that rats were a poor choice of experimental animal for this drug.)55 Again, the results of animal tests will interlock more tightly with evidence of risk to humans the more similar the dose of S involved. (One weakness of Joiner’s expert testimony was that the animal studies relied on involved injecting massive doses of PCBs into a baby mouse’s peritoneum, whereas Mr. Joiner had been exposed to much smaller doses when the contaminated insulating oil splashed onto his skin and into his eyes.)56 The timing of the exposure may also matter, e.g., when the claim at issue is that a pregnant woman’s being exposed to S causes this or that specific type of damage to the fetus.”

Proving Causation at 290.

WEIGHT OF THE EVIDENCE (WOE)

Just as she criticizes General Electric for advancing the “faggot fallacy” in Joiner, Haack criticizes the plaintiffs’ appeal to “weight of evidence methodology,” as misleadingly suggesting “that there is anything like an algorithm or protocol, some effective, mechanical procedure for calculating the combined worth of evidence.”  Proving Causation at 293.

INFERENCE TO BEST EXPLANATION

Professor Haack cautiously evaluates the glib invocation of “inference to the best explanation” as a substitute for actual warrant of a claim to knowledge.  Haack acknowledges the obvious: the legal system is often confronted with claims lacking sufficient warrant.  She appropriately refuses to permit such claims to be dressed up as scientific conclusions by invoking their plausibility:

“Can we infer from the fact that the causes of D are as yet unknown, and that a plaintiff developed D after being exposed to S, that it was this exposure that caused Ms. X’s or Mr. Y’s D?102  No. Such evidence would certainly give us reason to look into the possibility that S is the, or a, cause of D. But loose talk of ‘inference to the best explanation’ disguises the fact that what presently seems like the most plausible explanation may not really be so – indeed, may not really be an explanation at all. We may not know all the potential causes of D, or even which other candidate-explanations we would be wise to investigate.”

Proving Causation at 305.  See also Warrant at 261 (invoking the epistemic category of Rumsfeld’s “known unknowns” and “unknown unknowns” to describe a recurring situation in law’s treatment of scientific claims) (U.S. Sec’y of Defense Donald Rumsfeld: “[T]here are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – there are things we do not know we don’t know.” (Feb. 12, 2002)).

It is easy to see why the folks at SKAPP are so fond of Professor Haack’s writings, and why they have invited her to their conferences and meetings.  She has written close to a dozen articles critical of Daubert, each repeating the same mistaken criticisms of the gatekeeping process.  She has provided SKAPP and its plaintiffs’ lawyer sponsors with sound bites to throw at impressionable judges about the epistemological weakness of Daubert and its progeny.  In advancing this critique and SKAPP’s propaganda purposes, Professor Haack has misunderstood the gatekeeping enterprise.  She has, however, correctly identified the gatekeeping process as an exercise in determining whether an opinion possesses sufficient epistemic warrant.  Despite her enthusiasm for the dubious claims of Dr. Done, Haack acknowledges that “warrant” requires close attention to the internal and external validity of studies, and to rigorous analysis of a body of evidence.  Haack’s own epistemic analysis would be hugely improved and advanced by focusing on how the mosaic theory, or WOE, failed to hold up in some of the more egregious, pathological claims of health “effects” — Bendectin, silicone, electro-magnetic frequency, asbestos and colorectal cancer, etc.

Meta-Analysis of Observational Studies in Non-Pharmaceutical Litigations

February 26th, 2012

Yesterday, I posted on several pharmaceutical litigations that have involved meta-analytic studies.  Meta-analytic studies have also figured prominently in non-pharmaceutical product liability litigation, as well as in litigation over videogames, criminal recidivism, and eyewitness testimony.  Some, but not all, of the cases in these other areas of litigation are collected below.  In some cases, the reliability or validity of the meta-analyses was challenged; in some cases, the court fleetingly referred to meta-analyses relied upon by the parties.  Some of the courts’ treatments of meta-analysis are woefully inadequate or erroneous.  The failure of the Reference Manual on Scientific Evidence to update its treatment of meta-analysis is telling.  See “The Treatment of Meta-Analysis in the Third Edition of the Reference Manual on Scientific Evidence” (Nov. 14, 2011).

 

Abortion (Breast Cancer)

Christ’s Bride Ministries, Inc. v. Southeastern Pennsylvania Transportation Authority, 937 F.Supp. 425 (E.D. Pa. 1996), rev’d, 148 F.3d 242 (3d Cir. 1997)

Asbestos

In re Joint E. & S. Dist. Asbestos Litig., 827 F. Supp. 1014, 1042 (S.D.N.Y. 1993)(“adding a series of positive but statistically insignificant SMRs [standardized mortality ratios] together does not produce a statistically significant pattern”), rev’d, 52 F.3d 1124 (2d Cir. 1995).

In Re Asbestos Litigation, Texas Multi District Litigation Cause No. 2004-03964 (June 30, 2005)(Davidson, J.)(“The Defendants’ response was presented by Dr. Timothy Lash.  I found him to be highly qualified and equally credible.  He largely relied on the report submitted to the Environmental Protection Agency by Berman and Crump (“B&C”).  He found the meta-analysis contained in B&C credible and scientifically based.  B&C has not been published or formally accepted by the EPA, but it does perform a valuable study of the field.  If the question before me was whether B&C is more credible than the Plaintiffs’ studies taken together, my decision might well be different.”)

Jones v. Owens-Corning Fiberglas, 288 N.J. Super. 258, 672 A.2d 230 (1996)

Berger v. Amchem Prods., 818 N.Y.S.2d 754 (2006)

Grenier v. General Motors Corp., 2009 WL 1034487 (Del.Super. 2009)

Benzene

Knight v. Kirby Inland Marine, Inc., 363 F. Supp. 2d 859 (N.D. Miss. 2005)(precluding proffered opinion that benzene caused bladder cancer and lymphoma; noting without elaboration or explanation, that meta-analyses are “of limited value in combining the results of epidemiologic studies based on observation”), aff’d, 482 F.3d 347 (5th Cir. 2007)

Baker v. Chevron USA, Inc., 680 F.Supp. 2d 865 (S.D. Ohio 2010)

Diesel Exhaust Exposure

King v. Burlington Northern Santa Fe Ry. Co., 277 Neb. 203, 762 N.W.2d 24 (2009)

Kennecott Greens Creek Mining Co. v. Mine Safety & Health Admin., 476 F.3d 946 (D.C. Cir. 2007)

Eyewitness Testimony

State of New Jersey v. Henderson, 208 N.J. 208, 27 A.3d 872 (2011)

Valle v. Scribner, 2010 WL 4671466 (C.D. Calif. 2010)

People v. Banks, 16 Misc.3d 929, 842 N.Y.S.2d 313 (2007)

Lead

Palmer v. Asarco Inc., 510 F.Supp.2d 519 (N.D. Okla. 2007)

PCBs

In re Paoli R.R. Yard PCB Litigation, 916 F.2d 829, 856-57 (3d Cir.1990) (‘‘There is some evidence that half the time you shouldn’t believe meta-analysis, but that does not mean that meta-analyses are necessarily in error. It means that they are, at times, used in circumstances in which they should not be.’’) (internal quotation marks and citations omitted), cert. denied, 499 U.S. 961 (1991)

Repetitive Stress

Allen v. International Business Machines Corp., 1997 U.S. Dist. LEXIS 8016 (D. Del. 1997)

Tobacco

Flue-Cured Tobacco Cooperative Stabilization Corp. v. United States Envt’l Protection Agency, 4 F.Supp.2d 435 (M.D.N.C. 1998), vacated by, 313 F.3d 852 (4th Cir. 2002)

Tocolytics – Medical Malpractice

Hurd v. Yaeger, 2009 WL 2516874 (M.D. Pa. 2009)

Toluene

Black v. Rhone-Poulenc, Inc., 19 F.Supp.2d 592 (S.D.W.Va. 1998)

Video Games (Violent Behavior)

Brown v. Entertainment Merchants Ass’n, ___ U.S.___, 131 S.Ct. 2729 (2011)

Entertainment Software Ass’n v. Blagojevich, 404 F.Supp.2d 1051 (N.D. Ill. 2005)

Entertainment Software Ass’n v. Hatch, 443 F.Supp.2d 1065 (D. Minn. 2006)

Video Software Dealers Ass’n v. Schwarzenegger, 556 F.3d 950 (9th Cir. 2009)

Vinyl Chloride

Taylor v. Airco, 494 F. Supp. 2d 21 (D. Mass. 2007)(permitting opinion testimony that vinyl chloride caused intrahepatic cholangiocarcinoma, without commenting upon the reasonableness of reliance upon the meta-analysis cited)

Welding

Cooley v. Lincoln Electric Co., 693 F.Supp.2d 767 (N.D. Ohio. 2010)