When the Supreme Court delivered its decision in Matrixx Initiatives, Inc. v. Siracusano, 131 S. Ct. 1309 (2011), a colleague, David Venderbush from Alston & Bird LLP, and I wrote a Washington Legal Foundation Legal Backgrounder, in which we predicted that plaintiffs’ counsel would distort the holding and inflate the dicta of the opinion. Schachtman & Venderbush, “Matrixx Unbounded: High Court’s Ruling Needlessly Complicates Scientific Evidence Principles,” 26 (14) Legal Backgrounder (June 17, 2011). Our prediction was, sadly, all too accurate. Not only was the context of Matrixx distorted, but several district courts appeared to adopt the dicta on statistical significance as though it represented the holding of the case.
The Matrixx decision, along with the few district court opinions that had embraced its dicta, was urged as the basis for denying a defense challenge to the proffered testimony of Dr. Anick Bérard, a Canadian perinatal epidemiologist, in the Zoloft MDL. The trial court, however, correctly discerned several methodological shortcomings and failures, including Dr. Bérard’s reliance upon claims of statistical significance from studies that conducted dozens or hundreds of comparisons. See In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., MDL No. 2342; 12-md-2342, 2014 U.S. Dist. LEXIS 87592; 2014 WL 2921648 (E.D. Pa. June 27, 2014) (Rufe, J.).
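The arithmetic behind the multiplicity problem is easy to sketch. What follows is an illustration only, not a reconstruction of any study in the Zoloft record. It assumes that every comparison is an independent test of a true null hypothesis at the 0.05 level; the counts of comparisons (1, 12, 100) and the function names are hypothetical choices made for the example. Under those assumptions, the chance of at least one nominally "significant" result among m comparisons is 1 - (1 - 0.05)^m.

```python
# Illustrative sketch only. Assumes m independent tests, each of a true
# null hypothesis, each conducted at the same significance level alpha.

def familywise_error(m, alpha=0.05):
    """Probability of at least one false-positive result among m tests."""
    return 1.0 - (1.0 - alpha) ** m

def expected_false_positives(m, alpha=0.05):
    """Expected number of nominally significant results when every null is true."""
    return m * alpha

# Hypothetical counts of comparisons, chosen only for illustration.
for m in (1, 12, 100):
    print(
        f"{m:>3} comparisons: "
        f"P(at least one false positive) = {familywise_error(m):.3f}, "
        f"expected false positives = {expected_false_positives(m):.1f}"
    )
```

Comparisons within a real study are rarely fully independent, so these figures are a rough guide at best. The point is only that a nominally significant finding picked from a long list of comparisons carries much less evidential weight than the same finding from a single, pre-specified test.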
Plaintiffs (through their Plaintiffs’ Steering Committee (PSC) in the Zoloft MDL) were undaunted and moved for reconsideration, asserting that the MDL trial court had failed to give appropriate weight to the Supreme Court’s decision in Matrixx, and a Third Circuit decision in DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941 (3d Cir. 1990). The MDL trial judge, however, deftly rebuffed the plaintiffs’ use of Matrixx, and their attempt to banish consideration of random error in the interpretation of epidemiologic studies. In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., MDL No. 2342; 12-md-2342, 2015 WL 314149 (E.D. Pa. Jan. 23, 2015) (Rufe, J.) (denying PSC’s motion for reconsideration).
In rejecting the motion for reconsideration, the Zoloft MDL trial judge noted that the PSC had previously cited Matrixx, and that the Court had addressed the case in its earlier ruling. 2015 WL 314149, at *2-3. The MDL Court then proceeded to expand upon its earlier ruling, and to explain how Matrixx was largely irrelevant to the Rule 702 context of Pfizer’s challenge to Dr. Bérard. There were, to be sure, some studies with nominal statistically significant results, for some birth defects, among children of mothers who took Zoloft in their first trimester of pregnancy. As Judge Rufe explained, statistical significance, or the lack thereof, was only one item in a fairly long list of methodological deficiencies in Dr. Bérard’s causation opinions:
“The [original] opinion set forth a detailed and multi-faceted rationale for finding Dr. Bérard’s testimony unreliable, including her inattention to the principles of replication and statistical significance, her use of certain principles and methods without demonstrating either that they are recognized by her scientific community or that they should otherwise be considered scientifically valid, the unreliability of conclusions drawn without adequate hypothesis testing, the unreliability of opinions supported by a ‛cherry-picked’ sub-set of research selected because it was supportive of her opinions (without adequately addressing non-supportive findings), and Dr. Bérard’s failure to reconcile her currently expressed opinions with her prior opinions and her published, peer-reviewed research. Taking into account all these factors, as well as others discussed in the Opinion, the Court found that Dr. Bérard departed from well-established epidemiological principles and methods, and that her opinion on human causation must be excluded.”
Id. at *1.
In citing the multiple deficiencies of the proffered expert witness, the Zoloft MDL Court thus put its decision well within the scope of the Third Circuit’s recent precedent affirming the exclusion of Dr. Bennet Omalu, in Pritchard v. Dow Agro Sciences, 430 F. App’x 102, 104 (3d Cir. 2011). The Zoloft MDL Court further defended its ruling by pointing out that it had not created a legal standard requiring statistical significance, but rather had made a factual finding that epidemiologists, such as the challenged witness, Dr. Anick Bérard, would use some measure of statistical significance in reaching conclusions in their discipline. 2015 WL 314149, at *2.
On the plaintiffs’ motion for reconsideration, the Zoloft Court revisited the Matrixx case, properly distinguishing the case as a securities fraud case about materiality of non-disclosed information, not about causation. 2015 WL 314149, at *4. Although the MDL Court could and should have identified the Matrixx language as clearly obiter dicta, it did confidently distinguish the Supreme Court holding about pleading materiality from its own task of gatekeeping expert witness testimony on causation in a products liability case:
“Because the facts and procedural posture of the Zoloft MDL are so dissimilar from those presented in Matrixx, this Court reviewed but did not rely upon Matrixx in reaching its decision regarding Dr. Bérard. However, even accepting the PSC’s interpretation of Matrixx, the Court’s Opinion is consistent with that ruling, as the Court reviewed Dr. Bérard’s methodology as a whole, and did not apply a bright-line rule requiring statistically significant findings.”
Id. at *4.
In mounting their challenge to the MDL Court’s earlier ruling, the Zoloft plaintiffs asserted that the Court had failed to credit Dr. Bérard’s reliance upon what Dr. Bérard called the “Rothman approach.” This approach, attributed to Professor Kenneth Rothman, had received some attention in the Bendectin litigation in the Third Circuit, where plaintiffs sought to be excused from their failure to show statistically significant associations when claiming causation between maternal use of Bendectin and infant birth defects. DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941 (3d Cir. 1990). The Zoloft MDL Court pointed out that the Circuit, in DeLuca, had never affirmatively endorsed Professor Rothman’s “approach,” but had reversed and remanded the Bendectin case to the district court for a hearing under Rule 702:
“by directing such an overall evaluation, however, we do not mean to reject at this point Merrell Dow’s contention that a showing of a .05 level of statistical significance should be a threshold requirement for any statistical analysis concluding that Bendectin is a teratogen regardless of the presence of other indicia of reliability. That contention will need to be addressed on remand. The root issue it poses is what risk of what type of error the judicial system is willing to tolerate. This is not an easy issue to resolve and one possible resolution is a conclusion that the system should not tolerate any expert opinion rooted in statistical analysis where the results of the underlying studies are not significant at a .05 level.”
2015 WL 314149, at *4 (quoting from DeLuca, 911 F.2d at 955). After remand, the district court excluded the DeLuca plaintiffs’ expert witnesses, and granted summary judgment, based upon the dubious methods employed by plaintiffs’ expert witnesses in cherry picking data, recalculating risk ratios in published studies, and ignoring bias and confounding in studies. The Third Circuit affirmed the judgment for Merrell Dow. DeLuca v. Merrell Dow Pharms., Inc., 791 F. Supp. 1042 (D.N.J. 1992), aff’d, 6 F.3d 778 (3d Cir. 1993).
In the Zoloft MDL, the plaintiffs not only offered an erroneous interpretation of the Third Circuit’s precedent in DeLuca, they also failed to show that the “Rothman” approach had become generally accepted in the more than two decades since DeLuca. 2015 WL 314149, at *4. Indeed, the hearing record was quite muddled about what the “Rothman” approach involved, other than glib, vague suggestions that the approach would have countenanced Dr. Bérard’s selective, over-reaching analysis of the extant epidemiologic studies. The plaintiffs did not call Rothman as an expert witness; nor did they offer any of Rothman’s publications as exhibits at the Zoloft hearing. Although Professor Rothman has criticized the overemphasis upon p-values and significance testing, he has never suggested that researchers and scientists should ignore random error in interpreting research data. Nevertheless, plaintiffs attempted to invoke some vague notion of a Rothman approach that would ignore confidence intervals, attained significance probability, multiplicity, bias, and confounding. Ultimately, the MDL Court would have none of it. The Court held that the Rothman Approach (whatever that is), as applied by Dr. Bérard, did not satisfy Rule 702.
The testimony at the Rule 702 hearing on the so-called “Rothman approach” had been sketchy at best. Dr. Bérard protested, perhaps too much, when asked about her having ignored p-values:
“I’m not the only one saying that. It’s really the evolution of the thinking of the importance of statistical significance. One of my professors and also a friend of mine at Harvard, Ken Rothman, actually wrote on it – wrote on the topic. And in his book at the end he says obviously what I just said, validity should not be confused with precision, but the third bullet point, it’s saying that the lack of statistical significance does not invalidate results because sometimes you are in the context of rare events, few cases, few exposed cases, small sample size, exactly – you know even if you start with hundreds of thousands of pregnancies because you are looking at rare events and if you want to stratify by exposure category, well your stratum becomes smaller and smaller and your precision decreases. I’m not the only one saying that. Ken Rothman says it as well, so I’m not different from the others. And if you look at many of the studies published nowadays, they also discuss that as well.”
Notes of Testimony of Dr. Anick Bérard, at 76:21-77:14 (April 9, 2014). See also Notes of Testimony of Dr. Anick Bérard, at 211 (April 11, 2014) (discussing non-statistically significant findings as a “trend,” and asserting that the lack of a significant finding does not mean that there is “no effect”). Bérard’s invocation of Rothman here is accurate but unhelpful. Rothman and Bérard are not alone in insisting that confidence intervals provide a measure of precision of an estimate, and that we should be careful not to interpret the lack of significance to mean no effect. But the lack of significance cannot be used to interpret data to show an effect.
At the Rule 702 hearing, the PSC tried to bolster Dr. Bérard’s supposed reliance upon the “Rothman approach” in cross-examining Pfizer’s expert witness, Dr. Stephen Kimmel:
“Q. You know who Dr. Rothman is, the epidemiologist?
Q. You actually took a course from Dr. Rothman, didn’t you?
A. I did when I was a student way back.
Q. He is a well-known epidemiologist, isn’t he?
A. Yes, he is.
Q. He has published this book, Modern Epidemiology. Do you have a copy of this?
A. I do.
Q. Do you – Have you ever read it?
A. I read his earlier edition. I have not read the most recent edition.
Q. There’s two other authors, Sander Greenland and Tim Lash. Do you know either one of them?
A. I know Sander. I don’t know Tim.
Q. Dr. Rothman has some – he has written about confidence intervals and statistical significance for some time, hasn’t he?
A. He has.
Q. Do you agree with him that statistical significance is not a matter of validity. It’s a matter of precision?
A. It’s a matter of – well, confidence intervals are matters of precision. P-values are not.
Q. Okay. I want to put up a table and see if you are in agreement with Dr. Rothman. This is the third edition of Modern Epidemiology. And he has – and ignore my brother’s handwriting. But there is an hypothesized rate ratio under 10-3. It says: p-value function from which one can find all confidence limits for a hypothetical study with a rate ratio estimate of 3.1. Do you see that there?
A. Yes. I don’t see the top of the figure, not that it matters.
Q. I want to make sure. The way I understand this, he is giving us a hypothesis that we have a relative risk of 3.1 and it [presumably a 95% confidence interval] crosses 1, meaning it’s not statistically significant. Is that fair?
A. Well, if you are using a value of .05, yes. And again, if this is a single test and there’s a lot of things that go behind it. But, yes, so this is a total hypothetical.
A. I’m sorry. He’s saying here is a hypothetical based on math. And so here is – this is what we would propose.
Q. Yes, I want to highlight what he says about this figure and get your thoughts on it. He says:
The message of figure 10-3 is that the example data are more compatible with a moderate to strong association than with no association, assuming the statistical model used to construct the function is correct.
Q. Would you agree with that statement?
A. Assuming the statistical model is correct. And the problem is, this is a hypothetical.
Q. Sure. So let’s just assume. So what this means to sort of put some meat on the bone, this means that although we cross 1 and therefore are statistically significant [sic, non-significant], he says the more likely truth here is that there is a moderate to strong effect rather than no effect?
A. Well, you know he has hypothesized this. This is not used in common methods practice in pharmacoepi. Dr. Rothman has lots of ideas but it’s not part of our standard scientific method.”
Notes of Testimony of Dr. Stephen Kimmel, at 126:2 to 128:20.
Nothing very concrete about the “Rothman approach” was put before the MDL Court, either through Dr. Bérard or Dr. Kimmel. There are, however, other instructive aspects to the plaintiffs’ counsel’s examination. First, the referenced portion of the text, Modern Epidemiology, is a discussion of p-value functions, not of p-values or of confidence intervals per se. Modern Epidemiology at 158-59 (3d ed. 2008). Dr. Bérard never discussed p-value functions in her report or in her testimony, and Dr. Kimmel testified, without contradiction, that such p-value functions are “not used in common methods practice.” Second, the plaintiffs’ counsel never marked and offered the Rothman text as an exhibit for the MDL Court to consider. Third, the cross-examiner first asked about the implication for a hypothetical association, and then, when he wanted to “put some meat on the bone,” changed the word used in Rothman’s text, “association,” to “effect.” The word “effect” does not appear in Rothman’s text at the referenced discussion about p-value functions. Fortunately, the MDL Court was not poisoned by the “meat on the bone.”
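For readers who have not met a p-value function, a rough sketch may help show what the cross-examiner’s figure does and does not convey. The sketch assumes a normal approximation on the log scale and a standard error of 0.75 for the log rate ratio. That standard error is my own hypothetical choice, made so that the 95% interval around the estimate of 3.1 includes 1; the numbers behind Rothman’s figure 10-3 are not in the testimony quoted above, and the function names are likewise illustrative.

```python
import math

# Illustrative sketch only. The standard error (0.75 on the log scale) is a
# hypothetical value chosen so that the 95% interval around 3.1 includes 1.
# It is NOT taken from the Rothman text or from the hearing record.

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_value(rr_null, rr_estimate=3.1, se_log_rr=0.75):
    """Two-sided p-value for a hypothesized rate ratio rr_null."""
    z = abs(math.log(rr_estimate) - math.log(rr_null)) / se_log_rr
    return 2.0 * (1.0 - normal_cdf(z))

def confidence_limits(rr_estimate=3.1, se_log_rr=0.75, z_crit=1.96):
    """Approximate 95% confidence limits (z_crit = 1.96) for the rate ratio."""
    center = math.log(rr_estimate)
    half_width = z_crit * se_log_rr
    return math.exp(center - half_width), math.exp(center + half_width)

lower, upper = confidence_limits()
print(f"approximate 95% limits: {lower:.2f} to {upper:.2f}")

# The p-value function is simply p_value evaluated across hypothesized ratios.
for rr in (0.5, 1.0, 2.0, 3.1, 5.0, 10.0):
    print(f"hypothesized rate ratio {rr:>4}: p = {p_value(rr):.3f}")
```

On these assumed numbers, the p-value at a hypothesized rate ratio of 1 exceeds 0.05, and the p-values for ratios nearer the point estimate are higher still. That is all the phrase “more compatible with a moderate to strong association than with no association” conveys. The function describes random error under the stated statistical model; it says nothing about bias, confounding, or multiplicity, and it speaks of an association, not an effect.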
The Pit and the Pendulum
Another document glibly referenced but not provided to the MDL Court was the publication of Sir Austin Bradford Hill’s presidential address to the Royal Society of Medicine on causation. The MDL Court acknowledged that the PSC had argued that the emphasis upon statistical significance was contrary to Hill’s work and teaching. 2015 WL 314149, at *5. In the Court’s words:
“the PSC argues that the Court’s finding regarding the importance of statistical significance in the field of epidemiology is inconsistent with the work of Bradford Hill. The PSC points to a 1965 address by Sir Austin Bradford Hill, which it has not previously presented to the Court, except in opening statements of the Daubert hearings. The PSC failed to put forth evidence establishing that Bradford Hill’s statement that ‛I wonder whether the pendulum has not swung too far [in requiring statistical significance before drawing conclusions]’ has, in the decades since that 1965 address, altered the importance of statistical significance to scientists in the field of epidemiology.”
Id. This failure, identified by the Court, is hardly surprising. The snippet of a quotation from Hill would not sustain the plaintiffs’ sweeping generalization. The quoted language in context may help to explain why Hill’s paper was not provided:
“I wonder whether the pendulum has not swung too far – not only with the attentive pupils but even with the statisticians themselves. To decline to draw conclusions without standard errors can surely be just as silly? Fortunately I believe we have not yet gone so far as our friends in the USA where, I am told, some editors of journals will return an article because tests of significance have not been applied. Yet there are innumerable situations in which they are totally unnecessary – because the difference is grotesquely obvious, because it is negligible, or because, whether it be formally significant or not, it is too small to be of any practical importance. What is worse the glitter of the t table diverts attention from the inadequacies of the fare. Only a tithe, and an unknown tithe, of the factory personnel volunteer for some procedure or interview, 20% of patients treated in some particular way are lost to sight, 30% of a randomly-drawn sample are never contacted. The sample may, indeed, be akin to that of the man who, according to Swift, ‘had a mind to sell his house and carried a piece of brick in his pocket, which he showed as a pattern to encourage purchasers.’ The writer, the editor and the reader are unmoved. The magic formulae are there.”
Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 299 (1965).
In the Zoloft cases, no expert witness was prepared to state that the disparity was “grotesquely obvious,” or “negligible.” And Bradford Hill’s larger point was that bias and confounding often dwarf considerations of random error, and that there are many instances in which significance testing is unavailing or unhelpful. And in some studies, with large “effect sizes,” statistical significance testing may be beside the point.
Hill’s presidential address to the Royal Society of Medicine commemorated his successes in epidemiology, and we need only turn to Hill’s own work to see how prevalent was his use of measurements of significance probability. See, e.g., Richard Doll & Austin Bradford Hill, “Smoking and Carcinoma of the Lung: Preliminary Report,” Brit. Med. J. 740 (Sept. 30, 1950); Medical Research Council, “Streptomycin Treatment of Pulmonary Tuberculosis,” Brit. Med. J. 769 (Oct. 30, 1948).
Considering the misdirection on Rothman and on Hill, the Zoloft MDL Court did an admirable job in unraveling the Matrixx trap set by counsel. The Court insisted upon parsing the Bradford Hill factors, over Pfizer’s objection, despite the plaintiffs’ failure to show “an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance,” which Bradford Hill insisted was the prerequisite for the exploration of the nine factors he set out in his classic paper. Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 295 (1965). Given the outcome, the Court’s questionable indulgence of plaintiffs’ position was ultimately harmless.
 See, e.g., In re Chantix (Varenicline) Prods. Liab. Litig., 2012 U.S. Dist. LEXIS 130144, at *22 (N.D. Ala. 2012); Cheek v. Wyeth Pharm. Inc., 2012 U.S. Dist. LEXIS 123485 (E.D. Pa. Aug. 30, 2012); In re Celexa & Lexapro Prods. Liab. Litig., ___ F. Supp. 2d ___, 2013 WL 791780 (E.D. Mo. 2013).
 The Court’s reasoning on this point raised the question whether an ordinary clinician, ignorant of the standards, requirements, and niceties of statistical reasoning and inference, would be allowed to testify, unconstrained by any principled epidemiologic reasoning about random or systematic error. It is hard to imagine that Rule 702 would countenance such an end-run around the requirements of sound science.
 Adhering to Bradford Hill’s own admonition might have saved the Court the confusion of describing statistical significance as a measure of strength of association. 2015 WL 314149, at *2.