TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

The Education of Judge Rufe – The Zoloft MDL

April 9th, 2016

The Honorable Cynthia M. Rufe is a judge on the United States District Court for the Eastern District of Pennsylvania.  Judge Rufe was elected to a judgeship on the Bucks County Court of Common Pleas in 1994, and was appointed to the federal district court in 2002.  Like most state and federal judges, she had little in her training and experience as a lawyer to prepare her to serve as a gatekeeper of complex scientific opinion testimony from expert witnesses.  And yet the statutory code of evidence, and in particular Federal Rules of Evidence 702 and 703, requires her to do just that.

The normal approach to MDL cases follows the credo of Field of Dreams: “if you build it, they will come.” Last week, Judge Rufe did something unusual in pharmaceutical litigation; she closed the gate and sent everyone home. In re Zoloft Prod. Liab. Litig., MDL No. 2342, 12-MD-2342, 2016 WL 1320799 (E.D. Pa. April 5, 2016).

Her Honor’s decision was hardly made in haste.  The MDL began in 2012, and proceeded in a typical fashion with case management orders that required the exchange of general causation expert witness reports. The plaintiffs’ steering committee (PSC), acting for the plaintiffs, served the report of only one epidemiologist, Anick Bérard, who took the position that Zoloft causes virtually every major human congenital anomaly known to medicine. The defendants challenged the admissibility of Bérard’s opinions.  After extensive briefings and evidentiary hearings, the trial court found that Bérard’s opinions were riddled with inconsistent assessments of studies, eschewed generally accepted methods of causal inference, ignored contrary evidence, adopted novel, unreliable methods of endorsing “trends” in studies, and failed to address epidemiologic studies that did not support her subjective opinions. In re Zoloft Prods. Liab. Litig., 26 F. Supp. 3d 449 (E.D. Pa. 2014). The trial court permitted plaintiffs an opportunity to seek reconsideration of Bérard’s exclusion, which led to the trial court’s reaffirming its previous ruling. In re Zoloft Prods. Liab. Litig., No. 12–md–2342, 2015 WL 314149, at *2 (E.D. Pa. Jan. 23, 2015).

Notwithstanding the PSC’s claims that Bérard was the best qualified expert witness in her field and that she was the only epidemiologist needed to support the plaintiffs’ causal claims, the MDL court indulged the PSC by permitting plaintiffs another bite at the apple.  Over defendants’ objections, the court permitted the PSC to name yet another expert witness, statistician Nicholas Jewell, to do what Bérard had failed to do: proffer an opinion on general causation supported by sound science.  In re Zoloft Prods. Liab. Litig., No. 12–md–2342, 2015 WL 115486, at *2 (E.D. Pa. Jan. 7, 2015).

As a result of this ruling, the MDL dragged on for over a year, during which the PSC served a report by Jewell, and the defendants deposed Jewell and lodged a new Rule 702 challenge.  Although Jewell brought more statistical sophistication to the task, he could not transmute lead into gold; nor could he support the plaintiffs’ causal claims without committing most of the same fallacies found in Bérard’s opinions.  After another round of Rule 702 briefs and hearings, the MDL court excluded Jewell’s unwarranted causal opinions. In re Zoloft Prods. Liab. Litig., No. 12–md–2342, 2015 WL 7776911 (E.D. Pa. Dec. 2, 2015).

The successive exclusions of Bérard and Jewell left the MDL court in a peculiar position. There were other witnesses, Robert Cabrera, a teratologist, Michael Levin, a molecular biologist, and Thomas Sadler, an embryologist, whose opinions addressed animal toxicologic studies, biological plausibility, and putative mechanisms.  These other witnesses, however, had little or no competence in epidemiology, and they explicitly relied upon Bérard’s opinions with respect to human outcomes.  As a result of Bérard’s exclusion, these witnesses were left free to offer their views about what happens in animals at high doses, or about theoretical mechanisms, but they were unable to address human causation.

Although the PSC had no expert witnesses who could legitimately offer reasonably supported opinions about the causation of human birth defects, the plaintiffs refused to decamp and leave the MDL forum. Faced with the prospect of not trying their cases to juries, the PSC instead tried the patience of the MDL judge. The PSC pulled out all the stops in adducing weak, irrelevant, and invalid evidence to support their claims, sans epidemiologic expertise. The PSC argued that adverse event reports, internal company documents that discussed possible associations, the biological plausibility opinions of Levin and Sadler, the putative mechanism opinions of Cabrera, differential diagnoses offered to support specific causation, and the hip-shot opinions of a former-FDA-commissioner-for-hire, David Kessler, could come together magically to supply sufficient evidence to have their cases submitted to juries. Judge Rufe saw through the transparent effort to manufacture evidence of causation, and granted summary judgment on all remaining Zoloft cases in the MDL. In re Zoloft Prod. Liab. Litig., MDL No. 2342, 12-MD-2342, 2016 WL 1320799, at *4 (E.D. Pa. April 5, 2016).

After full briefing and a hearing on Bérard’s opinions, reconsideration of Bérard, a permitted “do over” on general causation with Jewell, and full briefing and a hearing on Jewell’s opinions, the MDL court was able to deal deftly with the snippets of evidence “cobbled together” to substitute for evidence that might support a conclusion of causation. The PSC’s cobbled case was puffed up to give the appearance of voluminous evidence, in 200 exhibits that filled six banker’s boxes.  Id. at *5. The ruse was easily undone; most of the exhibits and purported evidence were obvious rubbish. “The quantity of the evidence is not, however, coterminous with the quality of evidence with regard to the issues now before the Court.” Id. The banker’s boxes contained artifices such as untranslated foreign-language documents, and company documents relating to the development and marketing of the medication. The PSC resubmitted reports from Levin, Cabrera, and Sadler, whose opinions had already been adjudicated to be incompetent, invalid, irrelevant, or inadequate to support general causation.  The PSC pointed to the specific causation opinions of a clinical cardiologist, Ra-Id Abdulla, M.D., who proffered dubious differential etiologies, ruling in Zoloft as a cause of individual children’s birth defects despite his inability to rule out known and unknown causes in the differential reasoning.  The MDL court, however, recognized that “[a] differential diagnosis assumes that general causation has been established,” id. at *7, and that Abdulla could not bootstrap general causation by purporting to reach a specific causation opinion (even if those specific causation opinions were otherwise legitimate).

The PSC submitted the recent consensus statement of the American Statistical Association (ASA)[1], which it misrepresented as an epidemiologic study.  Id. at *5. The consensus statement makes some pedestrian pronouncements about the difference between statistical and clinical significance, about the need for considerations other than statistical significance in supporting causal claims, and about the lack of bright-line distinctions for statistical significance in assessing causality.  All true, but immaterial to the PSC’s expert witnesses’ opinions, which over-endorsed statistical significance in the few instances in which it was shown, and over-interpreted study data based upon data mining and multiple comparisons, in blatant violation of the ASA’s declared principles.

Stretching even further for “human evidence,” the PSC submitted documentary evidence of adverse event reports, as though they could support a causal conclusion.[2]  There are about four million live births each year, with an expected rate of serious cardiac malformations of about one per cent.[3]  The prevalence of SSRI anti-depressant use is at least two per cent, which means that we would expect 800 cardiac birth defects each year among children of mothers who took SSRI anti-depressants in the first trimester. If Zoloft had an average market share of about 25 per cent of all SSRIs, then 200 cardiac defects each year would occur in children born to mothers who took Zoloft.  Given that Zoloft has been on the market since the early 1990s, we would expect thousands of children, exposed to Zoloft during embryogenesis, to have been born with cardiac defects even if there were nothing untoward about maternal exposure to the medication.  Add the stimulated reporting of adverse events from lawyers, lawyer advertising, and lawyer instigation, and you have manufactured evidence not probative of causation at all.[4] The MDL court cut deftly and swiftly through the smoke screen:

“These reports are certainly relevant to the generation of study hypotheses, but are insufficient to create a material question of fact on general causation.”

Id. at *9. The MDL court recognized that epidemiology was very important in discerning a causal connection between a common exposure and a common outcome, especially when the outcome has an expected rate in the general population. The MDL court stopped short of holding that epidemiologic evidence was required (which on the facts of the case would have been amply justified), but instead rested its ratio decidendi on the need to account for the extant epidemiology that contradicted or failed to support the strident and subjective opinions of the plaintiffs’ expert witnesses. The MDL court thus gave plaintiffs every benefit of the doubt by limiting its holding on the need for epidemiology to:

“when epidemiological studies are equivocal or inconsistent with a causation opinion, experts asserting causation opinions must thoroughly analyze the strengths and weaknesses of the epidemiological research and explain why that body of research does not contradict or undermine their opinion.”

Id. at *5, quoting from In re Zoloft Prods. Liab. Litig., 26 F. Supp. 3d 449, 476 (E.D. Pa. 2014).
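The adverse-event arithmetic a few paragraphs above (four million births, a one per cent malformation rate, two per cent SSRI prevalence, a 25 per cent market share) can be checked in a few lines; the inputs are only the approximate figures stated in the text, not precise epidemiologic data:

```python
# Expected background cardiac birth defects among SSRI- and Zoloft-exposed
# births, assuming NO causal effect of the medication.  All inputs are the
# approximate round numbers stated in the text.

live_births_per_year = 4_000_000   # approximate U.S. live births per year
cardiac_defect_rate = 0.01         # serious cardiac malformations, ~1%
ssri_use_prevalence = 0.02         # first-trimester SSRI use, at least 2%
zoloft_market_share = 0.25         # Zoloft's approximate share of SSRIs

cardiac_defects = live_births_per_year * cardiac_defect_rate         # ~40,000
ssri_exposed_defects = cardiac_defects * ssri_use_prevalence         # ~800
zoloft_exposed_defects = ssri_exposed_defects * zoloft_market_share  # ~200

print(f"Expected cardiac defects, SSRI-exposed births/year: {ssri_exposed_defects:,.0f}")
print(f"Expected cardiac defects, Zoloft-exposed births/year: {zoloft_exposed_defects:,.0f}")

# Over roughly two decades on the market, thousands of exposed children with
# cardiac defects would be expected even if the drug were entirely innocent.
years_on_market = 20
print(f"Expected over {years_on_market} years: {zoloft_exposed_defects * years_on_market:,.0f}")
```

The point of the exercise is that a pile of adverse event reports no larger than the expected background count carries no causal signal at all.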

The MDL court also saw through the thin veneer of respectability of the testimony of David Kessler, a former FDA commissioner who helped make large fortunes for some of the members of the PSC by the feeding frenzy he created with his moratorium on silicone gel breast implants.  Even viewing Kessler’s proffered testimony in the most charitable light, the court recognized that he offered little support for a causal conclusion other than to delegate the key issues to epidemiologists. Id. at *9. As for the boxes of regulatory documents, foreign labels, and internal company memoranda, the MDL court found that these documents did not raise a genuine issue of material fact concerning general causation:

“Neither these documents, nor draft product documents or foreign product labels containing language that advises use of birth control by a woman taking Zoloft constitute an admission of causation, as opposed to acknowledging a possible association.”

Id.

In the end, the MDL court found that the PSC’s many banker boxes of paper contained too much of nothing for the issue at hand.  Having put the defendants through the time and expense of litigating and re-litigating these issues, nothing short of dismissing the pending cases was a fair and appropriate outcome to the Zoloft MDL.

_______________________________________

Given the denouement of the Zoloft MDL, it is worth considering the MDL judge’s handling of the scientific issues raised, misrepresented, argued, or relied upon by the parties.  Judge Rufe was required, by Rules 702 and 703, to roll up her sleeves and assess the methodological validity of the challenged expert witnesses’ opinions.  That Her Honor was able to do this is a testament to her hard work. Zoloft was not Judge Rufe’s first MDL, and she clearly learned a lot from her previous judicial assignment to an MDL for Avandia personal injury actions.

On May 21, 2007, the New England Journal of Medicine published online a seriously flawed meta-analysis of cardiovascular disease outcomes and rosiglitazone (Avandia) use.  See Steven E. Nissen, M.D., and Kathy Wolski, M.P.H., “Effect of Rosiglitazone on the Risk of Myocardial Infarction and Death from Cardiovascular Causes,” 356 New Engl. J. Med. 2457 (2007).  The Nissen article did not appear in print until June 14, 2007, but the first lawsuits were filed within a day or two of the online, in-press version. The lawsuits soon reached a critical mass, with the inevitable creation of a federal court Multi-District Litigation.

Within a few weeks of Nissen’s article, the Annals of Internal Medicine published an editorial by Cynthia Mulrow and other editors, which questioned the Nissen meta-analysis[5] and introduced an article that attempted to replicate Nissen’s work[6].  The attempted replication showed that the only way Nissen could have obtained his nominally statistically significant result was to have selected a method, Peto’s fixed-effect method, known to be biased when used with clinical trials that have uneven arms. Random-effects methods, more appropriate for the clinically heterogeneous trials, consistently failed to replicate the Nissen result. Other statisticians weighed in and pointed out that using the risk difference made much more sense when there were multiple trials with zero events in one or the other or both arms. Trials with zero cardiovascular events in both arms represented important evidence of low, but equal, risk of heart attacks, which should be captured in an appropriate analysis.  When the risk-difference approach was used, with exact statistical methods, there was no statistically significant increase in risk in the dataset used by Nissen.[7] Other scientists, including some of Nissen’s own colleagues at the Cleveland Clinic, and John Ioannidis, weighed in to note how fragile and insubstantial the Nissen meta-analysis was[8]:

“As rosiglitazone case demonstrates, minor modifications of the meta-analysis protocol can change the statistical significance of the result.  For small effects, even the direction of the treatment effect estimate may change.”
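The fragility these critics described can be illustrated with a toy calculation. The trial counts below are invented for illustration (not Nissen’s actual 42-trial dataset), and the sample-size-weighted risk difference stands in for the more involved exact methods of Tian et al.; the sketch shows only the structural point: zero-event trials silently vanish from a Peto fixed-effect odds-ratio analysis, but contribute to a risk-difference analysis.

```python
import math

# Each trial: (events_treatment, n_treatment, events_control, n_control).
# Invented numbers for illustration only.
trials = [
    (2, 200, 0, 100),
    (1, 150, 0, 150),
    (0, 300, 0, 100),   # zero events in BOTH arms
    (0, 250, 0, 250),   # zero events in BOTH arms
]

# Peto one-step fixed-effect log odds ratio: sum(O - E) / sum(V).
sum_o_minus_e = sum_v = 0.0
for a, n1, c, n2 in trials:
    total_events, big_n = a + c, n1 + n2
    if total_events == 0:
        continue  # V = 0: the trial contributes NOTHING to the Peto estimate
    expected = n1 * total_events / big_n
    variance = (total_events * (big_n - total_events) * n1 * n2
                / (big_n ** 2 * (big_n - 1)))
    sum_o_minus_e += a - expected
    sum_v += variance
peto_or = math.exp(sum_o_minus_e / sum_v)

# Sample-size-weighted pooled risk difference: the zero-event trials DO
# count, pulling the pooled estimate toward zero (low but equal risk).
pooled_rd = (sum((a / n1 - c / n2) * (n1 + n2) for a, n1, c, n2 in trials)
             / sum(n1 + n2 for _, n1, _, n2 in trials))

print(f"Peto pooled OR (zero-event trials dropped): {peto_or:.2f}")
print(f"Pooled risk difference (all trials counted): {pooled_rd:.5f}")
```

With these toy inputs the Peto odds ratio looks alarming while the pooled risk difference is a fraction of a per cent, which is the methodological choice the Annals critics pressed against Nissen.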

Nissen achieved his political objective with his shaky meta-analysis.  The FDA convened an Advisory Committee meeting, which in turn resulted in a negative review of the safety data, and the FDA’s imposition of warnings and a Risk Evaluation and Mitigation Strategy, which all but prohibited use of rosiglitazone.[9]  A clinical trial, RECORD, had already started, with support from the drug sponsor, GlaxoSmithKline, and fortunately was allowed to continue.

On a parallel track to the regulatory activities, the federal MDL, headed by Judge Rufe, proceeded to motions and a hearing on GSK’s Rule 702 challenge to plaintiffs’ evidence of general causation. The federal MDL trial judge denied GSK’s motions to exclude plaintiffs’ causation witnesses in an opinion that showed significant diffidence in addressing scientific issues.  In re Avandia Marketing, Sales Practices and Product Liability Litigation, 2011 WL 13576, *12 (E.D. Pa. 2011).  See “Learning to Embrace Flawed Evidence – The Avandia MDL’s Daubert Opinion” (Jan. 10, 2011).

After Judge Rufe denied GSK’s challenges to the admissibility of plaintiffs’ expert witnesses’ causation opinions in the Avandia MDL, the RECORD trial was successfully completed and published.[10]  RECORD was a long-term, prospectively designed, randomized cardiovascular trial in over 4,400 patients, followed for an average of 5.5 years.  The trial was designed with a non-inferiority end point of ruling out a 20% increased risk when compared with standard-of-care diabetes treatment. The trial achieved its end point, with a hazard ratio of 0.99 (95% confidence interval, 0.85-1.16) for cardiovascular hospitalization and death. A readjudication of outcomes by the Duke Clinical Research Institute confirmed the published results.
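The non-inferiority logic of that end point reduces to a simple comparison, sketched below with the figures reported for RECORD (a 20% increase corresponds to a hazard-ratio margin of 1.20):

```python
# RECORD's non-inferiority end point, as described in the text: rule out a
# 20% increased risk, i.e., the upper bound of the 95% confidence interval
# for the hazard ratio must fall below the 1.20 margin.
hazard_ratio = 0.99
ci_lower, ci_upper = 0.85, 1.16
margin = 1.20  # non-inferiority margin on the hazard-ratio scale

non_inferior = ci_upper < margin
print(f"HR {hazard_ratio} (95% CI {ci_lower}-{ci_upper}); "
      f"non-inferiority vs. margin {margin}: {'met' if non_inferior else 'not met'}")
```

Because 1.16 sits below 1.20, the trial excluded the 20% excess risk it was designed to rule out.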

On Nov. 25, 2013, after convening another Advisory Committee meeting, the FDA announced the removal of most of its restrictions on Avandia:

“Results from [RECORD] showed no elevated risk of heart attack or death in patients being treated with Avandia when compared to standard-of-care diabetes drugs. These data do not confirm the signal of increased risk of heart attacks that was found in a meta-analysis of clinical trials first reported in 2007.”

FDA Press Release, “FDA requires removal of certain restrictions on the diabetes drug Avandia” (Nov. 25, 2013). And in December 2015, the FDA abandoned its requirement of a Risk Evaluation and Mitigation Strategy for Avandia. FDA, “Rosiglitazone-containing Diabetes Medicines: Drug Safety Communication – FDA Eliminates the Risk Evaluation and Mitigation Strategy (REMS)” (Dec. 16, 2015).

GSK’s vindication came too late to reverse Judge Rufe’s decision in the Avandia MDL.  GSK spent over six billion dollars on resolving Avandia claims.  And to add to the company’s chagrin, GSK lost patent protection for Avandia in April 2012.[11]

Something good, however, may have emerged from the Avandia litigation debacle.  Judge Rufe heard from plaintiffs’ expert witnesses in Avandia about the hierarchy of evidence, about how observational studies must be evaluated for bias and confounding, about the importance of statistical significance, and about how studies that lack power to find relevant associations may still yield conclusions with appropriate meta-analysis. Important nuances of meta-analysis methodology may have gotten lost in the kerfuffle, but given that plaintiffs had reasonable quality clinical trial data, Avandia plaintiffs’ counsel could eschew their typical reliance upon weak and irrelevant lines of evidence, based upon case reports, adverse event disproportional reporting, and the like.

The Zoloft litigation introduced Judge Rufe to a more typical pharmaceutical litigation. Because the outcomes of interest were birth defects, there were no clinical trials.  To be sure, there were observational epidemiologic studies, but now the defense expert witnesses were carefully evaluating the studies for bias and confounding, and the plaintiffs’ expert witnesses were double counting studies and ignoring multiple comparisons and validity concerns.  Once again, in the Zoloft MDL, plaintiffs’ expert witnesses made their non-specific complaints about “lack of power” (without ever specifying the relevant alternative hypothesis), but it was the defense expert witnesses who cited relevant meta-analyses that attempted to do something about the supposed lack of power. Plaintiffs’ expert witnesses inconsistently argued “lack of power” to disregard studies that had outcomes that undermined their opinions, even when those studies had narrow confidence intervals surrounding values at or near 1.0.

The Avandia litigation laid the foundation for Judge Rufe’s critical scrutiny by exemplifying the nature and quantum of evidence to support a reasonable scientific conclusion.  Notwithstanding the mistakes made in the Avandia litigation, this earlier MDL created an invidious distinction with the Zoloft PSC’s evidence and arguments, which looked as weak and insubstantial as they really were.


[1] Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” The American Statistician, available online (Mar. 7, 2016), in-press at DOI:10.1080/00031305.2016.1154108, <http://dx.doi.org/10.1080/>. See “The American Statistical Association’s Statement on and of Significance” (Mar. 17, 2016); “The ASA’s Statement on Statistical Significance – Buzzing from the Huckabees” (Mar. 19, 2016).

[2] See 21 C.F.R. § 314.80 (a) Postmarketing reporting of adverse drug experiences (defining “[a]dverse drug experience” as “[a]ny adverse event associated with the use of a drug in humans, whether or not considered drug related”).

[3] See Centers for Disease Control and Prevention, “Birth Defects Home Page” (last visited April 8, 2016).

[4] See, e.g., Derrick J. Stobaugh, Parakkal Deepak, & Eli D. Ehrenpreis, “Alleged isotretinoin-associated inflammatory bowel disease: Disproportionate reporting by attorneys to the Food and Drug Administration Adverse Event Reporting System,” 69 J. Am. Acad. Dermatol. 393 (2013) (documenting stimulated reporting from litigation activities).

[5] Cynthia D. Mulrow, John Cornell & A. Russell Localio, “Rosiglitazone: A Thunderstorm from Scarce and Fragile Data,” 147 Ann. Intern. Med. 585 (2007).

[6] George A. Diamond, Leon Bax & Sanjay Kaul, “Uncertain Effects of Rosiglitazone on the Risk for Myocardial Infarction and Cardiovascular Death,” 147 Ann. Intern. Med. 578 (2007).

[7] Tian, et al., “Exact and efficient inference procedure for meta-analysis and its application to the analysis of independent 2 × 2 tables with all available data but without artificial continuity correction,” 10 Biostatistics 275 (2008).

[8] Adrian V. Hernandez, Esteban Walker, John P.A. Ioannidis,  and Michael W. Kattan, “Challenges in meta-analysis of randomized clinical trials for rare harmful cardiovascular events: the case of rosiglitazone,” 156 Am. Heart J. 23, 28 (2008).

[9] Janet Woodcock, FDA Decision Memorandum (Sept. 22, 2010).

[10] Philip D. Home, et al., “Rosiglitazone evaluated for cardiovascular outcomes in oral agent combination therapy for type 2 diabetes (RECORD): a multicentre, randomised, open-label trial,” 373 Lancet 2125 (2009).

[11] “Pharmacovigilantism – Avandia Litigation” (Nov. 27, 2013).

Expert Witness – Ghost Busters

March 29th, 2016

Andrew Funkhouser was tried and convicted for selling cocaine.  On appeal, the Missouri Court of Appeals affirmed his conviction and his 30-year prison sentence. State v. Funkhouser, 729 S.W.2d 43 (Mo. App. 1987). On a petition for post-conviction relief, Funkhouser asserted that he had been deprived of his Sixth Amendment right to effective counsel. Funkhouser v. State, 779 S.W.2d 30 (Mo. App. 1989).

One of the alleged grounds of ineffectiveness was his lawyer’s failure to object to the prosecutor’s cross-examination of a defense expert witness, clinical psychologist Frederick Nolen, on Nolen’s belief in ghosts. Id. at 32. On direct examination, Nolen testified that he had published or presented on multiple personalities, hypnosis, and ghosts.

On cross-examination, the prosecution inquired of Nolen about his theory of ghosts:

“Q. Doctor, I believe that you’ve done some work in the theory of ghosts, is that right?

A. Yes.

Q. I believe you told me that some of that work you’d based on your own experiences, is that correct?

A. Yes.

Q. You also told me you have lived in a haunted house for 13 years, is that right?

A. Yes.

Q. You have seen the ghost, is that correct?

A. Yes.”

Id. at 32-33. Funkhouser asserted that the cross-examination was improper because his expert witness was examined on his religious beliefs, and his counsel was ineffective for failing to object. Id. at 33.  The Missouri Court of Appeals disagreed. Counsel are permitted to cross-examine an adversary’s expert witness

“in any reasonable respect that will test his qualifications, credibility, skill or knowledge and the value and accuracy of his opinions.”

The court held that any failure to object could not be incompetence because the examination was proper. Id.

So there you have it: wacky belief systems are fair game for cross-examination of expert witnesses, at least in the “Show-Me” state.

And this broad scope of cross-examination is probably a good thing, because almost anything seems to go in Missouri. The Show-Me state has been bringing up the rear in the law of expert witness admissibility. The Missouri Revised Statutes contain a version of Federal Rule of Evidence 702 that tracks the federal rule’s language before the federal statutory revision in 2000:

Expert witness, opinion testimony admissible–hypothetical question not required, when.

490.065. 1. In any civil action, if scientific, technical or other specialized knowledge will assist the trier of fact to understand the evidence or to determine a fact in issue, a witness qualified as an expert by knowledge, skill, experience, training, or education may testify thereto in the form of an opinion or otherwise.

In January 2016, the Missouri state senate passed a bill that would bring the Missouri standard in line with the current federal rule of evidence. Most of the Republican senators voted for the bill; none of the Democrats voted in favor of the reform. Chris Semones, “Missouri: One Step Closer to Daubert,” in Expert Witness Network (Jan. 26, 2016).

The ASA’s Statement on Statistical Significance – Buzzing from the Huckabees

March 19th, 2016

People say crazy things. In a radio interview, Evangelical Michael Huckabee argued that the Kentucky civil clerk who refused to issue a marriage license to a same-sex couple was as justified in defying an unjust court decision as people are justified in disregarding Dred Scott v. Sandford, 60 U.S. 393 (1857), which Huckabee described as still the “law of the land.”1 Chief Justice Roger B. Taney would be proud of Huckabee’s use of faux history, precedent, and legal process to argue his cause. Definition of “huckabee”: a bogus factoid.

Consider the case of Sander Greenland, who attempted to settle a score with an adversary’s expert witness, who had opined in 2002 that Bayesian analyses were rarely used at the FDA for reviewing new drug applications. The adversary’s expert witness obviously got Greenland’s knickers in a knot, because Greenland wrote an article in a law review, of all places, in which he presented his attempt to “correct the record” and show how the statement of the opposing expert witness was “ludicrous.”2 To support his indictment on charges of ludicrousness, Greenland ignored the FDA’s actual behavior in reviewing new drug applications,3 and looked at the practice of the Journal of Clinical Oncology, a clinical journal that publishes 24 issues a year, with occasional supplements. Greenland found the word “Bayesian” 50 times in over 40,000 journal pages, and declared victory. According to Greenland, “several” (unquantified) articles had used Bayesian methods to explore, post hoc, statistically nonsignificant results.4

Given Greenland’s own evidence, the posterior odds that Greenland was correct in his charges seem to be disturbingly low, but he might have looked at the published papers that conducted more serious, careful surveys of the issue.5 This week, the Journal of the American Medical Association published yet another study by John Ioannidis and colleagues, which documented actual practice in the biomedical literature. And no surprise, Bayesian methods barely register in a systematic survey of the last 25 years of published studies. See David Chavalarias, Joshua David Wallach, Alvin Ho Ting Li, John P. A. Ioannidis, “Evolution of reporting P values in the biomedical literature, 1990-2015,” 315 J. Am. Med. Ass’n 1141 (2016). See also Demetrios N. Kyriacou, “The Enduring Evolution of the P Value,” 315 J. Am. Med. Ass’n 1113 (2016) (“Bayesian methods are not frequently used in most biomedical research analyses.”).

So what are we to make of Greenland’s animadversions in a law review article? It was a huckabee moment.

Recently, the American Statistical Association (ASA) issued a statement on the use of statistical significance and p-values. In general, the statement was quite moderate, and declined to move in the radical directions urged by some statisticians who attended the ASA’s meeting on the subject. Despite the ASA’s moderation, the ASA’s statement has been met with huckabee-like nonsense and hyperbole. One author, a pharmacologist trained at the University of Washington, with post-doctoral training at the University of California, Berkeley, and an editor of PLoS Biology, was moved to write:

“However, the ASA notes, the importance of the p-value has been greatly overstated and the scientific community has become over-reliant on this one – flawed – measure.”

Lauren Richardson, “Is the p-value pointless?” (Mar. 16, 2016). And yet, nowhere in the ASA’s statement does the group suggest that the p-value is a “flawed” measure. Richardson suffered a lapse and wrote a huckabee.

Not surprisingly, lawyers attempting to spin the ASA’s statement have unleashed entire hives of huckabees in an attempt to deflate the methodological points made by the ASA. Here is one example of a litigation-industry lawyer who argues that the American Statistical Association Statement shows the irrelevance of statistical significance for judicial gatekeeping of expert witnesses:

“To put it into the language of Daubert, debates over ‘p-values’ might be useful when talking about the weight of an expert’s conclusions, but they say nothing about an expert’s methodology.”

Max Kennerly, “Statistical Significance Has No Place In A Daubert Analysis” (Mar. 13, 2016) [cited as Kennerly].

But wait; the expert witness must be able to rule out chance, bias and confounding when evaluating a putative association for causality. As Austin Bradford Hill explained, even before assessing a putative association for causality, scientists need first to have observations that

“reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance.”

Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 295 (1965) (emphasis added).

The analysis of random error is an essential step in the methodological process. Simply because a proper methodology requires consideration of non-statistical factors does not remove the statistical analysis from the methodology. Ruling out chance as a likely explanation is a crucial first step in the methodology for reaching a causal conclusion when there is an “expected value” or base rate for the outcome of interest in the population being sampled.

Kennerly shakes his hive of huckabees:

“The erroneous belief in an ‘importance of statistical significance’ is exactly what the American Statistical Association was trying to get rid of when they said, ‘The widespread use of “statistical significance” (generally interpreted as p ≤ 0.05)’ as a license for making a claim of a scientific finding (or implied truth) leads to considerable distortion of the scientific process.”

And yet, the ASA never urged that scientists “get rid of” statistical analyses and assessments of attained levels of significance probability. To be sure, they cautioned against overinterpreting p-values, especially in the context of multiple comparisons, non-prespecified outcomes, and the like. The ASA criticized bright-line rules, which are often used by litigation-industry expert witnesses to over-endorse the results of studies with p-values less than 5%, often in the face of multiple comparisons, cherry-picked outcomes, and poorly and incompletely described methods and results. What the ASA described as a “considerable distortion of the scientific process” was claiming scientific truth on the basis of “p < 0.05.” As Bradford Hill pointed out in 1965, a clear-cut association, beyond that which we would care to attribute to chance, is the beginning of the analysis of an association for causality, not the end of it. Kennerly ignores who is claiming “truth” in the litigation context.  Defense expert witnesses frequently are opining no more than “not proven.” The litigation industry expert witnesses must opine that there is causation, or else they are out of a job.

The ASA explained that the distortion of the scientific process comes from making a claim of a scientific conclusion of causality or its absence, when the appropriate claim is “we don’t know.” The ASA did not say, suggest, or imply that a claim of causality can be made in the absence of a finding of statistical significance, as well as validation of the statistical model on which it is based, and other factors. The ASA certainly did not say that the scientific process would be served well by reaching conclusions of causation without statistical significance. What is clear is that statistical significance should not be an abridgment of a much more expansive process. Reviewing the annals of the International Agency for Research on Cancer (even in its currently politicized state), or the Institute of Medicine, an honest observer would be hard pressed to come up with examples of associations for outcomes that have known base rates, which associations were determined to be causal in the absence of studies that exhibited statistical significance, along with many other indicia of causality.

Some other choice huckabees from Kennerly:

“It’s time for courts to start seeing the phrase ‘statistically significant’ in a brief the same way they see words like ‘very,’ ‘clearly,’ and ‘plainly’. It’s an opinion that suggests the speaker has strong feelings about a subject. It’s not a scientific principle.”

Of course, this ignores the central limit theorems, the importance of random sampling, the pre-specification of hypotheses and level of Type I error, and the like. Stuff and nonsense.

And then in a similar vein, from Kennerly:

“The problem is that many courts have been led astray by defendants who claim that ‘statistical significance’ is a threshold that scientific evidence must pass before it can be admitted into court.”

In my experience, it is litigation-industry lawyers who oversell statistical significance, not defense counsel, who may merely question reliance upon studies that lack it. Kennerly’s statement is not even wrong, however, because defense counsel knowledgeable of the rules of evidence would know that statistical studies themselves are rarely admitted into evidence. What is admitted, or not, is the opinion of expert witnesses, who offer opinions about whether associations are causal, or not causal, or inconclusive.


1 Ben Mathis-Lilley, “Huckabee Claims Black People Aren’t Technically Citizens During Critique of Unjust Laws,” The Slatest (Sept. 11 2015) (“[T]he Dred Scott decision of 1857 still remains to this day the law of the land, which says that black people aren’t fully human… .”).

2 Sander Greenland, “The Need for Critical Appraisal of Expert Witnesses in Epidemiology and Statistics,” 39 Wake Forest Law Rev. 291, 306 (2004). See “The Infrequency of Bayesian Analyses in Non-Forensic Court Decisions” (Feb. 16, 2014).

3 To be sure, eight years after Greenland published this diatribe, the agency promulgated a guidance that set recommended practices for Bayesian analyses in medical device trials. FDA Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials (February 5, 2010); 75 Fed. Reg. 6209 (February 8, 2010); see also Laura A. Thompson, “Bayesian Methods for Making Inferences about Rare Diseases in Pediatric Populations” (2010); Greg Campbell, “Bayesian Statistics at the FDA: The Trailblazing Experience with Medical Devices” (Presentation given by Director, Division of Biostatistics Center for Devices and Radiological Health at Rutgers Biostatistics Day, April 3, 2009). Even today, Bayesian analysis remains uncommon at the U.S. FDA.

4 39 Wake Forest Law Rev. at 306-07 & n.61 (citing only one paper, Lisa Licitra et al., Primary Chemotherapy in Resectable Oral Cavity Squamous Cell Cancer: A Randomized Controlled Trial, 21 J. Clin. Oncol. 327 (2003)).

5 See, e.g., J. Martin Bland & Douglas G. Altman, “Bayesians and frequentists,” 317 Brit. Med. J. 1151, 1151 (1998) (“almost all the statistical analyses which appear in the British Medical Journal are frequentist”); David S. Moore, “Bayes for Beginners? Some Reasons to Hesitate,” 51 The Am. Statistician 254, 254 (“Bayesian methods are relatively rarely used in practice”); J.D. Emerson & Graham Colditz, “Use of statistical analysis in the New England Journal of Medicine,” in John Bailar & Frederick Mosteller, eds., Medical Uses of Statistics 45 (1992) (surveying 115 original research studies for statistical methods used; no instances of Bayesian approaches counted); Douglas Altman, “Statistics in Medical Journals: Developments in the 1980s,” 10 Statistics in Medicine 1897 (1991); B.S. Everitt, “Statistics in Psychiatry,” 2 Statistical Science 107 (1987) (finding only one use of Bayesian methods in 441 papers with statistical methodology).

The American Statistical Association’s Statement on and of Significance

March 17th, 2016

In scientific circles, some commentators have so zealously criticized the use of p-values that they have left uninformed observers with the impression that random error is not an interesting or important consideration in evaluating the results of a scientific study. In legal circles, counsel for the litigation industry and their expert witnesses have argued duplicitously that statistical significance is at once unimportant, except when statistical significance is observed, in which case it conclusively establishes causation. The recently published Statement of the American Statistical Association (“ASA”) restores some sanity to the scientific and legal discussions of statistical significance and p-values. Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” The American Statistician, available online (Mar. 7, 2016), in-press at DOI:10.1080/00031305.2016.1154108.

Recognizing that sound statistical practice and communication affect research and public policy decisions, the ASA has published a statement of interpretative principles for statistical significance and p-values. The ASA’s statement first, and foremost, points out that the soundness of scientific conclusions turns on more than statistical methods alone. Study design, conduct, and evaluation often involve more than a statistical test result. And the ASA goes on to note, contrary to the contrarians, that “the p-value can be a useful statistical measure,” although this measure of attained significance probability “is commonly misused and misinterpreted.” ASA at 7. No news there.

The ASA’s statement puts forth six principles, all of which have substantial implications for how statistical evidence is received and interpreted in courtrooms. All are worthy of consideration by legal actors – legislatures, regulators, courts, lawyers, and juries.

1. “P-values can indicate how incompatible the data are with a specified statistical model.”

The ASA notes that a p-value shows the “incompatibility between a particular set of data and a proposed model for the data.” Although there are some in the statistical world who rail against null hypotheses of no association, the ASA reports that “[t]he most common context” for p-values consists of a statistical model that includes a set of assumptions, including a “null hypothesis,” which often postulates the absence of association between exposure and outcome under study. The ASA statement explains:

“The smaller the p-value, the greater the statistical incompatibility of the data with the null hypothesis, if the underlying assumptions used to calculate the p-value hold. This incompatibility can be interpreted as casting doubt on or providing evidence against the null hypothesis or the underlying assumptions.”

Some lawyers want to overemphasize statistical significance when present, but to minimize its importance when it is absent. They will find no support in the ASA’s statement.
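The first principle can be illustrated with a minimal sketch, using only hypothetical data and the standard library: an exact two-sided binomial test against a null model of a fair coin, showing that the farther the data depart from the null model, the smaller the p-value.

```python
import math

def exact_binomial_p(k, n, p0=0.5):
    """Two-sided exact p-value: the probability, under the null model
    (success probability p0), of any outcome no more probable than the
    observed count k. Smaller values signal greater incompatibility
    of the data with the null model."""
    def pmf(i):
        return math.comb(n, i) * p0**i * (1 - p0)**(n - i)
    p_obs = pmf(k)
    return sum(pmf(i) for i in range(n + 1) if pmf(i) <= p_obs + 1e-12)

# More extreme hypothetical data yield a smaller p-value:
p_60 = exact_binomial_p(60, 100)  # 60 successes in 100 trials
p_75 = exact_binomial_p(75, 100)  # 75 successes in 100 trials
assert p_75 < p_60 < 1.0
```

Nothing in this calculation “proves” the alternative hypothesis; consistent with the ASA’s second principle, the p-value is a statement about the data under the assumed model, not the probability that any hypothesis is true.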

2. “P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.”

Of course, there are those who would misinterpret the meaning of p-values, but the flaw lies in the interpreters, not in the statistical concept.

3. “Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.”

Note that the ASA did not say that statistical significance is irrelevant to scientific conclusions. Of course, statistical significance is but one factor, which does not begin to account for study validity, data integrity, or model accuracy. The ASA similarly criticizes the use of statistical significance as a “bright line” mode of inference, without consideration of the contextual considerations of “the design of a study, the quality of the measurements, the external evidence for the phenomenon under study, and the validity of assumptions that underlie the data analysis.” Criticizing the use of “statistical significance” as singularly assuring the correctness of scientific judgment does not, however, mean that “statistical significance” is irrelevant or unimportant as a consideration in a much more complex decision process.

4. “Proper inference requires full reporting and transparency.”

The ASA explains that the proper inference from a p-value can be completely undermined by “multiple analyses” of study data, with selective reporting of sample statistics that have attractively low p-values, or cherry picking of suggestive study findings. The ASA points out that common practices of selective reporting compromise valid interpretation. Hence the correlative recommendation:

“Researchers should disclose the number of hypotheses explored during the study, all data collection decisions, all statistical analyses conducted and all p-values computed. Valid scientific conclusions based on p-values and related statistics cannot be drawn without at least knowing how many and which analyses were conducted, and how those analyses (including p-values) were selected for reporting.”

ASA Statement. See also “Courts Can and Must Acknowledge Multiple Comparisons in Statistical Analyses” (Oct. 14, 2014).
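The arithmetic behind this warning is simple. As a hypothetical sketch: if k independent true-null hypotheses are each tested at a significance level of 0.05, the chance of at least one nominally “significant” result grows rapidly with k.

```python
# Family-wise probability of at least one false-positive "p < 0.05"
# when k independent null hypotheses are each tested at alpha = 0.05.
alpha = 0.05
for k in (1, 5, 20, 100):
    p_at_least_one = 1 - (1 - alpha) ** k
    print(f"{k:3d} tests -> P(at least one p < 0.05) = {p_at_least_one:.3f}")
```

With twenty comparisons, the chance of at least one spurious “significant” finding is roughly 64 percent, which is why undisclosed multiple testing can render a reported p-value uninterpretable.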

5. “A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.”

The ASA notes the commonplace distinction between statistical and practical significance. The independence of statistical from practical significance does not, however, make statistical significance irrelevant, especially in legal and regulatory contexts, in which parties claim that a risk, however small, is relevant. Of course, we want the claimed magnitude of association to be relevant, but we also need the measured association to be accurate and precise.

6. “By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.”

Of course, a p-value cannot validate the model, which is assumed to generate the p-value. Contrary to the hyperbolic claims one sees in litigation, the ASA notes that “a p-value near 0.05 taken by itself offers only weak evidence against the null hypothesis.” And so the ASA counsels that “data analysis should not end with the calculation of a p-value when other approaches are appropriate and feasible.” 

What is important, however, is that the ASA never suggests that significance testing or measurement of significance probability is not an important and relevant part of the process. To be sure, the ASA notes that because of “the prevalent misuses of and misconceptions concerning p-values, some statisticians prefer to supplement or even replace p-values with other approaches.”

The first of these other methods, unsurprisingly, is estimation with assessment of confidence intervals, although the ASA also includes Bayesian and other methods as well. There are some who express irrational exuberance about the potential of Bayesian methods to restore confidence in scientific process and conclusions. Bayesian approaches are less manipulated than frequentist ones, largely because very few people use Bayesian methods, and even fewer people really understand them.
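A hedged sketch of interval estimation, using hypothetical data: a Wald approximation for a proportion reports the estimate together with its precision, rather than a bare p-value.

```python
import math

def wald_ci(successes, n, z=1.96):
    """Approximate 95% Wald confidence interval for a proportion
    (a rough large-sample approximation; Wilson or exact intervals
    behave better for small n or extreme proportions)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

low, high = wald_ci(60, 100)  # hypothetical study: 60 events in 100
# The interval conveys magnitude and precision together; a null value
# (here 0.5) lying outside the 95% interval corresponds roughly to p < 0.05.
assert low > 0.5
```

The interval makes visible what a bare “significant/not significant” dichotomy hides: how large the estimated effect is, and how uncertain the estimate remains.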

In some ways, Bayesian statistical approaches are like Apple computers. The Mac OS is less vulnerable to viruses, compared with Windows, because its lower market share makes it less attractive to virus code writers. As Apple’s OS has gained market share, its vulnerability has increased. (My Linux computer on the other hand is truly less vulnerable to viruses because of system architecture, but also because Linux personal computers have almost no market share.) If Bayesian methods become more prevalent, my prediction is that they will be subject to as much abuse as frequentist ones. The ASA wisely recognized that the “reproducibility crisis” and loss of confidence in scientific research were mostly due to bias, both systematic and cognitive, in how studies are done, interpreted, and evaluated.

Birth Defects Case Exceeds NY Court of Appeal’s Odor Threshold

March 14th, 2016

The so-called “weight of the evidence” (WOE) approach by expert witnesses has largely been an argument for subjective weighting of studies and cherry picking of data to reach a favored, pre-selected conclusion. The approach is so idiosyncratic and amorphous that it really is no method at all, which is exactly why it seems to have been embraced by the litigation industry and its cadre of expert witnesses.

The WOE enjoyed some success in the First Circuit’s Milward decision, with much harrumphing from the litigation industry and its proxies, but more recently courts have mostly seen through the ruse and employed their traditional screening approaches to exclude opinions that deviate from the relevant standard of care of scientific opinion testimony.[1]

In Reeps, the plaintiff child was born with cognitive and physical defects, which his family claimed resulted from his mother’s inhalation of gasoline fumes in her allegedly defective BMW. To support their causal claims, the Reeps proffered the opinions of two expert witnesses, Linda Frazier and Shira Kramer, on both general and specific causation of the child’s conditions. The defense presented reports from Anthony Scialli and Peter Lees.

Justice York, of the Supreme Court for New York County, sustained defendants’ objections to the admissibility of Frazier and Kramer’s opinions, in a careful opinion that dissected the general and specific causation opinions that invoked WOE methods. Reeps v. BMW of North America, LLC, 2012 NY Slip Op 33030(U), N.Y.S.Ct., Index No. 100725/08 (New York Cty. Dec. 21, 2012) (York, J.), 2012 WL 6729899, aff’d on rearg., 2013 WL 2362566.

The First Department of the Appellate Division affirmed Justice York’s exclusionary ruling and then certified the appellate question to the New York Court of Appeals. 115 A.D.3d 432, 981 N.Y.S.2d 514 (2013).[2] Last month, the New York high court affirmed in a short opinion that focused on the plaintiff’s claim that Mrs. Reeps must have been exposed to a high level of gasoline (and its minor constituents, such as benzene) because she experienced symptoms such as dizziness while driving the car. Sean R. v. BMW of North America, LLC, ___ N.E.3d ___, 2016 WL 527107, 2016 N.Y. Slip Op. 01000 (2016).[3]

The car in question was a model that BMW had recalled for a gasoline line leak, and there was thus no serious question that there had been some gasoline exposure to the plaintiff’s mother, and thus perhaps to the plaintiff in utero. According to the Court of Appeals, the plaintiff’s expert witness Frazier concluded that the gasoline fume exposures to the car occupants exceeded 1,000 parts per million (ppm) because studies showed that symptoms of acute toxicity were reported when exposures reached or exceeded 1,000 ppm. The mother of the car’s owner claimed to suffer dizziness and nausea when riding in the car, and Frazier inferred from these symptoms, self-reported in litigation, that the plaintiff’s mother also sustained gasoline exposures in excess of 1,000 ppm. From this inference about level of exposure, Frazier then proceeded to use the “Bradford Hill criteria” to opine that unleaded gasoline vapor is capable of causing the claimed birth defects based upon “the link between exposure to the constituent chemicals and adverse birth outcomes.” And then using the wizardry of differential etiology, Frazier was able to conclude that the mother’s first-trimester exposure to gasoline fumes was the probable cause of plaintiff’s birth defects.

There was much wrong with Frazier’s opinions, as detailed in the trial court’s decision, but for reasons unknown, the Court of Appeals chose to focus on Frazier’s symptom-threshold analysis. The high court provided no explanation of how Frazier applied the Bradford Hill criteria, or of her downward extrapolation from high-exposure benzene or solvent birth-defect studies to a gasoline-exposure case, in which benzene or solvent made up only a small percentage of the exposures involved in those studies. There is no description from the Court of what a “link” might be, or how it is related to a cause; nor is there any discussion of how Frazier might have excluded the most likely cause of birth defects: the unknown. The Court also noted that plaintiff’s expert witness Kramer had employed a WOE-ful analysis, but it provided no discussion of what was amiss with Kramer’s opinion. A curious reader might think that the Court had overlooked or dismissed “sound science,” but Justice York’s trial court opinion fully addressed the inadequacies of these other opinions.

The Court of Appeals acknowledged that “odor thresholds” can be helpful in estimating a plaintiff’s level of exposure to a potentially toxic chemical, but it noted that there was no generally accepted exposure assessment methodology that connected the report of an odor to adverse pregnancy outcomes.

Frazier, however, had not adverted to an odor threshold, but a symptom threshold. In support, Frazier pointed to three things:

  1. A report of the American Conference of Governmental and Industrial Hygienists (ACGIH) (not otherwise identified), which synthesized the results of controlled studies, and reported a symptom threshold for “mild toxic effects” of about 1,000 ppm;
  2. A 1991 study (not further identified) that purportedly showed a dose-response between exposures to ethanol and toluene and headaches; and
  3. A 2008 report (again not further identified) that addressed the safety of n-Butyl alcohol in cosmetic products.

Item (2) seems irrelevant at best, given that ethanol and toluene are minor components of gasoline, and that the exposure levels in the study are not given. Item (3) again seems beside the point because the Court’s description does not allude to any symptom threshold; nor is there any attempt to tie exposure levels of n-Butyl alcohol to the levels of gasoline experienced in the Reeps case.

With respect to item (1), which supposedly had reported that if exposure exceeded 1,000 ppm, then headaches and nausea can occur acutely, the Court asserted that the ACGIH report did not support an inverse inference, that if headaches and nausea had occurred, then exposures exceeded 1,000 ppm.

It is true that “exposure above 1,000 ppm produces symptoms” does not logically support its converse, “symptoms imply exposure above 1,000 ppm,” but the claimed symptoms, their onset and abatement, and the lack of other known precipitating causes would seem to provide some evidence for exposures above the symptom threshold. Rather than engaging with the lack of scientific evidence on the claimed causal connection between gasoline and birth defects, however, the Court invoked the lack of general acceptance of the “symptom-threshold” methodology to dispose of the case.
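The converse-inference point can be made precise with Bayes’ rule. The numbers below are purely hypothetical, chosen only to illustrate that observed symptoms can raise, without establishing, the probability of above-threshold exposure.

```python
# All probabilities are assumed for illustration only.
p_high = 0.10               # prior probability of exposure above 1,000 ppm
p_sym_given_high = 0.90     # probability of symptoms at high exposure
p_sym_given_low = 0.20      # probability of symptoms from other causes

# Bayes' rule: P(high exposure | symptoms)
p_sym = p_sym_given_high * p_high + p_sym_given_low * (1 - p_high)
posterior = p_sym_given_high * p_high / p_sym

# Symptoms are some evidence of high exposure (the posterior exceeds the
# prior), but far from proof of it (the posterior remains well below 1).
assert p_high < posterior < 1.0
```

On these assumed numbers the posterior is about one third, which is the Court’s point in miniature: the inference runs in the right direction, but it does not get the claimant anywhere near certainty.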

In its short opinion, the Court of Appeals did not address the quality, validity, or synthesis of studies urged by plaintiff’s expert witnesses; nor did it address the irrelevancy of whether the plaintiff’s grandmother or his mother had experienced acute symptoms such as nausea at a level that might be relevant to causing embryological injury. Had it done so, the Court would have retraced the path of Justice York, in the trial court, who saw through the ruse of WOE and the blatantly false claim that the scientific evidence even came close to satisfying the Bradford Hill factors. Furthermore, the Court might have found that the defense expert witnesses were entirely consistent with the Centers for Disease Control:

“The hydrocarbons found in gasoline can cross the placenta. There is no direct evidence that maternal exposure to gasoline causes fetotoxic or teratogenic effects. Gasoline is not included in Reproductive and Developmental Toxicants, a 1991 report published by the U.S. General Accounting Office (GAO) that lists 30 chemicals of concern because of widely acknowledged reproductive and developmental consequences.”

Agency for Toxic Substances and Disease Registry, “Medical Management Guidelines for Gasoline” (Oct. 21, 2014, last updated) (“Toxic Substances Portal – Gasoline, Automotive”); Agency for Toxic Substances and Disease Registry, “Public Health Statement for Automotive Gasoline” (June 1995) (“There is not enough information available to determine if gasoline causes birth defects or affects reproduction.”); see also National Institute for Occupational Safety & Health, Occupational Exposure to Refined Petroleum Solvents: Criteria for a Recommended Standard (1977).


[1] See, e.g., In re Denture Cream Prods. Liab. Litig., 795 F. Supp. 2d 1345, 1367 (S.D. Fla. 2011), aff’d, Chapman v. Procter & Gamble Distrib., LLC, 766 F.3d 1296 (11th Cir. 2014). See also “Fixodent Study Causes Lockjaw in Plaintiffs’ Counsel” (Feb. 4, 2015); “WOE-fully Inadequate Methodology – An Ipse Dixit By Another Name” (May 1, 2012); “I Don’t See Any Method At All” (May 2, 2013).

[2] “New York Breathes Life Into Frye Standard – Reeps v. BMW” (March 5, 2013); “As They WOE, So No Recovery Have the Reeps” (May 22, 2013).

[3] See Sean T. Stadelman, “Symptom Threshold Methodology Rejected by Court of Appeals of New York Pursuant to Frye” (Feb. 18, 2016).

Systematic Reviews and Meta-Analyses in Litigation, Part 2

February 11th, 2016

Daubert in Knee’d

In a recent federal court case, adjudicating a plaintiff’s Rule 702 challenge to defense expert witnesses, the trial judge considered plaintiff’s claim that the challenged witness had deviated from the PRISMA guidelines[1] for systematic reviews, and thus presumably had deviated from the standard of care required of expert witnesses giving opinions about causal conclusions.

Batty v. Zimmer, Inc., MDL No. 2272, Master Docket No. 11 C 5468, No. 12 C 6279, 2015 WL 5050214 (N.D. Ill. Aug. 25, 2015) [cited as Batty I]. The trial judge, the Hon. Rebecca R. Pallmeyer, denied plaintiff’s motion to exclude the allegedly deviant witness, but appeared to accept the premise of the plaintiff’s argument that an expert witness’s opinion should be reached in the manner of a carefully constructed systematic review.[2] The trial court’s careful review of the challenged witness’s report and deposition testimony revealed that there had been no meaningful departure from the standards put forward for systematic reviews. See “Systematic Reviews and Meta-Analyses in Litigation” (Feb. 5, 2016).

Two days later, the same federal judge addressed a different set of objections by the same plaintiff to two other expert witnesses for the defendant, Zimmer, Inc.: Dr. Stuart Goodman and Dr. Timothy Wright. Batty v. Zimmer, Inc., MDL No. 2272, Master Docket No. 11 C 5468, No. 12 C 6279, 2015 WL 5095727 (N.D. Ill. Aug. 27, 2015) [cited as Batty II]. Once again, plaintiff Batty argued for the necessity of adherence to systematic review principles. According to Batty, Dr. Wright’s opinion, based upon his review of the clinical literature, was scientifically and legally unreliable because he had not conducted a proper systematic review. Plaintiff alleged that Dr. Wright’s review selectively “cherry picked” favorable studies to buttress his opinion, in violation of systematic review guidelines. The trial court, which had assumed that a systematic review was the appropriate “methodology” for Dr. Vitale, in Batty I, refused to sustain the plaintiff’s challenge in Batty II, in large part because the challenged witness, Dr. Wright, had not claimed to have performed a systematic or comprehensive review, and so his failure to follow the standard methodology did not require the exclusion of his opinion at trial. Batty II at *3.

The plaintiff never argued that Dr. Wright misinterpreted any of the studies upon which he relied, and the trial judge thus suggested that Dr. Wright’s discussion of the studies, even if a partial, selected group, would be helpful to the jury. The trial court left the plaintiff to her cross-examination to highlight Dr. Wright’s selectivity and lack of comprehensiveness. Apparently, in the trial judge’s view, this expert witness’s failure to address contrary studies did not render his testimony unreliable under “Daubert scrutiny.” Batty II at *3.

Of course, it is no longer the Daubert judicial decision that mandates scrutiny of expert witness opinion testimony, but Federal Rule of Evidence 702. Perhaps it was telling that when the trial court backed away from its assumption, made in Batty I, that guidelines or standards for systematic reviews should inform a Rule 702 analysis, the court cited Daubert, a judicial opinion superseded by an Act of Congress, in 2000. The trial judge’s approach, in Batty II, threatens to make gatekeeping meaningless by deferring to the expert witness’s invocation of personal, idiosyncratic, non-scientific standards. Furthermore, the Batty II approach threatens to eviscerate gatekeeping for clinical practitioners who remain blithely unaware of advances in epidemiology and evidence-based medicine. The upshot of Batty I and II combined seems to be that systematic review principles apply to clinical expert witnesses only if those witnesses choose to be bound by such principles. If this is indeed what the trial court intended, then it is jurisprudential nonsense.

The trial court, in Batty II, exercised a more searching approach, however, to Dr. Wright’s own implant failure analysis, which he relied upon in an attempt to rebut plaintiff’s claim of defective design. The plaintiff claimed that the load-bearing polymer surfaces of the artificial knee implant experienced undue deformation. Dr. Wright’s study found little or no deformation on the load-bearing polymer surfaces of the eight retrieved artificial joints. Batty II at *4.

Dr. Wright assessed deformation qualitatively, not quantitatively, through the use of a “colorimetric map of deformation” of the polymer surface. Dr. Wright, however, provided no scale to define or assess how much deformation was represented by the different colors in his study. Notwithstanding the lack of any metric, Dr. Wright concluded that his findings, based upon eight retrieved implants, “suggested” that the kind of surface failure claimed by plaintiff was a “rare event.”

The trial court had little difficulty in concluding that Dr. Wright’s evidentiary base was insufficient, as was his presentation of the study’s data and inferences. The challenged witness failed to explain how his conclusions followed from his data, and thus his proffered testimony fell into the “ipse dixit” category of inadmissible opinion testimony. General Electric v. Joiner, 522 U.S. 136, 146 (1997). In the face of the challenge to his opinions, Dr. Wright supplemented his retrieval study with additional scans of surficial implant wear patterns, but he failed again to show the similarity between the conditions of use and failure in the patients from whom these implants were retrieved and those in the plaintiff’s case (which supposedly involved aseptic loosening). Furthermore, Dr. Wright’s interpretation of his own retrieval study was inadequate in the trial court’s view because he had failed to rule out other modes of implant failure, in which the polyethylene surface would have been preserved. Because, even as supplemented, Dr. Wright’s study failed to support his proffered opinions, the court held that his opinions, based upon his retrieval study, had to be excluded under Rule 702. The trial court did not address the Rule 703 implications of Dr. Wright’s reliance upon a study that was poorly designed and explained, and which lacked the ability to support his contention that the claimed mode of implant failure was a “rare” event. Batty II at *4 – 5.


[1] See David Moher , Alessandro Liberati, Jennifer Tetzlaff, Douglas G. Altman, & The PRISMA Group, “Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement,” 6 PLoS Med e1000097 (2009) [PRISMA].

[2] Batty v. Zimmer, Inc., MDL No. 2272, Master Docket No. 11 C 5468, No. 12 C 6279, 2015 WL 5050214 (N.D. Ill. Aug. 25, 2015).

Vaccine Court Inoculated Against Pathological Science

October 25th, 2015

Richard I. Kelley, M.D., Ph.D., is the Director of the Division of Metabolism, Kennedy Krieger Institute, and a member of the Department of Pediatrics at Johns Hopkins University. The National Library of Medicine’s PubMed database shows that Dr. Kelley has written dozens of articles on mitochondrial disease, but none that concludes that thimerosal or the measles, mumps, and rubella vaccine plays a causal role in autism by inducing or aggravating mitochondrial disease. In one article, Kelley opines:

“Large, population-based studies will be needed to identify a possible relationship of vaccination with autistic regression in persons with mitochondrial cytopathies.”

Jacqueline R. Weissman, Richard I. Kelley, Margaret L. Bauman, Bruce H. Cohen, Katherine F. Murray, Rebecca L. Mitchell, Rebecca L. Kern, and Marvin R. Natowicz, “Mitochondrial Disease in Autism Spectrum Disorder Patients: A Cohort Analysis,” 3 PLoS One e3815 (Nov. 26, 2008). The large-scale, population-based studies needed to support the speculation of Kelley and his colleagues have not materialized since 2008, and meta-analyses and systematic reviews have dampened the enthusiasm for Kelley’s hypothesis.[1]

Special Master Denise K. Vowell, in the Federal Court of Claims, has now further dampened the enthusiasm for Dr. Kelley’s mitochondrial theories, in a 115-page opinion, written in support of rejecting Kelley’s testimony and theories that the MMR vaccine caused a child’s autism. Madariaga v. Sec’y Dep’t H.H.S., No. 02-1237V (Ct. Fed. Claims Sept. 26, 2015) Slip op. [cited as Madariaga].

Special Master Vowell recounts at length the history of vaccine litigation, in which the plaintiffs have presented theories that the combination of thimerosal-containing vaccines and the MMR vaccine causes autism, or that thimerosal-containing vaccines alone cause autism. Madariaga at 3. Both theories were tested in the crucible of litigation and cross-examination in a series of test cases. The first theory resulted in verdicts against the claimants, which were affirmed on appeal.[2] Similarly, the trials on the thimerosal-only claims uniformly resulted in decisions from the Special Masters against the claims.[3] The three Special Masters, hearing the cases, found that the vaccine-causation claims were not close cases, and were based upon unreliable evidence.[4] Madariaga at 4.[5]

In Madariaga, Special Master Vowell noted that Doctor Kelley had conceded the “absence of an association between the MMR vaccine and autism in large epidemiological studies.” Madariaga at 61. Kelley attempted to evade the force of his lack of evidence by retreating into a claim that “autistic regressions caused by the live attenuated MMR vaccine are rare events,” and an assertion that there are many inflammatory factors that can induce autistic regression. Madariaga at 61.

The Special Master described the whole of Kelley’s testimony as a “meandering, confusing, and completely unpersuasive elaboration of his unique insights and methods.” Madariaga at 66. Although it is clear from the Special Master’s opinion that Kelley was unbridled in his over-interpretation of studies, and perhaps undisciplined in his interpretation of test results, the lengthy opinion provides only a high-altitude view of Kelley’s errors. There are tantalizing comments and notes in the Special Master’s decision, such as one reporting that Kelley may have over-interpreted a study because he ignored the authors’ comment that their findings could be consistent with chance because of their multiple comparisons, and another noting his reliance on a paper that failed to show statistical significance. Madariaga at 90 & n.160.

The unreliability of Kelley’s testimony appeared to be more than hand waving in the absence of evidence. He compared the child’s results on a four-hour fasting test, when the child had not fasted for four hours. When pressed about this maneuver, Kelley claimed that he had made calculations to bring the child’s results “back to some standard.” Madariaga at 66 & n.115.

Although the Special Master’s opinion itself was ultimately persuasive, the tome left me eager to know more about Dr. Kelley’s epistemic screw ups, and less about vaccine court procedure.


[1] See Vittorio Demicheli, Alessandro Rivetti, Maria Grazia Debalini, and Carlo Di Pietrantonj, “Vaccines for measles, mumps and rubella in children,” Cochrane Database Syst. Rev., Issue 2. Art. No. CD004407, DOI:10.1002/14651858.CD004407.pub3 (2012) (“Exposure to the MMR vaccine was unlikely to be associated with autism … .”); Luke E. Taylor, Amy L. Swerdfeger, and Guy D. Eslick, “Vaccines are not associated with autism: An evidence-based meta-analysis of case-control and cohort studies,” 32 Vaccine 3623 (2014) (“Findings of this meta-analysis suggest that vaccinations are not associated with the development of autism or autism spectrum disorder. Furthermore, the components of the vaccines (thimerosal or mercury) or multiple vaccines (MMR) are not associated with the development of autism or autism spectrum disorder.”).

[2] Cedillo v. Sec’y, HHS, No. 98-916V, 2009 WL 331968 (Fed. Cl. Spec. Mstr. Feb. 12, 2009), aff’d, 89 Fed. Cl. 158 (2009), aff’d, 617 F.3d 1328 (Fed. Cir. 2010); Hazlehurst v. Sec’y, HHS, No. 03-654V, 2009 WL 332306 (Fed. Cl. Spec. Mstr. Feb. 12, 2009), aff’d, 88 Fed. Cl. 473 (2009), aff’d, 604 F.3d 1343 (Fed. Cir. 2010); Snyder v. Sec’y, HHS, No. 01-162V, 2009 WL 332044 (Fed. Cl. Spec. Mstr. Feb. 12, 2009), aff’d, 88 Fed. Cl. 706 (2009).

[3] Dwyer v. Sec’y, HHS, 2010 WL 892250; King v. Sec’y, HHS, No. 03-584V, 2010 WL 892296 (Fed. Cl. Spec. Mstr. Mar. 12, 2010); Mead v. Sec’y, HHS, 2010 WL 892248.

[4] See, e.g., King, 2010 WL 892296, at *90 (emphasis in original); Snyder, 2009 WL 332044, at *198.

[5] The Federal Rules of Evidence technically do not control the vaccine court proceedings, but the Special Masters are bound by the requirement of Daubert v. Merrell Dow Pharm., 509 U.S. 579, 590 (1993), to find that expert witness opinion testimony is reliable before they consider it. Knudsen v. Sec’y, HHS, 35 F.3d 543, 548-49 (Fed. Cir. 1994). Madariaga at 7.

On Amending Rule 702 of the Federal Rules of Evidence

October 17th, 2015

No serious observer or scholar of the law of evidence can deny that the lower federal courts have applied Daubert and its progeny, and the revised Federal Rule of Evidence 702, inconstantly and inconsistently, in their decisions to admit or exclude proffered expert witness opinion testimony. Opponents of trial court “gatekeeping” of expert witnesses applaud the lapses in hopes that the gates have been unhinged and that there will be “open admissions” for expert witness testimony. These opponents latch on to the suggestion that the Rules favor “liberal” admissibility, but they confuse the liberal with the libertine; they lose sight of the meaning of “liberal” that conveys enlightened, with an openness to progress and salutary change, and the claims of knowledge over blind faith. Supporters of gatekeeping lament the courts’ inability or unwillingness to apply a clear statutory mandate that is designed to improve and ensure the correctness of fact finding in the federal courts. A few have decried the lawlessness of the courts’ evasions and refusals to apply Rule 702’s requirements.

Given the clear body of Supreme Court precedent, and the statutory revision to Rule 702, which was clearly designed to embrace, embody, enhance, and clarify the high Court precedent, I did not think that an amendment to Rule 702 was needed to improve the sorry state of lower court decisions. Professor David Bernstein and lawyer Eric Lasker, however, have made a powerful case for amendment as a way of awakening and galvanizing federal judges to their responsibilities under the law. David E. Bernstein & Eric G. Lasker, “Defending Daubert: It’s Time to Amend Federal Rule of Evidence 702,” 57 William & Mary L. Rev. 1 (2015) [cited below as Bernstein & Lasker].

Bernstein and Lasker remind us that Rule 702 is a statute[1] that superseded inconsistent prior judicial pronouncements. The authors review many of the more egregious cases that ignore the actual text of Rule 702, while adverting to judicial gloss on the superseded rule, and even to judicial precedent and dicta pre-dating the Daubert case itself. Like the Papa Bear in the Berenstain Bear family, the authors show us, through examples from federal court decisions, how not to interpret a statute.

The Dodgers’ Dodges

Questions about whether expert witnesses properly applied a methodology to the facts of a case are for the jury, and not the proper subject of gatekeeping.

As Bernstein and Lasker document, this thought- and Rule-avoidance dodge is particularly shocking given that the Supreme Court clearly directed close and careful analysis of the specific application of general principles to the facts of a case.[2] Shortly after the Supreme Court decided Daubert, the Third Circuit issued a highly influential decision in which it articulated the need for courts to review every step in expert witnesses’ reasoning for reliability. In re Paoli RR Yard PCB Litig., 35 F.3d 717, 745 (3d Cir. 1994). The Paoli case thus represents the antithesis of a judicial approach that asks only, from the 10,000-foot level, whether the right methodology was used; Paoli calls for a close, careful analysis of the application of a proper methodology to every step in the case. Id. (“any step that renders the analysis unreliable … renders the expert’s testimony inadmissible … whether the step completely changes a reliable methodology or merely misapplies that methodology”).

While the Paoli approach is unpopular with some judges who might prefer not to work so hard, the Advisory Committee heartily endorsed Paoli’s “any step” approach in its Note to the 2000 amendment. Bernstein & Lasker at 32. Bernstein and Lasker further point out that the Committee’s Reporter, Professor Dan Capra, acknowledged, shortly after the amendment went into effect, that the Paoli “any step” approach had a “profound impact” on the drafting of amended Rule 702. Bernstein & Lasker at 28.[3]

Having demonstrated the reasons, the process, and the substance of the judicial and legislative history of the revised Rule 702, Bernstein and Lasker are understandably incensed at the lawlessness of circuit and trial courts that have eschewed the statute, have ignored Supreme Court precedent, and have retreated to vague judicial pronouncements that trace back to before some or any of the important changes occurred in Rule 702.[4]

Let’s Cherry Pick and Weigh the Evidence; Forget the Scale

Along with some courts’ insistence that trial judges may not examine the application of methods to the facts of a case, other courts, perhaps mindful of their citation practices, have endorsed “cherry picking” as a satisfactory methodology for partial expert witnesses to support their opinions. Id. at 35-36. Our law review authors also trace the influence of plaintiffs’ counsel, through their “walking around money” from the breast implant litigation, in sponsoring anti-Daubert, anti-gatekeeping conferences, at which prominent plaintiffs’ advocates and expert witnesses, such as Carl Cranor, presented in favor of a vague “weight of the evidence” (WOE) approach to decision making. Id. at 39. Following these conferences, some courts have managed to embrace WOE, which is usually packaged as an abandonment of scientific standards of validity and sufficiency, in favor of selective review and subjective decisions. To do this, however, courts have had to ignore both Supreme Court precedent and the clear language of Rule 702. In Joiner, the high Court rejected WOE, over the dissent of a single justice,[5] but some of the inferior federal courts have embraced the dissent to the exclusion of the majority’s clear holding, as well as the incorporation of that holding into the revised Rule 702.[6] An interesting case of judicial disregard.

Other Dodges

The law review authors did not purport to provide an exhaustive catalogue of avoidance and evasion techniques. Here is one that is not discussed: shifting the burden of proof on admissibility to the opponent of the expert witness’s opinion:

“Testimony from an expert is presumed to be helpful unless it concerns matters within the everyday knowledge and experience of a lay juror.”

Earp v. Novartis Pharms., No. 5:11–CV–680–D, 2013 WL 4854488, at *3 (Sept. 11, 2013). See also Kopf v. Skyrm, 993 F.2d 374, 377 (4th Cir.1993); accord Koger v. Norfolk S. Ry. Co., No. 1:08–0909, 2010 WL 692842, at *1 (S.D.W.Va. Feb. 23, 2010) (unpublished).

Whence comes this presumption? Perhaps it is no more than a requirement for the opponent to object and articulate the flaws before the trial court will act. But the “presumption” sure looks like a covert shifting of the burden of proof for the requisite reliability of an expert witness’s opinion, which burden clearly falls on the proponent of the testimony.

The Proposed Amended Rule 702

There are several possible responses to the problem of the judiciary’s infidelity to basic principles, precedent, and legislative directive. Bernstein and Lasker advance amendments to the current Rule 702, as a blunt reminder that the times and the law have changed, really. Here is their proposed revision, with new language italicized, and deleted language shown to be struck:

“Rule 702. Testimony by Expert Witnesses

A witness who is qualified as an expert by knowledge, skill, experience, training, or education may testify in the form of an opinion or otherwise if the testimony satisfies each of the following requirements:

(a) the expert’s scientific, technical, or other specialized knowledge will help the trier of fact to understand the evidence or to determine a fact in issue;

(b) the testimony is based on sufficient facts or data that reliably support the expert’s opinion;

(c) the testimony is the product of reliable and objectively reasonable principles and methods; and

(d) the expert has reliably applied the principles and methods to the facts of the case and reached his conclusions without resort to unsupported speculation.

Appeals of district court decisions under this Rule are considered under the abuse-of-discretion standard. Such decisions are evaluated with the same level of rigor regardless of whether the district court admitted or excluded the testimony in question. This Rule supersedes any preexisting precedent that conflicts with any or all sections of this Rule.”

Bernstein & Lasker at 44-45.

Before discussing and debating the changes, we should ask, “why change a fairly good statute just because lower courts evade its terms?” The corrupt efforts of SKAPP[7] to influence public and judicial policy, as well as the wildly one-sided Milward symposium,[8] which the authors discuss, should serve as a potent reminder that there would be many voices in the review and revision process, both from within plaintiffs’ bar, and from those sympathetic to the litigation industry’s goals and desires. Opening up the language of Rule 702 to revision could result in reactionary change, driven by the tort bar’s and allies’ lobbying. The result could be the evisceration of Rule 702, as it now stands. This danger requires a further exploration of alternatives to the proposed amendment.

Rule 702 has had the benefit of evolutionary change and development, which has made it better, but also possibly burdened it with vestigial language. To be sure, the rule is a difficult statute to draft, and while the authors give us a helpful start, there are many problems to be subdued before a truly workable draft can be put forward.

The first sentence’s new language, “the testimony satisfies each of the following requirements,” is probably already satisfied by the use of “and” between the following numbered paragraphs. Given the judicial resistance to Rule 702, the additional verbiage could be helpful, although it should be unnecessary. The conditionality of “if,” however, leaves the meaning of the Rule unclear when that condition is not satisfied. The Rule clearly intends “if” in the introductory sentence to mean “only if,” and the law and litigants would be better off if the Rule said what it means.

Proposed Subsection (b)

(b) the testimony is based on sufficient facts or data that reliably support the expert’s opinion;

The authors do not make much of a case for striking “sufficient.” There will be times when perfectly good facts and data reliably support an expert witness’s opinion, but do not support an epistemic claim of “knowledge,” because the support is indeterminate between the claim and many other competing hypotheses that might explain the outcome at issue. The reliably supporting facts and data may amount to little more than a scientific peppercorn, really too little to support the claim. Deleting “sufficient” from subsection (b) could be a serious retrograde move, which will confuse the judiciary more than instruct it.

The revised subsection also fails to address the integrity of the facts and data, and the validity of how the data were generated. To be sure, Rule 703 could pick up some of the slack, but Rule 703 is often ignored, and even when invoked, that rule has its own drafting and interpretation problems. See “Giving Rule 703 the Cold Shoulder” (May 12, 2012); “RULE OF EVIDENCE 703 — Problem Child of Article VII” (Sept. 19, 2011). Also missing is an acknowledgment that the facts or data must often be analyzed in some way, whether by statistical tests or some other means. And finally, there is the problem that “reliable” does not necessarily connote valid or accurate. Subsection (b) thus seems to cry out for additional qualification, such as:

“the testimony is based on sufficient facts or data, reliably, accurately, and validly ascertained, and analyzed, which facts or data reliably and validly support the expert’s opinion”

Proposed Subsection (c)

Bernstein and Lasker propose modifying this subsection to inject “and objectively reasonable” before “principles and methods.” The authors do not explain what objectively reasonable principles and methods encompass, and the qualification does not seem self-explanatory. Perhaps they are calling for principles and methods that are “generally accepted,” and otherwise justified as warranted to produce accurate, true results? If so, that might be a helpful addition.

Proposed Subsection (d)

Here the authors bolster the language of the subsection with a prohibition against using unsupported speculation. OK; but would supported or inspired or ingenious speculation be any better? Subsection (a) speaks of knowledge, and it should be obvious that the expert witness’s opinion must have an epistemic warrant that makes it something more than a mere subjective opinion.

Whether Bernstein and Lasker have opened a can or a Concordat of Worms remains to be seen.


[1] The authors provide a great resource on the legislative history of attempts to revise Rule 702, up to and including the 2000 revision. The 2000 revision began with a proposed amendment from the Advisory Committee in April 1999. The Standing Committee on Rules of Practice and Procedure approved the proposal, and forwarded the proposed amendment to the Judicial Conference, which approved the amendment without change in September 1999. The Supreme Court ordered the amendment in April 2000, and submitted the revised rule to Congress. Order Amending the Federal Rules of Evidence, 529 U.S. 1189, 1195 (2000). The revised Rule 702 became effective on December 1, 2000. See also Bernstein & Lasker at 19 n. 99 (citing Edward J. Imwinkelried, “Response, Whether the Federal Rules of Evidence Should Be Conceived as a Perpetual Index Code: Blindness Is Worse than Myopia,” 40 Wm. & Mary L. Rev. 1595, 1595-98 (1999) (noting and supporting the Supreme Court’s interpretation and application of the Federal Rules of Evidence as a statute, and subject to the judicial constraints on statutory construction)). For a strident student’s pro-plaintiff view of the same legislative history, see Nancy S. Farrell, “Congressional Action to Amend Federal Rule of Evidence 702: A Mischievous Attempt to Codify Daubert v. Merrell Dow Pharmaceuticals, Inc.”, 13 J. Contemp. Health L. & Pol’y 523 (1997).

[2] General Electric Co. v. Joiner, 522 U.S. 136 (1997) (reviewing and analyzing individual studies’ internal and external validity, and rejecting plaintiffs’ argument that only the appropriateness of the methodology in the abstract was subject of gatekeeping); Kumho Tire Co. v. Carmichael, 526 U.S. 137, 156-57 (1999) (“stressing that district courts must scrutinize whether the principles and methods employed by an expert have been properly applied to the facts of the case”) (quoting what was then the proposed advisory committee’s note to Rule 702, Preliminary Draft of Proposed Amendments to the Federal Rules of Civil Procedure and Evidence: Request for Comment, 181 F.R.D. 18, 148 (1998)).

[3] Citing Stephen A. Saltzburg, Edward J. Imwinkelried & Daniel J. Capra, “Keeping the Reformist Spirit Alive in Evidence Law,” 149 U. Pa. L. Rev. 1277, 1289-90 (2001). The authors note that other circuits have embraced the Paoli “any step” approach. Bernstein & Lasker at 28 n.152 (citing Paz v. Brush Engineered Materials, Inc., 555 F.3d 383, 387-91 (5th Cir. 2009); McClain v. Metabolife Int’l, Inc., 401 F.3d 1233, 1245 (11th Cir. 2005); Dodge v. Cotter Corp., 328 F.3d 1212, 1222 (10th Cir. 2003); Amorgianos v. Nat’l R.R. Passenger Corp., 303 F.3d 256, 267 (2d Cir. 2002) (quoting In re Paoli, 35 F.3d at 746)).

[4] See, e.g., City of Pomona v. SQM N. Am. Corp., 750 F.3d 1036, 1047 (9th Cir. 2014) (rejecting the Paoli any step approach without careful analysis of the statute, the advisory committee note, or Supreme Court decisions); Manpower, Inc. v. Ins. Co. of Pa., 732 F.3d 796, 808 (7th Cir. 2013) (“[t]he reliability of data and assumptions used in applying a methodology is tested by the adversarial process and determined by the jury; the court’s role is generally limited to assessing the reliability of the methodology – the framework – of the expert’s analysis”); Bonner v. ISP Techs., Inc., 259 F.3d 924, 929 (8th Cir. 2001) (“the factual basis of an expert opinion goes to the credibility of the testimony, not the admissibility, and it is up to the opposing party to examine the factual basis for the opinion in cross-examination”).

[5] General Electric Co. v. Joiner, 522 U.S. 136, 146-47 (1997) (holding that district court had the “discretion to conclude that the studies upon which the experts relied were not sufficient, whether individually or in combination, to support their conclusions that Joiner’s exposure to PCB’s contributed to his cancer”). Other federal and state courts have followed Joiner. See Allen v. Pa. Eng’g Corp., 102 F.3d 194, 198 (5th Cir. 1996) (“We are also unpersuaded that the ‘weight of the evidence’ methodology these experts use is scientifically acceptable for demonstrating a medical link between Allen’s EtO exposure and brain cancer.”). For similar rejections of vague claims that weak evidence adds up to more than the sum of its parts, see Hollander v. Sandoz Pharm. Corp., 289 F.3d 1193, 1216 n.21 (10th Cir. 2002); Magistrini v. One Hour Martinizing Dry Cleaning, 180 F. Supp. 2d 584, 608 (D.N.J. 2002); Caraker v. Sandoz Pharm. Corp., 188 F. Supp. 2d 1026, 1040 (S.D. Ill. 2001); Siharath v. Sandoz Pharm. Corp., 131 F. Supp. 2d 1347, 1371 (N.D. Ga. 2001), aff’d sub nom. Rider v. Sandoz Pharm. Corp., 295 F.3d 1194 (11th Cir. 2002); Merck & Co. v. Garza, 347 S.W.3d 256, 268 (Tex. 2011); Estate of George v. Vt. League of Cities & Towns, 993 A.2d 367, 379-80 (Vt. 2010).

[6] Milward v. Acuity Specialty Products Group, Inc., 639 F.3d 11, 17-18 (1st Cir. 2011) (reversing the exclusion of expert witnesses who embraced WOE). Milward has garnered some limited support in a few courts, as noted by Bernstein and Lasker; see In re Fosamax (Alendronate Sodium) Prods. Liab. Litig., Nos. 11-5304, 08-08, 2013 WL 1558690, at *4 (D.N.J. Apr. 10, 2013); Harris v. CSX Transp., Inc., 753 S.E.2d 275, 287-89, 301-02 (W. Va. 2013).

[7]SKAPP A LOT” (April 30, 2010).

[8]Milward Symposium Organized by Plaintiffs’ Counsel and Witnesses” (Feb. 13, 2013); [http://perma.cc/PW2V-X7TK].

Woodside on Access to Underlying Research Data

October 10th, 2015

Access to underlying data and materials, source codes, and other research materials is a two-edged sword. Many scientists who hold forth on the issue, including some prominent plaintiffs’ expert witnesses, have been extremely critical of the pharmaceutical and other industries for not sharing underlying data of their research. On the other hand, some of the same people have resisted sharing data and information when the litigants have sought access to these materials to understand or to challenge the published conclusions and analyses.[1]

Dr. Frank Woodside, of Dinsmore & Shohl, kindly sent me a copy of his recent law review article, written with a colleague, which advocates for full disclosure of underlying research data when research becomes material to the outcome of litigation.[2] Frank C. Woodside & Michael J. Gray, “Researchers’ Privilege: Full Disclosure,” 32 West. Mich. Univ. Cooley L. Rev. 1 (2015). The authors make the case that the so-called researcher’s privilege has little or no support in federal or state law. My previous posts have largely supported this view, at least for research that has been published, and especially for research that is being relied upon by testifying expert witnesses in pending litigation. As Lord Chancellor Hardwicke put the matter, “the public has a right to every man’s evidence,”[3] and scientists should not be immune to the requirement of giving and sharing their evidence.

Woodside and Gray have updated the scholarship in this area, and their article should be consulted in any ongoing discovery, subpoena, or Freedom of Information Act (FOIA) battle. Their discussion of the evolving case law under FOIA is especially timely. Despite the strong presumption in favor of disclosure under FOIA,[4] and President Obama’s pronouncements[5] about a new era in FOIA openness and transparency, the government’s compliance is at an all-time low. See Ted Bridis, “Obama administration sets new record for withholding FOIA requests,” PBS News Hour (Mar. 18, 2015). Court decisions have made clear that researchers cannot refuse to produce underlying data simply “because disclosure would diminish the researchers’ ability to publish their results in prestigious journals.”[6] And yet the National Institute of Environmental Health Sciences continues its aggressive resistance to disclosure of underlying data, often by invoking FOIA exemption number four. In my cases, I have seen the NIEHS resort to this exemption, which protects documents that reveal “[t]rade secrets and commercial or financial information obtained from a person and privileged or confidential,”[7] even when the research in question was conducted by academic researchers funded by the NIEHS.


[1] See, e.g., Enoch v. Forest Research Institute, Inc., N.J. Law Div. Hudson Cty., Civ. Div. L-3896-14, Order Granting Defendants’ Motion to Compel Production of Documents Necessary to Verify the Validity and Accuracy of a Study by Plaintiffs’ Expert, Anick Berard, Ph.D. (Oct. 9, 2015) (Jablonski, J.) (ordering plaintiffs to “produce the documents sought by the Forest defendants to verify the validity and accuracy of the study known as “Berard et al., Sertraline Use During Pregnancy and the Risk of Major Malformations, Am. J. Obstet. Gynecol. (2015), doi 10.1016/j.ajog.2015.01.034, namely the study’s SAS source codes and the specific generalized estimating equation models that were used to generate Table 2 of the study”).

[2] And I should thank Dr. Woodside and Mr. Gray for their generous citations to your humble blogger’s posts on this subject.

[3] Debate in the House of Lords on the Bill to Indemnify Evidence, 12 Hansard’s Parliamentary History of England, 675, 693, May 25, 1742, quoted in 8 Wigmore on Evidence at 64, § 2192 (3d ed. 1940).

[4] See S. REP. No. 89-813, at 3 (1965) (the purpose of FOIA is to “establish a general philosophy of full agency disclosure . . . and to provide a court procedure by which citizens and the press may obtain information wrongfully withheld”).

[5] See Executive Order, Memorandum, 74 Fed. Reg. 4685 (Jan. 21, 2009).

[6] See Burka v. U.S. Dep’t of Health and Human Serv., 87 F.3d 508, 515 (D.C. Cir. 1996).

[7] See 5 U.S.C. § 552(b)(4).

Demonstration of Frye Gatekeeping in Pennsylvania Birth Defects Case

October 6th, 2015

Michael D. Freeman is a chiropractor and self-styled “forensic epidemiologist,” affiliated with Departments of Public Health & Preventive Medicine and Psychiatry, Oregon Health & Science University School of Medicine, in Portland, Oregon. His C.V. can be found here. Freeman has an interesting publication in press on his views of forensic epidemiology. Michael D. Freeman & Maurice Zeegers, “Principles and applications of forensic epidemiology in the medico-legal setting,” Law, Probability and Risk (2015); doi:10.1093/lpr/mgv010. Freeman’s views on epidemiology did not, however, pass muster in the courtroom. Porter v. Smithkline Beecham Corp., Phila. Cty. Ct. C.P., Sept. Term 2007, No. 03275. Slip op. (Oct. 5, 2015).

In Porter, plaintiffs sued Pfizer, the manufacturer of the SSRI antidepressant Zoloft. Plaintiffs claimed the mother plaintiff’s use of Zoloft during pregnancy caused her child to be born with omphalocele, a serious defect that occurs when the child’s intestines develop outside his body. Pfizer moved to exclude plaintiffs’ medical causation expert witnesses, Dr. Cabrera and Dr. Freeman. The trial judge was the Hon. Mark I. Bernstein, who has written and presented frequently on expert witness evidence.[1] Judge Bernstein held a two day hearing in September 2015, and last week, His Honor ruled that the plaintiffs’ expert witnesses failed to meet Pennsylvania’s Frye standard for admissibility. Judge Bernstein’s opinion reads a bit like a Berenstain Bear book on how not to use epidemiology.

GENERAL CAUSATION SCREW UPS

Proper Epidemiologic Method

First, Find An Association

Dr. Freeman has a methodologic map that includes the Bradford Hill criteria at the back end of the procedure. Dr. Freeman, however, impetuously forgot that before you get to the back end, you must traverse the front end:

“Dr. Freemen agrees that he must, and claims he has, applied the Bradford Hill Criteria to support his opinion. However, the starting procedure of any Bradford-Hill analysis is ‘an association between two variables’ that is ‘perfectly clear-cut and beyond what we would care to attribute to the play of chance’.35 Dr. Freeman testified that generally accepted methodology requires a determination, first, that there’s evidence of an association and, second, whether chance, bias and confounding have been accounted for, before application of the Bradford-Hill criteria.36 Because no such association has been properly demonstrated, the Bradford Hill criteria could not have been properly applied.”

Slip op. at 12-13. In other words, don’t go rushing to the Bradford Hill factors until and unless you have first shown an association; second, you have shown that it is “clear cut,” and not likely the result of bias or confounding; and third, you have ruled out the play of chance or random variability in explaining the difference between the observed and expected rates of disease.

Proper epidemiologic method requires surveying the pertinent published studies that investigate whether there is an association between the medication use and the claimed harm. The expert witnesses must, however, do more than write a bibliography; they must assess any putative associations for “chance, confounding or bias”:

“Proper epidemiological methodology begins with published study results which demonstrate an association between a drug and an unfortunate effect. Once an association has been found, a judgment as whether a real causal relationship between exposure to a drug and a particular birth defect really exists must be made. This judgment requires a critical analysis of the relevant literature applying proper epidemiologic principles and methods. It must be determined whether the observed results are due to a real association or merely the result of chance. Appropriate scientific studies must be analyzed for the possibility that the apparent associations were the result of chance, confounding or bias. It must also be considered whether the results have been replicated.”

Slip op. at 7.

Then Rule Out Chance

So if there is something that appears to be an association in a study, the expert epidemiologist must assess whether it is consistent with a chance association. If we flip a fair coin 10 times, we “expect” 5 heads and 5 tails, but actually the probability of not getting the expected result is about three times greater than that of obtaining the expected result. If on one series of 10 tosses we obtain 6 heads and 4 tails, we would certainly not reject a starting assumption that the expected outcome was 5 heads / 5 tails. Indeed, the probability of obtaining 6 heads / 4 tails or 4 heads / 6 tails is almost double the probability of obtaining the expected outcome of an equal number of heads and tails.
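The coin-toss arithmetic is easy to verify. A minimal sketch in Python, using only the standard library:

```python
from math import comb

def binom_prob(k: int, n: int = 10) -> float:
    """Probability of exactly k heads in n tosses of a fair coin."""
    return comb(n, k) / 2 ** n

p_expected = binom_prob(5)                       # exactly 5 heads / 5 tails
p_not_expected = 1 - p_expected                  # any other split
p_6_4_either_way = binom_prob(6) + binom_prob(4)

print(f"P(5/5)        = {p_expected:.3f}")       # 0.246
print(f"P(not 5/5)    = {p_not_expected:.3f}")   # 0.754, about 3x P(5/5)
print(f"P(6/4 or 4/6) = {p_6_4_either_way:.3f}") # 0.410, nearly double P(5/5)
```

The "expected" outcome of 5 heads and 5 tails occurs on fewer than a quarter of such series, which is exactly why a small departure from the expected count is no evidence against the fair-coin hypothesis.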

As it turned out in the Porter case, Dr. Freeman relied rather heavily upon one study, the Louik study, for his claim that Zoloft causes the birth defect in question. See Carol Louik, Angela E. Lin, Martha M. Werler, Sonia Hernández-Díaz, and Allen A. Mitchell, “First-Trimester Use of Selective Serotonin-Reuptake Inhibitors and the Risk of Birth Defects,” 356 New Engl. J. Med. 2675 (2007). The authors of the Louik study were quite clear that they were not able to rule out chance as a sufficient explanation for the observed data in their study:

“The previously unreported associations we identified warrant particularly cautious interpretation. In the absence of preexisting hypotheses and the presence of multiple comparisons, distinguishing random variation from true elevations in risk is difficult. Despite the large size of our study overall, we had limited numbers to evaluate associations between rare outcomes and rare exposures. We included results based on small numbers of exposed subjects in order to allow other researchers to compare their observations with ours, but we caution that these estimates should not be interpreted as strong evidence of increased risks.24

Slip op. at 10 (quoting from the Louik study).

Judge Bernstein thus criticized Freeman for failing to account for chance in explaining his putative association between maternal Zoloft use and infant omphalocele. The appropriate and generally accepted methodology for accomplishing this step of evaluating a putative association is to consider whether the association is statistically significant at the conventional level.
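Why multiple comparisons matter to this assessment can be shown with a short calculation. The sketch below assumes independent tests at the conventional 0.05 level, which real study outcomes only approximate:

```python
# Family-wise chance of at least one nominally "significant" result when
# n independent comparisons are each tested at alpha = 0.05:
#   P(at least one chance finding) = 1 - (1 - alpha) ** n
alpha = 0.05
for n in (1, 10, 20, 50):
    p_any = 1 - (1 - alpha) ** n
    print(f"{n:3d} comparisons -> P(at least one chance 'finding') = {p_any:.2f}")
```

A study making 20 comparisons has roughly a two-in-three chance of turning up at least one nominally significant association by luck alone, which is why the Louik authors cautioned against treating their previously unreported associations as strong evidence.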

In relying heavily upon the Louik study, Dr. Freeman opened himself up to serious methodological criticism. Judge Bernstein’s opinion stands for the important proposition that courts should not be unduly impressed with nominal statistical significance in the presence of multiple comparisons and very broad confidence intervals:

“The Louik study is the only study to report a statistically significant association between Zoloft and omphalocele. Louik’s confidence interval which ranges between 1.6 and 20.7 is exceptionally broad. … The Louik study had only 3 exposed subjects who developed omphalocele thus limiting its statistical power. Studies that rely on a very small number of cases can present a random statistically unstable clustering pattern that may not replicate the reality of a larger population. The Louik authors were unable to rule out confounding or chance. The results have never been replicated concerning omphalocele. Dr. Freeman’s testimony does not explain, or seemingly even consider these serious limitations.”

Slip op. at 8. Evaluating the statistical precision of the point estimate of risk, including whether the authors conducted multiple comparisons and whether the observed confidence intervals were very broad, is part of the generally accepted epidemiologic methodology, which Freeman flouted:

“Generally accepted methodology considers statistically significant replication of study results in different populations because apparent associations may reflect flaws in methodology.”

Slip op. at 9. Apart from Louik, the studies that Freeman cited and apparently relied upon failed to report statistically significant associations between sertraline (Zoloft) and omphalocele. Judge Bernstein found this lack to be a serious problem for Freeman and his epidemiologic opinion:

“While non-significant results can be of some use, despite a multitude of subsequent studies which isolated omphalocele, there is no study which replicates or supports Dr. Freeman’s conclusions.”

Slip op. at 10. The lack of statistical significance, in the context of repeated attempts to find it, helped sink Freeman’s proffered testimony.
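The multiple-comparisons concern flagged in the Louik study’s own caveat can be made concrete with simple arithmetic: when a study tests many drug–defect combinations, the chance of at least one nominally “significant” result at the conventional 0.05 level climbs quickly even if no true association exists. A minimal sketch (the figure of 40 comparisons is purely illustrative, not taken from the Louik study):

```python
# Probability of at least one false-positive "significant" finding
# when k independent comparisons are each tested at alpha = 0.05.
# The choice of k = 40 is illustrative only, not a figure from Louik.

def prob_any_false_positive(k, alpha=0.05):
    """Chance that at least one of k independent null tests
    crosses the nominal significance threshold by luck alone."""
    return 1 - (1 - alpha) ** k

for k in (1, 10, 40):
    print(f"{k:2d} comparisons -> {prob_any_false_positive(k):.0%}")
# prints:  1 comparisons -> 5%
#         10 comparisons -> 40%
#         40 comparisons -> 87%
```

This is why a single nominally significant result buried in a study that scanned many exposure–outcome pairs carries far less evidential weight than the same p-value from a single pre-specified hypothesis.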

Then Rule Out Bias and Confounding

As noted, Freeman relied heavily upon the Louik study, which was the only study to report a nominally statistically significant risk ratio for maternal Zoloft use and infant omphalocele. The Louik study, by its design, however, could not exclude chance or confounding as a full explanation for the apparent association, and Judge Bernstein chastised Dr. Freeman for overselling the study as support for the plaintiffs’ causal claim:

“The Louik authors were unable to rule out confounding or chance. The results have never been replicated concerning omphalocele. Dr. Freeman’s testimony does not explain, or seemingly even consider these serious limitations.”

Slip op. at 8.

And Only Then Consider the Bradford Hill Factors

Even when an association is clear cut, and beyond what we can likely attribute to chance, generally accepted methodology requires the epidemiologist to consider the Bradford Hill factors. As Judge Bernstein explains, generally accepted methodology for assessing causality in this area requires a proper consideration of Hill’s factors before a conclusion of causation is reached:

“As the Bradford-Hill factors are properly considered, causality becomes a matter of the epidemiologist’s professional judgment.”

Slip op. at 7.

Consistency or Replication

The nine Hill factors are well known to lawyers because they have been stated and discussed extensively in Hill’s original article, and in references such as the Reference Manual on Scientific Evidence. Not all the Hill factors are equally important, or important at all, but one that clearly is important is consistency or concordance of results among the available epidemiologic studies. Stated alternatively, a clear-cut association unlikely to be explained by chance is certainly interesting and probative, but it raises an important methodological question — can the result be replicated? Judge Bernstein restated this important Hill factor as an important determinant of whether a challenged expert witness employed a generally accepted method:

“Generally accepted methodology considers statistically significant replication of study results in different populations because apparent associations may reflect flaws in methodology.”

Slip op. at 10.

“More significantly neither Reefhuis nor Alwan reported statistically significant associations between Zoloft and omphalocele. While non-significant results can be of some use, despite a multitude of subsequent studies which isolated omphalocele, there is no study which replicates or supports Dr. Freeman’s conclusions.”

Slip op. at 10.

Replication But Without Double Dipping the Data

Epidemiologic studies are sometimes updated and extended with additional follow up. An expert witness who wished to skate over the replication and consistency requirement might be tempted, as was Dr. Freeman, to treat the earlier and later iterations of the same basic study as “replication.” The Louik study was indeed updated and extended this year in a published paper by Jennita Reefhuis and colleagues.[2] Proper methodology, however, prohibits double-dipping the data by counting a later study that subsumes the earlier one as a “replication”:

“Generally accepted methodology considers statistically significant replication of study results in different populations because apparent associations may reflect flaws in methodology. Dr. Freeman claims the Alwan and Reefhuis studies demonstrate replication. However, the population Alwan studied is only a subset of the Reefhuis population and therefore they are effectively the same.”

Slip op. at 10.

The Lumping Fallacy

Analyzing the health outcome of interest at the right level of specificity can sometimes be a puzzle and a challenge, but Freeman generally got it wrong by opportunistically “lumping” disparate outcomes together when doing so helped him get a result he liked. Judge Bernstein admonishes:

“Proper methodology further requires that one not fall victim to … the ‘Lumping Fallacy’. … Different birth defects should not be grouped together unless they [are] a part of the same body system, share a common pathogenesis or there is a specific valid justification or necessity for an association and chance, bias, and confounding have been eliminated.”

Slip op. at 7. Dr. Freeman lumped a lot, but Judge Bernstein saw through the methodological ruse. As Judge Bernstein pointed out:

“Dr. Freeman’s analysis improperly conflates three types of data: Zoloft and omphalocele, SSRI’s generally and omphalocele, and SSRI’s and gastrointestinal and abdominal malformations.”

Slip op. at 8. Freeman’s approach, which sadly is seen frequently in pharmaceutical and other products liability cases, is methodologically improper:

“Generally accepted causation criteria must be based on the data applicable to the specific birth defect at issue. Dr. Freeman improperly lumps together disparate birth defects.”

Slip op. at 11.

Class Effect Fallacy

Another kind of improper lumping results from treating all SSRI antidepressants the same, either to lump them together or to pick and choose from among all the SSRIs the data points that are supportive of the plaintiffs’ claims (while ignoring those SSRI data points not supportive of the claims). To be sure, the SSRI antidepressants do form a “class,” in that they all have a similar pharmacologic effect. The SSRIs, however, do not all achieve their effect in the serotonergic neurons the same way; nor do they all have the same “off-target” effects. Treating all the SSRIs as interchangeable for a claimed adverse effect, without independent support for this treatment, is known as the class effect fallacy. In Judge Bernstein’s words:

“Proper methodology further requires that one not fall victim to the ‘Class Effect Fallacy’ … . A class effect cannot be assumed. The causation conclusion must be drug specific.”

Slip op. at 7. Dr. Freeman’s analysis improperly conflated Zoloft data with SSRI data generally. Slip op. at 8. Assuming what you set out to demonstrate is, of course, a fine way to go methodologically into the ditch:

“Without significant independent scientific justification it is contrary to generally accepted methodology to assume the existence of a class effect. Dr. Freeman lumps all SSRI drug results together and assumes a class effect.”

Slip op. at 10.

SPECIFIC CAUSATION SCREW UPS

Dr. Freeman was also offered by plaintiffs to provide a specific causation opinion – that Mrs. Porter’s use of Zoloft in pregnancy caused her child’s omphalocele. Freeman claimed to have performed a differential diagnosis or etiology or something to rule out alternative causes.

Genetics

In the field of birth defects, one possible cause looming in any given case is an inherited or spontaneous genetic mutation. Freeman purported to have considered and ruled out genetic causes, which he acknowledged make up a substantial percentage of all omphalocele cases. Bo Porter, Mrs. Porter’s son, was tested for known genetic causes, and Freeman argued that this allowed him to “rule out” genetic causes. But the current state of the art in genetic testing allowed only for identifying a small number of possible genetic causes, and Freeman failed to explain how he might have ruled out the as-yet unidentified genetic causes of birth defects:

“Dr. Freeman fails to properly rule out genetic causes. Dr. Freeman opines that 45-49% of omphalocele cases are due to genetic factors and that the remaining 50-55% of cases are due to non-genetic factors. Dr. Freeman relies on Bo Porter’s genetic testing which did not identify a specific genetic cause for his injury. However, minor plaintiff has not been tested for all known genetic causes. Unknown genetic causes of course cannot yet be tested. Dr. Freeman has made no analysis at all, only unwarranted assumptions.”

Slip op. at 15-16. Judge Bernstein reviewed Freeman’s attempted analysis and ruling out of potential causes, and found that it departed from the generally accepted methodology in conducting differential etiology. Slip op. at 17.

Timing Errors

One feature of putative teratogenicity is that an embryonic exposure must take place at a specific gestational developmental time in order to have its claimed deleterious effect. As Judge Bernstein pointed out, omphalocele results from an incomplete folding of the abdominal wall during the third to fifth weeks of gestation. Mrs. Porter, however, did not begin taking Zoloft until her seventh week of pregnancy, which left Dr. Freeman without any explanation of how Zoloft contributed to the claimed causation of the minor plaintiff’s birth defect. Slip op. at 14. This aspect of Freeman’s specific causation analysis was glaringly defective, and clearly not the sort of generally accepted methodology for attributing a birth defect to a teratogen.

******************************************************

All in all, Judge Bernstein’s opinion is a tour de force demonstration of how a state court judge, in a so-called Frye jurisdiction, can show that failure to employ generally accepted methods renders an expert witness’s opinions inadmissible. There is one small problem in statistical terminology.

Statistical Power

Judge Bernstein states, at different places, that the Louik study was and was not statistically significant for Zoloft and omphalocele. The court’s opinion ultimately explains that the nominal statistical significance was vitiated by multiple comparisons and an extremely broad confidence interval, which more than justified its statement that the study was not truly statistically significant. At one point, however, Judge Bernstein chose to describe the problem with the Louik study as a lack of statistical power:

“Equally significant is the lack of power concerning the omphalocele results. The Louik study had only 3 exposed subjects who developed omphalocele thus limiting its statistical power.”

Slip op. at 8. The adjusted odds ratio for Zoloft and omphalocele was 5.7, with a 95% confidence interval of 1.6 – 20.7. Power was not the issue: if the odds ratio were otherwise credible, free from bias, confounding, and chance, the study had the power to observe an increased risk of close to 500%, which met the pre-stated level of significance. The problems, rather, were multiple testing, fragile and imprecise results, and the inability to evaluate the odds ratio fully for bias and confounding.
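The imprecision, as opposed to any power problem, can be read directly off the reported interval. A minimal sketch, working backward from the published bounds, assuming the interval was computed as a conventional Wald interval on the log-odds scale (an assumption; the study’s exact method may differ):

```python
import math

# Louik's adjusted odds ratio for Zoloft and omphalocele, as quoted
# in the opinion: OR = 5.7, 95% CI 1.6 to 20.7.
lower, upper = 1.6, 20.7

# Fold-range of the interval: how many times larger the upper
# bound is than the lower bound.
fold_range = upper / lower

# Implied standard error of the log odds ratio, assuming a
# conventional 95% Wald interval on the log scale.
se_log_or = math.log(upper / lower) / (2 * 1.96)

print(f"fold range of CI: {fold_range:.1f}x")      # prints 12.9x
print(f"implied SE of log(OR): {se_log_or:.2f}")   # prints 0.65
```

A confidence interval spanning nearly a thirteen-fold range of risk, driven by only three exposed cases, is exactly the sort of “exceptionally broad” interval the court flagged; the defect is imprecision and instability, not an inability to detect an effect.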


 

[1] Mark I. Bernstein, “Expert Testimony in Pennsylvania,” 68 Temple L. Rev. 699 (1995); Mark I. Bernstein, “Jury Evaluation of Expert Testimony under the Federal Rules,” 7 Drexel L. Rev. 239 (2014-2015).

[2] Jennita Reefhuis, Owen Devine, Jan M Friedman, Carol Louik, Margaret A Honein, “Specific SSRIs and birth defects: bayesian analysis to interpret new data in the context of previous reports,” 351 Brit. Med. J. (2015).