TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

The IARC Process is Broken

May 4th, 2016

Last spring, the International Agency for Research on Cancer (IARC) convened a working group that voted to classify the herbicide glyphosate as “probably carcinogenic to humans.” The vote was followed by IARC’s Press Release, a summary in The Lancet,[1] and the publication of a “monograph,” volume 112 in the IARC series.

IARC classifications of a chemical as “probably” carcinogenic to humans are actually fairly meaningless exercises in semantics, not science. A close reading of the IARC Preamble definition of probable reveals that probable does not mean greater than 50%:

“The terms probably carcinogenic and possibly carcinogenic have no quantitative significance and are used simply as descriptors of different levels of evidence of human carcinogenicity, with probably carcinogenic signifying a higher level of evidence than possibly carcinogenic.”

Despite the vacuity of the IARC’s “probability” determinations, IARC decisions have serious real-world consequences in the realm of regulation and litigation. Monsanto, the manufacturer of glyphosate herbicide, reacted strongly, expressing “outrage” and claiming that the IARC had cherry picked data to reach its conclusion. Jack Kaskey, “Monsanto ‘Outraged’ by Assessment That Roundup Probably Causes Cancer,” 43 Product Safety & Liability Reporter 416 (Mar. 30, 2015).

In the wake of the IARC classification, in the fall of 2015, the United States Environmental Protection Agency (EPA) reviewed the evidence for, and against, glyphosate’s carcinogenicity. The EPA found that the IARC had deliberately failed to consider studies that did not find associations, and that the complete scientific record did not support a conclusion of human carcinogenicity. EPA Report of the Cancer Assessment Review Committee on Glyphosate (Oct. 1, 2015).

For undisclosed reasons, however, the EPA’s report was not made public until a couple of weeks ago, when it appeared briefly on the agency’s website, only to be pulled down after a day or so. See David Schultz, “EPA Panel Finds Glyphosate Not Likely to Cause Cancer,” Product Safety & Liability Reporter (May 3, 2016). No doubt the present Administration viewed a conflict between the EPA and the IARC, and disparaging comments about the IARC’s “process,” as national security issues. At the very least, the Administration would not want to undermine the litigation industry’s reliance upon the IARC’s cherry-picked report.

All joking aside, the incident highlights the problematic nature of the IARC decision process, and the reliance of regulatory agencies on the apparent authority of IARC determinations. The IARC process is toxic and should be remediated.


[1] Kathryn Z Guyton, Dana Loomis, Yann Grosse, Fatiha El Ghissassi, Lamia Benbrahim-Tallaa, Neela Guha, Chiara Scoccianti, Heidi Mattock, Kurt Straif, on behalf of the International Agency for Research on Cancer Monograph Working Group, IARC, Lyon, France, “Carcinogenicity of tetrachlorvinphos, parathion, malathion, diazinon, and glyphosate,” 16 The Lancet Oncology 490 (2015).

The Education of Judge Rufe – The Zoloft MDL

April 9th, 2016

The Honorable Cynthia M. Rufe is a judge on the United States District Court for the Eastern District of Pennsylvania.  Judge Rufe was elected to a judgeship on the Bucks County Court of Common Pleas in 1994, and she was appointed to the federal district court in 2002. Like most state and federal judges, little in her training and experience as a lawyer prepared her to serve as a gatekeeper of complex scientific opinion testimony from expert witnesses.  And yet the statutory code of evidence, and in particular Federal Rules of Evidence 702 and 703, requires her to do just that.

The normal approach to MDL cases is marked by the Field of Dreams: “if you build it, they will come.” Last week, Judge Rufe did something that is unusual in pharmaceutical litigation; she closed the gate and sent everyone home. In re Zoloft Prod. Liab. Litig., MDL NO. 2342, 12-MD-2342, 2016 WL 1320799 (E.D. Pa. April 5, 2016).

Her Honor’s decision was hardly made in haste.  The MDL began in 2012, and proceeded in a typical fashion with case management orders that required the exchange of general causation expert witness reports. The plaintiffs’ steering committee (PSC), acting for the plaintiffs, served the report of only one epidemiologist, Anick Bérard, who took the position that Zoloft causes virtually every major human congenital anomaly known to medicine. The defendants challenged the admissibility of Bérard’s opinions.  After extensive briefings and evidentiary hearings, the trial court found that Bérard’s opinions were riddled with inconsistent assessments of studies, eschewed generally accepted methods of causal inference, ignored contrary evidence, adopted novel, unreliable methods of endorsing “trends” in studies, and failed to address epidemiologic studies that did not support her subjective opinions. In re Zoloft Prods. Liab. Litig., 26 F. Supp. 3d 449 (E.D.Pa.2014). The trial court permitted plaintiffs an opportunity to seek reconsideration of Bérard’s exclusion, which led to the trial court’s reaffirming its previous ruling. In re Zoloft Prods. Liab. Litig., No. 12–md–2342, 2015 WL 314149, at *2 (E.D.Pa. Jan. 23, 2015).

Notwithstanding the PSC’s claims that Bérard was the best qualified expert witness in her field and that she was the only epidemiologist needed to support the plaintiffs’ causal claims, the MDL court indulged the PSC by permitting plaintiffs another bite at the apple.  Over defendants’ objections, the court permitted the PSC to name yet another expert witness, statistician Nicholas Jewell, to do what Bérard had failed to do: proffer an opinion on general causation supported by sound science.  In re Zoloft Prods. Liab. Litig., No. 12–md–2342, 2015 WL 115486, at * 2 (E.D.Pa. Jan. 7, 2015).

As a result of this ruling, the MDL dragged on for over a year, in which time, the PSC served a report by Jewell, and then the defendants conducted a discovery deposition of Jewell, and lodged a new Rule 702 challenge.  Although Jewell brought more statistical sophistication to the task, he could not transmute lead into gold; nor could he support the plaintiffs’ causal claims without committing most of the same fallacies found in Bérard’s opinions.  After another round of Rule 702 briefs and hearings, the MDL court excluded Jewell’s unwarranted causal opinions. In re Zoloft Prods. Liab. Litig., No. 12–md–2342, 2015 WL 7776911 (E.D.Pa. Dec. 2, 2015).

The successive exclusions of Bérard and Jewell left the MDL court in a peculiar position. There were other witnesses, Robert Cabrera, a teratologist, Michael Levin, a molecular biologist, and Thomas Sadler, an embryologist, whose opinions addressed animal toxicologic studies, biological plausibility, and putative mechanisms.  These other witnesses, however, had little or no competence in epidemiology, and they explicitly relied upon Bérard’s opinions with respect to human outcomes.  As a result of Bérard’s exclusion, these witnesses were left free to offer their views about what happens in animals at high doses, or about theoretical mechanisms, but they were unable to address human causation.

Although the PSC had no expert witnesses who could legitimately offer reasonably supported opinions about the causation of human birth defects, the plaintiffs refused to decamp and leave the MDL forum. Faced with the prospect of not trying their cases to juries, the PSC instead tried the patience of the MDL judge. The PSC pulled out all the stops in adducing weak, irrelevant, and invalid evidence to support their claims, sans epidemiologic expertise. The PSC argued that adverse event reports, internal company documents that discussed possible associations, the biological plausibility opinions of Levin and Sadler, the putative mechanism opinions of Cabrera, differential diagnoses offered to support specific causation, and the hip-shot opinions of a former-FDA-commissioner-for-hire, David Kessler, could come together magically to supply sufficient evidence to have their cases submitted to juries. Judge Rufe saw through the transparent effort to manufacture evidence of causation, and granted summary judgment on all remaining Zoloft cases in the MDL. In re Zoloft Prod. Liab. Litig., MDL NO. 2342, 12-MD-2342, 2016 WL 1320799, at *4 (E.D. Pa. April 5, 2016).

After a full briefing and hearing on Bérard’s opinions, a reconsideration of Bérard, a permitted “do over” of general causation with Jewell, and a full briefing and hearing on Jewell’s opinions, the MDL court was able to deal deftly with the snippets of evidence “cobbled together” to substitute for evidence that might support a conclusion of causation. The PSC’s cobbled case was puffed up to give the appearance of voluminous evidence, in 200 exhibits that filled six banker’s boxes.  Id. at *5. The ruse was easily undone; most of the exhibits and purported evidence were obvious rubbish. “The quantity of the evidence is not, however, coterminous with the quality of evidence with regard to the issues now before the Court.” Id. The banker’s boxes contained artifices such as untranslated foreign-language documents, and company documents relating to the development and marketing of the medication. The PSC resubmitted reports from Levin, Cabrera, and Sadler, whose opinions had already been adjudicated to be incompetent, invalid, irrelevant, or inadequate to support general causation.  The PSC pointed to the specific causation opinions of a clinical cardiologist, Ra-Id Abdulla, M.D., who proffered dubious differential etiologies, ruling in Zoloft as a cause of individual children’s birth defects, despite his inability to rule out known and unknown causes in the differential reasoning.  The MDL court, however, recognized that “[a] differential diagnosis assumes that general causation has been established,” id. at *7, and that Abdulla could not bootstrap general causation by purporting to reach a specific causation opinion (even if those specific causation opinions were legitimate).

The PSC submitted the recent consensus statement of the American Statistical Association (ASA)[1], which it misrepresented to be an epidemiologic study.  Id. at *5. The consensus statement makes some pedestrian pronouncements about the difference between statistical and clinical significance, about the need for considerations other than statistical significance in supporting causal claims, and about the lack of bright-line distinctions for statistical significance in assessing causality.  All true, but immaterial to the PSC’s expert witnesses’ opinions, which over-endorsed statistical significance in the few instances in which it was shown, and over-interpreted study data that were based upon data mining and multiple comparisons, in blatant violation of the ASA’s declared principles.

Stretching even further for “human evidence,” the PSC submitted documentary evidence of adverse event reports, as though they could support a causal conclusion.[2]  There are about four million live births each year, with an expected rate of serious cardiac malformations of about one per cent.[3]  The prevalence of SSRI anti-depressant use is at least two per cent, which means that we would expect 800 cardiac birth defects each year to occur in children of mothers who took SSRI anti-depressants in the first trimester. If Zoloft had an average market share of about 25 per cent among the SSRIs, then 200 cardiac defects each year would occur in children born to mothers who took Zoloft.  Given that Zoloft has been on the market since the early 1990s, we would expect thousands of children, exposed to Zoloft during embryogenesis, to have been born with cardiac defects, even if there were nothing untoward about maternal exposure to the medication.  Add the stimulated reporting of adverse events from lawyers, lawyer advertising, and lawyer instigation, and you have manufactured evidence not probative of causation at all.[4] The MDL court cut deftly and swiftly through the smoke screen:

“These reports are certainly relevant to the generation of study hypotheses, but are insufficient to create a material question of fact on general causation.”

Id. at *9. The MDL court recognized that epidemiology is very important in discerning a causal connection between a common exposure and a common outcome, especially when the outcome has an expected rate in the general population. The MDL court stopped short of holding that epidemiologic evidence was required (which on the facts of the case would have been amply justified), but instead rested its ratio decidendi on the need to account for the extant epidemiology that contradicted or failed to support the strident and subjective opinions of the plaintiffs’ expert witnesses. The MDL court thus gave plaintiffs every benefit of the doubt by limiting its holding on the need for epidemiology to:

“when epidemiological studies are equivocal or inconsistent with a causation opinion, experts asserting causation opinions must thoroughly analyze the strengths and weaknesses of the epidemiological research and explain why that body of research does not contradict or undermine their opinion.”

Id. at *5, quoting from In re Zoloft Prods. Liab. Litig., 26 F. Supp. 3d 449, 476 (E.D. Pa. 2014).
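The expected-background arithmetic sketched a few paragraphs above can be laid out explicitly. This is only a back-of-the-envelope sketch using the post’s round figures; the one per cent malformation rate, two per cent SSRI prevalence, and 25 per cent Zoloft market share are illustrative assumptions, not precise estimates:

```python
# Expected annual cardiac birth defects among SSRI- and Zoloft-exposed
# births, assuming NO causal effect at all -- i.e., the background rate
# applied to the exposed population, using the post's round figures.
live_births = 4_000_000        # approximate U.S. live births per year
cardiac_defect_rate = 0.01     # ~1% serious cardiac malformations
ssri_prevalence = 0.02         # ~2% first-trimester SSRI use (assumed)
zoloft_market_share = 0.25     # ~25% share of the SSRI market (assumed)

ssri_exposed_defects = live_births * cardiac_defect_rate * ssri_prevalence
zoloft_exposed_defects = ssri_exposed_defects * zoloft_market_share

print(round(ssri_exposed_defects))    # 800 expected per year, all SSRIs
print(round(zoloft_exposed_defects))  # 200 expected per year, Zoloft alone
```

On these numbers, two decades of marketing would yield thousands of exposed children with cardiac defects even if the drug did nothing, which is why a pile of adverse event reports, standing alone, proves no excess.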

The MDL court also saw through the thin veneer of respectability of the testimony of David Kessler, a former FDA commissioner who helped make large fortunes for some of the members of the PSC by the feeding frenzy he created with his moratorium on silicone gel breast implants.  Even viewing Kessler’s proffered testimony in the most charitable light, the court recognized that he offered little support for a causal conclusion other than to delegate the key issues to epidemiologists. Id. at *9. As for the boxes of regulatory documents, foreign labels, and internal company memoranda, the MDL court found that these documents did not raise a genuine issue of material fact concerning general causation:

“Neither these documents, nor draft product documents or foreign product labels containing language that advises use of birth control by a woman taking Zoloft constitute an admission of causation, as opposed to acknowledging a possible association.”

Id.

In the end, the MDL court found that the PSC’s many banker boxes of paper contained too much of nothing for the issue at hand.  Having put the defendants through the time and expense of litigating and re-litigating these issues, nothing short of dismissing the pending cases was a fair and appropriate outcome to the Zoloft MDL.

_______________________________________

Given the denouement of the Zoloft MDL, it is worth considering the MDL judge’s handling of the scientific issues raised, misrepresented, argued, or relied upon by the parties.  Judge Rufe was required, by Rules 702 and 703, to roll up her sleeves and assess the methodological validity of the challenged expert witnesses’ opinions.  That Her Honor was able to do this is a testament to her hard work. Zoloft was not Judge Rufe’s first MDL, and she clearly learned a lot from her previous judicial assignment to an MDL for Avandia personal injury actions.

On May 21, 2007, the New England Journal of Medicine published online a seriously flawed meta-analysis of cardiovascular disease outcomes and rosiglitazone (Avandia) use.  See Steven E. Nissen, M.D., and Kathy Wolski, M.P.H., “Effect of Rosiglitazone on the Risk of Myocardial Infarction and Death from Cardiovascular Causes,” 356 New Engl. J. Med. 2457 (2007).  The Nissen article did not appear in print until June 14, 2007, but the first lawsuits resulted within a day or two of the in-press version. The lawsuits soon thereafter reached a critical mass, with the inevitable creation of a federal court Multi-District Litigation.

Within a few weeks of Nissen’s article, the Annals of Internal Medicine published an editorial by Cynthia Mulrow and other editors, which questioned the Nissen meta-analysis[5], and introduced an article that attempted to replicate Nissen’s work[6].  The attempted replication showed that the only way Nissen could have obtained his nominally statistically significant result was to have selected a method, Peto’s fixed-effect method, known to be biased when used with clinical trials with uneven arms. Random-effects methods, more appropriate for the clinically heterogeneous trials, consistently failed to replicate the Nissen result. Other statisticians weighed in and pointed out that using the risk difference made much more sense when there were multiple trials with zero events in one or the other or both arms. Trials with zero cardiovascular events in both arms represented important evidence of low, but equal, risk of heart attacks, which should be captured in an appropriate analysis.  When the risk-difference approach was used, with exact statistical methods, there was no statistically significant increase in risk in the dataset used by Nissen.[7] Other scientists, including some of Nissen’s own colleagues at the Cleveland Clinic, and John Ioannidis, weighed in to note how fragile and insubstantial the Nissen meta-analysis was[8]:

“As rosiglitazone case demonstrates, minor modifications of the meta-analysis protocol can change the statistical significance of the result.  For small effects, even the direction of the treatment effect estimate may change.”
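The sensitivity described in the quoted passage is easy to illustrate. The following toy sketch uses wholly hypothetical trial counts and a crude sample-size-weighted pooling, not the exact method of Tian et al.; it shows only the structural point that ratio-based pooling silently drops double-zero trials, while a risk-difference analysis lets them pull the pooled estimate toward the null:

```python
# Hypothetical trials: (events_treatment, n_treatment, events_control, n_control)
trials = [
    (2, 200, 1, 200),   # a few events in each arm
    (1, 150, 0, 150),   # zero events in the control arm
    (0, 300, 0, 300),   # zero events in BOTH arms
    (0, 250, 0, 250),   # zero events in BOTH arms
]

# Odds-ratio-based pooling has no usable estimate from a double-zero
# trial, so such trials are dropped from the analysis entirely:
or_usable = [t for t in trials if t[0] + t[2] > 0]

# A (crude) sample-size-weighted risk difference keeps every trial; the
# double-zero trials contribute risk differences of exactly zero:
def pooled_risk_difference(ts):
    weights = [nt + nc for _, nt, _, nc in ts]
    rds = [et / nt - ec / nc for et, nt, ec, nc in ts]
    return sum(w * rd for w, rd in zip(weights, rds)) / sum(weights)

print(len(or_usable))                     # 2 of the 4 trials survive
print(pooled_risk_difference(or_usable))  # ≈ 0.0057, double-zero trials excluded
print(pooled_risk_difference(trials))     # ≈ 0.0022, pulled toward the null
```

The exact inference procedure of Tian et al. is more sophisticated than this weighting, but the direction of the effect is the same: discarding the zero-event trials overstates the pooled risk.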

Nissen achieved his political objective with his shaky meta-analysis.  The FDA convened an Advisory Committee meeting, which in turn resulted in a negative review of the safety data, and the FDA’s imposition of warnings and a Risk Evaluation and Mitigation Strategy, which all but prohibited use of rosiglitazone.[9]  A clinical trial, RECORD, had already started, with support from the drug sponsor, GlaxoSmithKline, which fortunately was allowed to continue.

On a parallel track to the regulatory activities, the federal MDL, headed by Judge Rufe, proceeded to motions and a hearing on GSK’s Rule 702 challenge to plaintiffs’ evidence of general causation. The federal MDL trial judge denied GSK’s motions to exclude plaintiffs’ causation witnesses in an opinion that showed significant diffidence in addressing scientific issues.  In re Avandia Marketing, Sales Practices and Product Liability Litigation, 2011 WL 13576, *12 (E.D. Pa. 2011).  See “Learning to Embrace Flawed Evidence – The Avandia MDL’s Daubert Opinion” (Jan. 10, 2011).

After Judge Rufe denied GSK’s challenges to the admissibility of plaintiffs’ expert witnesses’ causation opinions in the Avandia MDL, the RECORD trial was successfully completed and published.[10]  RECORD was a long-term, prospectively designed, randomized cardiovascular trial in over 4,400 patients, followed for an average of 5.5 years.  The trial was designed with a non-inferiority endpoint of ruling out a 20% increased risk when compared with standard-of-care diabetes treatment. The trial achieved its endpoint, with a hazard ratio of 0.99 (95% confidence interval, 0.85–1.16) for cardiovascular hospitalization and death. A readjudication of outcomes by the Duke Clinical Research Institute confirmed the published results.
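The non-inferiority logic here reduces to a single comparison: the upper bound of the 95% confidence interval for the hazard ratio must fall below the pre-specified margin. A minimal sketch with the RECORD figures quoted above:

```python
# RECORD-style non-inferiority check: "ruling out a 20% increased risk"
# means the upper 95% confidence bound on the hazard ratio must sit
# below the pre-specified margin of 1.20.
hazard_ratio = 0.99              # point estimate reported by RECORD
ci_lower, ci_upper = 0.85, 1.16  # reported 95% confidence interval
margin = 1.20                    # non-inferiority margin (20% increase)

non_inferior = ci_upper < margin
print(non_inferior)              # True: 1.16 < 1.20, so the endpoint was met
```

Note what the trial did and did not show: it ruled out a hazard ratio of 1.20 or greater, with a point estimate essentially at the null; it did not, and could not, prove the risk to be exactly zero.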

On Nov. 25, 2013, after convening another Advisory Committee meeting, the FDA announced the removal of most of its restrictions on Avandia:

“Results from [RECORD] showed no elevated risk of heart attack or death in patients being treated with Avandia when compared to standard-of-care diabetes drugs. These data do not confirm the signal of increased risk of heart attacks that was found in a meta-analysis of clinical trials first reported in 2007.”

FDA Press Release, “FDA requires removal of certain restrictions on the diabetes drug Avandia” (Nov. 25, 2013). And in December 2015, the FDA abandoned its requirement of a Risk Evaluation and Mitigation Strategy for Avandia. FDA, “Rosiglitazone-containing Diabetes Medicines: Drug Safety Communication – FDA Eliminates the Risk Evaluation and Mitigation Strategy (REMS)” (Dec. 16, 2015).

GSK’s vindication came too late to reverse Judge Rufe’s decision in the Avandia MDL.  GSK spent over six billion dollars on resolving Avandia claims.  And to add to the company’s chagrin, GSK lost patent protection for Avandia in April 2012.[11]

Something good, however, may have emerged from the Avandia litigation debacle.  Judge Rufe heard from plaintiffs’ expert witnesses in Avandia about the hierarchy of evidence, about how observational studies must be evaluated for bias and confounding, about the importance of statistical significance, and about how studies that lack power to find relevant associations may still yield conclusions with appropriate meta-analysis. Important nuances of meta-analysis methodology may have gotten lost in the kerfuffle, but given that plaintiffs had reasonable quality clinical trial data, Avandia plaintiffs’ counsel could eschew their typical reliance upon weak and irrelevant lines of evidence, based upon case reports, adverse event disproportional reporting, and the like.

The Zoloft litigation introduced Judge Rufe to a more typical pharmaceutical litigation. Because the outcomes of interest were birth defects, there were no clinical trials.  To be sure, there were observational epidemiologic studies, but now the defense expert witnesses were carefully evaluating the studies for bias and confounding, and the plaintiffs’ expert witnesses were double counting studies and ignoring multiple comparisons and validity concerns.  Once again, in the Zoloft MDL, plaintiffs’ expert witnesses made their non-specific complaints about “lack of power” (without ever specifying the relevant alternative hypothesis), but it was the defense expert witnesses who cited relevant meta-analyses that attempted to do something about the supposed lack of power. Plaintiffs’ expert witnesses inconsistently argued “lack of power” to disregard studies that had outcomes that undermined their opinions, even when those studies had narrow confidence intervals surrounding values at or near 1.0.

The Avandia litigation laid the foundation for Judge Rufe’s critical scrutiny by exemplifying the nature and quantum of evidence to support a reasonable scientific conclusion.  Notwithstanding the mistakes made in the Avandia litigation, this earlier MDL created an invidious distinction with the Zoloft PSC’s evidence and arguments, which looked as weak and insubstantial as they really were.


[1] Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” The American Statistician, available online (Mar. 7, 2016), in-press at DOI:10.1080/00031305.2016.1154108, <http://dx.doi.org/10.1080/>. See “The American Statistical Association’s Statement on and of Significance” (Mar. 17, 2016); “The ASA’s Statement on Statistical Significance – Buzzing from the Huckabees” (Mar. 19, 2016).

[2] See 21 C.F.R. § 314.80 (a) Postmarketing reporting of adverse drug experiences (defining “[a]dverse drug experience” as “[a]ny adverse event associated with the use of a drug in humans, whether or not considered drug related”).

[3] See Centers for Disease Control and Prevention, “Birth Defects Home Page” (last visited April 8, 2016).

[4] See, e.g., Derrick J. Stobaugh, Parakkal Deepak, & Eli D. Ehrenpreis, “Alleged isotretinoin-associated inflammatory bowel disease: Disproportionate reporting by attorneys to the Food and Drug Administration Adverse Event Reporting System,” 69 J. Am. Acad. Dermatol. 393 (2013) (documenting stimulated reporting from litigation activities).

[5] Cynthia D. Mulrow, John Cornell & A. Russell Localio, “Rosiglitazone: A Thunderstorm from Scarce and Fragile Data,” 147 Ann. Intern. Med. 585 (2007).

[6] George A. Diamond, Leon Bax & Sanjay Kaul, “Uncertain Effects of Rosiglitazone on the Risk for Myocardial Infarction and Cardiovascular Death,” 147 Ann. Intern. Med. 578 (2007).

[7] Tian, et al., “Exact and efficient inference procedure for meta-analysis and its application to the analysis of independent 2 × 2 tables with all available data but without artificial continuity correction,” 10 Biostatistics 275 (2008).

[8] Adrian V. Hernandez, Esteban Walker, John P.A. Ioannidis,  and Michael W. Kattan, “Challenges in meta-analysis of randomized clinical trials for rare harmful cardiovascular events: the case of rosiglitazone,” 156 Am. Heart J. 23, 28 (2008).

[9] Janet Woodcock, FDA Decision Memorandum (Sept. 22, 2010).

[10] Philip D. Home, et al., “Rosiglitazone evaluated for cardiovascular outcomes in oral agent combination therapy for type 2 diabetes (RECORD): a multicentre, randomised, open-label trial,” 373 Lancet 2125 (2009).

[11] “Pharmacovigilantism – Avandia Litigation” (Nov. 27, 2013).

Lipitor MDL Cuts the Fat Out of Specific Causation

March 25th, 2016

Ms. Juanita Hempstead was diagnosed with hyperlipidemia in March 1998. Over a year later, in June 1999, with her blood lipids still elevated, her primary care physician prescribed 20 milligrams of atorvastatin per day. Ms. Hempstead did not start taking the statin regularly until July 2000. In September 2002, her lipids were under control, her blood glucose was abnormally high, and she had gained 13 pounds since she was first prescribed a statin medication. Hempstead v. Pfizer, Inc., 2:14–cv–1879, MDL No. 2:14–mn–02502–RMG, 2015 WL 9165589, at *2-3 (D.S.C. Dec. 11, 2015) (C.M.O. No. 55 in In re Lipitor Marketing, Sales Practices and Products Liability Litigation) [cited as Hempstead]. In the fall of 2003, Hempstead experienced abdominal pain, and she stopped taking the statin for a few weeks, presumably because of a concern over potential liver toxicity. Her cessation of the statin led to an increase in her blood fat, but her blood sugar remained elevated, although not in the range that would have been diagnostic of diabetes. In May 2004, about five years after starting on statin medication, having gained 15 pounds since 1999, Ms. Hempstead was diagnosed with type II diabetes mellitus. Id.

Living in a litigious society, and being bombarded with messages from the litigation industry, Ms. Hempstead sued the manufacturer of atorvastatin, Pfizer, Inc. In support of her litigation claim, Hempstead’s lawyers enlisted the support of Elizabeth Murphy, M.D., D.Phil., a Professor of Clinical Medicine, and Chief of Endocrinology and Metabolism at San Francisco General Hospital. Id. at *6. Dr. Murphy received her doctorate in biochemistry from Oxford University, and her medical degree from the Harvard Medical School. Despite her graduations from elite educational institutions, Dr. Murphy never learned the distinction between ex ante risk and assignment of causality in an individual patient.

Dr. Murphy claimed that atorvastatin causes diabetes, and that the medication caused Ms. Hempstead’s diabetes in 2004. Murphy pointed to a five-part test for her assessment of specific causation:

(1) reports or reliable studies of diabetes in patients taking atorvastatin;

(2) causation is biologically plausible;

(3) diabetes appeared in the patient after starting atorvastatin;

(4) the existence of other possible causes of the patient’s diabetes; and

(5) whether the newly diagnosed diabetes was likely caused by the atorvastatin.

Id. In response to this proffered testimony, the defendant, Pfizer, Inc., challenged the admissibility of Dr. Murphy’s opinion under Federal Rule of Evidence 702.

The trial court, in reviewing Pfizer’s challenge, saw that Murphy’s opinion essentially was determined by (1), (2), and (3), above. In other words, once Murphy had become convinced of general causation, she was willing to causally attribute diabetes to atorvastatin in every patient who developed diabetes after starting to take the medication. Id. at *6-7.

Dr. Murphy relied upon some epidemiologic studies that suggested a relative risk of diabetes of about 1.5 in patients who had taken atorvastatin. Id. at *5, *8. Unfortunately, the trial court, as is all too common among judges writing Rule 702 opinions, failed to provide citations to the materials upon which plaintiff’s expert witness relied. A safe bet, however, is that those studies, if they had any internal and external validity at all, involved multivariate analyses of risk ratios for diabetes at time t1, in patients who had no diabetes before starting use of atorvastatin at time t0, compared with patients who did not have diabetes at t0 but never took the statin. If so, then Dr. Murphy’s use of a temporal relationship between starting atorvastatin and developing diabetes is quite irrelevant, because the relative risk (1.5) relied upon was generated in studies in which that temporality is already present. Ms. Hempstead’s development of diabetes five years after starting atorvastatin does not make her part of a group with a relative risk any higher than the risk ratio of 1.5 cited by Dr. Murphy. Similarly, the absence or presence of putative risk factors other than the accused statin is irrelevant, because the risk ratio of 1.5 was most likely arrived at in studies that controlled or adjusted for the other risk factors by multivariate analysis. Id. at *5 & n. 8.

Dr. Murphy acknowledged that there are known risk factors for diabetes, and that plaintiff Ms. Hempstead had a few. Plaintiff was 55 years old at the time of diagnosis, and advancing age is a risk factor. Plaintiff’s body mass index (BMI) was elevated, and it had increased over the five years since she began taking atorvastatin. Even though Ms. Hempstead was not obese, her BMI was sufficiently high to confer a five-fold increase in risk for diabetes. Id. at *9. Plaintiff also had hypertension and metabolic syndrome, both of which are risk factors (with the latter adding to the level of risk of the former). Id. at *10. Perhaps hoping to avoid the intractable problem of identifying which risk factors were actually at work in Ms. Hempstead to produce her diabetes, Dr. Murphy claimed that all risk factors were causes of plaintiff’s diabetes. Her analysis was thus not so much a differential etiology as a non-differential, non-discriminating assertion that any and all risk factors were probably involved in producing the individual case. Not surprisingly, Dr. Murphy, when pressed, could not identify any professional organizations or peer-reviewed publications that employed such a methodology of attribution. Id. at *6. Dr. Murphy had never used such a method of attribution in her clinical practice; instead she attempted to justify and explain her methodology by adverting to its widespread use by expert witnesses in litigation. Id.

Relative Risk and the Inference of Specific Causation

The main thrust of Dr. Murphy’s and the plaintiff’s specific causation claim seems to have been a simplistic identification of ex ante risk with causation. The MDL court recognized, however, that in science and in law, risk is not the same as causation.[1]

The existence of general causation, with elevated relative risks not likely the result of bias, chance, or confounding, does not necessarily support the inference that every person exposed to the substance or drug and who develops the outcome of interest, had his or her outcome caused by the exposure.

The law requires each plaintiff to show, by a preponderance of the evidence, that his or her alleged injury, the outcome in the relied-upon epidemiologic studies, was actually caused by the alleged exposure. Id. at *4 (citing Guinn v. AstraZeneca Pharm. LP, 602 F.3d 1245, 1249 n. 1 (11th Cir. 2010)).

The disconnect between risk and causation is especially strong when the causation involved results from modification of the incidence rate of a disease as a function of exposure. Although the MDL court did not explicitly note the importance of a base rate, which gives rise to an “expected value” or “expected outcome” in an epidemiologic sample, the court’s insistence upon a relative risk greater than two, from studies of sample groups that are sufficiently similar to the plaintiff, implicitly affirms the principle. The MDL court did, however, call out as logically flawed Dr. Murphy’s reasoning that specific causation exists for every drug-exposed patient, in the face of studies that show general causation with associations of a magnitude less than a risk ratio of two. Id. at *8 (citing Guinn v. AstraZeneca Pharm. LP, 602 F.3d 1245, 1255 (11th Cir. 2010) (“The fact that exposure to [a substance] may be a risk factor for [a disease] does not make it an actual cause simply because [the disease] developed.”)).

The MDL court acknowledged the obvious, that some causal relationships may be based upon risk ratios of two or less (but greater than 1.0). Id. at *4. A risk ratio greater than 1.0, but not greater than two, can result only when some of the cases with the outcome of interest, here diabetes, would have occurred anyway in the population that has been sampled. And with risk ratios at two or less, half or more of the exposed cases would have developed the outcome even in the absence of the exposure of interest. With this in mind, the MDL court asked how plaintiff could show specific causation, even assuming that general causation were established with the use of epidemiologic methods.

The court in Hempstead reasoned that if the risk ratio were greater than 2.0, a majority of the cases arising in the exposed sample would have developed the outcome of interest because of the exposure being studied. Id. at *5. If the sampled population has had the same level of exposure as the plaintiff, then a case-specific inference of specific causation is supported.[2] Of course, this inferential strategy presupposes that general causation has been established, by ruling out bias, confounding, and chance, with high-quality, statistically significant findings of risk ratios in excess of 2.0. Id. at *5.
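The doubling-of-the-risk logic can be reduced to simple arithmetic. Under the assumptions that bias, confounding, and chance have been excluded, and that the excess risk is distributed evenly across the exposed group, the probability that a given exposed case was caused by the exposure is the attributable fraction among the exposed, (RR − 1)/RR. A minimal sketch, with illustrative risk ratios:

```python
def probability_of_causation(rr: float) -> float:
    """Attributable fraction among the exposed, (RR - 1) / RR.

    Assumes general causation is established and the excess risk
    is distributed evenly across the exposed group.
    """
    if rr <= 1.0:
        return 0.0  # no excess risk to attribute
    return (rr - 1.0) / rr

# Only above RR = 2.0 does the attributable fraction exceed 50%,
# the preponderance-of-the-evidence threshold.
for rr in (1.5, 2.0, 3.0):
    print(f"RR = {rr}: P(causation) = {probability_of_causation(rr):.3f}")
```

At RR = 1.5, two thirds of the exposed cases would have occurred anyway; only when RR exceeds 2.0 does the attributable fraction cross the 50% line.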

To be sure, there are some statisticians, such as Sander Greenland, who have criticized this use of a sample metric to assess the probability of individual causation, in part because the sample metric is an average level of risk, based upon the whole sample. Greenland is fond of speculating that the risk may not be stochastically distributed, but as the Supreme Court has recently acknowledged, there are times when the use of an average is appropriate to describe individuals within a sampled population. Tyson Foods, Inc. v. Bouaphakeo, No. 14-1146, 2016 WL 1092414 (U.S. S. Ct. Mar. 22, 2016).

The Whole Tsumish

Dr. Murphy, recognizing that there are other known and unknown causes and risk factors for diabetes, made a virtue of foolish consistency by opining that all risk factors present in Ms. Hempstead were involved in producing her diabetes. Dr. Murphy did not, and could not, explain, however, how or why she believed that every risk factor (age, BMI, hypertension, recent weight gain, metabolic syndrome, etc.), rather than some subset of factors, or some idiopathic factors, were involved in producing the specific plaintiff’s disease. The MDL court concluded that Dr. Murphy’s opinion was an ipse dixit of the sort that qualified her opinion for exclusion from trial. Id. at *10.

Biological Fingerprints

Plaintiffs posited typical arguments about “fingerprints” or biological markers that would support inferences of specific causation in the absence of high relative risks, but as is often the case with such arguments, they had no factual foundation for their claims that atorvastatin causes diabetes. Neither Dr. Murphy nor anyone else had ever identified a biological marker that allowed drug-exposed patients with diabetes to be identified as having had their diabetes actually caused by the drug of interest, as opposed to other known or unknown causes.

With Dr. Murphy’s testimony failing to satisfy common sense and Rule 702, plaintiff relied upon cases in which circumstances permitted inferences of specific causation from temporal relationships between exposure and outcome. In one such case, the plaintiff developed throat irritation from very high levels of airborne industrial talc exposure, which abated upon cessation of exposure, and returned with renewed exposure. Given that general causation was conceded, and given the natural-experiment character of challenge, dechallenge, and rechallenge, the Fourth Circuit in this instance held that the temporal relationship of an acute insult and onset was an adequate basis for expert witness opinion testimony on specific causation. Id. at *11 (citing Westberry v. Gislaved Gummi AB, 178 F.3d 257, 265 (4th Cir. 1999) (“depending on the circumstances, a temporal relationship between exposure to a substance and the onset of a disease or a worsening of symptoms can provide compelling evidence of causation”); Cavallo v. Star Enter., 892 F. Supp. 756, 774 (E.D. Va. 1995) (discussing unique, acute onset of symptoms caused by chemicals)). In the Hempstead case, however, the very nature of the causal relationship claimed did not involve an acute reaction. The claimed injury, diabetes, emerged five years after statin use commenced, and the epidemiologic studies relied upon were all based upon this chronic use, with a non-acute, latent outcome. The trial judge thus would not credit the mere temporality between drug use and new onset of diabetes as probative of anything.


[1] Id. at *8, citing Guinn v. AstraZeneca Pharm. LP, 602 F.3d 1245, 1255 (11th Cir.2010) (“The fact that exposure to [a substance] may be a risk factor for [a disease] does not make it an actual cause simply because [the disease] developed.”); id. at *11, citing McClain v. Metabolife Int’l, Inc., 401 F.3d 1233, 1243 (11th Cir.2005) (“[S]imply because a person takes drugs and then suffers an injury does not show causation. Drawing such a conclusion from temporal relationships leads to the blunder of the post hoc ergo propter hoc fallacy.”); see also Roche v. Lincoln Prop. Co., 278 F.Supp. 2d 744, 752 (E.D. Va.2003) (“Dr. Bernstein’s reliance on temporal causation as the determinative factor in his analysis is suspect because it is well settled that a causation opinion based solely on a temporal relationship is not derived from the scientific method and is therefore insufficient to satisfy the requirements of Rule 702.”) (internal quotes omitted).

[2] See Reference Manual on Scientific Evidence at 612 (3d ed. 2011) (noting “the logic of the effect of doubling of the risk”); see also Marder v. G.D. Searle & Co., 630 F. Supp. 1087, 1092 (D. Md. 1986) (“In epidemiological terms, a two-fold increased risk is an important showing for plaintiffs to make because it is the equivalent of the required legal burden of proof – a showing of causation by the preponderance of the evidence or, in other words, a probability of greater than 50%.”).

The ASA’s Statement on Statistical Significance – Buzzing from the Huckabees

March 19th, 2016

People say crazy things. In a radio interview, Evangelical Michael Huckabee argued that the Kentucky civil clerk who refused to issue a marriage license to a same-sex couple was as justified in defying an unjust court decision as people are justified in disregarding Dred Scott v. Sandford, 60 U.S. 393 (1857), which Huckabee described as still the “law of the land.”1 Chief Justice Roger B. Taney would be proud of Huckabee’s use of faux history, precedent, and legal process to argue his cause. Definition of “huckabee”: a bogus factoid.

Consider the case of Sander Greenland, who attempted to settle a score with an adversary’s expert witness, who had opined in 2002 that Bayesian analyses were rarely used at the FDA for reviewing new drug applications. The adversary’s expert witness obviously got Greenland’s knickers in a knot, because Greenland wrote an article, in a law review of all places, in which he presented his attempt to “correct the record” and show how the statement of the opposing expert witness was “ludicrous.”2 To support his indictment on charges of ludicrousness, Greenland ignored the FDA’s actual behavior in reviewing new drug applications,3 and looked at the practice of the Journal of Clinical Oncology, a clinical journal that publishes 24 issues a year, with occasional supplements. Greenland found the word “Bayesian” 50 times in over 40,000 journal pages, and declared victory. According to Greenland, “several” (unquantified) articles had used Bayesian methods to explore, post hoc, statistically nonsignificant results.4

Given Greenland’s own evidence, the posterior odds that Greenland was correct in his charges seem to be disturbingly low, but he might have looked at the published papers that conducted more serious, careful surveys of the issue.5 This week, the Journal of the American Medical Association published yet another study by John Ioannidis and colleagues, which documented actual practice in the biomedical literature. And no surprise, Bayesian methods barely register in a systematic survey of the last 25 years of published studies. See David Chavalarias, Joshua David Wallach, Alvin Ho Ting Li, John P. A. Ioannidis, “Evolution of reporting P values in the biomedical literature, 1990-2015,” 315 J. Am. Med. Ass’n 1141 (2016). See also Demetrios N. Kyriacou, “The Enduring Evolution of the P Value,” 315 J. Am. Med. Ass’n 1113 (2016) (“Bayesian methods are not frequently used in most biomedical research analyses.”).

So what are we to make of Greenland’s animadversions in a law review article? It was a huckabee moment.

Recently, the American Statistical Association (ASA) issued a statement on the use of statistical significance and p-values. In general, the statement was quite moderate, and declined to move in the radical directions urged by some statisticians who attended the ASA’s meeting on the subject. Despite the ASA’s moderation, the ASA’s statement has been met with huckabee-like nonsense and hyperbole. One author, a pharmacologist trained at the University of Washington, with post-doctoral training at the University of California, Berkeley, and an editor of PLoS Biology, was moved to write:

“However, the ASA notes, the importance of the p-value has been greatly overstated and the scientific community has become over-reliant on this one – flawed – measure.”

Lauren Richardson, “Is the p-value pointless?” (Mar. 16, 2016). And yet, nowhere in the ASA’s statement does the group suggest that the p-value is a “flawed” measure. Richardson suffered a lapse and wrote a huckabee.

Not surprisingly, lawyers attempting to spin the ASA’s statement have unleashed entire hives of huckabees in an attempt to deflate the methodological points made by the ASA. Here is one example of a litigation-industry lawyer who argues that the American Statistical Association Statement shows the irrelevance of statistical significance for judicial gatekeeping of expert witnesses:

“To put it into the language of Daubert, debates over ‘p-values’ might be useful when talking about the weight of an expert’s conclusions, but they say nothing about an expert’s methodology.”

Max Kennerly, “Statistical Significance Has No Place In A Daubert Analysis” (Mar. 13, 2016) [cited as Kennerly].

But wait; the expert witness must be able to rule out chance, bias and confounding when evaluating a putative association for causality. As Austin Bradford Hill explained, even before assessing a putative association for causality, scientists need first to have observations that

“reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance.”

Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 295 (1965) (emphasis added).

The analysis of random error is an essential step in the methodological process. Simply because a proper methodology requires consideration of non-statistical factors does not remove the statistical from the methodology. Ruling out chance as a likely explanation is a crucial first step in the methodology for reaching a causal conclusion when there is an “expected value” or base rate for the outcome of interest in the population being sampled.
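The role of the base rate in ruling out chance can be illustrated with a toy calculation. The numbers below are entirely hypothetical, not drawn from any study or case discussed here: given an expected number of cases derived from the background rate, an exact binomial tail probability asks how often sampling variability alone would produce at least the observed count.

```python
from math import comb

def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance that sampling
    variability alone yields at least k cases."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical numbers: a 10% base rate predicts 20 cases among
# 200 exposed subjects; suppose 31 cases are actually observed.
p_value = binom_sf(31, 200, 0.10)
print(f"P(at least 31 cases by chance) = {p_value:.4f}")
```

A small tail probability here rules out chance as a likely explanation of the excess over the expected value; it says nothing, by itself, about bias or confounding, which is precisely why the statistical step is a first step and not the whole methodology.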

Kennerly shakes his hive of huckabees:

“The erroneous belief in an ‘importance of statistical significance’ is exactly what the American Statistical Association was trying to get rid of when they said, ‘The widespread use of “statistical significance” (generally interpreted as p ≤ 0.05) as a license for making a claim of a scientific finding (or implied truth) leads to considerable distortion of the scientific process.’”

And yet, the ASA never urged that scientists “get rid of” statistical analyses and assessments of attained levels of significance probability. To be sure, they cautioned against overinterpreting p-values, especially in the context of multiple comparisons, non-prespecified outcomes, and the like. The ASA criticized bright-line rules, which are often used by litigation-industry expert witnesses to over-endorse the results of studies with p-values less than 5%, often in the face of multiple comparisons, cherry-picked outcomes, and poorly and incompletely described methods and results. What the ASA described as a “considerable distortion of the scientific process” was claiming scientific truth on the basis of “p < 0.05.” As Bradford Hill pointed out in 1965, a clear-cut association, beyond that which we would care to attribute to chance, is the beginning of the analysis of an association for causality, not the end of it. Kennerly ignores who is claiming “truth” in the litigation context.  Defense expert witnesses frequently are opining no more than “not proven.” The litigation industry expert witnesses must opine that there is causation, or else they are out of a job.

The ASA explained that the distortion of the scientific process comes from making a claim of a scientific conclusion of causality or its absence, when the appropriate claim is “we don’t know.” The ASA did not say, suggest, or imply that a claim of causality can be made in the absence of a finding of statistical significance, validation of the statistical model on which it is based, and other factors as well. The ASA certainly did not say that the scientific process will be served well by reaching conclusions of causation without statistical significance. What is clear is that statistical significance should not be an abridgment of a much more expansive process. Reviewing the annals of the International Agency for Research on Cancer (even in its currently politicized state), or the Institute of Medicine, an honest observer would be hard pressed to come up with examples of associations for outcomes that have known base rates, which associations were determined to be causal in the absence of studies that exhibited statistical significance, along with many other indicia of causality.

Some other choice huckabees from Kennerly:

“It’s time for courts to start seeing the phrase ‘statistically significant’ in a brief the same way they see words like ‘very,’ ‘clearly,’ and ‘plainly’. It’s an opinion that suggests the speaker has strong feelings about a subject. It’s not a scientific principle.”

Of course, this ignores the central limit theorems, the importance of random sampling, the pre-specification of hypotheses and level of Type I error, and the like. Stuff and nonsense.

And then in a similar vein, from Kennerly:

“The problem is that many courts have been led astray by defendants who claim that ‘statistical significance’ is a threshold that scientific evidence must pass before it can be admitted into court.”

In my experience, litigation-industry lawyers oversell statistical significance rather than defense counsel who may question reliance upon studies that lack it. Kennerly’s statement is not even wrong, however, because defense counsel knowledgeable of the rules of evidence would know that statistical studies themselves are rarely admitted into evidence. What is admitted, or not, is the opinion of expert witnesses, who offer opinions about whether associations are causal, or not causal, or inconclusive.


1 Ben Mathis-Lilley, “Huckabee Claims Black People Aren’t Technically Citizens During Critique of Unjust Laws,” The Slatest (Sept. 11 2015) (“[T]he Dred Scott decision of 1857 still remains to this day the law of the land, which says that black people aren’t fully human… .”).

2 Sander Greenland, “The Need for Critical Appraisal of Expert Witnesses in Epidemiology and Statistics,” 39 Wake Forest Law Rev. 291, 306 (2004). See “The Infrequency of Bayesian Analyses in Non-Forensic Court Decisions” (Feb. 16, 2014).

3 To be sure, eight years after Greenland published this diatribe, the agency promulgated a guidance that set recommended practices for Bayesian analyses in medical device trials. FDA Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials (February 5, 2010); 75 Fed. Reg. 6209 (February 8, 2010); see also Laura A. Thompson, “Bayesian Methods for Making Inferences about Rare Diseases in Pediatric Populations” (2010); Greg Campbell, “Bayesian Statistics at the FDA: The Trailblazing Experience with Medical Devices” (Presentation given by the Director, Division of Biostatistics, Center for Devices and Radiological Health, at Rutgers Biostatistics Day, April 3, 2009). Even today, Bayesian analysis remains uncommon at the U.S. FDA.

4 39 Wake Forest Law Rev. at 306-07 & n.61 (citing only one paper, Lisa Licitra et al., Primary Chemotherapy in Resectable Oral Cavity Squamous Cell Cancer: A Randomized Controlled Trial, 21 J. Clin. Oncol. 327 (2003)).

5 See, e.g., J. Martin Bland & Douglas G. Altman, “Bayesians and frequentists,” 317 Brit. Med. J. 1151, 1151 (1998) (“almost all the statistical analyses which appear in the British Medical Journal are frequentist”); David S. Moore, “Bayes for Beginners? Some Reasons to Hesitate,” 51 The Am. Statistician 254, 254 (“Bayesian methods are relatively rarely used in practice”); J.D. Emerson & Graham Colditz, “Use of statistical analysis in the New England Journal of Medicine,” in John Bailar & Frederick Mosteller, eds., Medical Uses of Statistics 45 (1992) (surveying 115 original research studies for statistical methods used; no instances of Bayesian approaches counted); Douglas Altman, “Statistics in Medical Journals: Developments in the 1980s,” 10 Statistics in Medicine 1897 (1991); B.S. Everitt, “Statistics in Psychiatry,” 2 Statistical Science 107 (1987) (finding only one use of Bayesian methods in 441 papers with statistical methodology).

Birth Defects Case Exceeds NY Court of Appeals’ Odor Threshold

March 14th, 2016

The so-called “weight of the evidence” (WOE) approach by expert witnesses has largely been an argument for subjective weighting of studies and cherry picking of data to reach a favored, pre-selected conclusion. The approach is so idiosyncratic and amorphous that it really is no method at all, which is exactly why it seems to have been embraced by the litigation industry and its cadre of expert witnesses.

The WOE enjoyed some success in the First Circuit’s Milward decision, with much harrumphing from the litigation industry and its proxies, but more recently courts have mostly seen through the ruse and employed their traditional screening approaches to exclude opinions that deviate from the relevant standard of care of scientific opinion testimony.[1]

In Reeps, the plaintiff child was born with cognitive and physical defects, which his family claimed resulted from his mother’s inhalation of gasoline fumes in her allegedly defective BMW. To support their causal claims, the Reeps proffered the opinions of two expert witnesses, Linda Frazier and Shira Kramer, on both general and specific causation of the child’s conditions. The defense presented reports from Anthony Scialli and Peter Lees.

Justice York, of the Supreme Court for New York County, sustained defendants’ objections to the admissibility of Frazier and Kramer’s opinions, in a careful opinion that dissected the general and specific causation opinions that invoked WOE methods. Reeps v. BMW of North America, LLC, 2012 NY Slip Op 33030(U), N.Y.S.Ct., Index No. 100725/08 (New York Cty. Dec. 21, 2012) (York, J.), 2012 WL 6729899, aff’d on rearg., 2013 WL 2362566.

The First Department of the Appellate Division affirmed Justice York’s exclusionary ruling and then certified the appellate question to the New York Court of Appeals. 115 A.D.3d 432, 981 N.Y.S.2d 514 (2013).[2] Last month, the New York high court affirmed in a short opinion that focused on the plaintiff’s claim that Mrs. Reeps must have been exposed to a high level of gasoline (and its minor constituents, such as benzene) because she experienced symptoms such as dizziness while driving the car. Sean R. v. BMW of North America, LLC, ___ N.E.3d ___, 2016 WL 527107, 2016 N.Y. Slip Op. 01000 (2016).[3]

The car in question was a model that BMW had recalled for a gasoline line leak, and so there was no serious question that there had been some gasoline exposure to the plaintiff’s mother, and thus perhaps to the plaintiff in utero. According to the Court of Appeals, the plaintiff’s expert witness Frazier concluded that the gasoline fume exposures to the car occupants exceeded 1,000 parts per million (ppm) because studies showed that symptoms of acute toxicity were reported when exposures reached or exceeded 1,000 ppm. The mother of the car’s owner claimed to suffer dizziness and nausea when riding in the car, and Frazier inferred from these symptoms, self-reported in litigation, that the plaintiff’s mother also sustained gasoline exposures in excess of 1,000 ppm. From this inference about level of exposure, Frazier then proceeded to use the “Bradford Hill criteria” to opine that unleaded gasoline vapor is capable of causing the claimed birth defects based upon “the link between exposure to the constituent chemicals and adverse birth outcomes.” And then, using the wizardry of differential etiology, Frazier was able to conclude that the mother’s first-trimester exposure to gasoline fumes was the probable cause of plaintiff’s birth defects.

There was much wrong with Frazier’s opinions, as detailed in the trial court’s decision, but for reasons unknown, the Court of Appeals chose to focus on Frazier’s symptom-threshold analysis. The high court provided no explanation of how Frazier applied the Bradford Hill criteria, or of her downward extrapolation from high-exposure benzene or solvent birth defect studies to a gasoline-exposure case, in which benzene and solvents are only minor constituents of the gasoline. There is no description from the Court of what a “link” might be, or how it is related to a cause; nor is there any discussion of how Frazier might have excluded the most likely cause of birth defects: the unknown. The Court also noted that plaintiff’s expert witness Kramer had employed a WOE-ful analysis, but it provided no discussion of what was amiss with Kramer’s opinion. A curious reader might think that the Court had overlooked and dismissed “sound science,” but Justice York’s trial court opinion fully addressed the inadequacies of these other opinions.

The Court of Appeals acknowledged that “odor thresholds” can be helpful in estimating a plaintiff’s level of exposure to a potentially toxic chemical, but it noted that there was no generally accepted exposure assessment methodology that connected the report of an odor to adverse pregnancy outcomes.

Frazier, however, had not adverted to an odor threshold, but a symptom threshold. In support, Frazier pointed to three things:

  1. A report of the American Conference of Governmental Industrial Hygienists (ACGIH) (not otherwise identified), which synthesized the results of controlled studies, and reported a symptom threshold for “mild toxic effects” of about 1,000 ppm;
  2. A 1991 study (not further identified) that purportedly showed a dose-response between exposures to ethanol and toluene and headaches; and
  3. A 2008 report (again not further identified) that addressed the safety of n-Butyl alcohol in cosmetic products.

Item (2) seems irrelevant at best, given that ethanol and toluene are, again, minor components of gasoline, and that the exposure levels in the study are not given. Item (3) again seems off the mark because the Court’s description does not allude to any symptom threshold; nor is there any attempt to tie exposure levels of n-Butyl alcohol to the experienced levels of gasoline in the Reeps case.

With respect to item (1), which supposedly had reported that if exposure exceeded 1,000 ppm, then headaches and nausea can occur acutely, the Court asserted that the ACGIH report did not support an inverse inference, that if headaches and nausea had occurred, then exposures exceeded 1,000 ppm.

It is true that the conditional (if exposure exceeds 1,000 ppm, then symptoms occur) does not logically support its converse (if symptoms occurred, then exposure exceeded 1,000 ppm), but the claimed symptoms, their onset and abatement, and the lack of other known precipitating causes would seem to provide some evidence for exposures above the symptom threshold. Rather than engaging with the lack of scientific evidence on the claimed causal connection between gasoline and birth defects, however, the Court invoked the lack of general acceptance of the “symptom-threshold” methodology to dispose of the case.
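The invalid inverse inference is, at bottom, a matter of Bayes’ rule: a high probability of symptoms given high exposure tells us little about the probability of high exposure given symptoms, because the answer turns on base rates. A sketch with entirely hypothetical numbers (none of these values comes from the record):

```python
# All inputs are hypothetical, for illustration only.
p_high = 0.01            # assumed prior probability of >1,000 ppm exposure
p_sym_given_high = 0.90  # assumed chance of dizziness/nausea at high exposure
p_sym_given_low = 0.20   # assumed chance of the same symptoms from other causes

# Total probability of reporting symptoms.
p_sym = p_sym_given_high * p_high + p_sym_given_low * (1 - p_high)

# Bayes' rule: posterior probability of high exposure given symptoms.
p_high_given_sym = p_sym_given_high * p_high / p_sym
print(f"P(high exposure | symptoms) = {p_high_given_sym:.3f}")
```

On these assumptions, a 90% forward probability inverts to a posterior of only about 4%; the symptom-threshold inference is only as strong as the base rates behind it.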

In its short opinion, the Court of Appeals did not address the quality, validity, or synthesis of studies urged by plaintiff’s expert witnesses; nor did it address whether the acute symptoms, such as nausea, claimed by the plaintiff’s grandmother or his mother rose to a level that might be relevant to causing embryological injury. Had it done so, the Court would have retraced the path of Justice York, in the trial court, who saw through the ruse of WOE and the blatantly false claim that the scientific evidence even came close to satisfying the Bradford Hill factors. Furthermore, the Court might have found that the defense expert witnesses were entirely consistent with the Centers for Disease Control:

“The hydrocarbons found in gasoline can cross the placenta. There is no direct evidence that maternal exposure to gasoline causes fetotoxic or teratogenic effects. Gasoline is not included in Reproductive and Developmental Toxicants, a 1991 report published by the U.S. General Accounting Office (GAO) that lists 30 chemicals of concern because of widely acknowledged reproductive and developmental consequences.”

Agency for Toxic Substances and Disease Registry, “Medical Management Guidelines for Gasoline” (Oct. 21, 2014, last updated) (“Toxic Substances Portal – Gasoline, Automotive”); Agency for Toxic Substances and Disease Registry, “Public Health Statement for Automotive Gasoline” (June 1995) (“There is not enough information available to determine if gasoline causes birth defects or affects reproduction.”); see also National Institute for Occupational Safety & Health, Occupational Exposure to Refined Petroleum Solvents: Criteria for a Recommended Standard (1977).


[1] See, e.g., In re Denture Cream Prods. Liab. Litig., 795 F. Supp. 2d 1345, 1367 (S.D. Fla. 2011), aff’d, Chapman v. Procter & Gamble Distrib., LLC, 766 F.3d 1296 (11th Cir. 2014). See also “Fixodent Study Causes Lockjaw in Plaintiffs’ Counsel” (Feb. 4, 2015); “WOE-fully Inadequate Methodology – An Ipse Dixit By Another Name” (May 1, 2012); “I Don’t See Any Method At All” (May 2, 2013).

[2] “New York Breathes Life Into Frye Standard – Reeps v. BMW” (March 5, 2013); “As They WOE, So No Recovery Have the Reeps” (May 22, 2013).

[3] See Sean T. Stadelman, “Symptom Threshold Methodology Rejected by Court of Appeals of New York Pursuant to Frye” (Feb. 18, 2016).

Systematic Reviews and Meta-Analyses in Litigation, Part 2

February 11th, 2016

Daubert in Knee’d

In a recent federal court case, adjudicating a plaintiff’s Rule 702 challenge to defense expert witnesses, the trial judge considered plaintiff’s claim that the challenged witness had deviated from the PRISMA guidelines[1] for systematic reviews, and thus presumably had deviated from the standard of care required of expert witnesses giving opinions about causal conclusions.

Batty v. Zimmer, Inc., MDL No. 2272, Master Docket No. 11 C 5468, No. 12 C 6279, 2015 WL 5050214 (N.D. Ill. Aug. 25, 2015) [cited as Batty I]. The trial judge, the Hon. Rebecca R. Pallmeyer, denied plaintiff’s motion to exclude the allegedly deviant witness, but appeared to accept the premise of the plaintiff’s argument that an expert witness’s opinion should be reached in the manner of a carefully constructed systematic review.[2] The trial court’s careful review of the challenged witness’s report and deposition testimony revealed that there had been no meaningful departure from the standards put forward for systematic reviews. See “Systematic Reviews and Meta-Analyses in Litigation” (Feb. 5, 2016).

Two days later, the same federal judge addressed a different set of objections by the same plaintiff to two other of the defendant’s, Zimmer Inc.’s, expert witnesses, Dr. Stuart Goodman and Dr. Timothy Wright. Batty v. Zimmer, Inc., MDL No. 2272, Master Docket No. 11 C 5468, No. 12 C 6279, 2015 WL 5095727, (N.D. Ill. Aug. 27, 2015) [cited as Batty II]. Once again, plaintiff Batty argued for the necessity of adherence to systematic review principles. According to Batty, Dr. Wright’s opinion, based upon his review of the clinical literature, was scientifically and legally unreliable because he had not conducted a proper systematic review. Plaintiff alleged that Dr. Wright’s review selectively “cherry picked” favorable studies to buttress his opinion, in violation of systematic review guidelines. The trial court, which had assumed that a systematic review was the appropriate “methodology” for Dr. Vitale, in Batty I, refused to sustain the plaintiff’s challenge in Batty II, in large part because the challenged witness, Dr. Wright, had not claimed to have performed a systematic or comprehensive review, and so his failure to follow the standard methodology did not require the exclusion of his opinion at trial. Batty II at *3.

The plaintiff never argued that Dr. Wright misinterpreted any of his selected studies upon which he relied, and the trial judge thus suggested that Dr. Wright’s discussion of the studies, even if a partial, selected group of studies, would be helpful to the jury. The trial court thus left the plaintiff to her cross-examination to highlight Dr. Wright’s selectivity and lack of comprehensiveness. Apparently, in the trial judge’s view, this expert witness’s failure to address contrary studies did not render his testimony unreliable under “Daubert scrutiny.” Batty II at *3.

Of course, it is no longer the Daubert judicial decision that mandates scrutiny of expert witness opinion testimony, but Federal Rule of Evidence 702. Perhaps it was telling that when the trial court backed away from its assumption, made in Batty I, that guidelines or standards for systematic reviews should inform a Rule 702 analysis, the court cited Daubert, a judicial opinion superseded by an Act of Congress, in 2000. The trial judge’s approach, in Batty II, threatens to make gatekeeping meaningless by deferring to the expert witness’s invocation of personal, idiosyncratic, non-scientific standards. Furthermore, the Batty II approach threatens to eviscerate gatekeeping for clinical practitioners who remain blithely unaware of advances in epidemiology and evidence-based medicine. The upshot of Batty I and II combined seems to be that systematic review principles apply to clinical expert witnesses only if those witnesses choose to be bound by such principles. If this is indeed what the trial court intended, then it is jurisprudential nonsense.

The trial court, in Batty II, exercised a more searching approach, however, to Dr. Wright’s own implant failure analysis, which he relied upon in an attempt to rebut plaintiff’s claim of defective design. The plaintiff claimed that the load-bearing polymer surfaces of the artificial knee implant experienced undue deformation. Dr. Wright’s study found little or no deformation on the load bearing polymer surfaces of the eight retrieved artificial joints. Batty II at *4.

Dr. Wright assessed deformation qualitatively, not quantitatively, through the use of a “colorimetric map of deformation” of the polymer surface. Dr. Wright, however, provided no scale to define or assess how much deformation was represented by the different colors in his study. Notwithstanding the lack of any metric, Dr. Wright concluded that his findings, based upon eight retrieved implants, “suggested” that the kind of surface failure claimed by plaintiff was a “rare event.”

The trial court had little difficulty in concluding that Dr. Wright’s evidentiary base was insufficient, as was his presentation of the study’s data and inferences. The challenged witness failed to explain how his conclusions followed from his data, and thus his proffered testimony fell into the “ipse dixit” category of inadmissible opinion testimony. General Electric Co. v. Joiner, 522 U.S. 136, 146 (1997). In the face of the challenge to his opinions, Dr. Wright supplemented his retrieval study with additional scans of surficial implant wear patterns, but he failed again to show the similarity between the use and failure conditions in the patients from whom these implants were retrieved and those in the plaintiff’s case (which supposedly involved aseptic loosening). Furthermore, Dr. Wright’s interpretation of his own retrieval study was inadequate in the trial court’s view because he had failed to rule out other modes of implant failure, in which the polyethylene surface would have been preserved. Because, even as supplemented, Dr. Wright’s study failed to support his proffered opinions, the court held that his opinions based upon his retrieval study had to be excluded under Rule 702. The trial court did not address the Rule 703 implications of Dr. Wright’s reliance upon a study that was poorly designed and explained, and that lacked the ability to support his contention that the claimed mode of implant failure was a “rare” event. Batty II at *4–5.


[1] See David Moher, Alessandro Liberati, Jennifer Tetzlaff, Douglas G. Altman, & The PRISMA Group, “Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement,” 6 PLoS Med e1000097 (2009) [PRISMA].

[2] Batty v. Zimmer, Inc., MDL No. 2272, Master Docket No. 11 C 5468, No. 12 C 6279, 2015 WL 5050214 (N.D. Ill. Aug. 25, 2015).

Systematic Reviews and Meta-Analyses in Litigation

February 5th, 2016

Kathy Batty is a bellwether plaintiff in a multi-district litigation[1] (MDL) against Zimmer, Inc., in which hundreds of plaintiffs claim that Zimmer’s NexGen Flex implants are prone to premature aseptic loosening of their femoral and tibial elements (loosening independent of any infection). Batty v. Zimmer, Inc., MDL No. 2272, Master Docket No. 11 C 5468, No. 12 C 6279, 2015 WL 5050214 (N.D. Ill. Aug. 25, 2015) [cited as Batty].

PRISMA Guidelines for Systematic Reviews

Zimmer proffered Dr. Michael G. Vitale, an orthopedic surgeon with a master’s degree in public health, to testify that, in his opinion, Batty’s causal claims were unfounded. Batty at *4. Dr. Vitale prepared a Rule 26 report that presented a formal, systematic review of the pertinent literature. Batty at *3. Plaintiff Batty challenged the admissibility of Dr. Vitale’s opinion on grounds that his purportedly “formal systematic literature review,” done for litigation, was biased and unreliable, and not conducted according to generally accepted principles for such reviews. The challenge was framed, cleverly, in terms of Dr. Vitale’s failure to comply with a published set of principles outlined in the “PRISMA” guidelines (Preferred Reporting Items for Systematic Reviews and Meta-Analyses), which enjoy widespread general acceptance among the clinical journals. See David Moher, Alessandro Liberati, Jennifer Tetzlaff, Douglas G. Altman, & The PRISMA Group, “Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement,” 6 PLoS Med e1000097 (2009) [PRISMA]. Batty at *5. The trial judge, Hon. Rebecca R. Pallmeyer, denied plaintiff’s motion to exclude Dr. Vitale, but in doing so accepted, arguendo, the plaintiff’s implicit premise that an expert witness’s opinion should be reached in the manner of a carefully constructed systematic review.

The plaintiff’s invocation of the PRISMA guidelines presented several difficult problems for her challenge and for the court. PRISMA provides a checklist of 27 items for journal editors to assess the quality and completeness of systematic reviews that are submitted for publication. Plaintiff Batty focused on several claimed deviations from the guidelines:

  • “failing to explicitly state his study question,
  • failing to acknowledge the limitations of his review,
  • failing to present his findings graphically, and
  • failing to reproduce his search results.”

Batty’s challenge to Dr. Vitale thus turned on whether Zimmer’s expert witness had failed to deploy the “same level of intellectual rigor” that someone in the world of clinical medicine would [should] have deployed in conducting a similar systematic review. Batty at *6.

Zimmer deflected the challenge, in part by arguing that PRISMA’s guidelines govern the reporting of systematic reviews, and are not necessarily criteria for valid reviews. The trial court accepted this rebuttal, Batty at *7, but missed the point that some of the guidelines call for methods that are essential for rigorous, systematic reviews in any forum, and do not merely specify “publishability.” To be sure, PRISMA itself does not always distinguish between what is essential for journal publication and what is needed for a sufficiently valid systematic review. The guidelines, for instance, call for graphical displays, but in litigation, charts, graphs, and other demonstratives often are not produced until the eve of trial, when case management orders call for the parties to exchange such materials. In any event, Dr. Vitale’s omission of graphical representations of his findings was consistent with his finding that the studies were too clinically heterogeneous in study design, follow-up time, and pre-specified outcomes to permit nice, graphical summaries. Batty at *7-8.

Similarly, the PRISMA guidelines call for a careful specification of the clinical question to be answered, but in litigation, the plaintiff’s causal claims frame the issue to be addressed by the defense expert witness’s literature review. The trial court readily found that Dr. Vitale’s research question was easily discerned from the context of his report in the particular litigation. Batty at *7.

Plaintiff Batty’s challenge pointed to Dr. Vitale’s failure to acknowledge explicitly the limitations of his systematic review, an omission that virtually defines expert witness reports in litigation. Given the availability of discovery tools, such as a deposition of Dr. Vitale (at which he readily conceded the limitations of his review), and the right of confrontation and cross-examination (which are not available, alas, for published articles), the trial court found that this alleged deviation was not particularly relevant to the plaintiff’s Rule 702 challenge. Batty at *8.

Batty further charged that Dr. Vitale had not “reproduced” his own systematic review. Arguing that a systematic review’s results must be “transparent and reproducible,” Batty claimed that Zimmer’s expert witness’s failure to compile a list of studies that were originally retrieved from his literature search deprived her, and the trial court, of the ability to determine whether the search was complete and unbiased. Batty at *8. Dr. Vitale’s search protocol and inclusionary and exclusionary criteria were, however, stated, explained, and reproducible, even though Dr. Vitale did not explain the application of his criteria to each individual published paper. In the final analysis, the trial court was unmoved by Batty’s critique, especially given that her expert witnesses failed to identify any relevant studies omitted from Dr. Vitale’s review. Batty at *8.

Lumping or Commingling of Heterogeneous Studies

The plaintiff pointed to Dr. Vitale’s “commingling” of studies, heterogeneous in terms of “study length, follow-up, size, design, power, outcome, range of motion, component type” and other clinical features, as a deep flaw in the challenged expert witness’s methodology. Batty at *9. Batty’s own retained expert witness, Dr. Kocher, supported Batty’s charge by adverting to the clinical variability in the studies included in Dr. Vitale’s review, and suggesting that “[h]igh levels of heterogeneity preclude combining study results and making conclusions based on combining studies.” Dr. Kocher’s argument was rather beside the point because Dr. Vitale had not impermissibly combined clinically or statistically heterogeneous outcomes.[2] Similarly, the plaintiff’s complaint that Dr. Vitale had used inconsistent criteria of knee implant survival rates was dismissed by the trial court, which easily found Dr. Vitale’s survival criteria both pre-specified and consistent across his review of studies, and relevant to the specific defect alleged by Ms. Batty. Batty at *9.

Cherry Picking

The trial court readily agreed with Plaintiff’s premise that an expert witness who used inconsistent inclusionary and exclusionary criteria would have to be excluded under Rule 702. Batty at *10, citing In re Zoloft, 26 F. Supp. 3d 449, 460–61 (E.D. Pa. 2014) (excluding epidemiologist Dr. Anick Bérard’s proffered testimony because of her biased cherry picking and selection of studies to support her conclusions, and her failure to account for contradictory evidence). The trial court, however, did not find that Dr. Vitale’s review was corrupted by the kind of biased cherry picking that Judge Rufe found to have been committed by Dr. Anick Bérard in the Zoloft MDL.

Duplicitous Duplication

Plaintiff’s challenge of Dr. Vitale did manage to spotlight an error in Dr. Vitale’s inclusion of two studies that were duplicate analyses of the same cohort. Apparently, Dr. Vitale had mistakenly treated the studies as involving different cohorts because the two papers reported different sample sizes. Dr. Vitale admitted that his double counting of the same cohort “got by the peer-review process and it got by my filter as well.” Batty at *11, citing Vitale Dep. 284:3–12. The trial court judged Dr. Vitale’s error to have been:

“an inadvertent oversight, not an attempt to distort the data. It is also easily correctable by removing one of the studies from the Group 1 analysis so that instead of 28 out of 35 studies reporting 100% survival rates, only 27 out of 34 do so.”

Batty at *11.

The error of double counting studies in quantitative reviews and meta-analyses has become a prevalent problem in both published studies[3] and litigation reports. Epidemiologic studies are sometimes updated and extended with additional follow-up. The prohibition against double counting data is so obvious that it often is not even identified on checklists, such as PRISMA. Furthermore, double counting of studies, or of subgroups within studies, is a flaw that most careful readers can identify in a meta-analysis, without advanced training. According to statistician Stephen Senn, double counting of evidence is a serious problem in published meta-analytical studies.[4] Senn observes that he had little difficulty in finding examples of meta-analyses gone wrong, including meta-analyses with double counting of studies or data, in some of the leading clinical medical journals. Senn urges analysts to “[b]e vigilant about double counting,” and recommends that journals withdraw meta-analyses promptly when mistakes are found.[5]
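The distortion that Senn warns about can be illustrated with a toy fixed-effect (inverse-variance) meta-analysis. Every number below is hypothetical, invented solely for illustration, and is not drawn from any study or litigation discussed in this post:

```python
import math

def pooled(estimates):
    """Fixed-effect, inverse-variance pooling of (log_rr, se) pairs.
    Returns the pooled log relative risk and its standard error."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    est = sum(w * lr for w, (lr, _) in zip(weights, estimates)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return est, se

# Three hypothetical, independent studies: (log relative risk, standard error).
studies = [(0.10, 0.20), (0.05, 0.25), (0.30, 0.15)]
correct_est, correct_se = pooled(studies)

# The same cohort counted twice: the most precise study appears again,
# as when an original report and its extended follow-up are both included.
double_counted = studies + [(0.30, 0.15)]
dc_est, dc_se = pooled(double_counted)

# Double counting drags the pooled estimate toward the duplicated cohort
# and spuriously narrows the standard error (overstated precision).
print(f"correct pooling: {correct_est:.3f} (SE {correct_se:.3f})")
print(f"double counted:  {dc_est:.3f} (SE {dc_se:.3f})")
```

The duplicated cohort both pulls the pooled estimate toward its own result and shrinks the reported standard error, which is why the methodologists quoted in the Appendix below treat double counting as a validity problem, not a mere reporting lapse.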

An expert witness who wished to skate over the replication and consistency requirement might be tempted, as was Dr. Michael Freeman, to count the earlier and later iterations of the same basic study as “replication.” Proper methodology, however, prohibits double dipping the data by counting the later study, which subsumes the earlier one, as a “replication”:

“Generally accepted methodology considers statistically significant replication of study results in different populations because apparent associations may reflect flaws in methodology. Dr. Freeman claims the Alwan and Reefhuis studies demonstrate replication. However, the population Alwan studied is only a subset of the Reefhuis population and therefore they are effectively the same.”

Porter v. SmithKline Beecham Corp., No. 03275, 2015 WL 5970639, at *9 (Phila. Cty. Ct. C.P. Oct. 5, 2015) (Mark I. Bernstein, J.).

Conclusions

The PRISMA and similar guidelines do not necessarily map the requisites of admissible expert witness opinion testimony, but they are a source of some important considerations for the validity of any conclusion about causality. On the other hand, by specifying the requisites of a good publication, some PRISMA guidelines are irrelevant to litigation reports and testimony of expert witnesses. Although Plaintiff Batty’s challenge overreached and failed, the premise of her challenge is noteworthy, as is the trial court’s having taken the premise seriously. Ultimately, the challenge to Dr. Vitale’s opinion failed because the specified PRISMA guidelines, supposedly violated, were either irrelevant or satisfied.


[1] In re Zimmer NexGen Knee Implant Products Liability Litigation.

[2] Dr. Vitale’s review is thus easily distinguished from what has become commonplace in litigation of birth defect claims, where, for instance, some well-known statisticians [names available upon request] have conducted qualitative reviews and quantitative meta-analyses of highly disparate outcomes, such as any and all cardiovascular congenital anomalies. In one such case, a statistician expert witness hired by plaintiffs presented a meta-analysis that included study results for any nervous system defect, any central nervous system defect, and any neural tube defect, without any consideration of clinical heterogeneity or even overlap among study results.

[3] See, e.g., Shekoufeh Nikfar, Roja Rahimi, Narjes Hendoiee, and Mohammad Abdollahi, “Increasing the risk of spontaneous abortion and major malformations in newborns following use of serotonin reuptake inhibitors during pregnancy: A systematic review and updated meta-analysis,” 20 DARU J. Pharm. Sci. 75 (2012); Roja Rahimi, Shekoufeh Nikfar, and Mohammad Abdollahi, “Pregnancy outcomes following exposure to serotonin reuptake inhibitors: a meta-analysis of clinical trials,” 22 Reproductive Toxicol. 571 (2006); Anick Bérard, Noha Iessa, Sonia Chaabane, Flory T. Muanda, Takoua Boukhris, and Jin-Ping Zhao, “The risk of major cardiac malformations associated with paroxetine use during the first trimester of pregnancy: A systematic review and meta-analysis,” 81 Brit. J. Clin. Pharmacol. (2016), in press, available at doi: 10.1111/bcp.12849.

[4] Stephen J. Senn, “Overstating the evidence – double counting in meta-analysis and related problems,” 9 BMC Medical Research Methodology 10, at *1 (2009).

[5] Id. at *1, *4.


DOUBLE-DIP APPENDIX

Some papers and textbooks, in addition to Stephen Senn’s paper cited above, note the impermissible method of double counting data or studies in quantitative reviews.

Aaron Blair, Jeanne Burg, Jeffrey Foran, Herman Gibb, Sander Greenland, Robert Morris, Gerhard Raabe, David Savitz, Jane Teta, Dan Wartenberg, Otto Wong, and Rae Zimmerman, “Guidelines for Application of Meta-analysis in Environmental Epidemiology,” 22 Regulatory Toxicol. & Pharmacol. 189, 190 (1995).

“II. Desirable and Undesirable Attributes of Meta-Analysis

* * *

Redundant information: When more than one study has been conducted on the same cohort, the later or updated version should be included and the earlier study excluded, provided that later versions supply adequate information for the meta-analysis. Exclusion of, or in rare cases, carefully adjusting for overlapping or duplicated studies will prevent overweighting of the results by one study. This is a critical issue where the same cohort is reexamined or updated several times. Where duplication exists, decision criteria should be developed to determine which of the studies are to be included and which excluded.”

Sander Greenland & Keith O’Rourke, “Meta-Analysis,” Chapter 33, in Kenneth J. Rothman, Sander Greenland, Timothy L. Lash, Modern Epidemiology 652, 655 (3d ed. 2008) (emphasis added)

Conducting a Sound and Credible Meta-Analysis

Like any scientific study, an ideal meta-analysis would follow an explicit protocol that is fully replicable by others. This ideal can be hard to attain, but meeting certain conditions can enhance soundness (validity) and credibility (believability). Among these conditions we include the following:

  • A clearly defined set of research questions to address.

  • An explicit and detailed working protocol.

  • A replicable literature-search strategy.

  • Explicit study inclusion and exclusion criteria, with a rationale for each.

  • Nonoverlap of included studies (use of separate subjects in different included studies), or use of statistical methods that account for overlap.

* * *”

Matthias Egger, George Davey Smith, and Douglas G. Altman, Systematic Reviews in Health Care: Meta-Analysis in Context 59 – 60 (2001).

“Duplicate (multiple) publication bias

***

The production of multiple publications from single studies can lead to bias in a number of ways. Most importantly, studies with significant results are more likely to lead to multiple publications and presentations, which makes it more likely that they will be located and included in a meta-analysis. The inclusion of duplicated data may therefore lead to overestimation of treatment effects, as recently demonstrated for trials of the efficacy of ondansetron to prevent postoperative nausea and vomiting.”

Khalid Khan, Regina Kunz, Joseph Kleijnen, and Gerd Antes, Systematic Reviews to Support Evidence-Based Medicine: How to Review and Apply Findings of Healthcare Research 35 (2d ed. 2011)

“2.3.5 Selecting studies with duplicate publication

Reviewers often encounter multiple publications of the same study. Sometimes these will be exact duplications, but at other times they might be serial publications with the more recent papers reporting increasing numbers of participants or lengths of follow-up. Inclusion of duplicated data would inevitably bias the data synthesis in the review, particularly because studies with more positive results are more likely to be duplicated. However, the examination of multiple reports of the same study may provide useful information about its quality and other characteristics not captured by a single report. Therefore, all such reports should be examined. However, the data should only be counted once using the largest, most complete report with the longest follow-up.”

Julia H. Littell, Jacqueline Corcoran, and Vijayan Pillai, Systematic Reviews and Meta-Analysis 62-63 (2008)

“Duplicate and Multiple Reports

***

It is a bit more difficult to identify multiple reports that emanate from a single study. Sometimes these reports will have the same authors, sample sizes, program descriptions, and methodological details. However, author lines and sample sizes may vary, especially when there are reports on subsamples taken from the original study (e.g., preliminary results or special reports). Care must be taken to ensure that we know which reports are based on the same samples or on overlapping samples—in meta-analysis these should be considered multiple reports from a single study. When there are multiple reports on a single study, we put all of the citations for that study together in summary information on the study.”

Kay Dickersin, “Publication Bias: Recognizing the Problem, Understanding Its Origins and Scope, and Preventing Harm,” Chapter 2, in Hannah R. Rothstein, Alexander J. Sutton & Michael Borenstein, Publication Bias in Meta-Analysis – Prevention, Assessment and Adjustments 11, 26 (2005)

“Positive results appear to be published more often in duplicate, which can lead to overestimates of a treatment effect (Timmer et al., 2002).”

Julian P.T. Higgins & Sally Green, eds., Cochrane Handbook for Systematic Reviews of Interventions 152 (2008)

“7.2.2 Identifying multiple reports from the same study

Duplicate publication can introduce substantial biases if studies are inadvertently included more than once in a meta-analysis (Tramer 1997). Duplicate publication can take various forms, ranging from identical manuscripts to reports describing different numbers of participants and different outcomes (von Elm 2004). It can be difficult to detect duplicate publication, and some ‘detective work’ by the review authors may be required.”

Lipitor MDL Takes The Fat Out Of Dose Extrapolations

December 2nd, 2015

Philippus Aureolus Theophrastus Bombastus von Hohenheim thankfully went by the simple moniker Paracelsus, sort of the Cher of the 1500s. Paracelsus’ astrological research is graciously overlooked today, but his 16th-century dictum, in the German vernacular, has created a lasting impression on linguistic conventions and toxicology:

“Alle Ding’ sind Gift, und nichts ohn’ Gift; allein die Dosis macht, dass ein Ding kein Gift ist.”

(All things are poison and nothing is without poison, only the dose permits something not to be poisonous.)

or more simply

“Die Dosis macht das Gift.”

Paracelsus, “Die dritte Defension wegen des Schreibens der neuen Rezepte,” Septem Defensiones (1538), in 2 Werke 510 (Darmstadt 1965). Today, his notion that the “dose is the poison” is a basic principle of modern toxicology,[1] which can be found in virtually every textbook on the subject.[2]

Paracelsus’ dictum has also permeated the juridical world, and become a commonplace in legal commentary and judicial decisions. The Reference Manual on Scientific Evidence is replete with supportive statements on the general acceptance of Paracelsus’ dictum. The chapter on epidemiology notes:

“The idea that the ‘dose makes the poison’ is a central tenet of toxicology and attributed to Paracelsus, in the sixteenth century… [T]his dictum reflects only the idea that there is a safe dose below which an agent does not cause any toxic effect.”

Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” 549, 603 & n.160, in Reference Manual on Scientific Evidence (3d ed. 2011). Citing an unpublished, non-scientific advocacy piece written for a regulatory agency, the chapter does, however, claim that “[t]he question whether there is a no-effect threshold dose is a controversial one in a variety of toxic substances areas.”[3] The epidemiology chapter thus appears to confuse two logically distinct propositions: that there is no threshold dose and that there is no demonstrated threshold dose.

The Reference Manual’s chapter on toxicology also weighs in on Paracelsus:

“There are three central tenets of toxicology. First, “the dose makes the poison”; this implies that all chemical agents are intrinsically hazardous—whether they cause harm is only a question of dose. Even water, if consumed in large quantities, can be toxic.”

Bernard D. Goldstein & Mary Sue Henifin, “Reference Guide on Toxicology,” 633, 636, in Reference Manual on Scientific Evidence (3d ed. 2011) (internal citations omitted).

Recently, Judge Richard Mark Gergel had the opportunity to explore the relevance of dose-response to plaintiffs’ claims that atorvastatin causes diabetes. In re Lipitor (Atorvastatin Calcium) Marketing, Sales Practices & Prod. Liab. Litig., MDL No. 2:14–mn–02502–RMG, Case Mgmt. Order 49, 2015 WL 6941132 (D.S.C. Oct. 22, 2015) [Lipitor]. Plaintiffs’ expert witnesses insisted that they could disregard dose once they had concluded that there was a causal association between atorvastatin at some dose and diabetes. On Rule 702 challenges to plaintiffs’ expert witnesses, the court held that, when there is a dose-response relationship and an absence of association at low doses, plaintiffs must show, through expert witness testimony, that the medication is capable of causing the alleged harm at particular doses. The court permitted the plaintiffs’ expert witnesses to submit supplemental reports to address the dose issue, and the defendants to renew their Rule 702 challenges after discovery on the new reports. Lipitor at *6.

The Lipitor court’s holding built upon Judge Breyer’s treatment of dose in In re Bextra & Celebrex Mktg. Sales Practices & Prod. Liab. Litig., 524 F. Supp. 2d 1166, 1174–75 (N.D. Cal. 2007). Judge Breyer, Justice Breyer’s kid brother, denied defendants’ Rule 702 challenges to plaintiffs’ expert witnesses who opined that Bextra and Celebrex can cause heart attacks and strokes at 400 mg/day. For plaintiffs who ingested 200 mg/day, however, Judge Breyer held that the lower dose had to be analyzed separately, and he granted the motions to exclude plaintiffs’ expert witnesses’ opinions about the alleged harms caused by the lower dose. Lipitor at *1-2. The plaintiffs’ expert witnesses reached their causation opinions about 200 mg/day by cherry picking from the observational studies, and by disregarding the randomized trials and meta-analyses of observational studies that failed to find an association between 200 mg/day and cardiovascular risk. Id. at *2. Given the lack of support for an association at 200 mg/day, the court rejected the speculative downward extrapolation that the plaintiffs asserted.

Because of dose-response gradients, and the potential for a threshold, a risk estimate based upon greater doses or exposure does not apply to a person exposed at lower doses or exposure levels. See Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” 549, 613, in Reference Manual on Scientific Evidence (3d ed. 2011) (“[A] risk estimate from a study that involved a greater exposure is not applicable to an individual exposed to a lower dose.”).

In the Lipitor case, as in the Celebrex case, multiple studies reported no statistically significant associations between the lower doses and the claimed adverse outcome. This absence, combined with a putative dose-response relationship, made plaintiffs’ downward extrapolation impermissibly speculative. See, e.g., McClain v. Metabolife Int’l, Inc., 401 F.3d 1233, 1241 (11th Cir. 2005) (reversing admission of expert witness’s testimony when the witness conceded a dose-response, but failed to address the dose of the medication needed to cause the claimed harm).
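The logic of these rulings can be sketched with a toy threshold dose-response model. Everything below, the threshold, the slope, and the doses, is hypothetical and chosen only to show why a relative risk observed at a high dose cannot simply be carried down to a lower dose:

```python
def relative_risk(dose_mg):
    """Toy threshold dose-response model: no excess risk below the
    threshold, linearly rising risk above it. The threshold (300 mg)
    and slope are hypothetical, chosen only for illustration."""
    threshold_mg, slope = 300.0, 0.005
    return 1.0 + slope * max(0.0, dose_mg - threshold_mg)

# A study of high-dose users finds a substantially elevated relative risk...
rr_high = relative_risk(700.0)   # 1 + 0.005 * 400 = 3.0

# ...but "extrapolating down" to a low-dose claimant is unwarranted:
rr_low = relative_risk(200.0)    # below the threshold: no excess risk
print(rr_high, rr_low)           # 3.0 1.0
```

Under such a model, the tripled risk seen at the high dose says nothing about the claimant exposed below the threshold, which is the point on which the low-dose opinions in Lipitor and Celebrex foundered.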

Courts sometimes confuse thresholds with dose-response relationships. The concepts are logically independent. There can be a dose-response relationship with or without a threshold. And there can be a threshold without a dose-response relationship, as in cases in which the effect is binary: present at or above some threshold level, and absent below it. A causal claim can go awry because it ignores the possible existence of a threshold, or the existence of a dose-response relationship. The latter error is commonplace in litigation and regulatory contexts, when scientific or legal advocates attempt to evaluate risk using data based upon higher exposures or doses. The late Irving Selikoff, no stranger to exaggerated claims, warned against this basic error when he wrote that his asbestos insulator cancer data were inapposite for describing risks of other tradesmen:

“These particular figures apply to the particular groups of asbestos workers in this study. The net synergistic effect would not have been the same if their smoking habits had been different; and it probably would have been different if their lapsed time from first exposure to asbestos dust had been different or if the amount of asbestos dust they had inhaled had been different.”

E. Cuyler Hammond, Irving J. Selikoff, and Herbert Seidman, “Asbestos Exposure, Cigarette Smoking and Death Rates,” 330 Ann. N.Y. Acad. Sci. 473, 487 (1979).[4] Given that the dose-response relationship between asbestos exposure and disease outcomes was an important tenet of Selikoff’s work, it is demonstrably incorrect for expert witnesses to invoke relative risks for heavily exposed asbestos insulators and apply them to less exposed workers, as though the risks were the same and there were no thresholds.

The legal insufficiency of equating high- and low-dose risk assessments has been noted by many courts. In Texas Independent Ginners Ass’n v. Marshall, 630 F.2d 398 (5th Cir. 1980), the Fifth Circuit reviewed an OSHA regulation promulgated to protect cotton gin operators from the dangers of byssinosis. OSHA based its risk assessments on cotton dust exposures experienced by workers in the fabric manufacturing industry, but the group of workers to be regulated had intermittent exposures, at different levels from those of the workers in the relied-upon studies. Because of the exposure-level disconnect, the Court of Appeals struck down the OSHA regulation. Id. at 409. OSHA’s extrapolation from high to low doses was based upon an assumption, not evidence, and the regulation could not survive even the deferential standard required for judicial review of federal agency action. Id.[5]

The fallacy of “extrapolation down” often turns on the glib assumption that an individual claimant must have experienced a “causative exposure” because he has the disease that can result at some higher level of exposure. Reasoning backwards from untoward outcome to sufficient dose, when dose is at issue, is a petitio principii, as several astute judges have recognized:

“The fallacy of the ‘extrapolation down’ argument is plainly illustrated by common sense and common experience. Large amounts of alcohol can intoxicate, larger amounts can kill; a very small amount, however, can do neither. Large amounts of nitroglycerine or arsenic can injure, larger amounts can kill; small amounts, however, are medicinal. Great volumes of water may be harmful, greater volumes or an extended absence of water can be lethal; moderate amounts of water, however, are healthful. In short, the poison is in the dose.”

In re Toxic Substances Cases, No. A.D. 03-319, No. GD 02-018135, 05-010028, 05-004662, 04-010451, 2006 WL 2404008, at *6-7 (Allegheny Cty. Ct. C.P. Aug. 17, 2006) (Colville, J.) (“Drs. Maddox and Laman attempt to “extrapolate down,” reasoning that if high dose exposure is bad for you, then surely low dose exposure (indeed, no matter how low) must still be bad for you.”) (“simple logical error”), rev’d sub nom. Betz v. Pneumo Abex LLC, 998 A.2d 962 (Pa. Super. 2010), rev’d, 615 Pa. 504, 44 A.3d 27 (2012).

Exposure Quantification

An obvious corollary of the fallacy of downward extrapolation is that claimants must have a reasonable estimate of their dose or exposure in order to place themselves on the dose-response curve, to estimate in turn what their level of risk was before they developed the claimed harm. For example, in Mateer v. U.S. Aluminum Co., 1989 U.S. Dist. LEXIS 6323 (E.D. Pa. 1989), the court, applying Pennsylvania law, dismissed plaintiffs’ claim for personal injuries in a ground-water contamination case. Although the plaintiffs had proffered sufficient evidence of contamination, their expert witnesses failed to quantify the plaintiffs’ actual exposures. Without an estimate of the claimants’ actual exposure, the challenged expert witnesses could not give reliable, reasonably based opinions and conclusions about whether plaintiffs were injured from the alleged exposures. Id. at *9-11.[6]

Science and law are simpatico: dose or exposure matters in pharmaceutical, occupational, and environmental cases.


[1] Joseph F. Borzelleca, “Paracelsus: Herald of Modern Toxicology,” 53 Toxicol. Sci. 2 (2000); David L. Eaton, “Scientific Judgment and Toxic Torts – A Primer in Toxicology for Judges and Lawyers,” 12 J.L. & Pol’y 5, 15 (2003); Ellen K. Silbergeld, “The Role of Toxicology in Causation: A Scientific Perspective,” 1 Cts. Health Sci. & L. 374, 378 (1991). Of course, the claims of endocrine disruption have challenged the generally accepted principle. See, e.g., Dan Fagin, “Toxicology: The learning curve,” Nature (24 October 2012) (misrepresenting Paracelsus’ dictum as meaning that dose responses will be predictably linear).

[2] See, e.g., Curtis D. Klaassen, “Principles of Toxicology and Treatment of Poisoning,” in Goodman and Gilman’s The Pharmacological Basis of Therapeutics 1739 (11th ed. 2008); Michael A Gallo, “History and Scope of Toxicology,” in Curtis D. Klaassen, ed., Casarett and Doull’s Toxicology: The Basic Science of Poisons 1, 4–5 (7th ed. 2008).

[3] Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” 549, 603 & n.160, in Reference Manual on Scientific Evidence (3d ed. 2011) (citing Irving J. Selikoff, Disability Compensation for Asbestos-Associated Disease in the United States: Report to the U.S. Department of Labor 181–220 (1981)). The chapter also cites two judicial decisions that clearly were influenced by advocacy science and regulatory assumptions. Ferebee v. Chevron Chemical Co., 736 F.2d 1529, 1536 (D.C. Cir. 1984) (commenting that low exposure effects are “one of the most sharply contested questions currently being debated in the medical community”); In re TMI Litig. Consol. Proc., 927 F. Supp. 834, 844–45 (M.D. Pa. 1996) (considering extrapolations from high radiation exposure to low exposure for inferences of causality).

[4] See also United States v. Reserve Mining Co., 380 F. Supp. 11, 52-53 (D. Minn. 1974) (questioning the appropriateness of comparing community asbestos exposures to occupational and industrial exposures). Risk assessment modesty was uncharacteristic of Irving Selikoff, who used insulator risk figures, which were biased high, to issue risk projections for total predicted asbestos-related mortality.

[5] See also In re “Agent Orange” Prod. Liab. Litig., 611 F. Supp. 1223, 1250 (E.D.N.Y. 1985), aff’d, 818 F.2d 187 (2d Cir. 1987), cert. denied, Pinkney v. Dow Chemical Co., 487 U.S. 1234 (1988) (noting that the plaintiffs’ expert witnesses relied upon studies that involved heavier exposures than those experienced by plaintiffs; the failure to address the claimants’ actual exposures rendered the witnesses’ proposed testimony legally irrelevant); Gulf South Insulation v. United States Consumer Products Safety Comm’n, 701 F.2d 1137, 1148 (5th Cir. 1983) (invalidating CPSC’s regulatory ban on urea formaldehyde foam insulation, as not supported by substantial evidence, when the agency based its ban upon high-exposure level studies and failed to quantify putative risks at actual exposure levels; criticizing extrapolations from high to low doses); Graham v. Wyeth Laboratories, 906 F.2d 1399, 1415 (10th Cir.) (holding that trial court abused its discretion in failing to grant new trial upon a later-discovered finding that plaintiff’s expert misstated the level of toxicity of defendant’s DTP vaccine by an order of magnitude), cert. denied, 111 S.Ct. 511 (1990).

Two dubious decisions that fail to acknowledge the fallacy of extrapolating down from high-exposure risk data have come out of the Fourth Circuit. See City of Greenville v. W.R. Grace & Co., 827 F.2d 975 (4th Cir. 1987) (affirming judgment based upon expert testimony that identified risk at low levels of asbestos exposure based upon studies at high levels of exposure); Smith v. Wyeth-Ayerst Labs Co., 278 F. Supp. 2d 684, 695 (W.D.N.C. 2003) (suggesting that expert witnesses may extrapolate down to lower doses, and even extrapolate to a different time window of latency).

[6] See also Christophersen v. Allied-Signal Corp., 939 F.2d 1106, 1111, 1113-14 (5th Cir. 1991) (en banc) (per curiam) (trial court may exclude opinion of expert witness whose opinion is based upon incomplete or inaccurate exposure data), cert. denied, 112 S. Ct. 1280 (1992); Wills v. Amerada Hess Corp., 2002 WL 140542, *10 (S.D.N.Y. Jan. 31, 2002) (noting that the plaintiff’s expert witness failed to quantify the decedent’s exposure, but was nevertheless “ready to form a conclusion first, without any basis, and then try to justify it” by claiming that the decedent’s development of cancer was itself sufficient evidence that he had had intensive exposure to the alleged carcinogen).

Vaccine Court Inoculated Against Pathological Science

October 25th, 2015

Richard I. Kelley, M.D., Ph.D., is the Director of the Division of Metabolism at the Kennedy Krieger Institute, and a member of the Department of Pediatrics at Johns Hopkins University. The National Library of Medicine’s PubMed database shows that Dr. Kelley has written dozens of articles on mitochondrial disease, but none that concludes that thimerosal or the measles, mumps, and rubella (MMR) vaccine plays a causal role in autism by inducing or aggravating mitochondrial disease. In one article, Kelley opines:

“Large, population-based studies will be needed to identify a possible relationship of vaccination with autistic regression in persons with mitochondrial cytopathies.”

Jacqueline R. Weissman, Richard I. Kelley, Margaret L. Bauman, Bruce H. Cohen, Katherine F. Murray, Rebecca L. Mitchell, Rebecca L. Kern, and Marvin R. Natowicz, “Mitochondrial Disease in Autism Spectrum Disorder Patients: A Cohort Analysis,” 3 PLoS One e3815 (Nov. 26, 2008). The large-scale, population-based studies needed to support the speculation of Kelley and his colleagues have not materialized since 2008, and meta-analyses and systematic reviews have dampened the enthusiasm for Kelley’s hypothesis.[1]

Special Master Denise K. Vowell, of the United States Court of Federal Claims, has now further dampened the enthusiasm for Dr. Kelley’s mitochondrial theories, in a 115-page opinion, written in support of rejecting Kelley’s testimony and theories that the MMR vaccine caused a child’s autism. Madariaga v. Sec’y Dep’t H.H.S., No. 02-1237V, slip op. (Ct. Fed. Claims Sept. 26, 2015) [cited as Madariaga].

Special Master Vowell recounts at length the history of vaccine litigation, in which the plaintiffs have presented theories that the combination of thimerosal-containing vaccines and the MMR vaccine causes autism, or that thimerosal-containing vaccines alone cause autism. Madariaga at 3. Both theories were tested in the crucible of litigation and cross-examination in a series of test cases. The first theory resulted in decisions against the claimants, which were affirmed on appeal.[2] Similarly, the trials on the thimerosal-only claims uniformly resulted in decisions from the Special Masters against the claims.[3] The three Special Masters hearing the cases found that the vaccine-causation claims were not close cases, and were based upon unreliable evidence.[4] Madariaga at 4.[5]

In Madariaga, Special Master Vowell noted that Doctor Kelley had conceded the “absence of an association between the MMR vaccine and autism in large epidemiological studies.” Madariaga at 61. Kelley attempted to evade the force of his lack of evidence by retreating into a claim that “autistic regressions caused by the live attenuated MMR vaccine are rare events,” and an assertion that there are many inflammatory factors that can induce autistic regression. Madariaga at 61.

The Special Master described the whole of Kelley’s testimony as a “meandering, confusing, and completely unpersuasive elaboration of his unique insights and methods.” Madariaga at 66. Although it is clear from the Special Master’s opinion that Kelley was unbridled in his over-interpretation of studies, and perhaps undisciplined in his interpretation of test results, the lengthy opinion provides only a high-altitude view of Kelley’s errors. There are tantalizing comments and notes in the Special Master’s decision, such as one reporting that Kelley may have over-interpreted a study because he ignored the authors’ caution that their findings could be consistent with chance, given their multiple comparisons, and another noting a paper that failed to show statistical significance. Madariaga at 90 & n.160.

The unreliability of Kelley’s testimony went beyond hand waving in the absence of evidence. He interpreted the child’s results on a four-hour fasting test even though the child had not fasted for four hours. When pressed about this maneuver, Kelley claimed that he had made calculations to bring the child’s results “back to some standard.” Madariaga at 66 & n.115.

Although the Special Master’s opinion itself was ultimately persuasive, the tome left me eager to know more about Dr. Kelley’s epistemic screw ups, and less about vaccine court procedure.


[1] See Vittorio Demicheli, Alessandro Rivetti, Maria Grazia Debalini, and Carlo Di Pietrantonj, “Vaccines for measles, mumps and rubella in children,” Cochrane Database Syst. Rev., Issue 2. Art. No. CD004407, DOI:10.1002/14651858.CD004407.pub3 (2012) (“Exposure to the MMR vaccine was unlikely to be associated with autism … .”); Luke E. Taylor, Amy L. Swerdfeger, and Guy D. Eslick, “Vaccines are not associated with autism: An evidence-based meta-analysis of case-control and cohort studies,” 32 Vaccine 3623 (2014) (“Findings of this meta-analysis suggest that vaccinations are not associated with the development of autism or autism spectrum disorder. Furthermore, the components of the vaccines (thimerosal or mercury) or multiple vaccines (MMR) are not associated with the development of autism or autism spectrum disorder.”).

[2] Cedillo v. Sec’y, HHS, No. 98-916V, 2009 WL 331968 (Fed. Cl. Spec. Mstr. Feb. 12, 2009), aff’d, 89 Fed. Cl. 158 (2009), aff’d, 617 F.3d 1328 (Fed. Cir. 2010); Hazlehurst v. Sec’y, HHS, No. 03-654V, 2009 WL 332306 (Fed. Cl. Spec. Mstr. Feb. 12, 2009), aff’d, 88 Fed. Cl. 473 (2009), aff’d, 604 F.3d 1343 (Fed. Cir. 2010); Snyder v. Sec’y, HHS, No. 01-162V, 2009 WL 332044 (Fed. Cl. Spec. Mstr. Feb. 12, 2009), aff’d, 88 Fed. Cl. 706 (2009).

[3] Dwyer v. Sec’y, HHS, 2010 WL 892250; King v. Sec’y, HHS, No. 03-584V, 2010 WL 892296 (Fed. Cl. Spec. Mstr. Mar. 12, 2010); Mead v. Sec’y, HHS, 2010 WL 892248.

[4] See, e.g., King, 2010 WL 892296, at *90 (emphasis in original); Snyder, 2009 WL 332044, at *198.

[5] The Federal Rules of Evidence technically do not control vaccine court proceedings, but the Special Masters are bound by the requirement of Daubert v. Merrell Dow Pharm., Inc., 509 U.S. 579, 590 (1993), to find that expert witness opinion testimony is reliable before they consider it. Knudsen v. Sec’y, HHS, 35 F.3d 543, 548-49 (Fed. Cir. 1994); Madariaga at 7.

Demonstration of Frye Gatekeeping in Pennsylvania Birth Defects Case

October 6th, 2015

Michael D. Freeman is a chiropractor and self-styled “forensic epidemiologist,” affiliated with the Departments of Public Health & Preventive Medicine and Psychiatry, Oregon Health & Science University School of Medicine, in Portland, Oregon. His C.V. can be found here. Freeman has an interesting publication in press on his views of forensic epidemiology. Michael D. Freeman & Maurice Zeegers, “Principles and applications of forensic epidemiology in the medico-legal setting,” Law, Probability and Risk (2015); doi:10.1093/lpr/mgv010. Freeman’s views on epidemiology did not, however, pass muster in the courtroom. Porter v. SmithKline Beecham Corp., Phila. Cty. Ct. C.P., Sept. Term 2007, No. 03275, slip op. (Oct. 5, 2015).

In Porter, plaintiffs sued Pfizer, the manufacturer of the SSRI antidepressant Zoloft. Plaintiffs claimed the mother plaintiff’s use of Zoloft during pregnancy caused her child to be born with omphalocele, a serious defect that occurs when the child’s intestines develop outside his body. Pfizer moved to exclude plaintiffs’ medical causation expert witnesses, Dr. Cabrera and Dr. Freeman. The trial judge was the Hon. Mark I. Bernstein, who has written and presented frequently on expert witness evidence.[1] Judge Bernstein held a two-day hearing in September 2015, and last week, His Honor ruled that the plaintiffs’ expert witnesses failed to meet Pennsylvania’s Frye standard for admissibility. Judge Bernstein’s opinion reads a bit like a Berenstain Bears book on how not to use epidemiology.

GENERAL CAUSATION SCREW UPS

Proper Epidemiologic Method

First, Find An Association

Dr. Freeman has a methodologic map that includes the Bradford Hill criteria at the back end of the procedure. Dr. Freeman, however, impetuously forgot that before you get to the back end, you must traverse the front end:

“Dr. Freemen agrees that he must, and claims he has, applied the Bradford Hill Criteria to support his opinion. However, the starting procedure of any Bradford-Hill analysis is ‘an association between two variables’ that is ‘perfectly clear-cut and beyond what we would care to attribute to the play of chance’.35 Dr. Freeman testified that generally accepted methodology requires a determination, first, that there’s evidence of an association and, second, whether chance, bias and confounding have been accounted for, before application of the Bradford-Hill criteria.36 Because no such association has been properly demonstrated, the Bradford Hill criteria could not have been properly applied.”

Slip op. at 12-13. In other words, don’t go rushing to the Bradford Hill factors until and unless you have first shown an association; second, you have shown that it is “clear cut,” and not likely the result of bias or confounding; and third, you have ruled out the play of chance or random variability in explaining the difference between the observed and expected rates of disease.

Proper epidemiologic method requires surveying the pertinent published studies that investigate whether there is an association between the medication use and the claimed harm. The expert witnesses must, however, do more than write a bibliography; they must assess any putative associations for “chance, confounding or bias”:

“Proper epidemiological methodology begins with published study results which demonstrate an association between a drug and an unfortunate effect. Once an association has been found, a judgment as whether a real causal relationship between exposure to a drug and a particular birth defect really exists must be made. This judgment requires a critical analysis of the relevant literature applying proper epidemiologic principles and methods. It must be determined whether the observed results are due to a real association or merely the result of chance. Appropriate scientific studies must be analyzed for the possibility that the apparent associations were the result of chance, confounding or bias. It must also be considered whether the results have been replicated.”

Slip op. at 7.

Then Rule Out Chance

So if there is something that appears to be an association in a study, the expert epidemiologist must assess whether it is consistent with chance. If we flip a fair coin 10 times, we “expect” 5 heads and 5 tails, but the probability of not getting that exact split (about 75%) is roughly three times the probability of obtaining it (about 25%). If one series of 10 tosses yields 6 heads and 4 tails, we certainly would not reject the starting assumption that the coin is fair. Indeed, the probability of obtaining 6 heads and 4 tails, or 4 heads and 6 tails (about 41%), is substantially greater than the probability of obtaining the exactly even split.
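The coin-toss arithmetic can be checked directly against the binomial distribution; here is a minimal sketch in plain Python (an illustration of the paragraph above, not anything from the Porter opinion):

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """Probability of exactly k heads in n tosses of a coin with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 10
p_exact_5 = binom_pmf(5, n)                       # the "expected" 5/5 split
p_not_5 = 1 - p_exact_5                           # any other split
p_6_4_either_way = binom_pmf(6, n) + binom_pmf(4, n)

print(f"P(exactly 5 heads)      = {p_exact_5:.3f}")         # 0.246
print(f"P(anything but 5 heads) = {p_not_5:.3f}")           # 0.754
print(f"P(6/4 or 4/6 split)     = {p_6_4_either_way:.3f}")  # 0.410
```

The “expected” even split occurs only about a quarter of the time, which is why a 6/4 result gives no reason at all to reject the hypothesis that the coin is fair.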

As it turned out in the Porter case, Dr. Freeman relied rather heavily upon one study, the Louik study, for his claim that Zoloft causes the birth defect in question. See Carol Louik, Angela E. Lin, Martha M. Werler, Sonia Hernández-Díaz, and Allen A. Mitchell, “First-Trimester Use of Selective Serotonin-Reuptake Inhibitors and the Risk of Birth Defects,” 356 New Engl. J. Med. 2675 (2007). The authors of the Louik study were quite clear that they were not able to rule out chance as a sufficient explanation for the observed data in their study:

“The previously unreported associations we identified warrant particularly cautious interpretation. In the absence of preexisting hypotheses and the presence of multiple comparisons, distinguishing random variation from true elevations in risk is difficult. Despite the large size of our study overall, we had limited numbers to evaluate associations between rare outcomes and rare exposures. We included results based on small numbers of exposed subjects in order to allow other researchers to compare their observations with ours, but we caution that these estimates should not be interpreted as strong evidence of increased risks.24

Slip op. at 10 (quoting from the Louik study).

Judge Bernstein thus criticized Freeman for failing to account for chance in explaining his putative association between maternal Zoloft use and infant omphalocele. The generally accepted methodology for this step is to evaluate whether the putative association is statistically significant at the conventional level.
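The multiple-comparisons concern that the Louik authors themselves flagged is easy to quantify. As a back-of-the-envelope illustration (generic arithmetic, not tied to the Louik data), if each of m independent comparisons is tested at the conventional 0.05 level, the chance that at least one comes up nominally “significant” purely by chance grows rapidly with m:

```python
# Probability of at least one nominally "significant" finding among m
# independent comparisons, each tested at alpha = 0.05, when every null
# hypothesis is actually true: 1 - (1 - alpha)^m.
alpha = 0.05
p_any_false_positive = {m: 1 - (1 - alpha) ** m for m in (1, 5, 20, 40)}
for m, p in p_any_false_positive.items():
    print(f"{m:>2} comparisons: P(at least one false positive) = {p:.2f}")
# 20 comparisons already give roughly a 64% chance of a spurious "hit"
```

A study scanning dozens of drug-defect pairs is therefore nearly guaranteed to produce some nominally significant association even if no drug causes any defect, which is why a single unreplicated finding deserves the cautious interpretation the Louik authors urged.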

In relying heavily upon the Louik study, Dr. Freeman opened himself up to serious methodological criticism. Judge Bernstein’s opinion stands for the important proposition that courts should not be unduly impressed with nominal statistical significance in the presence of multiple comparisons and very broad confidence intervals:

“The Louik study is the only study to report a statistically significant association between Zoloft and omphalocele. Louik’s confidence interval which ranges between 1.6 and 20.7 is exceptionally broad. … The Louik study had only 3 exposed subjects who developed omphalocele thus limiting its statistical power. Studies that rely on a very small number of cases can present a random statistically unstable clustering pattern that may not replicate the reality of a larger population. The Louik authors were unable to rule out confounding or chance. The results have never been replicated concerning omphalocele. Dr. Freeman’s testimony does not explain, or seemingly even consider these serious limitations.”

Slip op. at 8. Assessing the statistical precision of the point estimate of risk, including whether the authors conducted multiple comparisons and whether the observed confidence intervals were very broad, is part of the generally accepted epidemiologic methodology, which Freeman flouted:

“Generally accepted methodology considers statistically significant replication of study results in different populations because apparent associations may reflect flaws in methodology.”

Slip op. at 9. The studies that Freeman cited and apparently relied upon failed to report statistically significant associations between sertraline (Zoloft) and omphalocele. Judge Bernstein found this lack to be a serious problem for Freeman and his epidemiologic opinion:

“While non-significant results can be of some use, despite a multitude of subsequent studies which isolated omphalocele, there is no study which replicates or supports Dr. Freeman’s conclusions.”

Slip op. at 10. The lack of statistical significance, in the context of repeated attempts to find it, helped sink Freeman’s proffered testimony.

Then Rule Out Bias and Confounding

As noted, Freeman relied heavily upon the Louik study, which was the only study to report a nominally statistically significant risk ratio for maternal Zoloft use and infant omphalocele. The Louik study, by its design, however, could not exclude chance or confounding as a full explanation for the apparent association, and Judge Bernstein chastised Dr. Freeman for overselling the study as support for the plaintiffs’ causal claim:

“The Louik authors were unable to rule out confounding or chance. The results have never been replicated concerning omphalocele. Dr. Freeman’s testimony does not explain, or seemingly even consider these serious limitations.”

Slip op. at 8.

And Only Then Consider the Bradford Hill Factors

Even when an association is clear cut and beyond what we would likely attribute to chance, generally accepted methodology still requires the epidemiologist to consider the Bradford Hill factors before reaching a conclusion of causation. As Judge Bernstein explains:

“As the Bradford-Hill factors are properly considered, causality becomes a matter of the epidemiologist’s professional judgment.”

Slip op. at 7.

Consistency or Replication

The nine Hill factors are well known to lawyers because they have been stated and discussed extensively in Hill’s original article, and in references such as the Reference Manual on Scientific Evidence. Not all the Hill factors are equally important, or important at all, but one that clearly matters is consistency, or concordance of results among the available epidemiologic studies. Stated alternatively, a clear-cut association unlikely to be explained by chance is certainly interesting and probative, but it raises an important methodological question: can the result be replicated? Judge Bernstein restated this Hill factor as an important determinant of whether a challenged expert witness employed a generally accepted method:

“Generally accepted methodology considers statistically significant replication of study results in different populations because apparent associations may reflect flaws in methodology.”

Slip op. at 10.

“More significantly neither Reefhuis nor Alwan reported statistically significant associations between Zoloft and omphalocele. While non-significant results can be of some use, despite a multitude of subsequent studies which isolated omphalocele, there is no study which replicates or supports Dr. Freeman’s conclusions.”

Slip op. at 10.

Replication But Without Double Dipping the Data

Epidemiologic studies are sometimes updated and extended with additional follow-up. An expert witness who wished to skate over the replication and consistency requirement might be tempted, as was Dr. Freeman, to count the earlier and later iterations of the same basic study as “replication.” The Louik study was indeed updated and extended this year in a published paper by Jennita Reefhuis and colleagues.[2] Proper methodology, however, prohibits this double dipping of data: a later study that subsumes the earlier one cannot count as a “replication”:

“Generally accepted methodology considers statistically significant replication of study results in different populations because apparent associations may reflect flaws in methodology. Dr. Freeman claims the Alwan and Reefhuis studies demonstrate replication. However, the population Alwan studied is only a subset of the Reefhuis population and therefore they are effectively the same.”

Slip op. at 10.

The Lumping Fallacy

Analyzing the health outcome of interest at the right level of specificity can sometimes be a puzzle and a challenge, but Freeman got it wrong by opportunistically “lumping” disparate outcomes together when doing so helped him reach a result he liked. Judge Bernstein admonishes:

“Proper methodology further requires that one not fall victim to the … the ‘Lumping Fallacy’. … Different birth defects should not be grouped together unless they a part of the same body system, share a common pathogenesis or there is a specific valid justification or necessity for an association20 and chance, bias, and confounding have been eliminated.”

Slip op. at 7. Dr. Freeman lumped a lot, but Judge Bernstein saw through the methodological ruse. As Judge Bernstein pointed out:

“Dr. Freeman’s analysis improperly conflates three types of data: Zoloft and omphalocele, SSRI’s generally and omphalocele, and SSRI’s and gastrointestinal and abdominal malformations.”

Slip op. at 8. Freeman’s approach, which sadly is seen frequently in pharmaceutical and other products liability cases, is methodologically improper:

“Generally accepted causation criteria must be based on the data applicable to the specific birth defect at issue. Dr. Freeman improperly lumps together disparate birth defects.”

Slip op. at 11.

Class Effect Fallacy

Another kind of improper lumping results from treating all SSRI antidepressants as interchangeable, either by pooling them together, or by picking and choosing, from among all the SSRIs, the data points that support the plaintiffs’ claims (while ignoring the SSRI data points that do not). To be sure, the SSRI antidepressants do form a “class,” in that they all have a similar pharmacologic effect. The SSRIs, however, do not all achieve their effect on serotonergic neurons in the same way; nor do they all have the same “off-target” effects. Treating all the SSRIs as interchangeable for a claimed adverse effect, without independent support for that treatment, is known as the class effect fallacy. In Judge Bernstein’s words:

“Proper methodology further requires that one not fall victim to the ‘Class Effect Fallacy’ … . A class effect cannot be assumed. The causation conclusion must be drug specific.”

Slip op. at 7. Dr. Freeman’s analysis improperly conflated Zoloft data with SSRI data generally. Slip op. at 8. Assuming what you set out to demonstrate is, of course, a fine way to go methodologically into the ditch:

“Without significant independent scientific justification it is contrary to generally accepted methodology to assume the existence of a class effect. Dr. Freeman lumps all SSRI drug results together and assumes a class effect.”

Slip op. at 10.

SPECIFIC CAUSATION SCREW UPS

Dr. Freeman was also offered by plaintiffs to provide a specific causation opinion – that Mrs. Porter’s use of Zoloft in pregnancy caused her child’s omphalocele. Freeman claimed to have performed a differential diagnosis or etiology or something to rule out alternative causes.

Genetics

In the field of birth defects, one possible cause looming in any given case is an inherited or spontaneous genetic mutation. Freeman purported to have considered and ruled out genetic causes, which he acknowledged account for a substantial percentage of all omphalocele cases. Bo Porter, Mrs. Porter’s son, was tested for known genetic causes, and Freeman argued that this testing allowed him to “rule out” genetic causes. But the current state of the art in genetic testing allows for identifying only a small number of possible genetic causes, and Freeman failed to explain how he might have ruled out the as-yet-unidentified genetic causes of birth defects:

“Dr. Freeman fails to properly rule out genetic causes. Dr. Freeman opines that 45-49% of omphalocele cases are due to genetic factors and that the remaining 50-55% of cases are due to non-genetic factors. Dr. Freeman relies on Bo Porter’s genetic testing which did not identify a specific genetic cause for his injury. However, minor plaintiff has not been tested for all known genetic causes. Unknown genetic causes of course cannot yet be tested. Dr. Freeman has made no analysis at all, only unwarranted assumptions.”

Slip op. at 15-16. Judge Bernstein reviewed Freeman’s attempted analysis and ruling out of potential causes, and found that it departed from the generally accepted methodology in conducting differential etiology. Slip op. at 17.

Timing Errors

One feature of putative teratogenicity is that an embryonic exposure must take place at a specific gestational developmental time in order to have its claimed deleterious effect. As Judge Bernstein pointed out, omphalocele results from an incomplete folding of the abdominal wall during the third to fifth weeks of gestation. Mrs. Porter, however, did not begin taking Zoloft until her seventh week of pregnancy, which left Dr. Freeman without an opinion on how Zoloft could have contributed to the causation of the minor plaintiff’s birth defect. Slip op. at 14. This aspect of Freeman’s specific causation analysis was glaringly defective, and clearly not the sort of generally accepted methodology for attributing a birth defect to a teratogen.

******************************************************

All in all, Judge Bernstein’s opinion is a tour de force demonstration of how a state court judge, in a so-called Frye jurisdiction, can show that failure to employ generally accepted methods renders an expert witness’s opinions inadmissible. There is one small problem in statistical terminology.

Statistical Power

Judge Bernstein states, at different places, that the Louik study was and was not statistically significant for Zoloft and omphalocele. The court’s opinion ultimately explains that the nominal statistical significance was vitiated by multiple comparisons and an extremely broad confidence interval, which more than justified the statement that the study was not truly statistically significant. For some reason, however, Judge Bernstein chose to describe the problem with the Louik study as a lack of statistical power:

“Equally significant is the lack of power concerning the omphalocele results. The Louik study had only 3 exposed subjects who developed omphalocele thus limiting its statistical power.”

Slip op. at 8. The adjusted odds ratio for Zoloft and omphalocele was 5.7, with a 95% confidence interval of 1.6 to 20.7. Power was not the issue: if the odds ratio were otherwise credible, free from bias, confounding, and chance, the study did detect an increased risk of close to 500%, at the pre-stated level of significance. The problems, rather, were multiple testing, fragile and imprecise results, and the inability to evaluate the odds ratio fully for bias and confounding.
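The imprecision point can be illustrated with the standard Wald approximation for an odds-ratio confidence interval, which is computed on the log scale with a standard error built from the reciprocals of the four cell counts. The counts below are hypothetical, chosen only to mimic a study with three exposed cases; they are not the Louik study’s actual data:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Wald 95% CI for an odds ratio from a 2x2 table, where
    a, b = exposed cases, exposed controls and
    c, d = unexposed cases, unexposed controls."""
    or_hat = (a * d) / (b * c)
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)   # SE of log(OR)
    lo = math.exp(math.log(or_hat) - z * se)
    hi = math.exp(math.log(or_hat) + z * se)
    return or_hat, lo, hi

# Hypothetical counts with only 3 exposed cases (NOT the Louik cell counts)
or_hat, lo, hi = odds_ratio_ci(a=3, b=100, c=30, d=5700)
print(f"OR = {or_hat:.1f}, 95% CI ({lo:.1f}, {hi:.1f}), {hi/lo:.1f}-fold wide")

# With a = 3, the 1/a term alone bounds the interval's multiplicative width
# from below, no matter how large the other three cells are:
min_width = math.exp(2 * 1.96 * math.sqrt(1/3))
print(f"minimum CI width with 3 exposed cases: {min_width:.1f}-fold")
```

With only three exposed cases, the 1/3 term dominates the standard error, so the interval must span roughly an order of magnitude no matter how big the study is overall, much as the Louik interval of 1.6 to 20.7 does. That is an imprecision problem, not a power problem.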


 

[1] Mark I. Bernstein, “Expert Testimony in Pennsylvania,” 68 Temple L. Rev. 699 (1995); Mark I. Bernstein, “Jury Evaluation of Expert Testimony under the Federal Rules,” 7 Drexel L. Rev. 239 (2014-2015).

[2] Jennita Reefhuis, Owen Devine, Jan M. Friedman, Carol Louik, and Margaret A. Honein, “Specific SSRIs and birth defects: Bayesian analysis to interpret new data in the context of previous reports,” 351 Brit. Med. J. (2015).