TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

How Access to a Protocol and Underlying Data Gave Yale Researchers a Big Black Eye

April 13th, 2024

Prelude to Litigation

Phenylpropanolamine (PPA) was a direct α-adrenergic agonist, widely used in medications to control cold symptoms and to suppress appetite for weight loss.[1] In 1972, an over-the-counter (OTC) Advisory Review Panel considered the safety and efficacy of PPA-containing nasal decongestant medications, leading, in 1976, to a recommendation that the agency label these medications as “generally recognized as safe and effective.”

Six years later, in 1982, another FDA panel recommended that PPA be considered safe and effective for appetite suppression in dieting. Two epidemiologic studies of PPA and hemorrhagic stroke were conducted in the 1980s. One study, by Hershel Jick and colleagues, presented as a letter to the editor, reported a relative risk of 0.58 (95% exact confidence interval, 0.03–2.9).[2] A year later, two researchers, reporting a study based upon Medicaid databases, found no significant association between PPA and hemorrhagic stroke.[3]

The FDA, however, did not issue a final monograph recognizing PPA as “safe and effective,” because of occasional reports of hemorrhagic stroke in patients who had used PPA-containing medications, mostly young women who had used PPA appetite suppressants for dieting. In 1982, the FDA requested information on the effects of PPA on blood pressure, particularly with respect to weight-loss medications. The agency deferred a proposed 1985 final monograph because of the blood pressure issue.

The FDA deemed the data inadequate to answer its safety concerns. Congressional and agency hearings in the early 1990s amplified some public concern, but in 1990, the Director of Cardio-Renal Drug Products, at the Center for Drug Evaluation and Research, found several well-supported facts, based upon robust evidence. Blood pressure studies in humans showed a biphasic response: PPA initially causes blood pressure to rise above baseline (a pressor effect), and then to fall below baseline (a depressor effect). These blood pressure responses are dose-related, and diminish with repeated use. Patients develop tolerance to the pressor effects quickly, within a few hours, that is, within the time frame of a single dose. The Center concluded that at doses of 50 mg of PPA and below, the pressor effects of the medication are small, indeed smaller than normal daily variations in basal blood pressure. The only time period in which even a theoretical risk might exist is within a few hours, or less, of a patient’s taking the first dose of PPA medication. Doses of 25 mg of immediate-release PPA could not realistically be considered to pose any “absolute safety risk and have a reasonable safety margin.”[4]

In 1991, Dr. Heidi Jolson, an FDA scientist, wrote that the agency’s spontaneous adverse event reporting system “suggested” that PPA appetite suppressants increased the risk of cerebrovascular accidents. A review of stroke data, including the adverse event reports, by epidemiology consultants failed to support a causal association between PPA and hemorrhagic stroke (HS). The reviewers, however, acknowledged that the available data did not permit them to rule out a risk of HS. The FDA adopted the reviewers’ recommendation for a large, prospective case-control study designed to take into account the known physiological effects of PPA on blood pressure.[5]

What emerged from this regulatory indecision was a decision to conduct another epidemiologic study. In November 1992, a manufacturers’ group, now known as the Consumer Healthcare Products Association (CHPA), proposed a case-control study that would become known as the Hemorrhagic Stroke Project (HSP). In March 1993, the group submitted a proposed protocol, along with a suggestion that the study be conducted by several researchers at Yale University. After feedback from the public and the Yale researchers, the group submitted a final protocol in April 1994. Both the researchers and the sponsors agreed to a scientific advisory group that would operate independently and oversee the study. The study began in September 1994. The FDA deferred action on a final monograph for PPA, and product marketing continued.

The Yale HSP authors delivered their final report on their case-control study to the FDA in May 2000.[6] The HSP included 702 HS cases and 1,376 controls, men and women, ages 18 to 49. The report authors concluded that “the results of the HSP suggest that PPA increases the risk for hemorrhagic stroke.”[7] The study had taken over five years to design, conduct, and analyze. In September 2000, the FDA’s Office of Post-Marketing Drug Risk Assessment released the results, with its own interpretation and conclusion that dramatically exceeded the HSP authors’ own interpretation.[8] The FDA’s Nonprescription Drugs Advisory Committee then voted, on October 19, 2000, to recommend that PPA be reclassified as “unsafe.” The Committee’s meeting, however, was attended by several leading epidemiologists who pointed to important methodological problems and limitations in the design and execution of the HSP.[9]

In November 2000, the FDA’s Nonprescription Drugs Advisory Committee determined that there was a significant association between PPA and HS, and recommended that PPA not be considered safe for OTC use. The FDA never addressed causality; nor did it have to do so under governing law. The FDA’s actions led the drug companies voluntarily to withdraw PPA-containing products.

The December 21, 2000, issue of The New England Journal of Medicine featured a revised version of the HSP report as its lead article.[10] Under the journal’s guidelines for statistical reporting, the authors were required to present two-tailed p-values or confidence intervals. The HSP Final Report had relied upon one-tailed tests, and its results looked considerably less impressive after the obtained significance probabilities were doubled. Only the finding for appetite suppressant use was characterized as an independent risk factor:
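Why doubling mattered can be seen in a minimal Python sketch, assuming a normally distributed (z) test statistic; the value z = 1.80 is hypothetical and not taken from the HSP. For a symmetric test statistic, the two-tailed p-value is exactly twice the one-tailed value, so a result that falls below 0.05 on one tail can land above it on two.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def one_tailed_p(z: float) -> float:
    """Upper-tail p-value: chance of a statistic at least as large as z under the null."""
    return 1.0 - STD_NORMAL.cdf(z)


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value: deviations in either direction count as extreme."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


z = 1.80  # hypothetical test statistic, for illustration only
p1 = one_tailed_p(z)
p2 = two_tailed_p(z)
print(f"z = {z}: one-tailed p = {p1:.3f}, two-tailed p = {p2:.3f}")
```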

“The results suggest that phenylpropanolamine in appetite suppressants, and possibly in cough and cold remedies, is an independent risk factor for hemorrhagic stroke in women.”[11]

The HSP had multiple pre-specified aims, and several other statistical comparisons and analyses were added along the way. No statistical adjustment was made for these multiple comparisons, but their presence in the study must be considered. Perhaps that is why the authors merely suggest that PPA in appetite suppressants was an independent risk factor for HS in women. Under current statistical guidelines for the New England Journal of Medicine, this suggestion might require even further qualification and weakening.[12]

The HSP study faced difficult methodological issues. The detailed and robust identification of PPA’s blood pressure effects in humans focused attention on the crucial timing of a hemorrhagic stroke in relation to ingestion of a PPA medication. Any use, or any use within the last seven or 30 days, would be fairly irrelevant to the pathophysiology of a cerebral hemorrhage. The HSP authors settled on a definition of “first use” as any use of a PPA product within 24 hours, with no other uses in the previous two weeks.[13] Given the rapid onset of pressor and depressor effects, and the adaptation response, this definition of first use was generous and likely included many irrelevant exposed cases, but at least the definition attempted to incorporate the phenomena of short-lived effect and adaptation. The appetite suppressant association did not involve any “first use,” which makes the one “suggested” increased risk much less certain and relevant.

In addition to “first use,” the HSP employed a broader, alternative definition of exposure: ingestion of the PPA-containing medication on “the index day before the focal time and the preceding three calendar days.” Again, given the known pharmacokinetics and physiological effects of PPA, this window of more than three days seems of doubtful relevance.

All instances of “first use” occurred among men and women who used a cough or cold remedy; the adjusted OR was 3.14 (95% confidence interval (CI), 0.96–10.28), p = 0.06. The very wide confidence interval, spanning more than an order of magnitude, reveals the fragility of the statistical inference. There were but 8 first-use exposed stroke cases (out of 702), and 5 exposed controls (out of 1,376).

When this first use analysis is broken down between men and women, the result becomes even more fragile. Among men, there was only one first use exposure in 319 male HS patients, and one first use exposure in 626 controls, for an adjusted OR of 2.95, CI 0.15 – 59.59, and p = 0.48. Among women, there were 7 first use exposures among 383 female HS patients, and 4 first use exposures among 750 controls, with an adjusted OR of 3.13, CI 0.86 – 11.46, p = 0.08.
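The reported first-use odds ratios, confidence intervals, and p-values hang together, and a reader can check that from the published figures alone. The sketch below assumes each interval was computed on the log scale as log(OR) ± 1.96 × SE (a Wald-type interval, which the HSP may or may not have used). Under that assumption it recovers the standard error and the implied two-sided p-value from each reported interval, along with the ratio of the upper to the lower limit as a measure of the interval's width.

```python
import math
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96, for a 95% interval


def p_from_or_ci(or_point, lower, upper):
    """Approximate (SE of log OR, z, two-sided p) implied by an OR and its 95% CI.

    Assumes a Wald-type interval: log(OR) +/- 1.96 * SE on the log scale.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    z = math.log(or_point) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p


reported = {
    "first use, men and women": (3.14, 0.96, 10.28),
    "first use, men": (2.95, 0.15, 59.59),
    "first use, women": (3.13, 0.86, 11.46),
}
for label, (or_point, lo, hi) in reported.items():
    se, z, p = p_from_or_ci(or_point, lo, hi)
    print(f"{label}: SE(log OR) = {se:.2f}, z = {z:.2f}, p ~ {p:.2f}, "
          f"upper/lower limit = {hi / lo:.0f}x")
```

The recovered p-values come out close to the reported 0.06, 0.48, and 0.08, which is some reassurance that the Wald-type assumption is not far off for these results.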

The small numbers of actual first exposure events speak loudly for the inconclusiveness and fragility of the study results, and the sensitivity of the results to any methodological deviations or irregularities. Of course, for the one “suggested” association, for appetite suppressant use among women, the results were even more fragile. None of the appetite suppressant cases were “first use,” which raises serious questions whether anything meaningful was measured. There were six (non-first use) exposed cases among 383 female HS patients, with only a single exposed female control among 750. The authors presented an adjusted OR of 15.58, with a p-value of 0.02. The CI, however, spanned more than two orders of magnitude, 1.51 – 182.21, which makes the result well-nigh uninterpretable. One of the six appetite suppressant cases was also a user of cough-cold remedies, and she was double counted in the study’s analyses. This double-counted case had a body-mass index of 19, which is certainly not overweight, and is at the low end of normal.[14] The one appetite suppressant control was obese.
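A rough way to see how much rides on a single control is to compute a crude, unadjusted odds ratio with a Woolf (log-scale) confidence interval from the counts above, and then to add one hypothetical exposed control. This is an illustration under stated assumptions, not a reanalysis: the published 15.58 came from an adjusted model, so the crude figure differs from it, and the second table is invented purely to show sensitivity.

```python
import math


def crude_or_woolf(exp_cases, n_cases, exp_controls, n_controls, z=1.96):
    """Crude (unadjusted) odds ratio with a Woolf 95% CI; every cell must be non-zero.

    Cells: a = exposed cases, b = unexposed cases,
           c = exposed controls, d = unexposed controls.
    """
    a, b = exp_cases, n_cases - exp_cases
    c, d = exp_controls, n_controls - exp_controls
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Counts from the text: 6 of 383 female cases, 1 of 750 female controls.
observed = crude_or_woolf(6, 383, 1, 750)
# Hypothetical: one more control had reported appetite-suppressant use.
one_more_control = crude_or_woolf(6, 383, 2, 750)

for label, (or_, lo, hi) in [("observed", observed),
                             ("one more exposed control", one_more_control)]:
    print(f"{label}: crude OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

A single additional exposed control roughly halves the crude odds ratio, and the interval around the observed table spans well over an order of magnitude.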

For the more expansive any-exposure analysis for use of PPA cough-cold medication, the results were significantly unimpressive. There were six exposed male cases among 319 male HS cases, and 13 exposed controls among 626 male controls, for an adjusted odds ratio of 0.62, CI 0.20 – 1.92, p = 0.41. Although the inverse association was not statistically significant, the sample results for men were incompatible with a hypothetical doubling of risk. For women, on the expansive exposure definition, there were 16 exposed cases among 383 female cases, with 19 exposed controls out of 750 female controls. The odds ratio for female PPA cough-cold medication use was 1.54, CI 0.76 – 3.14, p = 0.23.
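The claim that the men's results were incompatible with a doubling of risk can be checked roughly from the reported interval alone. Assuming, as before, a Wald-type interval on the log scale, the sketch below tests the reported odds ratio of 0.62 against two null values: OR = 1 (no association) and OR = 2 (a doubling). Only the second test speaks to a doubling of risk.

```python
import math
from statistics import NormalDist


def p_against_null_or(or_point, lower, upper, null_or):
    """Approximate two-sided p-value for H0: OR = null_or, from a reported 95% CI.

    Assumes a Wald-type interval on the log scale, so the SE of log(OR)
    can be recovered from the width of the interval.
    """
    z_crit = NormalDist().inv_cdf(0.975)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = (math.log(or_point) - math.log(null_or)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Men, any exposure to PPA cough-cold remedies: OR 0.62 (95% CI 0.20-1.92).
p_vs_no_effect = p_against_null_or(0.62, 0.20, 1.92, null_or=1.0)
p_vs_doubling = p_against_null_or(0.62, 0.20, 1.92, null_or=2.0)
print(f"H0: OR = 1 -> p ~ {p_vs_no_effect:.2f}")
print(f"H0: OR = 2 -> p ~ {p_vs_doubling:.3f}")
```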

Aside from doubts whether the HSP measured meaningful exposures, the small number of exposed cases and controls presents insuperable interpretative difficulties for the study. First, working with a case-control design and odds ratios, there should be some acknowledgment that an odds ratio always lies farther from 1.0 than the corresponding relative risk, and so exaggerates the size of the observed association.[15] Second, the authors knew that confounding would be an important consideration in evaluating any observed association. Known and suspected risk factors were consistently more prevalent among cases than controls.[16]
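The relation between odds ratios and relative risks can be illustrated with a few hypothetical risks for exposed and unexposed members of the same cohort. Whenever the relative risk differs from 1.0, the odds ratio lies farther from 1.0. The gap is large when the outcome is common and shrinks when it is rare, which is the condition under which a case-control odds ratio is taken to approximate a relative risk. The risks below are made up for illustration.

```python
def risk_ratio(risk_exposed, risk_unexposed):
    """Relative risk: ratio of the two risks."""
    return risk_exposed / risk_unexposed


def odds(risk):
    """Convert a risk (probability) into odds."""
    return risk / (1 - risk)


def odds_ratio(risk_exposed, risk_unexposed):
    """Odds ratio for the same two risks."""
    return odds(risk_exposed) / odds(risk_unexposed)


# Hypothetical (exposed, unexposed) risks, from common to rare outcomes.
for r1, r0 in [(0.20, 0.10), (0.02, 0.01), (0.002, 0.001), (0.05, 0.10)]:
    print(f"risks {r1} vs {r0}: RR = {risk_ratio(r1, r0):.3f}, "
          f"OR = {odds_ratio(r1, r0):.3f}")
```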

The HSP authors valiantly attempted to control for confounding in two ways. First, they selected controls by a technique known as random digit dialing, to find two controls for each case, matched on telephone exchange, sex, age, and race. The HSP authors, however, used imperfectly matched controls rather than lose the corresponding case from their study.[17] Second, for other co-variates, the authors used multivariate logistic regression to provide odds ratios adjusted for potential confounding from the measured co-variates. At least two of the co-variates, alcohol and cocaine use, involved potential legal or moral judgments for a sample of persons under age 50, which almost certainly would have skewed interview results.

An even more important threat to methodological validity was that key co-variates, such as smoking, alcohol use, hypertension, and cocaine use, were incorporated into the adjustment regression as dichotomous variables; body mass index was entered as a polychotomous variable. Monte Carlo simulation shows that categorizing a continuous variable in logistic regression inflates the rate of false-positive associations.[18] The type I (false-positive) error rate increases with sample size and with increasing correlation between the confounding variable and outcome of interest, and it also varies with the number of categories used for the continuous variables. Numerous authors have warned of the cost and danger of dichotomizing continuous variables, in losing information, statistical power, and reliability.[19] In the field of pharmaco-epidemiology, the bias created by dichotomization of a continuous variable is harmful from the perspectives of both statistical estimation and hypothesis testing.[20] Readers will be misled into believing that a study has adjusted for important co-variates, with the false allure of a fully adjusted model.
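The residual confounding left by a dichotomized covariate can be demonstrated with a small Monte Carlo experiment in the spirit of the simulation studies cited, though not a reproduction of them. In this standard-library sketch, a continuous confounder X drives both a binary exposure and a binary outcome, and the exposure itself has no effect. Each simulated study fits a logistic regression (by Newton-Raphson) adjusting for X either as a continuous term or as a single above/below-zero indicator, and records whether the Wald test for the exposure term rejects at the 5% level. The sample size, coefficients, number of replications, and seed are all hypothetical choices.

```python
import math
import random
from statistics import NormalDist


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_logistic(rows, y, max_iter=25, tol=1e-8):
    """Logistic regression by Newton-Raphson; returns coefficients and their covariance."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    info[i][j] += w * x[i] * x[j]
        cov = invert(info)  # inverse Fisher information at the current estimate
        step = [sum(cov[i][j] * grad[j] for j in range(k)) for i in range(k)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta, cov


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def rejection_rates(n=500, reps=50, seed=2024):
    """Share of simulated null studies in which the exposure term is 'significant' at 5%."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(0.975)
    hits = {"continuous": 0, "dichotomized": 0}
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]  # continuous confounder
        es = [1.0 if rng.random() < sigmoid(2.0 * x) else 0.0 for x in xs]  # exposure tracks X
        ys = [1.0 if rng.random() < sigmoid(-0.5 + 2.0 * x) else 0.0 for x in xs]  # outcome: X only
        designs = {
            "continuous": [[1.0, e, x] for e, x in zip(es, xs)],
            "dichotomized": [[1.0, e, 1.0 if x > 0 else 0.0] for e, x in zip(es, xs)],
        }
        for name, rows in designs.items():
            beta, cov = fit_logistic(rows, ys)
            if abs(beta[1] / math.sqrt(cov[1][1])) > z_crit:  # Wald test of the exposure term
                hits[name] += 1
    return {name: count / reps for name, count in hits.items()}


rates = rejection_rates()
print(rates)
```

Adjusting for the continuous confounder should reject at roughly the nominal 5%, while the dichotomized adjustment should reject far more often, even though the exposure does nothing in this simulated world.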

Finally, with respect to the use of logistic regression to control confounding and provide adjusted odds ratios, there is the problem of the small number of events. Although the overall sample size is adequate for logistic regression, cell sizes of one, or two, or three, raise serious questions about the use of large-sample statistical methods for analysis of the HSP results.[21]

A Surfeit of Sub-Groups

The study protocol identified three (really four or five) specific goals, to estimate the associations: (1) between PPA use and HS; (2) between HS and type of PPA use (cough-cold remedy or appetite suppression); and (3) in women, between PPA appetite suppressant use and HS, and between PPA first use and HS.[22]

With two different definitions of “exposure,” some modifications added along the way, two sexes, two different indications (cold remedy and appetite suppression), and non-pre-specified analyses such as men’s cough-cold PPA use, there was ample opportunity to inflate the Type I error rate. As the authors of the HSP final report acknowledged, they were able to identify only 60 “exposed” cases and controls.[23] In the context of a large case-control study, the authors were able to identify a nominally statistically significant outcome (PPA appetite suppressant use and HS), but it was based upon very small numbers (six exposed cases and one exposed control), which made the result very uncertain considering the potential biases and confounding.
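The arithmetic of multiple comparisons is simple under one strong assumption, namely that the tests are independent, which the overlapping HSP comparisons certainly were not. With that caveat, the sketch below shows how quickly the chance of at least one nominally significant result grows when every null hypothesis is true, alongside the Bonferroni per-test threshold that would hold the family-wise error rate at 5%. The numbers of comparisons are illustrative, not a count of the HSP's analyses.

```python
def familywise_error(k, alpha=0.05):
    """Chance of at least one false positive among k independent tests, all nulls true."""
    return 1 - (1 - alpha) ** k


def bonferroni_alpha(k, alpha=0.05):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / k


for k in (1, 3, 5, 10, 20):
    print(f"{k:2d} comparisons: P(at least one false positive) = "
          f"{familywise_error(k):.2f}, Bonferroni per-test alpha = "
          f"{bonferroni_alpha(k):.4f}")
```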

Design and Implementation Problems

Case-control studies always present some difficulty in obtaining controls who are similar to cases except that they did not experience the outcome of interest. As noted, controls were selected using “random digit dialing” in the same area code as the cases. The investigators were troubled by poor response rates from potential controls. They deviated from standard methodology for enrolling controls through random digit dialing by enrolling the first eligible control who agreed to participate, while failing to call back candidates who had asked to speak at another time.[24]

The exposure prevalence among controls was considerably lower than PPA-product marketing research indicated. This low reported exposure rate among controls again raises questions, because under-ascertainment of exposure among controls would inflate any observed odds ratios. Of course, it seems eminently reasonable to predict that persons who were suffering from head colds or the flu might not answer their phones or might request a call back. People who are obese might be reluctant to tell a stranger on the telephone that they are using a medication to suppress their appetite.
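How much a shortfall in reported control exposure could matter is easy to sketch. The counts below are the ones given for any exposure to PPA cough-cold remedies among women (16 of 383 cases, 19 of 750 controls); the crude odds ratio they yield is not the adjusted 1.54 that the HSP reported. The sweep assumes, hypothetically, that some fraction of truly exposed controls was missed during recruitment and replaced by unexposed controls, and shows the crude odds ratio the corrected counts would produce.

```python
def crude_odds_ratio(exp_cases, n_cases, exp_controls, n_controls):
    """Crude odds ratio from exposed counts and group totals (no adjustment)."""
    a, b = exp_cases, n_cases - exp_cases
    c, d = exp_controls, n_controls - exp_controls
    return (a * d) / (b * c)


# Women, any exposure to PPA cough-cold remedies (counts from the text).
CASES_EXPOSED, CASES_TOTAL = 16, 383
CONTROLS_EXPOSED, CONTROLS_TOTAL = 19, 750

# Hypothetical: a fraction of truly exposed controls was never captured.
for missed in (0.0, 0.1, 0.2, 0.3):
    corrected_exposed_controls = CONTROLS_EXPOSED / (1 - missed)
    or_ = crude_odds_ratio(CASES_EXPOSED, CASES_TOTAL,
                           corrected_exposed_controls, CONTROLS_TOTAL)
    print(f"{missed:.0%} of exposed controls missed -> crude OR = {or_:.2f}")
```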

In the face of this obvious opportunity for selection bias, there was also ample room for recall bias. Cases were asked about medication use just before an unforgettable, catastrophic event in their lives. Controls were asked about medication use before a reference day sometime within the previous week. More controls than cases were interviewed by phone. Given the small number of exposed cases and controls, recall bias, created by the differences in circumstances, interview settings, and procedures, was never excluded.

Lumpen Epidemiology: ICH vs SAH

Every epidemiologic study or clinical trial has an exposure and outcome of interest, in a population of interest. The point is to compare exposed and unexposed persons, of relevant age, gender, and background, with comparable risk factors other than the exposure of interest, to determine if the exposure makes any difference in the rate of events of the outcome of interest.

Composite end points represent “lumping” together different individual end points for consideration as a single outcome. The validity of composite end points depends upon assumptions, which will have to be made at the time investigators design their study and write their protocol.  After the data are collected and analyzed, the assumptions may or may not be supported.

Lumping may offer some methodological benefits, such as increasing statistical power or reducing sample size requirements. Standard epidemiologic practice, however, as reflected in numerous textbooks and methodology articles, requires the reporting of the individual constituent end points, along with the composite result. Even when the composite end point was employed based upon a view that the component end points are sufficiently related, that view must itself ultimately be tested by showing that the individual end points are, in fact, concordant, with risk ratios in the same direction.

The medical literature contains many clear statements cautioning consumers of medical studies against misleading claims based upon composite end points. In 2004, the British Medical Journal published a useful paper, “Users’ guide to detecting misleading claims in clinical research reports.” One of the authors’ suggestions to readers was:

“Beware of composite endpoints.”[25]

The one methodological point on which virtually all writers agree is that authors should report the results for the component end points separately, to permit readers to evaluate the individual results.[26] A leading biostatistical methodologist, the late Douglas Altman, cautioned readers against assuming that the overall estimate of association can be interpreted for each individual end point, and advised authors to provide “[a] clear listing of the individual endpoints and the number of participants experiencing them” to permit a more meaningful interpretation of composite outcomes.[27]

The HSP authors used a composite of hemorrhagic strokes, which was composed of both intracerebral hemorrhages (ICH) and subarachnoid hemorrhages (SAH). In their New England Journal of Medicine article, the authors presented the composite end point, but not the risk ratios for the two individual end points. Before they published the article, one of the authors wrote his fellow authors to advise them that because ICH and SAH are very different medical phenomena, they should present the individual end points in their analysis.[28]

The HSP researchers eventually did publish an analysis of SAH and PPA use.[29] The authors identified 425 SAH cases, of which 312 met the criteria for aneurysmal SAH. They looked at many potential risk factors such as smoking (OR = 5.07), family history (OR = 3.1), marijuana (OR = 2.38), cocaine (OR = 24.97), hypertension (OR = 2.39), aspirin (OR = 1.24), alcohol (OR = 2.95), education, as well as PPA.

Only a bivariate analysis was presented for PPA, with an odds ratio of 1.15, p = 0.87. No confidence intervals were presented. The authors were a bit more forthcoming about the potential role of bias and confounding in this publication than they were in their earlier 2000 HSP paper. “Biases that might have affected this analysis of the HSP include selection and recall bias.”[30]

Judge Rothstein’s Rule 702 opinion reports that the “Defendants assert that this article demonstrates the lack of an association between PPA and SAHs resulting from the rupture of an aneurysm.”[31] If the defendants actually claimed a “demonstration” of “the lack of association,” then shame, and more shame, on them! First, the cited study provided only a bivariate analysis for PPA and SAH. The odds ratio of 1.15 pales in comparison with the odds ratios reported for many other common exposures. We can only speculate what happens to the 1.15 when the PPA exposure is placed in a fully adjusted model for all important covariates. Second, the p-value of 0.87 does not tell us that the 1.15 is unreal or due to chance. The HSP reported a 15% increase in the odds, which is highly compatible with no increased risk at all. Perhaps if the defendants had been more modest in their characterization, they would not have given the court the basis to find that “defendants distort and misinterpret the Stroke Article.”[32]

Rejecting the defendants’ characterization, the court drew upon an affidavit from plaintiffs’ expert witness, Kenneth Rothman, who explained that a p-value cannot provide evidence of lack of an effect.[33] A high p-value, with a corresponding 95% confidence interval that includes 1.0, can, however, show that the sample data are compatible with the null hypothesis. What Judge Rothstein missed, and the defendants may not have said effectively, is that the statistical analysis was a test of a hypothesis, and the test failed to allow us to reject the null hypothesis. The plaintiffs were left with an indeterminate analysis, from which they could not honestly claim an association between PPA use and aneurysmal SAH.
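Because the SAH paper reported a p-value but no confidence interval, a reader can approximate one. The sketch below assumes the p-value came from a Wald-type test of log(OR) = 0, which the paper's bivariate analysis may or may not have used; the result is therefore only approximate, and it is sensitive to rounding when the p-value is close to 1. Even so, it suggests an interval stretching from a substantial protective effect to a several-fold increase in the odds, the signature of an indeterminate result rather than a demonstrated absence of association.

```python
import math
from statistics import NormalDist


def ci_from_or_and_p(or_point, p_two_sided, level=0.95):
    """Approximate CI for an odds ratio from its point estimate and two-sided p-value.

    Assumes the p-value came from a Wald-type (normal) test of log(OR) = 0.
    """
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - p_two_sided / 2)  # |z| implied by the p-value
    se = abs(math.log(or_point)) / z              # implied SE of log(OR)
    z_crit = std_normal.inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(or_point) - z_crit * se)
    upper = math.exp(math.log(or_point) + z_crit * se)
    return lower, upper


lower, upper = ci_from_or_and_p(1.15, 0.87)
print(f"OR 1.15, p = 0.87 -> approximate 95% CI {lower:.2f} to {upper:.2f}")
```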

I Once Was Blind, But Now I See

The HSP protocol called for interviewers to be blinded to the study hypothesis, but this guard against bias was abandoned.[34]  The HSP report acknowledged that “[b]linding would have provided extra protection against unequal ascertainment of PPA exposure in case subjects compared with control subjects.”[35]

The study was conducted at four sites, and at least one of the sites violated the protocol by informing cases that they were participating in a study designed to evaluate PPA and HS.[36] The published article in the New England Journal of Medicine misleadingly claimed that study participants were blinded to its research hypothesis.[37] Although the plaintiffs’ expert witnesses tried to slough off this criticism, the lack of blinding among interviewers and study subjects amplifies recall bias, especially when study subjects and interviewers may have been reluctant to discuss fully several of the co-variate exposures, such as cocaine, marijuana, and alcohol use.[38]

No Causation At All

Scientists and the general population alike have been conditioned to view the controversy over tobacco smoking and lung cancer as a contrivance of the tobacco industry. What is lost in this conditioning is the context of Sir Austin Bradford Hill’s triumphant 1965 Royal Society of Medicine presidential address. Hill and his colleague Sir Richard Doll were not overly concerned with the tobacco industry, but rather with the important methodological criticisms posited by three leading statistical scientists, Joseph Berkson, Jerzy Neyman, and Sir Ronald Fisher. Hill and Doll’s success in showing that tobacco smoking causes lung cancer required an adequate rebuttal of these critics. The 1965 speech is often cited for its articulation of nine factors to consider in evaluating an association, but its necessary precondition is often overlooked. In his speech, Hill identified the situation before the nine factors come into play:

“Disregarding then any such problem in semantics we have this situation. Our observations reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance. What aspects of that association should we especially consider before deciding that the most likely interpretation of it is causation?”[39]

The starting point, before the Bradford Hill nine factors come into play, requires a “clear-cut” association, which is “beyond what we would care to attribute to the play of chance.” What is a “clear-cut” association? The most reasonable interpretation of Bradford Hill is that the starting point is an association that is not the result of chance, bias, or confounding.

Looking at the state of the science after the HSP was published, there were two earlier studies that failed to find any association between PPA and HS. The HSP authors “suggested” an association between PPA appetite suppressant use and HS, but with six exposed cases and one exposed control, this was hardly beyond the play of chance. And none of the putative associations was “clear-cut” in removing bias and confounding as explanations for the observations.

And Then Litigation Cometh

A tsunami of state and federal cases followed the publication of the HSP study.[40] The Judicial Panel on Multi-district Litigation gave Judge Barbara Rothstein, in the Western District of Washington, responsibility for the pre-trial management of the federal PPA cases. Given the problems with the HSP, the defense unsurprisingly lodged Rule 702 challenges to plaintiffs’ expert witnesses’ opinions, and Rule 703 challenges to reliance upon the HSP.[41]

In June 2003, Judge Rothstein issued her decision on the defense motions. After reviewing a selective regulatory history of PPA, the court turned to epidemiology, and its statistical analysis.  Although misunderstanding of p-values and confidence intervals is endemic among the judiciary, the descriptions provided by Judge Rothstein portended a poor outcome:

“P-values measure the probability that the reported association was due to chance, while confidence intervals indicate the range of values within which the true odds ratio is likely to fall.”[42]

Both descriptions are seriously incorrect,[43] which is especially concerning given that Judge Rothstein would go on, in 2003, to become the director of the Federal Judicial Center, where she would oversee work on the Reference Manual on Scientific Evidence.
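A short simulation shows what the 95% in a confidence interval does describe. Assuming samples drawn from a normal distribution with a known standard deviation (a textbook setting chosen only for illustration, with hypothetical parameter values), the sketch repeatedly computes a 95% interval and counts how often it contains the fixed true mean. The 95% is a property of the procedure over many repetitions; any single computed interval either contains the true value or does not. The same logic underlies a p-value, which is computed on the assumption that the null hypothesis is true, and so cannot be the probability that chance produced the observed association.

```python
import random
from statistics import NormalDist, fmean


def coverage(true_mean=10.0, sigma=2.0, n=25, reps=2000, seed=7):
    """Fraction of simulated 95% z-intervals (known sigma) that contain the true mean."""
    rng = random.Random(seed)
    half_width = NormalDist().inv_cdf(0.975) * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        center = fmean(sample)
        if center - half_width <= true_mean <= center + half_width:
            hits += 1
    return hits / reps


observed_coverage = coverage()
print(f"Share of intervals covering the true mean: {observed_coverage:.3f}")
```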

The MDL court also managed to make a mash out of the one-tailed test used in the HSP report. That report was designed to inform regulatory action, where actual conclusions of causation are not necessary. When the HSP authors submitted their paper to the New England Journal of Medicine, they of course had to comply with the standards of that journal, and they doubled their reported p-values to comply with the journal’s requirement of using a two-tailed test. Some key results of the HSP no longer had p-values below 5 percent, as the defense was keen to point out in its briefings.

From the sources it cited, the court clearly did not understand the issue, which was the need to control for random error. The court declared that it had found:

“that the HSP’s one-tailed statistical analysis complies with proper scientific methodology, and concludes that the difference in the expression of the HSP’s findings [and in the published article] falls far short of impugning the study’s reliability.”[44]

This finding ignores the very different contexts of regulatory action and of causation determinations in civil litigation. The court’s citation to an early version of the Reference Manual on Scientific Evidence further illustrates its confusion:

“Since most investigators of toxic substances are only interested in whether the agent increases the incidence of disease (as distinguished from providing protection from the disease), a one-tailed test is often viewed as appropriate.”

*****

“a rigid rule [requiring a two-tailed test] is not required if p-values and significance levels are used as clues rather than as mechanical rules for statistical proof.”[45]

In a sense, given the prevalence of advocacy epidemiology, many researchers are interested only in showing an increased risk. Nonetheless, the point of evaluating p-values is to assess the random error involved in sampling from a population, and that sampling generates a rate of error even when the null hypothesis is assumed to be absolutely correct. Random error can go in either direction, resulting in risk ratios above or below 1.0. Indeed, the probability of observing a risk ratio of exactly 1.0, in a large study, is incredibly small even if the null hypothesis is correct. The risk ratio for men who had used a PPA product was below 1.0, which also recommends a two-tailed test. Trading on the confusion of regulatory and litigation findings, the court proceeded to mischaracterize the parties’ interests in designing the HSP as limited to whether PPA increased the risk of stroke. In the MDL, the parties did not want “clues,” or help on what FDA policy should be; they wanted a test of the causal hypothesis.
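That random error runs in both directions can be checked by simulating case-control studies in which exposure has no effect whatsoever. The sketch borrows the HSP's overall numbers of cases (702) and controls (1,376); the 5% exposure prevalence, number of replications, and seed are hypothetical. Roughly half of the crude odds ratios should fall above 1.0 and half below, and with these sample sizes an odds ratio of exactly 1.0 essentially never occurs.

```python
import random


def null_odds_ratios(n_cases=702, n_controls=1376, prevalence=0.05, reps=1000, seed=11):
    """Crude odds ratios from simulated studies in which exposure has no effect at all."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(reps):
        a = sum(rng.random() < prevalence for _ in range(n_cases))     # exposed cases
        c = sum(rng.random() < prevalence for _ in range(n_controls))  # exposed controls
        b, d = n_cases - a, n_controls - c
        if a and c:  # skip the (very rare) tables with an empty exposed cell
            ratios.append((a * d) / (b * c))
    return ratios


ratios = null_odds_ratios()
above = sum(r > 1 for r in ratios) / len(ratios)
below = sum(r < 1 for r in ratios) / len(ratios)
exactly_one = sum(r == 1 for r in ratios) / len(ratios)
print(f"OR > 1: {above:.2f}, OR < 1: {below:.2f}, OR exactly 1: {exactly_one:.3f}")
```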

In a footnote, the court pointed to testimony of Dr. Ralph Horwitz, one of the HSP investigators, who stated that “[a]ll parties involved in designing the HSP were interested solely in testing whether PPA increased the risk of stroke.” The parties, of course, were not designing the HSP to support litigation claims.[46] The court also cited, in this footnote, a then-recent case that found a one-tailed p-value inappropriate “where that analysis assumed the very fact in dispute.” The plaintiffs’ reliance upon the one-sided p-values in the unpublished HSP report did exactly that.[47] The court tried to excuse the failure to rule out random error by pointing to language in the published HSP article, where the authors stated that inconclusive findings raised “concern regarding safety.”[48]

In analyzing the defense challenge to the opinions based upon the HSP, Judge Rothstein committed both legal and logical fallacies. First, citing Professor David Faigman’s treatise for the proposition that epidemiology is widely accepted because the “general techniques are valid,” the court found that the HSP, and reliance upon it, was valid, despite the identified problems. The issue was not whether epidemiological techniques are valid, but whether the techniques used in the HSP were valid. The devilish details of the HSP in particular largely went ignored.[49] From a legal perspective, Judge Rothstein’s opinion can be seen to place a burden upon the defense to show invalidity, by invoking a presumption of validity. This shifting of the burden was then, and is now, contrary to the law.

Perhaps the most obvious dodge of the court’s gatekeeping responsibility came with the conclusory assertion that the “Defendants’ ex post facto dissection of the HSP fails to undermine its reliability. Scientific studies almost invariably contain flaws.”[50] Perhaps it is sobering to consider that all human beings have flaws, and yet somehow we distinguish between sinners and saints, and between criminals and heroes. The court shirked its responsibility to look at the identified flaws to determine whether they threatened the HSP’s internal validity, as well as its external validity for the plaintiffs’ claims, both for hemorrhagic strokes in each of the many subgroups considered in the HSP and for outcomes not considered, such as myocardial infarction and ischemic stroke. Given that there was but one key epidemiologic study relied upon for support of the plaintiffs’ extravagant causal claims, the identified flaws might be expected to lead to some epistemic humility.

The PPA MDL court exhibited a willingness to cherry pick HSP results to support its low-grade gatekeeping. For instance, the court recited that “[b]ecause no men reported use of appetite suppressants and only two reported first use of a PPA-containing product, the investigators could not determine whether PPA posed an increased risk for hemorrhagic stroke in men.”[51] There was, of course, another definition of PPA exposure that yielded a total of 19 exposed men, about one-third of all exposed cases and controls. All exposed men used OTC PPA cough-cold remedies: six men with HS, and 13 controls, with a reported odds ratio of 0.62 (95% CI, 0.20 – 1.92), p = 0.41. Although the result for men was not statistically significant, the point estimate for the sample was a risk ratio below one, with a confidence interval that excludes a doubling of the risk. The number of exposed male HS cases was the same as the number of female HS appetite suppressant cases, which somehow did not disturb the court.

Superficially, the PPA MDL court appeared to place great weight on the fact of peer-reviewed publication, in a prestigious journal, by well-credentialed scientists and clinicians. In the court’s view, the fact that “[t]he prestigious NEJM published the HSP results” showed that the “research bears the indicia of good science.”[52] Although Professor Susan Haack’s writings on law and science are often errant, her analysis of this kind of blind reliance on peer review is noteworthy:

“though peer-reviewed publication is now standard practice at scientific and medical journals, I doubt that many working scientists imagine that the fact that a work has been accepted for publication after peer review is any guarantee that it is good stuff, or that its not having been published necessarily undermines its value. The legal system, however, has come to invest considerable epistemic confidence in peer-reviewed publication — perhaps for no better reason than that the law reviews are not peer-reviewed!”[53]

Ultimately, the PPA MDL court revealed that it was quite inattentive to the validity concerns of the HSP. Among the cases filed in the federal court were heart attack and ischemic stroke claims. The HSP did not address those claims, and the MDL court was perfectly willing to green-light them on the basis of case reports and expert witness hand waving about “plausibility.” Not only was this reliance upon case reports plus biological plausibility against the weight of legal authority, but it was also against the weight of scientific opinion, as expressed by the HSP authors themselves:

“Although the case reports called attention to a possible association between the use of phenylpropanolamine and the risk of hemorrhagic stroke, the absence of control subjects meant that these studies could not produce evidence that meets the usual criteria for valid scientific inference.”[54]

If no epidemiology at all was necessary for the ischemic stroke and myocardial infarction claims, then a deeply flawed epidemiologic study was surely better than nothing. Peer review and prestige were merely window dressing.

The HSP study received much closer analysis in the trials that followed. Before the MDL court concluded its abridged gatekeeping, the defense successfully sought the HSP’s underlying data. Plaintiffs’ counsel and the Yale investigators resisted, filing motions to quash the defense subpoenas. The MDL court denied the motions and required the parties to collaborate on redacting the medical records to be produced.[55]

In a law review article published a few years after the PPA Rule 702 decision, Judge Rothstein immodestly described the PPA MDL as a “model mass tort,” and without irony characterized herself as having taken “an aggressive role in determining the admissibility of scientific evidence [].”[56]

The MDL court’s PPA decision stands as a landmark of judicial incuriousness and credulity. The court conducted hearings and entertained extensive briefing on the reliability of plaintiffs’ expert witnesses’ opinions, which were based largely upon one epidemiologic study, known as the “Yale Hemorrhagic Stroke Project (HSP).” In the end, publication in a prestigious peer-reviewed journal served as a proxy for independent review and an excuse not to exercise critical judgment: “The prestigious NEJM published the HSP results, further substantiating that the research bears the indicia of good science.” Id. at 1239 (citing Daubert II for the proposition that peer review shows the research meets the minimal criteria for good science). The admissibility challenges were denied.

Exuberant Praise for Judge Rothstein

In 2009, at an American Law Institute – American Bar Association continuing legal education seminar on expert witnesses and environmental litigation, Anthony Roisman presented on “Daubert & Its Progeny – Finding & Selecting Experts – Direct & Cross-Examination.” Roisman has been active in various plaintiff advocacy organizations, including serving as the head of the Association of Trial Lawyers of America’s Section on Toxic, Environmental & Pharmaceutical Torts (STEP). In his 2009 lecture, Roisman praised Judge Rothstein’s PPA Rule 702 decision as “the way Daubert should be interpreted.” More concerning was Roisman’s revelation that Judge Rothstein wrote the PPA decision “fresh from a seminar conducted by the Tellus Institute, which is an organization set up of scientists to try to bring some common sense to the courts’ interpretation of science, which is what is going on in a Daubert case.”[57]

Roisman’s endorsement of the PPA decision may have been purely result-oriented jurisprudence, but what of his enthusiasm for the “learning” that Judge Rothstein received fresh from the Tellus Institute? What exactly is or was the Tellus Institute?

In June 2003, the same month as Judge Rothstein’s PPA decision, the Tellus Institute supported a group known as the Project on Scientific Knowledge and Public Policy (SKAPP) in publishing an attack on the Daubert decision. The Tellus-SKAPP paper, “Daubert: The Most Influential Supreme Court Ruling You’ve Never Heard Of,” appeared online in 2003.[58]

David Michaels, a plaintiffs’ expert witness in chemical exposure cases and a founder of SKAPP, has typically described his organization as having been funded by the Common Benefit Trust, “a fund established pursuant to a court order in the Silicone Gel Breast Implant Liability litigation.”[59] What Michaels hides is that this “Trust” is nothing other than the common benefit fund set up in MDL 926, as it is in most MDLs, to permit plaintiffs’ counsel to retain and present expert witnesses in the common proceedings. In other words, it was the plaintiffs’ lawyers’ walking-around money. The Tellus Institute, SKAPP’s sister organization, was clearly aligned with SKAPP. Alas, Richard Clapp, who was a testifying expert witness for PPA plaintiffs, was an active member of the Tellus Institute at the time of the judicial educational seminar for Judge Rothstein.[60] Clapp is listed as a member of the planning committee responsible for preparing the anti-Daubert pamphlet. In 2005, as director of the Federal Judicial Center, Judge Rothstein attended another conference, the “Coronado Conference,” which was sponsored by SKAPP.[61]

Roisman’s revelation in 2009, after the dust had settled on the PPA litigation, may well put Judge Rothstein in the same category as Judge James Kelly, against whom the U.S. Court of Appeals for the Third Circuit issued a writ of mandamus for recusal. Judge Kelly was invited to attend a conference on asbestos medical issues, set up by Dr. Irving Selikoff with scientists who testified for plaintiffs’ counsel. The conference was funded by plaintiffs’ counsel. The co-conspirators, Selikoff and plaintiffs’ counsel, paid for Judge Kelly’s transportation and lodgings, without revealing the source of the funding.[62]

In the case of Selikoff and Motley’s effort to subvert the neutrality of Judge James M. Kelly in the school district asbestos litigation, and to pervert the course of justice, the conspiracy was detected in time for a successful recusal effort. In the PPA litigation, there was no disclosure of the efforts by the anti-Daubert advocacy group, the Tellus Institute, to undermine the neutrality of a federal judge.

Aftermath of Failed MDL Gatekeeping

Ultimately, the HSP study received much more careful analysis before juries. Although the cases that went to trial involved plaintiffs with catastrophic injuries and were supported by a high-profile article in the New England Journal of Medicine, the juries returned verdicts overwhelmingly in favor of the defense.[63]

In the first case that went to trial (but the second to reach a verdict), the defense presented a thorough scientific critique of the HSP. The underlying data and medical records that had been produced in response to a Rule 45 subpoena in the MDL allowed juries to see that the study investigators had deviated from the protocol in ways that increased the number of exposed cases, with the obvious result of increasing the reported odds ratios. Juries were ultimately much more curious about evidence and testimony on the reclassifications of exposure that drove up the odds ratios for PPA use than they were about the performance of linear logistic regressions.
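Why a handful of reclassifications matters can be seen from the cross-product formula for an odds ratio, OR = (a·d)/(b·c), where a is the number of exposed cases (the “A cell”). The counts below are hypothetical, invented only to illustrate the arithmetic, and are not HSP data. The HSP used matched sets and conditional logistic regression, which this crude, unmatched calculation does not reproduce; the function name is likewise hypothetical.

```python
def crude_odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Cross-product odds ratio (a*d)/(b*c) from an unmatched 2x2 table."""
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)

# Hypothetical counts for illustration only; these are NOT HSP data.
TOTAL_CASES = 100
EXP_CONTROLS, UNEXP_CONTROLS = 10, 190  # controls are left untouched
BASELINE_EXPOSED_CASES = 5

# Move 0 to 3 cases from "unexposed" to "exposed" (the A cell) and recompute.
for moved in range(4):
    exposed = BASELINE_EXPOSED_CASES + moved
    odds_ratio = crude_odds_ratio(
        exposed, TOTAL_CASES - exposed, EXP_CONTROLS, UNEXP_CONTROLS
    )
    print(f"{moved} case(s) reclassified as exposed -> OR = {odds_ratio:.2f}")
```

With these made-up counts, moving three of 100 cases from unexposed to exposed raises the crude odds ratio from 1.00 to about 1.65, without any change among the controls. When exposed cases are few, each reclassification carries outsized weight.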

The HSP investigators were well aware that medication use could occur after the onset of stroke symptoms (headache), which might send a person to the medicine chest for an OTC cold remedy. Case 71-0039 was just such a case, as shown by the medical records and by the HSP investigators’ initial classification of the case. On dubious grounds, however, the study reclassified the time of stroke onset to after the PPA-medication use, a change the investigators knew would increase their chances of finding an association.

The reclassification of Case 20-0092 was even more egregious. The patient was originally diagnosed as having experienced a transient ischemic attack (TIA), after a CT of the head showed no bleed. Case 20-0092 was not a case. For the TIA, the patient was given heparin, an appropriate therapy but one that is known to cause bleeding. The following day, MRI of the head revealed a HS. The HSP classified Case 20-0092 as a case.

In Case 18-0025, the patient experienced a headache in the morning, and took a PPA-medication (Contac) for relief. The stroke was already underway when the Contac was taken, but the HSP reversed the order of events.

Case 62-0094 presented an interesting medical history that included an event no one in the HSP had considered including in the interview protocol. In addition to a history of heavy smoking, alcohol, cocaine, heroin, and marijuana use, and a seizure disorder, Case 62-0094 suffered a traumatic head injury immediately before developing a subarachnoid hemorrhage (SAH). Treating physicians ascribed the SAH to the traumatic injury, but, understandably, no controls were identified with a similar head injury within the exposure period.

Both sides of the PPA litigation accused the other of “hacking at the A cell,” but juries seemed to understand that the hacking had started before the paper was published.

In a Los Angeles case involving two plaintiffs, in which the jury heard the details of how the HSP cases were analyzed, the jury returned two defense verdicts. In post-trial motions, plaintiffs’ counsel challenged the defendant’s reliance upon the HSP’s underlying data, which went behind the peer-reviewed publication and showed that peer review had failed to prevent serious errors. In essence, plaintiffs’ counsel claimed that the defense’s scrutiny of the underlying data and of the investigators’ misclassifications did not itself rest on “generally accepted” methods, and was thus inadmissible. The trial court rejected the plaintiffs’ claim and their request for a new trial, and spoke to the importance of looking behind the superficial assurance of peer review for the key study relied upon by plaintiffs in the PPA litigation:

“I mean, you could almost say that there was some unethical activity with that Yale Study.  It’s real close.  I mean, I — I am very, very concerned at the integrity of those researchers.

********

Yale gets — Yale gets a big black eye on this.”[64]

Epidemiologist Charles Hennekens, who had been a consultant to PPA-medication manufacturers, published a critique of the HSP study in 2006. The Hennekens critique included many of the criticisms that he, as well as epidemiologists Lewis Kuller, Noel Weiss, and Brian Strom, had lodged at an October 2000 FDA meeting, before the HSP was published. Richard Clapp, Tellus Institute activist and expert witness for PPA plaintiffs, and Michael Williams, a lawyer for PPA claimants, wrote a letter criticizing Hennekens.[65] David Michaels, an expert witness for plaintiffs in other chemical exposure cases, and a founder of SKAPP, which collaborated with the Tellus Institute on its anti-Daubert campaign, wrote a letter accusing Hennekens of “mercenary epidemiology” for engaging in re-analysis of a published study. Michaels never complained about the litigation-inspired re-analyses put forward by plaintiffs’ witnesses in the Bendectin litigation. Plaintiffs’ lawyers and their expert witnesses had much to gain by starting the litigation and trying to expand its reach. Defense lawyers and their expert witnesses effectively put themselves out of business by shutting it down.[66]


[1] Rachel Gorodetsky, “Phenylpropanolamine,” in Philip Wexler, ed., 7 Encyclopedia of Toxicology 559 (4th ed. 2024).

[2] Hershel Jick, Pamela Aselton, and Judith R. Hunter, “Phenylpropanolamine and Cerebral Hemorrhage,” 323 Lancet 1017 (1984).

[3] Robert R. O’Neill & Stephen W. Van de Carr, “A Case-Control Study of Adrenergic Decongestants and Hemorrhagic CVA Using a Medicaid Data Base,” ms. (1985).

[4] Raymond Lipicky, Center for Drug Evaluation and Research, PPA, Safety Summary at 29 (Aug. 9, 1990).

[5] Center for Drug Evaluation and Research, US Food and Drug Administration, “Epidemiologic Review of Phenylpropanolamine Safety Issues” (April 30, 1991).

[6] Ralph I. Horwitz, Lawrence M. Brass, Walter N. Kernan, Catherine M. Viscoli, “Phenylpropanolamine & Risk of Hemorrhagic Stroke – Final Report of the Hemorrhagic Stroke Project” (May 10, 2000).

[7] Id. at 3, 26.

[8] Lois La Grenade & Parivash Nourjah, “Review of study protocol, final study report and raw data regarding the incidence of hemorrhagic stroke associated with the use of phenylopropanolamine,” Division of Drug Risk Assessment, Office of Post-Marketing Drug Risk Assessment (OPDRA) (Sept. 27, 2000). These authors concluded that the HSP report provided “compelling evidence of increased risk of hemorrhagic stroke in young people who use PPA-containing appetite suppressants. This finding, taken in association with evidence provided by spontaneous reports and case reports published in the medical literature leads us to recommend that these products should no longer be available for over the counter use.”

[9] Among those who voiced criticisms of the design, methods, and interpretation of the HSP study were Noel Weiss, Lewis Kuller, Brian Strom, and Janet Daling. Many of the criticisms would prove to be understated in the light of post-publication review.

[10] Walter N. Kernan, Catherine M. Viscoli, Lawrence M. Brass, J.P. Broderick, T. Brott, and Edward Feldmann, “Phenylpropanolamine and the risk of hemorrhagic stroke,” 343 New Engl. J. Med. 1826 (2000) [cited as Kernan]

[11] Kernan, supra note 10, at 1826 (emphasis added).

[12] David Harrington, Ralph B. D’Agostino, Sr., Constantine Gatsonis, Joseph W. Hogan, David J. Hunter, Sharon-Lise T. Normand, Jeffrey M. Drazen, and Mary Beth Hamel, “New Guidelines for Statistical Reporting in the Journal,” 381 New Engl. J. Med. 285 (2019).

[13] Kernan, supra note 10, at 1827.

[14] Transcript of Meeting on Safety Issues of Phenylpropanolamine (PPA) in Over-the-Counter Drug Products 117 (Oct. 19, 2000).

[15] See, e.g., Huw Talfryn Oakley Davies, Iain Kinloch Crombie, and Manouche Tavakoli, “When can odds ratios mislead?” 316 Brit. Med. J. 989 (1998); Thomas F. Monaghan, Rahman, Christina W. Agudelo, Alan J. Wein, Jason M. Lazar, Karel Everaert, and Roger R. Dmochowski, “Foundational Statistical Principles in Medical Research: A Tutorial on Odds Ratios, Relative Risk, Absolute Risk, and Number Needed to Treat,” 18 Internat’l J. Envt’l Research & Public Health 5669 (2021).

[16] Kernan, supra note 10, at 1829, Table 2.

[17] Kernan, supra note 10, at 1827.

[18] Peter C. Austin & Lawrence J. Brunner, “Inflation of the type I error rate when a continuous confounding variable is categorized in logistic regression analyses,” 23 Statist. Med. 1159 (2004).

[19] See, e.g., Douglas G. Altman & Patrick Royston, “The cost of dichotomising continuous variables,” 332 Brit. Med. J. 1080 (2006); Patrick Royston, Douglas G. Altman, and Willi Sauerbrei, “Dichotomizing continuous predictors in multiple regression: a bad idea,” 25 Stat. Med. 127 (2006). See also Robert C. MacCallum, Shaobo Zhang, Kristopher J. Preacher, and Derek D. Rucker, “On the Practice of Dichotomization of Quantitative Variables,” 7 Psychological Methods 19 (2002); David L. Streiner, “Breaking Up is Hard to Do: The Heartbreak of Dichotomizing Continuous Data,” 47 Can. J. Psychiatry 262 (2002); Henian Chen, Patricia Cohen, and Sophie Chen, “Biased odds ratios from dichotomization of age,” 26 Statist. Med. 3487 (2007); Carl van Walraven & Robert G. Hart, “Leave ‘em Alone – Why Continuous Variables Should Be Analyzed as Such,” 30 Neuroepidemiology 138 (2008); O. Naggara, J. Raymond, F. Guilbert, D. Roy, A. Weill, and Douglas G. Altman, “Analysis by Categorizing or Dichotomizing Continuous Variables Is Inadvisable,” 32 Am. J. Neuroradiol. 437 (Mar 2011); Neal V. Dawson & Robert Weiss, “Dichotomizing Continuous Variables in Statistical Analysis: A Practice to Avoid,” Med. Decision Making 225 (2012); Phillippa M Cumberland, Gabriela Czanner, Catey Bunce, Caroline J Doré, Nick Freemantle, and Marta García-Fiñana, “Ophthalmic statistics note: the perils of dichotomising continuous variables,” 98 Brit. J. Ophthalmol. 841 (2014).

[20] Valerii Fedorov, Frank Mannino, and Rongmei Zhang, “Consequences of dichotomization,” 8 Pharmaceut. Statist. 50 (2009).

[21] Peter Peduzzi, John Concato, Elizabeth Kemper, Theodore R. Holford, and Alvan R. Feinstein, “A simulation study of the number of events per variable in logistic regression analysis,” 49 J. Clin. Epidem. 1373 (1996).

[22] HSP Final Report at 5.

[23] HSP Final Report at 26.

[24] Byron G. Stier & Charles H. Hennekens, “Phenylpropanolamine and Hemorrhagic Stroke in the Hemorrhagic Stroke Project: A Reappraisal in the Context of Science, the Food and Drug Administration, and the Law,” 16 Ann. Epidem. 49, 50 (2006) [cited as Stier & Hennekens].

[25] Victor M. Montori, Roman Jaeschke, Holger J. Schünemann, Mohit Bhandari, Jan L Brozek, P. J. Devereaux, and Gordon H. Guyatt, “Users’ guide to detecting misleading claims in clinical research reports,” 329 Brit. Med. J. 1093 (2004). 

[26] Wolfgang Ahrens & Iris Pigeot, eds., Handbook of Epidemiology 1840 (2d ed. 2014) (47.5.8 Use of Composite Endpoints); Stuart J. Pocock, John J. V. McMurray, and Tim J. Collier, “Statistical Controversies in Reporting of Clinical Trials: Part 2 of a 4-Part Series on Statistics for Clinical Trials,” 66 J. Am. Coll. Cardiol. 2648, 2650-51 (2015) (“Interpret composite endpoints carefully.”); Schulz & Grimes, “Multiplicity in randomized trials I:  endpoints and treatments,” 365 Lancet 1591, 1595 (2005).

[27] Eric Lim, Adam Brown, Adel Helmy, Shafi Mussa & Douglas Altman, “Composite Outcomes in Cardiovascular Research: A Survey of Randomized Trials,” 149 Ann. Intern. Med. 612 (2008).

[28] See, e.g., Thomas Brott email to Walter Kernan (Sept. 10, 2000).

[29] Joseph P. Broderick, Catherine M. Viscoli, Thomas Brott, Walter N. Kernan, Lawrence M. Brass, Edward Feldmann, Lewis B. Morgenstern, Janet Lee Wilterdink, and Ralph I. Horwitz, “Major Risk Factors for Aneurysmal Subarachnoid Hemorrhage in the Young Are Modifiable,” 34 Stroke 1375 (2003).

[30] Id. at 1379.

[31] Id. at 1243.

[32] Id. at 1243.

[33] Id., citing Rothman Affidavit, ¶ 7; Kenneth J. Rothman, Epidemiology:  An Introduction at 117 (2002).

[34] HSP Final Report at 26 (“HSP interviewers were not blinded to the case-control status of study subjects and some were aware of the study purpose.”); Walter Kernan Dep. at 473-74, In re PPA Prods. Liab. Litig., MDL 1407 (W.D. Wash.) (Sept. 19, 2002).

[35] HSP Final Report at 26.

[36] Stier & Hennekens, note 24 supra, at 51.

[37] NEJM at 1831.

[38] See Christopher T. Robertson & Aaron S. Kesselheim, Blinding as a Solution to Bias – Strengthening Biomedical Science, Forensic Science, and the Law 53 (2016); Sandy Zabell, “The Virtues of Being Blind,” 29 Chance 32 (2016).

[39] Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 295 (1965).

[40] See Barbara J. Rothstein, Francis E. McGovern, and Sarah Jael Dion, “A Model Mass Tort: The PPA Experience,” 54 Drake L. Rev. 621 (2006); Linda A. Ash, Mary Ross Terry, and Daniel E. Clark, Matthew Bender Drug Product Liability § 15.86 PPA (2003).

[41] In re Phenylpropanolamine Prods. Liab. Litig., 289 F.Supp. 2d 1230 (W.D. Wash. 2003).

[42] Id. at 1236 n.1

[43] Michael O. Finkelstein & Bruce Levin, Statistics for Lawyers 171, 173-74 (3rd ed. 2015). See also Sander Greenland, Stephen J. Senn, Kenneth J. Rothman, John B. Carlin, Charles Poole, Steven N. Goodman, and Douglas G. Altman, “Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations,” 31 Eur. J. Epidem. 337 (2016).

[44] In re Phenylpropanolamine Prods. Liab. Litig., 289 F.Supp. 2d 1230, 1241 (W.D. Wash. 2003).

[45] Id. (citing Reference Manual at 126-27, 358 n. 69). The edition of the Manual was not identified by the court.

[46] Id. at n.9, citing deposition of Ralph Horowitz [sic].

[47] Id., citing Good v. Fluor Daniel Corp., 222 F.Supp. 2d 1236, 1242-43 (E.D. Wash. 2002).

[48] Id. at 1241, citing Kernan at 183.

[49] In re Phenylpropanolamine Prods. Liab. Litig., 289 F.Supp. 2d 1230, 1239 (W.D. Wash. 2003) (citing 2 Modern Scientific Evidence: The Law and Science of Expert Testimony § 28-1.1, at 302-03 (David L. Faigman,  et al., eds., 1997) (“Epidemiologic studies have been well received by courts trying mass tort suits. Well-conducted studies are uniformly admitted. The widespread acceptance of epidemiology is based in large part on the belief that the general techniques are valid.”).

[50] Id. at 1240. The court cited the Reference Manual on Scientific Evidence 337 (2d ed. 2000), for this universal attribution of flaws to epidemiology studies (“It is important to recognize that most studies have flaws. Some flaws are inevitable given the limits of technology and resources.”) Of course, when technology and resources are limited, expert witnesses are permitted to say “I cannot say.” The PPA MDL court also cited another MDL court, which declared that “there is no such thing as a perfect epidemiological study.” In re Orthopedic Bone Screw Prods. Liab. Litig., MDL No. 1014, 1997 WL 230818, at *8-9 (E.D.Pa. May 5, 1997).

[51] Id. at 1236.

[52] Id. at 1239.

[53] Susan Haack, “Irreconcilable Differences? The Troubled Marriage of Science and Law,” 72 Law & Contemp. Problems 1, 19 (2009) (internal citations omitted). It may be telling that Haack has come to publish much of her analysis in law reviews. See Nathan Schachtman, “Misplaced Reliance On Peer Review to Separate Valid Science From Nonsense,” Tortini (Aug. 14, 2011).

[54] Kernan, supra note 10, at 1831.

[55] In re Propanolamine Prods. Litig., MDL 1407, Order re Motion to Quash Subpoenas re Yale Study’s Hospital Records (W.D. Wash. Aug. 16, 2002). Two of the HSP investigators wrote an article, over a decade later, to complain about litigation efforts to obtain data from ongoing studies. They did not mention the PPA case. Walter N. Kernan, Catherine M. Viscoli, and Mathew C. Varughese, “Litigation Seeking Access to Data From Ongoing Clinical Trials: A Threat to Clinical Research,” 174 J. Am. Med. Ass’n Intern. Med. 1502 (2014).

[56] Barbara J. Rothstein, Francis E. McGovern, and Sarah Jael Dion, “A Model Mass Tort: The PPA Experience,” 54 Drake L. Rev. 621, 638 (2006).

[57] Anthony Roisman, “Daubert & Its Progeny – Finding & Selecting Experts – Direct & Cross-Examination,” ALI-ABA 2009. Roisman’s remarks about the role of Tellus Institute start just after minute 8, on the recording, available from the American Law Institute, and the author.

[58] See “Daubert: The Most Influential Supreme Court Ruling You’ve Never Heard Of,” A Publication of the Project on Scientific Knowledge and Public Policy, coordinated by the Tellus Institute (2003).

[59] See, e.g., David Michaels, Doubt is Their Product: How Industry’s War on Science Threatens Your Health 267 (2008).

[60] See Richard W. Clapp & David Ozonoff, “Environment and Health: Vital Intersection or Contested Territory?” 30 Am. J. L. & Med. 189, 189 (2004) (“This Article also benefited from discussions with colleagues in the project on Scientific Knowledge and Public Policy at Tellus Institute, in Boston, Massachusetts.”).

[61] See Barbara Rothstein, “Bringing Science to Law,” 95 Am. J. Pub. Health S1 (2005) (“The Coronado Conference brought scientists and judges together to consider these and other tensions that arise when science is introduced in courts.”).

[62] In re School Asbestos Litigation, 977 F.2d 764 (3d Cir. 1992). See Cathleen M. Devlin, “Disqualification of Federal Judges – Third Circuit Orders District Judge James McGirr Kelly to Disqualify Himself So As To Preserve ‘The Appearance of Justice’ Under 28 U.S.C. § 455 – In re School Asbestos Litigation (1992),” 38 Villanova L. Rev. 1219 (1993); Bruce A. Green, “May Judges Attend Privately Funded Educational Programs? Should Judicial Education Be Privatized?: Questions of Judicial Ethics and Policy,” 29 Fordham Urb. L.J. 941, 996-98 (2002).

[63] Alison Frankel, “A Line in the Sand,” The Am. Lawyer – Litigation (2005); Alison Frankel, “The Mass Tort Bonanza That Wasn’t,” The Am. Lawyer (Jan. 6, 2006).

[64] O’Neill v. Novartis AG, California Superior Court, Los Angeles Cty., Transcript of Oral Argument on Post-Trial Motions, at 46-47 (March 18, 2004) (Hon. Anthony J. Mohr), aff’d sub nom. O’Neill v. Novartis Consumer Health, Inc., 147 Cal. App. 4th 1388, 55 Cal. Rptr. 3d 551, 558-61 (2007).

[65] Richard Clapp & Michael L. Williams, “Regarding ‘Phenylpropanolamine and Hemorrhagic Stroke in the Hemorrhagic Stroke Project,’” 16 Ann. Epidem. 580 (2006).

[66] David Michaels, “Regarding ‘Phenylpropanolamine and Hemorrhagic Stroke in the Hemorrhagic Stroke Project’: Mercenary Epidemiology – Data Reanalysis and Reinterpretation for Sponsors with Financial Interest in the Outcome,” 16 Ann. Epidem. 583 (2006). Hennekens responded to these letters. Stier & Hennekens, note 24, supra.

Access to a Study Protocol & Underlying Data Reveals a Nuclear Non-Proliferation Test

April 8th, 2024

The limits of peer review ultimately make it a poor proxy for the validity tests posed by Rules 702 and 703. Published peer-reviewed articles simply do not permit a very searching evaluation of the facts and data of a study. In the wake of the Daubert decision, expert witnesses quickly saw that they could obscure the search for validity by relying upon published studies, and thereby frustrate the goals of judicial gatekeeping. As a practical matter, the burden shifts to the party that wishes to challenge the relied-upon facts and data to learn enough about the cited studies to show that the facts and data are not sufficient under Rule 702(b), and that the testimony is not the product of reliable methods under Rule 702(c). Obtaining study protocols and, in some instances, underlying data is necessary for due process in gatekeeping. A couple of case studies may illustrate the power of looking under the hood of published studies, even peer-reviewed ones.

When the Supreme Court decided the Daubert case in June 1993, two recent verdicts in silicone-gel breast implant cases were fresh in memory.[1] The verdicts were large by the standards of the time, and the evidence presented for the claims that silicone caused autoimmune disease was extremely weak. The verdicts set off a feeding frenzy, not only in the lawsuit industry, but also in the shady entrepreneurial world of supposed medical tests for “silicone sensitivity.”

The plaintiffs’ litigation theory lacked any meaningful epidemiologic support, and so there were fulsome presentations of putative, hypothetical mechanisms. One such mechanism involved the supposed in vivo degradation of silicone to silica (silicon dioxide), with silica then inducing an immunogenic reaction that somehow led to autoimmunity and autoimmune connective tissue disease. The degradation claim would ultimately prove baseless,[2] and the nuclear magnetic resonance evidence put forward to support degradation would turn out to be instrumental artifact and deception. The immunogenic mechanism had a few lines of potential support, the most prominent at the time coming from the laboratories of Douglas Radford Shanklin and his colleague, David L. Smalley, both of whom were testifying expert witnesses for claimants.

The Daubert decision held out some opportunity to challenge the admissibility of testimony that silicone implants led either to the production of a silicone-specific antibody or to the induction of T-cell-mediated immunogenicity from silicone (or resulting silica) exposure. The initial tests of the newly articulated standard for admissibility of opinion testimony in the silicone litigation did not go well.[3] Peer review, which was absent in the re-analyses relied upon in the Bendectin litigation, was superficially present in the studies relied upon in the silicone litigation. The absence of supportive epidemiology was excused with hand waving that there was a “credible” mechanism, and that epidemiology took too long and was too expensive. Initially, post-Daubert, federal courts were quick to excuse the absence of epidemiology for a novel claim.

The initial Rule 702 challenges to plaintiffs’ expert witnesses thus focused on immunogenicity as the putative mechanism, which, if true, might lend some plausibility to their causal claim. Ultimately, plaintiffs’ expert witnesses would have to show that the mechanism was real by showing, through epidemiologic studies, that silicone exposure causes autoimmune disease.

Among the more persistent purveyors of a “test” for detecting alleged silicone sensitivity were Smalley and Shanklin, then at the University of Tennessee. These authors exploited the fears of implant recipients and the greed of lawyers by marketing a “silicone sensitivity test (SILS).” For a price, Smalley and Shanklin would test blood specimens mailed in directly by lawyers or by physicians, and provide ready-for-litigation reports that claimants had suffered an immune system response to silicone exposure. Starting in 1995, Smalley and Shanklin also cranked out a series of articles in supposedly peer-reviewed journals, which purported to identify a specific immune response to crystalline silica in women who had silicone gel breast implants.[4] These studies had two obvious goals. First, the studies promoted their product to the “silicone sisters,” various support groups of claimants, as well as to their lawyers and a network of supporting rheumatologists and plastic surgeons. Second, by identifying a putative causal mechanism, Shanklin could add a meretricious patina of scientific validity to the claim that silicone breast implants cause autoimmune disease, which Shanklin, as a testifying expert witness, needed to survive Rule 702 challenges.

The plaintiffs’ strategy had been to paper over the huge analytical gaps in their causal theory with complicated, speculative research, which had been peer reviewed and published. Although the quality of the journals was often suspect, and the nature of the peer review obscure, the strategy had been initially successful in deflecting any meaningful scrutiny.

Many of the silicone cases were pending in a multi-district litigation, MDL 926, before Judge Sam Pointer, in the Northern District of Alabama. Judge Pointer, however, did not believe that ruling on expert witness admissibility was a function of an MDL court, and by 1995, he started to remand cases to the transferor courts, for those courts to do what they thought appropriate under Rules 702 and 703. Some of the first remanded cases went to the District of Oregon, where they landed in front of Judge Robert E. Jones. In early 1996, Judge Jones invited briefing on expert witness challenges, and in the face of the complex immunology and toxicology issues and the emerging epidemiologic studies, he decided to appoint four technical advisors to assist him in deciding the challenges.

The addition of scientific advisors to the gatekeeper’s bench made a huge difference in the sophistication and detail of the challenges that could be lodged to the relied-upon studies. In June 1996, Judge Jones entertained extensive hearings with viva voce testimony from both challenged witnesses and subject-matter experts on topics such as immunology and nuclear magnetic resonance spectroscopy. Judge Jones invited final argument in the form of videotaped presentations from counsel so that the videotapes could be distributed to his technical advisors later in the summer. The contrived complexity of plaintiffs’ case dissipated, and the huge analytical gaps became visible. In December 1996, Judge Jones issued his decision that excluded the plaintiffs’ expert witnesses’ proposed testimony on grounds that it failed to satisfy the requirements of Rule 702.[5]

In October 1996, while Judge Jones was studying the record and writing his opinion in the Hall case, Judge Weinstein, with a judge from the Southern District of New York and another from the New York state trial court, conducted a two-week Rule 702 hearing in Brooklyn. Judge Weinstein announced at the outset that he had studied the record from the Hall case, and that he would incorporate it into his record for the cases remanded to the Southern and Eastern Districts of New York.

Curious gaps in the articles claiming silicone immunogenicity, and the lack of success in earlier Rule 702 challenges, motivated the defense to obtain the study protocols and underlying data from studies such as those published by Shanklin and Smalley. Shanklin and Smalley were frequently listed as expert witnesses in individual cases, but when requests or subpoenas for their protocols and raw data were filed, plaintiffs’ counsel stonewalled or withdrew them as witnesses. Eventually, the defense was able to enforce a subpoena and obtain the protocol and some data. The respondents claimed that the control data no longer existed, and inexplicably a good part of the experimental data had been destroyed. Enough was revealed, however, to see that the published articles were not what they claimed to be.[6]

In addition to litigation discovery, in March 1996, a surgeon published the results of his test of the Shanklin-Smalley silicone sensitivity test (“SILS”).[7] Dr. Leroy Young sent the Shanklin laboratory several blood samples from women with and without silicone implants. For six women who never had implants, Dr. Young submitted a fabricated medical history that included silicone implants and symptoms of “silicone-associated disease.” All six samples were reported back as “positive”; indeed, these results were more positive than the blood samples from the women who actually had silicone implants. Dr. Young suggested that perhaps the SILS test was akin to cold fusion.

By the time counsel assembled in Judge Weinstein’s courtroom, in October 1996, some epidemiologic studies had become available and much more information was available on the supposedly supportive mechanistic studies upon which plaintiffs’ expert witnesses had previously relied. Not too surprisingly, plaintiffs’ counsel chose not to call the entrepreneurial Dr. Shanklin, but instead called Donard S. Dwyer, a young, earnest immunologist who had done some contract work on an unrelated matter for Bristol-Myers Squibb, a defendant in the litigation. Dr. Dwyer had previously filed an affidavit in the Oregon federal litigation, in which he gave blanket approval to the methods and conclusions of the Smalley-Shanklin research:

“Based on a thorough review of these extensive materials which are more than adequate to evaluate Dr. Smalley’s test methodology, I formed the following conclusions. First, the experimental protocols that were used are standard and acceptable methods for measuring T Cell proliferation. The results have been reproducible and consistent in this laboratory. Second, the conclusion that there are differences between patients with breast implants and normal controls with respect to the proliferative response to silicon dioxide appears to be justified from the data.”[8]

Dwyer maintained this position even after the defense obtained the study protocol and underlying data, and various immunologists on the defense side filed scathing evaluations of the Smalley-Shanklin work. On direct examination at the hearings in Brooklyn, Dwyer vouched for the challenged t-cell studies, and opined that the work was peer reviewed and sufficiently reliable.[9]

The charade fell apart on cross-examination. Dwyer refused to endorse the studies that claimed to have found an anti-silicone antibody. Researchers at leading universities had attempted to reproduce the findings of such antibodies, without success.[10] The real controversy was over the claimed finding of silicone antigenicity as shown in t-cell or the cell-mediated specific immune response. On direct examination, plaintiffs’ counsel elicited Dwyer’s support for the soundness of the scientific studies that purported to establish such antigenicity, with little attention to the critiques that had been filed before the hearing.[11] Dwyer stuck to the unqualified support he had previously expressed in his affidavit for the Oregon cases.[12]

The problematic aspect of Dwyer’s direct examination testimony was that he had seen the protocol and the partial data produced by Smalley and Shanklin.[13] Dwyer, therefore, could not deny some basic facts about their work. First, the Shanklin data failed to support a dose-response relationship.[14] Second, the blood samples from women with silicone implants had been mailed to Smalley’s laboratory, whereas the control samples were collected locally. The disparity ensured that the silicone blood samples would be older than the controls, a departure from treating exposed and control samples in the same way.[15] Third, the experiment was done unblinded; the laboratory technical personnel and the investigators knew which blood samples were silicone exposed and which were controls (except for the samples sent by Dr. Leroy Young).[16] Fourth, Shanklin’s laboratory procedures deviated from the standardized procedure set out in the National Institutes of Health’s Current Protocols in Immunology.[17]

The SILS study protocol and the data produced by Shanklin and Smalley made clear that each sample was to be tested in triplicate for t-cell proliferation in response to silica, to a positive control mitogen (Con A), and to a negative control blank. The published papers all claimed that each sample had been tested in triplicate under each of these three conditions (silica, mitogen, and nothing).[18] These statements were, however, untrue, and they were never corrected.[19]

The study protocol called for the tests to be run in triplicate, but it instructed the laboratory that two counts could be used if one count did not match the others, a decision to be made by a technical specialist on a “case-by-case” basis. Of the data that were supposed to be reported in triplicate, fully one third had only two data points, and 10 percent had but one.[20] No criteria were provided to the technical specialist for deciding which data to discard.[21] Not only had Shanklin excluded data, but he had discarded and destroyed the excluded data, so that no one could go back and assess whether the exclusions were warranted.[22]

Dwyer agreed that this exclusion and discarding of data was not at all a good method.[23] Dwyer proclaimed that he had not come to Brooklyn to defend this aspect of the Shanklin work, and that it was not defensible at all. Dwyer conceded that “the interpretation of the data and collection of the data are flawed.”[24] Dwyer tried to stake out an incoherent position, asserting that there was “nothing inherently wrong with the method,” while conceding that discarding data was problematic.[25] The judges presiding over the hearing could readily see that the Shanklin research was bent.

At this point, the lead plaintiffs’ counsel, Michael Williams, sought an off-ramp. He jumped to his feet and exclaimed “I’m informed that no witness in this case will rely on Dr. Smalley’s [and Shanklin’s] work in any respect.”[26] Judge Weinstein’s eyes lit up with the prospect that the Smalley-Shanklin work, by agreement, would never be mentioned again in New York state or federal cases. Given how central the claim of silicone antigenicity was to plaintiffs’ cases, the defense resisted a stipulation about research that it would continue to face in other state and federal courts. The defense was saved, however, by the obstinacy of a lawyer from the Weitz & Luxenberg firm, who rose to report that her firm intended to call Drs. Shanklin and Smalley as witnesses, and that it would not stipulate to the exclusion of their work. Judge Weinstein rolled his eyes, and waved me to continue.[27] The proliferation of the t-cell test was over. The hearing before Judges Weinstein and Baer, and Justice Lobis, continued for several more days, with several other dramatic moments.[28]

In short order, on October 23, 1996, Judge Weinstein issued a short, published opinion, in which he granted partial summary judgment on the claims of systemic disease for all cases pending in federal court in New York.[29] What was curious was that the defendants had not moved for summary judgment. There were, of course, pending motions to exclude plaintiffs’ expert witnesses, but Judge Weinstein effectively ducked those motions, and let it be known that he was never a fan of Rule 702. It would be many years before Judge Weinstein allowed his judicial assessment to see the light of day. More than a decade later, in a law review article, Judge Weinstein gave his judgment that

“[t]he breast implant litigation was largely based on a litigation fraud. …  Claims—supported by medical charlatans—that enormous damages to women’s systems resulted could not be supported.”[30]

Judge Weinstein’s opinion was truly a judgment from which there can be no appeal. Shanklin and Smalley continued to publish papers for another decade. None of the published articles by Shanklin and others have been retracted.


[1] Reuters, “Record $25 Million Awarded In Silicone-Gel Implants Case,” N.Y. Times at A13 (Dec. 24, 1992) (describing the verdict returned in Harris County, Texas, in Johnson v. Medical Engineering Corp.); Associated Press, “Woman Wins Implant Suit,” N.Y. Times at A16 (Dec. 17, 1991) (reporting a verdict in Hopkins v. Dow Corning, for $840,000 in compensatory and $6.5 million in punitive damages); see Hopkins v. Dow Corning Corp., 33 F.3d 1116 (9th Cir. 1994) (affirming judgment with minimal attention to Rule 702 issues).

[2] William E. Hull, “A Critical Review of MR Studies Concerning Silicone Breast Implants,” 42 Magnetic Resonance in Medicine 984, 984 (1999) (“From my viewpoint as an analytical spectroscopist, the result of this exercise was disturbing and disappointing. In my judgement as a referee, none of the Garrido group’s papers (1–6) should have been published in their current form.”). See also N.A. Schachtman, “Silicone Data – Slippery & Hard to Find, Part 2,” Tortini (July 5, 2015). Many of the material science claims in the breast implant litigation were as fraudulent as the health effects claims. See, e.g., John Donley, “Examining the Expert,” 49 Litigation 26 (Spring 2023) (discussing his encounters with frequent testifier Pierre Blais, in silicone litigation).

[3] See, e.g., Hopkins v. Dow Corning Corp., 33 F.3d 1116 (9th Cir. 1994) (affirming judgment for plaintiff over Rule 702 challenges), cert. denied, 115 S.Ct. 734 (1995). See Donald A. Lawson, “Note, Hopkins v. Dow Corning Corporation: Silicone and Science,” 37 Jurimetrics J. 53 (1996) (concluding that Hopkins was wrongly decided).

[4] See David L. Smalley, Douglas R. Shanklin, Mary F. Hall, and Michael V. Stevens, “Detection of Lymphocyte Stimulation by Silicon Dioxide,” 4 Internat’l J. Occup. Med. & Toxicol. 63 (1995); David L. Smalley, Douglas R. Shanklin, Mary F. Hall, Michael V. Stevens, and Aram Hanissian, “Immunologic stimulation of T lymphocytes by silica after use of silicone mammary implants,” 9 FASEB J. 424 (1995); David L. Smalley, J. J. Levine, Douglas R. Shanklin, Mary F. Hall, Michael V. Stevens, “Lymphocyte response to silica among offspring of silicone breast implant recipients,” 196 Immunobiology 567 (1996); David L. Smalley, Douglas R. Shanklin, “T-cell-specific response to silicone gel,” 98 Plastic Reconstr. Surg. 915 (1996); and Douglas R. Shanklin, David L. Smalley, Mary F. Hall, Michael V. Stevens, “T cell-mediated immune response to silica in silicone breast implant patients,” 210 Curr. Topics Microbiol. Immunol. 227 (1996). Shanklin was also no stranger to making his case in the popular media. See, e.g., Douglas Shanklin, “More Research Needed on Breast Implants,” Kitsap Sun at 2 (Aug. 29, 1995) (“Widespread silicone sickness is very real in women with past and continuing exposure to silicone breast implants.”) (writing for Scripps Howard News Service). Even after the Shanklin studies were discredited in court, Shanklin and his colleagues continued to publish their claims that silicone implants led to silica antigenicity. David L. Smalley, Douglas R. Shanklin, and Mary F. Hall, “Monocyte-dependent stimulation of human T cells by silicon dioxide,” 66 Pathobiology 302 (1998); Douglas R. Shanklin and David L. Smalley, “The immunopathology of siliconosis. History, clinical presentation, and relation to silicosis and the chemistry of silicon and silicone,” 18 Immunol. Res. 125 (1998); Douglas Radford Shanklin, David L. Smalley, “Pathogenetic and diagnostic aspects of siliconosis,” 17 Rev. Environ. Health 85 (2002), and “Erratum,” 17 Rev. Environ. Health 248 (2002); Douglas Radford Shanklin & David L. Smalley, “Kinetics of T lymphocyte responses to persistent antigens,” 80 Exp. Mol. Pathol. 26 (2006). Douglas Shanklin died in 2013. Susan J. Ainsworth, “Douglas R. Shanklin,” 92 Chem. & Eng’g News (April 7, 2014). Dr. Smalley appears to be still alive. In 2022, he sued the federal government to challenge his disqualification from serving as the laboratory director of any clinical laboratory in the United States, under 42 U.S.C. § 263a(k). He lost. Smalley v. Becerra, Case No. 4:22CV399 HEA (E.D. Mo. July 6, 2022).

[5] Hall v. Baxter Healthcare Corp., 947 F. Supp. 1387 (D. Ore. 1996); see Joseph Sanders & David H. Kaye, “Expert Advice on Silicone Implants: Hall v. Baxter Healthcare Corp.,” 37 Jurimetrics J. 113 (1997); Laurens Walker & John Monahan, “Scientific Authority: The Breast Implant Litigation and Beyond,” 86 Virginia L. Rev. 801 (2000); Jane F. Thorpe, Alvina M. Oelhafen, and Michael B. Arnold, “Court-Appointed Experts and Technical Advisors,” 26 Litigation 31 (Summer 2000); Laural L. Hooper, Joe S. Cecil & Thomas E. Willging, “Assessing Causation in Breast Implant Litigation: The Role of Science Panels,” 64 Law & Contemp. Problems 139 (2001); Debra L. Worthington, Merrie Jo Stallard, Joseph M. Price & Peter J. Goss, “Hindsight Bias, Daubert, and the Silicone Breast Implant Litigation: Making the Case for Court-Appointed Experts in Complex Medical and Scientific Litigation,” 8 Psychology, Public Policy & Law 154 (2002).

[6] Judge Jones’ technical advisor on immunology reported that the studies offered in support of the alleged connection between silicone implantation and silicone-specific T cell responses, including the published papers by Shanklin and Smalley, “have a number of methodological shortcomings and thus should not form the basis of such an opinion.” Mary Stenzel-Poore, “Silicone Breast Implant Cases–Analysis of Scientific Reasoning and Methodology Regarding Immunological Studies” (Sept. 9, 1996). This judgment was seconded, more than two years later, in the proceedings before MDL 926 and its Rule 706 court-appointed immunology expert witness. See Report of Dr. Betty A. Diamond, in MDL 926, at 14-15 (Nov. 30, 1998). Other expert witnesses who published studies on the supposed immunogenicity of silicone came up with some creative excuses to avoid producing their underlying data. Eric Gershwin consistently testified that his data were with a co-author in Israel, and that he could not produce them. N.A. Schachtman, “Silicone Data – Slippery and Hard to Find, Part I,” Tortini (July 4, 2015). Nonetheless, the court-appointed technical advisors were highly critical of Dr. Gershwin’s results. Dr. Stenzel-Poore, the immunologist on Judge Jones’ panel of advisors, found Gershwin’s claims “not well substantiated.” Hall v. Baxter Healthcare Corp., 947 F. Supp. 1387 (D. Ore. 1996). Similarly, Judge Pointer’s appointed expert immunologist, Dr. Betty A. Diamond, was unshakeable in her criticisms of Gershwin’s work and his conclusions. Testimony of Dr. Betty A. Diamond, in MDL 926 (April 23, 1999). And the Institute of Medicine committee, charged with reviewing the silicone claims, found Gershwin’s work inadequate and insufficient to justify the extravagant claims that plaintiffs were making for immunogenicity and for causation of autoimmune disease. Stuart Bondurant, Virginia Ernster, and Roger Herdman, eds., Safety of Silicone Breast Implants 256 (1999).
Another testifying expert witness who relied upon his own data, Nir Kossovsky, resorted to a seismic excuse; he claimed that the Northridge Quake destroyed his data. N.A. Schachtman, “Earthquake Induced Data Loss – We’re All Shook Up,” Tortini (June 26, 2015). Kossovsky, along with his wife, Beth Brandegee, and his father, Ram Kossowsky, sought to commercialize an ELISA-based silicone “antibody” biomarker diagnostic test, Detecsil. Although the early Rule 702 decisions declined to take a hard look at Kossovsky’s study, the U.S. Food and Drug Administration eventually shut down the Kossovsky Detecsil test. Lillian J. Gill, FDA Acting Director, Office of Compliance, Letter to Beth S. Brandegee, President, Structured Biologicals (SBI) Laboratories: Detecsil Silicone Sensitivity Test (July 15, 1994); see Gary Taubes, “Silicone in the System: Has Nir Kossovsky really shown anything about the dangers of breast implants?” Discover Magazine (Dec. 1995).

[7] Leroy Young, “Testing the Test: An Analysis of the Reliability of the Silicone Sensitivity Test (SILS) in Detecting Immune-Mediated Responses to Silicone Breast Implants,” 97 Plastic & Reconstr. Surg. 681 (1996).

[8] Affid. of Donard S. Dwyer, at para. 6 (Dec. 1, 1995), filed in In re Breast Implant Litig. Pending in U.S. D. Ct., D. Oregon (Groups 1, 2, and 3).

[9] Notes of Testimony of Dr. Donard Dwyer, Nyitray v. Baxter Healthcare Corp., CV 93-159 (E. & S.D.N.Y. and N.Y. Sup. Ct., N.Y. Cty. Oct. 8, 9, 1996) (Weinstein, J., Baer, J., Lobis, J., Pollak, M.J.).

[10] Id. at N.T. 238-239 (Oct. 8, 1996).

[11] Id. at N.T. 240.

[12] Id. at N.T. 241-42.

[13] Id. at N.T. 243-44; 255:22-256:3.

[14] Id. at N.T. 244-45.

[15] Id. at N.T. 259.

[16] Id. at N.T. 258:20-22.

[17] Id. at N.T. 254.

[18] Id. at N.T. 252:16-254.

[19] Id. at N.T. 254:19-255:2.

[20] Id. at N.T. 269:18-269:14.

[21] Id. at N.T. 261:23-262:1.

[22] Id. at N.T. 269:18-270.

[23] Id. at N.T. 256:3-16.

[24] Id. at N.T. 262:15-17.

[25] Id. at N.T. 247:3-5.

[26] Id. at N.T. 260:2-3.

[27] Id. at N.T. 261:5-8.

[28] One of the more interesting and colorful moments came when the late James Conlon cross-examined plaintiffs’ pathology expert witness, Saul Puszkin, about questionable aspects of his curriculum vitae. The examination revealed such questionable conduct that Judge Weinstein stopped the examination and directed Dr. Puszkin not to continue without legal counsel of his own.

[29] In re Breast Implant Cases, 942 F. Supp. 958 (E. & S.D.N.Y. 1996). The opinion did not specifically address the Rule 702 and 703 issues that were the subject of pending motions before the court.

[30] Hon. Jack B. Weinstein, “Preliminary Reflections on Administration of Complex Litigation,” 2009 Cardozo L. Rev. de novo 1, 14 (2009) (emphasis added).

Peer Review, Protocols, and QRPs

April 3rd, 2024

In Daubert, the Supreme Court decided a legal question about the proper interpretation of a statute, Rule 702, and then remanded the case to the Court of Appeals for the Ninth Circuit for further proceedings. The Court did, however, weigh in with dicta about several considerations in admissibility decisions. In particular, the Court identified four non-dispositive factors: whether the challenged opinion has been empirically tested, whether it has been published and peer reviewed, whether the underlying scientific technique or method supporting the opinion has an acceptable rate of error, and whether it has gained general acceptance.[1]

The context in which peer review was discussed in Daubert is of some importance to understanding why the Court held out peer review as a consideration. One of the bases for the defense challenges to some of the plaintiffs’ expert witnesses’ opinions in Daubert was their reliance upon re-analyses of published studies to suggest that there was indeed an increased risk of birth defects if only the publication authors had used some other control group, or taken some other analytical approach. Re-analyses can be important, but these re-analyses of published Bendectin studies were post hoc, litigation driven, and obviously result oriented. The Court’s discussion of peer review reveals that it was not simply creating a box to be checked before a trial court could admit an expert witness’s opinions. Peer review was suggested as a consideration because:

“submission to the scrutiny of the scientific community is a component of “good science,” in part because it increases the likelihood that substantive flaws in methodology will be detected. The fact of publication (or lack thereof) in a peer reviewed journal thus will be a relevant, though not dispositive, consideration in assessing the scientific validity of a particular technique or methodology on which an opinion is premised.”[2]

Peer review, or the lack thereof, for the challenged expert witnesses’ re-analyses was called out because it raised suspicions of lack of validity. Nothing in Daubert, or in later decisions, or more importantly in Rule 702 itself, supports admitting expert witness testimony just because the witness relied upon peer-reviewed studies, especially when the studies are invalid or are based upon questionable research practices. The Court was careful to point out that peer-reviewed publication was “not a sine qua non of admissibility; it does not necessarily correlate with reliability, … .”[3] The Court thus showed that it was well aware that well-grounded (and thus admissible) opinions may not have been previously published, and that the existence of peer review was simply a potential aid in answering the essential question, whether the proponent of a proffered opinion has shown “the scientific validity of a particular technique or methodology on which an opinion is premised.”[4]

Since 1993, much has changed in the world of bio-science publishing. The wild proliferation of journals, including predatory and “pay-to-play” journals, has disabused most observers of the notion that peer review provides evidence of the validity of methods. Along with the exponential growth in publications has come an exponential growth in expressions of concern and outright retractions of articles, as chronicled and detailed at Retraction Watch.[5] Some journals encourage authors to nominate the peer reviewers for their manuscripts; some journals let authors block some scientists as peer reviewers of their submitted manuscripts. If the Supreme Court were writing today, it might well note that peer review is often a feature of bad science, advanced by scientists who know that peer-reviewed publication is the price of admission to the advocacy arena.

Since the Supreme Court decided Daubert, the Federal Judicial Center and the National Academies of Sciences have provided a Reference Manual on Scientific Evidence, now in its third edition, and with a fourth edition on the horizon, to assist judges and lawyers involved in the litigation of scientific issues. Professor Goodstein, in his chapter “How Science Works,” in the third edition, provides the most extensive discussion of peer review in the Manual, and emphasizes that peer review “works very poorly in catching cheating or fraud.”[6] Goodstein invokes his own experience as a peer reviewer to note that “peer review referees and editors limit their assessment of submitted articles to such matters as style, plausibility, and defensibility; they do not duplicate experiments from scratch or plow through reams of computer-generated data in order to guarantee accuracy or veracity or certainty.”[7] Indeed, Goodstein’s essay in the Reference Manual characterizes the ability of peer review to warrant study validity as a “myth”:

Myth: The institution of peer review assures that all published papers are sound and dependable.

Fact: Peer review generally will catch something that is completely out of step with majority thinking at the time, but it is practically useless for catching outright fraud, and it is not very good at dealing with truly novel ideas. … It certainly does not ensure that the work has been fully vetted in terms of the data analysis and the proper application of research methods.[8]

Goodstein’s experience as a peer reviewer is hardly idiosyncratic. One standard text on the ethical conduct of research reports that peer review is often ineffective or incompetent, and that it may not even catch simple statistical or methodological errors.[9] According to the authors, Shamoo and Resnik:

“[p]eer review is not good at detecting data fabrication or falsification partly because reviewers usually do not have access to the material they would need to detect fraud, such as the original data, protocols, and standard operating procedures.”[10]

Indeed, without access to protocols, statistical analysis plans, and original data, peer review often cannot identify good faith or negligent deviations from the standard of scientific care. There is some evidence to support this negative assessment of peer review from testing of the counter-factual. Reviewers were able to detect questionable, selective reporting when they had access to the study authors’ research protocols.[11]

Study Protocol

The study protocol provides the scientific rationale for a study, clearly defines the research question, describes the data collection process, defines the key exposures and outcomes, and describes the methods to be applied, all before data collection begins.[12] The protocol also typically pre-specifies the statistical data analysis. The epidemiology chapter of the current edition of the Reference Manual on Scientific Evidence offers only the bland observation that epidemiologists attempt to minimize bias in observational studies with “data collection protocols.”[13] Epidemiologists and statisticians are much clearer in emphasizing the importance, indeed the necessity, of having a study protocol before commencing data collection. Back in 1988, John Bailar and Frederick Mosteller explained that it was critical in reporting statistical analyses to inform readers about how and when the authors devised the study design, and whether they set the design criteria out in writing before they began to collect data.[14]

The necessity of a study protocol is “self-evident,”[15] and essential to research integrity.[16] The International Society of Pharmacoepidemiology has issued Guidelines for “Good Pharmacoepidemiology Practices,”[17] which call for every study to have a written protocol. Among the requirements set out in these guidelines are descriptions of the research method, study design, operational definitions of exposure and outcome variables, and projected study sample size. The Guidelines provide that a detailed statistical analysis plan may be specified after data collection begins, but before any analysis commences.

Expert witness opinions on health effects are built upon studies, and so it behooves legal counsel to identify the methodological strengths and weaknesses of key studies through questioning whether they have protocols, whether the protocols were methodologically appropriate, and whether the researchers faithfully followed their protocols and their statistical analysis plans. Determining the peer review status of a publication, on the other hand, will often not advance a challenge based upon improvident methodology.

In some instances, a published study will have sufficiently detailed descriptions of methods and data that readers, even lawyers, can evaluate its scientific validity or reliability (vel non). In other cases, however, readers will be no better off than the peer reviewers who were deprived of access to protocols, statistical analysis plans, and original data. When a particular study is crucial support for an adversary’s expert witness, a reasonable litigation goal may well be to obtain the protocol and statistical analysis plan, and if need be, the original underlying data. The decision to undertake such discovery is difficult. Discovery of non-party scientists can be expensive and protracted; it will almost certainly be contentious. When expert witnesses rely upon one or a few studies that telegraph problems with their internal validity, this litigation strategy may provide the strongest evidence against the study’s being reasonably relied upon, or its providing “sufficient facts and data” to support an admissible expert witness opinion.


[1] Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579, 593-594 (1993).

[2] Id. at 594 (internal citations omitted) (emphasis added).

[3] Id.

[4] Id. at 593-94.

[5] Retraction Watch, at https://retractionwatch.com/.

[6] Reference Manual on Scientific Evidence at 37, 44-45 (3rd ed. 2011) [Manual].

[7] Id. at 44-45 n.11.

[8] Id. at 48 (emphasis added).

[9] Adil E. Shamoo and David B. Resnik, Responsible Conduct of Research 133 (4th ed. 2022).

[10] Id.

[11] An-Wen Chan, Asbjørn Hróbjartsson, Mette T. Haahr, Peter C. Gøtzsche, and David G. Altman, “Empirical evidence for selective reporting of outcomes in randomized trials: Comparison of protocols to published articles,” 291 J. Am. Med. Ass’n 2457 (2004).

[12] Wolfgang Ahrens & Iris Pigeot, eds., Handbook of Epidemiology 477 (2nd ed. 2014).

[13] Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” in Reference Manual on Scientific Evidence 573 (3rd ed. 2011) (“Study designs are developed before they begin gathering data.”).

[14] John Bailar & Frederick Mosteller, “Guidelines for Statistical Reporting in Articles for Medical Journals,” 108 Ann. Intern. Med. 266, 268 (1988).

[15] Wolfgang Ahrens & Iris Pigeot, eds., Handbook of Epidemiology 477 (2nd ed. 2014).

[16] Sandra Alba, et al., “Bridging research integrity and global health epidemiology statement: guidelines for good epidemiological practice,” 5 BMJ Global Health e003236, at p.3 & passim (2020).

[17] See “The ISPE Guidelines for Good Pharmacoepidemiology Practices (GPP),” available at <https://www.pharmacoepi.org/resources/policies/guidelines-08027/>.

Data Games – A Techno Thriller

April 22nd, 2020

Sherlock Holmes, Hercule Poirot, Miss Marple, Father Brown, Harry Bosch, Nancy Drew, Joe and Frank Hardy, Sam Spade, Columbo, Lennie Briscoe, Inspector Clouseau, and Dominic Da Vinci:

Move over; there is a new super sleuth in town.

Meet Professor Ken Wheeler.

Ken is a statistician, and so by profession, he is a data detective. In his day job, he teaches at a northeastern university, where his biggest challenges are managing the expectations of students and administrators, while trying to impart statistical learning. At home, Ken rarely manages to meet the expectations of his wife and son. But as some statisticians are wont to do, Ken sometimes takes on consulting gigs that require him to use his statistical skills to help litigants sort out the role of chance in cases that run from discrimination claims to rare health effects. In this contentious, sharp-elbowed environment, Ken excels. And truth be told, Ken actually finds great satisfaction in identifying the egregious errors and distortions of adversary statisticians.

Wheeler’s sleuthing usually involves ascertaining random error or uncovering a lurking variable, but in Herbert I. Weisberg’s just-published novel, Data Games: A Techno Thriller, Wheeler is drawn into a high-stakes conspiracy of intrigue, violence, and fraud that goes way beyond the run-of-the-mine p-hacking and data dredging.

An urgent call from a scientific consulting firm puts Ken Wheeler in the midst of imminent disaster for a pharmaceutical manufacturer, whose immunotherapy anti-cancer wonder drug, Verbana, is under attack. A group of apparently legitimate scientists have obtained the dataset from Verbana’s pivotal clinical trial, and they appear on the verge of blowing Verbana out of the formulary with a devastating analysis that will show that the drug causes early dementia. Wheeler’s mission is to debunk the debunking analysis when it comes.

For those readers who are engaged in the litigation defense of products liability claims against medications, the scenario is familiar enough. The scientific group studying Verbana’s alleged side effect seems on the up-and-up, but they appear to be engaged in a cherry-picking exercise, guided by a dubious theory of biological plausibility, known as the “Kreutzfeld hypothesis.”

It is not often that mystery novels turn on surrogate outcomes, biomarkers, genomic medicine, and predictive analytics, but Data Games is no ordinary mystery. And Wheeler is no ordinary detective. To be sure, the middle-aged Wheeler drives a middle-aged BMW, not a Bond car, and certainly not a Bonferroni. And Wheeler’s toolkit may not include a Glock, but he can handle the lasso, the jackknife, and the logit, and serve them up with SAS. Wheeler sees patterns where others see only chaos.

Unlike the typical Hollywood rubbish about stereotyped evil pharmaceutical companies, the hero of Data Games finds that there are sinister forces behind what looks like an honest attempt to uncover safety problems with Verbana. These sinister forces will use anything to achieve their illicit ends, including superficially honest academics with white hats. The attack on Verbana gets the FDA’s attention and an urgent hearing in White Oak, where Wheeler shines.

The author of Data Games, Herbert I. Weisberg, is himself a statistician, and a veteran of some of the dramatic data games he writes about in this novel. Weisberg is perhaps better known for his “homework” books, such as Willful Ignorance: The Mismeasure of Uncertainty (2014), and Bias and Causation: Models and Judgment for Valid Comparisons (2010). If, however, you ever find yourself in a pandemic lockdown, Weisberg’s Data Games: A Techno Thriller is a perfect way to escape. For under $3, you will be entertained, and you might even learn something about probability and statistics.

April Fool – Zambelli-Weiner Must Disclose

April 2nd, 2020

Back in the summer of 2019, Judge Saylor, the MDL judge presiding over the Zofran birth defect cases, ordered epidemiologist Dr. Zambelli-Weiner to produce documents relating to an epidemiologic study of Zofran,[1] as well as her claimed confidential consulting relationship with plaintiffs’ counsel.[2]

This previous round of motion practice and discovery established that Zambelli-Weiner was a paid consultant in advance of litigation, that her Zofran study was funded by plaintiffs’ counsel, and that she presented at a Las Vegas conference, for plaintiffs’ counsel only, on [sic] how to make mass torts perfect. Furthermore, she had made false statements to the court about her activities.[3]

Zambelli-Weiner ultimately responded to the discovery requests, but she and plaintiffs’ counsel withheld several documents as confidential, pursuant to the MDL’s procedure for protective orders. Yesterday, April 1, 2020, Judge Saylor granted GlaxoSmithKline’s motion to de-designate four documents that plaintiffs claimed to be confidential.[4]

Zambelli-Weiner sought to resist GSK’s motion to compel disclosure of the documents on a claim that GSK was seeking the documents to advance its own litigation strategy. Judge Saylor acknowledged that Zambelli-Weiner’s psycho-analysis might be correct, but that GSK’s motive was not the critical issue. According to Judge Saylor, the proper inquiry was whether the claim of confidentiality was proper in the first place, and whether removing the cloak of secrecy was appropriate under the facts and circumstances of the case. Indeed, the court found “persuasive public-interest reasons” to support disclosure, including providing the FDA and the EMA a complete, unvarnished view of Zambelli-Weiner’s research.[5] Of course, the plaintiffs’ counsel, in close concert with Zambelli-Weiner, had created GSK’s need for the documents.

This discovery battle has no doubt been fought because plaintiffs and their testifying expert witnesses rely heavily upon the Zambelli-Weiner study to support their claim that Zofran causes birth defects. The present issue is whether four of the documents produced by Dr. Zambelli-Weiner pursuant to subpoena should continue to enjoy confidential status under the court’s protective order. GSK argued that the documents were never properly designated as confidential, and alternatively, the court should de-designate the documents because, among other things, the documents would disclose information important to medical researchers and regulators.

Judge Saylor’s Order considered GSK’s objections to plaintiffs’ and Zambelli-Weiner’s withholding four documents:

(1) Zambelli-Weiner’s Zofran study protocol;

(2) Undisclosed, hidden analyses that compared birth defects rates for children born to mothers who used Zofran with the rates seen with the use of other anti-emetic medications;

(3) An earlier draft of Zambelli-Weiner’s Zofran study, which she had prepared to submit to the New England Journal of Medicine; and

(4) Zambelli-Weiner’s advocacy document, a “Causation Briefing Document,” which she prepared for plaintiffs’ lawyers.

Judge Saylor noted that none of the withheld documents would typically be viewed as confidential. None contained “sensitive personal, financial, or medical information.”[6] The court dismissed Zambelli-Weiner’s contention that the documents all contained “business and proprietary information” as conclusory and meritless. Neither she nor plaintiffs’ counsel explained how the requested documents implicated proprietary information when Zambelli-Weiner’s only business at issue is to assist in making lawsuits. The court observed that she is not “engaged in the business of conducting research to develop a pharmaceutical drug or other proprietary medical product or device,” and that her work at issue is related solely to her paid consultancy to plaintiffs’ lawyers. Neither she nor the plaintiffs’ lawyers showed how public disclosure would hurt her proprietary or business interests. Of course, if Zambelli-Weiner had been dishonest in carrying out the Zofran study, as reflected in study deviations from its protocol, her professional credibility and her business of conducting such studies might well suffer. Zambelli-Weiner, however, was not prepared to affirm the antecedent of that hypothetical. In any event, the court found that whatever right Zambelli-Weiner might have enjoyed to avoid discovery evaporated with her previous dishonest representations to the MDL court.[7]

The Zofran Study Protocol

GSK sought production of the Zofran study protocol, which in theory contained the research plan for the Zofran study and the analyses the researchers intended to conduct. Zambelli-Weiner attempted to resist production on the specious theory that she had not published the protocol, but the court found this “non-publication” irrelevant to the claim of confidentiality. Most professional organizations, such as the International Society of Pharmacoepidemiology (“ISPE”), which ultimately published Zambelli-Weiner’s study, encourage the publication and sharing of study protocols.[8] Disclosure of protocols helps ensure the integrity of studies by allowing readers to assess whether the researchers have adhered to their study plan, or have engaged in ad hoc data dredging in search of a desired result.[9]

The Secret, Undisclosed Analyses

Perhaps even more egregious than withholding the study protocol was the refusal to disclose unpublished analyses comparing the rate of birth defects among children born to mothers who used Zofran with the birth defect rates of children with in utero exposure to other anti-emetic medications.  In ruling that Zambelli-Weiner must produce the unpublished analyses, the court expressed its skepticism over whether these analyses could ever have been confidential. Under ISPE guidelines, researchers must report findings that significantly affect public health, and the relative safety of Zofran is essential to its evaluation by regulators and prescribing physicians.

Not only was Zambelli-Weiner’s failure to include these analyses in her published article ethically problematic, but she apparently hid these analyses from the Pharmacovigilance Risk Assessment Committee (PRAC) of the European Medicines Agency, which specifically inquired of Zambelli-Weiner whether she had performed such analyses. As a result, the PRAC recommended a label change based upon Zambelli-Weiner’s failure to disclose material information. Furthermore, the plaintiffs’ counsel represented they intended to oppose GSK’s citizen petition to the FDA, based upon the Zambelli-Weiner study. The apparently fraudulent non-disclosure of relevant analyses could not have been more fraught for public health significance. The MDL court found that the public health need trumped any (doubtful) claim to confidentiality.[10] Against the obvious public interest, Zambelli-Weiner offered no “compelling countervailing interest” in keeping her secret analyses confidential.

There were other aspects to the data-dredging rationale not discussed in the court’s order. Without seeing the secret analyses of other anti-emetics, readers were deprived of an important opportunity to assess actual and potential confounding in her study. Perhaps even more important, the statistical tools that Zambelli-Weiner used, including any measurements of p-values and confidence intervals, and any declarations of “statistical significance,” were rendered meaningless by her secret, undisclosed, multiple testing. As noted by the American Statistical Association (ASA) in its 2016 position statement, “4. Proper inference requires full reporting and transparency.”

The ASA explains that the proper inference from a p-value can be completely undermined by “multiple analyses” of study data, with selective reporting of sample statistics that have attractively low p-values, or cherry picking of suggestive study findings. The ASA points out that common practices of selective reporting compromise valid interpretation. Hence the correlative recommendation:

“Researchers should disclose the number of hypotheses explored during the study, all data collection decisions, all statistical analyses conducted and all p-values computed. Valid scientific conclusions based on p-values and related statistics cannot be drawn without at least knowing how many and which analyses were conducted, and how those analyses (including p-values) were selected for reporting.”[11]
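A back-of-the-envelope calculation shows why the number of undisclosed analyses matters. The sketch below is illustrative only, and it rests on simplifying assumptions not drawn from the court’s order or the ASA statement: k independent analyses, every null hypothesis actually true, and each tested at the conventional 0.05 level. The function names are hypothetical. It computes the chance of at least one spurious “significant” result, 1 − (1 − α)^k, and the Bonferroni per-analysis cutoff, α/k, which keeps that chance at or below α.

```python
# Illustrative sketch: how undisclosed multiple testing inflates the chance of a
# spurious "statistically significant" finding. Assumes k independent analyses,
# every null hypothesis true, each tested at level alpha. Names are hypothetical.


def family_wise_error_rate(k: int, alpha: float = 0.05) -> float:
    """Probability of at least one p-value below alpha among k independent null tests."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_threshold(k: int, alpha: float = 0.05) -> float:
    """Per-analysis cutoff that keeps the family-wise error rate at or below alpha."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return alpha / k


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(f"{k:>2} analyses: P(at least one false 'hit') = "
              f"{family_wise_error_rate(k):.2f}; Bonferroni cutoff = "
              f"{bonferroni_threshold(k):.4f}")
```

Under these assumptions, twenty undisclosed analyses carry roughly a 64% chance of at least one nominally “significant” result. Analyses of a single dataset are usually correlated, which changes the exact figure, but correlation cannot push that chance below the nominal 5% of a single test; and without knowing k, a reader cannot apply any correction at all.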

The Draft Manuscript for the New England Journal of Medicine

The MDL court wasted little time and ink in dispatching Zambelli-Weiner’s claim of confidentiality for her draft New England Journal of Medicine manuscript. The court found that she failed to explain how any differences in content between this manuscript and the published version constituted “proprietary business information,” or how disclosure would cause her any actual prejudice.

Zambelli-Weiner’s Litigation Road Map

In a world where social justice warriors complain about organizations such as Exponent, for its litigation support of defense efforts, the revelation that Zambelli-Weiner was helping to quarterback the plaintiffs’ offense deserves greater recognition. Zambelli-Weiner’s litigation road map was clearly created to help Grant & Eisenhofer, P.A., the plaintiffs’ lawyers, create a causation strategy (to which she would add her Zofran study). Such a document from a consulting expert witness is typically the sort of document that enjoys confidentiality and protection from litigation discovery. The MDL court, however, looked beyond Zambelli-Weiner’s role as a “consulting witness” to her involvement in designing and conducting research. The broader extent of her involvement in producing studies and communicating with regulators made her litigation “strategery” “almost certainly relevant to scientists and regulatory authorities” charged with evaluating her study.[12]

Despite Zambelli-Weiner’s protestations that she had made a disclosure of conflict of interest, the MDL court found her disclosure anemic and the public interest in knowing the full extent of her involvement in advising plaintiffs’ counsel, long before the study was conducted, great.[13]

The legal media has been uncommonly quiet about the rulings on April Zambelli-Weiner, in the Zofran litigation. From the Union of Concerned Scientists, and other industry scolds such as David Egilman, David Michaels, and Carl Cranor – crickets. Meanwhile, as the appeal over the admissibility of her testimony remains pending before the Pennsylvania Supreme Court,[14] Zambelli-Weiner continues to create an unenviable record in Zofran, Accutane,[15] Mirena,[16] and other litigations.


[1]  April Zambelli‐Weiner, Christina Via, Matt Yuen, Daniel Weiner, and Russell S. Kirby, “First Trimester Pregnancy Exposure to Ondansetron and Risk of Structural Birth Defects,” 83 Reproductive Toxicology 14 (2019).

[2]  See In re Zofran (Ondansetron) Prod. Liab. Litig., 392 F. Supp. 3d 179, 182-84 (D. Mass. 2019) (MDL 2657) [cited as In re Zofran].

[3]  “Litigation Science – In re Zambelli-Weiner” (April 8, 2019); “Mass Torts Made Less Bad – The Zambelli-Weiner Affair in the Zofran MDL” (July 30, 2019). See also Nate Raymond, “GSK accuses Zofran plaintiffs’ law firms of funding academic study,” Reuters (Mar. 5, 2019).

[4]  In re Zofran Prods. Liab. Litig., MDL No. 1:15-md-2657-FDS, Order on Defendant’s Motion to De-Designate Certain Documents as Confidential Under the Protective Order (D. Mass. Apr. 1, 2020) [Order].

[5]  Order at n.3.

[6]  Order at 3.

[7]  See In re Zofran, 392 F. Supp. 3d at 186.

[8]  Order at 4. See also Xavier Kurz, Susana Perez-Gutthann, the ENCePP Steering Group, “Strengthening standards, transparency, and collaboration to support medicine evaluation: Ten years of the European Network of Centres for Pharmacoepidemiology and Pharmacovigilance (ENCePP),” 27 Pharmacoepidemiology & Drug Safety 245 (2018).

[9]  Order at note 2 (citing Charles J. Walsh & Marc S. Klein, “From Dog Food to Prescription Drug Advertising: Litigating False Scientific Establishment Claims Under the Lanham Act,” 22 Seton Hall L. Rev. 389, 431 (1992) (noting that adherence to study protocol “is essential to avoid ‘data dredging’—looking through results without a predetermined plan until one finds data to support a claim”)).

[10]  Order at 5, citing Anderson v. Cryovac, Inc., 805 F.2d 1, 8 (1st Cir. 1986) (describing public-health concerns as “compelling justification” for requiring disclosure of confidential information).

[11]  Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” 70 The American Statistician 129 (2016).

See also “The American Statistical Association’s Statement on and of Significance” (March 17, 2016); “Courts Can and Must Acknowledge Multiple Comparisons in Statistical Analyses” (Oct. 14, 2014).

[12]  Order at 6.

[13]  Cf. Elizabeth J. Cabraser, Fabrice Vincent & Alexandra Foote, “Ethics and Admissibility: Failure to Disclose Conflicts of Interest in and/or Funding of Scientific Studies and/or Data May Warrant Evidentiary Exclusions,” Mealey’s Emerging Drugs Reporter (Dec. 2002) (arguing that failure to disclose conflicts of interest and study funding should result in evidentiary exclusions).

[14]  Walsh v. BASF Corp., GD #10-018588 (Oct. 5, 2016, Pa. Ct. C.P. Allegheny Cty., Pa.) (finding that Zambelli-Weiner’s and Nachman Brautbar’s opinions that pesticides generally cause acute myelogenous leukemia, and that even the smallest exposure to benzene increases the risk of leukemia, offended generally accepted scientific methodology), rev’d, 2018 Pa. Super. 174, 191 A.3d 838, 842-43 (Pa. Super. 2018), appeal granted, 203 A.3d 976 (Pa. 2019).

[15]  In re Accutane Litig., No. A-4952-16T1 (N.J. App. Div. Jan. 17, 2020) (affirming exclusion of Zambelli-Weiner as an expert witness).

[16]  In re Mirena IUD Prods. Liab. Litig., 169 F. Supp. 3d 396 (S.D.N.Y. 2016) (excluding Zambelli-Weiner in part).

Mass Torts Made Less Bad – The Zambelli-Weiner Affair in the Zofran MDL

July 30th, 2019

Judge Saylor, who presides over the Zofran MDL, handed down his opinion on the Zambelli-Weiner affair on July 25, 2019.[1] As discussed on these pages back in April of this year,[2] GlaxoSmithKline (GSK), the defendant in the Zofran birth defects litigation, sought documents from plaintiffs and Dr. Zambelli-Weiner (ZW) about her published study on Zofran and birth defects.[3] Plaintiffs refused to respond to the discovery on grounds of attorney work product,[4] and of consulting expert witness confidential communications.[5] After an abstract of ZW’s study appeared in print, GSK subpoenaed ZW and her co-author, Dr. Russell Kirby, for a deposition and for production of documents.

Plaintiffs’ counsel sought a protective order. Their opposition relied upon a characterization of ZW as a research scientist; they conveniently omitted their retention of her as a paid expert witness. In December 2018, the MDL court denied plaintiffs’ motion for a protective order, and allowed the deposition to go forward to explore the financial relationship between counsel and ZW.

In January 2019, when GSK served ZW with its subpoena duces tecum, ZW, through her own counsel, moved for a protective order, supported by ZW’s affidavit with factual assertions to support her claim that she was not subject to the deposition. The MDL court quickly denied her motion, and in short order, her lawyer notified the court that ZW’s affidavit contained “factual misrepresentations,” which she refused to correct, and he sought leave to withdraw.

According to the MDL court, the ZW affidavit contained three falsehoods. She claimed not to have been retained by any party when she had been a paid consultant to plaintiffs at times over the previous five years, since December 2014. ZW claimed that she had no factual information about the litigation, when in fact she had participated in a Las Vegas plaintiffs’ lawyers’ conference, “Mass Torts Made Perfect,” in October 2015. Furthermore, ZW falsely claimed that monies received from plaintiffs’ law firms did not go to fund the Zofran study, but went to her company, Translational Technologies International Health Research & Economics, for unrelated work. ZW received in excess of $200,000 for her work on the Zofran study.

After ZW obtained new counsel, she gave deposition testimony in February 2019, when she acknowledged the receipt of money for the study, and the lengthy relationship with plaintiffs’ counsel. Armed with this information, GSK moved for full responses to its document requests. Again, plaintiffs’ counsel and ZW resisted on grounds of confidentiality and privilege.

Judge Saylor reviewed the requested documents in camera, and held last week that they were not protected by consulting expert witness privilege or by attorney work product confidentiality. ZW’s materials and communications in connection with the Las Vegas plaintiffs’ conference never had the protection of privilege or confidentiality. ZW presented at a “quasi-public” conference attended by lawyers who had no connection to the Zofran litigation.[6]

With respect to work product claims, Judge Saylor found that GSK had shown “exceptional circumstances” and “substantial need” for the requested materials given that the plaintiffs’ testifying expert witnesses had relied upon the ZW study, which had been covertly financially supported by plaintiffs’ lawyers.[7] With respect to whatever was thinly claimed to be privileged and confidential, Judge Saylor found the whole arrangement to fail the smell test:[8]

“It is troublesome, to say the least, for a party to engage a consulting, non-testifying expert; pay for that individual to conduct and publish a study, or otherwise affect or influence the study; engage a testifying expert who relies upon the study; and then cloak the details of the arrangement with the consulting expert in the confidentiality protections of Rule 26(b) in order to conceal it from a party opponent and the Court. The Court can see no valid reason to permit such an arrangement to avoid the light of discovery and the adversarial process. Under the circumstances, GSK has made a showing of substantial need and an inability to obtain these documents by other means without undue hardship.

Furthermore, in this case, the consulting expert made false statements to the Court as to the nature of her relationship with plaintiffs’ counsel. The Court would not have been made aware of those falsehoods but for the fact that her attorney became aware of the issue and sought to withdraw. Certainly plaintiffs’ counsel did nothing at the time to correct the false impressions created by the affidavit. At a minimum, the submission of those falsehoods effectively waived whatever protections might otherwise apply. The need to discover the truth and correct the record surely outweighs any countervailing policy in favor of secrecy, particularly where plaintiffs’ testifying experts have relied heavily on Dr. Zambelli-Weiner’s study as a basis for their causation opinions. In order to effectively cross-examine plaintiffs’ experts about those opinions at trial, GSK is entitled to review the documents. At a minimum, the documents shed additional light on the nature of the relationship between Dr. Zambelli-Weiner and plaintiffs’ counsel, and go directly to the credibility of Dr. Zambelli-Weiner and the reliability of her study results.”

It remains to be seen whether Judge Saylor will refer the matter of ZW’s false statements in her affidavit to the U.S. Attorney’s office, or the lawyers’ complicity in perpetuating these falsehoods to disciplinary boards.

Mass torts will never be perfect, or even very good. Judge Saylor, however, has managed to make the Zofran litigation a little less bad.


[1]  Memorandum and order on In Camera Production of Documents Concerning Dr. April Zambelli-Weiner, In re Zofran Prods. Liab. Litig., MDL 2657, D. Mass. (July 25, 2019) [cited as Mem.].

[2]  NAS, “Litigation Science – In re Zambelli-Weiner” (April 8, 2019).

[3]  April Zambelli-Weiner, et al., “First Trimester Pregnancy Exposure to Ondansetron and Risk of Structural Birth Defects,” 83 Reproductive Toxicol. 14 (2019).

[4]  Fed. R. Civ. P. 26(b)(3).

[5]  Fed. R. Civ. P. 26(b)(4)(D).

[6]  Mem. at 7-9.

[7]  Mem. at 9.

[8]  Mem. at 9-10.

Litigation Science – In re Zambelli-Weiner

April 8th, 2019

Back in 2001, in the aftermath of the silicone gel breast implant litigation, I participated in a Federal Judicial Center (FJC) television production of “Science in the Courtroom, program 6” (2001). Program six was a round-table discussion among the directors (past, present, and future) of the FJC, all of whom were sitting federal judges, with two lawyers in private practice, Elizabeth Cabraser and me.[1] One of the more exasperating moments in our conversation came when Ms. Cabraser, who represented plaintiffs in the silicone litigation, complained that Daubert was unfair because corporate defendants were able to order up supporting scientific studies, whereas poor plaintiffs’ counsel did not have the means to gin up studies that confirmed what they knew to be true.[2] Refraining from talking over her required all the self-restraint I could muster, but I did eventually respond by denying her glib generalization and offering the silicone litigation as one in which plaintiffs, plaintiffs’ counsel, and plaintiffs’ support groups were all involved in funding and directing some of the sketchiest studies, most of which managed to find homes in so-called peer-reviewed journals of some sort, even if not the best.

The litigation connections of the plaintiff-sponsored studies in the silicone litigation were not apparent on the face of the published articles. The partisan funding and provenance of the studies were mostly undisclosed and required persistent discovery and subpoenas. Cabraser’s propaganda reinforced the recognition of what so-called mass tort litigation had taught me about all scientific studies: “trust but verify.” Verification is especially important for studies that are sponsored by litigation-industry actors who have no reputation at stake in the world of healthcare.

Verification is not a straightforward task, however. Peer-review publication usually provides some basic information about “methods and materials,” but rarely if ever do published articles provide sufficient data and detail about methodology to replicate the reported analysis. In legal proceedings, verification of studies conducted and relied upon by testifying expert witnesses is facilitated by the rules of expert witness discovery. In federal court, expert witnesses must specify all opinions and all bases for their opinions. When such witnesses rely upon their own studies, and thus have had privileged access to the complete data and all analyses, courts have generally permitted full inquiry into the underlying materials of relied-upon studies. On the other hand, when the author of a relied-upon study is a “stranger to the litigation,” neither a party nor a retained expert witness, courts have generally permitted more limited discovery of the study’s full data set and analyses. Regardless of the author’s status, the question remains how litigants are to challenge an adversary’s expert witness’s trusted reliance upon a study, which cannot be “verified.”

Most lawyers would prefer, of course, to call an expert witness who has actually conducted studies pertinent to the issues in the case. The price, however, of allowing the other side to discover the underlying data and materials of the author expert witness’s studies may be too high. The relied-upon studies may well end up discredited, as well as the professional reputation of the expert witness. The litigation industry has adapted to these rules of discovery by avoiding, in most instances, calling testifying expert witnesses who have published studies that might be vulnerable.[3]

One work-around to the discovery rules lies in the use of “consulting, non-testifying expert witnesses.” The law permits the use of such expert witnesses to some extent to facilitate candid consultations with expert witnesses, usually without concerns that communications will be shared with the adversary party and witnesses. The hope is that such candid communications will permit realistic assessment of partisan positions, as well as allowing scientists and scholars to participate in an advisory capacity without the burden of depositions, formal report writing, and appearances at judicial hearings and trials. The confidentiality of consulting expert witnesses is open to abuse by counsel who would engage the consultants to conduct and publish studies, which can then be relied upon by the testifying expert witnesses. The upshot is that legal counsel can manipulate the published literature in a favorable way, without having to disclose their financial sponsorship or influence of the published studies used by their testifying expert witnesses.

This game of hiding study data and sponsorship through the litigation industry’s use of confidential consulting expert witnesses pervades so-called mass tort litigation, which provides ample financial incentives for study sponsorship and control. Defendants will almost always be unable to play the game without detection. A simple interrogatory or other discovery request about funding of studies will reveal the attempt to pass off a party-sponsored study as having been conducted by disinterested scientists. Furthermore, most scientists will feel obligated to reveal corporate funding as a potential conflict of interest, in their submission of manuscripts for publication.

Revealing litigation-industry (plaintiffs’) funding of studies is more complicated. First, the funding may be through one firm, which is not the legal counsel in the case for which discovery is being conducted. In such instances, the plaintiff’s lawyers can truthfully declare that they lack personal knowledge of any financial support for studies relied upon by their testifying expert witnesses. Second, the plaintiffs’ law firm is not a party and is not itself subject to discovery. Even if the plaintiffs’ lawyers funded a study, they can claim, with plausible deniability, that they funded the study in connection with another client’s case, not the client who is plaintiff in the case in which discovery is sought. Third, the plaintiffs’ firm may take the position, however dubious it might be, that the funding of the relied-upon study was simply a confidential consultation with the authors of that study, and not subject to discovery.

The now pending litigation against ondansetron (Zofran) provides the most recent example of the dubious use of consulting expert witnesses to hide party sponsorship of an epidemiologic study. The plaintiffs, who are claiming that Zofran causes birth defects in this multi-district litigation assigned to Judge F. Dennis Saylor, have designated Dr. Carol Luik as their sole testifying expert witness on epidemiology. Dr. Luik, in turn, has relied substantially upon a study conducted by Dr. April Zambelli-Weiner.[4]

According to motion papers filed by defendants,[5] the plaintiffs’ counsel initially claimed that they had no knowledge of any financial support or conflicts for Dr. Zambelli-Weiner. The conflict-of-interest disclosure in Zambelli-Weiner’s paper was, to say the least, suspicious:

“The authors declare that there was no outside involvement in study design; in the collection, analysis and interpretation of data; in the writing of the manuscript; and in the decision to submit the manuscript for publication.”

“As an organization TTi reports receiving funds from plaintiff law firms involved in ondansetron litigation and a manufacturer of ondansetron.”

According to its website, TTi

“is an economically disadvantaged woman-owned small business headquartered in Westminster, Maryland. We are focused on the development, evaluation, and implementation of technologies and solutions that advance the transformation of data into actionable knowledge. TTi serves a diverse clientele, including all stakeholders in the health space (governments, payors, providers, pharmaceutical and device companies, and foundations) who have a vested interest in advancing research to improve patient outcomes, population health, and access to care while reducing costs and eliminating health disparities.”

According to defendants’ briefing, and contrary to plaintiffs’ initial claims and Zambelli-Weiner’s anemic conflicts disclosure, plaintiffs’ counsel eventually admitted that “Plaintiffs’ Leadership Attorneys paid $210,000 as financial support relating to” Zambelli-Weiner’s epidemiologic study. The women at TTi are apparently less economically disadvantaged than advertised.

The Zofran defendants served subpoenas duces tecum and ad testificandum on two of the study authors, Drs. April Zambelli-Weiner and Russell Kirby. Curiously, the plaintiffs (who would seem to have no interest in defending the third-party subpoenas) sought a protective order by arguing that defendants were harassing “third-party scientists.” Their motion for protection conveniently and disingenuously omitted that Zambelli-Weiner had been a paid consultant to the Zofran plaintiffs.

Judge Saylor refused to quash the subpoenas, and Zambelli-Weiner then appeared on her own behalf, through counsel, to seek a protective order. Her supporting affidavit averred that she had not been retained as an expert witness, and that she had no documents “concerning any data analyses or results that were not reported in the [published study].” Zambelli-Weiner’s attempt to evade discovery was embarrassed by her having presented a “Zofran Litigation Update” with Plaintiffs’ counsel Robert Jenner and Elizabeth Graham at a national conference for plaintiffs’ attorneys. Judge Saylor was not persuaded, and the MDL court denied Dr. Zambelli-Weiner’s motion. The law and the public have a right to every man’s and every woman’s (even if economically disadvantaged) evidence.6

Tellingly, in the aftermath of the motions to quash, Zambelli-Weiner’s counsel, Scott Marder, abandoned his client by filing an emergency motion to withdraw, because “certain of the factual assertions in Dr. Zambelli-Weiner’s Motion for Protective Order and Affidavit were inaccurate.” Mr. Marder also honorably notified defense counsel that he could no longer represent that Zambelli-Weiner’s document production was complete.

Early this year, on January 29, 2019, Zambelli-Weiner submitted, through new counsel, a “Supplemental Affidavit,” wherein she admitted she had been a “consulting expert” witness for the law firm of Grant & Eisenhofer on the claimed teratogenicity of Zofran.7 Zambelli-Weiner also produced a few extensively redacted documents. On February 1, 2019, Zambelli-Weiner testified at deposition that the moneys she received from Grant & Eisenhofer were not to fund her Zofran study, but for other, “unrelated work.” Her testimony was at odds with the plaintiffs’ counsel’s confession that the $210,000 related to her Zofran study.

Zambelli-Weiner’s etiolated document production was confounded by the several hundred pages of documents produced by her fellow author, Dr. Russell Kirby. When confronted with documents from Kirby’s production, Zambelli-Weiner’s lawyer unilaterally suspended the deposition.

Deja Vu All Over Again

Federal courts have seen the Zambelli maneuver before. In litigation over claimed welding fume health effects, plaintiffs’ counsel Richard (Dickie) Scruggs and colleagues funded some neurological researchers to travel to Alabama and Mississippi to “screen” plaintiffs and potential plaintiffs in litigation over claims of neurological injury and disease from welding fume exposure, with a novel videotaping methodology. The plaintiffs’ lawyers rounded up the research subjects (a.k.a. clients and potential clients), talked to them before the medical evaluations, and administered the study questionnaires. The study subjects were clearly aware of Mr. Scruggs’ “research” hypothesis, and had already promised him 40% of any recovery.8

After their sojourn to Alabama and Mississippi, at Scruggs’ expense, the researchers wrote up their results, with little or no detail of the circumstances of how they had acquired their research “participants,” or of those participants’ motives to give accurate or inaccurate medical and employment history information.9

Defense counsel served subpoenas upon both Dr. Racette and his institution, Washington University in St. Louis, for the study protocol, underlying data, data codes, and all statistical analyses. Racette and Washington University resisted sharing their data and materials with every page in the Directory of Non-Transparent Research. They claimed that the subpoenas sought production of testimony, information, and documents in violation of:

“(1) the Federal Regulations set forth in the Department of Health and Human Services Policy for Protection of Human Research Subjects,

(2) the Federal regulations set forth in the HIPPA Regulations,

(3) the physician/patient privilege,

(4) the research scholar’s privilege,

(5) the trade secret/confidential research privilege and

(6) the scope of discovery as codified by the Federal Rules of Civil Procedure and the Missouri Rules of Civil Procedure.”

After a long discovery fight, the MDL court largely enforced the subpoenas.10 The welding MDL court ordered Racette to produce

“a ‘limited data set’ which links the specific categories requested by defendants: diagnosis, occupation, and age. This information may be produced as a ‘deidentified’ data set, such that the categories would be linked to each particular patient, without using any individual patient identifiers. This data set should: (1) allow matching of each study participant’s occupational status and age with his or her neurological condition, as diagnosed by the study’s researchers; and (2) to the greatest extent possible (except for necessary de-identification), show original coding and any code-keys.”

After the defense had the opportunity to obtain and analyze the underlying data in the Scruggs-Racette study, the welding plaintiffs retreated from their epidemiologic case. Various defense expert witnesses analyzed the underlying data produced by Racette, and prepared devastating rebuttal reports. These reports were served upon plaintiffs’ counsel, whose expert witnesses never attempted any response. Reliance upon Racette’s study was withdrawn or abandoned. After the underlying data were shared with the parties to MDL 1535, no scientist appeared to defend the results in the published papers.11 The Racette Alabama study faded into the background of the subsequent welding-fume cases and trials.

The motion battle in the welding MDL revealed interesting contradictions, similar to those seen in the Zambelli-Weiner affair. For example, Racette claimed he had no relationship whatsoever with plaintiffs’ counsel, other than showing up by happenstance in Alabama at places where Scruggs’ clients also just happened to show up. Racette claimed that the men and women he screened were his patients, but he had no license to practice in Alabama, where the screenings took place. Plaintiffs’ counsel disclaimed that Racette was a treating physician, an acknowledgment that would have made the individuals’ screening results discoverable in their individual cases. And more interestingly, plaintiffs’ counsel claimed that both Dr. Racette and Washington University were “non-testifying, consulting experts utilized to advise and assist Plaintiffs’ counsel with respect to evaluating and assessing each of their client’s potential lawsuit or claim (or not).”12

Over the last decade or so, best practices and codes of conduct for the relationship between pharmacoepidemiologists and study funders have been published.13 These standards apply with equal force to public agencies, private industry, and regulatory authorities. Perhaps it is time for them to specify that they apply to the litigation industry as well.


1 See Smith v. Wyeth-Ayerst Labs. Co., 278 F. Supp. 2d 684, 710 & n. 56 (W.D.N.C. 2003).

2 Ironically, Ms. Cabraser has published her opinion that failure to disclose conflicts of interest and study funding should result in evidentiary exclusions, a view which would have simplified and greatly shortened the silicone gel breast implant litigation. See Elizabeth J. Cabraser, Fabrice Vincent & Alexandra Foote, “Ethics and Admissibility: Failure to Disclose Conflicts of Interest in and/or Funding of Scientific Studies and/or Data May Warrant Evidentiary Exclusions,” Mealey’s Emerging Drugs Reporter (Dec. 2002).

3 Litigation concerning Viagra is one notable example where plaintiffs’ counsel called an expert witness who was the author of the very study that supposedly supported their causal claim. It did not go well for the plaintiffs or the expert witness. See Lori B. Leskin & Bert L. Slonim, “A Primer on Challenging Peer-Reviewed Scientific Literature in Mass Tort and Product Liability Actions,” 25 Toxics L. Rptr. 651 (Jul. 1, 2010).

4 April Zambelli‐Weiner, Christina Via, Matt Yuen, Daniel Weiner, and Russell S. Kirby, “First Trimester Pregnancy Exposure to Ondansetron and Risk of Structural Birth Defects,” 83 Reproductive Toxicology 14 (2019).

5 Nate Raymond, “GSK accuses Zofran plaintiffs’ law firms of funding academic study,” Reuters (Mar. 5, 2019).

6 See Branzburg v. Hayes, 408 U.S. 665, 674 (1972).

7 Affidavit of April Zambelli-Weiner, dated January 9, 2019 (Doc. No. 1272).

8 The plaintiffs’ lawyers’ motive and opportunity to poison the study by coaching their “clients” was palpable. See David B. Resnik & David J. McCann, “Deception by Research Participants,” 373 New Engl. J. Med. 1192 (2015).

9 See Brad A. Racette, S.D. Tabbal, D. Jennings, L. Good, J.S. Perlmutter, and Brad Evanoff, “Prevalence of parkinsonism and relationship to exposure in a large sample of Alabama welders,” 64 Neurology 230 (2005); Brad A. Racette, et al., “A rapid method for mass screening for parkinsonism,” 27 Neurotoxicology 357 (2006) (a largely duplicative report of the Alabama welders study).

10 See, e.g., In re Welding Fume Prods. Liab. Litig., MDL 1535, 2005 WL 5417815 (N.D. Ohio Oct. 18, 2005) (upholding defendants’ subpoena for protocol, data, data codes, statistical analyses, and other things from Dr. Racette’s Alabama study on welding and parkinsonism).

11 Racette sought and obtained a protective order for the data produced, and thus I still cannot share the materials he provided without asking that any reviewer sign the court-mandated protective order. Revealingly, Racette was concerned about who had seen his underlying data, and he obtained a requirement in the court’s non-disclosure affidavit that anyone who reviews the underlying data will not sit on peer review of his publications or his grant applications. See Motion to Compel List of Defendants’ Reviewers of Data Produced by Brad A. Racette, M.D., and Washington University Pursuant to Protective Order, in In re Welding Fume Products Liab. Litig., MDL No. 1535, Case 1:03-cv-17000-KMO, Document 1642-1 (N.D. Ohio Feb. 14, 2006). Curiously, Racette never moved to compel a list of Plaintiffs’ Reviewers!

12 Plaintiffs’ Motion for Protective Order, Motion to Reconsider Order Requiring Discovery from Dr. Racette, and Request for In Camera Inspection as to Any Responses or Information Provided by Dr. Racette, filed in Solis v. Lincoln Elec. Co., case No. 1:03-CV-17000, MDL 1535 (N.D. Ohio May 8, 2006).

13 See, e.g., Xavier Kurz, Susana Perez‐Gutthann, and the ENCePP Steering Group, “Strengthening standards, transparency, and collaboration to support medicine evaluation: Ten years of the European Network of Centres for Pharmacoepidemiology and Pharmacovigilance (ENCePP),” 27 Pharmacoepidem. & Drug Safety 245 (2018).

Statistical Deontology

March 2nd, 2018

In courtrooms across America, there has been a lot of buzzing and palavering about the American Statistical Association’s Statement on Statistical Significance Testing,1 but very little discussion of the Association’s Ethical Guidelines, which were updated and promulgated in the same year, 2016. Statisticians and statistics, like lawyers and the law, receive their fair share of calumny over their professional activities, but the statistician’s principal North American professional organization is trying to do something about members’ transgressions.

The American Statistical Association (ASA) has promulgated ethical guidelines for statisticians, as has the Royal Statistical Society,2 even if these organizations lack the means and procedures to enforce their codes. The ASA’s guidelines3 are rich with implications for statistical analyses put forward in all contexts, including in litigation and regulatory rule making. As such, the guidelines are well worth studying by lawyers.

The ASA Guidelines were prepared by the Committee on Professional Ethics, and approved by the ASA’s Board in April 2016. There are lots of “thou shalts” and “thou shalt nots,” but I will focus on the issues that are more likely to arise in litigation. What is remarkable about the Guidelines is that, if followed, they are probably more likely to eliminate unsound statistical practices in the courtroom than the ASA Statement on P-values.

Defining Good Statistical Practice

“Good statistical practice is fundamentally based on transparent assumptions, reproducible results, and valid interpretations.” Guidelines at 1. The Guidelines thus incorporate something akin to the Kumho Tire standard that an expert witness ‘‘employs in the courtroom the same level of intellectual rigor that characterizes the practice of an expert in the relevant field.’’ Kumho Tire Co. v. Carmichael, 526 U.S. 137, 152 (1999).

A statistician engaged in expert witness testimony should provide “only expert testimony, written work, and oral presentations that he/she would be willing to have peer reviewed.” Guidelines at 2. “The ethical statistician uses methodology and data that are relevant and appropriate, without favoritism or prejudice, and in a manner intended to produce valid, interpretable, and reproducible results.” Id. Similarly, the statistician, if ethical, will identify and mitigate biases, and use analyses “appropriate and valid for the specific question to be addressed, so that results extend beyond the sample to a population relevant to the objectives with minimal error under reasonable assumptions.” Id. If the Guidelines were followed, a lot of spurious analyses would drop off the litigation landscape, regardless of whether they used p-values, confidence intervals, or a Bayesian approach.

Integrity of Data and Methods

The ASA’s Guidelines also have a good deal to say about data integrity and statistical methods. In particular, the Guidelines call for candor about limitations in the statistical methods or the integrity of the underlying data:

“The ethical statistician is candid about any known or suspected limitations, defects, or biases in the data that may impact the integrity or reliability of the statistical analysis. Objective and valid interpretation of the results requires that the underlying analysis recognizes and acknowledges the degree of reliability and integrity of the data.”

Guidelines at 3.

“The statistical analyst openly acknowledges the limits of statistical inference, the potential sources of error, as well as the statistical and substantive assumptions made in the execution and interpretation of any analysis,” including data editing and imputation. Id. The Guidelines urge analysts to address potential confounding not assessed by the study design. Id. at 3, 10. How often do we see these acknowledgments in litigation-driven analyses, or in peer-reviewed papers, for that matter?

Affirmative Actions Prescribed

To promote data and methodological integrity, the Guidelines also urge analysts to share data when appropriate without revealing the identities of study participants. Statistical analysts should publicly correct any errors in data and analyses they have disseminated in their own work, as well as work to “expose incompetent or corrupt statistical practice.” Of course, the Lawsuit Industry will call this ethical duty “attacking the messenger,” but maybe that’s a rhetorical strategy based upon an assessment of risks versus benefits to the Lawsuit Industry.

Multiplicity

The ASA Guidelines address the impropriety of substantive statistical errors, such as:

“[r]unning multiple tests on the same data set at the same stage of an analysis increases the chance of obtaining at least one invalid result. Selecting the one “significant” result from a multiplicity of parallel tests poses a grave risk of an incorrect conclusion. Failure to disclose the full extent of tests and their results in such a case would be highly misleading.”

Guidelines at 9.

There are some Lawsuit Industrialists who have taken comfort in the pronouncements of Kenneth Rothman on corrections for multiple comparisons. Rothman’s views on multiple comparisons are, however, much broader and more nuanced than the Industry’s sound bites.4 Given that Rothman opposes anything like strict statistical significance testing, it follows that he is relatively unmoved by the need for adjustments to alpha or the coefficient of confidence. Rothman, however, has never deprecated the need to consider the multiplicity of testing, and the need for researchers to be forthright in disclosing the scope of comparisons originally planned and actually done.
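Why multiplicity matters, whatever one thinks of formal alpha adjustments, comes down to simple arithmetic. The following is a minimal Python sketch, not anything drawn from the ASA Guidelines or from Rothman. It assumes k independent tests, each run at level alpha, with every null hypothesis true, and computes the chance that at least one test comes up “significant” by chance alone. The Bonferroni division (alpha / k) is shown only as one familiar adjustment. The function names are illustrative.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Chance of at least one false 'significant' result among k independent
    tests, each run at level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_threshold(alpha: float, k: int) -> float:
    """Per-test level (alpha / k) that holds the family-wise error rate near alpha."""
    return alpha / k


if __name__ == "__main__":
    alpha = 0.05
    for k in (1, 5, 20, 100):
        unadjusted = familywise_error_rate(alpha, k)
        adjusted = familywise_error_rate(bonferroni_threshold(alpha, k), k)
        print(f"{k:>3} tests: unadjusted {unadjusted:.3f}, Bonferroni {adjusted:.3f}")
```

With 20 parallel tests at the conventional 0.05 level, the chance of at least one spurious “hit” is roughly 64 percent. That is why the Guidelines treat undisclosed selection of the one “significant” result as highly misleading, quite apart from any debate over whether to adjust alpha.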


2 Royal Statistical Society – Code of Conduct (2014); Steven Piantadosi, Clinical Trials: A Methodologic Perspective 609 (2d ed. 2005).

3 Shelley Hurwitz & John S. Gardenier, “Ethical Guidelines for Statistical Practice: The First 60 Years and Beyond,” 66 Am. Statistician 99 (2012) (describing the history and evolution of the Guidelines).

4 Kenneth J. Rothman, “Six Persistent Research Misconceptions,” 29 J. Gen. Intern. Med. 1060, 1063 (2014).

Gatekeeping of Expert Witnesses Needs a Bair Hug

December 20th, 2017

For every Rule 702 (“Daubert”) success story, there are multiple gatekeeping failures. See David E. Bernstein, “The Misbegotten Judicial Resistance to the Daubert Revolution,” 89 Notre Dame L. Rev. 27 (2013).1 Exemplars of inadequate expert witness gatekeeping in state or federal court abound, and overwhelm the bar. The only solace one might find is that the abuse-of-discretion appellate standard of review keeps the bad decisions from precedentially outlawing the good ones.

Judge Joan Ericksen recently provided another Berenstain Bears’ example of how not to keep the expert witness gate, in litigation over claims that the Bair Hugger forced air warming devices (“Bair Huggers”) cause infections. In re Bair Hugger Forced Air Warming, MDL No. 15-2666, 2017 WL 6397721 (D. Minn. Dec. 13, 2017). Although Her Honor properly cited and quoted Rule 702 (2000), she announced a new standard in a bold heading:

“Under Federal Rule of Evidence 702, the Court need only exclude expert testimony that is so fundamentally unsupported that it can offer no assistance to the jury.”

Id. at *1. This new standard thus permits largely unsupported opinion that can offer bad assistance to the jury. As Judge Ericksen demonstrates, this new standard, which has no warrant in the statutory text of Rule 702 or its advisory committee notes, allows expert witnesses to rely upon studies that have serious internal and external validity flaws.

Jonathan Samet, a specialist in pulmonary medicine, not infectious disease or statistics, is one of the plaintiffs’ principal expert witnesses. Samet relies in large measure upon an observational study2, which purports to find an increased odds ratio for use of the Bair Hugger among infection cases in one particular hospital. The defense epidemiologist, Jonathan B. Borak, criticized the McGovern observational study on several grounds, including that the study was highly confounded by the presence of other known infection risks. Id. at *6. Judge Ericksen characterized Borak’s opinion as an assertion that the McGovern study was an “insufficient basis” for the plaintiffs’ claims. A fair reading of even Judge Ericksen’s précis of Borak’s proffered testimony requires the conclusion that Borak’s opinion was that the McGovern study was invalid because of data collection errors and confounding. Id.

Judge Ericksen’s judicial assessment, taken from the disagreement between Samet and Borak, is that there are issues with the McGovern study, which go to “weight of the evidence.” This finding obscures, however, that there were strong challenges to the internal and external validity of the study. Drawing causal inferences from an invalid observational study is a methodological issue, not a weight-of-the-evidence problem for the jury to resolve. This MDL opinion never addresses the Rule 703 issue, whether an epidemiologic expert would reasonably rely upon such a confounded study.
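A toy calculation illustrates why confounding goes to validity rather than weight. The following Python sketch uses invented counts (not McGovern’s data) in which exposure to a hypothetical warming device has no effect at all within each stratum of a hypothetical infection-risk factor. Exposure, however, is more common among high-risk patients. The crude odds ratio pooled across strata then suggests a strong association that vanishes once the analysis is stratified. The Mantel-Haenszel summary odds ratio shown here is one standard stratified estimator, chosen for illustration.

```python
from typing import List, Tuple

# Each stratum is (a, b, c, d):
#   a = exposed cases, b = exposed non-cases,
#   c = unexposed cases, d = unexposed non-cases.
Stratum = Tuple[int, int, int, int]

# Hypothetical counts, invented for illustration only.
strata: List[Stratum] = [
    (80, 320, 20, 80),  # high-risk patients: 20% infected, exposed or not
    (2, 98, 8, 392),    # low-risk patients: 2% infected, exposed or not
]


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a single 2x2 table: (a * d) / (b * c)."""
    return (a * d) / (b * c)


def crude_odds_ratio(tables: List[Stratum]) -> float:
    """Collapse all strata into one 2x2 table, ignoring the risk factor."""
    a, b, c, d = (sum(t[i] for t in tables) for i in range(4))
    return odds_ratio(a, b, c, d)


def mantel_haenszel_odds_ratio(tables: List[Stratum]) -> float:
    """Summary odds ratio that adjusts for the stratifying variable."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


if __name__ == "__main__":
    for t in strata:
        print("stratum-specific OR:", odds_ratio(*t))
    print("crude OR:", round(crude_odds_ratio(strata), 2))
    print("Mantel-Haenszel OR:", round(mantel_haenszel_odds_ratio(strata), 2))
```

Here the crude odds ratio is about 3.3, while every stratum-specific odds ratio, and the adjusted summary, is 1.0. No amount of cross-examination about “weight” changes the fact that the crude figure measures the confounder, not the exposure.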

The defense proffered the opinion of Theodore R. Holford, who criticized Dr. Samet for drawing causal inferences from the McGovern observational study. Holford, a professor of biostatistics at Yale University’s School of Public Health, analyzed the raw data behind the McGovern study. Id. at *8. The plaintiffs challenged Holford’s opinions on the ground that he relied on data in “non-final” form, from a temporally expanded dataset. Even more intriguingly, given that the plaintiffs did not present a statistician expert witness, plaintiffs argued that Holford’s opinions should be excluded because

(1) he insufficiently justified his use of a statistical test, and

(2) he “emphasizes statistical significance more than he would in his professional work.”

Id.

The MDL court dismissed the plaintiffs’ challenge on the mistaken conclusion that the alleged contradictions between Holford’s practice and his testimony “impugn his credibility at most.” If there were truly such a deviation from the statistical standard of care, the issue is methodological, not a credibility issue of whether Holford was telling the truth. And as for the alleged over-emphasis on statistical significance, the MDL court again falls back on the glib conclusions that the allegation goes to the weight, not the admissibility, of expert witness opinion testimony, and that plaintiffs can elicit testimony from Dr. Samet as to how and why Professor Holford over-emphasized statistical significance. Id. Inquiring minds, at the bar and in the academy, are left with no information about what the real issues are in the case.

Generally, both sides’ challenges to expert witnesses were denied.3 The real losers, however, were the scientific and medical communities, bench, bar, and general public. The MDL court glibly and incorrectly treated methodological issues as “credibility” issues, confused sufficiency with validity, and banished methodological failures to consideration by the trier of fact for “weight.” Confounding was mistreated as simply a debating point between the parties’ expert witnesses. The reader of Judge Ericksen’s opinion never learns what statistical test was used by Professor Holford, what justification was needed but allegedly absent for the test, why the justification was contested, and what other test was alleged by plaintiffs to have been a “better” statistical test. As for the emphasis given to statistical significance, the reader is left in the dark about exactly what that emphasis was, how it led to Holford’s conclusions and opinions, and what the proper emphasis should have been.

Eventually appellate review of the Bair Hugger MDL decision must turn on whether the district court abused its discretion. Although appellate courts give trial judges discretion to resolve Rule 702 issues, the appellate courts cannot reach reasoned decisions when the inferior courts fail to give even a cursory description of what the issues were, and how and why they were resolved as they were.


2 P. D. McGovern, M. Albrecht, K. G. Belani, C. Nachtsheim, P. F. Partington, I. Carluke, and M. R. Reed, “Forced-Air Warming and Ultra-Clean Ventilation Do Not Mix: An Investigation of Theatre Ventilation, Patient Warming and Joint Replacement Infection in Orthopaedics,” 93 J. Bone Joint 1537 (2011). The article as published contains no disclosures of potential or actual conflicts of interest. A persistent rumor has it that the investigators were funded by a commercial rival to the manufacturer of the Bair Hugger at issue in Judge Ericksen’s MDL. See generally, Melissa D. Kellam, Loraine S. Dieckmann, and Paul N. Austin, “Forced-Air Warming Devices and the Risk of Surgical Site Infections,” 98 Ass’n periOperative Registered Nurses (AORN) J. 354 (2013).

3 A challenge to plaintiffs’ expert witness Yadin David was sustained to the extent he sought to offer opinions about the defendant’s state of mind. Id. at *5.

White Hat Bias in the Lab and in the Courtroom

February 20th, 2017
