TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

How Access to a Protocol and Underlying Data Gave Yale Researchers a Big Black Eye

April 13th, 2024

Prelude to Litigation

Phenylpropanolamine (PPA) was a widely used direct α-adrenergic agonist used as a medication to control cold symptoms and to suppress appetite for weight loss.[1] In 1972, an over-the-counter (OTC) Advisory Review Panel considered the safety and efficacy of PPA-containing nasal decongestant medications, leading, in 1976, to a recommendation that the agency label these medications as “generally recognized as safe and effective.” Several years later, another Panel recommended that PPA-containing weight control products also be recognized as safe and effective.

Six years later, in 1982, another FDA panel recommended that PPA be considered safe and effective for appetite suppression in dieting.  Two epidemiologic studies of PPA and hemorrhagic stroke were conducted in the 1980s. The results of one study by Hershel Jick and colleagues, presented as a letter to the editor, reported a relative risk of 0.58, with a 95% exact confidence interval, 0.03 – 2.9.[2] A year later, two researchers, reporting a study based upon Medicaid databases, found no significant associations between HS and PPA.[3]

The FDA, however, did not approve a final monograph for PPA, with recognition of its “safe and effective” status because of occasional reports of hemorrhagic stroke that occurred in patients who used PPA-containing medications, mostly young women who had used PPA appetite suppressants for dieting. In 1982, the FDA requested information on the effects of PPA on blood pressure, particularly with respect to weight-loss medications. The agency deferred a proposed 1985 final monograph because of the blood pressure issue.

The FDA deemed the data inadequate to answer its safety concerns. Congressional and agency hearings in the early 1990s amplified some public concern, but in 1990, the Director of Cardio-Renal Drug Products, at the Center for Drug Evaluation and Research, found several well-supported facts, based upon robust evidence. Blood pressure studies in humans showed a biphasic response. PPA initially causes blood pressure to rise above baseline (a pressor effect), and then to fall below baseline (depressor effect). These blood pressure responses are dose-related, and diminish with repeated use. Patients develop tolerance to the pressor effects within a few hours. The Center concluded that at doses of 50 mg of PPA and below, the pressor effects of the medication are smaller, indeed smaller than normal daily variations in basal blood pressure. Humans develop tolerance to the pressor effects quickly, within the time frame of a single dose. The only time period in which even a theoretical risk might exist is within a few hours, or less, of a patient’s taking the first dose of PPA medication. Doses of 25 mg. immediate-release PPA could not realistically be considered to pose any “absolute safety risk and have a reasonable safety margin.”[4]

In 1991, Dr. Heidi Jolson, an FDA scientist wrote that the agency’s spontaneous adverse event reporting system “suggested” that PPA appetite suppressants increased the risk of cerebrovascular accidents. A review of stroke data, including the adverse event reports, by epidemiology consultants failed to support a causal association between PPA and hemorrhagic stroke (HS). The reviewers, however, acknowledged that the available data did not permit them to rule out a risk of HS. The FDA adopted the reviewers’ recommendation for a prospective, large case-control study designed to take into account the known physiological effects of PPA on blood pressure.[5]

What emerged from this regulatory indecision was a decision to conduct another epidemiologic study. In November 1992, a manufacturers’ group, now known as the Consumer Healthcare Products Association (CHPA) proposed a case-control study that would become known as the Hemorrhagic Stroke Project (HSP). In March 1993, the group submitted a proposed protocol, and a suggestion that the study be conducted by several researchers at Yale University. After feedback from the public and the Yale researchers, the group submitted a final protocol in April 1994. Both the researchers and the sponsors agreed to a scientific advisory group that would operate independently and oversee the study. The study began in September 1994. The FDA deferred action on a final monograph for PPA, and product marketing continued.

The Yale HSP authors delivered their final report on their case-control study to FDA, in May 2000.[6] The HSP was a study, with 702 HS cases, and over 1,376 controls, men and women, ages 18 to 49. The report authors concluded that “the results of the HSP suggest that PPA increases the risk for hemorrhagic stroke.”[7] The study had taken over five years to design, conduct, and analyze. In September 2000, the FDA’s Office of Post-Marketing Drug Risk Assessment released the results, with its own interpretation and conclusion that dramatically exceeded the HSP authors’ own interpretation.[8] The FDA’s Non-Prescription Drug Advisory Committee then voted, on October 19, 2000, to recommend that PPA be reclassified as “unsafe.” The Committee’s meeting, however, was attended by several leading epidemiologists who pointed to important methodological problems and limitations in the design and execution of the HSP.[9]

In November 2000, the FDA” Nonprescription Drugs Advisory Committee determined that there was a significant association PPA and HS, and recommended that PPA not be considered safe for OTC use. The FDA never addressed causality; nor did it have to do so under governing law. The FDA’s actions led the drug companies voluntarily to withdraw PPA-containing products.

The December 21, 2000, issue of The New England Journal of Medicine featured a revised version of the HSP report as its lead article.[10] Under the journal’s guidelines for statistical reporting, the authors were required to present two-tailed p-values or confidence intervals. Results from the HSP Final Report looked considerably less impressive after the obtained significance probabilities were doubled. Only the finding in appetite suppressant use was branded an independent risk factor:

“The results suggest that phenylpropanolamine in appetite suppressants, and possibly in cough and cold remedies, is an independent risk factor for hemorrhagic stroke in women.”[11]

The HSP had multiple pre-specified aims, and several other statistical comparisons and analyses were added along the way. No statistical adjustment was made for these multiple comparisons, but their presence in the study must be considered. Perhaps that is why the authors merely suggest that PPA in appetite suppressants was an independent risk factor for HS in women. Under current statistical guidelines for the New England Journal of Medicine, this suggestion might require even further qualification and weakening.[12]

The HSP study faced difficult methodological issues. The detailed and robust identification of PPA’s blood pressure effects in humans focused attention on the crucial timing of timing of a HS in relation to ingestion of a PPA medication. Any use, or any use within the last seven or 30 days, would be fairly irrelevant to the pathophysiology of a cerebral hemorrhage. The HSP authors settled on a definition of “first use” as any use of a PPA product within 24 hours, and no other uses in the previous two weeks.[13] Given the rapid onset of pressor and depressor effects, and adaptation response, this definition of first use was generous and likely included many irrelevant exposed cases, but at least the definition attempted to incorporate the phenomena of short-lived effect and adaption. The appetite suppressant association did not involve any “first use,” which makes the one “suggested” increase risk much less certain and relevant.

The alternative definition of exposure, in addition to “first use,” the ingestion of the PPA-containing medication took place as “the index day before the focal time and the preceding three calendar days.” Again, given the known pharmacokinetics and physiological effects of PPA, this three-day (plus) window seems doubtfully relevant.

All instances of “first use” occurred among men and women who used a cough or cold remedy, with an adjusted OR of 3.14, with a 95% confidence interval (CI), of 0.96–10.28), p = 0.06. The very wide confidence interval, in excess of an order of magnitude, reveals the fragility of the statistical inference. There were but 8 first use exposed stroke cases (out of 702), and 5 exposed controls (out of 1,376).

When this first use analysis is broken down between men and women, the result becomes even more fragile. Among men, there was only one first use exposure in 319 male HS patients, and one first use exposure in 626 controls, for an adjusted OR of 2.95, CI 0.15 – 59.59, and p = 0.48. Among women, there were 7 first use exposures among 383 female HS patients, and 4 first use exposures among 750 controls, with an adjusted OR of 3.13, CI 0.86 – 11.46, p = 0.08.

The small numbers of actual first exposure events speak loudly for the inconclusiveness and fragility of the study results, and the sensitivity of the results to any methodological deviations or irregularities. Of course, for the one “suggested” association for appetite suppressant use among women, the results were even more fragile. None of the appetite suppressant cases were “first use,” which raises serious questions whether anything meaningful was measured. There were six (non-first use) exposed among 383 female HS patients, with only a single exposed female control among 750. The authors presented an adjusted OR of 15.58, with a p-value of 0.02. The CI, however, spanned more than two orders of magnitude, 1.51 – 182.21, which makes the result well-nigh uninterpretable. One of six appetite suppressant cases was also a user of cough-cold remedies, and she was double counted in the study’s analyses. This double-counted case, had a body-mass index of 19, which is certainly not overweight, and at the low end of normal.[14] The one appetite suppressant control was obese.

For the more expansive any exposure analysis for use of PPA cough-cold medication, the results were significantly unimpressive. There were six exposed male cases among 391 male HS cases, and 13 exposed controls, for an adjusted odds ratio of 0.62, CI 0.20 – 1.92, p = 0.41. Although not an inverse association, the sample results for men were incompatible with a hypothetical doubling of risk. For women, on the expansive exposure definition, there were 16 exposed cases, among 383 female cases, with 19 exposed controls out of 750 female controls.  The odds ratio for female PPA cough-cold medication was 1.54, CI 0.76 – 3.14, p = 0.23.

Aside from doubts whether the HSP measured meaningful exposures, the small number of exposed cases and controls present insuperable interpretative difficulties for the study. First, working with a case-control design and odds ratios, there should be some acknowledgment that odds ratios always exaggerate the observed association size compared with a relative risk.[15] Second, the authors knew that confounding would be an important consideration in evaluating any observed association. Known and suspected risk factors were consistently more prevalent among cases than controls.[16]

The HSP authors valiantly attempted to control for confounding in two ways. They selected controls by a technique known as random digit dialing, to find two controls for each case, matched on telephone exchange, sex, age, and race. The HSP authors, however, used imperfectly matched controls rather than lose the corresponding case from their study.[17] For other co-variates, the authors used multi-variate logistic regression to provide odds ratios that were adjusted for potential confounding from the measured covariates. At least two of co-variates, alcohol and cocaine use, in the population under age 50 sample involved potential legal or moral judgment, which almost certainly would have skewed interview results.

An even more important threat to methodological validity, key co-variates, such as smoking, alcohol use, hypertension, and cocaine use were incorporated into the adjustment regression as dichotomous variables; body mass index was entered as a polychotomous variable. Monte Carlo simulation shows that categorizing a continuous variable in logistic regression results in inflating the rate of finding false positive associations.[18] The type I (false-positive) error rates increases with sample size, with increasing correlation between the confounding variable and outcome of interest, and the number of categories used for the continuous variables. Numerous authors have warned of the cost and danger of dichotomizing continuous variables, in losing information, statistical power, and reliability.[19]  In the field of pharmaco-epidemiology, the bias created by dichotomization of a continuous variable is harmful from both the perspective of statistical estimation and hypothesis testing.[20] Readers will be misled into believing that a study has adjusted for important co-variates with the false allure of fully adjusted model.

Finally, with respect to the use of logistic regression to control confounding and provide adjusted odds ratios, there is the problem of the small number of events. Although the overall sample size is adequate for logistic regression, cell sizes of one, or two, or three, raise serious questions about the use of large-sample statistical methods for analysis of the HSP results.[21]

A Surfeit of Sub-Groups

The study protocol identified three (really four or five) specific goals, to estimate the associations: (1) between PPA use and HS; (2) between HS and type of PPA use (cough-cold remedy or appetite suppression); and (3) in women, between PPA appetite suppressant use and HS, and between PPA first use and HS.[22]

With two different definitions of “exposure,” and some modifications added along the way, with two sexes, two different indications (cold remedy and appetite suppression), and with non-pre-specified analyses such as men’s cough-cold PPA use, there was ample opportunity to inflate the Type I error rate. As the authors of the HSP final report acknowledged, they were able to identify only 60 “exposed” cases and controls.[23] In the context of a large case-controls study, the authors were able to identify some nominally statistically significant outcomes (PPA appetite suppressant and HS), but these were based upon very small numbers (six and one exposed, cases and controls, respectively), which made the results very uncertain considering the potential biases and confounding.

Design and Implementation Problems

Case-control studies always present some difficulty of obtaining controls that are similar to cases except that they did not experience the outcome of interest. As noted, controls were selected using “random digit dialing” in the same area code as the cases. The investigators were troubled by poor response rates from potential controls. They deviated from standard methodology for enrolling controls through random digit dialing by enrolling the first eligible control who agreed to participate, while failing to call back candidates who had asked to speak at another time.[24]

The exposure prevalence rate among controls was considerably lower than shown from PPA-product marketing research. This again raises questions about the low reported exposure rates among controls, which would inflate any observed odds ratios. Of course, it seems eminently reasonable to predict that persons who were suffering from head colds or the flu might not answer their phones or might request a call back. People who are obese might be reluctant to tell a stranger on the telephone that they are using a medication to suppress their appetite.

In the face of this obvious opportunity for selection bias, there was also ample room for recall bias. Cases were asked about medication use just before a unforgettable catastrophic event in their lives. Controls were asked about medication use before a day within the range of the previous week. More controls were interviewed by phone than were cases. Given the small number of exposed cases and controls, recall bias created by the differential circumstances and interview settings and procedures, was never excluded.

Lumpen Epidemiology ICH vs SAH

Every epidemiologic study or clinical trial has an exposure and outcome of interest, in a population of interest. The point is to compare exposed and unexposed persons, of relevant age, gender, and background, with comparable risk factors other than the exposure of interest, to determine if the exposure makes any difference in the rate of events of the outcome of interest.

Composite end points represent “lumping” together different individual end points for consideration as a single outcome. The validity of composite end points depends upon assumptions, which will have to be made at the time investigators design their study and write their protocol.  After the data are collected and analyzed, the assumptions may or may not be supported.

Lumping may offer some methodological benefits, such as increasing statistical power or reducing sample size requirements. Standard epidemiologic practice, however, as reflected in numerous textbooks and methodology articles, requires the reporting of the individual constitutive end points, along with the composite result. Even when the composite end point was employed based upon a view that the component end points are sufficiently related, that view must itself ultimately be tested by showing that the individual end points are, in fact, concordant, with risk ratios in the same direction.

There are many clear statements that caution the consumers of medical studies against being misled by misleading claims that may be based upon composite end points, in the medical literature.  In 2004, the British Medical Journal published a useful paper, “Users’ guide to detecting misleading claims in clinical research reports,” One of the authors’ suggestions to readers was:

“Beware of composite endpoints.”[25]

The one methodological point to which virtually all writers agree is that authors should report the results for the composite end point separately to permit readers to evaluate the individual results.[26]  A leading biostatistical methodologist, the late Douglas Altman, cautioned readers against assuming that the overall estimate of association can be interpreted for each individual end point, and advised authors to provide “[a] clear listing of the individual endpoints and the number of participants experiencing them” to permit a more meaningful interpretation of composite outcomes.[27]

The HSP authors used a composite of hemorrhagic strokes, which was composed of both intracerebral hemorrhages (ICH) and subarachnoid hemorrhages (SAH). In their New England Journal of Medicine article, the authors presented the composite end point, but not the risk ratios for the two individual end points. Before they published the article, one of the authors wrote his fellow authors to advise them that because ICH and SAH are very different medical phenomena, they should present the individual end points in their analysis.[28]

The HSP researchers eventually did publish an analysis of SAH and PPA use.[29] The authors identified 425 SAH cases, of which 312 met the criteria for aneurysmal SAH. They looked at many potential risk factors such as smoking (OR = 5.07), family history (OR = 3.1), marijuana (OR = 2.38), cocaine (OR = 24.97), hypertension (OR = 2.39), aspirin (OR = 1.24), alcohol (OR = 2.95), education, as well as PPA.

Only a bivariate analysis was presented for PPA, with an odds ratio of 1.15, p = 0.87. No confidence intervals were presented. The authors were a bit more forthcoming about the potential role of bias and confounding in this publication than they were in their earlier 2000 HSP paper. “Biases that might have affected this analysis of the HSP include selection and recall bias.”[30]

Judge Rothstein’s Rule 702 opinion reports that the “Defendants assert that this article demonstrates the lack of an association between PPA and SAHs resulting from the rupture of an aneurysm.”[31] If the defendants actually claimed a “demonstration” of “the lack of association,” then shame, and more shame, on them! First, the cited study provided only a bivariate analysis for PPA and SAH. The odds ratio of 1.15 pales in comparison the risk ratios reported for many other common exposures. We can only speculate what happens to the 1.15, when the PPA exposure is placed in a fully adjusted model for all important covariates. Second, the p-value of 0.87 does not tell that 1.15 is unreal or due to chance. The HSP reported a 15% increase in odds ratio, which is very compatible with no risk at all. Perhaps if the defendants had been more modest in their characterization they would not have given the court the basis to find that “defendants distort and misinterpret the Stroke Article.”[32]

Rejecting the defendants’ characterization, the court drew upon an affidavit from plaintiffs’ expert witness, Kenneth Rothman, who explained that a p-value cannot provide evidence of lack of an effect.[33] A high p-value, with its corresponding 95% confidence interval that includes 1.0, can, however, show that the sample data are compatible with the null hypothesis. What Judge Rothstein missed, and the defendants may not have said effectively, is that the statistical analysis was a test of an hypothesis, and the test failed to allow us to reject the null hypothesis.  The plaintiffs were left with an indeterminant analysis, from which they really could not honestly claim an association between PPA use and aneurismal SAH.

I Once Was Blind, But Now I See

The HSP protocol called for interviewers to be blinded to the study hypothesis, but this guard against bias was abandoned.[34]  The HSP report acknowledged that “[b]linding would have provided extra protection against unequal ascertainment of PPA exposure in case subjects compared with control subjects.”[35]

The study was conducted out of four sites, and at least one of the sites violated protocol by informing cases that they were participating in a study designed to evaluate PPA and HS.[36] The published article in the New England Journal of Medicine misleadingly claimed that study participants were blinded to its research hypothesis.[37] Although the plaintiffs’ expert witnesses tried to slough off this criticism, the lack of blinding among interviewers and study subjects amplifies recall biases, especially when study subjects and interviewers may have been reluctant to discuss fully several of the co-variate exposures, such as cocaine, marijuana, and alcohol use.[38]

No Causation At All

Scientists and the general population alike have been conditioned to view the controversy over tobacco smoking and lung cancer as a contrivance of the tobacco industry. What is lost in this conditioning is the context of Sir Arthur Bradford Hill’s triumphant 1965 Royal Society of Medicine presidential address. Hill, along with his colleague Sir Richard Doll, were not overly concerned with the tobacco industry, but rather the important methodological criticisms  posited by three leading statistical scientists, Joseph Berkson, Jerzy Neyman, and Sir Ronald Fisher. Hill and Doll’s success in showing that tobacco smoking causes lung cancer required sufficient rebuttal to these critics. The 1965 speech is often cited for its articulation of nine factors to consider in evaluating an association, but the necessary condition is often overlooked. In his speech, Hill identified the situation before the nine factors come into play:

“Disregarding then any such problem in semantics we have this situation. Our observations reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance. What aspects of that association should we especially consider before deciding that the most likely interpretation of it is causation?”[39]

The starting point, before the Bradford Hill nine factors come into play, requires a “clear-cut” association, which is “beyond what we would care to attribute to the play of chance.”  What is “clear-cut” association?  The most reasonable interpretation of Bradford Hill is that the starting point is an association that is not the result of chance, bias, or confounding.

Looking at the state of the science after the HSP was published, there were two studies that failed to find any association between PPA and HS. The HSP authors “suggested” an association between PPA appetite suppressant and HS, but with six cases and one control, this was hardly beyond the play of chance. And none of the putative associations were “clear cut” in removing bias and confounding as an explanation for the observations.

And Then Litigation Cometh

A tsunami of state and federal cases followed the publication of the HSP study.[40] The Judicial Panel on Multi-district Litigation gave Judge Barbara Rothstein, in the Western District of Washington, responsibility for the pre-trial management of the federal PPA cases. Given the problems with the HSP, the defense unsurprisingly lodged Rule 702 challenges to plaintiffs’ expert witnesses’ opinions, and Rule 703 challenges to reliance upon the HSP.[41]

In June 2003, Judge Rothstein issued her decision on the defense motions. After reviewing a selective regulatory history of PPA, the court turned to epidemiology, and its statistical analysis.  Although misunderstanding of p-values and confidence intervals is endemic among the judiciary, the descriptions provided by Judge Rothstein portended a poor outcome:

“P-values measure the probability that the reported association was due to chance, while confidence intervals indicate the range of values within which the true odds ratio is likely to fall.”[42]

Both descriptions are seriously incorrect,[43] which is especially concerning given that Judge Rothstein would go on, in 2003, to become the director of the Federal Judicial Center, where she would oversee work on the Reference Manual on Scientific Evidence.

The MDL court also managed to make a mash out of the one-tailed test used in the HSP report. That report was designed to inform regulatory action, where actual conclusions of causation are not necessary. When the HSP authors submitted their paper to the New England Journal of Medicine, they of course had to comply with the standards of that journal, and they doubled their reported p-values to comply with the journal’s requirement of using a two-tailed test. Some key results of the HSP no longer had p-values below 5 percent, as the defense was keen to point out in its briefings.

From the sources it cited, the court clearly did not understand the issue, which was the need to control for random error. The court declared that it had found:

“that the HSP’s one-tailed statistical analysis complies with proper scientific methodology, and concludes that the difference in the expression of the HSP’s findings [and in the published article] falls far short of impugning the study’s reliability.”[44]

This finding ignores the very different contexts between regulatory action and causation in civil litigation. The court’s citation to an early version of the Reference Manual on Scientific Evidence further illustrates its confusion:

“Since most investigators of toxic substances are only interested in whether the agent increases the incidence of disease (as distinguished from providing protection from the disease), a one-tailed test is often viewed as appropriate.”

*****

“a rigid rule [requiring a two-tailed test] is not required if p-values and significance levels are used as clues rather than as mechanical rules for statistical proof.”[45]

In a sense, given the prevalence of advocacy epidemiology, many researchers are interested in only showing an increased risk. Nonetheless, the point of evaluating p-values is to assess random error involved in sampling of a population, and that sampling generates a rate of error even when the null hypothesis is assumed to be absolutely correct. Random error can go in either direction, resulting in risk ratios above or below 1.0. Indeed, the probability of observing a risk ratio of exactly 1.0, in a large study, is incredibly small even if the null hypothesis is correct. The risk ratio for men who had used a PPA product was below 1.0, which also recommends a two-tailed test. Trading on the confusion of regulatory and litigation findings, the court proceeded to mischaracterize the parties’ interests in designing the HSP, as only whether PPA increased the risk of stroke. In the MDL, the parties did not want “clues,” or help on what FDA policy should be; they wanted a test of the causal hypothesis.

In a footnote, the court pointed to testimony of Dr. Ralph Horwitz, one of the HSP investigators, who stated that all parties “[a]ll parties involved in designing the HSP were interested solely in testing whether PPA increased the risk of stroke.” The parties, of course, were not designing the HSP for support for litigation claims.[46] The court also cited, in this footnote, a then recent case that found a one-tailed p-value inappropriate “where that analysis assumed the very fact in dispute.” The plaintiffs’ reliance upon the one-sided p-values in the unpublished HSP report did exactly that.[47] The court tried to excuse the failure to rule out random error by pointing to language in the published HSP article, where the authors stated that inconclusive findings raised “concern regarding  safety.”[48]

In analyzing the defense challenge to the opinions based upon the HSP, Judge Rothstein committed both legal and logical fallacies. First, citing Professor David Faigman’s treatise for the proposition that epidemiology is widely accepted because the “general techniques are valid,” the court found that the HSP, and reliance upon it, was valid, despite the identified problems. The issue was not whether epidemiological techniques are valid, but whether the techniques used in the HSP were valid. The devilish details of the HSP in particular largely went ignored.[49] From a legal perspective, Judge Rothstein’s opinion can be seen to place a burden upon the defense to show invalidity, by invoking a presumption of validity. This shifting of the burden was then, and is now, contrary to the law.

Perhaps the most obvious dodge of the court’s gatekeeping responsibility came with the conclusory assertion that the “Defendants’ ex post facto dissection of the HSP fails to undermine its reliability. Scientific studies almost invariably contain flaws.”[50] Perhaps it is sobering to consider that all human beings have flaws, and yet somehow we distinguish between sinners and saints, and between criminals and heroes. The court shirked its responsibility to look at the identified flaws to determine whether they threatened the HSP’s internal validity, as well as its external validity in the plaintiffs’ claims for hemorrhagic strokes in each of the many subgroups considered in the HSP, as well as outcomes not considered, such as myocardial infarction and ischemic stroke. Given that there was but one key epidemiologic study relied upon for support of the plaintiffs’ extravagant causal claims, the identified flaws might be expected to lead to some epistemic humility.

The PPA MDL court exhibited a willingness to cherry pick HSP results to support its low-grade gatekeeping. For instance, the court recited that “[b]ecause no men reported use of appetite suppressants and only two reported first use of a PPA-containing product, the investigators could not determine whether PPA posed an increased risk for hemorrhagic stroke in men.”[51] There was, of course, another definition of PPA exposure that yielded a total of 19 exposed men, about one-third of all exposed cases and controls. All exposed men used OTC PPA cough cold remedies, six men with HS, and 13 controls, with a reported odds ratio of 0.62 (95%, C.I., 0.20 – 1.92); p = 0.41. Although the result for men was not statistically significant, the point estimate for the sample was a risk ratio below one, with a confidence interval that excludes a doubling of the risk based upon this sample statistic. The number of male HS exposed cases was the same as the number of female HS appetite suppressant cases, which somehow did not disturb the court.

Superficially, the PPA MDL court appeared to place great weight on the fact of peer review publication in a prestigious journal, by well-credentialed scientists and clinicians. Given that “[t]he prestigious NEJM published the HSP results …  research bears the indicia of good science.”[52] Although Professor Susan Haack’s writings on law and science are often errant, her analysis of this kind of blind reliance on peer review is noteworthy:

“though peer-reviewed publication is now standard practice at scientific and medical journals, I doubt that many working scientists imagine that the fact that a work has been accepted for publication after peer review is any guarantee that it is good stuff, or that it’s not having been published necessarily undermines its value. The legal system, however, has come to invest considerable epistemic confidence in peer-reviewed publication  — perhaps for no better reason than that the law reviews are not peer-reviewed!”[53]

Ultimately, the PPA MDL court revealed that it was quite inattentive to the validity concerns of the HSP. Among the cases filed in the federal court were heart attack and ischemic stroke claims.  The HSP did not address those claims, and the MDL court was perfectly willing to green light the claims on the basis of case reports and expert witness hand waving about “plausibility.”  Not only was this reliance upon case reports plus biological plausibility against the weight of legal authority, it was against the weight of scientific opinion, as expressed by the HSP authors themselves:

“Although the case reports called attention to a possible association between the use of phenylpropanolamine and the risk of hemorrhagic stroke, the absence of control subjects meant that these studies could not produce evidence that meets the usual criteria for valid scientific inference”[54]

Since no epidemiology was necessary at all for ischemic stroke and myocardial infarction claims, then a deeply flawed epidemiologic study was thus even better than nothing. And peer review and prestige were merely window dressing.

The HSP study was subjected to much greater analysis in actual trial litigation.  Before the MDL court concluded its abridged gatekeeping, the defense successfully sought the underlying data to the HSP. Plaintiffs’ counsel and the Yale investigators resisted and filed motions to quash the defense subpoenas. The MDL court denied the motions and required the parties to collaborate on redaction of medical records to be produced.[55]

In a law review article published a few years after the PPA Rule 702 decision, Judge Rothstein immodestly described the PPA MDL as a “model mass tort,” and without irony characterized herself as having taken “an aggressive role in determining the admissibility of scientific evidence [].”[56]

The MDL court’s PPA decision stands as a landmark of judicial incuriousness and credulity.  The court conducted hearings and entertaining extensive briefings on the reliability of plaintiffs’ expert witnesses’ opinions, which were based largely upon one epidemiologic study, known as the “Yale Hemorrhagic Stroke Project (HSP).”  In the end, publication in a prestigious peer-reviewed journal proved to be a proxy for independent review and an excuse not to exercise critical judgment: “The prestigious NEJM published the HSP results, further substantiating that the research bears the indicia of good science.” Id. at 1239 (citing Daubert II for the proposition that peer review shows the research meets the minimal criteria for good science). The admissibility challenges were refused.

Exuberant Praise for Judge Rothstein

In 2009, an American Law Institute – American Bar Association continuing legal education seminar on expert witnesses and environmental litigation, Anthony Roisman presented on “Daubert & Its Progeny – Finding & Selecting Experts – Direct & Cross-Examination.” Roisman has been active in various plaintiff advocacy organizations, including serving as the head of the American Trial Lawyers’ Association Section on Toxic, Environmental & Pharmaceutical Torts (STEP). In his 2009 lecture, Roisman praised Rothstein’s PPA Rule 702 decision as “the way Daubert should be interpreted.” More concerning was Roisman’s revelation that Judge Rothstein wrote the PPA decision, “fresh from a seminar conducted by the Tellus Institute, which is an organization set up of scientists to try to bring some common sense to the courts’ interpretation of science, which is what is going on in a Daubert case.”[57]

Roisman’s endorsement of the PPA decision may have been purely result-oriented jurisprudence, but what of his enthusiasm for the “learning” that Judge Rothstein received fresh from the Tellus Institute.  What exactly is or was the Tellus Institute?

In June 2003, the same month as Judge Rothstein’s PPA decision, the Tellus Institute supported a group known as Scientific Knowledge and Public Policy (SKAPP), in publishing an attack on the Daubert decision. The Tellus-SKAPP paper, “Daubert: The Most Influential Supreme Court Ruling You’ve Never Heard Of,” appeared online in 2003.[58]

David Michaels, a plaintiffs’ expert in chemical exposure cases, and a founder of SKAPP, has typically described his organization as having been funded by the Common Benefit Trust, “a fund established pursuant to a court order in the Silicone Gel Breast Implant Liability litigation.”[59] What Michaels hides is that this “Trust” is nothing other than the common benefits fund set up in MDL 926, as it is for most MDLs, to permit plaintiffs’ counsel to retain and present expert witnesses in the common proceedings. In other words, it was the plaintiffs’ lawyers’ walking-around money. SKAPP’s sister organization, the Tellus Institute is clearly aligned with SKAPP. Alas, Richard Clapp, who was a testifying expert witness for PPA plaintiffs, was an active member of the Tellus Institute, at the time of the judicial educational seminar for Judge Rothstein.[60] Clapp is listed as a member of the planning committee responsible for preparing the anti-Daubert pamphlet. In 2005, as director of the Federal Judicial Center, Judge Rothstein attended another conference, “the Coronado Conference, which was sponsored by SKAPP.[61]

Roisman’s revelation in 2009, after the dust had settled on the PPA litigation, may well put Judge Rothstein in the same category as Judge James Kelly, against whom the U.S. Court of Appeals for the Third Circuit issued a writ of mandamus for recusal. Judge Kelly was invited to attend a conference on asbestos medical issues, set up by Dr. Irving Selikoff with scientists who testified for plaintiffs’ counsel. The conference was funded by plaintiffs’ counsel. The co-conspirators, Selikoff and plaintiffs’ counsel, paid for Judge Kelly’s transportation and lodgings, without revealing the source of the funding.[62]

In the case of Selikoff and Motley’s effort to subvert the neutrality of Judge James M. Kelly in the school district asbestos litigation, and pervert the course of justice, the conspiracy was detected in time for a successful recusal effort. In the PPA litigation, there was no disclosure of the efforts by the anti-Daubert advocacy group, the Tellus Institute, to undermine the neutrality of a federal judge. 

Aftermath of Failed MDL Gatekeeping

Ultimately, the HSP study received much more careful analysis before juries. Although the cases that went to trial involved plaintiffs with catastrophic injuries, and a high-profile article in the New England Journal of Medicine, the jury verdicts were overwhelmingly in favor of the defense.[63]

In the first case that went to trial (but second to verdict), the defense presented a thorough scientific critique of the HSP. The underlying data and medical records that had been produced in response to a Rule 45 subpoena in the MDL allowed juries to see that the study investigators had deviated from the protocol in ways to increase the number of exposed cases, with the obvious result of increasing the odds ratios reported. Juries were ultimately much more curious about evidence and testimony on reclassifications of exposure that drove up the odds ratios for PPA use, than they were about the performance of linear logistic regressions.

The HSP investigators were well aware of the potential for medication use to occur after the onset of stroke symptoms (headache), which may have sent a person to the medicine chest for an OTC cold remedy. Case 71-0039 was just such a case, as shown by the medical records and the HSP investigators’ initial classification of the case. On dubious grounds, however, the study reclassified the time of stroke onset to after the PPA-medication use, in what the investigators knew increased their chances of finding an association.

The reclassification of Case 20-0092 was even more egregious. The patient was originally diagnosed as having experienced a transient ischemic attack (TIA), after a CT of the head showed no bleed. Case 20-0092 was not a case. For the TIA, the patient was given heparin, an appropriate therapy but one that is known to cause bleeding. The following day, MRI of the head revealed a HS. The HSP classified Case 20-0092 as a case.

In Case 18-0025, the patient experienced a headache in the morning, and took a PPA-medication (Contac) for relief. The stroke was already underway when the Contac was taken, but the HSP reversed the order of events.

Case 62-0094 presented an interesting medical history that included an event no one in the HSP considered including in the interview protocol. In addition to a history of heavy smoking, alcohol, cocaine, heroin, and marijuana use, and a history of seizure disorder, Case 62-0094 suffered a traumatic head injury immediately before developing a SAH. Treating physicians ascribed the SAH to traumatic injury, but understandably there were no controls that were identified with similar head injury within the exposure period.

Both sides of the PPA litigation accused the other of “hacking at the A cell,” but juries seemed to understand that the hacking had started before the paper was published.

In a case involving two plaintiffs, in Los Angeles, where the jury heard the details of how the HSP cases were analyzed, the jury returned two defense verdicts. In post-trial motions, plaintiffs’ counsel challenged the defendant’s reliance upon underlying data in the HSP, which went behind the peer-reviewed publication, and which showed that the peer review failed to prevent serious errors.  In essence, the plaintiffs’ counsel claimed that the defense’s scrutiny of the underlying data and investigator misclassifications were themselves not “generally accepted” methods, and thus inadmissible. The trial court rejected the plaintiffs’ claim and their request for a new trial, and spoke to the significance of challenging the superficial significance of peer review of the key study relied upon by plaintiffs in the PPA litigation:

“I mean, you could almost say that there was some unethical activity with that Yale Study.  It’s real close.  I mean, I — I am very, very concerned at the integrity of those researchers.

********

Yale gets — Yale gets a big black eye on this.”[64]

Epidemiologist Charles Hennekens, who had been a consultant to PPA-medication manufacturers, published a critique of the HSP study, in 2006. The Hennekens critique included many of the criticisms lodged by himself, as well as by epidemiologists Lewis Kuller, Noel Weiss, and Brian Strom, back in an October 2000 FDA meeting, before the HSP was published. Richard Clapp, Tellus Institute activist and expert witness for PPA plaintiffs, and Michael Williams, lawyer for PPA claimants, wrote a letter criticizing Hennekens.[65] David Michaels, an expert witness for plaintiffs in other chemical exposure cases, and a founder of SKAPP, which collaborated with the Tellus Institute on its anti-Daubert compaign, wrote a letter accusing Hennekens of “mercenary epidemiology,” for engaging in re-analysis of a published study. Michaels never complained about the litigation-inspired re-analyses put forward by plaintiffs’ witnesses in the Bendectin litigation.  Plaintiffs’ lawyers and their expert witnesses had much to gain by starting the litigation and trying to expand its reach. Defense lawyers and their expert witnesses effectively put themselves out of business by shutting it down.[66]


[1] Rachel Gorodetsky, “Phenylpropanolamine,” in Philip Wexler, ed., 7 Encyclopedia of Toxicology 559 (4th ed. 2024).

[2] Hershel Jick, Pamela Aselton, and Judith R. Hunter,  “Phenylpropanolamine and Cerebral Hemorrhage,” 323 Lancet 1017 (1984).

[3] Robert R. O’Neill & Stephen W. Van de Carr, “A Case-Control Study of Adrenergic  Decongestants and Hemorrhagic CVA Using a Medicaid Data Base” m.s. (1985).

[4] Ramond Lipicky, Center for Drug Evaluation and Research, PPA, Safety Summary at 29 (Aug. 9, 1900).

[5] Center for Drug Evaluation and Research, US Food and Drug Administration, “Epidemiologic Review of Phenylpropanolamine Safety Issues” (April 30, 1991).

[6] Ralph I. Horwitz, Lawrence M. Brass, Walter N. Kernan, Catherine M. Viscoli, “Phenylpropanolamine & Risk of Hemorrhagic Stroke – Final Report of the Hemorrhagic Stroke Project (May 10, 2000).

[7] Id. at 3, 26.

[8] Lois La Grenade & Parivash Nourjah, “Review of study protocol, final study report and raw data regarding the incidence of hemorrhagic stroke associated with the use of phenylopropanolamine,” Division of Drug Risk Assessment, Office of Post-Marketing Drug Risk Assessment (0PDRA) (Sept. 27, 2000). These authors concluded that the HSP report provided “compelling evidence of increased risk of hemorrhagic stroke in young people who use PPA-containing appetite suppressants. This finding, taken in association with evidence provided by spontaneous reports and case reports published in the

medical literature leads us to recommend that these products should no longer be available for over the counter use.”

[9] Among those who voiced criticisms of the design, methods, and interpretation of the HSP study were Noel Weiss, Lewis Kuller, Brian Strom, and Janet Daling. Many of the criticisms would prove to be understated in the light of post-publication review.

[10] Walter N. Kernan, Catherine M. Viscoli, Lawrence M. Brass, J.P. Broderick, T. Brott, and Edward Feldmann, “Phenylpropanolamine and the risk of hemorrhagic stroke,” 343 New Engl. J. Med. 1826 (2000) [cited as Kernan]

[11] Kernan, supra note 10, at 1826 (emphasis added).

[12] David Harrington, Ralph B. D’Agostino, Sr., Constantine Gatsonis, Joseph W. Hogan, David J. Hunter, Sharon-Lise T. Normand, Jeffrey M. Drazen, and Mary Beth Hamel, “New Guidelines for Statistical Reporting in the Journal,” 381 New Engl. J. Med. 285 (2019).

[13] Kernan, supra note 10, at 1827.

[14] Transcript of Meeting on Safety Issues of Phenylpropanolamine (PPA) in Over-the-Counter Drug Products 117 (Oct. 19, 2000).

[15][15] See, e.g., Huw Talfryn Oakley Davies, Iain Kinloch Crombie, and Manouche Tavakoli, “When can odds ratios mislead?” 316 Brit. Med. J. 989 (1998); Thomas F. Monaghan, Rahman, Christina W. Agudelo, Alan J. Wein, Jason M. Lazar, Karel Everaert, and Roger R. Dmochowski, “Foundational Statistical Principles in Medical Research: A Tutorial on Odds Ratios, Relative Risk, Absolute Risk, and Number Needed to Treat,” 18 Internat’l J. Envt’l Research & Public Health 5669 (2021).

[16] Kernan, supra note 10, at 1829, Table 2.

[17] Kernan, supra note 10, at 1827.

[18] Peter C. Austin & Lawrence J. Brunner, “Inflation of the type I error rate when a continuous confounding variable is categorized in logistic regression analyses,” 23 Statist. Med. 1159 (2004).

[19] See, e.g., Douglas G. Altman & Patrick Royston, “The cost of dichotomising continuous variables,” 332 Brit. Med. J. 1080 (2006); Patrick Royston, Douglas G. Altman, and Willi Sauerbrei, “Dichotomizing continuous predictors in multiple regression: a bad idea,” 25 Stat. Med. 127 (2006). See also Robert C. MacCallum, Shaobo Zhang, Kristopher J. Preacher, and Derek D. Rucker, “On the Practice of Dichotomization of Quantitative Variables,” 7 Psychological Methods 19 (2002); David L. Streiner, “Breaking Up is Hard to Do: The Heartbreak of Dichotomizing Continuous Data,” 47 Can. J. Psychiatry 262 (2002); Henian Chen, Patricia Cohen, and Sophie Chen, “Biased odds ratios from dichotomization of age,” 26 Statist. Med. 3487 (2007); Carl van Walraven & Robert G. Hart, “Leave ‘em Alone – Why Continuous Variables Should Be Analyzed as Such,” 30 Neuroepidemiology 138 (2008); O. Naggara, J. Raymond, F. Guilbert, D. Roy, A. Weill, and Douglas G. Altman, “Analysis by Categorizing or Dichotomizing Continuous Variables Is Inadvisable,” 32 Am. J. Neuroradiol. 437 (Mar 2011); Neal V. Dawson & Robert Weiss, “Dichotomizing Continuous Variables in Statistical Analysis: A Practice to Avoid,” Med. Decision Making 225 (2012); Phillippa M Cumberland, Gabriela Czanner, Catey Bunce, Caroline J Doré, Nick Freemantle, and Marta García-Fiñana, “Ophthalmic statistics note: the perils of dichotomising continuous variables,” 98 Brit. J. Ophthalmol. 841 (2014).

[20] Valerii Fedorov, Frank Mannino1, and Rongmei Zhang, “Consequences of dichotomization,” 8 Pharmaceut. Statist. 50 (2009).

[21] Peter Peduzzi, John Concato, Elizabeth Kemper, Theodore R. Holford, and Alvan R. Feinstein, “A simulation study of the number of events per variable in logistic regression analysis?” 49 J. Clin. Epidem. 1373 (1996).

[22] HSP Final Report at 5.

[23] HSP Final Report at 26.

[24] Byron G. Stier & Charles H. Hennekens, “Phenylpropanolamine and Hemorrhagic Stroke in the Hemorrhagic Stroke Project: A Reappraisal in the Context of Science, the Food and Drug Administration, and the Law,” 16 Ann. Epidem. 49, 50 (2006) [cited as Stier & Hennekens].

[25] Victor M. Montori, Roman Jaeschke, Holger J. Schünemann, Mohit Bhandari, Jan L Brozek, P. J. Devereaux, and Gordon H. Guyatt, “Users’ guide to detecting misleading claims in clinical research reports,” 329 Brit. Med. J. 1093 (2004). 

[26] Wolfgang Ahrens & Iris Pigeot, eds., Handbook of Epidemiology 1840 (2d ed. 2014) (47.5.8 Use of Composite Endpoints); Stuart J. Pocock, John J. V. McMurray, and Tim J. Collier, “Statistical Controversies in Reporting of Clinical Trials: Part 2 of a 4-Part Series on Statistics for Clinical Trials,” 66 J. Am. Coll. Cardiol. 2648, 2650-51 (2015) (“Interpret composite endpoints carefully.”); Schulz & Grimes, “Multiplicity in randomized trials I:  endpoints and treatments,” 365 Lancet 1591, 1595 (2005).

[27] Eric Lim, Adam Brown, Adel Helmy, Shafi Mussa & Douglas Altman, “Composite Outcomes in Cardiovascular Research: A Survey of Randomized Trials,” 149 Ann. Intern. Med. 612 (2008).

[28] See, e.g., Thomas Brott email to Walter Kernan (Sept. 10, 2000).

[29] Joseph P. Broderick, Catherine M. Viscoli, Thomas Brott, Walter N. Kernan, Lawrence M. Brass, Edward Feldmann, Lewis B. Morgenstern, Janet Lee Wilterdink, and Ralph I. Horwitz, “Major Risk Factors for Aneurysmal Subarachnoid Hemorrhage in the Young Are Modifiable,” 34 Stroke 1375 (2003).

[30] Id. at 1379.

[31] Id. at 1243.

[32] Id. at 1243.

[33] Id., citing Rothman Affidavit, ¶ 7; Kenneth J. Rothman, Epidemiology:  An Introduction at 117 (2002).

[34] HSP Final Report at 26 (‘‘HSP interviewers were not blinded to the case-control status of study subjects and some were aware of the study purpose’.”); Walter Kernan Dep. at 473-74, In re PPA Prods. Liab. Litig., MDL 1407 (W.D. Wash.) (Sept. 19, 2002).

[35] HSP Final Report at 26.

[36] Stier & Hennekens, note 24 supra, at 51.

[37] NEJM at 1831.

[38] See Christopher T. Robertson & Aaron S. Kesselheim, Blinding as a Solution to Bias – Strengthening Biomedical Science, Forensic Science, and the Law 53 (2016); Sandy Zabell, “The Virtues of Being Blind,” 29 Chance 32 (2016).

[39] Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 295 (1965).

[40] See Barbara J. Rothstein, Francis E. McGovern, and Sarah Jael Dion, “A Model Mass Tort: The PPA Experience,” 54 Drake L. Rev. 621 (2006); Linda A. Ash, Mary Ross Terry, and Daniel E. Clark, Matthew Bender Drug Product Liability § 15.86 PPA (2003).

[41] In re Phenylpropanolamine Prods. Liab. Litig., 289 F.Supp. 2d 1230 (W.D. Wash. 2003).

[42] Id. at 1236 n.1

[43] Michael O. Finkelstein & Bruce Levin, Statistics for Lawyers 171, 173-74 (3rd ed. 2015). See also Sander Greenland, Stephen J. Senn, Kenneth J. Rothman, John B. Carlin, Charles Poole, Steven N. Goodman, and Douglas G. Altman, “Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations,” 31 Eur. J. Epidem. 337 (2016).

[44] In re Phenylpropanolamine Prods. Liab. Litig., 289 F.Supp. 2d 1230, 1241 (W.D. Wash. 2003).

[45] Id. (citing Reference Manual at 126-27, 358 n. 69). The edition of Manual was not identified by the court.

[46] Id. at n.9, citing deposition of Ralph Horowitz [sic].

[47] Id., citing Good v. Fluor Daniel Corp., 222 F.Supp. 2d 1236, 1242-43 (E.D. Wash. 2002).

[48] Id. 1241, citing Kernan at 183.

[49] In re Phenylpropanolamine Prods. Liab. Litig., 289 F.Supp. 2d 1230, 1239 (W.D. Wash. 2003) (citing 2 Modern Scientific Evidence: The Law and Science of Expert Testimony § 28-1.1, at 302-03 (David L. Faigman,  et al., eds., 1997) (“Epidemiologic studies have been well received by courts trying mass tort suits. Well-conducted studies are uniformly admitted. The widespread acceptance of epidemiology is based in large part on the belief that the general techniques are valid.”).

[50] Id. at 1240. The court cited the Reference Manual on Scientific Evidence 337 (2d ed. 2000), for this universal attribution of flaws to epidemiology studies (“It is important to recognize that most studies have flaws. Some flaws are inevitable given the limits of technology and resources.”) Of course, when technology and resources are limited, expert witnesses are permitted to say “I cannot say.” The PPA MDL court also cited another MDL court, which declared that “there is no such thing as a perfect epidemiological study.” In re Orthopedic Bone Screw Prods. Liab. Litig., MDL No. 1014, 1997 WL 230818, at *8-9 (E.D.Pa. May 5, 1997).

[51] Id. at 1236.

[52] Id. at 1239.

[53] Susan Haack, “Irreconcilable Differences?  The Troubled Marriage of Science and Law,” 72 Law & Contemp. Problems 1, 19 (2009) (internal citations omitted). It may be telling that Haack has come to publish much of her analysis in law reviews. See Nathan Schachtman, “Misplaced Reliance On Peer Review to Separate Valid Science From NonsenseTortini (Aug. 14, 2011).

[54] Kernan, supra note 10, at 1831.

[55] In re Propanolamine Prods. Litig., MDL 1407, Order re Motion to Quash Subpoenas re Yale Study’s Hospital Records (W.D. Wash. Aug. 16, 2002). Two of the HSP investigators wrote an article, over a decade later, to complain about litigation efforts to obtain data from ongoing studies. They did not mention the PPA case. Walter N. Kernan, Catherine M. Viscoli, and Mathew C. Varughese, “Litigation Seeking Access to Data From Ongoing Clinical Trials: A Threat to Clinical Research,” 174 J. Am. Med. Ass’n Intern. Med. 1502 (2014).

[56] Barbara J. Rothstein, Francis E. McGovern, and Sarah Jael Dion, “A Model Mass Tort: The PPA Experience,” 54 Drake L. Rev. 621, 638 (2006).

[57] Anthony Roisman, “Daubert & Its Progeny – Finding & Selecting Experts – Direct & Cross-Examination,” ALI-ABA 2009. Roisman’s remarks about the role of Tellus Institute start just after minute 8, on the recording, available from the American Law Institute, and the author.

[58] See Daubert: The Most Influential Supreme Court Ruling You’ve Never Heard Of; A Publication of the Project on Scientific Knowledge and Public Policy, coordinated by the Tellus Institute” (2003).

[59] See, e.g., David Michaels, Doubt is Their Product: How Industry’s War on Science Threatens Your Health 267 (2008).

[60] See Richard W. Clapp & David Ozonoff, “Environment and Health: Vital Intersection or Contested Territory?” 30 Am. J. L. & Med. 189, 189 (2004) (“This Article also benefited from discussions with colleagues in the project on Scientific Knowledge and Public Policy at Tellus Institute, in Boston, Massachusetts.”).

[61] See Barbara Rothstein, “Bringing Science to Law,” 95 Am. J. Pub. Health S1 (2005) (“The Coronado Conference brought scientists and judges together to consider these and other tensions that arise when science is introduced in courts.”).

[62] In re School Asbestos Litigation, 977 F.2d 764 (3d Cir. 1992). See Cathleen M. Devlin, “Disqualification of Federal Judges – Third Circuit Orders District Judge James McGirr Kelly to Disqualify Himself So As To Preserve ‘The Appearance of Justice’ Under 28 U.S.C. § 455 – In re School Asbestos Litigation (1992),” 38 Villanova L. Rev. 1219 (1993); Bruce A. Green, “May Judges Attend Privately Funded Educational Programs? Should Judicial Education Be Privatized?: Questions of Judicial Ethics and Policy,” 29 Fordham Urb. L.J. 941, 996-98 (2002).

[63] Alison Frankel, “A Line in the Sand,” The Am. Lawyer – Litigation (2005); Alison Frankel, “The Mass Tort Bonanza That Wasn’t,” The Am. Lawyer (Jan. 6, 2006).

[64] O’Neill v. Novartis AG, California Superior Court, Los Angeles Cty., Transcript of Oral Argument on Post-Trial Motions, at 46 -47 (March 18, 2004) (Hon. Anthony J. Mohr), aff’d sub nom. O’Neill v. Novartis Consumer Health, Inc.,147 Cal. App. 4th 1388, 55 Cal. Rptr. 3d 551, 558-61 (2007).

[65] Richard Clapp & Michael L. Williams, Regarding ‘‘Phenylpropanolamine and Hemorrhagic Stroke in the Hemorrhagic Stroke Project,’’ 16 Ann. Epidem. 580 (2006).

[66] David Michaels, “Regarding ‘Phenylpropanolamine and Hemorrhagic Stroke in the Hemorrhagic Stroke Project’: Mercenary Epidemiology – Data Reanalysis and Reinterpretation for Sponsors with Financial Interest in the Outcome

16 Ann. Epidem. 583 (2006). Hennekens responded to these letters. Stier & Hennekens, note 24, supra.

The Maestro and Mesothelioma – Wikipedia & False Claims

January 21st, 2024

The Maestro is a biographical film of the late Leonard Bernstein. The film, starring Bradley Cooper as Bernstein, had a limited release before streaming on Netflix. As a work of biography, the film is peculiar in its focus on Bernstein’s sexuality and filandering, while paying virtually no attention to his radical chic politics, or his engagement with teaching music appreciation.

In any event, the film sent me to Wikipedia to fact check some details of Bernstein’s life, and I was surprised to see that Wikipedia described Bernstein’s cause of death as involving mesothelioma:

“Bernstein announced his retirement from conducting on October 9, 1990.[174] He died five days later at the age of 72, in his New York apartment at The Dakota, of a heart attack brought on by mesothelioma.[175][2]”

Bernstein certainly did not have occupational exposure to amphibole asbestos, but he did smoke cigarettes, several packs a day, for decades. Mesothelioma seemed unlikely, unless perhaps he smoked Kent cigarettes in the 1950s, when they had crocidolite filters. As you can see from the above quote, the Wikipedia article cites two sources, a newspaper account and a book. Footnote number 2 is an obituary was written by Donal Henahan, and printed in the New York Times.[1] The Times reported that:

“Leonard Bernstein, one of the most prodigally talented and successful musicians in American history, died yesterday evening at his apartment at the Dakota on the Upper West Side of Manhattan. He was 72 years old.

*   *   *   *   *   *   *

Mr. Bernstein’s spokeswoman, Margaret Carson, said he died of a heart attack caused by progressive lung failure.”

There is no mention of mesothelioma in the Times article, and the citation provided does not support the assertion that mesothelioma was involved in the cause of Bernstein’s death. The obituary cited was published the day following Bernstein’s death the night before, which suggests that there was no information from an autopsy, which would have been important in ascertaining any tissue pathology for an accurate and complete cause of death. In 1990, the diagnosis of malignant mesothelioma was often uncertain, even with extensive tissue available post-mortem.

The other citation provided by the Wikipedia article was even less impressive. Footnote 175 pointed to a book of short articles on musicians, with an entry for Bernstein.[2] The book tells us that

“Bernstein is most remembered, perhaps, for his flamboyant conducting style. *** Leonard Bernstein died at his home from cardiac arrest brought on by mesothelioma.”

The blurb on Bernstein provides no support for the statement that cardiac arrest was brought on by mesothelioma, and the narrative struck me as odd in leaving out the progressive lung failure caused by non-malignant smoking-induced lung disease.

I set out to find what else may have been written about the causes of Bernstein’s death. I was surprised to find other references to mesothelioma, but all without any support. One online article seemed promising, but offered a glib conclusion without any source:

“Leonard Bernstein, a towering figure in American music, met his end on October 14, 1990, just five days after retiring from his illustrious career as a conductor. Found in his New York apartment, the cause of his death was a heart attack induced by mesothelioma, a consequence of a lifetime of smoking.”[3]

The lack of any foot- or end-notes disqualifies this source, and others, for establishing a diagnosis of mesothelioma. Other internet articles, inspired by the Cooper production of Maestro, made very similar statements, all without citing any source.[4] Some of the internet articles likely plagiarized others, but I was unable to find who first gave rise to the conclusion that Bernstein died of complications of “mesothelioma” caused by smoking.

Whence came the Wikipedia’s pronouncement that Bernstein died of, or with, mesothelioma? Two “mainstream” print newspapers provided some real information and insight. An article in the Washington Post elaborated on Bernstein’s final illness and the cause of his death:

“Leonard Bernstein, 72, a giant in the American musical community who was simultaneously one of this nation’s most respected and versatile composers and preeminent conductors, died yesterday at his Manhattan apartment. He died in the presence of his physician, who said the cause of death was sudden cardiac arrest caused by progressive lung failure.

On the advice of the doctor, Kevin M. Cahill, Bernstein had announced through a spokeswoman Tuesday that he would retire from conducting. Cahill said progressive emphysema complicated by a pleural tumor and a series of lung infections had left Bernstein too weak to continue working.”[5]

Ah a pleural tumor, but no report or representation that it was malignant mesothelioma.

The Los Angeles Times, with the benefit of an extra three hours to prepare its obituary for a west coast audience, provided similar, detailed information about Bernstein’s death:

“Bernstein, known and beloved by the world as ‘Lenny’, died at 6:15 p.m. in the presence of his son, Alexander, and physician, Kevin M. Cahill, who said the cause of death was complications of progressive lung failure. On Cahill’s advice, the conductor had announced Tuesday that he would retire. Cahill said progressive emphysema complicated by a pleural tumor and a series of lung infections had left Bernstein too weak to continue working.”[6]

Now a pleural tumor can be benign or malignant. And if the tumor were malignant, it may or may not be a primary tumor of the pleura. Metastatic lesions of the pleura, or in the lung parenchyma adjacent to the pleura are common enough that the physician’s statement about tumor of the pleura cannot be transformed into a conclusion about mesothelioma.[7]

Feeling good about having sorted a confusion, I thought I could add to the font of all knowledge, Wikipedia, by editing its unsupported statement about mesothelioma to “pleural tumor.” I made the edit, but within a few days, someone had changed the text back to mesothelioma, without adding any support. The strength of any statement is, of course, based entirely upon its support and the strength of its inferences. Wikipedia certainly can be a reasonable starting place to look for information, but it has no ability to support a claim, whether historical, scientific, or medical. Perhaps I should have added the citation to the Washington Post obituary when I made my edit. Still, it was clear that nothing in article’s footnotes supported the text, and someone felt justified in returning the mention of mesothelioma based upon two completely unsupportive sources. Not only is the Bernstein article in Wikipedia suspect, but there is actually an entry in Wikipedia for “Deaths from Mesothelioma,” which lists Bernstein as well. The article has but one sentence: “This is a list of individuals who have died as a result of mesothelioma, which is usually caused by exposure to asbestos.” And then follows a list of 67 persons, of varying degree of noteworthiness, who supposedly died of mesothelioma. I wonder how many of the entries are false.


[1] Donal  Henahan, “Leonard Bernstein, 72, Music’s Monarch, Dies,” New York Times (October 15, 1990).

[2] Scott Stanton, The Tombstone Tourist: Musicians at 29 (2003).

[3] Soumyadeep Ganguly, “Leonard Bernstein’s cause of death explored: How does Bradley Cooper Maestro end? Movie ending explored,” SK POP (modified Dec 25, 2023).

[4] See, e.g., Gargi Chatterjee, “How did Leonard Bernstein die?” pinkvilla (Dec 23, 2023).

[5] Bart Barnes, “Conductor Leonard Bernstein Dies at 72,” Wash. Post (Oct. 15, 1990) (emphasis added).

[6] Myrna Oliver, “Leonard Bernstein Dies; Conductor, Composer. Renaissance man of his art was 72. The longtime leader of the N.Y. Philharmonic carved a niche in history with ‘West Side Story’,” Los Angeles Times (Oct. 15, 1990) (emphasis added).

[7] See, e.g., Julie Desimpel, Filip M. Vanhoenacker, Laurens Carp, and Annemiek Snoeckx, “Tumor and tumorlike conditions of the pleura and juxtapleural region: review of imaging findings,” 12 Insights Imaging 97 (2021).

The Role of Peer Review in Rule 702 and 703 Gatekeeping

November 19th, 2023

“There is no expedient to which man will not resort to avoid the real labor of thinking.”
              Sir Joshua Reynolds (1723-92)

Some courts appear to duck the real labor of thinking, and the duty to gatekeep expert witness opinions,  by deferring to expert witnesses who advert to their reliance upon peer-reviewed published studies. Does the law really support such deference, especially when problems with the relied-upon studies are revealed in discovery? A careful reading of the Supreme Court’s decision in Daubert, and of the Reference Manual on Scientific Evidence provides no support for admitting expert witness opinion testimony that relies upon peer-reviewed published studies, when the studies are invalid or are based upon questionable research practices.[1]

In Daubert v. Merrell Dow Pharmaceuticals, Inc.,[2] The Supreme Court suggested that peer review of studies relied upon by a challenged expert witness should be a factor in determining the admissibility of that expert witness’s opinion. In thinking about the role of peer-review publication in expert witness gatekeeping, it is helpful to remember the context of how and why the Supreme was talking about peer review in the first place. In the trial court, the Daubert plaintiff had proffered an expert witness opinion that featured reliance upon an unpublished reanalysis of published studies. On the defense motion, the trial court excluded the claimant’s witness,[3] and the Ninth Circuit affirmed.[4] The intermediate appellate court expressed its view that unpublished, non-peer-reviewed reanalyses were deviations from generally accepted scientific discourse, and that other appellate courts, considering the alleged risks of Bendectin, refused to admit opinions based upon unpublished, non-peer-reviewed reanalyses of epidemiologic studies.[5] The Circuit expressed its view that reanalyses are generally accepted by scientists when they have been verified and scrutinized by others in the field. Unpublished reanalyses done for solely for litigation would be an insufficient foundation for expert witness opinion.[6]

The Supreme Court, in Daubert, evaded the difficult issues involved in evaluating a statistical analysis that has not been published by deciding the case on the ground that the lower courts had applied the wrong standard.  The so-called Frye test, or what I call the “twilight zone” test comes from the heralded 1923 case excluding opinion testimony based upon a lie detector:

“Just when a scientific principle or discovery crosses the line between the experimental and demonstrable stages is difficult to define. Somewhere in this twilight zone the evidential force of the principle must be recognized, and while the courts will go a long way in admitting expert testimony deduced from a well recognized scientific principle or discovery, the thing from which the deduction is made must be sufficiently established to have gained general acceptance in the particular field in which it belongs.”[7]

The Supreme Court, in Daubert, held that with the promulgation of the Federal Rules of Evidence in 1975, the twilight zone test was no longer legally valid. The guidance for admitting expert witness opinion testimony lay in Federal Rule of Evidence 702, which outlined an epistemic test for “knowledge,” which would be helpful to the trier of fact. The Court then proceeded to articulate several  non-definitive factors for “good science,” which might guide trial courts in applying Rule 702, such as testability or falsifiability, a showing of known or potential error rate. Another consideration, general acceptance carried over from Frye as a consideration.[8] Courts have continued to build on this foundation to identify other relevant considerations in gatekeeping.[9]

One of the Daubert Court’s pertinent considerations was “whether the theory or technique has been subjected to peer review and publication.”[10] The Court, speaking through Justice Blackmun, provided a reasonably cogent, but probably now out-dated discussion of peer review:

 “Publication (which is but one element of peer review) is not a sine qua non of admissibility; it does not necessarily correlate with reliability, see S. Jasanoff, The Fifth Branch: Science Advisors as Policymakers 61-76 (1990), and in some instances well-grounded but innovative theories will not have been published, see Horrobin, “The Philosophical Basis of Peer Review and the Suppression of Innovation,” 263 JAMA 1438 (1990). Some propositions, moreover, are too particular, too new, or of too limited interest to be published. But submission to the scrutiny of the scientific community is a component of “good science,” in part because it increases the likelihood that substantive flaws in methodology will be detected. See J. Ziman, Reliable Knowledge: An Exploration of the Grounds for Belief in Science 130-133 (1978); Relman & Angell, “How Good Is Peer Review?,” 321 New Eng. J. Med. 827 (1989). The fact of publication (or lack thereof) in a peer reviewed journal thus will be a relevant, though not dispositive, consideration in assessing the scientific validity of a particular technique or methodology on which an opinion is premised.”[11]

To the extent that peer review was touted by Justice Blackmun, it was because the peer-review process advanced the ultimate consideration of the scientific validity of the opinion or claim under consideration. Validity was the thing; peer review was just a crude proxy.

If the Court were writing today, it might well have written that peer review is often a feature of bad science, advanced by scientists who know that peer-reviewed publication is the price of admission to the advocacy arena. And of course, the wild proliferation of journals, including the “pay-to-play” journals, facilitates the festschrift.

Reference Manual on Scientific Evidence

Certainly, judicial thinking evolved since 1993, and the decision in Daubert. Other considerations for gatekeeping have been added. Importantly, Daubert involved the interpretation of a statute, and in 2000, the statute was amended.

Since the Daubert decision, the Federal Judicial Center and the National Academies of Science have weighed in with what is intended to be guidance for judges and lawyers litigating scientific and technical issue. The Reference Manual on Scientific Evidence is currently in a third edition, but a fourth edition is expected in 2024.

How does the third edition[12] treat peer review?

An introduction by now retired Associate Justice Stephen Breyer blandly reports the Daubert considerations, without elaboration.[13]

The most revealing and important chapter in the Reference Manual is the one on scientific method and procedure, and sociology of science, “How Science Works,” by Professor David Goodstein.[14] This chapter’s treatment is not always consistent. In places, the discussion of peer review is trenchant. At other places, it can be misleading. Goodstein’s treatment, at first, appears to be a glib endorsement of peer review as a substitute for critical thinking about a relied-upon published study:

“In the competition among ideas, the institution of peer review plays a central role. Scientifc articles submitted for publication and proposals for funding often are sent to anonymous experts in the field, in other words, to peers of the author, for review. Peer review works superbly to separate valid science from nonsense, or, in Kuhnian terms, to ensure that the current paradigm has been respected.11 It works less well as a means of choosing between competing valid ideas, in part because the peer doing the reviewing is often a competitor for the same resources (space in prestigious journals, funds from government agencies or private foundations) being sought by the authors. It works very poorly in catching cheating or fraud, because all scientists are socialized to believe that even their toughest competitor is rigorously honest in the reporting of scientific results, which makes it easy for a purposefully dishonest scientist to fool a referee. Despite all of this, peer review is one of the venerated pillars of the scientific edifice.”[15]

A more nuanced and critical view emerges in footnote 11, from the above-quoted passage, when Goodstein discusses how peer review was framed by some amici curiae in the Daubert case:

“The Supreme Court received differing views regarding the proper role of peer review. Compare Brief for Amici Curiae Daryl E. Chubin et al. at 10, Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579 (1993) (No. 92-102) (“peer review referees and editors limit their assessment of submitted articles to such matters as style, plausibility, and defensibility; they do not duplicate experiments from scratch or plow through reams of computer-generated data in order to guarantee accuracy or veracity or certainty”), with Brief for Amici Curiae New England Journal of Medicine, Journal of the American Medical Association, and Annals of Internal Medicine in Support of Respondent, Daubert v. Merrell Dow Pharm., Inc., 509 U.S. 579 (1993) (No. 92-102) (proposing that publication in a peer-reviewed journal be the primary criterion for admitting scientifc evidence in the courtroom). See generally Daryl E. Chubin & Edward J. Hackett, Peerless Science: Peer Review and U.S. Science Policy (1990); Arnold S. Relman & Marcia Angell, How Good Is Peer Review? 321 New Eng. J. Med. 827–29 (1989). As a practicing scientist and frequent peer reviewer, I can testify that Chubin’s view is correct.”[16]

So, if, as Professor Goodstein attests, Chubin is correct that peer review does not “guarantee accuracy or veracity or certainty,” the basis for veneration is difficult to fathom.

Later in Goodstein’s chapter, in a section entitled “V. Some Myths and Facts about Science,” the gloves come off:[17]

Myth: The institution of peer review assures that all published papers are sound and dependable.

Fact: Peer review generally will catch something that is completely out of step with majority thinking at the time, but it is practically useless for catching outright fraud, and it is not very good at dealing with truly novel ideas. Peer review mostly assures that all papers follow the current paradigm (see comments on Kuhn, above). It certainly does not ensure that the work has been fully vetted in terms of the data analysis and the proper application of research methods.”[18]

Goodstein is not a post-modern nihilist. He acknowledges that “real” science can be distinguished from “not real science.” He can hardly be seen to have given a full-throated endorsement to peer review as satisfying the gatekeeper’s obligation to evaluate whether a study can be reasonably relied upon, or whether reliance upon such a particular peer-reviewed study can constitute sufficient evidence to render an expert witness’s opinion helpful, or the application of a reliable methodology.

Goodstein cites, with apparent approval, the amicus brief filed by the New England Journal of Medicine, and other journals, which advised the Supreme Court that “good science,” requires a “a rigorous trilogy of publication, replication and verification before it is relied upon.” [19]

“Peer review’s ‘role is to promote the publication of well-conceived articles so that the most important review, the consideration of the reported results by the scientific community, may occur after publication.’”[20]

Outside of Professor Goodstein’s chapter, the Reference Manual devotes very little ink or analysis to the role of peer review in assessing Rule 702 or 703 challenges to witness opinions or specific studies.  The engineering chapter acknowledges that “[t]he topic of peer review is often raised concerning scientific and technical literature,” and helpfully supports Goodstein’s observations by noting that peer review “does not ensure accuracy or validity.”[21]

The chapter on neuroscience is one of the few chapters in the Reference Manual, other than Professor Goodstein’s, to address the limitations of peer review. Peer review, if absent, is highly suspicious, but its presence is only the beginning of an evaluation process that continues after publication:

Daubert’s stress on the presence of peer review and publication corresponds nicely to scientists’ perceptions. If something is not published in a peer-reviewed journal, it scarcely counts. Scientists only begin to have confidence in findings after peers, both those involved in the editorial process and, more important, those who read the publication, have had a chance to dissect them and to search intensively for errors either in theory or in practice. It is crucial, however, to recognize that publication and peer review are not in themselves enough. The publications need to be compared carefully to the evidence that is proffered.[22]

The neuroscience chapter goes on to discuss peer review also in the narrow context of functional magnetic resonance imaging (fMRI). The authors note that fMRI, as a medical procedure, has been the subject of thousands of peer-reviewed, but those peer reviews do little to validate the use of fMRI as a high-tech lie detector.[23] The mental health chapter notes in a brief footnote that the science of memory is now well accepted and has been subjected to peer review, and that “[c]areful evaluators” use only tests that have had their “reliability and validity confirmed in peer-reviewed publications.”[24]

Echoing other chapters, the engineering chapter also mentions peer review briefly in connection with qualifying as an expert witness, and in validating the value of accrediting societies.[25]  Finally, the chapter points out that engineering issues in litigation are often sufficiently novel that they have not been explored in peer-reviewed literature.[26]

Most of the other chapters of the Reference Manual, third edition, discuss peer review only in the context of qualifications and membership in professional societies.[27] The chapter on exposure science discusses peer review only in the narrow context of a claim that EPA guidance documents on exposure assessment are peer reviewed and are considered “authoritative.”[28]

Other chapters discuss peer review briefly and again only in very narrow contexts. For instance, the epidemiology chapter discusses peer review in connection with two very narrow issues peripheral to Rule 702 gatekeeping. First, the chapter raises the question (without providing a clear answer) whether non-peer-reviewed studies should be included in meta-analyses.[29] Second, the chapter asserts that “[c]ourts regularly affirm the legitimacy of employing differential diagnostic methodology,” to determine specific causation, on the basis of several factors, including the questionable claim that the methodology “has been subjected to peer review.”[30] There appears to be no discussion in this key chapter about whether, and to what extent, peer review of published studies can or should be considered in the gatekeeping of epidemiologic testimony. There is certainly nothing in the epidemiology chapter, or for that matter elsewhere in the Reference Manual, to suggest that reliance upon a peer-reviewed published study pretermits analysis of that study to determine whether it is indeed internally valid or reasonably relied upon by expert witnesses in the field.


[1] See Jop de Vrieze, “Large survey finds questionable research practices are common: Dutch study finds 8% of scientists have committed fraud,” 373 Science 265 (2021); Yu Xie, Kai Wang, and Yan Kong, “Prevalence of Research Misconduct and Questionable Research Practices: A Systematic Review and Meta-Analysis,” 27 Science & Engineering Ethics 41 (2021).

[2] 509 U.S. 579 (1993).

[3]  Daubert v. Merrell Dow Pharmaceuticals, Inc., 727 F.Supp. 570 (S.D.Cal.1989).

[4] 951 F. 2d 1128 (9th Cir. 1991).

[5]  951 F. 2d, at 1130-31.

[6] Id. at 1131.

[7] Frye v. United States, 293 F. 1013, 1014 (D.C. Cir. 1923) (emphasis added).

[8]  Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579, 590 (1993).

[9] See, e.g., In re TMI Litig. II, 911 F. Supp. 775, 787 (M.D. Pa. 1995) (considering the relationship of the technique to methods that have been established to be reliable, the uses of the method in the actual scientific world, the logical or internal consistency and coherence of the claim, the consistency of the claim or hypothesis with accepted theories, and the precision of the claimed hypothesis or theory).

[10] Id. at  593.

[11] Id. at 593-94.

[12] National Research Council, Reference Manual on Scientific Evidence (3rd ed. 2011) [RMSE]

[13] Id., “Introduction” at 1, 13

[14] David Goodstein, “How Science Works,” RMSE 37.

[15] Id. at 44-45.

[16] Id. at 44-45 n. 11 (emphasis added).

[17] Id. at 48 (emphasis added).

[18] Id. at 49 n.16 (emphasis added)

[19] David Goodstein, “How Science Works,” RMSE 64 n.45 (citing Brief for the New England Journal of Medicine, et al., as Amici Curiae supporting Respondent, 1993 WL 13006387 at *2, in Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579 (1993).

[20] Id. (citing Brief for the New England Journal of Medicine, et al., 1993 WL 13006387 *3)

[21] Channing R. Robertson, John E. Moalli, David L. Black, “Reference Guide on Engineering,” RMSE 897, 938 (emphasis added).

[22] Henry T. Greely & Anthony D. Wagner, “Reference Guide on Neuroscience,” RMSE 747, 786.

[23] Id. at 776, 777.

[24] Paul S. Appelbaum, “Reference Guide on Mental Health Evidence,” RMSE 813, 866, 886.

[25] Channing R. Robertson, John E. Moalli, David L. Black, “Reference Guide on Engineering,” RMSE 897, 901, 931.

[26] Id. at 935.

[27] Daniel Rubinfeld, “Reference Guide on Multiple Regression,” 303, 328 RMSE  (“[w]ho should be qualified as an expert?”); Shari Seidman Diamond, “Reference Guide on Survey Research,” RMSE 359, 375; Bernard D. Goldstein & Mary Sue Henifin, “Reference Guide on Toxicology,” RMSE 633, 677, 678 (noting that membership in some toxicology societies turns in part on having published in peer-reviewed journals).

[28] Joseph V. Rodricks, “Reference Guide on Exposure Science,” RMSE 503, 508 (noting that EPA guidance documents on exposure assessment often are issued after peer review).

[29] Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” RMSE 549, 608.

[30] Id. at 617-18 n.212.

Links, Ties, and Other Hook Ups in Risk Factor Epidemiology

July 5th, 2023

Many journalists struggle with reporting the results from risk factor epidemiology. Recently, JAMA Network Open published an epidemiologic study (“Williams study”) that explored whether exposure to Agent Orange amoby ng United States military veterans was associated with bladder cancer.[1] The reported study found little to no association, but lay and scientific journalists described the study as finding a “link,”[2] or a “tie,”[3] thus suggesting causality. One web-based media report stated, without qualification, that Agent Orange “increases bladder cancer risk.”[4]

 

Even the authors of the Williams study described the results inconsistently and hyperbolically. Within the four corners of the published article, the authors described their having found a “modestly increased risk of bladder cancer,” and then, on the same page, they reported that “the association was very slight (hazard ratio, 1.04; 95% C.I.,1.02-1.06).”

In one place, the Williams study states it looked at a cohort of 868,912 veterans with exposure to Agent Orange, and evaluated bladder cancer outcomes against outcomes in 2,427,677 matched controls. Elsewhere, they report different numbers, which are hard to reconcile. In any event, the authors had a very large sample size, which had the power to detect theoretically small differences as “statistically significant” (p < 0.05). Indeed, the study was so large that even a very slight disparity in rates between the exposed and unexposed cohort members could be “statistically significantly” different, notwithstanding that systematic error certainly played a much larger role in the results than random error. In terms of absolute numbers, the researchers found 50,781 bladder cancer diagnoses, on follow-up of 28,672,655 person-years. There were overall 2.1% bladder cancers among the exposed servicemen, and 2.0% among the unexposed. Calling this putative disparity a “modest association” is a gross overstatement, and it is difficult to square the authors’ pronouncement of a “modest association” with a “very slight increased risk.”

The authors also reported that there was no association between Agent Orange exposure and aggressiveness of bladder cancer, with bladder wall muscle invasion taken to be the marker of aggressiveness. Given that the authors were willing to proclaim a hazard ratio of 1.04 as an association, this report of no association with aggressiveness is manifestly false. The Williams study found a decreased odds of a diagnosis of muscle-invasive bladder cancer among the exposed cases, with an odds ratio of 0.91, 95% CI 0.85-0.98 (p = 0.009). The study thus did not find an absence of an association, but rather an inverse association.

Causality

Under the heading of “Meaning,” the authors wrote that “[t]hese findings suggest an association between exposure to Agent Orange and bladder cancer, although the clinical relevance of this was unclear.” Despite disclaiming a causal interpretation of their results, Williams and colleagues wrote that their results “support prior investigations and further support bladder cancer to be designated as an Agent Orange-associated disease.”

Williams and colleagues note that the Institute of Medicine had suggested that the association between Agent Orange exposure and bladder cancer outcomes required further research.[5] Requiring additional research was apparently sufficient for the Department of Veterans Affairs, in 2021, to assume facts not in evidence, and to designate “bladder cancer as a cancer caused by Agent Orange exposure.”[6]

Williams and colleagues themselves appear to disavow a causal interpretation of their results: “we cannot determine causality given the retrospective nature of our study design.” They also acknowledged their inability to “exclude potential selection bias and misclassification bias.” Although the authors did not explore the issue, exposed servicemen may well have been under greater scrutiny, creating surveillance and diagnostic biases.

The authors failed to grapple with other, perhaps more serious biases and inadequacy of methodology in their study. Although the authors claimed to have controlled for the most important confounders, they failed to include diabetes as a co-variate in their analysis, even though diabetic patients have a more than doubled increased risk for bladder cancer, even after adjustment for smoking.[7] Diabetic patients would also have been likely to have had more visits to VA centers for healthcare and more opportunity to have been diagnosed with bladder cancer.

Futhermore, with respect to the known confounding variable of smoking, the authors trichotomized smoking history as “never,” “former,” or “current” smoker. The authors were missing smoking information in about 13% of the cohort. In a univariate analysis based upon smoking status (Table 4), the authors reported the following hazard ratios for bladder cancer, by smoking status:

Smoking status at bladder cancer diagnosis

Never smoked      1   [Reference]

Current smoker   1.10 (1.00-1.21)

Former smoker    1.08 (1.00-1.18)

Unknown              1.17 (1.05-1.31)

This analysis for smoking risk points to the fragility of the Agent Orange analyses. First, the “unknown” smoking status is associated with roughly twice the risk for current or former smokers. Second, the risk ratios for muscle-invasive bladder cancer were understandably higher for current smokers (OR 1.10, 95% CI 1.00-1.21) and former smokers (OR 1.08, 95% CI 1.00-1.18) than for non-smoking veterans.

Third, the Williams’ study’s univariate analysis of smoking and bladder cancer generates risk ratios that are quite out of line with independent studies of smoking and bladder cancer risk. For instance, meta-analyses of studies of smoking and bladder cancer risk report risk ratios of 2.58 (95% C.I., 2.37–2.80) for any smoking, 3.47 (3.07–3.91) for current smoking, and 2.04 (1.85–2.25) for past smoking.[8] These smoking-related bladder cancer risks are thus order(s) of magnitude greater than the univariate analysis of smoking risk in the Williams study, as well as the multivariate analysis of Agent Orange risk reported.

Fourth, the authors engage in the common, but questionable practice of categorizing a known confounder, smoking, which should ideally be reported as a continuous variable with respect to quantity consumed, years smoked, and years since quitting.[9] The question here, given that the study is very large, is not the loss of power,[10] but bias away from the null. Peter Austin has shown, by Monte Carlo simulation, that categorizing a continuous variable in a logistic regression results in inflating the rate of finding false positive associations.[11] The type I (false-positive) error rates increases with sample size, with increasing correlation between the confounding variable and outcome of interest, and the number of categories used for the continuous variables. The large dataset used by Williams and colleagues, which they see as a plus, works against them by increasing the bias from the use of categorical variables for confounding variables.[12]

The Williams study raises serious questions not only about the quality of medical journalism, but also about how an executive agency such as the Veterans Administration evaluates scientific evidence. If the Williams study were to play a role in compensation determinations, it would seem that veterans with muscle-invasive bladder cancer would be turned away, while those veterans with less serious cancers would be compensated. But with 2.1% incidence versus 2.0%, how can compensation be rationally permitted in every case?


[1] Stephen B. Williams, Jessica L. Janes, Lauren E. Howard, Ruixin Yang, Amanda M. De Hoedt, Jacques G. Baillargeon, Yong-Fang Kuo, Douglas S. Tyler, Martha K. Terris, Stephen J. Freedland, “Exposure to Agent Orange and Risk of Bladder Cancer Among US Veterans,” 6 JAMA Network Open e2320593 (2023).

[2] Elana Gotkine, “Exposure to Agent Orange Linked to Risk of Bladder Cancer,” Buffalo News (June 28, 2023); Drew Amorosi, “Agent Orange exposure linked to increased risk for bladder cancer among Vietnam veterans,” HemOnc Today (June 27, 2023).

[3] Andrea S. Blevins Primeau, “Agent Orange Exposure Tied to Increased Risk of Bladder Cancer,” Cancer Therapy Advisor (June 30, 2023); Mike Bassett, “Agent Orange Exposure Tied to Bladder Cancer Risk in Veterans — Increased risk described as ‘modest’, and no association seen with aggressiveness of cancer,” Medpage Today (June 27, 2023).

[4] Darlene Dobkowski, “Agent Orange Exposure Modestly Increases Bladder Cancer Risk in Vietnam Veterans,” Cure Today (June 27, 2023).

[5] Institute of Medicine – Committee to Review the Health Effects in Vietnam Veterans of Exposure to Herbicides (Tenth Biennial Update), Veterans and Agent Orange: Update 2014 at 10 (2016) (upgrading previous finding of “inadequate” to “suggestive”).

[6] Williams study, citing U.S. Department of Veterans Affairs, “Agent Orange exposure and VA disability compensation.”

[7] Yeung Ng, I. Husain, N. Waterfall, “Diabetes Mellitus and Bladder Cancer – An Epidemiological Relationship?” 9 Path. Oncol. Research 30 (2003) (diabetic patients had an increased, significant odds ratio for bladder cancer compared with non diabetics even after adjustment for smoking and age [OR: 2.69 p=0.049 (95% CI 1.006-7.194)]).

[8] Marcus G. Cumberbatch, Matteo Rota, James W.F. Catto, and Carlo La Vecchia, “The Role of Tobacco Smoke in Bladder and Kidney Carcinogenesis: A Comparison of Exposures and Meta-analysis of Incidence and Mortality Risks?” 70 European Urology 458 (2016).

[9] See generally, “Confounded by Confounding in Unexpected Places” (Dec. 12, 2021).

[10] Jacob Cohen, “The cost of dichotomization,” 7 Applied Psychol. Measurement 249 (1983).

[11] Peter C. Austin & Lawrence J. Brunner, “Inflation of the type I error rate when a continuous confounding variable is categorized in logistic regression analyses,” 23 Statist. Med. 1159 (2004).

[12] See, e.g., Douglas G. Altman & Patrick Royston, “The cost of dichotomising continuous variables,” 332 Brit. Med. J. 1080 (2006); Patrick Royston, Douglas G. Altman, and Willi Sauerbrei, “Dichotomizing continuous predictors in multiple regression: a bad idea,” 25 Stat. Med. 127 (2006); Valerii Fedorov, Frank Mannino, and Rongmei Zhang, “Consequences of dichotomization,” 8 Pharmaceut. Statist. 50 (2009). See also Robert C. MacCallum, Shaobo Zhang, Kristopher J. Preacher, and Derek D. Rucker, “On the Practice of Dichotomization of Quantitative Variables,” 7 Psychological Methods 19 (2002); David L. Streiner, “Breaking Up is Hard to Do: The Heartbreak of Dichotomizing Continuous Data,” 47 Can. J. Psychiatry 262 (2002); Henian Chen, Patricia Cohen, and Sophie Chen, “Biased odds ratios from dichotomization of age,” 26 Statist. Med. 3487 (2007); Carl van Walraven & Robert G. Hart, “Leave ‘em Alone – Why Continuous Variables Should Be Analyzed as Such,” 30 Neuroepidemiology 138 (2008); O. Naggara, J. Raymond, F. Guilbert, D. Roy, A. Weill, and Douglas G. Altman, “Analysis by Categorizing or Dichotomizing Continuous Variables Is Inadvisable,” 32 Am. J. Neuroradiol. 437 (Mar 2011); Neal V. Dawson & Robert Weiss, “Dichotomizing Continuous Variables in Statistical Analysis: A Practice to Avoid,” Med. Decision Making 225 (2012); Phillippa M. Cumberland, Gabriela Czanner, Catey Bunce, Caroline J. Doré, Nick Freemantle, and Marta García-Fiñana, “Ophthalmic statistics note: the perils of dichotomising continuous variables,” 98 Brit. J. Ophthalmol. 841 (2014); Julie R. Irwin & Gary H. McClelland, “Negative Consequences of Dichotomizing Continuous Predictor Variables,” 40 J. Marketing Research 366 (2003); Stanley E. Lazic, “Four simple ways to increase power without increasing the sample size,” PeerJ Preprints (23 Oct 2017).

Scientists Suing Scientists, and Behaving Badly

June 2nd, 2021

In his 1994 Nobel Prize acceptance speech, the Hungarian born chemist George Andrew Olah acknowledged an aspect of science that rarely is noted in popular discussions:

“[One] way of dealing with errors is to have friends who are willing to spend the time necessary to carry out a critical examination of the experimental design beforehand and the results after the experiments have been completed. An even better way is to have an enemy. An enemy is willing to devote a vast amount of time and brain power to ferreting out errors both large and small, and this without any compensation. The trouble is that really capable enemies are scarce; most of them are only ordinary. Another trouble with enemies is that they sometimes develop into friends and lose a good deal of their zeal. It was in this way the writer lost his three best enemies. Everyone, not just scientists, need a few good enemies!”[1]

If you take science seriously, you must take error as something for which we should always be vigilant, and something we are committed to eliminate. As Olah and Von Békésy have acknowledged, sometimes an enemy is required. It would thus seem to be quite unscientific to complain that an enemy was harassing you, when she was criticizing your data, study design, methods, or motives.

Elisabeth Margaretha Harbers-Bik would be a good enemy to have. Trained in the Netherlands in microbiology, Dr. Bik came to the United States, where for some years she conducted research at Stanford University. In 2018, Bik began in earnest a new career in analyzing published scientific studies for image duplication and manipulation, and other dubious practices.[2]

Her blog, Scientific Integrity Digest, should be on the reading list of every lawyer who labors in the muck of science repurposed for litigation. You never know when your adversary’s expert witness will be featured in the pages of the Digest!

Dr. Bik is not a lone ranger; there are other scientists who have committed to cleaning up the scientific literature. After an illustrious career as an editor of prestigious journals, and a director of the Rockefeller University Press, Dr. Mike Rossner founded Image Data Integrity, Inc., to stamp out image fraud and error in scientific publications.

On March 16, 2020, a gaggle of French authors, including Dr. Didier Raoult, uploaded a pre-print of a paper to medRxiv, reporting on hydroxychloroquine (HCQ) and azithromycin in Covid-19 patients. The authors submitted their manuscript that same day to the International Journal of Antimicrobial Agents, which accepted it in 24 hours or less, on March 17, 2020. The journal published the paper online, three days after acceptance, on March 20th. Peer-review, to the extent it took place, was abridged.[3]

The misleading title of the paper, “Hydroxychloroquine and azithromycin as a treatment of COVID-19: results of an open-label non-randomized clinical trial,” may have led some untutored observers into thinking the paper reported a study high in the hierachy of evidence. Instead the paper was a rather flawed observational study, or perhaps just a concatenation of anecdotes. In any event, the authors reported that patients who had received both medications cleared the SARS-CoV2 the fastest.

Four days after publication online at a supposedly peer-reviewed journal, Elisabeth Bik posted an insightful analysis of the Raoult paper.[4] If peer review it were, her blog post pointed out the review’s failure by identifying an apparent conflict of interest and various methodological flaws, including missing data on six (out of 26) patients, including one patient who died, and three whose conditions worsened on therapy.

Raoult’s paper, and his overly zealous advocacy for HCQ did not go unnoticed in the world of kooks, speculators, and fraudfeasors. Elon Musk tweeted about Raoult’s paper; and Fox News amplified Musk’s tweet, which made it into the swamp of misinformation, Trump’s mind and his twitterverse.[5]

In the wake of the hoopla over Raoult’s paper, the journal owner admitted that the paper did not live up to the society’s standards. The publisher, Elsevier, called for an independent investigation. The French Infectious Diseases Society accused Raoult of spreading false information about hydroxychloroquine’s efficacy in Covid-19 patients. To date, there has been no further official discussions of disciplinary actions or proceedings at the Society.

Raoult apparently stewed over Bik’s criticisms and debunking of his over-interpretation of his flawed HCQ study.  Last month, Raoult filed a complaint with a French prosecutor, which marked the commencement of legal proceedings against Bik for harassment and “extortion.” The extortion charge is based upon nothing more than Bik’s having a Patreon account to support her search for fraud and error in the published medical literature.[6]

The initial expression of outrage over Marseille Raoult’s bad behavior came from Citizen4Science, a French not-for-profit organization that works to promote scientific integrity. According to Dr. Fabienne Blum, president of Citizen4Science, the organization issued its press release on May 5, 2021, to call on authorities to investigate and to intervene in Raoult’s harassment of scientists. Their press release about “the French scandal” was signed by scientists and non-scientists from around the world; it currently remains open for signatures, which number well over 4,000. “Harassment of scientific spokespersons and defenders of scientific integrity: Citizen4Science calls on the authorities to intervene urgently” (May 5, 2021). Dr. Blum and Citizen4Science are now harassed on Twitter, where they have been labeled “Bik’s gang.” Inevitably, they will be sued as well.

On June 1st, Dr. Raoult posted his self-serving take on the controversy on that scholarly forum known as YouTube. An English translation of Raoult’s diatribe can be found at Citizen4Science’s website. Perhaps others have noted that Raoult refers to Bik as “Madame” (or Mrs.) Bik, rather than as Dr. Bik, which leads to some speculation that Raoult has trouble taking criticism from intelligent women.

Having projected his worst characteristics onto adversaries, Raoult lodged accusations against Bik, which actually reflected his own behaviors closely. Haven’t we seen someone in public life who operates just like this? Raoult has criticized Bik in the lay media, and he released personal information about her, including her residential address. Raoult’s intemperate and inappropriate personal attacks on Bik have led several hundred scientists to sign an open letter in support of Bik.[7]

This scientist doth protest too much, methinks.


[1] George Andrew Olah Nobel Prize Speech (1994) (quoting from George Von Békésy, Experiments in Hearing 8 (1960).

[2] Elisabeth M. Bik, Arturo Casadevall, and Ferric C. Fang, “The Prevalence of Inappropriate Image Duplication in Biomedical Research Publications,” 7 mBio e00809 (2016); Daniele Fanelli, Rodrigo Costas, Ferric C. Fang, Arturo Casadevall, Elisabeth M. Bik, “Testing Hypotheses on Risk Factors for Scientific Misconduct via Matched-Control Analysis of Papers Containing Problematic Image Duplications,” 25 Science & Engineering Ethics 771 (2019); see also Jayashree Rajagopalan, “I have found about 2,000 problematic papers, says Dr. Elisabeth Bik,” Editage Insights (Aug 08, 2019).

[3] Philippe Gautret, Jean-Christophe Lagier, Philippe Parola, Van Thuan Hoang, Line Meddeb, Morgane Mailhe, Barbara Doudier, Johan Courjon, Valérie Giordanengo, Vera Esteves Vieira, Hervé Tissot Dupont, Stéphane Honoré, Philippe Colson, Eric Chabrière, Bernard La Scola, Jean-Marc Rolain, Philippe Brouqui, and Didier Raoult, “Hydroxychloroquine and azithromycin as a treatment of COVID-19: results of an open-label non-randomized clinical trial,” 56 Clinical Trial Internat’l J. Antimicrob. Agents e105949 (2020).

[4] Bik, “Thoughts on the Gautret et al. paper about Hydroxychloroquine and Azithromycin treatment of COVID-19 infections,” Scientific Integrity Digest (March 24, 2020).

[5] Charles Piller, “‘This is insane!’ Many scientists lament Trump’s embrace of risky malaria drugs for coronavirus,” Science Mag. (Mar. 26, 2020).

[6] Melissa Davey, “World expert in scientific misconduct faces legal action for challenging integrity of hydroxychloroquine study,” The Guardian (May 22, 2021); Kristina Fiore, “HCQ Doc Sues Critic,” MedPage Today (May 26, 2021).

[7] Lonni Besançon, Alexander Samuel, Thibault Sana, Mathieu Rebeaud, Anthony Guihur, Marc Robinson-Rechavi, Nicolas Le Berre, Matthieu Mulot, Gideon Meyerowitz-Katz, Maisonneuve, Brian A. Nosek, “Open Letter: Scientists stand up to protect academic whistleblowers and post-publication peer review,” (May 18, 2021).

Reference Manual on Scientific Evidence v4.0

February 28th, 2021

The need for revisions to the third edition of the Reference Manual on Scientific Evidence (RMSE) has been apparent since its publication in 2011. A decade has passed, and the federal agencies involved in the third edition, the Federal Judicial Center (FJC) and the National Academies of Science Engineering and Medicine (NASEM), are assembling staff to prepare the long-needed revisions.

The first sign of life for this new edition came back on November 24, 2020, when the NASEM held a short, closed door virtual meeting to discuss planning for a fourth edition.[1] The meeting was billed by the NASEM as “the first meeting of the Committee on Emerging Areas of Science, Engineering, and Medicine for the Courts: Identifying Chapters for a Fourth Edition of The Reference Manual on Scientific Evidence.” The Committee members heard from John S. Cooke (FJC Director), and Alan Tomkins and Reggie Sheehan, both of the National Science Foundation (NSF). The stated purpose of the meeting was to review the third edition of the RMSE to identify “identify areas of science, technology, and medicine that may be candidates for new or updated chapters in a proposed new (fourth) edition of the manual.” The only public pronouncement from the first meeting was that the committee would sponsor a workshop on the topic of new chapters for the RMSE, in early 2021.

The Committee’s second meeting took place a week later, again in closed session.[2] The stated purpose of the Committee’s second meeting was to review the third edition of the RMSE, and to discuss candidate areas for inclusion as new and updated chapters for a fourth edition.

Last week saw the Committee’s third, public meeting. The meeting spanned two days (Feb. 24 and 25, 2021), and was open to the public. The meeting was sponsored by NASEM, FJC, along with the NSF, and was co-chaired by Thomas D. Albright, Professor and Conrad T. Prebys Chair at the Salk Institute for Biological Studies, and the Hon. Kathleen McDonald O’Malley, who sits on the United States Court of Appeals for the Federal Circuit. Identified members of the committee include:

Steven M. Bellovin, professor in the Computer Science department at Columbia University;

Karen Kafadar, Departmental Chair and Commonwealth Professor of Statistics at the University of Virginia, and former president of the American Statistical Association;

Andrew Maynard, professor, and director of the Risk Innovation Lab at the School for the Future of Innovation in Society, at Arizona State University;

Venkatachalam Ramaswamy, Director of the Geophysical Fluid Dynamics Laboratory of the National Oceanic and Atmospheric Administration (NOAA) Office of Oceanic and Atmospheric Research (OAR), studying climate modeling and climate change;

Thomas Schroeder, Chief Judge for the U.S. District Court for the Middle District of North Carolina;

David S. Tatel, United States Court of Appeals for the District of Columbia Circuit; and

Steven R. Kendall, Staff Officer

The meeting comprised five panel presentations, made up of remarkably accomplished and talented speakers. Each panel’s presentations were followed by discussion among the panelists, and the committee members. Some panels answered questions submitted from the public audience. Judge O’Malley opened the meeting with introductory remarks about the purpose and scope of the RMSE, and of the inquiry into additional possible chapters.

  1. Challenges in Evaluating Scientific Evidence in Court

The first panel consisted entirely of judges, who held forth on their approaches to judicial gatekeeping of expert witnesses, and their approach to scientific and technical issues. Chief Judge Schroeder moderated the presentations of panelists:

Barbara Parker Hervey, Texas Court of Criminal Appeals;

Patti B. Saris, Chief Judge of the United States District Court for the District of Massachusetts,  member of President’s Council of Advisors on Science and Technology (PCAST);

Leonard P. Stark, U.S. District Court for the District of Delaware; and

Sarah S. Vance, Judge (former Chief Judge) of the U.S. District Court for the Eastern District of Louisiana, chair of the Judicial Panel on Multidistrict Litigation.

  1. Emerging Issues in the Climate and Environmental Sciences

Paul Hanle, of the Environmental Law Institute moderated presenters:

Joellen L. Russell, the Thomas R. Brown Distinguished Chair of Integrative Science and Professor at the University of Arizona in the Department of Geosciences;

Veerabhadran Ramanathan, Edward A. Frieman Endowed Presidential Chair in Climate Sustainability at the Scripps Institution of Oceanography at the University of California, San Diego;

Benjamin D. Santer, atmospheric scientist at Lawrence Livermore National Laboratory; and

Donald J. Wuebbles, the Harry E. Preble Professor of Atmospheric Science at the University of Illinois.

  1. Emerging Issues in Computer Science and Information Technology

Josh Goldfoot, Principal Deputy Chief, Computer Crime & Intellectual Property Section, at U.S. Department of Justice, moderated panelists:

Jeremy J. Epstein, Deputy Division Director of Computer and Information Science and Engineering (CISE) and Computer and Network Systems (CNS) at the National Science Foundation;

Russ Housley, founder of Vigil Security, LLC;

Subbarao Kambhampati, professor of computer science at Arizona State University; and

Alice Xiang, Senior Research Scientist at Sony AI.

  1. Emerging Issues in the Biological Sciences

Panel four was moderated by Professor Ellen Wright Clayton, the Craig-Weaver Professor of Pediatrics, and Professor of Law and of Health Policy at Vanderbilt Law School, at Vanderbilt University. Her panelists were:

Dana Carroll, distinguished professor in the Department of Biochemistry at the University of Utah School of Medicine;

Yaniv Erlich, Chief Executive Officer of Eleven Therapeutics, Chief Science Officer of MyHeritage;

Steven E. Hyman, director of the Stanley Center for Psychiatric Research at Broad Institute of MIT and Harvard; and

Philip Sabes, Professor Emeritus in Physiology at the University of California, San Francisco (UCSF).

  1. Emerging areas in Psychology, Data, and Statistical Sciences

Gary Marchant, Lincoln Professor of Emerging Technologies, Law and Ethics, at Arizona State University’s Sandra Day O’Connor College of Law, moderated panelists:

Xiao-Li Meng, the Whipple V. N. Jones Professor of Statistics, Harvard University, and the Founding Editor-in-Chief of Harvard Data Science Review;

Rebecca Doerge, Glen de Vries Dean of the Mellon College of Science at Carnegie Mellon University, member of the Dietrich College of Humanities and Social Sciences’ Department of Statistics and Data Science, and of the Mellon College of Science’s Department of Biological Sciences;

Daniel Kahneman, Professor of Psychology and Public Affairs Emeritus at the Princeton School of Public and International Affairs, the Eugene Higgins Professor of Psychology Emeritus at Princeton University, and a fellow of the Center for Rationality at the Hebrew University in Jerusalem; and

Goodwin Liu, Associate Justice of the California Supreme Court.

The Proceedings of this two day meeting were recorded and will be published. The website materials are unclear whether the verbatim remarks will be included, but regardless, the proceedings should warrant careful reading.

Judge O’Malley, in her introductory remarks, emphasized that the RMSE must be a neutral, disinterested source of information for federal judges, an aspirational judgment from which there can be no dissent. More controversial will be Her Honor’s assessment that epidemiologic studies can “take forever,” and other judges’ suggestion that plaintiffs lack financial resources to put forward credible, reliable expert witnesses. Judge Vance corrected the course of the discussion by pointing out that MDL plaintiffs were not disadvantaged, but no one pointed out that plaintiffs’ counsel were among the wealthiest individuals in the United States, and that they have been known to sponsor epidemiologic and other studies that wind up as evidence in court.

Panel One was perhaps the most discomforting experience, as it involved revelations about how sausage is made in the gatekeeping process. The panel was remarkable for including a state court judge from Texas, Judge Barbara Parker Hervey, of the Texas Court of Criminal Appeals. Judge Hervey remarked that [in her experience] if we judges “can’t understand it, we won’t read it.” Her dictum raises interesting issues. No doubt, in some instances, the judicial failure of comprehension is the fault of the lawyers. What happens when the judges “can’t understand it”? Do they ask for further briefing? Or do they ask for a hearing with viva voce testimony from expert witnesses? The point was not followed up.

Leonard P. Stark’s insights were interesting in that his docket in the District of Delaware is flooded with patent and Hatch-Waxman Act litigation. Judge Stark’s extensive educational training is in politics and political science. The docket volume Judge Stark described, however, raised issues about how much attention he could give to any one case.

When the panel was asked how they dealt with scientific issues, Judge Saris discussed her presiding over In re Neurontin, which was a “big challenge for me to understand,” with no randomized trials or objective assessments by the litigants.[3] Judge Vance discussed her experience of presiding in a low-level benzene exposure case, in which plaintiff claimed that his acute myelogenous leukemia was caused by gasoline.[4]

Perhaps the key difference in approach to Rule 702 emerged when the judges were asked whether they read the underlying studies. Judge Saris did not answer directly, but stated she reads the reports. Judge Vance, on the other hand, noted that she reads the relied upon studies. In her gasoline-leukemia case, she read the relied-upon epidemiologic studies, which she described as a “hodge podge,” and which were misrepresented by the expert witnesses and counsel. She emphasized the distortions of the adversarial system and the need to moderate its excesses by validating what exactly the expert witnesses had relied upon.

This division in judicial approach was seen again when Professor Karen Kafadar asked how the judges dealt with peer review. Judge Saris seemed to suggest that the peer-reviewed published article was prima facie reliable. Others disagreed and noted that peer reviewed articles can have findings that are overstated, and wrong. One speaker noted that Jerome Kassirer had downplayed the significance of, and the validation provided by, peer review, in the RMSE (3rd ed 2011).

Curiously, there was no discussion of Rule 703, either in Judge O’Malley’s opening remarks on the RMSE, or in the first panel discussion. When someone from the audience submitted a question about the role of Rule 703 in the gatekeeping process, the moderator did not read it.

Panel Two. The climate change panel was a tour de force of the case for anthropogenic climate change. To some, the presentations may have seemed like a reprise of The Day After Tomorrow. Indeed, the science was presented so confidently, if not stridently, that one of the committee members asked whether there could be any reasonable disagreement. The panelists responded essentially by pointing out that there could be no good faith opposition. The panelists were much less convincing on the issue of attributability. None of the speakers addressed the appropriateness vel non of climate change litigation, when the federal and state governments encouraged, licensed, and regulated the exploitation and use of fossil fuel reserves.

Panel Four. Dr. Clayton’s panel was fascinating and likely to lead to new chapters. Professor Hyman presented on heritability, a subject that did not receive much attention in the RMSE third edition. With the advent of genetic claims of susceptibility and defenses of mutation-induced disease, courts will likely need some good advice on navigating the science. Dana Carroll presented on human genome editing (CRISPR). Philip Sabes presented on brain-computer interfaces, which have progressed well beyond the level of sci-fi thrillers, such as The Brain That Wouldn’t Die (“Jan in the Pan”).

In addition to the therapeutic applications, Sabes discussed some of potential forensic uses, such as lie detectors, pain quantification, and the like. Yaniv Erlich, of MyHeritage, discussed advances in forensic genetic genealogy, which have made a dramatic entrance to the common imagination through the apprehension of Joseph James DeAngelo, the Golden State killer. The technique of triangulating DNA matches from consumer DNA databases has other applications, of course, such as identifying lost heirs, and resolving paternity issues.

Panel Five. Professor Marchant’s panel may well have identified some of the most salient needs for the next edition of the RMSE. Nobel Laureate Daniel Kahneman presented some of the highlights from his forthcoming book about “noise” in human judgment.[5] Kahneman’s expansion upon his previous thinking about the sources of error in human – and scientific – judgment are a much needed addition to the RMSE. Along the same lines, Professor Xiao Li Meng, presented on selection bias, and how it pervades scientific work, and detracts from the strength of evidence in the form of:

  1. cherry picking
  2. subgroup analyses
  3. unprincipled handling of outliers
  4. selection in methodologies (different tests)
  5. selection in due diligence (check only when you don’t like results)
  6. publication bias that results from publishing only impressive or statistically significant results
  7. selection in reporting, not reporting limitations all analyses
  8. selection in understanding

Professor Meng’s insights are sorely lacking in the third edition of the RMSE, and among judicial gatekeepers generally.  All too often, undue selectivity in methodologies and in relied-upon data is treated by judges as an issue that “goes to the weight, not the admissibility” of expert witness opinion testimony. In actuality, the selection biases, and other systematic and cognitive biases, are as important as, if not more important than, random error assessments. Indeed a close look at the RMSE third edition reveals a close embrace of the amorphous, anything-goes “weight of the evidence” approach in the epidemiology chapter.  That chapter marginalizes meta-analyses and fails to mention systematic review techiniques altogether. The chapter on clinical medicine, however, takes a divergent approach, emphasizing the hierarchy of evidence inherent in different study types, and the need for principled and systematic reviews of the available evidence.[6]

The Committee co-chairs and panel moderators did a wonderful job to identify important new trends in genetics, data science, error assessment, and computer science, and they should be congratulated for their efforts. Judge O’Malley is certainly correct in saying that the RMSE must be a neutral source of information on statistical and scientific methodologies, and it needs to be revised and updated to address errors and omissions in the previous editions. The legal community should look for, and study, the published proceedings when they become available.

——————————————————————————————————

[1]  SeeEmerging Areas of Science, Engineering, and Medicine for the Courts: Identifying Chapters for a Fourth Edition of The Reference Manual on Scientific Evidence – Committee Meeting” (Nov. 24, 2020).

[2]  SeeEmerging Areas of Science, Engineering, and Medicine for the Courts: Identifying Chapters for a Fourth Edition of The Reference Manual on Scientific Evidence – Committee Meeting 2 (Virtual)” (Dec. 1, 2020).

[3]  In re Neurontin Marketing, Sales Practices & Prods. Liab. Litig., 612 F. Supp. 2d 116 (D. Mass. 2009) (Saris, J.).

[4]  Burst v. Shell Oil Co., 104 F.Supp.3d 773 (E.D.La. 2015) (Vance, J.), aff’d, ___ Fed. App’x ___, 2016 WL 2989261 (5th Cir. May 23, 2016), cert. denied, 137 S.Ct. 312 (2016). SeeThe One Percent Non-solution – Infante Fuels His Own Exclusion in Gasoline Leukemia Case” (June 25, 2015).

[5]  Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein, Noise: A Flaw in Human Judgment (anticipated May 2021).

[6]  See John B. Wong, Lawrence O. Gostin, and Oscar A. Cabrera, “Reference Guide on Medical Testimony,” Reference Manual on Scientific Evidence 723-24 (3ed ed. 2011) (discussing hierarchy of medical evidence, with systematic reviews at the apex).

Daubert’s Silver Anniversary – Retrospective View of Its Friends and Enemies

October 21st, 2018

Science is inherently controversial because when done properly it has no respect for power or political correctness or dogma or entrenched superstition. We should thus not be surprised that the scientific process has many detractors in houses of worship, houses of representatives, and houses of the litigation industry. And we have more than a few “Dred Scott” decisions, in which courts have held that science has no criteria of validity that they are bound to follow.

To be sure, many judges have recognized a different danger in scientific opinion testimony, namely, its ability to overwhelm the analytical faculties of lay jurors. Fact-finders may view scientific expert witness opinion testimony as having an overwhelming certainty and authority, which swamps their ability to evaluate the testimony.1

One errant judicial strategy to deal with their own difficulty in evaluating scientific evidence was to invent a fictitious divide between a scientific and legal burden of proof:2

Petitioners demand sole reliance on scientific facts, on evidence that reputable scientific techniques certify as certain. Typically, a scientist will not so certify evidence unless the probability of error, by standard statistical measurement, is less than 5%. That is, scientific fact is at least 95% certain. Such certainty has never characterized the judicial or the administrative process. It may be that the ‘beyond a reasonable doubt’ standard of criminal law demands 95% certainty. Cf. McGill v. United States, 121 U.S.App. D.C. 179, 185 n.6, 348 F.2d 791, 797 n.6 (1965). But the standard of ordinary civil litigation, a preponderance of the evidence, demands only 51% certainty. A jury may weigh conflicting evidence and certify as adjudicative (although not scientific) fact that which it believes is more likely than not.”

By falsely elevating the scientific standard, judges see themselves free to decide expeditiously and without constraints, because they are operating at much lower epistemic level.

Another response advocated by “the Lobby,” scientists in service to the litigation industry, has been to deprecate gatekeeping altogether. Perhaps the most brazen anti-science response to the Supreme Court’s decision in Daubert was advanced by David Michaels and his Project on Scientific Knowledge and Public Policy (SKAPP). In its heyday, SKAPP organized meetings and conferences, and cranked out anti-gatekeeping propaganda to the delight of the litigation industry3, while obfuscating and equivocating about the source of its funding (from the litigation industry).4

SKAPP principal David Michaels was also behind the efforts of the American Public Health Association (APHA) to criticize the judicial move to scientific standards in gatekeeping. In 2004, Michaels and fellow litigation industrialists prevailed upon the APHA to adopt a policy statement that attacked evidence-based science and data transparency in the form of “Policy Number: 2004-11 Threats to Public Health Science.”5

SKAPP appears to have gone the way of the dodo, although the defunct organization still has a Wikipedia­ page with the misleading claim that a federal court had funded its operation, and the old link for this sketchy outfit now redirects to the website for the Union of Concerned Scientists. In 2009, David Michaels, fellow in the Collegium Ramazzini, and formerly the driving force of SKAPP, went on to become an under-secretary of Labor, and OSHA administrator in the Obama administration.6

With the end of his regulatory work, Michaels is now back in the litigation saddle. In April 2018, Michaels participated in a ruse in which he allowed himself to be “subpoenaed” by Mark Lanier, to give testimony in a cases involving claims that personal talc use caused ovarian cancers.7 Michaels had no real subject matter expertise, but he readily made himself available so that Mr. Lanier could inject Michaels’ favorite trope of “doubt is their product” into his trial.

Against this backdrop of special pleading from the litigation industry’s go-to expert witnesses, it is helpful to revisit the Daubert decision, which is now 25 years old. The decision followed the grant of the writ of certiorari by the Supreme Court, full briefing by the parties on the merits, oral argument, and twenty two amicus briefs. Not all briefs are created equal, and this inequality is especially true of amicus briefs, for which the quality of argument, and the reputation of the interested third parties, can vary greatly. Given the shrill ideological ranting of SKAPP and the APHA, we might find some interest in what two leading scientific organizations, the American Association for the Advancement of Science (AAAS) and the National Academy of Science (NAS), contributed to the debate over the proper judicial role in policing expert witness opinion testimony.

The Amicus Brief of the AAAS and the NAS, filed in Daubert v. Merrell Dow Pharmaceuticals, Inc., U.S. Supreme Court No. 92-102 (Jan. 19, 1993), was submitted by Richard A. Meserve and Lars Noah, of Covington & Burling, and by Bert Black, of Weinberg & Green. Unfortunately, the brief does not appear to be available on Westlaw, but it was republished shortly after filing, at 12 Biotechnology Law Report 198 (No. 2, March-April 1993) [all citations below are to this republication].

The amici were and are well known to the scientific community. The AAAS is a not-for-profit scientific society, which publishes the prestigious journal Science, and engages in other activities to advance public understanding of science. The NAS was created by congressional charter in the administration of Abraham Lincoln, to examine scientific, medical, and technological issues of national significance. Brief at 208. Meserve, counsel of record for these Amici Curiae, is a member of the National Academy, a president emeritus of the Carnegie Institution for Science, and a former chair of the U.S. Nuclear Regulatory Commission. He received his doctorate in applied physics from Stanford University, and his law degree from Harvard. Noah is now a professor of law in the University of Florida, and Black is still a practicing lawyer, ironically for the litigation industry.

The brief of the AAAP and the NAS did not take a position on the merits of whether Bendectin can cause birth defects, but it had a great deal to say about the scientific process, and the need for courts to intervene to ensure that expert witness opinion testimony was developed and delivered with appropriate methodological rigor.

A Clear and Present Danger

The amici, AAAS and NAS, clearly recognized a threat to the integrity of scientific fact-finding in the regime of uncontrolled and unregulated expert witness testimony. The amici cited the notorious case of Wells v. Ortho Pharmaceutical Corp.8, which had provoked an outcry from the scientific community, and a particularly scathing article by two scientists from the National Institute of Child Health and Human Development.9

The amici also cited several judicial decisions on the need for robust gatekeeping, including the observations of Judge Jack Weinstein that

[t]he uncertainty of the evidence in [toxic tort] cases, dependent as it is on speculative scientific hypotheses and epidemiological studies, creates a special need for robust screening of experts and gatekeeping under Rules 403 and 703 by the court.”10

The AAAS and the NAS saw the “obvious danger that research results generated solely for litigation may be skewed.” Brief at 217& n.11.11 The AAAS and the NAS thus saw a real, substantial threat in countenancing expert witnesess who proffered “putatively scientific evidence that does not in fact reflect the application of scientific principles.” Brief at 208. The Supreme Court probably did not need the AAAS and the NAS to tell them that “[s]uch evidence can lead to incorrect decisions and can serve to discredit the contributions of science,” id., but it may have helped ensure that the Court articulated meaningful guidelines to trial judges to police their courtrooms against scientific conclusions that were not reached in accordance with scientific principles. The amici saw and stated that

[t]he unique persuasive power of scientific evidence and its inherent limitations requires that courts engage special efforts to ensure that scientific evidence is valid and reliable before it is admitted. In performing that task, courts can look to the same criteria that scientists themselves use to evaluate scientific claims.”

Brief at 212.

It may seem quaint to the post-modernists at the APHA, but the AAAS and the NAS were actually concerned “to avoid outcomes that are at odds with reality,” and they were willing to urge that “courts must exercise special care to assure that such evidence is based on valid and reliable scientific methodologies.” Brief at 209 (emphasis added). The amici also urged caution in allowing opinion testimony that conflicted with existing learning, and which had not been presented to the scientific community for evaluation. Brief at 218-19. In the words of the amici:

Courts should admit scientific evidence only if it conforms to scientific standards and is derived from methods that are generally accepted by the scientific community as valid and reliable. Such a test promotes sound judicial decisionmaking by providing workable means for screening and assessing the quality of scientific expert testimony in advance of trial.”

Brief at 233. After all, part of the scientific process itself is weeding out false ideas.

Authority for Judicial Control

The AAAS and NAS and its lawyers gave their full support to Merrill Dow’s position that “courts have the authority and the responsibility to exclude expert testimony that is based upon unreliable or misapplied methodologies.” Brief at 209. The Federal Rules of Evidence, and Rules 702, 703, and 403 in particular, gave trial courts “ample authority for empowering courts to serve as gatekeepers.” Brief at 230. The amici argued what ultimately would become the law, that judicial control, in the spirit and text of the Federal Rules, of “[t]hreshold determinations concerning the admissibility of scientific evidence are necessary to ensure accurate decisions and to avoid unnecessary expenditures of judicial resources on collateral issues. Brief at 210. The AAAS and NAS further recommended that:

Determinations concerning the admissibility of expert testimony based on scientific evidence should be made by a judge in advance of trial. Such judicial control is explicitly called for under Rule 104(a) of the Federal Rules of Evidence, and threshold admissibility determinations by a judge serve several important functions, including simplification of issues at trial (thereby increasing the speed of trial), improvement in the consistency and predictability of results, and clarification of the issues for purposes of appeal. Indeed, it is precisely because a judge can evaluate the evidence in a focused and careful manner, free from the prejudices that might infect deliberations by a jury, that the determination should be made as a threshold matter.”

Brief at 228 (internal citations omitted).

Criteria of Validity

The AAAS and NAS did not shrink from the obvious implications of their position. They insisted that “[i]n evaluating scientific evidence, courts should consider the same factors that scientists themselves employ to assess the validity and reliability of scientific assertions.” Brief at 209, 210. The amici may have exhibited an aspirational view of the ability of judges, but they shared their optimistic view that “judges can understand the fundamental characteristics that separate good science from bad.” Brief at 210. Under the gatekeeping regime contemplated by the AAAS and the NAS, judges would have to think and analyze, rather than delegating to juries. In carrying out their task, judges would not be starting with a blank slate:

When faced with disputes about expert scientific testimony, judges should make full use of the scientific community’s criteria and quality-control mechanisms. To be admissible, scientific evidence should conform to scientific standards and should be based on methods that are generally accepted by the scientific community as valid and reliable.”

Brief at 210. Questions such as whether an hypothesis has survived repeated severe, rigorous tests, whether the hypothesis is consistent with other existing scientific theories, whether the results of the tests have been presented to the scientific community, need to be answered affirmatively before juries are permitted to weigh in with their verdicts. Brief at 216, 217.

The AAAS and the NAS acknowledged implicitly and explicitly that courtrooms were not good places to trot out novel hypotheses, which lacked severe testing and sufficient evidentiary support. New theories must survive repeated testing and often undergo substantial refinements before they can be accepted in the scientific community. The scientific method requires nothing less. Brief at 219. These organizational amici also acknowledged that there will be occasionally “truly revolutionary advances” in the form of an hypothesis not fully tested. The danger of injecting bad science into broader decisions (such as encouraging meritless litigation, or the abandonment of useful products) should cause courts to view unestablished hypotheses with “heightened skepticism pending further testing and review.” Brief at 229. In other words, some hypotheses simply have not matured to the point at which they can support tort or other litigation.

The AAAS and the NAS contemplated that the gatekeeping process could and should incorporate the entire apparatus of scientific validity determinations into Rule 104(a) adjudications. Nowhere in their remarkable amicus brief do they suggest that if there some evidence (however weak) favoring a causal claim, with nothing yet available to weigh against it, expert witnesses can declare that they have the “weight of the evidence” on their side, and gain a ticket to the courthouse door. The scientists at SKAPP, or now those at the Union for Concerned Scientists, prefer to brand gatekeeping as a trick to sell “doubt.” What they fail to realize is that their propaganda threatens both universalism and organized skepticism, two of the four scientific institutional norms, described by sociologist of science Robert K. Merton.12


1 United States v. Brown, 557 F.2d 541, 556 (6th Cir. 1977) (“Because of its apparent objectivity, an opinion that claims a scientific basis is apt to carry undue weight with the trier of fact”); United States v. Addison, 498 F.2d 741, 744 (D.C. Cir. 1974) (“scientific proof may in some instances assume a posture of mystic infallibility in the eyes of a jury of laymen”). Some people say that our current political morass reflects poorly on the ability of United States citizens to assess and evaluate evidence and claims to the truth.

2 See, e.g., Ethyl Corp. v. EPA, 541 F.2d 1, 28 n.58 (D.C. Cir.), cert. denied, 426 U.S. 941 (1976). See also Rhetorical Strategy in Characterizing Scientific Burdens of Proof(Nov. 15, 2014).

3 See, e.g., Project on Scientific Knowledge and Public Policy, “Daubert: The Most Influential Supreme Court Ruling You’ve Never Heard Of(2003).

4 See, e.g., SKAPP A LOT(April 30, 2010); “Manufacturing Certainty(Oct. 25, 2011);David Michaels’ Public Relations Problem(Dec. 2, 2011); Conflicted Public Interest Groups (Nov. 3, 2013).

7 Notes of Testimony by David Michaels, in Ingham v. Johnson & Johnson, Case No. 1522-CC10417-01, St. Louis Circuit Ct, Missouri (April 17, 2018).

8 788 F.2d 741, 744-45 (11th Cir.), cert. denied, 479 U.S. 950 (1986). Remarkably, consultants for the litigation industry have continued to try to “rehabilitate” the Wells decision. SeeCarl Cranor’s Conflicted Jeremiad Against Daubert” (Sept. 23, 2018).

9 James L. Mills & Duane Alexander, “Teratogens and Litogens,” 315 New Engl. J. Med. 1234, 1235 (1986).

10 Brief at n. 31, citing In re Agent Orange Product Liab. Litig., 611 F. Supp. 1267, 1269 (E.D.N.Y. 1985), aff’d, 818 F.2d 187 (2th Cir. 1987), cert. denied, 487 U.S. 1234 (1988).

11 citing among other cases, Perry v. United States, 755 F.2d 888, 892 (11th Cir. 1985) (“A scientist who has a formed opinion as to the answer he is going to find before he even begins his research may be less objective than he needs to be in order to produce reliable scientific results.”).

12 Robert K. Merton, “The Normative Structure of Science,” in Robert K. Merton, The Sociology of Science: Theoretical and Empirical Investigations, chap. 13, at 267, 270 (1973).

David Egilman and Friends Circle the Wagons at the International Journal of Occupational & Environmental Health

May 4th, 2017

Andrew Maier is an associate professor in the Department of Environmental Health, in the University of Cincinnati. Maier received his Ph.D. degree in toxicology, with a master’s degree in industrial health. He is a Certified Industrial Hygienest and has published widely on occupational health issues. Earlier this year, Maier was named the editor-in-chief of the International Journal of Occupational and Environmental Health (IJOEH). See Casey Allen, “Andy Maier Named Editor of Environmental Health Journal(Jan. 18, 2017).

Before Maier’s appointment, the IJOEH was, for the last several years, the vanity press for former editor-in-chief David Egilman and “The Lobby,” the expert witness brigade of the lawsuit industry. Egilman’s replacement with Andrew Maier apparently took place after the IJOEH was acquired by the scientific publishing company Taylor & Francis, from the former publisher, Maney.

The new owner, however, left the former IJOEH editorial board, largely a gaggle of Egilman friends and fellow travelers in place. Last week, the editorial board revoltingly wrote [contact information redacted] to Roger Horton, Chief Executive Officer of Taylor & Francis, to request that Egilman be restored to power, or that the current Editorial Board be empowered to choose Egilman’s successor. With Trump-like disdain for evidence, the Board characterized the new Editor as a “corporate consultant.” If Maier has consulted with corporations, his work appears to have rarely if ever landed him in a courtroom at the request of a corporate defendant. And with knickers tightly knotted, the Board also made several other demands for control over Board membership and journal content.

Andrew Watterson wrote to Horton on behalf of all current and former IJOEH Editorial Board members, a group heavily populated by plaintiffs’ litigation expert witnesses and “political” scientists, including among others:

Arthur Frank

Morris Greenberg

Barry S. Levy

David Madigan

Jock McCulloch

David Wegman

Barry Castleman

Peter Infante

Ron Melnick

Daniel Teitelbaum

None of the signatories apparently disclosed their affiliations as corporate consultants for the lawsuit industry.

Removing Egilman from control was bad enough, but the coup de grâce for the Lobby came earlier in April 2016, when Taylor & Francis notified Egilman that a paper that he had published in IJOEH was being withdrawn. According to the petitioners, the paper, “The production of corporate research to manufacture doubt about the health hazards of products: an overview of the Exponent Bakelite simulation study,” was removed without explanation. See Public health journal’s editorial board tells publisher they have ‘grave concerns’ over new editor,” Retraction Watch (April 27, 2017).

According to Taylor & Francis, the Egilman article was “published inadvertently, before the review process had been completed. On completing that review, it was decided the article was unsuitable for publication in the journal.” Id. Well, of course, Egilman’s article was unlikely to receive much analytical scrutiny at a journal where he was Editor-in-Chief, and where the Board was populated by his buddies. The same could be said for many articles published under Egilman’s tenure at the IJOEH. Taylor & Francis owes Egilman and the scientific and legal community a detailed statement of what was in the article, which was “unsuitable,” and why. Certainly, the law department at Taylor & Francis should make sure that it does not give Egilman and his former Board of Editors grounds for litigation. They are, after all, tight with the lawsuit industry. More important, Taylor & Francis owes Dr. Egilman, as well as the scientific and legal community, a full explanation of why the article in question was unsuitable for publication in the IJOEH.

Don’t Double Dip Data

March 9th, 2015

Meta-analyses have become commonplace in epidemiology and in other sciences. When well conducted and transparently reported, meta-analyses can be extremely helpful. In several litigations, meta-analyses determined the outcome of the medical causation issues. In the silicone gel breast implant litigation, after defense expert witnesses proffered meta-analyses[1], court-appointed expert witnesses adopted the approach and featured meta-analyses in their reports to the MDL court[2].

In the welding fume litigation, plaintiffs’ expert witness offered a crude, non-quantified, “vote counting” exercise to argue that welding causes Parkinson’s disease[3]. In rebuttal, one of the defense expert witnesses offered a quantitative meta-analysis, which provided strong evidence against plaintiffs’ claim.[4] Although the welding fume MDL court excluded the defense expert’s meta-analysis from the pre-trial Rule 702 hearing as untimely, plaintiffs’ counsel soon thereafter initiated settlement discussions of the entire set of MDL cases. Subsequently, the defense expert witness, with his professional colleagues, published an expanded version of the meta-analysis.[5]

And last month, a meta-analysis proffered by a defense expert witness helped dispatch a long-festering litigation in New Jersey’s multi-county isotretinoin (Accutane) litigation. In re Accutane Litig., No. 271(MCL), 2015 WL 753674 (N.J. Super., Law Div., Atlantic Cty., Feb. 20, 2015) (excluding plaintiffs’ expert witness David Madigan).

Of course, when a meta-analysis is done improperly, the resulting analysis may be worse than none at all. Some methodological flaws involve arcane statistical concepts and procedures, and may be easily missed. Other flaws are flagrant and call for a gatekeeping bucket brigade.

When a merchant puts his hand the scale at the check-out counter, we call that fraud. When George Costanza double dipped his chip twice in the chip dip, he was properly called out for his boorish and unsanitary practice. When a statistician or epidemiologist produces a meta-analysis that double counts crucial data to inflate a summary estimate of association, or to create spurious precision in the estimate, we don’t need to crack open Modern Epidemiology or the Reference Manual on Scientific Evidence to know that something fishy has taken place.

In litigation involving claims that selective serotonin reuptake inhibitors cause birth defects, plaintiffs’ expert witness, a perinatal epidemiologist, relied upon two published meta-analyses[6]. In an examination before trial, this epidemiologist was confronted with the double counting (and other data entry errors) in the relied-upon meta-analyses, and she readily agreed that the meta-analyses were improperly done and that she had to abandon her reliance upon them.[7] The result of the expert witness’s deposition epiphany, however, was that she no longer had the illusory benefit of an aggregation of data, with an outcome supporting her opinion. The further consequence was that her opinion succumbed to a Rule 702 challenge. See In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., MDL No. 2342; 12-md-2342, 2014 U.S. Dist. LEXIS 87592; 2014 WL 2921648 (E.D. Pa. June 27, 2014) (Rufe, J.).

Double counting of studies, or subgroups within studies, is a flaw that most careful readers can identify in a meta-analysis, without advance training. According to statistician Stephen Senn, double counting of evidence is a serious problem in published meta-analytical studies. Stephen J. Senn, “Overstating the evidence – double counting in meta-analysis and related problems,” 9, at *1 BMC Medical Research Methodology 10 (2009). Senn observes that he had little difficulty in finding examples of meta-analyses gone wrong, including meta-analyses with double counting of studies or data, in some of the leading clinical medical journals. Id. Senn urges analysts to “[b]e vigilant about double counting,” id. at *4, and recommends that journals should withdraw meta-analyses promptly when mistakes are found,” id. at *1.

Similar advice abounds in books and journals[8]. Professor Sander Greenland addresses the issue in his chapter on meta-analysis in Modern Epidemiology:

Conducting a Sound and Credible Meta-Analysis

Like any scientific study, an ideal meta-analysis would follow an explicit protocol that is fully replicable by others. This ideal can be hard to attain, but meeting certain conditions can enhance soundness (validity) and credibility (believability). Among these conditions we include the following:

  • A clearly defined set of research questions to address.

  • An explicit and detailed working protocol.

  • A replicable literature-search strategy.

  • Explicit study inclusion and exclusion criteria, with a rationale for each.

  • Nonoverlap of included studies (use of separate subjects in different included studies), or use of statistical methods that account for overlap. * * * * *”

Sander Greenland & Keith O’Rourke, “Meta-Analysis – Chapter 33,” in Kenneth J. Rothman, Sander Greenland, Timothy L. Lash, Modern Epidemiology 652, 655 (3d ed. 2008) (emphasis added).

Just remember George Costanza; don’t double dip that chip, and don’t double dip in the data.


[1] See, e.g., Otto Wong, “A Critical Assessment of the Relationship between Silicone Breast Implants and Connective Tissue Diseases,” 23 Regulatory Toxicol. & Pharmacol. 74 (1996).

[2] See Barbara Hulka, Betty Diamond, Nancy Kerkvliet & Peter Tugwell, “Silicone Breast Implants in Relation to Connective Tissue Diseases and Immunologic Dysfunction:  A Report by a National Science Panel to the Hon. Sam Pointer Jr., MDL 926 (Nov. 30, 1998)”; Barbara Hulka, Nancy Kerkvliet & Peter Tugwell, “Experience of a Scientific Panel Formed to Advise the Federal Judiciary on Silicone Breast Implants,” 342 New Engl. J. Med. 812 (2000).

[3] Deposition of Dr. Juan Sanchez-Ramos, Street v. Lincoln Elec. Co., Case No. 1:06-cv-17026, 2011 WL 6008514 (N.D. Ohio May 17, 2011).

[4] Deposition of Dr. James Mortimer, Street v. Lincoln Elec. Co., Case No. 1:06-cv-17026, 2011 WL 6008054 (N.D. Ohio June 29, 2011).

[5] James Mortimer, Amy Borenstein & Laurene Nelson, Associations of Welding and Manganese Exposure with Parkinson’s Disease: Review and Meta-Analysis, 79 Neurology 1174 (2012).

[6] Shekoufeh Nikfar, Roja Rahimi, Narjes Hendoiee, and Mohammad Abdollahi, “Increasing the risk of spontaneous abortion and major malformations in newborns following use of serotonin reuptake inhibitors during pregnancy: A systematic review and updated meta-analysis,” 20 DARU J. Pharm. Sci. 75 (2012); Roja Rahimi, Shekoufeh Nikfara, Mohammad Abdollahic, “Pregnancy outcomes following exposure to serotonin reuptake inhibitors: a meta-analysis of clinical trials,” 22 Reproductive Toxicol. 571 (2006).

[7] “Q So the question was: Have you read it carefully and do you understand everything that was done in the Nikfar meta-analysis?

A Yes, I think so.

* * *

Q And Nikfar stated that she included studies, correct, in the cardiac malformation meta-analysis?

A That’s what she says.

* * *

Q So if you look at the STATA output, the demonstrative, the — the forest plot, the second study is Kornum 2010. Do you see that?

A Am I —

Q You’re looking at figure four, the cardiac malformations.

A Okay.

Q And Kornum 2010, —

A Yes.

Q — that’s a study you relied upon.

A Mm-hmm.

Q Is that right?

A Yes.

Q And it’s on this forest plot, along with its odds ratio and confidence interval, correct?

A Yeah.

Q And if you look at the last study on the forest plot, it’s the same study, Kornum 2010, same odds ratio and same confidence interval, true?

A You’re right.

Q And to paraphrase My Cousin Vinny, no self-respecting epidemiologist would do a meta-analysis by including the same study twice, correct?

A Well, that was an error. Yeah, you’re right.

***

Q Instead of putting 2 out of 98, they extracted the data and put 9 out of 28.

A Yeah. You’re right.

Q So there’s a numerical transposition that generated a 25-fold increased risk; is that right?

A You’re correct.

Q And, again, to quote My Cousin Vinny, this is no way to do a meta-analysis, is it?

A You’re right.”

Testimony of Anick Bérard, Kuykendall v. Forest Labs, at 223:14-17; 238:17-20; 239:11-240:10; 245:5-12 (Cole County, Missouri; Nov. 15, 2013). According to a Google Scholar search, the Rahimi 2005 meta-analysis had been cited 90 times; the Nikfar 2012 meta-analysis, 11 times, as recently as this month. See, e.g., Etienne Weisskopf, Celine J. Fischer, Myriam Bickle Graz, Mathilde Morisod Harari, Jean-Francois Tolsa, Olivier Claris, Yvan Vial, Chin B. Eap, Chantal Csajka & Alice Panchaud, “Risk-benefit balance assessment of SSRI antidepressant use during pregnancy and lactation based on best available evidence,” 14 Expert Op. Drug Safety 413 (2015); Kimberly A. Yonkers, Katherine A. Blackwell & Ariadna Forray, “Antidepressant Use in Pregnant and Postpartum Women,” 10 Ann. Rev. Clin. Psychol. 369 (2014); Abbie D. Leino & Vicki L. Ellingrod, “SSRIs in pregnancy: What should you tell your depressed patient?” 12 Current Psychiatry 41 (2013).

[8] Julian Higgins & Sally Green, eds., Cochrane Handbook for Systematic Reviews of Interventions 152 (2008) (“7.2.2 Identifying multiple reports from the same study. Duplicate publication can introduce substantial biases if studies are inadvertently included more than once in a meta-analysis (Tramèr 1997). Duplicate publication can take various forms, ranging from identical manuscripts to reports describing different numbers of participants and different outcomes (von Elm 2004). It can be difficult to detect duplicate publication, and some ‘detectivework’ by the reviewauthors may be required.”); see also id. at 298 (Table 10.1.a “Definitions of some types of reporting biases”); id. at 304-05 (10.2.2.1 Duplicate (multiple) publication bias … “The inclusion of duplicated data may therefore lead to overestimation of intervention effects.”); Julian P.T. Higgins, Peter W. Lane, Betsy Anagnostelis, Judith Anzures-Cabrera, Nigel F. Baker, Joseph C. Cappelleri, Scott Haughie, Sally Hollis, Steff C. Lewis, Patrick Moneuse & Anne Whitehead, “A tool to assess the quality of a meta-analysis,” 4 Research Synthesis Methods 351, 363 (2013) (“A common error is to double-count individuals in a meta-analysis.”); Alessandro Liberati, Douglas G. Altman, Jennifer Tetzlaff, Cynthia Mulrow, Peter C. Gøtzsche, John P.A. Ioannidis, Mike Clarke, Devereaux, Jos Kleijnen, and David Moher, “The PRISMA Statement for Reporting Systematic Reviews and Meta-Analyses of Studies That Evaluate Health Care Interventions: Explanation and Elaboration,” 151 Ann. Intern. Med. W-65, W-75 (2009) (“Some studies are published more than once. Duplicate publications may be difficult to ascertain, and their inclusion may introduce bias. We advise authors to describe any steps they used to avoid double counting and piece together data from multiple reports of the same study (e.g., juxtaposing author names, treatment comparisons, sample sizes, or outcomes).”) (internal citations omitted); Erik von Elm, Greta Poglia; Bernhard Walder, and Martin R. Tramèr, “Different patterns of duplicate publication: an analysis of articles used in systematic reviews,” 291 J. Am. Med. Ass’n 974 (2004); John Andy Wood, “Methodology for Dealing With Duplicate Study Effects in a Meta-Analysis,” 11 Organizational Research Methods 79, 79 (2008) (“Dependent studies, duplicate study effects, nonindependent studies, and even covert duplicate publications are all terms that have been used to describe a threat to the validity of the meta-analytic process.”) (internal citations omitted); Martin R. Tramèr, D. John M. Reynolds, R. Andrew Moore, Henry J. McQuay, “Impact of covert duplicate publication on meta­analysis: a case study,” 315 Brit. Med. J. 635 (1997); Beverley J Shea, Jeremy M Grimshaw, George A. Wells, Maarten Boers, Neil Andersson, Candyce Hamel, Ashley C. Porter, Peter Tugwell, David Moher, and Lex M. Bouter, “Development of AMSTAR: a measurement tool to assess the methodological quality of systematic reviews,” 7(10) BMC Medical Research Methodology 2007 (systematic reviews must inquire whether there was “duplicate study selection and data extraction”).

Beware the Academic-Publishing Complex!

January 11th, 2012

Today’s New York Times contains an important editorial on an attempt by some congressmen to undermine access to federally funded research.  See Michael B. Eisen, “Research Bought, Then Paid ForNew York Times (January 11, 2012).  Eisen’s editorial alerts us to this attempt to undo a federal legal requirement that requires federally funded medical research be made available, for free, on the National Library of Medicine’s Web site (NLM).

As a founder of the Public Library of Science (PLoS), which is committed to promoting and implementing the free distribution of scientific research, Eisen may be regarded as an “interested” ora  biased commentator.  Such a simple-minded ascription of bias would be wrong. The PLoS has become an important distribution source of research results in the world of science, and competes with the publishing oligarchies:  Elsevier, Springer, and others.  The articles of the sort that PLoS makes available for free are sold by publishers for $40 or more.  Subscriptions from these oligarchical sources are often priced in the thousands of dollars per year. Eisen’s simple and unassailable point is that the public, whether the medical profession, patients and citizens, students and teachers, should be able to read about the results of research funded with their tax monies.

“[I]f the taxpayers paid for it, they own.”

The United States government and its employees do not enjoy copyright protections for their creative work (and they do not), neither should their contractors.

Public access is all the more important given that the mainstream media seems so reluctant or unable to cover scientific research in a thoughtful and incisive way.

The Bill goes beyond merely unraveling a requirement of making published papers available free of charge at the NLM.    The language of the Bill, H.R.3699, the Research Works Act, creates a false dichotomy between public and private sector research:

 “SEC. 2. LIMITATION ON FEDERAL AGENCY ACTION.

No Federal agency may adopt, implement, maintain, continue, or otherwise engage in any policy, program, or other activity that—

(1) causes, permits, or authorizes network dissemination of any private-sector research work without the prior consent of the publisher of such work … .”

Work that is conducted in private or in state universities, but funded by the federal taxpayers, cannot be said to be “private” in any meaningful sense.  The public’s access to this research, as well as its underlying data, is especially important when the subject matter of the research involves issues that are material to public policy and litigation disputes.

Who is behind this bailout for the private-sector publishing industry?  Congressman Darrell Issa (California) introduced the Bill, on December 16, 2011.  The Bill was cosponsored by Congresswoman Carolyn B. Maloney, the Democratic representative of New York’s 14th district.  Oh Lord, Congresswoman Maloney represents me!  NOT.  How humiliating to be associated with this regressive measure.

This heavy-handed piece of legislation was referred to the House Committee on Oversight and Government Reform.  Let us hope it dies a quick death in committee.  See Michael Eisen, “Elsevier-funded NY Congresswoman Carolyn Maloney Wants to Deny Americans Access to Taxpayer Funded Research” (Jan. 5, 2012).