TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

How Access to a Protocol and Underlying Data Gave Yale Researchers a Big Black Eye

April 13th, 2024

Prelude to Litigation

Phenylpropanolamine (PPA) was a widely used direct α-adrenergic agonist used as a medication to control cold symptoms and to suppress appetite for weight loss.[1] In 1972, an over-the-counter (OTC) Advisory Review Panel considered the safety and efficacy of PPA-containing nasal decongestant medications, leading, in 1976, to a recommendation that the agency label these medications as “generally recognized as safe and effective.” Several years later, another Panel recommended that PPA-containing weight control products also be recognized as safe and effective.

Six years later, in 1982, another FDA panel recommended that PPA be considered safe and effective for appetite suppression in dieting.  Two epidemiologic studies of PPA and hemorrhagic stroke were conducted in the 1980s. The results of one study by Hershel Jick and colleagues, presented as a letter to the editor, reported a relative risk of 0.58, with a 95% exact confidence interval, 0.03 – 2.9.[2] A year later, two researchers, reporting a study based upon Medicaid databases, found no significant associations between HS and PPA.[3]

The FDA, however, did not approve a final monograph for PPA, with recognition of its “safe and effective” status because of occasional reports of hemorrhagic stroke that occurred in patients who used PPA-containing medications, mostly young women who had used PPA appetite suppressants for dieting. In 1982, the FDA requested information on the effects of PPA on blood pressure, particularly with respect to weight-loss medications. The agency deferred a proposed 1985 final monograph because of the blood pressure issue.

The FDA deemed the data inadequate to answer its safety concerns. Congressional and agency hearings in the early 1990s amplified some public concern, but in 1990, the Director of Cardio-Renal Drug Products, at the Center for Drug Evaluation and Research, found several well-supported facts, based upon robust evidence. Blood pressure studies in humans showed a biphasic response. PPA initially causes blood pressure to rise above baseline (a pressor effect), and then to fall below baseline (depressor effect). These blood pressure responses are dose-related, and diminish with repeated use. Patients develop tolerance to the pressor effects within a few hours. The Center concluded that at doses of 50 mg of PPA and below, the pressor effects of the medication are smaller, indeed smaller than normal daily variations in basal blood pressure. Humans develop tolerance to the pressor effects quickly, within the time frame of a single dose. The only time period in which even a theoretical risk might exist is within a few hours, or less, of a patient’s taking the first dose of PPA medication. Doses of 25 mg. immediate-release PPA could not realistically be considered to pose any “absolute safety risk and have a reasonable safety margin.”[4]

In 1991, Dr. Heidi Jolson, an FDA scientist wrote that the agency’s spontaneous adverse event reporting system “suggested” that PPA appetite suppressants increased the risk of cerebrovascular accidents. A review of stroke data, including the adverse event reports, by epidemiology consultants failed to support a causal association between PPA and hemorrhagic stroke (HS). The reviewers, however, acknowledged that the available data did not permit them to rule out a risk of HS. The FDA adopted the reviewers’ recommendation for a prospective, large case-control study designed to take into account the known physiological effects of PPA on blood pressure.[5]

What emerged from this regulatory indecision was a decision to conduct another epidemiologic study. In November 1992, a manufacturers’ group, now known as the Consumer Healthcare Products Association (CHPA) proposed a case-control study that would become known as the Hemorrhagic Stroke Project (HSP). In March 1993, the group submitted a proposed protocol, and a suggestion that the study be conducted by several researchers at Yale University. After feedback from the public and the Yale researchers, the group submitted a final protocol in April 1994. Both the researchers and the sponsors agreed to a scientific advisory group that would operate independently and oversee the study. The study began in September 1994. The FDA deferred action on a final monograph for PPA, and product marketing continued.

The Yale HSP authors delivered their final report on their case-control study to FDA, in May 2000.[6] The HSP was a study, with 702 HS cases, and over 1,376 controls, men and women, ages 18 to 49. The report authors concluded that “the results of the HSP suggest that PPA increases the risk for hemorrhagic stroke.”[7] The study had taken over five years to design, conduct, and analyze. In September 2000, the FDA’s Office of Post-Marketing Drug Risk Assessment released the results, with its own interpretation and conclusion that dramatically exceeded the HSP authors’ own interpretation.[8] The FDA’s Non-Prescription Drug Advisory Committee then voted, on October 19, 2000, to recommend that PPA be reclassified as “unsafe.” The Committee’s meeting, however, was attended by several leading epidemiologists who pointed to important methodological problems and limitations in the design and execution of the HSP.[9]

In November 2000, the FDA” Nonprescription Drugs Advisory Committee determined that there was a significant association PPA and HS, and recommended that PPA not be considered safe for OTC use. The FDA never addressed causality; nor did it have to do so under governing law. The FDA’s actions led the drug companies voluntarily to withdraw PPA-containing products.

The December 21, 2000, issue of The New England Journal of Medicine featured a revised version of the HSP report as its lead article.[10] Under the journal’s guidelines for statistical reporting, the authors were required to present two-tailed p-values or confidence intervals. Results from the HSP Final Report looked considerably less impressive after the obtained significance probabilities were doubled. Only the finding in appetite suppressant use was branded an independent risk factor:

“The results suggest that phenylpropanolamine in appetite suppressants, and possibly in cough and cold remedies, is an independent risk factor for hemorrhagic stroke in women.”[11]

The HSP had multiple pre-specified aims, and several other statistical comparisons and analyses were added along the way. No statistical adjustment was made for these multiple comparisons, but their presence in the study must be considered. Perhaps that is why the authors merely suggest that PPA in appetite suppressants was an independent risk factor for HS in women. Under current statistical guidelines for the New England Journal of Medicine, this suggestion might require even further qualification and weakening.[12]

The HSP study faced difficult methodological issues. The detailed and robust identification of PPA’s blood pressure effects in humans focused attention on the crucial timing of timing of a HS in relation to ingestion of a PPA medication. Any use, or any use within the last seven or 30 days, would be fairly irrelevant to the pathophysiology of a cerebral hemorrhage. The HSP authors settled on a definition of “first use” as any use of a PPA product within 24 hours, and no other uses in the previous two weeks.[13] Given the rapid onset of pressor and depressor effects, and adaptation response, this definition of first use was generous and likely included many irrelevant exposed cases, but at least the definition attempted to incorporate the phenomena of short-lived effect and adaption. The appetite suppressant association did not involve any “first use,” which makes the one “suggested” increase risk much less certain and relevant.

The alternative definition of exposure, in addition to “first use,” the ingestion of the PPA-containing medication took place as “the index day before the focal time and the preceding three calendar days.” Again, given the known pharmacokinetics and physiological effects of PPA, this three-day (plus) window seems doubtfully relevant.

All instances of “first use” occurred among men and women who used a cough or cold remedy, with an adjusted OR of 3.14, with a 95% confidence interval (CI), of 0.96–10.28), p = 0.06. The very wide confidence interval, in excess of an order of magnitude, reveals the fragility of the statistical inference. There were but 8 first use exposed stroke cases (out of 702), and 5 exposed controls (out of 1,376).

When this first use analysis is broken down between men and women, the result becomes even more fragile. Among men, there was only one first use exposure in 319 male HS patients, and one first use exposure in 626 controls, for an adjusted OR of 2.95, CI 0.15 – 59.59, and p = 0.48. Among women, there were 7 first use exposures among 383 female HS patients, and 4 first use exposures among 750 controls, with an adjusted OR of 3.13, CI 0.86 – 11.46, p = 0.08.

The small numbers of actual first exposure events speak loudly for the inconclusiveness and fragility of the study results, and the sensitivity of the results to any methodological deviations or irregularities. Of course, for the one “suggested” association for appetite suppressant use among women, the results were even more fragile. None of the appetite suppressant cases were “first use,” which raises serious questions whether anything meaningful was measured. There were six (non-first use) exposed among 383 female HS patients, with only a single exposed female control among 750. The authors presented an adjusted OR of 15.58, with a p-value of 0.02. The CI, however, spanned more than two orders of magnitude, 1.51 – 182.21, which makes the result well-nigh uninterpretable. One of six appetite suppressant cases was also a user of cough-cold remedies, and she was double counted in the study’s analyses. This double-counted case, had a body-mass index of 19, which is certainly not overweight, and at the low end of normal.[14] The one appetite suppressant control was obese.

For the more expansive any exposure analysis for use of PPA cough-cold medication, the results were significantly unimpressive. There were six exposed male cases among 391 male HS cases, and 13 exposed controls, for an adjusted odds ratio of 0.62, CI 0.20 – 1.92, p = 0.41. Although not an inverse association, the sample results for men were incompatible with a hypothetical doubling of risk. For women, on the expansive exposure definition, there were 16 exposed cases, among 383 female cases, with 19 exposed controls out of 750 female controls.  The odds ratio for female PPA cough-cold medication was 1.54, CI 0.76 – 3.14, p = 0.23.

Aside from doubts whether the HSP measured meaningful exposures, the small number of exposed cases and controls present insuperable interpretative difficulties for the study. First, working with a case-control design and odds ratios, there should be some acknowledgment that odds ratios always exaggerate the observed association size compared with a relative risk.[15] Second, the authors knew that confounding would be an important consideration in evaluating any observed association. Known and suspected risk factors were consistently more prevalent among cases than controls.[16]

The HSP authors valiantly attempted to control for confounding in two ways. They selected controls by a technique known as random digit dialing, to find two controls for each case, matched on telephone exchange, sex, age, and race. The HSP authors, however, used imperfectly matched controls rather than lose the corresponding case from their study.[17] For other co-variates, the authors used multi-variate logistic regression to provide odds ratios that were adjusted for potential confounding from the measured covariates. At least two of co-variates, alcohol and cocaine use, in the population under age 50 sample involved potential legal or moral judgment, which almost certainly would have skewed interview results.

An even more important threat to methodological validity, key co-variates, such as smoking, alcohol use, hypertension, and cocaine use were incorporated into the adjustment regression as dichotomous variables; body mass index was entered as a polychotomous variable. Monte Carlo simulation shows that categorizing a continuous variable in logistic regression results in inflating the rate of finding false positive associations.[18] The type I (false-positive) error rates increases with sample size, with increasing correlation between the confounding variable and outcome of interest, and the number of categories used for the continuous variables. Numerous authors have warned of the cost and danger of dichotomizing continuous variables, in losing information, statistical power, and reliability.[19]  In the field of pharmaco-epidemiology, the bias created by dichotomization of a continuous variable is harmful from both the perspective of statistical estimation and hypothesis testing.[20] Readers will be misled into believing that a study has adjusted for important co-variates with the false allure of fully adjusted model.

Finally, with respect to the use of logistic regression to control confounding and provide adjusted odds ratios, there is the problem of the small number of events. Although the overall sample size is adequate for logistic regression, cell sizes of one, or two, or three, raise serious questions about the use of large-sample statistical methods for analysis of the HSP results.[21]

A Surfeit of Sub-Groups

The study protocol identified three (really four or five) specific goals, to estimate the associations: (1) between PPA use and HS; (2) between HS and type of PPA use (cough-cold remedy or appetite suppression); and (3) in women, between PPA appetite suppressant use and HS, and between PPA first use and HS.[22]

With two different definitions of “exposure,” and some modifications added along the way, with two sexes, two different indications (cold remedy and appetite suppression), and with non-pre-specified analyses such as men’s cough-cold PPA use, there was ample opportunity to inflate the Type I error rate. As the authors of the HSP final report acknowledged, they were able to identify only 60 “exposed” cases and controls.[23] In the context of a large case-controls study, the authors were able to identify some nominally statistically significant outcomes (PPA appetite suppressant and HS), but these were based upon very small numbers (six and one exposed, cases and controls, respectively), which made the results very uncertain considering the potential biases and confounding.

Design and Implementation Problems

Case-control studies always present some difficulty of obtaining controls that are similar to cases except that they did not experience the outcome of interest. As noted, controls were selected using “random digit dialing” in the same area code as the cases. The investigators were troubled by poor response rates from potential controls. They deviated from standard methodology for enrolling controls through random digit dialing by enrolling the first eligible control who agreed to participate, while failing to call back candidates who had asked to speak at another time.[24]

The exposure prevalence rate among controls was considerably lower than shown from PPA-product marketing research. This again raises questions about the low reported exposure rates among controls, which would inflate any observed odds ratios. Of course, it seems eminently reasonable to predict that persons who were suffering from head colds or the flu might not answer their phones or might request a call back. People who are obese might be reluctant to tell a stranger on the telephone that they are using a medication to suppress their appetite.

In the face of this obvious opportunity for selection bias, there was also ample room for recall bias. Cases were asked about medication use just before a unforgettable catastrophic event in their lives. Controls were asked about medication use before a day within the range of the previous week. More controls were interviewed by phone than were cases. Given the small number of exposed cases and controls, recall bias created by the differential circumstances and interview settings and procedures, was never excluded.

Lumpen Epidemiology ICH vs SAH

Every epidemiologic study or clinical trial has an exposure and outcome of interest, in a population of interest. The point is to compare exposed and unexposed persons, of relevant age, gender, and background, with comparable risk factors other than the exposure of interest, to determine if the exposure makes any difference in the rate of events of the outcome of interest.

Composite end points represent “lumping” together different individual end points for consideration as a single outcome. The validity of composite end points depends upon assumptions, which will have to be made at the time investigators design their study and write their protocol.  After the data are collected and analyzed, the assumptions may or may not be supported.

Lumping may offer some methodological benefits, such as increasing statistical power or reducing sample size requirements. Standard epidemiologic practice, however, as reflected in numerous textbooks and methodology articles, requires the reporting of the individual constitutive end points, along with the composite result. Even when the composite end point was employed based upon a view that the component end points are sufficiently related, that view must itself ultimately be tested by showing that the individual end points are, in fact, concordant, with risk ratios in the same direction.

There are many clear statements that caution the consumers of medical studies against being misled by misleading claims that may be based upon composite end points, in the medical literature.  In 2004, the British Medical Journal published a useful paper, “Users’ guide to detecting misleading claims in clinical research reports,” One of the authors’ suggestions to readers was:

“Beware of composite endpoints.”[25]

The one methodological point to which virtually all writers agree is that authors should report the results for the composite end point separately to permit readers to evaluate the individual results.[26]  A leading biostatistical methodologist, the late Douglas Altman, cautioned readers against assuming that the overall estimate of association can be interpreted for each individual end point, and advised authors to provide “[a] clear listing of the individual endpoints and the number of participants experiencing them” to permit a more meaningful interpretation of composite outcomes.[27]

The HSP authors used a composite of hemorrhagic strokes, which was composed of both intracerebral hemorrhages (ICH) and subarachnoid hemorrhages (SAH). In their New England Journal of Medicine article, the authors presented the composite end point, but not the risk ratios for the two individual end points. Before they published the article, one of the authors wrote his fellow authors to advise them that because ICH and SAH are very different medical phenomena, they should present the individual end points in their analysis.[28]

The HSP researchers eventually did publish an analysis of SAH and PPA use.[29] The authors identified 425 SAH cases, of which 312 met the criteria for aneurysmal SAH. They looked at many potential risk factors such as smoking (OR = 5.07), family history (OR = 3.1), marijuana (OR = 2.38), cocaine (OR = 24.97), hypertension (OR = 2.39), aspirin (OR = 1.24), alcohol (OR = 2.95), education, as well as PPA.

Only a bivariate analysis was presented for PPA, with an odds ratio of 1.15, p = 0.87. No confidence intervals were presented. The authors were a bit more forthcoming about the potential role of bias and confounding in this publication than they were in their earlier 2000 HSP paper. “Biases that might have affected this analysis of the HSP include selection and recall bias.”[30]

Judge Rothstein’s Rule 702 opinion reports that the “Defendants assert that this article demonstrates the lack of an association between PPA and SAHs resulting from the rupture of an aneurysm.”[31] If the defendants actually claimed a “demonstration” of “the lack of association,” then shame, and more shame, on them! First, the cited study provided only a bivariate analysis for PPA and SAH. The odds ratio of 1.15 pales in comparison the risk ratios reported for many other common exposures. We can only speculate what happens to the 1.15, when the PPA exposure is placed in a fully adjusted model for all important covariates. Second, the p-value of 0.87 does not tell that 1.15 is unreal or due to chance. The HSP reported a 15% increase in odds ratio, which is very compatible with no risk at all. Perhaps if the defendants had been more modest in their characterization they would not have given the court the basis to find that “defendants distort and misinterpret the Stroke Article.”[32]

Rejecting the defendants’ characterization, the court drew upon an affidavit from plaintiffs’ expert witness, Kenneth Rothman, who explained that a p-value cannot provide evidence of lack of an effect.[33] A high p-value, with its corresponding 95% confidence interval that includes 1.0, can, however, show that the sample data are compatible with the null hypothesis. What Judge Rothstein missed, and the defendants may not have said effectively, is that the statistical analysis was a test of an hypothesis, and the test failed to allow us to reject the null hypothesis.  The plaintiffs were left with an indeterminant analysis, from which they really could not honestly claim an association between PPA use and aneurismal SAH.

I Once Was Blind, But Now I See

The HSP protocol called for interviewers to be blinded to the study hypothesis, but this guard against bias was abandoned.[34]  The HSP report acknowledged that “[b]linding would have provided extra protection against unequal ascertainment of PPA exposure in case subjects compared with control subjects.”[35]

The study was conducted out of four sites, and at least one of the sites violated protocol by informing cases that they were participating in a study designed to evaluate PPA and HS.[36] The published article in the New England Journal of Medicine misleadingly claimed that study participants were blinded to its research hypothesis.[37] Although the plaintiffs’ expert witnesses tried to slough off this criticism, the lack of blinding among interviewers and study subjects amplifies recall biases, especially when study subjects and interviewers may have been reluctant to discuss fully several of the co-variate exposures, such as cocaine, marijuana, and alcohol use.[38]

No Causation At All

Scientists and the general population alike have been conditioned to view the controversy over tobacco smoking and lung cancer as a contrivance of the tobacco industry. What is lost in this conditioning is the context of Sir Arthur Bradford Hill’s triumphant 1965 Royal Society of Medicine presidential address. Hill, along with his colleague Sir Richard Doll, were not overly concerned with the tobacco industry, but rather the important methodological criticisms  posited by three leading statistical scientists, Joseph Berkson, Jerzy Neyman, and Sir Ronald Fisher. Hill and Doll’s success in showing that tobacco smoking causes lung cancer required sufficient rebuttal to these critics. The 1965 speech is often cited for its articulation of nine factors to consider in evaluating an association, but the necessary condition is often overlooked. In his speech, Hill identified the situation before the nine factors come into play:

“Disregarding then any such problem in semantics we have this situation. Our observations reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance. What aspects of that association should we especially consider before deciding that the most likely interpretation of it is causation?”[39]

The starting point, before the Bradford Hill nine factors come into play, requires a “clear-cut” association, which is “beyond what we would care to attribute to the play of chance.”  What is “clear-cut” association?  The most reasonable interpretation of Bradford Hill is that the starting point is an association that is not the result of chance, bias, or confounding.

Looking at the state of the science after the HSP was published, there were two studies that failed to find any association between PPA and HS. The HSP authors “suggested” an association between PPA appetite suppressant and HS, but with six cases and one control, this was hardly beyond the play of chance. And none of the putative associations were “clear cut” in removing bias and confounding as an explanation for the observations.

And Then Litigation Cometh

A tsunami of state and federal cases followed the publication of the HSP study.[40] The Judicial Panel on Multi-district Litigation gave Judge Barbara Rothstein, in the Western District of Washington, responsibility for the pre-trial management of the federal PPA cases. Given the problems with the HSP, the defense unsurprisingly lodged Rule 702 challenges to plaintiffs’ expert witnesses’ opinions, and Rule 703 challenges to reliance upon the HSP.[41]

In June 2003, Judge Rothstein issued her decision on the defense motions. After reviewing a selective regulatory history of PPA, the court turned to epidemiology, and its statistical analysis.  Although misunderstanding of p-values and confidence intervals is endemic among the judiciary, the descriptions provided by Judge Rothstein portended a poor outcome:

“P-values measure the probability that the reported association was due to chance, while confidence intervals indicate the range of values within which the true odds ratio is likely to fall.”[42]

Both descriptions are seriously incorrect,[43] which is especially concerning given that Judge Rothstein would go on, in 2003, to become the director of the Federal Judicial Center, where she would oversee work on the Reference Manual on Scientific Evidence.

The MDL court also managed to make a mash out of the one-tailed test used in the HSP report. That report was designed to inform regulatory action, where actual conclusions of causation are not necessary. When the HSP authors submitted their paper to the New England Journal of Medicine, they of course had to comply with the standards of that journal, and they doubled their reported p-values to comply with the journal’s requirement of using a two-tailed test. Some key results of the HSP no longer had p-values below 5 percent, as the defense was keen to point out in its briefings.

From the sources it cited, the court clearly did not understand the issue, which was the need to control for random error. The court declared that it had found:

“that the HSP’s one-tailed statistical analysis complies with proper scientific methodology, and concludes that the difference in the expression of the HSP’s findings [and in the published article] falls far short of impugning the study’s reliability.”[44]

This finding ignores the very different contexts between regulatory action and causation in civil litigation. The court’s citation to an early version of the Reference Manual on Scientific Evidence further illustrates its confusion:

“Since most investigators of toxic substances are only interested in whether the agent increases the incidence of disease (as distinguished from providing protection from the disease), a one-tailed test is often viewed as appropriate.”

*****

“a rigid rule [requiring a two-tailed test] is not required if p-values and significance levels are used as clues rather than as mechanical rules for statistical proof.”[45]

In a sense, given the prevalence of advocacy epidemiology, many researchers are interested in only showing an increased risk. Nonetheless, the point of evaluating p-values is to assess random error involved in sampling of a population, and that sampling generates a rate of error even when the null hypothesis is assumed to be absolutely correct. Random error can go in either direction, resulting in risk ratios above or below 1.0. Indeed, the probability of observing a risk ratio of exactly 1.0, in a large study, is incredibly small even if the null hypothesis is correct. The risk ratio for men who had used a PPA product was below 1.0, which also recommends a two-tailed test. Trading on the confusion of regulatory and litigation findings, the court proceeded to mischaracterize the parties’ interests in designing the HSP, as only whether PPA increased the risk of stroke. In the MDL, the parties did not want “clues,” or help on what FDA policy should be; they wanted a test of the causal hypothesis.

In a footnote, the court pointed to testimony of Dr. Ralph Horwitz, one of the HSP investigators, who stated that all parties “[a]ll parties involved in designing the HSP were interested solely in testing whether PPA increased the risk of stroke.” The parties, of course, were not designing the HSP for support for litigation claims.[46] The court also cited, in this footnote, a then recent case that found a one-tailed p-value inappropriate “where that analysis assumed the very fact in dispute.” The plaintiffs’ reliance upon the one-sided p-values in the unpublished HSP report did exactly that.[47] The court tried to excuse the failure to rule out random error by pointing to language in the published HSP article, where the authors stated that inconclusive findings raised “concern regarding  safety.”[48]

In analyzing the defense challenge to the opinions based upon the HSP, Judge Rothstein committed both legal and logical fallacies. First, citing Professor David Faigman’s treatise for the proposition that epidemiology is widely accepted because the “general techniques are valid,” the court found that the HSP, and reliance upon it, was valid, despite the identified problems. The issue was not whether epidemiological techniques are valid, but whether the techniques used in the HSP were valid. The devilish details of the HSP in particular largely went ignored.[49] From a legal perspective, Judge Rothstein’s opinion can be seen to place a burden upon the defense to show invalidity, by invoking a presumption of validity. This shifting of the burden was then, and is now, contrary to the law.

Perhaps the most obvious dodge of the court’s gatekeeping responsibility came with the conclusory assertion that the “Defendants’ ex post facto dissection of the HSP fails to undermine its reliability. Scientific studies almost invariably contain flaws.”[50] Perhaps it is sobering to consider that all human beings have flaws, and yet somehow we distinguish between sinners and saints, and between criminals and heroes. The court shirked its responsibility to look at the identified flaws to determine whether they threatened the HSP’s internal validity, as well as its external validity in the plaintiffs’ claims for hemorrhagic strokes in each of the many subgroups considered in the HSP, as well as outcomes not considered, such as myocardial infarction and ischemic stroke. Given that there was but one key epidemiologic study relied upon for support of the plaintiffs’ extravagant causal claims, the identified flaws might be expected to lead to some epistemic humility.

The PPA MDL court exhibited a willingness to cherry pick HSP results to support its low-grade gatekeeping. For instance, the court recited that “[b]ecause no men reported use of appetite suppressants and only two reported first use of a PPA-containing product, the investigators could not determine whether PPA posed an increased risk for hemorrhagic stroke in men.”[51] There was, of course, another definition of PPA exposure that yielded a total of 19 exposed men, about one-third of all exposed cases and controls. All exposed men used OTC PPA cough cold remedies, six men with HS, and 13 controls, with a reported odds ratio of 0.62 (95%, C.I., 0.20 – 1.92); p = 0.41. Although the result for men was not statistically significant, the point estimate for the sample was a risk ratio below one, with a confidence interval that excludes a doubling of the risk based upon this sample statistic. The number of male HS exposed cases was the same as the number of female HS appetite suppressant cases, which somehow did not disturb the court.

Superficially, the PPA MDL court appeared to place great weight on the fact of peer review publication in a prestigious journal, by well-credentialed scientists and clinicians. Given that “[t]he prestigious NEJM published the HSP results …  research bears the indicia of good science.”[52] Although Professor Susan Haack’s writings on law and science are often errant, her analysis of this kind of blind reliance on peer review is noteworthy:

“though peer-reviewed publication is now standard practice at scientific and medical journals, I doubt that many working scientists imagine that the fact that a work has been accepted for publication after peer review is any guarantee that it is good stuff, or that it’s not having been published necessarily undermines its value. The legal system, however, has come to invest considerable epistemic confidence in peer-reviewed publication  — perhaps for no better reason than that the law reviews are not peer-reviewed!”[53]

Ultimately, the PPA MDL court revealed that it was quite inattentive to the validity concerns of the HSP. Among the cases filed in the federal court were heart attack and ischemic stroke claims.  The HSP did not address those claims, and the MDL court was perfectly willing to green light the claims on the basis of case reports and expert witness hand waving about “plausibility.”  Not only was this reliance upon case reports plus biological plausibility against the weight of legal authority, it was against the weight of scientific opinion, as expressed by the HSP authors themselves:

“Although the case reports called attention to a possible association between the use of phenylpropanolamine and the risk of hemorrhagic stroke, the absence of control subjects meant that these studies could not produce evidence that meets the usual criteria for valid scientific inference”[54]

Since no epidemiology was necessary at all for ischemic stroke and myocardial infarction claims, then a deeply flawed epidemiologic study was thus even better than nothing. And peer review and prestige were merely window dressing.

The HSP study was subjected to much greater analysis in actual trial litigation.  Before the MDL court concluded its abridged gatekeeping, the defense successfully sought the underlying data to the HSP. Plaintiffs’ counsel and the Yale investigators resisted and filed motions to quash the defense subpoenas. The MDL court denied the motions and required the parties to collaborate on redaction of medical records to be produced.[55]

In a law review article published a few years after the PPA Rule 702 decision, Judge Rothstein immodestly described the PPA MDL as a “model mass tort,” and without irony characterized herself as having taken “an aggressive role in determining the admissibility of scientific evidence [].”[56]

The MDL court’s PPA decision stands as a landmark of judicial incuriousness and credulity.  The court conducted hearings and entertaining extensive briefings on the reliability of plaintiffs’ expert witnesses’ opinions, which were based largely upon one epidemiologic study, known as the “Yale Hemorrhagic Stroke Project (HSP).”  In the end, publication in a prestigious peer-reviewed journal proved to be a proxy for independent review and an excuse not to exercise critical judgment: “The prestigious NEJM published the HSP results, further substantiating that the research bears the indicia of good science.” Id. at 1239 (citing Daubert II for the proposition that peer review shows the research meets the minimal criteria for good science). The admissibility challenges were refused.

Exuberant Praise for Judge Rothstein

In 2009, an American Law Institute – American Bar Association continuing legal education seminar on expert witnesses and environmental litigation, Anthony Roisman presented on “Daubert & Its Progeny – Finding & Selecting Experts – Direct & Cross-Examination.” Roisman has been active in various plaintiff advocacy organizations, including serving as the head of the American Trial Lawyers’ Association Section on Toxic, Environmental & Pharmaceutical Torts (STEP). In his 2009 lecture, Roisman praised Rothstein’s PPA Rule 702 decision as “the way Daubert should be interpreted.” More concerning was Roisman’s revelation that Judge Rothstein wrote the PPA decision, “fresh from a seminar conducted by the Tellus Institute, which is an organization set up of scientists to try to bring some common sense to the courts’ interpretation of science, which is what is going on in a Daubert case.”[57]

Roisman’s endorsement of the PPA decision may have been purely result-oriented jurisprudence, but what of his enthusiasm for the “learning” that Judge Rothstein received fresh from the Tellus Institute.  What exactly is or was the Tellus Institute?

In June 2003, the same month as Judge Rothstein’s PPA decision, the Tellus Institute supported a group known as Scientific Knowledge and Public Policy (SKAPP), in publishing an attack on the Daubert decision. The Tellus-SKAPP paper, “Daubert: The Most Influential Supreme Court Ruling You’ve Never Heard Of,” appeared online in 2003.[58]

David Michaels, a plaintiffs’ expert in chemical exposure cases, and a founder of SKAPP, has typically described his organization as having been funded by the Common Benefit Trust, “a fund established pursuant to a court order in the Silicone Gel Breast Implant Liability litigation.”[59] What Michaels hides is that this “Trust” is nothing other than the common benefits fund set up in MDL 926, as it is for most MDLs, to permit plaintiffs’ counsel to retain and present expert witnesses in the common proceedings. In other words, it was the plaintiffs’ lawyers’ walking-around money. SKAPP’s sister organization, the Tellus Institute is clearly aligned with SKAPP. Alas, Richard Clapp, who was a testifying expert witness for PPA plaintiffs, was an active member of the Tellus Institute, at the time of the judicial educational seminar for Judge Rothstein.[60] Clapp is listed as a member of the planning committee responsible for preparing the anti-Daubert pamphlet. In 2005, as director of the Federal Judicial Center, Judge Rothstein attended another conference, “the Coronado Conference, which was sponsored by SKAPP.[61]

Roisman’s revelation in 2009, after the dust had settled on the PPA litigation, may well put Judge Rothstein in the same category as Judge James Kelly, against whom the U.S. Court of Appeals for the Third Circuit issued a writ of mandamus for recusal. Judge Kelly was invited to attend a conference on asbestos medical issues, set up by Dr. Irving Selikoff with scientists who testified for plaintiffs’ counsel. The conference was funded by plaintiffs’ counsel. The co-conspirators, Selikoff and plaintiffs’ counsel, paid for Judge Kelly’s transportation and lodgings, without revealing the source of the funding.[62]

In the case of Selikoff and Motley’s effort to subvert the neutrality of Judge James M. Kelly in the school district asbestos litigation, and pervert the course of justice, the conspiracy was detected in time for a successful recusal effort. In the PPA litigation, there was no disclosure of the efforts by the anti-Daubert advocacy group, the Tellus Institute, to undermine the neutrality of a federal judge. 

Aftermath of Failed MDL Gatekeeping

Ultimately, the HSP study received much more careful analysis before juries. Although the cases that went to trial involved plaintiffs with catastrophic injuries, and a high-profile article in the New England Journal of Medicine, the jury verdicts were overwhelmingly in favor of the defense.[63]

In the first case that went to trial (but second to verdict), the defense presented a thorough scientific critique of the HSP. The underlying data and medical records that had been produced in response to a Rule 45 subpoena in the MDL allowed juries to see that the study investigators had deviated from the protocol in ways to increase the number of exposed cases, with the obvious result of increasing the odds ratios reported. Juries were ultimately much more curious about evidence and testimony on reclassifications of exposure that drove up the odds ratios for PPA use, than they were about the performance of linear logistic regressions.

The HSP investigators were well aware of the potential for medication use to occur after the onset of stroke symptoms (headache), which may have sent a person to the medicine chest for an OTC cold remedy. Case 71-0039 was just such a case, as shown by the medical records and the HSP investigators’ initial classification of the case. On dubious grounds, however, the study reclassified the time of stroke onset to after the PPA-medication use, in what the investigators knew increased their chances of finding an association.

The reclassification of Case 20-0092 was even more egregious. The patient was originally diagnosed as having experienced a transient ischemic attack (TIA), after a CT of the head showed no bleed. Case 20-0092 was not a case. For the TIA, the patient was given heparin, an appropriate therapy but one that is known to cause bleeding. The following day, MRI of the head revealed a HS. The HSP classified Case 20-0092 as a case.

In Case 18-0025, the patient experienced a headache in the morning, and took a PPA-medication (Contac) for relief. The stroke was already underway when the Contac was taken, but the HSP reversed the order of events.

Case 62-0094 presented an interesting medical history that included an event no one in the HSP considered including in the interview protocol. In addition to a history of heavy smoking, alcohol, cocaine, heroin, and marijuana use, and a history of seizure disorder, Case 62-0094 suffered a traumatic head injury immediately before developing a SAH. Treating physicians ascribed the SAH to traumatic injury, but understandably there were no controls that were identified with similar head injury within the exposure period.

Both sides of the PPA litigation accused the other of “hacking at the A cell,” but juries seemed to understand that the hacking had started before the paper was published.

In a case involving two plaintiffs, in Los Angeles, where the jury heard the details of how the HSP cases were analyzed, the jury returned two defense verdicts. In post-trial motions, plaintiffs’ counsel challenged the defendant’s reliance upon underlying data in the HSP, which went behind the peer-reviewed publication, and which showed that the peer review failed to prevent serious errors.  In essence, the plaintiffs’ counsel claimed that the defense’s scrutiny of the underlying data and investigator misclassifications were themselves not “generally accepted” methods, and thus inadmissible. The trial court rejected the plaintiffs’ claim and their request for a new trial, and spoke to the significance of challenging the superficial significance of peer review of the key study relied upon by plaintiffs in the PPA litigation:

“I mean, you could almost say that there was some unethical activity with that Yale Study.  It’s real close.  I mean, I — I am very, very concerned at the integrity of those researchers.

********

Yale gets — Yale gets a big black eye on this.”[64]

Epidemiologist Charles Hennekens, who had been a consultant to PPA-medication manufacturers, published a critique of the HSP study, in 2006. The Hennekens critique included many of the criticisms lodged by himself, as well as by epidemiologists Lewis Kuller, Noel Weiss, and Brian Strom, back in an October 2000 FDA meeting, before the HSP was published. Richard Clapp, Tellus Institute activist and expert witness for PPA plaintiffs, and Michael Williams, lawyer for PPA claimants, wrote a letter criticizing Hennekens.[65] David Michaels, an expert witness for plaintiffs in other chemical exposure cases, and a founder of SKAPP, which collaborated with the Tellus Institute on its anti-Daubert compaign, wrote a letter accusing Hennekens of “mercenary epidemiology,” for engaging in re-analysis of a published study. Michaels never complained about the litigation-inspired re-analyses put forward by plaintiffs’ witnesses in the Bendectin litigation.  Plaintiffs’ lawyers and their expert witnesses had much to gain by starting the litigation and trying to expand its reach. Defense lawyers and their expert witnesses effectively put themselves out of business by shutting it down.[66]


[1] Rachel Gorodetsky, “Phenylpropanolamine,” in Philip Wexler, ed., 7 Encyclopedia of Toxicology 559 (4th ed. 2024).

[2] Hershel Jick, Pamela Aselton, and Judith R. Hunter,  “Phenylpropanolamine and Cerebral Hemorrhage,” 323 Lancet 1017 (1984).

[3] Robert R. O’Neill & Stephen W. Van de Carr, “A Case-Control Study of Adrenergic  Decongestants and Hemorrhagic CVA Using a Medicaid Data Base” m.s. (1985).

[4] Ramond Lipicky, Center for Drug Evaluation and Research, PPA, Safety Summary at 29 (Aug. 9, 1900).

[5] Center for Drug Evaluation and Research, US Food and Drug Administration, “Epidemiologic Review of Phenylpropanolamine Safety Issues” (April 30, 1991).

[6] Ralph I. Horwitz, Lawrence M. Brass, Walter N. Kernan, Catherine M. Viscoli, “Phenylpropanolamine & Risk of Hemorrhagic Stroke – Final Report of the Hemorrhagic Stroke Project (May 10, 2000).

[7] Id. at 3, 26.

[8] Lois La Grenade & Parivash Nourjah, “Review of study protocol, final study report and raw data regarding the incidence of hemorrhagic stroke associated with the use of phenylopropanolamine,” Division of Drug Risk Assessment, Office of Post-Marketing Drug Risk Assessment (0PDRA) (Sept. 27, 2000). These authors concluded that the HSP report provided “compelling evidence of increased risk of hemorrhagic stroke in young people who use PPA-containing appetite suppressants. This finding, taken in association with evidence provided by spontaneous reports and case reports published in the

medical literature leads us to recommend that these products should no longer be available for over the counter use.”

[9] Among those who voiced criticisms of the design, methods, and interpretation of the HSP study were Noel Weiss, Lewis Kuller, Brian Strom, and Janet Daling. Many of the criticisms would prove to be understated in the light of post-publication review.

[10] Walter N. Kernan, Catherine M. Viscoli, Lawrence M. Brass, J.P. Broderick, T. Brott, and Edward Feldmann, “Phenylpropanolamine and the risk of hemorrhagic stroke,” 343 New Engl. J. Med. 1826 (2000) [cited as Kernan]

[11] Kernan, supra note 10, at 1826 (emphasis added).

[12] David Harrington, Ralph B. D’Agostino, Sr., Constantine Gatsonis, Joseph W. Hogan, David J. Hunter, Sharon-Lise T. Normand, Jeffrey M. Drazen, and Mary Beth Hamel, “New Guidelines for Statistical Reporting in the Journal,” 381 New Engl. J. Med. 285 (2019).

[13] Kernan, supra note 10, at 1827.

[14] Transcript of Meeting on Safety Issues of Phenylpropanolamine (PPA) in Over-the-Counter Drug Products 117 (Oct. 19, 2000).

[15][15] See, e.g., Huw Talfryn Oakley Davies, Iain Kinloch Crombie, and Manouche Tavakoli, “When can odds ratios mislead?” 316 Brit. Med. J. 989 (1998); Thomas F. Monaghan, Rahman, Christina W. Agudelo, Alan J. Wein, Jason M. Lazar, Karel Everaert, and Roger R. Dmochowski, “Foundational Statistical Principles in Medical Research: A Tutorial on Odds Ratios, Relative Risk, Absolute Risk, and Number Needed to Treat,” 18 Internat’l J. Envt’l Research & Public Health 5669 (2021).

[16] Kernan, supra note 10, at 1829, Table 2.

[17] Kernan, supra note 10, at 1827.

[18] Peter C. Austin & Lawrence J. Brunner, “Inflation of the type I error rate when a continuous confounding variable is categorized in logistic regression analyses,” 23 Statist. Med. 1159 (2004).

[19] See, e.g., Douglas G. Altman & Patrick Royston, “The cost of dichotomising continuous variables,” 332 Brit. Med. J. 1080 (2006); Patrick Royston, Douglas G. Altman, and Willi Sauerbrei, “Dichotomizing continuous predictors in multiple regression: a bad idea,” 25 Stat. Med. 127 (2006). See also Robert C. MacCallum, Shaobo Zhang, Kristopher J. Preacher, and Derek D. Rucker, “On the Practice of Dichotomization of Quantitative Variables,” 7 Psychological Methods 19 (2002); David L. Streiner, “Breaking Up is Hard to Do: The Heartbreak of Dichotomizing Continuous Data,” 47 Can. J. Psychiatry 262 (2002); Henian Chen, Patricia Cohen, and Sophie Chen, “Biased odds ratios from dichotomization of age,” 26 Statist. Med. 3487 (2007); Carl van Walraven & Robert G. Hart, “Leave ‘em Alone – Why Continuous Variables Should Be Analyzed as Such,” 30 Neuroepidemiology 138 (2008); O. Naggara, J. Raymond, F. Guilbert, D. Roy, A. Weill, and Douglas G. Altman, “Analysis by Categorizing or Dichotomizing Continuous Variables Is Inadvisable,” 32 Am. J. Neuroradiol. 437 (Mar 2011); Neal V. Dawson & Robert Weiss, “Dichotomizing Continuous Variables in Statistical Analysis: A Practice to Avoid,” Med. Decision Making 225 (2012); Phillippa M Cumberland, Gabriela Czanner, Catey Bunce, Caroline J Doré, Nick Freemantle, and Marta García-Fiñana, “Ophthalmic statistics note: the perils of dichotomising continuous variables,” 98 Brit. J. Ophthalmol. 841 (2014).

[20] Valerii Fedorov, Frank Mannino1, and Rongmei Zhang, “Consequences of dichotomization,” 8 Pharmaceut. Statist. 50 (2009).

[21] Peter Peduzzi, John Concato, Elizabeth Kemper, Theodore R. Holford, and Alvan R. Feinstein, “A simulation study of the number of events per variable in logistic regression analysis?” 49 J. Clin. Epidem. 1373 (1996).

[22] HSP Final Report at 5.

[23] HSP Final Report at 26.

[24] Byron G. Stier & Charles H. Hennekens, “Phenylpropanolamine and Hemorrhagic Stroke in the Hemorrhagic Stroke Project: A Reappraisal in the Context of Science, the Food and Drug Administration, and the Law,” 16 Ann. Epidem. 49, 50 (2006) [cited as Stier & Hennekens].

[25] Victor M. Montori, Roman Jaeschke, Holger J. Schünemann, Mohit Bhandari, Jan L Brozek, P. J. Devereaux, and Gordon H. Guyatt, “Users’ guide to detecting misleading claims in clinical research reports,” 329 Brit. Med. J. 1093 (2004). 

[26] Wolfgang Ahrens & Iris Pigeot, eds., Handbook of Epidemiology 1840 (2d ed. 2014) (47.5.8 Use of Composite Endpoints); Stuart J. Pocock, John J. V. McMurray, and Tim J. Collier, “Statistical Controversies in Reporting of Clinical Trials: Part 2 of a 4-Part Series on Statistics for Clinical Trials,” 66 J. Am. Coll. Cardiol. 2648, 2650-51 (2015) (“Interpret composite endpoints carefully.”); Schulz & Grimes, “Multiplicity in randomized trials I:  endpoints and treatments,” 365 Lancet 1591, 1595 (2005).

[27] Eric Lim, Adam Brown, Adel Helmy, Shafi Mussa & Douglas Altman, “Composite Outcomes in Cardiovascular Research: A Survey of Randomized Trials,” 149 Ann. Intern. Med. 612 (2008).

[28] See, e.g., Thomas Brott email to Walter Kernan (Sept. 10, 2000).

[29] Joseph P. Broderick, Catherine M. Viscoli, Thomas Brott, Walter N. Kernan, Lawrence M. Brass, Edward Feldmann, Lewis B. Morgenstern, Janet Lee Wilterdink, and Ralph I. Horwitz, “Major Risk Factors for Aneurysmal Subarachnoid Hemorrhage in the Young Are Modifiable,” 34 Stroke 1375 (2003).

[30] Id. at 1379.

[31] Id. at 1243.

[32] Id. at 1243.

[33] Id., citing Rothman Affidavit, ¶ 7; Kenneth J. Rothman, Epidemiology:  An Introduction at 117 (2002).

[34] HSP Final Report at 26 (‘‘HSP interviewers were not blinded to the case-control status of study subjects and some were aware of the study purpose’.”); Walter Kernan Dep. at 473-74, In re PPA Prods. Liab. Litig., MDL 1407 (W.D. Wash.) (Sept. 19, 2002).

[35] HSP Final Report at 26.

[36] Stier & Hennekens, note 24 supra, at 51.

[37] NEJM at 1831.

[38] See Christopher T. Robertson & Aaron S. Kesselheim, Blinding as a Solution to Bias – Strengthening Biomedical Science, Forensic Science, and the Law 53 (2016); Sandy Zabell, “The Virtues of Being Blind,” 29 Chance 32 (2016).

[39] Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 295 (1965).

[40] See Barbara J. Rothstein, Francis E. McGovern, and Sarah Jael Dion, “A Model Mass Tort: The PPA Experience,” 54 Drake L. Rev. 621 (2006); Linda A. Ash, Mary Ross Terry, and Daniel E. Clark, Matthew Bender Drug Product Liability § 15.86 PPA (2003).

[41] In re Phenylpropanolamine Prods. Liab. Litig., 289 F.Supp. 2d 1230 (W.D. Wash. 2003).

[42] Id. at 1236 n.1

[43] Michael O. Finkelstein & Bruce Levin, Statistics for Lawyers 171, 173-74 (3rd ed. 2015). See also Sander Greenland, Stephen J. Senn, Kenneth J. Rothman, John B. Carlin, Charles Poole, Steven N. Goodman, and Douglas G. Altman, “Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations,” 31 Eur. J. Epidem. 337 (2016).

[44] In re Phenylpropanolamine Prods. Liab. Litig., 289 F.Supp. 2d 1230, 1241 (W.D. Wash. 2003).

[45] Id. (citing Reference Manual at 126-27, 358 n. 69). The edition of Manual was not identified by the court.

[46] Id. at n.9, citing deposition of Ralph Horowitz [sic].

[47] Id., citing Good v. Fluor Daniel Corp., 222 F.Supp. 2d 1236, 1242-43 (E.D. Wash. 2002).

[48] Id. 1241, citing Kernan at 183.

[49] In re Phenylpropanolamine Prods. Liab. Litig., 289 F.Supp. 2d 1230, 1239 (W.D. Wash. 2003) (citing 2 Modern Scientific Evidence: The Law and Science of Expert Testimony § 28-1.1, at 302-03 (David L. Faigman,  et al., eds., 1997) (“Epidemiologic studies have been well received by courts trying mass tort suits. Well-conducted studies are uniformly admitted. The widespread acceptance of epidemiology is based in large part on the belief that the general techniques are valid.”).

[50] Id. at 1240. The court cited the Reference Manual on Scientific Evidence 337 (2d ed. 2000), for this universal attribution of flaws to epidemiology studies (“It is important to recognize that most studies have flaws. Some flaws are inevitable given the limits of technology and resources.”) Of course, when technology and resources are limited, expert witnesses are permitted to say “I cannot say.” The PPA MDL court also cited another MDL court, which declared that “there is no such thing as a perfect epidemiological study.” In re Orthopedic Bone Screw Prods. Liab. Litig., MDL No. 1014, 1997 WL 230818, at *8-9 (E.D.Pa. May 5, 1997).

[51] Id. at 1236.

[52] Id. at 1239.

[53] Susan Haack, “Irreconcilable Differences?  The Troubled Marriage of Science and Law,” 72 Law & Contemp. Problems 1, 19 (2009) (internal citations omitted). It may be telling that Haack has come to publish much of her analysis in law reviews. See Nathan Schachtman, “Misplaced Reliance On Peer Review to Separate Valid Science From NonsenseTortini (Aug. 14, 2011).

[54] Kernan, supra note 10, at 1831.

[55] In re Propanolamine Prods. Litig., MDL 1407, Order re Motion to Quash Subpoenas re Yale Study’s Hospital Records (W.D. Wash. Aug. 16, 2002). Two of the HSP investigators wrote an article, over a decade later, to complain about litigation efforts to obtain data from ongoing studies. They did not mention the PPA case. Walter N. Kernan, Catherine M. Viscoli, and Mathew C. Varughese, “Litigation Seeking Access to Data From Ongoing Clinical Trials: A Threat to Clinical Research,” 174 J. Am. Med. Ass’n Intern. Med. 1502 (2014).

[56] Barbara J. Rothstein, Francis E. McGovern, and Sarah Jael Dion, “A Model Mass Tort: The PPA Experience,” 54 Drake L. Rev. 621, 638 (2006).

[57] Anthony Roisman, “Daubert & Its Progeny – Finding & Selecting Experts – Direct & Cross-Examination,” ALI-ABA 2009. Roisman’s remarks about the role of Tellus Institute start just after minute 8, on the recording, available from the American Law Institute, and the author.

[58] See Daubert: The Most Influential Supreme Court Ruling You’ve Never Heard Of; A Publication of the Project on Scientific Knowledge and Public Policy, coordinated by the Tellus Institute” (2003).

[59] See, e.g., David Michaels, Doubt is Their Product: How Industry’s War on Science Threatens Your Health 267 (2008).

[60] See Richard W. Clapp & David Ozonoff, “Environment and Health: Vital Intersection or Contested Territory?” 30 Am. J. L. & Med. 189, 189 (2004) (“This Article also benefited from discussions with colleagues in the project on Scientific Knowledge and Public Policy at Tellus Institute, in Boston, Massachusetts.”).

[61] See Barbara Rothstein, “Bringing Science to Law,” 95 Am. J. Pub. Health S1 (2005) (“The Coronado Conference brought scientists and judges together to consider these and other tensions that arise when science is introduced in courts.”).

[62] In re School Asbestos Litigation, 977 F.2d 764 (3d Cir. 1992). See Cathleen M. Devlin, “Disqualification of Federal Judges – Third Circuit Orders District Judge James McGirr Kelly to Disqualify Himself So As To Preserve ‘The Appearance of Justice’ Under 28 U.S.C. § 455 – In re School Asbestos Litigation (1992),” 38 Villanova L. Rev. 1219 (1993); Bruce A. Green, “May Judges Attend Privately Funded Educational Programs? Should Judicial Education Be Privatized?: Questions of Judicial Ethics and Policy,” 29 Fordham Urb. L.J. 941, 996-98 (2002).

[63] Alison Frankel, “A Line in the Sand,” The Am. Lawyer – Litigation (2005); Alison Frankel, “The Mass Tort Bonanza That Wasn’t,” The Am. Lawyer (Jan. 6, 2006).

[64] O’Neill v. Novartis AG, California Superior Court, Los Angeles Cty., Transcript of Oral Argument on Post-Trial Motions, at 46 -47 (March 18, 2004) (Hon. Anthony J. Mohr), aff’d sub nom. O’Neill v. Novartis Consumer Health, Inc.,147 Cal. App. 4th 1388, 55 Cal. Rptr. 3d 551, 558-61 (2007).

[65] Richard Clapp & Michael L. Williams, Regarding ‘‘Phenylpropanolamine and Hemorrhagic Stroke in the Hemorrhagic Stroke Project,’’ 16 Ann. Epidem. 580 (2006).

[66] David Michaels, “Regarding ‘Phenylpropanolamine and Hemorrhagic Stroke in the Hemorrhagic Stroke Project’: Mercenary Epidemiology – Data Reanalysis and Reinterpretation for Sponsors with Financial Interest in the Outcome

16 Ann. Epidem. 583 (2006). Hennekens responded to these letters. Stier & Hennekens, note 24, supra.

Confounded by Confounding in Unexpected Places

December 12th, 2021

In assessing an association for causality, the starting point is “an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance.”[1] In other words, before we even embark on consideration of Bradford Hill’s nine considerations, we should have ruled out chance, bias, and confounding as an explanation for the claimed association.[2]

Although confounding is sometimes considered as a type of systematic bias, its importance warrants its own category. Historically, courts have been rather careless in addressing confounding. The Supreme Court, in a case decided before Daubert and the statutory modifications to Rule 702, ignored the role of confounding in a multiple regression model used to support racial discrimination claims. In language that would be reprised many times to avoid and evade the epistemic demands of Rule 702, the Court held, in Bazemore, that the omission of variables in multiple regression models raises an issue that affects “the  analysis’ probativeness, not its admissibility.”[3]

When courts have not ignored confounding,[4] they have sidestepped its consideration by imparting magical abilities to confidence intervals to take care of problem posed by lurking variables.[5]

The advent of the Reference Manual on Scientific Manual allowed a ray of hope to shine on health effects litigation. Several important cases have been decided by judges who have taken note of the importance of assessing studies for confounding.[6] As a new, fourth edition of the Manual is being prepared, its editors and authors should not lose sight of the work that remains to be done.

The Third Edition of the Federal Judicial Center’s and the National Academies of Science, Engineering & Medicine’s Reference Manual on Scientific Evidence (RMSE3d 2011) addressed confounding in several chapters, not always consistently. The chapter on statistics defined “confounder” in terms of correlation between both the independent and dependent variables:

“[a] confounder is correlated with the independent variable and the dependent variable. An association between the dependent and independent variables in an observational study may not be causal, but may instead be due to confounding”[7]

The chapter on epidemiology, on the other hand, defined a confounder as a risk factor for both the exposure and disease outcome of interest:

“A factor that is both a risk factor for the disease and a factor associated with the exposure of interest. Confounding refers to a situation in which an association between an exposure and outcome is all or partly the result of a factor that affects the outcome but is unaffected by the exposure.”[8]

Unfortunately, the epidemiology chapter never defined “risk factor.” The term certainly seems much less neutral than a “correlated” variable, which lacks any suggestion of causality. Perhaps there is some implied help from the authors of the epidemiology chapter when they described a case of confounding by “known causal risk factors,” which suggests that some risk factors may not be causal.[9] To muck up the analysis, however, the epidemiology chapter went on to define “risk” as “[a] probability that an event will occur (e.g., that an individual will become ill or die within a stated period of time or by a certain age).”[10]

Both the statistics and the epidemiology chapters provide helpful examples of confounding and speak to the need for excluding confounding as the basis for an observed association. The statistics chapter, for instance, described confounding as a threat to “internal validity,”[11] and the need to inquire whether the adjustments in multivariate studies were “sensible and sufficient.”[12]

The epidemiology chapter in one passage instructed that when “an association is uncovered, further analysis should be conducted to assess whether the association is real or a result of sampling error, confounding, or bias.[13] Elsewhere in the same chapter, the precatory becomes mandatory.[14]

Legally Unexplored Source of Substantial Confounding

As the Reference Manual implies, attempting to control for confounding is not adequate.  The controlling must be carefully and sufficiently done. Under the heading of sufficiency and due care, there are epidemiologic studies that purport to control for confounding, but fail rather dramatically. The use of administrative databases, whether based upon national healthcare or insurance claims, has become a common place in chronic disease epidemiology. Their large size obviates many concerns about power to detect rare disease outcomes. Unfortunately, there is often a significant threat to the validity of such studies, which are based upon data sets that characterize patients as diabetic, hypertensive, obese, or smokers vel non. By dichotomizing what are continuous variables, the categorization extracts a significant price in multivariate models used in epidemiology.

Of course, physicians frequently create guidelines for normal versus abnormal, and these divisions or categories show up in medical records, in databases, and ultimately in epidemiologic studies. The actual measurements are not always available, and the use of a categorical variable may appear to simplify the statistical analysis of the dataset. Unfortunately, the results can be quite misleading. Consider the measurements of blood pressure in a study that is evaluating whether an exposure variable (such as medication use or environmental contaminant) is associated with an outcome such as cardiovascular or renal disease. Hypertension, if present, would clearly be a confounder, but the use of a categorical variable for hypertension would greatly undermine the validity of the study. If many of the study participants with hypertension had their condition well controlled by medication, then the categorical variable will dilute the adjustment for the role of hypertension in driving the association between the exposure and outcome variables of interest. Even if none of the hypertensive patients had good control, the reduction of all hypertension to a category, rather than a continuous measurement, is a path of the loss of information and the creation of bias.

Almost 40 years ago, Jacob Cohen showed that dichotomization of continuous variables results in a loss of power.[15] Twenty years later, Peter Austin showed in a Monte Carlo simulation that categorizing a continuous variable in a logistic regression results in inflating the rate of finding false positive associations.[16] The type I (false-positive) error rates increases with sample size, with increasing correlation between the confounding variable and outcome of interest, and the number of categories used for the continuous variables. Of course, the national databases often have huge sample sizes, which only serves to increase the bias from the use of categorical variables for confounding variables.

The late Douglas Altman, who did so much to steer the medical literature toward greater validity, warned that dichotomizing continuous variables was known to cause loss of information, statistical power, and reliability in medical research.[17]

In the field of pharmaco-epidemiology, the bias created by dichotomization of a continous variable is harmful from both the perspective of statistical estimation and hypothesis testing.[18] While readers are misled into believing that the study adjusts for important co-variates, the study will have lost information and power, with the result of presenting false-positive results that have the false-allure of a fully adjusted model. Indeed, this bias from inadequate control of confounding infects several pending pharmaceutical multi-district litigations.


Supreme Court

General Electric Co. v. Joiner, 522 U.S. 136, 145-46 (1997) (holding that an expert witness’s reliance on a study was misplaced when the subjects of the study “had been exposed to numerous potential carcinogens”)

First Circuit

Bricklayers & Trowel Trades Internat’l Pension Fund v. Credit Suisse Securities (USA) LLC, 752 F.3d 82, 89 (1st Cir. 2014) (affirming exclusion of expert witness who failed to account for confounding in event studies), aff’g 853 F. Supp. 2d 181, 188 (D. Mass. 2012)

Second Circuit

Wills v. Amerada Hess Corp., 379 F.3d 32, 50 (2d Cir. 2004) (holding expert witness’s specific causation opinion that plaintiff’s squamous cell carcinoma had been caused by polycyclic aromatic hydrocarbons was unreliable, when plaintiff had smoked and drunk alcohol)

Deutsch v. Novartis Pharms. Corp., 768 F.Supp. 2d 420, 432 (E.D.N.Y. 2011) (“When assessing the reliability of a epidemiologic study, a court must consider whether the study adequately accounted for “confounding factors.”)

Schwab v. Philip Morris USA, Inc., 449 F. Supp. 2d 992, 1199–1200 (E.D.N.Y. 2006), rev’d on other grounds, 522 F.3d 215 (2d Cir. 2008) (describing confounding in studies of low-tar cigarettes, where authors failed to account for confounding and assessing healthier life styles in users)

Third Circuit

In re Zoloft Prods. Liab. Litig., 858 F.3d 787, 793 (3d Cir. 2017) (affirming exclusion of causation expert witness)

Magistrini v. One Hour Martinizing Dry Cleaning, 180 F. Supp. 2d 584, 591 (D.N.J. 2002), aff’d, 68 Fed. Appx. 356 (3d Cir. 2003)(bias, confounding, and chance must be ruled out before an association  may be accepted as showing a causal association)

Soldo v. Sandoz Pharms. Corp., 244 F. Supp. 2d 434 (W.D.Pa. 2003) (excluding expert witnesses in Parlodel case; noting that causality assessments and case reports fail to account for confounding)

Wade-Greaux v. Whitehall Labs., Inc., 874 F. Supp. 1441 (D.V.I. 1994) (unanswered questions about confounding required summary judgment  against plaintiff in Primatene Mist birth defects case)

Fifth Circuit

Knight v. Kirby Inland Marine, Inc., 482 F.3d 347, 353 (5th Cir. 2007) (affirming exclusion of expert witnesses) (“Of all the organic solvents the study controlled for, it could not determine which led to an increased risk of cancer …. The study does not provide a reliable basis for the opinion that the types of chemicals appellants were exposed to could cause their particular injuries in the general population.”)

Burst v. Shell Oil Co., C. A. No. 14–109, 2015 WL 3755953, *7 (E.D. La. June 16, 2015) (excluding expert witness causation opinion that failed to account for other confounding exposures that could have accounted for the putative association), aff’d, 650 F. App’x 170 (5th Cir. 2016)

LeBlanc v. Chevron USA, Inc., 513 F. Supp. 2d 641, 648-50 (E.D. La. 2007) (excluding expert witness testimony that purported to show causality between plaintiff’s benzene ezposure and myelofibrosis), vacated, 275 Fed. App’x 319 (5th Cir. 2008) (remanding case for consideration of new government report on health effects of benzene)

Castellow v. Chevron USA, 97 F. Supp. 2d 780 (S.D. Tex. 2000) (discussing confounding in passing; excluding expert witness causation opinion in gasoline exposure AML case)

Kelley v. American Heyer-Schulte Corp., 957 F. Supp. 873 (W.D. Tex. 1997) (confounding in breast implant studies)

Sixth Circuit

Pluck v. BP Oil Pipeline Co., 640 F.3d 671 (6th Cir. 2011) (affirming exclusion of specific causation opinion that failed to rule out confounding factors)

Nelson v. Tennessee Gas Pipeline Co., 243 F.3d 244, 252-54 (6th Cir. 2001) (rewrite: expert’s failure to account for confounding factors in cohort study of alleged PCB exposures rendered his opinion unreliable)

Turpin v. Merrell Dow Pharms., Inc., 959 F. 2d 1349, 1355 -57 (6th Cir. 1992) (discussing failure of some studies to evaluate confounding)

Adams v. Cooper Indus. Inc., 2007 WL 2219212, 2007 U.S. Dist. LEXIS 55131 (E.D. Ky. 2007) (differential diagnosis includes ruling out confounding causes of plaintiffs’ disease).

Seventh Circuit

People Who Care v. Rockford Bd. of Educ., 111 F.3d 528, 537–38 (7th Cir. 1997) (noting importance of considering role of confounding variables in educational achievement);

Caraker v. Sandoz Pharms. Corp., 188 F. Supp. 2d 1026, 1032, 1036 (S.D. Ill 2001) (noting that “the number of dechallenge/rechallenge reports is too scant to reliably screen out other causes or confounders”)

Eighth Circuit

Penney v. Praxair, Inc., 116 F.3d 330, 333-334 (8th Cir. 1997) (affirming exclusion of expert witness who failed to account of the confounding effects of age, medications, and medical history in interpreting PET scans)

Marmo v. Tyson Fresh Meats, Inc., 457 F.3d 748, 758 (8th Cir. 2006) (affirming exclusion of specific causation expert witness opinion)

Ninth Circuit

Coleman v. Quaker Oats Co., 232 F.3d 1271, 1283 (9th Cir. 2000) (p-value of “3 in 100 billion” was not probative of age discrimination when “Quaker never contend[ed] that the disparity occurred by chance, just that it did not occur for discriminatory reasons. When other pertinent variables were factored in, the statistical disparity diminished and finally disappeared.”)

In re Viagra & Cialis Prods. Liab. Litig., 424 F.Supp. 3d 781 (N.D. Cal. 2020) (excluding causation opinion on grounds including failure to account properly for confounding)

Avila v. Willits Envt’l Remediation Trust, 2009 WL 1813125, 2009 U.S. Dist. LEXIS 67981 (N.D. Cal. 2009) (excluding expert witness opinion that failed to rule out confounding factors of other sources of exposure or other causes of disease), aff’d in relevant part, 633 F.3d 828 (9th Cir. 2011)

In re Phenylpropanolamine Prods. Liab. Litig., 289 F.Supp.2d 1230 (W.D.Wash. 2003) (ignoring study validity in a litigation arising almost exclusively from a single observational study that had multiple internal and external validity problems; relegating assessment of confounding to cross-examination)

In re Bextra and Celebrex Marketing Sales Practice, 524 F. Supp. 2d 1166, 1172 – 73 (N.D. Calif. 2007) (discussing invalidity caused by confounding in epidemiologic studies)

In re Silicone Gel Breast Implants Products Liab. Lit., 318 F.Supp. 2d 879, 893 (C.D.Cal. 2004) (observing that controlling for potential confounding variables is required, among other findings, before accepting epidemiologic studies as demonstrating causation).

Henricksen v. ConocoPhillips Co., 605 F. Supp. 2d 1142 (E.D. Wash. 2009) (noting that confounding must be ruled out)

Valentine v. Pioneer Chlor Alkali Co., Inc., 921 F. Supp. 666 (D. Nev. 1996) (excluding plaintiffs’ expert witnesses, including Dr. Kilburn, for reliance upon study that failed to control for confounding)

Tenth Circuit

Hollander v. Sandoz Pharms. Corp., 289 F.3d 1193, 1213 (10th Cir. 2002) (noting importance of accounting for confounding variables in causation of stroke)

In re Breast Implant Litig., 11 F. Supp. 2d 1217, 1233 (D. Colo. 1998) (alternative explanations, such confounding, should be ruled out before accepting causal claims).

Eleventh Circuit

In re Abilify (Aripiprazole) Prods. Liab. Litig., 299 F.Supp. 3d 1291 (N.D.Fla. 2018) (discussing confounding in studies but credulously accepting challenged explanations from David Madigan) (citing Bazemore, a pre-Daubert, decision that did not address a Rule 702 challenge to opinion testimony)

District of Columbia Circuit

American Farm Bureau Fed’n v. EPA, 559 F.3d 512 (D.C. Cir. 2009) (noting that data relied upon in setting particulate matter standards addressing visibility should avoid the confounding effects of humidity)

STATES

Delaware

In re Asbestos Litig., 911 A.2d 1176 (New Castle Cty., Del. Super. 2006) (discussing confounding; denying motion to exclude plaintiffs’ expert witnesses’ chrysotile causation opinions)

Minnesota

Goeb v. Tharaldson, 615 N.W.2d 800, 808, 815 (Minn. 2000) (affirming exclusion of Drs. Janette Sherman and Kaye Kilburn, in Dursban case, in part because of expert witnesses’ failures to consider confounding adequately).

New Jersey

In re Accutane Litig., 234 N.J. 340, 191 A.3d 560 (2018) (affirming exclusion of plaintiffs’ expert witnesses’ causation opinions; deprecating reliance upon studies not controlled for confounding)

In re Proportionality Review Project (II), 757 A.2d 168 (N.J. 2000) (noting the importance of assessing the role of confounders in capital sentences)

Grassis v. Johns-Manville Corp., 591 A.2d 671, 675 (N.J. Super. Ct. App. Div. 1991) (discussing the possibility that confounders may lead to an erroneous inference of a causal relationship)

Pennsylvania

Porter v. SmithKline Beecham Corp., No. 3516 EDA 2015, 2017 WL 1902905 (Pa. Super. May 8, 2017) (affirming exclusion of expert witness causation opinions in Zoloft birth defects case; discussing the importance of excluding confounding)

Tennessee

McDaniel v. CSX Transportation, Inc., 955 S.W.2d 257 (Tenn. 1997) (affirming trial court’s refusal to exclude expert witness opinion that failed to account for confounding)


[1] Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 295 (1965) (emphasis added).

[2] See, e.g., David A. Grimes & Kenneth F. Schulz, “Bias and Causal Associations in Observational Research,” 359 The Lancet 248 (2002).

[3] Bazemore v. Friday, 478 U.S. 385, 400 (1986) (reversing Court of Appeal’s decision that would have disallowed a multiple regression analysis that omitted important variables). Buried in a footnote, the Court did note, however, that “[t]here may, of course, be some regressions so incomplete as to be inadmissible as irrelevant; but such was clearly not the case here.” Id. at 400 n.10. What the Court missed, of course, is that the regression may be so incomplete as to be unreliable or invalid. The invalidity of the regression in Bazemore does not appear to have been raised as an evidentiary issue under Rule 702. None of the briefs in the Supreme Court or the judicial opinions cited or discussed Rule 702.

[4]Confounding in the Courts” (Nov. 2, 2018).

[5] See, e.g., Brock v. Merrill Dow Pharmaceuticals, Inc., 874 F.2d 307, 311-12 (5th Cir. 1989) (“Fortunately, we do not have to resolve any of the above questions [as to bias and confounding], since the studies presented to us incorporate the possibility of these factors by the use of a confidence interval.”). This howler has been widely acknowledged in the scholarly literature. See David Kaye, David Bernstein, and Jennifer Mnookin, The New Wigmore – A Treatise on Evidence: Expert Evidence § 12.6.4, at 546 (2d ed. 2011); Michael O. Finkelstein, Basic Concepts of Probability and Statistics in the Law 86-87 (2009) (criticizing the blatantly incorrect interpretation of confidence intervals by the Brock court).

[6]On Praising Judicial Decisions – In re Viagra” (Feb. 8, 2021); See “Ruling Out Bias and Confounding Is Necessary to Evaluate Expert Witness Causation Opinions” (Oct. 28, 2018); “Rule 702 Requires Courts to Sort Out Confounding” (Oct. 31, 2018).

[7] David H. Kaye and David A. Freedman, “Reference Guide on Statistics,” in RMSE3d 211, 285 (3ed 2011). 

[8] Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” in RMSE3d 549, 621.

[9] Id. at 592.

[10] Id. at 627.

[11] Id. at 221.

[12] Id. at 222.

[13] Id. at 567-68 (emphasis added).

[14] Id. at 572 (describing chance, bias, and confounding, and noting that “[b]efore any inferences about causation are drawn from a study, the possibility of these phenomena must be examined”); id. at 511 n.22 (observing that “[c]onfounding factors must be carefully addressed”).

[15] Jacob Cohen, “The cost of dichotomization,” 7 Applied Psychol. Measurement 249 (1983).

[16] Peter C. Austin & Lawrence J. Brunner, “Inflation of the type I error rate when a continuous confounding variable is categorized in logistic regression analyses,” 23 Statist. Med. 1159 (2004).

[17] See, e.g., Douglas G. Altman & Patrick Royston, “The cost of dichotomising continuous variables,” 332 Brit. Med. J. 1080 (2006); Patrick Royston, Douglas G. Altman, and Willi Sauerbrei, “Dichotomizing continuous predictors in multiple regression: a bad idea,” 25 Stat. Med. 127 (2006). See also Robert C. MacCallum, Shaobo Zhang, Kristopher J. Preacher, and Derek D. Rucker, “On the Practice of Dichotomization of Quantitative Variables,” 7 Psychological Methods 19 (2002); David L. Streiner, “Breaking Up is Hard to Do: The Heartbreak of Dichotomizing Continuous Data,” 47 Can. J. Psychiatry 262 (2002); Henian Chen, Patricia Cohen, and Sophie Chen, “Biased odds ratios from dichotomization of age,” 26 Statist. Med. 3487 (2007); Carl van Walraven & Robert G. Hart, “Leave ‘em Alone – Why Continuous Variables Should Be Analyzed as Such,” 30 Neuroepidemiology 138 (2008); O. Naggara, J. Raymond, F. Guilbert, D. Roy, A. Weill, and Douglas G. Altman, “Analysis by Categorizing or Dichotomizing Continuous Variables Is Inadvisable,” 32 Am. J. Neuroradiol. 437 (Mar 2011); Neal V. Dawson & Robert Weiss, “Dichotomizing Continuous Variables in Statistical Analysis: A Practice to Avoid,” Med. Decision Making 225 (2012); Phillippa M Cumberland, Gabriela Czanner, Catey Bunce, Caroline J Doré, Nick Freemantle, and Marta García-Fiñana, “Ophthalmic statistics note: the perils of dichotomising continuous variables,” 98 Brit. J. Ophthalmol. 841 (2014).

[18] Valerii Fedorov, Frank Mannino1, and Rongmei Zhang, “Consequences of dichotomization,” 8 Pharmaceut. Statist. 50 (2009).

People Get Ready – There’s a Reference Manual a Comin’

July 16th, 2021

Science is the key …

Back in February, I wrote about a National Academies’ workshop that featured some outstanding members of the scientific and statistical world, and which gave participants to identify new potential subjects for inclusion in a proposed fourth edition of the Reference Manual on Scientific Evidence.[1] Funding for that new edition is now secured, and the National Academies has published a précis of the February workshop. National Academies of Sciences, Engineering, and Medicine, Emerging Areas of Science, Engineering, and Medicine for the Courts: Proceedings of a Workshop – in Brief (Washington, DC 2021). The Rapporteurs for these proceedings provide a helpful overview for this meeting, which was not generally covered in the legal media.[2]

The goal of the workshop, which was supported by a planning committee, the Committee on Science, Technology, and Law, the National Academies, the Federal Judicial Center, and the National Science Foundation, was, of course, to identify chapters for a new, fourth edition, of Reference Manual on Scientific Evidence. The workshop was co-chaired by Dr. Thomas D. Albright, of the Salk Institute for Biological Studies, and the Hon. Kathleen McDonald O’Malley, Judge on the U.S. Court of Appeals for the Federal Circuit.

The Rapporteurs duly noted Judge O’Malley’s Workshop comments that she hoped that the reconsideration of the Reference Manual can help close the gap between science and the law. It is thus encouraging that the Rapporteurs focused a large part of their summary on the presentation of Professor Xiao-Li Meng[3] on selection bias, which “can come from cherry picking data, which alters the strength of the evidence.” Meng identified the

“7 S’(ins)” of selection bias:

(1) selection of target/hypothesis (e.g., subgroup analysis);

(2) selection of data (e.g., deleting ‘outliers’ or using only ‘complete cases’);

(3) selection of methodologies (e.g., choosing tests to pass the goodness-of-fit); (4) selective due diligence and debugging (e.g., triple checking only when the outcome seems undesirable);

(5) selection of publications (e.g., only when p-value <0.05);

(6) selections in reporting/summary (e.g., suppressing caveats); and

(7) selections in understanding and interpretation (e.g., our preference for deterministic, ‘common sense’ interpretation).”

Meng also addressed the problem of analyzing subgroup findings after not finding an association in the full sample, dubious algorithms, selection bias in publishing “splashy” and nominally “statistically significant” results, and media bias and incompetence in disseminating study results. Meng discussed how these biases could affect the accuracy of research findings, and how these biases obviously affect the accuracy, validity, and reliability of research findings that are relied upon by expert witnesses in court cases.

The Rapporteurs’ emphasis on Professor Meng’s presentation was noteworthy because the current edition of the Reference Manual is generally lacking in a serious exploration of systematic bias and confounding. To be sure, the concepts are superficially addressed in the Manual’s chapter on epidemiology, but in a way that has allowed many district judges to shrug off serious questions of invalidity with the shibboleth that such questions “to to the weight, not the admissibility,” of challenged expert witness opinion testimony. Perhaps the pending revision to Rule 702 will help improve fidelity to the spirit and text of Rule 702.

Questions of bias and noise have come to receive more attention in the professional statistical and epidemiologic literature. In 2009, Professor Timothy Lash published an important book-length treatment of quantitative bias analysis.[4] Last year, statistician David Hand published a comprehensive, but readily understandable, book on “Dark Data,” and the ways statistical and scientific interference are derailed.[5] One of the presenters at the February workshop, nobel laureate, Daniel Kahneman, published a book on “noise,” just a few weeks ago.[6]

David Hand’s book, Dark Data, (Chapter 10) sets out a useful taxonomy of the ways that data can be subverted by what the consumers of data do not know. The taxonomy would provide a useful organizational map for a new chapter of the Reference Manual:

A Taxonomy of Dark Data

Type 1: Data We Know Are Missing

Type 2: Data We Don’t Know Are Missing

Type 3: Choosing Just Some Cases

Type 4: Self- Selection

Type 5: Missing What Matters

Type 7: Changes with Time

Type 8: Definitions of Data

Type 9: Summaries of Data

Type 11: Feedback and Gaming

Type 12: Information Asymmetry

Type 13: Intentionally Darkened Data

Type 14: Fabricated and Synthetic Data

Type 15: Extrapolating beyond Your Data

Providing guidance not only on “how we know,” but also on how we go astray, patho-epistemology, would be helpful for judges and lawyers. Hand’s book really just a beginning to helping gatekeepers appreciate how superficially plausible health-effects claims are invalidated by the data relied upon by proffered expert witnesses.

* * * * * * * * * * * *

“There ain’t no room for the hopeless sinner
Who would hurt all mankind, just to save his own, believe me now
Have pity on those whose chances grow thinner”


[1]Reference Manual on Scientific Evidence v4.0” (Feb. 28, 2021).

[2] Steven Kendall, Joe S. Cecil, Jason A. Cantone, Meghan Dunn, and Aaron Wolf.

[3] Prof. Meng is the Whipple V. N. Jones Professor of Statistics, in Harvard University. (“Seeking simplicity in statistics, complexity in wine, and everything else in fortune cookies.”)

[4] Timothy L. Lash, Matthew P. Fox, and Aliza K. Fink, Applying Quantitative Bias Analysis to Epidemiologic Data (2009).

[5] David J. Hand, Dark data : why what you don’t know matters (2020).

[6] Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein, Noise: A Flaw in Human Judgment (2021).

Carl Cranor’s Inference to the Best Explanation

February 12th, 2021

Carl Cranor pays me the dubious honor of quoting my assessment of weight of the evidence (WOE) pseudo-methodology as used by lawsuit industry expert witnesses, in one of his recent publications:

“Take all the evidence, throw it into the hopper, close your eyes, open your heart, and guess the weight. You could be a lucky winner! The weight of the evidence suggests that the weight-of-the-evidence (WOE) method is little more than subjective opinion, but why care if it helps you to get to a verdict!”[1]

Cranor’s intent was to deride my comments, but they hold up fairly well. I have always maintained that if were wrong, I would eat my words, but that they will be quite digestible. Nothing to eat here, though.

In his essay in the Public Affairs Quarterly, Cranor attempts to explain and support his advocacy of WOE in the notorious case, Milward, in which Cranor, along with his friend and business partner, Martyn Smith, served as partisan, paid expert witnesss.[2] Not disclosed in this article is that after the trial court excluded the opinions of Cranor and Smith under Federal Rule of Evidence 702, and plaintiff appealed, the lawsuit industry, acting through The Council for Education and Research on Toxics (CERT) filed an amicus brief to persuade the Court of Appeals to reverse the exclusion. The plaintiffs’ counsel, Cranor and Smith, and CERT failed to disclose that CERT was founded by the two witnesses, Cranor and Smith, whose exclusion was at issue.[3] Many of the lawsuit industry’s regular testifiers were signatories, and none raised any ethical qualms about the obvious conflict of interest, or the conspiracy to pervert the course of justice.[4]

Cranor equates WOE to “inference to the best explanation,” which reductively strips science of its predictive and reproducible nature. Readers may get the sense he is operating in the realm of narrative, not science, and they would be correct. Cranor goes on to conflate WOE methodology with “diagnostic induction,” and “differential diagnosis.”[5] The latter term is well understood in both medicine and in law to involve the assessment of an individual patient’s condition, based upon what is already known upon good and sufficient bases. The term has no accepted or justifiable meaning for assessing general causation. Cranor’s approach would pretermit the determination of general causation by making the disputed cause a differential.

Cranor offers several considerations in support of his WOE-ful methodology. First, he notes that the arguments for causal claims are not deductive. True, but indifferent as to his advocacy for WOE and inference to the best explanation.

Second, Cranor describes a search for relevant evidence once the scientific issue (hypothesis?) is formulated. Again, there is nothing unique about this described step, but Cranor intentionally leaves out considerations of validity, as in extrapolations between high and low dose, or between species. Similarly, he leaves out considerations of validity of study designs (such as whether any weight would be given to case studies, cross-sectional, or ecological studies) or of validity of individual studies.

Cranor’s third step is the formulation of a “sufficiently complete range of reasonable and plausible explanations to account for the evidence.” Again, nothing unique here about WOE, except that Cranor’s WOE abridges the process by ignoring the very real possibility that we do not have the correct plausible explanation available.

Fourth, according to Cranor, scientists rank, explicitly or implicitly, the putative “explanations” by plausibility and persuasiveness, based upon the evidence at hand, in view of general toxicological and background knowledge.[6] Note the absence of consideration of the predictive abilities of the competing explanations, or any felt need to assess the quality of evidence or the validity of study design.

For Cranor, the fifth consideration is to use the initial plausibility assessments, made on incomplete understanding of the phenomena, and on incomplete evidence, to direct “additionally relevant /available evidence to separate founded explanations from less well-founded ones.” Obviously missing from Cranor’s scheme is the idea of trying to challenge or test hypotheses severely to see whether withstand such challenges.

Sixth, Cranor suggests that “all scientifically relevant information” should be considered in moving to the “best supported” explanation. Because “best” is determined based upon what is available, regardless of the quality of the data, or the validity of the inference, Cranor rigs his WOE-ful methodology in favor of eliminating “indeterminate” as a possible conclusion.

In a seventh step, Cranor points to the need to “integrate, synthesize, and assess or evaluate,” all lines of “available relevant evidence.” There is nothing truly remarkable about this step, which clearly requires judgment. Cranor notes that there can be convergence of disparate lines of evidence, or divergence, and that some selection of “lines” of evidence may be endorsed as supporting the “more persuasive conclusion” of causality.[7] In other words, a grand gemish.

Cranor’s WOE-ful approach leaves out any consideration of random error, or systematic bias, or data quality, or study design. The words “bias” and “confounding” do not appear in Cranor’s essay, and he erroneously discusses “error” and “error rates,” only to disparage them as the machinations of defense lawyers in litigation. Similarly, Cranor omits any serious mention of reproducibility, or of the need to formulate predictions that have the ability to falsify tentative conclusions.

Quite stridently, Cranor insists that there is no room for any actual weighting of study types or designs. In apparent earnest, Cranor writes that:

“this conclusion is in accordance with a National Cancer Institute (NCI) recommendation that ‘there should be no hierarchy [among different types of scientific methods to determine cancer causation]. Epidemiology, animal, tissue culture and molecular pathology should be seen as integrating evidences in the determination of human carcinogenicity.”[8]

There is much whining and special pleading about the difficulty, expense, and lack of statistical power of epidemiologic studies, even though the last point is a curious backdoor endorsement of statistical significance. The first two points ignore the availability of large administrative databases from which large cohorts can be identified and studied, with tremendous statistical power. Case-control studies can in some instances be assembled quickly as studies nested in existing cohorts.

As I have noted elsewhere,[9] Cranor’s attempt to level all types of evidence starkly misrepresents the cited “NCI” source, which is not at all an NCI recommendation, but rather a “meeting report” of a workshop of non-epidemiologists.[10] The cited source is not an official pronouncement of the NCI, the authors were not NCI scientists, and the NCI did not sponsored the meeting. The meeting report appeared in the journal Cancer Research as a paid advertisement, not in the NCI’s Journal of the National Cancer Institute as a scholarly article:

“The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.”[11]

Tellingly, Cranor’s deception was relied upon and cited by the First Circuit, in its Milward, decision.[12] The scholarly fraud hit its mark. As a result of Cranor’s own dubious actions, the Milward decision has both both ethical and scholarship black clouds hovering over it.  The First Circuit should withdraw the decision as improvidently decided.

The article ends with Cranor’s triumphant view of Milward,[13] which he published previously, along with the plaintiffs’ lawyer who hired him.[14] What Cranor leaves out is that the First Circuit’s holding is now suspect because of the court’s uncritical acceptance of Cranor’s own misrepresentations and CERT’s omissions of conflict-of-interest disclosures, as well as the subsequent procedural history of the case. After the Circuit reversed the Rule 702 exclusions, and the Supreme Court denied the petition for a writ of certiorari, the case returned to the federal district court, where the defense lodged a Rule 702 challenge to expert witness opinion that attributed plaintiff’s acute promyelocytic leukemia to benzene exposure. This specific causation issue was not previously addressed in the earlier proceedings. The trial court sustained the challenge, which left the plaintiff unable to show specific causation. The result was summary judgment for the defense, which the First Circuit affirmed on appeal.[15] The upshot of the subsequent proceedings, with their dispositive ruling in favor of the defense on specific causation, is that the earlier ruling on general causation is no longer necessary to the final judgment, and not the holding of the case when all the proceedings are considered.

In the end, Cranor’s WOE leaves us with a misdirected search for an “explanation of causation,” rather than a testable, tested, reproducible, and valid “inference of causation.” Cranor’s attempt to invoke the liberalization of the Federal Rules of Evidence ignores the true meaning of “liberal” in being free from dogma and authority. Evidence does not equal eminence, and expert witnesses in court must show their data and defend their inferences, whatever their explanations may be.

——————————————————————————————————–

[1]  Carl F. Cranor, “How Courts’ Reviews of Science in Toxic Tort Cases Have Changed and Why That’s a Good Thing,” 31 Public Affairs Q. 280 (2017), quoting from Schachtman, “WOE-fully Inadequate Methodology – An Ipse Dixit by Another Name” (May 1, 2012).

[2]  Milward v. Acuity Specialty Products Group, Inc., 639 F. 3d 11 (1st Cir. 2011), cert. denied, 132 S.Ct. 1002 (2012).

[3]  SeeThe Council for Education and Research on Toxics” (July 9, 2013).

[4] Among the signatures were Nachman Brautbar, David C. Christiani, Richard W. Clapp, James Dahlgren, Arthur L. Frank, Peter F. Infante, Philip J. Landrigan, Barry S. Levy, David Ozonoff, David Rosner, Allan H. Smith, and Daniel Thau Teitelbaum.

[5]  Cranor at 286-87.

[6]  Cranor at 287.

[7]  Cranor at 287-88.

[8]  Cranor at 290.

[9]  “Cranor’s Defense of Milward at the CPR’s Celebration” (May 12, 2013).

[10]  Michelle Carbone, Jack Gruber, and May Wong, “Modern criteria to establish human cancer etiology,” 14 Semin. Cancer Biol. 397 (2004).

[11]  Michele Carbone, George Klein, Jack Gruber and May Wong, “Modern Criteria to Establish Human Cancer Etiology,” 64 Cancer Research 5518 (2004).

[12]  Milward v. Acuity Specialty Products Group, Inc., 639 F. 3d 11, 17 (1st Cir. 2011) (“when a group from the National Cancer Institute was asked to rank the different types of evidence, it concluded that ‘[t]here should be no such hierarchy’.”), cert. denied, 132 S.Ct. 1002 (2012).

[13]  Cranor at 292.

[14]  SeeWake Forest Publishes the Litigation Industry’s Views on Milward” (April 20, 2013).

[15]  Milward v. Acuity Specialty Products Group, Inc., 969 F. Supp. 2d 101 (D. Mass. 2013), aff’d sub nom. Milward v. Rust-Oleum Corp., 820 F.3d 469 (1st Cir. 2016).

Hacking at the “A” Cell

November 10th, 2020

At the heart of epidemiologic studies and clinical trials is the contingency table. The term, contingency table, was introduced by Karl Pearson in the early 20th century as a way to explore the independence, vel non, in a multivariate model. The simplest version of the table is the “2 by 2” table that is at the heart of case-control and other studies:

  Cases (with outcome of interest) Controls (without outcome of interest)  
Exposure of Interest Present                A                  B A + B

Marginal total of all exposed

Exposure of Interest Absent                C                  D C + D

Marginal total of all non-exposed

  A + C

Marginal total of cases

B + D

Marginal total of controls

A + B + C + D

Total observed in study

 

A measure of association between the exposure of interest and the outcome of interest can be shown in the odds ratio (OR), which can be assessed for random error on the assumption of no association.

OR = (A/C)/(B/D) = A*D/B*C

The measurement of the OR turns on faithfully applying the same method of counting cases regardless of exposure status. When investigators expand the “A” cell by loosening their criteria for exposure, we say that they have engaged in “hacking the A cell.”

Something akin to hacking the A cell occurred in the large epidemiologic study, known as  “Yale Hemorrhagic Stroke Project (HSP),” which was the center piece of the plaintiffs’ case in In re Phenylpropanolamine Products Liability Litigation. Although the HSP was sponsored by manufacturers, it was conducted independently without any manufacturer oversight beyond the protocol. The FDA reviewed the HSP results, and ultimately the HSP was published in the New England Journal of Medicine.[1]

The HSP was challenged in a Rule 702 hearing in the Multi-District Litigation (MDL). The MDL judge, Judge Rothstein, conducted hearings and entertained extensive briefings on the reliability of plaintiffs’ expert witnesses’ opinions, which were based largely upon the HSP. The hearings, however, could not go beyond doubts raised by the published paper, and Judge Rothstein permitted plaintiffs’ expert witnesses’ proffered testimony based upon the study, finding that:

“The prestigious NEJM published the HSP results, further substantiating that the research bears the indicia of good science.”[2]

The HSP study was subjected to much greater analysis in litigation.  After the MDL concluded its abridged gatekeeping process, the defense successfully sought the underlying data to the HSP. These data unraveled the HSP paper by showing that the study investigators had deviated from the protocol in a way to increase the number of exposed cases (A cell), with the obvious result of increasing the OR reported by the study.

Both sides of the PPA litigation accused the other side of “hacking at the A cell,” but juries seemed to understand that the hacking had started before the paper was published. A notable string of defense verdicts ensued. After one of the early defense verdicts, plaintiffs’ counsel challenged the defendant’s reliance upon underlying data that went behind the peer-reviewed publication.  The trial court rejected the request for a new trial, and spoke to the significance of challenging the superficial significance of peer review of the key study relied upon by plaintiffs in the PPA litigation:

“I mean, you could almost say that there was some unethical activity with that Yale Study.  It’s real close.  I mean, I — I am very, very concerned at the integrity of those researchers. Yale gets — Yale gets a big black eye on this.”[3]

Today we can see the equivalent of “A” cell hacking in a rather sleazy attempt by the Banana Republicans to steal a presidential election they lost. Cry-baby conservatives are seeking recounts where they lost, but not where they won. They are challenging individual ballots on the basis of outcome. They are raising speculative questions about the electoral processes of entire states, even where the states in question have handed them notable wins down ballot.


[1]  Walter N. Kernan, Catherine M. Viscoli, Lawrence M. Brass, Joseph P. Broderick, Thomas Brott, Edward Feldmann, Lewis B. Morgenstern,  Janet Lee Wilterdink, and Ralph I. Horwitz, “Phenylpropanolamine and the Risk of Hemorrhagic Stroke,” 343 New Engl. J. Med. 1826 (2000). SeeMisplaced Reliance On Peer Review to Separate Valid Science From Nonsense” (Aug. 14, 2011).

[2]  In re Phenylpropanolamine Prod. Liab. Litig., 289 F. 2d 1230, 1239 (2003) (citing Daubert II for the proposition that peer review shows the research meets the minimal criteria for good science).  There were many layers of peer review for the HSP study, all of which proved ultimately ineffectual compared with the closer scrutiny that the HSP received in litigation where underlying data were produced.

[3]  O’Neill v. Novartis AG, California Superior Court, Los Angeles Cty., Transcript of Oral Argument on Post-Trial Motions, at 46 -47 (March 18, 2004) (Hon. Anthony J. Mohr).