How Access to a Protocol and Underlying Data Gave Yale Researchers a Big Black Eye

Prelude to Litigation

Phenylpropanolamine (PPA) was a direct α-adrenergic agonist widely used as a medication to control cold symptoms and to suppress appetite for weight loss.[1] In 1972, an over-the-counter (OTC) Advisory Review Panel considered the safety and efficacy of PPA-containing nasal decongestant medications, leading, in 1976, to a recommendation that the Food and Drug Administration (FDA) label these medications as “generally recognized as safe and effective.”

Six years later, in 1982, another FDA panel recommended that PPA be considered safe and effective for appetite suppression in dieting. Two epidemiologic studies of PPA and hemorrhagic stroke (HS) were conducted in the 1980s. One study, by Hershel Jick and colleagues, presented as a letter to the editor, reported a relative risk of 0.58, with a 95% exact confidence interval of 0.03–2.9.[2] A year later, two researchers, reporting a study based upon Medicaid databases, found no significant association between PPA and HS.[3]

The FDA, however, did not approve a final monograph recognizing PPA’s “safe and effective” status, because of occasional reports of hemorrhagic stroke in patients who used PPA-containing medications, mostly young women who had used PPA appetite suppressants for dieting. In 1982, the FDA requested information on the effects of PPA on blood pressure, particularly with respect to weight-loss medications. The agency deferred a proposed 1985 final monograph because of the blood pressure issue.

The FDA deemed the data inadequate to answer its safety concerns. Congressional and agency hearings in the early 1990s amplified some public concern, but in 1990, the Director of Cardio-Renal Drug Products, at the Center for Drug Evaluation and Research, found several well-supported facts, based upon robust evidence. Blood pressure studies in humans showed a biphasic response. PPA initially causes blood pressure to rise above baseline (a pressor effect), and then to fall below baseline (a depressor effect). These blood pressure responses are dose-related, and diminish with repeated use. Patients develop tolerance to the pressor effects within a few hours. The Center concluded that at doses of 50 mg of PPA and below, the pressor effects of the medication are small, indeed smaller than normal daily variations in basal blood pressure. Humans develop tolerance to the pressor effects quickly, within the time frame of a single dose. The only time period in which even a theoretical risk might exist is within a few hours, or less, of a patient’s taking the first dose of PPA medication. Doses of 25 mg of immediate-release PPA could not realistically be considered to pose any absolute safety risk, and “have a reasonable safety margin.”[4]

In 1991, Dr. Heidi Jolson, an FDA scientist, wrote that the agency’s spontaneous adverse event reporting system “suggested” that PPA appetite suppressants increased the risk of cerebrovascular accidents. A review of stroke data, including the adverse event reports, by epidemiology consultants failed to support a causal association between PPA and HS. The reviewers, however, acknowledged that the available data did not permit them to rule out a risk of HS. The FDA adopted the reviewers’ recommendation for a prospective, large case-control study designed to take into account the known physiological effects of PPA on blood pressure.[5]

What emerged from this regulatory indecision was a decision to conduct another epidemiologic study. In November 1992, a manufacturers’ group, now known as the Consumer Healthcare Products Association (CHPA), proposed a case-control study that would become known as the Hemorrhagic Stroke Project (HSP). In March 1993, the group submitted a proposed protocol, with a suggestion that the study be conducted by several researchers at Yale University. After feedback from the public and the Yale researchers, the group submitted a final protocol in April 1994. Both the researchers and the sponsors agreed to a scientific advisory group that would operate independently and oversee the study. The study began in September 1994. The FDA deferred action on a final monograph for PPA, and product marketing continued.

The Yale HSP authors delivered their final report on their case-control study to the FDA in May 2000.[6] The HSP was a case-control study, with 702 HS cases and 1,376 controls, men and women, ages 18 to 49. The report authors concluded that “the results of the HSP suggest that PPA increases the risk for hemorrhagic stroke.”[7] The study had taken over five years to design, conduct, and analyze. In September 2000, the FDA’s Office of Post-Marketing Drug Risk Assessment released the results, with its own interpretation and a conclusion that dramatically exceeded the HSP authors’ own interpretation.[8] The FDA’s Nonprescription Drugs Advisory Committee then voted, on October 19, 2000, to recommend that PPA be reclassified as “unsafe.” The Committee’s meeting, however, was attended by several leading epidemiologists who pointed to important methodological problems and limitations in the design and execution of the HSP.[9]

In November 2000, the FDA’s Nonprescription Drugs Advisory Committee determined that there was a significant association between PPA and HS, and recommended that PPA not be considered safe for OTC use. The FDA never addressed causality; nor did it have to do so under governing law. The FDA’s actions led the drug companies voluntarily to withdraw PPA-containing products.

The December 21, 2000, issue of The New England Journal of Medicine featured a revised version of the HSP report as its lead article.[10] Under the journal’s guidelines for statistical reporting, the authors were required to present two-tailed p-values or confidence intervals. The results from the HSP Final Report looked considerably less impressive after the reported significance probabilities were doubled. Only the finding for appetite suppressant use was branded an independent risk factor:

“The results suggest that phenylpropanolamine in appetite suppressants, and possibly in cough and cold remedies, is an independent risk factor for hemorrhagic stroke in women.”[11]

The HSP had multiple pre-specified aims, and several other statistical comparisons and analyses were added along the way. No statistical adjustment was made for these multiple comparisons, but their presence in the study must be considered; a rough illustration of the problem appears below. Perhaps that is why the authors merely suggested that PPA in appetite suppressants was an independent risk factor for HS in women. Under the current statistical guidelines of the New England Journal of Medicine, this suggestion might require even further qualification and weakening.[12]
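
To see why unadjusted multiple comparisons matter, consider the family-wise error rate for k independent tests at the 0.05 level. The HSP’s comparisons were correlated rather than independent, and the exact number of its analyses is not tallied here, so this minimal sketch (with hypothetical values of k) only illustrates the direction of the problem:

```python
# Chance of at least one false-positive "finding" among k independent
# comparisons, each tested at the 0.05 significance level.
for k in (1, 5, 10, 20):
    print(k, round(1 - 0.95 ** k, 2))
# 1 0.05
# 5 0.23
# 10 0.4
# 20 0.64
```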

The HSP study faced difficult methodological issues. The detailed and robust identification of PPA’s blood pressure effects in humans focused attention on the crucial timing of an HS in relation to ingestion of a PPA medication. Any use, or any use within the last seven or 30 days, would be fairly irrelevant to the pathophysiology of a cerebral hemorrhage. The HSP authors settled on a definition of “first use” as any use of a PPA product within 24 hours, and no other uses in the previous two weeks.[13] Given the rapid onset of pressor and depressor effects, and the adaptation response, this definition of first use was generous and likely included many irrelevant exposed cases, but at least the definition attempted to incorporate the phenomena of short-lived effect and adaptation. The appetite suppressant association did not involve any “first use,” which makes the one “suggested” increased risk much less certain and relevant.

Under the alternative definition of exposure, in addition to “first use,” ingestion of the PPA-containing medication counted if it took place on “the index day before the focal time and the preceding three calendar days.” Again, given the known pharmacokinetics and physiological effects of PPA, this three-day (plus) window seems doubtfully relevant.

All instances of “first use” occurred among men and women who used a cough or cold remedy, with an adjusted OR of 3.14, and a 95% confidence interval (CI) of 0.96–10.28, p = 0.06. The very wide confidence interval, spanning more than an order of magnitude, reveals the fragility of the statistical inference. There were but 8 first-use exposed stroke cases (out of 702), and 5 exposed controls (out of 1,376).
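
These cell counts can be checked against the published estimate. The sketch below computes the crude, unadjusted odds ratio and a Woolf (log-normal) 95% confidence interval from the reported counts; the HSP’s published figure was adjusted by logistic regression, so the numbers agree only approximately:

```python
import math

def crude_or_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and Woolf 95% CI for a 2x2 table:
    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# "First use" of any PPA product: 8 of 702 cases, 5 of 1,376 controls.
print(crude_or_ci(8, 702 - 8, 5, 1376 - 5))  # ~ (3.16, 1.03, 9.70)
```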

When this first use analysis is broken down between men and women, the result becomes even more fragile. Among men, there was only one first-use exposure in 319 male HS patients, and one first-use exposure in 626 controls, for an adjusted OR of 2.95, CI 0.15–59.59, and p = 0.48. Among women, there were 7 first-use exposures among 383 female HS patients, and 4 first-use exposures among 750 controls, with an adjusted OR of 3.13, CI 0.86–11.46, p = 0.08.

The small numbers of actual first-use exposure events speak loudly to the inconclusiveness and fragility of the study results, and the sensitivity of the results to any methodological deviations or irregularities. Of course, for the one “suggested” association, for appetite suppressant use among women, the results were even more fragile. None of the appetite suppressant cases involved “first use,” which raises serious questions whether anything meaningful was measured. There were six (non-first-use) exposed among 383 female HS patients, with only a single exposed female control among 750. The authors presented an adjusted OR of 15.58, with a p-value of 0.02. The CI, however, spanned more than two orders of magnitude, 1.51–182.21, which makes the result well-nigh uninterpretable. One of the six appetite suppressant cases was also a user of cough-cold remedies, and she was double counted in the study’s analyses. This double-counted case had a body-mass index of 19, which is certainly not overweight, and at the low end of normal.[14] The one appetite suppressant control was obese.

For the more expansive any-exposure analysis for use of PPA cough-cold medication, the results were significantly unimpressive. There were six exposed cases among 319 male HS cases, and 13 exposed controls, for an adjusted odds ratio of 0.62, CI 0.20–1.92, p = 0.41. Although not a statistically significant inverse association, the sample results for men were incompatible with a hypothetical doubling of risk. For women, on the expansive exposure definition, there were 16 exposed cases among 383 female cases, with 19 exposed controls out of 750 female controls. The odds ratio for female PPA cough-cold medication use was 1.54, CI 0.76–3.14, p = 0.23.

Aside from doubts whether the HSP measured meaningful exposures, the small number of exposed cases and controls presents insuperable interpretative difficulties for the study. First, working with a case-control design and odds ratios, there should be some acknowledgment that odds ratios always exaggerate the observed association size compared with the corresponding relative risk.[15] Second, the authors knew that confounding would be an important consideration in evaluating any observed association. Known and suspected risk factors were consistently more prevalent among cases than controls.[16]
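
The direction of that exaggeration follows from the arithmetic of odds. In this minimal sketch (with hypothetical baseline risks), the odds ratio and the relative risk nearly coincide for a rare outcome, but the gap widens as the outcome becomes more common:

```python
def implied_or(p0, rr):
    """Odds ratio implied by relative risk rr at baseline risk p0."""
    p1 = rr * p0                      # risk in the exposed group
    odds = lambda p: p / (1 - p)
    return odds(p1) / odds(p0)

print(implied_or(0.20, 2.0))   # common outcome: OR ~ 2.67 exceeds RR = 2.0
print(implied_or(0.001, 2.0))  # rare outcome: OR ~ 2.002, close to RR
```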

The HSP authors valiantly attempted to control for confounding in two ways. They selected controls by a technique known as random digit dialing, to find two controls for each case, matched on telephone exchange, sex, age, and race. The HSP authors, however, used imperfectly matched controls rather than lose the corresponding case from their study.[17] For other co-variates, the authors used multivariable logistic regression to provide odds ratios that were adjusted for potential confounding from the measured covariates. At least two of the co-variates, alcohol and cocaine use, involved potential legal or moral judgment in a study population under age 50, which almost certainly would have skewed interview results.

An even more important threat to methodological validity was that key co-variates, such as smoking, alcohol use, hypertension, and cocaine use, were incorporated into the adjustment regression as dichotomous variables; body-mass index was entered as a polychotomous variable. Monte Carlo simulation shows that categorizing a continuous variable in logistic regression inflates the rate of false-positive associations.[18] The type I (false-positive) error rate increases with sample size, with increasing correlation between the confounding variable and the outcome of interest, and with the number of categories used for the continuous variables. Numerous authors have warned of the cost and danger of dichotomizing continuous variables, in lost information, statistical power, and reliability.[19] In the field of pharmaco-epidemiology, the bias created by dichotomization of a continuous variable is harmful from both the perspective of statistical estimation and hypothesis testing.[20] Readers will be misled into believing that a study has adjusted for important co-variates, with the false allure of a fully adjusted model.
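
The mechanism is residual confounding: a dichotomized confounder adjusts only partly for its continuous original, and the null exposure absorbs what is left. The following minimal sketch (using numpy and statsmodels, with arbitrary parameter choices) illustrates the inflation; it is not a reproduction of Austin and Brunner’s simulations:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)

def one_trial(n=1000):
    """One simulated study in which exposure has NO true effect."""
    conf = rng.normal(size=n)                  # continuous confounder
    expo = (conf + rng.normal(size=n) > 0)     # exposure correlated with confounder
    p = 1 / (1 + np.exp(-(-1 + 1.5 * conf)))   # outcome driven by confounder only
    y = rng.binomial(1, p)
    conf_dich = (conf > np.median(conf)).astype(float)  # median split
    X = sm.add_constant(np.column_stack([expo.astype(float), conf_dich]))
    fit = sm.Logit(y, X).fit(disp=0)
    return fit.pvalues[1] < 0.05               # false positive on exposure?

rate = np.mean([one_trial() for _ in range(500)])
print(f"false-positive rate with dichotomized adjustment: {rate:.2f}")
# Substantially above the nominal 0.05, despite a truly null exposure.
```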

Finally, with respect to the use of logistic regression to control confounding and provide adjusted odds ratios, there is the problem of the small number of events. Although the overall sample size was adequate for logistic regression, cell sizes of one, two, or three raise serious questions about the use of large-sample statistical methods to analyze the HSP results.[21]
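
With cells this sparse, exact methods are the usual sanity check on large-sample machinery. A minimal sketch applies Fisher’s exact test to the crude appetite-suppressant counts among women reported above; the published HSP figures were regression-adjusted, so this is only an illustration of scale:

```python
from scipy.stats import fisher_exact

# Women, appetite suppressants: 6 exposed of 383 cases,
# 1 exposed of 750 controls.
table = [[6, 383 - 6], [1, 750 - 1]]
odds_ratio, p_value = fisher_exact(table)
print(odds_ratio, p_value)  # crude OR ~ 11.9, two-sided exact p ~ 0.007
```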

A Surfeit of Sub-Groups

The study protocol identified three (really four or five) specific goals, to estimate the associations: (1) between PPA use and HS; (2) between HS and type of PPA use (cough-cold remedy or appetite suppression); and (3) in women, between PPA appetite suppressant use and HS, and between PPA first use and HS.[22]

With two different definitions of “exposure,” and some modifications added along the way, with two sexes, two different indications (cold remedy and appetite suppression), and with non-pre-specified analyses such as men’s cough-cold PPA use, there was ample opportunity to inflate the Type I error rate. As the authors of the HSP final report acknowledged, they were able to identify only 60 “exposed” cases and controls.[23] In the context of a large case-control study, the authors were able to identify some nominally statistically significant outcomes (PPA appetite suppressant use and HS), but these were based upon very small numbers (six exposed cases and one exposed control), which made the results very uncertain in light of the potential biases and confounding.

Design and Implementation Problems

Case-control studies always present the difficulty of obtaining controls that are similar to cases, except for not having experienced the outcome of interest. As noted, controls were selected using “random digit dialing” in the same area code as the cases. The investigators were troubled by poor response rates from potential controls. They deviated from the standard methodology for enrolling controls through random digit dialing by enrolling the first eligible control who agreed to participate, while failing to call back candidates who had asked to speak at another time.[24]

The exposure prevalence among controls was considerably lower than PPA-product marketing research would have predicted. The low reported exposure rates among controls would inflate any observed odds ratios. Of course, it seems eminently reasonable to predict that persons who were suffering from head colds or the flu might not answer their phones, or might request a call back. People who are obese might be reluctant to tell a stranger on the telephone that they are using a medication to suppress their appetite.

In the face of this obvious opportunity for selection bias, there was also ample room for recall bias. Cases were asked about medication use just before an unforgettable catastrophic event in their lives. Controls were asked about medication use before an index day within the previous week. More controls than cases were interviewed by phone. Given the small number of exposed cases and controls, recall bias created by the differential circumstances, interview settings, and procedures was never excluded.

Lumpen Epidemiology: ICH vs. SAH

Every epidemiologic study or clinical trial has an exposure and outcome of interest, in a population of interest. The point is to compare exposed and unexposed persons, of relevant age, gender, and background, with comparable risk factors other than the exposure of interest, to determine if the exposure makes any difference in the rate of events of the outcome of interest.

Composite end points represent “lumping” together different individual end points for consideration as a single outcome. The validity of composite end points depends upon assumptions, which will have to be made at the time investigators design their study and write their protocol.  After the data are collected and analyzed, the assumptions may or may not be supported.

Lumping may offer some methodological benefits, such as increasing statistical power or reducing sample size requirements. Standard epidemiologic practice, however, as reflected in numerous textbooks and methodology articles, requires the reporting of the individual constitutive end points, along with the composite result. Even when a composite end point is employed based upon a view that the component end points are sufficiently related, that view must itself ultimately be tested by showing that the individual end points are, in fact, concordant, with risk ratios in the same direction.

The medical literature contains many clear statements cautioning consumers of medical studies against misleading claims based upon composite end points. In 2004, the British Medical Journal published a useful paper, “Users’ guide to detecting misleading claims in clinical research reports.” One of the authors’ suggestions to readers was:

“Beware of composite endpoints.”[25]

The one methodological point on which virtually all writers agree is that authors should report the results for the component end points separately, to permit readers to evaluate the individual results.[26] A leading biostatistical methodologist, the late Douglas Altman, cautioned readers against assuming that the overall estimate of association can be interpreted for each individual end point, and advised authors to provide “[a] clear listing of the individual endpoints and the number of participants experiencing them” to permit a more meaningful interpretation of composite outcomes.[27]

The HSP authors used a composite end point of hemorrhagic strokes, which combined intracerebral hemorrhages (ICH) and subarachnoid hemorrhages (SAH). In their New England Journal of Medicine article, the authors presented the composite end point, but not the risk ratios for the two individual end points. Before they published the article, one of the authors wrote his fellow authors to advise them that because ICH and SAH are very different medical phenomena, they should present the individual end points in their analysis.[28]

The HSP researchers eventually did publish an analysis of SAH and PPA use.[29] The authors identified 425 SAH cases, of which 312 met the criteria for aneurysmal SAH. They looked at many potential risk factors, such as smoking (OR = 5.07), family history (OR = 3.1), marijuana (OR = 2.38), cocaine (OR = 24.97), hypertension (OR = 2.39), aspirin (OR = 1.24), alcohol (OR = 2.95), and education, as well as PPA.

Only a bivariate analysis was presented for PPA, with an odds ratio of 1.15, p = 0.87. No confidence intervals were presented. The authors were a bit more forthcoming about the potential role of bias and confounding in this publication than they were in their earlier 2000 HSP paper. “Biases that might have affected this analysis of the HSP include selection and recall bias.”[30]

Judge Rothstein’s Rule 702 opinion reports that the “Defendants assert that this article demonstrates the lack of an association between PPA and SAHs resulting from the rupture of an aneurysm.”[31] If the defendants actually claimed a “demonstration” of “the lack of association,” then shame, and more shame, on them! First, the cited study provided only a bivariate analysis for PPA and SAH. The odds ratio of 1.15 pales in comparison with the risk ratios reported for many other common exposures, and we can only speculate what happens to the 1.15 when the PPA exposure is placed in a fully adjusted model for all important covariates. Second, the p-value of 0.87 does not tell us that the 1.15 is unreal or due to chance. The reported 15% elevation in the odds ratio is, however, very compatible with no increased risk at all. Perhaps if the defendants had been more modest in their characterization, they would not have given the court the basis to find that “defendants distort and misinterpret the Stroke Article.”[32]

Rejecting the defendants’ characterization, the court drew upon an affidavit from plaintiffs’ expert witness, Kenneth Rothman, who explained that a p-value cannot provide evidence of the lack of an effect.[33] A high p-value, with its corresponding 95% confidence interval that includes 1.0, can, however, show that the sample data are compatible with the null hypothesis. What Judge Rothstein missed, and the defendants may not have said effectively, is that the statistical analysis was a test of an hypothesis, and the test failed to allow us to reject the null hypothesis. The plaintiffs were left with an indeterminate analysis, from which they really could not honestly claim an association between PPA use and aneurysmal SAH.

I Once Was Blind, But Now I See

The HSP protocol called for interviewers to be blinded to the study hypothesis, but this guard against bias was abandoned.[34]  The HSP report acknowledged that “[b]linding would have provided extra protection against unequal ascertainment of PPA exposure in case subjects compared with control subjects.”[35]

The study was conducted out of four sites, and at least one of the sites violated the protocol by informing cases that they were participating in a study designed to evaluate PPA and HS.[36] The published article in the New England Journal of Medicine misleadingly claimed that study participants were blinded to its research hypothesis.[37] Although the plaintiffs’ expert witnesses tried to slough off this criticism, the lack of blinding among interviewers and study subjects amplifies recall biases, especially when study subjects and interviewers may have been reluctant to discuss fully several of the co-variate exposures, such as cocaine, marijuana, and alcohol use.[38]

No Causation At All

Scientists and the general population alike have been conditioned to view the controversy over tobacco smoking and lung cancer as a contrivance of the tobacco industry. What is lost in this conditioning is the context of Sir Austin Bradford Hill’s triumphant 1965 Royal Society of Medicine presidential address. Hill and his colleague Sir Richard Doll were not overly concerned with the tobacco industry, but rather with the important methodological criticisms posited by three leading statistical scientists, Joseph Berkson, Jerzy Neyman, and Sir Ronald Fisher. Hill and Doll’s success in showing that tobacco smoking causes lung cancer required a sufficient rebuttal to these critics. The 1965 speech is often cited for its articulation of nine factors to consider in evaluating an association, but its necessary condition is often overlooked. In his speech, Hill identified the situation that must exist before the nine factors come into play:

“Disregarding then any such problem in semantics we have this situation. Our observations reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance. What aspects of that association should we especially consider before deciding that the most likely interpretation of it is causation?”[39]

The starting point, before the Bradford Hill nine factors come into play, requires a “clear-cut” association, which is “beyond what we would care to attribute to the play of chance.” What is a “clear-cut” association? The most reasonable interpretation of Bradford Hill is that the starting point is an association that is not the result of chance, bias, or confounding.

Looking at the state of the science after the HSP was published, there were two earlier studies that had failed to find any association between PPA and HS. The HSP authors “suggested” an association between PPA appetite suppressants and HS, but with six exposed cases and one exposed control, this was hardly beyond the play of chance. And none of the putative associations were “clear cut” in removing bias and confounding as an explanation for the observations.

And Then Litigation Cometh

A tsunami of state and federal cases followed the publication of the HSP study.[40] The Judicial Panel on Multi-district Litigation gave Judge Barbara Rothstein, in the Western District of Washington, responsibility for the pre-trial management of the federal PPA cases. Given the problems with the HSP, the defense unsurprisingly lodged Rule 702 challenges to plaintiffs’ expert witnesses’ opinions, and Rule 703 challenges to reliance upon the HSP.[41]

In June 2003, Judge Rothstein issued her decision on the defense motions. After reviewing a selective regulatory history of PPA, the court turned to epidemiology and its statistical analysis. Although misunderstanding of p-values and confidence intervals is endemic among the judiciary, the descriptions provided by Judge Rothstein portended a poor outcome:

“P-values measure the probability that the reported association was due to chance, while confidence intervals indicate the range of values within which the true odds ratio is likely to fall.”[42]

Both descriptions are seriously incorrect.[43] A p-value is the probability, computed on the assumption that the null hypothesis is true, of observing data at least as extreme as the data actually observed; it is not the probability that the reported association was due to chance. And a 95% confidence interval describes a procedure that, in repeated sampling, yields intervals covering the true value 95% of the time; it is not a statement about where the true odds ratio is “likely to fall.” These errors are especially concerning given that Judge Rothstein would go on, in 2003, to become the director of the Federal Judicial Center, where she would oversee work on the Reference Manual on Scientific Evidence.
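
The coverage interpretation can be made concrete by simulation. In this minimal sketch (with arbitrary parameters), the true mean is fixed, the intervals vary from sample to sample, and roughly 95 percent of the intervals cover the truth:

```python
import numpy as np

rng = np.random.default_rng(7)
true_mean, n, trials = 0.0, 50, 10_000

covered = 0
for _ in range(trials):
    x = rng.normal(true_mean, 1.0, n)
    half = 1.96 * x.std(ddof=1) / np.sqrt(n)   # half-width of the 95% CI
    covered += (x.mean() - half) <= true_mean <= (x.mean() + half)

print(covered / trials)  # ~ 0.95: a property of the procedure,
                         # not of any single interval
```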

The MDL court also managed to make a mash of the one-tailed test used in the HSP report. That report was designed to inform regulatory action, where actual conclusions of causation are not necessary. When the HSP authors submitted their paper to the New England Journal of Medicine, they of course had to comply with the standards of that journal, and they doubled their reported p-values to comply with the journal’s requirement of a two-tailed test. Some key results of the HSP no longer had p-values below 5 percent, as the defense was keen to point out in its briefings.
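
For a symmetric test statistic, the two-tailed p-value is simply twice the one-tailed value, which is why results can cross the conventional five percent line when the convention changes. A minimal sketch, with a hypothetical z-score near the cutoff:

```python
from scipy.stats import norm

z = 1.75                 # hypothetical z-score near the 5% cutoff
print(norm.sf(z))        # one-sided p ~ 0.040: nominally "significant"
print(2 * norm.sf(z))    # two-sided p ~ 0.080: not "significant"
```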

From the sources it cited, the court clearly did not understand the issue, which was the need to control for random error. The court declared that it had found:

“that the HSP’s one-tailed statistical analysis complies with proper scientific methodology, and concludes that the difference in the expression of the HSP’s findings [and in the published article] falls far short of impugning the study’s reliability.”[44]

This finding ignores the very different contexts between regulatory action and causation in civil litigation. The court’s citation to an early version of the Reference Manual on Scientific Evidence further illustrates its confusion:

“Since most investigators of toxic substances are only interested in whether the agent increases the incidence of disease (as distinguished from providing protection from the disease), a one-tailed test is often viewed as appropriate.”

*****

“a rigid rule [requiring a two-tailed test] is not required if p-values and significance levels are used as clues rather than as mechanical rules for statistical proof.”[45]

In a sense, given the prevalence of advocacy epidemiology, many researchers are interested only in showing an increased risk. Nonetheless, the point of evaluating p-values is to assess the random error involved in sampling a population, and that sampling generates a rate of error even when the null hypothesis is assumed to be absolutely correct. Random error can go in either direction, resulting in risk ratios above or below 1.0. Indeed, the probability of observing a risk ratio of exactly 1.0 in a large study is incredibly small even if the null hypothesis is correct. The risk ratio for men who had used a PPA product was below 1.0, which also recommends a two-tailed test. Trading on the confusion between regulatory and litigation findings, the court proceeded to mischaracterize the parties’ interests in designing the HSP as only whether PPA increased the risk of stroke. In the MDL, the parties did not want “clues,” or help on what FDA policy should be; they wanted a test of the causal hypothesis.

In a footnote, the court pointed to testimony of Dr. Ralph Horwitz, one of the HSP investigators, who stated that “[a]ll parties involved in designing the HSP were interested solely in testing whether PPA increased the risk of stroke.” The parties, of course, were not designing the HSP for support for litigation claims.[46] The court also cited, in this footnote, a then-recent case that found a one-tailed p-value inappropriate “where that analysis assumed the very fact in dispute.” The plaintiffs’ reliance upon the one-sided p-values in the unpublished HSP report did exactly that.[47] The court tried to excuse the failure to rule out random error by pointing to language in the published HSP article, where the authors stated that inconclusive findings raised “concern regarding safety.”[48]

In analyzing the defense challenge to the opinions based upon the HSP, Judge Rothstein committed both legal and logical fallacies. First, citing Professor David Faigman’s treatise for the proposition that epidemiology is widely accepted because its “general techniques are valid,” the court found that the HSP, and reliance upon it, was valid, despite the identified problems. The issue was not whether epidemiologic techniques in general are valid, but whether the particular techniques used in the HSP were valid. The devilish details of the HSP largely went ignored.[49] From a legal perspective, Judge Rothstein’s opinion can be seen to place a burden upon the defense to show invalidity, by invoking a presumption of validity. This shifting of the burden was then, and is now, contrary to the law.

Perhaps the most obvious dodge of the court’s gatekeeping responsibility came with the conclusory assertion that the “Defendants’ ex post facto dissection of the HSP fails to undermine its reliability. Scientific studies almost invariably contain flaws.”[50] It is sobering to consider that all human beings have flaws, and yet somehow we distinguish between sinners and saints, and between criminals and heroes. The court shirked its responsibility to look at the identified flaws to determine whether they threatened the HSP’s internal validity, as well as its external validity as applied to the plaintiffs’ claims for hemorrhagic strokes in each of the many subgroups considered in the HSP, and for outcomes not considered at all, such as myocardial infarction and ischemic stroke. Given that there was but one key epidemiologic study relied upon to support the plaintiffs’ extravagant causal claims, the identified flaws might have been expected to lead to some epistemic humility.

The PPA MDL court exhibited a willingness to cherry-pick HSP results to support its low-grade gatekeeping. For instance, the court recited that “[b]ecause no men reported use of appetite suppressants and only two reported first use of a PPA-containing product, the investigators could not determine whether PPA posed an increased risk for hemorrhagic stroke in men.”[51] There was, of course, another definition of PPA exposure that yielded a total of 19 exposed men, about one-third of all exposed cases and controls. All exposed men had used OTC PPA cough-cold remedies: six men with HS, and 13 controls, with a reported odds ratio of 0.62 (95% C.I., 0.20–1.92; p = 0.41). Although the result for men was not statistically significant, the point estimate for the sample was a risk ratio below one, with a confidence interval that excluded a doubling of the risk. The number of male exposed HS cases was the same as the number of female HS appetite suppressant cases, which somehow did not disturb the court.

Superficially, the PPA MDL court appeared to place great weight on the fact of peer-reviewed publication in a prestigious journal, by well-credentialed scientists and clinicians: “[t]he prestigious NEJM published the HSP results, further substantiating that the research bears the indicia of good science.”[52] Although Professor Susan Haack’s writings on law and science are often errant, her analysis of this kind of blind reliance on peer review is noteworthy:

“though peer-reviewed publication is now standard practice at scientific and medical journals, I doubt that many working scientists imagine that the fact that a work has been accepted for publication after peer review is any guarantee that it is good stuff, or that it’s not having been published necessarily undermines its value. The legal system, however, has come to invest considerable epistemic confidence in peer-reviewed publication — perhaps for no better reason than that the law reviews are not peer-reviewed!”[53]

Ultimately, the PPA MDL court revealed that it was quite inattentive to the validity concerns of the HSP. Among the cases filed in the federal court were heart attack and ischemic stroke claims. The HSP did not address those outcomes, and yet the MDL court was perfectly willing to green-light the claims on the basis of case reports and expert witness hand-waving about “plausibility.” Not only was this reliance upon case reports plus biological plausibility against the weight of legal authority, it was against the weight of scientific opinion, as expressed by the HSP authors themselves:

“Although the case reports called attention to a possible association between the use of phenylpropanolamine and the risk of hemorrhagic stroke, the absence of control subjects meant that these studies could not produce evidence that meets the usual criteria for valid scientific inference.”[54]

If no epidemiology at all was necessary for the ischemic stroke and myocardial infarction claims, then a deeply flawed epidemiologic study was even better than nothing. And peer review and prestige were merely window dressing.

The HSP study was subjected to much greater analysis in actual trial litigation. Before the MDL court concluded its abridged gatekeeping, the defense successfully sought the data underlying the HSP. Plaintiffs’ counsel and the Yale investigators resisted, and filed motions to quash the defense subpoenas. The MDL court denied the motions, and required the parties to collaborate on the redaction of medical records to be produced.[55]

In a law review article published a few years after the PPA Rule 702 decision, Judge Rothstein immodestly described the PPA MDL as a “model mass tort,” and without irony characterized herself as having taken “an aggressive role in determining the admissibility of scientific evidence.”[56]

The MDL court’s PPA decision stands as a landmark of judicial incuriousness and credulity. The court conducted hearings and entertained extensive briefings on the reliability of the plaintiffs’ expert witnesses’ opinions, which were based largely upon the one epidemiologic study, the Yale Hemorrhagic Stroke Project. In the end, publication in a prestigious peer-reviewed journal proved to be a proxy for independent review, and an excuse not to exercise critical judgment: “The prestigious NEJM published the HSP results, further substantiating that the research bears the indicia of good science.” Id. at 1239 (citing Daubert II for the proposition that peer review shows the research meets the minimal criteria for good science). The admissibility challenges were denied.

Exuberant Praise for Judge Rothstein

In 2009, at an American Law Institute – American Bar Association continuing legal education seminar on expert witnesses and environmental litigation, Anthony Roisman presented on “Daubert & Its Progeny – Finding & Selecting Experts – Direct & Cross-Examination.” Roisman has been active in various plaintiff advocacy organizations, including serving as the head of the Association of Trial Lawyers of America’s Section on Toxic, Environmental & Pharmaceutical Torts (STEP). In his 2009 lecture, Roisman praised Judge Rothstein’s PPA Rule 702 decision as “the way Daubert should be interpreted.” More concerning was Roisman’s revelation that Judge Rothstein wrote the PPA decision “fresh from a seminar conducted by the Tellus Institute, which is an organization set up of scientists to try to bring some common sense to the courts’ interpretation of science, which is what is going on in a Daubert case.”[57]

Roisman’s endorsement of the PPA decision may have been purely result-oriented jurisprudence, but what of his enthusiasm for the “learning” that Judge Rothstein received fresh from the Tellus Institute? What exactly is, or was, the Tellus Institute?

In June 2003, the same month as Judge Rothstein’s PPA decision, the Tellus Institute supported a group known as Scientific Knowledge and Public Policy (SKAPP), in publishing an attack on the Daubert decision. The Tellus-SKAPP paper, “Daubert: The Most Influential Supreme Court Ruling You’ve Never Heard Of,” appeared online in 2003.[58]

David Michaels, a plaintiffs’ expert in chemical exposure cases, and a founder of SKAPP, has typically described his organization as having been funded by the Common Benefit Trust, “a fund established pursuant to a court order in the Silicone Gel Breast Implant Liability litigation.”[59] What Michaels hides is that this “Trust” is nothing other than the common benefits fund set up in MDL 926, as in most MDLs, to permit plaintiffs’ counsel to retain and present expert witnesses in the common proceedings. In other words, it was the plaintiffs’ lawyers’ walking-around money. The Tellus Institute was clearly aligned with its sister organization, SKAPP. Alas, Richard Clapp, who was a testifying expert witness for PPA plaintiffs, was an active member of the Tellus Institute at the time of the judicial educational seminar for Judge Rothstein.[60] Clapp is listed as a member of the planning committee responsible for preparing the anti-Daubert pamphlet. In 2005, as director of the Federal Judicial Center, Judge Rothstein attended another conference, the “Coronado Conference,” which was sponsored by SKAPP.[61]

Roisman’s revelation in 2009, after the dust had settled on the PPA litigation, may well put Judge Rothstein in the same category as Judge James Kelly, against whom the U.S. Court of Appeals for the Third Circuit issued a writ of mandamus for recusal. Judge Kelly was invited to attend a conference on asbestos medical issues, set up by Dr. Irving Selikoff with scientists who testified for plaintiffs’ counsel. The conference was funded by plaintiffs’ counsel. The co-conspirators, Selikoff and plaintiffs’ counsel, paid for Judge Kelly’s transportation and lodgings, without revealing the source of the funding.[62]

In the case of Selikoff and plaintiffs’ counsel Ronald Motley’s effort to subvert the neutrality of Judge James M. Kelly in the school district asbestos litigation, and to pervert the course of justice, the conspiracy was detected in time for a successful recusal effort. In the PPA litigation, there was no disclosure of the efforts of the anti-Daubert advocacy group, the Tellus Institute, to undermine the neutrality of a federal judge.

Aftermath of Failed MDL Gatekeeping

Ultimately, the HSP study received much more careful analysis before juries. Although the cases that went to trial involved plaintiffs with catastrophic injuries, and a high-profile article in the New England Journal of Medicine, the jury verdicts were overwhelmingly in favor of the defense.[63]

In the first case that went to trial (but the second to verdict), the defense presented a thorough scientific critique of the HSP. The underlying data and medical records that had been produced in response to a Rule 45 subpoena in the MDL allowed juries to see that the study investigators had deviated from the protocol in ways that increased the number of exposed cases, with the obvious result of inflating the reported odds ratios. Juries were ultimately much more curious about evidence and testimony on the reclassifications of exposure that drove up the odds ratios for PPA use than they were about the performance of logistic regressions.

The HSP investigators were well aware of the potential for medication use to occur after the onset of stroke symptoms (headache), which may have sent a person to the medicine chest for an OTC cold remedy. Case 71-0039 was just such a case, as shown by the medical records and the HSP investigators’ initial classification of the case. On dubious grounds, however, the study team reclassified the time of stroke onset to a point after the PPA-medication use, a change that the investigators knew increased their chances of finding an association.

The reclassification of Case 20-0092 was even more egregious. The patient was originally diagnosed as having experienced a transient ischemic attack (TIA), after a CT scan of the head showed no bleed. Case 20-0092 was thus not a case. For the TIA, the patient was given heparin, an appropriate therapy, but one that is known to cause bleeding. The following day, an MRI of the head revealed an HS. The HSP nonetheless classified Case 20-0092 as a case.

In Case 18-0025, the patient experienced a headache in the morning, and took a PPA medication (Contac) for relief. The stroke was already underway when the Contac was taken, but the HSP reversed the order of events.

Case 62-0094 presented an interesting medical history that included an event no one in the HSP considered including in the interview protocol. In addition to a history of heavy smoking, alcohol, cocaine, heroin, and marijuana use, and a history of seizure disorder, Case 62-0094 suffered a traumatic head injury immediately before developing an SAH. Treating physicians ascribed the SAH to the traumatic injury, but, understandably, no controls were identified with a similar head injury within the exposure period.

Both sides of the PPA litigation accused the other of “hacking at the A cell” (the exposed-case cell of the two-by-two table), but juries seemed to understand that the hacking had started before the paper was published.

In a Los Angeles case involving two plaintiffs, where the jury heard the details of how the HSP cases were analyzed, the jury returned two defense verdicts. In post-trial motions, plaintiffs’ counsel challenged the defendants’ reliance upon the underlying data in the HSP, which went behind the peer-reviewed publication, and which showed that the peer review had failed to prevent serious errors. In essence, the plaintiffs’ counsel claimed that the defense’s scrutiny of the underlying data and investigator misclassifications was itself not a “generally accepted” method, and thus inadmissible. The trial court rejected the plaintiffs’ claim and their request for a new trial, and spoke to the importance of looking beyond the superficial imprimatur of peer review of the key study relied upon by plaintiffs in the PPA litigation:

“I mean, you could almost say that there was some unethical activity with that Yale Study.  It’s real close.  I mean, I — I am very, very concerned at the integrity of those researchers.

********

Yale gets — Yale gets a big black eye on this.”[64]

Epidemiologist Charles Hennekens, who had been a consultant to PPA-medication manufacturers, published a critique of the HSP study in 2006. The Hennekens critique included many of the criticisms lodged by Hennekens himself, as well as by epidemiologists Lewis Kuller, Noel Weiss, and Brian Strom, at the October 2000 FDA meeting, before the HSP was published. Richard Clapp, Tellus Institute activist and expert witness for PPA plaintiffs, and Michael Williams, lawyer for PPA claimants, wrote a letter criticizing Hennekens.[65] David Michaels, an expert witness for plaintiffs in other chemical exposure cases, and a founder of SKAPP, which had collaborated with the Tellus Institute on its anti-Daubert campaign, wrote a letter accusing Hennekens of “mercenary epidemiology,” for engaging in re-analysis of a published study. Michaels never complained about the litigation-inspired re-analyses put forward by plaintiffs’ witnesses in the Bendectin litigation. Plaintiffs’ lawyers and their expert witnesses had much to gain by starting the litigation and trying to expand its reach. Defense lawyers and their expert witnesses effectively put themselves out of business by shutting it down.[66]


[1] Rachel Gorodetsky, “Phenylpropanolamine,” in Philip Wexler, ed., 7 Encyclopedia of Toxicology 559 (4th ed. 2024).

[2] Hershel Jick, Pamela Aselton, and Judith R. Hunter, “Phenylpropanolamine and Cerebral Hemorrhage,” 323 Lancet 1017 (1984).

[3] Robert R. O’Neill & Stephen W. Van de Carr, “A Case-Control Study of Adrenergic Decongestants and Hemorrhagic CVA Using a Medicaid Data Base,” m.s. (1985).

[4] Raymond Lipicky, Center for Drug Evaluation and Research, PPA, Safety Summary at 29 (Aug. 9, 1990).

[5] Center for Drug Evaluation and Research, US Food and Drug Administration, “Epidemiologic Review of Phenylpropanolamine Safety Issues” (April 30, 1991).

[6] Ralph I. Horwitz, Lawrence M. Brass, Walter N. Kernan, and Catherine M. Viscoli, “Phenylpropanolamine & Risk of Hemorrhagic Stroke – Final Report of the Hemorrhagic Stroke Project” (May 10, 2000).

[7] Id. at 3, 26.

[8] Lois La Grenade & Parivash Nourjah, “Review of study protocol, final study report and raw data regarding the incidence of hemorrhagic stroke associated with the use of phenylpropanolamine,” Division of Drug Risk Assessment, Office of Post-Marketing Drug Risk Assessment (OPDRA) (Sept. 27, 2000). These authors concluded that the HSP report provided “compelling evidence of increased risk of hemorrhagic stroke in young people who use PPA-containing appetite suppressants. This finding, taken in association with evidence provided by spontaneous reports and case reports published in the medical literature leads us to recommend that these products should no longer be available for over the counter use.”

[9] Among those who voiced criticisms of the design, methods, and interpretation of the HSP study were Noel Weiss, Lewis Kuller, Brian Strom, and Janet Daling. Many of the criticisms would prove to be understated in the light of post-publication review.

[10] Walter N. Kernan, Catherine M. Viscoli, Lawrence M. Brass, J.P. Broderick, T. Brott, and Edward Feldmann, “Phenylpropanolamine and the risk of hemorrhagic stroke,” 343 New Engl. J. Med. 1826 (2000) [cited as Kernan].

[11] Kernan, supra note 10, at 1826 (emphasis added).

[12] David Harrington, Ralph B. D’Agostino, Sr., Constantine Gatsonis, Joseph W. Hogan, David J. Hunter, Sharon-Lise T. Normand, Jeffrey M. Drazen, and Mary Beth Hamel, “New Guidelines for Statistical Reporting in the Journal,” 381 New Engl. J. Med. 285 (2019).

[13] Kernan, supra note 10, at 1827.

[14] Transcript of Meeting on Safety Issues of Phenylpropanolamine (PPA) in Over-the-Counter Drug Products 117 (Oct. 19, 2000).

[15] See, e.g., Huw Talfryn Oakley Davies, Iain Kinloch Crombie, and Manouche Tavakoli, “When can odds ratios mislead?” 316 Brit. Med. J. 989 (1998); Thomas F. Monaghan, Rahman, Christina W. Agudelo, Alan J. Wein, Jason M. Lazar, Karel Everaert, and Roger R. Dmochowski, “Foundational Statistical Principles in Medical Research: A Tutorial on Odds Ratios, Relative Risk, Absolute Risk, and Number Needed to Treat,” 18 Internat’l J. Envt’l Research & Public Health 5669 (2021).

[16] Kernan, supra note 10, at 1829, Table 2.

[17] Kernan, supra note 10, at 1827.

[18] Peter C. Austin & Lawrence J. Brunner, “Inflation of the type I error rate when a continuous confounding variable is categorized in logistic regression analyses,” 23 Statist. Med. 1159 (2004).

[19] See, e.g., Douglas G. Altman & Patrick Royston, “The cost of dichotomising continuous variables,” 332 Brit. Med. J. 1080 (2006); Patrick Royston, Douglas G. Altman, and Willi Sauerbrei, “Dichotomizing continuous predictors in multiple regression: a bad idea,” 25 Stat. Med. 127 (2006). See also Robert C. MacCallum, Shaobo Zhang, Kristopher J. Preacher, and Derek D. Rucker, “On the Practice of Dichotomization of Quantitative Variables,” 7 Psychological Methods 19 (2002); David L. Streiner, “Breaking Up is Hard to Do: The Heartbreak of Dichotomizing Continuous Data,” 47 Can. J. Psychiatry 262 (2002); Henian Chen, Patricia Cohen, and Sophie Chen, “Biased odds ratios from dichotomization of age,” 26 Statist. Med. 3487 (2007); Carl van Walraven & Robert G. Hart, “Leave ‘em Alone – Why Continuous Variables Should Be Analyzed as Such,” 30 Neuroepidemiology 138 (2008); O. Naggara, J. Raymond, F. Guilbert, D. Roy, A. Weill, and Douglas G. Altman, “Analysis by Categorizing or Dichotomizing Continuous Variables Is Inadvisable,” 32 Am. J. Neuroradiol. 437 (Mar 2011); Neal V. Dawson & Robert Weiss, “Dichotomizing Continuous Variables in Statistical Analysis: A Practice to Avoid,” Med. Decision Making 225 (2012); Phillippa M Cumberland, Gabriela Czanner, Catey Bunce, Caroline J Doré, Nick Freemantle, and Marta García-Fiñana, “Ophthalmic statistics note: the perils of dichotomising continuous variables,” 98 Brit. J. Ophthalmol. 841 (2014).

[20] Valerii Fedorov, Frank Mannino, and Rongmei Zhang, “Consequences of dichotomization,” 8 Pharmaceut. Statist. 50 (2009).

[21] Peter Peduzzi, John Concato, Elizabeth Kemper, Theodore R. Holford, and Alvan R. Feinstein, “A simulation study of the number of events per variable in logistic regression analysis,” 49 J. Clin. Epidem. 1373 (1996).

[22] HSP Final Report at 5.

[23] HSP Final Report at 26.

[24] Byron G. Stier & Charles H. Hennekens, “Phenylpropanolamine and Hemorrhagic Stroke in the Hemorrhagic Stroke Project: A Reappraisal in the Context of Science, the Food and Drug Administration, and the Law,” 16 Ann. Epidem. 49, 50 (2006) [cited as Stier & Hennekens].

[25] Victor M. Montori, Roman Jaeschke, Holger J. Schünemann, Mohit Bhandari, Jan L Brozek, P. J. Devereaux, and Gordon H. Guyatt, “Users’ guide to detecting misleading claims in clinical research reports,” 329 Brit. Med. J. 1093 (2004). 

[26] Wolfgang Ahrens & Iris Pigeot, eds., Handbook of Epidemiology 1840 (2d ed. 2014) (§ 47.5.8, Use of Composite Endpoints); Stuart J. Pocock, John J. V. McMurray, and Tim J. Collier, “Statistical Controversies in Reporting of Clinical Trials: Part 2 of a 4-Part Series on Statistics for Clinical Trials,” 66 J. Am. Coll. Cardiol. 2648, 2650-51 (2015) (“Interpret composite endpoints carefully.”); Kenneth F. Schulz & David A. Grimes, “Multiplicity in randomized trials I: endpoints and treatments,” 365 Lancet 1591, 1595 (2005).

[27] Eric Lim, Adam Brown, Adel Helmy, Shafi Mussa & Douglas Altman, “Composite Outcomes in Cardiovascular Research: A Survey of Randomized Trials,” 149 Ann. Intern. Med. 612 (2008).

[28] See, e.g., Thomas Brott email to Walter Kernan (Sept. 10, 2000).

[29] Joseph P. Broderick, Catherine M. Viscoli, Thomas Brott, Walter N. Kernan, Lawrence M. Brass, Edward Feldmann, Lewis B. Morgenstern, Janet Lee Wilterdink, and Ralph I. Horwitz, “Major Risk Factors for Aneurysmal Subarachnoid Hemorrhage in the Young Are Modifiable,” 34 Stroke 1375 (2003).

[30] Id. at 1379.

[31] In re Phenylpropanolamine Prods. Liab. Litig., 289 F.Supp. 2d 1230, 1243 (W.D. Wash. 2003).

[32] Id. at 1243.

[33] Id., citing Rothman Affidavit, ¶ 7; Kenneth J. Rothman, Epidemiology:  An Introduction at 117 (2002).

[34] HSP Final Report at 26 (“HSP interviewers were not blinded to the case-control status of study subjects and some were aware of the study purpose.”); Walter Kernan Dep. at 473-74, In re PPA Prods. Liab. Litig., MDL 1407 (W.D. Wash.) (Sept. 19, 2002).

[35] HSP Final Report at 26.

[36] Stier & Hennekens, note 24 supra, at 51.

[37] Kernan, supra note 10, at 1831.

[38] See Christopher T. Robertson & Aaron S. Kesselheim, Blinding as a Solution to Bias – Strengthening Biomedical Science, Forensic Science, and the Law 53 (2016); Sandy Zabell, “The Virtues of Being Blind,” 29 Chance 32 (2016).

[39] Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 295 (1965).

[40] See Barbara J. Rothstein, Francis E. McGovern, and Sarah Jael Dion, “A Model Mass Tort: The PPA Experience,” 54 Drake L. Rev. 621 (2006); Linda A. Ash, Mary Ross Terry, and Daniel E. Clark, Matthew Bender Drug Product Liability § 15.86 PPA (2003).

[41] In re Phenylpropanolamine Prods. Liab. Litig., 289 F.Supp. 2d 1230 (W.D. Wash. 2003).

[42] Id. at 1236 n.1.

[43] Michael O. Finkelstein & Bruce Levin, Statistics for Lawyers 171, 173-74 (3rd ed. 2015). See also Sander Greenland, Stephen J. Senn, Kenneth J. Rothman, John B. Carlin, Charles Poole, Steven N. Goodman, and Douglas G. Altman, “Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations,” 31 Eur. J. Epidem. 337 (2016).

[44] In re Phenylpropanolamine Prods. Liab. Litig., 289 F.Supp. 2d 1230, 1241 (W.D. Wash. 2003).

[45] Id. (citing Reference Manual at 126-27, 358 n.69). The edition of the Manual was not identified by the court.

[46] Id. at n.9, citing deposition of Ralph Horowitz [sic].

[47] Id., citing Good v. Fluor Daniel Corp., 222 F.Supp. 2d 1236, 1242-43 (E.D. Wash. 2002).

[48] Id. at 1241, citing Kernan at 183.

[49] In re Phenylpropanolamine Prods. Liab. Litig., 289 F.Supp. 2d 1230, 1239 (W.D. Wash. 2003) (citing 2 Modern Scientific Evidence: The Law and Science of Expert Testimony § 28-1.1, at 302-03 (David L. Faigman,  et al., eds., 1997) (“Epidemiologic studies have been well received by courts trying mass tort suits. Well-conducted studies are uniformly admitted. The widespread acceptance of epidemiology is based in large part on the belief that the general techniques are valid.”).

[50] Id. at 1240. The court cited the Reference Manual on Scientific Evidence 337 (2d ed. 2000), for this universal attribution of flaws to epidemiology studies (“It is important to recognize that most studies have flaws. Some flaws are inevitable given the limits of technology and resources.”) Of course, when technology and resources are limited, expert witnesses are permitted to say “I cannot say.” The PPA MDL court also cited another MDL court, which declared that “there is no such thing as a perfect epidemiological study.” In re Orthopedic Bone Screw Prods. Liab. Litig., MDL No. 1014, 1997 WL 230818, at *8-9 (E.D.Pa. May 5, 1997).

[51] Id. at 1236.

[52] Id. at 1239.

[53] Susan Haack, “Irreconcilable Differences? The Troubled Marriage of Science and Law,” 72 Law & Contemp. Problems 1, 19 (2009) (internal citations omitted). It may be telling that Haack has come to publish much of her analysis in law reviews. See Nathan Schachtman, “Misplaced Reliance On Peer Review to Separate Valid Science From Nonsense,” Tortini (Aug. 14, 2011).

[54] Kernan, supra note 10, at 1831.

[55] In re Phenylpropanolamine Prods. Liab. Litig., MDL 1407, Order re Motion to Quash Subpoenas re Yale Study’s Hospital Records (W.D. Wash. Aug. 16, 2002). Two of the HSP investigators wrote an article, over a decade later, to complain about litigation efforts to obtain data from ongoing studies. They did not mention the PPA case. Walter N. Kernan, Catherine M. Viscoli, and Mathew C. Varughese, “Litigation Seeking Access to Data From Ongoing Clinical Trials: A Threat to Clinical Research,” 174 J. Am. Med. Ass’n Intern. Med. 1502 (2014).

[56] Barbara J. Rothstein, Francis E. McGovern, and Sarah Jael Dion, “A Model Mass Tort: The PPA Experience,” 54 Drake L. Rev. 621, 638 (2006).

[57] Anthony Roisman, “Daubert & Its Progeny – Finding & Selecting Experts – Direct & Cross-Examination,” ALI-ABA (2009). Roisman’s remarks about the role of the Tellus Institute start just after minute 8 of the recording, available from the American Law Institute and the author.

[58] See “Daubert: The Most Influential Supreme Court Ruling You’ve Never Heard Of,” A Publication of the Project on Scientific Knowledge and Public Policy, coordinated by the Tellus Institute (2003).

[59] See, e.g., David Michaels, Doubt is Their Product: How Industry’s War on Science Threatens Your Health 267 (2008).

[60] See Richard W. Clapp & David Ozonoff, “Environment and Health: Vital Intersection or Contested Territory?” 30 Am. J. L. & Med. 189, 189 (2004) (“This Article also benefited from discussions with colleagues in the project on Scientific Knowledge and Public Policy at Tellus Institute, in Boston, Massachusetts.”).

[61] See Barbara Rothstein, “Bringing Science to Law,” 95 Am. J. Pub. Health S1 (2005) (“The Coronado Conference brought scientists and judges together to consider these and other tensions that arise when science is introduced in courts.”).

[62] In re School Asbestos Litigation, 977 F.2d 764 (3d Cir. 1992). See Cathleen M. Devlin, “Disqualification of Federal Judges – Third Circuit Orders District Judge James McGirr Kelly to Disqualify Himself So As To Preserve ‘The Appearance of Justice’ Under 28 U.S.C. § 455 – In re School Asbestos Litigation (1992),” 38 Villanova L. Rev. 1219 (1993); Bruce A. Green, “May Judges Attend Privately Funded Educational Programs? Should Judicial Education Be Privatized?: Questions of Judicial Ethics and Policy,” 29 Fordham Urb. L.J. 941, 996-98 (2002).

[63] Alison Frankel, “A Line in the Sand,” The Am. Lawyer – Litigation (2005); Alison Frankel, “The Mass Tort Bonanza That Wasn’t,” The Am. Lawyer (Jan. 6, 2006).

[64] O’Neill v. Novartis AG, California Superior Court, Los Angeles Cty., Transcript of Oral Argument on Post-Trial Motions, at 46-47 (March 18, 2004) (Hon. Anthony J. Mohr), aff’d sub nom. O’Neill v. Novartis Consumer Health, Inc., 147 Cal. App. 4th 1388, 55 Cal. Rptr. 3d 551, 558-61 (2007).

[65] Richard Clapp & Michael L. Williams, “Regarding ‘Phenylpropanolamine and Hemorrhagic Stroke in the Hemorrhagic Stroke Project’,” 16 Ann. Epidem. 580 (2006).

[66] David Michaels, “Regarding ‘Phenylpropanolamine and Hemorrhagic Stroke in the Hemorrhagic Stroke Project’: Mercenary Epidemiology – Data Reanalysis and Reinterpretation for Sponsors with Financial Interest in the Outcome,” 16 Ann. Epidem. 583 (2006). Hennekens responded to these letters. Stier & Hennekens, note 24, supra.