Adverse event reporting is a recognized, important component of pharmacovigilance. Regulatory agencies around the world further acknowledge that an increased rate of reporting of a specific adverse event may signal the possible existence of an association. In the last two decades, pharmacoepidemiologists have developed techniques for mining databases of adverse event reports for evidence of a disproportionate level of reporting for a particular medication – adverse event pair. Such studies can help identify “signals” of potential issues for further study with properly controlled epidemiologic studies.[1]
Most sane and sensible epidemiologists recognize that the low quality, inconsistencies, and biases of the data in adverse event reporting databases render studies of disproportionate reporting “poor surrogates for controlled epidemiologic studies.” In the face of incomplete and inconsistent reporting, so-called disproportionality analyses (“DPA”) assume that incomplete reporting will be constant for all events for a specific medication. Regulatory attention, product labeling, lawyer advertising and client recruitment, social media and publicity, and time since launch are all known to affect reporting rates, and to ensure that reporting rates for some event types for a specific medication will be higher. Thus, the DPA assumptions are virtually always false and unverifiable.[2]
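A minimal numerical sketch may help show why the constant-reporting assumption matters. Every number below is invented for illustration, and the function names are hypothetical rather than drawn from any pharmacovigilance software. The sketch gives a drug and its comparator identical true event counts, so the true relative risk is 1.0 by construction, and then varies only the fraction of events of interest that get reported for the drug.

```python
# Hypothetical illustration only: every count and reporting fraction below is
# invented, not taken from any real adverse event database.

def expected_reports(true_events, reporting_fraction):
    """Expected number of reports when only a fraction of true events is reported."""
    return true_events * reporting_fraction


def proportional_reporting_ratio(a, b, c, d):
    """Proportional reporting ratio (PRR) from a 2x2 table of report counts.

    a: reports of the event of interest for the drug
    b: reports of all other events for the drug
    c: reports of the event of interest for the comparator
    d: reports of all other events for the comparator
    """
    return (a / (a + b)) / (c / (c + d))


def prr_under_reporting(frac_drug_event, frac_everything_else=0.05):
    """PRR when drug and comparator have identical true event counts.

    Only the reporting fraction for the drug's event of interest is varied;
    all other reporting fractions are held at frac_everything_else.
    """
    true_event, true_other = 100, 5000  # assumed, and the same for both products
    a = expected_reports(true_event, frac_drug_event)
    b = expected_reports(true_other, frac_everything_else)
    c = expected_reports(true_event, frac_everything_else)
    d = expected_reports(true_other, frac_everything_else)
    return proportional_reporting_ratio(a, b, c, d)


# Reporting constant across events: the DPA assumption holds.
uniform = prr_under_reporting(frac_drug_event=0.05)
# Event of interest reported four times as often for the drug (e.g., stimulated reporting).
stimulated = prr_under_reporting(frac_drug_event=0.20)

print(f"PRR with constant reporting:   {uniform:.2f}")
print(f"PRR with stimulated reporting: {stimulated:.2f}")
```

Under these assumed inputs, the ratio stays at 1 when reporting is constant, but rises to roughly 3.8 when reporting of the one event is stimulated, even though the underlying risks are identical. The ratio measures reporting behavior, and nothing in the report counts themselves reveals which scenario generated them.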
DPAs are non-analytical epidemiologic studies that cannot rise in quality or probativeness above the level of the anecdote upon which they are based. DPAs may generate signals or hypotheses, but they cannot test hypotheses of causality. Although simple in concept, DPAs involve some complicated computations that imbue them with an aura of “proofiness.” As would-be studies that lack probativeness for causality, they are thus ideal tools for the lawsuit industry to support litigation campaigns against drugs and medical devices. Indeed, if a statistical technique is difficult to understand but relatively easy to perform and even easier to pass off to unsuspecting courts and juries, then you can count on its metastatic use in litigation. The DPA has become one of the favorite tools of the lawsuit industry’s statisticians. This litigation use, however, cannot obscure the simple fact that the relative reporting risk provided by a DPA can never rise to the level of a relative risk.
In one case in which a Parkinson’s disease patient claimed that his compulsive gambling was caused by his use of the drug Requip, the plaintiff’s expert witness attempted to invoke a DPA in support of his causal claim. In granting a Rule 702 motion to exclude the expert witnesses who relied upon a DPA, the trial judge rejected the probativeness of DPAs, based upon the FDA’s rejection of such analyses for anything other than signal detection.[3]
In the Accutane litigation, statistician David Madigan attempted to support his fatally weak causation opinion with a DPA for Crohn’s disease and Accutane adverse event reports. According to the New Jersey Supreme Court, Madigan claimed that his DPA showed “striking signal of disproportionality” indicative of a “strong association” between Accutane use and Crohn’s disease.[4] With the benefit of a thorough review by the trial court, the New Jersey Supreme Court found other indicia of unreliability in Madigan’s opinions, such that it was not fooled by Madigan’s shenanigans. In any event, no signal of disproportionality could ever show an association between medication use and a disease; at best the DPA can show only an association between reporting of the medication use and the outcome of interest.
In litigation over Mirena and intracranial hypertension, one of the lawsuit industry’s regulars, Mahyar Etminan, published a DPA based upon the FDA’s Adverse Event Reporting System, which purported to find an increased reporting odds ratio.[5] Unthinkingly, the plaintiffs’ other testifying expert witnesses relied upon Etminan’s study. When a defense expert witness pointed out that Etminan had failed to adjust for age and gender in his multivariate analysis,[6] Etminan repudiated his findings.[7] Remarkably, when Etminan published his original DPA in 2015, he declared that he had no conflicts, but when he published his repudiation, he disclosed that he “has been an expert witness in Mirena litigation in the past but is no longer part of the litigation.” The Etminan kerfuffle helped scuttle the plaintiffs’ assault on Mirena.[8]
DPAs have, on occasion, bamboozled federal judges into treating them as analytical epidemiology that can support causal claims. For instance, misrepresentations or misunderstandings of what DPAs can and cannot do carried the day in a Rule 702 contest on the admissibility of opinion testimony by statistician Rebecca Betensky. In multidistrict litigation over the safety of inferior vena cava (“IVC”) filters, plaintiffs’ counsel retained Betensky to prepare a DPA of adverse events reported for the defendants’ retrievable filters. The MDL judge’s description of Betensky’s opinion demonstrates that her DPA was either misrepresented or misunderstood:
“In this MDL, Dr. Betensky opines generally that there is a higher risk of adverse events for Bard’s retrievable IVC filters than for its permanent SNF.”[9]
The court clearly took Betensky to be opining about risk and not the risk of reporting. The court went on to describe Betensky’s calculation of a “reporting risk ratio,” but found that she could testify that the retrievable IVC filters increased the risk of the claimed adverse events, and not merely that there was an increase in reporting risk ratios.
Betensky acknowledged that the reporting risk ratios were “imperfect estimates of the actual risk ratios,”[10] but nevertheless dismissed all caveats about the inability of DPAs to assess actual increased risk. The trial court quoted Dr. Betensky’s attempt to infuse analytical rigor into a data mining exercise:
“[A]dverse events are generally considered to be underreported to the databases, and potentially differentially by severity of adverse event and by drug or medical device. . . . It is important to recognize that underreporting in and of itself is not problematic. Rather, differential underreporting of the higher risk device is what leads to bias. And even if there was differential underreporting of the higher risk device, given the variation in reporting relative risks across adverse events, the differential reporting would have had to have been highly variable across adverse events. This does not seem plausible given the severity of the adverse events considered. Given the magnitude of the RRR’s [relative reporting ratios], and their variability across adverse events, it seems implausible that differential underreporting by filter could fully explain the deviation of the observed RRR’s from 1.”[11]
Of course, this explanation fails to account for differential over-reporting for the newer, but less risky or equally risky device. Betensky dismissed notoriety bias as having caused an increase in reporting adverse events because her DPA ended with 2014, before the FDA had issued a warning letter. The lawsuit industry, however, was on the attack against IVC filters, years before 2014.[12] Similarly, Betensky dismissed consideration of the Weber effect, but her analysis apparently failed to acknowledge that notoriety and the Weber effect are just two of many possible biases in DPAs.
In the face of her credentials, the MDL trial judge retreated to the usual chestnuts that are served up when a Rule 702 challenge is denied. Judge Campbell thus observed that “[i]t is not the job of the court to insure that the evidence heard by the jury is error-free, but to insure that it is sufficiently reliable to be considered by the jury.”[13] The trial judge professed a need to “be careful not to conflate questions of admissibility of expert testimony with the weight appropriately to be accorded to such testimony by the fact finder.”[14] The court denied the claim that Betensky had engaged in an ipse dixit, by engaging in its own ipse dixit. Judge Campbell found that Betensky had explained her assumptions, had acknowledged shortcomings, and had engaged in various sensitivity tests of the validity of her DPA; and so he concluded that Betensky did not present “a case where ‘there is simply too great an analytical gap between the data and the opinion proffered’.”[15]
By closing off inquiry into the limits of the DPA methodology, Judge Campbell managed to stumble into a huge analytical gap that he either blindly ignored or was unaware of. Even the best DPAs cannot substitute for analytical epidemiology in a scientific methodology of determining causation. The ipse dixit becomes apparent when we consider that the MDL gatekeeping opinion on Rebecca Betensky fails to mention the extensive body of regulatory and scientific opinion about the distinct methodologic limitations of DPA. The U.S. FDA’s official guidance on good pharmacovigilance practices, for example, instructs us that
“[d]ata mining is not a tool for establishing causal attributions between products and adverse events.”[16]
The FDA specifically cautions that the signals detected by data mining techniques should be acknowledged to be “inherently exploratory or hypothesis generating.”[17] The agency exercises caution when making its own comparisons of adverse events between products in the same class because of the low quality of the data themselves, and uncontrollable and unpredictable biases in how the data are collected.[18] Because of the uncertainties in DPAs,
“FDA suggests that a comparison of two or more reporting rates be viewed with extreme caution and generally considered exploratory or hypothesis-generating. Reporting rates can by no means be considered incidence rates, for either absolute or comparative purposes.”[19]
The European Medicines Agency offers similar advice and caution:
“Therefore, the concept of SDR [Signal of Disproportionate Reporting] is applied in this guideline to describe a ‘statistical signal’ that has originated from a statistical method. The underlying principle of this method is that a drug–event pair is reported more often than expected relative to an independence model, based on the frequency of ICSRs on the reported drug and the frequency of ICSRs of a specific adverse event. This statistical association does not imply any kind of causal relationship between the administration of the drug and the occurrence of the adverse event.”[20]
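The independence model described in the guideline can be sketched with a simple 2x2 table of report counts. The counts below are hypothetical, and the function names are illustrative assumptions, not the EMA’s own implementation. The sketch computes the number of reports expected for a drug–event pair if drug and event were reported independently, and a reporting odds ratio with a conventional Wald 95% confidence interval on the log scale.

```python
import math

# Hypothetical counts of individual case safety reports (ICSRs); not real data.
#                 event of interest   all other events
# drug                   a                   b
# all other drugs        c                   d


def expected_under_independence(a, b, c, d):
    """Reports expected for the drug-event pair if drug and event were independent."""
    n = a + b + c + d
    return (a + b) * (a + c) / n


def reporting_odds_ratio(a, b, c, d, z=1.96):
    """Reporting odds ratio with a Wald confidence interval on the log scale."""
    ror = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - z * se_log)
    upper = math.exp(math.log(ror) + z * se_log)
    return ror, lower, upper


a, b, c, d = 30, 970, 200, 98800  # assumed counts for illustration

expected = expected_under_independence(a, b, c, d)
ror, lower, upper = reporting_odds_ratio(a, b, c, d)

print(f"observed reports: {a}, expected under independence: {expected:.1f}")
print(f"reporting odds ratio: {ror:.1f} (95% CI {lower:.1f} to {upper:.1f})")
```

Every input to this calculation is a count of reports. No count of patients exposed, and no count of events that actually occurred, appears anywhere in it, which is why the output describes reporting patterns within the database and not risk in a population.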
The current version of perhaps the leading textbook on pharmacoepidemiology is completely in accord with the above regulatory guidances. In addition to emphasizing the limitations on data quality from adverse event reporting, and the inability to interpret temporal trends, the textbook authors clearly characterize DPAs as generating signals, and unable to serve as hypothesis tests:
“a signal of disproportionality is a measure of a statistical association within a collection of AE/ADR reports (rather than in a population), and it is not a measure of causality. In this regard, it is important to underscore that the use of data mining is for signal detection – that is, for hypothesis generation – and that further work is needed to evaluate the signal.”[21]
Reporting ratios are not, and cannot serve as, measures of incidence or prevalence, because adverse event databases do not capture all the events of interest, and so each such ratio “must be interpreted cautiously.”[22] The authors further emphasize that “well-designed pharmacoepidemiology or clinical studies are needed to assess the signal.”[23]
The authors of this chapter are all scientists and officials at the FDA’s Center for Drug Evaluation and Research, and the World Health Organization. Although they properly disclaimed writing on behalf of their agencies, their agencies have independently embraced their concepts in other agency publications. The consensus view of the hypothesis-generating nature of DPAs can easily be seen in surveying the relevant literature.[24] Passing off a DPA as a study that supports causal inference is not a mere matter of “weight,” or excluding any opinion that has some potential for error. The misuse of Betensky’s DPA is a methodological error that goes to the heart of what Congress intended to be screened and excluded by Rule 702.
[1] Sean Hennessy, “Disproportionality analyses of spontaneous reports,” 13 Pharmacoepidemiology & Drug Safety 503, 503 (2004).
[2] Id. See, e.g., Patrick Waller & Mira Harrison-Woolrych, An Introduction to Pharmacovigilance 68-69 (2nd ed. 2017) (noting the example of the WHO’s DPA that found a 10-fold reporting rate increase for statins and ALS, which reporting association turned out to be spurious).
[3] Wells v. SmithKline Beecham Corp., 2009 WL 564303, at *12 (W.D. Tex. 2009) (citing and quoting from the FDA’s Guidance for Industry: Good Pharmacovigilance Practices and Pharmacoepidemiologic Assessment (2005)), aff’d, 601 F.3d 375 (5th Cir. 2010). But see In re Abilify (Aripiprazole) Prods. Liab. Litig., 299 F. Supp. 3d 1291, 1324 (N.D. Fla. 2018) (noting that the finding of a DPA that compared Abilify with other anti-psychotics helped to show that a traditional epidemiologic study was not confounded by the indication for depressive symptoms).
[4] In re Accutane Litig., 234 N.J. 340, 191 A.3d 560, 574 (2018).
[5] See Mahyar Etminan, Hao Luo, and Paul Gustafson, et al., “Risk of intracranial hypertension with intrauterine levonorgestrel,” 6 Therapeutic Advances in Drug Safety 110 (2015).
[6] Deborah Friedman, “Risk of intracranial hypertension with intrauterine levonorgestrel,” 7 Therapeutic Advances in Drug Safety 23 (2016).
[7] Mahyar Etminan, “Revised disproportionality analysis of Mirena and benign intracranial hypertension,” 8 Therapeutic Advances in Drug Safety 299 (2017).
[8] In re Mirena IUS Levonorgestrel-Related Prods. Liab. Litig. (No. II), 387 F. Supp. 3d 323, 331 (S.D.N.Y. 2019) (Engelmayer, J.).
[9] In re Bard IVC Filters Prods. Liab. Litig., No. MDL 15-02641-PHX DGC, Order Denying Motion to Exclude Rebecca Betensky at 2 (D. Ariz. Jan. 22, 2018) (Campbell, J.) (emphasis added) [Order].
[10] Id. at 4.
[11] Id.
[12] See Matt Fair, “C.R. Bard’s Faulty Filters Pose Health Risks, Suit Says,” Law360 (Aug. 10, 2012); see, e.g., Derrick J. Stobaugh, Parakkal Deepak, & Eli D. Ehrenpreis, “Alleged isotretinoin-associated inflammatory bowel disease: Disproportionate reporting by attorneys to the Food and Drug Administration Adverse Event Reporting System,” 69 J. Am. Acad. Dermatol. 393 (2013) (documenting stimulated reporting from litigation activities).
[13] Order at 6, quoting from Southwire Co. v. J.P. Morgan Chase & Co., 528 F. Supp. 2d 908, 928 (W.D. Wis. 2007).
[14] Id., citing In re Trasylol Prods. Liab. Litig., No. 08-MD-01928, 2010 WL 1489793, at *7 (S.D. Fla. Feb. 24, 2010).
[15] Id., citing and quoting from In re Trasylol Prods. Liab. Litig., No. 08-MD-01928, 2010 WL 1489793, at *7 (S.D. Fla. Feb. 24, 2010) (quoting General Electric v. Joiner, 522 U.S. 136, 146 (1997)).
[16] FDA, “Good Pharmacovigilance Practices and Pharmacoepidemiologic Assessment Guidance for Industry” at 8 (2005) (emphasis added).
[17] Id. at 9.
[18] Id.
[19] Id. at 11 (emphasis added).
[20] EUDRAVigilance Expert Working Group, European Medicines Agency, “Guideline on the Use of Statistical Signal Detection Methods in the EUDRAVigilance Data Analysis System,” at 3 (2006) (emphasis added).
[21] Gerald J. Dal Pan, Marie Lindquist & Kate Gelperin, “Postmarketing Spontaneous Pharmacovigilance Reporting Systems,” in Brian L. Strom & Stephen E. Kimmel and Sean Hennessy, Pharmacoepidemiology at 185 (6th ed. 2020) (emphasis added).
[22] Id. at 187.
[23] Id. See also Andrew Bate, Gianluca Trifirò, Paul Avillach & Stephen J.W. Evans, “Data Mining and Other Informatics Approaches to Pharmacoepidemiology,” chap. 27, in Brian L. Strom & Stephen E. Kimmel and Sean Hennessy, Pharmacoepidemiology at 685-88 (6th ed. 2020) (acknowledging the importance of DPAs for detecting signals that must then be tested with analytical epidemiology) (authors from industry, Pfizer, and academia, including NYU School of Medicine, Harvard Medical School, and London School of Hygiene and Tropical Medicine).
[24] See, e.g., Patrick Waller & Mira Harrison-Woolrych, An Introduction to Pharmacovigilance 61 (2nd ed. 2017) (“[A]lthough the numbers are calculated in a similar way to relative risks, they do not represent a meaningful calculation of risk.” *** “Indicators of disproportionality are measures of association and even quite extreme results may not be causal.”); Ronald D. Mann & Elizabeth B. Andrews, Pharmacovigilance 240 (2d ed. 2007) (“Importantly, data mining cannot prove or refute causal associations between drugs and events. Data mining simply identifies disproportionality of drug-event reporting patterns in databases. The absence of a signal does not rule out a safety problem. Similarly, the presence of a signal is not a proof of a causal relationship between a drug and an adverse event.”); Patrick Waller, An Introduction to Pharmacovigilance 49 (2010) (“[A]lthough the numbers are calculated in a similar way to relative risks, they do not represent a meaningful calculation of risk. Whilst it is true that the greater the degree of disproportionality, the more reason there is to look further, the only real utility of the numbers is to decide whether or not there are more cases than might reasonably have been expected. Indicators of disproportionality are measures of association and even quite extreme results may not be causal.”); Sidney N. Kahn, “You’ve found a safety signal–now what? Regulatory implications of industry signal detection activities,” 30 Drug Safety 615 (2007).