Medications are rigorously tested for safety and efficacy in clinical trials before approval by regulatory agencies such as the U.S. Food & Drug Administration (FDA) or the European Medicines Agency (EMA). The approval process, however, contemplates that more data about safety and efficacy will emerge from the use of approved medications in pharmacoepidemiologic studies conducted outside of clinical trials. Litigation of safety outcomes rarely arises from claims based upon the pivotal clinical trials that were conducted for regulatory approval and licensing. The typical courtroom scenario is that a safety outcome is called into question by pharmacoepidemiologic studies that purport to find associations or causality between the use of a specific medication and the claimed harm.
The International Society for Pharmacoepidemiology (ISPE), established in 1989, describes itself as an international professional organization intent on advancing health through pharmacoepidemiology and related areas of pharmacovigilance. The ISPE website defines pharmacoepidemiology as
“the science that applies epidemiologic approaches to studying the use, effectiveness, value and safety of pharmaceuticals.”
The ISPE conceptualizes pharmacoepidemiology as “real-world” evidence, in contrast to randomized clinical trials:
“Randomized controlled trials (RCTs) have served and will continue to serve as the major evidentiary standard for regulatory approvals of new molecular entities and other health technology. Nonetheless, RWE derived from well-designed studies, with application of rigorous epidemiologic methods, combined with judicious interpretation, can offer robust evidence regarding safety and effectiveness. Such evidence contributes to the development, approval, and post-marketing evaluation of medicines and other health technology. It enables patient, clinician, payer, and regulatory decision-making when a traditional RCT is not feasible or not appropriate.”
ISPE Position on Real-World Evidence (Feb. 12, 2020) (emphasis in original).
The ISPE publishes an official journal, Pharmacoepidemiology and Drug Safety, and sponsors conferences and seminars, all of which are watched by lawyers pursuing and defending drug and device health safety claims. The endorsement by the ISPE of the American Statistical Association’s 2016 statement on p-values is thus of interest not only to statisticians, but to lawyers and claimants involved in drug safety litigation.
The ISPE, through its board of directors, formally endorsed the ASA 2016 p-value statement on April 1, 2017 (no fooling) in a statement that can be found at its website:
The International Society for Pharmacoepidemiology, ISPE, formally endorses the ASA statement on the misuse of p-values and accepts it as an important step forward in the pursuit of reasonable and appropriate interpretation of data.
On March 7, 2016, the American Statistical Association (ASA) issued a policy statement that warned the scientific community about the use [of] P-values and statistical significance for interpretation of reported associations. The policy statement was accompanied by an introduction that characterized the reliance on significance testing as a vicious cycle of teaching significance testing because it was expected, and using it because that was what was taught. The statement and many accompanying commentaries illustrated that p-values were commonly misinterpreted to imply conclusions that they cannot imply. Most notably, “p-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.” Also, “a p-value does not provide a good measure of evidence regarding a model or hypothesis.” Furthermore, reliance on p-values for data interpretation has exacerbated the replication problem of scientific work, as replication of a finding is often confused with replicating the statistical significance of a finding, on the erroneous assumption that replication should lead to studies getting similar p-values.
This official statement from the ASA has ramifications for a broad range of disciplines, including pharmacoepidemiology, where use of significance testing and misinterpretation of data based on P-values is still common. ISPE has already adopted a similar stance and incorporated it into our GPP [ref] guidelines. The ASA statement, however, carries weight on this topic that other organizations cannot, and will inevitably lead to changes in journals and classrooms.
Some points of interpretation of the ASA Statement can be discussed and debated. What is clear, however, is that the ASA never urged the abandonment of p-values or even of statistical significance. The Statement contained six principles, some of which did nothing more than attempt to correct prevalent misunderstandings of p-values. The third principle stated that “[s]cientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.” (emphasis added).
This principle, as stated, thus hardly advocated for the abandonment of a threshold in testing; rather it made the unexceptional point that the ultimate scientific conclusion (say about causality) required more assessment than only determining whether a p-value passed a specified threshold.
Presumably, the ISPE’s endorsement of the ASA’s 2016 Statement embraces all six of the articulated principles, including the ASA’s fourth principle:
“4. Proper inference requires full reporting and transparency
P-values and related analyses should not be reported selectively. Conducting multiple analyses of the data and reporting only those with certain p-values (typically those passing a significance threshold) renders the reported p-values essentially uninterpretable. Cherry-picking promising findings, also known by such terms as data dredging, significance chasing, significance questing, selective inference, and “p-hacking,” leads to a spurious excess of statistically significant results in the published literature and should be vigorously avoided. One need not formally carry out multiple statistical tests for this problem to arise: Whenever a researcher chooses what to present based on statistical results, valid interpretation of those results is severely compromised if the reader is not informed of the choice and its basis. Researchers should disclose the number of hypotheses explored during the study, all data collection decisions, all statistical analyses conducted, and all p-values computed. Valid scientific conclusions based on p-values and related statistics cannot be drawn without at least knowing how many and which analyses were conducted, and how those analyses (including p-values) were selected for reporting.”
The ISPE’s endorsement of the ASA 2016 Statement references the ISPE’s own “Guidelines for Good Pharmacoepidemiology Practices (GPP),” which were promulgated initially in 1996, and revised as recently as June 2015. Good practices, as of 2015, provided that:
“Interpretation of statistical measures, including confidence intervals, should be tempered with appropriate judgment and acknowledgements of potential sources of error and limitations of the analysis, and should never be taken as the sole or rigid basis for concluding that there is or is not a relation between an exposure and outcome. Sensitivity analyses should be conducted to examine the effect of varying potentially critical assumptions of the analysis.”
All well and good, but this “good practices” statement might be taken as a bit anemic, given that it contains no mention of, or caution against, unqualified or unadjusted confidence intervals or p-values that come from multiple testing or comparisons. The ISPE endorsement of the ASA Statement now expands upon the ISPE’s good practices to include the avoidance of multiplicity and the disclosure of the full extent of analyses conducted in a study.
What happens in the “real world” of publishing, outside the board room?
Last month, the ISPE conducted its (virtual) 36th International Conference on Pharmacoepidemiology & Therapeutic Risk Management. The abstracts and poster presentations from this Conference were published last week as a Special Issue of the ISPE journal. I spot-checked the journal contents to see how well the presentations lived up to the ISPE’s statistical aspirations.
One poster presentation addressed statin use and skin cancer risk in a French prospective cohort.[1] The authors described their cohort of French women, who were 40 to 65 years old in 1990, and were followed forward. Exposure to statin medications was assessed from 2004 through 2014. The analysis included outcomes of any skin cancer, melanoma, basal-cell carcinoma (BCC), and squamous-cell carcinoma (SCC), among 66,916 women. Here is how the authors describe their findings:
There was no association between ever use of statins and skin cancer risk: the HRs were 0.96 (95% CI = 0.87-1.05) for overall skin cancer, 1.18 (95% CI = 0.96-1.47) for melanoma, 0.89 (95% CI = 0.79-1.01) for BCC, and 0.90 (95% CI = 0.67-1.21) for SCC. Associations did not differ by statin molecule nor by duration or dose of use. However, women who started to use statins before age 60 were at increased risk of BCC (HR = 1.45, 95% CI = 1.07-1.96 for ever vs never use).
To be fair, this was a poster presentation, but this short description of findings makes clear that the investigators examined at least the following subgroups:
Exposure subgroups:
- specific statin drug
- duration of use
- dosage
- age strata
and
Outcome subgroups:
- melanoma
- basal-cell carcinoma
- squamous-cell carcinoma
The reader is not told how many specific statins, how many duration groups, dosage groups, and age strata were involved in the exposure analysis. My estimate is that the exposure subgroups were likely in excess of 100. With three disease outcome subgroups, the total subgroup analyses thus likely exceeded 300. The authors did not provide any information about the full extent of their analyses.
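To give a rough sense of what a count of that size would imply, here is a back-of-the-envelope sketch in Python. It assumes, purely for illustration, that roughly 300 subgroup analyses were run, that they are independent, that no true effect exists in any of them, and that each is judged against a nominal 5% level. None of these assumptions comes from the poster; the count is my estimate, and real subgroup analyses are correlated, which would change the numbers.

```python
# Illustrative only: n_tests = 300 is the estimate discussed above, not a
# figure reported by the authors; independence of the tests is assumed.

def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of nominally 'significant' results if every null is true."""
    return n_tests * alpha


def prob_at_least_one(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one nominally 'significant' result, assuming independence."""
    return 1.0 - (1.0 - alpha) ** n_tests


n_tests = 300  # hypothetical count of subgroup analyses
print(expected_false_positives(n_tests))
print(prob_at_least_one(n_tests))
```

Under these assumptions, about 15 nominally "significant" subgroup findings would be expected by chance alone, and the probability of seeing at least one is very close to 1.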
Here is how the authors reported their conclusion:
“These findings of increased BCC risk in statin users before age 60 deserve further investigations.”
Now, the authors did not use the phrase “statistically significant,” but it is clear that they have characterized a finding of “increased BCC risk in statin users before age 60,” and in no other subgroup, and they have done so based upon a reported nominal “HR = 1.45, 95% CI = 1.07-1.96 for ever vs never use.” It is also clear that the authors have made no allowance, adjustment, modification, or qualification for the wild multiplicity arising from their estimated 300 or so subgroups. Instead, they made an unqualified statement about “increased BCC risk,” and they offered an opinion about the warrant for further studies.
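The gap between the nominal interval and any multiplicity-adjusted standard can be roughly sketched. The following Python snippet back-calculates an approximate two-sided p-value from the reported hazard ratio and 95% confidence interval. It assumes the interval is a Wald-type interval, symmetric on the log scale; that is a common convention, but it is an assumption here, because the abstract does not state how the interval was computed. The snippet then compares the result with a Bonferroni threshold for 300 comparisons, which is my estimate and not the authors' count. Bonferroni is used only as a simple, conservative yardstick; the authors were not obliged to use that particular adjustment.

```python
import math
from statistics import NormalDist


def p_value_from_ci(hr: float, lower: float, upper: float, level: float = 0.95) -> float:
    """Approximate two-sided p-value implied by a ratio estimate and its CI.

    Assumes a Wald-type interval that is symmetric on the log scale.
    """
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)        # about 1.96 for 95%
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)   # SE of log(HR)
    z = math.log(hr) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Values reported in the abstract for BCC among women starting statins before 60.
p = p_value_from_ci(1.45, 1.07, 1.96)

n_comparisons = 300                        # hypothetical: estimate, not reported
bonferroni_threshold = 0.05 / n_comparisons

print(round(p, 3), p < 0.05, p < bonferroni_threshold)
```

On these assumptions, the implied p-value is roughly 0.016, which falls below the nominal 0.05 level but is about two orders of magnitude above a Bonferroni threshold of approximately 0.00017.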
Endorsement of good statistical practices is a welcome professional organizational activity, but it is rather meaningless unless the professional societies begin to implement the good practices in their article selection, editing, and publishing activities.
[1] Marie Al Rahmoun, Yahya Mahamat-Saleh, Iris Cervenka, Gianluca Severi, Marie-Christine Boutron-Ruault, Marina Kvaskoff, and Agnès Fournier, “Statin use and skin cancer risk: A French prospective cohort study,” 29 Pharmacoepidemiol. & Drug Safety s645 (2020).