QRPs in Science and in Court

Lay juries usually function well in assessing the relevance of an expert witness’s credentials, experience, command of the facts, likeability, physical demeanor, confidence, and ability to communicate. Lay juries can understand and respond to arguments about personal bias, which no doubt is why trial lawyers spend so much time and effort to emphasize the size of fees and consulting income, and the propensity to testify only for one side. For procedural and practical reasons, however, lay juries do not function very well in assessing the actual merits of scientific controversies. And with respect to methodological issues that underlie the merits, juries barely function at all. The legal system imposes no educational or experiential qualifications for jurors, and trials are hardly the occasion to teach jurors the methodology, skills, and information needed to resolve methodological issues that underlie a scientific dispute.

Scientific studies, reviews, and meta-analyses are virtually never directly admissible in evidence in courtrooms in the United States. As a result, juries do not have the opportunity to read and ponder the merits of these sources, and assess their strengths and weaknesses. The working assumption of our courts is that juries are not qualified to engage directly with the primary sources of scientific evidence, and so expert witnesses are called upon to deliver opinions based upon a scientific record not directly in evidence. In the litigation of scientific disputes, our courts thus rely upon the testimony of so-called expert witnesses in the form of opinions. Not only must juries, the usual trier of fact in our courts, assess the credibility of expert witnesses, but they must assess whether expert witnesses are accurately describing studies that they cannot read in their entirety.

The convoluted path by which science enters the courtroom supports the liberal and robust gatekeeping process outlined under Rules 702 and 703 of the Federal Rules of Evidence. The court, not the jury, must make a preliminary determination, under Rule 104, that the facts and data of a study are reasonably relied upon by an expert witness (Rule 703). And the court, not the jury, again under Rule 104, must determine that expert witnesses possess appropriate qualifications for relevant expertise, and that these witnesses have proffered opinions sufficiently supported by facts or data, based upon reliable principles and methods, and reliably applied to the facts of the case (Rule 702). There is no constitutional right to bamboozle juries with inconclusive, biased, and confounded or crummy studies, or selective and incomplete assessments of the available facts and data. Back in the days of “easy admissibility,” opinions could be tested on cross-examination, but limited time and acumen of counsel, court, and juries cry out for meaningful scientific due process along the lines set out in Rules 702 and 703.

The evolutionary development of Rules 702 and 703 has promoted a salutary convergence between science and law. According to one historical overview of systematic reviews in science, the foundational period for such reviews (1970-1989) overlaps with the enactment of Rules 702 and 703, and the institutionalization of such reviews (1990-2000) coincides with the development of these Rules in a way that introduced some methodological rigor into scientific opinions that are admitted into evidence.[1]

The convergence between legal admissibility and scientific validity considerations has had the further result that scientific concerns over the quality and sufficiency of underlying data, over the validity of study design, analysis, reporting, and interpretation, and over the adequacy and validity of data synthesis, interpretation, and conclusions have become integral to the gatekeeping process. This convergence has the welcome potential to keep legal judgments more in line with best scientific evidence and practice.

The science-law convergence also means that courts must be apprised of, and take seriously, the problems of study reproducibility, and more broadly, the problems raised by questionable research practices (QRPs), or what might be called the patho-epistemology of science. The development, in the 1970s, and the subsequent evolution, of the systematic review represented the scientific community’s rejection of the old-school narrative reviews that selected a few of the available studies to support a pre-existing conclusion. Similarly, the scientific community’s embarrassment, in the 1980s and 1990s, over the irreproducibility of study results, has in this century grown into an existential crisis over study reproducibility in the biomedical sciences.

In 2005, John Ioannidis published an article that brought the concern over “reproducibility” of scientific findings in bio-medicine to an ebullient boil.[2] Ioannidis pointed to several factors, which alone or in combination rendered most published medical findings likely false. Among the publication practices responsible for this unacceptably high error rate, Ioannidis identified the use of small sample sizes, data-dredging and p-hacking techniques, poor or inadequate statistical analysis, in the context of undue flexibility in research design, conflicts of interest, motivated reasoning, fads, and prejudices, and pressure to publish “positive” results. The results, often with small putative effect sizes, across an inadequate number of studies, are then hyped by lay and technical media, as well as the public relations offices of universities and advocacy groups, only to be further misused by advocates, and further distorted to serve the goals of policy wonks. Social media then reduces all the nuances of a scientific study to an insipid meme.
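
The arithmetic behind the claim that most findings are likely false can be sketched with the positive predictive value (PPV) relation from the cited 2005 article, given here in its simplest form and without the article's additional terms for bias and multiple competing teams. R is the pre-study odds that a probed relationship is true, α is the type I error rate, and 1 − β is the power of the study. The numbers in the worked line are illustrative assumptions, not taken from any particular study.

```latex
\[
\mathrm{PPV} \;=\; \frac{(1-\beta)\,R}{(1-\beta)\,R + \alpha}
\]
% Illustrative assumption: R = 0.1, power 1 - \beta = 0.2, \alpha = 0.05
\[
\mathrm{PPV} \;=\; \frac{0.2 \times 0.1}{0.2 \times 0.1 + 0.05}
          \;=\; \frac{0.02}{0.07} \;\approx\; 0.29
\]
```

On this simplified relation, a statistically significant finding is more likely true than false only when (1 − β)R exceeds α, which small, underpowered studies of long-shot hypotheses will not satisfy.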

Ioannidis’ critique resonated with lawyers. We who practice in health effects litigation are no strangers to dubious research methods, lack of accountability, herd-like behavior, and a culture of generating positive results, often out of political or economic sympathies. Although we must prepare for confronting dodgy methods in front of a jury, asking for scientific due process that intervenes and decides the methodological issues with well-reasoned, written opinions in advance of trial does not seem like too much.

The sense that we are awash in false-positive studies was heightened by subsequent papers. In 2011, Uri Simonsohn and others showed that by using simulations of various combinations of QRPs in psychological science, researchers could attain a 61% false-positive rate for research outcomes.[3] The following year saw scientists at Amgen attempt replication of 53 important studies in hematology and oncology. They succeeded in replicating only six.[4] Also in 2012, Dr. Janet Woodcock, director of the Center for Drug Evaluation and Research at the Food and Drug Administration, “estimated that as much as 75 per cent of published biomarker associations are not replicable.”[5] In 2016, the journal Nature reported that over 70% of scientists who responded to a survey had unsuccessfully attempted to replicate another scientist’s experiments, and more than half failed to replicate their own work.[6] Of the respondents, 90% agreed that there was a replication problem. A majority of the 90% believed that the problem was significant.
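
How researcher flexibility inflates the false-positive rate can be illustrated with a small simulation. What follows is a minimal sketch, not the design used in the cited 2011 paper, and it makes no attempt to reproduce the 61% figure. It assumes two independent outcome measures, a known-variance z-test, a first look at 20 subjects per group, and one round of adding 10 more; the function names and sample sizes are hypothetical choices made for illustration. No true effect exists in any simulated study, so every "significant" result is a false positive.

```python
import random
from statistics import NormalDist, fmean

ALPHA = 0.05
_STD_NORMAL = NormalDist()  # standard normal, used for the z-test p-value


def p_value(a, b):
    """Two-sided z-test for a difference in means (equal n, known unit variance)."""
    n = len(a)
    z = (fmean(a) - fmean(b)) / (2.0 / n) ** 0.5
    return 2.0 * (1.0 - _STD_NORMAL.cdf(abs(z)))


def draw(rng, n):
    """Draw n observations from a null population (mean 0, sd 1)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def honest_study(rng, n=20):
    """One pre-specified outcome, one pre-specified sample size, one test."""
    return p_value(draw(rng, n), draw(rng, n)) < ALPHA


def flexible_study(rng, n=20, extra=10):
    """Assumed QRPs: two outcomes, plus a second look after adding subjects."""
    for _ in range(2):  # two independent outcome measures
        a, b = draw(rng, n), draw(rng, n)
        if p_value(a, b) < ALPHA:
            return True  # report the first "significant" result
        a += draw(rng, extra)  # optional stopping: collect more data, test again
        b += draw(rng, extra)
        if p_value(a, b) < ALPHA:
            return True
    return False


def false_positive_rate(study, n_sims=4000, seed=1):
    """Share of null studies that nonetheless report p < ALPHA."""
    rng = random.Random(seed)
    return sum(study(rng) for _ in range(n_sims)) / n_sims


if __name__ == "__main__":
    print("honest:  ", false_positive_rate(honest_study))
    print("flexible:", false_positive_rate(flexible_study))
```

The honest design should come out close to the nominal 5%, while the flexible design should come out well above it, even though each individual test was run at the conventional 0.05 level. Adding further undisclosed choices, such as optional covariates or dropped conditions, would push the rate higher still.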

The scientific community reacted to the perceived replication crisis in a variety of ways, from conceptual clarification of the very notion of reproducibility,[7] to identification of improper uses and interpretations of key statistical concepts,[8] to guidelines for improved conduct and reporting of studies.[9]

Entire books dedicated to identifying the sources of, and the correctives for, undue researcher flexibility in the design, conduct, and analysis of studies, have been published.[10] In some ways, the Rule 702 and 703 case law is like the collected works of the Berenstain Bears, on how not to do studies.

The consequences of the replication crisis are real and serious. Badly conducted and interpreted science leads to research wastage,[11] loss of confidence in scientific expertise,[12] contemptible legal judgments, and distortion of public policy.

The proposed correctives to QRPs deserve the careful study of lawyers and judges who have a role in health effects litigation.[13] Whether as the proponent of an expert witness, or the challenger, several of the recurrent proposals, such as the call for greater data sharing and pre-registration of protocols and statistical analysis plans,[14] have real-world litigation salience. In many instances, they can and should direct lawyers’ efforts at discovery and challenging of the relied upon scientific studies in litigation.


[1] Quan Nha Hong & Pierre Pluye, “Systematic Reviews: A Brief Historical Overview,” 34 Education for Information 261 (2018); Mike Clarke & Iain Chalmers, “Reflections on the history of systematic reviews,” 23 BMJ Evidence-Based Medicine 122 (2018); Cynthia Farquhar & Jane Marjoribanks, “A short history of systematic reviews,” 126 Brit. J. Obstetrics & Gynaecology 961 (2019); Edward Purssell & Niall McCrae, “A Brief History of the Systematic Review,” chap. 2, in Edward Purssell & Niall McCrae, How to Perform a Systematic Literature Review: A Guide for Healthcare Researchers, Practitioners and Students 5 (2020).

[2] John P. A. Ioannidis, “Why Most Published Research Findings Are False,” 2 PLoS Med e124 (2005).

[3] Joseph P. Simmons, Leif D. Nelson, and Uri Simonsohn, “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant,” 22 Psychological Sci. 1359 (2011).

[4] C. Glenn Begley and Lee M. Ellis, “Drug development: Raise standards for preclinical cancer research,” 483 Nature 531 (2012).

[5] Edward R. Dougherty, “Biomarker Development: Prudence, risk, and reproducibility,” 34 Bioessays 277, 279 (2012); Turna Ray, “FDA’s Woodcock says personalized drug development entering ‘long slog’ phase,” Pharmacogenomics Reporter (Oct. 26, 2011).

[6] Monya Baker, “Is there a reproducibility crisis?,” 533 Nature 452 (2016).

[7] Steven N. Goodman, Daniele Fanelli, and John P. A. Ioannidis, “What does research reproducibility mean?,” 8 Science Translational Medicine 341 (2016); Felipe Romero, “Philosophy of science and the replicability crisis,” 14 Philosophy Compass e12633 (2019); Fiona Fidler & John Wilcox, “Reproducibility of Scientific Results,” Stanford Encyclopedia of Philosophy (2018), available at https://plato.stanford.edu/entries/scientific-reproducibility/.

[8] Andrew Gelman and Eric Loken, “The Statistical Crisis in Science,” 102 Am. Scientist 460 (2014); Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” 70 The Am. Statistician 129 (2016); Yoav Benjamini, Richard D. De Veaux, Bradley Efron, Scott Evans, Mark Glickman, Barry Graubard, Xuming He, Xiao Li Meng, Nancy Reid, Stephen M. Stigler, Stephen B. Vardeman, Christopher K. Wikle, Tommy Wright, Linda J. Young, and Karen Kafadar, “The ASA President’s Task Force Statement on Statistical Significance and Replicability,” 15 Annals of Applied Statistics 1084 (2021).

[9] The International Society for Pharmacoepidemiology issued its first Guidelines for Good Pharmacoepidemiology Practices in 1996. The most recent revision, the third, was issued in June 2015. See “The ISPE Guidelines for Good Pharmacoepidemiology Practices (GPP),” available at https://www.pharmacoepi.org/resources/policies/guidelines-08027/. See also Erik von Elm, Douglas G. Altman, Matthias Egger, Stuart J. Pocock, Peter C. Gøtzsche, and Jan P. Vandenbroucke, for the STROBE Initiative, “The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement Guidelines for Reporting Observational Studies,” 18 Epidem. 800 (2007); Jan P. Vandenbroucke, Erik von Elm, Douglas G. Altman, Peter C. Gøtzsche, Cynthia D. Mulrow, Stuart J. Pocock, Charles Poole, James J. Schlesselman, and Matthias Egger, for the STROBE initiative, “Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): Explanation and Elaboration,” 147 Ann. Intern. Med. W-163 (2007); Shah Ebrahim & Mike Clarke, “STROBE: new standards for reporting observational epidemiology, a chance to improve,” 36 Internat’l J. Epidem. 946 (2007); Matthias Egger, Douglas G. Altman, and Jan P Vandenbroucke of the STROBE group, “Commentary: Strengthening the reporting of observational epidemiology—the STROBE statement,” 36 Internat’l J. Epidem. 948 (2007).

[10] See, e.g., Lee J. Jussim, Jon A. Krosnick, and Sean T. Stevens, eds., Research Integrity: Best Practices for the Social and Behavioral Sciences (2022); Joel Faintuch & Salomão Faintuch, eds., Integrity of Scientific Research: Fraud, Misconduct and Fake News in the Academic, Medical and Social Environment (2022); William O’Donohue, Akihiko Masuda & Scott Lilienfeld, eds., Avoiding Questionable Research Practices in Applied Psychology (2022); Klaas Sijtsma, Never Waste a Good Crisis: Lessons Learned from Data Fraud and Questionable Research Practices (2023).

[11] See, e.g., Iain Chalmers, Michael B Bracken, Ben Djulbegovic, Silvio Garattini, Jonathan Grant, A Metin Gülmezoglu, David W Howells, John P A Ioannidis, and Sandy Oliver, “How to increase value and reduce waste when research priorities are set,” 383 Lancet 156 (2014); John P A Ioannidis, Sander Greenland, Mark A Hlatky, Muin J Khoury, Malcolm R Macleod, David Moher, Kenneth F Schulz, and Robert Tibshirani, “Increasing value and reducing waste in research design, conduct, and analysis,” 383 Lancet 166 (2014).

[12] See, e.g., Friederike Hendriks, Dorothe Kienhues, and Rainer Bromme, “Replication crisis = trust crisis? The effect of successful vs failed replications on laypeople’s trust in researchers and research,” 29 Public Understanding Sci. 270 (2020).

[13] R. Barker Bausell, The Problem with Science: The Reproducibility Crisis and What to Do About It (2021).

[14] See, e.g., Brian A. Nosek, Charles R. Ebersole, Alexander C. DeHaven, and David T. Mellor, “The preregistration revolution,” 115 Proc. Nat’l Acad. Sci. 2600 (2018); Michael B. Bracken, “Preregistration of Epidemiology Protocols: A Commentary in Support,” 22 Epidemiology 135 (2011); Timothy L. Lash & Jan P. Vandenbroucke, “Should Preregistration of Epidemiologic Study Protocols Become Compulsory? Reflections and a Counterproposal,” 23 Epidemiology 184 (2012).