Misplaced Reliance On Peer Review to Separate Valid Science From Nonsense

A recent editorial in the Annals of Occupational Hygiene is a poignant reminder of how oversold peer review is in the context of expert witness judicial gatekeeping.  Editor Trevor Ogden offers some cautionary suggestions:

“1. Papers that have been published after proper peer review are more likely to be generally right than ones that have not.

2. However, a single study is very unlikely to take everything into account, and peer review is a very fallible process, and it is very unwise to rely on just one paper.

3. The question should be asked, has any published correspondence dealt with these papers, and what do other papers that cite them say about them?

4. Correspondence will legitimately give a point of view and not consider alternative explanations in the way a paper should, so peer review does not necessarily validate the views expressed.”

Trevor Ogden, “Lawyers Beware! The Scientific Process, Peer Review, and the Use of Papers in Evidence,” 55 Ann. Occup. Hyg. 689, 691 (2011).

Ogden’s conclusions, however, are misleading.  For instance, he suggests that peer-reviewed papers are better than non-peer-reviewed papers, but by how much?  What is the empirical evidence for his assertion?  In his editorial, Ogden offers an anecdote of a scientific report submitted to a political body, and comments that the report would not have survived peer review.  But an anecdote is not a datum.  Worse still, a paper rejected by peer reviewers at Ogden’s journal will, eventually, show up in some other publication.  And courts make little distinction among journals in rating the value of peer review.

Of course it is unwise, and perhaps scientifically unsound, as Ogden points out, to rely upon just one paper, but the legal process permits it.  Worse yet, litigants, whether plaintiffs or defendants, are often allowed to pick out isolated findings from a variety of studies, and throw them together as if that were science. “[O]n fait la science avec des faits comme une maison avec des pierres; mais une accumulation de faits n’est pas plus une science qu’un tas de pierres n’est une maison.” (“Science is built with facts, as a house is with stones; but an accumulation of facts is no more a science than a heap of stones is a house.”) Henri Poincaré, La Science et l’Hypothèse (1905) (chapter 9, Les Hypothèses en Physique).

As for letters to the editor, courts and litigants certainly should pay attention to them, but as Ogden notes, these writings are themselves not peer reviewed, or not peer reviewed with much analytical rigor.  The editing of letters raises the additional concern of imperious editors who silence some points of view to the benefit of others. Most journals have space for only a few letters, and unpopular but salient points of view can go unreported. Furthermore, many scientists will not write letters to the editors, even when a published article is badly wrong in its methods, data analyses, conclusions, or discussion, because in most journals the authors have the last word in the form of a reply, which often is self-serving and misleading, and immune from further criticism.

Ogden describes the limitations of peer review in some detail, but he misses the significance of how those limitations play out in the legal arena.

Limitations and Failures of Peer Review

For instance, Ogden acknowledges that peer review fails to remove important errors from published articles. Here he does provide empirical evidence.  S. Schroter, N. Black, S. Evans, et al., “What errors do peer reviewers detect, and does training improve their ability to detect them?” 101 J. Royal Soc’y Med. 507 (2008) (describing an experiment in which manuscripts were seeded with known errors, nine major and five minor, and sent to 600 reviewers; each reviewer missed, on average, more than six of the nine major errors).  Ogden tells us that the empirical evidence suggests that “peer review is a coarse and fallible filter.”

This is hardly a ringing endorsement.

Surveys of the medical literature have found that the prevalence of statistical errors ranges from 30% to 90% of papers.  See, e.g., Douglas Altman, “Statistics in medical journals: developments in the 1980s,” 10 Stat. Med. 1897 (1991); Stuart J. Pocock, M.D. Hughes & R.J. Lee, “Statistical problems in the reporting of clinical trials. A survey of three medical journals,” 317 New Engl. J. Med. 426 (1987); S.M. Gore, I.G. Jones & E.C. Rytter, “Misuse of statistical methods: critical assessment of articles in the BMJ from January to March 1976,” 1 Brit. Med. J. 85 (1977).

Without citing empirical evidence, Ogden notes that peer review is not well designed to detect fraud, especially when the data are presented to look plausible.  Even so, the continuing saga of fraudulent publications coming to light supports Ogden’s evaluation. Peer reviewers rarely have access to underlying data.  In the silicone gel breast implant litigation, for instance, plaintiffs relied upon a collection of studies that looked plausible enough in their peer-reviewed publications.  Only after the defense discovered misrepresentations and spoliation of data did the patent unreliability and invalidity of the studies become clear to reviewing courts.  The rate of retractions of published scientific articles appears to have increased, although the secular trend may reflect increased surveillance and scrutiny of the published literature for fraud.  Daniel S. Levine, “Fraud and Errors Fuel Research Journal Retractions” (August 10, 2011); Murat Cokol, Fatih Ozbay & Raul Rodriguez-Esteban, “Retraction rates are on the rise,” 9 EMBO Reports 2 (2008); Orac, “Scientific fraud and journal article retractions” (Aug. 12, 2011).

The fact is that peer review is not very good at detecting fraud or error in scientific work.  Ultimately, the scientific community must judge the value of the work, but in some niche areas, only “the acolytes” are paying attention.  These acolytes cite one another, applaud each other’s work, and often serve as peer reviewers of the work in the field, because editors see them as the most knowledgeable investigators in the narrow field. This phenomenon seems especially prevalent in occupational and environmental medicine.  See Cordelia Fine, “Biased But Brilliant,” New York Times (July 30, 2011) (describing confirmation bias and the irrational loyalty of scientists to their hobby-horse hypotheses).

Peer review and correspondence to the editors are not the end of the story.  Discussion and debate may continue in the scientific community, but the pace of that debate can be glacial.  In areas of research where litigation or public policy does not fuel further work to address aberrant findings or to reconcile discordant results, science may take decades to ferret out error. Litigation cannot proceed at this deliberative pace.  Furthermore, post-publication review is hardly a cure-all for the defects of peer review; post-publication commentary can be, and often is, spotty and inconsistent.  David Schriger & Douglas Altman, “Inadequate post-publication review of medical research: A sign of an unhealthy research environment in clinical medicine,” 341 Brit. Med. J. 356 (2010) (identifying reasons for the absence of post-publication peer review).

The Evolution of Peer Review as a Criterion for Judicial Gatekeeping of Expert Witness Opinion

The story of how peer review came to be held in such high esteem in legal circles is sad, but deserves to be told.  In the Bendectin litigation, the medication’s sponsor, Merrell Richardson, was confronted with the testimony of an epidemiologist, Shanna Swan, who propounded her own, unpublished re-analyses of the published epidemiologic studies, which had failed to find an association between Bendectin use and birth defects.  Merrell challenged Swan’s unpublished, non-peer-reviewed re-analyses as not “generally accepted” under the Frye test.  The lack of peer review seemed like good evidence of the novelty of Swan’s re-analyses, as well as of their lack of general acceptance.

In the briefings, the Supreme Court received radically different views of peer review in the Daubert case.  One group of amici modestly explained that “peer review referees and editors limit their assessment of submitted articles to such matters as style, plausibility, and defensibility; they do not duplicate experiments from scratch or plow through reams of computer-generated data in order to guarantee accuracy or veracity or certainty.” Brief for Amici Curiae Daryl E. Chubin, et al. at 10, Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579 (1993).  See also Daryl E. Chubin & Edward J. Hackett, Peerless Science: Peer Review and U.S. Science Policy (1990).

Other amici, such as the New England Journal of Medicine, the Journal of the American Medical Association, and the Annals of Internal Medicine, proposed that peer-reviewed publication should be the principal criterion for admitting scientific opinion testimony.  Brief for Amici Curiae New England Journal of Medicine, Journal of the American Medical Association, and Annals of Internal Medicine in Support of Respondent, Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579 (1993). But see Arnold S. Relman & Marcia Angell, “How Good Is Peer Review?” 321 New Eng. J. Med. 827, 828 (1989) (“peer review is not and cannot be an objective scientific process, nor can it be relied on to guarantee the validity or honesty of scientific research”).

Justice Blackmun, speaking for the majority in Daubert, steered a moderate course:

“Another pertinent consideration is whether the theory or technique has been subjected to peer review and publication. Publication (which is but one element of peer review) is not a sine qua non of admissibility; it does not necessarily correlate with reliability, see S. Jasanoff, The Fifth Branch: Science Advisors as Policymakers 61-76 (1990), and in some instances well-grounded but innovative theories will not have been published, see Horrobin, “The Philosophical Basis of Peer Review and the Suppression of Innovation,” 263 JAMA 1438 (1990). Some propositions, moreover, are too particular, too new, or of too limited interest to be published. But submission to the scrutiny of the scientific community is a component of “good science,” in part because it increases the likelihood that substantive flaws in methodology will be detected. See J. Ziman, Reliable Knowledge: An Exploration of the Grounds for Belief in Science 130-133 (1978); Relman & Angell, “How Good Is Peer Review?” 321 New Eng. J. Med. 827 (1989). The fact of publication (or lack thereof) in a peer reviewed journal thus will be a relevant, though not dispositive, consideration in assessing the scientific validity of a particular technique or methodology on which an opinion is premised.”

Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579, 593-94, 590 n.9 (1993).

This lukewarm endorsement from Justice Blackmun sent a mixed message to the lower federal courts, which tended to turn peer review into something of a mechanical test in their gatekeeping decisions.  Many federal judges (and state court judges in states that followed the Daubert precedent) were too busy, too indolent, or too lacking in analytical acumen to look past the fact of publication and peer review.  These judges avoided the labor of independent thought by taking the fact of peer-reviewed publication as dispositive of the validity of the science in the paper.  Some commentators encouraged this low level of scrutiny and mechanical testing by suggesting that peer review could be taken as an indication of good science.  See, e.g., Margaret A. Berger, “The Supreme Court’s Trilogy on the Admissibility of Expert Testimony,” in Federal Judicial Center, Reference Manual on Scientific Evidence 9, 17 (2d ed. 2000) (describing Daubert as endorsing peer review as one of the “indicators of good science”) (hereafter cited as Reference Manual).  Elevating peer review into an indicator of good science, however, obscures its lack of epistemic warrant, misrepresents how it is actually regarded within the scientific community, and enables judges to fall back into their pre-Daubert mindset of finding quick, easy, and invalid proxies for scientific reliability.

In a similar vein, other commentators spoke in superlatives about peer review, and thus managed to mislead judges and decision makers further into regarding anything published as valid scientific data, data interpretation, and data analysis. For instance, Professor David Goodstein, writing in the Reference Manual, advises the federal judiciary that peer review is the test that separates valid science from rubbish:

“In the competition among ideas, the institution of peer review plays a central role. Scientific articles submitted for publication and proposals for funding are often sent to anonymous experts in the field, in other words, peers of the author, for review. Peer review works superbly to separate valid science from nonsense, or, in Kuhnian terms, to ensure that the current paradigm has been respected. It works less well as a means of choosing between competing valid ideas, in part because the peer doing the reviewing is often a competitor for the same resources (pages in prestigious journals, funds from government agencies) being sought by the authors. It works very poorly in catching cheating or fraud, because all scientists are socialized to believe that even their bitterest competitor is rigorously honest in the reporting of scientific results, making it easy to fool a referee with purposeful dishonesty if one wants to. Despite all of this, peer review is one of the sacred pillars of the scientific edifice.”

David Goodstein, “How Science Works,” Reference Manual 67, at 74-75, 82 (emphasis added).

Criticisms of Reliance Upon Peer Review as a Proxy for Reliability and Validity

Other commentators have put forward a more balanced and realistic, if not jaundiced, view of peer review. Professor Susan Haack, a philosopher of science at the University of Miami, who writes frequently about the epistemic claims of expert witnesses and judicial approaches to gatekeeping, described the disconnect between what peer review means to scientists and what it means to lawyers:

“For example, though peer-reviewed publication is now standard practice at scientific and medical journals, I doubt that many working scientists imagine that the fact that a work has been accepted for publication after peer review is any guarantee that it is good stuff, or that its not having been published necessarily undermines its value. The legal system, however, has come to invest considerable epistemic confidence in peer-reviewed publication — perhaps for no better reason than that the law reviews are not peer-reviewed!”

Susan Haack, “Irreconcilable Differences? The Troubled Marriage of Science and Law,” 72 Law & Contemporary Problems 1, 19 (2009).  Haack’s assessment of the motivations of actors in the legal system is, for a philosopher, curiously ad hominem, and her shameless dig at law reviews is ironic, considering that she publishes extensively in them.  Still, her assessment that peer review is no guarantee of an article’s being “good stuff” is one of her more coherent contributions to this discussion.

The absence of peer review hardly supports the inference that a study or an evaluation of studies is unreliable, unless of course we also know that the authors have failed, after repeated attempts, to find a publisher.  In today’s world of vanity presses, a researcher will almost always find some journal in which to publish a paper.  As Drummond Rennie, a former editor of the Journal of the American Medical Association (the same journal that, as amicus curiae, oversold peer review to the Supreme Court), has remarked:

“There seems to be no study too fragmented, no hypothesis too trivial, no literature citation too biased or too egotistical, no design too warped, no methodology too bungled, no presentation of results too inaccurate, too obscure, and too contradictory, no analysis too self-serving, no argument too circular, no conclusions too trifling or too unjustified, and no grammar and syntax too offensive for a paper to end up in print.”

Drummond Rennie, “Guarding the Guardians: A Conference on Editorial Peer Review,” 256 J. Am. Med. Ass’n 2391 (1986); D. Rennie, A. Flanagin, R. Smith & J. Smith, “Fifth International Congress on Peer Review and Biomedical Publication: Call for Research,” 289 J. Am. Med. Ass’n 1438 (2003).

Other editors at leading medical journals seem to agree with Rennie.  Richard Horton, an editor of The Lancet, rejects the Goodstein view (from the Reference Manual) of peer review as the “sacred pillar of the scientific edifice”:

“The mistake, of course, is to have thought that peer review was any more than a crude means of discovering the acceptability — not the validity — of a new finding. Editors and scientists alike insist on the pivotal importance of peer review. We portray peer review to the public as a quasi-sacred process that helps to make science our most objective truth teller. But we know that the system of peer review is biased, unjust, unaccountable, incomplete, easily fixed, often insulting, usually ignorant, occasionally foolish, and frequently wrong.”

Richard Horton, “Genetically modified food: consternation, confusion, and crack-up,” 172 Med. J. Australia 148 (2000).

In the prestigious 2010 Sense About Science lecture, Fiona Godlee, the editor of the British Medical Journal, characterized peer review as deficient in at least seven ways:

  • Slow
  • Expensive
  • Biased
  • Unaccountable
  • Stifles innovation
  • Bad at detecting error
  • Hopeless at detecting fraud

Godlee, “It’s time to stand up for science once more” (June 21, 2010).

Important research often goes unpublished, and never sees the light of day.  Anti-industry zealots are fond of pointing fingers at the pharmaceutical industry, although many firms, such as GlaxoSmithKline, have adopted the practice of posting study results on a website.  These zealots overlook how many apparently neutral investigators suppress research results that do not fit their pet theories.  One of my favorite examples is the failure of the late Dr. Irving Selikoff to publish his study of Johns-Manville factory workers:  William J. Nicholson, Ph.D. & Irving J. Selikoff, M.D., “Mortality experience of asbestos factory workers; effect of differing intensities of asbestos exposure,” unpublished manuscript.  This study investigated cancer and other mortality at a factory in New Jersey, where crocidolite was used in the manufacture of insulation products.  Selikoff and Nicholson apparently had no desire to publish a paper that would undermine their unfounded claim that crocidolite asbestos was not used by American workers.  But that desire does not necessarily mean that Nicholson and Selikoff’s unpublished paper was of any lesser quality than their study of North American insulators, the results of which they published, and republished, with abandon.

Examples of Failed Peer Review from the Litigation Front

There are many examples from the litigation arena of studies that passed peer review at the most demanding journals, but which did not hold up under the more intense scrutiny of expert review in the cauldron of litigation.

Phenylpropanolamine and Stroke

In In re Phenylpropanolamine Products Liability Litigation, Judge Rothstein conducted hearings and entertained extensive briefings on the reliability of the plaintiffs’ expert witnesses’ opinions, which were based largely upon one epidemiologic study, known as the “Yale Hemorrhagic Stroke Project” (HSP).  The Project was undertaken by manufacturers, which created a scientific advisory group to oversee the study protocol.  The study was submitted as a report to the FDA, which reviewed it and convened an advisory committee to review it further.  “The prestigious NEJM published the HSP results, further substantiating that the research bears the indicia of good science.” In re Phenylpropanolamine Prod. Liab. Litig., 289 F. Supp. 2d 1230, 1239 (W.D. Wash. 2003) (citing Daubert II for the proposition that peer review shows the research meets the minimal criteria for good science).  There were thus many layers of peer review for the HSP study.

The HSP study was subjected to much greater analysis in litigation.  Peer review, even in the New England Journal of Medicine, did not and could not carry this weight. The defendants fought to obtain the underlying data of the HSP, and those data unraveled the HSP paper.  Despite the plaintiffs’ initial enthusiasm for a litigation built on the back of a peer-reviewed paper in one of the leading journals of internal medicine, the litigation resulted in a string of notable defense verdicts.  After one of the early defense verdicts, plaintiffs challenged the defendant’s reliance upon underlying data that went behind the peer-reviewed publication.  The trial court rejected the request for a new trial, and spoke to the importance of looking past the superficial imprimatur of peer review on the key study relied upon by the plaintiffs in the PPA litigation:

“I mean, you could almost say that there was some unethical activity with that Yale Study.  It’s real close.  I mean, I — I am very, very concerned at the integrity of those researchers.”

“Yale gets — Yale gets a big black eye on this.”

O’Neill v. Novartis AG, California Superior Court, Los Angeles Cty., Transcript of Oral Argument on Post-Trial Motions, at 46-47 (March 18, 2004) (Hon. Anthony J. Mohr).

Viagra and Ophthalmic Events

The litigation over ophthalmic adverse events following the use of Viagra provides another example of a peer-reviewed study that failed to withstand litigation scrutiny.  In re Viagra Products Liab. Litig., 658 F. Supp. 2d 936, 945 (D. Minn. 2009).  In this litigation, the court, after viewing litigation discovery materials, recognized that the authors of a key paper had failed to use the methodology described in their published paper.  The court gave the sober assessment that “[p]eer review and publication mean little if a study is not based on accurate underlying data.” Id.

MMR Vaccine and Autism

The plaintiffs’ expert witness in the MMR vaccine/autism litigation, Andrew Wakefield, published a paper in The Lancet in which he purported to find an association between the measles-mumps-rubella vaccine and autism.  A.J. Wakefield, et al., “Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children,” 351 Lancet 637 (1998).  This published paper, in a well-regarded journal, opened a decade-long controversy, with litigation, over the safety of the MMR vaccine.  The study was plagued, however, not only by Wakefield’s failure to disclose payments from plaintiffs’ attorneys, and by ethical lapses in failing to obtain ethics board approvals, but also by substantially misleading reports of data and data analyses.  In 2010, Wakefield was sanctioned by the UK General Medical Council’s Fitness to Practise Panel.  Finally, also in 2010, over a decade after initial publication, The Lancet “fully retract[ed] this paper from the published record.”  Editors of The Lancet, “Retraction—Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children,” 375 Lancet 445 (2010).

Accutane and Suicide

In the New Jersey litigation over claimed health effects of Accutane, one of the plaintiffs’ expert witnesses was the author of a key paper that “linked” Accutane to depression.  Palazzolo v. Hoffman La Roche, Inc., 2010 WL 363834 (N.J. App. Div.).  Discovery revealed that the author, James Bremner, had not followed the methodology described in the paper.  Furthermore, Bremner could not document the data used in the paper’s analysis, and he conceded that the statistical analyses were incorrect.  The New Jersey Appellate Division held that opinions relying upon Bremner’s study should be excluded as not soundly and reliably generated.  Id. at *5.

Silicone and Connective Tissue Disease

It is heartening that the scientific and medical communities decisively renounced the pathological science that underlay the silicone gel breast implant litigation.  The fact remains, however, that plaintiffs relied upon a large body of published papers, each more invalid than the last, to support their claims.  For many years, judges around the country blinked and let expert witnesses offer their causation opinions, based in large part upon papers by Smalley, Shanklin, Lappe, Kossovsky, Gershwin, Garrido, and others.  Peer review did little to curb editors’ enthusiasm for this “sexy” topic until a panel of court-appointed expert witnesses and the Institute of Medicine put an end to the judicial gullibility.

Concluding Comments

One district court distinguished between pre-publication peer review and the important peer review that takes place after publication, as other researchers quietly go about replicating or reproducing a study’s findings, or attempting to build on them.  “[J]ust because an article is published in a prestigious journal, or any journal at all, does not mean per se that it is scientifically valid.”  Pick v. Amer. Med. Sys., 958 F. Supp. 1151, 1178 n.19 (E.D. La. 1997), aff’d, 198 F.3d 241 (5th Cir. 1999).  With hindsight, we can say that Merrell Richardson’s strategy of emphasizing peer review has had some unfortunate, unintended consequences.  The Supreme Court made peer review a factor in assessing reliable science, and lower courts have elevated it into a criterion of validity.  The upshot is that many courts will now not go beyond the statements in a peer-reviewed paper to determine whether they are based upon sufficient facts and data, or whether they rest upon sound inferences from the available facts and data.  These courts violate the letter and spirit of Rule 702 of the Federal Rules of Evidence.