TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Ziliak Gives Legal Advice — Puts His Posterior On the Line

August 31st, 2011

I have posted before about the curious saga of two university professors of economics who tried to befriend the United States Supreme Court.  Professors Ziliak and McCloskey submitted an amicus brief to the Court, in connection with Matrixx Initiatives, Inc. v. Siracusano, ___ U.S. ___, 131 S.Ct. 1309 (2011).  Nothing unusual there, other than the Professors’ labeling themselves “Statistics Experts,” and then proceeding to commit the statistical howler of deriving a posterior probability from only a p-value.  See “The Matrixx Oversold” (April 4, 2011).
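For readers who want to see why deriving a posterior probability from a p-value alone is a howler, here is a minimal sketch of the Bayesian arithmetic (the notation is mine, not the professors’ or the Court’s).  By Bayes’ theorem, the posterior odds on the null hypothesis are the prior odds multiplied by a likelihood ratio:

\[
\frac{\Pr(H_0 \mid \text{data})}{\Pr(H_1 \mid \text{data})} \;=\; \frac{\Pr(H_0)}{\Pr(H_1)} \times \frac{\Pr(\text{data} \mid H_0)}{\Pr(\text{data} \mid H_1)}
\]

A p-value supplies, at most, an input related to \(\Pr(\text{data} \mid H_0)\); it says nothing about the prior odds, and nothing about the probability of the data under the alternative hypothesis.  Without those additional inputs, no posterior probability can be computed, and reading p = 0.05 as “a five percent chance that the null hypothesis is true” is precisely the transposition fallacy.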

I seemed to be alone in my dismay over this situation, but recently Professor David Kaye, an author of the chapter on statistics in the Reference Manual on Scientific Evidence, weighed in with his rebuttal to Ziliak and McCloskey’s erroneous statistical contentions.  See “The Transposition Fallacy in Matrixx Initiatives, Inc. v. Siracusano: Part I” (August 19, 2011), and “The Transposition Fallacy in Matrixx Initiatives, Inc. v. Siracusano: Part II” (August 26, 2011).  Kaye’s analysis is well worth reading.

Having attempted to bamboozle the Justices on statistics, Stephen Ziliak has now turned his attention to an audience of statisticians and students of statistical science, with a short article in Significance on the Court’s decision in Matrixx.  Stephen Ziliak, “Matrixx v. Siracusano and Student v. Fisher:  Statistical Significance on Trial,” Significance 131 (September 2011).  Tellingly, Ziliak did not advance his novel, erroneous views of how to derive posterior odds or probabilities from p-values in the pages of a magazine published by the Royal Statistical Society.  Such gems were reserved for the audience of Justices and law clerks in Washington, D.C.  Instead of holding forth on statistical issues, Ziliak has used the pages of a statistical journal to advance equally bizarre, inexpert views about the legal meaning of a Supreme Court case.

The Matrixx decision involved the appeal from a dismissal of a complaint for failure to plead sufficient allegations in a securities fraud action.  No evidence was ever offered or refused; no expert witness opinion was held reliable or unreliable.  The defendant, Matrixx Initiatives, Inc., won the dismissal at the district court, only to have the complaint reinstated by the Court of Appeals for the Ninth Circuit.  The Supreme Court affirmed the reinstatement, and in doing so, did not, and could not, create a holding about the sufficiency of evidence to show causation in a legal proceeding.  Indeed, Justice Sotomayor, writing for a unanimous Court, specifically stated that causation was not at issue, especially given that evidentiary displays far below what is necessary to show causation between a medication and an adverse event might come to the attention of the FDA, which agency in turn might find the evidence sufficient to order a withdrawal of the medication.

Ziliak, having given dubious statistical advice to the U.S. Supreme Court, now sets himself up to give equally questionable legal advice to the statistical community.  He asserts that Matrixx claimed that anosmia (the loss of the sense of smell) was unimportant because not “statistically significant.”  Id. at 132.  Matrixx Initiatives no doubt made several errors, but it never made this erroneous claim.  Ziliak gives no citation to the parties’ briefs; nor could one be given.  Matrixx never contended that anosmia was unimportant; its claim was that the plaintiffs had not sufficiently alleged facts showing that Matrixx had knowledge of a causal relationship such that its failure to disclose adverse event reports became a “material” omission under the securities laws.  The word “unimportant” does not occur in Matrixx’s briefs; nor was it uttered at oral argument.

Ziliak’s suggestion that “[t]he district court dismissed the case on the basis that investors did not prove ‘materiality’, by which that court meant ‘statistical significance’,” is nonsense.  Id. at 132.  The issue was never the sufficiency of evidence.  Matrixx did attempt to equate materiality with causation, and then argued that allegations of causation required, in turn, allegations of statistical significance.  In arguing the necessity of statistical significance, Matrixx was implicitly suggesting that an evidentiary display that fell short of supporting causation could not be material, when withheld from investors.  The Supreme Court had an easy time disposing of Matrixx’s argument because causation was never at issue.  Everything that the Court did say about causation is readily discernible as dictum.

Ziliak erroneously reads into the Court’s opinion a requirement that a pharmaceutical company, reporting to the Securities and Exchange Commission, “can no longer hide adverse effect [sic] reports from investors on the basis that reports are not statistically significant.”  Id. at 133.  Ziliak incorrectly refers to adverse event reports as “adverse effect reports,” which is a petitio principii.  Furthermore, this was not the holding of the Court.  The potentially fraudulent aspect of Matrixx’s conduct was not that it had “hidden” adverse event reports, but rather that it had adverse event reports and a good deal of additional information, none of which it had disclosed to investors, while at the same time the company chose to give the investment community particularly bullish projections of future sales.  The medication involved, Zicam, was an over-the-counter formulation that never had the rigorous testing required for a prescription medication’s new drug application.

Curiously, Ziliak, the self-described statistics expert, fails to point out that adverse event reports could not achieve, or fail to achieve, statistical significance on the basis of the facts alleged in the plaintiffs’ complaint.  Matrixx, and its legal counsel, might be forgiven this oversight, but surely Ziliak the statistical expert should have noted this.  Indeed, if the parties and the courts had recognized that there never was an issue of statistical significance involved in the case, the entire premiss of Matrixx’s appeal would have been taken away.

To be a little fair to Ziliak, the Supreme Court, having disclaimed any effort to require proof of causation or to define the requisites of reliable evidence of causation, went ahead and offered its own dubious dictum on how statistical significance might not be necessary for causation:

“Matrixx’s argument rests on the premise that statistical significance is the only reliable indication of causation. This premise is flawed: As the SEC points out, “medical researchers … consider multiple factors in assessing causation.” Brief for United States as Amicus Curiae 12. Statistically significant data are not always available. For example, when an adverse event is subtle or rare, “an inability to obtain a data set of appropriate quality or quantity may preclude a finding of statistical significance.” Id., at 15; see also Brief for Medical Researchers as Amici Curiae 11. Moreover, ethical considerations may prohibit researchers from conducting randomized clinical trials to confirm a suspected causal link for the purpose of obtaining statistically significant data. See id., at 10-11.

A lack of statistically significant data does not mean that medical experts have no reliable basis for inferring a causal link between a drug and adverse events. As Matrixx itself concedes, medical experts rely on other evidence to establish an inference of causation. See Brief for Petitioners 44-45, n. 22. We note that courts frequently permit expert testimony on causation based on evidence other than statistical significance. See, e.g., Best v. Lowe’s Home Centers, Inc., 563 F.3d 171, 178 (C.A.6 2009); Westberry v. Gislaved Gummi AB, 178 F.3d 257, 263-264 (C.A.4 1999) (citing cases); Wells v. Ortho Pharmaceutical Corp., 788 F.2d 741, 744-745 (C.A.11 1986). We need not consider whether the expert testimony was properly admitted in those cases, and we do not attempt to define here what constitutes reliable evidence of causation.”

What is problematic about this passage is that Justice Sotomayor was addressing situations that were not before the Court, and about which she had no appropriate briefing.  Her suggestion that randomized clinical trials are not always ethically appropriate is, of course, true, but that does not prevent an expert witness from relying upon observational epidemiologic studies – with statistically significant results – to support causal claims.  Justice Sotomayor’s citation to the Best and Westberry cases, again in dictum, is equally off the mark.  Both cases involve the application of differential etiological reasoning about specific causation, which presupposes that general causation has already been sufficiently shown.  Finally, Justice Sotomayor’s citation to the Wells case, which involved both general and specific causation issues, was inapposite because the plaintiff’s expert witness in Wells did rely upon at least one study with a statistically significant result.  As I have pointed out before, the Wells case went on to become an example of one trial judge’s abject failure to understand and evaluate scientific evidence.

Postscript:

The Supreme Court’s statistical acumen may have been lacking, but the Justices seemed to have a good sense of what was really going on in the case.  In December 2010, Matrixx settled over 2,000 Zicam injury claims. On February 24, 2011, a month before the Supreme Court decided the Matrixx case, the federal district judge responsible for the Zicam multi-district litigation denied Matrixx’s motion to exclude plaintiffs’ expert witnesses’ causation opinions.  “First Zicam Experts Admitted by MDL Judge for Causation, Labeling Opinions” 15 Mealey’s Daubert Reporter (February 2011); In re Zicam Cold Remedy Marketing, Sales Practices and Products Liab. Litig., MDL Docket No. 2:09-md-02096, Document 1360 (D. Ariz. 2011).

After the Supreme Court affirmed the reinstatement of the securities fraud complaint, Charles Hensley, the inventor of Zicam, was arrested on federal charges of illegally marketing another drug, Vira 38, which he claimed was therapeutic and preventive for bird flu.  Stuart Pfeifer, “Zicam inventor arrested, accused of illegal marketing of flu drug,” Los Angeles Times (June 2, 2011).  Earlier this month, Mr. Hensley pleaded guilty to the charges of unlawful distribution.

Confusion Over Causation in Texas

August 27th, 2011

As I have previously discussed, a risk ratio (RR) ≤ 2 is a strong practical argument against specific causation. See Courts and Commentators on Relative Risks to Infer Specific Causation; Relative Risks and Individual Causal Attribution; and Risk and Causation in the Law.  But the RR > 2 threshold has little to do with general causation.  There are any number of well-established causal relationships where the magnitude of the ex ante risk in an exposed population is > 1, but ≤ 2.  The magnitude of risk for cardiovascular disease from smoking is one such well-known example.
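The arithmetic behind the specific-causation heuristic may be worth setting out.  On the standard (and admittedly idealized) assumptions that the risk estimate is valid, unconfounded, and applicable to the plaintiff, the probability that an exposed case is attributable to the exposure is the attributable fraction:

\[
\text{AF} \;=\; \frac{RR - 1}{RR}
\]

which exceeds one half exactly when RR > 2.  At RR = 2, half of the cases among the exposed would have occurred without the exposure; only above that threshold does attribution for a randomly selected exposed case become “more likely than not.”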

When assessing general causation from only observational epidemiologic studies, where residual confounding and bias may be lurking, it is prudent to require a RR > 2, as a measure of strength of the association that can help us rule out the role of systematic error.  As the cardiovascular disease/smoking example illustrates, however, there is clearly no scientific requirement that the RR be greater than 2 to establish general causation.  Much will depend upon the entire body of evidence.  If the other important Bradford Hill factors are present – dose-response, consistency, coherence, etc. – then risk ratios ≤ 2, from observational studies, may suffice to show general causation.  So the requirement of a RR > 2, for the showing of general causation, is a much weaker consideration than it is for specific causation.

Randomization and double blinding are major steps in controlling confounding and bias, but they are not complete guarantees that systematic bias has been eliminated.  A double-blinded, placebo-controlled, randomized clinical trial (RCT) will usually have less opportunity for bias and confounding to play a role.  Imposing a RR > 2 requirement for general causation thus makes less sense in the context of trying to infer general causation from the results of RCTs.

Somehow the Texas Supreme Court managed to confuse these concepts in an important decision this week, Merck & Co. v. Garza (August 26, 2011).

Mr. Garza had a long history of heart disease, at least two decades long, including a heart attack, and quadruple bypass and stent surgeries.  Garza’s physician prescribed 25 mg Vioxx for pain relief.  Garza died less than a month later, at the age of 71, of an acute myocardial infarction.  The plaintiffs (Mr. Garza’s survivors) were thus faced with the problem of showing the magnitude of the risk experienced by Mr. Garza, a risk that would allow them to infer that his fatal heart attack was caused by his having taken Vioxx.  The studies relied upon by plaintiffs did show increased risk, consistently, but for larger doses (50 mg) taken over longer periods of time.  The trial court entered judgment upon a jury verdict in favor of the plaintiffs.

The Texas Supreme Court reversed, and rendered the judgment for Merck.  The Court’s judgment was based largely upon its view that the studies relied upon did not apply to the plaintiff.  Here the Court was on pretty solid ground.  The plaintiffs also argued that Mr. Garza had a higher pre-medication, baseline risk, and that he therefore would have sustained a greater increased risk from short-term, low-dose use of Vioxx.  The Court saw through this speculative argument, and cautioned that the “absence of evidence cannot substitute for evidence.” Slip op. at 17.  The greater baseline does not mean that the medication imposed a greater relative risk on people like Mr. Garza, although it would mean that we would expect to see more cases from any subgroup that looked like him.  The attributable fraction and the difficulty in using risk to infer individual attribution, however, would remain the same.
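A hypothetical set of numbers (mine, not the record’s) illustrates the point.  If the baseline risk is 10 heart attacks per 1,000 person-years and the RR is 1.5, the exposed group suffers 15 per 1,000, and the attributable fraction is (1.5 − 1)/1.5 ≈ 33%.  If a higher-risk subgroup has a baseline of 100 per 1,000 and the same RR of 1.5, the exposed rate is 150 per 1,000: ten times as many excess cases (50 rather than 5 per 1,000), but the attributable fraction is still 33%.  A greater baseline yields more cases, not a greater probability that any given case was caused by the medication.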

The problematic aspect of the Garza case arises from the Texas Supreme Court’s conflating and confusing general with specific causation.  There was no real doubt that Vioxx at high doses, taken over prolonged periods, can cause heart attacks.  General causation was not at issue.  The attribution of Mr. Garza’s heart attack to his short-term, low-dose use of Vioxx, however, was at issue, and was a rather dubious claim.

The Texas Supreme Court proceeded to rely heavily upon its holding and language in Merrell Dow Pharmaceuticals, Inc. v. Havner, 953 S.W.2d 706 (Tex. 1997).  Havner was a Bendectin case, in which plaintiffs claimed that the medication caused specific birth defects.  Both general and specific causation were contested by the parties. The epidemiologic evidence in Havner came from observational studies, either case-control or cohort studies, and not RCTs.

The Havner decision insightfully recognized that risk does not equal causation, but that RR > 2 is a practical compromise for allowing courts and juries to make the plaintiff-specific attribution in the face of uncertainty.  Havner, 953 S.W.2d at 717.  Merck latched on to this and other language, arguing that “Havner requires a plaintiff who claims injury from taking a drug to produce two independent epidemiological studies showing a statistically significant doubling of the relative risk of the injury for patients taking the drug under conditions substantially similar to the plaintiff’s (dose and duration, for example) as compared to patients taking a placebo.” Slip op. at 7.

The plaintiffs in Garza responded by arguing that their reliance upon RCTs relieved them of Havner‘s requirement of showing a RR > 2.

The Texas Supreme Court correctly rejected the plaintiffs’ argument and followed its earlier decision in Havner on specific causation:

“But while the controlled, experimental, and prospective nature of clinical trials undoubtedly make them more reliable than retroactive, observational studies, both must show a statistically significant doubling of the risk in order to be some evidence that a drug more likely than not caused a particular injury.”

Slip op. at 10.

The Garza Court, however, went a dictum too far by expressing some of the Havner requirements as applying to general causation:

“Havner holds, and we reiterate, that when parties attempt to prove general causation using epidemiological evidence, a threshold requirement of reliability is that the evidence demonstrate a statistically significant doubling of the risk. In addition, Havner requires that a plaintiff show ‘that he or she is similar to [the subjects] in the studies’ and that ‘other plausible causes of the injury or condition that could be negated [are excluded] with reasonable certainty’.”

Slip op. at 13-14 (quoting Havner, 953 S.W.2d at 720).

General causation was not the dispositive issue in Garza, and so this language must be treated as dictum.  The sloppiness in confusing the requisites of general and specific causation is regrettable.

The plaintiffs also advanced another argument, which is becoming a commonplace in health-effects litigation.  They threw all their evidence into a pile, and claimed that the “totality of the evidence” supported their claims.  This argument is somehow supposed to supplant a reasoned approach to the issue of what specific inferences can be drawn from what kind of evidence.  The Texas Supreme Court saw through the pile, and dismissed the hand waving:

“The totality of the evidence cannot prove general causation if it does not meet the standards for scientific reliability established by Havner. A plaintiff cannot prove causation by presenting different types of unreliable evidence.”

Slip op. at 17.

All in all, the Garza Court did better than many federal courts that have consistently confused risk with cause, as well as general with specific causation.

Misplaced Reliance On Peer Review to Separate Valid Science From Nonsense

August 14th, 2011

A recent editorial in the Annals of Occupational Hygiene is a poignant reminder of how oversold peer review is in the context of expert witness judicial gatekeeping.  Editor Trevor Ogden offers some cautionary suggestions:

“1. Papers that have been published after proper peer review are more likely to be generally right than ones that have not.

2. However, a single study is very unlikely to take everything into account, and peer review is a very fallible process, and it is very unwise to rely on just one paper.

3. The question should be asked, has any published correspondence dealt with these papers, and what do other papers that cite them say about them?

4. Correspondence will legitimately give a point of view and not consider alternative explanations in the way a paper should, so peer review does not necessarily validate the views expressed.”

Trevor Ogden, “Lawyers Beware! The Scientific Process, Peer Review, and the Use of Papers in Evidence,” 55 Ann. Occup. Hyg. 689, 691 (2011).

Ogden’s conclusions, however, are misleading.  For instance, he suggests that peer-reviewed papers are better than non-peer-reviewed papers, but by how much?  What is the empirical evidence for Ogden’s assertion?  In his editorial, Ogden gives an anecdote of a scientific report submitted to a political body, and comments that this report would not have survived peer review.  But an anecdote is not a datum.  What’s worse is that a paper rejected by peer review at Ogden’s journal will eventually show up in another publication.  Courts make little distinction between and among journals for purposes of rating the value of peer review.

Of course it is unwise, and perhaps scientifically unsound, as Ogden points out, to rely upon just one paper, but the legal process permits it.  Worse yet, litigants, either plaintiff or defendant, are often allowed to pick out isolated findings in a variety of studies, and throw them together as if that were science. “[O]n fait la science avec des faits comme une maison avec des pierres; mais une accumulation de faits n’est pas plus une science qu’un tas de pierres n’est une maison.” (“Science is built up with facts, as a house is with stones; but an accumulation of facts is no more a science than a heap of stones is a house.”) Henri Poincaré, La Science et l’Hypothèse (1905) (chapter 9, Les Hypothèses en Physique).

As for letters to the editor, sure, courts and litigants should pay attention to them, but as Ogden notes, these writings are themselves not peer reviewed, or not peer reviewed with very much analytical rigor.  The editing of letters raises additional concerns about imperious editors who silence some points of view to the benefit of others. Most journals have space for only a few letters, and unpopular but salient points of view can go unreported. Furthermore, many scientists will not write letters to the editors, even when the published article is terribly wrong in its methods, data analyses, conclusions, or discussion, because in most journals the authors typically have the last word in the form of a reply, which often is self-serving and misleading, with immunity from further criticism.

Ogden describes the limitations of peer review in some detail, but he misses the significance of how these limitations play out in the legal arena.

Limitations and Failures of Peer Review

For instance, Ogden acknowledges that peer review fails to remove important errors from published articles. Here he does provide empirical evidence.  S. Schroter, N. Black, S. Evans, et al., “What errors do peer reviewers detect, and does training improve their ability to detect them?” 101 J. Royal Soc’y Med. 507 (2008) (describing an experiment in which manuscripts were seeded with known statistical errors (nine major and five minor) and sent to 600 reviewers; each reviewer missed, on average, about six of the nine major errors).  Ogden tells us that the empirical evidence suggests that “peer review is a coarse and fallible filter.”

This is hardly a ringing endorsement.

Surveys of the medical literature have found that the prevalence of statistical errors ranges from 30% to 90% of papers.  See, e.g., Douglas Altman, “Statistics in medical journals: developments in the 1980s,” 10 Stat. Med. 1897 (1991); Stuart J. Pocock, M.D. Hughes, R.J. Lee, “Statistical problems in the reporting of clinical trials. A survey of three medical journals,” 317 New Engl. J. Med. 426 (1987); S.M. Gore, I.G. Jones, E.C. Rytter, “Misuse of statistical methods: critical assessment of articles in the BMJ from January to March 1976,” 1 Brit. Med. J. 85 (1977).

Without citing any empirical evidence, Ogden notes that peer review is not well designed to detect fraud, especially when the data are presented to look plausible.  Despite the lack of empirical evidence, the continuing saga of fraudulent publications coming to light supports Ogden’s evaluation. Peer reviewers rarely have access to underlying data.  In the silicone gel breast implant litigation, for instance, plaintiffs relied upon a collection of studies that looked very plausible from their peer-reviewed publications.  Only after the defense discovered misrepresentations and spoliation of data did the patent unreliability and invalidity of the studies become clear to reviewing courts.  The rate of retractions of published scientific articles appears to have increased, although the secular trend may have resulted from increased surveillance and scrutiny of the published literature for fraud.  Daniel S. Levine, “Fraud and Errors Fuel Research Journal Retractions,” (August 10, 2011); Murat Cokol, Fatih Ozbay, and Raul Rodriguez-Esteban, “Retraction rates are on the rise,” 9 European Molecular Biol. Reports 2 (2008);  Orac, “Scientific fraud and journal article retractions” (Aug. 12, 2011).

The fact is that peer review is not very good at detecting fraud or error in scientific work.  Ultimately, the scientific community must judge the value of the work, but in some niche areas, only “the acolytes” are paying attention.  These acolytes cite one another, applaud each other’s work, and often serve as peer reviewers of the work in the field because editors see them as the most knowledgeable investigators in the narrow field. This phenomenon seems especially prevalent in occupational and environmental medicine.  See Cordelia Fine, “Biased But Brilliant,” New York Times (July 30, 2011) (describing confirmation bias and irrational loyalty of scientists to their hobby-horse hypotheses).

Peer review and correspondence to the editors are not the end of the story.  Discussion and debate may continue in the scientific community, but the pace of this debate may be glacial.  In areas of research where litigation or public policy does not fuel further research to address aberrant findings or to reconcile discordant results, science may take decades to ferret out the error. Litigation cannot proceed at this deliberative speed.  Furthermore, post-publication review is hardly a cure-all for the defects of peer review; post-publication commentary can be, and often is, spotty and inconsistent.  David Schriger and Douglas Altman, “Inadequate post-publication review of medical research: A sign of an unhealthy research environment in clinical medicine,” 341 Brit. Med. J. 356 (2010) (identifying reasons for the absence of post-publication peer review).

The Evolution of Peer Review as a Criterion for Judicial Gatekeeping of Expert Witness Opinion

The story of how peer review came to be held in such high esteem in legal circles is sad, but deserves to be told.  In the Bendectin litigation, the medication sponsor, Merrell-Richardson, was confronted with the testimony of an epidemiologist, Shanna Swan, who propounded her own, unpublished re-analysis of the published epidemiologic studies, which had failed to find an association between Bendectin use and birth defects.  Merrell challenged Swan’s unpublished, non-peer-reviewed re-analyses as not “generally accepted” under the Frye test.  The lack of peer review seemed like good evidence of the novelty of Swan’s re-analyses, as well as their lack of general acceptance.

In the briefings, the Supreme Court received radically different views of peer review in the Daubert case.  One group of amici modestly explained that “peer review referees and editors limit their assessment of submitted articles to such matters as style, plausibility, and defensibility; they do not duplicate experiments from scratch or plow through reams of computer-generated data in order to guarantee accuracy or veracity or certainty.” Brief for Amici Curiae Daryl E. Chubin, et al. at 10, Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579 (1993).  See also Daryl E. Chubin & Edward J. Hackett, Peerless Science: Peer Review and U.S. Science Policy (1990).

Other amici, such as the New England Journal of Medicine, Journal of the American Medical Association, and Annals of Internal Medicine proposed that peer-reviewed publication should be the principal criterion for admitting scientific opinion testimony.  Brief for Amici Curiae New England Journal of Medicine, Journal of the American Medical Association, and Annals of Internal Medicine in Support of Respondent, Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579 (1993). But see Arnold S. Relman & Marcia Angell, “How Good Is Peer Review?” 321 New Eng. J. Med. 827, 828 (1989) (“peer review is not and cannot be an objective scientific process, nor can it be relied on to guarantee the validity or honesty of scientific research”).

Justice Blackmun, speaking for the majority in Daubert, steered a moderate course:

“Another pertinent consideration is whether the theory or technique has been subjected to peer review and publication. Publication (which is but one element of peer review) is not a sine qua non of admissibility; it does not necessarily correlate with reliability, see S. Jasanoff, The Fifth Branch: Science Advisors as Policymakers 61-76 (1990), and in some instances well-grounded but innovative theories will not have been published, see Horrobin, “The Philosophical Basis of Peer Review and the Suppression of Innovation,” 263 JAMA 1438 (1990). Some propositions, moreover, are too particular, too new, or of too limited interest to be published. But submission to the scrutiny of the scientific community is a component of “good science,” in part because it increases the likelihood that substantive flaws in methodology will be detected. See J. Ziman, Reliable Knowledge: An Exploration of the Grounds for Belief in Science 130-133 (1978); Relman & Angell, “How Good Is Peer Review?” 321 New Eng. J. Med. 827 (1989). The fact of publication (or lack thereof) in a peer reviewed journal thus will be a relevant, though not dispositive, consideration in assessing the scientific validity of a particular technique or methodology on which an opinion is premised.”

Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579, 593-94, 590 n.9 (1993).

This lukewarm endorsement from Justice Blackmun, in Daubert, sent a mixed message to lower federal courts, which tended to make peer review into somewhat of a mechanical test in their gatekeeping decisions.  Many federal judges (and state court judges in states that followed the Daubert precedent) were too busy, too indolent, or too lacking in analytical acumen to look past the fact of publication and peer review.  These judges avoided the labor of independent thought by taking the fact of peer-reviewed publication as dispositive of the validity of the science in the paper.  Some commentators encouraged this low level of scrutiny and mechanical approach by suggesting that peer review could be taken as an indication of good science.  See, e.g., Margaret A. Berger, “The Supreme Court’s Trilogy on the Admissibility of Expert Testimony,” in Federal Judicial Center, Reference Manual on Scientific Evidence 9, 17 (2d ed. 2000) (describing Daubert as endorsing peer review as one of the “indicators of good science”) (hereafter cited as Reference Manual).  Elevating peer review into an indicator of good science, however, obscures its lack of epistemic warrant, misrepresents how it is actually regarded in the scientific community, and enables judges to fall back into their pre-Daubert mindset of finding quick, easy, and invalid proxies for scientific reliability.

In a similar vein, other commentators spoke in superlatives about peer review, and thus managed to mislead judges and decision makers further into regarding anything published as valid scientific data, data interpretation, and data analysis. For instance, Professor David Goodstein, writing in the Reference Manual, advises the federal judiciary that peer review is the test that separates valid science from rubbish:

“In the competition among ideas, the institution of peer review plays a central role. Scientific articles submitted for publication and proposals for funding are often sent to anonymous experts in the field, in other words, peers of the author, for review. Peer review works superbly to separate valid science from nonsense, or, in Kuhnian terms, to ensure that the current paradigm has been respected. It works less well as a means of choosing between competing valid ideas, in part because the peer doing the reviewing is often a competitor for the same resources (pages in prestigious journals, funds from government agencies) being sought by the authors. It works very poorly in catching cheating or fraud, because all scientists are socialized to believe that even their bitterest competitor is rigorously honest in the reporting of scientific results, making it easy to fool a referee with purposeful dishonesty if one wants to. Despite all of this, peer review is one of the sacred pillars of the scientific edifice.”

David Goodstein, “How Science Works,” Reference Manual 67, at 74-75, 82 (emphasis added).

Criticisms of Reliance Upon Peer Review as a Proxy for Reliability and Validity

Other commentators have put forward a more balanced and realistic, if not jaundiced, view of peer review. Professor Susan Haack, a philosopher of science at the University of Miami, who writes frequently about epistemic claims of expert witnesses and judicial approaches to gatekeeping, described the disconnect in meaning of peer review to scientists and to lawyers:

“For example, though peer-reviewed publication is now standard practice at scientific and medical journals, I doubt that many working scientists imagine that the fact that a work has been accepted for publication after peer review is any guarantee that it is good stuff, or that its not having been published necessarily undermines its value. The legal system, however, has come to invest considerable epistemic confidence in peer-reviewed publication — perhaps for no better reason than that the law reviews are not peer-reviewed!”

Susan Haack, “Irreconcilable Differences?  The Troubled Marriage of Science and Law,” 72 Law & Contemporary Problems 1, 19 (2009).  Haack’s assessment of the motivation of actors in the legal system is, for a philosopher, curiously ad hominem, and her shameless dig at law reviews is ironic, considering that she publishes extensively in them.  Still, her assessment that peer review is not any guarantee of an article’s being “good stuff” is one of her more coherent contributions to this discussion.

The absence of peer review hardly supports the inference that a study or an evaluation of studies is not reliable, unless of course we also know that the authors have failed after repeated attempts to find a publisher.  In today’s world of vanity presses, a researcher would be hard pressed not to find some journal in which to publish a paper.  As Drummond Rennie, a former editor of the Journal of the American Medical Association (the same journal that, as an amicus curiae, oversold peer review to the Supreme Court), has remarked:

“There seems to be no study too fragmented, no hypothesis too trivial, no literature citation too biased or too egotistical, no design too warped, no methodology too bungled, no presentation of results too inaccurate, too obscure, and too contradictory, no analysis too self serving, no argument too circular, no conclusions too trifling or too unjustified, and no grammar and syntax too offensive for a paper to end up in print.”

Drummond Rennie, “Guarding the Guardians: A Conference on Editorial Peer Review,” 256 J. Am. Med. Ass’n 2391 (1986); D. Rennie, A. Flanagin, R. Smith, and J. Smith, “Fifth International Congress on Peer Review and Biomedical Publication: Call for Research,” 289 J. Am. Med. Ass’n 1438 (2003).

Other editors at leading medical journals seem to agree with Rennie.  Richard Horton, an editor of The Lancet, rejects the Goodstein view (from the Reference Manual) of peer review as the “sacred pillar of the scientific edifice”:

“The mistake, of course, is to have thought that peer review was any more than a crude means of discovering the acceptability — not the validity — of a new finding. Editors and scientists alike insist on the pivotal importance of peer review. We portray peer review to the public as a quasi-sacred process that helps to make science our most objective truth teller. But we know that the system of peer review is biased, unjust, unaccountable, incomplete, easily fixed, often insulting, usually ignorant, occasionally foolish, and frequently wrong.”

Richard Horton “Genetically modified food: consternation, confusion, and crack-up,” 172 Med. J. Australia 148 (2000).

In the prestigious 2010 Sense About Science lecture, Fiona Godlee, the editor of the British Medical Journal, characterized peer review as deficient in at least seven ways:

  • Slow
  • Expensive
  • Biased
  • Unaccountable
  • Stifles innovation
  • Bad at detecting error
  • Hopeless at detecting fraud

Godlee, “It’s time to stand up for science once more” (June 21, 2010).

Important research often goes unpublished, and never sees the light of day.  Anti-industry zealots are fond of pointing fingers at the pharmaceutical industry, although many firms, such as GlaxoSmithKline, have adopted a practice of posting study results on a website.  The anti-industry zealots overlook how many apparently neutral investigators suppress research results that do not fit in with their pet theories.  One of my favorite examples is the failure of the late Dr. Irving Selikoff to publish his study of Johns-Manville factory workers:  William J. Nicholson, Ph.D. and Irving J. Selikoff, M.D., “Mortality experience of asbestos factory workers; effect of differing intensities of asbestos exposure,” Unpublished Manuscript.  This study investigated cancer and other mortality at a factory in New Jersey, where crocidolite was used in the manufacture of insulation products.  Selikoff and Nicholson apparently had no desire to publish a paper that would undermine their unfounded claim that crocidolite asbestos was not used by American workers.  But this desire does not necessarily mean that Nicholson and Selikoff’s unpublished paper was of any lesser quality than their study of North American insulators, the results of which they published, and republished, with abandon.

Examples of Failed Peer Review from the Litigation Front

Phenylpropanolamine and Stroke

Then there are many examples from the litigation arena of studies that passed peer review at the most demanding journals, but which did not hold up under the more intense scrutiny of review by experts in the cauldron of litigation.

In In re Phenylpropanolamine Products Liability Litigation, Judge Rothstein conducted hearings and entertained extensive briefings on the reliability of plaintiffs’ expert witnesses’ opinions, which were based largely upon one epidemiologic study, known as the “Yale Hemorrhagic Stroke Project (HSP).”  The Project was undertaken by manufacturers, which created a Scientific Advisory Group to oversee the study protocol.  The study was submitted as a report to the FDA, which reviewed the study and convened an advisory committee to review the study further.  “The prestigious NEJM published the HSP results, further substantiating that the research bears the indicia of good science.” In re Phenylpropanolamine Prod. Liab. Litig., 289 F. Supp. 2d 1230, 1239 (W.D. Wash. 2003) (citing Daubert II for the proposition that peer review shows the research meets the minimal criteria for good science).  There were thus many layers of peer review for the HSP study.

The HSP study was subjected to much greater analysis in litigation.  Peer review, even in the New England Journal of Medicine, did not and could not carry this weight. The defendants fought to obtain the underlying data to the HSP, and that underlying data unraveled the HSP paper.  Despite the plaintiffs’ initial enthusiasm for a litigation built on the back of a peer-reviewed paper in one of the leading clinical journals of internal medicine, the litigation resulted in a string of notable defense verdicts.  After one of the early defense verdicts, plaintiffs challenged the defendant’s reliance upon underlying data that went behind the peer-reviewed publication.  The trial court rejected the request for a new trial, and spoke to the importance of looking beyond the superficial imprimatur of peer review of the key study relied upon by plaintiffs in the PPA litigation:

“I mean, you could almost say that there was some unethical activity with that Yale Study.  It’s real close.  I mean, I — I am very, very concerned at the integrity of those researchers.”

“Yale gets — Yale gets a big black eye on this.”

O’Neill v. Novartis AG, California Superior Court, Los Angeles Cty., Transcript of Oral Argument on Post-Trial Motions, at 46-47 (March 18, 2004) (Hon. Anthony J. Mohr).

Viagra and Ophthalmic Events

The litigation over ophthalmic adverse events after the use of Viagra provides another example of challenging peer review.  In re Viagra Products Liab. Litig., 658 F. Supp. 2d 936, 945 (D. Minn. 2009).  In this litigation, the court, after viewing litigation discovery materials, recognized that the authors of a key paper failed to use the methodologies that were described in their published paper.  The court gave the sober assessment that “[p]eer review and publication mean little if a study is not based on accurate underlying data.” Id.

MMR Vaccine and Autism

Plaintiffs’ expert witness in the MMR vaccine/autism litigation, Andrew Wakefield, published a paper in The Lancet, in which he purported to find an association between the measles-mumps-rubella vaccine and autism.  A.J. Wakefield, et al., “Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children,” 351 Lancet 637 (1998).  This published paper, in a well-regarded journal, opened a decade-long controversy, with litigation, over the safety of the MMR vaccine.  The study was plagued, however, not only by the failure to disclose payments from plaintiffs’ attorneys and by ethical lapses in failing to obtain ethics board approvals, but also by substantially misleading reports of data and data analyses.  In 2010, Wakefield was sanctioned by the UK General Medical Council’s Fitness to Practise Panel.  That same year, over a decade after initial publication, the Lancet “fully retract[ed] this paper from the published record.”  Editors of the Lancet, “Retraction—Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children,” 375 Lancet 445 (2010).

Accutane and Suicide

In the New Jersey litigation over claimed health effects of Accutane, one of the plaintiffs’ expert witnesses was the author of a key paper that “linked” Accutane to depression.  Palazzolo v. Hoffman La Roche, Inc., 2010 WL 363834 (N.J. App. Div.).  Discovery revealed that the author, James Bremner, did not follow the methodology described in the paper.  Furthermore, Bremner could not document the data used in the paper’s analysis, and conceded that the statistical analyses were incorrect.  The New Jersey Appellate Division held that expert opinion relying upon Bremner’s study should be excluded as not soundly and reliably generated.  Id. at *5.

Silicone and Connective Tissue Disease

It is heartening that the scientific and medical communities decisively renounced the pathological science that underlay the silicone gel breast implant litigation.  The fact remains, however, that plaintiffs relied upon a large body of published papers, each more invalid than the last, to support their claims.  For many years, judges around the country blinked and let expert witnesses offer their causation opinions, in large part based upon papers by Smalley, Shanklin, Lappe, Kossovsky, Gershwin, Garrido, and others.  Peer review did little to stop the enthusiasm of editors for this “sexy” topic until a panel of court-appointed expert witnesses and the Institute of Medicine put an end to the judicial gullibility.

Concluding Comments

One district court distinguished between pre-publication peer review and the important peer review that takes place after publication, as other researchers quietly go about replicating or reproducing a study’s findings, or attempting to build on those findings.  “[J]ust because an article is published in a prestigious journal, or any journal at all, does not mean per se that it is scientifically valid.”  Pick v. Amer. Med. Sys., 958 F. Supp. 1151, 1178 n.19 (E.D. La. 1997), aff’d, 198 F.3d 241 (5th Cir. 1999).  With hindsight, we can say that Merrell Richardson’s strategy of emphasizing peer review has had some unfortunate, unintended consequences.  The Supreme Court elevated peer review into a factor for reliable science, and lower courts have elevated peer review into a criterion of validity.  The upshot is that many courts will now not go beyond statements in a peer-reviewed paper to determine whether they are based upon sufficient facts and data, or whether the statements are based upon sound inferences from the available facts and data.  These courts violate the letter and spirit of Rule 702 of the Federal Rules of Evidence.