TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Expert Evidence Free-for-All in Washington State

September 23rd, 2011

Daubert/Frye issues are fact specific. Meaningful commentary about expert witness decisions requires a close familiarity with the facts and data in the case under scrutiny.  A recent case in point comes from the Washington Supreme Court.  The plaintiff alleged that her child was born with birth defects as a result of her workplace exposure to solvents from mixing paints.  The trial court dismissed the case on summary judgment, after excluding plaintiff’s expert witnesses’ causation opinions. On appeal, the Court, en banc, reversed the summary judgment, and remanded for trial.  Anderson v. Akzo Nobel Coatings Inc., No. 82264-6, Wash. Sup.; 2011 Wash. LEXIS 669 (Sept. 8, 2011).

Anderson worked for Akzo Nobel Coatings, Inc., until the time she was fired, which occurred shortly after she filed a safety complaint.  Her last position was plant environmental coordinator for health and safety. Her job occasionally required her to mix paints.  Akzo’s safety policies required respirator usage when mixing paints, although Anderson claimed that enforcement was lax.  Slip op. at 2.  Anderson gave birth to a son, who was diagnosed with congenital nervous and renal system defects.  Id. at 3.

Anderson apparently had two expert witnesses:  one of her child’s treating physicians and Dr. Khattak, an author of an epidemiologic study on birth defects in women exposed to organic solvents. Sohail Khattak, et al., “Pregnancy Outcome Following Gestational Exposure to Organic Solvents,” 281 J. Am. Med. Ass’n 1106 (1999). See Slip op. at 3.

The conclusions of the published paper were modest, and no claim to causality was made from either the study alone or from the study combined with the prior knowledge in the field.  When the author, Dr. Khattak, donned the mantle of expert witness, intellectual modesty went out the door:  He opined that the association was causal.  The treating physician echoed Dr. Khattak’s causal opinion.

The fact-specific nature of the decision makes it difficult to assess the accuracy or validity of the plaintiff’s expert witnesses’ opinions.  The claimed teratogenicity of paint solvents is an interesting issue, but I confess it is one with which I am not familiar.  Perhaps others will address the claim.  Whether or not the claim has scientific merit, the Anderson decision is itself seriously defective.  The Washington Supreme Court’s opinion shows that the Court did little to familiarize itself with the factual issue, and holds that judges need not tax themselves very much to understand the application of scientific principles to the facts and data of their cases.  Indeed, what is disturbing about this decision is that it sets the bar so low for medical causation claims. Although Anderson does not mark a reversion to the old Ferebee standard, which would allow any qualified, willing expert witness to testify to any conclusion, the decision does appear to permit any opinion based upon a generally accepted methodology, without gatekeeping analysis of whether the expert has actually faithfully and appropriately applied the claimed methodology.  The decision eschews the three subparts of Federal Rule of Evidence 702, which requires that the proffered opinion:

(1) … is based upon sufficient facts or data,

(2) … is the product of reliable principles and methods, and

(3) …[is the product of the application of] the principles and methods reliably to the facts of the case.

Federal Rule of Evidence 702.

In abrogating standards for expert witness opinion testimony, the Washington Supreme Court manages to commit several important errors about the nature of scientific and medical testimony.  These errors are much more serious than any possible denial of intellectual due process in the Anderson case because they virtually ensure that meaningful gatekeeping will not take place in future Washington state court cases.

I. The Court Confused Significance Probability with Expert Witnesses’ Subjective Assessment of Posterior Probability

The Washington Supreme Court advances two grounds for abrogating gatekeeping in medical causation cases.  First, the Court mistakenly states that the degree of certainty for scientific propositions is greater in the scientific world than it is in a civil proceeding:

“Generally, the degree of certainty required for general acceptance in the scientific community is much higher than the concept of probability used in civil courts.  While the standard of persuasion in criminal cases is “beyond a reasonable doubt,” the standard in most civil cases is a mere “preponderance.”

Id. at 14.  No citation is provided for the proposition that the scientific degree of certainty is “much higher,” other than a misleading reference to a book by Marcia Angell, former editor of the New England Journal of Medicine:

“By contrast, “[f]or a scientific finding to be accepted, it is customary to require a 95 percent probability that it is not due to chance alone.”  Marcia Angell, M.D., Science on Trial: The Clash of Medical Evidence and the Law in the Breast Implant Case 114 (1996).  The difference in degree of confidence to satisfy the Frye “general acceptance” standard and the substantially lower standard of “preponderance” required for admissibility in civil matters has been referred to as “comparing apples to oranges.” Id. To require the exacting level of scientific certainty to support opinions on causation would, in effect, change the standard for opinion testimony in civil cases.”

Id. at 15.  This popular press book hardly supports the Court’s contention. The only charitable interpretation of the 95% probability is that the Court, through Dr. Angell, is taking an acceptable rate of false positive errors to be no more than the customary 5%, and is looking at a confidence interval with a coefficient of 1 – α, based upon this specified error rate, α. This error rate, however, is not the probability that the null hypothesis is true.  Had the Court read the very next sentence after the one it quotes from Dr. Angell, it would have seen:

“(I am here giving a shorthand version of a much more complicated statistical concept.)”

Science on Trial at 114 (1996).  The Court failed to note that Dr. Angell was talking about significance probability, which is used to assess the strength of the evidence in a single study against the null hypothesis of no association.  Dr. Angell was well aware that she was simplifying the meaning of significance probability in order to distinguish it from a totally different concept, the probability of attribution of a specific case to a known cause of the disease.  It is the probability of attribution that has some relevance to the Court’s preponderance standard; and the probability of attribution standard is not different from the civil preponderance standard.
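
The link between the probability of attribution and the preponderance standard can be made concrete with the attributable fraction among the exposed. This is a standard epidemiologic shorthand; it is not spelled out in the Court’s opinion or in the passages quoted from Dr. Angell, and the symbols AF and RR are introduced here only for illustration. The sketch assumes that the relative risk, RR, is a valid, unbiased measure of a causal effect, and that exposure does not merely accelerate disease that would have occurred anyway:

```latex
\[
\mathrm{AF} \;=\; \frac{RR - 1}{RR},
\qquad
\mathrm{AF} > \tfrac{1}{2} \;\iff\; RR > 2 .
\]
```

Under those assumptions, a “more likely than not” attribution in a specific case corresponds to a relative risk greater than two. It says nothing about how strong the evidence against chance must be in the underlying study.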

The Court’s citation of Dr. Angell for the proposition that the “degree of confidence” and the “preponderance” standard are like “comparing apples to oranges,” is a complete distortion of Dr. Angell’s book.  She is comparing the attributable risk, which is based upon an effect size (the relative risk) and which need only be greater than 50% for specific causation, with a significance probability for the interpretation of the data from a single study, which is based upon the assumption of the null hypothesis:

“Comparing the size of an effect with the probability that a given finding isn’t due to chance is comparing apples and oranges.”

Id. This statement is a far cry from the Court’s misleading paraphrase, and is no support at all for the Court’s statistical solecism. Implicit in the Court’s error is its commission of the transpositional fallacy; it has confused significance probability (the probability of the evidence given the null hypothesis) with Bayesian posterior probabilities (the probability of the null hypothesis given all the data and evidence in the case).
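
The gap between the two probabilities can be illustrated with a short calculation. The following is a minimal sketch, not anything drawn from the Anderson opinion or from Dr. Angell’s book: the function name `posterior_null`, the prior probabilities, and the assumed statistical power of 80% are all hypothetical values chosen for illustration. It applies Bayes’ theorem to a study that rejects the null hypothesis at α = 0.05:

```python
def posterior_null(prior_null, alpha, power):
    """Return P(null is true | study was 'significant') via Bayes' theorem.

    prior_null: assumed prior probability that the null hypothesis is true
    alpha:      P(significant result | null true), the false-positive rate
    power:      P(significant result | null false), an assumed value
    """
    p_sig_and_null = alpha * prior_null
    p_sig_and_alt = power * (1.0 - prior_null)
    return p_sig_and_null / (p_sig_and_null + p_sig_and_alt)


# Hypothetical inputs: the same 5% error rate yields different posteriors,
# depending entirely on the assumed prior and power.
for prior in (0.5, 0.9):
    print(prior, round(posterior_null(prior, alpha=0.05, power=0.8), 3))
```

With a hypothetical prior of 0.9 for the null hypothesis, the posterior probability of the null after a “significant” result is 0.36, not 0.05. The 5% figure describes the evidence given the null hypothesis; it does not describe the null hypothesis given the evidence.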

Having misunderstood significance probability to be at odds with the preponderance standard, the Court notes that the “absence of a statistically significant basis” for an expert witness’s opinion does not implicate Frye or render the expert witness’s opinion inadmissible.  Id. at 16.  In the Anderson case, this musing is pure dictum because Dr. Khattak’s study showed a highly statistically significant difference in the rate of birth defects among women with solvent exposures compared with women without such exposures.

II.  The Court Abandons Evidence or Data as Necessary to Support Judgments of Causality

The Anderson Court did not stop with its misguided distinction between burdens of proof in science and in law.  The Court went on to offer the remarkable suggestion that gatekeeping is unnecessary for medical opinions because they are not, in any event, evidence-based:

“Many expert medical opinions are pure opinions and are based on experience and training rather than scientific data.  We only require that ‘medical expert testimony . . . be based upon ‘a reasonable degree of medical certainty’ or probability.”

Slip op. at 16-17 (internal citations omitted).  There may be some opinions that are experientially based, but the Court did not, and could not, adduce any support for the proposition that judgments of teratogenic causation do not require scientific data.  Troublingly, the Court appears to allow medical expert opinions to be “pure opinions,” unsupported by empirical, scientific data.

Presumably as an example of non-evidence based medical opinions, the Anderson Court offers the example of differential diagnosis:

“Many medical opinions on causation are based upon differential diagnoses. A physician or other qualified expert may base a conclusion about causation through a process of ruling out potential causes with due consideration to temporal factors, such as events and the onset of symptoms.”

Id. at 17. This example, however, does not explain or justify anything the Court claimed.  Differential diagnosis, or more accurately “differential etiology,” is a process of reasoning by iterative disjunctive syllogism to the most likely cause of a particular patient’s disease.  The syllogism assumes that any disjunct (any possible cause of this specific case) has previously, independently been shown to be capable of causing the outcome in question.  There is no known methodology by which this syllogism itself can show general causation.
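
The structure of the syllogism can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `differential_etiology` and the placeholder cause labels are hypothetical, and the list of established causes is taken as a given input, which is exactly the point.

```python
def differential_etiology(established_causes, ruled_out):
    """Iterative disjunctive syllogism: strike each ruled-out disjunct in turn.

    established_causes: causes already shown, independently, to be capable
                        of producing the outcome (an assumed input)
    ruled_out:          causes excluded for this particular patient
    """
    remaining = list(established_causes)
    for cause in ruled_out:
        if cause in remaining:
            remaining.remove(cause)
    return remaining


# Placeholder labels only; no real exposures or diagnoses are implied.
candidates = ["cause A", "cause B", "cause C"]
print(differential_etiology(candidates, ["cause A", "cause C"]))
```

Whatever survives the process of elimination was already on the list of established causes. The procedure can select among them for a given patient, but it cannot add a new cause to the list.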

Not surprisingly, the Court makes no attempt to support its mistaken claim that differential diagnosis permits the assessment of general causation without the necessity of “scientific data.”

The Court’s confusion between significance probability (and the associated confidence level, 1 – α) and posterior probability based upon all the evidence, as well as its confusion between differential diagnosis and evidence-based assessments of general causation, allowed the Court to take a short way with medical causation evidence.  The denial of scientific due process followed inevitably.

III.  The Court Abandoned All Gatekeeping for Expert Witness Opinion Testimony

The Anderson Court suggested that gatekeeping was required by Washington’s continued adherence to the stringent Frye test, but the Court then created an exception bigger than the rule:

“Once a methodology is accepted in the scientific community, then application of the science to a particular case is a matter of weight and admissibility under ER 702, the Frye test is only implicated where the opinion offered is based upon novel science.  It applies where either the theory and technique or method of arriving at the data relied upon is so novel that it is not generally accepted by the relevant scientific community.  There is nothing novel about the theory that organic solvent exposure may cause brain damage and encephalopathy.  See, e.g., Berry v. CSX Transp., Inc., 709 So. 2d 552, 568 & n.12, 571-72 (Fla. Dist. Ct. App. 1998) (surveying medical literature). Nor does it appear that there is anything novel about the methods of the study about which Dr. Khattak wrote. Khattak, supra, at 1106. Frye does not require that the specific conclusions drawn from the scientific data upon which Dr. Khattak relied be generally accepted in the scientific community.  Frye does not require every deduction drawn from generally accepted theories to be generally accepted.”

Slip op. at 18-19 (internal citations omitted).

By excepting the specific inferences and conclusions from judicial review, the Court has sanctioned any nonsense as long as the expert witness can proclaim that he used the methods of “toxicology,” or of “epidemiology,” or some other generally accepted branch of science.  The Court left no room to challenge whether the claim is correct at any other than the most general level.  The studies cited in support of causation may completely lack internal or external validity, but if they are of a class of studies that are “scientific,” and purport to use a method that is generally accepted (e.g., cohort or case-control studies), then the inquiry is over. Indeed, the Court left no room at all for challenges to expert witnesses who give dubious opinions about medical causation.

IV. Fault Issues

Not content to banish science from the judicial assessment of scientific causality judgments, the Anderson Court went further to take away any defense based upon the mother’s fault in engaging in unprotected mixing of paints while pregnant, or the mother’s fault in smoking while pregnant.   Slip op. at 20.  Suing the mother as a tortfeasor may not be an attractive litigation option to the defendant in a case arising out of workplace exposure to an alleged teratogen, but clearly the mother could be at fault with respect to the causation of her child’s harm. She was in charge of environmental health and safety, and she may well have been aware of the hazards of solvent exposures.  In this case, there were grounds to assert the mother’s fault both in failing to comply with workplace safety rules, and in smoking during her pregnancy (assuming that there was evidence, at the same level as paint fumes, for the teratogenicity of smoking).

RULE OF EVIDENCE 703 — Problem Child of Article VII

September 19th, 2011

With the exception of a few evidence scholars, Federal Rule of Evidence 703 is ignored or misunderstood in practice.  There was a time when virtually every motion to exclude an expert witness’s opinion was framed on Rule 703 concepts, either alone, or in conjunction with Rule 702 requirements.  The Supreme Court’s decision in Daubert changed practice by holding that Rule 702 required gatekeeping, and by generally slighting Rule 703.

I.  Reform of the Common Law

Federal Rule of Evidence 703 formally abandoned the common-law requirement that expert witnesses base their opinions upon evidence of record, either personal observations or facts admitted into evidence.  The first sentence of Rule 703, which has remained unchanged since its original adoption, makes clear that an expert witness may rely upon facts or data that are never admitted into evidence.  This sentence details three methods of putting “facts or data” before expert witnesses.  First, expert witnesses may themselves be percipient witnesses to the facts or data upon which they rely.  Second, expert witnesses may learn of facts or data at the trial by observing other witnesses testify or by being asked to assume facts or data for purposes of giving an opinion.  Third, expert witnesses may come to learn of “facts or data” before the hearing.  It is this third method that represents a departure from the common law, and which raises the issue whether the expert witness has relied upon facts or data, which are themselves inadmissible.

The rationale for Rule 703 was the recognition that much of the expert witness’s understanding of an area of science, medicine, or technology was governed by training, prior experience, professional collaborations, and extensive reading, all of which represented the basis, often in large part, of the case-specific opinions that are then offered in the courtroom.  These bases are mostly hearsay, and mostly inadmissible if expert witnesses were to try to articulate any particular aspect of their personal learning.  The rationale for Rule 703, however, also included the economy and convenience of presenting expert testimony without the need of formal proof of predicate “facts or data,” at least if those facts or data were of the type reasonably relied upon by experts in the relevant field.  Not surprisingly, advocates responded by using Rule 703 to inject all manner of hearsay into their trials, including opinion testimony from witnesses that would never testify at trial.  Courts and commentators responded with confusion over whether Rule 703 created a new exception to the rule against hearsay.

II.  Conduit for Inadmissible Evidence

Much academic, judicial, and professional criticism of Rule 703, before its amendment in 2000, centered on the mischief created by expert witnesses’ reliance upon inadmissible evidence and the disclosure of this information to the jury.  To be sure, Federal Rule 705 made clear that the expert witness need not disclose any basis; the expert opinion could be elicited as a conclusory opinion, or the expert could disclose some but not all bases.  Parties, however, were often intent on using Rule 703 to present, at least selectively, those relied upon facts and data (and sometimes opinions) that would aid their case, regardless of the admissibility of the disclosed expert witness bases.  If the other side were foolish enough to request a limiting instruction, the proponent would revel in the emphasis that the Court gave to the proponent’s inadmissible facts and data.[i]

Of course, the presentation of expert opinion without requiring disclosure of bases is hardly calculated to permit jurors or trial judges to assess the validity or correctness of the opinions that they must weigh at trial.  Furthermore, Rule 703 shifted the burden to opposing counsel to elicit bases in order to show flaws or weaknesses in reasoning and inference.  This cross-examination frequently could not take place without eliciting inadmissible evidence.

In 2000, Rule 703 was amended to include its third, last sentence, which creates a presumption against disclosure of inadmissible facts or data to the jury.  The presumption against disclosure may be overcome by a judicial finding that the probative value in helping the jury evaluate the opinion substantially outweighs the prejudice of injecting inadmissible evidence into the trial.  Nothing in the revised rule makes the inadmissible “facts or data” admissible, although at one point, the Advisory Committee Note confuses admissibility and disclosure when it writes in terms of relied upon information that is “admissible only for the purpose of assisting the jury in evaluating an expert’s opinion.”  Such evidence is not admissible at all, which is exactly why the presumption is against disclosure and the alternative is disclosure, along with consideration of a limiting instruction.

III.  Expert Witness Opinions – Castles in the Air

Whether underlying facts are disclosed or not, Rule 703, as currently applied in federal courts, raises serious concerns about whether expert witness opinion testimony has a reliable foundation.  The law in most states is that an expert witness’s opinion can rise no higher than the facts upon which the opinion is based.  If the jury does not hear the bases of the opinion, it cannot meaningfully evaluate the opinion.  Furthermore, the jury cannot make sense of an expert witness’s opinion, when it is bound by a limiting instruction, which explains that it may consider the basis in evaluating the expert witness’s opinion, but it may not consider the basis as evidence that has been established in the case. If this basis is not otherwise established in the case, then the jury would be compelled to reject the testimony as unsupported by facts or data in the case.  If the jury must consider the opinion because the expert witness claims to have relied reasonably upon inadmissible “facts or data,” then the expert witness has been given important fact-finding power in the case.

Perhaps Rule 702, with its imposition of gatekeeping responsibilities upon the trial court, is supposed to solve this problem.  Many of the Circuits appear to be moving toward a requirement of pretrial hearings for Rule 702 challenges, at least when requested, and sometimes even when not.  In some instances, the lack of a proper factual predicate, or unreasonableness in reliance upon an inadmissible factual predicate, can be developed in a pretrial hearing that allows the parties to join issue over the reasonableness of reliance and proof of the predicate facts or data.

IV.  Who Decides Reasonable Reliance?

Some of the earlier case law suggested that the expert witness could validate his or her own reliance upon “facts or data,” as “reasonable.”[ii] Judges, like most people, glibly assumed that what people normally or customarily do is reasonable.  Extending this assumption to the law of expert witnesses, courts have equated the reasonable reliance of Rule 703 with what experts customarily do in their field.[iii] Other courts appeared to go further, especially in the context of forensic expert witness opinion, to equate reasonable reliance with what experts do in their courtroom testimony.

The current view, influenced no doubt by the Supreme Court’s holdings in Daubert, Joiner, and Kumho Tire, has settled on requiring the trial court to make an independent assessment, based upon a factual showing, that the “facts or data” in question may be reasonably relied upon by experts in the relevant field.[iv] One of the important implications of this shift is that courts may now accept an expert witness’s testimony about what he normally does, but if opposing counsel challenges the reasonableness of the practice, with affidavits, testimony, learned treatises, and the like, then the court will be required to make a preliminary determination of the reasonableness of the expert’s “normal practice.”  Given that litigation often involves unusual situations outside both the statistical and prescriptive “norms” of ordinary life, the abandonment of extreme deference to expert witnesses as the ultimate arbiters of reasonableness is a significant advance in the evolution of the Federal Rules of Evidence.

V.  Reasonable Reliance and Reliability:  The Intersection Between Rules 702 and 703

Some of the early enthusiasm for Rule 703 as a speed bump for unreliable expert witness testimony came from the explicit use of the concept of “reasonable reliance” in the second sentence of the Rule.  The original Advisory Committee Note encouraged this view by giving an example, without much analysis, of an accident reconstruction expert whose testimony would not be reasonably based upon the statements of bystanders.  Before the advent of Daubert, this example was a tease to lawyers who were looking for some way to limit the flood of unreliable expert witness opinion testimony.  The Advisors, however, did not explain why such reliance would be unreasonable.  We could certainly imagine situations in which bystanders’ statements were essential to recreating an accident.  Furthermore, the statements of bystanders might be admissible under various exceptions to the rule against hearsay, and the Note thus seems to contradict the actual language of the Rule, which limits the reasonableness requirement to reliance upon inadmissible evidence.  In any event, the Advisory Committee’s example of an “accidentologist” seemed to imply a requirement of trustworthiness, which might apply to both admissible and inadmissible “facts or data.”

Perhaps because of the original Advisory Committee Note, litigants, in challenging the reliability of expert witness opinion testimony, frequently invoked both Rules 702 and 703 in support of exclusion.  Indeed, cases that focus on only Rule 703 are relatively uncommon; most cases note that they are addressing motions to bar expert witnesses, made under both rules.  After the Supreme Court’s Quartet on Rule 702 (Daubert, Joiner, Kumho, and Weisgram), the need to frame an exclusionary motion on Rule 703 has been largely dispelled.

One case that gave rise to much of the enthusiasm for Rule 703 as a basis for expert witness preclusion was Judge Weinstein’s decision in In re Agent Orange.[v] Some of the expert witnesses in the Agent Orange litigation relied upon checklists of symptoms prepared by the litigants.  Invoking Rule 703 to support exclusion of the expert witnesses’ opinions, the trial court observed that “no reputable physician relies on hearsay checklists by litigants to reach a conclusion with respect to the cause of their affliction.”[vi]

The lesson of Agent Orange was that Rule 703 could serve as a basis for excluding expert witness testimony.  If the expert witness relied unreasonably upon “facts or data,” then that expert witness’s testimony was fatally flawed under the Rules and had to be excluded.  The Court in Agent Orange avoided the obvious conclusion that an expert witness’s opinion, which was not reasonably based upon “facts or data,” could not be helpful to the trier of fact, and thus the opinion would necessarily offend Rule 702, as well.

Practitioners, faced with dubious expert witness opinion testimony after Agent Orange, increasingly relied upon Rule 703, along with Rules 702 and 403, in stating their challenges to proffered opinions.  Many Courts, in ruling upon these challenges, did not separate out their holdings or reasoning, in applying Rules 702 and 703 to exclude opinions.[vii] Some courts, especially before the Supreme Court’s decision in Daubert, framed reliability challenges almost exclusively in terms of compliance with Rule 703.[viii]

The early enthusiasm for an expansive role for Rule 703 as a tool for broad gatekeeping was problematic from the beginning.  Rule 703 has always required “reasonableness” for an expert witness’s reliance upon inadmissible “facts or data.”  The Rule is, and has always been, silent about reliance upon admissible “facts or data.”  As a result, Rule 703 could never have aspired to the principal role of limiting the flow of unreliable expert testimony.  The Rules of Evidence provide ample bases for expert witnesses to formulate unreliable opinions based solely, and unreasonably, upon admissible “facts or data,” such as inadvertent and false party admissions, self-serving statements made to examining physicians, or vanity press publications elevated to “learned treatise” status.  The resulting opinions have little or no epistemic warrant or claim to reliable methodology, but they may readily pass muster under Rule 703.  Furthermore, even if Rule 703 were applied to eliminate all unreasonable reliance upon “facts or data,” the Rule would not have guarded against unreliability that crept into the opinions as a result of invalid inferences or reasoning from “facts or data,” which themselves were beyond reproach.

The Advisory Committee Note to Rule 702, from 2000, attempts to answer some of the questions about the proper scope of Rule 703:

There has been some confusion over the relationship between Rules 702 and 703. The amendment makes clear that the sufficiency of the basis of an expert’s testimony is to be decided under Rule 702. Rule 702 sets forth the overarching requirement of reliability, and an analysis of the sufficiency of the expert’s basis cannot be divorced from the ultimate reliability of the expert’s opinion. In contrast, the ”reasonable reliance” requirement of Rule 703 is a relatively narrow inquiry. When an expert relies on inadmissible information, Rule 703 requires the trial court to determine whether that information is of a type reasonably relied on by other experts in the field. If so, the expert can rely on the information in reaching an opinion. However, the question whether the expert is relying on a sufficient basis of information – whether admissible information or not – is governed by the requirements of Rule 702.

This Note leaves a large gap in the analysis of expert witness opinion evidence.  The question of the sufficiency of an expert’s bases is understandably different from whether the “facts or data” are themselves reasonably (and thus presumably also reliably) relied upon by experts in the field.  Rule 702 provides guidance about the sufficiency of “facts or data,” as well as the reliable application of reliable principles and methods to the facts of the case.  Rule 702, however, is silent about the reliability of the starting point in the scientific or technical knowledge:  the data.  Perhaps the Advisory Committee meant to imply that reliable methodology requires obtaining “facts or data” in a reliable way, but it failed to address the issue in the recent amendments to Rule 702.

There is another problem that amended Rules 702 and 703, along with the Advisory Notes, fail to address.  This problem further illustrates the gap in the coverage of the rules, and perhaps it explains why courts have strained at times to include Rule 703 as part of their analysis of the reliability of expert witness opinion testimony.  Consider what happens when a proffered expert witness’s opinion has already been held to satisfy the relevance and reliability requirements of Rule 702.  The Court has explicitly ruled that the expert’s opinion has a sufficient factual basis, and that the expert has reached the opinion by reliably applying reliable methods to the facts of a case.  After the Court’s Rule 702 ruling, the expert witness amends her report to add reliance upon a new study.  The study is unfinished, and unpublished.  The paper has yet to be peer-reviewed.  Furthermore, the study is written in a foreign language, and the expert has relied upon a translation that appears to have errors, with analyses that are at least partially incoherent or incorrect.  This new study no longer raises questions about sufficiency of data, and the expert’s overall opinion, ex hypothesi, satisfies Rule 702.  This new study appears to raise fresh questions under Rule 703, not provided for in the Advisory Committee’s allocation of issues between Rules 702 and 703.[ix] Some courts might think that the addition of another study, even if the study were scientifically questionable, in support of an already 702-sufficient opinion could not be harmful error.  Yet, the additional study would give the jury the sense that the expert witness had a surfeit of support for her opinion.  Furthermore, the additional study would prejudice the adverse party by requiring more cross-examination on details that may test the patience of the factfinder.

VI.  “Facts or Data” versus “Opinions”

Rule 703 describes the condition for permitting expert witnesses to rely upon inadmissible “facts or data.”  The Rule is silent about reliance upon others’ opinions.  Of course, the distinction between facts (or data) and opinions may occasionally be blurred or difficult to discern, but the entirety of Article VII is predicated upon the existence of the distinction.[x] The conspicuous absence of “opinions” from the Rule’s conditional allowance of expert testimony based upon inadmissible “facts or data” would seem to mean that such reliance upon extra-record opinion was not authorized under Rule 703.[xi] Other courts, especially the Third Circuit, have given their blessing to the wholesale backdoor introduction of opinions, and they have not distinguished facts or data from opinions, as the potentially reasonably relied upon inadmissible evidence under Rule 703.[xii]

The Advisory Committee Note to the 2000 amendment to Rule 702 purports to answer the question of the scope of “facts or data” under Rule 703:

The term ”data” is intended to encompass the reliable opinions of other experts. See the original Advisory Committee Note to Rule 703. The language ”facts or data” is broad enough to allow an expert to rely on hypothetical facts that are supported by the evidence.

Id.

The original Advisory Committee Note to Rule 703, however, refers to opinions as within the scope of “facts or data” in just one single passage, and in a relatively narrow context:

Thus a physician in his own practice bases his diagnosis on information from numerous sources and of considerable variety, including statements by patients and relatives, reports and opinions from nurses, technicians and other doctors, hospital records, and X rays.  Most of them are admissible in evidence, but only with the expenditure of substantial time in producing and examining various authenticating witnesses. The physician makes life-and-death decisions in reliance upon them. His validation, expertly performed and subject to cross-examination, ought to suffice for judicial purposes. Rheingold, supra, at 531; McCormick § 15. A similar provision is California Evidence Code § 801(b).[xiii]

The original Note to Rule 703 is highly misleading because opinions that are recorded in medical records would be admissible in any event as business records.  Furthermore, even if physicians must sometimes make life-or-death decisions on the basis of limited, incomplete, undocumented opinions offered by another medical care provider, that in extremis scenario is hardly a propitious basis for opinion testimony at a judicial hearing where the trier of fact is charged with making a deliberate evaluation of the evidence.  Courts and juries are charged with trying to ascertain the truth, and they do not have a warrant to abridge the fact-finding process because a physician, or any other “expert,” at time past was acting under exigent circumstances.

This more recent attempt to endorse Rule 703 as a conduit for other expert “opinions” should fail for several reasons.  First, the entire Article VII concerns itself with opinions and opinion testimony.  To suggest that Rule 703 used “facts or data” to include “opinions” ignores the context of Article VII and the limited exception that Rule 703 was making to common-law procedure.  Second, the original Advisory Committee note spoke only, in one sentence, to opinions of medical-care providers.  These opinions would normally be recorded in the patient’s medical charts and records, and they would be admissible in any event.[xiv] There is nothing in the notes to Rule 703 to support the wholesale inclusion of hearsay opinion testimony.  Third, the expansion of Rule 703 to include opinions should not circumvent the reliability requirements of Rule 702.  Fourth, the rationale of convenience used to support the expansion of the common law through Rule 703 is stood on its head by this expansion to include opinions.  The Rule puts a heavy burden on opposing counsel to ferret out reliance upon opinions of other non-testifying experts, and to take adequate discovery of those persons or organizations. This is a heavy price to pay for the “convenience” of having an opinion introduced without the usual safeguards of critical examination of the qualifications of the expert, or the reliability of his opinion.

Until this extension of Rule 703 is checked, practitioners must inquire of their adversary’s expert witnesses, either in interrogatories or in depositions, whether the witnesses have consulted with and relied upon the writings or oral discussions with any other person regarded by the testifying expert witness as an expert.  If the testifying expert witness has relied upon these non-testifying experts’ statements or opinions, opposing counsel may have to entertain the expensive, inconvenient resort to additional discovery of the out-of-court declarant.

VII.  Fulsome Importation of Untrustworthy Opinions Through Rule 703

One prevalent and problematic practice is for expert witnesses to rely upon a study in order to pass through the study’s authors’ conclusions. Most published studies have a basic ordered structure (IMRAD):

  • Introduction – identifies the purpose and scientific context of the study;
  • Methods – identifies the materials used, the identification, organization, collection of data and controls;
  • Results – reports the data obtained and any statistical analyses of the data; and
  • Discussion – reports the study authors’ interpretation of the results and how they fit within the larger array of data from other studies.

See Luciana B. Sollaci & Mauricio G. Pereira, “The introduction, methods, results, and discussion (IMRAD) structure: a fifty-year survey,” 92 J. Med. Libr. Ass’n 364 (2004).

What becomes clear is that testifying expert witnesses need to have access to the methods and the results of published (and unpublished) papers in order to formulate and express their own opinions.  The introduction and discussion sections of relied upon papers are the scholarship and opinions of hearsay declarants, who in modern day publications are often quite untrustworthy.  The first and last sections of most articles would rarely satisfy the procedural requirements of Federal Rule of Civil Procedure 26; nor would they satisfy the evidential reliability requirements of Rule 702. Rule 703’s limitation to “facts or data” should exclude the flood of hearsay opinion from relied upon studies by forcing expert witnesses to rely upon what is really necessary to their opinions.  If the testifying expert witness cannot testify without the scholarship and opinions of the relied upon studies, then he is probably not sufficiently expert to be giving an opinion in court.

There are many clear statements in the medical literature, which caution the consumers of medical studies against misleading claims.  Several years ago, the British Medical Journal published a paper by Montori, et al., “Users’ guide to detecting misleading claims in clinical research reports,” 329 Br. Med. J. 1093 (2004).  The authors distill their advice down to six suggestions in a “[g]uide to avoid being misled by biased presentation and interpretation of data,” the first [suggestion] of which is to:  “Read only the Methods and Results sections; bypass the Discussion section.”  Id. at 1093 (emphasis added).

The federal courts have generally been oblivious to the problem of permitting fulsome presentation of hearsay opinions from the discussion and conclusion sections of articles relied upon by testifying expert witnesses.

The Supreme Court’s decision in Joiner provides a striking example.  The Court correctly assessed that plaintiffs’ expert witnesses in that case were relying upon pathologically deficient and unreliable evidence.  (Some of the expert witnesses in Joiner are known repeated offenders against Rule 702.)  In reaching the right result, and in advancing the jurisprudence of the reliability of expert witness opinion testimony, Joiner, however, stumbled in its analysis of the role of reliance upon published studies.  In his opinion in Joiner, Chief Justice Rehnquist gave considerable weight to the consideration that the plaintiffs’ expert witnesses relied upon studies that the studies’ own authors explicitly refused to interpret as supporting a conclusion of human disease causation.  See General Electric Co. v. Joiner, 522 U.S. 136, 145-46 (1997) (noting that the PCB studies at issue did not support expert witnesses’ conclusion that PCB exposure caused cancer because the study authors, who conducted the research, were not willing to endorse a conclusion of causation).

Although the PCB study authors were well justified in their respective papers in refraining from over-interpreting their data and analyses, this consideration is of doubtful general value in evaluating the reliability of an expert witness’s proposed testimony.  First, as some plaintiffs’ counsel have argued, the testifying expert witness may be relying upon a more extensive and supportive evidentiary display than considered by the study authors.  The study, standing alone, might not support causation, but when considered with other evidence, the study could take on some importance in supporting a causal conclusion.  (This consideration would surely not save the sadly deficient opinions challenged in Joiner.) Second, as I have pointed out above, the Discussion sections of published papers are of little value.  They are almost never comprehensive reviews of the subject matter, and they are often little more than the personal opinions of the study authors.  Peer reviewers may call for some acknowledgments of the weaknesses of the study, but the authors are generally allowed to press their speculations unchecked.

The use of a paper’s Discussion section to measure the reliability of proffered expert testimony runs contrary to how scientists generally read and interpret papers.  Chief Justice Rehnquist’s emphasis upon the study authors’ Discussion of their own studies ignores the first important principle of interpreting medical studies, in an evidence-based world view:  In critically reading and evaluating a study, one should ignore anything in the paper other than the Methods and Results sections.

Joiner’s misplaced emphasis upon study authors’ Discussion sections has gained a foothold in the case law interpreting Rule 702.  In Huss v. Gayden, 571 F.3d 442  (5th Cir. 2009), for example, the Court declared:

“It is axiomatic that causation testimony is inadmissible if an expert relies upon studies or publications, the authors of which were themselves unwilling to conclude that causation had been proven.”

Id. (citing Vargas v. Lee, 317 F.3d 498, 501-01 (5th Cir. 2003) (noting that studies that did not themselves embrace causal conclusions undermined the reliability of the plaintiffs’ expert witness’s testimony that trauma caused fibromyalgia), and McClain v. Metabolife Int’l, Inc., 401 F.3d 1233, 1247-48 (11th Cir. 2005) (expert witnesses’ reliance upon studies that did not reach causal conclusions about ephedrine supported the challenge to the reliability of their proffered opinions)).

The reference to what authors of relied upon papers state, in Joiner, perpetuates an authority-based view of science to the detriment of requiring good and sufficient reasons to support the testifying expert witnesses’ opinions.  The problem with Joiner’s suggestion that expert witness opinion should not be admissible if it disagrees with the study authors’ Discussion section is that sometimes study authors grossly over-interpret their data.  When it comes to scientific studies written by “political scientists” (scientists who see their work as advancing a political cause or agenda), then the Discussion section often becomes a fertile source of unreliable, speculative opinions that should not be given credence in Rule 104(a) contexts, and certainly should not be admissible in trials.

Perhaps the Discussion section, in the context of a Rule 104(a) proceeding, has some role in evaluating the challenged expert witness’s opinion, but surely it is a weak factor at best.  And clearly, the disagreement with the study authors’ conclusions or opinions, as reflected by speculative Discussion sections, can cut both ways.  Study authors may downplay their findings, appropriately or inappropriately, but study authors often overplay their findings and distort or misinterpret how their findings fit into the full picture of other studies and other evidence.  The quality of peer-reviewed publications is simply too irregular and unpredictable to make the subjective, evaluative comments in hearsay papers the touchstone for admissibility or inadmissibility.

There have been, and will continue to be, occasions in which published studies contain data, relevant and important to the causation issue, but which studies also contain speculative, personal opinions expressed in the Introduction and Discussion sections.  The parties’ expert witnesses may disagree with those opinions, but such disagreements hardly reflect poorly upon the testifying witnesses.  Neither side’s expert witnesses should be judged by those out-of-court opinions.  Perhaps the hearsay Discussion section may be considered under Rule 104(a), which suspends the application of the Rules of Evidence, but it should hardly be an important or dispositive factor, other than raising questions for the reviewing court.

Expert witnesses should not be constrained or excluded for relying upon study data, when they disagree with the hearsay authors’ conclusions or discussions.  Given how many journals cater to advocacy scientists, and how variable the quality of peer review is, testifying expert witnesses should be required to have the expertise to interpret the data without substantial reliance upon, or reference to, the interpretative comments in the published literature.

VIII.  The Relationship Between Rules 703 and 705

Rule 705 simply provides:

The expert may testify in terms of opinion or inference and give reasons therefor without first testifying to the underlying facts or data, unless the court requires otherwise. The expert may in any event be required to disclose the underlying facts or data on cross-examination.

Rule 705, despite its brevity and apparent simplicity, encourages radical changes in the presentation of expert witness testimony in the courtroom.  Rule 705 permits expert witnesses to give opinions in the most conclusory terms.  Combined with Rule 703’s removal of admissibility as a requirement for materials “reasonably relied upon” by the expert witness, Rule 705 achieves the collapsing of difficult, technical issues into sound bites for juries and judges who increasingly suffer from inability to give sustained attention to such matters.  Under the banner of “convenience” and “economy,” these Rules operate to shift the burden to the cross-examiner to elicit the bases of an expert’s opinion as well as to then engage the expert witness on the reasonableness of his reliance, his methodology, and his application of method to the facts of the case, admissible or not.

The upshot of these changes is that the direct examination of an expert witness can often be very short, and it can be filled up with details of the expert witness’s qualifications and thinly veiled attempts to accredit the witness, even in advance of any attack on credibility.  The expert can then state his opinion as a conclusion, without any of the “messy” research facts or data, or other details.  The cross-examiner is left to dig through the bases, with judge and jury looking impatiently at the clock.  This imbalance creates practical and equitable hardships in how the Federal Rules allocate responsibility for developing factual bases for expert witness opinion between presenting and opposing counsel.  When courts impose time limits in trial of complex matters, the inequity created by the modern Rule 703 is compounded.[xv] Rule 705 gives trial courts discretion to require disclosure of bases.  In the proper case, counsel must be vigilant in moving to require this disclosure before the expert witness delivers his opinion.

Conclusion

Although Rule 703 successfully addresses some evidentiary problems in presenting expert witness opinion testimony, serious problems remain.  The Rule continues to permit expert witnesses to serve as conduits for inadmissible evidence, including opinion evidence that may escape the gatekeeping of Rule 702.  As legal scholars have pointed out, the Rule raises basic issues of fundamental fairness and constitutionality in both civil and criminal proceedings.[xvi] It is time for the Advisory Committee to go beyond restyling the Rule, and to reconsider its substance.

{This post is a revision of my article in 7 Proof 3 (Spring 2009).   An earlier version of that article was presented as part of an ALI-ABA Course of Study, “Opinion and Expert Testimony in Federal and State Courts,” on February 15, 2008, in San Diego, California.}


[i] Although elsewhere in the Federal Rules, the Advisory Committee disparaged limiting instructions, commentators and some courts engaged in the “judicial deception” of instructing the jury to accept the inadmissible basis as part of the explanation for the expert witness’s opinion, but not to accept or consider the basis for its truth.  See United States v. Grunewald, 233 F.2d 556, 574 (2d Cir. 1956)(“judicial deception”)(Frank, J.); Nash v. United States, 54 F.2d 1006, 1007 (2d Cir. 1932)(“mental gymnastic”)(Hand, J.).  Not only would such limiting instructions aggravate the problem by giving emphasis to the inadmissible evidence, the instructions surely would confuse most reasonable people who are trying to understand whether an expert witness has applied a reliable method to correctly ascertained “facts or data.”  The bases of an expert witness’s opinion are irrelevant if they do not have some evidential support.

[ii] Peteet v. Dow Chemical Co., 868 F.2d 1428, 1432 (8th Cir. 1989)(“[T]he trial court should defer to the expert’s opinion of what they find reasonably reliable.”); United States v. Sims, 514 F.2d 147 (9th Cir. 1975)(Rule 703 enacted, but not yet in effect)(affirming trial court’s allowing government’s psychologist to rely upon I.R.S. agent’s statement that defendant had previous “legal difficulties” to counter defendant’s claim of recent insanity against tax enforcement).

[iii] International Adhesive Coating Co. v. Bolton Emerson International, Inc., 851 F.2d 540, 544-45 (1st Cir. 1988)(equating reasonableness with “normal practice”).

[iv] United States v. Locascio, 6 F.3d 924, 938 (2d Cir. 1993).  The Third Circuit, which had adopted an extremely laissez-faire approach to expert witness testimony, signaled its compliance with the Supreme Court’s decision in Daubert, in In re Paoli Railroad Yard PCB Litigation:

We now make clear that it is the judge who makes the determination of reasonable reliance, and that for the judge to make the factual determination under Rule 104(a) that an expert is basing his or her opinion on a type of data reasonably relied upon by experts, the judge must conduct an independent evaluation into reasonableness.  The judge can of course take into account the particular expert’s opinion that experts reasonably rely on that type of data, as well as the opinions of other experts as to its reliability, but the judge can also take into account other factors he or she deems relevant.

35 F.3d 717, 748 (3d Cir. 1994)(emphasis in original).

[v] In re Agent Orange Product Liability Lit., 611 F. Supp. 1223 (E.D.N.Y. 1985), aff’d on other grounds, 818 F.2d 187 (2d Cir. 1987), cert. denied, 487 U.S. 1234 (1988).

[vi] 611 F. Supp. at 1246.  But see Fed. R. Evid. 803(4).

[vii] See, e.g., Soldo v. Sandoz Pharm. Corp., 244 F.Supp. 2d 434, 572 (W.D.Pa. 2003)(barring expert witness opinion testimony, under Rules 702 and 703).

[viii] See, e.g., Ealy v. Richardson-Merrell, Inc., 897 F.2d 1159, 1161-62 (D.C. Cir. 1990)(affirming exclusion of an expert whose opinion lacked scientific foundation, and ignored extensive contrary, published data); Lima v. United States, 708 F.2d 502, 508 (10th Cir. 1983)(affirming exclusion of epidemiologist who relied upon data not reasonably relied upon by experts in the fields of epidemiology and neurology).

[ix] See, e.g., Opinion, N.J. Super. Ct., Middlesex Cty., Docket L-5532-01-MT (denying motion to preclude expert witnesses from relying, in part, on unpublished study)(Garruto, J.).  This issue was anticipated in one of the leading cases on expert witness opinion testimony.  In re Paoli, 35 F.3d 717, 749 n. 19 (3d Cir. 1994)(pointing out that Rules 702 and 703 were not redundant, and that reliable opinions might be partially based upon unreliable data).

[x] See Beech Aircraft v. Rainey, 488 U.S. 153, 168 (1988)(“The distinction between statements of facts and opinion is, at best, one of degree.”).

[xi] American Key Corp. v. Cole Nat’l Corp., 762 F.2d 1569, 1580 (11th Cir. 1985)(“Expert opinions ordinarily cannot be based upon the opinions of others whether those opinions are in evidence or not.”); see also TK-7 Corp. v. Estate of Barbouti, 993 F.2d 722, 732 (10th Cir. 1993)(affirming exclusion of expert testimony under Rule 703 “where the expert failed to demonstrate any basis for concluding that another individual’s opinion on a subjective financial prediction was reliable, other than the fact that it was the opinion of someone he believed to be an expert who had a financial interest in making an accurate prediction”).

[xii] See, e.g., Lewis v. Rego Co., 757 F.2d 66, 73-74 (3d Cir. 1985)(holding that trial court had erred in excluding a testifying expert witness’s recounting of, and reliance upon, an out-of-court conversation with a non-testifying expert).  See also Barris v. Bob’s Drag Chutes & Safety Equipment, 685 F.2d 94, 102 n.10 (3d Cir. 1982)(“Under Rule 703, an expert’s testimony may be formulated by the use of the facts, data and conclusions of other experts.”); Seese v. Volkswagenwerk A.G., 648 F.2d 833, 845 (affirming admissibility, under Rule 703, of accident-reconstruction expert, whose opinion was based upon facts, data, and conclusions of a physician).

[xiii] Advisory Committee Note to Rule 703 (emphasis added).

[xiv] Fed. R. Evid. 803(6).

[xv] Evidentiary rules in state courts, even those states that have adopted Rule 703, vary considerably in how disclosure is required or allowed.  Pennsylvania, for instance, has adopted its Rule 703 verbatim from the Federal Rules, but it handles disclosure very differently under its version of Rule 705:

The expert may testify in terms of opinion or inference and give reasons therefor; however the expert must testify as to the facts or data on which the opinion or inference is based.

Pa. R. Evid. 705 (emphasis added).  See, e.g., Hansen v. Wyeth, Inc., 72 Pa. D. & C. 4th 225, 2005 WL 1114512, at *13, *19 (Phila. Ct. Common Pleas 2005)(Bernstein, J.)(granting new trial to verdict loser as result of expert witness’s failure or inability to provide all bases for his opinion).

[xvi] See, e.g., Seaman, “Triangulating Testimonial Hearsay:  The Constitutional Boundaries of Expert Opinion Testimony,” 96 Georgetown L.J. 827 (2008).

The New Wigmore on Learned Treatises

September 12th, 2011

I am indebted to Professor David Bernstein for calling my attention to the treatment of learned treatises in the new edition of his treatise on expert evidence:  David H. Kaye, David E. Bernstein, and Jennifer L. Mnookin, The New Wigmore:  A Treatise on Evidence – Expert Evidence (2d ed. 2011).  Professor Bernstein suggested that I might find the treatment of learned treatises consistent with some of my concerns about the outdated rationale for allowing such works to be admissible for their truth.  See “Unlearning The Learned Treatise Exception,” and “Further Unraveling of the Learned Treatise Exception.”

Having used the first edition of the New Wigmore, I purchased a copy of the second edition of the volume on expert evidence.  The second edition appears to be a valuable addition to the scholarly literature on expert witness opinion evidence, and I recommend it strongly to students and practitioners who wrestle with expert witness issues.

Chapter 5, a treatment of “Treatises and Other Learned Writings,” is a good descriptive account of the historical development of the common law hearsay exception and its modification by various statutes and codes.  Unlike many discussions of the learned treatise exception, The New Wigmore delves into the overlap between 803(18), which specifies “reliable authority,” and the reliability factors set out in the most recent version of Rule 702.  Although the case law on the relationship between the two rules is sparse and inconsistent, the authors make a strong case for a reliability criterion for learned treatises when such treatises are offered for the truth of the matters asserted.

The New Wigmore acknowledges that many courts and scholars have assumed that juries and most normal people have a difficult time following a limiting instruction to consider a learned treatise for assessing credibility but not for the truth. Refreshingly, the New Wigmore rejects the notion that difficulty in following a limiting instruction (if real) equates to meaninglessness for the distinction.  In the context of Rule 702 or 703 motions to exclude, and accompanying motions for summary judgment, the issue whether a learned treatise statement is admissible for its truth may be outcome determinative of the motions.

The sad truth, touched on but not directly confronted by the New Wigmore, is that so much of the biomedical literature is carelessly written, with only cursory “peer review.”  See “Misplaced Reliance On Peer Review to Separate Valid Science From Nonsense” (Aug. 14, 2011). Professor Wigmore was impressed by the desire of treatise authors to offer trustworthy opinions to avoid ridicule by their peers; in our era, scientists are not so impressed by publication as a guarantor of trustworthiness.  See, e.g., Douglas G. Altman, “Poor Quality Medical Research:  What Can Journals Do?” 287 J. Am. Med. Ass’n 2765 (2002).  There is a good deal of rubbish out there in the published literature, and most courts have not considered how to stem the flood of this rubbish into the courtroom through the 803(18) loophole.

There are yet other problems with Rule 803(18) discussed in New Wigmore.  The language of the rule is ambiguous. Does the requirement of “reliable authority” apply to the author, the text or journal, or the statement itself?  If the author or the publication, then there really is no assurance that the work satisfies reliability in the way required by 702.  If the status of the text, the journal, or the author is the sole criterion under 803(18), then we have a Ferebee-like rule that countenances the opinion of any willing, available, qualified author.  And the bar to publication these days is probably lower than the bar to being selected as a suitable testifying expert witness.

Authority is not a concept that is much at home in scientific discourse.  Nulla in verba, and all that.  If a statement in a publication is truly “authoritative,” it is because it is well supported by the facts and data on which it is based.

The New Wigmore goes beyond the coincidence of the word “reliable” in Rules 702 and 803(18), and argues that the logic of using a hearsay “learned treatise” for the truth of the matter asserted requires that the statement itself is reliably based. Here is how the second edition states its case for importing the requirements of Rule 702 into Rule 803(18):

“It would be not so difficult to conclude that assertions in a treatise that are not ‘the product of reliable principles or methods’ under Rule 702(2), for example, also are not ‘a reliable authority’ under Rule 803(18).”

Id. at 228, § 5.4.2. The triple negative may obscure the gist of the authors’ meaning, but I think their point is clear.  Let me attempt to restate their point without the negatives:

It is easy to conclude that treatise opinions that fail 702 would fail to qualify for the 803(18) exception.

Of course, if a treatise statement satisfies 702, then that statement would not necessarily qualify for the 803(18) exception.  The learned treatise also has a “recognition” requirement; one of the testifying expert witnesses must recognize the treatise as “authoritative,” “learned,” or whatnot, or the court must take judicial notice of its status.  The treatise could have the most detailed discussion and documentation of its opinions, with flawless reasoning and evidential assessment, but if it were just translated from Georgian, and unknown to the expert witnesses and the court, it would not qualify as a learned treatise.  More than epistemic reliability seems to be required in terms of the status of the publication: the renown of the author and/or text. The status of the publication creates a normative obligation upon the expert witnesses to be aware of its pronouncements and to reconcile or to incorporate the publication’s statements into their courtroom opinions.

The New Wigmore’s rejection of “authoritarianism” for Rule 803(18) is commendable, but difficult to achieve in practice.  Rule 702 has evolved into an important tool to ensure that opinions offered in court are “evidence based,” rather than predicated solely on the professional status of their authors.  Along with the epistemic requirements of Rule 702, the procedural requirements of Federal Rule of Civil Procedure 26 ensure that the opinion’s author has stated all opinions, and all bases, as well as everything considered along the way in forming the opinion.  The reality is that most textbooks and treatises have short, conclusory consideration of issues that are likely important to the resolution of a lawsuit.  Frequently, a textbook cites a few studies that support the author’s opinion, without a sustained discussion of conflicting evidence, study validity, and the like.  An opinion that might be the subject of a 50 page Rule 26-compliant report may be reduced to a sentence or two in a textbook, which was published several years before the close of discovery in the case.  These are hardly propitious conditions for a truly learned treatise, and a 702-sufficient opinion.

Perhaps more promising is the development of the “systematic review,” which sets out to provide an evidence-based foundation for causal claims. See, e.g., Michael B. Bracken, “Commentary: Toward systematic reviews in epidemiology,” 30 Internat’l J. Epidem. 954 (2001).  Such reviews identify a research question, pre-specify the methodological approach to varying study designs and validity questions, search for all the data available that can contribute to answering the question, and provide a disciplined attempt to answer the research question.  Systematic reviews come very close to satisfying the needs of the courtroom, and the requirements of both Rules 702 and 803(18).  The trouble is, of course, that most traditional textbooks and narrative reviews, and “learned treatises,” are far off course from the epistemic path taken by systematic reviews.

The New Wigmore also raises the interesting question whether individual published studies are “learned treatises.” If they were, then an expert witness could rely upon them, per Rule 703, and the sponsoring party could actually offer them into evidence (or at least as an exhibit, with some right to show the jury their results).  An individual study, however, would seem to fall way short of the mark of the comprehensiveness required for a Rule 702 opinion, at least in the situation where there were other studies.

An irreducible problem in this area is that Rule 702 separates the “authority” of the speaker, in the form of qualifications to give an expert opinion, from the “reliability” of the opinion itself.  This separation, when followed, has been a huge achievement for the improvement of science in the courtroom.  Qualifications are a rather minimal necessary requirement, and even at best are a weak proxy for the reliability of the opinion given in court.  Many key 702 decisions involved expert witnesses with substantial, impressive qualifications. Despite these qualifications, courts excluded the witnesses’ proffered opinions because they were inadequately or unreliably supported.  Reliability under Rule 702 is thus an “evidence-based” requirement. The New Wigmore authors are correct that it is time to abandon “authority” as the guarantor of reliability in favor of “evidence-based principles.”

Milward — Unhinging the Courthouse Door to Dubious Scientific Evidence

September 2nd, 2011

It has been an interesting year in the world of expert witnesses.  We have seen David Egilman attempt a personal appeal of a district court’s order excluding him as an expert.  Stephen Ziliak has prattled on about how he steered the Supreme Court from the brink of disaster by helping them to avoid the horrors of statistical significance.  And then we had a philosophy professor turned expert witness, Carl Cranor, publicly touting an appellate court’s decision that held his testimony admissible.  Cranor, under the banner of the Center for Progressive Reform (CPR), hails the First Circuit’s opinion as the greatest thing since Sir Isaac Newton.   Carl Cranor, “Milward v. Acuity Specialty Products: How the First Circuit Opened Courthouse Doors for Wronged Parties to Present Wider Range of Scientific Evidence” (July 25, 2011).

Philosophy Professor Carl Cranor has been trying for decades to dilute the scientific approach to causal conclusions to permit the precautionary principle to find its way into toxic tort cases.  Cranor, along with others, has also criticized federal court expert witness gatekeeping for deconstructing individual studies, showing that the individual studies are weak, and ignoring the overall pattern of evidence from different disciplines.  This criticism has some theoretical merit, but the criticism is typically advanced as an excuse for “manufacturing certainty” from weak, inconsistent, and incoherent scientific evidence.  The criticism also ignores the actual text of the relevant rule – Rule 702, which does not limit the gatekeeping court to assessing individual “pieces” of evidence.  The scientific community acknowledges that there are times when a weaker epidemiologic dataset may be supplemented by strong experimental evidence that leads appropriately to a conclusion of causation.  See, e.g., Hans-Olov Adami, Sir Colin L. Berry, Charles B. Breckenridge, Lewis L. Smith, James A. Swenberg, Dimitrios Trichopoulos, Noel S. Weiss, and Timothy P. Pastoor, “Toxicology and Epidemiology: Improving the Science with a Framework for Combining Toxicological and Epidemiological Evidence to Establish Causal Inference,” 122 Toxicological Sci. 223 (2011) (noting the lack of a systematic, transparent way to integrate toxicologic and epidemiologic data to support conclusions of causality; proposing a “grid” to permit disparate lines of evidence to be integrated into more straightforward conclusions).

For the most part, Cranor’s publications have been ignored in the Rule 702 gatekeeping process.  Perhaps that is why he shrugged off his academic regalia and took on the mantle of the expert witness, in Milward v. Acuity Specialty Products, a case involving a claim that benzene exposure caused plaintiff’s acute promyelocytic leukemia (APL), one of several types of acute myeloid leukemia.  Milward v. Acuity Specialty Products Group, Inc., 664 F.Supp. 2d 137 (D.Mass. 2009) (O’Toole, J.).

Philosophy might seem like the wrong discipline to help a court or a jury decide general and specific causation of a rare cancer, with an incidence of less than 8 cases per million per year.  (A PubMed search on leukemia and Cranor yielded no hits.)  Cranor supplemented the other, more traditional testimony from a toxicologist, by attempting to show that the toxicologist’s testimony was based upon sound scientific method.  Cranor was particularly intent to show that the toxicologist, Dr. Martyn Smith, had used sound method to reach a scientific conclusion, even though he lacked strong epidemiologic studies to support his opinion.

The district court excluded Cranor’s testimony, along with plaintiff’s scientific expert witnesses.  The Court of Appeals, however, reversed, and remanded with instructions that plaintiff’s scientific expert witnesses’ opinions were admissible.  639 F.3d 11 (1st Cir. 2011).  Hence Cranor’s and the CPR’s hyperbole about the opening of the courthouse doors.

The district court was appropriately skeptical about plaintiff’s expert witnesses’ reliance upon epidemiologic studies, the results of which were not statistically significant.  Before reaching the issue of statistical significance, however, the district court found that Dr. Smith had relied upon studies that did not properly support his opinion.  664 F.Supp. 2d at 148.  The defense presented Dr. David Garabrant, an expert witness with substantial qualifications and accomplishments in epidemiologic science.  Dr. Garabrant persuaded the Court that Dr. Smith had relied upon some studies that tended to show no association, and others that presented faulty statistical analyses.  Other studies, relied upon by Dr. Smith, presented data on AML, but Dr. Smith speculated that these AML cases could have been APL cases.  Id.

None of the studies relied upon by plaintiff’s Dr. Smith had a statistically significant result for APL.  Id. at 144. The district court pointed out that scientists typically take care to rely only upon data that show “statistical significance,” and Dr. Smith (plaintiff’s expert witness) deviated from sound scientific method in attempting to support his conclusion with studies that had not ruled out chance as an explanation for their increased risk ratios.  Id.  The district court did not summarize the studies’ results, and so the unsoundness of plaintiff’s method is difficult to evaluate.  Rather than engaging in hand waving and speculating about “trends” and suggestions, those witnesses could have performed a meta-analysis to increase the statistical precision of a summary point estimate beyond what was achieved in any single, small study.  Neither the plaintiff nor the district court addressed the issue of aggregating study results to address the role of chance in producing the observed results.
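To illustrate the point about aggregation, here is a minimal sketch of a fixed-effect, inverse-variance meta-analysis of risk ratios. The three study results are invented for illustration; they are not the studies at issue in Milward, which the district court did not summarize. The function name and the choice of a fixed-effect model are likewise my assumptions, and a real analysis would also have to address heterogeneity and study quality.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

# HYPOTHETICAL results: (risk ratio, lower 95% bound, upper 95% bound).
# These numbers are made up; each interval includes 1.0 on its own.
HYPOTHETICAL_STUDIES = [
    (1.8, 0.7, 4.6),
    (1.5, 0.6, 3.8),
    (2.1, 0.8, 5.5),
]

def pooled_risk_ratio(studies):
    """Fixed-effect inverse-variance pooling on the log scale.

    Returns (pooled RR, lower 95% bound, upper 95% bound).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for rr, low, high in studies:
        # Recover the standard error of log(RR) from the reported interval.
        se = (math.log(high) - math.log(low)) / (2 * Z_95)
        weight = 1.0 / se ** 2
        weighted_sum += weight * math.log(rr)
        total_weight += weight
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - Z_95 * pooled_se),
        math.exp(pooled_log + Z_95 * pooled_se),
    )

rr, low, high = pooled_risk_ratio(HYPOTHETICAL_STUDIES)
print(f"pooled RR = {rr:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

Under this model the pooled interval is necessarily narrower than the interval from any single contributing study, which is the gain in precision that small, individually inconclusive studies cannot supply alone.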

The inability to show a statistically significant result was not surprising given how rare the APL subtype of AML is.  Sample size might legitimately interfere with the ability of epidemiologic studies to detect a statistically significant association that really existed.  If this were truly the case, the lack of a statistically significant association could not be interpreted to mean the absence of an association without potentially committing a type II error. In any event, the district court in Milward was willing to credit the plaintiffs’ claim that epidemiologic evidence may not always be essential for establishing causality.  If causality does exist, however, epidemiologic studies are usually required to confirm the existence of the causal relationship.  Id. at 148.
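The sample-size problem can be made concrete with a rough power calculation. The sketch below takes the incidence figure mentioned above (8 cases per million per year, used here as an upper bound) and applies it to a cohort whose size, follow-up, and true risk ratio are all hypothetical. It also assumes, for simplicity, a one-sided exact Poisson test of the cohort's case count against an external reference rate.

```python
import math

def poisson_upper_tail(k, mean):
    """P(X >= k) for X ~ Poisson(mean)."""
    below = sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k))
    return 1.0 - below

def critical_count_and_power(expected_null, true_rr, alpha=0.05):
    """Smallest case count that is 'significant' under the null, and the
    probability of reaching that count if the true risk ratio is true_rr."""
    k = 0
    while poisson_upper_tail(k, expected_null) > alpha:
        k += 1
    return k, poisson_upper_tail(k, expected_null * true_rr)

BASELINE_RATE = 8e-6   # cases per person-year (upper bound from the text)
WORKERS = 50_000       # hypothetical cohort size
YEARS = 10             # hypothetical follow-up
TRUE_RR = 2.0          # hypothetical true doubling of risk

expected = BASELINE_RATE * WORKERS * YEARS
critical, power = critical_count_and_power(expected, TRUE_RR)
print(f"expected background cases: {expected:.1f}")
print(f"cases needed for p <= 0.05: {critical}")
print(f"power to detect RR = {TRUE_RR}: {power:.2f}")
```

With these assumed inputs, half a million person-years yields only about four expected background cases, and the chance of detecting a true doubling of risk is well under one half. A non-significant result from a study of that size says little about the absence of an association.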

The district court also took a close look at Smith’s mechanistic biological evidence, and found it equally speculative.  Although plausibility is a desirable feature of a causal hypothesis, it only sets the stage for actual data:

“Dr. Smith’s opinion is that ‘[s]ince benzene is clastogenic and has the capability of breaking and rearranging chromosomes, it is biologically plausible for benzene to cause’ the t(15;17) translocation. (Smith Decl. ¶ 28.b.) This is a kind of ‘bull in the china shop’ generalization: since the bull smashes the teacups, it must also smash the crystal. Whether that is so, of course, would depend on the bull having equal access to both teacups and crystal.”

Id. at 146.

“Since general extrapolation is not justified and since there is no direct observational evidence that benzene causes the t(15;17) translocation, Dr. Smith’s opinion — that because benzene is an agent that can cause some chromosomal mutations, it is ‘plausible’ that it causes the one critical to APL—is simply an hypothesis, not a reliable scientific conclusion.”

Id. at 147.

Judge O’Toole’s opinion is a careful, detailed consideration of the facts and data upon which Dr. Smith relied, but the First Circuit found an abuse of discretion, and reversed. 639 F.3d 11 (1st Cir. 2011).

The Circuit incorrectly suggested that Smith’s opinion was based upon a “weight of the evidence” methodology described by “the world-renowned epidemiologist Sir Arthur Bradford Hill in his seminal methodological article on inferences of causality. See Arthur Bradford Hill, The Environment and Disease: Association or Causation?, 58 Proc. Royal Soc’y Med. 295 (1965).” Id. at 17.  This suggestion is remarkable because everyone knows that it was Arthur’s much smarter brother, Austin, who wrote the seminal article and gave the Bradford Hill name to the famous presidential address published by the Royal Society of Medicine.  Arthur Bradford Hill was not even a knight if he existed at all.

The Circuit’s suggestion is also remarkable for confusing a vague “weight of the evidence” methodology with the statistical and epidemiologic approach of one of the 20th century’s great methodologists.  Sir Austin is known for having conducted the first double-blinded randomized clinical trial, as well as having shown, with fellow knight Sir Richard Doll, the causal relationship between smoking and lung cancer.  Sir Austin wrote one of the first texts on medical statistics, Principles of Medical Statistics (London 1937).  Sir Austin no doubt was turning in his grave when he was associated with Cranor’s loosey-goosey “weight of the evidence” methodology.  See, e.g., Douglas L. Weed, “Weight of Evidence: A Review of Concept and Methods,” 25 Risk Analysis 1545 (2005) (noting the vague, ambiguous, indefinite nature of the concept of “weight of evidence” review).

The Circuit adopted a dismissive attitude towards epidemiology in general, citing to an opinion piece by several cancer tumor biologists, whom the court described as a group from the National Cancer Institute (NCI).  The group was actually a workshop sponsored by the NCI, with participants from many institutions.  Id. at 17 (citing Michele Carbon[e] et al., “Modern Criteria to Establish Human Cancer Etiology,” 64 Cancer Res. 5518, 5522 (2004)).  The cited article did report some suggestions for modifying Bradford Hill’s criteria in the light of modern molecular biology, as well as a sense of the group that there was no “hierarchy” in which epidemiology was at the top.  (The group definitely did not address the established concept that some types of epidemiologic studies are analytically more powerful to support inferences of causality than others — the hierarchy of epidemiologic evidence.)

The Circuit then proceeded to evaluate Dr. Smith’s consideration of the available epidemiologic studies.  The Circuit mistakenly defined an “odds ratio” as “the difference in the incidence of a disease between a population that has been exposed to benzene and one that has not.”  Id. at 24. Having failed to engage with the evidence sufficiently to learn what an odds ratio was, the Circuit Court then proceeded to state that the difference between Dr. Garabrant and Dr. Smith, as to how to calculate the odds ratio in some of the studies, was a mere difference in opinion between experts, and Dr. Garabrant’s criticisms of Dr. Smith’s approach went to the weight, not the admissibility, of the evidence.  These sparse words are, of course, a legal conclusion, not an explanation, and the Circuit leaves us without any real understanding of how Dr. Smith may have gone astray, but still have been advancing a legitimate opinion within epidemiology, which was not his discipline.  Id. at 22. If Dr. Smith’s idea of an odds ratio was as incorrect as the Circuit’s, his calculation may have had no validity whatsoever, and thus his opinions derived from his flawed ideas may have clearly failed the requirements of Rule 702.  The Circuit’s opinion is not terribly helpful in understanding anything other than its summary rejection of the district court’s more detailed analysis.
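For readers who want to see why the Circuit's definition is wrong, the following sketch computes three different measures from the same two-by-two table. The counts are invented for illustration, and the function and variable names are mine. What the Circuit described is closer to a risk difference; an odds ratio is a ratio of odds, not a difference in incidence.

```python
def two_by_two_measures(exposed_cases, exposed_noncases,
                        unexposed_cases, unexposed_noncases):
    """Return risk difference, risk ratio, and odds ratio for a 2x2 table."""
    risk_exposed = exposed_cases / (exposed_cases + exposed_noncases)
    risk_unexposed = unexposed_cases / (unexposed_cases + unexposed_noncases)
    return {
        # Difference in incidence: what the Circuit's wording describes.
        "risk_difference": risk_exposed - risk_unexposed,
        # Ratio of incidences.
        "risk_ratio": risk_exposed / risk_unexposed,
        # Ratio of the odds of disease, via the cross-product of the table.
        "odds_ratio": (exposed_cases * unexposed_noncases)
                      / (exposed_noncases * unexposed_cases),
    }

# HYPOTHETICAL counts: 30 cases among 1,000 exposed, 10 among 1,000 unexposed.
measures = two_by_two_measures(30, 970, 10, 990)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

The three numbers differ in both scale and meaning. In a case-control study, moreover, incidence cannot be computed at all from the study data, which is why the odds ratio is the measure reported there.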

The Circuit also advanced the “impossibility” defense for Dr. Smith’s failure to rely upon epidemiologic studies with statistically significant results.  Id. at 24. As noted above, such studies fail to rule out chance for their finding of risk ratios above or below 1.0 (the measure of no association).  Because the likelihood of obtaining a risk ratio of exactly 1.0 is vanishingly small, epidemiologic science must and does consider the role of chance in explaining data that diverges from a measure of no association.  Dr. Smith’s hand waving about the large size of the studies needed to show an increased risk may have some validity in the context of benzene exposure and APL, but it does not explain or justify the failure to use aggregative techniques such as meta-analysis.  The hand waving also does nothing to rule out the role of chance in producing the results he relied upon in court.

The Circuit Court appeared to misunderstand the very nature of the need for statistical evaluation of stochastic biological events, such as APL incidence in a population.  According to the Circuit, Dr. Smith’s reliance upon epidemiologic data was merely

“meant to challenge the theory that benzene exposure could not cause APL, and to highlight that the limited data available was consistent with the conclusions that he had reached on the basis of other bodies of evidence. He stated that ‘[i]f epidemiologic studies of benzene-exposed workers were devoid of workers who developed APL, one could hypothesize that benzene does not cause this particular subtype of AML.’ The fact that, on the  contrary, ‘APL is seen in studies of workers exposed to benzene where the subtypes of AML have been separately analyzed and has been found at higher levels than expected’ suggested to him that the limited epidemiological evidence was at the very least consistent with, and suggestive of, the conclusion that benzene can cause APL.

* * *

Dr. Smith did not infer causality from this suggestion alone, but rather from the accumulation of multiple scientifically acceptable inferences from different bodies of evidence.”

Id. at 25.

But challenging the theory that benzene exposure does not cause APL does not help show the validity of the studies relied upon, or the inferences drawn from them.  This was plaintiffs’ and Dr. Smith’s burden under Rule 702, and the Circuit seemed to lose sight of the law and the science with Professor Cranor’s and Dr. Smith’s sleight of hand.  As for the Circuit’s suggestion that scraps of evidence from different kinds of scientific studies can establish scientific knowledge, this approach was rejected by the great mathematician, physicist, and philosopher of science, Henri Poincaré:

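“[O]n fait la science avec des faits comme une maison avec des pierres; mais une accumulation de faits n’est pas plus une science qu’un tas de pierres n’est une maison.”  [“Science is built with facts as a house is built with stones; but an accumulation of facts is no more a science than a heap of stones is a house.”]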

Henri Poincaré, La Science et l’Hypothèse (1905) (chapter 9, Les Hypothèses en Physique).  Litigants, either plaintiff or defendant, should not be allowed to pick out isolated findings in a variety of studies, and throw them together as if that were science.

As unclear and dubious as the Circuit’s opinion is, the court did not throw out the last 18 years of Rule 702 law.  The Court distinguished the Milward case, with its sparse epidemiologic studies from those cases “in which the available epidemiological studies found that there is no causal link.”  Id. at 24 (citing Norris v. Baxter Healthcare Corp., 397 F.3d 878, 882 (10th Cir.2005), and Allen v. Pa. Eng’g Corp., 102 F.3d 194, 197 (5th Cir.1996)).  The Court, however, provided no insight into why the epidemiologic studies must rise to the level of showing no causal link before an expert can torture weak, inconsistent, and contradictory data to claim such a link.  This legal sleight of hand is simply a shifting of the burden of proof, which should have been on plaintiffs and Dr. Smith.  Desperation is not a substitute for adequate scientific evidence to support a scientific conclusion.

The Court’s failure to engage more directly with the actual data, facts, and inferences, however, is likely to cause mischief in federal cases around the country.

Ziliak Gives Legal Advice — Puts His Posterior On the Line

August 31st, 2011

I have posted before about the curious saga of two university professors of economics who tried to befriend the United States Supreme Court.  Professors Ziliak and McCloskey submitted an amicus brief to the Court, in connection with Matrixx Initiatives, Inc. v. Siracusano, ___ U.S. ___, 131 S.Ct. 1309 (2011).  Nothing unusual there, other than the Professors’ labeling themselves “Statistics Experts,” and then proceeding to commit a statistical howler of deriving a posterior probability from only a p-value.  See “The Matrixx Oversold” (April 4, 2011).

I seemed to be alone in my dismay over this situation, but recently Professor David Kaye, an author of the chapter on statistics in the Reference Manual on Scientific Evidence, weighed in with his rebuttal to Ziliak and McCloskey’s erroneous statistical contentions.  See “The Transposition Fallacy in Matrixx Initiatives, Inc. v. Siracusano: Part I” (August 19, 2011), and “The Transposition Fallacy in Matrixx Initiatives, Inc. v. Siracusano: Part II” (August 26, 2011).  Kaye’s analysis is well worth reading.
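The transposition fallacy is easy to demonstrate with Bayes' theorem. The sketch below is a simplification of my own, not a reconstruction of anyone's brief: it treats "a result significant at the 0.05 level" as the observed datum, which occurs with probability 0.05 if the null hypothesis is true, and with a probability equal to the study's power if it is false. The priors and power values are hypothetical.

```python
def posterior_probability_of_null(prior_null, p_data_given_null, p_data_given_alt):
    """Bayes' theorem for two exhaustive hypotheses (null and alternative)."""
    joint_null = prior_null * p_data_given_null
    joint_alt = (1.0 - prior_null) * p_data_given_alt
    return joint_null / (joint_null + joint_alt)

ALPHA = 0.05  # probability of a 'significant' result when the null is true

# HYPOTHETICAL (prior probability of the null, power) combinations.
scenarios = [(0.5, 0.5), (0.9, 0.5), (0.9, 0.1)]

posteriors = [
    posterior_probability_of_null(prior, ALPHA, power)
    for prior, power in scenarios
]
for (prior, power), post in zip(scenarios, posteriors):
    print(f"prior P(null) = {prior}, power = {power}: "
          f"posterior P(null | significant) = {post:.3f}")
```

The same 0.05 yields posterior probabilities for the null hypothesis that range from under 10 percent to over 80 percent, depending on inputs that a p-value does not contain. A posterior probability cannot be read off a p-value alone.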

Having attempted to bamboozle the Justices on statistics, Stephen Ziliak has now turned his attention to an audience of statisticians and students of statistical science, with a short article in Significance on the Court’s decision in Matrixx.  Stephen Ziliak, “Matrixx v. Siracusano and Student v. Fisher:  Statistical Significance on Trial,”  Significance 131 (September 2011).  Tellingly, Ziliak did not advance his novel, erroneous views of how to derive posterior odds or probabilities from p-values in the pages of a magazine published by the Royal Statistical Society.  Such gems were reserved for the audience of Justices and law clerks in Washington, D.C.  Instead of holding forth on statistical issues, Ziliak has used the pages of a statistical journal to advance equally bizarre, inexpert views about the legal meaning of a Supreme Court case.

The Matrixx decision involved the appeal from a dismissal of a complaint for failure to plead sufficient allegations in a securities fraud action.  No evidence was ever offered or refused; no expert witness opinion was held reliable or unreliable.  The defendant, Matrixx Initiatives, Inc., won the dismissal at the district court, only to have the complaint reinstated by the Court of Appeals for the Ninth Circuit.  The Supreme Court affirmed the reinstatement, and in doing so, did not, and could not, have created a holding about the sufficiency of evidence to show causation in a legal proceeding.  Indeed, Justice Sotomayor, in writing for a unanimous Court, specifically stated that causation was not at issue, especially given that evidentiary displays far below what is necessary to show causation between a medication and an adverse event might come to the attention of the FDA, which agency in turn might find the evidence sufficient to order a withdrawal of the medication.

Ziliak, having given dubious statistical advice to the U.S. Supreme Court, now sets himself up to give equally questionable legal advice to the statistical community.  He asserts that Matrixx claimed that anosmia (the loss of the sense of smell) was unimportant because not “statistically significant.”  Id. at 132.  Matrixx Initiatives no doubt made several errors, but it never made this erroneous claim.  Ziliak gives no citation to the parties’ briefs; nor could one be given.  Matrixx never contended that anosmia was unimportant; its claim was that the plaintiffs had not sufficiently alleged facts that Matrixx had knowledge of a causal relationship such that its failure to disclose adverse event reports became a “material” omission under the securities laws.  The word “unimportant” does not occur in Matrixx’s briefs; nor was it uttered at oral argument.

Ziliak’s suggestion that “[t]he district court dismissed the case on the basis that investors did not prove ‘materiality’, by which that court meant ‘statistical significance’,” is nonsense.  Id. at 132.  The issue was never the sufficiency of evidence.  Matrixx did attempt to equate materiality with causation, and then argued that allegations of causation required, in turn, allegations of statistical significance.  In arguing the necessity of statistical significance, Matrixx was implicitly suggesting that an evidentiary display that fell short of supporting causation could not be material, when withheld from investors.  The Supreme Court had an easy time of disposing of Matrixx’s argument because causation was never at issue.  Everything that the Court did say about causation is readily discernible as dictum.

Ziliak erroneously reads into the Court’s opinion a requirement that a pharmaceutical company, reporting to the Securities and Exchange Commission “can no longer hide adverse effect [sic] reports from investors on the basis that reports are not statistically significant.”   Id. at 133.  Ziliak incorrectly refers to adverse event reports as “adverse effect reports,” which is a petitio principii.  Furthermore, this was not the holding of the Court.  The potentially fraudulent aspect of Matrixx’s conduct was not that it had “hidden” adverse event reports, but rather that it had adverse event reports and a good deal of additional information, none of which it had disclosed to investors, when at the same time, the company chose to give the investment community particularly bullish projections of future sales.  The medication involved, Zicam, was an over-the-counter formulation that never had the rigorous testing required for a prescription medication’s new drug application.

Curiously, Ziliak, the self-described statistics expert, fails to point out that adverse event reports could not achieve, or fail to achieve, statistical significance on the basis of the facts alleged in the plaintiffs’ complaint.  Matrixx, and its legal counsel, might be forgiven this oversight, but surely Ziliak the statistical expert should have noted this.  Indeed, if the parties and the courts had recognized that there never was an issue of statistical significance involved in the case, the entire premiss of Matrixx’s appeal would have been taken away.

To be a little fair to Ziliak, the Supreme Court, having disclaimed any effort to require proof of causation or to define the requisites of reliable evidence of causation, went ahead and offered its own dubious dictum on how statistical significance might not be necessary for causation:

“Matrixx’s argument rests on the premise that statistical significance is the only reliable indication of causation. This premise is flawed: As the SEC points out, “medical researchers … consider multiple factors in assessing causation.” Brief for United States as Amicus Curiae 12. Statistically significant data are not always available. For example, when an adverse event is subtle or rare, “an inability to obtain a data set of appropriate quality or quantity may preclude a finding of statistical significance.” Id., at 15; see also Brief for Medical Researchers as Amici Curiae 11. Moreover, ethical considerations may prohibit researchers from conducting randomized clinical trials to confirm a suspected causal link for the purpose of obtaining statistically significant data. See id., at 10-11.

A lack of statistically significant data does not mean that medical experts have no reliable basis for inferring a causal link between a drug and adverse events. As Matrixx itself concedes, medical experts rely on other evidence to establish an inference of causation. See Brief for Petitioners 44-45, n. 22. We note that courts frequently permit expert testimony on causation based on evidence other than statistical significance. See, e.g., Best v. Lowe’s Home Centers, Inc., 563 F.3d 171, 178 (C.A.6 2009); Westberry v. Gislaved Gummi AB, 178 F.3d 257, 263-264 (C.A.4 1999) (citing cases); Wells v. Ortho Pharmaceutical Corp., 788 F.2d 741, 744-745 (C.A.11 1986). We need not consider whether the expert testimony was properly admitted in those cases, and we do not attempt to define here what constitutes reliable evidence of causation.”

What is problematic about this passage is that Justice Sotomayor was addressing situations that were not before the Court, and about which she had no appropriate briefing.  Her suggestion that randomized clinical trials are not always ethically appropriate is, of course, true, but that does not prevent an expert witness from relying upon observational epidemiologic studies – with statistically significant results – to support their causal claims.  Justice Sotomayor’s citation to the Best and the Westberry cases, again in dictum, is equally off the mark.  Both cases involve the application of differential etiological reasoning about specific causation, which presupposes that  general causation has been previously, sufficiently shown.  Finally, Justice Sotomayor’s citation to the Wells case, which involved both general and specific causation issues, was inapposite because plaintiff’s expert witness in Wells did rely upon at least one study with a statistically significant result.  As I have pointed out before, the Wells case went on to become an example of one trial judge’s abject failure to understand and evaluate scientific evidence.

Postscript:

The Supreme Court’s statistical acumen may have been lacking, but the Justices seemed to have a good sense of what was really going on in the case.  In December 2010, Matrixx settled over 2,000 Zicam injury claims. On February 24, 2011, a month before the Supreme Court decided the Matrixx case, the federal district judge responsible for the Zicam multi-district litigation refused Matrixx’s motion to exclude plaintiffs’ expert witnesses’ causation opinions.  “First Zicam Experts Admitted by MDL Judge for Causation, Labeling Opinions” 15 Mealey’s Daubert Reporter (February 2011); In re Zicam Cold Remedy Marketing, Sales Practices and Products Liab. Litig., MDL Docket No. 2:09-md-02096, Document 1360 (D. Ariz. 2011).

After the Supreme Court affirmed the reinstatement of the securities fraud complaint, Charles Hensley, the inventor of Zicam, was arrested on federal charges of illegally marketing another drug, Vira 38, which he claimed was therapeutic and preventive for bird flu.  Stuart Pfeifer, “Zicam inventor arrested, accused of illegal marketing of flu drug,” Los Angeles Times (June 2, 2011).  Earlier this month, Mr. Hensley pleaded guilty to the charges of unlawful distribution.

Confusion Over Causation in Texas

August 27th, 2011

As I have previously discussed, a risk ratio (RR) ≤ 2 is a strong practical argument against specific causation. See Courts and Commentators on Relative Risks to Infer Specific Causation; Relative Risks and Individual Causal Attribution; and Risk and Causation in the Law.   But a threshold of relative risk greater than 2 has little to do with general causation.  There are any number of well-established causal relationships, where the magnitude of the ex ante risk in an exposed population is > 1, but ≤ 2.  The magnitude of risk for cardiovascular disease and smoking is one such well-known example.
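The arithmetic behind the RR > 2 threshold for specific causation is short. As a sketch, assume the risk ratio is an unbiased estimate and that the excess risk applies uniformly to the exposed group (assumptions that are themselves often contested). The fraction of exposed cases attributable to the exposure, written here as AF (my notation), is then:

```latex
\[
\mathrm{AF} \;=\; \frac{RR - 1}{RR},
\qquad
\mathrm{AF} > \tfrac{1}{2}
\;\Longleftrightarrow\;
2\,(RR - 1) > RR
\;\Longleftrightarrow\;
RR > 2 .
\]
% Example: RR = 1.5 gives AF = 0.5 / 1.5 = 1/3, so only one exposed case
% in three would be attributable to the exposure.
```

On these assumptions, a risk ratio of 2 or less means that any given exposed case is no more likely than not to be attributable to the exposure, even though the exposure may well be a cause of the disease in the population.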

When assessing general causation from only observational epidemiologic studies, where residual confounding and bias may be lurking, it is prudent to require a RR > 2, as a measure of strength of the association that can help us rule out the role of systematic error.  As the cardiovascular disease/smoking example illustrates, however, there is clearly no scientific requirement that the RR be greater than 2 to establish general causation.  Much will depend upon the entire body of evidence.  If the other important Bradford Hill factors are present – dose-response, consistency, coherence, etc. – then risk ratios ≤ 2, from observational studies, may suffice to show general causation.  So the requirement of a RR > 2, for the showing of general causation, is a much weaker consideration than it is for specific causation.

Randomization and double blinding are major steps in controlling confounding and bias, but they are not complete guarantees that systematic bias has been eliminated.  A double-blinded, placebo-controlled, randomized clinical trial (RCT) will usually have less opportunity for bias and confounding to play a role.  Imposing a RR > 2 requirement for general causation thus makes less sense in the context of trying to infer general causation from the results of RCTs.

Somehow the Texas Supreme Court managed to confuse these concepts in an important decision this week, Merck & Co. v. Garza (August 26, 2011).

Mr. Garza had a long history of heart disease, at least two decades long, including a heart attack, and quadruple bypass and stent surgeries.  Garza’s physician prescribed 25 mg Vioxx for pain relief.  Garza died less than a month later, at the age of 71, of an acute myocardial infarction.  The plaintiffs (Mr. Garza’s survivors) were thus faced with a problem of showing the magnitude of the risk experienced by Mr. Garza, which risk would allow them to infer that his fatal heart attack was caused by his having taken Vioxx.  The studies relied upon by plaintiffs did show increased risk, consistently, for larger doses (50 mg.) taken over longer periods of time.  The trial court entered judgment upon a jury verdict in favor of the plaintiffs.

The Texas Supreme Court reversed, and rendered the judgment for Merck.  The Court’s judgment was based largely upon its view that the studies relied upon did not apply to the plaintiff.  Here the Court was on pretty solid ground.  The plaintiffs also argued that Mr. Garza had a higher pre-medication, baseline risk, and that he therefore would have sustained a greater increased risk from short-term, low-dose use of Vioxx.  The Court saw through this speculative argument, and cautioned that the “absence of evidence cannot substitute for evidence.” Slip op. at 17.  The greater baseline does not mean that the medication imposed a greater relative risk on people like Mr. Garza, although it would mean that we would expect to see more cases from any subgroup that looked like him.  The attributable fraction and the difficulty in using risk to infer individual attribution, however, would remain the same.
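The point about baseline risk can be checked with a few lines of arithmetic. In the sketch below, the baseline risks, the risk ratio, and the group size are all hypothetical, and the function name is mine. It assumes the same risk ratio applies to both groups, which is exactly what plaintiffs had no evidence to dispute.

```python
def excess_cases_and_attributable_fraction(baseline_risk, risk_ratio, group_size):
    """Expected excess cases, and the share of exposed cases that are excess."""
    background_cases = baseline_risk * group_size
    exposed_cases = baseline_risk * risk_ratio * group_size
    excess_cases = exposed_cases - background_cases
    return excess_cases, excess_cases / exposed_cases

RISK_RATIO = 1.5      # hypothetical, and the same for both groups
GROUP_SIZE = 10_000   # hypothetical

# HYPOTHETICAL baseline risks for a low-risk and a high-risk subgroup.
low = excess_cases_and_attributable_fraction(0.01, RISK_RATIO, GROUP_SIZE)
high = excess_cases_and_attributable_fraction(0.05, RISK_RATIO, GROUP_SIZE)

print(f"low baseline:  {low[0]:.0f} excess cases, attributable fraction {low[1]:.3f}")
print(f"high baseline: {high[0]:.0f} excess cases, attributable fraction {high[1]:.3f}")
```

The higher-baseline group produces five times as many excess cases, but the attributable fraction is one third in both groups. A higher baseline risk, by itself, does nothing to make it more likely than not that a particular patient's heart attack was caused by the medication.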

The problematic aspect of the Garza case arises from the Texas Supreme Court’s conflating and confusing general with specific causation.  There was no real doubt that Vioxx at high-doses, for prolonged use, can cause heart attacks.  General causation was not at issue.  The attribution of Mr. Garza’s heart attack to his short-term, low-dose use of Vioxx, however, was at issue, and was a rather dubious claim.

The Texas Supreme Court proceeded to rely heavily upon its holding and language in Merrell Dow Pharmaceuticals, Inc. v. Havner, 953 S.W.2d 706 (Tex. 1997).  Havner was a Bendectin case, in which plaintiffs claimed that the medication caused specific birth defects.  Both general and specific causation were contested by the parties. The epidemiologic evidence in Havner came from observational studies, either case-control or cohort studies, and not RCTs.

The Havner decision insightfully recognized that risk does not equal causation, but RR > 2 is a practical compromise for allowing courts and juries to make the plaintiff-specific attribution in the face of uncertainty.  Havner, 953 S.W.2d at 717.  Merck latched on to this and other language, arguing that “Havner requires a plaintiff who claims injury from taking a drug to produce two independent epidemiological studies showing a statistically significant doubling of the relative risk of the injury for patients taking the drug under conditions substantially similar to the plaintiff’s (dose and duration, for example) as compared to patients taking a placebo.” Slip op. at 7.

The plaintiffs in Garza responded by arguing that their reliance upon RCTs relieved them of Havner’s requirement of showing a RR > 2.

The Texas Supreme Court correctly rejected the plaintiffs’ argument and followed its earlier decision in Havner on specific causation:

“But while the controlled, experimental, and prospective nature of clinical trials undoubtedly make them more reliable than retroactive, observational studies, both must show a statistically significant doubling of the risk in order to be some evidence that a drug more likely than not caused a particular injury.”

Slip op. at 10.

The Garza Court, however, went a dictum too far by expressing some of the Havner requirements as applying to general causation:

“Havner holds, and we reiterate, that when parties attempt to prove general causation using epidemiological evidence, a threshold requirement of reliability is that the evidence demonstrate a statistically significant doubling of the risk. In addition, Havner requires that a plaintiff show ‘that he or she is similar to [the subjects] in the studies’ and that ‘other plausible causes of the injury or condition that could be negated [are excluded] with reasonable certainty’.”

Slip op. at 13-14 (quoting Havner, 953 S.W.2d at 720).

General causation was not the dispositive issue in Garza, and so this language must be treated as dictum.  The sloppiness in confusing the requisites of general and specific causation is regrettable.

The plaintiffs also advanced another argument, which is becoming a commonplace in health-effects litigation.  They threw all their evidence into a pile, and claimed that the “totality of the evidence” supported their claims.  This argument is somehow supposed to supplant a reasoned approach to the issue of what specific inferences can be drawn from what kind of evidence.  The Texas Supreme Court saw through the pile, and dismissed the hand waving:

“The totality of the evidence cannot prove general causation if it does not meet the standards for scientific reliability established by Havner. A plaintiff cannot prove causation by presenting different types of unreliable evidence.”

Slip op. at 17.

All in all, the Garza Court did better than many federal courts that have consistently confused risk with cause, as well as general with specific causation.

Misplaced Reliance On Peer Review to Separate Valid Science From Nonsense

August 14th, 2011

A recent editorial in the Annals of Occupational Hygiene is a poignant reminder of how oversold peer review is in the context of expert witness judicial gatekeeping.  Editor Trevor Ogden offers some cautionary suggestions:

“1. Papers that have been published after proper peer review are more likely to be generally right than ones that have not.

2. However, a single study is very unlikely to take everything into account, and peer review is a very fallible process, and it is very unwise to rely on just one paper.

3. The question should be asked, has any published correspondence dealt with these papers, and what do other papers that cite them say about them?

4. Correspondence will legitimately give a point of view and not consider alternative explanations in the way a paper should, so peer review does not necessarily validate the views expressed.”

Trevor Ogden, “Lawyers Beware! The Scientific Process, Peer Review, and the Use of Papers in Evidence,” 55 Ann. Occup. Hyg. 689, 691 (2011).

Ogden’s conclusions, however, are misleading.  For instance, he suggests that peer-reviewed papers are better than non-peer reviewed papers, but by how much?  What is the empirical evidence for Ogden’s assertion?  In his editorial, Ogden gives an anecdote of a scientific report submitted to a political body, and comments that this report would not have survived peer review.   But an anecdote is not a datum.  What’s worse is that the paper that is rejected by peer review at Ogden’s journal will show up in another publication, eventually.  Courts make little distinction between and among journals for purposes of rating the value of peer review.

Of course it is unwise, and perhaps scientifically unsound, as Ogden points out, to rely upon just one paper, but the legal process permits it.  Worse yet,  litigants, either plaintiff or defendant, are often allowed to pick out isolated findings in a variety of studies, and throw them together as if that were science. “[O]n fait la science avec des faits comme une maison avec des pierres; mais une accumulation de faits n’est pas plus une science qu’un tas de pierres n’est une maison.” Henri Poincaré, La Science et l’Hypothèse (1905) (chapter 9, Les Hypothèses en Physique).

As for letters to the editor, sure, courts and litigants should pay attention to them, but as Ogden notes, these writings are themselves not peer reviewed, or not peer reviewed with very much analytical rigor.  The editing of letters raises additional concerns of imperious editors who silence some points of view to the benefit of others. Most journals have space only for a few letters, and unpopular but salient points of view can go unreported. Furthermore, many scientists will not write letters to the editors, even when the published article is terribly wrong in its methods, data analyses, conclusions, or discussion, because in most journals the authors typically have the last word in the form of reply, which often is self-serving and misleading, with immunity from further criticism.

Ogden describes the limitations of peer review in some detail, but he misses the significance of how these limitations play out in the legal arena.

Limitations and Failures of Peer Review

For instance, Ogden acknowledges that peer review fails to remove important errors from published articles. Here he does provide empirical evidence.  S. Schroter, N. Black, S. Evans, et al., “What errors do peer reviewers detect, and does training improve their ability to detect them?” 101 J. Royal Soc’y  Med. 507 (2008) (describing an experiment in which manuscripts were seeded with known statistical errors (9 major and 5 minor) and sent to 600 reviewers; each reviewer missed, on average, over 6 of the 9 major errors).  Ogden tells us that the empirical evidence suggests that “peer review is a coarse and fallible filter.”

This is hardly a ringing endorsement.

Surveys of the medical literature have found that the prevalence of statistical errors ranges from 30% to 90% of papers.  See, e.g., Douglas Altman, “Statistics in medical journals: developments in the 1980s,” 10 Stat. Med. 1897 (1991); Stuart J. Pocock, M.D. Hughes, R.J. Lee, “Statistical problems in the reporting of clinical trials. A survey of three medical journals,” 317 New Engl. J. Med. 426 (1987); S.M. Gore, I.G. Jones, E.C. Rytter, “Misuse of statistical methods: critical assessment of articles in the BMJ from January to March 1976,” 1 Brit. Med. J. 85 (1977).

Without citing any empirical evidence, Ogden notes that peer review is not well designed to detect fraud, especially when the data are presented to look plausible.  Despite the lack of empirical evidence, the continuing saga of fraudulent publications coming to light supports Ogden’s evaluation. Peer reviewers rarely have access to underlying data.  In the silicone gel breast implant litigation, for instance, plaintiffs relied upon a collection of studies that looked very plausible from their peer-reviewed publications.  Only after the defense discovered misrepresentations and spoliation of data did the patent unreliability and invalidity of the studies become clear to reviewing courts.  The rate of retractions of published scientific articles appears to have increased, although the secular trend may have resulted from increased surveillance and scrutiny of the published literature for fraud.  Daniel S. Levine, “Fraud and Errors Fuel Research Journal Retractions,” (August 10, 2011); Murat Cokol, Fatih Ozbay, and Raul Rodriguez-Esteban, “Retraction rates are on the rise,” 9 European Molecular Biol. Reports 2 (2008);  Orac, “Scientific fraud and journal article retractions” (Aug. 12, 2011).

The fact is that peer review is not very good at detecting fraud or error in scientific work.  Ultimately, the scientific community must judge the value of the work, but in some niche areas, only “the acolytes” are paying attention.  These acolytes cite to one another, applaud each other’s work, and often serve as peer reviewers of the work in the field because editors see them as the most knowledgeable investigators in the narrow field. This phenomenon seems especially prevalent in occupational and environmental medicine.  See Cordelia Fine, “Biased But Brilliant,” New York Times (July 30, 2011) (describing confirmation bias and irrational loyalty of scientists to their hobby-horse hypotheses).

Peer review and correspondence to the editors are not the end of the story.  Discussion and debate may continue in the scientific community, but the pace of this debate may be glacial.  In areas of research where litigation or public policy does not fuel further research to address aberrant findings or to reconcile discordant results, science may take decades to ferret out the error. Litigation cannot proceed at this deliberative speed.  Furthermore, post-publication review is hardly a cure-all for the defects of peer review; post-publication commentary can be, and often is, spotty and inconsistent.  David Schriger and Douglas Altman, “Inadequate post-publication review of medical research:  A sign of an unhealthy research environment in clinical medicine,” 341 Brit. Med. J. 356 (2010)(identifying reasons for the absence of post-publication peer review).

The Evolution of Peer Review as a Criterion for Judicial Gatekeeping of Expert Witness Opinion

The story of how peer review came to be held in such high esteem in legal circles is sad, but deserves to be told.  In the Bendectin litigation, the medication sponsor, Merrell-Richardson, was confronted with the testimony of an epidemiologist, Shanna Swan, who propounded her own unpublished re-analysis of the published epidemiologic studies, which themselves had failed to find an association between Bendectin use and birth defects.  Merrell challenged Swan’s unpublished, non-peer-reviewed re-analyses as not “generally accepted” under the Frye test.  The lack of peer review seemed like good evidence of the novelty of Swan’s re-analyses, as well as their lack of general acceptance.

In the briefing of the Daubert case, the Supreme Court received radically different views of peer review.  One group of amici modestly explained that “peer review referees and editors limit their assessment of submitted articles to such matters as style, plausibility, and defensibility; they do not duplicate experiments from scratch or plow through reams of computer-generated data in order to guarantee accuracy or veracity or certainty.” Brief for Amici Curiae Daryl E. Chubin, et al. at 10, Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579 (1993).  See also Daryl E. Chubin & Edward J. Hackett, Peerless Science: Peer Review and U.S. Science Policy (1990).

Other amici, such as the New England Journal of Medicine, Journal of the American Medical Association, and Annals of Internal Medicine, proposed that peer-reviewed publication should be the principal criterion for admitting scientific opinion testimony.  Brief for Amici Curiae New England Journal of Medicine, Journal of the American Medical Association, and Annals of Internal Medicine in Support of Respondent, Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579 (1993). But see Arnold S. Relman & Marcia Angell, “How Good Is Peer Review?” 321 New Eng. J. Med. 827, 828 (1989) (“peer review is not and cannot be an objective scientific process, nor can it be relied on to guarantee the validity or honesty of scientific research”).

Justice Blackmun, speaking for the majority in Daubert, steered a moderate course:

“Another pertinent consideration is whether the theory or technique has been subjected to peer review and publication. Publication (which is but one element of peer review) is not a sine qua non of admissibility; it does not necessarily correlate with reliability, see S. Jasanoff, The Fifth Branch: Science Advisors as Policymakers 61-76 (1990), and in some instances well-grounded but innovative theories will not have been published, see Horrobin, “The Philosophical Basis of Peer Review and the Suppression of Innovation,” 263 JAMA 1438 (1990). Some propositions, moreover, are too particular, too new, or of too limited interest to be published. But submission to the scrutiny of the scientific community is a component of “good science,” in part because it increases the likelihood that substantive flaws in methodology will be detected. See J. Ziman, Reliable Knowledge: An Exploration of the Grounds for Belief in Science 130-133 (1978); Relman & Angell, “How Good Is Peer Review?” 321 New Eng. J. Med. 827 (1989). The fact of publication (or lack thereof) in a peer reviewed journal thus will be a relevant, though not dispositive, consideration in assessing the scientific validity of a particular technique or methodology on which an opinion is premised.”

Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579, 593-94, 590 n.9 (1993).

This lukewarm endorsement from Justice Blackmun, in Daubert, sent a mixed message to lower federal courts, which tended to make peer review into something of a mechanical test in their gatekeeping decisions.  Many federal judges (and state court judges in states that followed the Daubert precedent) were too busy, too indolent, or too lacking in analytical acumen, to look past the fact of publication and peer review.  These judges avoided the labor of independent thought by taking the fact of peer-review publication as dispositive of the validity of the science in the paper.  Some commentators encouraged this low level of scrutiny and mechanical test, by suggesting that peer review could be taken as an indication of good science.  See, e.g., Margaret A. Berger, “The Supreme Court’s Trilogy on the Admissibility of Expert Testimony,” in Federal Judicial Center, Reference Manual on Scientific Evidence 9, 17 (2d ed. 2000) (describing Daubert as endorsing peer review as one of the “indicators of good science”) (hereafter cited as Reference Manual).  Elevating peer review to be an indicator of good science, however, obscures its lack of epistemic warrant, misrepresents how the scientific community actually regards it, and enables judges to fall back into their pre-Daubert mindset of finding quick and easy, and invalid, proxies for scientific reliability.

In a similar vein, other commentators spoke in superlatives about peer review, and thus managed to mislead judges and decision makers further to regard anything published as valid scientific data, data interpretation, and data analysis. For instance, Professor David Goodstein, writing in the Reference Manual, advises the federal judiciary that peer review is the test that separates valid science from rubbish:

“In the competition among ideas, the institution of peer review plays a central role. Scientific articles submitted for publication and proposals for funding are often sent to anonymous experts in the field, in other words, peers of the author, for review. Peer review works superbly to separate valid science from nonsense, or, in Kuhnian terms, to ensure that the current paradigm has been respected. It works less well as a means of choosing between competing valid ideas, in part because the peer doing the reviewing is often a competitor for the same resources (pages in prestigious journals, funds from government agencies) being sought by the authors. It works very poorly in catching cheating or fraud, because all scientists are socialized to believe that even their bitterest competitor is rigorously honest in the reporting of scientific results, making it easy to fool a referee with purposeful dishonesty if one wants to. Despite all of this, peer review is one of the sacred pillars of the scientific edifice.”

David Goodstein, “How Science Works,” Reference Manual 67, at 74-75, 82 (emphasis added).

Criticisms of Reliance Upon Peer Review as a Proxy for Reliability and Validity

Other commentators have put forward a more balanced and realistic, if not jaundiced, view of peer review. Professor Susan Haack, a philosopher of science at the University of Miami, who writes frequently about epistemic claims of expert witnesses and judicial approaches to gatekeeping, described the disconnect in meaning of peer review to scientists and to lawyers:

“For example, though peer-reviewed publication is now standard practice at scientific and medical journals, I doubt that many working scientists imagine that the fact that a work has been accepted for publication after peer review is any guarantee that it is good stuff, or that it’s not having been published necessarily undermines its value. The legal system, however, has come to invest considerable epistemic confidence in peer-reviewed publication – perhaps for no better reason than that the law reviews are not peer-reviewed!”

Susan Haack, “Irreconcilable Differences?  The Troubled Marriage of Science and Law,” 72 Law & Contemporary Problems 1, 19 (2009).   Haack’s assessment of the motivation of actors in the legal system is, for a philosopher, curiously ad hominem, and her shameless dig at law reviews is ironic, considering that she publishes extensively in them.  Still, her assessment that peer review is not any guarantee of an article’s being “good stuff,” is one of her more coherent contributions to this discussion.

The absence of peer review hardly supports the inference that a study or an evaluation of studies is not reliable, unless of course we also know that the authors have failed after repeated attempts to find a publisher.  In today’s world of vanity presses, a researcher would be hard pressed not to find a journal in which to publish a paper.  As Drummond Rennie, a former editor of the Journal of the American Medical Association (the same journal, acting as an amicus curiae to the Supreme Court, which oversold peer review), has remarked:

“There seems to be no study too fragmented, no hypothesis too trivial, no literature citation too biased or too egotistical, no design too warped, no methodology too bungled, no presentation of results too inaccurate, too obscure, and too contradictory, no analysis too self serving, no argument too circular, no conclusions too trifling or too unjustified, and no grammar and syntax too offensive for a paper to end up in print.”

Drummond Rennie, “Guarding the Guardians: A Conference on Editorial Peer Review,” 256 J. Am. Med. Ass’n 2391 (1986); D. Rennie, A. Flanagin, R. Smith, and J. Smith, “Fifth International Congress on Peer Review and Biomedical Publication: Call for Research,” 289 J. Am. Med. Ass’n 1438 (2003).

Other editors at leading medical journals seem to agree with Rennie.  Richard Horton, an editor of The Lancet, rejects the Goodstein view (from the Reference Manual) of peer review as the “sacred pillar of the scientific edifice”:

“The mistake, of course, is to have thought that peer review was any more than a crude means of discovering the acceptability — not the validity — of a new finding. Editors and scientists alike insist on the pivotal importance of peer review. We portray peer review to the public as a quasi-sacred process that helps to make science our most objective truth teller. But we know that the system of peer review is biased, unjust, unaccountable, incomplete, easily fixed, often insulting, usually ignorant, occasionally foolish, and frequently wrong.”

Richard Horton, “Genetically modified food: consternation, confusion, and crack-up,” 172 Med. J. Australia 148 (2000).

In last year’s prestigious 2010 Sense About Science lecture, Fiona Godlee, the editor of the British Medical Journal, characterized peer review as deficient in at least seven different ways:

  • Slow
  • Expensive
  • Biased
  • Unaccountable
  • Stifles innovation
  • Bad at detecting error
  • Hopeless at detecting fraud

Godlee, “It’s time to stand up for science once more” (June 21, 2010).

Important research often goes unpublished, and never sees the light of day.  Anti-industry zealots are fond of pointing fingers at the pharmaceutical industry, although many firms, such as GlaxoSmithKline, have adopted a practice of posting study results on a website.  The anti-industry zealots overlook how many apparently neutral investigators suppress research results that do not fit in with their pet theories.  One of my favorite examples is the failure of the late Dr. Irving Selikoff to publish his study of Johns-Manville factory workers:  William J. Nicholson, Ph.D. and Irving J. Selikoff, M.D., “Mortality experience of asbestos factory workers; effect of differing intensities of asbestos exposure,” Unpublished Manuscript.  This study investigated cancer and other mortality at a factory in New Jersey, where crocidolite was used in the manufacture of insulation products.  Selikoff and Nicholson apparently had no desire to publish a paper that would undermine their unfounded claim that crocidolite asbestos was not used by American workers.  But this reluctance does not necessarily mean that Nicholson and Selikoff’s unpublished paper was of any lesser quality than their study of North American insulators, the results of which they published, and republished, with abandon.

Examples of Failed Peer Review from the Litigation Front

Phenylpropanolamine and Stroke

Then there are many examples from the litigation arena of studies that passed peer review at the most demanding journals, but which did not hold up under the more intense scrutiny of review by experts in the cauldron of litigation.

In In re Phenylpropanolamine Products Liability Litigation, Judge Rothstein conducted hearings and entertained extensive briefings on the reliability of plaintiffs’ expert witnesses’ opinions, which were based largely upon one epidemiologic study, known as the “Yale Hemorrhagic Stroke Project (HSP).”  The Project was undertaken by manufacturers, which created a Scientific Advisory Group to oversee the study protocol.  The study was submitted as a report to the FDA, which reviewed the study and convened an advisory committee to review the study further.  “The prestigious NEJM published the HSP results, further substantiating that the research bears the indicia of good science.” In re Phenylpropanolamine Prod. Liab. Litig., 289 F. Supp. 2d 1230, 1239 (W.D. Wash. 2003) (citing Daubert II for the proposition that peer review shows the research meets the minimal criteria for good science).  There were thus many layers of peer review for the HSP study.

The HSP study was subjected to much greater analysis in litigation.  Peer review, even in the New England Journal of Medicine, did not and could not carry this weight. The defendants fought to obtain the data underlying the HSP, and those data unraveled the HSP paper.  Despite the plaintiffs’ initial enthusiasm for a litigation that was built on the back of a peer-reviewed paper in one of the leading clinical journals of internal medicine, the litigation resulted in a string of notable defense verdicts.  After one of the early defense verdicts, plaintiffs challenged the defendant’s reliance upon underlying data that went behind the peer-reviewed publication.  The trial court rejected the request for a new trial, and spoke to the importance of looking behind the superficial significance of peer review of the key study relied upon by plaintiffs in the PPA litigation:

“I mean, you could almost say that there was some unethical activity with that Yale Study.  It’s real close.  I mean, I — I am very, very concerned at the integrity of those researchers.”

“Yale gets — Yale gets a big black eye on this.”

O’Neill v. Novartis AG, California Superior Court, Los Angeles Cty., Transcript of Oral Argument on Post-Trial Motions, at 46-47 (March 18, 2004) (Hon. Anthony J. Mohr).

Viagra and Ophthalmic Events

The litigation over ophthalmic adverse events after the use of Viagra provides another example of challenging peer review.  In re Viagra Products Liab. Litig., 658 F. Supp. 2d 936, 945 (D. Minn. 2009).  In this litigation, the court, after viewing litigation discovery materials, recognized that the authors of a key paper failed to use the methodologies that were described in their published paper.  The court gave the sober assessment that “[p]eer review and publication mean little if a study is not based on accurate underlying data.” Id.

MMR Vaccine and Autism

Plaintiffs’ expert witness in the MMR vaccine/autism litigation, Andrew Wakefield, published a paper in The Lancet, in which he purported to find an association between measles-mumps-rubella vaccine and autism.  A.J. Wakefield, et al., “Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children,” 351 Lancet 637 (1998).  This published paper, in a well-regarded journal, opened a decade-long controversy, with litigation, over the safety of the MMR vaccine.  The study was plagued, however, not only by failure to disclose payments from plaintiffs’ attorneys and ethical lapses for failure to obtain ethics board approvals, but by substantially misleading reports of data and data analyses.  In 2010, Wakefield was sanctioned by the UK General Medical Council’s Fitness to Practise Panel.  Finally, in 2010, over a decade after initial publication, the Lancet “fully retract[ed] this paper from the published record.”  Editors of the Lancet, “Retraction–Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children,” 375 Lancet 445 (2010).

Accutane and Suicide

In the New Jersey litigation over claimed health effects of Accutane, one of the plaintiffs’ expert witnesses was the author of a key paper that “linked” Accutane to depression.  Palazzolo v. Hoffman La Roche, Inc., 2010 WL 363834 (N.J. App. Div.).  Discovery revealed that the author, James Bremner, did not follow the methodology described in the paper.  Furthermore, Bremner could not document the data used in the paper’s analysis, and conceded that the statistical analyses were incorrect.  The New Jersey Appellate Division held that reliance upon Bremner’s study should be excluded as not soundly and reliably generated.  Id. at *5.

Silicone and Connective Tissue Disease

It is heartening that the scientific and medical communities decisively renounced the pathological science that underlay the silicone gel breast implant litigation.  The fact remains, however, that plaintiffs relied upon a large body of published papers, each more invalid than the other, to support their claims.  For many years, judges around the country blinked and let expert witnesses offer their causation opinions, in large part based upon papers by Smalley, Shanklin, Lappe, Kossovsky, Gershwin, Garrido, and others.  Peer review did little to stop the enthusiasm of editors for this “sexy” topic until a panel of court-appointed expert witnesses and the Institute of Medicine put an end to the judicial gullibility.

Concluding Comments

One district court distinguished between pre-publication peer review and the important peer review that takes place after publication as other researchers quietly go about replicating or reproducing a study’s findings, or attempting to build on those findings.  “[J]ust because an article is published in a prestigious journal, or any journal at all, does not mean per se that it is scientifically valid.”  Pick v. Amer. Med. Sys., 958 F. Supp. 1151, 1178 n.19 (E.D. La. 1997), aff’d, 198 F.3d 241 (5th Cir. 1999).  With hindsight, we can say that Merrell Richardson’s strategy of emphasizing peer review has had some unfortunate, unintended consequences.  The Supreme Court elevated peer review into a factor for reliable science, and lower courts have elevated peer review into a criterion of validity.  The upshot is that many courts will now not go beyond statements in a peer-reviewed paper to determine whether they are based upon sufficient facts and data, or whether the statements are based upon sound inferences from the available facts and data.  These courts violate the letter and spirit of Rule 702 of the Federal Rules of Evidence.

Bad and Good Statistical Advice from the New England Journal of Medicine

July 2nd, 2011

Many people consider The New England Journal of Medicine (NEJM) a prestigious journal.  It is certainly widely read.  Judging from its “impact factor,” we know the journal is frequently cited.  So when the NEJM weighs in on an issue that involves the intersection of law and science, I pay attention.

Unfortunately, this week’s issue contains an editorial “Perspective” piece that is filled with incoherent, inconsistent, and incorrect assertions, both on the law and the science.  Mark A. Pfeffer and Marianne Bowler, “Access to Safety Data – Stockholders versus Prescribers,” 364 New Engl. J. Med. ___ (2011).

Dr. Mark Pfeffer and the Hon. Marianne Bowler used the recent United States Supreme Court decision in Matrixx Initiatives, Inc. v. Siracusano, __ U.S. __, 131 S. Ct. 1309 (2011), to advance views not supported by the law or the science.   Remarkably, Dr. Pfeffer is the Victor J. Dzau Professor of Medicine, at the Harvard Medical School.  He is a physician, and he has also received a Ph.D. degree in physiology and biophysics.  Ms. Bowler is both a lawyer and a federal judge.  Between the two, they should have provided better, more accurate, and more consistent advice.

1. The Authors Erroneously Characterize Statistical Significance in Inappropriate Bayesian Terms

The article begins with a relatively straightforward characterization of various legal burdens of proof.  The authors then try to collapse one of those burdens of proof, “beyond a reasonable doubt,” which has no accepted quantitative meaning, to a significance probability that is used to reject a pre-specified null hypothesis in scientific studies:

“To reject the null hypothesis (that a result occurred by chance) and deem an intervention effective in a clinical trial, the level of proof analogous to law’s ‘beyond a reasonable doubt’ standard would require an extremely stringent alpha level to permit researchers to claim a statistically significant effect, with the offsetting risk that a truly effective intervention would sometimes be deemed ineffective.  Instead, most randomized clinical trials are designed to achieve a lower level of evidence that in legal jargon might be called ‘clear and convincing’, making conclusions drawn from it highly probable or reasonably certain.”

Now this is both scientific and legal nonsense.  It is distressing that a federal judge characterizes the burden of proof that she must apply, or direct juries to apply, as “legal jargon.”  More important, these authors, scientist and judge, give questionable quantitative meanings to burdens of proof, and they misstate the meaning of statistical significance.  When judges or juries must determine guilt “beyond a reasonable doubt,” they are assessing the prosecution’s claim that the defendant is guilty, given the evidence at trial.  This probability can be represented as:

Probability (Guilt | Evidence Adduced)

This is what is known as a posterior probability, and it is fundamentally different from significance probability.

The significance probability is a transposed conditional probability from the posterior probability that is used to assess guilt in a criminal trial, or contentions in a civil trial.  As law professor David Kaye and his statistician coauthor, the late David Freedman, described the p-value and significance probability:

“The p-value is the probability of getting data as extreme as, or more extreme than, the actual data, given that the null hypothesis is true:

p = Probability (extreme data | null hypothesis in model)

* * *

Conversely, large p-values indicate that the data are compatible with the null hypothesis: the observed difference is easy to explain by chance. In this context, small p-values argue for the plaintiffs, while large p-values argue for the defense. Since p is calculated by assuming that the null hypothesis is correct (no real difference in pass rates), the p-value cannot give the chance that this hypothesis is true. The p-value merely gives the chance of getting evidence against the null hypothesis as strong or stronger than the evidence at hand—assuming the null hypothesis to be correct. No matter how many samples are obtained, the null hypothesis is either always right or always wrong. Chance affects the data, not the hypothesis. With the frequency interpretation of chance, there is no meaningful way to assign a numerical probability to the null hypothesis.”

David H. Kaye and David A. Freedman, “Reference Guide on Statistics,” Federal Judicial Center, Reference Manual on Scientific Evidence 122 (2d ed. 2000).  Kaye and Freedman explained over a decade ago, for the benefit of federal judges:

“As noted above, it is easy to mistake the p-value for the probability that there is no difference. Likewise, if results are significant at the .05 level, it is tempting to conclude that the null hypothesis has only a 5% chance of being correct.

This temptation should be resisted. From the frequentist perspective, statistical hypotheses are either true or false; probabilities govern the samples, not the models and hypotheses. The significance level tells us what is likely to happen when the null hypothesis is correct; it cannot tell us the probability that the hypothesis is true. Significance comes no closer to expressing the probability that the null hypothesis is true than does the underlying p-value.”

Id. at 124-25.

As we can see, our scientist from the Harvard Medical School and our federal judge have committed the transpositional fallacy by likening “beyond a reasonable doubt” to the alpha used to test for a statistically significant outcome in a clinical trial.  They are not the same; nor are they analogous.
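The transposition can be written out explicitly.  What follows is only a sketch, in notation of my own choosing rather than that of Kaye and Freedman or the NEJM authors:  H_0 is the null hypothesis, H_1 is its negation, and D is the event of obtaining data at least as extreme as those observed.  The sketch also assumes, contrary to the strict frequentist view quoted above, that one is willing to assign a prior probability to H_0.

```latex
\[
\underbrace{P(D \mid H_0)}_{\text{p-value}}
\;\neq\;
\underbrace{P(H_0 \mid D)}_{\text{posterior probability}}
= \frac{P(D \mid H_0)\,P(H_0)}
       {P(D \mid H_0)\,P(H_0) + P(D \mid H_1)\,\bigl(1 - P(H_0)\bigr)}
\]
```

The p-value supplies only one term on the right-hand side.  Without the prior probability P(H_0) and the probability of the data under the alternative, P(D | H_1), the posterior probability cannot be computed at all.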

This fallacy has been repeatedly described.  Not only has the Reference Manual on Scientific Evidence (which is written specifically for federal judges) described the fallacy in detail, but legal and scientific writers have urged care to avoid this basic mistake in probabilistic reasoning.  Here is a recent admonition from one of the leading writers on the use (and misuse) of statistics in legal procedures:

“Some commentators, however, would go much further; they argue that 5% is an arbitrary statistical convention and since preponderance of the evidence means 51% probability, lawyers should not use 5% as the level of statistical significance but 49% – thus rejecting the null hypothesis when there is up to a 49% chance that it is true. In their view, to use a 5% standard of significance would impermissibly raise the preponderance of evidence standard in civil trials. Of course the 5% figure is arbitrary (although widely accepted in statistics) but the argument is fallacious. It assumes that 5% (or 49% for that matter) is the probability that the null hypothesis is true. The 5% level of significance is not that, but the probability of the sample evidence if the null hypothesis were true. This is a very different matter. As I pointed out in Chapter 1, the probability of the sample given the null hypothesis is not generally the same as the probability of the null hypothesis given the sample. To relate the level of significance to the probability of the null hypothesis would require an application of Bayes’s theorem and the assumption of a prior probability distribution. However, the courts have usually accepted the statistical standard, although with some justifiable reservations when the P-value is only slightly above the 5% cutoff.”

Michael O. Finkelstein, Basic Concepts of Probability and Statistics in the Law 54 (N.Y. 2009) (emphasis added).
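Finkelstein’s point about Bayes’s theorem and the prior can be illustrated with a few lines of arithmetic.  The sketch below is mine, not Finkelstein’s.  The function name, the prior probabilities, and the 80% power figure are all hypothetical, chosen only to show that the same 5% significance level is compatible with very different probabilities that the null hypothesis is true.

```python
def prob_null_given_significant(prior_null, alpha=0.05, power=0.80):
    """Return P(null is true | result is significant) by Bayes's theorem.

    prior_null -- assumed prior probability that the null hypothesis is true
    alpha      -- significance level: P(significant | null true)
    power      -- assumed P(significant | null false); hypothetical value
    """
    if not 0.0 <= prior_null <= 1.0:
        raise ValueError("prior_null must lie between 0 and 1")
    # Significant results that arise although the null is true.
    false_positives = alpha * prior_null
    # Significant results that arise because the null is false.
    true_positives = power * (1.0 - prior_null)
    total_significant = false_positives + true_positives
    if total_significant == 0.0:
        raise ValueError("a significant result has zero probability")
    return false_positives / total_significant


if __name__ == "__main__":
    # Hypothetical priors; alpha is held at 0.05 throughout.
    for prior in (0.10, 0.50, 0.90):
        posterior = prob_null_given_significant(prior)
        print(f"prior P(null) = {prior:.2f} -> "
              f"P(null | significant) = {posterior:.3f}")
```

On these assumed inputs, a prior of 0.50 yields a probability of roughly 6% that the null hypothesis is true, and a prior of 0.90 yields 36%.  In neither case does the answer equal the 5% significance level, and in neither case can it be obtained without first assuming a prior.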

2.  The Authors, Having Mischaracterized Burden-of-Proof and Significance Probabilities, Incorrectly Assess the Meaning of the Supreme Court’s Decision in Matrixx Initiatives.

I have written a good bit about the Court’s decision in Matrixx Initiatives, most recently with David Venderbush, for the Washington Legal Foundation.  See Schachtman & Venderbush, “Matrixx Unbounded: High Court’s Ruling Needlessly Complicates Scientific Evidence Principles,” W.L.F. Legal Backgrounder (June 17, 2011).

I was thus startled to see the claim of a federal judge that the Supreme Court, in Matrixx, had “applied the ‘fair preponderance of the evidence’ standard of proof used for civil matters.”  Matrixx was a case about the sufficiency of the pleadings, and thus there really could have been no such application of a burden of proof to an evidentiary display.  The very claim is incoherent, and at odds with the Supreme Court’s holding.

The NEJM authors went on to detail how the defendant in Matrixx had persuaded the trial court that the evidence against its product, Zicam, did not reach statistical significance, and therefore the evidence should not be considered “material.”  As I have pointed out before, Matrixx focused on adverse event reports, as raw numbers of reported events, which were not, and could not be, analyzed for statistical significance.  The very essence of Matrixx’s argument was nonsense, which perhaps explains the company’s nine-nothing loss in the Supreme Court.  The authors of the opinion piece in the NEJM, however, missed that it is not the evidence of adverse event reports, with or without a statistical analysis, that is material.  What was at issue was the company’s failure to disclose this information, along with a good deal more information, in the face of the company’s having made very aggressive, optimistic sales and profits projections for the future.

The NEJM authors proceed to tell us, correctly, that adverse events do not prove causality, but then they tell us, incorrectly, that the Matrixx case shows that “such a high level of proof did not have to be achieved.”  While the authors are correct about the insufficiency of adverse event reports for causal assessments, they miss the legal significance of there being no burden of proof at play in Matrixx; it was a case on the pleadings.  The issue was the sufficiency of those pleadings, and what the Supreme Court made clear was that in the context of a product subject to FDA regulation, causation was never the test for materiality because the FDA could withdraw the product on a showing far less than scientific causation of harm.  So the plaintiffs could allege less than causation, and still have pleaded a sufficient case of securities fraud.  The Supreme Court did not, and could not, address the issue that the NEJM authors discuss.  The authors’ assessment that the Matrixx case freed legal causation of any requirement of statistical significance is a tortured reading of obiter dictum, not the holding of the case.  This editorializing is troubling.

The NEJM authors similarly hold forth on what clinicians consider material, and they announce that “[c]linicians are well aware that to be considered material, information regarding drug safety does not have to reach the same level of certainty that we demand for demonstrating efficacy.”  This is true, but clinicians are ethically bound to err on the side of safety:  Primum non nocere. See, e.g., Tamraz v. Lincoln Elec. Co., 620 F.3d 665, 673 (6th Cir. 2010) (noting that treating physicians have more training in diagnosis than in etiologic assessments), cert. denied, ___ U.S. ___ (2011).  Again, the authors’ statements have nothing to do with the Matrixx case, or with the standards for legal or scientific causation.

3.  The Authors, Inconsistently with Their Characterization of Various Probabilities, Proceed Correctly To Describe Statistical Significance Testing for Adverse Outcomes in Trials.

Having incorrectly likened “beyond a reasonable doubt” to p < 0.05, the NEJM authors then correctly point out that standard statistical testing cannot be used for “evaluating unplanned and uncommon adverse events.”  The authors also note that the flood of data in the assessment of causation of adverse events is filled with “biologic noise.”  Physicians and regulators may take the noise signals and claim that they hear a concert.  This is exactly why we should not confuse precautionary judgments with scientific assessments of causation.

Ninth Circuit Affirms Rule 702 Exclusion of Dr David Egilman in Diacetyl Case

June 20th, 2011

On June 17, 2011, the United States Court of Appeals for the Ninth Circuit affirmed a district judge’s decision to exclude Dr David S. Egilman from testifying in a consumer-exposure diacetyl case.  Newkirk v. Conagra Foods Inc. (9th Cir. 2011).

Plaintiff claimed to have developed bronchiolitis obliterans from having popped and eaten a Homeric quantity of microwavable popcorn.  The case was thus a key test of “consumer” diacetyl exposure.  Another case, also involving Egilman, just finished a Daubert hearing in Colorado last week.

To get the full “flavor” of this diacetyl case, you may have to read the district court’s opinion, which excluded Egilman and other witnesses, and entered summary judgment for the defense. Newkirk v. Conagra Foods, Inc., No. CV-08-273, 2010 WL 2680184 (E.D. Wash. July 2, 2010).

Plaintiff appealed, and so did Egilman.  (See attached Egilman Motion Appeal Diacetyl Exclusion 2011 and Egilman Declaration Newkirk Diacetyl Appeal 2011.)  In what some may consider scurrilous pleading, Egilman attacked the district judge for having excluded him from testifying.  If Egilman’s challenge to the trial judge was not bizarre enough, Egilman also claimed a right to intervene in the appeal by advancing the claim that the Rule 702 exclusion hurt his livelihood.  The following language is from paragraph 11 of Dr. Egilman’s declaration in support of his motion:

“The Daubert ruling eliminates my ability to testify in this case and in others. I will lose the opportunity to bill for services in this case and in others (although I generally donate most fees related to courtroom testimony to charitable organizations, the lack of opportunity to do so is an injury to me). Based on my experience, it is virtually certain that some lawyers will choose not to attempt to retain me as a result of this ruling. Some lawyers will be dissuaded from retaining my services because the ruling is replete with unsubstantiated pejorative attacks on my qualifications as a scientist and expert. The judge’s rejection of my opinion is primarily an ad hominem attack and not based on an actual analysis of what I said – in an effort to deflect the ad hominem nature of the attack the judge creates ‘straw man’ arguments and then knocks the straw men down, without ever addressing the substance of my positions.”

Egilman Declaration at Paragraph 11.

Egilman tempers his opinion about the prejudice he will suffer in front of judges in future cases.  Only judges who have not seen him before would likely be persuaded by Judge Peterson’s decision in Newkirk.  Those judges who have heard him testify before would, no doubt, see him for the brilliant crusading avenger that he is:

“This will generally not occur in cases heard before Judges where I have already appeared as a witness. For example a New York state trial judge has praised plaintiffs’ molecular-biology and public-health expert Dr. David Egilman as follows: ‘Dr. Egilman is a brilliant fellow and I always enjoy seeing him and I enjoy listening to his testimony . . . . He is brilliant, he really is.’ [Lopez v. Ford Motor Co., et al. (120954/2000; In re New York City Asbestos Litigation, Index No. 40000/88).]”

Egilman Declaration at p. 9 n. 2.

It does not appear as though Egilman’s attempt to intervene helped plaintiff before the Ninth Circuit, which may not have thought that he was as brilliant as the unidentified trial judge in Lopez.

The Newkirk case is interesting for several reasons.

First, the Circuit correctly saw that general causation must be shown before the plaintiff can invoke a differential etiology analysis.

Second, the Circuit saw that it is not sufficient that the substance in question can cause the outcome claimed; the substance must do so at the levels of exposure that were experienced by the plaintiff.  In Newkirk, even by consuming massive quantities of microwave popcorn, plaintiff had not shown exposure levels to diacetyl equivalent to the exposures among factory workers at risk for bronchiolitis obliterans.  The affirmance of the district court is a strong statement that exposure matters in the context of the current understanding of diacetyl causation.

Third, the Circuit was not intimidated or persuaded by the tactics of Dr David Egilman, expert witness.

Fourth, having dealt with the issues deftly, the Ninth Circuit issued a judgment from which there will be no appeal.

WLF Legal Backgrounder on Matrixx Initiatives

June 20th, 2011

In Matrixx Initiatives, Inc. v. Siracusano, ___ U.S. ___, ___, 2011 WL 977060 (Mar. 22, 2011), the Supreme Court addressed a securities fraud case against an over-the-counter pharmaceutical company for speaking to the market about its rosy financial projections, but failing to provide information received about the hazards of the product.

Much or most of the holding of the case is an unexceptional application of settled principles of securities fraud litigation in the context of claims against a pharmaceutical company with products liability cases pending.  The defendant company, however, attempted to import Rule 702 principles of scientific evidence into a motion to dismiss on the pleadings, with much confusion resulting among the litigants, the amici, and the Court.  The Supreme Court ruled unanimously to affirm the reinstatement of the complaint against the defendant.

I have written about this case previously: “The Matrixx – A Comedy of Errors,” and “Matrixx Unloaded,” and “The Matrixx Oversold,” and “De-Zincing the Matrixx.”

Now, with the collaboration of David Venderbush from Alston & Bird LLP, we have collected our thoughts to share in the form of a Washington Legal Foundation Legal Backgrounder, which is available for download at the WLF’s website.  Schachtman & Venderbush, “Matrixx Unbounded: High Court’s Ruling Needlessly Complicates Scientific Evidence Principles,” 26 (14) Legal Backgrounder (June 17, 2011).