Admissibility versus Sufficiency of Expert Witness Evidence

Professors Michael Green and Joseph Sanders are two of the longest-serving interlocutors in the never-ending discussion and debate about the nature and limits of expert witness testimony on scientific questions about causation.  Both have made important contributions to the conversation, and both have been influential in academic and judicial circles.  Professor Green has served as the co-reporter for the American Law Institute’s Restatement (Third) of Torts: Liability for Physical Harm.  Whether wrong or right, new publications about expert witness issues by Green or Sanders call for close attention.

Early last month, Professors Green and Sanders presented together at a conference, on “Admissibility Versus Sufficiency: Controlling the Quality of Expert Witness Testimony in the United States.” Video and audio of their presentation can be found online.  The authors posted a manuscript of their draft article on expert witness testimony to the Social Science Research Network.  See Michael D. Green & Joseph Sanders, “Admissibility Versus Sufficiency: Controlling the Quality of Expert Witness Testimony in the United States,” <downloaded on March 25, 2012>.

The authors argue that most judicial exclusions of expert witness causal opinion testimony are based upon a judgment that the challenged witness’s opinion is based upon insufficient evidence.  They point to litigations, such as the Bendectin and silicone gel breast implant cases, where the defense challenges were supported in part by a body of “exonerative” epidemiologic studies.  Legal theory construction is always fraught with danger: either the theory stands to be readily refuted by counterexample, or it is put forward as a normative, prescriptive tool to change the world, and thus lacks any descriptive or explanatory component.  Green and Sanders, however, seem to be earnest in suggesting that their reductionist approach is both descriptive and elucidative of actual judicial practice.

The authors’ reductionist approach in this area, and especially as applied to the Bendectin and silicone decisions, however, ignores that even before the so-called exonerative epidemiology on Bendectin and silicone was available, the plaintiffs’ expert witnesses were presenting opinions on general and specific causation, based upon studies and evidence of dubious validity.  Given that the silicone litigation erupted before Daubert was decided, and Bendectin cases pretty much ended with Daubert, neither litigation really permits a clean before and after picture.  Before Daubert, courts struggled with how to handle both the invalidity and the insufficiency (once the impermissible inferences were stripped away) in the Bendectin cases.  And before Daubert, all silicone cases went to the jury.  Even after Daubert, for some time, silicone cases resulted in jury verdicts, which were upheld on appeal.  It took defendants some time to uncover the nature and extent of the invalidity in plaintiffs’ expert witnesses’ opinions, the invalidity of the studies upon which these witnesses relied, and the unreasonableness of the witnesses’ reliance upon various animal and in vitro toxicologic and immunologic studies.  And it took trial courts a few years after the Supreme Court’s 1993 Daubert decision to warm up to their new assignment.  Indeed, Green and Sanders get a good deal of mileage in their reductionist approach from trial and appellate courts that were quite willing to collapse the distinction between reliability or validity on the one hand, and sufficiency on the other.  Some of those “back benching” courts used consensus statements and reviews, which both marshaled the contrary evidence and documented the invalidity of the would-be affirmative evidence.  This judicial reliance upon external sources that encompassed both sufficiency and reliability should not be understood to mean that reliability (or validity) is nothing other than sufficiency.

A post-Daubert line of cases is more revealing:  the claim that the ethyl mercury vaccine preservative, thimerosal, causes autism.  Professors Green and Sanders touch briefly upon this litigation.  See Blackwell v. Wyeth, 971 A.2d 235 (Md. 2009).  Plaintiff’s expert witness, David Geier, had published several articles in which he claimed to have supported a causal nexus between thimerosal and autism.  Green and Sanders dutifully note that the Maryland courts ultimately rejected the claims because Geier’s data, standing alone, were wholly inadequate to support the inference he zealously urged.  Id. at 32.  Whether this is sufficiency or the invalidity of his ultimate inference of causation from an inadequate data set perhaps can be debated, but surely the validity concerns should not be lost in the shuffle of evaluating the evidence available.  Of course, exculpatory epidemiologic studies ultimately were published, based upon high quality data and inferences, but strictly speaking, these studies were not necessary to the process of ruling Geier’s advocacy science out of bounds for valid scientific discourse and legal proceedings.

Some additional comments.

 

1. Questionable reductionism.  The authors describe the thrust of their argument as a need to understand judicial decisions on expert witness admissibility as “sufficiency judgments.”  While their analysis simplifies the gatekeeping decisions, it also abridges the process in a way that omits important determinants of the law and its application.  Sufficiency, or the lack thereof, is often involved as a fatal deficiency in expert witness opinion testimony on causal issues, but the authors’ attempt to reduce many exclusionary decisions to insufficiency determinations ignores the many ways that expert witnesses (and scientists in the real world outside of courtrooms) go astray.  The authors’ reductionism seems a weak, if not flawed, predictive, explanatory, and normative theory of expert witness gatekeeping.  Furthermore, this reductionism holds a false allure to judges who may be tempted to oversimplify their gatekeeping task by conflating gatekeeping with the jury’s role:  exclude the proffered expert witness opinion testimony because, considering all the available evidence, the testimony is probably wrong.

 

2. Weakness of peer review, publication, and general acceptance in predicting gatekeeping decisions.  The authors further describe a “sufficiency approach” as openly acknowledging the relative unimportance of peer review, publication, and general acceptance.  Id. at 39.  These factors do not lack importance because they are unrelated to sufficiency; they are unimportant because they are weak proxies for validity.  Their presence or absence does not really help predict whether the causal opinion offered is invalid or otherwise unreliable.  The existence of published, high-quality, peer-reviewed systematic reviews does, however, bear on sufficiency of the evidence.  At least in some cases, courts consider such reviews and rely upon them heavily in reaching a decision on Rule 702, but we should ask to what extent the court has simply avoided the hard work of thinking through the problem on its own.

 

3. Questionable indictment of juries and the adversarial system for the excesses of expert witnesses.  Professors Green and Sanders describe the development of common law, and rules, to control expert witness testimony as “a judicial attempt to moderate the worst consequences of two defining characteristics of United States civil trials:  party control of experts and the widespread use of jury decision makers.”  Id. at 2.  There is no doubt that these are two contributing factors in some of the worst excesses, but the authors really offer no support for their causal judgment.  The experience of courts in Europe, where civil juries and party control of expert witnesses are often absent from the process, raises questions about Green and Sanders’ attribution.  See, e.g., R. Meester, M. Collins, R.D. Gill, M. van Lambalgen, “On the (ab)use of statistics in the legal case against the nurse Lucia de B,” 5 Law, Probability and Risk 233 (2007) (describing the conviction of Nurse Lucia de Berk in the Netherlands, based upon shabby statistical evidence).

Perhaps a more general phenomenon is at play, such as an epistemologic pathology of expert witnesses who feel empowered and unconstrained by speaking in court, to untutored judges or juries.  The thrill of power, the arrogance of asserted opinion, the advancement of causes and beliefs, the lure of lucre, the freedom from contradiction, and a whole array of personality quirks are strong inducements for expert witnesses, in many countries, to outrun their scientific headlights.  See Judge Jack B. Weinstein, “Preliminary Reflections on Administration of Complex Litigation” 2009 Cardozo L. Rev. de novo 1, 14 (2009) (describing plaintiffs’ expert witnesses in silicone litigation as “charlatans”; “[t]he breast implant litigation was largely based on a litigation fraud. … Claims—supported by medical charlatans—that enormous damages to women’s systems resulted could not be supported.”)

In any event, there have been notoriously bad verdicts in cases decided by trial judges as the finders of fact.  See, e.g., Wells v. Ortho Pharmaceutical Corp., 615 F. Supp. 262 (N.D. Ga. 1985), aff’d and rev’d in part on other grounds, 788 F.2d 741 (11th Cir.), cert. denied, 479 U.S. 950 (1986); Barrow v. Bristol-Myers Squibb Co., 1998 WL 812318, at *23 (M.D. Fla., Oct. 29, 1998) (finding for breast implant plaintiff whose claims were supported by dubious scientific studies), aff’d, 190 F.3d 541 (11th Cir. 1999).  Bad things can happen in the judicial process even without the participation of lay juries.

Green and Sanders are correct to point out that juries are often confused by scientific evidence, and lack the time, patience, education, and resources to understand it.  The same is true of judges.  The real difference is that the decisions of judges are public.  Judges are expected to explain their reasoning, and there is some, even if limited, appellate review for judicial gatekeeping decisions.  In this vein, Green and Sanders dismiss the hand wringing over disagreements among courts on admissibility decisions by noting that similar disagreements over evidentiary sufficiency issues fill the appellate reporters.  Id. at 37.  Green and Sanders might well add that at least the disagreements are out in the open, advanced with supporting reasoning, for public discussion and debate, unlike the unimpeachable verdicts of juries and their cloistered, secretive reasoning or lack thereof.

In addition, Green and Sanders fail to mention a considerable problem:  the admission of weak, pathologic, or overstated scientific opinion undermines confidence in the judicial judgments based upon verdicts that come out of a process that featured the dubious opinions of the expert witnesses.  The public embarrassment of the court system for its judgments, based upon questionable expert witness opinion testimony, was a strong inducement to changing the libertine pre-Daubert laissez-faire approach.

 

4.  Failure to consider the important role of Rule 703, which is quite independent of any “sufficiency” considerations, in the gatekeeping process.  Green and Sanders properly acknowledge the historical role that Rule 703, of the Federal Rules of Evidence, played in judicial attempts to regain some semblance of control over expert witness opinion.  They do not pursue the issue of its present role, which is often neglected and underemphasized.  In part, Rule 703, with its requirement that courts screen expert witness reliance upon independently inadmissible evidence (which means virtually all epidemiologic and animal studies and their data analyses), goes to the heart of gatekeeping by requiring judges to examine the quality of study data, and the reasonableness of reliance upon such data, by testifying expert witnesses.  See Schachtman, RULE OF EVIDENCE 703 — Problem Child of Article VII (Sept. 19, 2011).  Curiously, the authors try to force Rule 703 into their sufficiency pigeonhole even though it calls for a specific inquiry into the reasonableness (vel non) of reliance upon specific (hearsay or otherwise inadmissible) studies.  In my view, Rule 703 is predominantly a validity, and not a sufficiency, inquiry.

Judge Weinstein’s use of Rule 703, in In re Agent Orange, to strip out the most egregiously weak evidence did not predominantly speak to the evidentiary insufficiency of the plaintiffs’ expert witnesses’ reliance materials; nor did it look to the defendants’ expert witnesses’ reliance upon contradicting evidence.  Judge Weinstein was troubled by the plaintiffs’ expert witnesses’ reliance upon hearsay statements, from biased witnesses, of the plaintiffs’ medical condition.  Judge Weinstein did, of course, famously apply sufficiency criteria, including relative risks too low to permit an inference of specific causation, and the insubstantial totality of the evidence, but Judge Weinstein’s judicial philosophy then was to reject Rule 702 as a quality-control procedure for expert witness opinion testimony.  See In re Agent Orange Product Liab. Litig., 597 F. Supp. 740, 785, 817 (E.D.N.Y. 1984) (plaintiffs must prove at least a two-fold increase in rate of disease allegedly caused by the exposure), aff’d, 818 F.2d 145, 150-51 (2d Cir. 1987) (approving district court’s analysis), cert. denied sub nom. Pinkney v. Dow Chemical Co., 484 U.S. 1004 (1988); see also In re “Agent Orange” Prod. Liab. Litig., 611 F. Supp. 1223, 1240, 1262 (E.D.N.Y. 1985), aff’d, 818 F.2d 187 (2d Cir. 1987), cert. denied, 487 U.S. 1234 (1988).  A decade later, in the breast implant litigation, Judge Weinstein adhered to his rejection of Rule 702 as a vehicle for explicit expert witness validity rulings or sufficiency determinations, and instead granted summary judgment on the entire evidentiary display.  This assessment of sufficiency was not, however, driven by the rules of evidence; it was based firmly upon Federal Rule of Civil Procedure 56’s empowerment of the trial judge to make an overall assessment that plaintiffs lack a submissible case.  See In re Breast Implant Cases, 942 F. Supp. 958 (E. & S.D.N.Y. 1996) (granting summary judgment because of insufficiency of plaintiffs’ evidence, but specifically declining to rule on defendants’ Rule 702 and Rule 703 motions).  Within a few years, court-appointed expert witnesses, and the Institute of Medicine, weighed in with withering criticisms of plaintiffs’ attempted scientific case.  Given that there was so little valid evidence, sufficiency really never was at issue for these experts, but Judge Weinstein chose to frame the issue as sufficiency to avoid ruling on the pending motions under Rule 702.

 

5. Re-analyzing Re-analysis.  In the Bendectin litigation, some of the plaintiffs’ expert witnesses sought to offer various re-analyses of published papers.  Defendant Merrell Dow objected, and appears to have framed its objections as a general challenge to unpublished re-analyses of published papers.  Green and Sanders properly note that some of the defense arguments, to the extent stated generally as prohibitions against re-analyses, were overblown and overstated.  Re-analyses can take so many forms, and the quality of peer-reviewed papers is so variable, that it would be foolhardy to frame a judicial rule as a prohibition against re-analyzing data in published studies.  Indeed, so many studies are published with incorrect statistical analyses that parties and expert witnesses have an obligation to call the problems to the courts’ attention, and to correct the problems when possible.

The notion that peer review was important in any way to serve as a proxy for reliability or validity has not been borne out.  Similarly, the suggestion that reanalyses of existing data from published papers were presumptively suspect was also not well considered.  Id. at 13.

 

6. Comments dismissive of statistical significance and methodological rigor.  Judgments of causality are, at the end of the day, qualitative judgments, but is it really true that:

“Ultimately, of course, regardless of how rigorous the methodology of more probative studies, the magnitude of any result and whether it is statistically significant, judgment and inference is required as to whether the available research supports an inference of causation.”

Id. at 16 (citing among sources a particularly dubious case, Milward v. Acuity Specialty Prods. Group, Inc., 639 F.3d 11 (1st Cir. 2011), cert. denied, ___ U.S. ___ (2012)).  Can the authors really intend to say that the judgment of causal inference is or should be made “regardless” of the rigor of methodology, regardless of statistical significance, regardless of a hierarchy of study evidentiary probativeness?  Perhaps the authors simply meant to say that, at the end of the day, judgments of causal inference are qualitative judgments.  As much as I would like to extend the principle of charity to the authors, their own labeling of appellate decisions contrary to Milward as “silly” makes the benefit of the doubt seem inappropriate.

 

7.  The shame of scientists and physicians opining on specific causation.  Green and Sanders acknowledge that judgments of specific causation – the causation of harm in a specific person – are often uninformed by scientific considerations, and that Daubert criteria are unhelpful.

“Unfortunately, outside the context of litigation this is an inquiry to which most doctors devote very little time.46  True, they frequently serve as expert witnesses in such cases (because the law demands evidence on this issue) but there is no accepted scientific methodology for determining the cause of an individual’s disease and, therefore, the error rate is simply unknown and unquantifiable.47”

Id. at 18.  (Professor Green’s comments at the conference seemed even more apodictic.)  The authors, however, seem to have no sense of outrage that expert witnesses offer opinions on this topic, for which the witnesses have no epistemic warrant, and that courts accept these facile, if not fabricated, judgments.  Furthermore, specific causation is very much a scientific issue.  Scientists may, as a general matter, concentrate on population studies that show associations, which may be found to be causal, but some scientists have worked on gene associations that define extremely high risk sub-populations that determine the overall population risk.  As Green and Sanders acknowledge, when the relative risks are extremely high (say > 100), we do not need to use any fancy math to know that most cases in the exposed group would not have occurred but for the exposure.  A tremendous amount of scientific work has been done to identify biomarkers of increased risk, and to tie the increased risk to an agent-specific causal mechanism.  See, e.g., Gregory L. Erexson, James L. Wilmer, and Andrew D. Kligerman, “Sister Chromatid Exchange Induction in Human Lymphocytes Exposed to Benzene and Its Metabolites in Vitro,” 45 Cancer Research 2471 (1985).
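The arithmetic behind the relative-risk point can be sketched in a few lines of Python.  This is my own illustration, not anything taken from the Green and Sanders manuscript.  It assumes the conventional attributable-fraction formula, (RR - 1)/RR, applied to an unbiased and unconfounded relative risk, and it assumes that exposure causes additional cases rather than merely accelerating cases that would have occurred anyway.  The function name is hypothetical.

```python
def attributable_fraction(relative_risk: float) -> float:
    """Share of cases among the exposed attributable to exposure: (RR - 1) / RR.

    Assumes an unbiased, unconfounded relative risk (an illustrative
    simplification, not a claim about any particular study).
    """
    if relative_risk <= 0:
        raise ValueError("relative risk must be positive")
    # A relative risk at or below 1 leaves no excess cases to attribute.
    return max(0.0, (relative_risk - 1.0) / relative_risk)


if __name__ == "__main__":
    for rr in (1.5, 2.0, 10.0, 100.0):
        share = attributable_fraction(rr)
        print(f"RR = {rr:6.1f}  attributable fraction = {share:.2f}")
```

On these assumptions, a relative risk of 2 corresponds to an attributable fraction of exactly one half, which is the intuition behind the two-fold threshold in the Agent Orange decisions, and a relative risk of 100 corresponds to 0.99.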

 

8. Sufficiency versus admissibility.  Green and Sanders opine that many gatekeeping decisions, such as the Bendectin and breast implant cases, should be understood as sufficiency decisions that have incorporated the significant exculpatory epidemiologic evidence offered by defendants.  Id. at 20.  The “mature epidemiologic evidence” overwhelmed the plaintiffs’ meager evidence to the point that a jury verdict was not sustainable as a matter of law.  Id.  The authors’ approach requires a weighing of the complete evidentiary display, “entirely apart from the [plaintiffs’] expert’s testimony,” to determine the overall sufficiency and reasonableness of the claimed inference of causation.  Id. at 21.  What is missing, however, from this approach is that even without the defendants’ mature or solid body of epidemiologic evidence, the plaintiff’s expert witness was urging an inference of causation based upon fairly insubstantial evidence.  Green and Sanders are concerned, no doubt, that if sufficiency were the main driver of exclusionary rulings, then it would be difficult to justify the disconnect between the appellate standard of review for expert witness opinion admissibility, under which rulings are reversed only for an “abuse of discretion” by the trial court, and the standard of review for typical grants of summary judgment, which are evaluated “de novo” by the appellate court.  Green and Sanders hint that the expert witness decisions, which they see as mainly sufficiency judgments, may not be appropriate for the non-searching “abuse of discretion” standard.  See id. at 40 – 41 (citing the asymmetric “hard look” approach taken in In re Paoli RR Yard PCB Litig., 35 F.3d 717, 749-5- (3d Cir. 1994), and in the intermediate appellate court in Joiner itself).  Of course, the Supreme Court’s decision in Joiner was an abandonment of something akin to de novo hard-look appellate review, lopsidedly applied to exclusions only.  Decisions to admit did not lead to summary dispositions without trial and thus were never given any meaningful appellate review.

Elsewhere, Green and Sanders note that they do not necessarily share the doubts of the “hand wringers” over the inconsistent exclusionary rulings that result from an abuse of discretion standard.  At the end of their article, however, the authors note that viewing expert witness opinion exclusions as “sufficiency determinations” raises the question whether appellate courts should review these determinations de novo, as they would review ordinary factual “no evidence” or “insufficient evidence” grants of summary judgment.  Id. at 40.  There are reasonable arguments both ways, but it is worth pointing out that appellate decisions affirming rulings going both ways on the same expert witnesses, opining about the same litigated causal issue, are different from jury verdicts going both ways on causation.  First, the reasoning of the courts is, we hope, set out for public consumption, discussion, and debate, in a way that a jury’s deliberations are not.  Second, the fact of decisions “going both ways” is a statement that the courts view the issue as close and subject to debate.  Third, if the scientific and legal communities are paying attention, as they should, they can weigh in on the disparity, and on the stated reasons.  Assuming that courts are amenable to good reasons, they may have the opportunity to revisit the issue in a way that juries, which serve only once on the causal issue, can never do.  We might hope that the better reasoned decisions, especially those that were supported by the disinterested scientific community, would have some persuasive authority.

 

9.  Abridgment of Rule 702’s approach to gatekeeping.  The authors’ approach to sufficiency also suffers from ignoring not only Rule 703’s required inquiry into the reasonableness of reliance upon individual studies, but also Rule 702(c) and (d), which require that:

(c) the testimony is the product of reliable principles and methods; and

(d) the expert has reliably applied the principles and methods to the facts of the case.

These subsections of Rule 702 do not readily allow the use of proxy or substitute measures of validity or reliability; they require the trial court to assess the expert witnesses’ reasoning from data to conclusions. In large part, Green and Sanders have been misled by the instincts of courts to retreat to proxies for validity in the form of “general acceptance,” “peer review,” and contrary evidence that makes the challenged opinion appear “insubstantial.”

There is a substantial danger that Green and Sanders’ reductionist approach, and their equation of admissibility with sufficiency, will undermine trial courts’ willingness to assess the more demanding, and time-consuming, validity claims that are inherent in all expert witness causation opinions.

 

10. Weight-of-the-evidence (WOE) reasoning.  The authors appear captivated by the use of so-called weight-of-the-evidence (WOE) reasoning, questionably featured in some recent judicial decisions.  The so-called WOE method is really not much of a method at all, but rather a hand-waving process that often excuses the poverty of data and valid analysis.  See, e.g., Douglas L. Weed, “Weight of Evidence: A Review of Concept and Methods,” 25 Risk Analysis 1545 (2005) (noting the vague, ambiguous, indefinite nature of the concept of “weight of evidence” review).  See also Schachtman, “Milward — Unhinging the Courthouse Door to Dubious Scientific Evidence” (Sept. 2, 2011).

In Allen v. Pennsylvania Engineering Corp., 102 F.3d 194 (5th Cir.1996), the appellate court disparaged WOE as a regulatory tool for making precautionary judgments, not fit for civil litigation that involves actual causation as opposed to “as if” judgments.  Green and Sanders pejoratively label the Allen court’s approach as “silly”:

“The idea that a regulatory agency would make a carcinogenicity determination if it were not the best explanation of the evidence, i.e., more likely than not, is silly.”

Id. at 29 n.82 (emphasis added).  But silliness is as silliness does.  Only a few pages later in their paper, Green and Sanders admit that:

“As some courts have noted, the regulatory threshold is lower than required in tort claims. With respect to the decision of the FDA to withdraw approval of Parlodel, the court in Glastetter v. Novartis Pharmaceuticals Corp., 107 F. Supp. 2d 1015 (E.D. Mo. 2000), judgment aff’d, 252 F.3d 986 (8th Cir. 2001), commented that the FDA’s withdrawal statement, “does not establish that the FDA had concluded that bromocriptine can cause an ICH [intracerebral hemorrhage]; instead, it indicates that in light of the limited social utility of bromocriptine in treating lactation and the reports of possible adverse effects, the drug should no longer be used for that purpose. For these reasons, the court does not believe that the FDA statement alone establishes the reliability of plaintiffs’ experts’ causation testimony.” Glastetter v. Novartis Pharmaceuticals Corp., 107 F. Supp. 2d 1015 (E.D. Mo. 2000), aff’d, 252 F.3d 986 (8th Cir. 2001).”

Id. at 34 n.101.  Not only do the authors appear to contradict themselves on the burden of persuasion for regulatory decisions, they offer no support for their silliness indictment.  Certainly, regulatory decisions, and not only the FDA’s, are frequently based upon precautionary principles that involve applying uncertain, ambiguous, or confusing data analyses to the process of formulating protective rules and regulations in the absence of scientific knowledge.  Unlike regulatory agencies, which operate under the Administrative Procedure Act, federal courts, and many state courts, operate under Rules 702 and 703, which require that expert witness opinion have the epistemic warrant of “knowledge,” not hunch, conjecture, or speculation.