TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Admissibility versus Sufficiency of Expert Witness Evidence

April 18th, 2012

Professors Michael Green and Joseph Sanders are two of the longest-serving interlocutors in the never-ending debate about the nature and limits of expert witness testimony on scientific questions of causation.  Both have made important contributions to the conversation, and both have been influential in academic and judicial circles.  Professor Green has served as the co-reporter for the American Law Institute’s Restatement (Third) of Torts: Liability for Physical Harm.  Right or wrong, new publications on expert witness issues by Green or Sanders call for close attention.

Early last month, Professors Green and Sanders presented together at a conference on “Admissibility Versus Sufficiency: Controlling the Quality of Expert Witness Testimony in the United States.”  Video and audio of their presentation are available online.  The authors posted a manuscript of their draft article on expert witness testimony to the Social Science Research Network.  See Michael D. Green & Joseph Sanders, “Admissibility Versus Sufficiency: Controlling the Quality of Expert Witness Testimony in the United States” <downloaded on March 25, 2012>.

The authors argue that most judicial exclusions of expert witness causal opinion testimony are based upon a judgment that the challenged witness’s opinion rests upon insufficient evidence.  They point to litigations, such as the Bendectin and silicone gel breast implant cases, where the defense challenges were supported in part by a body of “exonerative” epidemiologic studies.  Legal theory construction is always fraught with danger: either the theory stands to be readily refuted by counterexample, or it is put forward as a normative, prescriptive tool to change the world, and thus lacks any descriptive or explanatory component.  Green and Sanders, however, seem earnest in suggesting that their reductionist approach is both descriptive and elucidative of actual judicial practice.

The authors’ reductionist approach, especially as applied to the Bendectin and silicone decisions, however, ignores that even before the so-called exonerative epidemiology on Bendectin and silicone was available, the plaintiffs’ expert witnesses were presenting opinions on general and specific causation based upon studies and evidence of dubious validity.  Given that the silicone litigation erupted before Daubert was decided, and the Bendectin cases pretty much ended with Daubert, neither litigation really permits a clean before-and-after picture.  Before Daubert, courts struggled with how to handle both the invalidity and the insufficiency (once the impermissible inferences were stripped away) in the Bendectin cases.  And before Daubert, all silicone cases went to the jury.  Even after Daubert, for some time, silicone cases resulted in jury verdicts, which were upheld on appeal.  It took defendants some time to uncover the nature and extent of the invalidity in plaintiffs’ expert witnesses’ opinions, the invalidity of the studies upon which those witnesses relied, and the unreasonableness of the witnesses’ reliance upon various animal and in vitro toxicologic and immunologic studies.  And it took trial courts a few years after the Supreme Court’s 1993 Daubert decision to warm up to their new assignment.  Indeed, Green and Sanders get a good deal of mileage in their reductionist approach from trial and appellate courts that were quite willing to collapse the distinction between reliability or validity, on the one hand, and sufficiency, on the other.  Some of those “back-benching” courts used consensus statements and reviews, which both marshaled the contrary evidence and documented the invalidity of the would-be affirmative evidence.  This judicial reliance upon external sources that encompassed both sufficiency and reliability should not be understood to mean that reliability (or validity) is nothing other than sufficiency.

A post-Daubert line of cases is more revealing:  the claim that the ethyl mercury vaccine preservative, thimerosal, causes autism.  Professors Green and Sanders touch briefly upon this litigation.  See Blackwell v. Wyeth, 971 A.2d 235 (Md. 2009).  Plaintiffs’ expert witness, David Geier, had published several articles in which he claimed to have supported a causal nexus between thimerosal and autism.  Green and Sanders dutifully note that the Maryland courts ultimately rejected the claims because Geier’s data, standing alone, were wholly inadequate to support the inference he zealously urged be drawn.  Id. at 32.  Whether this is insufficiency, or the invalidity of his ultimate inference of causation from an inadequate data set, can perhaps be debated, but surely the validity concerns should not be lost in the shuffle of evaluating the available evidence.  Of course, exculpatory epidemiologic studies, based upon high-quality data and inferences, ultimately were published, but strictly speaking, these studies were not necessary to the process of ruling Geier’s advocacy science out of bounds for valid scientific discourse and legal proceedings.

Some additional comments.

 

1. Questionable reductionism.  The authors describe the thrust of their argument as a need to understand judicial decisions on expert witness admissibility as “sufficiency judgments.”  While their analysis simplifies the gatekeeping decisions, it also abridges the process in a way that omits important determinants of the law and its application.  Sufficiency, or the lack thereof, often figures as a fatal deficiency in expert witness opinion testimony on causal issues, but the authors’ attempt to reduce many exclusionary decisions to insufficiency determinations ignores the many ways that expert witnesses (and scientists in the real world outside of courtrooms) go astray.  The authors’ reductionism seems a weak, if not flawed, predictive, explanatory, and normative theory of expert witness gatekeeping.  Furthermore, this reductionism holds a false allure for judges, who may be tempted to oversimplify their gatekeeping task by conflating gatekeeping with the jury’s role:  exclude the proffered expert witness opinion testimony because, considering all the available evidence, the testimony is probably wrong.

 

2. Weakness of peer review, publication, and general acceptance in predicting gatekeeping decisions.  The authors further describe a “sufficiency approach” as openly acknowledging the relative unimportance of peer review, publication, and general acceptance.  Id. at 39.  These factors do not lack importance because they are unrelated to sufficiency; they are unimportant because they are weak proxies for validity.  Their presence or absence does not really help predict whether the causal opinion offered is invalid or otherwise unreliable.  The existence of published, high-quality, peer-reviewed systematic reviews does, however, bear on the sufficiency of the evidence.  At least in some cases, courts consider such reviews and rely upon them heavily in reaching a decision under Rule 702, but we should ask to what extent the court has simply avoided the hard work of thinking through the problem on its own.

 

3. Questionable indictment of juries and the adversarial system for the excesses of expert witnesses.  Professors Green and Sanders describe the development of common law, and rules, to control expert witness testimony as “a judicial attempt to moderate the worst consequences of two defining characteristics of United States civil trials:  party control of experts and the widespread use of jury decision makers.”  Id. at 2.  There is no doubt that these are two contributing factors in some of the worst excesses, but the authors really offer no support for their causal judgment.  The experience of courts in Europe, where civil juries and party control of expert witnesses are often absent from the process, raises questions about Green and Sanders’ attribution.  See, e.g., R. Meester, M. Collins, R.D. Gill & M. van Lambalgen, “On the (ab)use of statistics in the legal case against the nurse Lucia de B.,” 5 Law, Probability and Risk 233 (2007) (describing the conviction of Nurse Lucia de Berk in the Netherlands, based upon shabby statistical evidence).

Perhaps a more general phenomenon is at play, such as an epistemologic pathology of expert witnesses who feel empowered and unconstrained by speaking in court, to untutored judges or juries.  The thrill of power, the arrogance of asserted opinion, the advancement of causes and beliefs, the lure of lucre, the freedom from contradiction, and a whole array of personality quirks are strong inducements for expert witnesses, in many countries, to outrun their scientific headlights.  See Judge Jack B. Weinstein, “Preliminary Reflections on Administration of Complex Litigation” 2009 Cardozo L. Rev. de novo 1, 14 (2009) (describing plaintiffs’ expert witnesses in silicone litigation as “charlatans”; “[t]he breast implant litigation was largely based on a litigation fraud. … Claims—supported by medical charlatans—that enormous damages to women’s systems resulted could not be supported.”)

In any event, there have been notoriously bad verdicts in cases decided by trial judges as the finders of fact.  See, e.g., Wells v. Ortho Pharmaceutical Corp., 615 F. Supp. 262 (N.D. Ga. 1985), aff’d and rev’d in part on other grounds, 788 F.2d 741 (11th Cir.), cert. denied, 479 U.S. 950 (1986); Barrow v. Bristol-Myers Squibb Co., 1998 WL 812318, at *23 (M.D. Fla. Oct. 29, 1998) (finding for breast implant plaintiff whose claims were supported by dubious scientific studies), aff’d, 190 F.3d 541 (11th Cir. 1999).  Bad things can happen in the judicial process even without the participation of lay juries.

Green and Sanders are correct to point out that juries are often confused by scientific evidence, and lack the time, patience, education, and resources to understand it.  The same is true of judges.  The real difference is that the decisions of judges are public.  Judges are expected to explain their reasoning, and there is some, even if limited, appellate review of judicial gatekeeping decisions.  In this vein, Green and Sanders dismiss the hand wringing over disagreements among courts on admissibility decisions by noting that similar disagreements over evidentiary sufficiency issues fill the appellate reporters.  Id. at 37.  Green and Sanders might well add that at least the disagreements are out in the open, advanced with supporting reasoning, for public discussion and debate, unlike the unimpeachable verdicts of juries, with their cloistered, secretive reasoning or lack thereof.

In addition, Green and Sanders fail to mention a considerable problem:  the admission of weak, pathologic, or overstated scientific opinion undermines confidence in judicial judgments based upon verdicts that come out of a process featuring the dubious opinions of expert witnesses.  The public embarrassment of the court system over its judgments, based upon questionable expert witness opinion testimony, was a strong inducement to change the laissez-faire, pre-Daubert approach.

 

4.  Failure to consider the important role of Rule 703, which is quite independent of any “sufficiency” considerations, in the gatekeeping process.  Green and Sanders properly acknowledge the historical role that Rule 703, of the Federal Rules of Evidence, played in judicial attempts to regain some semblance of control over expert witness opinion.  They do not pursue the issue of its present role, which is often neglected and underemphasized.  In part, Rule 703, with its requirement that courts screen expert witness reliance upon independently inadmissible evidence (which means virtually all epidemiologic and animal studies and their data analyses), goes to the heart of gatekeeping by requiring judges to examine the quality of study data, and the reasonableness of reliance upon such data, by testifying expert witnesses.  See Schachtman, RULE OF EVIDENCE 703 — Problem Child of Article VII (Sept. 19, 2011).  Curiously, the authors try to force Rule 703 into their sufficiency pigeonhole even though it calls for a specific inquiry into the reasonableness (vel non) of reliance upon specific (hearsay or otherwise inadmissible) studies.  In my view, Rule 703 is predominantly a validity, and not a sufficiency, inquiry.

Judge Weinstein’s use of Rule 703, in In re Agent Orange, to strip out the most egregiously weak evidence did not predominantly speak to the evidentiary insufficiency of the plaintiffs’ expert witnesses’ reliance materials; nor did it look to the defendants’ expert witnesses’ reliance upon contradicting evidence.  Judge Weinstein was troubled by the plaintiffs’ expert witnesses’ reliance upon hearsay statements, from biased witnesses, about the plaintiffs’ medical condition.  Judge Weinstein did, of course, famously apply sufficiency criteria, including relative risks too low to permit an inference of specific causation, and the insubstantial totality of the evidence, but his judicial philosophy then was to reject Rule 702 as a quality-control procedure for expert witness opinion testimony.  See In re Agent Orange Product Liab. Litig., 597 F. Supp. 740, 785, 817 (E.D.N.Y. 1984) (plaintiffs must prove at least a two-fold increase in rate of disease allegedly caused by the exposure), aff’d, 818 F.2d 145, 150-51 (2d Cir. 1987) (approving district court’s analysis), cert. denied sub nom. Pinkney v. Dow Chemical Co., 484 U.S. 1004 (1988); see also In re “Agent Orange” Prod. Liab. Litig., 611 F. Supp. 1223, 1240, 1262 (E.D.N.Y. 1985), aff’d, 818 F.2d 187 (2d Cir. 1987), cert. denied, 487 U.S. 1234 (1988).  A decade later, in the breast implant litigation, Judge Weinstein adhered to his rejection of Rule 702 as a vehicle for explicit expert witness validity or sufficiency rulings, and instead granted summary judgment on the entire evidentiary display.  This assessment of sufficiency was not, however, driven by the rules of evidence; it was based firmly upon Federal Rule of Civil Procedure 56’s empowerment of the trial judge to make an overall assessment that plaintiffs lack a submissible case.  See In re Breast Implant Cases, 942 F. Supp. 958 (E. & S.D.N.Y. 1996) (granting summary judgment because of the insufficiency of plaintiffs’ evidence, but specifically declining to rule on defendants’ Rule 702 and Rule 703 motions).  Within a few years, court-appointed expert witnesses, and the Institute of Medicine, weighed in with withering criticisms of plaintiffs’ attempted scientific case.  Given that there was so little valid evidence, sufficiency really never was at issue for these experts, but Judge Weinstein chose to frame the issue as sufficiency to avoid ruling on the pending Rule 702 motions.

 

5. Re-analyzing Re-analysis.  In the Bendectin litigation, some of the plaintiffs’ expert witnesses sought to offer various re-analyses of published papers.  Defendant Merrell Dow objected, and appears to have framed its objections as general objections to unpublished re-analyses of published papers.  Green and Sanders properly note that some of the defense arguments, to the extent stated generally as prohibitions against re-analyses, were overblown and overstated.  Re-analyses can take so many forms, and the quality of peer-reviewed papers is so variable, that it would be foolhardy to frame a judicial rule as a prohibition against re-analyzing data in published studies.  Indeed, so many studies are published with incorrect statistical analyses that parties and expert witnesses have an obligation to call the problems to the courts’ attention, and to correct the problems when possible.

The notion that peer review was important in any way to serve as a proxy for reliability or validity has not been borne out.  Similarly, the suggestion that reanalyses of existing data from published papers were presumptively suspect was also not well considered.  Id. at 13.

 

6. Comments dismissive of statistical significance and methodological rigor.  Judgments of causality are, at the end of the day, qualitative judgments, but is it really true that:

“Ultimately, of course, regardless of how rigorous the methodology of more probative studies, the magnitude of any result and whether it is statistically significant, judgment and inference is required as to whether the available research supports an inference of causation.”

Id. at 16 (citing among its sources a particularly dubious case, Milward v. Acuity Specialty Prods. Group, Inc., 639 F.3d 11 (1st Cir. 2011), cert. denied, ___ U.S. ___ (2012)).  Can the authors really intend to say that the judgment of causal inference is, or should be, made “regardless” of the rigor of the methodology, regardless of statistical significance, and regardless of a hierarchy of the evidentiary probativeness of studies?  Perhaps the authors simply meant to say that, at the end of the day, judgments of causal inference are qualitative judgments.  As much as I would like to extend the principle of charity to the authors, their own labeling of appellate decisions contrary to Milward as “silly” makes the benefit of the doubt seem inappropriate.

 

7.  The shame of scientists and physicians opining on specific causation.  Green and Sanders acknowledge that judgments of specific causation – the causation of harm in a specific person – are often uninformed by scientific considerations, and that Daubert criteria are unhelpful.

“Unfortunately, outside the context of litigation this is an inquiry to which most doctors devote very little time.46  True, they frequently serve as expert witnesses in such cases (because the law demands evidence on this issue) but there is no accepted scientific methodology for determining the cause of an individual’s disease and, therefore, the error rate is simply unknown and unquantifiable.47”

Id. at 18.  (Professor Green’s comments at the conference seemed even more apodictic.)  The authors, however, seem to have no sense of outrage that expert witnesses offer opinions on this topic, for which the witnesses have no epistemic warrant, and that courts accept these facile, if not fabricated, judgments.  Furthermore, specific causation is very much a scientific issue.  Scientists may, as a general matter, concentrate on population studies that show associations, which may be found to be causal, but some scientists have worked on gene associations that define extremely high-risk sub-populations, which drive the overall population risk.  As Green and Sanders acknowledge, when the relative risks are extremely high (say > 100), we do not need any fancy math to know that most cases in the exposed group would not have occurred but for the exposure.  A tremendous amount of scientific work has been done to identify biomarkers of increased risk, and to tie the increased risk to an agent-specific causal mechanism.  See, e.g., Gregory L. Erexson, James L. Wilmer, and Andrew D. Kligerman, “Sister Chromatid Exchange Induction in Human Lymphocytes Exposed to Benzene and Its Metabolites in Vitro,” 45 Cancer Research 2471 (1985).
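The high-relative-risk arithmetic that Green and Sanders acknowledge can be sketched briefly.  (The function and numbers below are illustrative only; nothing here is drawn from their paper.)

```python
# Illustrative sketch only: the attributable fraction among the exposed,
# AF = (RR - 1) / RR, is commonly read as the probability that a given
# exposed case is attributable to the exposure.

def attributable_fraction(relative_risk: float) -> float:
    """Fraction of cases among the exposed attributable to the exposure."""
    if relative_risk <= 1.0:
        return 0.0  # no excess risk, nothing to attribute
    return (relative_risk - 1.0) / relative_risk

print(f"RR = 2:   AF = {attributable_fraction(2.0):.0%}")    # 50%
print(f"RR = 100: AF = {attributable_fraction(100.0):.0%}")  # 99%
```

With a relative risk above 100, ninety-nine percent of the exposed cases are attributable to the exposure, which is why no fancy math is needed at that extreme.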

 

8. Sufficiency versus admissibility.  Green and Sanders opine that many gatekeeping decisions, such as those in the Bendectin and breast implant cases, should be understood as sufficiency decisions that have incorporated the significant exculpatory epidemiologic evidence offered by defendants.  Id. at 20.  The “mature epidemiologic evidence” overwhelmed the plaintiffs’ meager evidence to the point that a jury verdict was not sustainable as a matter of law.  Id.  The authors’ approach requires a weighing of the complete evidentiary display, “entirely apart from the [plaintiffs’] expert’s testimony,” to determine the overall sufficiency and reasonableness of the claimed inference of causation.  Id. at 21.  What is missing from this approach, however, is that even without the defendants’ mature or solid body of epidemiologic evidence, the plaintiffs’ expert witnesses were urging an inference of causation based upon fairly insubstantial evidence.  Green and Sanders are concerned, no doubt, that if sufficiency were the main driver of exclusionary rulings, a disconnect would open up between the appellate standard of review for expert witness opinion admissibility, which is reversed only for an “abuse of discretion” by the trial court, and the standard of review for typical grants of summary judgment, which are evaluated “de novo” by the appellate court.  Green and Sanders hint that the expert witness decisions, which they see as mainly sufficiency judgments, may not be appropriate for the non-searching “abuse of discretion” standard.  See id. at 40-41 (citing the asymmetric “hard look” approach taken in In re Paoli RR Yard PCB Litig., 35 F.3d 717, 749-50 (3d Cir. 1994), and in the intermediate appellate court in Joiner itself).  Of course, the Supreme Court’s decision in Joiner was an abandonment of something akin to de novo, hard-look appellate review, lopsidedly applied to exclusions only.  Decisions to admit did not lead to summary dispositions without trial, and thus were never given any meaningful appellate review.

Elsewhere, Green and Sanders note that they do not necessarily share the doubts of the “hand wringers” over the inconsistent exclusionary rulings that result from an abuse-of-discretion standard.  At the end of their article, however, the authors note that viewing expert witness opinion exclusions as “sufficiency determinations” raises the question whether appellate courts should review these determinations de novo, as they would review ordinary factual “no evidence” or “insufficient evidence” grants of summary judgment.  Id. at 40.  There are reasonable arguments both ways, but it is worth pointing out that appellate decisions affirming rulings going both ways on the same expert witnesses, opining about the same litigated causal issue, are different from jury verdicts going both ways on causation.  First, the reasoning of the courts is, we hope, set out for public consumption, discussion, and debate, in a way that a jury’s deliberations are not.  Second, the fact of decisions “going both ways” is a statement that the courts view the issue as close and subject to debate.  Third, if the scientific and legal communities are paying attention, as they should, they can weigh in on the disparity, and on the stated reasons.  Assuming that courts are amenable to good reasons, they may have the opportunity to revisit the issue in a way that juries, which serve only once on the causal issue, can never do.  We might hope that the better-reasoned decisions, especially those supported by the disinterested scientific community, would have some persuasive authority.

 

9.  Abridgment of Rule 702’s approach to gatekeeping.  The authors’ approach to sufficiency also suffers from ignoring not only Rule 703’s inquiry into the reasonableness of reliance upon individual studies, but also Rule 702(c) and (d), which require that:

(c) the testimony is the product of reliable principles and methods; and

(d) the expert has reliably applied the principles and methods to the facts of the case.

These subsections of Rule 702 do not readily allow the use of proxy or substitute measures of validity or reliability; they require the trial court to assess the expert witnesses’ reasoning from data to conclusions. In large part, Green and Sanders have been misled by the instincts of courts to retreat to proxies for validity in the form of “general acceptance,” “peer review,” and contrary evidence that makes the challenged opinion appear “insubstantial.”

There is a substantial danger that Green and Sanders’ reductionist approach, and their equation of admissibility with sufficiency, will undermine trial courts’ willingness to assess the more demanding, and time-consuming, validity claims that are inherent in all expert witness causation opinions.

 

10. Weight-of-the-evidence (WOE) reasoning.  The authors appear captivated by the use of so-called weight-of-the-evidence (WOE) reasoning, questionably featured in some recent judicial decisions.  The so-called WOE method is really not much of a method at all, but rather a hand-waving process that often excuses the poverty of data and valid analysis.  See, e.g., Douglas L. Weed, “Weight of Evidence: A Review of Concept and Methods,” 25 Risk Analysis 1545 (2005) (noting the vague, ambiguous, indefinite nature of the concept of “weight of evidence” review).  See also Schachtman, “Milward — Unhinging the Courthouse Door to Dubious Scientific Evidence” (Sept. 2, 2011).

In Allen v. Pennsylvania Engineering Corp., 102 F.3d 194 (5th Cir. 1996), the appellate court disparaged WOE as a regulatory tool for making precautionary judgments, not fit for civil litigation, which involves actual causation as opposed to “as if” judgments.  Green and Sanders pejoratively label the Allen court’s approach as “silly”:

“The idea that a regulatory agency would make a carcinogenicity determination if it were not the best explanation of the evidence, i.e., more likely than not, is silly.”

Id. at 29 n.82 (emphasis added).  But silliness is as silliness does.  Only a few pages later in their paper, Green and Sanders admit that:

“As some courts have noted, the regulatory threshold is lower than required in tort claims.  With respect to the decision of the FDA to withdraw approval of Parlodel, the court in Glastetter v. Novartis Pharmaceuticals Corp., 107 F. Supp. 2d 1015 (E.D. Mo. 2000), aff’d, 252 F.3d 986 (8th Cir. 2001), commented that the FDA’s withdrawal statement ‘does not establish that the FDA had concluded that bromocriptine can cause an ICH [intracerebral hemorrhage]; instead, it indicates that in light of the limited social utility of bromocriptine in treating lactation and the reports of possible adverse effects, the drug should no longer be used for that purpose.  For these reasons, the court does not believe that the FDA statement alone establishes the reliability of plaintiffs’ experts’ causation testimony.’”

Id. at 34 n.101.  Not only do the authors appear to contradict themselves on the burden of persuasion for regulatory decisions, but they also offer no support for their silliness indictment.  Certainly, regulatory decisions, and not only the FDA’s, are frequently based upon precautionary principles that involve applying uncertain, ambiguous, or confusing data analyses to the process of formulating protective rules and regulations in the absence of scientific knowledge.  Unlike regulatory agencies, which operate under the Administrative Procedure Act, federal courts, and many state courts, operate under Rules 702 and 703’s requirements that expert witness opinion have the epistemic warrant of “knowledge,” not hunch, conjecture, or speculation.

Relative Risk > Two in the Courts – Updated

March 3rd, 2012

See the updated case law on the issue of using relative and attributable risks to satisfy a plaintiff’s burden of showing, more likely than not, that an exposure or condition caused the plaintiff’s disease or injury.
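The doubling-of-risk logic behind this line of cases is standard epidemiologic arithmetic (a sketch, not drawn from any particular decision): for a relative risk $RR$, the attributable fraction among the exposed is

\[
\mathrm{AF} \;=\; \frac{RR - 1}{RR}, \qquad \mathrm{AF} > \tfrac{1}{2} \;\Longleftrightarrow\; RR > 2,
\]

so a relative risk greater than two is taken to show that a given exposed plaintiff’s disease was, more likely than not, caused by the exposure.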

When There Is No Risk in Risk Factor

February 20th, 2012

Some of the terminology of statistics and epidemiology is not only confusing but misleading.  Consider the terms “effect size,” “random effects,” and “fixed effect,” which are all used to describe associations even when they are known to be non-causal.  Biostatisticians and epidemiologists know that the terms refer to putative or potential effects, but the sloppy, shorthand nomenclature can be misleading.

Although “risk” has a fairly precise meaning in scientific parlance, the usage of “risk factor” is fuzzy, loose, and imprecise.  Journalists and plaintiffs’ lawyers use “risk factor” much as they use another frequently abused term in their vocabulary:  “link.”  Both “risk factor” and “link” sound as though they are “causes,” or at least as though they have something to do with causation.  The reality is usually otherwise.

The business of exactly what “risk factor” means is puzzling and disturbing.  The phrase seems to have gained currency because it is squishy and without a definite meaning.  Like the use of “link” by journalists, the use of “risk factor” protects the speaker against contradiction, but appears to imply a scientifically valid conclusion.  Plaintiffs’ counsel and witnesses love to throw this phrase around precisely because of its ambiguity.  In journal articles, authors sometimes refer to any exposure inquired about in a case-control study to be a “risk factor,” regardless of the study result.  So a risk factor can be merely an “exposure of interest,” or a possible cause, or a known cause.

The author’s meaning in using the phrase “risk factor” can often be discerned from context.  When an article reports a case-control study that finds an association between exposure to some chemical and the outcome, the article will likely report, in the discussion section, that the study found that chemical to be a risk factor.  The context here makes clear that the chemical was found to be associated with the outcome, and that chance was excluded as a likely explanation because the odds ratio was statistically significant.  The context is equally clear that the authors did not conclude that the chemical was a cause of the outcome, because they did not rule out bias or confounding, they did not conduct any appropriate analysis for reaching a causal conclusion, and their single study would not have justified a causal conclusion in any event.
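The statistical step described above, finding a statistically significant odds ratio in a case-control study, can be illustrated with a toy example.  (The 2×2 counts below are invented purely for illustration.)

```python
import math

# Toy case-control table (invented counts, purely illustrative):
#              exposed   unexposed
# cases          a=40       b=60
# controls       c=20       d=80
a, b, c, d = 40, 60, 20, 80

odds_ratio = (a * d) / (b * c)

# Woolf's method: approximate 95% confidence interval on the log odds ratio
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
log_or = math.log(odds_ratio)
ci_low = math.exp(log_or - 1.96 * se_log_or)
ci_high = math.exp(log_or + 1.96 * se_log_or)

print(f"OR = {odds_ratio:.2f}, 95% CI ({ci_low:.2f}, {ci_high:.2f})")
# -> OR = 2.67, 95% CI (1.42, 5.02)
```

Because the interval excludes 1.0, the association is “statistically significant” at the conventional level; as the text notes, that addresses chance only, and says nothing about bias, confounding, or causation.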

Sometimes authors qualify “risk factor” with an adjective to give more specific meaning to their usage.  Some of the adjectives used in connection with the phrase include:

– putative, possible, potential, established, well-established, known, certain, causal, and causative

The use of the adjective highlights the absence of a precise meaning for “risk factor,” standing alone.  Adjectives such as “established,” or “known” imply earlier similar findings, which are corroborated by the study at hand.  Unless “causal” is used to modify “risk factor,” however, there is no reason to interpret the unqualified phrase to imply a cause.

Here is how the phrase “risk factor” is described in some noteworthy texts and treatises.

Legal Treatises

Professor David Faigman and colleagues, with some understatement, note that the term “risk factor” is “rather loosely used”:

“Risk Factor An aspect of personal behavior or life-style, an environmental exposure, or an inborn or inherited characteristic, which on the basis of epidemiologic evidence is known to be associated with health-related condition(s) considered important to prevent. The term risk factor is rather loosely used, with any of the following meanings:

1. An attribute or exposure that is associated with an increased probability of a specified outcome, such as the occurrence of a disease. Not necessarily a causal factor.

2. An attribute or exposure that increases the probability of occurrence of disease or other specified outcome.

3. A determinant that can be modified by intervention, thereby reducing the probability of occurrence of disease or other specified outcomes.”

David L. Faigman, Michael J. Saks, Joseph Sanders, and Edward Cheng, Modern Scientific Evidence:  The Law and Science of Expert Testimony 301, vol. 1 (2010)(emphasis added).

The Reference Manual on Scientific Evidence (2011) (RMSE3d) does not offer much in the way of meaningful guidance here.  The chapter on statistics in the third edition provides a somewhat circular, and unhelpful, definition.  Here is the entry in that chapter’s glossary:

risk factor. See independent variable.

RMSE3d at 295.  If the glossary defined “independent variable” as simply a quantifiable variable being examined for some potential relationship with the outcome, or dependent, variable, the RMSE would have avoided error.  Instead, the chapter’s glossary, as well as its text, defines independent variables as “causes,” which begs the question: why do a study to determine whether the “independent variable” is even a candidate for a causal factor?  Here is how the statistics chapter’s glossary defines independent variable:

“Independent variables (also called explanatory variables, predictors, or risk factors) represent the causes and potential confounders in a statistical study of causation; the dependent variable represents the effect. ***”

RMSE3d at 288.  This is surely circular.  Studies of causation are using independent variables that represent causes?  There would be no reason to do the study if we already knew that the independent variables were causes.

The text of the RMSE chapter on statistics propagates the same confusion:

“When investigating a cause-and-effect relationship, the variable that represents the effect is called the dependent variable, because it depends on the causes.  The variables that represent the causes are called independent variables. With a study of smoking and lung cancer, the independent variable would be smoking (e.g., number of cigarettes per day), and the dependent variable would mark the presence or absence of lung cancer. Dependent variables also are called outcome variables or response variables. Synonyms for independent variables are risk factors, predictors, and explanatory variables.”

RMSE3d at 219.  In the text, the identification of causes with risk factors is explicit.  Independent variables are the causes, and a synonym for an independent variable is “risk factor.”  The chapter could have avoided this error simply by the judicious use of “putative” or “candidate” in front of “causes.”

The chapter on epidemiology exercises more care by using “potential” to modify and qualify the risk factors that are considered in a study:

“In contrast to clinical studies in which potential risk factors can be controlled, epidemiologic investigations generally focus on individuals living in the community, for whom characteristics other than the one of interest, such as diet, exercise, exposure to other environmental agents, and genetic background, may distort a study’s results.”

RMSE3d at 556 (emphasis added).

 

Scientific Texts

Turning our attention to texts on epidemiology written for professionals rather than judges, we find that the term “risk factor” is sometimes used with a careful awareness of its ambiguity.

Herbert I. Weisberg is a statistician whose firm, Correlation Research Inc., specializes in the application of statistics to legal issues.  Weisberg recently published an interesting book on bias and causation, which is recommended reading for lawyers who litigate claimed health effects.  Weisberg’s book defines “risk factor” as merely an exposure of interest in a study that is looking for associations with a harmful outcome.  He insightfully notes that authors use the phrase “risk factor” and similar phrases to avoid causal language:

“We will often refer to this factor of interest as a risk factor, although the outcome event is not necessarily something undesirable.”

Herbert I. Weisberg, Bias and Causation:  Models and Judgment for Valid Comparisons 27 (2010).

“Causation is discussed elliptically if at all; statisticians typically employ circumlocutions such as ‘independent risk factor’ or ‘explanatory variable’ to avoid causal language.”

Id. at 35.

“Risk factor:  The risk factor is the exposure of interest in an epidemiological study and often has the connotation that the outcome event is harmful or in some way undesirable.”

Id. at 317.   This last definition is helpful in illustrating a balanced, fair definition that does not conflate risk factor with causation.

*******************

Lemuel A. Moyé is an epidemiologist who testified in pharmaceutical litigation, mostly for plaintiffs.  His text, Statistical Reasoning in Medicine:  The Intuitive P-Value Primer, is in places a helpful source of guidance on key concepts.  Moyé puts no causal stock in something’s being a risk factor unless studies show a causal relationship, established through a proper analysis.  Accordingly, he uses “risk factor” to signify simply an exposure of interest:

4.2.1 Association versus Causation

“An associative relationship between a risk factor and a disease is one in which the two appear in the same patient through mere coincidence. The occurrence of the risk factor does not engender the appearance of the disease.

Causal relationships on the other hand are much stronger. A relationship is causal if the presence of the risk factor in an individual generates the disease. The causative risk factor excites the production of the disease. This causal relationship is tight, containing an embedded directionality in the relationship, i.e., (1) the disease is absence in the patient, (2) the risk factor is introduced, and (3) the risk factor’s presence produces the disease.

The declaration that a relationship is causal has a deeper meaning then the mere statement that a risk factor and disease are associated. This deeper meaning and its implications for healthcare require that the demonstration of a causal relationship rise to a higher standard than just the casual observation of the risk factor and disease’s joint occurrence.

Often limited by logistics and the constraints imposed by ethical research, the epidemiologist commonly cannot carry out experiments that identify the true nature of the risk factor–disease relationship. They have therefore become experts in observational studies. Through skillful use of observational research methods and logical thought, epidemiologists assess the strength of the links between risk factors and disease.”

Lemuel A. Moyé, Statistical Reasoning in Medicine:  The Intuitive P-Value Primer 92 (2d ed. 2006).

***************************

In A Dictionary of Epidemiology, which is sponsored by the International Epidemiological Association, a range of meanings is acknowledged, although the range is weighted toward causality:

“RISK FACTOR (Syn: risk indicator)

1. An aspect of personal behavior or lifestyle, an environmental exposure, or an inborn or inherited characteristic that, on the basis of scientific evidence, is known to be associated with meaningful health-related condition(s). In the twentieth century multiple cause era, synonymous with determinant acting at the individual level.

2. An attribute or exposure that is associated with an increased probability of a specified outcome, such as the occurrence of a disease. Not necessarily a causal factor: it may be a risk marker.

3. A determinant that can be modified by intervention, thereby reducing the probability of occurrence of disease or other outcomes. It may be referred to as a modifiable risk factor, and logically must be a cause of the disease.

The term risk factor became popular after its frequent use by T. R. Dawber and others in papers from the Framingham study. The pursuit of risk factors has motivated the search for causes of chronic disease over the past half-century. Ambiguities in risk and in risk-related concepts, uncertainties inherent to the concept, and different legitimate meanings across cultures (even if within the same society) must be kept in mind in order to prevent medicalization of life and iatrogenesis.”

Miquel Porta, Sander Greenland, John M. Last, eds., A Dictionary of Epidemiology 218-19 (5th ed. 2008).  We might add that the uncertainties inherent in risk concepts should be kept in mind to prevent overcompensation for outcomes not shown to be caused by alleged tortogens.

***************

One introductory text uses “risk factor” as a term to describe the independent variable, while acknowledging that the variable does not become a risk factor until after the study shows an association between the factor and the outcome of interest:

“A case-control study is one in which the investigator seeks to establish an association between the presence of a characteristic (a risk factor).”

Sylvia Wassertheil-Smoller, Biostatistics and Epidemiology: A Primer for Health and Biomedical Professionals 104 (3d ed. 2004).  See also id. at 198 (“Here, also, epidemiology plays a central role in identifying risk factors, such as smoking for lung cancer”).  Although it should be clear that much more must happen in order to show that a risk factor is causally associated with an outcome, such as lung cancer, it would be helpful to spell this out.  Some texts simply characterize risk factors as associations, not necessarily causal in nature.  Another basic text provides:

“Analytical studies examine an association, i.e. the relationship between a risk factor and a disease in detail and conduct a statistical test of the corresponding hypothesis … .”

Wolfgang Ahrens & Iris Pigeot, eds., Handbook of Epidemiology 18 (2005).  See also id. at 111 (table describing the reasoning in a case-control study:  “Increased prevalence of risk factor among diseased may indicate a causal relationship.”)(emphasis added).

These texts, both legal and scientific, indicate a wide range of usage and ambiguity for “risk factor.”  There is a tremendous potential for the unscrupulous expert witness, or the uneducated lawyer, to take advantage of this linguistic latitude.  Courts and counsel must be sensitive to the ambiguity and imprecision in usages of “risk factor,” and the mischief that may result.  The Reference Manual on Scientific Evidence needs to sharpen and update its coverage of this and other statistical and epidemiologic issues.

When Is Risk Really Risk?

February 14th, 2012

The term “risk” has a fairly precise meaning in scientific parlance.  The following is a typical definition:

RISK The probability that an event will occur, e.g., that an individual will become ill or die within a stated period of time or by a certain age. Also, a nontechnical term encompassing a variety of measures of the probability of a (generally) unfavorable outcome. See also probability.

Miquel Porta, ed., A Dictionary of Epidemiology 212-18 (5th ed. 2008)(sponsored by the Internat’l Epidemiological Ass’n).

In other words, a risk is an ex ante cause.  The probability is not a qualification of whether there is a causal relationship, but rather of whether any given person at risk will develop the outcome of interest.  Such is the nature of stochastic risks.

Regulatory agencies often use the term “risk” metaphorically, as a fiction to justify precautionary regulations.  Although there may be nothing wrong with such precautionary initiatives, regulators often imply a real threat of harm from what can only be a hypothetical harm.  Why?  If for no other reason, regulators operate with a “wish bias” in favor of the reality of the risk they wish to avert if risk it should be.  We can certainly imagine the cognitive slippage that results from the need to motivate the regulated actors to comply with regulations, and at times, to prosecute the noncompliant.

Plaintiffs’ counsel in personal injury and class action litigation have none of the regulators’ socially useful motives for engaging in distortions of the meaning of the word “risk.”  In the context of civil litigation, plaintiffs’ counsel use the term “risk” in a sense borrowed from the Humpty-Dumpty playbook:

“When I use a word,” Humpty Dumpty said, in rather a scornful tone, “it means just what I choose it to mean—neither more nor less.”
“The question is,” said Alice, “whether you can make words mean so many different things.”
“The question is,” said Humpty Dumpty, “which is to be master — that’s all.”

Lewis Carroll, Through the Looking-Glass 72 (Raleigh 1872).

Undeniably, the word mangling and distortion have had some success with weak-minded judges, but Humpty-Dumpty linguistics had a fall recently in the Third Circuit.  Others have written about it, but I am only just getting around to reading the analytically precise and insightful decision in Gates v. Rohm and Haas Co., 655 F.3d 255 (3d Cir. 2011).  See Sean Wajert, “Court of Appeals Rejects Medical Monitoring Class Action” (Aug. 31, 2011); Carl A. Solano, “Appellate Court Consensus on Medical Monitoring Class Actions Solidifies” (Sept. 12, 2011).

Gates was an attempted class action, in which the district court denied plaintiffs’ motion for certification of a medical monitoring and property damage class.  265 F.R.D. 208 (E.D.Pa. 2010)(Pratter, J.).  Plaintiffs contended that they were exposed to varying amounts of vinyl chloride in air, and perhaps in water at levels too low to detect. Gates, 655 F.3d at 258-59.  The class’s request for medical monitoring foundered because plaintiffs were unable to prove that they were all exposed to a level of vinyl chloride that created a significant risk of serious latent disease for all class members. Id. at 267-68.

With no scientific evidence in hand, the plaintiffs tried to maintain that they were “at risk” on the basis of EPA regulations, which set a very low, precautionary threshold, but the district and circuit courts rebuffed this use of regulatory “risk” language:

The court identified two problems with the proposed evidence. First, it rejected the plaintiffs’ proposed threshold—exposure above 0.07µ/m3, developed as a regulatory threshold by the EPA for mixed populations of adults and children—as a proper standard for determining liability under tort law. Second, the court correctly noted, even if the 0.07 µ/m3 standard were a correct measurement of the aggregate threshold, it would not be the threshold for each class member who may be more or less susceptible to diseases from exposure to vinyl chloride.18 Although the positions of regulatory policymakers are relevant, their risk assessments are not necessarily conclusive in determining what risk exposure presents to specified individuals. See Federal Judicial Center, Reference Manual on Scientific Evidence 413 (2d ed.2000) (“While risk assessment information about a chemical can be somewhat useful in a toxic tort case, at least in terms of setting reasonable boundaries as to the likelihood of causation, the impetus for the development of risk assessment has been the regulatory process, which has different goals.”); id. at 423 (“Particularly problematic are generalizations made in personal injury litigation from regulatory positions…. [I]f regulatory standards are discussed in toxic tort cases to provide a reference point for assessing exposure levels, it must be recognized that  there is a great deal of variability in the extent of evidence required to support different regulations.”).

Thus, plaintiffs could not carry their burden of proof for a class of specific persons simply by citing regulatory standards for the population as a whole. Cf. Wright v. Willamette Indus., Inc., 91 F.3d 1105, 1107 (8th Cir.1996) (“Whatever may be the considerations that ought to guide a legislature in its determination of what the general good requires, courts and juries, in deciding cases, traditionally make more particularized inquiries into matters of cause and effect.”).

Plaintiffs have failed to propose a method of proving the proper point where exposure to vinyl chloride presents a significant risk of developing a serious latent disease for each class member.

Plaintiffs propose a single concentration without accounting for the age of the class member being exposed, the length of exposure, other individual factors such as medical history, or showing the exposure was so toxic that such individual factors are irrelevant. The court did not abuse its discretion in concluding individual issues on this point make trial as a class unfeasible, defeating cohesion.

Id. at 268.  For class actions, the inability to invoke a low threshold of “permissible” exposure may be the death knell of medical monitoring and personal injury class actions.  The implications of the Gates court’s treatment of “regulatory risk” are, however, more far-reaching.  Sometimes risk is not really risk at all.  The ambiguity of the risk in risk assessment has confused judges from the lowest magistrate up to Supreme Court justices.  It is time to disambiguate.  See General Electric v. Joiner, 522 U.S. 136, 153-54 (1997) (Stevens, J., dissenting in part) (erroneously assuming that plaintiffs’ expert witness was justified in relying upon a weight-of-evidence methodology because such methodology is often used in risk assessment).

Two Articles of Interest in JAMA – Nocebo Effects; Medical Screening

February 12th, 2012

Two articles in this week’s Journal of the American Medical Association (JAMA) are of interest to lawyers who litigate, or counsel about, health effects.

One article deals with the nocebo effect, which is the dark side of the placebo effect.  Placebos can induce beneficial outcomes because of the expectation of useful therapy; nocebos can induce harmful outcomes because of the expectation of injury. The viewpoint article in JAMA points out that nocebo effects, like placebo effects, result from the “psychosocial context or therapeutic environment” affecting a patient’s perception of his state of health or illness.  Luana Colloca, MD, PhD, and Damien Finniss, MSc Med., “Nocebo Effects, Patient-Clinician Communication, and Therapeutic Outcomes,” 307 J. Am. Med. Ass’n 567, 567 (2012).

The authors discuss how clinicians can inadvertently prejudice health outcomes by how they frame outcome information to patients.  Importantly, Colloca and Finniss also note that the negative expectations created by the nocebo communication can take place in the process of obtaining informed consent.

The litigation significance is substantial because the creation of negative expectations is not the exclusive domain of clinicians.  Plaintiffs’ counsel, support and advocacy groups, and expert witnesses, even when well meaning, can similarly create negative expectations for health outcomes.  These actors often enjoy undeserved authority among their audience of litigants or claimants.  The extremely high rate of psychogenic illness found in many litigations is the result.  The harmful communications, however, are not limited to plaintiffs’ lawyers and their auxiliaries.  As Colloca and Finniss point out, nocebo effects can be induced by well-meaning warnings and disclosure of information from healthcare providers to patients.  Id. at 567.  The potential to induce negative harms in this way has the obvious consequence for the tort system:  more warnings are not always beneficial.  Indeed, warnings themselves can bring about harm.  This realization should temper courts’ enthusiasms for the view that more warnings are always better.  Warnings about adverse health outcomes should be based upon good scientific bases.

*************

The other article from this week’s issue of JAMA addresses the harms of screening.  Steven H. Woolf, MD, MPH, and Russell Harris, MD, MPH, “The Harms of Screening: New Attention to an Old Concern,” 307 J. Am. Med. Ass’n 565 (2012).  As I pointed out on these pages, screening for medical illnesses carries significant health risks for patients and ethical risks for the healthcare providers.  See “Ethics and Daubert: The Scylla and Charybdis of Medical Monitoring” (Feb. 1, 2012).  Bayes’ Theorem teaches us that even very high likelihood ratios for screening tests will yield true positive cases swamped by false positive cases when the baseline prevalence is low.  See Jonathan Deeks and Douglas Altman, “Diagnostic tests 4: likelihood ratios,” 329 Brit. Med. J. 168 (2004) (providing a useful nomogram to illustrate how even highly accurate tests, with high likelihood ratios, will produce more false than true positive cases when the baseline prevalence of disease is low).
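The arithmetic behind that nomogram can be sketched in a few lines.  In odds form, Bayes’ Theorem says that the post-test odds equal the pre-test odds multiplied by the likelihood ratio.  The prevalence and likelihood-ratio figures below are hypothetical, chosen only to illustrate how a low baseline prevalence swamps even a strong test:

```python
def post_test_probability(prevalence: float, likelihood_ratio: float) -> float:
    """Bayes' Theorem in odds form: post-test odds = pre-test odds * LR."""
    pre_test_odds = prevalence / (1 - prevalence)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)

# Hypothetical screening scenario: a strong test (positive likelihood
# ratio of 20) applied to a population with a 0.5% baseline prevalence.
ppv = post_test_probability(prevalence=0.005, likelihood_ratio=20)
print(f"Probability of disease given a positive test: {ppv:.1%}")  # roughly 9%
```

Even with a likelihood ratio of 20, roughly nine out of ten positive results in this hypothetical population would be false positives; the same test applied where the prevalence is 30% yields a post-test probability near 90%.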

The viewpoint piece by Woolf and Harris emphasizes the potential iatrogenic harms from screening:

  • physical injury from the test itself (as in colonic perforations from colonoscopy);
  • cascade of further testing, with further risk of harm, both physical and emotional;
  • anxiety and emotional distress over abnormal results;
  • overdiagnosis; and
  • the overtreatment of conditions that are not substantial threats to patients’ health

These issues should have an appropriately chilling effect on judicial enthusiasm for medical monitoring and surveillance claims.  Great care is required to fashion a screening plan for patients or claimants.  Of course, there are legal risks as well, as when plaintiffs’ counsel fail to obtain the necessary prescriptions or permits to conduct radiological screenings.  See Schachtman “State Regulators Impose Sanction for Unlawful Silicosis Screenings,” 17(13) Wash. Leg. Fdtn. Legal Op. Ltr. (May 25, 2007).  Caveat litigator.

Interstitial Doubts About the Matrixx

February 6th, 2012

Statistics professors are excited that the United States Supreme Court issued an opinion that ostensibly addressed statistical significance.  One example of the excitement is an article, in press, by Joseph B. Kadane, Professor in the Department of Statistics at Carnegie Mellon University, in Pittsburgh, Pennsylvania.  See Joseph B. Kadane, “Matrixx v. Siracusano: what do courts mean by ‘statistical significance’?” 11[x] Law, Probability and Risk 1 (2011).

Professor Kadane makes the sensible point that the allegations of adverse events did not admit of an analysis that would imply statistical significance or its absence.  Id. at 5.  See Schachtman, “The Matrixx – A Comedy of Errors” (April 6, 2011); David Kaye, “Trapped in the Matrixx: The U.S. Supreme Court and the Need for Statistical Significance,” BNA Product Safety and Liability Reporter 1007 (Sept. 12, 2011).  Unfortunately, the excitement has obscured Professor Kadane’s interpretation of the Court’s holding, and has led him astray in assessing the importance of the case.

In the opening paragraph of his paper, Professor Kadane quotes from the Supreme Court’s opinion that “the premise that statistical significance is the only reliable indication of causation … is flawed,” Matrixx Initiatives, Inc. v. Siracusano, ___ U.S. ___, 131 S.Ct. 1309 (2011).  The quote is accurate, but Professor Kadane proceeds to claim that this quote represents the holding of the Court. Kadane, supra at 1. The Court held no such thing.

Matrixx was a securities fraud class action, brought by investors who claimed that the company misled them by speaking to the market about the strong growth prospects of its product, the Zicam cold remedy, while withholding information that raised concerns that might affect the product’s economic viability and its FDA license.  The only causation plaintiffs had to show was an economic loss caused by management’s intentional withholding of “material” information that should have been disclosed under all the facts and circumstances.  Plaintiffs did not have to prove that the medication causes the harm alleged in personal injury actions.  Indeed, it might turn out to be indisputable that the medication does not cause the alleged harm, and yet earlier, suggestive studies could provoke regulatory intervention and even a regulatory decision to withdraw the product from the market.  Investors obviously could be hurt under this scenario as much as, if not more than, if the medication caused the harms alleged by personal-injury plaintiffs.

Kadane’s assessment goes awry in suggesting that the Supreme Court issued a holding about facts that were neither proven nor necessary for it to reach its decision.  Courts can, and do, comment, note, and opine about many unnecessary facts or allegations in reaching a holding, but these statements are obiter dicta if they are not necessary to the disposition of the case.  Because medical causation was not required for the Supreme Court to reach its decision, its presence or absence was not, and could not be, part of the Court’s holding.

Kadane makes a similar erroneous statement that the lower appellate courts, which earlier had addressed “statistical significance,” properly or improperly understood, found that “statistical significance in the strict sense [was] neither necessary … nor sufficient … to require action to remove a drug from the market.”  Id. at 6.  The earlier appellate decisions addressed securities fraud, however, not regulatory action of withdrawal of a product.  Kadane’s statement mistakes what was at issue, and what was decided, in all the cases discussed.

Kadane seems at least implicitly to recognize that medical causation is not at issue when he states that “the FDA does not require proof of causation but rather reasonable evidence of an association before a warning is issued.”  Id. at 7 (internal citation omitted).  All that had to happen for the investors to be harmed by the Company’s misleading statements was for Matrixx Initiatives to boast about future sales, and to claim that there were no health issues that would lead to regulatory intervention, when it had information raising doubts about its claim of no health issues.  See 21 U.S.C. § 355(d), (e)(requiring a drug sponsor to show adequate testing, labeling, safety, and efficacy); see also 21 C.F.R. § 201.57(e)(requiring warnings in labeling “as there is reasonable evidence of an association of a serious hazard with a drug; a causal relationship need not have been proved.”); 21 C.F.R. § 803.3 (adverse event reports address events possibly related to the drug or the device); 21 C.F.R. § 803.16 (adverse event report is not an admission of causation).

Kadane’s analysis of the case goes further astray when he suggests that the facts were strong enough for the case to have survived summary judgment.  Id. at 9.  The Matrixx case was a decision on the adequacy of the pleadings, not on the adequacy of the facts proven.  Elsewhere, Kadane acknowledges the difference between a challenge to the pleadings and the legal sufficiency of the facts, id. at 7 & n.8, but he asserts, without explanation, that the difference is “technical” and does not matter.  Not true.  The motion to dismiss is made upon receipt of the plaintiffs’ complaint, but the motion for summary judgment is typically made at the close of discovery, on the eve of trial.  Allegations can be conclusory, and they need have only plausible support in other alleged facts to survive a motion to dismiss.  To survive a motion for summary judgment, which comes much later in the natural course of any litigated case, the case must have evidence of all material facts, as well as expert witness opinion that survives judicial scrutiny for scientific validity under Rule 702.

Kadane appears to try to support the conflation of dismissals on the pleadings and summary judgments by offering a definition of summary judgment that is not quite accurate, and potentially misleading:  “The idea behind summary judgment is that, even if every fact alleged by the opposing party were found to be true, the case would still fail for legal reasons.” Id. at 2.  The problem is that at the summary judgment stage, as opposed to the pleading stage, the party with the burden of proof cannot rest upon his allegations, but must come forward with facts, not allegations, to support every essential element of his case.  A plaintiff in a personal injury action (not a securities fraud case), for example, may easily survive a motion to dismiss by alleging medical causal connection, but at the summary judgment stage, that plaintiff must serve a report of an appropriately qualified expert witness, who in turn has presented a supporting opinion, reliably grounded in science, to survive both evidentiary challenges and a dispositive motion.

Kadane concludes that the Matrixx decision’s “fact-based consideration” is consistent with a “Bayesian decision-theoretic approach that models how to make rational decisions under uncertainty.”  Id. at 9.  I am 99.99999% certain that Justice Sotomayor would not have a clue about what Professor Kadane was saying.  Although statistical significance may have played no role in the Court’s holding, and in Kadane’s Bayesian decision-theoretic approach, I am 100% certain that the irrelevance of statistical significance to the Court’s and Prof. Kadane’s approaches is purely coincidental.

Federal Rule of Evidence 702 Requires Perscrutations — Samaan v. St. Joseph Hospital (2012)

February 4th, 2012

After the dubious decision in Milward, the First Circuit would seem an unlikely forum for perscrutations of expert witness opinion testimony.  Milward v. Acuity Specialty Products Group, Inc., 639 F.3d 11 (1st Cir. 2011), cert. denied, ___ U.S.___ (2012).  See “Milward – Unhinging the Courthouse Door to Dubious Scientific Evidence” (Sept. 2, 2011).  Late last month, however, a panel of the United States Court of Appeals for the First Circuit held that Rule 702 required perscrutation of expert witness opinion, and then proceeded to perscrutate perspicaciously, in Samaan v. St. Joseph Hospital, 2012 WL 34262 (1st Cir. 2012).

The plaintiff, Mr. Samaan, suffered an ischemic stroke, for which he was treated by the defendant hospital and physician.  Plaintiff claimed that the defendants’ treatment deviated from the standard of care by failing to administer intravenous tissue plasminogen activator (t-PA).  Id. at *1.  The plaintiff’s only causation expert witness, Dr. Ravi Tikoo, opined that the defendants’ failure to administer t-PA caused the plaintiff’s neurological injury.  Id. at *2.  Dr. Tikoo’s opinions, as well as those of the defense expert witness, were based in large part upon data from a study done by one of the National Institutes of Health:  The National Institute of Neurological Disorders and Stroke rt-PA Stroke Study Group, “Tissue Plasminogen Activator for Acute Ischemic Stroke,” 333 New Engl. J. Med. 1581 (1995).

Both the District Court and the Court of Appeals noted that the problem with Dr. Tikoo’s opinions lay not in the unreliability of the data, or in the generally accepted view that t-PA can, under certain circumstances, mitigate the sequelae of ischemic stroke; rather the problem lay in the analytical gap between those data and Dr. Tikoo’s conclusion that the failure to administer t-PA caused Mr. Samaan’s stroke-related injuries.

The district court held that Dr. Tikoo’s opinion failed to satisfy the requirements of Rule 702. Id. at *8 – *9.  Dr. Tikoo examined odds ratios from the NINDS study, and others, and concluded that a patient’s chances of an improved outcome after stroke increased 50% with t-PA, and thus that the failure of Mr. Samaan’s healthcare providers to provide t-PA had caused his poor post-stroke outcome.  Id. at *9.  The appellate court similarly rejected the inference from an increased odds ratio to specific causation:

“Dr. Tikoo’s first analysis depended upon odds ratios drawn from the literature. These odds ratios are, as the term implies, ratios of the odds of an adverse outcome, which reflect the relative likelihood of a particular result.FN5 * * * Dr. Tikoo opined that the plaintiff more likely than not would have recovered had he received the drug.”

Id. at *10.

The Court correctly identified the expert witness’s mistake in inferring specific causation from an odds ratio of about 1.5, without any additional information.  The Court characterized the testimonial flaw as one of “lack of fit,” but it was equally an unreliable inference from epidemiologic data to a conclusion about specific causation.

While the Court should be applauded for rejecting the incorrect inference about specific causation, we might wish that it had been more careful about important details.  The Court misinterpreted an odds ratio as a relative risk.  The NINDS study reported risk ratio results both as an odds ratio and as a relative risk.  The Court’s sloppiness should be avoided; the two statistics are different, especially when the outcome of interest is not particularly rare.
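The divergence between the two statistics is easy to verify.  Taking the NINDS “stroke scale” percentages quoted in the opinion (a favorable outcome in 31% of treated and 20% of untreated patients) as counts per 100 patients, the measures come apart precisely because a favorable outcome is not rare:

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Ratio of the outcome proportions (risks) in the two groups."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)

def odds_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Ratio of the outcome odds in the two groups."""
    exposed_odds = exposed_cases / (exposed_total - exposed_cases)
    unexposed_odds = unexposed_cases / (unexposed_total - unexposed_cases)
    return exposed_odds / unexposed_odds

# Favorable outcomes per 100 patients, per the NINDS stroke-scale figures:
rr = relative_risk(31, 100, 20, 100)   # 1.55
orr = odds_ratio(31, 100, 20, 100)     # about 1.80
```

Had the outcome occurred in only one or two percent of each group, the two measures would nearly coincide; at outcome rates of 20% and 31%, the odds ratio overstates the relative risk by roughly 16%.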

Still, the odds ratio is interesting and important as an approximation of the relative risk, and neither measure of risk can substitute for causation, especially when the magnitude of the risk is small, and less than two-fold.  The First Circuit recognized and focused on this gap between risk and causal attribution in an individual’s case:

“[Dr. Tikoo’s] reasoning is structurally unsound and leaves a wide analytical gap between the results produced through the use of odds ratios and the conclusions drawn by the witness. When a person’s chances of a better outcome are 50% greater with treatment (relative to the chances of those who were not treated), that is not the same as a person having a greater than 50% chance of experiencing the better outcome with treatment. The latter meets the required standard for causation; the former does not.  To illustrate, suppose that studies have shown that 10 out of a group of 100 people who do not eat bananas will die of cancer, as compared to 15 out of a group of 100 who do eat bananas. The banana-eating group would have an odds ratio of 1.5 or a 50% greater chance of getting cancer than those who eschew bananas. But this is a far cry from showing that a person who eats bananas is more likely than not to get cancer.

Even if we were to look only at the fifteen persons in the banana-eating group who did get cancer, it would not be likely that any particular person in that cohort got it from the consumption of bananas. Correlation is not causation, and a substantial number of persons with cancer within the banana-eating group would in all probability have contracted the disease whether or not they ate bananas.FN6

We think that this example exposes the analytical gap between Dr. Tikoo’s methods and his conclusions.  Although he could present figures ranging higher than 50%, those figures were not responsive to the question of causation. Let us take the “stroke scale” figure from the NINDS study as an example. This scale measures the neurological deficits in different parts of the nervous system. Twenty percent of patients who experienced a stroke and were not treated with t-PA had a favorable outcome according to this scale, whereas that figure escalated to 31% when t-PA was administered.

Although this means that the patients treated with t-PA had over a 50% better chance of recovery than they otherwise would have had, 69% of those patients experienced the adverse outcome (stroke-related injury) anyway.FN7  The short of it is that while the odds ratio analysis shows that a t-PA patient may have a better chance of recovering than he otherwise would have had without t-PA, such an analysis does not show that a person has a better than even chance of avoiding injury if the drug is administered. The odds ratio, therefore, does not show that the failure to give t-PA was more likely than not a substantial factor in causing the plaintiff’s injuries. The unavoidable conclusion from the studies deemed authoritative by Dr. Tikoo is that only a small number of patients overall (and only a small fraction of those who would otherwise have experienced stroke-related injuries) experience improvement when t-PA is administered.”

*11 and n.6 (citing Milward).

The court in Samaan thus suggested, but did not state explicitly, that the study would have to have shown better than a 100% increase in the rate of recovery for attributability to have exceeded 50%.  The Court’s timidity is regrettable. Yes, Dr. Tikoo’s confusing the percentage increase in risk with the percentage of attributability was quite knuckleheaded.  I doubt that many would want to subject themselves to Dr. Tikoo’s quality of care, at least not his statistical care.  The First Circuit, however, stopped short of stating what magnitude increase in risk would permit an inference of specific causation for Mr. Samaan’s post-stroke sequelae.
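The doubling-of-risk logic left implicit by the court can be made explicit. Under the standard (and contestable) assumptions of the attributable-fraction approach, the probability that an exposed individual’s outcome is attributable to the exposure is (RR - 1)/RR, which exceeds 50% only when the relative risk exceeds two. A minimal Python sketch, offered purely as an illustration and not anything drawn from the opinion:

```python
def attributable_fraction(rr: float) -> float:
    """Attributable fraction among the exposed: (RR - 1) / RR.

    Under standard assumptions, this approximates the probability that
    an exposed individual's outcome is attributable to the exposure.
    """
    if rr <= 1:
        return 0.0
    return (rr - 1) / rr

# A 50% increase in risk (RR = 1.5) yields attributability of only one third:
af_15 = attributable_fraction(1.5)   # ~0.333

# Attributability exceeds 50% only once the relative risk exceeds 2:
af_20 = attributable_fraction(2.0)   # 0.5 exactly
af_30 = attributable_fraction(3.0)   # ~0.667
```

On these figures, Dr. Tikoo’s 50% increase in the chance of recovery translates into attributability of roughly one in three, well short of “more likely than not.”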

The Circuit noted that expert witnesses may present epidemiologic statistics in a variety of forms:

“to indicate causation. Either absolute or relative calculations may suffice in particular circumstances to achieve the causation standard. See, e.g., Smith v. Bubak, 643 F.3d 1137, 1141–42 (8th Cir.2011) (rejecting relative benefit testimony and suggesting in dictum that absolute benefit “is the measure of a drug’s overall effectiveness”); Young v. Mem’l Hermann Hosp. Sys., 573 F.3d 233, 236 (5th Cir.2009) (holding that Texas law requires a doubling of the relative risk of an adverse outcome to prove causation), cert. denied, ___ U.S. ___, 130 S.Ct. 1512, 176 L.Ed.2d 111 (2010).”

 Id. at *11.

Although the citation to Texas law with its requirement of a doubling of a relative risk is welcome and encouraging, the Court seems to have gone out of its way to muddle its holding.  First, the Young case involved t-PA and a claimed deviation from the standard of care in a stroke case, and was exactly on point.  The Fifth Circuit’s reliance upon Texas substantive law left unclear to what extent the same holding would have been required by Federal Rule of Evidence 702.

Second, the First Circuit, with its banana hypothetical, appeared to confuse an odds ratio with a relative risk.  The odds ratio is different from a relative risk, and typically an odds ratio will be higher than the corresponding relative risk, unless the outcome is rare.  See Michael O. Finkelstein & Bruce Levin, Statistics for Lawyers at 37 (2d ed. 2001). In studies of medication efficacy, however, the benefit will not be particularly rare, and the rare disease assumption cannot be made.
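The point is easy to check numerically. In the court’s banana hypothetical, 15 of 100 eaters versus 10 of 100 abstainers yields a relative risk of exactly 1.5, while the true odds ratio is closer to 1.59; with the NINDS stroke-scale figures (31% versus 20% favorable outcomes), the divergence is larger still. A brief Python sketch (the helper functions are illustrative, not from any cited source):

```python
def relative_risk(cases_exp, n_exp, cases_unexp, n_unexp):
    """Risk ratio: (cases_exp / n_exp) / (cases_unexp / n_unexp)."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)

def odds_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Ratio of the odds of the outcome in each group."""
    odds_exp = cases_exp / (n_exp - cases_exp)
    odds_unexp = cases_unexp / (n_unexp - cases_unexp)
    return odds_exp / odds_unexp

# The banana hypothetical: 15 of 100 eaters vs. 10 of 100 abstainers.
rr_banana = relative_risk(15, 100, 10, 100)   # exactly 1.5
or_banana = odds_ratio(15, 100, 10, 100)      # ~1.59, not 1.5

# NINDS stroke-scale outcomes: 31% favorable with t-PA vs. 20% without.
rr_ninds = relative_risk(31, 100, 20, 100)    # ~1.55
or_ninds = odds_ratio(31, 100, 20, 100)       # ~1.80
```

The 1.5 figure the court labeled an “odds ratio” is in fact the relative risk; the gap between the two statistics widens as the outcome becomes more common.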

Third, risk is not causation, regardless of magnitude.  If the magnitude of risk is used to infer specific causation, then what is the basis for the inference, and how large must the risk be?  In what way can epidemiologic statistics be used “to indicate” specific causation?  The opinion tells us that Dr. Tikoo’s reliance upon an odds ratio of 1.5 was unhelpful, but why?  The Court, which spoke so clearly and well in identifying the fallacious reasoning of Dr. Tikoo, faltered in identifying what use of risk statistics would permit an inference of specific causation in this case, where general causation was never in doubt.

The Fifth Circuit’s decision in Young, supra, invoked the greater than doubling of risk required by Texas law.  This requirement is nothing more than a logical, common-sense recognition that risk is not causation, and that small risks alone cannot support an inference of specific causation.  Requiring a relative risk greater than two makes practical sense despite the apoplectic objections of Professor Sander Greenland.  See “Relative Risks and Individual Causal Attribution Using Risk Size” (Mar. 18, 2011).

Importantly, the First Circuit panel in Samaan did not engage in the hand-waving arguments that were advanced in Milward, and stuck to clear, transparent rational inferences.  In footnote 6, the Samaan Court cited its earlier decision in Milward, but only with double negatives, and for the relevancy of odds ratios to the question of general causation:

“This is not to say that the odds ratio may not help to prove causation in some instances.  See, e.g., Milward v. Acuity Specialty Prods. Group, Inc., 639 F.3d 11, 13–14, 23–25 (1st Cir.2011) (reversing exclusion of expert prepared to testify as to general rather than specific causation using in part the odds ratio).”

Id. at n.6.

The Samaan Court went on to suggest that inferring specific causation from the magnitude of risk was “theoretically possible”:

Indeed, it is theoretically possible that a particular odds ratio calculation might show a better-than-even chance of a particular outcome. Here, however, the odds ratios relied on by Dr. Tikoo have no such probative force.

Id. (emphasis added).  But why and how? The implication of the Court’s dictum is that when the risk ratio is small, less than or equal to two, the ratio cannot be taken to have supported the showing of “better than even chance.” In Milward, one of the key studies relied upon by plaintiff’s expert witness reported an increased risk of only 40%.  Although Milward presented primarily a challenge on general causation, the Samaan decision suggests that the low-dose benzene exposure plaintiffs are doomed, not by benzene, but by the perscrutation required by Rule 702.

Epidemiology, Risk, and Causation – Report of Workshops

November 15th, 2011

This month’s issue of Preventive Medicine includes a series of papers arising from last year’s workshops on “Epidemiology, Risk, and Causation,” at Cambridge University. The workshops were organized by philosopher Alex Broadbent, a member of the Department of History and Philosophy of Science at Cambridge University.  The workshops were financially sponsored by the Foundation for Genomics and Population Health (PHG), a not-for-profit British organization.

Although Broadbent’s workshops were intended for philosophers of science, statisticians, and epidemiologists, lawyers involved in health-effects litigation will find the papers of interest as well.  The themes of the workshops included:

  • the nature of epidemiologic causation,
  • the competing claims of observational and experimental research for establishing causation,
  • the role of explanation and prediction in assessing causality,
  • the role of moral values in causal judgments, and
  • the role of statistical and epistemic uncertainty in causal judgments.

See Alex Broadbent, ed., “Special Section: Epidemiology, Risk, and Causation,” 53 Preventive Medicine 213-356 (October-November 2011).  Preventive Medicine is published by Elsevier Inc., so you know that the articles are not free.  Still, you may want to read them at your local library to determine what may be useful in challenging and defending causal judgments in the courtroom.  One of the interlocutors, Sander Greenland, is of particular interest because he shows up as an expert witness with some regularity.

Here are the individual papers published in this special issue:

Alfredo Morabia, Michael C. Costanza, Philosophy and epidemiology

Alex Broadbent, Conceptual and methodological issues in epidemiology: An overview

Alfredo Morabia, Until the lab takes it away from epidemiology

Nancy Cartwright, Predicting what will happen when we act. What counts for warrant?

Sander Greenland, Null misinterpretation in statistical testing and its impact on health risk assessment

Daniel M. Hausman, How can irregular causal generalizations guide practice

Mark Parascandola, Causes, risks, and probabilities: Probabilistic concepts of causation in chronic disease epidemiology

John Worrall, Causality in medicine: Getting back to the Hill top

Olaf M. Dekkers, On causation in therapeutic research: Observational studies, randomised experiments and instrumental variable analysis

Alexander Bird, The epistemological function of Hill’s criteria

Michael Joffe, The gap between evidence discovery and actual causal relationships

Stephen John, Why the prevention paradox is a paradox, and why we should solve it: A philosophical view

Jonathan Wolff, How should governments respond to the social determinants of health?

Alex Broadbent, What could possibly go wrong? — A heuristic for predicting population health outcomes of interventions

The Treatment of Meta-Analysis in the Third Edition of the Reference Manual on Scientific Evidence

November 14th, 2011

Meta-analysis is a statistical procedure for aggregating data and statistics from individual studies into a single summary statistical estimate of the population measurement of interest.  The first meta-analysis is typically attributed to Karl Pearson, circa 1904, who sought a method to overcome the limitations of small sample size and low statistical power.  Statistical methods for meta-analysis, however, did not mature until the 1970s.  Even then, the biomedical scientific community remained skeptical of, if not outright hostile to, meta-analysis until relatively recently.

The hostility to meta-analysis, especially in the context of observational epidemiologic studies, was colorfully expressed by Samuel Shapiro and Alvan Feinstein, as late as the 1990s:

“Meta-analysis begins with scientific studies….  [D]ata from these studies are then run through computer models of bewildering complexity which produce results of implausible precision.”

* * * *

“I propose that the meta-analysis of published non-experimental data should be abandoned.”

Samuel Shapiro, “Meta-analysis/Smeta-analysis,” 140 Am. J. Epidem. 771, 777 (1994).  See also Alvan Feinstein, “Meta-Analysis: Statistical Alchemy for the 21st Century,” 48 J. Clin. Epidem. 71 (1995).

The professional skepticism about meta-analysis was reflected in some of the early judicial assessments of meta-analysis in court cases.  In the 1980s and early 1990s, some trial judges erroneously dismissed meta-analysis as a flawed statistical procedure that claimed to make something out of nothing. Allen v. Int’l Bus. Mach. Corp., No. 94-264-LON, 1997 U.S. Dist. LEXIS 8016, at *71–*74 (suggesting that meta-analysis of observational studies was controversial among epidemiologists).

In In re Paoli Railroad Yard PCB Litigation, Judge Robert Kelly excluded plaintiffs’ expert witness Dr. William Nicholson and his testimony based upon his unpublished meta-analysis of health outcomes among PCB-exposed workers.  Judge Kelly found that the meta-analysis was a novel technique, and that Nicholson’s meta-analysis was not peer reviewed.  Furthermore, the meta-analysis assessed health outcomes not experienced by any of the plaintiffs before the trial court.  706 F. Supp. 358, 373 (E.D. Pa. 1988).

The Court of Appeals for the Third Circuit reversed the exclusion of Dr. Nicholson’s testimony, and remanded for reconsideration with instructions.  In re Paoli R.R. Yard PCB Litig., 916 F.2d 829, 856-57 (3d Cir. 1990), cert. denied, 499 U.S. 961 (1991); Hines v. Consol. Rail Corp., 926 F.2d 262, 273 (3d Cir. 1991).  The Circuit noted that meta-analysis was not novel, and that the lack of peer-review was not an automatic disqualification.  Acknowledging that a meta-analysis could be performed poorly using invalid methods, the appellate court directed the trial court to evaluate the validity of Dr. Nicholson’s work on his meta-analysis.

In one of many skirmishes over colorectal cancer claims in asbestos litigation, Judge Sweet in the Southern District of New York was unimpressed by efforts to aggregate data across studies.  Judge Sweet declared that “no matter how many studies yield a positive but statistically insignificant SMR for colorectal cancer, the results remain statistically insignificant. Just as adding a series of zeros together yields yet another zero as the product, adding a series of positive but statistically insignificant SMRs together does not produce a statistically significant pattern.”  In re Joint E. & S. Dist. Asbestos Litig., 827 F. Supp. 1014, 1042 (S.D.N.Y. 1993).  The plaintiffs’ expert witness who had offered the unreliable testimony, Dr. Steven Markowitz, like Nicholson another foot soldier in Dr. Irving Selikoff’s litigation machine, did not offer a formal meta-analysis to justify his assessment that multiple non-significant studies, taken together, rule out chance as a likely explanation for an aggregate finding of an increased risk.

Judge Sweet was quite justified in rejecting this back-of-the-envelope, non-quantitative meta-analysis.  His suggestion, however, that multiple non-significant studies could never collectively serve to rule out chance as an explanation for an overall increased rate of disease in the exposed groups is wrong.  Judge Sweet would have done better to focus on the validity issues in key studies, the presence of bias and confounding, and the completeness of the proffered meta-analysis.  The Second Circuit reversed the entry of summary judgment, and remanded the colorectal cancer claim for trial.  52 F.3d 1124 (2d Cir. 1995).  Over a decade later, with even more accumulated studies and data, the Institute of Medicine found the evidence for asbestos plaintiffs’ colorectal cancer claims to be scientifically insufficient.  Institute of Medicine, Asbestos: Selected Cancers (Wash. D.C. 2006).

Courts continue to go astray with an erroneous belief that multiple studies, all without statistically significant results, cannot yield a statistically significant summary estimate of increased risk.  See, e.g., Baker v. Chevron USA, Inc., 2010 WL 99272, *14-15 (S.D.Ohio 2010) (addressing a meta-analysis by Dr. Infante on multiple myeloma outcomes in studies of benzene-exposed workers).  There were many sound objections to Infante’s meta-analysis, but the suggestion that multiple studies without statistical significance could not yield a summary estimate of risk with statistical significance was not one of them.
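The statistical point is easily demonstrated. With inverse-variance weights, the pooled estimate has a smaller standard error than any contributing study, so several individually non-significant results can yield a statistically significant summary estimate. A fixed-effect sketch in Python, using hypothetical study values rather than the Baker or Infante data:

```python
import math

def pool_fixed_effect(log_rrs, ses):
    """Inverse-variance, fixed-effect pooling of log relative risks."""
    weights = [1.0 / se**2 for se in ses]
    pooled = sum(w * lr for w, lr in zip(weights, log_rrs)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Four hypothetical studies, each reporting RR = 1.4 with a wide 95%
# confidence interval (roughly 0.86 to 2.29) that includes 1.0.
log_rrs = [math.log(1.4)] * 4
ses = [0.25] * 4

pooled, pooled_se = pool_fixed_effect(log_rrs, ses)
ci_low = math.exp(pooled - 1.96 * pooled_se)    # ~1.10
ci_high = math.exp(pooled + 1.96 * pooled_se)   # ~1.79
# The pooled RR is still 1.4, but the summary CI now excludes 1.0.
```

Zeros summed remain zero; imprecise but consistent positive findings, pooled, do not.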

In the last two decades, meta-analysis has emerged as an important technique for addressing random variation in studies, as well as some of the limitations of frequentist statistical methods.  In the 1980s, articles reporting meta-analyses were rare to non-existent.  In 2009, there were over 2,300 articles with “meta-analysis” in their title, or in their keywords, indexed in the PubMed database of the National Library of Medicine.  See Michael O. Finkelstein and Bruce Levin, “Meta-Analysis of ‘Sparse’ Data: Perspectives from the Avandia Cases” (2011) (forthcoming in Jurimetrics).

The techniques for aggregating data have been studied, refined, and employed extensively in thousands of methods and application papers in the last decade. Consensus guideline papers have been published for meta-analyses of clinical trials as well as observational studies.  See Donna Stroup, et al., “Meta-analysis of Observational Studies in Epidemiology: A Proposal for Reporting,” 283 J. Am. Med. Ass’n 2008 (2000) (MOOSE statement); David Moher, Deborah Cook, Susan Eastwood, Ingram Olkin, Drummond Rennie, and Donna Stroup, “Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement,” 354 Lancet 1896 (1999).  See also Jesse Berlin & Carin Kim, “The Use of Meta-Analysis in Pharmacoepidemiology,” in Brian Strom, ed., Pharmacoepidemiology 681, 683–84 (4th ed. 2005); Zachary Gerbarg & Ralph Horwitz, “Resolving Conflicting Clinical Trials: Guidelines for Meta-Analysis,” 41 J. Clin. Epidemiol. 503 (1988).

Meta-analyses, of observational studies and of randomized clinical trials, routinely are relied upon by expert witnesses in pharmaceutical and so-called toxic tort litigation. Id. See also In re Bextra and Celebrex Marketing Sales Practices and Prod. Liab. Litig., 524 F. Supp. 2d 1166, 1174, 1184 (N.D. Cal. 2007) (holding that reliance upon “[a] meta-analysis of all available published and unpublished randomized clinical trials” was reasonable and appropriate, and criticizing the expert witnesses who urged the complete rejection of meta-analysis of observational studies).

The second edition of the Reference Manual on Scientific Evidence gave very little attention to meta-analysis.  With this historical backdrop, it is interesting to see what the new third edition provides for guidance to the federal judiciary on this important topic.

STATISTICS CHAPTER

The statistics chapter of the third edition continues to give scant attention to meta-analysis.  The chapter notes, in a footnote, that there are formal procedures for aggregating data across studies, and that the power of the aggregated data will exceed the power of the individual, included studies.  The footnote then cautions that meta-analytic procedures “have their own weakness,” without detailing what that one weakness is.  RMSE 3d at 254 n. 107.

The glossary at the end of the statistics chapter offers a definition of meta-analysis:

“meta-analysis. Attempts to combine information from all studies on a certain topic. For example, in the epidemiological context, a meta-analysis may attempt to provide a summary odds ratio and confidence interval for the effect of a certain exposure on a certain disease.”

Id. at 289.

This definition is inaccurate in ways that could yield serious mischief.  Virtually all meta-analyses are built upon a systematic review that sets out to collect all available studies on a research issue of interest.  It is a rare meta-analysis, however, that includes “all” studies in its quantitative analysis.  The meta-analytic process involves a pre-specification of inclusionary and exclusionary criteria for the quantitative analysis of the summary estimate of risk.  Those criteria may limit the quantitative analysis to randomized trials, or to analytical epidemiologic studies.  Furthermore, meta-analyses frequently and appropriately have pre-specified exclusionary criteria that relate to study design or quality.

On a more technical note, the offered definition suggests that the summary estimate of risk will be an odds ratio, which may or may not be true.  Meta-analyses of risk ratios may yield summary estimates of risk in terms of relative risk or hazard ratios, or even of risk differences.  The meta-analysis may combine data of means rather than proportions as well.

EPIDEMIOLOGY CHAPTER

The chapter on epidemiology delves into meta-analysis in greater detail than the statistics chapter, and offers apparently inconsistent advice.  The overall gist of the chapter, however, can perhaps best be summarized by the definition offered in this chapter’s glossary:

“meta-analysis. A technique used to combine the results of several studies to enhance the precision of the estimate of the effect size and reduce the plausibility that the association found is due to random sampling error.  Meta-analysis is best suited to pooling results from randomly controlled experimental studies, but if carefully performed, it also may be useful for observational studies.”

Reference Guide on Epidemiology, RMSE 3d at 624.  See also id. at 581 n. 89 (“Meta-analysis is better suited to combining results from randomly controlled experimental studies, but if carefully performed it may also be helpful for observational studies, such as those in the epidemiologic field.”).  The epidemiology chapter appropriately notes that meta-analysis can help address concerns over random error in small studies.  Id. at 579; see also id. at 607 n. 171.

Having told us that properly conducted meta-analyses of observational studies can be helpful, the chapter hedges considerably:

“Meta-analysis is most appropriate when used in pooling randomized experimental trials, because the studies included in the meta-analysis share the most significant methodological characteristics, in particular, use of randomized assignment of subjects to different exposure groups. However, often one is confronted with nonrandomized observational studies of the effects of possible toxic substances or agents. A method for summarizing such studies is greatly needed, but when meta-analysis is applied to observational studies – either case-control or cohort – it becomes more controversial.174 The reason for this is that often methodological differences among studies are much more pronounced than they are in randomized trials. Hence, the justification for pooling the results and deriving a single estimate of risk, for example, is problematic.175

Id. at 607.  The stated objection to pooling results for observational studies is certainly correct, but many research topics have sufficient studies available to allow for appropriate selectivity in framing inclusionary and exclusionary criteria to address the objection.  The chapter goes on to credit the critics of meta-analyses of observational studies.  As they did in the second edition of the RMSE, the authors repeat their cites to, and quotes from, early papers by John Bailar, who was then critical of such meta-analyses:

“Much has been written about meta-analysis recently and some experts consider the problems of meta-analysis to outweigh the benefits at the present time. For example, John Bailar has observed:

‘[P]roblems have been so frequent and so deep, and overstatements of the strength of conclusions so extreme, that one might well conclude there is something seriously and fundamentally wrong with the method. For the present . . . I still prefer the thoughtful, old-fashioned review of the literature by a knowledgeable expert who explains and defends the judgments that are presented. We have not yet reached a stage where these judgments can be passed on, even in part, to a formalized process such as meta-analysis.’

John Bailar, “Assessing Assessments,” 277 Science 528, 529 (1997).”

Id. at 607 n.177.  Bailar’s subjective preference for “old-fashioned” reviews, which often cherry-picked the included studies, is, well, old-fashioned.  More to the point, it is questionable science, and a distinctly minority viewpoint in the light of substantial improvements in the conduct and reporting of meta-analyses of observational studies.  Bailar may be correct that some meta-analyses should never have left the protocol stage, but the RMSE 3d fails to provide the judiciary with the tools to appreciate the distinction between good and bad meta-analyses.

This categorical rejection, cited with apparent approval, is amplified by a recitation of some real or apparent problems with meta-analyses of observational studies.  What is missing is a discussion of how many of these problems can be and are dealt with in contemporary practice:

“A number of problems and issues arise in meta-analysis. Should only published papers be included in the meta-analysis, or should any available studies be used, even if they have not been peer reviewed? Can the results of the meta-analysis itself be reproduced by other analysts? When there are several meta-analyses of a given relationship, why do the results of different meta-analyses often disagree? The appeal of a meta-analysis is that it generates a single estimate of risk (along with an associated confidence interval), but this strength can also be a weakness, and may lead to a false sense of security regarding the certainty of the estimate. A key issue is the matter of heterogeneity of results among the studies being summarized.  If there is more variance among study results than one would expect by chance, this creates further uncertainty about the summary measure from the meta-analysis. Such differences can arise from variations in study quality, or in study populations or in study designs. Such differences in results make it harder to trust a single estimate of effect; the reasons for such differences need at least to be acknowledged and, if possible, explained.176 People often tend to have an inordinate belief in the validity of the findings when a single number is attached to them, and many of the difficulties that may arise in conducting a meta-analysis, especially of observational studies such as epidemiologic ones, may consequently be overlooked.177

Id. at 608.  The authors are entitled to their opinion, but their discussion leaves the judiciary uninformed about current practice, and best practices, in epidemiology.  A categorical rejection of meta-analyses of observational studies is at odds with the chapter’s own claim that such meta-analyses can be helpful if properly performed.  What was needed, and is missing, is a meaningful discussion to help the judiciary determine whether a meta-analysis of observational studies was properly performed.
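Heterogeneity, the chapter’s central worry, is not merely lamented in contemporary practice; it is routinely quantified with Cochran’s Q statistic and the I² index before any summary estimate is credited. A brief Python sketch, using hypothetical study values:

```python
def cochran_q_i2(log_rrs, ses):
    """Cochran's Q statistic and the I^2 heterogeneity index."""
    weights = [1.0 / se**2 for se in ses]
    pooled = sum(w * lr for w, lr in zip(weights, log_rrs)) / sum(weights)
    q = sum(w * (lr - pooled)**2 for w, lr in zip(weights, log_rrs))
    df = len(log_rrs) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2

# Concordant hypothetical studies: Q falls below its degrees of freedom,
# and I^2 is zero; pooling into a single estimate is defensible.
q_homog, i2_homog = cochran_q_i2([0.30, 0.35, 0.32], [0.2, 0.2, 0.2])

# Discordant studies: Q far exceeds df, and I^2 is large; a single
# summary number would paper over real differences among the studies.
q_heter, i2_heter = cochran_q_i2([0.05, 0.90, 0.35], [0.2, 0.2, 0.2])
```

A judge armed with this much could at least ask whether a proffered meta-analysis reported and explained its heterogeneity statistics, rather than simply distrusting the single number.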

MEDICAL TESTIMONY CHAPTER

The chapter on medical testimony is the third pass at meta-analysis in RMSE 3d.   The second edition’s chapter on medical testimony ignored meta-analysis completely; the new edition addresses meta-analysis in the context of the hierarchy of study designs:

“Other circumstances that set the stage for an intense focus on medical evidence included

(1) the development of medical research, including randomized controlled trials and other observational study designs;

(2) the growth of diagnostic and therapeutic interventions;141

(3) interest in understanding medical decision making and how physicians reason;142 and

(4) the acceptance of meta-analysis as a method to combine data from multiple randomized trials.143

RMSE 3d at 722-23.

The chapter curiously omits observational studies, but the footnote reference (note 143) then inconsistently discusses two meta-analyses of observational, rather than experimental, studies:

“143. Video Software Dealers Ass’n v. Schwarzenegger, 556 F.3d 950, 963 (9th Cir. 2009) (analyzing a meta-analysis of studies on video games and adolescent behavior); Kennecott Greens Creek Min. Co. v. Mine Safety & Health Admin., 476 F.3d 946, 953 (D.C. Cir. 2007) (reviewing the Mine Safety and Health Administration’s reliance on epidemiological studies and two meta-analyses).”

Id. at 723 n.143.

The medical testimony chapter then provides further confusion by giving a more detailed listing of the hierarchy of medical evidence in the form of different study designs:

3. Hierarchy of medical evidence

With the explosion of available medical evidence, increased emphasis has been placed on assembling, evaluating, and interpreting medical research evidence.  A fundamental principle of evidence-based medicine (see also Section IV.C.5, infra) is that the strength of medical evidence supporting a therapy or strategy is hierarchical.  When ordered from strongest to weakest, systematic review of randomized trials (meta-analysis) is at the top, followed by single randomized trials, systematic reviews of observational studies, single observational studies, physiological studies, and unsystematic clinical observations.150 An analysis of the frequency with which various study designs are cited by others provides empirical evidence supporting the influence of meta-analysis followed by randomized controlled trials in the medical evidence hierarchy.151 Although they are at the bottom of the evidence hierarchy, unsystematic clinical observations or case reports may be the first signals of adverse events or associations that are later confirmed with larger or controlled epidemiological studies (e.g., aplastic anemia caused by chloramphenicol,152 or lung cancer caused by asbestos153). Nonetheless, subsequent studies may not confirm initial reports (e.g., the putative association between coffee consumption and pancreatic cancer).154

Id. at 723-24.  This discussion further muddies the water by using a parenthetical to suggest that meta-analyses of randomized clinical trials are equivalent to systematic reviews of such studies — “systematic review of randomized trials (meta-analysis).” Of course, systematic reviews are not meta-analyses, although they are a necessary precondition for conducting a meta-analysis.  The relationship between the procedures for a systematic review and a meta-analysis are in need of clarification, but the judiciary will not find it in the new Reference Manual.

Reference Manual on Scientific Evidence v3.0 – Disregarding Study Validity in Favor of the “Whole Gamish”

October 14th, 2011

There is much to digest in the new Reference Manual on Scientific Evidence, third edition (RMSE 3d).  Much of what is covered is solid information on the individual scientific and technical disciplines covered.  Although the information is easily available from other sources, there is some value in collecting the material in a single volume for the convenience of judges.  Of course, given that this information is provided to judges from an ostensibly neutral, credible source, lawyers will naturally focus on what is doubtful or controversial in the RMSE.

I have already noted some preliminary concerns, however, with some of the comments in the Preface, by Judge Kessler and Dr. Kassirer.  See “New Reference Manual’s Uneven Treatment of Conflicts of Interest.”  In addition, there is a good deal of overlap among the chapters on statistics, epidemiology, and medical testimony.  This overlap is at first blush troubling because the RMSE has the potential to confuse and obscure issues by having multiple authors address them inconsistently.  This is an area where reviewers should pay close attention.

From first looks at the RMSE 3d, there is a good deal of equivocation between encouraging judges to look at scientific validity, and discouraging them from any meaningful analysis by emphasizing inaccurate proxies for validity, such as conflicts of interest.  (As I have pointed out, the new RMSE did not do quite so well in addressing its own conflicts of interest.  See “Toxicology for Judges – The New Reference Manual on Scientific Evidence (2011).”)

The strengths of the chapter on statistical evidence, updated from the second edition, remain, as do some of the strengths and flaws of the chapter on epidemiology.  I hope to write more about each of these important chapters at a later date.

The late Professor Margaret Berger has an updated version of her chapter from the second edition, “The Admissibility of Expert Testimony,” RMSE 3d 11 (2011).  Berger’s chapter has a section criticizing “atomization,” a process she describes pejoratively as a “slicing-and-dicing” approach.  Id. at 19.  Drawing on the publications of Daubert-critic Susan Haack, Berger rejects the notion that courts should examine the reliability of each study independently. Id. at 20 & n.51 (citing Susan Haack, “An Epistemologist in the Bramble-Bush: At the Supreme Court with Mr. Joiner,” 26 J. Health Pol. Pol’y & L. 217–37 (1999)).  Berger contends that the “proper” scientific method, as evidenced by the works of the International Agency for Research on Cancer, the Institute of Medicine, the National Institutes of Health, the National Research Council, and the National Institute of Environmental Health Sciences, “is to consider all the relevant available scientific evidence, taken as a whole, to determine which conclusion or hypothesis regarding a causal claim is best supported by the body of evidence.” Id. at 19-20 & n.52.  This contention, however, is profoundly misleading.  Of course, scientists undertaking a systematic review should identify all the relevant studies, but some of the “relevant” studies may well be insufficiently reliable (because of internal or external validity issues) to answer the research question at hand. All the cited agencies, and other research organizations and researchers, exclude studies that are fundamentally flawed, whether by bias, confounding, erroneous data analyses, or related problems.  Berger cites no support for the remarkable suggestion that scientists make no “reliability” judgments about available studies when assessing the “totality of the evidence.”

Professor Berger, who had a distinguished career as a law professor and evidence scholar, died in November 2010.  She was no friend of Daubert, but remarkably her antipathy has outlived her.  Her critical discussion of “atomization” cites the notorious decision in Milward v. Acuity Specialty Products Group, Inc., 639 F.3d 11, 26 (1st Cir. 2011), which was decided four months after her passing. Id. at 20 n.51. (The editors note that the published chapter was Berger’s last revision, with “a few edits to respond to suggestions by reviewers.”)

Professor Berger’s contention about the need to avoid assessments of individual studies in favor of the whole gamish must also be rejected because Federal Rule of Evidence 703 requires that each study considered by an expert witness “qualify” for reasonable reliance by virtue of the study’s containing facts or data that are “of a type reasonably relied upon by experts in the particular field in forming opinions or inferences upon the subject.”  One of the deeply troubling aspects of the Milward decision is that it reversed the trial court’s sensible decision to exclude a toxicologist, Dr. Martyn Smith, who outran his headlights on issues having to do with a field in which he was clearly inexperienced – epidemiology.

Scientific studies, and especially epidemiologic studies, involve multiple levels of hearsay.  A typical epidemiologic study may contain hearsay leaps from patient to clinician, to laboratory technicians, to specialists interpreting test results, back to the clinician for a diagnosis, to a nosologist for disease coding, to a national or hospital database, to a researcher querying the database, to a statistician analyzing the data, to a manuscript that details data, analyses, and results, to editors and peer reviewers, back to study authors, and on to publication.  Those leaps do not mean that the final results are untrustworthy, only that the study itself is not likely admissible in evidence.

The inadmissibility of scientific studies is not problematic because Rule 703 permits testifying expert witnesses to formulate opinions based upon facts and data, which are not themselves admissible in evidence. The distinction between relied upon, and admissible, studies is codified in the Federal Rules of Evidence, and in virtually every state’s evidence law.

Referring to studies, without qualification, as admissible in themselves is wrong as a matter of evidence law.  The error has the potential to encourage carelessness in gatekeeping expert witnesses’ opinions for their reliance upon inadmissible studies.  The error is doubly wrong if this approach to expert witness gatekeeping is taken as license to permit expert witnesses to rely upon any marginally relevant study of their choosing.  It is therefore disconcerting that the new Reference Manual on Science Evidence (RMSE 3d) fails to make the appropriate distinction between admissibility of studies and admissibility of expert witness opinion that has reasonably relied upon appropriate studies.

Consider the following statement from the chapter on epidemiology:

“An epidemiologic study that is sufficiently rigorous to justify a conclusion that it is scientifically valid should be admissible,184 as it tends to make an issue in dispute more or less likely.185”

RMSE 3d at 610.  Curiously, the authors of this chapter have ignored Professor Berger’s caution against slicing and dicing, and speak to a single study’s ability to justify a conclusion. The authors of the epidemiology chapter seem to be stressing that scientifically valid studies should be admissible.  The footnote emphasizes the point:

“See DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941, 958 (3d Cir. 1990); cf. Kehm v. Procter & Gamble Co., 580 F. Supp. 890, 902 (N.D. Iowa 1982) (“These [epidemiologic] studies were highly probative on the issue of causation—they all concluded that an association between tampon use and menstrually related TSS [toxic shock syndrome] cases exists.”), aff’d, 724 F.2d 613 (8th Cir. 1984). Hearsay concerns may limit the independent admissibility of the study, but the study could be relied on by an expert in forming an opinion and may be admissible pursuant to Fed. R. Evid. 703 as part of the underlying facts or data relied on by the expert. In Ellis v. International Playtex, Inc., 745 F.2d 292, 303 (4th Cir. 1984), the court concluded that certain epidemiologic studies were admissible despite criticism of the methodology used in the studies. The court held that the claims of bias went to the studies’ weight rather than their admissibility. Cf. Christophersen v. Allied-Signal Corp., 939 F.2d 1106, 1109 (5th Cir. 1991) (“As a general rule, questions relating to the bases and sources of an expert’s opinion affect the weight to be assigned that opinion rather than its admissibility. . . .”).”

RMSE 3d at 610 n.184 (emphasis in bold, added).  This statement, that studies relied upon by an expert in forming an opinion may be admissible pursuant to Rule 703, is unsupported by Rule 703 and the overwhelming weight of case law interpreting and applying the rule.  (Interestingly, the authors of this chapter seem to abandon their suggestion that studies relied upon “might qualify for the learned treatise exception to the hearsay rule, Fed. R. Evid. 803(18), or possibly the catchall exceptions, Fed. R. Evid. 803(24) & 804(5),” which was part of their argument in the Second Edition of the RMSE.  RMSE 2d at 335 (2000).)  See also RMSE 3d at 214 (discussing statistical studies as generally “admissible,” but acknowledging that admissibility may be no more than permission to explain the basis for an expert’s opinion).

The cases cited by the epidemiology chapter, Kehm and Ellis, both involved “factual findings” in public investigative or evaluative reports, which were independently admissible under Federal Rule of Evidence 803(8)(C).  See Ellis, 745 F.2d at 299-303; Kehm, 724 F.2d at 617-18.  As such, the cases hardly support the chapter’s suggestion that Rule 703 is a rule of admissibility for epidemiologic studies.

Here the RMSE, in one sentence, confuses Rule 703 with an exception to the rule against hearsay; absent such an exception, the hearsay rule would prevent the statistical studies from being received in evidence.  The point is reasonably clear, however, that the studies “may be offered” to explain an expert witness’s opinion.  Under Rule 705, that offer may also be refused.  The offer, however, is to “explain,” not to have the studies admitted in evidence.

The RMSE is certainly not alone in advancing this notion that studies are themselves admissible.  Other well-respected evidence scholars lapse into this position:

“Well conducted studies are uniformly admitted.”

David L. Faigman, et al., Modern Scientific Evidence:  The Law and Science of Expert Testimony v.1, § 23:1, at 206 (2009).

Evidence scholars should not conflate admissibility of the epidemiologic (or other) studies with the ability of an expert witness to advert to a study to explain his or her opinion.  The testifying expert witness really has no need to become a conduit for off-hand comments and opinions in the introduction or discussion section of relied upon articles, and the wholesale admission of such hearsay opinions undermines the court’s control over opinion evidence.  Rule 703 authorizes reasonable reliance upon “facts and data,” not every opinion that creeps into the published literature.