TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Gadolinium, Nephrogenic Systemic Fibrosis, and Case Reports

November 24th, 2014

Gadolinium (Gd) is a rare earth element. In its ionic form (+3), gadolinium is known to be highly toxic to humans. Gadolinium is strongly paramagnetic, which makes it a valuable contrast agent for magnetic resonance imaging (MRI). Gadolinium is administered intravenously in a chelated form before MRI. In its chelated form, the ion is escorted out of the body through the kidneys before exposure to free Gd ion occurs. Or that was the theory.

Nephrogenic systemic fibrosis (NSF) is a rare, painful, incurable, progressive connective tissue disease. NSF manifests with skin thickening and fibrosis, and with tethering, which means that the skin cannot be pulled away from the body. Some patients may develop extracutaneous fibrosis of muscle, lymph nodes, pleura, and other internal organs. Elana J. Bernstein, Christian Schmidt-Lauber, and Jonathan Kay, “Nephrogenic systemic fibrosis: A systemic fibrosing disease resulting from gadolinium exposure,” 26 Best Practice & Research Clin. Rheum. 489, 489 (2012).

As a diagnostic entity, NSF is a relatively recent discovery. The first case was noted in 1997, in California. Within a few years, the differential diagnostic criteria to distinguish NSF from other fibrotic diseases were developed. Centers for Disease Control, “Fibrosing skin condition among patients with renal disease–United States and Europe, 1997–2002,” 51 MMWR Morbidity and Mortality Weekly Report 25 (2002). Physicians identified the condition among patients with renal insufficiency who had received MRI with a gadolinium-based contrast agent (GBCA). Given the rarity of both the exposure (GBCA and renal insufficiency) and the outcome (NSF), the relationship between NSF and the use of gadolinium-containing contrast agents for MRI was discovered largely from case reports. A case registry is maintained at Yale University, and has identified 380 cases to date. Shawn E. Cowper, “Nephrogenic Systemic Fibrosis” at the website for The International Center for Nephrogenic Systemic Fibrosis Research (ICNSFR) [last updated June 15, 2013].

The little epidemiology that exists on the subject generally has found that all “cases” had exposure to Gd[1]. Or almost all. There have been occasional cases found without reported exposure to GBCA. Indeed, one case of NSF without prior GBCA was reported last month in the dermatological literature. C. Ross, N. De Rosa, G. Marshman & D. Astill, “Nephrogenic systemic fibrosis in a gadolinium-naïve patient: Successful treatment with oral sirolimus,” Australas. J. Dermatol. (2014); doi: 10.1111/ajd.12176 [Epub ahead of print].

In litigation, the usual scenario is that plaintiffs, their counsel, and their expert witnesses want to offer case reports or case series as probative of a causal association between an exposure and a particular disease outcome. In the silicone gel breast implant litigation, women who characterized themselves as “victims” shouted outside courtrooms, “We are the evidence.”

When the outcome in question has a baseline rate, and the exposure is widespread, this strategy is usually illegitimate, and most courts have limited or prohibited such obvious attempts to prejudice the jury with evidence that has little or no probative value.

The causal connection between NSF and GBCA, described above, was postulated on the basis of case reports, but this history is not really a rejection of the general rule about case reports. NSF is an extremely rare outcome, and GBCA administered to patients with serious kidney insufficiency is a fairly rare exposure. In addition, gadolinium ion has a known human toxicity, and the connection between renal insufficiency and Gd toxicity is rather straightforward. Insufficient kidney function results in longer “in residence” times for the GBCA, with the consequence that the gadolinium dissociates from its chelating agent, and the free Gd ion does its damage. Furthermore, biopsies of affected tissues show an uptake of gadolinium in NSF patients.

   *   *   *   *   *   *   *   *

GE Healthcare manufactures Omniscan, a GBCA, for use as an MRI-contrast medium. Given the recently discovered dangers of GBCAs in vulnerable patients, Omniscan has been a magnet for lawsuits, with the peak intensity of the litigation field in the MDL courtroom of federal district Judge Dan Polster. Judge Polster tried the first Omniscan case, which resulted in a verdict for the plaintiff. GE appealed, complaining about several of Judge Polster’s rulings, including the uneven handling of case reports. Last month, the Sixth Circuit affirmed. Decker v. GE Healthcare Inc., ___ F.3d ___, 2014 FED App. 0258P, 2014 U.S. App. LEXIS 20049 (6th Cir. Oct. 20, 2014).

General causation between GBCAs and NSF was apparently not disputed in Decker. Although plaintiffs in the GBCA litigation established the causality of GBCA in producing NSF by case reports, Judge Polster refused to permit GEHC’s expert witnesses to testify about their reliance upon case reports of gadolinium-naïve cases of NSF; that is, the court disallowed testimony about reported cases that occurred in the absence of GBCA exposure[2]. Id. at *9. Judge Polster found that the reported gadolinium-naïve case reports were “methodologically flawed” because they did not adequately show, by tissue biopsy or other means, that the NSF patients in question lacked Gd exposure. Id. at *10. The district court speculated that there may have been Gd exposure from a non-MRI procedure, but never explained what non-MRI procedure would involve internal administration of GBCA. Nor did the district court address the temporal relationship between this undocumented, conjectured non-MRI gadolinium-based imaging procedure and the onset of the reported patient’s NSF.

Before trial, defendant GEHC moved for reconsideration of the district court’s previous decision on defensive use of gadolinium-naïve case reports, based upon a then-recent publication of a “purported” case of gadolinium-naïve NSF. Id. at *8. A quick read of the late-breaking case study shows that it was more than a “purported” case. A.A. Lemy, et al., “Revisiting nephrogenic systemic fibrosis in 6 kidney transplant recipients: a single-center experience,” 63 J. Am. Acad. Dermatol. 389 (2010). The cited paper by Lemy had diagnosed NSF in a patient without GBCA exposure, and mass spectrometry testing of affected tissue revealed no Gd. The district court, however, dismissed the Lemy case as irrelevant unless GEHC’s expert witnesses could demonstrate that Lemy’s patient number 5 and the plaintiff were so clinically similar that “it was probable that Mr. Decker’s NSF was not caused by his 2005 Omniscan [exposure].”

The Sixth Circuit affirmed this “tails they win; heads you lose” approach to gatekeeping as all within the scope of the district court’s exercise of discretion. Lemy’s case number 5 and Mr. Decker both had NSF, and yet the courts did not identify clinical varieties of NSF that vary based upon their relatedness to gadolinium exposure. It would seem that the courts were imposing an extremely heavy burden on the defense to show that the gadolinium-naïve cases were absolutely free of Gd exposure, and that they resembled the particular plaintiff’s NSF diagnosis in every respect. Without any evidence of the sensitivity, specificity, and positive predictive value of the diagnostic criteria, the district and appellate courts seem to have accepted glib demands for absolute identity between the plaintiff’s NSF manifestation and any candidate Gd-free NSF case. Given that there is clinical heterogeneity among Gd-NSF cases, and that causality was basically inferred from cases and case series, the courts’ reasoning seems strained.

The appellate court also seemed blithely unaware of the fallacious circularity of permitting a diagnostic entity to be defined based upon exposure, thereby preventing any fair test of the hypothesis that all NSF cases are caused by gadolinium. This fallacy was advanced in the silicone gel breast implant litigation, where the litigation industry shrank from claims that silicone caused classic connective tissue diseases, in the face of exculpatory epidemiologic studies. The claimants retreated to a claim that silicone caused a “new” disease that was defined by mostly vague, self-reported symptoms [so very different from NSF in this respect], in conjunction with silicone exposure. The court-appointed expert witnesses, however, would have none of these shenanigans:

“The National Science Panel concluded that they do not yet support the inclusion of SSRD [systemic silicone-related disease] in the list of accepted diseases, for 4 reasons. First, the requirement of the inclusion of the putative cause (silicone exposure) as one of the criteria does not allow the criteria set to be tested objectively without knowledge of the presence of implants, thus incurring incorporation bias (27).”

Peter Tugwell, George Wells, Joan Peterson, Vivian Welch, Jacqueline Page, Carolyn Davison, Jessie McGowan, David Ramroth, and Beverley Shea, “Do Silicone Breast Implants Cause Rheumatologic Disorders? A Systematic Review for a Court-Appointed National Science Panel,” 44 Arthritis & Rheumatism 2477, 2479 (2001) (citing David Sackett, “Bias in analytic research,” 32 J. Chronic Dis. 51 (1979)).
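
The incorporation bias that the National Science Panel flagged is easy to demonstrate. The following toy simulation, in Python, is a sketch with invented numbers, not data from the silicone or NSF literature: when exposure is made part of the case definition, every “case” is exposed by construction, even where symptoms arise wholly independently of exposure.

```python
import random

# Toy illustration of incorporation bias; all numbers are invented.
# Symptoms are generated independently of exposure, so there is no
# true association to find.
random.seed(0)
people = [(random.random() < 0.3,   # exposed?
           random.random() < 0.1)   # symptomatic? (independent of exposure)
          for _ in range(100_000)]

# A circular criteria set: "disease" requires symptoms PLUS exposure.
cases = [(exposed, sick) for exposed, sick in people if sick and exposed]
share = sum(exposed for exposed, _ in cases) / len(cases)
print(f"Exposed fraction among cases: {share:.0%}")  # 100%, by definition
```

The 100% exposure rate among the “cases” looks like powerful evidence of causation, but it is an artifact of the definition; the hypothesis can never be tested against unexposed cases.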

Of course, NSF does not share the dubious provenance of SSRD, or SAD [silicone-associated disorder] as it was sometimes known. Still, the analytic studies showing that all, or most, NSF cases had GBCA exposure explicitly refrained from including gadolinium exposure in the NSF case definition.

Decker is thus a curious case. The trial and appellate courts talked about preventing the defense expert witnesses from relying upon case reports that were “methodologically flawed,” but the courts never mentioned Federal Rule of Evidence 703, which should have been the basis for such selective pruning of the expert witnesses’ reliance materials. And then there is the matter that even if GEHC were correct about Gd-free NSF cases, the risk of NSF attributable to prior Gd exposure is almost certainly very high, and the debate over whether NSF is a “signature” disease was not likely going to affect the case outcome.

Decker can perhaps best be understood as a dispute about specific causation, with established general causation, in which the relative risk of NSF from GBCA exposure is extraordinarily high among patients with renal insufficiency. If there are other causes of NSF, they are considerably rarer than GBCA/renal insufficiency exposed cases. In the face of this very high attributable risk, GE’s expert witnesses’ discussions of an idiopathic or other cause were too speculative to pass muster under Rule 702.
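
The arithmetic behind that specific-causation point is simple. In the sketch below, the relative risks are assumed, illustrative figures, not numbers drawn from the Decker record; the attributable fraction among the exposed follows the standard formula AF = (RR − 1)/RR.

```python
# Attributable fraction among the exposed: AF = (RR - 1) / RR.
# The relative risks below are assumed for illustration; they are
# not findings from Decker or the NSF literature.
for rr in (2.0, 5.0, 20.0):
    af = (rr - 1.0) / rr
    print(f"RR = {rr:>4}: attributable fraction = {af:.0%}")

# RR = 2 corresponds to the familiar "more likely than not" threshold
# (AF = 50%); at RR = 20, some 95% of exposed cases would be
# attributable to the exposure.
```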


[1] Elana J. Bernstein, Tamara Isakova, Mary E. Sullivan, Lori B. Chibnik, Myles Wolf & Jonathan Kay, “Nephrogenic systemic fibrosis is associated with hypophosphataemia: a case–control study,” 53 Rheumatology 1613 (2014); T.R. Elmholdt, M. Pedersen, B. Jørgensen, K. Søndergaard, J.D. Jensen, M. Ramsing, and A.B. Olesen, “Nephrogenic systemic fibrosis is found only among gadolinium-exposed patients with renal insufficiency: a case-control study from Denmark,” 165 Br. J. Dermatol. 828 (2011); P. Marckmann, “An epidemic outbreak of nephrogenic systemic fibrosis in a Danish hospital,” 66 Eur. J. Radiol. 187 (2008) (reporting all patients had gadodiamide-enhanced magnetic resonance imaging and severe renal insufficiency before onset of NSF); P. Marckmann, L. Skov, K. Rossen, J.G. Heaf, and H.S. Thomsen, “Case-control study of gadodiamide-related nephrogenic systemic fibrosis,” 22 Nephrol. Dialysis & Transplant. 3174 (2007) (all 19 cases in case-control study had prior exposure to gadolinium (Gd)-containing magnetic resonance imaging contrast agents); Centers for Disease Control, “Nephrogenic Fibrosing Dermopathy Associated with Exposure to Gadolinium-Containing Contrast Agents — St. Louis, Missouri, 2002–2006,” 56 MMWR Morbidity and Mortality Weekly Report (Feb. 23, 2007).

[2] T.A. Collidge, P.C. Thomson, P.B. Mark, et al., “Gadolinium-Enhanced MR Imaging and Nephrogenic Systemic Fibrosis: Retrospective Study of a Renal Replacement Therapy Cohort,” 245 Radiology 168-175 (2007); I.M. Wahba, E.L. Simpson, and K. White, “Gadolinium Is Not The Only Trigger For Nephrogenic Systemic Fibrosis: Insights From Two Cases And Review Of The Recent Literature,” 7 Am. J. Transplant. 1 (2007); A. Deng, D.B. Martin, et al., “Nephrogenic Systemic Fibrosis with a Spectrum of Clinical and Histopathological Presentation: A Disorder of Aberrant Dermal Remodeling,” 37 J. Cutan. Pathol. 204 (2009).

Rhetorical Strategy in Characterizing Scientific Burdens of Proof

November 15th, 2014

The recent opinion piece by Kevin Elliott and David Resnik exemplifies a rhetorical strategy that idealizes and elevates a burden of proof in science, and then declares it is different from legal and regulatory burdens of proof. Kevin C. Elliott and David B. Resnik, “Science, Policy, and the Transparency of Values,” 122 Envt’l Health Persp. 647 (2014) [Elliott & Resnik]. What is astonishing about this strategy is the lack of support for the claim that “science” imposes such a high burden of proof that we can safely ignore it when making “practical” legal or regulatory decisions. Here is how the authors state their claim:

“Very high standards of evidence are typically expected in order to infer causal relationships or to approve the marketing of new drugs. In other social contexts, such as tort law and chemical regulation, weaker standards of evidence are sometimes acceptable to protect the public (Cranor 2008).”

Id.[1] Remarkably, the authors cite no statute, no case law, and no legal treatise for the proposition that the tort law standard for causation is somehow lower than for a scientific claim of causality. Similarly, the authors cite no support for their claim that regulatory pronouncements are judged under a lower burden. One need only consider the burden a sponsor faces in establishing medication efficacy and safety in a New Drug Application before the Food and Drug Administration. Of course, when agencies engage in assessing causal claims regarding safety, they often act under regulations and guidances that lessen the burden of proof from what would be required in a tort action.[2]

And most important, Elliott and Resnik fail to cite to any work of scientists for the claim that scientists require a greater burden of proof before accepting a causal claim. When these authors’ claims of differential burdens of proof were challenged by a scientist, Dr. David Schwartz, in a letter to the editors, the authors insisted that they were correct, again citing to Carl Cranor, a non-lawyer, non-scientist:

“we caution against equating the standards of evidence expected in tort law with those expected in more traditional scientific contexts. The tort system requires only a preponderance of evidence (> 50% likelihood) to win a case; this is much weaker evidence than scientists typically demand when presenting or publishing results, and confusion about these differing standards has led to significant legal controversies (Cranor 2006).”

Reply to Dr. Schwartz. The only thing the authors added to the discussion was to cite the same work by Carl Cranor[3], but with the date of the book changed.

Whence comes the assertion that science has a heavier burden of proof? Elliott and Resnik cite Cranor for their remarkable proposition, so where did Cranor find support for the proposition at issue here? In his 1993 book, Cranor suggests that we “can think of type I and II error rates as ‘standards of proof’,” which begs the question whether they are appropriately used to assess significance or posterior probabilities[4]. Cranor goes so far in his 1993 book as to describe the usual level of alpha as the “95%” rule, and to claim that regulatory agencies require something akin to proof “beyond a reasonable doubt” when they require two “statistically significant” studies[5]. Thus Cranor’s opinion has its origins in his commission of the transposition fallacy[6].
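
The fallacy is worth spelling out. A minimal sketch, with an assumed base rate of true hypotheses and an assumed statistical power (neither figure comes from Cranor), shows that the probability of the data given the null hypothesis, alpha = 0.05, is not the probability of the null hypothesis given the data:

```python
# Sketch: P(significant | H0) = 0.05 does not make P(H0 | significant) = 0.05.
# The prior (10% of tested hypotheses true) and the power (80%) are
# assumed purely for illustration.
prior_h1 = 0.10  # assumed share of tested hypotheses that are true
alpha = 0.05     # significance level: P(significant | H0)
power = 0.80     # assumed P(significant | H1)

p_sig = alpha * (1 - prior_h1) + power * prior_h1
p_h0_given_sig = alpha * (1 - prior_h1) / p_sig
print(f"P(H0 | significant result) = {p_h0_given_sig:.2f}")  # 0.36, not 0.05
```

Under these assumptions, more than a third of “statistically significant” findings would be false positives; the 5% figure, standing alone, says nothing about the certainty of the conclusion.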

Cranor has persisted in his fallacious analysis in his later books. In his 2006 book, he erroneously equates the 95% coefficient of statistical confidence with 95% certainty of knowledge[7]. Later in the text, he asserts that agency regulations are written only when supported by proof “beyond a reasonable doubt.”[8]

To be fair, it is possible to find regulators stating something close to what Cranor asserts, but only when they themselves are committing the transposition fallacy:

“Statistical significance is a mathematical determination of the confidence in the outcome of a test. The usual criterion for establishing statistical significance is the p-value (probability value). A statistically significant difference in results is generally indicated by p < 0.05, meaning there is less than a 5% probability that the toxic effects observed were due to chance and were not caused by the chemical. Another way of looking at it is that there is a 95% probability that the effect is real, i.e., the effect seen was the result of the chemical exposure.”

U.S. Dep’t of Labor, Guidance for Hazard Determination for Compliance with the OSHA Hazard Communication Standard (29 CFR § 1910.1200) Section V (July 6, 2007).

It is similarly possible to find policy wonks expressing such views. In 1993, the Carnegie Commission published a report in which it tried to explain away junk science as simply the discrepancy in burdens of proof between law and science, but its reasoning clearly points to the Commission’s commission of the transposition fallacy:

“The reality is that courts often decide cases not on the scientific merits, but on concepts such as burden of proof that operate differently in the legal and scientific realms. Scientists may misperceive these decisions as based on a misunderstanding of the science, when in actuality the decision may simply result from applying a different norm, one that, for the judiciary, is appropriate.  Much, for instance, has been written about ‘junk science’ in the courtroom. But judicial decisions that appear to be based on ‘bad’ science may actually reflect the reality that the law requires a burden of proof, or confidence level, other than the 95 percent confidence level that is often used by scientists to reject the possibility that chance alone accounted for observed differences.”

The Carnegie Commission on Science, Technology, and Government, Report on Science and Technology in Judicial Decision Making 28 (1993)[9].

Resnik and Cranor’s rhetoric is a commonplace in the courtroom. Here is how the rhetorical strategy plays out. Plaintiffs’ counsel elicit concessions from defense expert witnesses that they are using the “norms” and standards of science in presenting their opinions. Counsel then argue to the finder of fact that the defense experts are wonderful, but irrelevant, because the fact finder must decide the case on a lower standard. This stratagem can be found supported by the writings of plaintiffs’ counsel and their expert witnesses[10]. The stratagem also shows up in the writings of law professors who are critical of the law’s embrace of scientific scruples in the courtroom[11].

The cacophony of error, from advocates and commentators, has led the courts into frequent error on the subject. Thus, Judge Pauline Newman, who sits on the United States Court of Appeals for the Federal Circuit, and who was a member of the Committee on the Development of the Third Edition of the Reference Manual on Scientific Evidence, wrote in one of her appellate opinions[12]:

“Scientists as well as judges must understand: ‘the reality that the law requires a burden of proof, or confidence level, other than the 95 percent confidence level that is often used by scientists to reject the possibility that chance alone accounted for observed differences’.”

Reaching back even further into the judiciary’s wrestling with the issue of the difference between legal and scientific standards of proof, we have one of the clearest and clearly incorrect statements of the matter[13]:

“Petitioners demand sole reliance on scientific facts, on evidence that reputable scientific techniques certify as certain. Typically, a scientist will not so certify evidence unless the probability of error, by standard statistical measurement, is less than 5%. That is, scientific fact is at least 95% certain.  Such certainty has never characterized the judicial or the administrative process. It may be that the ‘beyond a reasonable doubt’ standard of criminal law demands 95% certainty.  Cf. McGill v. United States, 121 U.S.App. D.C. 179, 185 n.6, 348 F.2d 791, 797 n.6 (1965). But the standard of ordinary civil litigation, a preponderance of the evidence, demands only 51% certainty. A jury may weigh conflicting evidence and certify as adjudicative (although not scientific) fact that which it believes is more likely than not. ***”

The 95% certainty appears to derive from 95% confidence intervals, although “confidence” is a technical term in statistics, and it most certainly does not mean the probability of the alternative hypothesis under consideration.  Similarly, the probability that is less than 5% is not the probability that the null hypothesis is correct. The United States Court of Appeals for the District of Columbia thus fell for the rhetorical gambit in accepting the strawman that scientific certainty is 95%, whereas civil and administrative law certainty is a smidgeon above 50%.
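
The point about “confidence” can be shown in a few lines. In this simulation sketch (the data are made up), 95% describes the long-run coverage of the interval-generating procedure over repeated samples, not a 95% probability that any one interval, or any one hypothesis, is correct:

```python
import random

# Simulate repeated 95% confidence intervals around a known true mean.
# About 95% of the intervals cover the truth over many repetitions;
# that long-run property is all that "95% confidence" asserts.
random.seed(1)
true_mean, sigma, n, trials = 10.0, 2.0, 30, 10_000
covered = 0
for _ in range(trials):
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    m = sum(sample) / n
    sd = (sum((x - m) ** 2 for x in sample) / (n - 1)) ** 0.5
    half_width = 1.96 * sd / n ** 0.5  # normal approximation
    covered += (m - half_width) <= true_mean <= (m + half_width)
print(f"Coverage over repeated samples: {covered / trials:.3f}")  # ~0.95
```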

We should not be too surprised that courts have erroneously described burdens of proof in the realm of science. Even within legal contexts, judges have a very difficult time articulating exactly how different verbal formulations of the burden of proof translate into probability statements. In one of his published decisions, Judge Jack Weinstein reported an informal survey of judges of the Eastern District of New York, on what they believed were the correct quantifications of legal burdens of proof. The results confirm that judges, who must deal with burdens of proof first as lawyers and then as “umpires” on the bench, have no settled idea how to translate verbal formulations into mathematical quantities:

United States v. Fatico, 458 F. Supp. 388 (E.D.N.Y. 1978). Thus one judge believed that “clear, unequivocal and convincing” required a higher level of proof (90%) than “beyond a reasonable doubt,” and no judge placed “beyond a reasonable doubt” above 95%. A majority of the judges polled placed the criminal standard below 90%.

In running down Elliott, Resnik, and Cranor’s assertions about burdens of proof, all I could find was the commonplace error involved in moving from 95% confidence to 95% certainty. Otherwise, I found scientists declaring that the burden of proof should rest with the scientist who is making the novel causal claim. Carl Sagan famously declaimed, “extraordinary claims require extraordinary evidence[14],” but he appears never to have succumbed to the temptation to provide a quantification of the posterior probability that would cinch the claim.

If anyone has any evidence lending support to Resnik’s claim, other than the transposition fallacy or the confusion between certainty and the coefficient of statistical confidence, please share.



[1] The authors’ citation is to Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice (NY 2008). Professor Cranor teaches philosophy at one of the University of California campuses. He is neither a lawyer nor a scientist, but he does participate with some frequency as a consultant, and as an expert witness, in lawsuits, on behalf of claimants.

[2] See, e.g., In re Agent Orange Product Liab. Litig., 597 F. Supp. 740, 781 (E.D.N.Y. 1984) (Weinstein, J.) (“The distinction between avoidance of risk through regulation and compensation for injuries after the fact is a fundamental one.”), aff’d 818 F.2d 145 (2d Cir. 1987) (approving district court’s analysis), cert. denied sub nom. Pinkney v. Dow Chemical Co., 487 U.S. 1234 (1988).

[3] Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice (NY 2006).

[4] Carl F. Cranor, Regulating Toxic Substances: A Philosophy of Science and the Law at 33-34 (Oxford 1993) (“One can think of α, β (the chances of type I and type II errors, respectively), and 1 − β as measures of the ‘risk of error’ or ‘standards of proof.’”). See also id. at 44, 47, 55, 72-76.

[5] Id. (squaring 0.05 to arrive at “the chances of two such rare events occurring” as 0.0025).

[6] Michael D. Green, “Science Is to Law as the Burden of Proof is to Significance Testing: Book Review of Cranor, Regulating Toxic Substances: A Philosophy of Science and the Law,” 37 Jurimetrics J. 205 (1997) (taking Cranor to task for confusing significance and posterior (burden of proof) probabilities). At least one other reviewer was not as discerning as Professor Green and fell for Cranor’s fallacious analysis. Steven R. Weller, “Book Review: Regulating Toxic Substances: A Philosophy of Science and Law,” 6 Harv. J. L. & Tech. 435, 436, 437-38 (1993) (“only when the statistical evidence gathered from studies shows that it is more than ninety-five percent likely that a test substance causes cancer will the substance be characterized scientifically as carcinogenic … to determine legal causality, the plaintiff need only establish that the probability with which it is true that the substance in question causes cancer is at least fifty percent, rather than the ninety-five percent to prove scientific causality”).

[7] Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice 100 (2006) (incorrectly asserting, without further support, that “[t]he practice of setting α =.05 I call the “95% rule,” for researchers want to be 95% certain that when knowledge is gained [a study shows new results] and the null hypothesis is rejected, it is correctly rejected.”).

[8] Id. at 266.

[9] There were some scientists on the Commission’s Task Force, but most of the members were lawyers.

[10] Jan Beyea & Daniel Berger, “Scientific misconceptions among Daubert gatekeepers: the need for reform of expert review procedures,” 64 Law & Contemporary Problems 327, 328 (2001) (“In fact, Daubert, as interpreted by ‛logician’ judges, can amount to a super-Frye test requiring universal acceptance of the reasoning in an expert’s testimony. It also can, in effect, raise the burden of proof in science-dominated cases from the acceptable “more likely than not” standard to the nearly impossible burden of ‛beyond a reasonable doubt’.”).

[11] Lucinda M. Finley, “Guarding the Gate to the Courthouse: How Trial Judges Are Using Their Evidentiary Screening Role to Remake Tort Causation Rules,” 49 DePaul L. Rev. 335, 348 n. 49 (1999) (“Courts also require that the risk ratio in a study be ‘statistically significant,’ which is a statistical measurement of the likelihood that any detected association has occurred by chance, or is due to the exposure. Tests of statistical significance are intended to guard against what are called ‘Type I’ errors, or falsely ascribing a relationship when there in fact is not one (a false positive).” Finley erroneously ignores the conditioning of the significance probability on the null hypothesis, and she suggests that statistical significance is sufficient for ascribing causality); Erica Beecher-Monas, Evaluating Scientific Evidence: An Interdisciplinary Framework for Intellectual Due Process 42 n. 30, 61 (2007) (“Another way of explaining this is that it describes the probability that the procedure produced the observed effect by chance.”) (“Statistical significance is a statement about the frequency with which a particular finding is likely to arise by chance.”).

[12] Hodges v. Secretary Dep’t Health & Human Services, 9 F.3d 958, 967 (Fed. Cir. 1993) (Newman, J., dissenting) (citing and quoting from the Carnegie Commission on Science, Technology, and Government, Report on Science and Technology in Judicial Decision Making 28 (1993)).

[13] Ethyl Corp. v. EPA, 541 F.2d 1, 28 n.58 (D.C. Cir.), cert. denied, 426 U.S. 941 (1976).

[14] Carl Sagan, Broca’s Brain: Reflections on the Romance of Science 93 (1979).

The Standard of Appellate Review for Rule 702 Decisions

November 12th, 2014

Back in the day, some of the United States Courts of Appeals embraced an asymmetric standard of review of district court decisions concerning the admissibility of expert witness opinion evidence. If the trial court’s decision was to exclude an expert witness, and that exclusion resulted in summary judgment, then the appellate court would take a “hard look” at the trial court’s decision. If the trial court admitted the expert witness’s opinions, and the case proceeded to trial, with the opponent of the challenged expert witness losing the verdict, then the appellate court would take a not-so-hard look at the trial court’s decision to admit the opinion. In re Paoli RR Yard PCB Litig., 35 F.3d 717, 750 (3d Cir. 1994) (Becker, J.), cert. denied, 115 S. Ct. 1253 (1995).

In Kumho Tire, the 11th Circuit followed this asymmetric approach, only to have the Supreme Court reverse and render. Unlike the appellate procedure followed in Daubert, the high Court took the extra step of applying the symmetrical standard of review, presumably for the didactic purpose of showing the 11th Circuit how to engage in appellate review. Carmichael v. Kumho Tire Co., 131 F.3d 1433 (11th Cir. 1997), rev’d sub nom. Kumho Tire Co. v. Carmichael, 526 U.S. 137, 158-59 (1999).

If anything is clear from the Kumho Tire decision, it is that courts do not have discretion to apply an asymmetric standard in their evaluation of a challenge, under Federal Rule of Evidence 702, to a proffered expert witness opinion. Justice Stephen Breyer, in his opinion for the Court in Kumho Tire, went on to articulate the requirement that trial courts must inquire whether an expert witness “employs in the courtroom the same level of intellectual rigor that characterizes the practice of an expert in the relevant field.” Kumho Tire Co. v. Carmichael, 526 U.S. 137, 152 (1999). Again, trial courts do not have the discretion to abandon this inquiry.

The “same intellectual rigor” test may have some ambiguities that make application difficult. For instance, identifying the “relevant” field or discipline may be contested. Physicians traditionally have not been trained in statistical analyses, yet they produce, and rely extensively upon, clinical research, the proper conduct and interpretation of which requires expertise in study design and data analysis. Is the relevant field biostatistics or internal medicine? Given that the validity and reliability of the relied upon studies come from biostatistics, courts need to acknowledge that the rigor test requires identification of the “appropriate” field — the field that produces the criteria or standards of validity and interpretation.

Justice Breyer did grant that trial courts must have some latitude in determining how to conduct their gatekeeping inquiries. Some cases may call for full-blown hearings and post-hearing proposed findings of fact and conclusions of law; some cases may be easily decided upon the moving papers. Justice Breyer’s grant of “latitude,” however, wanders off target:

“The trial court must have the same kind of latitude in deciding how to test an expert’s reliability, and to decide whether or when special briefing or other proceedings are needed to investigate reliability, as it enjoys when it decides whether that expert’s relevant testimony is reliable. Our opinion in Joiner makes clear that a court of appeals is to apply an abuse-of-discretion standard when it ‛review[s] a trial court’s decision to admit or exclude expert testimony’. 522 U. S. at 138-139. That standard applies as much to the trial court’s decisions about how to determine reliability as to its ultimate conclusion. Otherwise, the trial judge would lack the discretionary authority needed both to avoid unnecessary ‛reliability’ proceedings in ordinary cases where the reliability of an expert’s methods is properly taken for granted, and to require appropriate proceedings in the less usual or more complex cases where cause for questioning the expert’s reliability arises. Indeed, the Rules seek to avoid ‛unjustifiable expense and delay’ as part of their search for ‛truth’ and the ‛jus[t] determin[ation]’ of proceedings. Fed. Rule Evid. 102. Thus, whether Daubert ’s specific factors are, or are not, reasonable measures of reliability in a particular case is a matter that the law grants the trial judge broad latitude to determine. See Joiner, supra, at 143. And the Eleventh Circuit erred insofar as it held to the contrary.”

Kumho, 526 U.S. at 152-53.

Now the segue from discretion to fashion the procedural mechanism for gatekeeping review to discretion to fashion the substantive criteria or standards for determining “intellectual rigor in the relevant field” represents a rather abrupt shift. The leap from discretion to fashion procedure to discretion to fashion substantive criteria of validity has no basis in prior law, in linguistics, or in science. For instance, Justice Breyer would be hard pressed to uphold a trial court’s refusal to consider bias and confounding in assessing whether epidemiologic studies established causality in a given case, notwithstanding the careless language quoted above.

The troubling nature of Justice Breyer’s language did not go unnoticed at the time of the Kumho Tire case. Indeed, three of the Justices in Kumho Tire concurred to clarify:

“I join the opinion of the Court, which makes clear that the discretion it endorses—trial-court discretion in choosing the manner of testing expert reliability—is not discretion to abandon the gatekeeping function. I think it worth adding that it is not discretion to perform the function inadequately.”

Kumho Tire Co. v. Carmichael, 526 U.S. 137, 158-59 (1999) (Scalia, J., concurring, with O’Connor, J., and Thomas, J.).

Of course, this language from Kumho Tire really cannot be treated as binding after the statute it interpreted, Rule 702, was amended in 2000. The judges of the inferior federal courts have struggled with Rule 702, sometimes more to evade its reach than to perform gatekeeping in an intelligent way. Quotations of passages from cases decided before the statute was amended and revised should be treated with skepticism.

Recently, the Sixth Circuit quoted Justice Breyer’s language about latitude from Kumho Tire, in the Circuit’s decision involving GE Healthcare’s radiographic contrast medium, Omniscan. Decker v. GE Healthcare Inc., 2014 U.S. App. LEXIS 20049, at *29 (6th Cir. Oct. 20, 2014). Although the Decker case is problematic in many ways, the defendant did not challenge general causation between gadolinium and nephrogenic systemic fibrosis, a painful, progressive connective tissue disease, which afflicted the plaintiff. It is unclear exactly what sort of latitude in applying the statute the Sixth Circuit was hoping to excuse.

Contrivance Standard Applied to Gatekeepers and Expert Witnesses

October 1st, 2014

In Rink v. Cheminova, Inc., 400 F.3d 1286 (11th Cir. 2005), the Eleventh Circuit articulated a “contrivance standard,” which suggested that a district court “may properly consider whether the expert’s methodology has been contrived to reach a particular result.” Id. at 1293 & n.7; see also “The Contrivance Standard for Expert Witness Gatekeeping” (Sept. 28, 2014).

Although this standard has some appeal, it raises questions of motives that can complicate the Rule 702 inquiry into whether a purported opinion is “knowledge.” A less psychoanalytic inquiry into the expert witness’s motivation should generally be the first line of approach.

In the Zoloft MDL, the trial court banished Dr. Anick Bérard from federal court birth defect cases because of her unprincipled and inexplicable cherry picking of the data relied upon for her causation opinions. See In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., MDL No. 2342; 12-md-2342, 2014 U.S. Dist. LEXIS 87592; 2014 WL 2921648 (E.D. Pa. June 27, 2014) (Rufe, J.). The “contrivance” was objectively obvious, manifest in the double counting of data points and in the disregard of point estimates contrary to the desired outcome, even in papers whose other point estimates were selectively embraced.

In the Chantix MDL, the trial court found that the defendant had harped on methodological peccadilloes, but the court obviously did not like the beatific music. Cherry picking was going on, but it was perfectly acceptable to this MDL court:

“Why Dr. Kramer chose to include or exclude data from specific clinical trials is a matter for cross-examination, not exclusion under Daubert.”

In re Chantix (varenicline) Prods. Liab. Litig., 889 F. Supp. 2d 1272, 1288 (2012) (MDL No. 2092) (permitting Dr. Shira Kramer to testify on causation despite her embracing a “weight of the evidence” method that turned largely on “subjective interpretations” of various, undescribed, non-prespecified lines of evidence).

The differing approaches to cherry picking are hard to reconcile, other than to note that Chantix had drawn a “black box” warning from the FDA, and the SSRI involved in Zoloft had not been given any heightened warning from the FDA, foreign agencies, or any professional society. FDA labeling, of course, should not have been determinative of the causation question. The mind of the gatekeeper, however, is inscrutable.


The Contrivance Standard for Expert Witness Gatekeeping

September 28th, 2014

According to Google ngram, the phrase “junk science” made its debut circa 1975, lagging “junk food” by about five years. See “The Rise and Rise of Junk Science” (Mar. 8, 2014). I have never much liked the phrase “junk science” because it suggests that courts need only be wary of the absurd and ridiculous in their gatekeeping function. Some expert witness opinions are, in fact, serious scientific contributions, just not worthy of being advanced as scientific conclusions. Perhaps better than “junk” would be patho-epistemologic opinions, or maybe even wissenschmutz, but even these terms might obscure that the opinion that needs to be excluded derives from serious scientific work, only work that is not ready to be held forth as a scientific conclusion that can colorably be called knowledge.

Another formulation of my term, patho-epistemology, is the Eleventh Circuit’s lovely “Contrivance Standard.” Rink v. Cheminova, Inc., 400 F.3d 1286, 1293 & n.7 (11th Cir. 2005). In Rink, the appellate court held that the district court had acted within its discretion to exclude expert witness testimony because it had properly confined its focus to the challenged expert witness’s methodology, not his credibility:

“In evaluating the reliability of an expert’s method, however, a district court may properly consider whether the expert’s methodology has been contrived to reach a particular result. See Joiner, 522 U.S. at 146, 118 S.Ct. at 519 (affirming exclusion of testimony where the methodology was called into question because an “analytical gap” existed “between the data and the opinion proffered”); see also Elcock v. Kmart Corp., 233 F.3d 734, 748 (3d Cir. 2000) (questioning the methodology of an expert because his “novel synthesis” of two accepted methodologies allowed the expert to “offer a subjective judgment … in the guise of a reliable expert opinion”).”

Note the resistance, however, to the Supreme Court’s mandate of gatekeeping. District courts must apply the statutes, Rules of Evidence 702 and 703. There is no legal authority for the suggestion that a district court merely “may properly consider whether the expert’s methodology has been contrived.” Rink, 400 F.3d at 1293 n.7 (emphasis added).

Railroading Scientific Evidence of Causation in Court

August 31st, 2014

Harold Tanfield spent 40 years or so working for Consolidated Rail Corporation (and its predecessors), from 1952 to 1992. Mr. Tanfield’s widow sued Conrail, under the Federal Employers’ Liability Act (“FELA”), 45 U.S.C.A. §§ 51-60, for negligently overexposing her late husband to diesel fumes, which allegedly caused him to develop lung cancer. Tanfield v. Leigh RR, No. A-4170-12T2, New Jersey Superior Court, App. Div. (Aug. 11, 2014), slip op. at 3 [cited below as Tanfield].

The trial court granted Conrail summary judgment on grounds that plaintiff failed to show that Conrail had breached a duty of care.  The appellate court reversed and remanded for trial. The Appellate Division’s decision is “per curiam,” and franked “not for publication without the approval of the Appellate Division.” Only two of the usual three appellate judges participated.  The panel decided the case one week after it was submitted.

The plaintiff relied upon two witnesses: a co-worker of her husband, and an expert witness, Steven R. Tahan, M.D. Dr. Tahan is a pathologist, an Associate Professor in the Department of Pathology, Harvard Medical School, and the Director of Dermatopathology, Beth Israel Deaconess Medical Center. Dr. Tahan’s website lists melanoma as his principal research interest. A PubMed search reveals no publications on diesel fumes, occupational disease, or lung cancer. Dr. Tahan’s principal research interest, skin pathology, was decidedly not at issue in the Tanfield case.

The panel of the Appellate Division quoted from the relevant paragraphs of Tahan’s report:

“Mr. Tanfield was a railroad worker for 35 years, where he was exposed to a large number of carcinogenic chemicals and fumes, including asbestos, antimony, arsenic, benzene, beryllium, cadmium, carbon disulfide, cyanide, DDT, diesel fumes, diesel fuel, dioxins, ethylbenzene, lead, methylene chloride, mercury, naphthalene, petroleum hydrocarbon, polychlorinated biphenyls, polynuclear aromatic hydrocarbons, toluene, vinyl acetate, and other volatile organics.

I have reviewed the cytology and biopsy slides from the right lung and confirm that he had a poorly differentiated malignant non-small cell carcinoma with both adenocarcinomatous and squamous features.  I have reached the following conclusions to a reasonable degree of medical certainty based on review of the above materials, my education, training, and experience, and review of published studies.

Mr. Tanfield’s more than 35 year substantial occupational exposure to an extensive array of carcinogens and diesel fumes without provision of protective equipment such as masks, respirators, and other filters created a long-term hazard that substantially multiplied his risk for developing lung cancer over the baseline he had as a former smoker.  It is more likely than not that his occupational exposure to diesel fumes and other carcinogenic toxins present in his workplace was a significant causative factor for his development of lung cancer and death from his cancer.”

Tanfield at 6-7.

Mr. Tanfield’s co-worker testified to what appeared to him to be excessive diesel fumes in the workplace, but there is no mention of any quantitative or qualitative evidence of any other lung carcinogen. The Appellate Division states that the above three paragraphs represent the substance of Dr. Tahan’s report, and so it appears that there is no quantification of Tanfield’s smoking history, or of the length of time between his discontinuing smoking and the diagnosis of his lung cancer. There is no discussion of any support for the alleged interaction between risks, or of any quantification of the extent of his increased risk from his lifestyle choices as opposed to his workplace exposure(s). There is no discussion of what Dr. Tahan visualized in his review of cytology and pathology slides that permitted him to draw inferences about the actual causes of Mr. Tanfield’s lung cancer.

The trial judge proceeded on the assumption that there was an adequate proffer of expert opinion on causation, but held that Dr. Tahan’s opinions on the failure to provide masks or respirators were a “net opinion,” a bit out of Tahan’s area of expertise. Tanfield at 8. The Appellate Division apparently thought that having a skin pathologist opine about the duty of care for a railroad was good enough for government work. The appellate court gave the widow the benefit of the lower evidentiary threshold for negligence under FELA, which supposedly excuses the lack of an industrial hygiene opinion. Tanfield at 10. According to the two-judge panel, “[t]he doctor’s [Tahan’s] opinions are backed by professional literature and by his own considerable years of research and experience.” Tanfield at 11. The panel’s statement is all the more remarkable given that Tahan had never published on lung cancer, exposure assessments, or industrial hygiene measures; the vaunted experience of this witness was irrelevant to the issues in the case. Perhaps even more disturbing are the gaps in the proofs: the lack of evidence of a causal connection between many of the alleged exposures and lung cancer generally, and the absence of any discussion whether the level of exposure to diesel fumes, from 1952 to 1992, was such that the railroad knew or should have known that it caused lung cancer in workers. And then there is the lurking probability that Mr. Tanfield’s smoking was the sole cause of his lung cancer.

Over 50 years ago, the New York Court of Appeals rejected a claim for leukemia, based upon allegations of benzene exposure, without any quantification of risk from the alleged exposure.  Miller v. National Cabinet Co., 8 N.Y.2d 277, 283-84, 168 N.E.2d 811, 813-15, 204 N.Y.S.2d 129, 132-34, modified on other grounds, 8 N.Y.2d 1025, 70 N.E.2d 214, 206 N.Y.S.2d 795 (1960). It is time to raise the standard for New Jersey courts’ consideration of epidemiologic evidence.

Peer Review, PubPeer, PubChase, and Rule 702 – Candles in the Ear

August 28th, 2014

In deciding the Daubert case, the Supreme Court identified several factors to assess whether “the reasoning or methodology underlying the testimony is scientifically valid and of whether that reasoning or methodology properly can be applied to the facts in issue.” One of those factors was whether the proffered opinion had been “peer reviewed” and published. Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579, 593-94 (1993). The Court explained the publication factor:

“Another pertinent consideration is whether the theory or technique has been subjected to peer review and publication. Publication (which is but one element of peer review) is not a sine qua non of admissibility; it does not necessarily correlate with reliability, and in some instances well-grounded but innovative theories will not have been published. Some propositions, moreover, are too particular, too new, or of too limited interest to be published. But submission to the scrutiny of the scientific community is a component of ‘good science,’ in part because it increases the likelihood that substantive flaws in methodology will be detected. The fact of publication (or lack thereof) in a peer reviewed journal thus will be a relevant, though not dispositive, consideration in assessing the scientific validity of a particular technique or methodology on which an opinion is premised.”

Daubert, 509 U.S. at 593-94 (internal citations omitted). See, e.g., Lust v. Merrell Dow Pharms., Inc., 89 F.3d 594, 597 (9th Cir. 1996) (affirming exclusion of Dr. Alan Done, plaintiffs’ expert witness in a Clomid birth defects case, in part because of the lack of peer review and publication of his litigation-driven opinions); Hall v. Baxter Healthcare Corp., 947 F. Supp. 1387, 1406 (D. Or. 1996) (noting that “the lack of peer review for [epidemiologist] Dr. Swan’s theories weighs heavily against the admissibility of Dr. Swan’s testimony”).

Case law since Daubert has made clear that peer review is neither necessary nor sufficient for the admissibility of an opinion. United States v. Mikos, 539 F.3d 706, 711 (7th Cir. 2008) (noting that the absence of peer-reviewed studies on the subject of bullet grooving did not render an opinion, based upon the FBI database, inadmissible); In re Zoloft Prods. Liab. Litig., MDL No. 2342; 12-md-2342, 2014 U.S. Dist. LEXIS 87592; 2014 WL 2921648 (E.D. Pa. June 27, 2014) (excluding proffered testimony of epidemiologist Anick Bérard for arbitrarily selecting some point estimates and ignoring others in published studies).

As Susan Haack has noted, “peer review” has taken on mythic proportions in the adjudication of expert witness opinion admissibility. Susan Haack, “Peer Review and Publication: Lessons for Lawyers,” 36 Stetson L. Rev. 789 (2007), republished in Susan Haack, Evidence Matters: Science, Proof, and Truth in the Law 156 (2014). Peer review, at best, is a weak proxy for study validity, which is what is really needed in judicial proceedings. Proxies avoid the labor of independent, original thought, and so they are much favored by many judges.

In the past, some litigants oversold peer review as a touchstone of reliable, admissible expert witness testimony, only to find that some very shoddy opinions show up in ostensibly peer-reviewed journals. See “Misplaced Reliance On Peer Review to Separate Valid Science From Nonsense” (Aug. 14, 2011). Scientists often claim that science is “self-correcting,” but in some areas of research, there are few severe tests and little critical review, and mostly glib confirmations from acolytes.

Letters to the editor are sometimes held out as a remedy for peer-review screw-ups, but such letters, which are not themselves peer reviewed, are subject to the whims of imperious editors who might wish to silence the views of those who would be critical of their judgment in publishing the article under discussion. Most journals have space for only a few letters, and unpopular but salient points of view can go unreported. Many scientists will not write letters to the editors, even when the published article is terribly wrong in its methods, data analyses, conclusions, or discussion. Letters to the editor are often frowned upon in academic circles as not advancing an affirmative research and scholarship agenda.

Letters to the editor often must be sent within a short time window after initial publication, frequently too short for busy academics to analyze a paper carefully and comment. Furthermore, letters are often limited to a few hundred words, a length often inadequate to develop a careful critique or exposition of the issues in the paper. Moreover, such letters suffer from an additional procedural problem: authors are permitted a response, and the letter writers are not permitted a reply. Authors thus get the last word, which they can often use to deflect or defuse important criticisms. The authors’ response can be sufficiently self-serving and misleading, with immunity from further criticism, that many would-be correspondents abandon the project altogether. See, e.g., PubPeer – “Example case showing why letters to the editor can be a waste of time” (Oct. 8, 2013).

Websites and blogs provide for dynamic content, with the potential for critical reviews that can be identified by search engines. See, e.g., Paul S. Brookes, “Our broken academic journal corrections system,” PSBLAB: Cardiac Mitochondrial Research in the Lab (Jan. 14, 2014). Mostly, the internet holds untapped potential for analysis, discussion, and debate on published studies.  To be sure, some journals provide “comment fields,” on their websites, with an opportunity for open discussion.  Often, full critiques must be developed and presented elsewhere. See, e.g., Androgen Study Group, “Letter to JAMA Asking for Retraction of Misleading Article on Testosterone Therapy” (Mar. 25, 2014).

PubPeer

Kate Yandell, in TheScientist, reports on the creation of PubPeer a few years ago, as a forum for post-publication review and discussion of published scientific papers. Kate Yandell, “Concerns Raised Online Linger” (Aug. 25, 2014). Billing itself as an “online journal club,” PubPeer has pointed out potentially serious problems, some of which have led to retractions and corrections. Another internet site of interest is PubChase, which monitors discussion of particular articles, as well as generating email alerts and recommendations for related articles.

One journal editor has taken notice and given notice that he will not pay attention to post-publication peer review.  Eric J. Murphy, the editor in chief of Lipids, posting a comment at PubPeer, illustrates that there will be a good deal of resistance to post-publication open peer review, out of the control of journal editors:

“As an Editor-in-Chief of a society journal, I have never examined PubPeer nor will I do so. First, there is the crowd or group mentality that may over emphasize some point in an irrational manner.  Just as using the marble theory of officiating is bad, one should never base a decision on the quantity of negative or positive comments. Second, if the concerned individual sent an e-mail or letter to me, then I would be duty bound to examine the issue.  It is not my duty to monitor PubPeer or any other such site, but rather to respond to queries sent to me.  So, with regards to Hugh’s point, I don’t support that position at all.

Mistakes happen, although frankly we try to limit these mistakes and do take steps to prevent publishing papers with FFP, it does happen.  Also, honest mistakes happen in science all the time, so[me] of these result in an erratum, while others go unnoticed by editors and reviewers.  In such a case, someone who does notice should contact the editor to put them on notice regarding the issue so that it may be resolved.  Resolution does not necessarily mean correction, but rather the editor taking a close look at the situation, discussing the situation with the original authors, and then reaching a decision.  Most of the time a correction will be made, but not always.”

Murphy’s comments are remarkable. PubPeer provides a forum for post-publication comment, but it hardly requires editors, investigators, and consumers of scientific studies to evaluate published works by “nose counts” of favorable and unfavorable comments. This is not, and never has been, a democratic enterprise. Somehow, we might expect Murphy and others to evaluate the comments on the merits, not on their prevalence. Murphy’s declaration that he is duty-bound to investigate and evaluate letters or emails sent to him about published articles is encouraging, but the editor’s ability to ratify a publication, in the face of a private communication, without comment to the scientific community, deprives the community of the chance to make a principled decision on its own. Murphy’s way, which seems largely the way of contemporary scientific publishing, ignores the important social dimension of scientific debate and the resolution of issues. Leaving control of the discussion in the hands of the editors who approved and published studies may be asking too much of editors. Nemo iudex in causa sua.

PubPeer has already tested the limits of free speech. Kate Yandell, “PubPeer Threatened with Legal Action” (Aug. 19, 2014). A scientist whose works were receiving unfavorable attention on PubPeer threatened a lawsuit. Let’s hope that scientists can learn to be sufficiently thick-skinned that there can be open discourse on the merits of their research, their data, and their conclusions.

Pritchard v. Dow Agro – Gatekeeping Exemplified

August 25th, 2014

Robert T. Pritchard was diagnosed with Non-Hodgkin’s Lymphoma (NHL) in August 2005; by fall 2005, his cancer was in remission. Mr. Pritchard had been a pesticide applicator, and so, of course, he and his wife sued the deepest pockets around, including Dow Agro Sciences, the manufacturer of Dursban. Pritchard v. Dow Agro Sciences, 705 F. Supp. 2d 471 (W.D. Pa. 2010).

The principal active ingredient of Dursban is chlorpyrifos, along with some solvents, such as xylene, cumene, and ethyltoluene. Id. at 474. Dursban was licensed for household insecticide use until 2000, when the EPA phased out certain residential applications. The EPA’s concern, however, was not carcinogenicity: the EPA categorizes chlorpyrifos as “Group E,” non-carcinogenic in humans. Id. at 474-75.

According to the American Cancer Society (ACS), the cause or causes of NHL are unknown. Over 60,000 new cases are diagnosed annually, in people from all walks of life, occupations, and lifestyles. The ACS identifies some risk factors, such as age, gender, race, and ethnicity, but the ACS emphasizes that chemical exposures are not proven risk factors or causes of NHL. See Pritchard, 705 F. Supp. 2d at 474.

The litigation industry does not need scientific conclusions of causal connections; its business is manufacturing certainty in courtrooms, or at least the appearance of certainty. The Pritchards found their way to the litigation industry in Pittsburgh, Pennsylvania, in the form of Goldberg, Persky & White, P.C. The Goldberg Persky firm put the Pritchards in touch with Dr. Bennet Omalu to serve as their expert witness, and a lawsuit against Dow Agro ensued.

Alas, the Pritchards’ lawsuit ran into a wall, or at least a gate, in the form of Federal Rule of Evidence 702. In the capable hands of Judge Nora Barry Fischer, Rule 702 became an effective barrier against weak and poorly considered expert witness opinion testimony.

Dr. Omalu, no stranger to lost causes, was the medical examiner of San Joaquin County, California, at the time of his engagement in the Pritchard case. After careful consideration of the Pritchards’ claims, Omalu prepared a four-page report, with a single citation, to Harrison’s Principles of Internal Medicine.  Id. at 477 & n.6.  This research, however, sufficed for Omalu to conclude that Dursban caused Mr. Pritchard to develop NHL, as well as a host of ailments for which he had never even sued Dow Agro, including “neuropathy, fatigue, bipolar disorder, tremors, difficulty concentrating and liver disorder.” Id. at 478. Dr. Omalu did not cite or reference any studies in his report to support his opinion that Dursban caused Mr. Pritchard’s ailments.  Id. at 480.

After counsel objected to Omalu’s report, plaintiffs’ counsel supplemented the report with some published articles, including the “Lee” study.  See Won Jin Lee, Aaron Blair, Jane A. Hoppin, Jay H. Lubin, Jennifer A. Rusiecki, Dale P. Sandler, Mustafa Dosemeci, and Michael C. R. Alavanja, “Cancer Incidence Among Pesticide Applicators Exposed to Chlorpyrifos in the Agricultural Health Study,” 96 J. Nat’l Cancer Inst. 1781 (2004) [cited as Lee].  At his deposition, and in opposition to defendants’ 702 motion, Omalu became more forthcoming with actual data and argument.  According to Omalu, “the 2004 Lee Study strongly supports a conclusion that high-level exposure to chlorpyrifos is associated with an increased risk of NHL.” Id. at 480.

This opinion put forward by Omalu bordered on scientific malpractice.  No; it was malpractice.  The Lee study looked at many different cancer end points, without adjustment for multiple comparisons.  The lack of adjustment means, at the very least, that any interpretation of p-values or confidence intervals would have to be modified to acknowledge the higher rate of random error.  For NHL, the overall relative risk (RR) for chlorpyrifos exposure was 1.03, with a 95% confidence interval of 0.62 to 1.70.  Lee at 1783.  In other words, the study that Omalu claimed supported his opinion was about as null a study as can be, with a reasonably tight confidence interval that made a doubling of the risk rather unlikely given the sample RR.
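The arithmetic behind that last observation is easy to check. On the usual log-normal approximation for relative risks, the reported confidence limits imply a standard error, and that standard error tells us how far a true doubling of risk sits from the sample estimate. A minimal sketch (our own back-of-the-envelope calculation, not anything in Lee or in the court’s opinion):

```python
import math

def se_from_95ci(lower, upper):
    """Back out the standard error of log(RR) from a reported 95% CI,
    assuming the interval was computed as exp(log(RR) +/- 1.96 * SE)."""
    return (math.log(upper) - math.log(lower)) / (2 * 1.96)

rr, lo, hi = 1.03, 0.62, 1.70      # overall NHL result reported in Lee
se = se_from_95ci(lo, hi)          # about 0.257

# How many standard errors above the point estimate would a doubling of risk sit?
z = (math.log(2.0) - math.log(rr)) / se
print(f"SE(log RR) = {se:.3f}; RR = 2.0 sits {z:.2f} SEs above the estimate")
# prints roughly 2.58 SEs; a doubling is quite implausible on these data
```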

If the multiple endpoint testing were not sufficient to dissuade a scientist intent on supporting the Pritchards’ claims, then the exposure subgroup analyses would have scared any prudent scientist away from supporting the plaintiffs’ claims.  The Lee study authors provided two different exposure-response analyses, one with lifetime exposure and the other with an intensity-weighted exposure, both in quartiles.  Neither analysis revealed an exposure-response trend.  For the lifetime exposure-response trend, the Lee study reported an NHL RR of 1.01 for the highest quartile of chlorpyrifos exposure. For the intensity-weighted analysis, for the highest quartile, the authors reported an RR of 1.61, with a 95% confidence interval of 0.74 to 3.53.

Although the defense and the district court did not call out Omalu on his fantasy statistical inference, the district judge certainly appreciated that Omalu had no statistically significant associations between chlorpyrifos and NHL to support his opinion. Given the weakness of relying upon a single epidemiologic study (and torturing the data therein), the district court believed that a showing of statistical significance was important to give some credibility to Omalu’s claims.  705 F.Supp. 2d at 486 (citing General Elec. Co. v. Joiner, 522 U.S. 136, 144-46 (1997); Soldo v. Sandoz Pharm. Corp., 244 F.Supp. 2d 434, 449-50 (W.D. Pa. 2003)).

Figure 3 adapted from Lee

What to do when there is really no evidence supporting a claim?  Make up stuff.  Here is how the trial court describes Omalu’s declaration opposing exclusion:

 “Dr. Omalu interprets and recalculates the findings in the 2004 Lee Study, finding that ‘an 80% confidence interval for the highly-exposed applicators in the 2004 Lee Study spans a relative risk range for NHL from slightly above 1.0 to slightly above 2.5.’ Dr. Omalu concludes that ‘this means that there is a 90% probability that the relative risk within the population studied is greater than 1.0’.”

705 F.Supp. 2d at 481 (internal citations omitted); see also id. at 488. The calculations and the rationale for an 80% confidence interval were not provided, but plaintiffs’ counsel assured Judge Fischer at oral argument that the calculation was done using high school math. Id. at 481 n.12. Judge Fischer seemed unimpressed, especially given that there was no record of the calculation.  Id. at 481, 488.
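The missing arithmetic is not hard to reconstruct, if one assumes (the record does not disclose his actual calculation) that Omalu started from the highest-quartile, intensity-weighted subgroup (RR 1.61; 95% CI, 0.74 to 3.53) and the usual log-normal approximation. An 80% interval merely swaps the 95% critical value (1.96) for the 80% value (about 1.28):

```python
import math

rr, lo95, hi95 = 1.61, 0.74, 3.53    # highest-quartile, intensity-weighted (Lee)
se = (math.log(hi95) - math.log(lo95)) / (2 * 1.96)

z80 = 1.2816                          # two-sided 80% critical value
lo80 = math.exp(math.log(rr) - z80 * se)
hi80 = math.exp(math.log(rr) + z80 * se)
print(f"80% CI: {lo80:.2f} to {hi80:.2f}")   # about 0.97 to 2.68

# Even a correctly computed 80% interval is a statement about the long-run
# coverage of the procedure, not a 90% posterior probability that the true
# RR exceeds 1.0; that reading requires a Bayesian prior, which Omalu
# never supplied.
```

Note that the reconstructed lower bound falls slightly below 1.0, which, if anything, undercuts even the modest claim Omalu attributed to his unrecorded calculation.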

The larger offense, however, was that Omalu’s interpretation of the 80% confidence interval as a probability statement about the true relative risk’s exceeding 1.0 was bogus. Dr. Omalu further displayed his lack of statistical competence when he attempted to defend his posterior probability, derived from his 80% confidence interval, by referring to a power calculation for a different disease in the Lee study:

“He [Omalu] further declares that ‘the authors of the 2004 Lee Study themselves endorse the probative value of a finding of elevated risk with less than a 95% confidence level when they point out that “this analysis had a 90% statistical power to detect a 1.5–fold increase in lung cancer incidence”.’”

Id. at 488 (court’s quoting of Omalu’s quoting from the Lee study). To quote Wolfgang Pauli, Omalu is so far off that he is “not even wrong.” Lee and colleagues were offering a pre-study power calculation, which they used to justify looking at the cohort for lung cancer outcomes, not NHL outcomes.  Lee at 1787. The power calculation does not apply to the data actually observed for lung cancer, and it has absolutely nothing to do with NHL. It certainly has nothing to do with Omalu’s misguided attempt to derive a posterior probability for NHL from a subgroup confidence interval.
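For readers unfamiliar with the design-stage calculation that Lee and colleagues reported, a schematic version follows. The inputs are illustrative, not the authors’ actual figures; the point is only that power is computed before the data are seen, from an assumed true effect and an anticipated standard error:

```python
import math
from statistics import NormalDist

def approx_power(true_rr, se_log_rr, alpha=0.05):
    """Approximate two-sided power to detect a true relative risk,
    using the normal approximation on the log scale."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)        # 1.96
    return 1 - NormalDist().cdf(z_crit - math.log(true_rr) / se_log_rr)

# Illustrative only: an anticipated SE of ~0.125 on log(RR) gives ~90% power
# to detect a true 1.5-fold increase, before any data are observed.
print(f"power = {approx_power(1.5, 0.125):.2f}")
```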

Given that there were epidemiologic studies available, Judge Fischer noted that expert witnesses were obligated to factor such studies into their opinions. See 705 F.Supp. 2d at 483 (citing Soldo, 244 F.Supp. 2d at 532).  Omalu’s sins against Rule 702 included his failure to consider any studies other than the Lee study, regardless of how unsupportive the Lee study was of his opinion.  The defense experts pointed to several studies that found lower NHL rates among exposed workers than among controls, and Omalu completely failed to consider the contradictory evidence or to explain his opinion in the face of it.  See 705 F.Supp. 2d at 485 (citing Perry v. Novartis Pharm. Corp., 564 F.Supp. 2d 452, 465 (E.D. Pa. 2008)). In other words, Omalu was shown to have been a cherry picker. Id. at 489.

In addition to the abridged epidemiology, Omalu relied upon an analogy between benzene itself and ethyltoluene and the other solvents in Dursban that contain benzene rings, to argue that these chemicals, supposedly like benzene, cause NHL.  Id. at 487. The analogy was never supported by any citations to published studies, and, of course, the analogy is seriously flawed. Many chemicals, including chemicals made and used by the human body, have benzene rings, without the slightest propensity to cause NHL.  Indeed, the evidence that benzene itself causes NHL is weak and inconsistent.  See, e.g., Knight v. Kirby Inland Marine Inc., 482 F.3d 347 (5th Cir. 2007) (affirming the exclusion of Dr. B.S. Levy in a case involving benzene exposure and NHL).

Looking at all the evidence, Judge Fischer found Omalu’s general causation opinions unreliable.  Relying upon a single, statistically non-significant epidemiologic study (Lee), while ignoring contrary studies, was not sound science.  It was not even science; it was courtroom rhetoric.

Omalu’s approach to specific causation, the identification of what caused Mr. Pritchard’s NHL, was equally spurious. Omalu purportedly conducted a “differential diagnosis” or a “differential etiology,” but he never examined Mr. Pritchard; nor did he conduct a thorough evaluation of Mr. Pritchard’s medical records. 705 F.Supp. 2d at 491. Judge Fischer found that Omalu had not conducted a thorough differential diagnosis, and that he had made no attempt to rule out idiopathic or unknown causes of NHL, despite the general absence of known causes of NHL. Id. at 492. The one study identified by Omalu reported a non-statistically significant 60% increase in NHL risk, for a subgroup in one of two different exposure-response analyses.  Although Judge Fischer treated the relative risk less than two as a non-dispositive factor in her decision, she recognized that

“The threshold for concluding that an agent was more likely than not the cause of an individual’s disease is a relative risk greater than 2.0… . When the relative risk reaches 2.0, the agent is responsible for an equal number of cases of disease as all other background causes. Thus, a relative risk of 2.0 … implies a 50% likelihood that an exposed individual’s disease was caused by the agent. A relative risk greater than 2.0 would permit an inference that an individual plaintiff’s disease was more likely than not caused by the implicated agent.”

Id. at 485-86 (quoting from Reference Manual on Scientific Evidence at 384 (2d ed. 2000)).
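The quoted passage rests on a standard piece of epidemiologic arithmetic, the attributable fraction among the exposed:

$$ \mathrm{AF} = \frac{RR - 1}{RR}, \qquad RR = 2 \implies \mathrm{AF} = \frac{2 - 1}{2} = 50\%. $$

On the highest subgroup estimate Omalu invoked (RR = 1.61), the corresponding figure would be $(1.61 - 1)/1.61 \approx 38\%$, short of “more likely than not” even if that estimate were taken at face value and assumed to apply uniformly to Mr. Pritchard.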

Left with nowhere to run, plaintiffs’ counsel swung for the bleachers by arguing that the federal court, sitting in diversity, was required to apply Pennsylvania law of evidence because the standards of Rule 702 constitute “substantive,” not procedural law. The argument, which had been previously rejected within the Third Circuit, was as legally persuasive as Omalu’s scientific opinions.  Judge Fischer excluded Omalu’s proffered opinions and granted summary judgment to the defendants. The Third Circuit affirmed in a per curiam decision. 430 Fed. Appx. 102, 2011 WL 2160456 (3d Cir. 2011).

Practical Evaluation of Scientific Claims

The evaluative process that took place in the Pritchard case missed some important details and some howlers committed by Dr. Omalu, but it was more than good enough for government work. The gatekeeping decision in Pritchard was nonetheless the target of criticism in a recent book.

Kristin Shrader-Frechette (S-F) is a philosopher of science who wants to teach us how to expose bad science. S-F has published, or will soon publish, a book arguing that philosophy of science can help us expose “bad science.”  See Kristin Shrader-Frechette, Tainted: How Philosophy of Science Can Expose Bad Science (Oxford U.P. 2014) [cited below as Tainted; selections available on Google books]. S-F’s claim is intriguing, as is her move away from the demarcation problem to the difficult business of evaluating and synthesizing scientific claims.

In her introduction, S-F tells us that her book shows “how practical philosophy of science” can counteract biased studies done to promote special interests and PROFITS.  Tainted at 8. Refreshingly, S-F identifies special-interest science, done for profit, as including “individuals, industries, environmentalists, labor unions, or universities.” Id. The remainder of the book, however, appears to be a jeremiad against industry, with a blind eye towards the litigation industry (plaintiffs’ bar) and environmental zealots.

The book promises to address “public concerns” in practical, jargon-free prose. Id. at 9-10. Some of the aims of the book are to provide support for “rejecting demands for only human evidence to support hypotheses about human biology (chapter 3), avoiding using statistical-significance tests with observational data (chapter 12), and challenging use of pure-science default rules for scientific uncertainty when one is doing welfare-affecting science (chapter 14).”

Id. at 10. Hmmm.  Avoiding statistical significance tests for observational data?!?  If avoided, what does S-F hope to use to assess random error?

And then S-F refers to plaintiffs’ hired expert witness (from the Milward case), Carl Cranor, as providing “groundbreaking evaluations of causal inferences [that] have helped to improve courtroom verdicts about legal liability that otherwise put victims at risk.” Id. at 7. Whether someone is a “victim” who has been “at risk” turns on assessing causality. Cranor is not a scientist, and his philosophy of science turns on “weight of the evidence” (WOE), a subjective, speculative approach that is deaf, dumb, and blind to scientific validity.

There are other “teasers” in the introduction to Tainted.  S-F advertises that her Chapter 5 will teach us that “[c]ontrary to popular belief, animal and not human data often provide superior evidence for human-biological hypotheses.”  Tainted at 11. Chapter 6 will show that “[c]ontrary to many physicists’ claims, there is no threshold for harm from exposure to ionizing radiation.” Id.  S-F tells us that her Chapter 7 will criticize “a common but questionable way of discovering hypotheses in epidemiology and medicine—looking at the magnitude of some effect in order to discover causes. The chapter shows instead that the likelihood, not the magnitude, of an effect is the better key to causal discovery.” Id. at 13. Discovering hypotheses — what is that about? You might have thought that hypotheses were framed from observations and then tested.

Which brings us to the trailer for Chapter 8, in which S-F promises to show that “[c]ontrary to standard statistical and medical practice, statistical-significance tests are not causally necessary to show medical and legal evidence of some effect.” Tainted at 11. Again, the teaser raises questions, such as what S-F could possibly mean when she says statistical tests are not causally necessary to show an effect.  Later in the introduction, S-F says that her chapter on statistics “evaluates the well-known statistical-significance rule for discovering hypotheses and shows that because scientists routinely misuse this rule, they can miss discovering important causal hypotheses.” Id. at 13. But discovering causal hypotheses is not what courts and regulators must worry about; their task is to establish such hypotheses with sufficient, valid evidence.

Paging through the book reveals rhetoric that is thick and unremitting, with little philosophy of science or meaningful advice on how to evaluate scientific studies.  The statistics chapter calls out for attention, and lo, it features a discussion of the Pritchard case. See Tainted, Chapter 8, “Why Statistics Is Slippery: Easy Algorithms Fail in Biology.”

The chapter opens with an account of German scientist Fritz Haber’s development of organophosphate pesticides, and the Nazis’ use of related compounds as chemical weapons.  Tainted at 99. Then, in a fevered non-sequitur and rhetorical flourish, S-F states, with righteous indignation, that although the Nazi researchers “clearly understood the causal-neurotoxic effects of organophosphate pesticides and nerve gas,” chemical companies today “claim that the causal-carcinogenic effects of these pesticides are controversial.” Is S-F saying that a chemical that is neurotoxic must be carcinogenic for every kind of human cancer?  So it seems.

Consider the Pritchard case.  Really, the Pritchard case?  Yup; S-F holds up the Pritchard case as her exemplar of what is wrong with civil adjudication of scientific claims.  Despite the promise of jargon-free language, S-F launches into a discussion of how the judges in Pritchard assumed that statistical significance was necessary “to hypothesize causal harm.”  Tainted at 100. In this vein, S-F tells us that she will show that:

“the statistical-significance rule is not a legitimate requirement for discovering causal hypotheses.”

Id. Again, the reader is left to puzzle why statistical significance is discussed in the context of hypothesis discovery, whatever that may be, as opposed to hypothesis testing or confirmation. And whatever it may be, we are warned that “unless the [statistical significance] rule is rejected as necessary for hypothesis-discovery, it will likely lead to false causal claims, questionable scientific theories, and massive harm to innocent victims like Robert Pritchard.”

Id. S-F is decidedly not adverting to Mr. Pritchard’s victimization by the litigation industry and the likes of Dr. Omalu, although she should. S-F not only believes that the judges in Pritchard bungled their gatekeeping; she knows that Dr. Omalu was correct, that the defense experts were wrong, and that Pritchard was a victim of Dursban and of the questionable scientific theories that were used to embarrass Omalu and his opinions.

S-F promised to teach her readers how to evaluate scientific claims and detect “tainted” science, but all she delivers here is an ipse dixit.  There is no discussion of the actual measurements, extent of random error, or threats to validity, for studies cited either by the plaintiffs or the defendants in Pritchard.  To be sure, S-F cites the Lee study in her endnotes, but she never provides any meaningful discussion of that study or any other that has any bearing on chlorpyrifos and NHL.  S-F also cited two review articles, the first of which provides no support for her ipse dixit:

“Although mutagenicity and chronic animal bioassays for carcinogenicity of chlorpyrifos were largely negative, a recent epidemiological study of pesticide applicators reported a significant exposure response trend between chlorpyrifos use and lung and rectal cancer. However, the positive association was based on small numbers of cases, i.e., for rectal cancer an excess of less than 10 cases in the 2 highest exposure groups. The lack of precision due to the small number of observations and uncertainty about actual levels of exposure warrants caution in concluding that the observed statistical association is consistent with a causal association. This association would need to be observed in more than one study before concluding that the association between lung or rectal cancer and chlorpyrifos was consistent with a causal relationship.

There is no evidence that chlorpyrifos is hepatotoxic, nephrotoxic, or immunotoxic at doses less than those that cause frank cholinesterase poisoning.”

David L. Eaton, Robert B. Daroff, Herman Autrup, James Bridges, Patricia Buffler, Lucio G. Costa, Joseph Coyle, Guy McKhann, William C. Mobley, Lynn Nadel, Diether Neubert, Rolf Schulte-Hermann, and Peter S. Spencer, “Review of the Toxicology of Chlorpyrifos With an Emphasis on Human Exposure and Neurodevelopment,” 38 Critical Reviews in Toxicology 1, 5-6 (2008).

The second cited review article was written by the clinical ecology zealot[1], William J. Rea. William J. Rea, “Pesticides,” 6 Journal of Nutritional and Environmental Medicine 55 (1996). Rea’s article does not appear in PubMed.

Shrader-Frechette’s Criticisms of Statistical Significance Testing

What is the statistical significance against which S-F rails? She offers several definitions, none of which is correct or consistent with the others.

“The statistical-significance level p is defined as the probability of the observed data, given that the null hypothesis is true.8

Tainted at 101 (citing D. H. Johnson, “What Hypothesis Tests Are Not,” 16 Behavioral Ecology 325 (2004)). Well, not quite: the attained significance probability is the probability of the data observed, or of data more extreme, given the null hypothesis.  A Tainted definition, indeed.
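Stated correctly, for a test statistic $T$ with observed value $t_{\mathrm{obs}}$, the attained significance probability is

$$ p = \Pr\left(T \ge t_{\mathrm{obs}} \mid H_0\right), $$

the probability, computed under the null hypothesis, of results as extreme as, or more extreme than, those observed. It is not the probability of “the observed data” alone, and it is certainly not the probability that the null hypothesis is true.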

Later in Chapter 8, S-F discusses significance probability in a way that overtly commits the transposition fallacy, not a good thing to do in a book that sets out to teach how to evaluate scientific evidence:

“However, typically scientists view statistical significance as a measure of how confidently one might reject the null hypothesis. Traditionally they have used a 0.05 statistical-significance level, p < or = 0.05, and have viewed the probability of a false-positive (incorrectly rejecting a true null hypothesis), or type-1, error as 5 percent. Thus they assume that some finding is statistically significant and provides grounds for rejecting the null if it has at least a 95-percent probability of not being due to chance.”

Tainted at 101. Not only does the last sentence ignore the extent of error due to bias or confounding, it erroneously assigns a posterior probability that is the complement of the significance probability.  This error is not an isolated occurrence; here is another example:

“Thus, when scientists used the rule to examine the effectiveness of St. John’s Wort in relieving depression,14 or when they employed it to examine the efficacy of flutamide to treat prostate cancer,15 they concluded the treatments were ineffective because they were not statistically significant at the 0.05 level. Only at p < or = 0.14 were the results statistically significant. They had an 86-percent chance of not being due to chance.16”

Tainted at 101-02 (citing papers by Shelton (endnote 14)[2], by Eisenberger (endnote 15)[3], and Rothman’s text (endnote 16)[4]). Although Ken Rothman has criticized the use of statistical significance tests, his book surely does not interpret a p-value of 0.14 as an 86% chance that the results were not due to chance.
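A small Bayesian computation shows why the complement of a p-value cannot be read as the probability that a result is “not due to chance.” The posterior probability of a real effect depends on the prior probability and the test’s power, neither of which appears in the p-value. A stylized sketch, with priors invented purely for illustration:

```python
def posterior_given_significant(prior_h1, power=0.80, alpha=0.05):
    """P(real effect | p < alpha) by Bayes' rule, in a stylized
    two-hypothesis setup (H0: no effect, H1: real effect)."""
    p_significant = power * prior_h1 + alpha * (1 - prior_h1)
    return power * prior_h1 / p_significant

for prior in (0.5, 0.1):
    print(f"P(H1) = {prior}: P(H1 | significant) = "
          f"{posterior_given_significant(prior):.2f}")
# 0.94 with a 50% prior, 0.64 with a 10% prior; neither equals 1 - alpha,
# and nothing in the arithmetic licenses reading 1 - p as a posterior.
```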

Although S-F previously stated that statistical significance is interpreted as the probability that the null is true, she actually goes on to correct the mistake, sort of:

“Requiring the statistical-significance rule for hypothesis-development also is arbitrary in presupposing a nonsensical distinction between a significant finding if p = 0.049, but a nonsignificant finding if p = 0.051.26 Besides, even when one uses a 90-percent (p < or = 0.10), an 85-percent (p < or = 0.15), or some other confidence level, it still may not include the null point. If not, these other p values also show the data are consistent with an effect. Statistical-significance proponents thus forget that both confidence levels and p values are measures of consistency between the data and the null hypothesis, not measures of the probability that the null is true. When results do not satisfy the rule, this means merely that the null cannot be rejected, not that the null is true.”

Tainted at 103.

S-F repeats some standard criticisms of significance testing, most of which rest on her own misunderstandings of the concept.  It hardly suffices to argue that evaluating the magnitude of random error is worthless because it does not measure the extent of bias and confounding.  The flaw lies with those who would interpret the p-value as the sole measure of error involved in a measurement.

S-F takes the criticisms of significance probability to be sufficient to justify an alternative approach: evaluating causal hypotheses “on a preponderance of evidence,47 whether effects are more likely than not.”[5] Her citations, however, do not support the notion that an overall assessment of the causal hypothesis is a true alternative to statistical testing; such an assessment is only a later step in the causal analysis, one that presupposes the prior elimination of random variability as an explanation for the observed associations.

S-F compounds her confusion by claiming that this purported alternative is superior to significance testing or any evaluation of random variability, and by noting that juries in civil cases must decide causal claims on the preponderance of the evidence, not on attained significance probabilities:

“In welfare-affecting areas of science, a preponderance-of-evidence rule often is better than a statistical-significance rule because it could take account of evidence based on underlying mechanisms and theoretical support, even if evidence did not satisfy statistical significance. After all, even in US civil law, juries need not be 95 percent certain of a verdict, but only sure that a verdict is more likely than not. Another reason for requiring the preponderance-of-evidence rule, for welfare-related hypothesis development, is that statistical data often are difficult or expensive to obtain, for example, because of large sample-size requirements. Such difficulties limit statistical-significance applicability.”

Tainted at 105-06. S-F’s assertion that juries need not have 95% certainty in their verdict is either a misunderstanding or a misrepresentation of the meaning of a confidence interval, and a conflation of two very different kinds of probability or certainty.  S-F invites a reading that commits the transposition fallacy by confusing the probability involved in a confidence interval with that involved in a posterior probability.  S-F’s claim that sample-size requirements often limit the ability to use statistical significance evaluations is obviously highly contingent upon the facts of the case, but in civil cases, such as Pritchard, this limitation is rarely at play.  Of course, if the sample size is too small to evaluate the role of chance, then a scientist should probably declare the evidence too fragile to support a causal conclusion.
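S-F’s sample-size point can itself be quantified, which shows how contingent it is. For a rare outcome, the standard two-proportion formula can indeed demand very large groups; for a common outcome, it does not. A rough sketch (baseline risks and effect size invented for illustration):

```python
import math
from statistics import NormalDist

def n_per_group(baseline_risk, rr, alpha=0.05, power=0.80):
    """Approximate per-group sample size to detect relative risk rr
    over a baseline risk, by the standard two-proportion formula."""
    p0, p1 = baseline_risk, baseline_risk * rr
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    variance = p0 * (1 - p0) + p1 * (1 - p1)
    return math.ceil((z_a + z_b) ** 2 * variance / (p1 - p0) ** 2)

# A 0.1% baseline risk and a true RR of 1.5 require ~78,000 per group;
# a 10% baseline risk requires only ~680 per group.
print(n_per_group(0.001, 1.5), n_per_group(0.10, 1.5))
```

Given that the Lee study drew on a cohort of tens of thousands of pesticide applicators, the sample-size excuse was not available in a case such as Pritchard.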

S-F also postulates that a posterior probability approach, rather than a significance probability approach, would “better counteract conflicts of interest that sometimes cause scientists to pay inadequate attention to public-welfare consequences of their work.” Tainted at 106. This is a remarkable assertion, unsupported by any empirical evidence.  The varieties of evidence that go into an overall assessment of a causal hypothesis are often quantitatively incommensurate.  The so-called preponderance of the evidence described by S-F is often little more than a subjective, overall assessment of the weight of the evidence.  The approving citations to the work of Carl Cranor support interpreting S-F to endorse this subjective, anything-goes approach to the weight of the evidence.  As for WOE’s eliminating inadequate attention to “public welfare,” S-F’s citations actually suggest the opposite. S-F’s citations to the 1961 reviews by Wynder and by Little illustrate how subjective narrative reviews can be, with diametrically opposed results.  Rather than curbing conflicts of interest, these subjective, narrative reviews illustrate how contrary results may be obtained through the failure to pre-specify criteria of validity, and of inclusion and exclusion of admissible evidence.

Still, S-F asserts that “up to 80 percent of welfare-related statistical studies have false-negative or type-II errors, failing to reject a false null.” Tainted at 106. The support for this assertion is a citation to a review article by David Resnik. See David Resnik, “Statistics, Ethics, and Research: An Agenda for Education and Reform,” 8 Accountability in Research 163, 183 (2000). Resnik’s paper is a review article, not an empirical study, but at the page cited by S-F, Resnik in turn cites well-known papers that present actual data:

“There is also evidence that many of the errors and biases in research are related to the misuses of statistics. For example, Williams et al. (1997) found that 80% of articles surveyed that used t-tests contained at least one test with a type II error. Freiman et al. (1978)  * * *  However, empirical research on statistical errors in science is scarce, and more work needs to be done in this area.”

Id. The papers cited by Resnik, Williams (1997)[6] and Freiman (1978)[7], did identify previously published studies that over-interpreted statistically non-significant results, but the identified type-II errors were potential errors, not ascertained errors, because the authors made no claim that every non-statistically significant result actually represented a missed true association. In other words, S-F is not entitled to say that these empirical reviews actually identified failures to reject false null hypotheses. Furthermore, the empirical analyses in the studies cited by Resnik, who was in turn cited by S-F, did not look at correlations between alleged conflicts of interest and statistical errors. The cited research calls for greater attention to the proper interpretation of statistical tests, not for their abandonment.

In the end, at least in the chapter on statistics, S-F fails to deliver much, if anything, on her promise to show how to evaluate science from a philosophic perspective.  Her discussion of the Pritchard case is not an analysis; it is a harangue. There are certainly more readable, accessible, scholarly, and accurate treatments of these scientific and statistical issues than this book.  See, e.g., Michael B. Bracken, Risk, Chance, and Causation: Investigating the Origins and Treatment of Disease (2013).


[1] Not to be confused with the deceased federal judge of the same name, William J. Rea. William J. Rea, 1 Chemical Sensitivity – Principles and Mechanisms (1992); 2 Chemical Sensitivity – Sources of Total Body Load (1994); 3 Chemical Sensitivity – Clinical Manifestation of Pollutant Overload (1996); 4 Chemical Sensitivity – Tools of Diagnosis and Methods of Treatment (1998).

[2] R. C. Shelton, M. B. Keller, et al., “Effectiveness of St. John’s Wort in Major Depression,” 285 Journal of the American Medical Association 1978 (2001).

[3] M. A. Eisenberger, B. A. Blumenstein, et al., “Bilateral Orchiectomy With or Without Flutamide for Metastic [sic] Prostate Cancer,” 339 New England Journal of Medicine 1036 (1998).

[4] Kenneth J. Rothman, Epidemiology 123–127 (NY 2002).

[5] Endnote 47 references the following papers: E. Hammond, “Cause and Effect,” in E. Wynder, ed., The Biologic Effects of Tobacco 193–194 (Boston 1955); E. L. Wynder, “An Appraisal of the Smoking-Lung-Cancer Issue,” 264 New England Journal of Medicine 1235 (1961); see C. Little, “Some Phases of the Problem of Smoking and Lung Cancer,” 264 New England Journal of Medicine 1241 (1961); J. R. Stutzman, C. A. Luongo, and S. A. McLuckey, “Covalent and Non-Covalent Binding in the Ion/Ion Charge Inversion of Peptide Cations with Benzene-Disulfonic Acid Anions,” 47 Journal of Mass Spectrometry 669 (2012). Whatever the relevance of the paper on the ion/ion chemistry of peptide cations, the other papers do not eschew traditional statistical significance testing techniques. By the time the early (1961) reviews were written, the reported association between smoking and lung cancer was clearly accepted as not likely explained by chance.  Discussion focused upon bias and potential confounding in the available studies, and upon the lack of animal evidence for the causal claim.

[6] J. L. Williams, C. A. Hathaway, K. L. Kloster, and B. H. Layne, “Low power, type II errors, and other statistical problems in recent cardiovascular research,” 42 Am. J. Physiology Heart & Circulation Physiology H487 (1997).

[7] Jennie A. Freiman, Thomas C. Chalmers, Harry Smith and Roy R. Kuebler, “The importance of beta, the type II error and sample size in the design and interpretation of the randomized control trial: survey of 71 ‛negative’ trials,” 299 New Engl. J. Med. 690 (1978).

Too Many Narratives – Historians in the Dock

July 13th, 2014

History Associates Inc. (HAI) is a commercial vendor of historical services, including litigation services. Understandably, this firm, like the academic historians who service the litigation industry, takes a broad view of the desirability of historian expert witness testimony.  An article in one of HAI’s newsletters lays out lawyers’ strategies for proving historical facts.  Lawyers can present percipient witnesses, or they

“can present the story themselves, but in the end, arguments by advocates can raise questions of bias that obscure, rather than clarify, the historical facts at issue.”

Mike Reis and Dave Wiseman, “Introducing and interpreting facts-in-evidence: the historian’s role as expert witness,” HAIpoints 1 (Summer 2010)[1]. These commercial historians recommend that the advocacy bias, so clear in lawyers’ narratives, be defused or obscured by having a professional historian present the “story.”  They tout the research skills of historians: “Historians know how to find critical historical information.” And to be sure, historians, whether academic or for-hire, may offer important bibliographic services, as well as help in translating, authenticating, and contextualizing documents.  But these historians from HAI want a role at center stage, or at least in the witness box.  They tell us that:

“Historians synthesize information into well-documented, compelling stories.”

Ah yes, compelling stories, as in “the guiltless gust of a rattling good yarn[2].” The legal system should take a pass on such stories.

*     *     *     *     *     *

A recent law review article attempts to provide a less commercial defense of historian expert witness testimony.  See Alvaro Hasani, “Putting history on the stand: a closer look at the legitimacy of criticisms levied against historians who testify as expert witnesses,” 34 Whittier L. Rev. 343 (2013) [Hasani].  Hasani argues that historians strive to provide objective historical “interpretation,” by selecting reliable sources, and reliably reading and interpreting these sources to create a reliable “narrative.” Hasani at 355. Hasani points to some courts that have thrown up their hands and declared the Daubert reliability factors inapplicable to non-scientific historian testimony. See, e.g., United States v. Paracha, No. 03 Cr. 1197 (SHS), 2006 WL 12768, at *19 (S.D.N.Y. Jan. 3, 2006) (noting that Daubert is not designed for gatekeeping of a non-scientific, historian expert witness’s methodology); Saginaw Chippewa Indian Tribe of Michigan v. Granholm, 690 F. Supp. 2d 622, 634 (E.D. Mich. 2010) (noting that “[t]here is no way to ‘test’ whether the experts’ testimony concerning the historical understanding of the treaties is correct. Nor is it possible to establish an ‘error rate’ for historical experts.”).

Not all testifying historians agree, however, that their research and findings are non-scientific.  Here is how one plaintiffs’ expert witness characterized historical thinking:

“Q. Do you believe that historical thinking is a form of scientific thinking?

A. I do. I think that history is sometimes classed with the humanities, sometimes classed with the social sciences, but I think there is a good deal of historical research and writing that is a form of social science.”

Examination Before Trial of Gerald Markowitz, in Mendez v. American Optical, District Court for Tarrant County, Texas (342d Judicial District), at 44:13-20 (July 19, 2005). Professor Susan Haack, and others, have made a persuasive case that the epistemic warrants for claims of knowledge, whether denominated scientific or non-scientific, are not different in kind. If historian testimony is not about knowledge of the past, then it clearly has no role in a trial. Furthermore, Professor Markowitz is correct that some historical opinions are scientific in the sense that they can be tested. If a labor historian asserts that workers are exploited and subjected to unsafe work conditions due to the very nature of capitalism and the profit motive, then that historian’s opinion will be substantially embarrassed by the widespread occupational disease in European and Asian communist regimes.

When Deborah Lipstadt described historian David Irving as a Holocaust denier[3], Irving sued Lipstadt for defamation.  In defending against the claim, Lipstadt successfully carried the burden of proving the truth of her accusation.  The trial court’s judgment, quoted by Hasani, reads like a so-called Daubert exclusion of plaintiff Irving’s putative historical writing. Irving v. Penguin Books Ltd., No. 1996-1-1113, 2000 WL 362478, at ¶¶ 1.1, 13.140 (Q.B. Apr. 11, 2000) (finding that “Irving ha[d] misstated historical evidence; adopted positions which run counter to the weight of the evidence; given credence to unreliable evidence and disregarded or dismissed credible evidence.”).

The need for gatekeeping of historian testimony should be obvious.  Historian testimony is often a narrative of historical fact that is not beyond the ken of an ordinary fact finder, once the predicate facts are placed into evidence.  Such narratives of historical fact present a serious threat to the integrity of fact finding by creating the conditions for delegating and deferring fact-finding responsibility to the historian witness, with an abdication of responsibility by the fact finder. See Ronald J. Allen, “The Conceptual Challenge of Expert Evidence,” 14 Discusiones Filosóficas 41, 50-53 (2013).

Some historians clearly believe that they are empowered by the witness chair to preach or advocate. Allan M. Brandt, who has served many times as an expert witness for plaintiffs in tobacco cases, unapologetically described the liberties he has taken:

“It seems to me now, after the hopes and disappointments of the courtroom battle, that we have a role to play in determining the future of the tobacco pandemic. If we occasionally cross the boundary between analysis and advocacy, so be it. The stakes are high, and there is much work yet to do.”

Allan M. Brandt, The Cigarette Century: The Rise, Fall, and Deadly Persistence of the Product That Defined America 505 (2007).

Hasani never comes to grips with the delegation problem or with Brandt’s attitude, which is quite prevalent in the product liability arena. The problem is more than merely “occasional.” The overreaching by historian witnesses reflects the nature of their discipline, the lack of necessity for their testimony, and the failure of courts to exercise their gatekeeping. The problem with Brandt’s excuse-making is that neither his analysis nor his advocacy is needed or desired. Advocacy is the responsibility of counsel, as is the kind of analysis involved in much historian testimony.  For instance, when historians offer testimony about the so-called “state of the art,” they are drawing inferences from published and unpublished sources about what people knew or should have known, and about their motivations.  Although their bibliographic and historical researches can be helpful to the fact finder’s effort to understand who was writing what about an issue in times past, historians have no real expertise, beyond that of the lay fact finder, in discerning intentions, motivations, and belief states.

Hasani concludes that the prevalence of historian expert witness testimony is growing. Hasani at 364.  He cites, however, only four cases for the proposition, three of which pre-date Daubert.  The fourth is a Native American rights case. Hasani at 364 n.139. There is little or no evidence that historian expert witness testimony is becoming more prevalent, although it continues in product liability cases, where the state of the art — who knew what, when — remains an issue in strict liability and negligence. Mack v. Stryker Corp., 893 F. Supp. 2d 976 (D. Minn. 2012), aff’d, 748 F.3d 845 (8th Cir. 2014). There remains a need for judicial vigilance in policing such state-of-the-art testimony.


[1] Mike Reis is the Vice President and Director of Litigation Research at History Associates Inc. Mr. Reis received his bachelor’s degree from Loyola College, and his master’s degree from George Washington University, both in history. David Wiseman, an erstwhile trial attorney, conducts historical research for History Associates.

[2] Attributed to Anthony Burgess.

[3] Deborah E. Lipstadt, Denying the Holocaust: The Growing Assault on Truth and Memory 8 (1993).

 

Twerski’s Defense of Daubert

July 6th, 2014

Professor Aaron D. Twerski teaches torts and products liability at the Brooklyn Law School.  Along with a graduating student, Lior Sapir, Twerski has published an article in which the authors mistakenly asseverate that “[t]his is not another article about Daubert.” Aaron D. Twerski & Lior Sapir, “Sufficiency of the Evidence Does Not Meet Daubert Standards: A Critique of the Green-Sanders Proposal,” 23 Widener L.J. 641, 641 (2014) [Twerski & Sapir].

A few other comments.

1. The title of the article.  True, true, and immaterial. As Professor David Bernstein has pointed out many times, Daubert is no longer the law; Federal Rule of Evidence 702, a statute, is the law.  Just as the original Rule 702 superseded Frye in 1975, the revised Rule 702 superseded Daubert in 2000. See David E. Bernstein, “The Misbegotten Judicial Resistance to the Daubert Revolution,” 89 Notre Dame L. Rev. 27 (2013).

2. Twerski and Sapir have taken aim at a draft paper by Professors Green and Sanders, who also presented similar ideas at a workshop in March 2012, in Spain. The Green-Sanders manuscript is available online. Michael D. Green & Joseph Sanders, “Admissibility Versus Sufficiency: Controlling the Quality of Expert Witness Testimony in the United States,” (March 5, 2012) <downloaded on March 25, 2012>. The article appears to have matured since spring 2012, but it has never progressed to parturition.  Professor Green’s website suggests a mutated version is in the works:  “The Daubert Sleight of Hand: Substituting Reliability, Methodology, and Reasoning for an Old Fashioned Sufficiency of the Evidence Test.”

Indeed, the draft paper is a worthwhile target. See “Admissibility versus Sufficiency of Expert Witness Evidence” (April 18, 2012).  Green and Sanders pursue a reductionist approach to Rule 702, which is unfaithful to the letter and spirit of the law.

3. In their critique of Green and Sanders, Twerski and Sapir get some issues wrong. First, they insist upon talking about Daubert criteria.  The “criteria” were never really criteria, and, as Bernstein’s scholarship establishes, it is time to move past Daubert.

4. Twerski and Sapir assert that Daubert imposes a substantial or heavy burden of proof upon the proponent of expert witness opinion testimony:

“The Daubert trilogy was intended to set a formidable standard for admissibility before one entered the thicket of evaluating whether it was sufficient to serve as grounds for recovery.”

Twerski & Sapir at 648.

Daubert instituted a “high threshold of reliability”.

Twerski & Sapir at 649.

“But, the message from the Daubert trilogy is unmistakable: a court must have a high degree of confidence in the integrity of scientific evidence before it qualifies for consideration in any formal test to be utilized in litigation.”

Twerski & Sapir at 650.

“The Daubert standard is anything but minimal.”

Twerski & Sapir at 651.

Twerski and Sapir never explain whence come “high,” “formidable,” and “anything but minimal.” To be sure, the Supreme Court noted that “[s]ince Daubert . . . parties relying on expert evidence have had notice of the exacting standards of reliability such evidence must meet.” Weisgram v. Marley Co., 528 U.S. 440, 455 (2000) (emphasis added). An exacting standard, however, is not necessarily a heavy burden.  It may be that the exacting standard is infrequently satisfied because the necessary evidence and inferences, of sufficient quality and validity, are often missing.  The truth is that science often sits in the no-man’s land of the indeterminate, the inconclusive, and the incomplete. Nevertheless, Twerski and Sapir play into the hands of the reductionist Green-Sanders thesis by talking about what appears to be a [heavy] burden of proof and the “weight of evidence” needed to sustain that burden.

5. Twerski and Sapir obviously recognize that reliability is different from sufficiency, but they miss the multi-dimensional aspect of expert witness opinion testimony.  Consider their assertion that:

“[t]he Court of Appeals for the Eleventh Circuit in Joiner had not lost its senses when it relied on animal studies to prove that PCBs cause lung cancer. If the question was whether any evidence viewed in the light most favorable to plaintiff supported liability, the answer was probably yes.”

Twerski & Sapir at 649; see Joiner v. Gen. Electric Co., 78 F.3d 524, 532 (11th Cir. 1996) rev’d, 522 U.S. 136 (1997).

The imprecision in thinking about expert witness testimony obscures what happened in Joiner, and what must happen under the structure of the evidence statutes (or case law).  The Court of Appeals never relied upon animal studies; nor did the district court below.  Expert witnesses relied upon animal studies, and other studies, and then offered an opinion that these studies “prove” PCBs cause human lung cancer, and Mr. Joiner’s lung cancer in particular.  Those opinions, which the Eleventh Circuit would have taken at face value, would have been sufficient to support submitting the case to the jury.  Indeed, courts that evade the gatekeeping requirements of Rule 702 routinely tout the credentials of the expert witnesses, recite that the witnesses have used science in some sense, and declare that criticisms of their opinions “go to the weight not the admissibility” of the opinions.  These are, of course, evasions used to dodge Daubert and Rule 702. They are evasions because the science recited is pitched at a very high level of abstraction (“I relied upon epidemiology”), because credentials are irrelevant, and because “weight not the admissibility” is a conclusion, not a reason.

Some of the issues obscured by the reductionist weight-of-the-evidence approach are the internal and external validity of the studies cited, whether the inferences drawn from those studies are valid and accurate, and whether the method of synthesizing a conclusion from disparate studies is appropriate. These various aspects of an evidentiary display cannot be reduced to a unidimensional “weight.” Consider how many observational studies suggested, some would say demonstrated, that beta carotene supplements reduced the risk of lung cancer, only to be pushed aside by one or two randomized clinical trials.

6. Twerski and Sapir illustrate the crucial point that gatekeeping judges must press beyond conclusory opinions by exploring the legal controversy over Parlodel and post-partum strokes.  Twerski & Sapir at 652. Their exploration takes them into some of the same issues that confronted the Supreme Court in Joiner:  extrapolations or “leaps of faith” between different indications, different species, and different study outcomes, between surrogate end points and the end point of interest, and from very high to relatively low therapeutic doses. Twerski and Sapir correctly discern that these various issues cannot simply be subsumed under weight or sufficiency.

7. Professors Green and Sanders have published a brief reply, in which they continue their “weight of the evidence” reductionist argument. Michael D. Green & Joseph Sanders, “In Defense of Sufficiency: A Reply to Professor Twerski and Mr. Sapir,” 23 Widener L.J. 663 (2014). Green and Sanders restate their position that courts can, should, and do sweep all the nuances of evidence and inference validity into a single metric – weight and sufficiency – to adjudicate so-called Daubert challenges.  What Twerski and Sapir seem to have stumbled upon is that Green and Sanders are not engaged in a descriptive enterprise; they are prescribing a standard that abridges and distorts the law and best practice in order to ensure that dubious causal claims are submitted to the finder of fact.