For your delectation and delight, desultory dicta on the law of delicts.

P-Values: Pernicious or Perspicacious?

May 12th, 2018

Professor Kingsley R. Browne, of the Wayne State University Law School, recently published a paper that criticized the use of p-values and significance testing in discrimination litigation. Kingsley R. Browne, “Pernicious P-Values: Statistical Proof of Not Very Much,” 42 Univ. Dayton L. Rev. 113 (2017) (cited below as Browne). Browne amply documents the obvious and undeniable, that judges, lawyers, and even some ill-trained expert witnesses, are congenitally unable to describe and interpret p-values properly. Most of Browne’s examples are from the world of anti-discrimination law, but he also cites a few from health effects litigation. Browne also draws on many of the criticisms of p-values in the psychology and other social science literature.

Browne’s efforts to correct judicial innumeracy are welcomed, but they take a peculiar turn in this law review article. From the well-known state of affairs of widespread judicial refusal or inability to discuss statistical concepts accurately, Browne argues for what seem to be two incongruous, inconsistent responses. Rejecting the glib suggestion of former Judge Posner that evidence law is not “fussy” about evidence, Browne argues that federal evidence law requires courts to be “fussy” about evidence, and that Rule 702 requires courts to exclude expert witnesses, whose opinions fail to “employ[] in the courtroom the same level of intellectual rigor that characterizes the practice of an expert in the relevant field.” Browne at 143 (quoting from Kumho Tire Co. v. Carmichael, 526 U.S. 137, 152 (1999)). Browne tells us, with apparently appropriate intellectual rigor, that “[i]f a disparity that does not provide a p-value of less than 0.05 would not be accepted as meaningful in the expert’s discipline, it is not clear that the expert should be allowed to testify – on the basis of his expertise in that discipline – that the disparity is, in fact, meaningful.” Id.

In a volte face, Browne then argues that p-values do “not tell us much,” basically because they are dependent upon sample size. Browne suggests that the quantitative disparity between expected value and observed proportion or average can be assessed without the use of p-values, and that measuring a p-value “adds virtually nothing and just muddies the water.” Id. at 152. The prevalent confusion among judges and lawyers seems sufficient in Browne’s view to justify his proposal, as well as his further suggestion that Rule 403 should be invoked to exclude p-values:

“The ease with which reported p-values cause a trier of fact to slip into the transposition fallacy and the difficulty of avoiding that lapse of logic, coupled with the relatively sparse information actually provided by the p-value, make p-values prime candidates for exclusion under Federal Rule of Evidence 403. *** If judges, not to mention the statistical experts they rely on, cannot use the information without falling into fallacious reasoning, the likelihood that the jury will misunderstand the evidence is very high. Since the p-value actually provides little useful relevant information, the high risk of misleading the jury greatly exceeds its scant probative value, so it simply should not be presented to the jury.”

Id. at 152-53.

And yet, elsewhere in the same article, Browne ridicules one court and several expert witnesses who have argued in favor of conclusions that were based upon p-values up to 50%.1 The concept of p-values cannot be so flexible as to straddle the extremes of having no probative value, and yet being capable of rendering an expert witness’s opinions ludicrous. P-values quantify an estimate of random error, even if that error rate varies with sample size. To be sure, the measure of random error depends upon the specified model and assumption of a null hypothesis, but the crucial point is that the estimate (whether mean, proportion, risk ratio, risk difference, etc.) is rather meaningless without some further estimate of random variability of the estimate. Of course, random error is not the only type of error, but the existence of other potential systematic errors is hardly a reason to ignore random error.

In the science of health effects, many applications of p-values have given way to the use of confidence intervals, which arguably provide a more direct assessment of the sample estimate, along with the range of potential outcomes that are reasonably compatible with that estimate. Remarkably, Browne never substantively discusses confidence intervals in his article.
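By way of illustration only, a risk ratio and its 95% confidence interval can be computed from a simple two-by-two table. The sketch below uses the usual large-sample approximation on the log scale. The function name and the counts are hypothetical, chosen for the example, and come from no study discussed here.

```python
import math

def risk_ratio_ci(cases_exposed, n_exposed, cases_unexposed, n_unexposed, z=1.96):
    """Risk ratio with an approximate confidence interval (log-scale method).

    z=1.96 corresponds to a 95% interval under the normal approximation.
    """
    risk_exposed = cases_exposed / n_exposed
    risk_unexposed = cases_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    # Large-sample standard error of log(RR)
    se_log_rr = math.sqrt(
        1 / cases_exposed - 1 / n_exposed + 1 / cases_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical counts: 12 cases among 1,000 exposed, 10 among 1,000 unexposed.
rr, lower, upper = risk_ratio_ci(12, 1000, 10, 1000)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

With counts this small, the interval is wide and includes 1.0. The interval conveys both the size of the estimate and the range of values reasonably compatible with it, which a bare p-value does not.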

Under the heading of other problems with p-values and significance testing, Browne advances four additional putative problems with p-values. First, Browne asserts with little to no support that “[t]he null hypothesis is unlikely a priori.” Id. at 155. He fails to tell us why the null hypothesis of no disparity is not a reasonable starting place in the absence of objective evidence of a prior estimate. Furthermore, a null hypothesis of no difference will have legal significance in claims of health effects, or of unlawful discrimination.

Second, Browne argues that significance testing will lead to “[c]onflation of statistical and practical (or legal) significance” in the minds of judges and jurors. Id. at 156-58. This charge is difficult to sustain. The actors in legal cases are probably the ones who can most readily appreciate practical significance and its separation from statistical significance. If a large class action showed that the expected value of a minority’s proportion was 15%, and the observed proportion was 14.8%, p < 0.05, most innumerate judges and jurors would sense that this disparity was unimportant and that no employer would fine tune its discriminatory activities so closely as to achieve such a meaningless difference.
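The arithmetic behind the 15% versus 14.8% example can be sketched with a one-sample test of a proportion, under a normal approximation. The two percentages come from the example above; the sample sizes and the function name are hypothetical, picked only to show how the same small disparity moves across the 0.05 line as the sample grows.

```python
import math

def two_sided_p_value(observed, expected, n):
    """Two-sided p-value for a one-sample test of a proportion (normal approximation)."""
    se = math.sqrt(expected * (1 - expected) / n)
    z = (observed - expected) / se
    # erfc(|z| / sqrt(2)) is the two-sided tail area of a standard normal
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical sample sizes; the disparity itself never changes.
for n in (1_000, 10_000, 100_000, 200_000):
    p = two_sided_p_value(0.148, 0.15, n)
    print(f"n = {n:>7,}  p = {p:.4f}")
```

Under this approximation, the disparity reaches p < 0.05 only when the sample exceeds roughly 120,000 observations. The p-value measures random error for the given sample size; whether a 0.2 percentage point difference matters is a separate question.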

Third, Browne reminds us that the validity and the interpretation of a p-value turns on the assumption that the statistical model is perfectly specified. Id. at 158-59. His reminder is correct, but again, this aspect of p-values (or confidence intervals) is relatively easy to explain, as well as to defend or challenge. To be sure, there may be legitimate disputes about whether an appropriate model was used (say binomial versus hypergeometric), but such disputes are hardly the most arcane issues that judges and jurors will face.
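A dispute over binomial versus hypergeometric models can be made concrete with a small calculation. The sketch below assumes a hypothetical pool of 100 applicants, 30 of them minority members, from which 20 are hired and only 2 hires are minority members. The binomial model treats each hire as an independent draw at a 30% rate; the hypergeometric model treats the hires as drawn without replacement from the fixed pool. All numbers and function names are invented for the illustration.

```python
from math import comb

def binomial_lower_tail(k, n, p):
    """P(X <= k) when X counts successes in n independent draws with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def hypergeom_lower_tail(k, population, successes, draws):
    """P(X <= k) when draws are made without replacement from a finite population."""
    total = comb(population, draws)
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k + 1)
    )
    return favorable / total

# Hypothetical facts: 2 or fewer minority hires out of 20, from a pool that is 30% minority.
print(f"binomial:       {binomial_lower_tail(2, 20, 0.3):.4f}")
print(f"hypergeometric: {hypergeom_lower_tail(2, 100, 30, 20):.4f}")
```

On these assumed facts, the two models give noticeably different tail probabilities, with the hypergeometric tail the smaller of the two. The choice between them turns on whether the selections came from a fixed, finite pool, which is a question about the facts that lawyers and judges can understand and argue.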

Fourth, Browne claims that “the alternative hypothesis is seldom properly specified.” Id. at 159-62. Unless analysts are focused on measuring pre-test power or type II error, however, they need not advance an alternative hypothesis. Furthermore, it is hardly a flaw with significance testing that it does not account for systematic bias or confounding.

Browne does not offer an affirmative response such as urging courts to adopt a Bayesian program. A Bayesian response to prevalent blunders in interpreting statistical significance would introduce perhaps even more arcane and hard-to-discern blunders in court proceedings. Browne also leaves courts without a meaningful approach to evaluate random error other than to engage in crude comparisons between two means or proportions. The recommendations in this law review article appear to be a giant step, backwards, into an epistemic void.

1 See Browne at 146, citing In re Photochromic Lens Antitrust Litig., 2014 WL 1338605 (M.D. Fla. April 3, 2014) (reversing magistrate judge’s exclusion of an expert witness who had advanced claims based upon p-value of 0.50); id. at 147 n. 116, citing In re High-Tech Employee Antitrust Litig., 2014 WL 1351040 (N.D. Cal. 2014).

Failed Gatekeeping in Ambrosini v. Labarraque (1996)

December 28th, 2017

The Ambrosini case straddled the Supreme Court’s 1993 Daubert decision. The case began before the Supreme Court clarified the federal standard for expert witness gatekeeping, and ended in the Court of Appeals for the District of Columbia, after the high court adopted the curious notion that scientific claims should be based upon reliable evidence and valid inferences. That notion has only slowly and inconsistently trickled down to the lower courts.

Given that Ambrosini was litigated in the District of Columbia, where the docket is dominated by regulatory controversies, frequently involving dubious scientific claims, no one should be surprised that the D.C. Circuit did not see that the Supreme Court had read “an exacting standard” into Federal Rule of Evidence 702. And so, we see, in Ambrosini, this Court of Appeals citing and purportedly applying its own pre-Daubert decision in Ferebee v. Chevron Chem. Co., 552 F. Supp. 1297 (D.D.C. 1982), aff’d, 736 F.2d 1529 (D.C. Cir.), cert. denied, 469 U.S. 1062 (1984).1 In 2000, Federal Rule of Evidence 702 was revised in a way that extinguishes the precedential value of Ambrosini and the broad dicta of Ferebee, but some courts and commentators have failed to stay abreast of the law.

Escolastica Ambrosini was using a synthetic progestin birth control, Depo-Provera, as well as an anti-nausea medication, Bendectin, when she became pregnant. The child that resulted from this pregnancy, Teresa Ambrosini, was born with malformations of her face, eyes, and ears, cleft lip and palate, and vertebral malformations. About three percent of all live births in the United States have a major malformation. Perhaps because the Divine Being has sovereign immunity, Escolastica sued the manufacturers of Bendectin and Depo-Provera, as well as the prescribing physician.

The causal claims were controversial when made, and they still are. The progestin at issue, medroxyprogesterone acetate (MPA), was embryotoxic in the cynomolgus monkey2, but not in the baboon3. The evidence in humans was equivocal at best, and involved mostly genital malformations4; the epidemiologic evidence for the MPA causal claim to this day remains unconvincing5.

At the close of discovery in Ambrosini, Upjohn (the manufacturer of the progestin) moved for summary judgment, with a supporting affidavit of a physician and geneticist, Dr. Joe Leigh Simpson. In his affidavit, Simpson discussed three epidemiologic studies, as well as other published papers, in support of his opinion that the progestin at issue did not cause the types of birth defects manifested by Teresa Ambrosini.

Ambrosini had disclosed two expert witnesses, Dr. Allen S. Goldman and Dr. Brian Strom. Neither Goldman nor Strom bothered to identify the papers, studies, data, or methodology used in arriving at an opinion on causation. Not surprisingly, the district judge was unimpressed with their opposition, and granted summary judgment for the defendant. Ambrosini v. Labarraque, 966 F.2d 1462, 1466 (D.C. Cir. 1992).

The plaintiffs appealed on the remarkable ground that Goldman’s and Strom’s crypto-evidence satisfied Federal Rule of Evidence 703. Even more remarkably, the Circuit, in a strikingly unscholarly opinion by Judge Mikva, opined that disclosure of relied-upon studies was not required for expert witnesses under Rules 703 and 705. Judge Mikva seemed to forget that the opinions being challenged were not given in testimony, but in (late-filed) affidavits that had to satisfy the requirement of Federal Rule of Civil Procedure 26. Id. at 1468-69. At trial, an expert witness may express an opinion without identifying its bases, but of course the adverse party may compel disclosure of those bases. In discovery, the proffered expert witness must supply all opinions and the evidence relied upon in reaching those opinions. In any event, the Circuit remanded the case for a hearing and further proceedings, at which the two challenged expert witnesses, Goldman and Strom, would have to identify the bases of their opinions. Id. at 1471.

Not long after the case landed back in the district court, the Supreme Court decided Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993). With an order to produce entered, plaintiffs’ counsel could no longer hide Goldman and Strom’s evidentiary bases, and their scientific inferences came under judicial scrutiny.

Upjohn moved again to exclude Goldman and Strom’s opinions. The district court upheld Upjohn’s challenges, and granted summary judgment in favor of Upjohn for the second time. The Ambrosinis appealed again, but the second case in the D.C. Circuit resulted in a split decision, with the majority holding that the exclusion of Goldman and Strom’s opinions under Rule 702 was erroneous. Ambrosini v. Labarraque, 101 F.3d 129 (D.C. Cir. 1996).

Although issued two decades ago, the majority’s opinion remains noteworthy as an example of judicial resistance to the existence and meaning of the Supreme Court’s Daubert opinion. The majority opinion uncritically cited the notorious Ferebee6 and other pre-Daubert decisions. The court embraced the Daubert dictum about gatekeeping being limited to methodologic consideration, and then proceeded to interpret methodology as superficially as necessary to sustain admissibility. If an expert witness claimed to have looked at epidemiologic studies, and epidemiology was an accepted methodology, then the opinion of the expert witness must satisfy the legal requirements of Daubert, or so it would seem from the opinion of the U.S. Court of Appeals for the District of Columbia.

Despite the majority’s hand waving, a careful reader will discern that there must have been substantial gaps and omissions in the explanations and evidence cited by plaintiffs’ expert witnesses. Seeing anything clearly in the Circuit’s opinion is made difficult, however, by careless and imprecise language, such as its descriptions of studies as showing, or not showing “causation,” when it could have meant only that such studies showed associations, with more or less random and systematic error.

Dr. Strom’s report addressed only general causation, and even so, he apparently did not address general causation of the specific malformations manifested by the plaintiffs’ child. Strom claimed to have relied upon the “totality of the data,” but his methodologic approach seems to have required him to dismiss studies that failed to show an association.

“Dr. Strom first set forth the reasoning he employed that led him to disagree with those studies finding no causal relationship [sic] between progestins and birth defects like Teresa’s. He explained that an epidemiologist evaluates studies based on their ‘statistical power’. Statistical power, he continued, represents the ability of a study, based on its sample size, to detect a causal relationship. Conventionally, in order to be considered meaningful, negative studies, that is, those which allege the absence of a causal relationship, must have at least an 80 to 90 percent chance of detecting a causal link if such a link exists; otherwise, the studies cannot be considered conclusive. Based on sample sizes too small to be reliable, the negative studies at issue, Dr. Strom explained, lacked sufficient statistical power to be considered conclusive.”

Id. at 1367.

Putting aside the problem of suggesting that an observational study detects a “causal relationship,” as opposed to an association in need of further causal evaluation, the Court’s précis of Strom’s testimony on power is troublesome, and typical of how other courts have misunderstood and misapplied the concept of statistical power. Statistical power is a probability of observing an association of a specified size at a specified level of statistical significance. The calculation of statistical power turns indeed on sample size, the level of significance probability preselected for “statistical significance,” an assumed probability distribution of the sample, and, critically, an alternative hypothesis. Without a specified alternative hypothesis, the notion of statistical power is meaningless, regardless of what probability (80% or 90% or some other percentage) is sought for finding the alternative hypothesis. Furthermore, the notion that the defense must adduce studies with “sufficient statistical power to be considered conclusive” creates an unscientific standard that can never be met, while subverting the law’s requirement that the claimant establish causation.
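The dependence of power on a specified alternative can be shown with a rough calculation. The sketch below approximates the power of a comparison of two proportions with equal group sizes, using a normal approximation that ignores the negligible chance of a significant result in the wrong direction. The baseline risk, group sizes, and candidate risk ratios are all hypothetical, as is the function name; nothing here reconstructs the studies in Ambrosini.

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(baseline_risk, risk_ratio, n_per_group, alpha=0.05):
    """Approximate power to detect a given risk ratio at two-sided level alpha."""
    std_normal = NormalDist()
    p0 = baseline_risk
    p1 = baseline_risk * risk_ratio  # risk under the alternative hypothesis
    p_bar = (p0 + p1) / 2
    # Standard errors of the risk difference under the null and the alternative
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = sqrt((p0 * (1 - p0) + p1 * (1 - p1)) / n_per_group)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return std_normal.cdf((abs(p1 - p0) - z_crit * se_null) / se_alt)

# Hypothetical study: 3% baseline risk, 1,000 subjects per group.
for rr in (1.2, 1.5, 2.0, 3.0):
    print(f"alternative RR = {rr}: power = {power_two_proportions(0.03, rr, 1000):.2f}")
```

The same hypothetical study has low power against a risk ratio of 1.5 and very high power against a risk ratio of 3.0. To say that a study "lacks power," without naming the alternative, is to say nothing that can be evaluated.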

The suggestion that the studies that failed to find an association cannot be considered conclusive because they “lacked sufficient statistical power” is troublesome because it distorts and misapplies the very notion of statistical power. No attempt was made to describe the confidence intervals surrounding the point estimates of the null studies; nor was there any discussion whether the studies could be aggregated to increase their power to rule out meaningful associations.
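One conventional way to aggregate studies is fixed-effect, inverse-variance pooling of log risk ratios. The sketch below assumes each study is summarized by a risk ratio and its 95% confidence limits. The study values and the function name are hypothetical, and the method is offered only as an illustration, not as what any witness in Ambrosini did or should have done.

```python
import math

def pool_fixed_effect(studies, z=1.96):
    """Pool (risk_ratio, lower, upper) summaries by inverse-variance weighting.

    Each study's standard error is recovered from the width of its interval
    on the log scale, assuming the limits are symmetric around log(RR).
    """
    weights = []
    weighted_logs = []
    for rr, lower, upper in studies:
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        weight = 1 / se**2
        weights.append(weight)
        weighted_logs.append(weight * math.log(rr))
    pooled_log = sum(weighted_logs) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - z * pooled_se),
        math.exp(pooled_log + z * pooled_se),
    )

# Three hypothetical small null studies, each with a wide interval.
studies = [(1.1, 0.5, 2.4), (0.9, 0.4, 2.0), (1.0, 0.5, 2.0)]
rr, lower, upper = pool_fixed_effect(studies)
print(f"pooled RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

The pooled interval is narrower than any of the individual intervals. Studies that are individually too small to exclude a meaningful association may do so in the aggregate, provided that they are sufficiently similar to justify combining them.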

The Circuit court’s scientific jurisprudence was thus seriously flawed. Without a discussion of the end points observed, the relevant point estimates of risk ratios, and the confidence intervals, the reader cannot assess the strength of the claims made by Goldman and Strom, or by defense expert Simpson, in their reports. Without identifying the study endpoints, the reader cannot evaluate whether the plaintiffs’ expert witnesses relied upon relevant outcomes in formulating their opinions. The court viewed the subject matter from 30,000 feet, passing over at 600 mph, without engagement or care. A strong dissent, however, suggested serious mischaracterizations of the plaintiffs’ evidence by the majority.

The only specific causation testimony to support plaintiff’s claims came from Goldman, in what appears to have been a “differential etiology.” Goldman purported to rule out a genetic cause, even though he had not conducted a critical family history or ordered a state-of-the-art chromosomal study. Id. at 140. Of course, nothing in a differential etiology approach would allow a physician to rule out “unknown” causes, which, for birth defects, make up the most prevalent and likely causes to explain any particular case. The majority acknowledged that these were shortcomings, but rhetorically characterized them as substantive, not methodologic, and therefore as issues for cross-examination, not for consideration by a judicial gatekeeper. All this is magical thinking, but it continues to infect judicial approaches to specific causation. See, e.g., Green Mountain Chrysler Plymouth Dodge Jeep v. Crombie, 508 F. Supp. 2d 295, 311 (D.Vt. 2007) (citing Ambrosini for the proposition that “the possibility of uneliminated causes goes to weight rather than admissibility, provided that the expert has considered and reasonably ruled out the most obvious”). In Ambrosini, however, Dr. Goldman had not ruled out much of anything.

Circuit Judge Karen LeCraft Henderson dissented in a short, but pointed opinion that carefully marshaled the record evidence. Drs. Goldman and Strom had relied upon a study by Greenberg and Matsunaga, whose data failed to show a statistically significant association between MPA and cleft lip and palate, when the crucial issue of timing of exposure was taken into consideration. Ambrosini, 101 F.3d at 142.

Beyond the specific claims and evidence, Judge Henderson anticipated the subsequent Supreme Court decisions in Joiner, Kumho Tire, and Weisgram, and the year 2000 revision of Rule 702, in noting that the majority’s acceptance of glib claims to have used a “traditional methodology” would render Daubert nugatory. Id. at 143-45 (characterizing Strom and Goldman’s methodologies as “wispish”). Even more importantly, Judge Henderson refused to indulge the assumption that somehow the length of Goldman’s C.V. substituted for evidence that his methods satisfied the legal (or scientific) standard of reliability. Id.

The good news is that little or nothing in Ambrosini survives the 2000 amendment to Rule 702. The bad news is that not all federal judges seem to have noticed, and that some commentators continue to cite the case, as lovely.

Probably no commentator has promiscuously embraced Ambrosini as warmly as Carl Cranor, a philosopher, and occasional expert witness for the lawsuit industry, in several publications and presentations.8 Cranor has been particularly enthusiastic about Ambrosini’s approval of expert witness’s testimony that failed to address “the relative risk between exposed and unexposed populations of cleft lip and palate, or any other of the birth defects from which [the child] suffers,” as well as differential etiologies that exclude nothing.9 Somehow Cranor, as did the majority in Ambrosini, believes that testimony that fails to identify the magnitude of the point estimate of relative risk can “assist the trier of fact to understand the evidence or to determine a fact in issue.”10 Of course, without that magnitude given, the trier of fact could not evaluate the strength of the alleged association; nor could the trier assess the probability of individual causation to the plaintiff. Cranor also has written approvingly of lumping unrelated end points, which defeats the assessment of biological plausibility and coherence by the trier of fact. When the defense expert witness in Ambrosini adverted to the point estimates for relevant end points, the majority, with Cranor’s approval, rejected the null findings as “too small to be significant.”11 If the null studies were, in fact, too small to be useful tests of the plaintiffs’ claims, intellectual and scientific honesty required an acknowledgement that the evidentiary display was not one from which a reasonable scientist would draw a causal conclusion.

1 Ambrosini v. Labarraque, 101 F.3d 129, 138-39 (D.C. Cir. 1996) (citing and applying Ferebee), cert. dismissed sub nom. Upjohn Co. v. Ambrosini, 117 S.Ct. 1572 (1997). See also David E. Bernstein, “The Misbegotten Judicial Resistance to the Daubert Revolution,” 89 Notre Dame L. Rev. 27, 31 (2013).

2 S. Prahalada, E. Carroad, M. Cukierski, and A.G. Hendrickx, “Embryotoxicity of a single dose of medroxyprogesterone acetate (MPA) and maternal serum MPA concentrations in cynomolgus monkey (Macaca fascicularis),” 32 Teratology 421 (1985).

3 S. Prahalada, E. Carroad, and A.G. Hendrickx, “Embryotoxicity and maternal serum concentrations of medroxyprogesterone acetate (MPA) in baboons (Papio cynocephalus),” 32 Contraception 497 (1985).

4 See, e.g., Z. Katz, M. Lancet, J. Skornik, J. Chemke, B.M. Mogilner, and M. Klinberg, “Teratogenicity of progestogens given during the first trimester of pregnancy,” 65 Obstet Gynecol. 775 (1985); J.L. Yovich, S.R. Turner, and R. Draper, “Medroxyprogesterone acetate therapy in early pregnancy has no apparent fetal effects,” 38 Teratology 135 (1988).

5 G. Saccone, C. Schoen, J.M. Franasiak, R.T. Scott, and V. Berghella, “Supplementation with progestogens in the first trimester of pregnancy to prevent miscarriage in women with unexplained recurrent miscarriage: a systematic review and meta-analysis of randomized, controlled trials,” 107 Fertil. Steril. 430 (2017).

6 Ferebee v. Chevron Chemical Co., 736 F.2d 1529, 1535 (D.C. Cir.), cert. denied, 469 U.S. 1062 (1984).

7 Dr. Strom was also quoted as having provided a misleading definition of statistical significance: “whether there is a statistically significant finding at greater than 95 percent chance that it’s not due to random error.” Ambrosini, 101 F.3d at 136. Given the majority’s inadequate description of the record, the description of witness testimony may not be accurate, and error cannot properly be allocated.

8 Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice 320, 327-28 (2006); see also Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice 238 (2d ed. 2016).

9 Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice 320 (2006).

10 Id.

11 Id.; see also Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice 238 (2d ed. 2016).

Ferebee Revisited

December 28th, 2017

The following post was originally published on November 8, 2012, but was hacked, no doubt by the lawsuit industry, and replaced with mindless fluff as is its wont. It is now restored.

I used to think of the infamous Ferebee decision as the Dred Scott decision of scientific evidence; in essence, declaring that science has no validity issues that the law is bound to respect. Ferebee v. Chevron Chem. Co., 552 F. Supp. 1297 (D.D.C. 1982), aff’d, 736 F.2d 1529 (D.C. Cir.), cert. denied, 469 U.S. 1062 (1984). The rhetoric on expert witnesses from the district and circuit courts in this case is sometimes jarring, but the facts of the case make the holding, as distinct from the expansive dicta, not so unreasonable.

On rereading Ferebee, I was struck by several aspects of the case that rarely are discussed when Ferebee is cited. On sober second thought, Ferebee may not be such a bad decision, especially considering that it has no continuing validity as a rule of decision for expert witness admissibility in federal court.

1. Ferebee is a government negligence case.

The plaintiff worked for the federal government when he was exposed to the herbicide paraquat. Richard Ferebee began working for the Department of Agriculture’s Beltsville Agricultural Research Center (BARC), in Beltsville, Maryland. He started spraying paraquat in the summer of 1977, and used the herbicide regularly through the time he was diagnosed with pulmonary fibrosis, in November 1979. 736 F.2d at 1531-32. Ferebee brought a failure to warn claim against the supplier of paraquat, Chevron Chemical Company. The allegations of actual or constructive knowledge of a hazard, however, could just as readily be asserted against the federal government, which owned the BARC facility, employed Ferebee, controlled and supervised his use of paraquat, and failed to comply with Chevron’s instructions. The federal government further regulated the sale and use of paraquat extensively, first by the Department of Agriculture, and later by the Environmental Protection Agency. Id. at 1532.

2. The exposure.

Ferebee filed suit in 1981; he died in 1982. His case was tried twice. In the first trial, the jury deadlocked; in the second trial, the jury returned a verdict in favor of his estate, and for his family, for $60,000. In his deposition testimony, Ferebee described how he sprayed paraquat, in the summer of 1977. The chemical was diluted for use, per Chevron’s instructions. There was no evidence that Ferebee ever had direct contact with undiluted paraquat, or that the paraquat he was exposed to was not diluted according to the proportions recommended on Chevron’s label. 552 F. Supp. at 1295 & n. 3. Ferebee frequently got the chemical on his hands. 552 F. Supp. at 1294-95. Ferebee further described an occasion when he was drenched with paraquat when he walked behind a tractor that was spraying the chemical, and another incident when he used a defective sprayer that leaked paraquat “all over his pants.” 736 F.2d at 1532. On both occasions, Ferebee did not wash, and apparently went home contaminated, where he fell asleep, tired and dizzy, without showering. Id. As we will see, the exposure that Ferebee described would not have occurred had his federal employer followed the instructions on the label that it mandated. In 1978, the federal Occupational Safety and Health Administration published Guidelines on the need for protective clothing, respirators, immediate washing of contaminated skin, etc. Ferebee’s federal employer recklessly disregarded its own guidelines.

3. The warnings.

Paraquat could be sold in the United States only when labeled in accordance with EPA regulations, promulgated pursuant to the Federal Insecticide, Fungicide, and Rodenticide Act, 7 U.S.C. § 136, et seq. (FIFRA). The statute bars EPA from allowing sale of regulated herbicides, such as paraquat, unless the chemicals, as labeled, will not cause “unreasonable adverse effects on the environment.” 7 U.S.C. § 136a(c)(5)(C). Such effects are in turn defined as “any unreasonable risk to man or the environment, taking into account the economic, social, and environmental costs and benefits of the use of [the] pesticide.” 7 U.S.C. § 136(bb). FIFRA further requires the EPA to require labeling that is “adequate to protect health and the environment” and that is “likely to be read and understood.” 7 U.S.C. § 136(q)(1)(E). 736 F.2d at 1539-40.

Unfortunately, the courts failed to provide the complete warning label and the material safety data sheets. There are “snippets” provided, which make clear that the federal government was largely to blame for failing to comply with the directions required under FIFRA. For instance, the district court, in a footnote, acknowledged:

“For example, the label advised the user spraying paraquat to wear waterproof clothing and goggles, to avoid working in spray mist, and to wash splashes on the skin or eyes immediately with water.”

552 F. Supp. at 1304 n.40. The Court of Appeals reported that “the label, in large bold letters states:




736 F.2d at 1536. The label also informed users to wash any exposed areas immediately, and to remove contaminated clothing. Id.

4. The Stipulation.

A key fact, rarely described or explained in discussions of the Ferebee case, is the parties’ stipulation

“that Mr. Ferebee’s only significant exposure to paraquat was on his intact skin; i.e., there was no evidence that Mr. Ferebee swallowed or inhaled paraquat, or that he spilled or sprayed it on an area of his skin upon which he had any apparent cuts or scrapes. The jury was not, of course, precluded from concluding that a person engaged in Mr. Ferebee’s line of work could have had some, or even many, minor cuts or abrasions not readily discernible to the naked eye or likely to be remembered some time later.”

552 F. Supp. at 1295 & n. 3.

Why did the plaintiffs try to present their case solely as a dermal exposure case? As we will see, this stratagem made their medical causation case more difficult, but it avoided serious misuse and lack of proximate cause issues. Ferebee had been instructed by his co-workers and supervisors that paraquat was extremely dangerous if swallowed, and probably also if inhaled. The warning label was unequivocal in detailing the dangers and the need to avoid ingestion. (Without the full label, it is difficult to evaluate how well the label warned against inhalation, but the 1978 OSHA guidelines address the use of a proper respirator for situations in which paraquat may be inhaled.) On the other hand, the label had a weakness, which could be exploited, as long as the preemption defense could be held at bay: the label urged protective clothing, goggles, and immediate washing of contaminated skin, but it failed to describe the consequence of dermal exposure other than irritation. Ferebee could thus avoid his culpable conduct, as well as a sophisticated intermediary defense, by claiming that his exposure was only dermal.

Why did Chevron agree to the stipulation? The defendant probably felt sanguine about its preemption defense, and thus also about the adequacy of its warnings overall. The stipulation limited the plaintiff’s medical causation case to a route of exposure that put it into an arguable “first instance” case report. Chevron stood to gain a claim of “lack of notice,” and thus lack of actual or constructive knowledge of the risk of lung disease from dilute dermal exposure. The clinical presentation itself differed from many of the cases of known paraquat poisoning, see infra, and Chevron probably believed that it could deal with the medical causation claim better if exposure was limited to transdermal absorption. Curiously, Chevron did not argue that Ferebee must have had some inhalational exposure, which he almost certainly did. I suspect that Chevron’s position on inhalation was hedged because its warning label did not specify respirator usage for ordinary work exposures of applicators (as opposed to workers who handled undiluted paraquat, worked in confined spaces, etc.).

5. Medical causation

Chevron took a strident position, standing on the fact that there had been no previous documented cases of pulmonary fibrosis in workers exposed to diluted paraquat through their skin. The following facts were uncontroverted:

  • Paraquat causes pulmonary fibrosis in humans.
  • The evidence that established paraquat as a cause of pulmonary fibrosis was largely case series of acute onset of pulmonary fibrosis after ingestion.
  • Paraquat induces pulmonary fibrosis relatively rapidly.
  • Paraquat can be absorbed through the skin.
  • The parties agreed that any type of exposure – ingestion, inhalation, or dermal absorption – could cause lung damage. 552 F. Supp. at 1300 & n.28.
  • Once paraquat is ingested, inhaled, or absorbed, it can travel to the lungs.
  • Lung fibrosis caused by dermal absorption of paraquat had been described previously only with skin lesions before or after the injury. 736 F.2d at 1538.
  • The lungs are the target organ for paraquat.
  • There are numerous causes of pulmonary fibrosis (such as asbestosis, scleroderma, rheumatoid arthritis, etc.).
  • The variants of pulmonary fibrosis do not all look alike, present alike, or progress alike.
  • Mr. Ferebee had no known other disease or exposure that could account for his pulmonary fibrosis.
  • There are cases of pulmonary fibrosis with no identifiable cause, known as idiopathic pulmonary fibrosis (IPF).
  • IPF is relatively rare; it too has a rapid onset and progression, although not as fast as the cases described after exposure to undiluted paraquat.
  • Mr. Ferebee’s medical history was largely unhelpful in explaining his clinical course.
  • Ferebee had some shortness of breath before starting to use paraquat. 552 F. Supp. at 1295.
  • Ferebee used paraquat occasionally over three years before he was diagnosed with pulmonary fibrosis.

Some observations about these facts. General causation in a sense was not contested. Paraquat causes pulmonary fibrosis. The issue was whether dilute dermal exposure over three years causes pulmonary fibrosis. Chevron stridently asserted that the “scientific method” required controlled experimental or observational (epidemiologic) studies. The problem with Chevron’s position was that general causation had already been established, and not by analytical epidemiologic studies.

6. The expert witnesses.

Ferebee was initially treated by Dr. Muhammed Yusuf, a pulmonary specialist, who diagnosed pulmonary fibrosis. Dr. Yusuf referred Ferebee to the National Institutes of Health (NIH), where he came under the care of Dr. Ronald G. Crystal of the Heart, Lung, and Blood Institute. (Dr. Crystal is now at Cornell-Weill, where he is Chairman of Genetic Medicine, and he practices pulmonary medicine.)

Chevron called Dr. Carrington, who diagnosed Ferebee with IPF. Dr. Carrington challenged the plaintiffs’ expert witnesses’ opinions for lacking reliance upon controlled observational or experimental studies. 552 F. Supp. at 1301. Dr. Carrington, however, acknowledged that dermal cases are too rare for observational epidemiologic analysis, but emphasized that no animal studies of sufficient size had been done to support plaintiffs’ hypothesis. Chevron also called a Dr. Fisher, who presented a toxicokinetic (TK) analysis of Ferebee’s dermal absorption. Based upon his TK analysis, Dr. Fisher concluded that the maximal amount of paraquat absorbed by Ferebee was too small, based upon known cases and animal studies, to have caused paraquat toxicity. Id.

7. Chevron’s challenge to plaintiffs’ expert witnesses’ causation opinion.

None of the defendant’s expert witnesses examined Ferebee. The courts thought this was relevant, but they never articulated what would have been observed on physical examination that was important to resolving the differential diagnosis of paraquat toxicity versus IPF. There was no dispute that Ferebee had rapidly progressing pulmonary fibrosis. The expert witnesses on both sides evaluated Ferebee’s clinical data, presentation, and clinical course, and arrived at different diagnoses. The plaintiffs’ expert witnesses’ diagnosis, however, involved a causal attribution to paraquat exposure.

The Ferebee case was litigated under Maryland law because federal statutory law requires state law to control in a wrongful death action arising out of the neglect or wrongful act of another on a federal enclave. 16 U.S.C. § 457. 736 F.2d at 1533. (Maryland law is actually favorable to a sophisticated intermediary defense, although the key decisions post-date Ferebee.) Chevron appears to have relied upon Maryland’s articulation of the Frye general acceptance doctrine, and the courts analyzed Chevron’s arguments as a Frye challenge. 552 F. Supp. at 1301; 736 F.2d at 1535. Although the use of Maryland law to determine an evidentiary issue seems suspect, Chevron apparently pressed its challenge in terms of Maryland’s version of Frye, and not based upon Federal Rule of Evidence 702. The infamous language used by both the district and the circuit courts was, therefore, not an interpretation of federal law. Rule 702 was never cited or discussed in either the trial or the appellate court’s opinion.

My re-reading of Ferebee has softened my criticisms of state courts that had relied upon the case, even after the Supreme Court’s decision in Daubert. Softened, but not eliminated: Ferebee is still a case largely confined to its facts, and the language quoted as a standard of admissibility is really a statement of the appellate standard of review for the jury’s determination of medical causation.

8. The judicial resolution of Chevron’s Frye challenge

The district court insightfully recognized that Chevron was demanding a level of evidence that had never been required to establish paraquat’s generally accepted ability to cause pulmonary fibrosis. This recognition led to the district court’s colorful language:

“It is true that medical expert testimony must be grounded in proper scientific methodology, but the extremely stringent standard that defendant suggests is beyond reason. Product liability law, especially as it relates to relatively new products or those with a relatively rare yet significant danger, would be rendered next to meaningless if a plaintiff could prove he was injured by a product only after a ‘statistically significant’ number of other people were also injured. A civilized legal system does not require that much human sacrifice before it can intervene. The fact that this is the first case of this exact type-or at least the first of its exact type in which the involvement of paraquat was discovered by alert doctors — cannot be enough by itself to shield defendant from liability. Defendant’s experts were not able to fault Dr. Crystal for his basic diagnostic methodology; in fact, they used the same kinds of test results, consultations, and other tools that he did. What they disagreed with chiefly were his conclusions.”

552 F. Supp. at 1301. The important observation is that general causation had been established by case series and reports of human exposure. There never was statistical evidence that had been evaluated for “significance,” to establish general causation for undiluted paraquat, and the trial court refused, under Maryland law, to require such evidence for general causation for diluted paraquat. In this context, we can see that the trial court’s suggestion that statistical significance was not required has little bearing upon cases in which general causation could only be established using epidemiologic evidence, with its attendant statistical inferences.

Of course, the matter only became worse when Chevron persisted in its argument and presented it to a liberal panel of the D.C. Circuit. (Judge Mikva wrote the opinion for a panel that included Judge Wald, and Senior Judge Bazelon.) The panel’s decision ratcheted up the rhetoric:

“Thus, a cause-effect relationship need not be clearly established by animal or epidemiological studies before a doctor can testify that, in his opinion, such a relationship exists. As long as the basic methodology employed to reach such a conclusion is sound, such as use of tissue samples, standard tests, and patient examination, product liability does not preclude recovery until a ‘statistically significant’ number of people have been injured or until science has had the time and resources to complete sophisticated laboratory studies of the chemical. In a courtroom, the test for allowing a plaintiff to recover is not scientific certainty, but legal sufficiency; if reasonable jurors could conclude from the expert testimony that paraquat more likely than not caused Ferebee’s injury, the fact that another jury might reach the opposite conclusion or that science would require more evidence before conclusively considering the causation question resolved is irrelevant. That Ferebee’s case may have been the first of its exact type, or that his doctors may have been the first alert enough to recognize such a case, does not mean that the testimony of those doctors, who are concededly well qualified in their fields, should not have been admitted.”

736 F.2d at 1535-36 (emphasis in original).

Again, the dismissive attitude towards statistically significant evidence is limited to the context of a causal analysis that had been made, to everyone’s satisfaction, for undiluted paraquat, without the need for epidemiologic, statistical evidence. Statistical significance was never at issue. In this way, Ferebee resembles the untoward language on statistical significance from Matrixx Initiatives Inc. v. Siracusano. In both cases, statistical significance was never really at issue. In Ferebee, there was no statistical evidence needed or used to reach causal conclusions about paraquat’s ability to induce pulmonary fibrosis. In Matrixx Initiatives, allegations of statistical significance and causation were not necessary because the plaintiffs needed only to allege materiality of the facts suppressed by the company in order to plead a securities fraud case. Materiality could be established without causation, and thus neither causation nor statistical significance needed to be alleged.

As for Chevron’s Frye challenge, the district court rejected the implied call for a vote on the general acceptance of Dr. Crystal’s reasoning. Frye may require “vote counting” of some sort, but the process becomes irrelevant when virtually no one has registered to vote. Otherwise, the defense and the plaintiffs’ expert witnesses appeared to be using the same technique of arguing by analogy to accepted cases of paraquat poisoning or IPF. Dr. Crystal opined that Ferebee’s case was “similar” to three other cases he had identified. Dr. Carrington argued that Ferebee’s case was more like IPF cases, although IPF cases themselves have some clinical heterogeneity as well. Paraquat cases described onset to death as a very rapid process. Ferebee did not present with significant symptoms for three years after his first exposure, and then he survived for another two plus years. Ferebee did not report skin lesions, which had been reported in previous cases of dermal exposure leading up to pulmonary fibrosis. The case presented, on the diagnostic level, a difficult call, but it is easy to see the courts’ impatience with the defendant’s insistence upon more stringent criteria and evidence than was used to establish the causal connection with undiluted paraquat.

9. Expert witness qualifications.

Chevron never challenged Dr. Yusuf’s or Dr. Crystal’s qualifications. The oft-quoted comments about expert witness qualifications were made in the context of describing the appellate court’s standard of review, and the court’s role in not assessing credibility or weighing the evidence:

“These admonitions apply with special force in the context of the present action, in which an admittedly dangerous chemical is alleged through long-term exposure to have caused disease. Judges, both trial and appellate, have no special competence to resolve the complex and refractory causal issues raised by the attempt to link low-level exposure to toxic chemicals with human disease. On questions such as these, which stand at the frontier of current medical and epidemiological inquiry, if experts are willing to testify that such a link exists, it is for the jury to decide whether to credit such testimony.”

736 F.2d at 1534.

This procedural posture is obviously very different from the initial determination of admissibility. As far as credentials are concerned, Drs. Yusuf and Crystal were hardly “hired guns”; both physicians were well qualified. Dr. Crystal had outstanding qualifications, and Chevron wisely never challenged them. Remarkably, this language has been mistakenly invoked as a standard for trial courts to use in determining the admissibility of expert witness opinion testimony. It is no such thing.

10. Preemption and Warnings Causation.

Ultimately, Chevron’s preemption defense was rejected by both the district and the circuit court. FIFRA preemption has had its ups and downs; no surprise there. More interesting is the emphasis that both courts gave to the important role of the employer in the case. The evidence overwhelmingly showed that Ferebee had never read the warning label, and thus the element of proximate causation between allegedly inadequate warning and harm was in jeopardy of going unproved. The courts, however, emphasized the role that the employer, through its supervisors and responsible co-workers, plays in the complex organizational situation of a modern workplace:

“Mr. Ferebee’s situation was quite different, however. He did not purchase paraquat for his personal use; rather, it was provided to him by his employer for use on the job. The evidence showed that his principal source of information about paraquat was the oral instructions of his supervisors and co-workers, not the written label. He learned from them how to mix the product and how to spray it. It was also from this source that he learned of the danger of getting the product in his mouth: one of his co-workers warned him that if he accidently swallowed paraquat, it would ‘get in his blood’ and poison him. This is a common pattern of instruction and use of occupational materials in the workplace. Learning by doing and learning by oral instruction are tried and true methods of educating manual workers in their jobs. Therefore, although it is crucial to plaintiff’s case that someone would have read the label, it was not necessary for Mr. Ferebee to have done so. And it is obvious that one or more employees at BARC did read the label, since information did reach Mr. Ferebee about the proportions for diluting the product and about the dangers about which the label did warn. It was appropriate for the jury to infer that a warning about the danger of fatal lung disease from dermal exposure would also have been communicated to Mr. Ferebee. See Restatement (Second) of Torts § 388 comment n (seller normally entitled to assume that adequate warning will be passed on by purchaser to ultimate user); cf. Chambers v. G.D. Searle & Co., 441 F.Supp. at 381 (in product liability case involving prescription drug, relevant warning is the one given to doctor, not patient).”

552 F. Supp. at 1303-04 (internal citations omitted). So here we have Ferebee, the subject of so much derision and aspersion from defense counsel, embracing Section 388, comment n, as well as applying learned intermediary principles to a case not involving prescription drugs. The appellate court waxed enthusiastic about the principles of Section 388, and went so far as to cite Victor Schwartz in support:

“We live in an organizational society in which traditional common-law limitations on an actor’s duty must give way to the realities of society. *** In this case, Mr. Ferebee did not purchase the paraquat for his personal use, and there was substantial evidence that workplace communication about the dangers associated with various chemicals usually took the form of oral instructions from supervisors to workers, the latter of whom then retransmitted the information to co-workers. This, rather than individual reading of product warnings, is a typical method by which information is disseminated in the modern workplace. See Schwartz & Driver, “Warnings in the Workplace: The Need for a Synthesis of Law and Communication Theory,” 52 U. Cinn. L. Rev. 38, 66-83 (1983). The requirement that an improper warning proximately ‘cause’ the injury should be elaborated against this background. We believe Maryland would construe its tort law in this case to require only that someone in the workplace have read the label, not that Mr. Ferebee personally have read it. Because there is no dispute that one or more employees at BARC did read the label, we hold that the jury could properly have inferred that, had a warning about the danger of disease from dermal exposure been included on the label, that warning would have been communicated to Mr. Ferebee and that he would as a result have acted differently. Alternatively, the jury could have inferred that an adequate warning would have led Ferebee’s employers to undertake steps that would have protected him from paraquat poisoning-for example, provision of showers for use after spraying.”

736 F.2d at 1539 (emphasis in original; internal citation omitted). Judge Mikva’s prediction, of course, was absolutely accurate; Maryland tort law did, soon thereafter, embrace the sophisticated intermediary defense to exculpate the defendant in such remote supplier situations. See, e.g., Kennedy v. Mobay Corp., 84 Md. App. 397 (1990) (applying sophisticated user defense to bar claims against manufacturers of toluene diisocyanate), aff’d, 325 Md. 385 (1992); Higgins v. E.I. DuPont de Nemours, Inc., 671 F. Supp. 1055 (D. Md. 1987) (Maryland law; holding that manufacturer of paint was in better position than bulk supplier to communicate warnings to customers’ employees), aff’d, 863 F.2d 1162 (4th Cir. 1988). The principle invoked to excuse plaintiff from reading the warning label also works to exculpate the defendant when that warning label is otherwise adequate, or when the intermediary knows of the hazard in any event.

Some High-Value Targets for Sander Greenland in 2018

December 27th, 2017

A couple of years ago, Sander Greenland and I had an interesting exchange on Deborah Mayo’s website. I tweaked Sander for his practice of calling out defense expert witnesses for statistical errors, while ignoring whoppers made by plaintiffs’ expert witnesses. See “Significance Levels Made a Whipping Boy on Climate-Change Evidence: Is p < 0.05 Too Strict?” Error Statistics (Jan. 6, 2015).1 Sander acknowledged that he received a biased sample of expert reports through his service as a plaintiffs’ expert witness, but protested that defense counsel avoided him like the plague. In an effort to be helpful, I directed Sander to an example of bad statistical analysis that had been proffered by Dr. Bennett Omalu, in a Dursban case, Pritchard v. Dow Agro Sciences, 705 F. Supp. 2d 471 (W.D. Pa. 2010), aff’d, 430 F. App’x 102, 104 (3d Cir. 2011).2

Sander was unimpressed with my example of Dr. Omalu; he found the example “a bit disappointing though because [Omalu] was merely a county medical examiner, and his junk analysis was duly struck. The expert I quoted in my citations was a full professor of biostatistics at a major public university, a Fellow of the American Statistical Association, a holder of large NIH grants, and his analysis (more subtle in its transgressions) was admitted” (emphasis added). Sander expressed an interest in finding “examples involving similarly well-credentialed, professionally accomplished plaintiff experts whose testimony was likewise admitted… .”

Although it was heartening to read Sander’s concurrence in the assessment of Omalu’s analysis as “junk,” Sander’s rejection of Dr. Omalu as merely a low-value target was disappointing, given that Omalu also has a master’s degree in public health, from the University of Pittsburgh, where he claims he studied with Professor Lew Kuller. Omalu has also gained some fame and notoriety for his claim to have identified the problem of chronic traumatic encephalopathy (CTE) among professional football players. After all, even Sander Greenland has not been the subject of a feature-length movie (Concussion), as has Omalu.

I lost track of our exchange in 2015, until recently I was reminded of it when reading an expert report by Professor Martin Wells. Unlike Omalu, Wells meets all the Greenland criteria for high-value targets. He is not only a full, chaired professor but also the statistics department chairman at an Ivy League school, Cornell University. Wells is a fellow of both the American Statistical Association and the Royal Statistical Society, but most important, Wells is a frequent plaintiffs’ expert witness, who is well known to Sander Greenland. Both Wells and Greenland served, side by side, as plaintiffs’ expert witnesses in the pain pump litigation.

So here is the passage in Wells’ report that is worthy of Greenland’s attention:

“If a 95% confidence interval is specified, the range encompasses the results we would expect 95% of the time if samples for new studies were repeatedly drawn from the same population.”

In re Testosterone Replacement Therapy Prods. Liab. Litig., Declaration of Martin T. Wells, Ph.D., at 2-3 (N.D. Ill., Oct. 30, 2016). Unlike the Dursban litigation involving Bennett Omalu, where the “junk analysis” was excluded, in the litigation against AbbVie for its manufacture and selling of prescription testosterone supplementation, Wells’ opinions were not excluded or limited. In re Testosterone Replacement Therapy Prods. Liab. Litig., No. 14 C 1748, MDL No. 2545, 2017 WL 1833173 (N.D. Ill. May 8, 2017) (denying Rule 702 motions).

Now this statement by Wells surely offends the guidance provided by Greenland and colleagues.3 And it was exactly the sort of misrepresentation that led to a confabulation of the American Statistical Association, and that Association’s consensus statement on statistical significance.4
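For readers who want to see why the quoted definition fails, here is a minimal simulation sketch. Everything in it is my own hypothetical setup, not anything drawn from the Wells declaration: a normal population with a known standard deviation, 25 observations per study, and a simple z-interval. The function name simulate_intervals and its parameters are illustrative assumptions. The sketch compares how often a 95% interval contains the true mean with how often it contains the estimate from a fresh study drawn from the same population.

```python
import math
import random


def simulate_intervals(trials=20000, n=25, mu=0.0, sigma=1.0, seed=1):
    """Return (share of intervals covering the true mean,
    share of intervals capturing a replication study's mean)."""
    rng = random.Random(seed)
    # Half-width of a 95% z-interval for the mean; sigma is assumed known.
    half_width = 1.96 * sigma / math.sqrt(n)
    covers_truth = 0
    captures_replication = 0
    for _ in range(trials):
        original = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        replication = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        # Correct reading: does the interval contain the true mean?
        if abs(original - mu) <= half_width:
            covers_truth += 1
        # The quoted reading: does it contain the next study's estimate?
        if abs(replication - original) <= half_width:
            captures_replication += 1
    return covers_truth / trials, captures_replication / trials


coverage, capture = simulate_intervals()
print(f"intervals containing the true mean:       {coverage:.3f}")
print(f"intervals containing a replication mean:  {capture:.3f}")
```

Under these assumptions, the first proportion should land near 0.95, while the second should land noticeably lower, in the neighborhood of 0.83, because the replication estimate carries its own sampling error. The 95% describes the long-run performance of the interval procedure with respect to the true parameter, not the spread of results from new studies.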

And here is another example, which occurs not in a distorting litigation forum, but on the pages of an occupational health journal, where the editor in chief, Anthony L. Kiorpes, ranted about the need for better statistical editing and writing in his own journal. See Anthony L. Kiorpes, “Lies, damned lies, and statistics,” 33 Toxicol. & Indus. Health 885 (2017). Kiorpes decried the misuse of statistics:

“I am not implying that it is the intent of the scientists who publish in these pages to mislead readers by their use of statistics, but I submit that the misuse of statistics, whether intentional or otherwise, creates confusion and error.”

Id. at 885. Kiorpes then proceeded to hold himself up as Exhibit A to his screed:

“Remember that p values are estimates of the probability that the null hypothesis (no difference) is true.”

Id. Uggh; we seem to be backsliding after the American Statistical Association’s consensus statement.
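A rough sketch may help show why a p-value cannot be read as the probability that the null hypothesis is true. The setup is entirely hypothetical and mine, not Kiorpes’s: each simulated study yields a z statistic that is standard normal when the null is true and shifted by 1.96 when it is false, and the share of studies with a true null (prior_null) is an assumed input. The code then looks only at studies with p-values between 0.04 and 0.05 and asks what share of them had a true null.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def share_of_true_nulls(prior_null, shift=1.96, trials=200000, seed=2):
    """Among studies with 0.04 < p < 0.05, return the share with a true null."""
    rng = random.Random(seed)
    nulls_in_window = 0
    total_in_window = 0
    for _ in range(trials):
        null_is_true = rng.random() < prior_null
        # Assumed model: z ~ N(0, 1) under the null, N(shift, 1) otherwise.
        z = rng.gauss(0.0 if null_is_true else shift, 1.0)
        if 0.04 < two_sided_p(z) < 0.05:
            total_in_window += 1
            nulls_in_window += null_is_true
    return nulls_in_window / total_in_window


share_even = share_of_true_nulls(prior_null=0.5)
share_skeptical = share_of_true_nulls(prior_null=0.9)
print(f"true nulls among p in (0.04, 0.05), half of nulls true: {share_even:.2f}")
print(f"true nulls among p in (0.04, 0.05), 90% of nulls true:  {share_skeptical:.2f}")
```

The same p-values correspond to very different shares of true nulls depending on the assumed mix of hypotheses and the size of the real effects, and under these assumptions neither share is anywhere near 0.04 or 0.05. The p-value is computed on the assumption that the null is true; it is not an estimate of whether that assumption holds.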

“Almost all scientists have stated (or have been tempted to state) something like ‘the mean of Group A was greater than that of Group B, but the difference was not statistically significant’. With very few exceptions (which I will mention below), this statement is nonsense.”

* * * * *

“What the statistics are indicating when the p-value is greater than 0.05 is that there is ‘no difference’ between group A and group B.”

Id. at 886.
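The claim that p greater than 0.05 indicates “no difference” can be checked with another small sketch, again under assumptions of my own choosing: two groups of 20 observations each, a known standard deviation of 1, and a real difference in means of half a standard deviation. The function name share_not_significant and its parameters are hypothetical. The simulation counts how often a two-sided z-test fails to reach p < 0.05 even though the difference is real by construction.

```python
import math
import random


def share_not_significant(true_diff=0.5, n=20, sigma=1.0, trials=5000, seed=3):
    """Share of simulated studies with p > 0.05 despite a real difference."""
    rng = random.Random(seed)
    # Standard error of the difference in means; sigma is assumed known.
    se = sigma * math.sqrt(2.0 / n)
    not_significant = 0
    for _ in range(trials):
        mean_a = sum(rng.gauss(true_diff, sigma) for _ in range(n)) / n
        mean_b = sum(rng.gauss(0.0, sigma) for _ in range(n)) / n
        z = (mean_a - mean_b) / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
        if p > 0.05:
            not_significant += 1
    return not_significant / trials


missed = share_not_significant()
print(f"studies with p > 0.05 despite a real difference: {missed:.2f}")
```

With these inputs, well over half of the simulated studies should come back “not statistically significant,” although group A and group B differ in every one of them. A p-value above 0.05 in a small study is a statement about the limits of the data, not a finding of no difference.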

Let’s hope that this gets Sander Greenland away from his biased sampling of expert witnesses, off the backs of defense expert witnesses, and on to some of the real culprits out there, in the new year.

1 See also “Sander Greenland on ‘The Need for Critical Appraisal of Expert Witnesses in Epidemiology and Statistics’” (Feb. 8, 2015).

2 See also “Pritchard v. Dow Agro – Gatekeeping Exemplified” (Aug. 25, 2014); “Omalu and Science — A Bad Weld” (Oct. 22, 2016); Brian v. Association of Independent Oil Distributors, No. 2011-3413, Westmoreland Cty. Ct. Common Pleas, Order of July 18, 2016 (excluding Dr. Omalu’s testimony on welding and solvents and Parkinson’s disease).

3 See, e.g., Sander Greenland, Stephen J. Senn, Kenneth J. Rothman, John B. Carlin, Charles Poole, Steven N. Goodman, and Douglas G. Altman, “Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations,” 31 Eur. J. Epidem. 337 (2016).

4 Ronald L. Wasserstein & Nicole A. Lazar, “American Statistical Association Statement on statistical significance and p values,” 70 Am. Statistician 129 (2016).