TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Has the American Statistical Association Gone Post-Modern?

March 24th, 2019

Last week, the American Statistical Association (ASA) released a special issue of its journal, The American Statistician, with 43 articles addressing the issue of “statistical significance.” If you are on the ASA’s mailing list, you received an email announcing that

the lead editorial calls for abandoning the use of ‘statistically significant’, and offers much (not just one thing) to replace it. Written by Ron Wasserstein, Allen Schirm, and Nicole Lazar, the co-editors of the special issue, ‘Moving to a World Beyond “p < 0.05”’ summarizes the content of the issue’s 43 articles.”

In 2016, the ASA issued its “consensus” statement on statistical significance, in which it articulated six principles for interpreting p-values, and for avoiding erroneous interpretations. Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” 70 The American Statistician 129 (2016) [ASA Statement]. In the final analysis, that ASA Statement really did not change very much, and could be read fairly only to state that statistical significance was not sufficient for causal inference.1 Aside from overzealous, over-claiming lawyers and their expert witnesses, few scientists or statisticians had ever maintained that statistical significance was sufficient to support causal inference. Still, many “health effect claims” involve alleged causation that is really a modification of a base rate of a disease or disorder that happens without the allegedly harmful exposure, and which does not invariably happen even with the exposure. It is hard to imagine drawing an inference of such causation without ruling out random error, as well as bias and confounding.

According to the lead editorial for the special issue:

The ASA Statement on P-Values and Statistical Significance stopped just short of recommending that declarations of ‘statistical significance’ be abandoned. We take that step here. We conclude, based on our review of the articles in this special issue and the broader literature, that it is time to stop using the term ‘statistically significant’ entirely. Nor should variants such as ‘significantly different’, ‘p < 0.05’, and ‘nonsignificant’ survive, whether expressed in words, by asterisks in a table, or in some other way.”2

The ASA (through Wasserstein and colleagues) appears to be condemning the dichotomization of p-values, which lie on a continuum between zero and one. Presumably, saying that a p-value is less than 5% is tantamount to dichotomizing, but reporting the actual p-value would cause no offense, as long as it was not labeled “significant.”

So although the ASA appears to have gone “whole hog,” the Wasserstein editorial does not appear to condemn assessing random error, or evaluating the extent of random error as part of assessing a study’s support for an association. Reporting p < 0.05 as opposed to p = a real number between zero and one is largely an artifact of statistical tables in the pre-computer era.
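The point about pre-computer tables can be made concrete. Once a test statistic is in hand, the exact p-value is a one-line computation, so nothing requires collapsing it to “p < 0.05.” What follows is a minimal Python sketch, assuming a two-sided test whose statistic is standard normal under the null hypothesis; the z values are hypothetical and chosen only for illustration.

```python
import math

def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic z.

    For Z ~ N(0, 1), P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical test statistics, for illustration only.
for z in (1.50, 1.96, 2.58):
    p = two_sided_p_from_z(z)
    # The dichotomized report keeps one bit of information ("p < 0.05" or not);
    # the continuous report keeps the actual value.
    label = "p < 0.05" if p < 0.05 else "p >= 0.05"
    print(f"z = {z:.2f}: p = {p:.4f} ({label})")
```

The printed values show how little the dichotomous label conveys: a statistic of 1.96 sits essentially on the 5% line, a fact the label “p < 0.05” hides.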

So what is the ASA affirmatively recommending? “Much, not just one thing?” Or too much of nothing, which we know makes a man feel ill at ease. Wasserstein’s editorial earnestly admits that there is no replacement for:

the outsized role that statistical significance has come to play. The statistical community has not yet converged on a simple paradigm for the use of statistical inference in scientific research—and in fact it may never do so.”3

The 42 other articles in the special issue certainly do not converge on any unified, coherent response to the perceived crisis. Indeed, a cursory review of the abstracts alone suggests deep disagreements over an appropriate approach to statistical inference. The ASA may claim to be agnostic in the face of the contradictory recommendations, but there is one thing we know for sure: over-reaching litigants and their expert witnesses will exploit the real or apparent chaos in the ASA’s approach. The lack of coherent, consistent guidance will launch a thousand litigation ships, with no epistemic compass.4


2 Ronald L. Wasserstein, Allen L. Schirm, and Nicole A. Lazar, “Editorial: Moving to a World Beyond ‘p < 0.05’,” 73 Am. Statistician S1, S2 (2019).

3 Id. at S2.

4 See, e.g., John P. A. Ioannidis, “Retiring statistical significance would give bias a free pass,” 567 Nature 461 (2019); Valen E. Johnson, “Raise the Bar Rather than Retire Significance,” 567 Nature 461 (2019).

Expert Witnesses Who Don’t Mean What They Say

March 24th, 2019

‘Then you should say what you mean’, the March Hare went on.
‘I do’, Alice hastily replied; ‘at least–at least I mean what I say–that’s the same thing, you know’.
‘Not the same thing a bit!’ said the Hatter. ‘You might just as well say that “I see what I eat” is the same thing as “I eat what I see!”’

Lewis Carroll, Alice’s Adventures in Wonderland, Chapter VII (1865)

Anick Bérard is an epidemiologist at the Université de Montréal. Most of her publications involve birth outcomes and maternal medication use, but Dr. Bérard’s advocacy also involves social media (Facebook, YouTube) and expert witnessing in litigation against the pharmaceutical industry.

When the FDA issued its alert about cardiac malformations in children born to women who took Paxil (paroxetine) in the first trimester of pregnancy, the agency characterized its assessment of the “early results of new studies for Paxil” as “suggesting that the drug increases the risk for birth defects, particularly heart defects, when women take it during the first three months of pregnancy.”1 The agency also disclaimed any conclusion of “class effect” among the other selective serotonin reuptake inhibitors (SSRIs), such as Zoloft (sertraline), Celexa (citalopram), and Prozac (fluoxetine). Indeed, the FDA asked the manufacturer of paroxetine to undertake additional research on the teratogenicity of paroxetine, as well as on the possibility of class effects. That research never showed an SSRI teratogenicity class effect.

A “suggestion” from the FDA of an adverse effect is sufficient to launch a thousand litigation complaints, which were duly filed against GlaxoSmithKline. The plaintiffs’ counsel recruited Dr. Bérard to serve as an expert witness in support of claims involving a wide array of birth defects in Paxil cases. In her hands, the agency’s “suggestion” of causation became a conclusion. The defense challenged Bérard’s opinions, but the federal court took the motion to exclude her causal opinions under advisement, without decision. Hayes v. SmithKline Beecham Corp., 2009 WL 4912178 (N.D. Okla. Dec. 14, 2009). One case in state court went to trial, with a verdict for plaintiffs.

Despite Dr. Bérard’s zealous advocacy for a causal association between Paxil and birth defects, she declined to assert any association between maternal use of the other, non-paroxetine SSRIs and birth defects. Here is an excerpt from her Rule 26 report in a paroxetine case:

Taken together, the available scientific evidence makes it clear that Paxil use during the first trimester of pregnancy is an independent risk factor that at least doubles the risk of cardiovascular malformations in newborns at all commonly used doses. This risk has been consistent and was further reinforced by repeated observational study findings as well as meta-analyses results. No such associations were found with other types of SSRI exposures during gestation.”2

In her sworn testimony, Dr. Bérard made clear that she really meant what she had written in her report, about exculpating the non-paroxetine SSRIs of any association with birth defects:

Q. Is it fair to say that you will not be offering an opinion that SSRIs as a class, or individual SSRIs other than Paxil increased the risk of cardiovascular malformations in newborns?

A. This is not what I was asked to do.

Q. But in fact you actually write in your report that you don’t believe there’s sufficient data to reach any conclusion about other SSRIs, true?

A. Correct.”3

In 2010, Dr. Bérard, along with two professional colleagues, published what they called a systematic review of antidepressant use in pregnancy and birth outcomes.4 In this review, Bérard specifically advised that paroxetine should be avoided by women of childbearing age, but she and her colleagues affirmatively encouraged use of other SSRIs, such as fluoxetine, sertraline, and citalopram:

Clinical Approach: A Brief Overview

For women planning a pregnancy or when a treatment initiation during pregnancy is deemed necessary, the decision should rely not only on drug safety data but also on other factors such as the patient’s condition, previous response to other antidepressants, comorbidities, expected adverse effects and potential interactions with other current pharmacological treatments. Since there is a more extensive clinical experience with SSRIs such as fluoxetine, sertraline, and citalopram, these agents should be used as first-line therapies. Whenever possible, one should refrain from prescribing paroxetine to women of childbearing potential or planning a pregnancy. However, antenatal screening such as fetal echocardiography should be considered in a woman exposed prior to finding out about her pregnancy.5

When Bérard wrote and published her systematic review, she was still actively involved as an expert witness for plaintiffs in lawsuits against the manufacturers of paroxetine. In her 2010 review, Dr. Bérard gave no acknowledgment of monies earned in her capacity as an expert witness, and her disclosure of potential conflicts of interest was limited to noting that she was “a consultant for a plaintiff in the litigation involving Paxil.”6 In fact, Bérard had submitted multiple reports, testified at deposition, and had been listed as a testifying expert witness in many cases involving Paxil or paroxetine.

Not long after the 2010 review article, Glaxo settled most of the pending paroxetine birth defect cases, and the plaintiffs’ bar pivoted to recast their expert witnesses’ opinions as causal teratogenic conclusions about the entire class of SSRIs. In 2012, the federal courts established a “multi-district litigation,” MDL 2342, for birth defect cases involving Zoloft (sertraline), in the Philadelphia courtroom of Judge Cynthia Rufe, in the Eastern District of Pennsylvania.

Notwithstanding her 2010 clinical advice that pregnant women with depression should use fluoxetine, sertraline, or citalopram, Dr. Bérard became actively involved in the new litigation against the other, non-Paxil SSRI manufacturers. By 2013, Dr. Bérard was on record as a party expert witness for plaintiffs, opining that sertraline causes virtually every major congenital malformation.7

In the same year, 2013, Dr. Bérard published another review article on teratogens, but now she gave a more equivocal view of the other SSRIs, claiming that they were “known teratogens,” but acknowledging in a footnote that teratogenicity of the SSRIs was “controversial.”8 Incredibly, this review article states that “Anick Bérard and Sonia Chaabane have no potential conflicts of interest to disclose.”9

Ultimately, Dr. Bérard could not straddle her own contradictory statements and remain upright, which encouraged the MDL court to examine her opinions closely for methodological shortcomings and failures. Although Bérard had evolved to claim a teratogenic “class effect” for all the SSRIs, the scientific support for her claim was somewhere between weak and absent.10 Perhaps even more distressing, many of the pending claims involving the other SSRIs arose from pregnancies and births that predated Bérard’s epiphany about class effect. Finding ample evidence of specious claiming, the federal court charged with oversight of the sertraline birth defect claims excluded Dr. Bérard’s causal opinions for failing to meet the requirements of Federal Rule of Evidence 702.11

Plaintiffs sought to substitute Nicholas Jewell for Dr. Bérard, but Dr. Jewell fared no better, and was excluded for other methodological shenanigans.12 Ultimately, a unanimous panel of the United States Court of Appeals for the Third Circuit upheld the expert witness exclusions.13


1 See “FDA Advising of Risk of Birth Defects with Paxil; Agency Requiring Updated Product Labeling,” P05-97 (Dec. 8, 2005) (emphasis added).

2 Bérard Report in Hayes v. SmithKline Beecham Corp., 2009 WL 3072955, at *4 (N.D. Okla. Feb. 4, 2009) (emphasis added).

3 Deposition Testimony of Anick Bérard, in Hayes v. SmithKline Beecham Corp., at 120:16-25 (N.D. Okla. April 2009).

4 Marieve Simoncelli, Brigitte-Zoe Martin & Anick Bérard, “Antidepressant Use During Pregnancy: A Critical Systematic Review of the Literature,” 5 Current Drug Safety 153 (2010).

5 Id. at 168b.

6 Id. at 169 (emphasis added).

7 See Anick Bérard, “Expert Report” (June 19, 2013).

8 Sonia Chaabane & Anick Bérard, “Epidemiology of Major Congenital Malformations with Specific Focus on Teratogens,” 8 Current Drug Safety 128, 136 (2013).

9 Id. at 137b.

10 See, e.g., Nicholas Myles, Hannah Newall, Harvey Ward, and Matthew Large, “Systematic meta-analysis of individual selective serotonin reuptake inhibitor medications and congenital malformations,” 47 Australian & New Zealand J. Psychiatry 1002 (2013).

11 See In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., MDL No. 2342; 26 F.Supp. 3d 449 (E.D.Pa. 2014) (Rufe, J.). Plaintiffs, through their Plaintiffs’ Steering Committee, moved for reconsideration, but Judge Rufe reaffirmed her exclusion of Dr. Bérard. In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., MDL No. 2342; 12-md-2342, 2015 WL 314149 (E.D. Pa. Jan. 23, 2015) (Rufe, J.) (denying PSC’s motion for reconsideration). See “Zoloft MDL Relieves Matrixx Depression” (Jan. 30, 2015).

12 See In re Zoloft Prods. Liab. Litig., No. 12–md–2342, 2015 WL 7776911 (E.D. Pa. Dec. 2, 2015) (excluding Jewell’s opinions as scientifically unwarranted and methodologically flawed); In re Zoloft Prod. Liab. Litig., MDL NO. 2342, 12-MD-2342, 2016 WL 1320799 (E.D. Pa. April 5, 2016) (granting summary judgment after excluding Dr. Jewell). See also “The Education of Judge Rufe – The Zoloft MDL” (April 9, 2016).

The Contrivance Standard for Gatekeeping

March 23rd, 2019

According to Google ngram, the phrase “junk science” made its debut circa 1975, lagging junk food by about five years. See “The Rise and Rise of Junk Science” (Mar. 8, 2014). I have never much liked the phrase “junk science” because it suggests that courts need only be wary of the absurd and ridiculous in their gatekeeping function. Some expert witness opinions are, in fact, serious scientific contributions, just not worthy of being advanced as scientific conclusions. Perhaps better than “junk” would be patho-epistemologic opinions, or maybe even wissenschmutz, but even these terms might obscure that the opinion that needs to be excluded may derive from serious scientific work that is simply not ready to be held forth as a scientific conclusion that can colorably be called knowledge.

Another formulation of my term, patho-epistemology, is the Eleventh Circuit’s lovely “Contrivance Standard.” Rink v. Cheminova, Inc., 400 F.3d 1286, 1293 & n.7 (11th Cir. 2005). In Rink, the appellate court held that the district court had acted within its discretion to exclude expert witness testimony because it had properly confined its focus to the challenged expert witness’s methodology, not his credibility:

“In evaluating the reliability of an expert’s method, however, a district court may properly consider whether the expert’s methodology has been contrived to reach a particular result. See Joiner, 522 U.S. at 146, 118 S.Ct. at 519 (affirming exclusion of testimony where the methodology was called into question because an “analytical gap” existed “between the data and the opinion proffered”); see also Elcock v. Kmart Corp., 233 F.3d 734, 748 (3d Cir. 2000) (questioning the methodology of an expert because his “novel synthesis” of two accepted methodologies allowed the expert to “offer a subjective judgment … in the guise of a reliable expert opinion”).”

Note the resistance, however, to the Supreme Court’s mandate of gatekeeping. District courts must apply the statutes, Federal Rules of Evidence 702 and 703. There is no legal authority for the suggestion that a district court “may properly consider whether the expert’s methodology has been contrived.” Rink, 400 F.3d at 1293 n.7 (emphasis added).

The Joiner Finale

March 23rd, 2019

“This is the end
Beautiful friend

This is the end
My only friend, the end”

Jim Morrison, “The End” (c. 1966)


The General Electric Co. v. Joiner case, 522 U.S. 136 (1997), was based only in part upon exposures to polychlorinated biphenyls (PCBs). The PCB part did not hold up well legally in the Supreme Court; nor was the PCB lung cancer claim vindicated by later scientific evidence. See “How Have Important Rule 702 Holdings Held Up With Time?” (Mar. 20, 2015).

The Supreme Court in Joiner reversed and remanded the case to the 11th Circuit, which then remanded the case back to the district court to address claims that Mr. Joiner had been exposed to furans and dioxins, and that these other chemicals had caused, or contributed to, his lung cancer, as well. Joiner v. General Electric Co., 134 F.3d 1457 (11th Cir. 1998) (per curiam). Thus the dioxins were left in the case even after the Supreme Court ruled.

After the Supreme Court’s decision, Anthony Roisman argued that the Court had addressed an artificial question when asked about PCBs alone because the case was really about an alleged mixture of exposures, and he held out hope that the Joiners would do better on remand. Anthony Z. Roisman, “The Implications of G.E. v. Joiner for Admissibility of Expert Testimony,” 1 Res Communes 65 (1999).

Many Daubert observers (including me) are unaware of the legal fate of the Joiners’ claims on remand. In the only reference I could find, the commentator simply noted that the case resolved before trial.[1] I am indebted to Michael Risinger, and Joseph Cecil, for pointing me to documents from PACER, which shed some light upon the Joiner “endgame.”

In February 1998, Judge Orinda Evans, who had been the original trial judge, and who had sustained defendants’ Rule 702 challenges and granted their motions for summary judgment, received and reopened the case upon remand from the 11th Circuit. In March, Judge Evans directed the parties to submit a new pre-trial order by April 17, 1998. At a status conference in April 1998, Judge Evans permitted the plaintiffs additional discovery, to be completed by June 17, 1998. Five days before the expiration of their additional discovery period, the plaintiffs moved for additional time; defendants opposed the request. In July, Judge Evans granted the requested extension, and gave defendants until November 1, 1998, to file for summary judgment.

Meanwhile, in June 1998, new counsel entered their appearances for plaintiffs – William Sims Stone, Kevin R. Dean, Thomas Craig Earnest, and Stanley L. Merritt. The docket does not reflect much of anything about the new discovery other than a request for a protective order for an unpublished study. But by October 6, 1998, the new counsel, Earnest, Dean, and Stone (but not Merritt) withdrew as attorneys for the Joiners, and by the end of October 1998, Judge Evans entered an order to dismiss the case, without prejudice.

A few months later, in February 1999, the parties filed a stipulation, approved by the Clerk, dismissing the action with prejudice, with each party to bear its own costs. Given the flight of plaintiffs’ counsel, and the dismissals first without and then with prejudice, a settlement seems never to have been involved in the resolution of the Joiner case. In the end, the Joiners’ case fizzled, perhaps to avoid being Frye’d.

And what has happened since to the science of dioxins and lung cancer?

Not much.

In 2006, the National Research Council published a monograph on dioxin, which took the controversial approach of focusing on all cancer mortality rather than specific cancers that had been suggested as likely outcomes of interest. See David L. Eaton (Chairperson), Health Risks from Dioxin and Related Compounds – Evaluation of the EPA Reassessment (2006). The validity of this approach, and the committee’s conclusions, were challenged vigorously in subsequent publications. Paolo Boffetta, Kenneth A. Mundt, Hans-Olov Adami, Philip Cole, and Jack S. Mandel, “TCDD and cancer: A critical review of epidemiologic studies,” 41 Critical Rev. Toxicol. 622 (2011) (“In conclusion, recent epidemiological evidence falls far short of conclusively demonstrating a causal link between TCDD exposure and cancer risk in humans.”).

In 2013, the Industrial Injuries Advisory Council (IIAC), an independent scientific advisory body in the United Kingdom, published a review of lung cancer and dioxin. The Council found the epidemiologic studies mixed, and declined to endorse the compensability of lung cancer for dioxin-exposed industrial workers. Industrial Injuries Advisory Council – Information Note on Lung cancer and Dioxin (December 2013). See also Mann v. CSX Transp., Inc., 2009 WL 3766056, 2009 U.S. Dist. LEXIS 106433 (N.D. Ohio 2009) (Polster, J.) (dioxin exposure case) (“Plaintiffs’ medical expert, Dr. James Kornberg, has opined that numerous organizations have classified dioxins as a known human carcinogen. However, it is not appropriate for one set of experts to bring the conclusions of another set of experts into the courtroom and then testify merely that they ‘agree’ with that conclusion.”), citing Thorndike v. DaimlerChrysler Corp., 266 F. Supp. 2d 172 (D. Me. 2003) (court excluded expert who was “parroting” other experts’ conclusions).

Last year, a study of an industrial cohort followed for two decades found no increased risk of lung cancer among workers exposed to dioxin. David I. McBride, James J. Collins, Thomas John Bender, Kenneth M. Bodner, and Lesa L. Aylward, “Cohort study of workers at a New Zealand agrochemical plant to assess the effect of dioxin exposure on mortality,” 8 Brit. Med. J. Open e019243 (2018) (reporting SMR for lung cancer 0.95, 95% CI: 0.56 to 1.53).


[1] Morris S. Zedeck, Expert Witness in the Legal System: A Scientist’s Search for Justice 49 (2010) (noting that, after remand from the Supreme Court, Joiner v. General Electric resolved before trial)

 

Lipitor Diabetes MDL’s Inexact Analysis of Fisher’s Exact Test

March 23rd, 2019

Muriel Bristol was a biologist who studied algae at the Rothamsted Experimental Station in England, after World War I. In addition to her knowledge of plant biology, Bristol claimed the ability to tell whether tea had been added to milk, or the tea poured first and then milk had been added. Bristol, as a scientist and a proper English woman, preferred the former.

Ronald Fisher, who also worked at Rothamsted, expressed his skepticism over Dr. Bristol’s claim. Fisher set about to design a randomized experiment that would efficiently and effectively test her claim. Bristol was presented with eight cups of tea, four of which were prepared with milk added to tea, and four prepared with tea added to milk. Bristol, of course, was blinded to which was which, but was required to label each according to its manner of preparation. Fisher saw his randomized experiment as a 2 x 2 contingency table, from which he could calculate the probability of the observed outcome (and of any more extreme outcomes), using the assumption of fixed marginal rates and the hypergeometric probability distribution. Fisher’s Exact Test was born at tea time.[1]
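Fisher’s calculation for the tea-tasting design can be sketched in a few lines of Python. This is a minimal illustration, assuming, as the fixed-margin analysis requires, that Bristol knew four cups of each kind were present and so labeled exactly four as milk-first. With the margins fixed, the number of milk-first cups she labels correctly determines the whole 2 x 2 table and follows the hypergeometric distribution. The function names are our own, not a library’s.

```python
from math import comb

def hypergeom_pmf(k: int, total: int, marked: int, drawn: int) -> float:
    """P(K = k): k marked items among `drawn` items taken without replacement
    from `total` items, `marked` of which are of the kind of interest."""
    return comb(marked, k) * comb(total - marked, drawn - k) / comb(total, drawn)

# Assumed design: 8 cups, 4 with milk poured first; the taster labels 4 as milk-first.
TOTAL, MILK_FIRST, LABELED = 8, 4, 4

# Distribution of the number of milk-first cups labeled correctly (0 through 4).
pmf = {k: hypergeom_pmf(k, TOTAL, MILK_FIRST, LABELED) for k in range(LABELED + 1)}

def one_sided_p(observed: int) -> float:
    """Probability of the observed count or any more extreme (larger) count."""
    return sum(p for k, p in pmf.items() if k >= observed)

print(f"all 8 cups right: p = {one_sided_p(4):.4f}")  # 1 of 70 equally likely labelings
print(f"6 of 8 cups right: p = {one_sided_p(3):.4f}")
```

The design does real work here: only 1 of the 70 possible labelings gets every cup right, so a perfect score is persuasive, while a single swapped pair already leaves a one-sided probability of 17/70, about 0.24.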

Fisher described the origins of his Exact Test in one of his early texts, but he neglected to report whether his experiment vindicated Bristol’s claim. According to David Salsburg, H. Fairfield Smith, one of Fisher’s colleagues, acknowledged that Bristol nailed Fisher’s test, with all eight cups correctly identified. The test has gone on to become an important tool in the statistician’s armamentarium.

Fisher’s Exact, like any statistical test, has model assumptions and preconditions. For one thing, the test is designed for categorical data, with binary outcomes. The test allows us to evaluate whether a difference between two proportions is likely to have arisen by chance alone, by calculating the probability of the observed outcome, as well as of more extreme outcomes.

The calculation of an exact attained significance probability, using Fisher’s approach, provides a one-sided p-value, with no unique solution to calculating a two-sided attained significance probability. Fisher’s Exact Test has played an important role in showing the judiciary that small sample size need not be an insuperable barrier to meaningful statistical analysis. In discrimination cases, where the one-sided p-value may well be more appropriate for the issue at hand, the test’s one-sided output is not a particular problem.[2]

The difficulty of using Fisher’s Exact for small sample sizes is that the hypergeometric distribution, upon which the test is based, can be highly asymmetric. The observed one-sided p-value does not measure the probability of a result equally extreme in the opposite direction. There are at least three ways to calculate the two-sided p-value:

  • Double the one-sided p-value.
  • Add the point probabilities from the opposite tail that are more extreme than the observed point probability.
  • Use the mid-P value; that is, add all values more extreme (smaller) than the observed point probability from both sides of the distribution, PLUS ½ of the observed point probability.

Some software programs will proceed in one of these ways by default, but their doing so does not guarantee the most accurate measure of two-tailed significance probability.
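The three approaches can be compared directly. What follows is a minimal Python sketch, assuming Fisher’s fixed-margin hypergeometric model; the 2 x 2 table is hypothetical, chosen only because its distribution is lopsided. Two details are this sketch’s own choices rather than settled conventions: the “opposite tail” is taken to be the outcomes on the other side of the expected cell count, and probabilities are compared with a small relative tolerance so that floating-point rounding does not decide ties.

```python
from math import comb

def table_pmf(a: int, b: int, c: int, d: int) -> dict[int, float]:
    """Hypergeometric distribution of the top-left cell of the 2 x 2 table
    [[a, b], [c, d]] when all four margins are held fixed (Fisher's model)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    denom = comb(n, row1)
    return {x: comb(col1, x) * comb(n - col1, row1 - x) / denom
            for x in range(lo, hi + 1)}


def two_sided_variants(a: int, b: int, c: int, d: int) -> dict[str, float]:
    """One-sided Fisher p-value and three ways of making it two-sided."""
    pmf = table_pmf(a, b, c, d)
    p_obs = pmf[a]
    expected = (a + b) * (a + c) / (a + b + c + d)
    upper = a >= expected  # is the observed count in the upper or lower tail?
    tol = 1e-7  # relative tolerance for comparing floating-point probabilities

    one_sided = sum(p for x, p in pmf.items() if (x >= a if upper else x <= a))
    opposite_tail = [p for x, p in pmf.items() if (x < expected if upper else x > expected)]

    return {
        "one-sided": one_sided,
        # 1. Double the one-sided value (nothing stops this from exceeding 1).
        "doubled": 2 * one_sided,
        # 2. Add opposite-tail outcomes no more probable than the observed one.
        "opposite tail added": one_sided + sum(p for p in opposite_tail if p <= p_obs * (1 + tol)),
        # 3. Mid-p: all outcomes less probable than the observed one, from both
        #    tails, plus half of the observed point probability.
        "mid-p": sum(p for p in pmf.values() if p < p_obs * (1 - tol)) + 0.5 * p_obs,
    }


# Hypothetical table: 3 of 4 exposed subjects affected vs. 2 of 16 unexposed.
for name, p in two_sided_variants(3, 1, 2, 14).items():
    print(f"{name:>20}: {p:.4f}")
```

When the distribution is symmetric (for example, when the two groups are of equal size), the first two methods agree; with a lopsided distribution, as in this table, the three numbers can differ materially.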

In the Lipitor MDL for diabetes litigation, Judge Gergel generally used sharp analyses to cut through the rancid fat of litigation claims, to get to the heart of the matter. By and large, he appears to have done a splendid job. In the course of gatekeeping under Federal Rule of Evidence 702, however, Judge Gergel may have misunderstood the nature of Fisher’s Exact Test.

Nicholas Jewell is a well-credentialed statistician at the University of California. In the courtroom, Jewell is a well-known expert witness for the litigation industry. He is no novice at generating unreliable opinion testimony. See In re Zoloft Prods. Liab. Litig., No. 12–md–2342, 2015 WL 7776911 (E.D. Pa. Dec. 2, 2015) (excluding Jewell’s opinions as scientifically unwarranted and methodologically flawed); In re Zoloft Prod. Liab. Litig., MDL NO. 2342, 12-MD-2342, 2016 WL 1320799 (E.D. Pa. April 5, 2016) (granting summary judgment after excluding Dr. Jewell). See “The Education of Judge Rufe – The Zoloft MDL” (April 9, 2016).

In the Lipitor cases, some of Jewell’s opinions seemed outlandish indeed, and Judge Gergel generally excluded them. See In re Lipitor Marketing, Sales Practices and Prods. Liab. Litig., 145 F.Supp. 3d 573 (D.S.C. 2015), reconsideration den’d, 2016 WL 827067 (D.S.C. Feb. 29, 2016). As Judge Gergel explained, Jewell calculated a relative risk for abnormal blood glucose in a Lipitor group to be 3.0 (95% C.I., 0.9 to 9.6), using STATA software. Also using STATA, Jewell obtained an attained significance probability of 0.0654, based upon Fisher’s Exact Test. Lipitor Jewell at *7.

Judge Gergel did not report whether Jewell’s reported p-value of 0.0654 was one- or two-sided, but he did state that the attained probability “indicates a lack of statistical significance.” Id. & n. 15. The rest of His Honor’s discussion of the challenged opinion, however, makes clear that the value of 0.0654 must have been a two-sided value. If it had been a one-sided p-value, then there would have been no way of invoking the mid-p to generate a two-sided p-value below 5%. The mid-p will always be larger than the one-tailed exact p-value generated by Fisher’s Exact Test.

The court noted that Dr. Jewell had testified that he believed that STATA generated this confidence interval by “flip[ping]” the Taylor series approximation. The STATA website notes that it calculates confidence intervals for odds ratios (which are different from the relative risk that Jewell testified he computed) by inverting the Fisher exact test.[3] Id. at *7 & n. 17. Of course, Jewell’s own description, invoking a Taylor series approximation, suggests that his confidence interval was not based upon exact methods.
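For readers wondering what a “Taylor series” interval looks like: the usual large-sample interval for a relative risk is computed on the log scale, with a standard error from a first-order Taylor (delta-method) approximation, and then exponentiated. A minimal Python sketch follows; the counts are hypothetical, not the Lipitor data, and nothing here establishes which method STATA actually applied in that case.

```python
import math

def relative_risk_ci(cases_exp: int, n_exp: int, cases_unexp: int, n_unexp: int,
                     z: float = 1.959964) -> tuple[float, float, float]:
    """Relative risk with an approximate 95% CI built from the log-scale
    (Taylor series / delta method) standard error."""
    rr = (cases_exp / n_exp) / (cases_unexp / n_unexp)
    # First-order approximation: Var(log RR) ~ 1/a - 1/n1 + 1/c - 1/n0
    se_log_rr = math.sqrt(1 / cases_exp - 1 / n_exp + 1 / cases_unexp - 1 / n_unexp)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical counts: 9 of 100 exposed vs. 3 of 100 unexposed.
rr, lower, upper = relative_risk_ci(9, 100, 3, 100)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Because the interval rests on a large-sample approximation, it need not line up with an exact p-value computed from the same small table.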

STATA does not provide a mid p-value calculation, and so Jewell used an online calculator to obtain a mid p-value of 0.04, which he declared statistically significant. The court took Jewell to task for using the mid p-value as though it were a different analysis or test. Id. at *8. Because the mid-p value will always be larger than the one-sided exact p-value from Fisher’s Exact Test, the court’s explanation does not really make sense:

“Instead, Dr. Jewell turned to the mid-p test, which would ‘[a]lmost surely’ produce a lower p-value than the Fisher exact test.”

Id. at *8. The mid-p test, however, is not a test different from Fisher’s Exact Test; rather, it is simply a way of dealing with the asymmetrical distribution that underlies Fisher’s Exact Test, to arrive at a two-tailed p-value that more accurately captures the rate of Type I error.

The MDL court acknowledged that the mid-p approach was not inherently unreliable, but questioned Jewell’s inconsistent, selective use of the approach for only one test.[4] Jewell certainly did not help the plaintiffs’ cause, or his own standing, by discarding the analyses that were not incorporated into his report, thus leaving the MDL court to guess at how much selection went on in his process of generating his opinions. Id. at *9 & n. 19.

None of Jewell’s other calculated p-values involved the mid-p approach, but the court’s criticism raises the question whether the other p-values came from a Fisher’s Exact Test with small sample size, or another highly asymmetrical distribution. Id. at *8. Although Jewell had shown himself willing to engage in other dubious, result-oriented analyses, Jewell’s use of the mid-p for this one comparison may have been within acceptable bounds after all.

The court also noted that Jewell had obtained the “exact p-value and that this p-value was not significant.” Id. The court’s notation here, however, does not report the important detail whether that exact, unreported p-value was merely double the one-sided p-value given by the Fisher’s Exact Test. As the STATA website, cited by the MDL court, explains:

“The test naturally gives a one-sided p-value, and there are at least four different ways to convert it to a two-sided p-value (Agresti 2002, 93). One way, not implemented in Stata, is to double the one-sided p-value; doubling is simple but can result in p-values larger than one.”

Wesley Eddings, “Fisher’s exact test two-sided idiosyncrasy” (Jan. 2009) (citing Alan Agresti, Categorical Data Analysis 93 (2d ed. 2002)).

On plaintiffs’ motion for reconsideration, the MDL court reaffirmed its findings with respect to Jewell’s use of the mid-p. Lipitor Jewell Reconsidered at *3. In doing so, the court insisted that the one instance in which Jewell used the mid-p stood in stark contrast to all the other instances in which he had used Fisher’s Exact Test. The court then cited to the record to identify 21 other instances in which Jewell used a p-value rather than a mid-p value. The court, however, did not provide the crucial detail whether these 21 other instances actually involved small-sample applications of Fisher’s Exact Test. As result-oriented as Jewell can be, it seems safe to assume that not all his statistical analyses involved Fisher’s Exact Test, with its attendant ambiguity for how to calculate a two-tailed p-value.


[1] Sir Ronald A. Fisher, The Design of Experiments at chapter 2 (1935); see also Stephen Senn, “Tea for three: Of infusions and inferences and milk in first,” Significance 30 (Dec. 2012); David Salsburg, The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century (2002).

[2] See, e.g., Dendy v. Washington Hosp. Ctr., 431 F. Supp. 873 (D.D.C. 1977) (denying preliminary injunction), rev’d, 581 F.2d 99 (D.C. Cir. 1978) (reversing denial of relief, and remanding for reconsideration). See also National Academies of Science, Reference Manual on Scientific Evidence 255 n.108 (3d ed. 2011) (“Well-known small sample techniques [for testing significance and calculating p-values] include the sign test and Fisher’s exact test.”).

[3] See Wesley Eddings, “Fisher’s exact test two-sided idiosyncrasy” (Jan. 2009), available at <http://www.stata.com/support/faqs/statistics/fishers-exact-test/>, last visited April 19, 2016 (“Stata’s exact confidence interval for the odds ratio inverts Fisher’s exact test.”). This article by Eddings contains a nice discussion of why the Fisher’s Exact Test attained significance probability disagrees with the calculated confidence interval. Eddings points out the asymmetry of the hypergeometric distribution, which complicates arriving at an exact p-value for a two-sided test.

[4] See Barber v. United Airlines, Inc., 17 Fed. Appx. 433, 437 (7th Cir. 2001) (“Because in formulating his opinion Dr. Hynes cherry-picked the facts he considered to render an expert opinion, the district court correctly barred his testimony because such a selective use of facts fails to satisfy the scientific method and Daubert.”).

Apportionment and Pennsylvania’s Fair Share Act

March 14th, 2019

In 2011, Pennsylvania enacted the Fair Share Act, which was remedial legislation designed to mitigate the unfairness of joint and several liability in mass, and other, tort litigation by abrogating joint and several liability in favor of apportionment of shares among multiple defendants, including settled defendants.1

Although the statute stated the general rule in terms of negligence,2 the Act was clearly intended to apply to actions for so-called strict liability:3

“(1) Where recovery is allowed against more than one person, including actions for strict liability, and where liability is attributed to more than one defendant, each defendant shall be liable for that proportion of the total dollar amount awarded as damages in the ratio of the amount of that defendant’s liability to the amount of liability attributed to all defendants and other persons to whom liability is apportioned under subsection.”

The intended result of the legislation was for courts to enter separate and several judgments against defendants held liable in the amount apportioned to each defendant’s liability.4 The Act created exceptions for intentional torts and for cases in which a defendant receives a 60% or greater share in the apportionment.5
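The arithmetic of the proportional rule quoted above is simple. A minimal Python sketch, with hypothetical parties, percentages, and a hypothetical $1,000,000 verdict, contrasts it with equal per capita shares. It is an illustration, not a reading of the Act: it ignores the intentional-tort exception and merely flags, without modeling the legal consequence of, a share at or above the 60% threshold mentioned in the text.

```python
def apportion(total_damages, liability, threshold=0.60):
    """Several shares under a proportional rule like the one quoted above.

    `liability` maps every party to whom liability is apportioned (including
    settled parties) to its assigned percentage. `threshold` marks the 60%
    exception mentioned in the text; its legal effect is not modeled here.
    """
    total = sum(liability.values())
    return {
        party: {
            "amount": total_damages * share / total,
            "meets_60pct_exception": share / total >= threshold,
        }
        for party, share in liability.items()
    }


def per_capita(total_damages, parties):
    """Equal shares, for comparison with the proportional rule."""
    return {party: total_damages / len(parties) for party in parties}


shares = {"Defendant A": 10, "Defendant B": 25, "Settled Co.": 65}
print(apportion(1_000_000, shares))
print(per_capita(1_000_000, list(shares)))
```

With these hypothetical numbers, Defendant A owes $100,000 under the proportional rule but about $333,333 under an equal three-way split. Keeping the settled party’s 65% share in the denominator is what keeps the minor defendant’s exposure tied to its own share.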

In Pennsylvania, as in other states, judges sometimes fall prey to the superstition that the law, procedural and substantive, does not apply to asbestos cases. Roverano v. John Crane, Inc., is an asbestos case in which the plaintiff claimed his lung cancer was caused by exposure to multiple defendants’ products. The trial judge, falling under the sway of asbestos exceptionalism, refused to apply the Fair Share Act, suggesting that “the jury was not presented with evidence that would permit an apportionment to be made by it.”

The Roverano trial judge’s suggestion is remarkable, given that any plaintiff is exposed to different asbestos products in distinguishable amounts, and for distinguishable durations. Furthermore, asbestos products have distinguishable, relative levels of friability, with different levels of respirable fiber exposure for the plaintiff. In some cases, the products contain different kinds of asbestos minerals, which have distinguishable and relative levels of potency to cause the plaintiff’s specific disease. Asbestos cases, whether involving asbestosis, lung cancer, or mesothelioma claims, are more amenable to apportionment of shares among co-defendants than are “red car / blue car” cases.
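The point that exposures differ along several measurable dimensions, and not duration alone, can be illustrated with a deliberately simple weighting. The Python sketch below assumes, purely for illustration, that each product’s relative contribution is proportional to years × intensity × potency. The products, numbers, and the formula itself are hypothetical; no court or expert witness in the cases discussed here is said to have used it.

```python
from dataclasses import dataclass


@dataclass
class Exposure:
    product: str
    years: float      # duration of exposure
    intensity: float  # relative respirable-fiber level (hypothetical units)
    potency: float    # relative potency of the fiber type (hypothetical weight)


def exposure_shares(exposures):
    """Share of total weighted dose attributable to each product.

    Hypothetical weighting: dose = years * intensity * potency.
    """
    doses = {e.product: e.years * e.intensity * e.potency for e in exposures}
    total = sum(doses.values())
    return {product: dose / total for product, dose in doses.items()}


history = [
    Exposure("Product X", years=10, intensity=1.0, potency=1.0),
    Exposure("Product Y", years=2, intensity=5.0, potency=1.0),  # highly friable
    Exposure("Product Z", years=5, intensity=0.4, potency=0.5),  # less potent fiber
]
print(exposure_shares(history))
```

Product X accounts for five times the years of exposure of Product Y, yet the two receive equal shares (about 48% each) once intensity is counted, and Product Z receives about 5%. Duration alone is an incomplete proxy, but that is an argument for a fuller apportionment, not against apportionment.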

Pennsylvania’s intermediate appellate court reversed the trial court’s asbestos exceptionalism, and held that upon remand, the court must:

“[a]pply a non-per capita allocation to negligent joint tortfeasors and strict liability joint tortfeasors; and permit evidence of settlements reached between plaintiffs and bankrupt entities to be included in the calculation of allocation of liability.”

Roverano v. John Crane, 2017 Pa. Super. 415, 177 A.3d 892 (2017).

The Superior Court’s decision did not sit well with the litigation industry, which likes joint and several liability, with equal shares. Joint and several liability permits plaintiffs’ counsel to extort large settlements from minor defendants who face the prospect of out-sized pro rata shares after trial, without the benefit of reductions for the shares of settled bankrupt defendants. The Roverano plaintiff appealed from the Superior Court’s straightforward application of a remedial statute.

What should be a per curiam affirmance of the Superior Court, however, could result in another act of asbestos exceptionalism by the Pennsylvania Supreme Court. Media reports of the oral argument in Roverano suggest that several of the justices invoked the specter of “junk science” in apportioning shares among asbestos co-defendants.6 Disrespectfully, Justice Max Baer commented:

“Respectfully, your theory is interjecting junk science. We’ve never held that duration of contact corresponds with culpability.”7

The Pennsylvania Justices’ muddle can be easily avoided. First, the legislature clearly expressed its intention that apportionment be permitted in strict liability cases.

Second, failure-to-warn strict liability cases are, as virtually all scholars and most courts recognize, essentially negligence cases, in any event.8

Third, apportionment is a well-recognized procedure in the law of Torts, including the Pennsylvania law of torts. Apportionment of damages among various causes was recognized in the Restatement of Torts (Second) Section 433A (Apportionment of Harm to Causes), which specifies that:

(1) Damages for harm are to be apportioned among two or more causes where

(a) there are distinct harms, or

(b) there is a reasonable basis for determining the contribution of each cause to a single harm.

Restatement (Second) of Torts § 433A(1) (1965) [hereinafter cited as Section 433A].

The comments to Section 433A suggest a liberal application for apportionment. The rules set out in Section 433A apply “whenever two or more causes have combined to bring about harm to the plaintiff, and each has been a substantial factor in producing the harm … .”

Id., comment a. The independent causes may be tortious or innocent, “and it is immaterial whether all or any of such persons are joined as defendants in the particular action.” Id. Indeed, apportionment also applies when the defendant’s conduct combines “with the operation of a force of nature, or with a pre-existing condition which the defendant has not caused, to bring about the harm to the plaintiff.” Just as the law of grits applies in everyone’s kitchen, the law of apportionment applies in Pennsylvania courts.

Apportionment of damages is an accepted legal principle in Pennsylvania law. Martin v. Owens-Corning Fiberglas Corp., 515 Pa. 377, 528 A.2d 947 (1987). Courts, applying Pennsylvania law, have permitted juries to apportion damages between asbestos and cigarette smoking as causal factors in plaintiffs’ lung cancers, based upon a reasonable basis for determining the contribution of each source of harm to a single harm.9

In Parker v. Bell Asbestos Mines, one of these cases, none of the experts assigned exact mathematical percentages to the probability that asbestos rather than smoking caused the lung cancer. The Third Circuit noted that on the record before it:

“we cannot say that no reasonable basis existed for determining the contribution of cigarette smoking to the cancer suffered by the decedent.”10

The Pennsylvania Supreme Court has itself affirmed the proposition that “liability attaches to a negligent act only to the degree that the negligent act caused the employee’s injury.”11 Thus, even in straight-up negligence cases, causal apportionment must play a role, even when the relative causal contributions are much harder to determine than in the quasi-quantitative setting of an asbestos exposure claim.

Let’s hope that Justice Baer and his colleagues read the statute and the case law before delivering judgment. The first word in the name of the legislation is Fair.


1 42 Pa.C.S.A. § 7102.

2 42 Pa.C.S.A. § 7102(a).

3 42 Pa.C.S.A. § 7102(a)(1) (emphasis added).

4 42 Pa.C.S.A. § 7102(a)(2).

5 42 Pa.C.S.A. § 7102(a)(3)(ii), (iii).

7 Id. (quoting Baer, J.).

8 See, e.g., Restatement (Third) of Torts: Products Liability § 2, and comment i (1998); Fane v. Zimmer, Inc., 927 F.2d 124, 130 (2d Cir. 1991) (“Failure to warn claims purporting to sound in strict liability and those sounding in negligence are essentially the same.”).

9 Parker v. Bell Asbestos Mines, No. 86-1197, unpublished slip op. at 5 (3d Cir., Dec. 30, 1987) (per curiam) (citing Section 433A as Pennsylvania law, and Martin v. Owens-Corning Fiberglas Corp., 515 Pa. 377, 528 A.2d 947, 949 (1987)).

10 Id. at 7.

11 Dale v. Baltimore & Ohio RR., 520 Pa. 96, 106, 552 A.2d 1037, 1041 (1989). See also McAllister v. Pennsylvania RR., 324 Pa. 65, 69-70, 187 A. 415, 418 (1936) (holding that plaintiff’s impairment, and pain and suffering, can be apportioned between two tortious causes; plaintiff need not separate damages with exactitude); Shamey v. State Farm Mutual Auto. Ins. Co., 229 Pa. Super. 215, 223, 331 A.2d 498, 502 (1974) (citing, and relying upon, Section 433A; difficulties in proof do not constitute sufficient reason to hold a defendant liable for the damage inflicted by another person). Pennsylvania law is in accord with the law of other states as well, on apportionment. See Waterson v. General Motors Corp., 111 N.J. 238, 544 A.2d 357 (1988) (holding that a strict liability claim against General Motors for an unreasonably dangerous product defect was subject to apportionment for contribution from failing to wear a seat belt) (the jury’s right to apportion furthered the public policy of properly allocating the costs of accidents and injuries).

ASA Statement Goes to Court – Part 2

March 7th, 2019

It has been almost three years since the American Statistical Association (ASA) issued its statement on statistical significance. Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” 70 The American Statistician 129 (2016) [ASA Statement]. Before the ASA’s Statement, courts and lawyers from all sides routinely misunderstood, misstated, and misrepresented the meaning of statistical significance.1 These errors were pandemic despite the efforts of the Federal Judicial Center and the National Academies of Science to educate judges and lawyers, through their Reference Manuals on Scientific Evidence and seminars. The interesting question is whether the ASA’s Statement has improved, or will improve, the unfortunate situation.2

The ASA Statement on Testosterone

“Ye blind guides, who strain out a gnat and swallow a camel!”
Matthew 23:24

To capture the state of the art, or the state of correct and flawed interpretations of the ASA Statement, reviewing a recent, now-resolved, large so-called mass tort may be illustrative. Pharmaceutical products liability cases almost always turn on evidence from pharmaco-epidemiologic studies that compare the rate of an outcome of interest among patients taking a particular medication with the rate among similar, untreated patients. These studies compare the observed with the expected rates, express the difference as either a “risk ratio” or a “risk difference,” and assess both the magnitude of the difference and the “significance probability” of observing a rate at least as large as that seen in the exposed group, on the assumptions that the medication did not change the rate and that the data followed a given probability distribution. In these alleged “health effects” cases, claims and counterclaims of misuse of significance probability have been pervasive. After the ASA Statement was released, some lawyers began to modify their arguments to suggest that their adversaries’ arguments offend the ASA’s pronouncements.
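The comparison described above can be sketched in a few lines. The Python below uses hypothetical counts and the common large-sample (Wald) approximation on the log scale, which is only one of several methods; real pharmaco-epidemiologic studies usually adjust for confounders and use person-time, which this sketch ignores. The paragraph describes a one-sided significance probability; the sketch reports the two-sided value that studies more commonly publish.

```python
from math import erf, exp, log, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def compare_rates(cases_exp, n_exp, cases_unexp, n_unexp, z95=1.96):
    """Risk ratio, risk difference, 95% CI for the ratio, and a two-sided p-value."""
    r1, r0 = cases_exp / n_exp, cases_unexp / n_unexp
    rr, rd = r1 / r0, r1 - r0
    # Large-sample standard error of log(RR) for cumulative-incidence data
    se = sqrt(1 / cases_exp - 1 / n_exp + 1 / cases_unexp - 1 / n_unexp)
    z = log(rr) / se
    # Two-sided p-value: probability, if the true RR were 1.0, of a z at
    # least this far from zero in either direction
    p = 2 * (1 - normal_cdf(abs(z)))
    ci = (exp(log(rr) - z95 * se), exp(log(rr) + z95 * se))
    return {"rr": rr, "rd": rd, "ci": ci, "p": p}


print(compare_rates(30, 1000, 20, 1000))
```

With these made-up counts (30 events among 1,000 exposed patients, 20 among 1,000 unexposed), the risk ratio is 1.5 and the risk difference one percentage point, but the interval runs from about 0.86 to 2.62 and the two-sided p-value is about 0.155. Magnitude and random error are separate questions, and the sketch says nothing about bias or confounding.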

One litigation that showcases the use and misuse of the ASA Statement arose from claims that AbbVie, Inc.’s transdermal testosterone medication (TRT) causes heart attacks, strokes, and venous thromboembolism. The FDA had reviewed the plaintiffs’ claims, made in a Public Citizen complaint, and resoundingly rejected the causal interpretation of two dubious observational studies, and an incomplete meta-analysis that used an off-beat composite end point.3 The Public Citizen petition probably did succeed in pushing the FDA to convene an Advisory Committee meeting, which again resulted in a rejection of the causal claims. The FDA did, however, modify the class labeling for TRT with respect to indication and a possible association with cardiovascular outcomes. And then the litigation came.

Notwithstanding the FDA’s determination that a causal association had not been shown, thousands of plaintiffs sued several companies, with most of the complaints falling on AbbVie, Inc., which had the largest presence in the market. The ASA Statement came up occasionally in pre-trial depositions, but became a major brouhaha when AbbVie moved to exclude plaintiffs’ causation expert witnesses.4

The Defense’s Anticipatory Parry of the ASA Statement

As AbbVie described the situation:

“Plaintiffs’ experts uniformly seek to abrogate the established methods and standards for determining … causal factors in favor of precisely the kind of subjective judgments that Daubert was designed to avoid. Tests for statistical significance are characterized as ‘misleading’ and rejected [by plaintiffs’ expert witnesses] in favor of non-statistical ‘estimates’, ‘clinical judgment’, and ‘gestalt’ views of the evidence.”5

AbbVie’s brief in support of excluding plaintiffs’ expert witnesses barely mentioned the ASA Statement, but in a footnote, the defense anticipated that the Plaintiffs’ opposition would reject the importance of statistical significance testing and would claim that this rejection was somehow supported by the ASA Statement:

“The statistical community is currently debating whether scientists who lack expertise in statistics misunderstand p-values and overvalue significance testing. [citing ASA Statement] The fact that there is a debate among professional statisticians on this narrow issue does not validate Dr. Gerstman’s [plaintiffs’ expert witness’s] rejection of the importance of statistical significance testing, or undermine Defendants’ reliance on accepted methods for determining association and causation.”6

In its brief in support of excluding causation opinions, the defense took pains to define statistical significance, and managed to do so, painfully, or at least in ways that the ASA conferees would have found objectionable:

“Any association found must be tested for its statistical significance. Statistical significance testing measures the likelihood that the observed association could be due to chance variation among samples. Scientists evaluate whether an observed effect is due to chance using p-values and confidence intervals. The prevailing scientific convention requires that there be 95% probability that the observed association is not due to chance (expressed as a p-value < 0.05) before reporting a result as “statistically significant.” * * * This process guards against reporting false positive results by setting a ceiling for the probability that the observed positive association could be due to chance alone, assuming that no association was actually present.”7

AbbVie’s brief proceeded to characterize the confidence interval as a tool of significance testing, again in a way that misstates the mathematical meaning and importance of the interval:

“The determination of statistical significance can be described equivalently in terms of the confidence interval calculated in connection with the association. A confidence interval indicates the level of uncertainty that exists around the measured value of the association (i.e., the OR or RR). A confidence interval defines the range of possible values for the actual OR or RR that are compatible with the sample data, at a specified confidence level, typically 95% under the prevailing scientific convention. Reference Manual, at 580 (Ex. 14) (“If a 95% confidence interval is specified, the range encompasses the results we would expect 95% of the time if samples for new studies were repeatedly drawn from the same population.”). * * * If the confidence interval crosses 1.0, this means there may be no difference between the treatment group and the control group, therefore the result is not considered statistically significant.”8

Perhaps AbbVie’s counsel should be permitted a plea in mitigation for having cited to, and quoted from, the Reference Manual on Scientific Evidence’s chapter on epidemiology, which was also wide of the mark in its description of the confidence interval. Counsel would have been better served by the Manual’s more rigorous and accurate chapter on statistics. Even so, the above-quoted statements misinterpret random error as a probability about the hypothesis being tested.9 Particularly dangerous, and self-defeating in terms of AbbVie’s own objectives, was the characterization of the confidence interval as measuring the level of uncertainty, as though random error were the only source of uncertainty in the measurement of the risk ratio.
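The point that a confidence interval speaks only to random error can be shown by simulation. In this Python sketch (standard library only; the event rate, sample size, trial count, and bias are all hypothetical), the 95% figure describes how often intervals computed by the same procedure, across repeated samples, capture the true rate. Once a systematic error is added, the same procedure misses the truth far more often, although each interval looks just as precise.

```python
import random
from math import sqrt


def interval_coverage(true_rate=0.20, bias=0.0, n=500, trials=2000, seed=1):
    """Share of simulated studies whose 95% CI contains the true rate.

    `bias` is a hypothetical systematic error added to the event probability
    each simulated study actually measures (e.g. from misclassification).
    """
    rng = random.Random(seed)
    measured_rate = true_rate + bias
    hits = 0
    for _ in range(trials):
        events = sum(rng.random() < measured_rate for _ in range(n))
        p_hat = events / n
        se = sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - 1.96 * se <= true_rate <= p_hat + 1.96 * se:
            hits += 1
    return hits / trials


print(interval_coverage())           # close to 0.95: only random error present
print(interval_coverage(bias=0.03))  # far below 0.95: the interval ignores bias
```

The interval’s width reflects sample size; it carries no information about whether the study measured the right thing.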

The Plaintiffs’ Attack on Significance Testing

The Plaintiffs, of course, filed an opposition brief that characterized the defense position as an attempt to:

“elevate statistical significance, as measured by confidence intervals and so-called p-values, to the status of an absolute requirement to the establishment of causation.”10

Tellingly, the plaintiffs’ brief fails to point to any modern-era example of a scientific determination of causation based upon epidemiologic evidence, in which the pertinent studies were not assessed for, and found to show, statistical significance.

After citing a few judicial opinions that underplayed the importance of statistical significance, the Plaintiffs’ opposition turned to the ASA Statement for what it perceived to be support for its loosey-goosey approach to causal inference.11 The Plaintiffs’ opposition brief quoted a series of propositions from the ASA Statement, without the ASA’s elaborations and elucidations, and without much in the way of explanation or commentary. At the very least, the Plaintiffs’ heavy reliance upon, despite their distortions of, the ASA Statement helped them to define key statistical concepts more carefully than had AbbVie in its opening brief.

The ASA Statement, however, was not immune from being misrepresented in the Plaintiffs’ opposition brief. Many of the quoted propositions were quite beside the point of the dispute over the validity and reliability of Plaintiffs’ expert witnesses’ conclusions of causation about testosterone and heart attacks, conclusions not reached or shared by the FDA, any consensus statement from medical organizations, or any serious published systematic review:

“P-values do not measure the probability that the studied hypothesis is true, … .”12

This proposition from the ASA Statement is true, but trivially true. (Of course, this ASA principle is relevant to the many judicial decisions that have managed to misstate what p-values measure.) The above-quoted proposition follows from the definition and meaning of the p-value; only someone who did not understand significance probability would confuse it with the probability of the truth of the studied hypothesis. P-values’ not measuring the probability of the null hypothesis, or any alternative hypothesis, is not a flaw in p-values, but arguably their strength.
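Because the p-value is computed on the assumption that the null hypothesis is true, it cannot by itself say how probable that hypothesis is. The Python sketch below illustrates the difference. When the null is true by construction, about 5% of results fall below 0.05, as designed. But the share of “significant” results that come from true nulls, the quantity that phrases like “95% probability that the observed association is not due to chance” seem to describe, depends on two further inputs that the p-value does not contain: the proportion of tested hypotheses that are null, and the study’s power. The values used here are hypothetical.

```python
import random
from math import erf, sqrt


def two_sided_p(z):
    """P(|Z| >= |z|) for a standard normal Z, i.e. computed assuming the null."""
    return 1 - erf(abs(z) / sqrt(2))


def share_significant_under_null(trials=100_000, seed=7):
    """How often p < 0.05 when the null hypothesis is true by construction."""
    rng = random.Random(seed)
    hits = sum(two_sided_p(rng.gauss(0, 1)) < 0.05 for _ in range(trials))
    return hits / trials


def prob_null_given_significant(prior_null, power, alpha=0.05):
    """Bayes' rule: the share of 'significant' results that come from true nulls.

    prior_null and power are inputs the p-value itself does not contain.
    """
    false_pos = alpha * prior_null
    true_pos = power * (1 - prior_null)
    return false_pos / (false_pos + true_pos)


print(share_significant_under_null())         # near 0.05
print(prob_null_given_significant(0.5, 0.5))  # about 0.09
print(prob_null_given_significant(0.9, 0.5))  # about 0.47
```

The same 0.05 threshold yields very different probabilities for the null hypothesis depending on assumptions outside the data, which is why transposing the p-value into a statement about the hypothesis is an error, not a simplification.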

“A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.”13

Again, true, true, and immaterial. The existence of other important metrics, such as the magnitude of an association or correlation, hardly detracts from the importance of assessing the random error in an observed statistic. The need to assess clinical or practical significance of an association or correlation also does not detract from the importance of the assessed random error in a measured statistic.

“By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.”14

The Plaintiffs’ attempt to spin the above ASA statement as a criticism of p-values involves an ignoratio elenchi. Once again, the p-value assumes a probability model and a null hypothesis, and so it cannot provide a “measure” of the probability of the model or the hypothesis.

The Plaintiffs’ final harrumph on the ASA Statement was their claim that the ASA Statement’s conclusion was “especially significant” to the testosterone litigation:

“Good statistical practice, as an essential component of good scientific practice, emphasizes principles of good study design and conduct, a variety of numerical and graphical summaries of data, understanding of the phenomenon under study, interpretation of results in context, complete reporting and proper logical and quantitative understanding of what data summaries mean. No single index should substitute for scientific reasoning.”15

The existence of other important criteria in the evaluation and synthesis of a complex body of studies does not erase or supersede the importance of assessing stochastic error in the epidemiologic studies. Plaintiffs’ Opposition Brief asserted that the Defense’s attempt:

“to substitute the single index, the p-value, for scientific reasoning in the reports of Plaintiffs’ experts should be rejected.”16

Some of the defense’s opening brief could indeed be read as reducing causal inference to the determination of statistical significance. A sympathetic reading of the entire AbbVie brief, however, shows that it had criticized the threats to validity in the observational epidemiologic studies, as well as some of the clinical trials, and other rampant flaws in the Plaintiffs’ expert witnesses’ reasoning. The Plaintiffs’ citations to the ASA Statement’s “negative” propositions about p-values (to emphasize what they are not) appeared to be the stuffing of a strawman, used to divert attention from other failings of their own claims and proffered analyses. In other words, the substance of the Rule 702 application had much more to do with data quality and study validity than statistical significance.

What did the trial court make of this back and forth about statistical significance and the ASA Statement? For the most part, the trial court denied both sides’ challenges to proffered expert witness testimony on causation and statistical issues. In sorting out the controversy over the ASA Statement, the trial court apparently misunderstood key statistical concepts and paid little attention to the threats to validity other than random variability in study results.17 The trial court summarized the controversy as follows:

“In arguing that the scientific literature does not support a finding that TRT is associated with the alleged injuries, AbbVie emphasize [sic] the importance of considering the statistical significance of study results. Though experts for both AbbVie and plaintiffs agree that statistical significance is a widely accepted concept in the field of statistics and that there is a conventional method for determining the statistical significance of a study’s findings, the parties and their experts disagree about the conclusions one may permissibly draw from a study result that is deemed to possess or lack statistical significance according to conventional methods of making that determination.”18

Of course, there was never a controversy presented to the court about drawing a conclusion from “a study.” By the time the briefs were filed, both sides had multiple observational studies, clinical trials, and meta-analyses to synthesize into opinions for or against causal claims.

Ironically, AbbVie might claim to have prevailed in having the trial court adopt its misleading definitions of p-values and confidence intervals:

“Statisticians test for statistical significance to determine the likelihood that a study’s findings are due to chance. *** According to conventional statistical practice, such a result *** would be considered statistically significant if there is a 95% probability, also expressed as a “p-value” of <0.05, that the observed association is not the product of chance. If, however, the p-value were greater than 0.05, the observed association would not be regarded as statistically significant, according to prevailing conventions, because there is a greater than 5% probability that the association observed was the result of chance.”19

The MDL court similarly appeared to accept AbbVie’s dubious description of the confidence interval:

“A confidence interval consists of a range of values. For a 95% confidence interval, one would expect future studies sampling the same population to produce values within the range 95% of the time. So if the confidence interval ranged from 1.2 to 3.0, the association would be considered statistically significant, because one would expect, with 95% confidence, that future studies would report a ratio above 1.0 – indeed, above 1.2.”20
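The court’s description treats the 95% as a forecast about where future studies’ results will fall. A simulation under a deliberately simple model separates two different claims. The model assumes normally distributed estimates on the log scale and a known, equal standard error for every study; theta and se are hypothetical values. About 95% of intervals built this way contain the true value, which is what the confidence level describes. The chance that one new study’s estimate lands inside a particular earlier interval is lower, roughly 2*Phi(1.96/sqrt(2)) - 1, or about 0.83, in this equal-precision setup, because both studies carry random error.

```python
import random


def coverage_and_capture(theta=0.4, se=0.2, trials=100_000, seed=3, z95=1.96):
    """Compare two different '95%' claims under a simple normal model.

    theta: true log risk ratio (hypothetical); se: each study's standard error.
    """
    rng = random.Random(seed)
    covers_truth = captures_next_study = 0
    for _ in range(trials):
        first = rng.gauss(theta, se)
        lo, hi = first - z95 * se, first + z95 * se
        covers_truth += lo <= theta <= hi
        # A second study of the same population with the same precision
        second = rng.gauss(theta, se)
        captures_next_study += lo <= second <= hi
    return covers_truth / trials, captures_next_study / trials


print(coverage_and_capture())  # roughly (0.95, 0.83)
```

Even on its own terms, then, the court’s gloss overstates what a single interval predicts about the next study, quite apart from bias and confounding.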

The court’s opinion clearly evidences the danger in stating the importance of statistical significance without placing equal emphasis on the need to exclude bias and confounding. Having found an observational study and one meta-analysis of clinical trial safety outcomes that were statistically significant, the trial court held that any dispute over the probativeness of the studies was for the jury to assess.

Some but not all of AbbVie’s brief might have encouraged this lax attitude by failing to emphasize study validity at the same time as emphasizing the importance of statistical significance. In any event, the trial court continued with its précis of the plaintiffs’ argument that:

“a study reporting a confidence interval ranging from 0.9 to 3.5, for example, should certainly not be understood as evidence that there is no association and may actually be understood as evidence in favor of an association, when considered in light of other evidence. Thus, according to plaintiffs’ experts, even studies that do not show a statistically significant association between TRT and the alleged injuries may plausibly bolster their opinions that TRT is capable of causing such injuries.”21

Of course, a single study that reported a risk ratio greater than 1.0, with a confidence interval of 0.9 to 3.5, might reasonably be incorporated into a meta-analysis that in turn could support, or not support, a causal inference. In the TRT litigation, however, the well-conducted, most up-to-date meta-analyses did not report statistically significant elevated rates of cardiovascular events among users of TRT. The court’s insistence that a study with a confidence interval of 0.9 to 3.5 cannot be interpreted as evidence of no association is, of course, correct. Equally correct would be to say that the interval shows that the study failed to show an association. The trial court never grappled with the reality that the best conducted meta-analyses failed to show statistically significant increases in the rates of cardiovascular events.
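How a single study with an interval of 0.9 to 3.5 enters a meta-analysis can be sketched. The Python below assumes each reported interval is symmetric on the log scale, so that the point estimate and standard error can be backed out of it, and uses simple fixed-effect, inverse-variance pooling rather than a random-effects model. The second and third studies are invented for illustration and do not describe any TRT study.

```python
from math import exp, log, sqrt

Z95 = 1.96


def log_rr_and_se(lower, upper):
    """Recover log(RR) and its standard error from a reported 95% CI.

    Assumes the interval was symmetric on the log scale (Wald-type).
    """
    return (log(lower) + log(upper)) / 2, (log(upper) - log(lower)) / (2 * Z95)


def fixed_effect_pool(intervals):
    """Inverse-variance (fixed-effect) pooled RR and 95% CI from (lower, upper) pairs."""
    total_w = weighted_sum = 0.0
    for lower, upper in intervals:
        y, se = log_rr_and_se(lower, upper)
        w = 1 / se ** 2  # more precise studies get more weight
        total_w += w
        weighted_sum += w * y
    pooled, se_pooled = weighted_sum / total_w, sqrt(1 / total_w)
    return exp(pooled), exp(pooled - Z95 * se_pooled), exp(pooled + Z95 * se_pooled)


# The first interval is the one discussed in the text; the others are hypothetical.
studies = [(0.9, 3.5), (0.80, 1.13), (0.85, 1.22)]
print(fixed_effect_pool(studies))
```

With these hypothetical inputs, the wide 0.9 to 3.5 study carries only about 3% of the total weight, and the pooled ratio is about 1.00 with an interval of roughly 0.89 to 1.13. A study’s contribution depends on its precision, not on its point estimate alone.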

The American Statistical Association and its members would likely have been deeply disappointed by how both parties used the ASA Statement for their litigation objectives. AbbVie’s suggestion that the ASA Statement reflects a debate about “whether scientists who lack expertise in statistics misunderstand p-values and overvalue significance testing” would appear to have no support in the Statement itself or any other commentary to come out of the meeting leading up to the Statement. The Plaintiffs’ argument that p-values properly understood are unimportant and misleading similarly finds no support in the ASA Statement. Conveniently, the Plaintiffs’ brief ignored the Statement’s insistence upon transparency in pre-specification of analyses and outcomes, and in handling of multiple comparisons:

“P-values and related analyses should not be reported selectively. Conducting multiple analyses of the data and reporting only those with certain p-values (typically those passing a significance threshold) renders the reported p-values essentially uninterpretable. Cherrypicking promising findings, also known by such terms as data dredging, significance chasing, significance questing, selective inference, and ‘p-hacking’, leads to a spurious excess of statistically significant results in the published literature and should be vigorously avoided.”22

Most if not all of the plaintiffs’ expert witnesses’ reliance materials would have been eliminated under this principle set forth by the ASA Statement.
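The arithmetic behind this ASA principle is short. If an analyst runs k independent analyses of data in which there is truly no effect, and reports only the smallest p-value, the chance of a “significant” finding is 1 - (1 - 0.05)^k rather than 0.05. The Python sketch below computes that figure and checks it by simulation. Independence of the analyses is an assumption (correlated analyses typically inflate the rate less, but still inflate it), and k = 20 is arbitrary.

```python
import random


def chance_any_significant(k, alpha=0.05):
    """P(at least one of k independent tests of true nulls has p < alpha)."""
    return 1 - (1 - alpha) ** k


def simulate_cherry_picking(k=20, alpha=0.05, studies=20_000, seed=11):
    """Report only the smallest of k p-values per study; how often is it 'significant'?

    Under a true null with a continuous test statistic, each p-value is
    uniform on [0, 1), so random.random() stands in for it.
    """
    rng = random.Random(seed)
    hits = sum(min(rng.random() for _ in range(k)) < alpha for _ in range(studies))
    return hits / studies


print(round(chance_any_significant(20), 3))  # 0.642
print(simulate_cherry_picking())             # close to the same figure
```

Twenty looks at null data produce a reportable “finding” nearly two times in three, which is why selectively reported p-values are, in the ASA’s words, essentially uninterpretable.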


1 See, e.g., In re Ephedra Prods. Liab. Litig., 393 F.Supp. 2d 181, 191 (S.D.N.Y. 2005). See also “Confidence in Intervals and Diffidence in the Courts” (March 4, 2012); “Scientific illiteracy among the judiciary” (Feb. 29, 2012).

3 Letter of Janet Woodcock, Director of FDA’s Center for Drug Evaluation and Research, to Sidney Wolfe, Director of Public Citizen’s Health Research Group (July 16, 2014) (denying citizen petition for “black box” warning).

4 Defendants’ (AbbVie, Inc.’s) Motion to Exclude Plaintiffs Expert Testimony on the Issue of Causation, and for Summary Judgment, and Memorandum of Law in Support, Case No. 1:14-CV-01748, MDL 2545, Document #: 1753, 2017 WL 1104501 (N.D. Ill. Feb. 20, 2017) [AbbVie Brief].

5 AbbVie Brief at 3; see also id. at 7-8 (“Depending upon the expert, even the basic tests of statistical significance are simply ignored, dismissed as misleading… .”) AbbVie’s definitions of statistical significance occasionally wandered off track and into the transposition fallacy, but generally its point was understandable.

6 AbbVie Brief at 63 n.16 (emphasis in original).

7 AbbVie Brief at 13 (emphasis in original).

8 AbbVie Brief at 13-14 (emphasis in original).

9 The defense brief further emphasized statistical significance almost as though it were a sufficient basis for inferring causality from observational studies: “Regardless of this debate, courts have routinely found the traditional epidemiological method—including bedrock principles of significance testing—to be the most reliable and accepted way to establish general causation. See, e.g., In re Zoloft, 26 F. Supp. 3d 449, 455; see also Rosen v. Ciba-Geigy Corp., 78 F.3d 316, 319 (7th Cir. 1996) (“The law lags science; it does not lead it.”).” AbbVie Brief at 63-64 & n.16. The defense’s language about “including bedrock principles of significance testing” absolves it of having totally ignored other necessary considerations, but the defense still might have advantageously pointed out the other needed considerations for causal inference at the same time.

10 Plaintiffs’ Steering Committee’s Memorandum of Law in Opposition to Motion of AbbVie Defendants to Exclude Plaintiffs’ Expert Testimony on the Issue of Causation, and for Summary Judgment at p.34, Case No. 1:14-CV-01748, MDL 2545, Document No. 1753 (N.D. Ill. Mar. 23, 2017) [Opp. Brief].

11 Id. at 35 (appending the ASA Statement and the commentary of more than two dozen interested commentators).

12 Id. at 38 (quoting from the ASA Statement at 131).

13 Id. at 38 (quoting from the ASA Statement at 132).

14 Id. at 38 (quoting from the ASA Statement at 132).

15 Id. at 38 (quoting from the ASA Statement at 132).

16 Id. at 38.

17 In re Testosterone Replacement Therapy Prods. Liab. Litig., MDL No. 2545, C.M.O. No. 46, 2017 WL 1833173 (N.D. Ill. May 8, 2017) [In re TRT]

18 In re TRT at *4.

19 In re TRT at *4.

20 Id.

21 Id. at *4.

22 ASA Statement at 131-32.

Link-a-lot, right and left

March 1st, 2019


The right wing of American politics, with its religious enthusiasms, has long shown a willingness to ignore and subvert science to advance its policy agendas. The left wing of American politics, however, is not immune from ignoring evidence-based scientific conclusions in its policy agenda. When it comes to the World Trade Center (WTC) attack, the hostility to evidence-based conclusions appears to be bipartisan.

The attack on the WTC by evil Muslim religious extremists was deplorable, and the September 11th rescue workers deserve our respect and gratitude. They may even deserve compensation for fortuitous, unrelated chronic diseases experienced a decade or so later. Dressing up our gratitude in the language of causality and victimhood, however, undermines basic respect for scientific evidence and leads to specious claiming.

The New York Times, no slouch when it comes to specious claiming on scientific issues, provided a great example with its editorial this morning, advocating for increased federal funding for the WTC compensation fund. Editorial Board, “Give Sept. 11 Survivors the Help They Deserve: A fund to aid the thousands sickened from the toxic dust of the World Trade Center attack is running out of money,” N.Y. Times (Feb. 28, 2019). The Times’ editors pictured a retired federal worker as one “who suffers from illnesses like leukemia related to recovery work at ground zero.” The editorial tells us that this man, in 2015, “was told he had leukemia linked to his work there, like many who had been at the site.” The editors went on to bemoan how this man, and others like him, might receive much less than what the federal WTC compensation fund had promised.

The passive voice can be very revealing of the deceptions and misrepresentations it hides. Who told this man such a thing, about a “link,” whatever that is? And on what evidence was the “link” supposedly established?

Of course, the Muslim terrorist attack did disseminate toxic materials, and scientists have studied health outcomes among the rescue workers and responders, as well as among civilians who joined the effort to look for survivors and victims. One study, published shortly after the 10-year anniversary of the WTC attack, failed to show any “link” between respiratory or physical exposure to WTC dusts and materials and leukemia.1 In the authors’ words, “Using within-cohort comparisons, the intensity of World Trade Center exposure was not significantly associated with cancer of the lung, prostate, thyroid, non-Hodgkin lymphoma, or hematological cancer in either group.”

Table 3 of their paper reported specifically on leukemia, using standardized incidence ratios (SIR), in two time windows:

Early period (enrollment through 2006, n = 21,218):

SIR = 0.73 (95% C.I., 0.20 to 1.87)

and

Later period (enrollment 2007-2008, n = 20,991):

SIR = 1.25 (95% C.I., 0.46 to 2.72)

A later paper by many of the same authors updated the cohort through 2011. Again, the results overall were equivocal in terms of standardized incidence ratios, but quite “null” for leukemia:

“RESULTS: All-cancer SIR was 1.11 (95% confidence interval (CI) 1.03-1.20) in RRWs, and 1.08 (95% CI 1.02-1.15) in non-RRWs. Prostate cancer and skin melanoma were significantly elevated in both populations. Thyroid cancer was significantly elevated only in RRWs [rescue workers] while breast cancer and non-Hodgkin’s lymphoma were significantly elevated only in non-RRWs. There was a significant exposure dose-response for bladder cancer among RRWs, and for skin melanoma among non-RRWs [civilians].”2

Table II of this later report provides the evidence that the New York Times’ anonymous “linker” was most likely full of soup:

Rescue/recovery workers (RRW) (N=24,863)

Leukemia: 16 observed; 17 expected

SIR = 0.95 (95% C.I., 0.54 – 1.54)

Enrollees not involved in rescue and recovery (non-RRW) (N = 35,476)

Leukemia: 18 observed; 22 expected

SIR = 0.81 (95% C.I., 0.48 – 1.29)
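A standardized incidence ratio is just observed cases divided by expected cases, and the question whether a “link” is shown turns on whether the interval around that ratio excludes 1.0. The following minimal Python sketch recomputes the two leukemia SIRs from the rounded counts quoted above. It assumes an exact (Garwood) Poisson interval, which may not be the method the authors actually used, and the helper names (poisson_cdf, solve_mu, sir_with_exact_ci) are hypothetical, written only for illustration.

```python
import math

def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu), summed term by term."""
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total

def solve_mu(k: int, target: float) -> float:
    """Bisection for the mean mu at which poisson_cdf(k, mu) == target.

    The CDF decreases as mu grows, so a CDF above the target means mu is too small.
    """
    lo, hi = 0.0, 10.0 * (k + 10)
    for _ in range(200):
        mid = (lo + hi) / 2
        if poisson_cdf(k, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def sir_with_exact_ci(observed: int, expected: float, alpha: float = 0.05):
    """SIR = observed / expected, with an exact (Garwood) Poisson interval (assumed method)."""
    lower_mu = 0.0 if observed == 0 else solve_mu(observed - 1, 1 - alpha / 2)
    upper_mu = solve_mu(observed, alpha / 2)
    return observed / expected, lower_mu / expected, upper_mu / expected

# Rounded leukemia counts from Table II of the 2016 update, as quoted above.
for label, obs, exp_count in [("RRW", 16, 17.0), ("non-RRW", 18, 22.0)]:
    sir, lower, upper = sir_with_exact_ci(obs, exp_count)
    verdict = "includes" if lower <= 1.0 <= upper else "excludes"
    print(f"{label}: SIR {sir:.2f} (95% CI {lower:.2f}-{upper:.2f}); {verdict} 1.0")
```

The recomputed intervals land close to the published ones; small differences are to be expected because the table, as quoted, rounds the expected counts (16/17 gives 0.94, not the reported 0.95). Either way, both intervals comfortably include 1.0, which is what “null” means here.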

An article reporting on the results of multiple cohorts in the American Journal of Industrial Medicine (the “red journal”) makes clear that causal conclusions, or “linking,” are not appropriate on the available evidence:

“Conclusions. The presence of three cohorts strengthens the effort of identifying and quantifying the cancer risk; the heterogeneity in design might increase sensitivity to the identification of cancers potentially associated with exposure. The presence and magnitude of an increased cancer risk remains to be fully elucidated. Continued long-term follow up with minimal longitudinal dropout is crucial to achieve this goal.”3

These authors’ point about continued, long-term follow-up is of course true, but immaterial to the validity of the present compensation schemes. The evidence for the relevant time window has been collected and analyzed. Whether compensation is appropriate for chronic diseases that manifest only after longer latency periods is a separate issue. There is just no link in the New York Times’ linking. Evidence-based policy needs evidence, not editorial opinion.

1 Jiehui Li, James E. Cone, Amy R. Kahn, Robert M. Brackbill, Mark R. Farfel, Carolyn M. Greene, James L. Hadler, Leslie T. Stayner, and Steven D. Stellman, “Association between World Trade Center exposure and excess cancer risk,” 308 J. Am. Med. Ass’n 2479 (2012).

2 Mark R. Farfel, James L. Hadler, Amy R. Kahn, Kevin J. Konty, Leslie T. Stayner, and Steven D. Stellman, “Ten-year cancer incidence in rescue/recovery workers and civilians exposed to the September 11, 2001 terrorist attacks on the World Trade Center,” 59 Am. J. Indus. Med. 709 (2016).

3 Paolo Boffetta, Rachel Zeig-Owens, Sylvan Wallenstein, Jiehui Li, Robert Brackbill, James Cone, Mark Farfel, William Holden, Roberto Lucchini, Mayris P. Webber, David Prezant, and Steven D. Stellman, “Cancer in World Trade Center responders: Findings from multiple cohorts and options for future study,” 59 Am. J. Indus. Med. 96, 96 (2016) (emphasis added).