TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

The Advocates’ Errors in Daubert

December 28th, 2018

Over 25 years ago, the United States Supreme Court answered a narrow legal question about whether the so-called Frye rule was incorporated into Rule 702 of the Federal Rules of Evidence. Plaintiffs in Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993), appealed a Ninth Circuit ruling that the Frye rule survived, and was incorporated into, the enactment of a statutory evidentiary rule, Rule 702. As most legal observers can now discern, plaintiffs won the battle and lost the war. The Court held that the plain language of Rule 702 does not memorialize Frye; rather the rule requires an epistemic warrant for the opinion testimony of expert witnesses.

Many of the sub-issues of the Daubert case are now so much water over the dam. The case involved claims of birth defects from maternal use of an anti-nausea medication, Bendectin. Litigation over Bendectin is long over, and the medication is now approved for use in pregnant women, on the basis of a full new drug application, supported by clinical trial evidence.

In revisiting Daubert, therefore, we might imagine that legal scholars and scientists would be interested in the anatomy of the errors that led Bendectin plaintiffs stridently to maintain their causal claims. The oral argument before the Supreme Court is telling with respect to some of the sources of error. Two law professors, Michael H. Gottesman, for plaintiffs, and Charles Fried, for the defense, squared off one Tuesday morning in March 1993. A review of Gottesman’s argument reveals several fallacious lines of argument, which are still relevant today:

A. Regulation is Based Upon Scientific Determinations of Causation

In his oral argument, Gottesman asserted that regulators (as opposed to the scientific community) are in charge of determining causation,1 and environmental regulations are based upon scientific causation determinations.2 By the time that the Supreme Court heard argument in the Daubert case, this conflation of scientific and regulatory standards for causal conclusions was fairly well debunked.3 Gottesman’s attempt to mislead the Court failed, but the effort continues in courtrooms around the United States.

B. Similar Chemical Structures Have the Same Toxicities

Gottesman asserted that human teratogenicity can be determined from similarity in chemical structures with other established teratogens.4 Close may count in horseshoes, but in chemical structural activities, small differences in chemical structures can result in huge differences in toxicologic or pharmacologic properties. A silly little methyl group on a complicated hydrocarbon ring structure can make a world of difference, as in the difference between estrogen and testosterone.

C. All Animals React the Same to Any Given Substance

Gottesman, in his oral argument, maintained that human teratogenicity can be determined from teratogenicity in non-human, non-primate, murine species.5 The Court wasted little time on this claim, the credibility of which has continued to decline in the last 25 years.

D. The Transposition Fallacy

Perhaps of greatest interest to me was Gottesman’s claim that the probability of the claimed causal association can be determined from the p-value or from the coefficient of confidence taken from the observational epidemiologic studies of birth defects among children of women who ingested Bendectin in pregnancy; a.k.a. the transposition fallacy.6

All these errors are still in play in American courtrooms, despite efforts of scientists and scientific organizations to disabuse judges and lawyers. The transposition fallacy, which has been addressed in these pages and elsewhere at great length, seems especially resilient to educational efforts. Still, the fallacy was as well recognized at the time of the Daubert argument as it is today, and it is noteworthy that the law professor who argued the plaintiffs’ case, in the highest court of the land, advanced this fallacious argument, and that the scientific and statistical community did little to nothing to correct the error.7
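For readers who want to see why the transposition matters, here is a minimal sketch in Python. It is an illustration only, under simplifying assumptions that are mine and not drawn from the Daubert record: results are dichotomized as "significant" or not, and the prior probability of a true association and the power of the study are hypothetical inputs. Under those assumptions, Bayes' theorem shows that the probability of a true association given a significant result depends heavily on the prior, and is not simply the complement of alpha.

```python
def posterior_given_significant(prior_h1, power, alpha=0.05):
    """P(true association | significant result), by Bayes' theorem.

    prior_h1 : hypothetical prior probability that the association is real
    power    : hypothetical P(significant result | association is real)
    alpha    : P(significant result | no association), the significance level
    """
    true_positive = power * prior_h1
    false_positive = alpha * (1.0 - prior_h1)
    return true_positive / (true_positive + false_positive)


# Same alpha and power, different (assumed) priors: the posterior moves a lot,
# so "p < 0.05" cannot be read as "95% probability of causation".
for prior in (0.5, 0.1, 0.01):
    post = posterior_given_significant(prior, power=0.8)
    print(f"prior = {prior:.2f}  ->  posterior = {post:.3f}")
```

The point of the sketch is only that the significance level describes the behavior of the data under the null hypothesis; converting it into a probability for the hypothesis requires additional inputs that a p-value or confidence coefficient does not supply.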

Although Professor Gottesman’s meaning in the oral argument is not entirely clear, on multiple occasions, he appeared to have conflated the coefficient of confidence, from confidence intervals, with the posterior probability that attaches to the alternative hypothesis of some association:

“What the lower courts have said was yes, but prove to us to a degree of statistical certainty which would give us 95 percent confidence that the human epidemiological data is reflective, that these higher numbers for the mothers who used Bendectin were not the product of random chance but in fact are demonstrating the linkage between this drug and the symptoms observed.”8

* * * * *

“… what was demonstrated by Shanna Swan was that if you used a degree of confidence lower than 95 percent but still sufficient to prove the point as likelier than not, the epidemiological evidence is positive… .”9

* * * * *

“The question is, how confident can we be that that is in fact probative of causation, not at a 95 percent level, but what Drs. Swan and Glassman said was applying the Rothman technique, a published technique and doing the arithmetic, that you find that this does link causation likelier than not.”10

Professor Fried’s oral argument for the defense largely refused or failed to engage with plaintiffs’ argument on statistical inference. With respect to the “Rothman” approach, Fried pointed out that plaintiffs’ statistical expert witness, Shanna Swan, never actually employed “the Rothman principle.”11

With respect to plaintiffs’ claim that individual studies had low power to detect risk ratios of two, Professor Fried missed the opportunity to point out that such post-hoc power calculations, whatever validity they might possess, embrace the concept of statistical significance at the customary 5% level. Fried did note that a meta-analysis, based upon all the epidemiologic studies, rendered plaintiffs’ power complaint irrelevant.12

Some readers may believe that judging advocates speaking extemporaneously about statistical concepts might be overly harsh. How well then did the lawyers explain and represent statistical concepts in their written briefs in the Daubert case?

Petitioners’ Briefs

Petitioners’ Opening Brief

The petitioners’ briefs reveal that Gottesman’s statements at oral argument represent a consistent misunderstanding of statistical concepts. The plaintiffs consistently conflated significance probability or the coefficient of confidence with the civil burden of proof probability:

“The crux of the disagreement between Merrell’s experts and those whose testimony is put forward by plaintiffs is that the latter are prepared to find causation more probable than not when the epidemiological evidence is strongly positive (albeit not at a 95% confidence level) and when it is buttressed with animal and chemical evidence predictive of causation, while the former are unwilling to find causation in the absence of an epidemiological study that satisfies the 95% confidence level.”13

After giving a reasonable facsimile of a definition of statistical significance, the plaintiffs’ brief proceeds to confuse the complement of alpha, or the coefficient of confidence (typically 95%), with the probability that the observed risk ratio in a sample is the actual population parameter of risk:

“But in toxic tort lawsuits, the issue is not whether it is certain that a chemical caused a result, but rather whether it is likelier than not that it did. It is not self-evident that the latter conclusion would require eliminating the null hypothesis (i.e. non-causation) to a confidence level of 95%.[30]”14

The plaintiffs’ brief cited heavily to Rothman’s textbook, Modern Epidemiology, with the specious claim that the textbook supported the plaintiffs’ use of the coefficient of confidence to derive a posterior probability (> 50%) of the correctness of an elevated risk ratio for birth defects in children born to mothers who had taken Bendectin in their first trimesters of pregnancy:

“An alternative mechanism has been developed by epidemiologists in recent years to give a somewhat more informative picture of what the statistics mean. At any given confidence level (e.g. 95%) a confidence interval can be constructed. The confidence interval identifies the range of relative risks that collectively comprise the 95% universe. Additional confidence levels are then constructed exhibiting the range at other confidence levels, e.g., at 90%, 80%, etc. From this set of nested confidence intervals the epidemiologist can make assessments of how likely it is that the statistics are showing a true association. Rothman, Tab 9, pp. 122-25. By calculating nested confidence intervals for the data in the Bendectin studies, Dr. Swan was able to determine that it is far more likely than not that a true association exists between Bendectin and human limb reduction birth defects. Swan, Tab 12, at 3618-28.”15

The heavy reliance upon Rothman’s textbook at first blush appears confusing. Modern Epidemiology makes one limited mention of nested confidence intervals, and certainly never suggests that such intervals can provide a posterior probability of the correctness of the hypothesis. Rothman’s complaints about reliance upon “statistical significance,” however, are well-known, and Rothman himself submitted an amicus brief16 in Daubert, a brief that has its own problems.17

In direct response to the Rothman Brief,18 Professor Alvan Feinstein filed an amicus brief in Daubert, wherein he acknowledged that meta-analyses and re-analyses can be valid, but these techniques are subject to many sources of invalidity, and their employment by careful practitioners in some instances should not be a blank check to professional witnesses who are supported by plaintiffs’ counsel. Similarly, Feinstein acknowledged that standards of statistical significance:

“should be appropriately flexible, but they must exist if science is to preserve its tradition of intellectual discipline and high quality research.”19

Petitioners’ Reply Brief

The plaintiffs’ statistical misunderstandings are further exemplified in their Reply Brief, where they reassert the transposition fallacy and alternatively state that associations with p-values greater than 5%, or 95% confidence intervals that include the risk ratio of 1.0, do not show the absence of an association.20 The latter point was, of course, irrelevant in the Daubert case, in which plaintiffs had the burden of persuasion. As in their oral argument through Professor Gottesman, the plaintiffs’ appellate briefs misunderstand the crucial point that confidence intervals are conditioned upon the data observed from a particular sample, and do not provide posterior probabilities for the correctness of a claimed hypothesis.

Defense Brief

The defense brief spent little time on the statistical issue or plaintiffs’ misstatements, but dispatched the issue in a trenchant footnote:

“Petitioners stress the controversy some epidemiologists have raised about the standard use by epidemiologists of a 95% confidence level as a condition of statistical significance. Pet. Br. 8-10. See also Rothman Amicus Br. It is hard to see what point petitioners’ discussion establishes that could help their case. Petitioners’ experts have never developed and defended a detailed analysis of the epidemiological data using some alternative well-articulated methodology. Nor, indeed, do they show (or could they) that with some other plausible measure of confidence (say, 90%) the many published studies would collectively support an inference that Bendectin caused petitioners’ limb reduction defects. At the very most, all that petitioners’ theoretical speculations do is question whether these studies – as the medical profession and regulatory authorities in many countries have concluded – affirmatively prove that Bendectin is not a teratogen.”21

The defense never responded to the specious argument, stated or implied within the plaintiffs’ briefs, and in Gottesman’s oral argument, that a coefficient of confidence of 51% would have generated confidence intervals that routinely excluded the null hypothesis of risk ratio of 1.0. The defense did, however, respond to plaintiffs’ power argument by adverting to a meta-analysis that failed to find a statistically significant association.22
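The arithmetic behind that unanswered argument can be sketched with made-up numbers. The Python sketch below assumes a hypothetical risk ratio estimate of 1.2 with a standard error of 0.2 on the log scale (neither figure comes from the Bendectin studies) and a normal approximation for the log risk ratio. It computes nested confidence intervals at several coefficients of confidence, along with the two-sided p-value.

```python
from math import exp, log
from statistics import NormalDist

STD_NORMAL = NormalDist()


def risk_ratio_ci(rr, se_log_rr, level):
    """Normal-approximation confidence interval for a risk ratio."""
    z = STD_NORMAL.inv_cdf(0.5 + level / 2.0)
    center = log(rr)
    return exp(center - z * se_log_rr), exp(center + z * se_log_rr)


def two_sided_p(rr, se_log_rr):
    """Two-sided p-value against the null hypothesis RR = 1.0."""
    z_obs = abs(log(rr)) / se_log_rr
    return 2.0 * (1.0 - STD_NORMAL.cdf(z_obs))


# Hypothetical inputs, chosen only for illustration.
RR_HAT, SE_LOG_RR = 1.2, 0.2

for level in (0.95, 0.90, 0.80, 0.51):
    lower, upper = risk_ratio_ci(RR_HAT, SE_LOG_RR, level)
    verdict = "excludes" if lower > 1.0 or upper < 1.0 else "includes"
    print(f"{level:.0%} interval: ({lower:.3f}, {upper:.3f}) {verdict} 1.0")

print(f"two-sided p-value: {two_sided_p(RR_HAT, SE_LOG_RR):.3f}")
```

With these assumed inputs, only the 51% interval excludes 1.0, even though the p-value is well above 0.05. Under this approximation, an interval excludes the null whenever its coefficient of confidence is lower than one minus the p-value; lowering the coefficient shrinks the interval, but it does not turn the coefficient into a probability that the association is real.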

The defense also advanced two important arguments to which the plaintiffs’ briefs never meaningfully responded. First, the defense detailed the “cherry picking” or selective reliance engaged in by plaintiffs’ expert witnesses.23 Second, the defense noted that plaintiffs had a specific causation problem in that their expert witnesses had been attempting to infer specific causation based upon relative risks well below 2.0.24
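The significance of the 2.0 benchmark comes from simple arithmetic, which the following Python sketch illustrates. The sketch rests on assumptions that are mine, not the briefs': that the relative risk is an unbiased estimate, and that the excess risk is spread evenly across exposed persons. On those assumptions, the share of exposed cases attributable to the exposure is (RR - 1)/RR, which exceeds one half only when the relative risk exceeds 2.0.

```python
def attributable_fraction(rr):
    """Fraction of exposed cases attributable to exposure: (RR - 1) / RR.

    Assumes an unbiased relative risk that applies uniformly to the exposed;
    returns 0.0 when the relative risk shows no excess risk.
    """
    if rr <= 0:
        raise ValueError("relative risk must be positive")
    return max(0.0, (rr - 1.0) / rr)


# Hypothetical relative risks, for illustration only.
for rr in (1.2, 1.5, 2.0, 3.0):
    share = attributable_fraction(rr)
    side = "more likely than not" if share > 0.5 else "not more likely than not"
    print(f"RR = {rr:.1f}: attributable fraction = {share:.3f} ({side})")
```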

To some extent, the plaintiffs’ statistical misstatements were taken up by an amicus brief submitted by the United States government, speaking through the office of the Solicitor General.25 Drawing upon the Supreme Court’s decisions in race discrimination cases,26 the government asserted that epidemiologists “must determine” whether a finding of an elevated risk ratio “could have arisen due to chance alone.”27

Unfortunately, the government’s brief butchered the meaning of confidence intervals. Rather than describe the confidence interval as showing what point estimates of risk ratios are reasonably compatible with the sample result, the government stated that confidence intervals show “how close the real population percentage is likely to be to the figure observed in the sample”:

“since there is a 95 percent chance that the ‘true’ value lies within two standard deviations of the sample figure, that particular ‘confidence interval’ (i.e., two standard deviations) is therefore said to have a ‘confidence level’ of about 95 percent.” 28

The Solicitor General’s office seemed to have had some awareness that it was giving offense with the above definition because it quickly added:

“While it is customary (and, in many cases, easier) to speak of ‘a 95 percent chance’ that the actual population percentage is within two standard deviations of the figure obtained from the sample, ‘the chances are in the sampling procedure, not in the parameter’.”29

Easier perhaps but clearly erroneous to speak that way, and customary only among the unwashed. The government half apologized for misleading the Court when it followed up with a better definition from David Freedman’s textbook, but sadly the government lawyers were not content to let the matter sit there. The Solicitor General’s office’s brief obscured the textbook definition with a further inaccurate précis:

“if the sampling from the general population were repeated numerous times, the ‘real’ population figure would be within the confidence interval 95 percent of the time. The ‘real’ figure would be outside that interval the remaining five percent of the time.”30

The lawyers in the Solicitor General’s office thus made the rookie mistake of forgetting that in the long run, after numerous repeated samples, there would be numerous confidence intervals, not one. The 95% probability of containing the true population value belongs to the set of the numerous confidence intervals, not “the confidence interval” obtained in the first go around.
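The long-run property can be seen in a small simulation. The Python sketch below is a minimal illustration under assumed conditions of my own choosing: a normal population with a known standard deviation, a hypothetical true mean of 10, samples of 50, and a fixed random seed. Each repetition draws a new sample and builds a new 95% interval; the fixed true value is then checked against each of those many different intervals.

```python
import random
from statistics import NormalDist, fmean


def coverage(true_mean=10.0, sigma=2.0, n=50, trials=2000, level=0.95, seed=0):
    """Return (share of intervals containing true_mean, the first interval)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sigma / n ** 0.5  # known-sigma interval for the mean
    hits = 0
    first_interval = None
    for _ in range(trials):
        # A fresh sample yields a fresh interval on every repetition.
        sample_mean = fmean(rng.gauss(true_mean, sigma) for _ in range(n))
        lower, upper = sample_mean - half_width, sample_mean + half_width
        if first_interval is None:
            first_interval = (lower, upper)
        hits += lower <= true_mean <= upper
    return hits / trials, first_interval


share, first = coverage()
print(f"share of intervals containing the true mean: {share:.3f}")
print(f"first interval alone: ({first[0]:.3f}, {first[1]:.3f})")
```

The share should land close to 95%, but that figure describes the whole collection of intervals produced by the procedure. The first interval, taken alone, either contains the fixed true value or it does not.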

The Daubert case has been the subject of nearly endless scholarly comment, but few authors have chosen to revisit the parties’ briefs. Two authors have published a paper that reviewed the scientists’ amici briefs in Daubert.31 The Rothman brief was outlined in detail; the Feinstein rebuttal was not substantively discussed. The plaintiffs’ invocation of the transposition fallacy in Daubert has apparently gone unnoticed.


1 Oral Argument in Daubert v. Merrell Dow Pharmaceuticals, Inc., U.S. Supreme Court no. 92-102, 1993 WL 754951, *5 (Tuesday, March 30, 1993) [Oral Arg.]

2 Oral Arg. at *6.

3 In re Agent Orange Product Liab. Litig., 597 F. Supp. 740, 781 (E.D.N.Y.1984) (“The distinction between avoidance of risk through regulation and compensation for injuries after the fact is a fundamental one.”), aff’d in relevant part, 818 F.2d 145 (2d Cir. 1987), cert. denied sub nom. Pinkney v. Dow Chemical Co., 484 U.S. 1004 (1988).

4 Oral Arg. at *19.

5 Oral Arg. at *18-19.

6 Oral Arg. at *19.

7 See, e.g., “Sander Greenland on ‘The Need for Critical Appraisal of Expert Witnesses in Epidemiology and Statistics’” (Feb. 8, 2015) (noting biostatistician Sander Greenland’s publications, which selectively criticize only defense expert witnesses and lawyers for statistical misstatements); see also “Some High-Value Targets for Sander Greenland in 2018” (Dec. 27, 2017).

8 Oral Arg. at *19.

9 Oral Arg. at *20.

10 Oral Arg. at *44. At the oral argument, this last statement was perhaps Gottesman’s clearest misstatement of statistical principles, in that he directly suggested that the coefficient of confidence translates into a posterior probability of the claimed association at the observed size.

11 Oral Arg. at *37.

12 Oral Arg. at *32.

13 Petitioner’s Brief in Daubert v. Merrell Dow Pharmaceuticals, Inc., U.S. Supreme Court No. 92-102, 1992 WL 12006442, *8 (U.S. Dec. 2, 1992) [Petitioner’s Brief].

14 Petitioner’s Brief at *9.

15 Petitioner’s Brief at *n. 36.

16 Brief Amici Curiae of Professors Kenneth Rothman, Noel Weiss, James Robins, Raymond Neutra and Steven Stellman, in Support of Petitioners, 1992 WL 12006438, Daubert v. Merrell Dow Pharmaceuticals, Inc., U.S. S. Ct. No. 92-102 (Dec. 2, 1992).

18 Brief Amicus Curiae of Professor Alvan R. Feinstein in Support of Respondent, in Daubert v. Merrell Dow Pharmaceuticals, Inc., U.S. Supreme Court no. 92-102, 1993 WL 13006284, at *2 (U.S., Jan. 19, 1993) [Feinstein Brief].

19 Feinstein Brief at *19.

20 Petitioner’s Reply Brief in Daubert v. Merrell Dow Pharmaceuticals, Inc., U.S. Supreme Court No. 92-102, 1993 WL 13006390, at *4 (U.S., Feb. 22, 1993).

21 Respondent’s Brief in Daubert v. Merrell Dow Pharmaceuticals, Inc., U.S. Supreme Court No. 92-102, 1993 WL 13006277, at n. 32 (U.S., Jan. 19, 1993) [Respondent Brief].

22 Respondent Brief at *4.

23 Respondent Brief at *42 n.32 and 47.

24 Respondent Brief at *40-41 (citing DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941, 958 (3d Cir. 1990)).

25 Brief for the United States as Amicus Curiae Supporting Respondent in Daubert v. Merrell Dow Pharmaceuticals, Inc., U.S. Supreme Court No. 92-102, 1993 WL 13006291 (U.S., Jan. 19, 1993) [U.S. Brief].

26 See, e.g., Hazelwood School District v. United States, 433 U.S. 299, 308-312 (1977); Castaneda v. Partida, 430 U.S. 482, 495-499 & nn.16-18 (1977) (“As a general rule for such large samples, if the difference between the expected value and the observed number is greater than two or three standard deviations, then the hypothesis that the jury drawing was random would be suspect to a social scientist.”).

27 U.S. Brief at *3-4. Nearly two decades later, when politically convenient, the United States government submitted an amicus brief in a case involving alleged securities fraud for failing to disclose adverse events of an over-the-counter medication. In Matrixx Initiatives, Inc. v. Siracusano, 131 S. Ct. 1309 (2011), the securities fraud plaintiffs contended that they need not plead “statistically significant” evidence for adverse drug effects. The Solicitor General’s office, along with counsel for the Food and Drug Division of the Department of Health & Human Services, in their zeal to assist plaintiffs disclaimed the necessity, or even the importance, of statistical significance:

“[w]hile statistical significance provides some indication about the validity of a correlation between a product and a harm, a determination that certain data are not statistically significant … does not refute an inference of causation.”

Brief for the United States as Amicus Curiae Supporting Respondents, in Matrixx Initiatives, Inc. v. Siracusano, 2010 WL 4624148, at *14 (Nov. 12, 2010).

28 U.S. Brief at *5.

29 U.S. Brief at *5-6 (citing David Freedman, R. Pisani, R. Purves & A. Adhikari, Statistics 351, 397 (2d ed. 1991)).

30 U.S. Brief at *6 (citing Freedman’s text at 351) (emphasis added).

31 See Joan E. Bertin & Mary S. Henifin, “Science, Law, and the Search for Truth in the Courtroom: Lessons from Daubert v. Merrell Dow,” 22 J. Law, Medicine & Ethics 6 (1994); Joan E. Bertin & Mary Sue Henifin, “Scientists Talk to Judges: Reflections on Daubert v. Merrell Dow,” 4(3) New Solutions 3 (1994). The authors’ choice of the New Solutions journal is interesting and curious. New Solutions: A Journal of Environmental and Occupational Health Policy was published by the Oil, Chemical and Atomic Workers International Union, under the control of Anthony Mazzocchi (June 13, 1926 – Oct. 5, 2002), who was the union’s secretary-treasurer. Anthony Mazzocchi, “Finding Common Ground: Our Commitment to Confront the Issues,” 1 New Solutions 3 (1990); see also Steven Greenhouse, “Anthony Mazzocchi, 76, Dies; Union Officer and Party Father,” N.Y. Times (Oct. 9, 2002). Even a cursory review of this journal’s contents reveals how interested and invested, even obsessed, the union was in the litigation industry and that industry’s expert witnesses.

 

“Each and Every Exposure” Is a Substantial Factor

December 3rd, 2018

“Every time a bell rings an angel gets his wings”
It’s a Wonderful Life (1946)

Every time a plaintiff shows the smallest imaginable exposure, there is a full recovery.
… The American tort system.

 

In 1984, Philadelphia County had a non-jury system for asbestos personal injury cases, with a right to “appeal” for a de novo trial with a jury. The non-jury trials were a wonderful training ground for a generation of trial lawyers, and for a generation or two of testifying expert witnesses. When I started to try asbestos cases as a young lawyer, the plaintiffs’ counsel had already taught their expert witnesses to include the “each and every exposure” talismanic language in their direct examination testimonies on the causation of the plaintiffs’ condition. The litigation industry had figured out that this expression would help avoid a compulsory non-suit on proximate causation.

Back in those wild, woolly frontier days, I encountered the slick Dr. Joseph Sokolowski (“Sok”), a pulmonary physician in private practice in New Jersey. Sok, like many other pulmonary physicians in the Delaware Valley area, had seen civilian workers referred by Philadelphia Naval Shipyard to be evaluated for asbestosis. When the plaintiff-friendly physicians diagnosed asbestosis, a few preferred firms would then pursue their claims under the Federal Employees Compensation Act (FECA). The United States government would notify the workers of their occupational disease, and urge them to pursue the government’s outside vendors of asbestos-containing materials, with a reminder that the government had a lien against any civil action recovery. The federal government thus made common cause with the niche law practices of workers’ compensation lawyers,1 and helped launch the tsunami of asbestos litigation.2

Sok was perfect for his role in the federal kick-back scheme. He could deliver the most implausible testimony, and weather brutal cross-examination without flinching. He had the face of a choir boy, and his service as an outside examiner for the Navy Yard employees gave his diagnoses the apparent imprimatur of the federal government. Although Sok had no real understanding of epidemiology, he could readily master the Selikoff litany of 5-10-50, for relative risks for lung cancer, from asbestos alone (supposedly), from smoking alone, and from asbestos and smoking combined, respectively. And he similarly mastered his lines that “each and every exposure” is substantial, when pressed on whether and how exposure to a minor vendor’s product was a substantial factor. Back in those days, before Johns-Manville (JM) Corporation went bankrupt, honest witnesses at the Navy Yard acknowledged that JM supplied the vast majority of asbestos products, but that testimony changed literally over the course of a trial day, when the plaintiffs’ bar learned of the JM bankruptcy.

It was into this topsy-turvy litigation world that I was thrown. I had the sense that there was no basis for the “each and every exposure” opinion, but my elders at the defense bar seemed to avoid the opinion studiously on cross-examination. I recall co-defendants’ counsels’ looks of horror and disapproval when I broached the topic in my first cross-examination. Sok had known to incorporate the “each and every exposure” opinion into his direct testimony, but he had no intelligible response to my question about what possible basis there was for the opinion. “Well, we have to blame each and every exposure because we have no way to distinguish among exposures.” I could not let it lie there, and so I asked: “So your opinion about each and every exposure is based upon your ignorance?” My question was quickly met with an objection, and just as quickly with a rather loud and disapproving, “Sustained!” When Sok finished his testimony, I moved to strike his substantial factor opinion as having no foundation, but my motion was met with judicial annoyance and apathy.

And so I learned that science and logic had nothing to do with asbestos litigation. Some determined defense counsel persevered, however, and in the face of over one hundred bankruptcies,3 a few courts started to take seriously the evidence and arguments against the “every exposure” testimony. Last week, the New York Court of Appeals, New York’s highest court, agreed to state out loud that the plaintiffs’ “every exposure” theory had no clothes, no foundation, and no science. Juni v. A.O. Smith Water Products Co., No. 123, N.Y. Court of Appeals (Nov. 27, 2018).4

In a short, concise opinion, with a single dissent, the Court held that plaintiffs’ evidence (any exposure, no matter how trivial) in a mesothelioma death case was “insufficient as a matter of law to establish that respondent Ford Motor Co.’s conduct was a proximate cause of the decedent’s injuries.” The ruling affirmed the First Department’s affirmance of a trial court’s judgment notwithstanding the $11 million jury verdict against Ford.5 Arguing for the proposition that every exposure is substantial, over three dozen scientists, physicians, and historians, most of whom regularly support and testify for the litigation industry, filed a brief in support of the plaintiffs.6 The Atlantic Legal Foundation filed an amicus brief on behalf of several scientists,7 and I had the privilege of filing an amicus brief on behalf of the Coalition for Litigation Justice and nine other organizations in support of Ford’s positions.8

It has been 34 years since I first encountered the “every exposure is substantial” dogma in a Philadelphia courtroom. Sometimes in litigation, it takes a long time to see the truth come out.


1 E.g., Shein and Brookman; Greitzer & Locks; both of Philadelphia.

2 Encouraging litigation against its suppliers, the federal government pulled off a coup of misdirection. First, it deflected public censure from the Navy and other governmental branches for its own carelessness in the use, installation, and removal of asbestos-containing insulations. Second, the government winnowed the ranks of older, better compensated workers. Third, and most diabolically, the government, which was self-insured for FECA claims, recovered most of its outlay when its former employees recovered judgments or settlements against the government’s outside asbestos product vendors. “The United States Government’s Role in the Asbestos Mess” (Jan. 31, 2012). See also Walter Olson, “Asbestos awareness pre-Selikoff,” Point of Law (Oct. 19, 2007); “The U.S. Navy and the asbestos calamity,” Point of Law (Oct. 9, 2007).

4 The plaintiffs were represented by Alani Golanski of Weitz & Luxenberg LLP.

6 Abby Lippman, Annie Thebaud Mony, Arthur L. Frank, Barry Castleman, Bruce P. Lanphear, Celeste Monforton, Colin L. Soskolne, Daniel Thau Teitelbaum, Dario Consonni, Dario Mirabelli, David Egilman, David F. Goldsmith, David Ozonoff, David Rosner, Fiorella Belpoggi, James Huff, John Heinzow, John M. Dement, John Coulter Maddox, Karl T. Kelsey, Kathleen Ruff, Kenneth D. Rosenman, L. Christine Oliver, Laura Welch, Leslie Thomas Stayner, Morris Greenberg, Nachman Brautbar, Philip J. Landrigan, Xaver Baur, Hans-Joachim Woitowitz, Bice Fubini, Richard Kradin, T.K. Joshi, Theresa S. Emory, Thomas H. Gassert, Tony Fletcher, and Yv Bonnier Viger.

7 John Henderson Duffus, Ronald E. Gots, Arthur M. Langer, Robert Nolan, Gordon L. Nord, Alan John Rogers, and Emanuel Rubin.

8 Amici Curiae Brief of Coalition for Litigation Justice, Inc., Business Council of New York State, Lawsuit Reform Alliance of New York, New York Insurance Association, Inc., Northeast Retail Lumber Association, National Association of Manufacturers, Chamber of Commerce of the U.S.A., American Tort Reform Association, American Insurance Association, and NFIB Small Business Legal Center Supporting Defendant-Respondent Ford Motor Company.

The “Rothman” Amicus Brief in Daubert v. Merrell Dow Pharmaceuticals

November 17th, 2018

“Then time will tell just who fell
And who’s been left behind”

                  Dylan, “Most Likely You Go Your Way” (1966)

 

When the Daubert case headed to the Supreme Court, it had 22 amicus briefs in tow. Today that number is routine for an appeal to the high court, but in 1992, it was a signal of intense interest in the case among both the scientific and legal communities. To the litigation industry, the prospect of judicial gatekeeping of expert witness testimony was anathema. To the manufacturing industry, the prospect was precious to defend against specious claiming.

With the benefit of 25 years of hindsight, a look at some of those amicus briefs reveals a good deal about the scientific and legal acumen of the “friends of the court.” Not all amicus briefs in the case were equal; not all have held up well in the face of time. The amicus brief of the American Association for the Advancement of Science and the National Academy of Sciences was a good example of advocacy for the full implementation of gatekeeping on scientific principles of valid inference.1 Other amici urged an anything goes approach to judicial oversight of expert witnesses.

One amicus brief often praised by plaintiffs’ counsel was submitted by Professor Kenneth Rothman and colleagues.2 This amicus brief is still cited by parties who find support in the brief for their excuses for not having consistent, valid, strong, and statistically significant evidence to support their claims of causation. To be sure, Rothman did target statistical significance as a strict criterion of causal inference, but there is little support in the brief for the loosey-goosey style of causal claiming that is so prevalent among lawyers for the litigation industry. Unlike the brief filed by the AAAS and the National Academy of Sciences, Rothman’s brief abstained from the social policies implied by judicial gatekeeping or its rejection. Instead, Rothman’s brief set out to make three narrow points:

(1) courts should not rely upon strict statistical significance testing for admissibility determinations;

(2) peer review is not an appropriate touchstone for the validity of an expert witness’s opinion; and

(3) unpublished, non-peer-reviewed “reanalysis” of studies is a routine part of the scientific process, and regularly practiced by epidemiologists and other scientists.

Rothman was encouraged to target these three issues by the lower courts’ opinions in the Daubert case, in which the courts made blanket statements about the role of absent statistical significance and peer review, and the illegitimacy of “re-analyses” of published studies.

Professor Rothman has made many admirable contributions to epidemiologic practice, but the amicus brief submitted by him and his colleagues falls into the trap of making the sort of blanket general statements that they condemned in the lower courts’ opinions. Of the brief’s three points, the first, about statistical significance, is the most important for epidemiologic and legal practice. Despite reports of an odd journal here or there “abolishing” p-values, most medical journals continue to require the presentation of either p-values or confidence intervals. In the majority of medical journals, 95% confidence intervals that exclude a null hypothesis risk ratio of 1.0, or risk difference of 0, are labelled “statistically significant,” sometimes improvidently in the presence of multiple comparisons and lack of pre-specification of outcome.
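The two ideas in the preceding paragraph, labelling an interval “significant” when it excludes the null value, and the hazard of doing so across multiple comparisons, can be put in a few lines of arithmetic. What follows is only a minimal sketch under stated assumptions: the function names are hypothetical, the tests are assumed to be independent, and every null hypothesis is assumed to be true, which is rarely a fair description of a real study.

```python
def excludes_null(lower: float, upper: float, null_value: float = 1.0) -> bool:
    """Return True when the interval [lower, upper] does not contain the null value.

    For a risk ratio the null value is 1.0; for a risk difference it is 0.
    """
    return not (lower <= null_value <= upper)


def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among independent tests of true nulls.

    Assumes independence between tests, which is an idealization.
    """
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    # One comparison keeps the nominal 5% rate; many comparisons do not.
    for k in (1, 5, 10, 20):
        print(k, round(familywise_error_rate(k), 3))
```

On these assumptions, ten unplanned comparisons at the 5% level carry roughly a 40% chance of at least one nominally “significant” result by chance alone, which is why the label can be improvident when outcomes were not pre-specified.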

For over three decades, Rothman has criticized the prevailing practice on statistical significance. Professor Rothman is also well known for his advocacy for the superiority of confidence intervals over p-values in conveying important information about what range of values are reasonably compatible with the observed data.3 His criticisms of p-values and his advocacy for estimation with intervals have pushed biomedical publishing to embrace confidence intervals as more informative than just p-values. Still, his views on statistical significance have never gained complete acceptance at most clinical journals. Biomedical scientists continue to interpret 95% confidence intervals, at least in part, as to whether they show “significance” by excluding the null hypothesis value of no risk difference or of risk ratios equal to 1.0.

The first point in Rothman’s amicus brief is styled:

“THE LOWER COURTS’ FOCUS ON SIGNIFICANCE TESTING IS BASED ON THE INACCURATE ASSUMPTION THAT ‘STATISTICAL SIGNIFICANCE’ IS REQUIRED IN ORDER TO DRAW INFERENCES FROM EPIDEMIOLOGICAL INFORMATION”

The challenge by Rothman and colleagues to the “assumption” that statistical significance is necessary is what, of course, has endeared this brief to the litigation industry. A close read of the brief, however, shows that Rothman’s critique of the assumption is equivocal. Rothman et amici characterized the lower courts as having given:

“blind deference to inappropriate and arcane publication standards and ‘significance testing’.”4

The brief is silent about what might be knowing deference, or appropriate publication standards. To be sure, judges have often poorly expressed their reasoning for deciding scientific evidentiary issues, and perhaps poor communication or laziness by judges was responsible for Rothman’s interest in joining the Daubert fray. Putting aside the unclear, rhetorical, and somewhat hyperbolic use of “arcane” in the quote above, the suggestion of inappropriate blind deference is itself expressed in equivocal terms in the brief. At times the authors rail at the use of statistical significance as the “sole” criterion, and at times, they seem to criticize its use at all.

At least twice in their brief, Rothman and friends declare that the lower court:

“misconstrues the validity and appropriateness of significance testing as a decision making tool, apparently deeming it the sole test of epidemiological hypotheses.”5

* * * * * *

“this Court should reject significance testing as the sole acceptable criterion of scientific validity in epidemiology.”6

Characterizing “statistical significance” as not the sole test or criterion of scientific inference is hardly controversial, and it implies that statistical significance is one test, criterion, or factor among others. This position is consistent with the current ASA Statement on Significance Testing.7 There is, of course, much more to evaluate in a study or a body of studies, than simply whether they individually or collectively help us to exclude chance as an explanation for their findings.

Statistical Significance Is Not Necessary At All

Elsewhere, Rothman and friends take their challenge to statistical significance testing beyond merely suggesting that such testing is only one test or criterion among others. Indeed, their brief in other places states their opinion that significance testing is not necessary at all:

“Testing for significance, however, is often mistaken for a sine qua non of scientific inference.”8

And at other times, Rothman and friends go further yet and claim not only that significance is not necessary, but that it is not even appropriate or useful:

“Significance testing, however, is neither necessary nor appropriate as a requirement for drawing inferences from epidemiologic data.”9

Rothman compares statistical significance testing with “scientific inference,” which is not a mechanical, mathematical procedure, but rather a “thoughtful evaluation[] of possible explanations for what is being observed.”10 Significance testing, in contrast, is “merely a statistical tool,” used inappropriately “in the process of developing inferences.”11 Rothman suggests that the term “statistical significance” could be eliminated from scientific discussions without loss of meaning, and this linguistic legerdemain shows that the phrase is unimportant in science and in law.12 Rothman’s suggestion, however, ignores that causal assessments have always required an evaluation of the play of chance, especially for putative causes, which are neither necessary nor sufficient, and which modify underlying stochastic processes by increasing or decreasing the probability of a specified outcome. Asserting that statistical significance is misleading because it never describes the size of an association, which the Rothman brief does, is like telling us that color terms tell us nothing about the mass of a body.

The Rothman brief does make the salutary point that labeling a study outcome as not “statistically significant” carries the danger that the study’s data will be taken to have no value, or that the study will be taken to reject the hypothesized association. In 1992, such an interpretation may have been more common, but today, in the face of the proliferation of meta-analyses, the risk of such interpretations of single study outcomes is remote.

Questionable History of Statistics

Rothman suggests that the development of statistical hypothesis testing occurred in the context of agricultural and quality-control experiments, which required yes-no answers for future action.13 This suggestion clearly points at Sir Ronald Fisher and Jerzy Neyman, and their foundational work on frequentist statistical theory and practice. In part, the amici correctly identified the experimental milieu in which Fisher worked, but the description of Fisher’s work is neither accurate nor fair. Fisher spent a lifetime thinking and writing about statistical tests, in much more nuanced ways than implied by the claim that such testing occurred in the context of agricultural and quality-control experiments. Although Fisher worked on agricultural experiments, his writings acknowledged that when statistical tests and analyses were applied to observational studies, much more searching analyses of bias and confounding were required. Fisher’s and Berkson’s reactions to the observational studies of Hill and Doll on smoking and lung cancer are telling in this regard. These statisticians criticized the early studies of smoking and lung cancer, not for lack of statistical significance, but for failing to address confounding by a potential common genetic propensity to smoke and to develop lung cancer.

Questionable History of Drug Development

Twice in Rothman’s amicus brief, the authors suggest that “undue reliance” on statistical significance has resulted in overlooking “effective new treatments” because observed benefits were considered “not significant,” despite an “indication” of efficacy.14 The brief never provided any insight on what is due reliance and what is undue reliance on statistical significance. Their criticism of “undue reliance” implies that there are modes or instances of “due reliance” upon statistical significance. The amicus brief fails also to inform readers exactly what “effective new treatments” have been overlooked because the outcomes were considered “not significant.” This omission is regrettable because it leaves the reader with only abstract recommendations, without concrete examples of what such effective treatments might be. The omission was unfortunate because Rothman almost certainly could have marshalled examples. Recently, Rothman tweeted just such an example:15

“30% ↓ in cancer risk from Vit D/Ca supplements ignored by authors & editorial. Why? P = 0.06. http://bit.ly/2oanl6w http://bit.ly/2p0CRj7. The 95% confidence interval for the risk ratio was 0.42–1.02.”

Of course, this was a large, carefully reported randomized clinical trial, with a narrow confidence interval that just missed “statistical significance.” It is not an example that would have given succor to Bendectin plaintiffs, who were attempting to prove an association by identifying flaws in noisy observational studies that generally failed to show an association.
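The link between the interval and the p-value in the tweet can be checked with back-of-the-envelope arithmetic. The sketch below is not the trial authors’ method; it assumes, hypothetically, that the quoted interval is a Wald-type interval symmetric on the log scale, and it recovers an approximate two-sided p-value from the interval limits alone. The function name is my own.

```python
import math
from statistics import NormalDist


def p_value_from_ratio_ci(lower: float, upper: float, level: float = 0.95):
    """Approximate a ratio estimate and two-sided p-value from a confidence interval.

    Assumes the interval is symmetric around the estimate on the log scale
    (a Wald interval). Returns (implied_estimate, p_value) for the null
    hypothesis that the ratio equals 1.0.
    """
    if not (0 < lower < upper):
        raise ValueError("limits must satisfy 0 < lower < upper")
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for 95%
    log_lower, log_upper = math.log(lower), math.log(upper)
    log_estimate = (log_lower + log_upper) / 2.0        # midpoint on log scale
    std_error = (log_upper - log_lower) / (2.0 * z_crit)
    z = log_estimate / std_error                        # distance from log(1) = 0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))        # two-sided normal tail
    return math.exp(log_estimate), p_value


if __name__ == "__main__":
    estimate, p = p_value_from_ratio_ci(0.42, 1.02)
    print(f"implied estimate {estimate:.2f}, approximate p-value {p:.3f}")
```

Applied to the limits quoted in the tweet, the approximation lands close to the reported P = 0.06. The implied midpoint need not match the reported 30% reduction exactly, since published intervals are rounded and are not always computed this way.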

Readers of the 1992 amicus brief can only guess at what might be “indications of efficacy”; no explanation or examples are provided.16 The reality of FDA approvals of new drugs is that a pre-specified 5% level of statistical significance is virtually always enforced.17 If a drug sponsor has “indication of efficacy,” it is, of course, free to follow up with an additional, larger, better-designed clinical trial. Rothman’s recent tweet about the vitamin D clinical trial does provide some context and meaning to what the amici may have meant over 25 years ago by indication of efficacy. The tweet also illustrates Rothman’s acknowledgment of the need to address random variability in a data set, whether by p-value or confidence interval, or both. Clearly, Rothman was criticizing the authors of the vitamin D trial for stopping short of claiming that they had shown (or “demonstrated”) a cancer survival benefit. There is, however, a rich literature on vitamin D and cancer outcomes, and such a claim could be made, perhaps, in the context of a meta-analysis or meta-regression of multiple clinical trials, with a synthesis of other experimental and observational data.18

Questionable History of Statistical Analyses in Epidemiology

Rothman’s amicus brief deserves credit for introducing a misinterpretation of Sir Austin Bradford Hill’s famous paper on inferring causal associations, which has become catechism in the briefs of plaintiffs in pharmaceutical and other products liability cases:

“No formal tests of significance can answer those questions. Such tests can, and should, remind us of the effects that the play of chance can create, and they will instruct us in the likely magnitude of those effects. Beyond that they contribute nothing to the ‘proof’ of our hypothesis.”

Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 299 (1965) (quoted at Rothman Brief at *6).

As exegesis of Hill’s views, this quote is misleading. The language quoted above was used by Hill in the context of his nine causal viewpoints or criteria. The Rothman brief ignores Hill’s admonition to his readers, that before reaching the nine criteria, there is a serious, demanding predicate that must be shown:

“Disregarding then any such problem in semantics we have this situation. Our observations reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance. What aspects of that association should we especially consider before deciding that the most likely interpretation of it is causation?”

Id. at 295 (emphasis added). Rothman and co-authors did not have to invoke the prestige and authority of Sir Austin, but once they did, they were obligated to quote him fully and with accurate context. Elsewhere, in his famous textbook, Hill expressed his view that common sense was insufficient to interpret data, and that the statistical method was necessary to interpret data in medical studies.19

Rothman complains that statistical significance focuses the reader on conjecture on the role of chance in the observed data rather than the information conveyed by the data themselves.20 The “incompleteness” of statistical analysis for arriving at causal conclusions, however, is not an argument against its necessity.

The Rothman brief does make the helpful point that statistical significance cannot be sufficient to support a conclusion of causation because many statistically significant associations or correlations will be non-causal. They give a trivial example of wearing dresses and breast cancer, but the point is well-taken. Associations, even when statistically significant, are not necessarily causal conclusions. Who ever suggested otherwise, other than expert witnesses for the litigation industry?

Unnecessary Fears

The motivation for Rothman’s challenge to the assumption that statistical significance is necessary is revealed at the end of the argument on Point I. The authors plainly express their concern that false negatives will shut down important research:

“To give weight to the failure of epidemiological studies to meet strict ‘statistical significant’ standards — to use such studies to close the door on further inquiry — is not good science.”21

The relevance of this concern to the proceedings is a mystery. The judicial decisions in the case are not referenda on funding initiatives. Scientists were as free in 1993, after Daubert was decided, as they were in 1992, when Rothman wrote, to pursue the hypothesis that Bendectin caused birth defects. The decision had the potential to shut down tort claims, and left scientists to their tasks.

Reanalyses Are Appropriate Scientific Tools to Assess and Evaluate Data, and to Forge Causal Opinions

The Rothman brief took issue with the lower courts’ dismissal of plaintiffs’ expert witnesses’ re-analyses of data in published studies. The authors argued that reanalyses were part of the scientific method, and not “an arcane or specialized enterprise,” deserving of heightened or skeptical scrutiny.22

Remarkably, the Rothman brief, if accepted by the Supreme Court on the re-analysis point, would have led to the sort of unthinking blanket acceptance of a methodology, which the brief’s authors condemned in the context of blanket acceptance of significance testing. The brief covertly urges “blind deference” to its authors on the blanket approval of re-analyses.

Although amici have tight page limits, the brief’s authors made clear that they were offering no substantive opinions on the data involved in the published epidemiologic studies on Bendectin, or on the plaintiffs’ expert witnesses’ re-analyses. With the benefit of hindsight, we can see that the sweeping language used by the Ninth Circuit on re-analyses might have been taken to foreclose important and valid meta-analyses or similar approaches. The Rothman brief is not terribly explicit on what re-analysis techniques were part of the scientific method, but meta-analyses surely had been on the authors’ minds:

“by focusing on inappropriate criteria applied to determine what conclusions, if any, can be reached from any one study, the trial court forecloses testimony about inferences that can be drawn from the combination of results reported by many such studies, even when those studies, standing alone, might not justify such inferences.”23

The plaintiffs’ statistical expert witness in Daubert had proffered a re-analysis of at least one study by substituting a different control sample, as well as a questionable meta-analysis. By failing to engage on the propriety of the specific analyses at issue in Daubert, the Rothman brief failed to offer meaningful guidance to the appellate court.

Reanalyses Are Not Invalid Just Because They Have Not Been Published

Rothman was certainly correct that the value of peer review was overstated by the defense in Bendectin litigation.24 The quality of pre-publication peer review is spotty, at best. Predatory journals deploy a pay-to-play scheme, which makes a mockery of scientific publishing. Even at respectable journals, peer review cannot effectively guard against fraud, or ensure that statistical analyses have been appropriately done.25 At best, peer review is a weak proxy for study validity, and an unreliable one at that.

The Rothman brief may have moderated the Supreme Court’s reaction to the defense’s argument that peer review is a requirement for studies, or “re-analyses,” relied upon by expert witnesses. The Court in Daubert opined, in dicta, that peer review is a non-dispositive consideration:

“The fact of publication (or lack thereof) in a peer reviewed journal … will be a relevant, though not dispositive, consideration in assessing the scientific validity of a particular technique or methodology on which an opinion is premised.”26

To the extent that Rothman and colleagues might have been disappointed in this outcome, they missed some important context of the Bendectin cases. Most of the cases had been resolved by a consolidated causation issues trial, but many opt-out cases had to be tried in state and federal courts around the country.27 The expert witnesses challenged in Daubert (Drs. Swan and Done) participated in many of these opt-out cases, and in each case, they opined that Bendectin was a public health hazard. The failure of these witnesses to publish their analyses and re-analyses spoke volumes about their bona fides. Courts (and juries if the Swan and Done proffered testimony were admissible) could certainly draw negative inferences from the plaintiffs’ expert witnesses’ failure to publish their opinions and re-analyses.

The Fate of the “Rothman Approach” in the Courts

The so-called “Rothman approach” was urged by Bendectin plaintiffs in opposing summary judgment in a case pending in federal court, in New Jersey, before the Supreme Court decided Daubert. Plaintiffs resisted exclusion of their expert witnesses, who had relied upon inconsistent and statistically non-significant studies on the supposed teratogenicity of Bendectin. The trial court excluded the plaintiffs’ witnesses, and granted summary judgment.28

On appeal, the Third Circuit reversed and remanded the DeLucas’ case for a hearing under Rule 702:

“by directing such an overall evaluation, however, we do not mean to reject at this point Merrell Dow’s contention that a showing of a .05 level of statistical significance should be a threshold requirement for any statistical analysis concluding that Bendectin is a teratogen regardless of the presence of other indicia of reliability. That contention will need to be addressed on remand. The root issue it poses is what risk of what type of error the judicial system is willing to tolerate. This is not an easy issue to resolve and one possible resolution is a conclusion that the system should not tolerate any expert opinion rooted in statistical analysis where the results of the underlying studies are not significant at a .05 level.”29

After remand, the district court excluded the DeLuca plaintiffs’ expert witnesses, and granted summary judgment, based upon the dubious methods employed by plaintiffs’ expert witnesses in cherry picking data, recalculating risk ratios in published studies, and ignoring bias and confounding in studies. The Third Circuit affirmed the judgment for Merrell Dow.30

In the end, the decisions in the DeLuca case never endorsed the Rothman approach, although Professor Rothman can take credit perhaps for forcing the trial court, on remand, to come to grips with the informational content of the study data, and the many threats to validity, which severely undermined the relied-upon studies and the plaintiffs’ expert witnesses’ opinions.

More recently, in litigation over alleged causation of birth defects in offspring of mothers who used Zoloft during pregnancy, plaintiffs’ counsel attempted to resurrect, through their expert witnesses, the Rothman approach. The multidistrict court saw through counsel’s assertions that the Rothman approach had been adopted in DeLuca, or that it had become generally accepted.31 After protracted litigation in the Zoloft cases, the district court excluded plaintiffs’ expert witnesses and entered summary judgment for the defense. The Third Circuit found that the district court’s handling of the statistical significance issues was fully consistent with the Circuit’s previous pronouncements on the issue of statistical significance.32


1 The brief, filed in Daubert v. Merrell Dow Pharmaceuticals, Inc., U.S. Supreme Court No. 92-102 (Jan. 19, 1993), was submitted by Richard A. Meserve and Lars Noah, of Covington & Burling, and by Bert Black, 12 Biotechnology Law Report 198 (No. 2, March-April 1993); see “Daubert’s Silver Anniversary – Retrospective View of Its Friends and Enemies” (Oct. 21, 2018).

2 Brief Amici Curiae of Professors Kenneth Rothman, Noel Weiss, James Robins, Raymond Neutra and Steven Stellman, in Support of Petitioners, 1992 WL 12006438, Daubert v. Merrell Dow Pharmaceuticals, Inc., U.S. S. Ct. No. 92-102 (Dec. 2, 1992). [Rothman Brief].

3 Id. at *7.

4 Rothman Brief at *2.

5 Id. at *2-*3 (emphasis added).

6 Id. at *7 (emphasis added).

7 See Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” 70 The American Statistician 129 (2016).

8 Id. at *3.

9 Id. at *2.

10 Id. at *3 – *4.

11 Id. at *3.

12 Id. at *3.

13 Id. at *4 -*5.

14 Id. at *5, *6.

15 at <https://twitter.com/ken_rothman/status/855784253984051201> (April 21, 2017). The tweet pointed to: Joan Lappe, Patrice Watson, Dianne Travers-Gustafson, Robert Recker, Cedric Garland, Edward Gorham, Keith Baggerly, and Sharon L. McDonnell, “Effect of Vitamin D and Calcium Supplementation on Cancer Incidence in Older Women: A Randomized Clinical Trial,” 317 J. Am. Med. Ass’n 1234 (2017).

16 In the case of United States v. Harkonen, Professors Ken Rothman and Tim Lash, and I made common cause in support of Dr. Harkonen’s petition to the United States Supreme Court. The circumstances of Dr. Harkonen’s indictment and conviction provide a concrete example of what Dr. Rothman probably was referring to as “indication of efficacy.” I supported Dr. Harkonen’s appeal because I agreed that there had been a suggestion of efficacy, even if Harkonen had overstated what his clinical trial, standing alone, had shown. (There had been a previous clinical trial, which demonstrated a robust survival benefit.) From my perspective, the facts of the case supported Dr. Harkonen’s exercise of speech in a press release, but it would hardly have justified FDA approval for the indication that Dr. Harkonen was discussing. If Harkonen had indeed committed “wire fraud,” as claimed by the federal prosecutors, then I had (and still have) a rather long list of expert witnesses who stand in need of criminal penalties and rehabilitation for their overreaching opinions in court cases.

17 Robert Temple, “How FDA Currently Makes Decisions on Clinical Studies,” 2 Clinical Trials 276, 281 (2005); Lee Kennedy-Shaffer, “When the Alpha is the Omega: P-Values, ‘Substantial Evidence’, and the 0.05 Standard at FDA,” 72 Food & Drug L.J. 595 (2017); see also “The 5% Solution at the FDA” (Feb. 24, 2018).

18 See, e.g., Stefan Pilz, Katharina Kienreich, Andreas Tomaschitz, Eberhard Ritz, Elisabeth Lerchbaum, Barbara Obermayer-Pietsch, Veronika Matzi, Joerg Lindenmann, Winfried Marz, Sara Gandini, and Jacqueline M. Dekker, “Vitamin D and cancer mortality: systematic review of prospective epidemiological studies,” 13 Anti-Cancer Agents in Medicinal Chem. 107 (2013).

19 Austin Bradford Hill, Principles of Medical Statistics at 2, 10 (4th ed. 1948) (“The statistical method is required in the interpretation of figures which are at the mercy of numerous influences, and its object is to determine whether individual influences can be isolated and their effects measured.”) (emphasis added).

20 Id. at *6 -*7.

21 Id. at *9.

22 Id.

23 Id. at *10.

24 Rothman Brief at *12.

25 See William Childs, “Peering Behind The Peer Review Curtain,” Law360 (Aug. 17, 2018).

26 Daubert v. Merrell Dow Pharms., 509 U.S. 579, 594 (1993).

27 See “Diclegis and Vacuous Philosophy of Science” (June 24, 2015).

28 DeLuca v. Merrell Dow Pharms., Inc., 131 F.R.D. 71 (D.N.J. 1990).

29 DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941, 955 (3d Cir. 1990).

30 DeLuca v. Merrell Dow Pharms., Inc., 791 F. Supp. 1042 (D.N.J. 1992), aff’d, 6 F.3d 778 (3d Cir. 1993).

31 In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., MDL No. 2342; 12-md-2342, 2015 WL 314149 (E.D. Pa. Jan. 23, 2015) (Rufe, J.) (denying PSC’s motion for reconsideration), aff’d, 858 F.3d 787 (3d Cir. 2017) (affirming exclusion of plaintiffs’ expert witnesses’ dubious opinions, which involved multiple methodological flaws and failures to follow any methodology faithfully). See generally “Zoloft MDL Relieves Matrixx Depression” (Jan. 30, 2015); “WOE — Zoloft Escapes a MDL While Third Circuit Creates a Conceptual Muddle” (July 31, 2015).

32 See Pritchard v. Dow Agro Sciences, 430 F. App’x 102, 104 (3d Cir. 2011) (excluding Concussion hero, Dr. Bennet Omalu).

The American Statistical Association Statement on Significance Testing Goes to Court – Part I

November 13th, 2018

It has been two and one-half years since the American Statistical Association (ASA) issued its statement on statistical significance. Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” 70 The American Statistician 129 (2016) [ASA Statement]. When the ASA Statement was published, I commended it as a needed counterweight to the exaggerated criticisms of significance testing.1 Lawyers and expert witnesses for the litigation industry had routinely pooh-poohed the absence of statistical significance, but over-endorsed its presence in poorly designed and biased studies. Courts and lawyers from all sides routinely misunderstood, misstated, and misrepresented the meaning of statistical significance.2

The ASA Statement had potential to help resolve judicial confusion. It is written in non-technical language, which is easily understood by non-statisticians. Still, the Statement has to be read with care. The principle of charity led me to believe that lawyers and judges would read the Statement carefully, and that it would improve judicial gatekeeping of expert witnesses’ opinion testimony that involved statistical evidence. I am less sanguine now about the prospect of progress.

No sooner had the ASA issued its Statement than the spinning started. One scientist, and an editor of PLoS Biology, blogged that “the ASA notes, the importance of the p-value has been greatly overstated and the scientific community has become over-reliant on this one – flawed – measure.”3 Lawyers for the litigation industry were even less restrained in promoting wild misrepresentations about the Statement, with claims that the ASA had condemned the use of p-values, significance testing, and significance probabilities, as “flawed.”4 And yet, nowhere in the ASA’s statement does the group suggest that the p-value is a “flawed” measure.

Criminal Use of the ASA Statement

Where are we now, two plus years out from the ASA Statement? Not surprisingly, the Statement has made its way into the legal arena. The Statement has been used in any number of depositions, relied upon in briefs, and cited in at least a couple of judicial decisions, in the last two years. The empirical evidence of how the ASA Statement has been used, or might be used in the future, is still sparse. Just last month, the ASA Statement was cited by the Washington State Supreme Court, in a ruling that held the death penalty unconstitutional. State of Washington v. Gregory, No. 88086-7, (Wash. S.Ct., Oct. 11, 2018) (en banc). Mr. Gregory was facing the death penalty, after being duly convicted of rape, robbery, and murder. The prosecution was supported by DNA matches, fingerprint identification, and other evidence. Mr. Gregory challenged the constitutionality of his imposed punishment, not on per se grounds of unconstitutionality, but on race disparities in the imposition of the death penalty. On this claim, the Washington Supreme Court commented on the empirical evidence marshalled on Mr. Gregory’s behalf:

The most important consideration is whether the evidence shows that race has a meaningful impact on imposition of the death penalty. We make this determination by way of legal analysis, not pure science. At the very most, there is an 11 percent chance that the observed association between race and the death penalty in Beckett’s regression analysis is attributed to random chance rather than true association. Commissioner’s Report at 56-68 (the p-values range from 0.048-0.111, which measures the probability that the observed association is the result of random chance rather than a true association).[8] Just as we declined to require ‘precise uniformity’ under our proportionality review, we decline to require indisputably true social science to prove that our death penalty is impermissibly imposed based on race.

Id. (internal citations omitted).

Whatever you think of the death penalty, or how it is imposed in the United States, you will have to agree that the Court’s discussion of statistics is itself criminal. In the above quotation from the Court’s opinion, the Court badly misinterpreted the p-values generated in various regression analyses that were offered to support claims of race disparity. The Court’s equating statistically significant evidence of race disparity in these regression analyses with “indisputably true social science” also reflects a rhetorical strategy that imputes ridiculously high certainty (indisputably true) to social science conclusions in order to dismiss the need for them in order to accept a causal race disparity claim on empirical evidence.5
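The Court’s misreading can be made concrete. A p-value is calculated on the assumption that the null hypothesis is true; it is not the probability that chance explains the observed association. To move from one to the other requires additional inputs that the p-value does not supply. The following is a minimal sketch, using Bayes’ rule with entirely hypothetical inputs (the prior share of true null hypotheses and the power of the test are my assumptions for illustration, not figures from the Gregory record):

```python
def prob_null_given_significant(prior_null: float, alpha: float, power: float) -> float:
    """Probability that the null is true, given a result significant at level alpha.

    prior_null: assumed share of tested hypotheses for which the null is true
    alpha:      significance threshold (chance of a significant result under the null)
    power:      assumed chance of a significant result when the null is false
    """
    for name, value in (("prior_null", prior_null), ("alpha", alpha), ("power", power)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    false_positives = alpha * prior_null
    true_positives = power * (1.0 - prior_null)
    return false_positives / (false_positives + true_positives)


if __name__ == "__main__":
    # Same threshold, different assumed priors: the answer moves a great deal.
    for prior in (0.5, 0.9):
        print(prior, round(prob_null_given_significant(prior, alpha=0.05, power=0.8), 3))
```

With the threshold held fixed, the probability that a “significant” association is a chance finding shifts substantially as the assumed prior and power change. No single p-value, whether 0.048 or 0.111, can be read off as “the chance that the association is due to random chance.”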

Gregory’s counsel had briefed the Washington Court on statistical significance, and raised the ASA Statement as excuse and justification for not presenting statistically significant empirical evidence of race disparity.6 Footnote 8, in the above quote from the Gregory decision, shows that the Court was aware of the ASA Statement, which makes the Court’s errors even more unpardonable:

“[8] The most common p-value used for statistical significance is 0.05, but this is not a bright line rule. The American Statistical Association (ASA) explains that the ‘mechanical “bright-line” rules (such as “p < 0.05”) for justifying scientific claims or conclusions can lead to erroneous beliefs and poor decision making’.”7

Conveniently, Gregory’s counsel did not cite to other parts of the ASA Statement, which would have called for a more searching review of the statistical regression analyses:

“Good statistical practice, as an essential component of good scientific practice, emphasizes principles of good study design and conduct, a variety of numerical and graphical summaries of data, understanding the phenomenon under study, interpretation of results in context, complete reporting and proper logical and quantitative understanding of what data summaries mean. No single index should substitute for scientific reasoning.”8

The Supreme Court of Washington first erred in its assessment of what scientific evidence requires in terms of a burden of proof. It then accepted spurious arguments to excuse the absence of statistical significance in the statistical evidence before it, on the basis of a distorted representation of the ASA Statement. Finally, the Court erred in claiming support from social science evidence, by ignoring other methodological issues in Gregory’s empirical claims. Ironically, the Court had made significance testing the end all and be all of its analysis, and when it dispatched statistical significance as a consideration, the Court jumped to the conclusion it wanted to reach. Clearly, the intended message of the ASA Statement had been subverted by counsel and the Court.

2 See, e.g., In re Ephedra Prods. Liab. Litig., 393 F.Supp. 2d 181, 191 (S.D.N.Y. 2005). See also “Confidence in Intervals and Diffidence in the Courts” (March 4, 2012); “Scientific illiteracy among the judiciary” (Feb. 29, 2012).

5 Moultrie v. Martin, 690 F.2d 1078, 1082 (4th Cir. 1982) (internal citations omitted) (“When a litigant seeks to prove his point exclusively through the use of statistics, he is borrowing the principles of another discipline, mathematics, and applying these principles to the law. In borrowing from another discipline, a litigant cannot be selective in which principles are applied. He must employ a standard mathematical analysis. Any other requirement defies logic to the point of being unjust. Statisticians do not simply look at two statistics, such as the actual and expected percentage of blacks on a grand jury, and make a subjective conclusion that the statistics are significantly different. Rather, statisticians compare figures through an objective process known as hypothesis testing.”).

6 Supplemental Brief of Allen Eugene Gregory, at 15, filed in State of Washington v. Gregory, No. 88086-7, (Wash. S.Ct., Jan. 22, 2018).

7 State of Washington v. Gregory, No. 88086-7, (Wash. S.Ct., Oct. 11, 2018) (en banc) (internal citations omitted).

8 ASA Statement at 132.