For your delectation and delight, desultory dicta on the law of delicts.

Expert Witnesses Who Don’t Mean What They Say

March 24th, 2019

‘Then you should say what you mean’, the March Hare went on.
‘I do’, Alice hastily replied; ‘at least–at least I mean what I say–that’s the same thing, you know’.
‘Not the same thing a bit!’ said the Hatter. ‘You might just as well say that “I see what I eat” is the same thing as “I eat what I see!”’

Lewis Carroll, Alice’s Adventures in Wonderland, Chapter VII (1865)

Anick Bérard is an epidemiologist at the Université de Montréal. Most of her publications involve birth outcomes and maternal medication use, but Dr. Bérard’s advocacy also involves social media (Facebook, YouTube) and expert witnessing in litigation against the pharmaceutical industry.

When the FDA issued its alert about cardiac malformations in children born to women who took Paxil (paroxetine) in their first trimesters of pregnancy, the agency characterized its assessment of the “early results of new studies for Paxil” as “suggesting that the drug increases the risk for birth defects, particularly heart defects, when women take it during the first three months of pregnancy.”1 The agency also disclaimed any conclusion of “class effect” among the other selective serotonin reuptake inhibitors (SSRIs), such as Zoloft (sertraline), Celexa (citalopram), and Prozac (fluoxetine). Indeed, the FDA requested the manufacturer of paroxetine to undertake additional research to look at teratogenicity of paroxetine, as well as the possibility of class effects. That research never showed an SSRI teratogenicity class effect.

A “suggestion” from the FDA of an adverse effect is sufficient to launch a thousand litigation complaints, which were duly filed against GlaxoSmithKline. The plaintiffs’ counsel recruited Dr. Bérard to serve as an expert witness in support of a wide array of birth defects in Paxil cases. In her hands, the agency’s “suggestion” of causation became a conclusion. The defense challenged Bérard’s opinions, but the federal court motion to exclude her causal opinions was taken under advisement, without decision. Hayes v. SmithKline Beecham Corp., 2009 WL 4912178 (N.D. Okla. Dec. 14, 2009). One case in state court went to trial, with a verdict for plaintiffs.

Despite Dr. Bérard’s zealous advocacy for a causal association between Paxil and birth defects, she declined to assert any association between maternal use of the other, non-paroxetine SSRIs and birth defects. Here is an excerpt from her Rule 26 report in a paroxetine case:

“Taken together, the available scientific evidence makes it clear that Paxil use during the first trimester of pregnancy is an independent risk factor that at least doubles the risk of cardiovascular malformations in newborns at all commonly used doses. This risk has been consistent and was further reinforced by repeated observational study findings as well as meta-analyses results. No such associations were found with other types of SSRI exposures during gestation.”2

In her sworn testimony, Dr. Bérard made clear that she really meant what she had written in her report, about exculpating the non-paroxetine SSRIs of any association with birth defects:

“Q. Is it fair to say that you will not be offering an opinion that SSRIs as a class, or individual SSRIs other than Paxil increased the risk of cardiovascular malformations in newborns?

A. This is not what I was asked to do.

Q. But in fact you actually write in your report that you don’t believe there’s sufficient data to reach any conclusion about other SSRIs, true?

A. Correct.”3

In 2010, Dr. Bérard, along with two professional colleagues, published what they called a systematic review of antidepressant use in pregnancy and birth outcomes.4 In this review, Bérard specifically advised that paroxetine should be avoided by women of childbearing age, but she and her colleagues affirmatively encouraged use of other SSRIs, such as fluoxetine, sertraline, and citalopram:

Clinical Approach: A Brief Overview

For women planning a pregnancy or when a treatment initiation during pregnancy is deemed necessary, the decision should rely not only on drug safety data but also on other factors such as the patient’s condition, previous response to other antidepressants, comorbidities, expected adverse effects and potential interactions with other current pharmacological treatments. Since there is a more extensive clinical experience with SSRIs such as fluoxetine, sertraline, and citalopram, these agents should be used as first-line therapies. Whenever possible, one should refrain from prescribing paroxetine to women of childbearing potential or planning a pregnancy. However, antenatal screening such as fetal echocardiography should be considered in a woman exposed prior to finding out about her pregnancy.5

When Bérard wrote and published her systematic review, she was still actively involved as an expert witness for plaintiffs in lawsuits against the manufacturers of paroxetine. In her 2010 review, Dr. Bérard gave no acknowledgment of monies earned in her capacity as an expert witness, and her disclosure of potential conflicts of interest was limited to noting that she was “a consultant for a plaintiff in the litigation involving Paxil.”6 In fact, Bérard had submitted multiple reports, testified at deposition, and had been listed as a testifying expert witness in many cases involving Paxil or paroxetine.

Not long after the 2010 review article, Glaxo settled most of the pending paroxetine birth defect cases, and the plaintiffs’ bar pivoted to recast their expert witnesses’ opinions as causal teratogenic conclusions about the entire class of SSRIs. In 2012, the federal courts established a “multi-district litigation,” MDL 2342, for birth defect cases involving Zoloft (sertraline), in the Philadelphia courtroom of Judge Cynthia Rufe, in the Eastern District of Pennsylvania.

Notwithstanding her 2010 clinical advice that pregnant women with depression should use fluoxetine, sertraline, or citalopram, Dr. Bérard became actively involved in the new litigation against the other, non-Paxil SSRI manufacturers. By 2013, Dr. Bérard was on record as a party expert witness for plaintiffs, opining that sertraline causes virtually every major congenital malformation.7

In the same year, 2013, Dr. Bérard published another review article on teratogens, but now she gave a more equivocal view of the other SSRIs, claiming that they were “known teratogens,” but acknowledging in a footnote that teratogenicity of the SSRIs was “controversial.”8 Incredibly, this review article states that “Anick Bérard and Sonia Chaabane have no potential conflicts of interest to disclose.”9

Ultimately, Dr. Bérard could not straddle her own contradictory statements and remain upright, which encouraged the MDL court to examine her opinions closely for methodological shortcomings and failures. Although Bérard had evolved to claim a teratogenic “class effect” for all the SSRIs, the scientific support for her claim was somewhere between weak and absent.10 Perhaps even more distressing, many of the pending claims involving the other SSRIs arose from pregnancies and births that predated Bérard’s epiphany about class effect. Finding ample evidence of specious claiming, the federal court charged with oversight of the sertraline birth defect claims excluded Dr. Bérard’s causal opinions for failing to meet the requirements of Federal Rule of Evidence 702.11

Plaintiffs sought to substitute Nicholas Jewell for Dr. Bérard, but Dr. Jewell fared no better, and was excluded for other methodological shenanigans.12 Ultimately, a unanimous panel of the United States Court of Appeals, for the Third Circuit, upheld the expert witness exclusions.13

1 See “FDA Advising of Risk of Birth Defects with Paxil; Agency Requiring Updated Product Labeling,” P05-97 (Dec. 8, 2005) (emphasis added).

2 Bérard Report in Hayes v. SmithKline Beecham Corp, 2009 WL 3072955, at *4 (N.D. Okla. Feb. 4, 2009) (emphasis added).

3 Deposition Testimony of Anick Bérard, in Hayes v. SmithKline Beecham Corp., at 120:16-25 (N.D. Okla. April 2009).

4 Marieve Simoncelli, Brigitte-Zoe Martin & Anick Bérard, “Antidepressant Use During Pregnancy: A Critical Systematic Review of the Literature,” 5 Current Drug Safety 153 (2010).

5 Id. at 168b.

6 Id. at 169 (emphasis added).

7 See Anick Bérard, “Expert Report” (June 19, 2013).

8 Sonia Chaabane & Anick Bérard, “Epidemiology of Major Congenital Malformations with Specific Focus on Teratogens,” 8 Current Drug Safety 128, 136 (2013).

9 Id. at 137b.

10 See, e.g., Nicholas Myles, Hannah Newall, Harvey Ward, and Matthew Large, “Systematic meta-analysis of individual selective serotonin reuptake inhibitor medications and congenital malformations,” 47 Australian & New Zealand J. Psychiatry 1002 (2013).

11 See In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., MDL No. 2342; 26 F.Supp. 3d 449 (E.D.Pa. 2014) (Rufe, J.). Plaintiffs, through their Plaintiffs’ Steering Committee, moved for reconsideration, but Judge Rufe reaffirmed her exclusion of Dr. Bérard. In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., MDL No. 2342; 12-md-2342, 2015 WL 314149 (E.D. Pa. Jan. 23, 2015) (Rufe, J.) (denying PSC’s motion for reconsideration). See “Zoloft MDL Relieves Matrixx Depression” (Jan. 30, 2015).

12 See In re Zoloft Prods. Liab. Litig., No. 12–md–2342, 2015 WL 7776911 (E.D. Pa. Dec. 2, 2015) (excluding Jewell’s opinions as scientifically unwarranted and methodologically flawed); In re Zoloft Prod. Liab. Litig., MDL NO. 2342, 12-MD-2342, 2016 WL 1320799 (E.D. Pa. April 5, 2016) (granting summary judgment after excluding Dr. Jewell). See also “The Education of Judge Rufe – The Zoloft MDL” (April 9, 2016).

The Contrivance Standard for Gatekeeping

March 23rd, 2019

According to Google ngram, the phrase “junk science” made its debut circa 1975, lagging junk food by about five years. See “The Rise and Rise of Junk Science” (Mar. 8, 2014). I have never much liked the phrase “junk science” because it suggests that courts need only be wary of the absurd and ridiculous in their gatekeeping function. Some expert witness opinions are, in fact, serious scientific contributions, just not worthy of being advanced as scientific conclusions. Perhaps better than “junk” would be patho-epistemologic opinions, or maybe even wissenschmutz, but even these terms might obscure that the opinion that needs to be excluded derives from serious scientific work, only it is not ready to be held forth as a scientific conclusion that can be colorably called knowledge.

Another formulation of my term, patho-epistemology, is the Eleventh Circuit’s lovely “Contrivance Standard.” Rink v. Cheminova, Inc., 400 F.3d 1286, 1293 & n.7 (11th Cir. 2005). In Rink, the appellate court held that the district court had acted within its discretion to exclude expert witness testimony because it had properly confined its focus to the challenged expert witness’s methodology, not his credibility:

“In evaluating the reliability of an expert’s method, however, a district court may properly consider whether the expert’s methodology has been contrived to reach a particular result. See Joiner, 522 U.S. at 146, 118 S.Ct. at 519 (affirming exclusion of testimony where the methodology was called into question because an “analytical gap” existed “between the data and the opinion proffered”); see also Elcock v. Kmart Corp., 233 F.3d 734, 748 (3d Cir. 2000) (questioning the methodology of an expert because his “novel synthesis” of two accepted methodologies allowed the expert to “offer a subjective judgment … in the guise of a reliable expert opinion”).”

Note the resistance, however, to the Supreme Court’s mandate of gatekeeping. District courts must apply the statutes, Rules of Evidence 702 and 703. There is no legal authority for the suggestion that a district court “may properly consider whether the expert’s methodology has been contrived.” Rink, 400 F.3d at 1293 n.7 (emphasis added).

The Joiner Finale

March 23rd, 2019

“This is the end
Beautiful friend

This is the end
My only friend, the end”

Jim Morrison, “The End” (c. 1966)

The case of General Electric Co. v. Joiner, 522 U.S. 136 (1997), was based only in part upon polychlorinated biphenyl (PCB) exposures. The PCB part did not hold up well legally in the Supreme Court; nor was the PCB lung cancer claim vindicated by later scientific evidence. See “How Have Important Rule 702 Holdings Held Up With Time?” (Mar. 20, 2015).

The Supreme Court in Joiner reversed and remanded the case to the 11th Circuit, which then remanded the case back to the district court to address claims that Mr. Joiner had been exposed to furans and dioxins, and that these other chemicals had caused, or contributed to, his lung cancer, as well. Joiner v. General Electric Co., 134 F.3d 1457 (11th Cir. 1998) (per curiam). Thus the dioxins were left in the case even after the Supreme Court ruled.

After the Supreme Court’s decision, Anthony Roisman argued that the Court had addressed an artificial question when asked about PCBs alone because the case was really about an alleged mixture of exposures, and he held out hope that the Joiners would do better on remand. Anthony Z. Roisman, “The Implications of G.E. v. Joiner for Admissibility of Expert Testimony,” 1 Res Communes 65 (1999).

Many Daubert observers (including me) are unaware of the legal fate of the Joiners’ claims on remand. In the only reference I could find, the commentator simply noted that the case resolved before trial.[1] I am indebted to Michael Risinger, and Joseph Cecil, for pointing me to documents from PACER, which shed some light upon the Joiner “endgame.”

In February 1998, Judge Orinda Evans, who had been the original trial judge, and who had sustained defendants’ Rule 702 challenges and granted their motions for summary judgment, received and reopened the case upon remand from the 11th Circuit. In March, Judge Evans directed the parties to submit a new pre-trial order by April 17, 1998. At a status conference in April 1998, Judge Evans permitted the plaintiffs additional discovery, to be completed by June 17, 1998. Five days before the expiration of their additional discovery period, the plaintiffs moved for additional time; defendants opposed the request. In July, Judge Evans granted the requested extension, and gave defendants until November 1, 1998, to file for summary judgment.

Meanwhile, in June 1998, new counsel entered their appearances for plaintiffs – William Sims Stone, Kevin R. Dean, Thomas Craig Earnest, and Stanley L. Merritt. The docket does not reflect much of anything about the new discovery other than a request for a protective order for an unpublished study. But by October 6, 1998, the new counsel, Earnest, Dean, and Stone (but not Merritt) withdrew as attorneys for the Joiners, and by the end of October 1998, Judge Evans entered an order to dismiss the case, without prejudice.

A few months later, in February 1999, the parties filed a stipulation, approved by the Clerk, dismissing the action with prejudice, and with each party to bear its own costs. Given the flight of plaintiffs’ counsel, the dismissals without and then with prejudice, a settlement seems never to have been involved in the resolution of the Joiner case. In the end, the Joiners’ case fizzled perhaps to avoid being Frye’d.

And what has happened since to the science of dioxins and lung cancer?

Not much.

In 2006, the National Research Council published a monograph on dioxin, which took the controversial approach of focusing on all cancer mortality rather than specific cancers that had been suggested as likely outcomes of interest. See David L. Eaton (Chairperson), Health Risks from Dioxin and Related Compounds – Evaluation of the EPA Reassessment (2006). The validity of this approach, and the committee’s conclusions, were challenged vigorously in subsequent publications. Paolo Boffetta, Kenneth A. Mundt, Hans-Olov Adami, Philip Cole, and Jack S. Mandel, “TCDD and cancer: A critical review of epidemiologic studies,” 41 Critical Rev. Toxicol. 622 (2011) (“In conclusion, recent epidemiological evidence falls far short of conclusively demonstrating a causal link between TCDD exposure and cancer risk in humans.”).

In 2013, the Industrial Injuries Advisory Council (IIAC), an independent scientific advisory body in the United Kingdom, published a review of lung cancer and dioxin. The Council found the epidemiologic studies mixed, and declined to endorse the compensability of lung cancer for dioxin-exposed industrial workers. Industrial Injuries Advisory Council – Information Note on Lung cancer and Dioxin (December 2013). See also Mann v. CSX Transp., Inc., 2009 WL 3766056, 2009 U.S. Dist. LEXIS 106433 (N.D. Ohio 2009) (Polster, J.) (dioxin exposure case) (“Plaintiffs’ medical expert, Dr. James Kornberg, has opined that numerous organizations have classified dioxins as a known human carcinogen. However, it is not appropriate for one set of experts to bring the conclusions of another set of experts into the courtroom and then testify merely that they ‘agree’ with that conclusion.”), citing Thorndike v. DaimlerChrysler Corp., 266 F. Supp. 2d 172 (D. Me. 2003) (court excluded expert who was “parroting” other experts’ conclusions).

Last year, an industrial cohort study, with two decades of follow-up, found no increased risk of lung cancer among workers exposed to dioxin. David I. McBride, James J. Collins, Thomas John Bender, Kenneth M Bodner, and Lesa L. Aylward, “Cohort study of workers at a New Zealand agrochemical plant to assess the effect of dioxin exposure on mortality,” 8 Brit. Med. J. Open e019243 (2018) (reporting SMR for lung cancer 0.95, 95%CI: 0.56 to 1.53).

[1] Morris S. Zedeck, Expert Witness in the Legal System: A Scientist’s Search for Justice 49 (2010) (noting that, after remand from the Supreme Court, Joiner v. General Electric resolved before trial).


Lipitor Diabetes MDL’s Inexact Analysis of Fisher’s Exact Test

March 23rd, 2019

Muriel Bristol was a biologist who studied algae at the Rothamsted Experimental Station in England, after World War I.  In addition to her knowledge of plant biology, Bristol claimed the ability to tell whether tea had been added to milk, or the tea poured first and then milk had been added.  Bristol, as a scientist and a proper English woman, preferred the latter.

Ronald Fisher, who also worked at Rothamsted, expressed his skepticism over Dr. Bristol’s claim. Fisher set about to design a randomized experiment that would efficiently and effectively test her claim. Bristol was presented with eight cups of tea, four of which were prepared with milk added to tea, and four prepared with tea added to milk.  Bristol, of course, was blinded to which was which, but was required to label each according to its manner of preparation. Fisher saw his randomized experiment as a 2 x 2 contingency table, from which he could calculate the probability of the observed outcome (and of any more extreme outcomes) using the assumption of fixed marginal totals and the hypergeometric probability distribution.  Fisher’s Exact Test was born at tea time.[1]

Fisher described the origins of his Exact Test in one of his early texts, but he neglected to report whether his experiment vindicated Bristol’s claim. According to David Salsburg, H. Fairfield Smith, one of Fisher’s colleagues, acknowledged that Bristol nailed Fisher’s Exact test, with all eight cups correctly identified. The test has gone on to become an important tool in the statistician’s armamentarium.
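The arithmetic behind the tea tasting is simple enough to sketch. The following Python snippet is only an illustration, not Fisher’s own presentation; the function name and its arguments are hypothetical. It computes the hypergeometric probability of correctly identifying a given number of the four milk-first cups by guessing alone, with the margins fixed at four cups of each kind.

```python
from math import comb

def tea_tasting_probability(correct_milk_first: int, cups_per_type: int = 4) -> float:
    """Hypergeometric probability that a pure guesser, who must label exactly
    `cups_per_type` of the 2 * cups_per_type cups as milk-first, gets exactly
    `correct_milk_first` of those labels right."""
    ways_right = comb(cups_per_type, correct_milk_first)
    ways_wrong = comb(cups_per_type, cups_per_type - correct_milk_first)
    all_ways = comb(2 * cups_per_type, cups_per_type)
    return ways_right * ways_wrong / all_ways

# All eight cups correct means all four milk-first cups were picked correctly.
p_all_correct = tea_tasting_probability(4)
print(f"P(all eight cups correct by guessing) = {p_all_correct:.4f}")  # 1/70, about 0.0143
```

Because no outcome is more extreme than eight correct cups, this point probability is also the one-sided p-value for a perfect score.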

Fisher’s Exact, like any statistical test, has model assumptions and preconditions.  For one thing, the test is designed for categorical data, with binary outcomes. The test allows us to evaluate whether two proportions are likely different by chance alone, by calculating the probability of the observed outcome, as well as more extreme outcomes.

The calculation of an exact attained significance probability, using Fisher’s approach, provides a one-sided p-value, with no unique solution to calculating a two-sided attained significance probability. In discrimination cases, the one-sided p-value may well be more appropriate for the issue at hand, and so the one-sided p-value provided by the test is not a particular problem. The Fisher’s Exact Test has thus played an important role in showing the judiciary that small sample size need not be an insuperable barrier to meaningful statistical analysis.[2]

The difficulty of using Fisher’s Exact for small sample sizes is that the hypergeometric distribution, upon which the test is based, is highly asymmetric. The observed one-sided p-value does not measure the probability of a result equally extreme in the opposite direction. There are at least three ways to calculate the two-sided p-value:

  • Double the one-sided p-value.
  • Add the point probabilities from the opposite tail that are more extreme than the observed point probability.
  • Use the mid-P value; that is, add all values more extreme (smaller) than the observed point probability from both sides of the distribution, PLUS ½ of the observed point probability.

Some software programs will proceed in one of these ways by default, but their doing so does not guarantee the most accurate measure of two-tailed significance probability.
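The three approaches listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any software package’s implementation: the function names, the dictionary keys, and the counts in the example table are all hypothetical, and the handling of ties (opposite-tail outcomes with exactly the observed point probability) is a choice made here for illustration, since conventions differ.

```python
from math import comb

def hypergeom_pmf(x: int, row1: int, col1: int, n: int) -> float:
    """P(X = x) for the upper-left cell of a 2 x 2 table with fixed margins."""
    return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

def two_sided_p_values(a: int, b: int, c: int, d: int) -> dict:
    """One-sided exact p-value and three two-sided conversions for the
    table [[a, b], [c, d]], conditioning on the observed margins."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    pmf = {x: hypergeom_pmf(x, row1, col1, n) for x in range(lo, hi + 1)}
    p_obs = pmf[a]

    upper = sum(p for x, p in pmf.items() if x >= a)
    lower = sum(p for x, p in pmf.items() if x <= a)
    one_sided = min(upper, lower)

    # Method 1: double the one-sided p-value (capped at 1 here).
    doubled = min(1.0, 2 * one_sided)
    # Method 2: sum every outcome no more probable than the observed one
    # (assumption: ties with the observed point probability are included).
    point_probability = sum(p for p in pmf.values() if p <= p_obs * (1 + 1e-9))
    # Method 3: mid-P, as described in the list above: outcomes strictly less
    # probable than the observed one, plus half the observed point probability.
    mid_p = sum(p for p in pmf.values() if p < p_obs * (1 - 1e-9)) + 0.5 * p_obs

    return {"one_sided": one_sided, "doubled": doubled,
            "point_probability": point_probability, "mid_p": mid_p}

# Hypothetical counts, chosen only to give an asymmetric distribution.
for name, value in two_sided_p_values(8, 2, 3, 9).items():
    print(f"{name}: {value:.4f}")
```

Running the sketch on a small, unbalanced table shows the three two-sided figures diverging from one another, which is the ambiguity at issue in the discussion that follows.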

In the Lipitor MDL for diabetes litigation, Judge Gergel generally used sharp analyses to cut through the rancid fat of litigation claims, to get to the heart of the matter. By and large, he appears to have done a splendid job. In the course of gatekeeping under Federal Rule of Evidence 702, however, Judge Gergel may have misunderstood the nature of Fisher’s Exact Test.

Nicholas Jewell is a well-credentialed statistician at the University of California.  In the courtroom, Jewell is a well-known expert witness for the litigation industry.  He is no novice at generating unreliable opinion testimony. See In re Zoloft Prods. Liab. Litig., No. 12–md–2342, 2015 WL 7776911 (E.D. Pa. Dec. 2, 2015) (excluding Jewell’s opinions as scientifically unwarranted and methodologically flawed); In re Zoloft Prod. Liab. Litig., MDL NO. 2342, 12-MD-2342, 2016 WL 1320799 (E.D. Pa. April 5, 2016) (granting summary judgment after excluding Dr. Jewell). See “The Education of Judge Rufe – The Zoloft MDL” (April 9, 2016).

In the Lipitor cases, some of Jewell’s opinions seemed outlandish indeed, and Judge Gergel generally excluded them. See In re Lipitor Marketing, Sales Practices and Prods. Liab. Litig., 145 F.Supp. 3d 573 (D.S.C. 2015), reconsideration den’d, 2016 WL 827067 (D.S.C. Feb. 29, 2016). As Judge Gergel explained, Jewell calculated a relative risk for abnormal blood glucose in a Lipitor group to be 3.0 (95% C.I., 0.9 to 9.6), using STATA software. Also using STATA, Jewell obtained an attained significance probability of 0.0654, based upon Fisher’s Exact Test. Lipitor Jewell at *7.

Judge Gergel did not report whether Jewell’s reported p-value of 0.0654 was one- or two-sided, but he did state that the attained probability “indicates a lack of statistical significance.” Id. & n. 15. The rest of His Honor’s discussion of the challenged opinion, however, makes clear that 0.0654 must have been a two-sided value.  If it had been a one-sided p-value, then there would have been no way of invoking the mid-p to generate a two-sided p-value below 5%. The mid-p will always be larger than the one-tailed exact p-value generated by Fisher’s Exact Test.

The court noted that Dr. Jewell had testified that he believed that STATA generated this confidence interval by “flip[ping]” the Taylor series approximation. The STATA website notes that it calculates confidence intervals for odds ratios (which are different from the relative risk that Jewell testified he computed), by inverting the Fisher exact test.[3] Id. at *7 & n. 17. Of course, this description suggests that the confidence interval is not based upon exact methods.
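For readers unfamiliar with the contrast, here is a rough Python sketch of one common large-sample interval for a relative risk, sometimes described as a Taylor series (delta method) interval on the log scale. It is offered only to show what an approximate, non-exact interval looks like; it is an assumption that this is the sort of approximation Jewell had in mind, the function name is hypothetical, and the counts are invented for illustration and are not the Lipitor data.

```python
from math import exp, log, sqrt

def relative_risk_log_ci(events_exposed: int, n_exposed: int,
                         events_unexposed: int, n_unexposed: int,
                         z: float = 1.96) -> tuple:
    """Relative risk with an approximate confidence interval built from a
    first-order (Taylor series) standard error of log(RR)."""
    rr = (events_exposed / n_exposed) / (events_unexposed / n_unexposed)
    se_log_rr = sqrt(1 / events_exposed - 1 / n_exposed
                     + 1 / events_unexposed - 1 / n_unexposed)
    lower = exp(log(rr) - z * se_log_rr)
    upper = exp(log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical counts: 9 events among 100 exposed, 3 among 100 unexposed.
rr, lower, upper = relative_risk_log_ci(9, 100, 3, 100)
print(f"RR = {rr:.2f}, approximate 95% CI: {lower:.2f} to {upper:.2f}")
```

An interval of this kind rests on a normal approximation, which is why it can disagree with an exact p-value when counts are small.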

STATA does not provide a mid p-value calculation, and so Jewell used an on-line calculator, to obtain a mid p-value of 0.04, which he declared statistically significant. The court took Jewell to task for using the mid p-value as though it were a different analysis or test.  Id. at *8. Because the mid-p value will always be larger than the one-sided exact p-value from Fisher’s Exact Test, the court’s explanation does not really make sense:

“Instead, Dr. Jewell turned to the mid-p test, which would ‘[a]lmost surely’ produce a lower p-value than the Fisher exact test.”

Id. at *8. The mid-p test, however, is not different from the Fisher’s exact; rather it is simply a way of dealing with the asymmetrical distribution that underlies the Fisher’s exact, to arrive at a two-tailed p-value that more accurately captures the rate of Type I error.

The MDL court acknowledged that the mid-p approach was not inherently unreliable, but questioned Jewell’s inconsistent, selective use of the approach for only one test.[4]  Jewell certainly did not help the plaintiffs’ cause and his standing by having discarded the analyses that were not incorporated into his report, thus leaving the MDL court to guess at how much selection went on in his process of generating his opinions.  Id. at *9 & n. 19.

None of Jewell’s other calculated p-values involved the mid-p approach, but the court’s criticism begs the question whether the other p-values came from a Fisher’s Exact Test with small sample size, or other highly asymmetrical distribution. Id. at *8. Although Jewell had shown himself willing to engage in other dubious, result-oriented analyses, Jewell’s use of the mid-p for this one comparison may have been within acceptable bounds after all.

The court also noted that Jewell had obtained the “exact p-value and that this p-value was not significant.” Id. The court’s notation here, however, does not report the important detail whether that exact, unreported p-value was merely double the one-sided p-value given by the Fisher’s Exact Test. As the STATA website, cited by the MDL court, explains:

“The test naturally gives a one-sided p-value, and there are at least four different ways to convert it to a two-sided p-value (Agresti 2002, 93). One way, not implemented in Stata, is to double the one-sided p-value; doubling is simple but can result in p-values larger than one.”

Wesley Eddings, “Fisher’s exact test two-sided idiosyncrasy” (Jan. 2009) (citing Alan Agresti, Categorical Data Analysis 93 (2d ed. 2002)).
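The quoted point about doubling is easy to verify with the tea-tasting design. The short Python sketch below (function name hypothetical) takes the unremarkable outcome of two of the four milk-first cups correctly identified, computes the one-sided p-value, and doubles it.

```python
from math import comb

def upper_tail(correct: int, cups_per_type: int = 4) -> float:
    """One-sided p-value: probability of at least `correct` milk-first cups
    identified by guessing, with four cups of each kind."""
    all_ways = comb(2 * cups_per_type, cups_per_type)
    favorable = sum(comb(cups_per_type, k) * comb(cups_per_type, cups_per_type - k)
                    for k in range(correct, cups_per_type + 1))
    return favorable / all_ways

one_sided = upper_tail(2)
print(f"one-sided p = {one_sided:.3f}")      # 53/70, about 0.757
print(f"doubled     = {2 * one_sided:.3f}")  # about 1.514, not a valid probability
```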

On plaintiffs’ motion for reconsideration, the MDL court reaffirmed its findings with respect to Jewell’s use of the mid-p.  Lipitor Jewell Reconsidered at *3. In doing so, the court insisted that the one instance in which Jewell used the mid-p stood in stark contrast to all the other instances in which he had used Fisher’s Exact Test.  The court then cited to the record to identify 21 other instances in which Jewell used a p-value rather than a mid-p value.  The court, however, did not provide the crucial detail whether these 21 other instances actually involved small-sample applications of Fisher’s Exact Test.  As result-oriented as Jewell can be, it seems safe to assume that not all his statistical analyses involved Fisher’s Exact Test, with its attendant ambiguity for how to calculate a two-tailed p-value.

[1] Sir Ronald A. Fisher, The Design of Experiments at chapter 2 (1935); see also Stephen Senn, “Tea for three: Of infusions and inferences and milk in first,” Significance 30 (Dec. 2012); David Salsburg, The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century  (2002).

[2] See, e.g., Dendy v. Washington Hosp. Ctr., 431 F. Supp. 873 (D.D.C. 1977) (denying preliminary injunction), rev’d, 581 F.2d 99 (D.C. Cir. 1978) (reversing denial of relief, and remanding for reconsideration). See also National Academies of Science, Reference Manual on Scientific Evidence 255 n.108 (3d ed. 2011) (“Well-known small sample techniques [for testing significance and calculating p-values] include the sign test and Fisher’s exact test.”).

[3] See Wesley Eddings, “Fisher’s exact test two-sided idiosyncrasy” (Jan. 2009) (“Stata’s exact confidence interval for the odds ratio inverts Fisher’s exact test.”). This article by Eddings contains a nice discussion of why the Fisher’s Exact Test attained significance probability disagrees with the calculated confidence interval. Eddings points out the asymmetry of the hypergeometric distribution, which complicates arriving at an exact p-value for a two-sided test.

[4] See Barber v. United Airlines, Inc., 17 Fed. Appx. 433, 437 (7th Cir. 2001) (“Because in formulating his opinion Dr. Hynes cherry-picked the facts he considered to render an expert opinion, the district court correctly barred his testimony because such a selective use of facts fails to satisfy the scientific method and Daubert.”).