TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Talc Litigation in Missouri – Show Me the Law and the Evidence

February 22nd, 2017

In New Jersey, where the courts are particularly plaintiff friendly but not beyond the persuasive force of evidence, lawsuit industry claims that talc causes ovarian cancer have not fared well. Last year, Judge Johnson, of Atlantic County, New Jersey, held that the plaintiffs’ causal claims failed to meet even the minimal New Jersey legal threshold of scientific validity.1 Meanwhile, in Missouri, juries have been returning large verdicts for plaintiffs on their claims that their use of talc products caused their ovarian cancers.2

What gives? Why is the outcome of similar litigation so different in New Jersey from that in Missouri? One might mistakenly think that courts in Missouri would be skeptical of scientifically dubious claims. After all, Missouri is the “Show Me” state; right? Many people understand the state’s nickname to mean that Missourians are not gullible.3

The reality of the origins of the Missouri nickname may well be different. The most cited account reports that a congressman from Missouri, Willard Duncan Vandiver, used the phrase in an 1899 speech:

“I come from a state that raises corn and cotton and cockleburs and Democrats, and frothy eloquence neither convinces nor satisfies me. I am from Missouri. You have got to show me.”

Basically, according to Vandiver, Missourians are “show me” simple folks because they do not read or understand eloquent language. Vandiver might have thought that scientific language was beyond his neighbors’ ken as well. Of course, things have changed since 1899. Missouri is no longer a state populated by Democrats. In the 2016 general election, Donald Drumpf received 56.8% of the Missouri votes cast. Hillary Clinton received 38.1%.4 Inquiring minds will want to know whether “Show Me” connotes incredulity or illiteracy.

One relevant difference between Missouri and many other states, and all the federal courts, is that some courts in Missouri engage in a particularly edentulous form of judicial gatekeeping of expert witness opinion testimony. The talc claims that resulted in large verdicts in Missouri never got off the dime (or got a dime) in New Jersey because plaintiffs’ expert witnesses’ opinions were excluded from courtrooms in the Garden State.

The resulting trials in Missouri have showcased some curious, doubtful rhetoric from legal counsel for the lawsuit industry. In his closing argument in Giannecchini v. Johnson & Johnson, the plaintiff’s lawyer accused Johnson & Johnson of having “rigged” regulatory agencies to ignore the dangers of talc.5 The argument was apparently effective and it has been repeated in another Missouri trial, in Swann v. Johnson & Johnson6, now underway. The plaintiffs’ opening “statement” in Swann was marked by overwrought, hyperbolic rhetoric.7

And the first trial days in Swann were dedicated by plaintiff’s counsel to showing, not that talc actually causes ovarian cancer, but that the defendants engaged in lobbying with respect to the carcinogenic classification of talc by regulatory agencies.8 According to the coverage in legal news media, the first testimony was offered to show that after the National Toxicology Program (NTP) nominated talc for inclusion in its list of potential carcinogens, industry trade groups, such as the Cosmetic, Toiletry and Fragrance Association, “shut down serious regulator concerns through intensive lobbying efforts.”9

This is a remarkable digression from the truth finding function of an American jury trial for several reasons. First, the “shutting down” of regulator concern was not, in the media reports, associated with any fraudulent misrepresentations of the scientific record. By casting the lobbying in an unflattering light, the plaintiff was able to undermine the truth value of agencies’ refusal to characterize talc as an ovarian carcinogen. The media coverage did not suggest that the lobbying involved the presentation of sham evidence or arguments that might have misled agencies about the correctness of their position.

Second, if the industry lobbying had badly misled the National Toxicology Program, or other government body, then there would no doubt be a conclusive case for causation today. The fact of the matter, however, is that there is no conclusive case for the claim that talc causes ovarian cancer. Late last year, the “Sister Study,” which explored whether there was any association between perineal talc use and ovarian cancer, was published in Epidemiology.10 The Sister Study (2003–2009) followed a cohort of 50,884 women whose sisters had been diagnosed with breast cancer. Talc use was ascertained at baseline, before diagnosis of subsequent disease and before any chance for selective recall. The cohort was followed for a median of 6.6 years, in which time there were 154 cases of ovarian cancer, available for analysis using Cox’s proportional hazards model. Perineal talc use at baseline was not associated with later ovarian cancer. The authors reported a hazard ratio of 0.73, less than expected, with a 95% confidence interval of 0.44, 1.2. Such a powerful study, showing the absence of any large or even modest association, would hardly be feasible if the science were so clear in the year 2000 that no reasonable scientist would have advocated against the NTP’s proposed classification.
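A rough arithmetic check on the reported interval may help, offered as a sketch and not as a re-analysis of the study. It assumes the interval was constructed symmetrically on the log scale (the usual Wald construction for a Cox model, which is an assumption here and not something the figures quoted above state). The function and variable names are illustrative only.

```python
import math

def implied_log_se(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error of log(HR) implied by a log-symmetric interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def interval_includes(lower: float, upper: float, value: float = 1.0) -> bool:
    """True when the interval contains the given value (1.0 = no association)."""
    return lower <= value <= upper

# Figures as reported in the post for perineal talc use and ovarian cancer.
hazard_ratio, ci_lower, ci_upper = 0.73, 0.44, 1.2

# If the interval is symmetric on the log scale, the geometric mean of the
# bounds should land close to the reported point estimate.
geometric_midpoint = math.sqrt(ci_lower * ci_upper)
log_se = implied_log_se(ci_lower, ci_upper)

print(f"geometric midpoint of bounds: {geometric_midpoint:.3f}")
print(f"implied SE of log(HR):        {log_se:.3f}")
print(f"interval includes 1.0:        {interval_includes(ci_lower, ci_upper)}")
```

Under that assumption the reported bounds are internally consistent with the point estimate, and the interval spans 1.0, which is what "not associated" means in this setting.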

Third, the lawsuit industry’s focus on lobbying activities in the Giannecchini and the Swann cases raises serious issues of infringing upon the defendants’ first amendment rights. The defendants’ advocacy for non-sham, non-fraudulent scientific positions is protected by the federal constitution, under what has come to be known as the Noerr-Pennington doctrine.

The Noerr-Pennington Doctrine of Immunity

One of the first agenda items for the first United States Congress was the drafting of a “Bill of Rights” to be submitted to the individual States for ratification. The First amendment (originally the third until the first two were dropped) sets forth a basic “right of the people to peaceably assemble, and to petition the government for a redress of grievances.”11 In the context of lobbying legislatures and regulatory agencies, the Supreme Court has long regarded lobbying and advocacy for and against legislation and regulation as core political speech that is protected by the right to petition the government.12

Part of this constitutional guarantee is a freedom to associate with others to lobby for redress.13 The constitutional protection is not lost by an economic or self-interested motivation in the lobbying or advocacy.14  This constitutional protection of advocacy positions results in an immunity from civil liability for speech, association, and conduct undertaken to advance advocacy positions before legislatures, agencies, and courts.15 This immunity, over half a century old, has come to be known as the Noerr-Pennington doctrine.

Although the original Noerr-Pennington doctrine cases specifically addressed claims of antitrust liability, later cases have held that the immunity applies with equal force in tort cases. State courts, regardless of their state constitutions, are of course obliged to grant and protect the federal Noerr-Pennington immunity.16

The unconstitutional infringement of defendants’ first amendment rights is hardly an innovation in the Giannecchini and Swann cases. For decades, the lawsuit industry, which jealously guards its own first amendment rights, has overzealously pressed conspiracy and tort claims against manufacturing industry for trying to influence legislation and regulation. In Senart v. Mobay Chem. Corp., 597 F. Supp. 502 (D. Minn. 1984), plaintiffs alleged that they were harmed by exposure to toluene diisocyanate (TDI), a feedstock chemical used in making polyurethane foam. The plaintiffs sued TDI manufacturers, on conspiracy claims that the manufacturers had jointly influenced the Occupational Safety and Health Administration (OSHA) to reject a recommendation from the National Institute for Occupational Safety and Health (NIOSH) for lower permissible exposure standards for TDI. Senart, 597 F. Supp. at 504. The plaintiffs’ conspiracy complaint was based upon allegations that the manufacturing defendants “knew of a body of scientific evidence which suggested that workers could suffer harm at exposure levels below the prevailing … standard,” and that they “conspired to ‘obfuscate and confuse’ scientific findings which supported a more stringent standard.” Id. Plaintiffs also alleged that the TDI manufacturers knew that a more stringent TDI exposure standard would harm their businesses. Id.

The trial court dismissed the conspiracy count in Senart. “[E]ven accepting plaintiffs’ allegations as true, defendants’ concerted action sought only permissible ends and acted through permissible means.” Id. at 505-6 (footnote omitted). The defendants’ work in concert through their trade association to persuade OSHA to reject the NIOSH proposal was clearly protected by the first amendment. Id. at 506 (internal citations omitted).

Following Senart, federal courts in later products cases have applied the Noerr-Pennington doctrine to bar tort claims. In a 1996 class action, a district court held that the immunity barred a class action filed by relatives of gunshot victims against gun manufacturers. Hamilton v. ACCU-TEK, 935 F. Supp. 1307 (E.D.N.Y. 1996). The court, in Hamilton, found the plaintiffs’ negligence and product liability claims untenable:

“Defendants’ efforts to affect federal firearm policies through lobbying activities are prime examples of the types of activity the First Amendment, through its rights of free speech and petition, sought to protect… . A core principle of the Noerr-Pennington doctrine is that lobbying alone cannot form the basis of liability… .”

Id. at 1321. The court in Hamilton dismissed the product liability claims. See also Tuosto v. Philip Morris USA Inc., No. 05 Civ. 9384 (PKL), 2007 WL 2398507, at *5 (S.D.N.Y. Aug. 21, 2007) (noting that the immunity “applied to bar liability in state common law tort claims, including negligence and products liability claims, for statements made in the course of petitioning the government”).

The lawsuit industry is one of the largest rent-seeking groups in the United States. Our courts need to apply constitutional standards in a symmetrical fashion, with an understanding that what is spoken in the halls of legislatures and agencies is protected at least as much as speech in the courtroom, and that the constitutional rights of manufacturing industry should not be subordinated to the rights of the lawsuit industry. Maybe lawyers need to figure out how to “show” the constitution in pictograms, without all the 18th century eloquence.


1 Carl v. Johnson & Johnson, No. ATL-L-6546-14, 2016 WL 4580145 (N.J. Super. Ct. Law Div., Atl. Cty., Sept. 2, 2016). See “New Jersey Kemps Ovarian Cancer – Talc Cases” (Sept. 16, 2016).

2 “Talc Litigation – Stop the Madness” (Nov. 10, 2016) (describing large verdict for plaintiff in Giannecchini v. Johnson & Johnson); see also Myron Levin, “Johnson & Johnson Hammered Again in Talc-Ovarian Cancer Verdict of $70 Million,” Law360 (Oct. 27, 2016); Brandon Lowrey, “J & J, Talc Co. Hit With $70M Baby Powder Cancer Verdict,” Law360 (Oct. 2016).

3 See “The Show-Me State,” last visited Feb. 21, 2017.

4 See “State of Missouri – 2016 General Election – November 8, 2016,” last visited Feb. 21, 2017. I leave it to the reader to assess whether the state nickname describes incredulity or illiteracy.

5 Myron Levin, “Johnson & Johnson Hammered Again in Talc-Ovarian Cancer Verdict of $70 Million,” Law360 (Oct. 27, 2016); Brandon Lowrey, “J & J, Talc Co. Hit With $70M Baby Powder Cancer Verdict,” Law360 (Oct. 2016).

6 Swann v. Johnson & Johnson, case number 1422-CC09326-01, in the 22nd Judicial Circuit of Missouri.

7 Cara Salvatore, “J&J Hid Talc Risk For ‘Love Of Money’, Jury Hears,” Law360 (Feb. 9, 2017).

8 Cara Salvatore, “Talc Lobbyists Stymied Carcinogen Classification, Jury Hears,” Law360 (Feb. 10, 2017).

9 Id.

10 Nicole L. Gonzalez, Katie M. O’Brien, Aimee A. D’Aloisio, Dale P. Sandler, and Clarice R. Weinberg, “Douching, Talc Use, and Risk of Ovarian Cancer,” 27 Epidemiology 797 (2016).

11 U.S. Const. amend. I.

12 California Motor Transp. Co. v. Trucking Unlimited, 404 U.S. 508, 510 (1972) (disallowing a cause of action “predicated upon mere attempts to influence the Legislative branch for the passage of laws or the Executive branch for their enforcement.”); United Mine Workers of Am. v. Ill. State Bar Ass’n, 389 U.S. 217, 222 (1967) (characterizing the right to petition as “among the most precious of the liberties safeguarded by the Bill of Rights”); United Mine Workers of Am. v. Pennington, 381 U.S. 657, 669-70 (1965); Doe v. McMillan, 566 F.2d 713, 718 (D.C. Cir. 1977), cert. denied, 435 U.S. 969 (1978) (holding that the first amendment constitutional right to petition the legislature “extends to administrative agencies and the courts”).

13 N.A.A.C.P. v. Button, 371 U.S. 415, 430 (1963) (protecting the right “to engage in association for the advancement of beliefs and ideas”); N.A.A.C.P. v. Alabama ex rel. Patterson, 357 U.S. 449, 460 (1958) (“[e]ffective advocacy of both public and private points of view, particularly controversial ones, is undeniably enhanced by group association … .”). The right of association to further lobbying activities has been described as having a “preferred place” along with other first amendment freedoms, such that the Court will not tolerate “dubious intrusions.” Thomas v. Collins, 323 U.S. 516, 530 (1945).

14 Virginia State Bd. of Pharmacy v. Virginia Citizens Consumer Council, 425 U.S. 748, 762 (1976); Sawyer v. Sandstrom, 615 F.2d 311, 316 (5th Cir. 1980) (“The right to freely associate is not limited to those associations which are ‘political in the customary sense’, but includes those which ‘pertain to the social, legal, and economic benefit of the members’.”) (citing Griswold v. Connecticut, 381 U.S. 479, 483 (1965)); International Union v. National Right to Work Legal Defense & Education Foundation, Inc., 590 F.2d 1139, 1148 (D.C. Cir. 1978) (“Even economically motivated expression or association is not disqualified from protection under the first amendment.”); Greminger v. Seaborne, 584 F.2d 275, 278 (8th Cir. 1978) (observing that the constitutionally protected “[f]reedom of association includes membership in unions or other organizations concerned with ‘business and economic causes’.”); Senart v. Mobay Chem. Corp., 597 F. Supp. 502, 506 (D. Minn. 1984) (“Selfish motivations do not lessen one’s right to present views to the government.”).

15 Eastern Railroad Presidents Conference v. Noerr Motor Freight, Inc., 365 U.S. 127 (1961); United Mine Workers v. Pennington, 381 U.S. 657 (1965).

16 Fraser v. Bovino, 317 N.J. Super. 23, 37 (App. Div. 1998) (recognizing “the fundamental values that undergird a citizen’s right to communicate on issues of public import”); Village Supermarket, Inc. v. Mayfair, 269 N.J. Super. 224, 229-32 (Law Div. 1995) (refusing to interpret New Jersey tort law to permit claims based on lobbying activity protected by the First Amendment); ARTS4ALL Ltd. v. Hancock, 810 N.Y.S.2d 15, 16 (App. Div. 2006) (denying employee’s motion for summary judgment on claim for breach of no-disparagement clause in severance agreement, holding that employer’s statements to government officials were protected by Noerr-Pennington doctrine); Concourse Nursing Home v. Engelstein, 692 N.Y.S.2d 888, 891 (Sup. Ct. 1999) (holding law firm was immune from business tort claims for successful lobbying efforts); I.G. Second Generation Partners v. Reade, 793 N.Y.S.2d 379, 381 (App. Div. 2005) (holding that Noerr-Pennington immunity barred claim for tortious interference); Diaz v. Southwest Wheel, 736 S.W.2d 770, 771 (Tex. App. 1987) (holding that Noerr-Pennington immunity barred conspiracy claims against tire manufacturer, which as a member of a trade association, opposed the recall on defective tire rims and restrictions on multi-piece wheels).

White Hat Bias in the Lab and in the Courtroom

February 20th, 2017

Talc Litigation – Stop the Madness

November 10th, 2016

Back in September, Judge Johnson, of New Jersey, wrapped up a talc ovarian cancer case in Kemp, and politely excused the case from any further obligations to show up in court. Carl v. Johnson & Johnson, No. ATL-L-6546-14, 2016 WL 4580145 (N.J. Super. Ct. Law Div., Atl. Cty., Sept. 2, 2016) [cited as Carl]. See “New Jersey Kemps Ovarian Cancer – Talc Cases” (Sept. 16, 2016).

In Giannecchini v. Johnson & Johnson, a Missouri jury returned a substantial verdict for plaintiff. The jury, by a 9 to 3 vote, awarded $575,000 for claimed economic loss, and $2 million for non-economic compensatory damages. The jury also found defendant Johnson & Johnson in need of punishment to the tune of $65,000,000, and Imerys Talc America Inc. for $2.5 million. Plaintiffs, having sought $285 million, were no doubt disappointed. The Giannecchini verdict was the third large verdict in the Missouri talc litigation. See Myron Levin, “Johnson & Johnson Hammered Again in Talc-Ovarian Cancer Verdict of $70 Million,” Law360 (Oct. 27, 2016); Brandon Lowrey, “J & J, Talc Co. Hit With $70M Baby Powder Cancer Verdict,” Law360 (Oct. 2016).

In his closing argument, Giannecchini’s lawyer, R. Allen Smith, reportedly accused Johnson & Johnson of having “rigged” regulatory agencies to ignore the dangers of talc, and of having “falsified” medical records to hide the problem. Smith implored the jury to “make them stop”; make them “stop this madness.”

Make them stop the madness, indeed. The November 2016 issue of Epidemiology features a publication of the “Sister Study,” which explored whether there was any association between perineal talc use and ovarian cancer. The authors acknowledged, as had Judge Johnson in the Carl case, that some prior case-control studies had found an increased risk of ovarian cancer, but that prospective cohort studies have not confirmed an association. Nicole L. Gonzalez, Katie M. O’Brien, Aimee A. D’Aloisio, Dale P. Sandler, and Clarice R. Weinberg, “Douching, Talc Use, and Risk of Ovarian Cancer,” 27 Epidemiology 797 (2016).

The Sister Study (2003–2009) followed a cohort of 50,884 women whose sisters had been diagnosed with breast cancer. Talc use was ascertained at baseline, before diagnosis of subsequent disease and before any chance for selective recall. The cohort was followed for a median of 6.6 years, in which time there were 154 cases of ovarian cancer available for analysis using Cox’s proportional hazards model. Perineal talc use at baseline was not associated with later ovarian cancer. The authors reported a hazard ratio of 0.73, less than expected, with a 95% confidence interval of 0.44, 1.2.

So, yes, make them stop this madness; close the gate.

Another Haack Article on Daubert

October 14th, 2016

In yet another law review article on Daubert, Susan Haack has managed mostly to repeat her past mistakes, while adding a few new ones to her exegesis of the law of expert witnesses. See Susan Haack, “Mind the Analytical Gap! Tracing a Fault Line in Daubert,” 61 Wayne L. Rev. 653 (2016) [cited as Gap]. Like some other commentators on the law of evidence, Haack purports to discuss this area of law without ever citing or quoting the current version of the relevant statute, Federal Rule of Evidence 702. She pores over Daubert and Joiner, as she has done before, with mostly the same errors of interpretation. In discussing Joiner, Haack misses the importance of the Supreme Court’s reversal of the 11th Circuit’s asymmetric standard of review of Rule 702 trial court decisions. Gap at 677. And Haack’s analysis of this area of law omits any mention of Rule 703, and its role in Rule 702 determinations. Although you can safely skip yet another Haack article, you should expect to see this one, along with her others, cited in briefs, right up there with David Michaels’ Manufacturing Doubt.

A Matter of Degree

“It may be said that the difference is only one of degree. Most differences are, when nicely analyzed.”[1]

Quoting Holmes, Haack appears to complain that the courts’ admissibility decisions on expert witnesses’ opinions are dichotomous and categorical, whereas the component parts of the decisions, involving relevance and reliability, are qualitative and gradational. True, true, and immaterial.

How do you boil a live frog so it does not jump out of the water?  You slowly turn up the heat on the frog by degrees.  The frog is lulled into complacency, but at the end of the process, the frog is quite, categorically, and sincerely dead. By a matter of degrees, you can boil a frog alive in water, with a categorically ascertainable outcome.

Humans use categorical assignments in all walks of life.  We rely upon our conceptual abilities to differentiate sinners and saints, criminals and paragons, scholars and skells. And we do this even though IQ, and virtues, come in degrees. In legal contexts, the finder of fact (whether judge or jury) must resolve disputed facts and render a verdict, which will usually be dichotomous, not gradational.

Haack finds “the elision of admissibility into sufficiency disturbing,” Gap at 654, but that is life, reason, and the law. She suggests that the difference in the nature of relevancy and reliability on the one hand, and admissibility on the other, creates a conceptual “mismatch.” Gap at 669. The suggestion is rubbish, a Briticism that Haack is fond of using herself. Clinical pathologists may diagnose cancer by counting the number of mitotic spindles in cells removed from an organ on biopsy. The number may be characterized as a percentage of cells in mitosis, a gradational measure that can run from zero to 100 percent, but the conclusion that comes out of the pathologist’s review is a categorical diagnosis. The pathologist must decide whether the biopsy result is benign or malignant. And so it is with many human activities and ways of understanding the world.

The Problems with Daubert (in Haack’s View)

Atomism versus Holism

Haack repeats a litany of complaints about Daubert, but she generally misses the boat.  Daubert was decisional law, in 1993, which interpreted a statute, Federal Rule of Evidence 702.  The current version of Rule 702, which was not available to, or binding on, the Court in Daubert, focuses on both validity and sufficiency concerns:

A witness who is qualified as an expert by knowledge, skill, experience, training, or education may testify in the form of an opinion or otherwise if:

(a) the expert’s scientific, technical, or other specialized knowledge will help the trier of fact to understand the evidence or to determine a fact in issue;

(b) the testimony is based on sufficient facts or data;

(c) the testimony is the product of reliable principles and methods; and

(d) the expert has reliably applied the principles and methods to the facts of the case.

Subsection (b) renders most of Haack’s article a legal ignoratio elenchi.

Relative Risks Greater Than Two

Modern chronic disease epidemiology has fostered an awareness that there is a legitimate category of disease causation that involves identifying causes that are neither necessary nor sufficient to produce their effects. Today it is a commonplace that an established cause of lung cancer is cigarette smoking, and yet, not all smokers develop lung cancer, and not all lung cancer patients were smokers.  Epidemiology can identify lung cancer causes such as smoking because it looks at stochastic processes that are modified from base rates, or population rates. This model of causation is not expected to produce uniform and consistent categorical outcomes in all exposed individuals, such as lung cancer in all smokers.

A necessary implication of categorizing an exposure or lifestyle variable as a “cause,” in this way is that the evidence that helps establish causation cannot answer whether a given individual case of the outcome of interest was caused by the exposure of interest, even when that exposure is a known cause.  We can certainly say that the exposure in the person was a risk for developing the disease later, but we often have no way to make the individual attribution.  In some cases, more the exception than the rule, there may be an identified mechanism that allows the detection of a “fingerprint” of causation. For the most part, however, risk and cause are two completely different things.

The magnitude of risk, expressed as a risk ratio, can be used to calculate a population attributable risk, which can in turn, with some caveats, be interpreted as approximating a probability of causation. When the attributable risk is 95%, as it would be for people with heavy smoking habits and lung cancer, treating the existence of the prior risk as evidence of specific causation seems perfectly reasonable. Treating a 25% attributable risk as evidence to support a conclusion of specific causation, without more, is simply wrong. A simple probabilistic urn model would tell us that we would most likely be incorrect if we attributed a random case to the risk based upon such a low attributable risk. Although we can fuss over whether the urn model is correct, the typical case in litigation allows no other model to be asserted, and it would be the plaintiffs’ burden of proof to establish the alternative model in any event.
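The arithmetic behind the urn model can be sketched briefly. The sketch assumes the standard attributable fraction among the exposed, (RR − 1)/RR, as the quantity being read as a probability of causation; the post does not state its formula, so that choice, along with the function names, is an assumption made for illustration.

```python
def attributable_fraction(relative_risk: float) -> float:
    """Share of exposed cases attributable to the exposure: (RR - 1) / RR.

    Returns 0.0 when RR <= 1, since no excess cases exist to attribute.
    """
    if relative_risk <= 0:
        raise ValueError("relative risk must be positive")
    return max(0.0, (relative_risk - 1.0) / relative_risk)

def relative_risk_for(fraction: float) -> float:
    """Relative risk that yields a given attributable fraction: 1 / (1 - AF)."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must lie in [0, 1)")
    return 1.0 / (1.0 - fraction)

# Under the urn model, drawing one exposed case at random, the chance that it
# is an "excess" case equals the attributable fraction.
for rr in (1.33, 2.0, 3.0, 20.0):
    af = attributable_fraction(rr)
    verdict = "more likely than not" if af > 0.5 else "not more likely than not"
    print(f"RR = {rr:>5}: attributable fraction = {af:.2f} ({verdict})")
```

On this formula a relative risk of exactly two corresponds to an attributable fraction of one half, which is why a relative risk greater than two is the usual threshold for "more likely than not." The 95% and 25% figures in the text correspond to relative risks of roughly 20 and 1.33.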

As she has done many times before, Haack criticizes Judge Kozinski’s opinion in Daubert,[2] on remand, where he entered judgment for the defendant because further proceedings were futile given the small relative risks claimed by plaintiffs’ expert witnesses. Those relative risks, advanced by Shanna Swan and Alan Done, lacked reliability; they were the product of a for-litigation juking of the stats that were the original target of the defendant and the medical community in the Supreme Court briefing. Judge Kozinski simplified the case, using a common legal stratagem of assuming arguendo that general causation was established. With this assumption favorable to plaintiffs made, but never proven or accepted, Judge Kozinski could then shine his analytical light on the fatal weakness of the specific causation opinions. When all the hand waving was put to rest, all that propped up the plaintiff’s specific causation claim was the existence of a claimed relative risk, which was less than two. Haack is unhappy with the analytical clarity achieved by Kozinski, and implicitly urges a conflation of general and specific causation so that “all the evidence” can be counted. The evidence of general causation, however, does not advance plaintiff’s specific causation case when the nature of causation is the (assumed) existence of a non-necessary and non-sufficient risk. Haack quotes Dean McCormick as having observed that “[a] brick is not a wall,” and accuses Judge Kozinski of an atomistic fallacy of ruling out a wall simply because the party had only bricks. Gap at 673, quoting from Charles McCormick, Handbook of the Law of Evidence at 317 (1954).

There is a fallacy opposite to the atomistic fallacy, however, namely the holistic “too much of nothing fallacy” so nicely put by Poincaré:

“Science is built up with facts, as a house is with stones. But a collection of facts is no more a science than a heap of stones is a house.”[3]

Poincaré’s metaphor is more powerful than Haack’s call for holistic evidence because it acknowledges that interlocking pieces of evidence may cohere as a building, or they may be no more than a pile of rubble.  Poorly constructed walls may soon revert to the pile of stones from which they came.

Haack proceeds to criticize Judge Kozinski for his “extraordinary argument” that

“(a) equates degrees of proof with statistical probabilities;

(b) assesses each expert’s testimony individually; and

(c) raises the standard of admissibility under the relevance prong to the standard of proof.”

Gap at 672.

Haack misses the point that a low relative risk, with no other valid evidence of specific causation, translates into a low probability of specific causation, even if general causation were apodictically certain. Aggregating the testimony, say of animal toxicologists and epidemiologists, simply does not advance the epistemic ball on specific causation because all the evidence collectively does not help identify the cause of Jason Daubert’s birth defects on the very model of causation that plaintiffs’ expert witnesses advanced.

All this would be bad enough, but Haack then goes on to commit a serious category mistake in confusing the probabilistic inference (for specific causation) of an urn model with the prosecutor’s fallacy of interpreting a random match probability as the probability of innocence. (Or the complement of the random match probability as the probability of guilt.) Judge Kozinski was not working with random match probabilities, and he did not commit the prosecutor’s fallacy.
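A toy calculation shows what the prosecutor’s fallacy actually is, and why it involves a different kind of quantity from a relative risk. All numbers below are hypothetical, and the function names are illustrative. The sketch applies Bayes’ theorem under the simplifying assumption that the true source always matches.

```python
def fallacious_source_probability(random_match_prob: float) -> float:
    """The fallacy: treating 1 - P(match | not the source) as P(source | match)."""
    return 1.0 - random_match_prob

def posterior_source_probability(random_match_prob: float, prior: float) -> float:
    """Bayes' theorem, assuming the true source matches with probability 1.

    P(source | match) = prior / (prior + (1 - prior) * random_match_prob)
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    return prior / (prior + (1.0 - prior) * random_match_prob)

# Hypothetical inputs: a one-in-a-million random match probability, and a
# pool of 1,000,001 equally likely candidate sources.
rmp = 1e-6
prior = 1.0 / 1_000_001

print(f"fallacious reading: {fallacious_source_probability(rmp):.6f}")
print(f"Bayesian posterior: {posterior_source_probability(rmp, prior):.6f}")
```

The gap between the two printed figures is the fallacy. The urn-model inference from a relative risk uses no random match probability at all, and transposes no conditional probability.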

Take Some Sertraline and Call Me in the Morning

As depressing as Haack’s article is, she manages to make matters even gloomier by attempting a discussion of Judge Rufe’s recent decision in the sertraline birth defects litigation. Haack’s discussion of this decision illustrates and typifies her analyses of other cases, including various decisions on causation opinion testimony on phenylpropanolamine, silicone, Bendectin, t-PA, and other occupational, environmental, and therapeutic exposures. Maybe 100 mg sertraline is in order.

Haack criticizes what she perceives to be the conflation of admissibility and sufficiency issues in how the sertraline MDL court addressed the defendants’ motion to exclude the proffered testimony of Dr. Anick Bérard. Gap at 683. The conflation is imaginary, however, and the direct result of Haack’s refusal to look at the multiple, specific methodological flaws in the approach that plaintiffs’ expert witness Anick Bérard took to reach a causal conclusion. These flaws are not gradational, and they are detailed in the MDL court’s opinion[4] excluding Anick Bérard. Haack, however, fails to look at the details. Instead Haack focuses on what she suggests is the sertraline MDL court’s conclusion that epidemiology was necessary:

“Judge Rufe argues that reliable testimony about human causation should generally be supported by epidemiological studies, and that ‘when epidemiological studies are equivocal or inconsistent with a causation opinion, experts asserting causation opinions must thoroughly analyze the strengths and weaknesses of the epidemiological research and explain why [it] does not contradict or undermine their opinion’. * * *

Judge Rufe acknowledges the difference between admissibility and sufficiency but, when it comes to the part of their testimony he [sic] deems inadmissible, his [sic] argument seems to be that, in light of the defendant’s epidemiological evidence, the plaintiffs’ expert testimony is insufficient.”

Gap at 682.

This précis is a remarkable distortion of the material facts of the case. There was no plaintiffs’ epidemiology evidence and defendants’ epidemiologic evidence. Rather there was epidemiologic evidence, and Bérard ignored, misreported, or misrepresented a good deal of the total evidentiary display. Bérard embraced studies when she could use their risk ratios to support her opinions, but criticized or ignored the same studies when their risk ratios pointed in the direction of no association or even of a protective association. To add to this methodological duplicity, Anick Bérard published many statements, in peer-reviewed journals, that sertraline was not shown to cause birth defects, but then changed her opinion solely for litigation. The court’s observation that there was a need for consistent epidemiologic evidence flowed not only from the conception of causation (non-necessary, non-sufficient), but from Bérard’s and her fellow plaintiffs’ expert witnesses’ concessions that epidemiology was needed. Haack’s glib approach to criticizing judicial opinions fails to do justice to the difficulties of the task; nor does she advance any meaningful criteria to separate successful from unsuccessful efforts.

In attempting to make her case for the gradational nature of relevance and reliability, Haack acknowledges that the details of the evidence relied upon can render the evidence, and presumably the conclusion based thereon, more or less reliable. Thus, we are told that epidemiologic studies based upon self-reported diagnoses are highly unreliable because such diagnoses are often wrong. Gap at 667-68. Similarly, we are told that in considering a claim that a plaintiff suffered an adverse effect from a medication, epidemiologic evidence showing a risk ratio of three would not be reliable if it had inadequate or inappropriate controls,[5] was not double blinded, and lacked randomization. Gap at 668-69. Even if the boundaries between reliable and unreliable are not always as clear as we might like, Haack fails to show that the gatekeeping process lacks a suitable epistemic, scientific foundation.

Curiously, Haack calls out Carl Cranor, plaintiffs’ expert witness in the Milward case, for advancing a confusing, vacuous “weight of the evidence” rationale for the methodology employed by the other plaintiffs’ causation expert witnesses in Milward.[6] Haack argues that Cranor’s invocation of “inference to the best explanation” and “weight of the evidence” fails to answer the important questions at issue in the case, namely how to weight the inference to causation as strong, weak, or absent. Gap at 688 & n. 223, 224. And yet, when Haack discusses court decisions that detailed voluminous records of evidence about how causal inferences should be made and supported, she flies over the details to give us confused, empty conclusions that the trial courts conflated admissibility with sufficiency.


[1] Rideout v. Knox, 19 N.E. 390, 392 (Mass. 1892).

[2] Daubert v. Merrell Dow Pharm., Inc., 43 F.3d 1311, 1320 (9th Cir. 1995).

[3] Jules Henri Poincaré, La Science et l’Hypothèse (1905) (chapter 9, Les Hypothèses en Physique) (“[O]n fait la science avec des faits comme une maison avec des pierres; mais une accumulation de faits n’est pas plus une science qu’un tas de pierres n’est une maison.”).

[4] In re Zoloft Prods. Liab. Litig., 26 F. Supp. 3d 466 (E.D. Pa. 2014).

[5] Actually Haack’s suggestion is that a study with a relative risk of three would not be very reliable if it had no controls, but that suggestion is incoherent.  A risk ratio could not have been calculated at all if there had been no controls.

[6] Milward v. Acuity Specialty Prods., 639 F.3d 11, 17-18 (1st Cir. 2011), cert. denied, 132 S.Ct. 1002 (2012).

New Jersey Kemps Ovarian Cancer – Talc Cases

September 16th, 2016

Gatekeeping in many courtrooms has been reduced to requiring expert witnesses to swear an oath and testify that they have followed a scientific method. The federal rules of evidence and most state evidence codes require more. The law, in most jurisdictions, requires that judges actively engage with, and inspect, the bases for expert witnesses’ opinions and claims to determine whether expert witnesses who want to be heard in a courtroom have actually, faithfully followed a scientific methodology. In other words, the law requires judges to assess the scientific reasonableness of reliance upon the actual data cited, and to evaluate whether the inferences drawn from the data, to reach a stated conclusion, are valid.

We are getting close to a quarter of a century since the United States Supreme Court outlined the requirements of gatekeeping, in Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579 (1993). Since the Daubert decision, the Supreme Court’s decisional law, and changes in the evidence rules themselves, have clarified the nature and extent of the inquiry judges must conduct into the reasonable reliance upon facts and data, and into the inferential steps leading to a conclusion.  And yet, many judges resist, and offer up excuses and dodges for shirking their gatekeeping obligations.  See generally David E. Bernstein, “The Misbegotten Judicial Resistance to the Daubert Revolution,” 89 Notre Dame L. Rev. 27 (2013).

There is a courtroom in New Jersey, in which gatekeeping is taken seriously from beginning to end.  There is at least one trial judge who encourages and even demands that the expert witnesses appear and explain their methodologies and actually show their methodological compliance.  Judge Johnson first distinguished himself in In re Accutane, No. 271(MCL), 2015 WL 753674, 2015 BL 59277 (N.J.Super. Law Div. Atlantic Cty. Feb. 20, 2015).[1] And more recently, in two ovarian cancer cases, Judge Johnson dusted two expert witnesses, who thought they could claim their turn in the witness chair by virtue of their credentials and some rather glib hand waving. Judge Johnson conducted the New Jersey analogue of a Federal Rule of Evidence 104(a) Daubert hearing, as required by the New Jersey Supreme Court’s decision in Kemp v. The State of New Jersey, 174 N.J. 412 (2002). The result was disastrous for the two expert witnesses who opined that use of talcum powder by women causes ovarian cancer. Carl v. Johnson & Johnson, No. ATL-L-6546-14, 2016 WL 4580145 (N.J. Super. Ct. Law Div., Atl. Cty., Sept. 2, 2016) [cited as Carl].

Judge Johnson obviously had a good epidemiology teacher in Professor Stephen Goodman, who testified in the Accutane case. Against this standard, it is easy to see how the plaintiffs’ talc expert witnesses, Drs. Daniel Cramer and Graham Colditz, fell “significantly” short. After presiding over seven days of court hearings, and reviewing extensive party submissions, including the actual studies relied upon by the expert witnesses and the parties, Judge Johnson made no secret of his disappointment with the lack of rigor in the analyses proffered by Cramer and Colditz:

“Throughout these proceedings the court was disappointed in the scope of Plaintiffs’ presentation; it almost appeared as if counsel wished the court to wear blinders. Plaintiffs’ two principal witnesses on causation, Dr. Daniel Cramer and Dr. Graham Colditz, were generally dismissive of anything but epidemiological studies, and within that discipline of scientific investigation they confined their analyses to evidence derived only from small retrospective case-control studies. Both witnesses looked askance upon the three large cohort studies presented by Defendants. As confirmed by studies listed at Appendices A and B, the participants in the three large cohort studies totaled 191,090 while those case-control studies advanced by Plaintiffs’ witnesses, and which were the ones utilized in the two meta-analyses performed by Langseth and Terry, total 18,384 participants. As these proceedings drew to a close, two words reverberated in the court’s thinking: “narrow and shallow.” It was almost as if counsel and the expert witnesses were saying, Look at this, and forget everything else science has to teach us.”

Carl at *12.

Judge Johnson did what for so many judges is unthinkable; he looked behind the curtain put up by highly credentialed Oz expert witnesses in his courtroom. What he found was unexplained, unjustified selectivity in their reliance upon some but not all the available data, and glib conclusions that gloss over significant limits in the resolving power of the available epidemiologic studies. Judge Johnson was particularly unsparing of Graham Colditz, a capable scientist, who deviated from the standards he set for himself in the work he had published in the scientific community:

“Dr. Graham Colditz is a brilliant scientist and a dazzling witness. His vocal inflection, cadence, and adroit use of histrionics are extremely effective. Dr. Colditz’s reputation for his breadth of knowledge about cancer and the esteem in which he is held by his peers is well deserved. Yet, at times, it seemed that issues raised in these proceedings, and the questions posed to him, were a bit mundane for a scientist of his caliber.”

Carl at *15. Dr. Colditz and the plaintiffs’ cause were not helped by Dr. Colditz’s own previous publications of studies and reviews that failed to support any “substantial association between perineal talc use and ovarian cancer risk overall,” and failed to conclude that talc was even a “risk factor” for ovarian cancer.  Carl at *18.

Relative Risk Size

Many courts have fumbled their handling of the issue whether applicable relative risks must exceed two before fact finders may infer specific causation between claimed exposures and specific diseases. There certainly can be causal associations that involve relative risks greater than 1.0, up to and including 2.0. Eliminating validity concerns may be more difficult with such smaller relative risks, but there is nothing theoretically insuperable about having a causal association based upon such small relative risks. Judge Johnson apparently saw the diversity of opinions on this relative risk issue, many of which opinions are stridently maintained, and thoroughly fallacious.

Judge Johnson ultimately did not base his decision, with respect to general or specific causation, on the magnitude of relative risk, or the covering Bradford Hill factor of “strength of association.” Dr. Cramer appropriately acknowledged that his meta-analysis result, an odds ratio of 1.29, was “weak,” Carl at *19, and Judge Johnson was critical of Dr. Colditz for failing to address the lack of strength of the association, and for engaging in a constant refrain that the association was “significant,” which describes the precision of the estimate, not its size. Carl at *17.

Aware of the difficulty that New Jersey appellate courts have had with the issues surrounding relative risks greater than two, Judge Johnson was realistic to steer clear of any specific judicial reliance on the small size of the relative risk.  His Honor’s prudence is unfortunate however because ultimately small relative risks, even assuming that general causation is established, do nothing to support specific causation.  Indeed, relative risks of 1.29 (and odds ratios generally overstate the size of the underlying relative risk) would on a stochastic model support the conclusion that specific causation was less than 50% probable.  Critics have pointed out that risk may not be stochastically distributed, which is a great point, except that

(1) plaintiffs often have no idea how the risk, if real, is distributed in the observed sample, and

(2) the upshot of the point is that even for relative risks greater than 2.0, there is no warrant for inferring specific causation in a given case.
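The arithmetic behind the stochastic-model point can be sketched in a few lines of Python. This is only an illustration under a stated assumption: that the excess risk is spread evenly across exposed persons, so that the probability that a given exposed case is attributable to the exposure is (RR - 1)/RR. The function name is hypothetical, chosen for this sketch.

```python
def probability_of_causation(relative_risk: float) -> float:
    """Attributable fraction among the exposed, (RR - 1) / RR.

    Assumption (not a finding of the court): excess risk is distributed
    evenly among exposed persons, as in the simple stochastic model.
    """
    if relative_risk <= 0:
        raise ValueError("relative risk must be positive")
    # A relative risk at or below 1.0 implies no excess risk to attribute.
    return max(0.0, (relative_risk - 1.0) / relative_risk)


if __name__ == "__main__":
    for rr in (1.29, 2.0, 3.0):
        pc = probability_of_causation(rr)
        print(f"RR = {rr:.2f} -> probability of causation = {pc:.1%}")
```

On this model, a relative risk of 1.29 corresponds to a probability of causation well under 50 percent, and a relative risk of exactly 2.0 sits at the 50 percent line. If the risk is not evenly distributed, the formula gives no reliable answer for any individual plaintiff, at any relative risk.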

Judge Johnson did wade into the relative risk waters by noting that when relative risks were “significantly” less than two, establishing biological plausibility became essential.  Carl at *11.  This pronouncement is muddled on at least two fronts.  First, the relative risk scale is a continuum, and there is no standard reference for what relative risks greater than 1.0 are “significantly” less than 2.0.  Presumably, Judge Johnson thought that 1.29 was in the “significantly less than 2.0” range, but he did not say so; nor did he cite a source that supported this assessment. Perhaps he was suggesting that the upper bound of some meta-analysis was less than two. Second, and more troubling, the claim that biological plausibility becomes “essential” in the face of small relative risks is also unsupported. Judge Johnson does not cite any support for this claim, and I am not aware of any.  Elsewhere in his opinion, Judge Johnson noted that

“When a scientific rationale doesn’t exist to explain logically the biological mechanism by which an agent causes a disease, courts may consider epidemiologic studies as an alternate [sic] means of proving general causation.”

Carl at *8. So it seems that biological plausibility is not essential after all.

This glitch in the Carl opinion is likely of no lasting consequence, however, because epidemiologists are rarely at a loss to posit some biologically plausible mechanism. As the Dictionary of Epidemiology explains the matter:

“The causal consideration that an observed, potentially causal association between an exposure and a health outcome may plausibly be attributed to causation on the basis of existing biomedical and epidemiological knowledge. On a schematic continuum including possible, plausible, compatible, and coherent, the term plausible is not a demanding or stringent requirement, given the many biological mechanisms that often can be hypothesized to underlie clinical and epidemiological observations; hence, in assessing causality, it may be logically more appropriate to require coherence (biological as well as clinical and epidemiological). Plausibility should hence be used cautiously, since it could impede development or acceptance of new knowledge that does not fit existing biological evidence, pathophysiological reasoning, or other evidence.”

Miquel Porta, et al., eds., “Biological plausibility,” in A Dictionary of Epidemiology at 24 (6th ed. 2014). Most capable epidemiologists have thought up half a dozen biologically plausible mechanisms each morning before they have had their first cup of coffee. But the most compelling reason that this judicial hiccup is inconsequential is that the plaintiffs’ expert witnesses’ postulated mechanism, inflammation, was demonstrably absent in the tissue of the specific plaintiffs. Carl at *13. The glib invocation of “inflammation” would seem bound to fail even under the most liberal test of plausibility, given that talc has anti-cancer properties that result from its ability to inhibit new blood vessel formation, a necessity of solid tumor growth, and given the completely unexplained selectivity of the postulated effect for ovarian tissue, which leaves vaginal, endometrial, or fallopian tissues unaffected. Carl at *13-14. On at least two occasions, the United States Food and Drug Administration rejected “Citizen Petitions” for ovarian cancer warnings on talc products, advanced by the dubious Samuel S. Epstein for the Cancer Prevention Coalition, in large measure because of Epstein’s undue selectivity in citing epidemiologic studies and because a “cogent biological mechanism by which talc might lead to ovarian cancer is lacking… .” Carl at *15, citing Stephen M. Musser, Directory FDA Director, Letter Denying Citizens’ Petition (April 1, 2014).

Large Studies

Judge Johnson quoted the Reference Manual on Scientific Evidence (3d ed.  2011) for his suggestion that establishing causation requires large studies.  The quoted language, however, really does not bear on his suggestion:

“Common sense leads one to believe that a large enough sample of individuals must be studied if the study is to identify a relationship between exposure to an agent and disease that truly exists. Common sense also suggests that by enlarging the sample size (the size of the study group), researchers can form a more accurate conclusion and reduce the chance of random error in their results…With large numbers, the outcome of test is less likely to be influenced by random error, and the researcher would have greater confidence in the inferences drawn from the data.”

Reference Manual at page 576. The Reference Manual simply calls for studies with “large enough” samples. How large is large enough is a variable that depends upon the magnitude of the association to be detected, the length of follow up, and the base rate or incidence of the outcome of interest. As far as “common sense” goes, the Reference Manual is correct only insofar as larger is better with respect to sampling error. Increasing sample size does nothing to address internal or external validity of studies, and may lead to erroneous interpretations by allowing results to achieve statistical significance at predetermined levels, when the observed associations result from bias or confounding, and not from any underlying relationship between exposure and disease outcome.

There is a more disturbing implication in Judge Johnson’s criticism of Graham Colditz for relying upon the smaller number of subjects in the case-control studies than are found in the available cohort studies. Ovarian cancer is a relatively rare cancer (compared with breast and colon cancer), and case-control studies are more efficient at assessing increased risk than are cohort studies for a rare outcome. The number of cases in a case-control study represents an implied population many times larger than the number of actual cases in a case-control study. If Judge Johnson had looked at the width of the confidence intervals for the “small” case-control studies, and compared those widths to the interval widths of the cohort studies, he would have seen that “smaller” case-control studies (fewer cases, as well as fewer total subjects) can generate more statistical precision than the larger cohort studies (with many more cohort and control subjects). A more useful comparison would have been of the number of actual ovarian cancer cases in the meta-analyzed case-control studies with the number of actual ovarian cancer cases in the cohort studies. On this comparison, the cohort studies might not fare so well.

The size of the cohort for a rare outcome is thus fairly meaningless in terms of the statistical precision generated.  Smaller case-control studies will likely have much more power, and that should be reflected in the confidence intervals of the respective studies.
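The point about precision can be illustrated with a short Python sketch. The counts below are invented for illustration and are not drawn from the talc studies; the sketch uses the standard large-sample standard errors for the log odds ratio (case-control) and the log risk ratio (cohort), and the function names are hypothetical.

```python
import math


def se_log_odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Large-sample (Woolf) standard error of the log odds ratio."""
    return math.sqrt(
        1 / exp_cases + 1 / unexp_cases + 1 / exp_controls + 1 / unexp_controls
    )


def se_log_risk_ratio(exp_cases, exp_total, unexp_cases, unexp_total):
    """Large-sample standard error of the log risk ratio in a cohort."""
    return math.sqrt(
        1 / exp_cases - 1 / exp_total + 1 / unexp_cases - 1 / unexp_total
    )


def ci_ratio(se, z=1.96):
    """Ratio of the upper to the lower 95% confidence limit."""
    return math.exp(2 * z * se)


# Hypothetical case-control study: 500 cases and 500 controls (1,000 subjects).
se_case_control = se_log_odds_ratio(200, 300, 170, 330)

# Hypothetical cohort: 60,000 women, but only 150 cases of a rare outcome.
se_cohort = se_log_risk_ratio(70, 24_000, 80, 36_000)

if __name__ == "__main__":
    print(f"case-control: SE = {se_case_control:.3f}, "
          f"upper/lower limit ratio = {ci_ratio(se_case_control):.2f}")
    print(f"cohort:       SE = {se_cohort:.3f}, "
          f"upper/lower limit ratio = {ci_ratio(se_cohort):.2f}")
```

With these made-up counts, the 1,000-subject case-control study yields the narrower interval, because the number of cases, not the number of subjects, drives the standard error. Narrow intervals speak only to random error; they say nothing about bias or confounding.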

The issue, as I understand the talc litigation, is not size of the case-control versus cohort studies, but rather their analytical resolving power.  Case-control studies for this sort of exposure and outcome will be plagued by recall and other biases, as well as difficulty in selecting the right control group.  And the odds ratio will tend to overestimate the relative risk, in both directions.  Cohort studies, with good, pre-morbid exposure assessments, would thus be much more rigorous and accurate in estimating the true rate ratios. In the final analysis, Judge Johnson was correct to be critical of Graham Colditz for dismissing the cohort studies, but his rationale for this criticism was, in a few places, confused and confusing. There was nothing subtle about the analytical gaps, ipse dixits, and cherry picking shown by these plaintiffs’ expert witnesses.
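The relationship between the odds ratio and the relative risk, noted above, can be sketched numerically. The baseline risks and relative risks below are hypothetical, and the function name is invented for this illustration; the sketch simply converts a pair of risks into the corresponding odds ratio.

```python
def odds_ratio_from_risks(risk_exposed: float, risk_unexposed: float) -> float:
    """Odds ratio implied by the risks in the exposed and unexposed groups."""
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return odds_exposed / odds_unexposed


if __name__ == "__main__":
    # Hypothetical baseline risks: one common outcome, one rare outcome.
    for baseline in (0.20, 0.001):
        for rr in (0.5, 1.5):
            implied_or = odds_ratio_from_risks(rr * baseline, baseline)
            print(f"baseline risk {baseline}: RR = {rr} -> OR = {implied_or:.3f}")
```

The odds ratio lies farther from 1.0 than the relative risk in both directions, although the gap shrinks as the outcome becomes rarer.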


[1] See “Johnson of Accutane – Keeping the Gate in the Garden State” (Mar. 28, 2015).

Judge Bernstein’s Criticism of Rule 703 of the Federal Rules of Evidence

August 30th, 2016

Federal Rule of Evidence 703 addresses the bases of expert witness opinions, and it is a mess. The drafting of this Rule is particularly sloppy. The Rule tells us, among other things, that:

“[i]f experts in the particular field would reasonably rely on those kinds of facts or data in forming an opinion on the subject, they need not be admissible for the opinion to be admitted.”

This sentence of the Rule has a simple grammatical and logical structure:

If A, then B;

where A contains the concept of reasonable reliance, and B tells us the consequence that the relied upon material need not be itself admissible for the opinion to be admissible.

But what happens if the expert witness has not reasonably relied upon certain facts or data; i.e., ~A?  The conditional statement as given does not describe the outcome in this situation. We are not told what happens when an expert witness’s reliance in the particular field is unreasonable.  ~A does not necessarily imply ~B. Perhaps the drafters meant to write:

B if and only if A.

But the drafters did not give us the above rule, and they have left judges and lawyers to make sense of their poor grammar and bad logic.
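The logical gap can be made concrete with a small truth-table sketch in Python. Here A stands for “reliance is reasonable” and B for “the relied-upon material need not be admissible”; the helper names are hypothetical, and the sketch merely enumerates which values of B each reading permits when A is false.

```python
def implies(a: bool, b: bool) -> bool:
    """Material conditional: 'if A, then B'."""
    return (not a) or b


def iff(a: bool, b: bool) -> bool:
    """Biconditional: 'B if and only if A'."""
    return a == b


def allowed_b_when_not_a(rule) -> set:
    """Values of B consistent with the rule when A is false."""
    return {b for b in (True, False) if rule(False, b)}


if __name__ == "__main__":
    print("If A, then B; given ~A, B may be:", allowed_b_when_not_a(implies))
    print("B iff A;      given ~A, B may be:", allowed_b_when_not_a(iff))
```

Under the conditional as drafted, unreasonable reliance leaves B undetermined; only the biconditional reading forces the conclusion that the relied-upon material must be admissible.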

And what happens when the reliance material is independently admissible, say as a business record, government report, or first-person observation? May an expert witness rely upon admissible facts or data, even when a reasonable expert would not do so? Again, it seems that the drafters were trying to limit expert witness reliance to some rule of reason, but by tying reliance to the admissibility of the reliance material, they managed to conflate two separate notions.

And why is reliance judged by the expert witness’s particular field? Fields of study and areas of science and technology overlap. In some fields, it is commonplace for putative experts to rely upon materials that would not be given the time of day in other fields. Should we judge the reasonableness of homeopathic healthcare providers’ reliance by the standards of reasonableness in homeopathy, such as it is, or should we judge it by the standards of medical science? The answer to this rhetorical question seems obvious, but the drafters of Rule 703 introduced a Balkanized concept of science and technology by introducing the notion of the expert witness’s “particular field.” The standard of Rule 702 is “knowledge” and “helpfulness,” neither of which concepts is constrained by “particular fields.”

And then Rule 703 leaves us in the dark about how to handle an expert witness’s reliance upon inadmissible facts or data. According to the Rule, “the proponent of the opinion may disclose [the inadmissible facts or data] to the jury only if their probative value in helping the jury evaluate the opinion substantially outweighs their prejudicial effect.” And yet, disclosing inadmissible facts or data would always be highly prejudicial because they represent facts and data that the jury is forbidden to consider in reaching its verdict. Nonetheless, trial judges routinely tell juries that an expert witness’s opinion is no better than the facts and data on which the opinion is based. If the facts and data are inadmissible, the jury must disregard them in its fact finding; and if an expert witness’s opinion is based upon facts and data that are to be disregarded, then the expert witness’s opinion must be disregarded as well. Or so common sense and respect for the trial’s truth-finding function would suggest.

The drafters of Rule 703 do not shoulder all the blame for the illogic and bad results of the rule. The judicial interpretation of Rule 703 has been sloppy, as well. The Rule’s “plain language” tells us that “[a]n expert may base an opinion on facts or data in the case that the expert has been made aware of or personally observed.”  So expert witnesses should be arriving at their opinions through reliance upon facts and data, but many expert witnesses rely upon others’ opinions, and most courts seem to be fine with such reliance.  And the reliance is often blind, as when medical clinicians rely upon epidemiologic opinions, which in turn are based upon data from studies that the clinicians themselves are incompetent to interpret and critique.

The problem of reliance, as contained within Rule 703, is deep and pervasive in modern civil and criminal trials. In the trial of health effect claims, expert witnesses rely upon epidemiologic and toxicologic studies that contain multiple layers of hearsay, often with little or no validation of the trustworthiness of many of those factual layers. The inferential methodologies are often obscure, even to the expert witnesses, and trial counsel are frequently untrained and ill prepared to expose the ignorance and mistakes of the expert witnesses.

Back in February 2008, I presented at an ALI-ABA conference on expert witness evidence about the problems of Rule 703.[1] I laid out a critique of Rule 703, which showed that the Rule permitted expert witnesses to rely upon “castles in the air.” A distinguished panel of law professors and judges seemed to agree; at least no one offered a defense of Rule 703.

Shortly after I presented at the ALI-ABA conference, Professor Julie E. Seaman published an insightful law review article in which she framed the problems of Rule 703 as constitutional issues.[2] Encouraged by Professor Seaman’s work, I wrote up my comments on Rule 703 for an ABA publication,[3] and I have updated those comments in the light of subsequent judicial opinions,[4] as well as the failure of the Third Edition of the Reference Manual on Scientific Evidence to address the problems.[5]

===================

Judge Mark I. Bernstein is a trial court judge for the Philadelphia County Court of Common Pleas. I never tried a case before Judge Bernstein, who has announced his plans to leave the Philadelphia bench after 29 years of service,[6] but I had heard from some lawyers (on both sides of the bar) that he was a “pro-plaintiff” judge. Some years ago, I sat next to him on a CLE panel on trial evidence, at which he disparaged judicial gatekeeping,[7] which seemed to support his reputation. The reality seems to be more complex. Judge Bernstein has shown that he can be a critical consumer of complex scientific evidence, and an able gatekeeper under Pennsylvania’s crazy quilt-work pattern of expert witness law. For example, in a hotly contested birth defects case involving sertraline, Judge Bernstein held a pre-trial evidentiary hearing and looked carefully at the proffered testimony of Michael D. Freeman, a chiropractor and self-styled “forensic epidemiologist,” and Robert Cabrera, a teratologist. Applying a robust interpretation of Pennsylvania’s Frye rule, Judge Bernstein excluded Freeman and Cabrera’s proffered testimony, and entered summary judgment for defendant Pfizer, Inc. Porter v. Smithkline Beecham Corp., 2016 WL 614572 (Phila. Cty. Ct. Com. Pl.). See “Demonstration of Frye Gatekeeping in Pennsylvania Birth Defects Case” (Oct. 6, 2015).

And Judge Bernstein has shown that he is one of the few judges who takes seriously Rule 705’s requirement that expert witnesses produce their relied upon facts and data at trial, on cross-examination. In Hansen v. Wyeth, Inc., Dr. Harris Busch, a frequent testifier for plaintiffs, glibly opined about the defendant’s negligence. On cross-examination, he adverted to the volumes of depositions and documents he had reviewed, but when defense counsel pressed, the witness was unable to produce and show exactly what he had reviewed. After the jury returned a verdict for the plaintiff, Judge Bernstein set the verdict aside because of the expert witness’s failure to comply with Rule 705. Hansen v. Wyeth, Inc., 72 Pa. D. & C. 4th 225, 2005 WL 1114512, at *13, *19 (Phila. Ct. Common Pleas 2005) (granting new trial on post-trial motion), 77 Pa. D. & C. 4th 501, 2005 WL 3068256 (Phila. Ct. Common Pleas 2005) (opinion in support of affirmance after notice of appeal).

In a recent law review article, Judge Bernstein has issued a withering critique of Rule 703. See Hon. Mark I. Bernstein, “Jury Evaluation of Expert Testimony Under the Federal Rules,” 7 Drexel L. Rev. 239 (2015). Judge Bernstein is clearly dissatisfied with the current approach to expert witnesses in federal court, and he lays almost exclusive blame on Rule 703 and its permission to hide the crucial facts, data, and inferential processes from the jury. In his law review article, Judge Bernstein characterizes Rules 703 and 705 as empowering “the expert to hide personal credibility judgments, to quietly draw conclusions, to individually decide what is proper evidence, and worst of all, to offer opinions without even telling the jury the facts assumed.” Id. at 264. Judge Bernstein cautions that the subversion of the factual predicates for expert witnesses’ opinions under Rule 703 has significant, untoward consequences for the court system. Not only are lawyers allowed to hire professional advocates as expert witnesses, but the availability of such professional witnesses permits and encourages the filing of unnecessary litigation. Id. at 286. Hear hear.

Rule 703’s practical consequence of eliminating the hypothetical question has enabled the expert witness qua advocate, and has up-regulated the trial as a contest of opinions and opiners rather than as an adversarial procedure that is designed to get at the truth. Id. at 266-67. Without having access to real, admissible facts and data, the jury is forced to rely upon proxies for the truth: qualifications, demeanor, and courtroom poise, all of which fail the jury and the system in the end.

As a veteran trial judge, Judge Bernstein makes a persuasive case that the non-disclosure permitted under Rule 703 is not really curable under Rule 705. Id. at 288.  If the cross-examination inquiry into reliance material results in the disclosure of inadmissible facts, then judges and the lawyers must deal with the charade of a judicial instruction that the identification of the inadmissible facts is somehow “not for the truth.” Judge Bernstein argues, as have many others, that this “not for the truth” business is an untenable fiction, either not understood or ignored by jurors.

Opposing counsel, of course, may ask for an elucidation of the facts and data relied upon, but when they consider the time and difficulty involved in cross-examining highly experienced, professional witnesses, opposing counsel usually choose to traverse the adverse opinion by presenting their own expert witness’s opinion rather than getting into nettlesome details and risking looking foolish in front of the jury, or even worse, allowing the highly trained adverse expert witness to run off at the mouth.

As powerful as Judge Bernstein’s critique of Rule 703 is, his analysis misses some important points. Lawyers and judges have other motives for not wanting to elicit underlying facts and data: they do not want to “get into the weeds,” and they want to avoid technical questions of valid inference and quality of data. Yet sometimes the truth is in the weeds. Their avoidance of addressing the nature of inference, as well as facts and data, often serves to make gatekeeping a sham.

And then there is the problem that arises from the lack of time, interest, and competence among judges and jurors to understand the technical details of the facts and data, and inferences therefrom, which underlie complex factual disputes in contemporary trials. Cross examination is reduced to the attempt to elicit “sound bites” and “cheap shots,” which can be used in closing argument. This approach is common on both sides of the bar, in trials before judges and juries, and even at so-called Daubert hearings. See David E. Bernstein & Eric G. Lasker, “Defending Daubert: It’s Time to Amend Federal Rule of Evidence 702,” 57 William & Mary L. Rev. 1, 32 (2015) (“Rule 703 is frequently ignored in Daubert analyses”).

The Rule 702 and 703 pretrial hearing is an opportunity to address the highly technical validity questions, but even then, the process is doomed to failure unless trial judges make adequate time and adopt an attitude of real intellectual curiosity to permit a proper exploration of the evidentiary issues. Trial lawyers often discover that a full exploration is technical and tedious, and that it pisses off the trial judge. As much as judges dislike having to serve as gatekeepers of expert witness opinion testimony, they dislike even more having to assess the reasonableness of individual expert witnesses’ reliance upon facts and data, especially when this inquiry requires a deep exploration of the methods and materials of each relied upon study.

In favor of something like Rule 703, Bernstein’s critique ignores that there are some facts and data that will never be independently admissible. Epidemiologic studies, with their multiple layers of hearsay, come to mind.

Judge Bernstein, as a reformer, is wrong to suggest that the problem is solely in hiding the facts and data from the jury. Rules 702 and 703 march together, and there are problems with both that require serious attention. See David E. Bernstein & Eric G. Lasker, “Defending Daubert: It’s Time to Amend Federal Rule of Evidence 702,” 57 William & Mary L. Rev. 1 (2015); see also “On Amending Rule 702 of the Federal Rules of Evidence” (Oct. 17, 2015).

And we should remember that the problem is not solely with juries and their need to see the underlying facts and data. Judges try cases too, and can butcher scientific inference without any help from a lay jury. Then there is the problem of relied upon opinions, discussed above. And then there is the problem of unreasonable reliance of the sort that juries cannot discern even if they see the underlying, relied upon facts and data.


[1] Schachtman, “Rule 703 – The Problem Child of Article VII”; and “The Effective Presentation of Defense Expert Witnesses and Cross-examination of Plaintiffs’ Expert Witnesses”; at the ALI-ABA Course on Opinion and Expert Witness Testimony in State and Federal Courts (February 14-15, 2008).

[2] See Julie E. Seaman, “Triangulating Testimonial Hearsay: The Constitutional Boundaries of Expert Opinion Testimony,” 96 Georgetown L.J. 827 (2008).

[3]  Nathan A. Schachtman, “Rule of Evidence 703—Problem Child of Article VII,” 17 Proof 3 (Spring 2009).

[4] “RULE OF EVIDENCE 703 — Problem Child of Article VII” (Sept. 19, 2011).

[5] See “Giving Rule 703 the Cold Shoulder” (May 12, 2012); “New Reference Manual on Scientific Evidence Short Shrifts Rule 703,” (Oct. 16, 2011).

[6] Max Mitchell, “Bernstein Announces Plan to Step Down as Judge,” The Legal Intelligencer (July 29, 2016).

[7] See Schachtman, “Court-Appointed Expert Witnesses,” for Mealey’s Judges & Lawyers in Complex Litigation, Class Actions, Mass Torts, MDL and the Monster Case Conference, in West Palm Beach, Florida (November 8-9, 1999). I don’t recall Judge Bernstein’s exact topic, but I remember he criticized the Pennsylvania Supreme Court’s decision in Blum v. Merrell Dow Pharmaceuticals, 534 Pa. 97, 626 A.2d 537 (1993), which reversed a judgment for plaintiffs, and adopted what Judge Bernstein derided as a blending of Frye and Daubert, which he called Fraubert. Judge Bernstein had presided over the Blum trial, which resulted in the verdict for plaintiffs.

Excited Utterance Podcast Series on Evidence Law

August 25th, 2016

As a graduate student, I was impressed by the extent to which scholars traveled to other schools to present draft papers and obtain feedback from other faculties and graduate students.  For a student, these presentations were interesting opportunities to engage with leading scholars and learn from their new ideas, as well as their mistakes.  Law school faculties back in the 1970s seemed like a much less collegial community of scholars, who rarely shared their ideas before publication, and thus did not receive the benefit of feedback from other scholars.

The isolation of legal scholarship has been mitigated in good law schools with the introduction of invited lectures and presentations, often at weekly seminars or luncheons.  These meetings can be exciting and inspiring, but obviously participation is limited, and the financial and travel time constraints can be burdensome.

Edward Cheng, who teaches evidence and related subjects at Vanderbilt Law School, has introduced an interesting idea: scholarly podcasts on legal topics in his field of interest. Professor Cheng’s stated hope is that he can produce and provide podcasts, on scholarly topics in the law of evidence, which replicate the faculty seminar for a broader audience.

To be sure, there have been podcasts about specific legal cases, such as the famously successful “Undisclosed” podcast on the Adnan Syed case, which can honestly share in the credit in helping expose corruption and dishonesty in the prosecution of Mr. Syed, and in helping Mr. Syed obtain a new trial. Professor Cheng’s planned podcast series, “Excited Utterance: The Evidence and Proof Podcast,” will be on evidentiary topics more of interest to legal scholars, students, and practitioners. His stated goal is to focus on legal scholarship on evidence law and “to provide a weekly virtual workshop in the world of evidence throughout the academic year” to a broader audience, more efficiently than the sporadic visiting lectures that any one school can sponsor on evidentiary topics.

The project seems worth the effort in theory, and we will see what it produces in practice. The fall 2016 schedule for Cheng’s Excited Utterance podcasts is set out below; and the first one, by Daniel Capra, is already available at iTunes, and at the Excited Utterance website.

Daniel Capra, “Electronically Stored Information and the Ancient Documents Exception” (Aug. 22, 2016)

Michael Pardo, “Group Agency and Legal Proof, or Why the Jury Is An It” (Aug. 29, 2016)

Mary Fan, “Justice Visualized” (Sept. 5, 2016)

Sachin Pandya, “The Constitutional Accuracy of Legal Presumptions” (Sept. 12, 2016)

Christopher Slobogin, “Gatekeeping Science” (Sept. 19, 2016)

Mark Spottswood, “Unraveling the Conjunction Paradox” (Sept. 26, 2016)

Deryn Strange, “Memory Errors in Alibi Generation” (Oct. 3, 2016)

Sandra Guerra Thompson, “Cops in Lab Coats” (Oct. 10, 2016)

Maggie Wittlin, “Hindsight Evidence” (Oct. 17, 2016)

Stephanos Bibas, “Designing Plea Bargaining from the Ground Up” (Oct. 24, 2016)

Erin Murphy, “Inside the Cell: The Dark Side of Forensic DNA” (Oct. 31, 2016)

Pamela R. Metzger, “Confrontation as a Rule of Production” (Nov. 7, 2016)

Nancy S. Marder, “Juries and Lay Participation: American Perspectives and Global Trends” (Nov. 14, 2016)

Jay Koehler, “Testing for Accuracy in the Forensic Sciences” (Nov. 21, 2016)

Art Historian Expert Testimony

August 15th, 2016

Art appraisal and authentication is sometimes held out as a non-technical and non-scientific area of expertise, and as such, not subject to rigorous testing.[1] But to what extent is this simply excuse mongering for an immature field of study? The law has seen way too much of this sort of rationalization in criminal forensic studies.[2] If an entire field of learning suffers from unreliability because of its reliance upon subjective methodologies, lack of rigor, inability or unwillingness to use measurements, failure to eliminate biases through blinding, and the like, then do expert witnesses in this field receive a “pass” under Rule 702, simply because they are doing reasonably well compared with their professional colleagues?

In the movie Who the Fuck is Jackson Pollock, the late Thomas Hoving was interviewed about the authenticity of a painting claimed to have been “painted” by Jackson Pollock. Hoving “authoritatively,” and with his typical flamboyance, averred that the disputed painting was not a Pollock because the work “did not sing to me like a Pollock.” Hoving did not, however, attempt to record the notes he heard; nor did Hoving speak to what key Pollock usually painted in.

In a recent case of defamation and tortious interference with prospective business benefit, a plaintiff sued over the disparagement of a painting’s authenticity and provenance. As a result of the defendants’ statements that the painting at issue was not created by Peter M. Doig, auction houses refused to sell the painting held by plaintiff. In litigation, the plaintiff proffered an expert witness who opined that the painting was, in fact, created by Doig. The defendants challenged plaintiff’s expert witness as not reliable or relevant under Federal Rule of Evidence 702. Fletcher v. Doig, 13 C 3270, 2016 U.S. Dist. LEXIS 95081 (N.D. Ill. July 21, 2016).

Peter Bartlow, the plaintiff’s expert witness on authenticity, was short on academic credentials. He had gone to college, and finished only one year of graduate study in art history. Bartlow did, however, have 40 years of experience in appraisal and authentication. Fletcher, at *3-4. Beyond qualifications, the defendants complained that Bartlow’s method was

(1) invented for the case,

(2) was too “generic” to establish authenticity, and

(3) failed to show that any claimed generic feature was unique to the work of the artist in question, Peter M. Doig.

The trial court rebuffed this challenge by noting that Peter Bartlow did not have to be an expert specifically in Doig’s work. Fletcher at *7. Similarly, the trial court rejected the defendants’ suggestion that the disputed work must exhibit “unique” features of Doig’s oeuvre. Bartlow had made a legally sufficient case for his opinions based upon a qualitative analysis of 45 acknowledged works, using specific qualitative features of 11 known works. Id. at *10. Specifically, Bartlow compared types of paint, similarities in styles, shapes and positioning, and “repeated lineatures” by superimposing lines from known paintings to the questioned ones. Id. With respect to the last of these approaches, the trial court accepted Bartlow’s explanation that superimposing lines to show similarity was simply a refinement of methods commonly used by art appraisers.

By comparison with Thomas Hoving’s subjective auditory methodology, as explained in Who the Fuck, Bartlow’s approach was positively brilliant, even if the challenged methodologies left much to be desired. For instance, Bartlow compared one disputed painting with 45 or so paintings of accepted provenance. No one tested Bartlow’s ability, blinded to provenance, to identify true and false positives of Doig paintings. See “The Eleventh Circuit Confuses Adversarial and Methodological Bias, Manifestly Erroneously” (June 6, 2015); see generally Christopher Robertson & Aaron Kesselheim, Blinding as a Solution to Bias: Strengthening Biomedical Science, Forensic Science, and Law (2016).

Interestingly, the Rule 702 challenges in Fletcher were in a case slated to be tried by the bench. The trial court thus toasted the chestnut that trial courts have even greater latitude in admitting expert witness opinion testimony in bench trials, in which “the usual concerns of [Rule 702] – keeping unreliable testimony from the jury – are not present.” Fletcher at *3 (citing Metavante Corp. v. Emigrant Savings Bank, 619 F.3d 648, 670 (7th Cir. 2010)). Citing Seventh Circuit precedent, the trial court, in Fletcher, asserted that the need to rule on admissibility before trial was lessened in a bench trial. Id. (citing In re Salem, 465 F.3d 767, 777 (7th Cir. 2006)). The courts that have taken this position have generally failed to explain why the standard for granting or denying a Rule 702 challenge should be different in a bench trial. Clearly, a bench trial can be just as much a waste of time, money, and energy as a jury trial. Even more clearly, judges can be, and are, snookered by misleading expert witness opinions, and they are also susceptible to their own cognitive biases and the false allure of unreliable opinion testimony, built upon invalid inferences. Men and women do not necessarily see more clearly when wearing black robes, but they can achieve some measure of objectivity by explaining and justifying their gatekeeping opinions in writing, subject to public review, comment, and criticism.


[1] See, e.g., Lees v. Carthage College, 714 F.3d 516, 525 (7th Cir. 2013) (holding that an expert witness’s testimony on premises security involved non-scientific expertise and knowledge that did “not easily admit of rigorous testing and replication”).

[2] See, e.g., National Academies of Science, Strengthening Forensic Science in the United States: A Path Forward (2009).

Whether to Conduct Depositions of Expert Witnesses

June 23rd, 2016

In a Litigation magazine article, Gregory Joseph sets out some strong reasons for not conducting depositions of expert witnesses under the revised 2010 Federal Rules of Civil Procedure (FRCP). See Gregory P. Joseph, “The Temptation to Depose Every Expert,” 40 Litigation 35 (Winter 2014) [cited below as Joseph]. Joseph points out that FRCP 26(a)(2)(B) requires parties to disclose, for all retained expert witnesses, “all opinions” and the “full factual basis” of all their opinions, among other things. The rule is exacting. All opinions includes “a complete statement of all opinions the witness will express and the basis and reasons for them.” FRCP 26(a)(2)(B)(i). And a full factual basis includes “the facts or data considered by the witness in forming” all of the opinions disclosed in the report. FRCP 26(a)(2)(B)(ii) (emphasis added).

Joseph argues that the breadth of the required disclosure, combined with sanctions for retained expert witnesses’ attempting to testify beyond the four corners of their reports, should give lawyers sufficient assurances in many instances to forgo conducting depositions of expert witnesses.

Joseph notes that the FRCP creates a presumptive mandatory sanction of exclusion for undisclosed expert testimony. FRCP 37(c)(1).[1]  Joseph offers other arguments beyond the supposed comfort given by the “four corners” rule set out in the FRCP. Joseph at 36-37. First, the deposition may “reopen” discovery by giving expert witnesses opportunities to expand upon the four corners of their reports. Although some courts will limit what expert witnesses can throw over the transom at depositions, a supervising magistrate or district judge may not regard the expansion upon the disclosures in the report as “sandbagging,” and thus fail to exclude the arguably new opinions or bases. Joseph cites a few cases in which courts condemned the sandbagging of counsel by the offering of new opinions in depositions, but points out that exclusion in this circumstance is highly discretionary. The court is not required to exclude, and it may permit the new material, or allow the new material with an inadequate amount of additional time in deposition. So taking the deposition has risks.

Joseph argues also that depositions may educate expert witnesses about intended trial cross-examination, and help adversary counsel better prepare direct examination and anticipatory rebuttal. Furthermore, the new protections afforded expert witnesses from discovery into drafts of reports and most communications with retaining counsel take away one of the previous reasons to conduct depositions.

To be sure, some additional areas of discovery may be covered by interrogatories, Rule 34 document requests, or Rule 45 subpoenas directly to the expert witnesses. These non-deposition methods of discovery, however, will not reach valuable topics of discovery such as oral communications between retained expert witnesses and professional colleagues, consulting expert witnesses, the retaining lawyers’ clients, and other persons. The suggested alternative discovery methods also suffer in that they will provoke canned answers, written by counsel, and not the ingenuous, unrehearsed responses of expert witnesses required to give answers directly and without resort to  “privileged” consultation with retaining counsel.

The revised FRCP carve out important areas of inquiry from the new protections against discovery into draft reports and communications with counsel. Counsel still are permitted to inquire into compensation, the retaining attorneys’ provision of “facts or data” considered by the witnesses, and retaining attorneys’ identification of assumptions “relied” upon by the witnesses. Invoices can, of course, be subpoenaed, but often oral examination is required to discover whether the invoices have been paid, whether they are contingent, or whether payment flows to the personal benefit of the expert witnesses. Inquiring into what “facts or data” were provided by retaining counsel can be attempted by written discovery, but the written responses will likely be hedged and unclear, and the responses will not distinguish which lawyer-provided “facts or data” were actually relied upon.

The FRCP clearly allow discovery into retaining attorneys’ provision of assumptions relied upon by expert witnesses, but clear, unrehearsed answers to questions about what was assumed and relied upon, as opposed to merely considered, are not likely to be forthcoming in written discovery. Furthermore, if there will be any fair opportunity to explore the significance of relying upon counsel’s assumptions, only a deposition will likely allow for the extemporaneous, first-person expression of expert witnesses’ opinions. Questions into expert witnesses’ opinions based upon hypothetical questions that contradict the assumptions given, or into opinions about the level of confidence or knowledge witnesses have about the correctness of the assumptions, are likely to be effective only in face-to-face encounters.

There are important additional reasons for taking expert witness depositions, not addressed in Joseph’s article. Litigation-savvy expert witnesses will often glibly assert that they have “considered” all the relevant studies, data, and facts. If written discovery is propounded to inquire whether a study omitted from the “consideration” list in the FRCP report was not considered, the study, if meaningful, will be added to the list in the written response with a feeble excuse that it was inadvertently omitted from the list. And the omission will likely be judged harmless because the party seeking discovery obviously knew about the omitted study already. Written discovery into what studies, data, or facts were considered but not relied upon will also yield highly rehearsed answers, and interrogatories will not permit inquiries into the fine details of key studies.

The pertinent sections of the FRCP do not require expert witness reports to distinguish what the witnesses have considered from what they have actually relied upon. Written discovery could be propounded, but again, it will not likely yield clear answers such as might be had with follow up inquiry into what was considered but not relied upon, and why reliance was rejected. The deposition upon oral examination has the benefit of permitting follow up questions into why some studies were relied upon for some parts but not others, or were considered but completely excluded from actual reliance. The opportunity to field incoherent, inconsistent rationales for inclusions and exclusions that establish expert witness cherry picking will be lost without the face-to-face encounter allowed by oral examination.

With some courts engaged in retrograde refusal to apply Rule 702 as enacted, some expert witnesses have been encouraged to employ vague, invalid, and unreliable methodologies, such as the so-called “weight of the evidence” approach. Oral examination will be necessary to establish expert witnesses’ weighting considerations, their inclusion and exclusion criteria, and to test their consistency in applying these considerations and criteria, across the entire evidentiary base for conclusions.

Concessions to Be Obtained

Written discovery is not well suited to inquire into general principles of interpreting data and studies, data integrity and validity, and validity of inference.  Interrogatories are too difficult to draft in sufficient detail to permit setting up an examination that will lead to the disqualification of the expert witness under Rule 702.  Obtaining concise, clear concessions about basic methodological principles is crucial to structuring persuasive cross-examinations.  Of course, if the deponent balks at accepting generally accepted principles, then this testimony is filed under “Rule 702 motion,” rather than “trial cross-examination.”

Furthermore, written discovery is poorly suited to identify whether expert witnesses have subject-matter weaknesses.  Interrogatories are the wrong discovery tool to conduct pop-quizzes on arcane statistical and scientific methodologies. Lawyers rightfully do not want to get into game-show style quizzes to test expert witnesses’ understanding of the esoteric, but important, methodologies used in the studies relied upon, in front of a jury. Rule 26 reports rarely announce that witnesses have had no meaningful training in statistics and that they have no idea what assumptions were made in various statistical analyses or tests in the studies that they have embraced and relied upon for their opinions.

Expert witnesses have social and professional connections not always apparent from their curriculum vitae, their Rule 26 reports, or their websites. Expert witnesses are not likely, for instance, to disclose that they are Marxists, who believe that corporations are evil and mercenary, and cannot be trusted to tell the truth in litigation.[2]

As noted, the FRCP requires disclosure of facts or data considered, which disclosure is usually inadequate to permit distinguishing what was actually relied upon in forming opinions. But what about opinions considered or relied upon? FRCP does not address reliance upon opinions; nor does Rule 703. Expert witnesses may contend that their opinions are not “based upon” others’ opinions, but that their opinions are strengthened and corroborated by the opinions of others. The FRCP do not specifically call for disclosure of opinions relied upon by retained expert witnesses, and adversary counsel can be trusted to argue that there were no obligations to disclose opinions or the identity of “authoritative” treatises and publications. If there is no entitlement to disclosure, there can be no surprise and prejudice.

Interpreting the scope of the report may not be as clear as Joseph suggests.  Rule 26 reports usually contain some opinions with sufficient breadth and generality that forgoing depositions becomes a game of Russian roulette.  Trial judges may not look kindly upon “scope of the report” objections, made at trial, when the objecting counsel had the opportunity to conduct an examination, and the report language is sufficiently broad to intimate the witness’s opinion at trial. Judges seem to have great hindsight vision, and they may well distrust counsel’s objections as a different sort of sandbagging. An entire strategy of restraint may be sunk by a quick, discretionary ruling on “scope of the report,” which often will favor the proponent of the witness.

Joseph is correct that many depositions fail to accomplish much, but such failures are not the result of how wonderful the revised FRCP are.  Failed depositions are more likely to result from the lack of preparation, creativity and knowledge of counsel in carrying out coherent, effective depositions.


[1] See Primus v. United States, 389 F.3d 231, 234 (1st Cir. 2004); Vaughn v. City of Lebanon, 18 F.App’x 252, 263 (6th Cir. 2001); Musser v. Gentiva Health Services, 356 F.3d 751, 758 (7th Cir. 2004). See also Design Strategy, Inc. v. Davis, 469 F.3d 284, 296 (2d Cir. 2006) (characterizing exclusion as discretionary, but upholding district court’s exclusion).

[2] Such as may be seen with expert witnesses who belong to the Committees of Correspondence for Democracy and Socialism, a branch of the Communist Party USA, formed in 1992, after the demise of the Soviet Union.

Lipitor Diabetes MDL’s Inexact Analysis of Fisher’s Exact Test

April 21st, 2016

Muriel Bristol was a biologist who studied algae at the Rothamsted Experimental Station in England, after World War I.  In addition to her knowledge of plant biology, Bristol claimed the ability to tell whether tea had been added to milk, or the tea poured first and then milk had been added.  Bristol, as a scientist and a proper English woman, preferred the latter.

Ronald Fisher, who also worked at Rothamsted, expressed his skepticism over Dr. Bristol’s claim. Fisher set about to design a randomized experiment that would efficiently and effectively test her claim. Bristol was presented with eight cups of tea, four of which were prepared with milk added to tea, and four prepared with tea added to milk.  Bristol, of course, was blinded to which was which, but was required to label each according to its manner of preparation. Fisher saw his randomized experiment as a 2 x 2 contingency table, from which he could calculate the probability of the observed outcome (and ones more extreme if there were any more extreme outcomes) using the assumption of fixed marginal rates and the hypergeometric probability distribution.  Fisher’s Exact Test was born at tea time.[1]

Fisher described the origins of his Exact Test in one of his early texts, but he neglected to report whether his experiment vindicated Bristol’s claim. According to David Salsburg, H. Fairfield Smith, one of Fisher’s colleagues, acknowledged that Bristol nailed Fisher’s Exact test, with all eight cups correctly identified. The test has gone on to become an important tool in the statistician’s armamentarium.
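The arithmetic behind the tea-tasting design can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name hypergeom_pmf and its parameter names are invented here, and the sketch assumes fixed margins, that is, the taster knows there are four cups of each kind and must pick exactly four as milk-first.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k, population, successes, draws):
    """Probability of exactly k successes when drawing `draws` items,
    without replacement, from `population` items of which `successes`
    are of the kind being counted. (Hypothetical helper for this sketch.)"""
    return Fraction(
        comb(successes, k) * comb(population - successes, draws - k),
        comb(population, draws),
    )

# Eight cups, four prepared milk-first; the taster picks four as milk-first.
# k counts how many of her four picks are truly milk-first.
dist = {k: hypergeom_pmf(k, 8, 4, 4) for k in range(5)}

# All eight cups are labeled correctly only when k == 4.
p_all_correct = dist[4]          # 1/70
# Nothing is more extreme than a perfect score, so the one-sided
# p-value for a perfect score is the same 1/70.
p_three_or_more = dist[3] + dist[4]

print(p_all_correct, float(p_all_correct))
print(p_three_or_more, float(p_three_or_more))
```

On these assumptions, a perfect score has a chance probability of 1 in 70, which is why eight cups sufficed for Fisher’s purposes; a single misplaced pair would have left the result well short of conventional significance.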

Fisher’s Exact, like any statistical test, has model assumptions and preconditions.  For one thing, the test is designed for categorical data, with binary outcomes. The test allows us to evaluate whether two proportions are likely different by chance alone, by calculating the probability of the observed outcome, as well as more extreme outcomes.

The calculation of an exact attained significance probability, using Fisher’s approach, provides a one-sided p-value, with no unique solution to calculating a two-sided attained significance probability. In discrimination cases, the one-sided p-value may well be more appropriate for the issue at hand. The Fisher’s Exact Test has thus played an important role in showing the judiciary that small sample size need not be an insuperable barrier to meaningful statistical analysis. In discrimination cases, the one-sided p-value provided by the test is not a particular problem.[2]

The difficulty of using Fisher’s Exact for small sample sizes is that the hypergeometric distribution, upon which the test is based, is highly asymmetric. The observed one-sided p-value does not measure the probability of a result equally extreme in the opposite direction. There are at least three ways to calculate the two-sided p-value:

  1. Double the one-sided p-value.
  2. Add the point probabilities from the opposite tail that are more extreme than the observed point probability.
  3. Use the mid-P value; that is, add all values more extreme (smaller) than the observed point probability from both sides of the distribution, PLUS ½ of the observed point probability.

Some software programs will proceed in one of these ways by default, but their doing so does not guarantee the most accurate measure of two-tailed significance probability.
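The three conversions listed above can be compared side by side in a short Python sketch. Everything specific in it is an assumption made for illustration: the 2 x 2 table is hypothetical (8 exposed subjects with 1 event, 12 unexposed subjects with 6 events), the function names are invented, and ties are handled by one convention among several (method 2 counts opposite-tail outcomes no more probable than the observed one; the mid-p counts only strictly less probable outcomes, plus half the observed point probability). The sketch is not a reproduction of any software package’s default.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k, population, successes, draws):
    """Hypergeometric point probability (hypothetical helper)."""
    return Fraction(
        comb(successes, k) * comb(population - successes, draws - k),
        comb(population, draws),
    )

def two_sided_variants(k_obs, population, successes, draws):
    """One-sided exact p-value and three two-sided conversions
    for a 2 x 2 table with fixed margins."""
    lo = max(0, draws - (population - successes))
    hi = min(draws, successes)
    pmf = {k: hypergeom_pmf(k, population, successes, draws)
           for k in range(lo, hi + 1)}
    p_obs = pmf[k_obs]

    lower = sum(p for k, p in pmf.items() if k <= k_obs)
    upper = sum(p for k, p in pmf.items() if k >= k_obs)
    observed_in_lower_tail = lower <= upper
    one_sided = lower if observed_in_lower_tail else upper

    # Method 1: double the one-sided p-value.
    doubled = 2 * one_sided

    # Method 2: add opposite-tail point probabilities that are
    # no larger than the observed point probability.
    opposite = sum(
        p for k, p in pmf.items()
        if (k > k_obs if observed_in_lower_tail else k < k_obs) and p <= p_obs
    )
    probability_ordered = one_sided + opposite

    # Method 3: mid-p, all strictly smaller point probabilities
    # from both sides plus half of the observed point probability.
    mid_p = sum(p for p in pmf.values() if p < p_obs) + p_obs / 2

    return {"one_sided": one_sided, "doubled": doubled,
            "probability_ordered": probability_ordered, "mid_p": mid_p}

# Hypothetical table: 20 subjects, 7 events in total, 8 exposed, 1 exposed event.
result = two_sided_variants(k_obs=1, population=20, successes=7, draws=8)
for name, value in result.items():
    print(f"{name}: {float(value):.4f}")
```

For this hypothetical table, the three conversions give three different two-sided values from the same data and the same test, which is the practical point: the choice of conversion, and not any change of test, drives the difference.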

In the Lipitor MDL for diabetes litigation, Judge Gergel generally used sharp analyses to cut through the rancid fat of litigation claims, to get to the heart of the matter. By and large, he appears to have done a splendid job. In the course of gatekeeping under Federal Rule of Evidence 702, however, Judge Gergel may have misunderstood the nature of Fisher’s Exact Test.

Nicholas Jewell is a well-credentialed statistician at the University of California.  In the courtroom, Jewell is a well-known expert witness for the litigation industry.  He is no novice at generating unreliable opinion testimony. See In re Zoloft Prods. Liab. Litig., No. 12–md–2342, 2015 WL 7776911 (E.D. Pa. Dec. 2, 2015) (excluding Jewell’s opinions as scientifically unwarranted and methodologically flawed). In the Lipitor cases, some of Jewell’s opinions seemed outlandish indeed, and Judge Gergel generally excluded them. See In re Lipitor Marketing, Sales Practices and Prods. Liab. Litig., MDL No. 2:14-mn-02502-RMG, ___ F.Supp. 3d  ___ (2015), 2015 WL 7422613 (D.S.C. Nov. 20, 2015) [Lipitor Jewell], reconsideration den’d, 2016 WL 827067 (D.S.C. Feb. 29, 2016) [Lipitor Jewell Reconsidered].

As Judge Gergel explained, Jewell calculated a relative risk for abnormal blood glucose in a Lipitor group to be 3.0 (95% C.I., 0.9 to 9.6), using STATA software. Also using STATA, Jewell obtained an attained significance probability of 0.0654, based upon Fisher’s Exact Test. Lipitor Jewell at *7.

Judge Gergel did not report whether Jewell’s reported p-value of 0.0654 was one- or two-sided, but he did state that the attained probability “indicates a lack of statistical significance.” Id. & n. 15. The rest of His Honor’s discussion of the challenged opinion, however, makes clear that 0.0654 must have been a two-sided value.  If it had been a one-sided p-value, then there would have been no way of invoking the mid-p to generate a two-sided p-value below 5%. The mid-p will always be larger than the one-tailed exact p-value generated by Fisher’s Exact Test.

The court noted that Dr. Jewell had testified that he believed that STATA generated this confidence interval by “flip[ping]” the Taylor series approximation. The STATA website notes that it calculates confidence intervals for odds ratios (which are different from the relative risk that Jewell testified he computed), by inverting the Fisher exact test.[3] Id. at *7 & n. 17. Of course, this description suggests that the confidence interval is not based upon exact methods.

STATA does not provide a mid p-value calculation, and so Jewell used an on-line calculator, to obtain a mid p-value of 0.04, which he declared statistically significant. The court took Jewell to task for using the mid p-value as though it were a different analysis or test.  Id. at *8. Because the mid-p value will always be larger than the one-sided exact p-value from Fisher’s Exact Test, the court’s explanation does not really make sense:

“Instead, Dr. Jewell turned to the mid-p test, which would ‘[a]lmost surely’ produce a lower p-value than the Fisher exact test.”

Id. at *8. The mid-p test, however, is not different from the Fisher’s exact; rather it is simply a way of dealing with the asymmetrical distribution that underlies the Fisher’s exact, to arrive at a two-tailed p-value that more accurately captures the rate of Type I error.

The MDL court acknowledged that the mid-p approach was not inherently unreliable, but questioned Jewell’s inconsistent, selective use of the approach for only one test.[4]  Jewell certainly did not help the plaintiffs’ cause and his standing by having discarded the analyses that were not incorporated into his report, thus leaving the MDL court to guess at how much selection went on in his process of generating his opinions.  Id. at *9 & n. 19.

None of Jewell’s other calculated p-values involved the mid-p approach, but the court’s criticism begs the question whether the other p-values came from a Fisher’s Exact Test with small sample size, or other highly asymmetrical distribution. Id. at *8. Although Jewell had shown himself willing to engage in other dubious, result-oriented analyses, Jewell’s use of the mid-p for this one comparison may have been within acceptable bounds after all.

The court also noted that Jewell had obtained the “exact p-value and that this p-value was not significant.” Id. The court’s notation here, however, does not report the important detail whether that exact, unreported p-value was merely the doubling of the one-sided p-value given by the Fisher’s Exact Test. As the STATA website, cited by the MDL court, explains:

“The test naturally gives a one-sided p-value, and there are at least four different ways to convert it to a two-sided p-value (Agresti 2002, 93). One way, not implemented in Stata, is to double the one-sided p-value; doubling is simple but can result in p-values larger than one.”

Wesley Eddings, “Fisher’s exact test two-sided idiosyncrasy” (Jan. 2009) (citing Alan Agresti, Categorical Data Analysis 93 (2d ed. 2002)).
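The remark that doubling “can result in p-values larger than one” is easy to verify with the tea-tasting design itself. The following Python sketch is illustrative only; the helper name is invented here, and the outcome examined (two of four milk-first cups correctly picked, the most probable result under chance) is chosen simply because it sits in the middle of the distribution.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k, population, successes, draws):
    """Hypergeometric point probability (hypothetical helper)."""
    return Fraction(
        comb(successes, k) * comb(population - successes, draws - k),
        comb(population, draws),
    )

# Tea-tasting margins: 8 cups, 4 milk-first, 4 picked as milk-first.
pmf = {k: hypergeom_pmf(k, 8, 4, 4) for k in range(5)}

# Observed outcome in the middle of the distribution: 2 correct picks.
k_obs = 2
lower_tail = sum(p for k, p in pmf.items() if k <= k_obs)
upper_tail = sum(p for k, p in pmf.items() if k >= k_obs)
one_sided = min(lower_tail, upper_tail)

# Doubling a one-sided p-value above 0.5 yields a "probability" above 1.
doubled = 2 * one_sided
print(float(one_sided), float(doubled))
```

Because the observed outcome is counted in both tails, either one-sided p-value exceeds one half, and the doubled figure cannot be read as a probability at all.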

On plaintiffs’ motion for reconsideration, the MDL court reaffirmed its findings with respect to Jewell’s use of the mid-p.  Lipitor Jewell Reconsidered at *3. In doing so, the court insisted that the one instance in which Jewell used the mid-p stood in stark contrast to all the other instances in which he had used Fisher’s Exact Test.  The court then cited to the record to identify 21 other instances in which Jewell used a p-value rather than a mid-p value.  The court, however, did not provide the crucial detail whether these 21 other instances actually involved small-sample applications of Fisher’s Exact Test.  As result-oriented as Jewell can be, it seems safe to assume that not all his statistical analyses involved Fisher’s Exact Test, with its attendant ambiguity for how to calculate a two-tailed p-value.


Post-Script (Aug. 9, 2017)

The defense argument and the judicial error were echoed in a Washington Legal Foundation paper that pilloried Nicholas Jewell for the surfeit of methodological flaws in his expert witness opinions in In re Lipitor. Unfortunately, the paper uncritically recited the defense’s theory about the Fisher’s Exact Test:

“In assessing Lipitor data, even after all of the liberties that [Jewell] took with selecting data, he still could not get a statistically-significant result employing a Fisher’s exact test, so he switched to another test called a mid-p test, which generated a (barely) statistically significant result.”

Kirby Griffis, “The Role of Statistical Significance in Daubert/Rule 702 Hearings,” at 19, Wash. Leg. Foundation Critical Legal Issues Working Paper No. 201 (Mar. 2017). See Kirby Griffis, “Beware the Weak Argument: The Rule of Thirteen,” For the Defense 72 (July 2013) (quoting Justice Frankfurter, “A bad argument is like the clock striking thirteen. It puts in doubt the others.”). The fallacy of Griffis’ argument is that it assumes that a mid-p calculation is a different statistical test from Fisher’s Exact Test, which yields a one-tailed significance probability. Unfortunately, Griffis’ important paper is marred by this and other misstatements about statistics.


[1] Sir Ronald A. Fisher, The Design of Experiments at chapter 2 (1935); see also Stephen Senn, “Tea for three: Of infusions and inferences and milk in first,” Significance 30 (Dec. 2012); David Salsburg, The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century (2002).

[2] See, e.g., Dendy v. Washington Hosp. Ctr., 431 F. Supp. 873 (D.D.C. 1977) (denying preliminary injunction), rev’d, 581 F.2d 990 (D.C. Cir. 1978) (reversing denial of relief, and remanding for reconsideration). See also National Academies of Science, Reference Manual on Scientific Evidence 255 n.108 (3d ed. 2011) (“Well-known small sample techniques [for testing significance and calculating p-values] include the sign test and Fisher’s exact test.”).

[3] See Wesley Eddings, “Fisher’s exact test two-sided idiosyncrasy” (Jan. 2009), available at <http://www.stata.com/support/faqs/statistics/fishers-exact-test/>, last visited April 19, 2016 (“Stata’s exact confidence interval for the odds ratio inverts Fisher’s exact test.”). This article by Eddings contains a nice discussion of why the attained significance probability from Fisher’s Exact Test can disagree with the calculated confidence interval. Eddings points out the asymmetry of the hypergeometric distribution, which complicates arriving at an exact p-value for a two-sided test.

[4] See Barber v. United Airlines, Inc., 17 Fed.Appx. 433, 437 (7th Cir. 2001) (“Because in formulating his opinion Dr. Hynes cherry-picked the facts he considered to render an expert opinion, the district court correctly barred his testimony because such a selective use of facts fails to satisfy the scientific method and Daubert.”).