TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Lawyer and Economist Expert Witnesses Fail the t-Test

July 7th, 2016

Chad L. Staller is a lawyer and James Markham is an economist.  The two testify frequently in litigation.  They are principals in a litigation-mill known as the Center for Forensic Economic Studies (CFES), which has been a provider of damages opinions-for-hire for decades.

According to its website, the CFES is:

“a leading provider of expert economic analysis and testimony. Our economists and statisticians consult on matters arising in litigation, with a focus on the analysis of economic loss and expert witness testimony on damages.

We assist with discovery, uncover key data, critique opposing claims and produce clear, credible reports and expert testimony. Attorneys and their clients have relied on our expertise in thousands of cases in jurisdictions across the country.”

Modesty was never CFES’s strong suit. CFES was founded by Chad Staller’s father, the late Jerome M. Staller, who infused the run-away inflation of the early 1980s into his reports for plaintiffs in personal injury actions. When this propensity for inflation brought in a large volume of litigation consulting, Staller brought on Brian P. Sullivan.  The CFES website notes that Sullivan’s “courtroom demeanor was a model of modesty and good humor, yet he was known to be merciless when cross examined by an opposing attorney.” My personal recollection is that Sullivan sweated profusely on cross-examination. In one case, in which I cross-examined him, Sullivan had added several figures incorrectly to the plaintiff’s detriment.  My cross-examination irked the trial judge (Judge Dowling, who was easily irked) to the point that he interrupted me to ask why I was wasting time to point out an error that favored the defense. The question allowed me to give a short summation about how I thought the jury might want to know that the witness, Sullivan, had such difficulty in adding uncomplicated numbers.

In Butt v. United Brotherhood of Carpenters & Joiners of America, 2016 WL 3365772 (E.D. Pa. June 16, 2016) [cited as Butt], plaintiffs, women union members, sued for alleged disparate treatment, which supposedly caused them to have lower incomes than male union members. To support their claims, the women produced reports prepared by CFES’s Chad Staller and James Markham. Counsel for the union challenged the admissibility of the proffered opinions under Rule 702. The magistrate judge sustained the Rule 702 challenges, in an opinion that questioned the reliability and ability of the challenged putative expert witnesses.[1]

Staller and Markham apparently had proffered a “t-test,” which, in their opinion, showed a statistically significant disparity in male and female hours worked, “not attributable to chance.” Butt at *1. Staller and Markham failed, however, to explain or justify their use of the t-test.  The sample size in their analysis included 17 women and 388 men on average across ten years. The magistrate judge noted serious reservations over the CFES analysis’s failure to specify how many men or women were employed in any given year. Plaintiffs’ counsel improvidently attempted to support the CFES analysis by adverting to the Reference Manual on Scientific Evidence (3d ed. 2011), which properly notes that the t-test is designed for small samples, but also issues the caveat that “[a] t-test is not appropriate for small samples drawn from a population that is not normal.” Butt at *1 n.2. The CFES reports, submitted without statistical analysis output, apparently did not attempt to justify the assumption of normality; nor did they proffer a non-parametric analysis.
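To see what is at stake methodologically, here is a minimal sketch on entirely made-up data (not the CFES figures), comparing group means with Welch’s t-test and with a non-parametric alternative when the underlying population is plainly not normal. Only the 17-versus-388 group sizes echo the numbers above; everything else is invented for illustration.

```python
# Comparing a small and a large group with a t-test and a non-parametric test,
# on skewed (non-normal) synthetic data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
hours_women = rng.gamma(shape=2.0, scale=400.0, size=17)    # small, skewed sample
hours_men = rng.gamma(shape=2.5, scale=400.0, size=388)     # large, skewed sample

# Welch's t-test: does not assume equal variances, but leans on approximate
# normality, which is doubtful for a skewed sample of 17
t_stat, t_p = stats.ttest_ind(hours_women, hours_men, equal_var=False)

# Mann-Whitney U: a non-parametric alternative that does not assume normality
u_stat, u_p = stats.mannwhitneyu(hours_women, hours_men, alternative="two-sided")

# Shapiro-Wilk as a rough check of normality for the small sample
w_stat, w_p = stats.shapiro(hours_women)

print(f"Welch t-test:          t = {t_stat:.2f}, p = {t_p:.3f}")
print(f"Mann-Whitney U:        U = {u_stat:.0f}, p = {u_p:.3f}")
print(f"Shapiro-Wilk (n=17):   W = {w_stat:.3f}, p = {w_p:.3f}")
```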

Putting aside the plaintiffs’ expert witnesses’ failure to explain and justify its use of the t-test, the magistrate judge took issue with the assumption that a comparison of average salaries between the genders was an appropriate analysis in the first place. Butt at *2.

First, the CFES reports assigned damages beyond the years used in their data analysis, which ended in 2012. This extrapolation was especially speculative and unwarranted given that union carpenter working hours were trending downward after 2009. Butt at *3. Second, and even more seriously, the magistrate judge saw that no useful comparison could be made between male and female salaries without taking into account several important additional variables such as their individual skills, the extent to which individual carpenters solicited employment, or used referral systems, or accepted out-of-town employment. Butt at *3.[2] Without an appropriate multivariate analysis, the CFES reports could not conclude that the discrepancy in hours worked was caused by, rather than merely correlated with, gender. Butt at *4.[3]


[1] See Calhoun v. Yamaha Motor Corp., U.S.A., 350 F.3d 316, 322 (3d Cir. 2003) (affirming exclusion of “speculative and unreliable” expert evidence).

[2] citing Stair v. Lehigh Valley Carpenters Local Union No. 600 of United Brotherhood of Carpenters and Joiners of America, No. Civ. A. 91-1507, 1993 WL 235491, at *7, *18 (E.D. Pa. July 24, 1993) (Huyett, J.), aff’d, 43 F.3d 1463 (3d Cir. 1994) (“Many variables determine the number of hours worked by a carpenter: whether the carpenter solicits employment, whether he or she uses the referral system, whether an employer asks for that carpenter by name, whether the carpenter will accept out of town employment, and whether the carpenter has the skills requested by an employer when that employer calls the Union for a referral.”)

[3] Interesting cases cited by the magistrate judge in support included Molthan v. Temple University, 778 F.2d 955, 963 (3d Cir. 1985) (“Because the considerations affecting promotion decisions may differ greatly from one department to another, statistical evidence of a general underrepresentation of women in the position of full professor adds little to a disparate treatment claim.”); Riding v. Kaufmann’s Dep’t Store, 220 F.Supp. 2d 442, 459 (W.D. Pa. 2002) (“Plaintiff’s statistical evidence is mildly interesting, but she does not put the data in context (how old were the women?) [or] tell us what to do with it or what inferences should be gathered from it…”); Brown v. Cost Co., No. Civ. A. 03-224 ERIE, 2006 WL 544296, at *3 (W.D. Pa. Mar. 3, 2006) (excluding statistical evidence proffered in support of claims of disparate treatment).

National Academies’ Teaching Modules on Scientific Policy Issues

June 30th, 2016

Today, the National Academies of Sciences, Engineering, and Medicine announced its release of nine teaching modules to help public policy decision makers and students in professional schools understand the role of science in policy decision making.[1] The modules were developed by university faculty members for  the use of other faculty who want to help their students appreciate the complexity and nuances of the evidence for and against scientific claims.

A group within the Academies’ Committee on Science, Technology and the Law supervised the development of the teaching modules, which are now publicly available at the Academies’ website. The Committee was chaired by Paul Brest, former dean and professor emeritus (active), Stanford Law School, and Saul Perlmutter, Franklin W. and Karen Weber Dabby Chair, University of California, Berkeley, and senior scientist, E.O. Lawrence Berkeley National Laboratory. The Gordon and Betty Moore Foundation and the National Biomedical Research Foundation sponsored the development of the modules.

The modules use case studies to illustrate basic scientific and statistical principles involved in contemporary scientific issues that have significant policy implications. The modules are designed to help future policy and decision makers understand and evaluate the scientific evidence that they will doubtlessly encounter. To date, nine modules have been developed and released, in the hope that they will serve as references and examples for future teaching modules.

The nine modules prepared to date are:

Models: Scientific Practice in Context

prepared by:
– Elizabeth Fisher, Professor of Environmental Law, Faculty of Law and Corpus Christi College, Oxford University
– Pasky Pascual, Environmental Protection Agency
– Wendy Wagner, Joe A. Worsham Centennial Professor,  University of Texas at Austin School of Law

The Interpretation of DNA Evidence: A Case Study in Probabilities

prepared by:

– David H. Kaye, Associate Dean for Research and Distinguished Professor, The Pennsylvania State University (Penn State Law)

Translating Science into Policy: The Role of Decision Science

prepared by:

– Paul Brest, Former Dean and Professor Emeritus (active), Stanford Law School

Placing a Bet: A New Therapy for Parkinson’s Disease

prepared by:

– Kevin W. Sharer, Senior Lecturer, Harvard Business School, Harvard University

Shale Gas Development

prepared by:

– John D. Graham, Dean, School of Public and Environmental Affairs, Indiana University
– John A. Rupp, Adjunct Instructor, School of Public and Environmental Affairs, and Senior Research Scientist, Indiana Geological Survey, Indiana University
– Adam V. Maltese, Associate Professor of Science Education, School of Education, and Adjunct Faculty in Department of Geological Sciences, Indiana University

Drug-Induced Birth Defects: Exploring the Intersection of Regulation, Medicine, Science, and Law

prepared by:

– Nathan A. Schachtman, Lecturer in Law, Columbia Law School

Vaccines

prepared by:

– Arturo Casadevall, Professor and Chair, W. Harry Feinstone Department of Molecular Microbiology and Immunology, Johns Hopkins University Bloomberg School of Public Health

Forensic Pattern Recognition Evidence

prepared by:

– Simon A. Cole, Professor, Department of Criminology, Law, and Society, Director, Newkirk Center for Science and Society, University of California, Irvine
– Alyse Berthental, Ph.D. Candidate, Department of Criminology, Law, and Society, University of California, Irvine
– Jaclyn Seelagy, Scholar, PULSE (Program on Understanding Law, Science, and Evidence),  University of California, Los Angeles School of Law

Scientific Evidence of Factual Causation

prepared by:

– Steve C. Gold, Professor of Law, Rutgers School of Law-Newark
– Michael D. Green, Williams Professor of Law, Wake Forest University School of Law
– Joseph Sanders, A.A. White Professor of Law, University of Houston Law Center


[1] See “Academies Release Educational Modules to Help Future Policymakers and Other Professional-School Students Understand the Role of Science in Decision Making” (June 30, 2016).

Reinventing the Burden of Proof

April 27th, 2016

If lawyers make antic claims that keep the courtrooms busy, law professors make antic proposals to suggest that the law is conceptually confused and misguided, to keep law reviews full.

A few years ago, an article by Professor Edward Cheng claimed that common law courts have failed to grasp the true meaning of burdens of proof. Edward K. Cheng, “Reconceptualizing the Burden of Proof,” 122 Yale L. J. 1254 (2013) [Cheng]. Every law student knows that the preponderance-of-the-evidence standard requires the party with the burden of proof to establish each element of the claim or defense to a probability greater than 50%. Cheng acknowledges that courts know this as well (citations omitted), but then he goes on to make some remarkable assertions.

First, Cheng suggests that the legal system has engaged in a “casual recharacterization of the burden of proof into p > 0.5 and p > 0.95.” Cheng at 1258. Being charitable, let’s say “characterization” rather than “recharacterization,” for Cheng cites nothing for his suggestion that there was some prior characterization that the law mischievously changed. Cheng at 1258.

Second, Cheng claims that the failure to deal with quantified posterior probabilities is the result of an educational or psychological deficiency of judges and lawyers:

“By comparison, the criminal beyond-a-reasonable-doubt standard is akin to a probability greater than 0.9 or 0.95. Perhaps, as most courts have ruled, the prosecution is not allowed to quantify ‘reasonable doubt’, but that is only an odd quirk of the math-phobic legal system.”

Cheng at 1256 (internal citations omitted). Cheng’s “recharacterization” has given way to his own mischaracterization of the legal system. There is pandemic math phobia in the legal system, but the refusal to quantify the burden of proof in criminal cases has nothing to do with fear or mathematical incompetence. Most cases simply do not permit any rational or principled quantification of posterior probabilities. And even if they were to allow such a cognitive maneuver, most people, and even judges, cannot map practical certainty, or something like “beyond a reasonable doubt,” onto a probability scale of 0 to 1. No less than Judge Jack Weinstein, certainly a friend to the notion that “all evidence is probabilistic,” showed in his informal survey of federal judges of the Eastern District of New York that judges have no idea of what probability corresponds to the criminal burden of proof:

[Table from United States v. Fatico: Judge Weinstein’s survey of Eastern District of New York judges’ probability estimates for the burdens of proof]

U.S. v. Fatico, 458 F.Supp. 388 (E.D.N.Y. 1978). Judge Weinstein’s informal survey showed well enough that there is no real understanding of how to map reasonable doubt or its complement onto a scale of 0 to 1. Furthermore, for the vast majority of cases, there is simply no way to assign meaningful probabilities to events, causes, and states of mind, which make up the elements of claims and defenses in our legal system.

Third, Cheng makes much of the non-existence of absolute probabilities in legal contexts. The word “absolute” is used 14 times in his essay. This point is confusing as stated because no one, to my knowledge, has claimed that the burden of proof is an absolute probability that is stated or arrived at independently of evidence in the case. Plaintiffs and defendants can have burdens of proof on claims and defenses, respectively, but for the sake of simplicity, let’s follow Cheng and describe the civil burden of proof as the plaintiff’s burden. The relevant probability is not the absolute probability P(Hπ), but rather the conditional posterior probability: P(Hπ | E).

Fourth, Cheng’s principal innovation, the introduction of a probability ratio as the true meaning and model of the burden of proof, has little or no support in case law or in evidence theory. Cheng cites virtually no cases, and only a few selected publications from the world of law reviews. Cheng proposes to recast burdens of proof as a ratio of conditional probabilities of the plaintiff’s and defendant’s “stories.” If the posterior probability of the plaintiff’s story at trial’s end is P(Hπ | E),1 and the defendant’s story is represented as P(Hδ | E), then Cheng argues that the plaintiff has carried his burden of proof whenever

P(Hπ | E) / P(Hδ | E) > 1.0

This innovation seems fundamentally wrong for several reasons. Again, assuming that the plaintiff or the State has the burden of proof, the defendant has none. If the plaintiff presents no evidence, then the numerator will be zero, and the ratio will be zero. The defendant prevails, and Cheng’s theory holds. But if the plaintiff presents some evidence and the defendant presents none, then the ratio is undefined. Alternatively, we may see the ratio in this situation as approaching infinity as a limit, as the probability of the defendant’s “story” based upon his evidence approaches zero. On either interpretation of this scenario, the ratio Cheng invents is huge, and yet the plaintiff may well lose, as, for instance, when the plaintiff’s case is insufficient as a matter of law.

Cheng’s ratio theory thus fails as a descriptive theory. The theory appears to fail prescriptively as well. In most civil and criminal cases, the finder of fact is instructed that the defendant has no burden of proof and need not present any evidence at all. Even when the defendant has remained silent, and the plaintiff has presented a legally sufficient case, the fact finder may return a verdict for the defendant when the P(Hπ | E) seems too low with respect to the burden of proof.

Let’s consider an example, perhaps not too far-fetched in some American courtrooms. The plaintiff claims that drug A has caused him to develop Syndrome Z. The plaintiff has no clinical trial, analytical epidemiologic, or animal evidence to support his claim. All the plaintiff can adduce is a so-called disproportionality analysis based upon the reporting of adverse events to the FDA. The defendant does not present any evidence of safety. The end point of interest in the lawsuit, Syndrome Z, was not observed in the trials, and was never looked for in any epidemiologic or toxicologic study. The defendant thus has no affirmative evidence of safety that counts for P(Hδ | E).

Assuming that the trial court does not toss this claim pretrial on a Rule 702 motion, or on a directed verdict, the defendant must address the plaintiff’s claim and the assertion that P(Hπ | E) > 0. The plaintiff supports his claim and assertion by presenting an expert witness who endorses the validity, accuracy, and probativeness of the disproportionality analysis. The defendant confronts this evidence solely on cross-examination, and not by trying to suggest that the plaintiff’s expert witness’s analysis is actually evidence of safety. The point of the cross-examination is to show that the proffered analysis is not a valid tool, and that it lacks accuracy and probativeness.

In this situation, the plaintiff’s P(Hπ | E) might have been greater than 0.5 at the end of direct examination, but if defense counsel has done his job, then at the end of the cross-examination, the P(Hπ | E) < 0.5. Perhaps at this stage of the proceedings, P(Hπ | E) < 0.01.

The defendant, having no affirmative evidence of safety, rests without presenting any evidence. P(Hδ | E) = 0. Alas, we cannot say that P(Hδ | E) is the complement of P(Hπ | E). There is, in most cases, far too much room for ignorance, and for indeterminate or unknown values of P(Hδ | E). In this hypothetical, however, there is no evidence adduced for safety at all, only very weak and unreliable evidence of harm. The ratio is undefined, but the law would allow the dismissal of the plaintiff’s case, or would affirm a rational fact finder’s return of a defense verdict. And the law should do those things.
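The complement point can be made concrete with a short derivation, using the notation of this post, and assuming for the moment that the two stories are mutually exclusive. If the stories were also exhaustive, so that P(Hπ | E) + P(Hδ | E) = 1, then P(Hπ | E) / P(Hδ | E) > 1.0 would hold exactly when P(Hπ | E) > 1 − P(Hπ | E), that is, when P(Hπ | E) > 0.5, and the ratio test would merely restate the preponderance standard. Once we allow room for indeterminacy, so that P(Hπ | E) + P(Hδ | E) < 1, the equivalence fails; the ratio can exceed 1.0 even though P(Hπ | E) falls well short of 0.5.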

Fifth, Cheng commits other errors along the way to arriving at his ratio theory. In one instance, he commits a serious category mistake:

“Looking at the statistical world, we immediately see that characterizing any decision rule as a 0.5 probability threshold is odd. Statisticians rarely attempt to prove the truth of a proposition or hypothesis by using its absolute probability. Instead, hypothesis testing is usually comparative. There is a null hypothesis and an alternative hypothesis, and one is rejected in favor of the other depending on the evidence observed and the consistency of that evidence with the two hypotheses.”

Cheng at 1259 (internal citations omitted; emphasis added).

Again, Cheng is correct insofar as he suggests that statisticians do not often use absolute probabilities. Attained significance probabilities, whether used in hypothesis testing or otherwise, are conditional probabilities that describe the probability of observing the sample statistic, or one more extreme, based upon the statistical model and the posited null hypothesis. Indeed, many methodologically rigorous statisticians and scientists would resist placing a quantified posterior probability on the truth of a proposition or hypothesis. The measures of probability may be helpful in identifying uncertainties due to random error, or even on occasion due to bias, but these measures do not translate into assigning the quantified posterior probabilities that Cheng wants and needs to make his ratio theory work. There is nothing, however, odd about using the quantified posterior probability of greater than 50% as a metaphor.

But whence comes rejecting one hypothesis “in favor of” another, as a matter of statistics? The null hypothesis is not accepted in the hypothesis test; rather, it is assumed in order to conduct the test. The inference Cheng describes would be improper. In a footnote, Cheng asserts that “classical hypothesis testing strongly favors the null hypothesis,” but this conflates the attained level of significance with a posterior probability. Cheng at 1259 n. 12. Cheng states that “the null hypothesis can be given no specific preference” in legal contexts, id., but this statement seems to ignore what it means for a party to have a burden of proving facts needed to establish its claim or defense.

Of course, over the course of multiple studies, which look at the issue repeatedly with increasingly precise and valid designs, and which consistently fail to reject a given null hypothesis, we sometimes do, as a matter of judgment, accept the null hypothesis. This situation has little to do with Cheng’s ratio theory, however.


1   Where P stands for probability, Hπ for the plaintiff’s “story,” Hδ for the defendant’s story, P(Hπ | E) represents the posterior probability at trial’s end of the plaintiff’s story given the evidence, and P(Hδ | E) represents the posterior probability at trial’s end of the defendant’s story given the evidence.

Lipitor Diabetes MDL’s Inexact Analysis of Fisher’s Exact Test

April 21st, 2016

Muriel Bristol was a biologist who studied algae at the Rothamsted Experimental Station in England, after World War I.  In addition to her knowledge of plant biology, Bristol claimed the ability to tell whether tea had been added to milk, or the tea poured first and then milk had been added.  Bristol, as a scientist and a proper English woman, preferred the latter.

Ronald Fisher, who also worked at Rothamsted, expressed his skepticism over Dr. Bristol’s claim. Fisher set about to design a randomized experiment that would efficiently and effectively test her claim. Bristol was presented with eight cups of tea, four of which were prepared with milk added to tea, and four prepared with tea added to milk. Bristol, of course, was blinded to which was which, but was required to label each according to its manner of preparation. Fisher saw his randomized experiment as a 2 x 2 contingency table, from which he could calculate the probability of the observed outcome (and of any more extreme outcomes) using the assumption of fixed marginal totals and the hypergeometric probability distribution. Fisher’s Exact Test was born at tea time.[1]

Fisher described the origins of his Exact Test in one of his early texts, but he neglected to report whether his experiment vindicated Bristol’s claim. According to David Salsburg, H. Fairfield Smith, one of Fisher’s colleagues, acknowledged that Bristol nailed Fisher’s Exact test, with all eight cups correctly identified. The test has gone on to become an important tool in the statistician’s armamentarium.
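The tea-tasting experiment makes a tidy worked example. Below is a minimal sketch (a hypothetical reconstruction, not Fisher’s own worksheet) of the experiment as a 2 x 2 table with the perfect eight-for-eight result that Fairfield Smith reported: with four cups of each kind, there are C(8,4) = 70 equally likely ways to label four cups as “milk first” under the null hypothesis, so a perfect classification yields a one-sided exact p-value of 1/70, or about 0.014.

```python
# The lady-tasting-tea experiment as a 2x2 contingency table, with all eight
# cups correctly identified (hypothetical reconstruction for illustration).
from scipy.stats import fisher_exact

#                 guessed milk-first   guessed tea-first
# milk first              4                    0
# tea first               0                    4
table = [[4, 0], [0, 4]]

odds_ratio, p_one_sided = fisher_exact(table, alternative="greater")
print(f"one-sided exact p-value: {p_one_sided:.4f}")   # 0.0143, i.e., 1/70
```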

Fisher’s Exact, like any statistical test, has model assumptions and preconditions. For one thing, the test is designed for categorical data, with binary outcomes. The test allows us to evaluate whether an observed difference between two proportions can plausibly be attributed to chance alone, by calculating the probability of the observed outcome, as well as of more extreme outcomes.

The calculation of an exact attained significance probability, using Fisher’s approach, provides a one-sided p-value, with no unique solution to calculating a two-sided attained significance probability. Fisher’s Exact Test has thus played an important role in showing the judiciary that small sample size need not be an insuperable barrier to meaningful statistical analysis. In discrimination cases, the one-sided p-value may well be more appropriate for the issue at hand, and so it poses no particular problem.[2]

The difficulty of using Fisher’s Exact for small sample sizes is that the hypergeometric distribution, upon which the test is based, is highly asymmetric. The observed one-sided p-value does not measure the probability of a result equally extreme in the opposite direction. There are at least three ways to calculate a two-sided p-value:

  1. Double the one-sided p-value.
  2. Add the point probabilities from the opposite tail that are more extreme than the observed point probability.
  3. Use the mid-P value; that is, add all values more extreme (smaller) than the observed point probability from both sides of the distribution, PLUS ½ of the observed point probability.

Some software programs will proceed in one of these ways by default, but their doing so does not guarantee the most accurate measure of two-tailed significance probability.
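To make the three approaches concrete, here is a minimal sketch, computed on an invented 2 x 2 table (not any table from the Lipitor record), of the one-sided exact p-value, the doubled p-value, the opposite-tail two-sided p-value, and the mid-p value, all derived from the hypergeometric distribution that underlies Fisher’s Exact Test.

```python
# One-sided, doubled, opposite-tail, and mid-p values for Fisher's Exact Test,
# computed from the hypergeometric distribution on an invented 2x2 table.
from scipy.stats import hypergeom

# hypothetical 2x2 table:
#               event   no event
# exposed         6        1
# unexposed       3        7
a, b, c, d = 6, 1, 3, 7

N = a + b + c + d            # total observations
K = a + c                    # events (column margin)
n = a + b                    # exposed subjects (row margin)

rv = hypergeom(N, K, n)      # distribution of the exposed-event cell under H0
support = range(max(0, n - (N - K)), min(n, K) + 1)
point = {k: rv.pmf(k) for k in support}
p_obs = point[a]

p_one_sided = sum(p for k, p in point.items() if k >= a)                    # upper tail
p_doubled = min(1.0, 2 * p_one_sided)                                       # method 1
p_opposite = sum(p for p in point.values() if p <= p_obs + 1e-12)           # method 2
p_mid = sum(p for p in point.values() if p < p_obs - 1e-12) + 0.5 * p_obs   # method 3

print(f"point probability: {p_obs:.4f}")
print(f"one-sided exact:   {p_one_sided:.4f}")
print(f"doubled:           {p_doubled:.4f}")
print(f"opposite-tail:     {p_opposite:.4f}")
print(f"mid-p:             {p_mid:.4f}")
```

With these invented counts, the doubled p-value comes out to roughly 0.073, the opposite-tail value to roughly 0.050, and the mid-p value to roughly 0.032, which illustrates how the choice among the three conventions can matter at the conventional 5% threshold.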

In the Lipitor MDL for diabetes litigation, Judge Gergel generally used sharp analyses to cut through the rancid fat of litigation claims, to get to the heart of the matter. By and large, he appears to have done a splendid job. In the course of gatekeeping under Federal Rule of Evidence 702, however, Judge Gergel may have misunderstood the nature of Fisher’s Exact Test.

Nicholas Jewell is a well-credentialed statistician at the University of California.  In the courtroom, Jewell is a well-known expert witness for the litigation industry.  He is no novice at generating unreliable opinion testimony. See In re Zoloft Prods. Liab. Litig., No. 12–md–2342, 2015 WL 7776911 (E.D. Pa. Dec. 2, 2015) (excluding Jewell’s opinions as scientifically unwarranted and methodologically flawed). In the Lipitor cases, some of Jewell’s opinions seemed outlandish indeed, and Judge Gergel generally excluded them. See In re Lipitor Marketing, Sales Practices and Prods. Liab. Litig., MDL No. 2:14-mn-02502-RMG, ___ F.Supp. 3d  ___ (2015), 2015 WL 7422613 (D.S.C. Nov. 20, 2015) [Lipitor Jewell], reconsideration den’d, 2016 WL 827067 (D.S.C. Feb. 29, 2016) [Lipitor Jewell Reconsidered].

As Judge Gergel explained, Jewell calculated a relative risk for abnormal blood glucose in a Lipitor group to be 3.0 (95% C.I., 0.9 to 9.6), using STATA software. Also using STATA, Jewell obtained an attained significance probability of 0.0654, based upon Fisher’s Exact Test. Lipitor Jewell at *7.

Judge Gergel did not report whether Jewell’s reported p-value of 0.0654 was one- or two-sided, but he did state that the attained probability “indicates a lack of statistical significance.” Id. & n. 15. The rest of His Honor’s discussion of the challenged opinion, however, makes clear that the 0.0654 must have been a two-sided value.  If it had been a one-sided p-value, then there would have been no way of invoking the mid-p to generate a two-sided p-value below 5%. The mid-p will always be larger than the one-tailed exact p-value generated by Fisher’s Exact Test.

The court noted that Dr. Jewell had testified that he believed that STATA generated this confidence interval by “flip[ping]” the Taylor series approximation. The STATA website notes that it calculates confidence intervals for odds ratios (which are different from the relative risk that Jewell testified he computed), by inverting the Fisher exact test.[3] Id. at *7 & n. 17. Of course, this description suggests that the confidence interval is not based upon exact methods.

STATA does not provide a mid p-value calculation, and so Jewell used an on-line calculator, to obtain a mid p-value of 0.04, which he declared statistically significant. The court took Jewell to task for using the mid p-value as though it were a different analysis or test.  Id. at *8. Because the mid-p value will always be larger than the one-sided exact p-value from Fisher’s Exact Test, the court’s explanation does not really make sense:

“Instead, Dr. Jewell turned to the mid-p test, which would ‘[a]lmost surely’ produce a lower p-value than the Fisher exact test.”

Id. at *8. The mid-p test, however, is not different from the Fisher’s exact; rather it is simply a way of dealing with the asymmetrical distribution that underlies the Fisher’s exact, to arrive at a two-tailed p-value that more accurately captures the rate of Type I error.

The MDL court acknowledged that the mid-p approach was not inherently unreliable, but questioned Jewell’s inconsistent, selective use of the approach for only one test.[4]  Jewell certainly did not help the plaintiffs’ cause and his standing by discarding the analyses that were not incorporated into his report, thus leaving the MDL court to guess at how much selection went on in his process of generating his opinions.  Id. at *9 & n. 19.

None of Jewell’s other calculated p-values involved the mid-p approach, but the court’s criticism raises the question whether the other p-values came from a Fisher’s Exact Test with a small sample size, or from another highly asymmetrical distribution. Id. at *8. Although Jewell had shown himself willing to engage in other dubious, result-oriented analyses, Jewell’s use of the mid-p for this one comparison may have been within acceptable bounds after all.

The court also noted that Jewell had obtained the “exact p-value and that this p-value was not significant.” Id. The court’s notation here, however, does not report the important detail whether that exact, unreported p-value was merely double the one-sided p-value given by Fisher’s Exact Test. As the STATA website, cited by the MDL court, explains:

“The test naturally gives a one-sided p-value, and there are at least four different ways to convert it to a two-sided p-value (Agresti 2002, 93). One way, not implemented in Stata, is to double the one-sided p-value; doubling is simple but can result in p-values larger than one.”

Wesley Eddings, “Fisher’s exact test two-sided idiosyncrasy” (Jan. 2009) (citing Alan Agresti, Categorical Data Analysis 93 (2d ed. 2002)).

On plaintiffs’ motion for reconsideration, the MDL court reaffirmed its findings with respect to Jewell’s use of the mid-p.  Lipitor Jewell Reconsidered at *3. In doing so, the court insisted that the one instance in which Jewell used the mid-p stood in stark contrast to all the other instances in which he had used Fisher’s Exact Test.  The court then cited to the record to identify 21 other instances in which Jewell used a p-value rather than a mid-p value.  The court, however, did not provide the crucial detail whether these 21 other instances actually involved small-sample applications of Fisher’s Exact Test.  As result-oriented as Jewell can be, it seems safe to assume that not all his statistical analyses involved Fisher’s Exact Test, with its attendant ambiguity for how to calculate a two-tailed p-value.


Post-Script (Aug. 9, 2017)

The defense argument and the judicial error were echoed in a Washington Legal Foundation paper that pilloried Nicholas Jewell for the surfeit of methodological flaws in his expert witness opinions in In re Lipitor. Unfortunately, the paper uncritically recited the defense’s theory about Fisher’s Exact Test:

“In assessing Lipitor data, even after all of the liberties that [Jewell] took with selecting data, he still could not get a statistically-significant result employing a Fisher’s exact test, so he switched to another test called a mid-p test, which generated a (barely) statistically significant result.”

Kirby Griffis, “The Role of Statistical Significance in Daubert/Rule 702 Hearings,” at 19, Wash. Leg. Foundation Critical Legal Issues Working Paper No. 201 (Mar. 2017). See Kirby Griffis, “Beware the Weak Argument: The Rule of Thirteen,” For the Defense 72 (July 2013) (quoting Justice Frankfurter, “A bad argument is like the clock striking thirteen. It puts in doubt the others.”). The fallacy of Griffis’ argument is that it assumes that a mid-p calculation is a different statistical test from the Fisher’s Exact test, which yields a one-tailed significance probability. Unfortunately, Griffis’ important paper is marred by this and other misstatements about statistics.


[1] Sir Ronald A. Fisher, The Design of Experiments at chapter 2 (1935); see also Stephen Senn, “Tea for three: Of infusions and inferences and milk in first,” Significance 30 (Dec. 2012); David Salsburg, The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century  (2002).

[2] See, e.g., Dendy v. Washington Hosp. Ctr., 431 F. Supp. 873 (D.D.C. 1977) (denying preliminary injunction), rev’d, 581 F.2d 99 (D.C. Cir. 1978) (reversing denial of relief, and remanding for reconsideration). See also National Academies of Science, Reference Manual on Scientific Evidence 255 n.108 (3d ed. 2011) (“Well-known small sample techniques [for testing significance and calculating p-values] include the sign test and Fisher’s exact test.”).

[3] See Wesley Eddings, “Fisher’s exact test two-sided idiosyncrasy” (Jan. 2009), available at <http://www.stata.com/support/faqs/statistics/fishers-exact-test/>, last visited April 19, 2016 (“Stata’s exact confidence interval for the odds ratio inverts Fisher’s exact test.”). This article by Eddings contains a nice discussion of why the Fisher’s Exact Test attained significance probability disagrees with the calculated confidence interval. Eddings points out the asymmetry of the hypergeometric distribution, which complicates arriving at an exact p-value for a two-sided test.

[4] See Barber v. United Airlines, Inc., 17 Fed.Appx. 433, 437 (7th Cir. 2001) (“Because in formulating his opinion Dr. Hynes cherry-picked the facts he considered to render an expert opinion, the district court correctly barred his testimony because such a selective use of facts fails to satisfy the scientific method and Daubert.”).

The Education of Judge Rufe – The Zoloft MDL

April 9th, 2016

The Honorable Cynthia M. Rufe is a judge on the United States District Court for the Eastern District of Pennsylvania.  Judge Rufe was elected to a judgeship on the Bucks County Court of Common Pleas in 1994.  She was appointed to the federal district court in 2002. Like most state and federal judges, she had little in her training and experience as a lawyer that prepared her to serve as a gatekeeper of complex expert witness scientific opinion testimony.  And yet, the statutory code of evidence, and in particular, Federal Rules of Evidence 702 and 703, requires her to do just that.

The normal approach to MDL cases is marked by the Field of Dreams: “if you build it, they will come.” Last week, Judge Rufe did something that is unusual in pharmaceutical litigation; she closed the gate and sent everyone home. In re Zoloft Prod. Liab. Litig., MDL NO. 2342, 12-MD-2342, 2016 WL 1320799 (E.D. Pa. April 5, 2016).

Her Honor’s decision was hardly made in haste.  The MDL began in 2012, and proceeded in a typical fashion with case management orders that required the exchange of general causation expert witness reports. The plaintiffs’ steering committee (PSC), acting for the plaintiffs, served the report of only one epidemiologist, Anick Bérard, who took the position that Zoloft causes virtually every major human congenital anomaly known to medicine. The defendants challenged the admissibility of Bérard’s opinions.  After extensive briefings and evidentiary hearings, the trial court found that Bérard’s opinions were riddled with inconsistent assessments of studies, eschewed generally accepted methods of causal inference, ignored contrary evidence, adopted novel, unreliable methods of endorsing “trends” in studies, and failed to address epidemiologic studies that did not support her subjective opinions. In re Zoloft Prods. Liab. Litig., 26 F. Supp. 3d 449 (E.D.Pa.2014). The trial court permitted plaintiffs an opportunity to seek reconsideration of Bérard’s exclusion, which led to the trial court’s reaffirming its previous ruling. In re Zoloft Prods. Liab. Litig., No. 12–md–2342, 2015 WL 314149, at *2 (E.D.Pa. Jan. 23, 2015).

Notwithstanding the PSC’s claims that Bérard was the best qualified expert witness in her field and that she was the only epidemiologist needed to support the plaintiffs’ causal claims, the MDL court indulged the PSC by permitting plaintiffs another bite at the apple.  Over defendants’ objections, the court permitted the PSC to name yet another expert witness, statistician Nicholas Jewell, to do what Bérard had failed to do: proffer an opinion on general causation supported by sound science.  In re Zoloft Prods. Liab. Litig., No. 12–md–2342, 2015 WL 115486, at * 2 (E.D.Pa. Jan. 7, 2015).

As a result of this ruling, the MDL dragged on for over a year, in which time, the PSC served a report by Jewell, and then the defendants conducted a discovery deposition of Jewell, and lodged a new Rule 702 challenge.  Although Jewell brought more statistical sophistication to the task, he could not transmute lead into gold; nor could he support the plaintiffs’ causal claims without committing most of the same fallacies found in Bérard’s opinions.  After another round of Rule 702 briefs and hearings, the MDL court excluded Jewell’s unwarranted causal opinions. In re Zoloft Prods. Liab. Litig., No. 12–md–2342, 2015 WL 7776911 (E.D.Pa. Dec. 2, 2015).

The successive exclusions of Bérard and Jewell left the MDL court in a peculiar position. There were other witnesses, Robert Cabrera, a teratologist, Michael Levin, a molecular biologist, and Thomas Sadler, an embryologist, whose opinions addressed animal toxicologic studies, biological plausibility, and putative mechanisms.  These other witnesses, however, had little or no competence in epidemiology, and they explicitly relied upon Bérard’s opinions with respect to human outcomes.  As a result of Bérard’s exclusion, these witnesses were left free to offer their views about what happens in animals at high doses, or about theoretical mechanisms, but they were unable to address human causation.

Although the PSC had no expert witnesses who could legitimately offer reasonably supported opinions about the causation of human birth defects, the plaintiffs refused to decamp and leave the MDL forum. Faced with the prospect of not trying their cases to juries, the PSC instead tried the patience of the MDL judge. The PSC pulled out all the stops in adducing weak, irrelevant, and invalid evidence to support their claims, sans epidemiologic expertise. The PSC argued that adverse event reports, internal company documents that discussed possible associations, the biological plausibility opinions of Levin and Sadler, the putative mechanism opinions of Cabrera, differential diagnoses offered to support specific causation, and the hip-shot opinions of a former-FDA-commissioner-for-hire, David Kessler, could come together magically to supply sufficient evidence to have their cases submitted to juries. Judge Rufe saw through the transparent effort to manufacture evidence of causation, and granted summary judgment on all remaining Zoloft cases in the MDL. In re Zoloft Prod. Liab. Litig., MDL NO. 2342, 12-MD-2342, 2016 WL 1320799, at *4 (E.D. Pa. April 5, 2016).

After a full briefing and hearing on Bérard’s opinion, a reconsideration of Bérard, a permitted “do over” of general causation with Jewell, and a full briefing and hearing on Jewell’s opinions, the MDL court was able to deal deftly with the snippets of evidence “cobbled together” to substitute for evidence that might support a conclusion of causation. The PSC’s cobbled case was puffed up to give the appearance of voluminous evidence, in 200 exhibits that filled six banker’s boxes.  Id. at *5. The ruse was easily undone; most of the exhibits and purported evidence were obvious rubbish. “The quantity of the evidence is not, however, coterminous with the quality of evidence with regard to the issues now before the Court.” Id. The banker’s boxes contained artifices such as untranslated foreign-language documents, and company documents relating to the development and marketing of the medication. The PSC resubmitted reports from Levin, Cabrera, and Sadler, whose opinions were already adjudicated to be incompetent, invalid, irrelevant, or inadequate to support general causation.  The PSC pointed to the specific causation opinions of a clinical cardiologist, Ra-Id Abdulla, M.D., who proffered dubious differential etiologies, ruling in Zoloft as a cause of individual children’s birth defects, despite his inability to rule out known and unknown alternative causes in the differential reasoning.  The MDL court, however, recognized that “[a] differential diagnosis assumes that general causation has been established,” id. at *7, and that Abdulla could not bootstrap general causation by purporting to reach a specific causation opinion (even if those specific causation opinions were legitimate).

The PSC submitted the recent consensus statement of the American Statistical Association (ASA)[1], which it misrepresented to be an epidemiologic study.  Id. at *5. The consensus statement makes some pedestrian pronouncements about the difference between statistical and clinical significance, about the need for considerations other than statistical significance in supporting causal claims, and about the lack of bright-line distinctions for statistical significance in assessing causality.  All true, but immaterial to the PSC’s expert witnesses’ opinions that over-endorsed statistical significance in the few instances in which it was shown, and over-interpreted study data that were based upon data mining and multiple comparisons, in blatant violation of the ASA’s declared principles.

Stretching even further for “human evidence,” the PSC submitted documentary evidence of adverse event reports, as though they could support a causal conclusion.[2]  There are about four million live births each year, with an expected rate of serious cardiac malformations of about one per cent.[3]  The prevalence of SSRI anti-depressant use is at least two per cent, which means that we would expect 800 cardiac birth defects each year to occur in children of mothers who took SSRI anti-depressants in the first trimester. If Zoloft had an average market share of all the SSRIs of about 25 per cent, then 200 cardiac defects each year would occur in children born to mothers who took Zoloft.  Given that Zoloft has been on the market since the early 1990s, we would expect that there would be thousands of children, exposed to Zoloft during embryogenesis, born with cardiac defects, even if there were nothing untoward about maternal exposure to the medication.  Add the stimulated reporting of adverse events from lawyers, lawyer advertising, and lawyer instigation, and you have manufactured evidence that is not probative of causation at all.[4] The MDL court cut deftly and swiftly through the smoke screen:

“These reports are certainly relevant to the generation of study hypotheses, but are insufficient to create a material question of fact on general causation.”

Id. at *9. The MDL court recognized that epidemiology was very important in discerning a causal connection between a common exposure and a common outcome, especially when the outcome has an expected rate in the general population. The MDL court stopped short of holding that epidemiologic evidence was required (which on the facts of the case would have been amply justified), but instead rested its ratio decidendi on the need to account for the extant epidemiology that contradicted or failed to support the strident and subjective opinions of the plaintiffs’ expert witnesses. The MDL court thus gave plaintiffs every benefit of the doubt by limiting its holding on the need for epidemiology to:

“when epidemiological studies are equivocal or inconsistent with a causation opinion, experts asserting causation opinions must thoroughly analyze the strengths and weaknesses of the epidemiological research and explain why that body of research does not contradict or undermine their opinion.”

Id. at *5, quoting from In re Zoloft Prods. Liab. Litig., 26 F. Supp. 3d 449, 476 (E.D. Pa. 2014).

The MDL court also saw through the thin veneer of respectability of the testimony of David Kessler, a former FDA commissioner who helped make large fortunes for some of the members of the PSC by the feeding frenzy he created with his moratorium on silicone gel breast implants.  Even viewing Kessler’s proffered testimony in the most charitable light, the court recognized that he offered little support for a causal conclusion other than to delegate the key issues to epidemiologists. Id. at *9. As for the boxes of regulatory documents, foreign labels, and internal company memoranda, the MDL court found that these documents did not raise a genuine issue of material fact concerning general causation:

“Neither these documents, nor draft product documents or foreign product labels containing language that advises use of birth control by a woman taking Zoloft constitute an admission of causation, as opposed to acknowledging a possible association.”

Id.

In the end, the MDL court found that the PSC’s many banker boxes of paper contained too much of nothing for the issue at hand.  Having put the defendants through the time and expense of litigating and re-litigating these issues, nothing short of dismissing the pending cases was a fair and appropriate outcome to the Zoloft MDL.

_______________________________________

Given the denouement of the Zoloft MDL, it is worth considering the MDL judge’s handling of the scientific issues raised, misrepresented, argued, or relied upon by the parties.  Judge Rufe was required, by Rules 702 and 703, to roll up her sleeves and assess the methodological validity of the challenged expert witnesses’ opinions.  That Her Honor was able to do this is a testament to her hard work. Zoloft was not Judge Rufe’s first MDL, and she clearly learned a lot from her previous judicial assignment to an MDL for Avandia personal injury actions.

On May 21, 2007, the New England Journal of Medicine published online a seriously flawed meta-analysis of cardiovascular disease outcomes and rosiglitazone (Avandia) use.  See Steven E. Nissen, M.D., and Kathy Wolski, M.P.H., “Effect of Rosiglitazone on the Risk of Myocardial Infarction and Death from Cardiovascular Causes,” 356 New Engl. J. Med. 2457 (2007).  The Nissen article did not appear in print until June 14, 2007, but the first lawsuits resulted within a day or two of the in-press version. The lawsuits soon thereafter reached a critical mass, with the inevitable creation of a federal court Multi-District Litigation.

Within a few weeks of Nissen’s article, the Annals of Internal Medicine published an editorial by Cynthia Mulrow and other editors, which questioned the Nissen meta-analysis[5] and introduced an article that attempted to replicate Nissen’s work.[6]  The attempted replication showed that the only way Nissen could have obtained his nominally statistically significant result was to have selected a method, Peto’s fixed effect method, known to be biased for use with clinical trials with uneven arms. Random effects methods, more appropriate for the clinically heterogeneous trials, consistently failed to replicate the Nissen result. Other statisticians weighed in and pointed out that using the risk difference made much more sense when there were multiple trials with zero events in one or the other or both arms of the trials. Trials with zero cardiovascular events in both arms represented important evidence of low, but equal, risk of heart attacks, which should be captured in an appropriate analysis.  When the risk difference approach was used, with exact statistical methods, there was no statistically significant increase in risk in the dataset used by Nissen.[7] Other scientists, including some of Nissen’s own colleagues at the Cleveland Clinic, and John Ioannidis, weighed in to note how fragile and insubstantial the Nissen meta-analysis was[8]:

“As rosiglitazone case demonstrates, minor modifications of the meta-analysis protocol can change the statistical significance of the result.  For small effects, even the direction of the treatment effect estimate may change.”
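That sensitivity is easy to demonstrate with a toy computation. The sketch below uses wholly invented log odds ratios and variances (not the rosiglitazone trial data) and compares fixed-effect, inverse-variance pooling with DerSimonian-Laird random-effects pooling; with these made-up numbers the fixed-effect confidence interval excludes 1.0, while the random-effects interval does not.

```python
# Fixed-effect versus DerSimonian-Laird random-effects pooling of invented
# per-trial log odds ratios, to show how the model choice can move a pooled
# estimate across the conventional significance threshold.
import numpy as np

y = np.array([0.60, -0.30, 0.70, -0.20, 0.55])   # hypothetical log odds ratios
v = np.array([0.04, 0.09, 0.06, 0.10, 0.05])     # hypothetical within-trial variances

# fixed-effect (inverse-variance) pooling
w = 1.0 / v
fe = np.sum(w * y) / np.sum(w)
fe_se = np.sqrt(1.0 / np.sum(w))

# DerSimonian-Laird estimate of between-trial variance, then random-effects pooling
Q = np.sum(w * (y - fe) ** 2)
C = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (Q - (len(y) - 1)) / C)
w_re = 1.0 / (v + tau2)
re = np.sum(w_re * y) / np.sum(w_re)
re_se = np.sqrt(1.0 / np.sum(w_re))

for label, est, se in [("fixed effect  ", fe, fe_se), ("random effects", re, re_se)]:
    lo, hi = est - 1.96 * se, est + 1.96 * se
    print(f"{label}: OR = {np.exp(est):.2f}, 95% CI {np.exp(lo):.2f}-{np.exp(hi):.2f}")
```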

Nissen achieved his political objective with his shaky meta-analysis.  The FDA convened an Advisory Committee meeting, which in turn resulted in a negative review of the safety data, and the FDA’s imposition of warnings and a Risk Evaluation and Mitigation Strategy, which all but prohibited use of rosiglitazone.[9]  A clinical trial, RECORD, supported by the drug sponsor, GlaxoSmithKline, had already started, and fortunately it was allowed to continue.

On a parallel track to the regulatory activities, the federal MDL, headed by Judge Rufe, proceeded to motions and a hearing on GSK’s Rule 702 challenge to plaintiffs’ evidence of general causation. The federal MDL trial judge denied GSK’s motions to exclude plaintiffs’ causation witnesses in an opinion that showed significant diffidence in addressing scientific issues.  In re Avandia Marketing, Sales Practices and Product Liability Litigation, 2011 WL 13576, *12 (E.D. Pa. 2011).  See “Learning to Embrace Flawed Evidence – The Avandia MDL’s Daubert Opinion” (Jan. 10, 2011).

After Judge Rufe denied GSK’s challenges to the admissibility of plaintiffs’ expert witnesses’ causation opinions in the Avandia MDL, the RECORD trial was successfully completed and published.[10]  RECORD was a long-term, prospectively designed, randomized cardiovascular trial in over 4,400 patients, followed for an average of 5.5 years.  The trial was designed with a non-inferiority end point of ruling out a 20% increased risk when compared with standard-of-care diabetes treatment. The trial achieved its end point, with a hazard ratio of 0.99 (95% confidence interval, 0.85-1.16) for cardiovascular hospitalization and death. A readjudication of outcomes by the Duke Clinical Research Institute confirmed the published results.

On Nov. 25, 2013, after convening another Advisory Committee meeting, the FDA announced the removal of most of its restrictions on Avandia:

“Results from [RECORD] showed no elevated risk of heart attack or death in patients being treated with Avandia when compared to standard-of-care diabetes drugs. These data do not confirm the signal of increased risk of heart attacks that was found in a meta-analysis of clinical trials first reported in 2007.”

FDA Press Release, “FDA requires removal of certain restrictions on the diabetes drug Avandia” (Nov. 25, 2013). And in December 2015, the FDA abandoned its requirement of a Risk Evaluation and Mitigation Strategy for Avandia. FDA, “Rosiglitazone-containing Diabetes Medicines: Drug Safety Communication – FDA Eliminates the Risk Evaluation and Mitigation Strategy (REMS)” (Dec. 16, 2015).

GSK’s vindication came too late to reverse Judge Rufe’s decision in the Avandia MDL.  GSK spent over six billion dollars on resolving Avandia claims.  And to add to the company’s chagrin, GSK lost patent protection for Avandia in April 2012.[11]

Something good, however, may have emerged from the Avandia litigation debacle.  Judge Rufe heard from plaintiffs’ expert witnesses in Avandia about the hierarchy of evidence, about how observational studies must be evaluated for bias and confounding, about the importance of statistical significance, and about how studies that lack power to find relevant associations may still yield conclusions with appropriate meta-analysis. Important nuances of meta-analysis methodology may have gotten lost in the kerfuffle, but given that plaintiffs had reasonable quality clinical trial data, Avandia plaintiffs’ counsel could eschew their typical reliance upon weak and irrelevant lines of evidence, based upon case reports, adverse event disproportional reporting, and the like.

The Zoloft litigation introduced Judge Rufe to a more typical pharmaceutical litigation. Because the outcomes of interest were birth defects, there were no clinical trials.  To be sure, there were observational epidemiologic studies, but now the defense expert witnesses were carefully evaluating the studies for bias and confounding, and the plaintiffs’ expert witnesses were double counting studies and ignoring multiple comparisons and validity concerns.  Once again, in the Zoloft MDL, plaintiffs’ expert witnesses made their non-specific complaints about “lack of power” (without ever specifying the relevant alternative hypothesis), but it was the defense expert witnesses who cited relevant meta-analyses that attempted to do something about the supposed lack of power. Plaintiffs’ expert witnesses inconsistently argued “lack of power” to disregard studies that had outcomes that undermined their opinions, even when those studies had narrow confidence intervals surrounding values at or near 1.0.
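A minimal sketch, again with invented counts, shows why a narrow confidence interval around 1.0 is itself an answer to a generic “lack of power” complaint: the interval identifies which alternative relative risks the data can and cannot exclude, which is more informative than any post hoc power calculation.

```python
# Confidence interval for a rate ratio from invented cohort data, illustrating
# which alternative relative risks a "null" study can actually exclude.
import numpy as np

cases_exposed, cases_unexposed = 40, 39           # hypothetical case counts
pt_exposed, pt_unexposed = 10000.0, 10000.0       # hypothetical person-years

rr = (cases_exposed / pt_exposed) / (cases_unexposed / pt_unexposed)
se_log_rr = np.sqrt(1 / cases_exposed + 1 / cases_unexposed)   # SE of log rate ratio
lo = np.exp(np.log(rr) - 1.96 * se_log_rr)
hi = np.exp(np.log(rr) + 1.96 * se_log_rr)
print(f"RR = {rr:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
# With these invented numbers the interval runs from roughly 0.66 to 1.6, so a
# relative risk of 2.0 is effectively excluded, whatever the study's nominal "power".
```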

The Avandia litigation laid the foundation for Judge Rufe’s critical scrutiny by exemplifying the nature and quantum of evidence to support a reasonable scientific conclusion.  Notwithstanding the mistakes made in the Avandia litigation, this earlier MDL created an invidious distinction with the Zoloft PSC’s evidence and arguments, which looked as weak and insubstantial as they really were.


[1] Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” The American Statistician, available online (Mar. 7, 2016), in-press at DOI:10.1080/00031305.2016.1154108, <http://dx.doi.org/10.1080/>. See “The American Statistical Association’s Statement on and of Significance” (Mar. 17, 2016); “The ASA’s Statement on Statistical Significance – Buzzing from the Huckabees” (Mar. 19, 2016).

[2] See 21 C.F.R. § 314.80 (a) Postmarketing reporting of adverse drug experiences (defining “[a]dverse drug experience” as “[a]ny adverse event associated with the use of a drug in humans, whether or not considered drug related”).

[3] See Centers for Disease Control and Prevention, “Birth Defects Home Page” (last visited April 8, 2016).

[4] See, e.g., Derrick J. Stobaugh, Parakkal Deepak, & Eli D. Ehrenpreis, “Alleged isotretinoin-associated inflammatory bowel disease: Disproportionate reporting by attorneys to the Food and Drug Administration Adverse Event Reporting System,” 69 J. Am. Acad. Dermatol. 393 (2013) (documenting stimulated reporting from litigation activities).

[5] Cynthia D. Mulrow, John Cornell & A. Russell Localio, “Rosiglitazone: A Thunderstorm from Scarce and Fragile Data,” 147 Ann. Intern. Med. 585 (2007).

[6] George A. Diamond, Leon Bax & Sanjay Kaul, “Uncertain Effects of Rosiglitazone on the Risk for Myocardial Infartion and Cardiovascular Death,” 147 Ann. Intern. Med. 578 (2007).

[7] Tian, et al., “Exact and efficient inference procedure for meta-analysis and its application to the analysis of independent 2 × 2 tables with all available data but without artificial continuity correction” 10 Biostatistics 275 (2008)

[8] Adrian V. Hernandez, Esteban Walker, John P.A. Ioannidis,  and Michael W. Kattan, “Challenges in meta-analysis of randomized clinical trials for rare harmful cardiovascular events: the case of rosiglitazone,” 156 Am. Heart J. 23, 28 (2008).

[9] Janet Woodcock, FDA Decision Memorandum (Sept. 22, 2010).

[10] Philip D. Home, et al., “Rosiglitazone evaluated for cardiovascular outcomes in oral agent combination therapy for type 2 diabetes (RECORD): a multicentre, randomised, open-label trial,” 373 Lancet 2125 (2009).

[11]Pharmacovigilantism – Avandia Litigation” (Nov. 27, 2013).

The ASA’s Statement on Statistical Significance – Buzzing from the Huckabees

March 19th, 2016

People say crazy things. In a radio interview, Evangelical Michael Huckabee argued that the Kentucky civil clerk who refused to issue a marriage license to a same-sex couple was as justified in defying an unjust court decision as people are justified in disregarding Dred Scott v. Sandford, 60 U.S. 393 (1857), which Huckabee described as still the “law of the land.”1 Chief Justice Roger B. Taney would be proud of Huckabee’s use of faux history, precedent, and legal process to argue his cause. Definition of “huckabee”: a bogus factoid.

Consider the case of Sander Greenland, who attempted to settle a score with an adversary’s expert witness, who had opined in 2002 that Bayesian analyses were rarely used at the FDA for reviewing new drug applications. The adversary’s expert witness obviously got Greenland’s knickers in a knot, because Greenland wrote an article, in a law review of all places, in which he presented his attempt to “correct the record” and to show how the statement of the opposing expert witness was “ludicrous.”2 To support his indictment on charges of ludicrousness, Greenland ignored the FDA’s actual behavior in reviewing new drug applications,3 and looked instead at the practice of the Journal of Clinical Oncology, a clinical journal that publishes 24 issues a year, with occasional supplements. Greenland found the word “Bayesian” 50 times in over 40,000 journal pages, and declared victory. According to Greenland, “several” (unquantified) articles had used Bayesian methods to explore, post hoc, statistically nonsignificant results.4

Given Greenland’s own evidence, the posterior odds that Greenland was correct in his charges seem to be disturbingly low, but he might have looked at the published papers that conducted more serious, careful surveys of the issue.5 This week, the Journal of the American Medical Association published yet another study by John Ioannidis and colleagues, which documented actual practice in the biomedical literature. And no surprise, Bayesian methods barely register in a systematic survey of the last 25 years of published studies. See David Chavalarias, Joshua David Wallach, Alvin Ho Ting Li, John P. A. Ioannidis, “Evolution of reporting P values in the biomedical literature, 1990-2015,” 315 J. Am. Med. Ass’n 1141 (2016). See also Demetrios N. Kyriacou, “The Enduring Evolution of the P Value,” 315 J. Am. Med. Ass’n 1113 (2016) (“Bayesian methods are not frequently used in most biomedical research analyses.”).

So what are we to make of Greenland’s animadversions in a law review article? It was a huckabee moment.

Recently, the American Statistical Association (ASA) issued a statement on the use of statistical significance and p-values. In general, the statement was quite moderate, and declined to move in the radical directions urged by some statisticians who attended the ASA’s meeting on the subject. Despite the ASA’s moderation, the ASA’s statement has been met with huckabee-like nonsense and hyperbole. One author, a pharmacologist trained at the University of Washington, with post-doctoral training at the University of California, Berkeley, and an editor of PLoS Biology, was moved to write:

“However, the ASA notes, the importance of the p-value has been greatly overstated and the scientific community has become over-reliant on this one – flawed – measure.”

Lauren Richardson, “Is the p-value pointless?” (Mar. 16, 2016). And yet, nowhere in the ASA’s statement does the group suggest that the p-value is a “flawed” measure. Richardson suffered a lapse and wrote a huckabee.

Not surprisingly, lawyers attempting to spin the ASA’s statement have unleashed entire hives of huckabees in an attempt to deflate the methodological points made by the ASA. Here is one example of a litigation-industry lawyer who argues that the American Statistical Association Statement shows the irrelevance of statistical significance for judicial gatekeeping of expert witnesses:

“To put it into the language of Daubert, debates over ‘p-values’ might be useful when talking about the weight of an expert’s conclusions, but they say nothing about an expert’s methodology.”

Max Kennerly, “Statistical Significance Has No Place In A Daubert Analysis” (Mar. 13, 2016) [cited as Kennerly].

But wait; the expert witness must be able to rule out chance, bias and confounding when evaluating a putative association for causality. As Austin Bradford Hill explained, even before assessing a putative association for causality, scientists need first to have observations that

“reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance.”

Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 295 (1965) (emphasis added).

The analysis of random error is an essential step in the methodological process. That a proper methodology requires consideration of non-statistical factors does not remove the statistical from the methodology. Ruling out chance as a likely explanation is a crucial first step in the methodology for reaching a causal conclusion when there is an “expected value” or base rate for the outcome of interest in the population being sampled.
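To make the concept concrete, here is a minimal Python sketch of the kind of calculation involved when an observed count is compared against an expected base rate. Every number in it is a hypothetical chosen for illustration; nothing is drawn from any case record.

```python
# A minimal sketch, with hypothetical numbers, of asking whether an observed
# count exceeds what chance alone would produce given a known base rate.
from scipy.stats import binomtest

n_sampled = 500          # hypothetical number of people sampled
base_rate = 0.02         # hypothetical background rate of the outcome
observed_cases = 18      # hypothetical observed count

# One-sided test: are 18 cases compatible with a 2% base rate?
result = binomtest(observed_cases, n_sampled, base_rate, alternative="greater")
print(f"expected cases: {n_sampled * base_rate:.1f}")
print(f"observed cases: {observed_cases}")
print(f"one-sided p-value: {result.pvalue:.4f}")
```

A small p-value here would mean only that chance is an unlikely explanation; bias and confounding would still have to be addressed before anyone could speak of causation.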

Kennerly shakes his hive of huckabees:

“The erroneous belief in an ‘importance of statistical significance’ is exactly what the American Statistical Association was trying to get rid of when they said, ‘The widespread use of “statistical significance” (generally interpreted as p ≤ 0.05)’ as a license for making a claim of a scientific finding (or implied truth) leads to considerable distortion of the scientific process.”

And yet, the ASA never urged that scientists “get rid of” statistical analyses and assessments of attained levels of significance probability. To be sure, they cautioned against overinterpreting p-values, especially in the context of multiple comparisons, non-prespecified outcomes, and the like. The ASA criticized bright-line rules, which are often used by litigation-industry expert witnesses to over-endorse the results of studies with p-values less than 5%, often in the face of multiple comparisons, cherry-picked outcomes, and poorly and incompletely described methods and results. What the ASA described as a “considerable distortion of the scientific process” was claiming scientific truth on the basis of “p < 0.05.” As Bradford Hill pointed out in 1965, a clear-cut association, beyond that which we would care to attribute to chance, is the beginning of the analysis of an association for causality, not the end of it. Kennerly ignores who is claiming “truth” in the litigation context.  Defense expert witnesses frequently are opining no more than “not proven.” The litigation industry expert witnesses must opine that there is causation, or else they are out of a job.

The ASA explained that the distortion of the scientific process comes from claiming a scientific conclusion of causality, or its absence, when the appropriate claim is “we don’t know.” The ASA did not say, suggest, or imply that a claim of causality can be made in the absence of a finding of statistical significance, along with validation of the statistical model on which it is based and other supporting factors. The ASA certainly did not say that the scientific process will be well served by reaching conclusions of causation without statistical significance. What is clear is that statistical significance should not be an abridgment of a much more expansive process. Reviewing the annals of the International Agency for Research on Cancer (even in its currently politicized state), or of the Institute of Medicine, an honest observer would be hard pressed to come up with examples of associations for outcomes with known base rates that were determined to be causal in the absence of studies exhibiting statistical significance, along with many other indicia of causality.

Some other choice huckabees from Kennerly:

“It’s time for courts to start seeing the phrase ‘statistically significant’ in a brief the same way they see words like ‘very,’ ‘clearly,’ and ‘plainly’. It’s an opinion that suggests the speaker has strong feelings about a subject. It’s not a scientific principle.”

Of course, this ignores the central limit theorems, the importance of random sampling, the pre-specification of hypotheses and level of Type I error, and the like. Stuff and nonsense.

And then in a similar vein, from Kennerly:

“The problem is that many courts have been led astray by defendants who claim that ‘statistical significance’ is a threshold that scientific evidence must pass before it can be admitted into court.”

In my experience, it is litigation-industry lawyers who oversell statistical significance, not defense counsel, who may merely question reliance upon studies that lack it. Kennerly’s statement is not even wrong, however, because defense counsel knowledgeable about the rules of evidence would know that statistical studies themselves are rarely admitted into evidence. What is admitted, or not, is the opinion of expert witnesses, who offer opinions about whether associations are causal, not causal, or inconclusive.


1 Ben Mathis-Lilley, “Huckabee Claims Black People Aren’t Technically Citizens During Critique of Unjust Laws,” The Slatest (Sept. 11, 2015) (“[T]he Dred Scott decision of 1857 still remains to this day the law of the land, which says that black people aren’t fully human… .”).

2 Sander Greenland, “The Need for Critical Appraisal of Expert Witnesses in Epidemiology and Statistics,” 39 Wake Forest Law Rev. 291, 306 (2004). See “The Infrequency of Bayesian Analyses in Non-Forensic Court Decisions” (Feb. 16, 2014).

3 To be sure, eight years after Greenland published this diatribe, the agency promulgated a guidance that set recommended practices for Bayesian analyses in medical device trials. FDA Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials (February 5, 2010); 75 Fed. Reg. 6209 (February 8, 2010); see also Laura A. Thompson, “Bayesian Methods for Making Inferences about Rare Diseases in Pediatric Populations” (2010); Greg Campbell, “Bayesian Statistics at the FDA: The Trailblazing Experience with Medical Devices” (Presentation given by the Director, Division of Biostatistics, Center for Devices and Radiological Health, at Rutgers Biostatistics Day, April 3, 2009). Even today, Bayesian analysis remains uncommon at the U.S. FDA.

4 39 Wake Forest Law Rev. at 306-07 & n.61 (citing only one paper, Lisa Licitra et al., Primary Chemotherapy in Resectable Oral Cavity Squamous Cell Cancer: A Randomized Controlled Trial, 21 J. Clin. Oncol. 327 (2003)).

5 See, e.g., J. Martin Bland & Douglas G. Altman, “Bayesians and frequentists,” 317 Brit. Med. J. 1151, 1151 (1998) (“almost all the statistical analyses which appear in the British Medical Journal are frequentist”); David S. Moore, “Bayes for Beginners? Some Reasons to Hesitate,” 51 The Am. Statistician 254, 254 (1997) (“Bayesian methods are relatively rarely used in practice”); J.D. Emerson & Graham Colditz, “Use of statistical analysis in the New England Journal of Medicine,” in John Bailar & Frederick Mosteller, eds., Medical Uses of Statistics 45 (1992) (surveying 115 original research studies for statistical methods used; no instances of Bayesian approaches counted); Douglas Altman, “Statistics in Medical Journals: Developments in the 1980s,” 10 Statistics in Medicine 1897 (1991); B.S. Everitt, “Statistics in Psychiatry,” 2 Statistical Science 107 (1987) (finding only one use of Bayesian methods in 441 papers with statistical methodology).

The American Statistical Association’s Statement on and of Significance

March 17th, 2016

In scientific circles, some commentators have so zealously criticized the use of p-values that they have left uninformed observers with the impression that random error is not an interesting or important consideration in evaluating the results of a scientific study. In legal circles, counsel for the litigation industry and their expert witnesses have argued duplicitously that statistical significance is unimportant, except when statistical significance is observed, in which case causation is conclusive. The recently published Statement of the American Statistical Association (“ASA”) restores some sanity to the scientific and legal discussions of statistical significance and p-values. Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” The American Statistician, available online (Mar. 7, 2016), in-press at DOI:10.1080/00031305.2016.1154108, <http://dx.doi.org/10.1080/>.

Recognizing that sound statistical practice and communication affect research and public policy decisions, the ASA has published a statement of interpretative principles for statistical significance and p-values. The ASA’s statement first, and foremost, points out that the soundness of scientific conclusions turns on more than statistical methods alone. Study design, conduct, and evaluation often involve more than a statistical test result. And the ASA goes on to note, contrary to the contrarians, that “the p-value can be a useful statistical measure,” although this measure of attained significance probability “is commonly misused and misinterpreted.” ASA at 7. No news there.

The ASA’s statement puts forth six principles, all of which have substantial implications for how statistical evidence is received and interpreted in courtrooms. All are worthy of consideration by legal actors – legislatures, regulators, courts, lawyers, and juries.

1. “P-values can indicate how incompatible the data are with a specified statistical model.”

The ASA notes that a p-value shows the “incompatibility between a particular set of data and a proposed model for the data.” Although there are some in the statistical world who rail against null hypotheses of no association, the ASA reports that “[t]he most common context” for p-values consists of a statistical model that includes a set of assumptions, including a “null hypothesis,” which often postulates the absence of association between exposure and outcome under study. The ASA statement explains:

“The smaller the p-value, the greater the statistical incompatibility of the data with the null hypothesis, if the underlying assumptions used to calculate the p-value hold. This incompatibility can be interpreted as casting doubt on or providing evidence against the null hypothesis or the underlying assumptions.”

Some lawyers want to overemphasize statistical significance when present, but to minimize the importance of statistical significance when it is absent.  They will find no support in the ASA’s statement.
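For readers who want to see the principle in operation, the following sketch simulates a null model and asks how often chance alone produces a difference at least as large as a hypothetical observed one; the resulting proportion is the simulated two-sided p-value. The group size, standard deviation, and observed difference are all assumptions invented for the illustration.

```python
# A minimal sketch: the p-value as a measure of incompatibility between the
# data and an assumed null model. All numbers are hypothetical.
import numpy as np

rng = np.random.default_rng(seed=1)

observed_diff = 4.0      # hypothetical observed difference in group means
n_per_group = 30         # hypothetical group size
sd = 10.0                # assumed common standard deviation under the null

# Simulate the null model (no true difference) many times, and ask how often
# a difference at least as large as the observed one arises by chance alone.
sims = 100_000
null_diffs = (rng.normal(0, sd, (sims, n_per_group)).mean(axis=1)
              - rng.normal(0, sd, (sims, n_per_group)).mean(axis=1))
p_value = np.mean(np.abs(null_diffs) >= observed_diff)
print(f"simulated two-sided p-value: {p_value:.3f}")
```

If the model’s assumptions do not hold, the number loses its meaning, which is the ASA’s point about the p-value being tied to “a specified statistical model.”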

2. “P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.”

Of course, there are those who would misinterpret the meaning of p-values, but the flaw lies in the interpreters, not in the statistical concept.

3. “Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.”

Note that the ASA did not say that statistical significance is irrelevant to scientific conclusions. Of course, statistical significance is but one factor, which does not begin to account for study validity, data integrity, or model accuracy. The ASA similarly criticizes the use of statistical significance as a “bright line” mode of inference, without consideration of the contextual considerations of “the design of a study, the quality of the measurements, the external evidence for the phenomenon under study, and the validity of assumptions that underlie the data analysis.” Criticizing the use of “statistical significance” as singularly assuring the correctness of scientific judgment does not, however, mean that “statistical significance” is irrelevant or unimportant as a consideration in a much more complex decision process.

4. “Proper inference requires full reporting and transparency”

The ASA explains that the proper inference from a p-value can be completely undermined by “multiple analyses” of study data, with selective reporting of sample statistics that have attractively low p-values, or cherry picking of suggestive study findings. The ASA points out that common practices of selective reporting compromise valid interpretation. Hence the correlative recommendation:

“Researchers should disclose the number of hypotheses explored during the study, all data collection decisions, all statistical analyses conducted and all p-values computed. Valid scientific conclusions based on p-values and related statistics cannot be drawn without at least knowing how many and which analyses were conducted, and how those analyses (including p-values) were selected for reporting.”

ASA Statement. See also “Courts Can and Must Acknowledge Multiple Comparisons in Statistical Analyses” (Oct. 14, 2014).
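The force of the principle is easy to demonstrate. The sketch below, using made-up data and an arbitrary choice of twenty independent tests, shows how often at least one nominally “significant” p-value turns up even though every null hypothesis is true.

```python
# A minimal sketch of why undisclosed multiple analyses matter: with 20
# independent tests of true null hypotheses at the 0.05 level, at least one
# "significant" result appears by luck alone in roughly 64% of runs.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2)
n_tests, n_per_group, sims = 20, 50, 2_000

runs_with_false_positive = 0
for _ in range(sims):
    p_values = [
        stats.ttest_ind(rng.normal(0, 1, n_per_group),
                        rng.normal(0, 1, n_per_group)).pvalue
        for _ in range(n_tests)
    ]
    if min(p_values) < 0.05:
        runs_with_false_positive += 1

print(f"runs with at least one p < 0.05: {runs_with_false_positive / sims:.2f}")
print(f"theoretical expectation: {1 - 0.95 ** n_tests:.2f}")
```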

5. “A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.”

The ASA notes the commonplace distinction between statistical and practical significance. The independence between statistical and practical significance does not, however, make statistical significance irrelevant, especially in legal and regulatory contexts, in which parties claim that a risk, however small, is relevant. Of course, we want the claimed magnitude of association to be relevant, but we also need the measured association to be accurate and precise.
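A short, hypothetical sketch illustrates the distinction: a trivially small difference in a very large sample can produce a vanishingly small p-value, while a much larger difference in a small sample may never reach the conventional 5% level.

```python
# A minimal sketch of statistical versus practical significance, using
# hypothetical data: sample size, not importance, drives the p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=3)

# Tiny difference, huge sample: the p-value will be minuscule
big_a = rng.normal(100.0, 15.0, 200_000)
big_b = rng.normal(100.5, 15.0, 200_000)     # half-point shift
print("tiny effect, n = 200,000 per group:", stats.ttest_ind(big_a, big_b).pvalue)

# Much larger difference, small sample: may well miss the 0.05 threshold
small_a = rng.normal(100.0, 15.0, 10)
small_b = rng.normal(110.0, 15.0, 10)        # ten-point shift
print("large effect, n = 10 per group:    ", stats.ttest_ind(small_a, small_b).pvalue)
```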

6. “By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.”

Of course, a p-value cannot validate the model, which is assumed to generate the p-value. Contrary to the hyperbolic claims one sees in litigation, the ASA notes that “a p-value near 0.05 taken by itself offers only weak evidence against the null hypothesis.” And so the ASA counsels that “data analysis should not end with the calculation of a p-value when other approaches are appropriate and feasible.” 

What is important, however, is that the ASA never suggests that significance testing or measurement of significance probability is not an important and relevant part of the process. To be sure, the ASA notes that because of “the prevalent misuses of and misconceptions concerning p-values, some statisticians prefer to supplement or even replace p-values with other approaches.”

First among these other methods, unsurprisingly, is estimation with assessment of confidence intervals, although the ASA also includes Bayesian and other methods. Some express irrational exuberance about the potential of Bayesian methods to restore confidence in the scientific process and its conclusions. Bayesian approaches are less manipulated than frequentist ones, largely because very few people use Bayesian methods, and even fewer really understand them.
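For the curious, here is a minimal sketch of the estimation approach the ASA mentions, computing a 95% confidence interval for a difference in means from invented data; the interval conveys both the size of the estimated effect and the precision with which it has been measured.

```python
# A minimal sketch of estimation with a confidence interval; the data below
# are simulated placeholders, not results from any study.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=4)
exposed = rng.normal(52.0, 12.0, 40)       # hypothetical exposed group
unexposed = rng.normal(48.0, 12.0, 40)     # hypothetical comparison group

diff = exposed.mean() - unexposed.mean()
df = len(exposed) + len(unexposed) - 2
pooled_var = (((len(exposed) - 1) * exposed.var(ddof=1)
               + (len(unexposed) - 1) * unexposed.var(ddof=1)) / df)
se = np.sqrt(pooled_var * (1 / len(exposed) + 1 / len(unexposed)))
margin = stats.t.ppf(0.975, df) * se

print(f"estimated difference: {diff:.2f}")
print(f"95% confidence interval: ({diff - margin:.2f}, {diff + margin:.2f})")
```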

In some ways, Bayesian statistical approaches are like Apple computers. The Mac OS is less vulnerable to viruses, compared with Windows, because its lower market share makes it less attractive to virus code writers. As Apple’s OS has gained market share, its vulnerability has increased. (My Linux computer, on the other hand, is truly less vulnerable to viruses because of system architecture, but also because Linux personal computers have almost no market share.) If Bayesian methods become more prevalent, my prediction is that they will be subject to as much abuse as frequentist methods. The ASA wisely recognized that the “reproducibility crisis” and loss of confidence in scientific research were mostly due to bias, both systematic and cognitive, in how studies are done, interpreted, and evaluated.

District Court Denies Writ of Coram Nobis to Dr. Harkonen

August 27th, 2015

Courts are generally suspicious of convicted defendants who challenge the competency of their trial counsel on any grounds that might reflect strategic trial decisions. A convicted defendant can always speculate about how his trial might have gone better had some witnesses, who did not fare well at trial, not been called. Similarly, a convicted defendant might well speculate that his trial counsel could and should have called other or better witnesses. Still, sometimes, trial counsel really do screw up, especially when it comes to technical, scientific, or statistical issues.

The Harkonen case is a true comedy of errors – statistical, legal, regulatory, and practical. Indeed, some would say it is truly criminal to convict someone for an interpretation of a clinical trial result.[1] As discussed in several previous posts, Dr. W. Scott Harkonen was convicted under the wire fraud statute, 18 U.S.C. § 1343, for having distributed a faxed press release about InterMune’s clinical trial, in which he described the study as having “demonstrated” Actimmune’s survival benefit in patients with mild to moderate idiopathic pulmonary fibrosis (cryptogenic fibrosing alveolitis). The trial had not shown a statistically significant result on its primary outcome, and the significance probability on the secondary outcome of survival benefit was 0.08. Dr. Harkonen reported on a non-prespecified subgroup of patients with mild to moderate disease at randomization, in which the trial showed better survival in the experimental therapy group than in the placebo group, with a p-value of 0.004.
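The statistical hazard of non-prespecified subgroup findings can be shown with a simple simulation. The sketch below uses entirely hypothetical data – not the Actimmune data – and assumes an arbitrary ten post-hoc subgroups; it shows how often a completely ineffective treatment yields at least one subgroup with p < 0.05.

```python
# A minimal sketch (hypothetical data) of why post-hoc subgroup findings
# warrant caution: a treatment with no effect at all will often show a
# "significant" benefit in some subgroup if enough subgroups are scanned.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=5)
n_patients, n_subgroups, sims = 400, 10, 1_000

hits = 0
for _ in range(sims):
    treated = rng.integers(0, 2, n_patients).astype(bool)    # random assignment
    outcome = rng.normal(0, 1, n_patients)                   # no true effect
    labels = rng.integers(0, n_subgroups, n_patients)        # arbitrary subgroups
    p_min = min(
        stats.ttest_ind(outcome[treated & (labels == g)],
                        outcome[~treated & (labels == g)]).pvalue
        for g in range(n_subgroups)
    )
    if p_min < 0.05:
        hits += 1

print(f"simulated trials with a 'significant' post-hoc subgroup: {hits / sims:.2f}")
```

None of which is to say that the Actimmune subgroup finding was spurious; it only illustrates why non-prespecified analyses invite caution.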

Having exhausted his direct appeal, Dr. Harkonen petitioned for post-conviction relief in the form of a writ of coram nobis, on grounds of ineffective assistance of counsel. Last week, federal District Judge Richard Seeborg, in San Francisco, denied Dr. Harkonen’s petition. United States v. Harkonen, Case No. 08-cr-00164-RS-1, Slip op. (N.D. Cal. Aug. 21, 2015). See Dani Kass, “Ex-InterMune CEO’s Complaints Against Trial Counsel Nixed,” Law360 (Aug. 24, 2015). Judge Seeborg held that Dr. Harkonen had failed to explain why he had not raised the claim of ineffective assistance earlier, and that trial counsel’s tactical and strategic decisions, with respect to not calling statistical expert witnesses, were “not so beyond the pale of reasonable conduct as to warrant the finding of ineffective assistance.” Slip op. at 1.

To meet its burden at trial, the government presented Dr. Thomas Fleming, a statistician and “trialist,” who had served on the data safety and monitoring board of the clinical trial at issue.[2] Fleming took the rather extreme view that a clinical trial that “fails” to meet its primary pre-stated end point at the conventional p-value of less than 5 percent is an abject failure and provides no demonstration of any claim of efficacy. (Other experts might well say that the only failed clinical trial is one that was not done.) Judge Seeborg correctly discerned that Fleming’s testimony was in the form of an opinion, and that the law of wire fraud prohibits prosecution of scientific opinions about which reasonable scientists may differ. The government’s burden was thus to show, beyond a reasonable doubt, that no reasonable scientist could have reported the Actimmune clinical trial as having “demonstrated” a survival benefit in the mild to moderate disease subgroup. Slip op. at 2.

Remarkably, at trial, the government presented no expert witnesses, and Fleming testified as a fact witness. While acknowledging that the contested issue, whether anyone could fairly say that the Actimmune clinical trial had demonstrated efficacy in a non-prespecified subgroup, called for an opinion, Judge Seeborg gave the government a pass for not presenting expert witnesses to make out its case. Indeed, Judge Seeborg noted that the government had “stressed testimony from its experts touting the view that study results without sufficiently low p-values are inherently unreliable and meaningless.” Slip op. at 3 (emphasis added). Judge Seeborg’s description of Fleming as an expert witness is remarkable because the government never sought to qualify Dr. Fleming as an expert witness, and the trial judge never gave the jury an instruction on how to evaluate the testimony of an expert witness, including an explanation that the jury was free to accept some, all, or none of Fleming’s opinion testimony. After the jury returned its guilty verdict, Harkonen’s counsel filed a motion for judgment of acquittal, based in part upon the government’s failure to qualify Fleming as an expert witness in the field of biostatistics. The trial judge refused this motion on grounds that

(1) at one point Fleming had been listed as an expert witness;

(2) Fleming’s curriculum vitae had been marked and admitted into evidence; and

(3) “[m]ost damningly,” according to the trial judge, Harkonen’s lawyers had failed to object to Fleming’s holding forth on opinions about statistical theory and practice.

Slip op. at 7. Damning indeed as evidence of a potentially serious deviation from a reasonable standard of care and competence for trial practice! Judge Seeborg curiously refers to Dr. Harkonen as not objecting, when the very issue before the court, on the petition for coram nobis, is the competency of his counsel’s failure to object. Allowing a well-credentialed statistician, such as Fleming, to testify, without requesting a limiting instruction on expert witness opinion testimony, certainly seems “beyond the pale.” If there were some potential tactic involved in this default, Judge Seeborg does not identify it, and none comes to mind. And even if this charade, of calling Fleming as a fact witness, were some sort of tactical cat-and-mouse litigation game between government and defendant, certainly the trial judge should have taken control of the matter by disallowing a witness, not tendered as an expert witness, from offering opinion testimony on arcane statistical issues.

Having not objected to Fleming’s opinions, Dr. Harkonen’s counsel decided not to call its own defense expert witnesses. The post-conviction court makes much of the lesser credentials of the defense witnesses, and a decision not to call expert witnesses based upon defense counsel’s apparent belief that it had undermined Fleming’s opinion on cross-examination. There is little in the cross-examination of Fleming to support the coram nobis court’s assessment. Fleming’s opinions were vulnerable in ways that trial counsel failed to exploit, and in ways that even a lesser credentialed expert witness could have made clear to a lay jury or the court. Even a journeyman statistician would have realized that Fleming had overstated the statistical orthodoxy that p-values are “magical numbers,” by noting that many statisticians and epidemiologists disagreed with invoking statistical hypothesis testing as a rigid decision procedure, based upon p-values less than 0.05. Indeed, the idea of statistical testing as driven by a rigid, pre-selected level of acceptable Type 1 error rate was rejected by the very statistician who developed and advanced computations of the p-value. See Sir Ronald Fisher, Statistical Methods and Scientific Inference 42 (Hafner 1956) (ridiculing rigid hypothesis testing as “absurdly academic, for in fact no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas.”).

After the jury convicted on the wire fraud count, Dr. Harkonen changed counsel from Kasowitz Benson Torres & Friedman LLP, to Mark Haddad at Sidley Austin LLP. Mr. Haddad was able, in relatively short order, to line up two outstanding statisticians, Professor Steven Goodman, of Stanford University’s Medical School, and Professor Donald Rubin, of Harvard University. Both Professors Goodman and Rubin robustly rejected Fleming’s orthodox positions in post-trial declarations, which were too late to affect the litigation of the merits, although their contributions may well have made it difficult for the trial judge to side with the government on its request for a Draconian ten-year prison sentence. From my own perspective, I can say it was not difficult to recruit two leading, capable epidemiologists, Professors Kenneth Rothman and Timothy Lash to join in an amicus brief that criticized Fleming’s testimony in a way that would have been devastating had it been done at trial.

The entire Harkonen affair is marked by extraordinary governmental hypocrisy. As Judge Seeborg reports:

“[t]hroughout its case in chief, the government stressed testimony from Fleming and Crager who offered that, in the world of biostatistical analysis, a 0.05 p-value threshold is ‘somewhat of a magic number’; that the only meaningful p-value from a study is the one for its primary endpoint; and that data from post-hoc subgroup analyses cannot be reported upon accurately without information about the rest of the sampling context.”[3]

Slip op. at 4. And yet, in another case, when it was politically convenient to take the opposite position, the government proclaimed, through its Solicitor General, on behalf of the FDA, that statistical significance at any level is not necessary at all for demonstrating causation:

“[w]hile statistical significance provides some indication about the validity of a correlation between a product and a harm, a determination that certain data are not statistically significant … does not refute an inference of causation.”

Brief for the United States as Amicus Curiae Supporting Respondents, in Matrixx Initiatives, Inc. v. Siracusano, 2010 WL 4624148, at *14 (Nov. 12, 2010). The methods of epidemiology and data analysis are not, however, so amenable to political expedience. The government managed both to overstate the interpretation of p-values in Harkonen, and to understate them in Matrixx Initiatives.

Like many of the judges who previously have ruled on one or another issue in the Harkonen case, Judge Seeborg struggled with statistical concepts and gave a rather bizarre, erroneous definition of what exactly was at issue with the p-values in the Actimmune trial:

“In clinical trials, a p-value is a number between one and zero which represents the probability that the results establish a cause-and-effect relationship, rather than a random effect, between the drug and a positive health benefit. Because a p-value indicates the degree to which the tested drug does not explain observed benefits, the smaller the p-value, the larger a study’s significance.”

Slip op. at 2-3. Ultimately, this error was greatly overshadowed by a simpler error of overlooking, and condoning, trial counsel’s default in challenging the government’s failure to present credible expert witness opinion testimony on the crucial issue in the case.

At the heart of the government’s complaint is that Dr. Harkonen’s press release does not explicitly disclose that the subgroup of mild and moderate disease patients was not pre-specified for analysis in the trial protocol and statistical analysis plan. Dr. Harkonen’s failure to disclose the ad hoc nature of the subgroup, while not laudable, hardly rose to the level of criminal fraud, especially when considered in the light of the available prior clinical trials on the same medication, and prevalent practice in not making the appropriate disclosure in press releases, and even in full, peer-reviewed publications of clinical trials and epidemiologic studies.

For better or worse, the practice of presenting unplanned subgroup analyses is quite common in the scientific community. Several years ago, the New England Journal of Medicine published a survey of publication practice in its own pages, and documented the widespread failure to limit “demonstrated” findings to pre-specified analyses.[4] In general, the survey authors were unable to determine the total number of subgroup analyses performed; and in the majority (68%) of trials discussed, the authors could not determine whether the subgroup analyses were pre-specified.[5] Although the authors of this article proposed guidelines for identifying subgroup analyses as pre-specified or post-hoc, they emphasized that the proposals were not “rules” that could be rigidly prescribed.[6]

Of course, what was at issue in Dr. Harkonen’s case was not a peer-reviewed article in a prestigious journal, but a much more informal, less rigorous communication that is typical of press releases. Lack of rigor in this context is not limited to academic and industry press releases. Consider the press release recently issued by the National Institutes of Health (NIH) in connection with a NIH funded clinical trial on age-related macular degeneration (AMD). NIH Press Release, “NIH Study Provides Clarity on Supplements for Protection against Blinding Eye Disease,” NIH News & Events Website (May 5, 2013) [last visited August 27, 2015]. The clinical trial studied a modified dietary supplement in common use to prevent or delay AMD. The NIH’s press release claimed that the study “provides clarity on supplements,” and announced a “finding” of “some benefits” when looking at just two of the subgroups. The press release does not use the words “post hoc” or “ad hoc” in connection with the subgroup analysis used to support the “finding” of benefit.

The clinical trial results were published the same day in a journal article that labeled the subgroup findings as post hoc subgroup findings.[7] The published paper also reported that the pre-specified endpoints of the clinical trial did not show statistically significant differences between therapies and placebo.

None of the p-values for the post-hoc subgroup analyses was adjusted for multiple comparisons. The NIH webpages with Questions and Answers for the public and the media both fail to report the post-hoc nature of the subgroup findings.[8] By the standards imposed upon Dr. Harkonen in this case through Dr. Fleming’s testimony, and contrary to the NIH’s public representations, the NIH trial had “failed,” and no inferences could be drawn with respect to any endpoint because the primary endpoint did not yield a statistically significant result.
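For readers who want to see what such an adjustment looks like, here is a minimal sketch using the multipletests function from the Python statsmodels package; the raw p-values below are hypothetical placeholders, not the AREDS2 results.

```python
# A minimal sketch of adjusting a family of p-values for multiple comparisons.
# The raw p-values are hypothetical placeholders.
from statsmodels.stats.multitest import multipletests

raw_p = [0.004, 0.03, 0.08, 0.21, 0.47]    # hypothetical subgroup p-values

for method in ("bonferroni", "holm"):
    reject, adjusted, _, _ = multipletests(raw_p, alpha=0.05, method=method)
    print(method, [round(p, 3) for p in adjusted])
```

After adjustment, nominally “significant” subgroup p-values can easily lose their significance, which is why the omission matters.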

There are, to be sure, hopeful signs that the prevalent practice is changing. A recent article documented an increasing number of “null” effect clinical trials that have been reported, perhaps as the result of better reporting of trials without dramatic successes, increasing willingness to publish such trial results, and greater availability of trial protocols in advance of, or with, peer-review publication of trial results.[9] Transparency in clinical and other areas of research is welcome and should be the norm, descriptively and prescriptively, but we should be wary of criminalizing lapses with indictments of wire fraud for conduct that can be found in most scientific journals and press releases.


[1] See, e.g., “Who Jumped the Shark in United States v. Harkonen”; “Multiplicity versus Duplicity – The Harkonen Conviction”; “The (Clinical) Trial by Franz Kafka”; “Further Musings on U.S. v. Harkonen”; and “Subgroups — Subpar Statistical Practice versus Fraud.” In the Supreme Court, two epidemiologists and a law school lecturer filed an Amicus Brief that criticized the government’s statistical orthodoxy. Brief by Scientists And Academics as Amici Curiae, in Harkonen v. United States, 2013 WL 5915131, 2013 WL 6174902 (Supreme Court Sept. 9, 2013).

[2] The government also presented the testimony of Michael Crager, an InterMune biostatistician. Reading between the lines, we may infer that Dr. Crager was induced to testify in exchange for not being prosecuted, and that his credibility was compromised.

[3] This testimony was particularly egregious because mortality or survival is often the most important outcome measure, but frequently is not made the primary trial end point because of concern over whether there will be a sufficient number of deaths over the course of the trial to assess efficacy on this outcome. In the context of the Actimmune trial, this concern was on full display, but as it turned out, when the data were collected, there was a survival benefit (p = 0.08, which shrank to 0.055 when the analysis was limited to patients who met entrance criteria, and shrank further to 0.004 when the analysis was plausibly limited to patients with only mild or moderate disease at randomization).

[4] Rui Wang, et al., “Statistics in Medicine – Reporting of Subgroup Analyses in Clinical Trials,” 357 New Eng. J. Med. 2189 (2007).

[5] Id. at 2192.

[6] Id. at 2194.

[7] Emily Chew, et al., “Lutein + Zeaxanthin and Omega-3 Fatty Acids for Age-Related Macular Degeneration,” 309 J. Am. Med. Ass’n 2005 (2013).

[8] SeeFor the Public: What the Age-Related Eye Disease Studies Mean for You” (May 2013) [last visited August 27, 2015]; “For the Media: Questions and Answers about AREDS2” (May 2013) [last visited August 27, 2015].

[9] See Robert M. Kaplan & Veronica L. Irvin, “Likelihood of Null Effects of Large NHLBI Clinical Trials Has Increased over Time,” 10 PLoS ONE e0132382 (2015); see also Editorial, “Trials register sees null results rise,” 524 Nature 269 (Aug. 20, 2015); Paul Basken, “When Researchers State Goals for Clinical Trials in Advance, Success Rates Plunge,” The Chronicle of Higher Education (Aug. 5, 2015).

Canadian Judges’ Reference Manual on Scientific Evidence

July 24th, 2015

I had some notion that there was a Canadian version of the Reference Manual on Scientific Evidence in the works, but Professor Greenland’s comments in a discussion over at Deborah Mayo’s blog drew my attention to the publication of the Science Manual for Canadian Judges [Manual]. See “‘Statistical Significance’ According to the U.S. Dept. of Health and Human Services (ii),” Error Statistics Philosophy (July 17, 2015).

The Manual is the product of the Canadian National Judicial Institute (NJI), which is an independent, not-for-profit group that is committed to educating Canadian judges. The NJI’s website describes the Manual:

“Without the proper tools, the justice system can be vulnerable to unreliable expert scientific evidence.

* * *

The goal of the Science Manual is to provide judges with tools to better understand expert evidence and to assess the validity of purportedly scientific evidence presented to them. …”

The Chief Justice of Canada, Hon. Beverley M. McLachlin, contributed an introduction to the Manual, which was notable for its frank admission that:

“[w]ithout the proper tools, the justice system is vulnerable to unreliable expert scientific evidence.

****

Within the increasingly science-rich culture of the courtroom, the judiciary needs to discern ‘good’ science from ‘bad’ science, in order to assess expert evidence effectively and establish a proper threshold for admissibility. Judicial education in science, the scientific method, and technology is essential to ensure that judges are capable of dealing with scientific evidence, and to counterbalance the discomfort of jurists confronted with this specific subject matter.”

Manual at 14. These are laudable goals, indeed.

The first chapter of the Manual is an overview of Canadian law of scientific evidence, “The Legal Framework for Scientific Evidence,” by Canadian law professors Hamish Stewart (University of Toronto), and Catherine Piché (University of Montreal). Several judges served as peer reviewers.

The second chapter, “Science and the Scientific Method,” contains the heart of what judges supposedly should know about scientific and statistical matters to serve as effective “gatekeepers.” Like the chapters in the Reference Manual on Scientific Evidence, this chapter was prepared by a scientist author (Scott Findlay, Ph.D., Associate Professor of Biology, University of Ottawa) and a lawyer author (Nathalie Chalifour, Associate Professor of Law, University of Ottawa). Several judges, and Professor Brian Baigrie (University of Toronto, Victoria College, and the Institute for the History and Philosophy of Science and Technology) provided peer review. The chapter attempts to cover the demarcation between science and non-science, and between scientific and other expert witness opinion. The authors describe “the” scientific method, hypotheses, experiments, predictions, inference, probability, statistics and statistical hypothesis testing, data reliability, and related topics. A subsection of chapter two is entitled “Normative Issues in Science – The Myth of Scientific Objectivity,” which suggests a Feyerabend, post-modernist influence at odds with the Chief Justice’s aspirational statement of goals in her introduction to the Manual.

Greenland noted some rather cavalier statements in chapter two that suggest that the conventional alpha of 5% corresponds to a “scientific attitude that unless we are 95% sure the null hypothesis is false, we provisionally accept it.” And he pointed to other passages where the chapter seems to suggest that the coefficient of confidence that corresponds to an alpha of 5% “constitutes a rather high standard of proof,” thus confusing and conflating the probability of random error with posterior probabilities. Some have argued that these errors are simply an effort to make statistical concepts easier to grasp for lay people, but the statistics chapter in the FJC’s Reference Manual shows that an accurate exposition of statistical concepts can be made understandable. The Canadian Manual seems in need of some trimming with Einstein’s razor, usually paraphrased as “Everything should be made as simple as possible, but no simpler.”[1] The razor should certainly be applied to statistical concepts, with the understanding that pushing to simplify too aggressively can sometimes result in simplistic, and simply wrong, exposition.
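The conflation is easy to illustrate with a back-of-the-envelope Bayes calculation. The sketch below assumes, purely for illustration, that one in ten tested hypotheses reflects a real effect and that studies have 80% power; under those assumptions, a “significant” result still leaves roughly a one-in-three chance that the null hypothesis is true – nothing like a 95% “standard of proof.”

```python
# A minimal sketch of why a 5% significance level is not a posterior
# probability. The prior and power figures are illustrative assumptions.
prior_true_effect = 0.10   # assumed: 1 in 10 tested hypotheses are real effects
alpha = 0.05               # false-positive rate under the null
power = 0.80               # assumed probability of detecting a real effect

p_significant = power * prior_true_effect + alpha * (1 - prior_true_effect)
posterior_null = alpha * (1 - prior_true_effect) / p_significant

print(f"P(null is true | 'significant' result) = {posterior_null:.2f}")
# roughly 0.36 under these assumptions, not 0.05
```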

Chapter 3 returns to more lawyerly matters, “Managing and Evaluating Expert Evidence in the Courtroom,” prepared and peer-reviewed by prominent Canadian lawyers and judges. The final chapter, “Ethics of the Expert Witness,” should be of interest to lawyers and judges in the United States, where the topic is largely ignored. The chapter was prepared by Professor Adam Dodek (University of Ottawa), along with several writers from the National Judicial Institute, the Canadian Judicial Council, American College of Trial Lawyers, Environment Canada, and notably, Joe Cecil & the Federal Judicial Center.

Weighing in at 228 pages, the Science Manual for Canadian Judges is much shorter than the Federal Judicial Center’s Reference Manual on Scientific Evidence. Unlike the FJC’s Reference Manual, which is now in its third edition, the Canadian Manual has no separate chapters on regression, DNA testing and forensic evidence, clinical medicine, or epidemiology. The coverage of statistical inference is concentrated in chapter two, but that chapter has no discussion of meta-analysis, systematic review, evidence-based medicine, confounding, and the like. Perhaps there will soon be a second edition of the Science Manual for Canadian Judges.


[1] See Albert Einstein, “On the Method of Theoretical Physics; The Herbert Spencer Lecture,” delivered at Oxford (10 June 1933), published in 1 Philosophy of Science 163 (1934) (“It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience.”).

Discovery of Retained, Testifying Statistician Expert Witnesses (Part 2)

July 1st, 2015

Discovery Beyond the Report and the Deposition

The lesson of the cases interpreting Rule 26 is that counsel cannot count exclusively upon the report and automatic disclosure requirements to obtain the materials necessary or helpful for cross-examination of statisticians who have created their own analyses. Sometimes just asking nicely suffices[1]. Other avenues of discovery are available, however, for reluctant disclosers. In particular, Rule 26(b) authorizes discovery substantially broader than what is required for inclusion in an expert witness’s report.

Occasionally, counsel cite caselaw that has been superseded by the steady expansion of Rule 26[2]. The 1993 amendments made clear, however, that Rule 26 sets out mandatory minimum requirements that do not define or exhaust the available discovery tools to obtain information from expert witnesses[3]. Some courts continue to insist that a party make a showing of necessity to go beyond the minimal requirements of Rule 26[4], although the better reasoned cases take a more expansive view of the proper scope of expert witness discovery[5].

Although the federal rules may not require the expert witness report to include, or to attach, all “working notes or recordings,” or calculations, alternative analyses, and data output files, these materials may be the subject of proper document requests to the adverse party or perhaps subpoenas to the expert witness.  The Advisory Committee Notes explain that the various techniques of discovery kick in by virtue of Rule 26(b), where automatic disclosure and report requirements of Rule 26(a) leave off:

“Rules 26(b)(4)(B) and (C) do not impede discovery about the opinions to be offered by the expert or the development, foundation, or basis of those opinions. For example, the expert’s testing of material involved in litigation, and notes of any such testing, would not be exempted from discovery by this rule. Similarly, inquiry about communications the expert had with anyone other than the party’s counsel about the opinions expressed is unaffected by the rule. Counsel are also free to question expert witnesses about alternative analyses, testing methods, or approaches to the issues on which they are testifying, whether or not the expert considered them in forming the opinions expressed. These discovery changes therefore do not affect the gatekeeping functions called for by Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993), and related cases.[6]

The court in Ladd Furniture v. Ernst & Young explained the structure of Rule 26 with respect to underlying documents, calculations, and data[7].  In particular, the requirements of the Rule 26(a) report do not create a limitation on Rule 26(b) discovery:

“As a basis for withholding the above information, Ladd argues that Ernst & Young is not entitled to discover any expert witness information which is not specifically mentioned in Rule 26(a)(2)(B). However, as explained below, Ladd’s position on this point is not supported by the text of Rule 26 or by the Advisory Committee’s commentary to Rule 26(a). In the text, Rule 26(a)(2)(B) provides for the mandatory disclosure of certain expert witness information, even without a request from the opposing party. However, there is no indication on the face of the rule to suggest that a party is absolutely prohibited from seeking any additional information about an opponent’s expert witnesses. In fact, Rule 26(b)(1) describes the scope of allowable discovery as follows: ‛Parties may obtain discovery regarding any matter, not privileged, which is relevant to the subject matter involved in the pending action… .’ Fed. R. Civ. P. 26(b)(1).[8]

Expert witness discovery for materials that go beyond what is required in an adequate Rule 26(a) report can have serious consequences for the expert witness who fails to produce the requested materials. Opinion exclusion is an appropriate remedy against an expert witness who failed to keep data samples and statistical packages because the adversary party “could not attempt to validate [the expert witness’s] methods even if [the witness] could specifically say what he considered.[9]

No doubt expert witnesses and parties will attempt to resist the call for working notes and underlying materials on the theory that the requested documents and materials are “draft reports,” which are now protected by the revisions to Rule 26.  For the most part, these evasions have been rejected[10].  In one case, for instance, in which an expert witness’s assistants compiled and summarized information from individual case files, the court rejected the characterization of the information as part of a “draft report,” and ordered their production.[11]

Choice of Discovery Method Beyond Rule 26 Automatic Disclosure

In addition to the mandatory expert report and disclosure of data and facts, and the optional deposition by oral examination, parties have other avenues to pursue discovery of information, facts, and data, from expert witnesses. Under Rule 33(a)(2), parties may propound contention interrogatories that address expert witnesses’ opinions and conclusions. As for methods of discovery beyond what is discussed specifically in Rule 26, courts are confronted with a threshold question whether Rule 34 requests to produce, Rule 30(b)(2) depositions by oral examination, or Rule 45 subpoenas are the appropriate discovery method for obtaining documents from a retained, testifying expert witness. In the view of some courts, the resolution to this threshold question turns on whether expert witnesses are within the control of parties such that parties must respond to discovery for information, documents, and things within the custody, possession, and control of their expert witnesses.

Subpoenas Are Improper

Some federal district courts view Rule 45 subpoenas as inappropriate discovery tools for parties[12] and persons under the control of parties. In Alper v. United States[13], the district court refused to enforce plaintiff’s Rule 45 subpoena that sought documents from defendant’s expert witness. Although acknowledging that Rule 45’s language was unclear, the Alper court insisted that since a party proffers an expert witness, that witness should be considered under the party’s control[14]. And because the expert witness was “within defendant’s control,” the court noted that Rule 34 rather than Rule 45 governed the requested discovery[15]. Alper seems to be a minority view, but its approach is attractive in streamlining discovery, eliminating subpoena service issues for expert witnesses who may live outside the district, and forcing the sponsoring party to respond and to obtain compliance from its retained expert witness.

Subpoenas Are Proper

The “control” rationale of the Alper case is questionable. Rule 45 contains no statement of limitation to non-parties[16]. Parties “proffer” fact witnesses, but their proffers do not restrict the availability of Rule 45 subpoenas. More important, expert witnesses are not truly under the control of the retaining parties. Expert witnesses have independent duties to the court, and under their own professional standards, to give their own independent opinions[17].

Many courts allow discovery of expert witness documents and information by Rule 45 subpoena on either the theory that Rule 45 subpoenas are available for both parties and non-parties or the theory that expert witnesses are sufficiently independent of the sponsoring party that they are non-parties who are clearly subject to Rule 45. If expert witnesses are not parties, and Rule 26’s confidentiality provisions do not constrain the available discovery tools for expert witnesses, then expert witness subpoenas would appear to be a proper discovery tool for obtaining documents in the witnesses’ possession, custody, and control[18]. When used as a discovery tool in this way, subpoenas are subject to discovery deadlines[19].

Particular Concerns for Discovery of Statistician Expert Witnesses

Statistician expert witnesses require additional care and discovery investigation in complex products liability cases[20].  The caselaw sometimes takes a crabbed approach that refuses to provide parties access to their adversaries’ statistical analyses, calculations, data input and output files, and graphical files.

Statistician expert testimony will usually involve complex statistical evidence, models, assumptions, and calculations. These materials in turn make it difficult to discern which of the available statistical tests the statistician chose, and whether the statistician exploited the opportunity to run multiple tests serially, with varying assumptions, until a propitious result was obtained. Given these typical circumstances, statistical expert witness testimony will almost always require full disclosure to allow the adversary a fair opportunity to cross-examine at trial, or to challenge the validity of the proffered analyses under Rules 702 and 703[21].
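A small, hypothetical sketch shows why the underlying files matter: the same dataset can yield different p-values depending on seemingly innocuous data-selection choices, and those choices cannot be reconstructed from a conclusory report.

```python
# A minimal sketch (simulated data) of how analytic choices move p-values:
# the same comparison is run on the full data and on data "cleaned" by a
# plausible-sounding trimming rule.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=6)
group_a = rng.normal(50, 10, 60)
group_b = rng.normal(53, 10, 60)

def trim(x, lo=5, hi=95):
    """Drop observations outside the chosen percentile band."""
    low, high = np.percentile(x, [lo, hi])
    return x[(x >= low) & (x <= high)]

print("all data:     p =", round(stats.ttest_ind(group_a, group_b).pvalue, 3))
print("trimmed data: p =", round(stats.ttest_ind(trim(group_a), trim(group_b)).pvalue, 3))
```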

Statisticians create and use a variety of materials that are clearly relevant to their opinions:

  • programs and programming code run to generate all specified analyses on specified data,
  • statistical packages,
  • all data available,
  • all data “cleaning” or data selection processes,
  • selection of variables from those available,
  • data frames that show what data were included (and excluded) in the analyses,
  • data input files,
  • all specified tests run on all data,
  • all data and analysis output files that show all analyses generated,
  • all statistical test diagnostics and tests of underlying assumptions, and
  • graphical output files.

The statistician may have made any number of decisions or judgments in selecting which statistical test results to incorporate into his or her final report.  The report will in all likelihood not include important materials that would allow another statistician to fully understand, test, replicate, and criticize the more conclusory analyses and statements in the report.  In addition, lurking in the witness’s files, or in the electronic “trash bin,” may be alternative analyses that were run and discarded, and not included in the final report.  Why and how those alternative analyses were run but discarded may raise important credibility or validity questions, as well as provide insight into the statistician’s analytical process, all important considerations in preparing for cross-examination and rebuttal.  The lesson of Rule 26, and the caselaw interpreting its provisions, is that lawyers must make specific requests for the materials described above.  Only with these materials firmly in hand can a deposition fully explore the results obtained, the methods used, the assumptions made, the assumptions violated, the alternative methods rejected, the data used, the data available, the data not used, the data-dredging and manipulation potential, analytical problems, and the potential failure to reconcile inconsistent results. Waiting for trial, or even for the deposition, may well be too late[22].

The warrant for examining the integrity of data relied upon by expert witnesses appears to be securely embedded in the Federal Rules of Civil Procedure, and in the Federal Rules of Evidence. Evidence Rule 703 has particular relevance to statistical or epidemiologic testimony. Lawyers facing studies of dubious quality may need to press for discovery of underlying data and materials. In the Viagra vision loss multi-district litigation (MDL), the defendant sought and obtained discovery of underlying data from plaintiffs’ expert witness’s epidemiologic study of vision loss among patients using Viagra and similar medications[23]. Although the Viagra MDL court had struggled with inferential statistics in its first approach to defendant’s Rule 702 motion, the court understood the challenge based upon lack of data integrity, and reconsidered and granted defendant’s motion to exclude the challenged expert witness[24].

The lawyering implications for discovery of statistician expert witnesses are important. Statistical evidence requires counsel’s special scrutiny to ensure compliance with the disclosure requirements of Federal Rule of Civil Procedure 26. Given the restrictive reading of Rule 26 by some courts, counsel will need to anticipate the use of other discovery tools. Lawyers should request, by Rule 34 or Rule 45, all computer runs, programming routines, and outputs, and they should zealously pursue witnesses’ failures to maintain and produce data. Given the uncertainty in some districts whether expert witnesses are subject to subpoenas, counsel may consider both propounding Rule 34 requests and serving Rule 45 subpoenas.

Lawyers in data-intensive cases should give early consideration to appropriate discovery plans that contemplate data production in advance of depositions, to allow full exploration of analyses at deposition[25]. Lawyers should also be alert to the potential need to show particularized need for the requested data and analyses. In instructing expert witnesses on their preparation of their reports, lawyers should consider directing their expert witnesses to express whether they need further access to the adversary’s expert witnesses’ underlying data and materials to fully evaluate the proffered opinions. Discovery of statisticians and their data and their analyses requires careful planning, as well as patient efforts to educate the court about the need for full exploration of all data and all analyses conducted, whether or not incorporated into the Rule 26 report.


[1] Randall v. Rolls-Royce Corp., 2010 U.S. Dist. LEXIS 23421, *4-5 (S.D. Ind. March 12, 2010) (“Dr. Harnett who began his evaluation of the analysis contained in the report … soon concluded that he needed the underlying studies and statistical programs created or used by Dr. Drogin. In response to the Defendants’ request for such materials, Plaintiffs produced four discs containing more than 1,000 separate electronic files”).

[2] Marsh v. Jackson, 141 F.R.D. 431, 432–33 (W.D. Va. 1992) (holding that Rule 45 could not be used to obtain an opposing expert’s files because Rule 26(b)(4) limits expert discovery to depositions and interrogatories as a policy matter)

[3] See Advisory Comm. Notes for 1993 Amendments, to Fed. R. Civ. P. 26(a) (“The enumeration in Rule 26(a) of items to be disclosed does not prevent a court from requiring by order or local rule that the parties disclose additional information without a discovery request. Nor are parties precluded from using traditional discovery methods to obtain further information regarding these matters, … .”); United States v. Bazaarvoice, Inc., C 13-00133 WHO (LB), 2013 WL 3784240 (N.D. Cal. July 18, 2013) (“Rule 26(a)(2)(B) . . . does not preclude parties from obtaining further information through ordinary discovery tools”) (internal citations omitted).

[4] Morriss v. BNSF Ry. Co., No. 8:13CV24, 2014 WL 128393, at *4–6, 2014 U.S. Dist. LEXIS 3757, at *17 (D. Neb. Jan. 13, 2014) (holding that “absent some threshold showing of ‘compelling reason,’ the broad discovery provisions of Rules 34 and 45 cannot be used to undermine the specific expert witness discovery rules in Rule 26(a)(2)”).

[5] Modjeska v. United Parcel Service Inc., No. 12–C–1020, 2014 WL 2807531 (E.D. Wis. June 19, 2014) (holding that Rule 26(a)(2)(B) governs only disclosure in expert witness reports and does not limit or preclude further discovery using ordinary discovery such as requests to produce); Expeditors Int’l of Wash., Inc. v. Vastera, Inc., No. 04 C 0321, 2004 WL 406999, at *3 (N.D. Ill. Feb. 26, 2004). See also Wright & Miller, 9A Federal Practice & Procedure Civ. § 2452 (3d ed. 2013).

[6] Adv. Comm. Note to Rule 26(b)(4)(B) (2010). See, e.g., Ladd Furniture v. Ernst & Young, 1998 U.S. Dist. LEXIS 17345, at *34-37 (M.D.N.C. Aug. 27, 1998).

[7] Id.

[8] Id. at *36-37.

[9] Innis Arden Golf Club v. Pitney Bowes, Inc., 629 F. Supp. 2d 175, 190 (D. Conn. 2009) (excluding expert opinion because his samples and data packages no longer existed and thus “[d]efendants could not attempt to validate [his] methods even if he could specifically say what he considered”). See also Jung v. Neschis, No. 01–Civ. 6993(RMB)(THK), 2007 WL 5256966, at *8–15 (S.D.N.Y. Oct. 23, 2007) (finding that a party’s failure to produce tape recordings that its medical expert witness relied upon for his opinion was “disturbing”; precluding expert witness’s testimony).

[10] See, e.g., Dongguk Univ. v. Yale Univ., No. 3:08-CV-00441, 2011 WL 1935865, at *1 (D. Conn. May 19, 2011) (holding that “an expert’s handwritten notes are not protected from disclosure because they are neither drafts of an expert report nor communications between the party’s attorney and the expert witness”).

[11] D.G. ex rel. G. v. Henry, No. 08-CV-74-GKF-FHM, 2011 WL 1344200, at *1 (N.D. Okla. Apr. 8, 2011) (ordering production of the assistants’ notes because the expert witness had relied upon them in forming his opinion, which brought them within the scope of “facts or data” under the rule).

[12] Mortgage Info. Servs., Inc. v. Kitchens, 210 F.R.D. 562, 564-68 (W.D.N.C. 2002) (holding that nothing in Rule 45 precludes its use on a party); see also Mezu v. Morgan State Univ., 269 F.R.D. 565, 581 (D. Md. 2010) (“courts are divided as to whether Rule 45 subpoenas should be served on parties”); Peyton v. Burdick, 2008 U.S. Dist. LEXIS 106910 (E.D. Cal. 2008) (discussing the split among courts on the issue).

[13] 190 F.R.D. 281 (D. Mass. 2000).

[14] Id. at 283.

[15] Id. See Ambrose v. Southworth Products Corp., No. CIV.A. 95–0048–H, 1997 WL 470359, at *1 (W.D. Va. June 24, 1997) (holding a “naked” subpoena duces tecum directed to a non-party expert retained by a party is not within the ambit of a Rule 45 document production subpoena, and not permitted by Fed. R. Civ. P. 26(b)(4)); see also Hartford Fire Ins. v. Pure Air on the Lake Ltd., 154 F.R.D. 202, 208 (N.D. Ind. 1993) (holding a party cannot use Rule 45 to circumvent Rule 26(b)(4) as a method to obtain an expert witness’s files); Marsh v. Jackson, 141 F.R.D. 431, 432 (W.D. Va. 1992) (noting that a subpoena for production of documents directed to a non-party expert retained by a party is not within the ambit of Fed. R. Civ. P. 45(c)(3)(B)(ii)).

[16] See James Wm. Moore, 9 Moore’s Federal Practice § 45.03[1] (noting that “[s]ubpoenas under Rule 45 may be issued to parties or non-parties”).

[17] See Glendale Fed. Bank, FSB v. United States, 39 Fed. Cl. 422, 424 (Fed. Cl. 1997) (“The expert witness, testifying under oath, is expected to give his own honest, independent opinion… He is not the sponsoring party’s agent at any time merely because he is retained as its expert witness”). See also National Justice Compania Naviera S.A. v. Prudential Assurance Co. Ltd. (“The Ikarian Reefer”), [1993] 2 Lloyd’s Rep. 68 at 81-82 (Q.B.D.), rev’d on other grounds [1995] 1 Lloyd’s Rep. 455 at 496 (C.A.) (embracing the enumeration of duties, including a duty to “provide independent assistance to the Court by way of objective unbiased opinion in relation to matters within his expertise,” and a duty to eschew “the role of an advocate”).

[18] Western Res., Inc. v. Union Pac. RR, No. 00-2043-CM, 2002 WL 1822428, at *3 (D. Kan. July 23, 2002) (ordering expert witness to produce prior testimony under Rule 45); All W. Supply Co. v. Hill’s Pet Prods. Div., Colgate-Palmolive Co., 152 F.R.D. 634, 639 (D. Kan. 1993) (“With regard to nonparties such as plaintiff’s expert witness, a request for documents may be made by subpoena duces tecum pursuant to Rule 45”); Smith v. Transducer Technology, Inc., No. Civ. 1995/28, 2000 WL 1717332, at *2 (D.V.I. Nov. 16, 2000) (holding that a Rule 30(b)(5) deposition notice, served upon the opposing party, is not an appropriate discovery tool to compel an expert witness to produce documents at his deposition) (noting that a “Rule 45 subpoena duces tecum in conjunction with a properly noticed deposition may do so (subject however to any Rule 26 limitations)”); Thomas v. Marina Assocs., 202 F.R.D. 433, 434 (E.D. Pa. 2001) (denying motion to quash subpoenas issued to party’s expert witness); Quaile v. Carol Cable Co., Civ. A. No. 90-7415, 1992 WL 277981, at *2 (E.D. Pa. Oct. 5, 1992) (granting motion to compel discovery concerning expert witness’s opinions pursuant to a Rule 45 subpoena); Lawrence E. Jaffe Pension Plan v. Household Int’l, Inc., No. 02 C 5893, 2008 WL 687220, at *2 (N.D. Ill. Mar. 10, 2008) (“It is clear . . . that a subpoena duces tecum . . . is an appropriate discovery mechanism against . . . a party’s expert witness”) (internal citation omitted); Expeditors Int’l of Wash., Inc. v. Vastera, Inc., No. 04 C 0321, 2004 WL 406999, at *2-3 (N.D. Ill. Feb. 26, 2004) (holding Rule 45, not Rule 34, governs discovery from retained experts) (“Subpoena duces tecum is . . . an appropriate discovery mechanism against nonparties such as a party’s expert witness”); Reit v. Post Prop., Inc., No. 09 Civ. 5455(RMB)(KNF), 2010 WL 4537044, at *9 (S.D.N.Y. Nov. 4, 2010) (“Subpoena duces tecum … is an appropriate discovery mechanism against a nonparty expert”).

[19] See, e.g., Williamson v. Horizon Lines LLC, 248 F.R.D. 79, 83 (D. Me. 2008) (“[C]ontrary to Horizon Lines’ contention, there is a relationship between Rule 26 and Rule 45 and parties should not be allowed to employ a subpoena after a discovery deadline to obtain materials from third parties that could have been produced before discovery.”).

[20] Bartley v. Isuzu Motors Ltd., 151 F.R.D. 659, 660-61 (D. Colo. 1993) (ordering party to create and preserve “the input and output data for each variable in the program, for each iteration, or each simulation,” as well as a record of all simulations performed, even those that do not conform to the plaintiff’s claims and theories in the case).

[21] See City of Cleveland v. Cleveland Elec. Illuminating Co., 538 F. Supp. 1257 (N.D. Ohio 1980) (“Certainly, where, as here, the expert reports are predicated upon complex data, calculations and computer simulations which are neither discernible nor deducible from the written reports themselves, disclosure thereof is essential to the facilitation of effective and efficient examination of these experts at trial.”); Shu-Tao Lin v. McDonnell Douglas Corp., 574 F. Supp. 1407, 1412-13 (S.D.N.Y. 1983) (granting new trial, and holding that expert witness’s failure to disclose the “nature of [the plaintiff’s testifying expert’s] computer program or the underlying data, the inputs and outputs employed in the program” deprived the adversary of an “adequate basis on which to cross-examine plaintiff’s experts”), rev’d on other grounds, 742 F.2d 45 (2d Cir. 1984).

[22] Manual for Complex Litigation § 11.482, at 99 (4th ed. 2004) (“Early and full disclosure of expert evidence can help define and narrow issues. Although experts often seem hopelessly at odds, revealing the assumptions and underlying data on which they have relied in reaching their opinions often makes the bases for their differences clearer and enables substantial simplification of the issues. In addition, disclosure can facilitate rulings well in advance of trial on objections to the qualifications of an expert, the relevance and reliability of opinions to be offered, and the reasonableness of reliance on particular data.”). See also ABA Section of Antitrust Law, Econometrics: Legal, Practical, and Technical Issues at 75-76 (2005) (advising of the necessity to obtain all data, all analyses, and all supporting materials, in advance of deposition, to ensure efficient and effective discovery procedures).

[23] In re Viagra Prods. Liab. Litig., 572 F. Supp. 2d 1071, 1090 (D. Minn. 2008).

[24] In re Viagra Prods. Liab. Litig., 658 F. Supp. 2d 936, 945 (D. Minn. 2009).

[25] See Fed. R. Civ. P. 16(b), 26(f).