TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

The American Statistical Association’s Statement on and of Significance

March 17th, 2016

In scientific circles, some commentators have so zealously criticized the use of p-values that they have left uninformed observers with the impression that random error is not an interesting or important consideration in evaluating the results of a scientific study. In legal circles, counsel for the litigation industry and their expert witnesses have argued duplicitously that statistical significance is at once unimportant, except when statistical significance is observed, in which case causation is conclusive. The recently published Statement of the American Statistical Association (“ASA”) restores some sanity to the scientific and legal discussions of statistical significance and p-values. Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” The American Statistician, available online (Mar. 7, 2016), in-press at DOI:10.1080/00031305.2016.1154108, <http://dx.doi.org/10.1080/>.

Recognizing that sound statistical practice and communication affect research and public policy decisions, the ASA has published a statement of interpretative principles for statistical significance and p-values. The ASA’s statement first and foremost points out that the soundness of scientific conclusions turns on more than statistical methods alone. Study design, conduct, and evaluation often involve more than a statistical test result. And the ASA goes on to note, contrary to the contrarians, that “the p-value can be a useful statistical measure,” although this measure of attained significance probability “is commonly misused and misinterpreted.” ASA at 7. No news there.

The ASA’s statement puts forth six principles, all of which have substantial implications for how statistical evidence is received and interpreted in courtrooms. All are worthy of consideration by legal actors – legislatures, regulators, courts, lawyers, and juries.

1. “P-values can indicate how incompatible the data are with a specified statistical model.”

The ASA notes that a p-value shows the “incompatibility between a particular set of data and a proposed model for the data.” Although there are some in the statistical world who rail against null hypotheses of no association, the ASA reports that “[t]he most common context” for p-values consists of a statistical model that includes a set of assumptions, including a “null hypothesis,” which often postulates the absence of association between exposure and outcome under study. The ASA statement explains:

“The smaller the p-value, the greater the statistical incompatibility of the data with the null hypothesis, if the underlying assumptions used to calculate the p-value hold. This incompatibility can be interpreted as casting doubt on or providing evidence against the null hypothesis or the underlying assumptions.”

Some lawyers want to overemphasize statistical significance when it is present, but to minimize its importance when it is absent. They will find no support in the ASA’s statement.
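The first principle can be illustrated with a minimal Python sketch, using a conventional two-proportion z-test on hypothetical trial counts (the function names and the numbers are mine, for illustration only, not the ASA’s):

```python
import math

def two_sided_p_from_z(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def z_for_proportions(x1, n1, x2, n2):
    """Normal-approximation z-statistic for a difference in two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# The further the observed data sit from what the null hypothesis
# predicts, the larger |z| and the smaller the p-value.
z_small = z_for_proportions(55, 100, 50, 100)   # modest difference
z_large = z_for_proportions(75, 100, 50, 100)   # large difference
print(two_sided_p_from_z(z_small))  # roughly 0.48
print(two_sided_p_from_z(z_large))  # roughly 0.0003
```

The p-value thus measures incompatibility between data and model, nothing more; the second result casts far more doubt on the null hypothesis than the first, assuming the model’s other assumptions hold.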

2. “P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.”

Of course, there are those who would misinterpret the meaning of p-values, but the flaw lies in the interpreters, not in the statistical concept.

3. “Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.”

Note that the ASA did not say that statistical significance is irrelevant to scientific conclusions. Of course, statistical significance is but one factor, which does not begin to account for study validity, data integrity, or model accuracy. The ASA similarly criticizes the use of statistical significance as a “bright line” mode of inference, without consideration of the contextual considerations of “the design of a study, the quality of the measurements, the external evidence for the phenomenon under study, and the validity of assumptions that underlie the data analysis.” Criticizing the use of “statistical significance” as singularly assuring the correctness of scientific judgment does not, however, mean that “statistical significance” is irrelevant or unimportant as a consideration in a much more complex decision process.

4. “Proper inference requires full reporting and transparency”

The ASA explains that the proper inference from a p-value can be completely undermined by “multiple analyses” of study data, with selective reporting of sample statistics that have attractively low p-values, or cherry picking of suggestive study findings. The ASA points out that common practices of selective reporting compromise valid interpretation. Hence the correlative recommendation:

“Researchers should disclose the number of hypotheses explored during the study, all data collection decisions, all statistical analyses conducted and all p-values computed. Valid scientific conclusions based on p-values and related statistics cannot be drawn without at least knowing how many and which analyses were conducted, and how those analyses (including p-values) were selected for reporting.”

ASA Statement. See also “Courts Can and Must Acknowledge Multiple Comparisons in Statistical Analyses” (Oct. 14, 2014).
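The arithmetic behind the multiple-comparisons problem is simple enough to show directly. The sketch below assumes independent tests of true null hypotheses, a simplification, but one that makes the point:

```python
# Probability of at least one "significant" result among k independent
# tests of true null hypotheses, each conducted at alpha = 0.05.
alpha, k = 0.05, 20
fwer = 1 - (1 - alpha) ** k
print(round(fwer, 3))  # 0.642: nearly a two-in-three chance of a false alarm

# A Bonferroni correction holds the family-wise error rate at or below
# alpha by testing each comparison at alpha / k.
bonferroni_threshold = alpha / k
print(round(bonferroni_threshold, 6))  # 0.0025
fwer_corrected = 1 - (1 - bonferroni_threshold) ** k
print(round(fwer_corrected, 3))  # 0.049
```

Twenty undisclosed analyses thus turn a nominal 5% error rate into roughly 64%, which is why the ASA insists on knowing how many and which analyses were run.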

5. “A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.”

The ASA notes the commonplace distinction between statistical and practical significance. The independence between statistical and practical significance does not, however, make statistical significance irrelevant, especially in legal and regulatory contexts, in which parties claim that a risk, however small, is relevant. Of course, we want the claimed magnitude of association to be relevant, but we also need the measured association to be accurate and precise.
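A hypothetical pair of comparisons makes the distinction concrete (normal-approximation z-test; the counts are invented for illustration):

```python
import math

def two_sided_p(x1, n1, x2, n2):
    """Two-sided p-value, normal approximation, for two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return math.erfc(abs(p1 - p2) / se / math.sqrt(2))

# A trivial difference (50.5% vs. 50.0%) becomes "statistically
# significant" once the samples are made large enough...
p_trivial = two_sided_p(505_000, 1_000_000, 500_000, 1_000_000)
print(p_trivial)  # vanishingly small, yet the effect is negligible

# ...while a substantial difference (70% vs. 50%) in a tiny trial
# fails to reach significance.
p_modest = two_sided_p(7, 10, 5, 10)
print(p_modest)  # roughly 0.36
```

A tiny p-value can accompany a practically meaningless effect, and a large one can accompany an important effect imprecisely measured; the p-value alone tells us neither the size nor the importance of the association.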

6. “By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.”

Of course, a p-value cannot validate the model, which is assumed to generate the p-value. Contrary to the hyperbolic claims one sees in litigation, the ASA notes that “a p-value near 0.05 taken by itself offers only weak evidence against the null hypothesis.” And so the ASA counsels that “data analysis should not end with the calculation of a p-value when other approaches are appropriate and feasible.” 

What is important, however, is that the ASA never suggests that significance testing or measurement of significance probability is not an important and relevant part of the process. To be sure, the ASA notes that because of “the prevalent misuses of and misconceptions concerning p-values, some statisticians prefer to supplement or even replace p-values with other approaches.”

The first of these other methods, unsurprisingly, is estimation with assessment of confidence intervals, although the ASA includes Bayesian and other methods as well. There are some who express irrational exuberance about the potential of Bayesian methods to restore confidence in scientific process and conclusions. Bayesian approaches are less manipulated than frequentist ones, largely because very few people use Bayesian methods, and even fewer people really understand them.
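For readers who want the mechanics, the simple Wald interval for a proportion can be sketched as follows (the counts are hypothetical, and the Wald interval is only one of several available interval methods):

```python
import math

def wald_ci(successes, n, z=1.96):
    """Approximate 95% (Wald) confidence interval for a proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return (p - half_width, p + half_width)

# An interval estimate reports both the point estimate and its
# precision, rather than a bare verdict of "significant" or not.
lo, hi = wald_ci(60, 200)
print(f"estimate 0.30, 95% CI ({lo:.3f}, {hi:.3f})")
# prints "estimate 0.30, 95% CI (0.236, 0.364)"
```

The interval conveys the same random-error information as a significance test, but in a form that keeps the magnitude of the estimate, and the precision of its measurement, in view.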

In some ways, Bayesian statistical approaches are like Apple computers. The Mac OS is less vulnerable to viruses than Windows because its lower market share makes it less attractive to virus writers. As Apple’s operating system has gained market share, its vulnerability has increased. (My Linux computer, on the other hand, is truly less vulnerable to viruses because of its system architecture, but also because Linux personal computers have almost no market share.) If Bayesian methods become more prevalent, my prediction is that they will be subject to as much abuse as frequentist methods now are. The ASA wisely recognized that the “reproducibility crisis” and the loss of confidence in scientific research are mostly due to bias, both systematic and cognitive, in how studies are done, interpreted, and evaluated.

District Court Denies Writ of Coram Nobis to Dr. Harkonen

August 27th, 2015

Courts are generally suspicious of convicted defendants who challenge the competency of their trial counsel on any grounds that might reflect strategic trial decisions. A convicted defendant can always speculate about how his trial might have gone better had some witnesses, who did not fare well at trial, not been called. Similarly, a convicted defendant might well speculate that his trial counsel could and should have called other or better witnesses. Still, sometimes, trial counsel really do screw up, especially when it comes to technical, scientific, or statistical issues.

The Harkonen case is a true comedy of errors – statistical, legal, regulatory, and practical. Indeed, some would say it is truly criminal to convict someone for an interpretation of a clinical trial result.[1] As discussed in several previous posts, Dr. W. Scott Harkonen was convicted under the wire fraud statute, 18 U.S.C. § 1343, for having distributed a faxed press release about InterMune’s clinical trial, in which he described the study as having “demonstrated” Actimmune’s survival benefit in patients with mild to moderate idiopathic pulmonary fibrosis (cryptogenic fibrosing alveolitis). The trial had not shown a statistically significant result on its primary outcome, and the significance probability on the secondary outcome of survival benefit was 0.08. Dr. Harkonen reported on a non-prespecified subgroup of patients with mild to moderate disease at randomization, in which the trial showed better survival in the experimental therapy group (p-value of 0.004) compared with the placebo group.

Having exhausted his direct appeal, Dr. Harkonen petitioned for post-conviction relief in the form of a writ of coram nobis, on grounds of ineffective assistance of counsel. Last week, federal District Judge Richard Seeborg, in San Francisco, denied Dr. Harkonen’s petition. United States v. Harkonen, Case No. 08-cr-00164-RS-1, Slip op. (N.D. Cal. Aug. 21, 2015). See Dani Kass, “Ex-InterMune CEO’s Complaints Against Trial Counsel Nixed,” Law360 (Aug. 24, 2015). Judge Seeborg held that Dr. Harkonen had failed to explain why he had not raised the claim of ineffective assistance earlier, and that trial counsel’s tactical and strategic decisions, with respect to not calling statistical expert witnesses, were “not so beyond the pale of reasonable conduct as to warrant the finding of ineffective assistance.” Slip op. at 1.

To meet its burden at trial, the government presented Dr. Thomas Fleming, a statistician and “trialist,” who had served on the data safety and monitoring board of the clinical trial at issue.[2] Fleming took the rather extreme view that a clinical trial that “fails” to meet its primary pre-stated end point at the conventional p-value of less than 5 percent is an abject failure and provides no demonstration of any claim of efficacy. (Other experts might well say that the only failed clinical trial is one that was not done.) Judge Seeborg correctly discerned that Fleming’s testimony was in the form of an opinion, and that the law of wire fraud prohibits prosecution of scientific opinions about which reasonable scientists may differ. The government’s burden was thus to show, beyond a reasonable doubt, that no reasonable scientist could have reported the Actimmune clinical trial as having “demonstrated” a survival benefit in the mild to moderate disease subgroup. Slip op. at 2.

Remarkably, at trial, the government presented no expert witnesses, and Fleming testified as a fact witness. While acknowledging that the contested issue, whether anyone could fairly say that the Actimmune clinical trial had demonstrated efficacy in a non-prespecified subgroup, called for an opinion, Judge Seeborg gave the government a pass for not presenting expert witnesses to make out its case. Indeed, Judge Seeborg noted that the government had “stressed testimony from its experts touting the view that study results without sufficiently low p-values are inherently unreliable and meaningless.” Slip op. at 3 (emphasis added). Judge Seeborg’s description of Fleming as an expert witness is remarkable because the government never sought to qualify Dr. Fleming as an expert witness, and the trial judge never gave the jury an instruction on how to evaluate the testimony of an expert witness, including an explanation that the jury was free to accept some, all, or none of Fleming’s opinion testimony. After the jury returned its guilty verdict, Harkonen’s counsel filed a motion for judgment of acquittal, based in part upon the government’s failure to qualify Fleming as an expert witness in the field of biostatistics. The trial judge refused this motion on grounds that

(1) at one point Fleming had been listed as an expert witness;

(2) Fleming’s curriculum vitae had been marked and admitted into evidence; and

(3) “[m]ost damningly,” according to the trial judge, Harkonen’s lawyers had failed to object to Fleming’s holding forth on opinions about statistical theory and practice.

Slip op. at 7. Damning indeed as evidence of a potentially serious deviation from a reasonable standard of care and competence for trial practice! On the petition for coram nobis, Judge Seeborg curiously refers to Dr. Harkonen as not objecting, when the very issue before the court, on the petition for coram nobis, is the competency of his counsel’s failing to object. Allowing a well-credentialed statistician, such as Fleming, to testify, without requesting a limiting instruction on expert witness opinion testimony certainly seems “beyond the pale.” If there were some potential tactic involved in this default, Judge Seeborg does not identify it, and none comes to mind. And even if this charade, of calling Fleming as a fact witness, were some sort of tactical cat-and-mouse litigation game between government and defendant, certainly the trial judge should have taken control of the matter by disallowing a witness, not tendered as an expert witness, from offering opinion testimony on arcane statistical issues.

Not having objected to Fleming’s opinions, Dr. Harkonen’s counsel decided not to call its own defense expert witnesses. The post-conviction court makes much of the lesser credentials of the defense witnesses, and of the decision not to call expert witnesses based upon defense counsel’s apparent belief that it had undermined Fleming’s opinion on cross-examination. There is little in the cross-examination of Fleming to support the coram nobis court’s assessment. Fleming’s opinions were vulnerable in ways that trial counsel failed to exploit, and in ways that even a less credentialed expert witness could have made clear to a lay jury or the court. Even a journeyman statistician would have realized that Fleming had overstated the statistical orthodoxy that p-values are “magical numbers,” by noting that many statisticians and epidemiologists disagree with invoking statistical hypothesis testing as a rigid decision procedure, based upon p-values less than 0.05. Indeed, the idea of statistical testing as driven by a rigid, pre-selected level of acceptable Type 1 error rate was rejected by the very statistician who developed and advanced computations of the p-value. See Sir Ronald Fisher, Statistical Methods and Scientific Inference 42 (Hafner 1956) (ridiculing rigid hypothesis testing as “absurdly academic, for in fact no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas.”).

After the jury convicted on the wire fraud count, Dr. Harkonen changed counsel from Kasowitz Benson Torres & Friedman LLP, to Mark Haddad at Sidley Austin LLP. Mr. Haddad was able, in relatively short order, to line up two outstanding statisticians, Professor Steven Goodman, of Stanford University’s Medical School, and Professor Donald Rubin, of Harvard University. Both Professors Goodman and Rubin robustly rejected Fleming’s orthodox positions in post-trial declarations, which were too late to affect the litigation of the merits, although their contributions may well have made it difficult for the trial judge to side with the government on its request for a Draconian ten-year prison sentence. From my own perspective, I can say it was not difficult to recruit two leading, capable epidemiologists, Professors Kenneth Rothman and Timothy Lash to join in an amicus brief that criticized Fleming’s testimony in a way that would have been devastating had it been done at trial.

The entire Harkonen affair is marked by extraordinary governmental hypocrisy. As Judge Seeborg reports:

“[t]hroughout its case in chief, the government stressed testimony from Fleming and Crager who offered that, in the world of biostatistical analysis, a 0.05 p-value threshold is ‘somewhat of a magic number’; that the only meaningful p-value from a study is the one for its primary endpoint; and that data from post-hoc subgroup analyses cannot be reported upon accurately without information about the rest of the sampling context.”[3]

Slip op. at 4. And yet, in another case, when it was politically convenient to take the opposite position, the government proclaimed, through its Solicitor General, on behalf of the FDA, that statistical significance at any level is not necessary at all for demonstrating causation:

“[w]hile statistical significance provides some indication about the validity of a correlation between a product and a harm, a determination that certain data are not statistically significant … does not refute an inference of causation.”

Brief for the United States as Amicus Curiae Supporting Respondents, in Matrixx Initiatives, Inc. v. Siracusano, 2010 WL 4624148, at *14 (Nov. 12, 2010). The methods of epidemiology and data analysis are not, however, so amenable to political expedience. The government managed both to overstate the interpretation of p-values in Harkonen, and to understate them in Matrixx Initiatives.

Like many of the judges who previously have ruled on one or another issue in the Harkonen case, Judge Seeborg struggled with statistical concepts and gave a rather bizarre, erroneous definition of what exactly was at issue with the p-values in the Actimmune trial:

“In clinical trials, a p-value is a number between one and zero which represents the probability that the results establish a cause-and-effect relationship, rather than a random effect, between the drug and a positive health benefit. Because a p-value indicates the degree to which the tested drug does not explain observed benefits, the smaller the p-value, the larger a study’s significance.”

Slip op. at 2-3. Ultimately, this error was greatly overshadowed by a simpler error of overlooking, and condoning, trial counsel’s default in challenging the government’s failure to present credible expert witness opinion testimony on the crucial issue in the case.
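The correct interpretation is easy to demonstrate with a short Monte Carlo sketch (an idealized setup of my own devising: two groups drawn from the same known distribution, so that the null hypothesis is true by construction and any observed “benefit” is random error):

```python
import math
import random

def two_sided_p_from_z(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

random.seed(12345)
trials, hits = 2000, 0
for _ in range(trials):
    # Both groups come from the SAME distribution: no true effect exists.
    a = [random.gauss(0, 1) for _ in range(50)]
    b = [random.gauss(0, 1) for _ in range(50)]
    mean_a, mean_b = sum(a) / 50, sum(b) / 50
    se = math.sqrt(1 / 50 + 1 / 50)  # known unit variances, for simplicity
    if two_sided_p_from_z((mean_a - mean_b) / se) < 0.05:
        hits += 1

rate = hits / trials
print(rate)  # typically close to 0.05
```

About 5% of such null studies cross the p < 0.05 threshold: the p-value caps the rate of false alarms when no effect exists; it is not, as the slip opinion had it, the probability that the drug caused the observed benefit.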

At the heart of the government’s complaint is that Dr. Harkonen’s press release did not explicitly disclose that the subgroup of mild and moderate disease patients was not pre-specified for analysis in the trial protocol and statistical analysis plan. Dr. Harkonen’s failure to disclose the ad hoc nature of the subgroup, while not laudable, hardly rose to the level of criminal fraud, especially when considered in light of the available prior clinical trials of the same medication, and the prevalent practice of omitting such disclosures in press releases, and even in full, peer-reviewed publications of clinical trials and epidemiologic studies.

For better or worse, the practice of presenting unplanned subgroup analyses is quite common in the scientific community. Several years ago, the New England Journal of Medicine published a survey of publication practice in its own pages, and documented the widespread failure to limit “demonstrated” findings to pre-specified analyses.[4] In general, the survey authors were unable to determine the total number of subgroup analyses performed; and in the majority (68%) of trials discussed, the authors could not determine whether the subgroup analyses were pre-specified.[5] Although the authors of this article proposed guidelines for identifying subgroup analyses as pre-specified or post-hoc, they emphasized that the proposals were not “rules” that could be rigidly prescribed.[6]

Of course, what was at issue in Dr. Harkonen’s case was not a peer-reviewed article in a prestigious journal, but a much more informal, less rigorous communication that is typical of press releases. Lack of rigor in this context is not limited to academic and industry press releases. Consider the press release recently issued by the National Institutes of Health (NIH) in connection with an NIH-funded clinical trial on age-related macular degeneration (AMD). NIH Press Release, “NIH Study Provides Clarity on Supplements for Protection against Blinding Eye Disease,” NIH News & Events Website (May 5, 2013) [last visited August 27, 2015]. The clinical trial studied a modified dietary supplement in common use to prevent or delay AMD. The NIH’s press release claimed that the study “provides clarity on supplements,” and announced a “finding” of “some benefits” when looking at just two of the subgroups. The press release does not use the words “post hoc” or “ad hoc” in connection with the subgroup analysis used to support the “finding” of benefit.

The clinical trial results were published the same day in a journal article that labeled the subgroup findings as post hoc subgroup findings.[7] The published paper also reported that the pre-specified endpoints of the clinical trial did not show statistically significant differences between therapies and placebo.

None of the p-values for any of the post-hoc subgroup analysis was adjusted for multiple comparisons. NIH webpages with Questions and Answers for the public and the media both fail to report the post-hoc nature of the subgroup findings.[8] By the standards imposed upon Dr. Harkonen in this case through Dr. Fleming’s testimony, and contrary to the NIH’s public representations, the NIH trial had “failed,” and no inferences could be drawn with respect to any endpoint because the primary endpoint did not yield a statistically significant result.

There are, to be sure, hopeful signs that the prevalent practice is changing. A recent article documented an increasing number of “null” effect clinical trials that have been reported, perhaps as the result of better reporting of trials without dramatic successes, increasing willingness to publish such trial results, and greater availability of trial protocols in advance of, or with, peer-review publication of trial results.[9] Transparency in clinical and other areas of research is welcome and should be the norm, descriptively and prescriptively, but we should be wary of criminalizing lapses with indictments of wire fraud for conduct that can be found in most scientific journals and press releases.


[1] See, e.g., “Who Jumped the Shark in United States v. Harkonen”; “Multiplicity versus Duplicity – The Harkonen Conviction”; “The (Clinical) Trial by Franz Kafka”; “Further Musings on U.S. v. Harkonen”; and “Subgroups — Subpar Statistical Practice versus Fraud.” In the Supreme Court, two epidemiologists and a law school lecturer filed an Amicus Brief that criticized the government’s statistical orthodoxy. Brief by Scientists And Academics as Amici Curiae, in Harkonen v. United States, 2013 WL 5915131, 2013 WL 6174902 (Supreme Court Sept. 9, 2013).

[2] The government also presented the testimony of Michael Crager, an InterMune biostatistician. Reading between the lines, we may infer that Dr. Crager was induced to testify in exchange for not being prosecuted, and that his credibility was compromised.

[3] This testimony was particularly egregious because mortality or survival is often the most important outcome measure, but frequently not made the primary trial end point because of concern over whether there would be a sufficient number of deaths over the course of the trial to assess efficacy on this outcome. In the context of the Actimmune trial, this concern was on full display, but as it turned out, when the data were collected, there was a survival benefit (p = 0.08, which shrank to 0.055 when the analysis was limited to patients who met entrance criteria, and shrank further to 0.004, when the analysis was limited plausibly to patients with only mild or moderate disease at randomization).

[4] Rui Wang, et al., “Statistics in Medicine – Reporting of Subgroup Analyses in Clinical Trials,” 357 New Eng. J. Med. 2189 (2007).

[5] Id. at 2192.

[6] Id. at 2194.

[7] Emily Chew, et al., Lutein + Zeaxanthin and Omega-3 Fatty Acids for Age-Related Macular Degeneration, 309 J. Am. Med. Ass’n 2005 (2013).

[8] See “For the Public: What the Age-Related Eye Disease Studies Mean for You” (May 2013) [last visited August 27, 2015]; “For the Media: Questions and Answers about AREDS2” (May 2013) [last visited August 27, 2015].

[9] See Robert M. Kaplan & Veronica L. Irvin, “Likelihood of Null Effects of Large NHLBI Clinical Trials Has Increased over Time,” 10 PLoS ONE e0132382 (2015); see also Editorial, “Trials register sees null results rise,” 524 Nature 269 (Aug. 20, 2015); Paul Basken, “When Researchers State Goals for Clinical Trials in Advance, Success Rates Plunge,” The Chronicle of Higher Education (Aug. 5, 2015).

Canadian Judges’ Reference Manual on Scientific Evidence

July 24th, 2015

I had some notion that there was a Canadian version of the Reference Manual on Scientific Evidence in the works, but Professor Greenland’s comments in a discussion over at Deborah Mayo’s blog drew my attention to the publication of the Science Manual for Canadian Judges [Manual]. See “‘Statistical Significance’ According to the U.S. Dept. of Health and Human Services (ii),” Error Statistics Philosophy (July 17, 2015).

The Manual is the product of the Canadian National Judicial Institute (NJI), which is an independent, not-for-profit group that is committed to educating Canadian judges. The NJI’s website describes the Manual:

“Without the proper tools, the justice system can be vulnerable to unreliable expert scientific evidence.

* * *

The goal of the Science Manual is to provide judges with tools to better understand expert evidence and to assess the validity of purportedly scientific evidence presented to them. …”

The Chief Justice of Canada, Hon. Beverley M. McLachlin, contributed an introduction to the Manual, which was notable for its frank admission that:

“[w]ithout the proper tools, the justice system is vulnerable to unreliable expert scientific evidence.

* * *

Within the increasingly science-rich culture of the courtroom, the judiciary needs to discern ‘good’ science from ‘bad’ science, in order to assess expert evidence effectively and establish a proper threshold for admissibility. Judicial education in science, the scientific method, and technology is essential to ensure that judges are capable of dealing with scientific evidence, and to counterbalance the discomfort of jurists confronted with this specific subject matter.”

Manual at 14. These are laudable goals, indeed.

The first chapter of the Manual is an overview of Canadian law of scientific evidence, “The Legal Framework for Scientific Evidence,” by Canadian law professors Hamish Stewart (University of Toronto), and Catherine Piché (University of Montreal). Several judges served as peer reviewers.

The second chapter, “Science and the Scientific Method,” contains the heart of what judges supposedly should know about scientific and statistical matters to serve as effective “gatekeepers.” Like the chapters in the Reference Manual on Scientific Evidence, this chapter was prepared by a scientist author (Scott Findlay, Ph.D., Associate Professor of Biology, University of Ottawa) and a lawyer author (Nathalie Chalifour, Associate Professor of Law, University of Ottawa). Several judges, and Professor Brian Baigrie (University of Toronto, Victoria College, and the Institute for the History and Philosophy of Science and Technology) provided peer review. The chapter attempts to cover the demarcation between science and non-science, and between scientific and other expert witness opinion. The authors describe “the” scientific method, hypotheses, experiments, predictions, inference, probability, statistics and statistical hypothesis testing, data reliability, and related topics. A subsection of chapter two is entitled “Normative Issues in Science – The Myth of Scientific Objectivity,” which suggests a Feyerabend, post-modernist influence at odds with the Chief Justice’s aspirational statement of goals in her introduction to the Manual.

Greenland noted some rather cavalier statements in Chapter Two, which suggest that the conventional alpha of 5% corresponds to a “scientific attitude that unless we are 95% sure the null hypothesis is false, we provisionally accept it.” And he pointed to places where the chapter seems to suggest that the coefficient of confidence that corresponds to an alpha of 5% “constitutes a rather high standard of proof,” thus confusing and conflating the probability of random error with posterior probabilities. Some have argued that these errors are simply an effort to make statistical concepts easier for lay people to grasp, but the statistics chapter in the FJC’s Reference Manual shows that an accurate exposition of statistical concepts can still be made understandable. The Canadian Manual seems in need of some trimming with Einstein’s razor, usually paraphrased as “Everything should be made as simple as possible, but no simpler.”[1] The razor should certainly be applied to statistical concepts, with the understanding that pushing simplification too aggressively can result in simplistic, and simply wrong, exposition.
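The conflation of the coefficient of confidence with a standard of proof can be demonstrated with a short simulation (an idealized setup of my own, with a known standard deviation, chosen for simplicity):

```python
import math
import random

random.seed(2015)
true_mean, sigma, n, trials = 10.0, 2.0, 30, 1000
covered = 0
for _ in range(trials):
    # Repeatedly sample and compute a 95% confidence interval for the mean.
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    m = sum(sample) / n
    half_width = 1.96 * sigma / math.sqrt(n)  # known sigma, for simplicity
    if m - half_width <= true_mean <= m + half_width:
        covered += 1

coverage = covered / trials
print(coverage)  # typically close to 0.95
```

About 95% of the intervals, over repeated sampling, cover the true value. The 95% describes the long-run performance of the interval-generating procedure, not the probability that any one interval (or hypothesis) is correct; treating it as a “standard of proof” mistakes a random-error rate for a posterior probability.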

Chapter 3 returns to more lawyerly matters, “Managing and Evaluating Expert Evidence in the Courtroom,” prepared and peer-reviewed by prominent Canadian lawyers and judges. The final chapter, “Ethics of the Expert Witness,” should be of interest to lawyers and judges in the United States, where the topic is largely ignored. The chapter was prepared by Professor Adam Dodek (University of Ottawa), along with several writers from the National Judicial Institute, the Canadian Judicial Council, American College of Trial Lawyers, Environment Canada, and notably, Joe Cecil & the Federal Judicial Center.

Weighing in at 228 pages, the Science Manual for Canadian Judges is much shorter than the Federal Judicial Center’s Reference Manual on Scientific Evidence. Unlike the FJC’s Reference Manual, which is now in its third edition, the Canadian Manual has no separate chapters on regression, DNA testing and forensic evidence, clinical medicine and epidemiology. The coverage of statistical inference is concentrated in chapter two, but that chapter has no discussion of meta-analysis, systematic review, evidence-based medicine, confounding, and the like. Perhaps there will soon be a second edition of the Science Manual for Canadian Judges.


[1] See Albert Einstein, “On the Method of Theoretical Physics; The Herbert Spencer Lecture,” delivered at Oxford (10 June 1933), published in 1 Philosophy of Science 163 (1934) (“It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience.”).

Discovery of Retained, Testifying Statistician Expert Witnesses (Part 2)

July 1st, 2015

Discovery Beyond the Report and the Deposition

The lesson of the cases interpreting Rule 26 is that counsel cannot count exclusively upon the report and automatic disclosure requirements to obtain the materials necessary or helpful for cross-examination of statisticians who have created their own analyses. Sometimes just asking nicely suffices[1]. Other avenues of discovery are available, however, for reluctant disclosers. In particular, Rule 26(b) authorizes discovery substantially broader than what is required for inclusion in an expert witness’s report.

Occasionally, counsel cite caselaw that has been superseded by the steady expansion of Rule 26[2]. The 1993 amendments made clear, however, that Rule 26 sets out mandatory minimum requirements that do not define or exhaust the available discovery tools to obtain information from expert witnesses[3]. Some courts continue to insist that a party make a showing of necessity to go beyond the minimal requirements of Rule 26[4], although the better reasoned cases take a more expansive view of the proper scope of expert witness discovery[5].

Although the federal rules may not require the expert witness report to include, or to attach, all “working notes or recordings,” or calculations, alternative analyses, and data output files, these materials may be the subject of proper document requests to the adverse party or perhaps subpoenas to the expert witness.  The Advisory Committee Notes explain that the various techniques of discovery kick in by virtue of Rule 26(b), where automatic disclosure and report requirements of Rule 26(a) leave off:

“Rules 26(b)(4)(B) and (C) do not impede discovery about the opinions to be offered by the expert or the development, foundation, or basis of those opinions. For example, the expert’s testing of material involved in litigation, and notes of any such testing, would not be exempted from discovery by this rule. Similarly, inquiry about communications the expert had with anyone other than the party’s counsel about the opinions expressed is unaffected by the rule. Counsel are also free to question expert witnesses about alternative analyses, testing methods, or approaches to the issues on which they are testifying, whether or not the expert considered them in forming the opinions expressed. These discovery changes therefore do not affect the gatekeeping functions called for by Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993), and related cases.”[6]

The court in Ladd Furniture v. Ernst & Young explained the structure of Rule 26 with respect to underlying documents, calculations, and data[7].  In particular, the requirements of the Rule 26(a) report do not create a limitation on Rule 26(b) discovery:

“As a basis for withholding the above information, Ladd argues that Ernst & Young is not entitled to discover any expert witness information which is not specifically mentioned in Rule 26(a)(2)(B). However, as explained below, Ladd’s position on this point is not supported by the text of Rule 26 or by the Advisory Committee’s commentary to Rule 26(a). In the text, Rule 26(a)(2)(B) provides for the mandatory disclosure of certain expert witness information, even without a request from the opposing party. However, there is no indication on the face of the rule to suggest that a party is absolutely prohibited from seeking any additional information about an opponent’s expert witnesses. In fact, Rule 26(b)(1) describes the scope of allowable discovery as follows: ‛Parties may obtain discovery regarding any matter, not privileged, which is relevant to the subject matter involved in the pending action… .’ Fed. R. Civ. P. 26(b)(1).”[8]

Expert witness discovery for materials that go beyond what is required in an adequate Rule 26(a) report can have serious consequences for the expert witness who fails to produce the requested materials. Opinion exclusion is an appropriate remedy against an expert witness who failed to keep data samples and statistical packages because the adversary party “could not attempt to validate [the expert witness’s] methods even if [the witness] could specifically say what he considered.”[9]

No doubt expert witnesses and parties will attempt to resist the call for working notes and underlying materials on the theory that the requested documents and materials are “draft reports,” which are now protected by the revisions to Rule 26.  For the most part, these evasions have been rejected[10].  In one case, for instance, in which an expert witness’s assistants compiled and summarized information from individual case files, the court rejected the characterization of the information as part of a “draft report,” and ordered their production.[11]

Choice of Discovery Method Beyond Rule 26 Automatic Disclosure

In addition to the mandatory expert report and disclosure of data and facts, and the optional deposition by oral examination, parties have other avenues to pursue discovery of information, facts, and data, from expert witnesses. Under Rule 33(a)(2), parties may propound contention interrogatories that address expert witnesses’ opinions and conclusions. As for methods of discovery beyond what is discussed specifically in Rule 26, courts are confronted with a threshold question whether Rule 34 requests to produce, Rule 30(b)(2) depositions by oral examination, or Rule 45 subpoenas are the appropriate discovery method for obtaining documents from a retained, testifying expert witness. In the view of some courts, the resolution to this threshold question turns on whether expert witnesses are within the control of parties such that parties must respond to discovery for information, documents, and things within the custody, possession, and control of their expert witnesses.

Subpoenas Are Improper

Some federal district courts view Rule 45 subpoenas as inappropriate discovery tools for parties[12] and persons under the control of parties. In Alper v. United States[13], the district court refused to enforce plaintiff’s Rule 45 subpoena that sought documents from defendant’s expert witness. Although acknowledging that Rule 45’s language was unclear, the Alper court insisted that since a party proffers an expert witness, that witness should be considered under the party’s control[14]. And because the expert witness was “within defendant’s control,” the court noted that Rule 34 rather than Rule 45 governed the requested discovery[15]. Alper seems to be a minority view, but its approach is attractive in streamlining discovery, eliminating subpoena service issues for expert witnesses who may live outside the district, and forcing the sponsoring party to respond and to obtain compliance with its retained expert witness.

Subpoenas Are Proper

The “control” rationale of the Alper case is questionable. Rule 45 contains no statement of limitation to non-parties[16]. Parties “proffer” fact witnesses, but their proffers do not restrict the availability of Rule 45 subpoenas. More important, expert witnesses are not truly under the control of the retaining parties. Expert witnesses have independent duties to the court, and under their own professional standards, to give their own independent opinions[17].

Many courts allow discovery of expert witness documents and information by Rule 45 subpoena, on either the theory that Rule 45 subpoenas are available against both parties and non-parties, or the theory that expert witnesses are sufficiently independent of the sponsoring party that they are non-parties clearly subject to Rule 45. If expert witnesses are not parties, and Rule 26’s confidentiality provisions do not constrain the available discovery tools for expert witnesses, then expert witness subpoenas would appear to be a proper discovery tool to discover documents in the witnesses’ possession, custody, and control[18]. When used as a discovery tool in this way, subpoenas are subject to discovery deadlines[19].

Particular Concerns for Discovery of Statistician Expert Witnesses

Statistician expert witnesses require additional care and discovery investigation in complex products liability cases[20]. The caselaw sometimes takes a crabbed approach that refuses to provide parties access to their adversaries’ statistical analyses, calculations, data input and output files, and graphical files.

Statistician expert testimony will usually involve complex statistical evidence, models, assumptions, and calculations. These materials will in turn create a difficulty in discerning the statistician’s choices from available statistical tests, and whether the statistician exploited the opportunity for multiple tests to be conducted serially with varying assumptions until a propitious result was obtained. Given these typical circumstances, statistical expert witness testimony will almost always require full disclosure to allow the adversary a fair opportunity to cross-examine at trial, or to challenge the validity of the proffered analyses under Rules 702 and 703[21].
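The danger of serial, unplanned testing described above can be made concrete with a small simulation. The sketch below is purely illustrative (the sample sizes, the 20 alternative analyses, and the normal-approximation p-value are assumptions, not anything drawn from a reported case): when the data contain no real effect, a single pre-planned test produces a “significant” result about 5% of the time, but an analyst free to run twenty variant analyses and report only the best one will find “significance” far more often.

```python
# Illustrative sketch (hypothetical parameters): how multiple testing
# inflates the chance of a spuriously "significant" result when the
# underlying data are pure noise (no true effect).
import math
import random

random.seed(2016)

def p_value(sample):
    """Two-sided p-value for H0: mean == 0, normal approximation."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    z = abs(mean) / math.sqrt(var / n)
    return math.erfc(z / math.sqrt(2))

def one_analysis():
    """One analysis of pure-noise data: 30 draws from N(0, 1)."""
    return p_value([random.gauss(0, 1) for _ in range(30)])

trials = 2000
# One pre-planned analysis per "report": false-positive rate near alpha.
single = sum(one_analysis() < 0.05 for _ in range(trials)) / trials
# Twenty alternative analyses per "report", keeping only the best p-value:
# the chance of at least one "significant" result climbs toward 1 - 0.95**20.
dredged = sum(
    min(one_analysis() for _ in range(20)) < 0.05 for _ in range(trials)
) / trials

print(f"one pre-planned test:        {single:.2f}")   # near 0.05
print(f"best of 20 alternative runs: {dredged:.2f}")  # much larger
```

With twenty independent looks at noise, the expected rate of at least one nominally significant finding is 1 − 0.95²⁰, roughly 64%, which is why discovery of the discarded analyses, and not merely the one reported, matters for cross-examination.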

Statisticians create and use a variety of materials that are clearly relevant to their opinions:

  • programs and programming code run to generate all specified analyses on specified data,
  • statistical packages,
  • all data available,
  • all data “cleaning” or data selection processes,
  • selection of variables from those available,
  • data frames that show what data were included (and excluded) in the analyses,
  • data input files,
  • all specified tests run on all data,
  • all data and analysis output files that show all analyses generated,
  • all statistical test diagnostics and tests of underlying assumptions, and
  • graphical output files.

The statistician may have made any number of decisions or judgments in selecting which statistical test results to incorporate into his or her final report. The report will in all likelihood not include important materials that would allow another statistician to fully understand, test, replicate, and criticize the more conclusory analyses and statements in the report. In addition, lurking in the witness’s files, or in the electronic “trash bin,” may be alternative analyses that were run, discarded, and not included in the final report. Why and how those alternative analyses were run but discarded may raise important credibility or validity questions, as well as provide insight into the statistician’s analytical process, all important considerations in preparing for cross-examination and rebuttal. The lesson of Rule 26, and the caselaw interpreting its provisions, is that lawyers must make specific requests for the materials described above. Only with these materials firmly in hand can a deposition fully explore the results obtained, the methods used, the assumptions made, the assumptions violated, the alternative methods rejected, the data used, the data available, the data not used, the data-dredging and manipulation potential, analytical problems, and the potential failure to reconcile inconsistent results. Waiting for trial, or even for the deposition, may well be too late[22].

The warrant for examining the integrity of data relied upon by expert witnesses appears to be securely embedded in the Federal Rules of Civil Procedure, and in the Federal Rules of Evidence. Evidence Rule 703 has particular relevance to statistical or epidemiologic testimony. Lawyers facing studies of dubious quality may need to press for discovery of underlying data and materials. In the Viagra vision loss multi-district litigation (MDL), the defendant sought and obtained discovery of underlying data from plaintiffs’ expert witness’s epidemiologic study of vision loss among patients using Viagra and similar medications[23]. Although the Viagra MDL court had struggled with inferential statistics in its first approach to defendant’s Rule 702 motion, the court understood the challenge based upon lack of data integrity, and reconsidered and granted defendant’s motion to exclude the challenged expert witness[24].

The lawyering implications for discovery of statistician expert witnesses are important. Statistical evidence requires counsel’s special scrutiny to ensure compliance with the disclosure requirements of Federal Rule of Civil Procedure 26. Given the restrictive reading of Rule 26 by some courts, counsel will need to anticipate the use of other discovery tools. Lawyers should request by Rule 34 or Rule 45, all computer runs, programming routines, and outputs, and they should zealously pursue witnesses’ failure to maintain and produce data. Given the uncertainty in some districts whether expert witnesses are subject to subpoenas, counsel may consider propounding both Rule 34 requests and serving Rule 45 subpoenas.

Lawyers in data-intensive cases should give early consideration to appropriate discovery plans that contemplate data production in advance of depositions, to allow full exploration of analyses at deposition[25]. Lawyers should also be alert to the potential need to show particularized need for the requested data and analyses. In instructing expert witnesses on their preparation of their reports, lawyers should consider directing their expert witnesses to express whether they need further access to the adversary’s expert witnesses’ underlying data and materials to fully evaluate the proffered opinions. Discovery of statisticians and their data and their analyses requires careful planning, as well as patient efforts to educate the court about the need for full exploration of all data and all analyses conducted, whether or not incorporated into the Rule 26 report.


[1] Randall v. Rolls-Royce Corp., 2010 U.S. Dist. LEXIS 23421, *4-5 (S.D. Ind. March 12, 2010) (“Dr. Harnett who began his evaluation of the analysis contained in the report … soon concluded that he needed the underlying studies and statistical programs created or used by Dr. Drogin. In response to the Defendants’ request for such materials, Plaintiffs produced four discs containing more than 1,000 separate electronic files”).

[2] Marsh v. Jackson, 141 F.R.D. 431, 432–33 (W.D. Va. 1992) (holding that Rule 45 could not be used to obtain an opposing expert’s files because Rule 26(b)(4) limits expert discovery to depositions and interrogatories as a policy matter).

[3] See Advisory Comm. Notes for 1993 Amendments, to Fed. R. Civ. P. 26(a) (“The enumeration in Rule 26(a) of items to be disclosed does not prevent a court from requiring by order or local rule that the parties disclose additional information without a discovery request. Nor are parties precluded from using traditional discovery methods to obtain further information regarding these matters, … .”); United States v. Bazaarvoice, Inc., C 13-00133 WHO (LB), 2013 WL 3784240 (N.D. Cal. July 18, 2013) (“Rule 26(a)(2)(B) . . . does not preclude parties from obtaining further information through ordinary discovery tools”) (internal citations omitted).

[4] Morriss v. BNSF Ry. Co., No. 8:13CV24, 2014 WL 128393, at *4–6, 2014 U.S. Dist. LEXIS 3757, at *17 (D.Neb. Jan. 13, 2014) (holding that “absent some threshold showing of “compelling reason,” the broad discovery provisions of Rules 34 and 45 cannot be used to undermine the specific expert witness discovery rules in Rule 26(a)(2)”).

[5] Modjeska v. United Parcel Service Inc., No. 12–C–1020, 2014 WL 2807531 (E.D. Wis. June 19, 2014) (holding that Rule 26(a)(2)(B) governs only disclosure in expert witness reports and does not limit or preclude further discovery using ordinary discovery such as requests to produce); Expeditors Int’l of Wash., Inc. v. Vastera, Inc., No. 04 C 0321, 2004 WL 406999, at *3 (N.D. Ill. Feb.26, 2004). See also Wright & Miller, 9A Federal Practice & Procedure Civ. § 2452 (3d ed. 2013).

[6] Adv. Comm. Note for Rule 26(b)(4)(B)(2010).  See, e.g., Ladd Furniture v. Ernst & Young, 1998 U.S. Dist. LEXIS 17345, at *34-37 (M.D.N.C. Aug. 27, 1998).

[7] Id.

[8] Id. at *36-37.

[9] Innis Arden Golf Club v. Pitney Bowes, Inc., 629 F. Supp. 2d 175, 190 (D. Conn. 2009) (excluding expert opinion because his samples and data packages no longer existed and thus “[d]efendants could not attempt to validate [his] methods even if he could specifically say what he considered”). See also Jung v. Neschis, No. 01–Civ. 6993(RMB)(THK), 2007 WL 5256966, at *8–15 (S.D.N.Y. Oct. 23, 2007) (finding that a party’s failure to produce tape recordings that its medical expert witness relied upon for his opinion was ‘‘disturbing’’; precluding expert witness’s testimony).

[10] See, e.g., Dongguk Univ. v. Yale Univ., No. 3:08-CV-00441, 2011 WL 1935865, at *1 (D. Conn. May 19, 2011) (holding that “an expert’s handwritten notes are not protected from disclosure because they are neither drafts of an expert report nor communications between the party’s attorney and the expert witness”).

[11] D.G. ex rel. G. v. Henry, No. 08-CV-74-GKF-FHM, 2011 WL 1344200, at *1 (N.D. Okla. Apr. 8, 2011) (ordering production of the assistants’ notes because the expert witness had relied upon them in forming his opinion, which brought them within the scope of “facts or data” under the rule).

[12] Mortgage Info. Servs, Inc. v. Kitchens, 210 F.R.D. 562, 564-68 (W.D.N.C. 2002) (holding that nothing in Rule 45 precludes its use on a party); See also Mezu v. Morgan State Univ., 269 F.R.D. 565, 581 (D. Md. 2010) (“courts are divided as to whether Rule 45 subpoenas should be served on parties”); Peyton v. Burdick, 2008 U.S. Dist. LEXIS 106910 (E.D. Cal. 2008) (discussing the split among courts on the issue).

[13] 190 F.R.D. 281 (D. Mass. 2000).

[14] Id. at 283.

[15] Id. See Ambrose v. Southworth Products Corp., No. CIV.A. 95–0048–H, 1997 WL 470359, 1 (W.D. Va. June 24, 1997) (holding a “naked” subpoena duces tecum directed to a non-party expert retained by a party is not within the ambit of a Rule 45 document production subpoena, and not permitted by Fed. R. Civ. Pro. 26(b)(4)); see also Hartford Fire Ins. v. Pure Air on the Lake Ltd., 154 F.R.D. 202, 208 (N.D. Ind. 1993) (holding a party cannot use Rule 45 to circumvent Rule 26(b)(4) as a method to obtain an expert witness’s files); Marsh v. Jackson, 141 F.R.D. 431, 432 (W.D. Va. 1992) (noting that subpoena for production of documents directed to non-party expert retained by a party is not within ambit of Fed. Rule 45(c)(3)(8)(ii)).

[16] See James Wm. Moore, 9 Moore’s Federal Practice § 45.03[1] (noting that “[s]ubpoenas under Rule 45 may be issued to parties or non-parties”).

[17] See Glendale Fed. Bank, FSB v.United States, 39 Fed. Cl. 422, 424 (Fed. Cl. 1997) (“The expert witness, testifying under oath, is expected to give his own honest, independent opinion… He is not the sponsoring party’s agent at any time merely because he is retained as its expert witness”). See also National Justice Compania Naviera S.A. v. Prudential Assurance Co. Ltd., (“The Ikarian Reefer”), [1993] 2 Lloyd’s Rep. 68 at 81-82 (Q.B.D.), rev’d on other grounds [1995] 1 Lloyd’s Rep. 455 at 496 (C.A.) (embracing the enumeration of duties, including a duty to “provide independent assistance to the Court by way of objective unbiased opinion in relation to matters within his expertise,” and a duty to eschew “the role of an advocate”).

[18] Western Res., Inc. v. Union Pac. RR, No. 00-2043-CM, 2002 WL 1822428, at *3 (D. Kan. July 23, 2002) (ordering expert witness to produce prior testimony under Rule 45); All W. Supply Co. v. Hill’s Pet Prods. Div., Colgate-Palmolive Co., 152 F.R.D. 634, 639 (D. Kan. 1993) (“With regard to nonparties such as plaintiff’s expert witness, a request for documents may be made by subpoena duces tecum pursuant to Rule 45”); Smith v. Transducer Technology, Inc., No. Civ. 1995/28, 2000 WL 1717332, 2 (D.V.I. Nov. 16, 2000) (holding that Rule 30(b)(5) deposition notice, served upon opposing party, is not an appropriate discovery tool to compel expert witness to produce documents from at his deposition) (noting that a “Rule 45 subpoena duces tecum in conjunction with a properly noticed deposition may do so (subject however to any Rule 26 limitations)”); Thomas v. Marina Assocs., 202 F.R.D. 433, 434 (E.D. Pa. 2001) (denying motion to quash subpoenas issued to party’s expert witness); Quaile v. Carol Cable Co., Civ. A. No. 90-7415, 1992 WL 277981, at *2 (E.D. Pa. Oct. 5, 1992) (granting motion to compel discovery concerning expert witness’s opinions pursuant to a Rule 45 subpoena); Lawrence E. Jaffe Pension Plan v. Household Int’l, Inc., No. 02 C 5893, 2008 WL 687220, at *2 (N.D. Ill Mar. 10, 2008) (“It is clear . . . that a subpoena duces tecum . . . is an appropriate discovery mechanism against . . . a party’s expert witness”) (internal citation omitted); Expeditors Internat’l of Wash., Inc. v. Vastera, Inc., No. 04 C 0321, 2004 WL 406999, at *2-3 (N.D. Ill. Feb. 26, 2004) (holding Rule 45, not Rule 34, governs discovery from retained experts) (“Subpoena duces tecum is . . . an appropriate discovery mechanism against nonparties such as a party’s expert witness”); Reit v. Post Prop., Inc., No. 09 Civ. 5455(RMB)(KNF), 2010 WL 4537044, at *9 (S.D.N.Y. Nov. 4, 2010) (“Subpoena duces tecum … is an appropriate discovery mechanism against a nonparty expert”).

[19] See, e.g., Williamson v. Horizon Lines LLC , 248 F.R.D. 79, 83 (D. Me. 2008) (“[C]ontrary to Horizon Lines’ contention, there is a relationship between Rule 26 and Rule 45 and parties should not be allowed to employ a subpoena after a discovery deadline to obtain materials from third parties that could have been produced before discovery.”).

[20] Bartley v. Isuzu Motors Ltd., 151 F.R.D. 659, 660-61 (D. Colo. 1993) (ordering party to create and preserve “the input and output data for each variable in the program, for each iteration, or each simulation,” as well as a record of all simulations performed, even those that do not conform to the plaintiff’s claims and theories in the case).

[21] See City of Cleveland v. Cleveland Elec. Illuminating Co., 538 F. Supp. 1257 (N.D. Ohio 1980) (“Certainly, where, as here, the expert reports are predicated upon complex data, calculations and computer simulations which are neither discernible nor deducible from the written reports themselves, disclosure thereof is essential to the facilitation of effective and efficient examination of these experts at trial.”); Shu-Tao Lin v. McDonnell-Douglas, Corp., 574 F. Supp. 1407, 1412-13 (S.D.N.Y. 1983) (granting new trial, and holding that expert witness’s failure to disclosure the “nature of [the plaintiff’s testifying expert’s] computer program or the underlying data, the inputs and outputs employed in the program” deprived adversary of an “adequate basis on which to cross-examine plaintiff’s experts”), rev’d on other grounds, 742 F.2d 45 (2d Cir. 1984).

[22] Manual for Complex Litigation at 99, § 11.482 (4th ed. 2004) (“Early and full disclosure of expert evidence can help define and narrow issues. Although experts often seem hopelessly at odds, revealing the assumptions and underlying data on which they have relied in reaching their opinions often makes the bases for their differences clearer and enables substantial simplification of the issues. In addition, disclosure can facilitate rulings well in advance of trial on objections to the qualifications of an expert, the relevance and reliability of opinions to be offered, and the reasonableness of reliance on particular data.207”). See also ABA Section of Antitrust Law, Econometrics: Legal, Practical, and Technical Issues at 75-76 (2005) (advising of the necessity to obtain all data, all analyses, and all supporting materials, in advance of deposition to ensure efficient and effective discovery procedures).

[23] In re Viagra Prods. Liab. Litig., 572 F. Supp. 2d 1071, 1090 (D. Minn. 2008).

[24] In re Viagra Prods. Liab. Litig., 658 F. Supp. 2d 936, 945 (D. Minn. 2009).

[25] See Fed. R. Civ. Pro. 16(b); 26(f).

Discovery of Retained, Testifying Statistician Expert Witnesses (Part 1)

June 30th, 2015

At times, the judiciary’s resistance to delving into the factual underpinnings of expert witness opinions is extraordinary. In one case, the Second Circuit affirmed a judgment for a plaintiff in a breach of contract action, based in large part upon expert witness testimony that presented the results of a computer simulation. Perma Research & Development v. Singer Co.[1] Although the trial court had promised to permit inquiry into the plaintiff’s computer expert witness’s source of data, programmed mathematical formulae, and computer programs, when the defendant asked the plaintiff’s expert witness to disclose his underlying data and algorithms, the district judge sustained the witness’s refusal on grounds that the requested materials were his “private work product” and “proprietary information.”[2] Despite the trial court’s failure to articulate any legally recognized basis for permitting the expert witness to stonewall in this fashion, a panel of the Circuit, in an opinion by superannuated Justice Tom Clark, affirmed, on an argument that the defendant “had not shown that it did not have an adequate basis on which to cross-examine plaintiff’s experts.” Judge Van Graafeiland dissented, indelicately pointing out that the majority had charged the defendant with failing to show that it had been deprived of a fair opportunity to cross-examine plaintiff’s expert witnesses while depriving the defendant of access to the secret underlying evidence and materials that were needed to demonstrate what could have been done on cross-examination[3]. The dissent traced the trial court’s error to its misconception that a computer is just a giant calculator, and pointed out that the majority contravened Circuit precedent[4] and evolving standards[5] for handling underlying data that was analyzed or otherwise incorporated into computer models and simulations.

Although the approach of Perma Research has largely been ignored, has fallen into disrepute, and has been superseded by statutory amendments[6], its retrograde approach continues to find occasional expression in reported decisions. The refinement of Federal Rule of Evidence 702 to require sound support for expert witnesses’ opinions has opened the flow of discovery of underlying facts and data considered by expert witnesses before generating their reports. The most recent edition of the Federal Judicial Center’s Manual for Complex Litigation treats both computer-generated evidence and expert witnesses’ underlying data as subject to pre-trial discovery as necessary to provide for full and fair litigation of the issues in the case[7].

The discovery of expert witnesses who have conducted statistical analyses poses difficult problems for lawyers. Unlike some other expert witnesses, who passively review data and arrive at an opinion that synthesizes published research, statisticians actually create evidence with new arrangements and analyses of the data in the case. In this respect, statisticians are like materials scientists who may test and record experimental observations on a product or its constituents. Inquiring minds will want to know whether the statistical analyses in the witness’s report were the results of pre-planned analysis protocols, or whether they were the second, third, or fifteenth alternative analysis. Earlier statistical analyses conducted but not produced may reveal what the expert witness believed would have been the preferred analysis if only the data had cooperated more fully. Statistical analyses conducted by expert witnesses provide plenty of opportunity for data-dredging, which can then be covered up by disclosing only selected analyses in the expert witness’s report.

The output of statisticians’ analyses will take the form of “point estimates” of “effect size,” a significance or posterior probability, a set of regression coefficients, a summary estimate of association, or a similar measure that did not exist before the statistician used the underlying data to produce the analytical outcome, which then becomes the subject of further inference and opinion. Frequentist analyses must identify the probability model and other assumptions employed. Bayesian analyses must also identify the prior probabilities used as the starting point from which further evidence leads to posterior probabilities. The science, creativity, and judgment involved in statistical methods challenge courts and counsel to discover, understand, reproduce, present, and cross-examine statistician expert witness testimony. And occasionally, there is duplicity and deviousness to uncover as well.
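The point about priors can be illustrated with a minimal worked example of Bayes’ rule in odds form. The numbers here are entirely hypothetical (a fixed likelihood ratio of 9 standing in for the strength of the evidence): the same evidence yields sharply different posterior probabilities depending on the prior the analyst chose, which is exactly why the prior must be disclosed and is fair game in discovery.

```python
# Illustrative sketch (hypothetical numbers): the same evidence, with
# different undisclosed priors, supports very different conclusions.
def posterior(prior, likelihood_ratio):
    """Posterior probability via Bayes' rule in odds form:
    posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

lr = 9.0  # strength of the evidence, held fixed across all three runs
for prior in (0.05, 0.25, 0.50):
    print(f"prior {prior:.2f} -> posterior {posterior(prior, lr):.2f}")
# prior 0.05 -> posterior 0.32
# prior 0.25 -> posterior 0.75
# prior 0.50 -> posterior 0.90
```

A witness who reports only the 0.90 posterior, without disclosing the 50% prior that produced it, has concealed the most contestable step in the analysis.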

The discovery obligations with respect to statistician expert witnesses vary considerably among state and federal courts. The 1993 amendments to the Federal Rules of Civil Procedure created an automatic right to conduct depositions of expert witnesses[8]. Previously, parties in federal court had to show the inadequacy of other methods of discovery. Rule 26(a)(2)(B)(ii) requires the automatic production of “the facts or data considered by the [expert] witness in forming” his or her opinions. The literal wording of this provision would appear to restrict automatic, mandatory disclosure to those facts and data that are specifically considered in forming the opinions contained in the prescribed report. Several courts, however, have interpreted the term “considered” to include any information that expert witnesses review or generate, “regardless of whether the experts actually rely on those materials as a basis for their opinions.”[9]

Among the changes introduced by the 2010 amendments to the Federal Rules of Civil Procedure was a narrowing of the disclosure requirement of “facts and data” considered by expert witnesses in arriving at their opinions to exclude some attorney work product, as well as protecting drafts of expert witness reports from discovery.  The implications of the Federal Rules for statistician expert witnesses are not entirely clear, but these changes should not be used as an excuse to deprive litigants of access to the data and materials underlying statisticians’ analyses. Since the 2010 amendments, courts have enforced discovery requests for testifying expert witnesses’ notes because they were not draft reports or specific communications between counsel and expert witnesses[10].

The Requirements Associated With Producing A Report

Rule 26 is the key rule that governs disclosure and discovery of expert witnesses and their opinions. Under the current version of Rule 26(a)(2)(B), the scope of required disclosure in the expert report has been narrowed in some respects. Rule 26(a)(2)(B) now requires service of expert witness reports that contain, among other things:

(i) a complete statement of all opinions the witness will express and the basis and reasons for them;

(ii) the facts or data considered by the witness in forming them;

(iii) any exhibits that will be used to summarize or support them.

The Rule’s use of “them” seems clearly to refer back to “opinions,” which creates a problem with respect to materials considered generally with respect to the case or the issues, but not for the specific opinions advanced in the report.

The previous language of the rule required that the expert report disclose “the data or other information considered by the witness.”[11] The use of “other information” in the older version of the rule, rather than the new “data,” was generally interpreted to authorize discovery of all oral and written communications between counsel and expert witnesses. The trimming of Rule 26(a)(2)(B)(ii) was thus designed to place these attorney-expert witness communications off limits from disclosure or discovery.

The federal rules specify that the required report “is intended to set forth the substance of the direct examination.”[12] Several courts have thus interpreted the current rule in a way that does not result in automatic production of all statistical analyses performed, but only those data and analyses the witness has decided to present at trial. The report requirement, as it now stands, is thus not necessarily designed to help adverse counsel fully challenge and cross-examine the expert witness on analyses attempted, discarded, or abandoned. If a statistician expert witness conducted multiple statistical tests before arriving at a “preferred” analysis, that expert witness, and instructing counsel, will obviously be all too happy to eliminate the unhelpful analyses from the direct examination, and from the purview of disclosure.

Some of the caselaw in this area makes clear that it is up to the requesting party to discover what it wants beyond the materials that must automatically be disclosed in, or with, the report. A party will not be heard to complain, or to attack its adversary, about a failure to produce materials never requested.[13] Citing Rule 26(a) and its subsections, which deal with the report, and not with discovery beyond the report, several cases take a narrow view of disclosure as embodied in the report requirement.[14] In one case, McCoy v. Whirlpool Corp., the trial court did, however, permit the plaintiff to conduct a supplemental deposition of the defense expert witness to question him about his calculations[15].

A narrow view of automatic disclosure in some cases appears to protect statistician and other expert witnesses from being required to produce calculations, statistical analyses, and data outputs even for opinions that are identified in their reports, and intended to be the subject of direct examination at trial[16]. The trial court’s handling of the issues in Cook v. Rockwell International Corporation is illustrative of this questionable approach. The inadequacy of the expert witnesses’ reports, for failing to disclose notes, calculations, and preliminary analyses, arose in the context of a Rule 702 motion challenging the admissibility of the witnesses’ opinion testimony. The trial court rejected “[a]ny suggestion that an opposing expert must be able to verify the correctness of an expert’s work before it can be admitted … ”[17]; any such suggestion “misstates the standard for admission of expert evidence under [Fed. R. Evid.] 702.[18]” The Cook court further rejected any “suggestion in Rule 26(a)(2) that an expert report is incomplete unless it contains sufficient information and detail for an opposing expert to replicate and verify in all respects both the method and results described in the report.[19]” Similarly, the court rejected the defense’s complaints that the report and disclosures of one of plaintiffs’ expert witnesses violated Rule 26(a)(2) by failing to provide “detailed working notes, intermediate results and computer records” that would allow a rebuttal expert witness to test the methodology and replicate the results[20]. The court observed that

“Defendants’ argument also confuses the expert reporting requirements of Rule 26(a)(2) with the considerations for assessing the admissibility of an expert’s opinions under Rule 702 of the Federal Rules of Evidence. Whether an expert’s method or theory can or has been tested is one of the factors that can be relevant to determining whether an expert’s testimony is reliable enough to be admissible. See Fed. R. Evid. 702 2000 advisory committee’s note; Daubert, 509 U.S. at 593, 113 S.Ct. 2786. It is not a factor for assessing compliance with Rule 26(a)(2)’s expert disclosure requirements.”[21]

The Rule 702 motion to exclude an expert witness comes too late in the pre-trial process for complaints about failure to disclose underlying data and analyses. The Cook case never explicitly addressed Rule 26(b), or other discovery procedures, as a basis for the defense request for underlying documents, data, and materials.  In any event, the limited scope accorded to Rule 26 disclosure mechanisms by Cook emphasizes the importance of deploying ancillary discovery tools early in the pre-trial process.

The Format Of Documents and Data Files To Be Produced

The dispute in Helmert v. Butterball, LLC, is typical of what may be expected in a case involving statistician expert witness testimony. The parties exchanged reports of their statistical expert witnesses, as well as the data output files. The parties chose, however, to produce the data files in ways that were singularly unhelpful to the other side. One party produced data files in the “portable document format” (pdf) rather than in the native format of the statistical software package used (STATA). The other party produced data in a spreadsheet without any information about how the data were processed. The parties then filed cross-motions to compel production of the data in its “electronic, native format.” In addition, plaintiffs pressed for all the underlying data, formulae, and calculations. The court denied both motions on the theory that both sides had received copies of the data considered, and neither was denied facts or data considered by the expert witnesses in reaching their opinions[22]. The court refused plaintiffs’ request for formulae and calculations as well. The court’s discussion of its rationale for denying the cross-motions is framed entirely in terms of what parties may expect and be entitled to in the form of a report, without any mention of additional discovery mechanisms to obtain the sought-after materials. The court noted that the parties would have the opportunity to explore the calculations at deposition.

The decision in Helmert seems typical of judicial indifference to, and misunderstanding of, the need for datasets, especially large datasets, in the form in which they were uploaded to, and used in, statistical software programs. What is missing from the Helmert opinion is a recognition that an effective deposition would require production of the requested materials in advance of the oral examination, so that examining counsel can confer and consult with a statistical expert for help in formulating and structuring the deposition questions. There are at least two remedial considerations for future discovery motions of the sort seen in Helmert. First, the moving party should support its application with an affidavit of a statistical expert to explain the specific need for identification of the actual formulae used, the programming used within specific software programs to run analyses, and the interim and final outputs. Second, the moving party should press a strong analogy with document discovery of parties, in which courts routinely order production of “native format” versions of PowerPoint, Excel, and Word documents in response to document requests. Rule 34 of the Federal Rules of Civil Procedure requires that “[a] party must produce documents as they are kept in the usual course of business[23]” and that, “[i]f a request does not specify a form for producing electronically stored information, a party must produce it in a form or forms in which it is ordinarily maintained or in a reasonably usable form or forms.[24]” The Advisory Committee notes to Rule 34[25] make clear that:

“[T]he option to produce in a reasonably usable form does not mean that a responding party is free to convert electronically stored information from the form in which it is ordinarily maintained to a different form that makes it more difficult or burdensome for the requesting party to use the information efficiently in the litigation. If the responding party ordinarily maintains the information it is producing in a way that makes it searchable by electronic means, the information should not be produced in a form that removes or significantly degrades this feature.”

Under the Federal Rules, a requesting party’s obligation to specify a particular format for document production is superseded by the responding party’s obligation to refrain from manipulating or converting “any of its electronically stored information to a different format that would make it more difficult or burdensome for [the requesting party] to use.[26]” In Helmert, the STATA files should have been delivered as STATA native format files, and the requesting party should have requested, and received, all STATA input and output files, which would have permitted the requestor to replicate all analyses conducted.

Some of the decided cases on expert witness reports are troubling because they do not explicitly state whether they are addressing the adequacy of automatic disclosure and reports, or the adequacy of a response to propounded discovery. For example, in Etherton v. Owners Ins. Co.[27], the plaintiff sought to preclude a defense accident reconstruction expert witness on grounds that the witness failed to produce several pages of calculations[28]. The defense argued that “[w]hile [the witness’s] notes regarding these calculations were not included in his expert report, the report does specifically identify the methods he employed in his analysis, and the static data used in his calculations,” and that “Rule 26 does not require the disclosure of draft expert reports, and it certainly does not require disclosure of calculations, as Plaintiff contends.[29]” The court in Etherton agreed that “Fed. R. Civ. P. 26(a)(2)(B) does not require the production of every scrap of paper with potential relevance to an expert’s opinion.[30]” The court laid the discovery default upon the plaintiff, as the requesting party: “Although Plaintiff should have known that Mr. Ogden’s engineering analysis would likely involve calculations, Plaintiff never requested that documentation of those calculations be produced at any time prior to the date of [Ogden’s] deposition.”[31]

The Etherton court’s assessment that the defense expert witness’s calculations were “working notes,” which Rule 26(a)(2) does not require to be included in or produced with a report, seems a complete answer, except for the court’s musings about the new provisions of Rule 26(b)(4)(B), which protect draft reports.  Because of the court’s emphasis that the plaintiff never requested the documentation of the relevant calculations, the court’s musings about what was discoverable were clearly dicta.  The calculations, which would reveal data and inferential processes considered, appear to be core materials, subject to and important for discovery[32].

[This post is a substantial revision and update to an earlier post, “Discovery of Statistician Expert Witnesses” (July 19, 2012).]


[1] 542 F.2d 111 (2d Cir. 1976), cert. denied, 429 U.S. 987 (1976)

[2] Id. at 124.

[3] Id. at 126 & n.17.

[4] United States v. Dioguardi, 428 F.2d 1033, 1038 (2d Cir.), cert. denied, 400 U.S. 825 (1970) (holding that prosecution’s failure to produce computer program was error but harmless on the particular facts of the case).

[5] See, e.g., Roberts, “A Practitioner’s Primer on Computer-Generated Evidence,” 41 U. Chi. L. Rev. 254, 255-56 (1974); Freed, “Computer Records and the Law — Retrospect and Prospect,” 15 Jurimetrics J. 207, 208 (1975); ABA Sub-Committee on Data Processing, “Principles of Introduction of Machine Prepared Studies” (1964).

[6] Aldous, Note, “Disclosure of Expert Computer Simulations,” 8 Computer L.J. 51 (1987); Betsy S. Fiedler, “Are Your Eyes Deceiving You?: The Evidentiary Crisis Regarding the Admissibility of Computer Generated Evidence,” 48 N.Y.L. Sch. L. Rev. 295, 295–96 (2004); Fred Galves, “Where the Not-So-Wild Things Are: Computers in the Courtroom, the Federal Rules of Evidence, and the Need for Institutional Reform and More Judicial Acceptance,” 13 Harv. J.L. & Tech. 161 (2000); Leslie C. O’Toole, “Admitting that We’re Litigating in the Digital Age: A Practical Overview of Issues of Admissibility in the Technological Courtroom,” Fed. Def. Corp. Csl. Quart. 3 (2008); Carole E. Powell, “Computer Generated Visual Evidence: Does Daubert Make a Difference?” 12 Georgia State Univ. L. Rev. 577 (1995).

[7] Federal Judicial Center, Manual for Complex Litigation § 11.447, at 82 (4th ed. 2004) (“The judge should therefore consider the accuracy and reliability of computerized evidence, including any necessary discovery during pretrial proceedings, so that challenges to the evidence are not made for the first time at trial.”); id. at § 11.482, at 99 (“Early and full disclosure of expert evidence can help define and narrow issues. Although experts often seem hopelessly at odds, revealing the assumptions and underlying data on which they have relied in reaching their opinions often makes the bases for their differences clearer and enables substantial simplification of the issues.”)

[8] Fed. R. Civ. P. 26(b)(4)(A) (1993).

[9] United States v. Dish Network, L.L.C., No. 09-3073, 2013 WL 5575864, at *2, *5 (C.D. Ill. Oct. 9, 2013) (noting that the 2010 amendments did not change the meaning of the term “considered,” as including “anything received, reviewed, read, or authored by the expert, before or in connection with the forming of his opinion, if the subject matter relates to the facts or opinions expressed.”); S.E.C. v. Reyes, 2007 WL 963422, at *1 (N.D. Cal. Mar. 30, 2007). See also South Yuba River Citizens’ League v. National Marine Fisheries Service, 257 F.R.D. 607, 610 (E.D. Cal. 2009) (majority rule requires production of materials considered even when work product); Trigon Insur. Co. v. United States, 204 F.R.D. 277, 282 (E.D. Va. 2001).

[10] Dongguk Univ. v. Yale Univ., No. 3:08–CV–00441 (TLM), 2011 WL 1935865 (D. Conn. May 19, 2011) (ordering production of a testifying expert witness’s notes, reasoning that they were neither draft reports nor communications between the party’s attorney and the expert witness, and they were not the mental impressions, conclusions, opinions, or legal theories of the party’s attorney); In re Application of the Republic of Ecuador, 280 F.R.D. 506, 513 (N.D. Cal. 2012) (holding that Rule 26(b) does not protect an expert witness’s own work product other than draft reports). But see Internat’l Aloe Science Council, Inc. v. Fruit of the Earth, Inc., No. 11-2255, 2012 WL 1900536, at *2 (D. Md. May 23, 2012) (holding that expert witness’s notes created to help counsel prepare for deposition of adversary’s expert witness were protected as attorney work product and protected from disclosure under Rule 26(b)(4)(C) because they did not contain opinions that the expert would provide at trial).

[11] Fed. R. Civ. P. 26(a)(2)(B)(ii) (1993) (emphasis added).

[12] Notes of Advisory Committee on Rules for Rule 26(a)(2)(B). See, e.g., Lithuanian Commerce Corp., Ltd. v. Sara Lee Hosiery, 177 F.R.D. 245, 253 (D.N.J. 1997) (expert witness’s written report should state completely all opinions to be given at trial, the data, facts, and information considered in arriving at those opinions, as well as any exhibits to be used), vacated on other grounds, 179 F.R.D. 450 (D.N.J. 1998).

[13] See, e.g., Gillespie v. Sears, Roebuck & Co., 386 F.3d 21, 35 (1st Cir. 2004) (holding that trial court erred in allowing cross-examination and final argument on expert witness’s supposed failure to produce all working notes and videotaped recordings while conducting tests, when objecting party never made such document requests).

[14] See, e.g., McCoy v. Whirlpool Corp., 214 F.R.D. 646, 652 (D. Kan. 2003) (Rule  26(a)(2) “does not require that a report recite each minute fact or piece of scientific information that might be elicited on direct examination to establish the admissibility of the expert opinion … Nor does it require the expert to anticipate every criticism and articulate every nano-detail that might be involved in defending the opinion[.]”).

[15] Id. (without distinguishing between the provisions of Rule 26(a) concerning reports and Rule 26(b) concerning depositions); see also Scott v. City of New York, 591 F.Supp. 2d 554, 559 (S.D.N.Y. 2008) (“failure to record the panoply of descriptive figures displayed automatically by his statistics program does not constitute best practices for preparation of an expert report,” but holding that the report contained “the data or other information” he considered in forming his opinion, as required by Rule 26); McDonald v. Sun Oil Co., 423 F.Supp. 2d 1114, 1122 (D. Or. 2006) (holding that Rule 26(a)(2)(B) does not require the production of an expert witness’s working notes; a party may not be sanctioned for spoliation based upon an expert witness’s failure to retain notes, absent a showing of relevancy and bad faith), rev’d on other grounds, 548 F.3d 774 (9th Cir. 2008).

[16] In re Xerox Corp. Securities Litig., 746 F. Supp. 2d 402, 414-15 (D. Conn. 2010) (“The court concludes that it was not necessary for the [expert witness’s] initial regression analysis to be contained in the [expert] report” that was disclosed pursuant to Rule 26(a)(2)), aff’d on other grounds sub nom. Dalberth v. Xerox Corp., 766 F.3d 172 (2d Cir. 2014). See also Cook v. Rockwell Int’l Corp., 580 F.Supp. 2d 1071, 1122 (D. Colo. 2006), rev’d and remanded on other grounds, 618 F.3d 1127 (10th Cir. 2010), cert. denied, ___ U.S. ___, No. 10-1377, 2012 WL 2368857 (June 25, 2012), on remand, 13 F.Supp.3d 1153 (D. Colo. 2014), vacated, 2015 WL 3853593, No. 14–1112 (10th Cir. June 23, 2015); Flebotte v. Dow Jones & Co., No. Civ. A. 97–30117–FHF, 2000 WL 35539238, at *7 (D. Mass. Dec. 6, 2000) (“Therefore, neither the plain language of the rule nor its purpose compels disclosure of every calculation or test conducted by the expert during formation of the report.”).

[17] Cook, 580 F. Supp. 2d at 1121–22.

[18] Id.

[19] Id. & n. 55 (Rule 26(a)(2) does not “require that an expert report contain all the information that a scientific journal might require an author of a published paper to retain.”).

[20] Id. at 1121-22.

[21] Id.

[22] Helmert v.  Butterball, LLC, No. 4:08-CV-00342, 2011 WL 3157180, at *2 (E.D. Ark. July 27, 2011).

[23] Fed. R. Civ. P. 34(b)(2)(E)(i).

[24] Fed. R. Civ. P. 34(b)(2)(E)(ii).

[25] Fed. R. Civ. P. 34, Advisory Comm. Notes (2006 Amendments).

[26] Crissen v. Gupta, 2013 U.S. Dist. LEXIS 159534, at *22 (S.D. Ind. Nov. 7, 2013), citing Craig & Landreth, Inc. v. Mazda Motor of America, Inc., 2009 U.S. Dist. LEXIS 66069, at *3 (S.D. Ind. July 27, 2009). See also Saliga v. Chemtura Corp., 2013 U.S. Dist. LEXIS 167019, *3-7 (D. Conn. Nov. 25, 2013).

[27] No. 10-cv-00892-MSK-KLM, 2011 WL 684592 (D. Colo. Feb. 18, 2011).

[28] Id. at *1.

[29] Id.

[30] Id. at *2.

[31] Id.

[32] See Barnes v. Dist. of Columbia, 289 F.R.D. 1, 19–24 (D.D.C. 2012) (ordering production of underlying data and information because, “[i]n order for the [requesting party] to understand fully the . . . [r]eports, they need to have all the underlying data and information on how” the reports were prepared).

Don’t Double Dip Data

March 9th, 2015

Meta-analyses have become commonplace in epidemiology and in other sciences. When well conducted and transparently reported, meta-analyses can be extremely helpful. In several litigations, meta-analyses determined the outcome of the medical causation issues. In the silicone gel breast implant litigation, after defense expert witnesses proffered meta-analyses[1], court-appointed expert witnesses adopted the approach and featured meta-analyses in their reports to the MDL court[2].

In the welding fume litigation, plaintiffs’ expert witness offered a crude, non-quantified, “vote counting” exercise to argue that welding causes Parkinson’s disease[3]. In rebuttal, one of the defense expert witnesses offered a quantitative meta-analysis, which provided strong evidence against plaintiffs’ claim.[4] Although the welding fume MDL court excluded the defense expert’s meta-analysis from the pre-trial Rule 702 hearing as untimely, plaintiffs’ counsel soon thereafter initiated settlement discussions of the entire set of MDL cases. Subsequently, the defense expert witness, with his professional colleagues, published an expanded version of the meta-analysis.[5]

And last month, a meta-analysis proffered by a defense expert witness helped dispatch a long-festering litigation in New Jersey’s multi-county isotretinoin (Accutane) litigation. In re Accutane Litig., No. 271(MCL), 2015 WL 753674 (N.J. Super., Law Div., Atlantic Cty., Feb. 20, 2015) (excluding plaintiffs’ expert witness David Madigan).

Of course, when a meta-analysis is done improperly, the resulting analysis may be worse than none at all. Some methodological flaws involve arcane statistical concepts and procedures, and may be easily missed. Other flaws are flagrant and call for a gatekeeping bucket brigade.

When a merchant puts his hand on the scale at the check-out counter, we call that fraud. When George Costanza dipped his chip twice in the chip dip, he was properly called out for his boorish and unsanitary practice. When a statistician or epidemiologist produces a meta-analysis that double counts crucial data to inflate a summary estimate of association, or to create spurious precision in the estimate, we don’t need to crack open Modern Epidemiology or the Reference Manual on Scientific Evidence to know that something fishy has taken place.

In litigation involving claims that selective serotonin reuptake inhibitors cause birth defects, plaintiffs’ expert witness, a perinatal epidemiologist, relied upon two published meta-analyses[6]. In an examination before trial, this epidemiologist was confronted with the double counting (and other data entry errors) in the relied-upon meta-analyses, and she readily agreed that the meta-analyses were improperly done and that she had to abandon her reliance upon them.[7] The result of the expert witness’s deposition epiphany, however, was that she no longer had the illusory benefit of an aggregation of data, with an outcome supporting her opinion. The further consequence was that her opinion succumbed to a Rule 702 challenge. See In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., MDL No. 2342; 12-md-2342, 2014 U.S. Dist. LEXIS 87592; 2014 WL 2921648 (E.D. Pa. June 27, 2014) (Rufe, J.).
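The arithmetic consequences of such data-entry errors are easy to demonstrate. The deposition excerpt quoted in the footnotes describes an extraction error in which “2 out of 98” became “9 out of 28.” Here is a minimal Python sketch of what such a transposition does to an odds ratio; the control-arm figures are invented, supplied only to complete the 2×2 table:

```python
def odds_ratio(cases_exp, n_exp, cases_ctl, n_ctl):
    """Odds ratio from a 2x2 table, given cases and totals in each arm."""
    a, b = cases_exp, n_exp - cases_exp          # exposed: cases, non-cases
    c, d = cases_ctl, n_ctl - cases_ctl          # control: cases, non-cases
    return (a * d) / (b * c)

# Hypothetical control arm (20 cases among 1,000); the deposition gives
# only the exposed-arm figures.
correct = odds_ratio(2, 98, 20, 1000)     # correct extraction: 2 of 98
transposed = odds_ratio(9, 28, 20, 1000)  # transposed digits: 9 of 28

print(f"correct OR:    {correct:.2f}")    # about 1.02
print(f"transposed OR: {transposed:.2f}") # about 23.2
print(f"inflation:     {transposed / correct:.0f}-fold")
```

With these invented control figures the transposition turns a null result into a roughly twenty-fold elevated odds ratio, the same order of magnitude as the inflation described in the testimony.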

Double counting of studies, or of subgroups within studies, is a flaw that most careful readers can identify in a meta-analysis, without advance training. According to statistician Stephen Senn, double counting of evidence is a serious problem in published meta-analytical studies. Stephen J. Senn, “Overstating the evidence – double counting in meta-analysis and related problems,” 9 BMC Medical Research Methodology 10, at *1 (2009). Senn observes that he had little difficulty in finding examples of meta-analyses gone wrong, including meta-analyses with double counting of studies or data, in some of the leading clinical medical journals. Id. Senn urges analysts to “[b]e vigilant about double counting,” id. at *4, and recommends that journals withdraw meta-analyses promptly when mistakes are found. Id. at *1.
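Senn’s point about spurious precision can be illustrated in a few lines. The Python sketch below (the study figures are invented for illustration) pools hypothetical log odds ratios by inverse-variance weighting, the standard fixed-effect approach, and shows that letting one study into the pool twice shrinks the pooled standard error, and hence narrows the confidence interval, without any new evidence:

```python
import math

def pool_fixed_effect(estimates):
    """Inverse-variance (fixed-effect) pooling of (log_odds_ratio, se) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_w = sum(weights)
    pooled = sum(w * lo for (lo, _), w in zip(estimates, weights)) / total_w
    return pooled, math.sqrt(1.0 / total_w)

# Three invented studies: (log odds ratio, standard error)
studies = [(0.10, 0.20), (0.05, 0.25), (0.20, 0.30)]

_, se_correct = pool_fixed_effect(studies)
# "Double dip": the first study accidentally enters the pool twice
_, se_dd = pool_fixed_effect(studies + [studies[0]])

print(f"pooled SE, correct:        {se_correct:.3f}")  # about 0.139
print(f"pooled SE, double-counted: {se_dd:.3f}")       # about 0.114
# The duplicate shrinks the standard error, manufacturing precision
# out of no new data.
```

Because each study’s weight is the reciprocal of its variance, a duplicated study simply adds its weight a second time, which is why the pooled interval tightens even though nothing new has been measured.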

Similar advice abounds in books and journals[8]. Professor Sander Greenland addresses the issue in his chapter on meta-analysis in Modern Epidemiology:

Conducting a Sound and Credible Meta-Analysis

Like any scientific study, an ideal meta-analysis would follow an explicit protocol that is fully replicable by others. This ideal can be hard to attain, but meeting certain conditions can enhance soundness (validity) and credibility (believability). Among these conditions we include the following:

  • A clearly defined set of research questions to address.

  • An explicit and detailed working protocol.

  • A replicable literature-search strategy.

  • Explicit study inclusion and exclusion criteria, with a rationale for each.

  • Nonoverlap of included studies (use of separate subjects in different included studies), or use of statistical methods that account for overlap. * * * * *

Sander Greenland & Keith O’Rourke, “Meta-Analysis – Chapter 33,” in Kenneth J. Rothman, Sander Greenland, Timothy L. Lash, Modern Epidemiology 652, 655 (3d ed. 2008) (emphasis added).

Just remember George Costanza; don’t double dip that chip, and don’t double dip in the data.


[1] See, e.g., Otto Wong, “A Critical Assessment of the Relationship between Silicone Breast Implants and Connective Tissue Diseases,” 23 Regulatory Toxicol. & Pharmacol. 74 (1996).

[2] See Barbara Hulka, Betty Diamond, Nancy Kerkvliet & Peter Tugwell, “Silicone Breast Implants in Relation to Connective Tissue Diseases and Immunologic Dysfunction:  A Report by a National Science Panel to the Hon. Sam Pointer Jr., MDL 926 (Nov. 30, 1998)”; Barbara Hulka, Nancy Kerkvliet & Peter Tugwell, “Experience of a Scientific Panel Formed to Advise the Federal Judiciary on Silicone Breast Implants,” 342 New Engl. J. Med. 812 (2000).

[3] Deposition of Dr. Juan Sanchez-Ramos, Street v. Lincoln Elec. Co., Case No. 1:06-cv-17026, 2011 WL 6008514 (N.D. Ohio May 17, 2011).

[4] Deposition of Dr. James Mortimer, Street v. Lincoln Elec. Co., Case No. 1:06-cv-17026, 2011 WL 6008054 (N.D. Ohio June 29, 2011).

[5] James Mortimer, Amy Borenstein & Laurene Nelson, Associations of Welding and Manganese Exposure with Parkinson’s Disease: Review and Meta-Analysis, 79 Neurology 1174 (2012).

[6] Shekoufeh Nikfar, Roja Rahimi, Narjes Hendoiee, and Mohammad Abdollahi, “Increasing the risk of spontaneous abortion and major malformations in newborns following use of serotonin reuptake inhibitors during pregnancy: A systematic review and updated meta-analysis,” 20 DARU J. Pharm. Sci. 75 (2012); Roja Rahimi, Shekoufeh Nikfara, Mohammad Abdollahic, “Pregnancy outcomes following exposure to serotonin reuptake inhibitors: a meta-analysis of clinical trials,” 22 Reproductive Toxicol. 571 (2006).

[7] “Q So the question was: Have you read it carefully and do you understand everything that was done in the Nikfar meta-analysis?

A Yes, I think so.

* * *

Q And Nikfar stated that she included studies, correct, in the cardiac malformation meta-analysis?

A That’s what she says.

* * *

Q So if you look at the STATA output, the demonstrative, the — the forest plot, the second study is Kornum 2010. Do you see that?

A Am I —

Q You’re looking at figure four, the cardiac malformations.

A Okay.

Q And Kornum 2010, —

A Yes.

Q — that’s a study you relied upon.

A Mm-hmm.

Q Is that right?

A Yes.

Q And it’s on this forest plot, along with its odds ratio and confidence interval, correct?

A Yeah.

Q And if you look at the last study on the forest plot, it’s the same study, Kornum 2010, same odds ratio and same confidence interval, true?

A You’re right.

Q And to paraphrase My Cousin Vinny, no self-respecting epidemiologist would do a meta-analysis by including the same study twice, correct?

A Well, that was an error. Yeah, you’re right.

***

Q Instead of putting 2 out of 98, they extracted the data and put 9 out of 28.

A Yeah. You’re right.

Q So there’s a numerical transposition that generated a 25-fold increased risk; is that right?

A You’re correct.

Q And, again, to quote My Cousin Vinny, this is no way to do a meta-analysis, is it?

A You’re right.”

Testimony of Anick Bérard, Kuykendall v. Forest Labs, at 223:14-17; 238:17-20; 239:11-240:10; 245:5-12 (Cole County, Missouri; Nov. 15, 2013). According to a Google Scholar search, the Rahimi 2006 meta-analysis had been cited 90 times; the Nikfar 2012 meta-analysis, 11 times, as recently as this month. See, e.g., Etienne Weisskopf, Celine J. Fischer, Myriam Bickle Graz, Mathilde Morisod Harari, Jean-Francois Tolsa, Olivier Claris, Yvan Vial, Chin B. Eap, Chantal Csajka & Alice Panchaud, “Risk-benefit balance assessment of SSRI antidepressant use during pregnancy and lactation based on best available evidence,” 14 Expert Op. Drug Safety 413 (2015); Kimberly A. Yonkers, Katherine A. Blackwell & Ariadna Forray, “Antidepressant Use in Pregnant and Postpartum Women,” 10 Ann. Rev. Clin. Psychol. 369 (2014); Abbie D. Leino & Vicki L. Ellingrod, “SSRIs in pregnancy: What should you tell your depressed patient?” 12 Current Psychiatry 41 (2013).

[8] Julian Higgins & Sally Green, eds., Cochrane Handbook for Systematic Reviews of Interventions 152 (2008) (“7.2.2 Identifying multiple reports from the same study. Duplicate publication can introduce substantial biases if studies are inadvertently included more than once in a meta-analysis (Tramèr 1997). Duplicate publication can take various forms, ranging from identical manuscripts to reports describing different numbers of participants and different outcomes (von Elm 2004). It can be difficult to detect duplicate publication, and some ‘detective work’ by the review authors may be required.”); see also id. at 298 (Table 10.1.a “Definitions of some types of reporting biases”); id. at 304-05 (10.2.2.1 Duplicate (multiple) publication bias … “The inclusion of duplicated data may therefore lead to overestimation of intervention effects.”); Julian P.T. Higgins, Peter W. Lane, Betsy Anagnostelis, Judith Anzures-Cabrera, Nigel F. Baker, Joseph C. Cappelleri, Scott Haughie, Sally Hollis, Steff C. Lewis, Patrick Moneuse & Anne Whitehead, “A tool to assess the quality of a meta-analysis,” 4 Research Synthesis Methods 351, 363 (2013) (“A common error is to double-count individuals in a meta-analysis.”); Alessandro Liberati, Douglas G. Altman, Jennifer Tetzlaff, Cynthia Mulrow, Peter C. Gøtzsche, John P.A. Ioannidis, Mike Clarke, Devereaux, Jos Kleijnen, and David Moher, “The PRISMA Statement for Reporting Systematic Reviews and Meta-Analyses of Studies That Evaluate Health Care Interventions: Explanation and Elaboration,” 151 Ann. Intern. Med. W-65, W-75 (2009) (“Some studies are published more than once. Duplicate publications may be difficult to ascertain, and their inclusion may introduce bias. We advise authors to describe any steps they used to avoid double counting and piece together data from multiple reports of the same study (e.g., juxtaposing author names, treatment comparisons, sample sizes, or outcomes).”) (internal citations omitted); Erik von Elm, Greta Poglia, Bernhard Walder, and Martin R. Tramèr, “Different patterns of duplicate publication: an analysis of articles used in systematic reviews,” 291 J. Am. Med. Ass’n 974 (2004); John Andy Wood, “Methodology for Dealing With Duplicate Study Effects in a Meta-Analysis,” 11 Organizational Research Methods 79, 79 (2008) (“Dependent studies, duplicate study effects, nonindependent studies, and even covert duplicate publications are all terms that have been used to describe a threat to the validity of the meta-analytic process.”) (internal citations omitted); Martin R. Tramèr, D. John M. Reynolds, R. Andrew Moore, Henry J. McQuay, “Impact of covert duplicate publication on meta-analysis: a case study,” 315 Brit. Med. J. 635 (1997); Beverley J. Shea, Jeremy M. Grimshaw, George A. Wells, Maarten Boers, Neil Andersson, Candyce Hamel, Ashley C. Porter, Peter Tugwell, David Moher, and Lex M. Bouter, “Development of AMSTAR: a measurement tool to assess the methodological quality of systematic reviews,” 7(10) BMC Medical Research Methodology (2007) (systematic reviews must inquire whether there was “duplicate study selection and data extraction”).

Bentham’s Legacy – Quantification of Fact Finding

March 1st, 2015

Jeremy Bentham, radical philosopher, was a source of many antic proposals. Perhaps his most antic proposal was to have himself stuffed, mounted, and displayed in the halls of University College London, where he may still be observed during normal school hours. In ethical theory, Bentham advocated an extreme ethical reductionism, known as utilitarianism. Bentham shared Edmund Burke’s opposition to the invocation of natural rights, but unlike Burke, Bentham was an ardent foe of the American Revolution.

Bentham was also a non-practicing lawyer who had an inexhaustible capacity for rationalistic revisions of legal practice. Among his revisionary schemes, Bentham proposed to reduce or translate qualitative beliefs to a numerical scale, like a thermometer. Jeremy Bentham, 6 The Works of Jeremy Bentham: Rationale of Judicial Evidence at 225 (1843); 1 Rationale of Judicial Evidence Specially Applied to English Practice at 76 (1827). The legal profession, that is, lawyers who actually tried or judged cases, did not think much of Bentham’s proposal:

“The notions of those who have proposed that mere moral probabilities or relations could ever be represented by numbers or space, and thus be subjected to arithmetical analysis, cannot but be regarded as visionary and chimerical.”

Thomas Starkie, A Practical Treatise of the Law of Evidence 225 (2d ed. 1833). Having graduated from St. John’s College, Cambridge University, as senior wrangler, Starkie was no slouch in mathematics, and he was an accomplished lawyer and judge later in life.

Starkie’s pronouncement upon Bentham’s proposal was, in the legal profession, a final judgment. The idea of having witnesses provide a decigrade or centigrade scale of belief in facts never caught on in the law. No evidentiary code or set of rules allows for, or requires, such quantification, but on the fringes, Bentham’s ideas still resonate with some observers who would require juries or judges to quantify their findings of fact:

“Consequently statistical ideas should be used in court and have already been used in the analysis of forensic data. But there are other areas to explore. Thus I do not think a jury should be required to decide guilty or innocent; they should provide their probability of guilt. The judge can then apply MEU [maximised expected utility] by incorporating society’s utility. Hutton could usefully have used some probability. A lawyer and I wrote a paper on the evidential worth of failure to produce evidence.”

Dennis V. Lindley, “Bayesian Thoughts,” Significance 73, 74-75 (June 2004). Some might say that Lindley was trash picking in the dustbin of legal history.
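Lindley’s suggestion (the jury reports a probability of guilt; the judge then applies maximized expected utility) can be made concrete with a toy calculation. The utility numbers below are invented solely for illustration, in a roughly Blackstonian spirit; they are not drawn from Lindley’s paper.

```python
# Toy illustration of Lindley's maximised-expected-utility (MEU) proposal:
# the jury reports P(guilt); the court convicts only when the expected
# utility of convicting exceeds that of acquitting. All utilities are
# hypothetical values chosen for the example.

def decide(p_guilt, u):
    """u maps (verdict, true state) pairs to society's utility."""
    eu_convict = (p_guilt * u[("convict", "guilty")]
                  + (1 - p_guilt) * u[("convict", "innocent")])
    eu_acquit = (p_guilt * u[("acquit", "guilty")]
                 + (1 - p_guilt) * u[("acquit", "innocent")])
    return "convict" if eu_convict > eu_acquit else "acquit"

# Blackstone-flavored utilities: convicting the innocent is ten times
# worse than acquitting the guilty (invented numbers).
utilities = {
    ("convict", "guilty"): 1.0,
    ("convict", "innocent"): -10.0,
    ("acquit", "guilty"): -1.0,
    ("acquit", "innocent"): 0.0,
}

print(decide(0.95, utilities))  # convict
print(decide(0.80, utilities))  # acquit
```

With these particular utilities the conviction threshold works out to a jury probability above 5/6, which shows how the “beyond a reasonable doubt” standard would become an explicit function of society’s assumed utilities.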

Sander Greenland on “The Need for Critical Appraisal of Expert Witnesses in Epidemiology and Statistics”

February 8th, 2015

Sander Greenland is one of the few academics who have served as expert witnesses and then written post-mortems of their involvement in various litigations[1]. Although settling scores with opposing expert witnesses can be a risky business[2], the practice can provide important insights for judges and lawyers who want to avoid the errors of the past. Greenland correctly senses that many errors seem endlessly recycled, and that courts could benefit from disinterested commentary on cases. And so, there should be a resounding affirmation from federal and state courts of the proclaimed “need for critical appraisal of expert witnesses in epidemiology and statistics,” as well as in many other disciplines.

A recent exchange[3] with Professor Greenland led me to revisit his Wake Forest Law Review article. His article raises some interesting points, some mistaken, but some valuable and thoughtful considerations about how to improve the state of statistical expert witness testimony. For better and worse[4], lawyers who litigate health effects issues should read it.

Other Misunderstandings

Greenland posits criticisms of defense expert witnesses[5], who he believes have misinterpreted or misstated the appropriate inferences to be drawn from null studies. In one instance, Greenland revisits one of his own cases, without any clear acknowledgment that his views were largely rejected.[6] The State of California had declared, pursuant to Proposition 65 (the Safe Drinking Water and Toxic Enforcement Act of 1986, Health and Safety Code sections 25249.5, et seq.), that the State “knew” that di(2-ethylhexyl)phthalate, or “DEHP,” caused cancer. Baxter Healthcare challenged the classification, and according to Greenland, the defense experts erroneously interpreted inconclusive studies as evidence supporting a conclusion that DEHP does not cause cancer.

Greenland argues that the Baxter expert’s reference[7] to an IARC working group’s classification of DEHP as “not classifiable as to its carcinogenicity to humans” did not support the expert’s conclusion that DEHP does not cause cancer in humans. If Baxter’s expert invoked the IARC working group’s classification for complete exoneration of DEHP, then Greenland’s point is fair enough. In his single-minded attack on Baxter’s expert’s testimony, however, Greenland missed a more important point, which is that the IARC’s determination that DEHP is not classifiable as to carcinogenicity directly contradicts California’s epistemic claim to “know” that DEHP causes cancer. And Greenland conveniently omits any discussion that the IARC working group had reclassified DEHP from “possibly carcinogenic” to “not classifiable,” in the light of its conclusion that mechanistic evidence of carcinogenesis in rodents did not pertain to humans.[8] Greenland maintains that Baxter’s experts misrepresented the IARC working group’s conclusion[9], but that conclusion, at the very least, demonstrates that California was on very shaky ground when it declared that it “knew” that DEHP was a carcinogen. California’s semantic gamesmanship over its epistemic claims is at the root of the problem, not a misstep by defense experts in describing inconclusive evidence as exonerative.

Greenland goes on to complain that in litigation over health claims:

“A verdict of ‛uncertain’ is not allowed, yet it is the scientific verdict most often warranted. Elimination of this verdict from an expert’s options leads to the rather perverse practice (illustrated in the DEHP testimony cited above) of applying criminal law standards to risk assessments, as if chemicals were citizens to be presumed innocent until proven guilty.”

39 Wake Forest Law Rev. at 303. Despite Greenland’s alignment with California in the Denton case, the fact of the matter is that a verdict of “uncertain” was allowed, and he was free to criticize California for making a grossly exaggerated epistemic claim on inconclusive evidence.

Perhaps recognizing that he may readily be seen as an advocate coming to the defense of California on the DEHP issue, Greenland protests that:

“I am not suggesting that judgments for plaintiffs or actions against chemicals should be taken when evidence is inconclusive.”

39 Wake Forest Law Rev. at 305. And yet, his involvement in the Denton case (as well as other cases, such as silicone gel breast implant cases, thimerosal cases, etc.) suggests that he is willing to lend aid and support to judgments for plaintiffs when the evidence is inconclusive.

Important Advice and Recommendations

The foregoing points are rather severe limitations of Greenland’s article, but lawyers and judges should also look to what is good and helpful here. Greenland is correct to call out expert witnesses, regardless of party affiliation, who opine that inconclusive studies are “proof” of the null hypothesis. Although some of Greenland’s arguments against the use of significance probability may be overstated, his corrections to the misstatements and misunderstandings of significance probability should command greater attention in the legal community. In one strained passage, however, Greenland uses a disjunction to juxtapose null hypothesis testing with proof beyond a reasonable doubt[10]. Greenland of course understands the difference, but the context would lead some untutored readers to think he has equated the two probabilistic assessments. Writing in a law review for lawyers and judges might have led him to be more careful. Given the prevalence of plaintiffs’ counsel’s confusing the 95% confidence coefficient with a burden of proof akin to beyond a reasonable doubt, great care in this area is, indeed, required.
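The confusion over the 95% confidence coefficient can be shown with a short simulation: a result “significant at the 0.05 level” does not mean the causal hypothesis is 95% probable. The base rate, power, and trial counts below are invented purely for illustration.

```python
# Simulation: when most tested hypotheses are null, a 'significant' result
# at alpha = 0.05 carries nothing like a 95% probability that the effect
# is real. All parameters here are hypothetical.

import random

random.seed(1)

def share_real_among_significant(trials=100_000, p_true_effect=0.1,
                                 power=0.8, alpha=0.05):
    true_pos = false_pos = 0
    for _ in range(trials):
        effect_real = random.random() < p_true_effect
        if effect_real:
            if random.random() < power:    # real effect detected
                true_pos += 1
        elif random.random() < alpha:      # false alarm under the null
            false_pos += 1
    # Among 'significant' findings, what fraction reflect a real effect?
    return true_pos / (true_pos + false_pos)

ppv = share_real_among_significant()
print(f"P(real effect | significant) is roughly {ppv:.2f}")  # well below 0.95
```

With these assumed inputs, only about two-thirds of “significant” findings reflect real effects, which is why equating the confidence coefficient with a burden of proof misleads.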

Despite his appearing for plaintiffs’ counsel in health effects litigation, some of Greenland’s suggestions are balanced and perhaps more truth-promoting than many plaintiffs’ counsel would abide. His article provides an important argument in favor of raising the legal criteria for witnesses who purport to have expertise to address and interpret epidemiologic and experimental evidence[11]. And beyond raising qualification requirements above mere “reasonable pretense at expertise,” Professor Greenland offers some thoughtful, helpful recommendations for improving expert witness testimony in the courts:

  • “Begin publishing projects in which controversial testimony (a matter of public record) is submitted, and as space allows, published on a regular basis in scientific or law journals, perhaps with commentary. An online version could provide extended excerpts, with additional context.
  • Give courts the resources and encouragement to hire neutral experts to peer-review expert testimony.
  • Encourage universities and established scholarly societies (such as AAAS, ASA, APHA, and SER) to conduct workshops on basic epidemiologic and statistical inference for judges and other legal professionals.”

39 Wake Forest Law Rev. at 308.

Each of these three suggestions is valuable and constructive, and worthy of an independent paper. The recommendation of neutral expert witnesses and scholarly tutorials for judges is hardly new. Many defense counsel and judges have argued for them in litigation and in commentary. The first recommendation, of publishing “controversial testimony,” is part of the purpose of this blog. There would be great utility to making expert witness testimony, and analysis thereof, more available for didactic purposes. Perhaps the more egregious testimonial adventures should be republished in professional journals, as Greenland suggests. Greenland qualifies his recommendation with “as space allows,” but space is hardly the limiting consideration in the digital age.

Causation

Professor Greenland correctly points out that causal concepts and conclusions are often essentially contested[12], but his argument might well be incorrectly taken for “anything goes.” More helpfully, Greenland argues that various academic ideals should infuse expert witness testimony. He suggests that greater scholarship, with acknowledgment of all viewpoints, and all evidence, is needed in expert witnessing. 39 Wake Forest Law Rev. at 293.

Greenland’s argument provides an important corrective to the rhetoric of Oreskes, Cranor, Michaels, Egilman, and others on “manufacturing doubt”:

“Never force a choice among competing theories; always maintain the option of concluding that more research is needed before a defensible choice can be made.”

Id. Despite his position in the Denton case, and others, Greenland and all expert witnesses are free to maintain that more research is needed before a causal claim can be supported. Greenland also maintains that expert witnesses should “look past” the conclusions drawn by authors, and base their opinions on the “actual data” on which the statistical analyses are based, and from which conclusions have been drawn. Courts have generally rejected this view, but if courts were to insist upon real expertise in epidemiology and statistics, then the testifying expert witnesses should not be constrained by the hearsay opinions in the discussion sections of published studies – sections which by nature are incomplete and tendentious. See “Follow the Data, Not the Discussion” (May 2, 2010).

Greenland urges expert witnesses and legal counsel to be forthcoming about their assumptions and their uncertainty about conclusions:

“Acknowledgment of controversy and uncertainty is a hallmark of good science as well as good policy, but clashes with the very time limited tasks faced by attorneys and courts.”

39 Wake Forest Law Rev. at 293-4. This recommendation would be helpful in assuring courts that the data may simply not support conclusions sufficiently certain to be submitted to lay judges and jurors. Rosen v. Ciba-Geigy Corp., 78 F.3d 316, 319, 320 (7th Cir. 1996) (“But the courtroom is not the place for scientific guesswork, even of the inspired sort. Law lags science; it does not lead it.”) (internal citations omitted).

Threats to Validity

One of the serious mistakes counsel often make in health effects litigation is to invite courts to believe that statistical significance is sufficient for causal inferences. Greenland emphasizes that validity considerations often are much stronger, and more important considerations than the play of random error[13]:

“For very imperfect data (e.g., epidemiologic data), the limited conclusions offered by statistics must be further tempered by validity considerations.”

*   *   *   *   *   *

“Examples of validity problems include non-random distribution of the exposure in question, non-random selection or cooperation of subjects, and errors in assessment of exposure or disease.”

39 Wake Forest Law Rev. at 302-03. Greenland’s abbreviated list of threats to validity should remind courts that they cannot sniff a p-value below five percent and then safely kick the can to the jury. The literature on evaluating bias and confounding is huge, but Greenland was a co-author on an important recent paper, which needs to be added to the required reading lists of judges charged with gatekeeping expert witness opinion testimony about health effects. See Timothy L. Lash, et al., “Good practices for quantitative bias analysis,” 43 Internat’l J. Epidem. 1969 (2014).
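A minimal sketch of the simplest sort of quantitative bias analysis that Lash and colleagues describe: externally adjusting an observed relative risk for an unmeasured binary confounder under assumed bias parameters. The formula is the standard external-adjustment relation; the numbers plugged in are hypothetical, which is precisely the point of a bias analysis.

```python
# Simple external adjustment of an observed relative risk (RR) for an
# unmeasured binary confounder, a standard quantitative-bias-analysis
# calculation. The bias parameters (confounder prevalence among exposed
# and unexposed, confounder-disease RR) are assumed, hypothetical values.

def adjust_rr(rr_obs, p_conf_exposed, p_conf_unexposed, rr_conf_disease):
    """Divide the observed RR by the confounding bias factor."""
    bias = ((p_conf_exposed * (rr_conf_disease - 1) + 1) /
            (p_conf_unexposed * (rr_conf_disease - 1) + 1))
    return rr_obs / bias

# Observed RR of 1.5; suppose a confounder carrying an RR of 3 for the
# disease is present in 40% of the exposed but only 10% of the unexposed.
rr_adj = adjust_rr(1.5, 0.40, 0.10, 3.0)
print(f"bias-adjusted RR is about {rr_adj:.2f}")
```

Under these assumed parameters the entire observed association is explained by the confounder (the adjusted RR falls to 1.0), which illustrates why validity considerations can swamp any p-value.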


[1] For an influential example of this sparse genre, see James T. Rosenbaum, “Lessons from litigation over silicone breast implants: A call for activism by scientists,” 276 Science 1524 (1997) (describing the exaggerations, distortions, and misrepresentations of plaintiffs’ expert witnesses in silicone gel breast implant litigation, from the perspective of a highly accomplished scientist-physician, who served as a defense expert witness, in proceedings before Judge Robert Jones, in Hall v. Baxter Healthcare Corp., 947 F. Supp. 1387 (D. Or. 1996)). In one attempt to “correct the record” in the aftermath of a case, Greenland excoriated a defense expert witness, Professor Robert Makuch, for stating that Bayesian methods are rarely used in medicine or in the regulation of medicines. Sander Greenland, “The Need for Critical Appraisal of Expert Witnesses in Epidemiology and Statistics,” 39 Wake Forest Law Rev. 291, 306 (2004). Greenland heaped adjectives upon his adversary: “ludicrous claim,” “disturbing,” “misleading expert testimony,” and “demonstrably quite false.” See “The Infrequency of Bayesian Analyses in Non-Forensic Court Decisions” (Feb. 16, 2014) (debunking Prof. Greenland’s claims).

[2] One almost comical example of trying too hard to settle a score occurs in a footnote, where Greenland cites a breast implant case as having been reversed in part by another case in the same appellate court. See 39 Wake Forest Law Rev. at 309 n.68, citing Allison v. McGhan Med. Corp., 184 F.3d 1300, 1310 (11th Cir. 1999), aff’d in part & rev’d in part, United States v. Baxter Int’l, Inc., 345 F.3d 866 (11th Cir. 2003). The subsequent case was not by any stretch of the imagination a reversal of the earlier Allison case; the egregious citation is a legal fantasy. Furthermore, Allison had no connection with the procedures for court-appointed expert witnesses or technical advisors. Perhaps the most charitable interpretation of this footnote is that it was injected by the law review editors or supervisors.

[3] SeeSignificance Levels are Made a Whipping Boy on Climate Change Evidence: Is .05 Too Strict? (Schachtman on Oreskes)” (Jan. 4, 2015).

[4] In addition to the unfair attack on Professor Makuch, see supra, n.1, there is much that some will find “disturbing,” “misleading,” and even “ludicrous” (some of Greenland’s favorite pejorative adjectives) in the article. Greenland repeats in brief his arguments against the legal system’s use of probabilities of causation, which I have addressed elsewhere.

[5] One of Baxter’s expert witnesses appeared to be the late Professor Patricia Buffler.

[6] See 39 Wake Forest Law Rev. at 294-95, citing Baxter Healthcare Corp. v. Denton, No. 99CS00868, 2002 WL 31600035, at *1 (Cal. App. Dep’t Super. Ct. Oct. 3, 2002) (unpublished); Baxter Healthcare Corp. v. Denton, 120 Cal. App. 4th 333 (2004).

[7] Although Greenland cites to a transcript, the citation is to a judicial opinion, and the actual transcript of testimony is not available at the citation given.

[8] See Denton, supra.

[9] 39 Wake Forest L. Rev. at 297.

[10] 39 Wake Forest L. Rev. at 305 (“If it is necessary to prove causation ‛beyond a reasonable doubt’–or be ‛compelled to give up the null’ – then action can be forestalled forever by focusing on any aspect of available evidence that fails to conform neatly with the causal (alternative) hypothesis. And in medical and social science there is almost always such evidence available, not only because of the ‛play of chance’ (the focus of ordinary statistical theory), but also because of the numerous validity problems in human research.”).

[11] See Peter Green, “Letter from the President to the Lord Chancellor regarding the use of statistical evidence in court cases” (Jan. 23, 2002) (writing on behalf of The Royal Statistical Society; “Although many scientists have some familiarity with statistical methods, statistics remains a specialised area. The Society urges you to take steps to ensure that statistical evidence is presented only by appropriately qualified statistical experts, as would be the case for any other form of expert evidence.”).

[12] 39 Wake Forest Law Rev. at 291 (“In reality, there is no universally accepted method for inferring presence or absence of causation from human observational data, nor is there any universally accepted method for inferring probabilities of causation (as courts often desire); there is not even a universally accepted definition of cause or effect.”).

[13] 39 Wake Forest Law Rev. at 302-03 (“If one is more concerned with explaining associations scientifically, rather than with mechanical statistical analysis, evidence about validity can be more important than statistical results.”).


Fixodent Study Causes Lockjaw in Plaintiffs’ Counsel

February 4th, 2015

Litigation Drives Science

Back in 2011, the Fixodent MDL Court sustained Rule 702 challenges to plaintiffs’ expert witnesses. “Hypotheses are verified by testing, not by submitting them to lay juries for a vote.” In re Denture Cream Prods. Liab. Litig., 795 F. Supp. 2d 1345, 1367 (S.D. Fla. 2011), aff’d, Chapman v. Procter & Gamble Distrib., LLC, 766 F.3d 1296 (11th Cir. 2014). The Court found that the plaintiffs had raised a superficially plausible hypothesis, but that they had not verified the hypothesis by appropriate testing[1].

Like dentures to Fixodent, the plaintiffs stuck to their claims, and set out to create the missing evidence. Plaintiffs’ counsel contracted with Dr. Salim Shah and his companies Sarfez Pharmaceuticals, Inc. and Sarfez USA, Inc. (“Sarfez”) to conduct human research in India, to support their claims that zinc in denture cream causes neurological damage.[2] In re Denture Cream Prods. Liab. Litig., Misc. Action 13-384 (RBW), 2013 U.S. Dist. LEXIS 93456, *2 (D.D.C. July 3, 2013). When the defense learned of this study, and of plaintiffs’ counsel’s payments of over $300,000 to support it, they sought discovery of raw data, the study protocol, statistical analyses, and other materials from plaintiffs’ counsel. Plaintiffs’ counsel protested that they did not have all the materials, and directed defense counsel to Sarfez. Although other courts have made counsel produce similar materials from the scientists and independent contractors they engaged, in this case defense counsel followed the trail of documents to the contractor, Sarfez, with subpoenas in hand. Id. at *3-4.

The defense served a Rule 45 subpoena on Sarfez, which produced some, but not all, responsive documents. Procter & Gamble pressed for the missing materials, including study protocols, analytical reports, and raw data. Id. at *12-13. Judge Reggie Walton upheld the subpoena, which sought underlying data and non-privileged correspondence, as within the scope of Rules 26(b) and 45, and not unduly burdensome. Id. at *9-10, *20. Sarfez attempted to argue that the requested materials, listed as email attachments, might not exist, but Judge Walton branded the suggestion “disingenuous.” Attachments to emails should be produced along with the emails. Id. at *12 (citing and collecting cases). Although Judge Walton did not grant a request for forensic recovery of hard-drive data or for sanctions, His Honor warned Sarfez that it might be required to bear the cost of forensic data recovery if it did not comply with the court’s order. Id. at *15, *22.

Plaintiffs Put Their Study Into Play

The study at issue in the subpoena was designed by Frederick K. Askari, M.D., Ph.D., an associate professor of hepatology in the University of Michigan Health System. In re Denture Cream Prods. Liab. Litig., No. 09–2051–MD, 2015 WL 392021, at *7 (S.D. Fla. Jan. 28, 2015). At the instruction of plaintiffs’ counsel, Dr. Askari sought to study the short-term effects of Fixodent on copper absorption in humans. Working in India, Askari conducted the study on 24 participants, who were given a controlled diet for 36 days. Of the 24 participants, 12, randomly selected, received 12 grams of Fixodent per day (containing 204 mg of zinc). Another six participants, randomly selected, were given zinc acetate three times per day (150 mg of zinc), and the remaining six participants received placebo three times per day.
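The three-arm allocation described above can be sketched in a few lines. This is a hypothetical illustration of the design only; the participant numbering, seeding, and assignment logic are invented, and no participant-level data from the actual study are reproduced here.

```python
import random

# Study arms as described in the MDL opinion: 24 participants in all.
ARMS = {
    "Fixodent":     {"n": 12, "daily_zinc_mg": 204},  # 12 g Fixodent per day
    "zinc acetate": {"n": 6,  "daily_zinc_mg": 150},  # dosed three times per day
    "placebo":      {"n": 6,  "daily_zinc_mg": 0},    # dosed three times per day
}

def randomize(participants, arms, seed=0):
    """Randomly assign participants to arms with the fixed sizes above."""
    rng = random.Random(seed)          # fixed seed: illustrative only
    pool = list(participants)
    rng.shuffle(pool)
    assignment, start = {}, 0
    for arm, spec in arms.items():
        assignment[arm] = pool[start:start + spec["n"]]
        start += spec["n"]
    return assignment

groups = randomize(range(1, 25), ARMS)
assert sum(len(members) for members in groups.values()) == 24
```

The fixed 12/6/6 split mirrors the opinion's description; a real trial would document the randomization scheme in the protocol, which is part of what the later discovery dispute was about.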

A study protocol was approved by an independent group[3], id. at *9, and the study was supposed to be conducted with a double blind. Id. at *7. Not surprisingly, those participants who received doses of Fixodent or zinc acetate had higher urinary levels of zinc (pee < 0.05). The important issue, however, was whether the dietary zinc levels affected copper excretion in a way that would support plaintiffs’ claims that copper levels were lowered sufficiently by Fixodent to cause a syndromic neurological disorder. The MDL Court ultimately concluded that plaintiffs’ expert witnesses’ opinions on general causation were not sufficiently supported to satisfy the requirements of Rule 702, and upheld defense challenges to those expert witnesses. In doing so, the MDL Court had much of interest to say about case reports, weight of the evidence, and other important issues. This post, however, concentrates on the deviations of one study, commissioned by plaintiffs’ counsel, from the scientific standard of care. The Askari “research” makes for a fascinating case study of how not to conduct a study in a litigation caldron.

Non-Standard Deviations

The First Deviation – Changing the Ascertainment Period After the Data Are Collected

The protocol apparently identified a primary endpoint to be:

“the mean increase in [copper 65] excretion in fecal matter above the baseline (mg/day) averaged over the study period … to test the hypothesis that the release of [zinc] either from Fixodent or Zinc Acetate impairs [copper 65] absorption as measured in feces.”

The study outcome, on the primary end point, was clear. The plaintiffs’ testifying statistician, Hongkun Wang, stated in her deposition that the fecal copper (whether isotope Cu63 or Cu65) was not different across the three groups (Fixodent, zinc acetate, and placebo). Id. at *9[4]. Even Dr. Askari himself admitted that the total fecal copper levels were not increased in the Fixodent group compared with the placebo control group. Id. at *9.[5]

Apparently after obtaining the data, and finding no difference in the pre-specified end point of average fecal copper levels between Fixodent and placebo groups, Askari turned to a new end point, measured in a different way, not described in the protocol as the primary end point.

The Second Deviation – Changing Primary End Point After the Data Are Collected

In the early (days 3, 4, and 5) and late (days 31, 32, and 33) part of the Study, participants received a dose of purified copper 65[6] to help detect the “blockade of copper.” Id. at *8. The participants’ fecal copper 65 levels were compared to their naturally occurring copper 63 levels. According to Dr. Askari:

“if copper is being blocked in the Fixodent and zinc acetate test subjects from exposure to the zinc in the test product (Fixodent) and positive control (zinc acetate), the ratio of their fecal output of copper 65 as compared to their fecal output of copper 63 would increase relative to the control subjects, who were not dosed with zinc. In short, a higher ratio of copper 65 to copper 63 reflects blocking of copper.”

Id.

Askari analyzed the ratio of the two copper isotopes (Cu65/Cu63), limiting the period of observation to study days 31 to 33. Id. at *9. Askari thus changed the outcome to be measured, the timing of the measurement, and the manner of measurement (average over the entire period versus amount on days 31 to 33). On this post hoc, non-prespecified end point, Askari claimed to have found “significant” differences.
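The logic of the substituted end point can be made concrete. The sketch below uses invented numbers, not the study's actual measurements, purely to illustrate Dr. Askari's stated theory that zinc-induced blockade would raise the fecal Cu65/Cu63 ratio in dosed subjects relative to placebo.

```python
def isotope_ratio(cu65_mg, cu63_mg):
    """Ratio of the dosed tracer (Cu65) to the naturally dominant Cu63 in feces."""
    return cu65_mg / cu63_mg

# Per the theory quoted above: if zinc blocks copper absorption, the dosed
# Cu65 passes through unabsorbed, so its fecal output rises relative to Cu63.
# The values below are hypothetical, chosen only to show the comparison.
blocked   = isotope_ratio(cu65_mg=0.60, cu63_mg=1.00)  # zinc-exposed subject
unblocked = isotope_ratio(cu65_mg=0.45, cu63_mg=1.00)  # placebo subject
assert blocked > unblocked  # a higher ratio would signal "blockade"
```

The substantive objection, of course, was not the arithmetic but the timing: this ratio became the end point only after the protocol's averaged fecal-copper end point came up empty.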

The MDL Court expressed its skepticism and concern over the difference between the protocol’s specified end point, and one that came into the study only after the data were obtained and analyzed. The plaintiffs claimed that it was their (and Askari’s) intention from the initial stages of designing the Fixodent Blockade Study to use the Cu65/Cu63 ratio as the primary end point. According to the plaintiffs, the isotope ratio was simply better articulated and “clarified” as the primary end point in the final report than it was in the protocol. The Court was not amused or assuaged by the plaintiffs’ assurances. The study sponsor, Dr. Salim Shah, could not point to a draft protocol that indicated the isotope ratio as the end point; nor could Dr. Shah identify a request for this analysis by Wang until after the study was concluded. Id. at *9.[7]

Ultimately, the Court declared that whether the protocol was changed post hoc, after the primary end point yielded a disappointing analysis, or the isotope ratio was carelessly omitted from the protocol, the design or conduct of the study was “incompatible with reliable scientific methodology.”

The Third Deviation – Changing the Standard of “Significance” After the Data Are Collected and P-Values Are Computed

The protocol for the Blockade study called for a pre-determined Type I error rate (p-value) of no more than 5 percent.[8] Id. at *10. The difference in the isotope ratio showed an attained level of significance probability of 5.7 percent, and thus even the post hoc end point missed the prespecified level of significance. The final protocol changed the value of “significance” to 10 percent, to permit the plaintiffs to declare a “statistically significant” result. Dr. Wang admitted in deposition that she doubled the acceptable level of Type I error only after she obtained the data and calculated the p-value of 0.057. Id. at *10.[9]
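The goal-post move described above is easy to state mechanically. A minimal sketch, using the p-value and thresholds reported in the opinion: the same observed result fails the protocol's pre-specified threshold but "succeeds" once the threshold is doubled after the fact.

```python
def significant(p_value, alpha):
    """Declare significance when the attained p-value falls below alpha."""
    return p_value < alpha

p_observed = 0.057  # attained significance probability reported in the opinion

assert not significant(p_observed, alpha=0.05)  # protocol's pre-specified level
assert significant(p_observed, alpha=0.10)      # level adopted post hoc

# Doubling alpha after seeing the data doubles the tolerated rate of
# false-positive findings, which is why picking the threshold to fit the
# result struck the Court as contrivance rather than analysis.
```

The point is not that 0.10 can never be a defensible Type I error rate; it is that the rate must be fixed before the data are analyzed, or the reported "significance" loses its meaning.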

The Court found that this deliberate moving of the statistical goal post reflected a “lack of objectivity and reliability,” which smacked of contrivance[10].

The Court found that the study’s deviations from the protocol demonstrated a lack of objectivity. The inadequacy of the Study’s statistical analysis plan supported the Court’s conclusion that Dr. Askari’s supposed finding of a “statistically significant” difference in fecal copper isotope ratio between Fixodent and placebo group participants was “not based on sufficiently reliable and objective scientific methodology” and thus could not support plaintiffs’ expert witnesses’ general causation claims.

The Fourth Deviation – Failing to Take Steps to Preserve the Blind

The protocol called for a double-blinded study, with neither the participants nor the clinical investigators knowing which participant was in which group. Rather than delivering capsules that looked similar to all three groups, each group received starkly different-looking capsules. Id. at *11. The capsules for one set were apparently so large that the investigators worried whether the participants would comply with the dosing regimen.

The Fifth Deviation – Failing to Take Steps to Keep Biological Samples From Becoming Contaminated

Documents and emails from Dr. Shah acknowledged that there had been “difficulties in storing samples at appropriate temperature.” Id. at *11. Fecal samples were “exposed to unfrozen and undesirable temperature conditions.” Dr. Shah called for remedial steps from the Study manager, but there was no documentation that such steps were taken to correct the problem. Id.

The Consequences of Discrediting the Study

Dr. Askari opined that the Study, along with other evidence, shows that Fixodent can cause copper deficiency myeloneuropathy (“CDM”). The plaintiffs, of course, argued that the Defendants’ criticisms of the Fixodent Study’s methodology went merely to the “weight rather than admissibility.” Id. at *9. Askari’s study was but one leg of the stool, but the defense’s thorough discrediting of the study was an important step in collapsing the support for the plaintiffs’ claims. As the MDL Court explained:

“The Court cannot turn a blind eye to the myriad, serious methodological flaws in the Fixodent Blockade Study and conclude they go to weight rather than admissibility. While some of these flaws, on their own, may not be serious enough to justify exclusion of the Fixodent Blockade Study; taken together, the Court finds Fixodent Blockade Study is not “good science,” and is not admissible. Daubert, 509 U.S. at 593 (internal quotation marks and citation omitted).”

Id. at *11.

A study, such as the Fixodent Blockade Study, is not itself admissible, but the deconstruction of the study upon which plaintiffs’ expert witnesses relied led directly to the Court’s decision to exclude those witnesses. The Court omitted any reference to Federal Rule of Evidence 703, which addresses the conditions under which facts and data, otherwise inadmissible, may be relied upon by expert witnesses in reaching their opinions.


 

[1] See “Philadelphia Plaintiff’s Claims Against Fixodent Prove Toothless” (May 2, 2012); Jacoby v. Rite Aid Corp., 2012 Phila. Ct. Com. Pl. LEXIS 208 (2012), aff’d, 93 A.3d 503 (Pa. Super. 2013); “Pennsylvania Superior Court Takes The Bite Out of Fixodent Claims” (Dec. 12, 2013).

[2] See “Using the Rule 45 Subpoena to Obtain Research Data” (July 24, 2013).

[3] The group was identified as the Ethica Norma Ethical Committee.

[4] Citing Wang Dep. at 56:7–25 (Aug. 13, 2013), and Wang Analysis of Fixodent Blockade Study [ECF No. 2197–56] (noting “no clear treatment effect on Cu63 or Cu65”).

[5] Askari Dep. at 69:21–24, June 20, 2013.

[6] Copper 65 is not a typical tracer; it is not radioactive. Naturally occurring copper consists almost exclusively of two stable (non-radioactive) isotopes, Cu65 about 31 percent, Cu63 about 69 percent. See, e.g., Manuel Olivares, Bo Lönnerdal, Steve A Abrams, Fernando Pizarro, and Ricardo Uauy, “Age and copper intake do not affect copper absorption, measured with the use of 65Cu as a tracer, in young infants,” 76 Am. J. Clin. Nutr. 641 (2002); T.D. Lyon, et al., “Use of a stable copper isotope (65Cu) in the differential diagnosis of Wilson’s disease,” 88 Clin. Sci. 727 (1995).

[7] Shah Dep. at 87:12–25, 476:2–536:12, 138:6–142:12 (June 5, 2013).

[8] The reported decision leaves unclear how the analysis would proceed, whether by ANOVA for the three groups, or t-tests, and whether there was multiple testing.

[9] Wang Dep. at 151:13–152:7; 153:15–18.

[10] 2015 WL 392021, at *10, citing Perry v. United States, 755 F.2d 888, 892 (11th Cir. 1985) (“A scientist who has a formed opinion as to the answer he is going to find before he even begins his research may be less objective than he needs to be in order to produce reliable scientific results.”); Rink v. Cheminova, Inc., 400 F.3d 1286, 1293 n. 7 (11th Cir.2005) (“In evaluating the reliability of an expert’s method … a district court may properly consider whether the expert’s methodology has been contrived to reach a particular result.” (alteration added)).