For your delectation and delight, desultory dicta on the law of delicts.

Traditional, Frequentist Statistics Still Hegemonic

March 25th, 2017

The Defense Fallacy

In civil actions, defendants and their legal counsel sometimes argue that the absence of statistical significance across multiple studies requires a verdict of “no cause” for the defense. This argument is fallacious, as can be seen where there are many studies, say eight or nine, all of which consistently find elevated risk ratios, but with p-values slightly higher than 5%. The probability that eight studies, free of bias, would all find an elevated risk ratio if there were no true association, regardless of the individual studies’ p-values, is itself very small. If the studies were amenable to meta-analysis, the summary estimate of the risk ratio in this hypothetical would itself likely be highly statistically significant.
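The arithmetic behind this point can be sketched in a few lines of Python, using purely hypothetical numbers (eight studies, each reporting RR = 1.3 with a 95% CI of 0.98–1.72; none drawn from any actual litigation record):

```python
import math

# Hypothetical numbers for illustration only: eight unbiased studies,
# each reporting RR = 1.3 with a 95% CI of 0.98-1.72 (so each study,
# taken alone, is "not statistically significant" at the 5% level).
log_rr = math.log(1.3)
se = (math.log(1.72) - math.log(0.98)) / (2 * 1.96)  # SE recovered from the CI

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

# Each study alone just misses the 5% threshold:
print(f"single-study p = {two_sided_p(log_rr / se):.3f}")   # ~0.07

# But under the null (true RR = 1), each unbiased study is about equally
# likely to fall above or below 1.0, so eight elevated results in a row
# have probability (1/2)**8:
print(f"P(all 8 elevated | null) = {0.5 ** 8:.4f}")          # ~0.004

# Fixed-effect (inverse-variance) pooling of the eight identical studies:
pooled_se = se / math.sqrt(8)
print(f"pooled p = {two_sided_p(log_rr / pooled_se):.1e}")   # far below 0.05
```

Each study alone falls just short of the conventional threshold, yet eight consistently elevated estimates are wildly improbable under the null, and the pooled estimate is highly statistically significant.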

The Plaintiffs’ Fallacy

The plaintiffs’ fallacy derives from instances, such as the hypothetical one above, in which statistical significance, taken as a property of individual studies, is lacking. Even though we can hypothesize such instances, plaintiffs fallaciously extrapolate from them to the conclusion that statistical significance, or any other measure of sampling estimate precision, is unnecessary to support a conclusion of causation.

In courtroom proceedings, epidemiologist Kenneth Rothman is frequently cited by plaintiffs as having shown or argued that statistical significance is unimportant. For instance, in the Zoloft multi-district birth defects litigation, plaintiffs argued in a motion for reconsideration of the exclusion of their epidemiologic witness that the trial court had failed to give appropriate weight to the Supreme Court’s decision in Matrixx Initiatives, Inc. v. Siracusano, 563 U.S. 27 (2011), as well as to the Third Circuit’s invocation of the so-called “Rothman” approach in a Bendectin birth defects case, DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941 (3d Cir. 1990). According to the plaintiffs’ argument, their excluded epidemiologic witness, Dr. Anick Bérard, had used this approach in arriving at her novel conclusion that sertraline causes virtually every kind of birth defect.

The Zoloft plaintiffs did not call Rothman as a witness; nor did they even present an expert witness to explain what Rothman’s arguments were. Instead, plaintiffs’ counsel sneaked some references and vague conclusions into their cross-examinations of defense expert witnesses, and submitted snippets from Rothman’s textbook, Modern Epidemiology.

If the plaintiffs had called Dr. Rothman to testify, he probably would have insisted that statistical significance is not a criterion for causation. Such insistence is not as helpful to plaintiffs in cases such as the Zoloft birth defects cases as their lawyers might have thought or hoped. Consider, for instance, the cases in which causal inferences are arrived at without formal statistical analysis; those instances are usually not relevant to mass tort litigation, which involves a prevalent exposure and a prevalent outcome.

Rothman also would have likely insisted that consideration of random variation and bias is essential to the assessment of causation, and that many apparently or nominally statistically significant associations do not and cannot support valid inferences of causation. Furthermore, he might have been given the opportunity to explain that his criticisms of significance testing are directed as much to the creation of false-positive as of false-negative results in observational epidemiology. In keeping with his publications, Rothman would have challenged strict significance testing with p-values, as opposed to the use of sample statistical estimates in conjunction with confidence intervals. The irony of the Zoloft case, and of many other litigations, was that the defense was not using significance testing in the way that Rothman had criticized; rather, the plaintiffs were over-endorsing statistical significance that was nominal, plagued by multiple testing, and inconsistent.

Judge Rufe, who presided over the Zoloft MDL, pointed out that the Third Circuit in DeLuca had never affirmatively endorsed Professor Rothman’s “approach,” but had reversed and remanded the Bendectin case to the district court for a hearing under Rule 702:

“by directing such an overall evaluation, however, we do not mean to reject at this point Merrell Dow’s contention that a showing of a .05 level of statistical significance should be a threshold requirement for any statistical analysis concluding that Bendectin is a teratogen regardless of the presence of other indicia of reliability. That contention will need to be addressed on remand. The root issue it poses is what risk of what type of error the judicial system is willing to tolerate. This is not an easy issue to resolve and one possible resolution is a conclusion that the system should not tolerate any expert opinion rooted in statistical analysis where the results of the underlying studies are not significant at a .05 level.”

2015 WL 314149, at *4 (quoting from DeLuca, 911 F.2d at 955). And in DeLuca, after remand, the district court excluded the DeLuca plaintiffs’ expert witnesses, and granted summary judgment, based upon the dubious methods employed by the plaintiffs’ expert witnesses (including the infamous Dr. Done, and Shanna Swan), in cherry-picking data, recalculating risk ratios in published studies, and ignoring bias and confounding in studies. On subsequent appeal, the Third Circuit affirmed the judgment for Merrell Dow. DeLuca v. Merrell Dow Pharms., Inc., 791 F. Supp. 1042 (D.N.J. 1992), aff’d, 6 F.3d 778 (3d Cir. 1993).

Judge Rufe similarly rebuffed the plaintiffs’ use of the Rothman approach, their reliance upon Matrixx, and their attempt to banish consideration of random error in the interpretation of epidemiologic studies. In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., MDL No. 2342; 12-md-2342, 2015 WL 314149 (E.D. Pa. Jan. 23, 2015) (Rufe, J.) (denying PSC’s motion for reconsideration). See “Zoloft MDL Relieves Matrixx Depression” (Feb. 4, 2015).

Some Statisticians’ Errors

Recently, Dr. Rothman and three other epidemiologists set out to track the change, over time, from 1975 to 2014, in the use of various statistical methodologies. Andreas Stang, Markus Deckert, Charles Poole & Kenneth J. Rothman, “Statistical inference in abstracts of major medical and epidemiology journals 1975–2014: a systematic review,” 32 Eur. J. Epidem. 21 (2017) [cited below as Stang]. They made clear that their preferred methodological approach was to avoid strictly dichotomous null hypothesis significance testing (NHST), which has evolved from Fisher’s significance testing (ST) and Neyman’s null hypothesis testing (NHT), in favor of the use of estimation with confidence intervals (CI). The authors conducted a meta-study, that is, a study of studies, to track trends in the use of NHST, ST, NHT, and CI reporting in the major bio-medical journals.

Unfortunately, the authors limited their data and analysis to abstracts, which makes their results very likely misleading and incomplete. Even when abstracts reported using so-called CI-only approaches, the study authors may well have reasoned in their full papers that point estimates with CIs that spanned no association were “non-significant.” Similarly, study authors who found elevated risk ratios with very wide confidence intervals may well have properly acknowledged that their studies did not provide credible evidence of an association. See W. Douglas Thompson, “Statistical criteria in the interpretation of epidemiologic data,” 77 Am. J. Public Health 191, 191 (1987) (discussing the over-interpretation of skimpy data).
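The dichotomy lurking in “CI-only” reporting is easy to make explicit: a 95% confidence interval for a risk ratio excludes 1.0 exactly when the corresponding two-sided p-value against the null falls below 0.05. A minimal sketch, with hypothetical intervals:

```python
# A 95% CI for a risk ratio excludes 1.0 exactly when the two-sided
# p-value against RR = 1 is below 0.05 -- so "CI-only" reporting still
# encodes the same dichotomy that readers use for significance.
def ci_excludes_null(rr, lo, hi):
    """True when a (lo, hi) confidence interval excludes the null RR of 1.0."""
    return not (lo <= 1.0 <= hi)

print(ci_excludes_null(1.3, 0.98, 1.72))  # False: spans 1.0, "non-significant"
print(ci_excludes_null(1.3, 1.05, 1.61))  # True: excludes 1.0
```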

Rothman and colleagues found that, while a few epidemiologic journals showed a rising prevalence of CI-only reports in abstracts, for many biomedical journals the NHST approach remained more common. Interestingly, at three of the major clinical medical journals, the Journal of the American Medical Association, the New England Journal of Medicine, and Lancet, NHST prevailed over the almost four decades of observation.

The clear implication of Rothman’s meta-study is that consideration of significance probability, whether or not treated as a dichotomous outcome, and whether expressed as a p-value or as a point estimate with a confidence interval, is absolutely critical to how biomedical research is conducted, analyzed, and reported. In Rothman’s words:

“Despite the many cautions, NHST remains one of the most prevalent statistical procedures in the biomedical literature.”

Stang at 22. See also David Chavalarias, Joshua David Wallach, Alvin Ho Ting & John P. A. Ioannidis, “Evolution of Reporting P Values in the Biomedical Literature, 1990-2015,” 315 J. Am. Med. Ass’n 1141 (2016) (noting the absence of the use of Bayes’ factors, among other techniques).

There is one aspect of the Stang article that is almost Trump-like in its citation of an inappropriate, unknowledgeable source, and its treatment of that source’s author as having meaningful knowledge of the subject. As part of their rhetorical goals, Stang and colleagues declare that:

“there are some indications that it has begun to create a movement away from strict adherence to NHT, if not to ST as well. For instance, in the Matrixx decision in 2011, the U.S. Supreme Court unanimously ruled that admissible evidence of causality does not have to be statistically significant [12].”

Stang at 22. Whence comes this claim? Footnote 12 takes us to what could well be fake news of a legal holding, an article by a statistician about a legal case:

Joseph L. Gastwirth, “Statistical considerations support the Supreme Court’s decision in Matrixx Initiatives v. Siracusano,” 52 Jurimetrics J. 155 (2012).

Citing a secondary source when the primary source is readily available, and is itself the very thing at issue, seems like poor scholarship. Professor Gastwirth is a statistician, not a lawyer, and his exegesis of the Supreme Court’s decision is wildly off target. As any first-year law student could discern, the Matrixx case could not have been about the admissibility of evidence because the case had been dismissed on the pleadings, and no evidence had ever been admitted or excluded. The only issue on appeal was the adequacy of the allegations, not the admissibility of evidence.

Although the Court managed to muddle its analysis by wandering off into dicta about causation, the holding of the case is that alleging causation was not required to plead a case of materiality for a securities fraud case. Having dispatched causality from the case, the Court had no serious business in setting the considerations for alleging in pleadings or proving at trial the elements of causation. Indeed, the Court made it clear that its frolic and detour into causation could not be taken seriously:

“We need not consider whether the expert testimony was properly admitted in those cases [cited earlier in the opinion], and we do not attempt to define here what constitutes reliable evidence of causation.”

Matrixx Initiatives, Inc. v. Siracusano, 563 U.S. 27, 131 S.Ct. 1309, 1319 (2011).

The words “admissible” and “admissibility” never appear in the Court’s opinion, and the above quotation makes explicit that admissibility was not considered. Laughably, the Court went on to cite three cases as examples of supposed causation opinions in the absence of statistical significance. Two of the three were specific causation, differential-etiology cases that involved known general causation. The third case involved a claim of birth defects from contraceptive jelly, in which the plaintiffs’ expert witnesses actually relied upon statistically significant (but thoroughly flawed and invalid) associations.1

When it comes to statistical testing the legal world would be much improved if lawyers actually and carefully read statistics authors, and if statisticians and scientists actually read court opinions.

Science Day Should Be Every Day in Our Courtrooms — Part II

March 24th, 2017

This post and the previous one are an expansion upon a post that I wrote with Dr. David Schwartz, of Innovative Science Solutions, LLC. Dr. Schwartz is a talented scientist with whom I had the privilege and pleasure to work at McCarter & English, before he left to become an independent scientific consultant. Dr. Schwartz is one of the founding partners of his firm, which focuses on helping lawyers with the scientific issues in complex health effects litigation. Our earlier post can be found on the Courtroom View Network’s website. “Guest Analysis: Key Takeaways from Recent Talc Powder ‘Science Day’ Hearing in California,” Courtroom View Network (Mar 24, 2017).


Talc Science Goes Bicoastal

This year, two trial judges have entertained Science Days in talc cases, on both coasts of the United States. In the federal talc litigation, MDL 2738,1 Judge Freda L. Wolfson conducted a Science Day on January 26, 2017. In the coordinated California state court talc cases, Judge Maren E. Nelson, of the Superior Court of California, Los Angeles County, conducted a Science Day, on March 7, 2017.2

Federal Talc MDL 2738

In the federal cases, Johnson & Johnson, one of the defendants, initiated the Science Day, in November 2016, when it asked Judge Wolfson to set aside a day in “which the parties and their experts can outline their positions/arguments regarding the medical and science issues at play.”3 In Case Management Order No. 1 (Jan. 23, 2017), Judge Wolfson apparently agreed, and the parties had their talc Science Day on January 26, 2017.4 The Science Day took up over five hours of presentations before Judge Freda L. Wolfson and Judge Lois H. Goodman.5

California Coordinated Docket

In the California cases,6 plaintiffs’ counsel filed a formal motion, in early December 2016, to request a Science Day tutorial. The plaintiffs’ motion requested that each side be permitted two hours to present its views of the scientific evidence in support of its litigation positions on causation and liability in the talc ovarian cancer cases. The plaintiffs argued that a Science Day would be of “significant benefit to the Court and the parties.”7 Judge Nelson granted the request, and held the Science Day on March 7, 2017.

Courtroom View Network (CVN)

The proceedings in California were recorded videographically, and are available through Courtroom View Network (CVN). Johnson & Johnson opposed televising the Science Day proceedings, on the procedural ground that CVN had not filed and served the appropriate motion. Johnson & Johnson also argued a substantive ground that the proceedings were not a formal trial, and that televising “would not confer any benefit on the public, the parties, or the Court, let alone one that outweighs the significant concerns posed by such a broadcast.”8

Whatever the merits of J & J’s procedural ground, its substantive grounds seem dubious. The importance of the Science Day proceedings transcends the pecuniary interests of the parties to the litigation. First, the presentations provide the empirical bases for other courts and lawyers to evaluate the procedures employed. Second, lawyers and judges generally, outside the talc litigation, can learn a great deal from the strengths, weaknesses, and mistakes of the participants in the televised proceedings. Many of the scientific issues that pervade the talc litigation recur frequently in other mass tort litigations in the United States and abroad. Finally, and perhaps most important, the talc litigation involves claims of huge import to the public generally. For better or worse, litigation has become an adjunct to regulation in the United States. If the plaintiffs’ claims are true, then there has been a serious failure of national and international regulatory agencies and scientific organizations in evaluating the evidentiary record. If the defendants’ claims are true, then the plaintiffs’ lawyers have misunderstood and distorted the basic process of synthesizing evidence and arriving at conclusions of causation. Either way, it behooves the public to understand why one side is wrong.

Evaluation of the California Talc Litigation Science Day

Plaintiffs’ Presentations

The presentation by the plaintiffs’ lawyers was eerily reminiscent of the scientific case made by plaintiffs in the silicone breast implant litigation. Their arguments ranged from highlighting anecdotal evidence to emphasizing the implicitly sinister nature of talc migration from the vaginal opening to the ovaries. Plaintiffs’ counsel focused heavily on the alleged role of talc in the inflammatory process, and on strong (unsubstantiated) implications that anything that enhances inflammation must necessarily cause cancer.

As one would expect, plaintiffs’ counsel placed strong emphasis on the published epidemiologic studies linking talc exposure to ovarian cancer. It is important to highlight that the majority of the studies showing an association between talc and ovarian cancer were, by design, case-control studies (as opposed to cohort studies). Plaintiffs’ counsel offered very little distinction between these two study designs and, instead, tried to make the case that the sheer volume of studies made their argument.

Finally, it should be noted that at many times throughout the plaintiffs’ presentations, the presenters veered off into non-scientific, ad hominem attacks against the industry, and against activities that they tried to paint as venal or unseemly. Defense counsel objected throughout, on what appeared to be first amendment grounds protecting the defendants’ ability to speak to agencies about the scientific evidence. For example, the last presenter for the plaintiffs described alleged industry “lobbying” efforts at the NTP. Defense counsel objected on first amendment grounds, and the judge semi-sustained the objection on the basis that the material had little or nothing to do with the science. The plaintiffs’ emphasis on “lobbying” contained no examples of misrepresentations of scientific data. See “Talc Litigation in Missouri – Show Me the Law and the Evidence” (Feb. 24, 2017).

Defendants’ Presentations

In general, the defense presentations were more structured, coherent, substantial, and rigorously scientific. For example, unlike the plaintiffs’ graphics, many of the defense slides could stand on their own in a scientific or medical society presentation. The defense lawyers attempted to provide a solid foundation for the judge on the different types of ovarian cancer as well as the myriad uncertainties that exist in terms of the known causes of the condition. Many of the slides contained direct quotes from notable scientists and regulators on topics that were directly relevant to answering critical questions in the litigation.

Epidemiology and Specific Causation

Nevertheless, the defense presentations were not without their problems. Consider the following quotation from an article by Graham Colditz, used in one of the defense PowerPoint slides:9

“The fundamental object of epidemiology is to estimate the population average risk of the disease. Risk is a population measure, not an individual disease measure.”

Colditz has served as an expert witness on epidemiology for plaintiffs in talc and other litigations, and the defense no doubt believed that it could make its point in a rhetorically powerful way by quoting him. The problem starts with the quotation’s failure to make the defense’s point. Risk is a measure of relative proportions in the sample, used to estimate the population measure. To say that risk is a group measure, however, does not mean that there are no reasonable inferences from the group measure to the individual member of the sample or population.

The defense seems to want to argue that even if there were an increased risk not explained by chance, bias, or confounding, that measure of risk would not tell us anything about what caused an individual claimant’s ovarian cancer.

A fuller quotation might even have helped the defense because Colditz seems intent on undermining not just the use of group measures of risk as an individual variable, but also the use of the measure to support an inference about individuals:

The fundamental object of epidemiology is to estimate the population average risk of disease. Risk is a population measure, not an individual measure. Epidemiology does not estimate individual levels of risk, nor does it perfectly predict individual likelihood of disease. As noted by Rose, epidemiology does not describe why an individual case of cancer arises in the population but rather the population burden of cancer.14 In his article in this issue of the Journal, Begg ignores this principle and uses the term “risk” as an individual-level variable.15

The fuller quotation points to a disagreement in which another epidemiologist was willing to use risk to describe an individual attribute; but the central point remains Colditz’s assertion that risk is a group measure.

Colditz, at least in this article, does not claim that the group measure of risk was irrelevant to prospective individual predictions or retrospective individual attributions. Interestingly, Graham Colditz has elsewhere asserted that an increased risk of disease cannot be translated into the “but-for” standard of causation10:

“Knowledge that a factor is associated with increased risk of disease does not translate into the premise that a case of disease will be prevented if a specific individual eliminates exposure to that risk factor. Disease pathogenesis at the individual level is extremely complex.”

The defense may have wanted to highlight this assertion, even recognizing that it is controversial, and quite dependent upon the magnitude of the measured risk.

In attempting to make its point with a quotation from the plaintiffs’ own expert (Dr. Graham Colditz), the defense oversimplified a much more complex issue. While increased or relative risk is indeed a measure of incidence rates used to estimate rates in the broader population, this aspect of relative risk as a measure does not mean that no reasonable inferences can be made from the group measure to the individual member of the sample or population. The defense seems to want to make the seemingly unreasonable point that even if an increased risk were appropriately demonstrated by the epidemiology, that measure of risk would not tell us anything about what caused an individual claimant’s ovarian cancer. This point might be correct when the magnitude of the increased risk is small (as is alleged in the talc ovarian cancer litigation), but the sweeping generality of the defense’s assertion is jarring.

Back in the 1960s and 1970s, tobacco companies attempted to rebut inferences of individual causation, despite scientific consensus on general causation, and relative risks of 20 to 30, and more for lung cancer in smoking versus non-smoking groups. The tobacco companies’ claim of the irrelevancy of epidemiology to inferring specific causation was not particularly credible when the population attributable risk was 95 percent and greater.
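The “95 percent and greater” figure follows from simple arithmetic on the relative risk: the attributable fraction among the exposed is (RR − 1)/RR. A short sketch:

```python
# The attributable fraction among the exposed follows directly from the
# relative risk: AF = (RR - 1) / RR.  At the relative risks reported for
# smoking and lung cancer (20 to 30 and more), the arithmetic behind the
# "95 percent and greater" figure is straightforward.
def attributable_fraction(rr):
    """Fraction of disease among the exposed attributable to the exposure."""
    return (rr - 1) / rr

for rr in (2, 10, 20, 30):
    print(f"RR = {rr:>2}: AF = {attributable_fraction(rr):.0%}")
```

At RR = 20 the attributable fraction is 95%; at RR = 30 it approaches 97%; even at RR = 2 it is 50%, which is why the magnitude of the measured risk matters so much to inferences of specific causation.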

Even at lower relative risks, measures of risk in epidemiologic studies and clinical trials are used to predict individual responses to treatments, to lifestyle interventions, and to lifestyle and other risk factors. Of course, there is always potential heterogeneity in the sample and in the population, which should be acknowledged; but when the studies are multivariate, with inclusion of the known causes and potential risk factors, scientists and physicians routinely use these group measures of risks and benefits to make predictions about individuals.

Consider a man, seriously overweight, who goes to see his internist. His physician tells him,

“look, in populations of overweight men, just like yourself, more men die of heart attack and stroke, and they die of these diseases at an early age, and suffer more morbidity and disability from them, than in groups of men who are not overweight; but don’t worry, that has NOTHING to do with you. We don’t know your risk, so go right on eating candy bars for breakfast, and studiously avoiding exercise.”

Of course, no sane, competent physician would advise the obese patient in this manner. Now, I understand rhetorically why the defense might want to capitalize on Colditz’ statement, but the end result appears to mislead the intended audience. The rejection of probabilistic inferences is still occasionally sanctioned by courts11, but more typically, such inferences are permitted when not conjectural.

Defense’s Misleading Claim that Case-Control Studies are Smaller than Cohort Studies

In its Science Day presentation, the defense asserted that a disadvantage of case-control studies is, among other things, their “small size.” In the same vein, the defense claimed that an advantage of cohort studies is their “large size.” The defense provided no supporting citations for its contention about the relative size of the two kinds of analytical epidemiologic studies.12

In his oral comments, the defense presenter noted the size disparity between the case-control and the cohort studies as a reason to distrust the results of the case-control studies on talc exposure and ovarian cancer. The presenter leaned in and emphasized that the cohort studies are huge, some with hundreds of thousands of women.

Now, there are important qualitative differences between case-control and cohort studies, with respect to recall bias and the validity of control groups. To be sure, and to be fair, the defense made these points accurately. But the defense’s invidious comparison of the size of the two types of studies ignores that case-control studies are statistically much more efficient.

As the defense presented the matter, case-control studies are placed lower on the “hierarchy of evidence” than cohort studies. For this point, the defense did present a supporting citation13, and its claim is generally correct, but epidemiologists recognize that a well-conducted case-control study may well trump a cohort study. Case-control studies are often ranked below cohort studies because of their greater potential for systematic bias, the inherent difficulty in selecting appropriate controls, and because their measure of risk, the odds ratio, is at best an estimate of the relative risk. Comparing the size of the “cases” group in a case-control study with the size of the cohort in a cohort study is not a valid comparison.

A case-control study may be based upon hundreds of cases of ovarian cancer, a number that would require a huge cohort study to accrue. Furthermore, the size of a cohort study can be highly misleading because recruitment and inception into the cohort often take place at a young age, when the rate of ovarian cancer is very low. The efficiency of the case-control design is reflected in the narrow confidence intervals seen in many of the published papers; some of these confidence intervals are as narrow as those generated by analysis of data from “larger” cohort studies.14 Study size ultimately relates to the precision of the various studies’ point estimates of measured risk, not to the accuracy of their measurements. The statistical efficiency of the ovarian cancer talc case-control studies becomes important when rare disease subtypes are considered, or when the interaction between genotype, exposure, and cancer outcome needs to be examined.
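The efficiency point can be illustrated with Woolf’s approximation for the standard error of a log odds ratio from a 2×2 table, sqrt(1/a + 1/b + 1/c + 1/d), computed here on purely hypothetical tables (not data from any talc study):

```python
import math

# Woolf's formula: the SE of the log odds ratio from a 2x2 table is
#   sqrt(1/a + 1/b + 1/c + 1/d),
# so precision is driven by the smallest cells -- in practice, the cases.
def se_log_or(a, b, c, d):
    """Woolf standard error of ln(OR) for a 2x2 table (a,b = exposed cells)."""
    return math.sqrt(1/a + 1/b + 1/c + 1/d)

# Hypothetical case-control study: 600 cases and 600 controls (1,200 subjects).
cc = se_log_or(240, 200, 360, 400)

# Hypothetical cohort: 200,000 women followed, but a rare outcome yields
# only ~600 cases; the huge non-case cells barely improve precision.
cohort = se_log_or(260, 79740, 340, 119660)

print(f"case-control SE(ln OR) = {cc:.3f}")
print(f"cohort       SE(ln OR) = {cohort:.3f}")
```

On these made-up numbers, the 1,200-subject case-control study achieves a standard error (≈ 0.12) in the same neighborhood as the 200,000-subject cohort with a similar number of cases (≈ 0.08); precision is driven almost entirely by the small cells, that is, by the cases.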

Synthesis of Evidence for Judgments of Causation

Finally, it seems that neither the defense nor the plaintiffs adequately incorporated into their presentations the important concept of causal inference, that is, how evidence from disparate sources is synthesized into a judgment of causation, or into a rejection of such a claim. Specifically, counsel never explicitly set forth the importance of the Bradford Hill factors, or the techniques of proper and rigorous systematic review. The defense did touch upon many of the key Bradford Hill considerations as they applied to the relevant data, but there was no discussion of how these factors come into play only after the identification of an association that is not likely the result of bias and that is beyond the play of chance. With respect to meta-analysis, neither side provided guidance or insight into the problems that arise in conducting, reporting, and interpreting quantitative syntheses of a body of epidemiologic studies.

The Trial Court’s Role

Most trial judges, sadly, come to cases such as the talc ovarian cancer cases without any training in statistics, epidemiology, or toxicology, or an adequate understanding of the role that clinical medicine plays (or does not play) in assessing important questions of causation. Judge Nelson seemed to listen carefully, but asked few questions that would suggest that Her Honor appreciated the discrepancies between the parties’ presentations.

Perhaps a starting point for Science Day should be an Order that sets out the procedures for the Day, as well as a statement: “The Court has read and studied the relevant chapters in the Reference Manual on Scientific Evidence (3d ed. 2011), and all materials submitted by the parties. The parties should not recreate a tutorial that covers material in the Reference Manual, unless they wish to contest its contents. Specific references to the Manual, in connection with the parties’ presentation would be very helpful to the Court.”

1  In re Johnson & Johnson Talcum Powder Products Marketing, Sales Practices & Prods. Liab. Litig., No. 16-2738 (D.N.J.).

2 Johnson & Johnson Talcum Powder Cases, No. JCCP4872 (Calif. Super. Ct., Los Angeles Cty.).

3  “Johnson & Johnson Files Status Report in MDL Docket, Requests ‘Science Day’ to Address Causation in Talc Cases,” HarrisMartin’s Talcum Powder Litigation Report (Nov. 16, 2016).

4  “Parties in Federal Talcum Powder MDL Hold ‘Science Day’,” HarrisMartin’s Talcum Powder Litig. Report (Jan. 26, 2017).

5  Id.

6  Johnson & Johnson Talcum Powder Cases, No. JCCP4872 (Calif. Super. Ct., Los Angeles Cty.).

7  “Plaintiffs Ask Court to Hold ‘Science Day’ in California Coordinated Talcum Powder Docket,” HarrisMartin’s Talcum Powder Litig. Report (Dec. 7, 2016).

8  See “Calif. Court Oversees ‘Science Day’ in Talcum Powder Docket One Day After J&J Opposes Broadcast of Hearing,” HarrisMartin’s Talcum Powder Litig. Report (Mar. 8, 2017).

9 Defense Slide 129, “Epidemiology Estimates Risk in the Population, Not in Individuals,” quoting from Graham Colditz, “Cancer Culture: Epidemics, Human Behavior, and the Dubious Search for New Risk Factors,” 91 Am. J. Pub. Health 357, 357 (2001).

10 Graham A. Colditz, “From epidemiology to cancer prevention: implications for the 21st Century,” 18 Cancer Causes Control 117, 118 (2007).

11 See, e.g., Smith v. Ortho Pharmaceutical Corp., 770 F. Supp. 1561, 1573 (N.D. Ga. 1991) (“However, in an individual case, epidemiology cannot conclusively prove causation; at best, it can only establish a certain probability that a randomly selected case of disease was one that would not have occurred absent exposure, or the ‘relative risk’ of the exposed population. Epidemiology, therefore, involves evidence on causation derived from group-based information, rather than specific conclusions regarding causation in an individual case.”).

12  See Defense Slide 134, “Disadvantages of Case-Control Studies,” which sets out in bullet points, “Recall Bias, Confounding, Small Size.” And in Defense Slide 135, “Epidemiologic Studies on Talc and Ovarian Cancer: Three Types Large Prospective Cohort Studies,” the defense touts advantages of cohort studies to include “No Recall Bias” and “Large Size.” The slides contained no supporting citation for the contention about size.

13 See Defense Slide 136, “Epidemiology Studies on Talc and Ovarian Cancer: Three Types,” where the defense places case-control studies lower on the “hierarchy of evidence” than cohort studies, citing Trisha Greenhalgh, “How to Read a Paper,” 315 Brit. Med. J. 241 (1997).

14 See Wera Berge, Kenneth Mundt, Hung Luu & Paolo Boffetta, “Genital use of talc and risk of ovarian cancer: a meta-analysis,” European J. Cancer Prevention (2017), in press, DOI: 10.1097/CEJ.0000000000000340.


American Bar Association’s “Civil Trial Practice Standards” (August 2007).

7. Use of Tutorials to Assist the Court

a. Pretrial Use of Tutorials. In cases involving complex technology or other complex subject matter which may be especially difficult for non-specialists to comprehend, the court may permit or require the use of tutorials to educate the court. Tutorials are intended to provide the court with background information to assist the court in understanding the technology or other complex subject matter involved in the case. Tutorials may, but need not, seek to explain the contentions or arguments made by each party with respect to the technology or complex subject matter.

b. Selection of Type of Tutorial.

i. In any case in which the court believes one or more tutorials might be useful in assisting it in understanding the complex technology or other complex subject matter, the court should invite the parties to express their views on the desirability of one or more tutorials.

ii. Once the court decides to permit or require one or more tutorials, it should invite the parties to suggest the subject matter and format of each tutorial.

iii. If the parties cannot agree on the subject matter and format, the court should invite each party to submit a description of any tutorial it proposes and to explain how that tutorial will assist the court and why it is preferable to the tutorial proposed by another party. The court may approve one or more tutorials proposed by the parties, or the court may fashion its own tutorial after providing the parties with an opportunity to comment on the court’s proposed subject matter and format.

c. Procedures for Presentation. A court may consider the following procedures for the presentation of tutorials:

i. An in-court or recorded presentation by an expert jointly selected by the parties.

ii. An in-court or recorded presentation by one or more experts on behalf of each party.

iii. An in-court or recorded presentation by counsel for each party.

iv. A combined in-court or recorded presentation by counsel and one or more experts on behalf of each party.

v. An in-court or recorded presentation by an expert appointed by the court, which may include cross-examination by counsel for each party.

vi. Recorded presentations that have been prepared for generic use in particular kinds of cases by reliable sources such as the Federal Judicial Center.

d. Trial Use of Tutorials. In cases involving complex technology or other complex subject matter which may be especially difficult for non-specialists to comprehend, the court may permit or require the use of tutorials to educate the court or jury during one or more stages of the trial. Trial tutorials are intended to provide the court or jury with background information to assist in understanding the technology or other complex subject matter involved in the case. Tutorials may, but need not, seek to explain the contentions or arguments made by each party with respect to the technology or complex subject matter.

e. Selection of Type of Tutorial. The court should use the process set forth in 7.b. above.

f. Procedures for Presentation.

i. In a bench trial, the court may consider using any of the procedures set forth in 7.c. above.

ii. In a jury trial, the court should consider the use of tutorials in connection with interim statements and arguments as provided in Standard 9.

iii. In both bench and jury trials, the court should provide parties with a full opportunity to present admissible evidence in support of their cases that may differ from or quarrel with information presented in a tutorial and to argue that the information presented in a tutorial should be rejected by the court or jury.

Science Day Should Be Every Day in Our Courtrooms — Part I

March 24th, 2017

The following post and its sequel are an expansion upon a post that I wrote with Dr. David Schwartz, of Innovative Science Solutions, LLC. Dr. Schwartz is a talented scientist with whom I had the privilege and pleasure to work at McCarter & English, before he left to become an independent scientific consultant. Dr. Schwartz is one of the founding partners of his firm, which focuses on helping lawyers with the scientific issues in complex health effects litigation. Our earlier post can be found on the Courtroom View Network’s website. “Guest Analysis: Key Takeaways from Recent Talc Powder ‘Science Day’ Hearing in California,” Courtroom View Network (Mar. 24, 2017).


Every February 28th, India celebrates National Science Day in honor of the Indian physicist Sir Chandrashekhara Venkata Raman, who discovered the Raman effect. The United States has no equivalent celebration, but “Science Days” have become commonplace in complex state and federal litigation around the country.


The major impetus for science tutorials seems to have come from the United States Supreme Court’s decision in Daubert v. Merrell Dow Pharmaceuticals, 509 U.S. 579 (1993). The holding of Daubert, now incorporated into, and extended by, Federal Rule of Evidence 702, requires trial judges to act as gatekeepers of the relevance and reliability of expert witness opinion testimony in their courtrooms. One of the first tests of the judiciary’s performance of this role came in the silicone gel breast implant litigation. The federal silicone cases were consolidated before Judge Sam C. Pointer, Jr., in MDL 926. Judge Pointer believed that trial judges in the transferor courts should conduct whatever review of expert witness opinion was needed to satisfy the then-recent Daubert decision.

Some of the first federal silicone lawsuits remanded from the MDL went to Judge Robert Jones in Portland, Oregon. These cases involved complex issues of immunology, clinical rheumatology, epidemiology, toxicology, surgery, and polymer and analytical chemistry. At the outset of his case management of the remanded cases, plaintiffs’ counsel requested that Judge Jones schedule an all-day tutorial for counsel to present on these scientific issues. The parties’ tutorials, along with an avalanche of defense Daubert motions, persuaded Judge Jones to take the unusual step of appointing technical advisors to assist him in assessing the scientific evidence, inferences, and claims in the silicone litigation. See Hall v. Baxter Healthcare Corp., 947 F. Supp. 1387, 1415 (D. Ore. 1996).1 Judge Jones’s technical advisors attended court throughout the Daubert hearings conducted in Portland, and they delivered advisory reports to Judge Jones to assist him in his gatekeeping function. Judge Jones ultimately granted the defense motions to exclude the plaintiffs’ expert witnesses’ claims of silicone causation of connective tissue diseases.

In large measure because of Judge Jones’s case management and exclusion of expert witness testimony, the silicone MDL court appointed a panel of neutral expert witnesses, in the fields of epidemiology, rheumatology, immunology, and toxicology.2

One of the first requests received from the Science Panel in MDL 926 was for what turned out to be a series of Science Days, in which the parties’ expert witnesses would present to the Panel and explain their interpretations of the vast array of evidence from different disciplines. Each presenting expert witness was allowed 15 to 20 minutes. The lawyers were not entirely reduced to potted plants; they had a chance to conduct a short cross-examination. Given that the primary audience was a panel of four distinguished scientists, most of the lawyers, for plaintiffs and defendants alike, emphasized pertinent, substantive questions.

The Science Panel was not entirely satisfied with the party expert witnesses, and requested a second Science Day, at which the Panel could call its own slate of scientists to address the scientific claims made in the litigation. The proceedings took place at the National Academy of Sciences, in Washington, D.C.

These proceedings, along with extensive submissions of articles and briefings from the parties, led to the Report of the National Science Panel, on November 30, 1998.

Every Day is Science Day, Somewhere

Since the breast implant litigation, many MDL and other courts have faced complex causation claims in litigation involving pharmaceutical products, medical devices, consumer products, and a host of chemical exposures. Appointment of independent, neutral expert witnesses remains unusual, but trial judges have welcomed tutorials, in the form of “Science Days,” to help them learn the methodologies and vocabularies of the scientific disciplines involved in the litigations before them. For some reason, the parties, the judges, and the legal media often put Science Days in scare quotes, signaling perhaps that something other than science takes place in these proceedings. Whether the scare quotes are warranted remains to be determined.

“Science Days” have become a tradition in mass tort litigation.3 In recent years, a Science Day has been under way somewhere, in some courtroom, perhaps not daily, but with sufficient frequency that the phenomenon deserves more critical attention. Federal judges with multi-district litigations, and state judges with multi-county cases, set aside time to permit the parties to educate them about the scientific and technical aspects of the litigations before them. Judges know that Daubert and Rule 702, or their state analogues, require them to act as gatekeepers. Furthermore, myriad motions in the discovery and trial phases of a case will require judges to make nuanced but accurate decisions about the scope and content of discovery, and the admissibility of documents and testimony.

Science Day – Have It Your Way

John Milton: We negotiating?

Kevin Lomax: Always.4

The Devil’s Advocate (1997).

There are no federal or state rules that set out procedures for science tutorials for judges or their appointed experts. The form and substance of Science Days depend upon a three-way negotiation among the plaintiffs, the defendants, and the trial judge. Although the parties are often left to work out a plan for Science Day, most courts tend to weigh in by imposing time limits, and they may even rule in or rule out live witness testimony.

In 2007, the American Bar Association set out its Civil Trial Practice Standards,5 which included an entire section on the use of tutorials to assist the court. [The relevant standard for tutorials is set out at the end of Part II of this post, as an appendix.]

1 See Laural L. Hooper, Joe S. Cecil, and Thomas E. Willging, “Neutral Science Panels: Two Examples of Panels of Court-Appointed Experts in the Breast Implants Product Liability Litigation,” at 9 (Federal Judicial Center 2001).

2 MDL 926 Order 31 (May 31, 1996) (order to show cause why a national Science Panel should not be appointed under Federal Rule of Evidence 706); MDL 926 Order No. 31C (Aug. 23, 1996) (appointing Drs. Barbara S. Hulka, Peter Tugwell, and Betty A. Diamond); Order No. 31D (Sept. 17, 1996) (appointing Dr. Nancy I. Kerkvliet).

3 See, e.g., Barbara J. Rothstein & Catherine R. Borden, Managing Multidistrict Litigation in Products Liability Cases: A Pocket Guide for Transferee Judges at 39 & n. 54 (Fed. Jud. Ctr. 2011); Sean Wajert, “‘Science Day’ In Mass Torts,” Mass Tort Defense (Oct. 20, 2008); Lisa M. Martin, “Using Science Day to Your Advantage,” 2(4) Pro Te: Solutio 9 (2009).

4 From the screenplay of the movie, directed by Taylor Hackford, written by Jonathan Lemkin and Tony Gilroy, and based on a novel by Andrew Neiderman.

5 American Bar Association’s “Civil Trial Practice Standards” (August 2007 & 2011 Update).

Washington Legal Foundation’s Paper on Statistical Significance in Rule 702 Proceedings

March 13th, 2017

The Washington Legal Foundation has released a Working Paper, No. 201, by Kirby Griffis, entitled “The Role of Statistical Significance in Daubert / Rule 702 Hearings,” in its Critical Legal Issues Working Paper Series (Mar. 2017) [cited below as Griffis]. I am a fan of many of the Foundation’s Working Papers (having written one some years ago), but this one gives me pause.

Griffis’s paper manages to avoid many of the common errors of lawyers writing about this topic, but adds little to the statistics chapter in the Reference Manual on Scientific Evidence (3d ed. 2011), and he propagates some new, unfortunate misunderstandings. On the positive side, Griffis studiously avoids the transposition fallacy in defining significance probability, and he notes that multiplicity from subgroups and multiple comparisons often undermines claims of statistical significance. Griffis gets both points right. These are woefully common errors, and they deserve the emphasis Griffis gives to them in this working paper.

On the negative side, however, Griffis falls into error on several points. Griffis helpfully narrates the Supreme Court’s evolution in Daubert and then in Joiner, but he fails to address the serious mischief and devolution introduced by the Court’s opinion in Matrixx Initiatives, Inc. v. Siracusano, 563 U.S. 27, 131 S.Ct. 1309 (2011). See Schachtman, “The Matrixx – A Comedy of Errors” (April 6, 2011); David Kaye, “Trapped in the Matrixx: The U.S. Supreme Court and the Need for Statistical Significance,” BNA Product Safety & Liability Reporter 1007 (Sept. 12, 2011). With respect to statistical practice, this Working Paper is at times wide of the mark.


Although avoiding the transposition fallacy, Griffis falls into another mistake in interpreting tests of significance; he states that a non-significant result tells us that an hypothesis is “perfectly consistent with mere chance”! Griffis at 9. This is, of course, wrong, or at least seriously misleading. A failure to reject the null hypothesis does not prove the null such that we can say that the “null results” in one study were perfectly consistent with chance. The test may have lacked power to detect an “effect size” of interest. Furthermore, tests of significance cannot rule out systematic bias or confounding, and that limitation alone ensures that Griffis’s interpretation is mistaken. A null result may have resulted from bias or confounding that obscured a measurable association.
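The point about power can be made concrete with a back-of-the-envelope calculation. The sketch below, in Python, uses entirely hypothetical numbers (a true relative risk of 1.5 and a standard error of 0.25 on the log scale, drawn from no actual study) to show that a modest-sized study of a real effect will more often than not fail to attain statistical significance:

```python
import math

def norm_cdf(x):
    # Standard normal cumulative distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Hypothetical study: the true relative risk is 1.5, but the standard
# error of the log relative risk is 0.25 (a modest-sized study)
true_log_rr = math.log(1.5)
se = 0.25
z_crit = 1.96  # two-sided critical value for a 5% significance level

# Power = probability the estimate attains p < 0.05, when the estimate
# is distributed Normal(true_log_rr, se)
power = (1 - norm_cdf(z_crit - true_log_rr / se)) + norm_cdf(-z_crit - true_log_rr / se)
print(f"power to detect the effect      = {power:.2f}")
print(f"chance of a non-significant run = {1 - power:.2f}")
```

On these assumed numbers, the power is well under 50 percent, so a “null” result is the most likely outcome even though the null hypothesis is false; the non-significant result is hardly “perfectly consistent with mere chance.”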

Griffis states that p-values are expressed as percentages, “usually 95% or 99%, corresponding to 0.05 or 0.01,” but this states things backwards. Griffis at 10. The p-value that is pre-specified as “significant” is a low probability or percentage; the 95% or 99% figure is the coefficient of confidence used to construct a confidence interval, and it is the complement of the significance level. An alpha, or pre-specified statistical significance level, of 5% thus corresponds to a coefficient of confidence of 95% (or 1.0 – 0.05).
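The complementarity is easy to see in a short calculation. The sketch below uses invented numbers (a risk ratio of 2.0 with a standard error of 0.30 on the log scale) to show that the 95% confidence interval excludes the null value of 1.0 exactly when the two-sided p-value falls below alpha of 0.05:

```python
import math

def norm_cdf(x):
    # Standard normal cumulative distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

alpha = 0.05
coefficient_of_confidence = 1.0 - alpha   # 0.95, i.e., a 95% interval
z = 1.96                                  # two-sided critical value for alpha = 0.05

# Hypothetical estimate: risk ratio 2.0, standard error 0.30 on the log scale
log_rr, se = math.log(2.0), 0.30
lower = math.exp(log_rr - z * se)
upper = math.exp(log_rr + z * se)
p_two_sided = 2 * (1 - norm_cdf(abs(log_rr) / se))

# The 95% interval excludes 1.0 exactly when the two-sided p-value < 0.05
print(f"95% CI: ({lower:.2f}, {upper:.2f}); two-sided p = {p_two_sided:.3f}")
```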

The Mid-p Controversy

In discussing the emerging case law, Griffis rightly points to cases that chastise Dr. Nicholas Jewell for the many liberties he has taken in various litigations as an expert witness for the lawsuit industry. One instance cited by Griffis is the Lipitor diabetes litigation, where the MDL court suggested that Jewell switched improperly from a Fisher’s exact test to a mid-p test. Griffis at 18-19. Griffis seems to agree, but as I have explained elsewhere, Fisher’s exact test generates a one-tailed measure of significance probability, and the analyst is left to one of several ways of calculating a two-tailed test. See “Lipitor Diabetes MDL’s Inexact Analysis of Fisher’s Exact Test” (April 21, 2016). The mid-p is one legitimate approach for asymmetric distributions, and is more favorable to the defense than passing off the one-tailed measure as the result of the test. The mere fact that a statistical software package does not automatically report the mid-p for a Fisher’s exact analysis does not make invoking this measure p-hacking or other misconduct. Doubling the attained significance probability of a particular Fisher’s exact test result is generally considered less accurate than a mid-p calculation, even though some software packages use doubling as a default. As much as we might dislike bailing Jewell out of Daubert limbo, on this one, limited point, he deserved a better hearing.
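For readers who want to see the competing conventions side by side, here is a minimal sketch in Python, using only the standard library and an entirely hypothetical 2×2 table (the counts come from no cited study). It computes the one-tailed Fisher exact p from the hypergeometric distribution, the mid-p, and the doubled version; the two-sided mid-p falls between the raw one-tailed measure and the doubled measure:

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    # P(X = k): probability of k exposed cases, given the table's fixed margins
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def fisher_one_sided_and_mid_p(a, b, c, d):
    # 2x2 table: [[a, b], [c, d]] = [[exposed cases, exposed controls],
    #                                [unexposed cases, unexposed controls]]
    N, K, n = a + b + c + d, a + b, a + c
    lo, hi = max(0, n - (N - K)), min(K, n)
    pmf = {k: hypergeom_pmf(k, N, K, n) for k in range(lo, hi + 1)}
    p_one = sum(p for k, p in pmf.items() if k >= a)  # upper-tail Fisher exact
    mid_p = p_one - 0.5 * pmf[a]                      # mid-p correction
    return p_one, mid_p

# Entirely hypothetical counts: 12/5 exposed cases/controls, 8/15 unexposed
p_one, mid_p = fisher_one_sided_and_mid_p(12, 5, 8, 15)
print(f"one-tailed p        = {p_one:.4f}")
print(f"two-sided, mid-p    = {2 * mid_p:.4f}")  # between the other two measures
print(f"two-sided, doubled  = {2 * p_one:.4f}")  # the crudest two-sided convention
```

On any such table, the two-sided mid-p exceeds the one-tailed p, which is why reporting the mid-p rather than the bare one-tailed figure is, if anything, the more conservative choice.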


On recounting the Bendectin litigation, Griffis refers to the epidemiologic studies of birth defects and Bendectin as “experiments,” Griffis at 7, and then describes such studies as comparing “populations,” when he clearly meant “samples.” Griffis at 8.

Griffis conflates personal bias with bias as a scientific concept of systematic error in research, a confusion usually perpetuated by plaintiffs’ counsel. See Griffis at 9 (“Coins are not the only things that can be biased: scientists can be, too, as can their experimental subjects, their hypotheses, and their manipulations of the data.”) Of course, the term has multiple connotations, but too often an accusation of personal bias, such as conflict of interest, is used to avoid engaging with the merits of a study.

Relative Risks

Griffis correctly describes the measure known as “relative risk” as a determination of “the strength of a particular association.” Griffis at 10. The discussion then lapses into using a given relative risk as a measure of the likelihood that an individual with the studied exposure will develop the disease. Sometimes this general-to-specific inference is warranted, but without further analysis, it is impossible to tell whether Griffis lapsed from general to specific deliberately or inadvertently in describing the interpretation of relative risk.
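The gap between the general measure and the individual inference can be made explicit. The sketch below, with invented cohort numbers, computes a relative risk and the attributable fraction among the exposed, (RR − 1)/RR, which underlies the familiar courtroom heuristic that a relative risk above 2.0 makes it more likely than not that a randomly selected exposed case is attributable to the exposure — a heuristic that itself holds only under assumptions about unbiased studies and uniform individual risk:

```python
# Hypothetical cohort data (illustrative numbers only)
exposed_cases, exposed_total = 30, 1000
unexposed_cases, unexposed_total = 15, 1000

risk_exposed = exposed_cases / exposed_total        # 0.030
risk_unexposed = unexposed_cases / unexposed_total  # 0.015
rr = risk_exposed / risk_unexposed                  # relative risk = 2.0

# Attributable fraction among the exposed: the share of exposed cases
# that would not have occurred absent exposure, on these assumptions
attributable_fraction = (rr - 1) / rr               # 0.50 at RR = 2.0
print(f"RR = {rr:.1f}; attributable fraction = {attributable_fraction:.2f}")
```

The relative risk is a group-level summary; the step from a group attributable fraction of 0.50 to a 50 percent probability for a specific plaintiff is exactly the inference that requires the “further analysis” noted above.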


Griffis is right in his chief contention that the proper planning, conduct, and interpretation of statistical tests is hugely important to judicial gatekeeping of some expert witness opinion testimony under Federal Rule of Evidence 702 (and under Rule 703, too). Judicial and lawyer aptitude in this area is low, and needs to be bolstered.

The opinions, statements, and asseverations expressed on Tortini are my own, or those of invited guests, and these writings do not necessarily represent the views of clients, friends, or family, even when supported by good and sufficient reason.