TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Statistical Significance at the New England Journal of Medicine

July 19th, 2019

Some wild stuff has been going on in the world of statistics, at the American Statistical Association, and elsewhere. A very few obscure journals have declared p-values to be verboten, and presumably confidence intervals as well. The world of biomedical research has generally reacted more sanely, with authors defending the existing frequentist approaches and standards.[1]

This week, the editors of the New England Journal of Medicine have issued new statistical guidelines for authors. The Journal’s approach seems appropriately careful and conservative for the world of biomedical research. In an editorial introducing the new guidelines,[2] the Journal editors remind their potential authors that statistical significance and p-values are here to stay:

“Despite the difficulties they pose, P values continue to have an important role in medical research, and we do not believe that P values and significance tests should be eliminated altogether. A well-designed randomized or observational study will have a primary hypothesis and a prespecified method of analysis, and the significance level from that analysis is a reliable indicator of the extent to which the observed data contradict a null hypothesis of no association between an intervention or an exposure and a response. Clinicians and regulatory agencies must make decisions about which treatment to use or to allow to be marketed, and P values interpreted by reliably calculated thresholds subjected to appropriate adjustments have a role in those decisions.”[3]

The Journal’s editors described their revamped statistical policy as being based upon three premises:

(1) adhering to prespecified analysis plans if they exist;

(2) declaring associations or effects only for statistical analyses that have pre-specified “a method for controlling type I error”; and

(3) presenting evidence about clinical benefits or harms requires “both point estimates and their margins of error.”

With a hat tip to the ASA’s recent pronouncements on statistical significance,[4] the editors suggest that their new guidelines have moved away from treating statistical significance “as a bright-line marker for a conclusion or a claim”[5]:

“[T]he notion that a treatment is effective for a particular outcome if P < 0.05 and ineffective if that threshold is not reached is a reductionist view of medicine that does not always reflect reality.”[6]

The editors’ language intimates greater latitude for authors in claiming associations or effects from their studies, but this latitude may well be circumscribed by tighter control over such claims in the inevitable context of multiple testing within a dataset.

The editors’ introduction of the new guidelines is not entirely coherent. The introductory editorial notes that the use of p-values for reporting multiple outcomes, without adjustments for multiplicity, inflates the number of findings with p-values less than 5%. The editors thus caution against “uncritical interpretation of multiple inferences,” which can be particularly threatening to valid inference when not all the comparisons conducted by the study investigators have been reported in their manuscript.[7] They reassuringly tell prospective authors that many methods are available to adjust for multiple comparisons, and can be used to control Type I error probability “when specified in the design of a study.”[8]
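The inflation the editors describe can be made concrete with a rough sketch. Assume, purely for illustration, m independent tests of true null hypotheses, each run at alpha = 0.05; the chance of at least one p-value below 0.05 is then 1 - (1 - alpha)^m. The function names below are hypothetical, and the Bonferroni correction is only one of the many adjustment methods to which the editors allude.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive among m independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-test significance level under a simple Bonferroni correction."""
    return alpha / m


for m in (1, 5, 10, 20):
    unadjusted = familywise_error_rate(0.05, m)
    adjusted = familywise_error_rate(bonferroni_alpha(0.05, m), m)
    print(f"{m:2d} tests: unadjusted = {unadjusted:.3f}, Bonferroni-adjusted = {adjusted:.3f}")
```

Under the independence assumption, twenty unadjusted comparisons carry roughly a 64% chance of at least one nominally significant finding, even when every null hypothesis is true.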

But what happens when such adjustment methods are not pre-specified in the study design? Failure to do so does not appear to be a disqualifying factor for publication in the Journal. For one thing, when the statistical analysis plan of the study has not specified adjustment methods for controlling type I error probabilities, then authors must replace p-values with “estimates of effects or association and 95% confidence intervals.”[9] It is hard to understand how this edict helps when the specified coefficient of 95% is a continuation of the 5% alpha, which would have been used in any event. The editors seem to be saying that if authors fail to pre-specify or even post-specify methods for controlling error probabilities, then they cannot declare statistical significance, or use p-values, but they can use confidence intervals in the same way they have been using them, and with the same misleading interpretations supplied by their readers.
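The point that a 95% interval is simply the 5% alpha in other dress can be checked with a short sketch. It assumes a normally distributed estimate (say, a log risk ratio) with a known standard error; the function names and the numbers are hypothetical. On those assumptions, the interval excludes the null value exactly when the two-sided p-value falls below 0.05.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sided_p(estimate: float, se: float, null: float = 0.0) -> float:
    """Two-sided p-value from a normal (Wald) approximation."""
    z = abs(estimate - null) / se
    return 2.0 * (1.0 - STD_NORMAL.cdf(z))


def confidence_interval(estimate: float, se: float, level: float = 0.95):
    """Wald confidence interval at the given coverage level."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - (1.0 - level) / 2.0)
    return estimate - z_crit * se, estimate + z_crit * se


def excludes_null(interval, null: float = 0.0) -> bool:
    lower, upper = interval
    return null < lower or null > upper


# Hypothetical log risk ratios, each with a standard error of 0.20.
for estimate in (0.50, 0.30):
    p = two_sided_p(estimate, 0.20)
    ci = confidence_interval(estimate, 0.20)
    print(f"estimate={estimate:.2f}  p={p:.4f}  "
          f"95% CI=({ci[0]:.3f}, {ci[1]:.3f})  excludes null: {excludes_null(ci)}")
```

A reader who checks whether the interval covers the null value is performing the same 5% significance test, only without the label.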

More important, another price authors will have to pay for multiple testing without pre-specified methods of adjustment is that they will affirmatively have to announce their failure to adjust for multiplicity and that their putative associations “may not be reproducible.” Tepid as this concession is, it is better than previous practice, and perhaps it will become a badge of shame. The crucial question is whether judges, in exercising their gatekeeping responsibilities, will see these acknowledgements as disabling valid inferences from studies that carry this mandatory warning label.

The editors have not issued guidelines for the use of Bayesian statistical analyses, because “the large majority” of author manuscripts use only frequentist analyses.[10] The editors inform us that “[w]hen appropriate,” they will expand their guidelines to address Bayesian and other designs. Perhaps this expansion will be appropriate when Bayesian analysts establish a track record of abuse in their claiming of associations and effects.

The new guidelines themselves are not easy to find. The Journal has not published these guidelines as an article in their published issues, but has relegated them to a subsection of their website’s instructions to authors for new manuscripts:

https://www.nejm.org/author-center/new-manuscripts

Presumably, the actual author instructions control in any perceived discrepancy between this week’s editorial and the guidelines themselves. Authors are told that p-values generally should be two-sided, and the guidelines add that:

“Significance tests should be accompanied by confidence intervals for estimated effect sizes, measures of association, or other parameters of interest. The confidence intervals should be adjusted to match any adjustment made to significance levels in the corresponding test.”

Similarly, the guidelines call for, but do not require, pre-specified methods of controlling family-wide error rates for multiple comparisons. For observational studies submitted without pre-specified methods of error control, the guidelines recommend the use of point estimates and 95% confidence intervals, with an explanation that the interval widths have not been adjusted for multiplicity, and a caveat that the inferences from these findings may not be reproducible. The guidelines recommend against using p-values for such results, but again, it is difficult to see why reporting the 95% confidence intervals is recommended when p-values are not recommended.
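What it means for an interval to be “adjusted to match” an adjusted significance level can be sketched as follows, assuming a Bonferroni correction and a normal approximation. Neither assumption comes from the guidelines, which do not mandate a particular method, and the function name is hypothetical.

```python
from statistics import NormalDist


def critical_z(alpha: float, comparisons: int = 1) -> float:
    """Two-sided critical value after a Bonferroni split of alpha across comparisons."""
    per_test_alpha = alpha / comparisons
    return NormalDist().inv_cdf(1.0 - per_test_alpha / 2.0)


unadjusted = critical_z(0.05)          # the familiar value near 1.96
adjusted = critical_z(0.05, 10)        # ten comparisons, each at alpha = 0.005
print(f"unadjusted z = {unadjusted:.3f}")
print(f"adjusted z   = {adjusted:.3f}")
print(f"interval widens by a factor of {adjusted / unadjusted:.2f}")
```

An unadjusted 95% interval reported for one of ten comparisons is therefore noticeably narrower than the interval that would correspond to a family-wide 5% error rate, which is the gap the mandated caveat is meant to flag.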


[1]  Jonathan A. Cook, Dean A. Fergusson, Ian Ford, Mithat Gonen, Jonathan Kimmelman, Edward L. Korn, and Colin B. Begg, “There is still a place for significance testing in clinical trials,” 16 Clin. Trials 223 (2019).

[2]  David Harrington, Ralph B. D’Agostino, Sr., Constantine Gatsonis, Joseph W. Hogan, David J. Hunter, Sharon-Lise T. Normand, Jeffrey M. Drazen, and Mary Beth Hamel, “New Guidelines for Statistical Reporting in the Journal,” 381 New Engl. J. Med. 285 (2019).

[3]  Id. at 286.

[4]  See id. (“Journal editors and statistical consultants have become increasingly concerned about the overuse and misinterpretation of significance testing and P values in the medical literature. Along with their strengths, P values are subject to inherent weaknesses, as summarized in recent publications from the American Statistical Association.”) (citing Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s statement on p-values: context, process, and purpose,” 70 Am. Stat. 129 (2016); Ronald L. Wasserstein, Allen L. Schirm, and Nicole A. Lazar, “Moving to a world beyond ‘p < 0.05’,” 73 Am. Stat. s1 (2019)).

[5]  Id. at 285.

[6]  Id. at 285-86.

[7]  Id. at 285.

[8]  Id., citing Alex Dmitrienko, Frank Bretz, Ajit C. Tamhane, Multiple testing problems in pharmaceutical statistics (2009); Alex Dmitrienko & Ralph B. D’Agostino, Sr., “Multiplicity considerations in clinical trials,” 378 New Engl. J. Med. 2115 (2018).

[9]  Id.

[10]  Id. at 286.

Science Bench Book for Judges

July 13th, 2019

On July 1st of this year, the National Judicial College and the Justice Speakers Institute, LLC released an online publication of the Science Bench Book for Judges [Bench Book]. The Bench Book sets out to cover much of the substantive material already covered by the Federal Judicial Center’s Reference Manual:

Acknowledgments

Table of Contents

  1. Introduction: Why This Bench Book?
  2. What is Science?
  3. Scientific Evidence
  4. Introduction to Research Terminology and Concepts
  5. Pre-Trial Civil
  6. Pre-trial Criminal
  7. Trial
  8. Juvenile Court
  9. The Expert Witness
  10. Evidence-Based Sentencing
  11. Post Sentencing Supervision
  12. Civil Post Trial Proceedings
  13. Conclusion: Judges—The Gatekeepers of Scientific Evidence

Appendix 1 – Frye/Daubert—State-by-State

Appendix 2 – Sample Orders for Criminal Discovery

Appendix 3 – Biographies

The Bench Book gives some good advice in very general terms about the need to consider study validity,[1] and to approach scientific evidence with care and “healthy skepticism.”[2] When the Bench Book attempts to instruct on what it represents to be the scientific method of hypothesis testing, the good advice unravels:

“A scientific hypothesis simply cannot be proved. Statisticians attempt to solve this dilemma by adopting an alternate [sic] hypothesis – the null hypothesis. The null hypothesis is the opposite of the scientific hypothesis. It assumes that the scientific hypothesis is not true. The researcher conducts a statistical analysis of the study data to see if the null hypothesis can be rejected. If the null hypothesis is found to be untrue, the data support the scientific hypothesis as true.”[3]

Even in experimental settings, a statistical analysis of the data does not lead to a conclusion that the null hypothesis is untrue, as opposed to not reasonably compatible with the study’s data. In observational studies, the statistical analysis must acknowledge whether and to what extent the study has excluded bias and confounding. When the Bench Book turns to speak of statistical significance, more trouble ensues:

“The goal of an experiment, or observational study, is to achieve results that are statistically significant; that is, not occurring by chance.”[4]

In the world of result-oriented science, and scientific advocacy, it is perhaps true that scientists seek to achieve statistically significant results. Still, it seems crass to come right out and say so, as opposed to saying that the scientists are querying the data to see whether they are compatible with the null hypothesis. This first pass at statistical significance is only mildly astray compared with the Bench Book’s more serious attempts to define statistical significance and confidence intervals:

“4.10 Statistical Significance

The research field agrees that study outcomes must demonstrate they are not the result of random chance. Leaving room for an error of .05, the study must achieve a 95% level of confidence that the results were the product of the study. This is denoted as p ≤ 05. (or .01 or .1).”[5]

and

“The confidence interval is also a way to gauge the reliability of an estimate. The confidence interval predicts the parameters within which a sample value will fall. It looks at the distance from the mean a value will fall, and is measured by using standard deviations. For example, if all values fall within 2 standard deviations from the mean, about 95% of the values will be within that range.”[6]

Of course, the interval speaks to the precision of the estimate, not its reliability, but that is a small point. These definitions are virtually guaranteed to confuse judges into conflating statistical significance and the coefficient of confidence with the legal burden of proof probability.
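The Bench Book’s definition describes the spread of individual values, not a confidence interval for an estimate. A small sketch shows the difference. The data are simulated with a fixed seed and are entirely hypothetical, the function names are mine, and the interval uses a normal approximation (a t-based interval would be the usual choice for small samples).

```python
import math
import random
from statistics import NormalDist, mean, stdev


def two_sd_range(values):
    """Range within two standard deviations of the mean: describes individual values."""
    m, s = mean(values), stdev(values)
    return m - 2.0 * s, m + 2.0 * s


def mean_confidence_interval(values, level: float = 0.95):
    """Approximate confidence interval for the mean: describes precision of the estimate."""
    m, s, n = mean(values), stdev(values), len(values)
    z_crit = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    half_width = z_crit * s / math.sqrt(n)  # standard error, not standard deviation
    return m - half_width, m + half_width


rng = random.Random(0)  # fixed seed so the hypothetical data are reproducible
sample = [rng.gauss(100.0, 15.0) for _ in range(400)]

spread = two_sd_range(sample)
interval = mean_confidence_interval(sample)
print(f"mean +/- 2 SD      : ({spread[0]:.1f}, {spread[1]:.1f})")
print(f"95% CI for the mean: ({interval[0]:.1f}, {interval[1]:.1f})")
```

The two-standard-deviation range stays wide no matter how many observations are collected, whereas the confidence interval shrinks with the square root of the sample size.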

The Bench Book runs into problems in interpreting legal decisions, which would seem softer grist for the judicial mill. The authors present dictum from the Daubert decision as though it were a holding:[7]

“As noted in Daubert, ‘[t]he focus, of course, must be solely on principles and methodology, not on the conclusions they generate’.”

The authors fail to mention that this dictum was abandoned in Joiner, and that it is specifically rejected by statute, in the 2000 revision to the Federal Rule of Evidence 702.

Early in the Bench Book, its authors present a subsection entitled “The Myth of Scientific Objectivity,” which they might have borrowed from Feyerabend or Derrida. The heading appears misleading because the text contradicts it:

“Scientists often develop emotional attachments to their work—it can be difficult to abandon an idea. Regardless of bias, the strongest intellectual argument, based on accepted scientific hypotheses, will always prevail, but the road to that conclusion may be fraught with scholarly cul-de-sacs.”[8]

In a similar vein, the authors misleadingly tell readers that “the forefront of science is rarely encountered in court,” and so “much of the science mentioned there shall be considered established….”[9] Of course, the reality is that many causal claims presented in court have already been rejected or held to be indeterminate by the scientific community. And just when readers may think themselves safe from the goblins of nihilism, the authors launch into a theory of naïve probabilism that science is just placing subjective probabilities upon data, based upon preconceived biases and beliefs:

“All of these biases and beliefs play into the process of weighing data, a critical aspect of science. Placing weight on a result is the process of assigning a probability to an outcome. Everything in the universe can be expressed in probabilities.”[10]

So help the expert witness who honestly (and correctly) testifies that the causal claim or its rejection cannot be expressed as a probability statement!

Although I have not read all of the Bench Book closely, there appears to be no meaningful discussion of Rule 703, or of the need to access underlying data to ensure that the proffered scientific opinion under scrutiny has used appropriate methodologies at every step in its development. Even a 412-page text cannot address every issue, but this one does little to help the judicial reader find more in-depth help on statistical and scientific methodological issues that arise in occupational and environmental disease claims, and in pharmaceutical products litigation.

The organizations involved in this Bench Book appear to be honest brokers of remedial education for judges. The writing of this Bench Book was funded by the State Justice Institute (SJI), which is a creation of federal legislation enacted with the laudable goal of improving the quality of judging in state courts.[11] Despite its provenance in federal legislation, the SJI is a private, nonprofit corporation, governed by 11 directors appointed by the President, and confirmed by the Senate. Six of the directors, a majority, are state court judges; one is a state court administrator; and four are members of the public (no more than two from any one political party). The function of the SJI is to award grants to improve judging in state courts.

The National Judicial College (NJC) originated in the early 1960s, from the efforts of the American Bar Association, American Judicature Society and the Institute of Judicial Administration, to provide education for judges. In 1977, the NJC became a Nevada not-for-profit 501(c)(3) educational corporation, with its campus at the University of Nevada, Reno, where judges could go for training and recreational activities.

The Justice Speakers Institute appears to be a for-profit company that provides educational resources for judges. A Press Release touts the Bench Book and follow-on webinars. Caveat emptor.

The rationale for this Bench Book is open to question. Unlike the Reference Manual on Scientific Evidence, which was co-produced by the Federal Judicial Center and the National Academies of Science, the Bench Book’s authors are lawyers and judges, without any subject-matter expertise. Unlike the Reference Manual, the Bench Book’s chapters have no scientist or statistician authors, and it shows. Remarkably, the Bench Book does not appear to cite to the Reference Manual or the Manual on Complex Litigation, at any point in its discussion of the federal law of expert witnesses or of scientific or statistical method. Perhaps taxpayers would have been spared substantial expense if state judges were simply encouraged to read the Reference Manual.


[1]  Bench Book at 190.

[2]  Bench Book at 174 (“Given the large amount of statistical information contained in expert reports, as well as in the daily lives of the general society, the ability to be a competent consumer of scientific reports is challenging. Effective critical review of scientific information requires vigilance, and some healthy skepticism.”).

[3]  Bench Book at 137; see also id. at 162.

[4]  Bench Book at 148.

[5]  Bench Book at 160.

[6]  Bench Book at 152.

[7]  Bench Book at 233, quoting Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579, 595 (1993).

[8]  Bench Book at 10.

[9]  Id. at 10.

[10]  Id. at 10.

[11] See State Justice Institute Act of 1984 (42 U.S.C. ch. 113, 42 U.S.C. § 10701 et seq.).

The Shmeta-Analysis in Paoli

July 11th, 2019

In the Paoli Railroad yard litigation, plaintiffs claimed injuries and increased risk of future cancers from environmental exposure to polychlorinated biphenyls (PCBs). This massive litigation showed up before federal district judge Hon. Robert F. Kelly,[1] in the Eastern District of Pennsylvania, who may well have been the first judge to grapple with a litigation attempt to use meta-analysis to show a causal association.

One of the plaintiffs’ expert witnesses was the late William J. Nicholson, who was a professor at Mt. Sinai School of Medicine, and a colleague of Irving Selikoff. Nicholson was trained in physics, and had no professional training in epidemiology. Nonetheless, Nicholson was Selikoff’s go-to colleague for performing epidemiologic studies. After Selikoff withdrew from active testifying for plaintiffs in tort litigation, Nicholson was one of his colleagues who jumped into the fray as a surrogate advocate for Selikoff.[2]

For his opinion that PCBs were causally associated with liver cancer in humans,[3] Nicholson relied upon a report he wrote for the Ontario Ministry of Labor. [cited here as “Report”].[4] Nicholson described his report as a “study of the data of all the PCB worker epidemiological studies that had been published,” from which he concluded that there was “substantial evidence for a causal association between excess risk of death from cancer of the liver, biliary tract, and gall bladder and exposure to PCBs.”[5]

The defense challenged the admissibility of Nicholson’s meta-analysis, on several grounds. The trial court decided the challenge based upon the Downing case, which was the law in the Third Circuit, before the Supreme Court decided Daubert.[6] The Downing case allowed some opportunity for consideration of reliability and validity concerns; there is, however, disappointingly little discussion of any actual validity concerns in the courts’ opinions.

The defense challenge to Nicholson’s proffered testimony on liver cancer turned on its characterization of meta-analysis as a “novel” technique, which is generally unreliable, and its claim that Nicholson’s meta-analysis in particular was unreliable. None of the individual studies that contributed data showed any “connection” between PCBs and liver cancer; nor did any individual study conclude that there was a causal association.

Of course, the appropriate response to this situation, with no one study finding a statistically significant association, or concluding that there was a causal association, should have been “so what?” One of the reasons to do a meta-analysis is that no available study was sufficiently large to find a statistically significant association, if one were there. As for drawing conclusions of causal associations, it is not the role or place of an individual study to synthesize all the available evidence into a principled conclusion of causation.

In any event, the trial court concluded that the proffered novel technique lacked sufficient reliability, that the meta-analysis would “overwhelm, confuse, or mislead the jury,” and that the proffered meta-analysis on liver cancer was not sufficiently relevant to the facts of the case (in which no plaintiff had developed, or had died of, liver cancer). The trial court noted that the Report had not been peer-reviewed, and that it had not been accepted or relied upon by the Ontario government for any finding or policy decision. The trial court also expressed its concern that the proffered testimony along the lines of the Report would possibly confuse the jury because it appeared to be “scientific” and because Nicholson appeared to be qualified.

The Appeal

The Court of Appeals for the Third Circuit, in an opinion by Judge Becker, reversed Judge Kelly’s exclusion of the Nicholson Report. The opinion is still sometimes cited, even though Downing is no longer good law in the Circuit or anywhere else.[7] The Court was ultimately not persuaded that the trial court had handled the exclusion of Nicholson’s Report and its meta-analysis correctly, and it remanded the case for a do-over analysis.

Judge Becker described Nicholson’s Report as a “meta-analysis,” which pooled or “combined the results of numerous epidemiologic surveys in order to achieve a larger sample size, adjusted the results for differences in testing techniques, and drew his own scientific conclusions.”[8] Through this method, Nicholson claimed to have shown that “exposure to PCBs can cause liver, gall bladder and biliary tract disorders … even though none of the individual surveys supports such a conclusion when considered in isolation.”[9]

Validity

The appellate court gave no weight to the possibility that a meta-analysis would confuse a jury, or that its “scientific nature” or Nicholson’s credentials would lead a jury to give it more weight than it deserved.[10] The Court of Appeals conceded, however, that exclusion would have been appropriate if the methodology used itself was invalid. The appellate opinion further acknowledged that the defense had offered opposition to Nicholson’s Report in which it documented his failure to include data that were inconsistent with his conclusions, and that “Nicholson had produced a scientifically invalid study.”[11]

Judge Becker’s opinion for a panel of the Third Circuit provided no details about the cherry-picking. The opinion never analyzed why this charge of cherry-picking and manipulation of the dataset did not invalidate the meta-analytic method generally, or Nicholson’s method as applied. The opinion gave no suggestion that this counter-affidavit was ever answered by the plaintiffs.

Generally, Judge Becker’s opinion dodged engagement with the specific threats to validity in Nicholson’s Report, and took refuge in the indisputable fact that hundreds of meta-analyses were published annually, and that the defense expert witnesses did not question the general reliability of meta-analysis.[12] These facts undermined the defense claim that meta-analysis was novel.[13] The reality, however, was that meta-analysis was in its infancy in bio-medical research.

When it came to the specific meta-analysis at issue, the court did not discuss or analyze a single pertinent detail of the Report. Despite its lack of engagement with the specifics of the Report’s meta-analysis, the court astutely observed that prevalent errors and flaws do not mean that a particular meta-analysis is “necessarily in error.”[14] Of course, without bothering to look, the court would not know whether the proffered meta-analysis was “actually in error.”

The appellate court would have given Nicholson’s Report a “pass” if it were an application of an accepted methodology. The defense’s remedy under this condition would be to cross-examine the opinion in front of a jury. If, on the other hand, Nicholson had altered an accepted methodology to skew its results, then the court’s gatekeeping responsibility under Downing would be invoked.

The appellate court went on to fault the trial court for failing to make sufficiently explicit findings as to whether the questioned meta-analysis was unreliable. From its perspective, the Court of Appeals saw the trial court as resolving the reliability issue upon the greater credibility of defense expert witnesses in branding the disputed meta-analysis as unreliable. Credibility determinations are for the jury, but the court left room for a challenge on reliability itself:[15]

“Assuming that Dr. Nicholson’s meta-analysis is the proper subject of Downing scrutiny, the district court’s decision is wanting, because it did not make explicit enough findings on the reliability of Dr. Nicholson’s meta-analysis to satisfy Downing. We decline to define the exact level at which a district court can exclude a technique as sufficiently unreliable. Reliability indicia vary so much from case to case that any attempt to define such a level would most likely be pointless. Downing itself lays down a flexible rule. What is not flexible under Downing is the requirement that there be a developed record and specific findings on reliability issues. Those are absent here. Thus, even if it may be possible to exclude Dr. Nicholson’s testimony under Downing, as an unreliable, skewed meta-analysis, we cannot make such a determination on the record as it now stands. Not only was there no hearing, in limine or otherwise, at which the bases for the opinions of the contesting experts could be evaluated, but the experts were also not even deposed. All of the expert evidence was based on affidavits.”

Peer Review

Understandably, the defense attacked Nicholson’s Report as not having been peer reviewed. Without any scrutiny of the scientific bona fides of the workers’ compensation agency, the appellate court acquiesced in Nicholson’s self-serving characterization of his Report as having been reviewed by “cooperating researchers” and the Panel of the Ontario Workers’ Compensation agency. Another partisan expert witness characterized Nicholson’s Report as a “balanced assessment,” and this seemed to appease the Third Circuit, which was wary of requiring peer review in the first place.[16]

Relevancy Prong

The defense had argued that Nicholson’s Report was irrelevant because no individual plaintiff claimed liver cancer.[17] The trial court largely accepted this argument, but the appellate court disagreed because of conclusory language in Nicholson’s affidavit, in which he asserted that “proof of an increased risk of liver cancer is probative of an increased risk of other forms of cancer.” The court seemed unfazed by the ipse dixit, asserted without any support. Indeed, Nicholson’s assertion was contradicted by his own Report, in which he reported that there were fewer cancers among PCB-exposed male capacitor manufacturing workers than expected,[18] and that the rate for all cancers for both men and women was lower than expected, with 132 observed and 139.40 expected.[19]

The trial court had also agreed with the defense’s suggestion that Nicholson’s report, and its conclusion of causality between PCB exposure and liver cancer, were irrelevant because the Report “could not be the basis for anyone to say with reasonable degree of scientific certainty that some particular person’s disease, not cancer of the liver, biliary tract or gall bladder, was caused by PCBs.”[20]

Analysis

It would likely have been lost on Judge Becker and his colleagues, but Nicholson presented SMRs (standardized mortality ratios) throughout his Report, and for the all cancers statistic, he gave an SMR of 95. What Nicholson clearly did in this, and in all other instances, was simply divide the observed number by the expected, and multiply by 100. This crude, simplistic calculation fails to present a standardized mortality ratio, which requires taking into account the age distribution of the exposed and the unexposed groups, and a weighting of the contribution of cases within each age stratum. Nicholson’s presentation of data was nothing short of false and misleading. And in case anyone remembers General Electric v. Joiner, Nicholson’s summary estimate of risk for lung cancer in men was below the expected rate.[21]
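The arithmetic described above is easy to reproduce, and a short sketch also shows why the age structure behind an “expected” count matters. Only the 132 observed and 139.40 expected figures come from the post; the reference rates and person-years are hypothetical, as are the function names. In indirect standardization, the expected count is built by applying age-specific reference rates to the cohort’s person-years in each age stratum.

```python
def ratio_times_100(observed: float, expected: float) -> float:
    """Observed divided by expected, multiplied by 100."""
    return 100.0 * observed / expected


def expected_deaths(person_years, reference_rates) -> float:
    """Expected deaths: age-specific reference rates applied stratum by stratum."""
    return sum(py * rate for py, rate in zip(person_years, reference_rates))


# Figures quoted in the post for all cancers, both sexes.
print(f"{ratio_times_100(132, 139.40):.1f}")  # 94.7, reported as 95

# Hypothetical reference death rates per person-year for three age strata.
reference_rates = [0.001, 0.004, 0.012]

# Two hypothetical cohorts with the same total person-years but different ages.
younger_cohort = [4000, 3000, 1000]
older_cohort = [1000, 3000, 4000]

for label, person_years in (("younger", younger_cohort), ("older", older_cohort)):
    expected = expected_deaths(person_years, reference_rates)
    print(f"{label} cohort: expected deaths = {expected:.1f}")
```

With identical total follow-up, the hypothetical older cohort has more than twice the expected deaths of the younger one, so a ratio computed against an expected count that ignores the age distribution can mislead in either direction.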

Nicholson’s Report was replete with many other methodological sins. He used a composite of three organs (liver, gall bladder, bile duct) without any biological rationale. His analysis combined male and female results, and still his analysis of the composite outcome was based upon only seven cases. Of those seven cases, some of the cases were not confirmed as primary liver cancer, and at least one case was confirmed as not being a primary liver cancer.[22]

Nicholson failed to standardize the analysis for the age distribution of the observed and expected cases, and he failed to present meaningful analysis of random or systematic error. When he did present p-values, he presented one-tailed values, and he made no corrections for his many comparisons from the same set of data.

Finally, and most egregiously, Nicholson’s meta-analysis was meta-analysis in name only. What he had done was simply to add “observed” and “expected” events across studies to arrive at totals, and to recalculate a bogus risk ratio, which he fraudulently called a standardized mortality ratio. Adding events across studies is not a valid meta-analysis; indeed, it is a well-known example of how to generate a Simpson’s Paradox, which can change the direction or magnitude of any association.[23]
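How simple addition across studies can reverse an association is shown in the sketch below. The two studies and all their counts are hypothetical and are not drawn from Nicholson’s Report; the class and function names are mine. The Mantel-Haenszel risk ratio appears only as one familiar stratified alternative, not as the method any party in Paoli proposed.

```python
from typing import NamedTuple


class Study(NamedTuple):
    exposed_events: int
    exposed_total: int
    unexposed_events: int
    unexposed_total: int


def risk_ratio(s: Study) -> float:
    """Risk in the exposed divided by risk in the unexposed."""
    return (s.exposed_events / s.exposed_total) / (s.unexposed_events / s.unexposed_total)


def naive_pooled_risk_ratio(studies) -> float:
    """Add the raw counts across studies, then compute a single ratio."""
    totals = Study(*(sum(column) for column in zip(*studies)))
    return risk_ratio(totals)


def mantel_haenszel_risk_ratio(studies) -> float:
    """Stratified summary that keeps each study's own comparison intact."""
    numerator = sum(s.exposed_events * s.unexposed_total / (s.exposed_total + s.unexposed_total)
                    for s in studies)
    denominator = sum(s.unexposed_events * s.exposed_total / (s.exposed_total + s.unexposed_total)
                      for s in studies)
    return numerator / denominator


# Hypothetical studies: each one shows a higher risk among the exposed.
studies = [
    Study(exposed_events=90, exposed_total=1000, unexposed_events=5, unexposed_total=100),
    Study(exposed_events=30, exposed_total=100, unexposed_events=200, unexposed_total=1000),
]

for i, s in enumerate(studies, start=1):
    print(f"study {i}: risk ratio = {risk_ratio(s):.2f}")
print(f"counts simply added: risk ratio = {naive_pooled_risk_ratio(studies):.2f}")
print(f"Mantel-Haenszel    : risk ratio = {mantel_haenszel_risk_ratio(studies):.2f}")
```

Both hypothetical studies show risk ratios above 1, yet adding the counts yields a ratio below 1, because the studies differ in baseline risk and in the balance of exposed to unexposed subjects. The stratified summary stays between the two study-level ratios.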

Some may be tempted to criticize the defense for having focused its challenge on the “novelty” of Nicholson’s approach in Paoli. The problem of course was the invalidity of Nicholson’s work, but both the trial court’s exclusion of Nicholson, and the Court of Appeals’ reversal and remand of the exclusion decision, illustrate the problem in getting judges, even well-respected judges, to accept their responsibility to engage with questioned scientific evidence.

Even in Paoli, no amount of ketchup could conceal the unsavoriness of Nicholson’s scrapple analysis. When the Paoli case reached the Court of Appeals again in 1994, Nicholson’s analysis was absent.[24] Apparently, the plaintiffs’ counsel had second thoughts about the whole matter. Today, under the revised Rule 702, there can be little doubt that Nicholson’s so-called meta-analysis should have been excluded.


[1]  Not to be confused with the Judge Kelly of the same district, who was unceremoniously disqualified after attending an ex parte conference with plaintiffs’ lawyers and expert witnesses, at the invitation of Dr. Irving Selikoff.

[2]  Pace Philip J. Landrigan & Myron A. Mehlman, “In Memoriam – William J. Nicholson,” 40 Am. J. Indus. Med. 231 (2001). Landrigan and Mehlman assert, without any support, that Nicholson was an epidemiologist. Their own description of his career, his undergraduate work at MIT, his doctorate in physics from the University of Washington, his employment at the Watson Laboratory, before becoming a staff member in Irving Selikoff’s department in 1969, all suggest that Nicholson brought little to no experience in epidemiology to his work on occupational and environmental exposure epidemiology.

[3]  In re Paoli RR Yard Litig., 706 F. Supp. 358, 372-73 (E.D. Pa. 1988).

[4]  William Nicholson, Report to the Workers’ Compensation Board on Occupational Exposure to PCBs and Various Cancers, for the Industrial Disease Standards Panel (ODP); IDSP Report No. 2 (Toronto, Ontario Dec. 1987).

[5]  Id. at 373.

[6]  United States v. Downing, 753 F.2d 1224 (3d Cir. 1985).

[7]  In re Paoli RR Yard PCB Litig., 916 F.2d 829 (3d Cir. 1990), cert. denied sub nom. General Elec. Co. v. Knight, 111 S.Ct. 1584 (1991).

[8]  Id. at 845.

[9]  Id.

[10]  Id. at 841, 848.

[11]  Id. at 845.

[12]  Id. at 847-48.

[13]  See, e.g., Robert Rosenthal, Judgment Studies: Design, Analysis, and Meta-Analysis (1987); Richard J. Light & David B. Pillemer, Summing Up: The Science of Reviewing Research (1984); Thomas A. Louis, Harvey V. Fineberg & Frederick Mosteller, “Findings for Public Health from Meta-Analyses,” 6 Ann. Rev. Public Health 1 (1985); Kristan A. L’Abbé, Allan S. Detsky & Keith O’Rourke, “Meta-analysis in clinical research,” 107 Ann. Intern. Med. 224 (1987).

[14]  Id. at 857.

[15]  Id. at 858.

[16]  Id. at 858.

[17]  Id. at 845.

[18]  Report, Table 16.

[19]  Report, Table 18.

[20]  In re Paoli, 916 F.2d at 847.

[21]  See General Electric v. Joiner, 522 U.S. 136 (1997); NAS, “How Have Important Rule 702 Holdings Held Up With Time?” (March 20, 2015).

[22]  Report, Table 22.

[23]  James A. Hanley, Gilles Thériault, Ralf Reintjes and Annette de Boer, “Simpson’s Paradox in Meta-Analysis,” 11 Epidemiology 613 (2000); H. James Norton & George Divine, “Simpson’s paradox and how to avoid it,” Significance 40 (Aug. 2015); George Udny Yule, Notes on the theory of association of attributes in Statistics, 2 Biometrika 121 (1903).

[24]  In re Paoli RR Yard Litig., 35 F.3d 717 (3d Cir. 1994).

Specious Claiming in Multi-District Litigation

May 2nd, 2019

In a recent article in an American Bar Association newsletter, Paul Rheingold notes with some concern that, in the last two years or so, there has been a rash of dismissals of entire multi-district litigations (MDLs) based upon plaintiffs’ failure to produce expert witnesses who can survive Rule 702 gatekeeping.[1]  Paul D. Rheingold, “Multidistrict Litigation Mass Terminations for Failure to Prove Causation,” A.B.A. Mass Tort Litig. Newsletter (April 24, 2019) [cited as Rheingold]. According to Rheingold, judges historically involved in the MDL processing of products liability cases did not grant summary judgments across the board. In other words, federal judges felt that if plaintiffs’ lawyers aggregated a sufficient number of cases, then their judicial responsibility was to push settlements or to remand the cases to the transferor courts for trial.

Missing from Rheingold’s account is the prevalent judicial view, in the early going of MDL of products cases, which held that judges lacked the authority to consider Rule 702 motions for all cases in the MDL. Gatekeeping motions were considered extreme and best avoided by pushing them off to the transferor courts upon remand. In MDL 926, involving silicone gel breast implants, the late Judge Sam Pointer, who was a member of the Rules Advisory Committee, expressed the view that Rule 702 gatekeeping was a trial court function, for the trial judge who received the case on remand from the MDL.[2] Judge Pointer’s view was a commonplace in the 1990s. As mass tort litigation moved into MDL “camps,” judges more frequently adopted a managerial rather than a judicial role, and exerted great pressure on the parties, and the defense in particular, to settle cases. These judges frequently expressed their view that the two sides so stridently disagreed on causation that the truth must be somewhere in between, and even with “a little causation,” the defendants should offer a little compensation. These litigation managers thus eschewed dispositive motion practice, or gave it short shrift.

Rheingold cites five recent MDL terminations based upon “Daubert failure,” and he acknowledges other MDLs collapsed because of federal pre-emption issues (Eliquis, Incretins, and possibly Fosamax), and that other fatally weak causal MDL claims settled for nominal compensation (NuvaRing). He omits other MDLs, such as In re Silica, in which an entire MDL collapsed because of prevalent fraud in the screening and diagnosing of silicosis claimants by plaintiffs’ counsel and their expert witnesses.[3] Also absent from his reckoning is the collapse of MDL cases against Celebrex[4] and Viagra[5].

Rheingold does concede that the recent across-the-board dismissals of MDLs were due to very weak causal claims.[6] He softens his judgment by suggesting that the weaknesses were apparent “at least in retrospect,” but the weaknesses were clearly discernible before litigation by the refusal of regulatory agencies, such as the FDA, to accept the litigation-driven causal claims. Rheingold also tries to assuage fellow plaintiffs’ counsel by suggesting that plaintiffs’ lawyers somehow fell prey to the pressure to file cases because of internet advertising and the encouragement of records collection and analysis firms. This attribution of naiveté to Plaintiffs’ Steering Committee (PSC) members does not ring true given the wealth and resources of lawyers on PSCs. Furthermore, the suggestion that PSC members may be newcomers to the MDL playing fields does not hold water given that most of the lawyers involved are “repeat players,” with substantial experience and financial incentives to sort out invalid expert witness opinions.[7]

Rheingold offers the wise counsel that plaintiffs’ lawyers “should take [their] time and investigate for [themselves] the potential proof available for causation and adequacy of labeling.” If history is any guide, his advice will not be followed.


[1] Rheingold cites five MDLs that were “Daubert failures” in recent times: (1) In re Lipitor (Atorvastatin Calcium) Marketing, Sales Practices & Prods. Liab. Litig. (MDL 2502), 892 F.3d 624 (4th Cir. 2018) (affirming Rule 702 dismissal of claims that atorvastatin use caused diabetes); (2) In re Mirena IUD Products Liab. Litig. (Mirena I, MDL 2434), 713 F. App’x 11 (2d Cir. 2017) (affirming exclusion of expert witnesses’ opinion testimony that the intrauterine device caused embedment and perforation); (3) In re Mirena IUS Levonorgestrel-Related Prods. Liab. Litig. (Mirena II, MDL 2767), 341 F. Supp. 3d 213 (S.D.N.Y. 2018) (excluding expert witnesses’ opinions that the product caused pseudotumor cerebri); (4) In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., 858 F.3d 787 (3d Cir. 2017) (affirming MDL trial court’s Rule 702 exclusions of opinions that Zoloft is teratogenic); (5) Jones v. SmithKline Beecham, 652 F. App’x 848 (11th Cir. 2016) (affirming MDL court’s Rule 702 exclusions of expert witness opinions that denture adhesive creams caused metal deficiencies).

[2]  Not only was Judge Pointer a member of the Rules committee, he was the principal author of the 1993 Amendments to the Federal Rules of Civil Procedure, as well as the editor-in-chief of the Federal Judicial Center’s Manual for Complex Litigation. At an ALI-ABA conference in 1997, Judge Pointer complained about the burden of gatekeeping. 3 Federal Discovery News 1 (Aug. 1997). He further opined that, under Rule 104(a), he could “look to decisions from the Southern District of New York and Eastern District of New York, where the same expert’s opinion has been offered and ruled upon by those judges. Their rulings are hearsay, but hearsay is acceptable. So I may use their rulings as a basis for my decision on whether to allow it or not.” Id. at 4. Even after Judge Jack Weinstein excluded plaintiffs’ expert witnesses’ causal opinions in the silicone litigation, however, Judge Pointer avoided having to make an MDL-wide decision with the scope of one of the leading judges from the Southern and Eastern Districts of New York. See In re Breast Implant Cases, 942 F. Supp. 958 (E. & S.D.N.Y. 1996). Judge Pointer repeated his anti-Daubert views three years later at a symposium on expert witness opinion testimony. See Sam C. Pointer, Jr., “Response to Edward J. Imwinkelried, The Taxonomy of Testimony Post-Kumho: Refocusing on the Bottom Lines of Reliability and Necessity,” 30 Cumberland L. Rev. 235 (2000).

[3]  In re Silica Products Liab. Litig., MDL No. 1553, 398 F. Supp. 2d 563 (S.D. Tex. 2005).

[4]  In re Bextra & Celebrex Marketing Sales Practices & Prod. Liab. Litig., 524 F. Supp. 2d 1166 (N.D. Calif. 2007) (excluding virtually all relevant expert witness testimony proffered to support claims that ordinary dosages of these COX-2 inhibitors caused cardiovascular events).

[5]  In re Viagra Products Liab. Litig., 572 F. Supp. 2d 1071 (D. Minn. 2008) (addressing claims that sildenafil causes vision loss from non-arteritic anterior ischemic optic neuropathy (NAION)).

[6]  Rheingold (“Examining these five mass terminations, at least in retrospect[,] it is apparent that they were very weak on causation.”).

[7] See Elizabeth Chamblee Burch & Margaret S. Williams, “Repeat Players in Multidistrict Litigation: The Social Network,” 102 Cornell L. Rev. 1445 (2017); Margaret S. Williams, Emery G. Lee III & Catherine R. Borden, “Repeat Players in Federal Multidistrict Litigation,” 5 J. Tort L. 141, 149–60 (2014).