TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Judicial Gatekeeping Cures Claims That Viagra Can Cause Melanoma

January 24th, 2020

The phosphodiesterase type 5 inhibitor (PDE5i) medications seem to arouse the litigation propensities of the lawsuit industry. The PDE5i medications (sildenafil, tadalafil, etc.) have multiple indications, but they are perhaps best known for their ability to induce penile erections, which in some situations can be a very useful outcome.

The launch of Viagra in 1998 was followed by litigation that claimed the drug caused heart attacks, and not the romantic kind. The only broken hearts, however, were those of the plaintiffs’ lawyers and their expert witnesses who saw their litigation claims excluded and dismissed.[1]

Then came claims that the PDE5i medications caused non-arteritic anterior ischemic optic neuropathy (“NAION”), based upon a dubious epidemiologic study by Dr. Gerald McGwin. This litigation demonstrated, if anything, that while love may be blind, erections need not be.[2] The NAION cases were consolidated in a multi-district litigation (MDL) in front of Judge Paul Magnuson, in the District of Minnesota. After considerable back and forth, Judge Magnuson ultimately concluded that the McGwin study was untrustworthy, and the NAION claims were dismissed.[3]

In 2014, the American Medical Association’s internal medicine journal published an observational epidemiologic study of sildenafil (Viagra) use and melanoma.[4] The authors of the study interpreted their study modestly, concluding:

“[s]ildenafil use may be associated with an increased risk of developing melanoma. Although this study is insufficient to alter clinical recommendations, we support a need for continued investigation of this association.”

Although the Li study eschewed causal conclusions and new clinical recommendations in view of the need for more research into the issue, the litigation industry filed lawsuits, claiming causality.[5]

In the new natural order of things, as soon as the litigation industry cranks out more than a few complaints, an MDL results, and the PDE5i–melanoma claims were no exception. By spring 2016, plaintiffs’ counsel had collected ten cases, a minyan, sufficient for an MDL.[6] The MDL plaintiffs named the manufacturers of sildenafil and tadalafil, two of the more widely prescribed PDE5i medications, on behalf of putative victims.

While the MDL cases were winding their way through discovery and possible trials, additional studies and meta-analyses were published. None of the subsequent studies, including the systematic reviews and meta-analyses, concluded that there was a causal association. Most scientists who were publishing on the issue opined that systematic error (generally confounding) prevented a causal interpretation of the data.[7]

Many of the observational studies found statistically significant increases in relative risk of about 1.1 to 1.2 (10 to 20 percent), typically with upper bounds of 95% confidence intervals less than 2.0. The only scientists who inferred general causation from the available evidence were those who had been recruited and retained by plaintiffs’ counsel. As plaintiffs’ expert witnesses, they contended that the Li study, and the several studies that became available afterwards, collectively showed that PDE5i drugs cause melanoma in humans.

Not surprisingly, given the absence of any non-litigation experts endorsing the causal conclusion, the defendants challenged plaintiffs’ proffered expert witnesses under Federal Rule of Evidence 702. Plaintiffs’ counsel also embraced judicial gatekeeping and challenged the defense experts. The MDL trial judge, the Hon. Richard Seeborg, held hearings with four days of viva voce testimony from four of plaintiffs’ expert witnesses (two on biological plausibility, and two on epidemiology), and three of the defense’s experts. Last week, Judge Seeborg ruled by granting in part, and denying in part, the parties’ motions.[8]

The Decision

The MDL trial judge’s opinion is noteworthy in many respects. First, Judge Richard Seeborg cited and applied Rule 702, a statute, and not dicta from case law that predates the most recent statutory version of the rule. As a legal process matter, this respect for judicial process and the difference in legal authority between statutory and common law was refreshing. Second, the judge framed the Rule 702 issue, in line with the statute, and Ninth Circuit precedent, as an inquiry whether expert witnesses deviated from the standard of care of how scientists “conduct their research and reach their conclusions.”[9]

Biological Plausibility

Plaintiffs proffered three expert witnesses on biological plausibility, Drs. Rizwan Haq, Anand Ganesan, and Gary Piazza. All were subject to motions to exclude under Rule 702. Judge Seeborg denied the defense motions against all three of plaintiffs’ plausibility witnesses.[10]

The MDL judge determined that biological plausibility is neither necessary nor sufficient for inferring causation in science or in the law. The defense argued that the plausibility witnesses relied upon animal and cell culture studies that were unrealistic models of the human experience.[11] The MDL court, however, found that the standard for opinions on biological plausibility is relatively forgiving, and that the testimony of all three of plaintiffs’ proffered witnesses was admissible.

The subjective nature of opinions about biological plausibility is widely recognized in medical science.[12] Plausibility determinations are typically “Just So” stories, offered in the absence of hard evidence that postulated mechanisms are actually involved in a real causal pathway in human beings.

Causal Association

The real issue in the MDL hearings was the conclusion reached by plaintiffs’ expert witnesses that the PDE5i medications cause melanoma. The MDL court did not have to determine whether epidemiologic studies were necessary for such a causal conclusion. Plaintiffs’ counsel had proffered three expert witnesses with more or less expertise in epidemiology: Drs. Rehana Ahmed-Saucedo, Sonal Singh, and Feng Liu-Smith. All of plaintiffs’ epidemiology witnesses, and certainly all of defendants’ experts, implicitly if not explicitly embraced the proposition that analytical epidemiology was necessary to determine whether PDE5i medications can cause melanoma.

In their motions to exclude Ahmed-Saucedo, Singh, and Liu-Smith, the defense pointed out that, although many of the studies yielded statistically significant estimates of melanoma risk, none of the available studies adequately accounted for systematic bias in the form of confounding. Although the plaintiffs’ plausibility expert witnesses advanced “Just-So” stories about PDE5i and melanoma, the available studies showed an almost identical increased risk of basal cell carcinoma of the skin, which could be explained by confounding, but not by plaintiffs’ postulated mechanisms.[13]

The MDL court acknowledged that whether epidemiologic studies “adequately considered” confounding was “central” to the Rule 702 inquiry. Without any substantial analysis, however, the court gave its own ipse dixit that the existence vel non of confounding was an issue for cross-examination and the jury’s resolution.[14] Whether there was a reasonably valid association between PDE5i and melanoma was a jury question. This judicial refusal to engage with the issue of confounding was one of the disappointing aspects of the decision.

The MDL court was less forgiving when it came to the plaintiffs’ epidemiology expert witnesses’ assessment of the association as causal. All the parties’ epidemiology witnesses invoked Sir Austin Bradford Hill’s viewpoints or factors for judging whether associations were causal.[15] Although they embraced Hill’s viewpoints on causation, the plaintiffs’ epidemiologic expert witnesses had a much more difficult time faithfully applying them to the evidence at hand. The MDL court concluded that the plaintiffs’ witnesses deviated from their own professional standard of care in their analysis of the data.[16]

Hill’s first enumerated factor was “strength of association,” which is typically expressed epidemiologically as a risk ratio or a risk difference. The MDL court noted that the extant epidemiologic studies generally showed relative risks around 1.2 for PDE5i and melanoma, which was “undeniably” not a strong association.[17]

The plaintiffs’ epidemiology witnesses were at sea on how to explain away the lack of strength in the putative association. Dr. Ahmed-Saucedo retreated into an emphasis on how all or most of the studies found some increased risk, but the MDL court correctly found that this ruse was merely a conflation of strength with consistency of the observed associations. Dr. Ahmed-Saucedo’s dismissal of dose-response relationship, another Hill factor, as unimportant sealed her fate. The MDL court found that her Bradford Hill analysis was “unduly results-driven,” and that her proffered testimony was not admissible.[18] The MDL court found that Dr. Feng Liu-Smith similarly conflated strength of association with consistency, an error that was too great a deviation from the professional standard of care.[19]

Dr. Sonal Singh fared no better after he contradicted his own prior testimony that there is an order of importance to the Hill factors, with “strength of association” at or near the top. In the face of a set of studies, none of which showed a strong association, Dr. Singh abandoned his own interpretative principle to suit the litigation needs of the case. His analysis placed the greatest weight on the Li study, which had the highest risk ratio, but he failed to advance any persuasive reason for his emphasis on one of the smallest studies available. The MDL court found Dr. Singh’s claim to have weighed strength of association heavily, despite the obvious absence of strong associations, puzzling, and too great an analytical gap to abide.[20]

Judge Seeborg thus concluded that while the plaintiffs’ expert witnesses could opine that there was an association, which was arguably plausible, they could not, under Rule 702, contend that the association was causal. In attempting to advance an argument that the association met Bradford Hill’s factors for causality, the plaintiffs’ witnesses had ignored, misrepresented, or confused one of the most important factors, strength of the association, in a way that revealed their analyses to be results-driven and unfaithful to the methodology they claimed to have followed. Judge Seeborg emphasized a feature of the revised Rule 702, which often is ignored by his fellow federal judges:[21]

“Under the amendment, as under Daubert, when an expert purports to apply principles and methods in accordance with professional standards, and yet reaches a conclusion that other experts in the field would not reach, the trial court may fairly suspect that the principles and methods have not been faithfully applied. See Lust v. Merrell Dow Pharmaceuticals, Inc., 89 F.3d 594, 598 (9th Cir. 1996). The amendment specifically provides that the trial court must scrutinize not only the principles and methods used by the expert, but also whether those principles and methods have been properly applied to the facts of the case.”

Given that the plaintiffs’ witnesses purported to apply a generally accepted methodology, Judge Seeborg was left to question why they would conclude causality when no one else in their field had done so.[22] The epidemiologic issue had been around for several years, and addressed not just in observational studies, but systematically reviewed and meta-analyzed. The absence of published causal conclusions was not just an absence of evidence, but evidence of absence of expert support for how plaintiffs’ expert witnesses applied the Bradford Hill factors.

Reliance Upon Studies That Did Not Conclude Causation Existed

Parties challenging causal claims will sometimes point to the absence of a causal conclusion in the publication of individual epidemiologic studies that are the main basis for the causal claim. In the PDE5i-melanoma cases, the defense advanced this argument unsuccessfully. The MDL court rejected the defense argument, based upon the absence of any comprehensive review of all the pertinent evidence for or against causality in an individual study; the study authors are mostly concerned with conveying the results of their own study.[23] The authors may have a short discussion of other study results as the rationale for their own study, but such discussions are often limited in scope and purpose. Judge Seeborg, in this latest round of PDE5i litigation, thus did not fault plaintiffs’ witnesses’ reliance upon epidemiologic or mechanistic studies, which individually did not assert causal conclusions; rather it was the absence of causal conclusions in systematic reviews, meta-analyses, narrative reviews, regulatory agency pronouncements, or clinical guidelines that ultimately raised the fatal inference that the plaintiffs’ witnesses were not faithfully deploying a generally accepted methodology.

The defense argument that pointed to the individual epidemiologic studies themselves derives some legal credibility from the Supreme Court’s opinion in General Electric Co. v. Joiner, 522 U.S. 136 (1997). In Joiner, the SCOTUS took plaintiffs’ expert witnesses to task for drawing stronger conclusions than were offered in the papers upon which they relied. Chief Justice Rehnquist gave considerable weight to the consideration that the plaintiffs’ expert witnesses relied upon studies, the authors of which explicitly refused to interpret as supporting a conclusion of human disease causation.[24]

Joiner’s criticisms of the reliance upon studies that do not themselves reach causal conclusions have gained a foothold in the case law interpreting Rule 702. The Fifth Circuit, for example, has declared:[25]

“It is axiomatic that causation testimony is inadmissible if an expert relies upon studies or publications, the authors of which were themselves unwilling to conclude that causation had been proven.”

This aspect of Joiner may properly limit the over-interpretation or misinterpretation of an individual study, which seems fine.[26] The Joiner case may, however, perpetuate an authority-based view of science to the detriment of requiring good and sufficient reasons to support the testifying expert witnesses’ opinions.  The problem with Joiner’s suggestion that expert witness opinion should not be admissible if it disagrees with the study authors’ discussion section is that sometimes study authors grossly over-interpret their data.  When it comes to scientific studies written by “political scientists” (scientists who see their work as advancing a political cause or agenda), then the discussion section often becomes a fertile source of unreliable, speculative opinions that should not be given credence in Rule 104(a) contexts, and certainly should not be admissible in trials. In other words, the misuse of non-rigorous comments in published articles can cut both ways.

There have been, and will continue to be, occasions in which published studies contain data, relevant and important to the causation issue, but which studies also contain speculative, personal opinions expressed in the Introduction and Discussion sections.  The parties’ expert witnesses may disagree with those opinions, but such disagreements hardly reflect poorly upon the testifying witnesses.  Neither side’s expert witnesses should be judged by those out-of-court opinions.  Perhaps the hearsay discussion section may be considered under Rule 104(a), which suspends the application of the Rules of Evidence, but it should hardly be a dispositive factor, other than raising questions for the reviewing court.

In exercising their gatekeeping function, trial judges should exercise care in how they assess expert witnesses’ reliance upon study data and analyses, when they disagree with the hearsay authors’ conclusions or discussions.  Given how many journals cater to advocacy scientists, and how variable the quality of peer review is, testifying expert witnesses should, in some instances,  have the expertise to interpret the data without substantial reliance upon, or reference to, the interpretative comments in the published literature.

Judge Seeborg sensibly seems to have distinguished between the absence of causal conclusions in individual epidemiologic studies and the absence of causal conclusions in any reputable medical literature.[27] He refused to be ensnared in the Joiner argument because:[28]

“Epidemiology studies typically only expressly address whether an association exists between agents such as sildenafil and tadalafil and outcomes like melanoma progression. As explained in In re Roundup Prod. Liab. Litig., 390 F. Supp. 3d 1102, 1116 (N.D. Cal. 2018), ‘[w]hether the agents cause the outcomes, however, ordinarily cannot be proven by epidemiological studies alone; an evaluation of causation requires epidemiologists to exercise judgment about the import of those studies and to consider them in context’.”

This new MDL opinion, relying upon the Advisory Committee Notes to Rule 702, is thus a more felicitous statement of the goals of gatekeeping.

Confidence Intervals

As welcome as some aspects of Judge Seeborg’s opinion are, the decision is not without mistakes. The district judge, like so many of his judicial colleagues, trips over the proper interpretation of a confidence interval:[29]

“When reviewing the results of a study it is important to consider the confidence interval, which, in simple terms, is the ‘margin of error’. For example, a given study could calculate a relative risk of 1.4 (a 40 percent increased risk of adverse events), but show a 95 percent ‘confidence interval’ of .8 to 1.9. That confidence interval means there is 95 percent chance that the true value—the actual relative risk—is between .8 and 1.9.”

This statement is inescapably wrong. The 95 percent probability attaches to the capturing of the true parameter – the actual relative risk – in the long run of repeated confidence intervals that result from repeated sampling of the same sample size, in the same manner, from the same population. In Judge Seeborg’s example, the next sample might give a relative risk point estimate 1.9, and that new estimate will have a confidence interval that may run from just below 1.0 to over 3. A third sample might turn up a relative risk estimate of 0.8, with a confidence interval that runs from say 0.3 to 1.4. Neither the second nor the third sample would be reasonably incompatible with the first. A more accurate assessment of the true parameter is that it will be somewhere between 0.3 and 3, a considerably broader range for the 95 percent.
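The distinction is easier to see in simulation. The following minimal sketch (Python, with an assumed true relative risk and study sizes chosen purely for illustration, not drawn from any of the studies discussed above) shows that the 95 percent figure describes the long-run coverage of the interval-generating procedure over repeated samples, not the probability that any one interval contains the true relative risk:

import numpy as np

rng = np.random.default_rng(0)
true_rr = 1.4        # assumed true relative risk (illustrative only)
p0, n = 0.05, 2000   # assumed baseline risk and per-group size
trials, covered = 10_000, 0

for _ in range(trials):
    a = rng.binomial(n, p0 * true_rr)  # events among the exposed
    c = rng.binomial(n, p0)            # events among the unexposed
    if a == 0 or c == 0:
        continue  # skip the (vanishingly rare) degenerate samples
    log_rr = np.log((a / n) / (c / n))
    se = np.sqrt(1 / a - 1 / n + 1 / c - 1 / n)  # Katz standard error of log RR
    lo, hi = np.exp(log_rr - 1.96 * se), np.exp(log_rr + 1.96 * se)
    covered += lo <= true_rr <= hi

print(f"long-run coverage: {covered / trials:.3f}")  # close to 0.95
# Each individual interval either captures 1.4 or it does not; the 95
# percent describes the procedure's performance over repeated sampling.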

Judge Seeborg’s error is sadly all too common. Whenever I see the error, I wonder whence it came. Often the error is in briefs of both plaintiffs’ and defense counsel. In this case, I did not see the erroneous assertion about confidence intervals made in plaintiffs’ or defendants’ briefs.


[1]  Brumley  v. Pfizer, Inc., 200 F.R.D. 596 (S.D. Tex. 2001) (excluding plaintiffs’ expert witness who claimed that Viagra caused heart attack); Selig v. Pfizer, Inc., 185 Misc. 2d 600 (N.Y. Cty. S. Ct. 2000) (excluding plaintiff’s expert witness), aff’d, 290 A.D. 2d 319, 735 N.Y.S. 2d 549 (2002).

[2]  “Love is Blind but What About Judicial Gatekeeping of Expert Witnesses? – Viagra Part I” (July 7, 2012); “Viagra, Part II — MDL Court Sees The Light – Bad Data Trump Nuances of Statistical Inference” (July 8, 2012).

[3]  In re Viagra Prods. Liab. Litig., 572 F. Supp. 2d 1071 (D. Minn. 2008), 658 F. Supp. 2d 936 (D. Minn. 2009), and 658 F. Supp. 2d 950 (D. Minn. 2009).

[4]  Wen-Qing Li, Abrar A. Qureshi, Kathleen C. Robinson, and Jiali Han, “Sildenafil use and increased risk of incident melanoma in US men: a prospective cohort study,” 174 J. Am. Med. Ass’n Intern. Med. 964 (2014).

[5]  See, e.g., Herrara v. Pfizer Inc., Complaint in 3:15-cv-04888 (N.D. Calif. Oct. 23, 2015); Diana Novak Jones, “Viagra Increases Risk Of Developing Melanoma, Suit Says,” Law360 (Oct. 26, 2015).

[6]  See In re Viagra (Sildenafil Citrate) Prods. Liab. Litig., 176 F. Supp. 3d 1377, 1378 (J.P.M.L. 2016).

[7]  See, e.g., Jenny Z. Wang, Stephanie Le , Claire Alexanian, Sucharita Boddu, Alexander Merleev, Alina Marusina, and Emanual Maverakis, “No Causal Link between Phosphodiesterase Type 5 Inhibition and Melanoma,” 37 World J. Men’s Health 313 (2019) (“There is currently no evidence to suggest that PDE5 inhibition in patients causes increased risk for melanoma. The few observational studies that demonstrated a positive association between PDE5 inhibitor use and melanoma often failed to account for major confounders. Nonetheless, the substantial evidence implicating PDE5 inhibition in the cyclic guanosine monophosphate (cGMP)-mediated melanoma pathway warrants further investigation in the clinical setting.”); Xinming Han, Yan Han, Yongsheng Zheng, Qiang Sun, Tao Ma, Li Dai, Junyi Zhang, and Lianji Xu, “Use of phosphodiesterase type 5 inhibitors and risk of melanoma: a meta-analysis of observational studies,” 11 OncoTargets & Therapy 711 (2018).

[8]  In re Viagra (Sildenafil Citrate) and Cialis (Tadalafil) Prods. Liab. Litig., Case No. 16-md-02691-RS, Order Granting in Part and Denying in Part Motions to Exclude Expert Testimony (N.D. Calif. Jan. 13, 2020) [cited as Opinion].

[9]  Opinion at 8 (“determin[ing] whether the analysis undergirding the experts’ testimony falls within the range of accepted standards governing how scientists conduct their research and reach their conclusions”), citing Daubert v. Merrell Dow Pharm., Inc. (Daubert II), 43 F.3d 1311, 1317 (9th Cir. 1995).

[10]  Opinion at 11.

[11]  Opinion at 11-13.

[12]  See Kenneth J. Rothman, Sander Greenland, and Timothy L. Lash, “Introduction,” chap. 1, in Kenneth J. Rothman, et al., eds., Modern Epidemiology at 29 (3d ed. 2008) (“no approach can transform plausibility into an objective causal criterion”).

[13]  Opinion at 15-16.

[14]  Opinion at 16-17.

[15]  See Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295 (1965); see also “Woodside & Davis on the Bradford Hill Considerations” (April 23, 2013).

[16]  Opinion at 17-21.

[17]  Opinion at 18. The MDL court cited In re Silicone Gel Breast Implants Prod. Liab. Litig., 318 F. Supp. 2d 879, 893 (C.D. Cal. 2004), for the proposition that relative risks greater than 2.0 permit the inference that the agent under study “was more likely than not responsible for a particular individual’s disease.”

[18]  Opinion at 18.

[19]  Opinion at 20.

[20]  Opinion at 19.

[21]  Opinion at 21, quoting from Rule 702, Advisory Committee Notes (emphasis in Judge Seeborg’s opinion).

[22]  Opinion at 21.

[23]  See “Follow the Data, Not the Discussion” (May 2, 2010).

[24]  Joiner, 522 U.S. at 145-46 (noting that the PCB studies at issue did not support expert witnesses’ conclusion that PCB exposure caused cancer because the study authors, who conducted the research, were not willing to endorse a conclusion of causation).

[25]  Huss v. Gayden, 571 F.3d 442 (5th Cir. 2009) (citing Vargas v. Lee, 317 F.3d 498, 501-01 (5th Cir. 2003) (noting that studies that did not themselves embrace causal conclusions undermined the reliability of the plaintiffs’ expert witness’s testimony that trauma caused fibromyalgia)); see also McClain v. Metabolife Internat’l, Inc., 401 F.3d 1233, 1247-48 (11th Cir. 2005) (expert witnesses’ reliance upon studies that did not reach causal conclusions about ephedrine supported the challenge to the reliability of their proffered opinions); Happel v. Walmart, 602 F.3d 820, 826 (7th Cir. 2010) (observing that it “is axiomatic that causation testimony is inadmissible if an expert relies upon studies or publications, the authors of which were themselves unwilling to conclude that causation had been proven”).

[26]  In re Accutane Prods. Liab. Litig., 511 F. Supp. 2d 1288, 1291 (M.D. Fla. 2007) (“When an expert relies on the studies of others, he must not exceed the limitations the authors themselves place on the study. That is, he must not draw overreaching conclusions.”) (internal citations omitted).

[27]  See Rutigliano v. Valley Bus. Forms, 929 F. Supp. 779, 785 (D.N.J. 1996), aff’d, 118 F.3d 1577 (3d Cir. 1997) (“law warns against use of medical literature to draw conclusions not drawn in the literature itself …. Reliance upon medical literature for conclusions not drawn therein is not an accepted scientific methodology.”).

[28]  Opinion at 14.

[29]  Opinion at 4-5.

Is the IARC Lost in the Weeds?

November 30th, 2019

A couple of years ago, I met David Zaruk at a Society for Risk Analysis meeting, where we were both presenting. I was aware of David’s blogging and investigative journalism, but meeting him gave me a greater appreciation for the breadth and depth of his work. For those of you who do not know David, he is present in cyberspace as the Risk-Monger who blogs about risk and science communications issues. His blog has featured cutting-edge exposés about the distortions in risk communications perpetuated by the advocacy of non-governmental organizations (NGOs). Previously, I have recorded my objections to the intellectual arrogance of some such organizations that purport to speak on behalf of the public interest, when often they act in cahoots with the lawsuit industry in the manufacturing of tort and environmental litigation.

David’s writing on the lobbying and control of NGOs by plaintiffs’ lawyers from the United States should be required reading for everyone who wants to understand how litigation sausage is made. His series, “SlimeGate,” details the interplay among NGO lobbying, lawsuit industry maneuvering, and carcinogen determinations at the International Agency for Research on Cancer (IARC). The IARC, a branch of the World Health Organization, is headquartered in Lyon, France. The IARC convenes “working groups” to review the scientific studies of the carcinogenicity of various substances and processes. The IARC working groups produce “monographs” of their reviews, and the IARC publishes these monographs, in print and on-line. The United States is in the top tier of participating countries for funding the IARC.

The IARC was founded in 1965, when observational epidemiology was still very much an emerging science, with expertise concentrated in only a few countries. For its first few decades, the IARC enjoyed a good reputation, and its monographs were considered definitive reviews, especially under its first director, Dr. John Higginson, from 1966 to 1981.[1] By the end of the 20th century, the need for the IARC and its reviews had waned, as the methods of systematic review and meta-analysis had evolved significantly, and had become more widely standardized and practiced.

Understandably, the IARC has been concerned that the members of its working groups should be viewed as disinterested scientists. Unfortunately, this concern has been translated into an asymmetrical standard that excludes anyone with a hint of manufacturing connection, but keeps the door open for those scientists with deep lawsuit industry connections. Speaking on behalf of the plaintiffs’ bar, Michael Papantonio, a plaintiffs’ lawyer who founded Mass Torts Made Perfect, noted that “We [the lawsuit industry] operate just like any other industry.”[2]

David Zaruk has shown how this asymmetry has been exploited mercilessly by the lawsuit industry and its agents in connection with the IARC’s review of glyphosate.[3] The resulting IARC classification of glyphosate has led to a litigation firestorm and an all-out assault on agricultural sustainability and productivity.[4]

The anomaly of the IARC’s glyphosate classification has been noted by scientists as well. Dr. Geoffrey Kabat is a cancer epidemiologist, who has written perceptively on the misunderstandings and distortions of cancer risk assessments in various settings.[5] He has previously written about glyphosate in Forbes and elsewhere, but recently he has written an important essay on glyphosate in Issues in Science and Technology, which is published by the National Academies of Sciences, Engineering, and Medicine and Arizona State University. In his essay, Dr. Kabat details how the IARC’s evaluation of glyphosate is an outlier in the scientific and regulatory world, and is not well supported by the available evidence.[6]

The problems with the IARC are both substantive and procedural.[7] One of the key problems that face IARC evaluations is an incoherent classification scheme. IARC evaluations classify putative human carcinogenic risks into five categories: Group 1 (known), Group 2A (probably), Group 2B (possibly), Group 3 (unclassifiable), and Group 4 (probably not). Group 4 is virtually an empty set, with only one substance, caprolactam ((CH2)5C(O)NH), an organic compound used in the manufacture of nylon.

In the IARC evaluation at issue, glyphosate was placed into Group 2A, which would seem to satisfy the legal system’s requirement that an exposure more likely than not causes the harm in question. Appearances and word usage, however, can be deceiving. Probability is a continuous scale from zero to one. In Bayesian decision making, zero and one are unavailable as starting points because, if either were our starting point, no amount of evidence could ever change our judgment of the probability of causation (Cromwell’s Rule). The IARC informs us that its use of “probably” is quite idiosyncratic; the probability that a Group 2A agent causes cancer has “no quantitative” meaning. All the IARC intends is that a Group 2A classification “signifies a greater strength of evidence than possibly carcinogenic.”[8]

In other words, Group 2A classifications are consistent with having posterior probabilities of less than 0.5 (or 50 percent). A working group could judge the probability of a substance or a process to be carcinogenic to humans to be greater than zero, but no more than five or ten percent, and still vote for a 2A classification, in keeping with the IARC Preamble. This low probability threshold for a 2A classification converts the judgment of “probably carcinogenic” into a precautionary prescription, rendered when the most probable assessment is either ignorance or lack of causality. There is thus a practical certainty, close to 100%, that a 2A classification will confuse judges and juries, as well as the scientific community.
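Cromwell’s Rule is easy to demonstrate. The following minimal sketch (Python, with priors and a likelihood ratio that are entirely hypothetical, not drawn from the IARC’s materials) shows that under Bayes’ theorem a prior probability of exactly zero (or one) can never be moved by any evidence, while intermediate priors update routinely:

def posterior(prior: float, likelihood_ratio: float) -> float:
    """Update P(causation) by Bayes' theorem, given evidence with the stated likelihood ratio."""
    if prior in (0.0, 1.0):
        return prior  # the extremes are immovable: no evidence can update them
    odds = prior / (1 - prior)
    post_odds = odds * likelihood_ratio
    return post_odds / (1 + post_odds)

for prior in (0.0, 0.10, 0.50):
    print(prior, "->", round(posterior(prior, likelihood_ratio=5.0), 3))
# 0.0 -> 0.0 (no matter how strong the evidence); 0.1 -> 0.357; 0.5 -> 0.833

On that arithmetic, nothing in a 2A label tells a factfinder that the posterior probability of carcinogenicity exceeds one half.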

In IARC-speak, a 2A “probability” connotes “sufficient evidence” in experimental animals, and “limited evidence” in humans. A substance can receive a 2A classification even when the sufficient evidence of carcinogenicity occurs in a single non-human animal species, even though other animal species fail to show carcinogenicity. A 2A classification can raise the thorny question in court whether a claimant is more like a rat or a mouse.

Similarly, “limited evidence” in humans can be based upon inconsistent observational studies that fail to measure and adjust for known and potential confounding risk factors and systematic biases. The 2A classification requires little substantively or semantically, and many 2A classifications leave juries and judges to determine whether a chemical or medication caused a human being’s cancer, when the basic predicates for Sir Austin Bradford Hill’s factors for causal judgment have not been met.[9]

In courtrooms, IARC 2A classifications should be excluded as legally irrelevant, under Rule 403. Even if a 2A IARC classification were a credible judgment of causation, admitting evidence of the classification would be “substantially outweighed by a danger of … unfair prejudice, confusing the issues, [and] misleading the jury….”[10]

The IARC may be lost in the weeds, but there is no need to fret. A little Round Up™ will help.


[1]  See John Higginson, “The International Agency for Research on Cancer: A Brief History of Its History, Mission, and Program,” 43 Toxicological Sci. 79 (1998).

[2]  Sara Randazzo & Jacob Bunge, “Inside the Mass-Tort Machine That Powers Thousands of Roundup Lawsuits,” Wall St. J. (Nov. 25, 2019).

[3]  David Zaruk, “The Corruption of IARC,” Risk Monger (Aug. 24, 2019); David Zaruk, “Greed, Lies and Glyphosate: The Portier Papers,” Risk Monger (Oct. 13, 2017).

[4]  Ted Williams, “Roundup Hysteria,” Slate Magazine (Oct. 14, 2019).

[5]  See, e.g., Geoffrey Kabat, Hyping Health Risks: Environmental Hazards in Everyday Life and the Science of Epidemiology (2008); Geoffrey Kabat, Getting Risk Right: Understanding the Science of Elusive Health Risks (2016).

[6]  Geoffrey Kabat, “Who’s Afraid of Roundup?” 36 Issues in Science and Technology (Fall 2019).

[7]  See Schachtman, “Infante-lizing the IARC” (May 13, 2018); “The IARC Process is Broken” (May 4, 2016). See also Eric Lasker and John Kalas, “Engaging with International Carcinogen Evaluations,” Law360 (Nov. 14, 2019).

[8]  “IARC Preamble to the IARC Monographs on the Identification of Carcinogenic Hazards to Humans,” at Sec. B.5., p.31 (Jan. 2019); see also “IARC Advisory Group Report on Preamble” (Sept. 2019).

[9]  See Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295 (1965) (noting that only when “[o]ur observations reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance,” do we move on to consider the nine articulated factors for determining whether an association is causal).

[10]  Fed. R. Evid. 403.


Science Bench Book for Judges

July 13th, 2019

On July 1st of this year, the National Judicial College and the Justice Speakers Institute, LLC released an online publication of the Science Bench Book for Judges [Bench Book]. The Bench Book sets out to cover much of the substantive material already covered by the Federal Judicial Center’s Reference Manual:

Acknowledgments

Table of Contents

  1. Introduction: Why This Bench Book?
  2. What is Science?
  3. Scientific Evidence
  4. Introduction to Research Terminology and Concepts
  5. Pre-Trial Civil
  6. Pre-trial Criminal
  7. Trial
  8. Juvenile Court
  9. The Expert Witness
  10. Evidence-Based Sentencing
  11. Post Sentencing Supervision
  12. Civil Post Trial Proceedings
  13. Conclusion: Judges—The Gatekeepers of Scientific Evidence

Appendix 1 – Frye/Daubert—State-by-State

Appendix 2 – Sample Orders for Criminal Discovery

Appendix 3 – Biographies

The Bench Book gives some good advice in very general terms about the need to consider study validity,[1] and to approach scientific evidence with care and “healthy skepticism.”[2] When the Bench Book attempts to instruct on what it represents the scientific method of hypothesis testing, the good advice unravels:

“A scientific hypothesis simply cannot be proved. Statisticians attempt to solve this dilemma by adopting an alternate [sic] hypothesis – the null hypothesis. The null hypothesis is the opposite of the scientific hypothesis. It assumes that the scientific hypothesis is not true. The researcher conducts a statistical analysis of the study data to see if the null hypothesis can be rejected. If the null hypothesis is found to be untrue, the data support the scientific hypothesis as true.”[3]

Even in experimental settings, a statistical analysis of the data does not lead to a conclusion that the null hypothesis is untrue, as opposed to not reasonably compatible with the study’s data. In observational studies, the statistical analysis must acknowledge whether and to what extent the study has excluded bias and confounding. When the Bench Book turns to speak of statistical significance, more trouble ensues:

“The goal of an experiment, or observational study, is to achieve results that are statistically significant; that is, not occurring by chance.”[4]

In the world of result-oriented science, and scientific advocacy, it is perhaps true that scientists seek to achieve statistically significant results. Still, it seems crass to come right out and say so, as opposed to saying that the scientists are querying the data to see whether they are compatible with the null hypothesis. This first pass at statistical significance is only mildly astray compared with the Bench Book’s more serious attempts to define statistical significance and confidence intervals:

4.10 Statistical Significance

“The research field agrees that study outcomes must demonstrate they are not the result of random chance. Leaving room for an error of .05, the study must achieve a 95% level of confidence that the results were the product of the study. This is denoted as p ≤ 05. (or .01 or .1).”[5]

and

“The confidence interval is also a way to gauge the reliability of an estimate. The confidence interval predicts the parameters within which a sample value will fall. It looks at the distance from the mean a value will fall, and is measured by using standard deviations. For example, if all values fall within 2 standard deviations from the mean, about 95% of the values will be within that range.”[6]

Of course, the interval speaks to the precision of the estimate, not its reliability, but that is a small point. These definitions are virtually guaranteed to confuse judges into conflating statistical significance and the coefficient of confidence with the legal burden of proof probability.
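Both errors can be made concrete with a little arithmetic. Here is a minimal sketch (Python, standard library only, with wholly hypothetical counts) of a two-proportion significance test; the p-value gauges the data’s compatibility with the null hypothesis under the model’s assumptions, and is not the probability that the results were “the product of the study,” much less a legal burden-of-proof probability:

import math

# wholly hypothetical cohort counts
exposed_events, exposed_n = 30, 100
unexposed_events, unexposed_n = 15, 100

p1, p0 = exposed_events / exposed_n, unexposed_events / unexposed_n
pooled = (exposed_events + unexposed_events) / (exposed_n + unexposed_n)
se = math.sqrt(pooled * (1 - pooled) * (1 / exposed_n + 1 / unexposed_n))
z = (p1 - p0) / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approximation

print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")  # z = 2.54, p = 0.011
# A p-value this small means the data are not reasonably compatible with
# the null hypothesis, under the model's assumptions; it does not show the
# null is "untrue," and it says nothing about bias or confounding.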

The Bench Book runs into problems in interpreting legal decisions, which would seem softer grist for the judicial mill. The authors present dictum from the Daubert decision as though it were a holding:[7]

“As noted in Daubert, ‘[t]he focus, of course, must be solely on principles and methodology, not on the conclusions they generate’.”

The authors fail to mention that this dictum was abandoned in Joiner, and that it is specifically rejected by statute, in the 2000 revision to the Federal Rule of Evidence 702.

Early in the Bench Book, its authors present a subsection entitled “The Myth of Scientific Objectivity,” which they might have borrowed from Feyerabend or Derrida. The heading appears misleading because the text contradicts it:

“Scientists often develop emotional attachments to their work—it can be difficult to abandon an idea. Regardless of bias, the strongest intellectual argument, based on accepted scientific hypotheses, will always prevail, but the road to that conclusion may be fraught with scholarly cul-de-sacs.”[8]

In a similar vein, the authors misleadingly tell readers that “the forefront of science is rarely encountered in court,” and so “much of the science mentioned there shall be considered established….”[9] Of course, the reality is that many causal claims presented in court have already been rejected or held to be indeterminate by the scientific community. And just when readers may think themselves safe from the goblins of nihilism, the authors launch into a theory of naïve probabilism that science is just placing subjective probabilities upon data, based upon preconceived biases and beliefs:

“All of these biases and beliefs play into the process of weighing data, a critical aspect of science. Placing weight on a result is the process of assigning a probability to an outcome. Everything in the universe can be expressed in probabilities.”[10]

So help the expert witness who honestly (and correctly) testifies that the causal claim or its rejection cannot be expressed as a probability statement!

Although I have not read all of the Bench Book closely, there appears to be no meaningful discussion of Rule 703, or of the need to access underlying data to ensure that the proffered scientific opinion under scrutiny has used appropriate methodologies at every step in its development. Even a 412-page text cannot address every issue, but this one does little to help the judicial reader find more in-depth help on statistical and scientific methodological issues that arise in occupational and environmental disease claims, and in pharmaceutical products litigation.

The organizations involved in this Bench Book appear to be honest brokers of remedial education for judges. The writing of this Bench Book was funded by the State Justice Institute (SJI), which is a creation of federal legislation enacted with the laudatory goal of improving the quality of judging in state courts.[11] Despite its provenance in federal legislation, the SJI is a private, nonprofit corporation, governed by 11 directors appointed by the President, and confirmed by the Senate. Six of the directors are state court judges; the remainder comprise one state court administrator and four members of the public (no more than two from any one political party). The function of the SJI is to award grants to improve judging in state courts.

The National Judicial College (NJC) originated in the early 1960s, from the efforts of the American Bar Association, the American Judicature Society, and the Institute of Judicial Administration, to provide education for judges. In 1977, the NJC became a Nevada not-for-profit 501(c)(3) educational corporation, with its campus at the University of Nevada, Reno, where judges can go for training and recreational activities.

The Justice Speakers Institute appears to be a for-profit company that provides educational resources for judges. A press release touts the Bench Book and follow-on webinars. Caveat emptor.

The rationale for this Bench Book is open to question. Unlike the Reference Manual for Scientific Evidence, which was co-produced by the Federal Judicial Center and the National Academies of Sciences, the Bench Book’s authors are lawyers and judges, without any subject-matter expertise. Unlike the Reference Manual, the Bench Book’s chapters have no scientist or statistician authors, and it shows. Remarkably, the Bench Book does not appear to cite the Reference Manual or the Manual on Complex Litigation at any point in its discussion of the federal law of expert witnesses or of scientific or statistical method. Perhaps taxpayers would have been spared substantial expense if state judges were simply encouraged to read the Reference Manual.


[1]  Bench Book at 190.

[2]  Bench Book at 174 (“Given the large amount of statistical information contained in expert reports, as well as in the daily lives of the general society, the ability to be a competent consumer of scientific reports is challenging. Effective critical review of scientific information requires vigilance, and some healthy skepticism.”).

[3]  Bench Book at 137; see also id. at 162.

[4]  Bench Book at 148.

[5]  Bench Book at 160.

[6]  Bench Book at 152.

[7]  Bench Book at 233, quoting Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579, 595 (1993).

[8]  Bench Book at 10.

[9]  Id. at 10.

[10]  Id. at 10.

[11] See State Justice Institute Act of 1984 (42 U.S.C. ch. 113, 42 U.S.C. § 10701 et seq.).

The Shmeta-Analysis in Paoli

July 11th, 2019

In the Paoli Railroad yard litigation, plaintiffs claimed injuries and increased risk of future cancers from environmental exposure to polychlorinated biphenyls (PCBs). This massive litigation showed up before federal district judge Hon. Robert F. Kelly,[1] in the Eastern District of Pennsylvania, who may well have been the first judge to grapple with a litigation attempt to use meta-analysis to show a causal association.

One of the plaintiffs’ expert witnesses was the late William J. Nicholson, who was a professor at Mt. Sinai School of Medicine, and a colleague of Irving Selikoff. Nicholson was trained in physics, and had no professional training in epidemiology. Nonetheless, Nicholson was Selikoff’s go-to colleague for performing epidemiologic studies. After Selikoff withdrew from active testifying for plaintiffs in tort litigation, Nicholson was one of his colleagues who jumped into the fray as a surrogate advocate for Selikoff.[2]

For his opinion that PCBs were causally associated with liver cancer in humans,[3] Nicholson relied upon a report he wrote for the Ontario Ministry of Labor. [cited here as “Report”].[4] Nicholson described his report as a “study of the data of all the PCB worker epidemiological studies that had been published,” from which he concluded that there was “substantial evidence for a causal association between excess risk of death from cancer of the liver, biliary tract, and gall bladder and exposure to PCBs.”[5]

The defense challenged the admissibility of Nicholson’s meta-analysis, on several grounds. The trial court decided the challenge based upon the Downing case, which was the law in the Third Circuit, before the Supreme Court decided Daubert.[6] The Downing case allowed some opportunity for consideration of reliability and validity concerns; there is, however, disappointingly little discussion of any actual validity concerns in the courts’ opinions.

The defense challenge to Nicholson’s proffered testimony on liver cancer turned on its characterization of meta-analysis as a “novel” technique, which is generally unreliable, and its claim that Nicholson’s meta-analysis in particular was unreliable. None of the individual studies that contributed data showed any “connection” between PCBs and liver cancer; nor did any individual study conclude that there was a causal association.

Of course, the appropriate response to this situation, with no one study finding a statistically significant association, or concluding that there was a causal association, should have been “so what?” One of the reasons to do a meta-analysis is that no available study was sufficiently large to find a statistically significant association, if one were there. As for drawing conclusions of causal associations, it is not the role or place of an individual study to synthesize all the available evidence into a principled conclusion of causation.
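The underlying arithmetic is easy to sketch. In the following Python fragment (with entirely hypothetical study results, not Nicholson’s data), three small studies, none statistically significant on its own, yield a pooled estimate, under a fixed-effect, inverse-variance weighting, whose confidence interval excludes 1.0:

import math

# hypothetical per-study results: (log relative risk, standard error)
studies = [(math.log(1.30), 0.20), (math.log(1.35), 0.22), (math.log(1.25), 0.21)]

for lrr, se in studies:
    lo, hi = math.exp(lrr - 1.96 * se), math.exp(lrr + 1.96 * se)
    print(f"study RR = {math.exp(lrr):.2f}, 95% CI {lo:.2f}-{hi:.2f}")  # each CI crosses 1.0

# fixed-effect, inverse-variance pooling
weights = [1 / se**2 for _, se in studies]
pooled = sum(w * lrr for (lrr, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
lo, hi = math.exp(pooled - 1.96 * pooled_se), math.exp(pooled + 1.96 * pooled_se)
print(f"pooled RR = {math.exp(pooled):.2f}, 95% CI {lo:.2f}-{hi:.2f}")  # CI excludes 1.0

That, at least, is what a principled meta-analysis accomplishes; as discussed below, Nicholson’s was not one.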

In any event, the trial court concluded that the proffered novel technique lacked sufficient reliability, that the meta-analysis would “overwhelm, confuse, or mislead the jury,” and that the proffered meta-analysis on liver cancer was not sufficiently relevant to the facts of the case (in which no plaintiff had developed, or had died of, liver cancer). The trial court noted that the Report had not been peer-reviewed, and that it had not been accepted or relied upon by the Ontario government for any finding or policy decision. The trial court also expressed its concern that the proffered testimony along the lines of the Report would possibly confuse the jury because it appeared to be “scientific” and because Nicholson appeared to be qualified.

The Appeal

The Court of Appeals for the Third Circuit, in an opinion by Judge Becker, reversed Judge Kelly’s exclusion of the Nicholson Report, in an opinion that is still sometimes cited, even though Downing is no longer good law in the Circuit or anywhere else.[7] The Court was ultimately not persuaded that the trial court had handled the exclusion of Nicholson’s Report and its meta-analysis correctly, and it remanded the case for a do-over analysis.

Judge Becker described Nicholson’s Report as a “meta-analysis,” which pooled or “combined the results of numerous epidemiologic surveys in order to achieve a larger sample size, adjusted the results for differences in testing techniques, and drew his own scientific conclusions.”[8] Through this method, Nicholson claimed to have shown that “exposure to PCBs can cause liver, gall bladder and biliary tract disorders … even though none of the individual surveys supports such a conclusion when considered in isolation.”[9]

Validity

The appellate court gave no weight to the possibility that a meta-analysis would confuse a jury, or that its “scientific nature” or Nicholson’s credentials would lead a jury to give it more weight than it deserved.[10] The Court of Appeals conceded, however, that exclusion would have been appropriate if the methodology used itself was invalid. The appellate opinion further acknowledged that the defense had offered opposition to Nicholson’s Report in which it documented his failure to include data that were inconsistent with his conclusions, and that “Nicholson had produced a scientifically invalid study.”[11]

Judge Becker’s opinion for a panel of the Third Circuit provided no details about the cherry picking. The opinion never analyzed why this charge of cherry-picking and manipulation of the dataset did not invalidate the meta-analytic method generally, or Nicholson’s method as applied. The opinion gave no suggestion that this counter-affidavit was ever answered by the plaintiffs.

Generally, Judge Becker’s opinion dodged engagement with the specific threats to validity in Nicholson’s Report, and took refuge in the indisputable fact that hundreds of meta-analyses were published annually, and that the defense expert witnesses did not question the general reliability of meta-analysis.[12] These facts undermined the defense claim that meta-analysis was novel.[13] The reality, however, was that meta-analysis was in its infancy in bio-medical research.

When it came to the specific meta-analysis at issue, the court did not discuss or analyze a single pertinent detail of the Report. Despite its lack of engagement with the specifics of the Report’s meta-analysis, the court astutely observed that prevalent errors and flaws do not mean that a particular meta-analysis is “necessarily in error.”[14] Of course, without bothering to look, the court would not know whether the proffered meta-analysis was “actually in error.”

The appellate court would have given Nicholson’s Report a “pass” if it were an application of an accepted methodology. The defense’s remedy under this condition would be to cross-examine the opinion in front of a jury. If, on the other hand, Nicholson had altered an accepted methodology to skew its results, then the court’s gatekeeping responsibility under Downing would be invoked.

The appellate court went on to fault the trial court for failing to make sufficiently explicit findings as to whether the questioned meta-analysis was unreliable. From its perspective, the Court of Appeals saw the trial court as having resolved the reliability issue upon the greater credibility of the defense expert witnesses, who had branded the disputed meta-analysis as unreliable. Credibility determinations are for the jury, but the court left room for a challenge on reliability itself:[15]

“Assuming that Dr. Nicholson’s meta-analysis is the proper subject of Downing scrutiny, the district court’s decision is wanting, because it did not make explicit enough findings on the reliability of Dr. Nicholson’s meta-analysis to satisfy Downing. We decline to define the exact level at which a district court can exclude a technique as sufficiently unreliable. Reliability indicia vary so much from case to case that any attempt to define such a level would most likely be pointless. Downing itself lays down a flexible rule. What is not flexible under Downing is the requirement that there be a developed record and specific findings on reliability issues. Those are absent here. Thus, even if it may be possible to exclude Dr. Nicholson’s testimony under Downing, as an unreliable, skewed meta-analysis, we cannot make such a determination on the record as it now stands. Not only was there no hearing, in limine or otherwise, at which the bases for the opinions of the contesting experts could be evaluated, but the experts were also not even deposed. All of the expert evidence was based on affidavits.”

Peer Review

Understandably, the defense attacked Nicholson’s Report as not having been peer reviewed. Without any scrutiny of the scientific bona fides of the workers’ compensation agency, the appellate court acquiesced in Nicholson’s self-serving characterization of his Report as having been reviewed by “cooperating researchers” and the Panel of the Ontario Workers’ Compensation agency. Another partisan expert witness characterized Nicholson’s Report as a “balanced assessment,” and this seemed to appease the Third Circuit, which was wary of requiring peer review in the first place.[16]

Relevancy Prong

The defense had argued that Nicholson’s Report was irrelevant because no individual plaintiff claimed liver cancer.[17] The trial court largely accepted this argument, but the appellate court disagreed because of conclusory language in Nicholson’s affidavit, in which he asserted that “proof of an increased risk of liver cancer is probative of an increased risk of other forms of cancer.” The court seemed unfazed by the ipse dixit, asserted without any support. Indeed, Nicholson’s assertion was contradicted by his own Report, in which he reported that there were fewer cancers among PCB-exposed male capacitor manufacturing workers than expected,[18] and that the rate for all cancers for both men and women was lower than expected, with 132 observed and 139.40 expected.[19]

The trial court had also agreed with the defense’s suggestion that Nicholson’s report, and its conclusion of causality between PCB exposure and liver cancer, were irrelevant because the Report “could not be the basis for anyone to say with reasonable degree of scientific certainty that some particular person’s disease, not cancer of the liver, biliary tract or gall bladder, was caused by PCBs.”[20]

Analysis

It would likely have been lost on Judge Becker and his colleagues, but Nicholson presented SMRs (standardized mortality ratios) throughout his Report, and for the all cancers statistic, he gave an SMR of 95. What Nicholson clearly did in this, and in all other instances, was simply divide the observed number by the expected, and multiply by 100. This crude, simplistic calculation fails to present a standardized mortality ratio, which requires taking into account the age distribution of the exposed and the unexposed groups, and a weighting of the contribution of cases within each age stratum. Nicholson’s presentation of data was nothing short of false and misleading. And in case anyone remembers General Electric v. Joiner, Nicholson’s summary estimate of risk for lung cancer in men was below the expected rate.[21]
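For readers who want to see what Nicholson skipped, here is a minimal sketch (Python, with hypothetical rates and person-years of my own choosing) of indirect standardization, the method behind a genuine SMR; the expected count comes from applying the reference population’s age-specific rates to the cohort’s person-years within each age stratum:

# hypothetical reference-population death rates, per person-year, by age stratum
ref_rates = {"40-49": 0.001, "50-59": 0.004, "60-69": 0.012}
# hypothetical person-years observed in the exposed cohort, by age stratum
cohort_py = {"40-49": 5000, "50-59": 3000, "60-69": 1000}
observed = 31  # hypothetical deaths observed in the cohort

# indirect standardization: expected deaths are the sum, over age strata,
# of the reference rate times the cohort's person-years in that stratum
expected = sum(ref_rates[age] * cohort_py[age] for age in ref_rates)
smr = 100 * observed / expected
print(f"expected = {expected:.1f}, SMR = {smr:.0f}")  # expected = 29.0, SMR = 107

# Simply dividing observed deaths by a crude expected count, with no age
# stratification, is not an SMR, and can misstate the risk badly when the
# cohort is older or younger than the reference population.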

Nicholson’s Report was replete with many other methodological sins. He used a composite of three organs (liver, gall bladder, bile duct) without any biological rationale. His analysis combined male and female results, and still his analysis of the composite outcome was based upon only seven cases. Of those seven cases, some of the cases were not confirmed as primary liver cancer, and at least one case was confirmed as not being a primary liver cancer.[22]

Nicholson failed to standardize the analysis for the age distribution of the observed and expected cases, and he failed to present meaningful analysis of random or systematic error. When he did present p-values, he presented one-tailed values, and he made no corrections for his many comparisons from the same set of data.

Finally, and most egregiously, Nicholson’s meta-analysis was meta-analysis in name only. What he had done was simply to add “observed” and “expected” events across studies to arrive at totals, and to recalculate a bogus risk ratio, which he fraudulently called a standardized mortality ratio. Adding events across studies is not a valid meta-analysis; indeed, it is a well-known example of how to generate a Simpson’s Paradox, which can change the direction or magnitude of any association.[23]
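A small numerical sketch (Python, with textbook-style numbers rather than Nicholson’s data) shows how adding raw counts across studies can reverse the direction of an association that every constituent study exhibits:

# hypothetical 2x2 tables: (exposed events, exposed N, unexposed events, unexposed N)
studies = [
    (81, 87, 234, 270),
    (192, 263, 55, 80),
]

def risk_ratio(a, n1, c, n0):
    return (a / n1) / (c / n0)

for i, (a, n1, c, n0) in enumerate(studies, 1):
    print(f"study {i}: RR = {risk_ratio(a, n1, c, n0):.2f}")  # 1.07 and 1.06, both elevated

a, n1, c, n0 = (sum(col) for col in zip(*studies))
print(f"pooled counts: RR = {risk_ratio(a, n1, c, n0):.2f}")  # 0.94, now a deficit
# A valid pooling preserves the within-study comparisons and weights them
# (e.g., by a Mantel-Haenszel analysis), rather than collapsing all the
# tables into a single table.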

Some may be tempted to criticize the defense for having focused its challenge on the “novelty” of Nicholson’s approach in Paoli. The problem of course was the invalidity of Nicholson’s work, but both the trial court’s exclusion of Nicholson, and the Court of Appeals’ reversal and remand of the exclusion decision, illustrate the problem in getting judges, even well-respected judges, to accept their responsibility to engage with questioned scientific evidence.

Even in Paoli, no amount of ketchup could conceal the unsavoriness of Nicholson’s scrapple analysis. When the Paoli case reached the Court of Appeals again in 1994, Nicholson’s analysis was absent.[24] Apparently, the plaintiffs’ counsel had second thoughts about the whole matter. Today, under the revised Rule 702, there can be little doubt that Nicholson’s so-called meta-analysis should have been excluded.


[1]  Not to be confused with the Judge Kelly of the same district, who was unceremoniously disqualified after attending an ex parte conference with plaintiffs’ lawyers and expert witnesses, at the invitation of Dr. Irving Selikoff.

[2]  Pace Philip J. Landrigan & Myron A. Mehlman, “In Memoriam – William J. Nicholson,” 40 Am. J. Indus. Med. 231 (2001). Landrigan and Mehlman assert, without any support, that Nicholson was an epidemiologist. Their own description of his career, his undergraduate work at MIT, his doctorate in physics from the University of Washington, his employment at the Watson Laboratory, before becoming a staff member in Irving Selikoff’s department in 1969, all suggest that Nicholson brought little to no experience in epidemiology to his work on occupational and environmental exposure epidemiology.

[3]  In re Paoli RR Yard Litig., 706 F. Supp. 358, 372-73 (E.D. Pa. 1988).

[4]  William Nicholson, Report to the Workers’ Compensation Board on Occupational Exposure to PCBs and Various Cancers, for the Industrial Disease Standards Panel (ODP); IDSP Report No. 2 (Toronto, Ontario Dec. 1987).

[5]  Id. at 373.

[6]  United States v. Downing, 753 F.2d 1224 (3d Cir. 1985).

[7]  In re Paoli RR Yard PCB Litig., 916 F.2d 829 (3d Cir. 1990), cert. denied sub nom. General Elec. Co. v. Knight, 111 S.Ct. 1584 (1991).

[8]  Id. at 845.

[9]  Id.

[10]  Id. at 841, 848.

[11]  Id. at 845.

[12]  Id. at 847-48.

[13]  See, e.g., Robert Rosenthal, Judgment studies: Design, analysis, and meta-analysis (1987); Richard J. Light & David B. Pillemer, Summing Up: the Science of Reviewing Research (1984); Thomas A. Louis, Harvey V. Fineberg & Frederick Mosteller, “Findings for Public Health from Meta-Analyses,” 6 Ann. Rev. Public Health 1 (1985); Kristan A. L’abbé, Allan S. Detsky & Keith O’Rourke, “Meta-analysis in clinical research,” 107 Ann. Intern. Med. 224 (1987).

[14]  Id. at 857.

[15]  Id. at 858.

[16]  Id. at 858.

[17]  Id. at 845.

[18]  Report, Table 16.

[19]  Report, Table 18.

[20]  In re Paoli, 916 F.2d at 847.

[21]  See General Electric v. Joiner, 522 U.S. 136 (1997); NAS, “How Have Important Rule 702 Holdings Held Up With Time?” (March 20, 2015).

[22]  Report, Table 22.

[23]  James A. Hanley, Gilles Thériault, Ralf Reintjes and Annette de Boer, “Simpson’s Paradox in Meta-Analysis,” 11 Epidemiology 613 (2000); H. James Norton & George Divine, “Simpson’s paradox and how to avoid it,” Significance 40 (Aug. 2015); George Udny Yule, “Notes on the theory of association of attributes in Statistics,” 2 Biometrika 121 (1903).

[24]  In re Paoli RR Yard Litig., 35 F.3d 717 (3d Cir. 1994).

Specious Claiming in Multi-District Litigation

May 2nd, 2019

In a recent article in an American Bar Association newsletter, Paul Rheingold notes with some concern that, in the last two years or so, there has been a rash of dismissals of entire multi-district litigations (MDLs) based upon plaintiffs’ failure to produce expert witnesses who can survive Rule 702 gatekeeping.[1]  Paul D. Rheingold, “Multidistrict Litigation Mass Terminations for Failure to Prove Causation,” A.B.A. Mass Tort Litig. Newsletter (April 24, 2019) [cited as Rheingold]. According to Rheingold, judges historically involved in the MDL processing of products liability cases did not grant summary judgments across the board. In other words, federal judges felt that if plaintiffs’ lawyers aggregated a sufficient number of cases, then their judicial responsibility was to push settlements or to remand the cases to the transferor courts for trial.

Missing from Rheingold’s account is the judicial view, prevalent in the early going of products liability MDLs, that MDL judges lacked the authority to consider Rule 702 motions for all cases in the MDL. Gatekeeping motions were considered extreme and best avoided by pushing them off to the transferor courts upon remand. In MDL 926, involving silicone gel breast implants, the late Judge Sam Pointer, who was a member of the Rules Advisory Committee, expressed the view that Rule 702 gatekeeping was a trial court function, for the trial judge who received the case on remand from the MDL.[2] Judge Pointer’s view was a commonplace in the 1990s. As mass tort litigation moved into MDL “camps,” judges more frequently adopted a managerial rather than a judicial role, and exerted great pressure on the parties, and the defense in particular, to settle cases. These judges frequently expressed their view that the two sides so stridently disagreed on causation that the truth must be somewhere in between, and that even with “a little causation,” the defendants should offer a little compensation. These litigation managers thus eschewed dispositive motion practice, or gave it short shrift.

Rheingold cites five recent MDL terminations based upon “Daubert failure.” He acknowledges that other MDLs collapsed because of federal pre-emption issues (Eliquis, Incretins, and possibly Fosamax), and that other fatally weak causal claims settled for nominal compensation (NuvaRing). He omits still other MDLs, such as In re Silica, in which an entire MDL collapsed because of prevalent fraud in the screening and diagnosing of silicosis claimants by plaintiffs’ counsel and their expert witnesses.[3] Also absent from his reckoning is the collapse of MDL cases against Celebrex[4] and Viagra[5].

Rheingold does concede that the recent across-the-board dismissals of MDLs were due to very weak causal claims.[6] He softens his judgment by suggesting that the weaknesses were apparent “at least in retrospect,” but the weaknesses were clearly discernible before litigation, from the refusal of regulatory agencies, such as the FDA, to accept the litigation-driven causal claims. Rheingold also tries to assuage fellow plaintiffs’ counsel by suggesting that plaintiffs’ lawyers somehow fell prey to the pressure to file cases because of internet advertising and the encouragement of records collection and analysis firms. This attribution of naiveté to Plaintiffs’ Steering Committee (PSC) members does not ring true, given the wealth and resources of lawyers on PSCs. Furthermore, the suggestion that PSC members may be newcomers to the MDL playing fields does not hold water, given that most of the lawyers involved are “repeat players,” with substantial experience and financial incentives to sort out invalid expert witness opinions.[7]

Rheingold offers the wise counsel that plaintiffs’ lawyers “should take [their] time and investigate for [themselves] the potential proof available for causation and adequacy of labeling.” If history is any guide, his advice will not be followed.


[1] Rheingold cites five MDLs that were “Daubert failures” in recent times: (1) In re Lipitor (Atorvastatin Calcium) Marketing, Sales Practices & Prods. Liab. Litig. (MDL 2502), 892 F.3d 624 (4th Cir. 2018) (affirming Rule 702 dismissal of claims that atorvastatin use caused diabetes); (2) In re Mirena IUD Products Liab. Litig. (Mirena I), 713 F. App’x 11 (2d Cir. 2017) (affirming exclusion of expert witnesses’ opinion testimony that the intrauterine device caused embedment and perforation); (3) In re Mirena IUS Levonorgestrel-Related Prods. Liab. Litig. (Mirena II, MDL 2767), 341 F. Supp. 3d 213 (S.D.N.Y. 2018) (excluding expert witness opinions that the product caused pseudotumor cerebri); (4) In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., 858 F.3d 787 (3d Cir. 2017) (affirming MDL trial court’s Rule 702 exclusions of opinions that Zoloft is teratogenic); (5) Jones v. SmithKline Beecham, 652 F. App’x 848 (11th Cir. 2016) (affirming MDL court’s Rule 702 exclusions of expert witness opinions that denture adhesive creams caused metal deficiencies).

[2]  Not only was Judge Pointer a member of the Rules committee, he was the principal author of the 1993 Amendments to the Federal Rules of Civil Procedure, as well as the editor-in-chief of the Federal Judicial Center’s Manual for Complex Litigation. At an ALI-ABA conference in 1997, Judge Pointer complained about the burden of gatekeeping. 3 Federal Discovery News 1 (Aug. 1997). He further opined that, under Rule 104(a), he could “look to decisions from the Southern District of New York and Eastern District of New York, where the same expert’s opinion has been offered and ruled upon by those judges. Their rulings are hearsay, but hearsay is acceptable. So I may use their rulings as a basis for my decision on whether to allow it or not.” Id. at 4. Even after Judge Jack Weinstein excluded plaintiffs’ expert witnesses’ causal opinions in the silicone litigation, however, Judge Pointer avoided making an MDL-wide decision of the sort entered by one of the leading judges of the Southern and Eastern Districts of New York. See In re Breast Implant Cases, 942 F. Supp. 958 (E. & S.D.N.Y. 1996). Judge Pointer repeated his anti-Daubert views three years later at a symposium on expert witness opinion testimony. See Sam C. Pointer, Jr., “Response to Edward J. Imwinkelried, the Taxonomy of Testimony Post-Kumho: Refocusing on the Bottom Lines of Reliability and Necessity,” 30 Cumberland L. Rev. 235 (2000).

[3]  In re Silica Products Liab. Litig., MDL No. 1553, 398 F. Supp. 2d 563 (S.D. Tex. 2005).

[4]  In re Bextra & Celebrex Marketing Sales Practices & Prod. Liab. Litig., 524 F. Supp. 2d 1166 (N.D. Calif. 2007) (excluding virtually all relevant expert witness testimony proffered to support claims that ordinary dosages of these COX-2 inhibitors caused cardiovascular events).

[5]  In re Viagra Products Liab. Litig., 572 F. Supp. 2d 1071 (D. Minn. 2008) (addressing claims that sildenafil causes vision loss from non-arteritic anterior ischemic optic neuropathy (NAION)).

[6]  Rheingold (“Examining these five mass terminations, at least in retrospect[,] it is apparent that they were very weak on causation.”).

[7] See Elizabeth Chamblee Burch & Margaret S. Williams, “Repeat Players in Multidistrict Litigation: The Social Network,” 102 Cornell L. Rev. 1445 (2017); Margaret S. Williams, Emery G. Lee III & Catherine R. Borden, “Repeat Players in Federal Multidistrict Litigation,” 5 J. Tort L. 141, 149–60 (2014).

Has the American Statistical Association Gone Post-Modern?

March 24th, 2019

Last week, the American Statistical Association (ASA) released a special issue of its journal, The American Statistician, with 43 articles addressing the issue of “statistical significance.” If you are on the ASA’s mailing list, you received an email announcing that

the lead editorial calls for abandoning the use of ‘statistically significant’, and offers much (not just one thing) to replace it. Written by Ron Wasserstein, Allen Schirm, and Nicole Lazar, the co-editors of the special issue, ‘Moving to a World Beyond “p < 0.05”’ summarizes the content of the issue’s 43 articles.”

In 2016, the ASA issued its “consensus” statement on statistical significance, in which it articulated six principles for interpreting p-values, and for avoiding erroneous interpretations. Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” 70 The American Statistician 129 (2016) [ASA Statement]. In the final analysis, the ASA Statement really did not change very much, and could fairly be read only to state that statistical significance is not sufficient for causal inference.1 Aside from overzealous, over-claiming lawyers and their expert witnesses, few scientists or statisticians had ever maintained that statistical significance was sufficient to support causal inference. Still, many “health effect claims” involve alleged causation that is really a modification of the base rate of a disease or disorder, one that happens without the allegedly harmful exposure, and that does not invariably happen even with the exposure. It is hard to imagine drawing an inference of such causation without ruling out random error, as well as bias and confounding.

According to the lead editorial for the special issue:

The ASA Statement on P-Values and Statistical Significance stopped just short of recommending that declarations of ‘statistical significance’ be abandoned. We take that step here. We conclude, based on our review of the articles in this special issue and the broader literature, that it is time to stop using the term ‘statistically significant’ entirely. Nor should variants such as ‘significantly different’, ‘p < 0.05’, and ‘nonsignificant’ survive, whether expressed in words, by asterisks in a table, or in some other way.”2

The ASA (through Wasserstein and colleagues) thus appears to be condemning the dichotomization of p-values, which lie on a continuum between zero and one. Presumably, saying that a p-value is less than 5% is tantamount to dichotomizing, but providing the actual value of the p-value would cause no offense, as long as it was not labeled “significant.”

So although the ASA seems to have gone “whole hog,” the Wasserstein editorial does not condemn assessing random error, or evaluating the extent of random error as part of assessing a study’s support for an association. Reporting p < 0.05, as opposed to the actual value of p between zero and one, is largely an artifact of statistical tables in the pre-computer era.

So what is the ASA affirmatively recommending? “Much, not just one thing?” Or too much of nothing, which we know makes a man feel ill at ease. Wasserstein’s editorial earnestly admits that there is no replacement for:

the outsized role that statistical significance has come to play. The statistical community has not yet converged on a simple paradigm for the use of statistical inference in scientific research—and in fact it may never do so.”3

The 42 other articles in the special issue certainly do not converge on any unified, coherent response to the perceived crisis. Indeed, a cursory review of the abstracts alone suggests deep disagreements over an appropriate approach to statistical inference. The ASA may claim to be agnostic in the face of the contradictory recommendations, but there is one thing we know for sure: over-reaching litigants and their expert witnesses will exploit the real or apparent chaos in the ASA’s approach. The lack of coherent, consistent guidance will launch a thousand litigation ships, with no epistemic compass.4


2 Ronald L. Wasserstein, Allen L. Schirm, and Nicole A. Lazar, “Editorial: Moving to a World Beyond ‘p < 0.05’,” 73 Am. Statistician S1, S2 (2019).

3 Id. at S2.

4 See, e.g., John P. A. Ioannidis, “Retiring statistical significance would give bias a free pass,” 567 Nature 461 (2019); Valen E. Johnson, “Raise the Bar Rather than Retire Significance,” 567 Nature 461 (2019).

Expert Witnesses Who Don’t Mean What They Say

March 24th, 2019

‘Then you should say what you mean’, the March Hare went on.
‘I do’, Alice hastily replied; ‘at least–at least I mean what I say–that’s the same thing, you know’.
‘Not the same thing a bit!’ said the Hatter. ‘You might just as well say that “I see what I eat” is the same thing as “I eat what I see!”’

Lewis Carroll, Alice’s Adventures in Wonderland, Chapter VII (1865)

Anick Bérard is an epidemiologist at the Université de Montréal. Most of her publications involve birth outcomes and maternal medication use, but Dr. Bérard’s advocacy also involves social media (Facebook, YouTube) and expert witnessing in litigation against the pharmaceutical industry.

When the FDA issued its alert about cardiac malformations in children born to women who took Paxil (paroxetine) in their first trimesters of pregnancy, the agency characterized its assessment of the “early results of new studies for Paxil” as “suggesting that the drug increases the risk for birth defects, particularly heart defects, when women take it during the first three months of pregnancy.”1 The agency also disclaimed any conclusion of “class effect” among the other selective serotonin reuptake inhibitors (SSRIs), such as Zoloft (sertraline), Celexa (citalopram), and Prozac (fluoxetine). Indeed, the FDA requested the manufacturer of paroxetine to undertake additional research to look at teratogenicity of paroxetine, as well as the possibility of class effects. That research never showed an SSRI teratogenicity class effect.

A “suggestion” from the FDA of an adverse effect is sufficient to launch a thousand litigation complaints, which were duly filed against GlaxoSmithKline. The plaintiffs’ counsel recruited Dr. Bérard to serve as an expert witness in support of claims for a wide array of birth defects in Paxil cases. In her hands, the agency’s “suggestion” of causation became a conclusion. The defense challenged Bérard’s opinions, but the federal court took the motion to exclude her causal opinions under advisement, without decision. Hayes v. SmithKline Beecham Corp., 2009 WL 4912178 (N.D. Okla. Dec. 14, 2009). One case in state court went to trial, with a verdict for plaintiffs.

Despite Dr. Bérard’s zealous advocacy for a causal association between Paxil and birth defects, she declined to assert any association between maternal use of the other, non-paroxetine SSRIs and birth defects. Here is an excerpt from her Rule 26 report in a paroxetine case:

Taken together, the available scientific evidence makes it clear that Paxil use during the first trimester of pregnancy is an independent risk factor that at least doubles the risk of cardiovascular malformations in newborns at all commonly used doses. This risk has been consistent and was further reinforced by repeated observational study findings as well as meta-analyses results. No such associations were found with other types of SSRI exposures during gestation.”2

In her sworn testimony, Dr. Bérard made clear that she really meant what she had written in her report, about exculpating the non-paroxetine SSRIs of any association with birth defects:

Q. Is it fair to say that you will not be offering an opinion that SSRIs as a class, or individual SSRIs other than Paxil increased the risk of cardiovascular malformations in newborns?

A. This is not what I was asked to do.

Q. But in fact you actually write in your report that you don’t believe there’s sufficient data to reach any conclusion about other SSRIs, true?

A. Correct.”3

In 2010, Dr. Bérard, along with two professional colleagues, published what they called a systematic review of antidepressant use in pregnancy and birth outcomes.4 In this review, Bérard specifically advised that paroxetine should be avoided by women of childbearing age, but she and her colleagues affirmatively encouraged use of other SSRIs, such as fluoxetine, sertraline, and citalopram:

Clinical Approach: A Brief Overview

For women planning a pregnancy or when a treatment initiation during pregnancy is deemed necessary, the decision should rely not only on drug safety data but also on other factors such as the patient’s condition, previous response to other antidepressants, comorbidities, expected adverse effects and potential interactions with other current pharmacological treatments. Since there is a more extensive clinical experience with SSRIs such as fluoxetine, sertraline, and citalopram, these agents should be used as first-line therapies. Whenever possible, one should refrain from prescribing paroxetine to women of childbearing potential or planning a pregnancy. However, antenatal screening such as fetal echocardiography should be considered in a woman exposed prior to finding out about her pregnancy.5

When Bérard wrote and published her systematic review, she was still actively involved as an expert witness for plaintiffs in lawsuits against the manufacturers of paroxetine. In her 2010 review, Dr. Bérard gave no acknowledgment of monies earned in her capacity as an expert witness, and her disclosure of potential conflicts of interest was limited to noting that she was “a consultant for a plaintiff in the litigation involving Paxil.”6 In fact, Bérard had submitted multiple reports, testified at deposition, and had been listed as a testifying expert witness in many cases involving Paxil or paroxetine.

Not long after the 2010 review article, Glaxo settled most of the pending paroxetine birth defect cases, and the plaintiffs’ bar pivoted to recast their expert witnesses’ opinions as causal teratogenic conclusions about the entire class of SSRIs. In 2012, the federal courts established a “multi-district litigation,” MDL 2342, for birth defect cases involving Zoloft (sertraline), in the Philadelphia courtroom of Judge Cynthia Rufe, in the Eastern District of Pennsylvania.

Notwithstanding her 2010 clinical advice that pregnant women with depression should use fluoxetine, sertraline, or citalopram, Dr. Bérard became actively involved in the new litigation against the other, non-Paxil SSRI manufacturers. By 2013, Dr. Bérard was on record as a party expert witness for plaintiffs, opining that sertraline causes virtually every major congenital malformation.7

In the same year, 2013, Dr. Bérard published another review article on teratogens, but now she gave a more equivocal view of the other SSRIs, claiming that they were “known teratogens,” but acknowledging in a footnote that the teratogenicity of the SSRIs was “controversial.”8 Incredibly, this review article states that “Anick Bérard and Sonia Chaabane have no potential conflicts of interest to disclose.”9

Ultimately, Dr. Bérard could not straddle her own contradictory statements and remain upright, which encouraged the MDL court to examine her opinions closely for methodological shortcomings and failures. Although Bérard had evolved to claim a teratogenic “class effect” for all the SSRIs, the scientific support for her claim was somewhere between weak and absent.10 Perhaps even more distressing, many of the pending claims involving the other SSRIs arose from pregnancies and births that predated Bérard’s epiphany about the class effect. Finding ample evidence of specious claiming, the federal court charged with oversight of the sertraline birth defect claims excluded Dr. Bérard’s causal opinions for failing to meet the requirements of Federal Rule of Evidence 702.11

Plaintiffs sought to substitute Nicholas Jewell for Dr. Bérard, but Dr. Jewell fared no better, and was excluded for other methodological shenanigans.12 Ultimately, a unanimous panel of the United States Court of Appeals for the Third Circuit upheld the expert witness exclusions.13


1 See “FDA Advising of Risk of Birth Defects with Paxil; Agency Requiring Updated Product Labeling,” P05-97 (Dec. 8, 2005) (emphasis added).

2 Bérard Report in Hayes v. SmithKline Beecham Corp, 2009 WL 3072955, at *4 (N.D. Okla. Feb. 4, 2009) (emphasis added).

3 Deposition Testimony of Anick Bérard, in Hayes v. SmithKline Beecham Corp., at 120:16-25 (N.D. Okla. April 2009).

4 Marieve Simoncelli, Brigitte-Zoe Martin & Anick Bérard, “Antidepressant Use During Pregnancy: A Critical Systematic Review of the Literature,” 5 Current Drug Safety 153 (2010).

5 Id. at 168b.

6 Id. at 169 (emphasis added).

7 See Anick Bérard, “Expert Report” (June 19, 2013).

8 Sonia Chaabane & Anick Bérard, “Epidemiology of Major Congenital Malformations with Specific Focus on Teratogens,” 8 Current Drug Safety 128, 136 (2013).

9 Id. at 137b.

10 See, e.g., Nicholas Myles, Hannah Newall, Harvey Ward, and Matthew Large, “Systematic meta-analysis of individual selective serotonin reuptake inhibitor medications and congenital malformations,” 47 Australian & New Zealand J. Psychiatry 1002 (2013).

11 See In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., MDL No. 2342; 26 F.Supp. 3d 449 (E.D.Pa. 2014) (Rufe, J.). Plaintiffs, through their Plaintiffs’ Steering Committee, moved for reconsideration, but Judge Rufe reaffirmed her exclusion of Dr. Bérard. In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., MDL No. 2342; 12-md-2342, 2015 WL 314149 (E.D. Pa. Jan. 23, 2015) (Rufe, J.) (denying PSC’s motion for reconsideration). See “Zoloft MDL Relieves Matrixx Depression” (Jan. 30, 2015).

12 See In re Zoloft Prods. Liab. Litig., No. 12–md–2342, 2015 WL 7776911 (E.D. Pa. Dec. 2, 2015) (excluding Jewell’s opinions as scientifically unwarranted and methodologically flawed); In re Zoloft Prod. Liab. Litig., MDL No. 2342, 12-MD-2342, 2016 WL 1320799 (E.D. Pa. April 5, 2016) (granting summary judgment after excluding Dr. Jewell). See also “The Education of Judge Rufe – The Zoloft MDL” (April 9, 2016).

The Contrivance Standard for Gatekeeping

March 23rd, 2019

According to Google ngram, the phrase “junk science” made its debut circa 1975, lagging junk food by about five years. See “The Rise and Rise of Junk Science” (Mar. 8, 2014). I have never much liked the phrase “junk science,” because it suggests that courts need only be wary of the absurd and ridiculous in their gatekeeping function. Some expert witness opinions are, in fact, serious scientific contributions, just not worthy of being advanced as scientific conclusions. Perhaps better than “junk” would be patho-epistemologic opinions, or maybe even wissenschmutz, but even these terms might obscure that the opinion that needs to be excluded derives from serious scientific work, only it is not ready to be held forth as a scientific conclusion that can colorably be called knowledge.

Another formulation of my term, patho-epistemology, is the Eleventh Circuit’s lovely “Contrivance Standard.” Rink v. Cheminova, Inc., 400 F.3d 1286, 1293 & n.7 (11th Cir. 2005). In Rink, the appellate court held that the district court had acted within its discretion to exclude expert witness testimony because it had properly confined its focus to the challenged expert witness’s methodology, not his credibility:

“In evaluating the reliability of an expert’s method, however, a district court may properly consider whether the expert’s methodology has been contrived to reach a particular result. See Joiner, 522 U.S. at 146, 118 S.Ct. at 519 (affirming exclusion of testimony where the methodology was called into question because an “analytical gap” existed “between the data and the opinion proffered”); see also Elcock v. Kmart Corp., 233 F.3d 734, 748 (3d Cir. 2000) (questioning the methodology of an expert because his “novel synthesis” of two accepted methodologies allowed the expert to ”offer a subjective judgment … in the guise of a reliable expert opinion”).”

Note the resistance, however, to the Supreme Court’s mandate of gatekeeping. District courts must apply the statutes, Rules of Evidence 702 and 703. There is no legal authority for the suggestion that a district court merely “may properly consider whether the expert’s methodology has been contrived.” Rink, 400 F.3d at 1293 n.7 (emphasis added).

The Joiner Finale

March 23rd, 2019

“This is the end
Beautiful friend

This is the end
My only friend, the end”

Jim Morrison, “The End” (c. 1966)


The General Electric Co. v. Joiner, 522 U.S. 136 (1997), case was based only in part upon polychlorinated biphenyl (PCB) exposures. The PCB part did not hold up well legally in the Supreme Court; nor was the PCB lung cancer claim vindicated by later scientific evidence. See “How Have Important Rule 702 Holdings Held Up With Time?” (Mar. 20, 2015).

The Supreme Court in Joiner reversed and remanded the case to the 11th Circuit, which then remanded the case back to the district court to address claims that Mr. Joiner had been exposed to furans and dioxins, and that these other chemicals had caused, or contributed to, his lung cancer, as well. Joiner v. General Electric Co., 134 F.3d 1457 (11th Cir. 1998) (per curiam). Thus the dioxins were left in the case even after the Supreme Court ruled.

After the Supreme Court’s decision, Anthony Roisman argued that the Court had addressed an artificial question when asked about PCBs alone because the case was really about an alleged mixture of exposures, and he held out hope that the Joiners would do better on remand. Anthony Z. Roisman, “The Implications of G.E. v. Joiner for Admissibility of Expert Testimony,” 1 Res Communes 65 (1999).

Many Daubert observers (including me) were unaware of the legal fate of the Joiners’ claims on remand. In the only reference I could find, the commentator simply noted that the case resolved before trial.[1] I am indebted to Michael Risinger and Joseph Cecil for pointing me to documents from PACER, which shed some light upon the Joiner “endgame.”

In February 1998, Judge Orinda Evans, who had been the original trial judge, and who had sustained defendants’ Rule 702 challenges and granted their motions for summary judgment, received and reopened the case upon remand from the 11th Circuit. In March, Judge Evans directed the parties to submit a new pre-trial order by April 17, 1998. At a status conference in April 1998, Judge Evans permitted the plaintiffs additional discovery, to be completed by June 17, 1998. Five days before the expiration of their additional discovery period, the plaintiffs moved for additional time; defendants opposed the request. In July, Judge Evans granted the requested extension, and gave defendants until November 1, 1998, to file for summary judgment.

Meanwhile, in June 1998, new counsel entered their appearances for plaintiffs – William Sims Stone, Kevin R. Dean, Thomas Craig Earnest, and Stanley L. Merritt. The docket does not reflect much of anything about the new discovery other than a request for a protective order for an unpublished study. But by October 6, 1998, the new counsel, Earnest, Dean, and Stone (but not Merritt) withdrew as attorneys for the Joiners, and by the end of October 1998, Judge Evans entered an order to dismiss the case, without prejudice.

A few months later, in February 1999, the parties filed a stipulation, approved by the Clerk, dismissing the action with prejudice, with each party to bear its own costs. Given the flight of plaintiffs’ counsel, and the dismissals without and then with prejudice, a settlement seems never to have been involved in the resolution of the Joiner case. In the end, the Joiners’ case fizzled, perhaps to avoid being Frye’d.

And what has happened since to the science of dioxins and lung cancer?

Not much.

In 2006, the National Research Council published a monograph on dioxin, which took the controversial approach of focusing on all-cancer mortality rather than on the specific cancers that had been suggested as likely outcomes of interest. See David L. Eaton (Chairperson), Health Risks from Dioxin and Related Compounds – Evaluation of the EPA Reassessment (2006). The validity of this approach, and the committee’s conclusions, were challenged vigorously in subsequent publications. Paolo Boffetta, Kenneth A. Mundt, Hans-Olov Adami, Philip Cole, and Jack S. Mandel, “TCDD and cancer: A critical review of epidemiologic studies,” 41 Critical Rev. Toxicol. 622 (2011) (“In conclusion, recent epidemiological evidence falls far short of conclusively demonstrating a causal link between TCDD exposure and cancer risk in humans.”).

In 2013, the Industrial Injuries Advisory Council (IIAC), an independent scientific advisory body in the United Kingdom, published a review of lung cancer and dioxin. The Council found the epidemiologic studies mixed, and declined to endorse the compensability of lung cancer for dioxin-exposed industrial workers. Industrial Injuries Advisory Council – Information Note on Lung cancer and Dioxin (December 2013). See also Mann v. CSX Transp., Inc., 2009 WL 3766056, 2009 U.S. Dist. LEXIS 106433 (N.D. Ohio 2009) (Polster, J.) (dioxin exposure case) (“Plaintiffs’ medical expert, Dr. James Kornberg, has opined that numerous organizations have classified dioxins as a known human carcinogen. However, it is not appropriate for one set of experts to bring the conclusions of another set of experts into the courtroom and then testify merely that they ‘agree’ with that conclusion.”), citing Thorndike v. DaimlerChrysler Corp., 266 F. Supp. 2d 172 (D. Me. 2003) (court excluded expert who was “parroting” other experts’ conclusions).

Last year, an industrial cohort followed for two decades showed no increased risk of lung cancer among workers exposed to dioxin. David I. McBride, James J. Collins, Thomas John Bender, Kenneth M. Bodner, and Lesa L. Aylward, “Cohort study of workers at a New Zealand agrochemical plant to assess the effect of dioxin exposure on mortality,” 8 Brit. Med. J. Open e019243 (2018) (reporting an SMR for lung cancer of 0.95, 95% CI 0.56 to 1.53).


[1] Morris S. Zedeck, Expert Witness in the Legal System: A Scientist’s Search for Justice 49 (2010) (noting that, after remand from the Supreme Court, Joiner v. General Electric resolved before trial).


Lipitor Diabetes MDL’s Inexact Analysis of Fisher’s Exact Test

March 23rd, 2019

Muriel Bristol was a biologist who studied algae at the Rothamsted Experimental Station in England, after World War I.  In addition to her knowledge of plant biology, Bristol claimed the ability to tell whether tea had been added to milk, or the tea poured first and then milk had been added.  Bristol, as a scientist and a proper English woman, preferred the latter.

Ronald Fisher, who also worked at Rothamsted, expressed his skepticism over Dr. Bristol’s claim. Fisher set about designing a randomized experiment that would efficiently and effectively test her claim. Bristol was presented with eight cups of tea, four of which were prepared with milk added to tea, and four prepared with tea added to milk. Bristol, of course, was blinded to which was which, but was required to label each according to its manner of preparation. Fisher saw his randomized experiment as a 2 x 2 contingency table, from which he could calculate the probability of the observed outcome (and of any more extreme outcomes), using the assumption of fixed marginal rates and the hypergeometric probability distribution. Fisher’s Exact Test was born at tea time.[1]

Fisher described the origins of his Exact Test in one of his early texts, but he neglected to report whether his experiment vindicated Bristol’s claim. According to David Salsburg, H. Fairfield Smith, one of Fisher’s colleagues, acknowledged that Bristol nailed Fisher’s Exact test, with all eight cups correctly identified. The test has gone on to become an important tool in the statistician’s armamentarium.
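
For those who want to check the arithmetic, the one-sided exact p-value for Bristol’s perfect performance is easy to compute. A short sketch in Python, using only the standard library:

    # Probability of labeling all four "milk-first" cups correctly by chance,
    # with the margins fixed at four cups of each preparation.
    from math import comb

    p_all_correct = comb(4, 4) * comb(4, 0) / comb(8, 4)
    print(p_all_correct)    # 1/70, about 0.014: the one-sided exact p-value

Only a perfect score achieves significance at the conventional 5% level; under the null hypothesis of guessing, identifying three or more cups correctly has a cumulative probability of 17/70, or about 0.24.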

Fisher’s Exact, like any statistical test, has model assumptions and preconditions.  For one thing, the test is designed for categorical data, with binary outcomes. The test allows us to evaluate whether two proportions are likely different by chance alone, by calculating the probability of the observed outcome, as well as more extreme outcomes.

The calculation of an exact attained significance probability, using Fisher’s approach, provides a one-sided p-value, with no unique solution for calculating a two-sided attained significance probability. In discrimination cases, the one-sided p-value may well be more appropriate for the issue at hand, and so the lack of a unique two-sided value poses no particular problem. Fisher’s Exact Test has thus played an important role in showing the judiciary that small sample size need not be an insuperable barrier to meaningful statistical analysis.[2]

The difficulty of using Fisher’s Exact for small sample sizes is that the hypergeometric distribution, upon which the test is based, is highly asymmetric, and so the observed one-sided p-value does not measure the probability of a result equally extreme in the opposite direction. There are at least three ways to calculate a two-sided p-value (each illustrated in the sketch after this list):

  • Double the one-sided p-value.
  • Add the point probabilities from the opposite tail that are more extreme than the observed point probability.
  • Use the mid-P value; that is, add all values more extreme (smaller) than the observed point probability from both sides of the distribution, PLUS ½ of the observed point probability.
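
A minimal sketch in Python, using SciPy and a single invented 2 x 2 table (the counts are hypothetical, chosen only so that the three answers visibly diverge), implements the three approaches just listed:

    from scipy.stats import hypergeom

    # Hypothetical 2x2 table: [events, non-events] in exposed and unexposed.
    table = [[6, 2],          # exposed (8 subjects)
             [4, 12]]         # unexposed (16 subjects)

    a_obs = table[0][0]
    n1, n0 = sum(table[0]), sum(table[1])
    m = table[0][0] + table[1][0]           # total events (fixed column margin)
    rv = hypergeom(n1 + n0, m, n1)          # exposed events under the null

    support = range(max(0, m - n0), min(n1, m) + 1)
    p_obs = rv.pmf(a_obs)
    one_sided = sum(rv.pmf(k) for k in support if k >= a_obs)

    doubled = min(1.0, 2 * one_sided)                        # method 1
    opposite_tail = sum(rv.pmf(k) for k in support           # method 2
                        if rv.pmf(k) <= p_obs)
    mid_p = (sum(rv.pmf(k) for k in support                  # method 3
                 if rv.pmf(k) < p_obs) + 0.5 * p_obs)

    print(f"doubled = {doubled:.3f}, opposite tail = {opposite_tail:.3f}, "
          f"mid-p = {mid_p:.3f}")

On these invented counts, doubling gives about 0.057, the opposite-tail method about 0.032, and the mid-p about 0.019; the three conversions straddle the 5% line on the very same data, which is precisely the sort of divergence that made the choice of method consequential in the Lipitor litigation discussed below.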

Some software programs will proceed in one of these ways by default, but their doing so does not guarantee the most accurate measure of the two-tailed significance probability.

In the Lipitor MDL for diabetes litigation, Judge Gergel generally used sharp analyses to cut through the rancid fat of litigation claims, to get to the heart of the matter. By and large, he appears to have done a splendid job. In the course of gatekeeping under Federal Rule of Evidence 702, however, Judge Gergel may have misunderstood the nature of Fisher’s Exact Test.

Nicholas Jewell is a well-credentialed statistician at the University of California.  In the courtroom, Jewell is a well-known expert witness for the litigation industry.  He is no novice at generating unreliable opinion testimony. See In re Zoloft Prods. Liab. Litig., No. 12–md–2342, 2015 WL 7776911 (E.D. Pa. Dec. 2, 2015) (excluding Jewell’s opinions as scientifically unwarranted and methodologically flawed). In re Zoloft Prod. Liab. Litig., MDL NO. 2342, 12-MD-2342, 2016 WL 1320799 (E.D. Pa. April 5, 2016) (granting summary judgment after excluding Dr. Jewell). SeeThe Education of Judge Rufe – The Zoloft MDL” (April 9, 2016).

In the Lipitor cases, some of Jewell’s opinions seemed outlandish indeed, and Judge Gergel generally excluded them. See In re Lipitor Marketing, Sales Practices and Prods. Liab. Litig., 145 F.Supp. 3d 573 (D.S.C. 2015), reconsideration den’d, 2016 WL 827067 (D.S.C. Feb. 29, 2016). As Judge Gergel explained, Jewell calculated the relative risk for abnormal blood glucose in a Lipitor group as 3.0 (95% C.I., 0.9 to 9.6), using STATA software. Also using STATA, Jewell obtained an attained significance probability of 0.0654, based upon Fisher’s Exact Test. Lipitor Jewell at *7.

Judge Gergel did not report whether Jewell’s reported p-value of 0.0654 was one- or two-sided, but he did state that the attained probability “indicates a lack of statistical significance.” Id. & n. 15. The rest of His Honor’s discussion of the challenged opinion, however, makes clear that the 0.0654 must have been a two-sided value. If it had been a one-sided p-value, then there would have been no way of invoking the mid-p to generate a two-sided p-value below 5%. The mid-p will always be larger than the one-tailed exact p-value generated by Fisher’s Exact Test.

The court noted that Dr. Jewell had testified that he believed that STATA generated this confidence interval by “flip[ping]” the Taylor series approximation. The STATA website notes that it calculates confidence intervals for odds ratios (which are different from the relative risk that Jewell testified he computed) by inverting the Fisher exact test.[3] Id. at *7 & n. 17. Jewell’s description, of course, suggests that his confidence interval was not based upon exact methods.
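
For readers curious about what test inversion looks like in practice, recent versions of SciPy (1.10 or later) implement a conditional exact odds ratio and interval directly. The sketch below uses invented counts; the method, inverting the exact conditional (noncentral hypergeometric) test, is the same general idea described in the STATA documentation:

    from scipy.stats import contingency

    # Hypothetical 2x2 table (invented counts, for illustration only).
    table = [[6, 2],
             [4, 12]]

    # Conditional maximum likelihood odds ratio, with an exact confidence
    # interval obtained by inverting the exact test.
    res = contingency.odds_ratio(table, kind="conditional")
    ci = res.confidence_interval(confidence_level=0.95)
    print(f"conditional OR = {res.statistic:.2f}, "
          f"95% CI = ({ci.low:.2f}, {ci.high:.2f})")

Even an interval obtained this way can disagree with a two-sided exact p-value computed on the same data, because the interval inverts two one-sided tests while the two-sided p-value must somehow sum probabilities from an asymmetric distribution; that is the “idiosyncrasy” discussed in the Eddings article cited in note 3.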

STATA does not provide a mid-p calculation, and so Jewell used an on-line calculator to obtain a mid-p value of 0.04, which he declared statistically significant. The court took Jewell to task for using the mid-p value as though it were a different analysis or test. Id. at *8. Because the mid-p value will always be larger than the one-sided exact p-value from Fisher’s Exact Test, the court’s explanation does not really make sense:

“Instead, Dr. Jewell turned to the mid-p test, which would ‘[a]lmost surely’ produce a lower p-value than the Fisher exact test.”

Id. at *8. The mid-p, however, is not a test different from Fisher’s Exact; rather, it is simply a way of dealing with the asymmetrical distribution that underlies Fisher’s Exact, to arrive at a two-tailed p-value that more accurately captures the rate of Type I error.

The MDL court acknowledged that the mid-p approach was not inherently unreliable, but questioned Jewell’s inconsistent, selective use of the approach for only one test.[4] Jewell certainly did not help the plaintiffs’ cause, or his own standing, by having discarded the analyses that were not incorporated into his report, thus leaving the MDL court to guess at how much selection went on in his process of generating opinions. Id. at *9 & n. 19.

None of Jewell’s other calculated p-values involved the mid-p approach, but the court’s criticism begs the question whether the other p-values came from a Fisher’s Exact Test with small sample size, or from some other highly asymmetrical distribution. Id. at *8. Although Jewell had shown himself willing to engage in other dubious, result-oriented analyses, his use of the mid-p for this one comparison may have been within acceptable bounds after all.

The court also noted that Jewell had obtained the “exact p-value and that this p-value was not significant.” Id. The court’s notation here, however, does not report the important detail whether that exact, unreported p-value was merely the double of the one-sided p-value given by Fisher’s Exact Test. As the STATA website, cited by the MDL court, explains:

“The test naturally gives a one-sided p-value, and there are at least four different ways to convert it to a two-sided p-value (Agresti 2002, 93). One way, not implemented in Stata, is to double the one-sided p-value; doubling is simple but can result in p-values larger than one.”

Wesley Eddings, “Fisher’s exact test two-sided idiosyncrasy” (Jan. 2009) (citing Alan Agresti, Categorical Data Analysis 93 (2d ed. 2002)).

On plaintiffs’ motion for reconsideration, the MDL court reaffirmed its findings with respect to Jewell’s use of the mid-p. Lipitor Jewell Reconsidered at *3. In doing so, the court insisted that the one instance in which Jewell used the mid-p stood in stark contrast to all the other instances in which he had used Fisher’s Exact Test. The court then cited the record to identify 21 other instances in which Jewell used a p-value rather than a mid-p value. The court, however, did not provide the crucial detail whether those 21 other instances actually involved small-sample applications of Fisher’s Exact Test. As result-oriented as Jewell can be, it seems safe to assume that not all of his statistical analyses involved Fisher’s Exact Test, with its attendant ambiguity over how to calculate a two-tailed p-value.


[1] Sir Ronald A. Fisher, The Design of Experiments at chapter 2 (1935); see also Stephen Senn, “Tea for three: Of infusions and inferences and milk in first,” Significance 30 (Dec. 2012); David Salsburg, The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century (2002).

[2] See, e.g., Dendy v. Washington Hosp. Ctr., 431 F. Supp. 873 (D.D.C. 1977) (denying preliminary injunction), rev’d, 581 F.2d 99 (D.C. Cir. 1978) (reversing denial of relief, and remanding for reconsideration). See also National Academies of Science, Reference Manual on Scientific Evidence 255 n.108 (3d ed. 2011) (“Well-known small sample techniques [for testing significance and calculating p-values] include the sign test and Fisher’s exact test.”).

[3] See Wesley Eddings, “Fisher’s exact test two-sided idiosyncrasy” (Jan. 2009), available at <http://www.stata.com/support/faqs/statistics/fishers-exact-test/>, last visited April 19, 2016 (“Stata’s exact confidence interval for the odds ratio inverts Fisher’s exact test.”). This article by Eddings contains a nice discussion of why the Fisher’s Exact Test attained significance probability disagrees with the calculated confidence interval. Eddings points out the asymmetry of the hypergeometric distribution, which complicates arriving at an exact p-value for a two-sided test.

[4] See Barber v. United Airlines, Inc., 17 Fed. Appx. 433, 437 (7th Cir. 2001) (“Because in formulating his opinion Dr. Hynes cherry-picked the facts he considered to render an expert opinion, the district court correctly barred his testimony because such a selective use of facts fails to satisfy the scientific method and Daubert.”).