TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

A Proclamation from the Task Force on Statistical Significance

June 21st, 2021

The American Statistical Association (ASA) has finally spoken up about statistical significance testing.[1] Sort of.

Back in February of this year, I wrote about the simmering controversy over statistical significance at the ASA.[2] In 2016, the ASA had issued its guidance paper on p-values and statistical significance, which sought to correct misinterpretations and misrepresentations of “statistical significance.”[3] Lawsuit industry lawyers seized upon the ASA statement to proclaim a new freedom from having to exclude random error.[4] To obtain their ends, however, the plaintiffs’ bar had to distort the ASA guidance in statistically significant ways.

To add to the confusion, in 2019, the ASA Executive Director published an editorial that called for an end to statistical significance testing.[5] Because the editorial lacked disclaimers about whether it represented official ASA positions, scientists, statisticians, and lawyers on all sides were fooled into thinking the ASA had gone whole hog.[6] Then-ASA President Karen Kafadar stepped into the breach to explain that the Executive Director was not speaking for the ASA.[7]

In November 2019, members of the ASA board of directors (BOD) approved a motion to create a “Task Force on Statistical Significance and Replicability.”[8] Its charge was

“to develop thoughtful principles and practices that the ASA can endorse and share with scientists and journal editors. The task force will be appointed by the ASA President with advice and participation from the ASA BOD. The task force will report to the ASA BOD by November 2020.”

The members of the Task Force identified in the motion were:

Linda Young (Nat’l Agricultural Statistics Service & Univ. of Florida; Co-Chair)

Xuming He (Univ. Michigan; Co-Chair)

Yoav Benjamini (Tel Aviv Univ.)

Dick De Veaux (Williams College; ASA Vice President)

Bradley Efron (Stanford Univ.)

Scott Evans (George Washington Univ.; ASA Publications Representative)

Mark Glickman (Harvard Univ.; ASA Section Representative)

Barry Graubard (Nat’l Cancer Instit.)

Xiao-Li Meng (Harvard Univ.)

Vijay Nair (Wells Fargo & Univ. Michigan)

Nancy Reid (Univ. Toronto)

Stephen Stigler (Univ. Chicago)

Stephen Vardeman (Iowa State Univ.)

Chris Wikle (Univ. Missouri)

Tommy Wright (U.S. Census Bureau)

Despite the inclusion of highly accomplished and distinguished statisticians on the Task Force, there were isolated demurrers. Geoff Cumming, for one, clucked:

“Why won’t statistical significance simply whither and die, taking p < .05 and maybe even p-values with it? The ASA needs a Task Force on Statistical Inference and Open Science, not one that has its eye firmly in the rear view mirror, gazing back at .05 and significance and other such relics.”[9]

Despite the clucking, the Task Force arrived at its recommendations, but curiously, its report did not find a home in an ASA publication. Instead, “The ASA President’s Task Force Statement on Statistical Significance and Replicability” has now appeared as an “in press” publication at The Annals of Applied Statistics, where Karen Kafadar is the editor in chief.[10] The report is accompanied by an editorial by Kafadar.[11]

The Task Force advanced five basic propositions, which may have been obscured by some of the recent glosses on the ASA 2016 p-value statement:

  1. “Capturing the uncertainty associated with statistical summaries is critical.”
  2. “Dealing with replicability and uncertainty lies at the heart of statistical science. Study results are replicable if they can be verified in further studies with new data.”
  3. “The theoretical basis of statistical science offers several general strategies for dealing with uncertainty.”
  4. “Thresholds are helpful when actions are required.”
  5. “P-values and significance tests, when properly applied and interpreted, increase the rigor of the conclusions drawn from data.”

All of this seems obvious and anodyne, but I suspect it will not silence the clucking.


[1] Deborah Mayo, “Alas! The ASA President’s Task Force Statement on Statistical Significance and Replicability,” Error Statistics (June 20, 2021).

[2] “Falsehood Flies – The ASA 2016 Statement on Statistical Significance” (Feb. 26, 2021).

[3] Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” 70 The Am. Statistician 129 (2016); see “The American Statistical Association’s Statement on and of Significance” (March 17, 2016).

[4] “The American Statistical Association Statement on Significance Testing Goes to Court – Part I” (Nov. 13, 2018); “The American Statistical Association Statement on Significance Testing Goes to Court – Part 2” (Mar. 7, 2019).

[5] “Has the American Statistical Association Gone Post-Modern?” (Mar. 24, 2019); “American Statistical Association – Consensus versus Personal Opinion” (Dec. 13, 2019). See also Deborah G. Mayo, “The 2019 ASA Guide to P-values and Statistical Significance: Don’t Say What You Don’t Mean,” Error Statistics Philosophy (June 17, 2019); B. Haig, “The ASA’s 2019 update on P-values and significance,” Error Statistics Philosophy (July 12, 2019); Brian Tarran, “THE S WORD … and what to do about it,” Significance (Aug. 2019); Donald Macnaughton, “Who Said What,” Significance 47 (Oct. 2019).

[6] Ronald L. Wasserstein, Allen L. Schirm, and Nicole A. Lazar, “Editorial: Moving to a World Beyond ‘p < 0.05’,” 73 Am. Statistician S1, S2 (2019).

[7] Karen Kafadar, “The Year in Review … And More to Come,” AmStat News 3 (Dec. 2019) (emphasis added); see Kafadar, “Statistics & Unintended Consequences,” AmStat News 3,4 (June 2019).

[8] Karen Kafadar, “Task Force on Statistical Significance and Replicability,” ASA Amstat Blog (Feb. 1, 2020).

[9] See, e.g., Geoff Cumming, “The ASA and p Values: Here We Go Again,” The New Statistics (Mar. 13, 2020).

[10] Yoav Benjamini, Richard D. DeVeaux, Bradley Efron, Scott Evans, Mark Glickman, Barry Graubard, Xuming He, Xiao-Li Meng, Nancy Reid, Stephen M. Stigler, Stephen B. Vardeman, Christopher K. Wikle, Tommy Wright, Linda J. Young, and Karen Kafadar, “The ASA President’s Task Force Statement on Statistical Significance and Replicability,” 15 Annals of Applied Statistics 2021, available at https://www.e-publications.org/ims/submission/AOAS/user/submissionFile/51526?confirm=79a17040.

[11] Karen Kafadar, “Editorial: Statistical Significance, P-Values, and Replicability,” 15 Annals of Applied Statistics 2021, available at https://www.e-publications.org/ims/submission/AOAS/user/submissionFile/51525?confirm=3079934e.

The Practicing Law Institute’s Second Edition of Products Liability Litigation

May 30th, 2021

In late March, the Practicing Law Institute released the second edition of its treatise on products liability. George D. Sax, Stephanie A. Scharf, Sarah R. Marmor, eds., Product Liability Litigation: Current Law, Strategies and Best Practices (2nd ed. 2021).

The new edition is now in two volumes, which cover substantive products liability law, as well as the legal theory, policy, and strategy considerations important to both those pursuing and those defending products liability claims. The work of the editors, Stephanie A. Scharf and her colleagues, George D. Sax and Sarah R. Marmor, in managing this process is nothing short of Homeric. The authors are mostly practitioners, with a wealth of practical experience. There are a good number of friends, colleagues, and adversaries among the chapters’ authors, so any recommendation I make should be tempered by that disclosure.

Unlike with the first edition, the PLI has doubled down on control of the copyright license, and so I am no longer able to upload my chapter on statistical evidence to ResearchGate, Academia.edu, or my own website. But here is the outline index to my contribution, Chapter 28, “Statistical Evidence in Products Liability Litigation”:

  • 28:1 History and Overview
  • 28:2 Litigation Context of Statistical Issues
  • 28:3 Qualifications of Expert Witnesses Who Give Testimony on Statistical Issues
  • 28:4 Admissibility of Statistical Evidence – Rules 702 and 703
  • 28:5 Significance Probability
  • 28:5.1 Definition of Significance Probability (The “p-value”)
  • 28:5.2 Misstatements about Statistical Significance
  • 28:5.3 Transposition Fallacy
  • 28:5.4 Confusion Between Significance Probability and Burden of Proof
  • 28:5.5 Hypothesis Testing
  • 28:5.6 Confidence Intervals
  • 28:5.7 Inappropriate Use of Statistics – Matrixx  Initiatives
    • [A]     Sequelae of Matrixx Initiatives
    • [B]     Is Statistical Significance Necessary?
  • 28:5.8 American Statistical Association’s Statement on P-Values
  • 28:6 Statistical Power
  • 28:6.1 Definition of Statistical Power
  • 28:6.2 Cases Involving Statistical Power
  • 28:7 Evidentiary Rule of Completeness
  • 28:8 Meta-Analysis
  • 28:8.1 Definition and History of Meta-Analysis
  • 28:8.2 Consensus  Statements
  • 28:8.3 Use of Meta-Analysis in Litigation
  • 28:8.4 Competing Models for Meta-Analysis
  • 28:8.5 Recent Cases Involving Meta-Analyses
  • 28:9 Statistical Inference in Securities Fraud Cases Against Pharmaceutical Manufacturers
  • 28:10 Multiple Testing
  • 28:11 Ethical Considerations Raised by Statistical Expert Witness Testimony
  • 28:12 Conclusion

A detailed table of contents for the entire treatise is available at the PLI’s website. The authors and their chapters are set out below.

Chapter 1. What Product Liability Might Look Like in the Twenty-First Century (James M. Beck)

Chapter 2. Recent Trends in Product Claims and Product Defenses (Lori B. Leskin & Angela R. Vicari)

Chapter 3. Game-Changers: Defending Products Cases with Child Plaintiffs (Sandra Giannone Ezell & Diana M. Miller)

Chapter 4. Preemption Defenses (Joseph G. Petrosinelli, Ana C. Reyes & Amy Mason Saharia)

Chapter 5. Defending Class Action Lawsuits (Mark Herrmann, Pearson N. Bownas & Katherine Garceau Sobiech)

Chapter 6. Litigation in Foreign Countries Against U.S. Companies (Joseph G. Petrosinelli & Ana C. Reyes)

Chapter 7. Emerging Issues in Pharmaceutical Litigation (Allen P. Waxman, Loren H. Brown & Brooke Kim)

Chapter 8. Recent Developments in Asbestos, Talc, Silica, Tobacco, and E-Cigarette/Vaping Litigation in the U.S. and Canada (George Gigounas, Arthur Hoffmann, David Jaroslaw, Amy Pressman, Nancy Shane Rappaport, Wendy Michael, Christopher Gismondi, Stephen H. Barrett, Micah Chavin, Adam A. DeSipio, Ryan McNamara, Sean Newland, Becky Rock, Greg Sperla & Michael Lisanti)

Chapter 9. Emerging Issues in Medical Device Litigation (David R. Geiger, Richard G. Baldwin, Stephen G.W. Stich & E. Jacqueline Chávez)

Chapter 10. Emerging Issues in Automotive Product Liability Litigation (Eric P. Conn, Howard A. Fried, Thomas N. Lurie & Nina A. Rosenbach)

Chapter 11. Emerging Issues in Food Law and Litigation (Sarah L. Brew & Joelle Groshek)

Chapter 12. Regulating Cannabis Products (James H. Rotondo, Steven A. Cash & Kaitlin A. Canty)

Chapter 13. Blockchain Technology and Its Impact on Product Litigation (Justin Wales & Matt Kohen)

Chapter 14. Emerging Trends: Smart Technology and the Internet of Things (Christopher C. Hossellman & Damion M. Young)

Chapter 15. The Law of Damages in Product Liability Litigation (Evan D. Buxner & Dionne L. Koller)

Chapter 16. Using Early Case Assessments to Develop Strategy (Mark E. (Rick) Richardson)

Chapter 17. Impact of Insurance Policies (Kamil Ismail, Linda S. Woolf & Richard M. Barnes)

Chapter 18. Advantages and Disadvantages of Multidistrict Litigation (Wendy R. Fleishman)

Chapter 19. Strategies for Co-Defending Product Actions (Lem E. Montgomery III & Anna Little Morris)

Chapter 20. Crisis Management and Media Strategy (Joanne M. Gray & Nilda M. Isidro)

Chapter 21. Class Action Settlements (Richard B. Goetz, Carlos M. Lazatin & Esteban Rodriguez)

Chapter 22. Mass Tort Settlement Strategies (Richard B. Goetz & Carlos M. Lazatin)

Chapter 23. Arbitration (Beth L. Kaufman & Charles B. Updike)

Chapter 24. Privilege in a Global Product Economy (Marina G. McGuire)

Chapter 25. E-Discovery—Practical Considerations (Denise J. Talbert, John C. Vaglio, Jeremiah S. Wikler & Christy A. Pulis)

Chapter 26. Expert Evidence—Law, Strategies and Best Practices (Stephanie A. Scharf, George D. Sax, Sarah R. Marmor & Morgan G. Churma)

Chapter 27. Court-Appointed and Unconventional Expert Issues (Jonathan M. Hoffman)

Chapter 28. Statistical Evidence in Products Liability Litigation (Nathan A. Schachtman)

Chapter 29. Post-Sale Responsibilities in the United States and Foreign Countries (Kenneth Ross & George W. Soule)

Chapter 30. Role of Corporate Executives (Samuel Goldblatt & Benjamin R. Dwyer)

Chapter 31. Contacting Corporate Employees (Sharon L. Caffrey, Kenneth M. Argentieri & Rachel M. Good)

Chapter 32. Spoliation of Product Evidence (Paul E. Benson & Adam E. Witkov)

Chapter 33. Presenting Complex Scientific Evidence (Morton D. Dubin II & Nina Trovato)

Chapter 34. How to Win a Dismissal When the Plaintiff Declares Bankruptcy (Anita Hotchkiss & Earyn Edwards)

Chapter 35. Juries (Christopher C. Spencer)

Chapter 36. Preparing for the Appeal (Wendy F. Lumish & Alina Alonso Rodriguez)

Chapter 37. Global Reach: Foreign Defendants in the United States (Lisa J. Savitt)

Cancel Causation

March 9th, 2021

The Subversion of Causation into Normative Feelings

The late Professor Margaret Berger argued for the abandonment of general causation, or cause-in-fact, as an element of tort claims under the law.[1] Her antipathy to the requirement of showing causation ultimately led her to deprecate efforts to inject due scientific care into the gatekeeping of causation opinions. After a long, distinguished career as a law professor, Berger died in November 2010. Her animus against causation and Rule 702, however, was so strong that in her chapter in the third edition of the Reference Manual on Scientific Evidence, which came out almost one year after her death, she embraced the First Circuit’s notorious anti-Daubert decision in Milward, which also post-dated her passing.[2]

Despite this posthumous writing and publication by Professor Berger, there have been no further instances of Zombie scholarship or ghost authorship.  Nonetheless, the assault on causation has been picked up by Professor Alexandra D. Lahav, of the University of Connecticut School of Law, in a recent essay posted online.[3] Lahav’s essay is an extension of her work, “The Knowledge Remedy,” published last year.[4]

This second essay, entitled “Chancy Causation in Tort Law,” is the plaintiffs’ brief against counterfactual causation, which Lahav acknowledges is the dominant test for factual causation.[5] Lahav begins with a reasonable, reasonably understandable distinction between deterministic (necessary and sufficient) and probabilistic (or chancy in her parlance) causation.

The putative victim of a toxic exposure (such as glyphosate and Non-Hodgkin’s lymphoma) cannot show that his exposure was a necessary and sufficient determinant of his developing NHL. Not everyone similarly exposed develops NHL; and not everyone with NHL has been exposed to glyphosate. In Lahav’s terminology, specific causation in such a case is “chancy.” Lahav asserts, but never proves, that the putative victim “could never prove that he would not have developed cancer if he had not been exposed to that herbicide.”[6]

Lahav’s hypothetical presents a causal claim that involves both general and specific causation, and that is easily distinguishable from the claim of someone whose death was caused by being run over by a high-speed train. Despite this difference, Lahav never marshals any evidence to show why the putative glyphosate victim cannot show a probability that his case is causally related by adverting to the magnitude of the relative risk created by the prior exposure.

Repeatedly, Lahav asserts that when causation is chancy – probabilistic – it can never be shown by counterfactual causal reasoning, which she claims “assumes deterministic causation.” And she further asserts that because probabilistic causation cannot fit the counterfactual model, it can never “meet the law’s demand for a binary determination of cause.”[7]

Contrary to these ipse dixits, probabilistic causation can be described in terms of counterfactuals at both the general and the specific, or individual, level. The modification requires us, of course, to address the baseline situation as a rate or frequency of events, and the post-exposure world as one with a modified rate or frequency. The exposure is the cause of the change in event rates. Modern physics addresses whether we must be content with probability statements, rather than precise deterministic “billiard ball” physics, which is so useful in a game of snooker, but less so in describing quarks. In the first half of the 20th century, the biological sciences learned with some difficulty that they must embrace probabilistic models, in genetic science as well as in epidemiology. Many biological causation models are completely stated in terms of probabilities that are modified by specified conditions.

When Lahav gets to providing an example of where chancy causation fails in reasoning about individual causation, she gives a meaningless hypothetical of a woman, Mary, a smoker who develops lung cancer. To remove any resemblance to real-world cases, Lahav postulates that Mary had a 20% increased risk of lung cancer from smoking (a relative risk of 1.2). Thus, Lahav suggests that:

“[i]f Mary is a smoker and develops lung cancer, even after she has developed lung cancer it would still be the case that the cause of her cancer could only be described as a likelihood of 20 percent greater than what it would have been otherwise. Her doctor would not be able to say to her ‘Mary, if you had not smoked, you would not have developed this cancer’ because she might have developed it in any event.”

A more pertinent, less counterfactual hypothetical is that Mary had a 2,000% increase in risk from her tobacco smoking. This corresponds to relative risks in the range of 20, seen in many, if not most, epidemiologic studies of smoking and lung cancer. For such a risk, the individual probability of causation would be well over 0.9.
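
For readers who want the arithmetic spelled out, here is a minimal sketch in Python (my own illustration, not anything in Lahav’s paper) of the conventional attributable-fraction calculation behind these figures: the probability of causation for an exposed individual is commonly computed as (RR − 1)/RR.

```python
# A minimal sketch (my own illustration): the conventional "probability of
# causation" for an exposed individual, computed from the relative risk (RR)
# as the attributable fraction among the exposed, (RR - 1) / RR.

def probability_of_causation(relative_risk: float) -> float:
    """Attributable fraction among the exposed: (RR - 1) / RR."""
    return (relative_risk - 1.0) / relative_risk

for rr in (1.2, 20.0):  # Lahav's hypothetical RR of 1.2; a smoking-like RR of 20
    print(f"RR = {rr:>4}: probability of causation ≈ {probability_of_causation(rr):.2f}")

# RR =  1.2: probability of causation ≈ 0.17
# RR = 20.0: probability of causation ≈ 0.95
```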

To be sure, there are critics of using the probability of causation because it assumes that the risk is distributed stochastically, which may not be correct. Claimants are, of course, free to try to show that more of the risk fell on them for some reason, but this requires evidence!

Lahav attempts to answer this point, but her argument runs off its rails.  She notes that:

“[i]f there is an 80% chance that a given smoker’s cancer is caused by smoking, and Mary smoked, some might like to say that she has met her burden of proof.

This approach confuses the strength of the evidence with its content. Assume that it is more likely than not, based on recognized scientific methodology, that for 80% of smokers who contract lung cancer their cancer is attributable to smoking. That fact does not answer the question of whether we ought to infer that Mary’s cancer was caused by smoking. I use the word ought advisedly here. Suppose Mary and the cigarette company stipulate that 80% of people like Mary will contract lung cancer, the burden of proof has been met. The strength of the evidence is established. The next question regards the legal permissibility of an inference that bridges the gap between the run of cases and Mary. The burden of proof cannot dictate the answer. It is a normative question of whether to impose liability on the cigarette company for Mary’s harm.”[8]

Lahav is correct that an 80% probability of causation might be based upon very flimsy evidence, and so that probability alone cannot establish that the plaintiff has a “submissible” case. If the 80% probability of causation is stipulated, and not subject to challenge, then Lahav’s claim is remarkable and contrary to most of the scholarship that has followed the infamous Blue Bus hypothetical. Indeed, she is making the very argument that tobacco companies made in opposition to the use of epidemiologic evidence in tobacco cases, in the 1950s and 1960s.

Lahav advances a perverse skepticism that any inferences about individuals can be drawn from information about rates or frequencies in groups of similar individuals. Yes, there may always be some debate about what is “similar,” but successive studies may well draw the net tighter around what is the appropriate class. Lahav’s skepticism, and her outright denialism, are common among some in the legal academy, but they ignore that group-to-individual inferences are drawn in epidemiology in multiple contexts. Regressions for disease prediction are based upon individual data within groups, and the regression equations are then applied to future individuals to help predict those individuals’ probability of future disease (such as heart attack or breast cancer), or their probability of cancer-free survival after a specific therapy. Group-to-individual inferences are, of course, also the basis for prescribing decisions in clinical medicine. These are not normative inferences; they are based upon evidence-based causal thinking.

Lahav suggests that the metaphor of a “link” between exposure and outcome implies “something is determined and knowable, which is not possible in chancy causation cases.”[9] Not only is the link metaphor used all the time by sloppy journalists and some scientists, but when they use it, they mostly use it in the context of what Lahav would characterize as “chancy causation.” Even when speaking more carefully, and eschewing the link metaphor, scientists speak of probabilistic causation as something that is real, based upon evidence and valid inferences, not normative judgments or emotive reactions.

The probabilistic nature of the probability of causation does not affect its epistemic status.

The law does not assume that binary deterministic causality, as Lahav describes, is required to apply “but for” or counterfactual analysis. Juries are instructed to determine whether the party with the burden of proof has prevailed on each element of the claim, by a preponderance of the evidence. This civil jury instruction is almost always explained in terms of a posterior probability greater than 0.5, whether the claimed tort is a car crash or a case of Non-Hodgkin’s lymphoma.

Elsewhere, Lahav struggles with the concept of probability. Her essay suggests that

“[p]robability follows certain rules, or tendencies, but these regular laws do not abolish chance. There is a chance that the exposure caused his cancer, and a chance that it did not.”[10]

The use of chance here, in contradistinction to probability, is so idiosyncratic, and unexplained, that it is impossible to know what is meant.

Manufactured Doubt

Lahav’s essay twice touches upon a strawperson argument that stretches to claim that “manufacturing doubt” does not undermine her arguments about the nature of chancy causation. To Lahav, the likes of David Michaels have “demonstrated” that manufactured uncertainty is a genuine problem, but not one that affects her main claims. Nevertheless, Lahav remarkably sees no problem with manufactured certainty in the advocacy science of many authors.[11]

Lahav swallows the Michaels’ line, lure and all, and goes so far as to describe Rule 702 challenges to causal claims as having the “negative effect” of producing “incentives to sow doubt about epidemiologic studies using methodological battles, a strategy pioneered by the tobacco companies … .”[12] There is no corresponding concern about the negative effect of producing incentives to overstate the findings, or the validity of inferences, in order to get to a verdict for claimants.

Post-Modern Causation

What we have then is the ultimate post-modern program, which asserts that cause is “irreducibly chancy,” and thus indeterminate, and rightfully in the realm of “normative decisions.”[13] Lahav maintains there is an extreme plasticity to the very concept of causation:

“Causation in tort law can be whatever judges want it to be… .”[14]

I for one sincerely doubt it. And if judges make up some Lahav-inspired concept of normative causation, the scientific community would rightfully scoff.

Taking Lahav’s earlier paper, “The Knowledge Remedy,” along with this paper, the reader will see that Lahav is arguing for a rather extreme, radical precautionary principle approach to causation. There is a germ of truth that gatekeeping is affected by the moral quality of the defendant or its product. In the early days of the silicone gel breast implant litigation, some judges were influenced by suggestions that breast implants were frivolous products, made and sold to cater to male fantasies. Later, upon more mature reflection, judges recognized that roughly one third of breast implant surgeries were post-mastectomy, and that silicone was an essential biomaterial.  The recognition brought a sea change in critical thinking about the evidence proffered by claimants, and ultimately brought a recognition that the claimants were relying upon bogus and fraudulent evidence.[15]

—————————————————————————————–

[1]  Margaret A. Berger, “Eliminating General Causation: Notes towards a New Theory of Justice and Toxic Torts,” 97 Colum. L. Rev. 2117 (1997).

[2] Milward v. Acuity Specialty Products Group, Inc., 639 F.3d 11 (1st Cir. 2011), cert. denied sub nom., U.S. Steel Corp. v. Milward, 132 S. Ct. 1002 (2012).

[3]  Alexandra D. Lahav, “Chancy Causation in Tort,” (May 15, 2020) [cited as Chancy], available at https://ssrn.com/abstract=3633923 or http://dx.doi.org/10.2139/ssrn.3633923.

[4]  Alexandra D. Lahav, “The Knowledge Remedy,” 98 Texas L. Rev. 1361 (2020). See “The Knowledge Remedy Proposal” (Nov. 14, 2020).

[5]  Chancy at 2 (citing American Law Institute, Restatement (Third) of Torts: Physical & Emotional Harm § 26 & com. a (2010) (describing legal history of causal tests)).

[6]  Id. at 2-3.

[7]  Id.

[8]  Id. at 10.

[9]  Id. at 12.

[10]  Id. at 2.

[11]  Id. at 8 (citing David Michaels, The Triumph of Doubt: Dark Money and the Science of Deception (2020), among others).

[12]  Id. at 18.

[13]  Id. at 6.

[14]  Id. at 3.

[15]  Hon. Jack B. Weinstein, “Preliminary Reflections on Administration of Complex Litigation” 2009 Cardozo L. Rev. de novo 1, 14 (2009) (describing plaintiffs’ expert witnesses in silicone litigation as “charlatans” and the litigation as largely based upon fraud).

Falsehood Flies – The ASA 2016 Statement on Statistical Significance

February 26th, 2021

Under the heading of “falsehood flies,” we have the attempt by the American Statistical Association (ASA) to correct misinterpretations and misrepresentations of “statistical significance,” in a 2016 consensus statement.[1] Almost before the ink was dried, lawsuit industry lawyers seized upon the ASA statement to proclaim a new freedom from having to exclude random error.[2] Those misrepresentations were easily enough defeated by the actual text of the ASA statement, as long as lawyers bothered to read it carefully.

In 2019, Ronald Wasserstein, the ASA executive director, along with two other authors, wrote an editorial that explicitly called for abandoning the use of “statistical significance.” Although the piece, published in the American Statistician, was labeled “editorial,”[3] I predicted that Wasserstein’s official title, which appears in the editorial, and the absence of a disclaimer that the piece was not an ex cathedra pronouncement, would lead to widespread confusion, abuse, and further misrepresentations of the ASA’s views.[4]

Some people pooh-poohed the danger of confusion, but I was doubtful, given the experience with what happened with the anodyne 2016 ASA statement. What I did not realize until recently was that the Wasserstein editorial was misunderstood to be an official policy statement by the ASA’s own publication, Significance!

Significance is a bimonthly magazine on statistics for educated laypeople, published jointly by the ASA and the Royal Statistical Society. In August 2019, the editor of Significance, Brian Tarran, published an editorial that clearly reflected that Tarran interpreted the Wasserstein editorial as an official ASA pronouncement.[5] Indeed, Tarran cited the Wasserstein 2019 editorial as the ASA “recommendation.”

Donald Macnaughton, President of MatStat Research Consulting Inc., in Toronto, wrote a letter to point out Tarran’s error.[6] Macnaughton noted that Wasserstein had disclaimed an official imprimatur for his ideas in various oral presentations, and that the editors of the New England Journal of Medicine had explicitly rejected the editorial’s call for abandoning statistical significance.[7]

In reply, Tarran graciously acknowledged the mistake, and pointed to an ASA press release that had led him astray:

“Thank you for this clarification. Our mistake was to give too much weight to the headline of a press release, ‘ASA Calls Time on “Statistically Significant” in Science Research’ (bit.ly/2UBWKNq).”

Inquiring minds might wonder why the ASA allowed such a press release to go out.

In 2019, then-ASA President Karen Kafadar wrote on multiple occasions, in AmStat News, to correct any confusion or misimpression created by Wasserstein’s editorial:

“One final challenge, which I hope to address in my final month as ASA president, concerns issues of significance, multiplicity, and reproducibility. In 2016, the ASA published a statement that simply reiterated what p-values are and are not. It did not recommend specific approaches, other than ‘good statistical practice … principles of good study design and conduct, a variety of numerical and graphical summaries of data, understanding of the phenomenon under study, interpretation of results in context, complete reporting and proper logical and quantitative understanding of what data summaries mean’.

The guest editors of the March 2019 supplement to The American Statistician went further, writing: ‘The ASA Statement on P-Values and Statistical Significance stopped just short of recommending that declarations of “statistical significance” be abandoned. We take that step here. … [I]t is time to stop using the term “statistically significant” entirely’.

Many of you have written of instances in which authors and journal editors – and even some ASA members – have mistakenly assumed this editorial represented ASA policy. The mistake is understandable: The editorial was coauthored by an official of the ASA. In fact, the ASA does not endorse any article, by any author, in any journal – even an article written by a member of its own staff in a journal the ASA publishes.”[8]

Kafadar did not address the hyperactivity of the ASA public relations office, but her careful statement of the issues should put the matter to bed. There are now citable sources to rebut the claim that the ASA has recommended the complete abandonment of significance testing.

——————————————————————————————————————–

[1]  Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” 70 The Am. Statistician 129 (2016); see “The American Statistical Association’s Statement on and of Significance” (March 17, 2016).

[2]  “The American Statistical Association Statement on Significance Testing Goes to Court – Part I” (Nov. 13, 2018); “The American Statistical Association Statement on Significance Testing Goes to Court – Part 2” (Mar. 7, 2019).

[3]  Ronald L. Wasserstein, Allen L. Schirm, and Nicole A. Lazar, “Editorial: Moving to a World Beyond ‘p < 0.05’,” 73 Am. Statistician S1, S2 (2019).

[4]  “Has the American Statistical Association Gone Post-Modern?” (Mar. 24, 2019); “American Statistical Association – Consensus versus Personal Opinion” (Dec. 13, 2019). See also Deborah G. Mayo, “The 2019 ASA Guide to P-values and Statistical Significance: Don’t Say What You Don’t Mean,” Error Statistics Philosophy (June 17, 2019); B. Haig, “The ASA’s 2019 update on P-values and significance,” Error Statistics Philosophy  (July 12, 2019).

[5]  Brian Tarran, “THE S WORD … and what to do about it,” Significance (Aug. 2019).

[6]  Donald Macnaughton, “Who Said What,” Significance 47 (Oct. 2019).

[7]  See “Statistical Significance at the New England Journal of Medicine” (July 19, 2019); See also Deborah G. Mayo, “The NEJM Issues New Guidelines on Statistical Reporting: Is the ASA P-Value Project Backfiring?” Error Statistics Philosophy  (July 19, 2019).

[8]  Karen Kafadar, “The Year in Review … And More to Come,” AmStat News 3 (Dec. 2019) (emphasis added); see Kafadar, “Statistics & Unintended Consequences,” AmStat News 3,4 (June 2019).

On Praising Judicial Decisions – In re Viagra

February 8th, 2021

We live in strange times. A virulent form of tribal stupidity gave us Trumpism, a personality cult in which it is impossible to function in the Republican party and criticize der Führer. Even a diehard right-winger such as Liz Cheney, who dared to criticize Trump, is censured, for nothing more than being disloyal to a cretin who fomented an insurrection that resulted in the murder of a Capitol police officer and the deaths of several other people.[1]

Unfortunately, a similar, even if less extreme, tribal chauvinism affects legal commentary, from both sides of the courtroom. When Judge Richard Seeborg issued an opinion, early in 2020, in the melanoma – phosphodiesterase type 5 inhibitor (PDE5i) litigation,[2] I praised the decision for not shirking the gatekeeping responsibility even when the causal claim was based upon multiple, consistent, statistically significant observational studies that showed an association between PDE5i medications and melanoma.[3] Although many of the plaintiffs’ relied-upon studies reported statistically significant associations between PDE5i use and melanoma occurrence, they also found similar-sized associations with non-melanoma skin cancers. Because skin carcinomas were not part of the hypothesized causal mechanism, the study findings strongly suggested a common, unmeasured confounding variable such as skin damage from ultraviolet light. The plaintiffs’ expert witnesses’ failure to account for confounding was fatal under Rule 702, and Judge Seeborg’s recognition of this defect, and his willingness to go beyond multiple, consistent, statistically significant associations, were what made the decision important.

There were, however, problems and even a blatant error in the decision that required attention. Although the error was harmless in that its correction would not have required, or even suggested, a different result, Judge Seeborg, like many other judges and lawyers, tripped up over the proper interpretation of a confidence interval:

“When reviewing the results of a study it is important to consider the confidence interval, which, in simple terms, is the ‘margin of error’. For example, a given study could calculate a relative risk of 1.4 (a 40 percent increased risk of adverse events), but show a 95 percent ‘confidence interval’ of .8 to 1.9. That confidence interval means there is 95 percent chance that the true value—the actual relative risk—is between .8 and 1.9.”[4]

This statement about the true value is simply wrong. The provenance of this error is old, but the mistake was unfortunately amplified in the Third Edition of the Reference Manual on Scientific Evidence,[5] in its chapter on epidemiology.[6] The chapter, which is often cited, twice misstates the meaning of a confidence interval:

“A confidence interval provides both the relative risk (or other risk measure) found in the study and a range (interval) within which the risk likely would fall if the study were repeated numerous times.”[7]

and

“A confidence interval is a range of possible values calculated from the results of a study. If a 95% confidence interval is specified, the range encompasses the results we would expect 95% of the time if samples for new studies were repeatedly drawn from the same population. Thus, the width of the interval reflects random error.”[8]

The 95% confidence interval does represent random error, 1.96 standard errors above and below the point estimate from the sample data. The confidence interval is not the range of possible values, which could well be anything, but the range of estimates reasonably compatible with this one, particular study’s sample statistic.[9] Intervals have lower and upper bounds, which are themselves random variables, with approximately normal (or some other specified) distributions. The essence of the interval is that no value within the interval would be rejected as a null hypothesis based upon the data collected for the particular sample. Although the chapter on statistics in the Reference Manual accurately describes confidence intervals, judges and many lawyers are misled by the misstatements in the epidemiology chapter.[10]
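
The repeated-sampling meaning of a 95 percent interval is easy to see in a short simulation. The sketch below is my own illustration, not anything from the opinion or the Reference Manual; the true mean, sample size, and number of trials are arbitrary choices.

```python
import random
import statistics

# A minimal sketch (my own simulation): the frequentist meaning of a 95%
# confidence interval.  Samples are drawn repeatedly from a population with a
# KNOWN true mean; a 95% interval is computed from each sample.  Roughly 95%
# of the intervals cover the true value; any single interval either does or
# does not, so "95%" describes the procedure, not one realized interval.

random.seed(1)
TRUE_MEAN, SD, N, TRIALS = 1.4, 1.0, 50, 10_000

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, SD) for _ in range(N)]
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / N ** 0.5
    covered += (mean - 1.96 * se) <= TRUE_MEAN <= (mean + 1.96 * se)

print(f"Share of intervals covering the true mean: {covered / TRIALS:.3f}")
# prints a value close to 0.95
```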

Given the misdirection created by the Federal Judicial Center’s manual, Judge Seeborg’s erroneous definition of a confidence interval is understandable, but it should be noted even in the context of praising the important gatekeeping decision in In re Viagra. Certainly our litigation tribalism should not “allow us to believe” impossible things.[11] A revision of the Reference Manual is long overdue.

_____________________________________________________________________

[1]  John Ruwitch, “Wyoming GOP Censures Liz Cheney For Voting To Impeach Trump,” Nat’l Pub. Radio (Feb. 6, 2021).

[2]  In re Viagra (Sildenafil Citrate) and Cialis (Tadalafil) Prods. Liab. Litig., 424 F. Supp. 3d 781 (N.D. Cal. 2020) [Viagra].

[3]  See “Judicial Gatekeeping Cures Claims That Viagra Can Cause Melonoma” (Jan. 24, 2020).

[4]  Id. at 787.

[5]  Federal Judicial Center, Reference Manual on Scientific Evidence (3rd ed. 2011).

[6]  Michael D. Green, D. Michal Freedman, & Leon Gordis, “Reference Guide on Epidemiology,” in Federal Judicial Center, Reference Manual on Scientific Evidence 549 (3rd ed. 2011).

[7]  Id. at 573.

[8]  Id. at 580.

[9] Michael O. Finkelstein & Bruce Levin, Statistics for Lawyers 171, 173-74 (3rd ed. 2015). See also Sander Greenland, Stephen J. Senn, Kenneth J. Rothman, John B. Carlin, Charles Poole, Steven N. Goodman, and Douglas G. Altman, “Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations,” 31 Eur. J. Epidem. 337 (2016).

[10]  See, e.g., Derek C. Smith, Jeremy S. Goldkind, and William R. Andrichik, “Statistically Significant Association: Preventing the Misuse of the Bradford Hill Criteria to Prove Causation in Toxic Tort Cases,” 86 Defense Counsel J. 1 (2020) (mischaracterizing the meaning of confidence intervals based upon the epidemiology chapter in the Reference Manual).

[11]  See, e.g., James Beck, “Tort Pandemic Countermeasures? The Ten Best Prescription Drug/Medical Device Decisions of 2020,” Drug and Device Law Blog (Dec. 30, 2020) (suggesting that Judge Seeborg’s decision represented the rejection of plausibility and a single “association” as insufficient); Steven Boranian, “General Causation Experts Excluded In Viagra/Cialis MDL,” (Jan. 23, 2020).

Pernicious Probabilities in the Supreme Court

December 11th, 2020

Based upon Plato’s attribution,[1] philosophers credit the pre-Socratic philosopher Heraclitus, who was in his prime about 500 B.C., with the oracular observation that πάντα χωρεῖ καὶ οὐδὲν μένει, or, in a more elaborate English rendering:

all things pass and nothing stays, and comparing existing things to the flow of a river, he says you could not step twice into the same river.

Time changes us all. Certainly 2016 is not 2020, and the general elections held in November of those two years were not the same elections, and certainly not the same electorate. No one would need a statistician to know that the population of voters in 2016 was different from that in 2020.  Inevitably, some voters from 2016 died in the course of the Trump presidency; some no doubt died as a result of Trump’s malfeasance in handling the pandemic. Inevitably, some new voters came of age or became citizens and were thus eligible to vote in 2020, when they could not vote in 2016. Some potential voters who were unregistered in 2016 became new registrants. Non-voters in 2016 chose to vote in 2020, and some voters in 2016 chose not to vote in 2020. Overall, many more people turned out to vote in 2020 than turned out in 2016.

The candidates in 2016 and 2020 were different as well. On the Republican side, we had ostensibly the same candidate, but in 2020, Trump was the incumbent and had a record of dismal moral and political failures, four years in duration. Many Republicans who fooled themselves into believing that the Office of the Presidency would transform Trump into an honest political actor, came to realize that he was, and always has been, and always will be, a moral leper. These “apostate” Republicans effectively organized across the country, through groups like the Lincoln Project and the Bulwark, against Trump, and for the Democratic candidate, Joseph Biden.

In the 2016 election, Hillary Clinton outspent Donald Trump, but Trump used social media more effectively, with a big assist from Vladimir Putin. In the 2020 election, Russian hackers did not have to develop a disinformation campaign; the incumbent president had been doing so for four years.

On the Democratic side of the 2016 and 2020 elections, there was a dramatic change in the line-up. In 2016, candidate Hillary Clinton inspired many feminists because of her 23rd pair of (XX) chromosomes. She also suffered significant damage in primary battles with social democrat Bernie Sanders, whose supporters were alienated by the ham-fisted prejudices of the Clinton supporters on the Democratic National Committee. Many of Sanders’ supporters stayed home on election day, 2016. In 2020, Sanders and the left wing of the Democratic party made peace with the centrist candidate Joseph Biden, in recognition that the alternative – Trump – involved existential risks to our republican democracy.

In 2016, third party candidates, from the Green Party and the Libertarian Party, attracted more votes than they did in 2020. The 2016 election saw more votes siphoned from the two major party candidates by third parties because of the unacceptable choice between Trump and Clinton for several percent of the voting public. In 2020, with Trump’s authoritarian kleptocracy fully disclosed to Americans, a symbolic vote for a third-party candidate was tantamount to the unacceptable decision to not vote at all.

In 2016, after eight years of Obama’s presidency, the economy and the health of the nation were good. In 2020, the general election occurred in the midst of a pandemic and great economic suffering. Many more people voted by absentee or mail-in ballot than voted in that manner in 2016. State legislatures anticipated the deluge of mail-in ballots; some by facilitating early counting, and some by prohibiting early counting. The Trump administration anticipated the large uptick in mail-in ballots by manipulating the Post Office’s funding, by anticipatory charges of fraud in mail-in procedures, and by spreading lies and disinformation about COVID-19, along with spreading the infection itself.

On December 8, 2020, without apparently tiring of losing so much, the Trump Campaign orchestrated the filing of the big one, the “kraken lawsuit.” The State of Texas filed a complaint in the United States Supreme Court, in an attempt to invoke that court’s original jurisdiction to adjudicate Texas’ complaint that it was harmed by voting procedures in four states in which Trump lost the popular vote. All four states had certified their results before Texas filed its audacious lawsuit. Legal commentators were skeptical and derisive of the kraken’s legal theories.[2] Even the stalwart National Review saw the frivolity.[3]

Charles J. Cicchetti[4] is an economist and a director at the Berkeley Research Group. Previously, Cicchetti held academic positions at the University of Southern California, and the Energy and Environmental Policy Center at Harvard University’s John F. Kennedy School of Government. At the heart of the kraken is a declaration from Cicchetti, who tells us, under penalty of perjury, that he was “formally trained statistics and econometrics [sic][5] and accepted as an expert witness in civil proceedings.”[6] Declaration of Charles J. Cicchetti, Ph.D., Dec. 6, 2020, filed in support of Texas’ motion at ¶ 2.

Cicchetti’s declaration is not a model of clarity, but it is clear that he conducted several statistical analyses. He was quite transparent in stating his basic assumption for all his analyses; namely, that the outcomes for the two Democratic candidates, Clinton and Biden, for the two major party candidates, Clinton versus Trump and Biden versus Trump, and for in-person and for mail-in voters were all randomly drawn from the same population. Id. at ¶ 7. Using a binomial model, Cicchetti calculated Z-scores for the observed disparities in rates, which were very good evidence for rejecting the “same population” assumptions.

Based upon very large Z-scores, Cicchetti rejected the null hypothesis of “same population” and of Biden = Clinton. Id. at ¶ 20. But nothing of importance follows from this. We knew before the analysis that Biden ≠ Clinton, and the various populations compared were definitely not the same. Cicchetti might have stopped there and preserved his integrity and reputation, but he went further.

He treated the four states, Georgia, Michigan, Pennsylvania, and Wisconsin, as independent tests, which of course they are not. All states had different populations from 2016 to 2020; all had no pandemic in 2016, and a pandemic in 2020; all had been exposed to four years of Trump’s incompetence, venality, corruption, bigotry, and bullying. Cicchetti gilded the lily with the independence assumption, and came up with even lower, more meaningless probabilities that the populations were the same. And then he stepped into the abyss of fallacy and non sequitur:

“In my opinion, this difference in the Clinton and Biden performance warrants further investigation of the vote tally particularly in large metropolitan counties within and adjacent to the urban centers in Atlanta, Philadelphia, Pittsburgh, Detroit and Milwaukee.”

Id. at ¶ 30. Cicchetti’s suggestion that there is anything amiss, which warrants investigation, follows only from a maga, mega-transposition fallacy. The high Z-score does not mean that the observed result is not accurate or fair; it means only that the starting assumptions were outlandishly false.
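
For the curious, here is a minimal sketch of the kind of two-proportion calculation the declaration describes. It is my own reconstruction, not Cicchetti’s code, and the vote totals are made-up round numbers; the point is that with millions of votes, even a modest real difference between two electorates produces an enormous Z-score and a vanishingly small p-value, which proves nothing except that the “same population” assumption was false.

```python
import math

# A minimal sketch (hypothetical round numbers, not Cicchetti's data): the
# two-proportion Z-score implied by an assumption that two candidates' vote
# shares were "randomly drawn from the same population."

def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Z statistic for H0: the two underlying proportions are equal."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical: a 49% share of 4,000,000 votes in one election versus a 51%
# share of 5,000,000 votes in the next election -- a modest, real-world shift.
z = two_proportion_z(x1=1_960_000, n1=4_000_000, x2=2_550_000, n2=5_000_000)
print(f"Z = {z:.1f}")  # about -60; the associated p-value is essentially zero

# Rejecting the "same population" null at an astronomical level of confidence
# shows only that the two electorates differed -- which was never in doubt.
```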

Early versus Late Counting

Texas’ claim that there is something “odd” about the reporting before and after 3 a.m., on the morning after Election Day fares no better. Cicchetti tells us that “many Americans went to sleep election night with President Donald Trump (Trump) winning key battleground states, only to learn the next day that Biden surged ahead.” Id. at ¶ 7.

Well, Americans who wanted to learn the final count should not have gone to sleep, for several days. Again, the later-counted mail-in votes came from a segment of the population that was obviously different from the in-person voters. Cicchetti’s statistical analysis shows that we should reject any assumption that they were the same, but who would make that assumption? The expected values for the mail-in ballots differed from the expected values for in-person votes; the difference was driven by Republican lies and disinformation about COVID-19, and by laws that prohibited early counting. Not surprisingly, the Trumpist propaganda had an effect, and there was a disparity between the rates at which Trump and Biden supporters voted in person and the rates at which they voted by mail-in ballot. The late counting and reporting of mail-in ballots was further ensured by laws in some states that prohibited counting before Election Day. Trump was never winning in the referenced “key battleground” states; he was ahead in some states, at 2:59 a.m., but the count changed after all lawfully cast ballots had been counted.

The Response to Cicchetti’s Analyses

The statistical “argument,” such as it is, has not fooled anyone outside of maga-land.[7] Cicchetti’s analysis has been derided as “ludicrous” and as “incompetence” by Professors Kenneth Mayer and David Post. Mayer described the analysis as one that will be “used in undergraduate statistics classes as a canonical example of how not to do statistics.”[8] It might even make its way into a Berenstain Bears book on statistics. Andrew Gelman called the analysis “horrible,” and likened the declaration to the infamous Dreyfus case.[9]

The Texas lawsuit speaks volumes of the insincerity of the Trumpist Republican party. The rantings of Pat Robertson, asking God to intervene in the election to keep Trump in office, are more likely to have an effect.[10] The only issue the kraken fairly raises is whether the plaintiff, and plaintiff intervenor, should be sanctioned for “multipl[ying] the proceedings in any case unreasonably and vexatiously.”[11]


[1]  Plato, Cratylus 402a = A6.

[2] Adam Liptak, “Texas files an audacious suit with the Supreme Court challenging the election results,” N.Y. Times (Dec. 8, 2020); Jeremy W. Peters and Maggie Haberman, “17 Republican Attorneys General Back Trump in Far-Fetched Election Lawsuit,” N.Y. Times (Dec. 9, 2020); Paul J. Weber, “Trump’s election fight puts embattled Texas AG in spotlight,” Wash. Post (Dec. 9, 2020).

[3] Andrew C. McCarthy, “Texas’s Frivolous Lawsuit Seeks to Overturn Election in Four Other States,” Nat’l Rev. (Dec. 9, 2020); Robert VerBruggen, “The Dumb Statistical Argument in Texas’s Election Lawsuit,” Nat’l Rev. (Dec. 9, 2020).

[4] Not to be confused with Chicolini, Sylvania’s master spy.

[5] Apparently not formally trained in English.

[6] See, e.g., K N Energy, Inc. v. Cities of Alliance & Oshkosh, 266 Neb. 882, 670 N.W.2d 319 (2003), Center for Biological Diversity v. Pizarchik, 858 F. Supp. 2d 1221 (D. Colo. 2012), National Paint & Coatings Ass’n, v. City of Chicago, 835 F. Supp. 421 (N.D. Ill. 1993), National Paint & Coatings Ass’n, v. City of Chicago, 835 F. Supp. 414 (N.D. Ill. 1993); Mississippi v. Entergy Mississippi, Inc. (S.D. Miss. 2012); Hiko Energy, LLC v. Pennsylvania Public Utility Comm’n, 209 A.3d 246 (Pa. 2019).

[7] Philip Bump, “Trump’s effort to steal the election comes down to some utterly ridiculous statistical claims,” Wash. Post (Dec. 9, 2020); Jeremy W. Peters, David Montgomery, Linda Qiu & Adam Liptak, “Two reasons the Texas election case is faulty: flawed legal theory and statistical fallacy,” N.Y. Times (Dec. 10, 2020); David Post, “More on Statistical Stupidity at SCOTUS,” Volokh Conspiracy (Dec. 9, 2020).

[8] Eric Litke, “Lawsuit claim that statistics prove fraud in Wisconsin, elsewhere is wildly illogical,” PolitiFact (Dec. 9, 2020).

[9] Andrew Gelman, “The p-value is 4.76×10^−264 1 in a quadrillion,” Statistical Modeling, Causal Inference, and Social Science (Dec. 8, 2020).

[10]  Evan Brechtel, “Pat Robertson Calls on God to ‘Intervene’ in the Election to Keep Trump President in Bonkers Rant” (Dec. 10, 2020).

[11] See “Counsel’s liability for excessive costs,” 28 U.S. Code § 1927.

Regressive Methodology in Pharmaco-Epidemiology

October 24th, 2020

Medications are rigorously tested for safety and efficacy in clinical trials before approval by regulatory agencies such as the U.S. Food & Drug Administration (FDA) or the European Medicines Agency (EMA). The approval process, however, contemplates that more data about safety and efficacy will emerge from the use of approved medications in pharmacoepidemiologic studies conducted outside of clinical trials. Litigation of safety outcomes rarely arises from claims based upon the pivotal clinical trials that were conducted for regulatory approval and licensing. The typical courtroom scenario is that a safety outcome is called into question by pharmacoepidemiologic studies that purport to find associations or causality between the use of a specific medication and the claimed harm.

The International Society for Pharmacoepidemiology (ISPE), established in 1989, describes itself as an international professional organization intent on advancing health through pharmacoepidemiology, and related areas of pharmacovigilance. The ISPE website defines pharmacoepidemiology as

“the science that applies epidemiologic approaches to studying the use, effectiveness, value and safety of pharmaceuticals.”

The ISPE conceptualizes pharmacoepidemiology as “real-world” evidence, in contrast to randomized clinical trials:

“Randomized controlled trials (RCTs) have served and will continue to serve as the major evidentiary standard for regulatory approvals of new molecular entities and other health technology. Nonetheless, RWE derived from well-designed studies, with application of rigorous epidemiologic methods, combined with judicious interpretation, can offer robust evidence regarding safety and effectiveness. Such evidence contributes to the development, approval, and post-marketing evaluation of medicines and other health technology. It enables patient, clinician, payer, and regulatory decision-making when a traditional RCT is not feasible or not appropriate.”

ISPE Position on Real-World Evidence (Feb. 12, 2020) (emphasis in original).

The ISPE publishes an official journal, Pharmacoepidemiology and Drug Safety, and sponsors conferences and seminars, all of which are watched by lawyers pursuing and defending drug and device health safety claims. The endorsement by the ISPE of the American Statistical Association’s 2016 statement on p-values is thus of interest not only to statisticians, but to lawyers and claimants involved in drug safety litigation.

The ISPE, through its board of directors, formally endorsed the ASA 2016 p-value statement on April 1, 2017 (no fooling) in a statement that can be found at its website:

The International Society for Pharmacoepidemiology, ISPE, formally endorses the ASA statement on the misuse of p-values and accepts it as an important step forward in the pursuit of reasonable and appropriate interpretation of data.

On March 7, 2016, the American Statistical Association (ASA) issued a policy statement that warned the scientific community about the use P-values and statistical significance for interpretation of reported associations. The policy statement was accompanied by an introduction that characterized the reliance on significance testing as a vicious cycle of teaching significance testing because it was expected, and using it because that was what was taught. The statement and many accompanying commentaries illustrated that p-values were commonly misinterpreted to imply conclusions that they cannot imply. Most notably, “p-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.” Also, “a p-value does not provide a good measure of evidence regarding a model or hypothesis.” Furthermore, reliance on p-values for data interpretation has exacerbated the replication problem of scientific work, as replication of a finding is often confused with replicating the statistical significance of a finding, on the erroneous assumption that replication should lead to studies getting similar p-values.

This official statement from the ASA has ramifications for a broad range of disciplines, including pharmacoepidemiology, where use of significance testing and misinterpretation of data based on P-values is still common. ISPE has already adopted a similar stance and incorporated it into our GPP [ref] guidelines. The ASA statement, however, carries weight on this topic that other organizations cannot, and will inevitably lead to changes in journals and classrooms.

There are points of interpretation of the ASA Statement, which can be discussed and debated. What is clear, however, is that the ASA never urged the abandonment of p-values or even of statistical significance. The Statement contained six principles, some of which did nothing other than to attempt to correct prevalent misunderstandings of p-values. The third principle stated that “[s]cientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.” (emphasis added).

This principle, as stated, thus hardly advocated for the abandonment of a threshold in testing; rather it made the unexceptional point that the ultimate scientific conclusion (say about causality) required more assessment than only determining whether a p-value passed a specified threshold.

Presumably, the ISPE’s endorsement of the ASA’s 2016 Statement embraces all six of the articulated principles, including the ASA’s fourth principle:

4. Proper inference requires full reporting and transparency

P-values and related analyses should not be reported selectively. Conducting multiple analyses of the data and reporting only those with certain p-values (typically those passing a significance threshold) renders the reported p-values essentially uninterpretable. Cherry-picking promising findings, also known by such terms as data dredging, significance chasing, significance questing, selective inference, and “p-hacking,” leads to a spurious excess of statistically significant results in the published literature and should be vigorously avoided. One need not formally carry out multiple statistical tests for this problem to arise: Whenever a researcher chooses what to present based on statistical results, valid interpretation of those results is severely compromised if the reader is not informed of the choice and its basis. Researchers should disclose the number of hypotheses explored during the study, all data collection decisions, all statistical analyses conducted, and all p-values computed. Valid scientific conclusions based on p-values and related statistics cannot be drawn without at least knowing how many and which analyses were conducted, and how those analyses (including p-values) were selected for reporting.”

The ISPE’s endorsement of the ASA 2016 Statement references the ISPE’s own “Guidelines for Good Pharmacoepidemiology Practices (GPP),” which were promulgated initially in 1996, and revised as recently as June 2015. Good practices, as of 2015, provided that:

“Interpretation of statistical measures, including confidence intervals, should be tempered with appropriate judgment and acknowledgements of potential sources of error and limitations of the analysis, and should never be taken as the sole or rigid basis for concluding that there is or is not a relation between an exposure and outcome. Sensitivity analyses should be conducted to examine the effect of varying potentially critical assumptions of the analysis.”

All well and good, but this “good practices” statement might be taken as a bit anemic, given that it contains no mention of, or caution against, unqualified or unadjusted confidence intervals or p-values that come from multiple testing or comparisons. The ISPE endorsement of the ASA Statement now expands upon the ISPE’s good practices to include the avoidance of multiplicity and the disclosure of the full extent of analyses conducted in a study.

What happens in the “real world” of publishing, outside the board room?

Last month, the ISPE conducted its (virtual) 36th International Conference on Pharmacoepidemiology & Therapeutic Risk Management. The abstracts and poster presentations from this Conference were published last week as a Special Issue of the ISPE journal. I spot checked the journal contents to see how well the presentations lived up to the ISPE’s statistical aspirations.

One poster presentation addressed statin use and skin cancer risk in a French prospective cohort.[1] The authors described their cohort of French women, who were 40 to 65 years old in 1990 and were followed forward. Exposure to statin medications was assessed from 2004 through 2014. The analysis included outcomes of any skin cancer, melanoma, basal-cell carcinoma (BCC), and squamous-cell carcinoma (SCC), among 66,916 women. Here is how the authors described their findings:

There was no association between ever use of statins and skin cancer risk: the HRs were 0.96 (95% CI = 0.87-1.05) for overall skin cancer, 1.18 (95% CI = 0.96-1.47) for melanoma, 0.89 (95% CI = 0.79-1.01) for BCC, and 0.90 (95% CI = 0.67-1.21) for SCC. Associations did not differ by statin molecule nor by duration or dose of use. However, women who started to use statins before age 60 were at increased risk of BCC (HR = 1.45, 95% CI = 1.07-1.96 for ever vs never use).

To be fair, this was a poster presentation, but this short description of findings makes clear that the investigators examined at least the following subgroups:

Exposure subgroups:

  • specific statin drug
  • duration of use
  • dosage
  • age strata

and

Outcome subgroups:

  • melanoma
  • basal-cell carcinoma
  • squamous-cell carcinoma

The reader is not told how many specific statins, how many duration groups, dosage groups, and age strata were involved in the exposure analysis. My estimate is that the exposure subgroups were likely in excess of 100. With three disease outcome subgroups, the total subgroup analyses thus likely exceeded 300. The authors did not provide any information about the full extent of their analyses.
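
To make the arithmetic concrete, here is a minimal sketch in Python of what roughly 300 subgroup analyses imply, assuming, purely for illustration, that the tests are independent and that no subgroup has a true association; the 300 figure is my estimate from the abstract, not a number the authors reported.

```python
# Illustrative only: assumes ~300 roughly independent subgroup tests,
# each evaluated at a nominal two-sided alpha of 0.05, with no true
# association in any subgroup. Both assumptions are simplifications.
m = 300          # estimated number of subgroup analyses (not reported by the authors)
alpha = 0.05     # conventional nominal significance threshold

expected_false_positives = m * alpha        # ~15 "significant" findings expected by chance
prob_at_least_one = 1 - (1 - alpha) ** m    # essentially 1.0

print(f"Expected chance findings among {m} tests: {expected_false_positives:.0f}")
print(f"Probability of at least one nominally significant result: {prob_at_least_one:.6f}")
```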

Here is how the authors reported their conclusion:

“These findings of increased BCC risk in statin users before age 60 deserve further investigations.”

Now, the authors did not use the phrase “statistically significant,” but it is clear that they have characterized a finding of “increased BCC risk in statin users before age 60,” and in no other subgroup, and they have done so based upon a reported nominal “HR = 1.45, 95% CI = 1.07-1.96 for ever vs never use.” It is also clear that the authors have made no allowance, adjustment, modification, or qualification, for the wild multiplicity arising from their estimated 300 or so subgroups. Instead, they made an unqualified statement about “increased BCC risk,” and they offered an opinion about the warrant for further studies.
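
As a rough illustration of how much the multiplicity matters, the sketch below backs out the implied standard error of the log hazard ratio from the reported interval and applies a simple Bonferroni-style correction for roughly 300 comparisons. The 300 figure is again my estimate, Bonferroni is only one (conservative) way to adjust, and the sketch assumes the usual Wald-type interval on the log scale; the point is the direction of the change, not the exact numbers.

```python
# Sketch: back out the log-HR standard error from the reported 95% CI,
# then widen the interval with a Bonferroni-style adjustment for
# multiplicity. Assumes a Wald-type interval on the log scale; the 300
# comparisons figure is an estimate, not a reported number.
import math
from scipy.stats import norm

hr, lo, hi = 1.45, 1.07, 1.96      # reported ever- vs never-use HR for BCC, age < 60
m = 300                            # estimated number of subgroup analyses

se = (math.log(hi) - math.log(lo)) / (2 * 1.96)   # implied standard error of log(HR)

z_adj = norm.ppf(1 - (0.05 / m) / 2)              # critical value at alpha/m, two-sided
adj_lo = math.exp(math.log(hr) - z_adj * se)
adj_hi = math.exp(math.log(hr) + z_adj * se)

print(f"Nominal 95% CI: ({lo:.2f}, {hi:.2f})")
print(f"Bonferroni-adjusted CI for {m} comparisons: ({adj_lo:.2f}, {adj_hi:.2f})")
# On these assumptions, the adjusted interval comfortably includes 1.0.
```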

Endorsement of good statistical practices is a welcome professional organizational activity, but it is rather meaningless unless the professional societies begin to implement the good practices in their article selection, editing, and publishing activities.


[1]  Marie Al Rahmoun, Yahya Mahamat-Saleh, Iris Cervenka, Gianluca Severi, Marie-Christine Boutron-Ruault, Marina Kvaskoff, and Agnès Fournier, “Statin use and skin cancer risk: A French prospective cohort study,” 29 Pharmacoepidemiol. & Drug Safety s645 (2020).

The Defenestration of Sir Ronald Aylmer Fisher

August 20th, 2020

Fisher has been defenestrated. Literally.

Sir Ronald Fisher was a brilliant statistician. Born in 1890, he won a scholarship to Gonville and Caius College, in Cambridge University, in 1909. Three years later, he gained first class honors in Mathematics, and he went on to have extraordinary careers in genetics and statistics. In 1929, Fisher was elected to the Royal Society, and in 1952, Queen Bessy knighted him for his many contributions to the realm, including his work on experimental design and data interpretation, and his bridging the Mendelian theory of genetics and Darwin’s theory of evolution. In 1998, Bradley Efron described Fisher as “the single most important figure in 20th century statistics.”[1] And in 2010, University College, London, established the “R. A. Fisher Chair in Statistical Genetics” in honor of Fisher’s remarkable contributions to both genetics and statistics. Fisher’s college put up a stained-glass window to celebrate its accomplished graduate.

Fisher was, through his interest in genetics, also drawn to eugenics, the application of genetic knowledge to political problems. For instance, he favored abolishing extra social support to large families, in favor of support proportional to the father’s wages. Fisher also entertained with some seriousness grand claims about the connection between the rise and fall of civilizations and the loss of fertility among the upper classes.[2] While a student at Caius College, Fisher joined the Cambridge Eugenics Society, as did John Maynard Keynes. For reasons having to do with professional jealousies, Fisher’s appointment at University College London, in 1933, was as a professor of Eugenics, not Statistics.

After World War II, an agency of the United Nations, the United Nations Educational, Scientific and Cultural Organization (UNESCO), sought to forge a scientific consensus against racism and the Nazi horrors.[3] Fisher participated in the UNESCO commission, which he found to be “well-intentioned” but errant for failing to acknowledge inter-group differences “in their innate capacity for intellectual and emotional development.”[4]

Later in the UNESCO report, Fisher’s objections are described as the same as those of Hermann Joseph Muller, who won the Nobel Prize for Medicine in 1946. The report provides Fisher’s objections in his own words:

“As you ask for remarks and suggestions, there is one that occurs to me, unfortunately of a somewhat fundamental nature, namely that the Statement as it stands appears to draw a distinction between the body and mind of men, which must, I think, prove untenable. It appears to me unmistakable that gene differences which influence the growth or physiological development of an organism will ordinarily pari passu influence the congenital inclinations and capacities of the mind. In fact, I should say that, to vary conclusion (2) on page 5, ‘Available scientific knowledge provides a firm basis for believing that the groups of mankind differ in their innate capacity for intellectual and emotional development,’ seeing that such groups do differ undoubtedly in a very large number of their genes.”[5]

Fisher’s comments may not be totally anodyne by today’s standards, but he had also commented that:

“the practical international problem is that of learning to share the resources of this planet amicably with persons of materially different nature, and that this problem is being obscured by entirely well-intentioned efforts to minimize the real differences that exist.”[6]

Fisher’s comments seem to reflect his beliefs in the importance of the genetic contribution to “intelligence and emotional development,” which today retain both their plausibility and controversial status. Fisher’s participation in the UNESCO effort, and his emphasis on sharing resources peacefully, seem to speak against malignant racism, and distinguish him from the ugliness of the racism expressed by the Marxist statistician (and eugenicist) Karl Pearson.[7]

Cancel Culture Catches Up With Sir Ronald A. Fisher

Nonetheless, the Woke mob has had its daggers out for Sir Ronald for some time. Back in June of this year, graffiti covered the walls of Caius College, calling for the defenestration of Fisher. A more sedate group circulated a petition for the removal of the Fisher window.[8] Later that month, the college removed the Fisher window, literally defenestrating him.[9]

The de-platforming of Fisher was not confined to the campus of a college in Cambridge University. Fisher spent some of his most productive years, outside the university, at the Rothamsted Experimental Station. Not to be found deficient in the metrics of social justice, Rothamsted Research issued a statement, on June 9, 2020, concerning its most famous resident scientist:

“Ronald Aylmer Fisher is often considered to have founded modern statistics. Starting in 1919, Fisher worked at Rothamsted Experimental Station (as it was called then) for 14 years.

Among his many interests, Fisher supported the philosophy of eugenics, which was not uncommon among intellectuals in Europe and America in the early 20th Century.

The Trustees of the Lawes Agricultural Trust, therefore, consider it appropriate to change the name of the Fisher Court accommodation block (opened in 2018 and named after the old Fisher Building that it replaced) to ‘AnoVa Court’, after the analysis of variance statistical test developed by Fisher’s team at Rothamsted, and which is widely used today. Arrangements for this change of name are currently being made.”

I suppose that soon it will be verboten to mention Fisher’s Exact Test.

Daniel Cleather, a scientist and self-proclaimed anarchist, goes further and claims that the entire enterprise of statistics is racist.[10] Cleather argues that mathematical models of reality are biased against causal explanation, and that this bias supports eugenics and politically conservative goals. Cleather claims that statistical methods were developed “by white supremacists for the express purpose of demonstrating that white men are better than other people.” Cleather never delivers any evidence, however, to support his charges, but he no doubt feels strongly about it, and feels unsafe in the presence of Fisher’s work on experimental methods.

It is interesting to compare the disparate treatment that other famous scholars and scientists are receiving from the Woke. Aristotle was a great philosopher and “natural philosopher” scientist. There is a well-known philosophical society, the Aristotelian Society, obviously named for Aristotle, as is fitting. In the aftermath of the killings of George Floyd, Breonna Taylor, and Ahmaud Arbery, the Aristotelian Society engaged in this bit of moral grandstanding, of which The Philosopher would have likely disapproved:

A statement from the Aristotelian Society

“The recent killings of George Floyd, Breonna Taylor and Ahmaud Arbery have underlined the systemic racism and racial injustice that continue to pervade not just US but also British society. The Aristotelian Society stands resolutely opposed to racism and discrimination in any form. In line with its founding principles, the Society is committed to ensuring that all its members can meet on an equal footing in the promotion of philosophy. In order to achieve this aim, we will continue to work to identify ways that we can improve, in consultation with others. We recognise it as part of the mission of the Society to actively promote philosophical work that engages productively with issues of race and racism.”

I am sure it occurred to the members of the Society that Aristotle had expressed a view that some people were slaves by nature.[11] Today, we certainly do not celebrate Aristotle for this view, but we have not defenestrated him for a view much more hateful than any expressed by Sir Ronald. My point is merely that the vaunted Aristotelian Society is well able to look at the entire set of accomplishments of Aristotle, and not throw him out the window for his views on slavery. Still, if you have art work depicting Aristotle, you may be wise to put it out of harm’s way.

If Aristotle’s transgressions were too ancient for the Woke mob, then consider those of Nathan Roscoe Pound, who was the Dean of Harvard Law School from 1916 to 1936. Pound wrote on jurisprudential issues, and he is generally regarded as the founder of “sociological jurisprudence,” which seeks to understand law as influenced and determined by sociological conditions. Pound is celebrated especially by the plaintiffs’ bar, for his work for the National Association of Claimants’ Compensation Attorneys, which was the precursor to the Association of Trial Lawyers of America, and the current, rent-seeking, American Association for Justice. A group of “compensation lawyers” founded the Roscoe Pound – American Trial Lawyers Foundation (now The Pound Civil Justice Institute) in 1956, to build on Pound’s work.

Pound died in 1964, but he lives on in the hearts of social justice warriors, who seem oblivious of Pound’s affinity for Hitler and Nazism.[12] Pound’s enthusiasm was not a momentary lapse, but lasted a decade according to Daniel R. Coquillette, professor of American legal history at Harvard Law School.[13] Although Pound is represented in various ways as having been a great leader throughout the Harvard Law School, Coquillette says that volume two of his history of the school will address the sordid business of Pound’s Nazi leanings. In the meanwhile, no one is spraying graffiti on Pound’s portraits, photographs, and memorabilia, which are scattered throughout the School.

I would not want my defense of Fisher to be taken as a Trumpist “what-about” rhetorical diversion. Still, the Woke criteria for defenestrations seem, at best, to be applied inconsistently. More important, the Woke seem to have no patience for examining the positive contributions made by those they denounce. In Fisher’s (and Aristotle’s) case, the balance between good and bad ideas, and the creativity and brilliance of his important contributions, should allow people of good will to celebrate his many achievements, without moral hand waving. If the Aristotelian Society can keep its name, then Cambridge should be able to keep its stained-glass window memorial to Fisher.


[1]        Bradley Efron, “R. A. Fisher in the 21st century,” 13 Statistical Science 95, 95 (1998).

[2]        See Ronald A. Fisher, The Genetical Theory of Natural Selection 228-55 (1930) (chap. XI, “Social Selection of Fertility,” addresses the “decay of ruling classes”).

[3]        UNESCO, The Race Concept: Results of an Inquiry (1952).

[4]        Id. at 27 (noting that “Sir Ronald Fisher has one fundamental objection to the Statement, which, as he himself says, destroys the very spirit of the whole document. He believes that human groups differ profoundly “in their innate capacity for intellectual and emotional development.”)

[5]        Id. at 56.

[6]        Id. at 27.

[7]        Karl Pearson & Margaret Moul, “The Problem of Alien Immigration into Great Britain, Illustrated by an Examination of Russian and Polish Jewish Children, Part I,” 1 Ann. Human Genetics 5 (1925) (opining that Jewish immigrants “will develop into a parasitic race. […] Taken on the average, and regarding both sexes, this alien Jewish population is somewhat inferior physically and mentally to the native population.” ); “Part II,” 2 Ann. Human Genetics 111 (1927); “Part III,” 3 Ann. Human Genetics 1 (1928).

[8]        “Petition: Remove the window in honour of R. A. Fisher at Gonville & Caius, University of Cambridge.” See Genevieve Holl-Allen, “Students petition for window commemorating eugenicist to be removed from college hall; The petition surpassed 600 signatures in under a day,” The Cambridge Tab (June 2020).

[9]        Eli Cahan, “Amid protests against racism, scientists move to strip offensive names from journals, prizes, and more,” Science (July 2, 2020); Sam Kean, “Ronald Fisher, a Bad Cup of Tea, and the Birth of Modern Statistics: A lesson in humility begets a scientific revolution,” Distillations (Science History Institute) (Aug. 6, 2019). Bayesians have been all-too-happy to throw shade at Fisher. See Eric-Jan Wagenmakers & Johnny van Doorn, “This Statement by Sir Ronald Fisher Will Shock You,” Bayesian Spectacles (July 2, 2020).

[10]      Daniel Cleather, “Is Statistics Racist?” Medium (Mar. 9, 2020).

[11]      Aristotle, Politics, 1254b16–21.

[12]      James Q. Whitman, Hitler’s American Model: The United States and the Making of Nazi Race Law 15 & n. 39 (2017); Stephen H. Norwood, The Third Reich in the Ivory Tower 56-57 (2009); Peter Rees, “Nathan Roscoe Pound and the Nazis,”  60 Boston Coll. L. Rev. 1313 (2019); Ron Grossman, “Harvard accused of coddling Nazis,” Chicago Tribune (Nov. 30, 2004).

[13]      Garrett W. O’Brien, “The Hidden History of the Harvard Law School Library’s Treasure Room,” The Crimson (Mar. 28, 2020).

David Madigan’s Graywashed Meta-Analysis in Taxotere MDL

June 12th, 2020

Once again, a meta-analysis is advanced as a basis for an expert witness’s causation opinion, and once again, the opinion is the subject of a Rule 702 challenge. The litigation is In re Taxotere (Docetaxel) Products Liability Litigation, a multi-district litigation (MDL) proceeding before Judge Jane Triche Milazzo, who sits on the United States District Court for the Eastern District of Louisiana.

Taxotere is the brand name for docetaxel, a chemotherapeutic medication used either alone or in conjunction with other chemotherapy to treat a number of different cancers. Hair loss is a side effect of Taxotere, but in the MDL, plaintiffs claim that they have experienced permanent hair loss, about which, in their view, they were not adequately warned. The litigation thus involved issues of exactly what “permanent” means, medical causation, adequacy of warnings in the Taxotere package insert, and warnings causation.

Defendant Sanofi challenged plaintiffs’ statistical expert witness, David Madigan, a frequent testifier for the lawsuit industry. In its Rule 702 motion, Sanofi argued that Madigan had relied upon two randomized clinical trials (TAX 316 and GEICAM 9805) that evaluated “ongoing alopecia” to reach conclusions about “permanent alopecia.” Sanofi made the point that “ongoing” is not “permanent,” and that trial participants who had ongoing alopecia may have had their hair grow back. Madigan’s reliance upon an end point different from what plaintiffs complained about made his analysis irrelevant. The MDL court rejected Sanofi’s argument, with the observation that Madigan’s analysis was not irrelevant for using the wrong end point, only less persuasive, and that Sanofi’s criticism was one that “Sanofi can highlight for the jury on cross-examination.”[1]

Did Judge Milazzo engage in judicial dodging by rejecting the relevancy argument and emphasizing the truism that Sanofi could highlight the discrepancy on cross-examination? In the sense that the disconnect can be easily shown by highlighting the different event rates for the alopecia differently defined, the Sanofi argument seems like one that a jury could easily grasp and apply. The judicial shrug, however, raises the question of why the defendant should have to address a data analysis that does not support the plaintiffs’ contention about “permanence.” The federal rules are supposed to advance the finding of the truth and the fair, speedy resolution of cases.

Sanofi’s more interesting argument, from the perspective of Rule 702 case law, was its claim that Madigan had relied upon a flawed methodology in analyzing the two clinical trials:

“Sanofi emphasizes that the results of each study individually produced no statistically significant results. Sanofi argues that Dr. Madigan cannot now combine the results of the studies to achieve statistical significance. The Court rejects Sanofi’s argument and finds that Sanofi’s concern goes to the weight of Dr. Madigan’s testimony, not to its admissibility.34”[2]

There seems to be a lot going on in the Rule 702 challenge that is not revealed in the cryptic language of the MDL district court. First, the court deployed the jurisprudentially horrific, conclusory language to dismiss a challenge that “goes to the weight …, not to … admissibility.” As discussed elsewhere, this judicial locution is rarely true, fails to explain the decision, and shows a lack of engagement with the actual challenge.[3] Of course, aside from the inanity of the expression, and the failure to explain or justify the denial of the Rule 702 challenge, the MDL court may have been able to provide a perfectly adequate explanation.

Second, the footnote in the quoted language, number 34, was to the infamous Milward case,[4] with the explanatory parenthetical that the First Circuit had reversed a district court for excluding testimony of an expert witness who had sought to “draw conclusions based on combination of studies, finding that alleged flaws identified by district court go to weight of testimony not admissibility.”[5] As discussed previously, the widespread use of the “weight not admissibility” locution, even by the Court of Appeals, does not justify it. More important, however, the invocation of Milward suggests that any alleged flaws in combining study results in a meta-analysis are always matters for the jury, no matter how arcane, technical, or threatening to validity they may be.

So was Judge Milazzo engaged in judicial dodging in Her Honor’s opinion in Taxotere? Although the citation to Milward tends to inculpate, the cursory description of the challenge raises questions about whether the challenge itself was valid in the first place. Fortunately, in this era of electronic dockets, finding the actual Rule 702 motion is not very difficult, and we can inspect the challenge to see whether it was dodged or given short shrift. Remarkably, the reality is much more complicated than the simplistic rejection by the MDL court would suggest.

Sanofi’s brief attacks three separate analyses proffered by David Madigan, and not surprisingly, the MDL court did not address every point made by Sanofi.[6] Sanofi’s point about the inappropriateness of conducting the meta-analysis was its third in its supporting brief:

“Third, Dr. Madigan conducted a statistical analysis on the TAX316 and GEICAM9805/TAX301 clinical trials separately and combined them to do a ‘meta-analysis’. But Dr. Madigan based his analysis on unproven assumptions, rendering his methodology unreliable. Even without those assumptions, Dr. Madigan did not find statistical significance for either of the clinical trials independently, making this analysis unhelpful to the trier of fact.”[7]

This introductory statement of the issue is itself not particularly helpful because it fails to explain why combining two individual clinical trials (“RCTs”), each not having “statistically significant” results, by meta-analysis would be unhelpful. Sanofi’s brief identified other problems with Madigan’s analyses, but eventually returned to the meta-analysis issue, with the heading:

“Dr. Madigan’s analysis of the individual clinical trials did not result in statistical significance, thus is unhelpful to the jury and will unfairly prejudice Sanofi.”[8]

After a discussion of some of the case law about statistical significance, Sanofi pressed its case against Madigan. Madigan’s statistical analysis of each of two RCTs apparently did not reach statistical significance, and Sanofi complained that permitting Madigan to present these two analyses with results that were “not statistically very impressive,” would confuse and mislead the jury.[9]

“Dr. Madigan tried to avoid that result here [of having two statistically non-significant results] by conducting a ‘meta-analysis’ — a greywashed term meaning that he combined two statistically insignificant results to try to achieve statistical significance. Madigan Report at 20 ¶ 53. Courts have held that meta-analyses are admissible, but only when used to reduce the numerical instability on existing statistically significant differences, not as a means to achieve statistical significance where it does not exist. RMSE at 361–362, fn76.”

Now the claims here are quite unsettling, especially considering that they were lodged in a defense brief, in an MDL, with many cases at stake, made on behalf of an important pharmaceutical company, represented by two large, capable national or international law firms.

First, what does the defense brief signify by placing ‘meta-analysis’ in quotes? Are these scare quotes to suggest that Madigan was passing off something as a meta-analysis that failed to be one? If so, there is nothing in the remainder of the brief that explains such an interpretation. Meta-analysis has been around for decades, and reporting meta-analyses of observational or of experimental studies has been the subject of numerous consensus and standard-setting papers over the last two decades. Furthermore, the FDA has now issued a draft guidance for the use of meta-analyses in pharmacoepidemiology. Scare quotes are at best unexplained and, at worst, inappropriate. If the authors had something else in mind, they did not explain the meaning of using quotes around meta-analysis.

Second, the defense lawyers referred to meta-analysis as a “greywashed” term. I am always eager to expand my vocabulary, and so I looked up the word in various dictionaries of statistical and epidemiologic terms. Nothing there. Perhaps it was not a technical term, so I checked with the venerable Oxford English Dictionary. No relevant entries.

Pushed to the wall, I checked the font of all knowledge – the internet. To be sure, I found definitions, but nothing that could explain this odd locution in a brief filed in an important motion:

gray-washing: “noun In calico-bleaching, an operation following the singeing, consisting of washing in pure water in order to wet out the cloth and render it more absorbent, and also to remove some of the weavers’ dressing.”

graywashed: “adj. adopting all the world’s cultures but not really belonging to any of them; in essence, liking a little bit of everything but not everything of a little bit.”

Those definitions do not appear pertinent.

Another website offered a definition based upon the “blogsphere”:

Graywash: “A fairly new term in the blogsphere, this means an investigation that deals with an offense strongly, but not strongly enough in the eyes of the speaker.”

Hmmm. Still not on point.

Another one from “Urban Dictionary” might capture something of what was being implied:

Graywashing: “The deliberate, malicious act of making art having characters appear much older and uglier than they are in the book, television, or video game series.”

Still, I am not sure how this is an argument that a federal judge can respond to in a motion affecting many cases.

Perhaps, you say, I am quibbling with word choices, and I am not sufficiently in tune with the way people talk in the Eastern District of Louisiana. I plead guilty to both counts. But the third, and most important point, is the defense assertion that meta-analyses are only admissible “when used to reduce the numerical instability on existing statistically significant differences, not as a means to achieve statistical significance where it does not exist.”

This assertion is truly puzzling. Meta-analyses involve so many layers of hearsay that they will virtually never be admissible. Admissibility of the meta-analyses is virtually never the issue. When an expert witness has conducted a meta-analysis, or has relied upon one, the important legal question is whether the witness may reasonably rely upon the meta-analysis (under Rule 703) for an inference that satisfies Rule 702. The meta-analysis itself does not come into evidence, and does not go out to the jury for its deliberations.

But what about the defense brief’s “only when” language that clearly implies that courts have held that expert witnesses may rely upon meta-analyses only to reduce “numerical instability on existing statistically significant differences”? This seems clearly wrong because achieving statistical significance from studies that have no “instability” for their point estimates but individually lack statistical significance is a perfectly legitimate and valid goal. Consider a situation in which, for some reason, sample size in each study is limited by the available observations, but we have 10 studies, each with a point estimate of 1.5, and each with a 95% confidence interval of (0.88, 2.5). This hypothetical situation presents no instability of point estimates, and the meta-analytical summary point estimate would shrink the confidence interval so that the lower bound would exclude 1.0, in a perfectly valid analysis. In the real world, meta-analyses are conducted on studies with point estimates of risk that vary, because of random and non-random error, but there is no reason that meta-analyses cannot reduce random error to show that the summary point estimate is statistically significant at a pre-specified alpha, even though no constituent study was statistically significant.
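
The hypothetical can be made concrete with a short fixed-effect, inverse-variance sketch. This illustrates the textbook pooling method under the stated assumptions (ten identical studies, Wald-type intervals on the log scale); it is not a reconstruction of Dr. Madigan’s actual analysis.

```python
# Fixed-effect (inverse-variance) meta-analysis of ten identical hypothetical
# studies, each reporting a relative risk of 1.5 with 95% CI (0.88, 2.5).
# Illustrative sketch only; not a reconstruction of any litigation analysis.
import numpy as np

k = 10
log_rr = np.log(1.5)
se = (np.log(2.5) - np.log(0.88)) / (2 * 1.96)    # per-study SE implied by the CI

weights = np.full(k, 1.0 / se**2)                 # inverse-variance weights
pooled = np.sum(weights * log_rr) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))        # = se / sqrt(k) here

ci = np.exp([pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se])
print(f"Pooled RR: {np.exp(pooled):.2f}, 95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
# Each study's CI includes 1.0, yet the pooled CI (about 1.27 to 1.77) excludes it.
```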

Sanofi’s lawyers did not cite to any case for the remarkable proposition they advanced, but they did cite the Reference Manual on Scientific Evidence (RMSE). Earlier in the brief, the defense cited to this work in its third edition (2011), and so I turned to the cited page (“RMSE at 361–362, fn76”) only to find the introduction to the chapter on survey research, with footnotes 1 through 6.

After a diligent search through the third edition, I could not find any other language remotely supportive of the assertion by Sanofi’s counsel. There are important discussions about how a poorly conducted meta-analysis, or a meta-analysis that was heavily weighted in a direction by a methodologically flawed study, could render an expert witness’s opinion inadmissible under Rule 702.[10] Indeed, the third edition has a more sustained discussion of meta-analysis under the heading “VI. What Methods Exist for Combining the Results of Multiple Studies,”[11] but nothing in that discussion comes close to supporting the remarkable assertion by defense counsel.

On a hunch, I checked the second edition of RMSE, published in the year 2000. There was indeed a footnote 76, on page 361, which discussed meta-analysis. The discussion comes in the midst of the superseded edition’s chapter on epidemiology. Nothing, however, in the text or in the cited footnote appears to support the defense’s contention that meta-analyses are appropriate only when each included clinical trial has independently reported a statistically significant result.

If this analysis is correct, the MDL court was fully justified in rejecting the defense argument that combining two statistically non-significant clinical trials to yield a statistically significant result was methodologically infirm. No cases were cited, and the Reference Manual does not support the contention. Furthermore, no statistical text or treatise on meta-analysis supports the Sanofi claim. Sanofi did not support its motion with any affidavits of experts on meta-analysis.

Now there were other arguments advanced in support of excluding David Madigan’s testimony. Indeed, there was a very strong methodological challenge to Madigan’s decision to include the two RCTs in his meta-analysis, other than those RCTs’ lack of statistical significance on the end point at issue. In the words of the Sanofi brief:

“Both TAX clinical trials examined two different treatment regimens, TAC (docetaxel in combination with doxorubicin and cyclophosphamide) versus FAC (5-fluorouracil in combination with doxorubicin and cyclophosphamide). Madigan Report at 18–19 ¶¶ 47–48. Dr. Madigan admitted that TAC is not Taxotere alone, Madigan Dep. 305:21–23 (Ex. B); however, he did not rule out doxorubicin or cyclophosphamide in his analysis. Madigan Dep. 284:4–12 (“Q. You can’t rule out other chemotherapies as causes of irreversible alopecia? … A. I can’t rule out — I do not know, one way or another, whether other chemotherapy agents cause irreversible alopecia.”).”[12]

Now unlike the statistical significance argument, this argument is rather straightforward and turns on the clinical heterogeneity of the two trials, which seems clearly to point to the invalidity of a meta-analysis of them. Sanofi’s lawyers could have easily supported this point with statements from standard textbooks and non-testifying experts (but alas did not). Sanofi did support its challenge, however, with citations to an important litigation and Fifth Circuit precedent.[13]

This closer look at the actual challenge to David Madigan’s opinions suggests that Sanofi’s counsel may have diluted very strong arguments about heterogeneity in the exposure variable, and in the outcome variable, by advancing what seems a very doubtful argument based upon the lack of statistical significance of the individual studies in the Madigan meta-analysis.

Sanofi advanced two very strong points, first about the irrelevant outcome variable definitions used by Madigan, and second about the complexity of Taxotere’s being used with other, and different, chemotherapeutic agents in each of the two trials that Madigan combined.[14] The MDL court addressed the first point in a perfunctory and ultimately unsatisfactory fashion, but did not address the second point at all.

Ultimately, the result was that Madigan was given a pass to offer extremely tenuous opinions in an MDL on causation. Given that Madigan has proffered tendentious opinions in the past, and has been characterized as “an expert on a mission,” whose opinions are “conclusion driven,”[15] the missteps in the briefing and the MDL court’s abridgement of the gatekeeping process are regrettable. Also regrettable is that the merits or demerits of a Rule 702 challenge cannot be fairly evaluated from cursory, conclusory judicial decisions riddled with meaningless verbiage such as “the challenge goes to the weight and not the admissibility of the witness.” Access to the actual Rule 702 motion helped shed important light both on the inadequacy of one point in the motion and on the complexity and fullness of the challenge that was not fully addressed in the MDL court’s decision. It is possible that a Reply or a Supplemental brief, or oral argument, may have filled in gaps, corrected errors, or modified the motion, and that the above analysis missed some important aspect of what happened in the Taxotere MDL. If so, all the more reason that we need better judicial gatekeeping, especially when a decision can affect thousands of pending cases.[16]


[1]  In re Taxotere (Docetaxel) Prods. Liab. Litig., 2019 U.S. Dist. LEXIS 143642, at *13 (E.D. La. Aug. 23, 2019) [Op.]

[2]  Op. at *13-14.

[3]  “Judicial Dodgers – Weight not Admissibility” (May 28, 2020).

[4]  Milward v. Acuity Specialty Prods. Grp., Inc., 639 F.3d 11, 17-22 (1st Cir. 2011).

[5]  Op. at *13-14 (quoting and citing Milward, 639 F.3d at 17-22).

[6]  Memorandum in Support of Sanofi Defendants’ Motion to Exclude Expert Testimony of David Madigan, Ph.D., Document 6144, in In re Taxotere (Docetaxel) Prods. Liab. Litig. (E.D. La. Feb. 8, 2019) [Brief].

[7]  Brief at 2; see also Brief at 14 (restating without initially explaining why combining two statistically non-significant RCTs by meta-analysis would be unhelpful).

[8]  Brief at 16.

[9]  Brief at 17 (quoting from Madigan Dep. 256:14–15).

[10]  Michael D. Green, Michael Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” at 581n.89, in Fed. Jud. Center, Reference Manual on Scientific Evidence (3d ed. 2011).

[11]  Id. at 606.

[12]  Brief at 14.

[13]  Brief at 14, citing Burst v. Shell Oil Co., C. A. No. 14–109, 2015 WL 3755953, at *7 (E.D. La. June 16, 2015) (Vance, J.) (quoting LeBlanc v. Chevron USA, Inc., 396 F. App’x 94, 99 (5th Cir. 2010)) (“[A] study that notes ‘that the subjects were exposed to a range of substances and then nonspecifically note[s] increases in disease incidence’ can be disregarded.”), aff’d, 650 F. App’x 170 (5th Cir. 2016). See “The One Percent Non-solution – Infante Fuels His Own Exclusion in Gasoline Leukemia Case” (June 25, 2015).

[14]  Brief at 14-16.

[15]  In re Accutane Litig., 2015 WL 753674, at *19 (N.J.L.Div., Atlantic Cty., Feb. 20, 2015), aff’d, 234 N.J. 340, 191 A.3d 560 (2018). See “Johnson of Accutane – Keeping the Gate in the Garden State” (Mar. 28, 2015); “N.J. Supreme Court Uproots Weeds in Garden State’s Law of Expert Witnesses” (Aug. 8, 2018).

[16]  Cara Salvatore, “Sanofi Beats First Bellwether In Chemo Drug Hair Loss MDL,” Law360 (Sept. 27, 2019).

April Fool – Zambelli-Weiner Must Disclose

April 2nd, 2020

Back in the summer of 2019, Judge Saylor, the MDL judge presiding over the Zofran birth defect cases, ordered epidemiologist Dr. Zambelli-Weiner to produce documents relating to an epidemiologic study of Zofran,[1] as well as to her claimed confidential consulting relationship with plaintiffs’ counsel.[2]

This previous round of motion practice and discovery established that Zambelli-Weiner was a paid consultant in advance of litigation, that her Zofran study was funded by plaintiffs’ counsel, and that she presented at a Las Vegas conference, for plaintiffs’ counsel only, on [sic] how to make mass torts perfect. Furthermore, she had made false statements to the court about her activities.[3]

Zambelli-Weiner ultimately responded to the discovery requests, but she and plaintiffs’ counsel withheld several documents as confidential, pursuant to the MDL’s procedure for protective orders. Yesterday, April 1, 2020, Judge Saylor entered an order granting GlaxoSmithKline’s motion to de-designate four documents that plaintiffs claimed to be confidential.[4]

Zambelli-Weiner sought to resist GSK’s motion to compel disclosure of the documents on a claim that GSK was seeking the documents to advance its own litigation strategy. Judge Saylor acknowledged that Zambelli-Weiner’s psycho-analysis might be correct, but that GSK’s motive was not the critical issue. According to Judge Saylor, the proper inquiry was whether the claim of confidentiality was proper in the first place, and whether removing the cloak of secrecy was appropriate under the facts and circumstances of the case. Indeed, the court found “persuasive public-interest reasons” to support disclosure, including providing the FDA and the EMA a complete, unvarnished view of Zambelli-Weiner’s research.[5] Of course, the plaintiffs’ counsel, in close concert with Zambelli-Weiner, had created GSK’s need for the documents.

This discovery battle has no doubt been fought because plaintiffs and their testifying expert witnesses rely heavily upon the Zambelli-Weiner study to support their claim that Zofran causes birth defects. The present issue is whether four of the documents produced by Dr. Zambelli-Weiner pursuant to subpoena should continue to enjoy confidential status under the court’s protective order. GSK argued that the documents were never properly designated as confidential, and, alternatively, that the court should de-designate the documents because, among other things, the documents would disclose information important to medical researchers and regulators.

Judge Saylor’s Order considered GSK’s objections to plaintiffs’ and Zambelli-Weiner’s withholding four documents:

(1) Zambelli-Weiner’s Zofran study protocol;

(2) Undisclosed, hidden analyses that compared birth defect rates for children born to mothers who used Zofran with the rates seen with the use of other anti-emetic medications;

(3) An earlier draft of Zambelli-Weiner’s Zofran study, which she had prepared to submit to the New England Journal of Medicine; and

(4) Zambelli-Weiner’s advocacy document, a “Causation Briefing Document,” which she prepared for plaintiffs’ lawyers.

Judge Saylor noted that none of the withheld documents would typically be viewed as confidential. None contained “sensitive personal, financial, or medical information.”[6] The court dismissed Zambelli-Weiner’s contention that the documents all contained “business and proprietary information,” as conclusory and meritless. Neither she nor plaintiffs’ counsel explained how the requested documents implicated proprietary information when Zambelli-Weiner’s only business at issue is to assist in making lawsuits. The court observed that she is not “engaged in the business of conducting research to develop a pharmaceutical drug or other proprietary medical product or device,” and that the work at issue related solely to her paid consultancy for plaintiffs’ lawyers. Neither she nor the plaintiffs’ lawyers showed how public disclosure would hurt her proprietary or business interests. Of course, if Zambelli-Weiner had been dishonest in carrying out the Zofran study, as reflected in study deviations from its protocol, her professional credibility and her business of conducting such studies might well suffer. Zambelli-Weiner, however, was not prepared to affirm the antecedent of that hypothetical. In any event, the court found that whatever right Zambelli-Weiner might have enjoyed to avoid discovery evaporated with her previous dishonest representations to the MDL court.[7]

The Zofran Study Protocol

GSK sought production of the Zofran study protocol, which in theory contained the research plan for the Zofran study and the analyses the researchers intended to conduct. Zambelli-Weiner attempted to resist production on the specious theory that she had not published the protocol, but the court found this “non-publication” irrelevant to the claim of confidentiality. Most professional organizations, such as the International Society of Pharmacoepidemiology (“ISPE”), which ultimately published Zambelli-Weiner’s study, encourage the publication and sharing of study protocols.[8] Disclosure of protocols helps ensure the integrity of studies by allowing readers to assess whether the researchers have adhered to their study plan, or have engaged in ad hoc data dredging in search for a desired result.[9]

The Secret, Undisclosed Analyses

Perhaps even more egregious than withholding the study protocol was the refusal to disclose unpublished analyses comparing the rate of birth defects among children born to mothers who used Zofran with the birth defect rates of children with in utero exposure to other anti-emetic medications.  In ruling that Zambelli-Weiner must produce the unpublished analyses, the court expressed its skepticism over whether these analyses could ever have been confidential. Under ISPE guidelines, researchers must report findings that significantly affect public health, and the relative safety of Zofran is essential to its evaluation by regulators and prescribing physicians.

Not only was Zambelli-Weiner’s failure to include these analyses in her published article ethically problematic, but she apparently hid these analyses from the Pharmacovigilance Risk Assessment Committee (PRAC) of the European Medicines Agency, which specifically inquired of Zambelli-Weiner whether she had performed such analyses. As a result, the PRAC recommended a label change based upon Zambelli-Weiner’s failure to disclose material information. Furthermore, the plaintiffs’ counsel represented that they intended to oppose GSK’s citizen petition to the FDA, based upon the Zambelli-Weiner study. The apparently fraudulent non-disclosure of relevant analyses could not have been more fraught with public health significance. The MDL court found that the public health need trumped any (doubtful) claim to confidentiality.[10] Against the obvious public interest, Zambelli-Weiner offered no “compelling countervailing interest” in keeping her secret analyses confidential.

There were other aspects to the data-dredging rationale not discussed in the court’s order. Without seeing the secret analyses of other anti-emetics, readers were deprived of an important opportunity to assess actual and potential confounding in her study. Perhaps even more important, the statistical tools that Zambelli-Weiner used, including any measurements of p-values and confidence intervals, and any declarations of “statistical significance,” were rendered meaningless by her secret, undisclosed, multiple testing. As noted by the American Statistical Association (ASA) in its 2016 position statement, “4. Proper inference requires full reporting and transparency.”

The ASA explains that the proper inference from a p-value can be completely undermined by “multiple analyses” of study data, with selective reporting of sample statistics that have attractively low p-values, or cherry picking of suggestive study findings. The ASA points out that common practices of selective reporting compromise valid interpretation. Hence the correlative recommendation:

“Researchers should disclose the number of hypotheses explored during the study, all data collection decisions, all statistical analyses conducted and all p-values computed. Valid scientific conclusions based on p-values and related statistics cannot be drawn without at least knowing how many and which analyses were conducted, and how those analyses (including p-values) were selected for reporting.”[11]
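
A short simulation, offered only to illustrate the ASA’s point, shows what happens when a researcher runs many analyses of null data but reports only the most favorable one; the choice of 20 hidden analyses here is arbitrary, and nothing in the sketch reconstructs the actual undisclosed analyses.

```python
# Simulation: a researcher runs 20 independent comparisons on pure noise
# and reports only the smallest p-value. Illustrative assumption: 20
# analyses per "study"; the qualitative point holds for any number > 1.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2016)
n_sims, n_analyses, n_per_arm = 2000, 20, 100

false_positive = 0
for _ in range(n_sims):
    p_values = []
    for _ in range(n_analyses):
        a = rng.normal(size=n_per_arm)            # "exposed" group, no real effect
        b = rng.normal(size=n_per_arm)            # "unexposed" group
        p_values.append(ttest_ind(a, b).pvalue)
    if min(p_values) < 0.05:                      # report only the best-looking result
        false_positive += 1

print(f"Share of null 'studies' with a reportable p < 0.05: {false_positive / n_sims:.2f}")
# Expect roughly 1 - 0.95**20, about 0.64, far above the nominal 5% rate.
```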

The Draft Manuscript for the New England Journal of Medicine

The MDL court wasted little time and ink in dispatching Zambelli-Weiner’s claim of confidentiality for her draft New England Journal of Medicine manuscript. The court found that she failed to explain how any differences in content between this manuscript and the published version constituted “proprietary business information,” or how disclosure would cause her any actual prejudice.

Zambelli-Weiner’s Litigation Road Map

In a world where social justice warriors complain about organizations such as Exponent, for its litigation support of defense efforts, the revelation that Zambelli-Weiner was helping to quarterback the plaintiffs’ offense deserves greater recognition. Zambelli-Weiner’s litigation road map was clearly created to help Grant & Eisenhofer, P.A., the plaintiffs’ lawyers, create a causation strategy (to which she would add her Zofran study). Such a document from a consulting expert witness is typically the sort of document that enjoys confidentiality and protection from litigation discovery. The MDL court, however, looked beyond Zambelli-Weiner’s role as a “consulting witness” to her involvement in designing and conducting research. The broader extent of her involvement in producing studies and communicating with regulators made her litigation “strategery” “almost certainly relevant to scientists and regulatory authorities” charged with evaluating her study.[12]

Despite Zambelli-Weiner’s protestations that she had made a disclosure of conflict of interest, the MDL court found her disclosure anemic and the public interest in knowing the full extent of her involvement in advising plaintiffs’ counsel, long before the study was conducted, great.[13]

The legal media has been uncommonly quiet about the rulings on April Zambelli-Weiner in the Zofran litigation. From the Union of Concerned Scientists, and other industry scolds such as David Egilman, David Michaels, and Carl Cranor – crickets. Meanwhile, with the appeal over the admissibility of her testimony pending before the Pennsylvania Supreme Court,[14] Zambelli-Weiner continues to create an unenviable record in Zofran, Accutane,[15] Mirena,[16] and other litigations.


[1]  April Zambelli‐Weiner, Christina Via, Matt Yuen, Daniel Weiner, and Russell S. Kirby, “First Trimester Pregnancy Exposure to Ondansetron and Risk of Structural Birth Defects,” 83 Reproductive Toxicology 14 (2019).

[2]  See In re Zofran (Ondansetron) Prod. Liab. Litig., 392 F. Supp. 3d 179, 182-84 (D. Mass. 2019) (MDL 2657) [cited as In re Zofran].

[3]  “Litigation Science – In re Zambelli-Weiner” (April 8, 2019); “Mass Torts Made Less Bad – The Zambelli-Weiner Affair in the Zofran MDL” (July 30, 2019). See also Nate Raymond, “GSK accuses Zofran plaintiffs’ law firms of funding academic study,” Reuters (Mar. 5, 2019).

[4]  In re Zofran Prods. Liab. Litig., MDL No. 1:15-md-2657-FDS, Order on Defendant’s Motion to De-Designate Certain Documents as Confidential Under the Protective Order (D.Mass. Apr. 1, 2020) [Order].

[5]  Order at n.3

[6]  Order at 3.

[7]  See In re Zofran, 392 F. Supp. 3d at 186.

[8]  Order at 4. See also Xavier Kurz, Susana Perez-Gutthann, the ENCePP Steering Group, “Strengthening standards, transparency, and collaboration to support medicine evaluation: Ten years of the European Network of Centres for Pharmacoepidemiology and Pharmacovigilance (ENCePP),” 27 Pharmacoepidemiology & Drug Safety 245 (2018).

[9]  Order at note 2 (citing Charles J. Walsh & Marc S. Klein, “From Dog Food to Prescription Drug Advertising: Litigating False Scientific Establishment Claims Under the Lanham Act,” 22 Seton Hall L. Rev. 389, 431 (1992) (noting that adherence to study protocol “is essential to avoid ‘data dredging’—looking through results without a predetermined plan until one finds data to support a claim”).

[10]  Order at 5, citing Anderson v. Cryovac, Inc., 805 F.2d 1, 8 (1st Cir. 1986) (describing public-health concerns as “compelling justification” for requiring disclosing of confidential information).

[11]  Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” 70 The American Statistician 129 (2016). See also “The American Statistical Association’s Statement on and of Significance” (March 17, 2016); “Courts Can and Must Acknowledge Multiple Comparisons in Statistical Analyses” (Oct. 14, 2014).

[12]  Order at 6.

[13]  Cf. Elizabeth J. Cabraser, Fabrice Vincent & Alexandra Foote, “Ethics and Admissibility: Failure to Disclose Conflicts of Interest in and/or Funding of Scientific Studies and/or Data May Warrant Evidentiary Exclusions,” Mealey’s Emerging Drugs Reporter (Dec. 2002) (arguing that failure to disclose conflicts of interest and study funding should result in evidentiary exclusions).

[14]  Walsh v. BASF Corp., GD #10-018588 (Oct. 5, 2016, Pa. Ct. C.P. Allegheny Cty., Pa.) (finding that Zambelli-Weiner’s and Nachman Brautbar’s opinions that pesticides generally cause acute myelogenous leukemia, that even the smallest exposure to benzene increases the risk of leukemia offended generally accepted scientific methodology), rev’d, 2018 Pa. Super. 174, 191 A.3d 838, 842-43 (Pa. Super. 2018), appeal granted, 203 A.3d 976 (Pa. 2019).

[15]  In re Accutane Litig., No. A-4952-16T1, (Jan. 17, 2020 N.J. App. Div.) (affirming exclusion of Zambelli-Weiner as an expert witness).

[16]  In re Mirena IUD Prods. Liab. Litig., 169 F. Supp. 3d 396 (S.D.N.Y. 2016) (excluding Zambelli-Weiner in part).