TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

American Statistical Association – Consensus versus Personal Opinion

December 13th, 2019

Lawyers and judges pay close attention to standards, guidances, and consensus statements from respected and recognized professional organizations. Deviations from these standards may be presumptive evidence of malpractice or malfeasance in civil and criminal litigation, in regulatory matters, and in other contexts. One important, recurring situation arises when trial judges must act as gatekeepers of the admissibility of expert witness opinion testimony. In making this crucial judicial determination, judges will want to know whether a challenged expert witness has deviated from an accepted professional standard of care or practice.

In 2016, the American Statistical Association (ASA) published a consensus statement on p-values. The ASA statement grew out of a lengthy process that involved assembling experts of diverse viewpoints. In October 2015, the ASA convened a two-day meeting for 20 experts to meet and discuss areas of core agreement. Over the following three months, the participating experts and the ASA Board members continued their discussions, which led to the ASA Executive Committee’s approval of the statement that was published in March 2016.[1]

The ASA 2016 Statement spelled out six relatively uncontroversial principles of basic statistical practice.[2] Far from rejecting statistical significance, the six principles embraced statistical tests as an important but insufficient basis for scientific conclusions:

“3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.”

Despite the fairly clear and careful statement of principles, legal actors did not take long to misrepresent the ASA principles.[3] What had been a prescription about the insufficiency of p-value thresholds was distorted into strident assertions that statistical significance was unnecessary for scientific conclusions.

Three years after the ASA published its p-value consensus document, ASA Executive Director Ronald Wasserstein and two other statisticians published an editorial in a supplemental issue of The American Statistician, in which they called for the abandonment of significance testing.[4] Although Wasserstein’s editorial was clearly labeled as such, it introduced the special journal issue, and it appeared over his name and his official title as ASA Executive Director, without any disclaimer.

Sowing further confusion, the editorial made the following pronouncement:[5]

“The [2016] ASA Statement on P-Values and Statistical Significance stopped just short of recommending that declarations of ‘statistical significance’ be abandoned. We take that step here. We conclude, based on our review of the articles in this special issue and the broader literature, that it is time to stop using the term “statistically significant” entirely. Nor should variants such as ‘significantly different’, ‘p < 0.05’, and ‘nonsignificant’ survive, whether expressed in words, by asterisks in a table, or in some other way.”

The ASA is a collective body, and its 2016 Statement was a statement from that body, which spoke after lengthy deliberation and debate. The language, quoted above, moves within one paragraph from the ASA Statement to the royal “We,” who are taking the step of abandoning the term “statistically significant.” Given the unqualified use of the collective first person pronoun in the same paragraph that refers to the ASA, combined with Ronald Wasserstein’s official capacity, and the complete absence of a disclaimer that this pronouncement was simply a personal opinion, a reasonable reader could hardly avoid concluding that this pronouncement reflected ASA policy.

Your humble blogger, and others, read Wasserstein’s 2019 editorial as an ASA statement.[6] Although it is true that the 2019 paper is labeled “editorial,” and that the editorial does not describe a consensus process, there is no disclaimer such as is customary when someone in an official capacity publishes a personal opinion. Indeed, rather than the usual disclaimer, the Wasserstein editorial thanks the ASA Board of Directors “for generously and enthusiastically supporting the ‘p-values project’ since its inception in 2014.” This acknowledgement strongly suggests that the editorial is itself part of the “p-values project,” which is “enthusiastically” supported by the ASA Board of Directors.

If the editorial were not itself confusing enough, an unsigned email from “ASA <asamail@amstat.org>” was sent out in July 2019, in which the anonymous ASA author(s) takes credit for changing statistical guidelines at the New England Journal of Medicine:[7]

From: ASA <asamail@amstat.org>
Date: Thu, Jul 18, 2019 at 1:38 PM
Subject: Major Medical Journal Updates Statistical Policy in Response to ASA Statement
To: <XXXX>

The email is itself an ambiguous piece of evidence as to what the ASA is claiming. The email says that the New England Journal of Medicine changed its guidelines “in response to the ASA Statement on P-values and Statistical Significance and the subsequent The American Statistician special issue on statistical inference.” Of course, the “special issue” comprised not just Wasserstein’s editorial, but also 42 other papers. So this claim leaves open to doubt exactly what in the 2019 special issue the NEJM editors were responding to. Given that the 42 articles that followed Wasserstein’s editorial did not all agree with Wasserstein’s “steps taken,” or with each other, the only landmark in the special issue was the editorial over the name of the ASA’s Executive Director.

Moreover, a reading of the NEJM revised guidelines does not suggest that the journal’s editors were unduly influenced by the Wasserstein editorial or the 42 accompanying papers. The journal mostly responded to the ASA 2016 consensus paper by putting some teeth into its Principle 4, which dealt with multiplicity concerns in submitted manuscripts. The newly adopted (2019) NEJM author guidelines do not take the step urged by Wasserstein and colleagues; there is no general prohibition on p-values or statements of “statistical significance.”

The confusion propagated by the Wasserstein 2019 editorial has not escaped the attention of other ASA officials. An editorial in the June 2019 issue of AmStat News, by ASA President Karen Kafadar, noted the prevalent confusion and uneasiness over the 2019 The American Statistician special issue, the lack of consensus, and the need for healthy debate.[8]

In this month’s issue of AmStat News, President Kafadar returned, in her “President’s Corner,” to the confusion over the 2019 special issue of The American Statistician. Because Executive Director Wasserstein’s editorial language about “we now take this step” is almost certain to find its way into opportunistic legal briefs, Kafadar’s comments are worth noting in some detail:[9]

“One final challenge, which I hope to address in my final month as ASA president, concerns issues of significance, multiplicity, and reproducibility. In 2016, the ASA published a statement that simply reiterated what p-values are and are not. It did not recommend specific approaches, other than ‘good statistical practice … principles of good study design and conduct, a variety of numerical and graphical summaries of data, understanding of the phenomenon under study, interpretation of results in context, complete reporting and proper logical and quantitative understanding of what data summaries mean’.

The guest editors of the March 2019 supplement to The American Statistician went further, writing: ‘The ASA Statement on P-Values and Statistical Significance stopped just short of recommending that declarations of “statistical significance” be abandoned. We take that step here. … [I]t is time to stop using the term “statistically significant” entirely’.

Many of you have written of instances in which authors and journal editors – and even some ASA members – have mistakenly assumed this editorial represented ASA policy. The mistake is understandable: The editorial was coauthored by an official of the ASA. In fact, the ASA does not endorse any article, by any author, in any journal – even an article written by a member of its own staff in a journal the ASA publishes.”

Kafadar’s caveat should quash incorrect assertions about the ASA’s position on statistical significance testing. It is a safe bet, however, that such assertions will appear in trial and appellate briefs.

Statistical reasoning is difficult enough for most people, but the hermeneutics of American Statistical Association publications on statistical significance may require a doctorate of divinity degree. In a cleverly titled post, Professor Deborah Mayo argues that there is no other way to interpret the Wasserstein 2019 editorial except as laying down an ASA prescription. Deborah G. Mayo, “Les stats, c’est moi,” Error Statistics Philosophy (Dec. 13, 2019). I accept President Kafadar’s correction at face value, and accept that I, like many other readers, misinterpreted the Wasserstein editorial as having the imprimatur of the ASA. Mayo points out, however, that Kafadar’s correction in a newsletter may be insufficient at this point, and that a stronger disclaimer is required. Officers of the ASA are certainly entitled to their opinions and the opportunity to present them, but disclaimers would bring clarity and transparency to the published work of these officials.

Wasserstein’s 2019 editorial goes further to make a claim about how his “step” will ameliorate the replication crisis:

“In this world, where studies with ‘p < 0.05’ and studies with ‘p > 0.05’ are not automatically in conflict, researchers will see their results more easily replicated – and, even when not, they will better understand why.”

The editorial here seems to be attempting to define replication failure out of existence. This claim, as stated, is problematic. A sophisticated practitioner may think of the situation in which two studies, one with p = 0.048 and another with p = 0.052, might be said not to be in conflict. In real-world litigation, however, advocates will take Wasserstein’s statement about studies not in conflict (despite p-values above and below a threshold, say 5%) to extremes. We can anticipate claims that two similar studies with p-values above and below 5%, say with one p-value at 0.04 and the other at 0.40, will be described as not in conflict, with the second a replication of the first. It is hard to see how this possible interpretation of Wasserstein’s editorial, although consistent with its language, will advance sound, replicable science.[10]
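
To make the point concrete, here is a minimal sketch, using hypothetical relative risks and standard errors (not drawn from any actual studies), of why results with p = 0.048 and p = 0.052 convey essentially the same information, while a pairing of p = 0.04 and p = 0.40 is a different matter:

```python
# Hypothetical relative risks and standard errors (log scale); normal theory only.
from math import erfc, exp, log, sqrt

def summarize(rr, se):
    """Two-sided p-value and 95% confidence interval for a relative risk."""
    z = log(rr) / se
    p = erfc(abs(z) / sqrt(2))                     # two-sided p-value
    lo, hi = exp(log(rr) - 1.96 * se), exp(log(rr) + 1.96 * se)
    return p, lo, hi

studies = [
    ("Study A", 1.50, 0.205),   # p ~ 0.048
    ("Study B", 1.50, 0.209),   # p ~ 0.052 -- same estimate, nearly the same interval
    ("Study C", 1.50, 0.197),   # p ~ 0.04
    ("Study D", 1.10, 0.113),   # p ~ 0.40 -- interval comfortably includes RR = 1
]
for name, rr, se in studies:
    p, lo, hi = summarize(rr, se)
    print(f"{name}: RR = {rr:.2f}, p = {p:.3f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Studies A and B report the same estimate with nearly identical intervals, so treating them as compatible is unobjectionable; Study D’s interval comfortably includes a relative risk of 1.0, and calling it a replication of Study C would be a stretch.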


[1] Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” 70 The Am. Statistician 129 (2016).

[2] “The American Statistical Association’s Statement on and of Significance” (Mar. 17, 2016).

[3] See, e.g., “The Education of Judge Rufe – The Zoloft MDL” (April 9, 2016) (Zoloft litigation); “The ASA’s Statement on Statistical Significance – Buzzing from the Huckabees” (Mar. 19, 2016); “The American Statistical Association Statement on Significance Testing Goes to Court – Part I” (Nov. 13, 2018).

[4] Ronald L. Wasserstein, Allen L. Schirm, and Nicole A. Lazar, “Editorial: Moving to a World Beyond ‘p < 0.05’,” 73 Am. Statistician S1, S2 (2019).

[5] Id. at S2.

[6] See “Has the American Statistical Association Gone Post-Modern?” (Mar. 24, 2019); Deborah G. Mayo, “The 2019 ASA Guide to P-values and Statistical Significance: Don’t Say What You Don’t Mean,” Error Statistics Philosophy (June 17, 2019); B. Haig, “The ASA’s 2019 update on P-values and significance,” Error Statistics Philosophy (July 12, 2019).

[7] See “Statistical Significance at the New England Journal of Medicine” (July 19, 2019); see also Deborah G. Mayo, “The NEJM Issues New Guidelines on Statistical Reporting: Is the ASA P-Value Project Backfiring?” Error Statistics Philosophy (July 19, 2019).

[8] See Karen Kafadar, “Statistics & Unintended Consequences,” AmStat News 3, 4 (June 2019).

[9] Karen Kafadar, “The Year in Review … And More to Come,” AmStat News 3 (Dec. 2019).

[10]  See also Deborah G. Mayo, “P‐value thresholds: Forfeit at your peril,” 49 Eur. J. Clin. Invest. e13170 (2019).

 

Is the IARC Lost in the Weeds?

November 30th, 2019

A couple of years ago, I met David Zaruk at a Society for Risk Analysis meeting, where we were both presenting. I was aware of David’s blogging and investigative journalism, but meeting him gave me a greater appreciation for the breadth and depth of his work. For those of you who do not know David, he is present in cyberspace as the Risk-Monger who blogs about risk and science communications issues. His blog has featured cutting-edge exposés about the distortions in risk communications perpetuated by the advocacy of non-governmental organizations (NGOs). Previously, I have recorded my objections to the intellectual arrogance of some such organizations that purport to speak on behalf of the public interest, when often they act in cahoots with the lawsuit industry in the manufacturing of tort and environmental litigation.

David’s writing on the lobbying and control of NGOs by plaintiffs’ lawyers from the United States should be required reading for everyone who wants to understand how litigation sausage is made. His series, “SlimeGate,” details the interplay among NGO lobbying, lawsuit industry maneuvering, and carcinogen determinations at the International Agency for Research on Cancer (IARC). The IARC, a branch of the World Health Organization, is headquartered in Lyon, France. The IARC convenes “working groups” to review the scientific studies of the carcinogenicity of various substances and processes. The IARC working groups produce “monographs” of their reviews, and the IARC publishes these monographs, in print and on-line. The United States is in the top tier of participating countries for funding the IARC.

The IARC was founded in 1965, when observational epidemiology was still very much an emerging science, with expertise concentrated in only a few countries. For its first few decades, the IARC enjoyed a good reputation, and its monographs were considered definitive reviews, especially under its first director, Dr. John Higginson, from 1966 to 1981.[1] By the end of the 20th century, the need for the IARC and its reviews had waned, as the methods of systematic review and meta-analysis had evolved significantly, and had become more widely standardized and practiced.

Understandably, the IARC has been concerned that the members of its working groups should be viewed as disinterested scientists. Unfortunately, this concern has been translated into an asymmetrical standard that excludes anyone with a hint of manufacturing connection, but keeps the door open for those scientists with deep lawsuit industry connections. Speaking on behalf of the plaintiffs’ bar, Michael Papantonio, a plaintiffs’ lawyer who founded Mass Torts Made Perfect, noted that “We [the lawsuit industry] operate just like any other industry.”[2]

David Zaruk has shown how this asymmetry has been exploited mercilessly by the lawsuit industry and its agents in connection with the IARC’s review of glyphosate.[3] The resulting IARC classification of glyphosate has led to a litigation firestorm and an all-out assault on agricultural sustainability and productivity.[4]

The anomaly of the IARC’s glyphosate classification has been noted by scientists as well. Dr. Geoffrey Kabat is a cancer epidemiologist, who has written perceptively on the misunderstandings and distortions of cancer risk assessments in various settings.[5] He has previously written about glyphosate in Forbes and elsewhere, but recently he has written an important essay on glyphosate in Issues in Science and Technology, which is published by the National Academies of Sciences, Engineering, and Medicine and Arizona State University. In his essay, Dr. Kabat details how the IARC’s evaluation of glyphosate is an outlier in the scientific and regulatory world, and is not well supported by the available evidence.[6]

The problems with the IARC are both substantive and procedural.[7] One of the key problems that face IARC evaluations is an incoherent classification scheme. IARC evaluations classify putative human carcinogenic risks into five categories: Group 1 (known), Group 2A (probably), Group 2B (possibly), Group 3 (unclassifiable), and Group 4 (probably not). Group 4 is virtually an empty set, with only one substance, caprolactam ((CH2)5C(O)NH), an organic compound used in the manufacture of nylon.

In the IARC evaluation at issue, glyphosate was placed into Group 2A, which would seem to satisfy the legal system’s requirement that an exposure more likely than not causes the harm in question. Appearances and word usage, however, can be deceiving. Probability is a continuous scale from zero to one. In Bayesian decision making, zero and one are unavailable because if either were our starting point, no amount of evidence could ever change our judgment of the probability of causation (Cromwell’s Rule). The IARC informs us that its use of “probably” is quite idiosyncratic; the probability that a Group 2A agent causes cancer has “no quantitative” meaning. All the IARC intends is that a Group 2A classification “signifies a greater strength of evidence than possibly carcinogenic.”[8]

In other words, Group 2A classifications are consistent with having posterior probabilities of less than 0.5 (or 50 percent). A working group could judge the probability that a substance or process is carcinogenic to humans to be greater than zero, but no more than five or ten percent, and still vote for a 2A classification, in keeping with the IARC Preamble. This low probability threshold for a 2A classification converts the judgment of “probably carcinogenic” into a precautionary prescription, rendered when the most probable assessment is either ignorance or lack of causality. There is thus a practical certainty, close to 100%, that a 2A classification will confuse judges and juries, as well as the scientific community.
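
A minimal sketch of the underlying arithmetic, using Bayes’ theorem in odds form with hypothetical priors and a hypothetical likelihood ratio, illustrates both points: degenerate priors of zero or one never move, and a low prior updated by moderately supportive evidence can remain far short of the “more likely than not” threshold:

```python
# Hypothetical priors and likelihood ratio; Bayes' theorem in odds form.
def posterior(prior, likelihood_ratio):
    """Posterior probability of causation from a prior and a likelihood ratio."""
    if prior in (0.0, 1.0):        # Cromwell's Rule: degenerate priors never move
        return prior
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

for prior in (0.0, 0.05, 0.10, 0.50, 1.0):
    print(f"prior = {prior:.2f} -> posterior = {posterior(prior, likelihood_ratio=3.0):.2f}")
# A 5% or 10% prior, updated by moderately supportive evidence (LR = 3), stays
# well short of the 0.5 ("more likely than not") threshold; priors of 0 and 1
# never move at all.
```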

In IARC-speak, a 2A “probability” connotes “sufficient evidence” in experimental animals, and “limited evidence” in humans. A substance can receive a 2A classification even when the sufficient evidence of carcinogenicity occurs in only one non-human animal species, and other animal species fail to show carcinogenicity. A 2A classification can raise the thorny question in court whether a claimant is more like a rat or a mouse.

Similarly, “limited evidence” in humans can be based upon inconsistent observational studies that fail to measure and adjust for known and potential confounding risk factors and systematic biases. The 2A classification requires little substantively or semantically, and many 2A classifications leave juries and judges to determine whether a chemical or medication caused a human being’s cancer, when the basic predicates for Sir Austin Bradford Hill’s factors for causal judgment have not been met.[9]

In courtrooms, IARC 2A classifications should be excluded as legally irrelevant, under Rule 403. Even if a 2A IARC classification were a credible judgment of causation, admitting evidence of the classification would be “substantially outweighed by a danger of … unfair prejudice, confusing the issues, [and] misleading the jury….”[10]

The IARC may be lost in the weeds, but there is no need to fret. A little Round Up™ will help.


[1]  See John Higginson, “The International Agency for Research on Cancer: A Brief History of Its History, Mission, and Program,” 43 Toxicological Sci. 79 (1998).

[2]  Sara Randazzo & Jacob Bunge, “Inside the Mass-Tort Machine That Powers Thousands of Roundup Lawsuits,” Wall St. J. (Nov. 25, 2019).

[3]  David Zaruk, “The Corruption of IARC,” Risk Monger (Aug. 24, 2019); David Zaruk, “Greed, Lies and Glyphosate: The Portier Papers,” Risk Monger (Oct. 13, 2017).

[4]  Ted Williams, “Roundup Hysteria,” Slate Magazine (Oct. 14, 2019).

[5]  See, e.g., Geoffrey Kabat, Hyping Health Risks: Environmental Hazards in Everyday Life and the Science of Epidemiology (2008); Geoffrey Kabat, Getting Risk Right: Understanding the Science of Elusive Health Risks (2016).

[6]  Geoffrey Kabat, “Who’s Afraid of Roundup?” 36 Issues in Science and Technology (Fall 2019).

[7]  See Schachtman, “Infante-lizing the IARC” (May 13, 2018); “The IARC Process is Broken” (May 4, 2016). See also Eric Lasker and John Kalas, “Engaging with International Carcinogen Evaluations,” Law360 (Nov. 14, 2019).

[8]  “IARC Preamble to the IARC Monographs on the Identification of Carcinogenic Hazards to Humans,” at Sec. B.5., p.31 (Jan. 2019); See alsoIARC Advisory Group Report on Preamble” (Sept. 2019).

[9]  See Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295 (1965) (noting that only when “[o]ur observations reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance,” do we move on to consider the nine articulated factors for determining whether an association is causal).

[10]  Fed. R. Evid. 403.

 

Does the California State Bar Discriminate Unlawfully?

November 24th, 2019

Earlier this month, various news outlets announced a finding in a California study that black male attorneys are three times more likely to be disciplined by the State Bar than their white male counterparts.[1] Some of the news accounts treated the study findings as conclusions that the Bar had engaged in race discrimination. One particularly irresponsible website proclaimed that “bar discipline is totally racist.”[2] Indeed, the California State Bar itself apparently plans to hire consulting experts to help it achieve “bias-free decision-making and processes,” to eliminate “unintended bias,” and to consider how, if at all, to weigh prior complaints in the disciplinary procedure.[3]

The California Bar’s report was prepared by a social scientist, George Farkas, of the School of Education at the University of California, Irvine. Based upon data from attorneys admitted to the California bar between 1990 and 2008, Professor Farkas reported crude prevalence rates of discipline, probation, disbarment, or resignation, by race.[4] The disbarment/resignation rate for black male lawyers was 3.9%, whereas the rate for white male lawyers was 1%. Disparities, however, are not unlawful discrimination.

The disbarment/resignation rate for black female lawyers was 0.9%, but no one has suggested that there is implicit bias in favor of black women over both black and white male lawyers. White women were twice as likely as Asian women to resign, be placed on probation, or be disbarred (0.4% versus 0.2%).

The ABA’s coverage sheepishly admitted that “[d]ifferences could be explained by the number of complaints received about an attorney, the number of investigations opened, the percentage of investigations in which a lawyer was not represented by counsel, and previous discipline history.”[5]

Farkas’s report of October 31, 2019, was transmitted to the Bar’s Board of Trustees on November 14th.[6] As anyone familiar with discrimination law would have expected, Professor Farkas conducted multiple regression analyses that adjusted for the number of previous complaints filed against the errant lawyer, and whether the lawyer was represented by counsel before the Bar. The full analyses showed that these other important variables, not race, did – not merely could – explain the variability in discipline rates:

“Statistically, these variables explained all of the differences in probation and disbarment rates by race/ethnicity. Among all variables included in the final analysis, prior discipline history was found to have the strongest effects [sic] on discipline outcomes, followed by the proportion of investigations in which the attorney under investigation was represented by counsel, and the number of investigations.”[7]

The number of previous complaints against a particular lawyer surely has a role in considering whether a miscreant lawyer should be placed on probation, or subjected to disbarment. And without further refinement of the analysis, and irrespective of race or ethnicity, failure to retain counsel for disciplinary hearings may correlate strongly with the futility of any defense.
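
A toy illustration, with made-up numbers rather than the Farkas data, shows how a covariate such as prior complaint history can account for an entire crude disparity between two groups:

```python
# Made-up numbers: discipline risk depends only on prior complaint history,
# yet the two groups' crude rates differ because their histories differ.
strata = {
    # stratum: (share of group X, share of group Y, P(discipline) in stratum)
    "no prior complaints":  (0.70, 0.90, 0.005),
    "1-2 prior complaints": (0.20, 0.08, 0.030),
    "3+ prior complaints":  (0.10, 0.02, 0.150),
}
crude_x = sum(wx * p for wx, wy, p in strata.values())
crude_y = sum(wy * p for wx, wy, p in strata.values())
print(f"crude discipline rate, group X: {crude_x:.2%}")   # about 2.5%
print(f"crude discipline rate, group Y: {crude_y:.2%}")   # about 1.0%
# Within every stratum the probability of discipline is identical for both
# groups; a regression adjusting for prior complaints would show no residual
# difference attributable to group membership.
```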

Curiously, the Farkas report did not take into account the race or ethnicity of the complainants before the Bar’s disciplinary committee. The Farkas report seems reasonable as far as it goes, but the wild conclusions drawn in the media would not pass Rule 702 gatekeeping.


[1]  See, e.g., Emma Cueto, “Black Male Attorneys Disciplined More Often, California Study Finds,” Law360 (Nov. 18, 2019); Debra Cassens Weiss, “New California bar study finds racial disparities in lawyer discipline,” Am. Bar Ass’n J. (Nov. 18, 2019).

[2]  Joe Patrice, “Study Finds That Bar Discipline Is Totally Racist Shocking Absolutely No One: Black male attorneys are more likely to be disciplined than white attorneys,” Above the Law (Nov. 19, 2019).

[3]  Debra Cassens Weiss, “New California bar study finds racial disparities in lawyer discipline,” Am. Bar Ass’n J. (Nov. 18, 2019).

[4]  George Farkas, “Discrepancies by Race and Gender in Attorney Discipline by the State Bar of California: An Empirical Analysis” (Oct. 31, 2019).

[5]  Debra Cassens Weiss, supra at note 3.

[6]  Dag MacLeod (Chief of Mission Advancement & Accountability Division) & Ron Pi (Principal Analyst, Office of Research & Institutional Accountability), Report on Disparities in the Discipline System (Nov. 14, 2019).

[7] Dag MacLeod & Pi, Report on Disparities in the Discipline System at 4 (Nov. 14, 2019) (emphasis added).

Everything She Just Said Was Bullshit

September 26th, 2019

At this point, most products liability lawyers have read about the New Jersey verdicts returned earlier this month against Johnson & Johnson in four mesothelioma cases.[1] The Middlesex County jury found that the defendant’s talc and its supposed asbestos impurities were a cause of all four mesothelioma cases, and awarded compensatory damages of $37.3 million.[2]

Johnson & Johnson was prejudiced by having to try four cases questionably consolidated together, and then hobbled by having its affirmative defense evidence stricken, and finally crucified when the trial judge instructed the jury at the end of the defense lawyer’s closing argument: “everything she just said was bullshit.”

Judge Ana C. Viscomi, who presided over the trial, struck the entire summation of defense lawyer Diane Sullivan. The action effectively deprived Johnson & Johnson of a defense, as can be seen from the verdicts. Judge Viscomi’s egregious ruling was given without explaining which parts of Sullivan’s closing were objectionable, and without giving Sullivan an opportunity to argue against the sanction.

During the course of Sullivan’s closing argument, Judge Viscomi criticized Sullivan for calling the plaintiffs’ lawyers “sinister,” and suggested that her argument was defaming the legal profession in violation of the Rules of Professional Conduct.[3] Sullivan did use the word “sinister” several times, but in each instance, she referred to the plaintiffs’ arguments, allegations, and innuendo about Johnson & Johnson’s actions. Judge Viscomi curiously imputed unprofessional conduct to Sullivan for referring to plaintiffs’ counsel’s “shows and props,” treating the reference as a suggestion that plaintiffs’ counsel had fabricated evidence.

Striking an entire closing argument is, as far as anyone has determined, unprecedented. Of course, Judge Haller is fondly remembered for having stricken the entirety of Vinny Gambini’s opening statement, but the good judge did allow Vinny’s “thank you” to stand:

Vinny Gambini: “Yeah, everything that guy just said is bullshit… Thank you.”

D.A. Jim Trotter: “Objection. Counsel’s entire opening statement is argument.”

Judge Chamberlain Haller: “Sustained. Counselor’s entire opening statement, with the exception of ‘Thank you’ will be stricken from the record.”

My Cousin Vinny (1992).

In the real world of a New Jersey courtroom, even Ms. Sullivan’s expression of gratitude for the jury’s attention and service succumbed to Judge Viscomi’s unprecedented ruling,[4] as did almost 40 pages of argument in which Sullivan carefully debunked and challenged the opinion testimony of plaintiffs’ highly paid expert witnesses. The trial court’s ruling undermined the defense’s detailed rebuttal of plaintiffs’ evidence, as well as the defense’s comment upon the plaintiffs’ witnesses’ lack of credibility.

Judge Viscomi’s sua sponte ruling appears even more curious given what took place in the aftermath of her instructing the jury to disregard Sullivan’s argument. First, the trial court gave very disparate treatment to plaintiffs’ counsel. The lawyers for the plaintiffs gave extensive closing arguments that were replete with assertions that Johnson & Johnson and Ms. Sullivan were liars, predators, manipulators, poisoners, baby killers, and then some. Sullivan’s objections were perfunctorily overruled. Second, Judge Viscomi permitted plaintiffs’ counsel to comment extensively upon Ms. Sullivan’s closing, even though it had been stricken. Third, despite the judicial admonition about the Rules of Professional Conduct, neither the trial judge nor plaintiffs’ counsel appear to have filed a disciplinary complaint against Ms. Sullivan. Of course, if Judge Viscomi or the plaintiffs’ counsel thought that Ms. Sullivan had violated the Rules, then they would be obligated to report Ms. Sullivan for misconduct.

Bottom line: these verdicts are unsafe.


[1]  The cases were tried in a questionable consolidation in the New Jersey Superior Court, for Middlesex County, before Judge Viscomi. Barden v. Brenntag North America, No. L-1809-17; Etheridge v. Brenntag North America, No. L-932-17; McNeill-George v. Brenntag North America, No. L-7049-16; and Ronning v. Brenntag North America, No. L-6040-17.

[2]  Bill Wichert, “J&J Hit With $37.3M Verdict In NJ Talc Case,” Law360 (Sept. 11, 2019).

[3]  Amanda Bronstad, “J&J Moves for Talc Mistrial After Judge Strikes Entire Closing Argument,” N.J.L.J. (Sept. 10, 2019) (describing Judge Viscomi as having admonished Ms. Sullivan to “[s]top denigrating the lawyers”; J&J’s motion for mistrial was made before the case was submitted to the jury).

[4]  See Peder B. Hong, “Summation at the Border: Serious Misconduct in Final Argument in Civil Trials,” 19 Hamline L. Rev. 179 (1995); Ty Tasker, “Stick and Stones: Judicial Handling of Invective in Advocacy,” 42 Judges J. 17 (2003); Janelle L. Davis, “Sticks and Stones May Break My Bones, But Names Could Get Me a Mistrial: An Examination of Name-Calling in Closing Argument in Civil Cases,” 42 Gonzaga L. Rev. 133 (2011).

Palavering About P-Values

August 17th, 2019

The American Statistical Association’s most recent confused and confusing communication about statistical significance testing has given rise to great mischief in the world of science and science publishing.[1] Take for instance last week’s opinion piece about “Is It Time to Ban the P Value?” Please.

Helena Chmura Kraemer is an accomplished professor of statistics at Stanford University. This week the Journal of the American Medical Association network flagged Professor Kraemer’s opinion piece on p-values as one of its most read articles. Kraemer’s eye-catching title creates the impression that the p-value is unnecessary and inimical to valid inference.[2]

Remarkably, Kraemer’s article commits the very mistake that the ASA set out to correct back in 2016,[3] by conflating the probability of the data under a hypothesis of no association with the probability of a hypothesis given the data:

“If P value is less than .05, that indicates that the study evidence was good enough to support that hypothesis beyond reasonable doubt, in cases in which the P value .05 reflects the current consensus standard for what is reasonable.”

The ASA tried to break the bad habit of scientists’ interpreting p-values as allowing us to assign posterior probabilities, such as beyond a reasonable doubt, to hypotheses, but obviously to no avail.

Kraemer also ignores the ASA 2016 Statement’s teaching about what the p-value is not and cannot do, by claiming that p-values are determined by sources of non-random error, such as:

“the reliability and sensitivity of the measures used, the quality of the design and analytic procedures, the fidelity to the research protocol, and in general, the quality of the research.”

Kraemer provides errant advice and counsel by insisting that “[a] non-significant result indicates that the study has failed, not that the hypothesis has failed.” If the p-value is the probability of observing an association at least as large as the one obtained, given an assumed null hypothesis, then of course a large p-value cannot speak to the failure of the hypothesis, but why declare that the study has failed? The study was perhaps indeterminate, but it still yielded information that can be combined with other data, or help guide future studies.
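
Indeed, a study that misses the 5% threshold still contributes an estimate and a measure of its precision. A minimal sketch, with hypothetical numbers, of fixed-effect, inverse-variance pooling shows how two individually “non-significant” studies can be combined into a more precise estimate:

```python
# Hypothetical relative risks and standard errors (log scale); each study,
# taken alone, has p > 0.05.
from math import erfc, exp, log, sqrt

studies = [(1.40, 0.24), (1.35, 0.21)]            # (relative risk, SE of log RR)
weights = [1 / se ** 2 for _, se in studies]
pooled_log_rr = sum(w * log(rr) for (rr, _), w in zip(studies, weights)) / sum(weights)
pooled_se = sqrt(1 / sum(weights))
z = pooled_log_rr / pooled_se
print("pooled RR:", round(exp(pooled_log_rr), 2),
      "95% CI:", (round(exp(pooled_log_rr - 1.96 * pooled_se), 2),
                  round(exp(pooled_log_rr + 1.96 * pooled_se), 2)),
      "two-sided p:", round(erfc(abs(z) / sqrt(2)), 3))
# Each study alone misses the 5% threshold, yet together they yield a more
# precise estimate -- the "failed" studies were informative all along.
```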

Perhaps in her most misleading advice, Kraemer asserts that:

“[w]hether P values are banned matters little. All readers (reviewers, patients, clinicians, policy makers, and researchers) can just ignore P values and focus on the quality of research studies and effect sizes to guide decision-making.”

Really? If a high quality study finds an “effect size” of interest, we can now ignore random error?

The ASA 2016 Statement, with its “six principles,” has provoked some deliberate or ill-informed distortions in American judicial proceedings, but Kraemer’s editorial creates idiosyncratic meanings for p-values. Even the 2019 ASA “post-modernism” does not advocate ignoring random error and p-values, as opposed to proscribing dichotomous characterization of results as “statistically significant,” or not.[4] The current author guidelines for articles submitted to the Journals of the American Medical Association clearly reject this new-fangled rejection of the need to assess the role of random error.[5]


[1]  See Ronald L. Wasserstein, Allen L. Schirm, and Nicole A. Lazar, “Editorial: Moving to a World Beyond ‘p < 0.05’,” 73 Am. Statistician S1, S2 (2019).

[2]  Helena Chmura Kraemer, “Is It Time to Ban the P Value?J. Am. Med. Ass’n Psych. (August 7, 2019), in-press at doi:10.1001/jamapsychiatry.2019.1965.

[3]  Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” 70 The American Statistician 129 (2016).

[4]  “Has the American Statistical Association Gone Post-Modern?” (May 24, 2019).

[5]  See instructions for authors at https://jamanetwork.com/journals/jama/pages/instructions-for-authors

Mass Torts Made Less Bad – The Zambelli-Weiner Affair in the Zofran MDL

July 30th, 2019

Judge Saylor, who presides over the Zofran MDL, handed down his opinion on the Zambelli-Weiner affair, on July 25, 2019.[1] As discussed on these pages back in April of this year,[2] GlaxoSmithKline (GSK), the defendant in the Zofran birth defects litigation, sought documents from plaintiffs and Dr Zambelli-Weiner (ZW) about her published study on Zofran and birth defects.[3] Plaintiffs refused to respond to the discovery on grounds of attorney work product,[4] and of consulting expert witness confidential communications.[5] After an abstract of ZW’s study appeared in print, GSK subpoenaed ZW and her co-author, Dr. Russell Kirby, for a deposition and for production of documents.

Plaintiffs’ counsel sought a protective order. Their opposition relied upon a characterization of ZW as a research scientist; they conveniently omitted their retention of her as a paid expert witness. In December 2018, the MDL court denied plaintiffs’ motion for a protective order, and allowed the deposition to go forward to explore the financial relationship between counsel and ZW.

In January 2019, when GSK served ZW with its subpoena duces tecum, ZW, through her own counsel, moved for a protective order, supported by her affidavit, which asserted facts meant to show that she was not subject to the deposition. The MDL court quickly denied her motion, and in short order, her lawyer notified the court that ZW’s affidavit contained “factual misrepresentations,” which she refused to correct, and he sought leave to withdraw.

According to the MDL court, the ZW affidavit contained three falsehoods. She claimed not to have been retained by any party when she had been a paid consultant to plaintiffs at times over the previous five years, since December 2014. ZW claimed that she had no factual information about the litigation, when in fact she had participated in a Las Vegas plaintiffs’ lawyers’ conference, “Mass Torts Made Perfect,” in October 2015. Furthermore, ZW falsely claimed that monies received from plaintiffs’ law firms did not go to fund the Zofran study, but went to her company, Translational Technologies International Health Research & Economics, for unrelated work. ZW received in excess of $200,000 for her work on the Zofran study.

After ZW obtained new counsel, she gave deposition testimony in February 2019, when she acknowledged the receipt of money for the study, and the lengthy relationship with plaintiffs’ counsel. Armed with this information, GSK moved for full responses to its document requests. Again, plaintiffs’ counsel and ZW resisted on grounds of confidentiality and privilege.

Judge Saylor reviewed the requested documents in camera, and held last week that they were not protected by consulting expert witness privilege or by attorney work product confidentiality. ZW’s materials and communications in connection with the Las Vegas plaintiffs’ conference never had the protection of privilege or confidentiality. ZW presented at a “quasi-public” conference attended by lawyers who had no connection to the Zofran litigation.[6]

With respect to work product claims, Judge Saylor found that GSK had shown “exceptional circumstances” and “substantial need” for the requested materials given that the plaintiffs’ testifying expert witnesses had relied upon the ZW study, which had been covertly financially supported by plaintiffs’ lawyers.[7] With respect to whatever was thinly claimed to be privileged and confidential, Judge Saylor found the whole arrangement to fail the smell test:[8]

“It is troublesome, to say the least, for a party to engage a consulting, non-testifying expert; pay for that individual to conduct and publish a study, or otherwise affect or influence the study; engage a testifying expert who relies upon the study; and then cloak the details of the arrangement with the consulting expert in the confidentiality protections of Rule 26(b) in order to conceal it from a party opponent and the Court. The Court can see no valid reason to permit such an arrangement to avoid the light of discovery and the adversarial process. Under the circumstances, GSK has made a showing of substantial need and an inability to obtain these documents by other means without undue hardship.

Furthermore, in this case, the consulting expert made false statements to the Court as to the nature of her relationship with plaintiffs’ counsel. The Court would not have been made aware of those falsehoods but for the fact that her attorney became aware of the issue and sought to withdraw. Certainly plaintiffs’ counsel did nothing at the time to correct the false impressions created by the affidavit. At a minimum, the submission of those falsehoods effectively waived whatever protections might otherwise apply. The need to discover the truth and correct the record surely outweighs any countervailing policy in favor of secrecy, particularly where plaintiffs’ testifying experts have relied heavily on Dr. Zambelli-Weiner’s study as a basis for their causation opinions. In order to effectively cross-examine plaintiffs’ experts about those opinions at trial, GSK is entitled to review the documents. At a minimum, the documents shed additional light on the nature of the relationship between Dr. Zambelli-Weiner and plaintiffs’ counsel, and go directly to the credibility of Dr. Zambelli-Weiner and the reliability of her study results.”

It remains to be seen whether Judge Saylor will refer the matter of ZW’s false statements in her affidavit to the U.S. Attorney’s office, or the lawyers’ complicity in perpetuating these falsehoods to disciplinary boards.

Mass torts will never be perfect, or even very good. Judge Saylor, however, has managed to make the Zofran litigation a little less bad.


[1]  Memorandum and order on In Camera Production of Documents Concerning Dr. April Zambelli-Weiner, In re Zofran Prods. Liab. Litig., MDL 2657, D.Mass. (July 25, 2019) [cited as Mem.].

[2]  NAS, “Litigation Science – In re Zambelli-Weiner” (April 8, 2019).

[3]  April Zambelli-Weiner, et al., “First Trimester Ondansetron Exposure and Risk of Structural Birth Defects,” 83 Reproductive Toxicol. 14 (2019).

[4]  Fed. R. Civ. P. 26(b)(3).

[5]  Fed. R. Civ. P. 26(b)(4)(D).

[6]  Mem. at 7-9.

[7]  Mem. at 9.

[8]  Mem. at 9-10.

Statistical Significance at the New England Journal of Medicine

July 19th, 2019

Some wild stuff has been going on in the world of statistics, at the American Statistical Association, and elsewhere. A very few obscure journals have declared p-values to be verboten, and presumably confidence intervals as well. The world of biomedical research has generally reacted more sanely, with authors defending the existing frequentist approaches and standards.[1]

This week, the editors of the New England Journal of Medicine have issued new statistical guidelines for authors. The Journal’s approach seems appropriately careful and conservative for the world of biomedical research. In an editorial introducing the new guidelines,[2] the Journal editors remind their potential authors that statistical significance and p-values are here to stay:

“Despite the difficulties they pose, P values continue to have an important role in medical research, and we do not believe that P values and significance tests should be eliminated altogether. A well-designed randomized or observational study will have a primary hypothesis and a prespecified method of analysis, and the significance level from that analysis is a reliable indicator of the extent to which the observed data contradict a null hypothesis of no association between an intervention or an exposure and a response. Clinicians and regulatory agencies must make decisions about which treatment to use or to allow to be marketed, and P values interpreted by reliably calculated thresholds subjected to appropriate adjustments have a role in those decisions.”[3]

The Journal’s editors described their revamped statistical policy as being based upon three premises:

(1) adhering to prespecified analysis plans if they exist;

(2) declaring associations or effects only for statistical analyses that have pre-specified “a method for controlling type I error”; and

(3) presenting evidence about clinical benefits or harms requires “both point estimates and their margins of error.”

With a hat tip to the ASA’s recent pronouncements on statistical significance,[4] the editors suggest that their new guidelines have moved away from using statistical significance “as a bright-line marker for a conclusion or a claim”[5]:

“[T]he notion that a treatment is effective for a particular outcome if P < 0.05 and ineffective if that threshold is not reached is a reductionist view of medicine that does not always reflect reality.”[6]

The editors’ language intimates greater latitude for authors in claiming associations or effects from their studies, but this latitude may well be circumscribed by tighter control over such claims in the inevitable context of multiple testing within a dataset.

The editors’ introduction of the new guidelines is not entirely coherent. The introductory editorial notes that the use of p-values for reporting multiple outcomes, without adjustments for multiplicity, inflates the number of findings with p-values less than 5%. The editors thus caution against “uncritical interpretation of multiple inferences,” which can be particularly threatening to valid inference when not all the comparisons conducted by the study investigators have been reported in their manuscript.[7] They reassuringly tell prospective authors that many methods are available to adjust for multiple comparisons, and can be used to control Type I error probability “when specified in the design of a study.”[8]
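
For readers who want to see what such an adjustment looks like in practice, here is a minimal sketch of one common method, the Holm step-down procedure, applied to hypothetical p-values from five endpoints; it is offered only as an illustration of the kind of adjustment the editors describe, not as the Journal’s prescribed method:

```python
# Hypothetical p-values from five secondary endpoints; Holm's step-down
# adjustment controls the family-wise (type I) error rate.
def holm_adjust(pvals):
    """Return Holm-adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

raw = [0.004, 0.012, 0.020, 0.041, 0.300]
print(holm_adjust(raw))   # roughly [0.02, 0.048, 0.06, 0.082, 0.30];
                          # only the first two stay below 0.05
```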

But what happens when such adjustment methods are not pre-specified in the study design? Failure to do so does not appear to be a disqualifying factor for publication in the Journal. For one thing, when the statistical analysis plan of the study has not specified adjustment methods for controlling type I error probabilities, then authors must replace p-values with “estimates of effects or association and 95% confidence intervals.”[9] It is hard to understand how this edict helps when the specified coefficient of 95% is a continuation of the 5% alpha, which would have been used in any event. The editors seem to be saying that if authors fail to pre-specify or even post-specify methods for controlling error probabilities, then they cannot declare statistical significance, or use p-values, but they can use confidence intervals in the same way they have been using them, and with the same misleading interpretations supplied by their readers.
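
The duality is easy to demonstrate. Under the usual normal-theory approximations, and with hypothetical estimates, a 95% confidence interval excludes the null value exactly when the two-sided p-value falls below 0.05, so reporting the interval carries the same dichotomous information:

```python
# Hypothetical estimates (log relative risks) and standard errors; normal theory.
from math import erfc, sqrt

for est, se in [(0.50, 0.22), (0.30, 0.20)]:
    p = erfc(abs(est / se) / sqrt(2))              # two-sided p-value
    lo, hi = est - 1.96 * se, est + 1.96 * se      # 95% confidence interval
    excludes_null = lo > 0 or hi < 0
    print(f"p = {p:.3f}; 95% CI = ({lo:.2f}, {hi:.2f}); excludes null: {excludes_null}")
# p < 0.05 exactly when the 95% interval excludes the null value of 0, and
# vice versa -- the interval encodes the same 5% threshold.
```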

More important, another price authors will have to pay for multiple testing without pre-specified methods of adjustment is that they will affirmatively have to announce their failure to adjust for multiplicity and that their putative associations “may not be reproducible.” Tepid as this concession is, it is better than previous practice, and perhaps it will become a badge of shame. The crucial question is whether judges, in exercising their gatekeeping responsibilities, will see these acknowledgements as disabling valid inferences from studies that carry this mandatory warning label.

The editors have not issued guidelines for the use of Bayesian statistical analyses, because “the large majority” of author manuscripts use only frequentist analyses.[10] The editors inform us that “[w]hen appropriate,” they will expand their guidelines to address Bayesian and other designs. Perhaps this expansion will be appropriate when Bayesian analysts establish a track record of abuse in their claiming of associations and effects.

The new guidelines themselves are not easy to find. The Journal has not published these guidelines as an article in its published issues, but has relegated them to a subsection of its website’s instructions to authors for new manuscripts:

https://www.nejm.org/author-center/new-manuscripts

Presumably, the actual author instructions control in any perceived discrepancy between this week’s editorial and the guidelines themselves. Authors are told that p-values generally should be two-sided. On significance tests, the guidelines instruct:

“Significance tests should be accompanied by confidence intervals for estimated effect sizes, measures of association, or other parameters of interest. The confidence intervals should be adjusted to match any adjustment made to significance levels in the corresponding test.”

Similarly, the guidelines call for, but do not require, pre-specified methods of controlling family-wise error rates for multiple comparisons. For observational studies submitted without pre-specified methods of error control, the guidelines recommend the use of point estimates and 95% confidence intervals, with an explanation that the interval widths have not been adjusted for multiplicity, and a caveat that the inferences from these findings may not be reproducible. The guidelines recommend against using p-values for such results, but again, it is difficult to see why reporting 95% confidence intervals is recommended when p-values are not.


[1]  Jonathan A. Cook, Dean A. Fergusson, Ian Ford, Mithat Gonen, Jonathan Kimmelman, Edward L. Korn, and Colin B. Begg, “There is still a place for significance testing in clinical trials,” 16 Clin. Trials 223 (2019).

[2]  David Harrington, Ralph B. D’Agostino, Sr., Constantine Gatsonis, Joseph W. Hogan, David J. Hunter, Sharon-Lise T. Normand, Jeffrey M. Drazen, and Mary Beth Hamel, “New Guidelines for Statistical Reporting in the Journal,” 381 New Engl. J. Med. 285 (2019).

[3]  Id. at 286.

[4]  See id. (“Journal editors and statistical consultants have become increasingly concerned about the overuse and misinterpretation of significance testing and P values in the medical literature. Along with their strengths, P values are subject to inherent weaknesses, as summarized in recent publications from the American Statistical Association.”) (citing Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s statement on p-values: context, process, and purpose,” 70 Am. Stat. 129 (2016); Ronald L. Wasserstein, Allen L. Schirm, and Nicole A. Lazar, “Moving to a world beyond ‘p < 0.05’,” 73 Am. Stat. s1 (2019)).

[5]  Id. at 285.

[6]  Id. at 285-86.

[7]  Id. at 285.

[8]  Id., citing Alex Dmitrienko, Frank Bretz, Ajit C. Tamhane, Multiple testing problems in pharmaceutical statistics (2009); Alex Dmitrienko & Ralph B. D’Agostino, Sr., “Multiplicity considerations in clinical trials,” 378 New Engl. J. Med. 2115 (2018).

[9]  Id.

[10]  Id. at 286.

Science Bench Book for Judges

July 13th, 2019

On July 1st of this year, the National Judicial College and the Justice Speakers Institute, LLC released an online publication of the Science Bench Book for Judges [Bench Book]. The Bench Book sets out to cover much of the substantive material already covered by the Federal Judicial Center’s Reference Manual:

Acknowledgments

Table of Contents

  1. Introduction: Why This Bench Book?
  2. What is Science?
  3. Scientific Evidence
  4. Introduction to Research Terminology and Concepts
  5. Pre-Trial Civil
  6. Pre-trial Criminal
  7. Trial
  8. Juvenile Court
  9. The Expert Witness
  10. Evidence-Based Sentencing
  11. Post Sentencing Supervision
  12. Civil Post Trial Proceedings
  13. Conclusion: Judges—The Gatekeepers of Scientific Evidence

Appendix 1 – Frye/Daubert—State-by-State

Appendix 2 – Sample Orders for Criminal Discovery

Appendix 3 – Biographies

The Bench Book gives some good advice in very general terms about the need to consider study validity,[1] and to approach scientific evidence with care and “healthy skepticism.”[2] When the Bench Book attempts to instruct on what it represents as the scientific method of hypothesis testing, the good advice unravels:

“A scientific hypothesis simply cannot be proved. Statisticians attempt to solve this dilemma by adopting an alternate [sic] hypothesis – the null hypothesis. The null hypothesis is the opposite of the scientific hypothesis. It assumes that the scientific hypothesis is not true. The researcher conducts a statistical analysis of the study data to see if the null hypothesis can be rejected. If the null hypothesis is found to be untrue, the data support the scientific hypothesis as true.”[3]

Even in experimental settings, a statistical analysis of the data does not lead to a conclusion that the null hypothesis is untrue, as opposed to not reasonably compatible with the study’s data. In observational studies, the statistical analysis must acknowledge whether and to what extent the study has excluded bias and confounding. When the Bench Book turns to speak of statistical significance, more trouble ensues:

“The goal of an experiment, or observational study, is to achieve results that are statistically significant; that is, not occurring by chance.”[4]

In the world of result-oriented science, and scientific advocacy, it is perhaps true that scientists seek to achieve statistically significant results. Still, it seems crass to come right out and say so, as opposed to saying that the scientists are querying the data to see whether they are compatible with the null hypothesis. This first pass at statistical significance is only mildly astray compared with the Bench Book’s more serious attempts to define statistical significance and confidence intervals:

4.10 Statistical Significance

“The research field agrees that study outcomes must demonstrate they are not the result of random chance. Leaving room for an error of .05, the study must achieve a 95% level of confidence that the results were the product of the study. This is denoted as p ≤ 05. (or .01 or .1).”[5]

and

“The confidence interval is also a way to gauge the reliability of an estimate. The confidence interval predicts the parameters within which a sample value will fall. It looks at the distance from the mean a value will fall, and is measured by using standard deviations. For example, if all values fall within 2 standard deviations from the mean, about 95% of the values will be within that range.”[6]

Of course, the interval speaks to the precision of the estimate, not its reliability, but that is a small point. These definitions are virtually guaranteed to confuse judges into conflating statistical significance and the coefficient of confidence with the legal burden of proof probability.
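
What the 95% actually describes is the long-run performance of the interval-generating procedure, not the probability that any particular interval contains the true value, and certainly not a burden-of-proof probability. A minimal simulation, with hypothetical normal data, illustrates the coverage interpretation:

```python
# Hypothetical normal data with known sigma; the 95% refers to how often the
# procedure's intervals cover the true mean over many repetitions.
import random
from math import sqrt

random.seed(1)
true_mean, sigma, n, trials = 10.0, 4.0, 25, 10_000
covered = 0
for _ in range(trials):
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    sample_mean = sum(sample) / n
    half_width = 1.96 * sigma / sqrt(n)            # known-sigma interval, for simplicity
    if sample_mean - half_width <= true_mean <= sample_mean + half_width:
        covered += 1
print(f"coverage over {trials:,} repetitions: {covered / trials:.3f}")   # about 0.95
```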

The Bench Book runs into problems in interpreting legal decisions, which would seem softer grist for the judicial mill. The authors present dictum from the Daubert decision as though it were a holding:[7]

“As noted in Daubert, ‘[t]he focus, of course, must be solely on principles and methodology, not on the conclusions they generate’.”

The authors fail to mention that this dictum was abandoned in Joiner, and that it is specifically rejected by statute, in the 2000 revision to Federal Rule of Evidence 702.

Early in the Bench Book, its authors present a subsection entitled “The Myth of Scientific Objectivity,” which they might have borrowed from Feyerabend or Derrida. The heading appears misleading because the text contradicts it:

“Scientists often develop emotional attachments to their work—it can be difficult to abandon an idea. Regardless of bias, the strongest intellectual argument, based on accepted scientific hypotheses, will always prevail, but the road to that conclusion may be fraught with scholarly cul-de-sacs.”[8]

In a similar vein, the authors misleadingly tell readers that “the forefront of science is rarely encountered in court,” and so “much of the science mentioned there shall be considered established….”[9] Of course, the reality is that many causal claims presented in court have already been rejected or held to be indeterminate by the scientific community. And just when readers may think themselves safe from the goblins of nihilism, the authors launch into a theory of naïve probabilism, in which science is just the placing of subjective probabilities upon data, based upon preconceived biases and beliefs:

“All of these biases and beliefs play into the process of weighing data, a critical aspect of science. Placing weight on a result is the process of assigning a probability to an outcome. Everything in the universe can be expressed in probabilities.”[10]

So help the expert witness who honestly (and correctly) testifies that the causal claim or its rejection cannot be expressed as a probability statement!

Although I have not read all of the Bench Book closely, there appears to be no meaningful discussion of Rule 703, or of the need to access underlying data to ensure that the proffered scientific opinion under scrutiny has used appropriate methodologies at every step in its development. Even a 412-page text cannot address every issue, but this one does little to help the judicial reader find more in-depth help on statistical and scientific methodological issues that arise in occupational and environmental disease claims, and in pharmaceutical products litigation.

The organizations involved in this Bench Book appear to be honest brokers of remedial education for judges. The writing of this Bench Book was funded by the State Justice Institute (SJI), which is a creation of federal legislation enacted with the laudatory goal of improving the quality of judging in state courts.[11] Despite its provenance in federal legislation, the SJI is a private, nonprofit corporation, governed by 11 directors appointed by the President, and confirmed by the Senate. A majority of the directors (six) are state court judges; the others are a state court administrator and four members of the public (no more than two from any one political party). The function of the SJI is to award grants to improve judging in state courts.

The National Judicial College (NJC) originated in the early 1960s, from the efforts of the American Bar Association, the American Judicature Society, and the Institute of Judicial Administration to provide education for judges. In 1977, the NJC became a Nevada not-for-profit 501(c)(3) educational corporation, with its campus at the University of Nevada, Reno, where judges could go for training and recreational activities.

The Justice Speakers Institute appears to be a for-profit company that provides educational resources for judges. A press release touts the Bench Book and follow-on webinars. Caveat emptor.

The rationale for this Bench Book is open to question. Unlike the Reference Manual on Scientific Evidence, which was co-produced by the Federal Judicial Center and the National Academies, the Bench Book’s authors are lawyers and judges, without any subject-matter expertise. Unlike the Reference Manual, the Bench Book’s chapters have no scientist or statistician authors, and it shows. Remarkably, the Bench Book does not appear to cite to the Reference Manual or the Manual on Complex Litigation, at any point in its discussion of the federal law of expert witnesses or of scientific or statistical method. Perhaps taxpayers would have been spared substantial expense if state judges were simply encouraged to read the Reference Manual.


[1]  Bench Book at 190.

[2]  Bench Book at 174 (“Given the large amount of statistical information contained in expert reports, as well as in the daily lives of the general society, the ability to be a competent consumer of scientific reports is challenging. Effective critical review of scientific information requires vigilance, and some healthy skepticism.”).

[3]  Bench Book at 137; see also id. at 162.

[4]  Bench Book at 148.

[5]  Bench Book at 160.

[6]  Bench Book at 152.

[7]  Bench Book at 233, quoting Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579, 595 (1993).

[8]  Bench Book at 10.

[9]  Id. at 10.

[10]  Id. at 10.

[11] See State Justice Institute Act of 1984 (42 U.S.C. ch. 113, 42 U.S.C. § 10701 et seq.).

The Shmeta-Analysis in Paoli

July 11th, 2019

In the Paoli Railroad yard litigation, plaintiffs claimed injuries and increased risk of future cancers from environmental exposure to polychlorinated biphenyls (PCBs). This massive litigation came before federal district judge Hon. Robert F. Kelly,[1] in the Eastern District of Pennsylvania, who may well have been the first judge to grapple with a litigation attempt to use meta-analysis to show a causal association.

One of the plaintiffs’ expert witnesses was the late William J. Nicholson, who was a professor at Mt. Sinai School of Medicine, and a colleague of Irving Selikoff. Nicholson was trained in physics, and had no professional training in epidemiology. Nonetheless, Nicholson was Selikoff’s go-to colleague for performing epidemiologic studies. After Selikoff withdrew from active testifying for plaintiffs in tort litigation, Nicholson was one of his colleagues who jumped into the fray as a surrogate advocate for Selikoff.[2]

For his opinion that PCBs were causally associated with liver cancer in humans,[3] Nicholson relied upon a report he wrote for the Ontario Ministry of Labor [cited here as “Report”].[4] Nicholson described his report as a “study of the data of all the PCB worker epidemiological studies that had been published,” from which he concluded that there was “substantial evidence for a causal association between excess risk of death from cancer of the liver, biliary tract, and gall bladder and exposure to PCBs.”[5]

The defense challenged the admissibility of Nicholson’s meta-analysis on several grounds. The trial court decided the challenge based upon the Downing case, which was the law in the Third Circuit before the Supreme Court decided Daubert.[6] The Downing case allowed some opportunity for consideration of reliability and validity concerns; there is, however, disappointingly little discussion of any actual validity concerns in the courts’ opinions.

The defense challenge to Nicholson’s proffered testimony on liver cancer turned on its characterization of meta-analysis as a “novel” technique, which is generally unreliable, and its claim that Nicholson’s meta-analysis in particular was unreliable. None of the individual studies that contributed data showed any “connection” between PCBs and liver cancer; nor did any individual study conclude that there was a causal association.

Of course, the appropriate response to this situation, with no one study finding a statistically significant association, or concluding that there was a causal association, should have been “so what?” One of the reasons to do a meta-analysis is precisely that no single available study is large enough to detect a statistically significant association, even if a real association is there; pooling the studies increases the statistical power of the analysis. As for drawing conclusions of causal associations, it is not the role or place of an individual study to synthesize all the available evidence into a principled conclusion of causation.
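
To make the power rationale concrete, here is a minimal fixed-effect (inverse-variance) meta-analysis sketch in Python, with invented log risk ratios and standard errors for three hypothetical studies; it is an illustration of the general point, not a reconstruction of Nicholson’s data. Each study, standing alone, is not statistically significant, yet the pooled estimate is.

import math

def two_sided_p(z):
    # two-sided p-value from a standard normal test statistic
    return math.erfc(abs(z) / math.sqrt(2))

# (log risk ratio, standard error) for three invented studies
studies = [(0.35, 0.25), (0.30, 0.28), (0.40, 0.30)]
for i, (lrr, se) in enumerate(studies, 1):
    print(f"study {i}: RR = {math.exp(lrr):.2f}, p = {two_sided_p(lrr / se):.2f}")

# fixed-effect (inverse-variance) pooling of the study-level estimates
weights = [1 / se ** 2 for _, se in studies]
pooled = sum(w * lrr for (lrr, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
print(f"pooled : RR = {math.exp(pooled):.2f}, p = {two_sided_p(pooled / pooled_se):.2f}")

With these invented numbers, the pooled two-sided p-value falls below 0.05 even though none of the constituent studies crosses that threshold, which is exactly why the absence of any single “significant” study is, by itself, no objection to a properly conducted meta-analysis.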

In any event, the trial court concluded that the proffered novel technique lacked sufficient reliability, that the meta-analysis would “overwhelm, confuse, or mislead the jury,” and that the proffered meta-analysis on liver cancer was not sufficiently relevant to the facts of the case (in which no plaintiff had developed, or had died of, liver cancer). The trial court noted that the Report had not been peer-reviewed, and that it had not been accepted or relied upon by the Ontario government for any finding or policy decision. The trial court also expressed its concern that the proffered testimony along the lines of the Report would possibly confuse the jury because it appeared to be “scientific” and because Nicholson appeared to be qualified.

The Appeal

The Court of Appeals for the Third Circuit, in an opinion by Judge Becker that is still sometimes cited even though Downing is no longer good law in the Circuit or anywhere else, reversed Judge Kelly’s exclusion of the Nicholson Report.[7] The Court was ultimately not persuaded that the trial court had handled the exclusion of Nicholson’s Report and its meta-analysis correctly, and it remanded the case for a do-over analysis.

Judge Becker described Nicholson’s Report as a “meta-analysis,” which pooled or “combined the results of numerous epidemiologic surveys in order to achieve a larger sample size, adjusted the results for differences in testing techniques, and drew his own scientific conclusions.”[8] Through this method, Nicholson claimed to have shown that “exposure to PCBs can cause liver, gall bladder and biliary tract disorders … even though none of the individual surveys supports such a conclusion when considered in isolation.”[9]

Validity

The appellate court gave no weight to the possibility that a meta-analysis would confuse a jury, or that its “scientific nature” or Nicholson’s credentials would lead a jury to give it more weight than it deserved.[10] The Court of Appeals conceded, however, that exclusion would have been appropriate if the methodology used itself was invalid. The appellate opinion further acknowledged that the defense had offered opposition to Nicholson’s Report in which it documented his failure to include data that were inconsistent with his conclusions, and that “Nicholson had produced a scientifically invalid study.”[11]

Judge Becker’s opinion for a panel of the Third Circuit provided no details about the cherry picking. The opinion never analyzed why this charge of cherry-picking and manipulation of the dataset did not invalidate the meta-analytic method generally, or Nicholson’s method as applied. The opinion gave no suggestion that this counter-affidavit was ever answered by the plaintiffs.

Generally, Judge Becker’s opinion dodged engagement with the specific threats to validity in Nicholson’s Report, and took refuge in the indisputable fact that hundreds of meta-analyses were published annually, and that the defense expert witnesses did not question the general reliability of meta-analysis.[12] These facts undermined the defense claim that meta-analysis was novel.[13] The reality, however, was that meta-analysis was in its infancy in bio-medical research.

When it came to the specific meta-analysis at issue, the court did not discuss or analyze a single pertinent detail of the Report. Despite its lack of engagement with the specifics of the Report’s meta-analysis, the court astutely observed that prevalent errors and flaws do not mean that a particular meta-analysis is “necessarily in error.”[14] Of course, without bothering to look, the court would not know whether the proffered meta-analysis was “actually in error.”

The appellate court would have given Nicholson’s Report a “pass” if it were an application of an accepted methodology. The defense’s remedy under this condition would be to cross-examine the opinion in front of a jury. If, on the other hand, Nicholson had altered an accepted methodology to skew its results, then the court’s gatekeeping responsibility under Downing would be invoked.

The appellate court went on to fault the trial court for failing to make sufficiently explicit findings as to whether the questioned meta-analysis was unreliable. From its perspective, the Court of Appeals saw the trial court as resolving the reliability issue upon the greater credibility of the defense expert witnesses in branding the disputed meta-analysis as unreliable. Credibility determinations are for the jury, but the court left room for a challenge on reliability itself:[15]

“Assuming that Dr. Nicholson’s meta-analysis is the proper subject of Downing scrutiny, the district court’s decision is wanting, because it did not make explicit enough findings on the reliability of Dr. Nicholson’s meta-analysis to satisfy Downing. We decline to define the exact level at which a district court can exclude a technique as sufficiently unreliable. Reliability indicia vary so much from case to case that any attempt to define such a level would most likely be pointless. Downing itself lays down a flexible rule. What is not flexible under Downing is the requirement that there be a developed record and specific findings on reliability issues. Those are absent here. Thus, even if it may be possible to exclude Dr. Nicholson’s testimony under Downing, as an unreliable, skewed meta-analysis, we cannot make such a determination on the record as it now stands. Not only was there no hearing, in limine or otherwise, at which the bases for the opinions of the contesting experts could be evaluated, but the experts were also not even deposed. All of the expert evidence was based on affidavits.”

Peer Review

Understandably, the defense attacked Nicholson’s Report as not having been peer reviewed. Without any scrutiny of the scientific bona fides of the workers’ compensation agency, the appellate court acquiesced in Nicholson’s self-serving characterization of his Report as having been reviewed by “cooperating researchers” and the Panel of the Ontario Workers’ Compensation agency. Another partisan expert witness characterized Nicholson’s Report as a “balanced assessment,” and this seemed to appease the Third Circuit, which was wary of requiring peer review in the first place.[16]

Relevancy Prong

The defense had argued that Nicholson’s Report was irrelevant because no individual plaintiff claimed liver cancer.[17] The trial court largely accepted this argument, but the appellate court disagreed because of conclusory language in Nicholson’s affidavit, in which he asserted that “proof of an increased risk of liver cancer is probative of an increased risk of other forms of cancer.” The court seemed unfazed by the ipse dixit, asserted without any support. Indeed, Nicholson’s assertion was contradicted by his own Report, in which he reported that there were fewer cancers among PCB-exposed male capacitor manufacturing workers than expected,[18] and that the rate for all cancers for both men and women was lower than expected, with 132 observed and 139.40 expected.[19]

The trial court had also agreed with the defense’s suggestion that Nicholson’s report, and its conclusion of causality between PCB exposure and liver cancer, were irrelevant because the Report “could not be the basis for anyone to say with reasonable degree of scientific certainty that some particular person’s disease, not cancer of the liver, biliary tract or gall bladder, was caused by PCBs.”[20]

Analysis

It would likely have been lost on Judge Becker and his colleagues, but Nicholson presented SMRs (standardized mortality ratios) throughout his Report, and for the all-cancers statistic, he gave an SMR of 95 (which is simply 132 observed divided by 139.40 expected, times 100, or about 94.7). What Nicholson clearly did in this, and in all other instances, was simply divide the observed number by the expected, and multiply by 100. This crude, simplistic calculation fails to present a standardized mortality ratio, which requires taking into account the age distribution of the exposed and the unexposed groups, and a weighting of the contribution of cases within each age stratum. Nicholson’s presentation of data was nothing short of false and misleading. And in case anyone remembers General Electric v. Joiner, Nicholson’s summary estimate of risk for lung cancer in men was below the expected rate.[21]
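
For readers who want to see the difference, here is a minimal sketch in Python, with invented strata (not Nicholson’s data), contrasting a crude observed-over-expected ratio with an indirectly standardized SMR, in which the expected count is built by applying age-specific reference rates to the cohort’s own person-years in each age stratum.

# Invented numbers for illustration only; not Nicholson's data.
# age stratum: (cohort person-years, observed deaths, reference rate per person-year)
strata = {
    "40-49": (20_000, 4, 0.0002),
    "50-59": (10_000, 10, 0.0010),
    "60-69": (2_000, 8, 0.0040),
}
observed = sum(d for _, d, _ in strata.values())
total_py = sum(py for py, _, _ in strata.values())

# crude ratio: apply a single overall reference rate (here, a hypothetical
# 0.0008 deaths per person-year) to all person-years, ignoring the age structure
crude_expected = total_py * 0.0008
print(f"crude ratio: {100 * observed / crude_expected:.0f}")

# indirectly standardized SMR: apply each age-specific reference rate to the
# cohort's person-years in that same stratum, then sum the expected counts
std_expected = sum(py * rate for py, _, rate in strata.values())
print(f"SMR: {100 * observed / std_expected:.0f}")

With these invented numbers the crude ratio comes out around 86 while the standardized ratio is 100; the two diverge whenever the exposed cohort’s age distribution differs from the reference population’s, which is the whole point of standardization.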

Nicholson’s Report was replete with many other methodological sins. He used a composite of three organs (liver, gall bladder, bile duct) without any biological rationale. His analysis combined male and female results, and still his analysis of the composite outcome was based upon only seven cases. Of those seven cases, some of the cases were not confirmed as primary liver cancer, and at least one case was confirmed as not being a primary liver cancer.[22]

Nicholson failed to standardize the analysis for the age distribution of the observed and expected cases, and he failed to present meaningful analysis of random or systematic error. When he did present p-values, he presented one-tailed values, and he made no corrections for his many comparisons from the same set of data.
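
Both of those criticisms are easy to illustrate. The following sketch in Python, with an invented test statistic and an invented number of comparisons, shows how a one-tailed p-value is half the two-tailed value for a symmetric test statistic, and how the chance of at least one spurious “significant” finding grows with the number of comparisons made on the same data.

import math

z = 1.80  # invented test statistic
p_two = math.erfc(abs(z) / math.sqrt(2))   # two-tailed p, about 0.07
p_one = p_two / 2                          # one-tailed p, about 0.04
print(f"two-tailed p = {p_two:.3f}; one-tailed p = {p_one:.3f}")

# With k independent comparisons each tested at 0.05, the chance of at least one
# false-positive "significant" result is 1 - 0.95**k; a Bonferroni correction
# would test each comparison at 0.05/k instead.
k = 20
print(f"chance of at least one false positive across {k} tests: {1 - 0.95 ** k:.2f}")
print(f"Bonferroni per-test threshold: {0.05 / k:.4f}")

Note how the invented statistic crosses the conventional 0.05 line only when the one-tailed value is reported, and how twenty uncorrected looks at the same data carry roughly a 64 percent chance of producing at least one spurious “positive.”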

Finally, and most egregiously, Nicholson’s meta-analysis was meta-analysis in name only. What he had done was simply to add “observed” and “expected” events across studies to arrive at totals, and to recalculate a bogus risk ratio, which he fraudulently called a standardized mortality ratio. Adding events across studies is not a valid meta-analysis; indeed, it is a well-known example of how to generate a Simpson’s Paradox, which can change the direction or magnitude of any association.[23]
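
The point is easily demonstrated. Here is a minimal sketch in Python, with an invented two-study example (not Nicholson’s data), in which the exposed group fares worse within each study, yet simply adding the events and denominators across the two studies reverses the direction of the association:

studies = {
    # study: (exposed events, exposed N, unexposed events, unexposed N), invented
    "Study A": (95, 1_000, 800, 10_000),
    "Study B": (40, 10_000, 3, 1_000),
}

def risk_ratio(a, n1, b, n0):
    return (a / n1) / (b / n0)

for name, (a, n1, b, n0) in studies.items():
    print(f"{name}: risk ratio = {risk_ratio(a, n1, b, n0):.2f}")   # both above 1

# naive "pooling" by adding counts across studies, analogous to adding
# observed and expected totals across studies
A = sum(a for a, _, _, _ in studies.values())
N1 = sum(n1 for _, n1, _, _ in studies.values())
B = sum(b for _, _, b, _ in studies.values())
N0 = sum(n0 for _, _, _, n0 in studies.values())
print(f"pooled-by-addition risk ratio = {risk_ratio(A, N1, B, N0):.2f}")   # well below 1

A proper meta-analysis keeps the within-study comparisons intact and weights them (by inverse variance, or by a Mantel-Haenszel scheme); collapsing the studies into a single table discards the stratification that protects against exactly this reversal.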

Some may be tempted to criticize the defense for having focused its challenge on the “novelty” of Nicholson’s approach in Paoli. The problem of course was the invalidity of Nicholson’s work, but both the trial court’s exclusion of Nicholson, and the Court of Appeals’ reversal and remand of the exclusion decision, illustrate the problem in getting judges, even well-respected judges, to accept their responsibility to engage with questioned scientific evidence.

Even in Paoli, no amount of ketchup could conceal the unsavoriness of Nicholson’s scrapple analysis. When the Paoli case reached the Court of Appeals again in 1994, Nicholson’s analysis was absent.[24] Apparently, the plaintiffs’ counsel had second thoughts about the whole matter. Today, under the revised Rule 702, there can be little doubt that Nicholson’s so-called meta-analysis should have been excluded.


[1]  Not to be confused with the Judge Kelly of the same district, who was unceremoniously disqualified after attending an ex parte conference with plaintiffs’ lawyers and expert witnesses, at the invitation of Dr. Irving Selikoff.

[2]  Pace Philip J. Landrigan & Myron A. Mehlman, “In Memoriam – William J. Nicholson,” 40 Am. J. Indus. Med. 231 (2001). Landrigan and Mehlman assert, without any support, that Nicholson was an epidemiologist. Their own description of his career, his undergraduate work at MIT, his doctorate in physics from the University of Washington, his employment at the Watson Laboratory, before becoming a staff member in Irving Selikoff’s department in 1969, all suggest that Nicholson brought little to no experience in epidemiology to his work on occupational and environmental exposure epidemiology.

[3]  In re Paoli RR Yard Litig., 706 F. Supp. 358, 372-73 (E.D. Pa. 1988).

[4]  William Nicholson, Report to the Workers’ Compensation Board on Occupational Exposure to PCBs and Various Cancers, for the Industrial Disease Standards Panel (ODP); IDSP Report No. 2 (Toronto, Ontario Dec. 1987).

[5]  Id. at 373.

[6]  United States v. Downing, 753 F.2d 1224 (3d Cir. 1985).

[7]  In re Paoli RR Yard PCB Litig., 916 F.2d 829 (3d Cir. 1990), cert. denied sub nom. General Elec. Co. v. Knight, 111 S.Ct. 1584 (1991).

[8]  Id. at 845.

[9]  Id.

[10]  Id. at 841, 848.

[11]  Id. at 845.

[12]  Id. at 847-48.

[13]  See, e.g., Robert Rosenthal, Judgment studies: Design, analysis, and meta-analysis (1987); Richard J. Light & David B. Pillemer, Summing Up: the Science of Reviewing Research (1984); Thomas A. Louis, Harvey V. Fineberg & Frederick Mosteller, “Findings for Public Health from Meta-Analyses,” 6 Ann. Rev. Public Health 1 (1985); Kristan A. L’abbé, Allan S. Detsky & Keith O’Rourke, “Meta-analysis in clinical research,” 107 Ann. Intern. Med. 224 (1987).

[14]  Id. at 857.

[15]  Id. at 858.

[16]  Id. at 858.

[17]  Id. at 845.

[18]  Report, Table 16.

[19]  Report, Table 18.

[20]  In re Paoli, 916 F.2d at 847.

[21]  See General Electric v. Joiner, 522 U.S. 136 (1997); NAS, “How Have Important Rule 702 Holdings Held Up With Time?” (March 20, 2015).

[22]  Report, Table 22.

[23]  James A. Hanley, Gilles Thériault, Ralf Reintjes & Annette de Boer, “Simpson’s Paradox in Meta-Analysis,” 11 Epidemiology 613 (2000); H. James Norton & George Divine, “Simpson’s paradox and how to avoid it,” Significance 40 (Aug. 2015); George Udny Yule, “Notes on the Theory of Association of Attributes in Statistics,” 2 Biometrika 121 (1903).

[24]  In re Paoli RR Yard Litig., 35 F.3d 717 (3d Cir. 1994).

California Roasts Fear-Mongering Industry

June 16th, 2019

A year ago, California set out to create an exemption for coffee from its Proposition 65 regulations. The lawsuit industry, represented by the Council for Education and Research on Toxics (CERT), had been successfully deploying Prop 65’s private right of action provisions to pick the pockets of coffee vendors. Something had to give.

In 2010, Mr. Metzger, on behalf of CERT, sued Starbucks and 90 other coffee manufacturers and distributors, claiming they had failed to warn consumers about the cancer risks of acrylamide. CERT’s mission was to shake down the roasters and the vendors because coffee has minor amounts of acrylamide in it. Acrylamide in very high doses causes tumors in rats[1]; coffee consumption by humans is generally regarded as beneficial.

Earlier last year, the Los Angeles Superior Court ordered the coffee companies to put cancer warnings on their beverages. In the upcoming damages phase of the case, Metzger was seeking as much as $2,500 in civil penalties for each cup of coffee the defendants sold over at least a decade. Suing companies for violating California’s Proposition 65 is like shooting fish in a barrel, but the State’s regulatory initiative to save California from the embarrassment of branding coffee a carcinogen was a major setback for CERT.

And so the Office of Environmental Health Hazard Assessment (OEHHA) began a rulemaking largely designed to protect the agency from the public relations nightmare created by the application of the governing statute and regulations to squeeze the coffee roasters and makers.[2] The California agency’s proposed regulation on acrylamide in coffee resulted in a stay of CERT’s enforcement action against Starbucks.[3] CERT’s lawyers were not pleased; they had already won a trial court’s judgment that they were owed damages, and only the amount needed to be set. In September 2018, CERT filed a lawsuit in Los Angeles Superior Court against the State of California, challenging OEHHA’s proposed rule, and saying it was being rammed through the agency on the order of the Office of the Governor in an effort to kill CERT’s suit against the coffee companies. Or maybe it was simply designed to allow people to drink their coffee without the Big Prop 65 warning.

Earlier this month, after reviewing voluminous submissions and holding a hearing, the OEHHA announced its ruling that Californians do not need to be warned that coffee causes cancer. Epistemically, coffee is not known to the State of California to be hazardous to human health.[4] According to Sam Delson, a spokesperson for the OEHHA, “Coffee is a complex mixture of hundreds of chemicals that includes both carcinogens and anti-carcinogens. … The overall effect of coffee consumption is not associated with any significant cancer risk.” The regulation saving coffee goes into effect in October 2019. CERT, no doubt, will press on in its litigation campaign against the State.

CERT is the ethically dodgy organization founded by C. Sterling Wolfe, a former environmental lawyer; Brad Lunn; Carl Cranor, a philosophy professor at the University of California, Riverside; and Martyn T. Smith, a toxicology professor at the University of California, Berkeley.[5] Metzger has been its lawyer for many years; indeed, Metzger and CERT share the same office. Smith has been the recipient of CERT’s largesse in funding toxicologic studies. Cranor and Smith have both testified for the lawsuit industry.

In the well-known Milward case,[6] both Cranor and Smith served as paid expert witnesses for plaintiff. When the trial court excluded their proffered testimonies as unhelpful and unreliable, their own organization, CERT, came to the rescue by filing an amicus brief in the First Circuit. Supported by a large cast of fellow travelers, CERT perverted the course of justice by failing to disclose the intimate relationship between the “amicus” CERT and the expert witnesses Cranor and Smith, whose opinions had been successfully challenged.[7]

The OEHHA coffee regulation shows that not all regulation is bad.


[1]  National Cancer Institute, “Acrylamide and Cancer Risk.”

[2]  See Sam Delson, “Press Release: Proposed OEHHA regulation clarifies that cancer warnings are not required for coffee under Proposition 65” (June 15, 2018).

[3]  Council for Education and Research on Toxics v. Starbucks Corp., case no. B292762, Court of Appeal of the State of California, Second Appellate District.

[4]  Associated Press, “Perk Up: California Says Coffee Cancer Risk Insignificant,” N.Y. Times (June 3, 2019); Sara Randazzo, “Coffee Doesn’t Warrant a Cancer Warning in California, Agency Says; Industry scores win following finding on chemical found in beverage,” W.S.J. (June 3, 2019); Editorial Board, “Coffee Doesn’t Kill After All: California has a moment of sanity, and a lawyer is furious,” W.S.J. (June 5, 2019).

[5]  Michael Waters, “The Secretive Non-Profit Gaming California’s Health Laws,” The Outline (June 18, 2018); Beth Mole, “The secretive nonprofit that made millions suing companies over cancer warnings,” Ars Technica (June 6, 2019); NAS, “Coffee with Cream, Sugar & a Dash of Acrylamide” (June 9, 2018); NAS, “The Council for Education & Research on Toxics” (July 9, 2013); NAS, “Sand in My Shoe – CERTainly” (June 17, 2014) (CERT briefs supported by fellow-travelers, testifying expert witnesses Jerrold Abraham, Richard W. Clapp, Ronald Crystal, David A. Eastmond, Arthur L. Frank, Robert J. Harrison, Ronald Melnick, Lee Newman, Stephen M. Rappaport, David Joseph Ross, and Janet Weiss, all without disclosing conflicts of interest).

[6]  Milward v. Acuity Specialty Products Group, Inc., 664 F. Supp. 2d 137, 148 (D.Mass. 2009), rev’d, 639 F.3d 11 (1st Cir. 2011), cert. den. sub nom. U.S. Steel Corp. v. Milward, 565 U.S. 1111 (2012), on remand, Milward v. Acuity Specialty Products Group, Inc., 969 F.Supp. 2d 101 (D.Mass. 2013) (excluding specific causation opinions as invalid; granting summary judgment), aff’d, 820 F.3d 469 (1st Cir. 2016).

[7]  NAS, “The Council for Education & Research on Toxics” (July 9, 2013) (CERT amicus brief filed without any disclosure of conflict of interest). The fellow travelers who knowingly or unknowingly aided CERT’s scheme to pervert the course of justice, included some well-known testifiers for the lawsuit industry: Nicholas A. Ashford, Nachman Brautbar, David C. Christiani, Richard W. Clapp, James Dahlgren, Devra Lee Davis, Malin Roy Dollinger, Brian G. Durie, David A. Eastmond, Arthur L. Frank, Frank H. Gardner, Peter L. Greenberg, Robert J. Harrison, Peter F. Infante, Philip J. Landrigan, Barry S. Levy, Melissa A. McDiarmid, Myron Mehlman, Ronald L. Melnick, Mark Nicas, David Ozonoff, Stephen M. Rappaport, David Rosner, Allan H. Smith, Daniel Thau Teitelbaum, Janet Weiss, and Luoping Zhang. See also NAS, “Carl Cranor’s Conflicted Jeremiad Against Daubert” (Sept. 23, 2018); Carl Cranor, “Milward v. Acuity Specialty Products: How the First Circuit Opened Courthouse Doors for Wronged Parties to Present Wider Range of Scientific Evidence” (July 25, 2011).


The opinions, statements, and asseverations expressed on Tortini are my own, or those of invited guests, and these writings do not necessarily represent the views of clients, friends, or family, even when supported by good and sufficient reason.