TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Regressive Methodology in Pharmaco-Epidemiology

October 24th, 2020

Medications are rigorously tested for safety and efficacy in clinical trials before approval by regulatory agencies such as the U.S. Food & Drug Administration (FDA) or the European Medicines Agency (EMA). The approval process, however, contemplates that more data about safety and efficacy will emerge from the use of approved medications in pharmacoepidemiologic studies conducted outside of clinical trials. Litigation of safety outcomes rarely arises from claims based upon the pivotal clinical trials that were conducted for regulatory approval and licensing. The typical courtroom scenario is that a safety outcome is called into question by pharmacoepidemiologic studies that purport to find associations or causality between the use of a specific medication and the claimed harm.

The International Society for Pharmacoepidemiology (ISPE), established in 1989, describes itself as an international professional organization devoted to advancing health through pharmacoepidemiology and related areas, such as pharmacovigilance. The ISPE website defines pharmacoepidemiology as

“the science that applies epidemiologic approaches to studying the use, effectiveness, value and safety of pharmaceuticals.”

The ISPE conceptualizes pharmacoepidemiology as “real-world” evidence, in contrast to randomized clinical trials:

“Randomized controlled trials (RCTs) have served and will continue to serve as the major evidentiary standard for regulatory approvals of new molecular entities and other health technology. Nonetheless, RWE derived from well-designed studies, with application of rigorous epidemiologic methods, combined with judicious interpretation, can offer robust evidence regarding safety and effectiveness. Such evidence contributes to the development, approval, and post-marketing evaluation of medicines and other health technology. It enables patient, clinician, payer, and regulatory decision-making when a traditional RCT is not feasible or not appropriate.”

ISPE Position on Real-World Evidence (Feb. 12, 2020) (emphasis in original).

The ISPE publishes an official journal, Pharmacoepidemiology and Drug Safety, and sponsors conferences and seminars, all of which are watched by lawyers pursuing and defending drug and device health safety claims. The endorsement by the ISPE of the American Statistical Association’s 2016 statement on p-values is thus of interest not only to statisticians, but to lawyers and claimants involved in drug safety litigation.

The ISPE, through its board of directors, formally endorsed the ASA 2016 p-value statement on April 1, 2017 (no fooling) in a statement that can be found at its website:

The International Society for Pharmacoepidemiology, ISPE, formally endorses the ASA statement on the misuse of p-values and accepts it as an important step forward in the pursuit of reasonable and appropriate interpretation of data.

On March 7, 2016, the American Statistical Association (ASA) issued a policy statement that warned the scientific community about the use of P-values and statistical significance for interpretation of reported associations. The policy statement was accompanied by an introduction that characterized the reliance on significance testing as a vicious cycle of teaching significance testing because it was expected, and using it because that was what was taught. The statement and many accompanying commentaries illustrated that p-values were commonly misinterpreted to imply conclusions that they cannot imply. Most notably, “p-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.” Also, “a p-value does not provide a good measure of evidence regarding a model or hypothesis.” Furthermore, reliance on p-values for data interpretation has exacerbated the replication problem of scientific work, as replication of a finding is often confused with replicating the statistical significance of a finding, on the erroneous assumption that replication should lead to studies getting similar p-values.

This official statement from the ASA has ramifications for a broad range of disciplines, including pharmacoepidemiology, where use of significance testing and misinterpretation of data based on P-values is still common. ISPE has already adopted a similar stance and incorporated it into our GPP [ref] guidelines. The ASA statement, however, carries weight on this topic that other organizations cannot, and will inevitably lead to changes in journals and classrooms.

There are points of interpretation of the ASA Statement that can be discussed and debated. What is clear, however, is that the ASA never urged the abandonment of p-values or even of statistical significance. The Statement contained six principles, some of which did nothing other than attempt to correct prevalent misunderstandings of p-values. The third principle stated that “[s]cientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.” (emphasis added).

This principle, as stated, thus hardly advocated for the abandonment of a threshold in testing; rather it made the unexceptional point that the ultimate scientific conclusion (say about causality) required more assessment than only determining whether a p-value passed a specified threshold.

Presumably, the ISPE’s endorsement of the ASA’s 2016 Statement embraces all six of the articulated principles, including the ASA’s fourth principle:

4. Proper inference requires full reporting and transparency

“P-values and related analyses should not be reported selectively. Conducting multiple analyses of the data and reporting only those with certain p-values (typically those passing a significance threshold) renders the reported p-values essentially uninterpretable. Cherry-picking promising findings, also known by such terms as data dredging, significance chasing, significance questing, selective inference, and “p-hacking,” leads to a spurious excess of statistically significant results in the published literature and should be vigorously avoided. One need not formally carry out multiple statistical tests for this problem to arise: Whenever a researcher chooses what to present based on statistical results, valid interpretation of those results is severely compromised if the reader is not informed of the choice and its basis. Researchers should disclose the number of hypotheses explored during the study, all data collection decisions, all statistical analyses conducted, and all p-values computed. Valid scientific conclusions based on p-values and related statistics cannot be drawn without at least knowing how many and which analyses were conducted, and how those analyses (including p-values) were selected for reporting.”
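The selective-reporting problem that the fourth principle describes is easy to demonstrate. The toy simulation below (purely illustrative numbers, not drawn from any study discussed here) runs one hundred comparisons on pure noise and then “reports” only the nominally significant ones:

```python
# A minimal simulation of selective reporting: run many tests on data where
# the null hypothesis is TRUE in every comparison, then keep only those with
# p < 0.05. Chance alone produces a handful of "findings" to cherry-pick.
import random
import statistics
from math import sqrt, erf

random.seed(12345)

def two_sample_p(a, b):
    """Approximate two-sided p-value from a large-sample z-test on two means."""
    na, nb = len(a), len(b)
    se = sqrt(statistics.variance(a) / na + statistics.variance(b) / nb)
    z = (statistics.mean(a) - statistics.mean(b)) / se
    # two-sided tail probability under the standard normal
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

n_tests = 100
p_values = []
for _ in range(n_tests):
    # both "arms" are drawn from the SAME distribution: no real effect exists
    arm_a = [random.gauss(0, 1) for _ in range(50)]
    arm_b = [random.gauss(0, 1) for _ in range(50)]
    p_values.append(two_sample_p(arm_a, arm_b))

significant = [p for p in p_values if p < 0.05]
print(f"{len(significant)} of {n_tests} null comparisons were "
      f"nominally 'significant' at p < 0.05")
```

A reader shown only the “significant” handful, without the other ninety-odd analyses, has no way to interpret them — which is precisely the ASA’s point about full disclosure.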

The ISPE’s endorsement of the ASA 2016 Statement references the ISPE’s own “Guidelines for Good Pharmacoepidemiology Practices (GPP),” which were promulgated initially in 1996, and revised as recently as June 2015. Good practices, as of 2015, provided that:

“Interpretation of statistical measures, including confidence intervals, should be tempered with appropriate judgment and acknowledgements of potential sources of error and limitations of the analysis, and should never be taken as the sole or rigid basis for concluding that there is or is not a relation between an exposure and outcome. Sensitivity analyses should be conducted to examine the effect of varying potentially critical assumptions of the analysis.”

All well and good, but this “good practices” statement might be taken as a bit anemic, given that it contains no mention of, or caution against, unqualified or unadjusted confidence intervals or p-values that come from multiple testing or comparisons. The ISPE endorsement of the ASA Statement now expands upon the ISPE’s good practices to include the avoidance of multiplicity and the disclosure of the full extent of analyses conducted in a study.

What happens in the “real world” of publishing, outside the board room?

Last month, the ISPE conducted its (virtual) 36th International Conference on Pharmacoepidemiology & Therapeutic Risk Management. The abstracts and poster presentations from this Conference were published last week as a Special Issue of the ISPE journal. I spot checked the journal contents to see how well the presentations lived up to the ISPE’s statistical aspirations.

One poster presentation addressed statin use and skin cancer risk in a French prospective cohort.[1] The authors described their cohort of French women, who were 40 to 65 years old, in 1990, and were followed forward. Exposure to statin medications was assessed from 2004 through 2014. The analysis included outcomes of any skin cancer, melanoma, basal-cell carcinoma (BCC), and squamous-cell carcinoma (SCC), among 66,916 women. Here is how the authors describe their findings:

There was no association between ever use of statins and skin cancer risk: the HRs were 0.96 (95% CI = 0.87-1.05) for overall skin cancer, 1.18 (95% CI = 0.96-1.47) for melanoma, 0.89 (95% CI = 0.79-1.01) for BCC, and 0.90 (95% CI = 0.67-1.21) for SCC. Associations did not differ by statin molecule nor by duration or dose of use. However, women who started to use statins before age 60 were at increased risk of BCC (HR = 1.45, 95% CI = 1.07-1.96 for ever vs never use).

To be fair, this was a poster presentation, but this short description of findings makes clear that the investigators looked at least at the following subgroups:

Exposure subgroups:

  • specific statin drug
  • duration of use
  • dosage
  • age strata

and

Outcome subgroups:

  • melanoma
  • basal-cell carcinoma
  • squamous-cell carcinoma

The reader is not told how many specific statins, how many duration groups, dosage groups, and age strata were involved in the exposure analysis. My estimate is that the exposure subgroups were likely in excess of 100. With three disease outcome subgroups, the total subgroup analyses thus likely exceeded 300. The authors did not provide any information about the full extent of their analyses.
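Even granting rough independence among the tests (a generous assumption here, since the subgroups overlap), the arithmetic of chance findings is straightforward. A back-of-the-envelope sketch, using this post’s estimate of roughly 300 analyses:

```python
# Back-of-the-envelope check, assuming roughly 300 subgroup analyses (the
# post's estimate), each tested at the conventional 0.05 level, and treated
# as independent for simplicity: how many nominally "significant" results
# would chance alone be expected to produce?
n_tests = 300
alpha = 0.05

expected_false_positives = n_tests * alpha
prob_at_least_one = 1 - (1 - alpha) ** n_tests

print(f"Expected chance findings among {n_tests} tests: "
      f"{expected_false_positives:.0f}")
print(f"Probability of at least one chance finding: {prob_at_least_one:.6f}")
```

On these assumptions, about fifteen spurious “findings” would be expected, and at least one is a near certainty.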

Here is how the authors reported their conclusion:

“These findings of increased BCC risk in statin users before age 60 deserve further investigations.”

Now, the authors did not use the phrase “statistically significant,” but it is clear that they have characterized a finding of “increased BCC risk in statin users before age 60,” and in no other subgroup, and they have done so based upon a reported nominal “HR = 1.45, 95% CI = 1.07-1.96 for ever vs never use.” It is also clear that the authors have made no allowance, adjustment, modification, or qualification, for the wild multiplicity arising from their estimated 300 or so subgroups. Instead, they made an unqualified statement about “increased BCC risk,” and they offered an opinion about the warrant for further studies.
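For the curious, the nominal p-value behind the authors’ subgroup estimate can be reconstructed from the reported confidence interval by a standard log-scale calculation, and then compared with a simple Bonferroni threshold for the estimated 300 comparisons. Only the reported HR and CI come from the abstract; the threshold and the 300-test figure are this post’s assumptions:

```python
# Reconstruct an approximate two-sided p-value from a hazard ratio and its
# 95% CI (a standard back-calculation on the log scale), then compare it with
# a Bonferroni-adjusted threshold for ~300 tests (the post's estimate).
from math import log, sqrt, erf

hr, lo, hi = 1.45, 1.07, 1.96   # reported: HR = 1.45, 95% CI 1.07-1.96

# a 95% CI spans 2 * 1.96 standard errors on the log scale
se = (log(hi) - log(lo)) / (2 * 1.96)
z = log(hr) / se
p = 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))   # two-sided normal p-value

bonferroni_threshold = 0.05 / 300

print(f"approximate nominal p-value: {p:.4f}")
print(f"Bonferroni threshold for 300 tests: {bonferroni_threshold:.6f}")
print(f"survives adjustment: {p < bonferroni_threshold}")
```

The nominal p-value works out to roughly 0.016, which comfortably clears the conventional 0.05 threshold but falls two orders of magnitude short of any plausible multiplicity adjustment.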

Endorsement of good statistical practices is a welcome professional organizational activity, but it is rather meaningless unless the professional societies begin to implement the good practices in their article selection, editing, and publishing activities.


[1]  Marie Al Rahmoun, Yahya Mahamat-Saleh, Iris Cervenka, Gianluca Severi, Marie-Christine Boutron-Ruault, Marina Kvaskoff, and Agnès Fournier, “Statin use and skin cancer risk: A French prospective cohort study,” 29 Pharmacoepidemiol. & Drug Safety s645 (2020).

The Defenestration of Sir Ronald Aylmer Fisher

August 20th, 2020

Fisher has been defenestrated. Literally.

Sir Ronald Fisher was a brilliant statistician. Born in 1890, he won a scholarship to Gonville and Caius College, in Cambridge University, in 1909. Three years later, he gained first class honors in Mathematics, and he went on to have extraordinary careers in genetics and statistics. In 1929, Fisher was elected to the Royal Society, and in 1952, Queen Bessy knighted him for his many contributions to the realm, including his work on experimental design and data interpretation, and his bridging the Mendelian theory of genetics and Darwin’s theory of evolution. In 1998, Bradley Efron described Fisher as “the single most important figure in 20th century statistics.”[1] And in 2010, University College, London, established the “R. A. Fisher Chair in Statistical Genetics” in honor of Fisher’s remarkable contributions to both genetics and statistics. Fisher’s college put up a stained-glass window to celebrate its accomplished graduate.

Fisher was, through his interest in genetics, also interested in eugenics through the application of genetic learning to political problems. For instance, he favored abolishing extra social support to large families, in favor of support proportional to the father’s wages. Fisher also entertained with some seriousness grand claims about the connection between the rise and fall of civilizations and the loss of fertility among the upper classes.[2] While a student at Caius College, Fisher joined the Cambridge Eugenics Society, as did John Maynard Keynes. For reasons having to do with professional jealousies, Fisher’s appointment at University College London, in 1933, was as a professor of Eugenics, not Statistics.

After World War II, an agency of the United Nations, the United Nations Educational, Scientific and Cultural Organization (UNESCO) sought to forge a scientific consensus against racism, and Nazi horrors.[3] Fisher participated in the UNESCO commission, which he found to be “well-intentioned” but errant for failing to acknowledge inter-group differences “in their innate capacity for intellectual and emotional development.”[4]

Later in the UNESCO report, Fisher’s objections are described as the same as those of Herman Joseph Muller, who won the Nobel Prize for Medicine in 1946. The report provides Fisher’s objections in his own words:

“As you ask for remarks and suggestions, there is one that occurs to me, unfortunately of a somewhat fundamental nature, namely that the Statement as it stands appears to draw a distinction between the body and mind of men, which must, I think, prove untenable. It appears to me unmistakable that gene differences which influence the growth or physiological development of an organism will ordinarily pari passu influence the congenital inclinations and capacities of the mind. In fact, I should say that, to vary conclusion (2) on page 5, ‘Available scientific knowledge provides a firm basis for believing that the groups of mankind differ in their innate capacity for intellectual and emotional development,’ seeing that such groups do differ undoubtedly in a very large number of their genes.”[5]

Fisher’s comments may not be totally anodyne by today’s standards, but he had also commented that:

“the practical international problem is that of learning to share the resources of this planet amicably with persons of materially different nature, and that this problem is being obscured by entirely well-intentioned efforts to minimize the real differences that exist.”[6]

Fisher’s comments seem to reflect his beliefs in the importance of the genetic contribution to “intelligence and emotional development,” which today retain both their plausibility and controversial status. Fisher’s participation in the UNESCO effort, and his emphasis on sharing resources peacefully, seem to speak against malignant racism, and distinguish him from the ugliness of the racism expressed by the Marxist statistician (and eugenicist) Karl Pearson.[7]

Cancel Culture Catches Up With Sir Ronald A. Fisher

Nonetheless, the Woke mob has had its daggers out for Sir Ronald for some time. Back in June of this year, graffiti covered the walls of Caius College, calling for the defenestration of Fisher. A more sedate group circulated a petition for the removal of the Fisher window.[8] Later that month, the university removed the Fisher window, literally defenestrating him.[9]

The de-platforming of Fisher was not confined to the campus of a college in Cambridge University. Fisher spent some of his most productive years, outside the university, at the Rothamsted Experimental Station. Not to be found deficient in the metrics of social justice, Rothamsted Research issued a statement, on June 9, 2020, concerning its most famous resident scientist:

“Ronald Aylmer Fisher is often considered to have founded modern statistics. Starting in 1919, Fisher worked at Rothamsted Experimental Station (as it was called then) for 14 years.

Among his many interests, Fisher supported the philosophy of eugenics, which was not uncommon among intellectuals in Europe and America in the early 20th Century.

The Trustees of the Lawes Agricultural Trust, therefore, consider it appropriate to change the name of the Fisher Court accommodation block (opened in 2018 and named after the old Fisher Building that it replaced) to ‘AnoVa Court’, after the analysis of variance statistical test developed by Fisher’s team at Rothamsted, and which is widely used today. Arrangements for this change of name are currently being made.”

I suppose that soon it will be verboten to mention Fisher’s Exact Test.

Daniel Cleather, a scientist and self-proclaimed anarchist, goes further and claims that the entire enterprise of statistics is racist.[10] Cleather argues that mathematical models of reality are biased against causal explanation, and that this bias supports eugenics and politically conservative goals. Cleather claims that statistical methods were developed “by white supremacists for the express purpose of demonstrating that white men are better than other people.” Cleather never delivers any evidence, however, to support his charges, but he no doubt feels strongly about it, and feels unsafe in the presence of Fisher’s work on experimental methods.

It is interesting to compare the disparate treatment that other famous scholars and scientists are receiving from the Woke. Aristotle was a great philosopher and “natural philosopher” scientist. There is a well-known philosophical society, the Aristotelian Society, obviously named for Aristotle, as is fitting. In the aftermath of the killings of George Floyd, Breonna Taylor and Ahmaud Arbery, the Aristotelian Society engaged in this bit of moral grandstanding, of which The Philosopher would have likely disapproved:

A statement from the Aristotelian Society

“The recent killings of George Floyd, Breonna Taylor and Ahmaud Arbery have underlined the systemic racism and racial injustice that continue to pervade not just US but also British society. The Aristotelian Society stands resolutely opposed to racism and discrimination in any form. In line with its founding principles, the Society is committed to ensuring that all its members can meet on an equal footing in the promotion of philosophy. In order to achieve this aim, we will continue to work to identify ways that we can improve, in consultation with others. We recognise it as part of the mission of the Society to actively promote philosophical work that engages productively with issues of race and racism.”

I am sure it occurred to the members of the Society that Aristotle had expressed a view that some people were slaves by nature.[11] Today, we certainly do not celebrate Aristotle for this view, but we have not defenestrated him for a view much more hateful than any expressed by Sir Ronald. My point is merely that the vaunted Aristotelian Society is well able to look at the entire set of accomplishments of Aristotle, and not throw him out the window for his views on slavery. Still, if you have artwork depicting Aristotle, you may be wise to put it out of harm’s way.

If Aristotle’s transgressions were too ancient for the Woke mob, then consider those of Nathan Roscoe Pound, who was the Dean of Harvard Law School, from 1916 to 1936. Pound wrote on jurisprudential issues, and he is generally regarded as the founder of “sociological jurisprudence,” which seeks to understand law as influenced and determined by sociological conditions. Pound is celebrated especially by the plaintiffs’ bar, for his work for the National Association of Claimants’ Compensation Attorneys, which was the precursor to the Association of Trial Lawyers of America, and the current, rent-seeking, American Association for Justice. A group of “compensation lawyers” founded the Roscoe Pound – American Trial Lawyers Foundation (now the Pound Civil Justice Institute) in 1956, to build on Pound’s work.

Pound died in 1964, but he lives on in the hearts of social justice warriors, who seem oblivious of Pound’s affinity for Hitler and Nazism.[12] Pound’s enthusiasm was not a momentary lapse, but lasted a decade according to Daniel R. Coquillette, professor of American legal history at Harvard Law School.[13] Although Pound is represented in various ways as having been a great leader throughout the Harvard Law School, Coquillette says that volume two of his history of the school will address the sordid business of Pound’s Nazi leanings. In the meanwhile, no one is spraying graffiti on Pound’s portraits, photographs, and memorabilia, which are scattered throughout the School.

I would not want my defense of Fisher to be taken as a Trumpist “what-about” rhetorical diversion. Still, the Woke criteria for defenestrations seem, at best, to be applied inconsistently. More important, the Woke seem to have no patience for examining the positive contributions made by those they denounce. In Fisher’s (and Aristotle’s) case, the balance between good and bad ideas, and the creativity and brilliance of his important contributions, should allow people of good will to celebrate his many achievements, without moral hand waving. If the Aristotelian Society can keep its name, then Cambridge should be able to keep its stained-glass window memorial to Fisher.


[1]        Bradley Efron, “R. A. Fisher in the 21st century,” 13 Statistical Science 95, 95 (1998).

[2]        See Ronald A. Fisher, The Genetical Theory of Natural Selection 228-55 (1930) (chap. XI, “Social Selection of Fertility,” addresses the “decay of ruling classes”).

[3]        UNESCO, The Race Concept: Results of an Inquiry (1952).

[4]        Id. at 27 (noting that “Sir Ronald Fisher has one fundamental objection to the Statement, which, as he himself says, destroys the very spirit of the whole document. He believes that human groups differ profoundly “in their innate capacity for intellectual and emotional development.”)

[5]        Id. at 56.

[6]        Id. at 27.

[7]        Karl Pearson & Margaret Moul, “The Problem of Alien Immigration into Great Britain, Illustrated by an Examination of Russian and Polish Jewish Children, Part I,” 1 Ann. Human Genetics 5 (1925) (opining that Jewish immigrants “will develop into a parasitic race. […] Taken on the average, and regarding both sexes, this alien Jewish population is somewhat inferior physically and mentally to the native population.” ); “Part II,” 2 Ann. Human Genetics 111 (1927); “Part III,” 3 Ann. Human Genetics 1 (1928).

[8]        “Petition: Remove the window in honour of R. A. Fisher at Gonville & Caius, University of Cambridge.” See Genevieve Holl-Allen, “Students petition for window commemorating eugenicist to be removed from college hall; The petition surpassed 600 signatures in under a day,” The Cambridge Tab (June 2020).

[9]        Eli Cahan, “Amid protests against racism, scientists move to strip offensive names from journals, prizes, and more,” Science (July 2, 2020); Sam Kean “Ronald Fisher, a Bad Cup of Tea, and the Birth of Modern Statistics: A lesson in humility begets a scientific revolution,” Distillations (Science History Institute) (Aug. 6, 2019). Bayesians have been all-too-happy to throw shade at Fisher. See Eric-Jan Wagenmakers & Johnny van Doorn, “This Statement by Sir Ronald Fisher Will Shock You,” Bayesian Spectacles (July 2, 2020).

[10]      Daniel Cleather, “Is Statistics Racist?” Medium (Mar. 9, 2020).

[11]      Aristotle, Politics, 1254b16–21.

[12]      James Q. Whitman, Hitler’s American Model: The United States and the Making of Nazi Race Law 15 & n. 39 (2017); Stephen H. Norwood, The Third Reich in the Ivory Tower 56-57 (2009); Peter Rees, “Nathan Roscoe Pound and the Nazis,”  60 Boston Coll. L. Rev. 1313 (2019); Ron Grossman, “Harvard accused of coddling Nazis,” Chicago Tribune (Nov. 30, 2004).

[13]      Garrett W. O’Brien, “The Hidden History of the Harvard Law School Library’s Treasure Room,” The Crimson (Mar. 28, 2020).

David Madigan’s Graywashed Meta-Analysis in Taxotere MDL

June 12th, 2020

Once again, a meta-analysis is advanced as a basis for an expert witness’s causation opinion, and once again, the opinion is the subject of a Rule 702 challenge. The litigation is In re Taxotere (Docetaxel) Products Liability Litigation, a multi-district litigation (MDL) proceeding before Judge Jane Triche Milazzo, who sits on the United States District Court for the Eastern District of Louisiana.

Taxotere is the brand name for docetaxel, a chemotherapeutic medication used either alone or in conjunction with another chemotherapy, to treat a number of different cancers. Hair loss is a side effect of Taxotere, but in the MDL, plaintiffs claim that they have experienced permanent hair loss, which, in their view, was not adequately warned about. The litigation thus involved issues of exactly what “permanent” means, medical causation, adequacy of warnings in the Taxotere package insert, and warnings causation.

Defendant Sanofi challenged plaintiffs’ statistical expert witness, David Madigan, a frequent testifier for the lawsuit industry. In its Rule 702 motion, Sanofi argued that Madigan had relied upon two randomized clinical trials (TAX 316 and GEICAM 9805) that evaluated “ongoing alopecia” to reach conclusions about “permanent alopecia.” Sanofi made the point that “ongoing” is not “permanent,” and that trial participants who had ongoing alopecia may have had their hair grow back. Madigan’s reliance upon an end point different from what plaintiffs complained about made his analysis irrelevant. The MDL court rejected Sanofi’s argument, with the observation that Madigan’s analysis was not irrelevant for using the wrong end point, only less persuasive, and that Sanofi’s criticism was one that “Sanofi can highlight for the jury on cross-examination.”[1]

Did Judge Milazzo engage in judicial dodging by rejecting the relevancy argument and emphasizing the truism that Sanofi could highlight the discrepancy on cross-examination? In the sense that the disconnect can easily be shown by highlighting the different event rates for the differently defined alopecia outcomes, the Sanofi argument seems like one that a jury could easily grasp and refute. The judicial shrug, however, raises the question why the defendant should have to address a data analysis that does not support the plaintiffs’ contention about “permanence.” The federal rules are supposed to advance the finding of the truth and the fair, speedy resolution of cases.

Sanofi’s more interesting argument, from the perspective of Rule 702 case law, was its claim that Madigan had relied upon a flawed methodology in analyzing the two clinical trials:

“Sanofi emphasizes that the results of each study individually produced no statistically significant results. Sanofi argues that Dr. Madigan cannot now combine the results of the studies to achieve statistical significance. The Court rejects Sanofi’s argument and finds that Sanofi’s concern goes to the weight of Dr. Madigan’s testimony, not to its admissibility.34”[2]

There seems to be a lot going on in the Rule 702 challenge that is not revealed in the cryptic language of the MDL district court. First, the court deployed the jurisprudentially horrific, conclusory language to dismiss a challenge that “goes to the weight …, not to … admissibility.” As discussed elsewhere, this judicial locution is rarely true, fails to explain the decision, and shows a lack of engagement with the actual challenge.[3] Of course, aside from the inanity of the expression, and the failure to explain or justify the denial of the Rule 702 challenge, the MDL court may have been able to provide a perfectly adequate explanation.

Second, the footnote in the quoted language, number 34, was to the infamous Milward case,[4] with the explanatory parenthetical that the First Circuit had reversed a district court for excluding testimony of an expert witness who had sought to “draw conclusions based on combination of studies, finding that alleged flaws identified by district court go to weight of testimony not admissibility.”[5] As discussed previously, the widespread use of the “weight not admissibility” locution, even by the Court of Appeals, does not justify it. More important, however, the invocation of Milward suggests that any alleged flaws in combining study results in a meta-analysis are always matters for the jury, no matter how arcane, technical, or threatening to validity they may be.

So was Judge Milazzo engaged in judicial dodging in Her Honor’s opinion in Taxotere? Although the citation to Milward tends to inculpate, the cursory description of the challenge raises questions whether the challenge itself was valid in the first place. Fortunately, in this era of electronic dockets, finding the actual Rule 702 motion is not very difficult, and we can inspect the challenge to see whether it was dodged or given short shrift. Remarkably, the reality is much more complicated than the simple, simplistic rejection by the MDL court would suggest.

Sanofi’s brief attacks three separate analyses proffered by David Madigan, and not surprisingly, the MDL court did not address every point made by Sanofi.[6] Sanofi’s point about the inappropriateness of conducting the meta-analysis was its third in its supporting brief:

“Third, Dr. Madigan conducted a statistical analysis on the TAX316 and GEICAM9805/TAX301 clinical trials separately and combined them to do a ‘meta-analysis’. But Dr. Madigan based his analysis on unproven assumptions, rendering his methodology unreliable. Even without those assumptions, Dr. Madigan did not find statistical significance for either of the clinical trials independently, making this analysis unhelpful to the trier of fact.”[7]

This introductory statement of the issue is itself not particularly helpful because it fails to explain why combining two individual randomized clinical trials (RCTs), each without “statistically significant” results, by meta-analysis would be unhelpful. Sanofi’s brief identified other problems with Madigan’s analyses, but eventually returned to the meta-analysis issue, with the heading:

“Dr. Madigan’s analysis of the individual clinical trials did not result in statistical significance, thus is unhelpful to the jury and will unfairly prejudice Sanofi.”[8]

After a discussion of some of the case law about statistical significance, Sanofi pressed its case against Madigan. Madigan’s statistical analysis of each of two RCTs apparently did not reach statistical significance, and Sanofi complained that permitting Madigan to present these two analyses with results that were “not statistically very impressive,” would confuse and mislead the jury.[9]

“Dr. Madigan tried to avoid that result here [of having two statistically non-significant results] by conducting a ‘meta-analysis’ — a greywashed term meaning that he combined two statistically insignificant results to try to achieve statistical significance. Madigan Report at 20 ¶ 53. Courts have held that meta-analyses are admissible, but only when used to reduce the numerical instability on existing statistically significant differences, not as a means to achieve statistical significance where it does not exist. RMSE at 361–362, fn76.”

Now the claims here are quite unsettling, especially considering that they were lodged in a defense brief, in an MDL, with many cases at stake, made on behalf of an important pharmaceutical company, represented by two large, capable national or international law firms.

First, what does the defense brief signify by placing ‘meta-analysis’ in quotes? Are these scare quotes to suggest that Madigan was passing off something as a meta-analysis that failed to be one? If so, nothing in the remainder of the brief explains such an interpretation. Meta-analysis has been around for decades, and reporting meta-analyses of observational or experimental studies has been the subject of numerous consensus and standard-setting papers over the last two decades. Furthermore, the FDA has now issued a draft guidance for the use of meta-analyses in pharmacoepidemiology. The scare quotes are at best unexplained, and at worst inappropriate. If the authors had something else in mind, they did not explain their use of quotation marks around meta-analysis.

Second, the defense lawyers referred to meta-analysis as a “greywashed” term. I am always eager to expand my vocabulary, and so I looked up the word in various dictionaries of statistical and epidemiologic terms. Nothing there. Perhaps it was not a technical term, so I checked with the venerable Oxford English Dictionary. No relevant entries.

Pushed to the wall, I checked the font of all knowledge – the internet. To be sure, I found definitions, but nothing that could explain this odd locution in a brief filed in an important motion:

gray-washing: “noun In calico-bleaching, an operation following the singeing, consisting of washing in pure water in order to wet out the cloth and render it more absorbent, and also to remove some of the weavers’ dressing.”

graywashed: “adj. adopting all the world’s cultures but not really belonging to any of them; in essence, liking a little bit of everything but not everything of a little bit.”

Those definitions do not appear pertinent.

Another website offered a definition based upon the “blogsphere”:

Graywash: “A fairly new term in the blogsphere, this means an investigation that deals with an offense strongly, but not strongly enough in the eyes of the speaker.”

Hmmm. Still not on point.

Another one from “Urban Dictionary” might capture something of what was being implied:

Graywashing: “The deliberate, malicious act of making art having characters appear much older and uglier than they are in the book, television, or video game series.”

Still, I am not sure how this is an argument that a federal judge can respond to in a motion affecting many cases.

Perhaps, you say, I am quibbling with word choices, and I am not sufficiently in tune with the way people talk in the Eastern District of Louisiana. I plead guilty to both counts. But the third, and most important point, is the defense assertion that meta-analyses are only admissible “when used to reduce the numerical instability on existing statistically significant differences, not as a means to achieve statistical significance where it does not exist.”

This assertion is truly puzzling. Meta-analyses involve so many layers of hearsay that they will virtually never be admissible; admissibility of the meta-analysis itself is virtually never the issue. When an expert witness has conducted a meta-analysis, or has relied upon one, the important legal question is whether the witness may reasonably rely upon the meta-analysis (under Rule 703) for an inference that satisfies Rule 702. The meta-analysis itself does not come into evidence, and does not go out to the jury for its deliberations.

But what about the defense brief’s “only when” language, which clearly implies that courts have held that expert witnesses may rely upon meta-analyses only to reduce “numerical instability on existing statistically significant differences”? This seems clearly wrong because achieving statistical significance from studies that have no “instability” in their point estimates, but that individually lack statistical significance, is a perfectly legitimate and valid goal. Consider a situation in which, for some reason, the sample size in each study is limited by the available observations, but we have 10 studies, each with a point estimate of 1.5, and each with a 95% confidence interval of (0.88, 2.5). This hypothetical presents no instability of point estimates, and the meta-analytic summary would retain the 1.5 point estimate while shrinking the confidence interval so that the lower bound excludes 1.0, in a perfectly valid analysis. In the real world, meta-analyses are conducted on studies with point estimates of risk that vary, because of random and non-random error, but there is no reason that meta-analyses cannot reduce random error to show that the summary point estimate is statistically significant at a pre-specified alpha, even though no constituent study was statistically significant.
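The hypothetical can be checked with a short fixed-effect (inverse-variance) meta-analysis sketch. The inputs are the illustrative numbers above, not data from any actual trial, and the calculation assumes independent studies:

```python
import math

# Illustrative inputs from the hypothetical: ten studies, each reporting a
# risk ratio of 1.5 with a 95% confidence interval of (0.88, 2.5).
studies = [(1.5, 0.88, 2.5)] * 10

def log_se(lo, hi):
    # Recover the standard error of the log risk ratio from the 95% CI.
    return (math.log(hi) - math.log(lo)) / (2 * 1.96)

# Fixed-effect (inverse-variance) pooling on the log scale.
weights = [1 / log_se(lo, hi) ** 2 for _, lo, hi in studies]
pooled_log = sum(w * math.log(rr) for (rr, _, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

pooled_rr = math.exp(pooled_log)
ci_lo = math.exp(pooled_log - 1.96 * pooled_se)
ci_hi = math.exp(pooled_log + 1.96 * pooled_se)
print(f"Pooled RR = {pooled_rr:.2f}, 95% CI ({ci_lo:.2f}, {ci_hi:.2f})")
# → Pooled RR = 1.50, 95% CI (1.27, 1.77)
```

Each constituent interval includes 1.0, yet the pooled interval excludes it: precisely the legitimate use of meta-analysis that the defense brief claimed was improper.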

Sanofi’s lawyers did not cite to any case for the remarkable proposition they advanced, but they did cite the Reference Manual for Scientific Evidence (RMSE). Earlier in the brief, the defense cited to this work in its third edition (2011), and so I turned to the cited page (“RMSE at 361–362, fn76”) only to find the introduction to the chapter on survey research, with footnotes 1 through 6.

After a diligent search through the third edition, I could not find any other language remotely supportive of the assertion by Sanofi’s counsel. There are important discussions about how a poorly conducted meta-analysis, or a meta-analysis that was heavily weighted in a direction by a methodologically flawed study, could render an expert witness’s opinion inadmissible under Rule 702.[10] Indeed, the third edition has a more sustained discussion of meta-analysis under the heading “VI. What Methods Exist for Combining the Results of Multiple Studies,”[11] but nothing in that discussion comes close to supporting the remarkable assertion by defense counsel.

On a hunch, I checked the second edition of RMSE, published in the year 2000. There was indeed a footnote 76, on page 361, which discussed meta-analysis. The discussion comes in the midst of the superseded edition’s chapter on epidemiology. Nothing, however, in the text or in the cited footnote appears to support the defense’s contention that meta-analyses are appropriate only when each included clinical trial has independently reported a statistically significant result.

If this analysis is correct, the MDL court was fully justified in rejecting the defense argument that combining two statistically non-significant clinical trials to yield a statistically significant result was methodologically infirm. No cases were cited, and the Reference Manual does not support the contention. Furthermore, no statistical text or treatise on meta-analysis supports the Sanofi claim. Sanofi did not support its motion with any affidavits of experts on meta-analysis.

Now there were other arguments advanced in support of excluding David Madigan’s testimony. Indeed, there was a very strong methodological challenge to Madigan’s decision to include the two RCTs in his meta-analysis, quite apart from those RCTs’ lack of statistical significance on the end point at issue. In the words of the Sanofi brief:

“Both TAX clinical trials examined two different treatment regimens, TAC (docetaxel in combination with doxorubicin and cyclophosphamide) versus FAC (5-fluorouracil in combination with doxorubicin and cyclophosphamide). Madigan Report at 18–19 ¶¶ 47–48. Dr. Madigan admitted that TAC is not Taxotere alone, Madigan Dep. 305:21–23 (Ex. B); however, he did not rule out doxorubicin or cyclophosphamide in his analysis. Madigan Dep. 284:4–12 (“Q. You can’t rule out other chemotherapies as causes of irreversible alopecia? … A. I can’t rule out — I do not know, one way or another, whether other chemotherapy agents cause irreversible alopecia.”).”[12]

Now unlike the statistical significance argument, this argument is rather straightforward and turns on the clinical heterogeneity of the two trials, which seems clearly to point to the invalidity of a meta-analysis of them. Sanofi’s lawyers could easily have supported this point with statements from standard textbooks and non-testifying experts (but alas did not). Sanofi did support its challenge, however, with citations to an important litigation and Fifth Circuit precedent.[13]

This closer look at the actual challenge to David Madigan’s opinions suggests that Sanofi’s counsel may have diluted very strong arguments about heterogeneity in the exposure variable, and in the outcome variable, by advancing what seems a very doubtful argument based upon the lack of statistical significance of the individual studies in the Madigan meta-analysis.

Sanofi advanced two very strong points, first about the irrelevant outcome variable definitions used by Madigan, and second about the complexity of Taxotere’s being used with other, and different, chemotherapeutic agents in each of the two trials that Madigan combined.[14] The MDL court addressed the first point in a perfunctory and ultimately unsatisfactory fashion, but did not address the second point at all.

Ultimately, the result was that Madigan was given a pass to offer extremely tenuous opinions in an MDL on causation. Given that Madigan has proffered tendentious opinions in the past, and has been characterized as “an expert on a mission,” whose opinions are “conclusion driven,”[15] the missteps in the briefing, and the MDL court’s abridgement of the gatekeeping process are regrettable. Also regrettable is that the merits or demerits of a Rule 702 challenge cannot be fairly evaluated from cursory, conclusory judicial decisions riddled with meaningless verbiage such as “the challenge goes to the weight and not the admissibility of the witness.” Access to the actual Rule 702 motion helped shed important light on the inadequacy of one point in the motion but also the complexity and fullness of the challenge that was not fully addressed in the MDL court’s decision. It is possible that a Reply or a Supplemental brief, or oral argument, may have filled in gaps, corrected errors, or modified the motion, and the above analysis missed some important aspect of what happened in the Taxotere MDL. If so, all the more reason that we need better judicial gatekeeping, especially when a decision can affect thousands of pending cases.[16]


[1]  In re Taxotere (Docetaxel) Prods. Liab. Litig., 2019 U.S. Dist. LEXIS 143642, at *13 (E.D. La. Aug. 23, 2019) [Op.]

[2]  Op. at *13-14.

[3]  “Judicial Dodgers – Weight not Admissibility” (May 28, 2020).

[4]  Milward v. Acuity Specialty Prods. Grp., Inc., 639 F.3d 11, 17-22 (1st Cir. 2011).

[5]  Op. at *13-14 (quoting and citing Milward, 639 F.3d at 17-22).

[6]  Memorandum in Support of Sanofi Defendants’ Motion to Exclude Expert Testimony of David Madigan, Ph.D., Document 6144, in In re Taxotere (Docetaxel) Prods. Liab. Litig. (E.D. La. Feb. 8, 2019) [Brief].

[7]  Brief at 2; see also Brief at 14 (restating without initially explaining why combining two statistically non-significant RCTs by meta-analysis would be unhelpful).

[8]  Brief at 16.

[9]  Brief at 17 (quoting from Madigan Dep. 256:14–15).

[10]  Michael D. Green, Michael Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” at 581n.89, in Fed. Jud. Center, Reference Manual on Scientific Evidence (3d ed. 2011).

[11]  Id. at 606.

[12]  Brief at 14.

[13]  Brief at 14, citing Burst v. Shell Oil Co., C. A. No. 14–109, 2015 WL 3755953, at *7 (E.D. La. June 16, 2015) (Vance, J.) (quoting LeBlanc v. Chevron USA, Inc., 396 F. App’x 94, 99 (5th Cir. 2010)) (“[A] study that notes ‘that the subjects were exposed to a range of substances and then nonspecifically note[s] increases in disease incidence’ can be disregarded.”), aff’d, 650 F. App’x 170 (5th Cir. 2016). See “The One Percent Non-solution – Infante Fuels His Own Exclusion in Gasoline Leukemia Case” (June 25, 2015).

[14]  Brief at 14-16.

[15]  In re Accutane Litig., 2015 WL 753674, at *19 (N.J.L.Div., Atlantic Cty., Feb. 20, 2015), aff’d, 234 N.J. 340, 191 A.3d 560 (2018). See “Johnson of Accutane – Keeping the Gate in the Garden State” (Mar. 28, 2015); “N.J. Supreme Court Uproots Weeds in Garden State’s Law of Expert Witnesses” (Aug. 8, 2018).

[16]  Cara Salvatore, “Sanofi Beats First Bellwether In Chemo Drug Hair Loss MDL,” Law360 (Sept. 27, 2019).

April Fool – Zambelli-Weiner Must Disclose

April 2nd, 2020

Back in the summer of 2019, Judge Saylor, the MDL judge presiding over the Zofran birth defect cases, ordered epidemiologist, Dr. Zambelli-Weiner to produce documents relating to an epidemiologic study of Zofran,[1] as well as her claimed confidential consulting relationship with plaintiffs’ counsel.[2]

This previous round of motion practice and discovery established that Zambelli-Weiner was a paid consultant in advance of litigation, that her Zofran study was funded by plaintiffs’ counsel, and that she presented at a Las Vegas conference, for plaintiffs’ counsel only, on [sic] how to make mass torts perfect. Furthermore, she had made false statements to the court about her activities.[3]

Zambelli-Weiner ultimately responded to the discovery requests, but she and plaintiffs’ counsel withheld several documents as confidential, pursuant to the MDL’s procedure for protective orders. Yesterday, April 1, 2020, Judge Saylor granted GlaxoSmithKline’s motion to de-designate four documents that plaintiffs claimed to be confidential.[4]

Zambelli-Weiner sought to resist GSK’s motion to compel disclosure of the documents on a claim that GSK was seeking the documents to advance its own litigation strategy. Judge Saylor acknowledged that Zambelli-Weiner’s psycho-analysis might be correct, but that GSK’s motive was not the critical issue. According to Judge Saylor, the proper inquiry was whether the claim of confidentiality was proper in the first place, and whether removing the cloak of secrecy was appropriate under the facts and circumstances of the case. Indeed, the court found “persuasive public-interest reasons” to support disclosure, including providing the FDA and the EMA a complete, unvarnished view of Zambelli-Weiner’s research.[5] Of course, the plaintiffs’ counsel, in close concert with Zambelli-Weiner, had created GSK’s need for the documents.

This discovery battle has no doubt been fought because plaintiffs and their testifying expert witnesses rely heavily upon the Zambelli-Weiner study to support their claim that Zofran causes birth defects. The present issue is whether four of the documents produced by Dr. Zambelli-Weiner pursuant to subpoena should continue to enjoy confidential status under the court’s protective order. GSK argued that the documents were never properly designated as confidential, and alternatively, the court should de-designate the documents because, among other things, the documents would disclose information important to medical researchers and regulators.

Judge Saylor’s Order considered GSK’s objections to plaintiffs’ and Zambelli-Weiner’s withholding four documents:

(1) Zambelli-Weiner’s Zofran study protocol;

(2) Undisclosed, hidden analyses that compared birth defect rates for children born to mothers who used Zofran with the rates seen with the use of other anti-emetic medications;

(3) An earlier draft of Zambelli-Weiner’s Zofran study, which she had prepared to submit to the New England Journal of Medicine; and

(4) Zambelli-Weiner’s advocacy document, a “Causation Briefing Document,” which she prepared for plaintiffs’ lawyers.

Judge Saylor noted that none of the withheld documents would typically be viewed as confidential. None contained “sensitive personal, financial, or medical information.”[6] The court dismissed Zambelli-Weiner’s contention that the documents all contained “business and proprietary information” as conclusory and meritless. Neither she nor plaintiffs’ counsel explained how the requested documents implicated proprietary information when Zambelli-Weiner’s only business at issue is to assist in making lawsuits. The court observed that she is not “engaged in the business of conducting research to develop a pharmaceutical drug or other proprietary medical product or device,” and that the research at issue related solely to her paid consultancy for plaintiffs’ lawyers. Neither she nor the plaintiffs’ lawyers showed how public disclosure would hurt her proprietary or business interests. Of course, if Zambelli-Weiner had been dishonest in carrying out the Zofran study, as reflected in deviations from its protocol, her professional credibility and her business of conducting such studies might well suffer. Zambelli-Weiner, however, was not prepared to affirm the antecedent of that hypothetical. In any event, the court found that whatever right Zambelli-Weiner might have enjoyed to avoid discovery evaporated with her previous dishonest representations to the MDL court.[7]

The Zofran Study Protocol

GSK sought production of the Zofran study protocol, which in theory contained the research plan for the Zofran study and the analyses the researchers intended to conduct. Zambelli-Weiner attempted to resist production on the specious theory that she had not published the protocol, but the court found this “non-publication” irrelevant to the claim of confidentiality. Most professional organizations, such as the International Society for Pharmacoepidemiology (“ISPE”), which ultimately published Zambelli-Weiner’s study, encourage the publication and sharing of study protocols.[8] Disclosure of protocols helps ensure the integrity of studies by allowing readers to assess whether the researchers have adhered to their study plan, or have engaged in ad hoc data dredging in search of a desired result.[9]

The Secret, Undisclosed Analyses

Perhaps even more egregious than withholding the study protocol was the refusal to disclose unpublished analyses comparing the rate of birth defects among children born to mothers who used Zofran with the birth defect rates of children with in utero exposure to other anti-emetic medications.  In ruling that Zambelli-Weiner must produce the unpublished analyses, the court expressed its skepticism over whether these analyses could ever have been confidential. Under ISPE guidelines, researchers must report findings that significantly affect public health, and the relative safety of Zofran is essential to its evaluation by regulators and prescribing physicians.

Not only was Zambelli-Weiner’s failure to include these analyses in her published article ethically problematic, but she apparently hid these analyses from the Pharmacovigilance Risk Assessment Committee (PRAC) of the European Medicines Agency, which specifically inquired of Zambelli-Weiner whether she had performed such analyses. As a result, the PRAC recommended a label change based upon Zambelli-Weiner’s failure to disclose material information. Furthermore, the plaintiffs’ counsel represented that they intended to oppose GSK’s citizen petition to the FDA, based upon the Zambelli-Weiner study. The apparently fraudulent non-disclosure of relevant analyses could not have been more fraught with public health significance. The MDL court found that the public health need trumped any (doubtful) claim to confidentiality.[10] Against the obvious public interest, Zambelli-Weiner offered no “compelling countervailing interest” in keeping her secret analyses confidential.

There were other aspects to the data-dredging rationale not discussed in the court’s order. Without seeing the secret analyses of other anti-emetics, readers were deprived of an important opportunity to assess actual and potential confounding in her study. Perhaps even more important, the statistical tools that Zambelli-Weiner used, including any measurements of p-values and confidence intervals, and any declarations of “statistical significance,” were rendered meaningless by her secret, undisclosed, multiple testing. As noted by the American Statistical Association (ASA) in its 2016 position statement, “4. Proper inference requires full reporting and transparency.”

The ASA explains that the proper inference from a p-value can be completely undermined by “multiple analyses” of study data, with selective reporting of sample statistics that have attractively low p-values, or cherry picking of suggestive study findings. The ASA points out that common practices of selective reporting compromise valid interpretation. Hence the correlative recommendation:

“Researchers should disclose the number of hypotheses explored during the study, all data collection decisions, all statistical analyses conducted and all p-values computed. Valid scientific conclusions based on p-values and related statistics cannot be drawn without at least knowing how many and which analyses were conducted, and how those analyses (including p-values) were selected for reporting.”[11]
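The arithmetic behind the ASA’s warning is straightforward: if each undisclosed analysis of null data is tested at the conventional 0.05 level, the probability of at least one nominally “significant” finding climbs quickly. A minimal sketch, assuming independent tests purely for illustration:

```python
# Family-wise chance of a false-positive "discovery" when k independent
# analyses of null data are each tested at alpha = 0.05.
alpha = 0.05
for k in (1, 5, 10, 20):
    p_any = 1 - (1 - alpha) ** k
    print(f"{k:2d} analyses -> P(at least one p < 0.05 by chance) = {p_any:.2f}")
# → 0.05, 0.23, 0.40, 0.64
```

With twenty secret analyses, the odds of at least one “significant” result arising by chance alone approach two in three, which is why the ASA insists on disclosure of how many and which analyses were conducted.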

The Draft Manuscript for the New England Journal of Medicine

The MDL court wasted little time and ink in dispatching Zambelli-Weiner’s claim of confidentiality for her draft New England Journal of Medicine manuscript. The court found that she failed to explain how any differences in content between this manuscript and the published version constituted “proprietary business information,” or how disclosure would cause her any actual prejudice.

Zambelli-Weiner’s Litigation Road Map

In a world where social justice warriors complain about organizations such as Exponent for its litigation support of defense efforts, the revelation that Zambelli-Weiner was helping to quarterback the plaintiffs’ offense deserves greater recognition. Zambelli-Weiner’s litigation road map was clearly created to help Grant & Eisenhofer, P.A., the plaintiffs’ lawyers, create a causation strategy (to which she would add her Zofran study). Such a document from a consulting expert witness is typically the sort of document that enjoys confidentiality and protection from litigation discovery. The MDL court, however, looked beyond Zambelli-Weiner’s role as a “consulting witness” to her involvement in designing and conducting research. The broader extent of her involvement in producing studies and communicating with regulators made her litigation “strategery” “almost certainly relevant to scientists and regulatory authorities” charged with evaluating her study.[12]

Despite Zambelli-Weiner’s protestations that she had made a disclosure of conflict of interest, the MDL court found her disclosure anemic and the public interest in knowing the full extent of her involvement in advising plaintiffs’ counsel, long before the study was conducted, great.[13]

The legal media has been uncommonly quiet about the rulings on April Zambelli-Weiner in the Zofran litigation. From the Union of Concerned Scientists, and other industry scolds such as David Egilman, David Michaels, and Carl Cranor – crickets. Meanwhile, as the appeal over the admissibility of her testimony pends before the Pennsylvania Supreme Court,[14] Zambelli-Weiner continues to create an unenviable record in Zofran, Accutane,[15] Mirena,[16] and other litigations.


[1]  April Zambelli‐Weiner, Christina Via, Matt Yuen, Daniel Weiner, and Russell S. Kirby, “First Trimester Pregnancy Exposure to Ondansetron and Risk of Structural Birth Defects,” 83 Reproductive Toxicology 14 (2019).

[2]  See In re Zofran (Ondansetron) Prod. Liab. Litig., 392 F. Supp. 3d 179, 182-84 (D. Mass. 2019) (MDL 2657) [cited as In re Zofran].

[3]  “Litigation Science – In re Zambelli-Weiner” (April 8, 2019); “Mass Torts Made Less Bad – The Zambelli-Weiner Affair in the Zofran MDL” (July 30, 2019). See also Nate Raymond, “GSK accuses Zofran plaintiffs’ law firms of funding academic study,” Reuters (Mar. 5, 2019).

[4]  In re Zofran Prods. Liab. Litig., MDL No. 1:15-md-2657-FDS, Order on Defendant’s Motion to De-Designate Certain Documents as Confidential Under the Protective Order (D.Mass. Apr. 1, 2020) [Order].

[5]  Order at n.3.

[6]  Order at 3.

[7]  See In re Zofran, 392 F. Supp. 3d at 186.

[8]  Order at 4. See also Xavier Kurz, Susana Perez-Gutthann, the ENCePP Steering Group, “Strengthening standards, transparency, and collaboration to support medicine evaluation: Ten years of the European Network of Centres for Pharmacoepidemiology and Pharmacovigilance (ENCePP),” 27 Pharmacoepidemiology & Drug Safety 245 (2018).

[9]  Order at note 2 (citing Charles J. Walsh & Marc S. Klein, “From Dog Food to Prescription Drug Advertising: Litigating False Scientific Establishment Claims Under the Lanham Act,” 22 Seton Hall L. Rev. 389, 431 (1992) (noting that adherence to study protocol “is essential to avoid ‘data dredging’—looking through results without a predetermined plan until one finds data to support a claim”)).

[10]  Order at 5, citing Anderson v. Cryovac, Inc., 805 F.2d 1, 8 (1st Cir. 1986) (describing public-health concerns as “compelling justification” for requiring disclosing of confidential information).

[11]  Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” 70 The American Statistician 129 (2016).

See also “The American Statistical Association’s Statement on and of Significance” (March 17, 2016); “Courts Can and Must Acknowledge Multiple Comparisons in Statistical Analyses” (Oct. 14, 2014).

[12]  Order at 6.

[13]  Cf. Elizabeth J. Cabraser, Fabrice Vincent & Alexandra Foote, “Ethics and Admissibility: Failure to Disclose Conflicts of Interest in and/or Funding of Scientific Studies and/or Data May Warrant Evidentiary Exclusions,” Mealey’s Emerging Drugs Reporter (Dec. 2002) (arguing that failure to disclose conflicts of interest and study funding should result in evidentiary exclusions).

[14]  Walsh v. BASF Corp., GD #10-018588 (Oct. 5, 2016, Pa. Ct. C.P. Allegheny Cty., Pa.) (finding that Zambelli-Weiner’s and Nachman Brautbar’s opinions that pesticides generally cause acute myelogenous leukemia, that even the smallest exposure to benzene increases the risk of leukemia offended generally accepted scientific methodology), rev’d, 2018 Pa. Super. 174, 191 A.3d 838, 842-43 (Pa. Super. 2018), appeal granted, 203 A.3d 976 (Pa. 2019).

[15]  In re Accutane Litig., No. A-4952-16T1 (N.J. App. Div. Jan. 17, 2020) (affirming exclusion of Zambelli-Weiner as an expert witness).

[16]  In re Mirena IUD Prods. Liab. Litig., 169 F. Supp. 3d 396 (S.D.N.Y. 2016) (excluding Zambelli-Weiner in part).

Practical Solutions for the Irreproducibility Crisis

March 3rd, 2020

I have previously praised the efforts of the National Association of Scholars (NAS) to sponsor a conference on “Fixing Science: Practical Solutions for the Irreproducibility Crisis.” The conference was a remarkable event, with a good deal of diverse viewpoints, civil discussion and debate, and collegiality.

The NAS has now posted a follow-up to its conference, with a link to slide presentations, and to a YouTube page with videos of the presentations. The NAS, along with The Independent Institute, should be commended for their organizational efforts, and their transparency in making the conference contents available now to a wider audience.

The conference took place on February 7th and 8th, and I had the privilege of starting the event with my presentation, “Not Just an Academic Dispute: Irreproducible Scientific Evidence Renders Legal Judgments Unsafe”.

Some, but not all, of the interesting presentations that followed:

Tim Edgell, “Stylistic Bias, Selective Reporting, and Climate Science” (Feb. 7, 2020)

Patrick J. Michaels, “Biased Climate Science” (Feb. 7, 2020)

Daniele Fanelli, “Reproducibility Reforms if there is no Irreproducibility Crisis” (Feb. 8, 2020)

On Saturday, I had the additional privilege of moderating a panel on “Group Think” in science, and its potential for skewing research focus and publication:

Lee Jussim, “Intellectual Diversity Limits Groupthink in Scientific Psychology” (Feb. 8, 2020)

Mark Regnerus, “Groupthink in Sociology” (Feb. 8, 2020)

Michael Shermer, “Giving the Devil His Due” (Feb. 8, 2020)

Later on Saturday, the presenters turned to methodological issues, many of which are key to understanding ongoing scientific and legal controversies:

Stanley Young, “Prevention and Management of Acute and Late Toxicities in Radiation Oncology”

James E. Enstrom, “Reproducibility is Essential to Combating Environmental Lysenkoism”

Deborah Mayo, “P-Value ‘Reforms’: Fixing Science or Threats to Replication and Falsification?” (Feb. 8, 2020)

Ronald L. Wasserstein, “What Professional Organizations Can Do To Fix The Irreproducibility Crisis” (Feb. 8, 2020)

Louis Anthony Cox, Jr., “Causality, Reproducibility, and Scientific Generalization in Public Health” (Feb. 8, 2020)

David Trafimow, “What Journals Can Do To Fix The Irreproducibility Crisis” (Feb. 8, 2020)

David Randall, “Regulatory Science and the Irreproducibility Crisis” (Feb. 8, 2020)

Counter Cancel Culture – The NAS Conference on Irreproducibility

February 9th, 2020

“The meaning of the world is the separation of wish and fact.”  Kurt Gödel

Back in October 2019, David Randall, the Director of Research, of the National Association of Scholars, contacted me to ask whether I would be interested in presenting at a conference, to be titled “Fixing Science: Practical Solutions for the Irreproducibility Crisis.” David explained that the conference would be aimed at a high level consideration of whether such a crisis existed, and if so, what salutary reforms might be implemented.

As for the character and commitments of the sponsoring organizations, David was candid and forthcoming. I will quote him, without his permission, and ask his forgiveness later:

“The National Association of Scholars is taken to be conservative by many scholars; the Independent Institute is (broadly speaking) in the libertarian camp. The NAS is open to but currently agnostic about the degree of human involvement in climate change. The Independent Institute I take to be institutionally skeptical of consensus climate change theory–e.g., they recently hosted Willie Soon for lecture. A certain number of speakers prefer not to participate in events hosted by institutions with these commitments.”

To me, the ask was for a presentation on how the so-called replication crisis, or the irreproducibility crisis, affected the law. This issue was certainly one I have had much occasion to consider. Although I am aware of the “adjacency” arguments made by some that people should be mindful of whom they align with, I felt that nothing in my participation would compromise my own views or unduly accredit institutional positions of the sponsors.

I was flattered by the invitation, but I did some due diligence on the sponsoring organizations. I vaguely recalled the Independent Institute from my more libertarian days, but the National Association of Scholars (NAS, not to be confused with Nathan A. Schachtman) was relatively unknown to me. A little bit of research showed that the NAS had a legitimate interest in the irreproducibility crisis. David Randall had written a monograph for the organization, which was a nice summary of some of the key problems. The Irreproducibility Crisis of Modern Science: Causes, Consequences, and the Road to Reform (2018).

On other issues, the NAS seemed to live up to its description as “an organization of scholars committed to higher education as the catalyst of American freedom.” I listened to some of the group’s podcasts, Curriculum Vitae, and browsed through its publications. I found myself agreeing with many positions articulated by or through the NAS, and disagreeing with a few positions very strongly.

In looking over the list of other invited speakers, I saw a great diversity of viewpoints and approaches. One distinguished speaker, Daniele Fanelli, had criticized the very notion that there was a reproducibility crisis. In the world of statistics, there were strong defenders of statistical tests, and vociferous critics. I decided to accept the invitation, not because I was flattered, but because the replication issue was important, and I believed that I could add something to the discussion before an audience of professional scientists, statisticians, and educated lay persons. In writing to David Randall to accept the invitation, I told him that with respect to the climate change issues, I was not at all put off by healthy skepticism in the face of all dogmas. Every dogma will have its day.

I did not give any further consideration to the political aspect of the conference until early January, when I received an email from a scientist, Lenny Teytelman, Ph.D., the C.E.O. of protocols.io, a company that addresses reproducibility issues. Dr. Teytelman’s interest in improving reproducibility seemed quite genuine, but he wrote to express his deep concern about the conference and the organizations that were sponsoring it.

Perhaps a bit pedantically, he cautioned me that the NAS was not the National Academy of Sciences, a confusion that never occurred to me because the National Academies has been known as the National Academies of Sciences, Engineering, and Medicine for several years now. Dr. Teytelman’s real concern seemed to be that the NAS is a “‘politically conservative advocacy group’.” (The internal scare quotes were Teytelman’s, but I was not afraid.) According to Dr. Teytelman, the NAS sought to undermine climate science and environmental protection by advancing a call for more reproducible science. He pointed me to what he characterized as an exposé on the NAS in Undark,1 and he cautioned me that the National Association of Scholars’ work is “dangerous.” Finally, Dr. Teytelman urged me to reconsider my decision to participate in the conference.

I did reconsider my decision, but reaffirmed it in an email I sent back to Dr. Teytelman. I realized that I could be wrong, in which case, I would eat my words, confident that they would be most digestible:

Dear Dr Teytelman,

Thank you for your note. I was aware of the piece on Undark’s website, as well as the difference between the NAS and the NASEM. I don’t believe anyone involved in science education would likely be confused between the two organizations. A couple of years ago, I wrote a teaching module on biomedical causation for the National Academies. This is my first presentation at the request of the NAS, and frankly I am honored by the organization’s request that I present at its conference.

I have read other materials that have been critical of the NAS and its publications on climate change and other issues. I know that there are views of the organization from which I would dissent, but I do not see my disagreement on some issues as a reason not to attend, and present at a conference on an issue of great importance to the legal system.

I am hardly an expert on climate change issues, and that is my failing. Most of my professional work involves health effects regulation and litigation. If the NAS has advanced sophistical arguments against a scientific claim, then the proper antidote will be to demonstrate its fallacious reasoning and misleading marshaling of evidence. I should think, however, as someone interested in improving the reproducibility of scientific research, you will agree that there is much common ground for discussion and reform of scientific practice, on a broader arrange [sic] of issues than climate change.

As for the political ‘conservatism’ of the organization, I am not sure why that is a reason to eschew participation in a conference that should be of great importance to people of all political views. My own politics probably owe much to the influence of Michael Oakeshott, which puts me in perhaps the smallest political tribe of any in the United States. If conservatism means antipathy to post-modernism, identity politics, political orthodoxies, and assaults on Enlightenment values and the Rule of Law, then count me in.

In any event, thanks for your solicitude. I think I can participate and return with my soul intact.

All the best.

Nathan

To his credit, Dr. Teytelman tenaciously continued. He acknowledged that the political leanings of the organizers were not a reason to boycott, but he politely pressed his case. We were now on a first name basis:

Dear Nathan,

I very much applaud all efforts to improve the rigour of our science. The problem here is that this NAS organization has a specific goal – undermining the environmental protection and denying climate change. This is why 7 out of the 21 speakers at the event are climate change deniers. [https://docs.google.com/spreadsheets/d/136FNLtJzACc6_JbbOxjy2urbkDK7GefRZ/edit?usp=sharing] And this isn’t some small fringe effort to be ignored. Efforts of this organization and others like them have now gotten us to the brink of a regulatory change at the United States Environmental Protection Agency which can gut the entire EPA (see a recent editorial against this I co-authored). This conference is not a genuine effort to talk about reproducibility. The reproducibility part is a clever disguise for pushing a climate change denialism agenda.

Best,

Lenny

I looked more carefully at Lenny’s spreadsheet, and considered the issue afresh. We were both pretty stubborn:

Dear Lenny,

Thank you for this information. I will review with interest.

I do not see that the conference is primarily or even secondarily about climate change vel non. There are two scientists, Trafimow and Wasserstein, with whom I have some disagreements about statistical methodology. Tony Cox and Stan Young, whatever their political commitments or views on climate change may be, are both very capable statisticians, from whom I have learned a great deal. The conference should be a lively conversation about reproducibility, not about climate change. Given your interests and background, you should go.

I believe that your efforts here are really quite illiberal, although they are in line with the ‘cancel culture’, so popular on campuses these days.

Forty-three years ago, I entered a Roman Catholic church to marry the woman I love. There were no lightning bolts or temblors, even though I was then, and am now, an atheist. Yes, I am still married to my first wife. Although I share the late Christopher Hitchens’s low view of the Catholic Church, somehow I managed to overcome my antipathy to being married in what some would call a house of ill repute. I even manage to agree with some Papist opinions, although not for the superstitious reasons Papists embrace.

If I could tolerate the RC Church’s dogma for a morning, perhaps you could put aside the dichotomous ‘us and them’ view of the world and participate in what promises to be an interesting conference on reproducibility?

All the best.

Nathan

Lenny kindly acknowledged my having considered his issues, and wrote back a nice note, which I will quote again in full without permission, but with the hope that he will forgive me and even acknowledge that I have given his views an airing in this forum.

Hi Nathan,

We’ll have to agree to disagree. I don’t want to give a veneer of legitimacy to an organization whose goal is not improving reproducibility but derailing EPA and climate science.

Warmly,

Lenny

The business of psychoanalyzing motives and disparaging speakers and conference organizers is a dangerous business for several reasons. First, motives can be inscrutable. Second, they can be misinterpreted. And third, they can be mixed. When speaking of organizations, there is the further complication of discerning a corporate motive among the constituent members.

The conference was an exciting, intellectually challenging event, which took place in Oakland, California, on February 7 and 8. I can report back to Lenny that his characterizations of and fears about the conference were unwarranted. While there were some assertions of climate change skepticism made with little or no evidence, the evidence-based presentations essentially affirmed climate change and sought to understand its causes and future course in a scientific way. But climate change was not why I went to this conference. On the more general issue of reform of scientific procedures and methods, we had open debates, some agreement on important principles, and robust and reasoned disagreement.

Lenny, you were correct that the NAS should not be ignored, but you should have gone to the meeting and participated in the conversation.


1 Michael Schulson, “A Remedy for Broken Science, Or an Attempt to Undercut It?” Undark (April 18, 2018).

American Statistical Association – Consensus versus Personal Opinion

December 13th, 2019

Lawyers and judges pay close attention to standards, guidances, and consensus statements from respected and recognized professional organizations. Deviations from these standards may be presumptive evidence of malpractice or malfeasance in civil and criminal litigation, in regulatory matters, and in other contexts. One important, recurring situation arises when trial judges must act as gatekeepers of the admissibility of expert witness opinion testimony. In making this crucial judicial determination, judges will want to know whether a challenged expert witness has deviated from an accepted professional standard of care or practice.

In 2016, the American Statistical Association (ASA) published a consensus statement on p-values. The ASA statement grew out of a lengthy process that involved assembling experts of diverse viewpoints. In October 2015, the ASA convened a two-day meeting for 20 experts to meet and discuss areas of core agreement. Over the following three months, the participating experts and the ASA Board members continued their discussions, which led to the ASA Executive Committee’s approval of the statement that was published in March 2016.[1]

The ASA 2016 Statement spelled out six relatively uncontroversial principles of basic statistical practice.[2] Far from rejecting statistical significance, the six principles embraced statistical tests as an important but insufficient basis for scientific conclusions:

“3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.”

Despite the fairly clear and careful statement of principles, legal actors did not take long to misrepresent the ASA principles.[3] What had been a prescription about the insufficiency of p-value thresholds was distorted into strident assertions that statistical significance was unnecessary for scientific conclusions.

Three years after the ASA published its p-value consensus document, ASA Executive Director Ronald Wasserstein and two other statisticians published an editorial in a supplemental issue of The American Statistician, in which they called for the abandonment of significance testing.[4] Although Wasserstein’s editorial was clearly labeled as such, it introduced the special journal issue, and it appeared, without disclaimer, over his name and his official status as the ASA Executive Director.

Sowing further confusion, the editorial made the following pronouncement:[5]

“The [2016] ASA Statement on P-Values and Statistical Significance stopped just short of recommending that declarations of ‘statistical significance’ be abandoned. We take that step here. We conclude, based on our review of the articles in this special issue and the broader literature, that it is time to stop using the term “statistically significant” entirely. Nor should variants such as ‘significantly different’, ‘p < 0.05’, and ‘nonsignificant’ survive, whether expressed in words, by asterisks in a table, or in some other way.”

The ASA is a collective body, and its ASA Statement 2016 was a statement from that body, which spoke after lengthy deliberation and debate. The language, quoted above, moves within one paragraph, from the ASA Statement to the royal “We,” who are taking the step of abandoning the term “statistically significant.” Given the unqualified use of the collective first person pronoun in the same paragraph that refers to the ASA, combined with Ronald Wasserstein’s official capacity, and the complete absence of a disclaimer that this pronouncement was simply a personal opinion, a reasonable reader could hardly avoid concluding that this pronouncement reflected ASA policy.

Your humble blogger, and others, read Wasserstein’s 2019 editorial as an ASA statement.[6] Although it is true that the 2019 paper is labeled “editorial,” and that the editorial does not describe a consensus process, there is no disclaimer such as is customary when someone in an official capacity publishes a personal opinion. Indeed, rather than the usual disclaimer, the Wasserstein editorial thanks the ASA Board of Directors “for generously and enthusiastically supporting the ‘p-values project’ since its inception in 2014.” This acknowledgement strongly suggests that the editorial is itself part of the “p-values project,” which is “enthusiastically” supported by the ASA Board of Directors.

If the editorial were not itself confusing enough, an unsigned email from “ASA <asamail@amstat.org>” was sent out in July 2019, in which the anonymous ASA author(s) takes credit for changing statistical guidelines at the New England Journal of Medicine:[7]

From: ASA <asamail@amstat.org>
Date: Thu, Jul 18, 2019 at 1:38 PM
Subject: Major Medical Journal Updates Statistical Policy in Response to ASA Statement
To: <XXXX>

The email is itself an ambiguous piece of evidence as to what the ASA is claiming. The email says that the New England Journal of Medicine changed its guidelines “in response to the ASA Statement on P-values and Statistical Significance and the subsequent The American Statistician special issue on statistical inference.” Of course, the “special issue” was not just Wasserstein’s editorial, but the 42 other papers. So this claim leaves open to doubt exactly what in the 2019 special issue the NEJM editors were responding to. Given that the 42 articles that followed Wasserstein’s editorial did not all agree with Wasserstein’s “steps taken,” or with each other, the only landmark in the special issue was the editorial over the name of the ASA’s Executive Director.

Moreover, a reading of the NEJM revised guidelines does not suggest that the journal’s editors were unduly influenced by the Wasserstein editorial or the 42 accompanying papers. The journal mostly responded to the ASA 2016 consensus paper by putting some teeth into its Principle 4, which dealt with multiplicity concerns in submitted manuscripts. The newly adopted (2019) NEJM author guidelines do not step out with Wasserstein and colleagues; there is no general prohibition on p-values or statements of “statistical significance.”
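The multiplicity concern behind Principle 4 is easy to quantify. The following back-of-the-envelope sketch uses my own illustrative numbers, not anything drawn from the NEJM guidelines themselves:

```python
# Family-wise false-positive probability for k independent tests at level alpha.
# With no multiplicity correction, "significant" findings are expected to
# surface by chance alone as the number of looks at the data grows.
alpha, k = 0.05, 20
fwer = 1 - (1 - alpha) ** k  # probability of at least one false positive

print(f"FWER for {k} independent tests at alpha = {alpha}: {fwer:.2f}")
# With 20 independent tests of true null hypotheses, the chance of at least
# one spurious "p < 0.05" result is roughly 64%.
```

This is why journal editors ask authors either to pre-specify a limited number of comparisons or to adjust the significance threshold when many endpoints are examined.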

The confusion propagated by the Wasserstein 2019 editorial has not escaped the attention of other ASA officials. An editorial in the June 2019 issue of AmStat News, by ASA President Karen Kafadar, noted the prevalent confusion and uneasiness over the 2019 The American Statistician special issue, the lack of consensus, and the need for healthy debate.[8]

In this month’s issue of AmStat News, President Kafadar returned to the confusion over the 2019 special issue of The American Statistician, in her “President’s Corner.” Because Executive Director Wasserstein’s editorial language about “we now take this step” is almost certain to find its way into opportunistic legal briefs, Kafadar’s comments are worth noting in some detail:[9]

“One final challenge, which I hope to address in my final month as ASA president, concerns issues of significance, multiplicity, and reproducibility. In 2016, the ASA published a statement that simply reiterated what p-values are and are not. It did not recommend specific approaches, other than ‘good statistical practice … principles of good study design and conduct, a variety of numerical and graphical summaries of data, understanding of the phenomenon under study, interpretation of results in context, complete reporting and proper logical and quantitative understanding of what data summaries mean’.

The guest editors of the March 2019 supplement to The American Statistician went further, writing: ‘The ASA Statement on P-Values and Statistical Significance stopped just short of recommending that declarations of “statistical significance” be abandoned. We take that step here. … [I]t is time to stop using the term “statistically significant” entirely’.

Many of you have written of instances in which authors and journal editors – and even some ASA members – have mistakenly assumed this editorial represented ASA policy. The mistake is understandable: The editorial was coauthored by an official of the ASA. In fact, the ASA does not endorse any article, by any author, in any journal – even an article written by a member of its own staff in a journal the ASA publishes.”

Kafadar’s caveat should quash incorrect assertions about the ASA’s position on statistical significance testing. It is a safe bet, however, that such assertions will appear in trial and appellate briefs.

Statistical reasoning is difficult enough for most people, but the hermeneutics of American Statistical Association publications on statistical significance may require a doctorate in divinity. In a cleverly titled post, Professor Deborah Mayo argues that there is no way to interpret the Wasserstein 2019 editorial except as laying down an ASA prescription. Deborah G. Mayo, “Les stats, c’est moi,” Error Statistics Philosophy (Dec. 13, 2019). I accept President Kafadar’s correction at face value, and accept that I, like many other readers, misinterpreted the Wasserstein editorial as having the imprimatur of the ASA. Mayo points out, however, that Kafadar’s correction in a newsletter may be insufficient at this point, and that a stronger disclaimer is required. Officers of the ASA are certainly entitled to their opinions and to the opportunity to present them, but disclaimers would bring clarity and transparency to the published work of these officials.

Wasserstein’s 2019 editorial goes further to make a claim about how his “step” will ameliorate the replication crisis:

“In this world, where studies with ‘p < 0.05’ and studies with ‘p > 0.05’ are not automatically in conflict, researchers will see their results more easily replicated – and, even when not, they will better understand why.”

The editorial here seems to attempt to define replication failure out of existence. The claim, as stated, is problematic. A sophisticated practitioner may think of the situation in which two studies, one with p = 0.048, and another with p = 0.052, might be said not to be in conflict. In real-world litigation, however, advocates will take Wasserstein’s statement about studies not in conflict (despite p-values above and below a threshold, say 5%) to extremes. We can anticipate claims that two similar studies with p-values above and below 5%, say with one p-value at 0.04, and the other at 0.40, will be described as not in conflict, with the second a replication of the first. It is hard to see how this possible interpretation of Wasserstein’s editorial, although consistent with its language, will advance sound, replicable science.[10]
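To make the ambiguity concrete, here is a minimal numerical sketch. The effect sizes and standard errors are hypothetical, chosen only so that the two p-values come out near 0.04 and 0.40:

```python
import math

def two_sided_p(effect, se):
    """Two-sided p-value for a normal test statistic z = effect / se."""
    z = abs(effect / se)
    return math.erfc(z / math.sqrt(2))

# Two hypothetical studies with the SAME estimated effect (say, a log
# relative risk of 0.30) but different precision (standard errors):
p_large = two_sided_p(0.30, 0.146)  # bigger study, smaller standard error
p_small = two_sided_p(0.30, 0.357)  # smaller study, larger standard error

print(f"p (large study) = {p_large:.2f}")  # ~0.04
print(f"p (small study) = {p_small:.2f}")  # ~0.40
```

In this contrived pair, the point estimates agree exactly and only precision differs, so the studies genuinely are not in conflict. But the same two p-values could just as easily arise from studies with very different point estimates, which is exactly why the bare assertion that p = 0.04 and p = 0.40 “are not in conflict” settles nothing.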


[1] Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” 70 The Am. Statistician 129 (2016).

[2] “The American Statistical Association’s Statement on and of Significance” (Mar. 17, 2016).

[3] See, e.g., “The Education of Judge Rufe – The Zoloft MDL” (April 9, 2016) (Zoloft litigation); “The ASA’s Statement on Statistical Significance – Buzzing from the Huckabees” (Mar. 19, 2016); “The American Statistical Association Statement on Significance Testing Goes to Court – Part I” (Nov. 13, 2018).

[4] Ronald L. Wasserstein, Allen L. Schirm, and Nicole A. Lazar, “Editorial: Moving to a World Beyond ‘p < 0.05’,” 73 Am. Statistician S1, S2 (2019).

[5] Id. at S2.

[6] See “Has the American Statistical Association Gone Post-Modern?” (Mar. 24, 2019); Deborah G. Mayo, “The 2019 ASA Guide to P-values and Statistical Significance: Don’t Say What You Don’t Mean,” Error Statistics Philosophy (June 17, 2019); B. Haig, “The ASA’s 2019 update on P-values and significance,” Error Statistics Philosophy (July 12, 2019).

[7] See “Statistical Significance at the New England Journal of Medicine” (July 19, 2019); see also Deborah G. Mayo, “The NEJM Issues New Guidelines on Statistical Reporting: Is the ASA P-Value Project Backfiring?” Error Statistics Philosophy (July 19, 2019).

[8] See Karen Kafadar, “Statistics & Unintended Consequences,” AmStat News 3, 4 (June 2019).

[9] Karen Kafadar, “The Year in Review … And More to Come,” AmStat News 3 (Dec. 2019).

[10]  See also Deborah G. Mayo, “P‐value thresholds: Forfeit at your peril,” 49 Eur. J. Clin. Invest. e13170 (2019).

 

Is the IARC Lost in the Weeds?

November 30th, 2019

A couple of years ago, I met David Zaruk at a Society for Risk Analysis meeting, where we were both presenting. I was aware of David’s blogging and investigative journalism, but meeting him gave me a greater appreciation for the breadth and depth of his work. For those of you who do not know David, he is present in cyberspace as the Risk-Monger who blogs about risk and science communications issues. His blog has featured cutting-edge exposés about the distortions in risk communications perpetuated by the advocacy of non-governmental organizations (NGOs). Previously, I have recorded my objections to the intellectual arrogance of some such organizations that purport to speak on behalf of the public interest, when often they act in cahoots with the lawsuit industry in the manufacturing of tort and environmental litigation.

David’s writing on the lobbying and control of NGOs by plaintiffs’ lawyers from the United States should be required reading for everyone who wants to understand how litigation sausage is made. His series, “SlimeGate,” details the interplay among NGO lobbying, lawsuit industry maneuvering, and carcinogen determinations at the International Agency for Research on Cancer (IARC). The IARC, a branch of the World Health Organization, is headquartered in Lyon, France. The IARC convenes “working groups” to review the scientific studies of the carcinogenicity of various substances and processes. The IARC working groups produce “monographs” of their reviews, and the IARC publishes these monographs, in print and on-line. The United States is in the top tier of participating countries for funding the IARC.

The IARC was founded in 1965, when observational epidemiology was still very much an emerging science, with expertise concentrated in only a few countries. For its first few decades, the IARC enjoyed a good reputation, and its monographs were considered definitive reviews, especially under its first director, Dr. John Higginson, from 1966 to 1981.[1] By the end of the 20th century, the need for the IARC and its reviews had waned, as the methods of systematic review and meta-analysis had evolved significantly, and had become more widely standardized and practiced.

Understandably, the IARC has been concerned that the members of its working groups should be viewed as disinterested scientists. Unfortunately, this concern has been translated into an asymmetrical standard that excludes anyone with a hint of manufacturing connection, but keeps the door open for those scientists with deep lawsuit industry connections. Speaking on behalf of the plaintiffs’ bar, Michael Papantonio, a plaintiffs’ lawyer who founded Mass Torts Made Perfect, noted that “We [the lawsuit industry] operate just like any other industry.”[2]

David Zaruk has shown how this asymmetry has been exploited mercilessly by the lawsuit industry and its agents in connection with the IARC’s review of glyphosate.[3] The resulting IARC classification of glyphosate has led to a litigation firestorm and an all-out assault on agricultural sustainability and productivity.[4]

The anomaly of the IARC’s glyphosate classification has been noted by scientists as well. Dr. Geoffrey Kabat is a cancer epidemiologist who has written perceptively on the misunderstandings and distortions of cancer risk assessments in various settings.[5] He has previously written about glyphosate in Forbes and elsewhere, but recently he published an important essay on glyphosate in Issues in Science and Technology, which is published by the National Academies of Sciences, Engineering, and Medicine and Arizona State University. In his essay, Dr. Kabat details how the IARC’s evaluation of glyphosate is an outlier in the scientific and regulatory world, and is not well supported by the available evidence.[6]

The problems with the IARC are both substantive and procedural.[7] One of the key problems facing IARC evaluations is an incoherent classification scheme. IARC evaluations classify putative human carcinogenic risks into five categories: Group 1 (known), Group 2A (probably), Group 2B (possibly), Group 3 (unclassifiable), and Group 4 (probably not). Group 4 is virtually an empty set, with only one substance, caprolactam ((CH2)5C(O)NH), an organic compound used in the manufacture of nylon.

In the IARC evaluation at issue, glyphosate was placed into Group 2A, which would seem to satisfy the legal system’s requirement that an exposure more likely than not causes the harm in question. Appearances and word usage, however, can be deceiving. Probability is a continuous scale from zero to one. In Bayesian decision making, zero and one are unavailable as starting points because, if either were our prior, no amount of evidence could ever change our judgment of the probability of causation (Cromwell’s Rule). The IARC informs us that its use of “probably” is quite idiosyncratic; the probability that a Group 2A agent causes cancer has “no quantitative” meaning. All the IARC intends is that a Group 2A classification “signifies a greater strength of evidence than possibly carcinogenic.”[8]

In other words, Group 2A classifications are consistent with having posterior probabilities of less than 0.5 (or 50 percent). A working group could judge the probability of a substance or a process to be carcinogenic to humans to be greater than zero, but no more than five or ten percent, and still vote for a 2A classification, in keeping with the IARC Preamble. This low probability threshold for a 2A classification converts the judgment of “probably carcinogenic” into a precautionary prescription, rendered when the most probable assessment is either ignorance or lack of causality. There is thus a practical certainty, close to 100%, that a 2A classification will confuse judges and juries, as well as the scientific community.
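The arithmetic of the point is simple. In the following toy sketch, the prior and likelihood ratio are invented purely for illustration; nothing here comes from any IARC evaluation:

```python
def bayes_update(prior, likelihood_ratio):
    """Posterior probability via the odds form of Bayes' theorem."""
    if prior in (0.0, 1.0):
        return prior  # Cromwell's Rule: certainty cannot be moved by evidence
    prior_odds = prior / (1.0 - prior)
    post_odds = likelihood_ratio * prior_odds
    return post_odds / (1.0 + post_odds)

# Dogmatic priors of zero or one are immune to any amount of evidence:
assert bayes_update(0.0, 1000.0) == 0.0
assert bayes_update(1.0, 0.001) == 1.0

# A modest prior (5%) updated on moderately supportive evidence (likelihood
# ratio of 3) yields a "greater strength of evidence" than before -- yet a
# posterior far below the 50% needed for "more likely than not":
posterior = bayes_update(0.05, 3.0)
print(f"posterior = {posterior:.3f}")  # ~0.136
```

On these illustrative numbers, the evidence has tripled the odds of carcinogenicity, which would comfortably satisfy the IARC’s “greater strength of evidence” language, while leaving the posterior probability at roughly 14 percent.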

In IARC-speak, a 2A “probability” connotes “sufficient evidence” in experimental animals, and “limited evidence” in humans. A substance can receive a 2A classification even when the sufficient evidence of carcinogenicity occurs in one non-human animal species, even though other animal species fail to show carcinogenicity. A 2A classification can raise the thorny question in court whether a claimant is more like a rat or a mouse.

Similarly, “limited evidence” in humans can be based upon inconsistent observational studies that fail to measure and adjust for known and potential confounding risk factors and systematic biases. The 2A classification requires little substantively or semantically, and many 2A classifications leave juries and judges to determine whether a chemical or medication caused a human being’s cancer, when the basic predicates for Sir Austin Bradford Hill’s factors for causal judgment have not been met.[9]

In courtrooms, IARC 2A classifications should be excluded as legally irrelevant, under Rule 403. Even if a 2A IARC classification were a credible judgment of causation, admitting evidence of the classification would be “substantially outweighed by a danger of … unfair prejudice, confusing the issues, [and] misleading the jury….”[10]

The IARC may be lost in the weeds, but there is no need to fret. A little Roundup™ will help.


[1]  See John Higginson, “The International Agency for Research on Cancer: A Brief History of Its History, Mission, and Program,” 43 Toxicological Sci. 79 (1998).

[2]  Sara Randazzo & Jacob Bunge, “Inside the Mass-Tort Machine That Powers Thousands of Roundup Lawsuits,” Wall St. J. (Nov. 25, 2019).

[3]  David Zaruk, “The Corruption of IARC,” Risk Monger (Aug. 24, 2019); David Zaruk, “Greed, Lies and Glyphosate: The Portier Papers,” Risk Monger (Oct. 13, 2017).

[4]  Ted Williams, “Roundup Hysteria,” Slate Magazine (Oct. 14, 2019).

[5]  See, e.g., Geoffrey Kabat, Hyping Health Risks: Environmental Hazards in Everyday Life and the Science of Epidemiology (2008); Geoffrey Kabat, Getting Risk Right: Understanding the Science of Elusive Health Risks (2016).

[6]  Geoffrey Kabat, “Who’s Afraid of Roundup?” 36 Issues in Science and Technology (Fall 2019).

[7]  See Schachtman, “Infante-lizing the IARC” (May 13, 2018); “The IARC Process is Broken” (May 4, 2016). See also Eric Lasker and John Kalas, “Engaging with International Carcinogen Evaluations,” Law360 (Nov. 14, 2019).

[8]  “IARC Preamble to the IARC Monographs on the Identification of Carcinogenic Hazards to Humans,” at Sec. B.5., p.31 (Jan. 2019); See alsoIARC Advisory Group Report on Preamble” (Sept. 2019).

[9]  See Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295 (1965) (noting that only when “[o]ur observations reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance,” do we move on to consider the nine articulated factors for determining whether an association is causal).

[10]  Fed. R. Evid. 403.

 

Does the California State Bar Discriminate Unlawfully?

November 24th, 2019

Earlier this month, various news outlets announced a finding in a California study that black male attorneys are three times more likely to be disciplined by the State Bar than their white male counterparts.[1] Some of the news accounts treated the study findings as conclusions that the Bar had engaged in race discrimination. One particularly irresponsible website proclaimed that “bar discipline is totally racist.”[2] Indeed, the California State Bar itself apparently plans to hire consulting experts to help it achieve “bias-free decision-making and processes,” to eliminate “unintended bias,” and to consider how, if at all, to weigh prior complaints in the disciplinary procedure.[3]

The California Bar’s report was prepared by a social scientist, George Farkas, of the School of Education at University of California, Irvine. Based upon data from attorneys admitted to the California bar between 1990 and 2008, Professor Farkas reported crude prevalence rates of discipline, probation, disbarment, or resignation, by race.[4] The disbarment/ resignation rate for black male lawyers was 3.9%, whereas the rate for white male lawyers was 1%. Disparities, however, are not unlawful discriminations.

The disbarment/resignation rate for black female lawyers was 0.9%, but no one has suggested that there is implicit bias in favor of black women over both black and white male lawyers. White women were twice as likely as Asian women to resign, be placed on probation, or be disbarred (0.4% versus 0.2%).

The ABA’s coverage sheepishly admitted that “[d]ifferences could be explained by the number of complaints received about an attorney, the number of investigations opened, the percentage of investigations in which a lawyer was not represented by counsel, and previous discipline history.”[5]

Farkas’s report of October 31, 2019, was transmitted to the Bar’s Board of Trustees on November 14th.[6] As anyone familiar with discrimination law would have expected, Professor Farkas conducted multiple regression analyses that adjusted for the number of previous complaints filed against the errant lawyer, and for whether the lawyer was represented by counsel before the Bar. The full analyses showed that these other important variables, not race, did explain (and not merely could have explained) the variability in discipline rates:

“Statistically, these variables explained all of the differences in probation and disbarment rates by race/ethnicity. Among all variables included in the final analysis, prior discipline history was found to have the strongest effects [sic] on discipline outcomes, followed by the proportion of investigations in which the attorney under investigation was represented by counsel, and the number of investigations.”[7]

The number of previous complaints against a particular lawyer surely has a role in considering whether a miscreant lawyer should be placed on probation, or subjected to disbarment. And without further refinement of the analysis, and irrespective of race or ethnicity, failure to retain counsel for disciplinary hearings may correlate strongly with futility of any defense.
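The statistical point at work here is confounding: when groups differ in their distribution of a variable that actually drives the outcome (such as prior discipline history), crude rates will differ by group even if group membership has no effect at all. A minimal sketch, with purely hypothetical numbers (none drawn from the Farkas report), makes the arithmetic concrete:

```python
# Hypothetical illustration of confounding. Discipline probability depends
# ONLY on prior-complaint history -- identical for both groups -- yet the
# crude (unadjusted) discipline rates differ because the groups differ in
# their mix of prior-complaint histories. All numbers are invented.

# P(discipline | prior-complaint stratum), the same for every group
p_discipline = {"few_priors": 0.01, "many_priors": 0.10}

# Hypothetical composition of each group across the two strata
composition = {
    "group_1": {"few_priors": 0.9, "many_priors": 0.1},
    "group_2": {"few_priors": 0.6, "many_priors": 0.4},
}

def crude_rate(group):
    # Marginal discipline rate: stratum rates weighted by group composition
    return sum(composition[group][s] * p_discipline[s] for s in p_discipline)

for g in composition:
    print(g, round(crude_rate(g), 4))
# group_1: 0.9*0.01 + 0.1*0.10 = 0.019
# group_2: 0.6*0.01 + 0.4*0.10 = 0.046
```

The crude rates (1.9% versus 4.6%) differ more than twofold, yet within each stratum the groups are treated identically; an adjusted analysis, like the regressions Farkas ran, would attribute the entire disparity to prior-complaint history rather than to group membership.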

Curiously, the Farkas report did not take into account the race or ethnicity of the complainants before the Bar’s disciplinary committee. The Farkas report seems reasonable as far as it goes, but the wild conclusions drawn in the media would not pass Rule 702 gatekeeping.


[1]  See, e.g., Emma Cueto, “Black Male Attorneys Disciplined More Often, California Study Finds,” Law360 (Nov. 18, 2019); Debra Cassens Weiss, “New California bar study finds racial disparities in lawyer discipline,” Am. Bar Ass’n J. (Nov. 18, 2019).

[2]  Joe Patrice, “Study Finds That Bar Discipline Is Totally Racist Shocking Absolutely No One: Black male attorneys are more likely to be disciplined than white attorneys,” Above the Law (Nov. 19, 2019).

[3]  Debra Cassens Weiss, “New California bar study finds racial disparities in lawyer discipline,” Am. Bar Ass’n J. (Nov. 18, 2019).

[4]  George Farkas, “Discrepancies by Race and Gender in Attorney Discipline by the State Bar of California: An Empirical Analysis” (Oct. 31, 2019).

[5]  Debra Cassens Weiss, supra at note 3.

[6]  Dag MacLeod (Chief of Mission Advancement & Accountability Division) & Ron Pi (Principal Analyst, Office of Research & Institutional Accountability), Report on Disparities in the Discipline System (Nov. 14, 2019).

[7]  Dag MacLeod & Ron Pi, Report on Disparities in the Discipline System at 4 (Nov. 14, 2019) (emphasis added).

Palavering About P-Values

August 17th, 2019

The American Statistical Association’s most recent confused and confusing communication about statistical significance testing has given rise to great mischief in the world of science and science publishing.[1] Take for instance last week’s opinion piece about “Is It Time to Ban the P Value?” Please.

Helena Chmura Kraemer is an accomplished professor of statistics at Stanford University. This week the Journal of the American Medical Association (JAMA) Network flagged Professor Kraemer’s opinion piece on p-values as one of its most read articles. Kraemer’s eye-catching title creates the impression that the p-value is unnecessary and inimical to valid inference.[2]

Remarkably, Kraemer’s article commits the very mistake that the ASA set out to correct back in 2016,[3] by conflating the probability of the data under a hypothesis of no association with the probability of a hypothesis given the data:

“If P value is less than .05, that indicates that the study evidence was good enough to support that hypothesis beyond reasonable doubt, in cases in which the P value .05 reflects the current consensus standard for what is reasonable.”

The ASA tried to break the bad habit of scientists’ interpreting p-values as allowing us to assign posterior probabilities, such as beyond a reasonable doubt, to hypotheses, but obviously to no avail.

Kraemer also ignores the ASA 2016 Statement’s teaching of what the p-value is not and cannot do, by claiming that p-values are determined by non-random error probabilities such as:

“the reliability and sensitivity of the measures used, the quality of the design and analytic procedures, the fidelity to the research protocol, and in general, the quality of the research.”

Kraemer provides errant advice and counsel by insisting that “[a] non-significant result indicates that the study has failed, not that the hypothesis has failed.” If the p-value is the probability of observing an association at least as extreme as the one obtained, assuming the null hypothesis is true, then of course a large p-value cannot speak to the failure of the hypothesis, but why declare that the study has failed? The study was perhaps indeterminate, but it still yielded information that can be combined with other data, or can help guide future studies.
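The definitional point can be made concrete with a small computation, using a hypothetical coin-flip example of my own (not drawn from Kraemer’s article). The p-value is a tail probability computed under the null hypothesis, nothing more:

```python
from math import comb

# Hypothetical example: 60 heads observed in 100 flips.
# Null hypothesis: the coin is fair (p0 = 0.5).
# One-sided p-value: the probability, computed under the null, of seeing
# a result at least as extreme as the one observed (>= 60 heads).
n, k, p0 = 100, 60, 0.5

p_value = sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k, n + 1))
print(round(p_value, 4))  # a bit under 0.03
```

Nothing in this calculation assigns a probability to the hypothesis itself; the p-value is conditioned on the null being true, which is exactly why it cannot, without more, support a “beyond reasonable doubt” conclusion about any hypothesis.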

Perhaps in her most misleading advice, Kraemer asserts that:

“[w]hether P values are banned matters little. All readers (reviewers, patients, clinicians, policy makers, and researchers) can just ignore P values and focus on the quality of research studies and effect sizes to guide decision-making.”

Really? If a high quality study finds an “effect size” of interest, we can now ignore random error?

The ASA 2016 Statement, with its “six principles,” has provoked some deliberate or ill-informed distortions in American judicial proceedings, but Kraemer’s editorial creates idiosyncratic meanings for p-values. Even the 2019 ASA “post-modernism” does not advocate ignoring random error and p-values, as opposed to proscribing dichotomous characterization of results as “statistically significant,” or not.[4] The current author guidelines for articles submitted to the Journals of the American Medical Association clearly reject this new-fangled rejection of the need to assess the role of random error.[5]


[1]  See Ronald L. Wasserstein, Allen L. Schirm, and Nicole A. Lazar, “Editorial: Moving to a World Beyond ‘p < 0.05’,” 73 Am. Statistician S1, S2 (2019).

[2]  Helena Chmura Kraemer, “Is It Time to Ban the P Value?” J. Am. Med. Ass’n Psych. (August 7, 2019), in press at doi:10.1001/jamapsychiatry.2019.1965.

[3]  Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” 70 The American Statistician 129 (2016).

[4]  “Has the American Statistical Association Gone Post-Modern?” (May 24, 2019).

[5]  See instructions for authors at https://jamanetwork.com/journals/jama/pages/instructions-for-authors