For your delectation and delight, desultory dicta on the law of delicts.

The Populist Attack on Scientific Free Speech

July 18th, 2011

Siddhartha Mukherjee’s opinion piece in Sunday’s New York Times illustrates the populist efforts to muzzle and minimize industry’s efforts to communicate about scientific issues that affect public policy.  Mukherjee, “Opinion:  Patrolling Cancer’s Borderlands,” New York Times, Sunday Review, p. 8 (July 17, 2011).

Mukherjee, an assistant professor of medicine at Columbia University, is the author of The Emperor of All Maladies: A Biography of Cancer, and a frequent commentator on public health issues.  In his recent article, Mukherjee notes how difficult it is to identify a carcinogen with reasonable certainty.  Tobacco as a cause of lung cancer was relatively easy to identify because of the very strong associations shown by observational studies.  Scientists are now dealing with smaller candidate risks, and with cancers that are less common and that therefore show more expected variability in population samples.  Mukherjee seems to acknowledge these considerations, but he appears much less concerned with scientific accuracy than with what he perceives as industrial lobbying against the labeling of certain chemicals as carcinogens.

There is much that is objectionable in this populist attack on scientific speech and the right to petition the government.  Putting aside scientific inaccuracies such as referring to epidemiologic studies as “trials,” let me focus on what emerges as the dominant theme of the opinion article.  Three times in his short editorial, Mukherjee uses the term “lobbying” to describe scientific speech and analyses submitted by industrial representatives:

“Second: in mid-June, the National Toxicology Program, countering years of lobbying by certain industries, finally classified formaldehyde (used in plywood manufacturing and embalming) as a carcinogen.”

* * *

“The second challenge facing cancer control agencies is political. The formaldehyde case illustrates this. Unlike phone radiation, formaldehyde has a well-established mechanism to cause cancer: it is a strikingly reactive chemical that can directly attack DNA. Experiments performed in the 1970s demonstrated that the chemical causes cancer in mice and rats. Following this data, sophisticated trials [sic] showed that men and women exposed to formaldehyde — morticians, for instance — had higher rates of leukemia than unexposed people.

But some of these studies were performed three decades ago. Why have 30 years elapsed between them and the National Toxicology Program announcement? In part, because of active lobbying by various industries, in particular, plywood manufacturers, who have tried to thwart this classification.”

* * *

“Identifying a carcinogen, in short, isn’t sufficient. Beyond the science — which, as the cellphone example shows, can be hard enough — cancer-control agencies need to bolster political support, and neutralize lobbying interests, before a culprit carcinogen can be revealed to the public.”

Mukherjee, supra. Now, the references to lobbying over scientific interests suggest an image of industrial gladhanders plying agency scientists and bureaucrats with expensive gifts, meals, and travel.  If that were so, then the decried “lobbying” might well be offensive, but what Mukherjee is talking about is nothing more or less than scientific free speech.  Industrial concerns and associations submit discussions that call attention to inadequacies in the data and evidence that regulators seek to rely upon in their zealous attempts to protect the public health.  The issue, of course, is a scientific one of the accuracy of the regulators’ interpretation of the data.  By using the term “lobbying,” with its pejorative connotations, Mukherjee is playing to the Zeitgeist’s impatience with the facts, when they embarrass regulatory or tort law attempts to condemn aspects of our industrialized society.  The exhibited hostility to scientific speech is at odds with our core political, constitutional values of both free speech and the right to petition the government.  The dismissive attitude is also contrary to a good deal of scientific evidence.  See, e.g., C. Bosetti, J. McLaughlin, et al., “Formaldehyde and cancer risk: a quantitative review of cohort studies through 2006,” 19 Ann. Oncol. 29 (2008). The Times and Mukherjee know that most readers will not be familiar with the factual dispute underlying the classification of formaldehyde, and this editorial is nothing less than a cynical attempt to mold public opinion by the use of ad hominem attacks on industry.

Note that Citizens for Science in the Public Interest, the Center for Regulatory Reform, SKAPP, and dozens of other organizations submit their views on issues of carcinogenicity, or other health concerns, but they are not labeled as “lobbyists.”  Note also that Mukherjee urges cancer-control agencies “to bolster political support,” as well as “neutralize lobbying interests.”  The identification of carcinogens is a scientific issue, not a political one.  Society can certainly decide to err on the side of precaution, but agencies such as the National Toxicology Program, or the International Agency for Research on Cancer, hold themselves out to be scientific agencies, not political organizations.  These agencies should act scientifically, and they should be amenable to scientific evidence and evaluation, marshaled by any stakeholder in the discussion over putative carcinogens.  Mukherjee’s rhetoric and propaganda should be rejected in a free society.

Bad and Good Statistical Advice from the New England Journal of Medicine

July 2nd, 2011

Many people consider The New England Journal of Medicine (NEJM) a prestigious journal.  It is certainly widely read.  Judging from its “impact factor,” we know the journal is frequently cited.  So when the NEJM weighs in on an issue that involves the intersection of law and science, I pay attention.

Unfortunately, this week’s issue contains an editorial “Perspective” piece that is filled with incoherent, inconsistent, and incorrect assertions, both on the law and the science.  Mark A. Pfeffer and Marianne Bowler, “Access to Safety Data – Stockholders versus Prescribers,” 364 New Engl. J. Med. ___ (2011).

Dr. Mark Pfeffer and the Hon. Marianne Bowler used the recent United States Supreme Court decision in Matrixx Initiatives, Inc. v. Siracusano, __ U.S. __, 131 S. Ct. 1309 (2011), to advance views not supported by the law or the science.  Remarkably, Dr. Pfeffer is the Victor J. Dzau Professor of Medicine at the Harvard Medical School.  He is a physician, and he also holds a Ph.D. degree in physiology and biophysics.  Ms. Bowler is both a lawyer and a federal judge.  Between the two, they should have provided better, more accurate, and more consistent advice.

1. The Authors Erroneously Characterize Statistical Significance in Inappropriate Bayesian Terms

The article begins with a relatively straightforward characterization of various legal burdens of proof.  The authors then try to collapse one of those burdens of proof, “beyond a reasonable doubt,” which has no accepted quantitative meaning, to a significance probability that is used to reject a pre-specified null hypothesis in scientific studies:

“To reject the null hypothesis (that a result occurred by chance) and deem an intervention effective in a clinical trial, the level of proof analogous to law’s ‘beyond a reasonable doubt’ standard would require an extremely stringent alpha level to permit researchers to claim a statistically significant effect, with the offsetting risk that a truly effective intervention would sometimes be deemed ineffective.  Instead, most randomized clinical trials are designed to achieve a lower level of evidence that in legal jargon might be called ‘clear and convincing’, making conclusions drawn from it highly probable or reasonably certain.”

Now this is both scientific and legal nonsense.  It is distressing that a federal judge characterizes the burden of proof that she must apply, or direct juries to apply, as “legal jargon.”  More important, these authors, scientist and judge, give questionable quantitative meanings to burdens of proof, and they misstate the meaning of statistical significance.  When judges or juries must determine guilt “beyond a reasonable doubt,” they are assessing the prosecution’s claim that the defendant is guilty, given the evidence at trial.  This probability can be represented as:

Probability (Guilt | Evidence Adduced)

This is what is known as a posterior probability, and it is fundamentally different from significance probability.

The significance probability transposes the conditional of the posterior probability that is used to assess guilt in a criminal trial, or contentions in a civil trial: it is the probability of the data given a hypothesis, not the probability of a hypothesis given the data.  As law professor David Kaye and his statistician coauthor, the late David Freedman, described the p-value and significance probability:

“The p-value is the probability of getting data as extreme as, or more extreme than, the actual data, given that the null hypothesis is true:

p = Probability (extreme data | null hypothesis in model)

* * *

Conversely, large p-values indicate that the data are compatible with the null hypothesis: the observed difference is easy to explain by chance. In this context, small p-values argue for the plaintiffs, while large p-values argue for the defense. Since p is calculated by assuming that the null hypothesis is correct (no real difference in pass rates), the p-value cannot give the chance that this hypothesis is true. The p-value merely gives the chance of getting evidence against the null hypothesis as strong or stronger than the evidence at hand—assuming the null hypothesis to be correct. No matter how many samples are obtained, the null hypothesis is either always right or always wrong. Chance affects the data, not the hypothesis. With the frequency interpretation of chance, there is no meaningful way to assign a numerical probability to the null hypothesis.”

David H. Kaye and David A. Freedman, “Reference Guide on Statistics,” Federal Judicial Center, Reference Manual on Scientific Evidence 122 (2d ed. 2000).  Kaye and Freedman explained over a decade ago, for the benefit of federal judges:

“As noted above, it is easy to mistake the p-value for the probability that there is no difference. Likewise, if results are significant at the .05 level, it is tempting to conclude that the null hypothesis has only a 5% chance of being correct.

This temptation should be resisted. From the frequentist perspective, statistical hypotheses are either true or false; probabilities govern the samples, not the models and hypotheses. The significance level tells us what is likely to happen when the null hypothesis is correct; it cannot tell us the probability that the hypothesis is true. Significance comes no closer to expressing the probability that the null hypothesis is true than does the underlying p-value.”

Id. at 124-25.

As we can see, our scientist from the Harvard Medical School and our federal judge have committed the transpositional fallacy by likening “beyond a reasonable doubt” to the alpha used to test for a statistically significant outcome in a clinical trial.  They are not the same; nor are they analogous.
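The gap between the two conditional probabilities can be written out with Bayes’s theorem.  The notation below is my own illustrative shorthand, not anything used by the NEJM authors or the Reference Manual: H_0 is the null hypothesis, H_1 a single competing hypothesis, and D the observed data.

```latex
\[
P(H_0 \mid D)
  = \frac{P(D \mid H_0)\, P(H_0)}
         {P(D \mid H_0)\, P(H_0) + P(D \mid H_1)\, P(H_1)}
\]
```

The p-value is a tail-area version of P(D | H_0) alone.  Moving from it to P(H_0 | D) requires the prior probabilities P(H_0) and P(H_1), which a significance test never supplies.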

This fallacy has been repeatedly described.  Not only has the Reference Manual on Scientific Evidence (which is written specifically for federal judges) described the fallacy in detail, but legal and scientific writers have urged care to avoid this basic mistake in probabilistic reasoning.  Here is a recent admonition from one of the leading writers on the use (and misuse) of statistics in legal procedures:

“Some commentators, however, would go much further; they argue that 5% is an arbitrary statistical convention and since preponderance of the evidence means 51% probability, lawyers should not use 5% as the level of statistical significance but 49% – thus rejecting the null hypothesis when there is up to a 49% chance that it is true. In their view, to use a 5% standard of significance would impermissibly raise the preponderance of evidence standard in civil trials. Of course the 5% figure is arbitrary (although widely accepted in statistics) but the argument is fallacious. It assumes that 5% (or 49% for that matter) is the probability that the null hypothesis is true. The 5% level of significance is not that, but the probability of the sample evidence if the null hypothesis were true. This is a very different matter. As I pointed out in Chapter 1, the probability of the sample given the null hypothesis is not generally the same as the probability of the null hypothesis given the sample. To relate the level of significance to the probability of the null hypothesis would require an application of Bayes’s theorem and the assumption of a prior probability distribution. However, the courts have usually accepted the statistical standard, although with some justifiable reservations when the P-value is only slightly above the 5% cutoff.”

Michael O. Finkelstein, Basic Concepts of Probability and Statistics in the Law 54 (N.Y. 2009) (emphasis added).
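Finkelstein’s point can be made concrete with a small numerical sketch.  Everything in it is a hypothetical of my own choosing, not data from any case or study: 60 “events” in 100 trials, a null hypothesis that the true rate is 0.5, a single alternative that the rate is 0.6, and three arbitrary prior probabilities for the null.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k events in n independent trials at rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def p_value_upper(k, n, p_null):
    """One-sided p-value: P(k or more events | null hypothesis)."""
    return sum(binom_pmf(i, n, p_null) for i in range(k, n + 1))


def posterior_null(k, n, p_null, p_alt, prior_null):
    """P(null | data) by Bayes's theorem, for two simple hypotheses."""
    weight_null = binom_pmf(k, n, p_null) * prior_null
    weight_alt = binom_pmf(k, n, p_alt) * (1 - prior_null)
    return weight_null / (weight_null + weight_alt)


# Hypothetical data: 60 events in 100 trials.
n, k = 100, 60
print(f"p-value, P(data this extreme | null): {p_value_upper(k, n, 0.5):.3f}")

# The posterior probability of the null moves with the assumed prior.
for prior in (0.5, 0.9, 0.99):
    post = posterior_null(k, n, 0.5, 0.6, prior)
    print(f"prior P(null) = {prior:.2f} -> P(null | data) = {post:.2f}")
```

Under these assumptions the one-sided p-value is roughly 0.03, while the posterior probability of the null hypothesis runs from about 0.12 to about 0.93, depending entirely on the prior.  The same “significant” data can thus leave the null hypothesis either improbable or very probable, which is why an alpha level cannot stand in for a burden of proof.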

2.  The Authors, Having Mischaracterized Burden-of-Proof and Significance Probabilities, Incorrectly Assess the Meaning of the Supreme Court’s Decision in Matrixx Initiatives.

I have written a good bit about the Court’s decision in Matrixx Initiatives, most recently with David Venderbush, for the Washington Legal Foundation.  See Schachtman & Venderbush, “Matrixx Unbounded: High Court’s Ruling Needlessly Complicates Scientific Evidence Principles,” W.L.F. Legal Backgrounder (June 17, 2011).

I was thus startled to see the claim of a federal judge that the Supreme Court, in Matrixx, had “applied the ‘fair preponderance of the evidence’ standard of proof used for civil matters.”  Matrixx was a case about the sufficiency of the pleadings, and thus there really could have been no such application of a burden of proof to an evidentiary display.  The very claim is incoherent, and at odds with the Supreme Court’s holding.

The NEJM authors went on to detail how the defendant in Matrixx had persuaded the trial court that the evidence against its product, Zicam, did not reach statistical significance, and therefore the evidence should not be considered “material.”  As I have pointed out before, Matrixx focused on adverse event reports, as raw numbers of reported events, which were not, and could not be, analyzed for statistical significance.  The very essence of Matrixx’s argument was nonsense, which perhaps explains the company’s nine-nothing loss in the Supreme Court.  The authors of the opinion piece in the NEJM, however, missed that it is not the evidence of adverse event reports, with or without a statistical analysis, that is material.  What was at issue was whether the company’s failure to disclose this information, along with a good deal more information, was material in the face of the company’s having made very aggressive, optimistic sales and profits projections for the future.

The NEJM authors proceed to tell us, correctly, that adverse events do not prove causality, but then they tell us, incorrectly, that the Matrixx case shows that “such a high level of proof did not have to be achieved.”  While the authors are correct about the insufficiency of adverse event reports for causal assessments, they miss the legal significance of there being no burden of proof at play in Matrixx; it was a case on the pleadings.  The issue was the sufficiency of those pleadings, and what the Supreme Court made clear was that in the context of a product subject to FDA regulation, causation was never the test for materiality because the FDA could withdraw the product on a showing far less than scientific causation of harm.  So the plaintiffs could allege less than causation, and still have pleaded a sufficient case of securities fraud.  The Supreme Court did not, and could not, address the issue that the NEJM authors discuss.  The authors’ assessment that the Matrixx case freed legal causation of any requirement of statistical significance is a tortured reading of obiter dictum, not the holding of the case.  This editorializing is troubling.

The NEJM authors similarly hold forth on what clinicians consider material, and they announce that “[c]linicians are well aware that to be considered material, information regarding drug safety does not have to reach the same level of certainty that we demand for demonstrating efficacy.”  This is true, but clinicians are ethically bound to err on the side of safety:  Primum non nocere. See, e.g., Tamraz v. Lincoln Elec. Co., 620 F.3d 665, 673 (6th Cir. 2010) (noting that treating physicians have more training in diagnosis than in etiologic assessments), cert. denied, ___ U.S.____ (2011).  Again, the authors’ statements have nothing to do with the Matrixx case, or with the standards for legal or scientific causation.

3.  The Authors, Inconsistently with Their Characterization of Various Probabilities, Proceed Correctly To Describe Statistical Significance Testing for Adverse Outcomes in Trials.

Having incorrectly likened “beyond a reasonable doubt” to p < 0.05, the NEJM authors then correctly point out that standard statistical testing cannot be used for “evaluating unplanned and uncommon adverse events.”  The authors also note that the flood of data in the assessment of causation of adverse events is filled with “biologic noise.”  Physicians and regulators may take the noise signals and claim that they hear a concert.  This is exactly why we should not confuse precautionary judgments with scientific assessments of causation.
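One familiar source of such noise is multiplicity, although the NEJM authors do not frame their point in these terms.  The sketch below assumes, purely for illustration, that each unplanned adverse-event comparison is an independent test of a true null hypothesis at alpha = 0.05; real adverse-event data are rarely independent, so the figures are only indicative.

```python
def chance_of_false_signal(num_comparisons, alpha=0.05):
    """Probability that at least one of `num_comparisons` independent tests
    of true null hypotheses is nominally 'significant' at level `alpha`."""
    return 1 - (1 - alpha) ** num_comparisons


# Hypothetical counts of unplanned comparisons in a single trial.
for m in (1, 10, 20, 100):
    print(f"{m:>3} comparisons: P(at least one false signal) = "
          f"{chance_of_false_signal(m):.2f}")
```

On these assumptions, twenty unplanned comparisons yield at least one nominally significant “signal” more often than not, even when no comparison reflects a real effect.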