TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

The Mythology of Linear No-Threshold Cancer Causation

March 13th, 2015

“For the great enemy of the truth is very often not the lie—deliberate, contrived, and dishonest—but the myth—persistent, persuasive, and unrealistic. Too often we hold fast to the clichés of our forebears. We subject all facts to a prefabricated set of interpretations. We enjoy the comfort of opinion without the discomfort of thought.”

John F. Kennedy, Yale University Commencement (June 11, 1962)

         *        *        *        *        *        *        *        *        *

The linear no-threshold model for risk assessment has its origins in a dubious attempt by scientists playing at policy making[1]. The model has survived as a political strategy to inject the precautionary principle into regulatory decision making, but it has turned into a malignant myth in litigation over low-dose exposures to putative carcinogens. Ignorance or uncertainty about low-dose exposures is turned into an affirmative opinion that the low-dose exposures are actually causative. Call it contrived, or dishonest, or call it a myth, the LNT model is an intellectual cliché.

The LNT cliché pervades American media as well as courtrooms. Earlier this week, the New York Times provided a lovely example of the myth taking center stage, without explanation or justification. Lumber Liquidators is under regulatory and litigation attack for having sold Chinese laminate wood flooring made with formaldehyde-containing materials. According to a “60 Minutes” investigation, the flooring off-gases formaldehyde at concentrations in excess of regulatory permissible levels. See Aaron M. Kessler & Rachel Abrams, “Homeowners Try to Assess Risks From Chemical in Floors,” New York Times (Mar. 10, 2015).

The Times reporters, in discussing whether a risk exists to people who live in houses and apartments with the Lumber Liquidators flooring, sought out and quoted the opinion of Marilyn Howarth:

“Any exposure to a carcinogen can increase your risk of cancer,” said Marilyn Howarth, a toxicologist at the University of Pennsylvania’s Perelman School of Medicine.

Id. Dr. Howarth, however, is not a toxicologist; she is an occupational and environmental physician, and serves as the Director of Occupational and Environmental Consultation Services at the Hospital of the University of Pennsylvania. She is also an adjunct associate professor of emergency medicine, and the Director of the Community Outreach and Engagement Core, Center of Excellence in Environmental Toxicology, at the University of Pennsylvania Perelman School of Medicine. Without detracting from Dr. Howarth’s fine credentials, the New York Times reporters might have noticed that Dr. Howarth’s publications are primarily on latex allergies, not on the effects of low-dose exposure to carcinogens.

The point is not to diminish Dr. Howarth’s accomplishments, but to criticize the Times reporters for seeking out an opinion of a physician whose expertise is not well matched to the question they raise about risks, and then publishing that opinion even though it is demonstrably wrong. Clearly, some carcinogens, and perhaps all, do not increase risk at “any exposure.” Consider ethanol, which is known to cause cancer of the larynx, liver, female breast, and perhaps other organs[2]. Despite known causation, no one would assert that “any exposure” to alcohol-containing food and drink increases the risk of these cancers. And the same could be said for most, if not all, carcinogens. The human body has defense mechanisms against carcinogens, including DNA repair and programmed cell death, which work to prevent carcinogenesis from low-dose exposures.

The no-threshold hypothesis is, at best, a hypothesis, and there is affirmative evidence that the hypothesis should be rejected for some cancers[3]. As a statement of fact, LNT is a myth; it is an opinion, and a poorly supported opinion at that.
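The arithmetic at stake can be made concrete in a few lines of code. The sketch below is purely illustrative, with hypothetical slope and threshold values chosen for exposition; it shows only that an LNT model imputes some excess risk to any nonzero dose, while a threshold model imputes none below its threshold.

```python
# A minimal sketch (hypothetical numbers, for illustration only):
# compare the excess risk imputed by a linear no-threshold (LNT) model
# with that of a threshold model at a low dose.

def lnt_excess_risk(dose, slope=0.01):
    """LNT model: excess risk is proportional to dose, all the way to zero."""
    return slope * dose

def threshold_excess_risk(dose, slope=0.01, threshold=5.0):
    """Threshold model: no excess risk below the (assumed) threshold dose."""
    return slope * (dose - threshold) if dose > threshold else 0.0

low_dose = 1.0  # well below the hypothetical threshold
print(lnt_excess_risk(low_dose))        # LNT imputes risk to any exposure
print(threshold_excess_risk(low_dose))  # threshold model imputes none
```

Under the LNT model the low dose carries a positive excess risk; under the threshold model it carries none, which is why the choice between the two models, and not any observed data at low doses, often drives the litigation claim.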

         *        *        *        *        *        *        *        *        *

“There are, in fact, two things: science and opinion. The former brings knowledge, the latter ignorance.”

Hippocrates of Cos


[1] See Edward J. Calabrese, “Cancer risk assessment foundation unraveling: New historical evidence reveals that the US National Academy of Sciences (US NAS), Biological Effects of Atomic Radiation (BEAR) Committee Genetics Panel falsified the research record to promote acceptance of the LNT,” 89 Arch. Toxicol. 649 (2015); Edward J. Calabrese & Michael K. O’Connor, “Estimating Risk of Low Radiation Doses – A Critical Review of the BEIR VII Report and its Use of the Linear No-Threshold (LNT) Hypothesis,” 182 Radiation Research 463 (2014); Edward J. Calabrese, “Origin of the linearity no threshold (LNT) dose–response concept,” 87 Arch. Toxicol. 1621 (2013); Edward J. Calabrese, “The road to linearity at low doses became the basis for carcinogen risk assessment,” 83 Arch. Toxicol. 203 (2009).

[2] See, e.g., IARC Monographs on the Evaluation of Carcinogenic Risks to Humans – Alcohol Consumption and Ethyl Carbamate; volume 96 (2010).

[3] See, e.g., Jerry M. Cuttler, “Commentary on Fukushima and Beneficial Effects of Low Radiation,” 11 Dose-Response 432 (2013); Jerry M. Cuttler, “Remedy for Radiation Fear – Discard the Politicized Science,” 12 Dose Response 170 (2014).

Sander Greenland on “The Need for Critical Appraisal of Expert Witnesses in Epidemiology and Statistics”

February 8th, 2015

Sander Greenland is one of the few academics who has served as an expert witness and written post-mortems of his involvement in various litigations[1]. Although settling scores with opposing expert witnesses can be a risky business[2], the practice can provide important insights for judges and lawyers who want to avoid the errors of the past. Greenland correctly senses that many errors seem endlessly recycled, and that courts could benefit from disinterested commentary on cases. And so, there should be a resounding affirmation from federal and state courts of the proclaimed “need for critical appraisal of expert witnesses in epidemiology and statistics,” as well as in many other disciplines.

A recent exchange[3] with Professor Greenland led me to revisit his Wake Forest Law Review article. His article raises some interesting points, some mistaken, but some valuable and thoughtful considerations about how to improve the state of statistical expert witness testimony. For better and worse[4], lawyers who litigate health effects issues should read it.

Other Misunderstandings

Greenland posits criticisms of defense expert witnesses[5], whom he believes to have misinterpreted or misstated the appropriate inferences to be drawn from null studies. In one instance, Greenland revisits one of his own cases, without any clear acknowledgment that his views were largely rejected.[6] The State of California had declared, pursuant to Proposition 65 (the Safe Drinking Water and Toxic Enforcement Act of 1986, Health and Safety Code sections 25249.5, et seq.), that the State “knew” that di(2-ethylhexyl)phthalate (“DEHP”) caused cancer. Baxter Healthcare challenged the classification, and according to Greenland, the defense experts erroneously interpreted inconclusive studies as evidence supporting a conclusion that DEHP does not cause cancer.

Greenland argues that the Baxter expert’s reference[7] to an IARC working group’s classification of DEHP as “not classifiable as to its carcinogenicity to humans” did not support the expert’s conclusion that DEHP does not cause cancer in humans. If Baxter’s expert invoked the IARC working group’s classification for complete exoneration of DEHP, then Greenland’s point is fair enough. In his single-minded attack on Baxter’s expert’s testimony, however, Greenland missed a more important point, which is that the IARC’s determination that DEHP is not classifiable as to carcinogenicity directly contradicts California’s epistemic claim to “know” that DEHP causes cancer. And Greenland conveniently omits any discussion that the IARC working group had reclassified DEHP from “possibly carcinogenic” to “not classifiable,” in light of its conclusion that mechanistic evidence of carcinogenesis in rodents did not pertain to humans.[8] Greenland maintains that Baxter’s experts misrepresented the IARC working group’s conclusion[9], but that conclusion, at the very least, demonstrates that California was on very shaky ground when it declared that it “knew” that DEHP was a carcinogen. California’s semantic gamesmanship over its epistemic claims is at the root of the problem, not a misstep by defense experts in describing inconclusive evidence as exonerative.

Greenland goes on to complain that in litigation over health claims:

“A verdict of ‛uncertain’ is not allowed, yet it is the scientific verdict most often warranted. Elimination of this verdict from an expert’s options leads to the rather perverse practice (illustrated in the DEHP testimony cited above) of applying criminal law standards to risk assessments, as if chemicals were citizens to be presumed innocent until proven guilty.”

39 Wake Forest Law Rev. at 303. Despite Greenland’s alignment with California in the Denton case, the fact of the matter is that a verdict of “uncertain” was allowed, and he was free to criticize California for making a grossly exaggerated epistemic claim on inconclusive evidence.

Perhaps recognizing that he may readily be seen as an advocate coming to the defense of California on the DEHP issue, Greenland protests that:

“I am not suggesting that judgments for plaintiffs or actions against chemicals should be taken when evidence is inconclusive.”

39 Wake Forest Law Rev. at 305. And yet, his involvement in the Denton case (as well as other cases, such as silicone gel breast implant cases, thimerosal cases, etc.) suggests that he is willing to lend aid and support to judgments for plaintiffs when the evidence is inconclusive.

Important Advice and Recommendations

The foregoing points are rather severe limitations of Greenland’s article, but lawyers and judges should also look to what is good and helpful here. Greenland is correct to call out expert witnesses, regardless of party affiliation, who opine that inconclusive studies are “proof” of the null hypothesis. Although some of Greenland’s arguments against the use of significance probability may be overstated, his corrections to the misstatements and misunderstandings of significance probability should command greater attention in the legal community. In one strained passage, however, Greenland uses a disjunction to juxtapose null hypothesis testing with proof beyond a reasonable doubt[10]. Greenland of course understands the difference, but the context could lead some untutored readers to think he has equated the two probabilistic assessments. Writing in a law review for lawyers and judges might have led him to be more careful. Given the prevalence of plaintiffs’ counsel’s confusing the 95% confidence coefficient with a burden of proof akin to beyond a reasonable doubt, great care in this area is, indeed, required.
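The confusion is easy to demonstrate. In the hedged sketch below (a simulation with made-up parameters, not anyone’s actual testimony), every simulated study is generated under a true null hypothesis; roughly five percent are nonetheless “statistically significant” at the conventional level. The 95% confidence coefficient thus describes the long-run behavior of the statistical procedure over repeated sampling, not a 95% probability that any particular causal claim is true, and certainly not a burden of proof.

```python
import random

random.seed(42)

def simulate_null_study(n=50):
    """One study comparing two groups drawn from the SAME distribution,
    so the null hypothesis of no difference is true by construction."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    # crude z-statistic, assuming known unit variance in each group
    z = (mean_a - mean_b) / (2 / n) ** 0.5
    return abs(z) > 1.96  # "significant" at the two-sided 0.05 level

trials = 10_000
false_positives = sum(simulate_null_study() for _ in range(trials))
print(false_positives / trials)  # close to 0.05, though no effect exists
```

The five-percent false-positive rate is a property of the testing procedure, which is precisely why equating the confidence coefficient with proof “beyond a reasonable doubt” misdescribes what the statistics can deliver.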

Despite his appearing for plaintiffs’ counsel in health effects litigation, some of Greenland’s suggestions are balanced and perhaps more truth-promoting than many plaintiffs’ counsel would abide. His article provides an important argument in favor of raising the legal criteria for witnesses who purport to have expertise to address and interpret epidemiologic and experimental evidence[11]. And beyond raising qualification requirements above mere “reasonable pretense at expertise,” Professor Greenland offers some thoughtful, helpful recommendations for improving expert witness testimony in the courts:

  • “Begin publishing projects in which controversial testimony (a matter of public record) is submitted, and as space allows, published on a regular basis in scientific or law journals, perhaps with commentary. An online version could provide extended excerpts, with additional context.
  • Give courts the resources and encouragement to hire neutral experts to peer-review expert testimony.
  • Encourage universities and established scholarly societies (such as AAAS, ASA, APHA, and SER) to conduct workshops on basic epidemiologic and statistical inference for judges and other legal professionals.”

39 Wake Forest Law Rev. at 308.

Each of these three suggestions is valuable and constructive, and worthy of an independent paper. The recommendation of neutral expert witnesses and scholarly tutorials for judges is hardly new. Many defense counsel and judges have argued for them in litigation and in commentary. The first recommendation, of publishing “controversial testimony,” is part of the purpose of this blog. There would be great utility to making expert witness testimony, and analysis thereof, more available for didactic purposes. Perhaps the more egregious testimonial adventures should be republished in professional journals, as Greenland suggests. Greenland qualifies his recommendation with “as space allows,” but space is hardly the limiting consideration in the digital age.

Causation

Professor Greenland correctly points out that causal concepts and conclusions are often essentially contested[12], but his argument might well be incorrectly taken for “anything goes.” More helpfully, Greenland argues that various academic ideals should infuse expert witness testimony. He suggests that greater scholarship, with acknowledgment of all viewpoints, and all evidence, is needed in expert witnessing. 39 Wake Forest Law Rev. at 293.

Greenland’s argument provides an important corrective to the rhetoric of Oreskes, Cranor, Michaels, Egilman, and others on “manufacturing doubt”:

“Never force a choice among competing theories; always maintain the option of concluding that more research is needed before a defensible choice can be made.”

Id. Despite his position in the Denton case, and others, Greenland and all expert witnesses are free to maintain that more research is needed before a causal claim can be supported. Greenland also maintains that expert witnesses should “look past” the conclusions drawn by authors, and base their opinions on the “actual data” on which the statistical analyses are based, and from which conclusions have been drawn. Courts have generally rejected this view, but if courts were to insist upon real expertise in epidemiology and statistics, then the testifying expert witnesses should not be constrained by the hearsay opinions in the discussion sections of published studies – sections which by nature are incomplete and tendentious. See “Follow the Data, Not the Discussion” (May 2, 2010).

Greenland urges expert witnesses and legal counsel to be forthcoming about their assumptions and their uncertainty about conclusions:

“Acknowledgment of controversy and uncertainty is a hallmark of good science as well as good policy, but clashes with the very time-limited tasks faced by attorneys and courts.”

39 Wake Forest Law Rev. at 293-4. This recommendation would be helpful in assuring courts that the data may simply not support conclusions sufficiently certain to be submitted to lay judges and jurors. Rosen v. Ciba-Geigy Corp., 78 F.3d 316, 319, 320 (7th Cir. 1996) (“But the courtroom is not the place for scientific guesswork, even of the inspired sort. Law lags science; it does not lead it.”) (internal citations omitted).

Threats to Validity

One of the serious mistakes counsel often make in health effects litigation is to invite courts to believe that statistical significance is sufficient for causal inferences. Greenland emphasizes that validity considerations often are much stronger and more important than the play of random error[13]:

“For very imperfect data (e.g., epidemiologic data), the limited conclusions offered by statistics must be further tempered by validity considerations.”

*   *   *   *   *   *

“Examples of validity problems include non-random distribution of the exposure in question, non-random selection or cooperation of subjects, and errors in assessment of exposure or disease.”

39 Wake Forest Law Rev. at 302-03. Greenland’s abbreviated list of threats to validity should remind courts that they cannot sniff a p-value below five percent and then safely kick the can to the jury. The literature on evaluating bias and confounding is huge, but Greenland was a co-author on an important recent paper, which needs to be added to the required reading lists of judges charged with gatekeeping expert witness opinion testimony about health effects. See Timothy L. Lash, et al., “Good practices for quantitative bias analysis,” 43 Internat’l J. Epidem. 1969 (2014).
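A flavor of what quantitative bias analysis adds beyond a p-value can be given in a short sketch. The code below applies the classic external-adjustment formula for an unmeasured binary confounder of a risk ratio, in the spirit of the Lash et al. paper; all the numbers are hypothetical assumptions chosen only for illustration, not figures from any study or case.

```python
# A minimal sketch of simple (external-adjustment) quantitative bias
# analysis for an unmeasured binary confounder of a risk ratio.
# All parameter values are hypothetical, for illustration only.

def confounding_bias_factor(p_exposed, p_unexposed, rr_conf_disease):
    """Bias factor contributed by an unmeasured binary confounder.

    p_exposed, p_unexposed: assumed confounder prevalence among the
        exposed and unexposed groups, respectively
    rr_conf_disease: assumed confounder-disease risk ratio
    """
    num = p_exposed * (rr_conf_disease - 1) + 1
    den = p_unexposed * (rr_conf_disease - 1) + 1
    return num / den

observed_rr = 1.5  # a hypothetical "statistically significant" association
bias = confounding_bias_factor(p_exposed=0.6, p_unexposed=0.2,
                               rr_conf_disease=3.0)
adjusted_rr = observed_rr / bias  # what remains after the assumed confounding
print(round(bias, 2), round(adjusted_rr, 2))
```

On these assumed inputs the bias factor exceeds the observed risk ratio’s excess, and the adjusted risk ratio falls below 1.0; a “significant” association can be entirely an artifact of a plausible unmeasured confounder, which is the point of Greenland’s warning that validity can matter more than random error.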


[1] For an influential example of this sparse genre, see James T. Rosenbaum, “Lessons from litigation over silicone breast implants: A call for activism by scientists,” 276 Science 1524 (1997) (describing the exaggerations, distortions, and misrepresentations of plaintiffs’ expert witnesses in silicone gel breast implant litigation, from the perspective of a highly accomplished physician scientist, who served as a defense expert witness in proceedings before Judge Robert Jones, in Hall v. Baxter Healthcare Corp., 947 F. Supp. 1387 (D. Or. 1996)). In one attempt to “correct the record” in the aftermath of a case, Greenland excoriated a defense expert witness, Professor Robert Makuch, for stating that Bayesian methods are rarely used in medicine or in the regulation of medicines. Sander Greenland, “The Need for Critical Appraisal of Expert Witnesses in Epidemiology and Statistics,” 39 Wake Forest Law Rev. 291, 306 (2004). Greenland heaped adjectives upon his adversary: “ludicrous claim,” “disturbing,” “misleading expert testimony,” and “demonstrably quite false.” See “The Infrequency of Bayesian Analyses in Non-Forensic Court Decisions” (Feb. 16, 2014) (debunking Prof. Greenland’s claims).

[2] One almost comical example of trying too hard to settle a score occurs in a footnote, where Greenland cites a breast implant case as having been reversed in part by another case in the same appellate court. See 39 Wake Forest Law Rev. at 309 n.68, citing Allison v. McGhan Med. Corp., 184 F.3d 1300, 1310 (11th Cir. 1999), aff’d in part & rev’d in part, United States v. Baxter Int’l, Inc., 345 F.3d 866 (11th Cir. 2003). The subsequent case was not by any stretch of the imagination a reversal of the earlier Allison case; the egregious citation is a legal fantasy. Furthermore, Allison had no connection with the procedures for court-appointed expert witnesses or technical advisors. Perhaps the most charitable interpretation of this footnote is that it was injected by the law review editors or supervisors.

[3] See “Significance Levels are Made a Whipping Boy on Climate Change Evidence: Is .05 Too Strict? (Schachtman on Oreskes)” (Jan. 4, 2015).

[4] In addition to the unfair attack on Professor Makuch, see supra, n.1, there is much that some will find “disturbing,” “misleading,” and even “ludicrous” (some of Greenland’s favorite pejorative adjectives) in the article. Greenland repeats in brief his arguments against the legal system’s use of probabilities of causation, which I have addressed elsewhere.

[5] One of Baxter’s expert witnesses appeared to be the late Professor Patricia Buffler.

[6] See 39 Wake Forest Law Rev. at 294-95, citing Baxter Healthcare Corp. v. Denton, No. 99CS00868, 2002 WL 31600035, at *1 (Cal. App. Dep’t Super. Ct. Oct. 3, 2002) (unpublished); Baxter Healthcare Corp. v. Denton, 120 Cal. App. 4th 333 (2004).

[7] Although Greenland cites to a transcript, the citation is to a judicial opinion, and the actual transcript of testimony is not available at the citation given.

[8] See Denton, supra.

[9] 39 Wake Forest L. Rev. at 297.

[10] 39 Wake Forest L. Rev. at 305 (“If it is necessary to prove causation ‛beyond a reasonable doubt’–or be ‛compelled to give up the null’ – then action can be forestalled forever by focusing on any aspect of available evidence that fails to conform neatly with the causal (alternative) hypothesis. And in medical and social science there is almost always such evidence available, not only because of the ‛play of chance’ (the focus of ordinary statistical theory), but also because of the numerous validity problems in human research.”).

[11] See Peter Green, “Letter from the President to the Lord Chancellor regarding the use of statistical evidence in court cases” (Jan. 23, 2002) (writing on behalf of The Royal Statistical Society; “Although many scientists have some familiarity with statistical methods, statistics remains a specialised area. The Society urges you to take steps to ensure that statistical evidence is presented only by appropriately qualified statistical experts, as would be the case for any other form of expert evidence.”).

[12] 39 Wake Forest Law Rev. at 291 (“In reality, there is no universally accepted method for inferring presence or absence of causation from human observational data, nor is there any universally accepted method for inferring probabilities of causation (as courts often desire); there is not even a universally accepted definition of cause or effect.”).

[13] 39 Wake Forest Law Rev. at 302-03 (“If one is more concerned with explaining associations scientifically, rather than with mechanical statistical analysis, evidence about validity can be more important than statistical results.”).

Sander Greenland on “The Need for Critical Appraisal of Expert Witnesses in Epidemiology and Statistics”

February 8th, 2015

Sander Greenland is one of the few academics, who has served as an expert witness, who has written post-mortems of his involvement in various litigations[1]. Although settling scores with opposing expert witnesses can be a risky business[2], the practice can provide important insights for judges and lawyers who want to avoid the errors of the past. Greenland correctly senses that many errors seem endlessly recycled, and that courts could benefit from disinterested commentary on cases. And so, there should be a resounding affirmation from federal and state courts to the proclaimed “need for critical appraisal of expert witnesses in epidemiology and statistics,” as well as in many other disciplines.

A recent exchange[3] with Professor Greenland led me to revisit his Wake Forest Law Review article. His article raises some interesting points, some mistaken, but some valuable and thoughtful considerations about how to improve the state of statistical expert witness testimony. For better and worse[4], lawyers who litigate health effects issues should read it.

Other Misunderstandings

Greenland posits criticisms of defense expert witnesses[5], who he believes have misinterpreted or misstated the appropriate inferences to be drawn from null studies. In one instance, Greenland revisits one of his own cases, without any clear acknowledgment that his views were largely rejected.[6] The State of California had declared, pursuant to Proposition 65 ( the Safe Drinking Water and Toxic Enforcement Act of 1986, Health and Safety Code sections 25249.5, et seq.), that the State “knew” that di(2-ethylhexyl)phthalate, or “DEHP” caused cancer. Baxter Healthcare challenged the classification, and according to Greenland, the defense experts erroneously interpreted inclusive studies with evidence supporting a conclusion that DEHP does not cause cancer.

Greenland argues that the Baxter expert’s reference[7] to an IARC working group’s classification of DEHP as “not classifiable as to its carcinogenicity to humans” did not support the expert’s conclusion that DEHP does not cause cancer in human. If Baxter’s expert invoked the IARC working group’s classification for complete exoneration of DEHP, then Greenland’s point is fair enough. In his single-minded attack on Baxter’s expert’s testimony, however, Greenland missed a more important point, which is that the IARC’s determination that DEHP is not classifiable as to carcinogenicity is directly contradictory of California’s epistemic claim to “know” that DEHP causes cancer. And Greenland conveniently omits any discussion that the IARC working group had reclassified DEHP from “possibly carcinogenic” to “not classifiable,” in the light of its conclusion that mechanistic evidence of carcinogenesis in rodents did not pertain to humans.[8] Greenland maintains that Baxter’s experts misrepresented the IARC working group’s conclusion[9], but that conclusion, at the very least, demonstrates that California was on very shaky ground when it declared that it “knew” that DEHP was a carcinogen. California’s semantic gamesmanship over its epistemic claims is at the root of the problem, not a misstep by defense experts in describing inconclusive evidence as exonerative.

Greenland goes on to complain that in litigation over health claims:

“A verdict of ‛uncertain’ is not allowed, yet it is the scientific verdict most often warranted. Elimination of this verdict from an expert’s options leads to the rather perverse practice (illustrated in the DEHP testimony cited above) of applying criminal law standards to risk assessments, as if chemicals were citizens to be presumed innocent until proven guilty.

39 Wake Forest Law Rev. at 303. Despite Greenland’s alignment with California in the Denton case, the fact of the matter is that a verdict of “uncertain” was allowed, and he was free to criticize California for making a grossly exaggerated epistemic claim on inconclusive evidence.

Perhaps recognizing that he may be readily be seen as an advocate for coming to the defense of California on the DEHP issue, Greenland protests that:

“I am not suggesting that judgments for plaintiffs or actions against chemicals should be taken when evidence is inconclusive.”

39 Wake Forest Law Rev. at 305. And yet, his involvement in the Denton case (as well as other cases, such as silicone gel breast implant cases, thimerosal cases, etc.) suggest that he is willing to lend aid and support to judgments for plaintiffs when the evidence is inconclusive.

Important Advice and Recommendations

These foregoing points are rather severe limitations to Greenland’s article, but lawyers and judges should also look to what is good and helpful here. Greenland is correct to call out expert witnesses, regardless of party of affiliation, who opine that inconclusive studies are “proof” of the null hypothesis. Although some of Greenland’s arguments against the use of significance probability may be overstated, his corrections to the misstatements and misunderstandings of significance probability should command greater attention in the legal community. In one strained passage, however, Greenland uses a disjunction to juxtapose null hypothesis testing with proof beyond a reasonable doubt[10]. Greenland of course understands the difference, but the context would lead some untutored readers to think he has equated the two probabilistic assessments. Writing in a law review for lawyers and judges might have led him to be more careful. Given the prevalence of plaintiffs’ counsel’s confusing the 95% confidence coefficient with a burden of proof akin to beyond a reasonable doubt, great care in this area is, indeed, required.

Despite his appearing for plaintiffs’ counsel in health effects litigation, some of Greenland’s suggestions are balanced and perhaps more truth-promoting than many plaintiffs’ counsel would abide. His article provides an important argument in favor of raising the legal criteria for witnesses who purport to have expertise to address and interpret epidemiologic and experimental evidence[11]. And beyond raising qualification requirements above mere “reasonable pretense at expertise,” Professor Greenland offers some thoughtful, helpful recommendations for improving expert witness testimony in the courts:

  • “Begin publishing projects in which controversial testimony (a matter of public record) is submitted, and as space allows, published on a regular basis in scientific or law journals, perhaps with commentary. An online version could provide extended excerpts, with additional context.
  • Give courts the resources and encouragement to hire neutral experts to peer-review expert testimony.
  • Encourage universities and established scholarly societies (such as AAAS, ASA, APHA, and SER) to conduct workshops on basic epidemiologic and statistical inference for judges and other legal professionals.”

39 Wake Forest Law Rev. at 308.

Each of these three suggestions is valuable and constructive, and worthy of an independent paper. The recommendation of neutral expert witnesses and scholarly tutorials for judges is hardly new. Many defense counsel and judges have argued for them in litigation and in commentary. The first recommendation, of publishing “controversial testimony” is part of the purpose of this blog. There would be great utility to making expert witness testimony, and analysis thereof, more available for didactic purposes. Perhaps the more egregious testimonial adventures should be republished in professional journals, as Greenland suggests. Greenland qualifies his recommendation with “as space allows,” but space is hardly the limiting consideration in the digital age.

Causation

Professor Greenland correctly points out that causal concepts and conclusions are often essentially contested[12], but his argument might well be incorrectly taken for “anything goes.” More helpfully, Greenland argues that various academic ideals should infuse expert witness testimony. He suggests that greater scholarship, with acknowledgment of all viewpoints, and all evidence, is needed in expert witnessing. 39 Wake Forest Law Rev. at 293.

Greenland’s argument provides an important corrective to the rhetoric of Oreskes, Cranor, Michaels, Egilman, and others on “manufacturing doubt”:

“Never force a choice among competing theories; always maintain the option of concluding that more research is needed before a defensible choice can be made.”

Id. Despite his position in the Denton case, and others, Greenland and all expert witnesses are free to maintain that more research is needed before a causal claim can be supported. Greenland also maintains that expert witnesses should “look past” the conclusions drawn by authors, and base their opinions on the “actual data” on which the statistical analyses are based, and from which conclusions have been drawn. Courts have generally rejected this view, but if courts were to insist upon real expertise in epidemiology and statistics, then the testifying expert witnesses should not be constrained by the hearsay opinions in the discussion sections of published studies – sections which by nature are incomplete and tendentious. See Follow the Data, Not the Discussion” (May 2, 2010).

Greenland urges expert witnesses and legal counsel to be forthcoming about their assumptions, their uncertainty about conclusions:

“Acknowledgment of controversy and uncertainty is a hallmark of good science as well as good policy, but clashes with the very time limited tasks faced by attorneys and courts”

39 Wake Forest Law Rev. at 293-4. This recommendation would be helpful in assuring courts that the data may simply not support conclusions sufficiently certain to be submitted to lay judges and jurors. Rosen v. Ciba-Geigy Corp., 78 F.3d 316, 319, 320 (7th Cir. 1996) (“But the courtroom is not the place for scientific guesswork, even of the inspired sort. Law lags science; it does not lead it.”) (internal citations omitted).

Threats to Validity

One of the serious mistakes counsel often make in health effects litigation is to invite courts to believe that statistical significance is sufficient for causal inferences. Greenland emphasizes that validity considerations often are much stronger, and more important considerations than the play of random error[13]:

“For very imperfect data (e.g., epidemiologic data), the limited conclusions offered by statistics must be further tempered by validity considerations.”

*   *   *   *   *   *

“Examples of validity problems include non-random distribution of the exposure in question, non-random selection or cooperation of subjects, and errors in assessment of exposure or disease.”

39 Wake Forest Law Rev. at 302-03. Greenland’s abbreviated list of threats to validity should remind courts that they cannot sniff a p-value below five percent and then safely kick the can to the jury. The literature on evaluating bias and confounding is huge, but Greenland was a co-author on an important recent paper, which needs to be added to the required reading lists of judges charged with gatekeeping expert witness opinion testimony about health effects. See Timothy L. Lash, et al., “Good practices for quantitative bias analysis,” 43 Internat’l J. Epidem. 1969 (2014).


[1] For an influential example of this sparse genre, see James T. Rosenbaum, “Lessons from litigation over silicone breast implants: A call for activism by scientists,” 276 Science 1524 (1997) (describing the exaggerations, distortions, and misrepresentations of plaintiffs’ expert witnesses in silicone gel breast implant litigation, from the perspective of a highly accomplished scientist-physician, who served as a defense expert witness, in proceedings before Judge Robert Jones, in Hall v. Baxter Healthcare Corp., 947 F. Supp. 1387 (D. Or. 1996)). In one attempt to “correct the record” in the aftermath of a case, Greenland excoriated a defense expert witness, Professor Robert Makuch, for stating that Bayesian methods are rarely used in medicine or in the regulation of medicines. Sander Greenland, “The Need for Critical Appraisal of Expert Witnesses in Epidemiology and Statistics,” 39 Wake Forest Law Rev. 291, 306 (2004). Greenland heaped adjectives upon his adversary: “ludicrous claim,” “disturbing,” “misleading expert testimony,” and “demonstrably quite false.” See “The Infrequency of Bayesian Analyses in Non-Forensic Court Decisions” (Feb. 16, 2014) (debunking Prof. Greenland’s claims).

[2] One almost comical example of trying too hard to settle a score occurs in a footnote, where Greenland cites a breast implant case as having been reversed in part by another case in the same appellate court. See 39 Wake Forest Law Rev. at 309 n.68, citing Allison v. McGhan Med. Corp., 184 F.3d 1300, 1310 (11th Cir. 1999), aff’d in part & rev’d in part, United States v. Baxter Int’l, Inc., 345 F.3d 866 (11th Cir. 2003). The subsequent case was not by any stretch of the imagination a reversal of the earlier Allison case; the egregious citation is a legal fantasy. Furthermore, Allison had no connection with the procedures for court-appointed expert witnesses or technical advisors. Perhaps the most charitable interpretation of this footnote is that it was injected by the law review editors or supervisors.

[3] See “Significance Levels are Made a Whipping Boy on Climate Change Evidence: Is .05 Too Strict? (Schachtman on Oreskes)” (Jan. 4, 2015).

[4] In addition to the unfair attack on Professor Makuch, see supra, n.1, there is much that some will find “disturbing,” “misleading,” and even “ludicrous” (some of Greenland’s favorite pejorative adjectives) in the article. Greenland repeats in brief his arguments against the legal system’s use of probabilities of causation, which I have addressed elsewhere.

[5] One of Baxter’s expert witnesses appeared to be the late Professor Patricia Buffler.

[6] See 39 Wake Forest Law Rev. at 294-95, citing Baxter Healthcare Corp. v. Denton, No. 99CS00868, 2002 WL 31600035, at *1 (Cal. App. Dep’t Super. Ct. Oct. 3, 2002) (unpublished); Baxter Healthcare Corp. v. Denton, 120 Cal. App. 4th 333 (2004).

[7] Although Greenland cites to a transcript, the citation is to a judicial opinion, and the actual transcript of testimony is not available at the citation given.

[8] See Denton, supra.

[9] 39 Wake Forest L. Rev. at 297.

[10] 39 Wake Forest L. Rev. at 305 (“If it is necessary to prove causation ‛beyond a reasonable doubt’ – or be ‛compelled to give up the null’ – then action can be forestalled forever by focusing on any aspect of available evidence that fails to conform neatly with the causal (alternative) hypothesis. And in medical and social science there is almost always such evidence available, not only because of the ‛play of chance’ (the focus of ordinary statistical theory), but also because of the numerous validity problems in human research.”).

[11] See Peter Green, “Letter from the President to the Lord Chancellor regarding the use of statistical evidence in court cases” (Jan. 23, 2002) (writing on behalf of The Royal Statistical Society; “Although many scientists have some familiarity with statistical methods, statistics remains a specialised area. The Society urges you to take steps to ensure that statistical evidence is presented only by appropriately qualified statistical experts, as would be the case for any other form of expert evidence.”).

[12] 39 Wake Forest Law Rev. at 291 (“In reality, there is no universally accepted method for inferring presence or absence of causation from human observational data, nor is there any universally accepted method for inferring probabilities of causation (as courts often desire); there is not even a universally accepted definition of cause or effect.”).

[13] 39 Wake Forest Law Rev. at 302-03 (“If one is more concerned with explaining associations scientifically, rather than with mechanical statistical analysis, evidence about validity can be more important than statistical results.”).

Playing Dumb on Statistical Significance

January 4th, 2015

For the last decade, at least, researchers have written to document, explain, and correct a high rate of false-positive research findings in biomedical research[1]. And yet, some authors complain that the traditional standard of statistical significance is too stringent. The best explanation for this paradox appears to lie in these authors’ rhetorical strategy of protecting their “scientific conclusions,” based upon weak and uncertain research findings, from criticism. The strategy includes mischaracterizing significance probability as a burden of proof, and then speciously claiming that the conventional standard of significance is too high a threshold for posterior probabilities of scientific claims. See “Rhetorical Strategy in Characterizing Scientific Burdens of Proof” (Nov. 15, 2014).

Naomi Oreskes is a professor of the history of science at Harvard University. Her writings on the history of geology are well respected; her writings on climate change tend to be more adversarial, rhetorical, and ad hominem. See, e.g., Naomi Oreskes, Merchants of Doubt: How a Handful of Scientists Obscured the Truth on Issues from Tobacco Smoke to Global Warming (N.Y. 2010). Oreskes’ abuse of the meaning of significance probability for her own rhetorical ends is on display in today’s New York Times. Naomi Oreskes, “Playing Dumb on Climate Change,” N.Y. Times Sunday Rev. at 2 (Jan. 4, 2015).

Oreskes wants her readers to believe that those who are resisting her conclusions about climate change are hiding behind an unreasonably high burden of proof, which follows from the conventional standard of significance in significance probability. In presenting her argument, Oreskes consistently misrepresents the meaning of statistical significance and confidence intervals to be about the overall burden of proof for a scientific claim:

“Typically, scientists apply a 95 percent confidence limit, meaning that they will accept a causal claim only if they can show that the odds of the relationship’s occurring by chance are no more than one in 20. But it also means that if there’s more than even a scant 5 percent possibility that an event occurred by chance, scientists will reject the causal claim. It’s like not gambling in Las Vegas even though you had a nearly 95 percent chance of winning.”

Although the confidence interval is related to the pre-specified Type I error rate, alpha, and a conventional alpha of 5% does yield a coefficient of confidence of 95%, Oreskes misstates the confidence interval as a burden of proof consisting of a 95% posterior probability. The “relationship” is either true or not; the p-value or confidence interval provides a probability for the sample statistic, or one more extreme, on the assumption that the null hypothesis is correct. The 95% in a confidence interval refers to the long-run frequency with which confidence intervals, constructed from repeated samples of the same size, will contain the true parameter of interest.
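The long-run frequency interpretation can be illustrated with a short simulation (a sketch of my own, not anything from Oreskes or the sources she cites): construct many 95% intervals from repeated samples, and roughly 95% of them will cover the fixed, true parameter, even though any single interval either contains it or does not.

```python
import random
import statistics
from statistics import NormalDist

random.seed(42)
TRUE_MEAN = 10.0          # the fixed, unknown parameter being estimated
N, TRIALS = 50, 10_000    # sample size and number of repeated samples
z = NormalDist().inv_cdf(0.975)  # ~1.96 for a 95% interval

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, 2.0) for _ in range(N)]
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / N ** 0.5
    if mean - z * se <= TRUE_MEAN <= mean + z * se:
        covered += 1

# Coverage is a property of the procedure, not a posterior probability
# that any particular interval contains the truth.
print(covered / TRIALS)   # close to 0.95
```

No single interval has a “95% chance” of being right once the data are in; the 95% describes how often the method succeeds over many repetitions.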

Oreskes is an historian, but her history of statistical significance appears equally ill-considered. Here is how she describes the “severe” standard of the 95% confidence interval:

“Where does this severe standard come from? The 95 percent confidence level is generally credited to the British statistician R. A. Fisher, who was interested in the problem of how to be sure an observed effect of an experiment was not just the result of chance. While there have been enormous arguments among statisticians about what a 95 percent confidence level really means, working scientists routinely use it.”

First, Oreskes, the historian, gets the history wrong. The confidence interval is due to Jerzy Neyman, not to Sir Ronald A. Fisher. Jerzy Neyman, “Outline of a theory of statistical estimation based on the classical theory of probability,” 236 Philos. Trans. Royal Soc’y Lond. Ser. A 333 (1937). Second, although statisticians have debated the meaning of the confidence interval, they have not wandered from its essential use as an estimation of the parameter (based upon the use of an unbiased, consistent sample statistic) and a measure of random error (not systematic error) about the sample statistic. Oreskes provides a fallacious history, with a false and misleading statistics tutorial.

Oreskes, however, goes on to misidentify the 95% coefficient of confidence with the legal standard known as “beyond a reasonable doubt”:

“But the 95 percent level has no actual basis in nature. It is a convention, a value judgment. The value it reflects is one that says that the worst mistake a scientist can make is to think an effect is real when it is not. This is the familiar “Type 1 error.” You can think of it as being gullible, fooling yourself, or having undue faith in your own ideas. To avoid it, scientists place the burden of proof on the person making an affirmative claim. But this means that science is prone to ‘Type 2 errors’: being too conservative and missing causes and effects that are really there.

Is a Type 1 error worse than a Type 2? It depends on your point of view, and on the risks inherent in getting the answer wrong. The fear of the Type 1 error asks us to play dumb; in effect, to start from scratch and act as if we know nothing. That makes sense when we really don’t know what’s going on, as in the early stages of a scientific investigation. It also makes sense in a court of law, where we presume innocence to protect ourselves from government tyranny and overzealous prosecutors — but there are no doubt prosecutors who would argue for a lower standard to protect society from crime.

When applied to evaluating environmental hazards, the fear of gullibility can lead us to understate threats. It places the burden of proof on the victim rather than, for example, on the manufacturer of a harmful product. The consequence is that we may fail to protect people who are really getting hurt.”

The truth of climate change opinions does not turn on sampling error, but rather on the desire to draw an inference from messy, incomplete, non-random, and inaccurate measurements, fed into models of uncertain validity. Oreskes suggests that significance probability is keeping us from acknowledging a scientific fact, but the climate change data sets are amply large to rule out sampling error if that were the problem. And Oreskes’ suggestion that statistical significance somehow places a burden upon the “victim” simply assumes what she hopes to prove; namely, that there is a victim (and a perpetrator).

Oreskes’ solution seems to have a Bayesian ring to it. She urges that we should start with our a priori beliefs, intuitions, and pre-existing studies, and allow them to lower our threshold for significance probability:

“And what if we aren’t dumb? What if we have evidence to support a cause-and-effect relationship? Let’s say you know how a particular chemical is harmful; for example, that it has been shown to interfere with cell function in laboratory mice. Then it might be reasonable to accept a lower statistical threshold when examining effects in people, because you already have reason to believe that the observed effect is not just chance.

This is what the United States government argued in the case of secondhand smoke. Since bystanders inhaled the same chemicals as smokers, and those chemicals were known to be carcinogenic, it stood to reason that secondhand smoke would be carcinogenic, too. That is why the Environmental Protection Agency accepted a (slightly) lower burden of proof: 90 percent instead of 95 percent.”

Oreskes’ rhetoric misstates key aspects of scientific method. The demonstration of causality in mice, or only some perturbation of cell function in non-human animals, does not warrant lowering our standard for studies in human beings. Mice and rats are, for many purposes, poor predictors of human health effects. All medications developed for human use are tested in animals first, for safety and efficacy. A large majority of such medications, efficacious in rodents, fail to satisfy the conventional standards of significance probability in randomized clinical trials. And that standard is not lowered because the drug sponsor had previously demonstrated efficacy in mice, or some other furry rodent.

The EPA meta-analysis of passive smoking and lung cancer is a good example of how not to conduct science. The protocol for the EPA meta-analysis called for a 95% confidence interval, but the agency scientists manipulated their results by altering the pre-specified coefficient of confidence in their final report. Perhaps even more disgraceful was the selectivity of the studies included in the meta-analysis, which biased the agency’s result in a way not reflected in p-values or confidence intervals. See “EPA Cherry Picking (WOE) – EPA 1992 Meta-Analysis of ETS & Lung Cancer – Part 1” (Dec. 2, 2012); “EPA Post Hoc Statistical Tests – One Tail vs Two” (Dec. 2, 2012).
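The arithmetic effect of relaxing the coefficient of confidence from 95% to 90% is easy to see with made-up numbers (the relative risk and standard error below are hypothetical illustrations, not the EPA’s actual figures): the narrower 90% interval can exclude 1.0 where the 95% interval does not.

```python
from math import exp, log
from statistics import NormalDist

# Hypothetical pooled result, for illustration only -- not the EPA's data.
rr = 1.19            # pooled relative risk estimate
se_log_rr = 0.10     # standard error of log(RR)

for conf in (0.95, 0.90):
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # 1.96, then 1.645
    lo = exp(log(rr) - z * se_log_rr)
    hi = exp(log(rr) + z * se_log_rr)
    print(f"{conf:.0%} CI: ({lo:.2f}, {hi:.2f})")
# 95% CI: (0.98, 1.45)  -- includes 1.0 (not "significant")
# 90% CI: (1.01, 1.40)  -- excludes 1.0 once the criterion is relaxed
```

Nothing about the underlying data changes; only the reported interval shrinks, which is why a post hoc switch of the coefficient of confidence is a methodological red flag.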

Of course, the scientists preparing for and conducting a meta-analysis on environmental tobacco smoke began with a well-justified belief that active smoking causes lung cancer. Passive smoking, however, involves very different exposure levels and raises serious issues about the human body’s defensive mechanisms against low-level exposure. Insisting on a reasonable-quality meta-analysis for passive smoking and lung cancer was not a matter of “playing dumb”; it was a recognition of our actual ignorance and uncertainty about the claim being made for low-exposure effects. The shifty confidence intervals and slippery methodology exemplify how agency scientists assume their probandum to be true, and then manipulate or adjust their methods to provide the result they had assumed all along.

Oreskes then analogizes not playing dumb on environmental tobacco smoke to not playing dumb on climate change:

“In the case of climate change, we are not dumb at all. We know that carbon dioxide is a greenhouse gas, we know that its concentration in the atmosphere has increased by about 40 percent since the industrial revolution, and we know the mechanism by which it warms the planet.

WHY don’t scientists pick the standard that is appropriate to the case at hand, instead of adhering to an absolutist one? The answer can be found in a surprising place: the history of science in relation to religion. The 95 percent confidence limit reflects a long tradition in the history of science that valorizes skepticism as an antidote to religious faith.”

I will leave the substance of the climate change issue to others, but Oreskes’ methodological misidentification of the 95% coefficient of confidence with a burden of proof is wrong. Regardless of motive, the error obscures the real debate, which is about data quality. More disturbing is that Oreskes’ error confuses significance and posterior probabilities, and distorts the meaning of the burden of proof. To be sure, the article by Oreskes is labeled opinion, and Oreskes is entitled to her opinions about climate change and whatever else. To the extent that her opinions, however, are based upon obvious factual errors about statistical methodology, they are entitled to no weight at all.


 

[1] See, e.g., John P. A. Ioannidis, “How to Make More Published Research True,” 11 PLoS Medicine e1001747 (2014); John P. A. Ioannidis, “Why Most Published Research Findings Are False” 2 PLoS Medicine e124 (2005); John P. A. Ioannidis, Anna-Bettina Haidich, and Joseph Lau, “Any casualties in the clash of randomised and observational evidence?” 322 Brit. Med. J. 879 (2001).

 

Showing Causation in the Absence of Controlled Studies

December 17th, 2014

The Federal Judicial Center’s Reference Manual on Scientific Evidence has avoided any clear, consistent guidance on the issue of case reports. The Second Edition waffled:

“Case reports lack controls and thus do not provide as much information as controlled epidemiological studies do. However, case reports are often all that is available on a particular subject because they usually do not require substantial, if any, funding to accomplish, and human exposure may be rare and difficult to study. Causal attribution based on case studies must be regarded with caution. However, such studies may be carefully considered in light of other information available, including toxicological data.”

F.J.C. Reference Manual on Scientific Evidence at 474-75 (2d ed. 2000). Note the complete lack of discussion of baseline risk, prevalence of exposure, and the external validity of the “toxicological data.”

The second edition’s more analytically acute and rigorous chapter on statistics generally acknowledged the unreliability of anecdotal evidence of causation. See David Kaye & David Freedman, “Reference Guide on Statistics,” in F.J.C. Reference Manual on Scientific Evidence 91-92 (2d ed. 2000).

The Third Edition of the Reference Manual is even less coherent. Professor Berger’s introductory chapter[1] begrudgingly acknowledges, without approval, that:

“[s]ome courts have explicitly stated that certain types of evidence proffered to prove causation have no probative value and therefore cannot be reliable.”[59]

The chapter on statistical evidence, which had been relatively clear in the second edition, now states that controlled studies may be better but case reports can be helpful:

“When causation is the issue, anecdotal evidence can be brought to bear. So can observational studies or controlled experiments. Anecdotal reports may be of value, but they are ordinarily more helpful in generating lines of inquiry than in proving causation.”[14]

Reference Manual at 217 (3d ed. 2011). The “ordinarily” is given no context or contour for readers. These authors fail to provide any guidance on what will come from anecdotal evidence, or on when and why anecdotal reports may do more than merely generate “lines of inquiry.”

In Matrixx Initiatives Inc. v. Siracusano, 131 S. Ct. 1309 (2011), the Supreme Court went out of its way, way out of its way, to suggest that statistical significance was not always necessary to support conclusions of causation in medicine. Id. at 1319. The Court cited three Circuit court decisions to support its suggestion, but two of the three involved specific causation inferences from so-called differential etiologies. General causation was assumed in those two cases, and not at issue[2]. The third case, the notorious Wells v. Ortho Pharmaceutical Corp., 788 F. 2d 741, 744–745 (11th Cir. 1986), was also cited in support of the suggestion that statistical significance was not necessary, but in Wells, the plaintiffs’ expert witnesses actually relied upon studies that claimed at least nominal statistical significance. Wells was, and remains, representative of what results when trial judges ignore the constraints of study validity. The Supreme Court, in any event, abjured any intent to specify “whether the expert testimony was properly admitted in those cases [Wells and others],” and the Court made no “attempt to define here what constitutes reliable evidence of causation.” 131 S. Ct. at 1319.

The causal claim in Siracusano involved anosmia, loss of the sense of smell, from the use of Zicam, zinc gluconate. The case arose from a motion to dismiss the complaint; no evidence was ever presented or admitted. No baseline risk of anosmia was pleaded; nor did plaintiffs allege that any controlled study demonstrated an increased risk of anosmia from nasal instillation of zinc gluconate. There were, however, clinical trials conducted in the 1930s, with zinc sulfate for poliomyelitis prophylaxis, which showed a substantial incidence of anosmia in the treated children[3]. Matrixx tried to argue that this evidence was unreliable, in part because it involved a different compound, but this argument (1) in turn demonstrated a factual issue that required discovery and perhaps a trial, and (2) traded on a clear error in asserting that the zinc in zinc sulfate and zinc gluconate were different, when in fact they are both ionic compounds that result in zinc ion exposure, as the active constituent.

The position stridently staked out in Matrixx Initiatives is not uncommon among defense counsel in tort cases. Certainly, similar, unqualified statements, rejecting the use of case reports for supporting causal conclusions, can be found in the medical literature[4].

When the disease outcome has an expected value, a baseline rate, in the exposed population, then case reports simply confirm what we already know: cases of the disease happen in people regardless of their exposure status. For this reason, medical societies, such as the Teratology Society, have issued guidances that generally downplay or dismiss the role that case reports may have in the assessment and determination of causality for birth defects:

“5. A single case report by itself is not evidence of a causal relationship between an exposure and an outcome.  Combinations of both exposures and adverse developmental outcomes frequently occur by chance. Common exposures and developmental abnormalities often occur together when there is no causal link at all. Multiple case reports may be appropriate as evidence of causation if the exposures and outcomes are both well-defined and low in incidence in the general population. The use of multiple case reports as evidence of causation is analogous to the use of historical population controls: the co-occurrence of thalidomide ingestion in pregnancy and phocomelia in the offspring was evidence of causation because both thalidomide use and phocomelia were highly unusual in the population prior to the period of interest. Given how common exposures may be, and how common adverse pregnancy outcome is, reliance on multiple case reports as the sole evidence for causation is unsatisfactory.”

The Public Affairs Committee of the Teratology Society, “Teratology Society Public Affairs Committee Position Paper Causation in Teratology-Related Litigation,” 73 Birth Defects Research (Part A) 421, 423 (2005).
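The Committee’s point about chance co-occurrence is just arithmetic. With hypothetical (but not implausible) rates, a common exposure and a common outcome will co-occur in hundreds of births even under complete independence:

```python
# Hypothetical rates, chosen only to illustrate the Teratology Society's point.
births = 100_000
p_exposure = 0.20    # a common exposure during pregnancy
p_defect = 0.03      # rough background rate of major birth defects

# Under independence -- i.e., "no causal link at all" -- co-occurrences
# are still expected in large numbers:
expected = births * p_exposure * p_defect
print(round(expected))   # 600 potential "case reports," with zero causation
```

By contrast, when both the exposure and the outcome are vanishingly rare in the population (thalidomide and phocomelia), even a handful of co-occurrences is striking, which is the Committee’s historical-controls analogy.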

When the base rate for the outcome is near zero, and other circumstantial evidence is present, some commentators insist that causality may be inferred from well-documented case reports:

“However, we propose that some adverse drug reactions are so convincing, even without traditional chronological causal criteria such as challenge tests, that a well documented anecdotal report can provide convincing evidence of a causal association and further verification is not needed.”

Jeffrey K. Aronson & Manfred Hauben, “Drug safety: Anecdotes that provide definitive evidence,” 333 Brit. Med. J. 1267, 1267 (2006) (Dr. Hauben was medical director of risk management strategy for Pfizer, in New York, at the time of publication). But which ones are convincing, and why?

        *        *        *        *        *        *        *        *        *

Dr. David Schwartz, in a recent blog post, picked up on some of my discussion of the gadolinium case reports (see here and there), and posited the ultimate question: when are case reports sufficient to show causation? David Schwartz, “8 Examples of Causal Inference Without Data from Controlled Studies” (Dec. 14, 2014).

Dr. Schwartz discusses several causal claims, all of which gave rise to litigation at some point, in which case reports or case series played an important, if not dispositive, role:

  1.      Gadolinium-based contrast agents and NSF
  2.      Amphibole asbestos and malignant mesothelioma
  3.      Ionizing radiation and multiple cancers
  4.      Thalidomide and teratogenicity
  5.      Rezulin and acute liver failure
  6.      DES and clear cell vaginal adenocarcinoma
  7.      Vinyl chloride and angiosarcoma
  8.      Manganese exposure and manganism

Dr. Schwartz’s discussion is well worth reading in its entirety, but I wanted to emphasize some of Dr. Schwartz’s caveats. Most of the exposures are rare, as are the outcomes. In some cases, the outcomes occur almost exclusively with the identified exposures. All eight examples pose some danger of misinterpretation. Gadolinium-based contrast agents appear to create a risk of NSF only in the presence of chronic renal failure. Amphibole asbestos, most importantly crocidolite, causes malignant mesothelioma after a very lengthy latency period. Ionizing radiation causes some cancers that are all-too common, but the presence of multiple cancers in the same person, after a suitable latency period, is distinctly uncommon, as is the level of radiation needed to overwhelm bodily defenses and induce cancers. Thalidomide was associated by case reports fairly quickly with phocomelia, which has an extremely low baseline risk. Other birth defects were not convincingly demonstrated by the case series. Rezulin, an oral antidiabetic medication, was undoubtedly causally responsible for rare cases of acute liver failure. Chronic liver disease, however, which is common among type 2 diabetic patients, required epidemiologic evidence, which never materialized[5].

Manganese, by definition, is the cause of manganism, but extremely high levels of manganese exposure, and the specific speciation of the manganese, are essential to the causal connection. Manganism raises another issue often seen in so-called signature diseases: diagnostic accuracy. Unless the diagnostic criteria have perfect (100%) specificity, with no false-positive diagnoses, we should expect false-positive cases to appear when the criteria are applied to large numbers of people. In the welding fume litigation, where plaintiffs’ counsel and physicians engaged in widespread, if not wanton, medico-legal screenings, it was not surprising that they might find occasional cases that appeared to satisfy their criteria. Of course, the more the criteria are diluted to accommodate litigation goals, the more likely there will be false-positive cases.[6]
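The false-positive arithmetic in mass screenings is worth making explicit (the screening size and specificity below are hypothetical, not figures from the welding fume litigation): imperfect criteria alone will generate apparent cases even when no true cases exist.

```python
# Hypothetical screening numbers, for illustration only.
screened = 10_000
specificity = 0.98      # even "good" criteria mislabel 2% of unaffected people
true_cases = 0          # assume no true manganism among those screened

# Expected apparent "cases" produced by imperfect criteria alone:
false_positives = (screened - true_cases) * (1 - specificity)
print(round(false_positives))   # 200 -- and diluting the criteria raises this
```

Every point of lost specificity scales directly with the number of people screened, which is why large medico-legal screening programs reliably produce "cases" regardless of causation.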

Dr. Schwartz identifies some common themes and important factors in identifying the bases for inferring causality from uncontrolled evidence:

“(a) low or no background rate of the disease condition;

(b) low background rate of the exposure;

(c) a clear understanding of the mechanism of action.”

These factors, and perhaps others, should not be confused with strict criteria. The exemplar cases suggest a family resemblance of overlapping factors that help support the inference, even against the most robust skepticism.

In litigation, defense counsel typically argue that analytical epidemiology is always necessary, and plaintiffs’ counsel claim that it is never needed. The truth is more nuanced and conditional, but the great majority of litigated cases do require epidemiology because the claimed harms have an expected incidence or prevalence in the exposed population irrespective of exposure.


[1] Reference Manual on Scientific Evidence at 23 (3d ed. 2011) (citing “Cloud v. Pfizer Inc., 198 F. Supp. 2d 1118, 1133 (D. Ariz. 2001) (stating that case reports were merely compilations of occurrences and have been rejected as reliable scientific evidence supporting an expert opinion that Daubert requires); Haggerty v. Upjohn Co., 950 F. Supp. 1160, 1164 (S.D. Fla. 1996), aff’d, 158 F.3d 588 (11th Cir. 1998) (“scientifically valid cause and effect determinations depend on controlled clinical trials and epidemiological studies”); Wade-Greaux v. Whitehall Labs., Inc., 874 F. Supp. 1441, 1454 (D.V.I. 1994), aff’d, 46 F.3d 1120 (3d Cir. 1994) (stating there is a need for consistent epidemiological studies showing statistically significant increased risks).”).

[2] Best v. Lowe’s Home Centers, Inc., 563 F. 3d 171, 178 (6th Cir 2009); Westberry v. Gislaved Gummi AB, 178 F. 3d 257, 263–264 (4th Cir. 1999).

[3] There may have been a better argument for Matrixx in distinguishing the method and place of delivery of the zinc sulfate in the polio trials of the 1930s, but when Matrixx’s counsel was challenged at oral argument, he asserted simply, and wrongly, that the two compounds were different.

[4] Johnston & Hauser, “The value of a case report,” 62 Ann. Neurology A11 (2007) (“No matter how compelling a vignette may seem, one must always be concerned about the reliability of inference from an “n of one.” No statistics are possible in case reports. Inference is entirely dependent, then, on subjective judgment. For a case meant to suggest that agent A leads to event B, the association of these two occurrences in the case must be compared to the likelihood that the two conditions could co-occur by chance alone …. Such a subjective judgment is further complicated by the fact that case reports are selected from a vast universe of cases.”); David A. Grimes & Kenneth F. Schulz, “Descriptive studies: what they can and cannot do,” 359 Lancet 145, 145, 148 (2002) (“A frequent error in reports of descriptive studies is overstepping the data: studies without a comparison group allow no inferences to be drawn about associations, causal or otherwise.”) (“Common pitfalls of descriptive reports include an absence of a clear, specific, and reproducible case definition, and interpretations that overstep the data. Studies without a comparison group do not allow conclusions about cause and disease.”); Troyen A. Brennan, “Untangling Causation Issues in Law and Medicine: Hazardous Substance Litigation,” 107 Ann. Intern. Med. 741, 746 (1987) (recommending that testifying physicians “[a]void anecdotal evidence; clearly state the opposing side is relying on anecdotal evidence and why that is not good science.”).

[5] See In re Rezulin, 2004 WL 2884327, at *3 (S.D.N.Y. 2004).

[6] This gaming of diagnostic criteria has been a major invitation to diagnostic invalidity in litigation over asbestosis and silicosis in the United States.

More Case Report Mischief in the Gadolinium Litigation

November 28th, 2014

The Decker case is one curious decision, by the MDL trial court, and the Sixth Circuit. Decker v. GE Healthcare Inc., ___ F.3d ___, 2014 FED App. 0258P, 2014 U.S. App. LEXIS 20049 (6th Cir. Oct. 20, 2014). First, the Circuit went out of its way to emphasize that the trial court had discretion, not only in evaluating the evidence on a Rule 702 challenge, but also in devising the criteria of validity[1]. Second, the courts ignored the role and the weight being assigned to Federal Rule of Evidence 703, in winnowing the materials upon which the defense expert witnesses could rely. Third, the Circuit approved what appeared to be extremely asymmetric gatekeeping of plaintiffs’ and defendant’s expert witnesses. The asymmetrical standards probably were the basis for emphasizing the breadth of the trial court’s discretion to devise the criteria for assessing scientific validity[2].

In barring GEHC’s expert witnesses from testifying about gadolinium-naïve nephrogenic systemic fibrosis (NSF) cases, Judge Dan Polster, the MDL judge, appeared to invoke a double standard. Plaintiffs could adduce any case report or adverse event report (AER) on the theory that the reports were relevant to “notice” of a “safety signal” between gadolinium-based contrast agents in MRI and NSF. Defendants’ expert witnesses, however, were held to the most exacting standards of clinical identity with the plaintiff’s particular presentation of NSF, biopsy-proven presence of Gd in affected tissue, and documentation of lack of GBCA exposure, before case reports would be permitted as reliance materials to support the existence of gadolinium-naïve NSF.

A fourth issue with the Decker opinion is the latitude it permitted the district court in allowing plaintiffs’ pharmacovigilance expert witness, Cheryl Blume, Ph.D., to testify, over objections, about the “signal” created by the NSF AERs available to GEHC. Decker at *11. At the same trial, the MDL judge prohibited GEHC’s expert witness, Dr. Anthony Gaspari, from testifying that the AERs described by Blume did not support a clinical diagnosis of NSF.

On a motion for reconsideration, Judge Polster reaffirmed his ruling on grounds that

(1) the AERs were too incomplete to rule in or rule out a diagnosis of NSF, although they were sufficient to create a “signal”;

(2) whether the AERs were actual cases of NSF was not relevant to their being safety signals;

(3) Dr. Gaspari was not an expert in pharmacovigilance, which studied “signals” as opposed to causation; and

(4) Dr. Gaspari’s conclusion that the AERs were not NSF was made without reviewing all the information available to GEHC at the time of the AERs.

Decker at *12.

The fallacy of this stingy approach to Dr. Gaspari’s testimony lies in the courts’ stubborn refusal to recognize that if an AER was not, as a matter of medical science, a case of NSF, then it could not be a “signal” of a possible causal relationship between GBCA and NSF. Pharmacovigilance does not end with ascertaining signals; yet the courts privileged Blume’s opinions on signals even though she could not proceed to the next step and evaluate diagnostic accuracy and causality. This twisted logic makes a mockery of pharmacovigilance. It also led to the exclusion of Dr. Gaspari’s testimony on a key aspect of plaintiffs’ liability evidence.
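Pharmacovigilance “signals” of the kind Blume described are conventionally quantified with disproportionality statistics computed over a reporting database. As an illustration only (the counts below are hypothetical and do not come from the Decker record), the proportional reporting ratio (PRR) can be sketched as:

```python
def prr(a: int, b: int, c: int, d: int) -> float:
    """Proportional reporting ratio: the proportion of reports for the
    suspect drug that involve the event of interest, divided by the same
    proportion for all other drugs in the database.

    a: reports of the event with the suspect drug
    b: reports of other events with the suspect drug
    c: reports of the event with all other drugs
    d: reports of other events with all other drugs
    """
    return (a / (a + b)) / (c / (c + d))

# Hypothetical AER counts, chosen for illustration:
signal = prr(a=30, b=970, c=100, d=98900)
print(f"PRR = {signal:.1f}")
```

A PRR well above the conventional screening thresholds would flag a “signal” for further investigation, but the statistic says nothing about whether the underlying reports are valid diagnoses, which is precisely the step the courts refused to let Dr. Gaspari address.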

The erroneous approach pioneered by Judge Polster was compounded by the district court’s refusal to give a jury instruction that AERs were relevant only to notice, and not to causation. Judge Polster reasoned that “the instruction singles out one type of evidence, and adds, rather than minimizes, confusion.” He cited the lack of any expert witness testimony suggesting that AERs showed causation, and added that “besides, it doesn’t matter because those patients are not, are not the plaintiffs.” Decker at *17.

The lack of dispute about the meaning of AERs would have seemed all the more reason to control jury speculation about their import, and to give a binding instruction on AERs and their limited significance. As for the AER patients’ not being the plaintiffs, well, the case report patients were not the plaintiffs, either. This last reason is not even wrong[3]. The Circuit, in affirming, turned a blind eye to the district court’s exercise of discretion in a way that systematically increased the importance of Blume’s testimony on signals, while systematically hobbling the defendant’s expert witnesses.


[1] “The Standard of Appellate Review for Rule 702 Decisions” (Nov. 12, 2014).

[2] “Gadolinium, Nephrogenic Systemic Fibrosis, and Case Reports” (Nov. 24, 2014).

[3] “Das ist nicht nur nicht richtig, es ist nicht einmal falsch!” The quote is attributed to Wolfgang Pauli in R. E. Peierls, “Wolfgang Ernst Pauli, 1900-1958,” 5 Biographical Memoirs Fellows Royal Soc’y 175, 186 (1960).

 

Gadolinium, Nephrogenic Systemic Fibrosis, and Case Reports

November 24th, 2014

Gadolinium (Gd) is a rare earth element. In its ionic form (+3), gadolinium is known to be highly toxic to humans. Gadolinium is strongly paramagnetic, which makes it a valuable contrast agent for magnetic resonance imaging (MRI). The gadolinium is administered intravenously in a chelated form before MRI. In its chelated form, the ion is escorted out of the body through the kidneys before exposure to free Gd ion occurs. Or that was the theory.

Nephrogenic systemic fibrosis (NSF) is a rare, painful, incurable, progressive connective tissue disease. NSF manifests with skin thickening, fibrosis, and tethering, which means that the affected skin cannot be pulled away from the body. Some patients may develop extracutaneous fibrosis of muscle, lymph nodes, pleura, and other internal organs. Elana J. Bernstein, Christian Schmidt-Lauber, and Jonathan Kay, “Nephrogenic systemic fibrosis: A systemic fibrosing disease resulting from gadolinium exposure,” 26 Best Practice & Research Clin. Rheum. 489, 489 (2012).

As a diagnostic entity, NSF is a relatively recent discovery. The first case was noted in 1997, in California. Within a few years, differential diagnostic criteria to distinguish NSF from other fibrotic diseases were developed. Centers for Disease Control, “Fibrosing skin condition among patients with renal disease–United States and Europe, 1997–2002,” 51 MMWR Morbidity and Mortality Weekly Report 25 (2002). Physicians identified the condition among patients with renal insufficiency who had received MRI with a gadolinium-based contrast agent (GBCA). Given the rarity of both the exposure (GBCA in patients with renal insufficiency) and the outcome (NSF), the relationship between NSF and the use of gadolinium-containing contrast agents for MRI was discovered largely from case reports. A case registry is maintained at Yale University, and has identified 380 cases to date. Shawn E. Cowper, “Nephrogenic Systemic Fibrosis,” at the website for The International Center for Nephrogenic Systemic Fibrosis Research (ICNSFR) [last updated June 15, 2013].

The little epidemiology that exists on the subject generally has found that all “cases” had exposure to Gd[1]. Or almost all. There have been occasional cases reported without exposure to GBCA. Indeed, one case of NSF without prior GBCA was reported last month in the dermatological literature. C. Ross, N. De Rosa, G. Marshman & D. Astill, “Nephrogenic systemic fibrosis in a gadolinium-naïve patient: Successful treatment with oral sirolimus,” Australas. J. Dermatol. (2014); doi: 10.1111/ajd.12176 [Epub ahead of print].

In litigation, the usual scenario is that plaintiffs, their counsel, and their expert witnesses want to offer case reports or case series as probative of a causal association between an exposure and a particular disease outcome. In the silicone gel breast implant litigation, women who characterized themselves as “victims” shouted outside courtrooms, “We are the evidence.”

When the outcome in question has a baseline rate, and the exposure is widespread, this strategy is usually illegitimate, and most courts have limited or prohibited such obvious attempts to prejudice the jury with evidence that has little or no probative value.

The causal connection between NSF and GBCA, described above, was postulated on the basis of case reports, but this is not really a rejection of the general rule about case reports. NSF is an extremely rare outcome, and GBCA administered to patients with serious kidney insufficiency is a fairly rare exposure. In addition, gadolinium ion has a known human toxicity, and the connection between renal insufficiency and Gd toxicity is rather straightforward. The insufficiency of the kidney function results in longer “in residence” times for the GBCA, with the consequence that the gadolinium disassociates from its chelating agent, and the free Gd ion does its damage. Furthermore, biopsies of affected tissues show an uptake of gadolinium in NSF patients.

   *   *   *   *   *   *   *   *

GE Healthcare manufactures Omniscan, a GBCA, for use as an MRI contrast medium. Given the recently discovered dangers of GBCAs in vulnerable patients, Omniscan has been a magnet for lawsuits, with the peak intensity of the litigation field in the MDL courtroom of federal district Judge Dan Polster. Judge Polster tried the first Omniscan case, which resulted in a verdict for the plaintiff. GE appealed, complaining about several of Judge Polster’s rulings, including the uneven handling of case reports. Last month, the Sixth Circuit affirmed. Decker v. GE Healthcare Inc., ___ F.3d ___, 2014 FED App. 0258P, 2014 U.S. App. LEXIS 20049 (6th Cir. Oct. 20, 2014).

General causation between GBCAs and NSF was apparently not disputed in Decker. Although plaintiffs in the GBCA litigation established the causality of GBCA in producing NSF by case reports, Judge Polster refused to permit GEHC’s expert witnesses to testify about their reliance upon case reports of gadolinium-naïve cases of NSF; that is, the court disallowed testimony about reported cases that occurred in the absence of GBCA exposure[2]. Id. at *9. Judge Polster found that the reported gadolinium-naïve case reports were “methodologically flawed” because they did not adequately show, by tissue biopsy or other means, that the NSF patients in question lacked Gd exposure. Id. at *10. The district court speculated that there may have been Gd exposure from a non-MRI procedure, but never explained what non-MRI procedure would involve internal administration of GBCA. Nor did the district court address the temporal relationship between this undocumented, conjectured non-MRI gadolinium-based imaging procedure and the onset of the reported patient’s NSF.

Before trial, defendant GEHC moved for reconsideration of the district court’s previous decision on defensive use of gadolinium-naïve case reports, based upon a then-recent publication of a “purported” case of gadolinium-naïve NSF. Id. at *8. A quick read of the late-breaking case study shows that it was more than a “purported” case. A.A. Lemy, et al., “Revisiting nephrogenic systemic fibrosis in 6 kidney transplant recipients: a single-center experience,” 63 J. Am. Acad. Dermatol. 389 (2010). The cited paper by Lemy had diagnosed NSF in a patient without GBCA exposure, and mass spectrometry testing of affected tissue revealed no Gd. The district court, however, dismissed the Lemy case as irrelevant unless GEHC’s expert witnesses could demonstrate that Lemy’s patient number 5 and the plaintiff were so clinically similar that “it was probable that Mr. Decker’s NSF was not caused by his 2005 Omniscan [exposure].”

The Sixth Circuit affirmed this “tails they win; heads you lose” approach to gatekeeping as all within the scope of the district court’s exercise of discretion. Lemy’s case number 5 and Mr. Decker both had NSF, and yet the courts did not describe clinical varieties of NSF that vary based upon their relatedness to gadolinium exposure. It would seem that the courts were imposing an extremely heavy burden on the defense to show that the gadolinium-naïve cases were absolutely free of Gd exposure, and that they resembled the particular plaintiff’s NSF diagnosis in every respect. Without any evidence about the sensitivity, specificity, or positive predictive value of the diagnostic criteria, the district and the appellate courts seem to have accepted glib demands for absolute identity between the plaintiff’s NSF manifestation and any candidate Gd-free NSF case. Given that there is clinical heterogeneity among Gd-NSF cases, and that causality was basically inferred from cases and case series, the courts’ reasoning seems strained.

The appellate court also seemed blithely unaware of the fallacious circularity of permitting a diagnostic entity to be defined based upon exposure, thereby preventing any fair test of the hypothesis that all NSF cases are caused by gadolinium. This fallacy was advanced in the silicone gel breast implant litigation, where the litigation industry shrank from claims that silicone caused classic connective tissue diseases, in the face of exculpatory epidemiologic studies. The claimants retreated to a claim that silicone caused a “new” disease that was defined by mostly vague, self-reported symptoms [so very different from NSF in this respect], in conjunction with silicone exposure. The court-appointed expert witnesses, however, would have none of these shenanigans:

“The National Science Panel concluded that they do not yet support the inclusion of SSRD [systemic silicone-related disease] in the list of accepted diseases, for 4 reasons. First, the requirement of the inclusion of the putative cause (silicone exposure) as one of the criteria does not allow the criteria set to be tested objectively without knowledge of the presence of implants, thus incurring incorporation bias (27).”

Peter Tugwell, George Wells, Joan Peterson, Vivian Welch, Jacqueline Page, Carolyn Davison, Jessie McGowan, David Ramroth, and Beverley Shea, “Do Silicone Breast Implants Cause Rheumatologic Disorders? A Systematic Review for a Court-Appointed National Science Panel,” 44 Arthritis & Rheumatism 2477, 2479 (2001) (citing David Sackett, “Bias in analytic research,” 32 J. Chronic Dis. 51 (1979)).

Of course, NSF does not share the dubious provenance of SSRD, or SAD [silicone-associated disorder], as it was sometimes known. Still, the analytic studies showing that all, or almost all, NSF cases had GBCA exposure explicitly refrained from defining the NSF case to include gadolinium exposure.

Decker is thus a curious case. The trial and appellate courts talked about preventing the defense expert witnesses from relying upon case reports that were “methodologically flawed,” but the courts never mentioned Federal Rule of Evidence 703, which should have been the basis for such selective pruning of the expert witnesses’ reliance materials. And then there is the matter that even if GEHC were correct about Gd-free NSF cases, the attributable risk of NSF from prior Gd exposure is almost certainly very high, and the debate over whether NSF is a “signature” disease was not likely to affect the case outcome.

Decker can perhaps best be understood as a dispute about specific causation, with general causation established, in which the relative risk of NSF from GBCA exposure is extraordinarily high among patients with renal insufficiency. If there are other causes of NSF, they are considerably rarer than GBCA exposure in renally insufficient patients. In the face of this very high attributable risk, GE’s expert witnesses’ discussions of an idiopathic or other cause were too speculative to pass muster under Rule 702.
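The specific-causation arithmetic behind this point is simple: if the relative risk (RR) among the exposed is very high, the attributable fraction among exposed cases, (RR − 1)/RR, approaches one, leaving little room for idiopathic or alternative causes. A minimal sketch, using hypothetical RR values (no RR estimate appears in the Decker opinions themselves):

```python
def attributable_fraction(rr: float) -> float:
    """Attributable fraction among the exposed: (RR - 1) / RR.

    Interpreted as the proportion of disease among exposed cases
    attributable to the exposure, assuming a valid causal RR.
    """
    if rr <= 1:
        return 0.0
    return (rr - 1) / rr

# Hypothetical values: even a modest RR of 2 yields AF = 0.5;
# very large RRs push the fraction toward 1, so that alternative
# causes account for a vanishing share of exposed cases.
for rr in (2, 10, 20, 100):
    print(f"RR = {rr:>3}: AF = {attributable_fraction(rr):.3f}")
```

On this logic, the courts could conclude that an undifferentiated appeal to idiopathic causation had little probative force, whatever one thinks of the asymmetric gatekeeping that preceded it.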


[1] Elana J. Bernstein, Tamara Isakova, Mary E. Sullivan, Lori B. Chibnik, Myles Wolf & Jonathan Kay, “Nephrogenic systemic fibrosis is associated with hypophosphataemia: a case–control study,” 53 Rheumatology 1613 (2014); T.R. Elmholdt, M. Pedersen, B. Jørgensen, K. Søndergaard, J.D. Jensen, M. Ramsing, and A.B. Olesen, “Nephrogenic systemic fibrosis is found only among gadolinium-exposed patients with renal insufficiency: a case-control study from Denmark,” 165 Br. J. Dermatol. 828 (2011); P. Marckmann, “An epidemic outbreak of nephrogenic systemic fibrosis in a Danish hospital,” 66 Eur. J. Radiol. 187 (2008) (reporting all patients had gadodiamide-enhanced magnetic resonance imaging and severe renal insufficiency before onset of NSF); P. Marckmann, L. Skov, K. Rossen, J.G. Heaf, and H.S. Thomsen, “Case-control study of gadodiamide-related nephrogenic systemic fibrosis,” 22 Nephrol. Dialysis &Transplant. 3174 (2007) (all 19 cases in case-control study had prior exposure to gadolinium (Gd)-containing magnetic resonance imaging contrast agents); Centers for Disease Control, “Nephrogenic Fibrosing Dermopathy Associated with Exposure to Gadolinium-Containing Contrast Agents — St. Louis, Missouri, 2002–2006,” 56 MMWR Morbidity and Mortality Weekly Report (Feb. 23, 2007).

[2] T.A. Collidge, P.C. Thomson, P.B. Mark, et al., “Gadolinium-Enhanced MR Imaging and Nephrogenic Systemic Fibrosis: Retrospective Study of a Renal Replacement Therapy Cohort,” 245 Radiology 168-175 (2007); I.M. Wahba, E.L. Simpson, and K. White, “Gadolinium Is Not The Only Trigger For Nephrogenic Systemic Fibrosis: Insights From Two Cases And Review Of The Recent Literature,” 7 Am. J. Transplant. 1 (2007); A. Deng, D.B. Martin, et al., “Nephrogenic Systemic Fibrosis with a Spectrum of Clinical and Histopathological Presentation: A Disorder of Aberrant Dermal Remodeling,” 37 J. Cutan. Pathol. 204 (2009).

Rhetorical Strategy in Characterizing Scientific Burdens of Proof

November 15th, 2014

The recent opinion piece by Kevin Elliott and David Resnik exemplifies a rhetorical strategy that idealizes and elevates a burden of proof in science, and then declares it is different from legal and regulatory burdens of proof. Kevin C. Elliott and David B. Resnik, “Science, Policy, and the Transparency of Values,” 122 Envt’l Health Persp. 647 (2014) [Elliott & Resnik]. What is astonishing about this strategy is the lack of support for the claim that “science” imposes such a high burden of proof that we can safely ignore it when making “practical” legal or regulatory decisions. Here is how the authors state their claim:

“Very high standards of evidence are typically expected in order to infer causal relationships or to approve the marketing of new drugs. In other social contexts, such as tort law and chemical regulation, weaker standards of evidence are sometimes acceptable to protect the public (Cranor 2008).”

Id.[1] Remarkably, the authors cite no statute, no case law, and no legal treatise for the proposition that the tort law standard for causation is somehow lower than that for a scientific claim of causality. Similarly, the authors cite no support for their claim that regulatory pronouncements are judged under a lower burden. One need only consider the burden a sponsor faces in establishing medication efficacy and safety in a New Drug Application before the Food and Drug Administration. Of course, when agencies engage in assessing causal claims regarding safety, they often act under regulations and guidances that lessen the burden of proof from what would be required in a tort action.[2]

And most important, Elliott and Resnik fail to cite to any work of scientists for the claim that scientists require a greater burden of proof before accepting a causal claim. When these authors’ claims of differential burdens of proof were challenged by a scientist, Dr. David Schwartz, in a letter to the editors, the authors insisted that they were correct, again citing to Carl Cranor, a non-lawyer, non-scientist:

“we caution against equating the standards of evidence expected in tort law with those expected in more traditional scientific contexts. The tort system requires only a preponderance of evidence (> 50% likelihood) to win a case; this is much weaker evidence than scientists typically demand when presenting or publishing results, and confusion about these differing standards has led to significant legal controversies (Cranor 2006).”

Reply to Dr. Schwartz. The only thing the authors added to the discussion was a citation to the same work by Carl Cranor[3], but with a different date for the book.

Whence comes the assertion that science has a heavier burden of proof? Elliott and Resnik cite Cranor for their remarkable proposition, and so where did Cranor find support for the proposition at issue here? In his 1993 book, Cranor suggests that we can think of Type I and Type II error rates as “standards of proof,” which begs the question whether those rates are appropriately used to assess significance probabilities or posterior probabilities[4]. Cranor goes so far in his 1993 book as to describe the usual level of alpha as the “95%” rule, and to claim that regulatory agencies require something akin to proof “beyond a reasonable doubt” when they require two “statistically significant” studies[5]. Thus Cranor’s opinion has its origins in his commission of the transposition fallacy[6].
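The transposition fallacy can be exhibited numerically: a p-value below 0.05 is the probability of data at least as extreme given the null hypothesis, not a 95% posterior probability that the causal hypothesis is true. A toy Bayesian calculation makes the gap concrete (the base rate and power figures below are illustrative assumptions, not drawn from Cranor or any study):

```python
# Transposition fallacy, illustrated with Bayes' theorem.
# Hypothetical inputs: 10% of tested hypotheses are true (base rate),
# studies have 80% power, and alpha is set at 0.05.
base_rate = 0.10   # P(H1): prior probability the hypothesis is true
power     = 0.80   # P(significant result | H1)
alpha     = 0.05   # P(significant result | H0)

p_sig = power * base_rate + alpha * (1 - base_rate)
posterior = power * base_rate / p_sig  # P(H1 | significant result)

print(f"P(hypothesis true | p < 0.05) = {posterior:.2f}")
```

With these inputs the posterior is 0.64, nowhere near the 95% certainty that the “95% rule” rhetoric suggests; the posterior depends on the prior plausibility of the hypothesis, which a p-value alone cannot supply.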

Cranor has persisted in this fallacious analysis in his later books. In his 2006 book, he erroneously equates the 95% coefficient of statistical confidence with 95% certainty of knowledge[7]. Later in the text, he asserts that agency regulations are written only when supported by proof “beyond a reasonable doubt.”[8]

To be fair, it is possible to find regulators stating something close to what Cranor asserts, but only when they themselves are committing the transposition fallacy:

“Statistical significance is a mathematical determination of the confidence in the outcome of a test. The usual criterion for establishing statistical significance is the p-value (probability value). A statistically significant difference in results is generally indicated by p < 0.05, meaning there is less than a 5% probability that the toxic effects observed were due to chance and were not caused by the chemical. Another way of looking at it is that there is a 95% probability that the effect is real, i.e., the effect seen was the result of the chemical exposure.”

U.S. Dep’t of Labor, Guidance for Hazard Determination for Compliance with the OSHA Hazard Communication Standard (29 CFR § 1910.1200) Section V (July 6, 2007).

And it is similarly possible to find policy wonks expressing similar views. In 1993, the Carnegie Commission published a report in which it tried to explain away junk science as simply the discrepancy in burdens of proof between law and science, but its reasoning clearly points to the Commission’s commission of the transposition fallacy:

“The reality is that courts often decide cases not on the scientific merits, but on concepts such as burden of proof that operate differently in the legal and scientific realms. Scientists may misperceive these decisions as based on a misunderstanding of the science, when in actuality the decision may simply result from applying a different norm, one that, for the judiciary, is appropriate.  Much, for instance, has been written about ‘junk science’ in the courtroom. But judicial decisions that appear to be based on ‘bad’ science may actually reflect the reality that the law requires a burden of proof, or confidence level, other than the 95 percent confidence level that is often used by scientists to reject the possibility that chance alone accounted for observed differences.”

The Carnegie Commission on Science, Technology, and Government, Report on Science and Technology in Judicial Decision Making 28 (1993)[9].

Elliott, Resnik, and Cranor’s rhetoric is commonplace in the courtroom. Here is how the rhetorical strategy plays out. Plaintiffs’ counsel elicit concessions from defense expert witnesses that they are using the “norms” and standards of science in presenting their opinions. Counsel then argue to the finder of fact that the defense experts are wonderful, but irrelevant, because the fact finder must decide the case on a lower standard. This stratagem finds support in the writings of plaintiffs’ counsel and their expert witnesses[10]. The stratagem also shows up in the writings of law professors who are critical of the law’s embrace of scientific scruples in the courtroom[11].

This cacophony of error, from advocates and commentators, has led the courts into frequent error on the subject. Thus, Judge Pauline Newman, who sits on the United States Court of Appeals for the Federal Circuit, and who was a member of the Committee on the Development of the Third Edition of the Reference Manual on Scientific Evidence, wrote in one of her appellate opinions[12]:

“Scientists as well as judges must understand: ‘the reality that the law requires a burden of proof, or confidence level, other than the 95 percent confidence level that is often used by scientists to reject the possibility that chance alone accounted for observed differences’.”

Reaching back even further into the judiciary’s wrestling with the issue of the difference between legal and scientific standards of proof, we have one of the clearest and clearly incorrect statements of the matter[13]:

“Petitioners demand sole reliance on scientific facts, on evidence that reputable scientific techniques certify as certain. Typically, a scientist will not so certify evidence unless the probability of error, by standard statistical measurement, is less than 5%. That is, scientific fact is at least 95% certain.  Such certainty has never characterized the judicial or the administrative process. It may be that the ‘beyond a reasonable doubt’ standard of criminal law demands 95% certainty.  Cf. McGill v. United States, 121 U.S.App. D.C. 179, 185 n.6, 348 F.2d 791, 797 n.6 (1965). But the standard of ordinary civil litigation, a preponderance of the evidence, demands only 51% certainty. A jury may weigh conflicting evidence and certify as adjudicative (although not scientific) fact that which it believes is more likely than not. ***”

The 95% certainty appears to derive from 95% confidence intervals, although “confidence” is a technical term in statistics, and it most certainly does not mean the probability of the alternative hypothesis under consideration. Similarly, the probability that is less than 5% is not the probability that the null hypothesis is correct. The United States Court of Appeals for the District of Columbia Circuit thus fell for the rhetorical gambit in accepting the strawman that scientific certainty is 95%, whereas civil and administrative law certainty is a smidgen above 50%.
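What 95% “confidence” actually means can be made concrete by simulation: the 95% attaches to the long-run coverage of the interval-constructing procedure, not to the probability that any particular hypothesis is true. A small sketch with simulated normal data (the sample size, true mean, and seed are arbitrary illustrative choices):

```python
import random

random.seed(42)

def covers(mu: float, n: int = 30, z: float = 1.96) -> bool:
    """Draw a sample of n observations from N(mu, 1); return True if
    the 95% confidence interval for the mean covers the true mu.
    Sigma is taken as known (= 1) to keep the sketch simple."""
    xs = [random.gauss(mu, 1) for _ in range(n)]
    mean = sum(xs) / n
    half = z / n ** 0.5
    return mean - half <= mu <= mean + half

trials = 10_000
hits = sum(covers(mu=3.0) for _ in range(trials))
print(f"coverage ≈ {hits / trials:.3f}")  # close to 0.95 in the long run
```

Over many repetitions the intervals cover the true mean about 95% of the time; nothing in the procedure assigns a 95% probability to any substantive scientific hypothesis, which is precisely the conflation in the Ethyl Corp. footnote.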

We should not be too surprised that courts have erroneously described burdens of proof in the realm of science. Even within legal contexts, judges have a very difficult time articulating exactly how different verbal formulations of the burden of proof translate into probability statements. In one of his published decisions, Judge Jack Weinstein reported an informal survey of judges of the Eastern District of New York on what they believed were the correct quantifications of the legal burdens of proof. United States v. Fatico, 458 F.Supp. 388 (E.D.N.Y. 1978). The results confirm that judges, who must deal with burdens of proof first as lawyers and then as “umpires” on the bench, have no settled idea of how to translate verbal formulations into mathematical quantities. One judge believed that “clear, unequivocal and convincing” required a higher level of proof (90%) than “beyond a reasonable doubt”; no judge placed “beyond a reasonable doubt” above 95%; and a majority of the judges polled placed the criminal standard below 90%.

In running down Elliott, Resnik, and Cranor’s assertions about burdens of proof, all I could find was the commonplace error involved in moving from 95% confidence to 95% certainty. Otherwise, I found scientists declaring that the burden of proof should rest with the scientist who is making the novel causal claim. Carl Sagan famously declaimed, “extraordinary claims require extraordinary evidence[14],” but he appears never to have succumbed to the temptation to provide a quantification of the posterior probability that would cinch the claim.

If anyone has evidence to support Resnik’s claim, other than the transposition fallacy or the confusion of certainty with the coefficient of statistical confidence, please share.


 

[1] The authors’ citation is to Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice (NY 2008). Professor Cranor teaches philosophy at one of the University of California campuses. He is neither a lawyer nor a scientist, but he does participate with some frequency as a consultant, and as an expert witness, in lawsuits, on behalf of claimants.

[2] See, e.g., In re Agent Orange Product Liab. Litig., 597 F. Supp. 740, 781 (E.D.N.Y. 1984) (Weinstein, J.) (“The distinction between avoidance of risk through regulation and compensation for injuries after the fact is a fundamental one.”), aff’d 818 F.2d 145 (2d Cir. 1987) (approving district court’s analysis), cert. denied sub nom. Pinkney v. Dow Chemical Co., 487 U.S. 1234 (1988).

[3] Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice (NY 2006).

[4] Carl F. Cranor, Regulating Toxic Substances: A Philosophy of Science and the Law at 33-34 (Oxford 1993) (“One can think of α and β (the chances of Type I and Type II errors, respectively) and 1 − β as measures of the ‘risk of error’ or ‘standards of proof.’”). See also id. at 44, 47, 55, 72-76.

[5] Id. (squaring 0.05 to arrive at “the chances of two such rare events occurring” as 0.0025).

[6] Michael D. Green, “Science Is to Law as the Burden of Proof is to Significance Testing: Book Review of Cranor, Regulating Toxic Substances: A Philosophy of Science and the Law,” 37 Jurimetrics J. 205 (1997) (taking Cranor to task for confusing significance and posterior (burden of proof) probabilities). At least one other reviewer was not as discerning as Professor Green and fell for Cranor’s fallacious analysis. Steven R. Weller, “Book Review: Regulating Toxic Substances: A Philosophy of Science and Law,” 6 Harv. J. L. & Tech. 435, 436, 437-38 (1993) (“only when the statistical evidence gathered from studies shows that it is more than ninety-five percent likely that a test substance causes cancer will the substance be characterized scientifically as carcinogenic … to determine legal causality, the plaintiff need only establish that the probability with which it is true that the substance in question causes cancer is at least fifty percent, rather than the ninety-five percent to prove scientific causality”).

[7] Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice 100 (2006) (incorrectly asserting, without further support, that “[t]he practice of setting α =.05 I call the “95% rule,” for researchers want to be 95% certain that when knowledge is gained [a study shows new results] and the null hypothesis is rejected, it is correctly rejected.”).

[8] Id. at 266.

[9] There were some scientists on the Commission’s Task Force, but most of the members were lawyers.

[10] Jan Beyea & Daniel Berger, “Scientific misconceptions among Daubert gatekeepers: the need for reform of expert review procedures,” 64 Law & Contemporary Problems 327, 328 (2001) (“In fact, Daubert, as interpreted by ‛logician’ judges, can amount to a super-Frye test requiring universal acceptance of the reasoning in an expert’s testimony. It also can, in effect, raise the burden of proof in science-dominated cases from the acceptable “more likely than not” standard to the nearly impossible burden of ‛beyond a reasonable doubt’.”).

[11] Lucinda M. Finley, “Guarding the Gate to the Courthouse: How Trial Judges Are Using Their Evidentiary Screening Role to Remake Tort Causation Rules,” 336 DePaul L. Rev. 335, 348 n. 49 (1999) (“Courts also require that the risk ratio in a study be ‘statistically significant,’ which is a statistical measurement of the likelihood that any detected association has occurred by chance, or is due to the exposure. Tests of statistical significance are intended to guard against what are called ‘Type I’ errors, or falsely ascribing a relationship when there in fact is not one (a false positive).” Finley erroneously ignores the conditioning of the significance probability on the null hypothesis, and she suggests that statistical significance is sufficient for ascribing causality); Erica Beecher-Monas, Evaluating Scientific Evidence: An Interdisciplinary Framework for Intellectual Due Process 42 n. 30, 61 (2007) (“Another way of explaining this is that it describes the probability that the procedure produced the observed effect by chance.”) (“Statistical significance is a statement about the frequency with which a particular finding is likely to arise by chance.”).

[12] Hodges v. Secretary Dep’t Health & Human Services, 9 F.3d 958, 967 (Fed. Cir. 1993) (Newman, J., dissenting) (citing and quoting from the Report of the Carnegie Commission on Science, Technology, and Government, Science and Technology in Judicial Decision Making 28 (1993)).

[13] Ethyl Corp. v. EPA, 541 F.2d 1, 28 n.58 (D.C. Cir.), cert. denied, 426 U.S. 941 (1976).

[14] Carl Sagan, Broca’s Brain: Reflections on the Romance of Science 93 (1979).

Substantial Factor Versus Sine Qua Non Causation

October 7th, 2014

In a prosecution against the eponymously named Mr. Mullet and other Amish defendants, the Department of Justice grabbed an Amish beard- and hair-cutting case from state authorities and cast it as a hate crime. United States v. Mullet, 868 F. Supp. 2d 618 (N.D. Ohio 2012). The criminal statute invoked by the federal prosecutors prohibits

“willfully caus[ing] bodily injury to any person . . . because of the actual or perceived . . . religion . . . of [that] person . . . .”

18 U.S.C. § 249(a)(2)(A). The prosecution managed to persuade the trial judge, Judge Polster, that “because of” means merely “significant motivating factor,” but the Sixth Circuit would have none of it, and reversed. United States v. Miller, 2014 U.S. App. LEXIS 16532, 2014 FED App. 0210P (6th Cir. 2014); see Debra Cassens Weiss, “6th Circuit reverses hate-crime convictions of Amish in beard- and hair-cutting attacks” (Aug. 28, 2014).

The Court of Appeals held, in a two-to-one decision, that the statute required a “but for” jury instruction, and it reversed and remanded. Most plainly, the appellate court stated:

“‘[B]ecause of’ in brief means what it says: The prohibited act or motive must be an actual cause of the specified outcome.”

United States v. Miller, at *12.

The appellate court cited the common meaning of “because of” and the treatment this phrase has received in criminal[1] and civil[2] cases in the United States Supreme Court. The defendants had presented evidence of other non-religious, non-prohibited motives and thus the district court’s charge was not harmless.

The court then, rather inconsistently, pointed to the “beyond a reasonable doubt standard” and constitutional concerns over religious freedom, as requiring “but for,” despite the identical interpretive outcome in civil cases. Id. What happens when, as in Miller, there are clearly several motives involved:

“How should a jury measure whether a specific motive was significant in inspiring a defendant to act? Is a motive significant if it is one of three reasons he acted? One of ten?”

Id. at *12. The same difficulty could be raised against using the “significant” or “substantial factor” test in civil cases.

More persuasive was simply the invocation of common usage and the need to construe a statute leniently in favor of the defendant.

The dissenting judge would have brushed this all under the rug as “harmless error,” but failure to charge properly on the correct causation standard is rarely going to pass as harmless, and it did not do so here. Even the dissenter, however, acknowledged that:

“This but-for requirement is part of the common understanding of cause.”

Id. at *46 (Sargus, J., dissenting) (quoting Burrage v. United States, 134 S. Ct. 881, 888 (2014)).


[1] Burrage v. United States, 134 S. Ct. 881, 887–89 (2014) (criminal).

[2] Univ. of Tex. Sw. Med. Ctr. v. Nassar, 133 S. Ct. 2517, 2528 (2013) (civil); Gross v. FBL Fin. Servs., Inc., 557 U.S. 167, 176–77 (2009) (civil); Safeco Ins. Co. of Am. v. Burr, 551 U.S. 47, 63–64 & n.14 (2007) (civil).

Cancer Epidemiology 100 Years Ago

October 6th, 2014

Writing from the Department of Pathology of Columbia University, at the College of Physicians and Surgeons, Isaac Levin published a study of cancer etiology in 1910. Isaac Levin, “III. The Study of the Etiology of Cancer Based on Clinical Statistics,” 51 Ann. Surg. 768 (1910). Levin looked at population and gender prevalence among cancer cases, without age correction or any statistical measure of random error. He compared population prevalence of specific or all-cause mortality without isolating exposure and outcome. Levin’s efforts were earnest, but surely they strike us as primitive. If you want to be disabused of the belief that epidemiology today remains mired in the methodologies and interpretative strategies of the past, Levin’s article is welcome documentation that progress is possible and has in fact occurred.

Levin sums up what was known about occupation and cancer in 1910, which was not much:

“QUESTION 10. OCCUPATION.–Occupation undoubtedly plays an important role in the causation of cancer. The carcinoma of the scrotum of the chimney sweeps, tumors of the bladder of the aniline workers, and X-ray cancer are well known, but it will require a great deal of research to show how direct the influence is that these occupations exert on the causation of cancer, since only a certain number of the workers contract the disease.”

Id. at 776. No acknowledgment of dose response or thresholds. No quantitation of risk against baselines.

Levin goes on to note that:

“[o]f extreme interest seems to be the fact, noted both in England and America, that cancer is comparatively rare among the miners. Table IV, compiled from the twelfth U. S. Census, illustrates this fact:

[Table IV from Levin 1910]

Wilkesbarre and Scranton are mining towns and the death rate is lower than in Harrisburg or in the whole state of Pennsylvania. It seems also to be the opinion of the surgeons in Pennsylvania (personal communication) that cancer is rare among miners.”

Id. at 776.
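Levin’s comparison of crude death rates between mining towns and Harrisburg took no account of age structure. A short Python sketch, with entirely invented numbers, shows the trap: a town with a younger population can post a lower crude cancer death rate even when its age-specific rates are higher, and direct age standardization reverses the comparison:

```python
# Hypothetical numbers only: crude vs. age-standardized cancer death rates.
# Each entry: (age band, share of the population, deaths per 100,000 in band).
mining_town = [("under 45", 0.80, 20), ("45 and over", 0.20, 300)]
city        = [("under 45", 0.50, 15), ("45 and over", 0.50, 250)]
standard    = {"under 45": 0.60, "45 and over": 0.40}  # reference age mix

def crude_rate(pop):
    # Weight each band's rate by that population's own age mix.
    return sum(share * rate for _, share, rate in pop)

def standardized_rate(pop):
    # Weight each band's rate by a common reference age mix instead.
    return sum(standard[band] * rate for band, _, rate in pop)

print(crude_rate(mining_town), crude_rate(city))                # 76.0 132.5
print(standardized_rate(mining_town), standardized_rate(city))  # 132.0 109.0
```

On these invented figures, the mining town’s crude rate (76 per 100,000) is well below the city’s (132.5), yet after standardizing to a common age distribution the mining town’s rate (132) exceeds the city’s (109). Levin’s miners may simply have died young of other causes before reaching the cancer-prone ages.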

There are some other quaint relics of the past here. On the questionnaire used for some 4,000 cases, here is how Levin inquired about “Race or Nationality”:

“RACE OR NATIONALITY. …………Australoid – Coolies of East India; Negroid – Negroes, Negritos of the Philippines; Mongoloid – Chinese, Japanese, American Indians, Filipinos; Melanochroic – Italians, Spaniards, Greeks, Arabs, Jews; Xanthochroic – Fair Europeans. [State not only the name of the race, but also of the subdivision]”

Id. at 772. Anthropology was fairly primitive as well, in 1910.