TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Milward — Unhinging the Courthouse Door to Dubious Scientific Evidence

September 2nd, 2011

It has been an interesting year in the world of expert witnesses.  We have seen David Egilman attempt a personal appeal of a district court’s order excluding him as an expert.  Stephen Ziliak has prattled on about how he steered the Supreme Court from the brink of disaster by helping them to avoid the horrors of statistical significance.  And then we had a philosophy professor turned expert witness, Carl Cranor, publicly touting an appellate court’s decision that held his testimony admissible.  Cranor, under the banner of the Center for Progressive Reform (CPR), hails the First Circuit’s opinion as the greatest thing since Sir Isaac Newton.   Carl Cranor, “Milward v. Acuity Specialty Products: How the First Circuit Opened Courthouse Doors for Wronged Parties to Present Wider Range of Scientific Evidence” (July 25, 2011).

Philosophy Professor Carl Cranor has been trying for decades to dilute the scientific approach to causal conclusions to permit the precautionary principle to find its way into toxic tort cases.  Cranor, along with others, has also criticized federal court expert witness gatekeeping for deconstructing individual studies, showing that the individual studies are weak, and ignoring the overall pattern of evidence from different disciplines.  This criticism has some theoretical merit, but it is typically advanced as an excuse for “manufacturing certainty” from weak, inconsistent, and incoherent scientific evidence.  The criticism also ignores the actual text of the relevant rule – Rule 702, which does not limit the gatekeeping court to assessing individual “pieces” of evidence.  The scientific community acknowledges that there are times when a weaker epidemiologic dataset may be supplemented by strong experimental evidence that leads appropriately to a conclusion of causation.  See, e.g., Hans-Olov Adami, Sir Colin L. Berry, Charles B. Breckenridge, Lewis L. Smith, James A. Swenberg, Dimitrios Trichopoulos, Noel S. Weiss, and Timothy P. Pastoor, “Toxicology and Epidemiology: Improving the Science with a Framework for Combining Toxicological and Epidemiological Evidence to Establish Causal Inference,” 122 Toxicological Sci. 223 (2011) (noting the lack of a systematic, transparent way to integrate toxicologic and epidemiologic data to support conclusions of causality; proposing a “grid” to permit disparate lines of evidence to be integrated into more straightforward conclusions).

For the most part, Cranor’s publications have been ignored in the Rule 702 gatekeeping process.  Perhaps that is why he shrugged off his academic regalia and took on the mantle of the expert witness in Milward v. Acuity Specialty Products, a case involving a claim that benzene exposure caused plaintiff’s acute promyelocytic leukemia (APL), one of several types of acute myeloid leukemia.  Milward v. Acuity Specialty Products Group, Inc., 664 F.Supp. 2d 137 (D.Mass. 2009) (O’Toole, J.).

Philosophy might seem like the wrong discipline to help a court or a jury decide general and specific causation of a rare cancer, with an incidence of less than 8 cases per million per year.  (A PubMed search on leukemia and Cranor yielded no hits.)  Cranor supplemented the other, more traditional testimony from a toxicologist by attempting to show that the toxicologist’s testimony was based upon sound scientific method.  Cranor was particularly intent on showing that the toxicologist, Dr. Martyn Smith, had used sound method to reach a scientific conclusion, even though he lacked strong epidemiologic studies to support his opinion.

The district court excluded Cranor’s testimony, along with plaintiff’s scientific expert witnesses.  The Court of Appeals, however, reversed, and remanded with instructions that plaintiff’s scientific expert witnesses’ opinions were admissible.  639 F.3d 11 (1st Cir. 2011).  Hence Cranor’s and the CPR’s hyperbole about the opening of the courthouse doors.

The district court was appropriately skeptical about plaintiff’s expert witnesses’ reliance upon epidemiologic studies, the results of which were not statistically significant.  Before reaching the issue of statistical significance, however, the district court found that Dr. Smith had relied upon studies that did not properly support his opinion.  664 F.Supp. 2d at 148.  The defense presented Dr. David Garabrant, an expert witness with substantial qualifications and accomplishments in epidemiologic science.  Dr. Garabrant persuaded the Court that Dr. Smith had relied upon some studies that tended to show no association, and others that presented faulty statistical analyses.  Other studies, relied upon by Dr. Smith, presented data on AML, but Dr. Smith speculated that these AML cases could have been APL cases.  Id.

None of the studies relied upon by plaintiffs’ Dr. Smith had a statistically significant result for APL.  Id. at 144.  The district court pointed out that scientists typically take care to rely only upon data that show “statistical significance,” and that Dr. Smith (plaintiffs’ expert witness) deviated from sound scientific method in attempting to support his conclusion with studies that had not ruled out chance as an explanation for their increased risk ratios.  Id.  The district court did not summarize the studies’ results, and so the unsoundness of plaintiffs’ method is difficult to evaluate.  Rather than engaging in hand waving and speculating about “trends” and suggestions, those witnesses could have performed a meta-analysis to increase the statistical precision of a summary point estimate beyond what was achieved in any single, small study.  Neither the plaintiffs nor the district court addressed the issue of aggregating study results to address the role of chance in producing the observed results.
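For readers curious about the mechanics, a fixed-effect, inverse-variance meta-analysis of the sort suggested above can be sketched in a few lines of Python.  The risk ratios and confidence intervals below are hypothetical placeholders, not figures from the Milward record; the point is only to show how pooling several imprecise studies sharpens the summary estimate.

import math

# Hypothetical (RR, lower 95% CI, upper 95% CI) triples from three
# small studies; illustrative numbers only, not the Milward data.
studies = [
    (1.8, 0.6, 5.4),
    (2.1, 0.7, 6.3),
    (1.5, 0.5, 4.5),
]

weights, weighted_logs = [], []
for rr, lo, hi in studies:
    log_rr = math.log(rr)
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)  # SE recovered from CI width
    w = 1 / se ** 2                                  # inverse-variance weight
    weights.append(w)
    weighted_logs.append(w * log_rr)

pooled_log = sum(weighted_logs) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
print(f"Pooled RR: {math.exp(pooled_log):.2f}")
print(f"95% CI: {math.exp(pooled_log - 1.96 * pooled_se):.2f}"
      f" to {math.exp(pooled_log + 1.96 * pooled_se):.2f}")

In this toy example each study’s interval spans 1.0, and the pooled interval (roughly 0.9 to 3.4) is considerably narrower than any single study’s, though it still spans 1.0; pooling buys precision, not a predetermined answer.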

The inability to show a statistically significant result was not surprising given how rare the APL subtype of AML is.  Sample size might legitimately interfere with the ability of epidemiologic studies to detect a statistically significant association that really existed.  If this were truly the case, the lack of a statistically significant association could not be interpreted to mean the absence of an association without potentially committing a type II error. In any event, the district court in Milward was willing to credit the plaintiffs’ claim that epidemiologic evidence may not always be essential for establishing causality.  If causality does exist, however, epidemiologic studies are usually required to confirm the existence of the causal relationship.  Id. at 148.

The district court also took a close look at Smith’s mechanistic biological evidence, and found it equally speculative.  Although plausibility is a desirable feature of a causal hypothesis, it only sets the stage for actual data:

“Dr. Smith’s opinion is that ‘[s]ince benzene is clastogenic and has the capability of breaking and rearranging chromosomes, it is biologically plausible for benzene to cause’ the t(15;17) translocation. (Smith Decl. ¶ 28.b.) This is a kind of ‘bull in the china shop’ generalization: since the bull smashes the teacups, it must also smash the crystal. Whether that is so, of course, would depend on the bull having equal access to both teacups and crystal.”

Id. at 146.

“Since general extrapolation is not justified and since there is no direct observational evidence that benzene causes the t(15;17) translocation, Dr. Smith’s opinion — that because benzene is an agent that can cause some chromosomal mutations, it is ‘plausible’ that it causes the one critical to APL—is simply an hypothesis, not a reliable scientific conclusion.”

Id. at 147.

Judge O’Toole’s opinion is a careful, detailed consideration of the facts and data upon which Dr. Smith relied, but the First Circuit found an abuse of discretion, and reversed.  639 F.3d 11 (1st Cir. 2011).

The Circuit incorrectly suggested that Smith’s opinion was based upon a “weight of the evidence” methodology described by “the world-renowned epidemiologist Sir Arthur Bradford Hill in his seminal methodological article on inferences of causality. See Arthur Bradford Hill, The Environment and Disease: Association or Causation?, 58 Proc. Royal Soc’y Med. 295 (1965).” Id. at 17.  This suggestion is remarkable because everyone knows that it was Arthur’s much smarter brother, Austin, who wrote the seminal article and gave the Bradford Hill name to the famous presidential address published by the Royal Society of Medicine.  Arthur Bradford Hill was not even a knight if he existed at all.

The Circuit’s suggestion is also remarkable for confusing a vague “weight of the evidence” methodology with the statistical and epidemiologic approach of one of the 20th century’s great methodologists.  Sir Austin is known for having conducted the first modern randomized clinical trial, as well as having shown, with fellow knight Sir Richard Doll, the causal relationship between smoking and lung cancer.  Sir Austin wrote one of the first texts on medical statistics, Principles of Medical Statistics (London 1937).  Sir Austin no doubt was turning in his grave when he was associated with Cranor’s loosey-goosey “weight of the evidence” methodology.  See, e.g., Douglas L. Weed, “Weight of Evidence: A Review of Concept and Methods,” 25 Risk Analysis 1545 (2005) (noting the vague, ambiguous, indefinite nature of the concept of “weight of evidence” review).

The Circuit adopted a dismissive attitude towards epidemiology in general, citing to an opinion piece by several cancer tumor biologists, whom the court described as a group from the National Cancer Institute (NCI).  The group was actually a workshop sponsored by the NCI, with participants from many institutions.  Id. at 17 (citing Michele Carbon[e] et al., “Modern Criteria to Establish Human Cancer Etiology,” 64 Cancer Res. 5518, 5522 (2004)).  The cited article did report some suggestions for modifying Bradford Hill’s criteria in the light of modern molecular biology, as well as a sense of the group that there was no “hierarchy” in which epidemiology was at the top.  (The group definitely did not address the established concept that some types of epidemiologic studies are analytically more powerful to support inferences of causality than others — the hierarchy of epidemiologic evidence.)

The Circuit then proceeded to evaluate Dr. Smith’s consideration of the available epidemiologic studies.  The Circuit mistakenly defined an “odds ratio” as “the difference in the incidence of a disease between a population that has been exposed to benzene and one that has not.”  Id. at 24.  Having failed to engage with the evidence sufficiently to learn what an odds ratio is, the Circuit Court then proceeded to state that the difference between Dr. Garabrant and Dr. Smith, as to how to calculate the odds ratio in some of the studies, was a mere difference in opinion between experts, and that Dr. Garabrant’s criticisms of Dr. Smith’s approach went to the weight, not the admissibility, of the evidence.  These sparse words are, of course, a legal conclusion, not an explanation, and the Circuit leaves us without any real understanding of how Dr. Smith may have gone astray, yet still have been advancing a legitimate opinion within epidemiology, which was not his discipline.  Id. at 22.  If Dr. Smith’s idea of an odds ratio was as incorrect as the Circuit’s, his calculation may have had no validity whatsoever, and the opinions derived from his flawed ideas may well have failed the requirements of Rule 702.  The Circuit’s opinion is not terribly helpful in understanding anything other than its summary rejection of the district court’s more detailed analysis.
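To see why the Circuit’s definition misses the mark, consider a toy 2×2 table with invented counts.  An odds ratio is a ratio of odds; a relative risk is a ratio of incidences; and what the Circuit described, a difference in incidence, is a third quantity altogether:

# Hypothetical 2x2 table (illustrative counts only):
#                 disease    no disease
# exposed         a = 9      b = 991
# unexposed       c = 4      d = 1996
a, b, c, d = 9, 991, 4, 1996

odds_ratio = (a / b) / (c / d)                   # ratio of odds
risk_exposed = a / (a + b)
risk_unexposed = c / (c + d)
relative_risk = risk_exposed / risk_unexposed    # ratio of incidences
risk_difference = risk_exposed - risk_unexposed  # what the Circuit described

print(f"Odds ratio:      {odds_ratio:.2f}")      # ~4.53
print(f"Relative risk:   {relative_risk:.2f}")   # ~4.50
print(f"Risk difference: {risk_difference:.4f}") # ~0.0070

For a rare disease the odds ratio closely approximates the relative risk, as it does here, but neither measure is a “difference” in incidence.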

The Circuit also advanced the “impossibility” defense for Dr. Smith’s failure to rely upon epidemiologic studies with statistically significant results.  Id. at 24. As noted above, such studies fail to rule out chance for their finding of risk ratios above or below 1.0 (the measure of no association).  Because the likelihood of obtaining a risk ratio of exactly 1.0 is vanishingly small, epidemiologic science must and does consider the role of chance in explaining data that diverges from a measure of no association.  Dr. Smith’s hand waving about the large size of the studies needed to show an increased risk may have some validity in the context of benzene exposure and APL, but it does not explain or justify the failure to use aggregative techniques such as meta-analysis.  The hand waving also does nothing to rule out the role of chance in producing the results he relied upon in court.

The Circuit Court appeared to misunderstand the very nature of the need for statistical evaluation of stochastic biological events, such as APL incidence in a population.  According to the Circuit, Dr. Smith’s reliance upon epidemiologic data was merely

“meant to challenge the theory that benzene exposure could not cause APL, and to highlight that the limited data available was consistent with the conclusions that he had reached on the basis of other bodies of evidence. He stated that ‘[i]f epidemiologic studies of benzene-exposed workers were devoid of workers who developed APL, one could hypothesize that benzene does not cause this particular subtype of AML.’ The fact that, on the  contrary, ‘APL is seen in studies of workers exposed to benzene where the subtypes of AML have been separately analyzed and has been found at higher levels than expected’ suggested to him that the limited epidemiological evidence was at the very least consistent with, and suggestive of, the conclusion that benzene can cause APL.

* * *

Dr. Smith did not infer causality from this suggestion alone, but rather from the accumulation of multiple scientifically acceptable inferences from different bodies of evidence.”

Id. at 25

But challenging the theory that benzene exposure does not cause APL does not help show the validity of the studies relied upon, or the inferences drawn from them.  This was plaintiffs’ and Dr. Smith’s burden under Rule 702, and the Circuit seemed to lose sight of the law and the science with Professor Cranor’s and Dr. Smith’s sleight of hand.  As for the Circuit’s suggestion that scraps of evidence from different kinds of scientific studies can establish scientific knowledge, this approach was rejected by the great mathematician, physicist, and philosopher of science, Henri Poincaré:

“[O]n fait la science avec des faits comme une maison avec des pierres; mais une accumulation de faits n’est pas plus une science qu’un tas de pierres n’est une maison.”

Henri Poincaré, La Science et l’Hypothèse (1905) (chapter 9, Les Hypothèses en Physique) (“Science is built up with facts, as a house is with stones; but an accumulation of facts is no more a science than a heap of stones is a house.”).  Litigants, either plaintiff or defendant, should not be allowed to pick out isolated findings in a variety of studies, and throw them together as if that were science.

As unclear and dubious as the Circuit’s opinion is, the court did not throw out the last 18 years of Rule 702 law.  The Court distinguished the Milward case, with its sparse epidemiologic studies, from those cases “in which the available epidemiological studies found that there is no causal link.”  Id. at 24 (citing Norris v. Baxter Healthcare Corp., 397 F.3d 878, 882 (10th Cir. 2005), and Allen v. Pa. Eng’g Corp., 102 F.3d 194, 197 (5th Cir. 1996)).  The Court, however, provided no insight into why the epidemiologic studies must rise to the level of showing no causal link before an expert can torture weak, inconsistent, and contradictory data to claim such a link.  This legal sleight of hand is simply a shifting of the burden of proof, which should have been on plaintiffs and Dr. Smith.  Desperation is not a substitute for adequate scientific evidence to support a scientific conclusion.

The Court’s failure to engage more directly with the actual data, facts, and inferences, however, is likely to cause mischief in federal cases around the country.

Ziliak Gives Legal Advice — Puts His Posterior On the Line

August 31st, 2011

I have posted before about the curious saga of two university professors of economics who tried to befriend the United States Supreme Court.  Professors Ziliak and McCloskey submitted an amicus brief to the Court, in connection with Matrixx Initiatives, Inc. v. Siracusano, ___ U.S. ___, 131 S.Ct. 1309 (2011).  Nothing unusual there, other than the Professors’ labeling themselves “Statistics Experts,” and then proceeding to commit the statistical howler of deriving a posterior probability from only a p-value.  See “The Matrixx Oversold” (April 4, 2011).

I seemed to be alone in my dismay over this situation, but recently Professor David Kaye, an author of the chapter on statistics in the Reference Manual on Scientific Evidence, weighed in with his rebuttal to Ziliak and McCloskey’s erroneous statistical contentions.  See “The Transposition Fallacy in Matrixx Initiatives, Inc. v. Siracusano: Part I” (August 19, 2011), and “The Transposition Fallacy in Matrixx Initiatives, Inc. v. Siracusano: Part II” (August 26, 2011).  Kaye’s analysis is well worth reading.

Having attempted to bamboozle the Justices on statistics, Stephen Ziliak has now turned his attention to an audience of statisticians and students of statistical science, with a short article in Significance on the Court’s decision in Matrixx.  Stephen Ziliak, “Matrixx v. Siracusano and Student v. Fisher:  Statistical Significance on Trial,” Significance 131 (September 2011).  Tellingly, Ziliak did not advance his novel, erroneous views of how to derive posterior odds or probabilities from p-values in the pages of a magazine published by the Royal Statistical Society.  Such gems were reserved for the audience of Justices and law clerks in Washington, D.C.  Instead of holding forth on statistical issues, Ziliak has used the pages of a statistical journal to advance equally bizarre, inexpert views about the legal meaning of a Supreme Court case.

The Matrixx decision involved the appeal from a dismissal of a complaint for failure to plead sufficient allegations in a securities fraud action.  No evidence was ever offered or refused; no expert witness opinion was held reliable or unreliable.  The defendant, Matrixx Initiatives, Inc., won the dismissal at the district court, only to have the complaint reinstated by the Court of Appeals for the Ninth Circuit.  The Supreme Court affirmed the reinstatement, and in doing so, did not, and could not, have created a holding about the sufficiency of evidence to show causation in a legal proceeding.  Indeed, Justice Sotomayor, in writing for a unanimous Court, specifically stated that causation was not at issue, especially given that evidentiary displays far below what is necessary to show causation between a medication and an adverse event might come to the attention of the FDA, which agency in turn might find the evidence sufficient to order a withdrawal of the medication.

Ziliak, having given dubious statistical advice to the U.S. Supreme Court, now sets himself up to give equally questionable legal advice to the statistical community.  He asserts that Matrixx claimed that anosmia (the loss of the sense of smell) was unimportant because not “statistically significant.”  Id. at 132.  Matrixx Initiatives no doubt made several errors, but it never made this erroneous claim.  Ziliak gives no citation to the parties’ briefs; nor could one be given.  Matrixx never contended that anosmia was unimportant; its claim was that the plaintiffs had not sufficiently alleged facts that Matrixx had knowledge of a causal relationship such that its failure to disclose adverse event reports became a “material” omission under the securities laws.  The word “unimportant” does not occur in the Matrixx’s briefs; nor was it uttered at oral argument.

Ziliak’s suggestion that “[t]he district court dismissed the case on the basis that investors did not prove ‘materiality’, by which that court meant ‘statistical significance’,” is nonsense.  Id. at 132.  The issue was never the sufficiency of evidence.  Matrixx did attempt to equate materiality with causation, and then argued that allegations of causation required, in turn, allegations of statistical significance.  In arguing the necessity of statistical significance, Matrixx was implicitly suggesting that an evidentiary display that fell short of supporting causation could not be material, when withheld from investors.  The Supreme Court had an easy time of disposing of Matrixx’s argument because causation was never at issue.  Everything that the Court did say about causation is readily discernible as dictum.

Ziliak erroneously reads into the Court’s opinion a requirement that a pharmaceutical company, reporting to the Securities and Exchange Commission “can no longer hide adverse effect [sic] reports from investors on the basis that reports are not statistically significant.”   Id. at 133.  Ziliak incorrectly refers to adverse event reports as “adverse effect reports,” which is a petitio principii.  Furthermore, this was not the holding of the Court.  The potentially fraudulent aspect of Matrixx’s conduct was not that it had “hidden” adverse event reports, but rather that it had adverse event reports and a good deal of additional information, none of which it had disclosed to investors, when at the same time, the company chose to give the investment community particularly bullish projections of future sales.  The medication involved, Zicam, was an over-the-counter formulation that never had the rigorous testing required for a prescription medication’s new drug application.

Curiously, Ziliak, the self-described statistics expert, fails to point out that adverse event reports could not achieve, or fail to achieve, statistical significance on the basis of the facts alleged in the plaintiffs’ complaint.  Matrixx, and its legal counsel, might be forgiven this oversight, but surely Ziliak the statistical expert should have noted it.  Indeed, if the parties and the courts had recognized that there never was an issue of statistical significance involved in the case, the entire premise of Matrixx’s appeal would have been taken away.

To be a little fair to Ziliak, the Supreme Court, having disclaimed any effort to require proof of causation or to define the requisites of reliable evidence of causation, went ahead and offered its own dubious dictum on how statistical significance might not be necessary for causation:

“Matrixx’s argument rests on the premise that statistical significance is the only reliable indication of causation. This premise is flawed: As the SEC points out, “medical researchers … consider multiple factors in assessing causation.” Brief for United States as Amicus Curiae 12. Statistically significant data are not always available. For example, when an adverse event is subtle or rare, “an inability to obtain a data set of appropriate quality or quantity may preclude a finding of statistical significance.” Id., at 15; see also Brief for Medical Researchers as Amici Curiae 11. Moreover, ethical considerations may prohibit researchers from conducting randomized clinical trials to confirm a suspected causal link for the purpose of obtaining statistically significant data. See id., at 10-11.

A lack of statistically significant data does not mean that medical experts have no reliable basis for inferring a causal link between a drug and adverse events. As Matrixx itself concedes, medical experts rely on other evidence to establish an inference of causation. See Brief for Petitioners 44-45, n. 22. We note that courts frequently permit expert testimony on causation based on evidence other than statistical significance. See, e.g., Best v. Lowe’s Home Centers, Inc., 563 F.3d 171, 178 (C.A.6 2009); Westberry v. Gislaved Gummi AB, 178 F.3d 257, 263-264 (C.A.4 1999) (citing cases); Wells v. Ortho Pharmaceutical Corp., 788 F.2d 741, 744-745 (C.A.11 1986). We need not consider whether the expert testimony was properly admitted in those cases, and we do not attempt to define here what constitutes reliable evidence of causation.”

What is problematic about this passage is that Justice Sotomayor was addressing situations that were not before the Court, and about which she had no appropriate briefing.  Her suggestion that randomized clinical trials are not always ethically appropriate is, of course, true, but that does not prevent an expert witness from relying upon observational epidemiologic studies – with statistically significant results – to support causal claims.  Justice Sotomayor’s citation to the Best and the Westberry cases, again in dictum, is equally off the mark.  Both cases involve the application of differential etiological reasoning about specific causation, which presupposes that general causation has previously been sufficiently shown.  Finally, Justice Sotomayor’s citation to the Wells case, which involved both general and specific causation issues, was inapposite because plaintiff’s expert witness in Wells did rely upon at least one study with a statistically significant result.  As I have pointed out before, the Wells case went on to become an example of one trial judge’s abject failure to understand and evaluate scientific evidence.

Postscript:

The Supreme Court’s statistical acumen may have been lacking, but the Justices seemed to have a good sense of what was really going on in the case.  In December 2010, Matrixx settled over 2,000 Zicam injury claims.  On February 24, 2011, a month before the Supreme Court decided the Matrixx case, the federal district judge responsible for the Zicam multi-district litigation denied Matrixx’s motion to exclude plaintiffs’ expert witnesses’ causation opinions.  “First Zicam Experts Admitted by MDL Judge for Causation, Labeling Opinions” 15 Mealey’s Daubert Reporter (February 2011); In re Zicam Cold Remedy Marketing, Sales Practices and Products Liab. Litig., MDL Docket No. 2:09-md-02096, Document 1360 (D. Ariz. 2011).

After the Supreme Court affirmed the reinstatement of the securities fraud complaint, Charles Hensley, the inventor of Zicam, was arrested on federal charges of illegally marketing another drug, Vira 38, which he claimed was therapeutic and preventive for bird flu.  Stuart Pfeifer, “Zicam inventor arrested, accused of illegal marketing of flu drug,” Los Angeles Times (June 2, 2011).  Earlier this month, Mr. Hensley pleaded guilty to the charges of unlawful distribution.

Confusion Over Causation in Texas

August 27th, 2011

As I have previously discussed, a risk ratio (RR) ≤ 2 is a strong practical argument against specific causation.  See “Courts and Commentators on Relative Risks to Infer Specific Causation”; “Relative Risks and Individual Causal Attribution”; and “Risk and Causation in the Law.”  But the RR > 2 threshold has little to do with general causation.  There are any number of well-established causal relationships where the magnitude of the ex ante risk in an exposed population is > 1, but ≤ 2.  The magnitude of risk for cardiovascular disease from smoking is one such well-known example.
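The arithmetic behind the RR > 2 rule for specific causation is worth making explicit.  Under simplifying assumptions (an unbiased, unconfounded risk estimate that applies uniformly to the exposed population), the probability that an exposed person’s disease is attributable to the exposure equals the attributable fraction among the exposed, (RR − 1)/RR, which crosses the “more likely than not” line of 50% only when RR exceeds 2.  A minimal sketch:

def probability_of_causation(rr: float) -> float:
    # Attributable fraction among the exposed: (RR - 1) / RR.
    # Valid only under simplifying assumptions: no bias, no
    # confounding, and a risk estimate that applies uniformly.
    return (rr - 1) / rr

for rr in (1.5, 2.0, 3.0):
    print(f"RR = {rr:.1f} -> P(causation) = {probability_of_causation(rr):.2f}")
# RR = 1.5 -> 0.33; RR = 2.0 -> 0.50; RR = 3.0 -> 0.67
# Only above RR = 2 does individual attribution become more likely than not.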

When assessing general causation from only observational epidemiologic studies, where residual confounding and bias may be lurking, it is prudent to require a RR > 2, as a measure of strength of the association that can help rule out the role of systematic error.  As the cardiovascular disease/smoking example illustrates, however, there is clearly no scientific requirement that the RR be greater than 2 to establish general causation.  Much will depend upon the entire body of evidence.  If the other important Bradford Hill factors are present – dose-response, consistency, coherence, etc. – then risk ratios ≤ 2, from observational studies, may suffice to show general causation.  So the requirement of a RR > 2, for the showing of general causation, is a much weaker consideration than it is for specific causation.

Randomization and double blinding are major steps in controlling confounding and bias, but they are not complete guarantees that systematic bias has been eliminated.  A double-blinded, placebo-controlled, randomized clinical trial (RCT) will usually have less opportunity for bias and confounding to play a role.  Imposing a RR > 2 requirement for general causation thus makes less sense in the context of trying to infer general causation from the results of RCTs.

Somehow the Texas Supreme Court managed to confuse these concepts in an important decision this week, Merck & Co. v. Garza (August 26, 2011).

Mr. Garza had a long history of heart disease, at least two decades long, including a heart attack, and quadruple bypass and stent surgeries.  Garza’s physician prescribed 25 mg Vioxx for pain relief.  Garza died less than a month later, at the age of 71, of an acute myocardial infarction.  The plaintiffs (Mr. Garza’s survivors) were thus faced with a problem of showing the magnitude of the risk experienced by Mr. Garza, which risk would allow them to infer that his fatal heart attack was caused by his having taken Vioxx.  The studies relied upon by plaintiffs did show increased risk, consistently, for larger doses (50 mg.) taken over longer periods of time.  The trial court entered judgment upon a jury verdict in favor of the plaintiffs.

The Texas Supreme Court reversed, and rendered the judgment for Merck.  The Court’s judgment was based largely upon its view that the studies relied upon did not apply to the plaintiff.  Here the Court was on pretty solid ground.  The plaintiffs also argued that Mr. Garza had a higher pre-medication, baseline risk, and that he therefore would have sustained a greater increased risk from short-term, low-dose use of Vioxx.  The Court saw through this speculative argument, and cautioned that the “absence of evidence cannot substitute for evidence.” Slip op. at 17.  The greater baseline does not mean that the medication imposed a greater relative risk on people like Mr. Garza, although it would mean that we would expect to see more cases from any subgroup that looked like him.  The attributable fraction and the difficulty in using risk to infer individual attribution, however, would remain the same.

The problematic aspect of the Garza case arises from the Texas Supreme Court’s conflating and confusing general with specific causation.  There was no real doubt that Vioxx at high-doses, for prolonged use, can cause heart attacks.  General causation was not at issue.  The attribution of Mr. Garza’s heart attack to his short-term, low-dose use of Vioxx, however, was at issue, and was a rather dubious claim.

The Texas Supreme Court proceeded to rely heavily upon its holding and language in Merrell Dow Pharmaceuticals, Inc. v. Havner, 953 S.W.2d 706 (Tex. 1997).  Havner was a Bendectin case, in which plaintiffs claimed that the medication caused specific birth defects.  Both general and specific causation were contested by the parties. The epidemiologic evidence in Havner came from observational studies, either case-control or cohort studies, and not RCTs.

The Havner decision insightfully recognized that risk does not equal causation, but that RR > 2 is a practical compromise for allowing courts and juries to make the plaintiff-specific attribution in the face of uncertainty.  Havner, 953 S.W.2d at 717.  Merck latched on to this and other language, arguing that “Havner requires a plaintiff who claims injury from taking a drug to produce two independent epidemiological studies showing a statistically significant doubling of the relative risk of the injury for patients taking the drug under conditions substantially similar to the plaintiff’s (dose and duration, for example) as compared to patients taking a placebo.”  Slip op. at 7.

The plaintiffs in Garza responded by arguing that their reliance upon RCTs relieved them of Havner‘s requirement of showing a RR > 2.

The Texas Supreme Court correctly rejected the plaintiffs’ argument and followed its earlier decision in Havner on specific causation:

“But while the controlled, experimental, and prospective nature of clinical trials undoubtedly make them more reliable than retroactive, observational studies, both must show a statistically significant doubling of the risk in order to be some evidence that a drug more likely than not caused a particular injury.”

Slip op. at 10.

The Garza Court, however, went a dictum too far by expressing some of the Havner requirements as applying to general causation:

Havner holds, and we reiterate, that when parties attempt to prove general causation using epidemiological evidence, a threshold requirement of reliability is that the evidence demonstrate a statistically significant doubling of the risk. In addition, Havner requires that a plaintiff show ‘that he or she is similar to [the subjects] in the studies’ and that ‘other plausible causes of the injury or condition that could be negated [are excluded] with reasonable certainty’.

Slip op. at 13-14 (quoting from Havner at 953 S.W.2d at 720).

General causation was not the dispositive issue in Garza, and so this language must be treated as dictum.  The sloppiness in confusing the requisites of general and specific causation is regrettable.

The plaintiffs also advanced another argument, which is becoming a commonplace in health-effects litigation.  They threw all their evidence into a pile, and claimed that the “totality of the evidence” supported their claims.  This argument is somehow supposed to supplant a reasoned approach to the issue of what specific inferences can be drawn from what kind of evidence.  The Texas Supreme Court saw through the pile, and dismissed the hand waving:

“The totality of the evidence cannot prove general causation if it does not meet the standards for scientific reliability established by Havner. A plaintiff cannot prove causation by presenting different types of unreliable evidence.”

Slip op. at 17.

All in all, the Garza Court did better than many federal courts that have consistently confused risk with cause, as well as general with specific causation.

Bad and Good Statistical Advice from the New England Journal of Medicine

July 2nd, 2011

Many people consider The New England Journal of Medicine (NEJM) a prestigious journal.  It is certainly widely read.  Judging from its “impact factor,” we know the journal is frequently cited.  So when the NEJM weighs in on an issue that involves the intersection of law and science, I pay attention.

Unfortunately, this week’s issue contains an editorial “Perspective” piece that is filled with incoherent, inconsistent, and incorrect assertions, both on the law and the science.  Mark A. Pfeffer and Marianne Bowler, “Access to Safety Data – Stockholders versus Prescribers,” 364 New Engl. J. Med. ___ (2011).

Dr. Mark Pfeffer and the Hon. Marianne Bowler used the recent United States Supreme Court decision in Matrixx Initiatives, Inc. v. Siracusano, __ U.S. __, 131 S.Ct. 1309 (2011), to advance views not supported by the law or the science.  Remarkably, Dr. Pfeffer is the Victor J. Dzau Professor of Medicine at the Harvard Medical School.  He is a physician, and he holds a Ph.D. in physiology and biophysics.  Ms. Bowler is both a lawyer and a federal judge.  Between the two, they should have provided better, more accurate, and more consistent advice.

1. The Authors Erroneously Characterize Statistical Significance in Inappropriate Bayesian Terms

The article begins with a relatively straightforward characterization of various legal burdens of proof.  The authors then try to collapse one of those burdens of proof, “beyond a reasonable doubt,” which has no accepted quantitative meaning, to a significance probability that is used to reject a pre-specified null hypothesis in scientific studies:

“To reject the null hypothesis (that a result occurred by chance) and deem an intervention effective in a clinical trial, the level of proof analogous to law’s ‘beyond a reasonable doubt’ standard would require an extremely stringent alpha level to permit researchers to claim a statistically significant effect, with the offsetting risk that a truly effective intervention would sometimes be deemed ineffective.  Instead, most randomized clinical trials are designed to achieve a lower level of evidence that in legal jargon might be called ‘clear and convincing’, making conclusions drawn from it highly probable or reasonably certain.”

Now this is both scientific and legal nonsense.  It is distressing that a federal judge characterizes the burden of proof that she must apply, or direct juries to apply, as “legal jargon.”  More important, these authors, scientist and judge, give questionable quantitative meanings to burdens of proof, and they misstate the meaning of statistical significance.  When judges or juries must determine guilt “beyond a reasonable doubt,” they are assessing the prosecution’s claim that the defendant is guilty, given the evidence at trial.  This posterior probability can be represented as:

Probability (Guilt | Evidence Adduced)

This is what is known as a posterior probability, and it is fundamentally different from significance probability.
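A toy computation, with invented numbers, shows how different the two probabilities can be.  Suppose a piece of evidence would appear against an innocent defendant 5% of the time (the analogue of a significance level), and suppose the prior probability of guilt in the relevant population is 1%.  Bayes’ theorem then yields a posterior probability of guilt of only about 17%:

# Illustrative numbers only; nothing here comes from any real case.
p_guilt = 0.01                    # prior probability of guilt
p_evidence_given_guilt = 1.0      # assume the guilty always show the evidence
p_evidence_given_innocent = 0.05  # false-positive rate (analogue of alpha)

# Bayes' theorem: P(G | E) = P(E | G) * P(G) / P(E)
p_evidence = (p_evidence_given_guilt * p_guilt
              + p_evidence_given_innocent * (1 - p_guilt))
p_guilt_given_evidence = p_evidence_given_guilt * p_guilt / p_evidence

print(f"P(evidence | innocent) = {p_evidence_given_innocent:.2f}")  # 0.05
print(f"P(guilt | evidence)    = {p_guilt_given_evidence:.2f}")     # ~0.17

The 5% figure answers the question, “how often would innocence produce this evidence?”; the 17% figure answers, “given this evidence, how probable is guilt?”  They are different questions, with different answers.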

The significance probability is a conditional probability that runs in the opposite direction from the posterior probability used to assess guilt in a criminal trial, or contentions in a civil trial.  As law professor David Kaye and his statistician coauthor, the late David Freedman, described the p-value and significance probability:

“The p-value is the probability of getting data as extreme as, or more extreme than, the actual data, given that the null hypothesis is true:

p = Probability (extreme data | null hypothesis in model)

* * *

Conversely, large p-values indicate that the data are compatible with the null hypothesis: the observed difference is easy to explain by chance. In this context, small p-values argue for the plaintiffs, while large p-values argue for the defense.

Since p is calculated by assuming that the null hypothesis is correct (no real difference in pass rates), the p-value cannot give the chance that this hypothesis is true. The p-value merely gives the chance of getting evidence against the null hypothesis as strong or stronger than the evidence at hand—assuming the null hypothesis to be correct. No matter how many samples are obtained, the null hypothesis is either always right or always wrong. Chance affects the data, not the hypothesis. With the frequency interpretation of chance, there is no meaningful way to assign a numerical probability to the null hypothesis.”

David H. Kaye and David A. Freedman, “Reference Guide on Statistics,” Federal Judicial Center, Reference Manual on Scientific Evidence 122 (2d ed. 2000).  Kaye and Freedman explained over a decade ago, for the benefit of federal judges:

“As noted above, it is easy to mistake the p-value for the probability that there is no difference. Likewise, if results are significant at the .05 level, it is tempting to conclude that the null hypothesis has only a 5% chance of being correct.

This temptation should be resisted. From the frequentist perspective, statistical hypotheses are either true or false; probabilities govern the samples, not the models and hypotheses. The significance level tells us what is likely to happen when the null hypothesis is correct; it cannot tell us the probability that the hypothesis is true. Significance comes no closer to expressing the probability that the null hypothesis is true than does the underlying p-value.”

Id. at 124-25.
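To put Kaye and Freedman’s definition into numbers, here is a minimal sketch of a one-sided binomial p-value, using wholly hypothetical figures: 9 adverse outcomes among 20 subjects, tested against a null hypothesis of a 25% background rate.

from math import comb

n, observed, p0 = 20, 9, 0.25

def binom_pmf(k: int, n: int, p: float) -> float:
    # Probability of exactly k events in n trials at the null rate p.
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# The p-value: the probability, computed assuming the null hypothesis
# is true, of data at least as extreme as what was observed.
p_value = sum(binom_pmf(k, n, p0) for k in range(observed, n + 1))
print(f"p = {p_value:.3f}")  # about 0.041

# This is P(data this extreme | null), not P(null | data);
# transposing the two is precisely the fallacy at issue.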

As we can see, our scientist from the Harvard Medical School and our federal judge have committed the transposition fallacy by likening “beyond a reasonable doubt” to the alpha used to test for a statistically significant outcome in a clinical trial.  They are not the same; nor are they analogous.

This fallacy has been repeatedly described.  Not only has the Reference Manual on Scientific Evidence (which is written specifically for federal judges) described the fallacy in detail, but legal and scientific writers have urged care to avoid this basic mistake in probabilistic reasoning.  Here is a recent admonition from one of the leading writers on the use (and misuse) of statistics in legal procedures:

“Some commentators, however, would go much further; they argue that [5%] is an arbitrary statistical convention and since preponderance of the evidence means 51% probability, lawyers should not use 5% as the level of statistical significance but 49% – thus rejecting the null hypothesis when there is up to a 49% chance that it is true. In their view, to use a 5% standard of significance would impermissibly raise the preponderance of evidence standard in civil trials. Of course the 5% figure is arbitrary (although widely accepted in statistics) but the argument is fallacious. It assumes that 5% (or 49% for that matter) is the probability that the null hypothesis is true. The 5% level of significance is not that, but the probability of the sample evidence if the null hypothesis were true. This is a very different matter. As I pointed out in Chapter 1, the probability of the sample given the null hypothesis is not generally the same as the probability of the null hypothesis given the sample. To relate the level of significance to the probability of the null hypothesis would require an application of Bayes’s theorem and the assumption of a prior probability distribution. However, the courts have usually accepted the statistical standard, although with some justifiable reservations when the P-value is only slightly above the 5% cutoff.”

Michael O. Finkelstein, Basic Concepts of Probability and Statistics in the Law 54 (N.Y. 2009) (emphasis added).

2.  The Authors, Having Mischaracterized Burden-of-Proof and Significance Probabilities, Incorrectly Assess the Meaning of the Supreme Court’s Decision in Matrixx Initiatives.

I have written a good bit about the Court’s decision in Matrixx Initiatives, most recently with David Venderbush, for the Washington Legal Foundation.  See Schachtman & Venderbush, “Matrixx Unbounded: High Court’s Ruling Needlessly Complicates Scientific Evidence Principles,” W.L.F. Legal Backgrounder (June 17, 2011).

I was thus startled to see the claim of a federal judge that the Supreme Court, in Matrixx, had “applied the ‘fair preponderance of the evidence’ standard of proof used for civil matters.”  Matrixx was a case about the sufficiency of the pleadings, and thus there really could have been no such application of a burden of proof to an evidentiary display.  The very claim is incoherent, and at odds with the Supreme Court’s holding.

The NEJM authors went on to detail how the defendant in Matrixx had persuaded the trial court that the evidence against its product, Zicam, did not reach statistical significance, and therefore should not be considered “material.”  As I have pointed out before, Matrixx focused on adverse event reports, as raw numbers of reported events, which did not, and could not, be analyzed for statistical significance.  The very essence of Matrixx’s argument was nonsense, which perhaps explains the company’s nine-to-nothing loss in the Supreme Court.  The authors of the opinion piece in the NEJM, however, missed that it is not the evidence of adverse event reports, with or without a statistical analysis, that is material.  What was at issue was the company’s failure to disclose this information, along with a good deal more, at the same time that it gave the investment community particularly bullish projections of future sales.

The NEJM authors proceed to tell us, correctly, that adverse events do not prove causality, but then they tell us, incorrectly, that the Matrixx case shows that “such a high level of proof did not have to be achieved.”  While the authors are correct about the sufficiency of adverse event reports for causal assessments, they miss the legal significance of there being no burden of proof at play in Matrixx; it was a case on the pleadings.  The issue was the sufficiency of those pleadings, and what the Supreme Court made clear was that in the context of a product subject to FDA regulation, causation was never the test for materiality because the FDA could withdraw the product on a showing far less than scientific causation of harm.  So the plaintiffs could allege less than causation, and still have pleaded a sufficient case of securities fraud.  The Supreme Court did not, and could not, address the issue that the NEJM authors discuss.  The authors’ assessment that the Matrixx case freed legal causation of any requirement of statistical significance is a tortured reading of obiter dictum, not the holding of the case.  This editorializing is troubling.

The NEJM authors similarly hold forth on what clinicians consider material, and they announce that “[c]linicians are well aware that to be considered material, information regarding drug safety does not have to reach the same level of certainty that we demand for demonstrating efficacy.”  This is true, but clinicians are ethically bound to err on the side of safety:  Primum non nocere (first, do no harm).  See, e.g., Tamraz v. Lincoln Elec. Co., 620 F.3d 665, 673 (6th Cir. 2010) (noting that treating physicians have more training in diagnosis than in etiologic assessments), cert. denied, ___ U.S. ____ (2011).  Again, the authors’ statements have nothing to do with the Matrixx case, or with the standards for legal or scientific causation.

3.  The Authors, Inconsistently with Their Characterization of Various Probabilities, Proceed Correctly To Describe Statistical Significance Testing for Adverse Outcomes in Trials.

Having incorrectly described “beyond a reasonable doubt” as akin to p < 0.05, the NEJM authors then correctly point out that standard statistical testing cannot be used for “evaluating unplanned and uncommon adverse events.”  The authors also note that the flood of data in the assessment of causation of adverse events is filled with “biologic noise.”  Physicians and regulators may take the noise signals and claim that they hear a concert.  This is exactly why we should not confuse precautionary judgments with scientific assessments of causation.