TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

New Release of PLI’s Treatise on Product Liability Litigation

January 19th, 2013

The Practising Law Institute (PLI) has released a new edition of its treatise on product liability litigation.  Stephanie A. Scharf, Lise T. Spacapan, Traci M. Braun, and Sarah R. Marmor, eds., Product Liability Litigation:  Current Law, Strategies and Best Practices (PLI Dec. 2012).

The new edition, the third release of the treatise, has several new chapters, including my contribution, Chapter 30A, “Statistical Evidence in Products Liability Litigation,” which discusses the use of, and recent developments in, statistical and scientific evidence in the law, including judicial mishandling of “significance probability,” statistical significance, statistical power, and meta-analysis.  Here is the table of contents for this new chapter on statistical evidence:

  • § 30A:1 : Overview 30A-2
  • § 30A:2 : Litigation Context of Statistical Issues 30A-2
  • § 30A:3 : Qualification of Expert Witnesses Who Give Testimony on Statistical Issues 30A-3
  • § 30A:4 : Admissibility of Statistical Evidence—Rules 702 and 703 30A-3
  • § 30A:5 : Significance Probability 30A-5
    • § 30A:5.1 : Definition of Significance Probability (The “p-value”) 30A-5
    • § 30A:5.2 : The Transpositional Fallacy 30A-5
    • § 30A:5.3 : Confusion Between Significance Probability and The Burden of Proof 30A-6
    • § 30A:5.4 : Hypothesis Testing 30A-7
    • § 30A:5.5 : Confidence Intervals 30A-8
    • § 30A:5.6 : Inappropriate Use of Statistical Significance—Matrixx Initiatives, Inc. v. Siracusano 30A-9
      • [A] : Sequelae of Matrixx Initiatives 30A-12
      • [B] : Is Statistical Significance Necessary? 30A-13
  • § 30A:6 : Statistical Power 30A-14
    • § 30A:6.1 : Definition of Statistical Power 30A-14
    • § 30A:6.2 : Cases Involving Statistical Power 30A-15
  • § 30A:7 : Meta-Analysis 30A-17
    • § 30A:7.1 : Definition and History of Meta-Analysis 30A-17
    • § 30A:7.2 : Consensus Statements 30A-18
    • § 30A:7.3 : Use of Meta-Analysis in Litigation 30A-18
    • § 30A:7.4 : Competing Models for Meta-Analysis 30A-20
    • § 30A:7.5 : Recent Cases Involving Meta-Analyses 30A-21
  • § 30A:8 : Conclusion 30A-23

The treatise weighs in at over 40 chapters and more than 1,000 pages.  The table of contents and table of authorities are available online at the PLI’s website.

The PLI is a non-profit educational organization, chartered by the Regents of the University of the State of New York.  The PLI provides continuing legal education, and publishes treatises and handbooks geared for the practitioner.

Litmus Tests

December 27th, 2012

Rule 702 is, or is not, a litmus test for expert witness opinion admissibility.  Relative risk is, or is not, a litmus test for specific causation.  Statistical significance is, or is not, a litmus test for reasonable reliance upon the results of a study.  It is relatively easy to find judicial opinions on either side of the litmus divide.  Compare National Judicial College, Resource Guide for Managing Complex Litigation at 57 (2010) (Daubert is not a litmus test) with Cryer v. Werner Enterprises, Inc., Civ. Action No. 05-S-696-NE, Mem. Op. & Order at 16 n. 63 (N.D. Ala. Dec. 28, 2007) (describing the Eleventh Circuit’s restatement of Rule 702’s “litmus test” for the methodological reliability of proffered expert witness opinion testimony).

The “litmus test” is one sorry, overworked metaphor.  Perhaps its appeal has to do with a vague collective memory that litmus paper is one of those “things of science,” which we used in high school chemistry, and never had occasion to use again. Perhaps litmus tests have the appeal of “proofiness.”

The reality is different. The litmus test is a semi-quantitative test for acidity or alkalinity.  Neutral litmus is purple.  Under acidic conditions, litmus turns red; under basic conditions, it turns blue.  For some time, scientists have used pH meters when they want a precise quantification of acidity or alkalinity.  Litmus paper is a fairly crude test, which easily discriminates moderate acidity from alkalinity (say pH 4 from pH 11), but is relatively useless for detecting acidity at pH 6.95, or alkalinity at pH 7.05.

So what exactly are legal authors trying to say when they say that some feature of a test is, or is not, a “litmus test”? The litmus test is accurate, but not precise at the important boundary of neutrality.  The litmus test color can be interpreted for degree of acidity or alkalinity, but it is not the preferred method to obtain a precise measurement. Saying that a judicial candidate’s views on abortion are a litmus test for the Senate’s evaluation of the candidate makes sense, given the relatively binary nature of the outcome of a litmus test, and the polarization of political views on abortion. Apparently, neutral views or views close to neutrality on abortion are not a desideratum for judicial candidates.  A cruder, binary test is exactly what politicians desire.

The litmus test that is used for judicial candidates does not seem to work so well when used to describe scientific or statistical inference.  The litmus test is well understood, but fairly obsolete in modern laboratory practice.  When courts say things such as that statistical significance is not a litmus test for the acceptability of a study’s results, they are clearly correct, because the measure of random error is only one aspect of judging a body of evidence for, or against, an association.  Yet courts seem to imply something else, at least at times:

statistical significance is not an important showing in making a case that an exposure is reliably associated with a particular outcome.

Here courts are trading in half-truths.  Statistical significance is quantitative, and the choice of a level of significance is not based upon immutable law. So, like the slight difference between a pH of 6.95 and 7.05, statistical significance tests have a boundary issue.  Nonetheless, a consideration of random error cannot be dismissed or overlooked on the theory that the significance level is not a “litmus test.”  This metaphor obscures, and attempts to excuse, sloppy thinking.  It is time to move beyond it.
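
To see how thin the boundary is, consider a back-of-the-envelope calculation (the z-statistics below are invented for illustration; the scipy calls are standard):

```python
# Two hypothetical studies whose test statistics differ only slightly,
# but which land on opposite sides of the conventional 0.05 line.
from scipy import stats

for z in (1.95, 1.97):
    p = 2 * stats.norm.sf(z)  # two-sided p-value for a z-statistic
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"z = {z:.2f}, p = {p:.4f} -> {verdict}")

# p = 0.0512 ("not significant") versus p = 0.0488 ("significant"):
# a distinction without much of an evidential difference.
```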

Lumpenepidemiology

December 24th, 2012

Judge Helen Berrigan, who presides over the Paxil birth defects MDL in New Orleans, has issued a nicely reasoned Rule 702 opinion, upholding defense objections to plaintiffs’ expert witnesses, Paul Goldstein, Ph.D., and Shira Kramer, Ph.D. Frischhertz v SmithKline Beecham EDLa 2012 702 MSJ Op.

The plaintiff, Andrea Frischhertz, took GSK’s Paxil, a selective serotonin reuptake inhibitor (SSRI), for depression while pregnant with her daughter, E.F. The parties agreed that E.F. was born with a deformity of her right hand.  Plaintiffs originally claimed that E.F. had a heart defect, but their expert witnesses appeared to give up this claim at deposition, as lacking evidential support.

Adhering to Daubert’s Epistemologic Lesson

Like many other lower federal courts, Judge Berrigan focused her analysis on the language of Daubert v. Merrell Dow Pharmaceuticals Inc., 509 U.S. 579 (1993), a case that has been superseded by subsequent cases and a revision to the operative statute, Rule 702.  Fortunately, the trial court did not lose sight of the key epistemological teaching of Daubert, which is based upon Rule 702:

“Regarding reliability, the [Daubert] Court said: ‘the subject of an expert’s testimony must be “scientific . . . knowledge.” The adjective “scientific” implies a grounding in the methods and procedures of science. Similarly, the word “knowledge” connotes more than subjective belief or unsupported speculation’.”

Slip Op. at 3 (quoting Daubert, 509 U.S. at 589-590).

There was not much to the plaintiffs’ expert witnesses’ opinions beyond speculation, but many other courts have been beguiled by speculation dressed up as “scientific … knowledge.”  Dr. Goldstein relied upon whole embryo culture testing of SSRIs, but in the face of overwhelming evidence, he was forced to concede that this test may generate hypotheses about, but cannot predict, human risk of birth defects.  No doubt this concession made the trial court’s decision easier, but the result would have been required regardless of Dr. Goldstein’s exhibition of truthfulness at deposition.

Statistical Association – A Good Place to Begin

More interestingly, the trial court rejected the plaintiffs’ expert witnesses’ efforts to leapfrog over finding a statistically significant association, straight to parsing the so-called Bradford Hill factors:

“The Bradford-Hill criteria can only be applied after a statistically significant association has been identified. Federal Judicial Center, Reference Manual on Scientific Evidence, 599, n.141 (3d. ed. 2011) (“In a number of cases, experts attempted to use these guidelines to support the existence of causation in the absence of any epidemiologic studies finding an association . . . . There may be some logic to that effort, but it does not reflect accepted epidemiologic methodology.”). See, e.g., Dunn v. Sandoz Pharms., 275 F. Supp. 2d 672, 678 (M.D.N.C. 2003). Here, Dr. Goldstein attempted to use the Bradford-Hill criteria to prove causation without first identifying a valid statistically significant association. He first developed a hypothesis and then attempted to use the Bradford-Hill criteria to prove it. Rec. Doc. 187, Exh. 2, depo. Goldstein, p. 103. Because there is no data showing an association between Paxil and limb defects, no association existed for Dr. Goldstein to apply the Bradford-Hill criteria. Hence, Dr. Goldstein’s general causation opinion is not reliable.”

Slip op. at 6.

The trial court’s rejection of Dr. Goldstein’s attempted end run is particularly noteworthy given the Reference Manual’s weak-kneed suggestion that this reasoning has “some logic” to it.  The Manual never articulates what “logic” commends Dr. Goldstein’s approach; nor does it identify any causal relationship ever established with such paltry evidence in the real world of science. The Manual does cite several legal cases that excused or overlooked the need to find a statistically significant association, and even elevated such reasoning into a legally acceptable, admissible method.  See Reference Manual on Scientific Evidence at 599 n. 141 (describing cases in which purported expert witnesses attempted to use Bradford Hill factors in the absence of a statistically significant association; citing Rains v. PPG Indus., Inc., 361 F. Supp. 2d 829, 836–37 (S.D. Ill. 2004); Soldo v. Sandoz Pharms. Corp., 244 F. Supp. 2d 434, 460–61 (W.D. Pa. 2003)).  The Reference Manual also cited cases, without obvious disapproval, which dispensed entirely with any necessity of considering any of the Bradford Hill factors, or the precondition of a statistically significant association.  See Reference Manual at 599 n. 144 (citing Cook v. Rockwell Int’l Corp., 580 F. Supp. 2d 1071, 1098 (D. Colo. 2006) (“Defendants cite no authority, scientific or legal, that compliance with all, or even one, of these factors is required. . . . The scientific consensus is, in fact, to the contrary. It identifies Defendants’ list of factors as some of the nine factors or lenses that guide epidemiologists in making judgments about causation. . . . These factors are not tests for determining the reliability of any study or the causal inferences drawn from it.”)).

Shira Kramer Takes Her Lumpings

The plaintiffs’ other key expert witness, Dr. Shira Kramer, was a more sophisticated and experienced obfuscator.  Kramer attempted to provide plaintiffs with the necessary association by “lumping” all birth defects together in her analysis of epidemiologic data of birth defects among children of women who had ingested Paxil (or other SSRIs).  Given the clear evidence that different birth defects arise at different times, from interference with different embryological processes, the trial court discerned this “lumping” of end points to be methodologically inappropriate.  Slip op. at 8 (citing Chambers v. Exxon Corp., 81 F. Supp. 2d 661 (M.D. La. 2000), aff’d, 247 F.3d 240 (5th Cir. 2001) (unpublished)).

Without her “lumping,” Dr. Kramer was left with only a weak, inconsistent claim of biological plausibility and temporality. Finding that Dr. Kramer’s opinion had outrun her headlights, Judge Berrigan excluded Dr. Kramer as an expert witness, and granted GSK summary judgment.

Merry Christmas!

 

The Matrixx Motion in U.S. v. Harkonen

December 17th, 2012

United States of America v. W. Scott Harkonen, MD — Part III

Background

The recent oral argument in United States v. Harkonen (see “The (Clinical) Trial by Franz Kafka” (Dec. 11, 2012)) pushed me to revisit the brief filed by the Solicitor General’s office in Matrixx Initiatives Inc. v. Siracusano, 131 S. Ct. 1309 (2011).  One of Dr. Harkonen’s post-trial motions contended that the government’s failure to disclose its Matrixx amicus brief deprived him of a powerful argument that would have resulted from citing the language of the brief, which disparaged the necessity of statistical significance for “demonstrating” causal inferences. See “Multiplicity versus Duplicity – The Harkonen Conviction” (Dec. 11, 2012).

Matrixx Initiatives is a good example of how litigants make bad law when they press for rulings on bad facts.  The Supreme Court ultimately held that pleading and proving causation were not necessary for a securities fraud action that turned on non-disclosure of information about health outcomes among users of the company’s medication. What is required is “materiality,” which may be satisfied upon a much lower showing than causation.  Because Matrixx Initiatives contended that statistical significance was necessary to causation, which in turn was needed to show materiality, much of the briefing before the Supreme Court addressed statistical significance, but the reality is that the Court’s disposition obviated any discussion of the role of statistical inferences for causation. 131 S.Ct. at 1319.

Still, the Supreme Court, in a unanimous opinion, plowed forward and issued its improvident dicta about statistical significance. Taken at face value, the Court’s statement that “the premise that statistical significance is the only reliable indication of causation … is flawed” is unexceptionable. Matrixx Initiatives, 131 S.Ct. at 1319.  For one thing, the statement would be true if statistical significance were necessary but not sufficient to “indicate” causation. But more to the point, there are some cases in which statistical significance may not be part of the analytical toolkit for reaching a causal conclusion. For instance, the infamous Ferebee case, which did not involve Federal Rule of Evidence 702, is a good example of a case that turned on neither epidemiologic nor statistical evidence.  See “Ferebee Revisited” (Nov. 8, 2012) (discussing the agreement of both parties that statistical evidence was not necessary to resolve general causation because of the acute onset, post-exposure, of an extremely uncommon medical outcome – severe diffuse interstitial pulmonary fibrosis).

Surely, there are other such cases, but in modern products liability law, many causation puzzles are based upon the interpretation of rate-driven processes, measured using epidemiologic studies, involving a measurable base-line risk and an observed higher or lower risk among a sample of an exposed population. In this context, some evaluation of the size of random error is, indeed, necessary. The Supreme Court’s muddled dicta, however, has confused the issues by painting with an extremely broad brush.

The dicta in Matrixx Initiatives has already led to judicial errors. The MDL court in the Chantix litigation provides one such instance. Plaintiffs claimed that Chantix, a medication that helps people stop smoking, causes suicide. Pfizer, the manufacturer, challenged plaintiffs’ general causation expert witnesses for not meeting the standards of Federal Rule of Evidence 702, for various reasons, not the least of which was that the studies relied upon by plaintiffs’ witnesses did not show statistical significance.  In re Chantix Prods. Liab. Litig., MDL 2092, 2012 U.S. Dist. LEXIS 130144 (Aug. 21, 2012).  The Chantix MDL court, citing Matrixx Initiatives for a blanket rejection of the need to consider random error, denied the defendant’s challenge. Id. at *41-42 (citing Matrixx Initiatives, 131 S.Ct. at 1319).

The Supreme Court, in Matrixx, however, never stated or implied such a blanket rejection of the importance of considering random error in evidence that was essentially statistical in nature. Of course, if it had done so, it would have been wrong.

Within two weeks of the Chantix decision, a similar erroneous interpretation of Matrixx Initiatives surfaced in MDL litigation over fenfluramine.  Cheek v. Wyeth Pharm. Inc., 2012 U.S. Dist. LEXIS 123485 (E.D. Pa. Aug. 30, 2012). Rejecting a Rule 702 challenge to plaintiffs’ expert witness’s opinion, the MDL trial judge cited Matrixx Initiatives for the assertion that:

“Daubert does not require that an expert opinion regarding causation be based on statistical evidence in order to be reliable. * * * In fact, many courts have recognized that medical professionals often base their opinions on data other than statistical evidence from controlled clinical trials or epidemiological studies.”

Id. at *22 (citing Matrixx Initiatives, 131 S. Ct. at 1319, 1320).  While some causation opinions might appropriately be based upon something other than statistical evidence, the Supreme Court specifically disclaimed any comment upon Rule 702 in Matrixx Initiatives, which was a case about the proper pleading of materiality in a securities fraud case, not about the proper foundation for actual evidence of causation, at trial, of a health-effects claim. The Cheek decision is thus remarkable for profoundly misunderstanding the Matrixx case. There was no resolution of any Rule 702 issue in Matrixx.

The Trial Court’s Denial of the Matrixx Motion in Harkonen

Dr. Harkonen argued that he is entitled to a new trial on the basis of “newly discovered evidence” in the form of the government’s amicus brief in Matrixx. The trial court denied this motion on several grounds.  First, the government’s amicus brief was filed after the jury returned its verdict against Dr. Harkonen.  Second, the language in the Solicitor General’s amicus brief was just “argument.”  And third, the issue in Matrixx involved adverse events, not efficacy, and the FDA, as well as investors, would be concerned with lesser levels of evidence that did not “demonstrate” causation.  United States v. Harkonen, Memorandum & Order re Defendant Harkonen’s Motions for a New Trial, No. C 08-00164 MHP (N.D. Calif. April 18, 2011). Perhaps the most telling ground might have been that the government’s amicus briefing about statistical significance, prompted by Matrixx Initiatives’ appellate theory, was irrelevant to the proper resolution of that Supreme Court case.  Still, if these reasons are taken individually, or in combination, they fail to mitigate the unfairness of the government’s prosecution of Dr. Harkonen.

The Amicus Brief Behind the Matrixx Motion

Judge Patel’s denial of the motion raised serious problems. See “Multiplicity versus Duplicity – The Harkonen Conviction” (Dec. 11, 2012).  It may thus be worth taking a closer look at the government’s amicus brief to evaluate Dr. Harkonen’s Matrixx motion. The distinction between efficacy and adverse effects is particularly unconvincing.  Similarly, it does not seem fair to permit the government to take inconsistent positions, whether on facts or on inferences and arguments, when those inconsistencies confuse criminal defendants, prosecutors, civil litigants, and lower court judges. After all, Dr. Harkonen’s use of the key word, “demonstrate,” was an argument about the epistemic strength of the evidence at hand.

The government’s amicus brief was filed by the Solicitor General’s office, along with counsel for the Food and Drug Division of the Department of Health & Human Services. The government, in its brief, appeared to disclaim the necessity, or even the importance, of statistical significance:

“[w]hile statistical significance provides some indication about the validity of a correlation between a product and a harm, a determination that certain data are not statistically significant … does not refute an inference of causation.”

Brief for the United States as Amicus Curiae Supporting Respondents, in Matrixx Initiatives, Inc. v. Siracusano, 2010 WL 4624148, at *14 (Nov. 12, 2010). This statement, with its double negatives, is highly problematic.  Validity of a correlation is really not what is at issue in a randomized clinical trial; rather, it is the statistical reliability or stability of the measurement that is called into question when the result is not statistically significant.  A statistically insignificant result may not refute causation, but it certainly does not thereby support an inference of causation.  The Solicitor General’s brief made this statement without citation to any biostatistics text or treatise.

The government’s amicus brief introduces its discussion of statistical significance with a heading entitled “Statistical significance is a limited and non-exclusive tool for inferring causation.” Id. at *13.  In a footnote, the government elaborated that its position applied to both safety and efficacy outcomes:

“[t]he same principle applies to studies suggesting that a particular drug is efficacious. A study in which the cure rate for cancer patients who took a drug was twice the cure rate for those who took a placebo could generate meaningful interest even if the results were not statistically significant.”

Id. at *15 n.2.  Judge Patel’s distinction between efficacy and adverse events thus cannot be sustained. Of course, “meaningful interest” is not exactly a sufficient basis for a causal conclusion. As a general matter, Dr. Harkonen’s motion seems well grounded.  Although not a model of clarity, the amicus brief appears to disparage the necessity of statistical significance for supporting a causal conclusion. A criminal defendant being prosecuted for using the wrong verb to describe the inference he drew from a clinical trial would certainly want to showcase these high-profile statements made by the Solicitor General’s office to the highest court of the land.

Solicitor General’s Good Advice

Much of the Solicitor General’s brief is directly on point for the Matrixx case. The amicus brief leads off by insisting that information that supports reasonable suspicions about adverse events may be material absent sufficient evidence of causation.  Id. at 11.  Of course, this is the dispositive argument, and it is stated well in the brief.  The brief then wanders into scientific and statistical territory, with little or no authority, at times misciting important works such as the Reference Manual on Scientific Evidence.

The Solicitor General’s amicus brief homes in on the key issue: materiality, which does not necessarily involve causation:

“Second, a reasonable investor may consider information suggesting an adverse drug effect important even if it does not prove that the drug causes the effect.”

Brief for the United States as Amicus Curiae Supporting Respondents, in Matrixx Initiatives, Inc. v. Siracusano, 2010 WL 4624148, at *8.

“As explained above (see p. 19, supra), however, adverse event reports do not lend themselves to a statistical-significance analysis. At a minimum, the standard petitioners advocate would require the design of a scientific study able to capture the relative rates of incidence (either through a clinical trial or observational study); enough participants and data to perform such a study and make it powerful enough to detect any increased incidence of the adverse effect; and a researcher equipped and interested enough to conduct it.”

Id. at 23.

“As petitioners acknowledge (Br. 23), FDA does not apply any single metric for determining when additional inquiry or action is necessary, and it certainly does not insist upon ‘statistical significance.’ See Adverse Event Reporting 7. Indeed, statistical significance is not a scientifically appropriate or meaningful standard in evaluating adverse event data outside of carefully designed studies. Id. at 5; cf. Lempert 240 (‘it is meaningless to talk about receiving a statistically significant number of complaints’).”

Id. at 19. So statistical significance is unrelated to the case, and the kind of evidence of materiality alleged by plaintiffs does not even lend itself to a measurement of statistical significance.  At this point, the brief writers might have called it a day.  The amicus brief, however, pushes on.

Solicitor General’s Ignoratio Elenchi

A good part of the government’s amicus brief in Matrixx presented argument irrelevant to the issues before the Court, even assuming that statistical significance was relevant to materiality.

“First, data showing a statistically significant association are not essential to establish a link between use of a drug and an adverse effect. As petitioners ultimately acknowledge (Br. 44 n.22), medical researchers, regulators, and courts consider multiple factors in assessing causation.”

Brief for the United States as Amicus Curiae Supporting Respondents, in Matrixx Initiatives, Inc. v. Siracusano, 2010 WL 4624148, at *12.  This statement is a non sequitur.  The consideration of multiple factors in assessing causation does not make a statistically significant association any less essential. Statistical significance could still be necessary but not sufficient in assessing causation.  The government’s brief writers pick up the thread a few pages later:

“More broadly, causation can appropriately be inferred through consideration of multiple factors independent of statistical significance. In a footnote, petitioners acknowledge that critical fact: ‘[C]ourts permit an inference of causation on the basis of scientifically reliable evidence other than statistically significant epidemiological data. In such cases experts rely on a lengthy list of factors to draw reliable inferences, including, for example,

(1) the “strength” of the association, including “whether it is statistically significant”;

(2) temporal relationship between exposure and the adverse event;

(3) consistency across multiple studies;

(4) “biological plausibility”;

(5) “consideration of alternative explanations” (i.e., confounding);

(6) “specificity” (i.e., whether the specific chemical is associated with the specific disease at issue); and

(7) dose-response relationship (i.e., whether an increase in exposure yields an increase in risk).’ ”

Pet. Br. 44 n.22 (citations omitted). Those and other factors for inferring causation have been well recognized in the medical literature and by the courts of appeals. See, e.g., Reference Guide on Epidemiology 345-347 (discussing relevance of toxicologic studies), 375-379 (citing, e.g., Austin Bradford Hill, The Environment and Disease: Association or Causation?, 58 Proc. Royal Soc’y Med. 295 (1965))… .”

Id. at 15-16. These enumerated factors are obviously due to Sir Austin Bradford Hill. No doubt Matrixx Initiatives cited the Bradford Hill factors, but that was because the company was contending that statistical significance was necessary but not sufficient to show causation.  As Bradford Hill showed by his famous conclusion that smoking causes lung cancer, these factors were considered after statistical significance was shown in several epidemiologic studies.  The Supreme Court incorporated this non-argument into its opinion, even after disclaiming that causation was needed for materiality or that the Court was going to assess the propriety of causal findings in other cases.

The Solicitor General went on to cite three cases for the proposition that statistical significance is not necessary for assessing causation:

“Best v. Lowe’s Home Centers, Inc., 563 F.3d 171, 178 (6th Cir. 2009) (“an ‘overwhelming majority of the courts of appeals’ agree” that differential diagnosis, a process for medical diagnosis that does not entail statistical significance tests, informs causation) (quoting Westberry v. Gislaved Gummi AB, 178 F.3d 257, 263 (4th Cir. 1999)).”

Id. at 16.  These two cases both involved so-called “differential diagnosis” or differential etiology, a process of ruling in, by ruling out.  This method, which involves iterative disjunctive syllogism, starts from established causes, and reasons to a single cause responsible for a given case of the disease.  The citation of these cases was irrelevant and bad scholarship by the government.  The Solicitor General’s error here seems to have been responsible for the Supreme Court’s unthinking incorporation of these cases into its opinion.

The Solicitor General went on to cite a third case, the infamous Ferebee, for its suggestion that statistical significance was not necessary to establish causation:

“Ferebee v. Chevron Chem. Co., 736 F.2d 1529, 1536 (D.C. Cir.) (‘[P]roducts liability law does not preclude recovery until a “statistically significant” number of people have been injured’.), cert. denied, 469 U.S. 1062 (1984). As discussed below (see pp. 19-20, infra), FDA relies on a number of those factors in deciding whether to take regulatory action based on reports of an adverse drug effect.”

Id. at 16.  Curiously, the Supreme Court departed from its reliance on the Solicitor General’s brief, with respect to Ferebee, and substituted its own citation to Wells v. Ortho Pharmaceutical Corp., 615 F. Supp. 262 (N.D. Ga. 1985), aff’d in relevant part, 788 F.2d 741 (11th Cir.), cert. denied, 479 U.S. 950 (1986). See “Wells v. Ortho Pharmaceutical Corp. Reconsidered – Part 1” (Nov. 12, 2012).  The reliance upon the two differential etiology cases was “demonstrably” wrong, but citing Wells was even more bizarre because that case featured at least one statistically significant study relied upon by plaintiffs’ expert witnesses. Ferebee, on the other hand, involved an acute onset of a rare condition – severe pulmonary fibrosis – shortly after exposure to paraquat.  Ferebee was thus a case in which the parties agreed that the causal relationship between paraquat and lung fibrosis had been established by non-analytical epidemiologic evidence.  See “Ferebee Revisited.”

The government then pointed out in its amicus brief that sometimes statistical significance is hard to obtain:

“In some circumstances —e.g., where an adverse effect is subtle or has a low rate of incidence —an inability to obtain a data set of appropriate quality or quantity may preclude a finding of statistical significance. Ibid. That does not mean, however, that researchers have no basis on which to infer a plausible causal link between a drug and an adverse effect.”

Id. at 15. Biological plausibility is hardly a biologically established causal link.  Inability to find an appropriate data set often translates into an inability to draw a causal conclusion; inappropriate data are not an excuse for jumping to unsupported conclusions.

Solicitor General’s Bad Advice – Crimen Falsi?

The government’s brief then manages to go from bad to worse. The government’s amicus brief in Matrixx raises serious concerns about criminalizing inappropriate statistical statements, inferences, or conclusions.  If the Solicitor General’s office, with input from the Chief Counsel of the Food and Drug Division of the Department of Health & Human Services, cannot correctly state basic definitions of statistical significance, then the government has no business prosecuting others for similar offenses.

“To assess statistical significance in the medical context, a researcher begins with the ‘null hypothesis’, i.e., that there is no relationship between the drug and the adverse effect. The researcher calculates a ‘p-value’, which is the probability that the association observed in the study would have occurred even if there were in fact no link between the drug and the adverse effect. If that p-value is lower than the ‘significance level’ selected for the study, then the results can be deemed statistically significant.”

Id. at 13. Here the government’s brief commits a common error that results when lawyers want to simplify the definition of a p-value. The p-value is a cumulative probability: the probability of observing a disparity at least as great as that observed, given the assumption that there is no difference.  Furthermore, the subjunctive mood (“would have occurred … if there were in fact no link”) is not appropriate to describe what the significance test treats as an assumption, not a counterfactual.

“The significance level most commonly used in medical studies is 0.05. If the p-value is less than 0.05, there is less than a 5% chance that the observed association between the drug and the effect would have occurred randomly, and the results from such a study are deemed statistically significant. Conversely, if the p-value is greater than 0.05, there is greater than a 5% chance that the observed association would have occurred randomly, and the results are deemed not statistically significant. See Reference Guide on Epidemiology 357-358; David Kaye & David A. Freedman, Reference Guide on Statistics, in Reference Manual on Scientific Evidence 123, 123-125 (2d ed. 2000) (Reference Guide on Statistics).”

Id. at 14. Here the government’s brief drops the conditional of the significance probability; the p-value provides the probability that a disparity at least as large as observed would have occurred (based upon the assumed probability model), given the assumption that there really is no difference between the observed and expected results.
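
For readers who want to see the correct definition in operation, here is a minimal sketch in Python (the counts and the null event rate are invented for illustration; the scipy calls are standard):

```python
# Hypothetical example: 36 adverse events among 200 treated patients,
# against a null-hypothesis event rate of 12% (all numbers invented).
from scipy import stats

observed, n, null_rate = 36, 200, 0.12

# The p-value is a cumulative tail probability, computed on the assumption
# that the null hypothesis is true: P(X >= 36 | X ~ Binomial(200, 0.12)).
# It is not the probability of the observed count alone, and it is not
# the probability that the null hypothesis itself is true.
p_one_sided = stats.binom.sf(observed - 1, n, null_rate)
print(f"P(X >= {observed} | H0) = {p_one_sided:.4f}")
```

Both misstatements in the brief are avoided here: the probability is cumulative (“at least as great as observed”), and the conditioning on the null hypothesis is explicit rather than subjunctive.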

“While statistical significance provides some indication about the validity of a correlation between a product and a harm, a determination that certain data are not statistically significant – let alone, as here, the absence of any determination one way or the other — does not refute an inference of causation. See Michael D. Green, Expert Witnesses and Sufficiency of Evidence in Toxic Substances Litigation: The Legacy of Agent Orange and Bendectin Litigation, 86 Nw. U. L. Rev. 643, 682- 683 (1992).”

Id. at 14. Validity is probably the wrong word since most statisticians and scientific authors use validity to refer to features other than low random error.

“Take, for example, results from a study, with a p-value of 0.06, showing that those who take a drug develop a rare but serious adverse effect (e.g., permanent paralysis) three times as often as those who do not. Because the p-value exceeds 5%, the study’s results would not be considered statistically significant at the 0.05 level. But since the results indicate a 94% likelihood that the observed association between the drug and the effect would not have occurred randomly, the data would clearly bear on the drug’s safety. Upon release of such a study, ‘confidence in the safety of the drug in question should diminish, and if the drug were important enough to [the issuer’s] balance sheet, the price of its stock would be expected to decline.’ Lempert 239.”

Id. at 14-15. The citation to Lempert’s article is misleading. At the cited page, Professor Lempert is simply making the point that materiality in a securities fraud case will often be present when evidence for a causal conclusion is not. Richard Lempert, “The Significance of Statistical Significance:  Two Authors Restate An Incontrovertible Caution. Why A Book?” 34 Law & Social Inquiry 225, 239 (2009).  In so writing, Lempert anticipated the true holding of Matrixx Initiatives.  The calculation of the 94% likelihood is also incorrect.  The quantity (1 – [p-value]) yields the probability of obtaining a disparity no greater than the observed result, on the assumption that there is no difference at all between observed and expected results. There is, however, a larger point lurking in this passage of the amicus brief:  the difference between a p-value of 0.05 and one of 0.06 is not particularly large, and there is thus a degree of arbitrariness in treating 0.05 as too sharp a line.
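
A short simulation makes the error concrete (this is only a sketch; the null model is true by construction). If “a 94% likelihood that the association would not have occurred randomly” were the right reading of p = 0.06, studies of a completely inert drug should rarely cross that threshold; in fact they do so about six percent of the time, because p-values are uniformly distributed under the null hypothesis:

```python
# Simulate one-sided p-values for 100,000 studies in which the null
# hypothesis is true by construction.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)   # null z-statistics
p_values = stats.norm.sf(z)        # one-sided p-values

# Under the null, p-values are uniform: p < 0.06 occurs in about 6% of
# studies of an inert drug. (1 - p) is the chance of a *smaller* disparity
# under the null, not a 94% case for causation.
print((p_values < 0.06).mean())    # prints roughly 0.06
```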

All in all, a distressingly poor performance by the Solicitor General’s office.  With access to many talented statisticians, the government could at least have had a competent statistician review and approve the content of this amicus brief.  I suspect that most judges and lawyers, however, would balk at drawing an inference that the Solicitor General intended to mislead the Court simply because the brief contained so many misstatements about statistical inference.  This reluctance should have obvious implications for the government’s attempt to criminalize Dr. Harkonen’s statistical inferences.

Multiplicity versus Duplicity – The Harkonen Conviction

December 11th, 2012

United States of America v. W. Scott Harkonen, MD — Part II

The Alleged Fraud – “False as a matter of statistics”

The essence of the government’s case was that drawing an inference of causation from a statistically nonsignificant, post-hoc analysis was “false as a matter of statistics.” ER2498.  Dr. Harkonen’s trial counsel did not present testimony from any statistician at trial.  In their final argument, his counsel explained that they had obtained sufficient concessions at trial to make their point.

In post-trial motions, new counsel for Dr. Harkonen submitted affidavits from Dr. Steven Goodman and Dr. Donald Rubin, two very capable and highly accomplished statisticians, who explained the diversity of views in their field about the role of p-values in interpreting study data and drawing causal inferences.  At trial, however, the government’s witnesses, Drs. Crager and Fleming, testified that p-values of [less than] 0.05 were “magic numbers.”  United States v. Harkonen, 2010 WL 2985257, at *5 (N.D. Calif. 2010) (Judge Patel’s opinion denying defendant’s post-trial motions to dismiss the indictment, for acquittal, or for a new trial).  Sometimes judges are looking for bright lines in the wrong places.

The Multiplicity Problem

The government argued that the proper interpretation of a given p-value requires information about the nature and context of the statistical test that gave rise to the p-value.  If many independent tests are run on the same set of data, some low p-values would be expected to occur by chance alone.  Multiple testing can thus inflate the rate of false-positive findings, Type I errors.  The generation of these potentially false-positive results is sometimes called the “multiplicity problem”; in the face of multiple testing, a stated p-value can greatly understate the probability of false-positive findings.
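
The inflation is easy to exhibit. The following sketch (with hypothetical trial sizes and test counts) runs twenty independent significance tests on pure noise, many times over; at least one nominally “significant” result turns up in roughly two-thirds of the simulated trials:

```python
# Multiplicity sketch: 20 independent two-sample t-tests per simulated
# "trial," with both arms drawn from the same distribution, so that
# every apparent effect is noise.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n_tests, n_per_arm = 2_000, 20, 100

hits = 0
for _ in range(n_sims):
    a = rng.standard_normal((n_tests, n_per_arm))
    b = rng.standard_normal((n_tests, n_per_arm))
    _, p = stats.ttest_ind(a, b, axis=1)   # 20 p-values per trial
    hits += (p < 0.05).any()               # any nominally significant test?

# Analytically, 1 - 0.95**20 is about 0.64; the simulation agrees.
print(hits / n_sims)
```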

In the context of a randomized clinical trial, it is thus important to know what the prespecified primary and secondary end points were.  David Moher, Kenneth F. Schulz, and Douglas G. Altman, “The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomised trials,” 357 Lancet 1191 (2001). Post hoc data dredging can lead to the “Texas Sharpshooter Fallacy,” which results when an investigator draws a target around a hit, after the fact, and declares a bull’s-eye.

Dr. Fleming thus had a limited point: namely, the use of the verb “demonstrate,” rather than “show” or “suggest,” was too strong if based solely upon InterMune’s clinical trial, given that the low p-value came in the context of a non-prespecified subgroup analysis. (The supposedly offensive press release issued by Dr. Harkonen did indicate that the data confirmed the results in a previously reported phase II trial.) If the government had engaged in some counter-speech to say that Dr. Harkonen’s statements fell below an idealized “best statistical practice” in his use of “demonstrate,” many statisticians might well have agreed with the government.  Even this limited point would evaporate if Dr. Harkonen had stated that the phase III subgroup analysis, along with the earlier published clinical trial, and clinical experience, “demonstrated” a survival benefit.  Had Dr. Harkonen issued this more scientifically felicitous statement, the government could not have made a claim of falsity in his using the verb “to demonstrate” with a single p-value from a post hoc subgroup analysis.  Such a statement would have taken Dr. Harkonen’s analytic inference out of the purely statistical realm. Indeed, Dr. Harkonen’s press release did reference an earlier phase II trial, as well as notify readers that more detailed analyses would be presented at upcoming medical conferences.  Although Dr. Harkonen did use “demonstrate” to characterize the results of the phase III trial, standing alone, the entire press release made clear that the data were preliminary. It is difficult to imagine any reasonable physician prescribing Actimmune on the basis of the press release.

The prosecution and conviction of Dr. Harkonen thus raise the issue whether the alleged improper characterization of a study’s statistical result can be criminalized by the State.  Clearly, the federal prosecutors were motivated by their perception that the alleged fraud was connected to an attempt to promote an off-label use of Actimmune.  Such linguistic precision, however, is widely flouted in the world of law and science.  Lawyers use the word “proofs,” which often admit of inferences for either side, to describe real, demonstrative, and testimonial evidence.  A mathematician might be moved to prosecute all lawyers for fraudulent speech.  From the mathematician’s perspective, the lawyers have made a claim of certainty in using “proof,” which is totally out of place.  Even in the world of science, the verb “to demonstrate” is used in ways that do not imply the sort of certitude that purists might wish to retain for the strongest of empirical inferences from clinical trials. See, e.g., William B. Wong, Vincent W. Lin, Denise Boudreau, and Emily Beth Devine, “Statins in the prevention of dementia and Alzheimer’s disease: A meta-analysis of observational studies and an assessment of confounding,” 21 Pharmacoepidemiology & Drug Safety (in press 2012), at Abstract (“Studies demonstrate the potential for statins to prevent dementia and Alzheimer’s disease (AD), but the evidence is inconclusive.”) (emphasis added).

The Duplicity Problem – The Matrixx Motion

After the conviction, Dr. Harkonen’s counsel moved for a new trial on grounds of newly discovered evidence. Dr. Harkonen’s counsel hoisted the prosecutors with their own petards, by quoting the government’s amicus brief to the United States Supreme Court in Matrixx Initiatives Inc. v. Siracusano, 131 S. Ct. 1309 (2011).  In Matrixx, the securities fraud plaintiffs contended that they need not plead “statistically significant” evidence for adverse drug effects.  The Solicitor General’s office, along with counsel for the Food and Drug Division of the Department of Health & Human Services, in their zeal to assist plaintiffs in their claims against an over-the-counter pharmaceutical manufacturer, disclaimed the necessity, or even the importance, of statistical significance:

“[w]hile statistical significance provides some indication about the validity of a correlation between a product and a harm, a determination that certain data are not statistically significant … does not refute an inference of causation.”

Brief for the United States as Amicus Curiae Supporting Respondents, in Matrixx Initiatives, Inc. v. Siracusano, 2010 WL 4624148, at *14 (Nov. 12, 2010).

The government’s amicus brief introduces its discussion of this topic with a heading entitled “Statistical significance is a limited and non-exclusive tool for inferring causation.” Id. at *13.  In a footnote, the government elaborated that its position applied to both safety and efficacy outcomes:

“[t]he same principle applies to studies suggesting that a particular drug is efficacious. A study in which the cure rate for cancer patients who took a drug was twice the cure rate for those who took a placebo could generate meaningful interest even if the results were not statistically significant.”

Id. at *15 n.2.

The government might have suggested that Dr. Harkonen was parsing the amicus brief incorrectly.  After all, generating “meaningful interest” is not the same as generating a scientific conclusion, or as “demonstrating.” As I will show in a future post, the government, in its amicus brief, consistently misstated the meaning of statistical significance, and of significance probability.  The government’s inability to communicate these concepts correctly raises serious due process issues with a prosecution against someone for having used the wrong verb to describe a statistical inference.

SCOTUS

The government’s amicus brief was clearly influential before the Supreme Court. The Court cited to, and adopted in dictum, the claim that the absence of statistical significance did not mean that medical expert witnesses could not have a reliable basis for inferring causation between a drug and an adverse event.  Matrixx Initiatives, Inc. v. Siracusano, — U.S. –, 131 S.Ct. 1309, 1319-20 (2011) (“medical professionals and researchers do not limit the data they consider to the results of randomized clinical trials or to statistically significant evidence”).

In any event, the prosecutor in Dr. Harkonen’s trial argued in summation that InterMune’s clinical trial had “failed,” and that no conclusions could be drawn from the trial.  If this argument was not flatly contradicted by the government’s Matrixx amicus brief, then it was certainly undermined by that brief’s rhetorical force.

The district court denied Dr. Harkonen’s motion for a new trial, and explained that the government’s Matrixx amicus brief contained “argument” rather than “newly discovered evidence.” United States v. Harkonen, No. C 08-00164 MHP, Memorandum and Order re Defendant Harkonen’s Motions for a New Trial at 14 (N.D. Calif. April 18, 2011). This rationale seems particularly inappropriate because the interpretation of a statistical test and the drawing of an inference are both “arguments,” and it is a fact that the government contended that p < 0.05 was not necessary for drawing causal inferences. The district court also offered that Matrixx was distinguishable on grounds that the securities fraud in Matrixx involved a safety outcome rather than an efficacy conclusion. This distinction truly lacks a difference:  the standards for determining causation do not differ between establishing harm and establishing efficacy.  Of course, the FDA does employ a lesser, precautionary standard for regulating against harm, but this difference does not mean that the causal connections between drugs and harms are assessed under different standards.

On December 6th, the appeals in United States v. Harkonen were argued and submitted for decision.  Win or lose, Dr. Harkonen is likely to make important law in how scientists and lawyers speak about statistical inferences.

EPA Post Hoc Statistical Tests – One Tail vs Two

December 2nd, 2012

EPA 1992 Meta-Analysis of ETS & Lung Cancer – Part 2

In 1992, the U.S. Environmental Protection Agency (EPA) published a risk assessment of the lung cancer (and other) risks from environmental tobacco smoke (ETS).  See Respiratory Health Effects of Passive Smoking: Lung Cancer and Other Disorders, EPA/600/6-90/006F (1992).  The agency concluded that ETS causes about 3,000 lung cancer deaths each year among non-smoking adults.  See also EPA, “Fact Sheet: Respiratory Health Effects of Passive Smoking,” Office of Research and Development, and Office of Air and Radiation, EPA Document Number 43-F-93-003 (Jan. 1993).

In my last post, I discussed how various plaintiffs, including tobacco companies, challenged the EPA’s conclusions as agency action that violated administrative and statutory procedures. “EPA Cherry Picking (WOE) – EPA 1992 Meta-Analysis of ETS & Lung Cancer – Part 1” (Dec. 2, 2012). The plaintiffs further claimed that the EPA had manufactured its methods to achieve the result it desired in advance of the analyses. A federal district court agreed with the methodological challenges to the EPA’s report, but the Court of Appeals reversed on grounds that the agency’s report was not reviewable agency action.  Flue-Cured Tobacco Cooperative Stabilization Corp. v. EPA, 4 F. Supp. 2d 435 (M.D.N.C. 1998), rev’d, 313 F.3d 852, 862 (4th Cir. 2002) (Widener, J.) (holding that the issuance of the report was not “final agency action”).

One of the grounds of the plaintiffs’ challenge was that the EPA had changed, without explanation, from a 95% to a 90% confidence interval.  The change in the specification of the coefficient of confidence was equivalent to a shift from a two-tailed to a one-tailed test of significance, with alpha set at 5%.  This change, along with the gerrymandering or “cherry picking” of studies, allowed the EPA to claim a statistically significant association between ETS and lung cancer. 4 F. Supp. 2d at 461.  The plaintiffs pointed to the EPA’s own previous risk assessments, as well as statistical analyses by the World Health Organization (International Agency for Research on Cancer), the National Research Council, and the Surgeon General, all of which routinely use 95% intervals and two-tailed tests of significance.  Id.

In its 1990 Draft ETS Risk Assessment, the EPA had used a 95% confidence interval, but in later drafts, it changed to a 90% interval.  One of the epidemiologists on the EPA’s Scientific Advisory Board, Geoffrey Kabat, criticized this post hoc change, noting that the use of 90% intervals is disfavored, and that the post hoc change in statistical methodology created the appearance of an intent to influence the outcome of the analysis. Id. (citing Geoffrey Kabat, “Comments on EPA’s Draft Report: Respiratory Health Effects of Passive Smoking: Lung Cancer and Other Disorders,” II.SAB.9.15 at 6 (July 28, 1992) (JA 12,185)).

The EPA argued that its adoption of a one-tailed test of significance was justified on the basis of an a priori hypothesis that ETS is associated with lung cancer.  Id. at 451-52, 461 (citing to ETS Risk Assessment at 5–2). The court found this EPA argument hopelessly circular.  The agency postulated its a priori hypothesis, which it then took as license to dilute the statistical test for assessing the evidence.  The agency, therefore, had assumed what it wished to show, in order to achieve the result it sought.  Id. at 456.  The EPA claimed that the one-tailed test had more power, but with dozens of studies aggregated into a summary result, the court recognized that Type I error was a larger threat to the validity of the agency’s conclusions.

The EPA also advanced a muddled defense of its use of 90% confidence intervals by arguing that if it used a 95% interval, the results would have been incongruent with the one-tailed p-values.  The court recognized that this was really no discrepancy at all, but only a corollary of using either one-tailed 5% tests or 90% confidence intervals.  Id. at 461.

If the EPA had adhered to its normal methodology, there would have been no statistically significant association between ETS and lung cancer. With its post hoc methodological choice, and highly selective approach to study inclusions in its meta-analysis, the EPA was able to claim a weak statistically significant association between ETS and lung cancer.  Id. at 463.  The court found this to be a deviation from the legally required use of “best judgment possible based upon the available evidence.”  Id.

Of course, the EPA could have announced its one-tailed test from the inception of the risk assessment, and justified its use on grounds that it was attempting to reach only a precautionary judgment for purposes of regulation.  Instead, the agency tried to showcase its finding as a scientific conclusion, which only further supported the tobacco companies’ challenge to the post hoc change in plan for statistical analysis.

Although the validity issues in the EPA’s 1992 meta-analysis should have been superseded by later studies, and later meta-analyses, the government’s fraud case, before Judge Kessler, resurrected the issue:

“3344. Defendants criticized EPA’s meta-analysis of U.S. epidemiological studies, particularly its use of an ‘unconventional 90 percent confidence interval’. However, Dr. [David] Burns, who participated in the EPA Risk Assessment, testified that the EPA used a one-tailed 95% confidence interval, not a two-tailed 90% confidence interval. He also explained in detail why a one-tailed test was proper: The EPA did not use a 90% confidence interval. They used a traditional 95% confidence interval, but they tested for that interval only in one direction. That is, rather than testing for both the possibility that exposure to ETS increased risk and the possibility that it decreased risk, the EPA only tested for the possibility that it increased the risk. It tested for that possibility using the traditional 5% chance or a P value of 0.05. It did not test for the possibility that ETS protected those exposed from developing lung cancer at the direction of the advisory panel which made that decision based on its prior decision that the evidence established that ETS was a carcinogen. What was being tested was whether the exposure was sufficient to increase lung cancer risk, not whether the agent itself, that is cigarette smoke, had the capacity to cause lung cancer with sufficient exposure. The statement that a 90% confidence interval was used comes from the observation that if you test for a 5% probability in one direction the boundary is the same as testing for a 10% probability in two directions. Burns WD, 67:5-15. In fact, the EPA Risk Assessment stated, ‘Throughout this chapter, one-tailed tests of significance (p = 0.05) are used …’ .”

U.S. v. Philip Morris USA, Inc., 449 F. Supp. 2d 1, 702-03 (D.D.C. 2006) (Kessler, J.) (internal citations omitted).

Judge Kessler was misled by Dr. Burns, a frequent testifier for plaintiffs’ counsel in tobacco cases.  Burns should have known that with respect to the lower bound of the confidence interval, which is what matters for determining whether the meta-analysis excludes a risk ratio of 1.0, there is no difference between a one-tailed 95% confidence interval and a two-tailed 90% interval.  Burns’ sophistry hardly excuses the EPA’s error in changing its pre-specified end point and statistical analysis, or cures the danger of unduly increasing the risk of Type I error in the EPA meta-analysis. See “Pin the Tail on the Significance Test” (July 14, 2012).
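
The equivalence Burns obscured is simple arithmetic, as a few lines of Python confirm (the summary risk ratio and standard error below are invented for illustration; the equivalence of the critical values is not):

```python
# The one-sided 95% bound and the two-sided 90% bound use the same
# critical value of the standard normal distribution, so the lower
# confidence limits coincide exactly.
import math
from scipy import stats

z_one_sided_95 = stats.norm.ppf(0.95)          # ~1.645
z_two_sided_90 = stats.norm.ppf(1 - 0.10 / 2)  # ~1.645, the same value

# Hypothetical meta-analytic summary: risk ratio 1.19, SE 0.09 (log scale)
log_rr, se = math.log(1.19), 0.09
lower_bound = math.exp(log_rr - z_one_sided_95 * se)

print(round(z_one_sided_95, 4), round(z_two_sided_90, 4))
print(round(lower_bound, 3))  # exceeds 1.0 under either label
```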

Post-script

Judge Widener wrote the opinion for a panel of the United States Court of Appeals for the Fourth Circuit, which reversed the district court’s judgment enjoining the EPA’s report.  The Circuit’s decision did not address the scientific issues, but by holding that the agency action was not reviewable, it removed the basis for the district court’s review of the scientific and statistical issues.  For those pundits who see only self-interested behavior in judging, the author of the Circuit’s decision was a lifetime smoker, who grew Burley tobacco on his farm outside Abingdon, Virginia.  Judge Widener died on September 19, 2007, of lung cancer.

Wells v. Ortho Pharmaceutical Corp. Reconsidered – Part 6

November 21st, 2012

In 1984, before Judge Shoob gave his verdict in the Wells case, another firm filed a birth defects case against Ortho for failure to warn in connection with its non-ionic surfactant spermicides, in the same federal district court, the Northern District of Georgia. The mother in Smith used Ortho’s product about the same time as the mother in Wells (in 1980).  The case was assigned to Judge Shoob, who recused himself.  Smith v. Ortho Pharmaceutical Corp., 770 F. Supp. 1561, 1562 n.1 (N.D. Ga. 1991) (no reasons for the recusal provided).  The Smith case was reassigned to Judge Horace Ward, who entertained Ortho’s motion for summary judgment in July 1988.  Two and one-half years later, Judge Ward granted summary judgment to Ortho on grounds that the plaintiffs’ expert witnesses’ testimony was not based upon the type of data reasonably relied upon by experts in the field, and was thus inadmissible under Federal Rule of Evidence 703. 770 F. Supp. at 1681.

A prevalent interpretation of the split between Wells and Smith is that the scientific evidence developed with new studies, and that the scientific community’s views matured in the five years between the two district court opinions. The discussion in Modern Scientific Evidence is typical:

“As epidemiological evidence develops over time, courts may change their view as to whether testimony based on other evidence is admissible. In this regard it is worth comparing Wells v. Ortho Pharmaceutical Corp., 788 F.2d 741 (11th Cir. 1986), with Smith v. Ortho Pharmaceutical Corp., 770 F. Supp. 1561 (N.D. Ga. 1991). Both involve allegations that the use of spermicide caused a birth defect. At the time of the Wells case there was limited epidemiological evidence and this type of claim was relatively novel.  In a bench trial the court found for the plaintiff.  *** The Smith court, writing five years later, noted that, ‘The issue of causation with respect to spermicide and birth defects has been extensively researched since the Wells decision.’ Smith v. Ortho Pharmaceutical Corp., 770 F. Supp. 1561, 1563 (N.D. Ga. 1991).”

1 David L. Faigman, Michael J. Saks, Joseph Sanders, and Edward K. Cheng, Modern Scientific Evidence:  The Law and Science of Expert Testimony, “Chapter 23 – Epidemiology,” § 23:4, at 213 n.12 (West 2011) (internal citations omitted).

Although Judge Ward was being charitable to his judicial colleague, this attempt to reconcile Wells and Smith does a disservice to Judge Ward’s hard work in Smith, and glosses over Judge Shoob’s errors in Wells.

Even a casual reading of Smith and Wells reveals that the injuries were completely different.  Plaintiff Crystal Smith was born with a chromosomal defect known as Trisomy-18; Plaintiff Katie Wells was born with limb reduction deficits.   Some studies relevant to one injury had no information about the other.  Other studies, which addressed both injuries, yielded different results for the different injuries.  Although some additional studies were available to Judge Ward in 1988, the additional studies were hardly the compelling difference between the two cases.

Perhaps the most important difference between the cases is that in Smith, biological plausibility that spermicides could cause Trisomy-18 was completely absent.  The chromosomal defect arises from meiotic nondisjunction, an error in the process of meiosis by which germ cells are formed.  Simply put, spermicides arrive on the scene too late to cause Trisomy-18.  Notwithstanding the profound differences between the injuries involved in Wells and Smith, the Smith plaintiffs sought the application of collateral estoppel.  Judge Ward refused this motion, on the basis of the factual differences in the cases, as well as the availability of new evidence.  770 F.Supp. at 1562.

The difference in injuries, however, was not the only important difference between these two cases.  Wells was actually tried, apparently without any challenge, under Frye or Rules 702 or 703, to the admissibility of expert witness testimony.  There is little or no discussion of the scientific validity of the studies, or analysis of the requisites for evaluating associations for causality.  It is difficult to escape the conclusion that Judge Shoob decided the Wells case on the basis of superficial appearances, and that he frequently ignored validity concerns in drawing invidious distinctions between plaintiffs’ and defendant’s expert witnesses and their “credibility.”  Smith, on the other hand, was never tried.  Judge Ward entertained and granted dispositive motions for summary judgment, on grounds that the plaintiffs’ expert witnesses’ testimony was inadmissible. Legally, the cases are light years apart.

In Smith, Judge Ward evaluated the same FDA reports and decisions seen by Judge Shoob.  Judge Ward did not, however, dismiss these agency materials simply because one or two of dozens of independent scientists involved had some fleeting connection with industry. 770 F.Supp. at 1563-64.

Judge Ward engaged with the structure and bases of the expert witnesses’ opinions, under Rules 702 and 703.  The Smith case thus turned on whether expert witness opinions were admissible, an issue not considered or discussed in Wells.  As was often the case before the Supreme Court decided Daubert in 1993, Judge Ward paid little attention to Rule 702’s requirement of helpfulness or knowledge.  The court’s 702 analysis was limited to qualifications.  Id. at 1566-67.  The qualifications of the plaintiffs’ witnesses were rather marginal.  They relied upon genetic and epidemiologic studies, but they had little training or experience in these disciplines. Finding the plaintiffs’ expert witnesses to meet the low threshold for qualification to offer an opinion in court, Judge Ward focused on Rule 703’s requirement that expert witnesses rely upon facts and data of a type reasonably relied upon by experts in the field, whether or not those facts and data are otherwise admissible.

The trial court in Smith struggled with how it should analyze the underpinnings of plaintiffs’ witnesses’ proffered testimony.  The court acknowledged that conflicts between expert witnesses typically raise questions of weight, not admissibility.  Id. at 1569.  Ortho had, however, challenged plaintiffs’ witnesses for having given opinions that lacked a “sound underlying methodology.” Id.  The trial court found at least one Fifth Circuit case suggesting that Rule 703 requires trial courts to evaluate the reliability of expert witnesses’ sources.  Id. (citing Soden v. Freightliner Corp., 714 F.2d 498, 505 (5th Cir. 1983)). Elsewhere, the trial court also found precedent in Judge Weinstein’s opinion in Agent Orange, as well as in Court of Appeals decisions involving Bendectin, all of which turned to Rule 703 as the legal basis for reviewing, and in some cases limiting or excluding, expert witness opinion testimony.  Id.

The defendant’s argument under Rule 703 was strained; Ortho argued that the plaintiffs’

“experts’ selection and use of the epidemiological data is faulty and thus provides an insufficient basis upon which experts in the field of diagnosing the source of birth defects normally form their opinions. The defendant also contends that the plaintiffs’ experts’ data on genetics is not of the kind reasonably relied upon by experts in field of determining causation of birth defects.”

Id. at 1572.  Nothing in Rule 703 addresses the completeness or thoroughness of expert witnesses in their consideration of facts and data; nor does Rule 703 address the sufficiency of data or the validity vel non of inferences drawn from facts and data considered.  Nonetheless, the trial court in Smith took Rule 703 as its legal basis for exploring the epistemic warrant for plaintiffs’ witnesses’ causation opinions.

Although plaintiffs’ expert witnesses stated that they had relied upon epidemiologic studies and methods, the trial court in Smith went beyond their asseverations.  The Smith trial court explored the credibility of these witnesses at a deeper level.  The court reviewed and discussed the basic structure of epidemiologic studies, and noted that the objective of such studies is to provide a statistical analysis:

“The objective of both case-control and cohort studies is to determine whether the difference observed in the two groups, if any, is ‘statistically significant’, (that is whether the difference found in the particular study did not occur by chance alone).40 However, statistical methods alone, or the finding of a statistically significant association in one study, do not establish a causal relationship.41 As one authority states:

‘Statistical methods alone cannot establish proof of a causal relationship in an association’.42

As a result, once a statistical association is found in an epidemiological study, that data must then be evaluated in a systematic manner to determine causation. If such an association is present, then the researcher looks for ‘bias’ in the study.  Bias refers to the existence of factors in the design of a study or in the manner in which the study was carried out which might distort the result.43

If a statistically significant association is found and there is no apparent ‘bias’, an inference is created that there may be a cause-and-effect relationship between the agent and the medical effect. To confirm or rebut that inference, an epidemiologist must apply five criteria in making judgments as to whether the associations found reflect a cause-and-effect relationship.44 The five criteria are:

1. The consistency of the association;

2. The strength of the association;

3. The specificity of the association;

4. The temporal relationship of the association; and,

5. The coherence of the association.

Assuming there is some statistical association, it is these five criteria that provide the generally accepted method of establishing causation between drugs or chemicals and birth defects.45

The Smith court acknowledged that there were differences of opinion over the weighting of these five factors, but that some of them were very important to drawing a reliable inference of causality.  Id. at 1575.

A major paradigm shift thus separates Wells and Smith.  The trial court in Wells contented itself with superficial and subjective indicia of witnesses’ personal credibility; the trial court in Smith delved into the methodology of drawing an appropriate scientific conclusion about causation.  Telling was the Smith court’s citation to Moultrie v. Martin, 690 F.2d 1078, 1082 (4th Cir. 1982) (“In borrowing from another discipline, a litigant cannot be selective in which principles are applied.”).  770 F.Supp. at 1575 & n.45.  Gone is the Wells retreat from engagement with science, and the dodge that the court must make a legal, not a scientific, decision.

Applying the relevant principles, the Smith court found that the plaintiffs’ expert witnesses had deviated from the scientific standards of reasoning and analysis:

“It is apparent to the court that the testimony of Doctors Bussey and Holbrook is insufficiently grounded in any reliable evidence. * * * The conclusions Doctors Bussey and Holbrook reach are also insufficient as a basis for a finding of causality because they fail to consider critical information, such as the most relevant epidemiologic studies and the other possible causes of disease.81

The court finds that the opinions of plaintiffs’ experts are not based upon the type of data reasonably relied upon by experts in determining the cause of birth defects. Experts in determining birth defects rely upon a consensus in genetic or epidemiological investigations or specific generally accepted studies in these fields. While a consensus in genetics or epidemiology is not a prerequisite to a finding of causation in any and all birth defect cases, Rule 703 requires some reliable evidence for the basis of an expert’s opinion.

Experts in determining birth defects also utilize methodologies and protocols not followed by plaintiffs’ experts. Without a well-founded methodology, opinions which run contrary to the consensus of the scientific community and are not supported by any reliable data are necessarily speculative and lacking in the type of foundation necessary to be admissible.

For the foregoing reasons, the court finds that plaintiffs have failed to produce admissible evidence sufficient to show that defendant’s product caused Crystal’s birth defects.”

Id. at 1581.  Rule 703 was forced into service as a filter for methodologically specious opinions.

Not all was smooth sailing for Judge Ward.  Like Judge Shoob, Judge Ward seemed to think that a physical examination of the plaintiff provided helpful, relevant evidence, but he never articulated the basis for this belief. (His Honor did note that the parties agreed that the physical examination offered no probative evidence about causation.  Id. at 1572 n.32.) No harm came of this view.  Judge Ward wrestled with the lack of peer review in some unpublished studies, and the existence of a study only in abstract form.  See, e.g., id. at 1579 (“a scientific study not subject to peer review has little probative value”); id. at 1578 (insightfully noting that an abstract had insufficient data to permit a reader to evaluate its conclusions).  The Smith court recognized the importance of statistical analysis, but it confused Bayesian posterior probabilities with significance probabilities:

“Because epidemiology involves evidence on causation derived from group based information, rather than specific conclusions regarding causation in an individual case, epidemiology will not conclusively prove or disprove that an agent or chemical causes a particular birth defect. Instead, its probative value lies in the statistical likelihood of a specific agent causing a specific defect. If the statistical likelihood is negligible, it establishes a reasonable degree of medical certainty that there is no cause-and-effect relationship absent some other evidence.”

The confusion here is hardly unique, but ultimately it did not prevent Judge Ward from reaching a sound result in Smith.
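The confusion is the familiar transpositional fallacy: taking the probability of the data, given a hypothesis of no effect, for the probability of a causal hypothesis, given the data.  A minimal sketch, with made-up numbers of my own choosing, shows how far apart the two quantities can lie:

    # Made-up numbers, for illustration only.
    # H1: the exposure causes the defect; H0: it does not.
    prior_h1 = 0.10           # prior probability that H1 is true
    p_data_given_h0 = 0.04    # a "significant" p-value-like quantity
    p_data_given_h1 = 0.30    # probability of the data if H1 is true

    # Bayes' theorem gives the posterior probability of H1:
    posterior_h1 = (p_data_given_h1 * prior_h1) / (
        p_data_given_h1 * prior_h1 + p_data_given_h0 * (1 - prior_h1)
    )
    print(round(posterior_h1, 3))  # ~0.455, nowhere near 1 - 0.04 = 0.96

A low significance probability, standing alone, tells us nothing about “the statistical likelihood of a specific agent causing a specific defect.”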

What intervened between Wells and Smith was not any major change in the scientific evidence on spermicides and birth defects; the sea change came in the form of judicial attitudes toward the judge’s role in evaluating expert witness opinion testimony.  In 1986, for instance, after the Court of Appeals affirmed the judgment in Wells, Judge Higginbotham, speaking for a panel of the Fifth Circuit, declared:

“Our message to our able trial colleagues: it is time to take hold of expert testimony in federal trials.”

 In re Air Crash Disaster at New Orleans, 795 F.2d 1230, 1234 (5th Cir. 1986).  By the time the motion for summary judgment in Smith was decided, that time had come.

Wells v. Ortho Pharmaceutical Corp. Reconsidered – Part 5

November 21st, 2012

While the trial court was preparing its findings of fact and conclusions of law, Ortho moved to reopen the evidence to permit additional testimony based upon three new articles.  Ortho’s motion came three months after the close of evidence, and after Judge Shoob’s announcement of his verdict. The court denied this motion without mentioning what the new articles purported to show.  Wells v. Ortho Pharmaceutical Corp., 615 F. Supp. 262, 298 (N.D. Ga. 1985), aff’d and rev’d in part on other grounds, 788 F.2d 741 (11th Cir.), cert. denied, 479 U.S. 950 (1986).

What is remarkable in Wells, from the vantage point of current practice, is the absence of motions directed at the proffered expert witness opinion testimony.  On the basis of Judge Shoob’s opinion, there appears to have been no Frye motion, no motions to exclude expert witnesses based upon the Federal Rules of Evidence, and no motions to strike testimony after the fact for lack of a proper basis.

Having lost the verdict in a bench trial, Ortho had little chance for success in the Court of Appeals on a claim that the evidence supporting the plaintiffs’ verdict was legally insufficient.  The traditional standard, applied by the Court of Appeals, was to sustain the trier of fact’s decision as not “clearly erroneous” when there were two “permissible” views of the evidence. 788 F.2d 741, 743 (11th Cir. 1986).  Without some legal doctrine to filter out flawed, invalid, and inadequate expert witness opinion from permissible views of an evidentiary display, the Court of Appeals was left with only a rubber stamp, which it proceeded to use with alacrity.

Ortho attempted to turn its appellate argument about the sufficiency of the evidence into a legal principle about rejecting factual findings not based upon “scientifically reliable foundations.”  Id. at 744.  The appellate court framed the issue on appeal simply as a “battle of the experts,” which Ortho had lost.  Both sides had qualified expert witnesses, and thus, according to the appellate court, “the district court was forced to make credibility determinations to ‘decide the victor’.” Id. (citing Ferebee v. Chevron Chemical Co., 736 F.2d 1529, 1535 (D.C. Cir.), cert. denied, 469 U.S. 1062 (1984)).  The Court of Appeals thus acquiesced in Judge Shoob’s superficial analysis, which attempted to resolve a scientific issue by trial atmospherics, demeanor, and subjective impressions of witness confidence rather than the validity of the studies relied upon and inferences drawn therefrom.  The possibility that Judge Shoob might have evaluated the evidentiary basis underlying the expert witnesses’ opinions was not even acknowledged.

The Court of Appeals invoked the language from Ferebee on statistical significance, despite its irrelevance to the case before it:

“We recognize, as did the Ferebee court, that ‘a cause-effect relationship need not be clearly established by animal or epidemiological studies before a doctor can testify that, in his opinion, such a relationship exists. As long as the basic methodology employed to reach such a conclusion is sound, such as use of tissue samples, standard tests, and patient examination, products liability law does not preclude recovery until a “statistically significant” number of people have been injured or until science has had the time and resources to complete sophisticated laboratory studies of the chemical. Id. at 1535-36.”

Wells, 788 F.2d at 745 (quoting Ferebee). Ferebee involved an injury that all parties agreed could be attributed to paraquat exposure without the need for epidemiologic studies; statistical analysis was not particularly germane.  In Wells, on the other hand, both sides relied upon studies that required statistical analyses for any sensible interpretation, and some of the studies actually reported statistically significant results.  The appellate court’s rhetoric was empty and irrelevant.

(to be continued)

Broadbent on the Relative Risk > 2 Argument

October 31st, 2012

Alex Broadbent, of the University of Johannesburg, Department of Philosophy, has published a paper that contributes to the debate over whether a relative risk (RR) greater than (>) two is irrelevant, helpful, necessary, or sufficient in inferring that an exposure more likely than not caused an individual claimant’s disease. Alex Broadbent, “Epidemiological Evidence in Proof of Specific Causation,” 17 Legal Theory 237 (2011) [cited as Broadbent].  I am indebted to him for calling the paper to my attention. Professor Broadbent’s essay is clearly written, which is helpful in assessing the current use of the RR > 2 argument in judicial decisions.

General vs. Specific Causation

Broadbent carefully distinguishes between general and specific causation.  By focusing exclusively upon specific causation (and assuming that general causation is accepted), he avoids the frequent confusion over when RR > 2 might play a role in legal decisions. Broadbent also “sanitizes” his portrayal of RR by asking us to assume that “the RR is not due to anything other than the exposure.” Id. at 241. This is a BIG assumption and a tall order for observational epidemiologic evidence.  The study or studies that establish the RR we are reasoning from must be free of bias and confounding. Id.  Broadbent does not mention, however, the statistical stability of the RR, which virtually always will be based upon a sample, and thus subject to the play of random error.  He sidesteps the need for statistical significance in comparing two proportions, but the most charitable interpretation of his paper requires us to assume further that the hypothetical RR from which we are reasoning is sufficiently statistically stable that random error, along with bias and confounding, can also be ruled out as likely explanations for the RR > 1.

Broadbent sets out to show that RR > 2 may, in certain circumstances, suffice to show specific causation, but he argues that RR > 2 is never logically necessary, and must never be required to support a claim of specific causation.  Broadbent at 237.  On the same page on which he states that epidemiologic evidence of increased risk is a “last resort,” Broadbent contradicts himself by stating RR > 2 evidence “must never be required,” and then, in an apparent about-face, he argues:

“that far from being epistemically irrelevant, to achieve correct and just outcomes it is in fact mandatory to take (high-quality) epidemiological evidence into account in deciding specific causation. Failing to consider such evidence when it is available leads to error and injustice. The conclusion is that in certain circumstances epidemiological evidence of RR > 2 is not necessary to prove specific causation but that it is sufficient.”

Id. at 237 (emphasis added). I am not sure how epidemiologic evidence can be mandatory but never logically necessary, and something that we should never require.

Presumably, Broadbent is using “to prove” in its legal and colloquial sense, and not as a mathematician would.  Let us also give Broadbent his assumptions of “high quality” epidemiologic studies, with established general causation, and explore when, whether, and why RR > 2 is, or is not, necessary to show specific causation.

The Probability of Causation vs. The Fact of Causation

Broadbent notes that he is arguing against what he perceives to be Professor Haack’s rejection of probabilistic inference, which would suggest that epidemiologic evidence is “never sufficient to establish specific causation.” Id. at 239 & n.3 (citing Susan Haack, “Risky Business: Statistical Proof of Individual Causation,” in Causación y Atribucion de Responsabilidad (J. Beltran ed., forthcoming)). He correctly points out that sometimes the probabilistic inference is the only probative inference available to support specific causation.  His point, however, does not resolve the dispute; it suffices only to show that whether we allow the probabilistic inference may be outcome determinative in many lawsuits.  Broadbent characterizes Haack’s position as one of two “serious mistakes in judicial and academic literature on this topic.”  Broadbent at 239.  The other alleged mistake is the claim that RR > 2 is needed to show specific causation:

“What follows, I conclude, is that epidemiological evidence is relevant to the proof of specific causation. Epidemiological evidence says that a particular exposure causes a particular harm within a certain population. Importantly, it quantifies: it says how often the exposure causes the harm. However, its methods are limited: they measure only the net effect of the exposure, leaving open the possibility that the exposure is causing more harm than the epidemiological evidence suggests—but ruling out the possibility that it causes less. Accordingly I suggest that epidemiological evidence can be used to estimate a lower bound on the probability of causation but that no epidemiological measure can be required. Thus a relative risk (RR, defined in Section II) of greater than 2 can be used to prove causation when there is no other evidence; but RR < 2 does not disprove causation. Given high-quality epidemiological evidence, RR > 2 is sufficient for proof of specific causation when no other evidence is available but not necessary when other evidence is available.”

Some of this seems reasonable enough.  Contrary to the claims of authors such as Haack and Wright, Broadbent maintains that some RR evidence is relevant and indeed probative of specific causation.  In a tobacco lung cancer case, with a plaintiff who has smoked three packs a day for 50 years (and RR > 50), we can confidently attribute the lung cancer to smoking, and rest assured that background cosmic radiation did not likely play a substantial role. The RR quantifies the strength of the association, and it leads us to a measure of “attributable risk” (AR), also known as the attributable fraction (AF):

AR = 1 – 1/RR.

So far, so good.
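The arithmetic that drives the RR > 2 argument is worth setting out.  A minimal sketch follows; the function name is mine, and the inputs are illustrative:

    def attributable_fraction(rr: float) -> float:
        """Excess fraction implied by a risk ratio: AF = 1 - 1/RR."""
        return 1 - 1 / rr

    print(attributable_fraction(50))   # 0.98: the three-pack-a-day smoker
    print(attributable_fraction(2))    # 0.50: the "more likely than not" line
    print(attributable_fraction(1.5))  # ~0.33: short of the preponderance standard

An RR of exactly 2 yields an attributable fraction of one-half, which is why 2 marks the “more likely than not” threshold when the risk ratio is the only evidence of specific causation.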

Among the perplexing statements above, however, Broadbent suggests that:

1. The methods of epidemiologic evidence measure only the net effect of the exposure.  Epidemiologic evidence (presumably the RR or other risk ratio) provides a lower bound on the probability of causation.  I take up this suggestion in discussing Broadbent’s distinction between the “excess fraction” and the “etiologic fraction,” below.

2. A RR > 2 “can be used to prove causation when there is no other evidence; but RR < 2 does not disprove causation.” (My emphasis.) When an author is otherwise careful with his qualifications and his language, it is distressing to see him compare apples to oranges.  Note that RR > 2 suffices “when there is no other evidence,” but the parallel statement about RR < 2 is not similarly qualified, and the statement about RR < 2 is framed in terms of disproof of causation. Even if an RR < 2 did not “disprove” specific causation, when there was no other evidence, it would not prove causation.  And if there is no other evidence, judgment for the defense must result. Broadbent fails to provide us a persuasive scenario in which a RR ≤ 2, with no other evidence, would support an inference of specific causation.

Etiological Fraction vs. Excess Fraction — Occam’s Disposable Razor

Broadbent warns that the expression “attributable risk” (AR, or “attributable fraction,” AF) is potentially misleading.  The numerical calculation identifies the number of cases in excess of the number “expected” at the base rate, and proceeds from there.  The AR thus identifies the “excess fraction,” and not the “etiological fraction,” which is the fraction of all cases in which exposure makes a contribution. Broadbent tells us that:

“Granted a sound causal inference, we can infer that all the excess cases are caused by the exposure. But we cannot infer that the remaining cases are not caused by the exposure. The etiologic fraction—the cases in which the exposure makes a causal contribution—could be larger. Roughly speaking, this is because, in the absence of substantive biological assumptions, it is possible that the exposure could contribute to cases that would have occurred12 even without the exposure.13 For example, it might be that smoking is a cause of lung cancer even among some of those who would have developed it anyway. The fact that a person would have developed lung cancer anyway does not offer automatic protection against the carcinogenic effects of cigarette smoke (a point we return to in Section IV).”

Id. at 241. In large measure here, Broadbent has adopted (and acknowledged) his borrowings from Professor Sander Greenland.  Id. at 242 n.11. The argument still fails.  What Broadbent has interposed is a “theoretical possibility” that the exposure in question may contribute to those cases that would have occurred anyway.  Note that raising theoretical possibilities here alters the hypothetical; Broadbent is no longer working from a hypothetical in which we have a RR and no other evidence.  Even more important, we are left guessing what it means to say that an exposure causes some cases that would have occurred anyway.  If we accept the postulated new evidence at face value, we can say confidently that the exposure is not the “but for” cause of the case at issue.  Without sufficient evidence of “but for” causation, plaintiff will lose. Furthermore, we are being told to add a new fact to the hypothetical, namely that the non-excess cases are causally over-determined.  If this is the only additional new fact being added, a court might invoke the rule in Summers v. Tice, but even so, the defense will be entitled to a directed verdict if the RR < 2. (If the RR = 2, I suppose, the new fact, and the change in the controlling rule, might alter the result.)

Exposures that Cause Some and Prevent Some Cases of Disease

Broadbent raises yet another hypothetical possibility, which adds to, and materially alters, his original hypothetical.  If the exposure in question causes some cases and prevents others, then an RR ≤ 2 will not permit us to infer that a given case is less likely than not the result of the exposure.  (Broadbent might have given an example of what he had in mind, from well-established biological causal relationships; I am skeptical that he would have found one that would have satisfactorily made his argument.) The bimodal distribution of causal effects is certainly not typical of biological processes, but even if we indulge the “possibility,” we are now firmly in the realm of speculation.  This is a perfectly acceptable realm for philosophers, but in court, we want evidence.  Assuming that the claimant could present such evidence, finders of fact would still founder, because the new evidence would leave them guessing whether the claimant was a person who would have gotten the disease anyway, got it because of the exposure, or even got it in spite of the exposure.

Many commentators who urge a “probability of [specific] causation” approach equate the probability of causation (PC) with the AR.  Broadbent argues that, because of the possibility that some biological model results in the etiologic fraction exceeding the excess fraction, the usual equation, PC = AR, must be replaced with an inequality:

PC ≥ AR

While the point is logically unexceptionable, Broadbent must concede that some other evidence, which supports and justifies the postulated biological model, is required to change the equality into a strict inequality.  If no other evidence besides the RR is available, we are left with the equality.  Broadbent tells us that the biological model “often” requires that the etiological fraction exceed the excess fraction, but he never tells us how often, or how we would ascertain the margin of error.  Id. at 256.

Broadbent does not review any of the decided judicial cases to point out which ones involved biological models that invalidated the equality.  Doing so would be an important exercise, because it might well show that even where PC ≥ AR, with a non-quantified upper bound, the plaintiff might still fail in presenting a prima facie case of specific causation.  Suppose the population RR for the exposure in question were 1.1, and we “know” (and are not merely speculating) that the etiological fraction exceeds the excess fraction.   Unless we know how much greater the etiological fraction is, such that we can recalculate the PC, we are left agnostic about specific causation.
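To make the point concrete, a short sketch of the arithmetic, using the hypothetical RR of 1.1:

    rr = 1.1
    excess_fraction = 1 - 1 / rr       # ~0.091
    # Per Broadbent, PC >= excess_fraction.  With no quantified upper bound,
    # PC could lie anywhere from 0.091 to 1.0, and a factfinder told only
    # "PC >= 0.091" has no basis for concluding that PC exceeds 0.5.
    print(round(excess_fraction, 3))   # 0.091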

Broadbent treats us to several biological scenarios in which PC possibly is greater than AR.  All of these scenarios violate his starting premise that we have a RR and no other evidence. For instance, Broadbent hypothesizes that exposure might accelerate the onset of a disease.  Id. at 256. This biological model of acceleration can be established with the same epidemiologic evidence that established the RR for the population.  Epidemiologists will frequently look at time windows from onset of exposure to explore whether an acceleration of onset of cases in a younger age range offsets a deficit later in the lives of the exposed population.  If there were firm evidence of such a phenomenon, then we would look to the RR within the relevant time window.  If the relevant RR ≤ 2, the biological model will have added nothing to the plaintiff’s case.

Broadbent cites Greenland for the proposition that PC > AR:

“We know of no cancer or other important chronic disease for which current biomedical knowledge allows one to exclude mechanisms that violate the assumptions needed to claim that PC = [AF].”

Id. at 259, quoting from Sander Greenland & James Robins, “Epidemiology, Justice, and the Probability of Causation,” 40 Jurimetrics J. 321, 325 (2000).  Here, not only has Broadbent postulated a mechanism that makes PC > AR, but he has shifted the burden of proof to the defense to exclude it!

The notion that the etiological fraction may exceed the excess fraction is an important caveat.  Courts and lawyers should take note.  It will not do, however, to wave hands and exclaim that the RR > 2 is not a “litmus test,” and then proceed to let any RR > 1, or even RR ≤ 1, support a verdict.  The biological models that may push the etiological fraction higher than the excess fraction can be tested, and quantified, with the same epidemiologic approaches that provided the risk ratio in the first place.  Broadbent gives us an example of this sort of hand waving:

“Thus, for example, evidence that an exposure would be likely to aggravate an existing predisposition to the disease in question might suffice, along with RR between 1 and 2, to make it more likely than not that the claimant’s disease was caused by the exposure.”

Id. at 275. This is a remarkable, and unsupported, claim.  The magnitude of the aggravation might still leave the RR ≤ 2.  What is needed is evidence that would allow quantification of the risk ratio in the scenario presented. Speculation will not do the trick; nor will it get the case to a jury, or support a verdict.


Old-Fashioned Probabilism – Origins of Legal Probabilism

October 26th, 2012

In several posts, I have addressed Professor Haack’s attack on legal probabilism.  See “Haack Attack on Legal Probabilism” (May 6, 2012).  The probabilistic mode of reasoning is not a modern innovation; nor is the notion that the universe is entirely determined, although revealed to humans as a stochastic phenomenon:

“I returned, and saw under the sun, that the race is not to the swift, nor the battle to the strong, neither yet bread to the wise, nor yet riches to men of understanding, nor yet favour to men of skill; but time and chance happeneth to them all.”

Ecclesiastes 9:11 King James Bible (Cambridge ed.)

The Old Testament describes the “casting of lots,” some sort of dice rolling or coin flipping, in a wide variety of human decision making.  The practice is described repeatedly there, and half a dozen times in the New Testament.

Casting of lots figures more prominently in the Old Testament, in the making of important decisions, and in attempting to ascertain “God’s will.”  The Bible describes matters of inheritance, Numbers 34:13; Joshua 14:2, and division of property, Joshua 14-21, Numbers 26:55, as decided by lots.  Elections to important office, including offices and functions in the Temple, were determined by lot. 1 Chronicles 24:5, 31; 25:8-9; 26:13-14; Luke 1:9.

Casting lots was an early form of alternative dispute resolution – alternative to slaying and smiting.  Proverbs describes the lot as used as a method to resolve quarrels.  Proverbs 18:18.  Lot casting determined fault in a variety of situations.  Lots were cast to identify the culprit who had brought God’s wrath upon Jonah’s ship. Jonah 1:7 (“Come, let us cast lots, that we may know on whose account this evil has come upon us.”).

What we might take as a form of gambling appeared to have been understood by the Israelites as a method for receiving instruction from God. Proverbs 16:33 (“The lot is cast into the lap, but its every decision is from the Lord.”).  This Old Testament fortune cookie suggests that the Lord knows the outcome of the lot casting, but mere mortals must wager.  I like to think the passage means that events that appear stochastic to humans may have a divinely determined mechanism.  In any event, the Bible describes various occasions on which lots were cast to access the inscrutable intentions and desires of the Lord.  Numbers 26:55; 33:54; 34:13; 36:2; Joshua 18:6-10; 1 Chronicles 24:5, 31; 1 Samuel 14:42; Leviticus 16:8-10 (distinguishing between the sacrificial goat and the scapegoat).

In the New Testament, the Apostles cast lots to decide upon a replacement for Judas (Acts 1:26). Matthias was the winner.  Matthew, Mark, and John describe Roman soldiers casting lots for Jesus’ garments (Matthew 27:35; Mark 15:24; John 19:24). See also Psalm 22:18.  This use of lots by the Roman soldiers seems to have taken some of the magic out of lot casting, which fell into disrepute and gave way to consultations with the Holy Spirit for guidance on important decisions.

The Talmud deals with probabilistic inference in more mundane settings.  The famous “Nine Shops” hypothetical posits 10 butcher shops in a town, nine of which sell kosher meat.  The hypothetical addresses whether the dietary laws permit eating a piece of meat found in town, when its butchering cannot be attributed to either the nine kosher shops or the one non-kosher shop:

“A typical question involves objects whose identity is not known and reference is made to the likelihood that they derive from a specific type of source in order to determine their legal status, i.e. whether they be permitted or forbidden, ritually clean or unclean, etc. Thus, only meat which has been slaughtered in the prescribed manner is kasher, permitted for food. If it is known that most of the meat available in a town is kasher, there being, say, nine shops selling kasher meat and only one that sells non-kasher meat, then it can be assumed when an unidentified piece of meat is found in the street that it came from the majority and is therefore permitted.”

Nachum L. Rabinovitch, “Studies in the History of Probability and Statistics.  XXII Probability in the Talmud,” 56 Biometrika 437, 437 (1969).  Rabinovitch goes on to describe the Talmud’s resolution of this earthly dilemma:  “follow the majority” or the most likely inference.

A small digression on this Talmudic hypothetical.  First, why not try to find out whether someone has lost this package of meat, or turn the package in to the local “lost and found”? Second, how can it be kosher to eat a piece of meat found lying around in the town?  This is really not very appetizing, and it cannot be good hygiene.  Third, why not open the package and determine whether it is a nice pork tenderloin or a piece of cow?  This alone could resolve the issue. Fourth, the hypothetical asks us to assume a 9:1 ratio of kosher to non-kosher shops, but what if the one non-kosher shop had a market share equal to the other nine combined? The majority rule could lead to an untoward result for those who wish to keep kosher, as the sketch below illustrates.
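A minimal sketch of the market-share point, with a hypothetical 50% share for the one non-kosher shop:

    # Ten shops: nine kosher, one not.  Counting shops gives 90% kosher.
    p_kosher_by_shop_count = 9 / 10

    # But if the one non-kosher shop sells half the meat in town, a found
    # package is equally likely to have come from either side of the divide.
    p_kosher_by_market_share = 0.5

    print(p_kosher_by_shop_count)    # 0.9
    print(p_kosher_by_market_share)  # 0.5: following the majority of shops
                                     # no longer tracks the majority of meat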

The Talmud’s proposed resolution is, nevertheless, interesting in anticipating the controversy over the use of “naked statistical inferences” in deciding specific causation or discrimination cases.  Of course, the 9:1 ratio is sufficiently high that it might allow an inference about the “likely” source of the meat.  The more interesting case would have been a town with 11 butcher shops, six of which were kosher.  Would the rabbis of old have had the intestinal fortitude to eat lost & found meat, on the basis of a ratio of 6:5?

In the 12th century, Maimonides rejected probabilistic conclusions for assigning criminal liability, at least where the death penalty was at issue:

“The 290th Commandment is a prohibition to carry out punishment on a high probability, even close to certainty . . . . No punishment [should] be carried out except where . . . the matter is established in certainty beyond any doubt, and, moreover, it cannot be explained otherwise in any manner.  If we do not punish on very strong probabilities, nothing can happen other than a sinner be freed; but if punishment be done on probability and opinion it is possible that one day we might kill an innocent man — and it is better and more desirable to free a thousand sinners, than ever kill one innocent.”

Stephen E. Fienberg, ed., The Evolving Role of Statistical Assessments as Evidence in the Courts 213 (N.Y. 1989), quoting from Nachum Rabinovitch, Probability and Statistical Inference in Ancient and Medieval Jewish Literature 111 (Toronto 1973).

Indiana Senate candidate and theocrat, Republican Richard E. Mourdock, recently opined that a conception resulting from rape is God’s will:

“I’ve struggled with it myself for a long time, but I came to realize that life is that gift from God.  And even when life begins in that horrible situation of rape, that it is something that God intended to happen.”

Jonathan Weisman, “Rape Remark Jolts a Senate Race, and the Presidential One, Too,” N.Y. Times (Oct. 25, 2012).

Mourdock’s comments about pregnancies resulting from rape representing God’s will show that stochastic events continue to be interpreted as mechanistically determined at some “higher plane.” Magical thinking is still with us.