TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Confidence in Intervals and Diffidence in the Courts

March 4th, 2012

Next year, the Supreme Court’s Daubert decision will turn 20.  The decision, in interpreting Federal Rule of Evidence 702, dramatically changed the landscape of expert witness testimony.  Still, there are many who would turn the clock back to disabling the gatekeeping function.  In past posts, I have identified scholars, such as Erica Beecher-Monas and the late Margaret Berger, who tried to eviscerate judicial gatekeeping.  Recently a student note argued for the complete abandonment of all judicial control of expert witness testimony.  See  Note, “Admitting Doubt: A New Standard for Scientific Evidence,” 123 Harv. L. Rev. 2021 (2010)(arguing that courts should admit all relevant evidence).

One advantage that comes from requiring trial courts to serve as gatekeepers is that the expert witnesses’ reasoning is approved or disapproved in an open, transparent, and rational way.  Trial courts subject themselves to public scrutiny in a way that jury decision making does not permit.  The critics of Daubert often engage in a cynical attempt to remove all controls over expert witnesses in order to empower juries to act on their populist passions and prejudices.  When courts misinterpret statistical and scientific evidence, there is some hope of changing subsequent decisions by pointing out their errors.  Jury errors on the other hand, unless they involve determinations of issues for which there were “no evidence,” are immune to institutional criticism or correction.

Despite my whining, not all courts butcher statistical concepts.  There are many astute judges out there who see error and call it error.  Take for instance, the trial judge who was confronted with this typical argument:

“While Giles admits that a p-value of .15 is three times higher than what scientists generally consider statistically significant—that is, a p-value of .05 or lower—she maintains that this ‘‘represents 85% certainty, which meets any conceivable concept of preponderance of the evidence.’’ (Doc. 103 at 16).”

Giles v. Wyeth, Inc., 500 F.Supp. 2d 1048, 1056-57 (S.D.Ill. 2007), aff’d, 556 F.3d 596 (7th Cir. 2009).  Despite having case law cited to it (such as In re Ephedra), the trial court looked to the Reference Manual on Scientific Evidence, a resource that seems to be ignored by many federal judges, and rejected the bogus argument.  Unfortunately, the lawyers who made the bogus argument still are licensed, and at large, to incite the same error in other cases.

This business perhaps would be amenable to an empirical analysis.  An enterprising sociologist of the law could conduct some survey research on the science and math training of the federal judiciary, on whether the federal judges have read chapters of the Reference Manual before deciding cases involving statistics or science, and whether federal judges expressed the need for further education.  This survey evidence could be capped by an analysis of the prevalence of certain kinds of basic errors, such as the transpositional fallacy committed by so many judges (but decisively rejected in the Giles case).  Perhaps such an empirical analysis would advance our understanding whether we need specialty science courts.

One of the reasons that the Reference Manual on Scientific Evidence is worthy of so much critical attention is that the volume has the imprimatur of the Federal Judicial Center, and now the National Academies of Science.  Putting aside the idiosyncratic chapter by the late Professor Berger, the Manual clearly present guidance on many important issues.  To be sure, there are gaps, inconsistencies, and mistakes, but the statistics chapter should be a must-read for federal (and state) judges.

Unfortunately, the Manual has competition from lesser authors whose work obscures, misleads, and confuses important issues.  Consider an article by two would-be expert witnesses, who testify for plaintiffs, and confidently misstate the meaning of a confidence interval:

“Thus, a RR [relative risk] of 1.8 with a confidence interval of 1.3 to 2.9 could very likely represent a true RR of greater than 2.0, and as high as 2.9 in 95 out of 100 repeated trials.”

Richard W. Clapp & David Ozonoff, “Environment and Health: Vital Intersection or Contested Territory?” 30 Am. J. L. & Med. 189, 210 (2004).  This misstatement was then cited and quoted with obvious approval by Professor Beecher-Monas, in her text on scientific evidence.  Erica Beecher-Monas, Evaluating Scientific Evidence: An Interdisciplinary Framework for Intellectual Due Process 60-61 n. 17 (2007).   Beecher-Monas goes on, however, to argue that confidence interval coefficients are not the same as burdens of proof, but then implies that scientific standards of proof are different from the legal preponderance of the evidence.  She provides no citation or support for the higher burden of scientific proof:

“Some commentators have attributed the causation conundrum in the courts to the differing burdens of proof in science and law.28 In law, the civil standard of ‘more probable than not’ is often characterized as a probability greater than 50 percent.29 In science, on the other hand, the most widely used standard is a 95 percent confidence interval (corresponding to a 5 percent level of significance, or p-level).30 Both sound like probabilistic assessment. As a result, the argument goes, civil judges should not exclude scientific testimony that fails scientific validity standards because the civil legal standards are much lower. The transliteration of the ‘more probable than not’ standard of civil factfinding into a quantitative threshold of statistical evidence is misconceived. The legal and scientific standards are fundamentally different. They have different goals and different measures.  Therefore, one cannot justifiably argue that evidence failing to meet the scientific standards nonetheless should be admissible because the scientific standards are too high for preponderance determinations.”

Id. at 65.  This seems to be on the right track, although Beecher-Monas does not state clearly whether she subscribes to the notion that the burdens of proof in science and law differ.  The argument then takes a wrong turn:

“Equating confidence intervals with burdens of persuasion is simply incoherent. The goal of the scientific standard – the 95 percent confidence interval – is to avoid claiming an effect when there is none (i.e., a false positive).31

Id. at 66.   But this is crazy error; confidence intervals are not burdens of persuasion, legal or scientific.  Beecher-Monas is not, however, content to leave this alone:

“Scientists using a 95 percent confidence interval are making a prediction about the results being due to something other than chance.”

Id. at 66 (emphasis added).  Other than chance?  Well this implies causality, as well as bias and confounding, but the confidence interval, like the p-value, addresses only random or sampling error.  Beecher-Monas’s error is neither random nor scientific.  Indeed, she perpetuates the same error committed by the Fifth Circuit in a frequently cited Bendectin case, which interpreted the confidence interval as resolving questions of the role of matters “other than chance,” such as bias and confounding.  Brock v. Merrill Dow Pharmaceuticals, Inc., 874 F.2d 307, 311-12 (5th Cir. 1989)(“Fortunately, we do not have to resolve any of the above questions [as to bias and confounding], since the studies presented to us incorporate the possibility of these factors by the use of a confidence interval.”)(emphasis in original).  See, e.g., David H. Kaye, David E. Bernstein, and Jennifer L. Mnookin, The New Wigmore – A Treatise on Evidence:  Expert Evidence § 12.6.4, at 546 (2d ed. 2011) Michael O. Finkelstein, Basic Concepts of Probability and Statistics in the Law 86-87 (2009)(criticizing the overinterpretation of confidence intervals by the Brock court).

Clapp, Ozonoff, and Beecher-Monas are not alone in offering bad advice to judges who must help resolve statistical issues.  Déirdre Dwyer, a prominent scholar of expert evidence in the United Kingdom, manages to bundle up the transpositional fallacy and a misstatement of the meaning of the confidence interval into one succinct exposition:

“By convention, scientists require a 95 per cent probability that a finding is not due to chance alone. The risk ratio (e.g. ‘2.2’) represents a mean figure. The actual risk has a 95 per cent probability of lying somewhere between upper and lower limits (e.g. 2.2 ±0.3, which equals a risk somewhere between 1.9 and 2.5) (the ‘confidence interval’).”

Déirdre Dwyer, The Judicial Assessment of Expert Evidence 154-55 (Cambridge Univ. Press 2008).

Of course, Clapp, Ozonoff, Beecher-Monas, and Dwyer build upon a long tradition of academics’ giving errant advice to judges on this very issue.  See, e.g., Christopher B. Mueller, “Daubert Asks the Right Questions:  Now Appellate Courts Should Help Find the Right Answers,” 33 Seton Hall L. Rev. 987, 997 (2003)(describing the 95% confidence interval as “the range of outcomes that would be expected to occur by chance no more than five percent of the time”); Arthur H. Bryant & Alexander A. Reinert, “The Legal System’s Use of Epidemiology,” 87 Judicature 12, 19 (2003)(“The confidence interval is intended to provide a range of values within which, at a specified level of certainty, the magnitude of association lies.”) (incorrectly citing the first edition of Rothman & Greenland, Modern Epidemiology 190 (Philadelphia 1998);  John M. Conley & David W. Peterson, “The Science of Gatekeeping: The Federal Judicial Center’s New Reference Manual on Scientific Evidence,” 74 N.C.L.Rev. 1183, 1212 n.172 (1996)(“a 95% confidence interval … means that we can be 95% certain that the true population average lies within that range”).

Who has prevailed?  The statistically correct authors of the statistics chapter of the Reference Manual on Scientific Evidence, or the errant commentators?  It would be good to have some empirical evidence to help evaluate the judiciary’s competence. Here are some cases, many drawn from the Manual‘s discussions, arranged chronologically, before and after the first appearance of the Manual:

Before First Edition of the Reference Manual on Scientific Evidence:

DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941, 948 (3d Cir. 1990)(“A 95% confidence interval is constructed with enough width so that one can be confident that it is only 5% likely that the relative risk attained would have occurred if the true parameter, i.e., the actual unknown relationship between the two studied variables, were outside the confidence interval.   If a 95% confidence interval thus contains ‘1’, or the null hypothesis, then a researcher cannot say that the results are ‘statistically significant’, that is, that the null hypothesis has been disproved at a .05 level of significance.”)(internal citations omitted)(citing in part, D. Barnes & J. Conley, Statistical Evidence in Litigation § 3.15, at 107 (1986), as defining a CI as “a limit above or below or a range around the sample mean, beyond which the true population is unlikely to fall”).

United States ex rel. Free v. Peters, 806 F. Supp. 705, 713 n.6 (N.D. Ill. 1992) (“A 99% confidence interval, for instance, is an indication that if we repeated our measurement 100 times under identical conditions, 99 times out of 100 the point estimate derived from the repeated experimentation will fall within the initial interval estimate … .”), rev’d in part, 12 F.3d 700 (7th Cir. 1993)

DeLuca v. Merrell Dow Pharms., Inc., 791 F. Supp. 1042, 1046 (D.N.J. 1992)(”A 95% confidence interval means that there is a 95% probability that the ‘true’ relative risk falls within the interval”) , aff’d, 6 F.3d 778 (3d Cir. 1993)

Turpin v. Merrell Dow Pharms., Inc., 959 F.2d 1349, 1353-54 & n.1 (6th Cir. 1992)(describing a 95% CI of 0.8 to 3.10, to mean that “random repetition of the study should produce, 95 percent of the time, a relative risk somewhere between 0.8 and 3.10”)

Hilao v. Estate of Marcos, 103 F.3d 767, 787 (9th Cir. 1996)(Rymer, J., dissenting and concurring in part).

After the first publication of the Reference Manual on Scientific Evidence:

American Library Ass’n v. United States, 201 F.Supp. 2d 401, 439 & n.11 (E.D.Pa. 2002), rev’d on other grounds, 539 U.S. 194 (2003)

SmithKline Beecham Corp. v. Apotex Corp., 247 F.Supp.2d 1011, 1037-38 (N.D. Ill. 2003)(“the probability that the true value was between 3 percent and 7 percent, that is, within two standard deviations of the mean estimate, would be 95 percent”)(also confusing attained significance probability with posterior probability: “This need not be a fatal concession, since 95 percent (i.e., a 5 percent probability that the sign of the coefficient being tested would be observed in the test even if the true value of the sign was zero) is an  arbitrary measure of statistical significance.  This is especially so when the burden of persuasion on an issue is the undemanding ‘preponderance’ standard, which  requires a confidence of only a mite over 50 percent. So recomputing Niemczyk’s estimates as significant only at the 80 or 85 percent level need not be thought to invalidate his findings.”), aff’d on other grounds, 403 F.3d 1331 (Fed. Cir. 2005)

In re Silicone Gel Breast Implants Prods. Liab. Litig, 318 F.Supp.2d 879, 897 (C.D. Cal. 2004) (interpreting a relative risk of 1.99, in a subgroup of women who had had polyurethane foam covered breast implants, with a 95% CI that ran from 0.5 to 8.0, to mean that “95 out of 100 a study of that type would yield a relative risk somewhere between on 0.5 and 8.0.  This huge margin of error associated with the PUF-specific data (ranging from a potential finding that implants make a woman 50% less likely to develop breast cancer to a potential finding that they make her 800% more likely to develop breast cancer) render those findings meaningless for purposes of proving or disproving general causation in a court of law.”)(emphasis in original)

Ortho–McNeil Pharm., Inc. v. Kali Labs., Inc., 482 F.Supp. 2d 478, 495 (D.N.J.2007)(“Therefore, a 95 percent confidence interval means that if the inventors’ mice experiment was repeated 100 times, roughly 95 percent of results would fall within the 95 percent confidence interval ranges.”)(apparently relying party’s expert witness’s report), aff’d in part, vacated in part, sub nom. Ortho McNeil Pharm., Inc. v. Teva Pharms Indus., Ltd., 344 Fed.Appx. 595 (Fed. Cir. 2009)

Eli Lilly & Co. v. Teva Pharms, USA, 2008 WL 2410420, *24 (S.D.Ind. 2008)(stating incorrectly that “95% percent of the time, the true mean value will be contained within the lower and upper limits of the confidence interval range”)

Benavidez v. City of Irving, 638 F.Supp. 2d 709, 720 (N.D. Tex. 2009)(interpreting a 90% CI to mean that “there is a 90% chance that the range surrounding the point estimate contains the truly accurate value.”)

Estate of George v. Vermont League of Cities and Towns, 993 A.2d 367, 378 n.12 (Vt. 2010)(erroneously describing a confidence interval to be a “range of values within which the results of a study sample would be likely to fall if the study were repeated numerous times”)

Correct Statements

There is no reason for any of these courts to have struggled so with the concept of statistical significance or of the confidence interval.  These concepts are well elucidated in the Reference Manual on Scientific Evidence (RMSE):

“To begin with, ‘confidence’ is a term of art. The confidence level indicates the percentage of the time that intervals from repeated samples would cover the true value. The confidence level does not express the chance that repeated estimates would fall into the confidence interval.91

* * *

According to the frequentist theory of statistics, probability statements cannot be made about population characteristics: Probability statements apply to the behavior of samples. That is why the different term ‘confidence’ is used.”

RMSE 3d at 247 (2011).

Even before the Manual, many capable authors have tried to reach the judiciary to help them learn and apply statistical concepts more confidently.  Professors Michael Finkelstein and Bruce Levin, of the Columbia University’s Law School and Mailman School of Public Health, respectively, have worked hard to educate lawyers and judges in the important concepts of statistical analyses:

“It is the confidence limits PL and PU that are random variables based on the sample data. Thus, a confidence interval (PL, PU ) is a random interval, which may or may not contain the population parameter P. The term ‘confidence’ derives from the fundamental property that, whatever the true value of P, the 95% confidence interval will contain P within its limits 95% of the time, or with 95% probability. This statement is made only with reference to the general property of confidence intervals and not to a probabilistic evaluation of its truth in any particular instance with realized values of PL and PU. “

Michael O. Finkelstein & Bruce Levin, Statistics for Lawyers at 169-70 (2d ed. 2001)

Courts have no doubt been confused to some extent between the operational definition of a confidence interval and the role of the sample point estimate as an estimator of the population parameter.  In some instances, the sample statistic may be the best estimate of the population parameter, but that estimate may be rather crummy because of the sampling error involved.  See, e.g., Kenneth J. Rothman, Sander Greenland, Timothy L. Lash, Modern Epidemiology 158 (3d ed. 2008) (“Although a single confidence interval can be much more informative than a single P-value, it is subject to the misinterpretation that values inside the interval are equally compatible with the data, and all values outside it are equally incompatible. * * *  A given confidence interval is only one of an infinite number of ranges nested within one another. Points nearer the center of these ranges are more compatible with the data than points farther away from the center.”); Nicholas P. Jewell, Statistics for Epidemiology 23 (2004)(“A popular interpretation of a confidence interval is that it provides values for the unknown population proportion that are ‘compatible’ with the observed data.  But we must be careful not to fall into the trap of assuming that each value in the interval is equally compatible.”); Charles Poole, “Confidence Intervals Exclude Nothing,” 77 Am. J. Pub. Health 492, 493 (1987)(“It would be more useful to the thoughtful reader to acknowledge the great differences that exist among the p-values corresponding to the parameter values that lie within a confidence interval … .”).

Admittedly, I have given an impressionistic account, and I have used anecdotal methods, to explore the question whether the courts have improved in their statistical assessments in the 20 years since the Supreme Court decided Daubert.  Many decisions go unreported, and perhaps many errors are cut off from the bench in the course of testimony or argument.  I personally doubt that judges exercise greater care in their comments from the bench than they do in published opinions.  Still, the quality of care exercised by the courts would be a worthy area of investigation by the Federal Judicial Center, or perhaps by other sociologists of the law.

Scientific illiteracy among the judiciary

February 29th, 2012

Ken Feinberg, speaking at a symposium on mass torts, asks what legal challenges do mass torts confront in the federal courts.  The answer seems obvious.

Pharmaceutical cases that warrant federal court multi-district litigation (MDL) treatment typically involve complex scientific and statistical issues.  The public deserves having MDL cases assigned to judges who have special experience and competence to preside in cases in which these complex issues predominate.  There appears to be no procedural device to ensure that the judges selected in the MDL process have the necessary experience and competence, and a good deal of evidence to suggest that the MDL judges are not up to the task at hand.

In the aftermath of the Supreme Court’s decision in Daubert, the Federal Judicial Center assumed responsibility for producing science and statistics tutorials to help judges grapple with technical issues in their cases.  The Center has produced videotaped lectures as well as the Reference Manual on Scientific Evidence, now in its third edition.  Despite the Center’s best efforts, many federal judges have shown themselves to be incorrigible.  It is time to revive the discussions and debates about implementing a “science court.”

The following three federal MDLs all involved pharmaceutical products, well-respected federal judges, and a fundamental error in statistical inference.

Avandia

Avandia is a prescription oral anti-diabetic medication licensed by GlaxoSmithKline (GSK).  Concerns over Avandia’s association with excess heart attack risk resulted in regulatory revisions of its availability, as well as thousands of lawsuits.  In a decision that affected virtually all of those several thousand claims, aggregated for pretrial handing in a federal MDL, a federal judge, in ruling on a Rule 702 motion, described a clinical trial with a risk ratio greater than 1.0, with a p-value of 0.08, as follows:

“The DREAM and ADOPT studies were designed to study the impact of Avandia on prediabetics and newly diagnosed diabetics. Even in these relatively low-risk groups, there was a trend towards an adverse outcome for Avandia users (e.g., in DREAM, the p-value was .08, which means that there is a 92% likelihood that the difference between the two groups was not the result of mere chance).FN72

In re Avandia Marketing, Sales Practices and Product Liability Litigation, 2011 WL 13576, *12 (E.D. Pa. 2011)(Rufe, J.).  This is a remarkable error by a trial judge given the responsibility for pre-trial handling of so many cases.  There are many things you can argue about a p-value of 0.08, but Judge Rufe’s interpretation is not an argument; it is error.  That such an error, explicitly warned against in the Reference Manual on Scientific Evidence, could be made by an MDL judge, over 15 years since the first publication of the Manual, highlights the seriousness and the extent of the illiteracy problem.

What possible basis could the Avandia MDL court have to support this clearly erroneous interpretation of crucial studies in the litigation?  Footnote 72 in Judge Rufe’s opinion references a report by plaintiffs’ expert witness, Allan D. Sniderman, M.D, “a cardiologist, medical researcher, and professor at McGill University.” Id. at *10.  The trial court goes on to note that:

“GSK does not challenge Dr. Sniderman’s qualifications as a cardiologist, but does challenge his ability to analyze and draw conclusions from epidemiological research, since he is not an epidemiologist. GSK’s briefs do not elaborate on this challenge, and in any event the Court finds it unconvincing given Dr. Sniderman’s credentials as a researcher and published author, as well as clinician, and his ability to analyze the epidemiological research, as demonstrated in his report.”

Id.

What more evidence could the Avandia MDL trial court possibly have needed to show that Sniderman was incompetent to give statistical and epidemiologic testimony?  Fundamentally at odds with the Manual on an uncontroversial point, Sniderman had given the court a baseless, incorrect interpretation of a p-value.  Everything else he might have to say on the subject was likely suspect.  If, as the court suggested, GSK did not elaborate upon its challenge with specific examples, then shame on GSK. The trial court, however, could have readily determined that Sniderman was speaking nonsense by reading the chapter on statistics in the Reference Manual on Scientific Evidence.  For all my complaints about gaps in coverage in the Manual, the text, on this issue is clear and concise. It really is not too much to expect an MDL trial judge to be conversant with the basic concepts of scientific and statistical evidence set out in the Manual, which is prepared to help federal judges.

Phenylpropanolamine (PPA) Litigation

Litigation over phenylpropanolamine was aggregated, within the federal system, before Judge Barbara Rothstein.  Judge Rothstein is not only a respected federal trial judge, she was the director of the Federal Judicial Center, which produces the Reference Manual on Scientific Evidence.  Her involvement in overseeing the preparation of the third edition of the Manual, however, did not keep Judge Rothstein from badly misunderstanding and misstating the meaning of a p-value in the PPA litigation.  See In re Phenylpropanolamine (PPA) Prods. Liab. Litig., 289 F.Supp. 2d 1230, 1236 n.1 (W.D. Wash. 2003)(“P-values measure the probability that the reported association was due to chance… .”).  Tellingly, Judge Rothstein denied, in large part, the defendants’ Rule 702 challenges.  Juries, however, overwhelmingly rejected the claims that PPA caused their strokes.

Ephedra Litigation

Judge Rakoff, of the Southern District of New York, notoriously committed the transposition fallacy in the Ephedra litigation:

“Generally accepted scientific convention treats a result as statistically significant if the P-value is not greater than .05. The expression ‘P=.05’ means that there is one chance in twenty that a result showing increased risk was caused by a sampling error—i.e., that the randomly selected sample accidentally turned out to be so unrepresentative that it falsely indicates an elevated risk.”

In re Ephedra Prods. Liab. Litig., 393 F.Supp. 2d 181, 191 (S.D.N.Y. 2005).

Judge Rakoff then fallaciously argued that the use of a critical value of less than 5% of significance probability increased the “more likely than not” burden of proof upon a civil litigant.  Id. at 188, 193.  See Michael O. Finkelstein, Basic Concepts of Probability and Statistics in the Law 65 (2009).

Judge Rakoff may well have had help in confusing the probability used to characterize the plaintiff’s burden of proof with the probability of attained significance.  At least one of the defense expert witnesses in the Ephedra cases gave an erroneous definition of “statistically significant association,” which may have invited the judicial error:

“A statistically significant association is an association between exposure and disease that meets rigorous mathematical criteria demonstrating that the finding is unlikely to be the result of chance.”

Report of John Concato, MD, MS, MPH, at 7, ¶29 (Sept. 13, 2004).  Dr. Concato’s error was picked up and repeated in the defense briefing of its motion to preclude:

“The likelihood that an observed association could occur by chance alone is evaluated using tests for statistical significance.”

Memorandum of Law in Support of Motion by Ephedra Defendants to Exclude Expert Opinions of Charles Buncher, [et alia] …That Ephedra Causes Hemorrhagic Stroke, Ischemic Stroke, Seizure, Myocardial Infarction, Sudden Cardiac Death, and Heat-Related Illnesses at 9 (Dec. 3, 2004).

Judge Rakoff’s insistence that requiring “statistical significance” at the customary 5% level would change the plaintiffs’ burden of proof, and require greater certitude for epidemiologists than for other expert witnesses who opine in less “rigorous” fields of learning, is wrong as a matter of fact.  His Honor’s comparison, however, ignores the Supreme Court’s observation that the point of Rule 702 is:

‘‘to make certain that an expert, whether basing testimony upon professional studies or personal experience, employs in the courtroom the same level of intellectual rigor that characterizes the practice of an expert in the relevant field.’’

Kumho Tire Co. v. Carmichael, 526 U.S. 137, 152 (1999).

Judge Rakoff not only ignored the conditional nature of significance probability, but he overinterpreted the role of significance testing in arriving at a conclusion of causality.  Statistical significance may answer the question of the strength of the evidence for ruling out chance in producing the data observed based upon an assumption of the no risk, but it doesn’t alone answer the question whether the study result shows an increased risk.  Bias and confounding must be considered, along with other Bradford Hill factors.

Even if the p-value could be turned into a posterior probability of the null hypothesis, there would be many other probabilities that would necessarily diminish that probability.  Some of the other factors (which could be expressed as objective or subjective probabilities) include:

  • accuracy of the data reporting
  • data collection
  • data categorization
  • data cleaning
  • data handling
  • data analysis
  • internal validity of the study
  • external validity of the study
  • credibility of study participants
  • credibility of study researchers
  • credibility of the study authors
  • accuracy of the study authors’ expression of their research
  • accuracy of the editing process
  • accuracy of the testifying expert witness’s interpretation
  • credibility of the testifying expert witness
  • other available studies, and their respective data and analysis factors
  • all the other Bradford Hill factors

If these largely independent factors each had a probability or accuracy of 95%, the conjunction of their probabilities would likely be below the needed feather weight on top of 50%.  In sum, Judge Rakoff’s confusing significance probability and the posterior probability of the null hypothesis does not subvert the usual standards of proof in civil cases.  See also Sander Greenland, “Null Misinterpretation in Statistical Testing and Its Impact on Health Risk Assessment,” 53 Preventive Medicine 225 (2011).

WHENCE COMES THIS ERROR

As a matter of intellectual history, I wonder where this error entered into the judicial system.  As a general matter, there was not much judicial discussion of statistical evidence before the 1970s.  The earliest manifestation of the transpositional fallacy in connection with scientific and statistical evidence appears in an opinion of the United States Court of Appeals, for the District of Columbia Circuit.  Ethyl Corp. v. EPA, 541 F.2d 1, 28 n.58 (D.C. Cir.), cert. denied, 426 U.S. 941 (1976).  The Circuit’s language is worth looking at carefully:

“Petitioners demand sole reliance on scientific facts, on evidence that reputable scientific techniques certify as certain.

Typically, a scientist will not so certify evidence unless the probability of error, by standard statistical measurement, is less than 5%. That is, scientific fact is at least 95% certain.  Such certainty has never characterized the judicial or the administrative process. It may be that the ‘beyond a reasonable doubt’ standard of criminal law demands 95% certainty.  Cf. McGill v. United States, 121 U.S.App.D.C. 179, 185 n.6, 348 F.2d 791, 797 n.6 (1965). But the standard of ordinary civil litigation, a preponderance of the evidence, demands only 51% certainty. A jury may weigh conflicting evidence and certify as adjudicative (although not scientific) fact that which it believes is more likely than not. ***”

 Id.  The 95% certainty appears to derive from 95% confidence intervals, although “confidence” is a technical term in statistics, and it most certainly does not mean the probability of the alternative hypothesis under consideration.  Similarly, the error that is less than 5% is not the probability of error of the belief in hypothesis of no difference between observations and expectations, but rather the probability of observing the data or the data even more extreme, on the assumption that observed would equal the expected.  The District of Columbia Circuit thus created a strawman:  scientific certainty is 95%, whereas civil and administrative law certainty is 51%.  This is rubbish, which confuses the frequentist probability from hypothesis testing with the subjective probability for belief in a fact.

The transpositional fallacy has a good pedigree, but that does not make it correct.  Only a lawyer would suggest that a mistake once made was somehow binding upon future litigants.  The following collection of citations and references illustrate how widespread the fundamental misunderstanding of statistical inference is, in the courts, in the academy, and at the bar.  If courts cannot deliver fair, accurate adjudication of scientific facts, then it is time to reform the system.


Courts

U.S. Supreme Court

Vasquez v. Hillery, 474 U.S. 254, 259 n.3 (1986) (“the District Court . . . accepted . . . a probability of 2 in 1,000 that the phenomenon was attributable to chance”)

U.S. Court of Appeals

First Circuit

Fudge v. Providence Fire Dep’t, 766 F.2d 650, 658 (1st Cir. 1985) (“Widely accepted statistical techniques have been developed to determine the likelihood an observed disparity resulted from mere chance.”)

Second Circuit

Nat’l Abortion Fed. v. Ashcroft, 330 F. Supp. 2d 436 (S.D.N.Y. 2004), aff’d in part, 437 F.3d 278 (2d Cir. 2006), vacated, 224 Fed. App’x 88 (2d Cir. 2007) (reporting an expert witness’s interpretation of a p-value of 0.30 to mean that there was a 30% probability that the study results were due to chance alone)

Smith v. Xerox Corp., 196 F.3d 358, 366 (2d Cir. 1999) (“If an obtained result varies from the expected result by two standard deviations, there is only about a .05 probability that the variance is due to chance.”)

Waisome v. Port Auth., 948 F.2d 1370, 1376 (2d Cir. 1991) (“about one chance in 20 that the explanation for a deviation could be random”)

Ottaviani v. State Univ. of New York at New Paltz, 875 F.2d 365, 372 n.7 (2d Cir. 1989)

Murphy v. General Elec. Co., 245 F. Supp. 2d 459, 467 (N.D.N.Y. 2003) (“less than a 5% probability that age was related to termination by chance”)

Third Circuit

United States v. State of Delaware, 2004 WL 609331, *10 n.27 (D. Del. 2004) (“there is a 5% (or 1 in 20) chance that the relationship observed is purely random”)

Magistrini v. One Hour Martinizing Dry Cleaning, 180 F. Supp. 2d 584, 605 n.26 (D.N.J. 2002) (“only 5% probability that an observed association is due to chance”)

Fifth Circuit

EEOC v. Olson’s Dairy Queens, Inc., 989 F.2d 165, 167 (5th Cir. 1993) (“Dr. Straszheim concluded that the likelihood that [the] observed hiring patterns resulted from truly race-neutral hiring practices was less than one chance in ten thousand.”)

Capaci v. Katz & Besthoff, Inc., 711 F.2d 647, 652 (5th Cir. 1983) (“the highest probability of unbiased hiring was 5.367 × 10-20”), cert. denied, 466 U.S. 927 (1984)

Rivera v. City of Wichita Falls, 665 F.2d 531, 545 n.22 (5th Cir. 1982)(” A variation of two standard deviations would indicate that the probability of the observed outcome occurring purely by chance would be approximately five out of 100; that is, it could be said with a 95% certainty that the outcome was not merely a fluke. Sullivan, Zimmer & Richards, supra n.9 at 74.”)

Vuyanich v. Republic Nat’l Bank, 505 F. Supp. 224, 272 (N.D.Tex. 1980) (“the chances are less than one in 20 that the true coefficient is actually zero”), judgement vacated, 723 F.2d 1195 (5th Cir. 1984).

Rivera v. City of Wichita Falls, 665 F.2d 531, 545 n.22 (5th Cir. 1982) (“the probability of the observed outcome occurring purely by chance would be approximately five out of 100; that is, it could be said with a 95% certainty that the outcome was not merely a fluke”)

Seventh Circuit

Adams v. Ameritech Services, Inc., 231 F.3d 414, 424, 427 (7th Cir. 2000) (“it is extremely unlikely (that is, there is less than a 5% probability) that the disparity is due to chance.”)

Sheehan v. Daily Racing Form, Inc., 104 F.3d 940, 941 (7th Cir. 1997) (“An affidavit by a statistician . . . states that the probability that the retentions . . . are uncorrelated with age is less than 5 percent.”)

Eighth Circuit

Craik v. Minnesota State Univ. Bd., 731 F.2d 465, 476n. 13 (8th Cir. 1984) (“Statistical significance is a measure of the probability that an observed disparity is not due to chance. Baldus & Cole, Statistical Proof of Discrimination § 9.02, at 290 (1980). A finding that a disparity is statistically significant at the 0.05 or 0.01 level means that there is a 5 per cent. or 1 per cent. probability, respectively, that the disparity is due to chance.

Ninth Circuit

Good v. Fluor Daniel Corp., 222 F.Supp. 2d 1236, 1241n.9 (E.D. Wash. 2002)(describing “statistical tools to calculate the probability that the difference seen is caused by random variation”)

D.C. Circuit

National Lime Ass’n v. EPA, 627 F.2d 416,453 (D.C. Cir. 1980)

FEDERAL CIRCUIT

Hodges v. Secretary Dep’t Health & Human Services, 9 F.3d 958, 967 (Fed. Cir. 1993) (Newman, J., dissenting) (“Scientists as well as judges must understand: ‘the reality that the law requires a burden of proof, or confidence level, other than the 95 percent confidence level that is often used by scientists to reject the possibility that chance alone accounted for observed differences’.”)(citing and quoting from the Report of the Carnegie Commission on Science, Technology, and Government, Science and Technology in Judicial Decision Making 28 (1993).


Regulatory Guidance

OSHA’s Guidance for Compliance with Hazard Communication Act:

“Statistical significance is a mathematical determination of the confidence in the outcome of a test. The usual criterion for establishing statistical significance is the p-value (probability value). A statistically significant difference in results is generally indicated by p < 0.05, meaning there is less than a 5% probability that the toxic effects observed were due to chance and were not caused by the chemical. Another way of looking at it is that there is a 95% probability that the effect is real, i.e., the effect seen was the result of the chemical exposure.”

U.S. Dep’t of Labor, Guidance for Hazard Determination for Compliance with the OSHA Hazard Communication Standard (29 CFR § 1910.1200) Section V (July 6, 2007).


Academic Commentators

Lucinda M. Finley, “Guarding the Gate to the Courthouse:  How Trial Judges Are Using Their Evidentiary Screening Role to Remake Tort Causation Rules,” 336 DePaul L. Rev. 335, 348 n. 49 (1999):

“Courts also require that the risk ratio in a study be ‘statistically significant,’ which is a statistical measurement of the likelihood that any detected association has occurred by chance, or is due to the exposure. Tests of statistical significance are intended to guard against what are called ‘Type I’ errors, or falsely ascribing a relationship when there in fact is not one (a false positive).  See SANDERS, supra note 5, at 51. The discipline of epidemiology is inherently conservative in making causal ascriptions, and regards Type I errors as more serious than Type II errors, or falsely assuming no association when in fact there is one (false negative). Thus, epidemiology conventionally requires a 95% level of statistical significance, i.e. that in statistical terms it is 95% likely that the association is due to exposure, rather than to chance. See id. at 50-52; Thompson, supra note 3, at 256-58. Despite courts’ use of statistical significance as an evidentiary screening device, this measurement has nothing to do with causation. It is most reflective of a study’s sample size, the relative rarity of the disease being studied, and the variance in study populations. Thompson, supra note 3, at 256.”

 

Erica Beecher-Monas, Evaluating Scientific Evidence: An Interdisciplinary Framework for Intellectual Due Process 42 n. 30 (2007):

 “‘By rejecting a hypothesis only when the test is statistically significant, we have placed an upper bound, .05, on the chance of rejecting a true hypothesis’. Fienberg et al., p. 22. Another way of explaining this is that it describes the probability that the procedure produced the observed effect by chance.”

Professor Fienberg stated the matter corrrectly, but Beecher-Monas goes on to restate the matter in her own words, erroneously.  Later, she repeats her incorrect interpretation:

“Statistical significance is a statement about the frequency with which a particular finding is likely to arise by chance.19”

Id. at 61 (citing a paper by Sander Greenland, who correctly stated the definition).

Mark G. Haug, “Minimizing Uncertainty in Scientific Evidence,” in Cynthia H. Cwik & Helen E. Witt, eds., Scientific Evidence Review:  Current Issues at the Crossroads of Science, Technology, and the Law – Monograph No. 7, at 87 (2006)

Carl F. Cranor, Regulating Toxic Substances: A Philosophy of Science and the Law at 33-34(Oxford 1993)(One can think of α, β (the chances of type I and type II errors, respectively) and 1- β as measures of the “risk of error” or “standards of proof.”) See also id. at 44, 47, 55, 72-76.

Arnold Barnett, “An Underestimated Threat to Multiple Regression Analyses Used in Job Discrimination Cases, 5 Indus. Rel. L.J. 156, 168 (1982) (“The most common rule is that evidence is compelling if and only if the probability the pattern obtained would have arisen by chance alone does not exceed five percent.”)

David W. Barnes, Statistics as Proof: Fundamentals of Quantitative Evidence 162 (1983)(“Briefly, however, the findings of statistical significance at the P < .05, P < .04, and P < .02 levels indicate that the court can be 95%, 96%, and 98% certain, respectively, that the null hypotheses involved in the specific tests carried out … should be rejected.”)

Wayne Roth-Nelson & Kathey Verdeal, “Risk Evidence in Toxic Torts,” 2 Envt’l Lawyer 405,415-16 (1996) (confusing burden of proof with standard for hypothesis testint; and apparently endorsing the erroneous views given by Judge Newman, dissenting in Hodges). Caveat: Roth-Nelson is now a “forensic” toxicologist, who testifies in civil and criminal trials.

Steven R. Weller, “Book Review: Regulating Toxic Substances: A Philosophy of Science and Law,” 6 Harv. J. L. & Tech. 435, 436, 437-38 (1993) (“only when the statistical evidence gathered from studies shows that it is more than ninety-five percent likely that a test substance causes cancer will the substance be characterized scientifically as carcinogenic … to determine legal causality, the plaintiff need only establish that the probability with which it is true that the substance in question causes cancer is at least fifty percent, rather than the ninety-five percent to prove scientific causality”).

The Carnegie Commission on Science, Technology, and Government, Report on Science and Technology in Judicial Decision Making 28 (1993) (“The reality is that courts often decide cases not on the scientific merits, but on concepts such as burden of proof that operate differently in the legal and scientific realms. Scientists may misperceive these decisions as based on a misunderstanding of the science, when in actuality the decision may simply result from applying a different norm, one that, for the judiciary, is appropriate.  Much, for instance, has been written about ‘junk science’ in the courtroom. But judicial decisions that appear to be based on ‘bad’ science may actually reflect the reality that the law requires a burden of proof, or confidence level, other than the 95 percent confidence level that is often used by scientists to reject the possibility that chance alone accounted for observed differences.”).


Plaintiffs’ Counsel

Steven Rotman, “Don’t Know Much About Epidemiology?” Trial (Sept. 2007) (Author’s question answered in the affirmative:  “P values.  These measure the probability that a reported association between a drug and condition was due to chance.  A P-value of 0.05, which is generally considered the standard for statistical significance, means there is a 5 percent probability that the association was due to chance.”)

Defense Counsel

Bruce R. Parker & Anthony F. Vittoria, “Debunking Junk Science: Techniques for Effective Use of Biostatistics,” 65 Defense Csl. J. 35, 44 (2002) (“a P value of .01 means the researcher can be 99 percent sure that the result was not due to chance”).

Meta-Analysis of Observational Studies in Non-Pharmaceutical Litigations

February 26th, 2012

Yesterday, I posted on several pharmaceutical litigations that have involved meta-analytic studies.   Meta-analytic studies have also figured prominently in non-pharmaceutical product liability litigation, as well as in litigation over videogames, criminal recidivism, and eyewitness testimony.  Some, but not all, of the cases in these other areas of litigation are collected below.  In some cases, the reliability or validity of the meta-analyses were challenged; in some cases, the court fleetingly referred to meta-analyses relied upon the parties.  Some of the courts’ treatments of meta-analysis are woefully inadequate or erroneous.  The failure of the Reference Manual on Scientific Evidence to update its treatment of meta-analysis is telling.  See The Treatment of Meta-Analysis in the Third Edition of the Reference Manual on Scientific Evidence” (Nov. 14, 2011).

 

Abortion (Breast Cancer)

Christ’s Bride Ministries, Inc. v. Southeastern Pennsylvania Transportation Authority, 937 F.Supp. 425 (E.D. Pa. 1996), rev’d, 148 F.3d 242 (3d Cir. 1997)

Asbestos

In re Joint E. & S. Dist. Asbestos Litig., 827 F. Supp. 1014, 1042 (S.D.N.Y. 1993)(“adding a series of positive but statistically insignificant SMRs [standardized mortality ratios] together does not produce a statistically significant pattern”), rev’d, 52 F.3d 1124 (2d Cir. 1995).

In Re Asbestos Litigation, Texas Multi District Litigation Cause No. 2004-03964 (June 30, 2005)(Davidson, J.)(“The Defendants’ response was presented by Dr. Timothy Lash.  I found him to be highly qualified and equally credible.  He largely relied on the report submitted to the Environmental Protection Agency by Berman and Crump (“B&C”).  He found the meta-analysis contained in B&C credible and scientifically based.  B&C has not been published or formally accepted by the EPA, but it does perform a valuable study of the field.  If the question before me was whether B&C is more credible than the Plaintiffs’ studies taken together, my decision might well be different.”)

Jones v. Owens-Corning Fiberglas, 288 N.J. Super. 258, 672 A.2d 230 (1996)

Berger v. Amchem Prods., 818 N.Y.S.2d 754 (2006)

Grenier v. General Motors Corp., 2009 WL 1034487 (Del.Super. 2009)

Benzene

Knight v. Kirby Inland Marine, Inc., 363 F. Supp. 2d 859 (N.D. Miss. 2005)(precluding proffered opinion that benzene caused bladder cancer and lymphoma; noting without elaboration or explanation, that meta-analyses are “of limited value in combining the results of epidemiologic studies based on observation”), aff’d, 482 F.3d 347 (5th Cir. 2007)

Baker v. Chevron USA, Inc., 680 F.Supp. 2d 865 (S.D. Ohio 2010)

Diesel Exhaust Exposure

King v. Burlington Northern Santa Fe Ry. Co., 277 Neb. 203, 762 N.W.2d 24 (2009)

Kennecott Greens Creek Mining Co. v. Mine Safety & Health Admin., 476 F.3d 946 (D.C. Cir. 2007)

Eyewitness Testimony

State of New Jersey v. Henderson, 208 N.J. 208, 27 A.3d 872 (2011)

Valle v. Scribner, 2010 WL 4671466 (C.D. Calif. 2010)

People v. Banks, 16 Misc.3d 929, 842 N.Y.S.2d 313 (2007)

Lead

Palmer Asarco Inc., 510 F.Supp.2d 519 (N.D. Okla. 2007)

PCBs

In re Paoli R.R. Yard PCB Litigation, 916 F.2d 829, 856-57 (3d Cir.1990) (‘‘There is some evidence that half the time you shouldn’t believe meta-analysis, but that does not mean that meta-analyses are necessarily in error. It means that they are, at times, used in circumstances in which they should not be.’’) (internal quotation marks and citations omitted), cert. denied, 499 U.S. 961 (1991)

Repetitive Stress

Allen v. International Business Machines Corp., 1997 U.S. Dist. LEXIS 8016 (D. Del. 1997)

Tobacco

Flue-Cured Tobacco Cooperative Stabilization Corp. v. United States Envt’l Protection Agency, 4 F.Supp.2d 435 (M.D.N.C. 1998), vacated by, 313 F.3d 852 (4th Cir. 2002)

Tocolytics – Medical Malpractice

Hurd v. Yaeger, 2009 WL 2516874 (M.D. Pa. 2009)

Toluene

Black v. Rhone-Poulenc, Inc., 19 F.Supp.2d 592 (S.D.W.Va. 1998)

Video Games (Violent Behavior)

Brown v. Entertainment Merchants Ass’n, ___ U.S.___, 131 S.Ct. 2729 (2011)

Entertainment Software Ass’n v. Blagojevich, 404 F.Supp.2d 1051 (N.D. Ill. 2005)

Entertainment Software Ass’n v. Hatch, 443 F.Supp.2d 1065 (D. Minn. 2006)

Video Software Dealers Ass’n v. Schwarzenegger, 556 F.3d 950 (9th Cir. 2009)

Vinyl Chloride

Taylor v. Airco, 494 F. Supp. 2d 21 (D. Mass. 2007)(permitting opinion testimony that vinyl chloride caused intrahepatic cholangiocarcinoma, without commenting upon the reasonableness of reliance upon the meta-analysis cited)

Welding

Cooley v. Lincoln Electric Co., 693 F.Supp.2d 767 (N.D. Ohio. 2010)

Meta-Analysis in Pharmaceutical Cases

February 25th, 2012

The Third Edition of the Reference Manual on Scientific Evidence attempts to cover a lot of ground to give the federal judiciary guidance on scientific, medical, and statistical, and engineering issues.  It has some successes, and some failures.  One of the major problems in coverage in the new Manual is its inconsistent, sparse, and at points out-dated treatment of meta-analysis.   See The Treatment of Meta-Analysis in the Third Edition of the Reference Manual on Scientific Evidence” (Nov. 14, 2011).

As I have pointed out elsewhere, the gaps and problems in the Manual‘s coverage are not “harmless error,” when some courts have struggled to deal with methodological and evaluative issues in connection with specific meta-analyses.  SeeLearning to Embrace Flawed Evidence – The Avandia MDL’s Daubert Opinion” (Jan. 10, 2011).

Perhaps the reluctance to treat meta-analysis more substantively comes from a perception that the technique for analyzing multiple studies does not come up frequently in litigation.  If so, let me help dispel the notion.  I have collected a partial list of drug and medical device cases that have confronted meta-analysis in one form or another.  In some cases, such as the Avandia MDL, a meta-analysis was a key, or the key, piece of evidence.  In other cases, meta-analysis may have been treated more peripherally.  Still, there are over 20 pharmaceutical cases in the last two decades that have dealt with the statistical techniques involved in meta-analysis.  In another post, I will collect the non-pharmaceutical cases as well.

 

Aredia – Zometa

Deutsch v. Novartis Pharm. Corp., 768 F. Supp. 2d 420 (E.D.N.Y. 2011)

 

Avandia

In re Avandia Marketing, Sales Practices and Product Liability Litigation, 2011 WL 13576, *12 (E.D. Pa. 2011)

Avon Pension Fund v. GlaxoSmithKline PLC, 343 Fed.Appx. 671 (2d Cir. 2009)

 

Baycol

In re Baycol Prods. Litig., 532 F.Supp. 2d 1029 (D. Minn. 2007)

 

Bendectin

Daubert v. Merrell Dow Pharm., 43 F.3d 1311 (9th Cir. 1995) (on remand from Supreme Court)

DePyper v. Navarro, 1995 WL 788828 (Mich.Cir.Ct. 1995)

 

Benzodiazepine

Vinitski v. Adler, 69 Pa. D. & C.4th 78, 2004 WL 2579288 (Phila. Cty. Ct. Common Pleas 2004)

 

Celebrex – Bextra

In re Bextra & Celebrex Marketing Sales Practices & Prod. Liab. Litig., 524 F.Supp.2d 1166 (2007)


E5 (anti-endotoxin monoclonal antibody for gram-negative sepsis)

Warshaw v. Xoma Corp., 74 F.3d 955 (1996)

 

Excedrin vs. Tylenol

McNeil-P.C.C., Inc. v. Bristol-Myers Squibb Co., 938 F.2d 1544 (2d Cir. 1991)

 

Fenfluramine, Phentermine

In re Diet Drugs Prod. Liab. Litig., 2000 WL 1222042 (E.D.Pa. 2000)

 

Fosamax

In re Fosamax Prods. Liab. Litig., 645 F.Supp.2d 164 (S.D.N.Y. 2009)

 

Gadolinium

In re Gadolinium-Based Contrast Agents Prod. Liab. Litig., 2010 WL 1796334 (N.D. Ohio 2010)

 

Neurontin

In re Neurontin Marketing, Sales Pracices, and Products Liab. Litig., 612 F.Supp.2d 116 (D. Mass. 2009)

 

Paxil (SSRI)

Tucker v. Smithkline Beecham Corp., 2010 U.S. Dist. LEXIS 30791 (S.D.Ind. 2010)

 

Prozac (SSRI)

Rimberg v. Eli Lilly & Co., 2009 WL 2208570 (D.N.M.)

 

Seroquel

In re Seroquel Products Liab. Litig., 2009 WL 3806434 *5 (M.D. Fla. 2009)

 

Silicone – Breast Implants

Allison v. McGhan Med. Corp., 184 F.3d 1300, 1315 n. 12 (11th Cir. 1999)(noting, in passing that the district court had found a meta-analysis (the “Kayler study”) unreliable “because it was a re-analysis of other studies that had found no statistical correlation between silicone implants and disease”)

Thimerosal – Vaccine

Salmond v. Sec’y Dep’t of Health & Human Services, 1999 WL 778528 (Fed.Cl. 1999)

Hennessey v. Sec’y Dep’t Health & Human Services, 2009 WL 1709053 (Fed.Cl. 2009)

 

Trasylol

In re Trasylol Prods. Liab. Litig., 2010 WL 1489793 (S.D. Fla. 2010)

 

Vioxx

Merck & Co., Inc. v. Ernst, 296 S.W.3d 81 (Tex. Ct. App. 2009)
Merck & Co., Inc. v. Garza, 347 S.W.3d 256 (Tex. 2011)

 

X-Ray Contrast Media (Nephrotoxicity of Visipaque versus Omnipaque)

Bracco Diagnostics, Inc. v. Amersham Health, Inc., 627 F.Supp.2d 384 (D.N.J. 2009)

Zestril

E.R. Squibb & Sons, Inc. v. Stuart Pharms., 1990 U.S. Dist. LEXIS 15788 (D.N.J. 1990)(Zestril versus Squibb’s competing product,
Capote)

 

Zoloft (SSRI)

Miller v. Pfizer, Inc., 356 F.3d 1326 (10th Cir. 2004)

 

Zymar

Senju Pharmaceutical Co. Ltd. v. Apotex Inc., 2011 WL 6396792 (D.Del. 2011)

 

Zyprexa

In re Zyprexa Products Liab. Litig., 489 F.Supp.2d 230 (E.D.N.Y. 2007) (Weinstein, J.)

When There Is No Risk in Risk Factor

February 20th, 2012

Some of the terminology of statistics and epidemiology is not only confusing, but it is misleading.  Consider the terms “effect size,” “random effects,” and “fixed effect,” which are all used to describe associations even if known to be non-causal.  Biostatisticians and epidemiologists know that the terms are about putative or potential effects, but the sloppy, short-hand nomenclature can be misleading.

Although “risk” has a fairly precise meaning in scientific parlance, the usage for “risk factor” is fuzzy, loose, and imprecise.  Journalists and plaintiffs’ lawyers use “risk factor,” much as they another frequently abused term in their vocabulary:  “link.”  Both “risk factor” and “link” sound as though they are “causes,” or at least as though they have something to do with causation.  The reality is usually otherwise.

The business of exactly what “risk factor” means is puzzling and disturbing.  The phrase seems to have gained currency because it is squishy and without a definite meaning.  Like the use of “link” by journalists, the use of “risk factor” protects the speaker against contradiction, but appears to imply a scientifically valid conclusion.  Plaintiffs’ counsel and witnesses love to throw this phrase around precisely because of its ambiguity.  In journal articles, authors sometimes refer to any exposure inquired about in a case-control study to be a “risk factor,” regardless of the study result.  So a risk factor can be merely an “exposure of interest,” or a possible cause, or a known cause.

The author’s meaning in using the phrase “risk factor” can often be discerned from context.  When an article reports a case-control study, which finds an association with an exposure to some chemical the article will likely report in the discussion section that the study found that chemical to be a risk factor.  The context here makes clear that the chemical was found to be associated with the outcome, and that chance was excluded as a likely explanation because the odds ratio was statistically significant.  The context is equally clear that the authors did not conclude that the chemical was a cause of the outcome because they did not rule out bias or confounding; nor did they do any appropriate analysis to reach a causal conclusion and because their single study would not have justified reaching a causal association.

Sometimes authors qualify “risk factor” with an adjective to give more specific meaning to their usage.  Some of the adjectives used in connection with the phrase include:

– putative, possible, potential, established, well-established, known, certain, causal, and causative

The use of the adjective highlights the absence of a precise meaning for “risk factor,” standing alone.  Adjectives such as “established,” or “known” imply earlier similar findings, which are corroborated by the study at hand.  Unless “causal” is used to modify “risk factor,” however, there is no reason to interpret the unqualified phrase to imply a cause.

Here is how the phrase “risk factor” is described in some noteworthy texts and treatises.

Legal Treatises

Professor David Faigman, and colleagues, with some understatement, note that the term “risk factor is loosely used”:

Risk Factor An aspect of personal behavior or life-style, an environmental exposure, or an inborn or inherited characteristic, which on the basis of epidemiologic evidence is known to be associated with health-related condition(s) considered important to prevent. The term risk factor is rather loosely used, with any of the following meanings:

1. An attribute or exposure that is associated with an increased probability of a specified outcome, such as the occurrence of a disease. Not necessarily a causal factor.

2. An attribute or exposure that increases the probability of occurrence of disease or other specified outcome.

3. A determinant that can be modified by intervention, thereby reducing the probability of occurrence of disease or other specified outcomes.”

David L. Faigman, Michael J. Saks, Joseph Sanders, and Edward Cheng, Modern Scientific Evidence:  The Law and Science of Expert Testimony 301, vol. 1 (2010)(emphasis added).

The Reference Manual on Scientific Evidence (2011) (RMSE3d) does not offer much in the way of meaningful guidance here.  The chapter on statistics in the third edition provides a somewhat circular, and unhelpful definition.  Here is the entry in that chapter’s glossary:

risk factor. See independent variable.

RMSE3d at 295.  If the glossary defined “independent variable” as a simply a quantifiable variable that was being examined for some potential relationship with the outcome, or dependent, variable, the RMSE would have avoided error.  Instead the chapter’s glossary, as well as its text, defines independent variables as “causes,” which begs the question why do a study to determine whether the “independent variable” is even a candidate for a causal factor?  Here is how the statistics chapter’s glossary defines independent variable:

“Independent variables (also called explanatory variables, predictors, or risk factors) represent the causes and potential confounders in a statistical study of causation; the dependent variable represents the effect. ***. “

RMSE3d at 288.  This is surely circular.  Studies of causation are using independent variables that represent causes?  There would be no reason to do the study if we already knew that the independent variables were causes.

The text of the RMSE chapter on statistics propagates the same confusion:

“When investigating a cause-and-effect relationship, the variable that represents the effect is called the dependent variable, because it depends on the causes.  The variables that represent the causes are called independent variables. With a study of smoking and lung cancer, the independent variable would be smoking (e.g., number of cigarettes per day), and the dependent variable would mark the presence or absence of lung cancer. Dependent variables also are called outcome variables or response variables. Synonyms for independent variables are risk factors, predictors, and explanatory variables.”

FMSE3d at 219.  In the text, the identification of causes with risk factors is explicit.  Independent variables are the causes, and a synonym for an independent variable is “risk factor.”  The chapter could have avoided this error simply by the judicious use of “putative,” or “candidate” in front of “causes.”

The chapter on epidemiology exercises more care by using “potential” to modify and qualify the risk factors that are considered in a study:

“In contrast to clinical studies in which potential risk factors can be controlled, epidemiologic investigations generally focus on individuals living in the community, for whom characteristics other than the one of interest, such as diet, exercise, exposure to other environmental agents, and genetic background, may distort a study’s results.”

FMSE3d at 556 (emphasis added).

 

Scientific Texts

Turning our attention to texts on epidemiology written for professionals rather than judges, we find that sometimes the term “risk factor” with a careful awareness of its ambiguity.

Herbert I. Weisberg is a statistician whose firm, Correlation Research Inc., specializes in the applied statistics in legal issues.  Weisberg recently published an interesting book on bias and causation, which is recommended reading for lawyers who litigate claimed health effects.  Weisberg’s book defines “risk factor” as merely an exposure of interest in a study that is looking for associations with a harmful outcome.  He insightfully notes that authors use the phrase “risk factor” and similar phrases to avoid causal language:

“We will often refer to this factor of interest as a risk factor, although the outcome event is not necessarily something undesirable.”

Herbert I. Weisberg, Bias and Causation:  Models and Judgment for Valid Comparisons 27 (2010).

“Causation is discussed elliptically if at all; statisticians typically employ circumlocutions such as ‘independent risk factor’ or ‘explanatory variable’ to avoid causal language.”

Id. at 35.

Risk factor : The risk factor is the exposure of interest in an epidemiological study and often has the connotation that the outcome event is harmful or in some way undesirable.”

Id. at 317.   This last definition is helpful in illustrating a balanced, fair definition that does not conflate risk factor with causation.

*******************

Lemuel A. Moyé is an epidemiologist who testified in pharmaceutical litigation, mostly for plaintiffs.  His text, Statistical Reasoning in Medicine:  The Intuitive P-Value Primer, is in places a helpful source of guidance on key concepts.  Moyé puts no stock in something’s being a risk factor unless studies show a causal relationship, established through a proper analysis.  Accordingly, he uses “risk factor” to signify simply an exposure of interest:

4.2.1 Association versus Causation

An associative relationship between a risk factor and a disease is one in which the two appear in the same patient through mere coincidence. The occurrence of the risk factor does not engender the appearance of the disease.

Causal relationships on the other hand are much stronger. A relationship is causal if the presence of the risk factor in an individual generates the disease. The causative risk factor excites the production of the disease. This causal relationship is tight, containing an embedded directionality in the relationship, i.e., (1) the disease is absence in the patient, (2) the risk factor is introduced, and (3) the risk factor’s presence produces the disease.

The declaration that a relationship is causal has a deeper meaning then the mere statement that a risk factor and disease are associated. This deeper meaning and its implications for healthcare require that the demonstration of a causal relationship rise to a higher standard than just the casual observation of the risk factor and disease’s joint occurrence.

Often limited by logistics and the constraints imposed by ethical research, the epidemiologist commonly cannot carry out experiments that identify the true nature of the risk factor–disease relationship. They have therefore become experts in observational studies. Through skillful use of observational research methods and logical thought, epidemiologists assess the strength of the links between risk factors and disease.”

Lemuel A. Moyé, Statistical Reasoning in Medicine:  The Intuitive P-Value Primer 92 (2d ed. 2006)

***************************

In A Dictionary of Epidemiology, which is put out by the International Epidemiology Association, a range of meanings is acknowledged, although the range is weighted toward causality:

“RISK FACTOR (Syn: risk indicator)

1. An aspect of personal behavior or lifestyle, an environmental exposure, or an inborn or inherited characteristic that, on the basis of scientific evidence, is known to be associated with meaningful health-related condition(s). In the twentieth century multiple cause era, a synonymous with determinant acting at the individual level.

2. An attribute or exposure that is associated with an increased probability of a specified outcome, such as the occurrence of a disease. Not necessarily a causal factor: it may be a risk marker.

3. A determinant that can be modified by intervention, thereby reducing the probability of occurrence of disease or other outcomes. It may be referred to as a modifiable risk factor, and logically must be a cause of the disease.

The term risk factor became popular after its frequent use by T. R. Dawber and others in papers from the Framingham study.346 The pursuit of risk factors has motivated the search for causes of chronic disease over the past half-century. Ambiguities in risk and in risk-related concepts, uncertainties inherent to the concept, and different legitimate meanings across cultures (even if within the same society) must be kept in mind in order to prevent medicalization of life and iatrogenesis.124–128,136,142,240

Miquel Porta, Sander Greenland, John M. Last, eds., A Dictionary of Epidemiology 218-19 (5th ed. 2008).  We might add that the uncertainties inherent in risk concepts should be kept in mind to prevent overcompensation for outcomes not shown to be caused by alleged tortogens.

***************

One introductory text uses “risk factor” as a term to describe the independent variable, while acknowledging that the variable does not become a risk factor until after the study shows an association between factor and the outcome of interest:

“A case-control study is one in which the investigator seeks to establish an association between the presence of a characteristic (a risk factor).”

Sylvia Wassertheil-Smoller, Biostatistics and Epidemiology: A Primer for Health and Biomedical Professionals 104 (3d ed. 2004).  See also id. at 198 (“Here, also, epidemiology plays a central role in identifying risk factors, such as smoking for lung cancer”).  Although it should be clear that much more must happen in order to show a risk factor is causally associated with an outcome, such as lung cancer, it would be helpful to spell this out.  Some texts simply characterize risk factor as associations, not necessarily causal in nature.  Another basic text provides:

“Analytical studies examine an association, i.e. the relationship between a risk factor and a disease in detail and conduct a statistical test of the corresponding hypothesis … .”

Wolfgang Ahrens & Iris Pigeot, eds., Handbook of Epidemiology 18 (2005).  See also id. at 111 (Table describing the reasoning in a case-control study:    “Increased prevalence of risk factor among diseased may indicate a causal relationship.”)(emphasis added).

These texts, both legal and scientific, indicate a wide range of usage and ambiguity for “risk factor.”  There is a tremendous potential for the unscrupulous expert witness, or the uneducated lawyer, to take advantage of this linguistic latitude.  Courts and counsel must be sensitive to the ambiguity and imprecision in usages of “risk factor,” and the mischief that may result.  The Reference Manual on Scientific Evidence needs to sharpen and update its coverage of this and other statistical and epidemiologic issues.