TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Risk and Causation in the Law

March 16th, 2011

In “Risk ≠ Causation,” I discussed the lack of scientific basis for confusing and conflating risk and cause.  For many years, the law was in accord, and plaintiffs could not substitute evidence of risk for evidence of cause in fact.  Some of the case law is collected below.  The law in this area was fairly stable until Judge Weinstein’s important decision in the Agent Orange litigation, where the court confronted the limitations of epidemiologic evidence to support conclusions about specific causation.  Judge Weinstein implicitly recognized that very large relative risks suggest that an individual case was likely related to the antecedent risk, whereas small relative risks leave any inference of specific causation largely speculative, in the absence of some reliable marker of exposure-related causation.  See In re Agent Orange Product Liab. Litig., 597 F. Supp. 740, 785, 817 (E.D.N.Y. 1984)(plaintiffs must prove at least a two-fold increase in rate of disease allegedly caused by the exposure), aff’d, 818 F.2d 145, 150-51 (2d Cir. 1987)(approving district court’s analysis), cert. denied sub nom. Pinkney v. Dow Chemical Co., 484 U.S. 1004 (1988); see also In re “Agent Orange” Prod. Liab. Litig., 611 F. Supp. 1223, 1240 (E.D.N.Y. 1985)(excluding plaintiffs’ expert witnesses), aff’d, 818 F.2d 187 (2d Cir. 1987), cert. denied, 487 U.S. 1234 (1988).
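The two-fold threshold reflects simple arithmetic.  On the usual, and much-criticized, assumption that a relative risk measured in a group applies uniformly to the individual plaintiff (my gloss, not anything stated in the opinions), the probability that the exposure, rather than the background rate, produced a given exposed case is the attributable fraction among the exposed:

\[
P(\text{specific causation}) \approx \frac{RR - 1}{RR}
\]

This quantity exceeds one half only when RR > 2.  A relative risk of 1.5, for example, yields (1.5 − 1)/1.5 ≈ 0.33, well short of “more likely than not.”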

CASE LAW

Krim v. pcOrder.com, Inc., 402 F.3d 489 (5th Cir. 2005)(rejecting plaintiffs’ standing to sue for fraud absent a showing of actual tracing of their shares to the offending public offering; the statistical likelihood that their shares were among those issued in the offering was insufficient to confer standing)

Howard v. Wal-Mart Stores, Inc., 160 F.3d 358, 359–60 (7th Cir. 1998) (Posner, C.J.)

Norman v. National Gypsum Co., 739 F. Supp. 1137, 1138 (E.D. Tenn. 1990)(statistical evidence of risk of lung cancer from asbestos and smoking was insufficient to show individual causation, without evidence of asbestos fibers in the plaintiff’s lung tissue)

Washington v. Armstrong World Industries, 839 F.2d 1121 (5th Cir. 1988)(affirming grant of summary judgment on grounds that statistical correlation between asbestos exposure and disease did not support specific causation)

Thompson v. Merrell Dow Pharm., 229 N.J. Super. 230, 244, 551 A.2d 177, 185 (1988)(epidemiology looks at increased incidences of diseases in populations) 

Johnston v. United States, 597 F. Supp. 374, 412, 425-26 (D. Kan. 1984)(although the probability of attribution increases with the relative risk, an expert must still speculate in making an individual attribution; “a statistical method which shows a greater than 50% probability does not rise to the required level of proof”; plaintiffs’ expert witnesses’ reports were “statistical sophistry,” not medical opinion)

Robinson v. United States, 533 F. Supp. 320, 330 (E.D. Mich. 1982)(finding for the government in a swine flu vaccine case; the court found that the epidemiological evidence offered by the plaintiff was not probative, and that it “would reach the same result if the epidemiological data were entirely excluded since statistical evidence cannot establish cause and effect in an individual”)

Sulesky v. United States, 545 F. Supp. 426, 430 (S.D.W.Va. 1982)(swine flu vaccine GBS case; epidemiological studies alone do not prove or disprove causation in an individual)

Olson v. Federal American Partners, 567 P.2d 710, 712-13 (Wyo. 1977)(affirming judgment for employer in compensation proceedings; cigarette-smoking claimant failed to show that his lung cancer resulted from workplace exposure to radiation, despite alleged synergism between smoking and radiation).

Heckman v. Federal Press Co., 587 F.2d 612, 617 (3d Cir. 1978) (statistical data about a group do not establish facts about an individual).

Crawford v. Industrial Comm’n, 23 Ariz. App. 578, 582-83, 534 P.2d 1077, 1078, 1082-83 (1975)(affirming a denial of compensation to an employee who was exposed to disease-producing conditions both on and off the job; a physician’s testimony, expressed to a reasonable degree of medical certainty, that the working conditions statistically increased the probability of developing a disease does not satisfy the reasonable certainty standard)

Guenther v. Armstrong Rubber Co., 406 F.2d 1315, 1318 (3d Cir. 1969)(holding that defendant cannot be found liable on the basis that it supplied 75-80% of the kind of tire purchased by the plaintiff; any verdict based on this evidence “would at best be a guess”). 

In re King, 352 Mass. 488, 491-92, 225 N.E.2d 900, 902 (1967)(physician expert’s opinion that expressed a mathematical likelihood that claimant’s death was caused by his accident was legally insufficient to support a judgment)

Garner v. Hecla Mining Co., 19 Utah 2d 367, 431 P.2d 794, 796-97 (1967)(affirming denial of compensation to family of a uranium miner who had smoked cigarettes and had died of lung cancer; statistical evidence of synergistically increased risk of lung cancer among uranium miners was insufficient to show causation of decedent’s lung cancer, especially considering his having smoked cigarettes)

Mahoney v. United States, 220 F. Supp. 823, 840-41 (E.D. Tenn. 1963)(Taylor, C.J.)(holding that plaintiffs had failed to prove that their cancers were caused by radiation exposures, on the basis of their statistical, epidemiological proofs), aff’d, 339 F.2d 605 (6th Cir. 1964)(per curiam)

Kamosky v. Owens-Illinois Co., 89 F. Supp. 561, 561-62 (M.D. Pa. 1950)(directing verdict in favor of defendant; statistical likelihood that defendant manufactured the bottle that injured plaintiff was insufficient to satisfy plaintiff’s burden of proof)

Sargent v. Massachusetts Accident Co., 307 Mass. 246, 250 (1940)(“It has been held not enough that mathematically the chances somewhat favor a proposition to be proved; for example, the fact that colored automobiles made in the current year outnumber black ones would not warrant a finding that an undescribed automobile of the current year is colored and not black, nor would the fact that only a minority of men die of cancer warrant a finding that a particular man did not die of cancer. The weight or preponderance of the evidence is its power to convince the tribunal which has the determination of the fact, of the actual truth of the proposition to be proved. After the evidence has been weighed, that proposition is proved by a preponderance of the evidence if it is made to appear more likely or probable in the sense that actual belief in its truth, derived from the evidence, exists in the mind or minds of the tribunal notwithstanding any doubts that may linger there.”)

Day v. Boston & Maine R.R., 96 Me. 207, 217–218, 52 A. 771, 774 (1902) (“Quantitative probability, however, is only the greater chance.  It is not proof, nor even probative evidence, of the proposition to be proved.  That in one throw of dice, there is a quantitative probability, or greater chance, that a less number of spots than sixes will fall uppermost is no evidence whatever that in a given throw such was the actual result.  Without something more, the actual result of the throw would still be utterly unknown.  The slightest real evidence would outweigh all the probability otherwise.”)

LEGAL COMMENTARY

Federal Judicial Center, Reference Manual on Scientific Evidence 337 (2d ed. 2000)( “A final caveat is that employing the results of group-based studies of risk to make a causal determination for an individual plaintiff is beyond the limits of epidemiology. Nevertheless, a substantial body of legal precedent has developed that addresses the use of epidemiologic evidence to prove causation for an individual litigant through probabilistic means, and these cases are discussed later in this reference guide.”)

Special Committee on Science and Law, “An Analysis of Proposed Changes in Substantive and Procedural Law in Response to Perceived Difficulties in Establishing Whether or Not Causation Exists in Mass Toxic Tort Litigation,” The Record of the Ass’n of the Bar of the City of N.Y. 905, 916, 920 (1986)(epidemiologic evidence cannot answer the causation issue, with “any certainty,” in the case of an individual claimant whose disease occurs “naturally” in unexposed people).

Dore, “A Proposed Standard for Evaluating the Use of Epidemiological Evidence in Toxic Tort and Other Personal Injury Cases,” 28 Howard L.J. 677, 692 (1985)(individual causation questions are beyond the competence of epidemiologists and the discipline of epidemiology)

E. Cleary, et al., eds., McCormick on Evidence § 209, at 646 & n.1 (3d ed. 1984)( “In and of itself, statistical analysis can never prove that some factor A causes some outcome B.  It can show that in a sample of observations, occurrences of B tend to be associated with those of A, and it can suggest that this statistical association probably would be observed for repeated samples.  But the association, even though “statistically significant,” need not be causal.  For instance, a third factor C could be causing both A and B.  Thus, over some time period, there may be a correlation between the number of people smoking cigarettes and the number of certain crimes committed, but if told that the population was growing rapidly during this time, no one would think that this proves that smoking causes crime.  Experimental design and some forms of statistical analysis can help control for the effects of other variables, but even these merely help formulate, confirm or refute theories about causal relationships.”)

Cong. Research Serv., Library of Cong., Report to the Subcommittee on Science, Research and Technology, “Review of Risk Assessment Methodologies,” 95th Cong., 1st Sess. 11 (Mar. 1983)(recognizing that epidemiologic predictions of disease incidence among groups can establish statistical associations, but cannot show specific causation)

Solomons, “Workers’ Compensation for Occupational Disease Victims:  Federal Standards and Threshold Problems,” 41 Alb. L. Rev. 195, 201 (1977)(suggesting that an epidemiologic showing of a high probability of employment-relatedness of lung cancer in an asbestos insulation worker, for example, would probably not establish causation in an individual claim)

Estep, “Radiation Injuries and Statistics:  The Need for a New Approach to Injury Litigation,” 59 Mich. L. Rev. 259, 268-69 (1960)

The Selikoff – Castleman Conspiracy

March 13th, 2011

In previous posts about the late Irving Selikoff, I have discussed his iconic status as a scientist who battled corporate evil, to make the workplace and the environment safe from asbestos.  The truth is much murkier than this fabled narrative.

Selikoff and his cadre fueled cancerphobia, billions of dollars spent on asbestos abatement, irrational regulations that applied equally to all asbestos mineral types, demonization of legitimate industrial uses of chrysotile, and ultimately the wasting of American industry by asbestos litigation.

His conduct in these activities calls for greater scrutiny than has been accorded by journalists and historians.  The difficult case of Irving Selikoff is an instructive parable of the dangers of mixed motives and scientific enthusiasms.

Some might think that we should let bygones be bygones.  Perhaps, but that attitude did not spare the memory of Sir Richard Doll.  His death brought out the daggers and the yutzballs.  See, e.g., Samuel Epstein, “Richard Doll, An Epidemiologist Gone Awry” (visited on March 13, 2011); Sarah Boseley, “Renowned cancer scientist was paid by chemical firm for 20 years,” The Guardian (Dec. 8, 2006).

Now, imagine if a tobacco industry consultant wrote to a scientist and told him that plaintiffs were looking for important data to help them in their lawsuits, and that it was essential that these claimants not get what they were looking for.  In many courtrooms, such correspondence would be prima facie evidence of a conspiracy.  In the public forum, such evidence would tarnish the reputation of the scientist who engaged with the correspondent about suppressing evidence and refusing to cooperate with lawful discovery.

Now consider the case of Barry Castleman, consulting and testifying witness for the asbestos plaintiff industry.  Hired gun Castleman appears to have written Dr Selikoff in 1979, in the early days of the asbestos litigation, and urged him not to cooperate with lawful efforts of Johns-Manville to obtain evidence of the insulators’ union’s knowledge of the hazards of asbestos.  I found the memorandum from Castleman to Selikoff, “Defense Attorneys’ Efforts to Use Background Files of Selikoff-Hammond Studies to Avert Liability,” dated November 5, 1979, in a document archive at the University of California, San Francisco, The Legacy Tobacco Documents Library.  The document is now also available at Scribd.

Because of its provenance, I cannot be absolutely sure of the document’s authenticity, but it certainly has the ring of truth.  It was uploaded to the UCSF archive over a decade ago.  Presumably, if it were false, Castleman, or one of Selikoff’s intellectual heirs, would have sued for its removal.  Perhaps someone can help me determine whether Barry Castleman, in his many testimonial adventures, has ever been confronted with this document.

Here is the text of the Castleman memorandum:

Memorandum from Barry Castleman to Irving Selikoff

November 5, 1979

Subject : Defense Attorneys’ Efforts to Use Background Files of Selikoff-Hammond Studies to Avert Liability

Ron Motley informs me that the industry lawyers are hoping to get cases thrown out of court by showing that the insulators themselves knew about their job risks.  The defendants hope to obtain the questionnaire materials used by you and Dr. Hammond, in the expectation of finding reference to when the men said they first became aware of the dangers of their trade. Ron and other plaintiffs lawyers are afraid that some of the men would have answered with 20-20 hindsight, recalling vaguely that “I heard something back in the early 40’s”.

Discovery of such statements in writing, even though made without much care and without any knowledge that rights to compensation might be jeopardized, without any consultation with their attorneys, could throw out individual claims; further,  a significant number of such statements pre-1964 would hurt the state of the art case for all the plaintiffs.

I don’t know what kinds of things might be found in your files and those of ACS (Dr. Hammond) but it strikes me as most important to hold these files confidential and resist efforts to get them released to the defendants. Among other things, the release of such materials could impair your ability to obtain the cooperation of the insulation workers and other trade unions who desparately [sic] need your services. From the urgency of Ron’s efforts to find me to raise this issue, I gather that defense efforts to gain access to your files is an imminent and serious possibility.

I will try to call in a week or so with more information, and to discuss this matter directly with you.

#######################################

Attached are the latest discoveries and notes thereon from Vorwald’s files and the Industrial Health Foundation. We now have the correspondence to shav [sic] that Ken Smith and Ivan Sabourin edited the Braun-Truan study prior to publication.  The exchange on S-M Waukegan worker Dominic Bertogliat shows that J-M was aware that workers exposed only to the general in-plant atmosphere were in some cases developing severe asbestosis (1948).

What is interesting is that there is no reply memorandum from Dr Selikoff, pointing out: “Mr. Castleman, that would be wrong; all parties are entitled to the evidence, and I am not here to help insulators avoid the legal consequences of their own negligence, if negligence it be.”  I would like to think that there is such a reply memorandum in the Selikoff archives, but personally, I doubt it.  Perhaps someone who has control over the archives would come forward with the missing documents.

The Poisson Distribution

March 12th, 2011

If Ms. Valerie Schremp Hahn had not reported the story in the St. Louis Post-Dispatch, the story would have had to be invented by a tort reformer, or perhaps by a masochistic torts law professor.

Mr. Poisson is a murderer; actually, he was convicted of involuntary manslaughter, as a result of his crime.  He stole the tip jar, containing less than $5.00, from a Starbucks coffee shop in Crestwood, Missouri, a suburb of St. Louis.  A paying customer, Roger Kreutz, saw this crime unfold and, yearning for a Darwin award, gave chase to the purloining Poisson.  A struggle ensued, but Poisson managed to get into his get-away car and backed into Mr. Kreutz.  Mr. Kreutz died shortly afterwards from the mayhem.  See Hahn, “Estate of man sues Starbucks over death” (March 9, 2011).

Having served one year in prison, Mr. Poisson is now a free man.  The surviving Kreutz family has focused their outrage not at the murderous thief, but at Starbucks for the grievous misstep of having left the tip jar out on the counter without a warning.

Lest you think that the Kreutz family is a narrow-minded, money-grubbing lot, consider this.  Last year, the Kreutzes invited Poisson to a reunion at the Crestwood Starbucks, to shower him with forgiveness, and to help with the planting of a memorial tree for Roger.  Ms. Hahn’s article includes a photograph of Mr. Poisson, with a sinister smile, spreading the ashes of his victim on the ground around a young tree.  Presumably, Mr. Poisson had enough sense not to go into the nearby Starbucks shop, where he might have been tempted once again by the tip jar, or perhaps by some old woman’s handbag.

And lest you think that the Kreutz family is a forgiving lot, consider this.  The Kreutzes have filed a wrongful death suit against Starbucks.  Roger’s death, they say, was directly and proximately caused by leaving the tip jar on the counter, unanchored and without a warning to innocent bystanders not to chase anyone who might steal the tips.  Mr. Poisson, who had received absolution for his murderous deed from the Kreutzes, was not named in the suit.

The story is almost too sick to be true.  The story is almost sick enough to be a law professor’s torts examination problem. 

What are Starbucks’ legal options?  Until it has a chance to appeal to the court of common sense, Starbucks might consider impleading Mr. Poisson, the agent of death in this case.  Perhaps it ought to sue the Kreutzes for having caused emotional distress by their intentional, wanton trespass arising from spreading Roger Kreutz’s ashes on the ground outside the coffee shop.  Finally, perhaps a subsequent, remedial measure is in order:  post Mr. Poisson’s picture on the walls of all Starbucks stores, to identify him and his previous crime, and to caution patrons not to chase him if he robs the store, lest they end up like Roger.

This lawsuit will be worth watching.

The Kreutzes’ misdirected lawsuit is hardly unique in the annals of American law.  Consider all the lawsuits directed at companies that supply products and materials to employers, who in turn fail to control and supervise workplace conditions.  When employees are harmed, they cannot sue their employers because of the preclusive effect of most Workers’ Compensation Acts.  The result is that injured workers choose to sue the remote suppliers, who cannot control and supervise the workplace.  Why?  Because you can always sue.  Sadly, this sort of thing happens all the time.

Risk ≠ Causation

March 12th, 2011

Evidence of risk is not evidence of causation.  It never has been; it never will be. Risk and causation are distinct concepts.  Processes, events, or exposures may be risks; that is, they may be capable of causing an outcome of interest.  Risk, however, is an ex ante concept.  We can speak of a risk only before the outcome of interest has occurred.  After its occurrence, we are interested in what caused the outcome.

Before the tremendous development of epidemiology in the decades after World War II, most negligence and products liability cases involved mechanistic conceptions of causation.  Juries and courts considered claims of causation that conceptually were framed in the manner of billiard balls hitting one another until the final billiard ball of interest went into the pocket.  Litigants and courts did not need to consider statistical evidence when deciding whether a saw dismembered a plaintiff, or even whether chronic asbestos exposure caused inflammation and scarring in the lungs of workers.  In some instances, judicial efforts to cast causation as a mechanistic process smack of quackery.  Claims that blunt trauma caused malignant tumors at the site of the trauma, within days or weeks of the impact, come to mind as an example of the magical thinking that plagued courts and juries in an era that was short on scientific gatekeeping, and long on deferring to clinical judgment empty of meaningful scientific support.  See, e.g., Baker v. DeRosa, 413 Pa. 164, 196 A.2d 387 (1964)(holding that the question whether a car accident caused a tumor was for the jury).

The advent of epidemiologic evidence introduced an entirely different class of claims, based upon stochastic concepts of causation.  The exposure, event, or process that was a putative cause had a probabilistic element to its operation.  The putative cause made its contribution to the outcome through a random process, which changed the frequency of the harmful outcome in those who encountered the exposure.  In addition, the outcome that resulted from the “putative cause” was frequently indistinguishable from outcomes that arose spontaneously, from other causes in the environment, or from normal human aging.  Discerning which risks (or “putative causes”) operated in a given case of chronic human disease (such as cancer, cardiovascular disease, or autoimmune disease) became a key issue for courts and litigants’ expert witnesses.  The black box of epidemiology, however, sheds little or no light on the issue, and no other light source is available.

Today, expert witnesses, typically for plaintiffs, equate risk with causation.  Because risk is an ex ante concept, the inference from risk to causation is problematic.  In rare instances, the risk is absolute under the circumstances of the plaintiff’s manifestation, such that the outcome can be tied to the exposure that created the risk.  In most cases, however, there will have been other competing risks, which alone could have operated to produce the outcome of which the plaintiff complains.  In toxic tort litigation, we frequently see a multiplicity of pre-existing risks for a chronic disease that is prevalent in the entire population.  When claimants attempt to show causation for such outcomes by epidemiologic evidence, the inference of causation from a particular prior risk is typically little more than a guess.

One well-known epidemiologist explained the limits of inferences with respect to stochastic causation:

“An elementary but essential principal that epidemiologists must keep in mind is that a person may be exposed to an agent and then develop disease without there being any causal connection between exposure and disease.”   ****

“In a courtroom, experts are asked to opine whether the disease of a given patient has been caused by a specific exposure.  This approach of assigning causation in a single person is radically different from the epidemiologic approach, which does not attempt to attribute causation in any individual instance.  Rather, the epidemiologic approach is to evaluate the proposition that the exposure is a cause of the disease in a theoretical sense, rather than in a specific person.”

Kenneth Rothman, Epidemiology: An Introduction 44 (Oxford 2002)(emphasis added). 

Another epidemiologist, who wrote the epidemiology chapter in the Federal Judicial Center’s Reference Manual on Scientific Evidence, put the matter thus:

“Epidemiology answers questions about groups, whereas the court often requires information about individuals.”

Leon Gordis, Epidemiology 3d ed. (Philadelphia 2004)(emphasis in original).  Accord G. Friedman, Primer of Epidemiology 2 (2d ed. 1980)(epidemiologic studies address causes of disease in populations, not causation in individuals); Sander Greenland, “Relation of the Probability of Causation to Relative Risk and Doubling Dose:  A Methodologic Error that Has Become a Social Problem,” 89 Am. J. Pub. Health 1166, 1168 (1999)(“[a]ll epidemiologic measures (such as rate ratios and rate fractions) reflect only the net impact of exposure on a population”); Joseph V. Rodricks & Susan H. Rieth, “Toxicological Risk Assessment in the Courtroom:  Are Available Methodologies Suitable for Evaluating Toxic Tort and Product Liability Claims?” 27 Regulatory Toxicol. & Pharmacol. 21, 24-25 (1998)(noting that a population risk applies to individuals only if all persons within the population are the same with respect to the influence of the risk on outcome).

These cautionary notes are important reminders of the limits of epidemiologic method.  What these authors miss is that there may be no other principled way to connect one pre-existing risk, among several, to an outcome that is claimed to be tortious.  As the young, laconic Wittgenstein wrote: 

“Wovon man nicht sprechen kann, darüber muß man schweigen.” 

L. Wittgenstein, Tractatus Logico-Philosophicus, Proposition 7 (1921)(translated by Ogden as “Whereof one cannot speak, thereof one must be silent”).  Unfortunately, expert witnesses in legal proceedings sometimes do not feel the normative force of Wittgenstein’s Proposition 7, and they speak without restraint.  As a contemporary philosopher explained in a more accessible idiom,

“Bullshit is unavoidable whenever circumstances require someone to talk without knowing what he is talking about.  Thus the production of bullshit is stimulated whenever a person’s obligations or opportunities to speak about some topic exceed his knowledge of the facts that are relevant to that topic.”

Harry Frankfurt, On Bullshit 63 (Princeton University Press 2005).

Judicial Innumeracy and the MDL Process

February 26th, 2011

In writing previously about the Avandia MDL Court’s handling of the defendants’ Daubert motion, I noted the trial court’s erroneous interpretation of statistical evidence.  See “Learning to Embrace Flawed Evidence – The Avandia MDL’s Daubert Opinion” (Jan. 10, 2011).  In fact, the Avandia court badly misinterpreted the meaning of a p-value, a basic concept in statistics:

“The DREAM and ADOPT studies were designed to study the impact of Avandia on prediabetics and newly diagnosed diabetics. Even in these relatively low-risk groups, there was a trend towards an adverse outcome for Avandia users (e.g., in DREAM, the p-value was .08, which means that there is a 92% likelihood that the difference between the two groups was not the result of mere chance).”

In re Avandia Marketing, Sales Practices and Product Liability Litigation, 2011 WL 13576, *12 (E.D. Pa. 2011) (internal citation omitted).  The Avandia MDL court was not, however, the first to commit this howler.  Professor David Kaye collected examples of statistical blunders from published cases in a 1986 law review article, and again in his chapter on statistics in the Federal Judicial Center’s Reference Manual on Scientific Evidence, where he compiled a list of erroneous interpretations:

United States v. Georgia Power Co., 474 F.2d 906, 915 (5th Cir. 1973)

National Lime Ass’n v. EPA, 627 F.2d 416, 453 (D.C. Cir. 1980)

Rivera v. City of Wichita Falls, 665 F.2d 531, 545 n.22 (5th Cir. 1982) (“A variation of two standard deviations would indicate that the probability of the observed outcome occurring purely by chance would be approximately five out of 100; that is, it could be said with a 95% certainty that the outcome was not merely a fluke.”);

Vuyanich v. Republic Nat’l Bank, 505 F. Supp. 224, 272 (N.D. Tex. 1980) (“[I]f a 5% level of significance is used, a sufficiently large t-statistic for the coefficient indicates that the chances are less than one in 20 that the true coefficient is actually zero.”), vacated, 723 F.2d 1195 (5th Cir. 1984)

Craik v. Minnesota State Univ. Bd., 731 F.2d 465, 476 n.13 (8th Cir. 1984)(“[a] finding that a disparity is statistically significant at the 0.05 or 0.01 level means that there is a 5 per cent or 1 per cent probability, respectively, that the disparity is due to chance.”).  See also id. at 510 (Swygert, J., dissenting)(stating that coefficients were statistically significant at the 1% level, allowing him to say that “we can be 99% confident that each was different from zero.”)

Sheehan v. Daily Racing Form, Inc., 104 F.3d 940, 941 (7th Cir. 1997) (“An affidavit by a statistician . . . states that the probability that the retentions . . . are uncorrelated with age is less than 5 percent.”)

Waisome v. Port Authority, 948 F.2d 1370, 1376 (2d Cir. 1991) (“Social scientists consider a finding of two standard deviations significant, meaning there is about one chance in 20 that the explanation for a deviation could be random . . . .”)

David H. Kaye & David A. Freedman, “Reference Guide on Statistics,” in Reference Manual on Scientific Evidence 83, 122-24 (2d ed. 2000); David H. Kaye, “Is Proof of Statistical Significance Relevant?” 61 Wash. L. Rev. 1333, 1347 (1986)(pointing out that before 1970, there were virtually no references to “statistical significance” or p-values in reported state or federal cases).

Notwithstanding the educational efforts of the Federal Judicial Center, the innumeracy continues, and with the ascent of the MDL model for addressing mass torts, many recent howlers have come from trial judges given responsibility for overseeing the pretrial coordination of thousands of lawsuits.  In addition to the Avandia MDL court’s misstatement, here are some other recent erroneous statements that can be added to Professor Kaye’s lists:

“Scientific convention defines statistical significance as “P ≤ .05,” i.e., no more than one chance in twenty of a finding a false association due to sampling error.  Plaintiffs, however, need only prove that causation is more-probable-than-not.”

In re Ephedra Prods. Liab. Litig., 393 F.Supp.2d 181, 193 (S.D.N.Y. 2005)(confusing the standard for Type I statistical error with the burden of proof).

“More-probable-than-not might be likened to P < .5, so that preponderance of the evidence is nearly ten times less significant (whatever that might mean) than the scientific standard.”

Id. at 193 n.9 (same). 
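The two standards are not even probabilities of the same thing; they condition on different events.  In symbols (my notation, not the court’s):

\[
\text{burden of proof:}\; P(\text{liability facts} \mid \text{evidence}) > 0.5
\qquad \text{versus} \qquad
\alpha = P(\text{reject } H_0 \mid H_0 \text{ true}) \leq 0.05
\]

The first is a posterior probability of a fact given the evidence; the second is a false-alarm rate given that no real effect exists.  Neither converts into the other without additional assumptions, so the Ephedra court’s “nearly ten times” comparison is a category mistake.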

In the Phenylpropanolamine litigation, the error was even more clearly stated, for both p-values and confidence intervals:

“P-values measure the probability that the reported association was due to chance… .”

“… while confidence intervals indicate the range of values within which the true odds ratio is likely to fall.”

In re Phenylpropanolamine Products Liab. Litig., 289 F. Supp. 2d 1230, 1236 n.1 (W.D. Wash. 2003)
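The same transposition drives each of these misstatements.  A p-value is computed on the assumption that chance alone (the null hypothesis) is operating; it is the probability of data at least as extreme as those observed, given the null, not the probability that the null is true given the data.  A minimal simulation makes the point; the event rates, sample sizes, and function names below are mine, invented purely for illustration:

```python
import math
import random

random.seed(1)


def two_sided_p(x1, n1, x2, n2):
    """Two-sided p-value from a two-proportion z-test (normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = abs(p1 - p2) / se
    # P(|Z| >= z) under the standard normal distribution
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))


# 5,000 simulated two-arm trials in which both arms share the same true
# event rate (10%), so the null hypothesis is true by construction.
n_trials, n_per_arm, alpha = 5_000, 500, 0.05
false_alarms = 0
for _ in range(n_trials):
    x1 = sum(random.random() < 0.10 for _ in range(n_per_arm))
    x2 = sum(random.random() < 0.10 for _ in range(n_per_arm))
    if two_sided_p(x1, n_per_arm, x2, n_per_arm) < alpha:
        false_alarms += 1

print(f"Share of null trials with p < .05: {false_alarms / n_trials:.3f}")  # ~0.05
```

About five percent of the simulated trials come out “statistically significant” even though chance is, by construction, the only explanation.  A p-value of .08 therefore cannot mean “a 92% likelihood that the difference was not the result of mere chance”; it means only that differences at least as large as the one observed would arise roughly 8% of the time if chance alone were at work.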

These misstatements raise important questions about judicial competency for gatekeeping, the selection, education, and training of judges, the assignment of MDL cases to individual trial judges, and the aggregation of Rule 702 motions to a trial judge for a single, one-time decision that will control hundreds if not thousands of cases.

Recently, a student published a bold note that argued for the dismantling of judicial gatekeeping.  Note, “Admitting Doubt: A New Standard for Scientific Evidence,” 123 Harvard Law Review 2021 (2010).  With all the naiveté of someone who has never tried a case to a jury, the student argued that juries are at least as good as judges, if not better, at handling technical questions.  The empirical evidence for such a suggestion is slim, and it ignores the geographic variability in jury pools.  The above instances of erroneous statistical interpretations might seem to support the student’s note, but that argument would miss two important points:

  • these errors are put on display for all to see, and for commentators to note and correct, whereas jury decisions obscure their mistakes; and
  • judges can be singled out for their technical competencies, and given appropriate assignments (which hardly ever happens at present), and judges can be required to partake in professional continuing legal education, which might well include training in technical areas to improve their decision making.

The Federal Judicial Center, and its state court counterparts, have work to do.  Lawyers also have an obligation to help courts get difficult, technical issues right.  Finally, courts, lawyers, and commentators need to rethink how the so-called Daubert process works, and does not work, especially in the high-stakes arena of multi-district litigation.

Can Daubert Survive the Multi-District Litigation Process?

February 23rd, 2011

The so-called Daubert process, by which each side in a lawsuit may challenge and seek preclusion of the other side’s expert witnesses, arose in the setting of common-law judges making rulings in individual cases.  Indeed, the Daubert case itself, although one of many cases involving claims of birth defects allegedly caused by Bendectin, was an individual case. 

In the silicone gel breast implant (SGBI) litigation, the process evolved over time, with decisions from different judges, each of whom saw the evidence differently.  The different judges brought different insights and aptitudes to bear on the evidence, and the expert witnesses themselves may have varied in their approaches and reliance upon different studies.  This incrementalist approach, in the context of the SGBI litigation, worked to the benefit of the defendants, in part because their counsel learned about the fraudulent evidence underlying certain studies, and about serious lapses in the standard of research care on the part of some investigators whose studies were prominently relied upon by plaintiffs’ counsel.  In the case of one dubious study, one of its authors, Marc Lappe, a prominent expert witness for plaintiffs, withdrew his support from the conclusions advanced in the study.

Early decisions in the SGBI cases (shortly after the Supreme Court’s decision in Daubert, in 1993) denied the defendants’ applications to preclude plaintiffs’ expert witnesses’ opinion testimony.  Later decisions converged upon the unavoidable truth that the case for SGBIs causing atypical or typical connective tissue diseases was a house of cards, built mostly with jokers.  If the Daubert process had been censored after the first hearing, the result would have been to deem all the breast implant cases trial and jury worthy, to the detriment of the judicial process, to the public’s interest in knowing the truth about silicone biomaterials, to the defendants’ reputational and financial interests, and to the interests of the claimants who had been manipulated by their counsel and support group leaders.

The evolutionary approach taken in the SGBI litigation was indirectly supported by the late Judge Sam Pointer, who presided over the SGBI federal multi-district litigation (MDL).  Judge Pointer strongly believed that the decision to exclude expert testimony belonged to individual trial judges, who received cases on remand from MDL 926, when the cases were ready for trial.  Judge Pointer ruled on expert witness challenges in cases set for trial before him, but he was not terribly enthusiastic about the Daubert process, and denied most of the motions in a fairly perfunctory fashion.  Because of this procedural approach, Judge Pointer’s laissez-faire attitude towards expert witness testimony did not interfere with the evolutionary process that allowed other courts to see through the dense fog in the plaintiffs’ case.

Since MDL 926, the MDL process has absorbed the ritual of each side’s challenging the other’s expert witnesses, and MDL judges view their role as including hearing and deciding all pre-trial Daubert challenges.  It has been over 17 years since the Supreme Court decided Daubert, and in that time, the MDL model, both state and federal, has become dominant.  As a result, the Daubert process has often been truncated and abridged to a single motion, decided at one time, by one judge.  The results of this abridgement have not always been happy for ensuring reliable and accurate gatekeeping.

The MDL process appears to have broken the promise of Rule 702 in many cases.  By putting the first and only Rule 702 gatekeeping decision in the hands of a single judge, charged with making pre-trial rulings in the entire MDL, the MDL process has sapped the gatekeeping process of its dynamic, evolutionary character.  No longer can litigants and judges learn from previous efforts, as well as from commentary by scientists and legal scholars on the prior outcomes.  For judges who lack scientific and analytical acumen, this isolation from the scientific community works to the detriment of the entire process.

To be sure, the MDL process for deciding Rule 702 motions is efficient.  In many cases, expensive motions, briefings, and hearings are reduced to one event.  The incorporation of expert challenges into an MDL may improve fairness in some instances by allowing well-qualified plaintiffs’ counsel to wrest control of the process from unprepared plaintiffs’ counsel who are determined to control their individual cases.  Defendants may embrace the MDL process because it permits a single, unified document production and discovery schedule of corporate executives.  Perhaps defendants see the gains from the MDL process as sufficiently important to forgo the benefit of a fuller opportunity to litigate the expert witness issues.  Whatever can be said in favor of using the MDL forum to resolve expert witness challenges, it is clear that MDL procedures limit the parties’ ability to refine their challenges over time, and to incorporate new evidence and discovery gained after the first challenges are resolved.  In the SGBI litigation, for instance, the defendants learned of significant scientific malfeasance and misfeasance that undermined key studies relied upon by plaintiffs, including some studies done by apparently neutral, well-credentialed scientists.  The omnibus MDL Daubert motion prevents either side, or the judiciary, from learning from the first and only motion.

Another example of an evidentiary display that has changed over time comes from the asbestos litigation, where plaintiffs continue to claim that asbestos causes gastrointestinal cancer.  The first such cases were pressed by plaintiffs in the early 1980s, with the support of Dr Selikoff and his cadre of testifying physicians and scientists.  A few years ago, however, the Institute of Medicine convened a committee to review non-pulmonary cancers and asbestos, and concluded that the studies, now accumulated over 35 years since Dr Selikoff’s ipse dixit, do not support a conclusion that asbestos causes colorectal cancer.  Institute of Medicine of the National Academies, Asbestos: Selected Health Effects (2006).

Unfortunately, many trial judges view the admissibility and sufficiency of causation opinions on asbestos and colorectal cancer as “grandfathered” by virtue of the way business has been conducted in trial courts for over three decades.  Still, defendants have gained the opportunity to invoke an important systematic review, which shows that the available evidence does not reliably support the conclusion urged by plaintiffs’ expert witnesses. 

The current approach of using the MDL as the vehicle for resolving expert witness challenges raises serious questions about how MDLs are assigned to judges, and whether those judges have the analytical or quantitative skills to resolve Daubert challenges.  Assigning an MDL to a judge who will have to rule on the admissibility of expert witness opinion testimony she or he does not understand does not inspire confidence in the judicial process.  At least under the ad hoc approach employed in the SGBI litigation, the parties could size up their trial judge, and decide whether to forgo their expert challenges based upon that assessment.  Furthermore, an anomalous outcome could be corrected over a series of decisions.  The MDL process, on the other hand, frequently places the Rule 702 decision in the discretion of a single judge.  The selection criteria for that sole decision maker become critical.  As equity in days of old varied with the size of the Chancellor’s foot, today’s scientific equity under Rule 702 may vary with the accuracy of the trial judge’s slide rule.

Toxic Litigation and Toxic Torts

February 2nd, 2011

Christopher J. Robinette, at TortsProf Blog, thoughtfully provided a link to a new paper, in press, by Professor Robert Rabin.  The paper is a short romp through the last few decades of toxic tort law.  Robert L. Rabin, “Harms from Exposure to Toxic Substances:  The Limits of Liability Law,” 38 Pepperdine L. Rev. 101 (2011), available at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1747907

Having lived and practiced law through the romp, I thought it would make for an interesting read.

Professor Rabin describes the growth and contraction of judicial activism in response to popular enthusiasm for environmental and products liability.  As part of his historical review, Rabin describes the growth of strict products liability, the advent of medical monitoring, fear, and increased-risk damages, and the application of class action procedures to so-called toxic torts.

The story is familiar, but here it is told with enthusiasm for the very idea of liability.  Although I may be misreading the piece, Rabin seems to share the popular enthusiasm for liability, and regrets missed opportunities to impose even greater liability.  For instance, Rabin tells us that the “signals” sent by mass tort cases involving asbestos, Agent Orange, and Dalkon Shield, were “encouraging,” while the Bendectin litigation was one of the “notable litigation failures.”  Id. at 105.

The reader is challenged to imagine exactly what Professor Rabin has in mind by his scorecard of successes and failures.  Why, for instance, would anyone consider the asbestos litigation encouraging?  Asbestos litigation can take credit for dozens of bankruptcies, with the erosion of the country’s industrial manufacturing capability.  Jobs have been lost.  The asbestos litigation can take further credit for:

  • disruption and destruction of insurance markets,
  • procedural innovations, such as collusive class actions that sold out future claimants,
  • collusive bankruptcies that favored powerfully positioned plaintiffs’ law firms,
  • egregious consolidations,
  • magic jurisdictions known for “easy law,” and
  • special rules for asbestos cases that deprived defendants of their opportunity to prepare defenses.

Of course, procedural peculiarities of asbestos litigation pale in comparison with the substantive abuses:

  • fraudulent product identification,
  • fraudulent diagnoses,
  • unlawful and unethical mass screenings,
  • diluted causation standards,
  • markets for junk medicine,
  • speculative damages for fear and risk of unrelated diseases, and
  • governmental avoidance of liability for its widespread use of asbestos in shipyards, and elsewhere.

A sensible reaction would be to condemn asbestos litigation, and similar enterprises, as grotesque failures, and to cede the control of risks, to the extent they are real, to federal and state police powers.  Here, however, Professor Rabin goes to an even further extreme:  he tells us that “regulation has played virtually no role at all in reducing risk and compensating victims.”  Id. at 113.  Rabin tells us that the regulatory failure was “especially evident in the case of asbestos,” which continued to be used in marketed products, and thus “remained unregulated in any meaningful sense, until the toll of death and disease had spiraled entirely out of control.”  Id. at 113 & n.62.

Well, most (but not all) regulations of asbestos deal with mitigating risk, actual or potential, and not with providing compensation.  So on that score, we can hardly fault EPA, OSHA, CDC, NIOSH, etc., in their handling of health risks from asbestos.  The remainder of this assessment is equally difficult to understand.  The landmark case of Borel v. Fibreboard Paper Products Corp., 493 F.2d 1076 (5th Cir. 1973), cited by Professor Rabin, came one year before asbestos-containing insulation products were banned.  To be sure, EPA and OSHA have failed to ban all uses of asbestos, but their failure is driven by a lack of scientific knowledge that extremely low exposures to asbestos, and especially to chrysotile asbestos, are of any moment at all.  Asbestosis has become a medical curiosity in the last decade or so.  Lung cancer continues, of course, because men and women continue to smoke tobacco products.  Mesothelioma rates have stabilized or decreased, and the orthodoxy that asbestos causes gastrointestinal cancers has been debunked by this country’s Institute of Medicine.  Tellingly, Professor Rabin cites no support for his opinion that the failure to regulate low exposures to asbestos played any role in producing a spiral of death and disease.

And why was Bendectin litigation a failure?  A new-age style of consolidated trials of multiple claimants in federal court ended in a defense verdict on general causation.  Although a few state courts were more hospitable to the plaintiffs’ claims, the Bendectin litigation taught the federal bench and most state courts about the quality and quantity of extremist advocacy on the part of claimants.  We owe Havner and Daubert, and a host of lesser-known cases, to Bendectin litigation.  So although much work remains to be done, one of the Bendectin litigation’s successes was the education of American courts in the ways of statistical and epidemiologic evidence.  Ultimately, the courts put their teeth into standard procedural devices, such as summary judgment and expert witness gatekeeping, to put the Bendectin claims to rest.  Before the manufacturer, Richardson-Merrell, achieved vindication, however, it pulled an efficacious medication from the market, despite the absence of reliable evidence to support the claims that it caused birth defects.  Perhaps Rabin counted Bendectin as a litigation failure because the litigation process could not shut down the unfounded allegations and claims in time to save a worthwhile medication.

Absent from Professor Rabin’s historical discussion is any mention of the silicone gel breast implant litigation, which took hold with the advocacy of expert witnesses described by Judge Jack Weinstein as “charlatans.”  Jack B. Weinstein, “Preliminary Reflections on Administration of Complex Litigation.”  The massive, toxic litigation inspired by silicone led to billions of dollars in settlements before a few courageous judges (including Judge Weinstein) were willing to pay attention to the science in a more discriminating fashion.  Also absent from Rabin’s retrospective is any mention of the silica litigation, with its rampant fraud that has led to the defrocking of several physician witnesses.  In re Silica Products Liab. Litig., MDL No. 1553, 398 F. Supp. 2d 563 (S.D. Tex. 2005).

In his final analysis, Professor Rabin seems to acknowledge that the enthusiasm of the 1970s and early 1980s had to give way to other institutional goals, values, and considerations.  What Rabin does not say, about the abuses and excesses of toxic torts, and the toxic litigation it spawned, however, could fill volumes.

The Other Shoe Drops for GSK in Avandia MDL — Hand Waving on Specific Causation

January 24th, 2011

For GSK, the other shoe dropped in the Avandia multi-district litigation on January 13, 2011, when the presiding judge denied the defense challenges to the plaintiff’s expert witnesses’ specific causation opinions in the first case set for trial.  Burford v. GlaxoSmithKline, PLC, 2011 WL 135017 (E.D. Pa. 2011).

In the MDL court’s opinion on general causation, In re Avandia Marketing, Sales Practices and Product Liability Litigation, 2011 WL 13576 (E.D. Pa. 2011), Judge Rufe determined that she was bound to apply a “Third Circuit” approach to expert witness gatekeeping, which focused on the challenged expert witnesses’ methodology, not their conclusions.  In Burford, Judge Rufe, citing two Third Circuit cases that were decided after Daubert but before Joiner, repeats this basic mistake.  Burford, 2011 WL 135017, *2.  Remarkably, the court’s opinion in Burford recites the current version of Federal Rule of Evidence 702, which states that the court must analyze expert witnesses’ conclusions for being based upon “sufficient facts or data,” as well as for being “the product of reliable principles and methods.”  The statute mandates consideration of the reliability and validity of the witness’s conclusions, if those conclusions are in his testimony.  This Rule, enacted by Congress in 2000, is a statute, and thus supersedes prior case law, although the Advisory Committee Notes explain that the language of the rule draws heavily from the United States Supreme Court’s decisions in Daubert, Joiner, and Kumho Tire.  The Avandia MDL court ignored both the post-Daubert decisions of the Supreme Court and the controlling language of the statute, in gatekeeping opinions on general and specific causation.

Two expert witnesses on specific causation were the subject of GSK’s challenge in Burford:  Dr. Nicholas DePace and Dr. Judy Melinek.  The court readily dispatched Dr. Melinek, who opined that Mr. Burford’s fatal cardiac event, which she characterized as a heart attack, was caused by Avandia because Avandia causes heart attacks.  The court correctly noted that this inference was improper because risk does not equal causation in a specific case.

As one well-known epidemiologist has put it:

“An elementary but essential principal that epidemiologists must keep in mind is that a person may be exposed to an agent and then develop disease without there being any causal connection between exposure and disease.”

* * *

“In a courtroom, experts are asked to opine whether the disease of a given patient has been caused by a specific exposure.  This approach of assigning causation in a single person is radically different from the epidemiologic approach, which does not attempt to attribute causation in any individual instance.  Rather, the epidemiologic approach is to evaluate the proposition that the exposure is a cause of the disease in a theoretical sense, rather than in a specific person.”

Kenneth Rothman, Epidemiology: An Introduction 44 (Oxford 2002)(emphasis added).

In addressing the admissibility of Dr. DePace’s expert opinion, however, the MDL court was led astray by Dr. DePace’s hand waving about having considered and “ruled out” Mr. Burford’s other risk factors.

To be sure, Dr. DePace has some ideas about how Avandia may, plausibly, cause heart attacks.  In particular, Dr. DePace identified three plausible mechanisms, each of which would have been accompanied by some biomarker (elevated blood lipids, elevated Lp-PLA2, or hypoglycemia).  This witness, however, could not opine that any of these mechanisms was in operation in producing Mr. Burford’s fatal cardiac event.  Burford, at *3.

Undaunted, Dr. DePace opined that he had ruled out Mr. Burford’s other risk factors, but his opinion, even on Judge Rufe’s narrative, is clearly hand waving and dissembling.  First, everyone, including every middle-aged man, has a risk of heart attack or cardiac arrest, although that risk may be modified – increased or lowered – by risk factors or preventive factors.  Mr. Burford had severe diabetes, which, in and of itself, is a risk factor, commonly recognized to equal in size the risk from having had a previous heart attack.  So Mr. Burford was not at baseline risk; indeed, he started all his diabetes medications with the equivalent risk of someone who had already had a heart attack.

Dr. DePace apparently opined that Mr. Burford’s diabetes, his blood sugar level, was well controlled.  The court accepted this contention at face value, although the reader of the court’s opinion will know that it is rubbish.  Although the court does not recite any blood sugar levels, its narrative of facts includes the following course of medications for Mr. Burford:

  • June 2004, diagnosed with type II diabetes, and treated with metformin
  • April 2005, dose of metformin doubled
  • August 2005, Avandia added to double dose of metformin
  • December 2005, Avandia dose doubled as well
  • June 2006, metformin dose doubled again
  • October or November 2006, sulfonylurea added to Avandia and metformin

This narrative hardly suggests good control.  Mr. Burford was on a downward spiral of disease, which in a little over two years took him from diagnosis to three medications to try to control his diabetes.  Despite adding Avandia to metformin, doubling the dose of Avandia, and doubling and then quadrupling the dose of metformin, Mr. Burford still required yet another, third medication to achieve glycemic control.  Of course, an expert witness can say anything, but the federal district court is supposed to act as a gatekeeper, to protect juries and parties from such ipse dixit.  Many opinions will be difficult to evaluate, but here, Dr. DePace’s opinion about glycemic control in Mr. Burford comes with a banner headline, which shouts “bogus.”

The addition of a third medication, a sulfonylurea, known to cause hypoglycemia (dangerously low blood sugar), which in turn can cause cardiac events and myocardial infarction, is particularly troubling.  See “Sulfonylurea,” Wikipedia (visited January 24, 2011).  Sulfonylureas act by stimulating the pancreas to produce more insulin, and the sudden addition of this medication to an already aggressive regimen of medication clearly had the ability to induce hypoglycemia in Mr. Burford.  Dr. DePace notes that there is no evidence of a hypoglycemic event, which is often true of diabetic patients who experience a sudden death, but the gatekeeping court should have noticed that Dr. DePace’s lack of evidence did not equate to evidence that the risk, or actual causal role, of hypoglycemia was lacking.  Again, the trial court appeared to be snookered by an expert witness’s hand waving.  Surely gatekeepers must be made of sterner stuff.

Perhaps most wrongheaded is the MDL court’s handling, or failure to handle, risk as causation in Dr. DePace’s testimony.

In his deposition, Dr. DePace testified that a heart attack in a 49-year-old man was “very unusual.”  Such a qualitative opinion does not help the finder of fact.  A heart attack is more likely in any 49-year-old man than in any 21-year-old man, although men of both ages can and do suffer heart attacks.  Clearly, a heart attack is more likely in a 49-year-old man who has had diabetes, which has required intensive medication for even a semblance of control, than in a 49-year-old man who has never had diabetes.  Dr. DePace’s opinions fail to show that Mr. Burford had no baseline risk in the absence of one particular medication, or that this baseline risk was not operating, sufficiently, to produce his alleged heart attack.

Rather than being in a high-risk group with respect to his Avandia use, Mr. Burford and other patients on “triple therapy” (Avandia + metformin + sulfonylurea) would, according to the FDA’s 2007 meta-analysis, have had an odds ratio of 1.1 for any myocardial ischemic event, not statistically significant, as a result of their Avandia use.  Mr. Burford’s additional use of an ACE inhibitor, along with his three diabetic medications, would place him into yet another sub-subgroup.  Whatever modification or interaction this additional medication created in combination with Avandia, the confidence intervals, which were already wide for the odds ratio of 1.1, would become extremely wide, allowing no meaningful inference.  In any event, the court in Burford does not tell us what the risk was opined to be, and whether there were good data and facts to support such an opinion.  Remarkably absent from the court’s opinion in Burford is any consideration of the actual magnitude of the claimed risk (in terms of a hazard ratio, relative risk, odds ratio, risk difference, etc.) for patients like Mr. Burford.  Further absent is any consideration of whether any study showing risk has further shown the risk to be statistically different from 1.0 (no increased risk at all).
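The court’s opinion supplies no numbers to work with, but the arithmetic of subgroup confidence intervals is easy to illustrate.  A Woolf (log) interval for an odds ratio widens as the cell counts shrink, even when the point estimate stays fixed at 1.1.  The sketch below uses hypothetical counts of my own invention, chosen only to show the effect:

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Woolf (log) 95% confidence interval for an odds ratio.

    a, b = events and non-events among the exposed;
    c, d = events and non-events among the unexposed.
    """
    or_point = (a * d) / (b * c)
    log_se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_point) - z * log_se)
    upper = math.exp(math.log(or_point) + z * log_se)
    return or_point, lower, upper


# Both hypothetical tables give an odds ratio of 1.1; only the size changes.
print(odds_ratio_ci(110, 1000, 100, 1000))  # -> (1.10, ~0.83, ~1.46)
print(odds_ratio_ci(11, 100, 10, 100))      # -> (1.10, ~0.45, ~2.71)
```

With one tenth the data, the interval runs from well below 1.0 to well above 2.0, which is what “allowing no meaningful inference” means in practice for sub-subgroups like Mr. Burford’s.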

As Ted Frank has noted on PointofLaw Forum, the Avandia MDL raises serious questions about the allocation of technical multi-district litigation cases to judges in the federal system.  “It is hard to escape the conclusion that the MDL denied GSK intellectual due process of law” (January 21, 2011).  The Avandia experience also raises questions about the efficacy of the Federal Judicial Center’s program to train judges in the basic analytical, statistical, and scientific disciplines needed in their gatekeeping capacity. 

Although the Avandia MDL court’s assessment that Dr. DePace’s opinion was suboptimal, Burford at *4, may translate into GSK’s ability to win before a jury, the point of Rule 702 is that a party should not have to stand trial on such shoddy evidence.

Power in the Courts — Part Two

January 21st, 2011

Post hoc calculations of power were once in vogue, but they are now routinely condemned by biostatisticians and epidemiologists when studies report confidence intervals around estimates of associations, or “effect sizes.”  Power calculations require an alternative hypothesis against which to measure the rejection of the null hypothesis, and the choice of the alternative is subjective and often arbitrary.  Furthermore, the power calculation must make assumptions about the anticipated variance of the data to be obtained.  Once the data are in fact obtained, those assumptions may be shown to have been wrong.  In other words, sometimes the investigators are “lucky,” and their data are less variable than anticipated.  The variance of the data actually obtained, rather than hypothesized, can best be appreciated from the confidence interval around the actually measured point estimate of risk.
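The contrast between the two stages can be made concrete.  In the sketch below (Python with scipy; all event rates and counts are hypothetical), the design-stage power figure depends entirely on the assumed alternative and assumed variance, while the analysis-stage confidence interval is computed from the data actually obtained:

```python
from math import sqrt
from scipy.stats import norm

def power_two_proportions(p0: float, p1: float, n_per_arm: int,
                          alpha: float = 0.05) -> float:
    """Approximate power of a two-arm trial to detect a difference
    between true event rates p0 and p1 (normal approximation)."""
    se = sqrt(p0 * (1 - p0) / n_per_arm + p1 * (1 - p1) / n_per_arm)
    return norm.cdf(abs(p1 - p0) / se - norm.ppf(1 - alpha / 2))

# Design stage: power depends entirely on the *assumed* alternative;
# the 1.0% and 1.5% rates here are hypothetical planning figures.
print(power_two_proportions(0.010, 0.015, n_per_arm=2000))   # ~0.30

# Analysis stage: the data speak through the confidence interval
# (hypothetical observed counts):
e1, n1, e0, n0 = 24, 2000, 20, 2000
p1_hat, p0_hat = e1 / n1, e0 / n0
se_hat = sqrt(p1_hat * (1 - p1_hat) / n1 + p0_hat * (1 - p0_hat) / n0)
diff = p1_hat - p0_hat
print(diff - 1.96 * se_hat, diff + 1.96 * se_hat)  # 95% CI for the risk difference
```

The first number changes whenever the analyst changes the assumed rates; the interval in the second step does not, because it is anchored to the observed data.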

In Part One of “Power in the Courts,” I addressed the misplaced emphasis the Avandia MDL court put upon the concept of statistical power.  The court apparently accepted at face value the plaintiffs’ argument that GSK’s clinical trials were “underpowered,” a claim that was deeply misleading.  Power calculations were no doubt done to choose sample sizes for GSK’s clinical trials, but those a priori estimates were based upon assumptions.  In the case of one very large trial, RECORD, many fewer events occurred than anticipated (which is generally a good thing to happen, and not unusual in the context of a clinical trial that gives patients in all arms better healthcare than that available to the general population).  In one sense, then, plaintiffs’ expert witnesses are correct to say that RECORD was “underpowered,” but once the study is done, the real measure of statistical precision is given by the confidence interval.

Because the Avandia MDL is not the only litigation in which courts and lawyers have mistakenly urged power concepts for studies that have already been completed, I have collected some key statements that reflect the general consensus, and the reasoning, against what the court did.

To be fair, the Avandia court did not fault the defense for failing to analyze and calculate the post hoc power of the clinical trials, all of which failed to find statistically significant associations between Avandia and heart attacks.  The court, however, did appear to embrace the plaintiffs’ rhetoric that all the Avandia trials were underpowered, without any consideration given to the width and the upper bounds of the confidence intervals around those trials’ estimates of risk ratios for heart attack.  Remarkably, the Avandia court did not present any confidence intervals for any estimates of effect size, although it did present p-values, which it then badly misinterpreted.  Many of the Avandia trials (and the resulting meta-analyses) confidently ruled out heart attack risk ratios of 2.0 and above; that is, the upper bounds of their confidence intervals fell below 2.0.  The court’s conclusions about power are thus misleading at best.

Several consensus statements address whether considerations of power, after studies are completed and the data are analyzed, are appropriate.  The issue has also been addressed extensively in textbooks and in articles.  I have collected some of the relevant statements, below.  To the extent that the Federal Judicial Center’s Reference Manual on Scientific Evidence appears to urge post hoc power calculations, I hope that the much-anticipated Third Edition will correct the error.

CONSENSUS STATEMENTS

CONSORT

The CONSORT group (Consolidated Standards of Reporting Trials) is a worldwide group that sets quality standards for the reporting of randomized clinical trials of pharmaceuticals.  CONSORT’s lead author is Douglas Altman, a well-respected biostatistician at Oxford University.  The advice of the CONSORT group is clear:

“There is little merit in calculating the statistical power once the results of the trial are known, the power is then appropriately indicated by confidence intervals.”

Douglas Altman, et al., “The Revised CONSORT Statement for Reporting Randomized Trials:  Explanation and Elaboration,” 134 Ann. Intern. Med. 663, 670 (2001).  See also Douglas Altman, et al., “Reporting power calculations is important,” 325 Br. Med. J. 1304 (2002).

STROBE

An effort similar to CONSORT has been undertaken by investigators interested in observational studies:  the STROBE group (Strengthening the Reporting of Observational Studies in Epidemiology).  The STROBE group was made up of leading epidemiologists and biostatisticians, who addressed persistent issues and errors in the reporting of observational studies.  Their advice was equally unequivocal on the issue of post hoc power considerations:

“Do not bother readers with post hoc justifications for study size or retrospective power calculations. From the point of view of the reader, confidence intervals indicate the statistical precision that was ultimately obtained. It should be realized that confidence intervals reflect statistical uncertainty only, and not all uncertainty that may be present in a study (see item 20).”

Vandenbroucke, et al., “Strengthening the reporting of observational studies in epidemiology (STROBE):  Explanation and elaboration,” 18 Epidemiology 805, 815 (2007) (Section 10, sample size).

American Psychological Association

In 1999, a committee of the American Psychological Association met to discuss various statistical issues in psychological research papers.  With respect to power analysis, the committee concluded:

“Once the study is analyzed, confidence intervals replace calculated power in describing the results.”

Wilkinson, Task Force on Statistical Inference, “Statistical methods in psychology journals:  guidelines and explanations,” 54 Am. Psychol. 594-604 (1999)

TEXTBOOKS

Modern Epidemiology

Kenneth Rothman and Sander Greenland are known for many contributions, not the least of which is their textbook on epidemiology.  In the second edition of Modern Epidemiology, the authors explain how and why confidence intervals replace power considerations, once the study is completed and the data are analyzed:

“Standard statistical advice states that when the data indicate a lack of significance, it is important to consider the power of the study to detect as significant a specific alternative hypothesis.  The power of a test, however, is only an indirect indicator of precision, and it requires an assumption about the magnitude of the effect.  * * *  In planning a study, it is reasonable to make conjectures about the magnitude of an effect in order to compute sample-size requirements or power.

In analyzing data, however, it is always preferable to use the information in the data about the effect to estimate it directly, rather than to speculate about it with sample-size or power calculations (Smith & Bates 1992; Goodman & Berlin 1994). * * * Confidence limits convey much more of the essential information by indicating a range of values that are reasonably compatible with the observations (albeit at a somewhat arbitrary alpha level).  They can also show that the data do not contain the information necessary for reassurance about an absence of effect.”

Kenneth Rothman & Sander Greenland, Modern Epidemiology 192–93 (2d ed. 1998)

And in 2008, with the addition of Timothy Lash as a co-author, Modern Epidemiology continued its guidance on power as only a pre-study consideration:

“Standard statistical advice states that when the data indicate a lack of significance, it is important to consider the power of the study to detect as significant a specific alternative hypothesis. The power of a test, however, is only an indirect indicator of precision, and it requires an assumption about the magnitude of the effect. In planning a study, it is reasonable to make conjectures about the magnitude of an effect to compute study-size requirements or power. In analyzing data, however, it is always preferable to use the information in the data about the effect to estimate it directly, rather than to speculate about it with study-size or power calculations (Smith and Bates, 1992; Goodman and Berlin, 1994; Hoening and Heisey, 2001). Confidence limits and (even more so) P-value functions convey much more of the essential information by indicating the range of values that are reasonably compatible with the observations (albeit at a somewhat arbitrary alpha level), assuming the statistical model is correct. They can also show that the data do not contain the information necessary for reassurance about an absence of effect.”

Kenneth Rothman, Sander Greenland, and Timothy Lash, Modern Epidemiology 160 (3d ed. 2008)

A Short Introduction to Epidemiology

Neil Pearce, an epidemiologist, citing Smith & Bates 1992, and Goodman & Berlin 1994, infra, describes the standard method:

“Once a study has been completed, there is little value in retrospectively performing power calculations since the confidence limits of the observed measure of effect provide the best indication of the range of likely values for the true association.”

Neil Pearce, A Short Introduction to Epidemiology (2d ed. 2005)

Statistics at Square One

The British Medical Journal publishes a book, Statistics at Square One, which addresses the issue of post hoc power:

“The concept of power is really only relevant when a study is being planned.  After a study has been completed, we wish to make statements not about hypotheses but about the data, and the way to do this is with estimates and confidence intervals.”

T. Swinscow, Statistics at Square One 42 (9th ed. London 1996) (citing a book by Martin Gardner and Douglas Altman, both highly accomplished biostatisticians).

How to Report Statistics in Medicine

Two authors from the Cleveland Clinic, writing in a guidebook published by the American College of Physicians, advise:

“Until recently, authors were urged to provide ‘post hoc power calculations’ for non-significant differences.  That is, if the results of the study were negative, a power calculation was to be performed after the fact to determine the adequacy of the sample size.  Confidence intervals also reflect sample size, however, and are more easily interpreted, so the requirement of a post hoc power calculation for non-statistically significant results has given way to reporting the confidence interval (32).”

Thomas Lang & Michelle Secic, How to Report Statistics in Medicine 58 (2d ed. 2006)(citing to Goodman & Berlin, infra).  See also Thomas Lang & Michelle Secic, How to Report Statistics in Medicine 78 (1st ed. 1996)

Clinical Epidemiology:  The Essentials

The Fletchers, both respected clinical epidemiologists, describe standard method and practice:

“Statistical Power Before and After a Study is Done

Calculation of statistical power based on the hypothesis testing approach is done by the researchers before a study is undertaken to ensure that enough patients will be entered to have a good chance of detecting a clinically meaningful effect if it is present.  However, after the study is completed this approach is no longer relevant.  There is no need to estimate effect size, outcome event rates, and variability among patients; they are now known.

Therefore, for researchers who report the results of clinical research and readers who try to understand their meaning, the confidence interval approach is more relevant.  One’s attention should shift from statistical power for a somewhat arbitrarily chosen effect size, which may be relevant in the planning stage, to the actual effect size observed in the study and the statistical precision of that estimate of the true value.”

R. Fletcher, et al., Clinical Epidemiology: The Essentials at 200 (3d ed. 1996)

The Planning of Experiments

Sir David Cox is one of the leading statisticians in the world.  In his classic 1958 text, The Planning of Experiments, Sir David wrote:

“Power is important in choosing between alternative methods of analyzing data and in deciding on an appropriate size of experiment.  It is quite irrelevant in the actual analysis of data.”

David Cox, The Planning of Experiments 161 (1958)

ARTICLES

Cummings & Rivara (2003)

“Reporting of power calculations makes little sense once the study has been done.  We think that reviewers who request such calculations are misguided.”

* * *

“Point estimates and confidence intervals tell us more than any power calculations about the range of results that are compatible with the data.”

Cummings & Rivara, “Reporting statistical information in medical journal articles,” 157 Arch. Pediatric Adolesc. Med. 321, 322 (2003)

Senn (2002)

“Power is of no relevance in interpreting a completed study.”

* * *

“The definition of a medical statistician is one who will not accept that Columbus discovered America because he said he was looking for India in the trial plan.  Columbus made an error in his power calculation (he relied on an estimate of the size of the Earth that was too small), but he made one none the less, and it turned out to have very fruitful consequences.”

Senn, “Power is indeed irrelevant in interpreting completed studies,” 325 Br. Med. J. 1304 (2002).

Hoenig & Heisey (2001)

“Once we have constructed a C.I., power calculations yield no additional insight.  It is pointless to perform power calculations for hypotheses outside of the C.I. because the data have already told us that these are unlikely values.”  p. 22a

Hoenig & Heisey, “The Abuse of Power:  The Pervasive Fallacy of Power Calculations for Data Analysis,” 55 Am. Statistician 19, 22 (2001)

Zumbo & Hubley (1998)

In The Statistician, published by the Royal Statistical Society, these authors roundly condemn post hoc power calculations:

“We suggest that it is nonsensical to make power calculations after a study has been conducted and a statistical decision has been made.  Instead, the focus after a study has been conducted should be on effect size . . . .”

Zumbo & Hubley, “A note on misconceptions concerning prospective and retrospective power,” 47-2 The Statistician 385 (1998)

Goodman & Berlin (1994)

Professor Steven Goodman is a professor of epidemiology at Johns Hopkins University, and the Statistical Editor for the Annals of Internal Medicine.  Interestingly, Professor Goodman appeared as an expert witness, opposite Sander Greenland, in hearings on Thimerosal.  His article, with Jesse Berlin, has been frequently cited in support of the irrelevance of post hoc power considerations:

“Power is the probability that, given a specified true difference between two groups, the quantitative results of a study will be deemed statistically significant.”

(p. 200a, ¶1)

“Studies with low statistical power have sample sizes that are too small, producing results that have high statistical variability (low precision).  Confidence intervals are a convenient way to express that variability.”

(p. 200a, ¶2)

“Confidence intervals should play an important role when setting sample size, and power should play no role once the data have been collected . . . .”

(p. 200 b, top)

“Power is exclusively a pretrial concept; it is the probability of a group of possible results (namely all statistically significant outcomes) under a specified alternative hypothesis.  A study produces only one result.”

(p. 201a, ¶2)

“The perspective after the experiment differs from that before that experiment simply because the result is known.  That may seem obvious, but what is less apparent is that we cannot cross back over the divide and use pre-experiment numbers to interpret the result.  That would be like trying to convince someone that buying a lottery ticket was foolish (the before-experiment perspective) after they hit a lottery jackpot (the after-experiment perspective).”

(p. 201a-b)

“For interpretation of observed results, the concept of power has no place, and confidence intervals, likelihood, or Bayesian methods should be used instead.”

(p. 205)

Goodman & Berlin, “The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results,” 121 Ann. Intern. Med. 200, 200, 201, 205 (1994).

Smith & Bates (1992)

This article was published in the journal Epidemiology, founded and edited by Professor Kenneth Rothman:

“In conclusion, we recommend that post-study epidemiologic power calculations be abandoned.”

“Generally, a negative study with low power will be regarded as providing little evidence against the existence of a causal association.  Often overlooked, however, is that otherwise well-conducted studies of low power can be informative:  the upper bound of the (1 – α)% confidence intervals provides a limit on the likely magnitude of any actual effect.

“The purpose of this paper is to extend this argument to show that the use of traditional power calculations in causal inference (that is, after a study has been carried out) can be misleading and inferior to the use of upper confidence limits of estimates of effect.  The replacement of post-study power calculations with confidence interval estimates is not a new idea.”

(p. 449a)

* * *

“It is clear, then, that the use of the upper confidence limit conveys considerable information for the purposes of causal inference; by contrast, the power calculation can be quite misleading.”

(p. 451b)

* * *

“In conclusion, we recommend that post-study epidemiologic power calculations be abandoned.  As we have demonstrated, they have little, if any, value.  We propose that, in their place, (1 – α)%  upper confidence limits be calculated.”

(p. 451b)

Smith & Bates, “Confidence limit analyses should replace power calculations in the interpretation of epidemiologic studies,” 3 Epidemiology 449-52 (1992)

Greenland (1988)

“the arbitrariness of power specification is of course absent once the data are collected, since the statistical power refers to the probability of obtaining a particular type of data.  It is thus not a property of particular data sets.  Statistical power of collected data, as the probability of heads on a coin toss that has already taken place, can, at best, meaningfully refer only to one’s ignorance of the result and loses all meaning when one examines the result.”

Greenland, “On Sample Size and Power Calculations for Studies Using Confidence Limits,” 128 Am. J. Epidem. 231, 236 (1988)

Simon (1986)

“Although power is a useful concept for initially planning the size of a medical study, it is less relevant for interpreting studies at the end.  This is because power takes no account of the actual results obtained.”

***

“[I]n general, confidence intervals are more appropriate than power figures for interpreting results.”

Richard Simon, “Confidence intervals for reporting results of clinical trials,” 105 Ann. Intern. Med. 429, 433 (1986) (internal citation omitted).

Rothman (1986)

“[Simon] rightly dismisses calculations of power as a weak substitute for confidence intervals, because power calculations address only the qualitative issue of statistical significance and do not take account of the results already in hand.”

Kenneth J. Rothman, “Significance Questing,” 105 Ann. Intern. Med. 445, 446 (1986)

Makuch & Johnson (1986)

“[the] confidence interval approach, the method we recommend for interpreting completed trials in order to judge the range of true treatment differences that is reasonably consistent with the observed data.”

Robert W. Makuch & Mary F. Johnson, “Some Issues in the Design and Interpretation of ‘Negative’ Clinical Studies,” 146 Arch. Intern. Med. 986, 986 (1986).

Detsky & Sackett (1985)

“Negative clinical trials that conclude that neither of the treatments is superior are often criticized for having enrolled too few patients.  These criticisms usually are based on formal sample size calculations that compute the number of patients required prospectively, as if the trial had not yet been carried out.  We suggest that this ‘prospective’ sample size calculation is incorrect, for once the trial is over we have ‘hard’ data from which to estimate the actual size of the treatment effect.  We can either generate confidence limits around the observed treatment effect or retrospectively compare it with the effect hypothesized before the trial.”

Detsky & Sackett, “When was a ‘negative’ clinical trial big enough?  How many patients you needed depends on what you found,” 145 Arch. Intern. Med. 709 (1985).

Power in the Courts — Part One

January 18th, 2011

The Avandia MDL court, in its recent decision to permit plaintiffs’ expert witnesses to testify about general causation, placed substantial emphasis on the statistical concept of power.  Plaintiffs’ key claim is that Avandia causes heart attacks, yet no clinical trial of the oral anti-diabetic medication found a statistically significant increased risk of heart attacks.  Plaintiffs’ expert witnesses argued that all the clinical trials of Avandia were “underpowered,” and thus that the failure to find an increased risk was a Type II (false-negative) error resulting from the small size of the clinical trials:

“If the sample size is too small to adequately assess whether the substance is associated with the outcome of interest, statisticians say that the study lacks the power necessary to test the hypothesis. Plaintiffs’ experts argue, among other points, that the RCTs upon which GSK relies are all underpowered to study cardiac risks.”

In re Avandia Marketing, Sales Practices, and Products Liab. Litig., MDL 1871, Mem. Op. and Order (E.D.Pa. Jan. 3, 2011)(emphasis in original).

The true effect, according to plaintiffs’ expert witnesses, could be seen only by aggregating the data, across clinical trials, in a meta-analysis.  The proper conduct, reporting, and interpretation of meta-analyses were thus crucial issues for the Avandia MDL court, which appeared to have difficulty with statistical concepts.  The court’s difficulty, however, may have had several sources beyond the misleading testimony of plaintiffs’ expert witnesses, including the defense’s decision not to call an expert in biostatistics and meta-analysis at the Rule 702 hearing.
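Mechanically, a fixed-effect meta-analysis is just an inverse-variance weighted average of the individual studies’ log risk estimates.  A minimal sketch (Python; the three studies and their standard errors below are wholly hypothetical) shows how individually non-significant results can pool into a narrower, nominally significant interval:

```python
import math

def fixed_effect_meta(log_ors, std_errs, z: float = 1.96):
    """Inverse-variance (fixed-effect) pooling of study log odds ratios."""
    weights = [1 / se**2 for se in std_errs]
    pooled = sum(w * lo for w, lo in zip(weights, log_ors)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    return (math.exp(pooled),
            math.exp(pooled - z * se_pooled),
            math.exp(pooled + z * se_pooled))

# Three wholly hypothetical trials, each individually non-significant
# (each study's own 95% CI for its log OR crosses zero):
log_ors = [math.log(1.3), math.log(1.5), math.log(1.4)]
std_errs = [0.25, 0.28, 0.22]
print(fixed_effect_meta(log_ors, std_errs))
# -> pooled OR ~ 1.39, 95% CI ~ (1.05, 1.84): nominally significant
#    even though no single trial was.
```

Whether such pooling was done properly in the Avandia litigation (trial selection, event definitions, the handling of zero-event trials) is precisely the kind of question the MDL court needed to engage.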

Another source of confusion about statistical power may well have come from the very reference work designed to help judges address statistical and scientific evidence in their judicial capacities:  The Reference Manual on Scientific Evidence.

Statistical power is discussed in both the statistics chapter and the epidemiology chapter of The Reference Manual on Scientific Evidence.  The chapter on epidemiology, however, provides misleading guidance on the use of power:

“When a study fails to find a statistically significant association, an important question is whether the result tends to exonerate the agent’s toxicity or is essentially inconclusive with regard to toxicity. The concept of power can be helpful in evaluating whether a study’s outcome is exonerative or inconclusive.79  The power of a study expresses the probability of finding a statistically significant association of a given magnitude (if it exists) in light of the sample sizes used in the study. The power of a study depends on several factors: the sample size; the level of alpha, or statistical significance, specified; the background incidence of disease; and the specified relative risk that the researcher would like to detect.80 Power curves can be constructed that show the likelihood of finding any given relative risk in light of these factors. Often power curves are used in the design of a study to determine what size the study populations should be.81”

Michael D. Green, D. Michael Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” in Federal Judicial Center, The Reference Manual on Scientific Evidence 333, 362-63 (2d ed. 2000).  See also David H. Kaye and David A. Freedman, “Reference Guide on Statistics,” in Federal Judicial Center, The Reference Manual on Scientific Evidence 83, 125-26 (2d ed. 2000)

This guidance is misleading in the context of epidemiologic studies because power curves are rarely used anymore to assess completed studies.  Power calculations are, of course, used to help determine sample size for a planned study.  After the data are collected, however, the appropriate way to evaluate the “resolving power” of a study is to examine the confidence interval around the study’s estimate of risk.
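To illustrate the point (Python; the counts are hypothetical), two studies with the same point estimate of a risk ratio differ in their resolving power only through the width of their confidence intervals, and it is the upper bound, not any power figure, that tells us what magnitudes of risk the data can rule out:

```python
import math

def rr_ci(cases_exp, n_exp, cases_unexp, n_unexp, z=1.96):
    """Risk ratio and 95% CI via the usual log-normal approximation."""
    rr = (cases_exp / n_exp) / (cases_unexp / n_unexp)
    se = math.sqrt(1/cases_exp - 1/n_exp + 1/cases_unexp - 1/n_unexp)
    return rr, math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se)

# The same point estimate (RR = 1.2) from hypothetical studies of
# different sizes:
print(rr_ci(12, 1000, 10, 1000))       # upper bound ~2.8: a doubling not excluded
print(rr_ci(120, 10000, 100, 10000))   # upper bound ~1.6: a doubling excluded
```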

The authors of the epidemiology chapter cite a general review paper, id. at 362 n.79, which does indeed address the concept of statistical power.  The paper’s lead author, a well-known statistician, addresses the issue primarily in the context of planning a statistical analysis, and of discrimination litigation, where the test result will be expressed as a p-value, without a measure of “effect size,” and, more important, without a confidence interval around any estimate of effect size:

“The chance of rejecting the false null hypothesis, under the assumptions of an alternative, is called the power of the test. Simply put, among many ways in which we can test a null hypothesis, we want to select a test that has a large power to correctly distinguish between two alternatives. Generally speaking, the power of a test increases with the size of the sample, and tests have greater power, and therefore perform better, the more extreme the alternative considered becomes.

Often, however, attention is focused on the first type of error and the level of significance. If the evidence, then, is not statistically significant, it may be because the null hypothesis is true or because our test did not have sufficient power to discern a difference between the null hypothesis and an alternative explanation. In employment discrimination cases, for example, separate tests for small samples of employees may not yield statistically significant results because each test may not have the ability to discern the null hypothesis of nondiscriminatory employment from illegal patterns of discrimination that are not extreme. On the other hand, a test may be so powerful, for example, when the sample size is very large, that the null hypothesis may be rejected in favor of an alternative explanation that is substantively of very little difference.  ***

Attention must be paid to both types of errors and the risks of each, the level of significance, and the power. The trier of fact can better interpret the result of a significance test if he or she knows how powerful the test is to discern alternatives. If the power is too low against alternative explanations that are illegal practices, then the test may fail to achieve statistical significance even though the illegal practices may be operating. If the power is very large against a substantively small and legally permissible difference from the null hypothesis, then the test may achieve statistical significance even though the employment practices are legal.”

Stephen E. Fienberg, Samuel H. Krislov, and Miron L. Straf, “Understanding and Evaluating Statistical Evidence in Litigation,” 36 Jurimetrics J. 1, 22-23 (1995).

Professor Fienberg’s characterization is accurate, but his description of “post hoc” assessment of power was not offered for the context of epidemiologic studies, which today virtually always report confidence intervals around their estimates of effect size.  These confidence intervals allow a concerned reader to evaluate what can reasonably be ruled out by the data in a given study.  Post hoc power calculations fail to provide meaningful information because they require a specified alternative hypothesis.  A wily plaintiff’s expert witness can always arbitrarily select a sufficiently low alternative hypothesis, say a relative risk of 1.01, such that any study would have a vanishingly small probability of correctly distinguishing the null and alternative hypotheses.
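The point is easy to demonstrate.  In the sketch below (Python with scipy; the standard error is hypothetical), a study precise enough to have near-certain power against a relative risk of 2.0 has essentially no power against an alternative of 1.01, so “underpowered” is meaningless until one asks:  underpowered against what?

```python
from math import log
from scipy.stats import norm

def power_against_rr(rr_alt: float, se_log_rr: float,
                     alpha: float = 0.05) -> float:
    """Approximate power to detect a true relative risk of rr_alt,
    given the standard error of a study's log relative risk."""
    return norm.cdf(abs(log(rr_alt)) / se_log_rr - norm.ppf(1 - alpha / 2))

# Hypothetical study whose log-RR standard error is 0.10 (a null
# result would carry a 95% CI of roughly 0.82 to 1.22):
for rr in (1.01, 1.25, 1.5, 2.0):
    print(rr, round(power_against_rr(rr, 0.10), 3))
# -> power ~3% against RR = 1.01, but ~98% against RR = 1.5 and
#    essentially 100% against RR = 2.0.
```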

The Reference Manual is now undergoing a revision, for an anticipated third edition.  A saner appreciation of the concept of power as it is used in epidemiologic studies and clinical trials would be helpful to courts and to lawyers who litigate cases involving this kind of statistical evidence.

The opinions, statements, and asseverations expressed on Tortini are my own, or those of invited guests, and these writings do not necessarily represent the views of clients, friends, or family, even when supported by good and sufficient reason.