TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

The Historical Intersection of Law and Epidemiology: Miller v. National Cabinet (N.Y. Court of Appeals 1960)

January 3rd, 2014

The histories of statistics, epidemiology, and products liability are intertwined in ways that call for greater attention. The 1950s and 1960s witnessed increasingly sophisticated statistical approaches to epidemiologic evidence. Starting in 1950, and continuing throughout the decade, Sir Richard Doll and Sir Austin Bradford Hill pursued their epidemiologic exploration of lung cancer among smokers. See, e.g., Richard Doll & A. Bradford Hill, “Smoking and Carcinoma of the Lung,” 2 Br. Med. J. 739 (1950); Richard Doll & A. Bradford Hill, “The mortality of doctors in relation to their smoking habits; a preliminary report,” 1 Br. Med. J. 1451 (1954). In 1955, Sir Richard Doll published his important paper that suggested an association between asbestosis and lung cancer. Richard Doll, “Mortality from Lung Cancer in Asbestos Workers,” 12 Br. J. Indus. Med. 81 (1955). Doll found no disparity between observed and expected rates of lung cancer among workers without asbestosis. P-values were used to assess the strength of the evidence against a null hypothesis of no association. As important an advance as Doll’s paper was, and as careful an investigator as he was, it is remarkable that Doll neglected to consider the potential role of smoking in producing the excess lung cancer rates among the factory workers with asbestosis.
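
To make the statistical posture concrete: the comparison in studies like Doll’s was between the deaths observed in an exposed cohort and the number expected from general-population rates, with a p-value measuring how compatible the observed count is with the null expectation. Here is a minimal sketch of that observed-versus-expected (standardized mortality ratio) calculation, using invented numbers rather than Doll’s actual data:

```python
# A minimal sketch of the observed-versus-expected comparison behind
# early occupational cancer epidemiology. The counts below are invented
# for illustration; they are not Doll's actual 1955 data.
from scipy.stats import poisson

observed = 11    # hypothetical lung cancer deaths seen in the exposed cohort
expected = 0.8   # hypothetical deaths expected from general-population rates

# Standardized mortality ratio: observed deaths over expected deaths
smr = observed / expected

# One-sided p-value: chance of at least `observed` deaths if the null
# (general-population) rate were true; sf(k - 1, mu) gives P(X >= k)
p_value = poisson.sf(observed - 1, expected)

print(f"SMR = {smr:.1f}, one-sided p = {p_value:.1e}")
```

A p-value that small leaves random variation an implausible explanation, which is the limited but real work that significance testing did in these early studies.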

Starting in the 1960s, Dr. Irving Selikoff began publishing his epidemiologic studies of American asbestos insulators. See, e.g., Irving J. Selikoff, Jacob Churg, and E. Cuyler Hammond, “Asbestos exposure and neoplasia,” 188 J. Am. Med. Ass’n 22 (1964). Selikoff neglected to stratify his observational data by the presence or absence of clinical asbestosis (although his later studies suggested a very high prevalence of asbestosis after 20 years from first employment). In addition, these insulator studies used crude measures of smoking, which lumped the very rare non-smoking insulators in with those who “never smoked regularly.”

In 1965, Sir Austin Bradford Hill published his lecture to the Royal Society of Medicine, in which he gave a spirited defense of inferring causality from observational epidemiologic studies. Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295 (1965). Hill was justly proud of the success of observational epidemiology, at least for very large effect sizes that made residual confounding and bias unlikely.

The same year as Hill’s lecture, the American Law Institute published the Restatement (Second) of Torts, with its controversial Section 402A. Before 1965, employee-plaintiff lawsuits against remote suppliers of raw materials and products to employers were a rarity in American law. Bradford Hill’s lecture on causal assessments of “clear-cut” statistical associations came just as epidemiologic, statistical evidence was working its way into tort cases involving smoking and lung cancer. Not surprisingly, some of the first uses of epidemiologic evidence occurred in cases involving claims that tobacco caused lung cancer. See, e.g., Lartigue v. R.J. Reynolds Tobacco Co., 317 F.2d 19 (5th Cir. 1963) (affirming defense verdict in case noted for plaintiffs’ use of epidemiologic evidence) (“The plaintiff contends that the jury’s verdict was contrary to the manifest weight of the evidence. … The jury had the benefit of chemical studies, epidemiological studies, reports of animal experiments, pathological evidence, reports of clinical observations, and the testimony of renowned doctors. The plaintiff made a convincing case, in general, for the causal connection between tobacco and cancer and, in particular, for the causal connection between Lartigue’s smoking and his cancer. The defendants made a convincing case for the lack of any causal connection.”), cert. denied, 375 U.S. 865 (1963), and cert. denied, 379 U.S. 869 (1964).

Epidemiologic and statistical evidence in tort cases has become a commonplace, even when it is distorted and abused by litigants and judges. Recent decisions involving claims that benzene caused various cancers are illustrative. See, e.g., Milward v. Acuity Specialty Products Group, Inc., 664 F.Supp. 2d 137 (D. Mass. 2009) (granting motion to exclude opinions that substantially distorted epidemiologic evidence under the vague rubric of “weight of the evidence”), rev’d, 639 F.3d 11 (1st Cir. 2011) (closing off scrutiny of expert witness’s abuse of epidemiologic evidence in one of the most controversial, reactionary federal gatekeeping decisions of recent years), cert. denied, U.S. Steel Corp. v. Milward, ___ U.S. ___, 2012 WL 33303 (2012). See also David E. Bernstein, “The Misbegotten Judicial Resistance to the Daubert Revolution,” 89 Notre Dame L. Rev. 27 (2013); “WOE-fully Inadequate Methodology – An Ipse Dixit By Another Name” (Sept. 2, 2011); “Milward — Unhinging the Courthouse Door to Dubious Scientific Evidence” (Sept. 2, 2011).

Given the problematic Milward decision, we might wonder what challenges to benzene-leukemia cases looked like just before Hill’s defense of inferring causality from observational studies began to infuse the witches’ brew of Section 402A. What did statistical evidence look like before Hill’s paper? In court cases, typically, statistical evidence was presented crudely or not at all.

In 1960, there was little opportunity to challenge causation opinions on admissibility grounds; rather, sufficiency of the evidence to support a verdict or judgment was the primary means to gain review of an adverse decision. Case reports of leukemia in workers very heavily exposed to benzene appeared in the 1920s, but it was not until the 1960s that analytical epidemiologic evidence (case-control and cohort studies) of an association between leukemia and benzene was published. See generally Deborah Glass, Christopher Gray, Damien Jolley, Carl Gibbons, and Malcolm Sim, “The health watch case-control study of leukemia and benzene: the story so far,” 1076 Ann. N.Y. Acad. Sci. 80-89 (2006). Thus, when the New York Court of Appeals decided a case involving a claim of benzene-induced leukemia, in 1960, the judicial decision was driven largely by the absence of specific quantification of the risk of leukemia among workers occupationally exposed to benzene.[1] Miller v. National Cabinet Co., 8 N.Y.2d 277, 281, 168 N.E.2d 811, 813, 204 N.Y.S.2d 129, 132, modified on other grounds, 8 N.Y.2d 1025, 70 N.E.2d 214, 206 N.Y.S.2d 795 (1960). The New York high court wrestled with the formalistic aspects of the expert witnesses’ testimony, including whether they expressed themselves in terms of “possibilities” or “probabilities.” Miller, 8 N.Y.2d at 284, 168 N.E.2d at 814-15, 204 N.Y.S.2d at 134.

Focusing on one of the more knowledgeable of plaintiffs’ expert witnesses, Dr. Reznikoff, the Miller court was impressed that this witness disclaimed any intent to support an inference, from statistical analyses, to the specific case of the plaintiff’s decedent. Id. at 283. Furthermore, the court suggested that Reznikoff’s evidence might have been sufficient were it not for his concession:

“I am sorry I can’t give you any statistics, but we don’t have them.”

Id. at 283. The court appeared also to be concerned that Reznikoff’s approach provided no mechanistic insight into why many cases of leukemia followed benzene exposure. Id.

Reznikoff’s qualifications to speak to the subject were not dispositive of the question; the court was looking for data that were not available in the 1950s, when the case was tried:

“Not every supposition of a witness concerning what might be has the force of evidence, even though he has been licensed to practice medicine. If the witness is unfamiliar with any statistical data in the medical literature or in his own practice to give an inkling either to himself or to the court or board of how high the incidence of these cases is in situations of this kind, then the doctor’s assumption that it is ‘quite high’ is without significance. The lack of any kind of statistical data, which in the absence of scientific understanding is all that there would be to go on, is the more inexplicable if the claim is well founded in view of the large number of persons who die of leukemia and of workers in industry who are exposed to benzol. If there were any observed correlation between the two, it is certain that a physician of Dr. Reznikoff’s standing would be in possession of the information.”

Id. at 283-84. The court did not excuse the claimant’s evidentiary display with the indulgent “this was the best evidence available,” when the evidence was inadequate. Nor did the court engage in soothsaying that causality would someday be demonstrated. We might feel some frustration today, looking back, that the court missed this opportunity, but case reports, and even case reports combined with epidemiologic studies, have generated many false-positive associations. Clearly, more is required, and the New York Court of Appeals recognized the necessity for more.

Traumatic Cancers Distinguished

In 1960, the courts still indulged the proto-scientific opinion that traumatic injury caused cancer.[2]  Some medical writers supported this opinion, but by 1960, the opinion was already falling out of favor due to an improved understanding of carcinogenesis.

There is much irony, therefore, in the Miller court’s drawing an invidious distinction between Reznikoff’s proto-epidemiologic evidence and the traumatic cancer cases that were still prevalent in the 1960s. Id. at 285-86. In the traumatic cancer cases, in the 1960s, and even in the 1970s, courts sustained verdicts for cancer claimants who had shown that their cancers were diagnosed very shortly after a traumatic blow to the precise portion of the body where the cancer manifested. The Miller court referred to these traumatic cancer cases as presenting the kind of causal inferences that could be understood and made by judges and juries. Today, decades later, we see those causal inferences as mostly rubbish, based upon incorrect, inadequate, and discarded theories of carcinogenesis.

The prospect of cancer cases sustained by epidemiologic (statistical) evidence clearly troubled the New York court:

“The courts have been confronted before with cancer cases, and this is not likely to be the last. This is not an isolated situation. Questions of causation are common to actions based on warranty, tort or workmen’s compensation proceedings. Would, for example, evidence that there are 4 to 11 times as many cases of lung cancer among cigarette smokers as among nonsmokers be sufficient to establish a cause of action for breach of warranty in the sale of cigarettes? … There appear to be no decisions upholding causation in so complex a variety of the disease as leukemia. The cancer decisions in the courts where recovery has been allowed have dealt almost entirely with trauma, and there only in instances where the trauma occurred in the spot in the body where the pre-existing cancer was and the symptoms of its aggravation were immediately apparent … . In all of those cases the immediacy of the symptoms of aggravation of the cancer by a traumatic injury suffered in the area where the cancer was located was accepted as a substitute for scientific evidence or understanding of cause and effect. Absent that, damage claims of this nature have been dismissed on the law for lack of evidence of causation.”

Id. at 285-86 (internal citations omitted). The Miller court went on to note that New York law required that the cancer develop at the exact location of the injury. Furthermore, latency between the traumatic blow and the clinical recognition of the cancer was fatal to the claim, even in the face of opinion testimony that a plaintiff’s cancer was a “very slow growing” tumor. Today, we understand that latency between first exposure and clinical manifestation is necessitated by the length of induction periods and the doubling time of solid tumors. As a result of the Miller court’s reliance upon some dodgy notions of cancer causation, it held that Mr. Miller’s latency period disqualified the case from the immediate impact rule of the traumatic cancer cases. Id. at 287-88 (distinguishing Hagy v. Allied Chemical & Dye Corp., 122 Cal. App. 2d 361, 265 P.2d 86 (1953), which involved a diagnosis of laryngeal cancer following immediately upon exposure to sulfuric acid mists).

The majority in Miller further expressed its concern that the understanding of cancer causation was marked by such uncertainty that the mere possibilities of chemical carcinogenesis should not be tolerated in this and similar cases:

“… [F]or so long as the causes of a disease — like cancer — are unknown to science, everyone contracting the disease could secure medical testimony that it is ‘possible’ that the disease is contracted from a wide variety of causes, choosing in each instance the particular possibility having the greatest promise of holding liable some responsible defendant. Any cancer expert could readily state that cancer could be caused by virus infection or by exposure to automobile exhaust fumes, sunlight, radiation, smog, smoking, hormone imbalance or according to any other theory which has been entertained by researchers or specialists as a possibility. Is a malpractice suit pending against some doctor who has given cortisone or ACTH as medicine? Then appears a medical witness who testifies that possibly cancer is caused by hormone imbalance induced thereby. Is it an action for breach of an implied warranty in the sale of cigarettes? Then the medical witness will testify that cigarettes could be a cause of lung cancer. Is it X ray or working in a garage where there have been exhaust fumes? Then the ‘possibility’ doctrine is adapted to creating questions of fact in those fields — and the same with benzol exposure and leukemia. Such a doctrine would overturn the rule that the burden is on the party asserting that a disease is based on actionable facts to prove causation. It would mean that, wherever such a cause is possible, the burden rests on the opposite party to prove that the disease resulted from something else. Consequently, for so long as the causes of the disease are unknown to medical science, the claimant or plaintiff can always recover — if the trier of the fact is favorably disposed — since no one can prove that the disease had other causes. This is a perversion of the normal rule that the disease must have resulted from the occupation and that the burden of proving causation is upon the party asserting it. The law does not intend that the less that is known about a disease the greater shall be the opportunity of recovery in court.”

Id. at 289.

The Miller decision provoked a dissent, mostly on formalistic grounds. Id. at 290. The dissenting judges asserted, without much analysis, that there was substantial evidence to support causality. That qualified expert witnesses showed up for the claimant seemed sufficient on this score to the dissenters. To the extent that the claimant’s expert witnesses expressed themselves in terms of possibilities, the dissenters opined that possibilities are sufficient, especially in the context of workmen’s compensation cases, in which the burden of proof standards are lower than in common law civil liability cases.

The majority opinion stands as an eloquent expression of concern about the need for quantitative evidence of statistical risk in chemical exposure cancer cases. The court also presciently saw what would become a plague of litigation involving claims of cancer causation. In 1960, for benzene and leukemia, the evidence was clearly, even by the standards of the day, inadequate, and the claimant’s expert witnesses were appropriately modest about what inferences could be drawn with respect to both general and specific causation. The 1970s would witness a growing immodesty among available expert witnesses, as well as an explosive growth in the techniques and applications of analytical epidemiology to many problems, including the relationship between benzene and leukemia.



[1] A decade or two later, the scientific community recognized high levels of exposure to benzene as a cause of certain kinds of leukemia, by virtue of epidemiologic studies. See, e.g., Fusun Yaris, Mustafa Dikici, Turhan Akbulut, Ersin Yaris, Hilmi Sabuncu, “Story of benzene and leukemia: epidemiologic approach of Muzaffer Aksoy,” 46 J. Occup. Health 244 (2004); Abdul Khalade, Maritta S Jaakkola, Eero Pukkala and Jouni JK Jaakkola, “Exposure to benzene at work and the risk of leukemia: a systematic review and meta-analysis,” 9 Envt’l Health 31 (2010).  See also Michael D. Green, The Paradox of Statutes of Limitations in Toxic Substances Litigation, 76 Cal. L. Rev. 965, 974 (1988).

[2] William B. Coley & Norman L. Higinbotham, “Injury as a Causative Factor in the Development of Malignant Tumors,” 98 Annals of Surgery 991 (1933); Shields Warren, “Minimal Criteria Required to Prove Causation of Traumatic or Occupational Neoplasms,” 117 Annals of Surgery 585 (April 1943); Shields Warren, “Criteria Required to Prove Causation of Occupational or Traumatic Tumors,” 10 U. Chi. L. Rev. 313, 318-20 (1943); Russell & Clark, “Medico-Legal Considerations of Trauma and Other External Influences in Relationship to Cancer,” 6 Vand. L. Rev. 868, 875 (1953); Arden R. Hedge, “Can a Single Injury Cause Cancer?” 90 California Medicine 55 (1959); Auster, “The Role of Trauma in Oncogenesis: A Juridical Consideration,” 175 J. Am. Med. Ass’n 940, 949 (1961); Comment, “Sufficiency of Proof in Traumatic Cancer Cases,” 46 Cornell L.Q. 581, 581-82 (1961); Comment, “Sufficiency of Proof in Traumatic Cancer: A Medico-Legal Quandary,” 16 Arkansas L. Rev. 243, 256-67 (1962); Dyke, “Traumatic Cancer,” 15 Clev.-Mar. L. Rev. 472, 484-94 (1977). See also Comment, “Judicial Attitudes Towards Legal and Scientific Proof of Cancer Causation,” 3 Columbia J. Envt’l L. 344, 354-68 (1977).


The Seventh Circuit Regresses on Rule 702

October 29th, 2013

Earlier this month, a panel of the United States Court of Appeals for the Seventh Circuit decided a relatively straightforward case by reversing the trial court’s exclusion of a forensic accountant’s damages calculation. Manpower, Inc. v. Insurance Company of the State of Pennsylvania, No. 12-2688 (7th Cir. Oct. 16, 2013). In reversing, the appellate court disregarded a congressional statute, Supreme Court precedent, and Circuit decisional law.

The case involved an insurance coverage dispute and an assessment of Manpower, Inc.’s economic losses following a building collapse. The trial court excluded Manpower’s accounting expert witness, Sullivan, who projected a growth rate (7.76%) for the plaintiff by comparing total revenues for a five-month period in 2006 to the same five months in the previous year. Id. at 8. The company’s historical performance, however, included a negative annual growth rate of 4.79% over the years 2003 to 2009. Over the five months immediately preceding Sullivan’s chosen period in 2006, the growth rate was merely 3.8%, less than half his projected growth rate. Id. Sullivan tried to justify his rather extreme selectivity in data reliance by adverting to information that he obtained from the company about its having initiated new policies and installed new managers by the end of 2005. Id.
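
The methodological point is easy to see in miniature. The sketch below, with invented monthly revenue figures (not Manpower’s actual data), shows how the projected growth rate swings with the analyst’s choice of comparison window:

```python
# A sketch of why window selection drives a growth projection. The
# monthly revenue figures are invented; they are not Manpower's data.
def growth_rate(current: list[float], prior: list[float]) -> float:
    """Year-over-year growth, comparing the sums of two matched windows."""
    return sum(current) / sum(prior) - 1.0

prior_year   = [100, 102, 101,  99, 103]   # same five months, prior year
current_year = [104, 107, 109, 110, 112]   # analyst's chosen window

# The full five-month window supports a rosy projection ...
print(f"five-month window: {growth_rate(current_year, prior_year):.2%}")

# ... while a shorter (or shifted) window tells a more modest story.
print(f"two-month window : {growth_rate(current_year[:2], prior_year[:2]):.2%}")
```

Because the projection is nothing more than a ratio of the chosen windows, the choice of windows is the method; there is no residual “methodology” left to bless once the data selection is stripped away.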

The trial court held that Sullivan, who was not an expert on business management, had uncritically accepted the claimant’s proffered explanation for a very short-term swing in profitability and revenue. Id. at 9. While suggesting that Sullivan’s opinion was not “bulletproof,” the panel of the Seventh Circuit reversed. The panel, which should have been reviewing the district court for “abuse of discretion,” appears to have made its own independent determination that Sullivan’s opinion was “sufficiently reliable to present to a jury.” Id. at 17. In reversing, the panel explained that “the district court exercised its gatekeeping role under Daubert with too much vigor.” Id.

The panel attempted to justify its reversal by suggesting that a district court “usurps the role of the jury, and therefore abuses its discretion, if it unduly scrutinizes the quality of the expert’s data and conclusions rather than the reliability of the methodology the expert employed.” Id. at 18.  The panel’s reversal illustrates several methodological and legal confusions that make this case noteworthy beyond its mundane subject matter.

Of course, the most striking error in the panel’s approach is its reliance upon a Supreme Court case, Daubert, which was effectively superseded in 2000 by a congressional statute, Federal Rule of Evidence 702:

“A witness who is qualified as an expert … may testify in the form of an opinion or otherwise if:

(a) the expert’s scientific, technical, or other specialized knowledge will help the trier of fact to understand the evidence or to determine a fact in issue;

(b) the testimony is based on sufficient facts or data;

(c) the testimony is the product of reliable principles and methods; and

(d) the expert has reliably applied the principles and methods to the facts of the case.”

Pub. L. 93–595, § 1, Jan. 2, 1975, 88 Stat. 1937 (amended Apr. 17, 2000, eff. Dec. 1, 2000; Apr. 26, 2011, eff. Dec. 1, 2011). Ironically, the Supreme Court’s Daubert case itself, had the Manpower panel paid attention to it, reversed the Ninth Circuit for applying a standard, the so-called Frye test, which predated the adoption of the Federal Rules of Evidence in 1975. Rather than following the holding of the Daubert case, the panel got mired in its dicta about a distinction between methodology and conclusion. The Supreme Court itself abandoned this distinction a few years later in General Electric Co. v. Joiner, when it noted that

“conclusions and methodology are not entirely distinct from one another.”

522 U.S. 136, 146 (1997).

The panel of the Seventh Circuit concluded, without much real analysis, that the district court had excluded Sullivan’s opinions on a basis that implicated his conclusions and data selection, not his methodology. Id. at 19-20. The problem, of course, is that how one selects data of past performance to project future performance is part and parcel of the methodology of making the economic projection. The supposed distinction advanced by the panel is illusory, and contrary both to post-Daubert decisions and to the congressional revision of the statute, which requires attention to whether “the testimony is based on sufficient facts or data; the testimony is the product of reliable principles and methods; and the expert has reliably applied the principles and methods to the facts of the case.” Rule 702.

To make matters worse, the appellate court in Manpower attempted to justify its reversal on grounds of “[t]he latitude we afford to statisticians employing regression analysis, a proven statistical methodology used in a wide variety of contexts.” Id. at 21. Here the appellate court suggests that if expert witnesses use a statistical test or analysis, such as regression analysis, it does not matter how badly they apply the test, or how worthless their included data are. Id. at 22. According to the Manpower panel:

“the Supreme Court and this Circuit have confirmed on a number of occasions that the selection of the variables to include in a regression analysis is normally a question that goes to the probative weight of the analysis rather than to its admissibility. See, e.g., Bazemore v. Friday, 478 U.S. 385, 400 (1986) (reversing lower court’s exclusion of regression analysis based on its view that the analysis did not include proper selection of variables); Cullen v. Indiana Univ. Bd. of Trustees, 338 F.3d 693, 701-02 & n.4 (7th Cir. 2003) (citing Bazemore in rejecting challenge to expert based on omission of variables in regression analysis); In re High Fructose Corn Syrup Antitrust Litigation, 295 F.3d 651, 660-61 (7th Cir. 2002) (detailing arguments of counsel about omission of variables and other flaws in application of the parties’ respective regression analyses and declining to exclude analyses on that basis); Adams v. Ameritech Servs., Inc., 231 F.3d 414, 423 (7th Cir. 2000) (citing Bazemore in affirming use of statistical analysis based solely on correlations—in other words, on a statistical comparison that employed no regression analysis of any independent variables at all). These precedents teach that arguments about how the selection of data inputs affect the merits of the conclusions produced by an accepted methodology should normally be left to the jury.”

Id. at 22.

Again, the Seventh Circuit’s approach in Manpower is misguided. Bazemore involved a multivariate regression analysis in the context of a discrimination case.  Neither the Supreme Court nor the Fourth Circuit considered the regression at issue in Bazemore as evidence; rather the analysis was focused upon whether, within the framework of discrimination law, the plaintiffs’ regression satisfied their burden of establishing a prima facie case that shifted the burden to the defendant. No admissibility challenge was made to the regression in Bazemore under Rule 702.  Of course, the Bazemore litigation predates the Supreme Court’s decision in Daubert by several years.  Furthermore, even the Bazemore decision acknowledged that there may be

“some regressions so incomplete as to be inadmissible as irrelevant… .”

478 U.S. 385, 400 n.10 (1986).

The need for quantitative analysis of race and other suspect-class discrimination under the equal protection clause no doubt led the Supreme Court, and subsequent lower courts, to avoid looking too closely at regression analyses. Some courts, such as the Manpower panel, view Bazemore as excluding regression analysis from the gatekeeping of statistical evidence, as though regression magically survives Daubert. The better reasoned cases, however, even within the Seventh Circuit, fully apply the principles of Rule 702 to statistical inferences and analyses. See, e.g., ATA Airlines, Inc. v. Fed. Express Corp., 665 F.3d 882, 888–89 (7th Cir. 2011) (Posner, J.) (reversing on grounds that plaintiff’s regression analysis should never have been admitted), cert. denied, 2012 WL 189940 (Oct. 7, 2012); Zenith Elecs. Corp. v. WH-TV Broad. Corp., 395 F.3d 416 (7th Cir.) (affirming exclusion of expert witness opinion whose extrapolations were mere “ipse dixit”), cert. denied, 125 S. Ct. 2978 (2005); Sheehan v. Daily Racing Form, Inc., 104 F.3d 940 (7th Cir. 1997) (Posner, J.) (discussing specification error). See also Munoz v. Orr, 200 F.3d 291 (5th Cir. 2000). For a more enlightened and educated view of regression and the scope and application of Rule 702, from another Seventh Circuit panel, Judge Posner’s decision in ATA Airlines, supra, is an essential starting place. See “Judge Posner’s Digression on Regression” (April 6, 2012).
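
For readers who want to see why variable selection is methodology rather than mere “weight,” the following simulation, with invented data, shows how omitting a single correlated variable (the classic specification error discussed in Sheehan) can roughly double an estimated regression coefficient:

```python
# A toy demonstration of specification error (omitted-variable bias),
# using invented data: omitting one correlated regressor roughly
# doubles the coefficient on the variable that remains.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
experience = rng.normal(10, 3, n)
education = 12 + 0.5 * experience + rng.normal(0, 1, n)  # correlated regressors
wage = 2.0 * education + 1.5 * experience + rng.normal(0, 2, n)

def ols(y, *regressors):
    """Ordinary least squares with an intercept; returns coefficients."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

full = ols(wage, education, experience)   # correctly specified model
short = ols(wage, education)              # experience omitted

print(f"education coefficient, full model : {full[1]:.2f}")   # about 2.0
print(f"education coefficient, short model: {short[1]:.2f}")  # about 4.1, biased
```

Both runs use “regression analysis, a proven statistical methodology”; only one of them answers the question asked.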

There is yet one more flaw in the Manpower decision and its rejection of the relevancy of data quality for judicial gatekeeping.  Federal Rule of Evidence 703 specifically addresses the bases of an expert witness’s opinion testimony.  The Rule, in relevant part, provides that:

“If experts in the particular field would reasonably rely on those kinds of facts or data in forming an opinion on the subject, they need not be admissible for the opinion to be admitted.”

Here the district court had acted prudently in excluding an expert witness who accepted the assertions of new management that it had, within a very short time span, turned a company from a money loser into a money earner. As any observer of the market knows, there are too many short-term “fixes,” such as cutting personnel, selling depreciated property, and the like, to credit any such short-term data as “reasonably relied upon.” See In re Agent Orange Product Liability Lit., 611 F. Supp. 1223, 1246 (E.D.N.Y. 1985) (excluding opinions under Rule 703 of proffered expert witnesses who relied upon checklists of symptoms prepared by the litigants; “no reputable physician relies on hearsay checklists by litigants to reach a conclusion with respect to the cause of their affliction”), aff’d on other grounds, 818 F.2d 187 (2d Cir. 1987), cert. denied, 487 U.S. 1234 (1988).

Manpower represents yet another example of a Court of Appeals abrogating gatekeeping by reversing a district judge who attempted to apply the Rules and the relevant Supreme Court precedent. The panel in Manpower ignored congressional statutory enactments and precedents of its own Circuit, and it relied upon cases superseded and overruled by later Supreme Court cases. That’s regression for you.

Urging Review and Reversal, Scientists File Amicus Brief in the Harkonen Case

September 7th, 2013

Earlier this week, Professors Kenneth Rothman and Timothy Lash, and I, filed our Brief by Scientists and Academics as Amici Curiae in Harkonen v. United States. As noted previously, Dr. Harkonen has petitioned the Supreme Court for review of the Ninth Circuit’s affirmance of his conviction for wire fraud. Other amici will likely file on Monday, September 9, 2013.

Aaron Kesselheim’s Presentation on FDA Regulation of Manufacturer Speech

September 3rd, 2013

On August 5, 2013, Dr. Scott Harkonen filed his petition for a writ of certiorari with the United States Supreme Court. As noted in some previous posts, Dr. Harkonen was acquitted of misbranding, but convicted of wire fraud, for his role in issuing a press release about the results of a clinical trial of interferon gamma-1b in patients with idiopathic pulmonary fibrosis. (See Multiplicity versus Duplicity – The Harkonen Conviction; The Matrixx Motion in U.S. v. Harkonen; The (Clinical) Trial by Franz Kafka).

Dr. Harkonen’s petition presents two questions:

“1. Whether a conclusion about the meaning of scientific data, one on which scientists may reasonably disagree, satisfies the element of a “false or fraudulent” statement under the wire fraud statute, 18 U.S.C. § 1343?

2. Whether applying 18 U.S.C. § 1343 to scientific conclusions drawn from accurate data violates the First Amendment’s proscription against viewpoint discrimination, or renders the statute, as applied, unconstitutionally vague.”

Both questions are important given that the government has conceded that Dr. Harkonen’s press release accurately presented the raw data and calculated p-values. The crime, if crime it be, lay in Dr. Harkonen’s drawing a causal inference from a non-prespecified subgroup (p = 0.004) within a prespecified secondary endpoint of survival (p = 0.08), when the subgroup was clearly based upon the goals of the trial, and there was other corroborative evidence in the form of two previous trials, clinical practice, and strong mechanistic evidence.

The government argued that NO inferences could be drawn from a trial that “failed” on its primary endpoint.  The government’s embrace of this statistical orthodoxy greatly misrepresented scientific practice to the courts below.  The only “failed” trial is one that is not conducted.

There are many who would go to great lengths to distort the facts of the Harkonen case in order to demonize the pharmaceutical industry, or to arm the Justice Department with a weapon that can shut down scientific speech about pharmaceutical interventions. The expansion of the wire fraud statute, seen in the Harkonen case, to achieve these political goals will affect not only pharmaceutical company scientists, but also government and academic scientists. The standard for falsity, drawn from an outdated, tendentious, and overly rigid conception of hypothesis testing, will apply equally to non-industry scientists in False Claims Act cases. Perhaps in future posts, I can provide some good examples, on condition that any qui tam relators share their bounty with me.

Back in May, Aaron Kesselheim presented (by video) a paper, written with Michelle Mello of the Harvard School of Public Health, on “The Prospect of Continued FDA Regulation of Manufacturer Promotion in an Era of Expanding Commercial Speech.” Kesselheim went out of his way to misrepresent the facts of the Harkonen case, as part of his brief against off-label promotion.

By way of background, Aaron S. Kesselheim is a physician and a lawyer, and an Assistant Professor of Medicine at Harvard Medical School. He is also a faculty member in the Division of Pharmacoepidemiology and Pharmacoeconomics in the Department of Medicine at Brigham and Women’s Hospital. Given his position and his training in two professions, as well as the extraordinary stakes involved in allowing the government to prosecute scientists for drawing allegedly false conclusions about facts that the government concedes are accurate, Dr. Kesselheim should have exercised much greater care in checking his assertions.

Dr. Kesselheim focused primarily on the Second Circuit’s recent decision in United States v. Caronia, 703 F.3d 149 (2d Cir. 2012), which reversed a judgment of conviction for off-label promotion on First Amendment grounds. About nine minutes into his presentation, Kesselheim turned to alternative strategies for the government to use to squelch off-label promotion. One of his suggestions was to follow the model of the Harkonen prosecution, and to prosecute off-label promotion as false and misleading speech.

In his discussion of this suggested strategy, Kesselheim asserted that Dr. Harkonen had made misleading “conclusory, unsubstantiated claims for efficacy,” and “without reference to supporting evidence.” It is Kesselheim, however, who seriously misled his listeners and readers by stating that Dr. Harkonen had made “conclusory, unsubstantiated claims for efficacy.” The Press Release that was the subject of the government’s indictment accurately set out the actual count data and calculated p-values. No data were fabricated or falsified. Within the limited space and the informal context of a Press Release, Dr. Harkonen provided a substantial account of the data from InterMune’s clinical trial, and cited a previous, independent clinical trial and its extension, clinical experience, and mechanistic research on the action of interferon γ-1b. Unfortunately, it is Kesselheim who is speaking in conclusory sound bites when he ignores the context and content of the actual Press Release at issue.

Kesselheim went on to suggest that Harkonen’s statement was refuted by a “company-sponsored clinical trial showing that the drug was not effective.” This statement is not only false, but shows a flagrant disregard for statistical analysis and the data in the Harkonen case. Kesselheim implies that a clinical trial that fails to show treatment efficacy thereby shows that the treatment was not effective. That reasoning commits the fundamental error of equating a failure to reject the null hypothesis, at a specified level of significance, with acceptance of the null hypothesis.
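
A small simulation makes the fallacy plain. With invented parameters — a treatment that truly cuts mortality — a modestly sized trial will still “fail” to reach p < 0.05 a substantial fraction of the time:

```python
# A small simulation of the "failed trial" fallacy, with invented
# parameters: the treatment here truly reduces mortality, yet a
# modestly sized trial often fails to reach p < 0.05.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n_per_arm = 150
p_treated, p_placebo = 0.05, 0.14        # the true benefit is real and large
n_trials, n_failures = 10_000, 0

for _ in range(n_trials):
    deaths_t = rng.binomial(n_per_arm, p_treated)
    deaths_p = rng.binomial(n_per_arm, p_placebo)
    pooled = (deaths_t + deaths_p) / (2 * n_per_arm)
    se = np.sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
    z = (deaths_p - deaths_t) / (n_per_arm * se) if se > 0 else 0.0
    if 2 * norm.sf(abs(z)) >= 0.05:      # two-sided test "fails"
        n_failures += 1

print(f"trials 'failing' despite a real benefit: {n_failures / n_trials:.0%}")
```

In runs like this, on the order of a quarter of the simulated trials “fail” even though the benefit is real; a non-significant result is weak evidence, not disproof.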

To be sure, the prespecified secondary survival endpoint in InterMune’s clinical trial did not meet the 0.05 cutoff (it was 0.08), although the per-protocol analysis for this endpoint came in at 0.055 on a preliminary analysis of the data. When the clinical trial was fully analyzed and written up for publication in the New England Journal of Medicine, the treatment-adherent analysis for survival in the entire clinical trial yielded a p-value of 0.02, with a statistically significant hazard ratio for survival favoring the therapy:

“Analysis of the treatment-adherent cohort of patients showed an absolute reduction in the risk of death of 9 percent in the interferon gamma-1b group, as compared with the placebo group, and a relative reduction in the risk of 66 percent (5 percent of 126 patients in the interferon gamma-1b group and 14 percent of 143 patients in the placebo group died, P=0.02). The hazard ratio for death in the interferon gamma-1b group, as compared with the placebo group, was 0.3 (95 percent confidence interval, 0.1 to 0.9).”

Ganesh Raghu, Kevin K. Brown, Williamson Z. Bradford, Karen Starko, Paul W. Noble, David A. Schwartz, and Talmadge E. King, Jr., for the Idiopathic Pulmonary Fibrosis Study Group, “A Placebo-Controlled Trial of Interferon Gamma-1b in Patients with Idiopathic Pulmonary Fibrosis,” 350 New Engl. J. Med. 125, 129-30 (2004).
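
For readers who want to check the arithmetic behind the quoted passage, the reported percentages imply roughly 6 deaths among 126 treated patients and 20 among 143 placebo patients (counts back-calculated here for illustration, not taken from the paper):

```python
# Back-of-the-envelope check of the quoted NEJM figures. The death
# counts (about 6 of 126 and 20 of 143) are inferred here from the
# reported percentages; they are not taken from the paper itself.
deaths_ifn, n_ifn = 6, 126     # interferon gamma-1b arm, ~5 percent mortality
deaths_pbo, n_pbo = 20, 143    # placebo arm, ~14 percent mortality

risk_ifn = deaths_ifn / n_ifn
risk_pbo = deaths_pbo / n_pbo

arr = risk_pbo - risk_ifn         # absolute risk reduction
rrr = 1 - risk_ifn / risk_pbo     # relative risk reduction

print(f"absolute reduction in risk of death: {arr:.0%}")  # ~9 percent, as reported
print(f"relative reduction in risk of death: {rrr:.0%}")  # ~66 percent, as reported
```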

Dr. Harkonen, in his Press Release, did focus on what seems like an eminently sensible subgroup, within the survival secondary endpoint, of mild and moderate cases, which, a priori, were believed to be the patients most likely to benefit from the interferon γ-1b therapy. (What was not known before the trial was at what point in disease progression patients might no longer respond with greater survival, and hence the difficulty in setting the boundary between moderate and severe cases.) Kesselheim might argue that the interferon γ-1b clinical trial, standing alone, was inconclusive, but he certainly cannot argue truthfully that the trial showed the biological product to be ineffective. Clinical trials do not neatly divide the world of possible results into demonstrations of efficacy and demonstrations of inefficacy. Not only does the evidence come in degrees, but there is a range of “inconclusiveness” between the two extremes. Given his background, training, and experience, Kesselheim certainly should know this, and he should apologize for his inaccurate statements.

Kesselheim might well have stopped there, but he went on to acknowledge that the company-sponsored clinical trial at issue did find, in post-hoc analyses, a non-significant trend of benefit in a subset of patients. Talk of misleading speech! The p-value at issue was 0.004, uncorrected for multiple comparisons, but no one, not Kesselheim, not the government, nor anyone else, has offered any appropriate adjustment for multiple comparisons that would inflate that 0.004 to over 0.05. Kesselheim has no warrant for branding the subgroup finding “non-significant” until he shows that the p = 0.004, when appropriately adjusted (if it can be), exceeds 0.05.
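
The point can be made concrete with the crudest, and most conservative, correction — Bonferroni — which simply multiplies the raw p-value by the number of comparisons. On that approach, one would have to posit 13 or more subgroup tests before 0.004 exceeds 0.05:

```python
# The simplest and harshest multiplicity correction is Bonferroni:
# multiply the raw p-value by the number of comparisons k. The actual
# number of comparisons is the disputed unknown, so several are tried.
p_raw = 0.004

for k in (2, 5, 10, 12, 13):
    p_adj = min(1.0, k * p_raw)
    verdict = "still < 0.05" if p_adj < 0.05 else "now > 0.05"
    print(f"k = {k:>2}: Bonferroni-adjusted p = {p_adj:.3f} ({verdict})")

# Only at k = 13 or more does the adjusted p-value exceed 0.05.
```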

Kesselheim mangles other, less technical facts. He claims that the company saw a ten-fold increase in sales of interferon γ-1b for idiopathic pulmonary fibrosis. No such fact was ever, or could ever have been, established in the Harkonen case. Kesselheim claims that Dr. Harkonen admitted, in emails, that he did not really believe that the trial “demonstrated” benefit; no such emails were ever adduced at trial, and this claim seems to be part of a fictional narrative that Dr. Kesselheim has manufactured. Finally, Kesselheim harrumphs that the FDA declined to approve the drug. The company never filed a new drug application for the idiopathic pulmonary fibrosis indication; there was no application to reject. Perhaps more important is that the Press Release was issued before InterMune had made any formal submission of data to the FDA, an event that did not take place until the following year.

Kesselheim sighs that the Harkonen prosecution will be a difficult act to follow because it requires a case-by-case showing of falsity, with the necessity of expert testimony, and heavy cognitive demands on lay jurors. How ironic that Kesselheim, a lawyer and a physician, and a Harvard Medical School faculty member, buckled under the cognitive demands of his topic. Indeed, Kesselheim’s confusion is a strong argument for why the Supreme Court should put a stop to the practice of asking jurors to second guess whether a scientist has incorrectly inferred causation from accurately presented facts.

Let’s hope Dr. Harkonen gets a fair hearing in the Supreme Court.

Woodside & Davis on the Bradford Hill Considerations

August 23rd, 2013

Dr. Frank Woodside and Allison Davis have published an article on the so-called Bradford Hill criteria.  Frank C. Woodside, III & Allison G. Davis, “The Bradford Hill Criteria: The Forgotten Predicate,” 35 Thomas Jefferson L. Rev. 103 (2013).

Their short paper may be of interest to Rule 702 geeks, and students of how the law parses causal factors in litigation.

The authors argue that a “predicate” to applying the Hill criteria consists of:

  • ascertaining a clear-cut association,
  • determining the studies establishing the association are valid, and
  • satisfying the Daubert [1][sic] requirements.

Id. at 107.  Parties contending for a causal association often try to flyblow the need for statistical significance at any level, and argue that Bradford Hill did not insist upon statistical testing.  Woodside and Davis remind us that Bradford Hill was quite firm in insisting upon the need to rule out random variability as an explanation for an association:

“Our observations reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance.”

Id. at 105; see Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295 (1965). The authors correctly note that the need for study validity is fairly implied by Bradford Hill’s casual expression about “perfectly clear-cut.”

Woodside and Davis appear to acquiesce in the plaintiffs’ tortured interpretation of Bradford Hill’s speech, on which statistical significance supposedly is unimportant. Woodside & Davis at 105 & n.7 (suggesting that Bradford Hill “seemingly negates the second [the requirement of statistical significance]” when he discounts the value of significance testing, citing Bradford Hill at 299).

Woodside and Davis, however, miss the heavy emphasis that Bradford Hill actually placed upon “tests of significance”:

“No formal tests of significance can answer those questions. Such tests can, and should, remind us of the effects that the play of chance can create, and they will instruct us in the likely magnitude of those effects. Beyond that they contribute nothing to the ‘proof’ of our hypothesis.”

Bradford Hill at 299. Bradford Hill never says that statistical tests contribute nothing to proving an hypothesis; rather, his emphasis is on the insufficiency of statistical tests alone to establish causality. Bradford Hill’s “beyond that” language clearly stakes out the preliminary but necessary role of ruling out the play of chance before proceeding to consider the causal factors.

Passing beyond their exegetical fumble, Woodside and Davis proceed to discuss the individual Bradford Hill considerations and how they have fared in the crucible of Rule 702. Their discussion may be helpful to lawyers who want to track the individual considerations, and how they have been treated, or dismissed, by trial courts charged with gatekeeping expert witness opinion testimony.

There is another serious problem in the Woodside and Davis paper.  The authors describe risk ratios and the notion of “confidence intervals”:

“A confidence interval provides both the relative risk found in the study and a range (interval) within which the risk would likely fall if the study were repeated numerous times.32 … As such, risk measures used in conjunction with confidence intervals are critical in establishing a perfectly clear-cut association when it comes to examining the results of a single study.35

Woodside & Davis at 110. The authors cite to the Reference Manual on Scientific Evidence (3d ed. 2011), but they fail to catch important nuances of the definition of a confidence interval. The interval obtained from a given study is not the range within which the “risk would likely fall if the study were repeated… .” Rather, if the study were repeated many times on the same population, with the same sample size, 95% of the resulting intervals would capture the true risk. As for the obtained interval, the true risk is either within it, or not, and no probability value attaches to the likelihood that the true value lies within that particular interval.

It is a mystery why lawyers would bother to define something like the confidence interval, and then do it incorrectly.  Here is how Professors Finkelstein and Levin define the confidence interval in their textbook on statistics:

“A confidence interval for a population proportion P is a range of values around the proportion observed in a sample with the property that no value in the interval would be considered unacceptable as a possible value for P in light of the sample data.”

Michael Finkelstein & Bruce Levin, Statistics for Lawyers 166-67 (2d ed. 2001).   This text explains why and where Woodside and Davis went astray:

“It is the confidence limits PL and PU that are random variables based on the sample data. Thus, a confidence interval (PL, PU) is a random interval, which may or may not contain the population parameter P. The term “confidence” derives from the fundamental property that, whatever the true value of P, the 95% confidence interval will contain P within its limits 95% of the time, or with 95% probability. This statement is made only with reference to the general property of confidence intervals and not to a probabilistic evaluation of its truth in any particular instance with realized values of PL and PU.”

Id. at 167-71.
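
The coverage property that Finkelstein and Levin describe is easy to verify by simulation. In the sketch below (with an arbitrarily chosen true proportion and sample size), the true value P is fixed, the intervals vary from sample to sample, and about 95% of them cover P:

```python
# A simulation of the coverage property described by Finkelstein &
# Levin, with arbitrarily chosen parameters: P is fixed; the intervals
# are random; about 95% of them cover P.
import numpy as np

rng = np.random.default_rng(42)
true_p, n, n_repeats = 0.30, 200, 10_000
z = 1.96                                        # 95% normal quantile

n_covered = 0
for _ in range(n_repeats):
    p_hat = rng.binomial(n, true_p) / n               # sample proportion
    half = z * np.sqrt(p_hat * (1 - p_hat) / n)       # Wald half-width
    if p_hat - half <= true_p <= p_hat + half:        # interval covers P?
        n_covered += 1

print(f"coverage across {n_repeats} samples: {n_covered / n_repeats:.1%}")
```

Each individual interval either covers the true proportion or it does not; the “95%” describes the long-run behavior of the procedure, not the probability that any one realized interval contains P.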


[1] Surely the time has come to stop referring to the Daubert factors and to acknowledge that the Daubert case was just one small step in the maturation of evidence law. The maturation included three additional Supreme Court cases, many lower court decisions, and a statutory revision to Federal Rule of Evidence 702, in 2000. The Daubert factors hardly give due consideration to the depth and breadth of the law in this area.

Securities Fraud vs Wire Fraud

July 29th, 2013

Pharmaceutical manufacturers are particularly vulnerable to securities fraud claims arising from the manufacturers’ pronouncements about safety or efficacy, the evidence for which is often statistical in nature.  Safety claims may involve complex data sets, both from observational studies and clinical trials.  Efficacy claims are typically based upon clinical trial data.

Publicly traded manufacturers may find themselves caught between competing securities regulations.  In evaluating safety or efficacy data, manufacturers will often consult with an outside science advisory board, or report to regulatory agencies.  Securities regulations specify that any disclosure of confidential inside information to an outsider triggers an obligation of prompt public disclosure of that information.[1]  Companies also routinely seek to keep investors informed of research and marketing developments.  Generally, manufacturers will make their public disclosures through widely circulated press releases.[2]  Not surprisingly, disgruntled investors may challenge the accuracy of the press releases, when the product or drug turns out to be less efficacious or more harmful than represented in the press release.  These challenges, brought under the securities laws, often are maintained in parallel to product liability actions, sometimes in the same multi-district litigation.

Securities laws require accurate disclosure of all material information.[3]  Rule 10b-5 of the Securities Exchange Commission (SEC) prohibits any person from making “any untrue statement of material fact” or from omitting “a material fact necessary in order to make the statements made, in light of the circumstances under which they were made, not misleading.”[4]

A prima facie case of securities fraud requires that plaintiff allege and establish, among other things, a material misrepresentation or omission.[5] The obligations to speak and to speak accurately have opened manufacturers to second guessing in their analyses of safety and efficacy data. In most securities fraud cases, courts have given manufacturers a wide berth by rejecting differences of opinion about the proper interpretation of studies as demonstrating fraud under the securities regulations.[6] This latitude has been given both in judgments about what test procedures to use and in how best to interpret data.[7] In Padnes v. Scios Nova Inc., the manufacturer was testing a drug for treatment of acute kidney failure. Scios Nova issued a press release after its phase II trial, to announce a statistically significant reduction in patients’ need for dialysis. When the early phase III results failed to confirm this result, plaintiffs sued Scios Nova for not disclosing the lack of statistically significant outcomes on other measures of kidney function, as well as for its interpretation of the dialysis results as statistically significant.[8] The trial court dismissed the complaint.[9]

Several securities fraud cases have turned on investor dissatisfaction with how companies interpreted clinical trial subgroup data. In Noble Asset Management v. Allos Therapeutics, Inc.,[10] the company issued a press release, noting no statistically significant overall survival advantage from a drug for breast cancer, but also noting a statistically significant increase in survival in a non-prespecified subgroup of patients with metastatic breast cancer.[11] The plaintiff investors claimed that the company should have disclosed that the FDA was unlikely to approve an indication based upon an ad hoc subgroup analysis, but the trial court rejected this claim because FDA policy on drug approvals is public and well known.[12] The plaintiffs also complained that the press release referred to statistically significant results from a Cox multiple regression analysis rather than the log-rank (non-parametric survival) analysis required by the FDA. The trial court rejected this claim as well, opining that the analysis was not misleading when the company correctly reported the raw data and the results of the Cox multiple regression analysis.[13]

Two recent appellate decisions emphasize the courts’ unwillingness to scrutinize the contested statistical methodology that underlies plaintiffs’ claims of misrepresentation.  In In re Rigel Pharmaceuticals, Inc. Securities Litigation, the plaintiff investors were dissatisfied, not with reporting of subgroups, but rather with the failure of the company to report geographic subgroup results, as well as its use of allegedly improper statistical tests and its failure to account for multiple comparisons.[14]

The Ninth Circuit affirmed the dismissal of the complaint. The appellate court held that allegations of “statistically false p-values” were not sufficient; plaintiffs must allege facts that explain why the difference between two statements “is not merely the difference between two permissible judgments, but rather the results of a falsehood.”[15] Alleging that a company should have used a different statistical method to analyze the data from its clinical trial is not sufficient to raise an issue of factual falsity under the securities fraud statute and regulations.[16] The Court explained that the burden was on plaintiffs to plead and prove that the difference between two statistical statements “is not merely the difference between two permissible judgments, but rather the result of a falsehood.”[17] The Court characterized the plaintiffs’ allegations as concerning judgments about which statistical tests or methods are appropriate, and not false statements. Furthermore, the Court emphasized that the company’s statistical method was called for in the trial protocol, and was selected before the data were unblinded and provided to the company.[18]

In Kleinman v. Elan Corporation,[19] the Second Circuit affirmed the dismissal of a securities fraud class action against two pharmaceutical joint venturers, which had issued a challenged press release on interim phase II clinical trial results for bapineuzumab, a drug for mild to moderate Alzheimer’s disease. The press release at issue announced “top line” findings and promised a full review at an upcoming international conference.[20] According to the release, the clinical trial data did not show a statistically significant benefit on the primary efficacy end point, but “[p]ost-hoc analyses did show statistically significant and clinically meaningful benefits in important subgroups.”[21]

The plaintiffs in Kleinman complained that the clinical trial had started with crucial imbalances between drug and placebo arms, thus indicating a failure in randomization, and that the positive results had come from impermissible post-hoc subgroup analyses.[22] The appellate court appeared not to take the randomization issue seriously, and rejected the notion that statements can be false when they represent a defendant company’s reasonable interpretation of the data, even when that interpretation is later shown to be false[23]:

“At bottom, Kleinman simply has a problem with using post-hoc analyses as a methodology in pharmaceutical studies.  Kleinman cites commentators who liken post-hoc analyses to moving the goal posts or shooting an arrow into the wall and then drawing a target around it. Nonetheless, when it is clear that a post-hoc analysis is being used, it is understood that those results are less significant and should have less impact on investors.  Our job is not to evaluate the use of post-hoc analysis in the scientific community; the FDA has already done so.”

In United States v. Harkonen,[24] the government turned the law of statistical analyses in securities fraud on its head, when it prosecuted a pharmaceutical company executive for his role in issuing a press release on clinical trial data. The jury acquitted Dr. Harkonen on a charge of misbranding,[25] but convicted on a single count of wire fraud.[26] Dr. Harkonen’s crime? Bad statistical practice.

The government conceded that the data represented in the press release were accurate, as were the calculated p-values. The chargeable offense lay in Dr. Harkonen’s describing the clinical trial results as “demonstrating a survival benefit” of the biological product (interferon γ-1b) in a clinical trial subgroup of patients with mild to moderate idiopathic pulmonary fibrosis. The p-value for the subgroup was 0.004, with an effect size of a 70% reduction in mortality. The subgroup, however, was not prespecified, and was not clearly labeled as a post-hoc analysis. The trial had not achieved statistical significance on its primary end point.

In prosecuting Dr. Harkonen, the government offered no expert witness opinion. Instead, it relied upon a member of the clinical trial’s data safety monitoring board, who advanced a strict, orthodox view that if the primary end point of a trial “failed,” then the data could not be relied upon to infer any meaningful causal connection within secondary end points, let alone non-prespecified end points. The prespecified secondary survival end point showed a 40 percent reduction in mortality, p = 0.08 (which shrank to 0.055 on an intent-to-treat analysis). The press release also relied upon a previous small clinical trial that showed a benefit in survival at five years, with the therapy group at 77.8%, compared with 16.7% in the control group, p = 0.009.

The trial court accepted the government’s claim that p-values less than 0.05 were something of “magic numbers,”[27] and rejected post-trial motions for acquittal. Dr. Harkonen’s use of “demonstrate” to describe a therapeutic benefit was, in the trial court’s view, fraudulent, because of the lack of “statistical significance” on the primary end point, and the multiple testing with respect to the secondary survival end point. The Ninth Circuit affirmed the judgment of conviction in an unpublished per curiam opinion.[28]

In contrast to the criminal wire fraud prosecution, the civil fraud actions against Dr. Harkonen and the company were dismissed.[29] The prosecution and the judgment in United States v. Harkonen are at odds with the latitude afforded companies in securities fraud cases. Furthermore, the case cannot be fairly squared with the position that the government took as an amicus curiae in Matrixx Initiatives, Inc. v. Siracusano,[30] where the Solicitor General’s office, along with counsel for the Food and Drug Division of the Department of Health & Human Services, in their zeal to assist plaintiffs on claims against an over-the-counter pharmaceutical manufacturer, disclaimed the necessity, or even the importance, of statistical significance[31]:

“[w]hile statistical significance provides some indication about the validity of a correlation between a product and a harm, a determination that certain data are not statistically significant … does not refute an inference of causation.”

Suddenly, when prosecuting an unpopular pharmaceutical company executive, the government’s flexibility evaporated. Government duplicity was a much greater problem than statistical multiplicity in Harkonen.[32]


[1] Securities and Exchange Comm’n Regulation FD, 17 C.F.R. § 243.100 (requiring prompt public disclosure of any confidential, material inside information after it has been disclosed to non-insiders).

[2] Selective Disclosure and Insider Trading, Securities Act Release No. 7881, Fed. Sec. L. Rep. (CCH) ¶ 86,319 (Aug. 15, 2000) (“As a general matter, acceptable methods of public disclosure for purposes of Regulation FD will include press releases distributed through a widely circulated news or wire service . . . .”).

[3] Section 10(b) of the Exchange Act of 1934 prohibits any person “[t]o use or employ, in connection with the purchase or sale of any security . . . any manipulative or deceptive device or contrivance in contravention of such rules and regulations as the [Securities and Exchange Commission] may prescribe.”  15 U.S.C. § 78j(b).

[4] 17 C.F.R. § 240.10b-5.

[5] Stoneridge Inv. Partners LLC v. Scientific-Atlanta, 552 U.S. 148, 157 (2008) (“(1) a material misrepresentation or omission []; (2) scienter; (3) a connection between the misrepresentation or omission and the purchase or sale of a security; (4) reliance upon the misrepresentation or omission; (5) economic loss; and (6) loss causation.”)

[6] In re MedImmune, Inc. Sec. Litig., 873 F. Supp. 953, 965 (D. Md. 1995). The biological product at issue in this case was Respivir, a polyclonal antibody product, which “significantly” reduced the frequency of hospitalization for respiratory syncytial virus (RSV). Plaintiffs alleged “flaws” in study design, but the trial court appeared to interpret the statistical significance to mean that Respivir was “unquestionably efficacious.” Id. at 967.

[7] See, e.g., Padnes v. Scios Nova Inc., No. C 95-1693 MHP, 1996 WL 539711, at *5 (N.D. Cal. Sept. 18, 1996) (Patel, J.) [cited herein as Padnes]. See also DeMarco v. DepoTech Corp., 149 F. Supp. 2d 1212, 1225 (S.D. Cal. 2001) (“Although plaintiffs have established a legitimate difference in opinion as to the proper statistical analysis, they have hardly stated a securities fraud claim.”); In re Adolor Corp. Sec. Litig., 616 F. Supp. 2d 551, 568 n.15 (E.D. Pa. 2009) (allegations as to how data should have been analyzed do not support claims for false or misleading statements).

[8] Padnes at *2.

[9] Id. at *10.

[10] 2005 WL 4161977 (D. Colo. Oct. 20, 2005).

[11] Id. at *1.

[12] Id. at *6-7.

[13] Id. at *11.

[14] 2010 WL 8816155 (N.D. Cal. Aug. 24, 2010).

[15] 697 F.3d 869, 877 (9th Cir. 2012) (internal citations omitted), aff’g 2010 WL 8816155 (N.D. Cal. Aug. 24, 2010).

[16] Id. at 877-78.

[17] Id. at 878.

[18] Id. (“Because there are many ways to statistically analyze data, it is necessary to choose the statistical methodology before seeing the data that is collected during the trial; otherwise someone can manipulate the unblinded data to obtain a favorable result.”), citing and attempting to distinguish United States v. Harkonen, 2010 WL 2985257, at *4 (N.D. Cal. July 27, 2010).

[19] 706 F.3d 145 (2d Cir. 2013).

[20] Id. at 149.

[21] Id. at 149-50 (also noting that the press release provided a “preliminary analysis,” which might be less favorable upon further analysis).

[22] Id. at 150.

[23] Id. at 154-55 & 155n.11 (citing and quoting FDA Center for Drug Evaluation and Research:  E9 Statistical Principles for Clinical Trials, 63 Fed. Reg. 49583, 49595 (Sept. 16, 1998), that post-hoc analyses are exploratory and “unlikely” to be accepted as support of efficacy.)

[24] United States v. Harkonen, 2010 WL 2985257 (N.D. Cal. 2010) (Patel, J.) (denying defendant’s post-trial motions to dismiss the indictment, for acquittal, or for a new trial). Sometimes judges are looking for bright lines in the wrong places.

[25] 21 U.S.C. §§ 331(k), 333(a)(2), 352(a).

[26] 18 U.S.C. § 1343.

[27] United States v. Harkonen, 2010 WL 2985257, at *5 (N.D. Cal. 2010).

[28] United States v. Harkonen, 2013 WL 782354 (9th Cir. 2013).

[29] In re Actimmune Marketing Litig., 2010 WL 3463491 (N.D. Cal. Sept. 1, 2010), aff’d,  464 Fed.Appx. 651 (9th Cir. 2011).

[30] 131 S. Ct. 1309 (2011).

[31] Brief for the United States as Amicus Curiae Supporting Respondents, in Matrixx Initiatives, Inc. v. Siracusano, 2010 WL 4624148, at *14 (Nov. 12, 2010).

[32] Dr. Harkonen is expected to petition the Supreme Court for certiorari on statutory and constitutional grounds.  See Alex Kozinski & Stuart Banner, “Who’s Afraid of Commercial Speech?” 76 VA. L. REV. 627, 635 (1990) (“[T]here are many varieties of noncommercial speech that are just as objective as paradigmatic commercial speech and yet receive full first amendment protection. Scientific speech is the most obvious; much scientific expression can easily be labeled true or false, but we would be shocked at the suggestion that it is therefore entitled to a lesser degree of protection. If you want, you can proclaim that the sun revolves around the earth, that the earth is flat, that there is no such thing as nitrogen, that flounder smoke cigars, that you have fused atomic nuclei in your bathtub — you can spout any nonsense you want, and the government can’t stop you.”).

 

Power in the Reference Manual for Scientific Evidence

June 15th, 2013

The Third Edition of the Reference Manual on Scientific Evidence (2011) [RMSE3ed] treats statistical power in three of its chapters, those on statistics, epidemiology, and medical testimony.  Unfortunately, the treatment is not always consistent.

The chapter on statistics has been consistently among the best and most frequently ignored content of the three editions of the Reference Manual.  The most recent edition offers a good introduction to basic concepts of sampling, random variability, significance testing, and confidence intervals.  David H. Kaye & David A. Freedman, “Reference Guide on Statistics,” in RMSE3ed 209 (2011).  Kaye and Freedman provide an acceptable non-technical definition of statistical power:

“More precisely, power is the probability of rejecting the null hypothesis when the alternative hypothesis … is right. Typically, this probability will depend on the values of unknown parameters, as well as the preset significance level α. The power can be computed for any value of α and any choice of parameters satisfying the alternative hypothesis. Frequentist hypothesis testing keeps the risk of a false positive to a specified level (such as α = 5%) and then tries to maximize power. Statisticians usually denote power by the Greek letter beta (β). However, some authors use β to denote the probability of accepting the null hypothesis when the alternative hypothesis is true; this usage is fairly standard in epidemiology. Accepting the null hypothesis when the alternative holds true is a false negative (also called a Type II error, a missed signal, or a false acceptance of the null hypothesis).”

Id. at 254 n.106.

The definition is not, however, without problems. First, it introduces a nomenclature issue likely to confuse judges and lawyers. Kaye and Freedman use β to denote statistical power, but they acknowledge that epidemiologists use β to denote the probability of a Type II error. And indeed, both the epidemiology and the medical testimony chapters use β for the Type II error rate, and thus denote power as the complement of β, or (1 − β). See Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” in RMSE3ed 549, 582, 626; John B. Wong, Lawrence O. Gostin, and Oscar A. Cabrera, “Reference Guide on Medical Testimony,” in RMSE3ed 687, 724. This confusion in nomenclature is regrettable, given the difficulty many lawyers and judges seem to have in following discussions of statistical concepts.

Second, the reason for introducing the confusion about β is doubtful. Kaye and Freedman suggest that statisticians usually denote power by β, but they offer no citations. A quick review (not necessarily complete or even a random sample) suggests that many modern statistics texts denote power as (1 − β). See, e.g., Richard D. De Veaux, Paul F. Velleman, and David E. Bock, Intro Stats 545-48 (3d ed. 2012); Rand R. Wilcox, Fundamentals of Modern Statistical Methods 65 (2d ed. 2010). At the end of the day, there really is no reason for the conflicting nomenclature and the likely confusion it engenders. Indeed, the duplicative handling of statistical power, and of other concepts, suggests that it is time to eliminate the repetitive discussions, in favor of one clear, thorough discussion in the statistics chapter.
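For readers trying to keep the nomenclature straight, the conventional textbook definitions can be stated compactly (set out here only as a reference point):

$$\alpha = \Pr(\text{reject } H_0 \mid H_0 \text{ true}), \quad \beta = \Pr(\text{fail to reject } H_0 \mid H_1 \text{ true}), \quad \text{power} = 1 - \beta.$$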

Third, Kaye and Freedman problematically refer to β as the probability of accepting the null hypothesis, when elsewhere they more carefully instruct that a non-significant finding results in not rejecting the null hypothesis, as opposed to accepting it. Id. at 253. See also Daniel Rubinfeld, “Reference Guide on Multiple Regression,” in RMSE3ed 303, 321 (describing a p-value > 5% as leading to failing to reject the null hypothesis).

Fourth, Kaye and Freedman’s discussion of power, unlike most of their chapter, offers advice that is controversial and unclear:

“On the other hand, when studies have a good chance of detecting a meaningful association, failure to obtain significance can be persuasive evidence that there is nothing much to be found.”

RMSE3ed at 254. Note that the authors leave open what a legally or clinically meaningful association is, and thus offer no real guidance to judges on how to evaluate power after data are collected and analyzed. As Professor Sander Greenland has argued, in legal contexts, this reliance upon observed power (as opposed to power as a guide in determining appropriate sample size in the planning stages of a study) is arbitrary and “unsalvageable as an analytic tool.” See Sander Greenland, “Nonsignificance Plus High Power Does Not Imply Support Over the Alternative,” 22 Ann. Epidemiol. 364, 364 (2012).
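Greenland’s objection can be stated in a formula. Under the usual normal approximation, a study whose estimator has standard error SE has, against an alternative effect θ₁ and a two-sided test at level α, approximate power (ignoring the negligible far tail):

$$\text{power}(\theta_1) \approx \Phi\!\left(\frac{|\theta_1|}{\mathrm{SE}} - z_{1-\alpha/2}\right).$$

The formula makes the arbitrariness plain: power is a function of the analyst’s chosen θ₁. Select an alternative close to the null and any completed study is “underpowered”; select one far away and the same study is “well powered.” The data themselves do not pick θ₁, which is Greenland’s point.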

The chapter on epidemiology offers similar controversial advice on the use of power:

“When a study fails to find a statistically significant association, an important question is whether the result tends to exonerate the agent’s toxicity or is essentially inconclusive with regard to toxicity.93 The concept of power can be helpful in evaluating whether a study’s outcome is exonerative or inconclusive.94 The power of a study is the probability of finding a statistically significant association of a given magnitude (if it exists) in light of the sample sizes used in the study. The power of a study depends on several factors: the sample size; the level of alpha (or statistical significance) specified; the background incidence of disease; and the specified relative risk that the researcher would like to detect.95 Power curves can be constructed that show the likelihood of finding any given relative risk in light of these factors. Often, power curves are used in the design of a study to determine what size the study populations should be.96”

Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” in RMSE3ed 549, 582. Although the authors correctly emphasize the need to specify an alternative hypothesis, their discussion offers no guidance on how that alternative should be selected in legal contexts. The suggestion that power curves can be constructed is, of course, true, but irrelevant unless courts know where on the power curve they should be looking. The authors are correct that power is used to determine adequate sample size under specified conditions, but the use of power curves in this setting is today rather uncommon. Investigators select a level of power corresponding to an acceptable Type II error rate, and an alternative hypothesis that would be clinically meaningful for their research, in order to determine their sample size. Translating clinical into legal meaningfulness is not always straightforward.
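The planning use of power that the chapter describes can be made concrete with a short calculation. The following sketch, in Python with SciPy, uses the standard normal-approximation sample-size formula for comparing two proportions; the background incidence, relative risks, and error rates are all hypothetical:

    from scipy.stats import norm

    def n_per_group(p0, rr, alpha=0.05, power=0.80):
        """Approximate subjects per group to detect relative risk rr
        against background incidence p0, two-sided test at level alpha."""
        p1 = rr * p0
        z_a = norm.ppf(1 - alpha / 2)   # critical value of the test
        z_b = norm.ppf(power)           # z-score for the desired power
        variance = p0 * (1 - p0) + p1 * (1 - p1)
        return (z_a + z_b) ** 2 * variance / (p1 - p0) ** 2

    for rr in (1.2, 1.5, 2.0):
        print(f"RR = {rr}: about {n_per_group(0.02, rr):,.0f} per group")

On these assumed inputs, detecting a doubling of risk takes roughly 1,100 subjects per group, while detecting a 20% increase takes roughly 21,000; the steepness of that trade-off is what drives study-size planning.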

In a footnote, the authors of the epidemiology chapter note that Professor Rothman has been “one of the leaders in advocating the use of confidence intervals and rejecting strict significance testing.” RMSE3ed at 579 n.88. What the chapter fails to mention, however, is that Rothman has also been outspoken in rejecting the post-hoc power calculations that the epidemiology chapter seems to invite:

“Standard statistical advice states that when the data indicate a lack of significance, it is important to consider the power of the study to detect as significant a specific alternative hypothesis. The power of a test, however, is only an indirect indicator of precision, and it requires an assumption about the magnitude of the effect. In planning a study, it is reasonable to make conjectures about the magnitude of an effect to compute study-size requirements or power. In analyzing data, however, it is always preferable to use the information in the data about the effect to estimate it directly, rather than to speculate about it with study-size or power calculations (Smith and Bates, 1992; Goodman and Berlin, 1994; Hoenig and Heisey, 2001). Confidence limits and (even more so) P-value functions convey much more of the essential information by indicating the range of values that are reasonably compatible with the observations (albeit at a somewhat arbitrary alpha level), assuming the statistical model is correct. They can also show that the data do not contain the information necessary for reassurance about an absence of effect.”

Kenneth Rothman, Sander Greenland, and Timothy Lash, Modern Epidemiology 160 (3d ed. 2008).  See also Kenneth J. Rothman, “Significance Questing,” 105 Ann. Intern. Med. 445, 446 (1986) (“[Simon] rightly dismisses calculations of power as a weak substitute for confidence intervals, because power calculations address only the qualitative issue of statistical significance and do not take account of the results already in hand.”)

The selective, incomplete scholarship of the epidemiology chapter on the issue of statistical power is not only unfortunate, but it distorts the authors’ evaluation of the sparse case law on the issue of power.  For instance, they note:

“Even when a study or body of studies tends to exonerate an agent, that does not establish that the agent is absolutely safe. See Cooley v. Lincoln Elec. Co., 693 F. Supp. 2d 767 (N.D. Ohio 2010).  Epidemiology is not able to provide such evidence.”

RMSE3ed at 582 n.93; id. at 582 n.94 (“Thus, in Smith v. Wyeth-Ayerst Labs. Co., 278 F. Supp. 2d 684, 693 (W.D.N.C. 2003), and Cooley v. Lincoln Electric Co., 693 F. Supp. 2d 767, 773 (N.D. Ohio 2010), the courts recognized that the power of a study was critical to assessing whether the failure of the study to find a statistically significant association was exonerative of the agent or inconclusive.”)

Here Green, Freedman, and Gordis shift the burden to the defendant and make the burden one of absolute certainty in the product’s safety. That is not a legal standard. The cases they cite amplify the error. In Cooley, for instance, the defense expert would have opined that welding fume exposure did not cause parkinsonism or Parkinson’s disease. Although the expert had not conducted a meta-analysis, he had reviewed the confidence intervals around the point estimates of the available studies. Many of the point estimates were at or below 1.0, and in some cases, the upper bound of the confidence interval excluded 1.0. The trial court expressed its concern that the expert witness had inferred “evidence of absence” from “absence of evidence.” Cooley v. Lincoln Elec. Co., 693 F. Supp. 2d 767, 773 (N.D. Ohio 2010). This concern, however, was misguided, given that many studies had tested the claimed association, and that virtually every case-control and cohort study had found risk ratios at or below 1.0, or very close to 1.0. What the court in Cooley, and the authors of the epidemiology chapter in RMSE3ed, lost sight of is that when a hypothesis is repeatedly tested, with failures to reject the null hypothesis, point estimates at or very close to 1.0, and narrow confidence intervals, the claimed association is probably incorrect. See, e.g., Anthony J. Swerdlow, Maria Feychting, Adele C. Green, Leeka Kheifets, David A. Savitz, International Commission for Non-Ionizing Radiation Protection Standing Committee on Epidemiology, “Mobile Phones, Brain Tumors, and the Interphone Study: Where Are We Now?” 119 Envt’l Health Persp. 1534, 1534 (2011) (“Although there remains some uncertainty, the trend in the accumulating evidence is increasingly against the hypothesis that mobile phone use can cause brain tumors in adults.”).

The Cooley court’s comments have some validity when applied to a single study, but not to the impressive body of exculpatory epidemiologic evidence that pertains to welding fume and Parkinson’s disease.  Shortly after the Cooley case was decided, a published meta-analysis of welding fume or manganese exposure demonstrated a reduced level of risk for Parkinson’s disease among persons occupationally exposed to welding fumes or manganese.  James Mortimer, Amy Borenstein, and Lorene Nelson, “Associations of welding and manganese exposure with Parkinson disease: Review and meta-analysis,” 79 Neurology 1174 (2012).

Improper Claims That Studies Lack Power, Made Without Specifying An Alternative Hypothesis

June 14th, 2013

The Misuse of Power in the Courts

A claim that a study has low power is meaningless unless both the alternative hypothesis and the level of significance are specified. See Sander Greenland, “Nonsignificance Plus High Power Does Not Imply Support Over the Alternative,” 22 Ann. Epidemiol. 364 (2012); Sander Greenland & Charles Poole, “Problems in common interpretations of statistics in scientific articles, expert reports, and testimony,” 51 Jurimetrics J. 113, 121-22 (2011).

Power can always be made to appear low by selecting an alternative hypothesis sufficiently close to the null. A study of risk ratios that has high power against an alternative hypothesis of 2.0 may have very low power against an alternative of 1.1. Because risk ratios greater than two are often used to attribute specific causation, measuring the power of a study against an alternative hypothesis of a doubling of risk might well be a reasonable approach in some cases. For instance, in Miller v. Pfizer, 196 F. Supp. 2d 1062, 1079 (D. Kan. 2002), aff’d, 356 F.3d 1326 (10th Cir.), cert. denied, 543 U.S. 917 (2004), the trial court’s Rule 706 expert witness calculated that a pooled analysis of suicidality in clinical trial data of an anti-depressant had greater than 90% power to detect a doubling of risk. Report of John Concato, M.D., 2001 WL 1793169, *9 (D. Kan. 2001). Unless a court is willing to specify the level at which it would find the risk ratio unhelpful or not probative, such as a relative risk of two, power analyses of completed studies are not particularly useful.

Plaintiffs’ counsel rightly complain when defendants claim that a study with a statistically “non-significant” risk ratio greater than 1.0 has no probative value. Although random error (or bias and confounding) may account for the increased risk, the risk may be real. If studies consistently show an increased risk, even though all the studies have reported p-values > 5%, meta-analytic approaches may well help rule out chance as a likely explanation for the increased risk. A complaint that a study is underpowered, however, without more, does not help plaintiffs establish an association; nor does the complaint establish that the study provides no useful information.
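The meta-analytic point can be sketched in a few lines of Python: a fixed-effect, inverse-variance pooling of log risk ratios. The study results below are invented solely to show the mechanics, three individually non-significant elevations pooling into a statistically significant estimate:

    import math
    from scipy.stats import norm

    # Invented results: (risk ratio, 95% CI lower bound, upper bound);
    # each interval includes 1.0, so no study is significant on its own.
    studies = [(1.3, 0.9, 1.9), (1.4, 0.8, 2.4), (1.25, 0.95, 1.65)]

    weights, weighted_logs = [], []
    for rr, lo, hi in studies:
        se = (math.log(hi) - math.log(lo)) / (2 * 1.96)  # SE from CI width
        weights.append(1 / se**2)                        # inverse-variance weight
        weighted_logs.append(math.log(rr) / se**2)

    pooled_log = sum(weighted_logs) / sum(weights)
    pooled_se = (1 / sum(weights)) ** 0.5
    p = 2 * norm.sf(abs(pooled_log / pooled_se))         # two-sided p-value
    print(f"pooled RR = {math.exp(pooled_log):.2f}, p = {p:.3f}")

Here the pooled risk ratio is about 1.29, with p ≈ 0.02, illustrating how consistent but individually “non-significant” studies can, together, make chance an unlikely explanation.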

The power of a study depends upon several variables: the magnitude of the effect posited by the alternative hypothesis, the sample size, the expected value and its variance, and the acceptable probability of false-positive findings, reflected in the pre-specified significance level, α, at which the study’s findings would be interpreted as unlikely to be consistent with the null hypothesis. The lower α is set, the lower will be the power of a test or a study, all other things being equal. Similarly, moving from a two-tailed to a one-tailed test of significance will increase power, as the comparison below illustrates. Courts have acknowledged that both Type I and Type II errors, and the corresponding α and β, are important, but they have overlooked that Type II errors are usually less relevant to the litigation process. See, e.g., DeLuca v. Merrell Dow Pharmaceuticals, Inc., 911 F.2d 941, 948 (3d Cir. 1990). A single study that failed to show a statistically significant difference in the outcome of interest does not support a conclusion that the outcome is not causally related to the exposure under study. In products liability litigation, the parties are typically not assigned a burden of proving the absence of causation.
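The one-tail/two-tail point follows directly from the critical values. At α = 0.05, a two-sided test rejects at |z| ≥ 1.96, while a one-sided test rejects at z ≥ 1.645, so under the usual normal approximation, for any positive alternative θ₁ (measured against the study’s standard error, SE):

$$\text{power}_{\text{two-sided}} \approx \Phi\!\left(\frac{\theta_1}{\mathrm{SE}} - 1.96\right) \;<\; \Phi\!\left(\frac{\theta_1}{\mathrm{SE}} - 1.645\right) \approx \text{power}_{\text{one-sided}}.$$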

In the Avandia litigation, plaintiffs’ key claim is that the medication, an oral anti-diabetic, causes heart attacks, even though none of the several dozen clinical trials found a statistically significant increased risk. Plaintiffs’ expert witnesses argued that all the clinical trials of Avandia were “underpowered,” and thus the failure to find an increased risk was a Type II (false-negative) error that resulted from the small size of the clinical trials. The Avandia MDL court, considering Rule 702 challenges to plaintiffs’ expert witness opinions, accepted this argument:

“If the sample size is too small to adequately assess whether the substance is associated with the outcome of interest, statisticians say that the study lacks the power necessary to test the hypothesis. Plaintiffs’ experts argue, among other points, that the RCTs [randomized controlled trials] upon which GSK relies are all underpowered to study cardiac risks.”

In re Avandia Mktg., Sales Practices & Prods. Liab. Litig., 2011 WL 13576, at *2 (E.D. Pa. 2011) (emphasis in original). The Avandia MDL court failed to realize that the power argument was empty without a specification of an alternative hypothesis. For instance, in one of the larger trials of Avandia, the risk ratio for heart attack was a statistically non-significant 1.14, with a 95% confidence interval that spanned 0.80 to 1.63. P.D. Home, et al., “Rosiglitazone Evaluated for Cardiovascular Outcomes in Oral Agent Combination Therapy for Type 2 Diabetes (RECORD),” 373 Lancet 2125 (2009). This trial, standing alone, thus had excellent power against an alternative hypothesis that Avandia doubled the risk of heart attacks; such an alternative hypothesis would clearly be rejected based upon the RECORD trial. On the other hand, an alternative hypothesis of 1.2 would not be. The confidence interval, by quantifying random error, conveys the results reasonably compatible with the study estimate; the claim of “low power” against an unspecified alternative hypothesis conveys nothing.
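The arithmetic behind this point can be reproduced from the published summary estimate alone. The sketch below, in Python with SciPy, back-computes the standard error of the log risk ratio from the reported 95% interval, an approximation, since the trial’s actual analysis was more complex:

    import math
    from scipy.stats import norm

    # Reported summary estimate: RR 1.14, 95% CI 0.80 to 1.63.
    se = (math.log(1.63) - math.log(0.80)) / (2 * 1.96)

    def power(alt_rr, alpha=0.05):
        """Approximate power of a two-sided test against a true RR of alt_rr."""
        return norm.cdf(math.log(alt_rr) / se - norm.ppf(1 - alpha / 2))

    print(f"power against RR 2.0: {power(2.0):.0%}")  # about 97%
    print(f"power against RR 1.2: {power(1.2):.0%}")  # about 17%

On these rough numbers, the trial had high power against a doubling of risk and almost none against a 20% increase, which is why “low power,” without a stated alternative, is an empty charge.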

Last year, in a hormone therapy breast cancer case, the Eighth Circuit confused power with β, and succumbed to the argument of plaintiff’s expert witness that he was justified in ignoring several large, well-conducted clinical trials and observational studies because they were “underpowered,” without specifying the alternative hypothesis he was using to make his claim:

“Statistical power is ‘the probability of rejecting the null hypothesis in a statistical test when a particular alternative hypothesis happens to be true’. Merriam–Webster Collegiate Dictionary 973 (11th ed. 2003). In other words, it is the probability of observing false negatives. Power analysis can be used to calculate the likelihood of accurately measuring a risk that manifests itself at a given frequency in the general population based on the sample size used in a particular study. Such an analysis is distinguishable from determining which study among several is the most reliable for evaluating whether a correlative or even a causal relationship exists between two variables.”

Kuhn v. Wyeth, Inc., 686 F.3d 618, 622 n.5 (8th Cir. 2012), rev’g In re Prempro Prods. Liab. Litig., 765 F. Supp. 2d 1113 (W.D. Ark. 2011). The Kuhn court’s formulation, “in other words,” is incorrect. Power is not the probability of observing false negatives; it is the probability of correctly rejecting the null in favor of a specified alternative hypothesis, at a specified level of significance probability. The court’s further discussion of “accurately measuring” mischievously confuses an aspect of statistical power concerned with random variability with study validity. The Eighth Circuit’s opinion never discusses or discloses what alternative hypothesis the plaintiff’s expert witness had in mind when disavowing certain studies as underpowered. I suspect that none was ever provided, and that the judges missed the significance of the omission. Courts would seem better off using the confidence intervals around point estimates to assess the statistical imprecision in the observed data, rather than improper power analyses that fail to specify a legally significant alternative hypothesis.

Further Musings on U.S. v. Harkonen

April 15th, 2013

Epistemic Crimes

In U.S. v. Harkonen, the government prosecuted a physician and company CEO for issuing a press release that stated a clinical trial “demonstrated” a benefit, when the government believed that the clinical trial was inconclusive. No doubt the government was intent upon punishing what it thought was off-label promotion in the same press release, but the jury acquitted on the charge of misbranding, and convicted on the wire fraud count. The trial court denied post-trial motions, and recently, the United States Court of Appeals for the Ninth Circuit affirmed, in an unpublished per curiam opinion. United States v. Harkonen, No. 11-10209, No. 11-10242, 2013 WL 782354, 2013 U.S. App. LEXIS 4472 (9th Cir. March 4, 2013).

A Gedanken Experiment

An expert witness writes a report that X, a drug therapy, causes Y, a benefit in survival, for a disease, Z.

The expert witness sent his report by email and regular mail to counsel, who then served it upon his adversary. The report set out some of the support for the opinion, as follows.

The expert witness relied upon a randomized clinical trial, conducted with one primary and nine secondary endpoints. The multiple endpoints were chosen because of uncertainty over how the anticipated benefit would manifest. Mortality (survival), although obviously a very important endpoint, was not made the primary endpoint because the scientists who conducted the trial did not anticipate sufficient deaths over the course of the trial to see a statistically significant benefit.

This clinical trial had surprising results. Although the trial did not show a difference on the primary endpoint, a composite defined in terms of various pulmonary functional changes or death, the trial did “demonstrate,” according to the witness, a survival benefit. Indeed, the survival benefit was clinically significant. Patients randomized to therapy experienced a 40% decrease in mortality, compared to those randomized to placebo (p = 0.084). The expert witness pointed out, in his report, that the survival benefit was even stronger in a subgroup of the clinical trial, consisting of the patients who had mild-to-moderate disease at the time of randomization. For this subgroup, the decrease in mortality was even more dramatic: 70%, p = 0.004. The witness’s report did not clearly label this subgroup as “post hoc,” although a discerning reader might well have assessed it as such.

The expert witness was not relying upon only one clinical trial. His report identified an earlier trial, published in a leading clinical medical journal, which reported a benefit from the drug, p < 0.001. This trial was extended, with continuing strong evidence of differential survival. In terms of survival at five years, the earlier trial showed survival in the therapy group at 77.8%, compared to 16.7% in the control group, p = 0.009.

The expert witness’s report did not explicitly reference clinical experience, or the in vitro and in vivo mechanistic evidence that the therapy, X, plays a role in inhibiting processes that are clearly involved in producing the disease, Z. The expert witness could have written a stronger report with these references, but did not expect that this level of completeness was required. He did note that he would marshal the data in more detail at a later time. The expert witness further relied upon the assessment of the principal investigator of the later clinical trial, who had written that the benefit of X against mortality was “compelling,” and that the finding was “a major breakthrough.” The principal investigator of the trial noted that X was “the first treatment ever to show any meaningful clinical impact in this disease in rigorous clinical trials, and these results would indicate that [X] should be used early in the course of this disease in order to realize the most favorable long-term survival benefit.” The report went on to note, accurately, that there are no FDA-approved therapies for Z.

Adversary counsel, receiving this report, moved pursuant to Federal Rules of Evidence 702 and 703, to exclude the expert witness’s report and his opinions.  The motion to exclude was made in advance of the deposition, and without a preliminary motion for more detail about the supporting data.  In particular, the motion to exclude claimed that the expert witness was unjustified in concluding that a benefit had been “demonstrated,” as opposed to being merely suggested.

What would be the challenger’s chances of success on the Rule 702 motion? The outcome, Y, was not “statistically significant” at the conventional two-tailed 5% level (but would have been on a one-tailed test). The subgroup that sported a p-value of 0.004 was not clearly marked as a post-hoc subgroup, although the challenger could discern that it was likely exploratory, and challenged it as uncorrected for multiple testing. The challenger, however, did not attempt to offer a modified p-value that took account of the multiple testing. The essence of the challenge was that the expert witness’s statement that a benefit had been “demonstrated” was not supported by sufficient evidence, and that the low p-value of 0.004 was not truly “significant” because the result emerged from an analysis that was not pre-planned.
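Had the challenger wished to quantify the multiple-testing objection, the crudest conventional correction is Bonferroni’s: multiply the nominal p-value by the number of comparisons in the family. The family sizes below are purely hypothetical, since the report does not disclose how many analyses were run:

    def bonferroni(p, m):
        """Bonferroni-adjusted p-value for one of m comparisons."""
        return min(1.0, m * p)

    for m in (5, 10, 25):
        print(f"m = {m:2d} comparisons: adjusted p = {bonferroni(0.004, m):.3f}")

The adjusted p-value crosses the conventional 0.05 line somewhere between ten and twenty-five comparisons, which is precisely why the undisclosed size of the analysis family matters to the “demonstrated” claim.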

My hunch, based upon published judicial opinions on both state and federal Rule 702 motions, is that many judges would allow the challenged expert witness to testify. There would be the usual judicial hand waving about the challenge’s going to the weight, not the admissibility, of the expert witness’s opinion. Perhaps an occasional judge might order additional discovery. I believe that most judges would not find that this expert witness had engaged in such pathologically bad science that the party proponent should be denied its expert witness.

Transmuting Disputed Causal Inferences Into Criminal Fraud

Instead of moving to exclude the expert witness’s opinion, why not turn the report over to the U.S. Attorney’s office to prosecute for wire or mail fraud?  Even if a trial court were to brand the opinion “inadmissible,” that outcome would hardly suggest that the opinion was the kind of speech that could qualify as fraudulent under federal wire or mail fraud statutes. Branding a scientist as a fraudfeasor, however, was exactly the result reached in U.S. v. Harkonen, where the Ninth Circuit upheld a wire fraud conviction of a physician whose written statements would likely have been admissible in most federal courtrooms, under Federal Rule 702.  As much as I would like to see more stringent gatekeeping of expert witness opinions, there is something unseemly about the government’s efforts here to criminalize scientific opinions with which it disagrees.

Dr. Harkonen has petitioned the Ninth Circuit for reconsideration, in a brief filed by attorneys Mark Haddad and colleagues, of Sidley Austin. Petition for Rehearing En Banc (filed 29, 2013). The case raises important First Amendment and due process issues, which were addressed by the party and amici briefs before the Panel.

The case also raises the specter of prosecutions of scientists for speech in various contexts: in grant applications and reports, under the False Claims Act; for perjury, for testimony in judicial, administrative, or legislative proceedings; or for wire or mail fraud, for manuscript submissions to journals. On April 8th, Professor Robert Makuch, of Yale University, Professor Timothy Lash, of Emory University, and I filed an amicus brief, which addresses the government’s controversial branding of statements as “false as a matter of statistics.” The government has gone from one extreme of painting, with a broad brush, statistical significance as unimportant and unnecessary (in Matrixx Initiatives Inc. v. Siracusano), to the other extreme of treating statistical significance as so important that a scientist who bases his opinion on causality upon evidence the government believes is not statistically significant has committed fraud (in Harkonen). Both extreme positions are untenable.

Probabilism Case Law

January 28th, 2013

Some judges and commentators have characterized all evidence as ultimately “probable,” but other writers have criticized this view as trading on the ambiguities inherent in our ordinary usage of “probable” to convey an epistemic hedge or uncertainty. How successful is the probabilistic program in the law? In the context of assessing causation, many courts have succumbed to the temptation to substitute risk for causation. Other courts have noticed the difference between a prospective risk and a retrospective factual determination that a risk factor actually participated in bringing about the result. In any event, judicial skepticism about probabilistic evidence, in many contexts, has found its expression in holdings and in dicta of common law courts. The following is a chronological listing of some pertinent cases that rejected or limited the use of overtly probabilistic evidence. Only two of the pre-1970 cases on the list involve epidemiologic evidence.

Day v. Boston & Maine R.R., 96 Me. 207, 217–218, 52 A. 771, 774 (1902) (“Quantitative probability, however, is only the greater chance. It is not proof, nor even probative evidence, of the proposition to be proved. That in one throw of dice, there is a quantitative probability, or greater chance, that a less number of spots than sixes will fall uppermost is no evidence whatever that in a given throw such was the actual result. Without something more, the actual result of the throw would still be utterly unknown. The slightest real evidence would outweigh all the probability otherwise.”)

Toledo, St. L. & W. R. Co. v. Howe, 191 F. 776, 782-83 (6th Cir. 1911) (holding that evidence at issue was not probabilistic, but noting in dictum that “[n]o man’s property should be taken from him on the mere guess that he has committed a wrong. . . because of a probability among other probabilities that the accident for which recovery is sought might have happened in the way charged.”)

People v. Risley, 214 N.Y. 75, 86, 108 N.E. 200, 203 (1915) (holding that probability calculations were improper when “the fact to be established in this case was not the probability of a future event, but whether an occurrence asserted by the people to have happened had actually taken place”)

Lampe v. Franklin Am. Trust, 339 Mo. 361, 384, 96 S.W.2d 710, 723 (1936) (verdict must be based upon what the jury finds to be facts rather than what they find to be ‘more probable’.)

Sargent v. Massachusetts Accident Co., 307 Mass. 246, 250, 29 N.E.2d 825, 827 (1940) (the preponderance standard requires more than showing that the chances mathematically favor a fact in dispute; the proponent must prove the proposition in dispute such that the jurors form an actual belief in the truth of the proposition) (“It has been held not enough that mathematically the chances somewhat favor a proposition to be proved; for example, the fact that colored automobiles made in the current year outnumber black ones would not warrant a finding that an undescribed automobile of the current year is colored and not black, nor would the fact that only a minority of men die of cancer warrant a finding that a particular man did not die of cancer. The weight or preponderance of the evidence is its power to convince the tribunal which has the determination of the fact, of the actual truth of the proposition to be proved. After the evidence has been weighed, that proposition is proved by a preponderance of the evidence if it is made to appear more likely or probable in the sense that actual belief in its truth, derived from the evidence, exists in the mind or minds of the tribunal notwithstanding any doubts that may linger there.”)

Smith v. Rapid Transit, 317 Mass. 469, 470, 58 N.E.2d 754, 755 (1945) (evidence that defendant was the only bus franchise operating in the area where the accident took place was not sufficient to establish that the bus that caused the accident belonged to the defendant where private or chartered buses could have been in the area; it is not enough that mathematically the chances somewhat favor the proposition to be proved)

Kamosky v. Owens-Illinois Co., 89 F. Supp. 561, 561-62 (M.D. Pa. 1950) (directing verdict in favor of defendant; statistical likelihood that defendant manufactured the bottle that injured plaintiff was insufficient to satisfy plaintiff’s burden of proof)

Mahoney v. United States, 220 F. Supp. 823, 840-41 (E.D. Tenn. 1963) (Taylor, C.J.) (holding that plaintiffs had failed to prove that their cancers were caused by radiation exposures, on the basis of their statistical, epidemiological proofs), aff’d, 339 F.2d 605 (6th Cir. 1964) (per curiam)

In re King, 352 Mass. 488, 491-92, 225 N.E.2d 900, 902 (1967) (physician expert’s opinion that expressed a mathematical likelihood, unsupported by clinical evidence, that claimant’s death from cancer was caused by his accidental fall was legally insufficient to support a judgment)

Garner v. Hecla Mining Co., 19 Utah 2d 367, 431 P.2d 794, 796-97 (1967) (affirming denial of compensation to the family of a uranium miner who had smoked cigarettes and had died of lung cancer; statistical evidence of synergistically increased risk of lung cancer among uranium miners was insufficient to show causation of the decedent’s lung cancer, especially considering his having smoked cigarettes)

Whitehurst v. Revlon, 307 F. Supp. 918, 920 (E.D. Va. 1969) (holding that the challenged evidence was not probabilistic, and noting in dictum that probability evidence of negligence would leave a verdict based upon conjecture, guess, or speculation)

Guenther v. Armstrong Rubber Co., 406 F.2d 1315, 1318 (3d Cir. 1969) (holding that defendant cannot be found liable on the basis that it supplied 75-80% of the kind of tire purchased by the plaintiff; any verdict based on this evidence “would at best be a guess”)

Crawford v. Industrial Comm’n, 23 Ariz. App. 578, 582-83, 534 P.2d 1077, 1078, 1082-83 (1975) (affirming an award of no compensation because the employee was exposed to disease-producing conditions both on and off the job; a physician’s testimony, expressed to a reasonable degree of medical certainty, that the working conditions statistically increased the probability of developing a disease does not satisfy the reasonable certainty standard)

Olson v. Federal American Partners, 567 P.2d 710, 712-13 (Wyo. 1977) (affirming judgment for employer in compensation proceedings; cigarette-smoking claimant failed to show that his lung cancer resulted from workplace exposure to radiation, despite alleged synergism between smoking and radiation).

Heckman v. Federal Press Co., 587 F.2d 612, 617 (3d Cir. 1977) (statistical data about a group do not establish facts about an individual)

Bazemore v. Davis, 394 A.2d 1377, 1382 n.7 (D.C. 1978) (if verdicts were determined on the basis of statistics indicating high probability of alleged facts, more often than not they would be correct guesses, but this is not a sufficient basis for reaching verdicts)

Kaminsky v. Hertz Corp., 94 Mich. App. 356 (1979) (dictum; reversing summary judgment)

Sulesky v. United States, 545 F. Supp. 426, 430 (S.D.W.Va. 1982) (swine flu vaccine GBS cases; epidemiological studies alone do not prove or disprove causation in an individual)

Robinson v. United States, 533 F. Supp. 320, 330 (E.D. Mich. 1982) (finding for the government in a swine flu vaccine case; the court found that the epidemiological evidence offered by the plaintiff was not probative, and that it “would reach the same result if the epidemiological data were entirely excluded since statistical evidence cannot establish cause and effect in an individual”)

Iglarsh v. United States, No. 79 C 2148, 1983 U.S. Dist. LEXIS 10950, *10 (N.D.Ill. Dec. 9, 1983) (“In the absence of a statistically valid epidemiological study, even the plaintiff’s treating physician or expert witness, or any clinician for that matter, is unable to attribute a plaintiff’s injury to the swine flu vaccination.”)

Johnston v. United States, 597 F. Supp. 374, 412, 425-26 (D. Kan. 1984) (although the probability of attribution increases with the relative risk, an expert must still speculate in making an individual attribution; “a statistical method which shows a greater than 50% probability does not rise to the required level of proof”; plaintiffs’ expert witnesses’ reports were “statistical sophistry,” not medical opinion)

Kramer v. Weedhopper of Utah, Inc., 490 N.E.2d 104, 108 (Ill. App. Ct. 1986) (Stamos, J., dissenting) (“Liability is not based on a balancing of probabilities, but on a finding of fact.  While the majority contends that the measure of what is considered sufficient evidence [to support submitting a case to the jury] resolves itself into a question of probability, a review of case law … reveals that a theoretical probability alone cannot be the basis for [a prima facie case].  There must be some evidence in addition to the abstraction which will enable a jury to choose between competing probabilities.”)

Washington v. Armstrong World Industries, 839 F.2d 1121 (5th Cir. 1988) (affirming grant of summary judgment on grounds that statistical correlation between asbestos exposure and disease did not support specific causation)

Thompson v. Merrell Dow Pharm., 229 N.J. Super. 230, 244, 551 A.2d 177, 185 (1988) (epidemiology looks at increased incidences of diseases in populations)

Norman v. National Gypsum Co., 739 F. Supp. 1137, 1138 (E.D. Tenn. 1990) (statistical evidence of risk of lung cancer from asbestos and smoking was insufficient to show individual causation, without evidence of asbestos fibers in the plaintiff’s lung tissue)

Smith v. Ortho Pharmaceutical Corp., 770 F. Supp. 1561, 1573, 1576 (N.D. Ga. 1991) (“However, in an individual case, epidemiology cannot conclusively prove causation; at best, it can only establish a certain probability that a randomly selected case of disease was one that would not have occurred absent exposure, or the ‘relative risk’ of the exposed population. Epidemiology, therefore, involves evidence on causation derived from group-based information, rather than specific conclusions regarding causation in an individual case.”)

Howard v. Wal-Mart Stores, Inc., 160 F.3d 358, 359–60 (7th Cir. 1998) (Posner, C.J.)

Krim v. pcOrder.com, Inc., 402 F.3d 489 (5th Cir. 2005) (rejecting plaintiffs’ standing to sue for fraud absent a showing of actual tracing of their shares to the offending public offering; the statistical likelihood that their shares were among those purchased in the offering was insufficient to confer standing)