TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Failed Gatekeeping in Ambrosini v. Labarraque (1996)

December 28th, 2017

The Ambrosini case straddled the Supreme Court’s 1993 Daubert decision. The case began before the Supreme Court clarified the federal standard for expert witness gatekeeping, and ended in the Court of Appeals for the District of Columbia, after the high court adopted the curious notion that scientific claims should be based upon reliable evidence and valid inferences. That notion has only slowly and inconsistently trickled down to the lower courts.

Given that Ambrosini was litigated in the District of Columbia, where the docket is dominated by regulatory controversies, frequently involving dubious scientific claims, no one should be surprised that the D.C. Court of Appeals did not see that the Supreme Court had read “an exacting standard” into Federal Rule of Evidence 702. And so we see, in Ambrosini, this Court of Appeals citing and purportedly applying its own pre-Daubert decision in Ferebee v. Chevron Chem. Co., 552 F. Supp. 1297 (D.D.C. 1982), aff’d, 736 F.2d 1529 (D.C. Cir.), cert. denied, 469 U.S. 1062 (1984).1 In 2000, Federal Rule of Evidence 702 was revised in a way that extinguishes the precedential value of Ambrosini and the broad dicta of Ferebee, but some courts and commentators have failed to stay abreast of the law.

Escolastica Ambrosini was using a synthetic progestin birth control, Depo-Provera, as well as an anti-nausea medication, Bendectin, when she became pregnant. The child that resulted from this pregnancy, Teresa Ambrosini, was born with malformations of her face, eyes, and ears, cleft lip and palate, and vertebral malformations. About three percent of all live births in the United States have a major malformation. Perhaps because the Divine Being has sovereign immunity, Escolastica sued the manufacturers of Bendectin and Depo-Provera, as well as the prescribing physician.

The causal claims were controversial when made, and they still are. The progestin at issue, medroxyprogesterone acetate (MPA), was embryotoxic in the cynomolgus monkey2, but not in the baboon3. The evidence in humans was equivocal at best, and involved mostly genital malformations4; the epidemiologic evidence for the MPA causal claim to this day remains unconvincing5.

At the close of discovery in Ambrosini, Upjohn (the manufacturer of the progestin) moved for summary judgment, with a supporting affidavit of a physician and geneticist, Dr. Joe Leigh Simpson. In his affidavit, Simpson discussed three epidemiologic studies, as well as other published papers, in support of his opinion that the progestin at issue did not cause the types of birth defects manifested by Teresa Ambrosini.

Ambrosini had disclosed two expert witnesses, Dr. Allen S. Goldman and Dr. Brian Strom. Neither Goldman nor Strom bothered to identify the papers, studies, data, or methodology used in arriving at an opinion on causation. Not surprisingly, the district judge was unimpressed with their opposition, and granted summary judgment for the defendant. Ambrosini v. Labarraque, 966 F.2d 1462, 1466 (D.C. Cir. 1992).

The plaintiffs appealed on the remarkable ground that Goldman’s and Strom’s crypto-evidence satisfied Federal Rule of Evidence 703. Even more remarkably, the Circuit, in a strikingly unscholarly opinion by Judge Mikva, opined that disclosure of relied-upon studies was not required for expert witnesses under Rules 703 and 705. Judge Mikva seemed to forget that the opinions being challenged were not given in testimony, but in (late-filed) affidavits that had to satisfy the requirements of Federal Rule of Civil Procedure 26. Id. at 1468-69. At trial, an expert witness may express an opinion without identifying its bases, but of course the adverse party may compel disclosure of those bases. In discovery, the proffered expert witness must supply all opinions, and the evidence relied upon in reaching those opinions. In any event, the Circuit remanded the case for a hearing and further proceedings, at which the two challenged expert witnesses, Goldman and Strom, would have to identify the bases of their opinions. Id. at 1471.

Not long after the case landed back in the district court, the Supreme Court decided Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993). With an order to produce entered, plaintiffs’ counsel could no longer hide Goldman and Strom’s evidentiary bases, and their scientific inferences came under judicial scrutiny.

Upjohn moved again to exclude Goldman and Strom’s opinions. The district court upheld Upjohn’s challenges, and granted summary judgment in favor of Upjohn for the second time. The Ambrosinis appealed again, but the second case in the D.C. Circuit resulted in a split decision, with the majority holding that the exclusion of Goldman and Strom’s opinions under Rule 702 was erroneous. Ambrosini v. Labarraque, 101 F.3d 129 (D.C. Cir. 1996).

Although issued two decades ago, the majority’s opinion remains noteworthy as an example of judicial resistance to the existence and meaning of the Supreme Court’s Daubert opinion. The majority opinion uncritically cited the notorious Ferebee6 and other pre-Daubert decisions. The court embraced the Daubert dictum about gatekeeping being limited to methodological considerations, and then proceeded to interpret methodology as superficially as necessary to sustain admissibility. If an expert witness claimed to have looked at epidemiologic studies, and epidemiology was an accepted methodology, then the opinion of the expert witness satisfied the legal requirements of Daubert, or so it would seem from the opinion of the U.S. Court of Appeals for the District of Columbia.

Despite the majority’s hand waving, a careful reader will discern that there must have been substantial gaps and omissions in the explanations and evidence cited by plaintiffs’ expert witnesses. Seeing anything clearly in the Circuit’s opinion is made difficult, however, by careless and imprecise language, such as its descriptions of studies as showing, or not showing, “causation,” when the court could have meant only that such studies showed associations, with more or less random and systematic error.

Dr. Strom’s report addressed only general causation, and even so, he apparently did not address general causation of the specific malformations manifested by the plaintiffs’ child. Strom claimed to have relied upon the “totality of the data,” but his methodologic approach seems to have required him to dismiss studies that failed to show an association.

“Dr. Strom first set forth the reasoning he employed that led him to disagree with those studies finding no causal relationship [sic] between progestins and birth defects like Teresa’s. He explained that an epidemiologist evaluates studies based on their ‘statistical power’. Statistical power, he continued, represents the ability of a study, based on its sample size, to detect a causal relationship. Conventionally, in order to be considered meaningful, negative studies, that is, those which allege the absence of a causal relationship, must have at least an 80 to 90 percent chance of detecting a causal link if such a link exists; otherwise, the studies cannot be considered conclusive. Based on sample sizes too small to be reliable, the negative studies at issue, Dr. Strom explained, lacked sufficient statistical power to be considered conclusive.”

Id. at 136-37.

Putting aside the problem of suggesting that an observational study detects a “causal relationship,” as opposed to an association in need of further causal evaluation, the Court’s précis of Strom’s testimony on power is troublesome, and typical of how other courts have misunderstood and misapplied the concept of statistical power.7 Statistical power is the probability of observing an association of at least a specified size, at a specified level of statistical significance, if such an association truly exists. The calculation of statistical power turns on sample size, the significance probability preselected for “statistical significance,” an assumed probability distribution of the sample, and, critically, an alternative hypothesis. Without a specified alternative hypothesis, the notion of statistical power is meaningless, regardless of what probability (80% or 90% or some other percentage) is sought for finding the alternative hypothesis. Furthermore, the notion that the defense must adduce studies with “sufficient statistical power to be considered conclusive” creates an unscientific standard that can never be met, while subverting the law’s requirement that the claimant establish causation.
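To see why the alternative hypothesis is indispensable, consider a minimal sketch of a power calculation for a two-sample comparison of proportions. The sketch is illustrative only: the function name and the input numbers are hypothetical, not taken from the Ambrosini record.

```python
from math import sqrt
from scipy.stats import norm

def two_prop_power(p1, p2, n1, n2, alpha=0.05):
    """Approximate power of a two-sided, two-sample proportion z-test.

    Power exists only relative to the specified alternative (p1 versus p2);
    change the alternative, and the 'power' of the same study changes.
    """
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
    se_null = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))  # SE under the null
    se_alt = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # SE under the alternative
    z_crit = norm.ppf(1 - alpha / 2)
    delta = abs(p2 - p1)
    return (norm.sf((z_crit * se_null - delta) / se_alt)
            + norm.cdf((-z_crit * se_null - delta) / se_alt))

# Hypothetical alternative: a 3% baseline malformation rate versus 4.5% among
# the exposed, with 1,000 births in each group.
print(round(two_prop_power(0.030, 0.045, 1000, 1000), 2))  # about 0.42
```

The same study that is badly underpowered against this alternative would be well powered against a larger one, which is why a bare assertion that a study “lacked sufficient statistical power” says nothing until the alternative is stated.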

The suggestion that the studies that failed to find an association cannot be considered conclusive because they “lacked sufficient statistical power” is troublesome because it distorts and misapplies the very notion of statistical power. No attempt was made to describe the confidence intervals surrounding the point estimates of the null studies; nor was there any discussion of whether the studies could be aggregated to increase their power to rule out meaningful associations.
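Both exercises are routine. The sketch below (with hypothetical two-by-two counts, not the data of any study in the case) computes a Woolf confidence interval around each null study’s odds ratio, and then a simple fixed-effect, inverse-variance pooling, which shows how aggregation narrows the interval:

```python
from math import exp, log, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a 95% CI (Woolf/logit method) for a 2x2 table:
    a, b = exposed/unexposed cases; c, d = exposed/unexposed controls."""
    or_hat = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_hat, exp(log(or_hat) - z * se), exp(log(or_hat) + z * se)

# Two hypothetical "null" studies: point estimates near 1.0, wide intervals.
studies = [(8, 10, 492, 490), (5, 6, 295, 294)]
log_ors, weights = [], []
for a, b, c, d in studies:
    or_hat, lower, upper = odds_ratio_ci(a, b, c, d)
    print(f"OR = {or_hat:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
    log_ors.append(log(or_hat))
    weights.append(1 / (1 / a + 1 / b + 1 / c + 1 / d))  # inverse-variance weights

# Fixed-effect pooled estimate: a tighter interval than either study alone.
pooled = sum(w * lg for w, lg in zip(weights, log_ors)) / sum(weights)
se_pooled = sqrt(1 / sum(weights))
print(f"pooled OR = {exp(pooled):.2f}, 95% CI "
      f"{exp(pooled - 1.96 * se_pooled):.2f}-{exp(pooled + 1.96 * se_pooled):.2f}")
```

An interval that runs from, say, 0.3 to 2.0 tells the reader precisely which associations a null study could and could not rule out; “insufficient power” tells the reader nothing of the sort.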

The Circuit court’s scientific jurisprudence was thus seriously flawed. Without a discussion of the end points observed, the relevant point estimates of risk ratios, and the confidence intervals, the reader cannot assess the strength of the claims made by Goldman and Strom, or by defense expert Simpson, in their reports. Without identifying the study endpoints, the reader cannot evaluate whether the plaintiffs’ expert witnesses relied upon relevant outcomes in formulating their opinions. The court viewed the subject matter from 30,000 feet, passing over at 600 mph, without engagement or care. A strong dissent, however, suggested serious mischaracterizations of the plaintiffs’ evidence by the majority.

The only specific causation testimony to support plaintiffs’ claims came from Goldman, in what appears to have been a “differential etiology.” Goldman purported to rule out a genetic cause, even though he had not conducted a critical family history or ordered a state-of-the-art chromosomal study. Id. at 140. Of course, nothing in a differential etiology approach would allow a physician to rule out “unknown” causes, which, for birth defects, make up the most prevalent and likely causes to explain any particular case. The majority acknowledged that these were shortcomings, but rhetorically characterized them as substantive, not methodologic, and therefore as issues for cross-examination, not for judicial gatekeeping. All this is magical thinking, but it continues to infect judicial approaches to specific causation. See, e.g., Green Mountain Chrysler Plymouth Dodge Jeep v. Crombie, 508 F. Supp. 2d 295, 311 (D.Vt. 2007) (citing Ambrosini for the proposition that “the possibility of uneliminated causes goes to weight rather than admissibility, provided that the expert has considered and reasonably ruled out the most obvious”). In Ambrosini, however, Dr. Goldman had not ruled out much of anything.

Circuit Judge Karen LeCraft Henderson dissented in a short, but pointed opinion that carefully marshaled the record evidence. Drs. Goldman and Strom had relied upon a study by Greenberg and Matsunaga, whose data failed to show a statistically significant association between MPA and cleft lip and palate, when the crucial issue of timing of exposure was taken into consideration. Ambrosini, 101 F.3d at 142.

Beyond the specific claims and evidence, Judge Henderson anticipated the subsequent Supreme Court decisions in Joiner, Kumho Tire, and Weisgram, and the year 2000 revision of Rule 702, in noting that the majority’s acceptance of glib claims to have used a “traditional methodology” would render Daubert nugatory. Id. at 143-45 (characterizing Strom and Goldman’s methodologies as “wispish”). Even more importantly, Judge Henderson refused to indulge the assumption that somehow the length of Goldman’s C.V. substituted for evidence that his methods satisfied the legal (or scientific) standard of reliability. Id.

The good news is that little or nothing in Ambrosini survives the 2000 amendment to Rule 702. The bad news is that not all federal judges seem to have noticed, and that some commentators continue to cite the case as if it were still good law.

Probably no commentator has embraced Ambrosini as warmly, or as promiscuously, as Carl Cranor, a philosopher, and occasional expert witness for the lawsuit industry, in several publications and presentations.8 Cranor has been particularly enthusiastic about Ambrosini’s approval of expert witness testimony that failed to address “the relative risk between exposed and unexposed populations of cleft lip and palate, or any other of the birth defects from which [the child] suffers,” as well as differential etiologies that exclude nothing.9 Somehow Cranor, like the majority in Ambrosini, believes that testimony that fails to identify the magnitude of the point estimate of relative risk can “assist the trier of fact to understand the evidence or to determine a fact in issue.”10 Of course, without that magnitude given, the trier of fact could not evaluate the strength of the alleged association; nor could the trier assess the probability of individual causation for the plaintiff. Cranor has also written approvingly of lumping unrelated end points, which defeats the assessment of biological plausibility and coherence by the trier of fact. When the defense expert witness in Ambrosini adverted to the point estimates for relevant end points, the majority, with Cranor’s approval, rejected the null findings as “too small to be significant.”11 If the null studies were, in fact, too small to be useful tests of the plaintiffs’ claims, intellectual and scientific honesty required an acknowledgment that the evidentiary display was not one from which a reasonable scientist would draw a causal conclusion.


1 Ambrosini v. Labarraque, 101 F.3d 129, 138-39 (D.C. Cir. 1996) (citing and applying Ferebee), cert. dismissed sub nom. Upjohn Co. v. Ambrosini, 117 S.Ct. 1572 (1997). See also David E. Bernstein, “The Misbegotten Judicial Resistance to the Daubert Revolution,” 89 Notre Dame L. Rev. 27, 31 (2013).

2 S. Prahalada, E. Carroad, M. Cukierski, and A.G. Hendrickx, “Embryotoxicity of a single dose of medroxyprogesterone acetate (MPA) and maternal serum MPA concentrations in cynomolgus monkey (Macaca fascicularis),” 32 Teratology 421 (1985).

3 S. Prahalada, E. Carroad, and A.G. Hendrickx, “Embryotoxicity and maternal serum concentrations of medroxyprogesterone acetate (MPA) in baboons (Papio cynocephalus),” 32 Contraception 497 (1985).

4 See, e.g., Z. Katz, M. Lancet, J. Skornik, J. Chemke, B.M. Mogilner, and M. Klinberg, “Teratogenicity of progestogens given during the first trimester of pregnancy,” 65 Obstet Gynecol. 775 (1985); J.L. Yovich, S.R. Turner, and R. Draper, “Medroxyprogesterone acetate therapy in early pregnancy has no apparent fetal effects,” 38 Teratology 135 (1988).

5 G. Saccone, C. Schoen, J.M. Franasiak, R.T. Scott, and V. Berghella, “Supplementation with progestogens in the first trimester of pregnancy to prevent miscarriage in women with unexplained recurrent miscarriage: a systematic review and meta-analysis of randomized, controlled trials,” 107 Fertil. Steril. 430 (2017).

6 Ferebee v. Chevron Chemical Co., 736 F.2d 1529, 1535 (D.C. Cir.), cert. denied, 469 U.S. 1062 (1984).

7 Dr. Strom was also quoted as having provided a misleading definition of statistical significance: “whether there is a statistically significant finding at greater than 95 percent chance that it’s not due to random error.” Ambrosini, 101 F.3d at 136. Given the majority’s inadequate description of the record, the description of witness testimony may not be accurate, and error cannot properly be allocated.

8 Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice 320, 327-28 (2006); see also Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice 238 (2d ed. 2016).

9 Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice 320 (2006).

10 Id.

11 Id.; see also Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice 238 (2d ed. 2016).

Gatekeeping of Expert Witnesses Needs a Bair Hug

December 20th, 2017

For every Rule 702 (“Daubert”) success story, there are multiple gatekeeping failures. See David E. Bernstein, “The Misbegotten Judicial Resistance to the Daubert Revolution,” 89 Notre Dame L. Rev. 27 (2013).1 Exemplars of inadequate expert witness gatekeeping in state or federal court abound, and overwhelm the bar. The only solace one might find is that the abuse-of-discretion appellate standard of review keeps the bad decisions from precedentially outlawing the good ones.

Judge Joan Ericksen recently provided another Berenstain Bears’ example of how not to keep the expert witness gate, in litigation over claims that the Bair Hugger forced air warming devices (“Bair Huggers”) cause infections. In re Bair Hugger Forced Air Warming, MDL No. 15-2666, 2017 WL 6397721 (D. Minn. Dec. 13, 2017). Although Her Honor properly cited and quoted Rule 702 (2000), she announced a new standard in a bold heading:

“Under Federal Rule of Evidence 702, the Court need only exclude expert testimony that is so fundamentally unsupported that it can offer no assistance to the jury.”

Id. at *1. This new standard thus permits largely unsupported opinion that can offer bad assistance to the jury. As Judge Ericksen demonstrates, this new standard, which has no warrant in the statutory text of Rule 702 or its advisory committee notes, allows expert witnesses to rely upon studies that have serious internal and external validity flaws.

Jonathan Samet, a specialist in pulmonary medicine, not infectious disease or statistics, is one of the plaintiffs’ principal expert witnesses. Samet relies in large measure upon an observational study2, which purports to find an increased odds ratio for use of the Bair Hugger among infection cases in one particular hospital. The defense epidemiologist, Jonathan B. Borak, criticized the McGovern observational study on several grounds, including that the study was highly confounded by the presence of other known infection risks. Id. at *6. Judge Ericksen characterized Borak’s opinion as an assertion that the McGovern study was an “insufficient basis” for the plaintiffs’ claims. A fair reading of even Judge Ericksen’s précis of Borak’s proffered testimony requires the conclusion that Borak’s opinion was that the McGovern study was invalid because of data collection errors and confounding. Id.

Judge Ericksen’s judicial assessment, taken from the disagreement between Samet and Borak, is that there are issues with the McGovern study, which go to “weight of the evidence.” This finding obscures, however, that there were strong challenges to the internal and external validity of the study. Drawing causal inferences from an invalid observational study is a methodological issue, not a weight-of-the-evidence problem for the jury to resolve. This MDL opinion never addresses the Rule 703 issue, whether an epidemiologic expert would reasonably rely upon such a confounded study.

The defense proffered the opinion of Theodore R. Holford, who criticized Dr. Samet for drawing causal inferences from the McGovern observational study. Holford, a professor of biostatistics at Yale University’s School of Public Health, analyzed the raw data behind the McGovern study. Id. at *8. The plaintiffs challenged Holford’s opinions on the ground that he relied on data in “non-final” form, from a temporally expanded dataset. Even more intriguingly, given that the plaintiffs did not present a statistician expert witness, plaintiffs argued that Holford’s opinions should be excluded because

(1) he insufficiently justified his use of a statistical test, and

(2) he “emphasizes statistical significance more than he would in his professional work.”

Id.

The MDL court dismissed the plaintiffs’ challenge on the mistaken conclusion that the alleged contradictions between Holford’s practice and his testimony “impugn his credibility at most.” If there were truly such a deviation from the statistical standard of care, the issue is methodological, not a credibility issue of whether Holford was telling the truth. And as for the alleged over-emphasis on statistical significance, the MDL court again falls back to the glib conclusions that the allegation goes to the weight, not the admissibility, of expert witness opinion testimony, and that plaintiffs can elicit testimony from Dr. Samet as to how and why Professor Holford over-emphasized statistical significance. Id. Inquiring minds, at the bar, and in the academy, are left with no information about what the real issues are in the case.

Generally, both sides’ challenges to expert witnesses were denied.3 The real losers, however, were the scientific and medical communities, bench, bar, and general public. The MDL court glibly and incorrectly treated methodological issues as “credibility” issues, confused sufficiency with validity, and banished methodological failures to consideration by the trier of fact for “weight.” Confounding was mistreated as simply a debating point between the parties’ expert witnesses. The reader of Judge Ericksen’s opinion never learns what statistical test was used by Professor Holford, what justification was needed but allegedly absent for the test, why the justification was contested, and what other test was alleged by plaintiffs to have been a “better” statistical test. As for the emphasis given statistical significance, the reader is left in the dark about exactly what that emphasis was, and how it led to Holford’s conclusions and opinions, and what the proper emphasis should have been.

Eventually appellate review of the Bair Hugger MDL decision must turn on whether the district court abused its discretion. Although appellate courts give trial judges discretion to resolve Rule 702 issues, the appellate courts cannot reach reasoned decisions when the inferior courts fail to give even a cursory description of what the issues were, and how and why they were resolved as they were.


2 P. D. McGovern, M. Albrecht, K. G. Belani, C. Nachtsheim, P. F. Partington, I. Carluke, and M. R. Reed, “Forced-Air Warming and Ultra-Clean Ventilation Do Not Mix: An Investigation of Theatre Ventilation, Patient Warming and Joint Replacement Infection in Orthopaedics,” 93 J. Bone & Joint Surg. (Br.) 1537 (2011). The article as published contains no disclosures of potential or actual conflicts of interest. A persistent rumor has it that the investigators were funded by a commercial rival to the manufacturer of the Bair Hugger at issue in Judge Ericksen’s MDL. See generally Melissa D. Kellam, Loraine S. Dieckmann, and Paul N. Austin, “Forced-Air Warming Devices and the Risk of Surgical Site Infections,” 98 Ass’n periOperative Registered Nurses (AORN) J. 354 (2013).

3 A challenge to plaintiffs’ expert witness Yadin David was sustained to the extent he sought to offer opinions about the defendant’s state of mind. Id. at *5.

Multiplicity in the Third Circuit

September 21st, 2017

In Karlo v. Pittsburgh Glass Works, LLC, C.A. No. 2:10-cv-01283 (W. D. Pa.), plaintiffs claimed that their employer’s reduction in force unlawfully targeted workers over 50 years of age. Plaintiffs lacked any evidence of employer animus against old folks, and thus attempted to make out a statistical disparate impact claim. The plaintiffs placed their chief reliance upon an expert witness, Michael A. Campion, to analyze a dataset of workers whom the parties agreed had been subject to the R.I.F. For the last 30 years, Campion has been on the faculty at Purdue University. His academic training and graduate degrees are in industrial and organizational psychology. Campion has served as an editor of Personnel Psychology, and is a past president of the Society for Industrial and Organizational Psychology. Campion’s academic website page notes that he manages a small consulting firm, Campion Consulting Services1.

The defense sought to characterize Campion as not qualified to offer his statistical analysis2. Campion did, however, have some statistical training as part of his master’s level training in psychology, and his professional publications occasionally involved statistical analyses. To be sure, Campion’s statistical acumen paled in comparison with that of the defense expert witness, James Rosenberger, a fellow and former vice president of the American Statistical Association, as well as a full professor of statistics at Pennsylvania State University. The threshold for qualification, however, is low, and the defense’s attack on Campion’s qualifications failed to attract the court’s serious attention.

On the merits, the defense subjected Campion to a strong challenge on whether he had misused data. The defense’s expert witness, Prof. Rosenberger, filed a report that questioned Campion’s data handling and statistical analyses. The defense claimed that Campion had engaged in questionable data manipulation by including, in his RIF analysis, workers who had been terminated when their plant was transferred to another company, as well as workers who retired voluntarily.

Using simple z-score tests, Campion compared the ages of terminated and non-terminated employees in four subgroups, ages 40+, 45+, 50+, and 55+. He did not conduct an analysis of the 60+ subgroup, on the claim that this group had too few members for the test to have sufficient power.3 Campion found a small z-score for the 40+ versus <40 age-group comparison (z = 1.51), which is not close to statistical significance at the 5% level. On the defense’s legal theory, this was the crucial comparison to be made under the Age Discrimination in Employment Act (ADEA). The plaintiffs, however, maintained that they could make out a case of disparate impact by showing age discrimination in age subgroups that started above the minimum specified by the ADEA. Although age is a continuous variable, Campion decided to conduct z-score tests on subgroups based upon five-year increments. For the 45+, 50+, and 55+ age subgroups, he found z-scores that ranged from 2.15 to 2.46, and he concluded that there was evidence of disparate impact in the higher age subgroups4. Karlo v. Pittsburgh Glass Works, LLC, C.A. No. 2:10-cv-01283, 2015 WL 4232600, at *11 (W.D. Pa. July 13, 2015) (McVerry, S.J.).
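The underlying computation is the garden-variety two-proportion z-test sketched below. The counts are hypothetical placeholders, since the opinions do not report the underlying termination data.

```python
from math import sqrt

def termination_z(fired_old, n_old, fired_young, n_young):
    """Pooled two-proportion z-score comparing the termination rate of an
    age subgroup (e.g., 50+) with the rate among everyone below the cutoff."""
    p_pool = (fired_old + fired_young) / (n_old + n_young)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_old + 1 / n_young))
    return (fired_old / n_old - fired_young / n_young) / se

# Hypothetical 50+ versus <50 comparison: 30 of 150 older workers terminated,
# versus 25 of 250 younger workers.
print(round(termination_z(30, 150, 25, 250), 2))  # 2.81 standard deviations
```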

The defense, and apparently the defense expert witnesses, branded Campion’s analysis as “data snooping,” which required correction for multiple comparisons. In the defense’s view, the multiple age subgroups required a Bonferroni correction that would have diminished the critical p-value for “significance” by a factor of four. The trial court agreed with the defense contention about data snooping and multiple comparisons, and excluded Campion’s opinion of disparate impact, which had been based upon finding statistically significant disparities in the 45+, 50+, and 55+ age subgroups. 2015 WL 4232600, at *13. The trial court noted that Campion, in finding significant disparities in terminations in the subgroups, but not in the 40+ versus <40 analysis:

“[did] not apply any of the generally accepted statistical procedures (i.e., the Bonferroni procedure) to correct his results for the likelihood of a false indication of significance. This sort of subgrouping ‘analysis’ is data-snooping, plain and simple.”

Id. After excluding Campion’s opinions under Rule 702, as well as other evidence in support of plaintiffs’ disparate impact claim, the trial court granted summary judgment on the discrimination claims. Karlo v. Pittsburgh Glass Works, LLC, No. 2:10–cv–1283, 2015 WL 5156913 (W. D. Pa. Sept. 2, 2015).

On plaintiffs’ appeal, the Third Circuit took the wind out of the attack on Campion by holding that the ADEA prohibits disparate impacts based upon age itself, and that a claim need not be framed as an impact upon the entire class of workers at least 40 years old. Karlo v. Pittsburgh Glass Works, LLC, 849 F.3d 61, 66-68 (3d Cir. 2017). This holding took the legal significance out of the statistical insignificance of Campion’s comparison of 40+ versus <40 age-group termination rates. Campion’s subgroup analyses were back in play, but the Third Circuit still faced the question whether Campion’s conclusions, based upon unadjusted z-scores and p-values, offended Rule 702.

The Third Circuit noted that the district court had identified three grounds for excluding Campion’s statistical analyses:

(1) Dr. Campion used facts or data that were not reliable;

(2) he failed to use a statistical adjustment called the Bonferroni procedure; and

(3) his testimony lacks “fit” to the case because subgroup claims are not cognizable.

849 F.3d at 81. The first issue was raised by the defense’s claims of Campion’s sloppy data handling, and inclusion of voluntarily retired workers and workers who were terminated when their plant was turned over to another company. The Circuit did not address these data handling issues, which it left for the trial court on remand. Id. at 82. The third ground went out of the case with the appellate court’s resolution of the scope of the ADEA. The Circuit did, however, engage on the issue whether adjustment for multiple comparisons was required by Rule 702.

On the “data-snooping” issue, the Circuit concluded that the trial court had applied “an incorrectly rigorous standard for reliability.” Id. The Circuit acknowledged that

“[i]n theory, a researcher who searches for statistical significance in multiple attempts raises the probability of discovering it purely by chance, committing Type I error (i.e., finding a false positive).”

849 F.3d at 82. The defense expert witness contended that applying the Bonferroni adjustment, which would have reduced the critical significance probability level from 5% to 1%, would have rendered Campion’s analyses not statistically significant, and thus not probative of disparate impact. Given that plaintiffs’ cases were entirely statistical, the adjustment would have been fatal to their cases. Id. at 82.
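The arithmetic of that contention is easy to check. With four subgroup tests, a Bonferroni adjustment sets the two-sided critical level at 0.05/4 = 0.0125 (the opinions round this to 1%), and Campion’s reported z-scores of 2.15 to 2.46 correspond to p-values above that adjusted threshold. A minimal sketch:

```python
from scipy.stats import norm

alpha, k = 0.05, 4            # four age subgroups tested
alpha_adj = alpha / k         # Bonferroni-adjusted level: 0.0125
for z in (2.15, 2.46):        # the extremes of Campion's reported z-scores
    p = 2 * norm.sf(z)        # two-sided p-value
    print(f"z = {z}: p = {p:.4f}, survives Bonferroni? {p < alpha_adj}")
# z = 2.15: p = 0.0316 -> not significant at the adjusted level
# z = 2.46: p = 0.0139 -> not significant at the adjusted level
```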

At the trial level and on appeal, plaintiffs and Campion had objected to the data-snooping charge on the grounds that

(1) he had conducted only four subgroup analyses;

(2) virtually all of the subgroup comparisons were statistically significant;

(3) his methodology was “hypothesis driven” and involved logical increments in age to explore whether the strength of the evidence of age disparity in terminations continued in each, increasingly older subgroup;

(4) his method was analogous to replications with different samples; and

(5) his result was confirmed by a single, supplemental analysis.

Id. at 83. According to the plaintiffs, Campion’s approach was based upon the reality that age is a continuous, not a dichotomous variable, and he was exploring a single hypothesis. A.240-241; Brief of Appellants at 26. Campion’s explanations do mitigate somewhat the charge of “data snooping,” but they do not explain why Campion did not use a statistical analysis that treated age as a continuous variable, at the outset of his analysis. The single, supplemental analysis was never described or reported by the trial or appellate courts.

The Third Circuit concluded that the district court had applied a “merits standard of correctness,” which is higher than what Rule 702 requires. Specifically, the district court, having identified a potential methodological flaw, did not further evaluate whether Campion’s opinion relied upon good grounds. 849 F.3d at 83. The Circuit vacated the judgment below, and remanded the case to the district court for the opportunity to apply the correct standard.

The trial court’s acceptance that an adjustment was appropriate or required hardly seems a “merits standard.” The use of a proper adjustment for multiple comparisons is very much a methodological concern. If Campion could reach his conclusion only by way of an inappropriate methodology, then his conclusion surely would fail the requirements of Rule 702. The trial court did, however, appear to accept, without explicit evidence, that the failure to apply the Bonferroni correction made it impossible for Campion to present a sound scientific argument for his conclusion that there had been disparate impact. The trial court’s opinion also suggests that the Bonferroni correction itself, as opposed to some more appropriate correction, was required.

Unfortunately, the reported opinions do not provide the reader with a clear account of what the analyses would have shown on the correct data set, without improper inclusions and exclusions, and with appropriate statistical adjustments. Presumably, the parties are left to make their cases on remand.

Based upon citations to sources that described the Bonferroni adjustment as “good statistical practice,” but one that is “not widely or consistently adopted” in the behavioral and social sciences, the Third Circuit observed that in some cases, failure to adjust for multiple comparisons may “simply diminish the weight of an expert’s finding.”5 The observation is problematic given that Kumho Tire suggests that an expert witness must use “in the courtroom the same level of intellectual rigor that characterizes the practice of an expert in the relevant field.” Kumho Tire Co. v. Carmichael, 526 U.S. 137, 150 (1999). One implication is that courts are prisoners to prevalent scientific malpractice and abuse of statistical methodology. Another implication is that courts need to look more closely at the assumptions and predicates for various statistical tests and adjustments, such as the Bonferroni correction.

These worrisome implications are exacerbated by the appellate court’s insistence that the question whether a study’s result was properly calculated or interpreted “goes to the weight of the evidence, not to its admissibility.”6 Combined with citations to pre-Daubert statistics cases7, judicial comments such as these can appear to reflect a general disregard for the statutory requirements of Rules 702 and 703. Claims of statistical significance, in studies with multiple exposures and multiple outcomes, are frequently not adjusted for multiple comparisons, without notation, explanation, or justification. The consequence is that study results are often over-interpreted and over-sold. Methodological errors related to multiple testing or over-claiming statistical significance are commonplace in tort litigation over “health-effects” studies of birth defects, cancer, and other chronic diseases that require epidemiologic evidence8.

In Karlo, the claimed methodological error is beset by its own methodological problems. As the court noted, adjustments for multiple comparisons are not free from methodological controversy9. One noteworthy textbook10 labels the Bonferroni correction an “awful response” to the problem of multiple comparisons. Aside from this strident criticism, there are alternative approaches to statistical adjustment for multiple comparisons. In the context of the Karlo case, the Bonferroni correction might well be awful because Campion’s four subgroups are hardly independent tests. Because each subgroup is nested within the next higher age subgroup, the subgroup test results will be strongly correlated, in a way that defeats the mathematical assumptions of the Bonferroni correction. On remand, the trial court in Karlo must still make its Rule 702 gatekeeping decision on the methodological appropriateness of Campion’s treatment of multiple subgroups, and of multiple analyses run on different models.
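The lack of independence is easy to demonstrate by simulation. In the sketch below (all numbers are illustrative assumptions, not the Karlo data), terminations are drawn without regard to age, the four nested subgroup z-scores are computed for each simulated workforce, and the resulting test statistics prove to be strongly correlated; the familywise error rate of the unadjusted tests accordingly sits noticeably below the roughly 18.5% that four independent tests at the 5% level would produce, which is the worst case a Bonferroni-style bound guards against.

```python
import numpy as np

rng = np.random.default_rng(0)
n_workers, n_sims = 500, 5000
cutoffs = (40, 45, 50, 55)

def z_two_prop(k1, n1, k2, n2):
    # pooled two-proportion z-statistic
    p = (k1 + k2) / (n1 + n2)
    se = np.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (k1 / n1 - k2 / n2) / se

zs = np.empty((n_sims, len(cutoffs)))
for i in range(n_sims):
    ages = rng.integers(25, 65, n_workers)
    fired = rng.random(n_workers) < 0.10      # terminations independent of age
    for j, cut in enumerate(cutoffs):
        old = ages >= cut
        zs[i, j] = z_two_prop(fired[old].sum(), old.sum(),
                              fired[~old].sum(), (~old).sum())

print(np.corrcoef(zs.T).round(2))   # nested subgroups: strongly correlated tests
print("familywise error rate:", (np.abs(zs) > 1.96).any(axis=1).mean())
```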


1 Although Campion describes his consulting business as small, he seems to turn up in quite a few employment discrimination cases. See, e.g., Chen-Oster v. Goldman, Sachs & Co., 10 Civ. 6950 (AT) (JCF) (S.D.N.Y. 2015); Brand v. Comcast Corp., Case No. 11 C 8471 (N.D. Ill. July 5, 2014); Powell v. Dallas Morning News L.P., 776 F. Supp. 2d 240, 247 (N.D. Tex. 2011) (excluding Campion’s opinions), aff’d, 486 F. App’x 469 (5th Cir. 2012).

2 See Defendant’s Motion to Bar Dr. Michael Campion’s Statistical Analysis, 2013 WL 11260556.

3 There was no mention of an effect size for the lower-aged subgroups, nor of a power calculation for the 60+ subgroup’s probability of showing a z-score greater than two. Similarly, there was no discussion or argument about why this subgroup could not have been evaluated with Fisher’s exact test. In deciding the appeal, the Third Circuit observed that “Dr. Rosenberger test[ed] a subgroup of sixty-and-older employees, which Dr. Campion did not include in his analysis because ‘[t]here are only 14 terminations, which means the statistical power to detect a significant effect is very low’. A.244–45.” Karlo v. Pittsburgh Glass Works, LLC, 849 F.3d 61, 82 n.15 (3d Cir. 2017).

4 In the trial court’s words, the z-score converts the difference in termination rates into standard deviations. Karlo v. Pittsburgh Glass Works, LLC, C.A. No. 2:10-cv-01283, 2015 WL 4232600, at *11 n.13 (W.D. Pa. July 13, 2015). According to the trial court, Campion gave a rather dubious explanation of the meaning of the z-score: “[w]hen the number of standard deviations is less than –2 (actually–1.96), there is a 95% probability that the difference in termination rates of the subgroups is not due to chance alone” Id. (internal citation omitted).

5 See 849 F.3d 61, 83 (3d Cir. 2017) (citing and quoting from Paetzold & Willborn § 6:7, at 308 n.2) (describing the Bonferroni adjustment as “good statistical practice,” but “not widely or consistently adopted” in the behavioral and social sciences); see also E.E.O.C. v. Autozone, Inc., No. 00-2923, 2006 WL 2524093, at *4 (W.D. Tenn. Aug. 29, 2006) (“[T]he Court does not have a sufficient basis to find that … the non-utilization [of the Bonferroni adjustment] makes [the expert’s] results unreliable.”). And of course, the Third Circuit invoked the Daubert chestnut: “Vigorous cross-examination, presentation of contrary evidence, and careful instruction on the burden of proof are the traditional and appropriate means of attacking shaky but admissible evidence.” Daubert, 509 U.S. 579, 596 (1993).

6 See 849 F.3d at 83 (citing Leonard v. Stemtech Internat’l Inc., 834 F.3d 376, 391 (3d Cir. 2016)).

7 See 849 F.3d 61, 83 (3d Cir. 2017), citing Bazemore v. Friday, 478 U.S. 385, 400 (1986) (“Normally, failure to include variables will affect the analysis’ probativeness, not its admissibility.”).

8 See Hans Zeisel & David Kaye, Prove It with Figures: Empirical Methods in Law and Litigation 93 & n.3 (1997) (criticizing the “notorious” case of Wells v. Ortho Pharmaceutical Corp., 788 F.2d 741 (11th Cir.), cert. denied, 479 U.S. 950 (1986), for its erroneous endorsement of conclusions based upon “statistically significant” studies that explored dozens of congenital malformation outcomes, without statistical adjustment). The authors do, however, give an encouraging example of a English trial judge who took multiplicity seriously. Reay v. British Nuclear Fuels (Q.B. Oct. 8,1993) (published in The Independent, Nov. 22,1993). In Reay, the trial court took seriously the multiplicity of hypotheses tested in the study relied upon by plaintiffs. Id. (“the fact that a number of hypotheses were considered in the study requires an increase in the P-value of the findings with consequent reduction in the confidence that can be placed in the study result … .”), quoted in Zeisel & Kaye at 93. Zeisel and Kaye emphasize that courts should not be overly impressed with claims of statistically significant findings, and should pay close attention to how expert witnesses developed their statistical models. Id. at 94.

9 See David B. Cohen, Michael G. Aamodt, and Eric M. Dunleavy, Technical Advisory Committee Report on Best Practices in Adverse Impact Analyses (Center for Corporate Equality 2010).

10 Kenneth J. Rothman, Sander Greenland, and Timothy L. Lash, Modern Epidemiology 273 (3d ed. 2008); see also Kenneth J. Rothman, “No Adjustments Are Needed for Multiple Comparisons,” 1 Epidemiology 43, 43 (1990).

Another Haack Article on Daubert

October 14th, 2016

In yet another law review article on Daubert, Susan Haack has managed mostly to repeat her past mistakes, while adding a few new ones to her exegesis of the law of expert witnesses. See Susan Haack, “Mind the Analytical Gap! Tracing a Fault Line in Daubert,” 61 Wayne L. Rev. 653 (2016) [cited as Gap].  Like some other commentators on the law of evidence, Haack purports to discuss this area of law without ever citing or quoting the current version of the relevant statute, Federal Rule of Evidence 702. She pores over Daubert and Joiner, as she has done before, with mostly the same errors of interpretation. In discussing Joiner, Haack misses the importance of the Supreme Court’s reversal of the 11th Circuit’s asymmetric standard of review for Rule 702 trial court decisions. Gap at 677. And Haack’s analysis of this area of law omits any mention of Rule 703, and its role in Rule 702 determinations. Although you can safely skip yet another Haack article, you should expect to see this one, along with her others, cited in briefs, right up there with David Michaels’ Manufacturing Doubt.

A Matter of Degree

“It may be said that the difference is only one of degree. Most differences are, when nicely analyzed.”[1]

Quoting Holmes, Haack appears to complain that the courts’ admissibility decisions on expert witnesses’ opinions are dichotomous and categorical, whereas the component parts of the decisions, involving relevance and reliability, are qualitative and gradational. True, true, and immaterial.

How do you boil a live frog so it does not jump out of the water?  You slowly turn up the heat on the frog by degrees.  The frog is lulled into complacency, but at the end of the process, the frog is quite, categorically, and sincerely dead. By a matter of degrees, you can boil a frog alive in water, with a categorically ascertainable outcome.

Humans use categorical assignments in all walks of life.  We rely upon our conceptual abilities to differentiate sinners and saints, criminals and paragons, scholars and skells. And we do this even though IQ, and virtues, come in degrees. In legal contexts, the finder of fact (whether judge or jury) must resolve disputed facts and render a verdict, which will usually be dichotomous, not gradational.

Haack finds “the elision of admissibility into sufficiency disturbing,” Gap at 654, but that is life, reason, and the law. She suggests that the difference in the nature of relevancy and reliability on the one hand, and admissibility on the other, creates a conceptual “mismatch.” Gap at 669. The suggestion is rubbish, a Briticism that Haack is fond of using herself.  Clinical pathologists may diagnose cancer by counting the number of mitotic spindles in cells removed from an organ on biopsy.  The number may be characterized as a percentage of cells in mitosis, a gradational measure that can run from zero to 100 percent, but the conclusion that comes out of the pathologist’s review is a categorical diagnosis.  The pathologist must decide whether the biopsy result is benign or malignant. And so it is with many human activities and ways of understanding the world.

The Problems with Daubert (in Haack’s View)

Atomism versus Holism

Haack repeats a litany of complaints about Daubert, but she generally misses the boat.  Daubert was decisional law, in 1993, which interpreted a statute, Federal Rule of Evidence 702.  The current version of Rule 702, which was not available to, or binding on, the Court in Daubert, focuses on both validity and sufficiency concerns:

A witness who is qualified as an expert by knowledge, skill, experience, training, or education may testify in the form of an opinion or otherwise if:

(a) the expert’s scientific, technical, or other specialized knowledge will help the trier of fact to understand the evidence or to determine a fact in issue;

(b) the testimony is based on sufficient facts or data;

(c) the testimony is the product of reliable principles and methods; and

(d) the expert has reliably applied the principles and methods to the facts of the case.

Subsection (b) renders most of Haack’s article a legal ignoratio elenchi.

Relative Risks Greater Than Two

Modern chronic disease epidemiology has fostered an awareness that there is a legitimate category of disease causation that involves identifying causes that are neither necessary nor sufficient to produce their effects. Today it is a commonplace that an established cause of lung cancer is cigarette smoking, and yet, not all smokers develop lung cancer, and not all lung cancer patients were smokers.  Epidemiology can identify lung cancer causes such as smoking because it looks at stochastic processes that are modified from base rates, or population rates. This model of causation is not expected to produce uniform and consistent categorical outcomes in all exposed individuals, such as lung cancer in all smokers.

A necessary implication of categorizing an exposure or lifestyle variable as a “cause” in this way is that the evidence that helps establish causation cannot answer whether a given individual case of the outcome of interest was caused by the exposure of interest, even when that exposure is a known cause.  We can certainly say that the exposure in the person was a risk for developing the disease later, but we often have no way to make the individual attribution.  In some cases, more the exception than the rule, there may be an identified mechanism that allows the detection of a “fingerprint” of causation. For the most part, however, risk and cause are two completely different things.

The magnitude of risk, expressed as a risk ratio, can be used to calculate a population attributable risk, which can in turn, with some caveats, be interpreted as approximating a probability of causation.  When the attributable risk is 95%, as it would be for people with light smoking habits and lung cancer, treating the existence of the prior risk as evidence of specific causation seems perfectly reasonable.  Treating a 25% attributable risk as evidence to support a conclusion of specific causation, without more, is simply wrong.  A simple probabilistic urn model would tell us that we would most likely be incorrect if we attributed a random case to the risk based upon such a low attributable risk.  Although we can fuss over whether the urn model is correct, the typical case in litigation allows no other model to be asserted, and it would be the plaintiffs’ burden of proof to establish the alternative model in any event.
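The arithmetic is simple enough to state exactly. Under the usual (and caveat-laden) identification of the attributable fraction with the probability of causation:

\[
\text{AF} \;=\; \frac{\text{RR} - 1}{\text{RR}}, \qquad
\text{RR} = 20 \;\Rightarrow\; \text{AF} = 95\%, \qquad
\text{RR} = 2 \;\Rightarrow\; \text{AF} = 50\%, \qquad
\text{RR} = \tfrac{4}{3} \;\Rightarrow\; \text{AF} = 25\%.
\]

A relative risk of two thus marks the break-even point at which a random case is no more likely than not attributable to the exposure, which is why risk ratios below two, standing alone, are such feeble proof of specific causation.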

As she has done many times before, Haack criticizes Judge Kozinski’s opinion in Daubert,[2] on remand, where he entered judgment for the defendant because further proceedings were futile given the small relative risks claimed by plaintiffs’ expert witnesses.  Those relative risks, advanced by Shanna Swan and Alan Done, lacked reliability; they were the product of a for-litigation juking of the stats that was the original target of the defendant and the medical community in the Supreme Court briefing.  Judge Kozinski simplified the case, using a common legal stratagem of assuming arguendo that general causation was established.  With this assumption favorable to plaintiffs made, but never proven or accepted, Judge Kozinski could then shine his analytical light on the fatal weakness of the specific causation opinions.  When all the hand waving was put to rest, all that propped up the plaintiffs’ specific causation claim was the existence of a claimed relative risk, which was less than two. Haack is unhappy with the analytical clarity achieved by Kozinski, and implicitly urges a conflation of general and specific causation so that “all the evidence” can be counted.  The evidence of general causation, however, does not advance plaintiffs’ specific causation case when the nature of causation is the (assumed) existence of a non-necessary and non-sufficient risk. Haack quotes Dean McCormick as having observed that “[a] brick is not a wall,” and accuses Judge Kozinski of an atomistic fallacy of ruling out a wall simply because the party had only bricks.  Gap at 673, quoting from Charles McCormick, Handbook of the Law of Evidence at 317 (1954).

There is a fallacy opposite to the atomistic fallacy, however, namely the holistic “too much of nothing fallacy” so nicely put by Poincaré:

“Science is built up with facts, as a house is with stones. But a collection of facts is no more a science than a heap of stones is a house.”[3]

Poincaré’s metaphor is more powerful than Haack’s call for holistic evidence because it acknowledges that interlocking pieces of evidence may cohere as a building, or they may be no more than a pile of rubble.  Poorly constructed walls may soon revert to the pile of stones from which they came.

Haack proceeds to criticize Judge Kozinski for his “extraordinary argument” that

“(a) equates degrees of proof with statistical probabilities;

(b) assesses each expert’s testimony individually; and

(c) raises the standard of admissibility under the relevance prong to the standard of proof.”

Gap at 672.

Haack misses the point that a low relative risk, with no other valid evidence of specific causation, translates into a low probability of specific causation, even if general causation were apodictically certain. Aggregating the testimony, say, between animal toxicologists and epidemiologists, simply does not advance the epistemic ball on specific causation, because all the evidence collectively does not help identify the cause of Jason Daubert’s birth defects on the very model of causation that plaintiffs’ expert witnesses advanced.

All this would be bad enough, but Haack then goes on to commit a serious category mistake in confusing the probabilistic inference (for specific causation) of an urn model with the prosecutor’s fallacy of interpreting a random match probability as the evidence of innocence. (Or the complement of the random match probability as the evidence of guilt.) Judge Kozinski was not working with random match probabilities, and he did not commit the prosecutor’s fallacy.

Take Some Sertraline and Call Me in the Morning

As depressing as Haack’s article is, she manages to make matters even gloomier by attempting a discussion of Judge Rufe’s recent decision in the sertraline birth defects litigation. Haack’s discussion of this decision illustrates and typifies her analyses of other cases, including various decisions on causation opinion testimony on phenylpropanolamine, silicone, Bendectin, t-PA, and other occupational, environmental, and therapeutic exposures. Maybe 100 mg of sertraline is in order.

Haack criticizes what she perceives to be the conflation of admissibility and sufficiency issues in how the sertraline MDL court addressed the defendants’ motion to exclude the proffered testimony of Dr. Anick Bérard. Gap at 683. The conflation is imaginary, however, and the direct result of Haack’s refusal to look at the specific, multiple methodological flaws in the approach that plaintiffs’ expert witness Anick Bérard took to reach a causal conclusion. These flaws are not gradational, and they are detailed in the MDL court’s opinion[4] excluding Bérard. Haack, however, fails to look at the details. Instead Haack focuses on what she suggests is the sertraline MDL court’s conclusion that epidemiology was necessary:

“Judge Rufe argues that reliable testimony about human causation should generally be supported by epidemiological studies, and that ‘when epidemiological studies are equivocal or inconsistent with a causation opinion, experts asserting causation opinions must thoroughly analyze the strengths and weaknesses of the epidemiological research and explain why [it] does not contradict or undermine their opinion’. * * *

Judge Rufe acknowledges the difference between admissibility and sufficiency but, when it comes to the part of their testimony he [sic] deems inadmissible, his [sic] argument seems to be that, in light of the defendant’s epidemiological evidence, the plaintiffs’ expert testimony is insufficient.”

Gap at 682.

This précis is a remarkable distortion of the material facts of the case. There was no separate body of plaintiffs’ epidemiologic evidence and defendants’ epidemiologic evidence.  Rather there was epidemiologic evidence, and Bérard ignored, misreported, or misrepresented a good deal of the total evidentiary display. Bérard embraced studies when she could use their risk ratios to support her opinions, but criticized or ignored the same studies when their risk ratios pointed in the direction of no association or even of a protective association. To add to this methodological duplicity, Bérard published many statements, in peer-reviewed journals, that sertraline was not shown to cause birth defects, but then changed her opinion solely for litigation. The court’s observation that there was a need for consistent epidemiologic evidence flowed not only from the conception of causation (non-necessary, non-sufficient), but from Bérard’s and her fellow plaintiffs’ expert witnesses’ concessions that epidemiology was needed.  Haack’s glib approach to criticizing judicial opinions fails to do justice to the difficulties of the task; nor does she advance any meaningful criteria to separate successful from unsuccessful efforts.

In attempting to make her case for the gradational nature of relevance and reliability, Haack acknowledges that the details of the evidence relied upon can render the evidence, and presumably the conclusion based thereon, more or less reliable.  Thus, we are told that epidemiologic studies based upon self-reported diagnoses are highly unreliable because such diagnoses are often wrong. Gap at 667-68. Similarly, we are told that, in considering a claim that a plaintiff suffered an adverse effect from a medication, epidemiologic evidence showing a risk ratio of three would not be reliable if it had inadequate or inappropriate controls,[5] was not double blinded, and lacked randomization. Gap at 668-69. Even if the boundaries between reliable and unreliable are not always as clear as we might like, Haack fails to show that the gatekeeping process lacks a suitable epistemic, scientific foundation.

Curiously, Haack calls out Carl Cranor, plaintiffs’ expert witness in the Milward case, for advancing a confusing, vacuous “weight of the evidence” rationale for the methodology employed by the other plaintiffs’ causation expert witnesses in Milward.[6] Haack argues that Cranor’s invocation of “inference to the best explanation” and “weight of the evidence” fails to answer the important questions at issue in the case, namely how to weight the inference to causation as strong, weak, or absent. Gap at 688 & n. 223, 224. And yet, when Haack discusses court decisions that detailed voluminous records of evidence about how causal inferences should be made and supported, she flies over the details to give us confused, empty conclusions that the trial courts conflated admissibility with sufficiency.


[1] Rideout v. Knox, 19 N.E. 390, 392 (Mass. 1889).

[2] Daubert v. Merrell Dow Pharm., Inc., 43 F.3d 1311, 1320 (9th Cir. 1995).

[3] Jules Henri Poincaré, La Science et l’Hypothèse (1905) (chapter 9, Les Hypothèses en Physique)( “[O]n fait la science avec des faits comme une maison avec des pierres; mais une accumulation de faits n’est pas plus une science qu’un tas de pierres n’est une maison.”).

[4] In re Zoloft Prods. Liab. Litig., 26 F. Supp. 3d 466 (E.D. Pa. 2014).

[5] Actually Haack’s suggestion is that a study with a relative risk of three would not be very reliable if it had no controls, but that suggestion is incoherent.  A risk ratio could not have been calculated at all if there had been no controls.

[6] Milward v. Acuity Specialty Prods., 639 F.3d 11, 17-18 (1st Cir. 2011), cert. denied, 132 S.Ct. 1002 (2012).

Judge Bernstein’s Criticism of Rule 703 of the Federal Rules of Evidence

August 30th, 2016

Federal Rule of Evidence 703 addresses the bases of expert witness opinions, and it is a mess. The drafting of this Rule is particularly sloppy. The Rule tells us, among other things, that:

“[i]f experts in the particular field would reasonably rely on those kinds of facts or data in forming an opinion on the subject, they need not be admissible for the opinion to be admitted.”

This sentence of the Rule has a simple grammatical and logical structure:

If A, then B;

where A contains the concept of reasonable reliance, and B tells us the consequence that the relied upon material need not be itself admissible for the opinion to be admissible.

But what happens if the expert witness has not reasonably relied upon certain facts or data; i.e., ~A?  The conditional statement as given does not describe the outcome in this situation. We are not told what happens when an expert witness’s reliance in the particular field is unreasonable.  ~A does not necessarily imply ~B. Perhaps the drafters meant to write:

B if and only if A.

But the drafters did not give us the above rule, and they have left judges and lawyers to make sense of their poor grammar and bad logic.
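The drafting point can be checked mechanically. A minimal truth-table sketch (the propositional shorthand is mine, not the Rule’s):

```python
# Material implication ("if A then B") is silent about B when A is false;
# the biconditional ("B if and only if A") is not.
from itertools import product

print(f"{'A':>5} {'B':>5} {'A -> B':>8} {'B <-> A':>9}")
for a, b in product([True, False], repeat=2):
    implies = (not a) or b   # material implication
    iff = (a == b)           # biconditional
    print(f"{a!s:>5} {b!s:>5} {implies!s:>8} {iff!s:>9}")

# In the two rows where A is false, "A -> B" holds whether B is true or
# false: unreasonable reliance licenses no conclusion about admissibility
# under the Rule as written. Only the biconditional would supply one.
```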

And what happens when the reliance material is independently admissible, say as a business record, a government report, or a first-person observation? May an expert witness rely upon admissible facts or data, even when a reasonable expert would not do so? Again, it seems that the drafters were trying to limit expert witness reliance to some rule of reason, but by tying reliance to the admissibility of the reliance material, they managed to conflate two separate notions.

And why is reliance judged by the expert witness’s particular field? Fields of study and areas of science and technology overlap. In some fields, it is commonplace for putative experts to rely upon materials that would not be given the time of day in other fields. Should we judge the reasonableness of homeopathic healthcare providers’ reliance by the standards of reasonableness in homeopathy, such as it is, or should we judge it by the standards of medical science? The answer to this rhetorical question seems obvious, but the drafters of Rule 703 introduced a Balkanized concept of science and technology by introducing the notion of the expert witness’s “particular field.” The standard of Rule 702 is “knowledge” and “helpfulness,” neither of which is constrained by “particular fields.”

And then Rule 703 leaves us in the dark about how to handle an expert witness’s reliance upon inadmissible facts or data. According to the Rule, “the proponent of the opinion may disclose [the inadmissible facts or data] to the jury only if their probative value in helping the jury evaluate the opinion substantially outweighs their prejudicial effect.” And yet, disclosing inadmissible facts or data would always be highly prejudicial because they represent facts and data that the jury is forbidden to consider in reaching its verdict. Nonetheless, trial judges routinely tell juries that an expert witness’s opinion is no better than the facts and data on which the opinion is based. If the facts and data are inadmissible, the jury must disregard them in its fact finding; and if an expert witness’s opinion is based upon facts and data that are to be disregarded, then the expert witness’s opinion must be disregarded as well. Or so common sense and respect for the trial’s truth-finding function would suggest.

The drafters of Rule 703 do not shoulder all the blame for the illogic and bad results of the rule. The judicial interpretation of Rule 703 has been sloppy, as well. The Rule’s “plain language” tells us that “[a]n expert may base an opinion on facts or data in the case that the expert has been made aware of or personally observed.”  So expert witnesses should be arriving at their opinions through reliance upon facts and data, but many expert witnesses rely upon others’ opinions, and most courts seem to be fine with such reliance.  And the reliance is often blind, as when medical clinicians rely upon epidemiologic opinions, which in turn are based upon data from studies that the clinicians themselves are incompetent to interpret and critique.

The problem of reliance, as contained within Rule 703, is deep and pervasive in modern civil and criminal trials. In the trial of health effect claims, expert witnesses rely upon epidemiologic and toxicologic studies that contain multiple layers of hearsay, often with little or no validation of the trustworthiness of many of those factual layers. The inferential methodologies are often obscure, even to the expert witnesses, and trial counsel are frequently untrained and ill prepared to expose the ignorance and mistakes of the expert witnesses.

Back in February 2008, I presented at an ALI-ABA conference on expert witness evidence about the problems of Rule 703.[1] I laid out a critique of Rule 703, which showed that the Rule permitted expert witnesses to rely upon “castles in the air.” A distinguished panel of law professors and judges seemed to agree; at least no one offered a defense of Rule 703.

Shortly after I presented at the ALI-ABA conference, Professor Julie E. Seaman published an insightful law review article in which she framed the problems of Rule 703 as constitutional issues.[2] Encouraged by Professor Seaman’s work, I wrote up my comments on Rule 703 for an ABA publication,[3] and I have updated those comments in the light of subsequent judicial opinions,[4] as well as the failure of the Third Edition of the Reference Manual on Scientific Evidence to address the problems.[5]

===================

Judge Mark I. Bernstein is a trial court judge for the Philadelphia County Court of Common Pleas. I never tried a case before Judge Bernstein, who has announced his plans to leave the Philadelphia bench after 29 years of service,[6] but I had heard from some lawyers (on both sides of the bar) that he was a “pro-plaintiff” judge. Some years ago, I sat next to him on a CLE panel on trial evidence, at which he disparaged judicial gatekeeping,[7] which seemed to support his reputation. The reality seems to be more complex. Judge Bernstein has shown that he can be a critical consumer of complex scientific evidence, and an able gatekeeper under Pennsylvania’s crazy quilt-work pattern of expert witness law. For example, in a hotly contested birth defects case involving sertraline, Judge Bernstein held a pre-trial evidentiary hearing and looked carefully at the proffered testimony of Michael D. Freeman, a chiropractor and self-styled “forensic epidemiologist,” and Robert Cabrera, a teratologist. Applying a robust interpretation of Pennsylvania’s Frye rule, Judge Bernstein excluded Freeman’s and Cabrera’s proffered testimony, and entered summary judgment for defendant Pfizer, Inc. Porter v. Smithkline Beecham Corp., 2016 WL 614572 (Phila. Cty. Ct. Com. Pl.). See “Demonstration of Frye Gatekeeping in Pennsylvania Birth Defects Case” (Oct. 6, 2015).

And Judge Bernstein has shown that he is one of the few judges who takes seriously Rule 705’s requirement that expert witnesses produce their relied-upon facts and data at trial, on cross-examination. In Hansen v. Wyeth, Inc., Dr. Harris Busch, a frequent testifier for plaintiffs, glibly opined about the defendant’s negligence. On cross-examination, he adverted to the volumes of depositions and documents he had reviewed, but when defense counsel pressed, the witness was unable to produce and show exactly what he had reviewed. After the jury returned a verdict for the plaintiff, Judge Bernstein set the verdict aside because of the expert witness’s failure to comply with Rule 705. Hansen v. Wyeth, Inc., 72 Pa. D. & C. 4th 225, 2005 WL 1114512, at *13, *19 (Phila. Ct. Common Pleas 2005) (granting new trial on post-trial motion), 77 Pa. D. & C. 4th 501, 2005 WL 3068256 (Phila. Ct. Common Pleas 2005) (opinion in support of affirmance after notice of appeal).

In a recent law review article, Judge Bernstein has issued a withering critique of Rule 703. See Hon. Mark I. Bernstein, “Jury Evaluation of Expert Testimony Under the Federal Rules,” 7 Drexel L. Rev. 239 (2015). Judge Bernstein is clearly dissatisfied with the current approach to expert witnesses in federal court, and he lays almost exclusive blame on Rule 703 and its permission to hide the crucial facts, data, and inferential processes from the jury. In his law review article, Judge Bernstein characterizes Rules 703 and 705 as empowering “the expert to hide personal credibility judgments, to quietly draw conclusions, to individually decide what is proper evidence, and worst of all, to offer opinions without even telling the jury the facts assumed.” Id. at 264. Judge Bernstein cautions that the subversion of the factual predicates for expert witnesses’ opinions under Rule 703 has significant, untoward consequences for the court system. Not only are lawyers allowed to hire professional advocates as expert witnesses, but the availability of such professional witnesses permits and encourages the filing of unnecessary litigation. Id. at 286. Hear, hear.

Rule 703’s practical consequence of eliminating the hypothetical question has enabled the expert witness qua advocate, and has up-regulated the trial as a contest of opinions and opiners rather than as an adversarial procedure that is designed to get at the truth. Id. at 266-67. Without having access to real, admissible facts and data, the jury is forced to rely upon proxies for the truth: qualifications, demeanor, and courtroom poise, all of which fail the jury and the system in the end.

As a veteran trial judge, Judge Bernstein makes a persuasive case that the non-disclosure permitted under Rule 703 is not really curable under Rule 705. Id. at 288.  If the cross-examination inquiry into reliance material results in the disclosure of inadmissible facts, then judges and the lawyers must deal with the charade of a judicial instruction that the identification of the inadmissible facts is somehow “not for the truth.” Judge Bernstein argues, as have many others, that this “not for the truth” business is an untenable fiction, either not understood or ignored by jurors.

Opposing counsel, of course, may ask for an elucidation of the facts and data relied upon. But when they consider the time and difficulty involved in cross-examining highly experienced, professional witnesses, opposing counsel usually choose to traverse the adverse opinion by presenting their own expert witness’s opinion, rather than getting into nettlesome details and risking looking foolish in front of the jury, or, even worse, allowing the highly trained adverse expert witness to run off at the mouth.

As powerful as Judge Bernstein’s critique of Rule 703 is, his analysis misses some important points. Lawyers and judges have other motives for not wanting to elicit underlying facts and data: they do not want to “get into the weeds,” and they want to avoid technical questions of valid inference and quality of data. Yet sometimes the truth is in the weeds. Their avoidance of addressing the nature of inference, as well as facts and data, often serves to make gatekeeping a sham.

And then there is the problem that arises from the lack of time, interest, and competence among judges and jurors to understand the technical details of the facts and data, and inferences therefrom, which underlie complex factual disputes in contemporary trials. Cross-examination is reduced to the attempt to elicit “sound bites” and “cheap shots,” which can be used in closing argument. This approach is common on both sides of the bar, in trials before judges and juries, and even at so-called Daubert hearings. See David E. Bernstein & Eric G. Lasker, “Defending Daubert: It’s Time to Amend Federal Rule of Evidence 702,” 57 William & Mary L. Rev. 1, 32 (2015) (“Rule 703 is frequently ignored in Daubert analyses”).

The Rule 702 and 703 pretrial hearing is an opportunity to address the highly technical validity questions, but even then, the process is doomed to failure unless trial judges make adequate time and adopt an attitude of real intellectual curiosity to permit a proper exploration of the evidentiary issues. Trial lawyers often discover that a full exploration is technical and tedious, and that it pisses off the trial judge. As much as judges dislike having to serve as gatekeepers of expert witness opinion testimony, they dislike even more having to assess the reasonableness of individual expert witness’s reliance upon facts and data, especially when this inquiry requires a deep exploration of the methods and materials of each relied upon study.

In fairness to something like Rule 703, Bernstein’s critique ignores that some facts and data will never be independently admissible. Epidemiologic studies, with their multiple layers of hearsay, come to mind.

Judge Bernstein, as a reformer, is wrong to suggest that the problem lies solely in hiding the facts and data from the jury. Rules 702 and 703 march together, and there are problems with both that require serious attention. See David E. Bernstein & Eric G. Lasker, “Defending Daubert: It’s Time to Amend Federal Rule of Evidence 702,” 57 William & Mary L. Rev. 1 (2015); see also “On Amending Rule 702 of the Federal Rules of Evidence” (Oct. 17, 2015).

And we should remember that the problem is not solely with juries and their need to see the underlying facts and data. Judges try cases too, and can butcher scientific inference without any help from a lay jury. Then there is the problem of relied-upon opinions, discussed above. And then there is the problem of unreasonable reliance of the sort that juries cannot discern even if they see the underlying, relied-upon facts and data.


[1] Schachtman, “Rule 703 – The Problem Child of Article VII”; and “The Effective Presentation of Defense Expert Witnesses and Cross-examination of Plaintiffs’ Expert Witnesses”; at the ALI-ABA Course on Opinion and Expert Witness Testimony in State and Federal Courts (February 14-15, 2008).

[2] See Julie E. Seaman, “Triangulating Testimonial Hearsay: The Constitutional Boundaries of Expert Opinion Testimony,” 96 Georgetown L.J. 827 (2008).

[3]  Nathan A. Schachtman, “Rule of Evidence 703—Problem Child of Article VII,” 17 Proof 3 (Spring 2009).

[4] “RULE OF EVIDENCE 703 — Problem Child of Article VII” (Sept. 19, 2011).

[5] See “Giving Rule 703 the Cold Shoulder” (May 12, 2012); “New Reference Manual on Scientific Evidence Short Shrifts Rule 703” (Oct. 16, 2011).

[6] Max Mitchell, “Bernstein Announces Plan to Step Down as Judge,” The Legal Intelligencer (July 29, 2016).

[7] See Schachtman, “Court-Appointed Expert Witnesses,” for Mealey’s Judges & Lawyers in Complex Litigation, Class Actions, Mass Torts, MDL and the Monster Case Conference, in West Palm Beach, Florida (November 8-9, 1999). I don’t recall Judge Bernstein’s exact topic, but I remember he criticized the Pennsylvania Supreme Court’s decision in Blum v. Merrell Dow Pharmaceuticals, 534 Pa. 97, 626 A.2d 537 (1993), which reversed a judgment for plaintiffs, and adopted what Judge Bernstein derided as a blending of Frye and Daubert, which he called Fraubert. Judge Bernstein had presided over the Blum trial, which resulted in the verdict for plaintiffs.

Systematic Reviews and Meta-Analyses in Litigation, Part 2

February 11th, 2016

Daubert in Knee’d

In a recent federal court case, adjudicating a plaintiff’s Rule 702 challenge to defense expert witnesses, the trial judge considered plaintiff’s claim that the challenged witness had deviated from PRISMA guidelines[1] for systematic reviews, and thus presumably had deviated from the standard of care required of expert witnesses giving opinions about causal conclusions.

Batty v. Zimmer, Inc., MDL No. 2272, Master Docket No. 11 C 5468, No. 12 C 6279, 2015 WL 5050214 (N.D. Ill. Aug. 25, 2015) [cited as Batty I]. The trial judge, the Hon. Rebecca R. Pallmeyer, denied plaintiff’s motion to exclude the allegedly deviant witness, but appeared to accept the premise of the plaintiff’s argument that an expert witness’s opinion should be reached in the manner of a carefully constructed systematic review.[2] The trial court’s careful review of the challenged witness’s report and deposition testimony revealed that there had been no meaningful departure from the standards put forward for systematic reviews. See “Systematic Reviews and Meta-Analyses in Litigation” (Feb. 5, 2016).

Two days later, the same federal judge addressed a different set of objections by the same plaintiff to two other of defendant Zimmer, Inc.’s expert witnesses, Dr. Stuart Goodman and Dr. Timothy Wright. Batty v. Zimmer, Inc., MDL No. 2272, Master Docket No. 11 C 5468, No. 12 C 6279, 2015 WL 5095727 (N.D. Ill. Aug. 27, 2015) [cited as Batty II]. Once again, plaintiff Batty argued for the necessity of adherence to systematic review principles. According to Batty, Dr. Wright’s opinion, based upon his review of the clinical literature, was scientifically and legally unreliable because he had not conducted a proper systematic review. Plaintiff alleged that Dr. Wright’s review selectively “cherry picked” favorable studies to buttress his opinion, in violation of systematic review guidelines. The trial court, which had assumed that a systematic review was the appropriate “methodology” for Dr. Vitale in Batty I, refused to sustain the plaintiff’s challenge in Batty II, in large part because the challenged witness, Dr. Wright, had not claimed to have performed a systematic or comprehensive review, and so his failure to follow the standard methodology did not require the exclusion of his opinion at trial. Batty II at *3.

The plaintiff never argued that Dr. Wright misinterpreted any of his selected studies upon which he relied, and the trial judge thus suggested that Dr. Wright’s discussion of the studies, even if a partial, selected group of studies, would be helpful to the jury. The trial court thus left the plaintiff to her cross-examination to highlight Dr. Wright’s selectivity and lack of comprehensiveness. Apparently, in the trial judge’s view, this expert witness’s failure to address contrary studies did not render his testimony unreliable under “Daubert scrutiny.” Batty II at *3.

Of course, it is no longer the Daubert judicial decision that mandates scrutiny of expert witness opinion testimony, but Federal Rule of Evidence 702. Perhaps it was telling that when the trial court backed away from its assumption, made in Batty I, that guidelines or standards for systematic reviews should inform a Rule 702 analysis, the court cited Daubert, a judicial opinion superseded by an Act of Congress, in 2000. The trial judge’s approach, in Batty II, threatens to make gatekeeping meaningless by deferring to the expert witness’s invocation of personal, idiosyncratic, non-scientific standards. Furthermore, the Batty II approach threatens to eviscerate gatekeeping for clinical practitioners who remain blithely unaware of advances in epidemiology and evidence-based medicine. The upshot of Batty I and Batty II, combined, seems to be that systematic review principles apply to clinical expert witnesses only if those witnesses choose to be bound by such principles. If this is indeed what the trial court intended, then it is jurisprudential nonsense.

The trial court, in Batty II, exercised a more searching approach, however, to Dr. Wright’s own implant failure analysis, which he relied upon in an attempt to rebut plaintiff’s claim of defective design. The plaintiff claimed that the load-bearing polymer surfaces of the artificial knee implant experienced undue deformation. Dr. Wright’s study found little or no deformation on the load bearing polymer surfaces of the eight retrieved artificial joints. Batty II at *4.

Dr. Wright assessed deformation qualitatively, not quantitatively, through the use of a “colormetric map of deformation” of the polymer surface. Dr. Wright, however, provided no scale to define or assess how much deformation was represented by the different colors in his study. Notwithstanding the lack of any metric, Dr. Wright concluded that his findings, based upon eight retrieved implants, “suggested” that the kind of surface failing claimed by plaintiff was a “rare event.”

The trial court had little difficulty in concluding that Dr. Wright’s evidentiary base was insufficient, as was his presentation of the study’s data and inferences. The challenged witness failed to explain how his conclusions followed from his data, and thus his proffered testimony fell into the “ipse dixit” category of inadmissible opinion testimony. General Electric v. Joiner, 522 U.S. 136, 146 (1997). In the face of the challenge to his opinions, Dr. Wright supplemented his retrieval study with additional scans of surficial implant wear patterns, but he failed again to show the similarity of previous use and failure conditions between the patients from whom these implants were retrieved and the plaintiff’s case (which supposedly involved aseptic loosening). Furthermore, Dr. Wright’s interpretation of his own retrieval study was inadequate in the trial court’s view because he had failed to rule out other modes of implant failure, in which the polyethylene surface would have been preserved. Because, even as supplemented, Dr. Wright’s study failed to support his proffered opinions, the court held that his opinions, based upon his retrieval study, had to be excluded under Rule 702. The trial court did not address the Rule 703 implications of Dr. Wright’s reliance upon a study that was poorly designed and explained, and which lacked the ability to support his contention that the claimed mode of implant failure was a “rare” event. Batty II at *4-5.


[1] See David Moher , Alessandro Liberati, Jennifer Tetzlaff, Douglas G. Altman, & The PRISMA Group, “Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement,” 6 PLoS Med e1000097 (2009) [PRISMA].

[2] Batty v. Zimmer, Inc., MDL No. 2272, Master Docket No. 11 C 5468, No. 12 C 6279, 2015 WL 5050214 (N.D. Ill. Aug. 25, 2015).

ALI Reporters Are Snookered by Racette Fallacy

April 27th, 2015

In the Reference Manual on Scientific Evidence, the authors of the epidemiology chapter advance instances of acceleration of onset of disease as an example of a situation in which reliance upon doubling of risk will not provide a reliable probability of causation calculation[1]. In a previous post, I suggested that the authors’ assertion may be unfounded. See “Reference Manual on Scientific Evidence on Relative Risk Greater Than Two For Specific Causation Inference” (April 25, 2014). Several epidemiologic methods would permit the calculation of relative risk within specific time windows from first exposure.
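The algebra behind the doubling-of-risk heuristic is short enough to set out here, on the conventional (and contestable) assumptions of a valid, unconfounded relative risk:

```latex
\[
  \text{Attributable fraction} \;=\; \frac{RR - 1}{RR},
  \qquad\text{so that}\qquad
  RR > 2 \;\Longrightarrow\; \frac{RR - 1}{RR} > \frac{1}{2}.
\]
```

Nothing in this algebra breaks down for a relative risk calculated within a time window from first exposure; a window-specific risk ratio slots into the same formula, which is why the methods noted above would still permit a probability of causation calculation in claimed acceleration cases.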

The American Law Institute (ALI) Reporters, for the Restatement of Torts, make similar claims.[2] First, the Reporters, citing the Manual’s second edition, repeat the Manual’s claim that:

 “Epidemiologists, however, do not seek to understand causation at the individual level and do not use incidence rates in group studies to determine the cause of an individual’s disease.”

American Law Institute, Restatement (Third) of Torts: Liability for Physical and Emotional Harm § 28(a) cmt. c(4) & rptrs. notes (2010) [Comment c(4)]. In making this claim, the Reporters ignore an extensive body of epidemiologic studies on genetic associations and on biomarkers, which do address causation, implicitly or explicitly, on an individual level.

The Reporters also repeat the Manual’s doubtful claim that acceleration of onset of disease prevents an assessment of attributable risk, although they acknowledge that an average earlier age of onset would support a calculation of damages for the earlier onset, rather than damages for an injury that would not have occurred at all but for the tortious exposure. Comment c(4). The Reporters go a step further than the Manual, however, and provide an example of the acceleration-of-onset studies that they have in mind:

“For studies whose results suggest acceleration, see Brad A. Racette, Welding-Related Parkinsonism: Clinical Features, Treatments, and Pathophysiology,” 56 Neurology 8, 12 (2001) (stating that authors “believe that welding acts as an accelerant to cause [Parkinson’s Disease]… .”).

The citation to Racette’s 2001 paper[3] is curious, interesting, disturbing, and perhaps revealing. In this 2001 paper, Racette misrepresented the type of study he claimed to have done, and the inferences he drew from his case series are invalid. Anyone experienced in the field of epidemiology would have dismissed this study, its conclusions, and its suggested relation between welding and parkinsonism.

Dr. Brad A. Racette teaches and practices neurology at Washington University in St. Louis, across the river from a hotbed of mass tort litigation, Madison County, Illinois. In the 1990s, Racette received referrals from plaintiffs’ attorneys to evaluate their clients in litigation over exposure to welding fumes. Plaintiffs were claiming that their occupational exposures caused them to develop manganism, a distinctive parkinsonism that differs from Parkinson’s disease [PD], but has signs and symptoms that might be confused with PD by unsophisticated physicians unfamiliar with both manganism and PD.

After the publication of his 2001 paper, Racette became the darling of felon Dicky Scruggs and other plaintiffs’ lawyers. The litigation industrialists invited Racette and his team down to Alabama and Mississippi, to conduct screenings of welding tradesmen, recruited by Scruggs and his team, for potential lawsuits for PD and parkinsonism. The result was a paper that helped Scruggs propel a litigation assault against the welding industry.[4]

Racette’s 2001 paper was accompanied by a press release, as many of his papers have been, in which he was quoted as stating that “[m]anganism is a very different disease” from PD. Gila Reckess, “Welding, Parkinson’s link suspected” (Feb. 9, 2001)[5].

Racette’s 2001 paper provoked a strongly worded letter that called Racette and his colleagues out for misrepresenting the nature of their work:

“The authors describe their work as a case–control study. Racette et al. ascertained welders with parkinsonism and compared their concurrent clinical features to those of subjects with PD. This is more consistent with a cross-sectional design, as the disease state and factors of interest were ascertained simultaneously. Cross-sectional studies are descriptive and therefore cannot be used to infer causation.”

*****

“The data reported by Racette et al. do not necessarily support any inference about welding as a risk factor in PD. A cohort study would be the best way to evaluate the role of welding in PD.”

Bernard Ravina, Andrew Siderowf, John Farrar, Howard Hurtig, “Welding-related parkinsonism: Clinical features, treatment, and pathophysiology,” 57 Neurology 936, 936 (2001).

As we will see, Dr. Ravina and his colleagues were being charitable in suggesting that the study was more compatible with a cross-sectional design. Racette had set out to determine “whether welding-related parkinsonism differs from idiopathic PD.” He claimed that he had “performed a case-control study,” with a case group of welders and two control groups. The inferences drawn from his “data” are, however, fallacious because he employed an invalid study design.

In reality, Racette’s paper was nothing more than a chart review, a case series of 15 “welders” in the context of a movement disorder clinic. After his clinical and radiographic evaluation, Racette found that these 15 cases were clinically indistinguishable from PD, and thus unlike manganism. Racette did not reveal whether any of these 15 welders had been referred by plaintiffs’ counsel; nor did he suggest that these welding tradesmen made up a disproportionate number of his patient base in St. Louis, Missouri.

Racette compared his selected 15 career welders with PD to his general movement disorders clinic patient population. From that patient population, Racette deployed two “control” groups, one matched for age and sex with the 15 welders, and the other unmatched. The American Law Institute Reporters are indeed correct that Racette suggested that the average age of onset for these 15 welders was lower than that for his non-welder patients, but their uncritical embrace overlooked the fact that Racette’s suggestion does not support his claimed inference that “welding exposure acts as an accelerant to cause PD.”

Racette’s claimed inference is remarkable because he did not perform an analytical epidemiologic study capable of generating causal inferences. His paper incongruously presents odds ratios, although the controls have PD, the disease of interest, which invalidates any analytical inference from his case series. Given the referral and selection biases inherent in tertiary-care specialty practices, this paper can provide no reliable inferences about associations or differences in ages of onset. Even within the confines of a case series misrepresented as a case-control study, Racette acknowledged that “[s]ubsequent comparisons of the welders with age-matched controls showed no significant differences.”

NOT A CASE-CONTROL STUDY

That Racette wrongly identified his paper as a case-control study is beyond debate. How the journal Neurology accepted the paper for publication is a mystery. The acceptance of the inference by the ALI Reporters, lawyers, and judges is regrettable.

Structurally, Racette’s paper could never qualify as a case-control study, or any other analytical epidemiologic study. Here is how a leading textbook on case-control studies defines a case-control study:

“In a case-control study, individuals with a particular condition or disease (the cases) are selected for comparison with a series of individuals in whom the condition or disease is absent (the controls).”

James J. Schlesselman, Case-control Studies. Design, Conduct, Analysis at 14 (N.Y. 1982)[6].

Every patient in Racette’s paper, welder and non-welder alike, has the outcome of interest, PD. There is no epidemiologic study design that corresponds to what Racette did, and there is no way to draw any useful inference from Racette’s comparisons. Racette’s paper violates the key principle for a proper case-control study; namely, all subjects must be selected independently of the study exposure that is under investigation. Schlesselman stressed that identifying an eligible case or control must not depend upon that person’s exposure status for any factor under consideration. Id. Racette’s 2001 paper deliberately violated this basic principle.
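The structural defect is visible in the arithmetic of a proper case-control analysis, sketched below with invented counts:

```python
# A proper case-control 2x2 table requires disease-free controls:
#
#                cases   controls
#   exposed        a        b
#   unexposed      c        d
#
# Hypothetical counts for illustration only:
a, b, c, d = 15, 85, 30, 270            # the controls do NOT have PD
odds_ratio = (a * d) / (b * c)
print(f"Odds ratio: {odds_ratio:.2f}")  # 1.59

# In Racette's paper, every subject, "case" and "control" alike, has PD.
# The b and d cells (disease-free subjects) do not exist, and no odds
# ratio bearing on disease risk can be formed.
```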

Racette’s study design, with only cases with the outcome of interest appearing in the analysis, recklessly obscures the underlying association between the exposure (welding) and age in the population. We would, of course, expect self-identified welders to be younger than the average Parkinson’s disease patient because welding is physical work that requires good health. An equally fallacious study could be cobbled together to “show” that the age-of-onset of Parkinson’s disease for sitcom actors (such as Michael J. Fox) is lower than the age-of-onset of Parkinson’s disease for Popes (such as John Paul II). Sitcom actors are generally younger as a group than Popes. Comparing age of onset between disparate groups that have different age distributions generates a biased comparison and an erroneous inference.
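A short simulation, using made-up numbers, shows the fallacy at work. Both simulated groups face identical age-specific risks; only their age distributions differ, as one would expect when comparing working welders to a general movement disorders clinic population:

```python
# Identical disease risk by construction; only the age structure differs.
import random
random.seed(1)

def onset_ages(ages, annual_risk=0.001, horizon=5):
    """Onset ages among those who develop disease within the horizon,
    applying the same annual risk to every person at every age."""
    onsets = []
    for age in ages:
        for year in range(horizon):
            if random.random() < annual_risk:
                onsets.append(age + year)
                break
    return onsets

welders = [random.gauss(40, 8) for _ in range(100_000)]  # younger group
clinic = [random.gauss(65, 8) for _ in range(100_000)]   # older group

w, c = onset_ages(welders), onset_ages(clinic)
print(f"Mean age at onset, welders: {sum(w)/len(w):.1f}")
print(f"Mean age at onset, clinic:  {sum(c)/len(c):.1f}")

# The welders' mean onset is roughly 25 years earlier although the risk
# is identical by construction: the difference reflects age structure,
# not any "accelerant" effect of the exposure.
```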

The invalidity and fallaciousness of Racette’s approach to studying the age of onset of PD in welders, and his uncritical inferences, have been extensively commented upon in the general epidemiologic literature. For instance, studies that compared the age at death of left-handed and right-handed persons reported that left handers died, on average, nine years earlier, leading to (unfounded) speculation that the earlier mortality resulted from birth and life stressors and accidents for left handers living in a world designed to accommodate right-handed persons[7]. The inference has been shown to be fallacious, the artifact of social pressure in the early twentieth century to push left handers to use their right hands, a prejudicial practice that abated over the decades of the last century. Left handers born later in the century were less likely to be “switched,” whereas persons born earlier, and now dying, were less likely to be classified as left-handed, a birth-cohort effect[8]. When proper prospective cohort studies were conducted, valid data showed that left-handers and right-handers have equivalent mortality rates[9].

Epidemiologist Ken Rothman addressed the fallacy of Racette’s paper at some length in one of his books:

“Suppose we study two groups of people and look at the average age at death among those who die. In group A, the average age of death is 4 years; in group B, it is 28 years. Can we say that being a member of group A is riskier than being a member of group B? We cannot… . Suppose that group A comprises nursery school students and group B comprises military commandos. It would be no surprise that the average age at death of people who are currently military commandos is 28 years or that the average age of people who are currently nursery students is 4 years. …

In a study of factory workers, an investigator inferred that the factory work was dangerous because the average age of onset of a particular kind of cancer was lower in these workers than among the general population. But just as for the nursery school students and military commandos, if these workers were young, the cancers that occurred among them would have to be occurring in young people. Furthermore, the age of onset of a disease does not take into account what proportion of people get the disease.

These examples reflect the fallacy of comparing the average age at which death or disease strikes rather than comparing the risk of death between groups of the same age.”

Kenneth J. Rothman, “Introduction to Epidemiologic Thinking,” in Epidemiology: An Introduction at 5-6 (N.Y. 2002).

And here is how another author of Modern Epidemiology[10] addressed the Racette fallacy in a different context involving PD:

“Valid studies of age-at-onset require no underlying association between the risk factor and aging or birth cohort in the source population. They must also consider whether a sufficient induction time has passed for the risk factor to have an effect. When these criteria and others cannot be satisfied, age-specific or standardized risks or rates, or a population-based case-control design, must be used to study the association between the risk factor and outcome. These designs allow the investigator to disaggregate the relation between aging and the prevalence of the risk factor, using familiar methods to control confounding in the design or analysis. When prior knowledge strongly suggests that the prevalence of the risk factor changes with age in the source population, case-only studies may support a relation between the risk factor and age-at-onset, regardless of whether the inference is justified.”

Jemma B. Wilk & Timothy L. Lash, “Risk factor studies of age-at-onset in a sample ascertained for Parkinson disease affected sibling pairs: a cautionary tale,” 4 Emerging Themes in Epidemiology 1 (2007) (internal citations omitted) (emphasis added).

A properly designed epidemiologic study would have avoided Racette’s fallacy. A relevant cohort study would have enrolled welders at the outset of their careers, and would have continued to follow them even if they changed occupations. A case-control study would have enrolled cases with PD and controls without PD (or, more broadly, parkinsonism), with cases and controls selected independently of their exposure to welding fumes. Either method would have determined the rate of PD in both groups, absolutely or relatively. Racette’s paper, which completely lacked non-PD subjects, could not possibly have accomplished his stated objectives, and it did not support his claims.

Racette’s questionable work provoked a mass tort litigation and ultimately federal Multi-District Litigation 1535.[11] In time, analytical epidemiologic studies consistently showed no association between welding and PD. A meta-analysis published in 2012 ended the debate[12] as a practical matter, and MDL 1535 is no more. How strange that the ALI Reporters chose the Racette work as an example to support their claims about acceleration of onset!


[1] Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” in Federal Judicial Center, Reference Manual on Scientific Evidence 549, 614 (Wash., DC 3d ed. 2011).

[2] Michael D. Green was an ALI Reporter, and of course, an author of the chapter in the Reference Manual.

[3] Brad A. Racette, L. McGee-Minnich, S. M. Moerlein, J. W. Mink, T. O. Videen, and Joel S. Perlmutter, “Welding-related parkinsonism: clinical features, treatment, and pathophysiology,” 56 Neurology 8 (2001).

[4] See Brad A. Racette, S.D. Tabbal, D. Jennings, L. Good, Joel S. Perlmutter, and Brad Evanoff, “Prevalence of parkinsonism and relationship to exposure in a large sample of Alabama welders,” 64 Neurology 230 (2005); Brad A. Racette, et al., “A rapid method for mass screening for parkinsonism,” 27 Neurotoxicology 357 (2006) (duplicate publication of the earlier, 2005, paper).

[5] Previously available at <http://record.wustl.edu/archive/2001/02-09-01/articles/welding.html>, last visited on June 27, 2005.

[6] See also Brian MacMahon & Dimitrios Trichopoulos, Epidemiology. Principles and Methods at 229 (2d ed. 1996) (“A case-control study is an inquiry in which groups of individuals are selected based on whether they do (the cases) or do not (the controls) have the disease of which the etiology is to be studied.”); Jennifer L. Kelsey, W.D. Thompson, A.S. Evans, Methods in Observational Epidemiology at 148 (N.Y. 1986) (“In a case-control study, persons with a given disease (the cases) and persons without the disease (the controls) are selected … .”).

[7] See, e.g., Diane F. Halpern & Stanley Coren, “Do right-handers live longer?” 333 Nature 213 (1988); Diane F. Halpern & Stanley Coren, “Handedness and life span,” 324 New Engl. J. Med. 998 (1991).

[8] Kenneth J. Rothman, “Left-handedness and life expectancy,” 325 New Engl. J. Med. 1041 (1991) (pointing out that the age-at-death comparison would make nursery education seem more dangerous than paratrooper training, given that the age at death of pre-schoolers who died would be much lower than that of paratroopers who died); see also Martin Bland & Doug Altman, “Do the left-handed die young?” Significance 166 (Dec. 2005).

[9] See Philip A. Wolf, Ralph B. D’Agostino, Janet L. Cobb, “Left-handedness and life expectancy,” 325 New Engl. J. Med. 1042 (1991); Marcel E. Salive, Jack M. Guralnik & Robert J. Glynn, “Left-handedness and mortality,” 83 Am. J. Public Health 265 (1993); Olga Basso, Jørn Olsen, Niels Holm, Axel Skytthe, James W. Vaupel, and Kaare Christensen, “Handedness and mortality: A follow-up study of Danish twins born between 1900 and 1910,” 11 Epidemiology 576 (2000). See also Martin Wolkewitz, Arthur Allignol, Martin Schumacher, and Jan Beyersmann, “Two Pitfalls in Survival Analyses of Time-Dependent Exposure: A Case Study in a Cohort of Oscar Nominees,” 64 Am. Statistician 205 (2010); Michael F. Picco, Steven Goodman, James Reed, and Theodore M. Bayless, “Methodologic pitfalls in the determination of genetic anticipation: the case of Crohn’s disease,” 134 Ann. Intern. Med. 1124 (2001).

[10] Kenneth J. Rothman, Sander Greenland, Timothy L. Lash, eds., Modern Epidemiology (3d ed. 2008).

[11] Dicky Scruggs served on the Plaintiffs’ Steering Committee until his conviction on criminal charges.

[12] James Mortimer, Amy Borenstein, and Lorene Nelson, “Associations of welding and manganese exposure with Parkinson disease: Review and meta-analysis,” 79 Neurology 1174 (2012).

Johnson of Accutane – Keeping the Gate in the Garden State

March 28th, 2015

Nelson Johnson is the author of Boardwalk Empire: The Birth, High Times, and Corruption of Atlantic City (2010), a rattling good yarn, which formed the basis for a thinly fictionalized story of Atlantic City under the control of mob boss (and Republican politician) Enoch “Nucky” Johnson. HBO transformed Johnson’s book into a multi-season series, with Steve Buscemi playing Nucky Johnson (Thompson in the series). Robert Strauss, “Judge Nelson Johnson: Atlantic City’s Godfather — A Q&A with Judge Nelson Johnson,” New Jersey Monthly (Aug. 16, 2010).

Nelson Johnson is also known as the Honorable Nelson Johnson, a trial court judge in Atlantic County, New Jersey, where he inherited some of the mass tort docket of Judge Carol Higbee. Judge Higbee has since ascended to the Appellate Division of the New Jersey Superior Court. One of the litigations over which Judge Johnson presides is the mosh pit of isotretinoin (Accutane) cases, involving claims that the acne medication causes inflammatory bowel disease (IBD) and Crohn’s disease (CD). Judge Johnson is not only an accomplished writer of history, but he is also an astute evaluator of the facts and data, and the accompanying lawyers’ rhetoric, thrown about in pharmaceutical products liability litigation.

Perhaps more than his predecessor ever displayed, Judge Johnson recently demonstrated his aptitude for facts and data in serving as a gatekeeper of scientific evidence, as required by the New Jersey Supreme Court in Kemp v. The State of New Jersey, 174 N.J. 412 (2002). Faced with a complex evidentiary display on the validity and reliability of the scientific evidence, Judge Johnson entertained extensive briefings, testimony, and oral argument. When the dust settled, the court ruled that the proffered testimony of Dr. Arthur Kornbluth and Dr. David Madigan did not meet the liberal New Jersey test for admissibility. In re Accutane, No. 271(MCL), 2015 WL 753674, 2015 BL 59277 (N.J. Super. Law Div. Atlantic Cty. Feb. 20, 2015). And in settling the dust, Judge Johnson dispatched several bogus and misleading “lines of evidence,” which have become standard ploys to clog New Jersey and other courthouses.

Case Reports

As so often is the case when there is no serious scientific evidence of harm in pharmaceutical cases, plaintiffs in the Accutane litigation relied heavily upon case and adverse event reports. Id. at *11. Judge Johnson was duly unimpressed, and noted that:

“[u]nsystematic clinical observations or case reports and adverse event reports are at the bottom of the evidence hierarchy.”

Id. at *16.

Bootstrapped, Manufactured Evidence

With respect to case reports that are submitted to the FDA’s Adverse Event Reporting System (FAERS), Judge Johnson acknowledged the “serious limitations” of the hearsay anecdotes that make up such reports. Despite the value of adverse event reports in generating signals for future investigation, Judge Johnson, citing the FDA’s own description of the reporting system, concluded that the system’s anecdotal data are “not evidentiary in a court of law.” Id. at *14 (quoting the FDA’s description of FAERS).

Judge Johnson took notice of another fact; namely, that the litigation industry creates evidence that it then uses to claim causal connections in the courtroom. Plaintiffs’ lawyers in pharmaceutical cases routinely file MedWatch adverse event reports, which inflate the very “signal” that they then claim shows harm from medication use. This evidentiary bootstrapping machine was hard at work in the isotretinoin litigation. See Derrick J. Stobaugh, Parakkal Deepak, and Eli D. Ehrenpreis, “Alleged Isotretinoin-Associated Inflammatory Bowel Disease: Disproportionate reporting by attorneys to the Food and Drug Administration Adverse Event Reporting System,” 69 J. Am. Acad. Dermatol. 398 (2013) (“Attorney-initiated reports inflate the pharmacovigilance signal of isotretinoin-associated IBD in the FAERS.”). Judge Johnson gave a wry hat tip to plaintiffs’ counsel’s industry, acknowledging that the litigation industry itself had inflated this signal-generating process:

“The legal profession is a bulwark of our society, yet the courts should never underestimate the resourcefulness of some attorneys.”

In re Accutane, 2015 WL 753674, at *15.
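The inflation that Stobaugh and his colleagues documented is easy to exhibit with the standard disproportionality arithmetic. The sketch below uses a reporting odds ratio and invented counts; it illustrates the mechanism, not the published data:

```python
# Reporting odds ratio (ROR): odds of the event among reports for the
# drug, divided by the odds among reports for all other drugs.
def reporting_odds_ratio(drug_event, drug_other, rest_event, rest_other):
    return (drug_event / drug_other) / (rest_event / rest_other)

# Background reporting, before any litigation (invented counts):
print(reporting_odds_ratio(20, 980, 2000, 98000))   # 1.0 -- no signal

# The same database after 80 attorney-initiated reports of the same
# drug-event pair are filed:
print(reporting_odds_ratio(100, 980, 2000, 98000))  # 5.0 -- a "signal"
```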

Bias and Confounding

The epidemiologic studies referenced by the parties had identified a fairly wide range of “risk factors” for inflammatory bowel disease, including many factors prevalent in Westernized countries, such as prior appendectomy, breast-feeding as an infant, stress, Vitamin D deficiency, tobacco or alcohol use, refined sugars, dietary animal fat, and fast food. In re Accutane, 2015 WL 753674, at *9. The court also noted that four medications are known to be risk factors for IBD: aspirin, nonsteroidal anti-inflammatory medications (NSAIDs), oral contraceptives, and antibiotics.

In reviewing the plaintiffs’ expert witnesses’ methodology, Judge Johnson found that they had been inordinately and inappropriately selective in the studies chosen for reliance. The challenged witnesses had discounted and discarded most of the available studies in favor of two studies that were small, biased, and not population-based. Indeed, one of the studies evidenced substantial selection bias by using referrals to obtain study participants, a process deprecated by the trial court as “cherry picking the subjects.” Id. at *18. “The scientific literature does not support reliance upon such insignificant studies to arrive at conclusions.” Id.

Animal Studies

Both sides in the isotretinoin cases seemed to concede the relative unimportance of animal studies. The trial court discussed the limitations on animal studies, especially the absence of a compelling animal model of human irritable bowel syndrome. Id. at *18.

Cherry Picking and Other Crafty Stratagems

With respect to the complete scientific evidentiary display, plaintiffs asserted that their expert witnesses had considered everything, but then failed to account for most of the evidence. Judge Johnson found this approach deceptive and further evidence of a cherry-picking, pathological methodology:

“Finally, coursing through Plaintiffs’ presentation is a refrain that is a ruse. Repeatedly, counsel for the Plaintiffs and their witnesses spoke of ‘lines of evidence’, emphasizing that their experts examined ‘the same lines of evidence’ as did the experts for the Defense. Counsels’ sophistry is belied by the fact that the examination of the ‘lines of evidence’ by Plaintiffs’ experts was highly selective, looking no further than they wanted to—cherry picking the evidence—in order to find support for their conclusion-driven testimony in support of a hypothesis made of disparate pieces, all at the bottom of the medical evidence hierarchy.”

Id. at *21.

New Jersey Rule of Evidence 703

The New Jersey Rules of Evidence, like the Federal Rules, impose a reasonableness limit on what sorts of otherwise inadmissible evidence an expert witness may rely upon. See “RULE OF EVIDENCE 703 — Problem Child of Article VII” (Sept. 9, 2011). Although Judge Johnson did not invoke Rule 703 specifically, he was clearly troubled by plaintiffs’ expert witnesses’ reliance upon an unadjusted odds ratio from an abstract, which did not address substantial confounding from a known causal risk factor – antibiotic use. Judge Johnson concluded that the reliance upon the higher, unadjusted risk figure, contrary to the authors’ own methods and conclusions, and without a cogent explanation for doing so, was “pure advocacy” on the part of the witnesses. In re Accutane, 2015 WL 753674, at *17; see also id. at *5 (citing Landrigan v. Celotex Corp., 127 N.J. 404, 417 (1992), for the proposition that “when an expert relies on such data as epidemiological studies, the trial court should review the studies, as well as other information proffered by the parties, to determine if they are of a kind on which such experts ordinarily rely.”).
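The danger of an unadjusted odds ratio, when a known causal risk factor such as antibiotic use is unevenly distributed across exposure groups, can be shown with the standard Mantel-Haenszel adjustment. The counts below are invented for illustration; they are not the abstract’s data:

```python
# Each stratum is a 2x2 table: (a, b, c, d) =
# (exposed cases, exposed controls, unexposed cases, unexposed controls).
strata = {
    "antibiotic users": (80, 20, 40, 10),    # within-stratum OR = 1.0
    "non-users":        (10, 40, 80, 320),   # within-stratum OR = 1.0
}
tables = list(strata.values())

def crude_or(tables):
    a = sum(t[0] for t in tables); b = sum(t[1] for t in tables)
    c = sum(t[2] for t in tables); d = sum(t[3] for t in tables)
    return (a * d) / (b * c)

def mantel_haenszel_or(tables):
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

print(f"Crude (unadjusted) OR: {crude_or(tables):.2f}")          # 4.13
print(f"Mantel-Haenszel OR:    {mantel_haenszel_or(tables):.2f}") # 1.00
# The crude ratio manufactures a four-fold "risk" out of confounding
# alone; stratifying on the confounder returns the true null.
```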

Discordance Between Courtroom and Professional Opinions

One of plaintiffs’ expert witnesses, Dr. Arthur Kornbluth, actually had studied the putative association between isotretinoin and CD before he became intensively involved in litigation as an expert witness. In re Accutane, 2015 WL 753674, at *7. Having an expert witness who is a real-world expert can be a plus, but not when that expert witness maintains a double standard for assessing causal connections. Back in 2009, Kornbluth published an article, “Ulcerative Colitis Practice Guidelines in Adults,” in The American Journal of Gastroenterology. Id. at *10. This positive achievement became a large demerit when cross-examination at the Kemp hearing revealed that Kornbluth had considered but rejected the urgings of a colleague, Dr. David Sachar, to comment on isotretinoin as a cause of inflammatory bowel disease. In front of Judge Johnson, Dr. Kornbluth felt no such scruples. Id. at *11. Dr. Kornbluth’s stature in the field of gastroenterology, along with his silence on the issue in his own field, created a striking contrast with his stridency about causation in the courtroom. The contrast raised the trial court’s level of scrutiny and skepticism about his causal opinions in the New Jersey litigation. Id. (citing and quoting Soldo v. Sandoz Pharms. Corp., 244 F. Supp. 2d 434, 528 (W.D. Pa. 2003) (“Expert opinions generated as the result of litigation have less credibility than opinions generated as the result of academic research or other forms of ‘pure’ research.”) (“The expert’s motivation for his/her study and research is important. … We may not ignore the fact that a scientist’s normal work place is the lab or field, not the courtroom or the lawyer’s office.”)).

Meta-Analysis

Meta-analysis has become an important facet of pharmaceutical and other products liability litigation[1]. Fortunately for Judge Johnson, he had before him an extremely capable expert witness, Dr. Stephen Goodman, to explain meta-analysis generally, as well as the two meta-analyses performed on isotretinoin and inflammatory bowel disease outcomes. In re Accutane, 2015 WL 753674, at *8. Dr. Goodman explained that:

“the strength of the meta-analysis is that no one feature, no one study, is determinant. You don’t throw out evidence except when you absolutely have to.”

Id. Dr. Goodman further explained that plaintiffs’ expert witnesses’ failure to perform a meta-analysis was telling, because meta-analysis “can get us closer to the truth.” Id.
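The mechanics Dr. Goodman described are not mysterious. Here is a minimal sketch of fixed-effect, inverse-variance pooling, the core of the method, run on invented study results rather than the isotretinoin literature:

```python
import math

# Hypothetical studies: (odds ratio, lower 95% CI, upper 95% CI).
studies = [(1.20, 0.80, 1.80), (0.95, 0.70, 1.29), (1.10, 0.60, 2.02)]

weights, weighted_logs = [], []
for or_, lo, hi in studies:
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)  # SE from CI width
    w = 1 / se**2                                    # inverse-variance weight
    weights.append(w)
    weighted_logs.append(w * math.log(or_))

pooled = sum(weighted_logs) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
lo95, hi95 = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"Pooled OR: {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(lo95):.2f}-{math.exp(hi95):.2f})")

# No single study determines the result; each contributes in proportion
# to its precision, which is the sense in which no evidence is thrown out.
```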

Some Nitpicking

Specific Causation

After such a commanding judicial performance by Judge Johnson, nitpicking on specific causation might strike some as ungrateful. For some reason, however, Judge Johnson cited several cases on the appropriateness of expert witnesses’ reliance upon epidemiologic studies for assessing specific causation or for causal apportionment between two or more causes. In re Accutane, 2015 WL 753674, at *5 (citing Landrigan v. Celotex Corp., 127 N.J. 404 (1992), Caterinicchio v. Pittsburgh Corning, 127 N.J. 428 (1992), and Dafler v. Raymark Inc., 259 N.J. Super. 17, 36 (App. Div. 1992), aff’d. o.b. 132 N.J. 96 (1993)). Fair enough, but specific causation was not at issue in the Accutane Kemp hearing, and the Landrigan and Caterinicchio cases are irrelevant to general causation.

In both Landrigan and Caterinicchio, the defendants moved for directed verdicts by arguing that, assuming arguendo that asbestos causes colon cancer, the plaintiffs’ expert witnesses had not presented sufficient opinions to support the conclusion that Landrigan’s and Caterinicchio’s colon cancers were caused by asbestos. See “Landrigan v. The Celotex Corporation, Revisited” (June 4, 2013). General causation was thus never at issue, and the holdings never addressed the admissibility of the expert witnesses’ causation opinions. The only questions at issue, in the directed verdicts and in the appeals taken from the judgments entered on those verdicts, concerned the sufficiency of opinions that equated increased risks, less than 2.0, with specific causation.

Judge Johnson, in discussing previous case law, suggests that the New Jersey Supreme Court reversed and remanded the Landrigan case for trial, holding that “epidemiologists could help juries determine causation in toxic tort cases and rejected the proposition that epidemiological studies must show a relative risk factor of 2.0 before gaining acceptance by a court.” In re Accutane, 2015 WL 753674, at *5, citing Landrigan, 127 N.J. at 419. A close and fair reading of Landrigan, however, shows that it was about a directed verdict, 127 N.J. at 412, and not about a challenge to the use of epidemiologic studies generally, or to their use to show general causation.

Necessity of Precise Biological Mechanism

In the Accutane hearings, the plaintiffs’ counsel and their expert witnesses failed to provide a precise biological mechanism of the cause of IBD. Judge Johnson implied that any study that asserted that Accutane caused IBD “would, of necessity, require an explication of a precise biological mechanism of the cause of IBD and no one has yet to venture more than alternate and speculative hypotheses on that question.” In re Accutane, 2015 WL 753674, at *8. Conclusions of causality, however, do not always come accompanied by understood biological mechanisms, and Judge Johnson demonstrated that the methods and evidence relied upon by plaintiffs’ expert witnesses could not, in any event, allow them to draw causal conclusions.

Interpreting Results Contrary to Publication Authors’ Interpretations

There is good authority, no less than the United States Supreme Court in Joiner, for the proposition that there is something suspect in expert witnesses’ interpreting a published study’s results contrary to the interpretation offered by the study’s own authors. Judge Johnson found that the plaintiffs’ expert witnesses in the Accutane litigation had inferred that two studies showed increased risk when the authors of those studies had concluded that their studies did not appear to show an increased risk. Id. at *17. There will be times, however, when a published study has incorrectly interpreted its own data, and when “real” expert witnesses can, and should, interpret the data appropriately. Accutane was not such a case. In In re Accutane, Judge Johnson carefully documented and explained how the plaintiffs’ expert witnesses’ supposed reinterpretation was little more than attempted obfuscation. His Honor concluded that the witnesses’ distortion of, and “reliance upon these two studies is fatal and reveals the lengths to which legal counsel and their experts are willing to contort the facts and torture the logic associated with Plaintiffs’ hypothesis.” Id. at *18.


[1] “The Treatment of Meta-Analysis in the Third Edition of the Reference Manual on Scientific Evidence” (Nov. 14, 2011) (The Reference Manual fails to come to grips with the prevalence and importance of meta-analysis in litigation, and fails to provide meaningful guidance to trial judges).

Fixodent Study Causes Lockjaw in Plaintiffs’ Counsel

February 4th, 2015

Litigation Drives Science

Back in 2011, the Fixodent MDL Court sustained Rule 702 challenges to plaintiffs’ expert witnesses. “Hypotheses are verified by testing, not by submitting them to lay juries for a vote.” In re Denture Cream Prods. Liab. Litig., 795 F. Supp. 2d 1345, 1367 (S.D.Fla.2011), aff’d, Chapman v. Procter & Gamble Distrib., LLC, 766 F.3d 1296 (11th Cir. 2014). The Court found that the plaintiffs had raised a superficially plausible hypothesis, but that they had not verified the hypothesis by appropriate testing[1].

Like dentures to Fixodent, the plaintiffs stuck to their claims, and set out to create the missing evidence. Plaintiffs’ counsel contracted with Dr. Salim Shah and his companies, Sarfez Pharmaceuticals, Inc. and Sarfez USA, Inc. (“Sarfez”), to conduct human research in India, to support their claims that zinc in denture cream causes neurological damage.[2] In re Denture Cream Prods. Liab. Litig., Misc. Action 13-384 (RBW), 2013 U.S. Dist. LEXIS 93456, *2 (D.D.C. July 3, 2013). When the defense learned of this study, and of plaintiffs’ counsel’s payments of over $300,000 to support the study, they sought discovery of raw data, study protocol, statistical analyses, and other materials from plaintiffs’ counsel. Plaintiffs’ counsel protested that they did not have all the materials, and directed defense counsel to Sarfez. Although other courts have made counsel produce similar materials from the scientists and independent contractors they engaged, in this case defense counsel followed the trail of documents to the contractor, Sarfez, with subpoenas in hand. Id. at *3-4.

The defense served a Rule 45 subpoena on Sarfez, which produced some, but not all, responsive documents. Procter & Gamble pressed for the missing materials, including study protocols, analytical reports, and raw data. Id. at *12-13. Judge Reggie Walton upheld the subpoena, which sought underlying data and non-privileged correspondence, as within the scope of Rules 26(b) and 45, and not unduly burdensome. Id. at *9-10, *20. Sarfez attempted to argue that the requested materials, listed as email attachments, might not exist, but Judge Walton branded the suggestion “disingenuous.” Attachments to emails should be produced along with the emails. Id. at *12 (citing and collecting cases). Although Judge Walton did not grant a request for forensic recovery of hard-drive data or for sanctions, His Honor warned Sarfez that it might be required to bear the cost of forensic data recovery if it did not comply with the court’s order. Id. at *15, *22.

Plaintiffs Put Their Study Into Play

The study at issue in the subpoena was designed by Frederick K. Askari, M.D., Ph.D., an associate professor of hepatology in the University of Michigan Health System. In re Denture Cream Prods. Liab. Litig., No. 09–2051–MD, 2015 WL 392021, at *7 (S.D. Fla. Jan. 28, 2015). At the instruction of plaintiffs’ counsel, Dr. Askari sought to study the short-term effects of Fixodent on copper absorption in humans. Working in India, Askari conducted the study on 24 participants, who were given a controlled diet for 36 days. Of the 24 participants, 12, randomly selected, received 12 grams of Fixodent per day (containing 204 mg of zinc). Another six participants, randomly selected, were given zinc acetate three times per day (150 mg of zinc), and the remaining six participants received placebo, three times per day.
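For readers who want the design at a glance, here is a minimal sketch, in Python, of the three-arm allocation the decision describes. The randomization procedure actually used is not disclosed in the opinion, so the shuffle below is purely illustrative:

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# Hypothetical reconstruction of the allocation described in the
# decision: 24 participants split 12 / 6 / 6 across three arms.
# The actual randomization method is not disclosed in the opinion.
participants = list(range(1, 25))
random.shuffle(participants)

arms = {
    "Fixodent (12 g/day, 204 mg zinc)": participants[:12],
    "Zinc acetate (150 mg zinc/day)": participants[12:18],
    "Placebo": participants[18:],
}

for arm, ids in arms.items():
    print(f"{arm}: participants {sorted(ids)}")
```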

A study protocol was approved by an independent group[3], id. at *9, and the study was supposed to be conducted double-blind. Id. at *7. Not surprisingly, those participants who received doses of Fixodent or zinc acetate had higher urinary levels of zinc (p < 0.05). The important issue, however, was whether the dietary zinc levels affected copper excretion in a way that would support plaintiffs’ claims that copper levels were lowered sufficiently by Fixodent to cause a syndromic neurological disorder. The MDL Court ultimately concluded that plaintiffs’ expert witnesses’ opinions on general causation were not sufficiently supported to satisfy the requirements of Rule 702, and upheld defense challenges to those expert witnesses. In doing so, the MDL Court had much of interest to say about case reports, weight of the evidence, and other important issues. This post, however, concentrates on the deviations of one study, commissioned by plaintiffs’ counsel, from the scientific standard of care. The Askari “research” makes for a fascinating case study of how not to conduct a study in a litigation caldron.

Non-Standard Deviations

The First Deviation – Changing the Ascertainment Period After the Data Are Collected

The protocol apparently identified the primary end point as:

“the mean increase in [copper 65] excretion in fecal matter above the baseline (mg/day) averaged over the study period … to test the hypothesis that the release of [zinc] either from Fixodent or Zinc Acetate impairs [copper 65] absorption as measured in feces.”

The study outcome, on the primary end point, was clear. The plaintiffs’ testifying statistician, Hongkun Wang, stated in her deposition that the fecal copper (whether isotope Cu63 or Cu65) was not different across the three groups (Fixodent, zinc acetate, and placebo). Id. at *9[4]. Even Dr. Askari himself admitted that the total fecal copper levels were not increased in the Fixodent group compared with the placebo control group. Id. at *9.[5]

Apparently after obtaining the data, and finding no difference in the pre-specified end point of average fecal copper levels between Fixodent and placebo groups, Askari turned to a new end point, measured in a different way, not described in the protocol as the primary end point.

The Second Deviation – Changing Primary End Point After the Data Are Collected

In the early (days 3, 4, and 5) and late (days 31, 32, and 33) parts of the Study, participants received a dose of purified copper 65[6] to help detect the “blockade of copper.” Id. at *8. The participants’ fecal copper 65 levels were compared to their naturally occurring copper 63 levels. According to Dr. Askari:

“if copper is being blocked in the Fixodent and zinc acetate test subjects from exposure to the zinc in the test product (Fixodent) and positive control (zinc acetate), the ratio of their fecal output of copper 65 as compared to their fecal output of copper 63 would increase relative to the control subjects, who were not dosed with zinc. In short, a higher ratio of copper 65 to copper 63 reflects blocking of copper.”

Id.

Askari analyzed the ratio of the two copper isotopes (Cu65/Cu63), limiting the period of observation to study days 31 to 33. Id. at *9. Askari thus changed the outcome to be measured, the timing of the measurement, and the manner of measurement (an average over the entire study period versus amounts on days 31 to 33). On this post hoc, non-prespecified end point, Askari claimed to have found “significant” differences.
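The difference between the two end points is easier to see in code. The following is a minimal sketch, on simulated numbers (the underlying data were never made public, and the scales below are invented, not from the record), of how the protocol’s end point and the final report’s end point are computed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated daily fecal copper measurements for one participant over
# a 36-day study. The real data are not public; the scales here are
# invented solely to show how the two end points differ in kind.
days = 36
cu65 = rng.normal(loc=1.0, scale=0.1, size=days)  # mg/day (invented)
cu63 = rng.normal(loc=2.2, scale=0.2, size=days)  # mg/day (invented)

# Protocol's end point: mean copper 65 excretion averaged over the
# entire study period.
prespecified = cu65.mean()

# Final report's end point: the Cu65/Cu63 ratio, restricted to study
# days 31 to 33 (indices 30:33 with zero-based days).
post_hoc = (cu65[30:33] / cu63[30:33]).mean()

print(f"pre-specified end point (mean Cu65, all days): {prespecified:.3f}")
print(f"post hoc end point (Cu65/Cu63, days 31-33):    {post_hoc:.3f}")
```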

The MDL Court expressed its skepticism and concern over the difference between the protocol’s specified end point and the one that came into the study only after the data were obtained and analyzed. The plaintiffs claimed that it was their (and Askari’s) intention from the initial stages of designing the Fixodent Blockade Study to use the Cu65/Cu63 ratio as the primary end point. According to the plaintiffs, the isotope ratio was simply better articulated and “clarified” as the primary end point in the final report than it was in the protocol. The Court was not amused or assuaged by the plaintiffs’ assurances. The study sponsor, Dr. Salim Shah, could not point to a draft protocol that indicated the isotope ratio as the end point; nor could Dr. Shah identify a request that Wang perform this analysis until after the study was concluded. Id. at *9.[7]

Ultimately, the Court declared that whether the protocol was changed post hoc, after the primary end point produced a disappointing analysis, or the isotope ratio was carelessly omitted from the protocol, either way the design or conduct of the study was “incompatible with reliable scientific methodology.”

The Third Deviation – Changing the Standard of “Significance” After the Data Are Collected and P-Values Are Computed

The protocol for the Blockade Study called for a pre-determined Type I error rate of no more than 5 percent.[8] Id. at *10. The difference in the isotope ratio showed an attained significance probability (p-value) of 5.7 percent, and thus even the post hoc end point missed the prespecified level of significance. The final protocol changed the level of “significance” to 10 percent, to permit the plaintiffs to declare a “statistically significant” result. Dr. Wang admitted in deposition that she doubled the acceptable level of Type I error only after she obtained the data and calculated the p-value of 0.057. Id. at *10.[9]
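The arithmetic of the maneuver is trivial, which is rather the point. A minimal sketch, taking the reported p-value of 0.057 as given (the decision does not specify the test that produced it):

```python
# The same result flips from "not significant" to "significant" when
# the significance level is doubled after the data are in. The p-value
# of 0.057 is the figure reported in the decision; the test behind it
# is not specified there, so it is taken as given.

p_value = 0.057

for alpha in (0.05, 0.10):  # pre-specified versus after-the-fact threshold
    verdict = "significant" if p_value <= alpha else "not significant"
    print(f"alpha = {alpha:.2f}: p = {p_value} is {verdict}")

# Output:
# alpha = 0.05: p = 0.057 is not significant
# alpha = 0.10: p = 0.057 is significant
```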

The Court found that this deliberate moving of the statistical goal post reflected a “lack of objectivity and reliability,” which smacked of contrivance[10].

The Court found that the study’s deviations from the protocol demonstrated a lack of objectivity. The inadequacy of the Study’s statistical analysis plan supported the Court’s conclusion that Dr. Askari’s supposed finding of a “statistically significant” difference in fecal copper isotope ratio between Fixodent and placebo group participants was “not based on sufficiently reliable and objective scientific methodology” and thus could not support plaintiffs’ expert witnesses’ general causation claims.

The Fourth Deviation – Failing to Take Steps to Preserve the Blind

The protocol called for a double-blinded study, with neither the participants nor the clinical investigators knowing which participant was in which group. Rather than giving the three groups capsules that looked alike, however, the investigators gave each group starkly different-looking capsules. Id. at *11. The capsules for one set were apparently so large that the investigators worried whether the participants would comply with the dosing regimen.

The Fifth Deviation – Failing to Take Steps to Keep Biological Samples From Becoming Contaminated

Documents and emails from Dr. Shah acknowledged that there had been “difficulties in storing samples at appropriate temperature.” Id. at *11. Fecal samples were “exposed to unfrozen and undesirable temperature conditions.” Dr. Shah called for remedial steps from the Study manager, but there was no documentation that such steps were taken to correct the problem. Id.

The Consequences of Discrediting the Study

Dr. Askari opined that the Study, along with other evidence, shows that Fixodent can cause copper deficiency myeloneuropathy (“CDM”). The plaintiffs, of course, argued that the Defendants’ criticisms of the Fixodent Study’s methodology went merely to the “weight rather than admissibility.” Id. at *9. Askari’s study was but one leg of the stool, but the defense’s thorough discrediting of the study was an important step in collapsing the support for the plaintiffs’ claims. As the MDL Court explained:

“The Court cannot turn a blind eye to the myriad, serious methodological flaws in the Fixodent Blockade Study and conclude they go to weight rather than admissibility. While some of these flaws, on their own, may not be serious enough to justify exclusion of the Fixodent Blockade Study; taken together, the Court finds Fixodent Blockade Study is not “good science,” and is not admissible. Daubert, 509 U.S. at 593 (internal quotation marks and citation omitted).”

Id. at *11.

A study, such as the Fixodent Blockade Study, is not itself admissible, but the deconstruction of the study upon which plaintiffs’ expert witnesses relied led directly to the Court’s decision to exclude those witnesses. The Court omitted any reference to Federal Rule of Evidence 703, which addresses the requirements for facts and data, otherwise inadmissible, upon which expert witnesses may rely in reaching their opinions.


 

[1] See “Philadelphia Plaintiff’s Claims Against Fixodent Prove Toothless” (May 2, 2012); Jacoby v. Rite Aid Corp., 2012 Phila. Ct. Com. Pl. LEXIS 208 (2012), aff’d, 93 A.3d 503 (Pa. Super. 2013); “Pennsylvania Superior Court Takes The Bite Out of Fixodent Claims” (Dec. 12, 2013).

[2] See “Using the Rule 45 Subpoena to Obtain Research Data” (July 24, 2013).

[3] The group was identified as the Ethica Norma Ethical Committee.

[4] Citing Wang Dep. at 56:7–25 (Aug. 13, 2013), and Wang Analysis of Fixodent Blockade Study [ECF No. 2197–56] (noting “no clear treatment effect on Cu63 or Cu65”).

[5] Askari Dep. at 69:21–24 (June 20, 2013).

[6] Copper 65 is not a typical tracer; it is not radioactive. Naturally occurring copper consists almost exclusively of two stable (non-radioactive) isotopes: Cu65, about 31 percent, and Cu63, about 69 percent. See, e.g., Manuel Olivares, Bo Lönnerdal, Steve A. Abrams, Fernando Pizarro, and Ricardo Uauy, “Age and copper intake do not affect copper absorption, measured with the use of 65Cu as a tracer, in young infants,” 76 Am. J. Clin. Nutr. 641 (2002); T.D. Lyon, et al., “Use of a stable copper isotope (65Cu) in the differential diagnosis of Wilson’s disease,” 88 Clin. Sci. 727 (1995).

[7] Shah Dep. at 87:12–25; 476:2–536:12; 138:6–142:12 (June 5, 2013).

[8] The reported decision leaves unclear how the analysis would proceed, whether by ANOVA for the three groups, or t-tests, and whether there was multiple testing.
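For concreteness, a minimal sketch of the two analytic routes this footnote mentions, run on simulated data; the actual analysis plan and data are not in the reported decision, and the group means below are invented:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated outcomes for the three arms (12 / 6 / 6); the real data
# are not public, so these arrays are invented for illustration only.
fixodent = rng.normal(1.00, 0.15, 12)
zinc_acetate = rng.normal(1.05, 0.15, 6)
placebo = rng.normal(0.95, 0.15, 6)

# Route 1: one-way ANOVA across all three groups, a single test,
# so the 5 percent alpha applies once.
f_stat, p_anova = stats.f_oneway(fixodent, zinc_acetate, placebo)
print(f"ANOVA: p = {p_anova:.3f}")

# Route 2: pairwise t-tests. With three possible comparisons, holding
# the family-wise Type I error at 5 percent requires an adjusted
# threshold, e.g., Bonferroni's 0.05 / 3.
for name, arm in [("zinc acetate", zinc_acetate), ("placebo", placebo)]:
    t_stat, p = stats.ttest_ind(fixodent, arm)
    print(f"Fixodent vs {name}: p = {p:.3f} (compare to 0.05/3 ~ 0.017)")
```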

[9] Wang Dep. at 151:13–152:7; 153:15–18.

[10] 2015 WL 392021, at *10, citing Perry v. United States, 755 F.2d 888, 892 (11th Cir. 1985) (“A scientist who has a formed opinion as to the answer he is going to find before he even begins his research may be less objective than he needs to be in order to produce reliable scientific results.”); Rink v. Cheminova, Inc., 400 F.3d 1286, 1293 n. 7 (11th Cir.2005) (“In evaluating the reliability of an expert’s method … a district court may properly consider whether the expert’s methodology has been contrived to reach a particular result.” (alteration added)).

 

More Case Report Mischief in the Gadolinium Litigation

November 28th, 2014

The Decker case is a curious decision, both by the MDL trial court and by the Sixth Circuit. Decker v. GE Healthcare Inc., ___ F.3d ___, 2014 FED App. 0258P, 2014 U.S. App. LEXIS 20049 (6th Cir. Oct. 20, 2014). First, the Circuit went out of its way to emphasize that the trial court had discretion, not only in evaluating the evidence on a Rule 702 challenge, but also in devising the criteria of validity[1]. Second, the courts ignored the role of, and the weight being assigned to, Federal Rule of Evidence 703 in winnowing the materials upon which the defense expert witnesses could rely. Third, the Circuit approved what appeared to be extremely asymmetric gatekeeping of plaintiffs’ and defendant’s expert witnesses. The asymmetrical standards probably were the basis for emphasizing the breadth of the trial court’s discretion to devise the criteria for assessing scientific validity[2].

In barring GEHC’s expert witnesses from testifying about gadolinium-naïve nephrogenic systemic fibrosis (NSF) cases, Judge Dan Polster, the MDL judge, appeared to invoke a double standard. Plaintiffs could adduce any case report or adverse event report (AER) on the theory that the reports were relevant to “notice” of a “safety signal” between gadolinium-based contrast agents (GBCAs) used in MRI and NSF. Defendants’ expert witnesses, however, were held to the most exacting standards of clinical identity with the plaintiff’s particular presentation of NSF, biopsy-proven presence of gadolinium in affected tissue, and documentation of lack of GBCA exposure, before case reports would be permitted as reliance materials to support the existence of gadolinium-naïve NSF.

A fourth issue with the Decker opinion is the latitude it permitted the district court in allowing plaintiffs’ pharmacovigilance expert witness, Cheryl Blume, Ph.D., to testify, over objections, about the “signal” created by the NSF AERs available to GEHC. Decker at *11. At the same trial, the MDL judge prohibited GEHC’s expert witness, Dr. Anthony Gaspari, from testifying that the AERs described by Blume did not support a clinical diagnosis of NSF.

On a motion for reconsideration, Judge Polster reaffirmed his ruling on grounds that

(1) the AERs were too incomplete to rule in or rule out a diagnosis of NSF, although they were sufficient to create a “signal”;

(2) whether the AERs were actual cases of NSF was not relevant to their being safety signals;

(3) Dr. Gaspari was not an expert in pharmacovigilance, which studied “signals” as opposed to causation; and

(4) Dr. Gaspari’s conclusion that the AERs were not NSF was made without reviewing all the information available to GEHC at the time of the AERs.

Decker at *12.

The fallacy of this stingy approach to Dr. Gaspari’s testimony lies in the courts’ stubborn refusal to recognize that if an AER was not, as a matter of medical science, a case of NSF, then it could not be a “signal” of a possible causal relationship between GBCA and NSF. Pharmacovigilance does not end with ascertaining signals; yet the courts privileged Blume’s opinions on signals even though she could not proceed to the next step and evaluate diagnostic accuracy and causality. This twisted logic makes a mockery of pharmacovigilance. It also led to the exclusion of Dr. Gaspari’s testimony on a key aspect of plaintiffs’ liability evidence.
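To see why the distinction matters, consider how a reporting “signal” is typically screened in pharmacovigilance. The sketch below computes a proportional reporting ratio (PRR), one common screening statistic; the choice of metric and the counts are invented for illustration and do not come from the Decker record:

```python
# Sketch of a proportional reporting ratio (PRR), a common
# pharmacovigilance screening statistic. All counts are invented;
# the Decker opinions do not disclose the underlying AER tallies.

def prr(a: int, b: int, c: int, d: int) -> float:
    """PRR = [a / (a + b)] / [c / (c + d)], where
    a = reports of the event of interest for the drug of interest,
    b = reports of all other events for that drug,
    c = reports of the event of interest for all other drugs,
    d = reports of all other events for all other drugs."""
    return (a / (a + b)) / (c / (c + d))

# Invented counts: 20 NSF-like reports out of 1,000 total for the
# agent, versus 50 out of 100,000 for comparator agents.
print(f"PRR = {prr(20, 980, 50, 99_950):.1f}")  # 40.0 -> flagged as a "signal"
```

A high PRR flags a hypothesis for clinical follow-up; it says nothing about whether the underlying reports are, diagnostically, true cases of the disease, which was precisely the point Dr. Gaspari was barred from making.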

The erroneous approach pioneered by Judge Polster was compounded by the district court’s refusal to give a jury instruction that AERs were only relevant to notice, and not to causation. Judge Polster offered his reasoning that “the instruction singles out one type of evidence, and adds, rather than minimizes, confusion.” Judge Polster cited the lack of any expert witness testimony that suggested that AERs showed causation and “besides, it doesn’t matter because those patients are not, are not the plaintiffs.” Decker at *17.

The lack of dispute about the meaning of AERs would have seemed all the more reason to control jury speculation about their import, and to give a binding instruction on AERs and their limited significance. As for the AER patients’ not being the plaintiffs, well, the case report patients were not the plaintiffs, either. This last reason is not even wrong[3]. The Circuit, in affirming, turned a blind eye to the district court’s exercise of discretion in a way that systematically increased the importance of Blume’s testimony on signals, while systematically hobbling the defendant’s expert witnesses.


[1] “The Standard of Appellate Review for Rule 702 Decisions” (Nov. 12, 2014).

[2] “Gadolinium, Nephrogenic Systemic Fibrosis, and Case Reports” (Nov. 24, 2014).

[3] “Das ist nicht nur nicht richtig, es ist nicht einmal falsch!” (“That is not only not right; it is not even wrong!”) The quote is attributed to Wolfgang Pauli in R. E. Peierls, “Wolfgang Ernst Pauli, 1900-1958,” 5 Biographical Memoirs Fellows Royal Soc’y 175, 186 (1960).