TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Multiplicity in the Third Circuit

September 21st, 2017

In Karlo v. Pittsburgh Glass Works, LLC, C.A. No. 2:10-cv-01283 (W.D. Pa.), plaintiffs claimed that their employer’s reduction in force (RIF) unlawfully targeted workers over 50 years of age. Plaintiffs lacked any evidence of employer animus against old folks, and thus attempted to make out a statistical disparate-impact claim. The plaintiffs placed their chief reliance upon an expert witness, Michael A. Campion, to analyze a dataset of workers agreed by the parties to have been subject to the RIF. For the last 30 years, Campion has been on the faculty at Purdue University. His academic training and graduate degrees are in industrial and organizational psychology. Campion has served as an editor of Personnel Psychology, and as a past president of the Society for Industrial and Organizational Psychology. Campion’s academic website notes that he manages a small consulting firm, Campion Consulting Services1.

The defense sought to characterize Campion as not qualified to offer his statistical analysis2. Campion did, however, have some statistical training as part of his master’s level training in psychology, and his professional publications did occasionally involve statistical analyses. To be sure, Campion’s statistical acumen paled in comparison to that of the defense expert witness, James Rosenberger, a fellow and a former vice president of the American Statistical Association, as well as a full professor of statistics at Pennsylvania State University. The threshold for qualification, however, is low, and the defense’s attack on Campion’s qualifications failed to attract the court’s serious attention.

On the merits, the defense subjected Campion to a strong challenge on whether he had misused data. The defense’s expert witness, Prof. Rosenberger, filed a report that questioned Campion’s data handling and statistical analyses. The defense claimed that Campion had engaged in questionable data manipulation by including, in his RIF analysis, workers who had been terminated when their plant was transferred to another company, as well as workers who retired voluntarily.

Using simple z-score tests, Campion compared termination rates for older versus younger workers in four age subgroups, ages 40+, 45+, 50+, and 55+. He did not conduct an analysis of the 60+ subgroup, on the claim that this group had too few members for the test to have sufficient power3. Campion found a small z-score for the 40+ versus <40 age-group comparison (z = 1.51), which is not close to statistical significance at the 5% level. On the defense’s legal theory, this was the crucial comparison to be made under the Age Discrimination in Employment Act (ADEA). The plaintiffs, however, maintained that they could make out a case of disparate impact by showing age discrimination in age subgroups that started above the minimum specified by the ADEA. Although age is a continuous variable, Campion decided to conduct z-tests on subgroups based upon five-year increments. For the 45+, 50+, and 55+ age subgroups, he found z-scores that ranged from 2.15 to 2.46, and he concluded that there was evidence of disparate impact in the higher age subgroups4. Karlo v. Pittsburgh Glass Works, LLC, C.A. No. 2:10-cv-01283, 2015 WL 4232600, at *11 (W.D. Pa. July 13, 2015) (McVerry, S.J.).
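The z-scores Campion reported are, in substance, two-proportion z-tests comparing termination rates inside and outside an age subgroup. Here is a minimal sketch with hypothetical counts — the opinions report the z-scores but not the underlying counts, so the numbers below are invented for illustration only:

```python
from math import sqrt

def two_proportion_z(term_a, n_a, term_b, n_b):
    """Two-proportion z-test: difference in termination rates between
    an age subgroup (a) and its complement (b), pooled under the null."""
    p_a, p_b = term_a / n_a, term_b / n_b
    p_pool = (term_a + term_b) / (n_a + n_b)            # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical counts: 30 of 200 older workers terminated vs. 15 of 200 younger.
z = two_proportion_z(30, 200, 15, 200)
print(round(z, 2))
```

With these invented counts the statistic lands between 2 and 2.5, in the neighborhood of the values the court reported for the older subgroups; the point is only the mechanics of the test.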

The defense, and apparently the defense expert witnesses, branded Campion’s analysis as “data snooping,” which required correction for multiple comparisons. In the defense’s view, the multiple age subgroups required a Bonferroni correction that would have diminished the critical p-value for “significance” by a factor of four. The trial court agreed with the defense contention about data snooping and multiple comparisons, and excluded Campion’s opinion of disparate impact, which had been based upon finding statistically significant disparities in the 45+, 50+, and 55+ age subgroups. 2015 WL 4232600, at *13. The trial court noted that Campion, in finding significant disparities in terminations in the subgroups, but not in the 40+ versus <40 analysis:

[did] not apply any of the generally accepted statistical procedures (i.e., the Bonferroni procedure) to correct his results for the likelihood of a false indication of significance. This sort of subgrouping ‘analysis’ is data-snooping, plain and simple.”

Id. After excluding Campion’s opinions under Rule 702, as well as other evidence in support of plaintiffs’ disparate impact claim, the trial court granted summary judgment on the discrimination claims. Karlo v. Pittsburgh Glass Works, LLC, No. 2:10–cv–1283, 2015 WL 5156913 (W. D. Pa. Sept. 2, 2015).

On plaintiffs’ appeal, the Third Circuit took the wind out of the attack on Campion by holding that the ADEA prohibits disparate impact based upon age, and that a claim need not rest upon comparing workers aged 40 and older with those under 40; subgroups of older workers within the protected class may support a claim. Karlo v. Pittsburgh Glass Works, LLC, 849 F.3d 61, 66-68 (3d Cir. 2017). This holding took the legal significance out of the statistical insignificance of Campion’s comparison of 40+ versus <40 age-group termination rates. Campion’s subgroup analyses were back in play, but the Third Circuit still faced the question whether Campion’s conclusions, based upon unadjusted z-scores and p-values, offended Rule 702.

The Third Circuit noted that the district court had identified three grounds for excluding Campion’s statistical analyses:

(1) Dr. Campion used facts or data that were not reliable;

(2) he failed to use a statistical adjustment called the Bonferroni procedure; and

(3) his testimony lacks ‘‘fit’’ to the case because subgroup claims are not cognizable.

849 F.3d at 81. The first issue was raised by the defense’s claims of Campion’s sloppy data handling, and his inclusion of voluntarily retired workers and workers who were terminated when their plant was turned over to another company. The Circuit did not address these data handling issues, which it left for the trial court on remand. Id. at 82. The third ground went out of the case with the appellate court’s resolution of the scope of the ADEA. The Circuit did, however, engage the issue of whether adjustment for multiple comparisons was required by Rule 702.

On the “data-snooping” issue, the Circuit concluded that the trial court had applied “an incorrectly rigorous standard for reliability.” Id. The Circuit acknowledged that

[i]n theory, a researcher who searches for statistical significance in multiple attempts raises the probability of discovering it purely by chance, committing Type I error (i.e., finding a false positive).”

849 F.3d at 82. The defense expert witness contended that applying the Bonferroni adjustment, which would have reduced the critical significance probability level from 5% to 1%, would have rendered Campion’s analyses not statistically significant, and thus not probative of disparate impact. Given that plaintiffs’ cases were entirely statistical, the adjustment would have been fatal to their cases. Id. at 82.
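For the four subgroup comparisons, the Bonferroni adjustment divides the familywise significance level by the number of tests. A minimal sketch, using the z-scores reported in the opinions (the arithmetic gives 0.0125 for four tests; the 1% figure in the briefs presumably reflects rounding):

```python
from math import erfc, sqrt

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return erfc(abs(z) / sqrt(2))

alpha, k = 0.05, 4              # four subgroup tests: 40+, 45+, 50+, 55+
bonferroni_alpha = alpha / k    # per-test threshold of 0.0125

# The reported z-scores for the older subgroups ranged from 2.15 to 2.46.
for z in (2.15, 2.46):
    p = two_sided_p(z)
    print(z, round(p, 4), p < bonferroni_alpha)
```

Even the largest reported z-score (2.46, two-sided p ≈ 0.014) falls short of the adjusted threshold, which is the substance of the defense’s contention.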

At the trial level and on appeal, plaintiffs and Campion had objected to the data-snooping charge on the grounds that

(1) he had analyzed only four subgroups;

(2) virtually all subgroups were statistically significant;

(3) his methodology was “hypothesis driven” and involved logical increments in age to explore whether the strength of the evidence of age disparity in terminations continued in each, increasingly older subgroup;

(4) his method was analogous to replications with different samples; and

(5) his result was confirmed by a single, supplemental analysis.

Id. at 83. According to the plaintiffs, Campion’s approach was based upon the reality that age is a continuous, not a dichotomous, variable, and that he was exploring a single hypothesis. A.240-241; Brief of Appellants at 26. Campion’s explanations do mitigate somewhat the charge of “data snooping,” but they do not explain why Campion did not use, at the outset, a statistical analysis that treated age as a continuous variable. The single, supplemental analysis was never described or reported by the trial or appellate courts.

The Third Circuit concluded that the district court had applied a ‘‘merits standard of correctness,’’ which is higher than what Rule 702 requires. Specifically, the district court, having identified a potential methodological flaw, did not further evaluate whether Campion’s opinion relied upon good grounds. 849 F.3d at 83. The Circuit vacated the judgment below, and remanded the case to the district court for the opportunity to apply the correct standard.

The trial court’s acceptance that an adjustment was appropriate or required hardly seems a “merits standard.” The use of a proper adjustment for multiple comparisons is very much a methodological concern. If Campion could reach his conclusion only by way of an inappropriate methodology, then his conclusion surely would fail the requirements of Rule 702. The trial court did, however, appear to accept, without explicit evidence, that the failure to apply the Bonferroni correction made it impossible for Campion to present sound scientific argument for his conclusion that there had been disparate impact. The trial court’s opinion also suggests that the Bonferroni correction itself, as opposed to some more appropriate correction, was required.

Unfortunately, the reported opinions do not provide the reader with a clear account of what the analyses would have shown on the correct data set, without improper inclusions and exclusions, and with appropriate statistical adjustments. Presumably, the parties are left to make their cases on remand.

Based upon citations to sources that described the Bonferroni adjustment as “good statistical practice,” but one that is ‘‘not widely or consistently adopted’’ in the behavioral and social sciences, the Third Circuit observed that in some cases, failure to adjust for multiple comparisons may “simply diminish the weight of an expert’s finding.”5 The observation is problematic given that Kumho Tire suggests that an expert witness must use “in the courtroom the same level of intellectual rigor that characterizes the practice of an expert in the relevant field.” Kumho Tire Co. v. Carmichael, 526 U.S. 137, 150, (1999). One implication is that courts are prisoners to prevalent scientific malpractice and abuse of statistical methodology. Another implication is that courts need to look more closely at the assumptions and predicates for various statistical tests and adjustments, such as the Bonferroni correction.

These worrisome implications are exacerbated by the appellate court’s insistence that the question whether a study’s result was properly calculated or interpreted “goes to the weight of the evidence, not to its admissibility.”6 Combined with citations to pre-Daubert statistics cases7, judicial comments such as these can appear to reflect a general disregard for the statutory requirements of Rules 702 and 703. Claims of statistical significance, in studies with multiple exposures and multiple outcomes, are frequently made without adjustment for multiple comparisons, and without notation, explanation, or justification. The consequence is that study results are often over-interpreted and over-sold. Methodological errors related to multiple testing or over-claiming statistical significance are commonplace in tort litigation over “health-effects” studies of birth defects, cancer, and other chronic diseases that require epidemiologic evidence8.

In Karlo, the claimed methodological error is beset by its own methodological problems. As the court noted, adjustments for multiple comparisons are not free from methodological controversy9. One noteworthy textbook10 labels the Bonferroni correction as an “awful response” to the problem of multiple comparisons. Aside from this strident criticism, there are alternative approaches to statistical adjustment for multiple comparisons. In the context of the Karlo case, the Bonferroni might well be awful because Campion’s four subgroups are hardly independent tests. Because each subgroup is nested within the next higher age subgroup, the subgroup test results will be strongly correlated in a way that defeats the mathematical assumptions of the Bonferroni correction. On remand, the trial court in Karlo must still make its Rule 702 gatekeeping decision on whether Campion properly considered the role of multiple subgroups, and of multiple analyses run on different models.
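The dependence point can be illustrated with a small Monte Carlo sketch. The workforce below is entirely hypothetical, and terminations are drawn independently of age (the null hypothesis is true); because every worker in the 50+ subgroup also belongs to the 45+ subgroup, the two test statistics move together:

```python
import numpy as np

rng = np.random.default_rng(0)

def subgroup_z(ages, fired, cutoff):
    """Two-proportion z comparing termination rates, cutoff+ vs. younger."""
    old = ages >= cutoff
    n1, n2 = old.sum(), (~old).sum()
    t1, t2 = fired[old].sum(), fired[~old].sum()
    p = (t1 + t2) / (n1 + n2)
    se = np.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (t1 / n1 - t2 / n2) / se

z45, z50 = [], []
for _ in range(2000):
    ages = rng.integers(25, 65, size=300)    # hypothetical workforce
    fired = rng.random(300) < 0.15           # 15% RIF, independent of age
    z45.append(subgroup_z(ages, fired, 45))
    z50.append(subgroup_z(ages, fired, 50))

r = np.corrcoef(z45, z50)[0, 1]
print(round(r, 2))   # strongly positive correlation between the nested tests
```

Because the nested test statistics are highly correlated, the effective number of independent comparisons is well below four, and dividing the significance level by four over-corrects.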


1 Although Campion describes his consulting business as small, he seems to turn up in quite a few employment discrimination cases. See, e.g., Chen-Oster v. Goldman, Sachs & Co., 10 Civ. 6950 (AT) (JCF) (S.D.N.Y. 2015); Brand v. Comcast Corp., Case No. 11 C 8471 (N.D. Ill. July 5, 2014); Powell v. Dallas Morning News L.P., 776 F. Supp. 2d 240, 247 (N.D. Tex. 2011) (excluding Campion’s opinions), aff’d, 486 F. App’x 469 (5th Cir. 2012).

2 See Defendant’s Motion to Bar Dr. Michael Campion’s Statistical Analysis, 2013 WL 11260556.

3 There was no mention of an effect size for the younger subgroups, or of a power calculation for the 60+ subgroup’s probability of showing a z-score greater than two. Similarly, there was no discussion or argument about why this subgroup could not have been evaluated with Fisher’s exact test. In deciding the appeal, the Third Circuit observed that “Dr. Rosenberger test[ed] a subgroup of sixty-and-older employees, which Dr. Campion did not include in his analysis because ‘[t]here are only 14 terminations, which means the statistical power to detect a significant effect is very low’. A.244–45.” Karlo v. Pittsburgh Glass Works, LLC, 849 F.3d 61, 82 n.15 (3d Cir. 2017).

4 In the trial court’s words, the z-score converts the difference in termination rates into standard deviations. Karlo v. Pittsburgh Glass Works, LLC, C.A. No. 2:10-cv-01283, 2015 WL 4232600, at *11 n.13 (W.D. Pa. July 13, 2015). According to the trial court, Campion gave a rather dubious explanation of the meaning of the z-score: “[w]hen the number of standard deviations is less than −2 (actually −1.96), there is a 95% probability that the difference in termination rates of the subgroups is not due to chance alone.” Id. (internal citation omitted).
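What the 1.96 threshold actually means can be stated without the court’s inversion: under the null hypothesis of no difference, a statistic as extreme as ±1.96 occurs with about 5% probability — which is not a 95% probability that chance is excluded. A small check:

```python
from math import erfc, sqrt

def two_sided_p(z):
    """Two-sided p-value, computed under the null hypothesis,
    for a standard-normal test statistic."""
    return erfc(abs(z) / sqrt(2))

# |z| = 1.96 corresponds to a two-sided p-value of about 0.05, i.e.,
# a 5% chance of so extreme a statistic if chance alone were operating.
print(round(two_sided_p(1.96), 3))
```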

5 See 849 F.3d 61, 83 (3d Cir. 2017) (citing and quoting from Paetzold & Willborn § 6:7, at 308 n.2) (describing the Bonferroni adjustment as ‘‘good statistical practice,’’ but ‘‘not widely or consistently adopted’’ in the behavioral and social sciences); see also E.E.O.C. v. Autozone, Inc., No. 00-2923, 2006 WL 2524093, at *4 (W.D. Tenn. Aug. 29, 2006) (‘‘[T]he Court does not have a sufficient basis to find that … the non-utilization [of the Bonferroni adjustment] makes [the expert’s] results unreliable.’’). And of course, the Third Circuit invoked the Daubert chestnut: ‘‘Vigorous cross-examination, presentation of contrary evidence, and careful instruction on the burden of proof are the traditional and appropriate means of attacking shaky but admissible evidence.’’ Daubert, 509 U.S. 579, 596 (1993).

6 See 849 F.3d at 83 (citing Leonard v. Stemtech Internat’l Inc., 834 F.3d 376, 391 (3d Cir. 2016)).

7 See 849 F.3d 61, 83 (3d Cir. 2017), citing Bazemore v. Friday, 478 U.S. 385, 400 (1986) (‘‘Normally, failure to include variables will affect the analysis’ probativeness, not its admissibility.’’).

8 See Hans Zeisel & David Kaye, Prove It with Figures: Empirical Methods in Law and Litigation 93 & n.3 (1997) (criticizing the “notorious” case of Wells v. Ortho Pharmaceutical Corp., 788 F.2d 741 (11th Cir.), cert. denied, 479 U.S. 950 (1986), for its erroneous endorsement of conclusions based upon “statistically significant” studies that explored dozens of congenital malformation outcomes, without statistical adjustment). The authors do, however, give an encouraging example of an English trial judge who took seriously the multiplicity of hypotheses tested in the study relied upon by plaintiffs. Reay v. British Nuclear Fuels (Q.B. Oct. 8, 1993) (published in The Independent, Nov. 22, 1993) (“the fact that a number of hypotheses were considered in the study requires an increase in the P-value of the findings with consequent reduction in the confidence that can be placed in the study result … .”), quoted in Zeisel & Kaye at 93. Zeisel and Kaye emphasize that courts should not be overly impressed with claims of statistically significant findings, and should pay close attention to how expert witnesses developed their statistical models. Id. at 94.

9 See David B. Cohen, Michael G. Aamodt, and Eric M. Dunleavy, Technical Advisory Committee Report on Best Practices in Adverse Impact Analyses (Center for Corporate Equality 2010).

10 Kenneth J. Rothman, Sander Greenland, and Timothy L. Lash, Modern Epidemiology 273 (3d ed. 2008); see also Kenneth J. Rothman, “No Adjustments Are Needed for Multiple Comparisons,” 1 Epidemiology 43, 43 (1990).

WOE — Zoloft Escapes a MDL While Third Circuit Creates a Conceptual Muddle

July 31st, 2017

Multidistrict Litigations (MDLs) can be “muddles” that are easy to get into, but hard to get out of. Pfizer and subsidiary Greenstone fabulously escaped a muddle through persistent lawyering and the astute gatekeeping of a district judge in the Eastern District of Pennsylvania. That judge, the Hon. Cynthia Rufe, sustained objections to the admissibility of plaintiffs’ epidemiologic expert witness Anick Bérard. When the MDL’s plaintiffs’ steering committee (PSC) demanded, requested, and begged for a do-over, Judge Rufe granted them one more chance. The PSC put their litigation industry eggs in a single basket, carried by statistician Nicholas Jewell. Unfortunately for the PSC, Judge Rufe found Jewell’s basket to be as methodologically defective as Bérard’s, and Her Honor excluded Jewell’s proffered testimony. Motions, papers, and appeals followed, but on June 2, 2017, the Third Circuit declared that the PSC and its clients had had enough opportunities to get through the gate. Their baskets of methodological deplorables were not up to snuff. In re Zoloft Prod. Liab. Litig., No. 16-2247, __ F.3d __, 2017 WL 2385279, 2017 U.S. App. LEXIS 9832 (3d Cir. June 2, 2017) (affirming exclusion of Jewell’s dodgy opinions, which involved multiple methodological flaws and failures to follow any methodology faithfully) [Slip op. cited below as Zoloft].

Plaintiffs Attempt to Substitute WOE for Depressingly Bad Expert Witness Opinion

The ruse of conflating “weight of the evidence,” as used to describe the appellate standard of review for sustaining or reversing a trial court’s factual finding, with a purported scientific methodology for inferring causation, was on full display in the PSC’s attack on Judge Rufe’s gatekeeping. In their appellate brief in the Court of Appeals for the Third Circuit, the PSC asserted that Jewell had used a “weight of the evidence method,” even though that phrase, “weight of the evidence” (WOE), was never used in Jewell’s litigation reports. The full context of the PSC’s argument and citations to Milward make clear a deliberate attempt to conflate WOE as an appellate judicial standard for reviewing jury fact finding with WOE as a purported scientific methodology. See Appellants’ Opening Brief at 54 (Aug. 10, 2016) [cited as PSC] (asserting that “[a]t all times, the ultimate evaluation of the weight of the evidence is a jury question”; citing Milward v. Acuity Specialty Products Group, Inc., 639 F.3d 11, 20 (1st Cir. 2011), cert. denied, 133 S. Ct. 63 (2012)).

Having staked the ground that WOE is akin to a jury’s factual finding, and thus immune to any but the most extraordinary trial court action or appellate intervention, the PSC then pivoted to claim that Jewell’s WOE-ful method was nothing much more than an assessment of “the totality of the available scientific evidence, guided by the well-accepted Bradford-Hill criteria.” PSC at 3, 4, 7. This maneuver allowed the PSC to argue, apparently with a straight face, that WOE methodology as used by Jewell, had been generally accepted in the scientific community, as well as by the Third Circuit, in previous cases in which the court accepted the use of Bradford Hill’s considerations as a reliable method for establishing general causation. See PSC at 4 (citing Gannon v. United States, 292 F. App’x 170, 173 n.1 (3d Cir. 2008)). Jewell then simply plugged in his expertise and “40 years of experience,” and the desired conclusion of causation popped out. Id. Quod erat demonstrandum.

In pressing its point, the PSC took full advantage of loose, inaccurate language from the American Law Institute’s Restatement’s notorious comment C:

No algorithm exists for applying the Hill guidelines to determine whether an association truly reflects a causal relationship or is spurious.”

PSC at 33-34, citing Restatement (Third) of Torts: Physical and Emotional Harm § 28 cmt. c(3) (2010). True enough, but the absence of a mathematical algorithm hardly means that causal judgments are devoid of principles and standards. The PSC was undeterred, by text or by shame, from equating an unarticulated use of WOE methodology with some vague invocation of Bradford Hill’s considerations for evaluating associations for causality. See PSC at 43 (citing cases that never mentioned WOE but only Bradford Hill’s 50-plus-year-old heuristic as somehow supporting the claimed identity of the two approaches)1.

Pfizer Rebuffs WOE

Pfizer filed a comprehensive brief that unraveled the PSC’s duplicity. For unknown reasons, tactical or otherwise, however, Pfizer did not challenge the specifics of PSC’s equation of WOE with an abridged, distorted application of Bradford Hill’s considerations. See generally Opposition Brief of Defendants-Appellees Pfizer Inc., Pfizer International LLC, and Greenstone LLC [cited as Pfizer]. Perhaps given page limits and limited judicial attention spans, and just how woefully bad Jewell’s opinions were, Pfizer may well have decided that a more directed approach of assuming arguendo WOE’s methodological appropriateness was a more economical, pragmatic approach. A close reading of Pfizer’s brief, however, makes clear that it never conceded the validity of WOE as a scientific methodology.

Pfizer did point to the recasting of Jewell’s aborted attempt to apply Bradford Hill considerations as an employment of WOE methodology. Pfizer at 46-47. The argument reminded me of Abraham Lincoln’s famous argument:

How many legs does a dog have if you call his tail a leg?

Four.

Saying that a tail is a leg doesn’t make it a leg.”

Allen Thorndike Rice, Reminiscences of Abraham Lincoln by Distinguished Men of His Time at 242 (1909). Calling Jewell’s supposed method WOE or Bradford Hill or WOE/Bradford Hill did not cure the “fatal methodological flaws in his opinions.” Pfizer at 47.

Pfizer understandably and properly objected to the PSC’s attempt to cast Jewell’s “methodology” at such a high level of generality that any consideration of the many instances of methodological infidelity would be relegated to mere jury questions. Acquiescence in the PSC’s rhetorical move would constitute a complete abandonment of the inquiry whether Jewell had used a proper method. Pfizer at 15-16.

Interestingly, none of the amici curiae addressed the slippery WOE arguments advanced by the PSC. See generally Brief of Amici Curiae American Tort Reform Ass’n & Pharmaceutical Research and Manufacturers of America (Oct. 18, 2016); Brief of Washington Legal Fdtn. as Amicus Curiae (Oct. 18, 2016). There was no meaningful discussion of WOE as a supposedly scientific methodology at oral argument. See Transcript of Oral Argument in In re Zoloft Prod. Liab. Litig., No. 16-2247 (Jan. 25, 2017).

The Third Circuit Acknowledges that Some Methodological Infelicities, Flaws, and Fallacies Are Properly the Subject of Judicial Gatekeeping

Fortunately, Jewell’s methodological infidelities were easily recognized by the Circuit judges. Jewell treated multiple studies, which were nested within one another, and thus involved overlapping populations, as though they were independent verifications of the same hypothesis. When the population at issue (from the Danish cohort) was included in a more inclusive pan-Scandinavian study, the relied-upon association dissipated, and Jewell utterly failed to explain or account for these data. Zoloft at 5-6.

Jewell relied upon a study by Anick Bérard, even though he later had to concede that the study had serious flaws that invalidated its conclusions and caused him to lose confidence in the paper’s findings.2 In another instance, Jewell relied upon a study that purported to report a statistically significant association, but the authors of this paper were later required by the journal, The New England Journal of Medicine, to correct the very confidence interval upon which Jewell had relied. Despite his substantial mathematical prowess, Jewell missed the miscalculation and uncritically relied upon a finding as statistically significant when in fact it was not.

Jewell rejected a meta-analysis of Zoloft studies for questionable methodological quibbles, even though he had relied upon the very same meta-analysis, with the same methodology, in his litigation efforts involving Prozac and birth defects. Not to be corralled by methodological punctilio, Jewell conducted his own meta-analysis with two studies, Huybrechts (2014) and Jimenez-Solem (2012), but failed to explain why he excluded other studies, the inclusion of which would have undone his claimed result. Zoloft at 9. Jewell also purported to reanalyze and recalculate point estimates in those two studies, without any clear protocol or consistency in his approach to other studies. Zoloft at 9. The list goes on, but in sum, Jewell’s handling of these technical issues did not inspire confidence, either in the district court or in the appellate court.
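For readers unfamiliar with the technique, a meta-analysis of the sort at issue pools study-level estimates by inverse-variance weighting. A minimal fixed-effect sketch with invented odds ratios (not the numbers from the litigation studies), illustrating how the choice of which studies to include drives the pooled result:

```python
from math import exp, log, sqrt

def fixed_effect_meta(estimates):
    """Inverse-variance fixed-effect pooling of odds ratios,
    given (odds_ratio, lower_95, upper_95) tuples per study."""
    num = den = 0.0
    for odds_ratio, lo, hi in estimates:
        se = (log(hi) - log(lo)) / (2 * 1.96)   # recover SE from the 95% CI
        w = 1 / se ** 2                          # weight = inverse variance
        num += w * log(odds_ratio)
        den += w
    center = num / den
    half = 1.96 / sqrt(den)
    return exp(center), exp(center - half), exp(center + half)

# Hypothetical inputs for illustration only.
pooled, lo, hi = fixed_effect_meta([(1.4, 0.9, 2.2), (1.1, 0.8, 1.5)])
print(round(pooled, 2), round(lo, 2), round(hi, 2))
```

With these invented inputs, the pooled confidence interval straddles 1.0; dropping or adding a single study can move a borderline pooled estimate across the line of no association, which is why an unexplained inclusion rule is a methodological problem.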

WOE to the Third Circuit

The Circuit gave the PSC every conceivable break. Because Pfizer had not engaged specifically on whether WOE was a proper, or any kind of, scientific method, the Circuit treated the issue as virtually conceded:

Pfizer does not seem to contest the reliability of the Bradford Hill criteria or weight of the evidence analysis generally; the dispute centers on whether the specific methodology implemented by Dr. Jewell is reliable. Flexible methodologies, such as the “weight of the evidence,” can be implemented in multiple ways; despite the fact that the methodology is generally reliable, each application is distinct and should be analyzed for reliability.”

Zoloft at 18. The Court acknowledged that WOE arose only in the PSC’s appellate brief, which would have made the entire dubious argument waived under general appellate jurisdictional principles, but the Court, in a footnote, indulged the assumption, “for the sake of argument,” that WOE was Jewell’s purported method from the inception. Zoloft at 18 n.39. Without any real evidentiary support, analysis, or concession from Pfizer, the Circuit accepted that WOE analyses were “generally reliable.” Zoloft at 21.

The Circuit accepted, rather uncritically, that Jewell used a combination of WOE analysis and Bradford Hill considerations. Zoloft at 17. Although Jewell had never described WOE in his litigation report, and WOE was not a feature of his hearing testimony, the Circuit impermissibly engrafted Carl Cranor’s description of WOE as involving inference to the best explanation. Zoloft at 17 & n.37, citing Milward v. Acuity Specialty Prods. Grp., Inc., 639 F.3d 11, 17 (1st Cir. 2011) (internal quotation marks and citation omitted).

There was, however, a limit to the Circuit’s credulousness and empathy. As the Court noted, there must be some assurance that the purported Bradford Hill/WOE method is something more than a “mere conclusion-oriented selection process.” Zoloft at 20. Ultimately, the Court put its markers down for Jewell’s putative WOE methodology:

there must be a scientific method of weighting that is used and explained.”

Zoloft at 20. Calling the method WOE did not, in the final analysis, exclude Jewell from Rule 702 gatekeeping. Try as the PSC might, there was just no mistaking Jewell’s approach as anything other than a crazy patchwork quilt of numerical wizardry in aid of subjective, result-oriented conclusion mongering.

In the Court’s words:

we find that Dr. Jewell did not 1) reliably apply the ‘techniques’ to the body of evidence or 2) adequately explain how this analysis supports specified Bradford Hill criteria. Because ‘any step that renders the analysis unreliable under the Daubert factors renders the expert’s testimony inadmissible’, this is sufficient to show that the District Court did not abuse its discretion in excluding Dr. Jewell’s testimony.”

Zoloft at 28. As heartening as the Circuit’s conclusion is, the Court’s couching its observation as a finding (“we find”) is disheartening with respect to the Third Circuit’s apparent inability to distinguish abuse-of-discretion review from de novo appellate fact finding. Equally distressing is the Court’s invocation of the Daubert factors, which were dicta in a Supreme Court opinion superseded more than 17 years ago by the amendment of Federal Rule of Evidence 702.

On the crucial question whether Jewell had engaged in an unreliable application of methods or techniques that superficially, at a very high level of generality, claim to be generally accepted, the Court stayed on course. The Court “found” that Jewell had applied techniques, analyses, and critiques so obviously inconsistently that no amount of judicial indulgence, assumptions arguendo, or careless glosses could save Jewell and his fatuous opinions from judicial banishment. Zoloft 28-29. Returning to the correct standard of review (abuse of discretion), but the wrong governing law (Daubert instead of Rule 702), the Court announced that:

[b]ecause ‘any step that renders the analysis unreliable under the Daubert factors renders the expert’s testimony inadmissible’, this is sufficient to show that the District Court did not abuse its discretion in excluding Dr. Jewell’s testimony.”

Zoloft at 21 n.50 (citation omitted). The Court found itself unable to say simply and directly that “the MDL trial court decided the case well within its discretion.”

The Zoloft case was not the Third Circuit’s first WOE rodeo. WOE had raised its unruly head in Magistrini v. One Hour Martinizing Dry Cleaning, 180 F. Supp. 2d 584, 602 (D.N.J. 2002), aff’d, 68 F. App’x 356 (3d Cir. 2003), where an expert witness, David Ozonoff, offered what purported to be a WOE opinion. The Magistrini trial court did not fuss with the assertion that WOE was generally reliable, but took issue with how Ozonoff tried to pass off his analysis as a comprehensive treatment of the totality of the evidence. In Magistrini, Judge Hochberg noted that regardless of the rubric of the methodology, the witness must show that in conducting a WOE analysis:

all of the relevant evidence must be gathered, and the assessment or weighing of that evidence must not be arbitrary, but must itself be based on methods of science.”

Magistrini, 180 F. Supp. 2d at 602. The witness must show that the methodology is more than a “mere conclusion-oriented selection process,” and that it has “a scientific method of weighting that is used and explained.” Id. at 607. Asserting the use of WOE was not an excuse or escape from judicial gatekeeping as specified by Rule 702.

Although the Third Circuit gave the Zoloft MDL trial court’s findings a searching review (certainly much tougher than the prescribed abuse-of-discretion review), the MDL court’s finding that Jewell “failed to consistently apply the scientific methods he articulates, has deviated from or downplayed certain well-established principles of his field, and has inconsistently applied methods and standards to the data so as to support his a priori opinion” was ultimately vindicated by the Court of Appeals. Zoloft at 10.

All’s well that ends well. Perhaps. It remains unfortunate, however, that a hypothetical method, WOE — which was never actually advocated by the challenged expert witnesses, which lacks serious support in the scientific community, and which was merely assumed arguendo to be valid — will be taken by careless readers to have been endorsed by the Third Circuit.


1 Among the cases cited, without any support, for the PSC’s dubious contention were Gannon v. United States, 292 F. App’x 170, 173 n.1 (3d Cir. 2008); Bitler v. A.O. Smith Corp., 391 F.3d 1114, 1124-25 (10th Cir. 2004); In re Joint E. & S. Dist. Asbestos Litig., 52 F.3d 1124, 1128 (2d Cir. 1995); In re Avandia Mktg., Sales Practices & Prods. Liab. Litig., No. 2007-MD-1871, 2011 WL 13576, at *3 (E.D. Pa. Jan. 4, 2011) (“Bradford-Hill criteria are used to assess whether an established association between two variables actually reflects a causal relationship.”).

2 Anick Bérard, Sertraline Use During Pregnancy and the Risk of Major Malformations, 212 Am. J. Obstet. Gynecol. 795 (2015).

Welding Litigation – Another Positive Example of Litigation-Generated Science

July 11th, 2017

In a recent post1, I noted Samuel Tarry’s valuable article2 for its helpful, contrarian discussion of the importance of some scientific articles with litigation provenances. Public health debates can spill over to the courtroom, and developments in the courtroom can, on occasion, inform and even resolve those public health debates that gave rise to the litigation. Tarry provided an account of three such articles, and I provided a brief account of another article, a published meta-analysis, from the welding fume litigation.

The welding litigation actually accounted for several studies, but in this post, I detail the background of another published study, an epidemiologic study by a noted Harvard epidemiologist. Not every expert witness’s report has the makings of a published paper. In theory, if an expert witness has conducted a systematic review and reached a conclusion not already represented among published papers, we might well expect that the witness had achieved the “least publishable unit.” The reality is that most causal claims are not based upon anything that could even remotely be called a systematic review. Given the lack of credibility of the causal claim, rebuttal reports are likely to hold little interest for serious scientists.

Martin Wells

In the welding fume cases, one of plaintiffs’ hired expert witnesses, Martin Wells, a statistician, proffered an analysis of Parkinson’s disease (PD) mortality among welders and welding tradesmen. Using the National Center for Health Statistics (NCHS) database, Wells aggregated data from 1993 to 1999 for PD mortality among welders, and compared it with PD mortality among non-welders. Wells claimed to find an increased risk of PD mortality among younger (under age 65 at death) welders and welding tradesmen in this dataset.

The defense sought discovery of Wells’s methods and materials, and obtained the underlying data from the NCHS. Wells had no protocol, no pre-stated commitment to which years in the dataset he would use, and no pre-stated statistical analysis plan. At a Rule 702 hearing, Wells was unable to state how many welders were included in his analysis, why he selected some years but not others, or why he had selected age 65 as the cutoff. His analyses appeared to be pure data dredging.

As the defense discovered, the NCHS dataset contained mortality data for many more years than the limited range employed by Wells in his analysis. Working with an expert witness at the Harvard School of Public Health, the defense discovered that Wells had gerrymandered the years included (and excluded) in his analysis in a way that just happened to generate a marginally, nominally statistically significant association.

NCHS Welder Age Distribution

The defense was thus able to show that the data overall, and in each year, were very sparse. For most years, the number of PD deaths under age 65 was either 0 or 1. Because of the huge denominators, however, the calculated mortality odds ratios were nominally statistically significant. The value of four PD deaths in 1998 is clearly an outlier. If that value were three rather than four, the statistical significance of the calculated OR would have been lost. Alternatively, a simple sensitivity test suggests that if the overall n were 6 instead of 7, statistical significance would have been lost. The chart below, prepared at the time with help from Dr. David Schwartz of Innovative Science Solutions, shows the actual number of “underlying cause” PD deaths that were in the dataset for each year in the NCHS dataset, and how sparse and “granular” these data were:
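For readers who want to see the arithmetic, the fragility of a nominally significant odds ratio built on a handful of deaths can be sketched with a short calculation. The counts below are hypothetical, chosen only to mimic the sparse-numerator, huge-denominator structure described above; they are not the actual NCHS figures.

```python
import math

def mortality_odds_ratio(a, b, c, d):
    """Odds ratio with an approximate 95% Wald confidence interval.

    a: exposed deaths from the cause of interest
    b: exposed deaths from other causes
    c: unexposed deaths from the cause of interest
    d: unexposed deaths from other causes
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)
    lo = math.exp(math.log(or_) - 1.96 * se)
    hi = math.exp(math.log(or_) + 1.96 * se)
    return or_, lo, hi

# Hypothetical sparse data: 7 PD deaths among welders under 65,
# set against enormous denominators on both sides.
or7, lo7, hi7 = mortality_odds_ratio(7, 16_121, 1_500, 7_600_000)

# Drop a single PD death (n = 6) and recompute.
or6, lo6, hi6 = mortality_odds_ratio(6, 16_121, 1_500, 7_600_000)

print(f"n = 7: OR {or7:.2f} (95% CI {lo7:.2f}-{hi7:.2f})")  # lower bound above 1.0
print(f"n = 6: OR {or6:.2f} (95% CI {lo6:.2f}-{hi6:.2f})")  # lower bound below 1.0
```

With seven deaths, the interval’s lower bound just clears 1.0; remove one death and nominal significance vanishes, which is the sense in which such an analysis hangs on a single death certificate.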

A couple of years later, Wells’s litigation analysis showed up, with only minor changes in its analyses, as a manuscript in the editorial offices of Neurology, with authors listed as Katherine W. Eisenberg, AB and Martin T. Wells, Ph.D., “A Mortality Odds Ratio Study of Welders and Parkinson Disease.” Wells disclosed that he had testified for plaintiffs in the welding fume litigation, but Eisenberg declared no conflicts. Having only an undergraduate degree, and attending medical school at the time of submission, Ms. Eisenberg would not seem to have had the opportunity to accumulate any conflicts of interest. Undisclosed to the editors of Neurology, however, was that Ms. Eisenberg was the daughter of Theodore (Ted) Eisenberg, a lawyer who taught at Cornell University and who represented plaintiffs in the same welding MDL as the one in which Wells testified. Inquiring minds might have wondered whether Ms. Eisenberg’s tuition, room, and board were subsidized by Ted’s earnings in the welding fume and other litigations. Ted Eisenberg and Martin Wells had collaborated on many other projects, but in the welding fume litigation, Ted worked as an attorney for MDL welding plaintiffs, and Martin Wells was compensated handsomely as an expert witness. The acknowledgment at the end of the manuscript thanked Theodore Eisenberg for his thoughtful comments and discussion, without noting that he had been a paid member of the plaintiffs’ litigation team. Nor did Wells and Eisenberg tell the Neurology editors that the article had grown out of Wells’s 2005 litigation report in the welding MDL.

The disclosure lapses and oversights by Wells and the younger Eisenberg proved harmless error because Neurology rejected the Wells and Eisenberg paper for publication, and it was never submitted elsewhere. The paper used the same restricted set of years of NCHS data, 1993-1999. The defense had already shown, through its own expert witness’s rebuttal report, that the manuscript’s analysis achieved statistical significance only because it omitted years from the analysis. For instance, if the authors had analyzed 1992 through 1999, their Parkinson’s disease mortality point estimate for younger welding tradesmen would no longer have been statistically significant.

Robert Park

One reason that Wells and Eisenberg may have abandoned their gerrymandered statistical analysis of the NCHS dataset was that an ostensibly independent group3 of investigators published a paper that presented a competing analysis. Robert M. Park, Paul A. Schulte, Joseph D. Bowman, James T. Walker, Stephen C. Bondy, Michael G. Yost, Jennifer A. Touchstone, and Mustafa Dosemeci, “Potential Occupational Risks for Neurodegenerative Diseases,” 48 Am. J. Ind. Med. 63 (2005) [cited as Park (2005)]. The authors accessed the same NCHS dataset, and looked at hundreds of different occupations, including welding tradesmen, and four neurodegenerative diseases.

Park, et al., claimed that they looked at occupations that had previously shown elevated proportional mortality ratios (PMRs) in a previous NIOSH publication. A few other occupations were included; in all, there were hundreds of independent analyses, without any adjustment for multiple testing. Welding occupations4 were included “[b]ecause of reports of Parkinsonism in welders [Racette et al., 2001; Levy and Nassetta, 2003], possibly attributable to manganese exposure (from welding rods and steel alloys)… .”5 Racette was a consultant for the Lawsuit Industry, which had funded his research on parkinsonism among welders. Levy was a testifying expert witness for Lawsuit, Inc. A betting person would conclude that Park had consulted with Wells and Eisenberg, and their colleagues.
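The hazard of running hundreds of unadjusted analyses is easy to quantify. A back-of-the-envelope calculation, assuming for illustration 500 independent tests at the conventional 0.05 significance level and no true associations anywhere, shows how many “significant” findings chance alone would deliver:

```python
# Expected number of false-positive "associations" from chance alone,
# assuming (for illustration) 500 independent tests at alpha = 0.05
# with no true effects in any of them.
n_tests = 500
alpha = 0.05

expected_false_positives = n_tests * alpha

# Probability of at least one nominally significant result.
prob_at_least_one = 1 - (1 - alpha) ** n_tests

print(f"Expected false positives: {expected_false_positives:.0f}")  # 25
print(f"P(at least one): {prob_at_least_one:.6f}")                  # ~1.0
```

On those assumptions, two dozen spurious “links” are the expected yield of the exercise, and at least one is a virtual certainty, which is why an unadjusted grab bag of occupational comparisons proves very little.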

These authors looked at four neurological degenerative diseases (NDDs): Alzheimer’s disease, Parkinson’s disease, motor neuron disease, and pre-senile dementia. The authors looked at NCHS death certificate occupational information from 1992 to 1998, which was remarkable because Wells had insisted that 1992 somehow was not available for inclusion in his analyses. During 1992 to 1998, in 22 states, there were 2,614,346 deaths, with 33,678 from Parkinson’s disease. (p. 65b). Then, for each of the four disease outcomes, the authors conducted an analysis for deaths below age 65. For the welding tradesmen, none of the four NDDs showed any associations. Park went on to conduct subgroup analyses for each of the four NDDs for deaths below age 65. In these subgroup analyses for welding tradesmen, the authors purported to find an association only with Parkinson’s disease:

Of the four NDDs under study, only PD was associated with occupations where arc-welding of steel is performed, and only for the 20 PD deaths below age 65 (MOR=1.77, 95% CI=1.08-2.75) (Table V).”

Park (2005), at 70.

The exact nature of the subgroup was obscure, to say the least. Remarkably, Park and his colleagues had not calculated an odds ratio for welding tradesmen under age 65 at death compared with non-welding tradesmen under age 65 at death. The table’s legend attempts to explain the authors’ calculation:

Adjusted for age, race, gender, region and SES. Model contains multiplicative terms for exposure and for exposure if age at death <65; thus MOR is estimate for deaths occurring age 65+, and MOR, age <65 is estimate of enhanced risk: age <65 versus age 65+”

In other words, Park looked to see whether welding tradesmen who died at a younger age (below age 65) were more likely to have a PD cause of death than welding tradesmen who died at an older age (over age 65). The meaning of this internal comparison is totally unclear, but it cannot represent a comparison of welders with non-welders. Indeed, every time Park and his colleagues calculated and reported this strange odds ratio for any occupational group in the published paper, the odds ratio was elevated. If the odds ratio means anything, it is that younger Parkinson’s patients, regardless of occupation, are more likely to die of their neurological disease than older patients. Older men, regardless of occupation, are more likely to die of cancer, cardiovascular disease, and other chronic diseases. Furthermore, this age association within (not between) occupational groups may be nothing other than a reflection of the greater severity of early-onset Parkinson’s disease in anyone, regardless of occupation.
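The difference between the comparison Park reported and the comparison a reader would expect can be made concrete with a toy calculation. The counts below are entirely hypothetical, constructed so that PD makes up a larger share of under-65 deaths than of over-65 deaths in every occupation; nothing here comes from the NCHS dataset.

```python
def odds_ratio(a, b, c, d):
    """Simple cross-product odds ratio from a 2x2 table."""
    return (a * d) / (b * c)

# Hypothetical death counts: (PD deaths, non-PD deaths).
welders_under_65 = (20, 50_000)
welders_over_65 = (30, 150_000)
others_under_65 = (2_000, 5_000_000)
others_over_65 = (3_000, 15_000_000)

# The comparison a reader expects: welders vs. non-welders, deaths under 65.
external_or = odds_ratio(*welders_under_65, *others_under_65)

# The internal comparison actually reported: under-65 vs. over-65 deaths,
# within each occupational group separately.
internal_or_welders = odds_ratio(*welders_under_65, *welders_over_65)
internal_or_others = odds_ratio(*others_under_65, *others_over_65)

print(f"welders vs. others, under 65: OR = {external_or:.2f}")          # 1.00
print(f"within welders, <65 vs. 65+:  OR = {internal_or_welders:.2f}")  # 2.00
print(f"within others,  <65 vs. 65+:  OR = {internal_or_others:.2f}")   # 2.00
```

In this construction there is no welder–non-welder association at all, yet the internal odds ratio is elevated, and identically so, in both occupational groups, which is precisely the pattern Park’s tables displayed across occupations.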

Like the manuscript by Eisenberg and Wells, the Park paper was an exercise in data dredging. The Park study reported increased odds ratios for Parkinson’s disease among the following groups on the primary analysis:

biological, medical scientists [MOR 2.04 (95% CI, 1.37-2.92)]

clergy [MOR 1.79 (95% CI, 1.58-2.02)]

religious workers [MOR 1.70 (95% CI, 1.27-2.21)]

college teachers [MOR 1.61 (95% CI, 1.39-1.85)]

social workers [MOR 1.44 (95% CI, 1.14-1.80)]

As noted above, the Park paper reported that all of the internal mortality odds ratios, comparing deaths below versus above age 65 within occupational groups, were nominally statistically significantly elevated. Nonetheless, the Park authors were on a mission, determined to make something out of nothing, at least when it came to welding and Parkinson’s disease among younger patients. The authors’ conclusion reflected stunningly poor scholarship:

Studies in the US, Europe, and Korea implicate manganese fumes from arc-welding of steel in the development of a Parkinson’s-like disorder, probably a manifestation of manganism [Sjogren et al., 1990; Kim et al., 1999; Luccini, et al., 1999; Moon et al., 1999]. The observation here that PD mortality is elevated among workers with likely manganese exposures from welding, below age 65 (based on 20 deaths), supports the welding-Parkinsonism connection.”

Park (2005) at 73.

Stunningly bad because the cited papers by Sjogren, Luccini, Kim, and Moon did not examine Parkinson’s disease as an outcome; indeed, they did not even examine a parkinsonian movement disorder. More egregious, however, was the authors’ assertion that their analysis, which compared the odds of Parkinson’s disease mortality between welders under age 65 to that mortality for welders over age 65, supported an association between welding and “Parkinsonism.” 

Every time the authors conducted this analysis internal to an occupational group, they found an elevation among under age 65 deaths compared with over age 65 deaths within the occupational group. They did not report comparisons of any age-defined subgroup of a single occupational group with similarly aged mortality in the remaining dataset.

Elan Louis

The plaintiffs’ lawyers used the Park paper as “evidence” of an association that they claimed was causal. They were aided by a cadre of expert witnesses who could cite a paper’s conclusions, but could not understand its methods. Occasionally, one of the plaintiffs’ expert witnesses would confess ignorance about exactly what Robert Park had done in this paper. Elan Louis, one of the better qualified expert witnesses on the claimants’ side, for instance, testified in the plaintiffs’ attempt to certify a national medical monitoring class action for welding tradesmen. His testimony about what to make of the Park paper was more honest than that of most of the plaintiffs’ expert witnesses:

Q. My question to you is, is it true that that 1.77 point estimate of risk, is not a comparison of this welder and allied tradesmen under this age 65 mortality, compared with non-welders and allied tradesmen who die under age 65?

A. I think it’s not clear that the footnote — I think that the footnote is not clearly written. When you read the footnote, you didn’t read the punctuation that there are semicolons and colons and commas in the same sentence. And it’s not a well constructed sentence. And I’ve gone through this sentence many times. And I’ve gone through this sentence with Ted Eisenberg many times. This is a topic of our discussion. One of the topics of our discussions. And it’s not clear from this sentence that that’s the appropriate interpretation. *  *  *  However, the footnote, because it’s so poorly written, it obscures what he actually did. And then I think it opens up alternative interpretations.

Q. And if we can pursue that for a moment. If you look at other tables for other occupational titles, or exposure related variables, is it true that every time that Mr. Park reports on that MOR age under 65, that the estimate is elevated and statistically significantly so?

A. Yes. And he uses the same footnote every time. He’s obviously cut and paste that footnote every single time, down to the punctuation is exactly the same. And I would agree that if you look for example at table 4, the mortality odds ratios are elevated in that manner for Parkinson’s Disease, with reference to farming, with reference to pesticides, and with reference to farmers excluding horticultural deaths.

Deposition testimony of Elan Louis, at p. 401-04, in Steele v. A. O. Smith Corp., no. 1:03 CV-17000, MDL 1535 (Jan. 18, 2007). Other less qualified, or less honest expert witnesses on the plaintiffs’ side were content to cite Park (2005) as support for their causal opinions.

Meir Stampfer

The empathetic MDL trial judge denied the plaintiffs’ request for class certification in Steele, but individual personal injury cases continued to be litigated. Steele v. A.O. Smith Corp., 245 F.R.D. 279 (N.D. Ohio 2007) (denying class certification); In re Welding Fume Prods. Liab. Litig., No. 1:03-CV-17000, MDL 1535, 2008 WL 3166309 (N.D. Ohio Aug. 4, 2008) (striking pendent state-law class action claims).

Although Elan Louis was honest enough to acknowledge his own confusion about the Park paper, other expert witnesses continued to rely upon it, and plaintiffs’ counsel continued to cite the paper in their briefs and to use the apparently elevated point estimate for welders in their cross-examinations of defense expert witnesses. With the NCHS data in hand (on a DVD), defense counsel returned to Meir Stampfer, who had helped them unravel Martin Wells’s litigation analysis. The question for Professor Stampfer was whether Park’s reported point estimate for the PD mortality odds ratio was truly a comparison of welders versus non-welders, or whether it was some uninformative internal comparison of younger welders versus older welders.

The one certainty available to the defense was that it had the same dataset that had been used by Martin Wells in the earlier litigation analysis, and now by Robert Park and his colleagues in their published analysis. Using the NCHS dataset, and Park’s definitions of a welder and a welding tradesman, Professor Stampfer calculated PD mortality odds ratios for each definition, as well as for each definition for deaths under age 65. None of these analyses yielded statistically significant positive associations. Park’s curious results could not be replicated from the NCHS dataset.

For welders, the overall PD mortality odds ratio (MOR) was 0.85 (95% CI, 0.77–0.94), for years 1985 through 1999, in the NCHS dataset. If the definition of welders was expanded to include welding tradesmen, as used by Robert Park, the MOR was 0.83 (95% CI, 0.78–0.88) for all years available in the NCHS dataset.

When Stampfer conducted an age-restricted analysis, which properly compared welders or welding tradesmen who died under age 65 with non-welding tradesmen who died under age 65, he similarly obtained no associations for the PD MOR. For the years 1985-1991, for death under 65 from PD, Stampfer found MORs of 0.99 (95% CI, 0.44–2.22) for just welders, and 0.83 (95% CI, 0.48–1.44) for all welding tradesmen.

And for 1992-1999, the years used by Park (2005), and similar to the date range used by Martin Wells, for PD deaths at under age 65, Stampfer found a MOR of 1.44 (95% CI, 0.79–2.62) for welders only, and 1.20 (95% CI, 0.79–1.84) for all welding tradesmen.
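A short script makes the point about Stampfer’s results mechanically: a point estimate is nominally statistically significant (at the two-sided 0.05 level) only if its 95% confidence interval excludes 1.0. The estimates below are the ones reported in the text above.

```python
# Stampfer's reported PD mortality odds ratios (point estimate, 95% CI),
# as quoted in the discussion above.
results = {
    "welders, 1985-1999 (all ages)":        (0.85, 0.77, 0.94),
    "welding trades, all years (all ages)": (0.83, 0.78, 0.88),
    "welders, 1985-1991, under 65":         (0.99, 0.44, 2.22),
    "welding trades, 1985-1991, under 65":  (0.83, 0.48, 1.44),
    "welders, 1992-1999, under 65":         (1.44, 0.79, 2.62),
    "welding trades, 1992-1999, under 65":  (1.20, 0.79, 1.84),
}

for label, (mor, lo, hi) in results.items():
    if lo > 1.0:
        verdict = "elevated (CI excludes 1.0)"
    elif hi < 1.0:
        verdict = "reduced (CI excludes 1.0)"
    else:
        verdict = "no association (CI includes 1.0)"
    print(f"{label}: MOR {mor} ({lo}-{hi}) -> {verdict}")
```

None of the age-restricted intervals excludes 1.0 on the high side; the only intervals that exclude 1.0 at all lie below it, the opposite of the association Park claimed to find.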

None of Park’s slicing, dicing, and subgrouping of welding and PD results could be replicated. Although Dr. Stampfer submitted a report in Steele, there remained the problem that Park (2005) was a peer-reviewed paper, and that plaintiffs’ counsel, expert witnesses, and other published papers were citing it for its claimed results and errant discussion. The defense asked Dr. Stampfer whether the “least publishable unit” had been achieved, and Stampfer reluctantly agreed. He wrote up his analysis, and published it in 2009, with an appropriate disclosure6. Meir J. Stampfer, “Welding Occupations and Mortality from Parkinson’s Disease and Other Neurodegenerative Diseases Among United States Men, 1985–1999,” 6 J. Occup. & Envt’l Hygiene 267 (2009).

Professor Stampfer’s paper may not be the most important contribution to the epidemiology of Parkinson’s disease, but it corrected the distortions and misrepresentations of data in Robert Park’s paper. His paper has since been cited by well-known researchers in support of their conclusion that there is no association between welding and Parkinson’s disease7. Park’s paper has been criticized on PubPeer, with no rebuttal8.

Almost comically, Park has cited Stampfer’s study tendentiously for a claim that there is a healthy worker bias present in the available epidemiology of welding and PD, without noting, or responding to, the devastating criticism of his own Park (2005) work:

For a mortality study of neurodegenerative disease deaths in the United States during 1985 – 1999, Stampfer [61] used the Cause of Death database of the US National Center for Health Statistics and observed adjusted mortality odds ratios for PD of 0.85 (95% CI, 0.77 – 0.94) and 0.83 (95% CI, 0.78 – 0.88) in welders, using two definitions of welding occupations [61]. This supports the presence of a significant HWE [healthy worker effect] among welders. An even stronger effect was observed in welders for motor neuron disease (amyotrophic lateral sclerosis, OR 0.71, 95% CI, 0.56 – 0.89), a chronic condition that clearly would affect welders’ ability to work.”

Robert M. Park, “Neurobehavioral Deficits and Parkinsonism in Occupations with Manganese Exposure: A Review of Methodological Issues in the Epidemiological Literature,” 4 Safety & Health at Work 123, 126 (2013). Amyotrophic lateral sclerosis has a sudden onset, usually in middle age, without any real prodromal signs or symptoms that would keep a young man from entering welding as a trade. It just shows you can get any opinion published in a peer-reviewed journal, somewhere. Stampfer’s paper, along with Mortimer’s meta-analysis, helped put the kibosh on the welding fume litigation.

Addendum

A few weeks ago, the Sixth Circuit affirmed the dismissal of an attempted class action based upon claims of environmental manganese exposure. Abrams v. Nucor Steel Marion, Inc., Case No. 3:13 CV 137, 2015 WL 6872511 (N.D. Ohio Nov. 9, 2015) (finding testimony of neurologist Jonathan Rutchik to be nugatory, and excluding his proffered opinions), aff’d, 2017 U.S. App. LEXIS 9323 (6th Cir. May 25, 2017). Class plaintiffs employed one of the regulars from the welding fume parkinsonism litigation, Jonathan Rutchik.


2 Samuel L. Tarry, Jr., “Can Litigation-Generated Science Promote Public Health?” 33 Am. J. Trial Advocacy 315 (2009).

3 Ostensibly, but not really. Robert M. Park was an employee of NIOSH, but he had spent most of his career working as an employee for the United Autoworkers labor union. The paper acknowledged help from Ed Baker, David Savitz, and Kyle Steenland. Baker is a colleague and associate of B.S. Levy, who was an expert witness for plaintiffs in the welding fume litigation, as well as many others. The article was published in the “red” journal, the American Journal of Industrial Medicine.

4 The welding tradesmen included in the analyses were welders and cutters, boilermakers, structural metal workers, millwrights, plumbers, pipefitters, and steamfitters. Robert M. Park, Paul A. Schulte, Joseph D. Bowman, James T. Walker, Stephen C. Bondy, Michael G. Yost, Jennifer A. Touchstone, and Mustafa Dosemeci, “Potential Occupational Risks for Neurodegenerative Diseases,” 48 Am. J. Ind. Med. 63, 65a, ¶2 (2005).

5 Id.

6 “The project was supported in part through a consulting agreement with a group of manufacturers of welding consumables who had no role in the analysis, or in preparing this report, did not see any draft of this manuscript prior to submission for publication, and had no control over any aspect of the work or its publication.” Stampfer, at 272.

7 Karin Wirdefeldt, Hans-Olov Adami, Philip Cole, Dimitrios Trichopoulos, and Jack Mandel, “Epidemiology and etiology of Parkinson’s disease: a review of the evidence,” 26 Eur. J. Epidemiol. S1 (2011).

8 The criticisms can be found at <https://pubpeer.com/publications/798F9D98B5D2E5A832136C0A4AD261>, last visited on July 10, 2017.

Slemp Trial Part 3 – The Defense Expert Witness – Huh

July 9th, 2017

On June 19, 2017, the U.S. Supreme Court curtailed the predatory jurisdictional practices of the lawsuit industry in seeking out favorable trial courts with no meaningful connection to their claims. See Bristol-Myers Squibb Co. v. Superior Court, No. 16-466, 582 U.S. ___ (June 19, 2017). The same day, the defendants in a pending talc cancer case in St. Louis filed a motion for a mistrial. Swann v. Johnson & Johnson, Case No. 1422-CC09326-01, Division 10, Circuit Court of St. Louis City, Missouri. Missouri law may protect St. Louis judges from having to get involved in gatekeeping scientific expert witness testimony, but when the Supreme Court speaks to the requirements of the federal constitution’s due process clause, even St. Louis judges must listen. Bristol-Myers held that the constitution limits the practice of suing defendants in jurisdictions unrelated to the asserted claims, and the St. Louis trial judge, Judge Rex Burlison, granted the requested mistrial in Swann. As a result, there will not be another test of plaintiffs’ claims that talc causes ovarian cancer, and the previous Slemp case will remain an important event to interpret.

The Sole Defense Expert Witness

Previous posts1 addressed some of the big picture issues as well as the opening statements in Slemp. This post turns to the defense expert witness, Dr. Walter Huh, in an attempt to understand how and why the jury returned its egregious verdict. Juries can, of course, act out of sympathy, passion, or prejudice, but their verdicts are usually black boxes when it comes to discerning their motivations and analyses. A more interesting and fruitful exercise is to ask whether a reasonable jury could have reached the conclusion in the case. The value of this exercise is limited, however. A reasonable jury should have reasonable expertise in the subject matter, and in our civil litigation system, this premise is usually not satisfied.

Dr. Walter Huh, a gynecologic oncologist, was the only expert witness who testified for the defense. As the only defense witness, and as a clinician, Huh had a terrible burden. He had to meet and rebut testimony outside his fields of expertise, including pathology, toxicology, and most important, epidemiology. Huh was by all measures well-spoken, articulate, and well-qualified as a clinical gynecologic oncologist. Defense counsel and Huh, however, tried to make the case that Huh was qualified to speak to all issues in the case. The initial examination on qualifications was long and tedious, and seemed to overcompensate for the obvious gaps in Dr. Huh’s qualifications. In my view, the defense never presented much in the way of credible explanations about where Huh had obtained the training, experience, and expertise to weigh in on areas outside clinical medicine. Ultimately, the cross-examination is the crucial test of whether this strategy of one witness for all subjects can hold. The cross-examination of Dr. Huh, however, exposed the gaps in qualifications, and more important, Dr. Huh made substantive errors that were unnecessary and unhelpful to the defense of the case.

The defense pitched the notion that Dr. Huh somehow trumped all the expert witnesses called by plaintiff because Huh was the “only physician heard by the jury” in court. Somehow, I wonder whether the jury was so naïve. It seems like a poor strategic choice to hope that the biases of the jury in favor of the omniscience of physicians (over scientists) will carry the day.

There were, to be sure, some difficult clinical issues that Dr. Huh could address within his competence. Cancer causation itself is a multi-disciplinary science, but in the case of a disease, such as ovarian cancer, with a substantial base rate in the general population and without any biomarker of a causal pathway between exposure and outcome, epidemiology will be a necessary tool. Huh was thus forced to “play” on the plaintiffs’ expert witnesses’ home court, much to his detriment.

General Causation

Don’t confuse causation with links, association, and risk factors

The defense strong point is that virtually no one, other than the plaintiffs’ expert witnesses themselves, and only in the context of litigation, has causally attributed ovarian cancer to talc exposure. There are, however, some ways that this point can be dulled in the rough and tumble of trial. Lawyers, like journalists, and even some imprecise scientists, use a variety of terms such as “risk,” “risk factor,” “increased risk,” and “link,” for something less than causation. Sometimes these terms are used deliberately to try to pass off something less than causation as causation; sometimes the speaker is confused; and sometimes the speaker is simply being imprecise. It seems incumbent upon the defense to explain the differences between and among these terms, and to stick with a consistent, appropriate terminology.

One instance in which Dr. Huh took his eye off the “causation ball” arose when plaintiffs’ counsel showed him a study conclusion that talc use among African American women was statistically significantly associated with ovarian cancer. Huh answered, non-responsively, “I disagree with the concept that talc causes ovarian cancer.” The study, however, did not advance a causal conclusion, and there was no reason to suggest to the jury that he disagreed with anything in the paper; rather, it was an opportunity to repeat that association is not causation, and that the article did not contradict anything he had said.

Similarly, Dr. Huh was confronted with several precautionary recommendations that women “may” benefit from avoiding talc. Remarkably, Huh simply disagreed, rather than making the obvious point that the recommendation was not stated as something that would in fact benefit women.

When witnesses answer long, involved questions with a simple “yes,” they may have made every implied proposition in the questions into facts in the case. In an exchange between plaintiffs’ counsel and Huh, counsel asked whether a textbook listed talc as a risk factor.2 Huh struggled to disagree, which tended to impair his credibility; he was disagreeing with a textbook he acknowledged using and relying upon. Disagreement, however, was not necessary; the text merely stated that “talc … may increase risk.” If “increased risk” had been defined and explained as something substantially short of causation, then Huh could have answered simply, “yes, but that quotation does not support a causal claim.”

At another point, plaintiffs’ counsel, realizing that none of the individual studies reached a causal conclusion, asked whether it would be improper for a single study to give such a conclusion. It was a good question, with a solid premise, but Dr. Huh missed the opportunity to explain that the authors of the various individual studies had not conducted the sort of systematic review that could support the causal conclusion plaintiffs would need. Certainly, the authors of individual studies were not prohibited from taking the next step to advance a causal conclusion in a separate paper, with the appropriate analysis.

Bradford Hill’s Factors

Dr. Huh’s testimony provided the jury with some understanding of Sir Austin Bradford Hill’s nine factors, but Dr. Huh would have helped himself by acknowledging several important points. First, as Hill explained, the nine factors are invoked only after there is a clear-cut (valid) association beyond that which we care to attribute to chance. Second, establishing all nine factors is not necessary. Third, some of the nine factors are more important than others.

Study validity

In the epidemiology of talc and ovarian cancer, statistical power and significance are not the crucial issues; study validity is. It should have been the plaintiffs’ burden to rule out bias and confounding, as well as chance. Hours had passed in the defense examination of Dr. Huh before study validity was raised, and it was never comprehensively explained. Dr. Huh explained recall bias as a particular problem of case-control studies, which made up the bulk of the evidence upon which plaintiffs’ expert witnesses relied. A more sophisticated witness on epidemiology might well have explained that the selection of controls can be a serious problem, without obvious solutions, in case-control studies.

On cross-examination, plaintiffs’ counsel, citing Kenneth Rothman, asked whether misclassification bias always yields a lower risk ratio. Dr. Huh resisted with “not necessarily,” but failed to dig into whether the conditions for rejecting plaintiffs’ generalization (such as polychotomous exposure classification) obtained in the relevant cohort studies. More importantly, Huh missed the opportunity to point out that the most recent, most sophisticated cohort study reported a risk ratio below 1.0, which, on the plaintiffs’ theory about misclassification, would, if corrected, have been even lower than the published value. Again, a qualified epidemiologist would not have failed to make these points.

Dr. Huh never read the testimony of one of the plaintiffs’ expert witnesses on epidemiology, Graham Colditz, and offered no specific rebuttal of Colditz’s opinions. With respect to the other of plaintiffs’ epidemiology expert witnesses, Dr. Cramer, Huh criticized him for engaging in post-hoc secondary analyses and asserted that Cramer’s meta-analysis could not be validated. Huh never attempted to validate the meta-analysis himself; nor did Huh offer his own meta-analysis or explain why a meta-analysis of seriously biased studies would be unhelpful. These omissions substantially blunted Huh’s criticisms.

On the issue of study validity, Dr. Huh seemed to intimate that cohort studies were necessarily better than case-control studies, not only because of recall bias, but also because more women were involved in the cohort studies than in the case-control studies. The latter point, although arithmetically correct, is epidemiologically bogus. There are often fewer ovarian cancer cases in a cohort study, especially if the cohort is not followed for a very long time. The true test lies in the statistical precision of the point estimate, relative risk or odds ratio, in the different types of studies. The case-control studies often generate much more precise point estimates, as seen from their narrower confidence intervals. Of course, the real issue here is not precision, but accuracy. Still, Dr. Huh appeared to have endorsed defense counsel’s misleading argument about study size, a consideration that will not help the defense when the parties’ contentions are heard in scientific fora.

Statistical Significance

Huh appeared at times to stake out the position that if a study lacks statistical significance, then we must accept the null hypothesis. I believe that most careful scientists would reject this position; null studies simply fail to reject the null hypothesis.

Although there seems to be no end to fallacious reasoning by plaintiffs, there is a particular defense fallacy seen in some cases that turn on epidemiology. What if we had 10 studies, each of which found an elevated risk ratio of 1.5, with two-tailed 95 percent confidence intervals of 0.92–2.18, or so? Can the defense claim victory because no study is statistically significant? Huh seemed to suggest so, but this is clearly wrong. Of course, we might ask why no one conducted the 11th study, with sufficient power to detect a risk ratio of 1.5 at the desired level of significance. But parties go to trial with the evidence they have, not what they might want to have. On the above 10-study hypothetical, a meta-analysis might well be done (assuming the studies could appropriately be included), and the summary risk ratio for all studies would be 1.5, and highly statistically significant.
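The arithmetic of the hypothetical can be checked with a short fixed-effect (inverse-variance) meta-analysis sketch. The numbers below are the hypothetical’s, not the trial record’s, and the confidence interval is used only to approximate each study’s standard error:

```python
import math

# Hypothetical: ten identical studies, each reporting RR = 1.5 (95% CI 0.92-2.18).
rr, lo, hi, n_studies = 1.5, 0.92, 2.18, 10

log_rr = math.log(rr)
# Approximate each study's standard error from the CI width on the log scale.
se = (math.log(hi) - math.log(lo)) / (2 * 1.96)

# Inverse-variance pooling of ten identical studies shrinks the
# standard error by a factor of sqrt(10).
pooled_se = se / math.sqrt(n_studies)
z = log_rr / pooled_se
ci = (math.exp(log_rr - 1.96 * pooled_se),
      math.exp(log_rr + 1.96 * pooled_se))

print(round(z, 1))                          # z of about 5.8, far beyond 1.96
print(tuple(round(x, 2) for x in ci))       # pooled 95% CI excludes 1.0
```

Ten individually “null” studies thus yield a pooled risk ratio of 1.5 with a confidence interval of roughly 1.31 to 1.72, which is highly statistically significant.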

On the question of talc and ovarian cancer, there were several meta-analyses at issue, and so the statistical significance of individual studies was less relevant. The real issue was study validity. This issue was muddled by assertions that risk ratios such as 2.05 (95% CI, 0.94–4.47) were “chance findings.” Chance may not have been ruled out, but the defense can hardly assert that chance and chance alone produced the findings; otherwise, it will be sunk by the available meta-analyses.

Strength of Association

The risk ratios involved in most of the talc-ovarian cancer studies are small, and that is obviously an important factor to consider in evaluating the studies for causal conclusions. Still, it is also obvious that real causal associations can sometimes be small in magnitude. Dr. Huh could and should have conceded on direct examination that small associations can be causal, but explained that validity concerns become critical for studies showing small associations. Examples would have helped, such as the body of observational epidemiology that suggested that estrogen replacement therapy in post-menopausal women provided cardiovascular benefit, only to be reversed by higher-quality clinical trials. Similarly, observational studies suggested that lung cancer rates were reduced by Vitamin A intake, but again clinical trial data showed the opposite.

Consistency of Studies

Are studies that have statistically non-significant risk ratios above 1.0 inconsistent with studies that find statistically significant elevated risk ratios? At several points, Huh appeared to say that such a group of studies is inconsistent, but that is not necessarily so. Huh’s assertion provoked a good bit of harmful cross-examination, in which he seemed to resist the notion that meta-analysis could help answer whether a group of studies is statistically consistent. Huh could have conceded the point readily, but emphasized that a group of biased studies would give only a consistently biased estimate of association.
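Whether a group of studies is statistically consistent is itself a testable question. A minimal sketch, using Cochran’s Q statistic on three invented odds ratios (hypothetical numbers, not the talc studies), shows how a significant and two non-significant results can nonetheless be homogeneous:

```python
import math

# Three invented (odds ratio, CI lower, CI upper) results -- one "significant,"
# two not -- purely for illustration.
studies = [(1.3, 0.9, 1.9), (1.5, 1.1, 2.0), (1.2, 0.8, 1.8)]

# Work on the log scale; approximate standard errors from the CI widths.
logs = [math.log(orr) for orr, lo, hi in studies]
ses = [(math.log(hi) - math.log(lo)) / (2 * 1.96) for orr, lo, hi in studies]
weights = [1 / se ** 2 for se in ses]

pooled = sum(w * y for w, y in zip(weights, logs)) / sum(weights)

# Cochran's Q: weighted squared deviations from the pooled estimate.
q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, logs))

# Under homogeneity, Q follows a chi-square distribution with k - 1 = 2
# degrees of freedom; the 0.05 critical value for 2 df is about 5.99.
consistent = q < 5.99
print(round(q, 2), consistent)  # small Q: no evidence of inconsistency
```

Here Q is well under the critical value, so the mixture of significant and non-significant results gives no evidence of statistical inconsistency, which is precisely the point Huh could have conceded.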

Authority

One of the cheapest tricks in the trial lawyers’ briefcase is the “learned treatise” exception to the rule against hearsay.3 The lawyer sets up witnesses in deposition by obtaining their agreement that a particular author or text is “authoritative.” Then at trial, the lawyer confronts the witnesses with a snippet of text that appears to disagree with the expert witnesses’ testimony. Under the rule, in federal and in some state courts, the jury may accept the snippet or sound bite as true, and also accept that the witnesses do not know what they are talking about when they disagree with the “authoritative” text.

The rule is problematic and should have been retired long ago. Since 1663, the Royal Society has sported the motto: “Nullius in verba.” Disputes in science are resolved with data, from high-quality, reproducible experimental or observational studies, not with appeals to the prestige of the speaker. And yet, we lawyers will try, and sometimes succeed, with this greasy kid-stuff approach to cross-examination. Indeed, when there is an opportunity to use it, we may even have an obligation to use so-called learned treatises to advance our clients’ cause.

In the Slemp trial, the plaintiff’s counsel apparently had gotten a concession from Dr. Huh that plaintiff’s expert witness on epidemiology, Dr. Daniel Cramer, was “credible and authoritative.” Plaintiff’s counsel then used Huh’s disagreement with Cramer’s testimony as well as his published papers to undermine Huh’s credibility.

This attack on Huh was a self-inflicted wound. The proper response to a request for a concession that someone or some publication is “authoritative” is that this word really has no meaning in science. “Nullius in verba,” and all that. Sure, someone can be a respected researcher based upon past successes, but past performance is no guarantee of future success. Look at Linus Pauling and Vitamin C. The truth of a conclusion rests on the data and the soundness of the inferences drawn from them.

Collateral Attacks

The plaintiff’s lawyer in Slemp was particularly adept at another propaganda routine – attacking the witness on the stand for having cited another witness, whose credibility in turn was attacked by someone else, even if that someone else was a crackpot. Senator McCarthy (Joseph not Eugene) would have been proud of plaintiff’s lawyer’s use of the scurrilous attack on Paolo Boffetta for his views on EMF and cancer, as set out in Microwave News, a fringe publication that advances EMF-cancer claims. Now, the claim that non-ionizing radiation causes cancer has not met with much if any acceptance, and Boffetta’s criticisms of the claims are hardly unique or unsupported. Yet plaintiff’s counsel used this throw-away publication’s characterization of Boffetta as “the devil’s advocate,” to impugn Boffetta’s publications and opinions on EMF, as well as Huh’s opinions that relied upon some aspect of Boffetta’s work on talc. Not that “authority” counts, but Boffetta is the Associate Director for Population Sciences of the Tisch Cancer Institute and Chief of the Division of Cancer Prevention and Control of the Department of Oncological Sciences, at the Mt. Sinai School of Medicine in New York. He has published many epidemiologic studies, as well as a textbook on the epidemiology of cancer.4

The author of the Microwave News piece was never identified, but almost certainly lacks the training, experience, and expertise of Paolo Boffetta. The point, however, is that this cross-examination was extremely collateral, had nothing to do with Huh or the issues in the Slemp case, and warranted an objection and an admonition to plaintiff’s counsel for the scurrilous attack. An alert trial judge, who cared about substantial justice, might have shut down this frivolous, highly collateral attack sua sponte. When Huh was confronted with the “devil’s advocate” characterization, he responded “OK,” seemingly affirming the premise of the question.

Specific Causation

Dr. Huh and the talc defendants took the position that epidemiology never informs the assessment of individual causation. This opinion is hard to sustain. Elevated risk ratios reflect more cases than expected in a sample. Epidemiologic models are used to make individual predictions of risk for purposes of clinical monitoring and treatment. Population-based statistics are used to define the range of normal function, and to assess individuals as impaired or disabled, or not.

At one point in the cross-examination, plaintiffs’ counsel suggested the irrelevance of the size of the relative risk by asking whether Dr. Huh would agree that a 20% increased risk is not small if you are someone who has gotten the disease. Huh answered, “Well, if it is a real association.” This answer fails on several levels. First, it conflates “increased risk” and “real association” with causation. The point was for Huh to explain that an increased risk, if statistically significant, may be an association, but it is not necessarily causal.

Second, and equally important, Huh missed the opportunity to explain that even if the 20% increased risk were real and causal, it would still mean that an individual patient’s ovarian cancer was most likely not caused by the exposure. See David H. Schwartz, “The Importance of Attributable Risk in Toxic Tort Litigation” (July 5, 2017).
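The arithmetic behind that point is simple. A minimal sketch of the attributable fraction among the exposed, under the usual simplifying assumptions (a real, unbiased, unconfounded relative risk):

```python
def attributable_fraction(rr):
    """Probability that an exposed case is attributable to the exposure,
    assuming the relative risk reflects a real, unconfounded causal effect."""
    return (rr - 1) / rr

# A 20% increased risk (RR = 1.2):
af = attributable_fraction(1.2)
print(round(af, 3))  # 0.167 -- about one case in six

# The attributable fraction does not reach 50% until RR = 2.0:
print(attributable_fraction(2.0))  # 0.5
```

On these assumptions, a relative risk of 1.2 means that roughly five of every six exposed cases would have occurred anyway, and an individual plaintiff’s cancer is most likely not attributable to the exposure.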

Conclusion

The defense strategy of eliciting all of its scientific and medical testimony from a single witness was dangerous at best. As good a clinician as Dr. Huh appears to be, the strategy did not bode well when so many of the scientific issues lay outside his expertise.


2 Jonathan S. Berek & Neville F. Hacker, Gynecologic Oncology at 231 (6th ed. 2014).

3 See “Trust-Me Rules of Evidence” (Oct. 18, 2012).

4 See, e.g., Paolo Boffetta, Stefania Boccia & Carlo La Vecchia, A Quick Guide to Cancer Epidemiology (2014).