TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Echeverria Talc Trial – Cross-Examination on Alleged Expert Witness Misconduct

October 21st, 2017

In a post-trial end-zone victory dance in Echeverria v. Johnson & Johnson, plaintiffs' lawyer Allen Smith proffered three explanations for the jury's stunning $417 million verdict in his talc ovarian cancer case.1 One of those explanations was Smith's boast that he had adduced evidence that Johnson & Johnson's expert witness on epidemiology, Douglas Weed, a physician and former National Cancer Institute epidemiologist, had been sanctioned in another, non-talc case in North Carolina, for lying under oath about whether he had notes for his expert report in that other case.2 Having now viewed Dr. Weed's testimony3 through the Courtroom Video Network, I can evaluate Smith's claim.

Weed's allegedly perjurious testimony took place in Carter v. Fiber Composites LLC, 11 CVS 1355, N.C. Super. Ct., where he served as a party expert witness. In April 2014, Weed gave deposition testimony in the discovery phase of the Carter case. Although Dr. Weed was never personally served with a subpoena, defense counsel had agreed, as was the local custom, to accept a subpoena for their expert witness to appear and produce documents. In deposition, plaintiffs' counsel asked Dr. Weed to produce any notes he created in the process of researching and writing his expert witness report. Dr. Weed testified that he had no notes.

The parties disputed whether Dr. Weed had complied with the subpoena served upon defense counsel. The discovery dispute escalated; Dr. Weed obtained legal counsel and submitted a sworn affidavit that denied the existence of notes. Plaintiffs' counsel pressed on Dr. Weed's understanding that he had no "notes." In an Order, dated May 6, 2014, the trial court directed Dr. Weed to produce everything in his possession. In response to the order, Weed produced his calendar and a thumb drive with "small fragments of notes," "inserts," and "miscellaneous items."

The North Carolina court did not take kindly to Dr. Weed's confusion about whether his report "segments" and "inserts" were notes. Dr. Weed viewed the segments and inserts as parts of his report, which were later incorporated into the report without any substantial change. The court concluded, however, that although Dr. Weed did not violate any court order, his assertion that no notes existed, made in deposition, in an affidavit, and through legal counsel, was unreasonable, and bore directly on his credibility in the Carter case. See Order Concerning Plaintiffs' Motion for Sanctions Against Defendants and Non-Party Witness for Defendants (June 22, 2015) (Forrest D. Bridges, J.).

The upshot was that Dr. Weed and his counsel had provided false information to the court, on the court's understanding of what had been requested in discovery. In the court's view, Dr. Weed's misunderstanding may have been understandable in a non-lawyer, but it was not reasonable for him to persist and to have his counsel argue that there were no notes. The trial court specifically did not find that Dr. Weed had lied, as asserted by Allen Smith, but found that Weed's conduct was undertaken intentionally or with reckless disregard of the truth, and that his testimony was an unacceptable violation of the oath to tell the whole truth. The trial court concluded that it could not sanction Dr. Weed personally, but its order specified that, as a sanction, the plaintiffs' counsel would be permitted to cross-examine Dr. Weed with the court's findings and conclusions in the Carter case. Id. Not surprisingly, defense counsel withdrew Dr. Weed as an expert witness.

In the Echeverria case, the defense did not object to the cross-examination; the video proceedings did not inform viewers whether there had been a prior motion in limine concerning this examination. Allen Smith's assertion about the North Carolina court's findings was thus almost true. A cynic might say that he, too, had not told the whole truth, but he did march Dr. Weed through Judge Bridges' order of June 2015, which was displayed to the jury.

Douglas Weed handled the cross-examination about as well as possible. He explained on cross, and later on redirect, that he did not regard segments of his report, later incorporated into the report as served, to be notes. He pointed out that the segments contained no information that differed from the final report, or that was omitted from it. Smith's cross-examination, however, had raised questions not so much about credibility (despite Judge Bridges' findings), but about whether Dr. Weed was a "quibbler," who would hide behind idiosyncratic understandings of important words such as "consistency." Given how harmless the belatedly produced report fragments and segments were, we are left to wonder why Dr. Weed persisted in not volunteering them.

Smith’s confrontation of Dr. Weed with the order from the Carter case came at the conclusion of a generally unsuccessful cross-examination. Unlike the Slemp case, in which Smith appeared to be able to ask unfounded questions without restraint from the bench, in Echeverria, Smith drew repeated objections, which were frequently sustained. His response often was to ask almost the same question again, drawing the same objection and the same ruling. He sounded stymied and defeated.

Courtroom Video Network, of course, does not film the jurors, and so watching the streaming video of the trial offers no insights into how the jurors reacted in real time to Smith's cross-examination. If Weed's testimony was ignored, or discredited by Smith's cross-examination on the Carter order, then the Echeverria case cannot be considered a useful test of the plaintiffs' causal claim. Dr. Weed had offered important testimony on methodological issues for conducting and interpreting studies, as well as for inferring causation.

One of the peculiarities of the Slemp case was that the defense offered no epidemiologist in the face of two epidemiologists offered by the plaintiff. In Echeverria, the defense addressed this gap and went further to have its epidemiologist address the glaring problem of how any specific causal inference can be drawn from a risk ratio of 1.3. Dr. Weed explained attributable risk and probability of causation, and this testimony and many other important points went without cross-examination or contradiction. And yet, after finding general causation on a weak record, the jury somehow leaped over an insurmountable epistemic barrier on specific causation.
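To put a number on that barrier: under the standard attributable-fraction approach (my gloss on the concept, not a transcription of Dr. Weed's testimony), the probability that a given case is attributable to an exposure follows directly from the relative risk:

```latex
% Probability of causation (PC) implied by a relative risk (RR),
% under the standard attributable-fraction model:
\[
\mathrm{PC} \;=\; \frac{\mathrm{RR}-1}{\mathrm{RR}}
\;=\; \frac{1.3 - 1}{1.3} \;\approx\; 0.23
\]
```

On this model, a risk ratio of 1.3 implies that only about 23% of exposed cases are attributable to the exposure, far short of the "more likely than not" threshold of 50%, which would require a relative risk greater than 2.0.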


1 Amanda Bronstad, “New Evidence Seen as Key in LA Jury’s $417M Talc Verdict,” Law.com (Aug. 22, 2017).

3 The cross-examination at issue arose about one hour, nine minutes into Smith’s cross-examination, on Aug. 15, 2017.

Statistical Gobbledygook Goes to the Supreme Court

October 20th, 2017

Back in July, my summer slumber was rudely interrupted by an intemperate, ad hominem rant from statistician Sander Greenland. Greenland's rant concerned my views of the Supreme Court's decision in Matrixx Initiatives v. Siracusano, 563 U.S. 27 (2011).

Greenland held forth, unfiltered, on Deborah Mayo’s web blog, where he wrote:

Glad to have finally flushed out Schachtman, whose blog did not allow my critical dissenting comments back when this case first hit. Nice to see him insult the intellect of the Court too, using standard legal obfuscation of the fact that the Court is entitled to consider science, ordinary logic, and common sense outside of that legal framework to form and justify its ruling – that reasoning is what composes the bulk of the opinion I linked. Go read it and see what you think without the smokescreen offered by Schachtman.”

"A megateam of reproducibility-minded scientists look to lowering the p-value," Error Statistics (July 25, 2017).

Oh my! It is true that my blog does not have comments enabled, but as I have written on several occasions, I would gladly welcome requests to post opposing views, even those of Sander Greenland. On Deborah Mayo’s blog, I had the opportunity to explain carefully why Greenland has been giving a naïve, mistaken characterization of the holding of Matrixx Initiatives, in his expert witness reports for plaintiffs’ counsel, as well as in his professional publications. Ultimately, Greenland ran out of epithets, lost his enthusiasm for the discussion, and slunk away into cyber-silence.

I was a bit jarred, however, by Greenland's accusation that I had insulted the Court. Certainly, I did not use any of the pejorative adjectives that Greenland had hurled at me; rather, I have simply given a legal analysis of the Court's opinions and a description of the legal, scientific, and statistical errors therein.1 And, to be sure, other knowledgeable writers and evidence scholars have critiqued the Court's decision and some of the pronouncements of the parties and the amici in Matrixx Initiatives.2

This week, John Pfaff, a professor at Fordham Law School, published an editorial in the New York Times, to argue that "The Supreme Court Justices Need Fact-Checkers," N.Y. Times (Oct. 18, 2017). No doubt, Greenland would consider Pfaff's editorial to be "insulting" to the Court, unless, of course, Greenland thinks criticism can be insulting only if it challenges views he wants to see articulated by the Court.

In support of his criticism of the Court, Pfaff adverted to the Chief Justice’s recent comments in the oral argument of a gerrymandering case, Gill v. Whitford. In a question critical of the gerrymander challenge, Chief Justice Roberts described the supporting evidence:

it may be simply my educational background, but I can only describe it as sociological gobbledygook."

Oral Argument before the U.S. Supreme Court at p.40, in Gill v. Whitford, No. 16-1161 (Oct. 3, 2017). The Chief Justice's dismissive comments about gobbledygook may well have been provoked by an amicus brief filed on behalf of 44 election law, scientific evidence, and empirical legal scholars, who explored the legal and statistical basis for striking down the Wisconsin gerrymander. See Brief of Amici Curiae of 44 Election Law, Scientific Evidence, and Empirical Legal Scholars, filed in Gill v. Whitford, No. 16-1161 (Sept. 1, 2017).

As with Greenland's obsequious respect for the Matrixx Initiatives opinion, no one is likely to have been misled by Chief Justice Roberts' false modesty. John Roberts was graduated summa cum laude from Harvard College in three years, although with a major in a "soft" discipline, history. He went on to Harvard Law School, where he was the managing editor of the Harvard Law Review, and was graduated magna cum laude. As a lawyer, Roberts has had an extraordinarily successful career. And yet, the Chief Justice went out of his way to disparage the mathematical and statistical models used to show gerrymandering in the Gill case as "gobbledygook." Odds are that the Chief Justice was thus not deprecating his own education; yet inquiring minds might wonder whether that education was deficient in mathematics, statistics, and science.

Policy is a major part of the Court's docket now, whether the Justices like it or not. The Justices cannot avoid adapting to the technical requirements of scientific and statistical issues, and they cannot simply dismiss evidence they do not understand as "gobbledygook." Referencing a recent ProPublica report, Professor Pfaff suggests that the Supreme Court might well employ independent advisors to fact-check its use of descriptive statistics.3

The problem identified by Pfaff, however, seems to implicate a fundamental divide between the "two cultures" of science and the humanities. See C.P. Snow, The Two Cultures and the Scientific Revolution (Rede Lecture 1959). Perhaps Professor Pfaff might start with his own educational institution. The Fordham University School of Law does not offer a course in statistics and probability; nor does it require entering students to have completed course work in mathematics, science, or statistics. The closest offerings at Fordham are a course on accounting for lawyers, and the opportunity to take a one-credit course in "quantitative methods" at the graduate school.

Fordham School of Law, of course, is hardly alone. Despite cries for “relevancy” and experiential learning in legal education, some law schools eschew courses in statistics and probability for legal applications, sometimes on the explicit acknowledgement that such courses are too “hard,” or provoke too much student anxiety. The result, as C.P. Snow saw over a half century ago, is that lawyers and judges cannot tell gobbledygook from important data analysis, even when it smacks them in the face.


1 With David Venderbush of Alston & Bird LLP, I published my initial views of the Matrixx case in the form of a Washington Legal Foundation Legal Backgrounder, available at the Foundation's website. See Schachtman & Venderbush, "Matrixx Unbounded: High Court's Ruling Needlessly Complicates Scientific Evidence Principles," 26 (14) Legal Backgrounder (June 17, 2011). I expanded on my critique in several blog posts. See, e.g., "Matrixx Unloaded" (Mar. 29, 2011); "The Matrixx Oversold" (Apr. 4, 2011); "The Matrixx – A Comedy of Errors" (Apr. 6, 2011); "De-Zincing the Matrixx" (Apr. 12, 2011); "Siracusano Dicta Infects Daubert Decisions" (Sept. 22, 2012).

2 See David Kaye, “The Transposition Fallacy in Matrixx Initiatives, Inc. v. Siracusano: Part I” (Aug. 19, 2011), and “The Transposition Fallacy in Matrixx Initiatives, Inc. v. Siracusano: Part II” (Aug. 26, 2011); David Kaye, “Trapped in the Matrixx: The U.S. Supreme Court and the Need for Statistical Significance,” BNA Product Safety & Liability Reporter 1007 (Sept. 12, 2011).

Love that Hormesis to Pieces

October 12th, 2017

Hermann Joseph Muller was an American biologist who won the 1946 Nobel Prize in Physiology or Medicine for his work on fruit fly genetics. In his Nobel Prize speech, Muller opined that there was no threshold dose for radiation-induced mutagenesis. Muller's speech became a locus of support for what later became known as the "linear no threshold" (LNT) theory of carcinogenesis.

Muller was an ardent eugenicist, although of the communist, not the Nazi, variety.1 After 1932, Muller's political enthusiasms took him to the Soviet Union, where Muller blithely ignored murderous purges and famines, in order to pursue his scientific interests for the greater glory of the Proletarian Dictatorship.2 Muller became enamored of a People's eugenics program. On May 5, 1936, Muller wrote to "Comrade Stalin," "[a]s a scientist with confidence in the ultimate Bolshevik triumph throughout all possible spheres of human endeavor," to offer the brutal dictator "a matter of vital importance arising out of my own science – biology, and, in particular, genetics."3

Comrade Stalin was underwhelmed by Muller's offer, and threw his lot in with Trofim Lysenko. A disheartened Muller managed to extricate himself from the Soviet fatherland, but not so much from its politics and ideology.4 After returning to the United States, he remained active in noteworthy liberal and progressive political activities. Alas, he also seemed to remain a Communist fellow traveler, who found time to criticize only the Soviet embrace of Lysenkoism and its treatment of dissident geneticists (such as himself), with nary a mention of Ukrainian farmers, political dissidents, or the Soviet subjugation of eastern and central Europe.5

In retreating from his Soviet homeland, Muller did not abandon his eugenic vision for the United States. In 1966, Muller urged the immediate establishment of sperm banks for "outstanding men," such as himself, to make deposits for use in artificial insemination.6

**********************************

Back in 1976, George E. P. Box outlined his notion that all models are wrong even though some may be useful.7 The LNT model, as devised by Muller and embraced by regulatory agencies around the world, has long since lost its usefulness in describing and predicting biological phenomena. LNT is scientific in the sense that it is testable and falsifiable; LNT has been tested and falsified. Muller's model ignores relevant biological processes of tolerance, defense, and adaptation.8
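In schematic terms (my simplification, not Box's or Muller's own notation), the competing dose-response models differ precisely in their low-dose behavior:

```latex
% Linear no-threshold (LNT): excess risk proportional to dose, down to zero dose.
\[
R_{\mathrm{LNT}}(d) \;=\; \beta d
\]
% Threshold model: no excess risk below some dose d_0.
\[
R_{\mathrm{threshold}}(d) \;=\;
\begin{cases}
  \beta\,(d - d_0) & d > d_0 \\[2pt]
  0 & d \le d_0
\end{cases}
\]
```

A hormetic (J-shaped) model goes a step further, allowing the excess risk to dip below zero at low doses, which is what the tolerance, defense, and adaptation processes noted above would predict.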

The resilience of the LNT seems due to the advocacy of scientists and regulators who find the simplistic LNT model useful in ensuring regulation of, and compensation for, low-dose exposures. The perpetual litigation machine created with asbestos comes to mind. Other "political scientists" come to mind as well. Theory and data are often in tension, but at the end of any debate, scientists are obligated to "save the phenomena." Fortunately, there are scientists who are challenging the dominance of the LNT model, and who are pointing out where the model just does not fit the data.9

In the United States, Muller's theories were subjected to some real-world tests. In May 1947, Muller warned of the possible evolution of evil monsters born to Japanese survivors of Hiroshima and Nagasaki, on the basis of his assessment that the atomic bombs had produced countless mutants. Later that year, however, Austin Brues, director of the Argonne National Laboratory, published his findings on children born to Hiroshima survivors, who had no more mutations than baseline expectation.10

Notwithstanding the shaky evidentiary foundations of Muller's views, his prestige as a Nobel laureate encouraged the adoption and promotion of the LNT model by the National Academy of Sciences' Biological Effects of Atomic Radiation (BEAR) I Genetics Panel. Edward J. Calabrese, a prominent toxicologist in the Department of Environmental Health Sciences, School of Public Health and Health Sciences, University of Massachusetts, has taken pains, on multiple occasions, to trace the genealogy of this error. His most recent, and most succinct, effort is a worthwhile read for policy makers, judges, and lawyers who want to understand the historical dimension of the LNT model.11 A fuller bibliography is set out as an appendix to this post.


 

1 Hermann Joseph Muller, Out of the Night: A Biologist's View of the Future (1935).

2 Elof Axel Carlson, Genes, Radiation, and Society: The Life and Work of H.J. Muller (1981).

3 John Glad, “Hermann J. Muller’s 1936 Letter to Stalin,” 43 The Mankind Quarterly 305 (2003).

4 See, e.g., Peter J. Kuznick, Beyond the Laboratory: Scientists as Political Activists in 1930’s America 121 (1987).

5 Hermann J. Muller, "The Crushing of Genetics in the USSR," 4 Bull. Atomic Scientists 369 (1948). Some have attempted to defend Muller's conduct by arguing that he testified before the House Un-American Activities Committee, where he was critical of Soviet restrictions on secondary education. See Thomas D. Clark, Indiana University: Midwestern Pioneer 310 (1977). Given Muller's privileged position to observe first hand what had happened to Ukrainian farmers and others, this coming forward on Soviet education seems feeble indeed.

6 See "Sperm Banks Urged by Nobel Laureate," N.Y. Times (Sept. 13, 1966).

7 See George E. P. Box, "Science and Statistics," 71 J. Am. Stat. Ass'n 791 (1976); George E. P. Box, "Robustness in the strategy of scientific model building," in R. L. Launer & G.N. Wilkinson, Robustness in Statistics at 201–236 (1979); George E. P. Box & Norman Draper, Empirical Model-Building and Response Surfaces at 74 (1987) ("Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful.").

8 See, e.g., Adam D. Thomas, Gareth J. S. Jenkins, Bernd Kaina, Owen G. Bodger, Karl-Heinz Tomaszowski, Paul D. Lewis, Shareen H. Doak, and George E. Johnson, “Influence of DNA Repair on Nonlinear Dose-Responses for Mutation,” 132 Toxicol. Sci. 87 (2013).

9 See, e.g., Bill Sacks & Jeffry A. Siegel, “Preserving the Anti-Scientific Linear No-Threshold Myth: Authority, Agnosticism, Transparency, and the Standard of Care,” 15 Dose-Response: An Internat’l J. 1 (2017); Charles L. Sanders, Radiobiology and Radiation Hormesis: New Evidence and its Implications for Medicine and Society (2017).

10 William Widder, "Probe Effects of Atom Bomb: Study Betrays No Evidence of Mutations," Greensburg Daily News (Greensburg, Indiana) at 22 (Nov. 24, 1947).

11 Edward J. Calabrese, "The Mistaken Birth and Adoption of the LNT: An Abridged Version," 15 Dose-Response: An Internat'l J. (2017).


Appendix

Edward J. Calabrese & Linda A. Baldwin, "Chemical hormesis: its historical foundations as a biological hypothesis," 19 Human & Experimental Toxicol. 2 (2000)

Edward J. Calabrese and Linda A. Baldwin, “Hormesis: U-shaped dose responses and their centrality in toxicology,” 22 Trends Pharmacol. Sci. 285 (2001)

Edward J. Calabrese, "Hormesis: a revolution in toxicology, risk assessment and medicine: Re-framing the dose–response relationship," 5 Eur. Mol. Bio. Org. Reports S37 (2004)

Edward J. Calabrese & Robyn Blain, "The occurrence of hormetic dose responses in the toxicological literature, the hormesis database: an overview," 202 Toxicol. & Applied Pharmacol. 289 (2005)

Edward J. Calabrese, “Pain and U-shaped dose responses: occurrence, mechanisms and clinical Implications,” 38 Crit. Rev. Toxicol. 579 (2008)

Edward J. Calabrese, “Neuroscience and hormesis: overview and general findings,” 38 Crit. Rev. Toxicol. 249 (2008)

Edward J. Calabrese, “Linear No Threshold (LNT) – The New Homeopathy,” 31 Envt’l Toxicol. & Chem. 2723 (2012)

Edward J. Calabrese, “Muller’s Nobel Prize Lecture: When Ideology Prevailed over Science,” 126 Toxicol. Sci. 1 (2012)

Edward J. Calabrese, "How the U.S. National Academy of Sciences misled the world community on cancer risk assessment: new findings challenge historical foundations of the linear dose response," 87 Arch. Toxicol. 2063 (2013)

Edward J. Calabrese, “On the origins of the linear no-threshold (LNT) dogma by means of untruths, artful dodges and blind faith,” 142 Envt’l Research 432 (2015)

Edward J. Calabrese, “An abuse of risk assessment: how regulatory agencies improperly adopted LNT for cancer risk assessment,” 89 Arch. Toxicol. 647 (2015)

Edward J. Calabrese, "LNTgate: How scientific misconduct by the U.S. NAS led to governments adopting LNT for cancer risk assessment," 148 Envt'l Research 535 (2016)

Edward J. Calabrese, “The threshold vs LNT showdown: Dose rate findings exposed flaws in the LNT model part 1. The Russell-Muller debate,” 154 Envt’l Res. 435 (2017)

Edward J. Calabrese, “The threshold vs LNT showdown: Dose rate findings exposed flaws in the LNT model part 2. How a mistake led BEIR I to adopt LNT,” 154 Envt’l Res. 452 (2017)

Multiplicity in the Third Circuit

September 21st, 2017

In Karlo v. Pittsburgh Glass Works, LLC, C.A. No. 2:10-cv-01283 (W. D. Pa.), plaintiffs claimed that their employer's reduction in force (RIF) unlawfully targeted workers over 50 years of age. Plaintiffs lacked any evidence of employer animus against old folks, and thus attempted to make out a statistical disparate impact claim. The plaintiffs placed their chief reliance upon an expert witness, Michael A. Campion, to analyze a dataset of workers agreed to have been the subject of the RIF. For the last 30 years, Campion has been on the faculty of Purdue University. His academic training and graduate degrees are in industrial and organizational psychology. Campion has served as an editor of Personnel Psychology, and as a past president of the Society for Industrial and Organizational Psychology. Campion's academic website notes that he manages a small consulting firm, Campion Consulting Services.1

The defense sought to characterize Campion as unqualified to offer his statistical analysis.2 Campion did, however, have some statistical training as part of his master's level training in psychology, and his professional publications occasionally involved statistical analyses. To be sure, Campion's statistical acumen paled in comparison with that of the defense expert witness, James Rosenberger, a fellow and former vice president of the American Statistical Association, as well as a full professor of statistics at Pennsylvania State University. The threshold for qualification, however, is low, and the defense's attack on Campion's qualifications failed to attract the court's serious attention.

On the merits, the defense subjected Campion to a strong challenge on whether he had misused data. The defense’s expert witness, Prof. Rosenberger, filed a report that questioned Campion’s data handling and statistical analyses. The defense claimed that Campion had engaged in questionable data manipulation by including, in his RIF analysis, workers who had been terminated when their plant was transferred to another company, as well as workers who retired voluntarily.

Using simple z-score tests, Campion compared the ages of terminated and non-terminated employees in four subgroups: ages 40+, 45+, 50+, and 55+. He did not conduct an analysis of the 60+ subgroup, on the claim that this group had too few members for the test to have sufficient power.3 Campion found a small z-score for the comparison of the 40+ versus <40 age groups (z = 1.51), which is not close to statistical significance at the 5% level. On the defense's legal theory, this was the crucial comparison to be made under the Age Discrimination in Employment Act (ADEA). The plaintiffs, however, maintained that they could make out a case of disparate impact by showing age discrimination in age subgroups that started above the minimum specified by the ADEA. Although age is a continuous variable, Campion decided to conduct z-score tests on subgroups based upon five-year increments. For the 45+, 50+, and 55+ age subgroups, he found z-scores that ranged from 2.15 to 2.46, and he concluded that there was evidence of disparate impact in the higher age subgroups.4 Karlo v. Pittsburgh Glass Works, LLC, C.A. No. 2:10-cv-01283, 2015 WL 4232600, at *11 (W.D. Pa. July 13, 2015) (McVerry, S.J.).
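For readers unfamiliar with the test, here is a minimal sketch of the sort of two-proportion z-score comparison described; the counts are hypothetical, because the underlying Karlo data are not set out in the public opinions:

```python
import math

def two_proportion_z(term_a, n_a, term_b, n_b):
    """Two-proportion z-score comparing the termination rates of two groups."""
    p1, p2 = term_a / n_a, term_b / n_b
    # Pooled termination rate under the null hypothesis of no age disparity.
    p = (term_a + term_b) / (n_a + n_b)
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return (p1 - p2) / se

# Hypothetical 50+ versus under-50 comparison (not the actual Karlo counts):
z = two_proportion_z(term_a=25, n_a=100, term_b=30, n_b=250)
print(f"z = {z:.2f}")  # |z| > 1.96 corresponds to p < 0.05, two-sided
```

Each of Campion's subgroup analyses appears to have been a comparison of this general form, with the subgroup cutoff moved up in five-year increments.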

The defense, and apparently the defense expert witnesses, branded Campion’s analysis as “data snooping,” which required correction for multiple comparisons. In the defense’s view, the multiple age subgroups required a Bonferroni correction that would have diminished the critical p-value for “significance” by a factor of four. The trial court agreed with the defense contention about data snooping and multiple comparisons, and excluded Campion’s opinion of disparate impact, which had been based upon finding statistically significant disparities in the 45+, 50+, and 55+ age subgroups. 2015 WL 4232600, at *13. The trial court noted that Campion, in finding significant disparities in terminations in the subgroups, but not in the 40+ versus <40 analysis:

[did] not apply any of the generally accepted statistical procedures (i.e., the Bonferroni procedure) to correct his results for the likelihood of a false indication of significance. This sort of subgrouping ‘analysis’ is data-snooping, plain and simple.”

Id. After excluding Campion’s opinions under Rule 702, as well as other evidence in support of plaintiffs’ disparate impact claim, the trial court granted summary judgment on the discrimination claims. Karlo v. Pittsburgh Glass Works, LLC, No. 2:10–cv–1283, 2015 WL 5156913 (W. D. Pa. Sept. 2, 2015).

On plaintiffs' appeal, the Third Circuit took the wind out of the attack on Campion by holding that the ADEA prohibits disparate impacts based upon age generally, and that a claim need not rest upon a comparison of workers at least 40 years old with those younger. Karlo v. Pittsburgh Glass Works, LLC, 849 F.3d 61, 66-68 (3d Cir. 2017). This holding took the legal significance out of the statistical insignificance of Campion's comparison of 40+ versus <40 age-group termination rates. Campion's subgroup analyses were back in play, but the Third Circuit still faced the question whether Campion's conclusions, based upon unadjusted z-scores and p-values, offended Rule 702.

The Third Circuit noted that the district court had identified three grounds for excluding Campion’s statistical analyses:

(1) Dr. Campion used facts or data that were not reliable;

(2) he failed to use a statistical adjustment called the Bonferroni procedure; and

(3) his testimony lacks "fit" to the case because subgroup claims are not cognizable.

849 F.3d at 81. The first issue was raised by the defense's claims about Campion's sloppy data handling, and his inclusion of voluntarily retired workers and workers who were terminated when their plant was turned over to another company. The Circuit did not address these data handling issues, which it left for the trial court on remand. Id. at 82. The third ground went out of the case with the appellate court's resolution of the scope of the ADEA. The Circuit did, however, engage the issue whether adjustment for multiple comparisons was required by Rule 702.

On the “data-snooping” issue, the Circuit concluded that the trial court had applied “an incorrectly rigorous standard for reliability.” Id. The Circuit acknowledged that

[i]n theory, a researcher who searches for statistical significance in multiple attempts raises the probability of discovering it purely by chance, committing Type I error (i.e., finding a false positive).”

849 F.3d at 82. The defense expert witness contended that applying the Bonferroni adjustment, which would have reduced the critical significance probability level from 5% to roughly 1%, would have rendered Campion's analyses not statistically significant, and thus not probative of disparate impact. Given that plaintiffs' cases were entirely statistical, the adjustment would have been fatal to their cases. Id. at 82.
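Mechanically, the Bonferroni procedure simply divides the nominal significance level by the number of tests performed. A short sketch, assuming four tests for Campion's four subgroups and using the z-scores the courts reported:

```python
# Bonferroni correction: divide the nominal alpha by the number of tests.
from scipy.stats import norm

alpha = 0.05
n_tests = 4                    # the four subgroups: 40+, 45+, 50+, 55+
alpha_adj = alpha / n_tests    # 0.0125, roughly the 1% figure in the opinions

# The z-scores reported in the opinions: 1.51 (40+ versus <40), and the
# range of 2.15 to 2.46 for the 45+, 50+, and 55+ subgroups.
for z in (1.51, 2.15, 2.46):
    p = 2 * (1 - norm.cdf(z))  # two-sided p-value
    print(f"z = {z:.2f}  p = {p:.4f}  survives Bonferroni? {p < alpha_adj}")
```

On these numbers, even the largest reported z-score (2.46, two-sided p ≈ 0.014) fails the adjusted threshold, which is precisely the defense's point.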

At the trial level and on appeal, plaintiffs and Campion objected to the data-snooping charge on the grounds that

(1) he had tested only four subgroups;

(2) virtually all subgroups were statistically significant;

(3) his methodology was “hypothesis driven” and involved logical increments in age to explore whether the strength of the evidence of age disparity in terminations continued in each, increasingly older subgroup;

(4) his method was analogous to replications with different samples; and

(5) his result was confirmed by a single, supplemental analysis.

Id. at 83. According to the plaintiffs, Campion's approach was based upon the reality that age is a continuous, not a dichotomous, variable, and he was exploring a single hypothesis. A.240-241; Brief of Appellants at 26. Campion's explanations do mitigate somewhat the charge of "data snooping," but they do not explain why Campion did not use, at the outset, a statistical analysis that treated age as a continuous variable. The single, supplemental analysis was never described or reported by the trial or appellate courts.

The Third Circuit concluded that the district court had applied a "merits standard of correctness," which is higher than what Rule 702 requires. Specifically, the district court, having identified a potential methodological flaw, did not further evaluate whether Campion's opinion relied upon good grounds. 849 F.3d at 83. The Circuit vacated the judgment below, and remanded the case to the district court for the opportunity to apply the correct standard.

The trial court's acceptance that an adjustment was appropriate or required hardly seems a "merits standard." The use of a proper adjustment for multiple comparisons is very much a methodological concern. If Campion could reach his conclusion only by way of an inappropriate methodology, then his conclusion surely would fail the requirements of Rule 702. The trial court did, however, appear to accept, without explicit evidence, that the failure to apply the Bonferroni correction made it impossible for Campion to present a sound scientific argument for his conclusion that there had been disparate impact. The trial court's opinion also suggests that the Bonferroni correction itself, as opposed to some more appropriate correction, was required.

Unfortunately, the reported opinions do not provide the reader with a clear account of what the analyses would have shown on the correct data set, without improper inclusions and exclusions, and with appropriate statistical adjustments. Presumably, the parties are left to make their cases on remand.

Based upon citations to sources that described the Bonferroni adjustment as "good statistical practice," but one that is "not widely or consistently adopted" in the behavioral and social sciences, the Third Circuit observed that in some cases, failure to adjust for multiple comparisons may "simply diminish the weight of an expert's finding."5 The observation is problematic given that Kumho Tire suggests that an expert witness must use "in the courtroom the same level of intellectual rigor that characterizes the practice of an expert in the relevant field." Kumho Tire Co. v. Carmichael, 526 U.S. 137, 150 (1999). One implication is that courts are prisoners to prevalent scientific malpractice and abuse of statistical methodology. Another implication is that courts need to look more closely at the assumptions and predicates for various statistical tests and adjustments, such as the Bonferroni correction.

These worrisome implications are exacerbated by the appellate court's insistence that the question whether a study's result was properly calculated or interpreted "goes to the weight of the evidence, not to its admissibility."6 Combined with citations to pre-Daubert statistics cases,7 judicial comments such as these can suggest a general disregard for the statutory requirements of Rules 702 and 703. Claims of statistical significance, in studies with multiple exposures and multiple outcomes, are frequently not adjusted for multiple comparisons, without notation, explanation, or justification. The consequence is that study results are often over-interpreted and over-sold. Methodological errors related to multiple testing or over-claiming statistical significance are commonplace in tort litigation over "health-effects" studies of birth defects, cancer, and other chronic diseases that require epidemiologic evidence.8

In Karlo, the claimed methodological error is beset by its own methodological problems. As the court noted, adjustments for multiple comparisons are not free from methodological controversy.9 One noteworthy textbook10 labels the Bonferroni correction an "awful response" to the problem of multiple comparisons. Aside from this strident criticism, there are alternative approaches to statistical adjustment for multiple comparisons. In the context of the Karlo case, the Bonferroni correction might well be awful because Campion's four subgroups are hardly independent tests. Because each subgroup is nested within the next higher age subgroup, the subgroup test results will be strongly correlated, in a way that defeats the mathematical assumptions of the Bonferroni correction. On remand, the trial court in Karlo must still make its Rule 702 gatekeeping decision on the methodological appropriateness of Campion's analyses, including whether he properly considered the role of multiple subgroups and multiple analyses run on different models.
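A toy simulation (entirely my own illustration, with invented inputs) makes the point: even when terminations are age-neutral, the z-scores for nested subgroups move together, so the subgroup tests are far from the independent trials for which the Bonferroni divisor is calibrated:

```python
import numpy as np

rng = np.random.default_rng(0)
n_workers, n_sims = 400, 2000
ages = rng.integers(25, 65, size=n_workers)  # hypothetical workforce

def z_score(terminated, in_group):
    # Two-proportion z-score: termination rate in subgroup vs. everyone else.
    p1 = terminated[in_group].mean()
    p2 = terminated[~in_group].mean()
    p = terminated.mean()
    se = np.sqrt(p * (1 - p) * (1 / in_group.sum() + 1 / (~in_group).sum()))
    return (p1 - p2) / se

zs = np.empty((n_sims, 2))
for i in range(n_sims):
    terminated = rng.random(n_workers) < 0.15  # age-neutral terminations
    zs[i] = [z_score(terminated, ages >= 45), z_score(terminated, ages >= 50)]

# The 45+ and 50+ tests share most of their members, so their z-scores are
# strongly positively correlated -- not independent, as Bonferroni presumes.
print(np.corrcoef(zs.T)[0, 1])
```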


1 Although Campion describes his consulting business as small, he seems to turn up in quite a few employment discrimination cases. See, e.g., Chen-Oster v. Goldman, Sachs & Co., 10 Civ. 6950 (AT) (JCF) (S.D.N.Y. 2015); Brand v. Comcast Corp., Case No. 11 C 8471 (N.D. Ill. July 5, 2014); Powell v. Dallas Morning News L.P., 776 F. Supp. 2d 240, 247 (N.D. Tex. 2011) (excluding Campion’s opinions), aff’d, 486 F. App’x 469 (5th Cir. 2012).

2 See Defendant’s Motion to Bar Dr. Michael Campion’s Statistical Analysis, 2013 WL 11260556.

3 There was no mention of an effect size for the lower age subgroups, nor of a power calculation for the 60+ subgroup's probability of showing a z-score greater than two. Similarly, there was no discussion or argument about why this subgroup could not have been evaluated with Fisher's exact test. In deciding the appeal, the Third Circuit observed that "Dr. Rosenberger test[ed] a subgroup of sixty-and-older employees, which Dr. Campion did not include in his analysis because '[t]here are only 14 terminations, which means the statistical power to detect a significant effect is very low'. A.244–45." Karlo v. Pittsburgh Glass Works, LLC, 849 F.3d 61, 82 n.15 (3d Cir. 2017).

4 In the trial court's words, the z-score converts the difference in termination rates into standard deviations. Karlo v. Pittsburgh Glass Works, LLC, C.A. No. 2:10-cv-01283, 2015 WL 4232600, at *11 n.13 (W.D. Pa. July 13, 2015). According to the trial court, Campion gave a rather dubious explanation of the meaning of the z-score: "[w]hen the number of standard deviations is less than –2 (actually –1.96), there is a 95% probability that the difference in termination rates of the subgroups is not due to chance alone." Id. (internal citation omitted).

5 See 849 F.3d 61, 83 (3d Cir. 2017) (citing and quoting from Paetzold & Willborn § 6:7, at 308 n.2) (describing the Bonferroni adjustment as "good statistical practice," but "not widely or consistently adopted" in the behavioral and social sciences); see also E.E.O.C. v. Autozone, Inc., No. 00-2923, 2006 WL 2524093, at *4 (W.D. Tenn. Aug. 29, 2006) ("[T]he Court does not have a sufficient basis to find that … the non-utilization [of the Bonferroni adjustment] makes [the expert's] results unreliable."). And of course, the Third Circuit invoked the Daubert chestnut: "Vigorous cross-examination, presentation of contrary evidence, and careful instruction on the burden of proof are the traditional and appropriate means of attacking shaky but admissible evidence." Daubert, 509 U.S. 579, 596 (1993).

6 See 849 F.3d at 83 (citing Leonard v. Stemtech Internat'l Inc., 834 F.3d 376, 391 (3d Cir. 2016)).

7 See 849 F.3d 61, 83 (3d Cir. 2017), citing Bazemore v. Friday, 478 U.S. 385, 400 (1986) (‘‘Normally, failure to include variables will affect the analysis’ probativeness, not its admissibility.’’).

8 See Hans Zeisel & David Kaye, Prove It with Figures: Empirical Methods in Law and Litigation 93 & n.3 (1997) (criticizing the "notorious" case of Wells v. Ortho Pharmaceutical Corp., 788 F.2d 741 (11th Cir.), cert. denied, 479 U.S. 950 (1986), for its erroneous endorsement of conclusions based upon "statistically significant" studies that explored dozens of congenital malformation outcomes, without statistical adjustment). The authors do, however, give an encouraging example of an English trial judge who took seriously the multiplicity of hypotheses tested in the study relied upon by plaintiffs. Reay v. British Nuclear Fuels (Q.B. Oct. 8, 1993) (published in The Independent, Nov. 22, 1993) ("the fact that a number of hypotheses were considered in the study requires an increase in the P-value of the findings with consequent reduction in the confidence that can be placed in the study result … ."), quoted in Zeisel & Kaye at 93. Zeisel and Kaye emphasize that courts should not be overly impressed with claims of statistically significant findings, and should pay close attention to how expert witnesses developed their statistical models. Id. at 94.

9 See David B. Cohen, Michael G. Aamodt, and Eric M. Dunleavy, Technical Advisory Committee Report on Best Practices in Adverse Impact Analyses (Center for Corporate Equality 2010).

10 Kenneth J. Rothman, Sander Greenland, and Timothy L. Lash, Modern Epidemiology 273 (3d ed. 2008); see also Kenneth J. Rothman, "No Adjustments Are Needed for Multiple Comparisons," 1 Epidemiology 43, 43 (1990).

The opinions, statements, and asseverations expressed on Tortini are my own, or those of invited guests, and these writings do not necessarily represent the views of clients, friends, or family, even when supported by good and sufficient reason.