For your delectation and delight, desultory dicta on the law of delicts.

Mississippi High Court Takes the Bite Out of Forensic Evidence

November 3rd, 2017

The Supreme Court’s 1993 decision in Daubert changed the thrust of Federal Rule of Evidence 702, which governs the admissibility of expert witness opinion testimony in both civil and criminal cases. Before Daubert, lawyers who hoped to exclude opinions lacking in evidentiary and analytical support turned to the Frye decision on “general acceptance.” Frye, however, was an outdated rule that was rarely applied outside the context of devices. Furthermore, the meaning and application of Frye were unclear. Confusion reigned on whether expert witnesses could survive Frye challenges simply by adverting to their claimed use of a generally accepted science, such as epidemiology, even though their implementation of epidemiologic science was sloppy, incoherent, and invalid.

Daubert noted that Rule 702 should be interpreted in the light of the “liberal” goals of the Federal Rules of Evidence. Some observers rejoiced at the invocation of “liberal” values, but the history of the last 25 years has shown that they really yearned for libertine interpretations of the rules. Liberal, of course, never meant “anything goes.” It is unclear why “liberal” cannot mean restricting evidence not likely to advance the truth-finding function of trials.

Criminal versus Civil

Back on April 27, 2009, then President Barack Obama announced the formation of the President’s Council of Advisors on Science and Technology (PCAST). The mission of PCAST was to advise the President and his administration on science and technology, and their policy implications. Although the PCAST was a new council, presidents have had scientific advisors and advisory committees going back to Franklin Roosevelt, in 1933.

On September 20, 2016, PCAST issued an important report to President Obama, Report to the President on Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods. Few areas of forensic “science,” beyond DNA matching, escaped the Council’s withering criticism. Bite-mark evidence in particular received a thorough mastication.

The criticism was hardly new. Seven years earlier, the National Academies of Science issued an indictment that forensic scientists had largely failed to establish the validity of their techniques and conclusions, and that the judiciary had “been utterly ineffective in addressing this problem.”1

The response from Obama’s Department of Justice, led by Loretta Lynch, was underwhelming.2 The Trump response was equally disappointing.3 The Left and the Right appear to agree that science is dispensable when it becomes politically inconvenient. It is a commonplace in the community of evidence scholars that Rule 702 is not applied with the same enthusiasm in criminal cases, to the benefit of criminal defendants, as the rule is sometimes, if sporadically and inconsistently, applied in civil cases. The Daubert revolution has failed the criminal justice system perhaps because courts are unwilling to lift the veil on forensic evidence, for fear they may not like what they find.4

A Grudging Look at the Scientific Invalidity of Bite Mark Evidence

Sherwood Brown was convicted of a triple murder in large measure as a result of testimony from Dr. Michael West, a forensic odontologist. West, as well as another odontologist, opined that a cut on Brown’s wrist matched the shape of a victim’s mouth. DNA testing authorized after the conviction, however, rendered West’s opinions edentulous. Samples from inside the female victim’s mouth yielded male DNA, but not that of Mr. Brown.5

Did the PCAST report leave an impression upon the highest court of Mississippi? The Supreme Court of Mississippi vacated Brown’s conviction and remanded for a new trial, in an opinion that a bitemark expert might describe as reading like a bite into a lemon. Brown v. State, No. 2017 DR 00206 SCT, Slip op. (Miss. Sup. Ct. Oct. 26, 2017). The majority could not bring themselves to comment upon Dr. West’s toothless opinions. Three justices would have kicked the can down to the trial judge by voting to grant a new hearing without vacating Brown’s convictions. The decision seems mostly predicated on the strength of the DNA evidence, rather than the invalidity of the bite mark evidence. Mr. Brown will probably be vindicated, but bite mark evidence will continue to mislead juries, with judicial imprimatur.

1 National Research Council, Committee on Identifying the Needs of the Forensic Sciences Community, Strengthening Forensic Science in the United States: A Path Forward 53 (2009).

2 See Jordan Smith, “FBI and DoJ Vow to Continue Using Junk Science Rejected by White House Report,” The Intercept (Sept. 23, 2016); Radley Balko, “When Obama wouldn’t fight for science,” Wash. Post (Jan. 4, 2017).

3 See Radley Balko, “Jeff Sessions wants to keep forensics in the Dark Ages,” Wash. Post (April 11, 2017); Jessica Gabel Cino, “Session’s Assault on Forensic Science Will Lead to More Unsafe Convictions,” Newsweek (April 20, 2017).

4 See, e.g., Paul C. Giannelli, “Forensic Science: Daubert’s Failure,” Case Western Reserve L. Rev. (2017) (“in press”).

Disappearing Conflicts of Interest

October 29th, 2017

As the story of who funded the opposition research into Trumski and the Russian micturaters unfolds, both sides of the political spectrum seem obsessed with who funded the research. Funny thing that both sides had coins in the fountain. Funding is, in any event, an invalid proxy for good and sufficient reason. The public should be focused on the truth or falsity of the factual claims. The same goes in science, although more and more, science is evaluated by “conflicts of interest” (COIs) rather than by the strength of evidence and validity of inferences.

No one screams louder today about COIs than the lawsuit industry and its scientist fellow travelers. Although I believe we should rid ourselves of this obsession with COIs, to the extent we must put up with it, the obsession should at least be symmetrical, complete, and non-hypocritical.

In an in-press publication, Morris Greenberg has published an historical account of the role that the U.K. Medical Research Council had in studying asbestos health effects.1 Greenberg often weighs in on occupational disease issues in synch with the litigation industry, and so no one will be entirely surprised that Greenberg suspects undue industry influence (not the lawsuit industry, but an industry that actually makes things). Greenberg may be right in his historical narrative and analysis, but my point today is different. What was interesting about Greenberg’s paper was the disclosure at its conclusion, by the “American Journal of Industrial Medicine editor of record”:

“Steven B. Markowitz declares that he has no conflict of interest in the review and publication decision regarding this article.”

Markowitz’s declaration is remarkable in the era when the litigation industry and its scientific allies perpetually have their knickers knotted over perceived COIs. Well known to the asbestos bar, Markowitz has testified with some regularity for plaintiffs’ lawyers and their clients. Markowitz is also an editor in chief of the “red journal,” the American Journal of Industrial Medicine. Many of the associate editors are regular testifiers for the lawsuit industry, such as Arthur L. Frank and Richard A. Lemen.

Even more curious is that Steven Markowitz, along with fellow plaintiffs’ expert witness, Jacqueline M. Moline, recently published a case report about mesothelioma occurring in an unusual exposure situation, in the red journal. This paper appeared online in February 2017, and carried a disclosure that “[t]he authors have served as expert witnesses in cases involving asbestos tort litigation.”2 A bit misleading given how both appear virtually exclusively for claimants, but still a disclosure, whereas Markowitz, qua editor of Greenberg’s article, claimed to have none.

Markowitz, as an alumnus of the Mount Sinai School of Medicine, is, of course, a member of the secret handshake society of the litigation industry, the Collegium Ramazzini. At the Collegium, Markowitz proudly presents his labor union consultancies, but these union ties are not disclosed in Markowitz’s asbestos publications.

Previously, I blogged about Markowitz’s failure to make an appropriate COI disclosure in connection with an earlier asbestos paper.3 See “Conflicts of Interest in Asbestos Studies – the Plaintiffs’ Double Standard” (Sept. 18, 2013). At the time, there appeared to be no disclosure of litigation work, but I was encouraged to see, upon checking today, that Markowitz’s disclosure for his 2013 paper now reveals that he has received fees for expert testimony, from “various law firms.” A bit thin to leave out plaintiffs’ law firms, considering that the paper at issue is used regularly by Markowitz and other plaintiffs’ expert witnesses to advance their positions in asbestos cases. A more complete disclosure might read something like: “Markowitz has been paid to consult and testify in asbestos personal injury by plaintiffs’ legal counsel, and to consult for labor unions. In his testimony and consultations, he relies upon this paper and other evidence to support his opinions. This study has grown out of research that was originally funded by the asbestos workers’ union.”

Or we could just evaluate the study on its merits, or lack thereof.

1 Morris Greenberg, “Experimental asbestos studies in the UK: 1912-1950,” 60 Am. J. Indus. Med. XXX (2017) (doi: 10.1002/ajim.22762).

2 Steven B. Markowitz & Jacqueline M. Moline, “Malignant Mesothelioma Due to Asbestos Exposure in Dental Tape,” 60 Am. J. Indus. Med. 437 (2017).

Echeverria Talc Trial – Crossexamination on Alleged Expert Witness Misconduct

October 21st, 2017

In a post-trial end-zone victory dance in Echeverria v. Johnson & Johnson, plaintiffs’ lawyer Allen Smith proffered three explanations for the jury’s stunning $417 million verdict in his talc ovarian cancer case.1 One of the explanations asserted was Smith’s boast that he had adduced evidence that Johnson & Johnson’s expert witness on epidemiology, Douglas Weed, a former National Cancer Institute epidemiologist and physician, had been sanctioned in another, non-talc case in North Carolina, for lying under oath about whether he had notes to his expert report in that other case.2 Having now viewed Dr. Weed’s testimony3, through the Courtroom Video Network, I can evaluate Smith’s claim.

Weed’s allegedly perjurious testimony took place in Carter v. Fiber Composites LLC, 11 CVS 1355, N.C. Super. Ct., where he served as a party expert witness. In April 2014, Weed gave deposition testimony in the discovery phase of the Carter case. Although Dr. Weed had not been served personally with a lawful subpoena, defense counsel had agreed to accept a subpoena for their expert witness to appear and produce documents, as was the local custom. In deposition, plaintiffs’ counsel asked Dr. Weed to produce any notes he created in the process of researching and writing his expert witness report. Dr. Weed testified that he had no notes.

The parties disputed whether Dr. Weed had complied with a subpoena served upon defense counsel. The discovery dispute escalated and Dr. Weed obtained legal counsel, and submitted a sworn affidavit that denied the existence of notes. Plaintiffs’ counsel pressed on Dr. Weed’s understanding that he had no “notes.” In an Order, dated May 6, 2014, the trial court directed Dr. Weed to produce everything in his possession. In response to the order, Weed produced his calendar and a thumb drive with “small fragments of notes,” “inserts,” and “miscellaneous items.”

The North Carolina court did not take kindly to Dr. Weed’s confusion about whether his report “segments” and “inserts” were notes, or not. Dr. Weed viewed the segments and inserts to have been parts of his report, and later included within his report without any substantial change. The court concluded, however, that although Dr. Weed did not violate any court order, his assertion, in deposition, in an affidavit, and through legal counsel, was unreasonable, and directly related to his credibility in the Carter case. See Order Concerning Plaintiffs’ Motion for Sanctions Against Defendants and Non-Party Witness for Defendants (June 22, 2015) (Forrest D. Bridges, J.).

The upshot was that Dr. Weed and his counsel had provided false information to the court, on the court’s understanding of what had been requested in discovery. In the court’s view, Dr. Weed’s misunderstanding may have been understandable as a non-lawyer, but it was not reasonable for him to persist and have his counsel argue that there were no notes. The trial court specifically did not find that Dr. Weed had lied, as asserted by Allen Smith, but found that Weed’s conduct was undertaken intentionally or with reckless disregard of the truth, and that his testimony was an unacceptable violation of the oath to tell the whole truth. The trial court concluded that it could not sanction Dr. Weed personally, but its order specified that as a sanction, the plaintiffs’ counsel would be permitted to cross-examine Dr. Weed with the court’s findings and conclusions in the Carter case. Id. Not surprisingly, defense counsel withdrew Dr. Weed as an expert witness.

In the Echeverria case, the defense counsel did not object to the cross-examination; the video proceedings did not inform the viewers whether there had been a prior motion in limine concerning this examination. Allen Smith’s assertion about the North Carolina court’s findings was thus almost true. A cynic might say he too had not told the whole truth, but he did march Dr. Weed through Judge Bridges’ order of June 2015, which was displayed to the jury.

Douglas Weed handled the cross-examination about as well as possible. He explained on cross, and later on redirect, that he did not regard segments of his report, which were later incorporated into his report as served, to be notes. He pointed out that there was no information in the segments, which differed from the final report, or which was not included in the report. Smith’s cross-examination, however, had raised questions not so much about credibility (despite Judge Bridges’ findings), but about whether Dr. Weed was a “quibbler,” who would hide behind idiosyncratic understandings of important words such as “consistency.” Given how harmless the belatedly produced report fragments and segments were, we are left to wonder why Dr. Weed persisted in not volunteering them.

Smith’s confrontation of Dr. Weed with the order from the Carter case came at the conclusion of a generally unsuccessful cross-examination. Unlike the Slemp case, in which Smith appeared to be able to ask unfounded questions without restraint from the bench, in Echeverria, Smith drew repeated objections, which were frequently sustained. His response often was to ask almost the same question again, drawing the same objection and the same ruling. He sounded stymied and defeated.

Courtroom Video Network, of course, does not film the jurors, and so watching the streaming video of the trial offers no insights into how the jurors reacted in real time to Smith’s cross-examination. If Weed’s testimony was ignored or discredited by Smith’s cross-examination on the Carter order, then the Echeverria case cannot be considered a useful test of the plaintiffs’ causal claim. Dr. Weed had offered important testimony on methodological issues for conducting and interpreting studies, as well as inferring causation.

One of the peculiarities of the Slemp case was that the defense offered no epidemiologist in the face of two epidemiologists offered by the plaintiff. In Echeverria, the defense addressed this gap and went further to have its epidemiologist address the glaring problem of how any specific causal inference can be drawn from a risk ratio of 1.3. Dr. Weed explained attributable risk and probability of causation, and this testimony and many other important points went without cross-examination or contradiction. And yet, after finding general causation on a weak record, the jury somehow leaped over an insurmountable epistemic barrier on specific causation.
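The arithmetic behind that barrier can be sketched in a few lines. The sketch assumes, as a simplification that is itself contestable, that the probability of causation can be equated with the attributable fraction among the exposed, (RR - 1)/RR, and that the risk ratio is unbiased and applies uniformly to the individual plaintiff. The function name is illustrative; nothing here is drawn from the trial record beyond the risk ratio of 1.3.

```python
def attributable_fraction(risk_ratio: float) -> float:
    """Attributable fraction among the exposed, (RR - 1) / RR.

    Assumption: an unbiased, unconfounded risk ratio that applies
    uniformly to the individual; real cases rarely satisfy this.
    """
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    # A risk ratio at or below 1 attributes no excess cases to exposure.
    return max(0.0, (risk_ratio - 1.0) / risk_ratio)


# 1.3 is the risk ratio discussed above; 2.0 is the break-even point.
for rr in (1.3, 2.0):
    print(f"RR = {rr}: attributable fraction = {attributable_fraction(rr):.3f}")
```

On these assumptions, a risk ratio of 1.3 yields a fraction of roughly 23 percent, and only a risk ratio above 2.0 pushes the fraction past 50 percent.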

1 Amanda Bronstad, “New Evidence Seen as Key in LA Jury’s $417M Talc Verdict,” (Aug. 22, 2017).

3 The cross-examination at issue arose about one hour, nine minutes into Smith’s cross-examination, on Aug. 15, 2017.

Multiplicity in the Third Circuit

September 21st, 2017

In Karlo v. Pittsburgh Glass Works, LLC, C.A. No. 2:10-cv-01283 (W. D. Pa.), plaintiffs claimed that their employer’s reduction in force unlawfully targeted workers over 50 years of age. Plaintiffs lacked any evidence of employer animus against old folks, and thus attempted to make out a statistical disparate impact claim. The plaintiffs placed their chief reliance upon an expert witness, Michael A. Campion, to analyze a dataset of workers agreed to have been the subject of the R.I.F. For the last 30 years, Campion has been on the faculty at Purdue University. His academic training and graduate degrees are in industrial and organizational psychology. Campion has served as an editor of Personnel Psychology, and as a past president of the Society for Industrial and Organizational Psychology. Campion’s academic website page notes that he manages a small consulting firm, Campion Consulting Services1.

The defense sought to characterize Campion as not qualified to offer his statistical analysis2. Campion did, however, have some statistical training as part of his master’s level training in psychology, and his professional publications did occasionally involve statistical analyses. To be sure, Campion’s statistical acumen paled in comparison to that of the defense expert witness, James Rosenberger, a fellow and a former vice president of the American Statistical Association, as well as a full professor of statistics at Pennsylvania State University. The threshold for qualification, however, is low, and the defense’s attack on Campion’s qualifications failed to attract the court’s serious attention.

On the merits, the defense subjected Campion to a strong challenge on whether he had misused data. The defense’s expert witness, Prof. Rosenberger, filed a report that questioned Campion’s data handling and statistical analyses. The defense claimed that Campion had engaged in questionable data manipulation by including, in his RIF analysis, workers who had been terminated when their plant was transferred to another company, as well as workers who retired voluntarily.

Using simple z-score tests, Campion compared the ages of terminated and non-terminated employees in four subgroups, ages 40+, 45+, 50+, and 55+. He did not conduct an analysis of the 60+ subgroup on the claim that this group had too few members for the test to have sufficient power.3 Campion found a small z-score for the 40+ versus <40 age groups comparison (z = 1.51), which is not close to statistical significance at the 5% level. On the defense’s legal theory, this was the crucial comparison to be made under the Age Discrimination in Employment Act (ADEA). The plaintiffs, however, maintained that they could make out a case of disparate impact by showing age discrimination at age subgroups that started above the minimum specified by the ADEA. Although age is a continuous variable, Campion decided to compute z-scores on subgroups that were based upon five-year increments. For the 45+, 50+, and 55+ age subgroups, he found z-scores that ranged from 2.15 to 2.46, and he concluded that there was evidence of disparate impact in the higher age subgroups4. Karlo v. Pittsburgh Glass Works, LLC, C.A. No. 2:10-cv-01283, 2015 WL 4232600, at *11 (W.D. Pa. July 13, 2015) (McVerry, S.J.).
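For readers who want to see what those z-scores imply, here is a minimal sketch that converts them to p-values. It assumes a two-sided test against a standard normal distribution; the opinions do not say whether Campion framed his tests as one-sided or two-sided, so that choice is mine. The labels in the dictionary are descriptive only, and the z-scores are the ones reported above.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# z-scores as reported in the opinion; the 45+, 50+, and 55+ subgroups
# are represented only by the low and high ends of the reported range.
reported = {
    "40+ vs <40": 1.51,
    "older subgroups, low end": 2.15,
    "older subgroups, high end": 2.46,
}

for label, z in reported.items():
    p = two_sided_p(z)
    side = "below" if p < 0.05 else "above"
    print(f"{label}: z = {z:.2f}, two-sided p = {p:.4f} ({side} 0.05)")
```

On the two-sided assumption, the 40+ comparison sits well above the conventional 5% level, while both ends of the older-subgroup range fall below it, before any adjustment for multiple comparisons.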

The defense, and apparently the defense expert witnesses, branded Campion’s analysis as “data snooping,” which required correction for multiple comparisons. In the defense’s view, the multiple age subgroups required a Bonferroni correction that would have diminished the critical p-value for “significance” by a factor of four. The trial court agreed with the defense contention about data snooping and multiple comparisons, and excluded Campion’s opinion of disparate impact, which had been based upon finding statistically significant disparities in the 45+, 50+, and 55+ age subgroups. 2015 WL 4232600, at *13. The trial court noted that Campion, in finding significant disparities in terminations in the subgroups, but not in the 40+ versus <40 analysis:

“[did] not apply any of the generally accepted statistical procedures (i.e., the Bonferroni procedure) to correct his results for the likelihood of a false indication of significance. This sort of subgrouping ‘analysis’ is data-snooping, plain and simple.”

Id. After excluding Campion’s opinions under Rule 702, as well as other evidence in support of plaintiffs’ disparate impact claim, the trial court granted summary judgment on the discrimination claims. Karlo v. Pittsburgh Glass Works, LLC, No. 2:10–cv–1283, 2015 WL 5156913 (W. D. Pa. Sept. 2, 2015).

On plaintiffs’ appeal, the Third Circuit took the wind out of the attack on Campion by holding that the ADEA prohibits disparate impacts based upon age, and not merely upon membership in the protected class of workers who are at least 40 years old, so that a disparate impact need not fall on the 40-and-older group as a whole. Karlo v. Pittsburgh Glass Works, LLC, 849 F.3d 61, 66-68 (3d Cir. 2017). This holding took the legal significance out of the statistical insignificance of Campion’s comparison of 40+ versus <40 age-group termination rates. Campion’s subgroup analyses were back in play, but the Third Circuit still faced the question whether Campion’s conclusions, based upon unadjusted z-scores and p-values, offended Rule 702.

The Third Circuit noted that the district court had identified three grounds for excluding Campion’s statistical analyses:

(1) Dr. Campion used facts or data that were not reliable;

(2) he failed to use a statistical adjustment called the Bonferroni procedure; and

(3) his testimony lacks ‘‘fit’’ to the case because subgroup claims are not cognizable.

849 F.3d at 81. The first issue was raised by the defense’s claims of Campion’s sloppy data handling, and inclusion of voluntarily retired workers and workers who were terminated when their plant was turned over to another company. The Circuit did not address these data handling issues, which it left for the trial court on remand. Id. at 82. The third ground went out of the case with the appellate court’s resolution of the scope of the ADEA. The Circuit did, however, engage on the issue whether adjustment for multiple comparisons was required by Rule 702.

On the “data-snooping” issue, the Circuit concluded that the trial court had applied “an incorrectly rigorous standard for reliability.” Id. The Circuit acknowledged that

“[i]n theory, a researcher who searches for statistical significance in multiple attempts raises the probability of discovering it purely by chance, committing Type I error (i.e., finding a false positive).”

849 F.3d at 82. The defense expert witness contended that applying the Bonferroni adjustment, which would have reduced the critical significance probability level from 5% to 1%, would have rendered Campion’s analyses not statistically significant, and thus not probative of disparate impact. Given that plaintiffs’ cases were entirely statistical, the adjustment would have been fatal to their cases. Id. at 82.
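A short sketch shows why the adjustment would have been fatal. It assumes four two-sided tests and a standard normal reference distribution; the function names are mine. Dividing 5% by four gives 1.25%, so the 1% figure recited in the opinion may be a rounding, but the reported z-scores fall short under either threshold.

```python
from statistics import NormalDist


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests


def two_sided_critical_z(alpha: float) -> float:
    """Critical |z| for a two-sided test at level alpha."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


alpha_adj = bonferroni_alpha(0.05, 4)    # four age subgroups
z_crit = two_sided_critical_z(alpha_adj)
print(f"adjusted alpha = {alpha_adj:.4f}, critical |z| = {z_crit:.3f}")

# Low and high ends of the z-score range reported for the older subgroups.
for z in (2.15, 2.46):
    outcome = "clears" if z > z_crit else "falls short of"
    print(f"z = {z:.2f} {outcome} the adjusted threshold")
```

The adjusted critical value is about 2.50, just above the largest reported z-score of 2.46, which is why everything turned on whether the correction was required.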

At the trial level and on appeal, plaintiffs and Campion had objected to the data-snooping charge on the grounds that

(1) he had analyzed only four subgroups;

(2) virtually all subgroups were statistically significant;

(3) his methodology was “hypothesis driven” and involved logical increments in age to explore whether the strength of the evidence of age disparity in terminations continued in each, increasingly older subgroup;

(4) his method was analogous to replications with different samples; and

(5) his result was confirmed by a single, supplemental analysis.

Id. at 83. According to the plaintiffs, Campion’s approach was based upon the reality that age is a continuous, not a dichotomous variable, and he was exploring a single hypothesis. A.240-241; Brief of Appellants at 26. Campion’s explanations do mitigate somewhat the charge of “data snooping,” but they do not explain why Campion did not use a statistical analysis that treated age as a continuous variable, at the outset of his analysis. The single, supplemental analysis was never described or reported by the trial or appellate courts.

The Third Circuit concluded that the district court had applied a ‘‘merits standard of correctness,’’ which is higher than what Rule 702 requires. Specifically, the district court, having identified a potential methodological flaw, did not further evaluate whether Campion’s opinion relied upon good grounds. 849 F.3d at 83. The Circuit vacated the judgment below, and remanded the case to the district court for the opportunity to apply the correct standard.

The trial court’s acceptance that an adjustment was appropriate or required hardly seems a “merits standard.” The use of a proper adjustment for multiple comparisons is very much a methodological concern. If Campion could reach his conclusion only by way of an inappropriate methodology, then his conclusion surely would fail the requirements of Rule 702. The trial court did, however, appear to accept, without explicit evidence, that the failure to apply the Bonferroni correction made it impossible for Campion to present a sound scientific argument for his conclusion that there had been disparate impact. The trial court’s opinion also suggests that the Bonferroni correction itself, as opposed to some more appropriate correction, was required.

Unfortunately, the reported opinions do not provide the reader with a clear account of what the analyses would have shown on the correct data set, without improper inclusions and exclusions, and with appropriate statistical adjustments. Presumably, the parties are left to make their cases on remand.

Based upon citations to sources that described the Bonferroni adjustment as “good statistical practice,” but one that is ‘‘not widely or consistently adopted’’ in the behavioral and social sciences, the Third Circuit observed that in some cases, failure to adjust for multiple comparisons may “simply diminish the weight of an expert’s finding.”5 The observation is problematic given that Kumho Tire suggests that an expert witness must use “in the courtroom the same level of intellectual rigor that characterizes the practice of an expert in the relevant field.” Kumho Tire Co. v. Carmichael, 526 U.S. 137, 150 (1999). One implication is that courts are prisoners to prevalent scientific malpractice and abuse of statistical methodology. Another implication is that courts need to look more closely at the assumptions and predicates for various statistical tests and adjustments, such as the Bonferroni correction.

These worrisome implications are exacerbated by the appellate court’s insistence that the question whether a study’s result was properly calculated or interpreted “goes to the weight of the evidence, not to its admissibility.”6 Combined with citations to pre-Daubert statistics cases7, judicial comments such as these can appear to reflect a general disregard for the statutory requirements of Rules 702 and 703. Claims of statistical significance, in studies with multiple exposures and multiple outcomes, are frequently not adjusted for multiple comparisons, without notation, explanation, or justification. The consequence is that study results are often over-interpreted and over-sold. Methodological errors related to multiple testing or over-claiming statistical significance are commonplace in tort litigation over “health-effects” studies of birth defects, cancer, and other chronic diseases that require epidemiologic evidence8.

In Karlo, the claimed methodological error is beset by its own methodological problems. As the court noted, adjustments for multiple comparisons are not free from methodological controversy9. One noteworthy textbook10 labels the Bonferroni correction as an “awful response” to the problem of multiple comparisons. Aside from this strident criticism, there are alternative approaches to statistical adjustment for multiple comparisons. In the context of the Karlo case, the Bonferroni correction might well be awful because Campion’s four subgroups are hardly independent tests. Because each older subgroup is nested within the next younger one (the 55+ group is a subset of the 50+ group, and so on), the subgroup test results will be strongly correlated, which makes the Bonferroni correction, at its sharpest when tests are independent, needlessly conservative. On remand, the trial court in Karlo must still make its Rule 702 gatekeeping decision on whether Campion properly considered the role of multiple subgroups, and multiple analyses run on different models.
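A rough simulation can illustrate how tests on nested age subgroups behave when there is no age effect at all. Everything in the sketch is hypothetical: the workforce size, the age range, the termination rate, and the random seed are my own choices, not figures from the Karlo record, and the pooled two-proportion z-test is only a guess at the kind of test Campion ran.

```python
import math
import random
from statistics import NormalDist


def two_prop_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled two-proportion z statistic (older group minus younger group)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return 0.0 if se == 0 else (x1 / n1 - x2 / n2) / se


def simulate_fwer(n_sims=1500, n_workers=300, rate=0.25,
                  cutoffs=(40, 45, 50, 55), alpha=0.05, seed=1):
    """Estimate how often at least one subgroup test is 'significant'
    when terminations are independent of age (hypothetical workforce)."""
    rng = random.Random(seed)
    ages = [rng.randint(25, 64) for _ in range(n_workers)]
    # Nested subgroups: everyone in 55+ is also in 50+, 45+, and 40+.
    older = [[i for i, a in enumerate(ages) if a >= c] for c in cutoffs]
    z_raw = NormalDist().inv_cdf(1 - alpha / 2)
    z_bonf = NormalDist().inv_cdf(1 - alpha / (2 * len(cutoffs)))
    hits_raw = hits_bonf = 0
    for _ in range(n_sims):
        fired = [rng.random() < rate for _ in range(n_workers)]
        total = sum(fired)
        largest = 0.0
        for idx in older:
            x1, n1 = sum(fired[i] for i in idx), len(idx)
            z = two_prop_z(x1, n1, total - x1, n_workers - n1)
            largest = max(largest, abs(z))
        hits_raw += largest > z_raw
        hits_bonf += largest > z_bonf
    return hits_raw / n_sims, hits_bonf / n_sims


raw, bonf = simulate_fwer()
print(f"unadjusted familywise error ~ {raw:.3f}; Bonferroni ~ {bonf:.3f}")
```

If the four tests were independent, the unadjusted familywise error rate would be about 18.5 percent, 1 - 0.95^4. Because the nested subgroups overlap heavily, the simulated unadjusted rate should land somewhere between 5 percent and that figure, and the Bonferroni-adjusted rate should sit at or below the nominal 5 percent. The unadjusted tests inflate false positives, but a correction calibrated to four independent looks may overshoot.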

1 Although Campion describes his consulting business as small, he seems to turn up in quite a few employment discrimination cases. See, e.g., Chen-Oster v. Goldman, Sachs & Co., 10 Civ. 6950 (AT) (JCF) (S.D.N.Y. 2015); Brand v. Comcast Corp., Case No. 11 C 8471 (N.D. Ill. July 5, 2014); Powell v. Dallas Morning News L.P., 776 F. Supp. 2d 240, 247 (N.D. Tex. 2011) (excluding Campion’s opinions), aff’d, 486 F. App’x 469 (5th Cir. 2012).

2 See Defendant’s Motion to Bar Dr. Michael Campion’s Statistical Analysis, 2013 WL 11260556.

3 There was no mention of an effect size for the lower aged subgroups, or of a power calculation for the 60+ subgroup’s probability of showing a z-score greater than two. Similarly, there was no discussion or argument about why this subgroup could not have been evaluated with Fisher’s exact test. In deciding the appeal, the Third Circuit observed that “Dr. Rosenberger test[ed] a subgroup of sixty-and-older employees, which Dr. Campion did not include in his analysis because ‘[t]here are only 14 terminations, which means the statistical power to detect a significant effect is very low’. A.244–45.” Karlo v. Pittsburgh Glass Works, LLC, 849 F.3d 61, 82 n.15 (3d Cir. 2017).

4 In the trial court’s words, the z-score converts the difference in termination rates into standard deviations. Karlo v. Pittsburgh Glass Works, LLC, C.A. No. 2:10-cv-01283, 2015 WL 4232600, at *11 n.13 (W.D. Pa. July 13, 2015). According to the trial court, Campion gave a rather dubious explanation of the meaning of the z-score: “[w]hen the number of standard deviations is less than –2 (actually–1.96), there is a 95% probability that the difference in termination rates of the subgroups is not due to chance alone” Id. (internal citation omitted).

5 See 849 F.3d 61, 83 (3d Cir. 2017) (citing and quoting from Paetzold & Willborn § 6:7, at 308 n.2) (describing the Bonferroni adjustment as ‘‘good statistical practice,’’ but ‘‘not widely or consistently adopted’’ in the behavioral and social sciences); see also E.E.O.C. v. Autozone, Inc., No. 00-2923, 2006 WL 2524093, at *4 (W.D. Tenn. Aug. 29, 2006) (‘‘[T]he Court does not have a sufficient basis to find that … the non-utilization [of the Bonferroni adjustment] makes [the expert’s] results unreliable.’’). And of course, the Third Circuit invoked the Daubert chestnut: ‘‘Vigorous cross-examination, presentation of contrary evidence, and careful instruction on the burden of proof are the traditional and appropriate means of attacking shaky but admissible evidence.’’ Daubert, 509 U.S. 579, 596 (1993).

6 See 849 F.3d at 83 (citing Leonard v. Stemtech Internat’l Inc., 834 F.3d 376, 391 (3d Cir. 2016)).

7 See 849 F.3d 61, 83 (3d Cir. 2017), citing Bazemore v. Friday, 478 U.S. 385, 400 (1986) (‘‘Normally, failure to include variables will affect the analysis’ probativeness, not its admissibility.’’).

8 See Hans Zeisel & David Kaye, Prove It with Figures: Empirical Methods in Law and Litigation 93 & n.3 (1997) (criticizing the “notorious” case of Wells v. Ortho Pharmaceutical Corp., 788 F.2d 741 (11th Cir.), cert. denied, 479 U.S. 950 (1986), for its erroneous endorsement of conclusions based upon “statistically significant” studies that explored dozens of congenital malformation outcomes, without statistical adjustment). The authors do, however, give an encouraging example of an English trial judge who took multiplicity seriously. Reay v. British Nuclear Fuels (Q.B. Oct. 8, 1993) (published in The Independent, Nov. 22, 1993). In Reay, the trial court took seriously the multiplicity of hypotheses tested in the study relied upon by plaintiffs. Id. (“the fact that a number of hypotheses were considered in the study requires an increase in the P-value of the findings with consequent reduction in the confidence that can be placed in the study result … .”), quoted in Zeisel & Kaye at 93. Zeisel and Kaye emphasize that courts should not be overly impressed with claims of statistically significant findings, and should pay close attention to how expert witnesses developed their statistical models. Id. at 94.

9 See David B. Cohen, Michael G. Aamodt, and Eric M. Dunleavy, Technical Advisory Committee Report on Best Practices in Adverse Impact Analyses (Center for Corporate Equality 2010).

10 Kenneth J. Rothman, Sander Greenland, and Timothy L. Lash, Modern Epidemiology 273 (3d ed. 2008); see also Kenneth J. Rothman, “No Adjustments Are Needed for Multiple Comparisons,” 1 Epidemiology 43, 43 (1990).