For your delectation and delight, desultory dicta on the law of delicts.

Traditional, Frequentist Statistics Still Hegemonic

March 25th, 2017

The Defense Fallacy

In civil actions, defendants and their legal counsel sometimes argue that the absence of statistical significance across multiple studies requires a verdict of “no cause” for the defense. This argument is fallacious, as can be seen where there are many studies, say eight or nine, which all consistently find elevated risk ratios, but with p-values slightly higher than 5%. The probability that eight studies, free of bias, would consistently find an elevated risk ratio if there were no true association, regardless of the individual studies’ p-values, is itself very small. If the studies were amenable to meta-analysis, the summary estimate of the risk ratio would itself likely be highly statistically significant in this hypothetical.
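The arithmetic behind this hypothetical can be sketched briefly. What follows is a minimal illustration, not an analysis of any real studies: it assumes eight independent, unbiased studies, each reporting an invented risk ratio of 1.5 with a standard error chosen so that each two-sided p-value is about 0.07, and it pools them with a simple fixed-effect, inverse-variance summary. All of the numbers and names are assumptions made for illustration.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical inputs (assumptions, not data from any real study):
n_studies = 8
log_rr = math.log(1.5)   # each study reports a risk ratio of 1.5
se = log_rr / 1.8        # standard error chosen so that z = 1.8 in each study

# Each study alone falls short of the conventional 5% level.
p_each = two_sided_p(log_rr / se)

# Under the null hypothesis, with no bias, each study is taken to be equally
# likely to land above or below a risk ratio of 1.0, so the chance that all
# eight land above 1.0 is 0.5 ** 8.
p_all_elevated = 0.5 ** n_studies

# Fixed-effect (inverse-variance) pooling of the eight log risk ratios.
weights = [1 / se**2] * n_studies
pooled_log_rr = sum(w * log_rr for w in weights) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
p_pooled = two_sided_p(pooled_log_rr / pooled_se)

print(f"p-value of each study:      {p_each:.3f}")
print(f"P(all eight elevated | H0): {p_all_elevated:.4f}")
print(f"pooled RR: {math.exp(pooled_log_rr):.2f}, pooled p-value: {p_pooled:.1e}")
```

The pooling step is sensible only if the studies are comparable and free of bias, which is the premise of the hypothetical.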

The Plaintiffs’ Fallacy

The plaintiffs’ fallacy derives from instances, such as the hypothetical one above, in which statistical significance, taken as a property of individual studies, is lacking. Even though we can hypothesize such instances, plaintiffs fallaciously extrapolate from them to the conclusion that statistical significance, or any other measure of sampling estimate precision, is unnecessary to support a conclusion of causation.

In courtroom proceedings, epidemiologist Kenneth Rothman is frequently cited by plaintiffs as having shown or argued that statistical significance is unimportant. For instance, in the Zoloft multi-district birth defects litigation, plaintiffs argued in a motion for reconsideration of the exclusion of their epidemiologic witness that the trial court had failed to give appropriate weight to the Supreme Court’s decision in Matrixx Initiatives, Inc. v. Siracusano, 563 U.S. 27 (2011), as well as to the Third Circuit’s invocation of the so-called “Rothman” approach in a Bendectin birth defects case, DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941 (3d Cir. 1990). According to the plaintiffs’ argument, their excluded epidemiologic witness, Dr. Anick Bérard, had used this approach in arriving at her novel conclusion that sertraline causes virtually every kind of birth defect.

The Zoloft plaintiffs did not call Rothman as a witness; nor did they even present an expert witness to explain what Rothman’s arguments were. Instead, the plaintiffs’ counsel sneaked some references and vague conclusions into their cross-examinations of defense expert witnesses, and submitted snippets from Rothman’s textbook, Modern Epidemiology.

If plaintiffs had called Dr. Rothman to testify, he would have probably insisted that statistical significance is not a criterion for causation. Such insistence is not as helpful to plaintiffs in cases such as Zoloft birth defects cases as their lawyers might have thought or hoped. Consider for instance the cases in which causal inferences are arrived at without formal statistical analysis. These instances are often not relevant to mass tort litigation that involves prevalent exposure and a prevalent outcome.

Rothman also would have likely insisted that consideration of random variation and bias is essential to the assessment of causation, and that many apparently or nominally statistically significant associations do not and cannot support valid inferences of causation. Furthermore, he might have been given the opportunity to explain that his criticisms of significance testing are as much directed to the creation of false-positive as to false-negative results in observational epidemiology. In keeping with his publications, Rothman would have challenged strict significance testing with p-values as opposed to the use of sample statistical estimates in conjunction with confidence intervals. The irony of the Zoloft case and many other litigations was that the defense was not using significance testing in the way that Rothman had criticized; rather the plaintiffs were over-endorsing statistical significance that was nominal, plagued by multi-testing, and inconsistent.

Judge Rufe, who presided over the Zoloft MDL, pointed out that the Third Circuit in DeLuca had never affirmatively endorsed Professor Rothman’s “approach,” but had reversed and remanded the Bendectin case to the district court for a hearing under Rule 702:

“By directing such an overall evaluation, however, we do not mean to reject at this point Merrell Dow’s contention that a showing of a .05 level of statistical significance should be a threshold requirement for any statistical analysis concluding that Bendectin is a teratogen regardless of the presence of other indicia of reliability. That contention will need to be addressed on remand. The root issue it poses is what risk of what type of error the judicial system is willing to tolerate. This is not an easy issue to resolve and one possible resolution is a conclusion that the system should not tolerate any expert opinion rooted in statistical analysis where the results of the underlying studies are not significant at a .05 level.”

2015 WL 314149, at *4 (quoting from DeLuca, 911 F.2d at 955). And in DeLuca, after remand, the district court excluded the DeLuca plaintiffs’ expert witnesses, and granted summary judgment, based upon the dubious methods employed by plaintiffs’ expert witnesses (including the infamous Dr. Done, and Shanna Swan), in cherry-picking data, recalculating risk ratios in published studies, and ignoring bias and confounding in studies. On subsequent appeal, the Third Circuit affirmed the judgment for Merrell Dow. DeLuca v. Merrell Dow Pharms., Inc., 791 F. Supp. 1042 (D.N.J. 1992), aff’d, 6 F.3d 778 (3d Cir. 1993).

Judge Rufe similarly rebuffed the plaintiffs’ use of the Rothman approach, their reliance upon Matrixx, and their attempt to banish consideration of random error in the interpretation of epidemiologic studies. In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., MDL No. 2342; 12-md-2342, 2015 WL 314149 (E.D. Pa. Jan. 23, 2015) (Rufe, J.) (denying PSC’s motion for reconsideration). See “Zoloft MDL Relieves Matrixx Depression” (Feb. 4, 2015).

Some Statisticians’ Errors

Recently, Dr. Rothman and three other epidemiologists set out to track changes, from 1975 to 2014, in the use of various statistical methodologies. Andreas Stang, Markus Deckert, Charles Poole & Kenneth J. Rothman, “Statistical inference in abstracts of major medical and epidemiology journals 1975–2014: a systematic review,” 32 Eur. J. Epidem. 21 (2017) [cited below as Stang]. They made clear that their preferred methodological approach was to avoid the strictly dichotomous null hypothesis significance testing (NHST), which has evolved from Fisher’s significance testing (ST) and Neyman’s null hypothesis testing (NHT), in favor of the use of estimation with confidence intervals (CI). The authors conducted a meta-study, that is, a study of studies, to track the trends in use of NHST, ST, NHT and CI reporting in the major biomedical journals.

Unfortunately, the authors limited their data and analysis to abstracts, which makes their results very likely misleading and incomplete. Even when abstracts reported using so-called CI-only approaches, the authors may well have reasoned that point estimates with CIs that spanned no association were “non-significant.” Similarly, authors who found elevated risk ratios with very wide confidence intervals may well have properly acknowledged that their study did not provide credible evidence of an association. See W. Douglas Thompson, “Statistical criteria in the interpretation of epidemiologic data,” 77 Am. J. Public Health 191, 191 (1987) (discussing the over-interpretation of skimpy data).

Rothman and colleagues found that while a few epidemiologic journals had a rising prevalence of CI-only reports in abstracts, for many biomedical journals the NHST approach remained more common. Interestingly, at three of the major clinical medical journals, the Journal of the American Medical Association, the New England Journal of Medicine, and Lancet, the NHST has prevailed over the almost four decades of observation.

The clear implication of Rothman’s meta-study is that consideration of significance probability, whether or not treated as a dichotomous outcome, and whether or not treated as a p-value or a point estimate with a confidence interval, is absolutely critical to how biomedical research is conducted, analyzed, and reported. In Rothman’s words:

“Despite the many cautions, NHST remains one of the most prevalent statistical procedures in the biomedical literature.”

Stang at 22. See also David Chavalarias, Joshua David Wallach, Alvin Ho Ting & John P. A. Ioannidis, “Evolution of Reporting P Values in the Biomedical Literature, 1990-2015,” 315 J. Am. Med. Ass’n 1141 (2016) (noting the absence of the use of Bayes’ factors, among other techniques).

There is one aspect to the Stang article that is almost Trump-like in its citing to an inappropriate, unknowledgeable source and then treating its author as having meaningful knowledge of the subject. As part of their rhetorical goals, Stang and colleagues declare that:

“there are some indications that it has begun to create a movement away from strict adherence to NHT, if not to ST as well. For instance, in the Matrixx decision in 2011, the U.S. Supreme Court unanimously ruled that admissible evidence of causality does not have to be statistically significant [12].”

Stang at 22. Whence comes this claim? Footnote 12 takes us to what could well be fake news of a legal holding, an article by a statistician about a legal case:

Joseph L. Gastwirth, “Statistical considerations support the Supreme Court’s decision in Matrixx Initiatives v. Siracusano,” 52 Jurimetrics J. 155 (2012).

Citing a secondary source when the primary source is readily available, and is itself what is at issue, seems like poor scholarship. Professor Gastwirth is a statistician, not a lawyer, and his exegesis of the Supreme Court’s decision is wildly off target. As any first-year law student could discern, the Matrixx case could not have been about the admissibility of evidence because the case had been dismissed on the pleadings, and no evidence had ever been admitted or excluded. The only issue on appeal was the adequacy of the allegations, not the admissibility of evidence.

Although the Court managed to muddle its analysis by wandering off into dicta about causation, the holding of the case is that alleging causation was not required to plead a case of materiality for a securities fraud case. Having dispatched causality from the case, the Court had no serious business in setting the considerations for alleging in pleadings or proving at trial the elements of causation. Indeed, the Court made it clear that its frolic and detour into causation could not be taken seriously:

“We need not consider whether the expert testimony was properly admitted in those cases [cited earlier in the opinion], and we do not attempt to define here what constitutes reliable evidence of causation.”

Matrixx Initiatives, Inc. v. Siracusano, 563 U.S. 27, 131 S.Ct. 1309, 1319 (2011).

Neither the word “admissible” nor the word “admissibility” ever appears in the Court’s opinion, and the above quote explains that admissibility was not considered. Laughably, the Court went on to cite three cases as examples of supposed causation opinions in the absence of statistical significance. Two of the three were specific causation, differential etiology cases that involved known general causation. The third case involved a claim of birth defects from contraceptive jelly, when the plaintiffs’ expert witnesses actually relied upon statistically significant (but thoroughly flawed and invalid) associations.

When it comes to statistical testing, the legal world would be much improved if lawyers actually and carefully read statistics authors, and if statisticians and scientists actually read court opinions.

Washington Legal Foundation’s Paper on Statistical Significance in Rule 702 Proceedings

March 13th, 2017

The Washington Legal Foundation has released a Working Paper, No. 201, by Kirby Griffis, entitled “The Role of Statistical Significance in Daubert / Rule 702 Hearings,” in its Critical Legal Issues Working Paper Series (Mar. 2017) [cited below as Griffis]. I am a fan of many of the Foundation’s Working Papers (having written one some years ago), but this one gives me pause.

Griffis’s paper manages to avoid many of the common errors of lawyers writing about this topic, but adds little to the statistics chapter in the Reference Manual on Scientific Evidence (3d ed. 2011), and he propagates some new, unfortunate misunderstandings. On the positive side, Griffis studiously avoids the transposition fallacy in defining significance probability, and he notes that multiplicity from subgroups and multiple comparisons often undermines claims of statistical significance. Griffis gets both points right. These are woefully common errors, and they deserve the emphasis Griffis gives to them in this working paper.

On the negative side, however, Griffis falls into error on several points. Griffis helpfully narrates the Supreme Court’s evolution in Daubert and then in Joiner, but he fails to address the serious mischief and devolution introduced by the Court’s opinion in Matrixx Initiatives, Inc. v. Siracusano, 563 U.S. 27, 131 S.Ct. 1309 (2011). See Schachtman, “The Matrixx – A Comedy of Errors” (April 6, 2011); David Kaye, “Trapped in the Matrixx: The U.S. Supreme Court and the Need for Statistical Significance,” BNA Product Safety & Liability Reporter 1007 (Sept. 12, 2011). With respect to statistical practice, this Working Paper is at times wide of the mark.


Although avoiding the transposition fallacy, Griffis falls into another mistake in interpreting tests of significance; he states that a non-significant result tells us that an hypothesis is “perfectly consistent with mere chance”! Griffis at 9. This is, of course, wrong, or at least seriously misleading. A failure to reject the null hypothesis does not prove the null such that we can say that the “null results” in one study were perfectly consistent with chance. The test may have lacked power to detect an “effect size” of interest. Furthermore, tests of significance cannot rule out systematic bias or confounding, and that limitation alone ensures that Griffis’s interpretation is mistaken. A null result may have resulted from bias or confounding that obscured a measurable association.
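The point about power can be made concrete with a rough calculation. The sketch below assumes a two-sided z-test on the log risk ratio; the true risk ratio of 1.5 and the two standard errors are hypothetical values chosen only to show how a smaller study can easily miss a real effect.

```python
import math
from statistics import NormalDist

def power_two_sided(true_log_rr, se, alpha=0.05):
    """Approximate power of a two-sided z-test of log(RR) = 0."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = abs(true_log_rr) / se
    # Probability that the test statistic falls in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

# Hypothetical study: true risk ratio of 1.5, standard error of 0.30 on the log scale.
small_study = power_two_sided(math.log(1.5), se=0.30)
# Same hypothetical true effect, but a much smaller standard error.
large_study = power_two_sided(math.log(1.5), se=0.10)

print(f"power, small study: {small_study:.2f}")
print(f"power, large study: {large_study:.2f}")
```

On these assumed inputs, the smaller study would fail to reject the null more often than not even though the association is real, which is why a non-significant result cannot be read as showing that nothing but chance is at work.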

Griffis states that p-values are expressed as percentages “usually 95% or 99%, corresponding to 0.05 or 0.01,” but this states things backwards. The p-value that is pre-specified to be “significant” is a probability or percentage that is low; it is the coefficient of confidence used to construct a confidence interval that is the complement of the significance probability. Griffis at 10. An alpha, or pre-specified statistical significance level, of 5% thus corresponds to a coefficient of confidence of 95% (or 1.0 – 0.05).
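The relationship between the significance level and the coefficient of confidence can be shown in a few lines. This is a minimal sketch for a z-based (Wald) interval; the estimate of 0.40 and standard error of 0.18 are hypothetical values on the log scale, and the function name is mine.

```python
from statistics import NormalDist

def wald_interval_and_p(estimate, se, alpha=0.05):
    """Return (confidence coefficient, interval, two-sided p-value) for a z-based estimate."""
    std_normal = NormalDist()
    confidence = 1 - alpha                      # e.g., alpha of 0.05 gives 95%
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    interval = (estimate - z_crit * se, estimate + z_crit * se)
    p_value = 2 * (1 - std_normal.cdf(abs(estimate) / se))
    return confidence, interval, p_value

for alpha in (0.05, 0.01):
    conf, (lo, hi), p = wald_interval_and_p(estimate=0.40, se=0.18, alpha=alpha)
    excludes_null = lo > 0 or hi < 0
    print(f"alpha={alpha:.2f}  confidence={conf:.0%}  "
          f"interval=({lo:.2f}, {hi:.2f})  p={p:.3f}  "
          f"excludes null: {excludes_null}  p < alpha: {p < alpha}")
```

For this kind of interval, the interval excludes the null value exactly when the p-value falls below alpha, so the 95% and 99% figures describe the interval, and the 0.05 and 0.01 figures describe the test.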

The Mid-p Controversy

In discussing the emerging case law, Griffis rightly points to cases that chastise Dr. Nicholas Jewell for the many liberties he has taken in various litigations as an expert witness for the lawsuit industry. One instance cited by Griffis is the Lipitor diabetes litigation, where the MDL court suggested that Jewell switched improperly from a Fisher’s exact test to a mid-p test. Griffis at 18-19. Griffis seems to agree, but as I have explained elsewhere, Fisher’s exact test generates a one-tailed measure of significance probability, and the analyst is left to choose among several ways of calculating a two-tailed test. See “Lipitor Diabetes MDL’s Inexact Analysis of Fisher’s Exact Test” (April 21, 2016). The mid-p is one legitimate approach for asymmetric distributions, and is more favorable to the defense than passing off the one-tailed measure as the result of the test. The mere fact that a statistical software package does not automatically specify the mid-p for a Fisher’s exact analysis does not make invoking this measure into p-hacking or other misconduct. Doubling the attained significance probability of a particular Fisher’s exact test result is generally considered less accurate than a mid-p calculation, even though some software packages use doubling of the attained significance probability as a default. As much as we might dislike bailing Jewell out of Daubert limbo, on this one, limited point, he deserved a better hearing.
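The competing calculations are easy to set side by side. The sketch below works directly from the hypergeometric distribution for a 2x2 table with fixed margins; the table itself is hypothetical, and doubling the one-tailed mid-p is only one of several conventions for a two-tailed figure. It is not a reconstruction of the analysis at issue in the Lipitor litigation.

```python
from math import comb

def hypergeom_pmf(k, row1, col1, total):
    """P(X = k) for the top-left cell of a 2x2 table with fixed margins."""
    return comb(col1, k) * comb(total - col1, row1 - k) / comb(total, row1)

def fisher_one_sided(a, b, c, d):
    """Upper-tail exact p-value, its mid-p version, and the doubled values."""
    row1, col1, total = a + b, a + c, a + b + c + d
    k_max = min(row1, col1)
    upper_tail = sum(hypergeom_pmf(k, row1, col1, total) for k in range(a, k_max + 1))
    p_observed = hypergeom_pmf(a, row1, col1, total)
    mid_p = upper_tail - 0.5 * p_observed   # count only half of the observed table
    return {
        "one_sided_exact": upper_tail,
        "one_sided_mid_p": mid_p,
        "doubled_exact": min(1.0, 2 * upper_tail),
        "doubled_mid_p": min(1.0, 2 * mid_p),
    }

# Hypothetical table: 9 of 30 exposed and 3 of 30 unexposed develop the outcome.
result = fisher_one_sided(9, 21, 3, 27)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Whatever the table, the one-tailed exact figure is the smallest of the values that count the observed table in full, which is why reporting it as "the" result of the test is the least conservative choice.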


In recounting the Bendectin litigation, Griffis refers to the epidemiologic studies of birth defects and Bendectin as “experiments,” Griffis at 7, and then describes such studies as comparing “populations,” when he clearly meant “samples.” Griffis at 8.

Griffis conflates personal bias with bias as a scientific concept of systematic error in research, a confusion usually perpetuated by plaintiffs’ counsel. See Griffis at 9 (“Coins are not the only things that can be biased: scientists can be, too, as can their experimental subjects, their hypotheses, and their manipulations of the data.”). Of course, the term has multiple connotations, but too often an accusation of personal bias, such as conflict of interest, is used to avoid engaging with the merits of a study.

Relative Risks

Griffis correctly describes the measure known as “relative risk” as a determination of “the strength of a particular association.” Griffis at 10. The discussion then lapses into using a given relative risk as a measure of the likelihood that an individual with the exposure studied will develop the disease. Sometimes this general-to-specific inference is warranted, but without further analysis, it is impossible to tell whether Griffis lapsed from general to specific, deliberately or inadvertently, in describing the interpretation of relative risk.


Griffis is right in his chief contention that the proper planning, conduct, and interpretation of statistical tests is hugely important to judicial gatekeeping of some expert witness opinion testimony under Federal Rule of Evidence 702 (and under Rule 703, too). Judicial and lawyer aptitude in this area is low, and needs to be bolstered.

Statistical Analysis Requires an Expert Witness with Statistical Expertise

November 13th, 2016

Christina K. Connearney sued her employer, Main Line Hospitals, for age discrimination. Main Line charged Connearney with fabricating medical records, but Connearney replied that the charge was merely a pretext. Connearney v. Main Line Hospitals, Inc., Civ. Action No. 15-02730, 2016 WL 6569292 (E.D. Pa. Nov. 4, 2016) [cited as Connearney]. Connearney’s legal counsel engaged Christopher Wright, an expert witness on “human resources,” for a variety of opinions, most of which were not relevant to the action. Alas, for Ms. Connearney, the few relevant opinions proffered by Wright were unreliable. On a Rule 702 motion, Judge Pappert excluded Wright from testifying at trial.

Although not a statistician, Wright sought to offer his statistical analysis in support of the age discrimination claim. Connearney at *4. According to Judge Pappert’s opinion, Wright had taken just two classes in statistics, but perhaps His Honor meant two courses. (Wright Dep., at 10:3–4.) If the latter, then Wright had more statistical training than most physicians who are often permitted to give bogus statistical opinions in health effects litigation. In 2015, the Medical College Admission Test apparently started to include some very basic questions on statistical concepts. Some medical schools now require an undergraduate course in statistics. See Harvard Medical School Requirements for Admission (2016). Most medical schools, however, still do not require statistical training for their entering students. See Veritas Prep, “How to Select Undergraduate Premed Coursework” (Dec. 5, 2011); “Georgetown College Course Requirements for Medical School” (2016).

Regardless of formal training, or lack thereof, Christopher Wright demonstrated a profound ignorance of, and disregard for, statistical concepts. (Wright Dep., at 10:15–12:10; 28:6–14.) Wright was shown to be the wrong expert witness for the job by his inability to define statistical significance. When asked what he understood to be a “statistically significant sample,” Wright gave a meaningless, incoherent answer:

“I think it depends on the environment that you’re analyzing. If you look at things like political polls, you and I wouldn’t necessarily say that serving [sic] 1 percent of a population is a statistically significant sample, yet it is the methodology that’s used in the political polls. In the HR field, you tend to not limit yourself to statistical sampling because you then would miss outliers. So, most HR statistical work tends to be let’s look at the entire population of whatever it is we’re looking at and go from there.”

Connearney at *5 (Wright Dep., at 10:15–11:7). When questioned again, more specifically on the meaning of statistical significance, Wright demonstrated his complete ignorance of the subject:

“Q: And do you recall the testimony it’s generally around 85 to 90 employees at any given time, the ER [emergency room]?

A: I don’t recall that specific number, no.

Q: And four employees out of 85 or 90 is about what, 5 or 6 percent?

A: I’m agreeing with your math, yes.

Q: Is that a statistically significant sample?

A: In the HR [human resources] field it sure is, yes.

Q: Based on what?

A: Well, if one employee had been hit, physically struck, by their boss, that’s less than 5 percent. That’s statistically significant.”

Connearney at *5 n.5 (Wright Dep., at 28:6–14).

In support of his opinion about “disparate treatment,” Wright’s report contained nothing more than a naked comparison of two raw percentages and a causal conclusion, without any statistical analysis. Even for this simplistic comparison of rates, Wright failed to explain how he obtained the percentages in a way that permitted the parties and the trial court to understand his computation and his comparisons. Without a statistical analysis, the trial court concluded that Wright had failed to show that the disparity in termination rates among younger and older employees was not likely consistent with random chance. See also Moultrie v. Martin, 690 F.2d 1078 (4th Cir. 1982) (rejecting writ of habeas corpus when petitioner failed to support claim of grand jury race discrimination with anything other than the numbers of white and black grand jurors).
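The kind of analysis that was missing is not elaborate. The sketch below runs a standard two-proportion z-test on termination counts that are entirely hypothetical; the opinion does not supply the underlying numbers, and the function name is mine. With counts this small an exact test would be the better choice, but the z-test is enough to show what a comparison of two raw percentages leaves out.

```python
import math

def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Two-sided z-test for a difference between two independent proportions."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a, p_b, z, p_value

# Hypothetical counts (not from the Connearney record):
# 4 of 40 older employees and 2 of 50 younger employees terminated.
rate_old, rate_young, z, p_value = two_proportion_z_test(4, 40, 2, 50)
print(f"older: {rate_old:.1%}, younger: {rate_young:.1%}, z = {z:.2f}, p = {p_value:.2f}")
```

On these invented counts, the older group's termination rate is more than double the younger group's, and yet the difference is well within what random variation could produce.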

Although Wright gave the wrong definition of statistical significance, the trial court relied upon judges of the Third Circuit who also did not get the definition quite right. The trial court cited a 2010 case in the Circuit, which conflated substantive and statistical significance and then gave a questionable definition of statistical significance:

“The Supreme Court has not provided any definitive guidance about when statistical evidence is sufficiently substantial, but a leading treatise notes that ‘[t]he most widely used means of showing that an observed disparity in outcomes is sufficiently substantial to satisfy the plaintiff’s burden of proving adverse impact is to show that the disparity is sufficiently large that it is highly unlikely to have occurred at random.’ This is typically done by the use of tests of statistical significance, which determine the probability of the observed disparity obtaining by chance.”

See Connearney at *6 & n.7, citing and quoting from Stagi v. National RR Passenger Corp., 391 Fed. Appx. 133, 137 (3d Cir. 2010) (emphasis added) (internal citation omitted). Ultimately, however, this was all harmless error on the way to the right result.

Benhaim v. St-Germain – Supreme Court of Canada Wrestles With Probability

November 11th, 2016

On November 10, 2016, the Supreme Court of Canada handed down a divided (four-to-three) decision in a medical malpractice case, which involved statistical evidence, or rather probabilistic inference. Benhaim v. St-Germain, 2016 SCC 48 (Nov. 10, 2016). The case involved an appeal from a Quebec trial court, and the Quebec Court of Appeal, and some issues peculiar to Canadian lawyers. For one thing, Canadian law does not appear to follow the lost-chance doctrine outlined in the American Law Institute’s Restatement. The consequence seems to be that negligent omissions in the professional liability context are assessed for their causal effect by the Canadian “balance of probabilities” standard.

The facts were reasonably clear, although their interpretation was disputed. In November 2005, Mr. Émond was 44 years old, a lifelong non-smoker, and in good health. At his annual physical with general practitioner Dr. Albert Benhaim, Émond had a chest X-ray (CXR). Benhaim at 11, ¶6. Remarkably, neither the majority nor the dissent commented upon the lack of reasonable medical necessity for a CXR in a healthy, non-smoking 40-something male. Few insurers in the United States would have paid for such a procedure. Maybe Canadian healthcare is more expansive than what we see in the United States.

The radiologist reviewing Mr. Émond’s CXR reported a 1.5 to 2.0 cm solitary lesion, and suggested a review with previous CXRs and a recommendation for a CT scan of the thorax. Dr. Benhaim did not follow the radiologist’s suggestions, but Mr. Émond did have a repeat CXR two months later, on January 17, 2006, which was interpreted as unchanged. A recommendation for a follow-up third CXR in four months was not acted upon. Benhaim at 11, ¶7. The trial court found that the defendant physicians deviated from the professional standard of care, a finding from which there was no appeal.

Mr. Émond did have a follow-up CXR at the end of 2006, on December 4, 2006, which showed that the solitary lung nodule had grown. Follow-up CT and PET scans confirmed that Mr. Émond had Stage IV lung cancer. Id.

The issues in controversy turned on the staging of Mr. Émond’s lung cancer at the time of his first CXR, in November 2005, and the medical consequences of the delay in diagnosis. Plaintiffs presented expert witness opinion testimony that Mr. Émond’s lung cancer was only Stage I (or at most IIA), at initial radiographic discovery of a nodule, and that he was at Stage III or IV in December 2006, when CT and PET scans confirmed the actual diagnosis of lung cancer. In the view of plaintiffs’ expert witnesses, the delay in diagnosis, and the accompanying growth of the tumor and change from Stage I to IV, dramatically decreased Émond’s chance of survival. Id. at 13, ¶¶15-16. Indeed, plaintiffs’ expert witnesses opined that had Mr. Émond been timely diagnosed and treated in November 2005, he probably would have been cured.

The defense expert witness, Dr. Ferraro, testified that Mr. Émond’s lung cancer was Stage III or IV in November 2005, when the radiographic nodule was first seen, and his chances of survival at that time were already quite poor. According to Dr. Ferraro, earlier intervention and treatment would probably not have been successful in curing Mr. Émond, and the delay in diagnosis was not a cause of his death.

The trial court rejected plaintiffs’ expert witnesses’ opinions on factual grounds. These witnesses had argued that Mr. Émond’s lung cancer was at Stage I in November 2005 because the lung nodule was less than 3 cm., and because Mr. Émond was asymptomatic and in good health. These three points of contention were clearly unreliable because they were all present in January 2007, when Mr. Émond was diagnosed with Stage IV cancer, according to all the expert witnesses. Every point cited by plaintiffs’ expert witnesses in support of their staging failed to discriminate Stage I from Stage III. In Her Honor’s opinion, the lung cancer was probably Stage III in November 2005, and this staging implied a poor prognosis on all the expert witnesses’ opinions. The failure to diagnose until late 2006 was thus not, on the “balance of probabilities,” a cause of death. Id. at 15, ¶21.

The intermediate appellate court reversed on grounds of a presumption of causation, which comes into being when the defendant’s negligence interferes with plaintiff’s ability to show causation, and there is some independent evidence of causation to support the case. I will leave this presumption, which the Supreme Court of Canada held inappropriate on the facts of this case, to Canadian lawyers to debate. What was more interesting was the independent evidence adduced by plaintiffs. This evidence consisted of statistical evidence in the form of a generality that 78 percent of fortuitously discovered lung cancers are at Stage I, which in turn is associated with a cure rate of 70 percent. Id. at 18, ¶30.

The plaintiffs’ witnesses hoped to apply this generality to this case, notwithstanding that Émond’s nodule was close to 2 cm. on CXR, that the general statistic was based upon more sensitive CT studies, and that Émond had been a non-smoker (which may have influenced tumor growth and staging). Furthermore, there was an additional, ominous finding in Mr. Émond’s first CXR, of hilar prominence, which supported the defense’s differentiation of his case from the generality of fortuitously discovered lung cancers (presumably small, solitary lung nodules without hilar involvement). Id. at 44, ¶83.

The trial court rejected the inference from the group statistic of 70% survival to the conclusion that Mr. Émond had a 70% probability of survival. Tellingly, there was no discussion of the variance for the 70% figure; nor any mention of relevant subgroups. The Court of Appeal, however, would have turned this statistic into a binding presumption by virtue of accepting the 78 percent as providing strong evidence that the 70% survival figure pertained to Mr. Émond. The intermediate appellate court would then have taken the group survival rate as providing a more likely than not conclusion about Mr. Émond, while rejecting the defense expert witness’s statistics as mere speculation. Id. at 36, ¶67.

Adopting a skeptical stance with respect to probabilistic evidence, the Supreme Court reversed the Quebec Court of Appeal’s reversal of the trial court’s judgment. The Court cited Richard Wright and Jonathan Cohen’s criticisms of probabilistic evidence (and Cohen’s Gatecrasher’s Paradox), and urged caution in applying class or group statistics to generate probabilities that class members share the group characteristic.

“Appellate courts should generally not interfere with a trial judge’s decision not to draw an inference from a general statistic to a particular case. Statistics themselves are silent about whether the particular parties before the court would have conformed to the trend or been an exception from it. Without an evidentiary bridge to the specific circumstances of the plaintiff, statistical evidence is of little assistance. For this reason, such general trends are not determinative in particular cases. What inferences follow from such evidence — whether the generalization that a statistic represents is instantiated in the particular case — is a matter for the trier of fact. This determination must be made with reference to the whole of the evidence.”

Benhaim at 39, ¶¶74, 75 (internal citations omitted).

To some extent, the Supreme Court’s comments about statistical evidence were rather wide of the mark. The 78% statistic was based upon a high level of generality, namely all cases, without regard for the size of the radiographically discovered lesion, the manner of discovery (CXR versus CT), the presence or absence of hilar pathology, or the group’s or individual’s smoking status. In the context of the facts of the case, however, the trial court clearly had a factual basis for resisting the application of the group statistic (78% of fortuitously discovered tumors were Stage I with 70% five-year survival).
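A short calculation shows how fragile the inference from the group statistics was. The sketch below starts from the two figures discussed in the opinion and then applies Bayes’ rule in odds form with a purely hypothetical likelihood ratio for a case-specific finding such as hilar prominence. No such ratio was in evidence; it is an assumption made only to show how quickly individual facts can move the group number. The calculation also ignores any chance of cure at later stages.

```python
def posterior_probability(prior, likelihood_ratio):
    """Update a prior probability with a likelihood ratio (Bayes' rule in odds form)."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Figures reported in the opinion, as described above.
p_stage_one = 0.78        # share of fortuitously discovered cancers at Stage I
cure_if_stage_one = 0.70  # cure rate associated with Stage I

# Naive chain from the group statistics alone.
naive_cure = p_stage_one * cure_if_stage_one   # about 0.546

# Hypothetical assumption: the case-specific finding is five times as likely
# in later-stage disease as in Stage I (likelihood ratio of 0.2 for Stage I).
hypothetical_lr = 0.2
p_stage_one_updated = posterior_probability(p_stage_one, hypothetical_lr)
updated_cure = p_stage_one_updated * cure_if_stage_one

print(f"naive P(Stage I and cured):   {naive_cure:.3f}")
print(f"updated P(Stage I):           {p_stage_one_updated:.3f}")
print(f"updated P(Stage I and cured): {updated_cure:.3f}")
```

Even on the group figures alone, the chained probability only barely exceeds one half, and a single unfavorable case-specific finding of modest assumed strength pushes it below that line.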

The Canadian Supreme Court seems to have navigated these probabilistic waters fairly adeptly, although the majority opinion contains broad brush generalities and inaccuracies, which will, no doubt, show up in future lower court cases. For instance:

“This is because the law requires proof of causation only on a balance of probabilities, whereas scientific or medical experts often require a higher degree of certainty before drawing conclusions on causation (p. 330). Simply put, scientific causation and factual causation for legal purposes are two different things.”

Benhaim at 24, ¶47. The Court cited legal precedent for its observation, and not any scientific treatises. And then, the Supreme Court suggested that all one needs to prevail in a tort case in Canada is a medical expert witness who speculates:

“Trial judges are empowered to make legal determinations even where medical experts are not able to express an opinion with certainty.”

Benhaim at 37, ¶72. Clearly dictum on the facts of Benhaim, but it seems that judges in Canada are like those in the United States. Black robes empower them to do what mere scientists could not do. If we were to ignore the holding of Benhaim, we might think that all one needs in Canada is a medical expert who speculates.