Goodman v Viljoen – Statistical Fallacies from Both Sides

There was a deep irony to the Goodman[1] case. If a drug company had marketed antenatal corticosteroids (ACS) in 1995 for the prevention of cerebral palsy (CP) in the United States, the government might well have prosecuted the company for misbranding, and the company might have faced a False Claims Act case as well. No clinical trial had found ACS efficacious for the prevention of CP at the significance level typically required by the FDA; no meta-analysis had found ACS statistically significantly better than placebo for this purpose. In the Goodman case, however, the failure to order a full course of ACS was malpractice claimed to have caused the Goodman twins' CP.

The Goodman case also occasioned a well-worn debate over the difference between scientific and legal evidence, inference, and standards of “proof.” The plaintiffs' case rested upon a Cochrane review of ACS with respect to various outcomes. For the CP outcome, the Cochrane reviewers meta-analyzed clinical trial data only, and reported:

“a trend towards fewer children having cerebral palsy (RR 0.60, 95% CI 0.34 to 1.03, five studies, 904 children, age at follow up two to six years in four studies, and unknown in one study).”[2]

The defendant, Dr. Viljoen, appeared to argue that the Cochrane meta-analysis must be disregarded because it did not show efficacy of ACS in preventing CP at a significance probability of less than 5 percent. Here is the trial court's characterization of Dr. Viljoen's argument:

“[192] The argument that the Cochrane data concerning the effects of ACS on CP must be ignored because it fails to reach statistical significance rests on the flawed premise that legal causation requires the same standard of proof as medical/scientific causation. This is of course not the case; the two standards are in fact quite different. The law is clear that scientific certainty is not required to prove causation to the legal standard of proof on a balance of probabilities (See: Snell v. Farrell, [1990] 2 S.C.R. 311, at para. 34). Accordingly, the defendant’s argument in this regard must fail and for the purposes of this court, I accept the finding of the Cochrane analysis that ACS reduces the instance [sic] of CP by 40%.”

“Disregard” seems extreme for a meta-analysis that showed a 40% reduction in the risk of a serious central nervous system disorder, with p = 0.065. Perhaps Dr. Viljoen might have tempered his challenge somewhat by arguing that the Cochrane analysis was insufficient, rather than something to be ignored altogether. One problem with Dr. Viljoen's strident argument about statistical significance was that it overshadowed the more difficult, qualitative arguments about threats to validity in the Cochrane finding from loss to follow-up in the aggregated trial data. Those threats were probably the stronger arguments against accepting the Cochrane “trend” as a causal conclusion. Indeed, neither the validity of the individual studies and of the meta-analysis nor questions about the accuracy of the underlying data were reflected in the Bayesian analysis discussed below.
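
For readers who want to see where the 6.5% figure comes from, a rough reconstruction from the published numbers suffices. The sketch below is not the Cochrane reviewers' own computation; it merely assumes the conventional normal approximation on the log risk-ratio scale and recovers an approximate two-sided p-value from the reported risk ratio of 0.60 and its 95% confidence interval of 0.34 to 1.03:

```python
# A rough reconstruction (not the Cochrane reviewers' own computation) of the
# reported p-value from the published RR and 95% CI, assuming the usual
# normal approximation on the log risk-ratio scale.
import math
from statistics import NormalDist

rr, lo, hi = 0.60, 0.34, 1.03                      # RR 0.60, 95% CI 0.34 to 1.03

# Standard error of log(RR), recovered from the width of the 95% CI.
se = (math.log(hi) - math.log(lo)) / (2 * 1.96)

# Two-sided p-value: the cumulative probability of data at least this extreme,
# computed on the assumption that the null hypothesis (RR = 1) is true.
z = math.log(rr) / se
p = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"SE(log RR) ~ {se:.3f}, z ~ {z:.2f}, two-sided p ~ {p:.3f}")
# Rounding in the published estimate and interval leaves this near, rather
# than exactly at, the reported p = 0.065.
```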

Another problem was that Dr. Viljoen's strident assertion that p < 0.05 was absolutely necessary fed the plaintiffs' argument that the defendant was attempting to raise their burden of proof from greater than 50% to 95% or more. Given the defendant's position, great care was required to keep the trial court from committing the transposition fallacy.

Justice Walters rejected the suggestion that a meta-analysis with a p-value of 6.5% should be disregarded, but the court's discussion skirted the question whether, and how, the Cochrane data could be sufficient to support a conclusion of ACS efficacy. Aside from citing a legal case, Justice Walters provided no basis for the suggestion that the scientific standard of proof differs from the legal standard. From the trial court's opinion, the parties or their expert witnesses appear to have conflated “confidence,” a technical term when used to describe intervals that quantify random error around sample statistics, with the “level of certainty” in the obtained result.

Justice Walters is certainly not the first judge to fall prey to the fallacious argument that the scientific burden of proof is 95%.[3] The 95% figure is, of course, the coefficient of confidence for a confidence interval that corresponds to a significance level of 5%. No other explanation for why 95% would be a “scientific” standard of proof was offered in Goodman; nor is it likely that anyone could point to an authoritative source for the claim that scientists actually adjudge facts and theories by this 95 percent probability level.
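
The relationship can be made concrete with the same reconstructed figures (an assumption drawn from the published interval, not from anything in the record): the 95% confidence interval for the Cochrane risk ratio straddles 1.0 precisely because the two-sided p-value exceeds 5%, and the confidence coefficient at which the interval would just touch 1.0 is one minus the p-value, roughly 93 to 94 percent here. That duality is a statement about intervals and tests, not about any scientific standard of proof:

```python
# A minimal sketch (same normal approximation and reconstructed SE as above,
# both assumptions) of the duality between confidence intervals and tests:
# the 95% CI includes RR = 1 exactly when the two-sided p exceeds 0.05, and
# the interval just touches 1.0 at the (1 - p) confidence level.
import math
from statistics import NormalDist

log_rr, se = math.log(0.60), 0.28                  # reconstructed point estimate and SE

def ci(conf):
    z = NormalDist().inv_cdf(0.5 + conf / 2)       # e.g., 1.96 when conf = 0.95
    return (math.exp(log_rr - z * se), math.exp(log_rr + z * se))

p = 2 * (1 - NormalDist().cdf(abs(log_rr) / se))
print(f"two-sided p ~ {p:.3f}")
print("95% CI:", tuple(round(x, 2) for x in ci(0.95)))           # straddles 1.0
print(f"{1 - p:.1%} CI:", tuple(round(x, 2) for x in ci(1 - p))) # upper bound ~ 1.0
```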

Justice Walters' confusion was fed by the transposition fallacy, which conflates posterior probabilities with significance probabilities. Here is a sampling from Her Honor's opinion, first from Dr. Jon Barrett, one of the plaintiffs' expert witnesses, an obstetrician and maternal-fetal medicine specialist at Sunnybrook Hospital in Toronto, Ontario:

“[85] Dr. Barrett’s opinion was not undermined during his lengthy cross-examination. He acknowledged that the scientific standard demands 95% certainty. He is, however, prepared to accept a lower degree of certainty. To him, 85 % is not merely a chance outcome.

                                                                                        * * *

[87] He acknowledged that scientific evidence in support of the use of corticosteroids has never shown statistical significance with respect to CP. However, he explained it is very close at 93.5%. He cautioned that if you use a black and white outlook and ignore the obvious trends, you will falsely come to the conclusion that there is no effect.”

Dr. Jon (Yoseph) Barrett is a well-respected physician who specializes in high-risk pregnancies, but his characterization of a black-and-white outlook on significance testing as leading to a false conclusion of no effect was statistically doubtful.[4] Dr. Barrett may have to make divinely inspired choices in surgery, but in a courtroom, expert witnesses are permitted to say that they just do not know. Failure to achieve statistical significance, with p < 0.05, does not by itself support a conclusion that there is no effect; it supports only a failure to reject the null hypothesis.

Professor Andrew Willan was plaintiffs’ testifying expert witness on statistics.  Here is how Justice Walters summarized Willan’s testimony:

“[125] Dr. Willan described different statistical approaches and in particular, the frequentist or classical approach and the Bayesian approach which differ in their respective definitions of probability. Simply, the classical approach allows you to test the hypothesis that there is no difference between the treatment and a placebo. Assuming that there is no difference, allows one to make statements about the probability that the results are not due to chance alone.

To reach statistical significance, a standard of 95% is required. A new treatment will not be adopted into practice unless there is less than a 5% chance that the results are due to chance alone (rather than due to true treatment effect).

[127] * * * The P value represents the frequentist term of probability. For the CP analysis [from the Cochrane meta-analysis], the P value is 0.065. From a statistical perspective, that means that there is a 6.5% chance that the differences that are being observed between the treatment arm versus the non-treatment arm are due to chance rather than the treatment, or conversely, a 93.5% chance that they are not.”

Justice Walters did not provide transcript references for these statements, but they are clear examples of the transposition fallacy. The court’s summary may have been unfair to Professor Willan, who seems to have taken care to avoid the transposition fallacy in his testimony:

“And I just want to draw your attention to the thing in parenthesis where it says, “P = 0.065.” So, basically that is the probability of observing data this extremely, this much in favor of ACS given, if, if in fact the no [sic, null] hypothesis was true. So, if, if the no hypothesis was true, that is there was no difference, then the probability of observing this data is only 6.5 percent.”

Notes of Testimony of Andrew Willan at 26 (April , 2010). In this quote, Professor Willan might have been more careful to point out that the significance probability of 6.5% is a cumulative probability, attaching to data as extreme as, or more extreme than, the data actually observed. Nevertheless, Willan certainly made clear that the probability measure was computed on the assumption that the null hypothesis is correct. The trial court, alas, erred in restating the relevant statistical concepts.

And then there was Justice Walters' bizarre description of the Cochrane data as embodying a near-uniform distribution of results:

“[190] * * * The Cochrane analysis found that ACS reduced the risk of CP (in its entirety) by 40%, 93.5% of the time.”

The trial court did not give the basis for this erroneous description of the Cochrane ACS/CP data.[5] To be sure, if the Cochrane result were true, a 40% reduction might be the expected value across trials, but it would be a remarkable occurrence for 93.5% of trials to obtain the same risk ratio as the one observed in the meta-analysis.
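
The point is easy to check by simulation. The sketch below assumes, purely for illustration, 452 children per arm, a control-arm CP risk of about 6.6%, and a true risk ratio of 0.60 (figures chosen only to resemble the 904 pooled children and roughly 48 CP cases); the actual trial designs varied, so nothing here reproduces the Cochrane data. Even if the true effect were exactly a 40% reduction, replicated trials of this size would scatter widely around 0.60, and nothing close to 93.5% of them would report a 40% reduction:

```python
# A simulation sketch under assumed inputs (452 children per arm, control CP
# risk of 6.6%, true RR = 0.60) -- illustrative figures only, not the pooled
# Cochrane trials. It shows how widely the estimated risk ratio varies across
# replications even when the true effect is exactly a 40% reduction.
import random

random.seed(1)
n_per_arm, p_control, true_rr = 452, 0.066, 0.60
p_treated = p_control * true_rr

def one_trial_rr():
    cases_treated = sum(random.random() < p_treated for _ in range(n_per_arm))
    cases_control = sum(random.random() < p_control for _ in range(n_per_arm))
    # With equal arm sizes, the risk ratio is just the ratio of case counts.
    return cases_treated / cases_control if cases_control else float("inf")

rrs = [one_trial_rr() for _ in range(2000)]
near_40_pct = sum(0.55 <= rr <= 0.65 for rr in rrs) / len(rrs)
protective = sum(rr < 1.0 for rr in rrs) / len(rrs)

print(f"replications with RR between 0.55 and 0.65: {near_40_pct:.0%}")  # nowhere near 93.5%
print(f"replications with RR below 1.0:             {protective:.0%}")   # a different claim entirely
```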

The defendant's expert witness on statistical issues, Prof. Robert Platt, similarly testified that the significance probability reported by the Cochrane review was computed on the assumption of the null hypothesis of no association:

“What statistical significance tells us, and I mentioned at the beginning that it refers to the probability of a chance finding could occur under the null-hypothesis of no effect. Essentially, it provides evidence in favour of there being an effect.  It doesn’t tell us anything about the magnitude of that effect.”

Notes of Testimony of Robert Platt at 11 (April 19, 2010).

Perhaps part of the confusion resulted from Prof. Willan's sponsored Bayesian analysis, which led him to opine that the Cochrane data supported a 91 to 97 percent probability of an effect, a figure that may have appeared to the trial court to be saying the same thing as the complement of the Cochrane p-value of 6.5%. Indeed, Justice Walters may have had some assistance in this confusion from the defense statistical expert witness, Prof. Platt, who testified:

“From the inference perspective the p-value of 0.065 that we observe in the Cochrane review versus a 91 to 97 percent probability that there is an effect, those amount to the same thing.”

Notes of Testimony of Robert Platt at 50 (April 19, 2010).  Now the complement of the p-value, 93.5%, may have fallen within the range of posterior probabilities asserted by Professor Willan, but these probabilities are decidedly not the same thing.
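
The difference is easy to exhibit. The record does not disclose Professor Willan's prior distribution or model, so the sketch below simply assumes a normal model on the log risk ratio, with the point estimate and standard error reconstructed from the published interval, and compares the complement of the p-value with posterior probabilities of a protective effect computed under two illustrative priors. The numbers can land close together, as they apparently did here, without being the same quantity:

```python
# A sketch contrasting (1 - p) with Bayesian posterior probabilities of a
# protective effect, under assumed priors; Willan's actual prior and model
# are not in the record. Normal likelihood on log RR, with the estimate and
# SE reconstructed from the published 95% CI -- assumptions throughout.
import math
from statistics import NormalDist

est, se = math.log(0.60), 0.28                     # observed log RR and approximate SE
p = 2 * (1 - NormalDist().cdf(abs(est) / se))
print(f"1 - p = {1 - p:.3f}")                      # a statement about the data, given the null

def posterior_prob_protective(prior_mean, prior_sd):
    """P(log RR < 0 | data) with a normal prior and normal likelihood."""
    precision = 1 / prior_sd**2 + 1 / se**2
    post_mean = (prior_mean / prior_sd**2 + est / se**2) / precision
    post_sd = math.sqrt(1 / precision)
    return NormalDist(post_mean, post_sd).cdf(0.0)

print(f"posterior, nearly flat prior:        {posterior_prob_protective(0.0, 10.0):.3f}")
print(f"posterior, skeptical prior (sd 0.2): {posterior_prob_protective(0.0, 0.2):.3f}")
```

Under the nearly flat prior, the posterior probability of a protective effect lands near the top of Willan's 91 to 97 percent range; under the skeptical prior it falls well below the 93.5% figure. Neither is the complement of the p-value.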

Perhaps Prof. Platt was referring only to the numerical similarity, but his language, “the same thing,” certainly could have bred misunderstanding. The defense apparently attacked the reliability of the Bayesian analysis before trial, only to abandon the challenge by the time of trial. At trial, Prof. Platt testified that he did not challenge Willan's Bayesian analysis or the computation of posterior probabilities. Platt's acquiescence in Willan's Bayesian analysis is unfortunate because the parties never developed testimony about exactly how Willan arrived at his posterior probabilities, and especially about what prior probability distribution he employed.

Professor Platt went on to qualify his understanding of Willan's Bayesian analysis as providing a posterior probability that there is an effect, that is, that the “effect size” is greater than 1.0. At trial, the parties spent a good deal of time establishing that the Cochrane risk ratio of 0.6 represented the decreased risk of CP from administering a full course of ACS, and that the same statistic could be restated as an increased CP risk ratio of roughly 1.7 for not administering a full course. Platt and Willan appeared to agree that the posterior probability described the cumulative posterior probability for increased risks above 1.0:

“[T]he 91% is a probability that the effect is greater than 1.0, not that it is 1.7 relative risk.”

Notes of Testimony of Robert Platt at 51 (April 19, 2010); see also Notes of Testimony of Andrew Willan at 34 (April 9, 2010) (concluding that ACS reduces risk of CP, with a probability of 91 to 97 percent, depending upon whether random effects or fixed effect models are used).[6]
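
Platt's qualification is worth making concrete. Reusing the reconstructed flat-prior posterior from the sketch above (again an assumption about the model, not a reproduction of Willan's analysis), the posterior probability that the effect exceeds 1.0 can exceed 90 percent while the posterior probability that the effect is as large as 1.7 is close to a coin flip:

```python
# A sketch of the distinction Platt drew: a 90-plus percent posterior
# probability that the effect exceeds 1.0 is not a 90-plus percent
# probability that the effect is as large as 1.7. Flat-prior normal
# posterior on log RR, reconstructed from the published CI -- an assumption.
import math
from statistics import NormalDist

posterior = NormalDist(math.log(0.60), 0.28)       # posterior for log RR (treated vs. untreated)

p_effect_above_1 = posterior.cdf(0.0)              # P(RR_treated < 1) = P(RR_untreated > 1)
p_effect_at_least_1_7 = posterior.cdf(math.log(1 / 1.7))  # P(RR_untreated >= 1.7)

print(f"P(effect greater than 1.0): {p_effect_above_1:.2f}")
print(f"P(effect at least 1.7):     {p_effect_at_least_1_7:.2f}")
```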

One point on which the parties' expert witnesses did not agree was whether the failure of the Cochrane meta-analysis to achieve statistical significance was due solely to the sparseness of the data aggregated from the randomized trials. Plaintiffs' witnesses appeared to testify that had the Cochrane reviewers been able to aggregate additional clinical trial data, the “effect size” would have remained constant and the p-value would have shrunk, ultimately to below 5 percent. Prof. Platt, testifying for the defense, appropriately criticized this hand-waving excuse:

“Q. and the probability factor, the P value, was 0.065, which the previous witness had suggested is an increase in probability of our reliability on the underlying data.  Is it reasonable to assume that this data that a further increase in the sample size will achieve statistical significance?

A. No, that’s not a reasonable assumption….”

Notes of Testimony of Robert Platt at 29 (April 19, 2010).
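
The arithmetic behind the plaintiffs' suggestion is simple, which is precisely why Platt's objection matters: if one assumes that the pooled risk ratio would hold steady at 0.60 while the number of children doubled or tripled, the standard error shrinks and the p-value falls below 0.05 almost by construction. The sketch below (normal approximation, standard error reconstructed from the published interval and assumed to scale as one over the square root of the sample size) shows that the assumption of a constant effect estimate does all the work:

```python
# A sketch of the "more data would have reached significance" arithmetic,
# assuming -- as the plaintiffs' witnesses did -- that the pooled RR of 0.60
# stays fixed while the sample grows, with SE(log RR) scaling as 1/sqrt(n).
# The shrinking p-value is a consequence of that assumption, not evidence for it.
import math
from statistics import NormalDist

log_rr, se_at_904 = math.log(0.60), 0.28           # reconstructed from the published CI

for multiple in (1, 2, 3):
    se = se_at_904 / math.sqrt(multiple)
    p = 2 * (1 - NormalDist().cdf(abs(log_rr) / se))
    print(f"{multiple}x the data, RR held at 0.60: p ~ {p:.3f}")
```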

Positions on Appeal

Dr. Viljoen continued to assert the need for significance on appeal. As appellant, he challenged the trial court’s finding that the Cochrane review concluded that there was a 40% risk reduction. See Goodman v. Viljoen, 2011 ONSC 821, at ¶192 (CanLII) (“I accept the finding of the Cochrane analysis that ACS reduces the instance of CP by 40%”). Dr. Viljoen correctly pointed out that the Cochrane review never reached such a conclusion. Appellant’s Factum, 2012 CCLTFactum 20936, ¶64.  It was the plaintiffs’ expert witnesses, not the Cochrane reviewers, who reached the conclusion of causality from the Cochrane data.

On appeal, Dr. Viljoen pressed the point that, as his expert witnesses had described, statistical significance in the Cochrane analysis would have been “a basic and universally accepted standard” for showing that ACS was efficacious in preventing CP or PVL (periventricular leukomalacia). Id. at ¶40. The appellant's brief then committed the very error that Dr. Barrett had complained would follow from a black-and-white reading of a finding that lacked statistical significance: Dr. Viljoen maintained that the “trend” of reduced CP rates from ACS administration “is the same as a chance occurrence.” Defendant (Appellant), 2012 CCLTFactum 20936, at ¶40; see also id. at ¶14(e) (arguing that the Cochrane result for ACS/CP “should be treated as pure chance given it was not a statistically significant difference”).

Relying upon the Daubert decision from the United States, as well as Canadian cases, Dr. Viljoen framed one of his appellate issues as whether the trial court had “erred in relying upon scientific evidence that had not satisfied the benchmark of statistical significance”:

“101. Where a scientific effect is not shown to a level of statistical significance, it is not proven. No study has demonstrated a reduction in cerebral palsy with antenatal corticosteroids at a level of statistical significance.

102. The Trial Judge erred in law in accepting that antenatal corticosteroids reduce the risk of cerebral palsy based on Dr. Willan’s unpublished Bayesian probability analysis of the 48 cases of cerebral palsy reviewed by Cochrane—an analysis prepared for the specific purpose of overcoming the statistical limitations faced by the Plaintiffs on causation.”

Defendant (Appellant), 2012 CCLTFactum 20936. The use of the verb “proven” is problematic because it suggests a mathematical demonstration, which is never available for empirical propositions about the world, and especially not for propositions about the biological world. Invoking a mathematical standard also begs the question whether the Cochrane data were sufficient to establish a scientific conclusion that ACS is efficacious in preventing CP.

In opposing Dr. Viljoen's appeal, the plaintiffs capitalized upon his assertion that science requires a very high level of posterior probability to establish a causal claim, by simply agreeing with it. See Plaintiffs' (Respondents') Factum, 2012 CCLTFactum 20937, at ¶31 (“The scientific method requires statistical significance at a 95% level.”). By accepting the idealized notion that science somehow requires 95% certainty (as opposed to a 95% confidence coefficient used as a test for assessing random error), the plaintiffs made the defendant's legal position untenable.

In order to keep the appellate court thinking that the defendant was imposing an extra-legal, higher burden of proof upon plaintiffs, the plaintiffs went so far as to misrepresent the testimony of their own expert witness, Professor Willan, as having committed the transposition fallacy:

“49. Dr. Willan provided the frequentist explanation of the Cochrane analysis on CP:

a. The risk ratio (RR) is 0.60 which means that there is a 40% risk reduction in cerebral palsy where there has been administration of antenatal corticosteroids;

b. The upper limit of the confidence interval (CI) barely crosses 1 so it just barely fails to meet the rigid test of statistical significance;

c. The p value represents the frequentist term of probability;

d. In this case the p value is .065;

e. From a statistical perspective that means that there is a 6.5% chance that the difference observed in CP rates is due to chance alone;

f. Conversely there is a 93.5% chance that the result (the 40% reduction in CP) is due to a true treatment effect of ACS.”

2012 CCLTFactum 20937, at ¶49 (citing Evidence of Dr. Willan, Respondents’ Compendium, Tab 4, pgs. 43-52).

Although Justice Doherty dissented from the affirmance of the trial court's judgment, he succumbed to the parties' misrepresentations about scientific certainty, and to their repeated commission of the transposition fallacy. Goodman v. Viljoen, 2012 ONCA 896 (CanLII) at ¶36 (“Scientists will draw a cause and effect relationship only when a result follows at least 95 per cent of the time. The results reported in the Cochrane analysis fell just below that standard.”), leave appeal den’d, Supreme Court of Canada No. 35230 (July 11, 2013).

The statistical errors on both sides redounded to the benefit of the plaintiffs.


[1] Goodman v. Viljoen, 2011 ONSC 821 (CanLII), aff’d, 2012 ONCA 896 (CanLII), leave appeal den’d, Supreme Court of Canada No. 35230 (July 11, 2013).

[2] Devender Roberts & Stuart R. Dalziel, “Antenatal corticosteroids for accelerating fetal lung maturation for women at risk of preterm birth,” Cochrane Database of Systematic Reviews, Issue 3, Art. No. CD004454, at 8 (2006).

[3] See, e.g., In re Ephedra Prods. Liab. Litig., 393 F. Supp. 2d 181, 188, 191, 193 (S.D.N.Y. 2005) (fallaciously reasoning that the use of a critical significance probability of less than 5% increased the “more likely than not” burden of proof upon a civil litigant). See also Michael O. Finkelstein, Basic Concepts of Probability and Statistics in the Law 65 (2009) (criticizing the Ephedra decision for confusing posterior probability with significance probability).

[4] I do not have the complete transcript of Dr. Barrett’s testimony, but the following excerpt from April 9, 2010, at page 100, suggests that he helped lead Justice Walters into error: “When you say statistical significance, if you say that something is statistically significance, it means you’re, for the scientific notation, 95 percent sure. That’s the standard we use, 95 percent sure that that result could not have happened by chance. There’s still a 5 percent chance it could. It doesn’t mean for sure, but 95 percent you’re sure that the result you’ve got didn’t happen by chance.”

[5] On appeal, the dissenting judge erroneously accepted Justice Walters’ description of the Cochrane review as having supposedly reported a 40% reduction in CP incidence, 93.5% of the time, from use of ACS. Goodman v. Viljoen, 2012 ONCA 896 (CanLII) at ¶36, leave appeal den’d, Supreme Court of Canada No. 35230 (July 11, 2013).

[6] The Bayesian analysis did not cure the attributability problem with respect to specific causation.