TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Hindsight Bias – In Science & in the Law

February 27th, 2022

In the early 1970s, Amos Tversky and Daniel Kahneman raised awareness of hindsight bias as a pervasive phenomenon in human judgment.[1] Although these insights seemed obvious in hindsight, Baruch Fischhoff directly tested the existence and extent of hindsight bias in a now-classic paper.[2] The lack of awareness of how hindsight bias affects our historical judgments seriously limits our ability to judge the past.

Kahneman’s participation in the planning phase of a new, fourth edition of the Reference Manual on Scientific Evidence is a hopeful sign that his insights and the research of many psychologists will gain fuller recognition in the law. Hindsight bias afflicts judges, lawyers, jurors, expert witnesses, scientists, physicians, and children of all ages.[3]

Hindsight Bias in the Law

Sixth Amendment Challenges

Challenges to the effectiveness of legal counsel are a mainstay of habeas petitions filed by convicted felons. In hindsight, their lawyers’ conduct seems woefully inadequate. In judging such claims of ineffectiveness, the United States Supreme Court acknowledged the role and influence of hindsight bias in judging trial counsel’s strategic decisions:

“A fair assessment of attorney performance requires that every effort be made to eliminate the distorting effects of hindsight, to reconstruct the circumstances of counsel’s challenged conduct, and to evaluate the conduct from counsel’s perspective at the time. Because of the difficulties inherent in making the evaluation, a court must indulge a strong presumption that counsel’s conduct falls within the wide range of reasonable professional assistance; that is, the defendant must overcome the presumption that, under the circumstances, the challenged action might be considered sound trial strategy.”[4]

This decision raises the interesting question of why there is no similarly strong presumption of reasonableness in other legal contexts, such as the “reasonableness” of physician judgments, or of adequate warnings.

Medical Malpractice

There is little doubt that retrospective judgments of the reasonableness of medical decisions are infected, distorted, and corrupted by hindsight bias.[5] In the words of one paper on the subject:

“There is evidence that hindsight bias, which may cause the expert to simplify, trivialise and criticise retrospectively the decisions of the treating doctor, is inevitable when the expert knows there has been an adverse outcome.”[6]

Requiring the finder of fact to assess the reasonableness of complex medical judgments in hindsight, with knowledge of the real-world outcomes of the prior judgments, poses a major threat to fairness in the trial process, in both bench and jury trials. Curiously, lawyers receive a “strong presumption” of reasonableness, but physicians and manufacturers do not.

Patent Litigation

Hindsight bias plays a large role in challenges to patent validity. The works of genius seem obvious with hindsight. In the context of judging patent criteria such as non-obviousness, the Supreme Court has emphasized that:

“A factfinder should be aware, of course, of the distortion caused by hindsight bias and must be cautious of arguments reliant upon ex post reasoning.”[7]

Certainly, factfinders in every kind of litigation, not just intellectual property cases, should be made aware of the distortion caused by hindsight bias.

Remedies

In all likelihood, hindsight bias can never be fully corrected. At a minimum, factfinders should be educated about the phenomenon. In criminal cases, defendants have called psychologists to testify about the inherent difficulties in eyewitness or cross-race identification.[8] In New Jersey, trial courts must give a precautionary instruction in criminal cases that involve eyewitness identification.[9] In some but not all discrimination cases, courts have permitted expert witness opinion testimony about “implicit bias.”[10] In “long-tail” litigation, in which jurors must consider the reasonableness of warning decisions, or claims of failure to test, decades before the trial, defendants may well want to consider calling a psychologist to testify about the reality of hindsight bias, and how it leads to incorrect judgments about past events.

Another, independent remedy would be for the trial court to give a jury instruction on hindsight bias. After all, the Supreme Court has clearly stated that “[a] factfinder should be aware, of course, of the distortion caused by hindsight bias and must be cautious of arguments reliant upon ex post reasoning.” The trial judge should set the stage for a proper consideration of past events, by alerting jurors to the reality and seductiveness of hindsight bias. What follows is a first attempt at such an instruction. I would love to hear from anyone who has submitted a proposed instruction on the issue.

Members of the jury, this case will require you to determine what scientists knew, or should have known, at a time in the past. At the same time that you try to make this determination, you will have been made aware of what is now known. Psychological research clearly shows that all human beings, regardless of their age, education, or life circumstances, have what is known as hindsight bias. Having this bias means that we all tend to assume that people at times past should have known what we now in fact know. Calling it a bias is a way of saying that this assumption is wrong. To decide this case fairly, you must try to determine what people, including experts in the field, actually knew and did before there were more recent discoveries, and without reference to what is now known and accepted.


[1] Amos Tversky & Daniel Kahneman, “Judgment under uncertainty: Heuristics and biases,” 185 Science 1124 (1974). See also “People Get Ready – There’s a Reference Manual a Comin’” (June 6, 2021).

[2] Baruch Fischhoff, “Hindsight ≠ foresight: the effect of outcome knowledge on judgment under uncertainty,” 1 J. Experimental Psychol.: Human Perception & Performance 288, 288 (1975), reprinted in 12 Quality & Safety Health Care 304 (2003); Baruch Fischhoff & Ruth Beyth, “I knew it would happen: Remembered probabilities of once-future things,” 13 Organizational Behavior & Human Performance 1 (1975); see Baruch Fischhoff, “An Early History of Hindsight Research,” 25 Social Cognition 10 (2007).

[3] See Daniel M. Bernstein, Edgar Erdfelder, Andrew N. Meltzoff, William Peria & Geoffrey R. Loftus, “Hindsight Bias from 3 to 95 Years of Age,” 37 J. Experimental Psychol.: Learning, Memory & Cognition 378 (2011).

[4] Strickland v. Washington, 466 U.S. 668, 689, 104 S.Ct. 2052, 2052 (1984); see also Feldman v. Thaler, 695 F.3d 372, 378 (5th Cir. 2012).

[5] Edward Banham-Hall & Sian Stevens, “Hindsight bias critically impacts on clinicians’ assessment of care quality in retrospective case note review,” 19 Clinical Medicine 16 (2019); Thom Petty, Lucy Stephenson, Pierre Campbell & Terence Stephenson, “Outcome Bias in Clinical Negligence Medicolegal Cases,” 26 J.Law & Med. 825 (2019); Leonard Berlin, “Malpractice Issues and Radiology – Hindsight Bias” 175 Am. J. Radiol. 597 (2000); Leonard Berlin, “Outcome Bias,” 183 Am. J. Radiol. 557 (2004); Thomas B. Hugh & Sidney W. A. Dekker, “Hindsight bias and outcome bias in the social construction of medical negligence: a review,” 16 J. Law. Med. 846 (2009).

[6] Thomas B. Hugh & G. Douglas Tracy, “Hindsight Bias in Medicolegal Expert Reports,” 176 Med. J. Australia 277 (2002).

[7] KSR International Co. v. Teleflex Inc., 550 U.S. 398, 127 S.Ct. 1727, 1742 (2007) (emphasis added; internal citations omitted).

[8] See Commonwealth v. Walker, 92 A.3d 766 (Pa. 2014) (Todd, J.) (rejecting per se inadmissibility of eyewitness expert witness opinion testimony).

[9] State v. Henderson, 208 N.J. 208, 27 A.3d 872 (2011).

[10] Samaha v. Wash. State Dep’t of Transp., No. cv-10-175-RMP, 2012 WL 11091843, at *4 (E.D. Wash. Jan. 3, 2012) (holding that an expert witness’s proffered opinions about the “concepts of implicit bias and stereotypes is relevant to the issue of whether an employer intentionally discriminated against an employee.”).

Of Significance, Error, Confidence & Confusion – In Law & Statistics

February 27th, 2022

A version of this post appeared previously on Professor Deborah Mayo’s blog, Error Statistics Philosophy. The post was invited as a comment on Professor Mayo’s article in Conservation Biology, which is cited and discussed below. Other commentators had important, insightful comments that can be found at Error Statistics Philosophy.[1] These commentators and many others participated in a virtual special session of Professor Mayo’s “Phil Stat Forum,” on January 11, 2022. This session, “Statistical Significance Test Anxiety,” was moderated by David Hand, and included presentations by Deborah Mayo and Yoav Benjamini. The presenters’ slides, as well as a video of the session, are now online.

*      *     *     *     *     *     *     *

The metaphor of law as an “empty vessel” is frequently invoked to describe the law generally, as well as pejoratively to describe lawyers. The metaphor rings true at least in describing how the factual content of legal judgments comes from outside the law. In many varieties of litigation, not only the facts and data, but the scientific and statistical inferences must be added to the “empty vessel” to obtain a correct and meaningful outcome.

Once upon a time, the expertise component of legal judgments came from so-called expert witnesses, who were free to opine about claims of causality solely by showing that they had more expertise than the lay jurors. In Pennsylvania, for instance, the standard for qualifying witnesses to give “expert opinions” was to show that they had “a reasonable pretense to expertise on the subject.”

In the 19th and the first half of the 20th century, causal claims, whether of personal injuries, discrimination, or whatever, virtually always turned on a conception of causation as necessary and sufficient to bring about the alleged harm. In discrimination claims, plaintiffs pointed to the “inexorable zero” in cases in which no Black citizen had ever been seated on a grand jury in a particular county since the demise of Reconstruction. In health claims, the mode of reasoning usually followed something like Koch’s postulates.

The second half of the 20th century was marked by the rise of stochastic models in our understanding of the world. The consequence is that statistical inference made its way into the empty vessel. The rapid introduction of statistical thinking into the law did not always go well. In a seminal 1977 discrimination case, Castaneda v. Partida,[2] in an opinion by Associate Justice Blackmun, the Court calculated a binomial probability for observing the sample result (rather than a result at least as extreme), and mislabeled the measurement “standard deviations” rather than standard errors:

“As a general rule for such large samples, if the difference between the expected value and the observed number is greater than two or three standard deviations, then the hypothesis that the jury drawing was random would be suspect to a social scientist. The 11-year data here reflect a difference between the expected and observed number of Mexican-Americans of approximately 29 standard deviations. A detailed calculation reveals that the likelihood that such a substantial departure from the expected value would occur by chance is less than 1 in 10¹⁴⁰.”[3]
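The arithmetic behind this footnote is easy to reconstruct. Below is a minimal sketch in Python, using the figures reported in the opinion (a county population that was 79.1 percent Mexican-American, and 870 persons summoned for grand jury service over 11 years, of whom 339 were Mexican-American); the standard binomial formula reproduces the famous “approximately 29 standard deviations,” which are, more precisely, standard errors:

```python
from math import sqrt

# Figures reported in the Castaneda opinion (used here for illustration):
# the county population was 79.1% Mexican-American; over 11 years, 870
# persons were summoned for grand jury service, of whom 339 were
# Mexican-American.
p, n, observed = 0.791, 870, 339

expected = n * p                    # about 688 expected Mexican-Americans
std_error = sqrt(n * p * (1 - p))   # binomial standard error, about 12

z = (expected - observed) / std_error
print(f"expected ~{expected:.0f}, observed {observed}, "
      f"disparity ~{z:.1f} standard errors")  # prints ~29.1
```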

Justice Blackmun was graduated from Harvard College, summa cum laude, with a major in mathematics.

Despite the extreme statistical disparity in the 11-year run of grand juries, Justice Blackmun’s opinion provoked a robust rejoinder, not only on the statistical analysis, but on the Court’s failure to account for obvious omitted confounding variables in its simplistic analysis. And then there were the inconvenient facts that Mr. Partida was a rapist, indicted by a grand jury (50% with “Hispanic” names), which was appointed by jury commissioners (3/5 Hispanic). Partida was convicted by a petit jury (7/12 Hispanic), in front of a trial judge who was Hispanic, and he was denied a writ of habeas corpus by Judge Garza, who went on to be a member of the Court of Appeals. In any event, Justice Blackmun’s dictum about “two or three” standard deviations soon shaped the outcome of many thousands of discrimination cases, and was translated into a necessary p-value of 5%.
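The translation is simple normal-curve arithmetic. As a minimal sketch (assuming a two-sided test under the standard normal approximation), “two or three standard deviations” brackets the conventional 5% threshold:

```python
from math import erf, sqrt

def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z-score under the standard normal curve."""
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

for z in (1.96, 2.0, 3.0):
    print(f"{z} standard errors -> p = {two_sided_p(z):.4f}")
# 1.96 standard errors -> p = 0.0500  (the conventional 5% threshold)
# 2.0  standard errors -> p = 0.0455
# 3.0  standard errors -> p = 0.0027
```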

Beginning in the early 1960s, statistical inference became an important feature of tort cases that involved claims based upon epidemiologic evidence. In such health-effects litigation, the judicial handling of concepts such as p-values and confidence intervals often went off the rails. In 1989, the United States Court of Appeals for the Fifth Circuit resolved an appeal involving expert witnesses who relied upon epidemiologic studies by concluding that it did not have to resolve questions of bias and confounding because the studies relied upon had presented their results with confidence intervals.[4] Judges and expert witnesses persistently interpreted a single confidence interval from one study as having a 95 percent probability of containing the actual parameter.[5] Similarly, many courts and counsel committed the transposition fallacy in interpreting p-values as posterior probabilities for the null hypothesis.[6]
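The confidence-interval error is subtle enough that a small simulation may help. In this minimal sketch (with hypothetical parameters), many repeated “studies” each report a normal-approximation 95% interval; the “95%” describes the long-run coverage of the interval-constructing procedure across studies, not a 95 percent probability that any single computed interval contains the true value:

```python
import random
from math import sqrt
from statistics import mean, stdev

# Hypothetical example: repeated studies sampling from a known population.
random.seed(1)
TRUE_MEAN, SIGMA, N, TRIALS = 10.0, 2.0, 30, 10_000

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    m, se = mean(sample), stdev(sample) / sqrt(N)
    lo, hi = m - 1.96 * se, m + 1.96 * se   # normal-approximation 95% CI
    covered += lo <= TRUE_MEAN <= hi

# The long-run coverage is close to 95%, but any single interval either
# contains the true mean or it does not.
print(f"long-run coverage: {covered / TRIALS:.3f}")
```

Any one published interval either contains the parameter or it does not; the 95 percent figure belongs to the procedure, which is precisely the distinction the courts elided.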

Against this backdrop of mistaken and misrepresented interpretation of p-values, the American Statistical Association’s p-value statement was a helpful and understandable restatement of basic principles.[7] Within a few weeks, however, citations to the p-value Statement started to show up in the briefs and examinations of expert witnesses, to support contentions that p-values (or any procedure to evaluate random error) were unimportant, and should be disregarded.[8]

In 2019, Ronald Wasserstein, the ASA executive director, along with two co-authors, wrote an editorial that explicitly called for abandoning the use of “statistical significance.”[9] Although the piece was labeled “editorial,” the journal provided no disclaimer that Wasserstein was not speaking ex cathedra.

The absence of a disclaimer provoked much confusion. Indeed, Brian Tarran, the editor of Significance, published jointly by the ASA and the Royal Statistical Society, wrote an editorial interpreting the Wasserstein editorial as an official ASA “recommendation.” Tarran ultimately retracted his interpretation, but only in response to a pointed letter to the editor.[10] Tarran adverted to a misleading press release from the ASA as the source of his confusion. Inquiring minds might wonder why the ASA allowed such a misleading press release to go out.

In addition to press releases, some people in the ASA started to send emails to journal editors, to nudge them to abandon statistical significance testing on the basis of what seemed like an ASA recommendation. For the most part, this campaign was unsuccessful in the major biomedical journals.[11]

While this controversy was unfolding, then-President Karen Kafadar of the ASA stepped into the breach to state definitively that the Executive Director was not speaking for the ASA.[12] In November 2019, the ASA board of directors approved a motion to create a “Task Force on Statistical Significance and Replicability.” Its charge was “to develop thoughtful principles and practices that the ASA can endorse and share with scientists and journal editors. The task force will be appointed by the ASA President with advice and participation from the ASA Board.”

Professor Mayo’s editorial has done the world of statistics, as well as the legal world of judges, lawyers, and legal scholars, a service in calling attention to the peculiar intellectual conflicts of interest that played a role in the editorial excesses of some of the ASA’s leadership. From a lawyer’s perspective, it is clear that courts have been misled and distracted by some ASA officials who seem to have worked to undermine a consensus position paper on p-values.[13]

Curiously, the Task Force’s report did not find a home in any of the ASA’s several scholarly publications. Instead, “The ASA President’s Task Force Statement on Statistical Significance and Replicability”[14] appeared in The Annals of Applied Statistics, where it is accompanied by an editorial by former ASA President Karen Kafadar.[15] In November 2021, the ASA’s official “magazine,” Chance, also published the Task Force’s Statement.[16]

Judges and litigants who must navigate claims of statistical inference need guidance on the standard of care scientists and statisticians should use in evaluating such claims. Although the Task Force did not elaborate, it advanced five basic propositions, which had been obscured by many of the recent glosses on the ASA’s 2016 p-value statement and the 2019 editorial discussed above:

  1. “Capturing the uncertainty associated with statistical summaries is critical.”
  2. “Dealing with replicability and uncertainty lies at the heart of statistical science. Study results are replicable if they can be verified in further studies with new data.”
  3. “The theoretical basis of statistical science offers several general strategies for dealing with uncertainty.”
  4. “Thresholds are helpful when actions are required.”
  5. “P-values and significance tests, when properly applied and interpreted, increase the rigor of the conclusions drawn from data.”

Although the Task Force’s Statement will not end the debate or the “wars,” it will go a long way toward correcting the contentions made in court about the insignificance of significance testing, while giving courts a truer sense of the professional standard of care with respect to statistical inference in evaluating claims of health effects.


[1] Commentators included John Park, MD; Brian Dennis, Ph.D.; Philip B. Stark, Ph.D.; Kent Staley, Ph.D.; Yudi Pawitan, Ph.D.; Brian Hennig, Ph.D.; Brian Haig, Ph.D.; and Daniël Lakens, Ph.D.

[2] Castaneda v. Partida, 430 U.S. 482 (1977).

[3] Id. at 496 n.17.

[4] Brock v. Merrell Dow Pharmaceuticals, Inc., 874 F.2d 307, 311-12 (5th Cir. 1989).

[5] Richard W. Clapp & David Ozonoff, “Environment and Health: Vital Intersection or Contested Territory?” 30 Am. J. L. & Med. 189, 210 (2004) (“Thus, a RR [relative risk] of 1.8 with a confidence interval of 1.3 to 2.9 could very likely represent a true RR of greater than 2.0, and as high as 2.9 in 95 out of 100 repeated trials.”) (Both authors testify for claimants in cases involving alleged environmental and occupational harms.); Schachtman, “Confidence in Intervals and Diffidence in the Courts” (Mar. 4, 2012) (collecting numerous examples of judicial offenders).

[6] See, e.g., In re Ephedra Prods. Liab. Litig., 393 F.Supp. 2d 181, 191, 193 (S.D.N.Y. 2005) (Rakoff, J.) (credulously accepting counsel’s argument that the use of a critical value of less than 5% of significance probability increased the “more likely than not” burden of proof upon a civil litigant). The decision has been criticized in the scholarly literature, but it is still widely cited without acknowledging its error. See Michael O. Finkelstein, Basic Concepts of Probability and Statistics in the Law 65 (2009).

[7] Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” 70 The Am. Statistician 129 (2016); see “The American Statistical Association’s Statement on and of Significance” (March 17, 2016). The commentary beyond the “bold faced” principles was at times less helpful in suggesting that there was something inherently inadequate in using p-values. With the benefit of hindsight, this commentary appears to represent editorializing by the authors, and not the sense of the expert committee that agreed to the six principles.

[8] Schachtman, “The American Statistical Association Statement on Significance Testing Goes to Court, Part I” (Nov. 13, 2018), “Part II” (Mar. 7, 2019).

[9] Ronald L. Wasserstein, Allen L. Schirm, and Nicole A. Lazar, “Editorial: Moving to a World Beyond ‘p < 0.05’,” 73 Am. Statistician S1, S2 (2019); see Schachtman, “Has the American Statistical Association Gone Post-Modern?” (Mar. 24, 2019).

[10] Brian Tarran, “THE S WORD … and what to do about it,” Significance (Aug. 2019); Donald Macnaughton, “Who Said What,” Significance 47 (Oct. 2019).

[11] See, e.g., David Harrington, Ralph B. D’Agostino, Sr., Constantine Gatsonis, Joseph W. Hogan, David J. Hunter, Sharon-Lise T. Normand, Jeffrey M. Drazen, and Mary Beth Hamel, “New Guidelines for Statistical Reporting in the Journal,” 381 New Engl. J. Med. 285 (2019); Jonathan A. Cook, Dean A. Fergusson, Ian Ford, Mithat Gonen, Jonathan Kimmelman, Edward L. Korn, and Colin B. Begg, “There is still a place for significance testing in clinical trials,” 16 Clin. Trials 223 (2019).

[12] Karen Kafadar, “The Year in Review … And More to Come,” AmStat News 3 (Dec. 2019); see also Kafadar, “Statistics & Unintended Consequences,” AmStat News 3, 4 (June 2019).

[13] Deborah Mayo, “The statistics wars and intellectual conflicts of interest,” 36 Conservation Biology (2022) (in press, online Dec. 2021).

[14] Yoav Benjamini, Richard D. De Veaux, Bradley Efron, Scott Evans, Mark Glickman, Barry I. Graubard, Xuming He, Xiao-Li Meng, Nancy M. Reid, Stephen M. Stigler, Stephen B. Vardeman, Christopher K. Wikle, Tommy Wright, Linda J. Young & Karen Kafadar, “The ASA President’s Task Force Statement on Statistical Significance and Replicability,” 15 Annals of Applied Statistics (2021) (in press).

[15] Karen Kafadar, “Editorial: Statistical Significance, P-Values, and Replicability,” 15 Annals of Applied Statistics (2021).

[16] Yoav Benjamini, Richard D. De Veaux, Bradley Efron, Scott Evans, Mark Glickman, Barry I. Graubard, Xuming He, Xiao-Li Meng, Nancy M. Reid, Stephen M. Stigler, Stephen B. Vardeman, Christopher K. Wikle, Tommy Wright, Linda J. Young & Karen Kafadar, “ASA President’s Task Force Statement on Statistical Significance and Replicability,” 34 Chance 10 (2021).