TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Washington Legal Foundation’s Paper on Statistical Significance in Rule 702 Proceedings

March 13th, 2017

The Washington Legal Foundation has released a Working Paper, No. 201, by Kirby Griffis, entitled “The Role of Statistical Significance in Daubert / Rule 702 Hearings,” in its Critical Legal Issues Working Paper Series (Mar. 2017) [cited below as Griffis]. I am a fan of many of the Foundation’s Working Papers (having written one some years ago), but this one gives me pause.

Griffis’s paper manages to avoid many of the common errors of lawyers writing about this topic, but adds little to the statistics chapter in the Reference Manual on Scientific Evidence (3d ed. 2011), and he propagates some new, unfortunate misunderstandings. On the positive side, Griffis studiously avoids the transposition fallacy in defining significance probability, and he notes that multiplicity from subgroups and multiple comparisons often undermines claims of statistical significance. Griffis gets both points right. These are woefully common errors, and they deserve the emphasis Griffis gives to them in this working paper.

On the negative side, however, Griffis falls into error on several points. Griffis helpfully narrates the Supreme Court’s evolution in Daubert and then in Joiner, but he fails to address the serious mischief and devolution introduced by the Court’s opinion in Matrixx Initiatives, Inc. v. Siracusano, 563 U.S. 27, 131 S.Ct. 1309 (2011). See Schachtman, “The Matrixx – A Comedy of Errors” (April 6, 2011); David Kaye, “Trapped in the Matrixx: The U.S. Supreme Court and the Need for Statistical Significance,” BNA Product Safety & Liability Reporter 1007 (Sept. 12, 2011). With respect to statistical practice, this Working Paper is at times wide of the mark.

Non-Significance

Although avoiding the transposition fallacy, Griffis falls into another mistake in interpreting tests of significance; he states that a non-significant result tells us that an hypothesis is “perfectly consistent with mere chance”! Griffis at 9. This is, of course, wrong, or at least seriously misleading. A failure to reject the null hypothesis does not prove the null such that we can say that the “null results” in one study were perfectly consistent with chance. The test may have lacked power to detect an “effect size” of interest. Furthermore, tests of significance cannot rule out systematic bias or confounding, and that limitation alone ensures that Griffis’s interpretation is mistaken. A null result may have resulted from bias or confounding that obscured a measurable association.
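The power point can be made concrete. Below is a minimal sketch, using a normal-approximation power calculation for comparing two proportions; the baseline risk, effect size, and sample sizes are hypothetical illustrations, not figures from any study Griffis discusses:

```python
from math import sqrt, erf

def normal_cdf(x):
    # Standard normal cumulative distribution function
    return 0.5 * (1 + erf(x / sqrt(2)))

def power_two_proportions(p1, p2, n1, n2, z_crit=1.96):
    # Approximate power of a two-sided z-test at alpha = 0.05:
    # probability of rejecting the null when the true rates are p1 and p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_effect = abs(p1 - p2) / se
    return normal_cdf(z_effect - z_crit)

# hypothetical: a true doubling of a 2% baseline risk, 500 subjects per arm
power = power_two_proportions(0.02, 0.04, 500, 500)
```

In this made-up example, a true doubling of risk is detected less than half the time; a “null” result from such a study hardly shows that the data are “perfectly consistent with mere chance.”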

Griffis states that p-values are expressed as percentages “usually 95% or 99%, corresponding to 0.05 or 0.01,” but this states things backwards. The p-value that is pre-specified to be “significant” is a probability or percentage that is low; it is the coefficient of confidence used to construct a confidence interval that is the complement of the significance probability. Griffis at 10. An alpha, or pre-specified statistical significance level, of 5% thus corresponds to a coefficient of confidence of 95% (or 1.0 – 0.05).
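The complement relationship is mechanical, as a short sketch with hypothetical numbers illustrates: a test at α = 0.05 rejects the null exactly when the 95% confidence interval excludes the null value.

```python
from math import sqrt

def wald_ci(successes, n, z=1.96):
    # Normal-approximation (Wald) confidence interval for a proportion;
    # z = 1.96 corresponds to a coefficient of confidence of 1 - alpha = 0.95
    p_hat = successes / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

alpha = 0.05
confidence = 1 - alpha              # 0.95: the coefficient of confidence
low, high = wald_ci(60, 100)        # hypothetical: 60 successes in 100 trials
rejects_half = not (low <= 0.5 <= high)  # test of H0: p = 0.5 at alpha = 0.05
```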

The Mid-p Controversy

In discussing the emerging case law, Griffis rightly points to cases that chastise Dr. Nicholas Jewell for the many liberties he has taken in various litigations as an expert witness for the lawsuit industry. One instance cited by Griffis is the Lipitor diabetes litigation, where the MDL court suggested that Jewell switched improperly from a Fisher’s exact test to a mid-p test. Griffis at 18-19. Griffis seems to agree, but as I have explained elsewhere, Fisher’s exact test generates a one-tailed measure of significance probability, and the analyst is left to one of several ways of calculating a two-tailed test. See “Lipitor Diabetes MDL’s Inexact Analysis of Fisher’s Exact Test” (April 21, 2016). The mid-p is one legitimate approach for asymmetric distributions, and is more favorable to the defense than passing off the one-tailed measure as the result of the test. The mere fact that a statistical software package does not automatically report the mid-p for a Fisher’s exact analysis does not make invoking this measure p-hacking or other misconduct. Doubling the attained significance probability of a particular Fisher’s exact test result is generally considered less accurate than a mid-p calculation, even though some software packages use doubling as the default. As much as we might dislike bailing Jewell out of Daubert limbo, on this one, limited point, he deserved a better hearing.
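For readers who want to see the arithmetic, here is a minimal sketch, using a made-up 2x2 table (not the Lipitor data), of the three quantities at issue: the one-tailed Fisher’s exact p-value, the crude “doubling” two-tailed value, and the mid-p:

```python
from math import comb

def hypergeom_pmf(k, row1, row2, col1):
    # P(X = k) exposed cases, given the fixed margins of a 2x2 table
    return comb(row1, k) * comb(row2, col1 - k) / comb(row1 + row2, col1)

def fisher_one_tailed(a, b, c, d):
    # Upper-tail exact p-value: P(X >= a) under the null hypergeometric
    col1 = a + c
    kmax = min(a + b, col1)
    return sum(hypergeom_pmf(k, a + b, c + d, col1) for k in range(a, kmax + 1))

def fisher_mid_p(a, b, c, d):
    # mid-p: count only half the probability of the observed table itself
    return fisher_one_tailed(a, b, c, d) - 0.5 * hypergeom_pmf(a, a + b, c + d, a + c)

# hypothetical table: 10 of 40 exposed vs. 4 of 40 unexposed with the outcome
a, b, c, d = 10, 30, 4, 36
p_one = fisher_one_tailed(a, b, c, d)
p_doubled = min(1.0, 2 * p_one)   # the crude "doubling" two-tailed value
p_mid = fisher_mid_p(a, b, c, d)
```

As the ordering p_mid < p_one < p_doubled suggests, the mid-p sits between the bare one-tailed value and the doubled value, which is why reporting it is a defensible analytic choice rather than result-driven manipulation.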

Mis-Definitions

On recounting the Bendectin litigation, Griffis refers to the epidemiologic studies of birth defects and Bendectin as “experiments,” Griffis at 7, and then describes such studies as comparing “populations,” when he clearly meant “samples.” Griffis at 8.

Griffis conflates personal bias with bias as a scientific concept of systematic error in research, a confusion usually perpetuated by plaintiffs’ counsel. See Griffis at 9 (“Coins are not the only things that can be biased: scientists can be, too, as can their experimental subjects, their hypotheses, and their manipulations of the data.”) Of course, the term has multiple connotations, but too often an accusation of personal bias, such as conflict of interest, is used to avoid engaging with the merits of a study.

Relative Risks

Griffis correctly describes the measure known as “relative risk” as a determination of “the strength of a particular association.” Griffis at 10. The discussion then lapses into using a given relative risk as a measure of the likelihood that an individual with the exposure studied will develop the disease. Sometimes this general-to-specific inference is warranted, but without further analysis, it is impossible to tell whether Griffis lapsed from general to specific, deliberately or inadvertently, in describing the interpretation of relative risk.
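The distinction can be put arithmetically. A relative risk summarizes a comparison of group rates; the related attributable fraction, (RR − 1)/RR, is what courts sometimes use to translate that group measure into a “more likely than not” statement about an individual, an inference that requires further assumptions. A minimal sketch, with hypothetical cohort numbers:

```python
def relative_risk(exposed_cases, exposed_n, unexposed_cases, unexposed_n):
    # Ratio of the incidence rate among the exposed to that among the unexposed
    risk_exposed = exposed_cases / exposed_n
    risk_unexposed = unexposed_cases / unexposed_n
    return risk_exposed / risk_unexposed

def attributable_fraction(rr):
    # (RR - 1) / RR: the fraction of exposed cases in excess of baseline
    return (rr - 1) / rr

# hypothetical cohort: 30 cases among 1,000 exposed, 10 among 1,000 unexposed
rr = relative_risk(30, 1000, 10, 1000)
af = attributable_fraction(rr)
```

Only when RR exceeds 2 does the attributable fraction exceed 50%, which is why a relative risk of two figures so often in specific-causation arguments; even then, applying the group figure to a particular plaintiff assumes the plaintiff resembles the study population.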

Conclusion

Griffis is right in his chief contention that the proper planning, conduct, and interpretation of statistical tests is hugely important to judicial gatekeeping of some expert witness opinion testimony under Federal Rule of Evidence 702 (and under Rule 703, too). Judicial and lawyer aptitude in this area is low, and needs to be bolstered.


Statistical Analysis Requires an Expert Witness with Statistical Expertise

November 13th, 2016

Christina K. Connearney sued her employer, Main Line Hospitals, for age discrimination. Main Line charged Connearney with fabricating medical records, but Connearney replied that the charge was merely a pretext. Connearney v. Main Line Hospitals, Inc., Civ. Action No. 15-02730, 2016 WL 6569292 (E.D. Pa. Nov. 4, 2016) [cited as Connearney]. Connearney’s legal counsel engaged Christopher Wright, an expert witness on “human resources,” for a variety of opinions, most of which were not relevant to the action. Alas, for Ms. Connearney, the few relevant opinions proffered by Wright were unreliable. On a Rule 702 motion, Judge Pappert excluded Wright from testifying at trial.

Although not a statistician, Wright sought to offer his statistical analysis in support of the age discrimination claim. Connearney at *4. According to Judge Pappert’s opinion, Wright had taken just two classes in statistics, but perhaps His Honor meant two courses. (Wright Dep., at 10:3–4.) If the latter, then Wright had more statistical training than most physicians who are often permitted to give bogus statistical opinions in health effects litigation. In 2015, the Medical College Admission Test apparently started to include some very basic questions on statistical concepts. Some medical schools now require an undergraduate course in statistics. See Harvard Medical School Requirements for Admission (2016). Most medical schools, however, still do not require statistical training for their entering students. See Veritas Prep, “How to Select Undergraduate Premed Coursework” (Dec. 5, 2011); “Georgetown College Course Requirements for Medical School” (2016).

Regardless of formal training, or lack thereof, Christopher Wright demonstrated a profound ignorance of, and disregard for, statistical concepts. (Wright Dep., at 10:15–12:10; 28:6–14.) Wright was shown to be the wrong expert witness for the job by his inability to define statistical significance. When asked what he understood to be a “statistically significant sample,” Wright gave a meaningless, incoherent answer:

“I think it depends on the environment that you’re analyzing. If you look at things like political polls, you and I wouldn’t necessarily say that serving [sic] 1 percent of a population is a statistically significant sample, yet it is the methodology that’s used in the political polls. In the HR field, you tend to not limit yourself to statistical sampling because you then would miss outliers. So, most HR statistical work tends to be let’s look at the entire population of whatever it is we’re looking at and go from there.”

Connearney at *5 (Wright Dep., at 10:15–11:7). When questioned again, more specifically on the meaning of statistical significance, Wright demonstrated his complete ignorance of the subject:

Q: And do you recall the testimony it’s generally around 85 to 90 employees at any given time, the ER [emergency room]?

A: I don’t recall that specific number, no.

Q: And four employees out of 85 or 90 is about what, 5 or 6 percent?

A: I’m agreeing with your math, yes.

Q: Is that a statistically significant sample?

A: In the HR [human resources] field it sure is, yes.

Q: Based on what?

A: Well, if one employee had been hit, physically struck, by their boss, that’s less than 5 percent. That’s statistically significant.

Connearney at *5 n.5 (Wright Dep., at 28:6–14).

In support of his opinion about “disparate treatment,” Wright’s report contained nothing more than a naked comparison of two raw percentages and a causal conclusion, without any statistical analysis. Even for this simplistic comparison of rates, Wright failed to explain how he obtained the percentages in a way that would have permitted the parties and the trial court to understand his computation and his comparisons. Without a statistical analysis, the trial court concluded that Wright had failed to show that the disparity in termination rates between younger and older employees was unlikely to be the product of random chance. See also Moultrie v. Martin, 690 F.2d 1078 (4th Cir. 1982) (rejecting writ of habeas corpus when petitioner failed to support claim of grand jury race discrimination with anything other than the numbers of white and black grand jurors).
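The sort of analysis the court found missing is not elaborate. Here is a minimal sketch of a two-sample z-test for the difference between two termination rates, with hypothetical counts rather than the actual record evidence:

```python
from math import sqrt, erfc

def two_proportion_z_test(x1, n1, x2, n2):
    # Two-sided z-test for the difference between two proportions,
    # using the pooled rate estimated under the null of no difference
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail probability
    return z, p_value

# hypothetical: 4 of 30 older employees terminated vs. 2 of 60 younger
z, p = two_proportion_z_test(4, 30, 2, 60)
```

In this made-up example, the older employees’ termination rate is four times the younger employees’, yet the two-sided p-value is roughly 0.07: the bare comparison of raw percentages, standing alone, cannot exclude chance.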

Although Wright gave the wrong definition of statistical significance, the trial court relied upon judges of the Third Circuit who also did not get the definition quite right. The trial court cited a 2010 case in the Circuit, which conflated substantive and statistical significance and then gave a questionable definition of statistical significance:

“The Supreme Court has not provided any definitive guidance about when statistical evidence is sufficiently substantial, but a leading treatise notes that ‘[t]he most widely used means of showing that an observed disparity in outcomes is sufficiently substantial to satisfy the plaintiff’s burden of proving adverse impact is to show that the disparity is sufficiently large that it is highly unlikely to have occurred at random.’ This is typically done by the use of tests of statistical significance, which determine the probability of the observed disparity obtaining by chance.”

See Connearney at *6 & n.7, citing and quoting from Stagi v. National RR Passenger Corp., 391 Fed. Appx. 133, 137 (3d Cir. 2010) (emphasis added) (internal citation omitted). Ultimately, however, this was all harmless error on the way to the right result.

Benhaim v. St. Germain – Supreme Court of Canada Wrestles With Probability

November 11th, 2016

On November 10, 2016, the Supreme Court of Canada handed down a divided (four-to-three) decision in a medical malpractice case, which involved statistical evidence, or rather probabilistic inference. Benhaim v. St-Germain, 2016 SCC 48 (Nov. 10, 2016). The case involved an appeal from a Quebec trial court, and the Quebec Court of Appeal, and some issues peculiar to Canadian lawyers. For one thing, Canadian law does not appear to follow the lost-chance doctrine outlined in the American Law Institute’s Restatement. The consequence seems to be that negligent omissions in the professional liability context are assessed for their causal effect by the Canadian “balance of probabilities” standard.

The facts were reasonably clear, although their interpretation was disputed. In November 2005, Mr. Émond was 44 years old, a lifelong non-smoker, and in good health. At his annual physical with general practitioner Dr. Albert Benhaim, Émond had a chest X-ray (CXR). Benhaim at 11, ¶6. Remarkably, neither the majority nor the dissent commented upon the lack of reasonable medical necessity for a CXR in a healthy, non-smoking 40-something male. Few insurers in the United States would have paid for such a procedure. Maybe Canadian healthcare is more expansive than what we see in the United States.

The radiologist reviewing Mr. Émond’s CXR reported a 1.5 to 2.0 cm solitary lesion, and suggested a review with previous CXRs and a recommendation for a CT scan of the thorax. Dr. Benhaim did not follow the radiologist’s suggestions, but Mr. Émond did have a repeat CXR two months later, on January 17, 2006, which was interpreted as unchanged. A recommendation for a follow-up third CXR in four months was not acted upon. Benhaim at 11, ¶7. The trial court found that the defendant physicians deviated from the professional standard of care, a finding from which there was no appeal.

Mr. Émond did have a follow-up CXR at the end of 2006, on December 4, 2006, which showed that the solitary lung nodule had grown. Follow up CT and PET scans confirmed that Mr. Émond had Stage IV lung cancer. Id.

The issues in controversy turned on the staging of Mr. Émond’s lung cancer at the time of his first CXR, in November 2005, and the medical consequences of the delay in diagnosis. Plaintiffs presented expert witness opinion testimony that Mr. Émond’s lung cancer was only Stage I (or at most IIA) at initial radiographic discovery of a nodule, and that he was at Stage III or IV in December 2006, when CT and PET scans confirmed the actual diagnosis of lung cancer. In the view of plaintiffs’ expert witnesses, the delay in diagnosis, and the accompanying growth of the tumor and change from Stage I to IV, dramatically decreased Émond’s chance of survival. Id. at 13, ¶¶15-16. Indeed, plaintiffs’ expert witnesses opined that had Mr. Émond been timely diagnosed and treated in November 2005, he probably would have been cured.

The defense expert witness, Dr. Ferraro, testified that Mr. Émond’s lung cancer was Stage III or IV in November 2005, when the radiographic nodule was first seen, and his chances of survival at that time were already quite poor. According to Dr. Ferraro, earlier intervention and treatment would probably not have been successful in curing Mr. Émond, and the delay in diagnosis was not a cause of his death.

The trial court rejected plaintiffs’ expert witnesses’ opinions on factual grounds. These witnesses had argued that Mr. Émond’s lung cancer was at Stage I in November 2005 because the lung nodule was less than 3 cm., and because Mr. Émond was asymptomatic and in good health. These three points of contention were clearly unreliable because they were all present in January 2007, when Mr. Émond was diagnosed with Stage IV cancer, according to all the expert witnesses. Every point cited by plaintiffs’ expert witnesses in support of their staging failed to discriminate Stage I from Stage III. In Her Honor’s opinion, the lung cancer was probably Stage III in November 2005, and this staging implied a poor prognosis on all the expert witnesses’ opinions. The failure to diagnose until late 2006 was thus not, on the “balance of probabilities,” a cause of death. Id. at 15, ¶21.

The intermediate appellate court reversed on grounds of a presumption of causation, which arises when the defendant’s negligence interferes with the plaintiff’s ability to show causation, and there is some independent evidence of causation to support the case. I will leave this presumption, which the Supreme Court of Canada held inappropriate on the facts of this case, to Canadian lawyers to debate. What was more interesting was the independent evidence adduced by plaintiffs. This evidence consisted of statistical evidence in the form of a generality: 78 percent of fortuitously discovered lung cancers are at Stage I, which in turn is associated with a cure rate of 70 percent. Id. at 18, ¶30.

The plaintiffs’ witnesses hoped to apply this generality to this case, notwithstanding that Émond’s nodule was close to 2 cm. on CXR, that the general statistic was based upon more sensitive CT studies, and that Émond had been a non-smoker (which may have influenced tumor growth and staging). Furthermore, there was an additional, ominous finding in Mr. Émond’s first CXR, of hilar prominence, which supported the defense’s differentiation of his case from the generality of fortuitously discovered cancers (presumably small, solitary lung nodules without hilar involvement). Id. at 44, ¶83.

The trial court rejected the inference from the group statistic of 70% survival to the conclusion that Mr. Émond had a 70% probability of survival. Tellingly, there was no discussion of the variance for the 70% figure; nor any mention of relevant subgroups. The Court of Appeal, however, would have turned this statistic into a binding presumption by accepting the 78 percent figure as providing strong evidence that the 70% survival rate pertained to Mr. Émond. The intermediate appellate court would then have taken the group survival rate as providing a more-likely-than-not conclusion about Mr. Émond, while rejecting the defense expert witness’s statistics as mere speculation. Id. at 36, ¶67.

Adopting a skeptical stance with respect to probabilistic evidence, the Supreme Court reversed the Quebec Court of Appeal’s reversal of the trial court’s judgment. The Court cited Richard Wright and Jonathan Cohen’s criticisms of probabilistic evidence (and Cohen’s Gatecrasher’s Paradox), and urged caution in applying class or group statistics to generate probabilities that class members share the group characteristic.

“Appellate courts should generally not interfere with a trial judge’s decision not to draw an inference from a general statistic to a particular case. Statistics themselves are silent about whether the particular parties before the court would have conformed to the trend or been an exception from it. Without an evidentiary bridge to the specific circumstances of the plaintiff, statistical evidence is of little assistance. For this reason, such general trends are not determinative in particular cases. What inferences follow from such evidence — whether the generalization that a statistic represents is instantiated in the particular case — is a matter for the trier of fact. This determination must be made with reference to the whole of the evidence.”

Benhaim at 39, ¶¶74-75 (internal citations omitted).

To some extent, the Supreme Court’s comments about statistical evidence were rather wide of the mark. The 78% statistic was based upon a high level of generality, namely all cases, without regard for the size of the radiographically discovered lesion, the manner of discovery (CXR versus CT), the presence or absence of hilar pathology, or the group’s or individual’s smoking status. In the context of the facts of the case, however, the trial court clearly had a factual basis for resisting the application of the group statistic (78% of fortuitously discovered tumors were Stage I, with 70% five-year survival).

The Canadian Supreme Court seems to have navigated these probabilistic waters fairly adeptly, although the majority opinion contains broad brush generalities and inaccuracies, which will, no doubt, show up in future lower court cases. For instance:

“This is because the law requires proof of causation only on a balance of probabilities, whereas scientific or medical experts often require a higher degree of certainty before drawing conclusions on causation (p. 330). Simply put, scientific causation and factual causation for legal purposes are two different things.”

Benhaim at 24, ¶47. The Court cited legal precedent for its observation, and not any scientific treatises. And then, the Supreme Court suggested that all one needs to prevail in a tort case in Canada is a medical expert witness who speculates:

“Trial judges are empowered to make legal determinations even where medical experts are not able to express an opinion with certainty.”

Benhaim at 37, ¶72. Clearly dictum on the facts of Benhaim, but it seems that judges in Canada are like those in the United States. Black robes empower them to do what mere scientists could not do. If we were to ignore the holding of Benhaim, we might think that all one needs in Canada is a medical expert who speculates.