TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Zoloft MDL Relieves Matrixx Depression

January 30th, 2015

When the Supreme Court delivered its decision in Matrixx Initiatives, Inc. v. Siracusano, 131 S. Ct. 1309 (2011), a colleague, David Venderbush of Alston & Bird LLP, and I wrote a Washington Legal Foundation Legal Backgrounder, in which we predicted that plaintiffs’ counsel would distort the holding and inflate the dicta of the opinion. Schachtman & Venderbush, “Matrixx Unbounded: High Court’s Ruling Needlessly Complicates Scientific Evidence Principles,” 26 (14) Legal Backgrounder (June 17, 2011)[1]. Our prediction was sadly all too accurate. Not only was the context of the Matrixx decision distorted, but several district courts appeared to adopt the dicta on statistical significance as though it represented the holding of the case[2].

The Matrixx decision, along with the few district court opinions that had embraced its dicta[3], was urged as the basis for denying a defense challenge to the proffered testimony of Dr. Anick Bérard, a Canadian perinatal epidemiologist, in the Zoloft MDL. The trial court, however, correctly discerned several methodological shortcomings and failures, including Dr. Bérard’s reliance upon claims of statistical significance from studies that conducted dozens and hundreds of multiple comparisons. See In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., MDL No. 2342; 12-md-2342, 2014 U.S. Dist. LEXIS 87592; 2014 WL 2921648 (E.D. Pa. June 27, 2014) (Rufe, J.).
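The mischief of multiple comparisons is easy to quantify. If a study tests k independent hypotheses, each with a true null, at the conventional 5 percent level, the probability of at least one nominally “significant” false-positive result is 1 - (0.95)^k. A minimal Python sketch (my illustration, assuming independent tests; nothing here comes from the record):

```python
# Family-wise error inflation under multiple comparisons, assuming
# independent tests of true null hypotheses at alpha = 0.05.
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.05

for k in (1, 20, 100):
    fwer = 1 - (1 - alpha) ** k                    # analytic P(>= 1 false positive)
    p_values = rng.uniform(size=(50_000, k))       # p-values are uniform under a true null
    sim_fwer = np.mean((p_values < alpha).any(axis=1))
    print(f"{k:>3} comparisons: analytic {fwer:.2f}, simulated {sim_fwer:.2f}")
```

With 100 comparisons, a handful of nominally significant associations is just what chance alone would be expected to produce.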

Plaintiffs (through their Plaintiffs’ Steering Committee (PSC) in the Zoloft MDL) were undaunted and moved for reconsideration, asserting that the MDL trial court had failed to give appropriate weight to the Supreme Court’s decision in Matrixx, and a Third Circuit decision in DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941 (3d Cir. 1990). The MDL trial judge, however, deftly rebuffed the plaintiffs’ use of Matrixx, and their attempt to banish consideration of random error in the interpretation of epidemiologic studies. In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., MDL No. 2342; 12-md-2342, 2015 WL 314149 (E.D. Pa. Jan. 23, 2015) (Rufe, J.) (denying PSC’s motion for reconsideration).

In rejecting the motion for reconsideration, the Zoloft MDL trial judge noted that the PSC had previously cited Matrixx, and that the Court had addressed the case in its earlier ruling. 2015 WL 314149, at *2-3. The MDL Court then proceeded to expand upon its earlier ruling, and to explain how Matrixx was largely irrelevant to the Rule 702 context of Pfizer’s challenge to Dr. Bérard. There were, to be sure, some studies with nominally statistically significant results, for some birth defects, among children of mothers who took Zoloft in their first trimester of pregnancy. As Judge Rufe explained, statistical significance, or the lack thereof, was only one item in a fairly long list of methodological deficiencies in Dr. Bérard’s causation opinions:

“The [original] opinion set forth a detailed and multi-faceted rationale for finding Dr. Bérard’s testimony unreliable, including her inattention to the principles of replication and statistical significance, her use of certain principles and methods without demonstrating either that they are recognized by her scientific community or that they should otherwise be considered scientifically valid, the unreliability of conclusions drawn without adequate hypothesis testing, the unreliability of opinions supported by a ‛cherry-picked’ sub-set of research selected because it was supportive of her opinions (without adequately addressing non-supportive findings), and Dr. Bérard’s failure to reconcile her currently expressed opinions with her prior opinions and her published, peer-reviewed research. Taking into account all these factors, as well as others discussed in the Opinion, the Court found that Dr. Bérard departed from well-established epidemiological principles and methods, and that her opinion on human causation must be excluded.”

Id. at *1.

In citing the multiple deficiencies of the proffered expert witness, the Zoloft MDL Court thus put its decision well within the scope of the Third Circuit’s recent precedent of affirming the exclusion of Dr. Bennet Omalu, in Pritchard v. Dow Agro Sciences, 430 F. App’x 102, 104 (3d Cir. 2011). The Zoloft MDL Court further defended its ruling by pointing out that it had not created a legal standard requiring statistical significance, but rather had made a factual finding that an epidemiologist, such as the challenged witness, Dr. Anick Bérard, would use some measure of statistical significance in reaching conclusions in her discipline of epidemiology. 2015 WL 314149, at *2[4].

On the plaintiffs’ motion for reconsideration, the Zoloft Court revisited the Matrixx case, properly distinguishing the case as a securities fraud case about materiality of non-disclosed information, not about causation. 2015 WL 314149, at *4. Although the MDL Court could and should have identified the Matrixx language as clearly obiter dicta, it did confidently distinguish the Supreme Court holding about pleading materiality from its own task of gatekeeping expert witness testimony on causation in a products liability case:

“Because the facts and procedural posture of the Zoloft MDL are so dissimilar from those presented in Matrixx, this Court reviewed but did not rely upon Matrixx in reaching its decision regarding Dr. Bérard. However, even accepting the PSC’s interpretation of Matrixx, the Court’s Opinion is consistent with that ruling, as the Court reviewed Dr. Bérard’s methodology as a whole, and did not apply a bright-line rule requiring statistically significant findings.”

Id. at *4.

In mounting their challenge to the MDL Court’s earlier ruling, the Zoloft plaintiffs asserted that the Court had failed to credit Dr. Bérard’s reliance upon what she called the “Rothman approach.” This approach, attributed to Professor Kenneth Rothman, had received some attention in the Bendectin litigation in the Third Circuit, where plaintiffs sought to be excused from their failure to show statistically significant associations when claiming causation between maternal use of Bendectin and infant birth defects. DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941 (3d Cir. 1990). The Zoloft MDL Court pointed out that the Circuit, in DeLuca, had never affirmatively endorsed Professor Rothman’s “approach,” but had reversed and remanded the Bendectin case to the district court for a hearing under Rule 702:

“by directing such an overall evaluation, however, we do not mean to reject at this point Merrell Dow’s contention that a showing of a .05 level of statistical significance should be a threshold requirement for any statistical analysis concluding that Bendectin is a teratogen regardless of the presence of other indicia of reliability. That contention will need to be addressed on remand. The root issue it poses is what risk of what type of error the judicial system is willing to tolerate. This is not an easy issue to resolve and one possible resolution is a conclusion that the system should not tolerate any expert opinion rooted in statistical analysis where the results of the underlying studies are not significant at a .05 level.”

2015 WL 314149, at *4 (quoting from DeLuca, 911 F.2d at 955). After remand, the district court excluded the DeLuca plaintiffs’ expert witnesses, and granted summary judgment, based upon the dubious methods employed by plaintiffs’ expert witnesses in cherry picking data, recalculating risk ratios in published studies, and ignoring bias and confounding in studies. The Third Circuit affirmed the judgment for Merrell Dow. DeLuca v. Merrell Dow Pharms., Inc., 791 F. Supp. 1042 (D.N.J. 1992), aff’d, 6 F.3d 778 (3d Cir. 1993).

In the Zoloft MDL, the plaintiffs not only offered an erroneous interpretation of the Third Circuit’s precedents in DeLuca, they also failed to show that the “Rothman” approach had become generally accepted in the more than two decades since DeLuca. 2015 WL 314149, at *4. Indeed, the hearing record was quite muddled about what the “Rothman” approach involved, other than glib, vague suggestions that the approach would have countenanced Dr. Bérard’s selective, over-reaching analysis of the extant epidemiologic studies. The plaintiffs did not call Rothman as an expert witness; nor did they offer any of Rothman’s publications as exhibits at the Zoloft hearing. Although Professor Rothman has criticized the overemphasis upon p-values and significance testing, he has never suggested that researchers and scientists should ignore random error in interpreting research data. Nevertheless, plaintiffs attempted to invoke some vague notion of a Rothman approach that would ignore confidence intervals, attained significance probability, multiplicity, bias, and confounding. Ultimately, the MDL Court would have none of it. The Court held that the Rothman approach (whatever that is), as applied by Dr. Bérard, did not satisfy Rule 702.

The testimony at the Rule 702 hearing on the so-called “Rothman approach” had been sketchy at best. Dr. Bérard protested, perhaps too much, when asked about her having ignored p-values:

“I’m not the only one saying that. It’s really the evolution of the thinking of the importance of statistical significance. One of my professors and also a friend of mine at Harvard, Ken Rothman, actually wrote on it – wrote on the topic. And in his book at the end he says obviously what I just said, validity should not be confused with precision, but the third bullet point, it’s saying that the lack of statistical significance does not invalidate results because sometimes you are in the context of rare events, few cases, few exposed cases, small sample size, exactly – you know even if you start with hundreds of thousands of pregnancies because you are looking at rare events and if you want to stratify by exposure category, well your stratum becomes smaller and smaller and your precision decreases. I’m not the only one saying that. Ken Rothman says it as well, so I’m not different from the others. And if you look at many of the studies published nowadays, they also discuss that as well.”

Notes of Testimony of Dr. Anick Bérard, at 76:21-77:14 (April 9, 2014). See also Notes of Testimony of Dr. Anick Bérard, at 211 (April 11, 2014) (discussing non-statistically significant findings as a “trend,” and asserting that the lack of a significant finding does not mean that there is “no effect”). Bérard’s invocation of Rothman here is accurate but unhelpful. Rothman and Bérard are not alone in insisting that confidence intervals provide a measure of the precision of an estimate, and that we should be careful not to interpret the lack of significance to mean no effect. But the lack of significance cannot be used to interpret data to show an effect.
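The distinction between precision and validity can be made concrete. In the sketch below (my hypothetical numbers, not anything from the record), two studies report the same rate ratio of 1.5; the larger study yields a narrow 95% interval excluding 1.0, and the smaller study yields a wide interval straddling it. The wide interval signals imprecision; it neither demonstrates an effect nor rules one out:

```python
# Hypothetical illustration: the same rate-ratio estimate with different
# precision. The log-scale standard errors are assumed, not from any study.
import numpy as np
from scipy import stats

def summarize(rr_hat, se_log):
    lo, hi = np.exp(np.log(rr_hat) + np.array([-1.96, 1.96]) * se_log)
    p = 2 * stats.norm.sf(abs(np.log(rr_hat)) / se_log)  # two-sided p against RR = 1
    return lo, hi, p

for label, se_log in [("large study", 0.10), ("small study", 0.45)]:
    lo, hi, p = summarize(1.5, se_log)
    print(f"{label}: RR = 1.5, 95% CI ({lo:.2f}, {hi:.2f}), p = {p:.3g}")
```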

At the Rule 702 hearing, the PSC tried to bolster Dr. Bérard’s supposed reliance upon the “Rothman approach” in cross-examining Pfizer’s expert witness, Dr. Stephen Kimmel:

“Q. You know who Dr. Rothman is, the epidemiologist?
A. Yes.
Q. You actually took a course from Dr. Rothman, didn’t you?
A. I did when I was a student way back.
Q. He is a well-known epidemiologist, isn’t he?
A. Yes, he is.
Q. He has published this book, Modern Epidemiology. Do you have a copy of this?
A. I do.
Q. Do you – Have you ever read it?
A. I read his earlier edition. I have not read the most recent edition.
Q. There’s two other authors, Sander Greenland and Tim Lash. Do you know either one of them?
A. I know Sander. I don’t know Tim.
Q. Dr. Rothman has some – he has written about confidence intervals and statistical significance for some time, hasn’t he?
A. He has.
Q. Do you agree with him that statistical significance is not a matter of validity. It’s a matter of precision?
A. It’s a matter of – well, confidence intervals are matters of precision. P-values are not.
Q. Okay. I want to put up a table and see if you are in agreement with Dr. Rothman. This is the third edition of Modern Epidemiology. And he has – and ignore my brother’s handwriting. But there is an hypothesized rate ratio under 10-3. It says: p-value function from which one can find all confidence limits for a hypothetical study with a rate ratio estimate of 3.1. Do you see that there?
A. Yes. I don’t see the top of the figure, not that it matters.
Q. I want to make sure. The way I understand this, he is giving us a hypothesis that we have a relative risk of 3.1 and it [presumably a 95% confidence interval] crosses 1, meaning it’s not statistically significant. Is that fair?
A. Well, if you are using a value of .05, yes. And again, if this is a single test and there’s a lot of things that go behind it. But, yes, so this is a total hypothetical.
Q. Yes.
A. I’m sorry. He’s saying here is a hypothetical based on math. And so here is – this is what we would propose.
Q. Yes, I want to highlight what he says about this figure and get your thoughts on it. He says:
The message of figure 10-3 is that the example data are more compatible with a moderate to strong association than with no association, assuming the statistical model used to construct the function is correct.
A. Yes.
Q. Would you agree with that statement?
A. Assuming the statistical model is correct. And the problem is, this is a hypothetical.
Q. Sure. So let’s just assume. So what this means to sort of put some meat on the bone, this means that although we cross 1 and therefore are statistically significant [sic, non-significant], he says the more likely truth here is that there is a moderate to strong effect rather than no effect?
A. Well, you know he has hypothesized this. This is not used in common methods practice in pharmacoepi. Dr. Rothman has lots of ideas but it’s not part of our standard scientific method.

Notes of Testimony of Dr. Stephen Kimmel, at 126:2 to 128:20.

Nothing very concrete about the “Rothman approach” was put before the MDL Court, either through Dr. Bérard or Dr. Kimmel. There are, however, other instructive aspects to the plaintiffs’ counsel’s examination. First, the referenced portion of the text, Modern Epidemiology, is a discussion of p-value functions, not of p-values or of confidence intervals per se. Modern Epidemiology at 158-59 (3d ed. 2008). Dr. Bérard never discussed p-value functions in her report or in her testimony, and Dr. Kimmel testified, without contradiction, that such p-value functions are “not used in common methods practice.” Second, the plaintiffs’ counsel never marked and offered the Rothman text as an exhibit for the MDL Court to consider. Third, the cross-examiner first asked about the implication for a hypothetical association, and then, when he wanted to “put some meat on the bone,” changed the word used in Rothman’s text, “association,” to “effect.” The word “effect” does not appear in Rothman’s text in the referenced discussion of p-value functions. Fortunately, the MDL Court was not poisoned by the “meat on the bone.”
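For readers unfamiliar with the device, a p-value function simply plots, for each hypothesized value of the parameter, the two-sided p-value that the data yield against that hypothesis. A minimal sketch (my construction, not Rothman’s actual figure), with a log-scale standard error assumed so that the 95% interval around a rate ratio estimate of 3.1 barely crosses 1.0:

```python
# Sketch of a p-value function for a hypothetical rate-ratio estimate of 3.1.
# The log-scale standard error is assumed, chosen so the 95% CI crosses 1.0.
import numpy as np
from scipy import stats

rr_hat, se_log = 3.1, 0.6
rr0 = np.array([0.5, 1.0, 2.0, 3.1, 5.0, 10.0])   # hypothesized true rate ratios
p = 2 * stats.norm.sf(np.abs(np.log(rr_hat) - np.log(rr0)) / se_log)

lo, hi = np.exp(np.log(rr_hat) + np.array([-1.96, 1.96]) * se_log)
print(f"95% CI: ({lo:.2f}, {hi:.2f})")            # about (0.96, 10.05); "not significant"
for r, pv in zip(rr0, p):
    print(f"p-value against RR0 = {r:>4}: {pv:.2f}")
```

In this contrived example, the function peaks at the point estimate, and the p-value against a rate ratio of 1.0 is about 0.06. That is the sense in which such data may be “more compatible” with a moderate-to-strong association than with no association, even though they fail the conventional test; it is not a license to ignore random error.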

The Pit and the Pendulum

Another document glibly referenced but not provided to the MDL Court was the publication of Sir Austin Bradford Hill’s presidential address to the Royal Society of Medicine on causation. The MDL Court acknowledged that the PSC had argued that the emphasis upon statistical significance was contrary to Hill’s work and teaching. 2015 WL 314149, at *5. In the Court’s words:

“the PSC argues that the Court’s finding regarding the importance of statistical significance in the field of epidemiology is inconsistent with the work of Bradford Hill. The PSC points to a 1965 address by Sir Austin Bradford Hill, which it has not previously presented to the Court, except in opening statements of the Daubert hearings.20 The PSC failed to put forth evidence establishing that Bradford Hill’s statement that ‛I wonder whether the pendulum has not swung too far [in requiring statistical significance before drawing conclusions]’ has, in the decades since that 1965 address, altered the importance of statistical significance to scientists in the field of epidemiology.”

Id. This failure, identified by the Court, is hardly surprising. The snippet of a quotation from Hill would not sustain the plaintiffs’ sweeping generalization. The quoted language in context may help to explain why Hill’s paper was not provided:

“I wonder whether the pendulum has not swung too far – not only with the attentive pupils but even with the statisticians themselves. To decline to draw conclusions without standard errors can surely be just as silly? Fortunately I believe we have not yet gone so far as our friends in the USA where, I am told, some editors of journals will return an article because tests of significance have not been applied. Yet there are innumerable situations in which they are totally unnecessary – because the difference is grotesquely obvious, because it is negligible, or because, whether it be formally significant or not, it is too small to be of any practical importance. What is worse the glitter of the t table diverts attention from the inadequacies of the fare. Only a tithe, and an unknown tithe, of the factory personnel volunteer for some procedure or interview, 20% of patients treated in some particular way are lost to sight, 30% of a randomly-drawn sample are never contacted. The sample may, indeed, be akin to that of the man who, according to Swift, ‘had a mind to sell his house and carried a piece of brick in his pocket, which he showed as a pattern to encourage purchasers.’ The writer, the editor and the reader are unmoved. The magic formulae are there.”

Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 299 (1965).

In the Zoloft cases, no expert witness was prepared to state that the disparity was “grotesquely obvious,” or “negligible.” And Bradford Hill’s larger point was that bias and confounding often dwarf considerations of random error, and that there are many instances in which significance testing is unavailing or unhelpful. And in some studies, with large “effect sizes,” statistical significance testing may be beside the point.

Hill’s presidential address to the Royal Society of Medicine commemorated his successes in epidemiology, and we need only turn to Hill’s own work to see how prevalent was his use of measurements of significance probability. See, e.g., Richard Doll & Austin Bradford Hill, “Smoking and Carcinoma of the Lung: Preliminary Report,” Brit. Med. J. 740 (Sept. 30, 1950); Medical Research Council, “Streptomycin Treatment of Pulmonary Tuberculosis,” Brit. Med. J. 769 (Oct. 30, 1948).

Considering the misdirection on Rothman and on Hill, the Zoloft MDL Court did an admirable job in unraveling the Matrixx trap set by counsel. The Court insisted upon parsing the Bradford Hill factors[5], over Pfizer’s objection, despite the plaintiffs’ failure to show “an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance,” which Bradford Hill insisted was the prerequisite for the exploration of the nine factors he set out in his classic paper. Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 295 (1965). Given the outcome, the Court’s questionable indulgence of plaintiffs’ position was ultimately harmless.


[1] See also “The Matrixx – A Comedy of Errors,” and “Matrixx Unloaded” (Mar. 29, 2011), “The Matrixx Oversold,” and “De-Zincing the Matrixx.”

[2] See “Siracusano Dicta Infects Daubert Decisions” (Sept. 22, 2012).

[3] See, e.g., In re Chantix (Varenicline) Prods. Liab. Litig., 2012 U.S. Dist. LEXIS 130144, at *22 (N.D. Ala. 2012); Cheek v. Wyeth Pharm. Inc., 2012 U.S. Dist. LEXIS 123485 (E.D. Pa. Aug. 30, 2012); In re Celexa & Lexapro Prods. Liab. Litig., 2013 WL 791780 (E.D. Mo. 2013).

[4] The Court’s reasoning on this point begged the question whether an ordinary clinician, ignorant of the standards, requirements, and niceties of statistical reasoning and inference, would be allowed to testify, unconstrained by any principled epidemiologic reasoning about random or systematic error. It is hard to imagine that Rule 702 would countenance such an end-run around the requirements of sound science.

[5] Adhering to Bradford Hill’s own admonition might have saved the Court the confusion of describing statistical significance as a measure of strength of association. 2015 WL 314149, at *2.

The Lie Detector and Wonder Woman – Quirks and Quacks of Legal History

January 27th, 2015

From 1923 until the United States Supreme Court decided the Daubert case in 1993, Frye was cited as “controlling authority” on questions of the admissibility of scientific opinion testimony and test results. The decision is infuriatingly cryptic and unhelpful as to the background or context of the specific case, as well as to how it might be applied to future controversies. Of the opinion’s 669 words, these are the ones typically cited as the guiding “rule” with respect to expert witness opinion testimony:

“Just when a scientific principle or discovery crosses the line between the experimental and demonstrable stages is difficult to define. Somewhere in this twilight zone the evidential force of the principle must be recognized, and while the courts will go a long way in admitting expert testimony deduced from a well recognized scientific principle or discovery, the thing from which the deduction is made must be sufficiently established to have gained general acceptance in the particular field in which it belongs.”

Frye v. United States, 293 F. 1013, 1014 (D.C. Cir. 1923).

As most scholars of evidence realize, the back story of the Frye case is rich and bizarre. The expert witness involved, William Marston, was a lawyer and scientist, who had developed a systolic blood pressure test to be used as a “lie detector.” Marston was also an advocate of free love and, with his wife and his mistress, the inventor of Wonder Woman and her lasso of truth.

Jill Lepore, a professor of history at Harvard University, has written a historical account of Marston and his colleagues. Jill Lepore, The Secret History of Wonder Woman (N.Y. 2014). More recently, Lepore has written an important law review article on the historical and legal record of the Frye case, which lies concealed behind the terse 669 words of the Court of Appeals’ opinion. Jill Lepore, “On Evidence: Proving Frye as a Matter of Law, Science, and History,” 124 Yale L.J. 1092 (2015).

Lepore’s history is an important gloss on the Frye case, but her paper points to a larger, more prevalent, chronic problem in the law, which especially afflicts judicial decisions of scientific or technical issues. As an historian, Lepore is troubled, as we all should be, by the censoring, selecting, suppressing, and distorting of facts that go into judicial decisions. From cases and their holdings, lawyers are taught to infer rules that guide their conduct at the bar, and their clients’ conduct and expectations, but everyone requires fair access to the evidence to determine what facts are material to decision.

As Professor Lepore puts it:

“Marston is missing from Frye because the law of evidence, case law, the case method, and the conventions of legal scholarship — together, and relentlessly — hide facts.”

Id. at 1097. Generalizing from Marston and the Frye case, Lepore notes that:

“Case law is like that, too, except that it doesn’t only fail to notice details; it conceals them.”

Id. at 1099.

Lepore documents that Marston’s psychological research was rife with cherry picking and data dredging. Id. at 1113-14. Despite his magna cum laude degree in philosophy from Harvard College, his LL.B. from Harvard Law School (with no particular distinction), and his Ph.D. from Harvard University, Marston was not a rigorous scientist. In exploring the historical and legal record not recounted in the Frye decision, Lepore provides a wonderful early example of what has become a familiar phenomenon of modern litigation: an expert witness who seeks to achieve acceptance for a dubious opinion or device in the courtroom rather than in the court of scientific opinion. Id. at 1122. The trial judge in Frye’s murder case, Justice McCoy, was an astute judge, modest about his own ability to evaluate the validity of Marston’s opinions, but he had more than sufficient perspicacity to discern that Marston’s investigation was “wildly unscientific,” with no control groups. Id. at 1135. The trial record of defense counsel’s proffer, and Justice McCoy’s rulings and comments from the bench, reproduced in Lepore’s article, anticipate and predict much of the scholarship surrounding both the Frye and Daubert cases.

Lepore complains that the important historical record, including Marston’s correspondence with Professor Wigmore, the criminal fraud charges against Marston, and the correspondence of Frye’s lawyers, lies “filed, undigitized” in various archives. Id. at 1150. Although Professor Lepore tirelessly cites internet sources when available, she could easily have made the primary historical materials available to all, using readily available internet technology. Lepore’s main thesis should encourage lawyers and legal scholars to look beyond appellate decisions as the data for legal analysis.

The Erosion of Employer Immunity in Tort Litigation

January 20th, 2015

The present workman’s compensation system in the United States has serious flaws. Scheduled awards are inadequate in some states, and their inadequacy fosters specious litigation against remote third parties who are not able to control the workplace use of hazardous materials. In many states, premiums are set on an industry-wide basis, and thus careless employers are given no incentive to improve workplace hygiene. With awards low, and without the need to rate individual employers, compensation insurers do not adequately inspect and control individual employers’ conduct. Workman’s compensation statutes provide a lien against any third-party recovery, which means that employers (and their insurers) will be rewarded for their negligence if injured employees can frame liability suits against third parties, such as suppliers of raw materials to the employers.

For the most part, organized labor and management reached their great compromise over occupational injury litigation in the years from about 1911 through the early 1930s. Before the passage of the various compensation acts, employees had common law negligence actions against employers for deviations from reasonable care. In some parts of the country, juries were extremely sympathetic to injured workers, and equally hostile to employers. At the same time, employers had powerful defenses in the form of contributory negligence, which barred claims by workers who were the least bit careless of their own safety. The fellow-worker rule, assumption of risk, and statutes of limitations further diminished workers’ likelihood of success in pursuing tort claims. One option that was not on the table in the negotiations was opening up the liability of remote vendors who sold to employers, as a way to mitigate the hardships of the common law tort system. Remote suppliers had even more potent defenses in the form of privity of contract, intervening and superseding negligence of the employers and employees, and all the other defenses that employers enjoyed. More important, however, the interlocutors realized that employers controlled the workplace, and had the greatest opportunity to prevent industrial injuries and occupational disease. When the workman’s compensation bargain was struck, labor knew that the scheduled awards would be workers’ sole or main source of compensation.

Worker’s compensation statutes made recovery for most injuries a certainty, with schedules of damages that were deeply discounted from what might be had in a jury trial. In return for well-nigh absolute liability, employers gained certainty of outcome, reduction of administrative costs, and immunity from tort liability for all but intentional harms. The remedial compensation statutes gave employers immunity, but they did not eradicate the basic common law bases for suits against employers. But for the worker’s compensation statutes, employees would have rights of action against employers. Gaps in the compensation acts translated into gaps in immunity, and reversion to the common law of negligence.

The predicate for the “deal” began to disintegrate after World War II. For one thing, changes in tort law diminished the defenses that employers had exercised so effectively before the deal. Contributory negligence gave way to comparative negligence.  Assumption of risk defenses were curtailed, and the fellow-servant rule was severely modified or abandoned.

Just when Labor might have been feeling consumed by buyer’s remorse over its deal, strict liability principles began to replace privity doctrines. In 1965, the American Law Institute adopted § 402A, which provided for “Special Liability of Seller of Product for Physical Harm to User or Consumer,” based upon concerns of unequal knowledge of defects and latent hazards of products sold to consumers. Liability followed for harm caused by a product irrespective of privity of contract or warranty, and even if “the seller has exercised all possible care in the preparation and sale of his product.” Restatement (Second) of Torts § 402A(2)(a), (b) (1965).

Section 402A became the vehicle for injured workers to ditch their capped damages in worker’s compensation court, and to put their cases back in front of juries, with the prospect of unlimited awards for non-economic damages. Although instigated by the perceived imbalance of knowledge between manufacturers and buyers with respect to design and manufacturing defects, strict liability doctrine quickly became a vehicle for redressing inadequacies in the workman’s compensation systems. What was problematic, however, was that there was often no inequality of knowledge between seller and purchaser, and no hidden or latent hazard in the product or material.

There are exceptions to the exclusivity of workman’s compensation remedies against employers. One exception, available in most states, is for intentional torts committed by employers. The scienter requirement for intentional torts has allowed only very few cases to proceed against employers in tort. A bigger gap in immunity, however, was opened in Pennsylvania, where workers regained their common law right to sue employers for negligence and other torts, for occupational diseases that manifest more than 300 weeks after last employment. Section 301(c)(2) of the Pennsylvania Workers’ Compensation Act, 77 P.S. § 411(2), removes these delayed-manifestation occupational disease claims from the scope of Pennsylvania’s Act. The Pennsylvania Supreme Court filled in the obvious logical gap: if the Act did not apply, then the employer had no immunity against a common law cause of action, which was never abolished, and which was unavailable only when there was a statutory remedy under the Act. Tooey v. AK Steel Corp., 81 A.3d 851 (Pa. 2013); “Pennsylvania Workers Regain Their Right of Action in Tort against Employers for Latent Occupational Diseases” (Feb. 14, 2014). See also Gidley v. W.R. Grace Co., 717 P.2d 21 (Mont. 1986).

The Tooey decision has the potential to open an important door for plaintiffs and defendants alike. With employer immunity erased, the employer’s duty of reasonable care to protect the health and safety of its employees can once again be harnessed to improve the lot of workers, without concocting Rube Goldberg theories of liability against remote suppliers and premises owners. Juries will see the entire evidentiary case, including the actions and omissions of employers, which will tend to exculpate remote suppliers. Employers will be given incentives to train employees in workplace safety, and to document their efforts. Employers will assert comparative negligence and assumption of risk defenses, which will give the lie to the plaintiffs’ claims of inadequate warnings from the remote suppliers. Tooey, and the prospect of employer liability, has the potential to improve the truth-finding ability of juries in tort cases.

Folta v. Ferro Engineering, 2014 IL App (1st) 123219.

In June of last year, the Illinois intermediate appellate court followed the Pennsylvania Supreme Court’s lead in Tooey, and decided to allow a direct action against an employer when the employee’s claim was not within the scope of the Illinois workers’ compensation act. Folta v. Ferro Eng’g, 14 N.E.3d 717, 729 (Ill. App. Ct. 2014), appeal allowed (Ill. S. Ct. Sept. 24, 2014). See Steven Sellers, “Workers’ Compensation System Threatened By Illinois Asbestos Decision, Companies Say,” 43 Product Safety & Liability Reporter (Jan. 8, 2015).

James Folta developed mesothelioma 41 years after leaving his employment with Ferro Engineering, a latency that put his claim outside the Illinois Workers’ Compensation Act and Workers’ Occupational Diseases Act. The panel of the intermediate appellate court held that the same latency that denied Mr. Folta coverage also worked to deny the employer immunity from common law suit. Mr. Folta’s asbestos exposure occurred at his Ferro workplace, from 1966 to 1970, during which time suppliers of raw asbestos and of many finished asbestos products provided warnings about the dangers of asbestos inhalation.

The BNA reporter, Mr. Sellers, quoted Mark Behrens, of Shook, Hardy & Bacon, as stating that:

“This case is part of an emerging national attack on state workers’ compensation systems by the personal injury bar.”

Id. Perhaps true, but the systems have been under critical attack from the public health community, legal reformers, labor, and industry, for some time. No one seems very happy with the system except employers in the specific moment and circumstance of asserting their immunity in tort actions. The regime of worker compensation immunity for employers has failed to foster worker safety and health, and it has worked to shift liability unfairly to remote suppliers who are generally not in a position to redress communication lapses in the workplace.

The Illinois Supreme Court has allowed Ferro Engineering to appeal the Folta case. Not surprisingly, the American Insurance Association, the Property Casualty Insurers Association of America and the Travelers Indemnity Company have filed an amicus brief in support of Ferro. Various companies — Caterpillar, Inc., Aurora Pump Co., Innophos, Inc., Rockwell Automation, Inc., United States Steel Corp., F.H. Leinweber Co., Inc., Driv-Lok, Inc., Ford Motor Co., and ExxonMobil Oil Corp. — have also banded together to file an amicus brief in support of Ferro. Ironically, many of these companies would benefit from abandoning employer immunity in occupational disease litigation. Taking the short view, the defense amicus brief argues that the Illinois Appellate Court’s decision distorts the “delicate balancing of competing interests,” and will lead to a flood of asbestos litigation in Illinois. The defense amicus further argues that the intermediate appellate court’s decision is “the first step towards unraveling the quid pro quo embodied in the acts.”

The problem with the defense position is that there is already a flood of asbestos litigation in Illinois and elsewhere, and the problem lies not in damming the flood, but in ensuring its equitable resolution. Divining what a legislature intended is always a risky business, but it seems unlikely that the Illinois legislature had any clear understanding of diseases with latencies in excess of 25 years. And while the Ferro decision has the potential to unravel the defense’s understanding of employer immunity in long-latency occupational disease cases, the real issue is whether bringing the employer to the table in civil litigation over occupational diseases will result in a more equitable allocation of responsibility for the harms alleged. Even a “wrong” decision by the Illinois Supreme Court will have the advantage of inciting the Illinois legislature to clarify what it meant, and perhaps to recalibrate tort law to acknowledge the primary role of employers in providing safe workplaces.

The Rhetoric of Playing Dumb on Statistical Significance – Further Comments on Oreskes

January 17th, 2015

As a matter of policy, I leave the comment field turned off on this blog. I don’t have the time or patience to moderate discussions, but that is not to say that I don’t value feedback. Many readers have written, with compliments, concurrences, criticisms, and corrections. Some correspondents have given me valuable suggestions and materials. I believe I can say that aside from a few scurrilous emails, the feedback generally has been constructive, and welcomed.

My last post was on Naomi Oreskes’ opinion piece in the Sunday New York Times[1]. Professor Deborah Mayo asked me for permission to re-post the substance of this post, and to link to the original[2]. Mayo’s blog does allow for comments, and much to my surprise, the posts drew a great deal of attention, links, comment, and twittering. The number and intensity of the comments, as well as the other blog posts and tweets, seemed out of proportion to the point I was trying to make about misinterpreting confidence intervals and other statistical concepts. I suspect that some climate skeptics received my criticisms of Oreskes with a degree of schadenfreude, and that some who criticized me did so because they fear any challenge to Oreskes as a climate-change advocate. So be it. As I made clear in my post, I was not seeking to engage Oreskes on climate change or her judgments on that issue. What I saw in Oreskes’ article was the same rhetorical move made in the courtroom, and in scientific publications, in which plaintiffs and environmentalists attempt to claim a scientific imprimatur for their conclusions without adhering to the rigor required for scientific judgments[3].

Some of the comments about Professor Oreskes caused me to take a look at her recent book, Naomi Oreskes & Erik M. Conway, Merchants of Doubt: How a Handful of Scientists Obscured the Truth on Issues from Tobacco Smoke to Global Warming (N.Y. 2010). Interestingly, much of the substance of Oreskes’ newspaper article comes directly from this book. In the context of reporting on the dispute over the EPA’s meta-analysis of studies on passive smoking and lung cancer, Oreskes addressed the 95 percent issue:

“There’s nothing magic about 95 percent. It could be 80 percent. It could be 51 percent. In Vegas if you play a game with 51 percent odds in your favor, you’ll still come out ahead if you play long enough. The 95 percent confidence level is a social convention, a value judgment. And the value it reflects is one that says that the worst mistake a scientist can make is to fool herself: to think an effect is real when it is not. Statisticians call this a type I error. You can think of it as being gullible, naive, or having undue faith in your own ideas.89 To avoid it, scientists place the burden of proof on the person claiming a cause and effect. But there’s another kind of error-type 2-where you miss effects that are really there. You can think of that as being excessively skeptical or overly cautious. Conventional statistics is set up to be skeptical and avoid type I errors. The 95 percent confidence standard means that there is only 1 chance in 20 that you believe something that isn’t true. That is a very high bar. It reflects a scientific worldview in which skepticism is a virtue, credulity is not.90 As one Web site puts it, ‘A type I error is often considered to be more serious, and therefore more important to avoid, than a type II error’.91 In fact, some statisticians claim that type 2 errors aren’t really errors at all,  just missed opportunities.92

Id. at 156-57 (emphasis added). Oreskes’ statement of the confidence interval, from her book, advances more ambiguity by not specifying what the “something” is that one believes to be true or not. Of course, if it is the assumed parameter, then she has made the same error as she did in the Times. Oreskes’ further discussion of the EPA environmental tobacco smoke meta-analysis makes her meaning clearer, and her interpretation of statistical significance less defensible:

“Even if 90 percent is less stringent than 95 percent, it still means that there is a 9 in 10 chance that the observed results did not occur by chance. Think of it this way. If you were nine-tenths sure about a crossword puzzle answer, wouldn’t you write it in?94

Id. Throughout her discussion, Oreskes fails to acknowledge that the p-value assumes the correctness of the null hypothesis in order to assess the strength of the specific data as evidence against the null. As I have pointed out elsewhere, this misinterpretation is a rhetorical strategy to evade significance testing, as well as to obscure the role of bias and confounding in accounting for data that differ from an expected value.
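The conditional nature of the p-value is easy to demonstrate by simulation. In the sketch below (my illustration), two groups are drawn from the same distribution, so that the null hypothesis is true by construction; p-values below 0.10 then turn up about 10 percent of the time, exactly as the definition promises, and nothing in that arithmetic can be converted into a “9 in 10 chance” that an observed association is real:

```python
# P-values are computed on the assumption that the null hypothesis is true.
# Under a true null they are uniformly distributed, so P(p < 0.10) = 0.10.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n = 20_000, 50

a = rng.normal(0.0, 1.0, size=(n_sims, n))   # both groups drawn from the same
b = rng.normal(0.0, 1.0, size=(n_sims, n))   # distribution: no true effect
p = stats.ttest_ind(a, b, axis=1).pvalue

print(f"Fraction of simulations with p < 0.10: {np.mean(p < 0.10):.3f}")  # about 0.10
```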

Oreskes also continues to maintain that a failure to reject the null is playing “dumb” and placing:

“the burden of proof on the victim, rather than, for example, the manufacturer of a harmful product-and we may fail to protect some people who are really getting hurt.”

Id. So again, the same petitio principii as we saw in the Times. Victimhood is exactly what remains to be established. Oreskes cannot assume it, and then criticize time-tested methods that fail to deliver a confirmatory judgment.

There are endnotes in her book, but the authors fail to cite any serious statistics text. The only reference of dubious relevance is a University of Michigan Press book, Stephen T. Ziliak & Deirdre N. McCloskey, The Cult of Statistical Significance (2008). Enough said[4].

With a little digging, I learned that Oreskes and Conway are science fiction writers, and perhaps we should judge them by literary rather than scientific standards. See Naomi Oreskes & Erik M. Conway, “The Collapse of Western Civilization: A View from the Future,” 142 Dædalus 41 (2013). I do not imply any pejorative judgment of Oreskes for advancing her apocalyptic vision of the future of Earth’s environment as a work of fiction. Her literary work is a worthy thought experiment that has the potential to lead us to accept her precautionary judgments; and at least her publication, in Dædalus, is clearly labeled science fiction.

Oreskes’ future fantasy is, not surprisingly, exactly what Oreskes, the historian of science, now predicts in terms of catastrophic environmental change. Looking back from the future, the science fiction authors attempt to explore the historical origins of the catastrophe, only to discover that it is the fault of everyone who disagreed with Naomi Oreskes in the early 21st century. Heavy blame is laid at the feet of the ancestor scientists (Oreskes’ contemporaries) who insisted upon scientific and statistical standards for inferring conclusions from observational data. Implicit in the science fiction tale is the welcome acknowledgment that science should make accurate predictions.

In Oreskes’ science fiction, these scientists of yesteryear, today’s adversaries of climate-change advocates, were “almost childlike” in their felt need to adopt “strict” standards, and in their adherence to severe tests derived from their ancestors’ religious asceticism. In other words, significance testing is a form of self-flagellation. Lest you think I exaggerate, consider the actual words of Oreskes and Conway:

“In an almost childlike attempt to demarcate their practices from those of older explanatory traditions, scientists felt it necessary to prove to themselves and the world how strict they were in their intellectual standards. Thus, they placed the burden of proof on novel claims, including those about climate. Some scientists in the early twenty-first century, for example, had recognized that hurricanes were intensifying, but they backed down from this conclusion under pressure from their scientific colleagues. Much of the argument surrounded the concept of statistical significance. Given what we now know about the dominance of nonlinear systems and the distribution of stochastic processes, the then-dominant notion of a 95 percent confidence limit is hard to fathom. Yet overwhelming evidence suggests that twentieth-century scientists believed that a claim could be accepted only if, by the standards of Fisherian statistics, the possibility that an observed event could have happened by chance was less than 1 in 20. Many phenomena whose causal mechanisms were physically, chemically, or biologically linked to warmer temperatures were dismissed as “unproven” because they did not adhere to this standard of demonstration.

Historians have long argued about why this standard was accepted, given that it had no substantive mathematical basis. We have come to understand the 95 percent confidence limit as a social convention rooted in scientists’ desire to demonstrate their disciplinary severity. Just as religious orders of prior centuries had demonstrated moral rigor through extreme practices of asceticism in dress, lodging, behavior, and food–in essence, practices of physical self-denial–so, too, did natural scientists of the twentieth century attempt to demonstrate their intellectual rigor through intellectual self-denial.14 This practice led scientists to demand an excessively stringent standard for accepting claims of any kind, even those involving imminent threats.”

142 Dædalus at 44.

The science fiction piece in Dædalus has now morphed into a short book, which is billed within as a “haunting, provocative work of science-based fiction.” Naomi Oreskes & Erik M. Conway, The Collapse of Western Civilization: A View from the Future (N.Y. 2014). Under the cover of fiction, Oreskes and Conway provide their idiosyncratic, fictional definition of statistical significance, in a “Lexicon of Archaic Terms,” at the back of the book:

statistical significance  The archaic concept that an observed phenomenon could only be accepted as true if the odds of it happening by chance were very small, typically taken to be no more than 1 in 20.”

Id. at 61-62. Of course, in writing fiction, you can make up anything you like. Caveat lector.


 

[1] See “Playing Dumb on Statistical Significance” (Jan. 4, 2015).

[2] See “Significance Levels are Made a Whipping Boy on Climate Change Evidence: Is .05 Too Strict? (Schachtman on Oreskes)” (Jan. 4, 2015).

[3] See “Rhetorical Strategy in Characterizing Scientific Burdens of Proof” (Nov. 15, 2014).

[4] See “The Will to Ummph” (Jan. 10, 2012).

Playing Dumb on Statistical Significance

January 4th, 2015

For the last decade, at least, researchers have written to document, explain, and correct a high rate of false-positive research findings in biomedical research[1]. And yet, some authors complain that the traditional standard of statistical significance is too stringent. The best explanation for this paradox appears to lie in these authors’ rhetorical strategy of protecting their “scientific conclusions,” based upon weak and uncertain research findings, from criticism. The strategy includes mischaracterizing significance probability as a burden of proof, and then speciously claiming that the conventional level of significance is too high a threshold for the posterior probability of a scientific claim. See “Rhetorical Strategy in Characterizing Scientific Burdens of Proof” (Nov. 15, 2014).

Naomi Oreskes is a professor of the history of science at Harvard University. Her writings on the history of geology are well respected; her writings on climate change tend to be more adversarial, rhetorical, and ad hominem. See, e.g., Naomi Oreskes & Erik M. Conway, Merchants of Doubt: How a Handful of Scientists Obscured the Truth on Issues from Tobacco Smoke to Global Warming (N.Y. 2010). Oreskes’ abuse of the meaning of significance probability for her own rhetorical ends is on display in today’s New York Times. Naomi Oreskes, “Playing Dumb on Climate Change,” N.Y. Times Sunday Rev. at 2 (Jan. 4, 2015).

Oreskes wants her readers to believe that those who are resisting her conclusions about climate change are hiding behind an unreasonably high burden of proof, supposedly derived from the conventional standard of statistical significance. In presenting her argument, Oreskes consistently misrepresents statistical significance and confidence intervals as though they stated the overall burden of proof for a scientific claim:

“Typically, scientists apply a 95 percent confidence limit, meaning that they will accept a causal claim only if they can show that the odds of the relationship’s occurring by chance are no more than one in 20. But it also means that if there’s more than even a scant 5 percent possibility that an event occurred by chance, scientists will reject the causal claim. It’s like not gambling in Las Vegas even though you had a nearly 95 percent chance of winning.”

Although the confidence interval is related to the pre-specified Type I error rate, alpha, so that a conventional alpha of 5% does lead to a coefficient of confidence of 95%, Oreskes has misstated the confidence interval to be a burden of proof consisting of a 95% posterior probability. The “relationship” is either true or not; the p-value provides a probability of observing the sample statistic, or one more extreme, on the assumption that the null hypothesis is correct. The 95% probability attached to confidence intervals derives from the long-run frequency with which 95% of all confidence intervals, constructed from samples of the same size, will contain the true parameter of interest.
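A short simulation (again, my illustration) shows what the frequentist guarantee does and does not say. Across repeated samples, about 95% of the computed intervals cover the fixed, unknown parameter; no single interval thereby acquires a 95% posterior probability of containing it:

```python
# Coverage simulation for a 95% confidence interval for a normal mean,
# assuming (for simplicity) that sigma is known.
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, n, n_sims = 10.0, 2.0, 30, 100_000

means = rng.normal(mu, sigma, size=(n_sims, n)).mean(axis=1)
half_width = 1.96 * sigma / np.sqrt(n)
covered = (means - half_width <= mu) & (mu <= means + half_width)

print(f"Coverage across {n_sims} repeated samples: {covered.mean():.3f}")  # about 0.95
```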

Oreskes is an historian, but her history of statistical significance appears equally ill considered. Here is how she describes the “severe” standard of the 95% confidence interval:

“Where does this severe standard come from? The 95 percent confidence level is generally credited to the British statistician R. A. Fisher, who was interested in the problem of how to be sure an observed effect of an experiment was not just the result of chance. While there have been enormous arguments among statisticians about what a 95 percent confidence level really means, working scientists routinely use it.”

First, Oreskes, the historian, gets the history wrong. The confidence interval is due to Jerzy Neyman, not to Sir Ronald A. Fisher. Jerzy Neyman, “Outline of a theory of statistical estimation based on the classical theory of probability,” 236 Philos. Trans. Royal Soc’y Lond. Ser. A 333 (1937). Second, although statisticians have debated the meaning of the confidence interval, they have not wandered from its essential use as an estimation of the parameter (based upon the use of an unbiased, consistent sample statistic) and a measure of random error (not systematic error) about the sample statistic. Oreskes provides a fallacious history, with a false and misleading statistics tutorial.

Oreskes, however, goes on to misidentify the 95% coefficient of confidence with the legal standard known as “beyond a reasonable doubt”:

“But the 95 percent level has no actual basis in nature. It is a convention, a value judgment. The value it reflects is one that says that the worst mistake a scientist can make is to think an effect is real when it is not. This is the familiar “Type 1 error.” You can think of it as being gullible, fooling yourself, or having undue faith in your own ideas. To avoid it, scientists place the burden of proof on the person making an affirmative claim. But this means that science is prone to ‘Type 2 errors’: being too conservative and missing causes and effects that are really there.

Is a Type 1 error worse than a Type 2? It depends on your point of view, and on the risks inherent in getting the answer wrong. The fear of the Type 1 error asks us to play dumb; in effect, to start from scratch and act as if we know nothing. That makes sense when we really don’t know what’s going on, as in the early stages of a scientific investigation. It also makes sense in a court of law, where we presume innocence to protect ourselves from government tyranny and overzealous prosecutors — but there are no doubt prosecutors who would argue for a lower standard to protect society from crime.

When applied to evaluating environmental hazards, the fear of gullibility can lead us to understate threats. It places the burden of proof on the victim rather than, for example, on the manufacturer of a harmful product. The consequence is that we may fail to protect people who are really getting hurt.”

The truth of climate change opinions does not turn on sampling error, but rather on the desire to draw an inference from messy, incomplete, non-random, and inaccurate measurements, fed into models of uncertain validity. Oreskes suggests that significance probability is keeping us from acknowledging a scientific fact, but the climate change data sets are ample enough to rule out sampling error, if that were the problem. And Oreskes’ suggestion that statistical significance somehow places a burden upon the “victim” simply assumes what she hopes to prove; namely, that there is a victim (and a perpetrator).

Oreskes’ solution seems to have a Bayesian ring to it. She urges that we should start with our a priori beliefs, intuitions, and pre-existing studies, and allow them to lower our threshold for significance probability:

“And what if we aren’t dumb? What if we have evidence to support a cause-and-effect relationship? Let’s say you know how a particular chemical is harmful; for example, that it has been shown to interfere with cell function in laboratory mice. Then it might be reasonable to accept a lower statistical threshold when examining effects in people, because you already have reason to believe that the observed effect is not just chance.

This is what the United States government argued in the case of secondhand smoke. Since bystanders inhaled the same chemicals as smokers, and those chemicals were known to be carcinogenic, it stood to reason that secondhand smoke would be carcinogenic, too. That is why the Environmental Protection Agency accepted a (slightly) lower burden of proof: 90 percent instead of 95 percent.”

Oreskes’ rhetoric misstates key aspects of scientific method. The demonstration of causality in mice, or only some perturbation of cell function in non-human animals, does not warrant lowering our standard for studies in human beings. Mice and rats are, for many purposes, poor predictors of human health effects. All medications developed for human use are tested in animals first, for safety and efficacy. A large majority of such medications, efficacious in rodents, fail to satisfy the conventional standards of significance probability in randomized clinical trials. And that standard is not lowered because the drug sponsor had previously demonstrated efficacy in mice, or some other furry rodent.

The EPA meta-analysis of passive smoking and lung cancer is a good example of how not to conduct science. The protocol for the EPA meta-analysis called for a 95% confidence interval, but the agency scientists manipulated their results by altering the pre-specified coefficient of confidence in their final report. Perhaps even more disgraceful was the selectivity in the studies included in the meta-analysis, which biased the agency’s result in a way not reflected in p-values or confidence intervals. See “EPA Cherry Picking (WOE) – EPA 1992 Meta-Analysis of ETA & Lung Cancer – Part 1” (Dec. 2, 2012); “EPA Post Hoc Statistical Tests – One Tail vs Two” (Dec. 2, 2012).
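The arithmetic of the switch is worth seeing. A 90% interval is simply narrower than the 95% interval computed from the same data, so an estimate whose pre-specified 95% interval includes 1.0 can be re-presented as “significant” merely by shrinking the interval. A sketch with hypothetical numbers (chosen for illustration; not the EPA’s actual data):

```python
# Hypothetical rate ratio and log-scale standard error, chosen so that the
# pre-specified 95% interval crosses 1.0 but a post hoc 90% interval does not.
import numpy as np

rr_hat, se_log = 1.19, 0.10
for conf, z in [(0.95, 1.960), (0.90, 1.645)]:
    lo, hi = np.exp(np.log(rr_hat) + np.array([-z, z]) * se_log)
    print(f"{conf:.0%} CI: ({lo:.2f}, {hi:.2f})")   # 95%: (0.98, 1.45); 90%: (1.01, 1.40)
```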

Of course, the scientists preparing for and conducting a meta-analysis on environmental tobacco smoke began with a well-justified belief that active smoking causes lung cancer. Passive smoking, however, involves very different exposure levels, and raises serious issues about the human body’s defensive mechanisms against low-level exposures. Insisting on a reasonable-quality meta-analysis for passive smoking and lung cancer was not a matter of “playing dumb”; it was a recognition of our actual ignorance and uncertainty about the claim being made for low-exposure effects. The shifty confidence intervals and slippery methodology exemplify how agency scientists assume their probandum to be true, and then manipulate or adjust their methods to provide the result they had assumed all along.

Oreskes then analogizes not playing dumb on environmental tobacco smoke to not playing dumb on climate change:

“In the case of climate change, we are not dumb at all. We know that carbon dioxide is a greenhouse gas, we know that its concentration in the atmosphere has increased by about 40 percent since the industrial revolution, and we know the mechanism by which it warms the planet.

WHY don’t scientists pick the standard that is appropriate to the case at hand, instead of adhering to an absolutist one? The answer can be found in a surprising place: the history of science in relation to religion. The 95 percent confidence limit reflects a long tradition in the history of science that valorizes skepticism as an antidote to religious faith.”

I will leave the substance of the climate change issue to others, but Oreskes’ methodological misidentification of the 95% coefficient of confidence with a burden of proof is wrong. Regardless of motive, the error obscures the real debate, which is about data quality. More disturbing is that Oreskes’ error confuses significance and posterior probabilities, and distorts the meaning of the burden of proof. To be sure, the article by Oreskes is labeled opinion, and Oreskes is entitled to her opinions about climate change and whatever else. To the extent, however, that her opinions are based upon obvious factual errors about statistical methodology, they are entitled to no weight at all.


 

[1] See, e.g., John P. A. Ioannidis, “How to Make More Published Research True,” 11 PLoS Medicine e1001747 (2014); John P. A. Ioannidis, “Why Most Published Research Findings Are False” 2 PLoS Medicine e124 (2005); John P. A. Ioannidis, Anna-Bettina Haidich, and Joseph Lau, “Any casualties in the clash of randomised and observational evidence?” 322 Brit. Med. J. 879 (2001).