TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

People Get Ready – There’s a Reference Manual a Comin’

July 16th, 2021

Science is the key …

Back in February, I wrote about a National Academies’ workshop that featured some outstanding members of the scientific and statistical world, and which gave participants an opportunity to identify potential new subjects for inclusion in a proposed fourth edition of the Reference Manual on Scientific Evidence.[1] Funding for that new edition is now secured, and the National Academies has published a précis of the February workshop. National Academies of Sciences, Engineering, and Medicine, Emerging Areas of Science, Engineering, and Medicine for the Courts: Proceedings of a Workshop – in Brief (Washington, DC 2021). The Rapporteurs for these proceedings provide a helpful overview of this meeting, which was not generally covered in the legal media.[2]

The goal of the workshop, which was supported by a planning committee, the Committee on Science, Technology, and Law, the National Academies, the Federal Judicial Center, and the National Science Foundation, was, of course, to identify chapters for a new, fourth edition of the Reference Manual on Scientific Evidence. The workshop was co-chaired by Dr. Thomas D. Albright, of the Salk Institute for Biological Studies, and the Hon. Kathleen McDonald O’Malley, Judge on the U.S. Court of Appeals for the Federal Circuit.

The Rapporteurs duly noted Judge O’Malley’s Workshop comments that she hoped that the reconsideration of the Reference Manual could help close the gap between science and the law. It is thus encouraging that the Rapporteurs focused a large part of their summary on the presentation of Professor Xiao-Li Meng[3] on selection bias, which “can come from cherry picking data, which alters the strength of the evidence.” Meng identified the

“7 S(ins)” of selection bias:

(1) selection of target/hypothesis (e.g., subgroup analysis);

(2) selection of data (e.g., deleting ‘outliers’ or using only ‘complete cases’);

(3) selection of methodologies (e.g., choosing tests to pass the goodness-of-fit);

(4) selective due diligence and debugging (e.g., triple checking only when the outcome seems undesirable);

(5) selection of publications (e.g., only when p-value <0.05);

(6) selections in reporting/summary (e.g., suppressing caveats); and

(7) selections in understanding and interpretation (e.g., our preference for deterministic, ‘common sense’ interpretation).”

Meng also addressed the problem of analyzing subgroup findings after not finding an association in the full sample, dubious algorithms, selection bias in publishing “splashy” and nominally “statistically significant” results, and media bias and incompetence in disseminating study results. Meng discussed how these biases affect the accuracy, validity, and reliability of research findings, including the findings relied upon by expert witnesses in court cases.
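
Meng’s point about cherry picking is easy to see in action. Here is a minimal simulation sketch (mine, not Meng’s; the effect size, sample sizes, and publication rule are invented for illustration) of how publishing only nominally “statistically significant” results inflates the apparent strength of the evidence:

```python
# Sketch: selective publication of "significant" results inflates effect sizes.
import numpy as np

rng = np.random.default_rng(0)
true_effect = 0.1       # assumed small real effect (standardized units)
n_per_arm = 50          # assumed per-arm sample size of each simulated study
n_studies = 10_000      # number of simulated studies

se = np.sqrt(2 / n_per_arm)                  # SE of a two-sample mean difference
estimates = rng.normal(true_effect, se, n_studies)
significant = np.abs(estimates / se) > 1.96  # two-sided p < 0.05

print(f"mean estimate, all studies:        {estimates.mean():+.3f}")
print(f"mean estimate, 'published' subset: {estimates[significant].mean():+.3f}")
print(f"fraction reaching significance:    {significant.mean():.3f}")
```

Under these assumptions, the “published” subset overstates the true effect several fold, even though every individual study was honestly conducted.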

The Rapporteurs’ emphasis on Professor Meng’s presentation was noteworthy because the current edition of the Reference Manual is generally lacking in a serious exploration of systematic bias and confounding. To be sure, the concepts are superficially addressed in the Manual’s chapter on epidemiology, but in a way that has allowed many district judges to shrug off serious questions of invalidity with the shibboleth that such questions “go to the weight, not the admissibility,” of challenged expert witness opinion testimony. Perhaps the pending revision to Rule 702 will help improve fidelity to the spirit and text of Rule 702.

Questions of bias and noise have come to receive more attention in the professional statistical and epidemiologic literature. In 2009, Professor Timothy Lash published an important book-length treatment of quantitative bias analysis.[4] Last year, statistician David Hand published a comprehensive, but readily understandable, book on “Dark Data” and the ways statistical and scientific inference are derailed.[5] And just a few weeks ago, one of the presenters at the February workshop, Nobel laureate Daniel Kahneman, published a book on “noise.”[6]

David Hand’s book Dark Data (Chapter 10) sets out a useful taxonomy of the ways that data can be subverted by what the consumers of data do not know. The taxonomy would provide a useful organizational map for a new chapter of the Reference Manual:

A Taxonomy of Dark Data

Type 1: Data We Know Are Missing

Type 2: Data We Don’t Know Are Missing

Type 3: Choosing Just Some Cases

Type 4: Self-Selection

Type 5: Missing What Matters

Type 6: Data Which Might Have Been

Type 7: Changes with Time

Type 8: Definitions of Data

Type 9: Summaries of Data

Type 10: Measurement Error and Uncertainty

Type 11: Feedback and Gaming

Type 12: Information Asymmetry

Type 13: Intentionally Darkened Data

Type 14: Fabricated and Synthetic Data

Type 15: Extrapolating beyond Your Data

Providing guidance not only on “how we know,” but also on how we go astray (call it patho-epistemology), would be helpful for judges and lawyers. Hand’s book is really just a beginning to helping gatekeepers appreciate how superficially plausible health-effects claims can be invalidated by the data relied upon by proffered expert witnesses.

* * * * * * * * * * * *

“There ain’t no room for the hopeless sinner
Who would hurt all mankind, just to save his own, believe me now
Have pity on those whose chances grow thinner”


[1] “Reference Manual on Scientific Evidence v4.0” (Feb. 28, 2021).

[2] Steven Kendall, Joe S. Cecil, Jason A. Cantone, Meghan Dunn, and Aaron Wolf.

[3] Prof. Meng is the Whipple V. N. Jones Professor of Statistics at Harvard University. (“Seeking simplicity in statistics, complexity in wine, and everything else in fortune cookies.”)

[4] Timothy L. Lash, Matthew P. Fox, and Aliza K. Fink, Applying Quantitative Bias Analysis to Epidemiologic Data (2009).

[5] David J. Hand, Dark Data: Why What You Don’t Know Matters (2020).

[6] Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein, Noise: A Flaw in Human Judgment (2021).

Reference Manual on Scientific Evidence v4.0

February 28th, 2021

The need for revisions to the third edition of the Reference Manual on Scientific Evidence (RMSE) has been apparent since its publication in 2011. A decade has passed, and the institutions involved in the third edition, the Federal Judicial Center (FJC) and the National Academies of Sciences, Engineering, and Medicine (NASEM), are assembling staff to prepare the long-needed revisions.

The first sign of life for this new edition came back on November 24, 2020, when the NASEM held a short, closed-door virtual meeting to discuss planning for a fourth edition.[1] The meeting was billed by the NASEM as “the first meeting of the Committee on Emerging Areas of Science, Engineering, and Medicine for the Courts: Identifying Chapters for a Fourth Edition of The Reference Manual on Scientific Evidence.” The Committee members heard from John S. Cooke (FJC Director), and Alan Tomkins and Reggie Sheehan, both of the National Science Foundation (NSF). The stated purpose of the meeting was to review the third edition of the RMSE to identify “areas of science, technology, and medicine that may be candidates for new or updated chapters in a proposed new (fourth) edition of the manual.” The only public pronouncement from the first meeting was that the committee would sponsor a workshop on the topic of new chapters for the RMSE, in early 2021.

The Committee’s second meeting took place a week later, again in closed session.[2] The stated purpose of the Committee’s second meeting was to review the third edition of the RMSE, and to discuss candidate areas for inclusion as new and updated chapters for a fourth edition.

Last week saw the Committee’s third, public meeting. The meeting spanned two days (Feb. 24 and 25, 2021), and was open to the public. The meeting was sponsored by NASEM and the FJC, along with the NSF, and was co-chaired by Thomas D. Albright, Professor and Conrad T. Prebys Chair at the Salk Institute for Biological Studies, and the Hon. Kathleen McDonald O’Malley, who sits on the United States Court of Appeals for the Federal Circuit. Identified members of the committee include:

Steven M. Bellovin, professor in the Computer Science department at Columbia University;

Karen Kafadar, Departmental Chair and Commonwealth Professor of Statistics at the University of Virginia, and former president of the American Statistical Association;

Andrew Maynard, professor, and director of the Risk Innovation Lab at the School for the Future of Innovation in Society, at Arizona State University;

Venkatachalam Ramaswamy, Director of the Geophysical Fluid Dynamics Laboratory of the National Oceanic and Atmospheric Administration (NOAA) Office of Oceanic and Atmospheric Research (OAR), studying climate modeling and climate change;

Thomas Schroeder, Chief Judge for the U.S. District Court for the Middle District of North Carolina;

David S. Tatel, United States Court of Appeals for the District of Columbia Circuit; and

Steven R. Kendall, Staff Officer

The meeting comprised five panel presentations, made up of remarkably accomplished and talented speakers. Each panel’s presentations were followed by discussion among the panelists, and the committee members. Some panels answered questions submitted from the public audience. Judge O’Malley opened the meeting with introductory remarks about the purpose and scope of the RMSE, and of the inquiry into additional possible chapters.

  1. Challenges in Evaluating Scientific Evidence in Court

The first panel consisted entirely of judges, who held forth on their approaches to judicial gatekeeping of expert witnesses, and their approach to scientific and technical issues. Chief Judge Schroeder moderated the presentations of panelists:

Barbara Parker Hervey, Texas Court of Criminal Appeals;

Patti B. Saris, Chief Judge of the United States District Court for the District of Massachusetts, and a member of the President’s Council of Advisors on Science and Technology (PCAST);

Leonard P. Stark, U.S. District Court for the District of Delaware; and

Sarah S. Vance, Judge (former Chief Judge) of the U.S. District Court for the Eastern District of Louisiana, chair of the Judicial Panel on Multidistrict Litigation.

  2. Emerging Issues in the Climate and Environmental Sciences

Paul Hanle, of the Environmental Law Institute, moderated presenters:

Joellen L. Russell, the Thomas R. Brown Distinguished Chair of Integrative Science and Professor at the University of Arizona in the Department of Geosciences;

Veerabhadran Ramanathan, Edward A. Frieman Endowed Presidential Chair in Climate Sustainability at the Scripps Institution of Oceanography at the University of California, San Diego;

Benjamin D. Santer, atmospheric scientist at Lawrence Livermore National Laboratory; and

Donald J. Wuebbles, the Harry E. Preble Professor of Atmospheric Science at the University of Illinois.

  3. Emerging Issues in Computer Science and Information Technology

Josh Goldfoot, Principal Deputy Chief, Computer Crime & Intellectual Property Section, at U.S. Department of Justice, moderated panelists:

Jeremy J. Epstein, Deputy Division Director of Computer and Information Science and Engineering (CISE) and Computer and Network Systems (CNS) at the National Science Foundation;

Russ Housley, founder of Vigil Security, LLC;

Subbarao Kambhampati, professor of computer science at Arizona State University; and

Alice Xiang, Senior Research Scientist at Sony AI.

  4. Emerging Issues in the Biological Sciences

Panel four was moderated by Professor Ellen Wright Clayton, the Craig-Weaver Professor of Pediatrics, and Professor of Law and of Health Policy at Vanderbilt Law School, at Vanderbilt University. Her panelists were:

Dana Carroll, distinguished professor in the Department of Biochemistry at the University of Utah School of Medicine;

Yaniv Erlich, Chief Executive Officer of Eleven Therapeutics, Chief Science Officer of MyHeritage;

Steven E. Hyman, director of the Stanley Center for Psychiatric Research at Broad Institute of MIT and Harvard; and

Philip Sabes, Professor Emeritus in Physiology at the University of California, San Francisco (UCSF).

  5. Emerging Areas in Psychology, Data, and Statistical Sciences

Gary Marchant, Lincoln Professor of Emerging Technologies, Law and Ethics, at Arizona State University’s Sandra Day O’Connor College of Law, moderated panelists:

Xiao-Li Meng, the Whipple V. N. Jones Professor of Statistics, Harvard University, and the Founding Editor-in-Chief of Harvard Data Science Review;

Rebecca Doerge, Glen de Vries Dean of the Mellon College of Science at Carnegie Mellon University, member of the Dietrich College of Humanities and Social Sciences’ Department of Statistics and Data Science, and of the Mellon College of Science’s Department of Biological Sciences;

Daniel Kahneman, Professor of Psychology and Public Affairs Emeritus at the Princeton School of Public and International Affairs, the Eugene Higgins Professor of Psychology Emeritus at Princeton University, and a fellow of the Center for Rationality at the Hebrew University in Jerusalem; and

Goodwin Liu, Associate Justice of the California Supreme Court.

The Proceedings of this two-day meeting were recorded and will be published. The website materials do not make clear whether the verbatim remarks will be included, but regardless, the proceedings should warrant careful reading.

Judge O’Malley, in her introductory remarks, emphasized that the RMSE must be a neutral, disinterested source of information for federal judges, an aspirational judgment from which there can be no dissent. More controversial will be Her Honor’s assessment that epidemiologic studies can “take forever,” and other judges’ suggestion that plaintiffs lack financial resources to put forward credible, reliable expert witnesses. Judge Vance corrected the course of the discussion by pointing out that MDL plaintiffs were not disadvantaged, but no one pointed out that plaintiffs’ counsel were among the wealthiest individuals in the United States, and that they have been known to sponsor epidemiologic and other studies that wind up as evidence in court.

Panel One was perhaps the most discomforting experience, as it involved revelations about how sausage is made in the gatekeeping process. The panel was remarkable for including a state court judge from Texas, Judge Barbara Parker Hervey, of the Texas Court of Criminal Appeals. Judge Hervey remarked that [in her experience] if we judges “can’t understand it, we won’t read it.” Her dictum raises interesting issues. No doubt, in some instances, the judicial failure of comprehension is the fault of the lawyers. What happens when the judges “can’t understand it”? Do they ask for further briefing? Or do they ask for a hearing with viva voce testimony from expert witnesses? The point was not followed up.

Leonard P. Stark’s insights were interesting in that his docket in the District of Delaware is flooded with patent and Hatch-Waxman Act litigation. Judge Stark’s extensive educational training is in politics and political science. The docket volume Judge Stark described, however, raised issues about how much attention he could give to any one case.

When the panel was asked how they dealt with scientific issues, Judge Saris discussed her presiding over In re Neurontin, which was a “big challenge for me to understand,” with no randomized trials or objective assessments by the litigants.[3] Judge Vance discussed her experience of presiding in a low-level benzene exposure case, in which plaintiff claimed that his acute myelogenous leukemia was caused by gasoline.[4]

Perhaps the key difference in approach to Rule 702 emerged when the judges were asked whether they read the underlying studies. Judge Saris did not answer directly, but stated she reads the reports. Judge Vance, on the other hand, noted that she reads the relied upon studies. In her gasoline-leukemia case, she read the relied-upon epidemiologic studies, which she described as a “hodge podge,” and which were misrepresented by the expert witnesses and counsel. She emphasized the distortions of the adversarial system and the need to moderate its excesses by validating what exactly the expert witnesses had relied upon.

This division in judicial approach was seen again when Professor Karen Kafadar asked how the judges dealt with peer review. Judge Saris seemed to suggest that a peer-reviewed published article was prima facie reliable. Others disagreed and noted that peer-reviewed articles can have findings that are overstated or wrong. One speaker noted that Jerome Kassirer had downplayed the significance of, and the validation provided by, peer review, in the RMSE (3rd ed. 2011).

Curiously, there was no discussion of Rule 703, either in Judge O’Malley’s opening remarks on the RMSE, or in the first panel discussion. When someone from the audience submitted a question about the role of Rule 703 in the gatekeeping process, the moderator did not read it.

Panel Two. The climate change panel was a tour de force of the case for anthropogenic climate change. To some, the presentations may have seemed like a reprise of The Day After Tomorrow. Indeed, the science was presented so confidently, if not stridently, that one of the committee members asked whether there could be any reasonable disagreement. The panelists responded essentially by pointing out that there could be no good faith opposition. The panelists were much less convincing on the issue of attributability. None of the speakers addressed the appropriateness vel non of climate change litigation, when the federal and state governments encouraged, licensed, and regulated the exploitation and use of fossil fuel reserves.

Panel Four. Dr. Clayton’s panel was fascinating and likely to lead to new chapters. Professor Hyman presented on heritability, a subject that did not receive much attention in the RMSE third edition. With the advent of genetic claims of susceptibility and defenses of mutation-induced disease, courts will likely need some good advice on navigating the science. Dana Carroll presented on human genome editing (CRISPR). Philip Sabes presented on brain-computer interfaces, which have progressed well beyond the level of sci-fi thrillers, such as The Brain That Wouldn’t Die (“Jan in the Pan”).

In addition to the therapeutic applications, Sabes discussed some of the potential forensic uses, such as lie detectors, pain quantification, and the like. Yaniv Erlich, of MyHeritage, discussed advances in forensic genetic genealogy, which have made a dramatic entrance into the common imagination through the apprehension of Joseph James DeAngelo, the Golden State killer. The technique of triangulating DNA matches from consumer DNA databases has other applications, of course, such as identifying lost heirs, and resolving paternity issues.

Panel Five. Professor Marchant’s panel may well have identified some of the most salient needs for the next edition of the RMSE. Nobel Laureate Daniel Kahneman presented some of the highlights from his forthcoming book about “noise” in human judgment.[5] Kahneman’s expansion upon his previous thinking about the sources of error in human – and scientific – judgment is a much needed addition to the RMSE. Along the same lines, Professor Xiao-Li Meng presented on selection bias, how it pervades scientific work, and how it detracts from the strength of evidence, in the form of:

  1. cherry picking
  2. subgroup analyses
  3. unprincipled handling of outliers
  4. selection in methodologies (different tests)
  5. selection in due diligence (check only when you don’t like results)
  6. publication bias that results from publishing only impressive or statistically significant results
  7. selection in reporting (e.g., not reporting limitations of all analyses)
  8. selection in understanding

Professor Meng’s insights are sorely lacking in the third edition of the RMSE, and among judicial gatekeepers generally. All too often, undue selectivity in methodologies and in relied-upon data is treated by judges as an issue that “goes to the weight, not the admissibility” of expert witness opinion testimony. In actuality, the selection biases, and other systematic and cognitive biases, are as important as, if not more important than, random error assessments. Indeed, a close look at the RMSE third edition reveals a close embrace of the amorphous, anything-goes “weight of the evidence” approach in the epidemiology chapter. That chapter marginalizes meta-analyses and fails to mention systematic review techniques altogether. The chapter on clinical medicine, however, takes a divergent approach, emphasizing the hierarchy of evidence inherent in different study types, and the need for principled and systematic reviews of the available evidence.[6]

The Committee co-chairs and panel moderators did a wonderful job to identify important new trends in genetics, data science, error assessment, and computer science, and they should be congratulated for their efforts. Judge O’Malley is certainly correct in saying that the RMSE must be a neutral source of information on statistical and scientific methodologies, and it needs to be revised and updated to address errors and omissions in the previous editions. The legal community should look for, and study, the published proceedings when they become available.

——————————————————————————————————

[1]  See “Emerging Areas of Science, Engineering, and Medicine for the Courts: Identifying Chapters for a Fourth Edition of The Reference Manual on Scientific Evidence – Committee Meeting” (Nov. 24, 2020).

[2]  See “Emerging Areas of Science, Engineering, and Medicine for the Courts: Identifying Chapters for a Fourth Edition of The Reference Manual on Scientific Evidence – Committee Meeting 2 (Virtual)” (Dec. 1, 2020).

[3]  In re Neurontin Marketing, Sales Practices & Prods. Liab. Litig., 612 F. Supp. 2d 116 (D. Mass. 2009) (Saris, J.).

[4]  Burst v. Shell Oil Co., 104 F.Supp.3d 773 (E.D.La. 2015) (Vance, J.), aff’d, ___ Fed. App’x ___, 2016 WL 2989261 (5th Cir. May 23, 2016), cert. denied, 137 S.Ct. 312 (2016). See “The One Percent Non-solution – Infante Fuels His Own Exclusion in Gasoline Leukemia Case” (June 25, 2015).

[5]  Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein, Noise: A Flaw in Human Judgment (anticipated May 2021).

[6]  See John B. Wong, Lawrence O. Gostin, and Oscar A. Cabrera, “Reference Guide on Medical Testimony,” Reference Manual on Scientific Evidence 723-24 (3rd ed. 2011) (discussing hierarchy of medical evidence, with systematic reviews at the apex).

On Praising Judicial Decisions – In re Viagra

February 8th, 2021

We live in strange times. A virulent form of tribal stupidity gave us Trumpism, a personality cult in which it is impossible to function in the Republican party while criticizing der Führer. Even a diehard right-winger such as Liz Cheney, who dared to criticize Trump, is censured for nothing more than being disloyal to a cretin who fomented an insurrection that resulted in the murder of a Capitol police officer and the deaths of several other people.[1]

Unfortunately, a similar, even if less extreme, tribal chauvinism affects legal commentary, from both sides of the courtroom. When Judge Richard Seeborg issued an opinion, early in 2020, in the melanoma – phosphodiesterase type 5 inhibitor (PDE5i) litigation,[2] I praised the decision for not shirking the gatekeeping responsibility even when the causal claim was based upon multiple, consistent statistically significant observational studies that showed an association between PDE5i medications and melanoma.[3] Although many of the plaintiffs’ relied-upon studies reported statistically significant associations between PDE5i use and melanoma occurrence, they also found similar size associations with non-melanoma skin cancers. Because skin carcinomas were not part of the hypothesized causal mechanism, the study findings strongly suggested a common, unmeasured confounding variable such as skin damage from ultraviolet light. The plaintiffs’ expert witnesses’ failure to account for confounding was fatal under Rule 702, and Judge Seeborg’s recognition of this defect, and his willingness to go beyond multiple, consistent, statistically significant associations was what made the decision important.

There were, however, problems and even a blatant error in the decision that required attention. Although the error was harmless in that its correction would not have required, or even suggested, a different result, Judge Seeborg, like many other judges and lawyers, tripped up over the proper interpretation of a confidence interval:

“When reviewing the results of a study it is important to consider the confidence interval, which, in simple terms, is the ‘margin of error’. For example, a given study could calculate a relative risk of 1.4 (a 40 percent increased risk of adverse events), but show a 95 percent ‘confidence interval’ of .8 to 1.9. That confidence interval means there is 95 percent chance that the true value—the actual relative risk—is between .8 and 1.9.”[4]

This statement about the true value is simply wrong. The provenance of this error is old, but the mistake was unfortunately amplified in the Third Edition of the Reference Manual on Scientific Evidence,[5] in its chapter on epidemiology.[6] The chapter, which is often cited, twice misstates the meaning of a confidence interval:

“A confidence interval provides both the relative risk (or other risk measure) found in the study and a range (interval) within which the risk likely would fall if the study were repeated numerous times.”[7]

and

“A confidence interval is a range of possible values calculated from the results of a study. If a 95% confidence interval is specified, the range encompasses the results we would expect 95% of the time if samples for new studies were repeatedly drawn from the same population. Thus, the width of the interval reflects random error.”[8]

The 95% confidence interval does represent random error: it spans 1.96 standard errors above and below the point estimate from the sample data. The confidence interval is not the range of possible values, which could well be anything, but the range of estimates reasonably compatible with this one particular study’s sample statistic.[9] Intervals have lower and upper bounds, which are themselves random variables, with approximately normal (or some other specified) distributions. The essence of the interval is that no value within the interval would be rejected as a null hypothesis based upon the data collected for the particular sample. Although the chapter on statistics in the Reference Manual accurately describes confidence intervals, judges and many lawyers are misled by the misstatements in the epidemiology chapter.[10]
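
The correct interpretation is easy to demonstrate by simulation. In the following minimal sketch (the true mean, spread, and sample size are invented for illustration), the 95% describes the long-run coverage of the interval-generating procedure, not the probability that any one computed interval contains the true value:

```python
# Sketch: the 95% attaches to the procedure's long-run coverage.
import numpy as np

rng = np.random.default_rng(1)
true_mean, sigma, n, trials = 1.4, 2.0, 100, 10_000

covered = 0
for _ in range(trials):
    sample = rng.normal(true_mean, sigma, n)
    se = sample.std(ddof=1) / np.sqrt(n)
    lo, hi = sample.mean() - 1.96 * se, sample.mean() + 1.96 * se
    covered += (lo <= true_mean <= hi)

print(f"coverage over repeated samples: {covered / trials:.3f}")  # ~0.95
# Any one computed interval, such as (0.8, 1.9), either contains the true
# value or it does not; the 95% is not a probability about that interval.
```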

Given the misdirection created by the Federal Judicial Center’s manual, Judge Seeborg’s erroneous definition of a confidence interval is understandable, but it should be noted in the context of praising the important gatekeeping decision in In re Viagra. Certainly our litigation tribalism should not “allow us to believe” impossible things.[11] The time to revise the Reference Manual is long overdue.

_____________________________________________________________________

[1]  John Ruwitch, “Wyoming GOP Censures Liz Cheney For Voting To Impeach Trump,” Nat’l Pub. Radio (Feb. 6, 2021).

[2]  In re Viagra (Sildenafil Citrate) and Cialis (Tadalafil) Prods. Liab. Litig., 424 F. Supp. 3d 781 (N.D. Cal. 2020) [Viagra].

[3]  See “Judicial Gatekeeping Cures Claims That Viagra Can Cause Melanoma” (Jan. 24, 2020).

[4]  Id. at 787.

[5]  Federal Judicial Center, Reference Manual on Scientific Evidence (3rd ed. 2011).

[6]  Michael D. Green, D. Michal Freedman, & Leon Gordis, “Reference Guide on Epidemiology,” in Federal Judicial Center, Reference Manual on Scientific Evidence 549 (3rd ed. 2011).

[7]  Id. at 573.

[8]  Id. at 580.

[9] Michael O. Finkelstein & Bruce Levin, Statistics for Lawyers 171, 173-74 (3rd ed. 2015). See also Sander Greenland, Stephen J. Senn, Kenneth J. Rothman, John B. Carlin, Charles Poole, Steven N. Goodman, and Douglas G. Altman, “Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations,” 31 Eur. J. Epidem. 337 (2016).

[10]  See, e.g., Derek C. Smith, Jeremy S. Goldkind, and William R. Andrichik, “Statistically Significant Association: Preventing the Misuse of the Bradford Hill Criteria to Prove Causation in Toxic Tort Cases,” 86 Defense Counsel J. 1 (2020) (mischaracterizing the meaning of confidence intervals based upon the epidemiology chapter in the Reference Manual).

[11]  See, e.g., James Beck, “Tort Pandemic Countermeasures? The Ten Best Prescription Drug/Medical Device Decisions of 2020,” Drug and Device Law Blog (Dec. 30, 2020) (suggesting that Judge Seeborg’s decision represented the rejection of plausibility and a single “association” as insufficient); Steven Boranian, “General Causation Experts Excluded In Viagra/Cialis MDL,” (Jan. 23, 2020).

Science Bench Book for Judges

July 13th, 2019

On July 1st of this year, the National Judicial College and the Justice Speakers Institute, LLC released an online publication of the Science Bench Book for Judges [Bench Book]. The Bench Book sets out to cover much of the substantive material already covered by the Federal Judicial Center’s Reference Manual:

Acknowledgments

Table of Contents

  1. Introduction: Why This Bench Book?
  2. What is Science?
  3. Scientific Evidence
  4. Introduction to Research Terminology and Concepts
  5. Pre-Trial Civil
  6. Pre-trial Criminal
  7. Trial
  8. Juvenile Court
  9. The Expert Witness
  10. Evidence-Based Sentencing
  11. Post Sentencing Supervision
  12. Civil Post Trial Proceedings
  13. Conclusion: Judges—The Gatekeepers of Scientific Evidence

Appendix 1 – Frye/Daubert—State-by-State

Appendix 2 – Sample Orders for Criminal Discovery

Appendix 3 – Biographies

The Bench Book gives some good advice in very general terms about the need to consider study validity,[1] and to approach scientific evidence with care and “healthy skepticism.”[2] When the Bench Book attempts to instruct on what it represents as the scientific method of hypothesis testing, the good advice unravels:

“A scientific hypothesis simply cannot be proved. Statisticians attempt to solve this dilemma by adopting an alternate [sic] hypothesis – the null hypothesis. The null hypothesis is the opposite of the scientific hypothesis. It assumes that the scientific hypothesis is not true. The researcher conducts a statistical analysis of the study data to see if the null hypothesis can be rejected. If the null hypothesis is found to be untrue, the data support the scientific hypothesis as true.”[3]

Even in experimental settings, a statistical analysis of the data does not lead to a conclusion that the null hypothesis is untrue, as opposed to not reasonably compatible with the study’s data. In observational studies, the statistical analysis must acknowledge whether and to what extent the study has excluded bias and confounding.
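
A short simulation sketch (with invented sample sizes, and a null hypothesis that is true by construction) makes the point: the conventional test will “reject” a true null about five percent of the time, so rejection speaks to incompatibility of the data with the null, not to the null’s falsity:

```python
# Sketch: under a true null, ~5% of studies still "reject" at the 0.05 level.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
trials, rejections = 10_000, 0
for _ in range(trials):
    a = rng.normal(0.0, 1.0, 30)  # both groups drawn from the SAME population,
    b = rng.normal(0.0, 1.0, 30)  # so the null hypothesis is true by construction
    _, p = stats.ttest_ind(a, b)
    rejections += (p < 0.05)

print(f"rejection rate under a true null: {rejections / trials:.3f}")  # ~0.05
```

When the Bench Book turns to speak of statistical significance, more trouble ensues: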

“The goal of an experiment, or observational study, is to achieve results that are statistically significant; that is, not occurring by chance.”[4]

In the world of result-oriented science, and scientific advocacy, it is perhaps true that scientists seek to achieve statistically significant results. Still, it seems crass to come right out and say so, as opposed to saying that the scientists are querying the data to see whether they are compatible with the null hypothesis. This first pass at statistical significance is only mildly astray compared with the Bench Book’s more serious attempts to define statistical significance and confidence intervals:

4.10 Statistical Significance

“The research field agrees that study outcomes must demonstrate they are not the result of random chance. Leaving room for an error of .05, the study must achieve a 95% level of confidence that the results were the product of the study. This is denoted as p ≤ .05 (or .01 or .1).”[5]

and

“The confidence interval is also a way to gauge the reliability of an estimate. The confidence interval predicts the parameters within which a sample value will fall. It looks at the distance from the mean a value will fall, and is measured by using standard deviations. For example, if all values fall within 2 standard deviations from the mean, about 95% of the values will be within that range.”[6]

Of course, the interval speaks to the precision of the estimate, not its reliability, but that is a small point. These definitions are virtually guaranteed to confuse judges into conflating statistical significance and the coefficient of confidence with the legal burden of proof probability.
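
The point about precision can be made concrete. In this minimal sketch (the spread and sample sizes are invented for illustration), the interval’s half-width shrinks as the sample grows, which is what “precision of the estimate” means; nothing about the interval measures “reliability”:

```python
# Sketch: a confidence interval's width measures precision, not "reliability".
import numpy as np

rng = np.random.default_rng(3)
sigma = 2.0
for n in (25, 100, 400, 1600):
    sample = rng.normal(0.0, sigma, n)
    half_width = 1.96 * sample.std(ddof=1) / np.sqrt(n)
    print(f"n = {n:5d}   95% CI half-width = {half_width:.3f}")
# The half-width scales as 1/sqrt(n): quadrupling the sample roughly halves it.
```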

The Bench Book runs into problems in interpreting legal decisions, which would seem softer grist for the judicial mill. The authors present dictum from the Daubert decision as though it were a holding:[7]

“As noted in Daubert, ‘[t]he focus, of course, must be solely on principles and methodology, not on the conclusions they generate’.”

The authors fail to mention that this dictum was abandoned in Joiner, and that it is specifically rejected by statute in the 2000 revision to Federal Rule of Evidence 702.

Early in the Bench Book, its authors present a subsection entitled “The Myth of Scientific Objectivity,” which they might have borrowed from Feyerabend or Derrida. The heading appears misleading because the text contradicts it:

“Scientists often develop emotional attachments to their work—it can be difficult to abandon an idea. Regardless of bias, the strongest intellectual argument, based on accepted scientific hypotheses, will always prevail, but the road to that conclusion may be fraught with scholarly cul-de-sacs.”[8]

In a similar vein, the authors misleadingly tell readers that “the forefront of science is rarely encountered in court,” and so “much of the science mentioned there shall be considered established….”[9] Of course, the reality is that many causal claims presented in court have already been rejected or held to be indeterminate by the scientific community. And just when readers may think themselves safe from the goblins of nihilism, the authors launch into a theory of naïve probabilism that science is just placing subjective probabilities upon data, based upon preconceived biases and beliefs:

“All of these biases and beliefs play into the process of weighing data, a critical aspect of science. Placing weight on a result is the process of assigning a probability to an outcome. Everything in the universe can be expressed in probabilities.”[10]

So help the expert witness who honestly (and correctly) testifies that the causal claim or its rejection cannot be expressed as a probability statement!

Although I have not read all of the Bench Book closely, there appears to be no meaningful discussion of Rule 703, or of the need to access underlying data to ensure that the proffered scientific opinion under scrutiny has used appropriate methodologies at every step in its development. Even a 412-page text cannot address every issue, but this one does little to help the judicial reader find more in-depth help on statistical and scientific methodological issues that arise in occupational and environmental disease claims, and in pharmaceutical products litigation.

The organizations involved in this Bench Book appear to be honest brokers of remedial education for judges. The writing of this Bench Book was funded by the State Justice Institute (SJI), a creation of federal legislation enacted with the laudatory goal of improving the quality of judging in state courts.[11] Despite its provenance in federal legislation, the SJI is a private, nonprofit corporation, governed by 11 directors appointed by the President, and confirmed by the Senate. Six of the directors are state court judges, one is a state court administrator, and four are members of the public (no more than two from any one political party). The function of the SJI is to award grants to improve judging in state courts.

The National Judicial College (NJC) originated in the early 1960s, from the efforts of the American Bar Association, the American Judicature Society, and the Institute of Judicial Administration, to provide education for judges. In 1977, the NJC became a Nevada not-for-profit 501(c)(3) educational corporation, with its campus at the University of Nevada, Reno, where judges could go for training and recreational activities.

The Justice Speakers Institute appears to be a for-profit company that provides educational resources for judges. A press release touts the Bench Book and follow-on webinars. Caveat emptor.

The rationale for this Bench Book is open to question. Unlike the Reference Manual on Scientific Evidence, which was co-produced by the Federal Judicial Center and the National Academies of Sciences, the Bench Book’s authors are lawyers and judges, without any subject-matter expertise. Unlike the Reference Manual, the Bench Book’s chapters have no scientist or statistician authors, and it shows. Remarkably, the Bench Book does not appear to cite to the Reference Manual or the Manual on Complex Litigation, at any point in its discussion of the federal law of expert witnesses or of scientific or statistical method. Perhaps taxpayers would have been spared substantial expense if state judges were simply encouraged to read the Reference Manual.


[1]  Bench Book at 190.

[2]  Bench Book at 174 (“Given the large amount of statistical information contained in expert reports, as well as in the daily lives of the general society, the ability to be a competent consumer of scientific reports is challenging. Effective critical review of scientific information requires vigilance, and some healthy skepticism.”).

[3]  Bench Book at 137; see also id. at 162.

[4]  Bench Book at 148.

[5]  Bench Book at 160.

[6]  Bench Book at 152.

[7]  Bench Book at 233, quoting Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579, 595 (1993).

[8]  Bench Book at 10.

[9]  Id. at 10.

[10]  Id. at 10.

[11] See State Justice Institute Act of 1984 (42 U.S.C. ch. 113, 42 U.S.C. § 10701 et seq.).

N.J. Supreme Court Uproots Weeds in Garden State’s Law of Expert Witnesses

August 8th, 2018

The United States Supreme Court’s decision in Daubert is now over 25 years old. The idea of judicial gatekeeping of expert witness opinion testimony is even older in New Jersey state courts. The New Jersey Supreme Court articulated a reliability standard before the Daubert case was even argued in Washington, D.C. See Landrigan v. Celotex Corp., 127 N.J. 404, 414 (1992); Rubanick v. Witco Chem. Corp., 125 N.J. 421, 447 (1991). Articulating a standard, however, is something very different from following a standard, and in many New Jersey trial courts, until very recently, the standard was pretty much anything goes.

One counter-example to the general rule of dog-eat-dog in New Jersey was Judge Nelson Johnson’s careful review and analysis of the proffered causation opinions in cases in which plaintiffs claimed that their use of the anti-acne medication isotretinoin (Accutane) caused Crohn’s disease. Judge Johnson, who sits in the Law Division of the New Jersey Superior Court for Atlantic County, held a lengthy hearing, and reviewed the expert witnesses’ reliance materials.1 Judge Johnson found that the plaintiffs’ expert witnesses had employed undue selectivity in choosing what to rely upon. Perhaps even more concerning, Judge Johnson found that these witnesses had refused to rely upon reasonably well-conducted epidemiologic studies, while embracing unpublished, incomplete, and poorly conducted studies and anecdotal evidence. In re Accutane, No. 271(MCL), 2015 WL 753674, 2015 BL 59277 (N.J.Super. Law Div., Atlantic Cty. Feb. 20, 2015). In response, Judge Johnson politely but firmly closed the gate to conclusion-driven duplicitous expert witness causation opinions in over 2,000 personal injury cases. “Johnson of Accutane – Keeping the Gate in the Garden State” (Mar. 28, 2015).

Aside from resolving over 2,000 pending cases, Judge Johnson’s judgment was of intense interest to all who are involved in pharmaceutical and other products liability litigation. Judge Johnson had conducted a pretrial hearing, sometimes called a Kemp hearing in New Jersey, after the New Jersey Supreme Court’s opinion in Kemp v. The State of New Jersey, 174 N.J. 412 (2002). At the hearing and in his opinion that excluded plaintiffs’ expert witnesses’ causation opinions, Judge Johnson demonstrated a remarkable aptitude for analyzing data and inferences in the gatekeeping process.

When the courtroom din quieted, the trial court ruled that the proffered testimony of Dr. Arthur Kornbluth and Dr. David Madigan did not meet the liberal New Jersey test for admissibility. In re Accutane, No. 271(MCL), 2015 WL 753674, 2015 BL 59277 (N.J.Super. Law Div. Atlantic Cty. Feb. 20, 2015). And in closing the gate, Judge Johnson protected the judicial process from several bogus and misleading “lines of evidence,” which have become standard ploys to mislead juries in courthouses where the gatekeepers are asleep. Recognizing that not all evidence is on the same analytical plane, Judge Johnson gave case reports short shrift:

“[u]nsystematic clinical observations or case reports and adverse event reports are at the bottom of the evidence hierarchy.”

Id. at *16. Adverse event reports, largely driven by the very litigation in his courtroom, received little credit and were labeled as “not evidentiary in a court of law.” Id. at *14 (quoting FDA’s description of FAERS).

Judge Johnson recognized that there was a wide range of identified “risk factors” for inflammatory bowel disease, such as prior appendectomy, breast-feeding as an infant, stress, Vitamin D deficiency, tobacco or alcohol use, refined sugars, dietary animal fat, and fast food. In re Accutane, 2015 WL 753674, at *9. The court also noted that there were four medications generally acknowledged to be potential risk factors for inflammatory bowel disease: aspirin, nonsteroidal anti-inflammatory medications (NSAIDs), oral contraceptives, and antibiotics. Understandably, Judge Johnson was concerned that the plaintiffs’ expert witnesses preferred studies unadjusted for potential confounding co-variables and studies that had involved “cherry picking the subjects.” Id. at *18.

Judge Johnson had found that both sides in the isotretinoin cases conceded the relative unimportance of animal studies, but the plaintiffs’ expert witnesses nonetheless invoked the animal studies in the face of the artificial absence of epidemiologic studies that had been created by their cherry-picking strategies. Id.

Plaintiffs’ expert witnesses had reprised a common claimants’ strategy; namely, they claimed that all the epidemiology studies lacked statistical power. Their arguments often ignored that statistical power calculations depend upon the chosen level of statistical significance, a concept to which many plaintiffs’ counsel have virulent antibodies, as well as upon an arbitrarily selected alternative hypothesis about the size of the association. Furthermore, the plaintiffs’ arguments ignored the actual point estimates, most of which were favorable to the defense, and the observed confidence intervals, most of which were reasonably narrow.
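
The dependence of power on these analytic choices is easy to illustrate. In the sketch below (the standard error and the candidate effect sizes are invented for illustration), the same study yields very different “power” depending on the assumed alternative and the chosen significance level:

```python
# Sketch: power depends on the chosen alpha and an assumed alternative effect.
from scipy.stats import norm

def power_two_sided(effect, se, alpha=0.05):
    """Approximate power of a two-sided z-test when the true effect is `effect`."""
    z_crit = norm.ppf(1 - alpha / 2)
    shift = effect / se
    return norm.cdf(-z_crit - shift) + 1 - norm.cdf(z_crit - shift)

se = 0.15  # assumed standard error of a log-relative-risk estimate
for effect in (0.1, 0.3, 0.5):      # assumed log(RR) under the alternative
    for alpha in (0.05, 0.01):
        print(f"log(RR)={effect:.1f}, alpha={alpha}: "
              f"power = {power_two_sided(effect, se, alpha):.2f}")
```

A study that is “underpowered” against one assumed alternative may be perfectly adequate against another, which is why a bare claim of “no power” tells a court very little.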

The defense responded to the bogus statistical arguments by presenting an extremely capable clinical and statistical expert witness, Dr. Stephen Goodman, to explain meta-analysis generally, and to present two meta-analyses he had performed on isotretinoin and inflammatory bowel outcomes. Meta-analysis has become an important facet of pharmaceutical and other products liability litigation.

Dr. Goodman explained that the plaintiffs’ witnesses’ failure to perform a meta-analysis was telling, given that meta-analysis could obviate the plaintiffs’ hyperbolic statistical complaints:

“the strength of the meta-analysis is that no one feature, no one study, is determinant. You don’t throw out evidence except when you absolutely have to.”

In re Accutane, 2015 WL 753674, at *8.
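
For readers unfamiliar with the technique, here is a minimal sketch of a fixed-effect, inverse-variance meta-analysis of relative risks; the study numbers are invented for illustration and are not Dr. Goodman’s data:

```python
# Sketch: fixed-effect, inverse-variance pooling of relative risks.
import numpy as np

rr = np.array([0.9, 1.1, 1.0, 0.8, 1.2])     # hypothetical study relative risks
upper = np.array([1.4, 1.6, 1.3, 1.3, 1.9])  # hypothetical 95% CI upper bounds

log_rr = np.log(rr)
se = (np.log(upper) - log_rr) / 1.96  # recover each SE from its CI bound
w = 1.0 / se**2                       # inverse-variance weights

pooled = np.sum(w * log_rr) / np.sum(w)
pooled_se = np.sqrt(1.0 / np.sum(w))
lo, hi = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled RR = {np.exp(pooled):.2f}, "
      f"95% CI ({np.exp(lo):.2f}, {np.exp(hi):.2f})")
```

Because every study contributes in proportion to its precision, no one study is determinant, which was precisely Dr. Goodman’s point.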

Judge Johnson’s judicial handiwork received non-deferential appellate review from a three-judge panel of the Appellate Division, which reversed the exclusion of Kornbluth and Madigan. In re Accutane Litig., 451 N.J. Super. 153, 165 A.3d 832 (App. Div. 2017). The New Jersey Supreme Court granted the isotretinoin defendants’ petition for appellate review, and the issues were joined over the appropriate standard of appellate review for expert witness opinion exclusions, and the appropriateness of Judge Johnson’s exclusions of Kornbluth and Madigan. A bevy of amici curiae joined in the fray.2

Last week, the New Jersey Supreme Court issued a unanimous opinion, which reversed the Appellate Division’s holding that Judge Johnson had “mistakenly exercised” discretion. Applying its own precedents from Rubanick, Landrigan, and Kemp, and the established abuse-of-discretion standard, the Court concluded that the trial court’s ruling to exclude Kornbluth and Madigan was “unassailable.” In re Accutane Litig., ___ N.J. ___, 2018 WL 3636867 (2018), Slip op. at 79.3

The high court graciously acknowledged that defendants and amici had “good reason” to seek clarification of New Jersey law. Slip op. at 67. In abandoning abuse-of-discretion as its standard of review, the Appellate Division had relied upon a criminal case that involved the application of the Frye standard, which is applied as a matter of law. Id. at 70-71. The high court also appeared to welcome the opportunity to grant review and reverse the intermediate court, and to reinforce “the rigor expected of the trial court” in its gatekeeping role. Id. at 67. The Supreme Court, however, did not articulate a new standard; rather it demonstrated at length that Judge Johnson had appropriately applied the legal standards that had been previously announced in New Jersey Supreme Court cases.4

In attempting to defend the Appellate Division’s decision, plaintiffs sought to characterize New Jersey law as somehow different from, and more “liberal” than, the United States Supreme Court’s decision in Daubert. The New Jersey Supreme Court acknowledged that it had never formally adopted the dicta from Daubert about factors that could be considered in gatekeeping, slip op. at 10, but the Court went on to note what disinterested observers had long understood, that the so-called Daubert factors simply flowed from a requirement of sound methodology, and that there was “little distinction” and “not much light” between the Landrigan and Rubanick principles and the Daubert case or its progeny. Id. at 10, 80.

Curiously, the New Jersey Supreme Court announced that the Daubert factors should be incorporated into the New Jersey Rules 702 and 703 and their case law, but it stopped short of declaring New Jersey a “Daubert” jurisdiction. Slip op. at 82. In part, the Court’s hesitance followed from New Jersey’s bifurcation of expert witness standards for civil and criminal cases, with the Frye standard still controlling in the criminal docket. At another level, it makes no sense to describe any jurisdiction as a “Daubert” state because the relevant aspects of the Daubert decision were dicta, and the Daubert decision and its progeny were superseded by the revision of the controlling statute in 2000.5

There were other remarkable aspects of the Supreme Court’s Accutane decision. For instance, the Court put its weight behind the common-sense and accurate interpretation of Sir Austin Bradford Hill’s famous articulation of factors for causal judgment, which requires that sampling error, bias, and confounding be eliminated before assessing whether the observed association is strong, consistent, plausible, and the like. Slip op. at 20 (citing the Reference Manual at 597-99), 78.

The Supreme Court relied extensively on the National Academies’ Reference Manual on Scientific Evidence.6 That reliance is certainly preferable to judicial speculations and fabulations of scientific method. The reliance is also positive, considering that the Court did not look only at the problematic epidemiology chapter, but adverted also to the chapters on statistical evidence and on clinical medicine.

The Supreme Court recognized that the Appellate Division had essentially sanctioned an anything-goes abandonment of gatekeeping, an approach that has been all-too-common in some of New Jersey’s lower courts. Contrary to the previously prevailing New Jersey zeitgeist, the Court instructed that gatekeeping must be “rigorous” to “prevent[] the jury’s exposure to unsound science through the compelling voice of an expert.” Slip op. at 68-69.

Not all evidence is equal. “[C]ase reports are at the bottom of the evidence hierarchy.” Slip op. at 73. Extrapolation from non-human animal studies is fraught with external validity problems, and such studies are “far less probative in the face of a substantial body of epidemiologic evidence.” Id. at 74 (internal quotations omitted).

Perhaps most chilling for the lawsuit industry will be the Supreme Court’s strident denunciation of expert witnesses’ selectivity in choosing lesser evidence in the face of a large body of epidemiologic evidence, id. at 77, and their unprincipled cherry picking among the extant epidemiologic publications. Like the trial court, the Supreme Court found that the plaintiffs’ expert witnesses’ inconsistent use of methodological criteria and their selective reliance upon studies (disregarding eight of the nine epidemiologic studies) that favored their taskmasters was the antithesis of sound methodology. Id. at 73, citing with approval In re Lipitor, ___ F.3d ___ (4th Cir. 2018) (slip op. at 16) (“Result-driven analysis, or cherry-picking, undermines principles of the scientific method and is a quintessential example of applying methodologies (valid or otherwise) in an unreliable fashion.”).

An essential feature of the Supreme Court’s decision is that it was not willing to engage in the common reductionism that holds that “all epidemiologic studies are flawed,” and which thus privileges cherry picking. Not all disagreements between expert witnesses can be framed as differences in interpretation. In re Accutane will likely stand as a bulwark against flawed expert witness opinion testimony in the Garden State for a long time.


1 Judge Nelson Johnson is also the author of Boardwalk Empire: The Birth, High Times, and Corruption of Atlantic City (2010), a spell-binding history of political and personal corruption.

2 In support of the defendants’ positions, amicus briefs were filed by the New Jersey Business & Industry Association, Commerce and Industry Association of New Jersey, and New Jersey Chamber of Commerce; by law professors Kenneth S. Broun, Daniel J. Capra, Joanne A. Epps, David L. Faigman, Laird Kirkpatrick, Michael M. Martin, Liesa Richter, and Stephen A. Saltzburg; by medical associations the American Medical Association, Medical Society of New Jersey, American Academy of Dermatology, Society for Investigative Dermatology, American Acne and Rosacea Society, and Dermatological Society of New Jersey, by the Defense Research Institute; by the Pharmaceutical Research and Manufacturers of America; and by New Jersey Civil Justice Institute. In support of the plaintiffs’ position and the intermediate appellate court’s determination, amicus briefs were filed by political action committee the New Jersey Association for Justice; by the Ironbound Community Corporation; and by plaintiffs’ lawyer Allan Kanner.

3 Nothing in the intervening scientific record called into question Judge Johnson’s trial court judgment. See, e.g., I.A. Vallerand, R.T. Lewinson, M.S. Farris, C.D. Sibley, M.L. Ramien, A.G.M. Bulloch, and S.B. Patten, “Efficacy and adverse events of oral isotretinoin for acne: a systematic review,” 178 Brit. J. Dermatol. 76 (2018).

4 Slip op. at 9, 14-15, citing Landrigan v. Celotex Corp., 127 N.J. 404, 414 (1992); Rubanick v. Witco Chem. Corp., 125 N.J. 421, 447 (1991) (“We initially took that step to allow the parties in toxic tort civil matters to present novel scientific evidence of causation if, after the trial court engages in rigorous gatekeeping when reviewing for reliability, the proponent persuades the court of the soundness of the expert’s reasoning.”).

5 The Court did acknowledge that Federal Rule of Evidence 702 had been amended in 2000, to reflect the Supreme Court’s decision in Daubert, Joiner, and Kumho Tire, but the Court did not deal with the inconsistencies between the present rule and the 1993 Daubert case. Slip op. at 64, citing Calhoun v. Yamaha Motor Corp., U.S.A., 350 F.3d 316, 320-21, 320 n.8 (3d Cir. 2003).

6 See Accutane slip op. at 12-18, 24, 73-74, 77-78. With respect to meta-analysis, the Reference Manual’s epidemiology chapter is still stuck in the 1980s and the prevalent resistance to poorly conducted, often meaningless meta-analyses. See “The Treatment of Meta-Analysis in the Third Edition of the Reference Manual on Scientific Evidence” (Nov. 14, 2011) (The Reference Manual fails to come to grips with the prevalence and importance of meta-analysis in litigation, and fails to provide meaningful guidance to trial judges).

Scientific Evidence in Canadian Courts

February 20th, 2018

A couple of years ago, Deborah Mayo called my attention to the Canadian version of the Reference Manual on Scientific Evidence.1 In the course of a discussion of mistaken definitions and uses of p-values, confidence intervals, and significance testing, Sander Greenland pointed to some dubious pronouncements in the Science Manual for Canadian Judges [Manual].

Unlike the United States federal court Reference Manual, which is published through a joint effort of the National Academies of Sciences, Engineering, and Medicine, the Canadian version is the product of the Canadian National Judicial Institute (NJI, or the Institut National de la Magistrature, if you live in Quebec), which claims to be an independent, not-for-profit group, committed to educating Canadian judges. In addition to the Manual, the Institute publishes Model Jury Instructions and a guide, Problem Solving in Canada’s Courtrooms: A Guide to Therapeutic Justice (2d ed.), as well as conducting educational courses.

The NJI’s website describes the Institute’s Manual as follows:

“Without the proper tools, the justice system can be vulnerable to unreliable expert scientific evidence.

         * * *

The goal of the Science Manual is to provide judges with tools to better understand expert evidence and to assess the validity of purportedly scientific evidence presented to them. …”

The Chief Justice of Canada, Hon. Beverley M. McLachlin, contributed an introduction to the Manual, which was notable for its frank admission that:

“[w]ithout the proper tools, the justice system is vulnerable to unreliable expert scientific evidence.

* * *

Within the increasingly science-rich culture of the courtroom, the judiciary needs to discern ‘good’ science from ‘bad’ science, in order to assess expert evidence effectively and establish a proper threshold for admissibility. Judicial education in science, the scientific method, and technology is essential to ensure that judges are capable of dealing with scientific evidence, and to counterbalance the discomfort of jurists confronted with this specific subject matter.”

Manual at 14. These are laudable goals, indeed, but did the National Judicial Institute live up to its stated goals, or did it leave Canadian judges vulnerable to the Institute’s own “bad science”?

In his comments on Deborah Mayo’s blog, Greenland noted some rather cavalier statements in Chapter 2, which suggest that the conventional alpha of 5% corresponds to a “scientific attitude that unless we are 95% sure the null hypothesis is false, we provisionally accept it.” And he pointed to other passages where the chapter seems to suggest that the coefficient of confidence that corresponds to an alpha of 5% “constitutes a rather high standard of proof,” thus confusing and conflating the probability of random error with posterior probabilities. Greenland is absolutely correct that the Manual does a rather miserable job of educating Canadian judges if our standard for its work product is accuracy and truth.

Some of the most egregious errors are within what is perhaps the most important chapter of the Manual, Chapter 2, “Science and the Scientific Method.” The chapter has two authors, a scientist, Scott Findlay, and a lawyer, Nathalie Chalifour. Findlay is an Associate Professor in the Department of Biology at the University of Ottawa. Nathalie Chalifour is an Associate Professor on the Faculty of Law, also at the University of Ottawa. Together, they produced some dubious pronouncements, such as:

Weight of the Evidence (WOE)

First, the concept of weight of evidence in science is similar in many respects to its legal counterpart. In both settings, the outcome of a weight-of-evidence assessment by the trier of fact is a binary decision.”

Manual at 40. Findlay and Chalifour cite no support for their characterization of WOE in science. Most attempts to invoke WOE are woefully vague and amorphous, with no meaningful guidance or content.2 Sixty-five pages later, if anyone is still noticing, the authors let us in on a dirty little secret:

at present, there exists no established prescriptive methodology for weight of evidence assessment in science.”

Manual at 105. The authors omit, however, that there are prescriptive methods for inferring causation in science; you just will not see them in discussions of weight of the evidence. The authors then compound the semantic and conceptual problems by stating that “in a civil proceeding, if the evidence adduced by the plaintiff is weightier than that brought forth by the defendant, a judge is obliged to find in favour of the plaintiff.” Manual at 41. This is a remarkable suggestion, which implies that if the plaintiff adduces the crummiest crumb of evidence, a mere peppercorn on the scales of justice, but the defendant has none to offer, the plaintiff must win. The plaintiff wins notwithstanding that no reasonable person could believe that the plaintiff’s claims are more likely than not true. Even if this were the law of Canada, it is certainly not how scientists think about establishing the truth of empirical propositions.

Confusion of Hypothesis Testing with “Beyond a Reasonable Doubt”

The authors’ next assault comes in conflating significance probability with the probability connected with the burden of proof, a posterior probability. Legal proceedings have a defined burden of proof, with criminal cases requiring the state to prove guilt “beyond a reasonable doubt.” Findlay and Chalifour’s discussion then runs off the rails by likening hypothesis testing, with an alpha of 5% or its complement, 95%, as a coefficient of confidence, to a “very high” burden of proof:

In statistical hypothesis-testing – one of the tools commonly employed by scientists – the predisposition is that there is a particular hypothesis (the null hypothesis) that is assumed to be true unless sufficient evidence is adduced to overturn it. But in statistical hypothesis-testing, the standard of proof has traditionally been set very high such that, in general, scientists will only (provisionally) reject the null hypothesis if they are at least 95% sure it is false. Third, in both scientific and legal proceedings, the setting of the predisposition and the associated standard of proof are purely normative decisions, based ultimately on the perceived consequences of an error in inference.”

Manual at 41. This is, as Greenland and many others have pointed out, a totally bogus conception of hypothesis testing, and an utterly false description of the probabilities involved.

Later in the chapter, Findlay and Chalifour flirt with the truth, but then lapse into an unrecognizable parody of it:

Inferential statistics adopt the frequentist view of probability whereby a proposition is either true or false, and the task at hand is to estimate the probability of getting results as discrepant or more discrepant than those observed, given the null hypothesis. Thus, in statistical hypothesis testing, the usual inferred conclusion is either that the null is true (or rather, that we have insufficient evidence to reject it) or it is false (in which case we reject it). The decision to reject or not is based on the value of p: if the estimated value of p is below some threshold value α, we reject the null; otherwise we accept it.”

Manual at 74. OK; so far so good, but here comes the train wreck:

By convention (and by convention only), scientists tend to set α = 0.05; this corresponds to the collective – and, one assumes, consensual – scientific attitude that unless we are 95% sure the null hypothesis is false, we provisionally accept it. It is partly because of this that scientists have the reputation of being a notoriously conservative lot, given that a 95% threshold constitutes a rather high standard of proof.”

Manual at 75. Uggh; so we are back to significance probability’s being a posterior probability. As if to atone for their sins, in the very next paragraph, the authors then remind the judicial readers that:

As noted above, p is the probability of obtaining results at least as discrepant as those observed if the null is true. This is not the same as the probability of the null hypothesis being true, given the results.”

Manual at 75. True, true, and completely at odds with what the authors have stated previously. And to add to the reader’s now fully justified confusion, the authors describe the standard for rejecting the null hypothesis as “very high indeed.” Manual at 102, 109. Any reader who is following the discussion might wonder how and why there is such a problem of replication and reproducibility in contemporary science.
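The gap between a p-value and a posterior probability is easy to put in numbers. Here is a minimal sketch in Python, with admittedly hypothetical inputs for the prior and for study power, showing that even among results “significant” at the 5% level, the probability that the null is true need not be anywhere near 5%:

    # Hypothetical inputs: half of tested nulls are true; power is 50%.
    alpha = 0.05        # P(p < 0.05 | null true)
    power = 0.50        # P(p < 0.05 | null false)
    prior_null = 0.50   # assumed share of tested nulls that are true

    # Bayes' theorem: P(null true | significant result)
    p_sig = alpha * prior_null + power * (1 - prior_null)
    p_null_given_sig = (alpha * prior_null) / p_sig
    print(round(p_null_given_sig, 3))   # 0.091, not 0.05
    # With prior_null = 0.9, the same arithmetic gives roughly 0.47.

Being “95% sure the null hypothesis is false” is a claim about a posterior probability, which the p-value alone cannot deliver without a prior.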

Conflating Bayesianism with Frequentist Modes of Inference

We have seen how Findlay and Chalifour conflate significance and posterior probabilities, some of the time. In a section of their chapter that deals explicitly with probability, the authors tell us that before any study is conducted, the prior probability of the truth of the tested hypothesis is 50%, sans evidence. This is an astonishing creation of certainty out of nothingness, and perhaps it explains the authors’ implied claim that the crummiest morsel of evidence on one side is sufficient to compel a verdict, if the other side has no morsels at all. Here is how the authors put their claim to the Canadian judges:

Before each study is conducted (that is, a priori), the hypothesis is as likely to be true as it is to be false. Once the results are in, we can ask: How likely is it now that the hypothesis is true? In the first study, the low a priori inferential strength of the study design means that this probability will not be much different from the a priori value of 0.5 because any result will be rather equivocal owing to limitations in the experimental design.”

Manual at 64. This implied Bayesian slant, with 50% priors, in the world of science would lead anyone to believe “as many as six impossible things before breakfast,” and many more throughout the day.

Lest you think that the Manual is all rubbish, there are occasional gems of advice to the Canadian judges. The authors admonish the judges to

be wary of individual ‘statistically significant’ results that are mined from comparatively large numbers of trials or experiments, as the results may be ‘cherry picked’ from a larger set of experiments or studies that yielded mostly negative results. The court might ask the expert how many other trials or experiments testing the same hypothesis he or she is aware of, and to describe the outcome of those studies.”

Manual at 87. Good advice, but at odds with the authors’ characterization of statistical significance as establishing the rejection of the null hypothesis well-nigh beyond a reasonable doubt.
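The authors’ advice about data mining can be given arithmetic teeth. A minimal sketch, assuming only that the trials are independent and that every null hypothesis tested is in fact true:

    # With 20 independent tests of true nulls, the chance of at least
    # one nominally "significant" result at the 0.05 level is large.
    n_tests = 20
    alpha = 0.05
    p_at_least_one = 1 - (1 - alpha) ** n_tests
    print(round(p_at_least_one, 3))   # 0.642

A litigant who reports only the one “positive” trial out of twenty is thus reporting something that pure chance produces nearly two times out of three.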

When Greenland first called attention to this Manual, I reached out to some people who had been involved in its peer review. One reviewer told me that it was a “living document,” which would likely be revised after he had the chance to call the NJI’s attention to the errors. But two years later, the errors remain, and so we have to infer that the authors meant to say all the contradictory and false statements that are still present in the downloadable version of the Manual.


2 See “WOE-fully Inadequate Methodology – An Ipse Dixit By Another Name” (May 1, 2012); “Weight of the Evidence in Science and in Law” (July 29, 2017); see also David E. Bernstein, “The Misbegotten Judicial Resistance to the Daubert Revolution,” 89 Notre Dame L. Rev. 27 (2013).

High, Low and Right-Sided Colonics – Ridding the Courts of Junk Science

July 16th, 2016

Not surprisingly, many of Selikoff’s litigation- and regulatory-driven opinions have not fared well, such as the notions that asbestos causes gastrointestinal cancers and that all asbestos minerals have equal potential and strength to cause mesothelioma. Forty years after Selikoff testified in litigation that occupational asbestos exposure caused an insulator’s colorectal cancer, the Institute of Medicine reviewed the extant evidence and announced that the evidence was “suggestive but not sufficient to infer a causal relationship between asbestos exposure and pharyngeal, stomach, and colorectal cancers.” Jonathan Samet, et al., eds., Institute of Medicine Review of Asbestos: Selected Cancers (2006).[1] The Institute of Medicine’s monograph has fostered a more circumspect approach in some of the federal agencies. The National Cancer Institute’s website now proclaims that the evidence is insufficient to permit a conclusion that asbestos causes non-pulmonary cancers of the gastrointestinal tract and throat.[2]

As discussed elsewhere, Selikoff testified as early as 1966 that asbestos causes colorectal cancer, in advance of any meaningful evidence to support such an opinion, and then he, and his protégées, worked hard to lace the scientific literature with their pronouncements on the subject, without disclosing their financial, political, and positional conflicts of interest.[3]

With the Lanier plaintiffs’ firm’s zealous pursuit of bias information from the University of Idaho in the LoGiudice case, what are we to make of Selikoff’s and his minions’ dubious ethics of failed disclosure? Do Selikoff and Mount Sinai receive a pass because their asbestos research predated the discovery of ethics? The “Lobby” (as the late Douglas Liddell called Selikoff and his associates)[4] has seriously distorted truth-finding in any number of litigations, but nowhere are the Lobby’s distortions more at work than in lawsuits for claimed asbestos injuries. Here the conflicts of interest truly have had a deleterious effect on the quality of civil justice. As we saw with the Selikoff exceptionalism displayed by the New York Supreme Court in reviewing third-party subpoenas,[5] some courts seem bent on ignoring evidence-based analyses in favor of Mount Sinai faith-based initiatives.

Current Asbestos Litigation Claims Involving Colorectal Cancer

Although Selikoff has passed from the litigation scene, his trainees and followers have lined up at the courthouse door to propagate his opinions. Even before the IOM’s 2006 monograph, more sophisticated epidemiologists consistently rejected the Selikoff conclusion on asbestos and colon cancer, which grew out of Selikoff’s litigation activities.[6] And yet, the minions keep coming.

In the pre-Daubert era, defendants lacked an evidentiary challenge to Selikoff’s opinion that asbestos caused colorectal cancer. Instead of contesting the legal validity or sufficiency of the plaintiffs’ general causation claims, defendants often focused on the unreliability of the causal attribution for the specific claimant’s disease. These early cases are often misunderstood to be challenges to expert witnesses’ opinions about whether asbestos causes colorectal cancer; they were not.[7]

Of course, after the IOM’s 2006 monograph, active expert witness gatekeeping should eliminate asbestos gastrointestinal cancer claims, but sadly they persist. Perhaps courts simply considered the issue “grandfathered” in from the era in which judicial scrutiny of expert witness opinion testimony was restricted. Perhaps defense counsel are failing to frame and support their challenges properly. Perhaps both.

Arthur Frank Jumps the Gate

Although Pennsylvania is ostensibly a “Frye” state, its judges have, when moved by the occasion, applied a fairly thorough analysis of proffered expert witness opinion.[8] On occasion, Pennsylvania judges have excluded unreliably or invalidly supported causation opinions, under the Pennsylvania version of the Frye standard. A recent case, however, tried before a Workers’ Compensation Judge (WCJ), and appealed to the Commonwealth Court, shows how inconsistent the application of the standard can be, especially when Selikoff’s legacy views are at issue.

Michael Piatetsky, an architect, died of colorectal cancer. Before his death, he and his wife filed a worker’s compensation claim, in which they alleged that his disease was caused by his workplace exposure to asbestos. Garrison Architects v. Workers’ Comp. Appeal Bd. (Piatetsky), No. 1095 C.D. 2015, Pa. Cmwlth. Ct., 2016 Pa. Commw. Unpub. LEXIS 72 (Jan. 22, 2016) [cited as Piatetsky]. As an architect, Mr. Piatetsky was almost certainly knowledgeable about asbestos hazards generally. Despite his knowledge, Piatetsky eschewed personal protective equipment even when working at dusty work sites well marked with warnings. Although he had engaged in culpable conduct, the employer in workers’ compensation proceedings does not have ordinary negligence defenses, such as contributory negligence or assumption of risk.

In litigating the Piatetskys’ claim, the employer dragged its feet and failed to name an expert witness. Eventually, after many requests for continuances, the Workers’ Compensation Judge barred the employer from presenting an expert witness. With the record closed, and without an expert witness, the Judge understandably ruled in favor of the claimant.

The employer, sans expert witness, had to confront claimant’s expert witness, Arthur L. Frank, a minion of Selikoff and a frequent testifier in asbestos and many other litigations. Frank, of course, opined that asbestos causes colon cancer and that it caused Mr. Piatetsky’s cancer. Mr. Piatetsky’s colon cancer originated on the right side of his colon. Dr. Frank thus emphasized that asbestos causes colon cancer in all locations, but especially on the right side in view of one study’s having concluded “that colon cancer caused by asbestos is more likely to begin on the right side.” Piatetsky at *6.

On appeal, the employer sought relief on several issues, but the only one of interest here is the employer’s argument “that Claimant’s medical expert based his opinion on flimsy medical studies.” Piatetsky at *10. The employer’s appeal seemed to go off the rails with the insistence that the Claimant’s medical opinion was invalid because Dr. Frank relied upon studies not involving architects. Piatetsky at *14. The Commonwealth Court was able to point to testimony, although probably exaggerated, which suggested that Mr. Piatetsky had been heavily exposed, at least at times, and thus his exposure was similar to that in the studies cited by Frank.

With respect to Frank’s right-sided (non-sinister) opinion, the Commonwealth Court framed the employer’s issue as a contention that Dr. Frank’s opinion on the asbestos-relatedness of right-sided colon cancer was “not universally accepted.” But universal acceptance has never been the test or standard for the rejection or acceptance of expert witness opinion testimony in any state.  Either the employer badly framed its appeal, or the appellate court badly misstated the employer’s ground for relief. In any event, the Commonwealth Court never addressed the relevant legal standard in its discussion.

The Claimant argued that the hearing Judge had found that Frank’s opinion was based on “numerous studies.” Piatetsky at *15. None of these studies is cited in a way that would permit the public to assess the argument and the Court’s acceptance of it. The appellate court made inappropriately short work of this issue by confusing general and specific causation, and by invoking Mr. Piatetsky’s age, his lack of family history of colon cancer, and Frank’s review of medical records, testimony, and work records, as warranting Frank’s causal inference. None of these factors is relevant to general causation, and none is probative of the specific causation claim. Many if not most colon cancers have no identifiable risk factor, and Dr. Frank had no way to rule out baseline risk, even if there were an increased risk from asbestos exposure. Piatetsky at *16. With no defense expert witness, the employer certainly had a difficult appellate journey. It is hard for the reader of the Commonwealth Court’s opinion to determine whether the case was poorly defended, poorly briefed on appeal, or poorly described by the appellate judges.

In any event, the right-sided ruse of Arthur Frank went unreprimanded. Intellectual due process might have led the appellate court to cite the article at issue, but it failed to do so. It is interesting and curious to see how the appellate court gave a detailed recitation of the controverted facts of asbestos exposure, while remaining glib in its description of the scientific issues and evidence. Nonetheless, the article vaguely referenced, but never cited, by the appellate court was no doubt the paper: K. Jakobsson, M. Albin & L. Hagmar, “Asbestos, cement, and cancer in the right part of the colon,” 51 Occup. & Envt’l Med. 95 (1994).

These authors observed 24 right-sided colon cancers versus 9.63 expected, and they concluded that there was an increased rate of right-sided colon cancer in the asbestos cement plant workers. Notably, the authors’ reference population had a curiously low rate of right-sided colon cancer. For left-sided colon cancer, the authors expected 9.3 cases but observed only 5 in the asbestos-cement cohort. Contrary to Frank’s suggestion, the authors did not conclude that right-sided colon cancers had been caused by asbestos; indeed, the authors never reached any conclusion whether asbestos causes colorectal cancer under any circumstances. In their discussion, these authors noted that “[d]espite numerous epidemiological and experimental studies, there is no consensus concerning exposure to asbestos and risks of gastrointestinal cancer.” Jakobsson at 99; see also Dorsett D. Smith, “Does Asbestos Cause Additional Malignancies Other than Lung Cancer,” chap. 11, in Dorsett D. Smith, The Health Effects of Asbestos: An Evidence-based Approach 143, 154 (2015). Even this casual description of the Jakobsson study will alert the learned reader to the multiple comparisons that went on in this cohort study, with outcomes reported for left, right, rectum, and multiple sites, without any adjustment to the level of significance. Risk of right-sided colon cancer was not a pre-specified outcome of the study, and the results of subsequent studies have never corroborated this small cohort study.
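For readers who want to see the statistics at issue, here is a minimal sketch in Python, using only the observed and expected counts reported above; the exact methods Jakobsson and colleagues used may have differed:

    # Standardized incidence ratios and exact Poisson tail probabilities
    # for the right- and left-sided comparisons (scipy assumed available).
    from scipy.stats import poisson

    def sir_and_p(observed, expected):
        # SIR and one-sided P(X >= observed), with X ~ Poisson(expected)
        return observed / expected, poisson.sf(observed - 1, expected)

    print(sir_and_p(24, 9.63))   # right side: SIR ~ 2.49, small tail p
    print(sir_and_p(5, 9.3))     # left side: SIR ~ 0.54, a deficit

    # With at least four site-specific outcomes examined (right, left,
    # rectum, multiple sites), a Bonferroni-style adjustment would test
    # each at 0.05 / 4 = 0.0125 rather than 0.05; and the right-sided
    # result was not a pre-specified hypothesis in any event.

The sketch also shows why the curiously low reference rate matters: the tail probability is computed against the expected count, and any understatement of that denominator inflates the apparent excess.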

A sane understanding of subgroup analyses is important to judicial gatekeeping. See “Sub-group Analyses in Epidemiologic Studies — Dangers of Statistical Significance as a Bright-Line Test” (May 17, 2011). The chapter on statistics in the Reference Manual on Scientific Evidence (3d ed. 2011) has some prudent caveats for multiple comparisons and testing, but neither the chapter on epidemiology, nor the chapter on clinical medicine[9], provides any sense of the dangers of over-interpreting subgroup analyses.

Some commentators have argued that we must not dissuade scientists from doing subgroup analyses, but the issue is not whether they should be done, but how they should be interpreted.[10] Certainly many authors have called for caution in how subgroup analyses are interpreted[11], but apparently expert witness Arthur Frank did not receive the memo before testifying in the Piatetsky case, and the Commonwealth Court did not receive it before deciding the appeal.


[1] As good as the IOM process can be on occasion, even its reviews are sometimes less than thorough. The asbestos monograph gave no consideration to alcohol in the causation of laryngeal cancer, and no consideration to smoking in its analysis of asbestos and colorectal cancer. See, e.g., Peter S. Liang, Ting-Yi Chen & Edward Giovannucci, “Cigarette smoking and colorectal cancer incidence and mortality: Systematic review and meta-analysis,” 124 Internat’l J. Cancer 2406, 2410 (2009) (“Our results indicate that both past and current smokers have an increased risk of [colorectal cancer] incidence and mortality. Significantly increased risk was found for current smokers in terms of mortality (RR = 1.40), former smokers in terms of incidence (RR = 1.25)”); Lindsay M. Hannan, Eric J. Jacobs and Michael J. Thun, “The Association between Cigarette Smoking and Risk of Colorectal Cancer in a Large Prospective Cohort from the United States,” 18 Cancer Epidemiol., Biomarkers & Prevention 3362 (2009).

[2] National Cancer Institute, “Asbestos Exposure and Cancer Risk” (last visited July 10, 2016) (“In addition to lung cancer and mesothelioma, some studies have suggested an association between asbestos exposure and gastrointestinal and colorectal cancers, as well as an elevated risk for cancers of the throat, kidney, esophagus, and gallbladder (3, 4). However, the evidence is inconclusive.”).

[3] Compare “Health Hazard Progress Notes: Compensation Advance Made in New York State,” 16(5) Asbestos Worker 13 (May 1966) (thanking Selikoff for testifying in a colon cancer case) with Irving J. Selikoff, “Epidemiology of gastrointestinal cancer,” 9 Envt’l Health Persp. 299 (1974) (arguing for his causal conclusion between asbestos and all gastrointestinal cancers, with no acknowledgment of his role in litigation or his funding from the asbestos insulators’ union).

[4] F.D.K. Liddell, “Magic, Menace, Myth and Malice,” 41 Ann. Occup. Hyg. 3, 3 (1997); see also “The Lobby Lives – Lobbyists Attack IARC for Conducting Scientific Research” (Feb. 19, 2013).

[5] See “The LoGiudice Inquisitorial Subpoena & Its Antecedents in N.Y. Law” (July 14, 2016).

[6] See, e.g., Richard Doll & Julian Peto, Asbestos: Effects on health of exposure to asbestos 8 (1985) (“In particular, there are no grounds for believing that gastrointestinal cancers in general are peculiarly likely to be caused by asbestos exposure.”).

[7] See “Landrigan v. The Celotex Corporation, Revisited” (June 4, 2013); Landrigan v. The Celotex Corp., 127 N.J. 404, 605 A.2d 1079 (1992); Caterinicchio v. Pittsburgh Corning Corp., 127 N.J. 428, 605 A.2d 1092 (1992). In both Landrigan and Caterinicchio, there had been no challenge to the reliability or validity of the plaintiffs’ expert witnesses’ general causation opinions. Instead, the trial courts entered judgments, assuming arguendo that asbestos can cause colorectal cancer (a dubious proposition), on the ground that the low relative risk cited by plaintiffs’ expert witnesses (about 1.5) was factually insufficient to support a verdict for plaintiffs on specific causation. Indeed, the relative risk suggested that the odds were about 2 to 1 in defendants’ favor that the plaintiffs’ colorectal cancers were not caused by asbestos.

[8] See, e.g., Porter v. Smithkline Beecham Corp., Sept. Term 2007, No. 03275. 2016 WL 614572 (Phila. Cty. Com. Pleas, Oct. 5, 2015); “Demonstration of Frye Gatekeeping in Pennsylvania Birth Defects Case” (Oct. 6, 2015).

[9] John B. Wong, Lawrence O. Gostin & Oscar A. Cabrera, “Reference Guide on Medical Testimony,” in Reference Manual on Scientific Evidence 687 (3d ed. 2011).

[10] See, e.g., Phillip I. Good & James W. Hardin, Common Errors in Statistics (and How to Avoid Them) 13 (2003) (proclaiming a scientists’ Bill of Rights under which they should be allowed to conduct subgroup analyses); Ralph I. Horwitz, Burton H. Singer, Robert W. Makuch, Catherine M. Viscoli, “Clinical versus statistical considerations in the design and analysis of clinical research,” 51 J. Clin. Epidemiol. 305 (1998) (arguing for the value of subgroup analyses). In United States v. Harkonen, the federal government prosecuted a scientist for fraud in sending a telecopy that described a clinical trial as “demonstrating” a benefit in a subgroup of a secondary trial outcome.  Remarkably, in the Harkonen case, the author, and criminal defendant, was describing a result in a pre-specified outcome, in a plausible but post-hoc subgroup, which result accorded with prior clinical trials and experimental evidence. United States v. Harkonen (D. Calif. 2009); United States v. Harkonen (D. Calif. 2010) (post-trial motions), aff’d, 510 F. App’x 633 (9th Cir. 2013) (unpublished), cert. denied, 134 S. Ct. 824, ___ U.S. ___ (2014); Brief by Scientists And Academics as Amici Curiae In Support Of Petitioner, On Petition For Writ Of Certiorari in the Supreme Court of the United States, W. Scott Harkonen v. United States, No. 13-180 (filed Sept. 4, 2013).

[11] See “Sub-group Analyses in Epidemiologic Studies — Dangers of Statistical Significance as a Bright-Line Test” (May 17, 2011) (collecting commentary); see also Lemuel A. Moyé, Statistical Reasoning in Medicine: The Intuitive P-Value Primer 206, 225 (2d ed. 2006) (noting that subgroup analyses are often misleading: “Fishing expeditions for significance commonly catch only the junk of sampling error”); Victor M. Montori, Roman Jaeschke, Holger J. Schünemann, Mohit Bhandari, Jan L. Brozek, P. J. Devereaux & Gordon H. Guyatt, “Users’ guide to detecting misleading claims in clinical research reports,” 329 Brit. Med. J. 1093 (2004) (“Beware subgroup analysis”); Susan F. Assmann, Stuart J. Pocock, Laura E. Enos, Linda E. Kasten, “Subgroup analysis and other (mis)uses of baseline data in clinical trials,” 355 Lancet 1064 (2000); George Davey Smith & Mathias Egger, “Commentary: Incommunicable knowledge? Interpreting and applying the results of clinical trials and meta-analyses,” 51 J. Clin. Epidemiol. 289 (1998) (arguing against post-hoc hypothesis testing); Douglas G. Altman, “Statistical reviewing for medical journals,” 17 Stat. Med. 2662 (1998); Douglas G. Altman, “Commentary: Within trial variation – A false trail?” 51 J. Clin. Epidemiol. 301 (1998) (noting that observed associations are expected to vary across subgroups because of random variability); Christopher Bulpitt, “Subgroup Analysis,” 2 Lancet 31 (1988).

Judicial Control of the Rate of Error in Expert Witness Testimony

May 28th, 2015

In Daubert, the Supreme Court set out several criteria or factors for evaluating the “reliability” of expert witness opinion testimony. The third factor in the Court’s enumeration was whether the trial court had considered “the known or potential rate of error” in assessing the scientific reliability of the proffered expert witness’s opinion. Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579, 593 (1993). The Court, speaking through Justice Blackmun, failed to provide much guidance on the nature of the errors subject to gatekeeping, on how to quantify the errors, and on how much error was too much. Rather than provide a taxonomy of error, the Court lumped “accuracy, validity, and reliability” together with a grand pronouncement that these measures were distinguished by no more than a “hen’s kick.” Id. at 590 n.9 (1993) (citing and quoting James E. Starrs, “Frye v. United States Restructured and Revitalized: A Proposal to Amend Federal Evidence Rule 702,” 26 Jurimetrics J. 249, 256 (1986)).

The Supreme Court’s failure to elucidate its “rate of error” factor has caused a great deal of mischief in the lower courts. In practice, trial courts have rejected engineering opinions on stated grounds of their lacking an error rate as a way of noting that the opinions were bereft of experimental and empirical evidential support[1]. For polygraph evidence, courts have used the error rate factor to obscure their policy prejudices against polygraphs, and to exclude test data even when the error rate is known, and rather low compared to what passes for expert witness opinion testimony in many other fields[2]. In the context of forensic evidence, the courts have rebuffed objections to random-match probabilities that would require that such probabilities be modified by the probability of laboratory or other error[3].

When it comes to epidemiologic and other studies that require statistical analyses, lawyers on both sides of the “v” frequently misunderstand p-values or confidence intervals to provide complete measures of error, and ignore the larger errors that result from bias, confounding, study validity (internal and external), inappropriate data synthesis, and the like[4]. Not surprisingly, parties fallaciously argue that the Daubert criterion of “rate of error” is satisfied by expert witness’s reliance upon studies that in turn use conventional 95% confidence intervals and measures of statistical significance in p-values below 0.05[5].

The lawyers who embrace confidence intervals and p-values as their sole measure of error rate fail to recognize that these statistics assess only one kind of error: random sampling error. Given the carelessness of the Supreme Court’s use of technical terms in Daubert, and its failure to engage with the actual evidence at issue in the case, it is difficult to know whether the Court intended to suggest that random error was the error rate it had in mind[6]. The statistics chapter in the Reference Manual on Scientific Evidence helpfully points out that the inferences that can be drawn from data turn on p-values and confidence intervals, as well as on study design, data quality, and the presence or absence of systematic errors, such as bias or confounding. Reference Manual on Scientific Evidence at 240 (3d ed. 2011) [Manual]. Random errors are reflected in the size of p-values or the width of confidence intervals, but these measures of random sampling error ignore systematic errors such as confounding and study biases. Id. at 249 & n.96.
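The point is concrete once one sees how a conventional confidence interval is computed. In the minimal Python sketch below, with hypothetical counts, the interval for a risk ratio is built entirely from the four cells of a 2×2 table; nothing in the formula knows anything about confounding, selection, or measurement error:

    import math

    # Hypothetical cohort counts
    a, n1 = 30, 1000   # cases and persons among the exposed
    b, n0 = 20, 1000   # cases and persons among the unexposed

    rr = (a / n1) / (b / n0)
    se_log_rr = math.sqrt(1/a - 1/n1 + 1/b - 1/n0)
    lower = math.exp(math.log(rr) - 1.96 * se_log_rr)
    upper = math.exp(math.log(rr) + 1.96 * se_log_rr)
    print(round(rr, 2), (round(lower, 2), round(upper, 2)))
    # RR = 1.5, 95% CI ~ (0.86, 2.62): a statement about sampling
    # variability only, not about bias or confounding.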

The Manual’s chapter on epidemiology takes an even stronger stance: the p-value for a given study does not provide a rate of error or even a probability of error for an epidemiologic study:

“Epidemiology, however, unlike some other methodologies—fingerprint identification, for example—does not permit an assessment of its accuracy by testing with a known reference standard. A p-value provides information only about the plausibility of random error given the study result, but the true relationship between agent and outcome remains unknown. Moreover, a p-value provides no information about whether other sources of error – bias and confounding – exist and, if so, their magnitude. In short, for epidemiology, there is no way to determine a rate of error.”

Manual at 575. This stance seems not entirely justified given that there are Bayesian approaches that would produce credibility intervals accounting for sampling and systematic biases. To be sure, such approaches have their own problems and they have received little to no attention in courtroom proceedings to date.

The authors of the Manual’s epidemiology chapter, who are usually forgiving of judicial error in interpreting epidemiologic studies, point to one United States Court of Appeals case that fallaciously interpreted confidence intervals as magically quantifying bias and confounding in a Bendectin birth defects case. Id. at 575 n.96[7]. The Manual could have gone further to point out that, in the context of multiple studies, of different designs and analyses, cognitive biases involved in evaluating, assessing, and synthesizing the studies are also ignored by statistical measures such as p-values and confidence intervals. Although the Manual notes that assessing the role of chance in producing a particular set of sample data is “often viewed as essential when making inferences from data,” the Manual never suggests that random sampling error is the only kind of error that must be assessed when interpreting data. The Daubert criterion would appear to encompass all varieties of error, not just random error.

The Manual’s suggestion that epidemiology does not permit an assessment of the accuracy of epidemiologic findings misrepresents the capabilities of modern epidemiologic methods. Courts can, and do, invoke gatekeeping approaches to weed out confounded study findings. See “Sorting Out Confounded Research – Required by Rule 702” (June 10, 2012). The “reverse Cornfield inequality” was an important analysis that helped establish the causal connection between tobacco smoke and lung cancer[8]. Olav Axelson studied and quantified the role of smoking as a confounder in epidemiologic analyses of other putative lung carcinogens.[9] Quantitative methods for identifying confounders have been widely deployed[10].

A recent study in birth defects epidemiology demonstrates the power of sibling cohorts in addressing the problem of residual confounding in observational population studies with limited information about confounding variables. Researchers looking at various birth defect outcomes among offspring of women who used certain antidepressants in early pregnancy generally found no associations in pooled data from Iceland, Norway, Sweden, Finland, and Denmark. A putative association between maternal antidepressant use and a specific kind of cardiac defect (right ventricular outflow tract obstruction, or RVOTO) did appear in the overall analysis, with an adjusted odds ratio of 1.48 (95% C.I., 1.15, 1.89) for first-trimester exposure to selective serotonin reuptake inhibitors. But the association reversed when the analysis was limited to the sibling subcohort, which yielded an adjusted OR of 0.56 (95% C.I., 0.21, 1.49)[11]. This study and many others show how creative analyses can elucidate and quantify the direction and magnitude of confounding effects in observational epidemiology.

Systematic bias has also begun to succumb to more quantitative approaches. A recent guidance paper by well-known authors encourages the use of quantitative bias analysis to provide estimates of uncertainty due to systematic errors[12].
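One of the simplest tools in the quantitative bias analysis kit is the external-adjustment formula of the sort Axelson used for smoking: given assumed values for the prevalence of an unmeasured binary confounder in the exposed and unexposed groups, and for the confounder’s own risk ratio for the disease, the bias in an observed risk ratio can be computed directly. A minimal Python sketch, with purely hypothetical inputs:

    def confounding_bias_factor(rr_cd, p_exposed, p_unexposed):
        # Ratio by which an unmeasured binary confounder distorts the
        # observed risk ratio; rr_cd is the confounder-disease risk
        # ratio, p_* are confounder prevalences in each exposure group.
        # Assumes the confounder's effect is the same in both groups.
        return ((p_exposed * rr_cd + (1 - p_exposed)) /
                (p_unexposed * rr_cd + (1 - p_unexposed)))

    rr_observed = 1.5
    bias = confounding_bias_factor(rr_cd=10.0, p_exposed=0.6, p_unexposed=0.4)
    print(round(bias, 2), round(rr_observed / bias, 2))
    # bias ~ 1.39; adjusted RR ~ 1.08: an apparent 50% excess nearly
    # vanishes under these assumed smoking-like confounder parameters.

The exercise cuts both ways, of course: plausible inputs can also show that an observed association is too large to be explained by the posited confounder, which was the point of the Cornfield inequality.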

Although the courts have failed to articulate the nature and consequences of erroneous inference, some authors would reduce all of Rule 702 (and perhaps 704, 403 as well) to a requirement that proffered expert witnesses “account” for the known and potential errors in their opinions:

“If an expert can account for the measurement error, the random error, and the systematic error in his evidence, then he ought to be permitted to testify. On the other hand, if he should fail to account for any one or more of these three types of error, then his testimony ought not be admitted.”

Mark Haug & Emily Baird, “Finding the Error in Daubert,” 62 Hastings L.J. 737, 739 (2011).

Like most antic proposals to revise Rule 702, this reform vision shuts out the full range of Rule 702’s remedial scope. Scientists certainly try to identify potential sources of error, but they are not necessarily very good at it. See Richard Horton, “Offline: What is medicine’s 5 sigma?” 385 Lancet 1380 (2015) (“much of the scientific literature, perhaps half, may simply be untrue”). And as Holmes pointed out[13], certitude is not certainty, and expert witnesses are not likely to be good judges of their own inferential errors[14]. Courts continue to say and do wildly inconsistent things in the course of gatekeeping. Compare In re Zoloft (Sertraline Hydrochloride) Products, 26 F. Supp. 3d 449, 452 (E.D. Pa. 2014) (excluding expert witness) (“The experts must use good grounds to reach their conclusions, but not necessarily the best grounds or unflawed methods.”), with Gutierrez v. Johnson & Johnson, 2006 WL 3246605, at *2 (D.N.J. November 6, 2006) (denying motions to exclude expert witnesses) (“The Daubert inquiry was designed to shield the fact finder from flawed evidence.”).


[1] See, e.g., Rabozzi v. Bombardier, Inc., No. 5:03-CV-1397 (NAM/DEP), 2007 U.S. Dist. LEXIS 21724, at *7, *8, *20 (N.D.N.Y. Mar. 27, 2007) (excluding testimony from civil engineer about boat design, in part because witness failed to provide rate of error); Sorto-Romero v. Delta Int’l Mach. Corp., No. 05-CV-5172 (SJF) (AKT), 2007 U.S. Dist. LEXIS 71588, at *22–23 (E.D.N.Y. Sept. 24, 2007) (excluding engineering opinion that defective wood-carving tool caused injury because of lack of error rate); Phillips v. Raymond Corp., 364 F. Supp. 2d 730, 732–33 (N.D. Ill. 2005) (excluding biomechanics expert witness who had not reliably tested his claims in a way to produce an accurate rate of error); Roane v. Greenwich Swim Comm., 330 F. Supp. 2d 306, 309, 319 (S.D.N.Y. 2004) (excluding mechanical engineer, in part because witness failed to provide rate of error); Nook v. Long Island R.R., 190 F. Supp. 2d 639, 641–42 (S.D.N.Y. 2002) (excluding industrial hygienist’s opinion in part because witness was unable to provide a known rate of error).

[2] See, e.g., United States v. Microtek Int’l Dev. Sys. Div., Inc., No. 99-298-KI, 2000 U.S. Dist. LEXIS 2771, at *2, *10–13, *15 (D. Or. Mar. 10, 2000) (excluding polygraph data based upon showing that claimed error rate came from highly controlled situations, and that “real world” situations led to much higher (10%) false positive error rates); Meyers v. Arcudi, 947 F. Supp. 581 (D. Conn. 1996) (excluding polygraph in civil action).

[3] See, e.g., United States v. Ewell, 252 F. Supp. 2d 104, 113–14 (D.N.J. 2003) (rejecting defendant’s objection to government’s failure to quantify laboratory error rate); United States v. Shea, 957 F. Supp. 331, 334–45 (D.N.H. 1997) (rejecting objection to government witness’s providing separate match and error probability rates).

[4] For a typical judicial misstatement, see In re Zoloft Products, 26 F. Supp. 3d 449, 454 (E.D. Pa. 2014) (“A 95% confidence interval means that there is a 95% chance that the ‘true’ ratio value falls within the confidence interval range.”).

[5] From my experience, this fallacious argument is advanced by both plaintiffs’ and defendants’ counsel and expert witnesses. See also Mark Haug & Emily Baird, “Finding the Error in Daubert,” 62 Hastings L.J. 737, 751 & n.72 (2011).

[6] See David L. Faigman, et al. eds., Modern Scientific Evidence: The Law and Science of Expert Testimony § 6:36, at 359 (2007–08) (“it is easy to mistake the p-value for the probability that there is no difference”).

[7] Brock v. Merrell Dow Pharmaceuticals, Inc., 874 F.2d 307, 311-12 (5th Cir. 1989), modified, 884 F.2d 166 (5th Cir. 1989), cert. denied, 494 U.S. 1046 (1990). As with any error of this sort, there is always the question whether the judges were entrapped by the parties or their expert witnesses, or whether the judges came up with the fallacy on their own.

[8] See Joel B Greenhouse, “Commentary: Cornfield, Epidemiology and Causality,” 38 Internat’l J. Epidem. 1199 (2009).

[9] Olav Axelson & Kyle Steenland, “Indirect methods of assessing the effects of tobacco use in occupational studies,” 13 Am. J. Indus. Med. 105 (1988); Olav Axelson, “Confounding from smoking in occupational epidemiology,” 46 Brit. J. Indus. Med. 505 (1989); Olav Axelson, “Aspects on confounding in occupational health epidemiology,” 4 Scand. J. Work Envt’l Health 85 (1978).

[10] See, e.g., David Kriebel, Ariana Zeka, Ellen A. Eisen, and David H. Wegman, “Quantitative evaluation of the effects of uncontrolled confounding by alcohol and tobacco in occupational cancer studies,” 33 Internat’l J. Epidem. 1040 (2004).

[11] Kari Furu, Helle Kieler, Bengt Haglund, Anders Engeland, Randi Selmer, Olof Stephansson, Unnur Anna Valdimarsdottir, Helga Zoega, Miia Artama, Mika Gissler, Heli Malm, and Mette Nørgaard, “Selective serotonin reuptake inhibitors and venlafaxine in early pregnancy and risk of birth defects: population based cohort study and sibling design,” 350 Brit. Med. J. 1798 (2015).

[12] Timothy L. Lash, Matthew P. Fox, Richard F. MacLehose, George Maldonado, Lawrence C. McCandless, and Sander Greenland, “Good practices for quantitative bias analysis,” 43 Internat’l J. Epidem. 1969 (2014).

[13] Oliver Wendell Holmes, Jr., Collected Legal Papers at 311 (1920) (“Certitude is not the test of certainty. We have been cock-sure of many things that were not so.”).

[14] See, e.g., Daniel Kahneman & Amos Tversky, “Judgment under Uncertainty:  Heuristics and Biases,” 185 Science 1124 (1974).

ALI Reporters Are Snookered by Racette Fallacy

April 27th, 2015

In the Reference Manual on Scientific Evidence, the authors of the epidemiology chapter advance instances of acceleration of onset of disease as an example of a situation in which reliance upon doubling of risk will not provide a reliable probability of causation calculation[1]. In a previous post, I suggested that the authors’ assertion may be unfounded. See “Reference Manual on Scientific Evidence on Relative Risk Greater Than Two For Specific Causation Inference” (April 25, 2015). Several epidemiologic methods would permit the calculation of relative risk within specific time windows from first exposure, as the sketch below illustrates.
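A minimal sketch in Python, with wholly hypothetical person-time data: follow-up is stratified by time since first exposure, and an incidence rate ratio is computed within each window.

    # Each window: (exposed cases, exposed person-years,
    #               unexposed cases, unexposed person-years)
    windows = {
        "0-10 years":  (12, 40000, 10, 50000),
        "10-20 years": (30, 30000, 12, 45000),
        "20+ years":   (25, 15000, 11, 40000),
    }
    for label, (a, pt1, b, pt0) in windows.items():
        irr = (a / pt1) / (b / pt0)
        print(label, round(irr, 2))   # 1.5, 3.75, 6.06 on these inputs

If the rate ratio within the relevant window exceeds 2, then more than half of the exposed cases arising in that window would be attributable to the exposure, which is all the doubling-of-risk argument requires; acceleration of onset is no obstacle to the calculation.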

The American Law Institute (ALI) Reporters, for the Restatement of Torts, make similar claims.[2] First, the Reporters, citing the Manual’s second edition, repeat the Manual’s claim that:

“Epidemiologists, however, do not seek to understand causation at the individual level and do not use incidence rates in group studies to determine the cause of an individual’s disease.”

American Law Institute, Restatement (Third) of Torts: Liability for Physical and Emotional Harm § 28(a) cmt. c(4) & rptrs. notes (2010) [Comment c(4)]. In making this claim, the Reporters ignore an extensive body of epidemiologic studies on genetic associations and on biomarkers, which do address causation, implicitly or explicitly, on an individual level.

The Reporters also repeat the Manual’s doubtful claim that acceleration of onset of disease prevents an assessment of attributable risk, although they acknowledge that an average earlier age of onset could support damages calculations for the earlier onset, rather than damages for an injury that would not have occurred at all but for the tortious exposure. Comment c(4). The Reporters go a step further than the Manual, however, and provide an example of the acceleration-of-onset studies that they have in mind:

“For studies whose results suggest acceleration, see Brad A. Racette, ‘Welding-Related Parkinsonism: Clinical Features, Treatments, and Pathophysiology,’ 56 Neurology 8, 12 (2001) (stating that authors ‘believe that welding acts as an accelerant to cause [Parkinson’s Disease]’ … ).”

The citation to Racette’s 2001 paper[3] is curious, interesting, disturbing, and perhaps revealing. In this 2001 paper, Racette misrepresented the type of study he claimed to have done, and the inferences he drew from his case series are invalid. Anyone experienced in the field of epidemiology would have dismissed this study, its conclusions, and its suggested relation between welding and parkinsonism.

Dr. Brad A. Racette teaches and practices neurology at Washington University in St. Louis, across the river from a hotbed of mass tort litigation, Madison County, Illinois. In the 1990s, Racette received referrals from plaintiffs’ attorneys to evaluate their clients in litigation over exposure to welding fumes. Plaintiffs were claiming that their occupational exposures caused them to develop manganism, a distinctive parkinsonism that differs from Parkinson’s disease [PD], but has signs and symptoms that might be confused with PD by unsophisticated physicians unfamiliar with both manganism and PD.

After the publication of his 2001 paper, Racette became the darling of felon Dicky Scruggs and other plaintiffs’ lawyers. The litigation industrialists invited Racette and his team down to Alabama and Mississippi, to conduct screenings of welding tradesmen, recruited by Scruggs and his team, for potential lawsuits for PD and parkinsonism. The result was a paper that helped Scruggs propel a litigation assault against the welding industry.[4]

Racette’s 2001 paper was accompanied by a press release, as many of his papers have been, in which he was quoted as stating that “[m]anganism is a very different disease” from PD. Gila Reckess, “Welding, Parkinson’s link suspected” (Feb. 9, 2001)[5].

Racette’s 2001 paper provoked a strongly worded letter that called Racette and his colleagues out for misrepresenting the nature of their work:

“The authors describe their work as a case–control study. Racette et al. ascertained welders with parkinsonism and compared their concurrent clinical features to those of subjects with PD. This is more consistent with a cross-sectional design, as the disease state and factors of interest were ascertained simultaneously. Cross-sectional studies are descriptive and therefore cannot be used to infer causation.”

*****

“The data reported by Racette et al. do not necessarily support any inference about welding as a risk factor in PD. A cohort study would be the best way to evaluate the role of welding in PD.”

Bernard Ravina, Andrew Siderowf, John Farrar, Howard Hurtig, “Welding-related parkinsonism: Clinical features, treatment, and pathophysiology,” 57 Neurology 936, 936 (2001).

As we will see, Dr. Ravina and his colleagues were charitable to suggest that the study was more compatible with a cross-sectional study. Racette had set out to determine “whether welding-related parkinsonism differs from idiopathic PD.” He claimed that he had “performed a case-control study,” with a case group of welders and two control groups. His inferences drawn from his “data” are, however, fallacious because he employed an invalid study design.

In reality, Racette’s paper was nothing more than a chart review, a case series of 15 “welders” in the context of a movement disorder clinic. After his clinical and radiographic evaluation, Racette found that these 15 cases were clinically indistinguishable from PD, and thus unlike manganism. Racette did not reveal whether any of these 15 welders had been referred by plaintiffs’ counsel; nor did he suggest that these welding tradesmen made up a disproportionate number of his patient base in St. Louis, Missouri.

Racette compared his selected 15 career welders with PD to his general movement disorders clinic patient population. From the patient population, Racette deployed two “control” groups, one matched for age and sex with the 15 welders, and the other group not matched. The American Law Institute Reporters are indeed correct that Racette suggested that the average age of onset for these 15 welders was lower than that for his non-welder patients, but their uncritical embrace overlooked the fact that Racette’s suggestion does not support his claimed inference that, in welders, “welding exposure acts as an accelerant to cause PD.”

Racette’s claimed inference is remarkable because he did not perform an analytical epidemiologic study that was capable of generating causal inferences. His paper incongruously presents odds ratios, although the controls have PD, the disease of interest, which invalidates any analytical inference from his case series. Given the referral and selection biases inherent in tertiary-care specialty practices, this paper can provide no reliable inferences about associations or differences in ages of onset. Even within the confines of a case series misrepresented to be a case-control study, Racette acknowledged that “[s]ubsequent comparisons of the welders with age-matched controls showed no significant differences.”

Not a Case-Control Study

That Racette wrongly identified his paper as a case-control study is beyond debate. How the journal Neurology accepted the paper for publication is a mystery. The acceptance of the inference by the ALI Reporters, lawyers, and judges is regrettable.

Structurally, Racette’s paper could never qualify as a case-control study, or any other analytical epidemiologic study. Here is how a leading textbook on case-control studies defines a case-control study:

“In a case-control study, individuals with a particular condition or disease (the cases) are selected for comparison with a series of individuals in whom the condition or disease is absent (the controls).”

James J. Schlesselman, Case-control Studies. Design, Conduct, Analysis at 14 (N.Y. 1982)[6].

Every patient in Racette’s paper, welder and non-welder alike, has the outcome of interest, PD. There is no epidemiologic study design that corresponds to what Racette did, and there is no way to draw any useful inference from Racette’s comparisons. Racette’s paper violates the key principle for a proper case-control study; namely, all subjects must be selected independently of the study exposure that is under investigation. Schlesselman stressed that identifying an eligible case or control must not depend upon that person’s exposure status for any factor under consideration. Id. Racette’s 2001 paper deliberately violated this basic principle.

Racette’s study design, with only cases with the outcome of interest appearing in the analysis, recklessly obscures the underlying association between the exposure (welding) and age in the population. We would, of course, expect self-identified welders to be younger than the average Parkinson’s disease patient because welding is physical work that requires good health. An equally fallacious study could be cobbled together to “show” that the age-of-onset of Parkinson’s disease for sitcom actors (such as Michael J. Fox) is lower than the age-of-onset of Parkinson’s disease for Popes (such as John Paul II). Sitcom actors are generally younger as a group than Popes. Comparing age of onset between disparate groups that have different age distributions generates a biased comparison and an erroneous inference.
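The fallacy is easy to reproduce by simulation. In this minimal Python sketch (hypothetical numbers throughout, with numpy assumed available), every person faces identical age-specific disease risks; the only difference between the groups is their age distributions. The mean age at onset among affected members is necessarily lower in the younger group, even though membership in that group adds no risk at all:

    import numpy as np

    rng = np.random.default_rng(0)

    def onset_ages(current_ages):
        # Disease risk rises with age in exactly the same way for
        # everyone (purely illustrative: probability = age / 200).
        affected = rng.random(current_ages.size) < current_ages / 200.0
        return current_ages[affected]

    workers = rng.uniform(30, 55, 100_000)   # a younger working group
    clinic  = rng.uniform(50, 85, 100_000)   # an older clinic population
    print(onset_ages(workers).mean())   # ~ 44: "earlier onset"
    print(onset_ages(clinic).mean())    # ~ 69: later onset, same risks

The simulated “welders” appear to develop the disease some 25 years earlier, although by construction the exposure does nothing.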

The invalidity and fallaciousness of Racette’s approach to studying the age of onset of PD in welders, and his uncritical inferences, have been extensively commented upon in the general epidemiologic literature. For instance, studies that compared the age at death for left-handed versus right-handed persons reported that left handers died on average nine years earlier, leading to (unfounded) speculation that the earlier mortality resulted from birth and life stressors and accidents for left handers living in a world designed to accommodate right-handed persons[7]. The inference has been shown to be fallacious: it resulted from social pressure in the early twentieth century that pushed left handers to use their right hands, a prejudicial practice that abated over the decades of the last century. Left handers born later in the century were less likely to be “switched,” whereas those persons born earlier, and now dying, were less likely to be classified as left-handed, a birth-cohort effect[8]. When proper prospective cohort studies were conducted, valid data showed that left-handers and right-handers have equivalent mortality rates[9].

Epidemiologist Ken Rothman addressed the fallacy of Racette’s paper at some length in one of his books:

“Suppose we study two groups of people and look at the average age at death among those who die. In group A, the average age of death is 4 years; in group B, it is 28 years. Can we say that being a member of group A is riskier than being a member of group B? We cannot… . Suppose that group A comprises nursery school students and group B comprises military commandos. It would be no surprise that the average age at death of people who are currently military commandos is 28 years or that the average age of people who are currently nursery students is 4 years. …

In a study of factory workers, an investigator inferred that the factory work was dangerous because the average age of onset of a particular kind of cancer was lower in these workers than among the general population. But just as for the nursery school students and military commandos, if these workers were young, the cancers that occurred among them would have to be occurring in young people. Furthermore, the age of onset of a disease does not take into account what proportion of people get the disease.

These examples reflect the fallacy of comparing the average age at which death or disease strikes rather than comparing the risk of death between groups of the same age.”

Kenneth J. Rothman, “Introduction to Epidemiologic Thinking,” in Epidemiology: An Introduction at 5-6 (N.Y. 2002).

And here is how another author of Modern Epidemiology[10] addressed the Racette fallacy in a different context involving PD:

“Valid studies of age-at-onset require no underlying association between the risk factor and aging or birth cohort in the source population. They must also consider whether a sufficient induction time has passed for the risk factor to have an effect. When these criteria and others cannot be satisfied, age-specific or standardized risks or rates, or a population-based case-control design, must be used to study the association between the risk factor and outcome. These designs allow the investigator to disaggregate the relation between aging and the prevalence of the risk factor, using familiar methods to control confounding in the design or analysis. When prior knowledge strongly suggests that the prevalence of the risk factor changes with age in the source population, case-only studies may support a relation between the risk factor and age-at-onset, regardless of whether the inference is justified.”

Jemma B. Wilk & Timothy L. Lash, “Risk factor studies of age-at-onset in a sample ascertained for Parkinson disease affected sibling pairs: a cautionary tale,” 4 Emerging Themes in Epidemiology 1 (2007) (internal citations omitted) (emphasis added).

A properly designed epidemiologic study would have avoided Racette’s fallacy. A relevant cohort study would have enrolled welders in the study at the outset of their careers, and would have continued to follow them even if they changed occupations. A case-control study would have enrolled cases with PD and controls without PD (or more broadly, parkinsonism), with cases and controls selected independently of their exposure to welding fumes. Either method would have determined the rate of PD in both groups, absolutely or relatively. Racette’s paper, which completely lacked non-PD cases, could not have possibly accomplished his stated objectives, and it did not support his claims.

Racette’s questionable work provoked a mass tort litigation and ultimately federal Multi-District Litigation 1535.[11] In time, analytical epidemiologic studies consistently showed no association between welding and PD. A meta-analysis published in 2012 ended the debate as a practical matter[12], and MDL 1535 is no more. How strange that the ALI Reporters chose the Racette work as an example of their claims about acceleration of onset!


[1] Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” in Federal Judicial Center, Reference Manual on Scientific Evidence 549, 614 (Wash., DC 3d ed. 2011).

[2] Michael D. Green was an ALI Reporter, and of course, an author of the chapter in the Reference Manual.

[3] Brad A. Racette, L. McGee-Minnich, S. M. Moerlein, J. W. Mink, T. O. Videen, and Joel S. Perlmutter, “Welding-related parkinsonism: clinical features, treatment, and pathophysiology,” 56 Neurology 8 (2001).

[4] See Brad A. Racette, S.D. Tabbal, D. Jennings, L. Good, Joel S. Perlmutter, and Brad Evanoff, “Prevalence of parkinsonism and relationship to exposure in a large sample of Alabama welders,” 64 Neurology 230 (2005); Brad A. Racette, et al., “A rapid method for mass screening for parkinsonism,” 27 Neurotoxicology 357 (2006) (duplicate publication of the earlier, 2005, paper).

[5] Previously available at <http://record.wustl.edu/archive/2001/02-09-01/articles/welding.html>, last visited on June 27, 2005.

[6] See also Brian MacMahon & Dimitrios Trichopoulos, Epidemiology. Principles and Methods at 229 (2d ed. 1996) (“A case-control study is an inquiry in which groups of individuals are selected based on whether they do (the cases) or do not (the controls) have the disease of which the etiology is to be studied.”); Jennifer L. Kelsey, W.D. Thompson, A.S. Evans, Methods in Observational Epidemiology at 148 (N.Y. 1986) (“In a case-control study, persons with a given disease (the cases) and persons without the disease (the controls) are selected … .”).

[7] See, e.g., Diane F. Halpern & Stanley Coren, “Do right-handers live longer?” 333 Nature 213 (1988); Diane F. Halpern & Stanley Coren, “Handedness and life span,” 324 New Engl. J. Med. 998 (1991).

[8] Kenneth J. Rothman, “Left-handedness and life expectancy,” 325 New Engl. J. Med. 1041 (1991) (pointing out that, by the age-at-onset comparison method, nursery education would be found more dangerous than paratrooper training, given that the age at death of pre-schoolers who died would be much lower than that of paratroopers who died); see also Martin Bland & Doug Altman, “Do the left-handed die young?” Significance 166 (Dec. 2005).

[9] See Philip A. Wolf, Ralph B. D’Agostino, Janet L. Cobb, “Left-handedness and life expectancy,” 325 New Engl. J. Med. 1042 (1991); Marcel E. Salive, Jack M. Guralnik & Robert J. Glynn, “Left-handedness and mortality,” 83 Am. J. Public Health 265 (1993); Olga Basso, Jørn Olsen, Niels Holm, Axel Skytthe, James W. Vaupel, and Kaare Christensen, “Handedness and mortality: A follow-up study of Danish twins born between 1900 and 1910,” 11 Epidemiology 576 (2000). See also Martin Wolkewitz, Arthur Allignol, Martin Schumacher, and Jan Beyersmann, “Two Pitfalls in Survival Analyses of Time-Dependent Exposure: A Case Study in a Cohort of Oscar Nominees,” 64 Am. Statistician 205 (2010); Michael F. Picco, Steven Goodman, James Reed, and Theodore M. Bayless, “Methodologic pitfalls in the determination of genetic anticipation: the case of Crohn’s disease,” 134 Ann. Intern. Med. 1124 (2001).

[10] Kenneth J. Rothman, Sander Greenland, Timothy L. Lash, eds., Modern Epidemiology (3d ed. 2008).

[11] Dicky Scruggs served on the Plaintiffs’ Steering Committee until his conviction on criminal charges.

[12] James Mortimer, Amy Borenstein, and Lorene Nelson, “Associations of welding and manganese exposure with Parkinson disease: Review and meta-analysis,” 79 Neurology 1174 (2012).

Reference Manual on Scientific Evidence on Relative Risk Greater Than Two For Specific Causation Inference

April 25th, 2015

The first edition of the Reference Manual on Scientific Evidence [Manual] was published in 1994, a year after the Supreme Court delivered its opinion in Daubert. The Federal Judicial Center organized and produced the Manual, in response to the kernel panic created by the Supreme Court’s mandate that federal trial judges serve as gatekeepers of the methodological propriety of testifying expert witnesses’ opinions. Considering the intellectual vacuum the Center had to fill, and the speed with which it had to work, the first edition was a stunning accomplishment.

In litigating specific causation in so-called toxic tort cases, defense counsel quickly embraced the Manual’s apparent endorsement of the doubling-of-the-risk argument, which would require relative risks in excess of two in order to draw inferences of specific causation in a given case. See Linda A. Bailey, Leon Gordis, and Michael D. Green, “Reference Guide on Epidemiology,” in Federal Judicial Center, Reference Manual on Scientific Evidence 123, 150, 168 (Wash., DC 1st ed. 1994) (“The relative risk from an epidemiological study can be adapted to this 50% plus standard to yield a probability or likelihood that an agent caused an individual’s disease. The threshold for concluding that an agent was more likely than not the cause of a disease is a relative risk greater than 2.0.”) (internal citations omitted).
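The arithmetic behind the doubling argument is simple enough to state in a few lines. On the argument’s own assumptions, the probability that an exposed individual’s disease is attributable to the exposure is the attributable fraction among the exposed, (RR − 1)/RR, which crosses 50% exactly when the relative risk exceeds two. A minimal sketch, with illustrative values:

```python
# The arithmetic behind the doubling-of-the-risk argument: on the
# argument's own assumptions, the probability that an exposed
# individual's disease is attributable to the exposure is the
# attributable fraction among the exposed, (RR - 1) / RR.

def probability_of_causation(rr):
    """Attributable fraction among the exposed, given relative risk rr."""
    if rr <= 1.0:
        return 0.0
    return (rr - 1.0) / rr

for rr in (1.5, 2.0, 3.0):
    print(rr, round(probability_of_causation(rr), 3))
# 1.5 -> 0.333 (well short of 'more likely than not')
# 2.0 -> 0.5   (exactly at the 50% line)
# 3.0 -> 0.667 (clears the preponderance threshold)
```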

In the Second Edition of the Manual, the authorship of the epidemiology chapter shifted, and so did its treatment of doubling of the risk. By adopting a more nuanced analysis, the Second Edition deprived defense counsel of a readily citable source for the proposition that low relative risks do not support inferences of specific causation. The exact conditions for when and how the doubling argument should prevail were, however, left fuzzy and unspecified. See Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” in Federal Judicial Center, Reference Manual on Scientific Evidence 333, 348-49 (Wash., DC 2d ed. 2000).

The latest edition of the Manual attempts to correct the failings of the Second Edition by introducing an explanation and a discussion of some of the conditions that might undermine an inference, or opposition to an inference, of specific causation from the magnitude of relative risk. Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” in Federal Judicial Center, Reference Manual on Scientific Evidence 549, 612 (Wash., DC 3d ed. 2011).

The authors of the Manual now acknowledge that the doubling-of-risk inference has “a certain logic as far as it goes,” but point out that there are some “significant assumptions and important caveats that require explication.” Id.

What are the assumptions, according to the Manual?

First, and foremost, there must be “[a] valid study and risk estimate.” Id. (emphasis in original). The identification of this predicate assumption is, of course, correct, but the authors overlook that the assumption is often trivially satisfied by the legal context in which the doubling argument arises. For instance, in the Landrigan and Caterinicchio cases, cited below, the doubling issue arose not as an admissibility question of expert witness opinion, but on motions for directed verdict. In both cases, plaintiffs’ expert witnesses committed to opinions about plaintiffs’ being at risk from asbestos exposure, based upon studies that they identified. Defense counsel in those cases did not concede the existence of risk, the size of the risk, or the validity of the studies, but rather stipulated such facts solely for purposes of their motions. In other words, even if the studies upon which plaintiffs relied were valid and the risk estimates accurate (with relative risks of 1.5), plaintiffs could not prevail because no reasonable jury could infer that plaintiffs’ colorectal cancers were caused by their occupational asbestos exposure. The procedural context of the doubling-of-risk argument thus often pretermits questions of validity, bias, and confounding.

Second, the Manual identifies that there must be “[s]imilarity among study subjects and plaintiff.” Id. at 613. Again, this assumption is often either pretermitted for purposes of lodging a dispositive motion, conceded, or included as part of the challenge to the admissibility of an expert witness’s opinion. For example, in some litigations, plaintiffs rely upon high-dose or high-exposure studies that are not comparable to the plaintiff’s actual exposure, and the defense may have shown that the only reliable evidence shows a small risk (relative risk less than two), or no risk at all, from the plaintiff’s exposure. External validity objections may well play a role in a contest under Rule 702, but the resolution of a doubling-of-risk issue will require an appropriate measure of risk for the plaintiff whose injury is at issue.

In the course of identifying this second assumption, the Manual now points out that the doubling argument turns on applying “an average risk for the group” to each individual in the group. Id. This point again is correct, but the Manual does not come to terms with the challenge often made to what I call the assumption of stochastic risk. The Manual authors quote a leading textbook on epidemiology:

“We cannot measure the individual risk, and assigning the average value to everyone in the category reflects nothing more than our ignorance about the determinants of lung cancer that interact with cigarette smoke. It is apparent from epidemiological data that some people can engage in chain smoking for many decades without developing lung cancer. Others are or will become primed by unknown circumstances and need only to add cigarette smoke to the nearly sufficient constellation of causes to initiate lung cancer. In our ignorance of these hidden causal components, the best we can do in assessing risk is to classify people according to measured causal risk indicators and then assign the average observed within a class to persons within the class.”

Id. at 614 n.198, quoting Kenneth J. Rothman, Sander Greenland, and Timothy L. Lash, Modern Epidemiology 9 (3d ed. 2008). Although the textbook on this point is unimpeachable, taken at face value, it would introduce an evidentiary nihilism into judicial determinations of specific causation in cases in which epidemiologic measures of risk size are the only basis for drawing probabilistic inferences of specific causation. See also Manual at 614 n.198, citing Ofer Shpilberg, et al., “The Next Stage: Molecular Epidemiology,” 50 J. Clin. Epidem. 633, 637 (1997) (“A 1.5-fold relative risk may be composed of a 5-fold risk in 10% of the population, and a 1.1-fold risk in the remaining 90%, or a 2-fold risk in 25% and a 1.1-fold for 75%, or a 1.5-fold risk for the entire population.”). The assumption of stochastic risk is, as Judge Weinstein recognized in Agent Orange, often the only assumption on which plaintiffs will ever have a basis for claiming individual causation from the typical datasets available to support health-effects claims. Elsewhere, the authors of the Manual’s chapter suggest that statistical “frequentists” would resist the adaptation of relative risk to provide a probability of causation because, for the frequentist, the individual case either is or is not caused by the exposure at issue. Manual at 611 n.188. This suggestion appears to confuse the frequentist enterprise, which evaluates evidence by the probability of observing at least as great a departure from expectation in a sample, with the attempt to affix a probability to the population parameter itself. The doubling argument derives from the well-known “urn model” of probability theory, which is not really at issue in the frequentist-Bayesian wars.
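The Shpilberg-style decomposition can be checked with a few lines of arithmetic. Assuming, for simplicity, that every subgroup shares the same baseline (unexposed) risk, the population-wide relative risk is just the population-weighted average of the subgroup relative risks, and the same 1.5 can conceal radically different probabilities of causation:

```python
# A quick check of the first decomposition quoted above, under the
# simplifying assumption that every subgroup shares the same baseline
# (unexposed) risk, so the population-wide relative risk is the
# population-weighted average of subgroup relative risks.

def blended_rr(subgroups):
    """subgroups: iterable of (population_fraction, subgroup_rr)."""
    return sum(frac * rr for frac, rr in subgroups)

# 5-fold risk in 10% of the population, 1.1-fold in the other 90%:
print(blended_rr([(0.10, 5.0), (0.90, 1.1)]))  # ~1.49, i.e., about 1.5

# Yet someone drawn from the 10% subgroup faces a probability of
# causation of (5 - 1)/5 = 0.8, while the 90% subgroup sits at
# (1.1 - 1)/1.1, under 10% -- the 'average' risk of 1.5 fits no one.
```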

Third, the Manual authors state that the doubling argument assumes the “[n]onacceleration of disease.” The statement is correct as far as it goes, but in many cases there is no evidence of acceleration at all, and because an acceleration-of-onset theory would diminish damages, defendants typically would have the burden of going forward with identifying the acceleration phenomenon. The authors go further, however, in stating that “for most of the chronic diseases of adulthood, it is not possible for epidemiologic studies to distinguish between acceleration of disease and causation of new disease.” Manual at 614. The inability to distinguish acceleration from causation of new cases would typically redound to the disadvantage of defendants making the doubling argument. In other words, the defendants would, by this supposed inability, be unable to mitigate damages by showing that the alleged harm would have occurred anyway, but only later in time. See Manual at 615 n.199 (“If acceleration occurs, then the appropriate characterization of the harm for purposes of determining damages would have to be addressed. A defendant who only accelerates the occurrence of harm, say, chronic back pain, that would have occurred independently in the plaintiff at a later time is not liable for the same amount of damages as a defendant who causes a lifetime of chronic back pain.”). More important, however, the Manual appears to be wrong in asserting that epidemiologic studies cannot identify acceleration of onset of a particular disease. Many modern longitudinal epidemiologic studies and clinical trials use survival analysis and time windows to identify latency or time-lagged outcomes in association with identified exposures, as the sketch below illustrates.
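Consider a minimal simulation in Python. Everything in it is hypothetical: the cohort size, the lifetime risk, the onset-age distribution, and the five-year acceleration are all invented for the illustration. But it shows why, with adequate follow-up, pure acceleration and new causation leave different fingerprints in the data.

```python
import numpy as np

# Toy sketch (not any study's actual method; all numbers invented).
# With long follow-up, survival-style summaries can separate pure
# acceleration of onset from causation of new cases.

rng = np.random.default_rng(0)
n = 200_000

# Unexposed cohort: 10% lifetime risk; onset ages roughly Normal(65, 8).
sick = rng.random(n) < 0.10
onset_unexposed = np.where(sick, rng.normal(65.0, 8.0, n), np.inf)

# Scenario A -- pure acceleration: the same 10% get the same disease,
# five years earlier. Lifetime incidence is unchanged.
onset_accelerated = onset_unexposed - 5.0  # inf - 5 is still inf

# Scenario B -- new causation: onset ages unchanged, but exposure
# doubles the fraction of the cohort that ever gets the disease.
sick_new = rng.random(n) < 0.20
onset_new_cases = np.where(sick_new, rng.normal(65.0, 8.0, n), np.inf)

def summarize(label, onset, horizon=90.0):
    cases = onset <= horizon
    print(f"{label:14s} cumulative incidence {cases.mean():.3f}  "
          f"median onset {np.median(onset[cases]):.1f}")

summarize("unexposed", onset_unexposed)
summarize("acceleration", onset_accelerated)
summarize("new cases", onset_new_cases)
# acceleration: same cumulative incidence, onset ~5 years earlier
# new causation: ~doubled cumulative incidence, onset unchanged
```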

The fourth assumption identified in the Manual is that the exposure under study acts independently of other exposures. The authors give the time-worn example of multiplicative synergy between asbestos and smoking, what elsewhere has been referred to as “The Mt. Sinai Catechism” (June 7, 2013). The example was improvidently chosen, given that the multiplicative relationship was doubtful when first advanced, and has now effectively been retracted or modified by the researchers following the health outcomes of asbestos insulators in the United States. More important for our purposes here, interactions can be quantified and added to the analysis of attributable risk; interactions are not insuperable barriers to reasonable apportionment of risk.

Fifth, the Manual identifies two additional assumptions: (a) that the exposure at issue is not responsible for another outcome that competes with the studied morbidity or mortality, and (b) that the exposure does not provide a protective “effect” in a subpopulation of those studied. Manual at 615. On the first of these assumptions, the authors suggest that the assumption is required “because in the epidemiologic studies relied on, those deaths caused by the alternative disease process will mask the true magnitude of increased incidence of the studied disease when the study subjects die before developing the disease of interest.” Id. at 615 n.202. Competing causes, however, are frequently studied, and they can be treated as confounders in an appropriate regression or propensity-score analysis to yield a risk estimate for each individual putative effect at issue, as sketched below. The second of the two assumptions is a rehash of the speculative assertion that the epidemiologic study (and the population it samples) may not have a stochastic distribution of risk. Although the stochastic assumption may not be correct, it is often favorable to the party asserting the claim, who otherwise could not show that he was not in a sub-population of people unaffected by, or even benefited by, the exposure. Again, modern epidemiology does not stop at identifying populations at risk, but continues to refine the assessment by trying to identify subpopulations that have the risk exclusively. The existence of multi-modal distributions of risk within a population is, again, not a barrier to the doubling argument.
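A rough sketch of the adjustment point, again with invented numbers: the competing cause is deliberately made more common among the exposed, the outcome is made to depend only on the competing cause, and a regression that includes the competing cause as a covariate recovers the null effect of exposure that the crude comparison obscures. The statsmodels package is assumed to be available; nothing here comes from the Manual itself.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 50_000

exposure = (rng.random(n) < 0.5).astype(float)
# The competing cause is more common among the exposed, so it confounds.
competing = (rng.random(n) < np.where(exposure == 1.0, 0.4, 0.2)).astype(float)

# The outcome depends only on the competing cause; exposure is truly null.
p = 1.0 / (1.0 + np.exp(-(-3.0 + 1.0 * competing)))
outcome = (rng.random(n) < p).astype(float)

crude = sm.Logit(outcome, sm.add_constant(exposure)).fit(disp=0)
adjusted = sm.Logit(
    outcome, sm.add_constant(np.column_stack([exposure, competing]))
).fit(disp=0)

print(np.exp(crude.params[1]))     # crude OR for exposure: inflated above 1
print(np.exp(adjusted.params[1]))  # adjusted OR for exposure: ~1.0
```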

With sufficiently large samples, epidemiologic studies may be able to identify subgroups that have very large relative risks, even when the overall sample under study had a relative risk under two. The possibility of such subgroups, however, should not be an invitation to wholesale speculation that a given plaintiff is in a “vulnerable” subgroup without reliable, valid evidence of what the risks for the identified subgroup are. Too often, the vulnerable plaintiff or subgroup claim is merely hand-waving in an evidentiary vacuum. The Manual authors seem to adopt this hand-waving attitude when they give a speculative hypothetical example:

“For example, genetics might be known to be responsible for 50% of the incidence of a disease independent of exposure to the agent. If genetics can be ruled out in an individual’s case, then a relative risk greater than 1.5 might be sufficient to support an inference that the agent was more likely than not responsible for the plaintiff’s disease.”

Manual at 615-16 (internal citations omitted). The hypothetical is unclear as to whether the “genetics” cases are part of the study that yielded a relative risk of 1.5. If the “genetics” were uniformly distributed in the population, and also in the sample studied in the epidemiologic study, then the “genetics” would appear to drop out of playing any role in elevating risk. But, as the authors pointed out in their caveats about interaction, there may well be an interaction between the “genetics” and the exposure under study, such that the “genetics” cases occurred earlier or did not add anything to the disease burden that would have been caused by the exposure that reported out a relative risk of 1.5. The bottom line is that a plaintiff would need a study that accounted for the “genetics” in the epidemiologic analysis, to see what relative risks might be observed in people without the genes at issue.
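One way to see what work the hypothetical’s assumptions are doing is to put invented numbers on it. The sketch below assumes, as the hypothetical implicitly must, that genetics and exposure act independently and do not interact, which is precisely the assumption that the chapter’s own interaction caveat puts in doubt:

```python
# Making the Manual's hypothetical concrete with invented numbers.
# Assumes genetics and exposure act independently and do not interact.

background_genetic = 1.0   # cases per 1,000 unexposed, genetic
background_other = 1.0     # cases per 1,000 unexposed, non-genetic
exposed_rate = 3.0         # cases per 1,000 exposed (RR = 3/2 = 1.5)

excess = exposed_rate - (background_genetic + background_other)  # 1 per 1,000

# Whole exposed population: attributable fraction is 1/3 -- the usual
# (RR - 1)/RR result for RR = 1.5, short of 'more likely than not'.
print(excess / exposed_rate)                 # 0.333...

# Plaintiff with genetics ruled out: the genetic background drops out
# of his candidate causes, leaving 1 excess case against 1 non-genetic
# background case -- a 50% attributable fraction at the same RR of 1.5,
# so any relative risk above 1.5 would clear the 50% line.
print(excess / (excess + background_other))  # 0.5
```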

The Third Edition of the Manual does add more nuance to the doubling of risk argument, but alas more nuance yet is needed. The chapter is an important source to include in any legal argument for or against inferences of specific causation, but it is hardly the final word.

Below is an updated reference list of cases that address the doubling argument.


Radiation

Johnston v. United States, 597 F. Supp. 374, 412, 425-26 (D. Kan. 1984) (rejecting even a relative risk of greater than two as supporting an inference of specific causation)

Allen v. United States, 588 F. Supp. 247, 418 (D. Utah 1984) (rejecting mechanical application of doubling of risk), rev’d on other grounds, 816 F.2d 1417 (10th Cir. 1987), cert. denied, 484 U.S. 1004 (1988)

In re TMI Litig., 927 F. Supp. 834, 845, 864-66 (M.D. Pa. 1996), aff’d, 89 F.3d 1106 (3d Cir. 1996), aff’d in part, rev’d in part, 193 F.3d 613 (3d Cir. 1999) (rejecting the trial court’s “doubling dose” analysis), modified, 199 F.3d 158 (3d Cir. 2000) (stating that a dose below ten rems is insufficient to infer more likely than not the existence of a causal link)

In re Hanford Nuclear Reservation Litig., 1998 WL 775340, at *8 (E.D. Wash. Aug. 21, 1998) (“‘[d]oubling of the risk’ is the legal standard for evaluating the sufficiency of the plaintiffs’ evidence and for determining which claims should be heard by the jury,” citing Daubert II), rev’d, 292 F.3d 1124, 1136-37 (9th Cir. 2002) (general causation)

In re Berg Litig., 293 F.3d 1127 (9th Cir. 2002) (companion case to In re Hanford)

Cano v. Everest Minerals Corp., 362 F. Supp. 2d 814, 846 (W.D. Tex. 2005) (relative risk less than 3.0 represents only a weak association)

Cook v. Rockwell Internat’l Corp., 580 F. Supp. 2d 1071, 1083 n.8, 1084, 1088-89 (D. Colo. 2006) (citing Daubert II and “concerns” by Sander Greenland and David Egilman, plaintiffs’ expert witnesses in other cases), rev’d and remanded on other grounds, 618 F.3d 1127 (10th Cir. 2010), cert. denied, ___ U.S. ___ (May 24, 2012)

Cotroneo v. Shaw Envt’l & Infrastructure, Inc., No. H-05-1250, 2007 WL 3145791, at *3 (S.D. Tex. Oct. 25, 2007) (citing Havner, 953 S.W.2d at 717) (radioactive material)


Swine Flu – GBS Cases

Cook v. United States, 545 F. Supp. 306, 308 (N.D. Cal. 1982) (“Whenever the relative risk to vaccinated persons is greater than two times the risk to unvaccinated persons, there is a greater than 50% chance that a given GBS case among vaccinees of that latency period is attributable to vaccination, thus sustaining plaintiff’s burden of proof on causation.”)

Robinson v. United States, 533 F. Supp. 320, 325-28 (E.D. Mich. 1982) (finding for the government and against claimant who developed acute signs and symptoms of GBS 17 weeks after inoculation, in part because of relative and attributable risks)

Padgett v. United States, 553 F. Supp. 794, 800-01 (W.D. Tex. 1982) (“From the relative risk, we can calculate the probability that a given case of GBS was caused by vaccination. . . . [A] relative risk of 2 or greater would indicate that it was more likely than not that vaccination caused a case of GBS.”)

Manko v. United States, 636 F. Supp. 1419, 1434 (W.D. Mo. 1986) (relative risk of 2, or less, means exposure not the probable cause of disease claimed) (incorrectly suggesting that relative risk of two means that there was a 50% chance the disease was caused by “chance alone”), aff’d in relevant part, 830 F.2d 831 (8th Cir. 1987)


IUD Cases – Pelvic Inflammatory Disease

Marder v. G.D. Searle & Co., 630 F. Supp. 1087, 1092 (D. Md. 1986) (“In epidemiological terms, a two-fold increased risk is an important showing for plaintiffs to make because it is the equivalent of the required legal burden of proof—a showing of causation by the preponderance of the evidence or, in other words, a probability of greater than 50%.”), aff’d mem. on other grounds sub nom. Wheelahan v. G.D. Searle & Co., 814 F.2d 655 (4th Cir. 1987) (per curiam)


Bendectin cases

Lynch v. Merrell-National Laboratories, 646 F. Supp. 856 (D. Mass. 1986) (granting summary judgment), aff’d, 830 F.2d 1190, 1197 (1st Cir. 1987) (distinguishing between chances that “somewhat favor” plaintiff and plaintiff’s burden of showing specific causation by “preponderant evidence”)

DeLuca v. Merrell Dow Pharm., Inc., 911 F.2d 941, 958-59 (3d Cir. 1990) (commenting that ‘‘[i]f New Jersey law requires the DeLucas to show that it is more likely than not that Bendectin caused Amy DeLuca’s birth defects, and they are forced to rely solely on Dr. Done’s epidemiological analysis in order to avoid summary judgment, the relative risk of limb reduction defects arising from the epidemiological data Done relies upon will, at a minimum, have to exceed ‘2’’’)

Daubert v. Merrell Dow Pharms., Inc., 43 F.3d 1311, 1321 (9th Cir.) (“Daubert II”) (holding that for epidemiological testimony to be admissible to prove specific causation, there must have been a relative risk for the plaintiff of greater than 2; testimony that the drug “increased somewhat the likelihood of birth defects” is insufficient) (“For an epidemiological study to show causation under a preponderance standard . . . the study must show that children whose mothers took Bendectin are more than twice as likely to develop limb reduction birth defects as children whose mothers did not.”), cert. denied, 516 U.S. 869 (1995)

DePyper v. Navarro, 1995 WL 788828 (Mich. Cir. Ct. Nov. 27, 1995)

Oxendine v. Merrell Dow Pharm., Inc., 1996 WL 680992 (D.C. Super. Ct. Oct. 24, 1996) (noting testimony by Dr. Michael Bracken, that had Bendectin doubled risk of birth defects, overall rate of that birth defect should have fallen 23% after manufacturer withdrew drug from market, when in fact the rate remained relatively steady)

Merrell Dow Pharms., Inc. v. Havner, 953 S.W.2d 706, 716 (Tex. 1997) (holding, in accord with the weight of judicial authority, “that the requirement of a more than 50% probability means that epidemiological evidence must show that the risk of an injury or condition in the exposed population was more than double the risk in the unexposed or control population”); id. at 719 (rejecting isolated statistically significant associations when not consistently found among studies)


Silicone Cases

Hall v. Baxter Healthcare, 947 F. Supp. 1387, 1392, 1397, 1403-04 (D. Ore. 1996) (discussing relative risk of 2.0)

Pick v. American Medical Systems, Inc., 958 F. Supp. 1151, 1160 (E.D. La. 1997) (noting, correctly but irrelevantly, in penile implant case, that “any” increased risk suggests that the exposure “may” have played some causal role)

In re Breast Implant Litigation, 11 F. Supp. 2d 1217, 1226-27 (D. Colo. 1998) (relative risk of 2.0 or less shows that the background risk is at least as likely to have given rise to the alleged injury)

Barrow v. Bristol-Myers Squibb Co., 1998 WL 812318, at *23 (M.D. Fla. Oct. 29, 1998)

Minnesota Mining and Manufacturing v. Atterbury, 978 S.W.2d 183, 198 (Tex.App. – Texarkana 1998) (noting that Havner declined to set strict criteria and that “[t]here is no requirement in a toxic tort case that a party must have reliable evidence of a relative risk of 2.0 or greater”)

Allison v. McGhan Med. Corp., 184 F.3d 1300, 1315 n.16, 1316 (11th Cir. 1999) (affirming exclusion of expert testimony based upon a study with a risk ratio of 1.24; noting that statistically significant epidemiological study reporting an increased risk of marker of disease of 1.24 times in patients with breast implants was so close to 1.0 that it “was not worth serious consideration for proving causation”; threshold for concluding that an agent more likely than not caused a disease is 2.0, citing Federal Judicial Center, Reference Manual on Scientific Evidence 168-69 (1994))

Grant v. Bristol-Myers Squibb, 97 F. Supp. 2d 986, 992 (D. Ariz. 2000)

Pozefsky v. Baxter Healthcare Corp., No. 92-CV-0314, 2001 WL 967608, at *3 (N.D.N.Y. August 16, 2001) (excluding causation opinion testimony given contrary epidemiologic studies; noting that sufficient epidemiologic evidence requires relative risk greater than two)

In re Silicone Gel Breast Implant Litig., 318 F. Supp. 2d 879, 893 (C.D. Cal. 2004) (“The relative risk is obtained by dividing the proportion of individuals in the exposed group who contract the disease by the proportion of individuals who contract the disease in the non-exposed group.”) (noting that relative risk must be more than doubled at a minimum to permit an inference that the risk was operating in plaintiff’s case)

Norris v. Baxter Healthcare Corp., 397 F.3d 878 (10th Cir. 2005) (discussing but not deciding specific causation and the need for relative risk greater than two; no reliable showing of general causation)


Asbestos

Lee v. Johns Manville Corp., slip op. at 3, Phila. Cty. Ct. C.P., Sept. Term 1978, No. 88 (123) (Oct. 26, 1983) (Forer, J.) (entering verdict in favor of defendants on grounds that plaintiff had failed to show that his colorectal cancer had been caused by asbestos exposure after adducing evidence of a relative risk less than two)

Washington v. Armstrong World Indus., Inc., 839 F.2d 1121 (5th Cir. 1988) (affirming grant of summary judgment on grounds that there was insufficient evidence that plaintiff’s colon cancer was caused by asbestos)

Primavera v. Celotex Corp., Phila. Cty. Ct. C.P., December Term, 1981, No. 1283 (bench op. of Hon. Berel Caesar, Nov. 2, 1988) (granting compulsory nonsuit on the plaintiff’s claim that his colorectal cancer was caused by his occupational exposure to asbestos)

In re Fibreboard Corp., 893 F.2d 706, 712 (5th Cir. 1990) (“It is evident that these statistical estimates deal only with general causation, for population-based probability estimates do not speak to a probability of causation in any one case; the estimate of relative risk is a property of the studied population, not of an individual’s case.”) (internal quotation omitted) (emphasis in original)

Grassis v. Johns-Manville Corp., 248 N.J. Super. 446, 455-56, 591 A.2d 671, 676 (App. Div. 1991) (rejecting doubling of risk threshold in asbestos gastrointestinal cancer claim)

Landrigan v. Celotex Corp., 127 N.J. 404, 419, 605 A.2d 1079 (1992) (reversing judgment entered on directed verdict for defendant on specific causation of claim that asbestos caused decedent’s colon cancer)

Caterinicchio v. Pittsburgh Corning Corp., 127 N.J. 428, 605 A.2d 1092 (1992) (reversing judgment entered on directed verdict for defendant on specific causation of claim that asbestos caused plaintiff’s colon cancer)

In re Joint E. & S. Dist. Asbestos Litig., 758 F. Supp. 199 (S.D.N.Y. 1991), rev’d sub nom. Maiorana v. Owens Corning Corp., 964 F.2d 92, 97 (2d Cir. 1992)

Maiorana v. National Gypsum, 827 F. Supp. 1014, 1043 (S.D.N.Y. 1993), aff’d in part and rev’d in part, 52 F.3d 1124, 1134 (2d Cir. 1995) (stating a preference for the district court’s instructing the jury on the science and then letting the jury weigh the studies)

Keene Corp. v. Hall, 626 A.2d 997 (Md. Ct. Spec. App. 1993) (laryngeal cancer)

Jones v. Owens-Corning Fiberglas Corp., 288 N.J. Super. 258, 266, 672 A.2d 230, 235 (App. Div. 1996) (rejecting doubling of risk threshold in asbestos gastrointestinal cancer claim)

In re W.R. Grace & Co., 355 B.R. 462, 483 (Bankr. D. Del. 2006) (requiring showing of relative risk greater than two to support property damage claims based on unreasonable risks from asbestos insulation products)

Kwasnik v. A.C. & S., Inc. (El Paso Cty., Tex. 2002)

Sienkiewicz v. Greif (U.K.) Ltd., [2009] EWCA (Civ) 1159, at ¶23 (Lady Justice Smith) (“In my view, it must now be taken that, saving the expression of a different view by the Supreme Court, in a case of multiple potential causes, a claimant can demonstrate causation in a case by showing that the tortious exposure has at least doubled the risk arising from the non-tortious cause or causes.”)

Sienkiewicz v. Greif (U.K.) Ltd., [2011] UKSC 10.

“Where there are competing alternative, rather than cumulative, potential causes of a disease or injury, such as in Hotson, I can see no reason in principle why epidemiological evidence should not be used to show that one of the causes was more than twice as likely as all the others put together to have caused the disease or injury.” (Lord Phillips, at ¶ 93)

(arguing that statistical evidence should be considered without clearly identifying the nature and extent of its role) (Baroness Hale, at ¶¶ 172-73)

(insisting upon the difference between the fact and the probability of causation, with statistical evidence not probative of the former) (Lord Rodger, at ¶¶ 143-59)

(“the law is concerned with the rights and wrongs of an individual situation, and should not treat people and even companies as statistics,” although epidemiologic evidence, he noted, can appropriately be used “in conjunction with specific evidence”) (Lord Mance, at ¶ 205)

(concluding that epidemiologic evidence can establish the probability, but not the fact, of causation, and vaguely suggesting that whether epidemiologic evidence should be allowed was a matter of policy) (Lord Dyson, at ¶¶ 218-19)

Dixon v. Ford Motor Co., 47 A.3d 1038, 1046-47 & n.11 (Md. Ct. Spec. App. 2012) (“we can explicitly derive the probability of causation from the statistical measure known as ‘relative risk’, as did the U.S. Court of Appeals for the Third Circuit in DeLuca v. Merrell Dow Pharmaceuticals, Inc., 911 F.2d 941, 958 (3d Cir. 1990), in a holding later adopted by several courts. For reasons we need not explore in detail, it is not prudent to set a singular minimum ‘relative risk’ value as a legal standard. But even if there were some legal threshold, Dr. Welch provided no information that could help the finder of fact to decide whether the elevated risk in this case was ‘substantial’.”) (internal citations omitted), rev’d, 433 Md. 137, 70 A.3d 328 (2013)


Pharmaceutical Cases

Ambrosini v. Upjohn, 1995 WL 637650, at *4 (D.D.C. Oct. 18, 1995) (excluding plaintiff’s expert witness, Dr. Brian Strom, who was unable to state that mother’s use of Depo-Provera to prevent miscarriage more than doubled her child’s risk of a birth defect)

Ambrosini v. Labarraque, 101 F.3d 129, 135 (D.C. Cir. 1996) (Depo-Provera, birth defects) (testimony “does not warrant exclusion simply because it fails to establish the causal link to a specified degree of probability”)

Siharath v. Sandoz Pharms. Corp., 131 F. Supp. 2d 1347, 1356 (N.D. Ga. 2001)

Cloud v. Pfizer Inc., 198 F. Supp. 2d 1118, 1134 (D. Ariz. 2001) (sertraline and suicide)

Miller v. Pfizer, 196 F. Supp. 2d 1062, 1079 (D. Kan. 2002) (acknowledging that most courts require a showing of RR > 2, but questioning their reasoning; “Court rejects Pfizer’s argument that unless Zoloft is shown to create a relative risk [of akathisia] greater than 2.0, [expert’s] testimony is inadmissible”), aff’d, 356 F.3d 1326 (10th Cir.), cert. denied, 543 U.S. 917 (2004)

XYZ, et al. v. Schering Health Care Ltd., [2002] EWHC 1420, at ¶ 21, 70 BMLR 88 (QB 2002) (noting with approval that claimants had accepted the need to prove a relative risk greater than two; finding that the most likely relative risk was 1.7, which required finding against claimants even if general causation were established)

Smith v. Wyeth-Ayerst Laboratories Co., 278 F. Supp. 2d 684, 691 (W.D.N.C. 2003) (recognizing that risk and cause are distinct concepts) (“Epidemiologic data that shows a risk cannot support an inference of cause unless (1) the data are statistically significant according to scientific standards used for evaluating such associations; (2) the relative risk is sufficiently strong to support an inference of ‘more likely than not’; and (3) the epidemiologic data fits the plaintiff’s case in terms of exposure, latency, and other relevant variables.”) (citing FJC Reference Manual at 384-85 (2d ed. 2000))

Kelley v. Sec’y of Health & Human Servs., 68 Fed. Cl. 84, 92 (Fed. Cl. 2005) (quoting Kelley v. Sec’y of Health & Human Servs., No. 02-223V, 2005 WL 1125671, at *5 (Fed. Cl. Mar. 17, 2005) (opinion of Special Master explaining that epidemiology must show relative risk greater than two to provide evidence of causation), rev’d on other grounds, 68 Fed. Cl. 84 (2005))

Pafford v. Secretary of HHS, No. 01–0165V, 64 Fed. Cl. 19, 2005 WL 4575936, at *8 (2005) (expressing preference for “an epidemiologic study demonstrating a relative risk greater than two … or dispositive clinical or pathological markers evidencing a direct causal relationship”) (citing Stevens v. Secretary of HHS, 2001 WL 387418, at *12), aff’d, 451 F.3d 1352 (Fed. Cir. 2006)

Burton v. Wyeth-Ayerst Labs., 513 F. Supp. 2d 719, 730 (N.D. Tex. 2007) (affirming exclusion of expert witness testimony that did not meet Havner’s requirement of relative risks greater than two, Merrell Dow Pharm., Inc. v. Havner, 953 S.W.2d 706, 717–18 (Tex. 1997))

In re Bextra and Celebrex Marketing Sales Practices and Prod. Liab. Litig., 524 F. Supp. 2d 1166, 1172 (N.D. Cal. 2007) (observing that epidemiologic studies “can also be probative of specific causation, but only if the relative risk is greater than 2.0, that is, the product more than doubles the risk of getting the disease”)

In re Bextra & Celebrex, 2008 N.Y. Misc. LEXIS 720, *23-24, 239 N.Y.L.J. 27 (2008) (“Proof that a relative risk is greater than 2.0 is arguably relevant to the issue of specific, as opposed to general causation and is not required for plaintiffs to meet their burden in opposing defendants’ motion.”)

In re Viagra Products Liab. Litig., 572 F. Supp. 2d 1071, 1078 (D. Minn. 2008) (noting that some, but not all, courts have concluded that relative risks under two support finding an expert witness’s opinion to be inadmissible)

Vanderwerf v. SmithKlineBeecham Corp., 529 F. Supp. 2d 1294, 1302 n.10 (D. Kan. 2008), appeal dism’d, 603 F.3d 842 (10th Cir. 2010) (“relative risk of 2.00 means that a particular event of suicidal behavior has a 50 per cent chance that is associated with the exposure to Paxil … .”)

Wright v. American Home Products Corp., 557 F. Supp. 2d 1032, 1035-36 (W.D. Mo. 2008) (fenfluramine case)

Beylin v. Wyeth, 738 F. Supp. 2d 887, 893 n.3 (E.D. Ark. 2010) (MDL court) (Wilson, J. & Montgomery, J.) (addressing relative risk of two argument in dictum; holding that defendants’ argument that for an opinion to be relevant it must show that the medication causes the relative risk to exceed two “was without merit”)

Merck & Co. v. Garza, 347 S.W.3d 256 (Tex. 2011), rev’g 2008 WL 2037350, at *2 (Tex. App. — San Antonio May 14, 2008, no pet. h.)

Scharff v. Wyeth, No. 2:10–CV–220–WKW, 2012 WL 3149248, *6 & n.9, 11 (M.D. Ala. Aug. 1, 2012) (post-menopausal hormone therapy case; “A relative risk of 2.0 implies a 50% likelihood that an exposed individual’s disease was caused by the agent. The lower relative risk in this study reveals that some number less than half of the additional cases could be attributed to [estrogen and progestin].”)

Cheek v. Wyeth, LLC (In re Diet Drugs), 890 F. Supp. 2d 552 (E.D. Pa. 2012)


Medical Malpractice – Failure to Prescribe; Delay in Treatment

Merriam v. Wanger, 757 A.2d 778, 2000 Me. 159 (2000) (reversing judgment on jury verdict for plaintiff on grounds that plaintiff failed to show that defendant’s failure to act was, more likely than not, a cause of the harm)

Bonesmo v. The Nemours Foundation, 253 F. Supp. 2d 801, 809 (D. Del. 2003)

Theofanis v. Sarrafi, 791 N.E.2d 38, 48 (Ill. App. 2003) (reversing and granting new trial to plaintiff who received an award of no damages when experts testified that the relative risk was between 2.0 and 3.0) (“where the risk with the negligent act is at least twice as great as the risk in the absence of negligence, the evidence supports a finding that, more likely than not, the negligence in fact caused the harm”)

Cottrelle v. Gerrard, 67 OR (3d) 737 (2003), 2003 CanLII 50091 (ONCA), at ¶ 25 (Sharpe, J.A.) (less than a probable chance that timely treatment would have made a difference for plaintiff is insufficient), leave to appeal den’d SCC (April 22, 2004)

Joshi v. Providence Health System of Oregon Corp., 342 Or. 152, 156, 149 P. 3d 1164, 1166 (2006) (affirming directed verdict for defendants when expert witness testified that he could not state, to a reasonable degree of medical probability, beyond 30%, that administering t-PA, or other anti-coagulant would have changed the outcome and prevented death)

Ensink v. Mecosta County Gen. Hosp., 262 Mich. App. 518, 687 N.W.2d 143 (Mich. App. 2004) (affirming summary judgment for hospital and physicians when the patient could not show a greater than 50% probability of obtaining a better result had the emergency physician administered t-PA within three hours of stroke symptoms)

Lake Cumberland, LLC v. Dishman, 2007 WL 1229432, at *5 (Ky. Ct. App. 2007) (unpublished) (confusing 30% with a “reasonable probability”; citing without critical discussion an apparently innumerate opinion of expert witness Dr. Lawson Bernstein)

Mich. Comp. Laws § 600.2912a(2) (2009) (“In an action alleging medical malpractice, the plaintiff has the burden of proving that he or she suffered an injury that more probably than not was proximately caused by the negligence of the defendant or defendants. In an action alleging medical malpractice, the plaintiff cannot recover for loss of an opportunity to survive or an opportunity to achieve a better result unless the opportunity was greater than 50%.”)

O’Neal v. St. John Hosp. & Med. Ctr., 487 Mich. 485, 791 N.W.2d 853 (Mich. 2010) (affirming denial of summary judgment when failure to administer therapy (not t-PA) in a timely fashion supposedly more than doubled the risk of stroke)

Kava v. Peters, 450 Fed. Appx. 470, 478-79 (6th Cir. 2011) (affirming summary judgment for defendants when plaintiff’s expert witnesses failed to provide clear testimony that plaintiff’s specific condition would have been improved by timely administration of therapy)

Smith v. Bubak, 643 F.3d 1137, 1141–42 (8th Cir. 2011) (rejecting relative benefit testimony and suggesting in dictum that absolute benefit “is the measure of a drug’s overall effectiveness”)

Young v. Mem’l Hermann Hosp. Sys., 573 F.3d 233, 236 (5th Cir. 2009) (holding that Texas law requires a doubling of the relative risk of an adverse outcome to prove causation), cert. denied, ___ U.S. ___, 130 S.Ct. 1512 (2010)

Gyani v. Great Neck Medical Group, 2011 WL 1430037 (N.Y. Sup. Ct. Nassau Cty. April 4, 2011) (denying summary judgment to medical malpractice defendant on stroke patient’s claim arising from failure to administer t-PA, based upon naked assertions of proximate cause by plaintiff’s expert witness, and without considering the actual magnitude of the risk increased by the alleged failure to treat)

Samaan v. St. Joseph Hospital, 670 F.3d 21 (1st Cir. 2012)

Goodman v. Viljoen, 2011 ONSC 821 (CanLII) (treating a risk ratio of 1.7 for harm, or 0.6 for prevention, as satisfying the “balance of probabilities” when taken with additional unquantified, unvalidated speculation), aff’d, 2012 ONCA 896 (CanLII), leave to appeal den’d, Supreme Court of Canada No. 35230 (July 11, 2013)

Briante v. Vancouver Island Health Authority, 2014 BCSC 1511, at ¶ 317 (plaintiff must show “on a balance of probabilities that the defendant caused the injury”)


Toxic Tort Cases

In re Agent Orange Product Liab. Litig., 597 F. Supp. 740, 785, 836 (E.D.N.Y. 1984) (“A government administrative agency may regulate or prohibit the use of toxic substances through rulemaking, despite a very low probability of any causal relationship. A court, in contrast, must observe the tort law requirement that a plaintiff establish a probability of more than 50% that the defendant’s action injured him. … This means that at least a two-fold increase in incidence of the disease attributable to Agent Orange exposure is required to permit recovery if epidemiological studies alone are relied upon.”), aff’d, 818 F.2d 145, 150-51 (2d Cir. 1987) (approving district court’s analysis), cert. denied sub nom. Pinkney v. Dow Chemical Co., 487 U.S. 1234 (1988)

Wright v. Willamette Indus., Inc., 91 F.3d 1105 (8th Cir. 1996) (“Actions in tort for damages focus on the question of whether to transfer money from one individual to another, and under common-law principles (like the ones that Arkansas law recognizes) that transfer can take place only if one individual proves, among other things, that it is more likely than not that another individual has caused him or her harm. It is therefore not enough for a plaintiff to show that a certain chemical agent sometimes causes the kind of harm that he or she is complaining of. At a minimum, we think that there must be evidence from which the factfinder can conclude that the plaintiff was exposed to levels of that agent that are known to cause the kind of harm that the plaintiff claims to have suffered. See Abuan v. General Elec. Co., 3 F.3d at 333. We do not require a mathematically precise table equating levels of exposure with levels of harm, but there must be evidence from which a reasonable person could conclude that a defendant’s emission has probably caused a particular plaintiff the kind of harm of which he or she complains before there can be a recovery.”)

Sanderson v. Internat’l Flavors & Fragrances, Inc., 950 F. Supp. 981, 998 n.17, 999-1000, 1004 (C.D. Cal. 1996) (more than a doubling of risk is required in case involving aldehyde exposure and claimed multiple chemical sensitivities)

McDaniel v. CSX Transp., Inc., 955 S.W.2d 257, 264 (Tenn. 1997) (doubling of risk is relevant but not required as a matter of law)

Schudel v. General Electric Co., 120 F.3d 991, 996 (9th Cir. 1997) (polychlorinated biphenyls)

Lofgren v. Motorola, 1998 WL 299925, at *14 (Ariz. Super. June 1, 1998) (suggesting that relative risk requirement in trichloroethylene cancer medical monitoring case was arbitrary, but excluding plaintiffs’ expert witnesses on other grounds)

Berry v. CSX Transp., Inc., 709 So. 2d 552 (Fla. Dist. Ct. App. 1998) (reversing exclusion of plaintiff’s epidemiologist in case involving claims of toxic encephalopathy from solvent exposure, before Florida adopted Daubert standard)

Bartley v. Euclid, Inc., 158 F.3d 261 (5th Cir. 1998) (evidence at trial more than satisfied the relative risk greater than two requirement), rev’d on rehearing en banc, 180 F.3d 175 (5th Cir. 1999)

Magistrini v. One Hour Martinizing Dry Cleaning, 180 F. Supp. 2d 584, 591-92, 605 n.27, 606–07 (D.N.J. 2002) (“When the relative risk reaches 2.0, the risk has doubled, indicating that the risk is twice as high among the exposed group as compared to the non-exposed group. Thus, ‘the threshold for concluding that an agent was more likely than not the cause of an individual’s disease is a relative risk greater than 2.0’.”) (quoting FJC Reference Manual at 384), aff’d, 68 F. App’x 356 (3d Cir. 2003)

Allison v. Fire Ins. Exchange, 98 S.W.3d 227, 239 (Tex. App. — Austin 2002, no pet. h.)

Ferguson v. Riverside School Dist. No. 416, 2002 WL 34355958 (E.D. Wash. Feb. 6, 2002) (No. CS-00-0097-FVS)

Daniels v. Lyondell-Citgo Refining Co., 99 S.W.3d 722, 727 (Tex. App. – Houston [1st Dist.] 2003) (affirming exclusion of expert witness testimony that did not meet Havner’s requirement of relative risks greater than two)

Exxon Corp. v. Makofski, 116 S.W.3d 176, 184-85 (Tex. App. — Houston 2003)

Frias v. Atlantic Richfield Co., 104 S.W.3d 925 (Tex. App. — Houston 2003)

Graham v. Lautrec, Ltd., 2003 WL 23512133, at *1 (Mich. Cir. Ct. 2003) (mold)

Mobil Oil Corp. v. Bailey, 187 S.W.3d 263, 268 (Tex. App. – Beaumont 2006) (affirming exclusion of expert witness testimony that did not meet Havner’s requirement of relative risks greater than two)

In re Lockheed Litig. Cases, 115 Cal. App. 4th 558 (2004) (alleging brain, liver, and kidney damage), rev’d in part, 23 Cal. Rptr. 3d 762, 765 (Cal. App. 2d Dist. 2005) (“[A] court cannot exclude an epidemiological study from consideration solely because the study shows a relative risk of less than 2.0.”), rev. dismissed, 192 P.3d 403 (Cal. 2007)

Novartis Grimsby Ltd. v. Cookson, [2007] EWCA (Civ) 1261, at para. 74 (causation was successfully established by risk ratio greater than two; per Lady Justice Smith: “Put in terms of risk, the occupational exposure had more than doubled the risk [of the bladder cancer complained of] due to smoking. . . . if the correct test for causation in a case such as this is the “but for” test and nothing less will do, that test is plainly satisfied on the facts as found. . . . In terms of risk, if the occupational exposure more than doubles the risk due to smoking, it must, as a matter of logic, be probable that the disease was caused by the former.”)

Watts v. Radiator Specialty Co., 990 So. 2d 143 (Miss. 2008) (“The threshold for concluding that an agent was more likely than not the cause of an individual’s disease is a relative risk greater than 2.0.”)

King v. Burlington Northern Santa Fe Ry, 762 N.W.2d 24, 36-37 (Neb. 2009) (reversing exclusion of proffered testimony of Arthur Frank on claim that diesel exposure caused multiple myeloma, and addressing in dicta the ability of expert witnesses to speculate reasons why specific causation exists even with relative risk less than two) (“If a study shows a relative risk of 2.0, ‘the agent is responsible for an equal number of cases of disease as all other background causes.’ This finding ‘implies a 50% likelihood that an exposed individual’s disease was caused by the agent.’ If the relative risk is greater than 2.0, the study shows a greater than 50–percent likelihood that the agent caused the disease.”)(internal citations to Reference Manual on Scientific Evidence (2d ed. 2000) omitted)

Henricksen v. Conocophillips Co., 605 F. Supp. 2d 1142, 1158 (E.D. Wash. 2009) (noting that under Circuit precedent, epidemiologic studies showing low-level risk may be sufficient to show general causation, but are sufficient to show specific causation only if the relative risk exceeds two) (excluding plaintiff’s expert witness’s testimony because the epidemiologic evidence was “contradictory and inconsistent”)

City of San Antonio v. Pollock, 284 S.W.3d 809, 818 (Tex. 2009) (holding testimony admitted insufficient as matter of law)

George v. Vermont League of Cities and Towns, 2010 Vt. 1, 993 A.2d 367, 375 (2010)

Blanchard v. Goodyear Tire & Rubber Co., No. 837-12-07 Wrcv (Eaton, J., June 28, 2010) (excluding expert witness, David Goldsmith, and entering summary judgment), aff’d, 190 Vt. 577, 30 A.3d 1271 (2011)

Pritchard v. Dow Agro Sciences, 705 F. Supp. 2d 471, 486 (W.D. Pa. 2010) (excluding opinions of Dr. Omalu on Dursban, in part because of low relative risk) (“Therefore, a relative risk of 2.0 is not dispositive of the reliability of an expert’s opinion relying on an epidemiological study, but it is a factor, among others, which the Court is to consider in its evaluation.”), aff’d, 430 Fed. Appx. 102, 2011 WL 2160456 (3d Cir. 2011)

Faust v. BNSF Ry., 337 S.W.3d 325, 337 (Tex. Ct. App. 2d Dist. 2011) (“To be considered reliable scientific evidence of general causation, an epidemiological study must (1) have a relative risk of 2.0 and (2) be statistically significant at the 95% confidence level.”) (internal citations omitted)

Nonnon v. City of New York, 88 A.D.3d 384, 398-99, 932 N.Y.S.2d 428, 437-38 (1st Dep’t 2011) (holding that the strength of the epidemiologic evidence, with relative risks greater than 2.0, permitted an inference of causation)

Milward v. Acuity Specialty Products Group, Inc., 969 F. Supp. 2d 101, 112-13 & n.7 (D. Mass. 2013) (avoiding doubling of risk issue and holding that plaintiffs’ expert witnesses failed to rely upon a valid exposure estimate and lacked sufficient qualifications to evaluate and weigh the epidemiologic studies that provided estimates of relative risk) (generalities about the “core competencies” of physicians or specialty practices cannot overcome an expert witness’s explicit admission of lacking the epidemiologic expertise needed to evaluate and weigh the epidemiologic studies and methods at issue in the case. Without the requisite qualifications, an expert witness cannot show that the challenged opinion has a sufficiently reliable scientific foundation in epidemiologic studies and method.)

Berg v. Johnson & Johnson, 940 F. Supp. 2d 983 (D.S.D. 2013) (talc and ovarian cancer)


Other

In re Hannaford Bros. Co. Customer Data Sec. Breach Litig., 293 F.R.D. 21, No. 2:08-MD-1954-DBH, 2013 WL 1182733, at *1 (D. Me. Mar. 20, 2013) (Hornby, J.) (denying motion for class certification) (“population-based probability estimates do not speak to a probability of causation in any one case; the estimate of relative risk is a property of the studied population, not of an individual’s case.”)