TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Zoloft MDL Relieves Matrixx Depression

January 30th, 2015

When the Supreme Court delivered its decision in Matrixx Initiatives, Inc. v. Siracusano, 131 S. Ct. 1309 (2011), a colleague, David Venderbush of Alston & Bird LLP, and I wrote a Washington Legal Foundation Legal Backgrounder, in which we predicted that plaintiffs’ counsel would distort the holding and inflate the dicta of the opinion. Schachtman & Venderbush, “Matrixx Unbounded: High Court’s Ruling Needlessly Complicates Scientific Evidence Principles,” 26 (14) Legal Backgrounder (June 17, 2011)[1]. Our prediction proved, sadly, all too accurate. Not only was the context of the Matrixx decision distorted, but several district courts appeared to adopt its dicta on statistical significance as though they represented the holding of the case[2].

The Matrixx decision, along with the few district court opinions that had embraced its dicta[3], was urged as the basis for denying a defense challenge to the proffered testimony of Dr. Anick Bérard, a Canadian perinatal epidemiologist, in the Zoloft MDL. The trial court, however, correctly discerned several methodological shortcomings and failures, including Dr. Bérard’s reliance upon claims of statistical significance from studies that conducted dozens and hundreds of multiple comparisons. See In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., MDL No. 2342; 12-md-2342, 2014 U.S. Dist. LEXIS 87592; 2014 WL 2921648 (E.D. Pa. June 27, 2014) (Rufe, J.).
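The multiple-comparisons problem that troubled the court lends itself to a back-of-the-envelope illustration. The following Python sketch (purely illustrative; the numbers are not drawn from the record) shows why running dozens of independent tests at the conventional 0.05 level makes at least one spurious “significant” finding nearly inevitable:

```python
import random

random.seed(1)

ALPHA = 0.05       # conventional significance level
N_TESTS = 100      # e.g., many outcome-by-exposure subgroup comparisons
N_SIMS = 10_000

# Analytic family-wise error rate: the chance of at least one nominally
# "significant" result when every null hypothesis is true and the tests
# are independent.
fwer = 1 - (1 - ALPHA) ** N_TESTS
print(f"P(at least one false positive in {N_TESTS} tests) = {fwer:.3f}")

# Simulation check: under the null, a p-value is uniform on [0, 1], so
# draw N_TESTS uniforms and count trials with at least one below ALPHA.
hits = sum(
    any(random.random() < ALPHA for _ in range(N_TESTS))
    for _ in range(N_SIMS)
)
print(f"Simulated rate: {hits / N_SIMS:.3f}")
```

With 100 comparisons, the family-wise error rate exceeds 99 percent, which is why a nominally significant result plucked from dozens or hundreds of comparisons carries little evidential weight.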

Plaintiffs (through their Plaintiffs’ Steering Committee (PSC) in the Zoloft MDL) were undaunted and moved for reconsideration, asserting that the MDL trial court had failed to give appropriate weight to the Supreme Court’s decision in Matrixx, and a Third Circuit decision in DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941 (3d Cir. 1990). The MDL trial judge, however, deftly rebuffed the plaintiffs’ use of Matrixx, and their attempt to banish consideration of random error in the interpretation of epidemiologic studies. In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., MDL No. 2342; 12-md-2342, 2015 WL 314149 (E.D. Pa. Jan. 23, 2015) (Rufe, J.) (denying PSC’s motion for reconsideration).

In rejecting the motion for reconsideration, the Zoloft MDL trial judge noted that the PSC had previously cited Matrixx, and that the Court had addressed the case in its earlier ruling. 2015 WL 314149, at *2-3. The MDL Court then proceeded to expand upon its earlier ruling, and to explain how Matrixx was largely irrelevant to the Rule 702 context of Pfizer’s challenge to Dr. Bérard. There were, to be sure, some studies with nominal statistically significant results, for some birth defects, among children of mothers who took Zoloft in their first trimester of pregnancy. As Judge Rufe explained, statistical significance, or the lack thereof, was only one item in a fairly long list of methodological deficiencies in Dr. Bérard’s causation opinions:

“The [original] opinion set forth a detailed and multi-faceted rationale for finding Dr. Bérard’s testimony unreliable, including her inattention to the principles of replication and statistical significance, her use of certain principles and methods without demonstrating either that they are recognized by her scientific community or that they should otherwise be considered scientifically valid, the unreliability of conclusions drawn without adequate hypothesis testing, the unreliability of opinions supported by a ‛cherry-picked’ sub-set of research selected because it was supportive of her opinions (without adequately addressing non-supportive findings), and Dr. Bérard’s failure to reconcile her currently expressed opinions with her prior opinions and her published, peer-reviewed research. Taking into account all these factors, as well as others discussed in the Opinion, the Court found that Dr. Bérard departed from well-established epidemiological principles and methods, and that her opinion on human causation must be excluded.”

Id. at *1.

In citing the multiple deficiencies of the proffered expert witness, the Zoloft MDL Court thus placed its decision well within the scope of the Third Circuit’s recent precedent affirming the exclusion of Dr. Bennet Omalu, in Pritchard v. Dow Agro Sciences, 430 F. App’x 102, 104 (3d Cir. 2011). The Zoloft MDL Court further defended its ruling by pointing out that it had not created a legal standard requiring statistical significance, but rather had made a factual finding that epidemiologists, such as the challenged witness, Dr. Anick Bérard, would use some measure of statistical significance in reaching conclusions in their discipline. 2015 WL 314149, at *2[4].

On the plaintiffs’ motion for reconsideration, the Zoloft Court revisited the Matrixx case, properly distinguishing the case as a securities fraud case about materiality of non-disclosed information, not about causation. 2015 WL 314149, at *4. Although the MDL Court could and should have identified the Matrixx language as clearly obiter dicta, it did confidently distinguish the Supreme Court holding about pleading materiality from its own task of gatekeeping expert witness testimony on causation in a products liability case:

“Because the facts and procedural posture of the Zoloft MDL are so dissimilar from those presented in Matrixx, this Court reviewed but did not rely upon Matrixx in reaching its decision regarding Dr. Bérard. However, even accepting the PSC’s interpretation of Matrixx, the Court’s Opinion is consistent with that ruling, as the Court reviewed Dr. Bérard’s methodology as a whole, and did not apply a bright-line rule requiring statistically significant findings.”

Id. at *4.

In mounting their challenge to the MDL Court’s earlier ruling, the Zoloft plaintiffs asserted that the Court had failed to credit Dr. Bérard’s reliance upon what she called the “Rothman approach.” This approach, attributed to Professor Kenneth Rothman, had received some attention in the Bendectin litigation in the Third Circuit, where plaintiffs sought to be excused from their failure to show statistically significant associations when claiming causation between maternal use of Bendectin and infant birth defects. DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941 (3d Cir. 1990). The Zoloft MDL Court pointed out that the Circuit, in DeLuca, had never affirmatively endorsed Professor Rothman’s “approach,” but had reversed and remanded the Bendectin case to the district court for a hearing under Rule 702:

“by directing such an overall evaluation, however, we do not mean to reject at this point Merrell Dow’s contention that a showing of a .05 level of statistical significance should be a threshold requirement for any statistical analysis concluding that Bendectin is a teratogen regardless of the presence of other indicia of reliability. That contention will need to be addressed on remand. The root issue it poses is what risk of what type of error the judicial system is willing to tolerate. This is not an easy issue to resolve and one possible resolution is a conclusion that the system should not tolerate any expert opinion rooted in statistical analysis where the results of the underlying studies are not significant at a .05 level.”

2015 WL 314149, at *4 (quoting DeLuca, 911 F.2d at 955). After remand, the district court excluded the DeLuca plaintiffs’ expert witnesses, and granted summary judgment, based upon the dubious methods employed by plaintiffs’ expert witnesses in cherry picking data, recalculating risk ratios in published studies, and ignoring bias and confounding in studies. The Third Circuit affirmed the judgment for Merrell Dow. DeLuca v. Merrell Dow Pharms., Inc., 791 F. Supp. 1042 (D.N.J. 1992), aff’d, 6 F.3d 778 (3d Cir. 1993).

In the Zoloft MDL, the plaintiffs not only offered an erroneous interpretation of the Third Circuit’s precedents in DeLuca; they also failed to show that the “Rothman” approach had become generally accepted in the more than two decades since DeLuca. 2015 WL 314149, at *4. Indeed, the hearing record was quite muddled about what the “Rothman” approach involved, other than glib, vague suggestions that the approach would have countenanced Dr. Bérard’s selective, over-reaching analysis of the extant epidemiologic studies. The plaintiffs did not call Rothman as an expert witness; nor did they offer any of Rothman’s publications as exhibits at the Zoloft hearing. Although Professor Rothman has criticized the overemphasis upon p-values and significance testing, he has never suggested that researchers and scientists should ignore random error in interpreting research data. Nevertheless, plaintiffs attempted to invoke some vague notion of a Rothman approach that would ignore confidence intervals, attained significance probability, multiplicity, bias, and confounding. Ultimately, the MDL Court would have none of it. The Court held that the Rothman approach (whatever that is), as applied by Dr. Bérard, did not satisfy Rule 702.

The testimony at the Rule 702 hearing on the so-called “Rothman approach” had been sketchy at best. Dr. Bérard protested, perhaps too much, when asked about her having ignored p-values:

“I’m not the only one saying that. It’s really the evolution of the thinking of the importance of statistical significance. One of my professors and also a friend of mine at Harvard, Ken Rothman, actually wrote on it – wrote on the topic. And in his book at the end he says obviously what I just said, validity should not be confused with precision, but the third bullet point, it’s saying that the lack of statistical significance does not invalidate results because sometimes you are in the context of rare events, few cases, few exposed cases, small sample size, exactly – you know even if you start with hundreds of thousands of pregnancies because you are looking at rare events and if you want to stratify by exposure category, well your stratum becomes smaller and smaller and your precision decreases. I’m not the only one saying that. Ken Rothman says it as well, so I’m not different from the others. And if you look at many of the studies published nowadays, they also discuss that as well.”

Notes of Testimony of Dr. Anick Bérard, at 76:21- 77:14 (April 9, 2014). See also Notes of Testimony of Dr. Anick Bérard, at 211 (April 11, 2014) (discussing non-statistically significant findings as a “trend,” and asserting that the lack of a significant finding does not mean that there is “no effect”). Bérard’s invocation of Rothman here is accurate but unhelpful. Rothman and Bérard are not alone in insisting that confidence intervals provide a measure of precision of an estimate, and that we should be careful not to interpret the lack of significance to mean no effect. But the lack of significance cannot be used to interpret data to show an effect.
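The distinction between precision and proof can be made concrete. A minimal Python sketch (the counts are hypothetical, chosen only for illustration) computes the usual approximate log-scale Wald confidence interval for a rate ratio from sparse data; the resulting interval is wide and crosses 1.0, which tells us the estimate is imprecise, not that an effect has been shown, and not that one has been ruled out:

```python
import math

def rate_ratio_ci(a, t1, b, t0, z=1.96):
    """Approximate 95% CI for a rate ratio, using the standard
    log-scale Wald interval.
    a, b: event counts in the exposed / unexposed groups;
    t1, t0: person-time in the exposed / unexposed groups."""
    rr = (a / t1) / (b / t0)
    se = math.sqrt(1 / a + 1 / b)          # SE of log(rate ratio)
    lo = math.exp(math.log(rr) - z * se)
    hi = math.exp(math.log(rr) + z * se)
    return rr, lo, hi

# Hypothetical sparse data: rare outcome, small exposed stratum.
rr, lo, hi = rate_ratio_ci(a=3, t1=1000, b=10, t0=10000)
print(f"RR = {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
# The interval spans roughly 0.83 to 10.9: it crosses 1.0, so the
# result is not "significant" -- but the width of the interval is the
# real message: the data are too sparse to pin the estimate down.
```

As the numbers shrink with each stratification, the interval widens; that is the precision point Bérard invoked, and it cuts both ways.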

At the Rule 702 hearing, the PSC tried to bolster Dr. Bérard’s supposed reliance upon the “Rothman approach” in cross-examining Pfizer’s expert witness, Dr. Stephen Kimmel:

“Q. You know who Dr. Rothman is, the epidemiologist?
A. Yes.
Q. You actually took a course from Dr. Rothman, didn’t you?
A. I did when I was a student way back.
Q. He is a well-known epidemiologist, isn’t he?
A. Yes, he is.
Q. He has published this book, Modern Epidemiology. Do you have a copy of this?
A. I do.
Q. Do you – Have you ever read it?
A. I read his earlier edition. I have not read the most recent edition.
Q. There’s two other authors, Sander Greenland and Tim Lash. Do you know either one of them?
A. I know Sander. I don’t know Tim.
Q. Dr. Rothman has some – he has written about confidence intervals and statistical significance for some time, hasn’t he?
A. He has.
Q. Do you agree with him that statistical significance is not a matter of validity. It’s a matter of precision?
A. It’s a matter of – well, confidence intervals are matters of precision. P-values are not.
Q. Okay. I want to put up a table and see if you are in agreement with Dr. Rothman. This is the third edition of Modern Epidemiology. And he has – and ignore my brother’s handwriting. But there is an hypothesized rate ratio under 10-3. It says: p-value function from which one can find all confidence limits for a hypothetical study with a rate ratio estimate of 3.1. Do you see that there?
A. Yes. I don’t see the top of the figure, not that it matters.
Q. I want to make sure. The way I understand this, he is giving us a hypothesis that we have a relative risk of 3.1 and it [presumably a 95% confidence interval] crosses 1, meaning it’s not statistically significant. Is that fair?
A. Well, if you are using a value of .05, yes. And again, if this is a single test and there’s a lot of things that go behind it. But, yes, so this is a total hypothetical.
Q. Yes.
A. I’m sorry. He’s saying here is a hypothetical based on math. And so here is – this is what we would propose.
Q. Yes, I want to highlight what he says about this figure and get your thoughts on it. He says:
The message of figure 10-3 is that the example data are more compatible with a moderate to strong association than with no association, assuming the statistical model used to construct the function is correct.
A. Yes.
Q. Would you agree with that statement?
A. Assuming the statistical model is correct. And the problem is, this is a hypothetical.
Q. Sure. So let’s just assume. So what this means to sort of put some meat on the bone, this means that although we cross 1 and therefore are statistically
significant [sic, non-significant], he says the more likely truth here is that there is a moderate to strong effect rather than no effect?
A. Well, you know he has hypothesized this. This is not used in common methods practice in pharmacoepi. Dr. Rothman has lots of ideas but it’s not part of our standard scientific method.

Notes of Testimony of Dr. Stephen Kimmel, at 126:2 to 128:20.

Nothing very concrete about the “Rothman approach” was put before the MDL Court, either through Dr. Bérard or Dr. Kimmel. There are, however, other instructive aspects to the plaintiffs’ counsel’s examination. First, the referenced portion of the text, Modern Epidemiology, is a discussion of p-value functions, not of p-values or of confidence intervals per se. Modern Epidemiology at 158-59 (3d ed. 2008). Dr. Bérard never discussed p-value functions in her report or in her testimony, and Dr. Kimmel testified, without contradiction, that such p-value functions are “not used in common methods practice.” Second, the plaintiffs’ counsel never marked and offered the Rothman text as an exhibit for the MDL Court to consider. Third, the cross-examiner first asked about the implication for a hypothetical association, and then, when he wanted to “put some meat on the bone,” changed the word used in Rothman’s text, “association,” to “effect.” The word “effect” does not appear in Rothman’s text in the referenced discussion of p-value functions. Fortunately, the MDL Court was not poisoned by the “meat on the bone.”
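For readers curious about what a p-value function actually is, the idea can be sketched in a few lines of Python. The estimate of 3.1 and the log-scale standard error below are hypothetical, loosely echoing the figure discussed at the hearing. For each hypothesized rate ratio, the function returns the two-sided p-value testing that hypothesis against the observed estimate; it peaks at 1.0 over the point estimate and falls away in both directions:

```python
import math

def p_value_function(rr_hat, se_log, rr_null):
    """Two-sided p-value testing H0: RR = rr_null, given a point
    estimate rr_hat with standard error se_log on the log scale."""
    z = (math.log(rr_hat) - math.log(rr_null)) / se_log
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))

RR_HAT, SE_LOG = 3.1, 0.65   # hypothetical estimate and log-scale SE

# Trace the p-value function over a range of hypothesized rate ratios.
for null in (0.5, 1.0, 2.0, 3.1, 6.0, 12.0):
    p = p_value_function(RR_HAT, SE_LOG, null)
    print(f"H0: RR = {null:4.1f}  ->  p = {p:.3f}")
```

On these assumed numbers, the p-value against the null of no association (RR = 1.0) is about 0.08, not “significant” at the 0.05 level, and yet the data are far less compatible with rate ratios near 0.5 than with moderate-to-strong associations, which is the point of Rothman’s figure. None of this, of course, converts an imprecise estimate into proof of an effect.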

The Pit and the Pendulum

Another document glibly referenced but not provided to the MDL Court was the publication of Sir Austin Bradford Hill’s presidential address to the Royal Society of Medicine on causation. The MDL Court acknowledged that the PSC had argued that the emphasis upon statistical significance was contrary to Hill’s work and teaching. 2015 WL 314149, at *5. In the Court’s words:

“the PSC argues that the Court’s finding regarding the importance of statistical significance in the field of epidemiology is inconsistent with the work of Bradford Hill. The PSC points to a 1965 address by Sir Austin Bradford Hill, which it has not previously presented to the Court, except in opening statements of the Daubert hearings. The PSC failed to put forth evidence establishing that Bradford Hill’s statement that ‛I wonder whether the pendulum has not swung too far [in requiring statistical significance before drawing conclusions]’ has, in the decades since that 1965 address, altered the importance of statistical significance to scientists in the field of epidemiology.”

Id. This failure, identified by the Court, is hardly surprising. The snippet of a quotation from Hill would not sustain the plaintiffs’ sweeping generalization. The quoted language in context may help to explain why Hill’s paper was not provided:

“I wonder whether the pendulum has not swung too far – not only with the attentive pupils but even with the statisticians themselves. To decline to draw conclusions without standard errors can surely be just as silly? Fortunately I believe we have not yet gone so far as our friends in the USA where, I am told, some editors of journals will return an article because tests of significance have not been applied. Yet there are innumerable situations in which they are totally unnecessary – because the difference is grotesquely obvious, because it is negligible, or because, whether it be formally significant or not, it is too small to be of any practical importance. What is worse the glitter of the t table diverts attention from the inadequacies of the fare. Only a tithe, and an unknown tithe, of the factory personnel volunteer for some procedure or interview, 20% of patients treated in some particular way are lost to sight, 30% of a randomly-drawn sample are never contacted. The sample may, indeed, be akin to that of the man who, according to Swift, ‘had a mind to sell his house and carried a piece of brick in his pocket, which he showed as a pattern to encourage purchasers.’ The writer, the editor and the reader are unmoved. The magic formulae are there.”

Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 299 (1965).

In the Zoloft cases, no expert witness was prepared to state that the disparity was “grotesquely obvious,” or “negligible.” And Bradford Hill’s larger point was that bias and confounding often dwarf considerations of random error, and that there are many instances in which significance testing is unavailing or unhelpful. And in some studies, with large “effect sizes,” statistical significance testing may be beside the point.

Hill’s presidential address to the Royal Society of Medicine commemorated his successes in epidemiology, and we need only turn to Hill’s own work to see how prevalent was his use of measurements of significance probability. See, e.g., Richard Doll & Austin Bradford Hill, “Smoking and Carcinoma of the Lung: Preliminary Report,” Brit. Med. J. 740 (Sept. 30, 1950); Medical Research Council, “Streptomycin Treatment of Pulmonary Tuberculosis,” Brit. Med. J. 769 (Oct. 30, 1948).

Considering the misdirection on Rothman and on Hill, the Zoloft MDL Court did an admirable job in unraveling the Matrixx trap set by counsel. The Court insisted upon parsing the Bradford Hill factors[5], over Pfizer’s objection, despite the plaintiffs’ failure to show “an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance,” which Bradford Hill insisted was the prerequisite for the exploration of the nine factors he set out in his classic paper. Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 295 (1965). Given the outcome, the Court’s questionable indulgence of plaintiffs’ position was ultimately harmless.


[1] See also “The Matrixx – A Comedy of Errors,” and “Matrixx Unloaded” (Mar. 29, 2011), “The Matrixx Oversold,” and “De-Zincing the Matrixx.”

[2] See “Siracusano Dicta Infects Daubert Decisions” (Sept. 22, 2012).

[3] See, e.g., In re Chantix (Varenicline) Prods. Liab. Litig., 2012 U.S. Dist. LEXIS 130144, at *22 (N.D. Ala. 2012); Cheek v. Wyeth Pharm. Inc., 2012 U.S. Dist. LEXIS 123485 (E.D. Pa. Aug. 30, 2012); In re Celexa & Lexapro Prods. Liab. Litig., ___ F. Supp. 2d ___, 2013 WL 791780 (E.D. Mo. 2013).

[4] The Court’s reasoning on this point begged the question whether an ordinary clinician, ignorant of the standards, requirements, and niceties of statistical reasoning and inference, would be allowed to testify, unconstrained by any principled epidemiologic reasoning about random or systematic error. It is hard to imagine that Rule 702 would countenance such an end-run around the requirements of sound science.

[5] Adhering to Bradford Hill’s own admonition might have saved the Court the confusion of describing statistical significance as a measure of strength of association. 2015 WL 314149, at *2.

More Antic Proposals for Expert Witness Testimony – Including My Own Antic Proposals

December 30th, 2014

The late Professor Margaret Berger epitomized a person you could like and even admire, while finding many of her ideas erroneous, incoherent, and even dangerous. Berger was frequently on the losing side of expert witness admissibility issues, and she fell under the influence of the plaintiffs’ bar, holding conferences with their walking-around money, laundered through SKAPP, The Project on Scientific Knowledge and Public Policy.[1] In appellate cases, Berger often lent the credibility of her scholarship to support plaintiffs’ efforts to strip away admissibility criteria for expert witness causation opinion.[2] Still, she was always polite and respectful in debate. When Judge Weinstein appointed her to chair a committee to search for appropriate court-appointed expert witnesses in the silicone gel breast implant litigation, Professor Berger proved a careful, impartial listener to all the parties involved.

In 2009, before the publication of the Third Edition of the Reference Manual on Scientific Evidence, Professor Berger gave a presentation for an American Law Institute continuing legal education program, in which she aired her antipathy toward gatekeeping.[3] With her sights set primarily on defense expert witnesses, Berger opined that a monetary relationship between an expert witness and the defendant could be grounds for a Rule 702 exclusion. While the jingle of coin doth soothe the hurt that conscience must feel (for some expert witnesses), the focus of the Rule 702 inquiry is properly on relevance, reliability, and validity. Judge Shira Scheindlin, who sat on the same panel as Professor Berger, diplomatically pointed out that employee expert witnesses are offered all the time, and any bias is a subject for cross-examination, not disqualification. Remarkably, neither Professor Berger nor Judge Scheindlin acknowledged that conflicts of interest, actual or potential, are not relevant to the Daubert or Rule 702 factors that guide admissibility. If Berger’s radical position of identifying conflict of interest with unreliability were correct, we might dismiss her views without any consideration[4], given her conflicts of interest from her association with SKAPP, and her several amicus briefs filed on behalf of plaintiffs, seeking to avoid the exacting requirements of expert witness evidence gatekeeping.

In her ALI-CLE lecture, Professor Berger waxed enthusiastic about what was then a recent federal trial court decision in Allen v. Martin Surfacing, 263 F.R.D. 47 (D. Mass. 2009). Berger asserted that the case was unpublished and that the case, like many other “Daubert” cases, was hidden from view. Berger thought that Allen’s obscurity was unfortunate because the decision was “fabulous” and was based upon astute opinions of “outstanding” experts[5]. Berger was wrong on every point, from the chemical involved, to the unavailability of the opinion, to the quality of the expert witnesses (who were not ALS experts, but frequent, willing testifiers), and to the carefulness of the exposure and causation opinions offered.[6] See James L. Bernat & Richard Beresford, Ethical and Legal Issues in Neurology 59-60 (Amsterdam 2013) (discussing Allen and characterizing the court’s decision to admit plaintiffs’ expert witnesses’ opinions as based upon plausibility without more).

Implicit in Berger’s errors, however, may be the beginnings of some concrete suggestions for improving the gatekeeping process. After all, Berger thought that no one would likely find and read the Allen decision. She may thus have believed that she had some freedom from scrutiny when she praised the decision and the expert witnesses involved. Just as there is a groundswell of support for greater disclosure of underlying data to accompany scientific publications, there should be support for wide dissemination of the underlying materials behind Rule 702 opinions. Most judges cannot or will not write sufficiently comprehensive opinions describing and supporting their decisions to admit or exclude expert witness opinion to permit vigorous public scrutiny. Some judges fail to cite the underlying studies or data that are the bases of the challenged opinions. As a result, the “Daubert” scholarship suffers because it frequently lacks access to the actual reports, testimony, studies, and data themselves. Often the methodological flaws discussed in judicial opinions are just the tip of the iceberg, with flaws running all the way to the bottom.

And while I am on “antic proposals” of my own, courts should consider requiring all parties to file proposed findings of fact and conclusions of law, with record cites, to support their litigation positions. Lawyers on both sides of the “v.” have proven themselves cavalier and careless in their descriptions and characterizations of scientific evidence, inference, and analysis. Proposed findings would permit reviewing courts, scientists, and scholars to identify errors for the benefit of appellate courts and later trial courts.


 

[1] SKAPP claimed to have aimed at promoting transparent decision making, but deceived the public with its disclosure of having been supported by the “Common Benefit Trust, a fund established pursuant to a court order in the Silicone Gel Breast Implant Products Liability litigation.” Somehow SKAPP forgot to disclose that this court order simply created a common-benefit fund for plaintiffs’ lawyers to pursue their litigation goals. How money from the silicone gel breast implant MDL was diverted to advocate anti-Daubert policies is a mystery that no amount of transparent decision making has to date uncovered. Fortunately, for the commonweal, SKAPP appears to have been dissolved. The SKAPP website lists those who guided and supported SKAPP’s attempts to subvert expert witness validity requirements; not surprisingly, the SKAPP supporters were mostly plaintiffs’ expert witnesses:

Eula Bingham, PhD
Les Boden, PhD
Richard Clapp, DSc, MPH
Polly Hoppin, ScD
Sheldon Krimsky, PhD
David Michaels, PhD, MPH
David Ozonoff, MD, MPH
Anthony Robbins, MD, MPA

[2] See, e.g., Parker v. Mobil Oil Corp., N.Y. Ct. App., Brief Amicus Curiae of Profs. Margaret A. Berger, Edward J. Imwinkelried, Sheila Jasanoff, and Stephen A. Saltzburg (July 28, 2006) (represented by Anthony Z. Roisman, of the National Legal Scholars Law Firm).

[3] Berger, “Evidence, Procedure, and Trial Update: How You Can Win (Or Lose) Your Case (Expert Witnesses, Sanctions, Spoliation, Daubert, and More)” (Mar. 27, 2009).


[4] We can see this position carried to its natural, probable, and extreme endpoint in Elizabeth Laposata, Richard Barnes, and Stanton Glantz, “Tobacco Industry Influence on the American Law Institute’s Restatements of Torts and Implications for Its Conflict of Interest Policies,” 98 Iowa Law Rev. 1 (2012), where the sanctimonious authors, all anti-tobacco advocates, criticize the American Law Institute for permitting the participation of lawyers who represent the tobacco industry. The authors fail to recognize that ALI members include lawyers representing plaintiffs in tobacco litigation, and that it is possible, contrary to their ideological worldview, to discuss and debate an issue without reference to ad hominem “conflicts” issues. The authors might be surprised by the degree to which the plaintiffs’ bar has lobbied (successfully) for many provisions in various Restatements.

[5] Including Richard Clapp, who served as an advisor to SKAPP, which lavished money on Professor Berger’s conferences.

[6] See “Bad Gatekeeping or Missed Opportunity – Allen v. Martin Surfacing” (Nov. 30, 2012); “Gatekeeping in Allen v. Martin Surfacing — Postscript” (April 11, 2013).

New Standard for Scientific Evidence – The Mob

December 27th, 2014

A few years ago, a law student published a Note that argued for the dismantling of judicial gatekeeping. Note, “Admitting Doubt: A New Standard for Scientific Evidence,” 123 Harvard Law Review 2021 (2010). The anonymous Harvard law student asserted that juries are at least as good, if not better, at handling technical questions than are “gatekeeping” federal trial judges. The empirical evidence for this suggestion is slim, and the Note ignores the geographic variability in jury pools.

To be sure, some jurors have much greater scientific and mathematical aptitude than some judges, but the law student’s run at Rule 702 ignores some important institutional differences between judges and juries, including that judicial errors, memorialized in written opinions, are subject to scrutiny, review, and public comment. Most judges have 20 years of schooling and 10 years of job experience, which should account for some superiority.

Misplaced Sovereignty

Another student this year has published a much more sophisticated version of the Harvard student’s Note, an antic proposal with a similar policy agenda that would overthrow the regime of judicial scrutiny and gatekeeping of expert witness opinion testimony. Krista M. Pikus, “We the People: Juries, Not Judges, Should be the Gatekeepers of Expert Evidence,” 90 Notre Dame L. Rev. 453 (2014). This more recent publication, while conceding that judges may be no better than juries at evaluating scientific evidence, asserts that jury involvement is required by a political commitment to popular sovereignty. Ms. Pikus begins with the simplistic notion that:

“[o]ur system of government is based on the idea that the people are sovereign.”

Id. at 470. Since juries are made up of people, jury determinations are required to implement popular sovereignty.

This notion of sovereignty is really quite foreign to our Constitution and our system of government. “We, the People” generally do not make laws or apply them, with the exception of grand and petit jury factual determinations. The vast legislative and decision making processes are entrusted to Congress, the Executive, and the ever-expanding system of administrative agencies. The Constitution was indeed motivated to prevent governmental tyranny, but mob rule was not an acceptable alternative. For the founders, juries were a bulwark of liberty, and a shield against an overbearing Crown. Jurors were white men who owned property.

Pikus argues that judges somehow lack the political authority to serve as fact finders because they are not elected, but in some states judges are elected, and in other states and in the federal system, judges are appointed and confirmed by elected officials. Juries are, of course, not elected, and with many jurisdictions permitting juries of six persons or fewer, juries are hardly representative of the “popular sovereign.” The systematic exclusion of intelligent and well-educated jurors by plaintiffs’ counsel, along with the aversion to jury service by self-employed and busy citizens, helps ensure that juries fail to represent a fair cross-section of the population. Curiously, Pikus asserts that the “right to a trial by one’s peers is an integral part of our legal system,” but the peerage concept is nowhere in the Constitution. If it were, defendants in complicated tort cases might well have a right to juries composed of scientists or engineers.

The Right to Trial by Jury

There is, of course, a federal constitutional right to trial by jury, guaranteed by the Seventh Amendment:

“In Suits at common law … the right of trial by jury shall be preserved, and no fact tried by a jury shall be otherwise reexamined in any Court of the United States, than according to the rules of the common law.”

A strict textualist might hold that federal courts could dispense with juries in cases brought under statutory tort legislation, such as the New Jersey Products Liability Act, or for claims or defenses that were not available at the time the Seventh Amendment was enacted. Even the textualist might hold that the change in complexity of fact-finding endeavors, over two centuries, might mean that both the language and the spirit of the Seventh Amendment point away from maintaining the jury in cases of sufficient complexity.

Judges versus Juries

The fact is that judges and juries can, and do, act tyrannically in deciding factual issues, including scientific and technical issues. Ms. Pikus would push the entire responsibility for ensuring accuracy in scientific fact finding onto the least reviewable entity, the petit jury. Juries can ignore facts, and decide emotively or irrationally, without fear of personal scrutiny or criticism. Pikus worries that judges “insert their policy opinions into their decisions,” which they have been known to do, but she fails to explain why we should tolerate the same from unelected, unreviewable juries. Id. at 472.

Inconsistently, Pikus acknowledges that “many can agree that some cases might be better suited for a judge instead of a jury,” such as “patent, bankruptcy, or tax” cases that “typically require additional expertise.” Id. at 471 & n. 185. To this list, we could add family law, probate, and equity matters, but the real question is what makes a tax case more intractable to a jury than a products case. State power is much more likely to be abused, or at issue, in a tax case than in a modern products liability case, and so the tax case presents the greater need for a “bulwark of liberty.” And the products liability case is much more likely to require scientific and technical expertise than a tax case.

The law of evidence, in federal and in most state courts, permits expert witnesses to present conclusory opinions, without having to account for the methodological correctness of their relied-upon studies, data, and analyses. Jurors, who are poorly paid, and pulled away from their occupations and professions, do not have the aptitude, patience, time, or interest to explore the full range of inferences and analyses performed by expert witnesses. Without some form of gatekeeping, trial outcomes are reduced to juror assessment of weak, inaccurate proxies for scientific determinations.

Pikus claims that juries, and only juries, should assess the reliability of an expert witness’s testimony. Id. at 455. As incompetent as some judges may be in adjudicating scientific issues, their errors are on display for all to see, whereas the jury’s determinations are opaque and devoid of public explanation. Judges can be singled out for technical competency, with appropriate case assignments, and they can be required to participate in professional legal education, including training in statistics, epidemiology, toxicology, genetics, and other subjects. It is difficult to imagine a world in which the jurors are sent home with the Reference Manual on Scientific Evidence, before being allowed to sit on a case. Nor is it feasible to have lay jurors serve on an extended trial that includes a close assessment of the expert witnesses’ opinions, as well as all the facts and data underlying those opinions.

Pikus criticizes the so-called “Daubert” regime as a manifestation of judicial activism, but she ignores that Daubert has been subsumed into an Act of Congress, in the form of a revised and expanded Federal Rules of Evidence.

In the end, this Note, like so much of the anti-Daubert law review literature, is a complaint against removing popular, political, and emotive fact finding from technical and scientific issues in litigation. To the critics, science has no criteria of validity that the law is bound to respect. And yet, as John Adams argued before the Revolution:

“Facts are stubborn things; and whatever may be our wishes, our inclinations, or the dictates of our passion, they cannot alter the state of facts and evidence[1].”

Due process requires more than the enemies of Daubert would allow.


[1] John Adams, “Argument in Defense of the Soldiers in the Boston Massacre Trials” (Dec. 1770)

Showing Causation in the Absence of Controlled Studies

December 17th, 2014

The Federal Judicial Center’s Reference Manual on Scientific Evidence has avoided any clear, consistent guidance on the issue of case reports. The Second Edition waffled:

“Case reports lack controls and thus do not provide as much information as controlled epidemiological studies do. However, case reports are often all that is available on a particular subject because they usually do not require substantial, if any, funding to accomplish, and human exposure may be rare and difficult to study. Causal attribution based on case studies must be regarded with caution. However, such studies may be carefully considered in light of other information available, including toxicological data.”

F.J.C. Reference Manual on Scientific Evidence at 474-75 (2d ed. 2000). Note the complete lack of discussion of baseline risk, prevalence of exposure, and external validity of the “toxicological data.”

The second edition’s more analytically acute and rigorous chapter on statistics generally acknowledged the unreliability of anecdotal evidence of causation. See David Kaye & David Freedman, “Reference Guide on Statistics,” in F.J.C. Reference Manual on Scientific Evidence 91 – 92 (2d ed. 2000).

The Third Edition of the Reference Manual is even less coherent. Professor Berger’s introductory chapter[1] begrudgingly acknowledges, without approval, that:

“[s]ome courts have explicitly stated that certain types of evidence proffered to prove causation have no probative value and therefore cannot be reliable.”59

The chapter on statistical evidence, which had been relatively clear in the second edition, now states that controlled studies may be better but case reports can be helpful:

“When causation is the issue, anecdotal evidence can be brought to bear. So can observational studies or controlled experiments. Anecdotal reports may be of value, but they are ordinarily more helpful in generating lines of inquiry than in proving causation.”14

Reference Manual at 217 (3d ed. 2011). The “ordinarily” is given no context or contour for readers. These authors fail to provide any guidance on what will come from anecdotal evidence, or on when and why anecdotal reports may do more than merely generate “lines of inquiry.”

In Matrixx Initiatives Inc. v. Siracusano, 131 S. Ct. 1309 (2011), the Supreme Court went out of its way, way out of its way, to suggest that statistical significance is not always necessary to support conclusions of causation in medicine. Id. at 1319. The Court cited three Circuit court decisions to support its suggestion, but two of the three involved specific causation inferences from so-called differential etiologies. General causation was assumed, and not at issue, in those two cases[2]. The third case, the notorious Wells v. Ortho Pharmaceutical Corp., 788 F. 2d 741, 744–745 (11th Cir. 1986), was also cited in support of the suggestion that statistical significance was not necessary, but in Wells, the plaintiffs’ expert witnesses actually relied upon studies that claimed at least nominal statistical significance. Wells was, and remains, representative of what results when trial judges ignore the constraints of study validity. The Supreme Court, in any event, abjured any intent to specify “whether the expert testimony was properly admitted in those cases [Wells and others],” and the Court made no “attempt to define here what constitutes reliable evidence of causation.” 131 S. Ct. at 1319.

The causal claim in Siracusano involved anosmia, loss of the sense of smell, from the use of Zicam, zinc gluconate. The case arose from a motion to dismiss the complaint; no evidence was ever presented or admitted. No baseline risk of anosmia was pleaded; nor did plaintiffs allege that any controlled study demonstrated an increased risk of anosmia from nasal instillation of zinc gluconate. There were, however, clinical trials conducted in the 1930s, with zinc sulfate for poliomyelitis prophylaxis, which showed a substantial incidence of anosmia in the treated children[3]. Matrixx tried to argue that this evidence was unreliable, in part because it involved a different compound, but this argument (1) demonstrated a factual issue that required discovery and perhaps a trial, and (2) traded on a clear error in asserting that zinc sulfate and zinc gluconate were materially different, when in fact both are ionic compounds that deliver the same active constituent, the zinc ion.

The position stridently staked out in Matrixx Initiatives is not uncommon among defense counsel in tort cases. Certainly, similar, unqualified statements, rejecting the use of case reports for supporting causal conclusions, can be found in the medical literature[4].

When the disease outcome has an expected value, a baseline rate, in the exposed population, then case reports simply confirm what we already know: cases of the disease happen in people regardless of their exposure status. For this reason, medical societies, such as the Teratology Society, have issued guidances that generally downplay or dismiss the role that case reports may have in the assessment and determination of causality for birth defects:

“5. A single case report by itself is not evidence of a causal relationship between an exposure and an outcome.  Combinations of both exposures and adverse developmental outcomes frequently occur by chance. Common exposures and developmental abnormalities often occur together when there is no causal link at all. Multiple case reports may be appropriate as evidence of causation if the exposures and outcomes are both well-defined and low in incidence in the general population. The use of multiple case reports as evidence of causation is analogous to the use of historical population controls: the co-occurrence of thalidomide ingestion in pregnancy and phocomelia in the offspring was evidence of causation because both thalidomide use and phocomelia were highly unusual in the population prior to the period of interest. Given how common exposures may be, and how common adverse pregnancy outcome is, reliance on multiple case reports as the sole evidence for causation is unsatisfactory.”

The Public Affairs Committee of the Teratology Society, “Teratology Society Public Affairs Committee Position Paper Causation in Teratology-Related Litigation,” 73 Birth Defects Research (Part A) 421, 423 (2005).
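The Teratology Society’s arithmetic point can be made concrete with a short sketch. The population size and rates below are hypothetical round numbers, chosen only to illustrate how many exposure-outcome co-occurrences statistical independence alone would generate:

```python
# Hypothetical illustration of the Teratology Society's point: when both an
# exposure and an outcome are common, their co-occurrence is expected by
# chance alone, so case reports of co-occurrence carry little causal weight.

def expected_chance_cooccurrences(population, p_exposure, p_outcome):
    """Expected number of people with BOTH the exposure and the outcome,
    assuming the two are statistically independent (no causal link at all)."""
    return population * p_exposure * p_outcome

# Illustrative (hypothetical) figures: roughly 4 million births per year,
# a common exposure in 10% of pregnancies, and a ~3% prevalence of major
# birth defects.
births = 4_000_000
expected = expected_chance_cooccurrences(births, 0.10, 0.03)
print(f"Expected co-occurrences by chance alone: {expected:,.0f}")  # 12,000
```

On these assumed numbers, independence alone would produce thousands of case reports per year, which is why a single report, or even many, cannot by itself show causation for common exposures and common outcomes.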

When the base rate for the outcome is near zero, and other circumstantial evidence is present, some commentators insist that causality may be inferred from well-documented case reports:

“However, we propose that some adverse drug reactions are so convincing, even without traditional chronological causal criteria such as challenge tests, that a well documented anecdotal report can provide convincing evidence of a causal association and further verification is not needed.”

Jeffrey K. Aronson & Manfred Hauben, “Drug safety: Anecdotes that provide definitive evidence,” 333 Brit. Med. J. 1267, 1267 (2006) (Dr. Hauben was medical director of risk management strategy for Pfizer, in New York, at the time of publication). But which ones are convincing, and why?

        *        *        *        *        *        *        *        *        *

Dr. David Schwartz, in a recent blog post, picked up on some of my discussion of the gadolinium case reports (see here and there), and posited the ultimate question: when are case reports sufficient to show causation? David Schwartz, “8 Examples of Causal Inference Without Data from Controlled Studies” (Dec. 14, 2014).

Dr. Schwartz discusses several causal claims, all of which gave rise to litigation at some point, in which case reports or case series played an important, if not dispositive, role:

  1.      Gadolinium-based contrast agents and NSF
  2.      Amphibole asbestos and malignant mesothelioma
  3.      Ionizing radiation and multiple cancers
  4.      Thalidomide and teratogenicity
  5.      Rezulin and acute liver failure
  6.      DES and clear cell vaginal adenocarcinoma
  7.      Vinyl chloride and angiosarcoma
  8.      Manganese exposure and manganism

Dr. Schwartz’s discussion is well worth reading in its entirety, but I wanted to emphasize some of his caveats. Most of the exposures are rare, as are the outcomes. In some cases, the outcomes occur almost exclusively with the identified exposures. All eight examples pose some danger of misinterpretation. Gadolinium-based contrast agents appear to create a risk of NSF only in the presence of chronic renal failure. Amphibole asbestos, most notably crocidolite, causes malignant mesothelioma after a very lengthy latency period. Ionizing radiation causes some cancers that are all too common, but the presence of multiple cancers in the same person, after a suitable latency period, is distinctly uncommon, as is the level of radiation needed to overwhelm bodily defenses and induce cancers. Thalidomide was fairly quickly associated, through case reports, with phocomelia, which has an extremely low baseline risk. Other birth defects were not convincingly demonstrated by the case series. Rezulin, an oral antidiabetic medication, was undoubtedly causally responsible for rare cases of acute liver failure. Chronic liver disease, however, which is common among type 2 diabetic patients, required epidemiologic evidence, which never materialized[5].

Manganese, by definition, is the cause of manganism, but extremely high levels of manganese exposure, and the specific speciation of the manganese, are essential to the causal connection. Manganism raises another issue often seen in so-called signature diseases: diagnostic accuracy. Unless the diagnostic criteria have perfect (100%) specificity, with no false-positive diagnoses, then once again, we expect false-positive cases to appear when the criteria are applied to large numbers of people. In the welding fume litigation, where plaintiffs’ counsel and physicians engaged in widespread, if not wanton, medico-legal screenings, it was not surprising that they might find occasional cases that appeared to satisfy their criteria. Of course, the more the criteria are diluted to accommodate litigation goals, the more likely there will be false positive cases.[6]
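The false-positive arithmetic behind this concern can be sketched briefly. The screening size, specificity figures, and zero prevalence below are hypothetical, chosen only to show how imperfect criteria manufacture “cases” in mass screenings:

```python
# A hedged sketch of the screening arithmetic in the text: even when the
# disease is truly absent from the screened group, imperfect diagnostic
# specificity guarantees false-positive "cases." All numbers are hypothetical.

def expected_positives(n_screened, specificity, prevalence=0.0,
                       sensitivity=1.0):
    """Expected positive diagnoses, split into true and false positives."""
    true_cases = n_screened * prevalence
    healthy = n_screened - true_cases
    true_pos = true_cases * sensitivity
    false_pos = healthy * (1.0 - specificity)
    return true_pos, false_pos

# Hypothetical mass screening: 10,000 welders, a condition truly absent
# from the group (prevalence 0), and criteria with 99% specificity.
tp, fp = expected_positives(10_000, specificity=0.99)
print(f"True positives: {tp:.0f}, false positives: {fp:.0f}")

# Diluting the criteria to 95% specificity quintuples the false positives.
_, fp_diluted = expected_positives(10_000, specificity=0.95)
print(f"False positives with diluted criteria: {fp_diluted:.0f}")
```

Even at 99% specificity, the sketch yields about 100 false-positive “cases” out of 10,000 screened, every one of them a potential lawsuit, and loosening the criteria multiplies the count.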

Dr. Schwartz identifies some common themes and important factors in identifying the bases for inferring causality from uncontrolled evidence:

“(a) low or no background rate of the disease condition;

(b) low background rate of the exposure;

(c) a clear understanding of the mechanism of action.”

These factors, and perhaps others, should not be mistaken for strict criteria. The exemplar cases suggest a family resemblance of overlapping factors that help support the inference, even against the most robust skepticism.

In litigation, defense counsel typically argue that analytical epidemiology is always necessary, and plaintiffs’ counsel claim epidemiology is never needed. The truth is more nuanced and conditional, but the great majority of litigated cases do require epidemiology for health effects because the claimed harms are outcomes that have an expected incidence or prevalence in the exposed population irrespective of exposure.


[1] Reference Manual on Scientific Evidence at 23 (3d ed. 2011) (citing “Cloud v. Pfizer Inc., 198 F. Supp. 2d 1118, 1133 (D. Ariz. 2001) (stating that case reports were merely compilations of occurrences and have been rejected as reliable scientific evidence supporting an expert opinion that Daubert requires); Haggerty v. Upjohn Co., 950 F. Supp. 1160, 1164 (S.D. Fla. 1996), aff’d, 158 F.3d 588 (11th Cir. 1998) (“scientifically valid cause and effect determinations depend on controlled clinical trials and epidemiological studies”); Wade-Greaux v. Whitehall Labs., Inc., 874 F. Supp. 1441, 1454 (D.V.I. 1994), aff’d, 46 F.3d 1120 (3d Cir. 1994) (stating there is a need for consistent epidemiological studies showing statistically significant increased risks).”)

[2] Best v. Lowe’s Home Centers, Inc., 563 F. 3d 171, 178 (6th Cir 2009); Westberry v. Gislaved Gummi AB, 178 F. 3d 257, 263–264 (4th Cir. 1999).

[3] There may have been a better argument for Matrixx in distinguishing the method and place of delivery of the zinc sulfate in the polio trials of the 1930s, but when Matrixx’s counsel was challenged at oral argument, he asserted simply, and wrongly, that the two compounds were different.

[4] Johnston & Hauser, “The value of a case report,” 62 Ann. Neurology A11 (2007) (“No matter how compelling a vignette may seem, one must always be concerned about the reliability of inference from an “n of one.” No statistics are possible in case reports. Inference is entirely dependent, then, on subjective judgment. For a case meant to suggest that agent A leads to event B, the association of these two occurrences in the case must be compared to the likelihood that the two conditions could co-occur by chance alone …. Such a subjective judgment is further complicated by the fact that case reports are selected from a vast universe of cases.”); David A. Grimes & Kenneth F. Schulz, “Descriptive studies: what they can and cannot do,” 359 Lancet 145, 145, 148 (2002) (“A frequent error in reports of descriptive studies is overstepping the data: studies without a comparison group allow no inferences to be drawn about associations, causal or otherwise.”) (“Common pitfalls of descriptive reports include an absence of a clear, specific, and reproducible case definition, and interpretations that overstep the data. Studies without a comparison group do not allow conclusions about cause and disease.”); Troyen A. Brennan, “Untangling Causation Issues in Law and Medicine: Hazardous Substance Litigation,” 107 Ann. Intern. Med. 741, 746 (1987) (recommending that testifying physicans “[a]void anecdotal evidence; clearly state the opposing side is relying on anecdotal evidence and why that is not good science.”).

[5] See In re Rezulin, 2004 WL 2884327, at *3 (S.D.N.Y. 2004).

[6] This gaming of diagnostic criteria has been a major invitation to diagnostic invalidity in litigation over asbestosis and silicosis in the United States.

More Case Report Mischief in the Gadolinium Litigation

November 28th, 2014

The Decker case is one curious decision, both by the MDL trial court and by the Sixth Circuit. Decker v. GE Healthcare Inc., ___ F.3d ___, 2014 FED App. 0258P, 2014 U.S. App. LEXIS 20049 (6th Cir. Oct. 20, 2014). First, the Circuit went out of its way to emphasize that the trial court had discretion, not only in evaluating the evidence on a Rule 702 challenge, but also in devising the criteria of validity[1]. Second, the courts ignored the role and the weight being assigned to Federal Rule of Evidence 703, in winnowing the materials upon which the defense expert witnesses could rely. Third, the Circuit approved what appeared to be extremely asymmetric gatekeeping of plaintiffs’ and defendant’s expert witnesses. The asymmetrical standards probably were the basis for emphasizing the breadth of the trial court’s discretion to devise the criteria for assessing scientific validity[2].

In barring GEHC’s expert witnesses from testifying about gadolinium-naïve nephrogenic systemic fibrosis (NSF) cases, Judge Dan Polster, the MDL judge, appeared to invoke a double standard. Plaintiffs could adduce any case report or adverse event report (AER) on the theory that the reports were relevant to “notice” of a “safety signal” between gadolinium-based contrast agents in MRI and NSF. Defendants’ expert witnesses, however, were held to the most exacting standards: clinical identity with the plaintiff’s particular presentation of NSF, biopsy-proven presence of Gd in affected tissue, and documentation of lack of GBCA exposure, before case reports would be permitted as reliance materials to support the existence of gadolinium-naïve NSF.

A fourth issue with the Decker opinion is the latitude it permitted the district court in allowing plaintiffs’ pharmacovigilance expert witness, Cheryl Blume, Ph.D., to testify, over objections, about the “signal” created by the NSF AERs available to GEHC. Decker at *11. At the same trial, the MDL judge prohibited GEHC’s expert witness, Dr. Anthony Gaspari, from testifying that the AERs described by Blume did not support a clinical diagnosis of NSF.

On a motion for reconsideration, Judge Polster reaffirmed his ruling on grounds that

(1) the AERs were too incomplete to rule in or rule out a diagnosis of NSF, although they were sufficient to create a “signal”;

(2) whether the AERs were actual cases of NSF was not relevant to their being safety signals;

(3) Dr. Gaspari was not an expert in pharmacovigilance, which studied “signals” as opposed to causation; and

(4) Dr. Gaspari’s conclusion that the AERs were not NSF was made without reviewing all the information available to GEHC at the time of the AERs.

Decker at *12.

The fallacy of this stingy approach to Dr. Gaspari’s testimony lies in the courts’ stubborn refusal to recognize that if an AER was not, as a matter of medical science, a case of NSF, then it could not be a “signal” of a possible causal relationship between GBCA and NSF. Pharmacovigilance does not end with ascertaining signals; yet the courts privileged Blume’s opinions on signals even though she could not proceed to the next step and evaluate diagnostic accuracy and causality. This twisted logic makes a mockery of pharmacovigilance. It also led to the exclusion of Dr. Gaspari’s testimony on a key aspect of plaintiffs’ liability evidence.

The erroneous approach pioneered by Judge Polster was compounded by the district court’s refusal to give a jury instruction that AERs were relevant only to notice, and not to causation. Judge Polster reasoned that “the instruction singles out one type of evidence, and adds, rather than minimizes, confusion.” He cited the lack of any expert witness testimony suggesting that AERs showed causation, and added, “besides, it doesn’t matter because those patients are not, are not the plaintiffs.” Decker at *17.

The lack of dispute about the meaning of AERs would have seemed all the more reason to control jury speculation about their import, and to give a binding instruction on AERs and their limited significance. As for the AER patients’ not being the plaintiffs, well, the case report patients were not the plaintiffs, either. This last reason is not even wrong[3]. The Circuit, in affirming, turned a blind eye to the district court’s exercise of discretion in a way that systematically increased the importance of Blume’s testimony on signals, while systematically hobbling the defendant’s expert witnesses.


[1] “THE STANDARD OF APPELLATE REVIEW FOR RULE 702 DECISIONS” (Nov. 12, 2014).

[2] “Gadolinium, Nephrogenic Systemic Fibrosis, and Case Reports” (Nov. 24, 2014).

[3] “Das ist nicht nur nicht richtig, es ist nicht einmal falsch!” The quote is attributed to Wolfgang Pauli in R. E. Peierls, “Wolfgang Ernst Pauli, 1900-1958,” 5 Biographical Memoirs Fellows Royal Soc’y 175, 186 (1960).

 

History – Lies My Teacher Told Me

November 26th, 2014

James W. Loewen, a professor of history, has been one of the most untiring critics of how history is taught and practiced in the United States. A large part of his criticism derives from the overt politicization of the teaching of history, especially the heavy hand of school boards and textbook committees in their selection of “appropriate textbooks” for high school students. See James W. Loewen, Lies My Teacher Told Me: Everything Your American History Textbook Got Wrong (2007). The disgraceful “conservative” sanitizing of United States high schoolers’ history textbooks is almost equal to the heavy-handed Marxist bent of some university professors. The politicization of history may be unavoidable, but we should be alert to the intellectual depredations from the right and the left.

*     *     *     *     *     *

I recently saw the self-styled social history of silicosis, Deadly Dust, by David Rosner and Gerald Markowitz, cited in a trial court brief. The cite was to the original edition, but it led me to read the “new and expanded” edition[1], published in 2006. Expanded, but not exactly super-sized, and with the same empty calories as before. The authors’ Preface to the second edition conveys an air of excitement about then-recent media suggestions that silicosis might be the “new asbestosis.”[2] Of course, the authors were excited because the uptick in silicosis litigation around 2003, based almost exclusively upon fraudulent filings, brought them engagements as compensated expert witnesses for plaintiffs’ counsel.

The Preface also confesses that, before their initial edition, the authors were ignorant about silicosis. And because they are so well read, they assumed that their not having heard of silicosis meant that silicosis must have disappeared from the literature. Id. at xiii. This fallacious confusion between absence of evidence and evidence of absence pervades the entire book. Their first edition was written with this confirmation bias dominating their narrative:

“The book we wrote tells the story of a condition that dominated public health, medical, labor, and popular discourse on disease in the 1930s but that virtually vanished from popular and professional consciousness after World War II. How, we asked, could a chronic disease that took decades to develop and that was assumed to affect hundreds of thousands of American workers disappear from the literature and public notice in less than a decade? This question is the basis for Deadly Dust, and we believe that we answered it, providing a cultural, medical, and political model of how we, as a society, decide to recognize or forget about illness.”

Deadly Dust at xiv (emphasis added). The second edition is more of the same biased narrative.

Also clear from their Preface is the authors’ messianic complex. I now know why they have repeatedly attacked me for having criticized them: it is important for them to be seen as resistant, indeed triumphant, victims of industry:

“We are particularly proud that lawyers for various industries have sought to get judges to exclude our book from court cases.”

Id. at xvi. Of course, from the lawyers’ perspective, a book such as Deadly Dust has many layers of evidentiary problems, running from authentication of documents, to multiple layers of hearsay, legal and logical relevancy, and rampant, subjective opinion spread throughout the narrative.

The “virtually vanished” phrase caused me to revisit[3] my previous quantitative assessment of discussions of silicosis in the popular and medical literature. The National Library of Medicine’s PubMed database is expanding back into the past, adding old journals and their articles. Here is the most recent tally, by decade, of articles retrieved with the keyword “silicosis”:

Date Range        Number of Articles from Keyword Search

1940–1949              119
1950–1959            1,436
1960–1969            1,868
1970–1979            1,176
1980–1989              940
1990–1999              883
2000–2009              860
2010–present           498
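Tallies of this sort can be reproduced against PubMed through NCBI’s public E-utilities interface. The sketch below builds an esearch query for one date range and parses the Count element from the XML response; the endpoint and parameters (db, term, datetype, mindate, maxdate, retmax) are standard E-utilities usage, though exact counts will drift as NLM adds back-files to the database:

```python
# Sketch of reproducing a per-decade PubMed tally via NCBI E-utilities.
import urllib.parse
import xml.etree.ElementTree as ET

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def esearch_url(term, mindate, maxdate):
    """Build an esearch query counting PubMed records for a keyword
    within a publication-date range."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",   # filter on publication date
        "mindate": str(mindate),
        "maxdate": str(maxdate),
        "retmax": "0",        # we only need the <Count>, not the record IDs
    }
    return ESEARCH + "?" + urllib.parse.urlencode(params)

def parse_count(esearch_xml):
    """Extract the <Count> element from an esearch XML response."""
    return int(ET.fromstring(esearch_xml).findtext("Count"))

# Example: the query URL for the 1950-1959 row of the table above.
url = esearch_url("silicosis", 1950, 1959)

# Fetching that URL returns XML like this (abbreviated) sample:
sample = "<eSearchResult><Count>1436</Count></eSearchResult>"
print(parse_count(sample))  # 1436
```

Looping the same query over each decade regenerates the whole table, subject to the caveat that PubMed’s coverage of older journals keeps growing.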

The Rosner/Markowitz claim that silicosis “virtually vanished” from professional discourse after World War II is completely belied by the evidence. Google’s Ngram function further confirms the incorrectness of the fundamental premise of Deadly Dust:

[Google Ngram chart for “silicosis,” 1920–2010]

The Google chart shows that although there was a peak around 1940, the level of referencing silicosis remained at or above the level for the mid-1930s until 1960, and never retreated to levels as low as for 1930-32.

This false premise, that silicosis vanished, or virtually vanished, from the medical literature, is the starting point for Rosner and Markowitz’ faux conspiracy charge against industry for suppressing discussion, when the reality was exactly the opposite. What follows from the false premise is a false set of conclusions.


[1] David Rosner & Gerald Markowitz, Deadly Dust: Silicosis and the On-Going Struggle to Protect Workers’ Health (Ann Arbor 2006).

[2] citing Jonathan Glater, “Suits on Silica Being Compared to Asbestos Cases,” New York Times (Sept. 6, 2003), C-1 (quoting one defense lawyer as saying that “I actually thought that we had made the world safe for sand.”).

[3] See Schachtman, “Conspiracy Theories: Historians, In and Out of Court” (April 17, 2013).

 

Gadolinium, Nephrogenic Systemic Fibrosis, and Case Reports

November 24th, 2014

Gadolinium (Gd) is a rare earth element. In its ionic form (+3), gadolinium is known to be highly toxic to humans. Gadolinium is strongly paramagnetic, which makes it a valuable contrast agent for magnetic resonance imaging (MRI). The gadolinium is administered intravenously in a chelated form before MRI. In its chelated form, the ion is escorted out of the body through the kidneys before exposure to free Gd ion occurs. Or that was the theory.

Nephrogenic systemic fibrosis (NSF) is a rare, painful, incurable, progressive connective tissue disease. NSF manifests with skin thickening, fibrosis, and tethering, which means that the skin cannot be pulled away from the underlying tissue. Some patients may develop extracutaneous fibrosis of muscle, lymph nodes, pleura, and other internal organs. Elana J. Bernstein, Christian Schmidt-Lauber, and Jonathan Kay, “Nephrogenic systemic fibrosis: A systemic fibrosing disease resulting from gadolinium exposure,” 26 Best Practice & Research Clin. Rheum. 489, 489 (2012).

As a diagnostic entity, NSF is a relatively recent discovery. The first case was noted in 1997, in California. Within a few years, differential diagnostic criteria to distinguish NSF from other fibrotic diseases were developed. Centers for Disease Control, “Fibrosing skin condition among patients with renal disease–United States and Europe, 1997–2002,” 51 MMWR Morbidity and Mortality Weekly Report 25 (2002). Physicians identified the condition among patients with renal insufficiency who had received MRI with a gadolinium-based contrast agent (GBCA). Given the rarity of both the exposure (GBCA administered to patients with renal insufficiency) and the outcome (NSF), the relationship between NSF and the use of GBCAs for MRI was discovered largely from case reports. A case registry is maintained at Yale University, and has identified 380 cases to date. Shawn E. Cowper, “Nephrogenic Systemic Fibrosis,” at the website for The International Center for Nephrogenic Systemic Fibrosis Research (ICNSFR) (last updated June 15, 2013).

The little epidemiology that exists on the subject generally has found that all “cases” had exposure to Gd[1]. Or almost all. There have been occasional cases found without reported exposure to GBCA. Indeed, one case of NSF without prior GBCA was reported last month in the dermatological literature. C. Ross, N. De Rosa, G. Marshman & D. Astill, “Nephrogenic systemic fibrosis in a gadolinium-naïve patient: Successful treatment with oral sirolimus,” Australas. J. Dermatol. (2014); doi: 10.1111/ajd.12176 [Epub ahead of print].

In litigation, the usual scenario is that plaintiffs, their counsel, and their expert witnesses want to offer case reports or case series as probative of a causal association between an exposure and a particular disease outcome. In the silicone gel breast implant litigation, women who characterized themselves as “victims” shouted outside courtrooms, “We are the evidence.”

When the outcome in question has a baseline rate, and the exposure is widespread, this strategy is usually illegitimate, and most courts have limited or prohibited such obvious attempts to prejudice the jury with evidence that has little or no probative value.

The causal connection between NSF and GBCA, described above, was postulated on the basis of case reports, but this is not really a rejection of the general rule about case reports. NSF is an extremely rare outcome, and GBCA administered to patients with serious kidney insufficiency is a fairly rare exposure. In addition, gadolinium ion has a known human toxicity, and the connection between renal insufficiency and Gd toxicity is rather straightforward. Insufficient kidney function results in longer residence times for the GBCA, with the consequence that the gadolinium dissociates from its chelating agent, and the free Gd ion does its damage. Furthermore, biopsies of affected tissues show an uptake of gadolinium in NSF patients.

   *   *   *   *   *   *   *   *

GE Healthcare manufactures Omniscan, a GBCA, for use as an MRI-contrast medium. Given the recently discovered dangers of GBCAs in vulnerable patients, Omniscan has been a magnet for lawsuits, with the peak intensity of the litigation field in the MDL courtroom of federal district judge Dan Polster. Judge Polster tried the first Omniscan case, which resulted in a verdict for the plaintiff. GE appealed, complaining about several of Judge Polster’s rulings, including the uneven handling of case reports. Last month, the Sixth Circuit affirmed. Decker v. GE Healthcare Inc., ___ F.3d ___, 2014 FED App. 0258P, 2014 U.S. App. LEXIS 20049 (6th Cir. Oct. 20, 2014).

General causation between GBCAs and NSF was apparently not disputed in Decker. Although plaintiffs in the GBCA litigation established the causality of GBCA in producing NSF by case reports, Judge Polster refused to permit GEHC’s expert witnesses to testify about their reliance upon case reports of gadolinium-naïve cases of NSF; that is, the court disallowed testimony about reported cases that occurred in the absence of GBCA exposure[2]. Id. at *9. Judge Polster found that the reported gadolinium-naïve case reports were “methodologically flawed” because they did not adequately show, by tissue biopsy or other means, that the NSF patients in question lacked Gd exposure. Id. at *10. The district court speculated that there may have been Gd exposure from a non-MRI procedure, but never explained what non-MRI procedure would involve internal administration of GBCA. Nor did the district court address the temporal relationship between this undocumented, conjectured non-MRI gadolinium-based imaging procedure and the onset of the reported patient’s NSF.

Before trial, defendant GEHC moved for reconsideration of the district court’s previous decision on the defensive use of gadolinium-naïve case reports, based upon a then-recent publication of a “purported” case of gadolinium-naïve NSF. Id. at *8. A quick read of the late-breaking case study shows that it was more than a “purported” case. A.A. Lemy, et al., “Revisiting nephrogenic systemic fibrosis in 6 kidney transplant recipients: a single-center experience,” 63 J. Am. Acad. Dermatol. 389 (2010). The cited paper by Lemy had diagnosed NSF in a patient without GBCA exposure, and mass spectrometry testing of affected tissue revealed no Gd. The district court, however, dismissed the Lemy case as irrelevant unless GEHC’s expert witnesses could demonstrate that Lemy’s patient number 5 and the plaintiff were so clinically similar that “it was probable that Mr. Decker’s NSF was not caused by his 2005 Omniscan [exposure].”

The Sixth Circuit affirmed this “tails they win; heads you lose” approach to gatekeeping as all within the scope of the district court’s exercise of discretion. Lemy’s case number 5 and Mr. Decker both had NSF, and yet the courts did not describe clinical varieties of NSF that vary based upon their relatedness to gadolinium exposure. It would seem that the courts were imposing an extremely heavy burden on the defense to show that the gadolinium-naïve cases were absolutely free of Gd exposure, and that they resembled the particular plaintiff’s NSF diagnosis in every respect. Without any evidence of the sensitivity, specificity, and positive predictive value of the diagnostic criteria, the district and appellate courts seem to have accepted glib demands for absolute identity between the plaintiff’s NSF manifestation and any candidate Gd-free NSF case. Given that there is clinical heterogeneity among Gd-NSF cases, and that causality was basically inferred from case reports and case series, the courts’ reasoning seems strained.

The appellate court also seemed blithely unaware of the fallacious circularity of permitting a diagnostic entity to be defined based upon exposure, thereby preventing any fair test of the hypothesis that all NSF cases are caused by gadolinium. This fallacy was advanced in the silicone gel breast implant litigation, where the litigation industry shrank from claims that silicone caused classic connective tissue diseases, in the face of exculpatory epidemiologic studies. The claimants retreated to a claim that silicone caused a “new” disease that was defined by mostly vague, self-reported symptoms [so very different from NSF in this respect], in conjunction with silicone exposure. The court-appointed expert witnesses, however, would have none of these shenanigans:

“The National Science Panel concluded that they do not yet support the inclusion of SSRD [systemic silicone-related disease] in the list of accepted diseases, for 4 reasons. First, the requirement of the inclusion of the putative cause (silicone exposure) as one of the criteria does not allow the criteria set to be tested objectively without knowledge of the presence of implants, thus incurring incorporation bias (27).”

Peter Tugwell, George Wells, Joan Peterson, Vivian Welch, Jacqueline Page, Carolyn Davison, Jessie McGowan, David Ramroth, and Beverley Shea, “Do Silicone Breast Implants Cause Rheumatologic Disorders? A Systematic Review for a Court-Appointed National Science Panel,” 44 Arthritis & Rheumatism 2477, 2479 (2001) (citing David Sackett, “Bias in analytic research,” 32 J. Chronic Dis. 51 (1979)).

Of course, NSF does not share the dubious provenance of SSRD, or SAD [silicone-associated disorder] as it was sometimes known. Still, the analytic studies showing that NSF cases all, or mostly, had GBCA exposure explicitly refrained from defining the NSF case as including gadolinium exposure.

Decker is thus a curious case. The trial and appellate court talked about preventing the defense expert witnesses from relying upon case reports that were “methodologically flawed,” but the courts never mentioned Federal Rule of Evidence 703, which should have been the basis for such selective pruning of the expert witnesses’ reliance materials. And then there is the matter that even if GEHC were correct about Gd-free NSF cases, the attributable risk for NSF to prior Gd exposure is almost certainly very high, and the debate over whether NSF is a “signature” disease was not likely going to affect the case outcome.

Decker can perhaps best be understood as a dispute about specific causation, with established general causation, in which the relative risk of NSF from GBCA exposure is extraordinarily high among patients with renal insufficiency. If there are other causes of NSF, they are considerably more rare than GBCA/renal insufficiency exposed cases. In the face of this very high attributable risk, GE’s expert witnesses’ discussions of an idiopathic or other cause was too speculative to pass muster under Rule 702.
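The specific-causation point here rests on simple arithmetic: when the relative risk is very high, the attributable fraction among the exposed, AF = (RR − 1)/RR, approaches one, leaving little room for idiopathic or alternative causes. A minimal sketch of that calculation, using a hypothetical relative risk chosen only for illustration (no RR figure appears in the Decker record quoted here):

```python
def attributable_fraction(relative_risk: float) -> float:
    """Fraction of exposed cases attributable to the exposure,
    given the relative risk (RR): AF = (RR - 1) / RR."""
    if relative_risk < 1:
        raise ValueError("formula assumes RR >= 1")
    return (relative_risk - 1) / relative_risk

# With a hypothetical RR of 100, 99% of exposed cases are
# attributable to the exposure; an "idiopathic cause" defense
# must then overcome very long odds on specific causation.
print(attributable_fraction(100.0))  # 0.99
```

The sketch simply makes concrete why, with an extraordinarily high relative risk among renally insufficient patients, a defense expert's invocation of an idiopathic cause was vulnerable under Rule 702.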


[1] Elana J. Bernstein, Tamara Isakova, Mary E. Sullivan, Lori B. Chibnik, Myles Wolf & Jonathan Kay, “Nephrogenic systemic fibrosis is associated with hypophosphataemia: a case–control study,” 53 Rheumatology 1613 (2014); T.R. Elmholdt, M. Pedersen, B. Jørgensen, K. Søndergaard, J.D. Jensen, M. Ramsing, and A.B. Olesen, “Nephrogenic systemic fibrosis is found only among gadolinium-exposed patients with renal insufficiency: a case-control study from Denmark,” 165 Br. J. Dermatol. 828 (2011); P. Marckmann, “An epidemic outbreak of nephrogenic systemic fibrosis in a Danish hospital,” 66 Eur. J. Radiol. 187 (2008) (reporting all patients had gadodiamide-enhanced magnetic resonance imaging and severe renal insufficiency before onset of NSF); P. Marckmann, L. Skov, K. Rossen, J.G. Heaf, and H.S. Thomsen, “Case-control study of gadodiamide-related nephrogenic systemic fibrosis,” 22 Nephrol. Dialysis & Transplant. 3174 (2007) (all 19 cases in case-control study had prior exposure to gadolinium (Gd)-containing magnetic resonance imaging contrast agents); Centers for Disease Control, “Nephrogenic Fibrosing Dermopathy Associated with Exposure to Gadolinium-Containing Contrast Agents — St. Louis, Missouri, 2002–2006,” 56 MMWR Morbidity and Mortality Weekly Report (Feb. 23, 2007).

[2] T.A. Collidge, P.C. Thomson, P.B. Mark, et al., “Gadolinium-Enhanced MR Imaging and Nephrogenic Systemic Fibrosis: Retrospective Study of a Renal Replacement Therapy Cohort,” 245 Radiology 168-175 (2007); I.M. Wahba, E.L. Simpson, and K. White, “Gadolinium Is Not The Only Trigger For Nephrogenic Systemic Fibrosis: Insights From Two Cases And Review Of The Recent Literature,” 7 Am. J. Transplant. 1 (2007); A. Deng, D.B. Martin, et al., “Nephrogenic Systemic Fibrosis with a Spectrum of Clinical and Histopathological Presentation: A Disorder of Aberrant Dermal Remodeling,” 37 J. Cutan. Pathol. 204 (2009).

Rhetorical Strategy in Characterizing Scientific Burdens of Proof

November 15th, 2014

The recent opinion piece by Kevin Elliott and David Resnik exemplifies a rhetorical strategy that idealizes and elevates a burden of proof in science, and then declares it is different from legal and regulatory burdens of proof. Kevin C. Elliott and David B. Resnik, “Science, Policy, and the Transparency of Values,” 122 Envt’l Health Persp. 647 (2014) [Elliott & Resnik]. What is astonishing about this strategy is the lack of support for the claim that “science” imposes such a high burden of proof that we can safely ignore it when making “practical” legal or regulatory decisions. Here is how the authors state their claim:

“Very high standards of evidence are typically expected in order to infer causal relationships or to approve the marketing of new drugs. In other social contexts, such as tort law and chemical regulation, weaker standards of evidence are sometimes acceptable to protect the public (Cranor 2008).”

Id.[1] Remarkably, the authors cite no statute, no case law, and no legal treatise for the proposition that the tort law standard for causation is somehow lower than for a scientific claim of causality. Similarly, the authors cite no support for their claim that regulatory pronouncements are judged under a lower burden. One need only consider the burden a sponsor faces in establishing medication efficacy and safety in a New Drug Application before the Food and Drug Administration. Of course, when agencies engage in assessing causal claims regarding safety, they often act under regulations and guidances that lessen the burden of proof from what would be required in a tort action.[2]

And most important, Elliott and Resnik fail to cite to any work of scientists for the claim that scientists require a greater burden of proof before accepting a causal claim. When these authors’ claims of differential burdens of proof were challenged by a scientist, Dr. David Schwartz, in a letter to the editors, the authors insisted that they were correct, again citing to Carl Cranor, a non-lawyer, non-scientist:

“we caution against equating the standards of evidence expected in tort law with those expected in more traditional scientific contexts. The tort system requires only a preponderance of evidence (> 50% likelihood) to win a case; this is much weaker evidence than scientists typically demand when presenting or publishing results, and confusion about these differing standards has led to significant legal controversies (Cranor 2006).”

Reply to Dr. Schwartz. The only thing the authors added to the discussion was to cite to the same work by Carl Cranor[3], but change the date of the book.

Whence comes the assertion that science has a heavier burden of proof? Elliott and Resnik cite Cranor for their remarkable proposition, so where did Cranor find support for the proposition at issue here? In his 1993 book, Cranor suggests that we “can think of type I and II error rates as ‘standards of proof’,” which begs the question whether they are appropriately used to assess significance or posterior probabilities[4]. Cranor goes so far in his 1993 book as to describe the usual level of alpha as the “95%” rule, and to assert that regulatory agencies require something akin to proof “beyond a reasonable doubt” when they require two “statistically significant” studies[5]. Thus Cranor’s opinion has its origins in his commission of the transposition fallacy[6].

Cranor has persisted in his fallacious analysis in his later books. In his 2006 book, he erroneously equates the 95% coefficient of statistical confidence with 95% certainty of knowledge[7]. Later in the text, he asserts that agency regulations are written only when supported by proof “beyond a reasonable doubt.”[8]

To be fair, it is possible to find regulators stating something close to what Cranor asserts, but only when they themselves are committing the transposition fallacy:

“Statistical significance is a mathematical determination of the confidence in the outcome of a test. The usual criterion for establishing statistical significance is the p-value (probability value). A statistically significant difference in results is generally indicated by p < 0.05, meaning there is less than a 5% probability that the toxic effects observed were due to chance and were not caused by the chemical. Another way of looking at it is that there is a 95% probability that the effect is real, i.e., the effect seen was the result of the chemical exposure.”

U.S. Dep’t of Labor, Guidance for Hazard Determination for Compliance with the OSHA Hazard Communication Standard (29 CFR § 1910.1200) Section V (July 6, 2007).
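The error in the quoted OSHA guidance, the transposition fallacy, can be made concrete with Bayes’ theorem: the significance level is P(significant | null), not P(null | significant), and the two can diverge sharply. A minimal numerical sketch, using hypothetical values for statistical power and the prior probability of the null (none of which appear in the guidance itself):

```python
def prob_null_given_significant(alpha, power, prior_null):
    """Bayes' theorem: P(H0 | significant result).
    alpha      = P(significant | H0), the type I error rate
    power      = P(significant | H1)
    prior_null = P(H0) before seeing the data
    """
    numerator = alpha * prior_null
    denominator = numerator + power * (1 - prior_null)
    return numerator / denominator

# alpha = 0.05 does NOT mean a 95% probability that "the effect is real."
# With hypothetical 80% power and a 90% prior that the null holds,
# a "statistically significant" finding still leaves the null with a
# posterior probability of 0.36, not 0.05.
print(round(prob_null_given_significant(0.05, 0.80, 0.90), 2))  # 0.36
```

The point of the sketch is only that the posterior probability of the null depends on power and prior probability, quantities the p-value alone cannot supply, which is exactly why transposing “less than 5% probability due to chance” into “95% probability the effect is real” is fallacious.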

And it is similarly possible to find policy wonks expressing similar views. In 1993, the Carnegie Commission published a report in which it tried to explain away junk science as simply the discrepancy in burdens of proof between law and science, but its reasoning clearly points to the Commission’s commission of the transposition fallacy:

“The reality is that courts often decide cases not on the scientific merits, but on concepts such as burden of proof that operate differently in the legal and scientific realms. Scientists may misperceive these decisions as based on a misunderstanding of the science, when in actuality the decision may simply result from applying a different norm, one that, for the judiciary, is appropriate.  Much, for instance, has been written about ‘junk science’ in the courtroom. But judicial decisions that appear to be based on ‘bad’ science may actually reflect the reality that the law requires a burden of proof, or confidence level, other than the 95 percent confidence level that is often used by scientists to reject the possibility that chance alone accounted for observed differences.”

The Carnegie Commission on Science, Technology, and Government, Report on Science and Technology in Judicial Decision Making 28 (1993)[9].

Resnik and Cranor’s rhetoric is a commonplace in the courtroom. Here is how the rhetorical strategy plays out in the courtroom. Plaintiffs’ counsel elicits concessions from defense expert witnesses that they are using the “norms” and standards of science in presenting their opinions. Counsel then argue to the finder of fact that the defense experts are wonderful, but irrelevant, because the fact finder must decide the case on a lower standard. This stratagem can be found supported by the writings of plaintiffs’ counsel and their expert witnesses[10]. The stratagem also shows up in the writings of law professors who are critical of the law’s embrace of scientific scruples in the courtroom[11].

The cacophony of error, from advocates and commentators, has led the courts into frequent error on the subject. Thus, Judge Pauline Newman, who sits on the United States Court of Appeals for the Federal Circuit, and who was a member of the Committee on the Development of the Third Edition of the Reference Manual on Scientific Evidence, wrote in one of her appellate opinions[12]:

“Scientists as well as judges must understand: ‘the reality that the law requires a burden of proof, or confidence level, other than the 95 percent confidence level that is often used by scientists to reject the possibility that chance alone accounted for observed differences’.”

Reaching back even further into the judiciary’s wrestling with the issue of the difference between legal and scientific standards of proof, we have one of the clearest and clearly incorrect statements of the matter[13]:

“Petitioners demand sole reliance on scientific facts, on evidence that reputable scientific techniques certify as certain. Typically, a scientist will not so certify evidence unless the probability of error, by standard statistical measurement, is less than 5%. That is, scientific fact is at least 95% certain.  Such certainty has never characterized the judicial or the administrative process. It may be that the ‘beyond a reasonable doubt’ standard of criminal law demands 95% certainty.  Cf. McGill v. United States, 121 U.S.App. D.C. 179, 185 n.6, 348 F.2d 791, 797 n.6 (1965). But the standard of ordinary civil litigation, a preponderance of the evidence, demands only 51% certainty. A jury may weigh conflicting evidence and certify as adjudicative (although not scientific) fact that which it believes is more likely than not. ***”

The 95% certainty appears to derive from 95% confidence intervals, although “confidence” is a technical term in statistics, and it most certainly does not mean the probability that the alternative hypothesis under consideration is true. Similarly, the probability that is less than 5% is not the probability that the null hypothesis is correct. The United States Court of Appeals for the District of Columbia Circuit thus fell for the rhetorical gambit, accepting the strawman that scientific certainty is 95%, whereas civil and administrative law certainty is a smidgeon above 50%.
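What “confidence” actually means as a technical term can be shown by simulation: it describes the long-run coverage of the interval-generating procedure over repeated experiments, not a 95% probability that any particular hypothesis is true. A minimal sketch, with arbitrary illustrative parameters:

```python
import random

def coverage_of_95pct_cis(n_experiments=2000, n=50, mu=10.0, sigma=2.0, seed=1):
    """Repeat an experiment many times; count how often the nominal
    95% confidence interval for the mean covers the true mean.
    "95% confidence" is a property of the procedure across repetitions,
    not a probability statement about any single interval."""
    random.seed(seed)
    covered = 0
    for _ in range(n_experiments):
        sample = [random.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n
        var = sum((x - mean) ** 2 for x in sample) / (n - 1)
        half_width = 1.96 * (var / n) ** 0.5  # normal-approximation interval
        if mean - half_width <= mu <= mean + half_width:
            covered += 1
    return covered / n_experiments

print(coverage_of_95pct_cis())  # close to 0.95
```

The simulation returns a coverage proportion near 0.95, which is all the “95%” ever promised; nothing in it quantifies the certainty of a scientific conclusion, let alone sets a burden of proof.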

We should not be too surprised that courts have erroneously described burdens of proof in the realm of science. Even within legal contexts, judges have a very difficult time articulating exactly how different verbal formulations of the burden of proof translate into probability statements. In one of his published decisions, Judge Jack Weinstein reported an informal survey of judges of the Eastern District of New York on what they believed were the correct quantifications of legal burdens of proof. The results confirm that judges, who must deal with burdens of proof first as lawyers and then as “umpires” on the bench, have no idea how to translate verbal formulations into mathematical quantities:

U.S. v. Fatico, 458 F.Supp. 388 (E.D.N.Y. 1978). Thus one judge believed that “clear, unequivocal and convincing” required a higher level of proof (90%) than “beyond a reasonable doubt,” and no judge placed “beyond a reasonable doubt” above 95%. A majority of the judges polled placed the criminal standard below 90%.

In running down Elliott, Resnik, and Cranor’s assertions about burdens of proof, all I could find was the commonplace error involved in moving from 95% confidence to 95% certainty. Otherwise, I found scientists declaring that the burden of proof should rest with the scientist who is making the novel causal claim. Carl Sagan famously declaimed, “extraordinary claims require extraordinary evidence[14],” but he appears never to have succumbed to the temptation to provide a quantification of the posterior probability that would cinch the claim.

If anyone has any evidence leading to support for Resnik’s claim, other than the transposition fallacy or the confusion between certainty and coefficient of statistical confidence, please share.



[1] The authors’ citation is to Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice (NY 2008). Professor Cranor teaches philosophy at one of the University of California campuses. He is neither a lawyer nor a scientist, but he does participate with some frequency as a consultant, and as an expert witness, in lawsuits, on behalf of claimants.

[2] See, e.g., In re Agent Orange Product Liab. Litig., 597 F. Supp. 740, 781 (E.D.N.Y. 1984) (Weinstein, J.) (“The distinction between avoidance of risk through regulation and compensation for injuries after the fact is a fundamental one.”), aff’d 818 F.2d 145 (2d Cir. 1987) (approving district court’s analysis), cert. denied sub nom. Pinkney v. Dow Chemical Co., 487 U.S. 1234 (1988).

[3] Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice (NY 2006).

[4] Carl F. Cranor, Regulating Toxic Substances: A Philosophy of Science and the Law at 33-34 (Oxford 1993) (“One can think of α, β (the chances of type I and type II errors, respectively), and 1 − β as measures of the ‘risk of error’ or ‘standards of proof’.”). See also id. at 44, 47, 55, 72-76.

[5] Id. (squaring 0.05 to arrive at “the chances of two such rare events occurring” as 0.0025).

[6] Michael D. Green, “Science Is to Law as the Burden of Proof is to Significance Testing: Book Review of Cranor, Regulating Toxic Substances: A Philosophy of Science and the Law,” 37 Jurimetrics J. 205 (1997) (taking Cranor to task for confusing significance and posterior (burden of proof) probabilities). At least one other reviewer was not as discerning as Professor Green and fell for Cranor’s fallacious analysis. Steven R. Weller, “Book Review: Regulating Toxic Substances: A Philosophy of Science and Law,” 6 Harv. J. L. & Tech. 435, 436, 437-38 (1993) (“only when the statistical evidence gathered from studies shows that it is more than ninety-five percent likely that a test substance causes cancer will the substance be characterized scientifically as carcinogenic … to determine legal causality, the plaintiff need only establish that the probability with which it is true that the substance in question causes cancer is at least fifty percent, rather than the ninety-five percent to prove scientific causality”).

[7] Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice 100 (2006) (incorrectly asserting, without further support, that “[t]he practice of setting α =.05 I call the “95% rule,” for researchers want to be 95% certain that when knowledge is gained [a study shows new results] and the null hypothesis is rejected, it is correctly rejected.”).

[8] Id. at 266.

[9] There were some scientists on the Commission’s Task Force, but most of the members were lawyers.

[10] Jan Beyea & Daniel Berger, “Scientific misconceptions among Daubert gatekeepers: the need for reform of expert review procedures,” 64 Law & Contemporary Problems 327, 328 (2001) (“In fact, Daubert, as interpreted by ‛logician’ judges, can amount to a super-Frye test requiring universal acceptance of the reasoning in an expert’s testimony. It also can, in effect, raise the burden of proof in science-dominated cases from the acceptable “more likely than not” standard to the nearly impossible burden of ‛beyond a reasonable doubt’.”).

[11] Lucinda M. Finley, “Guarding the Gate to the Courthouse: How Trial Judges Are Using Their Evidentiary Screening Role to Remake Tort Causation Rules,” 336 DePaul L. Rev. 335, 348 n. 49 (1999) (“Courts also require that the risk ratio in a study be ‘statistically significant,’ which is a statistical measurement of the likelihood that any detected association has occurred by chance, or is due to the exposure. Tests of statistical significance are intended to guard against what are called ‘Type I’ errors, or falsely ascribing a relationship when there in fact is not one (a false positive).” Finley erroneously ignores the conditioning of the significance probability on the null hypothesis, and she suggests that statistical significance is sufficient for ascribing causality); Erica Beecher-Monas, Evaluating Scientific Evidence: An Interdisciplinary Framework for Intellectual Due Process 42 n. 30, 61 (2007) (“Another way of explaining this is that it describes the probability that the procedure produced the observed effect by chance.”) (“Statistical significance is a statement about the frequency with which a particular finding is likely to arise by chance.”).

[12] Hodges v. Secretary Dep’t Health & Human Services, 9 F.3d 958, 967 (Fed. Cir. 1993) (Newman, J., dissenting) (citing and quoting from the Report of the Carnegie Commission on Science, Technology, and Government, Science and Technology in Judicial Decision Making 28 (1993)).

[13] Ethyl Corp. v. EPA, 541 F.2d 1, 28 n.58 (D.C. Cir.), cert. denied, 426 U.S. 941 (1976).

[14] Carl Sagan, Broca’s Brain: Reflections on the Romance of Science 93 (1979).

The Standard of Appellate Review for Rule 702 Decisions

November 12th, 2014

Back in the day, some Circuits of the United States Courts of Appeals embraced an asymmetric standard of review of district court decisions concerning the admissibility of expert witness opinion evidence. If the trial court’s decision was to exclude an expert witness, and that exclusion resulted in summary judgment, then the appellate court would take a “hard look” at the trial court’s decision. If the trial court admitted the expert witness’s opinions, and the case proceeded to trial, with the opponent of the challenged expert witness losing the verdict, then the appellate court would take a not-so-hard look at the trial court’s decision to admit the opinion. In re Paoli RR Yard PCB Litig., 35 F.3d 717, 750 (3d Cir. 1994) (Becker, J.), cert. denied, 115 S. Ct. 1253 (1995).

In Kumho Tire, the 11th Circuit followed this asymmetric approach, only to have the Supreme Court reverse and render. Unlike the appellate procedure followed in Daubert, the high Court took the extra step of applying the symmetrical standard of review, presumably for the didactic purpose of showing the 11th Circuit how to engage in appellate review. Carmichael v. Kumho Tire Co., 131 F.3d 1433 (11th Cir. 1997), rev’d sub nom. Kumho Tire Co. v. Carmichael, 526 U.S. 137, 158-59 (1999).

If anything is clear from the Kumho Tire decision, it is that courts do not have discretion to apply an asymmetric standard in evaluating a challenge, under Federal Rule of Evidence 702, to a proffered expert witness opinion. Justice Stephen Breyer, in his opinion for the Court in Kumho Tire, went on to articulate the requirement that trial courts must inquire whether an expert witness “employs in the courtroom the same level of intellectual rigor that characterizes the practice of an expert in the relevant field.” Kumho Tire Co. v. Carmichael, 526 U.S. 137, 152 (1999). Again, trial courts do not have the discretion to abandon this inquiry.

The “same intellectual rigor” test may have some ambiguities that make application difficult. For instance, identifying the “relevant” field or discipline may be contested. Physicians traditionally have not been trained in statistical analyses, yet they produce, and rely extensively upon, clinical research, the proper conduct and interpretation of which requires expertise in study design and data analysis. Is the relevant field biostatistics or internal medicine? Given that the validity and reliability of the relied upon studies come from biostatistics, courts need to acknowledge that the rigor test requires identification of the “appropriate” field — the field that produces the criteria or standards of validity and interpretation.

Justice Breyer did grant that trial courts must have some latitude in determining how to conduct their gatekeeping inquiries. Some cases may call for full-blown hearings and post-hearing proposed findings of fact and conclusions of law; some cases may be easily decided upon the moving papers. Justice Breyer’s grant of “latitude,” however, wanders off target:

“The trial court must have the same kind of latitude in deciding how to test an expert’s reliability, and to decide whether or when special briefing or other proceedings are needed to investigate reliability, as it enjoys when it decides whether that expert’s relevant testimony is reliable. Our opinion in Joiner makes clear that a court of appeals is to apply an abuse-of-discretion standard when it ‛review[s] a trial court’s decision to admit or exclude expert testimony’. 522 U. S. at 138-139. That standard applies as much to the trial court’s decisions about how to determine reliability as to its ultimate conclusion. Otherwise, the trial judge would lack the discretionary authority needed both to avoid unnecessary ‛reliability’ proceedings in ordinary cases where the reliability of an expert’s methods is properly taken for granted, and to require appropriate proceedings in the less usual or more complex cases where cause for questioning the expert’s reliability arises. Indeed, the Rules seek to avoid ‛unjustifiable expense and delay’ as part of their search for ‛truth’ and the ‛jus[t] determin[ation]’ of proceedings. Fed. Rule Evid. 102. Thus, whether Daubert ’s specific factors are, or are not, reasonable measures of reliability in a particular case is a matter that the law grants the trial judge broad latitude to determine. See Joiner, supra, at 143. And the Eleventh Circuit erred insofar as it held to the contrary.”

Kumho, 526 U.S. at 152-53.

Now the segue from discretion to fashion the procedural mechanism for gatekeeping review to discretion to fashion the substantive criteria or standards for determining “intellectual rigor in the relevant field” represents a rather abrupt shift. The leap from discretion to fashion procedure to discretion to fashion substantive criteria of validity has no basis in prior law, in linguistics, or in science. For instance, Justice Breyer would be hard pressed to uphold a trial court’s refusal to consider bias and confounding in assessing whether epidemiologic studies established causality in a given case, notwithstanding the careless language quoted above.

The troubling nature of Justice Breyer’s language did not go unnoticed at the time of the Kumho Tire case. Indeed, three of the Justices in Kumho Tire concurred to clarify:

“I join the opinion of the Court, which makes clear that the discretion it endorses—trial-court discretion in choosing the manner of testing expert reliability—is not discretion to abandon the gatekeeping function. I think it worth adding that it is not discretion to perform the function inadequately.”

Kumho Tire Co. v. Carmichael, 526 U.S. 137, 158-59 (1999) (Scalia, J., concurring, with O’Connor, J., and Thomas, J.).

Of course, this language from Kumho Tire really cannot be treated as binding after the statute interpreted, Rule 702, was modified in 2000. The judges of the inferior federal courts have struggled with Rule 702, sometimes more to evade its reach than to perform gatekeeping in an intelligent way. Quotations of passages from cases decided before the statute was amended and revised should be treated with skepticism.

Recently, the Sixth Circuit quoted Justice Breyer’s language about latitude from Kumho Tire, in the Circuit’s decision involving GE Healthcare’s radiographic contrast medium, Omniscan. Decker v. GE Healthcare Inc., 2014 U.S. App. LEXIS 20049, at *29 (6th Cir. Oct. 20, 2014). Although the Decker case is problematic in many ways, the defendant did not challenge general causation between gadolinium and nephrogenic systemic fibrosis, a painful, progressive connective tissue disease, which afflicted the plaintiff. It is unclear exactly what sort of latitude in applying the statute the Sixth Circuit was hoping to excuse.

Contrivance Standard Applied to Gatekeepers and Expert Witnesses

October 1st, 2014

In Rink v. Cheminova, Inc., 400 F.3d 1286 (11th Cir. 2005), the Eleventh Circuit articulated a “contrivance standard,” which suggested that a district court “may properly consider whether the expert’s methodology has been contrived to reach a particular result.” Id. at 1293 & n.7; see also “The Contrivance Standard for Expert Witness Gatekeeping” (Sept. 28, 2014).

Although this standard has some appeal, it raises questions of motive that can complicate the Rule 702 inquiry into whether a purported opinion is “knowledge.” A less psychoanalytic inquiry, one that looks to methodology rather than to the expert witness’s motivation, should generally be the first line of approach.

In the Zoloft MDL, the trial court banished Dr. Anick Bérard from federal court birth defect cases because of her unprincipled and inexplicable cherry picking of the data relied upon for her causation opinions. See In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., MDL No. 2342; 12-md-2342, 2014 U.S. Dist. LEXIS 87592; 2014 WL 2921648 (E.D. Pa. June 27, 2014) (Rufe, J.). The “contrivance” was objectively obvious, manifest in her double counting of data points and her disregard of point estimates contrary to the desired outcome, even in papers whose other point estimates she selectively embraced.

In the Chantix MDL, by contrast, the trial court found that the defendant had harped on methodological peccadilloes, but the court obviously did not like the beatific music. Cherry picking was going on, but it was perfectly acceptable to this MDL court:

“Why Dr. Kramer chose to include or exclude data from specific clinical trials is a matter for cross-examination, not exclusion under Daubert.”

In re Chantix (varenicline) Prods. Liab. Litig., 889 F. Supp. 2d 1272, 1288 (N.D. Ala. 2012) (MDL No. 2092) (permitting Dr. Shira Kramer to testify on causation despite her embracing a “weight of the evidence” method that turned largely on “subjective interpretations” of various, undescribed, non-prespecified lines of evidence).

The differing approaches to cherry picking are hard to reconcile other than to note that Chantix had drawn a “black box” warning from the FDA, and the SSRIs involved in Zoloft had not been given any heightened warning from the FDA, foreign agencies, or any professional society. FDA labeling, of course, should not have been determinative of the causation question. The mind of the gatekeeper, however, is inscrutable.