TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Rhetorical Strategy in Characterizing Scientific Burdens of Proof

November 15th, 2014

The recent opinion piece by Kevin Elliott and David Resnik exemplifies a rhetorical strategy that idealizes and elevates a burden of proof in science, and then declares it is different from legal and regulatory burdens of proof. Kevin C. Elliott and David B. Resnik, “Science, Policy, and the Transparency of Values,” 122 Envt’l Health Persp. 647 (2014) [Elliott & Resnik]. What is astonishing about this strategy is the lack of support for the claim that “science” imposes such a high burden of proof that we can safely ignore it when making “practical” legal or regulatory decisions. Here is how the authors state their claim:

“Very high standards of evidence are typically expected in order to infer causal relationships or to approve the marketing of new drugs. In other social contexts, such as tort law and chemical regulation, weaker standards of evidence are sometimes acceptable to protect the public (Cranor 2008).”

Id.[1] Remarkably, the authors cite no statute, no case law, and no legal treatise for the proposition that the tort law standard for causation is somehow lower than for a scientific claim of causality. Similarly, the authors cite no support for their claim that regulatory pronouncements are judged under a lower burden. One need only consider the burden a sponsor faces in establishing medication efficacy and safety in a New Drug Application before the Food and Drug Administration. Of course, when agencies engage in assessing causal claims regarding safety, they often act under regulations and guidances that lessen the burden of proof from what would be required in a tort action.[2]

And most important, Elliott and Resnik fail to cite to any work of scientists for the claim that scientists require a greater burden of proof before accepting a causal claim. When these authors’ claims of differential burdens of proof were challenged by a scientist, Dr. David Schwartz, in a letter to the editors, the authors insisted that they were correct, again citing to Carl Cranor, a non-lawyer, non-scientist:

“we caution against equating the standards of evidence expected in tort law with those expected in more traditional scientific contexts. The tort system requires only a preponderance of evidence (> 50% likelihood) to win a case; this is much weaker evidence than scientists typically demand when presenting or publishing results, and confusion about these differing standards has led to significant legal controversies (Cranor 2006).”

Reply to Dr. Schwartz. The only thing the authors added to the discussion was to cite to the same work by Carl Cranor[3], but change the date of the book.

Whence comes the assertion that science has a heavier burden of proof? Elliott and Resnik cite Cranor for their remarkable proposition, and so where did Cranor find support for the proposition at issue here? In his 1993 book, Cranor suggests that we “can think of type I and II error rates as ‘standards of proof,’” which begs the question whether they are appropriately used to assess significance or posterior probabilities[4]. Cranor goes so far in his 1993 book as to describe the usual level of alpha as the “95%” rule, and to assert that regulatory agencies require something akin to proof “beyond a reasonable doubt,” when they require two “statistically significant” studies[5]. Thus Cranor’s opinion has its origins in his commission of the transposition fallacy[6].
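The gap between a type I error rate and a posterior probability can be made concrete with a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the prior probabilities and power values are assumptions chosen for the example, not figures taken from Cranor or from any study. It applies Bayes’ theorem to show that the same alpha of 0.05 is compatible with very different probabilities that the null hypothesis is true, given a “significant” result.

```python
def posterior_null_given_significant(prior_null, alpha, power):
    """P(H0 | significant result), by Bayes' theorem.

    prior_null: assumed prior probability that the null hypothesis is true
    alpha:      P(significant | H0 true), the type I error rate
    power:      P(significant | H0 false), i.e., 1 - beta
    """
    p_sig_and_null = alpha * prior_null
    p_sig_and_alt = power * (1.0 - prior_null)
    return p_sig_and_null / (p_sig_and_null + p_sig_and_alt)


# Alpha is fixed at 0.05; only the assumed prior and power change.
for prior_null, power in [(0.5, 0.8), (0.9, 0.8), (0.9, 0.3)]:
    post = posterior_null_given_significant(prior_null, 0.05, power)
    print(f"prior P(H0)={prior_null:.2f}, power={power:.2f} "
          f"-> P(H0 | significant)={post:.3f}")
```

Under these assumed inputs, the posterior probability of the null hypothesis ranges from under 10% to well over 50%, although alpha never changes. The significance level, standing alone, fixes no “standard of proof” in the burden-of-proof sense.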

Cranor has persisted in his fallacious analysis in his later books. In his 2006 book, he erroneously equates the 95% coefficient of statistical confidence with 95% certainty of knowledge[7]. Later in the text, he asserts that agency regulations are written when supported by “beyond a reasonable doubt.”[8]

To be fair, it is possible to find regulators stating something close to what Cranor asserts, but only when they themselves are committing the transposition fallacy:

“Statistical significance is a mathematical determination of the confidence in the outcome of a test. The usual criterion for establishing statistical significance is the p-value (probability value). A statistically significant difference in results is generally indicated by p < 0.05, meaning there is less than a 5% probability that the toxic effects observed were due to chance and were not caused by the chemical. Another way of looking at it is that there is a 95% probability that the effect is real, i.e., the effect seen was the result of the chemical exposure.”

U.S. Dep’t of Labor, Guidance for Hazard Determination for Compliance with the OSHA Hazard Communication Standard (29 CFR § 1910.1200) Section V (July 6, 2007).

And it is similarly possible to find policy wonks expressing similar views. In 1993, the Carnegie Commission published a report in which it tried to explain away junk science as simply the discrepancy in burdens of proof between law and science, but its reasoning clearly points to the Commission’s commission of the transposition fallacy:

“The reality is that courts often decide cases not on the scientific merits, but on concepts such as burden of proof that operate differently in the legal and scientific realms. Scientists may misperceive these decisions as based on a misunderstanding of the science, when in actuality the decision may simply result from applying a different norm, one that, for the judiciary, is appropriate.  Much, for instance, has been written about ‘junk science’ in the courtroom. But judicial decisions that appear to be based on ‘bad’ science may actually reflect the reality that the law requires a burden of proof, or confidence level, other than the 95 percent confidence level that is often used by scientists to reject the possibility that chance alone accounted for observed differences.”

The Carnegie Commission on Science, Technology, and Government, Report on Science and Technology in Judicial Decision Making 28 (1993)[9].

Resnik and Cranor’s rhetoric is a commonplace in the courtroom. Here is how the rhetorical strategy plays out in the courtroom. Plaintiffs’ counsel elicits concessions from defense expert witnesses that they are using the “norms” and standards of science in presenting their opinions. Counsel then argue to the finder of fact that the defense experts are wonderful, but irrelevant because the fact finder must decide the case on a lower standard. This stratagem can be found supported by the writings of plaintiffs’ counsel and their expert witnesses[10]. The stratagem also shows up in the writings of law professors who are critical of the law’s embrace of scientific scruples in the courtroom[11].

The cacophony of error, from advocates and commentators, has led the courts into frequent error on the subject. Thus, Judge Pauline Newman, who sits on the United States Court of Appeals for the Federal Circuit, and who was a member of the Committee on the Development of the Third Edition of the Reference Manual on Scientific Evidence, wrote in one of her appellate opinions[12]:

“Scientists as well as judges must understand: ‘the reality that the law requires a burden of proof, or confidence level, other than the 95 percent confidence level that is often used by scientists to reject the possibility that chance alone accounted for observed differences’.”

Reaching back even further into the judiciary’s wrestling with the issue of the difference between legal and scientific standards of proof, we have one of the clearest and clearly incorrect statements of the matter[13]:

“Petitioners demand sole reliance on scientific facts, on evidence that reputable scientific techniques certify as certain. Typically, a scientist will not so certify evidence unless the probability of error, by standard statistical measurement, is less than 5%. That is, scientific fact is at least 95% certain.  Such certainty has never characterized the judicial or the administrative process. It may be that the ‘beyond a reasonable doubt’ standard of criminal law demands 95% certainty.  Cf. McGill v. United States, 121 U.S.App. D.C. 179, 185 n.6, 348 F.2d 791, 797 n.6 (1965). But the standard of ordinary civil litigation, a preponderance of the evidence, demands only 51% certainty. A jury may weigh conflicting evidence and certify as adjudicative (although not scientific) fact that which it believes is more likely than not. ***”

The 95% certainty appears to derive from 95% confidence intervals, although “confidence” is a technical term in statistics, and it most certainly does not mean the probability of the alternative hypothesis under consideration.  Similarly, the probability that is less than 5% is not the probability that the null hypothesis is correct. The United States Court of Appeals for the District of Columbia thus fell for the rhetorical gambit in accepting the strawman that scientific certainty is 95%, whereas civil and administrative law certainty is a smidgeon above 50%.
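What the “95%” in a 95% confidence interval refers to can be illustrated by simulation. The sketch below is a hedged illustration under assumed conditions (normally distributed data, a known standard deviation, and arbitrary values for the true mean, sample size, and number of trials; the function name is hypothetical). It shows that the 95% describes how often the interval-generating procedure captures the true value over many repetitions. It is not the probability that any particular hypothesis is true.

```python
import random
import statistics


def coverage_of_95_percent_intervals(true_mean=10.0, sigma=2.0, n=25,
                                     trials=10_000, seed=0):
    """Fraction of known-sigma 95% intervals that contain the true mean."""
    rng = random.Random(seed)              # seeded for reproducibility
    half_width = 1.96 * sigma / n ** 0.5   # z-interval half width
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        if xbar - half_width <= true_mean <= xbar + half_width:
            hits += 1
    return hits / trials


coverage = coverage_of_95_percent_intervals()
print(f"share of intervals covering the true mean: {coverage:.3f}")
```

The share of intervals that cover the true mean comes out close to 0.95, which is a statement about the long-run performance of the method. Any single interval either contains the true value or it does not, and the coefficient of confidence says nothing about the posterior probability of a causal claim.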

We should not be too surprised that courts have erroneously described burdens of proof in the realm of science. Even within legal contexts, judges have a very difficult time articulating exactly how different verbal formulations of the burden of proof translate into probability statements. In one of his published decisions, Judge Jack Weinstein reported an informal survey of judges of the Eastern District of New York, on what they believed were the correct quantizations of legal burdens of proof. The results confirm that judges, who must deal with burdens of proof as lawyers and then as “umpires” on the bench, have no idea of how to translate verbal formulations into mathematical quantities.

U.S. v. Fatico, 458 F.Supp. 388 (E.D.N.Y. 1978). Thus one judge believed that “clear, unequivocal and convincing” required a higher level of proof (90%) than “beyond a reasonable doubt,” and no judge placed “beyond a reasonable doubt” above 95%. A majority of the judges polled placed the criminal standard below 90%.

In running down Elliott, Resnik, and Cranor’s assertions about burdens of proof, all I could find was the commonplace error involved in moving from 95% confidence to 95% certainty. Otherwise, I found scientists declaring that the burden of proof should rest with the scientist who is making the novel causal claim. Carl Sagan famously declaimed, “extraordinary claims require extraordinary evidence[14],” but he appears never to have succumbed to the temptation to provide a quantification of the posterior probability that would cinch the claim.

If anyone has any evidence leading to support for Resnik’s claim, other than the transposition fallacy or the confusion between certainty and coefficient of statistical confidence, please share.


 

[1] The authors’ citation is to Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice (NY 2008). Professor Cranor teaches philosophy at one of the University of California campuses. He is neither a lawyer nor a scientist, but he does participate with some frequency as a consultant, and as an expert witness, in lawsuits, on behalf of claimants.

[2] See, e.g., In re Agent Orange Product Liab. Litig., 597 F. Supp. 740, 781 (E.D.N.Y. 1984) (Weinstein, J.) (“The distinction between avoidance of risk through regulation and compensation for injuries after the fact is a fundamental one.”), aff’d 818 F.2d 145 (2d Cir. 1987) (approving district court’s analysis), cert. denied sub nom. Pinkney v. Dow Chemical Co., 487 U.S. 1234 (1988).

[3] Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice (NY 2006).

[4] Carl F. Cranor, Regulating Toxic Substances: A Philosophy of Science and the Law at 33-34 (Oxford 1993) (One can think of α, β (the chances of type I and type II errors, respectively) and 1- β as measures of the “risk of error” or “standards of proof.”) See also id. at 44, 47, 55, 72-76.

[5] Id. (squaring 0.05 to arrive at “the chances of two such rare events occurring” as 0.0025).

[6] Michael D. Green, “Science Is to Law as the Burden of Proof is to Significance Testing: Book Review of Cranor, Regulating Toxic Substances: A Philosophy of Science and the Law,” 37 Jurimetrics J. 205 (1997) (taking Cranor to task for confusing significance and posterior (burden of proof) probabilities). At least one other reviewer was not as discerning as Professor Green and fell for Cranor’s fallacious analysis. Steven R. Weller, “Book Review: Regulating Toxic Substances: A Philosophy of Science and Law,” 6 Harv. J. L. & Tech. 435, 436, 437-38 (1993) (“only when the statistical evidence gathered from studies shows that it is more than ninety-five percent likely that a test substance causes cancer will the substance be characterized scientifically as carcinogenic … to determine legal causality, the plaintiff need only establish that the probability with which it is true that the substance in question causes cancer is at least fifty percent, rather than the ninety-five percent to prove scientific causality”).

[7] Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice 100 (2006) (incorrectly asserting, without further support, that “[t]he practice of setting α =.05 I call the “95% rule,” for researchers want to be 95% certain that when knowledge is gained [a study shows new results] and the null hypothesis is rejected, it is correctly rejected.”).

[8] Id. at 266.

[9] There were some scientists on the Commission’s Task Force, but most of the members were lawyers.

[10] Jan Beyea & Daniel Berger, “Scientific misconceptions among Daubert gatekeepers: the need for reform of expert review procedures,” 64 Law & Contemporary Problems 327, 328 (2001) (“In fact, Daubert, as interpreted by ‛logician’ judges, can amount to a super-Frye test requiring universal acceptance of the reasoning in an expert’s testimony. It also can, in effect, raise the burden of proof in science-dominated cases from the acceptable “more likely than not” standard to the nearly impossible burden of ‛beyond a reasonable doubt’.”).

[11] Lucinda M. Finley, “Guarding the Gate to the Courthouse: How Trial Judges Are Using Their Evidentiary Screening Role to Remake Tort Causation Rules,” 49 DePaul L. Rev. 335, 348 n. 49 (1999) (“Courts also require that the risk ratio in a study be ‘statistically significant,’ which is a statistical measurement of the likelihood that any detected association has occurred by chance, or is due to the exposure. Tests of statistical significance are intended to guard against what are called ‘Type I’ errors, or falsely ascribing a relationship when there in fact is not one (a false positive).” Finley erroneously ignores the conditioning of the significance probability on the null hypothesis, and she suggests that statistical significance is sufficient for ascribing causality); Erica Beecher-Monas, Evaluating Scientific Evidence: An Interdisciplinary Framework for Intellectual Due Process 42 n. 30, 61 (2007) (“Another way of explaining this is that it describes the probability that the procedure produced the observed effect by chance.”) (“Statistical significance is a statement about the frequency with which a particular finding is likely to arise by chance.”).

[12] Hodges v. Secretary Dep’t Health & Human Services, 9 F.3d 958, 967 (Fed. Cir. 1993) (Newman, J., dissenting) (citing and quoting from the Report of the Carnegie Commission on Science, Technology, and Government, Science and Technology in Judicial Decision Making 28 (1993)).

[13] Ethyl Corp. v. EPA, 541 F.2d 1, 28 n.58 (D.C. Cir.), cert. denied, 426 U.S. 941 (1976).

[14] Carl Sagan, Broca’s Brain: Reflections on the Romance of Science 93 (1979).

The Standard of Appellate Review for Rule 702 Decisions

November 12th, 2014

Back in the day, some Circuits of the United States Courts of Appeals embraced an asymmetric standard of review of district court decisions concerning the admissibility of expert witness opinion evidence. If the trial court’s decision was to exclude an expert witness, and that exclusion resulted in summary judgment, then the appellate court would take a “hard look” at the trial court’s decision. If the trial court admitted the expert witness’s opinions, and the case proceeded to trial, with the opponent of the challenged expert witness losing the verdict, then the appellate court would take a not-so “hard look” at the trial court’s decision to admit the opinion. In re Paoli RR Yard PCB Litig., 35 F.3d 717, 750 (3d Cir. 1994) (Becker, J.), cert. denied, 115 S.Ct. 1253 (1995).

In Kumho Tire, the 11th Circuit followed this asymmetric approach, only to have the Supreme Court reverse and render. Unlike the appellate procedure followed in Daubert, the high Court took the extra step of applying the symmetrical standard of review, presumably for the didactic purpose of showing the 11th Circuit how to engage in appellate review. Carmichael v. Kumho Tire Co., 131 F.3d 1433 (11th Cir. 1997), rev’d sub nom. Kumho Tire Co. v. Carmichael, 526 U.S. 137, 158-59 (1999).

If anything is clear from the Kumho Tire decision, it is that courts do not have discretion to apply an asymmetric standard to their evaluation of a challenge, under Federal Rule of Evidence 702, to a proffered expert witness opinion. Justice Stephen Breyer, in his opinion for the Court, in Kumho Tire, went on to articulate the requirement that trial courts must inquire whether an expert witness “employs in the courtroom the same level of intellectual rigor that characterizes the practice of an expert in the relevant field.” Kumho Tire Co. v. Carmichael, 526 U.S. 137, 152 (1999). Again, trial courts do not have the discretion to abandon this inquiry.

The “same intellectual rigor” test may have some ambiguities that make application difficult. For instance, identifying the “relevant” field or discipline may be contested. Physicians traditionally have not been trained in statistical analyses, yet they produce, and rely extensively upon, clinical research, the proper conduct and interpretation of which requires expertise in study design and data analysis. Is the relevant field biostatistics or internal medicine? Given that the validity and reliability of the relied upon studies come from biostatistics, courts need to acknowledge that the rigor test requires identification of the “appropriate” field — the field that produces the criteria or standards of validity and interpretation.

Justice Breyer did grant that trial courts must have some latitude in determining how to conduct their gatekeeping inquiries. Some cases may call for full-blown hearings and post-hearing proposed findings of fact and conclusions of law; some cases may be easily decided upon the moving papers. Justice Breyer’s grant of “latitude,” however, wanders off target:

“The trial court must have the same kind of latitude in deciding how to test an expert’s reliability, and to decide whether or when special briefing or other proceedings are needed to investigate reliability, as it enjoys when it decides whether that expert’s relevant testimony is reliable. Our opinion in Joiner makes clear that a court of appeals is to apply an abuse-of-discretion standard when it ‛review[s] a trial court’s decision to admit or exclude expert testimony’. 522 U. S. at 138-139. That standard applies as much to the trial court’s decisions about how to determine reliability as to its ultimate conclusion. Otherwise, the trial judge would lack the discretionary authority needed both to avoid unnecessary ‛reliability’ proceedings in ordinary cases where the reliability of an expert’s methods is properly taken for granted, and to require appropriate proceedings in the less usual or more complex cases where cause for questioning the expert’s reliability arises. Indeed, the Rules seek to avoid ‛unjustifiable expense and delay’ as part of their search for ‛truth’ and the ‛jus[t] determin[ation]’ of proceedings. Fed. Rule Evid. 102. Thus, whether Daubert ’s specific factors are, or are not, reasonable measures of reliability in a particular case is a matter that the law grants the trial judge broad latitude to determine. See Joiner, supra, at 143. And the Eleventh Circuit erred insofar as it held to the contrary.”

Kumho, 526 U.S. at 152-53.

Now the segue from discretion to fashion the procedural mechanism for gatekeeping review to discretion to fashion the substantive criteria or standards for determining “intellectual rigor in the relevant field” represents a rather abrupt shift. The leap from discretion to fashion procedure to discretion to fashion substantive criteria of validity has no basis in prior law, in linguistics, or in science. For instance, Justice Breyer would be hard pressed to uphold a trial court’s refusal to consider bias and confounding in assessing whether epidemiologic studies established causality in a given case, notwithstanding the careless language quoted above.

The troubling nature of Justice Breyer’s language did not go unnoticed at the time of the Kumho Tire case. Indeed, three of the Justices in Kumho Tire concurred to clarify:

“I join the opinion of the Court, which makes clear that the discretion it endorses—trial-court discretion in choosing the manner of testing expert reliability—is not discretion to abandon the gatekeeping function. I think it worth adding that it is not discretion to perform the function inadequately.”

Kumho Tire Co. v. Carmichael, 526 U.S. 137, 158-59 (1999) (Scalia, J., concurring, with O’Connor, J., and Thomas, J.)

Of course, this language from Kumho Tire really cannot be treated as binding after the statute interpreted, Rule 702, was modified in 2000. The judges of the inferior federal courts have struggled with Rule 702, sometimes more to evade its reach than to perform gatekeeping in an intelligent way. Quotations of passages from cases decided before the statute was amended and revised should be treated with skepticism.

Recently, the Sixth Circuit quoted Justice Breyer’s language about latitude from Kumho Tire, in the Circuit’s decision involving GE Healthcare’s radiographic contrast medium, Omniscan. Decker v. GE Healthcare Inc., 2014 U.S. App. LEXIS 20049, at *29 (6th Cir. Oct. 20, 2014). Although the Decker case is problematic in many ways, the defendant did not challenge general causation between gadolinium and nephrogenic systemic fibrosis, a painful, progressive connective tissue disease, which afflicted the plaintiff. It is unclear exactly what sort of latitude in applying the statute the Sixth Circuit was hoping to excuse.

Expert Witness Mining – Antic Proposals for Reform

November 4th, 2014

Law Reviews and Altered States of Reality

In 2008, Justice Breyer observed wryly that “there is evidence that law review articles have left terra firma to soar into outer space”; and Judge Posner has criticized law review articles for the “silly titles, the many opaque passages, the antic proposals, the rude polemics, [and] the myriad pretentious citations.” In 2010, Justice Scalia, who was a law-review-producing law professor for the University of Virginia for several years, responded to a lawyer’s oral argument, in McDonald v. City of Chicago, by suggesting that the argument had no support in Supreme Court precedent, but the unsupported argument would make the lawyer “the darling of the professoriate.” At the June 2011 Fourth Circuit Judicial Conference, Chief Justice Roberts opined that law reviews are generally not “particularly helpful for practitioners and judges.” In his words:

“Pick up a copy of any law review that you see and the first article is likely to be, you know, the influence of Immanuel Kant on evidentiary approaches in 18th-century Bulgaria, or something, which I’m sure was of great interest to the academic that wrote it, but isn’t of much help to the bar.”

See Debra Cassens Weiss, “Law Prof Responds After Chief Justice Roberts Disses Legal Scholarship,” Am. Bar Ass’n J. (July 07, 2011). Lawyers would think the Justices view law review scholarship as a useless but generally harmless activity. Sometimes, however, law review articles can actually be harmful.

Selection Effects in the Retention and Presentation of Expert Witnesses

The complaints about law review scholarship are obviously based upon extremes and travesties. Interestingly, Judge Posner himself has been no slacker when it comes to producing law review articles with “antic proposals.” See, e.g., Richard A. Posner, “An Economic Approach to the Law of Evidence,” 51 Stan. L. Rev. 1477, 1541–42 (1999). In the tradition of non-traditional, rationalist proposals that ignore experience and make up something completely untested, Judge Richard Posner has advocated rule changes that would require lawyers

“to disclose the name of all the experts whom they approached as possible witnesses before settling on the one testifying. This would alert the jury to the problem of ‘witness shopping’.”

Posner, 51 Stan. L. Rev. at 1541. The point of Judge Posner’s radical reform is to alert triers of fact to whether the expert witness testifying is the first, or the umpteenth expert witness interviewed before a suitable opinion had been “procured,” so that the fact finder can draw the “reasonable inference” that the case must be weaker than presented if the party went through so many expert witnesses before coming up with one who would testify in the case. If one party disclosed but one expert witness, the one that actually testified, and the other party disclosed X such witnesses (where X > 1), then the fact finder could find in favor of the first party upon the basis of the so-called reasonable inference.

Posner’s proposal is at best a proxy for accuracy and validity in expert witness opinion testimony, and one for which Posner presents no evidence to support his hoped-for improvement in juridical accuracy. Not only does Judge Posner present no evidence that his proposed reform and suggested inference would be in the least bit reasonable and probative of the truth, he fails to address the obvious incentives that would be created by his proposal. Fearing the prejudicial inference from having consulted with “too many” expert witnesses, lawyers, operating under the Posner Rule, would have strong incentives to go to the expert witness “one-stop-shopping” mall, where they know they can obtain expert witnesses guaranteed to align themselves with the needed litigation positions and claims. The Posner Rule would also give a strong advantage to lawyers more skilled in vetting and selecting expert witnesses, to the detriment of less experienced lawyers. Of course, lawyers who are willing to go shopping at the meretricious mall or to employ a “cleaner” who brokers the selection without footprints might escape the bite of the Posner adverse inference.

Posner’s proposed rule ignores what is at the heart of identifying and selecting expert witnesses to testify. Obviously lawyers must identify potential witnesses with suitable expertise to address the issues raised by the litigation. Database searches, such as PubMed and Google Scholar searches for bio-medical experts, can go a long way towards identifying candidates, but interviews are important as well. Posner would chill lawyers’ effective representation by placing an adverse inference upon their diligence in any contact with the person other than the “one” who will be anointed to be the party’s designated testifier.

Meetings and interviews with prospective expert witnesses are needed to ascertain whether the witness candidate has sufficient time and interest to fulfill the litigation assignment. Expertise in the area is hardly a guarantee that the candidate will be interested in answering the specific questions that are contested in the litigation. The lawyers must also ascertain whether the witness candidate has the stamina, patience, and aptitude for the litigation context. Not all real experts do, and the consequences of engaging an expert who does not have the qualities to make a good expert witness can be disastrous. Witness candidates must also be screened for their communication skills, their appearance, and even basic hygiene. The most brilliant expert who mumbles, or who is unkempt, is useless in litigation.

Lawyers must evaluate witness candidates for conflicts of interest, many of which are unknowable until there is a face-to-face meeting. Does the witness candidate have a significant other or child who works for the litigation industry (plaintiffs’ bar) or for the defendant industry under assault in the litigation at hand? Either way, the candidate may be compromised. Was the candidate mentored by an expert witness on the other side? Is the candidate on an editorial board with the adversary’s witnesses? Is the candidate a close personal friend of the adversaries or their witnesses, such that he will be less than enthusiastic in showing the infirmities of the other side’s positions? Any of these questions could lead to answers that practically disqualify a witness candidate from consideration. Proceeding without such vetting could be catastrophic for the client and counsel. Burdening the vetting process with the threat of an adverse inference is deeply unfair to diligent counsel trying to represent and serve their clients.

And there are yet additional considerations that require exploration with any witness candidate. Expert witnesses are not equally able to deal with adverse authority in the form of a noted scientist who has taken a stand on the litigation issue, or a superficially authoritative author who has published an adverse opinion. As well trained as they might be, some real experts are “sheep,” who are most comfortable following the herd, and not independent thinkers. Not all experts are willing or able to read studies as critically as needed for the litigation situation, which can sometimes be more demanding than the scientific arena. Lawyers charged with retaining expert witnesses must assess their clients’ positions and determine how well their expert witnesses will perform under all the circumstances of the case.

Professor Christopher Robertson proposes an even more radical reform of the law of expert witnesses by removing the selection and control of expert witnesses from parties and their counsel, completely. Robertson would somehow create a pool of expert witnesses on the issues in each case, and assign them to parties in a double-blinded randomized fashion. Christopher Tarver Robertson, “Blind Expertise,” 85 N.Y. Univ. L. Rev. 174, 211 (2010). Aside from depriving litigants of autonomy and control over their cases, this approach has even greater potential for generating false results. How do the expert witnesses come to be retained for this process? Any two expert witnesses may very well come to an incorrect analysis precisely because they do not have the benefit of each other’s report to develop the full range of data to be considered. What if the expert assigned to plaintiff concludes that there is no case, but the expert assigned to the defendant concludes that the plaintiff’s case is meritorious? Normally, plaintiffs’ expert witnesses must file their reports in advance of the defense witnesses, who then have the opportunity to rebut but also the benefit of all the data included. Simultaneous reports risk major omissions of data to be considered on both sides. The adversarial cauldron works to ensure completeness in what data and studies are considered.

Now comes Jonah Gelbach to attempt a probabilistic, theoretical defense of reforms in the Posner-Robertson mold. Jonah B. Gelbach, “Expert Mining and Required Disclosure,” 81 U. Chicago L. Rev. 131 (2014). Professor Gelbach is a well-trained economist, and a recently minted lawyer (Yale 2013), who is now an Associate Professor at the University of Pennsylvania Law School. Gelbach’s experience with the practice of law is limited to working as a law-school intern at David Rosen & Associates, in New Haven, Connecticut, before joining the Penn faculty. His proposals may need to be taken with 100 grains of aspirin.

Although Gelbach disagrees with particulars of the Posner-Robertson proposals, Gelbach joins with them to opine that “[t]o the extent that additional fully disclosed expert testimony increases the fact finder’s information, we can expect a beneficial increase in accuracy.” Gelbach at 133. Gelbach’s dictum, however, is an ipse dixit, and he offers only a limited hypothetical case in which full disclosure of data should be required to solve the problem. And even in his hypothetical case, the disclosure of the identities of the testers is unnecessary to correct the error that Gelbach predicts. Gelbach’s call for the disclosure of consulting expert witnesses introduces only a collateral issue that has nothing to do with the accuracy of the scientific reasoning.

Gelbach analogizes “witness shopping” to data dredging and multiple testing, with a known inflation in the rate of false positive outcomes. If a party directs multiple expert witnesses to conduct single outcome measurements or tests, then that party can recreate the results of multiple testing without having to disclose the number of independent tests. Gelbach’s argument is at its strongest for a simplistic model of a simple measurement, with errors normally distributed, with accuracy of the measurement tied to the outcome of the case. Gelbach at 136. Gelbach analogizes expert witness mining with data mining, and goes so far as to provide a calculation of false positive rates from multiple testing.
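The arithmetic behind that inflation can be sketched in a few lines. What follows is a minimal illustration, not Gelbach’s own calculation: it assumes that the tests are independent and that each is run at the same significance level. The function name and the 5% level are assumptions chosen for illustration.

```python
def familywise_false_positive_rate(alpha: float, n_tests: int) -> float:
    """Probability that at least one of n_tests independent tests,
    each run at significance level alpha, yields a false positive
    when every null hypothesis is true (illustrative assumption)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    # The chance that all n_tests avoid a false positive is (1 - alpha)^n.
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    # Hypothetical numbers of undisclosed tests, each at the 5% level.
    for k in (1, 2, 5, 10, 20):
        rate = familywise_false_positive_rate(0.05, k)
        print(f"{k:2d} tests: chance of at least one false positive = {rate:.3f}")
```

The point of the sketch is only that the nominal error rate of a single test understates the error rate of a search through many tests when the number of tests goes undisclosed.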

The sort of multiple testing Gelbach condemns is even more obvious when something other than random error is involved. Consider the need of litigants to have chest radiographs interpreted for the presence or absence of a pneumoconiosis in occupational dust disease litigation. Not only is there intra-observer variability, there are potential or known subjective biases in radiograph interpretations. Gelbach need not worry about multiple testing because the need for economic efficiency already encourages many lawyers to employ radiologists who are most biased in favor of their clients’ positions. The bigger problem would be to encourage lawyers to obtain an honest second opinion, which might make them less strident about their litigation positions when discussing possible settlement.

Gelbach appears to believe that mandatory disclosure of the number of expert witnesses hired as well as the contents of the written and oral reports issued by the party’s nontestifying expert witnesses is needed to abate the potential harm from “expert mining.” By introducing the probabilistic modeling of Type I and Type II errors, however, Gelbach elevates proofiness over clear thinking about the issue. The simple solution to Gelbach’s soil measurement hypothetical is to require disclosure of all testing data, regardless whether conducted by expert witnesses designated as testifying or as consulting. All are agents of the party for purposes of creating data in the form of the hypothesized soil measurement. Indeed, Gelbach’s hypothetical envisions a technical laboratory that conducts such measurements, and the lab might not even be associated with a person designated to serve as an expert witness on the litigation issues.

Gelbach’s soil-measurement case is thus, for the most part, a straw-person case. In the vast majority of cases, however, interviewing multiple expert witness candidates before selection and retention is not at all like multiple testing in its ability to generate deliberately false positive or false negative opinions. The evidence remains what it is, and the parameter unchanged, whatever the qualitative judgments of the witness candidates. In most litigation contexts, the data upon which the expert witnesses will rely come from published studies, and not from a single measurement that either side can control and resample many times through the agency of multiple expert witnesses. The Rules need to help the triers of fact discern the truth, not irrelevant proxies for the truth. If the triers of fact are incompetent to adjudge the actual evidence, then we may need to find triers who are competent.

The extension of the soil hypothetical to all of expert witness opinion testimony is unwarranted. Accuracy and validity of expert opinion is not “independent and identically distributed.” Truth and accuracy in scientific judgment as applied to litigation scientific questions are not random variables with known distributions.

A party may have to comb through dozens of potential expert witnesses before arriving at an expert witness with an appropriate, accurate answer to the litigation issue. When confronted with a pamphlet entitled “100 Authors against Einstein,” Albert Einstein quipped “if I were wrong, one would have been enough.”  See Remigio Russo, 18 Mathematical Problems in Elasticity 125 (1996) (quoting Einstein). Legal counsel should not have their clients’ cause compromised because they had the misfortune of consulting the “100 Authors” before arriving at Einstein’s door. The Posner-Robertson-Gelbach proposals all suffer the same flaw: they defer unduly to conformism and ignore the truth, validity, and accuracy of procured opinions.

Disputes in science are resolved with data, from high-quality, reproducible experimental or observational studies, not with appeals to the number of speakers. The number of expert witness candidates who were interviewed or who offered preliminary opinions is irrelevant to the task assigned to the finder of fact in a case involving scientific evidence. The final, proffered opinion of the testifying expert witness is only as good as the evidence and analysis upon which it rests, which under the current rules, should be fully disclosed.

Transparency, Confusion, and Obscurantism

October 31st, 2014

In “NIEHS Transparency? We Can See Right Through You” (July 10, 2014), I chastised authors Kevin C. Elliott and David B. Resnik for their confusing and confused arguments about standards of proof, the definition of risk, and conflicts of interest (COIs). See Kevin C. Elliott and David B. Resnik, “Science, Policy, and the Transparency of Values,” 122 Envt’l Health Persp. 647 (2014) [Elliott & Resnik]. In their focus on environmentalism and environmental policy, Elliott and Resnik seem intent upon substituting various presumptions, leaps of faith, and unproven extrapolations for actual evidence and valid inference, in the hope of improving the environment and reducing risk to life. But to get to their goal, Elliott and Resnik engage in various equivocations and ambiguities in their use of “risk,” and they compound the muddle by introducing a sliding scale of “standards of evidence,” for legal, regulatory, and scientific conclusions.

Dr. David H. Schwartz is a scientist, who received his doctoral degree in Neuroscience from Princeton University, and his postdoctoral training in Neuropharmacology and Neurophysiology at the Center for Molecular and Behavioral Neuroscience, at Rutgers University. Dr. Schwartz has since gone on to found one of the leading scientific consulting firms, Innovative Science Solutions (ISS), which supports both regulatory and litigation claims and defenses, as may be scientifically appropriate. Given his experience, Dr. Schwartz is well positioned to address the standards of scientific evidentiary conclusions across regulatory, litigation, and scientific communities.

In this month’s issue of Environmental Health Perspectives (EHP), Dr. David Schwartz adds to the criticism of Elliott and Resnik’s tendentious editorial. David H. Schwartz, “Policy and the Transparency of Values in Science,” 122 Envt’l Health Persp. A291 (2014). Schwartz points out that “[a]lthough … different venues or contexts require different standards of evidence, it is important to emphasize that the actual scientific evidence remains constant.” Id.

Dr. Schwartz points out that transparency is needed in how standards and evidence are represented in scientific and legal discourse, and he takes Elliott and Resnik to task for arguing, from ignorance, that litigation burdens are different from scientific standards. At times some writers misrepresent the nature of their evidence, or its weakness, and when challenged, attempt to excuse the laxness in standards by adverting to the regulatory or litigation contexts in which they are speaking. In some regulatory contexts, the burdens of proof are deliberately reduced, or shifted to the regulated industry. In litigation, the standard or burden of proof is rarely different from that of the scientific enterprise itself. As the United States Supreme Court made clear, trial courts must inquire whether an expert witness “employs in the courtroom the same level of intellectual rigor that characterizes the practice of an expert in the relevant field.” Kumho Tire Co. v. Carmichael, 526 U.S. 137, 152 (1999). Expert witnesses who fail to exercise the same intellectual rigor in the courtroom as in the laboratory are eminently disposable or excludable from the legal process.

Schwartz also points out, as I had in my blog post, that “[w]hen using science to inform policy, transparency is critical. However, this transparency should include not only financial ties to industry but also ties to advocacy organizations and other strongly held points of view.”

In their Reply to Dr. Schwartz, Elliott and Resnik concede the importance of non-financial conflicts of interest, but they dig in on the supposedly lower standard for scientific claims in tort litigation:

“we caution against equating the standards of evidence expected in tort law with those expected in more traditional scientific contexts. The tort system requires only a preponderance of evidence (> 50% likelihood) to win a case; this is much weaker evidence than scientists typically demand when presenting or publishing results, and confusion about these differing standards has led to significant legal controversies (Cranor 2006).”

Rather than citing any pertinent or persuasive legal authority, Elliott and Resnik cite an expert witness, Carl Cranor, neither a lawyer nor a scientist, who has worked steadfastly for the litigation industry (the plaintiffs’ bar) on various matters. The “caution” of Elliott and Resnik is directly contradicted by the Supreme Court’s pronouncement in Kumho Tire, and is fueled by an ignoratio elenchi that is based upon a confusion between the coverage probability of confidence intervals in repeated sampling (usually 95%) and the posterior probability of a claim: namely, the probability of the claim given the admissible evidence. As the Reference Manual on Scientific Evidence makes clear, these are very different probabilities, which Cranor and others have consistently confused. Elliott and Resnik ought to know better.
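A small simulation may help show why a confidence coefficient is not a posterior probability. The sketch below is only an illustration under stated assumptions: a normal model with a known standard deviation, and hypothetical values for the true mean, sample size, and number of simulated studies. None of these figures comes from Elliott and Resnik, Cranor, or the Reference Manual.

```python
import math
import random
from statistics import NormalDist


def coverage_of_interval(true_mean: float, sigma: float, n: int,
                         level: float, trials: int, seed: int = 0) -> float:
    """Fraction of simulated studies whose confidence interval for the
    mean contains the (fixed) true mean. All inputs are hypothetical."""
    rng = random.Random(seed)
    # Two-sided critical value for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        # The interval is sample_mean +/- half_width; check whether it covers.
        if abs(sample_mean - true_mean) <= half_width:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    share = coverage_of_interval(true_mean=0.0, sigma=1.0, n=30,
                                 level=0.95, trials=5000)
    print(f"share of intervals covering the true mean: {share:.3f}")
```

The 95% describes how often the interval-generating procedure captures a fixed parameter over many repetitions. In the simulation the truth of the claim never varies; only the samples do. A posterior probability of a claim, by contrast, requires a prior and a likelihood, neither of which appears anywhere in the calculation of a confidence interval.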

Frank Advocacy on Welding Health “Effects”

September 1st, 2014

Arthur L. Frank is a professor and the chair of Drexel University School of Public Health’s program in Environmental and Occupational Health. Frank testifies fairly extensively for plaintiffs in asbestos litigation. See, e.g., Frank Affidavit. He also has testified for plaintiffs in manganese fume litigation brought by welders, although he has no specialty training in movement disorder neurology.

Although Arthur Frank has testified for plaintiffs in asbestos cases, he does not appear to comply with professional associations’ requirements for disclosure of conflicts when he presents on asbestos issues. See, e.g., “More Hypocrisy Over Conflicts of Interest” (Dec. 4, 2010). The same disregard for conflict disclosures seems to hold for his work on welding health outcomes.

Last year, in September 2013, Frank presented a poster at the Inhaled Particles XI (IPXI) conference, organized by the British Occupational Hygiene Society (BOHS). Frank’s poster was entitled “Health Effects of Welding Fumes,” although the poster reported only a cross-sectional study without controls from Qingdao City, China. Frank provided no conflict-of-interest disclosure in the poster. Arthur Frank, Huanqiang Zhang, and Chunsheng Xu, “Health Effects of Welding Fumes,” Inhaled Particle XI Conference, Nottingham, U.K. (Sept. 2013). His current C.V., available online, reports this as an abstract from the Inhaled Articles conference. One does not need an epidemiologic study to see how scientists could choke on this article if inhaled.

Frank’s study was based upon an anonymous questionnaire to 505 steel welders, at state-owned, foreign-funded, and privately owned companies, in Qingdao City, China. No industrial hygiene exposure measurements or use of personal respiratory protection is reported. No physical examinations are reported. No statistical tests are reported. Frank reported a “symptom” of unsteady gait/difficulty walking of 1%, but he and his Chinese colleagues did not report whether this 1% had musculo-skeletal or neurological problems. The former would be fairly common among tradesmen such as welders whose work often puts them at risk of traumatic injury.

According to Frank, a “striking” finding was an 18% prevalence of hand tremor among those working at least 15 years, with lower prevalence reported for less than 5 years (4%), at 5 years (3%), and at 10 years (5%). Frank offered no explanation of how these prevalence rates were ascertained in a cross-sectional, questionnaire-based study; nor did Frank offer any qualification as to whether these hand tremors were rest or action tremor, bilateral or unilateral, or of any particular kind. Notwithstanding the severe limitations of this “study,” Frank offers a conclusion that “[m]anganese from inhaled particles from welding fumes cause serious outcomes in welders. Welding fumes also cause other health effects. Workplace hygiene correlates with health outcome.”

Shortly after the Inhaled Particles conference, a colleague reported that Frank had testified in an asbestos case that he had submitted a “manganese in welding” manuscript to the Annals of Industrial Hygiene, which is the journal of the BOHS. To date, no article by Frank has appeared in this, or any, journal.

Perhaps the only interesting aspect of this little cross-sectional study, based upon self-reported symptoms, is that workers employed by the communist state report a higher rate of symptoms than those employed by privately owned companies. More evidence that worker illness, if any there should be, is not necessarily the result of the “profit motive” of private corporations. The abstract also shows that at some scientific conferences, anything goes with respect to conflicts-of-interest disclosures and shameless advocacy.

Peer Review, PubPeer, PubChase, and Rule 702 – Candles in the Ear

August 28th, 2014

In deciding the Daubert case, the Supreme Court identified several factors to assess whether “the reasoning or methodology underlying the testimony is scientifically valid and of whether that reasoning or methodology properly can be applied to the facts in issue.” One of those factors was whether the proffered opinion had been “peer reviewed” and published. Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579, 593-94 (1993). The Court explained the publication factor:

“Another pertinent consideration is whether the theory or technique has been subjected to peer review and publication. Publication (which is but one element of peer review) is not a sine qua non of admissibility; it does not necessarily correlate with reliability, and in some instances well-grounded but innovative theories will not have been published. Some propositions, moreover, are too particular, too new, or of too limited interest to be published. But submission to the scrutiny of the scientific community is a component of ‘good science,’ in part because it increases the likelihood that substantive flaws in methodology will be detected. The fact of publication (or lack thereof) in a peer reviewed journal thus will be a relevant, though not dispositive, consideration in assessing the scientific validity of a particular technique or methodology on which an opinion is premised.”

Daubert, 509 U.S. at 593-94 (internal citations omitted). See, e.g., Lust v. Merrell Dow Pharms., Inc., 89 F.3d 594, 597 (9th Cir. 1996) (affirming exclusion of Dr. Alan Done, plaintiffs’ expert witness in Clomid birth defects case, in part because of the lack of peer review and publication of his litigation-driven opinions); Hall v. Baxter Healthcare Corp., 947 F. Supp. 1387, 1406 (D. Or. 1996) (noting that “the lack of peer review for [epidemiologist] Dr. Swan’s theories weighs heavily against the admissibility of Dr. Swan’s testimony”).

Case law since Daubert has made clear that peer review is neither necessary nor sufficient for the admissibility of an opinion. United States v. Mikos, 539 F.3d 706, 711 (7th Cir. 2008) (noting that the absence of peer-reviewed studies on subject of bullet grooving did not render opinion, based upon FBI database, inadmissible); In re Zoloft Prods. Liab. Litig., MDL No. 2342, 12-md-2342, 2014 U.S. Dist. LEXIS 87592, 2014 WL 2921648 (E.D. Pa. June 27, 2014) (excluding proffered testimony of epidemiologist Anick Bérard for arbitrarily selecting some point estimates and ignoring others in published studies).

As Susan Haack has noted, “peer review” has taken on mythic proportions in the adjudication of expert witness opinion admissibility. Susan Haack, “Peer Review and Publication: Lessons for Lawyers,” 36 Stetson L. Rev. 789 (2007), republished in Susan Haack, Evidence Matters: Science, Proof, and Truth in the Law 156 (2014). Peer review, at best, is a weak proxy for study validity, which is what is really needed in judicial proceedings. Proxies avoid the labor of independent, original thought, and so they are much favored by many judges.

In the past, some litigants oversold peer review as a touchstone of reliable, admissible expert witness testimony only to find that some very shoddy opinions show up in ostensibly peer-reviewed journals. See “Misplaced Reliance On Peer Review to Separate Valid Science From Nonsense” (Aug. 14, 2011). Scientists often claim that science is “self-correcting,” but in some areas of research, there are few severe tests and little critical review, and mostly glib confirmations from acolytes.

Letters to the editor are sometimes held out as a remedy to peer-review screw ups, but such letters, which are not themselves peer reviewed, are subject to the whims of imperious editors who might wish to silence the views of those who would be critical of their judgment in publishing the article under discussion. Most journals have space only for a few letters, and unpopular but salient points of view can go unreported. Many scientists will not write letters to the editors, even when the published article is terribly wrong in its methods, data analyses, conclusions, or discussion. Letters to the editor are often frowned upon in academic circles as not advancing an affirmative research and scholarship agenda.

Letters to the editor often must be sent within a short time window of initial publication, often too short for busy academics to analyze a paper carefully and comment. Furthermore, letters are often limited to a few hundred words, which length is often inadequate to develop a careful critique or exposition of the issues in the paper. Moreover, such letters suffer from an additional procedural problem: authors are permitted a response, and the letter writers are not permitted a reply. Authors thus get the last word, which they can often use to deflect or defuse important criticisms. The authors’ response can be sufficiently self-serving and misleading, with immunity from further criticism, that many would-be correspondents abandon the project altogether. See, e.g., PubPeer – “Example case showing why letters to the editor can be a waste of time” (Oct. 8, 2013).

Websites and blogs provide for dynamic content, with the potential for critical reviews that can be identified by search engines. See, e.g., Paul S. Brookes, “Our broken academic journal corrections system,” PSBLAB: Cardiac Mitochondrial Research in the Lab (Jan. 14, 2014). Mostly, the internet holds untapped potential for analysis, discussion, and debate on published studies.  To be sure, some journals provide “comment fields,” on their websites, with an opportunity for open discussion.  Often, full critiques must be developed and presented elsewhere. See, e.g., Androgen Study Group, “Letter to JAMA Asking for Retraction of Misleading Article on Testosterone Therapy” (Mar. 25, 2014).

PubPeer

Kate Yandell, in The Scientist, reports on the creation of PubPeer a few years ago, as a forum for post-publication review and discussion of published scientific papers. Kate Yandell, “Concerns Raised Online Linger” (Aug. 25, 2014). Billing itself as an “online journal club,” PubPeer has pointed out potentially serious problems, some of which have led to retractions and corrections. Another internet site of interest is PubChase, which monitors discussion of particular articles, as well as generating email alerts and recommendations for related articles.

One journal editor has taken notice and given notice that he will not pay attention to post-publication peer review.  Eric J. Murphy, the editor in chief of Lipids, posting a comment at PubPeer, illustrates that there will be a good deal of resistance to post-publication open peer review, out of the control of journal editors:

“As an Editor-in-Chief of a society journal, I have never examined PubPeer nor will I do so. First, there is the crowd or group mentality that may over emphasize some point in an irrational manner.  Just as using the marble theory of officiating is bad, one should never base a decision on the quantity of negative or positive comments. Second, if the concerned individual sent an e-mail or letter to me, then I would be duty bound to examine the issue.  It is not my duty to monitor PubPeer or any other such site, but rather to respond to queries sent to me.  So, with regards to Hugh’s point, I don’t support that position at all.

Mistakes happen, although frankly we try to limit these mistakes and do take steps to prevent publishing papers with FFP, it does happen.  Also, honest mistakes happen in science all the time, so[me] of these result in an erratum, while others go unnoticed by editors and reviewers.  In such a case, someone who does notice should contact the editor to put them on notice regarding the issue so that it may be resolved.  Resolution does not necessarily mean correction, but rather the editor taking a close look at the situation, discussing the situation with the original authors, and then reaching a decision.  Most of the time a correction will be made, but not always.”

Murphy’s comments are remarkable. PubPeer provides a forum for post-publication comment, but it hardly requires editors, investigators, and consumers of scientific studies to evaluate published works by “nose counts” of favorable and unfavorable comments. This is not, and never has been, a democratic enterprise. Somehow, we might expect Murphy and others to evaluate the comments, on the merits, not on their prevalence. Murphy’s declaration that he is duty-bound to investigate and evaluate letters or emails sent to him about published articles is encouraging, but the editors’ ability to ratify publication, in the face of a private communication, without comment to the scientific community, strips the community of the ability to make a principled decision on its own. Murphy’s way, which seems largely the way of contemporary scientific publishing, ignores the important social dimension of scientific debate and resolution of issues. Leaving control of the discussion in the hands of the editors who approved and published studies may be asking too much of editors. Nemo iudex in causa sua.

PubPeer has already tested the limits of free speech. Kate Yandell, “PubPeer Threatened with Legal Action” (Aug. 19, 2014). A scientist whose works were receiving unfavorable attention on PubPeer threatened a lawsuit. Let’s hope that scientists can learn to be sufficiently thick skinned that there can be open discourse on the merits of their research, their data, and their conclusions.

Pritchard v. Dow Agro – Gatekeeping Exemplified

August 25th, 2014

Robert T. Pritchard was diagnosed with Non-Hodgkin’s Lymphoma (NHL) in August 2005; by fall 2005, his cancer was in remission. Mr. Pritchard had been a pesticide applicator, and so, of course, he and his wife sued the deepest pockets around, including Dow Agro Sciences, the manufacturer of Dursban. Pritchard v. Dow Agro Sciences, 705 F.Supp. 2d 471 (W.D.Pa. 2010).

The principal active ingredient of Dursban is chlorpyrifos, along with some solvents, such as xylene, cumene, and ethyltoluene. Id. at 474. Dursban was licensed for household insecticide use until 2000, when the EPA phased out certain residential applications. The EPA’s concern, however, was not carcinogenicity: the EPA categorizes chlorpyrifos as “Group E,” non-carcinogenic in humans. Id. at 474-75.

According to the American Cancer Society (ACS), the cause or causes of NHL cases are unknown.  Over 60,000 new cases are diagnosed annually, in people from all walks of life, occupations, and lifestyles. The ACS identifies some risk factors, such as age, gender, race, and ethnicity, but the ACS emphasizes that chemical exposures are not proven risk factors or causes of NHL.  See Pritchard, 705 F.Supp. 2d at 474.

The litigation industry does not need scientific conclusions of causal connections; their business is manufacturing certainty in courtrooms. Or at least, the appearance of certainty. The Pritchards found their way to the litigation industry in Pittsburgh, Pennsylvania, in the form of Goldberg, Persky & White, P.C. The Goldberg Persky firm sued Dow Agro, and then put the Pritchards in touch with Dr. Bennet Omalu, to serve as their expert witness.  A lawsuit ensued.

Alas, the Pritchards’ lawsuit ran into a wall, or at least a gate, in the form of Federal Rule of Evidence 702. In the capable hands of Judge Nora Barry Fischer, Rule 702 became an effective barrier against weak and poorly considered expert witness opinion testimony.

Dr. Omalu, no stranger to lost causes, was the medical examiner of San Joaquin County, California, at the time of his engagement in the Pritchard case. After careful consideration of the Pritchards’ claims, Omalu prepared a four-page report, with a single citation, to Harrison’s Principles of Internal Medicine. Id. at 477 & n.6. This research, however, sufficed for Omalu to conclude that Dursban caused Mr. Pritchard to develop NHL, as well as a host of ailments for which he had never even sued Dow Agro, including “neuropathy, fatigue, bipolar disorder, tremors, difficulty concentrating and liver disorder.” Id. at 478. Dr. Omalu did not cite or reference any studies, in his report, to support his opinion that Dursban caused Mr. Pritchard’s ailments. Id. at 480.

After defense counsel objected to Omalu’s report, plaintiffs’ counsel supplemented the report with some published articles, including the “Lee” study. See Won Jin Lee, Aaron Blair, Jane A. Hoppin, Jay H. Lubin, Jennifer A. Rusiecki, Dale P. Sandler, Mustafa Dosemeci, and Michael C. R. Alavanja, “Cancer Incidence Among Pesticide Applicators Exposed to Chlorpyrifos in the Agricultural Health Study,” 96 J. Nat’l Cancer Inst. 1781 (2004) [cited as Lee]. At his deposition, and in opposition to defendants’ 702 motion, Omalu became more forthcoming with actual data and argument. According to Omalu, “the 2004 Lee Study strongly supports a conclusion that high-level exposure to chlorpyrifos is associated with an increased risk of NHL.” Id. at 480.

This opinion put forward by Omalu bordered on scientific malpractice. No; it was malpractice. The Lee study looked at many different cancer end points, without adjustment for multiple comparisons. The lack of adjustment means at the very least that any interpretation of p-values or confidence intervals would have to be modified to acknowledge the higher rate of random error. Now for NHL, the overall relative risk (RR) for chlorpyrifos exposure was 1.03, with a 95% confidence interval, 0.62 to 1.70. Lee at 1783. In other words, the study that Omalu claimed supported his opinion was about as null a study as can be, with a reasonably tight confidence interval that made a doubling of the risk rather unlikely given the sample RR.
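How null the overall NHL result was can be illustrated by backing out an approximate p-value from the reported interval. The sketch below is a rough illustration only; it assumes that the 95% interval was computed as a symmetric, normal-approximation interval on the log relative-risk scale, which the text here does not confirm, and the function names are hypothetical. It makes no adjustment for the multiple end points in the study.

```python
import math
from statistics import NormalDist


def log_rr_standard_error(lower: float, upper: float,
                          level: float = 0.95) -> float:
    """Approximate standard error of log(RR), recovered from the width
    of a reported interval (assumes a normal approximation on the log scale)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def two_sided_p_value(rr: float, lower: float, upper: float,
                      level: float = 0.95) -> float:
    """Approximate two-sided p-value for the null hypothesis RR = 1."""
    se = log_rr_standard_error(lower, upper, level)
    z_score = math.log(rr) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z_score)))


if __name__ == "__main__":
    # Overall NHL result as quoted above: RR 1.03, 95% CI 0.62 to 1.70.
    p = two_sided_p_value(1.03, 0.62, 1.70)
    print(f"approximate two-sided p-value: {p:.2f}")
```

Under these assumptions the p-value comes out far above any conventional threshold for significance, which is what one would expect from a point estimate sitting almost exactly at 1.0.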

If the multiple endpoint testing were not sufficient to dissuade a scientist, intent on supporting the Pritchards’ claims, then the exposure subgroup analysis would have scared any prudent scientist away from supporting the plaintiffs’ claims. The Lee study authors provided two different exposure-response analyses, one with lifetime exposure and the other with an intensity-weighted exposure, both in quartiles. Neither analysis revealed an exposure-response trend. For the lifetime exposure-response trend, the Lee study reported an NHL RR of 1.01, for the highest quartile of chlorpyrifos exposure. For the intensity-weighted analysis, for the highest quartile, the authors reported RR = 1.61, with a 95% confidence interval, 0.74 to 3.53.

Although the defense and the district court did not call out Omalu on his fantasy statistical inference, the district judge certainly appreciated that Omalu had no statistically significant associations between chlorpyrifos and NHL, to support his opinion. Given the weakness of relying upon a single epidemiologic study (and torturing the data therein), the district court believed that a showing of statistical significance was important to give some credibility to Omalu’s claims. 705 F.Supp. 2d at 486 (citing General Elec. Co. v. Joiner, 522 U.S. 136, 144-46 (1997); Soldo v. Sandoz Pharm. Corp., 244 F.Supp. 2d 434, 449-50 (W.D. Pa. 2003)).

Figure 3 adapted from Lee


What to do when there is really no evidence supporting a claim?  Make up stuff.  Here is how the trial court describes Omalu’s declaration opposing exclusion:

 “Dr. Omalu interprets and recalculates the findings in the 2004 Lee Study, finding that ‘an 80% confidence interval for the highly-exposed applicators in the 2004 Lee Study spans a relative risk range for NHL from slightly above 1.0 to slightly above 2.5.’ Dr. Omalu concludes that ‘this means that there is a 90% probability that the relative risk within the population studied is greater than 1.0’.”

705 F.Supp. 2d at 481 (internal citations omitted); see also id. at 488. The calculations and the rationale for an 80% confidence interval were not provided, but plaintiffs’ counsel assured Judge Fischer at oral argument that the calculation was done using high school math. Id. at 481 n.12. Judge Fischer seemed unimpressed, especially given that there was no record of the calculation.  Id. at 481, 488.
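Because no calculation was placed in the record, one can only guess at the method. One conventional way to re-express a reported 95% interval at a different confidence level is sketched below, using the intensity-weighted highest-quartile figures quoted above (RR = 1.61; 95% interval, 0.74 to 3.53). The sketch assumes a normal approximation on the log relative-risk scale; it is not offered as Omalu’s method, and the function name is hypothetical. The result depends on those assumptions and need not match figures obtained some other way.

```python
import math
from statistics import NormalDist


def rescale_interval(rr: float, lower: float, upper: float,
                     from_level: float, to_level: float) -> tuple[float, float]:
    """Re-express a reported confidence interval for a relative risk at a
    different confidence level (normal approximation on the log scale)."""
    normal = NormalDist()
    z_from = normal.inv_cdf(0.5 + from_level / 2.0)
    z_to = normal.inv_cdf(0.5 + to_level / 2.0)
    # Recover the standard error of log(RR) from the reported width.
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_from)
    half_width = z_to * se
    return (math.exp(math.log(rr) - half_width),
            math.exp(math.log(rr) + half_width))


if __name__ == "__main__":
    low, high = rescale_interval(1.61, 0.74, 3.53,
                                 from_level=0.95, to_level=0.80)
    print(f"approximate 80% interval: {low:.2f} to {high:.2f}")
```

Narrowing the confidence level shrinks the interval mechanically; it adds no evidence. The same data, the same point estimate, and the same standard error underlie both intervals.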

The larger offense, however, was that Omalu’s interpretation of the 80% confidence interval as a probability statement of the true relative risk’s exceeding 1.0, was bogus. Dr. Omalu further displayed his lack of statistical competence when he attempted to defend his posterior probability derived from his 80% confidence interval by referring to a power calculation of a different disease in the Lee study:

“He [Omalu] further declares that ‘the authors of the 2004 Lee Study themselves endorse the probative value of a finding of elevated risk with less than a 95% confidence level when they point out that “this analysis had a 90% statistical power to detect a 1.5–fold increase in lung cancer incidence.”’”

Id. at 488 (court’s quoting of Omalu’s quoting from the Lee study). To quote Wolfgang Pauli, Omalu is so far off that he is “not even wrong.” Lee and colleagues were offering a pre-study power calculation, which they used to justify their looking at the cohort for lung cancer, not NHL, outcomes.  Lee at 1787. The power calculation does not apply to the data observed for lung cancer; and the calculation has absolutely nothing to do with NHL. The power calculation certainly has nothing to do with Omalu’s misguided attempt to offer a calculation of a posterior probability for NHL based upon a subgroup confidence interval.

Given that there were epidemiologic studies available, Judge Fischer noted that expert witnesses were obligated to factor such studies into their opinions. See 705 F.Supp. 2d at 483 (citing Soldo, 244 F.Supp. 2d at 532).  Omalu’s sins against Rule 702 included his failure to consider any studies other than the Lee study, regardless of how unsupportive the Lee study was of his opinion.  The defense experts pointed to several studies that found lower NHL rates among exposed workers than among controls, and Omalu completely failed to consider and to explain his opinion in the face of the contradictory evidence.  See 705 F.Supp. 2d at 485 (citing Perry v. Novartis Pharm. Corp. 564 F.Supp. 2d 452, 465 (E.D. Pa. 2008)). In other words, Omalu was shown to have been a cherry picker. Id. at 489.

In addition to the abridged epidemiology, Omalu relied upon an analogy between ethyl-toluene and other solvents that contained benzene rings, on the one hand, and benzene itself, on the other, to argue that these chemicals, supposedly like benzene, cause NHL.  Id. at 487. The analogy was never supported by any citations to published studies, and, of course, the analogy is seriously flawed. Many chemicals, including chemicals made and used by the human body, have benzene rings, without the slightest propensity to cause NHL.  Indeed, the evidence that benzene itself causes NHL is weak and inconsistent.  See, e.g., Knight v. Kirby Inland Marine Inc., 482 F.3d 347 (5th Cir. 2007) (affirming the exclusion of Dr. B.S. Levy in a case involving benzene exposure and NHL).

Looking at all the evidence, Judge Fischer found Omalu’s general causation opinions unreliable.  Relying upon a single, statistically non-significant epidemiologic study (Lee), while ignoring contrary studies, was not sound science.  It was not even science; it was courtroom rhetoric.

Omalu’s approach to specific causation, the identification of what caused Mr. Pritchard’s NHL, was equally spurious. Omalu purportedly conducted a “differential diagnosis” or a “differential etiology,” but he never examined Mr. Pritchard; nor did he conduct a thorough evaluation of Mr. Pritchard’s medical records. 705 F.Supp. 2d at 491. Judge Fischer found that Omalu had not conducted a thorough differential diagnosis, and that he had made no attempt to rule out idiopathic or unknown causes of NHL, despite the general absence of known causes of NHL. Id. at 492. The one study identified by Omalu reported a non-statistically significant 60% increase in NHL risk, for a subgroup in one of two different exposure-response analyses.  Although Judge Fischer treated the relative risk less than two as a non-dispositive factor in her decision, she recognized that

“The threshold for concluding that an agent was more likely than not the cause of an individual’s disease is a relative risk greater than 2.0… . When the relative risk reaches 2.0, the agent is responsible for an equal number of cases of disease as all other background causes. Thus, a relative risk of 2.0 … implies a 50% likelihood that an exposed individual’s disease was caused by the agent. A relative risk greater than 2.0 would permit an inference that an individual plaintiff’s disease was more likely than not caused by the implicated agent.”

Id. at 485-86 (quoting from Reference Manual on Scientific Evidence at 384 (2d ed. 2000)).

Left with nowhere to run, plaintiffs’ counsel swung for the bleachers by arguing that the federal court, sitting in diversity, was required to apply Pennsylvania law of evidence because the standards of Rule 702 constitute “substantive,” not procedural law. The argument, which had been previously rejected within the Third Circuit, was as legally persuasive as Omalu’s scientific opinions.  Judge Fischer excluded Omalu’s proffered opinions and granted summary judgment to the defendants. The Third Circuit affirmed in a per curiam decision. 430 Fed. Appx. 102, 2011 WL 2160456 (3d Cir. 2011).

Practical Evaluation of Scientific Claims

The evaluative process that took place in the Pritchard case missed some important details and some howlers committed by Dr. Omalu, but it was more than good enough for government work. The gatekeeping decision in Pritchard was nonetheless the target of criticism in a recent book.

Kristin Shrader-Frechette (S-F) is a professor of science who wants to teach us how to expose bad science. S-F has published, or will soon publish, a book that suggests that philosophy of science can help us expose “bad science.”  See Kristin Shrader-Frechette, Tainted: How Philosophy of Science Can Expose Bad Science (Oxford U.P. 2014)[cited below at Tainted; selections available on Google books]. S-F’s claim is intriguing, as is her move away from the demarcation problem to the difficult business of evaluation and synthesis of scientific claims.

In her introduction, S-F tells us that her book shows “how practical philosophy of science” can counteract biased studies done to promote special interests and PROFITS.  Tainted at 8. Refreshingly, S-F identifies special-interest science, done for profit, as including “individuals, industries, environmentalists, labor unions, or universities.” Id. The remainder of the book, however, appears to be a jeremiad against industry, with a blind eye towards the litigation industry (plaintiffs’ bar) and environmental zealots.

The book promises to address “public concerns” in practical, jargon-free prose. Id. at 9-10. Some of the aims of the book are to provide support for “rejecting demands for only human evidence to support hypotheses about human biology (chapter 3), avoiding using statistical-significance tests with observational data (chapter 12), and challenging use of pure-science default rules for scientific uncertainty when one is doing welfare-affecting science (chapter 14).”

Id. at 10. Hmmm.  Avoiding statistical significance tests for observational data?!?  If avoided, what does S-F hope to use to assess random error?

And then S-F refers to plaintiffs’ hired expert witness (from the Milward case), Carl Cranor, as providing “groundbreaking evaluations of causal inferences [that] have helped to improve courtroom verdicts about legal liability that otherwise put victims at risk.” Id. at 7. Whether someone is a “victim” and has been “at risk” turns on assessing causality. Cranor is not a scientist, and his philosophy of science turns on “weight of the evidence” (WOE), a subjective, speculative approach that is deaf, dumb, and blind to scientific validity.

There are other “teasers” in the introduction to Tainted.  S-F advertises that her Chapter 5 will teach us that “[c]ontrary to popular belief, animal and not human data often provide superior evidence for human-biological hypotheses.”  Tainted at 11. Chapter 6 will show that “[c]ontrary to many physicists’ claims, there is no threshold for harm from exposure to ionizing radiation.” Id.  S-F tells us that her Chapter 7 will criticize “a common but questionable way of discovering hypotheses in epidemiology and medicine—looking at the magnitude of some effect in order to discover causes. The chapter shows instead that the likelihood, not the magnitude, of an effect is the better key to causal discovery.” Id. at 13. Discovering hypotheses? What is that about? You might have thought that hypotheses were framed from observations and then tested.

Which brings us to the trailer for Chapter 8, in which S-F promises to show that “[c]ontrary to standard statistical and medical practice, statistical-significance tests are not causally necessary to show medical and legal evidence of some effect.” Tainted at 11. Again, the teaser raises lots of questions, such as what S-F could possibly mean when she says that statistical tests are not causally necessary to show an effect.  Later in the introduction, S-F says that her chapter on statistics “evaluates the well-known statistical-significance rule for discovering hypotheses and shows that because scientists routinely misuse this rule, they can miss discovering important causal hypotheses.” Id. at 13. Discovering causal hypotheses is not what courts and regulators must worry about; their task is to establish such hypotheses with sufficient, valid evidence.

Paging through the book reveals a rhetoric that is thick and unremitting, with little philosophy of science or meaningful advice on how to evaluate scientific studies.  The statistics chapter calls out, and lo, it features a discussion of the Pritchard case. See Tainted, Chapter 8, “Why Statistics Is Slippery: Easy Algorithms Fail in Biology.”

The chapter opens with an account of German scientist Fritz Haber’s development of organophosphate pesticides, and the Nazis’ use of related compounds as chemical weapons.  Tainted at 99. Then, in a fevered non-sequitur and rhetorical flourish, S-F states, with righteous indignation, that although the Nazi researchers “clearly understood the causal-neurotoxic effects of organophosphate pesticides and nerve gas,” chemical companies today “claim that the causal-carcinogenic effects of these pesticides are controversial.” Is S-F saying that a chemical that is neurotoxic must be carcinogenic for every kind of human cancer?  So it seems.

Consider the Pritchard case.  Really, the Pritchard case?  Yup; S-F holds up the Pritchard case as her exemplar of what is wrong with civil adjudication of scientific claims.  Despite the promise of jargon-free language, S-F launches into a discussion of how the judges in Pritchard assumed that statistical significance was necessary “to hypothesize causal harm.”  Tainted at 100. In this vein, S-F tells us that she will show that:

“the statistical-significance rule is not a legitimate requirement for discovering causal hypotheses.”

Id. Again, the reader is left to puzzle why statistical significance is discussed in the context of hypothesis discovery, whatever that may be, as opposed to hypothesis testing or confirmation. And whatever it may be, we are warned that “unless the [statistical significance] rule is rejected as necessary for hypothesis-discovery, it will likely lead to false causal claims, questionable scientific theories, and massive harm to innocent victims like Robert Pritchard.”

Id. S-F is decidedly not adverting to Mr. Pritchard’s victimization by the litigation industry and the likes of Dr. Omalu, although she should. S-F not only believes that the judges in Pritchard bungled their gatekeeping; she knows that Dr. Omalu was correct, and the defense experts wrong, and that Pritchard was a victim of Dursban and of questionable scientific theories that were used to embarrass Omalu and his opinions.

S-F promised to teach her readers how to evaluate scientific claims and detect “tainted” science, but all she delivers here is an ipse dixit.  There is no discussion of the actual measurements, extent of random error, or threats to validity, for studies cited either by the plaintiffs or the defendants in Pritchard.  To be sure, S-F cites the Lee study in her endnotes, but she never provides any meaningful discussion of that study or any other that has any bearing on chlorpyrifos and NHL.  S-F also cited two review articles, the first of which provides no support for her ipse dixit:

“Although mutagenicity and chronic animal bioassays for carcinogenicity of chlorpyrifos were largely negative, a recent epidemiological study of pesticide applicators reported a significant exposure response trend between chlorpyrifos use and lung and rectal cancer. However, the positive association was based on small numbers of cases, i.e., for rectal cancer an excess of less than 10 cases in the 2 highest exposure groups. The lack of precision due to the small number of observations and uncertainty about actual levels of exposure warrants caution in concluding that the observed statistical association is consistent with a causal association. This association would need to be observed in more than one study before concluding that the association between lung or rectal cancer and chlorpyrifos was consistent with a causal relationship.

There is no evidence that chlorpyrifos is hepatotoxic, nephrotoxic, or immunotoxic at doses less than those that cause frank cholinesterase poisoning.”

David L. Eaton, Robert B. Daroff, Herman Autrup, James Bridges, Patricia Buffler, Lucio G. Costa, Joseph Coyle, Guy McKhann, William C. Mobley, Lynn Nadel, Diether Neubert, Rolf Schulte-Hermann, and Peter S. Spencer, “Review of the Toxicology of Chlorpyrifos With an Emphasis on Human Exposure and Neurodevelopment,” 38 Critical Reviews in Toxicology 1, 5-6 (2008).

The second cited review article was written by clinical ecology zealot[1], William J. Rea. William J. Rea, “Pesticides,” 6 Journal of Nutritional and Environmental Medicine 55 (1996). Rea’s article does not appear in PubMed.

Shrader-Frechette’s Criticisms of Statistical Significance Testing

What is the statistical significance against which S-F rails? She offers several definitions, none of which is correct or consistent with the others.

“The statistical-significance level p is defined as the probability of the observed data, given that the null hypothesis is true.8”

Tainted at 101 (citing D. H. Johnson, “What Hypothesis Tests Are Not,” 16 Behavioral Ecology 325 (2004)). Well, not quite; the attained significance probability is the probability of the data observed, or of data more extreme, given the null hypothesis.  A Tainted definition.
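
The difference between the two definitions is easy to see with numbers. The sketch below uses a made-up example, 60 heads in 100 tosses of a coin assumed fair under the null hypothesis, and compares the probability of exactly the observed result with the one-sided p-value, which sums the probabilities of the observed result and of all more extreme ones. The function names are illustrative only.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail_p_value(k, n, p):
    """Probability of k or more successes: the one-sided p-value."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))


# Hypothetical data: 60 heads in 100 tosses; the null says the coin is fair.
n_tosses, heads = 100, 60
point_probability = binomial_pmf(heads, n_tosses, 0.5)
tail_probability = upper_tail_p_value(heads, n_tosses, 0.5)

print(f"P(exactly {heads} heads | fair coin)  = {point_probability:.4f}")
print(f"P({heads} or more heads | fair coin) = {tail_probability:.4f}")
```

The tail probability is necessarily at least as large as the point probability; here it is well over twice as large. A definition that drops "or more extreme" therefore describes a different, and smaller, number than the p-value that scientists actually report.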

Later in Chapter 8, S-F discusses significance probability in a way that overtly commits the transposition fallacy, not a good thing to do in a book that sets out to teach how to evaluate scientific evidence:

“However, typically scientists view statistical significance as a measure of how confidently one might reject the null hypothesis. Traditionally they have used a 0.05 statistical-significance level, p < or = 0.05, and have viewed the probability of a false-positive (incorrectly rejecting a true null hypothesis), or type-1, error as 5 percent. Thus they assume that some finding is statistically significant and provides grounds for rejecting the null if it has at least a 95-percent probability of not being due to chance.”

Tainted at 101. Not only does the last sentence ignore the extent of error due to bias or confounding, it erroneously assigns a posterior probability that is the complement of the significance probability.  This error is not an isolated occurrence; here is another example:

“Thus, when scientists used the rule to examine the effectiveness of St. John’s Wort in relieving depression,14 or when they employed it to examine the efficacy of flutamide to treat prostate cancer,15 they concluded the treatments were ineffective because they were not statistically significant at the 0.05 level. Only at p < or = 0.14 were the results statistically significant. They had an 86-percent chance of not being due to chance.16”

Tainted at 101-02 (citing papers by Shelton (endnote 14)[2], by Eisenberger (endnote 15) [3], and Rothman’s text (endnote 16)[4]). Although Ken Rothman has criticized the use of statistical significance tests, his book surely does not interpret a p-value of 0.14 as an 86% chance that the results were not due to chance.
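
A short calculation shows why the complement of a significance level is not the probability that a result is "not due to chance." The sketch below applies Bayes' theorem to the event "the test came out significant at level alpha." The prior probabilities and power figures are invented for illustration; nothing in S-F's book or in the cited trials supplies them.

```python
def prob_null_given_significant(prior_null, alpha, power):
    """P(null is true | test is significant at level alpha), by Bayes' theorem.

    prior_null: assumed prior probability that the null hypothesis is true.
    alpha:      probability of a significant result when the null is true.
    power:      probability of a significant result when the null is false.
    """
    false_positive = alpha * prior_null
    true_positive = power * (1 - prior_null)
    return false_positive / (false_positive + true_positive)


# Hypothetical scenarios: (prior probability of null, alpha, power).
scenarios = [(0.5, 0.05, 0.8), (0.9, 0.05, 0.5), (0.9, 0.14, 0.5)]
for prior_null, alpha, power in scenarios:
    posterior = prob_null_given_significant(prior_null, alpha, power)
    print(f"prior={prior_null}, alpha={alpha}, power={power} -> "
          f"P(null | significant)={posterior:.3f}")
```

In the last scenario, a result "significant" at the 0.14 level leaves the null hypothesis with a posterior probability above 70 percent, not 14 percent. The answer depends entirely on the prior and the power, which is exactly the information a p-value does not contain.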

Although S-F previously stated that statistical significance is interpreted as the probability that the null is true, she actually goes on to correct the mistake, sort of:

“Requiring the statistical-significance rule for hypothesis-development also is arbitrary in presupposing a nonsensical distinction between a significant finding if p = 0.049, but a nonsignificant finding if p = 0.051.26 Besides, even when one uses a 90-percent (p < or = 0.10), an 85-percent (p < or = 0.15), or some other confidence level, it still may not include the null point. If not, these other p values also show the data are consistent with an effect. Statistical-significance proponents thus forget that both confidence levels and p values are measures of consistency between the data and the null hypothesis, not measures of the probability that the null is true. When results do not satisfy the rule, this means merely that the null cannot be rejected, not that the null is true.”

Tainted at 103.

S-F repeats some criticisms of significance testing, most of which involve her own misunderstandings of the concept.  It hardly suffices to argue that evaluating the magnitude of random error is worthless because it does not measure the extent of bias and confounding.  The flaw lies in those who would interpret the p-value as the sole measure of error involved in a measurement.

S-F takes the criticisms of significance probability to be sufficient to justify an alternative approach: evaluating causal hypotheses “on a preponderance of evidence,47 whether effects are more likely than not.”[5] Her citations, however, do not support the notion that an overall assessment of the causal hypothesis is a true alternative to statistical testing; rather, they show it to be only a later step in the causal assessment, one that presupposes that random variability has already been ruled out as an explanation for the observed associations.

S-F compounds her confusion by claiming that this purported alternative is superior to significance testing or any evaluation of random variability, and by noting that juries in civil cases must decide causal claims on the preponderance of the evidence, not on attained significance probabilities:

“In welfare-affecting areas of science, a preponderance-of-evidence rule often is better than a statistical-significance rule because it could take account of evidence based on underlying mechanisms and theoretical support, even if evidence did not satisfy statistical significance. After all, even in US civil law, juries need not be 95 percent certain of a verdict, but only sure that a verdict is more likely than not. Another reason for requiring the preponderance-of-evidence rule, for welfare-related hypothesis development, is that statistical data often are difficult or expensive to obtain, for example, because of large sample-size requirements. Such difficulties limit statistical-significance applicability. ”

Tainted at 105-06. S-F’s assertion that juries need not have 95% certainty in their verdict is either a misunderstanding or a misrepresentation of the meaning of a confidence interval, and a conflation of two very different kinds of probability or certainty.  S-F invites a reading that commits the transposition fallacy by confusing the probability involved in a confidence interval with that involved in a posterior probability.  S-F’s claim that sample size requirements often limit the ability to use statistical significance evaluations is obviously highly contingent upon the facts of the case, but in civil cases, such as Pritchard, this limitation is rarely at play.  Of course, if the sample size is too small to evaluate the role of chance, then a scientist should probably declare the evidence too fragile to support a causal conclusion.
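
How sample size bears on the ability to evaluate chance can be made concrete with a textbook power calculation. The sketch below is a simplified illustration, not a model of any study discussed here: it assumes a two-sided, one-sample z-test with a known standard deviation, and the effect size and sample sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Approximate power of a two-sided, one-sample z-test.

    effect: assumed true difference from the null value.
    sigma:  assumed known standard deviation of one observation.
    n:      sample size.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect * sqrt(n) / sigma
    # Probability that the test statistic lands in either rejection region.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


# Hypothetical effect of half a standard deviation at several sample sizes.
for n in (10, 30, 100):
    print(f"n={n:>3}: power={z_test_power(0.5, 1.0, n):.3f}")
```

With these made-up inputs, the power climbs from roughly one in three at n = 10 to nearly certain at n = 100. A small study's failure to reach significance says little either way, which is a reason for caution about the evidence, not a reason to abandon the assessment of random error.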

S-F also postulates that a posterior probability rather than a significance probability approach would “better counteract conflicts of interest that sometimes cause scientists to pay inadequate attention to public-welfare consequences of their work.” Tainted at 106. This claim is a remarkable assertion, which is not supported by any empirical evidence.  The varieties of evidence that go into an overall assessment of a causal hypothesis are often quantitatively incommensurate.  The so-called preponderance-of-the-evidence described by S-F is often little more than a subjective overall assessment of weight of the evidence.  The approving citations to the work of Carl Cranor support interpreting S-F to endorse this subjective, anything-goes approach to weight of the evidence.  As for WOE eliminating inadequate attention to “public welfare,” S-F’s citations actually suggest the opposite. S-F’s citations to the 1961 reviews by Wynder and by Little illustrate how subjective narrative reviews can be, with diametrically opposed results.  Rather than curbing conflicts of interest, these subjective, narrative reviews illustrate how contrary results may be obtained by the failure to pre-specify criteria of validity, and inclusion and exclusion of admissible evidence.

Still, S-F asserts that “up to 80 percent of welfare-related statistical studies have false-negative or type-II errors, failing to reject a false null.” Tainted at 106. The support for this assertion is a citation to a review article by David Resnik. See David Resnik, “Statistics, Ethics, and Research: An Agenda for Education and Reform,” 8 Accountability in Research 163, 183 (2000). Resnik’s paper is a review article, not an empirical study, but at the page cited by S-F, Resnik in turn cites to well-known papers that present actual data:

“There is also evidence that many of the errors and biases in research are related to the misuses of statistics. For example, Williams et al. (1997) found that 80% of articles surveyed that used t-tests contained at least one test with a type II error. Freiman et al. (1978)  * * *  However, empirical research on statistical errors in science is scarce, and more work needs to be done in this area.”

Id. The papers cited by Resnik, Williams (1997)[6] and Freiman (1978)[7], did identify previously published studies that over-interpreted statistically non-significant results, but the identified type-II errors were potential errors, not ascertained errors, because the authors made no claim that every non-statistically significant result actually represented a missed true association. In other words, S-F is not entitled to say that these empirical reviews actually identified failures to reject false null hypotheses. Furthermore, the empirical analyses in the studies cited by Resnik, who was in turn cited by S-F, did not look at correlations between alleged conflicts of interest and statistical errors. The cited research calls for greater attention to proper interpretation of statistical tests, not for their abandonment.

In the end, at least in the chapter on statistics, S-F fails to deliver much if anything on her promise to show how to evaluate science from a philosophic perspective.  Her discussion of the Pritchard case is not an analysis; it is a harangue. There are certainly more readable, accessible, scholarly, and accurate treatments of the scientific and statistical issues raised in this book.  See, e.g., Michael B. Bracken, Risk, Chance, and Causation: Investigating the Origins and Treatment of Disease (2013).


[1] Not to be confused with the deceased federal judge by the same name, William J. Rea. William J. Rea, 1 Chemical Sensitivity – Principles and Mechanisms (1992); 2 Chemical Sensitivity – Sources of Total Body Load (1994); 3 Chemical Sensitivity – Clinical Manifestation of Pollutant Overload (1996); 4 Chemical Sensitivity – Tools of Diagnosis and Methods of Treatment (1998).

[2] R. C. Shelton, M. B. Keller, et al., “Effectiveness of St. John’s Wort in Major Depression,” 285 Journal of the American Medical Association 1978 (2001).

[3] M. A. Eisenberger, B. A. Blumenstein, et al., “Bilateral Orchiectomy With or Without Flutamide for Metastic [sic] Prostate Cancer,” 339 New England Journal of Medicine 1036 (1998).

[4] Kenneth J. Rothman, Epidemiology 123–127 (NY 2002).

[5] Endnote 47 references the following papers: E. Hammond, “Cause and Effect,” in E. Wynder, ed., The Biologic Effects of Tobacco 193–194 (Boston 1955); E. L. Wynder, “An Appraisal of the Smoking-Lung-Cancer Issue,” 264 New England Journal of Medicine 1235 (1961); see C. Little, “Some Phases of the Problem of Smoking and Lung Cancer,” 264 New England Journal of Medicine 1241 (1961); J. R. Stutzman, C. A. Luongo, and S. A. McLuckey, “Covalent and Non-Covalent Binding in the Ion/Ion Charge Inversion of Peptide Cations with Benzene-Disulfonic Acid Anions,” 47 Journal of Mass Spectrometry 669 (2012). Although the paper on ionic charges of peptide cations is unfamiliar, the other papers do not eschew traditional statistical significance testing techniques. By the time these early (1961) reviews were written, the association that was reported between smoking and lung cancer was clearly accepted as not likely explained by chance.  Discussion focused upon bias and potential confounding in the available studies, and the lack of animal evidence for the causal claim.

[6] J. L. Williams, C. A. Hathaway, K. L. Kloster, and B. H. Layne, “Low power, type II errors, and other statistical problems in recent cardiovascular research,” 42 Am. J. Physiology Heart & Circulation Physiology H487 (1997).

[7] Jennie A. Freiman, Thomas C. Chalmers, Harry Smith and Roy R. Kuebler, “The importance of beta, the type II error and sample size in the design and interpretation of the randomized control trial: survey of 71 ‛negative’ trials,” 299 New Engl. J. Med. 690 (1978).

Contra Parascandola’s Reduction of Specific Causation to Risk

August 22nd, 2014

Mark Parascandola is a photographer who splits his time between Washington, DC, and Almeria, Spain.  Before his career in photography, Parascandola studied philosophy (Cambridge), and did graduate work in epidemiology (Johns Hopkins, MPH). From 1997 to 1998, he studied the National Cancer Institute’s role in determining that smoking causes some kinds of cancer.  He went on to serve as a staff epidemiologist at NCI, at its Tobacco Control Research Branch, in the Division of Cancer Control and Population Sciences (DCCPS).

Back in the 1990s, Parascandola wrote an article, which is a snapshot and embellishment of arguments given by Sander Greenland, on the use and alleged abuse of relative risks to derive a “probability of causation.” See Mark Parascandola, “What’s Wrong with the Probability of Causation?” 39 Jurimetrics J. 29 (1998) [cited here as Parascandola]. Parascandola’s article is a locus of arguments that have recurred from time to time, and is worth revisiting.

Parascandola offers an interesting historical factoid, which is a useful reminder to those who suggest that the RR > 2 argument was the brainchild of lawyers:  The argument was first suggested in 1959, by Dr. Victor P. Bond, a physician with expertise in medical physics at the Brookhaven National Laboratory.  See Parascandola at 31 n. 6 (citing Victor P. Bond, The Medical Effects of Radiation (1960), reprinted in NACCA 13th Annual Convention 1959, at 126 (1960)).

Unfortunately, Parascandola is a less reliable reporter when it comes to the judicial use of the relative risk greater than two (RR > 2) argument.  He argues that Judge Jack Weinstein opposed the RR > 2 argument on policy grounds, when in fact, Judge Weinstein rejected the anti-probabilistic argument that probabilistic inference could never establish specific causation, and embraced the RR > 2 argument as a logical policy compromise that would allow evidence of risk to substitute for specific causation in a limited fashion. Parascandola at 33-34 & n.20. Given Judge Weinstein’s many important contributions to tort and procedural law, and the importance of the Agent Orange litigation, it is worth describing Judge Weinstein’s views accurately. See In re Agent Orange Product Liab. Litig., 597 F. Supp. 740, 785, 817, 836 (E.D.N.Y. 1984) (“A government administrative agency may regulate or prohibit the use of toxic substances through rulemaking, despite a very low probability of any causal relationship.  A court, in contrast, must observe the tort law requirement that a plaintiff establish a probability of more than 50% that the defendant’s action injured him. … This means that at least a two-fold increase in incidence of the disease attributable to Agent Orange exposure is required to permit recovery if epidemiological studies alone are relied upon.”), aff’d 818 F.2d 145, 150-51 (2d Cir. 1987)(approving district court’s analysis), cert. denied sub nom. Pinkney v. Dow Chemical Co., 487 U.S. 1234 (1988); see also In re “Agent Orange” Prod. Liab. Litig., 611 F. Supp. 1223, 1240, 1262 (E.D.N.Y. 1985)(excluding plaintiffs’ expert witnesses), aff’d, 818 F.2d 187 (2d Cir. 1987), cert. denied, 487 U.S. 1234 (1988).[1]

Parascandola’s failure to cite and describe Judge Weinstein’s views raises some question of the credibility of his analyses, and his assertion that “[he] will demonstrate that the PC formula is invalid in many situations and cannot fill the role it is given.” Parascandola at 30 (emphasis added).

Parascandola describes basic arithmetic of probability of causation (PC) in terms of a disease for which we “expect cases” and for which we have “excess cases.” The rate of observed cases in an exposed population divided by the rate of expected cases in an unexposed population provides an estimate of the population relative risk (RR). The excess cases can be obtained simply from the difference between observed cases in the exposed group and the expected cases in the unexposed group.  The attributable fraction is the ratio of excess cases to total cases.

The probability of causation “PC” = 1 – (1/RR).
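
The formula follows from the definitions in the preceding paragraph: the attributable fraction is (observed − expected)/observed, and dividing numerator and denominator by the expected count gives (RR − 1)/RR, which is 1 − (1/RR). A minimal sketch of the arithmetic appears below. The function names are illustrative, and the decision to report zero when RR is at or below 1.0 is a simplifying assumption of this sketch, not part of Parascandola's account.

```python
def relative_risk(observed_rate, expected_rate):
    """Rate in the exposed group divided by the rate in the unexposed group."""
    return observed_rate / expected_rate


def probability_of_causation(rr):
    """PC = 1 - (1/RR), the attributable fraction among the exposed.

    Assumption of this sketch: return 0.0 when RR <= 1, because the
    formula would otherwise yield a zero or negative "probability."
    """
    if rr <= 1.0:
        return 0.0
    return 1.0 - 1.0 / rr


for rr in (1.5, 1.6, 2.0, 3.0):
    print(f"RR={rr}: PC={probability_of_causation(rr):.3f}")
```

A relative risk of exactly 2.0 yields a PC of 0.5, which is why RR > 2 marks the point at which the "more likely than not" threshold is crossed. The 60% increase reported for the subgroup in the Lee study (RR of about 1.6) would correspond to a PC of 0.375, even if the association were taken at face value.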

Heterogeneity Yields Uncertainty Argument

The RR describes a group statistic, and an individual’s altered risk will almost certainly not be exactly equal to the group’s average risk. Parascandola notes that sometimes this level of uncertainty can be remedied by risk measurements for subgroups that better fit an individual plaintiff’s characteristics.  All true, but this is hardly an argument against RR > 2.  At best, the heterogeneity argument is an expression of inference skepticism of the sort that led Judge Weinstein to accept RR > 2 as a reasonable compromise. The presence of heterogeneity of this sort simply increases the burden upon plaintiff to provide RR statistics from studies that very tightly resemble plaintiff in terms of exposure and other characteristics.

Urning for Probabilistic Certainty

Parascandola describes how the PC formula arises from a consideration of the “urn model” of disease causation.  Suppose that in a group of sufficient size 200 stomach cancer cases were expected within a certain time, but 300 were observed. We can model the situation with an urn of 300 marbles, 200 of which are green and 100 of which are red. Blindfolded or colorblind, we pull a single marble from the urn, and we have only a 1/3 chance of obtaining a red, “excess” marble case. Parascandola at 36-37 (borrowing from David Kaye, “The Limits of the Preponderance of the Evidence Standard: Justifiably Naked Statistical Evidence and Multiple Causation,” 7 Am. Bar Fdtn. Res. J. 487, 501 (1982)).

Parascandola argues that the urn model is not necessarily correct.  Causation cannot always be reduced to a single cause. Complex etiologic mechanisms and pathways are common.  Interactions between and among causes frequently occur.  Biological phenomena are sometimes “over-determined.” Parascandola asks us to assume that some of the non-excess cases are also “etiologic cases,” which were caused by the exposure but which would have occurred even without the exposure.  Id. at 37. Borrowing from Greenland, Parascandola asserts that “[a]ll excess cases are etiologic cases, but not vice versa.” Id. at 38 & n.37 (quoting from Sander Greenland & James M. Robins, “Conceptual Problems in the Definition and Interpretation of Attributable Fractions,” 128 Am. J. Epidem. 1185, 1185 (1988)).
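
The urn can be simulated in a few lines. The sketch below draws repeatedly, with replacement, from an urn of 200 green ("expected") and 100 red ("excess") marbles, and reports how often a red marble comes up. The function name, the number of draws, and the random seed are arbitrary choices made for this illustration.

```python
import random


def red_fraction(green, red, draws, seed=0):
    """Share of red marbles in repeated single draws, with replacement."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    urn = ["green"] * green + ["red"] * red
    hits = sum(rng.choice(urn) == "red" for _ in range(draws))
    return hits / draws


# 200 expected cases (green) and 100 excess cases (red), as in the text.
simulated = red_fraction(green=200, red=100, draws=100_000)
exact = 100 / 300

print(f"simulated share of excess cases: {simulated:.3f}")
print(f"exact share of excess cases:     {exact:.3f}")
```

The simulated share settles near 1/3, the same figure that 1 − (1/RR) gives for RR = 300/200 = 1.5. The simulation builds in the urn model's key assumption, that every case is either wholly "expected" or wholly "excess," which is precisely the assumption that Parascandola goes on to challenge.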

Parascandola’s argument, if accepted, proves too much to help plaintiffs who hope to establish specific causation with evidence of increased risk. His argument posits a different, more complex model of causation, for which plaintiffs usually have no evidence.  (If they did have such evidence, then they would have nothing to fear in the assumptions of the simplistic urn model; they could rebut those assumptions.) Parascandola’s argument pushes the speculation envelope by asking us to believe that some “non-excess” cases are etiologic cases, but providing no basis for identifying which ones they are.  Unless and until such evidence is forthcoming, Parascandola’s argument is simply uncontrolled multi-leveled conjecture.

Again borrowing from Sander Greenland’s speculation, Parascandola advances a variant of the argument above by suggesting that an exposure may not increase the overall number of excess cases, but that it may accelerate the onset of the harm in question. While it is true that the element of time is important, both in law and in life, the invoked speculation can be, and usually is, tested by time windows or time series analyses in observational epidemiology and clinical trials.  The urn model is “flat” with respect to the temporal dimension, but if plaintiffs want to claim acceleration, then they should adduce Kaplan-Meier curves and the like.  But again, with the modification of the time dimension, plaintiffs will still need hazard ratios or other risk ratios greater than two to make out their case, unless there is a biomarker/fingerprint of individual causation. The introduction of the temporal element is important to an understanding of risk, but Parascandola’s argument does not help transmute evidence of risk in a group to causation in an individual.

Joint Chancy Causation

In line with his other speculative arguments, Parascandola asks:  what if a given cancer in the exposed group is the product of two causes rather than due to one or another of the two causes? Parascandola at 40. This question restates the speculative argument in only slightly different terms.  We could multiply the possible causal sets by suggesting that the observed effect resulted from one or the other or both or none of the causes.  Parascandola calls this “joint chancy causation,” but he manages to show only that the inference of causation from prior chance or risk is a very chancy (or dicey) step in his argument.  Parascandola argues that we should not assume that the urn model is true, when multiple causation models are “plausible and consistent” with other causal theories.

Interestingly, this line of argument would raise the burden upon plaintiffs by requiring them to specify the applicable causal model in ways that (1) they often cannot, and (2) they now, under current law, are not required to do.

Conclusion

In the end, Parascandola realizes that he has raised, not lowered, the burden for plaintiffs.  His counter is to suggest, contrary to law and science, that “the existence of alternative hypotheses should not prevent the plaintiff’s case from proceeding.” Parascandola at 41 n.50.  Because he says so. In other words, Parascandola is telling us that irrespective of how poorly established a hypothesis is, or of how speculative an inference is, or of the existence and strength of alternative hypotheses,

“This trial must be tried.”

W.S. Gilbert, Trial by Jury (1875).

With bias of every kind, no doubt.

That is not science, law, or justice.


[1] An interesting failure or lack of peer review in a legal journal.

 

Climategate on Appeal

August 17th, 2014

Michael Mann, a Professor of Meteorology at Penn State University, studies and writes about climate change. When the email servers of the University of East Anglia were hacked in 2009, Mann’s emails were among those used to suggest that there was a conspiracy to suppress evidence inconsistent with climate change.

Various committees investigated allegations of scientific malfeasance, in the controversy that has come to be known as “climategate”; none found evidence of scientific misconduct. Some of the committees, however, did urge that the investigators engage in greater sharing of their supporting data, methods, and materials.

In February 2010, Penn State issued a report of its investigation, which found there was “no credible evidence that Dr. Mann had or has ever engaged in, or participated in, directly or indirectly, any actions with an intent to suppress or to falsify data.” A Final Investigation Report, from Penn State in June 2010, further cleared Mann.

In the United Kingdom, a Parliamentary Committee on Science and Technology published a report, in March 2010, finding that the criticisms of the Climate Research Unit (CRU) at the University of East Anglia (UEA) were not well founded. A month later, the UEA issued a Report of an International Panel, which found no evidence of deliberate scientific malpractice. Another UEA report, the Independent Climate Change Email Review report, found no reason to doubt the honesty of the scientists involved. An official UK governmental report, in September 2010, similarly cleared the climate researchers of wrongdoing.

Investigations on this side of the Atlantic likewise largely exonerated the climate researchers of any scientific misconduct. An EPA report, in July 2010, dismissed the email content as merely a “candid discussion” among scientists collaborating on complex data. An independent review by the Department of Commerce’s Inspector General found no “inappropriate” manipulation of data in the emails. The National Science Foundation reported, in August 2011, that it could discern no research misconduct in the climategate emails.

Rand Simberg, an adjunct scholar with the Competitive Enterprise Institute (CEI), wrote a blog post, “The Other Scandal in Unhappy Valley” (July 13, 2012), in which he referred to Mann and his research in terms of “wrongdoing” and “hockey-stick deceptions.” Simberg describes the hacked UEA emails as having “revealed” that Mann “had been engaging in data manipulation to keep the blade on his famous hockey-stick graph.” Similarly, Simberg states that “many of the luminaries of the ‛climate science’ community were shown to have been behaving in a most unscientific manner.”

The current on-line version of Simberg’s blog post ends with a note[1]:

*Two inappropriate sentences that originally appeared in this post have been removed by the editor.

A post by Mark Steyn on the National Review online website called Mann’s hockey stick “fraudulent.” A subsequent National Review piece offered that in “common polemical usage, ‛fraudulent’ doesn’t mean honest-to-goodness criminal fraud. It means intellectually bogus and wrong.”

Legal counsel for Penn State wrote the Competitive Enterprise Institute, in August 2012, to request an apology from Simberg and the CEI, and a retraction of Simberg’s blog post. I am not sure what the two, subsequently removed, “inappropriate sentences” in Simberg’s piece were, or when they were removed, but Dr. Mann, represented by Cozen O’Connor, went on to sue Mark Steyn, Rand Simberg, the CEI, and National Review, for libel, in October 2012, in the Superior Court of the District of Columbia. Further publications led to an Amended Complaint in 2013.

Mann obviously does not like being called the author of fraudulent and intellectually bogus work, and he claims that the publications by Simberg and Steyn are libelous as “allegations of academic fraud.”

The D.C. Superior Court denied defendants’ motion to dismiss, setting up interlocutory appeals to the D.C. Court of Appeals, which is the highest court for the District. The appellate court allowed an interlocutory appeal, with a schedule that calls for appellants’ briefs by August 4, 2014. Dr. Mann’s brief is due by September 3, 2014, and appellants’ reply briefs by September 24, 2014.  The Court set consideration of the appeal for its November calendar.

Defendants CEI and National Review filed their opening briefs last week. This week, on August 11, 2014, the Cato Institute, Reason Foundation, Individual Rights Foundation and Goldwater Institute filed a brief in support of CEI and National Review. Other amici who filed in support of the defendants are Mark Steyn, the District of Columbia, the Alliance Defending Freedom, and the Electronic Frontier Foundation.

I am not sure that all the epithets point to academic fraud. Some of the adjectives, such as “bogus,” do not really connote scienter or intent to deceive. The use of the adjective “fraudulent,” however, does connote intentional falsity, designed to mislead. Deceit and intent to mislead seem to be at the heart of an accusation of fraud.

The arguments of the defendants and their amici on appeal predictably rely heavily upon the First Amendment to protect their speech, but surprisingly, they characterize the labeling of someone’s research as “fraudulent” as merely “hyperbolic” or “robust” debate and polemics.

Some of the defendants’ other arguments are even more surprising. For instance, Cato correctly points out that “Courts are ill-suited to officiate scientific debate to determine ‛truth’ or ‛falsity’.” True, but officiate they must in criminal fraud, intellectual property, product liability, and securities fraud cases, as well as in many other kinds of litigation. Cato admonishes that the “[e]volution of scientific thought over time highlights the danger of courts[’] determining ‛truth’ in public debate.” Dangerous indeed, but a commonplace in state and federal courts throughout the land.

Is this Think Tank Thuggery or robust free speech? The use of “fraudulent” seems to be an accusation, and it would have been much more “robust” to have had a careful documentation of what exactly was Professor Mann’s supposed deviation from a scientific standard of care.


[1] The words “fraud” and “fraudulent” do not appear in the current on-line version of Simberg’s post.

Homeopathy on Trial

August 7th, 2014

As Tim Minchin put it in his poem, “Storm,” an alternative medicine is either not shown to be effective or has been shown to be ineffective; because if an alternative medicine has been shown to be effective, then we call it “medicine.”

Standard Homeopathic Company makes and sells various so-called alternative medicine remedies. Plaintiffs filed suit against Standard for misleadingly claiming efficacy, and sought class action certification.  Class actions have become increasingly difficult to maintain in federal court, but this one seems like a worthy candidate. The plaintiffs, in their Third Amended Complaint, alleged that “there is ‘little evidence’ that homeopathy is effective.” Usually, plaintiffs are perfectly happy with just a ‘little evidence’ to support claims for many dollars, but here they complained about being duped by homeopathy and its claims of dubious validity.

On August 1, 2014, District Judge Dolly Gee certified a class on behalf of purchasers of defendants’ homeopathic remedies (Calms Forte, Teething Tablets, Migraine Headache Relief, Colic Tablets, etc.) from February 2008 to the present. Allen v. Hyland’s Inc., 2:12-cv-01150, 2014 WL 3819713 (C.D. Calif. Aug. 1, 2014). The win was no doubt as sweet as the sugar pills that they had bought.

Standard Homeopathic is represented by Norton Rose Fulbright, which also represents ethical drug manufacturers. Curiously, the defense lawyers must not have seen the substantial potential conflict of interest in representing a homeopathic manufacturer. Defendants’ attempts to defend the truth of their advertised claims for homeopathic remedies should make for interesting litigation to watch.