
TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Sander Greenland on “The Need for Critical Appraisal of Expert Witnesses in Epidemiology and Statistics”

February 8th, 2015

Sander Greenland is one of the few academics who have served as expert witnesses and who have written post-mortems of their involvement in various litigations[1]. Although settling scores with opposing expert witnesses can be a risky business[2], the practice can provide important insights for judges and lawyers who want to avoid the errors of the past. Greenland correctly senses that many errors seem endlessly recycled, and that courts could benefit from disinterested commentary on cases. And so, there should be a resounding affirmation from federal and state courts of the proclaimed “need for critical appraisal of expert witnesses in epidemiology and statistics,” as well as in many other disciplines.

A recent exchange[3] with Professor Greenland led me to revisit his Wake Forest Law Review article. His article raises some interesting points, some of them mistaken, but it also offers valuable and thoughtful considerations about how to improve the state of statistical expert witness testimony. For better and worse[4], lawyers who litigate health effects issues should read it.

Other Misunderstandings

Greenland posits criticisms of defense expert witnesses[5], who he believes have misinterpreted or misstated the appropriate inferences to be drawn from null studies. In one instance, Greenland revisits one of his own cases, without any clear acknowledgment that his views were largely rejected.[6] The State of California had declared, pursuant to Proposition 65 (the Safe Drinking Water and Toxic Enforcement Act of 1986, Health and Safety Code sections 25249.5, et seq.), that the State “knew” that di(2-ethylhexyl)phthalate, or “DEHP,” caused cancer. Baxter Healthcare challenged the classification, and according to Greenland, the defense experts erroneously interpreted inconclusive studies as evidence supporting a conclusion that DEHP does not cause cancer.

Greenland argues that the Baxter expert’s reference[7] to an IARC working group’s classification of DEHP as “not classifiable as to its carcinogenicity to humans” did not support the expert’s conclusion that DEHP does not cause cancer in humans. If Baxter’s expert invoked the IARC working group’s classification for complete exoneration of DEHP, then Greenland’s point is fair enough. In his single-minded attack on Baxter’s expert’s testimony, however, Greenland missed a more important point, which is that the IARC’s determination that DEHP is not classifiable as to carcinogenicity directly contradicts California’s epistemic claim to “know” that DEHP causes cancer. And Greenland conveniently omits any discussion that the IARC working group had reclassified DEHP from “possibly carcinogenic” to “not classifiable,” in the light of its conclusion that mechanistic evidence of carcinogenesis in rodents did not pertain to humans.[8] Greenland maintains that Baxter’s experts misrepresented the IARC working group’s conclusion[9], but that conclusion, at the very least, demonstrates that California was on very shaky ground when it declared that it “knew” that DEHP was a carcinogen. California’s semantic gamesmanship over its epistemic claims is at the root of the problem, not a misstep by defense experts in describing inconclusive evidence as exonerative.

Greenland goes on to complain that in litigation over health claims:

“A verdict of ‛uncertain’ is not allowed, yet it is the scientific verdict most often warranted. Elimination of this verdict from an expert’s options leads to the rather perverse practice (illustrated in the DEHP testimony cited above) of applying criminal law standards to risk assessments, as if chemicals were citizens to be presumed innocent until proven guilty.”

39 Wake Forest Law Rev. at 303. Despite Greenland’s alignment with California in the Denton case, the fact of the matter is that a verdict of “uncertain” was allowed, and he was free to criticize California for making a grossly exaggerated epistemic claim on inconclusive evidence.

Perhaps recognizing that he may readily be seen as an advocate for coming to the defense of California on the DEHP issue, Greenland protests that:

“I am not suggesting that judgments for plaintiffs or actions against chemicals should be taken when evidence is inconclusive.”

39 Wake Forest Law Rev. at 305. And yet, his involvement in the Denton case (as well as other cases, such as silicone gel breast implant cases, thimerosal cases, etc.) suggests that he is willing to lend aid and support to judgments for plaintiffs when the evidence is inconclusive.

Important Advice and Recommendations

These foregoing points are rather severe limitations to Greenland’s article, but lawyers and judges should also look to what is good and helpful here. Greenland is correct to call out expert witnesses, regardless of party affiliation, who opine that inconclusive studies are “proof” of the null hypothesis. Although some of Greenland’s arguments against the use of significance probability may be overstated, his corrections to the misstatements and misunderstandings of significance probability should command greater attention in the legal community. In one strained passage, however, Greenland uses a disjunction to juxtapose null hypothesis testing with proof beyond a reasonable doubt[10]. Greenland of course understands the difference, but the context would lead some untutored readers to think he has equated the two probabilistic assessments. Writing in a law review for lawyers and judges might have led him to be more careful. Given the prevalence of plaintiffs’ counsel’s confusing the 95% confidence coefficient with a burden of proof akin to beyond a reasonable doubt, great care in this area is, indeed, required.

Despite his appearing for plaintiffs’ counsel in health effects litigation, some of Greenland’s suggestions are balanced and perhaps more truth-promoting than many plaintiffs’ counsel would abide. His article provides an important argument in favor of raising the legal criteria for witnesses who purport to have expertise to address and interpret epidemiologic and experimental evidence[11]. And beyond raising qualification requirements above mere “reasonable pretense at expertise,” Professor Greenland offers some thoughtful, helpful recommendations for improving expert witness testimony in the courts:

“Begin publishing projects in which controversial testimony (a matter of public record) is submitted, and as space allows, published on a regular basis in scientific or law journals, perhaps with commentary. An online version could provide extended excerpts, with additional context.
Give courts the resources and encouragement to hire neutral experts to peer-review expert testimony.
Encourage universities and established scholarly societies (such as AAAS, ASA, APHA, and SER) to conduct workshops on basic epidemiologic and statistical inference for judges and other legal professionals.”

39 Wake Forest Law Rev. at 308.

Each of these three suggestions is valuable and constructive, and worthy of an independent paper. The recommendations of neutral expert witnesses and scholarly tutorials for judges are hardly new. Many defense counsel and judges have argued for them in litigation and in commentary. The first recommendation, of publishing “controversial testimony,” is part of the purpose of this blog. There would be great utility to making expert witness testimony, and analysis thereof, more available for didactic purposes. Perhaps the more egregious testimonial adventures should be republished in professional journals, as Greenland suggests. Greenland qualifies his recommendation with “as space allows,” but space is hardly the limiting consideration in the digital age.

Causation

Professor Greenland correctly points out that causal concepts and conclusions are often essentially contested[12], but his argument might well be incorrectly taken for “anything goes.” More helpfully, Greenland argues that various academic ideals should infuse expert witness testimony. He suggests that greater scholarship, with acknowledgment of all viewpoints, and all evidence, is needed in expert witnessing. 39 Wake Forest Law Rev. at 293.

Greenland’s argument provides an important corrective to the rhetoric of Oreskes, Cranor, Michaels, Egilman, and others on “manufacturing doubt”:

“Never force a choice among competing theories; always maintain the option of concluding that more research is needed before a defensible choice can be made.”

Id. Despite his position in the Denton case, and others, Greenland and all expert witnesses are free to maintain that more research is needed before a causal claim can be supported. Greenland also maintains that expert witnesses should “look past” the conclusions drawn by authors, and base their opinions on the “actual data” on which the statistical analyses are based, and from which conclusions have been drawn. Courts have generally rejected this view, but if courts were to insist upon real expertise in epidemiology and statistics, then the testifying expert witnesses should not be constrained by the hearsay opinions in the discussion sections of published studies – sections which by nature are incomplete and tendentious. See “Follow the Data, Not the Discussion” (May 2, 2010).

Greenland urges expert witnesses and legal counsel to be forthcoming about their assumptions, their uncertainty about conclusions:

“Acknowledgment of controversy and uncertainty is a hallmark of good science as well as good policy, but clashes with the very time limited tasks faced by attorneys and courts”

39 Wake Forest Law Rev. at 293-4. This recommendation would be helpful in assuring courts that the data may simply not support conclusions sufficiently certain to be submitted to lay judges and jurors. Rosen v. Ciba-Geigy Corp., 78 F.3d 316, 319, 320 (7th Cir. 1996) (“But the courtroom is not the place for scientific guesswork, even of the inspired sort. Law lags science; it does not lead it.”) (internal citations omitted).

Threats to Validity

One of the serious mistakes counsel often make in health effects litigation is to invite courts to believe that statistical significance is sufficient for causal inferences. Greenland emphasizes that validity considerations often are much stronger, and more important considerations than the play of random error[13]:

“For very imperfect data (e.g., epidemiologic data), the limited conclusions offered by statistics must be further tempered by validity considerations.”

* * * * * *

“Examples of validity problems include non-random distribution of the exposure in question, non-random selection or cooperation of subjects, and errors in assessment of exposure or disease.”

39 Wake Forest Law Rev. at 302 – 03. Greenland’s abbreviated list of threats to validity should remind courts that they cannot sniff a p-value below five percent and then safely kick the can to the jury. The literature on evaluating bias and confounding is huge, but Greenland was a co-author on an important recent paper, which needs to be added to the required reading lists of judges charged with gatekeeping expert witness opinion testimony about health effects. See Timothy L. Lash, et al., “Good practices for quantitative bias analysis,” 43 Internat’l J. Epidem. 1969 (2014).
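To make the point about validity concrete, here is a minimal sketch in Python of one simple form of quantitative bias analysis: back-correcting a case-control 2x2 table for non-differential exposure misclassification. The counts, the sensitivity, and the specificity are hypothetical values chosen only for illustration; they do not come from Greenland’s article or from any study discussed here, and the function names are illustrative as well.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a, b = exposed and unexposed cases; c, d = exposed and unexposed controls.
    """
    return (a * d) / (b * c)


def correct_exposed(observed_exposed, total, sensitivity, specificity):
    """Back-calculate the true number exposed from the number classified as exposed."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("exposure classification must be better than chance")
    return (observed_exposed - (1.0 - specificity) * total) / denom


def corrected_odds_ratio(a, b, c, d, sensitivity, specificity):
    """Odds ratio after correcting cases and controls with the same
    (non-differential) sensitivity and specificity."""
    true_a = correct_exposed(a, a + b, sensitivity, specificity)
    true_c = correct_exposed(c, c + d, sensitivity, specificity)
    true_b = (a + b) - true_a
    true_d = (c + d) - true_c
    if min(true_a, true_b, true_c, true_d) <= 0:
        raise ValueError("assumed error rates are incompatible with the observed counts")
    return odds_ratio(true_a, true_b, true_c, true_d)


# Hypothetical table: 60 exposed and 40 unexposed cases,
# 40 exposed and 60 unexposed controls.
observed = odds_ratio(60, 40, 40, 60)
corrected = corrected_odds_ratio(60, 40, 40, 60, sensitivity=0.9, specificity=0.9)
print(f"observed OR = {observed:.2f}, corrected OR = {corrected:.2f}")
```

With these made-up numbers, the observed odds ratio of 2.25 becomes about 2.78 after correction. The p-value for the observed table says nothing about the size or direction of that shift, which is why random error cannot be the only consideration.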

[1] For an influential example of this sparse genre, see James T. Rosenbaum, “Lessons from litigation over silicone breast implants: A call for activism by scientists,” 276 Science 1524 (1997) (describing the exaggerations, distortions, and misrepresentations of plaintiffs’ expert witnesses in silicone gel breast implant litigation, from perspective of a highly accomplished scientist physician, who served as a defense expert witness, in proceedings before Judge Robert Jones, in Hall v. Baxter Healthcare Corp., 947 F. Supp. 1387 (D. Or. 1996)). In one attempt to “correct the record” in the aftermath of a case, Greenland excoriated a defense expert witness, Professor Robert Makuch, for stating that Bayesian methods are rarely used in medicine or in the regulation of medicines. Sander Greenland, “The Need for Critical Appraisal of Expert Witnesses in Epidemiology and Statistics,” 39 Wake Forest Law Rev. 291, 306 (2004). Greenland heaped adjectives upon his adversary: “ludicrous claim,” “disturbing,” “misleading expert testimony,” and “demonstrably quite false.” See “The Infrequency of Bayesian Analyses in Non-Forensic Court Decisions” (Feb. 16, 2014) (debunking Prof. Greenland’s claims).

[2] One almost comical example of trying too hard to settle a score occurs in a footnote, where Greenland cites a breast implant case as having been reversed in part by another case in the same appellate court. See 39 Wake Forest Law Rev. at 309 n.68, citing Allison v. McGhan Med. Corp., 184 F.3d 1300, 1310 (11th Cir. 1999), aff’d in part & rev’d in part, United States v. Baxter Int’l, Inc., 345 F.3d 866 (11th Cir. 2003). The subsequent case was not by any stretch of the imagination a reversal of the earlier Allison case; the egregious citation is a legal fantasy. Furthermore, Allison had no connection with the procedures for court-appointed expert witnesses or technical advisors. Perhaps the most charitable interpretation of this footnote is that it was injected by the law review editors or supervisors.

[3] See “Significance Levels are Made a Whipping Boy on Climate Change Evidence: Is .05 Too Strict? (Schachtman on Oreskes)” (Jan. 4, 2015).

[4] In addition to the unfair attack on Professor Makuch, see supra, n.1, there is much that some will find “disturbing,” “misleading,” and even “ludicrous,” (some of Greenland’s favorite pejorative adjectives) in the article. Greenland repeats in brief his arguments against the legal system’s use of probabilities of causation[4], which I have addressed elsewhere.

[5] One of Baxter’s expert witnesses appeared to be the late Professor Patricia Buffler.

[6] See 39 Wake Forest Law Rev. at 294-95, citing Baxter Healthcare Corp. v. Denton, No. 99CS00868, 2002 WL 31600035, at *1 (Cal. App. Dep’t Super. Ct. Oct. 3, 2002) (unpublished); Baxter Healthcare Corp. v. Denton, 120 Cal. App. 4th 333 (2004)

[7] Although Greenland cites to a transcript, the citation is to a judicial opinion, and the actual transcript of testimony is not available at the citation given.

[8] See Denton, supra.

[9] 39 Wake Forest L. Rev. at 297.

[10] 39 Wake Forest L. Rev. at 305 (“If it is necessary to prove causation ‛beyond a reasonable doubt’–or be ‛compelled to give up the null’ – then action can be forestalled forever by focusing on any aspect of available evidence that fails to conform neatly with the causal (alternative) hypothesis. And in medical and social science there is almost always such evidence available, not only because of the ‛play of chance’ (the focus of ordinary statistical theory), but also because of the numerous validity problems in human research.”).

[11] See Peter Green, “Letter from the President to the Lord Chancellor regarding the use of statistical evidence in court cases” (Jan. 23, 2002) (writing on behalf of The Royal Statistical Society; “Although many scientists have some familiarity with statistical methods, statistics remains a specialised area. The Society urges you to take steps to ensure that statistical evidence is presented only by appropriately qualified statistical experts, as would be the case for any other form of expert evidence.”).

[12] 39 Wake Forest Law Rev. at 291 (“In reality, there is no universally accepted method for inferring presence or absence of causation from human observational data, nor is there any universally accepted method for inferring probabilities of causation (as courts often desire); there is not even a universally accepted definition of cause or effect.”).

[13] 39 Wake Forest Law Rev. at 302-03 (“If one is more concerned with explaining associations scientifically, rather than with mechanical statistical analysis, evidence about validity can be more important than statistical results.”).


Fixodent Study Causes Lockjaw in Plaintiffs’ Counsel

February 4th, 2015

Litigation Drives Science

Back in 2011, the Fixodent MDL Court sustained Rule 702 challenges to plaintiffs’ expert witnesses. “Hypotheses are verified by testing, not by submitting them to lay juries for a vote.” In re Denture Cream Prods. Liab. Litig., 795 F. Supp. 2d 1345, 1367 (S.D.Fla.2011), aff’d, Chapman v. Procter & Gamble Distrib., LLC, 766 F.3d 1296 (11th Cir. 2014). The Court found that the plaintiffs had raised a superficially plausible hypothesis, but that they had not verified the hypothesis by appropriate testing[1].

Like dentures to Fixodent, the plaintiffs stuck to their claims, and set out to create the missing evidence. Plaintiffs’ counsel contracted with Dr. Salim Shah and his companies Sarfez Pharmaceuticals, Inc. and Sarfez USA, Inc. (“Sarfez”) to conduct human research in India, to support their claims that zinc in denture cream causes neurological damage[2]. In re Denture Cream Prods. Liab. Litig., Misc. Action 13-384 (RBW), 2013 U.S. Dist. LEXIS 93456, *2 (D.D.C. July 3, 2013). When the defense learned of this study, and of plaintiffs’ counsel’s payments of over $300,000 to support it, they sought discovery of raw data, study protocol, statistical analyses, and other materials from plaintiffs’ counsel. Plaintiffs’ counsel protested that they did not have all the materials, and directed defense counsel to Sarfez. Although other courts have made counsel produce similar materials from the scientists and independent contractors they engaged, in this case, defense counsel followed the trail of documents to the contractor, Sarfez, with subpoenas in hand. Id. at *3-4.

The defense served a Rule 45 subpoena on Sarfez, which produced some, but not all, responsive documents. Procter & Gamble pressed for the missing materials, including study protocols, analytical reports, and raw data. Id. at *12-13. Judge Reggie Walton upheld the subpoena, which sought underlying data and non-privileged correspondence, to be within the scope of Rules 26(b) and 45, and not unduly burdensome. Id. at *9-10, *20. Sarfez attempted to argue that the requested materials, listed as email attachments, might not exist, but Judge Walton branded the suggestion “disingenuous.” Attachments to emails should be produced along with the emails. Id. at *12 (citing and collecting cases). Although Judge Walton did not grant a request for forensic recovery of hard-drive data or for sanctions, His Honor warned Sarfez that it might be required to bear the cost of forensic data recovery if it did not comply with the court’s order. Id. at *15, *22.

Plaintiffs Put Their Study Into Play

The study at issue in the subpoena was designed by Frederick K. Askari, M.D., Ph.D., an associate professor of hepatology, in the University of Michigan Health System. In re Denture Cream Prods. Liab. Litig., No. 09–2051–MD, 2015 WL 392021, at *7 (S.D. Fla. Jan. 28, 2015). At the instruction of plaintiffs’ counsel, Dr. Askari sought to study the short-term effects of Fixodent on copper absorption in humans. Working in India, Askari conducted the study on 24 participants, who were given a controlled diet for 36 days. Of the 24 participants, 12, randomly selected, received 12 grams of Fixodent per day (containing 204 mg. of zinc). Another six participants, randomly selected, were given zinc acetate, three times per day (150 mg of zinc), and the remaining six participants received placebo, three times per day.

A study protocol was approved by an independent group[3], id. at *9, and the study was supposed to be conducted with a double blind. Id. at *7. Not surprisingly, those participants who received doses of Fixodent or zinc acetate had higher urinary levels of zinc (pee < 0.05). The important issue, however, was whether the dietary zinc levels affect copper excretion in a way that would support plaintiffs’ claims that copper levels were lowered sufficiently by Fixodent to cause a syndromic neurological disorder. The MDL Court ultimately concluded that plaintiffs’ expert witnesses’ opinions on general causation claims were not sufficiently supported to satisfy the requirements of Rule 702, and upheld defense challenges to those expert witnesses. In doing so, the MDL Court had much of interest to say about case reports, weight of the evidence, and other important issues. This post, however, concentrates on the deviations of one study, commissioned by plaintiffs’ counsel, from the scientific standard of care. The Askari “research” makes for a fascinating case study of how not to conduct a study in a litigation caldron.

Non-Standard Deviations

The First Deviation – Changing the Ascertainment Period After the Data Are Collected

The protocol apparently identified a primary endpoint to be:

“the mean increase in [copper 65] excretion in fecal matter above the baseline (mg/day) averaged over the study period … to test the hypothesis that the release of [zinc] either from Fixodent or Zinc Acetate impairs [copper 65] absorption as measured in feces.”

The study outcome, on the primary end point, was clear. The plaintiffs’ testifying statistician, Hongkun Wang, stated in her deposition that the fecal copper (whether isotope Cu63 or Cu65) was not different across the three groups (Fixodent, zinc acetate, and placebo). Id. at *9[4]. Even Dr. Askari himself admitted that the total fecal copper levels were not increased in the Fixodent group compared with the placebo control group. Id. at *9.[5]

Apparently after obtaining the data, and finding no difference in the pre-specified end point of average fecal copper levels between Fixodent and placebo groups, Askari turned to a new end point, measured in a different way, not described in the protocol as the primary end point.

The Second Deviation – Changing Primary End Point After the Data Are Collected

In the early (days 3, 4, and 5) and late (days 31, 32, and 33) parts of the Study, participants received a dose of purified copper 65[6] to help detect the “blockade of copper.” Id. at *8. The participants’ fecal copper 65 levels were compared to their naturally occurring copper 63 levels. According to Dr. Askari:

“if copper is being blocked in the Fixodent and zinc acetate test subjects from exposure to the zinc in the test product (Fixodent) and positive control (zinc acetate), the ratio of their fecal output of copper 65 as compared to their fecal output of copper 63 would increase relative to the control subjects, who were not dosed with zinc. In short, a higher ratio of copper 65 to copper 63 reflects blocking of copper.”

Id.

Askari analyzed the ratio of two copper isotopes (Cu65 /Cu63), in the limited period of observation to study days 31 to 33. Id. at *9. Askari thus changed the outcome to be measured, the timing of the measurement, and manner of measurement (average over entire period versus amount on days 31 to 33). On this post hoc, non-prespecified end point, Askari claimed to have found “significant” differences.
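Dr. Askari’s reasoning about the isotope ratio can be illustrated with a toy calculation. The sketch below, in Python, is a deliberately simplified model and not the study’s actual method: it assumes that fecal copper is natural copper (about 31 percent Cu65 and 69 percent Cu63) plus whatever fraction of the Cu65 tracer dose went unabsorbed. The function names and the input amounts are hypothetical.

```python
NATURAL_CU65_FRACTION = 0.31  # approximate natural abundance of copper 65
NATURAL_CU63_FRACTION = 0.69  # approximate natural abundance of copper 63


def fecal_isotope_ratio(natural_copper_mg, tracer_cu65_mg, fraction_unabsorbed):
    """Cu65/Cu63 ratio in a stool sample under the simplified model.

    natural_copper_mg: natural (unenriched) copper excreted, in mg (hypothetical)
    tracer_cu65_mg: dose of purified Cu65 tracer, in mg (hypothetical)
    fraction_unabsorbed: share of the tracer that passes through unabsorbed
    """
    if natural_copper_mg <= 0:
        raise ValueError("natural copper must be positive")
    if not 0.0 <= fraction_unabsorbed <= 1.0:
        raise ValueError("fraction_unabsorbed must be between 0 and 1")
    cu65 = natural_copper_mg * NATURAL_CU65_FRACTION + tracer_cu65_mg * fraction_unabsorbed
    cu63 = natural_copper_mg * NATURAL_CU63_FRACTION
    return cu65 / cu63


# Hypothetical amounts: the more tracer that goes unabsorbed, the higher the ratio.
baseline = fecal_isotope_ratio(1.0, 0.0, 0.0)
less_blocked = fecal_isotope_ratio(1.0, 1.0, 0.4)
more_blocked = fecal_isotope_ratio(1.0, 1.0, 0.8)
print(f"{baseline:.3f} {less_blocked:.3f} {more_blocked:.3f}")
```

With no tracer, the ratio sits near the natural value of about 0.45. Under this model the ratio rises as more of the tracer passes through unabsorbed, which is the direction of effect Dr. Askari described. Whether that ratio was a legitimate primary end point is a separate question, one that turns on when the end point was chosen.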

The MDL Court expressed its skepticism and concern over the difference between the protocol’s specified end point, and one that came into the study only after the data were obtained and analyzed. The plaintiffs claimed that it was their (and Askari’s) intention from the initial stages of designing the Fixodent Blockade Study to use the Cu65/Cu63 ratio as the primary end point. According to the plaintiffs, the isotope ratio was simply better articulated and “clarified” as the primary end point in the final report than it was in the protocol. The Court was not amused or assuaged by the plaintiffs’ assurances. The study sponsor, Dr. Salim Shah could not point to a draft protocol that indicated the isotope ratio as the end point; nor could Dr. Shah identify a request for this analysis by Wang until after the study was concluded. Id. at *9.[7]

Ultimately, the Court declared that whether the protocol was changed post hoc after the primary end point provided disappointing analysis, or the isotope ratio was carelessly omitted from the protocol, the design or conduct of the study was “incompatible with reliable scientific methodology.”

The Third Deviation – Changing the Standard of “Significance” After the Data Are Collected and P-Values Are Computed

The protocol for the Blockade study called for a pre-determined Type I error rate (significance level) of no more than 5 percent.[8] Id. at *10. The difference in the isotope ratio showed an attained level of significance probability of 5.7 percent, and thus even the post hoc end point missed the prespecified level of significance. The final protocol changed the value of “significance” to 10 percent, to permit the plaintiffs to declare a “statistically significant” result. Dr. Wang admitted in deposition that she doubled the acceptable level of Type I error only after she obtained the data and calculated the p-value of 0.057. Id. at *10.[9]

The Court found that this deliberate moving of the statistical goal post reflected a “lack of objectivity and reliability,” which smacked of contrivance[10].

The Court found that the study’s deviations from the protocol demonstrated a lack of objectivity. The inadequacy of the Study’s statistical analysis plan supported the Court’s conclusion that Dr. Askari’s supposed finding of a “statistically significant” difference in fecal copper isotope ratio between Fixodent and placebo group participants was “not based on sufficiently reliable and objective scientific methodology” and thus could not support plaintiffs’ expert witnesses’ general causation claims.
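The arithmetic behind the moved goal post is simple enough to sketch in Python. The thresholds and the p-value below are the ones reported in the decision; the simulation is an illustrative assumption, not an analysis of the study’s data. It relies on the standard fact that, under a true null hypothesis, the p-value of a continuous test is uniformly distributed between 0 and 1.

```python
import random

PRESPECIFIED_ALPHA = 0.05  # significance level in the protocol, per the decision
POST_HOC_ALPHA = 0.10      # level adopted after the p-value was computed
OBSERVED_P = 0.057         # p-value reported for the isotope-ratio end point


def is_significant(p_value, alpha):
    """Declare significance when the p-value falls below the threshold."""
    return p_value < alpha


def simulated_type_i_error(n_trials, seed=0):
    """Share of null results declared 'significant' by an analyst who keeps
    alpha = 0.05 when that works and relaxes it to 0.10 when it does not."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        p = rng.random()  # p-value drawn under a true null (uniform on [0, 1))
        if is_significant(p, PRESPECIFIED_ALPHA) or is_significant(p, POST_HOC_ALPHA):
            hits += 1
    return hits / n_trials


print(is_significant(OBSERVED_P, PRESPECIFIED_ALPHA))  # False
print(is_significant(OBSERVED_P, POST_HOC_ALPHA))      # True
print(round(simulated_type_i_error(100_000), 2))
```

An analyst who reserves the option of relaxing the threshold after seeing the p-value is, in effect, testing at 10 percent all along. The simulated false-positive rate comes out near 0.10, not the 0.05 promised in the protocol.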

The Fourth Deviation – Failing to Take Steps to Preserve the Blind

The protocol called for a double-blinded study, with neither the participants nor the clinical investigators knowing which participant was in which group. Rather than delivering the three different groups capsules that looked similar, the three groups each received starkly different-looking capsules. Id. at *11. The capsules for one set were apparently so large that the investigators worried whether the participants would comply with the dosing regimen.

The Fifth Deviation – Failing to Take Steps to Keep Biological Samples From Becoming Contaminated

Documents and emails from Dr. Shah acknowledged that there had been “difficulties in storing samples at appropriate temperature.” Id. at *11. Fecal samples were “exposed to unfrozen and undesirable temperature conditions.” Dr. Shah called for remedial steps from the Study manager, but there was no documentation that such steps were taken to correct the problem. Id.

The Consequences of Discrediting the Study

Dr. Askari opined that the Study, along with other evidence, shows that Fixodent can cause copper deficiency myeloneuropathy (“CDM”). The plaintiffs, of course, argued that the Defendants’ criticisms of the Fixodent Study’s methodology went merely to the “weight rather than admissibility.” Id. at *9. Askari’s study was but one leg of the stool, but the defense’s thorough discrediting of the study was an important step in collapsing the support for the plaintiffs’ claims. As the MDL Court explained:

“The Court cannot turn a blind eye to the myriad, serious methodological flaws in the Fixodent Blockade Study and conclude they go to weight rather than admissibility. While some of these flaws, on their own, may not be serious enough to justify exclusion of the Fixodent Blockade Study; taken together, the Court finds Fixodent Blockade Study is not “good science,” and is not admissible. Daubert, 509 U.S. at 593 (internal quotation marks and citation omitted).”

Id. at *11.

A study, such as the Fixodent Blockade Study, is not itself admissible, but the deconstruction of the study upon which plaintiffs’ expert witnesses relied, led directly to the Court’s decision to exclude those witnesses. The Court omitted any reference to Federal Rule of Evidence 703, which addresses the requirements of facts and data, otherwise inadmissible, which may be relied upon by expert witnesses in reaching their opinions.

[1] See “Philadelphia Plaintiff’s Claims Against Fixodent Prove Toothless” (May 2, 2012); Jacoby v. Rite Aid Corp., 2012 Phila. Ct. Com. Pl. LEXIS 208 (2012), aff’d, 93 A.3d 503 (Pa. Super. 2013); “Pennsylvania Superior Court Takes The Bite Out of Fixodent Claims” (Dec. 12, 2013).

[2] See “Using the Rule 45 Subpoena to Obtain Research Data” (July 24, 2013)

[3] The group was identified as the Ethica Norma Ethical Committee.

[4] Citing Wang Dep. at 56:7–25 (Aug. 13, 2013), and Wang Analysis of Fixodent Blockade Study [ECF No. 2197–56] (noting “no clear treatment effect on Cu63 or Cu65”).

[5] Askari Dep. at 69:21–24, June 20, 2013.

[6] Copper 65 is not a typical tracer; it is not radioactive. Naturally occurring copper consists almost exclusively of two stable (non-radioactive) isotopes, Cu65 about 31 percent, Cu63 about 69 percent. See, e.g., Manuel Olivares, Bo Lönnerdal, Steve A Abrams, Fernando Pizarro, and Ricardo Uauy, “Age and copper intake do not affect copper absorption, measured with the use of 65Cu as a tracer, in young infants,” 76 Am. J. Clin. Nutr. 641 (2002); T.D. Lyon, et al., “Use of a stable copper isotope (65Cu) in the differential diagnosis of Wilson’s disease,” 88 Clin. Sci. 727 (1995).

[7] Shah Dep. at 87:12–25; 476:2–536:12, 138:6–142:12 (June 5, 2013).

[8] The reported decision leaves unclear how the analysis would proceed, whether by ANOVA for the three groups, or t-tests, and whether there was multiple testing.

[9] Wang Dep. at 151:13–152:7; 153:15–18.

[10] 2015 WL 392021, at *10, citing Perry v. United States, 755 F.2d 888, 892 (11th Cir. 1985) (“A scientist who has a formed opinion as to the answer he is going to find before he even begins his research may be less objective than he needs to be in order to produce reliable scientific results.”); Rink v. Cheminova, Inc., 400 F.3d 1286, 1293 n. 7 (11th Cir.2005) (“In evaluating the reliability of an expert’s method … a district court may properly consider whether the expert’s methodology has been contrived to reach a particular result.” (alteration added)).


Zoloft MDL Relieves Matrixx Depression

January 30th, 2015

When the Supreme Court delivered its decision in Matrixx Initiatives, Inc. v. Siracusano, 131 S. Ct. 1309 (2011), a colleague, David Venderbush from Alston & Bird LLP, and I wrote a Washington Legal Foundation Legal Backgrounder, in which we predicted that plaintiffs’ counsel would distort the holding, and inflate the dicta of the opinion. Schachtman & Venderbush, “Matrixx Unbounded: High Court’s Ruling Needlessly Complicates Scientific Evidence Principles,” 26 (14) Legal Backgrounder (June 17, 2011)[1]. Our prediction was, sadly, all too accurate. Not only was the context of Matrixx distorted, but several district courts appeared to adopt the dicta on statistical significance as though it represented the holding of the case[2].

The Matrixx decision, along with the few district court opinions that had embraced its dicta[3], was urged as the basis for denying a defense challenge to the proffered testimony of Dr. Anick Bérard, a Canadian perinatal epidemiologist, in the Zoloft MDL. The trial court, however, correctly discerned several methodological shortcomings and failures, including Dr. Bérard’s reliance upon claims of statistical significance from studies that conducted dozens and hundreds of multiple comparisons. See In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., MDL No. 2342; 12-md-2342, 2014 U.S. Dist. LEXIS 87592; 2014 WL 2921648 (E.D. Pa. June 27, 2014) (Rufe, J.).
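Why multiplicity undermines nominal claims of significance can be shown with a line of arithmetic. The sketch below assumes, for illustration only, that the comparisons are independent and that each is tested at the conventional 0.05 level; neither assumption comes from the Zoloft record, and the function name is hypothetical.

```python
def familywise_error_rate(num_comparisons: int, alpha: float = 0.05) -> float:
    """Chance of at least one nominally significant result when every null is true.

    Assumes independent comparisons, each tested at level alpha (illustrative only).
    """
    return 1.0 - (1.0 - alpha) ** num_comparisons


if __name__ == "__main__":
    # Print the chance of at least one false alarm for several study sizes.
    for m in (1, 12, 100):
        print(f"{m:>3} comparisons: {familywise_error_rate(m):.3f}")
```

Under these assumptions, a dozen comparisons carry roughly a 46 percent chance of at least one nominally "significant" finding by chance alone, and a hundred comparisons make such a finding nearly certain.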

Plaintiffs (through their Plaintiffs’ Steering Committee (PSC) in the Zoloft MDL) were undaunted and moved for reconsideration, asserting that the MDL trial court had failed to give appropriate weight to the Supreme Court’s decision in Matrixx, and a Third Circuit decision in DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941 (3d Cir. 1990). The MDL trial judge, however, deftly rebuffed the plaintiffs’ use of Matrixx, and their attempt to banish consideration of random error in the interpretation of epidemiologic studies. In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., MDL No. 2342; 12-md-2342, 2015 WL 314149 (E.D. Pa. Jan. 23, 2015) (Rufe, J.) (denying PSC’s motion for reconsideration).

In rejecting the motion for reconsideration, the Zoloft MDL trial judge noted that the PSC had previously cited Matrixx, and that the Court had addressed the case in its earlier ruling. 2015 WL 314149, at *2-3. The MDL Court then proceeded to expand upon its earlier ruling, and to explain how Matrixx was largely irrelevant to the Rule 702 context of Pfizer’s challenge to Dr. Bérard. There were, to be sure, some studies with nominal statistically significant results, for some birth defects, among children of mothers who took Zoloft in their first trimester of pregnancy. As Judge Rufe explained, statistical significance, or the lack thereof, was only one item in a fairly long list of methodological deficiencies in Dr. Bérard’s causation opinions:

“The [original] opinion set forth a detailed and multi-faceted rationale for finding Dr. Bérard’s testimony unreliable, including her inattention to the principles of replication and statistical significance, her use of certain principles and methods without demonstrating either that they are recognized by her scientific community or that they should otherwise be considered scientifically valid, the unreliability of conclusions drawn without adequate hypothesis testing, the unreliability of opinions supported by a ‛cherry-picked’ sub-set of research selected because it was supportive of her opinions (without adequately addressing non-supportive findings), and Dr. Bérard’s failure to reconcile her currently expressed opinions with her prior opinions and her published, peer-reviewed research. Taking into account all these factors, as well as others discussed in the Opinion, the Court found that Dr. Bérard departed from well-established epidemiological principles and methods, and that her opinion on human causation must be excluded.”

Id. at *1.

In citing the multiple deficiencies of the proffered expert witness, the Zoloft MDL Court thus put its decision well within the scope of the Third Circuit’s recent precedent of affirming the exclusion of Dr. Bennet Omalu, in Pritchard v. Dow Agro Sciences, 430 F. App’x 102, 104 (3d Cir.2011). The Zoloft MDL Court further defended its ruling by pointing out that it had not created a legal standard requiring statistical significance, but rather had made a factual finding that epidemiologists, such as the challenged witness, Dr. Anick Bérard, would use some measure of statistical significance in reaching conclusions in their discipline. 2015 WL 314149, at *2[4].

On the plaintiffs’ motion for reconsideration, the Zoloft Court revisited the Matrixx case, properly distinguishing the case as a securities fraud case about materiality of non-disclosed information, not about causation. 2015 WL 314149, at *4. Although the MDL Court could and should have identified the Matrixx language as clearly obiter dicta, it did confidently distinguish the Supreme Court holding about pleading materiality from its own task of gatekeeping expert witness testimony on causation in a products liability case:

“Because the facts and procedural posture of the Zoloft MDL are so dissimilar from those presented in Matrixx, this Court reviewed but did not rely upon Matrixx in reaching its decision regarding Dr. Bérard. However, even accepting the PSC’s interpretation of Matrixx, the Court’s Opinion is consistent with that ruling, as the Court reviewed Dr. Bérard’s methodology as a whole, and did not apply a bright-line rule requiring statistically significant findings.”

Id. at *4.

In mounting their challenge to the MDL Court’s earlier ruling, the Zoloft plaintiffs asserted that the Court had failed to credit Dr. Bérard’s reliance upon what Dr. Bérard called the “Rothman approach.” This approach, attributed to Professor Kenneth Rothman, had received some attention in the Bendectin litigation in the Third Circuit, where plaintiffs sought to be excused from their failure to show statistically significant associations when claiming causation between maternal use of Bendectin and infant birth defects. DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941 (3d Cir. 1990). The Zoloft MDL Court pointed out that the Circuit, in DeLuca, had never affirmatively endorsed Professor Rothman’s “approach,” but had reversed and remanded the Bendectin case to the district court for a hearing under Rule 702:

“by directing such an overall evaluation, however, we do not mean to reject at this point Merrell Dow’s contention that a showing of a .05 level of statistical significance should be a threshold requirement for any statistical analysis concluding that Bendectin is a teratogen regardless of the presence of other indicia of reliability. That contention will need to be addressed on remand. The root issue it poses is what risk of what type of error the judicial system is willing to tolerate. This is not an easy issue to resolve and one possible resolution is a conclusion that the system should not tolerate any expert opinion rooted in statistical analysis where the results of the underlying studies are not significant at a .05 level.”

2015 WL 314149, at *4 (quoting from DeLuca, 911 F.2d at 955). After remand, the district court excluded the DeLuca plaintiffs’ expert witnesses, and granted summary judgment, based upon the dubious methods employed by plaintiffs’ expert witnesses in cherry picking data, recalculating risk ratios in published studies, and ignoring bias and confounding in studies. The Third Circuit affirmed the judgment for Merrell Dow. DeLuca v. Merrell Dow Pharms., Inc., 791 F. Supp. 1042 (D.N.J. 1992), aff’d, 6 F.3d 778 (3d Cir. 1993).

In the Zoloft MDL, the plaintiffs not only offered an erroneous interpretation of the Third Circuit’s precedents in DeLuca, they also failed to show that the “Rothman” approach had become generally accepted in the more than two decades since DeLuca. 2015 WL 314149, at *4. Indeed, the hearing record was quite muddled about what the “Rothman” approach involved, other than glib, vague suggestions that the approach would have countenanced Dr. Bérard’s selective, over-reaching analysis of the extant epidemiologic studies. The plaintiffs did not call Rothman as an expert witness; nor did they offer any of Rothman’s publications as exhibits at the Zoloft hearing. Although Professor Rothman has criticized the overemphasis upon p-values and significance testing, he has never suggested that researchers and scientists should ignore random error in interpreting research data. Nevertheless, plaintiffs attempted to invoke some vague notion of a Rothman approach that would ignore confidence intervals, attained significance probability, multiplicity, bias, and confounding. Ultimately, the MDL Court would have none of it. The Court held that the Rothman Approach (whatever that is), as applied by Dr. Bérard, did not satisfy Rule 702.

The testimony at the Rule 702 hearing on the so-called “Rothman approach” had been sketchy at best. Dr. Bérard protested, perhaps too much, when asked about her having ignored p-values:

“I’m not the only one saying that. It’s really the evolution of the thinking of the importance of statistical significance. One of my professors and also a friend of mine at Harvard, Ken Rothman, actually wrote on it – wrote on the topic. And in his book at the end he says obviously what I just said, validity should not be confused with precision, but the third bullet point, it’s saying that the lack of statistical significance does not invalidate results because sometimes you are in the context of rare events, few cases, few exposed cases, small sample size, exactly – you know even if you start with hundreds of thousands of pregnancies because you are looking at rare events and if you want to stratify by exposure category, well your stratum becomes smaller and smaller and your precision decreases. I’m not the only one saying that. Ken Rothman says it as well, so I’m not different from the others. And if you look at many of the studies published nowadays, they also discuss that as well.”

Notes of Testimony of Dr. Anick Bérard, at 76:21- 77:14 (April 9, 2014). See also Notes of Testimony of Dr. Anick Bérard, at 211 (April 11, 2014) (discussing non-statistically significant findings as a “trend,” and asserting that the lack of a significant finding does not mean that there is “no effect”). Bérard’s invocation of Rothman here is accurate but unhelpful. Rothman and Bérard are not alone in insisting that confidence intervals provide a measure of precision of an estimate, and that we should be careful not to interpret the lack of significance to mean no effect. But the lack of significance cannot be used to interpret data to show an effect.

At the Rule 702 hearing, the PSC tried to bolster Dr. Bérard’s supposed reliance upon the “Rothman approach” in cross-examining Pfizer’s expert witness, Dr. Stephen Kimmel:

“Q. You know who Dr. Rothman is, the epidemiologist?
A. Yes.
Q. You actually took a course from Dr. Rothman, didn’t you?
A. I did when I was a student way back.
Q. He is a well-known epidemiologist, isn’t he?
A. Yes, he is.
Q. He has published this book, Modern Epidemiology. Do you have a copy of this?
A. I do.
Q. Do you – Have you ever read it?
A. I read his earlier edition. I have not read the most recent edition.
Q. There’s two other authors, Sander Greenland and Tim Lash. Do you know either one of them?
A. I know Sander. I don’t know Tim.
Q. Dr. Rothman has some – he has written about confidence intervals and statistical significance for some time, hasn’t he?
A. He has.
Q. Do you agree with him that statistical significance is not a matter of validity. It’s a matter of precision?
A. It’s a matter of – well, confidence intervals are matters of precision. P-values are not.
Q. Okay. I want to put up a table and see if you are in agreement with Dr. Rothman. This is the third edition of Modern Epidemiology. And he has – and ignore my brother’s handwriting. But there is an hypothesized rate ratio under 10-3. It says: p-value function from which one can find all confidence limits for a hypothetical study with a rate ratio estimate of 3.1. Do you see that there?
A. Yes. I don’t see the top of the figure, not that it matters.
Q. I want to make sure. The way I understand this, he is giving us a hypothesis that we have a relative risk of 3.1 and it [presumably a 95% confidence interval] crosses 1, meaning it’s not statistically significant. Is that fair?
A. Well, if you are using a value of .05, yes. And again, if this is a single test and there’s a lot of things that go behind it. But, yes, so this is a total hypothetical.
Q. Yes.
A. I’m sorry. He’s saying here is a hypothetical based on math. And so here is – this is what we would propose.
Q. Yes, I want to highlight what he says about this figure and get your thoughts on it. He says:
The message of figure 10-3 is that the example data are more compatible with a moderate to strong association than with no association, assuming the statistical model used to construct the function is correct.
A. Yes.
Q. Would you agree with that statement?
A. Assuming the statistical model is correct. And the problem is, this is a hypothetical.
Q. Sure. So let’s just assume. So what this means to sort of put some meat on the bone, this means that although we cross 1 and therefore are statistically significant [sic, non-significant], he says the more likely truth here is that there is a moderate to strong effect rather than no effect?
A. Well, you know he has hypothesized this. This is not used in common methods practice in pharmacoepi. Dr. Rothman has lots of ideas but it’s not part of our standard scientific method.”

Notes of Testimony of Dr. Stephen Kimmel, at 126:2 to 128:20.

Nothing very concrete about the “Rothman approach” was put before the MDL Court, either through Dr. Bérard or Dr. Kimmel. There are, however, other instructive aspects to the plaintiff’s counsel’s examination. First, the referenced portion of the text, Modern Epidemiology, is a discussion of p-value functions, not of p-values or of confidence intervals per se. Modern Epidemiology at 158-59 (3d ed. 2008). Dr. Bérard never discussed p-value functions in her report or in her testimony, and Dr. Kimmel testified, without contradiction, that such p-value functions are “not used in common methods practice.” Second, the plaintiff’s counsel never marked and offered the Rothman text as an exhibit for the MDL Court to consider. Third, the cross-examiner first asked about the implication for a hypothetical association, and then, when he wanted to “put some meat on the bone” changed the word used in Rothman’s text, “association,” to “effect.” The word “effect” does not appear in Rothman’s text at the referenced discussion about p-value functions. Fortunately, the MDL Court was not poisoned by the “meat on the bone.”
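For readers unfamiliar with the device, a p-value function simply reports the p-value for every hypothesized value of the rate ratio, not only for the null value of 1. The sketch below is a rough illustration and not a reproduction of the textbook figure: it uses a normal (Wald) approximation on the log scale, and the standard error is an assumed number chosen only so that the 95 percent interval around the quoted estimate of 3.1 crosses 1. All names are hypothetical.

```python
from math import exp, log
from statistics import NormalDist

RR_ESTIMATE = 3.1   # point estimate quoted in the testimony
SE_LOG_RR = 0.75    # ASSUMED standard error of log(RR); not from the source


def p_value(hypothesized_rr: float) -> float:
    """Two-sided p-value for a hypothesized rate ratio (Wald test, log scale)."""
    z = (log(RR_ESTIMATE) - log(hypothesized_rr)) / SE_LOG_RR
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def confidence_limits(level: float = 0.95) -> tuple[float, float]:
    """Limits are the hypothesized rate ratios whose p-value equals 1 - level."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return (exp(log(RR_ESTIMATE) - z * SE_LOG_RR),
            exp(log(RR_ESTIMATE) + z * SE_LOG_RR))


if __name__ == "__main__":
    # Tabulate the function at a few hypothesized rate ratios.
    for rr in (1.0, 2.0, 3.1, 5.0, 10.0):
        print(f"hypothesized RR = {rr:>4}: p = {p_value(rr):.3f}")
    print("95% limits:", confidence_limits())
```

On these assumed inputs, hypothesized rate ratios near 3.1 receive higher p-values than the null value of 1, which is the sense in which the data are "more compatible" with an association than with none. The calculation takes the statistical model as given; it says nothing about bias, confounding, or multiplicity, and so nothing about whether an association reflects an effect.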

The Pit and the Pendulum

Another document glibly referenced but not provided to the MDL Court was the publication of Sir Austin Bradford Hill’s presidential address to the Royal Society of Medicine on causation. The MDL Court acknowledged that the PSC had argued that the emphasis upon statistical significance was contrary to Hill’s work and teaching. 2015 WL 314149, at *5. In the Court’s words:

“the PSC argues that the Court’s finding regarding the importance of statistical significance in the field of epidemiology is inconsistent with the work of Bradford Hill. The PSC points to a 1965 address by Sir Austin Bradford Hill, which it has not previously presented to the Court, except in opening statements of the Daubert hearings.²⁰ The PSC failed to put forth evidence establishing that Bradford Hill’s statement that ‛I wonder whether the pendulum has not swung too far [in requiring statistical significance before drawing conclusions]’ has, in the decades since that 1965 address, altered the importance of statistical significance to scientists in the field of epidemiology.”

Id. This failure, identified by the Court, is hardly surprising. The snippet of a quotation from Hill would not sustain the plaintiffs’ sweeping generalization. The quoted language in context may help to explain why Hill’s paper was not provided:

“I wonder whether the pendulum has not swung too far – not only with the attentive pupils but even with the statisticians themselves. To decline to draw conclusions without standard errors can surely be just as silly? Fortunately I believe we have not yet gone so far as our friends in the USA where, I am told, some editors of journals will return an article because tests of significance have not been applied. Yet there are innumerable situations in which they are totally unnecessary – because the difference is grotesquely obvious, because it is negligible, or because, whether it be formally significant or not, it is too small to be of any practical importance. What is worse the glitter of the t table diverts attention from the inadequacies of the fare. Only a tithe, and an unknown tithe, of the factory personnel volunteer for some procedure or interview, 20% of patients treated in some particular way are lost to sight, 30% of a randomly-drawn sample are never contacted. The sample may, indeed, be akin to that of the man who, according to Swift, ‘had a mind to sell his house and carried a piece of brick in his pocket, which he showed as a pattern to encourage purchasers.’ The writer, the editor and the reader are unmoved. The magic formulae are there.”

Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 299 (1965).

In the Zoloft cases, no expert witness was prepared to state that the disparity was “grotesquely obvious,” or “negligible.” And Bradford Hill’s larger point was that bias and confounding often dwarf considerations of random error, and that there are many instances in which significance testing is unavailing or unhelpful. And in some studies, with large “effect sizes,” statistical significance testing may be beside the point.

Hill’s presidential address to the Royal Society of Medicine commemorated his successes in epidemiology, and we need only turn to Hill’s own work to see how prevalent was his use of measurements of significance probability. See, e.g., Richard Doll & Austin Bradford Hill, “Smoking and Carcinoma of the Lung: Preliminary Report,” Brit. Med. J. 740 (Sept. 30, 1950); Medical Research Council, “Streptomycin Treatment of Pulmonary Tuberculosis,” Brit. Med. J. 769 (Oct. 30, 1948).

Considering the misdirection on Rothman and on Hill, the Zoloft MDL Court did an admirable job in unraveling the Matrixx trap set by counsel. The Court insisted upon parsing the Bradford Hill factors[5], over Pfizer’s objection, despite the plaintiffs’ failure to show “an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance,” which Bradford Hill insisted was the prerequisite for the exploration of the nine factors he set out in his classic paper. Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 295 (1965). Given the outcome, the Court’s questionable indulgence of plaintiffs’ position was ultimately harmless.

[1] See also “The Matrixx – A Comedy of Errors,” and “Matrixx Unloaded,” (Mar. 29, 2011), “The Matrixx Oversold,” and “De-Zincing the Matrixx.”

[2] See “Siracusano Dicta Infects Daubert Decisions” (Sept. 22, 2012).

[3] See, e.g., In re Chantix (Varenicline) Prods. Liab. Litig., 2012 U.S. Dist. LEXIS 130144, at *22 (N.D. Ala. 2012); Cheek v. Wyeth Pharm. Inc., 2012 U.S. Dist. LEXIS 123485 (E.D. Pa. Aug. 30, 2012); In re Celexa & Lexapro Prods. Liab. Litig., ___ F.3d ___, 2013 WL 791780 (E.D. Mo. 2013).

[4] The Court’s reasoning on this point begged the question whether an ordinary clinician, ignorant of the standards, requirements, and niceties of statistical reasoning and inference, would be allowed to testify, unconstrained by any principled epidemiologic reasoning about random or systematic error. It is hard to imagine that Rule 702 would countenance such an end-run around the requirements of sound science.

[5] Adhering to Bradford Hill’s own admonition might have saved the Court the confusion of describing statistical significance as a measure of strength of association. 2015 WL 314149, at *2.


The Lie Detector and Wonder Woman – Quirks and Quacks of Legal History

January 27th, 2015

From 1923, until the United States Supreme Court decided the Daubert case in 1993, Frye was cited as “controlling authority” on questions of the admissibility of scientific opinion testimony and test results. The decision is infuriatingly cryptic and unhelpful as to background or context of the specific case, as well as how it might be applied to future controversies. Of the 669 words, these are typically cited as the guiding “rule” with respect to expert witness opinion testimony:

“Just when a scientific principle or discovery crosses the line between the experimental and demonstrable stages is difficult to define. Somewhere in this twilight zone the evidential force of the principle must be recognized, and while the courts will go a long way in admitting expert testimony deduced from a well recognized scientific principle or discovery, the thing from which the deduction is made must be sufficiently established to have gained general acceptance in the particular field in which it belongs.”

Frye v. United States, 293 F. 1013, 1014 (D.C. Cir. 1923).

As most scholars of evidence realize, the back story of the Frye case is rich and bizarre. The expert witness involved, William Marston, was a lawyer and scientist, who had made advances in a systolic blood pressure cuff to be used as a “lie detector.” Marston was also an advocate of free love and, with his wife and his mistress, the inventor of Wonder Woman and her lasso of truth.

Jill Lepore, a professor of history at Harvard University, has written an historical account of Marston and his colleagues. Jill Lepore, The Secret History of Wonder Woman (N.Y. 2014). More recently, Lepore has written an important law review article on the historical and legal record of the Frye case, which is concealed in the terse 669 words of the Court of Appeals’ opinion. Jill Lepore, “On Evidence: Proving Frye as a Matter of Law, Science, and History,” 124 Yale L.J. 1092 (2015).

Lepore’s history is an important gloss on the Frye case, but her paper points to a larger, more prevalent, chronic problem in the law, which especially afflicts judicial decisions of scientific or technical issues. As an historian, Lepore is troubled, as we all should be, by the censoring, selecting, suppressing, and distorting of facts that go into judicial decisions. From cases and their holdings, lawyers are taught to infer rules that guide their conduct at the bar, and their clients’ conduct and expectations, but everyone requires fair access to the evidence to determine what facts are material to decision.

As Professor Lepore puts it:

“Marston is missing from Frye because the law of evidence, case law, the case method, and the conventions of legal scholarship — together, and relentlessly — hide facts.”

Id. at 1097. Generalizing from Marston and the Frye case, Lepore notes that:

“Case law is like that, too, except that it doesn’t only fail to notice details; it conceals them.”

Id. at 1099.

Lepore documents that Marston’s psychological research was rife with cherry picking and data dredging. Id. at 1113-14. Despite his degree magna cum laude in philosophy from Harvard College, his LL.B. from Harvard Law School (with no particular distinction), and his Ph.D. from Harvard University, Marston was not a rigorous scientist. In exploring the historical and legal record, not recounted in the Frye decision, Lepore’s history provides a wonderful early example of what has become a familiar phenomenon of modern litigation: an expert witness who seeks to achieve acceptance for a dubious opinion or device in the courtroom rather than in the court of scientific opinion. Id. at 1122. The trial judge in Frye’s murder case, Justice McCoy, was an astute judge, and quite modest about his ability to evaluate the validity of Marston’s opinions, but he had more than sufficient perspicacity to discern that Marston’s investigation was “wildly unscientific,” with no control groups. Id. at 1135. The trial record of defense counsel’s proffer, and Justice McCoy’s rulings and comments from the bench, reproduced in Lepore’s article, anticipate and predict much of the scholarship surrounding both the Frye and Daubert cases.

Lepore complains that the important historical record, including Marston’s correspondence with Professor Wigmore, the criminal fraud charges against Marston, and the correspondence of Frye’s lawyers, lies “filed, undigitized” in various archives. Id. at 1150. Although Professor Lepore tirelessly cites to internet URL sources when available, she could have easily made the primary historical materials available for all, using readily available internet technology. Lepore’s main thesis should encourage lawyers and law scholars to look beyond appellate decisions as the data for legal analysis.


The Rhetoric of Playing Dumb on Statistical Significance – Further Comments on Oreskes

January 17th, 2015

As a matter of policy, I leave the comment field turned off on this blog. I don’t have the time or patience to moderate discussions, but that is not to say that I don’t value feedback. Many readers have written, with compliments, concurrences, criticisms, and corrections. Some correspondents have given me valuable suggestions and materials. I believe I can say that aside from a few scurrilous emails, the feedback generally has been constructive, and welcomed.

My last post was on Naomi Oreskes’ opinion piece in the Sunday New York Times[1]. Professor Deborah Mayo asked me for permission to re-post the substance of this post, and to link to the original[2]. Mayo’s blog does allow for comments, and much to my surprise, the posts drew a great deal of attention, links, comment, and twittering. The number and intensity of the comments, as well as the other blog posts and tweets, seemed out of proportion to the point I was trying to make about misinterpreting confidence intervals and other statistical concepts. I suspect that some climate skeptics received my criticisms of Oreskes with a degree of schadenfreude, and that some who criticized me did so because they fear any challenge to Oreskes as a climate-change advocate. So be it. As I made clear in my post, I was not seeking to engage Oreskes on climate change or her judgments on that issue. What I saw in Oreskes’ article was the same rhetorical move made in the courtroom, and in scientific publications, in which plaintiffs or environmentalists attempt to claim a scientific imprimatur for their conclusions without adhering to the rigor required for scientific judgments[3].

Some of the comments about Professor Oreskes caused me to take a look at her recent book, Naomi Oreskes & Erik M. Conway, Merchants of Doubt: How a Handful of Scientists Obscured the Truth on Issues from Tobacco Smoke to Global Warming (N.Y. 2010). Interestingly, much of the substance of Oreskes’ newspaper article comes directly from this book. In the context of reporting on the dispute over the EPA’s meta-analysis of studies on passive smoking and lung cancer, Oreskes addressed the 95 percent issue:

“There’s nothing magic about 95 percent. It could be 80 percent. It could be 51 percent. In Vegas if you play a game with 51 percent odds in your favor, you’ll still come out ahead if you play long enough. The 95 percent confidence level is a social convention, a value judgment. And the value it reflects is one that says that the worst mistake a scientist can make is to fool herself: to think an effect is real when it is not. Statisticians call this a type I error. You can think of it as being gullible, naive, or having undue faith in your own ideas.⁸⁹ To avoid it, scientists place the burden of proof on the person claiming a cause and effect. But there’s another kind of error – type 2 – where you miss effects that are really there. You can think of that as being excessively skeptical or overly cautious. Conventional statistics is set up to be skeptical and avoid type I errors. The 95 percent confidence standard means that there is only 1 chance in 20 that you believe something that isn’t true. That is a very high bar. It reflects a scientific worldview in which skepticism is a virtue, credulity is not.⁹⁰ As one Web site puts it, ‘A type I error is often considered to be more serious, and therefore more important to avoid, than a type II error’.⁹¹ In fact, some statisticians claim that type 2 errors aren’t really errors at all, just missed opportunities.⁹²”

Id. at 156-57 (emphasis added). Oreskes’ statement of the confidence interval, from her book, adds ambiguity by not specifying what the “something” is that you believe but that is not true. Of course, if it is the assumed parameter, then she has made the same error as she did in the Times. Oreskes’ further discussion of the EPA environmental tobacco smoke meta-analysis issue makes her meaning clearer, and her interpretation of statistical significance, less defensible:

“Even if 90 percent is less stringent than 95 percent, it still means that there is a 9 in 10 chance that the observed results did not occur by chance. Think of it this way. If you were nine-tenths sure about a crossword puzzle answer, wouldn’t you write it in?⁹⁴”

Id. Throughout her discussion, Oreskes fails to acknowledge that the p-value assumes the correctness of the null hypothesis in order to assess the strength of the specific data as evidence against the null. As I have pointed out elsewhere, this misinterpretation of significance testing is a rhetorical strategy to evade significance testing, as well as to obscure the role of bias and confounding in accounting for data that differs from an expected value.
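The gap between a significance level and the probability that a finding is "real" can be made concrete with Bayes' rule. The sketch below is illustrative only: the prior probability and the power are assumed inputs, not figures taken from Oreskes, the EPA, or any study, and the function name is hypothetical. It also takes bias and confounding to be absent, which real observational data rarely permit.

```python
def prob_null_given_significant(prior_effect: float, power: float,
                                alpha: float = 0.10) -> float:
    """P(null is true | result is 'significant'), by Bayes' rule.

    alpha is P(significant | null true); power is P(significant | effect real).
    prior_effect is an ASSUMED prior probability that the effect is real.
    """
    false_positive = alpha * (1.0 - prior_effect)
    true_positive = power * prior_effect
    return false_positive / (false_positive + true_positive)


if __name__ == "__main__":
    # Same alpha and power throughout; only the assumed prior changes.
    for prior in (0.5, 0.1, 0.01):
        p = prob_null_given_significant(prior_effect=prior, power=0.8)
        print(f"prior = {prior:<4}: P(null | significant) = {p:.2f}")
```

With the significance level held at 0.10, the chance that a "significant" result arose under the null ranges from about 11 percent to more than 90 percent across these assumed priors. The significance level alone fixes none of these numbers, which is why "a 9 in 10 chance that the observed results did not occur by chance" does not follow from a 90 percent confidence level.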

Oreskes also continues to maintain that a failure to reject the null is playing “dumb” and placing:

“the burden of proof on the victim, rather than, for example, the manufacturer of a harmful product-and we may fail to protect some people who are really getting hurt.”

Id. So again, the same petitio principii as we saw in the Times. Victimhood is exactly what remains to be established. Oreskes cannot assume it, and then criticize time-tested methods that fail to deliver a confirmatory judgment.

There are endnotes in her book, but the authors fail to cite any serious statistics text. The only reference of dubious relevance is another University of Michigan book, Stephen T. Ziliak & Deirdre N. McCloskey, The Cult of Statistical Significance (2008). Enough said[4].

With a little digging, I learned that Oreskes and Conway are science fiction writers, and perhaps we should judge them by literary rather than scientific standards. See Naomi Oreskes & Erik M. Conway, “The Collapse of Western Civilization: A View from the Future,” 142 Dædalus 41 (2013). I do not imply any pejorative judgment of Oreskes for advancing her apocalyptic vision of the future of Earth’s environment as a work of fiction. Her literary work is a worthy thought experiment that has the potential to lead us to accept her precautionary judgments; and at least her publication, in Dædalus, is clearly labeled science fiction.

Oreskes’ future fantasy is, not surprisingly, exactly what Oreskes, the historian of science, now predicts in terms of catastrophic environmental change. Looking back from the future, the science fiction authors attempt to explore the historical origins of the catastrophe, only to discover that it is the fault of everyone who disagreed with Naomi Oreskes in the early 21st century. Heavy blame is laid at the feet of the ancestor scientists (Oreskes’ contemporaries) who insisted upon scientific and statistical standards for inferring conclusions from observational data. Implicit in the science fiction tale is the welcome acknowledgment that science should make accurate predictions.

In Oreskes’ science fiction, these scientists of yesteryear, today’s adversaries of climate-change advocates, were “almost childlike,” in their felt need to adopt “strict” standards, and their adherence to severe tests derived from their ancestors’ religious asceticism. In other words, significance testing is a form of self-flagellation. Lest you think I exaggerate, consider the actual words of Oreskes and Conway:

“In an almost childlike attempt to demarcate their practices from those of older explanatory traditions, scientists felt it necessary to prove to themselves and the world how strict they were in their intellectual standards. Thus, they placed the burden of proof on novel claims, including those about climate. Some scientists in the early twenty-first century, for example, had recognized that hurricanes were intensifying, but they backed down from this conclusion under pressure from their scientific colleagues. Much of the argument surrounded the concept of statistical significance. Given what we now know about the dominance of nonlinear systems and the distribution of stochastic processes, the then-dominant notion of a 95 percent confidence limit is hard to fathom. Yet overwhelming evidence suggests that twentieth-century scientists believed that a claim could be accepted only if, by the standards of Fisherian statistics, the possibility that an observed event could have happened by chance was less than 1 in 20. Many phenomena whose causal mechanisms were physically, chemically, or biologically linked to warmer temperatures were dismissed as “unproven” because they did not adhere to this standard of demonstration.

Historians have long argued about why this standard was accepted, given that it had no substantive mathematical basis. We have come to understand the 95 percent confidence limit as a social convention rooted in scientists’ desire to demonstrate their disciplinary severity. Just as religious orders of prior centuries had demonstrated moral rigor through extreme practices of asceticism in dress, lodging, behavior, and food–in essence, practices of physical self-denial–so, too, did natural scientists of the twentieth century attempt to demonstrate their intellectual rigor through intellectual self-denial.¹⁴ This practice led scientists to demand an excessively stringent standard for accepting claims of any kind, even those involving imminent threats.”

142 Dædalus at 44.

The science fiction piece in Dædalus has now morphed into a short book, which is billed within as a “haunting, provocative work of science-based fiction.” Naomi Oreskes & Erik M. Conway, The Collapse of Western Civilization: A View from the Future (N.Y. 2014). Under the cover of fiction, Oreskes and Conway provide their idiosyncratic, fictional definition of statistical significance, in a “Lexicon of Archaic Terms,” at the back of the book:

“statistical significance The archaic concept that an observed phenomenon could only be accepted as true if the odds of it happening by chance were very small, typically taken to be no more than 1 in 20.”

Id. at 61-62. Of course, in writing fiction, you can make up anything you like. Caveat lector.

[1] See “Playing Dumb on Statistical Significance” (Jan. 4, 2015).

[2] See “Significance Levels are Made a Whipping Boy on Climate Change Evidence: Is .05 Too Strict? (Schachtman on Oreskes)” (Jan. 4, 2015).

[3] See “Rhetorical Strategy in Characterizing Scientific Burdens of Proof” (Nov. 15, 2014).

[4] See “The Will to Ummph” (Jan. 10, 2012).


Rhetorical Strategy in Characterizing Scientific Burdens of Proof

November 15th, 2014

The recent opinion piece by Kevin Elliott and David Resnik exemplifies a rhetorical strategy that idealizes and elevates a burden of proof in science, and then declares it is different from legal and regulatory burdens of proof. Kevin C. Elliott and David B. Resnik, “Science, Policy, and the Transparency of Values,” 122 Envt’l Health Persp. 647 (2014) [Elliott & Resnik]. What is astonishing about this strategy is the lack of support for the claim that “science” imposes such a high burden of proof that we can safely ignore it when making “practical” legal or regulatory decisions. Here is how the authors state their claim:

“Very high standards of evidence are typically expected in order to infer causal relationships or to approve the marketing of new drugs. In other social contexts, such as tort law and chemical regulation, weaker standards of evidence are sometimes acceptable to protect the public (Cranor 2008).”

Id.[1] Remarkably, the authors cite no statute, no case law, and no legal treatise for the proposition that the tort law standard for causation is somehow lower than for a scientific claim of causality. Similarly, the authors cite no support for their claim that regulatory pronouncements are judged under a lower burden. One only need consider the burden a sponsor faces in establishing medication efficacy and safety in a New Drug Application before the Food and Drug Administration. Of course, when agencies engage in assessing causal claims regarding safety, they often act under regulations and guidances that lessen the burden of proof from what would be required in a tort action.[2]

And most important, Elliott and Resnik fail to cite to any work of scientists for the claim that scientists require a greater burden of proof before accepting a causal claim. When these authors’ claims of differential burdens of proof were challenged by a scientist, Dr. David Schwartz, in a letter to the editors, the authors insisted that they were correct, again citing to Carl Cranor, a non-lawyer, non-scientist:

“we caution against equating the standards of evidence expected in tort law with those expected in more traditional scientific contexts. The tort system requires only a preponderance of evidence (> 50% likelihood) to win a case; this is much weaker evidence than scientists typically demand when presenting or publishing results, and confusion about these differing standards has led to significant legal controversies (Cranor 2006).”

Reply to Dr. Schwartz. The only thing the authors added to the discussion was to cite to the same work by Carl Cranor[3], but change the date of the book.

Whence comes the assertion that science has a heavier burden of proof? Elliott and Resnik cite Cranor for their remarkable proposition, and so where did Cranor find support for the proposition at issue here? In his 1993 book, Cranor suggests that we can think of type I and II error rates as “standards of proof,” which begs the question whether they are appropriately used to assess significance or posterior probabilities[4]. Cranor goes so far in his 1993 book as to describe the usual level of alpha as the “95%” rule, and to assert that regulatory agencies require something akin to proof “beyond a reasonable doubt,” when they require two “statistically significant” studies[5]. Thus Cranor’s opinion has its origins in his commission of the transposition fallacy[6].

Cranor has persisted in his fallacious analysis in his later books. In his 2006 book, he erroneously equates the 95% coefficient of statistical confidence with 95% certainty of knowledge[7]. Later in the text, he asserts that agency regulations are written when supported by “beyond a reasonable doubt.[8]”

To be fair, it is possible to find regulators stating something close to what Cranor asserts, but only when they themselves are committing the transposition fallacy:

“Statistical significance is a mathematical determination of the confidence in the outcome of a test. The usual criterion for establishing statistical significance is the p-value (probability value). A statistically significant difference in results is generally indicated by p < 0.05, meaning there is less than a 5% probability that the toxic effects observed were due to chance and were not caused by the chemical. Another way of looking at it is that there is a 95% probability that the effect is real, i.e., the effect seen was the result of the chemical exposure.”

U.S. Dep’t of Labor, Guidance for Hazard Determination for Compliance with the OSHA Hazard Communication Standard (29 CFR § 1910.1200) Section V (July 6, 2007).

And it is similarly possible to find policy wonks expressing similar views. In 1993, the Carnegie Commission published a report in which it tried to explain away junk science as simply the discrepancy in burdens of proof between law and science, but its reasoning clearly points to the Commission’s commission of the transposition fallacy:

“The reality is that courts often decide cases not on the scientific merits, but on concepts such as burden of proof that operate differently in the legal and scientific realms. Scientists may misperceive these decisions as based on a misunderstanding of the science, when in actuality the decision may simply result from applying a different norm, one that, for the judiciary, is appropriate. Much, for instance, has been written about ‘junk science’ in the courtroom. But judicial decisions that appear to be based on ‘bad’ science may actually reflect the reality that the law requires a burden of proof, or confidence level, other than the 95 percent confidence level that is often used by scientists to reject the possibility that chance alone accounted for observed differences.”

The Carnegie Commission on Science, Technology, and Government, Report on Science and Technology in Judicial Decision Making 28 (1993)[9].

Resnik and Cranor’s rhetoric is a commonplace in the courtroom. Here is how the rhetorical strategy plays out in the courtroom. Plaintiffs’ counsel elicits concessions from defense expert witnesses that they are using the “norms” and standards of science in presenting their opinions. Counsel then argue to the finder of fact that the defense experts are wonderful, but irrelevant because the fact finder must decide the case on a lower standard. This stratagem can be found supported by the writings of plaintiffs’ counsel and their expert witnesses[10]. The stratagem also shows up in the writings of law professors who are critical of the law’s embrace of scientific scruples in the courtroom[11].

The cacophony of error, from advocates and commentators, has led the courts into frequent error on the subject. Thus, Judge Pauline Newman, who sits on the United States Court of Appeals for the Federal Circuit, and who was a member of the Committee on the Development of the Third Edition of the Reference Manual on Scientific Evidence, wrote in one of her appellate opinions[12]:

“Scientists as well as judges must understand: ‘the reality that the law requires a burden of proof, or confidence level, other than the 95 percent confidence level that is often used by scientists to reject the possibility that chance alone accounted for observed differences’.”

Reaching back even further into the judiciary’s wrestling with the issue of the difference between legal and scientific standards of proof, we have one of the clearest and clearly incorrect statements of the matter[13]:

“Petitioners demand sole reliance on scientific facts, on evidence that reputable scientific techniques certify as certain. Typically, a scientist will not so certify evidence unless the probability of error, by standard statistical measurement, is less than 5%. That is, scientific fact is at least 95% certain. Such certainty has never characterized the judicial or the administrative process. It may be that the ‘beyond a reasonable doubt’ standard of criminal law demands 95% certainty. Cf. McGill v. United States, 121 U.S.App. D.C. 179, 185 n.6, 348 F.2d 791, 797 n.6 (1965). But the standard of ordinary civil litigation, a preponderance of the evidence, demands only 51% certainty. A jury may weigh conflicting evidence and certify as adjudicative (although not scientific) fact that which it believes is more likely than not. ***”

The 95% certainty appears to derive from 95% confidence intervals, although “confidence” is a technical term in statistics, and it most certainly does not mean the probability of the alternative hypothesis under consideration. Similarly, the probability that is less than 5% is not the probability that the null hypothesis is correct. The United States Court of Appeals for the District of Columbia thus fell for the rhetorical gambit in accepting the strawman that scientific certainty is 95%, whereas civil and administrative law certainty is a smidgeon above 50%.
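What the “95 percent” in a confidence interval does describe can be illustrated with a short simulation. This is a sketch under simplifying assumptions: the data are drawn from a normal distribution with a known standard deviation, and the true mean, sample size, and number of repetitions are hypothetical values picked only for illustration. The confidence coefficient is a property of the interval-generating procedure over many repetitions; it is not a probability attached to any particular hypothesis or any single interval.

```python
import random
from statistics import NormalDist, fmean


def coverage(true_mean=1.0, sigma=2.0, n=25, n_studies=5000, level=0.95, seed=7):
    """Fraction of simulated studies whose confidence interval contains true_mean.

    Each study draws n normal observations with known sigma and builds the
    usual z-interval around the sample mean. All defaults are hypothetical.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for a 95% interval
    half_width = z * sigma / n ** 0.5
    covered = 0
    for _ in range(n_studies):
        sample_mean = fmean(rng.gauss(true_mean, sigma) for _ in range(n))
        # The interval is sample_mean +/- half_width; check whether it traps the truth.
        if abs(sample_mean - true_mean) <= half_width:
            covered += 1
    return covered / n_studies


print(f"share of intervals covering the true mean: {coverage():.3f}")
```

The share of intervals that cover the true value should land close to 0.95, which is all that the confidence coefficient promises. Nothing in the simulation assigns a 95 percent probability to a causal claim, and nothing in it could, because the truth of the claim is fixed in advance rather than estimated.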

We should not be too surprised that courts have erroneously described burdens of proof in the realm of science. Even within legal contexts, judges have a very difficult time articulating exactly how different verbal formulations of the burden of proof translate into probability statements. In one of his published decisions, Judge Jack Weinstein reported an informal survey of judges of the Eastern District of New York, on what they believed were the correct quantizations of legal burdens of proof. The results confirm that judges, who must deal with burdens of proof as lawyers and then as “umpires” on the bench, have no idea of how to translate verbal formulations into mathematical quantities:

U.S. v. Fatico, 458 F.Supp. 388 (E.D.N.Y. 1978). Thus one judge believed that “clear, unequivocal and convincing” required a higher level of proof (90%) than “beyond a reasonable doubt,” and no judge placed “beyond a reasonable doubt” above 95%. A majority of the judges polled placed the criminal standard below 90%.

In running down Elliott, Resnik, and Cranor’s assertions about burdens of proof, all I could find was the commonplace error involved in moving from 95% confidence to 95% certainty. Otherwise, I found scientists declaring that the burden of proof should rest with the scientist who is making the novel causal claim. Carl Sagan famously declaimed, “extraordinary claims require extraordinary evidence[14],” but he appears never to have succumbed to the temptation to provide a quantification of the posterior probability that would cinch the claim.

If anyone has any evidence leading to support for Resnik’s claim, other than the transposition fallacy or the confusion between certainty and coefficient of statistical confidence, please share.

[1] The authors’ citation is to Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice (NY 2008). Professor Cranor teaches philosophy at one of the University of California campuses. He is neither a lawyer nor a scientist, but he does participate with some frequency as a consultant, and as an expert witness, in lawsuits, on behalf of claimants.

[2] See, e.g., In re Agent Orange Product Liab. Litig., 597 F. Supp. 740, 781 (E.D.N.Y. 1984) (Weinstein, J.) (“The distinction between avoidance of risk through regulation and compensation for injuries after the fact is a fundamental one.”), aff’d 818 F.2d 145 (2d Cir. 1987) (approving district court’s analysis), cert. denied sub nom. Pinkney v. Dow Chemical Co., 487 U.S. 1234 (1988).

[3] Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice (NY 2006).

[4] Carl F. Cranor, Regulating Toxic Substances: A Philosophy of Science and the Law at 33-34 (Oxford 1993) (“One can think of α, β (the chances of type I and type II errors, respectively) and 1- β as measures of the ‘risk of error’ or ‘standards of proof.’”). See also id. at 44, 47, 55, 72-76.

[5] Id. (squaring 0.05 to arrive at “the chances of two such rare events occurring” as 0.0025).

[6] Michael D. Green, “Science Is to Law as the Burden of Proof is to Significance Testing: Book Review of Cranor, Regulating Toxic Substances: A Philosophy of Science and the Law,” 37 Jurimetrics J. 205 (1997) (taking Cranor to task for confusing significance and posterior (burden of proof) probabilities). At least one other reviewer was not as discerning as Professor Green and fell for Cranor’s fallacious analysis. Steven R. Weller, “Book Review: Regulating Toxic Substances: A Philosophy of Science and Law,” 6 Harv. J. L. & Tech. 435, 436, 437-38 (1993) (“only when the statistical evidence gathered from studies shows that it is more than ninety-five percent likely that a test substance causes cancer will the substance be characterized scientifically as carcinogenic … to determine legal causality, the plaintiff need only establish that the probability with which it is true that the substance in question causes cancer is at least fifty percent, rather than the ninety-five percent to prove scientific causality”).

[7] Carl F. Cranor, Toxic Torts: Science, Law, and the Possibility of Justice 100 (2006) (incorrectly asserting, without further support, that “[t]he practice of setting α =.05 I call the “95% rule,” for researchers want to be 95% certain that when knowledge is gained [a study shows new results] and the null hypothesis is rejected, it is correctly rejected.”).

[8] Id. at 266.

[9] There were some scientists on the Commission’s Task Force, but most of the members were lawyers.

[10] Jan Beyea & Daniel Berger, “Scientific misconceptions among Daubert gatekeepers: the need for reform of expert review procedures,” 64 Law & Contemporary Problems 327, 328 (2001) (“In fact, Daubert, as interpreted by ‛logician’ judges, can amount to a super-Frye test requiring universal acceptance of the reasoning in an expert’s testimony. It also can, in effect, raise the burden of proof in science-dominated cases from the acceptable “more likely than not” standard to the nearly impossible burden of ‛beyond a reasonable doubt’.”).

[11] Lucinda M. Finley, “Guarding the Gate to the Courthouse: How Trial Judges Are Using Their Evidentiary Screening Role to Remake Tort Causation Rules,” 49 DePaul L. Rev. 335, 348 n. 49 (1999) (“Courts also require that the risk ratio in a study be ‘statistically significant,’ which is a statistical measurement of the likelihood that any detected association has occurred by chance, or is due to the exposure. Tests of statistical significance are intended to guard against what are called ‘Type I’ errors, or falsely ascribing a relationship when there in fact is not one (a false positive).” Finley erroneously ignores the conditioning of the significance probability on the null hypothesis, and she suggests that statistical significance is sufficient for ascribing causality); Erica Beecher-Monas, Evaluating Scientific Evidence: An Interdisciplinary Framework for Intellectual Due Process 42 n. 30, 61 (2007) (“Another way of explaining this is that it describes the probability that the procedure produced the observed effect by chance.”) (“Statistical significance is a statement about the frequency with which a particular finding is likely to arise by chance.”).

[12] Hodges v. Secretary Dep’t Health & Human Services, 9 F.3d 958, 967 (Fed. Cir. 1993) (Newman, J., dissenting) (citing and quoting from the Report of the Carnegie Commission on Science, Technology, and Government, Science and Technology in Judicial Decision Making 28 (1993)).

[13] Ethyl Corp. v. EPA, 541 F.2d 1, 28 n.58 (D.C. Cir.), cert. denied, 426 U.S. 941 (1976).

[14] Carl Sagan, Broca’s Brain: Reflections on the Romance of Science 93 (1979).


THE STANDARD OF APPELLATE REVIEW FOR RULE 702 DECISIONS

November 12th, 2014

Back in the day, some Circuits of the United States Courts of Appeals embraced an asymmetric standard of review of district court decisions concerning the admissibility of expert witness opinion evidence. If the trial court’s decision was to exclude an expert witness, and that exclusion resulted in summary judgment, then the appellate court would take a “hard look” at the trial court’s decision. If the trial court admitted the expert witness’s opinions, and the case proceeded to trial, with the opponent of the challenged expert witness losing the verdict, then the appellate court would take a not-so “hard look” at the trial court’s decision to admit the opinion. In re Paoli RR Yard PCB Litig., 35 F.3d 717, 750 (3d Cir.1994) (Becker, J.), cert. denied, 115 S.Ct.1253 (1995).

In Kumho Tire, the 11th Circuit followed this asymmetric approach, only to have the Supreme Court reverse and render. Unlike the appellate procedure followed in Daubert, the high Court took the extra step of applying the symmetrical standard of review, presumably for the didactic purpose of showing the 11th Circuit how to engage in appellate review. Carmichael v. Kumho Tire Co., 131 F.3d 1433 (11th Cir. 1997), rev’d sub nom. Kumho Tire Co. v. Carmichael, 526 U.S. 137, 158-59 (1999).

If anything is clear from the Kumho Tire decision, it is that courts do not have discretion to apply an asymmetric standard to their evaluation of a challenge, under Federal Rule of Evidence 702, to a proffered expert witness opinion. Justice Stephen Breyer, in his opinion for the Court, in Kumho Tire, went on to articulate the requirement that trial courts must inquire whether an expert witness “employs in the courtroom the same level of intellectual rigor that characterizes the practice of an expert in the relevant field.” Kumho Tire Co. v. Carmichael, 526 U.S. 137, 152 (1999). Again, trial courts do not have the discretion to abandon this inquiry.

The “same intellectual rigor” test may have some ambiguities that make application difficult. For instance, identifying the “relevant” field or discipline may be contested. Physicians traditionally have not been trained in statistical analyses, yet they produce, and rely extensively upon, clinical research, the proper conduct and interpretation of which requires expertise in study design and data analysis. Is the relevant field biostatistics or internal medicine? Given that the validity and reliability of the relied upon studies come from biostatistics, courts need to acknowledge that the rigor test requires identification of the “appropriate” field — the field that produces the criteria or standards of validity and interpretation.

Justice Breyer did grant that trial courts must have some latitude in determining how to conduct their gatekeeping inquiries. Some cases may call for full-blown hearings and post-hearing proposed findings of fact and conclusions of law; some cases may be easily decided upon the moving papers. Justice Breyer’s grant of “latitude,” however, wanders off target:

“The trial court must have the same kind of latitude in deciding how to test an expert’s reliability, and to decide whether or when special briefing or other proceedings are needed to investigate reliability, as it enjoys when it decides whether that expert’s relevant testimony is reliable. Our opinion in Joiner makes clear that a court of appeals is to apply an abuse-of-discretion standard when it ‛review[s] a trial court’s decision to admit or exclude expert testimony’. 522 U. S. at 138-139. That standard applies as much to the trial court’s decisions about how to determine reliability as to its ultimate conclusion. Otherwise, the trial judge would lack the discretionary authority needed both to avoid unnecessary ‛reliability’ proceedings in ordinary cases where the reliability of an expert’s methods is properly taken for granted, and to require appropriate proceedings in the less usual or more complex cases where cause for questioning the expert’s reliability arises. Indeed, the Rules seek to avoid ‛unjustifiable expense and delay’ as part of their search for ‛truth’ and the ‛jus[t] determin[ation]’ of proceedings. Fed. Rule Evid. 102. Thus, whether Daubert ’s specific factors are, or are not, reasonable measures of reliability in a particular case is a matter that the law grants the trial judge broad latitude to determine. See Joiner, supra, at 143. And the Eleventh Circuit erred insofar as it held to the contrary.”

Kumho, 526 U.S. at 152-53.

Now the segue from discretion to fashion the procedural mechanism for gatekeeping review to discretion to fashion the substantive criteria or standards for determining “intellectual rigor in the relevant field” represents a rather abrupt shift. The leap from discretion to fashion procedure to discretion to fashion substantive criteria of validity has no basis in prior law, in linguistics, or in science. For instance, Justice Breyer would be hard pressed to uphold a trial court’s refusal to consider bias and confounding in assessing whether epidemiologic studies established causality in a given case, notwithstanding the careless language quoted above.

The troubling nature of Justice Breyer’s language did not go unnoticed at the time of the Kumho Tire case. Indeed, three of the Justices in Kumho Tire concurred to clarify:

“I join the opinion of the Court, which makes clear that the discretion it endorses—trial-court discretion in choosing the manner of testing expert reliability—is not discretion to abandon the gatekeeping function. I think it worth adding that it is not discretion to perform the function inadequately.”

Kumho Tire Co. v. Carmichael, 526 U.S. 137, 158-59 (1999) (Scalia, J., concurring, with O’Connor, J., and Thomas, J.)

Of course, this language from Kumho Tire really cannot be treated as binding after the statute interpreted, Rule 702, was modified in 2000. The judges of the inferior federal courts have struggled with Rule 702, sometimes more to evade its reach than to perform gatekeeping in an intelligent way. Quotations of passages from cases decided before the statute was amended and revised should be treated with skepticism.

Recently, the Sixth Circuit quoted Justice Breyer’s language about latitude from Kumho Tire, in the Circuit’s decision involving GE Healthcare’s radiographic contrast medium, Omniscan. Decker v. GE Healthcare Inc., 2014 U.S. App. LEXIS 20049, at *29 (6th Cir. Oct. 20, 2014). Although the Decker case is problematic in many ways, the defendant did not challenge general causation between gadolinium and nephrogenic systemic fibrosis, a painful, progressive connective tissue disease, which afflicted the plaintiff. It is unclear exactly what sort of latitude in applying the statute the Sixth Circuit was hoping to excuse.


Expert Witness Mining – Antic Proposals for Reform

November 4th, 2014

Law Reviews and Altered States of Reality

In 2008, Justice Breyer observed wryly that “there is evidence that law review articles have left terra firma to soar into outer space”; and Judge Posner has criticized law review articles for the “silly titles, the many opaque passages, the antic proposals, the rude polemics, [and] the myriad pretentious citations.” In 2010, Justice Scalia, who was a law-review-producing law professor for the University of Virginia for several years, responded to a lawyer’s oral argument, in McDonald v. City of Chicago, by suggesting that the argument had no support in Supreme Court precedent, but the unsupported argument would make the lawyer “the darling of the professoriate.” At the June 2011 Fourth Circuit Judicial Conference, Chief Justice Roberts opined that law reviews are generally not “particularly helpful for practitioners and judges.” In his words:

“Pick up a copy of any law review that you see and the first article is likely to be, you know, the influence of Immanuel Kant on evidentiary approaches in 18th-century Bulgaria, or something, which I’m sure was of great interest to the academic that wrote it, but isn’t of much help to the bar.”

See Debra Cassens Weiss, “Law Prof Responds After Chief Justice Roberts Disses Legal Scholarship” Am. Bar Ass’n J. (July 07, 2011). Lawyers would think the Justices view law review scholarship as a useless but generally harmless activity. Sometimes, however, law review articles can actually be harmful.

Selection Effects in the Retention and Presentation of Expert Witnesses

The complaints about law review scholarship are obviously based upon extremes and travesties. Interestingly, Judge Posner himself has been no slacker when it comes to producing law review articles with “antic proposals.” See, e.g., Richard A. Posner, “An Economic Approach to the Law of Evidence,” 51 Stan. L. Rev. 1477, 1541–42 (1999). In the tradition of non-traditional, rationalist proposals that ignore experience and make up something completely untested, Judge Richard Posner has advocated rule changes that would require lawyers

“to disclose the name of all the experts whom they approached as possible witnesses before settling on the one testifying. This would alert the jury to the problem of ‘witness shopping’.”

Posner, 51 Stan. L. Rev. at 1541. The point of Judge Posner’s radical reform is to alert triers of fact to whether the expert witness testifying is the first, or the umpteenth expert witness interviewed before a suitable opinion had been “procured,” so that the fact finder can draw the “reasonable inference” that the case must be weaker than presented if the party went through so many expert witnesses before coming up with one who would testify in the case. If one party disclosed but one expert witness, the one that actually testified, and the other party disclosed X such witnesses (where X > 1), then the fact finder could find in favor of the first party upon the basis of the so-called reasonable inference.

Posner’s proposal is at best a proxy for accuracy and validity in expert witness opinion testimony, and one for which Posner presents no evidence to support his hoped-for improvement in juridical accuracy. Not only does Judge Posner present no evidence that his proposed reform and suggested inference would be in the least bit reasonable and probative of the truth, he fails to address the obvious incentives that would be created by his proposal. Fearing the prejudicial inference from having consulted with “too many” expert witnesses, lawyers, operating under the Posner Rule, would have strong incentives to go to the expert witness “one-stop-shopping” mall, where they know they can obtain expert witnesses guaranteed to align themselves with the needed litigation positions and claims. The Posner Rule would also give a strong advantage to lawyers more skilled in vetting and selecting expert witnesses, to the detriment of less experienced lawyers. Of course, lawyers who are willing to go shopping at the meretricious mall or to employ a “cleaner” who brokers the selection without footprints might escape the bite of the Posner adverse inference.

Posner’s proposed rule ignores what is at the heart of identifying and selecting expert witnesses to testify. Obviously lawyers must identify potential witnesses with suitable expertise to address the issues raised by the litigation. Database searches, such as PubMed and Google Scholar searches for bio-medical experts, can go a long way towards identifying candidates, but interviews are important as well. Posner would chill lawyers’ effective representation by placing an adverse inference upon their diligence in any contact with the person other than the “one” who will be anointed to be the party’s designated testifier.

Meetings and interviews with prospective expert witnesses are needed to ascertain whether the witness candidate has sufficient time and interest to fulfill the litigation assignment. Expertise in the area is hardly a guarantee that the candidate will be interested in answering the specific questions that are contested in the litigation. The lawyers must also ascertain whether the witness candidate has the stamina, patience, and aptitude for the litigation context. Not all real experts do, and the consequences of engaging an expert who does not have the qualities to make a good expert witness can be disastrous. Witness candidates must also be screened for their communication skills, their appearance, and even basic hygiene. The most brilliant expert who mumbles, or who is unkempt, is useless in litigation.

Lawyers must evaluate witness candidates for conflicts of interest, many of which are unknowable until there is a face-to-face meeting. Does the witness candidate have a significant other or child who works for the litigation industry (plaintiffs’ bar) or for the defendant industry under assault in the litigation at hand? Either way, the candidate may be compromised. Was the candidate mentored by an expert witness on the other side? Is the candidate on an editorial board with the adversary’s witnesses? Is the candidate a close personal friend of the adversaries or their witnesses, such that he will be less than enthusiastic in showing the infirmities of the other side’s positions? Any of these questions could lead to answers that practically disqualify a witness candidate from consideration. Proceeding without such vetting could be catastrophic for the client and counsel. Burdening the vetting process with the threat of an adverse inference is deeply unfair to diligent counsel trying to represent and serve their clients.

And there are yet additional considerations that require exploration with any witness candidate. Expert witnesses are not equally able to deal with adverse authority in the form of a noted scientist who has taken a stand on the litigation issue, or a superficially authoritative author who has published an adverse opinion. As well trained as they might be, some real experts are “sheep,” who are most comfortable following the herd, and not independent thinkers. Not all experts are willing or able to read studies as critically as needed for the litigation situation, which can sometimes be more demanding than the scientific arena. Lawyers charged with retaining expert witnesses must assess their clients’ positions and determine how well their expert witnesses will perform under all the circumstances of the case.

Professor Christopher Robertson proposes an even more radical reform of the law of expert witnesses by removing the selection and control of expert witnesses from parties and their counsel, completely. Robertson would somehow create a pool of expert witnesses on the issues in each case, and assign them to parties in a double-blinded randomized fashion. Christopher Tarver Robertson, “Blind Expertise,” 85 N.Y. Univ. L. Rev. 174, 211 (2010). Aside from depriving litigants of autonomy and control over their cases, this approach has even greater potential for generating false results. How do the expert witnesses come to be retained for this process? Any two expert witnesses may very well come to an incorrect analysis precisely because they do not have the benefit of each other’s report to develop the full range of data to be considered. What if the expert assigned to plaintiff concludes that there is no case, but the expert assigned to the defendant concludes that the plaintiff’s case is meritorious? Normally, plaintiffs’ expert witnesses must file their reports in advance of the defense witnesses, who then have the opportunity to rebut but also the benefit of all the data included. Simultaneous reports risk major omissions of data to be considered on both sides. The adversarial cauldron works to ensure completeness in what data and studies are considered.

Now comes Jonah Gelbach to attempt a probabilistic, theoretical defense of reforms in the Posner-Robertson mold. Jonah B. Gelbach, “Expert Mining and Required Disclosure,” 81 U. Chicago L. Rev. 131 (2014). Professor Gelbach is a well-trained economist, and a recently minted lawyer (Yale 2013), who is now an Associate Professor at the University of Pennsylvania Law School. Gelbach’s experience with the practice of law is limited to working as a law-school intern at David Rosen & Associates, in New Haven, Connecticut, before joining the Penn faculty. His proposals may need to be taken with 100 grains of aspirin.

Although Gelbach disagrees with particulars of the Posner-Robertson proposals, Gelbach joins with them to opine that “[t]o the extent that additional fully disclosed expert testimony increases the fact finder’s information, we can expect a beneficial increase in accuracy.” Gelbach at 133. Gelbach’s dictum, however, is an ipse dixit, and he offers only a limited hypothetical case in which full disclosure of data should be required to solve the problem. And even in his hypothetical case, the disclosure of the identities of the testers is unnecessary to correct the error that Gelbach predicts. Gelbach’s call for the disclosure of consulting expert witnesses introduces only a collateral issue that has nothing to do with the accuracy of the scientific reasoning.

Gelbach analogizes “witness shopping” to data dredging and multiple testing, with a known inflation in the rate of false positive outcomes. If a party directs multiple expert witnesses to conduct single outcome measurements or tests, then that party can recreate the results of multiple testing without having to disclose the number of independent tests. Gelbach’s argument is at its strongest for a simplistic model of a simple measurement, with errors normally distributed, with accuracy of the measurement tied to the outcome of the case. Gelbach at 136. Gelbach analogizes expert witness mining to data mining, and goes so far as to provide a calculation of false positive rates from multiple testing.
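The arithmetic behind the multiple-testing concern is easy to sketch. What follows is the generic textbook calculation, not a reproduction of Gelbach’s own model: it assumes that each of k tests is independent and carries the same false positive rate, alpha, when there is no true effect. The function name and the sample values are hypothetical, chosen only for illustration.

```python
def prob_at_least_one_false_positive(alpha: float, k: int) -> float:
    """Probability that at least one of k independent tests is a false positive.

    Assumes (for illustration only) that every test has the same
    false positive rate `alpha` and that the tests are independent.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if k < 0:
        raise ValueError("k must be non-negative")
    # The chance that all k tests avoid a false positive is (1 - alpha)^k;
    # the complement is the chance of one or more false positives.
    return 1.0 - (1.0 - alpha) ** k


if __name__ == "__main__":
    # Hypothetical numbers of undisclosed tests, each run at alpha = 0.05.
    for k in (1, 2, 5, 10, 20):
        rate = prob_at_least_one_false_positive(0.05, k)
        print(f"{k:>2} tests: {rate:.3f}")
```

Under these assumptions, the inflation is real but it depends entirely on independent re-measurement of the same quantity, which is the feature of the soil hypothetical that most expert witness selection lacks.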

The sort of multiple testing Gelbach condemns is even more obvious when something other than random error is involved. Consider the need of litigants to have chest radiographs interpreted for the presence or absence of a pneumoconiosis in occupational dust disease litigation. Not only is there intra-observer variability, there are potential or known subjective biases in radiograph interpretations. Gelbach need not worry about multiple testing because the need for economic efficiency already encourages many lawyers to employ radiologists who are most biased in favor of their clients’ positions. The bigger problem would be to encourage lawyers to obtain an honest second opinion, which might make them less strident about their litigation positions when discussing possible settlement.

Gelbach appears to believe that mandatory disclosure of the number of expert witnesses hired, as well as the contents of the written and oral reports issued by the party’s nontestifying expert witnesses, is needed to abate the potential harm from “expert mining.” By introducing the probabilistic modeling of Type I and Type II errors, however, Gelbach elevates proofiness over clear thinking about the issue. The simple solution to Gelbach’s soil measurement hypothetical is to require disclosure of all testing data, regardless of whether conducted by expert witnesses designated as testifying or as consulting. All are agents of the party for purposes of creating data in the form of the hypothesized soil measurement. Indeed, Gelbach’s hypothetical envisions a technical laboratory that conducts such measurements, and the lab might not even be associated with a person designated to serve as an expert witness on the litigation issues.

Gelbach’s soil-measurement case is thus, for the most part, a straw-person case. In the vast majority of cases, however, multiple expert witness interviews leading up to selection and retention are not at all like multiple testing in their ability to generate deliberate false positive or false negative opinions. The evidence remains what it is, and the parameter unchanged, whatever the qualitative judgments of the witness candidates. In most litigation contexts, the data upon which the expert witnesses will rely comes from published studies, and not from a single measurement under either side’s control and ability to resample many times through the agency of multiple expert witnesses. The Rules need to help the triers of fact discern the truth, not irrelevant proxies for the truth. If the triers of fact are incompetent to adjudge the actual evidence, then we may need to find triers who are competent.

The extension of the soil hypothetical to all of expert witness opinion testimony is unwarranted. Accuracy and validity of expert opinion are not “independent and identically distributed.” Truth and accuracy in scientific judgment as applied to litigation scientific questions are not random variables with known distributions.

A party may have to comb through dozens of potential expert witnesses before arriving at an expert witness with an appropriate, accurate answer to the litigation issue. When confronted with a pamphlet entitled “100 Authors against Einstein,” Albert Einstein quipped “if I were wrong, one would have been enough.” See Remigio Russo, 18 Mathematical Problems in Elasticity 125 (1996) (quoting Einstein). Legal counsel should not have their clients’ cause compromised because they had the misfortune of consulting the “100 Authors” before arriving at Einstein’s door. The Posner-Robertson-Gelbach proposals all suffer the same flaw: they defer unduly to conformism and ignore the truth, validity, and accuracy of procured opinions.

Disputes in science are resolved with data, from high-quality, reproducible experimental or observational studies, not with appeals to the number of speakers. The number of expert witness candidates who were interviewed or who offered preliminary opinions is irrelevant to the task assigned to the finder of fact in a case involving scientific evidence. The final, proffered opinion of the testifying expert witness is only as good as the evidence and analysis upon which it rests, which, under the current rules, should be fully disclosed.

Posted in Expert Witnesses, Scientific Evidence | Comments Off on Expert Witness Mining – Antic Proposals for Reform

Transparency, Confusion, and Obscurantism

October 31st, 2014

In NIEHS Transparency? We Can See Right Through You (July 10, 2014), I chastised authors Kevin C. Elliott and David B. Resnik for their confusing and confused arguments about standards of proof, the definition of risk, and conflicts of interest (COIs). See Kevin C. Elliott and David B. Resnik, “Science, Policy, and the Transparency of Values,” 122 Envt’l Health Persp. 647 (2014) [Elliott & Resnik]. In their focus on environmentalism and environmental policy, Elliott and Resnik seem intent upon substituting various presumptions, leaps of faith, and unproven extrapolations for actual evidence and valid inference, in the hope of improving the environment and reducing risk to life. But to get to their goal, Elliott and Resnik engage in various equivocations and ambiguities in their use of “risk,” and they compound the muddle by introducing a sliding scale of “standards of evidence,” for legal, regulatory, and scientific conclusions.

Dr. David H. Schwartz is a scientist, who received his doctoral degree in Neuroscience from Princeton University, and his postdoctoral training in Neuropharmacology and Neurophysiology at the Center for Molecular and Behavioral Neuroscience, at Rutgers University. Dr. Schwartz has since gone on to found one of the leading scientific consulting firms, Innovative Science Solutions (ISS), which supports both regulatory and litigation claims and defenses, as may be scientifically appropriate. Given his experience, Dr. Schwartz is well positioned to address the standards of scientific evidentiary conclusions across regulatory, litigation, and scientific communities.

In this month’s issue of Environmental Health Perspectives (EHP), Dr. David Schwartz adds to the criticism of Elliott and Resnik’s tendentious editorial. David H. Schwartz, “Policy and the Transparency of Values in Science,” 122 Envt’l Health Persp. A291 (2014). Schwartz points out that “[a]lthough … different venues or contexts require different standards of evidence, it is important to emphasize that the actual scientific evidence remains constant.” Id.

Dr. Schwartz points out that transparency is needed in how standards and evidence are represented in scientific and legal discourse, and he takes Elliott and Resnik to task for arguing, from ignorance, that litigation burdens are different from scientific standards. At times some writers misrepresent the nature of their evidence, or its weakness, and when challenged, attempt to excuse the laxness in standards by adverting to the regulatory or litigation contexts in which they are speaking. In some regulatory contexts, the burdens of proof are deliberately reduced, or shifted to the regulated industry. In litigation, the standard or burden of proof is rarely different from that of the scientific enterprise itself. As the United States Supreme Court made clear, trial courts must inquire whether an expert witness “employs in the courtroom the same level of intellectual rigor that characterizes the practice of an expert in the relevant field.” Kumho Tire Co. v. Carmichael, 526 U.S. 137, 152 (1999). Expert witnesses who fail to exercise the same intellectual rigor in the courtroom as in the laboratory are eminently disposable or excludable from the legal process.

Schwartz also points out, as I had in my blog post, that “[w]hen using science to inform policy, transparency is critical. However, this transparency should include not only financial ties to industry but also ties to advocacy organizations and other strongly held points of view.”

In their Reply to Dr. Schwartz, Elliott and Resnik concede the importance of non-financial conflicts of interest, but they dig in on the supposedly lower standard for scientific claims in tort litigation:

“we caution against equating the standards of evidence expected in tort law with those expected in more traditional scientific contexts. The tort system requires only a preponderance of evidence (> 50% likelihood) to win a case; this is much weaker evidence than scientists typically demand when presenting or publishing results, and confusion about these differing standards has led to significant legal controversies (Cranor 2006).”

Rather than citing any pertinent or persuasive legal authority, Elliott and Resnik cite an expert witness, Carl Cranor, neither a lawyer nor a scientist, who has worked steadfastly for the litigation industry (the plaintiffs’ bar) on various matters. The “caution” of Elliott and Resnik is directly contradicted by the Supreme Court’s pronouncement in Kumho Tire, and is fueled by an ignoratio elenchi that is based upon a confusion between the coverage probability of confidence intervals under repeated sampling (usually 95%) and the posterior probability of a claim: namely, the probability of the claim given the admissible evidence. As the Reference Manual on Scientific Evidence makes clear, these are very different probabilities, which Cranor and others have consistently confused. Elliott and Resnik ought to know better.
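A small calculation shows why the two probabilities cannot be equated. The sketch below applies Bayes’ theorem to a single “statistically significant” result. The function name and the inputs (a prior probability, the study’s power, and the significance level) are hypothetical values chosen for arithmetic convenience; none comes from Elliott and Resnik, Cranor, or the Reference Manual.

```python
def posterior_given_significant(prior: float, power: float, alpha: float) -> float:
    """Posterior probability that a claim is true, given a significant result.

    prior: probability of the claim before seeing the study (assumed)
    power: probability of a significant result if the claim is true (assumed)
    alpha: probability of a significant result if the claim is false
    """
    for name, value in (("prior", prior), ("power", power), ("alpha", alpha)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    true_positive = prior * power
    false_positive = (1.0 - prior) * alpha
    total = true_positive + false_positive
    if total == 0.0:
        raise ValueError("a significant result is impossible under these inputs")
    # Bayes' theorem: share of significant results that come from true claims.
    return true_positive / total


if __name__ == "__main__":
    # Illustrative inputs only: 10% prior, 80% power, 5% significance level.
    print(f"{posterior_given_significant(0.10, 0.80, 0.05):.2f}")
```

With these illustrative inputs, a result that is significant at the 5% level leaves the claim with a posterior probability of about 0.64, not 0.95. The significance level, or its complement, the confidence level, describes the long-run behavior of the statistical procedure; it does not state the probability that the claim is true.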

Posted in Conflicts of Interest, Scientific Evidence | Comments Off on Transparency, Confusion, and Obscurantism