TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

The Contrivance Standard for Expert Witness Gatekeeping

September 28th, 2014

According to Google Ngram, the phrase “junk science” made its debut circa 1975, lagging junk food by about five years. See “The Rise and Rise of Junk Science” (Mar. 8, 2014). I have never much liked the phrase “junk science” because it suggests that courts need only be wary of the absurd and ridiculous in their gatekeeping function. Some expert witness opinions are, in fact, serious scientific contributions, just not worthy of being advanced as scientific conclusions. Perhaps better than “junk” would be patho-epistemologic opinions, or maybe even wissenschmutz, but even these terms might obscure that the opinion that needs to be excluded derives from serious scientific work; it is simply not ready to be held forth as a scientific conclusion that can colorably be called knowledge.

Another formulation of my term, patho-epistemology, is the Eleventh Circuit’s lovely “Contrivance Standard.” Rink v. Cheminova, Inc., 400 F.3d 1286, 1293 & n.7 (11th Cir. 2005). In Rink, the appellate court held that the district court had acted within its discretion to exclude expert witness testimony because it had properly confined its focus to the challenged expert witness’s methodology, not his credibility:

“In evaluating the reliability of an expert’s method, however, a district court may properly consider whether the expert’s methodology has been contrived to reach a particular result. See Joiner, 522 U.S. at 146, 118 S.Ct. at 519 (affirming exclusion of testimony where the methodology was called into question because an “analytical gap” existed “between the data and the opinion proffered”); see also Elcock v. Kmart Corp., 233 F.3d 734, 748 (3d Cir. 2000) (questioning the methodology of an expert because his “novel synthesis” of two accepted methodologies allowed the expert to ”offer a subjective judgment … in the guise of a reliable expert opinion”).”

Note the resistance, however, to the Supreme Court’s mandate of gatekeeping. District courts must apply the statutes, Rules of Evidence 702 and 703. There is no legal authority for the suggestion that a district court merely “may properly consider whether the expert’s methodology has been contrived.” Rink, 400 F.3d at 1293 n.7 (emphasis added).

Railroading Scientific Evidence of Causation in Court

August 31st, 2014

Harold Tanfield spent 40 years or so working for Consolidated Rail Corporation (and its predecessors), from 1952 to 1992.  Mr. Tanfield’s widow sued Conrail, under the Federal Employers’ Liability Act (“FELA”), 45 U.S.C.A. §§ 51-60, for negligently overexposing her late husband to diesel fumes, which allegedly caused him to develop lung cancer. Tanfield v. Leigh RR, No. A-4170-12T2, New Jersey Superior Court, App. Div. (Aug. 11, 2014), slip op. at 3 [cited below as Tanfield].

The trial court granted Conrail summary judgment on grounds that plaintiff failed to show that Conrail had breached a duty of care.  The appellate court reversed and remanded for trial. The Appellate Division’s decision is “per curiam,” and franked “not for publication without the approval of the Appellate Division.” Only two of the usual three appellate judges participated.  The panel decided the case one week after it was submitted.

The plaintiff relied upon two witnesses: a co-worker of her husband, and an expert witness, Steven R. Tahan, M.D.  Dr. Tahan is a pathologist, an Associate Professor, Department of Pathology, Harvard Medical School, and the Director of Dermatopathology, Beth Israel Deaconess Medical Center.  Dr. Tahan’s website lists melanoma as his principal research interest. A PubMed search reveals no publications on diesel fumes, occupational disease, or lung cancer.  Dr. Tahan’s principal research interest, skin pathology, was decidedly not at issue in the Tanfield case.

The panel of the Appellate Division quoted from the relevant paragraphs of Tahan’s report:

“Mr. Tanfield was a railroad worker for 35 years, where he was exposed to a large number of carcinogenic chemicals and fumes, including asbestos, antimony, arsenic, benzene, beryllium, cadmium, carbon disulfide, cyanide, DDT, diesel fumes, diesel fuel, dioxins, ethylbenzene, lead, methylene chloride, mercury, naphthalene, petroleum hydrocarbon, polychlorinated biphenyls, polynuclear aromatic hydrocarbons, toluene, vinyl acetate, and other volatile organics.

I have reviewed the cytology and biopsy slides from the right lung and confirm that he had a poorly differentiated malignant non-small cell carcinoma with both adenocarcinomatous and squamous features.  I have reached the following conclusions to a reasonable degree of medical certainty based on review of the above materials, my education, training, and experience, and review of published studies.

Mr. Tanfield’s more than 35 year substantial occupational exposure to an extensive array of carcinogens and diesel fumes without provision of protective equipment such as masks, respirators, and other filters created a long-term hazard that substantially multiplied his risk for developing lung cancer over the baseline he had as a former smoker.  It is more likely than not that his occupational exposure to diesel fumes and other carcinogenic toxins present in his workplace was a significant causative factor for his development of lung cancer and death from his cancer.”

Tanfield at 6-7.

Mr. Tanfield’s co-worker testified to what appeared to him to be excessive diesel fumes in the workplace, but there is no mention of any quantitative or qualitative evidence of exposure to any other lung carcinogen.  The Appellate Division states that the above three paragraphs represent the substance of Dr. Tahan’s report, and so it appears that there was no quantification of Tanfield’s smoking history, or of the length of time between his discontinuing smoking and the diagnosis of his lung cancer.  There is no discussion of any support for the alleged interaction between risks, or of any quantification of the extent of his increased risk from his lifestyle choices as opposed to his workplace exposure(s). Nor is there any discussion of what Dr. Tahan visualized in his review of the cytology and pathology slides that permitted him to draw inferences about the actual causes of Mr. Tanfield’s lung cancer.

The trial judge proceeded on the assumption that there was an adequate proffer of expert opinion on causation, but held that Dr. Tahan’s opinions on the failure to provide masks or respirators were a “net opinion,” a bit outside Tahan’s area of expertise.  Tanfield at 8. The Appellate Division apparently thought that having a skin pathologist opine about the duty of care for a railroad was good enough for government work.  The appellate court gave the widow the benefit of the lower evidentiary threshold for negligence under FELA, which supposedly excuses the lack of an industrial hygiene opinion.  Tanfield at 10.  According to the two-judge panel, “[t]he doctor’s [Tahan’s] opinions are backed by professional literature and by his own considerable years of research and experience.” Tanfield at 11.  The panel’s statement is all the more remarkable given that Tahan had never published on lung cancer, exposure assessment, or industrial hygiene measures; the vaunted experience of this witness was irrelevant to the issues in the case. Perhaps even more disturbing are the gaps in the proofs: the lack of causal connection between many of the alleged exposures and lung cancer generally, and the absence of any showing that the level of exposure to diesel fumes, from 1952 to 1992, was such that the railroad knew or should have known that that level of diesel fume caused lung cancer in workers.  And then there is the lurking probability that Mr. Tanfield’s smoking was the sole cause of his lung cancer.

Over 50 years ago, the New York Court of Appeals rejected a claim for leukemia, based upon allegations of benzene exposure, without any quantification of risk from the alleged exposure.  Miller v. National Cabinet Co., 8 N.Y.2d 277, 283-84, 168 N.E.2d 811, 813-15, 204 N.Y.S.2d 129, 132-34, modified on other grounds, 8 N.Y.2d 1025, 70 N.E.2d 214, 206 N.Y.S.2d 795 (1960). It is time to raise the standard for New Jersey courts’ consideration of epidemiologic evidence.

Peer Review, PubPeer, PubChase, and Rule 702 – Candles in the Ear

August 28th, 2014

In deciding the Daubert case, the Supreme Court identified several factors to assess whether “the reasoning or methodology underlying the testimony is scientifically valid and of whether that reasoning or methodology properly can be applied to the facts in issue.” One of those factors was whether the proffered opinion had been “peer reviewed” and published. Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579, 593-94 (1993). The Court explained the publication factor:

“Another pertinent consideration is whether the theory or technique has been subjected to peer review and publication. Publication (which is but one element of peer review) is not a sine qua non of admissibility; it does not necessarily correlate with reliability, and in some instances well-grounded but innovative theories will not have been published. Some propositions, moreover, are too particular, too new, or of too limited interest to be published. But submission to the scrutiny of the scientific community is a component of ‘good science,’ in part because it increases the likelihood that substantive flaws in methodology will be detected. The fact of publication (or lack thereof) in a peer reviewed journal thus will be a relevant, though not dispositive, consideration in assessing the scientific validity of a particular technique or methodology on which an opinion is premised.”

Daubert, 509 U.S. at 593-94 (internal citations omitted). See, e.g., Lust v. Merrell Dow Pharms., Inc., 89 F.3d 594, 597 (9th Cir. 1996) (affirming exclusion of Dr. Alan Done, plaintiffs’ expert witness in a Clomid birth defects case, in part because of the lack of peer review and publication of his litigation-driven opinions); Hall v. Baxter Healthcare Corp., 947 F. Supp. 1387, 1406 (D. Or. 1996) (noting that “the lack of peer review for [epidemiologist] Dr. Swan’s theories weighs heavily against the admissibility of Dr. Swan’s testimony”).

Case law since Daubert has made clear that peer review is neither necessary nor sufficient for the admissibility of an opinion. United States v. Mikos, 539 F.3d 706, 711 (7th Cir. 2008) (noting that the absence of peer-reviewed studies on the subject of bullet grooving did not render an opinion, based upon an FBI database, inadmissible); In re Zoloft Prods. Liab. Litig., MDL No. 2342, No. 12-md-2342, 2014 U.S. Dist. LEXIS 87592, 2014 WL 2921648 (E.D. Pa. June 27, 2014) (excluding proffered testimony of epidemiologist Anick Bérard for arbitrarily selecting some point estimates and ignoring others in published studies).

As Susan Haack has noted, “peer review” has taken on mythic proportions in the adjudication of expert witness opinion admissibility.  Susan Haack, “Peer Review and Publication: Lessons for Lawyers,” 36 Stetson L. Rev. 789 (2007), republished in Susan Haack, Evidence Matters: Science, Proof, and Truth in the Law 156 (2014). Peer review, at best, is a weak proxy for study validity, which is what is really needed in judicial proceedings. Proxies avoid the labor of independent, original thought, and so they are much favored by many judges.

In the past, some litigants oversold peer review as a touchstone of reliable, admissible expert witness testimony, only to find that some very shoddy opinions show up in ostensibly peer-reviewed journals. See “Misplaced Reliance On Peer Review to Separate Valid Science From Nonsense” (Aug. 14, 2011). Scientists often claim that science is “self-correcting,” but in some areas of research, there are few severe tests and little critical review, and mostly glib confirmations from acolytes.

Letters to the editor are sometimes held out as a remedy for peer-review failures, but such letters, which are not themselves peer reviewed, are subject to the whims of imperious editors who might wish to silence the views of those critical of their judgment in publishing the article under discussion. Most journals have space for only a few letters, and unpopular but salient points of view can go unreported. Many scientists will not write letters to the editors, even when the published article is terribly wrong in its methods, data analyses, conclusions, or discussion.  Letters to the editor are often frowned upon in academic circles as not advancing affirmative research and scholarship agendas.

Letters to the editor often must be sent within a short window after initial publication, often too short for busy academics to analyze a paper carefully and comment.  Furthermore, letters are often limited to a few hundred words, a length often inadequate to develop a careful critique or exposition of the issues in the paper.  Moreover, such letters suffer from an additional procedural problem: authors are permitted a response, but the letter writers are not permitted a reply. Authors thus get the last word, which they can often use to deflect or diffuse important criticisms.  The authors’ response can be sufficiently self-serving and misleading, with immunity from further criticism, that many would-be correspondents abandon the project altogether. See, e.g., PubPeer – “Example case showing why letters to the editor can be a waste of time” (Oct. 8, 2013).

Websites and blogs provide for dynamic content, with the potential for critical reviews that can be identified by search engines. See, e.g., Paul S. Brookes, “Our broken academic journal corrections system,” PSBLAB: Cardiac Mitochondrial Research in the Lab (Jan. 14, 2014). Mostly, the internet holds untapped potential for analysis, discussion, and debate on published studies.  To be sure, some journals provide “comment fields,” on their websites, with an opportunity for open discussion.  Often, full critiques must be developed and presented elsewhere. See, e.g., Androgen Study Group, “Letter to JAMA Asking for Retraction of Misleading Article on Testosterone Therapy” (Mar. 25, 2014).

PubPeer

Kate Yandell, in TheScientist, reports on the creation of PubPeer a few years ago, as a forum for post-publication review and discussion of published scientific papers. Kate Yandell, “Concerns Raised Online Linger” (Aug. 25, 2014).  Billing itself as an “online journal club,” PubPeer has pointed out potentially serious problems, some of which have led to retractions and corrections. Another internet site of interest is PubChase, which monitors discussion of particular articles, as well as generating email alerts and recommendations for related articles.

One journal editor has taken notice and given notice that he will not pay attention to post-publication peer review.  Eric J. Murphy, the editor in chief of Lipids, posting a comment at PubPeer, illustrates that there will be a good deal of resistance to post-publication open peer review, out of the control of journal editors:

“As an Editor-in-Chief of a society journal, I have never examined PubPeer nor will I do so. First, there is the crowd or group mentality that may over emphasize some point in an irrational manner.  Just as using the marble theory of officiating is bad, one should never base a decision on the quantity of negative or positive comments. Second, if the concerned individual sent an e-mail or letter to me, then I would be duty bound to examine the issue.  It is not my duty to monitor PubPeer or any other such site, but rather to respond to queries sent to me.  So, with regards to Hugh’s point, I don’t support that position at all.

Mistakes happen, although frankly we try to limit these mistakes and do take steps to prevent publishing papers with FFP, it does happen.  Also, honest mistakes happen in science all the time, so[me] of these result in an erratum, while others go unnoticed by editors and reviewers.  In such a case, someone who does notice should contact the editor to put them on notice regarding the issue so that it may be resolved.  Resolution does not necessarily mean correction, but rather the editor taking a close look at the situation, discussing the situation with the original authors, and then reaching a decision.  Most of the time a correction will be made, but not always.”

Murphy’s comments are remarkable.  PubPeer provides a forum for post-publication comment, but it hardly requires editors, investigators, and consumers of scientific studies to evaluate published works by “nose counts” of favorable and unfavorable comments.  This is not, and never has been, a democratic enterprise.  Somehow, we might expect Murphy and others to evaluate the comments on their merits, not on their prevalence.  Murphy’s declaration that he is duty-bound to investigate and evaluate letters or emails sent to him about published articles is encouraging, but an editor’s ability to ratify a publication, in the face of a private communication, without comment to the scientific community, deprives the community of the chance to make a principled decision on its own.  Murphy’s way, which seems largely the way of contemporary scientific publishing, ignores the important social dimension of scientific debate and the resolution of issues.  Leaving control of the discussion in the hands of the editors who approved and published studies may be asking too much of editors. Nemo iudex in causa sua.

PubPeer has already tested the limits of free speech. Kate Yandell, “PubPeer Threatened with Legal Action” (Aug. 19, 2014). A scientist whose works were receiving unfavorable attention on PubPeer threatened a lawsuit.  Let’s hope that scientists can learn to be sufficiently thick skinned that there can be open discourse of the merits of their research, their data, and their conclusions.

Pritchard v. Dow Agro – Gatekeeping Exemplified

August 25th, 2014

Robert T. Pritchard was diagnosed with Non-Hodgkin’s Lymphoma (NHL) in August 2005; by fall 2005, his cancer was in remission. Mr. Pritchard had been a pesticide applicator, and so, of course, he and his wife sued the deepest pockets around, including Dow Agro Sciences, the manufacturer of Dursban. Pritchard v. Dow Agro Sciences, 705 F.Supp. 2d 471 (W.D.Pa. 2010).

The principal active ingredient of Dursban is chlorpyrifos; the formulation also contains solvents, such as xylene, cumene, and ethyltoluene. Id. at 474.  Dursban was licensed for household insecticide use until 2000, when the EPA phased out certain residential applications.  The EPA’s concern, however, was not carcinogenicity:  the EPA categorizes chlorpyrifos as “Group E,” non-carcinogenic in humans. Id. at 474-75.

According to the American Cancer Society (ACS), the cause or causes of NHL are unknown.  Over 60,000 new cases are diagnosed annually, in people from all walks of life, occupations, and lifestyles. The ACS identifies some risk factors, such as age, gender, race, and ethnicity, but the ACS emphasizes that chemical exposures are not proven risk factors or causes of NHL.  See Pritchard, 705 F.Supp. 2d at 474.

The litigation industry does not need scientific conclusions of causal connections; its business is manufacturing certainty in courtrooms. Or at least the appearance of certainty. The Pritchards found their way to the litigation industry in Pittsburgh, Pennsylvania, in the form of Goldberg, Persky & White, P.C. The Goldberg Persky firm sued Dow Agro, and put the Pritchards in touch with Dr. Bennet Omalu to serve as their expert witness.

Alas, the Pritchards’ lawsuit ran into a wall, or at least a gate, in the form of Federal Rule of Evidence 702. In the capable hands of Judge Nora Barry Fischer, Rule 702 became an effective barrier against weak and poorly considered expert witness opinion testimony.

Dr. Omalu, no stranger to lost causes, was the medical examiner of San Joaquin County, California, at the time of his engagement in the Pritchard case. After careful consideration of the Pritchards’ claims, Omalu prepared a four-page report, with a single citation, to Harrison’s Principles of Internal Medicine.  Id. at 477 & n.6.  This research, however, sufficed for Omalu to conclude that Dursban caused Mr. Pritchard to develop NHL, as well as a host of ailments for which he had never even sued Dow Agro, including “neuropathy, fatigue, bipolar disorder, tremors, difficulty concentrating and liver disorder.” Id. at 478. Dr. Omalu did not cite or reference any studies in his report to support his opinion that Dursban caused Mr. Pritchard’s ailments.  Id. at 480.

After counsel objected to Omalu’s report, plaintiffs’ counsel supplemented the report with some published articles, including the “Lee” study.  See Won Jin Lee, Aaron Blair, Jane A. Hoppin, Jay H. Lubin, Jennifer A. Rusiecki, Dale P. Sandler, Mustafa Dosemeci, and Michael C. R. Alavanja, “Cancer Incidence Among Pesticide Applicators Exposed to Chlorpyrifos in the Agricultural Health Study,” 96 J. Nat’l Cancer Inst. 1781 (2004) [cited as Lee].  At his deposition, and in opposition to defendants’ 702 motion, Omalu became more forthcoming with actual data and argument.  According to Omalu, “the 2004 Lee Study strongly supports a conclusion that high-level exposure to chlorpyrifos is associated with an increased risk of NHL.’’ Id. at 480.

This opinion put forward by Omalu bordered on scientific malpractice.  No; it was malpractice.  The Lee study looked at many different cancer end points, without adjustment for multiple comparisons.  The lack of adjustment means, at the very least, that any interpretation of p-values or confidence intervals would have to be modified to acknowledge the higher rate of random error.  Now for NHL, the overall relative risk (RR) for chlorpyrifos exposure was 1.03, with a 95% confidence interval of 0.62 to 1.70.  Lee at 1783.  In other words, the study that Omalu claimed supported his opinion was about as null a study as can be, with a reasonably tight confidence interval that made a doubling of the risk rather unlikely given the sample RR.
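A back-of-the-envelope calculation shows why unadjusted multiple endpoints matter. The sketch below (in Python) is illustrative only; the number of endpoints is an assumption, since the Lee study tabulated results for many cancer sites:

```python
# Illustration: unadjusted multiple endpoints inflate the chance of a
# spurious "significant" finding somewhere, even if every null is true.
# The number of endpoints (k) is an assumed, illustrative figure.
k = 12          # assumed number of cancer endpoints examined
alpha = 0.05    # conventional per-test significance level

# Probability of at least one false positive across k independent tests
family_wise_error = 1 - (1 - alpha) ** k
print(f"chance of >= 1 spurious 'significant' endpoint: {family_wise_error:.2f}")

# A Bonferroni correction would instead shrink the per-test threshold:
print(f"Bonferroni-adjusted per-test alpha: {alpha / k:.4f}")
```

With a dozen unadjusted endpoints, the chance of at least one nominally “significant” result arising by luck alone approaches one in two, which is why unadjusted subgroup findings warrant skepticism.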

If the multiple endpoint testing were not sufficient to dissuade a scientist intent on supporting the Pritchards’ claims, then the exposure subgroup analyses would have scared any prudent scientist away from supporting the plaintiffs’ claims.  The Lee study authors provided two different exposure-response analyses, one with lifetime exposure and the other with an intensity-weighted exposure, both in quartiles.  Neither analysis revealed an exposure-response trend.  For the lifetime exposure-response trend, the Lee study reported an NHL RR of 1.01 for the highest quartile of chlorpyrifos exposure. For the intensity-weighted analysis, for the highest quartile, the authors reported an RR of 1.61, with a 95% confidence interval of 0.74 to 3.53.

Although the defense and the district court did not call out Omalu on his fantasy statistical inference, the district judge certainly appreciated that Omalu had no statistically significant associations between chlorpyrifos and NHL to support his opinion. Given the weakness of relying upon a single epidemiologic study (and torturing the data therein), the district court believed that a showing of statistical significance was important to give some credibility to Omalu’s claims.  705 F.Supp. 2d at 486 (citing General Elec. Co. v. Joiner, 522 U.S. 136, 144-46 (1997); Soldo v. Sandoz Pharm. Corp., 244 F.Supp. 2d 434, 449-50 (W.D. Pa. 2003)).

[Figure 3, adapted from Lee]

What to do when there is really no evidence supporting a claim?  Make up stuff.  Here is how the trial court describes Omalu’s declaration opposing exclusion:

 “Dr. Omalu interprets and recalculates the findings in the 2004 Lee Study, finding that ‘an 80% confidence interval for the highly-exposed applicators in the 2004 Lee Study spans a relative risk range for NHL from slightly above 1.0 to slightly above 2.5.’ Dr. Omalu concludes that ‘this means that there is a 90% probability that the relative risk within the population studied is greater than 1.0’.”

705 F.Supp. 2d at 481 (internal citations omitted); see also id. at 488. The calculations and the rationale for an 80% confidence interval were not provided, but plaintiffs’ counsel assured Judge Fischer at oral argument that the calculation was done using high school math. Id. at 481 n.12. Judge Fischer seemed unimpressed, especially given that there was no record of the calculation.  Id. at 481, 488.
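Because no record of Omalu’s calculation exists, one can only guess at it. A conventional reconstruction, assuming the usual log-normal approximation for a relative risk and starting from the published intensity-weighted subgroup estimate (RR = 1.61; 95% CI, 0.74 to 3.53), would look something like this sketch:

```python
import math

# Published subgroup estimate (Lee, intensity-weighted, highest quartile):
rr, lo95, hi95 = 1.61, 0.74, 3.53

# On the log scale, a 95% CI spans 2 * 1.96 standard errors, so the
# standard error can be recovered from the published bounds.
se = (math.log(hi95) - math.log(lo95)) / (2 * 1.96)

# Re-center an 80% interval (z = 1.2816) on the point estimate.
z80 = 1.2816
lo80 = math.exp(math.log(rr) - z80 * se)
hi80 = math.exp(math.log(rr) + z80 * se)
print(f"SE(log RR) ~ {se:.3f}")
print(f"80% CI ~ {lo80:.2f} to {hi80:.2f}")  # roughly 0.97 to 2.68
```

This conventional arithmetic yields an 80% interval of roughly 0.97 to 2.68, with a lower bound slightly below 1.0 rather than “slightly above” it; the reconstruction, in other words, does not reproduce the bounds Omalu claimed.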

The larger offense, however, was that Omalu’s interpretation of the 80% confidence interval as a probability statement about the true relative risk’s exceeding 1.0 was bogus. Dr. Omalu further displayed his lack of statistical competence when he attempted to defend the posterior probability derived from his 80% confidence interval by referring to a power calculation for a different disease in the Lee study:

“He [Omalu] further declares that ‘the authors of the 2004 Lee Study themselves endorse the probative value of a finding of elevated risk with less than a 95% confidence level when they point out that “this analysis had a 90% statistical power to detect a 1.5–fold increase in lung cancer incidence”’.”

Id. at 488 (court’s quoting of Omalu’s quoting from the Lee study). To quote Wolfgang Pauli, Omalu is so far off that he is “not even wrong.” Lee and colleagues were offering a pre-study power calculation, which they used to justify their looking at the cohort for lung cancer, not NHL, outcomes.  Lee at 1787. The power calculation does not apply to the data observed for lung cancer; and the calculation has absolutely nothing to do with NHL. The power calculation certainly has nothing to do with Omalu’s misguided attempt to offer a calculation of a posterior probability for NHL based upon a subgroup confidence interval.
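The pre-study character of a power calculation is easy to make concrete. Power is computed from an assumed alternative effect size and an anticipated sampling error before any data are seen; it is not a property of the observed results. Here is a minimal sketch, with illustrative inputs only (the Lee study’s actual power inputs are not reproduced here):

```python
import math
from statistics import NormalDist

def prestudy_power(rr_alt: float, se_log_rr: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided Wald test on log(RR), computed
    before any data are seen, from an assumed alternative (rr_alt) and an
    anticipated standard error (se_log_rr)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z_shift = abs(math.log(rr_alt)) / se_log_rr
    return 1 - NormalDist().cdf(z_crit - z_shift)

# Illustrative only: an anticipated SE of about 0.125 on the log scale
# yields roughly 90% power to detect a 1.5-fold increase, mirroring the
# kind of pre-study statement Lee made for lung cancer, not NHL.
print(f"power ~ {prestudy_power(rr_alt=1.5, se_log_rr=0.125):.2f}")
```

Nothing in such a calculation attaches to the NHL subgroup data observed after the fact.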

Given that there were epidemiologic studies available, Judge Fischer noted that expert witnesses were obligated to factor such studies into their opinions. See 705 F.Supp. 2d at 483 (citing Soldo, 244 F.Supp. 2d at 532).  Omalu’s sins against Rule 702 included his failure to consider any studies other than the Lee study, regardless of how unsupportive the Lee study was of his opinion.  The defense experts pointed to several studies that found lower NHL rates among exposed workers than among controls, and Omalu completely failed to consider and to explain his opinion in the face of the contradictory evidence.  See 705 F.Supp. 2d at 485 (citing Perry v. Novartis Pharm. Corp., 564 F.Supp. 2d 452, 465 (E.D. Pa. 2008)). In other words, Omalu was shown to have been a cherry picker. Id. at 489.

In addition to the abridged epidemiology, Omalu relied upon an analogy between benzene itself and ethyltoluene and the other solvents containing benzene rings, to argue that these chemicals, supposedly like benzene, cause NHL.  Id. at 487. The analogy was never supported by any citations to published studies, and, of course, the analogy is seriously flawed. Many chemicals, including chemicals made and used by the human body, have benzene rings, without the slightest propensity to cause NHL.  Indeed, the evidence that benzene itself causes NHL is weak and inconsistent.  See, e.g., Knight v. Kirby Inland Marine Inc., 482 F.3d 347 (5th Cir. 2007) (affirming the exclusion of Dr. B.S. Levy in a case involving benzene exposure and NHL).

Looking at all the evidence, Judge Fischer found Omalu’s general causation opinions unreliable.  Relying upon a single, statistically non-significant epidemiologic study (Lee), while ignoring contrary studies, was not sound science.  It was not even science; it was courtroom rhetoric.

Omalu’s approach to specific causation, the identification of what caused Mr. Pritchard’s NHL, was equally spurious. Omalu purportedly conducted a “differential diagnosis” or a “differential etiology,” but he never examined Mr. Pritchard; nor did he conduct a thorough evaluation of Mr. Pritchard’s medical records. 705 F.Supp. 2d at 491. Judge Fischer found that Omalu had not conducted a thorough differential diagnosis, and that he had made no attempt to rule out idiopathic or unknown causes of NHL, despite the general absence of known causes of NHL. Id. at 492. The one study identified by Omalu reported a non-statistically significant 60% increase in NHL risk, for a subgroup in one of two different exposure-response analyses.  Although Judge Fischer treated the relative risk less than two as a non-dispositive factor in her decision, she recognized that

“The threshold for concluding that an agent was more likely than not the cause of an individual’s disease is a relative risk greater than 2.0… . When the relative risk reaches 2.0, the agent is responsible for an equal number of cases of disease as all other background causes. Thus, a relative risk of 2.0 … implies a 50% likelihood that an exposed individual’s disease was caused by the agent. A relative risk greater than 2.0 would permit an inference that an individual plaintiff’s disease was more likely than not caused by the implicated agent.”

Id. at 485-86 (quoting from Reference Manual on Scientific Evidence at 384 (2d ed. 2000)).
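The arithmetic behind the relative-risk-of-two heuristic, as the Reference Manual describes it, is simple: the probability that an exposed case is attributable to the exposure is (RR - 1)/RR. A minimal sketch, applied to the estimates discussed above:

```python
def attributable_fraction(rr: float) -> float:
    """Probability that an exposed individual's disease is due to the
    exposure, under the Reference Manual's simple heuristic."""
    return (rr - 1) / rr

for rr in (1.03, 1.61, 2.0, 3.0):
    print(f"RR = {rr:4}  ->  attributable probability ~ {attributable_fraction(rr):.0%}")
```

Only at RR = 2.0 does the attributable probability reach the 50% needed for “more likely than not”; the Lee subgroup estimate of 1.61, even taken at face value, would imply roughly 38%.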

Left with nowhere to run, plaintiffs’ counsel swung for the bleachers by arguing that the federal court, sitting in diversity, was required to apply Pennsylvania law of evidence because the standards of Rule 702 constitute “substantive,” not procedural law. The argument, which had been previously rejected within the Third Circuit, was as legally persuasive as Omalu’s scientific opinions.  Judge Fischer excluded Omalu’s proffered opinions and granted summary judgment to the defendants. The Third Circuit affirmed in a per curiam decision. 430 Fed. Appx. 102, 2011 WL 2160456 (3d Cir. 2011).

Practical Evaluation of Scientific Claims

The evaluative process that took place in the Pritchard case missed some important details and some howlers committed by Dr. Omalu, but it was more than good enough for government work. The gatekeeping decision in Pritchard was nonetheless the target of criticism in a recent book.

Kristin Shrader-Frechette (S-F) is a philosopher of science who wants to teach us how to expose bad science. S-F has published, or will soon publish, a book that suggests that philosophy of science can help us expose “bad science.”  See Kristin Shrader-Frechette, Tainted: How Philosophy of Science Can Expose Bad Science (Oxford U.P. 2014) [cited below as Tainted; selections available on Google books]. S-F’s claim is intriguing, as is her move away from the demarcation problem to the difficult business of evaluation and synthesis of scientific claims.

In her introduction, S-F tells us that her book shows “how practical philosophy of science” can counteract biased studies done to promote special interests and PROFITS.  Tainted at 8. Refreshingly, S-F identifies special-interest science, done for profit, as including “individuals, industries, environmentalists, labor unions, or universities.” Id. The remainder of the book, however, appears to be a jeremiad against industry, with a blind eye towards the litigation industry (plaintiffs’ bar) and environmental zealots.

The book promises to address “public concerns” in practical, jargon-free prose. Id. at 9-10. Some of the aims of the book are to provide support for “rejecting demands for only human evidence to support hypotheses about human biology (chapter 3), avoiding using statistical-significance tests with observational data (chapter 12), and challenging use of pure-science default rules for scientific uncertainty when one is doing welfare-affecting science (chapter 14).”

Id. at 10. Hmmm.  Avoiding statistical significance tests for observational data?!?  If avoided, what does S-F hope to use to assess random error?

And then S-F refers to plaintiffs’ hired expert witness (from the Milward case), Carl Cranor, as providing “groundbreaking evaluations of causal inferences [that] have helped to improve courtroom verdicts about legal liability that otherwise put victims at risk.” Id. at 7. Whether someone is a “victim” and has been “at risk” turns on assessing causality. Cranor is not a scientist, and his philosophy of science turns on “weight of the evidence” (WOE), a subjective, speculative approach that is deaf, dumb, and blind to scientific validity.

There are other “teasers” in the introduction to Tainted.  S-F advertises that her Chapter 5 will teach us that “[c]ontrary to popular belief, animal and not human data often provide superior evidence for human-biological hypotheses.”  Tainted at 11. Chapter 6 will show that “[c]ontrary to many physicists’ claims, there is no threshold for harm from exposure to ionizing radiation.” Id.  S-F tells us that her Chapter 7 will criticize “a common but questionable way of discovering hypotheses in epidemiology and medicine—looking at the magnitude of some effect in order to discover causes. The chapter shows instead that the likelihood, not the magnitude, of an effect is the better key to causal discovery.” Id. at 13. Discovering hypotheses — what is that about? You might have thought that hypotheses were framed from observations and then tested.

Which brings us to the trailer for Chapter 8, in which S-F promises to show that “[c]ontrary to standard statistical and medical practice, statistical-significance tests are not causally necessary to show medical and legal evidence of some effect.” Tainted at 11. Again, the teaser raises lots of questions, such as what S-F could possibly mean when she says statistical tests are not causally necessary to show an effect.  Later in the introduction, S-F says that her chapter on statistics “evaluates the well-known statistical-significance rule for discovering hypotheses and shows that because scientists routinely misuse this rule, they can miss discovering important causal hypotheses.” Id. at 13. Discovering causal hypotheses is not what courts and regulators must worry about; their task is to establish such hypotheses with sufficient, valid evidence.

Paging through the book reveals a rhetoric that is thick and unremitting, with little philosophy of science or meaningful advice on how to evaluate scientific studies.  The statistics chapter calls out, and lo, it features a discussion of the Pritchard case. See Tainted, Chapter 8, “Why Statistics Is Slippery: Easy Algorithms Fail in Biology.”

The chapter opens with an account of German scientist Fritz Haber’s development of organophosphate pesticides, and the Nazis’ use of related compounds as chemical weapons.  Tainted at 99. Then, in a fevered non sequitur and rhetorical flourish, S-F states, with righteous indignation, that although the Nazi researchers “clearly understood the causal-neurotoxic effects of organophosphate pesticides and nerve gas,” chemical companies today “claim that the causal-carcinogenic effects of these pesticides are controversial.” Is S-F saying that a chemical that is neurotoxic must be carcinogenic for every kind of human cancer?  So it seems.

Consider the Pritchard case.  Really, the Pritchard case?  Yup; S-F holds up the Pritchard case as her exemplar of what is wrong with civil adjudication of scientific claims.  Despite the promise of jargon-free language, S-F launches into a discussion of how the judges in Pritchard assumed that statistical significance was necessary “to hypothesize causal harm.”  Tainted at 100. In this vein, S-F tells us that she will show that:

“the statistical-significance rule is not a legitimate requirement for discovering causal hypotheses.”

Id. Again, the reader is left to puzzle why statistical significance is discussed in the context of hypothesis discovery, whatever that may be, as opposed to hypothesis testing or confirmation. And whatever it may be, we are warned that “unless the [statistical significance] rule is rejected as necessary for hypothesis-discovery, it will likely lead to false causal claims, questionable scientific theories, and massive harm to innocent victims like Robert Pritchard.”

Id. S-F is decidedly not adverting to Mr. Pritchard’s victimization by the litigation industry and the likes of Dr. Omalu, although she should. S-F not only believes that the judges in Pritchard bungled their gatekeeping; she knows that Dr. Omalu was correct, and the defense experts wrong, and that Pritchard was a victim of Dursban and of questionable scientific theories that were used to embarrass Omalu and his opinions.

S-F promised to teach her readers how to evaluate scientific claims and detect “tainted” science, but all she delivers here is an ipse dixit.  There is no discussion of the actual measurements, extent of random error, or threats to validity, for studies cited either by the plaintiffs or the defendants in Pritchard.  To be sure, S-F cites the Lee study in her endnotes, but she never provides any meaningful discussion of that study or of any other that has any bearing on chlorpyrifos and NHL.  S-F also cites two review articles, the first of which provides no support for her ipse dixit:

“Although mutagenicity and chronic animal bioassays for carcinogenicity of chlorpyrifos were largely negative, a recent epidemiological study of pesticide applicators reported a significant exposure response trend between chlorpyrifos use and lung and rectal cancer. However, the positive association was based on small numbers of cases, i.e., for rectal cancer an excess of less than 10 cases in the 2 highest exposure groups. The lack of precision due to the small number of observations and uncertainty about actual levels of exposure warrants caution in concluding that the observed statistical association is consistent with a causal association. This association would need to be observed in more than one study before concluding that the association between lung or rectal cancer and chlorpyrifos was consistent with a causal relationship.

There is no evidence that chlorpyrifos is hepatotoxic, nephrotoxic, or immunotoxic at doses less than those that cause frank cholinesterase poisoning.”

David L. Eaton, Robert B. Daroff, Herman Autrup, James Bridges, Patricia Buffler, Lucio G. Costa, Joseph Coyle, Guy McKhann, William C. Mobley, Lynn Nadel, Diether Neubert, Rolf Schulte-Hermann, and Peter S. Spencer, “Review of the Toxicology of Chlorpyrifos With an Emphasis on Human Exposure and Neurodevelopment,” 38 Critical Reviews in Toxicology 1, 5-6 (2008).

The second cited review article was written by the clinical ecology zealot[1] William J. Rea. William J. Rea, “Pesticides,” 6 Journal of Nutritional and Environmental Medicine 55 (1996). Rea’s article does not appear in PubMed.

Shrader-Frechette’s Criticisms of Statistical Significance Testing

What is the statistical significance against which S-F rails? She offers several definitions, none of which is correct or consistent with the others.

“The statistical-significance level p is defined as the probability of the observed data, given that the null hypothesis is true.8

Tainted at 101 (citing D. H. Johnson, “What Hypothesis Tests Are Not,” 16 Behavioral Ecology 325 (2004)). Well, not quite; the attained significance probability is the probability of the data observed, or data more extreme, given the null hypothesis.  A Tainted definition.
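The difference between the two definitions is not pedantic, and a short simulation makes it concrete. The numbers below are illustrative, not drawn from any study discussed in the book:

```python
import random

# A p-value is P(data as extreme as, or MORE extreme than, what was
# observed | null hypothesis), not P(the observed data | null).
random.seed(1)
n, observed_heads = 100, 61     # illustrative data: 61 heads in 100 flips
trials = 200_000

extreme = 0
for _ in range(trials):
    heads = sum(random.random() < 0.5 for _ in range(n))   # null: fair coin
    if abs(heads - n / 2) >= abs(observed_heads - n / 2):  # as or more extreme
        extreme += 1

print(f"two-sided p ~ {extreme / trials:.3f}")
# The probability of exactly 61 heads is far smaller than this p-value,
# which sums over the whole tail in both directions.
```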

Later in Chapter 8, S-F discusses significance probability in a way that overtly commits the transposition fallacy, not a good thing to do in a book that sets out to teach how to evaluate scientific evidence:

“However, typically scientists view statistical significance as a measure of how confidently one might reject the null hypothesis. Traditionally they have used a 0.05 statistical-significance level, p < or = 0.05, and have viewed the probability of a false-positive (incorrectly rejecting a true null hypothesis), or type-1, error as 5 percent. Thus they assume that some finding is statistically significant and provides grounds for rejecting the null if it has at least a 95-percent probability of not being due to chance.”

Tainted at 101. Not only does the last sentence ignore the extent of error due to bias or confounding, it erroneously assigns a posterior probability that is the complement of the significance probability.  This error is not an isolated occurrence; here is another example:

“Thus, when scientists used the rule to examine the effectiveness of St. John’s Wort in relieving depression,14 or when they employed it to examine the efficacy of flutamide to treat prostate cancer,15 they concluded the treatments were ineffective because they were not statistically significant at the 0.05 level. Only at p < or = 0.14 were the results statistically significant. They had an 86-percent chance of not being due to chance.16”

Tainted at 101-02 (citing papers by Shelton (endnote 14)[2], by Eisenberger (endnote 15)[3], and Rothman’s text (endnote 16)[4]). Although Ken Rothman has criticized the use of statistical significance tests, his book surely does not interpret a p-value of 0.14 as an 86% chance that the results were not due to chance.
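A few lines of arithmetic show why a p-value of 0.14 does not translate into an “86-percent chance of not being due to chance.” The probability that a nominally positive finding reflects a real effect depends on the prevalence of real effects among the hypotheses tested and on study power, neither of which a p-value supplies. The inputs below are assumptions for illustration:

```python
# Assumed, illustrative inputs: 10% of tested hypotheses are real effects,
# with 50% power to detect a real effect; the threshold is S-F's p = 0.14.
prior_true, power, alpha = 0.10, 0.50, 0.14

# Under the null, P(p <= 0.14) = 0.14 by construction.
num = prior_true * power                      # true effects flagged
den = num + (1 - prior_true) * alpha          # plus false positives
print(f"P(real effect | p <= 0.14) ~ {num / den:.2f}")   # ~0.28, not 0.86
```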

Although S-F previously stated that statistical significance is interpreted as the probability that the null is true, she actually goes on to correct the mistake, sort of:

“Requiring the statistical-significance rule for hypothesis-development also is arbitrary in presupposing a nonsensical distinction between a significant finding if p = 0.049, but a nonsignificant finding if p = 0.051.26 Besides, even when one uses a 90-percent (p < or = 0.10), an 85-percent (p < or = 0.15), or some other confidence level, it still may not include the null point. If not, these other p values also show the data are consistent with an effect. Statistical-significance proponents thus forget that both confidence levels and p values are measures of consistency between the data and the null hypothesis, not measures of the probability that the null is true. When results do not satisfy the rule, this means merely that the null cannot be rejected, not that the null is true.”

Tainted at 103.

S-F repeats some criticisms of significance testing, most of which rest on misunderstandings of the concept.  It hardly suffices to argue that evaluating the magnitude of random error is worthless because it does not measure the extent of bias and confounding.  The flaw lies with those who would interpret the p-value as the sole measure of error involved in a measurement.

S-F takes the criticisms of significance probability to be sufficient to justify an alternative approach: evaluating causal hypotheses “on a preponderance of evidence,47 whether effects are more likely than not.”[5] Her citations, however, do not support the notion that an overall assessment of the causal hypothesis is a true alternative to statistical testing, but rather only a later step in the causal assessment, one which presupposes the previous elimination of random variability in the observed associations.

S-F compounds her confusion by claiming that this purported alternative is superior to significance testing or any evaluation of random variability, and by noting that juries in civil cases must decide causal claims on the preponderance of the evidence, not on attained significance probabilities:

“In welfare-affecting areas of science, a preponderance-of-evidence rule often is better than a statistical-significance rule because it could take account of evidence based on underlying mechanisms and theoretical support, even if evidence did not satisfy statistical significance. After all, even in US civil law, juries need not be 95 percent certain of a verdict, but only sure that a verdict is more likely than not. Another reason for requiring the preponderance-of-evidence rule, for welfare-related hypothesis development, is that statistical data often are difficult or expensive to obtain, for example, because of large sample-size requirements. Such difficulties limit statistical-significance applicability. ”

Tainted at 105-06. S-F’s assertion that juries need not have 95% certainty in their verdicts is either a misunderstanding or a misrepresentation of the meaning of a confidence interval, and a conflation of two very different kinds of probability or certainty.  S-F invites a reading that commits the transposition fallacy by confusing the probability involved in a confidence interval with that involved in a posterior probability.  S-F’s claim that sample-size requirements often limit the ability to use statistical significance evaluations is obviously highly contingent upon the facts of the case, but in civil cases, such as Pritchard, this limitation is rarely at play.  Of course, if the sample size is too small to evaluate the role of chance, then a scientist should probably declare the evidence too fragile to support a causal conclusion.
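The conflation can be untangled with a coverage simulation: the 95% in a 95% confidence interval describes the long-run performance of the interval-generating procedure, not the probability that any single computed interval, or any juror’s verdict, is correct. A minimal sketch, with arbitrary illustrative parameters:

```python
import math
import random

# The 95% describes the procedure's long-run coverage, not the probability
# that any one computed interval contains the truth.
random.seed(2)
true_mean, sigma, n, reps = 10.0, 2.0, 25, 20_000

covered = 0
for _ in range(reps):
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    m = sum(sample) / n
    half_width = 1.96 * sigma / math.sqrt(n)   # known-sigma 95% interval
    if m - half_width <= true_mean <= m + half_width:
        covered += 1

print(f"coverage over {reps} repetitions ~ {covered / reps:.3f}")  # ~0.95
```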

S-F also postulates that a posterior-probability approach, rather than a significance-probability approach, would “better counteract conflicts of interest that sometimes cause scientists to pay inadequate attention to public-welfare consequences of their work.” Tainted at 106. This claim is a remarkable assertion, unsupported by any empirical evidence.  The varieties of evidence that go into an overall assessment of a causal hypothesis are often quantitatively incommensurate.  The so-called preponderance of the evidence described by S-F is often little more than a subjective overall assessment of the weight of the evidence.  The approving citations to the work of Carl Cranor support interpreting S-F to endorse this subjective, anything-goes approach to weight of the evidence.  As for WOE eliminating inadequate attention to “public welfare,” S-F’s citations actually suggest the opposite. S-F’s citations to the 1961 reviews by Wynder and by Little illustrate how subjective narrative reviews can be, with diametrically opposed results.  Rather than curbing conflicts of interest, these subjective, narrative reviews illustrate how contrary results may be obtained by the failure to pre-specify criteria of validity, and of inclusion and exclusion of admissible evidence.

Still, S-F asserts that “up to 80 percent of welfare-related statistical studies have false-negative or type-II errors, failing to reject a false null.” Tainted at 106. The support for this assertion is a citation to a review article by David Resnik. See David Resnik, “Statistics, Ethics, and Research: An Agenda for Education and Reform,” 8 Accountability in Research 163, 183 (2000). Resnik’s paper is a review article, not an empirical study, but at the page cited by S-F, Resnik in turn cites well-known papers that present actual data:

“There is also evidence that many of the errors and biases in research are related to the misuses of statistics. For example, Williams et al. (1997) found that 80% of articles surveyed that used t-tests contained at least one test with a type II error. Freiman et al. (1978)  * * *  However, empirical research on statistical errors in science is scarce, and more work needs to be done in this area.”

Id. The papers cited by Resnik, Williams (1997)[6] and Freiman (1978)[7], did identify previously published studies that over-interpreted statistically non-significant results, but the identified type-II errors were potential errors, not ascertained errors, because the authors made no claim that every non-statistically significant result actually represented a missed true association. In other words, S-F is not entitled to say that these empirical reviews actually identified failures to reject false null hypotheses. Furthermore, the empirical analyses in the studies cited by Resnik, who was in turn cited by S-F, did not look at correlations between alleged conflicts of interest and statistical errors. The cited research calls for greater attention to proper interpretation of statistical tests, not for their abandonment.

In the end, at least in the chapter on statistics, S-F fails to deliver much, if anything, on her promise to show how to evaluate science from a philosophic perspective.  Her discussion of the Pritchard case is not an analysis; it is a harangue. There are certainly more readable, accessible, scholarly, and accurate treatments of the scientific and statistical issues than this book provides.  See, e.g., Michael B. Bracken, Risk, Chance, and Causation: Investigating the Origins and Treatment of Disease (2013).


[1] Not to be confused with the deceased federal judge by the same name, William J. Rea. William J. Rea, 1 Chemical Sensitivity – Principles and Mechanisms (1992); 2 Chemical Sensitivity – Sources of Total Body Load (1994),  3 Chemical Sensitivity – Clinical Manifestation of Pollutant Overload (1996), 4 Chemical Sensitivity – Tools of Diagnosis and Methods of Treatment (1998).

[2] R. C. Shelton, M. B. Keller, et al., “Effectiveness of St. John’s Wort in Major Depression,” 285 Journal of the American Medical Association 1978 (2001).

[3] M. A. Eisenberger, B. A. Blumenstein, et al., “Bilateral Orchiectomy With or Without Flutamide for Metastic [sic] Prostate Cancer,” 339 New England Journal of Medicine 1036 (1998).

[4] Kenneth J. Rothman, Epidemiology 123–127 (NY 2002).

[5] Endnote 47 references the following papers: E. Hammond, “Cause and Effect,” in E. Wynder, ed., The Biologic Effects of Tobacco 193–194 (Boston 1955); E. L. Wynder, “An Appraisal of the Smoking-Lung-Cancer Issue,” 264 New England Journal of Medicine 1235 (1961); see C. Little, “Some Phases of the Problem of Smoking and Lung Cancer,” 264 New England Journal of Medicine 1241 (1961); J. R. Stutzman, C. A. Luongo, and S. A. McLuckey, “Covalent and Non-Covalent Binding in the Ion/Ion Charge Inversion of Peptide Cations with Benzene-Disulfonic Acid Anions,” 47 Journal of Mass Spectrometry 669 (2012). Although the paper on ionic charges of peptide cations is unfamiliar, the other papers do not eschew traditional statistical significance testing techniques. By the time these early (1961) reviews were written, the association that was reported between smoking and lung cancer was clearly accepted as not likely explained by chance.  Discussion focused upon bias and potential confounding in the available studies, and the lack of animal evidence for the causal claim.

[6] J. L. Williams, C. A. Hathaway, K. L. Kloster, and B. H. Layne, “Low power, type II errors, and other statistical problems in recent cardiovascular research,” 42 Am. J. Physiology: Heart & Circulatory Physiology H487 (1997).

[7] Jennie A. Freiman, Thomas C. Chalmers, Harry Smith and Roy R. Kuebler, “The importance of beta, the type II error and sample size in the design and interpretation of the randomized control trial: survey of 71 ‛negative’ trials,” 299 New Engl. J. Med. 690 (1978).

Too Many Narratives – Historians in the Dock

July 13th, 2014

History Associates Inc. (HAI) is a commercial vendor of historical services, including litigation services. Understandably, this firm, like the academic historians who service the litigation industry, takes a broad view of the desirability of historian expert witness testimony.  An article in one of HAI’s newsletters surveys lawyers’ strategies for trying to prove historical facts.  Lawyers can present percipient witnesses, or they

“can present the story themselves, but in the end, arguments by advocates can raise questions of bias that obscure, rather than clarify, the historical facts at issue.”

Mike Reis and Dave Wiseman, “Introducing and interpreting facts-in-evidence: the historian’s role as expert witness,” HAIpoints 1 (Summer 2010)[1]. These commercial historians recommend that advocacy bias, so clear in lawyers’ narratives, be defused or obscured by having a professional historian present the “story.”  They tout the research skills of historians: “Historians know how to find critical historical information.” And to be sure, historians, whether academic or for-hire, may offer important bibliographic services, as well as help in translating, authenticating, and contextualizing documents.  But these historians from HAI want a role on center stage, or at least in the witness box.  They tell us that:

“Historians synthesize information into well-documented, compelling stories.”

Ah yes, compelling stories, as in “the guiltless gust of a rattling good yarn[2].” The legal system should take a pass on such stories.

*     *     *     *     *     *

A recent law review article attempts to provide a less commercial defense of historian expert witness testimony.  See Alvaro Hasani, “Putting history on the stand: a closer look at the legitimacy of criticisms levied against historians who testify as expert witnesses,” 34 Whittier L. Rev. 343 (2013) [Hasani].  Hasani argues that historians strive to provide objective historical “interpretation,” by selecting reliable sources, and reliably reading and interpreting these sources to create a reliable “narrative.” Hasani at 355. Hasani points to some courts that have thrown up their hands and declared Daubert reliability factors inapplicable to non-scientific historian testimony. See, e.g., United States v. Paracha, No. 03 CR. 1197 (SHS), 2006 WL 12768, at *19 (S.D.N.Y. Jan. 3, 2006) (noting that Daubert is not designed for gatekeeping of a non-scientific, historian expert witness’s methodology); Saginaw Chippewa Indian Tribe of Michigan v. Granholm, 690 F. Supp. 2d 622, 634 (E.D. Mich. 2010) (noting that “[t]here is no way to ‘test’ whether the experts’ testimony concerning the historical understanding of the treaties is correct. Nor is it possible to establish an ‘error rate’ for historical experts.”).

Not all testifying historians agree, however, that their research and findings are non-scientific.  Here is how one plaintiffs’ expert witness characterized historical thinking:

“Q. Do you believe that historical thinking is a form of scientific thinking?

A. I do. I think that history is sometimes classed with the humanities, sometimes classed with the social sciences, but I think there is a good deal of historical research and writing that is a form of social science.”

Examination Before Trial of Gerald Markowitz, in Mendez v. American Optical, District Court for Tarrant County, Texas (342d Judicial District), at 44:13-20 (July 19, 2005). Professor Susan Haack, and others, have made a persuasive case that the epistemic warrants for claims of knowledge, whether denominated scientific or non-scientific, are not different in kind. If historian testimony is not about knowledge of the past, then it clearly has no role in a trial. Furthermore, Professor Markowitz is correct that sometimes historical opinions are scientific in the sense that they can be tested. If a labor historian asserts that workers are exploited and subjected to unsafe work conditions due to the very nature of capitalism and the profit motive, then that historian’s opinion will be substantially embarrassed by the widespread occupational disease in European and Asian communist regimes.

When Deborah Lipstadt described historian David Irving as a Holocaust denier[3], Irving sued Lipstadt for defamation.  In defending against the claim, Lipstadt successfully carried the burden of proving the truth of her accusation.  The trial court’s judgment, quoted by Hasani, reads like a so-called Daubert exclusion of plaintiff Irving’s putative historical writing. Irving v. Penguin Books Ltd., No. 1996-1-1113, 2000 WL 362478, at ¶¶ 1.1, 13.140 (Q.B. Apr. 11, 2000) (finding that “Irving ha[d] misstated historical evidence; adopted positions which run counter to the weight of the evidence; given credence to unreliable evidence and disregarded or dismissed credible evidence.”).

The need for gatekeeping of historian testimony should be obvious.  Historian testimony is often a narrative of historical facts that are not beyond the ken of an ordinary fact finder, once the predicate facts are placed into evidence.  Such narratives of historical fact present a serious threat to the integrity of fact finding by creating the conditions for delegating and deferring fact-finding responsibility to the historian witness, with an abdication of responsibility by the fact finder. See Ronald J. Allen, “The Conceptual Challenge of Expert Evidence,” 14 Discusiones Filosóficas 41, 50-53 (2013).

Some historians clearly believe that they are empowered by the witness chair to preach or advocate. Allan M. Brandt, who has served as a party expert witness to give testimony on many occasions for plaintiffs in tobacco cases, unapologetically described the liberties he has taken thus:

“It seems to me now, after the hopes and disappointments of the courtroom battle, that we have a role to play in determining the future of the tobacco pandemic. If we occasionally cross the boundary between analysis and advocacy, so be it. The stakes are high, and there is much work yet to do.”

Allan M. Brandt, The Cigarette Century: The Rise, Fall, and Deadly Persistence of the Product That Defined America 505 (2007).

Hasani never comes to grips with the delegation problem or with Brandt’s attitude, which is quite prevalent in the product liability arena. The problem is more than merely “occasional.” The overreaching by historian witnesses reflects the nature of their discipline, the lack of necessity for their testimony, and the failure of courts to exercise their gatekeeping. The problem with Brandt’s excuse making is that neither analysis nor advocacy is needed or desired. Advocacy is the responsibility of counsel, as is the kind of analysis involved in much historian testimony.  For instance, when historians offer testimony about the so-called “state of the art,” they are drawing inferences from published and unpublished sources about what people knew or should have known, and about their motivations.  Although their bibliographic and historical researches can be helpful to the fact finder’s effort to understand who was writing what about the issue in times past, historians have no real expertise, beyond the lay fact finder, in discerning intentions, motivations, and belief states.

Hasani concludes that the prevalence of historian expert witness testimony is growing. Hasani at 364.  He cites, however, only four cases for the proposition, three of which pre-date Daubert.  The fourth is a Native American rights case. Hasani at 364 n.139. There is little or no evidence that historian expert witness testimony is becoming more prevalent, although it continues in product liability litigation, where state of the art — who knew what, when — remains an issue in strict liability and negligence. Mack v. Stryker Corp., 893 F. Supp. 2d 976 (D. Minn. 2012), aff’d, 748 F.3d 845 (8th Cir. 2014). There remains a need for judicial vigilance in policing such state-of-the-art testimony.


[1] Mike Reis is the Vice President and Director of Litigation Research at History Associates Inc. Mr. Reis received his bachelor’s degree from Loyola College, and his master’s degree from George Washington University, both in history. David Wiseman, an erstwhile trial attorney, conducts historical research for History Associates.

[2] Attributed to Anthony Burgess.

[3] Deborah E. Lipstadt, Denying the Holocaust: The Growing Assault on Truth and Memory 8 (1993).


Twerski’s Defense of Daubert

July 6th, 2014

Professor Aaron D. Twerski teaches torts and products liability at the Brooklyn Law School.  Along with a graduating student, Lior Sapir, Twerski has published an article in which the authors mistakenly asseverate that “[t]his is not another article about Daubert.” Aaron D. Twerski & Lior Sapir, “Sufficiency of the Evidence Does Not Meet Daubert Standards: A Critique of the Green-Sanders Proposal,” 23 Widener L.J. 641, 641 (2014) [Twerski & Sapir].

A few other comments.

1. The title of the article.  True, true, and immaterial. As Professor David Bernstein has pointed out many times, Daubert is no longer the law; Federal Rule of Evidence 702, a statute, is the law.  Just as the original Rule 702 superseded Frye in 1975, a revised Rule 702, in 2000, superseded Daubert. See David E. Bernstein, “The Misbegotten Judicial Resistance to the Daubert Revolution,” 89 Notre Dame L. Rev. 27 (2013).

2. Twerski and Sapir have taken aim at a draft paper by Professors Green and Sanders, who also presented similar ideas at a workshop in March 2012, in Spain. The Green-Sanders manuscript is available online. Michael D. Green & Joseph Sanders, “Admissibility Versus Sufficiency: Controlling the Quality of Expert Witness Testimony in the United States,” (March 5, 2012) <downloaded on March 25, 2012>. This article appears to have matured since spring 2012, but it has never progressed to parturition.  Professor Green’s website suggests a mutated version is in the works:  “The Daubert Sleight of Hand: Substituting Reliability, Methodology, and Reasoning for an Old Fashioned Sufficiency of the Evidence Test.”

Indeed, the draft paper is a worthwhile target. SeeAdmissibility versus Sufficiency of Expert Witness Evidence” (April 18, 2012).  Green and Sanders pursue a reductionist approach to Rule 702, which is unfaithful to the letter and spirit of the law.

3. In their critique of Green and Sanders, Twerski and Sapir get some issues wrong. First, they insist upon talking about Daubert criteria.  The “criteria” were never really criteria, and as Bernstein’s scholarship establishes, it is time to move past Daubert.

4. Twerski and Sapir assert that Daubert imposes a substantial or heavy burden of proof upon the proponent of expert witness opinion testimony:

“The Daubert trilogy was intended to set a formidable standard for admissibility before one entered the thicket of evaluating whether it was sufficient to serve as grounds for recovery.”

Twerski & Sapir at 648.

Daubert instituted a “high threshold of reliability”.

Twerski & Sapir at 649.

“But, the message from the Daubert trilogy is unmistakable: a court must have a high degree of confidence in the integrity of scientific evidence before it qualifies for consideration in any formal test to be utilized in litigation.”

Twerski & Sapir at 650.

“The Daubert standard is anything but minimal.”

Twerski & Sapir at 651.

Twerski and Sapir never explain whence comes “high,” “formidable,” and “anything but minimal.” To be sure, the Supreme Court noted that “[s]ince Daubert . . . parties relying on expert evidence have had notice of the exacting standards of reliability such evidence must meet.” Weisgram v. Marley Co., 528 U.S. 440, 455 (2000) (emphasis added). An exacting standard, however, is not necessarily a heavy burden.  It may be that the exacting standard is infrequently satisfied because the necessary evidence and inferences, of sufficient quality and validity, are often missing. The truth is that science is often in the no-man’s land of the indeterminate, the inconclusive, and the incomplete. Nevertheless, Twerski and Sapir play into the hands of the reductionist Green-Sanders thesis by talking about what appears to be a [heavy] burden of proof and the “weight of evidence” needed to sustain the burden.

5. Twerski and Sapir obviously recognize that reliability is different from sufficiency, but they miss the multi-dimensional aspect of expert witness opinion testimony.  Consider their assertion that:

“[t]he Court of Appeals for the Eleventh Circuit in Joiner had not lost its senses when it relied on animal studies to prove that PCBs cause lung cancer. If the question was whether any evidence viewed in the light most favorable to plaintiff supported liability, the answer was probably yes.”

Twerski & Sapir at 649; see Joiner v. Gen. Electric Co., 78 F.3d 524, 532 (11th Cir. 1996) rev’d, 522 U.S. 136 (1997).

The imprecision in thinking about expert witness testimony obscures what happened in Joiner, and what must happen under the structure of the evidence statutes (or case law).  The Court of Appeals never relied upon animal studies; nor did the district court below.  Expert witnesses relied upon animal studies, and other studies, and then offered an opinion that these studies “prove” PCBs cause human lung cancer, and Mr. Joiner’s lung cancer in particular.  Those opinions, which the Eleventh Circuit would have taken at face value, would be sufficient to support submitting the case to the jury.  Indeed, courts that evade the gatekeeping requirements of Rule 702 routinely tout the credentials of the expert witnesses, recite that they have used science in some sense, and declare that criticisms of their opinions “go to the weight not the admissibility” of the opinions.  These are, of course, evasions used to dodge Daubert and Rule 702. They are evasions because the science recited is at a very high level of abstraction (“I relied upon epidemiology”), because credentials are irrelevant, and because “weight not the admissibility” is a conclusion not a reason.

Some of the issues obscured by the reductionist weight-of-the-evidence approach are the internal and external validity of the studies cited, whether the inferences drawn from the studies cited are valid and accurate, and whether the method of synthesizing a conclusion from disparate studies is appropriate. These various aspects of an evidentiary display cannot be reduced to a unidimensional “weight.” Consider how many observational studies suggested, some would say demonstrated, that beta carotene supplements reduced the risk of lung cancer, only to be pushed aside by one or two randomized clinical trials.

6. Twerski and Sapir illustrate the crucial point that gatekeeping judges must press beyond conclusory opinions by exploring the legal controversy over Parlodel and post-partum strokes.  Twerski & Sapir at 652. Their exploration takes them into some of the same issues that confronted the Supreme Court in Joiner:  extrapolations or “leaps of faith” between different indications, different species, different study outcomes, between surrogate end points and the end point of interest, and from very high to relatively low therapeutic doses. Twerski and Sapir correctly discern that these various issues cannot be simply subsumed under weight or sufficiency.

7. Professors Green and Sanders have published a brief reply, in which they continue their “weight of the evidence” reductionist argument. Michael D. Green & Joseph Sanders, “In Defense of Sufficiency: A Reply to Professor Twerski and Mr. Sapir,” 23 Widener L.J. 663 (2014). Green and Sanders restate their position that courts can, should, and do sweep all the nuances of evidence and inference validity into a single metric – weight and sufficiency – to adjudicate so-called Daubert challenges.  What Twerski and Sapir seem to have stumbled upon is that Green and Sanders are not engaged in a descriptive enterprise; they are prescribing a standard that abridges and distorts the law and best practice in order to ensure that dubious causal claims are submitted to the finder of fact.

Zoloft MDL Excludes Proffered Testimony of Anick Bérard, Ph.D.

June 27th, 2014

Anick Bérard is a Canadian perinatal epidemiologist at the Université de Montréal.  Bérard was named by plaintiffs’ counsel in the Zoloft MDL to offer an opinion that selective serotonin reuptake inhibitor (SSRI) antidepressants as a class, and Zoloft (sertraline) specifically, cause a wide range of birth defects. Bérard previously testified against GSK about her claim that paroxetine, another SSRI antidepressant, is a teratogen.

Pfizer challenged Bérard’s proffered testimony under Federal Rules of Evidence 104(a), 702, 703, and 403.  Today, the Zoloft MDL transferee court handed down its decision to exclude Dr. Bérard’s testimony at the time of trial.  In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., MDL 2342, Document 979 (June 27, 2014).  The MDL court acknowledged the need to consider the selectivity (“cherry picking”) of studies upon which Dr. Bérard relied, as well as her failure to consider multiple comparisons, ascertainment bias, confounding by indication, and lack of replication of specific findings across the different SSRI medications, and across studies. Interestingly, the MDL court recognized that Dr. Bérard’s critique of studies as “underpowered” was undone by her failure to consider available meta-analyses or to conduct one of her own. The MDL court seemed especially impressed by Dr. Bérard’s having published several papers that rejected a class effect of teratogenicity for all SSRIs, as recently as 2012, while failing to identify anything that was published subsequently that could explain her dramatic change in opinion for litigation.

Substituting Risk for Specific Causation

June 15th, 2014

Specious, Speculative, Spurious, and Sophistical

Some legal writers assert that all evidence is ultimately “probable,” but that assertion appears to be true only to the extent that the evidentiary support for any claim can be mapped on a scale from 0 to 1, much as probability is.  Probability thus finds its way into discussions of burdens of persuasion as requiring the claim to be shown more probably than not, and into discussions of expert witness certitude as requiring “reasonable degree of scientific probability.”

There is a contrary emphasis in the law on “actual truth,” which is different from “mere probability.”  The rejection of probabilism can be seen in some civil cases, in which courts have emphasized the need for individualistic data and conclusions, beyond generalizations that might be made about groups that clearly encompass the individual at issue. For example, the Supreme Court has held that charging more for funding a woman’s pension than a man’s is discriminatory because not all women will outlive all men, or the men’s average life expectancy. City of Los Angeles Dep’t of Water and Power v. Manhart, 435 U.S. 702, 708 (1978) (“Even a true generalization about a class is an insufficient reason for disqualifying an individual to whom the generalization does not apply.”). See also El v. Southeastern Pennsylvania Transportation Authority, 479 F.3d 232, 237 n.6 (3d Cir. 2007) (“The burden of persuasion … is the obligation to convince the factfinder at trial that a litigant’s necessary propositions of fact are indeed true.”).

Specific causation is the soft underbelly of the toxic tort world, in large measure because courts know that risk is not specific causation. In the context of risk of disease, which is usually based upon a probabilistic group assessment, courts occasionally distinguish between risk and specific causation. SeeProbabilism Case Law” (Jan. 28, 2013) (collecting cases for and against probabilism).

In In re Fibreboard Corp., 893 F. 2d 706, 711-12 (5th Cir. 1990), the court rejected a class action approach to litigating asbestos personal injury claims because risk could not substitute for findings of individual causation:

“That procedure cannot focus upon such issues as individual causation, but ultimately must accept general causation as sufficient, contrary to Texas law. It is evident that these statistical estimates deal only with general causation, for ‘population-based probability estimates do not speak to a probability of causation in any one case; the estimate of relative risk is a property of the studied population, not of an individual’s case.’ This type of procedure does not allow proof that a particular defendant’s asbestos ‘really’ caused a particular plaintiff’s disease; the only ‘fact’ that can be proved is that in most cases the defendant’s asbestos would have been the cause.”

Id. at 711-12 (citing Steven Gold, “Causation in Toxic Torts: Burdens of Proof, Standards of Persuasion, and Statistical Evidence,” 96 Yale L.J. 376, 384, 390 (1986)). See also Guinn v. AstraZeneca Pharms., 602 F.3d 1245, 1255 (11th Cir. 2010) (“An expert, however, cannot merely conclude that all risk factors for a disease are substantial contributing factors in its development. ‘The fact that exposure to [a substance] may be a risk factor for [a disease] does not make it an actual cause simply because [the disease] developed.’”) (internal citation omitted).

The analytical care of the Guinn case and others is often abandoned, however, when it will stand in the way of compensation. The conflation of risk and (specific) causation is prevalent precisely because in many cases there is no scientific or medical way to discern what antecedent risks actually played a role in causing an individual’s disease.  Opinions about specific causation are thus frequently devoid of factual or logical support, and are propped up solely by hand waving about differential etiology and inference to the best explanation.

In the scientific world, most authors recognize that risk, even if real and above baseline, regardless of magnitude, does not support causal attribution in a specific case.[1]  Sir Richard Doll, who did so much to advance the world’s understanding of asbestos as a cause of lung cancer, issued a caveat about the limits of specific causation inference. Richard Doll, “Proof of Causality: Deduction from Epidemiological Observation,” 45 Perspectives in Biology & Medicine 499, 500 (2002) (“That asbestos is a cause of lung cancer in this practical sense is incontrovertible, but we can never say that asbestos was responsible for the production of the disease in a particular patient, as there are many other etiologically significant agents to which the individual may have been exposed, and we can speak only of the extent to which the risk of the disease was increased by the extent of his or her exposure.”).

Similarly, Kenneth Rothman, a leading voice among epidemiologists, cautioned against conflating epidemiologic inferences about groups with inferences about causes in individuals. Kenneth Rothman, Epidemiology: An Introduction 44 (Oxford 2002) (“An elementary but essential principal that epidemiologists must keep in mind is that a person may be exposed to an agent and then develop disease without there being any causal connection between exposure and disease.”  … “In a courtroom, experts are asked to opine whether the disease of a given patient has been caused by a specific exposure.  This approach of assigning causation in a single person is radically different from the epidemiologic approach, which does not attempt to attribute causation in any individual instance.  Rather, the epidemiologic approach is to evaluate the proposition that the exposure is a cause of the disease in a theoretical sense, rather than in a specific person.”) (emphasis added).

The late David Freedman, who was the co-author of the chapters on statistics in all three editions of the Reference Manual on Scientific Evidence, was also a naysayer when it came to transmuting risk into cause:

“The scientific connection between specific causation and a relative risk of two is doubtful. *** Epidemiologic data cannot determine the probability of causation in any meaningful way because of individual differences.”

David Freedman & Philip Stark, “The Swine Flu Vaccine and Guillain-Barré Syndrome:  A Case Study in Relative Risk and Specific Causation,” 64 Law & Contemporary Problems 49, 61 (2001) (arguing that proof of causation in a specific case, even starting with a relative risk of four, was “unconvincing”; citing Manko v. United States, 636 F. Supp. 1419, 1437 (W.D. Mo. 1986) (noting relative risk of 3.89–3.92 for GBS from swine-flu vaccine), aff’d in part, 830 F.2d 831 (8th Cir. 1987)).

Graham Colditz, who testified for plaintiffs in the hormone therapy litigation, similarly has taught that an increased risk of disease cannot be translated into the “but-for” standard of causation.  Graham A. Colditz, “From epidemiology to cancer prevention: implications for the 21st Century,” 18 Cancer Causes Control 117, 118 (2007) (“Knowledge that a factor is associated with increased risk of disease does not translate into the premise that a case of disease will be prevented if a specific individual eliminates exposure to that risk factor. Disease pathogenesis at the individual level is extremely complex.”)

Another epidemiologist, who wrote the chapter on epidemiology in the Federal Judicial Center’s Reference Manual on Scientific Evidence, put the matter thus:

“However, the use of data from epidemiologic studies is not without its problems. Epidemiology answers questions about groups, whereas the court often requires information about individuals.”

Leon Gordis, Epidemiology 362 (5th ed. 2014) (emphasis in original).

=========================================================

In New Jersey, an expert witness’s opinion that lacks a factual foundation is termed a “net opinion.” Polzo v. County of Essex, 196 N.J. 569, 583 (2008) (explaining New Jersey law’s prohibition against “net opinions” and “speculative testimony”). Under federal law, Rule 702, such an opinion is simply called inadmissible.

Here is an interesting example of a “net opinion” from an expert witness, in the field of epidemiology, who has testified in many judicial proceedings:


November 12, 2008

George T. Brugess, Esq.
Hoey & Farina, Attorneys at Law
542 South Dearborn Street, Suite 200
Chicago, IL 60605

Ref: Oscar Brooks v. Ingram Barge and Jantran Inc.

* * * *

Because [the claimant] was employed 28 years, he falls into the greater than 20 years railroad employment category (see Table 3 of Garshick’s 2004 paper) which shows a significant risk for lung cancer that ranges from 1.24 to 1.50. This means that his diesel exposure was a significant factor in his contracting lung cancer. His extensive smoking was also a factor in his lung cancer, and diesel exposure combined with smoking is an explanation for the relatively early age, 61 years old, of his diagnosis.

Now assuming that diesel exposure truly causes lung cancer, what was the basis for this witness (David F. Goldsmith, PhD) to opine that diesel exposure was a “significant factor” in the claimant’s developing lung cancer?  None really.  There was no basis in the report, or in the scientific data, to transmute an exposure that yielded a risk ratio of 1.24 to 1.50 for lung cancer, in a population similarly exposed to diesel emissions, into a “significant factor.” The claimant’s cancer may have arisen from background, baseline risk.  The cancer may have arisen from the risk due to smoking, which would have been on the order of a 2,000% increase, or so.  The cancer may have arisen from the claimed carcinogenicity of diesel emissions, on the order of 25 to 50%, which was rather insubstantial compared with his smoking risk.  Potentially, the cancer arose from a combination of the risks from both diesel emissions and tobacco smoking. In the population of men situated similarly to Mr. Oscar Brooks, by far the biggest reduction in lung cancer incidence would be achieved by removing tobacco smoking.
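
The arithmetic behind this comparison can be made explicit with the attributable fraction, (RR - 1)/RR, the share of risk among the exposed that is attributable to the exposure. What follows is a minimal sketch in Python, under the assumption that the relative risks are those recited above, and treating smoking's roughly 2,000% increase as a relative risk of about 21:

    # Attributable fractions implied by the relative risks discussed above.
    # Assumption: smoking's "2,000% increase" is treated as RR = 21.
    def attributable_fraction(rr):
        """Share of risk among the exposed attributable to the exposure."""
        return (rr - 1.0) / rr

    for label, rr in [("diesel, low estimate", 1.24),
                      ("diesel, high estimate", 1.50),
                      ("smoking", 21.0)]:
        print(f"{label}: RR = {rr}, AF = {attributable_fraction(rr):.0%}")

    # diesel, low estimate: RR = 1.24, AF = 19%
    # diesel, high estimate: RR = 1.5, AF = 33%
    # smoking: RR = 21.0, AF = 95%

On these assumed figures, the smoking exposure dwarfs the claimed diesel contribution, which is why calling the smaller risk a “significant factor” and the larger one merely “a factor” inverted the arithmetic.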

There were no biomarkers that identified the claimant’s lung cancer as having been caused by diesel emissions.  The expert witness’s opinion was nothing more than an ipse dixit that equated a risk, and a rather small risk, with specific causation.  Notice how a 24% increased risk from diesel emissions was a “significant factor,” but the claimant’s smoking history was merely “a factor.”

Goldsmith’s report on specific causation was a net opinion that exemplifies what is wrong with a legal system that encourages and condones baseless expert witness testimony. In Agent Orange, Judge Weinstein pointed out that the traditional judicial antipathy to probabilism would mean no recovery in many chemical and medicinal exposure cases.  If the courts lowered their scruples to permit recovery on a naked statistical inference of greater than 50%, from relative risks greater than two, some cases might remain viable (but alas not the Agent Orange case itself). Judge Weinstein was, no doubt, put off by the ability of defendants, such as tobacco companies, to avoid liability because plaintiffs would never have more than evidence of risk.  In the face of relative risks often in excess of 30, with attributable risks in excess of 95%, this outcome was disturbing.

Judge Weinstein’s compromise was a pragmatic solution to the problem of adjudicating specific causation on the basis of risk evidence. Although, as noted above, many scientists rejected any use of risk to support specific causation inferences, some scientists agreed with this practical solution.  Ironically, David Goldsmith, the author of the report in the Oscar Brooks case, supra, was one such writer who had embraced the relative risk cutoff:

“A relative risk greater than 2.0 produces an attributable risk (sometimes called attributable risk percent) or an attributable fraction that exceeds 50%.  An attributable risk greater than 50% also means that ‘it is more likely than not’, or, in other words, there is a greater than 50% probability that the exposure to the risk factor is associated with disease.”

David F. Goldsmith & Susan G. Rose, “Establishing Causation with Epidemiology,” in Tee L. Guidotti & Susan G. Rose, eds., Science on the Witness Stand:  Evaluating Scientific Evidence in Law, Adjudication, and Policy 57, 60 (OEM Press 2001).
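
The algebra in this passage is easy to verify. A short sketch, using the same attributable-fraction formula as above, shows why a relative risk of 2.0 is the break-even point for “more likely than not”:

    # The relative-risk-of-two heuristic: the attributable fraction
    # (RR - 1)/RR exceeds 50% exactly when RR exceeds 2.0.
    def attributable_fraction(rr):
        return (rr - 1.0) / rr

    assert attributable_fraction(2.0) == 0.5   # break-even: exactly 50%
    assert attributable_fraction(1.5) < 0.5    # Brooks-type risks fall short
    assert attributable_fraction(4.0) == 0.75  # Freedman's relative risk of four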

In the Brooks case, Goldsmith did not have an increased risk even close to 2.0. The litigation industry ultimately would not accept anything other than full compensation for attributable risks greater than 0%.


[1] See, e.g., Sander Greenland, “Relation of the Probability of Causation to Relative Risk and Doubling Dose:  A Methodologic Error that Has Become a Social Problem,” 89 Am. J. Pub. Health 1166, 1168 (1999) (“[a]ll epidemiologic measures (such as rate ratios and rate fractions) reflect only the net impact of exposure on a population”); Joseph V. Rodricks & Susan H. Rieth, “Toxicological Risk Assessment in the Courtroom:  Are Available Methodologies Suitable for Evaluating Toxic Tort and Product Liability Claims?” 27 Regulatory Toxicol. & Pharmacol. 21, 24-25 (1998) (noting that a population risk applies to individuals only if all persons within the population are the same with respect to the influence of the risk on outcome); G. Friedman, Primer of Epidemiology 2 (2d ed. 1980) (epidemiologic studies address causes of disease in populations, not causation in individuals).


Hysterical Histortions

June 12th, 2014

Ramses Delafontaine is a young, aspiring historian. In his graduate thesis, Historicizing the Forensification of History: A Study of Historians as Expert Witnesses in Tobacco Litigation in the United States of America (Univ. Ghent 2013), Delafontaine discusses my commentary on Marxist historians David Rosner and Gerald Markowitz, and suggests that I claim that lawyers without historical training or experience can do the job of historians.  Id. at 98-100.

Given their training and skills in documenting and recounting narratives, lawyers do, indeed, often do the job of historians, and they often do it very well. Of course, lawyers are often guided, inspired, and assisted by professional historians. Sometimes that guidance is necessary. Lawyers’ narratives, unlike historians’, are also subject to judicial control in the form of evidentiary rules about speculation, relevance, reliability, authentication, and trustworthiness.

Regurgitating Historical Evidence

American courts have thus appropriately limited the use of expert witnesses to present historical narratives in judicial proceedings.  See In re Fosamax Prods. Liab. Litig., 645 F. Supp. 2d 164, 192 (S.D.N.Y. 2009) (‘‘[A]n expert cannot be presented to the jury solely for the purpose of constructing a factual narrative based upon record evidence.’’) (internal citations omitted).[1] In Fosamax, Judge Keenan excluded Dr. Susan Parisian’s proffered narrative account of the development and approval of a bisphosphonate medication for osteoporosis in a case involving a claim of osteonecrosis of the jaw (“phossy jaw”).  Judge Keenan detailed the problems that arise from using expert witnesses to present partisan historical narratives:

“In detailing the factual basis for her opinions, Dr. Parisian’s report presents a narrative of select regulatory events through the summary or selective quotation from internal Merck documents, regulatory filings, and the deposition testimony of Merck employees.

The Court agrees with Merck that, to the extent such evidence is admissible, it should be presented to the jury directly. Dr. Parisian’s commentary on any documents and exhibits in evidence will be limited to explaining the regulatory context in which they were created, defining any complex or specialized terminology, or drawing inferences that would not be apparent without the benefit of experience or specialized knowledge. She will not be permitted to merely read, selectively quote from, or ‘regurgitate’ the evidence.”

Id.[2]

Ramses Delafontaine is wrong, however, to opine that my rants against Rosner and Markowitz suggest that I have ruled out any role for historians in litigation.   Lisa K. Walker is an historian who trained at the University of California, Berkeley.  In the welding fume litigation, plaintiffs’ counsel wove a complex narrative of conspiracy allegations, based in large measure upon the absence of evidence. At the request of Metropolitan Life Insurance Company, Professor Walker researched the dates of publication for various editions of a booklet, Health Protection in Welding, which formed the basis for the plaintiffs’ speculations. Walker found and analyzed eight separate editions, and dated each by internal and external references.  Based upon her research, Walker submitted a declaration, which ultimately was immensely helpful to the resolution of the issues. See In re Welding Rod Prods. Liab. Litig., Case No. 1:03-CV-17000 (MDL Docket No. 1535) (N.D. Ohio Nov. 24, 2004); In re Welding Fume Prods. Liab. Litig., 2007 WL 1087605 (N.D. Ohio April 9, 2007) (O’Malley, J.) (granting summary judgment in favor of Metropolitan Life Insurance Company).

Although Metropolitan Life should not have had to disprove the allegations, Walker’s research showed the plaintiffs’ historical speculation to be clearly wrong. Sometimes historians can and do contribute valuably to the resolution of legal issues, but the issues are usually more modest than the ones that social and labor historians want to see resolved in favor of their pet theories.

Delafontaine also has a website on the role of historians as expert witnesses in United States tobacco cases.  One of his theses is that “historians who have been involved as expert witnesses for the tobacco industry have been in it for the money and have sold their professional integrity as a historian and an academic.” Delafontaine’s approach is a bit one-sided in that he sees only defendants’ expert witnesses as being “in it for the money,” despite the substantial billings of plaintiffs’ expert witnesses, and their ideological biases. Somehow Delafontaine has missed how even the feel-good advocacy of anti-tobacco activists occasionally outruns its evidentiary headlights.  See, e.g., Michael Siegel, “Is the tobacco control movement misrepresenting the acute cardiovascular health effects of secondhand smoke exposure? An analysis of the scientific evidence and commentary on the implications for tobacco control and public health practice,” 4 Epidem. Persp. & Innov. 12 (2007).


[1] See also In re Prempro Prods. Liab. Litig., 554 F.Supp. 2d 871, 880, 886 (E.D.Ark.2008) (overturning a punitive damages award based on Dr. Parisian’s testimony in part because she ‘‘did not explain the documents, provide summaries, or tie them in to her proposed regulatory testimony’’ and ‘‘did not provide analysis, opinion, or expertise’’); Highland Capital Management, L.P. v. Schneider, 379 F.Supp. 2d 461, 469 (S.D.N.Y.2005)(‘‘[A]n expert cannot be presented to the jury solely for the purpose of constructing a factual narrative based upon record evidence.’’); In re Rezulin Products Liab. Litig., 309 F.Supp. 2d 531, 546 (S.D.N.Y.2004) (rejecting portion of expert report presenting history of Rezulin for no purpose but to ‘‘provid[e] an historical commentary of what happened,’’ along with subjective assessments of intent, motives, and states of mind); In re Diet Drugs Prods. Liab. Litig., MDL No. 1203, 2000 WL 876900, at *9 (E.D.Pa. June 20, 2000) (same); Taylor v. Evans, 1997 WL 154010, at *2 (S.D.N.Y. Apr.1, 1997) (rejecting portions of expert report on the ground that the testimony consisted of ‘‘a narrative of the case which a lay juror is equally capable of constructing’’).

[2] Dr. Parisian appears to be a serial narrative abuser, who has been repeatedly but not consistently excluded. See Scheinberg v. Merck & Co., 924 F.Supp. 2d 477, 497 (S.D.N.Y. 2013); Pritchett v. I-Flow Corp., 2012 WL 1059948, at *7 (D. Colo. Mar. 28, 2012); Miller v. Stryker Instruments, 2012 WL 1718825, at *10-12 (D. Ariz. Mar. 29, 2012) (excluding narrative testimony); Kaufman v. Pfizer Pharms., Inc., 2011 WL 7659333, at *6-10 (S.D. Fla. Aug. 4, 2011), reh’g denied, 2011 WL 10501233 (S.D. Fla. Aug. 10, 2011) (narrative testimony); Hines v. Wyeth, 2011 WL 2680842, at *7 (S.D.W. Va. July 8, 2011), reh’g granted in part, 2011 WL 2730908, at *2 (S.D.W. Va. July 13, 2011); In re Heparin Prods. Liab. Litig., 2011 WL 1059660, at *8 (N.D. Ohio March 21, 2011); Lopez v. I-Flow Inc., 2011 WL 1897548, at *9-10 (D. Ariz. Jan. 26, 2011) (narrative testimony); In re Trasylol Prods. Liab. Litig., 709 F.Supp.2d 1323, 1351 (S.D. Fla. 2010) (tendentious narrative testimony) (“Plainly stated, Dr. Parisian is an advocate, presented with the trappings of an expert but with no expectation or intention of abiding by the opinion constraints of Rule 702.”), reh’g denied, 2010 WL 2541892 (S.D. Fla. June 22, 2010); In re Gadolinium-Based Contrast Agents Prods. Liab. Litig., 2010 WL 1796334, at *13 (N.D. Ohio May 4, 2010); Bessemer v. Novartis Pharms. Corp., 2010 WL 2300222 (N.J. Super. Law Div. April 30, 2010); In re Prempro Prods. Liab. Litig., 554 F. Supp. 2d 871, 879-87 (E.D. Ark. 2008) (reversing judgment on grounds of erroneous admission of narrative testimony), aff’d in relevant part, 586 F.3d 547, 571 (8th Cir. 2009). Occasionally, Dr. Parisian slips through the gate.  See, e.g., Block v. Woo Young Medical Co., 937 F.Supp.2d 1028, 1044-47 (2013).

Goodman v Viljoen – Statistical Fallacies from Both Sides

June 8th, 2014

There was a deep irony to the Goodman[1] case.  If a drug company, in 1995, had marketed antenatal corticosteroids (ACS) for the prevention of cerebral palsy (CP) in the United States, the government might well have prosecuted the company for misbranding.  The company might also have been subject to a False Claims Act case. No clinical trial had found ACS efficacious for the prevention of CP at the significance level typically required by the FDA; no meta-analysis had found ACS statistically significantly better than placebo for this purpose.  In the Goodman case, however, failure to order a full course of ACS was malpractice with respect to the claimed causation of CP in the Goodman twins.

The Goodman case also occasioned a well-worn debate over the difference between scientific and legal evidence, inference, and standards of “proof.” The plaintiffs’ case rested upon a Cochrane review of ACS with respect to various outcomes. For CP, the Cochrane meta-analyzed only clinical trial data, and reported:

“a trend towards fewer children having cerebral palsy (RR 0.60, 95% CI 0.34 to 1.03, five studies, 904 children, age at follow up two to six years in four studies, and unknown in one study).”[2]

The defendant, Dr. Viljoen, appeared to argue that the Cochrane meta-analysis must be disregarded because it did not provide a showing of efficacy for ACS in preventing CP, at a significance probability less than 5 percent.  Here is the trial court’s characterization of Dr. Viljoen’s argument:

“[192] The argument that the Cochrane data concerning the effects of ACS on CP must be ignored because it fails to reach statistical significance rests on the flawed premise that legal causation requires the same standard of proof as medical/scientific causation. This is of course not the case; the two standards are in fact quite different. The law is clear that scientific certainty is not required to prove causation to the legal standard of proof on a balance of probabilities (See: Snell v. Farrell, [1990] 2 S.C.R. 311, at para. 34). Accordingly, the defendant’s argument in this regard must fail and for the purposes of this court, I accept the finding of the Cochrane analysis that ACS reduces the instance [sic] of CP by 40%.”

“Disregard” seems extreme for a meta-analysis that showed a 40% reduction in risk of a serious central nervous system disorder, with p = 0.065.  Perhaps Dr. Viljoen might have tempered his challenge somewhat by arguing that the Cochrane analysis was insufficient.  One problem with Dr. Viljoen’s strident argument about statistical significance was that it overshadowed the more difficult, qualitative arguments about threats to validity in the Cochrane finding from loss to follow up in the aggregated trial data. These threats were probably stronger arguments against accepting the Cochrane “trend” as a causal conclusion. Indeed, the validity of the individual studies and of the meta-analysis, along with questions about the accuracy of the data, was not reflected in the Bayesian analysis.
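
The reported p-value can be approximately reconstructed from the published confidence interval. Here is a minimal sketch, assuming the usual normal approximation on the log-risk-ratio scale; the small gap between the result and the reported 0.065 reflects rounding in the published interval:

    # Recover an approximate p-value from the Cochrane estimate
    # (RR 0.60, 95% CI 0.34 to 1.03), assuming normality on the log scale.
    from math import erf, log, sqrt

    rr, lo, hi = 0.60, 0.34, 1.03
    se = (log(hi) - log(lo)) / (2 * 1.96)            # standard error from CI width
    z = log(rr) / se                                 # standardized statistic
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided p-value
    print(f"z = {z:.2f}, p = {p:.3f}")               # z = -1.81, p = 0.071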

Another problem is that Dr. Viljoen’s strident assertion that p < 0.05 was absolutely necessary fed plaintiffs’ argument that the defendant was attempting to change the burden of proof for plaintiffs from greater than 50% to 95% or greater.  Given the defendant’s position, great care was required to prevent the trial court from committing the transposition fallacy.

Justice Walters rejected the suggestion that a meta-analysis with a p-value of 6.5% should be disregarded, but the court’s discussion skirts the question whether and how the Cochrane data can be sufficient to support a conclusion of ACS efficacy. Aside from citing a legal case, however, Justice Walters provided no basis for suggesting that the scientific standard of proof was different from the legal standard. From the trial court’s opinion, the parties or their expert witnesses appeared to conflate “confidence,” a technical term when used to describe intervals or random error around sample statistics, with “level of certainty” in the obtained result.

Justice Walters is certainly not the first judge to fall prey to the fallacious argument that the scientific burden of proof is 95%.[3]  The 95% is, of course, the coefficient of confidence for the confidence interval that is based upon a p-value of 5%. No other explanation for why 95% is a “scientific” standard of proof was offered in Goodman; nor is it likely that anyone could point to an authoritative source for the claim that scientists actually adjudge facts and theories by this 95 percent probability level.
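
What the 95% coefficient actually describes is the long-run performance of the procedure that generates the interval, not the probability that any particular result is true. A small simulation, under assumed toy values (a normal sample with known unit variance), illustrates the coverage interpretation:

    # Coverage of a 95% confidence interval in repeated sampling: about 95%
    # of the intervals constructed this way cover the true value. That
    # long-run property is not a burden of proof for any single study.
    import random

    random.seed(2)
    true_mean, n, trials, covered = 0.0, 50, 10_000, 0
    for _ in range(trials):
        sample = [random.gauss(true_mean, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        half_width = 1.96 / n ** 0.5    # known sigma = 1, for simplicity
        covered += (mean - half_width) <= true_mean <= (mean + half_width)
    print(f"coverage = {covered / trials:.3f}")   # close to 0.950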

Justice Walters was led astray by the transposition fallacy, which confuses posterior and significance probabilities.  Here is a sampling from Her Honor’s opinion, first concerning Dr. Jon Barrett, one of the plaintiffs’ expert witnesses, an obstetrician and fetal maternal medicine specialist at Sunnybrook Hospital, in Toronto, Ontario:

“[85] Dr. Barrett’s opinion was not undermined during his lengthy cross-examination. He acknowledged that the scientific standard demands 95% certainty. He is, however, prepared to accept a lower degree of certainty. To him, 85 % is not merely a chance outcome.

* * *

[87] He acknowledged that scientific evidence in support of the use of corticosteroids has never shown statistical significance with respect to CP. However, he explained it is very close at 93.5%. He cautioned that if you use a black and white outlook and ignore the obvious trends, you will falsely come to the conclusion that there is no effect.”

Dr. Jon (Yoseph) Barrett is a well-respected physician, who specializes in high-risk pregnancies, but his characterization of a black-white outlook on significance testing as leading to a false conclusion of no effect was statistically doubtful.[4]  Dr. Barrett may have to make divinely inspired choices in surgery, but in a courtroom, expert witnesses are permitted to say that they just do not know. Failure to achieve statistical significance, with p < 0.05, does not support a conclusion that there is an effect.

Professor Andrew Willan was plaintiffs’ testifying expert witness on statistics.  Here is how Justice Walters summarized Willan’s testimony:

“[125] Dr. Willan described different statistical approaches and in particular, the frequentist or classical approach and the Bayesian approach which differ in their respective definitions of probability. Simply, the classical approach allows you to test the hypothesis that there is no difference between the treatment and a placebo. Assuming that there is no difference, allows one to make statements about the probability that the results are not due to chance alone.

To reach statistical significance, a standard of 95% is required. A new treatment will not be adopted into practice unless there is less than a 5% chance that the results are due to chance alone (rather than due to true treatment effect).

[127] * * * The P value represents the frequentist term of probability. For the CP analysis [from the Cochrane meta-analysis], the P value is 0.065. From a statistical perspective, that means that there is a 6.5% chance that the differences that are being observed between the treatment arm versus the non-treatment arm are due to chance rather than the treatment, or conversely, a 93.5% chance that they are not.”

Justice Walters did not provide transcript references for these statements, but they are clear examples of the transposition fallacy. The court’s summary may have been unfair to Professor Willan, who seems to have taken care to avoid the transposition fallacy in his testimony:

“And I just want to draw your attention to the thing in parenthesis where it says, “P = 0.065.” So, basically that is the probability of observing data this extremely, this much in favor of ACS given, if, if in fact the no [sic, null] hypothesis was true. So, if, if the no hypothesis was true, that is there was no difference, then the probability of observing this data is only 6.5 percent.”

Notes of Testimony of Andrew Willan at 26 (April , 2010). In this quote, Professor Willan might have been more careful to point out that the significance probability of 6.5% is a cumulative probability, describing the probability of data “this extreme” and more extreme. Nevertheless, Willan certainly made clear that the probability measure was based upon assuming the correctness of the null hypothesis. The trial court, alas, erred in stating the relevant statistical concepts.
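
Willan’s formulation can be checked by simulation: generate data on the assumption that the null hypothesis is true, and count how often a result at least as extreme as the observed one turns up. A minimal sketch, assuming a z-score of about 1.85, which corresponds to a two-sided p-value of roughly 0.065:

    # The p-value as a tail probability computed under the null hypothesis:
    # P(data at least this extreme | null), not P(null | data).
    import random

    random.seed(1)
    z_observed = 1.85          # roughly the z-score behind p = 0.065
    trials = 200_000
    extreme = sum(abs(random.gauss(0.0, 1.0)) >= z_observed
                  for _ in range(trials))
    print(f"P(|Z| >= 1.85 | null) = {extreme / trials:.3f}")   # about 0.065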

And then there was Justice Walters’ bizarre description of the Cochrane data as embodying a near-uniform distribution:

“[190] * * * The Cochrane analysis found that ACS reduced the risk of CP (in its entirety) by 40%, 93.5% of the time.”

The trial court did not give the basis for this erroneous description of the Cochrane ACS/CP data.[5] To be sure, if the Cochrane result were true, then a 40% reduction might be the expected value for all trials, but it would be a remarkable occurrence for 93.5% of the trials to obtain the same risk ratio as the one observed in the meta-analysis.

The defendant’s expert witness on statistical issues, Prof. Robert Platt, similarly testified that the significance probability reported by the Cochrane was dependent upon an assumption of the null hypothesis of no association:

“What statistical significance tells us, and I mentioned at the beginning that it refers to the probability of a chance finding could occur under the null-hypothesis of no effect. Essentially, it provides evidence in favour of there being an effect.  It doesn’t tell us anything about the magnitude of that effect.”

Notes of Testimony of Robert Platt at 11 (April 19, 2010).

Perhaps part of the confusion resulted from Prof. Willan’s sponsored Bayesian analysis, which led him to opine that the Cochrane data supported a 91 to 97 percent probability of an effect, an opinion that might have appeared to the trial court to say the same thing as its interpretation of the Cochrane p-value of 6.5%.  Indeed, Justice Walters may have had some assistance in this confusion from the defense statistical expert witness, Prof. Platt, who testified:

“From the inference perspective the p-value of 0.065 that we observe in the Cochrane review versus a 91 to 97 percent probability that there is an effect, those amount to the same thing.”

Notes of Testimony of Robert Platt at 50 (April 19, 2010).  Now the complement of the p-value, 93.5%, may have fallen within the range of posterior probabilities asserted by Professor Willan, but these probabilities are decidedly not the same thing.
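
The numerical proximity is no accident, but neither is it an identity. The record never disclosed Willan’s prior (a point taken up below), but under one charitable reconstruction, a flat prior on the log risk ratio, the posterior probability of any protective effect equals 1 - p/2, not 1 - p. A sketch under that assumption:

    # Posterior probability of any protective effect (RR < 1), assuming a
    # flat prior on the log risk ratio and the published Cochrane interval.
    # Under that flat prior the posterior equals 1 - p/2, not 1 - p.
    from math import erf, log, sqrt

    def norm_cdf(x):
        return 0.5 * (1 + erf(x / sqrt(2)))

    se = (log(1.03) - log(0.34)) / (2 * 1.96)   # from the 95% CI
    z = log(0.60) / se                          # about -1.81
    print(f"Pr(RR < 1 | data) = {norm_cdf(-z):.3f}")   # about 0.965

On this reconstruction the posterior lands within the 91 to 97 percent range that Willan reported, yet it remains a conceptually different quantity from the 93.5% complement of the p-value.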

Perhaps Prof. Platt was referring only to the numerical equivalence, but his language, “the same thing,” certainly could have bred misunderstanding.  The defense apparently attacked the reliability of the Bayesian analysis before trial, only to abandon the challenge by the time of trial.  At trial, defense expert witness Prof. Platt testified that he did not challenge Willan’s Bayesian analysis, or the computation of posterior probabilities.  Platt’s acquiescence in Willan’s Bayesian analysis is unfortunate because the parties never developed testimony exactly as to how Willan arrived at his posterior probabilities, and especially as to what prior probability he employed.

Professor Platt went on to qualify his understanding of Willan’s Bayesian analysis as providing a posterior probability that there is an effect, or in other words, that the “effect size” is greater than 1.0.  At trial, the parties spent a good deal of time showing that the Cochrane risk ratio of 0.6 represented the decreased risk for CP of administering a full course of ACS, and that this statistic could be presented as an increased CP risk ratio of 1.7, for not having administered a full course of ACS.  Platt and Willan appeared to agree that the posterior probability described the cumulative posterior probabilities for increased risks above 1.0.

“[T]he 91% is a probability that the effect is greater than 1.0, not that it is 1.7 relative risk.”

Notes of Testimony of Robert Platt at 51 (April 19, 2010); see also Notes of Testimony of Andrew Willan at 34 (April 9, 2010) (concluding that ACS reduces risk of CP, with a probability of 91 to 97 percent, depending upon whether random effects or fixed effect models are used).[6]

One point on which the parties’ expert witnesses did not agree was whether the failure of the Cochrane meta-analysis to achieve statistical significance was due solely to the sparse data aggregated from the randomized trials. Plaintiffs’ witnesses appeared to have testified that had the Cochrane been able to aggregate additional clinical trial data, the “effect size” would have remained constant, and the p-value would have shrunk, ultimately to below the level of 5 percent.  Prof. Platt, testifying for the defense, appropriately criticized this hand-waving excuse:

“Q. and the probability factor, the P value, was 0.065, which the previous witness had suggested is an increase in probability of our reliability on the underlying data.  Is it reasonable to assume that this data that a further increase in the sample size will achieve statistical significance?

A. No, that’s not a reasonable assumption….”

Notes of Testimony of Robert Platt at 29 (April 19, 2010).
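
The assumption buried in the plaintiffs’ position can be made explicit with the same normal approximation used above: hold the point estimate fixed, shrink the standard error by the square root of the sample-size multiple, and the p-value falls. The arithmetic works; the premise that the estimate would hold constant is exactly what Platt rejected:

    # If, and only if, the RR 0.60 estimate held constant while the pooled
    # sample grew k-fold, the standard error would shrink by sqrt(k) and
    # the p-value would drop below 0.05.
    from math import erf, log, sqrt

    def two_sided_p(z):
        return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

    se = (log(1.03) - log(0.34)) / (2 * 1.96)   # from the reported 95% CI
    for k in (1, 2, 3):
        z = log(0.60) / (se / sqrt(k))
        print(f"{k}x the data: p = {two_sided_p(z):.3f}")
    # 1x the data: p = 0.071
    # 2x the data: p = 0.011
    # 3x the data: p = 0.002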

Positions on Appeal

Dr. Viljoen continued to assert the need for significance on appeal. As appellant, he challenged the trial court’s finding that the Cochrane review concluded that there was a 40% risk reduction. See Goodman v. Viljoen, 2011 ONSC 821, at ¶192 (CanLII) (“I accept the finding of the Cochrane analysis that ACS reduces the instance of CP by 40%”). Dr. Viljoen correctly pointed out that the Cochrane review never reached such a conclusion. Appellant’s Factum, 2012 CCLTFactum 20936, ¶64.  It was the plaintiffs’ expert witnesses, not the Cochrane reviewers, who reached the conclusion of causality from the Cochrane data.

On appeal, Dr. Viljoen pressed the point that, as his expert witnesses had testified, statistical significance in the Cochrane analysis would have been “a basic and universally accepted standard” for showing that ACS was efficacious in preventing CP or PVL. Id. at ¶40. The appellant’s brief then commits the very error that Dr. Barrett complained would follow from a finding that lacked statistical significance; Dr. Viljoen maintained that the “trend” of reduced CP rates from ACS administration “is the same as a chance occurrence.” Defendant (Appellant), 2012 CCLTFactum 20936, at ¶40; see also id. at ¶14(e) (arguing that the Cochrane result for ACS/CP “should be treated as pure chance given it was not a statistically significant difference”).

Relying upon the Daubert decision from the United States, as well as Canadian cases, Dr. Viljoen framed one of his appellate issues as whether the trial court had “erred in relying upon scientific evidence that had not satisfied the benchmark of statistical significance”:

“101. Where a scientific effect is not shown to a level of statistical significance, it is not proven. No study has demonstrated a reduction in cerebral palsy with antenatal corticosteroids at a level of statistical significance.

102. The Trial Judge erred in law in accepting that antenatal corticosteroids reduce the risk of cerebral palsy based on Dr. Willan’s unpublished Bayesian probability analysis of the 48 cases of cerebral palsy reviewed by Cochrane—an analysis prepared for the specific purpose of overcoming the statistical limitations faced by the Plaintiffs on causation.”

Defendant (Appellant), 2012 CCLTFactum 20936. The use of the verb “proven” is problematic because it suggests a mathematical demonstration, which is never available for empirical propositions about the world, and especially not for the biological world.  The use of a mathematical standard begs the question whether the Cochrane data were sufficient to establish a scientific conclusion of the efficacy of ACS in preventing CP.

In opposing Dr. Viljoen’s appeal, the plaintiffs capitalized upon his assertion that science requires a very high level of posterior probability for establishing a causal claim, by simply agreeing with it. See Plaintiffs’ (Respondents’) Factum,  2012 CCLTFactum 20937, at ¶31 (“The scientific method requires statistical significance at a 95% level.”).  By accepting the idealized notion that science somehow requires 95% certainty (as opposed to 95% confidence levels as a test for assessing random error), the plaintiffs made the defendant’s legal position untenable.

In order to keep the appellate court thinking that the defendant was imposing an extra-legal, higher burden of proof upon plaintiffs, the plaintiffs went so far as to misrepresent the testimony of their own expert witness, Professor Willan, as having committed the transposition fallacy:

“49. Dr. Willan provided the frequentist explanation of the Cochrane analysis on CP:

a. The risk ratio (RR) is 0.60 which means that there is a 40% risk reduction in cerebral palsy where there has been administration of antenatal corticosteroids;

b. The upper limit of the confidence interval (CI) barely crosses 1 so it just barely fails to meet the rigid test of statistical significance;

c. The p value represents the frequentist term of probability;

d. In this case the p value is .065;

e. From a statistical perspective that means that there is a 6.5% chance that the difference observed in CP rates is due to chance alone;

f. Conversely there is a 93.5% chance that the result (the 40% reduction in CP) is due to a true treatment effect of ACS.”

2012 CCLTFactum 20937, at ¶49 (citing Evidence of Dr. Willan, Respondents’ Compendium, Tab 4, pgs. 43-52).

Although Justice Doherty dissented from the affirmance of the trial court’s judgment, he succumbed to the parties’ misrepresentations about scientific certainty, and their prevalent commission of the transposition fallacy. Goodman v. Viljoen, 2012 ONCA 896 (CanLII) at ¶36 (“Scientists will draw a cause and effect relationship only when a result follows at least 95 per cent of the time. The results reported in the Cochrane analysis fell just below that standard.”), leave appeal den’d, Supreme Court of Canada No. 35230 (July 11, 2013).

The statistical errors on both sides redounded to the benefit of the plaintiffs.


[1] Goodman v. Viljoen, 2011 ONSC 821 (CanLII), aff’d, 2012 ONCA 896 (CanLII), leave appeal den’d, Supreme Court of Canada No. 35230 (July 11, 2013).

[2] Devender Roberts & Stuart R. Dalziel, “Antenatal corticosteroids for accelerating fetal lung maturation for women at risk of preterm birth,” Cochrane Database of Systematic Reviews, at 8, Issue 3, Art. No. CD004454 (2006).

[3] See, e.g., In re Ephedra Prods. Liab. Litig., 393 F.Supp. 2d 181, 191, 193 (S.D.N.Y. 2005) (fallaciously arguing that the use of a critical value of less than 5% significance probability increased the “more likely than not” burden of proof upon a civil litigant); id. at 188, 193.  See also Michael O. Finkelstein, Basic Concepts of Probability and Statistics in the Law 65 (2009) (criticizing the Ephedra decision for confusing posterior probability with significance probability).

[4] I do not have the complete transcript of Dr. Barrett’s testimony, but the following excerpt from April 9, 2010, at page 100, suggests that he helped lead Justice Walters into error: “When you say statistical significance, if you say that something is statistically significance, it means you’re, for the scientific notation, 95 percent sure. That’s the standard we use, 95 percent sure that that result could not have happened by chance. There’s still a 5 percent chance it could. It doesn’t mean for sure, but 95 percent you’re sure that the result you’ve got didn’t happen by chance.”

[5] On appeal, the dissenting judge erroneously accepted Justice Walters’ description of the Cochrane review as having supposedly reported a 40% reduction in CP incidence, 93.5% of the time, from use of ACS. Goodman v. Viljoen, 2012 ONCA 896 (CanLII) at ¶36, leave appeal den’d, Supreme Court of Canada No. 35230 (July 11, 2013).

[6] The Bayesian analysis did not cure the attributability problem with respect to specific causation.