TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

New Jersey Kemps Ovarian Cancer – Talc Cases

September 16th, 2016

Gatekeeping in many courtrooms has been reduced to requiring expert witnesses to swear an oath and testify that they have followed a scientific method. The federal rules of evidence and most state evidence codes require more. The law, in most jurisdictions, requires that judges actively engage with, and inspect, the bases for expert witnesses’ opinions and claims to determine whether expert witnesses who want to be heard in a courtroom have actually, faithfully followed a scientific methodology. In other words, the law requires judges to assess the scientific reasonableness of reliance upon the actual data cited, and to evaluate whether the inferences drawn from the data, to reach a stated conclusion, are valid.

We are getting close to a quarter of a century since the United States Supreme Court outlined the requirements of gatekeeping, in Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579 (1993). Since the Daubert decision, the Supreme Court’s decisional law, and changes in the evidence rules themselves, have clarified the nature and extent of the inquiry judges must conduct into the reasonable reliance upon facts and data, and into the inferential steps leading to a conclusion.  And yet, many judges resist, and offer up excuses and dodges for shirking their gatekeeping obligations.  See generally David E. Bernstein, “The Misbegotten Judicial Resistance to the Daubert Revolution,” 89 Notre Dame L. Rev. 27 (2013).

There is a courtroom in New Jersey, in which gatekeeping is taken seriously from beginning to end.  There is at least one trial judge who encourages and even demands that the expert witnesses appear and explain their methodologies and actually show their methodological compliance.  Judge Johnson first distinguished himself in In re Accutane, No. 271(MCL), 2015 WL 753674, 2015 BL 59277 (N.J.Super. Law Div. Atlantic Cty. Feb. 20, 2015).[1] And more recently, in two ovarian cancer cases, Judge Johnson dusted two expert witnesses, who thought they could claim their turn in the witness chair by virtue of their credentials and some rather glib hand waving. Judge Johnson conducted the New Jersey analogue of a Federal Rule of Evidence 104(a) Daubert hearing, as required by the New Jersey Supreme Court’s decision in Kemp v. The State of New Jersey, 174 N.J. 412 (2002). The result was disastrous for the two expert witnesses who opined that use of talcum powder by women causes ovarian cancer. Carl v. Johnson & Johnson, No. ATL-L-6546-14, 2016 WL 4580145 (N.J. Super. Ct. Law Div., Atl. Cty., Sept. 2, 2016) [cited as Carl].

Judge Johnson obviously had a good epidemiology teacher in Professor Stephen Goodman, who testified in the Accutane case.  Against this standard, it is easy to see how the plaintiffs’ talc expert witnesses, Drs. Daniel Cramer and Dr. Graham Colditz, fell “significantly” short. After presiding over seven days of court hearings, and reviewing extensive party submissions, including the actual studies relied upon by the expert witnesses and the parties, Judge Johnson made no secret of his disappointment with the lack of rigor in the analyses proffered by Cramer and Colditz:

“Throughout these proceedings the court was disappointed in the scope of Plaintiffs’ presentation; it almost appeared as if counsel wished the court to wear blinders. Plaintiffs’ two principal witnesses on causation, Dr. Daniel Cramer and Dr. Graham Colditz, were generally dismissive of anything but epidemiological studies, and within that discipline of scientific investigation they confined their analyses to evidence derived only from small retrospective case-control studies. Both witnesses looked askance upon the three large cohort studies presented by Defendants. As confirmed by studies listed at Appendices A and B, the participants in the three large cohort studies totaled 191,090 while those case-control studies advanced by Plaintiffs’ witnesses, and which were the ones utilized in the two meta-analyses performed by Langseth and Terry, total 18,384 participants. As these proceedings drew to a close, two words reverberated in the court’s thinking: ‘narrow and shallow.’ It was almost as if counsel and the expert witnesses were saying, Look at this, and forget everything else science has to teach us.”

Carl at *12.

Judge Johnson did what for so many judges is unthinkable; he looked behind the curtain put up by highly credentialed Oz expert witnesses in his courtroom. What he found was unexplained, unjustified selectivity in their reliance upon some but not all the available data, and glib conclusions that gloss over significant limits in the resolving power of the available epidemiologic studies. Judge Johnson was particularly unsparing of Graham Colditz, a capable scientist, who deviated from the standards he set for himself in the work he had published in the scientific community:

“Dr. Graham Colditz is a brilliant scientist and a dazzling witness. His vocal inflection, cadence, and adroit use of histrionics are extremely effective. Dr. Colditz’s reputation for his breadth of knowledge about cancer and the esteem in which he is held by his peers is well deserved. Yet, at times, it seemed that issues raised in these proceedings, and the questions posed to him, were a bit mundane for a scientist of his caliber.”

Carl at *15. Dr. Colditz and the plaintiffs’ cause were not helped by Dr. Colditz’s own previous publications of studies and reviews that failed to support any “substantial association between perineal talc use and ovarian cancer risk overall,” and failed to conclude that talc was even a “risk factor” for ovarian cancer.  Carl at *18.

Relative Risk Size

Many courts have fumbled their handling of the issue whether applicable relative risks must exceed two before fact finders may infer specific causation between claimed exposures and specific diseases. There certainly can be causal associations that involve relative risks greater than 1.0, up to and including 2.0. Eliminating validity concerns may be more difficult with such small relative risks, but there is nothing theoretically insuperable about having a causal association based upon them. Judge Johnson apparently saw the diversity of opinions on this relative risk issue, many of which are stridently maintained, and thoroughly fallacious.

Judge Johnson ultimately did not base his decision, with respect to general or specific causation, on the magnitude of relative risk, or the covering Bradford Hill factor of “strength of association.” Dr. Cramer appropriately acknowledged that his meta-analysis result, an odds ratio of 1.29, was “weak,” Carl at *19, and Judge Johnson was critical of Dr. Colditz for failing to address the lack of strength of the association, and for engaging in a constant refrain that the association was “significant,” which speaks to the precision, not the size, of the measurement. Carl at *17.

Aware of the difficulty that New Jersey appellate courts have had with the issues surrounding relative risks greater than two, Judge Johnson was realistic in steering clear of any specific judicial reliance on the small size of the relative risk. His Honor’s prudence is unfortunate, however, because ultimately small relative risks, even assuming that general causation is established, do nothing to support specific causation. Indeed, a relative risk of 1.29 (and odds ratios generally overstate the size of the underlying relative risk) would on a stochastic model support the conclusion that specific causation was less than 50% probable, as the short calculation following the list below illustrates. Critics have pointed out that risk may not be stochastically distributed, which is a great point, except that

(1) plaintiffs often have no idea how the risk, if real, is distributed in the observed sample, and

(2) the upshot of the point is that even for relative risks greater than 2.0, there is no warrant for inferring specific causation in a given case.
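On the stochastic model, the arithmetic is short enough to show. A minimal sketch in Python, assuming only that the relative risk applies uniformly across the exposed group, computes the probability of specific causation as the exposure-attributable fraction:

```python
# Probability of specific causation under a stochastic (uniform) risk model:
# the exposure-attributable fraction, (RR - 1) / RR.
def probability_of_causation(rr: float) -> float:
    return (rr - 1.0) / rr

print(probability_of_causation(1.29))  # ~0.22 -- well below "more likely than not"
print(probability_of_causation(2.0))   # 0.50 -- the source of the relative-risk-of-two rule
```

At an odds ratio of 1.29, taken generously as the relative risk, the probability of specific causation comes to about 22 percent, nowhere near the preponderance threshold.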

Judge Johnson did wade into the relative risk waters by noting that when relative risks were “significantly” less than two, establishing biological plausibility became essential. Carl at *11. This pronouncement is muddled on at least two fronts. First, the relative risk scale is a continuum, and there is no standard reference for what relative risks greater than 1.0 are “significantly” less than 2.0. Presumably, Judge Johnson thought that 1.29 was in the “significantly less than 2.0” range, but he did not say so; nor did he cite a source that supported this assessment. Perhaps he was suggesting that the upper bound of some meta-analysis was less than two. Second, and more troubling, the claim that biological plausibility becomes “essential” in the face of small relative risks is also unsupported. Judge Johnson did not cite any support for this claim, and I am not aware of any. Elsewhere in his opinion, Judge Johnson noted that

“When a scientific rationale doesn’t exist to explain logically the biological mechanism by which an agent causes a disease, courts may consider epidemiologic studies as an alternate [sic] means of proving general causation.”

Carl at *8. So it seems that biological plausibility is not essential after all.

This glitch in the Carl opinion is likely of no lasting consequence, however, because epidemiologists are rarely at a loss to posit some biologically plausible mechanism. As the Dictionary of Epidemiology explains the matter:

“The causal consideration that an observed, potentially causal association between an exposure and a health outcome may plausibly be attributed to causation on the basis of existing biomedical and epidemiological knowledge. On a schematic continuum including possible, plausible, compatible, and coherent, the term plausible is not a demanding or stringent requirement, given the many biological mechanisms that often can be hypothesized to underlie clinical and epidemiological observations; hence, in assessing causality, it may be logically more appropriate to require coherence (biological as well as clinical and epidemiological). Plausibility should hence be used cautiously, since it could impede development or acceptance of new knowledge that does not fit existing biological evidence, pathophysiological reasoning, or other evidence.”

Miquel Porta, et al., eds., “Biological plausibility,” in A Dictionary of Epidemiology at 24 (6th ed. 2014). Most capable epidemiologists have thought up half a dozen biologically plausible mechanisms each morning before they have had their first cup of coffee. But the most compelling reason that this judicial hiccup is inconsequential is that the plaintiffs’ expert witnesses’ postulated mechanism, inflammation, was demonstrably absent in the tissue of the specific plaintiffs. Carl at *13. The glib invocation of “inflammation” would seem bound to fail even the most liberal test of plausibility, given that talc has anti-cancer properties resulting from its ability to inhibit new blood vessel formation, a necessity for solid tumor growth, and given the completely unexplained selectivity of the postulated effect for ovarian tissue, which leaves vaginal, endometrial, and fallopian tissues unaffected. Carl at *13-14. On at least two occasions, the United States Food and Drug Administration rejected “Citizen Petitions” for ovarian cancer warnings on talc products, advanced by the dubious Samuel S. Epstein for the Cancer Prevention Coalition, in large measure because of Epstein’s undue selectivity in citing epidemiologic studies and because a “cogent biological mechanism by which talc might lead to ovarian cancer is lacking… .” Carl at *15, citing Stephen M. Musser, FDA Director, Letter Denying Citizens’ Petition (April 1, 2014).

Large Studies

Judge Johnson quoted the Reference Manual on Scientific Evidence (3d ed. 2011) for his suggestion that establishing causation requires large studies. The quoted language, however, does not really support his suggestion:

“Common sense leads one to believe that a large enough sample of individuals must be studied if the study is to identify a relationship between exposure to an agent and disease that truly exists. Common sense also suggests that by enlarging the sample size (the size of the study group), researchers can form a more accurate conclusion and reduce the chance of random error in their results… With large numbers, the outcome of the test is less likely to be influenced by random error, and the researcher would have greater confidence in the inferences drawn from the data.”

Reference Manual at page 576. The Reference Manual simply calls for studies with “large enough” samples. How large is large enough is a variable that depends upon the magnitude of the association to be detected, the length of follow up, and the base rate or incidence of the outcome of interest. As far as “common sense” goes, the Reference Manual is correct only insofar as larger is better with respect to sampling error. Increasing sample size does nothing to address the internal or external validity of studies, and may lead to erroneous interpretations by allowing results to achieve statistical significance at predetermined levels when the observed associations result from bias or confounding, and not from any underlying relationship between exposure and disease outcome.
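A minimal sketch, using Python and statsmodels, makes the “how large is large enough” point concrete. The baseline risk and relative risk below are illustrative assumptions, not figures from the talc record:

```python
# Sample size needed per group to detect a modest relative risk for a rare outcome,
# at 5% two-sided alpha and 80% power. All inputs are hypothetical.
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

baseline_risk = 0.013                      # assumed risk in the unexposed
rr = 1.3                                   # assumed relative risk to be detected
effect = proportion_effectsize(baseline_risk * rr, baseline_risk)
n_per_group = NormalIndPower().solve_power(effect_size=effect, alpha=0.05, power=0.8)
print(round(n_per_group))                  # on these inputs, several thousand per group
```

Halve the baseline risk, or shrink the relative risk toward 1.0, and the required sample size balloons; that, and not any talismanic head count, is what “large enough” means.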

There is a more disturbing implication in Judge Johnson’s criticism of Graham Colditz for relying upon the smaller number of subjects in the case-control studies than are found in the available cohort studies. Ovarian cancer is a relatively rare cancer (compared with breast and colon cancer), and case-control studies are more efficient than cohort studies at assessing increased risk for a rare outcome. The number of cases in a case-control study represents an implied source population many times larger than the study’s actual roster of subjects. If Judge Johnson had looked at the width of the confidence intervals for the “small” case-control studies, and compared those widths to the interval widths of the cohort studies, he would have seen that “smaller” case-control studies (fewer cases, as well as fewer total subjects) can generate more statistical precision than the larger cohort studies (with many more cohort and control subjects). A more useful comparison would have been between the number of actual ovarian cancer cases in the meta-analyzed case-control studies and the number of actual ovarian cancer cases in the cohort studies. On this comparison, the cohort studies might not fare so well.

The size of the cohort for a rare outcome is thus fairly meaningless in terms of the statistical precision generated.  Smaller case-control studies will likely have much more power, and that should be reflected in the confidence intervals of the respective studies.
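The precision point can be shown with Woolf’s standard error for a log odds ratio, sqrt(1/a + 1/b + 1/c + 1/d), which is dominated by the smallest cells of the 2×2 table, that is, by the cases. The counts in this minimal sketch are hypothetical:

```python
# 95% confidence interval for an odds ratio from a 2x2 table (Woolf's method).
# a, b = exposed and unexposed cases; c, d = exposed and unexposed controls/noncases.
import math

def or_ci(a, b, c, d, z=1.96):
    or_ = (a * d) / (b * c)
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)

print(or_ci(120, 100, 280, 300))        # "small" case-control study: 220 cases, 580 controls
print(or_ci(35, 25, 49_965, 49_975))    # "large" cohort: 100,000 subjects, only 60 cases
```

On these hypothetical counts, the 820-subject case-control study yields a narrower interval than the 100,000-subject cohort, because the cohort’s precision is throttled by its 60 cases.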

The issue, as I understand the talc litigation, is not the size of the case-control versus cohort studies, but rather their analytical resolving power. Case-control studies for this sort of exposure and outcome will be plagued by recall and other biases, as well as by difficulty in selecting the right control group. And the odds ratio will tend to exaggerate the relative risk, in both directions away from 1.0. Cohort studies, with good, pre-morbid exposure assessments, would thus be much more rigorous and accurate in estimating the true rate ratios. In the final analysis, Judge Johnson was correct to be critical of Graham Colditz for dismissing the cohort studies, but his rationale for this criticism was, in a few places, confused and confusing. There was nothing subtle about the analytical gaps, ipse dixits, and cherry picking shown by these plaintiffs’ expert witnesses.


[1] See “Johnson of Accutane – Keeping the Gate in the Garden State” (Mar. 28, 2015).

Judge Bernstein’s Criticism of Rule 703 of the Federal Rules of Evidence

August 30th, 2016

Federal Rule of Evidence 703 addresses the bases of expert witness opinions, and it is a mess. The drafting of this Rule is particularly sloppy. The Rule tells us, among other things, that:

“[i]f experts in the particular field would reasonably rely on those kinds of facts or data in forming an opinion on the subject, they need not be admissible for the opinion to be admitted.”

This sentence of the Rule has a simple grammatical and logical structure:

If A, then B;

where A contains the concept of reasonable reliance, and B tells us the consequence that the relied upon material need not be itself admissible for the opinion to be admissible.

But what happens if the expert witness has not reasonably relied upon certain facts or data; i.e., ~A?  The conditional statement as given does not describe the outcome in this situation. We are not told what happens when an expert witness’s reliance in the particular field is unreasonable.  ~A does not necessarily imply ~B. Perhaps the drafters meant to write:

B if and only if A.

But the drafters did not give us the above rule, and they have left judges and lawyers to make sense of their poor grammar and bad logic.
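The gap is visible in a four-row truth table. A minimal Python sketch enumerates it, with A standing for reasonable reliance and B for the consequence that the reliance material need not be admissible:

```python
# The material conditional A -> B is true whenever A is false, regardless of B;
# the biconditional A <-> B is not. The Rule as drafted gives only the conditional.
import itertools

for a, b in itertools.product([True, False], repeat=2):
    conditional = (not a) or b       # A -> B, the Rule as written
    biconditional = (a == b)         # B if and only if A
    print(f"A={a!s:<6} B={b!s:<6} A->B={conditional!s:<6} A<->B={biconditional}")
```

The two rows with A false both satisfy the conditional, which is exactly why the Rule, as written, says nothing about what follows from unreasonable reliance.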

And what happens when the reliance material is independently admissible, say as a business record, a government report, or a first-person observation? May an expert witness rely upon admissible facts or data, even when a reasonable expert would not do so? Again, it seems that the drafters were trying to limit expert witness reliance to some rule of reason, but by tying reliance to the admissibility of the reliance material, they managed to conflate two separate notions.

And why is reliance judged by the expert witness’s particular field? Fields of study and areas of science and technology overlap. In some fields, it is commonplace for putative experts to rely upon materials that would not be given the time of day in other fields. Should we judge the reasonableness of homeopathic healthcare providers’ reliance by the standards of reasonableness in homeopathy, such as it is, or should we judge it by the standards of medical science? The answer to this rhetorical question seems obvious, but the drafters of Rule 703 introduced a Balkanized concept of science and technology with the notion of the expert witness’s “particular field.” The standard of Rule 702 is “knowledge” and “helpfulness,” neither of which concepts is constrained by “particular fields.”

And then Rule 703 leaves us in the dark about how to handle an expert witness’s reliance upon inadmissible facts or data. According to the Rule, “the proponent of the opinion may disclose [the inadmissible facts or data] to the jury only if their probative value in helping the jury evaluate the opinion substantially outweighs their prejudicial effect.” And yet, disclosing inadmissible facts or data would always be highly prejudicial because they represent facts and data that the jury is forbidden to consider in reaching its verdict. Nonetheless, trial judges routinely tell juries that an expert witness’s opinion is no better than the facts and data on which the opinion is based. If the facts and data are inadmissible, the jury must disregard them in its fact finding; and if an expert witness’s opinion is based upon facts and data that are to be disregarded, then the expert witness’s opinion must be disregarded as well. Or so common sense and respect for the trial’s truth-finding function would suggest.

The drafters of Rule 703 do not shoulder all the blame for the illogic and bad results of the rule. The judicial interpretation of Rule 703 has been sloppy, as well. The Rule’s “plain language” tells us that “[a]n expert may base an opinion on facts or data in the case that the expert has been made aware of or personally observed.”  So expert witnesses should be arriving at their opinions through reliance upon facts and data, but many expert witnesses rely upon others’ opinions, and most courts seem to be fine with such reliance.  And the reliance is often blind, as when medical clinicians rely upon epidemiologic opinions, which in turn are based upon data from studies that the clinicians themselves are incompetent to interpret and critique.

The problem of reliance, as contained within Rule 703, is deep and pervasive in modern civil and criminal trials. In the trial of health effect claims, expert witnesses rely upon epidemiologic and toxicologic studies that contain multiple layers of hearsay, often with little or no validation of the trustworthiness of many of those factual layers. The inferential methodologies are often obscure, even to the expert witnesses, and trial counsel are frequently untrained and ill prepared to expose the ignorance and mistakes of the expert witnesses.

Back in February 2008, I presented at an ALI-ABA conference on expert witness evidence about the problems of Rule 703.[1] I laid out a critique of Rule 703, which showed that the Rule permitted expert witnesses to rely upon “castles in the air.” A distinguished panel of law professors and judges seemed to agree; at least no one offered a defense of Rule 703.

Shortly after I presented at the ALI-ABA conference, Professor Julie E. Seaman published an insightful law review article in which she framed the problems of Rule 703 as constitutional issues.[2] Encouraged by Professor Seaman’s work, I wrote up my comments on Rule 703 for an ABA publication,[3] and I have updated those comments in the light of subsequent judicial opinions,[4] as well as the failure of the Third Edition of the Reference Manual on Scientific Evidence to address the problems.[5]

===================

Judge Mark I. Bernstein is a trial court judge for the Philadelphia County Court of Common Pleas. I never tried a case before Judge Bernstein, who has announced his plans to leave the Philadelphia bench after 29 years of service,[6] but I had heard from some lawyers (on both sides of the bar) that he was a “pro-plaintiff” judge. Some years ago, I sat next to him on a CLE panel on trial evidence, at which he disparaged judicial gatekeeping,[7] which seemed to support his reputation. The reality seems to be more complex. Judge Bernstein has shown that he can be a critical consumer of complex scientific evidence, and an able gatekeeper under Pennsylvania’s crazy quilt-work pattern of expert witness law. For example, in a hotly contested birth defects case involving sertraline, Judge Bernstein held a pre-trial evidentiary hearing and looked carefully at the proffered testimony of Michael D. Freeman, a chiropractor and self-styled “forensic epidemiologist,” and Robert Cabrera, a teratologist. Applying a robust interpretation of Pennsylvania’s Frye rule, Judge Bernstein excluded Freeman’s and Cabrera’s proffered testimony, and entered summary judgment for defendant Pfizer, Inc. Porter v. Smithkline Beecham Corp., 2016 WL 614572 (Phila. Cty. Ct. Com. Pl.). See “Demonstration of Frye Gatekeeping in Pennsylvania Birth Defects Case” (Oct. 6, 2015).

And Judge Bernstein has shown that he is one of the few judges who takes seriously Rule 705’s requirement that expert witnesses produce their relied-upon facts and data at trial, on cross-examination. In Hansen v. Wyeth, Inc., Dr. Harris Busch, a frequent testifier for plaintiffs, glibly opined about the defendant’s negligence. On cross-examination, he adverted to the volumes of depositions and documents he had reviewed, but when defense counsel pressed, the witness was unable to produce and show exactly what he had reviewed. After the jury returned a verdict for the plaintiff, Judge Bernstein set the verdict aside because of the expert witness’s failure to comply with Rule 705. Hansen v. Wyeth, Inc., 72 Pa. D. & C. 4th 225, 2005 WL 1114512, at *13, *19 (Phila. Ct. Common Pleas 2005) (granting new trial on post-trial motion), 77 Pa. D. & C. 4th 501, 2005 WL 3068256 (Phila. Ct. Common Pleas 2005) (opinion in support of affirmance after notice of appeal).

In a recent law review article, Judge Bernstein has issued a withering critique of Rule 703. See Hon. Mark I. Bernstein, “Jury Evaluation of Expert Testimony Under the Federal Rules,” 7 Drexel L. Rev. 239 (2015). Judge Bernstein is clearly dissatisfied with the current approach to expert witnesses in federal court, and he lays almost exclusive blame on Rule 703 and its permission to hide the crucial facts, data, and inferential processes from the jury. In his law review article, Judge Bernstein characterizes Rules 703 and 705 as empowering “the expert to hide personal credibility judgments, to quietly draw conclusions, to individually decide what is proper evidence, and worst of all, to offer opinions without even telling the jury the facts assumed.” Id. at 264. Judge Bernstein cautions that the subversion of the factual predicates for expert witnesses’ opinions under Rule 703 has significant, untoward consequences for the court system. Not only are lawyers allowed to hire professional advocates as expert witnesses, but the availability of such professional witnesses permits and encourages the filing of unnecessary litigation. Id. at 286. Hear, hear.

Rule 703’s practical consequence of eliminating the hypothetical question has enabled the expert witness qua advocate, and has up-regulated the trial as a contest of opinions and opiners rather than as an adversarial procedure that is designed to get at the truth. Id. at 266-67. Without having access to real, admissible facts and data, the jury is forced to rely upon proxies for the truth: qualifications, demeanor, and courtroom poise, all of which fail the jury and the system in the end.

As a veteran trial judge, Judge Bernstein makes a persuasive case that the non-disclosure permitted under Rule 703 is not really curable under Rule 705. Id. at 288.  If the cross-examination inquiry into reliance material results in the disclosure of inadmissible facts, then judges and the lawyers must deal with the charade of a judicial instruction that the identification of the inadmissible facts is somehow “not for the truth.” Judge Bernstein argues, as have many others, that this “not for the truth” business is an untenable fiction, either not understood or ignored by jurors.

Opposing counsel, of course, may ask for an elucidation of the facts and data relied upon, but when they consider the time and difficulty involved in cross-examining highly experienced, professional witnesses, opposing counsel usually choose to traverse the adverse opinion by presenting their own expert witness’s opinion rather than getting into nettlesome details and risking looking foolish in front of the jury, or even worse, allowing the highly trained adverse expert witness to run off at the mouth.

As powerful as Judge Bernstein’s critique of Rule 703 is, his analysis misses some important points. Lawyers and judges have other motives for not wanting to elicit underlying facts and data: they do not want to “get into the weeds,” and they want to avoid technical questions of valid inference and quality of data. Yet sometimes the truth is in the weeds. Their avoidance of addressing the nature of inference, as well as facts and data, often serves to make gatekeeping a sham.

And then there is the problem that arises from the lack of time, interest, and competence among judges and jurors to understand the technical details of the facts and data, and inferences therefrom, which underlie complex factual disputes in contemporary trials. Cross-examination is reduced to the attempt to elicit “sound bites” and “cheap shots,” which can be used in closing argument. This approach is common on both sides of the bar, in trials before judges and juries, and even at so-called Daubert hearings. See David E. Bernstein & Eric G. Lasker, “Defending Daubert: It’s Time to Amend Federal Rule of Evidence 702,” 57 William & Mary L. Rev. 1, 32 (2015) (“Rule 703 is frequently ignored in Daubert analyses”).

The Rule 702 and 703 pretrial hearing is an opportunity to address the highly technical validity questions, but even then, the process is doomed to failure unless trial judges make adequate time and adopt an attitude of real intellectual curiosity to permit a proper exploration of the evidentiary issues. Trial lawyers often discover that a full exploration is technical and tedious, and that it pisses off the trial judge. As much as judges dislike having to serve as gatekeepers of expert witness opinion testimony, they dislike even more having to assess the reasonableness of individual expert witness’s reliance upon facts and data, especially when this inquiry requires a deep exploration of the methods and materials of each relied upon study.

Bernstein’s critique also ignores a point in favor of something like Rule 703: some facts and data will never be independently admissible. Epidemiologic studies, with their multiple layers of hearsay, come to mind.

Judge Bernstein, as a reformer, is wrong to suggest that the problem lies solely in hiding the facts and data from the jury. Rules 702 and 703 march together, and there are problems with both that require serious attention. See David E. Bernstein & Eric G. Lasker, “Defending Daubert: It’s Time to Amend Federal Rule of Evidence 702,” 57 William & Mary L. Rev. 1 (2015); see also “On Amending Rule 702 of the Federal Rules of Evidence” (Oct. 17, 2015).

And we should remember that the problem is not solely with juries and their need to see the underlying facts and data. Judges try cases too, and can butcher scientific inference without any help from a lay jury. Then there is the problem of relied-upon opinions, discussed above. And then there is the problem of unreasonable reliance of the sort that juries cannot discern even when they see the underlying, relied-upon facts and data.


[1] Schachtman, “Rule 703 – The Problem Child of Article VII”; and “The Effective Presentation of Defense Expert Witnesses and Cross-examination of Plaintiffs’ Expert Witnesses”; at the ALI-ABA Course on Opinion and Expert Witness Testimony in State and Federal Courts (February 14-15, 2008).

[2] See Julie E. Seaman, “Triangulating Testimonial Hearsay: The Constitutional Boundaries of Expert Opinion Testimony,” 96 Georgetown L.J. 827 (2008).

[3]  Nathan A. Schachtman, “Rule of Evidence 703—Problem Child of Article VII,” 17 Proof 3 (Spring 2009).

[4] “Rule of Evidence 703 — Problem Child of Article VII” (Sept. 19, 2011).

[5] See “Giving Rule 703 the Cold Shoulder” (May 12, 2012); “New Reference Manual on Scientific Evidence Short Shrifts Rule 703” (Oct. 16, 2011).

[6] Max Mitchell, “Bernstein Announces Plan to Step Down as Judge,” The Legal Intelligencer (July 29, 2016).

[7] See Schachtman, “Court-Appointed Expert Witnesses,” for Mealey’s Judges & Lawyers in Complex Litigation, Class Actions, Mass Torts, MDL and the Monster Case Conference, in West Palm Beach, Florida (November 8-9, 1999). I don’t recall Judge Bernstein’s exact topic, but I remember he criticized the Pennsylvania Supreme Court’s decision in Blum v. Merrell Dow Pharmaceuticals, 534 Pa. 97, 626 A.2d 537 (1993), which reversed a judgment for plaintiffs, and adopted what Judge Bernstein derided as a blending of Frye and Daubert, which he called Fraubert. Judge Bernstein had presided over the Blum trial, which resulted in the verdict for plaintiffs.

Art Historian Expert Testimony

August 15th, 2016

Art appraisal and authentication is sometimes held out as a non-technical and non-scientific area of expertise, and as such, not subject to rigorous testing.[1] But to what extent is this simply excuse mongering for an immature field of study? The law has seen way too much of this sort of rationalization in criminal forensic studies.[2] If an entire field of learning suffers from unreliability because of its reliance upon subjective methodologies, lack of rigor, inability or unwillingness to use measurements, failure to eliminate biases through blinding, and the like, then do expert witnesses in this field receive a “pass” under Rule 702, simply because they are doing reasonably well compared with their professional colleagues?

In the movie Who the Fuck is Jackson Pollock, the late Thomas Hoving was interviewed about the authenticity of a painting claimed to have been “painted” by Jackson Pollock. Hoving “authoritatively,” and with his typical flamboyance, averred that the disputed painting was not a Pollock because the work “did not sing to me like a Pollock.” Hoving did not, however, attempt to record the notes he heard; nor did Hoving speak to what key Pollock usually painted in.

In a recent case of defamation and tortious interference with prospective business benefit, a plaintiff sued over the disparagement of a painting’s authenticity and provenance. As a result of the defendants’ statements that the painting at issue was not created by Peter M. Doig, auction houses refused to sell the painting held by plaintiff. In litigation, the plaintiff proffered an expert witness who opined that the painting was, in fact, created by Doig. The defendants challenged plaintiff’s expert witness as not reliable or relevant under Federal Rule of Evidence 702. Fletcher v. Doig, 13 C 3270, 2016 U.S. Dist. LEXIS 95081 (N.D. Ill. July 21, 2016).

Peter Bartlow, the plaintiff’s expert witness on authenticity, was short on academic credentials. He had gone to college, and finished only one year of graduate study in art history. Bartlow did, however, have 40 years of experience in appraisal and authentication. Fletcher at *3-4. Beyond qualifications, the defendants complained that Bartlow’s method was

(1) invented for the case,

(2) was too “generic” to establish authenticity, and

(3) failed to show that any claimed generic feature was unique to the work of the artist in question, Peter M. Doig.

The trial court rebuffed this challenge by noting that Peter Bartlow did not have to be an expert specifically in Doig’s work. Fletcher at *7. Similarly, the trial court rejected the defendants’ suggestion that the disputed work must exhibit “unique” features of Doig’s oeuvre. Bartlow had made a legally sufficient case for his opinions based upon a qualitative analysis of 45 acknowledged works, using specific qualitative features of 11 known works. Id. at *10. Specifically, Bartlow compared types of paint, similarities in styles, shapes and positioning, and “repeated lineatures,” by superimposing lines from known paintings onto the questioned one. Id. With respect to the last of these approaches, the trial court accepted Bartlow’s explanation that superimposing lines to show similarity was simply a refinement of methods commonly used by art appraisers.

By comparison with Thomas Hoving’s subjective auditory methodology, as explained in Who the Fuck, Bartlow’s approach was positively brilliant, even if the challenged methodologies left much to be desired. For instance, Bartlow compared one disputed painting with 45 or so paintings of accepted provenance. No one tested Bartlow’s ability, blinded to provenance, to identify true and false positives of Doig paintings. See “The Eleventh Circuit Confuses Adversarial and Methodological Bias, Manifestly Erroneously” (June 6, 2015); see generally Christopher Robertson & Aaron Kesselheim, Blinding as a Solution to Bias: Strengthening Biomedical Science, Forensic Science, and Law (2016).

Interestingly, the Rule 702 challenges in Fletcher were in a case slated to be tried by the bench. The trial court thus toasted the chestnut that trial courts have even greater latitude in admitting expert witness opinion testimony in bench trials, in which “the usual concerns of [Rule 702] – keeping unreliable testimony from the jury – are not present.” Fletcher at *3 (citing Metavante Corp. v. Emigrant Savings Bank, 619 F.3d 648, 670 (7th Cir. 2010)). Citing Seventh Circuit precedent, the trial court, in Fletcher, asserted that the need to rule on admissibility before trial was lessened in a bench trial. Id. (citing In re Salem, 465 F.3d 767, 777 (7th Cir. 2006)). The courts that have taken this position have generally failed to explain why the standard for granting or denying a Rule 702 challenge should be different in a bench trial. Clearly, a bench trial can be just as much a waste of time, money, and energy as a jury trial. Even more clearly, judges can be, and are, snookered by misleading expert witness opinions, and they are also susceptible to their own cognitive biases and the false allure of unreliable opinion testimony, built upon invalid inferences. Men and women do not necessarily see more clearly when wearing black robes, but they can achieve some measure of objectivity by explaining and justifying their gatekeeping opinions in writing, subject to public review, comment, and criticism.


[1] See, e.g., Lees v. Carthage College, 714 F.3d 516, 525 (7th Cir. 2013) (holding that an expert witness’s testimony on premises security involved non-scientific expertise and knowledge that did “not easily admit of rigorous testing and replication”).

[2] See, e.g., National Research Council, Strengthening Forensic Science in the United States: A Path Forward (2009).

High, Low and Right-Sided Colonics – Ridding the Courts of Junk Science

July 16th, 2016

Not surprisingly, many of Selikoff’s litigation- and regulatory-driven opinions have not fared well, such as the notions that asbestos causes gastrointestinal cancers and that all asbestos minerals have equal potential and strength to cause mesothelioma. Forty years after Selikoff testified in litigation that occupational asbestos exposure caused an insulator’s colorectal cancer, the Institute of Medicine reviewed the extant evidence and announced that the evidence was “suggestive but not sufficient to infer a causal relationship between asbestos exposure and pharyngeal, stomach, and colorectal cancers.” Jonathan Samet, et al., eds., Institute of Medicine Review of Asbestos: Selected Cancers (2006).[1] The Institute of Medicine’s monograph has fostered a more circumspect approach in some of the federal agencies. The National Cancer Institute’s website now proclaims that the evidence is insufficient to permit a conclusion that asbestos causes non-pulmonary cancers of the gastrointestinal tract and throat.[2]

As discussed elsewhere, Selikoff testified as early as 1966 that asbestos causes colorectal cancer, in advance of any meaningful evidence to support such an opinion, and then he, and his protégées, worked hard to lace the scientific literature with their pronouncements on the subject, without disclosing their financial, political, and positional conflicts of interest.[3]

With the plaintiffs’ firm’s (Lanier’s) zealous pursuit of bias information from the University of Idaho, in the LoGiudice case, what are we to make of Selikoff’s and his minions’ dubious ethics of failed disclosure? Do Selikoff and Mount Sinai receive a pass because their asbestos research predated the discovery of ethics? The “Lobby” (as the late Douglas Liddell called Selikoff and his associates)[4] has seriously distorted truth-finding in any number of litigations, but nowhere are the Lobby’s distortions more at work than in lawsuits for claimed asbestos injuries. Here the conflicts of interest truly have had a deleterious effect on the quality of civil justice. As we saw with the Selikoff exceptionalism displayed by the New York Supreme Court in reviewing third-party subpoenas,[5] some courts seem bent on ignoring evidence-based analyses in favor of Mount Sinai faith-based initiatives.

Current Asbestos Litigation Claims Involving Colorectal Cancer

Although Selikoff has passed from the litigation scene, his trainees and followers have lined up at the courthouse door to propagate his opinions. Even before the IOM’s 2006 monograph, more sophisticated epidemiologists consistently rejected the Selikoff conclusion on asbestos and colon cancer, which grew out of Selikoff’s litigation activities.[6] And yet, the minions keep coming.

In the pre-Daubert era, defendants lacked an evidentiary challenge to Selikoff’s opinion that asbestos caused colorectal cancer. Instead of contesting the legal validity or sufficiency of the plaintiffs’ general causation claims, defendants often focused on the unreliability of the causal attribution for the specific claimant’s disease. These early cases are often misunderstood to be challenges to expert witnesses’ opinions about whether asbestos causes colorectal cancer; they were not.[7]

Of course, after the IOM’s 2006 monograph, active expert witness gatekeeping should eliminate asbestos gastrointestinal cancer claims, but sadly they persist. Perhaps courts simply consider the issue “grandfathered” in from the era in which judicial scrutiny of expert witness opinion testimony was restricted. Perhaps defense counsel are failing to frame and support their challenges properly. Perhaps both.

Arthur Frank Jumps the Gate

Although ostensibly a “Frye” state, Pennsylvania judges have, when moved by the occasion, applied a fairly thorough analysis of proffered expert witness opinions.[8] On occasion, Pennsylvania judges have excluded unreliably or invalidly supported causation opinions under the Pennsylvania version of the Frye standard. A recent case, however, tried before a Workers’ Compensation Judge (WCJ), and appealed to the Commonwealth Court, shows how inconsistent the application of the standard can be, especially when Selikoff’s legacy views are at issue.

Michael Piatetsky, an architect, died of colorectal cancer. Before his death, he and his wife filed a workers’ compensation claim, in which they alleged that his disease was caused by his workplace exposure to asbestos. Garrison Architects v. Workers’ Comp. Appeal Bd. (Piatetsky), No. 1095 C.D. 2015, Pa. Cmwlth. Ct., 2016 Pa. Commw. Unpub. LEXIS 72 (Jan. 22, 2016) [cited as Piatetsky]. As an architect, Mr. Piatetsky was almost certainly knowledgeable about asbestos hazards generally. Despite his knowledge, Piatetsky eschewed personal protective equipment even when working at dusty work sites well marked with warnings. Although Piatetsky engaged in culpable conduct, the employer in workers’ compensation proceedings does not have ordinary negligence defenses, such as contributory negligence or assumption of risk.

In litigating the Piatetskys’ claim, the employer dragged its feet and failed to name an expert witness. Eventually, after many requests for continuances, the Workers’ Compensation Judge barred the employer from presenting an expert witness. With the record closed, and without a defense expert witness, the Judge understandably ruled in favor of the claimant.

The employer, sans expert witness, had to confront claimant’s expert witness, Arthur L. Frank, a minion of Selikoff and a frequent testifier in asbestos and many other litigations. Frank, of course, opined that asbestos causes colon cancer and that it caused Mr. Piatetsky’s cancer. Mr. Piatetsky’s colon cancer originated on the right side of his colon. Dr. Frank thus emphasized that asbestos causes colon cancer in all locations, but especially on the right side in view of one study’s having concluded “that colon cancer caused by asbestos is more likely to begin on the right side.” Piatetsky at *6.

On appeal, the employer sought relief on several issues, but the only one of interest here is the employer’s argument “that Claimant’s medical expert based his opinion on flimsy medical studies.” Piatetsky at *10. The employer’s appeal seemed to go off the rails with the insistence that the Claimant’s medical opinion was invalid because Dr. Frank relied upon studies not involving architects. Piatetsky at *14. The Commonwealth Court was able to point to testimony, although probably exaggerated, which suggested that Mr. Piatetsky had been heavily exposed, at least at times, and thus his exposure was similar to that in the studies cited by Frank.

With respect to Frank’s right-sided (non-sinister) opinion, the Commonwealth Court framed the employer’s issue as a contention that Dr. Frank’s opinion on the asbestos-relatedness of right-sided colon cancer was “not universally accepted.” But universal acceptance has never been the test or standard for the rejection or acceptance of expert witness opinion testimony in any state.  Either the employer badly framed its appeal, or the appellate court badly misstated the employer’s ground for relief. In any event, the Commonwealth Court never addressed the relevant legal standard in its discussion.

The Claimant argued that the hearing Judge had found that Frank’s opinion was based on “numerous studies.” Piatetsky at *15. None of these studies is cited to permit the public to assess the argument and the Court’s acceptance of it. The appellate court made inappropriately short work of this appellate issue by confusing general and specific causation, and invoking Mr. Piatetsky’s age, his lack of family history of colon cancer, Frank’s review of medical records, testimony, and work records, as warranting Frank’s causal inference. None of these factors is relevant to general causation, and none is probative of the specific causation claim.  Many if not most colon cancers have no identifiable risk factor, and Dr. Frank had no way to rule out baseline risk, even if there were an increased risk from asbestos exposure. Piatetsky at *16. With no defense expert witness, the employer certainly had a difficult appellate journey. It is hard for the reader of the Commonwealth Court’s opinion to determine whether the case was poorly defended, poorly briefed on appeal, or poorly described by the appellate judges.

In any event, the right-sided ruse of Arthur Frank went unreprimanded. Intellectual due process might have led the appellate court to cite the article at issue, but it failed to do so. It is interesting and curious to see how the appellate court gave a detailed recitation of the controverted facts of asbestos exposure, while remaining glib in its description of the scientific issues and evidence. Nonetheless, the article vaguely referenced, which went uncited by the appellate court, was no doubt the paper: K. Jakobsson, M. Albin & L. Hagmar, “Asbestos, cement, and cancer in the right part of the colon,” 51 Occup. & Envt’l Med. 95 (1994).

These authors observed 24 right-sided colon cancers against 9.63 expected, and they concluded that there was an increased rate of right-sided colon cancer in the asbestos cement plant workers. Notably, the authors’ reference population had a curiously low rate of right-sided colon cancer. For left-sided colon cancer, the authors expected 9.3 cases but observed only 5 in the asbestos-cement cohort. Contrary to Frank’s suggestion, the authors did not conclude that right-sided colon cancers had been caused by asbestos; indeed, the authors never reached any conclusion whether asbestos causes colorectal cancer under any circumstances. In their discussion, these authors noted that “[d]espite numerous epidemiological and experimental studies, there is no consensus concerning exposure to asbestos and risks of gastrointestinal cancer.” Jakobsson at 99; see also Dorsett D. Smith, “Does Asbestos Cause Additional Malignancies Other than Lung Cancer,” chap. 11, in Dorsett D. Smith, The Health Effects of Asbestos: An Evidence-based Approach 143, 154 (2015). Even this casual description of the Jakobsson study will alert the learned reader to the multiple comparisons that went on in this cohort study, with outcomes reported for left, right, rectum, and multiple sites, without any adjustment to the level of significance. Risk of right-sided colon cancer was not a pre-specified outcome of the study, and the results of subsequent studies have never corroborated this small cohort study.
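For readers who want to check the arithmetic, a minimal sketch with an exact Poisson test reproduces the observed-versus-expected contrast recited above; the counts are those reported for the Jakobsson study:

```python
# Standardized incidence ratio (observed/expected) with a crude two-sided
# exact Poisson p-value (doubling the one-sided tail, capped at 1).
from scipy.stats import poisson

def sir_and_p(observed: int, expected: float):
    ratio = observed / expected
    if observed > expected:
        tail = poisson.sf(observed - 1, expected)   # P(X >= observed)
    else:
        tail = poisson.cdf(observed, expected)      # P(X <= observed)
    return ratio, min(1.0, 2 * tail)

print(sir_and_p(24, 9.63))   # right-sided colon: ratio ~2.5
print(sir_and_p(5, 9.3))     # left-sided colon: ratio ~0.5
```

The isolated right-sided elevation looks far less impressive once one recalls that it was only one of several site-specific contrasts, none adjusted for multiplicity.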

A sane understanding of subgroup analyses is important to judicial gatekeeping. See “Sub-group Analyses in Epidemiologic Studies — Dangers of Statistical Significance as a Bright-Line Test” (May 17, 2011). The chapter on statistics in the Reference Manual on Scientific Evidence (3d ed. 2011) has some prudent caveats for multiple comparisons and testing, but neither the chapter on epidemiology, nor the chapter on clinical medicine,[9] provides any sense of the dangers of over-interpreting subgroup analyses.

Some commentators have argued that we must not dissuade scientists from doing subgroup analyses, but the issue is not whether they should be done, but how they should be interpreted.[10] Certainly many authors have called for caution in how subgroup analyses are interpreted,[11] but apparently expert witness Arthur Frank did not receive the memo before testifying in the Piatetsky case, and the Commonwealth Court did not receive it before deciding the appeal.
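A minimal simulation shows why the caution is warranted: test several site-specific subgroups at the 0.05 level under a true null, and a sizable share of studies will still report at least one “significant” subgroup. The numbers here are illustrative assumptions:

```python
# Simulate many null studies, each testing four subgroup sites (e.g., left, right,
# rectum, multiple sites) with a normal approximation to the Poisson counts.
import numpy as np

rng = np.random.default_rng(0)
n_studies, n_subgroups, expected = 10_000, 4, 10.0

observed = rng.poisson(expected, size=(n_studies, n_subgroups))
z = (observed - expected) / np.sqrt(expected)
share = (np.abs(z) > 1.96).any(axis=1).mean()
print(f"Null studies with >= 1 'significant' subgroup: {share:.0%}")
```

With no true effect anywhere, roughly one simulated study in seven turns up a “significant” subgroup on these assumptions, which is why an unadjusted, non-pre-specified, right-sided finding deserved skepticism rather than emphasis.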


[1] As good as the IOM process can be on occasion, even its reviews are sometimes less than thorough. The asbestos monograph gave no consideration to alcohol in the causation of laryngeal cancer, and no consideration to smoking in its analysis of asbestos and colorectal cancer. See, e.g., Peter S. Liang, Ting-Yi Chen & Edward Giovannucci, “Cigarette smoking and colorectal cancer incidence and mortality: Systematic review and meta-analysis,” 124 Internat’l J. Cancer 2406, 2410 (2009) (“Our results indicate that both past and current smokers have an increased risk of [colorectal cancer] incidence and mortality. Significantly increased risk was found for current smokers in terms of mortality (RR = 1.40), former smokers in terms of incidence (RR = 1.25)”); Lindsay M. Hannan, Eric J. Jacobs and Michael J. Thun, “The Association between Cigarette Smoking and Risk of Colorectal Cancer in a Large Prospective Cohort from the United States,” 18 Cancer Epidemiol., Biomarkers & Prevention 3362 (2009).

[2] National Cancer Institute, “Asbestos Exposure and Cancer Risk” (last visited July 10, 2016) (“In addition to lung cancer and mesothelioma, some studies have suggested an association between asbestos exposure and gastrointestinal and colorectal cancers, as well as an elevated risk for cancers of the throat, kidney, esophagus, and gallbladder (3, 4). However, the evidence is inconclusive.”).

[3] Compare “Health Hazard Progress Notes: Compensation Advance Made in New York State,” 16(5) Asbestos Worker 13 (May 1966) (thanking Selikoff for testifying in a colon cancer case) with Irving J. Selikoff, “Epidemiology of gastrointestinal cancer,” 9 Envt’l Health Persp. 299 (1974) (arguing for his causal conclusion between asbestos and all gastrointestinal cancers, with no acknowledgment of his role in litigation or his funding from the asbestos insulators’ union).

[4] F.D.K. Liddell, “Magic, Menace, Myth and Malice,” 41 Ann. Occup. Hyg. 3, 3 (1997); see also “The Lobby Lives – Lobbyists Attack IARC for Conducting Scientific Research” (Feb. 19, 2013).

[5] See “The LoGiudice Inquisitorial Subpoena & Its Antecedents in N.Y. Law” (July 14, 2016).

[6] See, e.g., Richard Doll & Julian Peto, Asbestos: Effects on health of exposure to asbestos 8 (1985) (“In particular, there are no grounds for believing that gastrointestinal cancers in general are peculiarly likely to be caused by asbestos exposure.”).

[7] See “Landrigan v. The Celotex Corporation, Revisited” (June 4, 2013); Landrigan v. The Celotex Corp., 127 N.J. 404, 605 A.2d 1079 (1992); Caterinicchio v. Pittsburgh Corning Corp., 127 N.J. 428, 605 A.2d 1092 (1992). In both Landrigan and Caterinicchio, there had been no challenge to the reliability or validity of the plaintiffs’ expert witnesses’ general causation opinions. Instead, the trial courts entered judgments, assuming arguendo that asbestos can cause colorectal cancer (a dubious proposition), on the ground that the low relative risk cited by plaintiffs’ expert witnesses (about 1.5) was factually insufficient to support a verdict for plaintiffs on specific causation. Indeed, the relative risk suggested that the odds were about 2 to 1 in defendants’ favor that the plaintiffs’ colorectal cancers were not caused by asbestos.

[8] See, e.g., Porter v. Smithkline Beecham Corp., Sept. Term 2007, No. 03275, 2016 WL 614572 (Phila. Cty. Com. Pleas, Oct. 5, 2015); “Demonstration of Frye Gatekeeping in Pennsylvania Birth Defects Case” (Oct. 6, 2015).

[9] John B. Wong, Lawrence O. Gostin & Oscar A. Cabrera, “Reference Guide on Medical Testimony,” in Reference Manual on Scientific Evidence 687 (3d ed. 2011).

[10] See, e.g., Phillip I. Good & James W. Hardin, Common Errors in Statistics (and How to Avoid Them) 13 (2003) (proclaiming a scientists’ Bill of Rights under which they should be allowed to conduct subgroup analyses); Ralph I. Horwitz, Burton H. Singer, Robert W. Makuch, Catherine M. Viscoli, “Clinical versus statistical considerations in the design and analysis of clinical research,” 51 J. Clin. Epidemiol. 305 (1998) (arguing for the value of subgroup analyses). In United States v. Harkonen, the federal government prosecuted a scientist for fraud in sending a telecopy that described a clinical trial as “demonstrating” a benefit in a subgroup of a secondary trial outcome.  Remarkably, in the Harkonen case, the author, and criminal defendant, was describing a result in a pre-specified outcome, in a plausible but post-hoc subgroup, which result accorded with prior clinical trials and experimental evidence. United States v. Harkonen (D. Calif. 2009); United States v. Harkonen (D. Calif. 2010) (post-trial motions), aff’d, 510 F. App’x 633 (9th Cir. 2013) (unpublished), cert. denied, 134 S. Ct. 824, ___ U.S. ___ (2014); Brief by Scientists And Academics as Amici Curiae In Support Of Petitioner, On Petition For Writ Of Certiorari in the Supreme Court of the United States, W. Scott Harkonen v. United States, No. 13-180 (filed Sept. 4, 2013).

[11] See “Sub-group Analyses in Epidemiologic Studies — Dangers of Statistical Significance as a Bright-Line Test” (May 17, 2011) (collecting commentary); see also Lemuel A. Moyé, Statistical Reasoning in Medicine: The Intuitive P-Value Primer 206, 225 (2d ed. 2006) (noting that subgroup analyses are often misleading: “Fishing expeditions for significance commonly catch only the junk of sampling error”); Victor M. Montori, Roman Jaeschke, Holger J. Schünemann, Mohit Bhandari, Jan L. Brozek, P. J. Devereaux & Gordon H. Guyatt, “Users’ guide to detecting misleading claims in clinical research reports,” 329 Brit. Med. J. 1093 (2004) (“Beware subgroup analysis”); Susan F. Assmann, Stuart J. Pocock, Laura E. Enos & Linda E. Kasten, “Subgroup analysis and other (mis)uses of baseline data in clinical trials,” 355 Lancet 1064 (2000); George Davey Smith & Mathias Egger, “Commentary: Incommunicable knowledge? Interpreting and applying the results of clinical trials and meta-analyses,” 51 J. Clin. Epidemiol. 289 (1998) (arguing against post-hoc hypothesis testing); Douglas G. Altman, “Statistical reviewing for medical journals,” 17 Stat. Med. 2662 (1998); Douglas G. Altman, “Commentary: Within trial variation – A false trail?” 51 J. Clin. Epidemiol. 301 (1998) (noting that observed associations are expected to vary across subgroups because of random variability); Christopher Bulpitt, “Subgroup Analysis,” 2 Lancet 31 (1988).

Lawyer and Economist Expert Witnesses Fail the t-Test

July 7th, 2016

Chad L. Staller is a lawyer and James Markham is an economist.  The two testify frequently in litigation.  They are principals in a litigation-mill known as the Center for Forensic Economic Studies (CFES), which has been a provider of damages opinions-for-hire for decades.

According to its website, the CFES is:

“a leading provider of expert economic analysis and testimony. Our economists and statisticians consult on matters arising in litigation, with a focus on the analysis of economic loss and expert witness testimony on damages.

We assist with discovery, uncover key data, critique opposing claims and produce clear, credible reports and expert testimony. Attorneys and their clients have relied on our expertise in thousands of cases in jurisdictions across the country.”

Modesty was never CFES’s strong suit. CFES was founded by Chad Staller’s father, the late Jerome M. Staller, who infused the run-away inflation of the early 1980s into his reports for plaintiffs in personal injury actions. When this propensity for inflation brought in a large volume of litigation consulting, Staller brought on Brian P. Sullivan. The CFES website notes that Sullivan’s “courtroom demeanor was a model of modesty and good humor, yet he was known to be merciless when cross examined by an opposing attorney.” My personal recollection is that Sullivan sweated profusely on cross-examination. In one case, in which I cross-examined him, Sullivan had added several figures incorrectly, to the plaintiff’s detriment. My cross-examination irked the trial judge (Judge Dowling, who was easily irked) to the point that he interrupted me to ask why I was wasting time pointing out an error that favored the defense. The question allowed me to give a short summation about how I thought the jury might want to know that the witness, Sullivan, had such difficulty in adding uncomplicated numbers.

In Butt v. United Brotherhood of Carpenters & Joiners of America, 2016 WL 3365772 (E.D. Pa. June 16, 2016) [cited as Butt], plaintiffs, women union members, sued for alleged disparate treatment that supposedly caused them to have lower incomes than male union members. To support their claims, the women produced reports prepared by CFES’s Chad Staller and James Markham. Counsel for the union challenged the admissibility of the proffered opinions under Rule 702. The magistrate judge sustained the Rule 702 challenges, in an opinion that questioned both the reliability and the ability of the challenged putative expert witnesses.[1]

Staller and Markham apparently had proffered a “t-test,” which, in their opinion, showed a statistically significant disparity in male and female hours worked, “not attributable to chance.” Butt at *1. Staller and Markham failed, however, to explain or justify their use of the t-test.  The sample size in their analysis included 17 women and 388 men on average across ten years. The magistrate judge noted serious reservations over the CFES analysis’s failure to specify how many men or women were employed in any given year. Plaintiffs’ counsel improvidently attempted to support the CFES analysis by adverting to the Reference Manual on Scientific Evidence (3d ed. 2011), which properly notes that the t-test is designed for small samples, but also issues the caveat that “[a] t-test is not appropriate for small samples drawn from a population that is not normal.” Butt at *1 n.2. The CFES reports, submitted without statistical analysis output, apparently did not attempt to justify the assumption of normality; nor did they proffer a non-parametric analysis.
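To make the normality point concrete, here is a minimal sketch of how an analyst might check the t-test against a non-parametric alternative. The sample sizes (17 and 388) come from the opinion; the skewed, gamma-distributed “hours” data are purely hypothetical assumptions, not the CFES data:

```python
# A minimal sketch with hypothetical, skewed "hours worked" data; only
# the sample sizes (17 women, 388 men) come from Butt.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
women = rng.gamma(shape=2.0, scale=400.0, size=17)   # hypothetical hours
men = rng.gamma(shape=2.5, scale=400.0, size=388)    # hypothetical hours

# Welch's t-test assumes roughly normal populations; with n = 17, that
# assumption does real work and should be justified.
t_stat, t_p = stats.ttest_ind(women, men, equal_var=False)

# A rank-based test needs no normality assumption and serves as a check.
u_stat, u_p = stats.mannwhitneyu(women, men, alternative="two-sided")

print(f"Welch t-test p = {t_p:.4f}; Mann-Whitney U p = {u_p:.4f}")
```

When the two tests disagree materially on small, skewed samples, that disagreement is itself a sign that the t-test’s normality assumption needed justification.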

Putting aside the plaintiffs’ expert witnesses’ failure to explain and justify its use of the t-test, the magistrate judge took issue with the assumption that a comparison of average salaries between the genders was an appropriate analysis in the first place. Butt at *2.

First, the CFES reports assigned damages beyond the years used in their data analysis, which ended in 2012. This extrapolation was especially speculative and unwarranted given that union carpenter working hours were trending downward after 2009. Butt at *3. Second, and even more seriously, the magistrate judge saw that no useful comparison could be made between male and female salaries without taking into account several important additional variables, such as individual skills, and the extent to which individual carpenters solicited employment, used referral systems, or accepted out-of-town employment. Butt at *3.[2] Without an appropriate multivariate analysis, the CFES reports could not conclude that the discrepancy in hours worked was caused by, rather than merely correlated with, gender. Butt at *4.[3]


[1] See Calhoun v. Yamaha Motor Corp., U.S.A., 350 F.3d 316, 322 (3d Cir. 2003) (affirming exclusion of “speculative and unreliable” expert evidence).

[2] Citing Stair v. Lehigh Valley Carpenters Local Union No. 600 of United Brotherhood of Carpenters and Joiners of America, No. Civ. A. 91-1507, 1993 WL 235491, at *7, *18 (E.D. Pa. July 24, 1993) (Huyett, J.), aff’d, 43 F.3d 1463 (3d Cir. 1994) (“Many variables determine the number of hours worked by a carpenter: whether the carpenter solicits employment, whether he or she uses the referral system, whether an employer asks for that carpenter by name, whether the carpenter will accept out of town employment, and whether the carpenter has the skills requested by an employer when that employer calls the Union for a referral.”).

[3] Interesting cases cited by the magistrate judge in support included Molthan v. Temple University, 778 F.2d 955, 963 (3d Cir. 1985) (“Because the considerations affecting promotion decisions may differ greatly from one department to another, statistical evidence of a general underrepresentation of women in the position of full professor adds little to a disparate treatment claim.”); Riding v. Kaufmann’s Dep’t Store, 220 F.Supp. 2d 442, 459 (W.D. Pa. 2002) (“Plaintiff’s statistical evidence is mildly interesting, but she does not put the data in context (how old were the women?) [or] tell us what to do with it or what inferences should be gathered from it…”); Brown v. Cost Co., No. Civ. A. 03-224 ERIE, 2006 WL 544296, at *3 (W.D. Pa. Mar. 3, 2006) (excluding statistical evidence proffered in support of claims of disparate treatment).

Lipitor Diabetes MDL’s Inexact Analysis of Fisher’s Exact Test

April 21st, 2016

Muriel Bristol was a biologist who studied algae at the Rothamsted Experimental Station in England, after World War I.  In addition to her knowledge of plant biology, Bristol claimed the ability to tell whether tea had been added to milk, or the tea poured first and then milk had been added.  Bristol, as a scientist and a proper English woman, preferred the latter.

Ronald Fisher, who also worked at Rothamsted, expressed his skepticism over Dr. Bristol’s claim. Fisher set about to design a randomized experiment that would efficiently and effectively test her claim. Bristol was presented with eight cups of tea, four of which were prepared with milk added to tea, and four prepared with tea added to milk.  Bristol, of course, was blinded to which was which, but was required to label each according to its manner of preparation. Fisher saw his randomized experiment as a 2 x 2 contingency table, from which he could calculate the probability of the observed outcome (and of any more extreme outcomes) using the assumption of fixed marginal totals and the hypergeometric probability distribution.  Fisher’s Exact Test was born at tea time.[1]

Fisher described the origins of his Exact Test in one of his early texts, but he neglected to report whether his experiment vindicated Bristol’s claim. According to David Salsburg, H. Fairfield Smith, one of Fisher’s colleagues, acknowledged that Bristol nailed Fisher’s Exact test, with all eight cups correctly identified. The test has gone on to become an important tool in the statistician’s armamentarium.
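The arithmetic of the tea experiment is easy to verify. A short sketch (using Python’s scipy, which implements the same hypergeometric calculation) shows that labeling all eight cups correctly has an exact one-sided p-value of 1/70:

```python
# Fisher's tea-tasting experiment as a 2x2 table: four cups truly
# milk-first, four truly tea-first, and the lady calls all eight right.
from math import comb
from scipy.stats import fisher_exact

#                     truly milk-first   truly tea-first
# called milk-first          4                  0
# called tea-first           0                  4
table = [[4, 0], [0, 4]]

_, p = fisher_exact(table, alternative="greater")
print(p)               # 0.0142857...
print(1 / comb(8, 4))  # = 1/70: only 1 of 70 possible labelings is perfect
```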

Fisher’s Exact, like any statistical test, has model assumptions and preconditions.  For one thing, the test is designed for categorical data, with binary outcomes. The test allows us to evaluate whether two proportions likely differ by more than chance alone, by calculating the probability of the observed outcome, as well as of more extreme outcomes.

The calculation of an exact attained significance probability, using Fisher’s approach, provides a one-sided p-value, with no unique solution for calculating a two-sided attained significance probability. Fisher’s Exact Test has thus played an important role in showing the judiciary that small sample size need not be an insuperable barrier to meaningful statistical analysis. In discrimination cases, where the one-sided p-value may well be more appropriate for the issue at hand, the test’s one-sidedness is not a particular problem.[2]

The difficulty of using Fisher’s Exact for small sample sizes is that the hypergeometric distribution, upon which the test is based, is highly asymmetric. The observed one-sided p-value does not measure the probability of a result equally extreme in the opposite direction. There are at least three ways to calculate a two-sided p-value:

  1. Double the one-sided p-value.
  2. Add the point probabilities from the opposite tail that are more extreme than the observed point probability.
  3. Use the mid-P value; that is, add all values more extreme (smaller) than the observed point probability from both sides of the distribution, PLUS ½ of the observed point probability.

Some software programs will proceed in one of these ways by default, but their doing so does not guarantee the most accurate measure of two-tailed significance probability.
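A minimal sketch, with hypothetical cell counts, shows how the three conventions can be computed directly from the hypergeometric distribution. The mid-p line follows the definition given above; none of this reconstructs the actual Lipitor data discussed below:

```python
# Three two-sided p-value conventions for Fisher's Exact Test, computed
# from the hypergeometric distribution for a hypothetical 2x2 table.
from scipy.stats import hypergeom

a, b, c, d = 6, 1, 2, 9           # hypothetical cell counts
M, n, draws = a + b + c + d, a + b, a + c
dist = hypergeom(M, n, draws)
support = range(max(0, draws - (c + d)), min(n, draws) + 1)
p_obs = dist.pmf(a)
eps = 1e-12                       # tolerance for comparing point probabilities

p_one_sided = sum(dist.pmf(k) for k in support if k >= a)

# 1. Double the one-sided p-value (simple, but can exceed 1.0).
p_doubled = min(1.0, 2 * p_one_sided)

# 2. Sum all point probabilities no larger than the observed one.
p_small = sum(dist.pmf(k) for k in support if dist.pmf(k) <= p_obs + eps)

# 3. Mid-p: all strictly smaller point probabilities, plus half of p_obs.
p_mid = sum(dist.pmf(k) for k in support if dist.pmf(k) < p_obs - eps) + 0.5 * p_obs

print(f"one-sided {p_one_sided:.4f}; doubled {p_doubled:.4f}; "
      f"point-probability method {p_small:.4f}; mid-p {p_mid:.4f}")
```

As the Stata documentation quoted below notes, doubling is only one of the conventions, and not the one Stata implements.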

In the Lipitor MDL for diabetes litigation, Judge Gergel generally used sharp analyses to cut through the rancid fat of litigation claims, to get to the heart of the matter. By and large, he appears to have done a splendid job. In the course of gatekeeping under Federal Rule of Evidence 702, however, Judge Gergel may have misunderstood the nature of Fisher’s Exact Test.

Nicholas Jewell is a well-credentialed statistician at the University of California.  In the courtroom, Jewell is a well-known expert witness for the litigation industry.  He is no novice at generating unreliable opinion testimony. See In re Zoloft Prods. Liab. Litig., No. 12–md–2342, 2015 WL 7776911 (E.D. Pa. Dec. 2, 2015) (excluding Jewell’s opinions as scientifically unwarranted and methodologically flawed). In the Lipitor cases, some of Jewell’s opinions seemed outlandish indeed, and Judge Gergel generally excluded them. See In re Lipitor Marketing, Sales Practices and Prods. Liab. Litig., MDL No. 2:14-mn-02502-RMG, ___ F.Supp. 3d  ___ (2015), 2015 WL 7422613 (D.S.C. Nov. 20, 2015) [Lipitor Jewell], reconsideration den’d, 2016 WL 827067 (D.S.C. Feb. 29, 2016) [Lipitor Jewell Reconsidered].

As Judge Gergel explained, Jewell calculated a relative risk for abnormal blood glucose in a Lipitor group to be 3.0 (95% C.I., 0.9 to 9.6), using STATA software. Also using STATA, Jewell obtained an attained significance probability of 0.0654, based upon Fisher’s Exact Test. Lipitor Jewell at *7.

Judge Gergel did not report whether Jewell’s reported p-value of 0.0654 was one- or two-sided, but he did state that the attained probability “indicates a lack of statistical significance.” Id. & n. 15. The rest of His Honor’s discussion of the challenged opinion, however, makes clear that the 0.0654 figure must have been a two-sided value.  If it had been a one-sided p-value, then there would have been no way of invoking the mid-p to generate a two-sided p-value below 5%. The mid-p will always be larger than the one-tailed exact p-value generated by Fisher’s Exact Test.

The court noted that Dr. Jewell had testified that he believed that STATA generated this confidence interval by “flip[ping]” the Taylor series approximation. The STATA website notes that it calculates confidence intervals for odds ratios (which are different from the relative risk that Jewell testified he computed) by inverting the Fisher exact test.[3] Id. at *7 & n. 17. Jewell’s description, of course, suggests that the confidence interval was not based upon exact methods.

STATA does not provide a mid p-value calculation, and so Jewell used an on-line calculator to obtain a mid p-value of 0.04, which he declared statistically significant. The court took Jewell to task for using the mid p-value as though it were a different analysis or test.  Id. at *8. Because the mid-p value will always be larger than the one-sided exact p-value from Fisher’s Exact Test, the court’s explanation does not really make sense:

“Instead, Dr. Jewell turned to the mid-p test, which would ‘[a]lmost surely’ produce a lower p-value than the Fisher exact test.”

Id. at *8. The mid-p, however, is not a different test from Fisher’s exact; rather it is simply a way of dealing with the asymmetrical distribution that underlies the Fisher’s exact, to arrive at a two-tailed p-value that more accurately captures the rate of Type I error.

The MDL court acknowledged that the mid-p approach was not inherently unreliable, but questioned Jewell’s inconsistent, selective use of the approach for only one test.[4]  Jewell certainly did not help the plaintiffs’ cause, or his own standing, by having discarded the analyses that were not incorporated into his report, thus leaving the MDL court to guess at how much selection went on in his process of generating his opinions.  Id. at *9 & n. 19.

None of Jewell’s other calculated p-values involved the mid-p approach, but the court’s criticism raises the question whether the other p-values came from a Fisher’s Exact Test with small sample size, or some other highly asymmetric distribution. Id. at *8. Although Jewell had shown himself willing to engage in other dubious, result-oriented analyses, Jewell’s use of the mid-p for this one comparison may have been within acceptable bounds after all.

The court also noted that Jewell had obtained the “exact p-value and that this p-value was not significant.” Id. The court’s notation here, however, does not report the important detail whether that exact, unreported p-value was merely double the one-sided p-value given by the Fisher’s Exact Test. As the STATA website, cited by the MDL court, explains:

“The test naturally gives a one-sided p-value, and there are at least four different ways to convert it to a two-sided p-value (Agresti 2002, 93). One way, not implemented in Stata, is to double the one-sided p-value; doubling is simple but can result in p-values larger than one.”

Wesley Eddings, “Fisher’s exact test two-sided idiosyncrasy” (Jan. 2009) (citing Alan Agresti, Categorical Data Analysis 93 (2d ed. 2002)).

On plaintiffs’ motion for reconsideration, the MDL court reaffirmed its findings with respect to Jewell’s use of the mid-p.  Lipitor Jewell Reconsidered at *3. In doing so, the court insisted that the one instance in which Jewell used the mid-p stood in stark contrast to all the other instances in which he had used Fisher’s Exact Test.  The court then cited to the record to identify 21 other instances in which Jewell used a p-value rather than a mid-p value.  The court, however, did not provide the crucial detail whether these 21 other instances actually involved small-sample applications of Fisher’s Exact Test.  As result-oriented as Jewell can be, it seems safe to assume that not all his statistical analyses involved Fisher’s Exact Test, with its attendant ambiguity about how to calculate a two-tailed p-value.


Post-Script (Aug. 9, 2017)

The defense argument and the judicial error were echoed in a Washington Legal Foundation paper that pilloried Nicholas Jewell for a surfeit of methodological flaws in his expert witness opinions in In re Lipitor. Unfortunately, the paper uncritically recited the defense’s theory about the Fisher’s Exact Test:

“In assessing Lipitor data, even after all of the liberties that [Jewell] took with selecting data, he still could not get a statistically-significant result employing a Fisher’s exact test, so he switched to another test called a mid-p test, which generated a (barely) statistically significant result.”

Kirby Griffis, “The Role of Statistical Significance in Daubert/Rule 702 Hearings,” at 19, Wash. Leg. Foundation Critical Legal Issues Working Paper No. 201 (Mar. 2017). See Kirby Griffis, “Beware the Weak Argument: The Rule of Thirteen,” For the Defense 72 (July 2013) (quoting Justice Frankfurter, “A bad argument is like the clock striking thirteen. It puts in doubt the others.”). The fallacy of Griffis’ argument is that it assumes that a mid-p calculation is a different statistical test from the Fisher’s Exact test, which yields a one-tailed significance probability. Unfortunately, Griffis’ important paper is marred by this and other misstatements about statistics.


[1] Sir Ronald A. Fisher, The Design of Experiments at chapter 2 (1935); see also Stephen Senn, “Tea for three: Of infusions and inferences and milk in first,” Significance 30 (Dec. 2012); David Salsburg, The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century  (2002).

[2] See, e.g., Dendy v. Washington Hosp. Ctr., 431 F. Supp. 873 (D.D.C. 1977) (denying preliminary injunction), rev’d, 581 F.2d 99 (D.C. Cir. 1978) (reversing denial of relief, and remanding for reconsideration). See also National Academies of Science, Reference Manual on Scientific Evidence 255 n.108 (3d ed. 2011) (“Well-known small sample techniques [for testing significance and calculating p-values] include the sign test and Fisher’s exact test.”).

[3] See Wesley Eddings, “Fisher’s exact test two-sided idiosyncrasy” (Jan. 2009), available at <http://www.stata.com/support/faqs/statistics/fishers-exact-test/>, last visited April 19, 2016 (“Stata’s exact confidence interval for the odds ratio inverts Fisher’s exact test.”). This article by Eddings contains a nice discussion of why the Fisher’s Exact Test attained significance probability disagrees with the calculated confidence interval. Eddings points out the asymmetry of the hypergeometric distribution, which complicates arriving at an exact p-value for a two-sided test.

[4] See Barber v. United Airlines, Inc., 17 Fed.Appx. 433, 437 (7th Cir. 2001) (“Because in formulating his opinion Dr. Hynes cherry-picked the facts he considered to render an expert opinion, the district court correctly barred his testimony because such a selective use of facts fails to satisfy the scientific method and Daubert.”).

The Education of Judge Rufe – The Zoloft MDL

April 9th, 2016

The Honorable Cynthia M. Rufe is a judge on the United States District Court, for the Eastern District of Pennsylvania.  Judge Rufe was elected to a judgeship on the Bucks County Court of Common Pleas in 1994.  She was appointed to the federal district court in 2002. Like most state and federal judges, little in her training and experience as a lawyer prepared her to serve as a gatekeeper of complex expert witness scientific opinion testimony.  And yet the statutory code of evidence, and in particular Federal Rules of Evidence 702 and 703, requires her to do just that.

The normal approach to MDL cases is marked by the Field of Dreams: “if you build it, they will come.” Last week, Judge Rufe did something that is unusual in pharmaceutical litigation; she closed the gate and sent everyone home. In re Zoloft Prod. Liab. Litig., MDL NO. 2342, 12-MD-2342, 2016 WL 1320799 (E.D. Pa. April 5, 2016).

Her Honor’s decision was hardly made in haste.  The MDL began in 2012, and proceeded in a typical fashion with case management orders that required the exchange of general causation expert witness reports. The plaintiffs’ steering committee (PSC), acting for the plaintiffs, served the report of only one epidemiologist, Anick Bérard, who took the position that Zoloft causes virtually every major human congenital anomaly known to medicine. The defendants challenged the admissibility of Bérard’s opinions.  After extensive briefings and evidentiary hearings, the trial court found that Bérard’s opinions were riddled with inconsistent assessments of studies, eschewed generally accepted methods of causal inference, ignored contrary evidence, adopted novel, unreliable methods of endorsing “trends” in studies, and failed to address epidemiologic studies that did not support her subjective opinions. In re Zoloft Prods. Liab. Litig., 26 F. Supp. 3d 449 (E.D.Pa.2014). The trial court permitted plaintiffs an opportunity to seek reconsideration of Bérard’s exclusion, which led to the trial court’s reaffirming its previous ruling. In re Zoloft Prods. Liab. Litig., No. 12–md–2342, 2015 WL 314149, at *2 (E.D.Pa. Jan. 23, 2015).

Notwithstanding the PSC’s claims that Bérard was the best qualified expert witness in her field and that she was the only epidemiologist needed to support the plaintiffs’ causal claims, the MDL court indulged the PSC by permitting plaintiffs another bite at the apple.  Over defendants’ objections, the court permitted the PSC to name yet another expert witness, statistician Nicholas Jewell, to do what Bérard had failed to do: proffer an opinion on general causation supported by sound science.  In re Zoloft Prods. Liab. Litig., No. 12–md–2342, 2015 WL 115486, at * 2 (E.D.Pa. Jan. 7, 2015).

As a result of this ruling, the MDL dragged on for over a year, in which time, the PSC served a report by Jewell, and then the defendants conducted a discovery deposition of Jewell, and lodged a new Rule 702 challenge.  Although Jewell brought more statistical sophistication to the task, he could not transmute lead into gold; nor could he support the plaintiffs’ causal claims without committing most of the same fallacies found in Bérard’s opinions.  After another round of Rule 702 briefs and hearings, the MDL court excluded Jewell’s unwarranted causal opinions. In re Zoloft Prods. Liab. Litig., No. 12–md–2342, 2015 WL 7776911 (E.D.Pa. Dec. 2, 2015).

The successive exclusions of Bérard and Jewell left the MDL court in a peculiar position. There were other witnesses, Robert Cabrera, a teratologist, Michael Levin, a molecular biologist, and Thomas Sadler, an embryologist, whose opinions addressed animal toxicologic studies, biological plausibility, and putative mechanisms.  These other witnesses, however, had little or no competence in epidemiology, and they explicitly relied upon Bérard’s opinions with respect to human outcomes.  As a result of Bérard’s exclusion, these witnesses were left free to offer their views about what happens in animals at high doses, or about theoretical mechanisms, but they were unable to address human causation.

Although the PSC had no expert witnesses who could legitimately offer reasonably supported opinions about the causation of human birth defects, the plaintiffs refused to decamp and leave the MDL forum. Faced with the prospect of not trying their cases to juries, the PSC instead tried the patience of the MDL judge. The PSC pulled out all the stops in adducing weak, irrelevant, and invalid evidence to support their claims, sans epidemiologic expertise. The PSC argued that adverse event reports, internal company documents that discussed possible associations, the biological plausibility opinions of Levin and Sadler, the putative mechanism opinions of Cabrera, differential diagnoses offered to support specific causation, and the hip-shot opinions of a former-FDA-commissioner-for-hire, David Kessler, could come together magically to supply sufficient evidence to have their cases submitted to juries. Judge Rufe saw through the transparent effort to manufacture evidence of causation, and granted summary judgment on all remaining Zoloft cases in the MDL. In re Zoloft Prod. Liab. Litig., MDL No. 2342, 12-MD-2342, 2016 WL 1320799, at *4 (E.D. Pa. April 5, 2016).

After a full briefing and hearing on Bérard’s opinion, a reconsideration of Bérard, a permitted “do over” of general causation with Jewell, and a full briefing and hearing on Jewell’s opinions, the MDL court was able to deal deftly with the snippets of evidence “cobbled together” to substitute for evidence that might support a conclusion of causation. The PSC’s cobbled case was puffed up to give the appearance of voluminous evidence, in 200 exhibits that filled six banker’s boxes.  Id. at *5. The ruse was easily undone; most of the exhibits and purported evidence were obvious rubbish. “The quantity of the evidence is not, however, coterminous with the quality of evidence with regard to the issues now before the Court.” Id. The banker’s boxes contained artifices such as untranslated foreign-language documents, and company documents relating to the development and marketing of the medication. The PSC resubmitted reports from Levin, Cabrera, and Sadler, whose opinions were already adjudicated to be incompetent, invalid, irrelevant, or inadequate to support general causation.  The PSC pointed to the specific causation opinions of a clinical cardiologist, Ra-Id Abdulla, M.D., who proffered dubious differential etiologies, ruling in Zoloft as a cause of individual children’s birth defects, despite his inability truly to rule out known and unknown causes in his differential reasoning.  The MDL court, however, recognized that “[a] differential diagnosis assumes that general causation has been established,” id. at *7, and that Abdulla could not bootstrap general causation by purporting to reach a specific causation opinion (even if those specific causation opinions were legitimate).

The PSC submitted the recent consensus statement of the American Statistical Association (ASA)[1], which it misrepresented to be an epidemiologic study.  Id. at *5. The consensus statement makes some pedestrian pronouncements about the difference between statistical and clinical significance, about the need for considerations beyond statistical significance in supporting causal claims, and about the lack of bright-line distinctions for statistical significance in assessing causality.  All true, but immaterial to the PSC’s expert witnesses’ opinions, which over-endorsed statistical significance in the few instances in which it was shown, and over-interpreted study data based upon data mining and multiple comparisons, in blatant violation of the ASA’s declared principles.

Stretching even further for “human evidence,” the PSC submitted documentary evidence of adverse event reports, as though they could support a causal conclusion.[2]  There are about four million live births each year, with an expected rate of serious cardiac malformations of about one per cent.[3]  The prevalence of SSRI anti-depressant use is at least two per cent, which means that we would expect 800 cardiac birth defects each year to occur in children of mothers who took SSRI anti-depressants in the first trimester. If Zoloft had an average market share among the SSRIs of about 25 per cent, then 200 cardiac defects each year would occur in children born to mothers who took Zoloft.  Given that Zoloft has been on the market since the early 1990s, we would expect that there would be thousands of children, exposed to Zoloft during embryogenesis, born with cardiac defects, even if there were nothing untoward about maternal exposure to the medication.  Add the stimulated reporting of adverse events from lawyers, lawyer advertising, and lawyer instigation, and you have manufactured evidence that is not probative of causation at all.[4]
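The expected numbers are simple multiplication, as a short sketch confirms; the base rates are those just described, and the 20-year horizon is an assumption for illustration:

```python
# Back-of-the-envelope expected counts, using the base rates in the text.
births_per_year = 4_000_000
cardiac_defect_rate = 0.01        # ~1% of live births
ssri_use_prevalence = 0.02        # ~2% first-trimester SSRI exposure
zoloft_share = 0.25               # assumed average Zoloft share of SSRIs
years = 20                        # assumed time on market, for illustration

ssri_expected = births_per_year * cardiac_defect_rate * ssri_use_prevalence
zoloft_expected = ssri_expected * zoloft_share
print(ssri_expected)            # 800.0 per year among SSRI-exposed births
print(zoloft_expected)          # 200.0 per year among Zoloft-exposed births
print(zoloft_expected * years)  # 4000.0 expected over two decades, by chance
```

The MDL court cut deftly and swiftly through the smoke screen: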

“These reports are certainly relevant to the generation of study hypotheses, but are insufficient to create a material question of fact on general causation.”

Id. at *9. The MDL court recognized that epidemiology was very important in discerning a causal connection between a common exposure and a common outcome, especially when the outcome occurs at an appreciable background rate in the general population. The MDL court stopped short of holding that epidemiologic evidence was required (which on the facts of the case would have been amply justified), but instead rested its ratio decidendi on the need to account for the extant epidemiology that contradicted or failed to support the strident and subjective opinions of the plaintiffs’ expert witnesses. The MDL court thus gave plaintiffs every benefit of the doubt by limiting its holding on the need for epidemiology to:

“when epidemiological studies are equivocal or inconsistent with a causation opinion, experts asserting causation opinions must thoroughly analyze the strengths and weaknesses of the epidemiological research and explain why that body of research does not contradict or undermine their opinion.”

Id. at *5, quoting from In re Zoloft Prods. Liab. Litig., 26 F. Supp. 3d 449, 476 (E.D. Pa. 2014).

The MDL court also saw through the thin veneer of respectability of the testimony of David Kessler, a former FDA commissioner who helped make large fortunes for some of the members of the PSC by the feeding frenzy he created with his moratorium on silicone gel breast implants.  Even viewing Kessler’s proffered testimony in the most charitable light, the court recognized that he offered little support for a causal conclusion other than to delegate the key issues to epidemiologists. Id. at *9. As for the boxes of regulatory documents, foreign labels, and internal company memoranda, the MDL court found that these documents did not raise a genuine issue of material fact concerning general causation:

“Neither these documents, nor draft product documents or foreign product labels containing language that advises use of birth control by a woman taking Zoloft constitute an admission of causation, as opposed to acknowledging a possible association.”

Id.

In the end, the MDL court found that the PSC’s many banker’s boxes of paper contained too much of nothing for the issue at hand.  After the PSC had put the defendants through the time and expense of litigating and re-litigating these issues, nothing short of dismissal of the pending cases would have been a fair and appropriate outcome to the Zoloft MDL.

_______________________________________

Given the denouement of the Zoloft MDL, it is worth considering the MDL judge’s handling of the scientific issues raised, misrepresented, argued, or relied upon by the parties.  Judge Rufe was required, by Rules 702 and 703, to roll up her sleeves and assess the methodological validity of the challenged expert witnesses’ opinions.  That Her Honor was able to do this is a testament to her hard work. Zoloft was not Judge Rufe’s first MDL, and she clearly learned a lot from her previous judicial assignment to an MDL for Avandia personal injury actions.

On May 21, 2007, the New England Journal of Medicine published online a seriously flawed meta-analysis of cardiovascular disease outcomes and rosiglitazone (Avandia) use.  See Steven E. Nissen, M.D., and Kathy Wolski, M.P.H., “Effect of Rosiglitazone on the Risk of Myocardial Infarction and Death from Cardiovascular Causes,” 356 New Engl. J. Med. 2457 (2007).  The Nissen article did not appear in print until June 14, 2007, but the first lawsuits resulted within a day or two of the in-press version. The lawsuits soon thereafter reached a critical mass, with the inevitable creation of a federal court Multi-District Litigation.

Within a few weeks of Nissen’s article, the Annals of Internal Medicine published an editorial by Cynthia Mulrow and other editors, which questioned the Nissen meta-analysis[5], and introduced an article that attempted to replicate Nissen’s work[6].  The attempted replication showed that the only way Nissen could have obtained his nominally statistically significant result was to have selected a method, Peto’s fixed effect method, known to be biased for use with clinical trials with uneven arms. Random effect methods, more appropriate for the clinically heterogeneous clinical trials, consistently failed to replicate the Nissen result. Other statisticians pointed out that using the risk difference made much more sense when there were multiple trials with zero events in one or the other or both arms of the trials. Trials with zero cardiovascular events in both arms represented important evidence of low, but equal, risk of heart attacks, which should be captured in an appropriate analysis.  When the risk difference approach was used, with exact statistical methods, there was no statistically significant increase in risk in the dataset used by Nissen.[7] Other scientists, including some of Nissen’s own colleagues at the Cleveland Clinic, and John Ioannidis, weighed in to note how fragile and insubstantial the Nissen meta-analysis was[8]:

“As rosiglitazone case demonstrates, minor modifications of the meta-analysis protocol can change the statistical significance of the result.  For small effects, even the direction of the treatment effect estimate may change.”
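The zero-event point lends itself to a small illustration. The sketch below, with hypothetical trial counts, shows why double-zero trials simply drop out of a ratio-based meta-analysis but still carry information in a risk-difference analysis:

```python
# Hypothetical trials: (events_drug, n_drug, events_control, n_control).
trials = [
    (2, 500, 1, 500),
    (0, 300, 0, 300),   # double-zero trial: evidence of low, equal risk
    (1, 400, 2, 400),
]

# An odds-ratio or Peto analysis has nothing to estimate from a trial
# with zero events in both arms, so such trials are simply excluded.
or_usable = [t for t in trials if t[0] + t[2] > 0]
print(f"{len(or_usable)} of {len(trials)} trials enter a ratio-based analysis")

# Every trial, including the double-zero one, contributes a risk difference.
for e1, n1, e0, n0 in trials:
    print(f"risk difference = {e1/n1 - e0/n0:+.4f}")
```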

Nissen achieved his political objective with his shaky meta-analysis.  The FDA convened an Advisory Committee meeting, which in turn resulted in a negative review of the safety data, and the FDA’s imposition of warnings and a Risk Evaluation and Mitigation Strategy, which all but prohibited use of rosiglitazone.[9]  A clinical trial, RECORD, had already started, with support from the drug sponsor, GlaxoSmithKline, which fortunately was allowed to continue.

On a parallel track to the regulatory activities, the federal MDL, headed by Judge Rufe, proceeded to motions and a hearing on GSK’s Rule 702 challenge to plaintiffs’ evidence of general causation. The federal MDL trial judge denied GSK’s motions to exclude plaintiffs’ causation witnesses in an opinion that showed significant diffidence in addressing scientific issues.  In re Avandia Marketing, Sales Practices and Product Liability Litigation, 2011 WL 13576, *12 (E.D. Pa. 2011).  See “Learning to Embrace Flawed Evidence – The Avandia MDL’s Daubert Opinion” (Jan. 10, 2011).

After Judge Rufe denied GSK’s challenges to the admissibility of plaintiffs’ expert witnesses’ causation opinions in the Avandia MDL, the RECORD trial was successfully completed and published.[10]  RECORD was a long-term, prospectively designed, randomized cardiovascular trial in over 4,400 patients, followed for an average of 5.5 years.  The trial was designed with a non-inferiority end point of ruling out a 20% increased risk when compared with standard-of-care diabetes treatment. The trial achieved its end point, with a hazard ratio of 0.99 (95% confidence interval, 0.85-1.16) for cardiovascular hospitalization and death. A readjudication of outcomes by the Duke Clinical Research Institute confirmed the published results.
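The non-inferiority logic can be stated in a line or two of code, using the hazard ratio and confidence interval reported for RECORD:

```python
# RECORD's non-inferiority question: can a 20% increased risk be ruled out?
hazard_ratio = 0.99
ci_lower, ci_upper = 0.85, 1.16
margin = 1.20                     # the pre-specified non-inferiority bound

print("non-inferiority met:", ci_upper < margin)   # True: 1.16 < 1.20
```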

On Nov. 25, 2013, after convening another Advisory Committee meeting, the FDA announced the removal of most of its restrictions on Avandia:

“Results from [RECORD] showed no elevated risk of heart attack or death in patients being treated with Avandia when compared to standard-of-care diabetes drugs. These data do not confirm the signal of increased risk of heart attacks that was found in a meta-analysis of clinical trials first reported in 2007.”

FDA Press Release, “FDA requires removal of certain restrictions on the diabetes drug Avandia” (Nov. 25, 2013). And in December 2015, the FDA abandoned its requirement of a Risk Evaluation and Mitigation Strategy for Avandia. FDA, “Rosiglitazone-containing Diabetes Medicines: Drug Safety Communication – FDA Eliminates the Risk Evaluation and Mitigation Strategy (REMS)” (Dec. 16, 2015).

GSK’s vindication came too late to reverse Judge Rufe’s decision in the Avandia MDL.  GSK spent over six billion dollars on resolving Avandia claims.  And to add to the company’s chagrin, GSK lost patent protection for Avandia in April 2012.[11]

Something good, however, may have emerged from the Avandia litigation debacle.  Judge Rufe heard from plaintiffs’ expert witnesses in Avandia about the hierarchy of evidence, about how observational studies must be evaluated for bias and confounding, about the importance of statistical significance, and about how studies that lack power to find relevant associations may still yield conclusions with appropriate meta-analysis. Important nuances of meta-analysis methodology may have gotten lost in the kerfuffle, but given that plaintiffs had reasonable quality clinical trial data, Avandia plaintiffs’ counsel could eschew their typical reliance upon weak and irrelevant lines of evidence, based upon case reports, adverse event disproportional reporting, and the like.

The Zoloft litigation introduced Judge Rufe to a more typical pharmaceutical litigation. Because the outcomes of interest were birth defects, there were no clinical trials.  To be sure, there were observational epidemiologic studies, but now the defense expert witnesses were carefully evaluating the studies for bias and confounding, and the plaintiffs’ expert witnesses were double counting studies and ignoring multiple comparisons and validity concerns.  Once again, in the Zoloft MDL, plaintiffs’ expert witnesses made their non-specific complaints about “lack of power” (without ever specifying the relevant alternative hypothesis), but it was the defense expert witnesses who cited relevant meta-analyses that attempted to do something about the supposed lack of power. Plaintiffs’ expert witnesses inconsistently argued “lack of power” to disregard studies that had outcomes that undermined their opinions, even when those studies had narrow confidence intervals surrounding values at or near 1.0.
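The point about narrow confidence intervals deserves emphasis, and a minimal sketch, with hypothetical cohort counts, shows why: the interval itself identifies the alternative hypotheses the data rule out, making a generic “lack of power” complaint beside the point:

```python
# Hypothetical large cohort: risk ratio and 95% CI on the log scale.
import math

cases_exposed, n_exposed = 200, 100_000        # hypothetical counts
cases_unexposed, n_unexposed = 201, 100_000

rr = (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)
se = math.sqrt(1/cases_exposed - 1/n_exposed
               + 1/cases_unexposed - 1/n_unexposed)
lo, hi = (math.exp(math.log(rr) + z * se) for z in (-1.96, 1.96))

print(f"RR = {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
# An interval like (0.82, 1.21) is flatly inconsistent with a relative
# risk of 1.5 or 2.0, whatever the study's nominal "power" against them.
```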

The Avandia litigation laid the foundation for Judge Rufe’s critical scrutiny by exemplifying the nature and quantum of evidence to support a reasonable scientific conclusion.  Notwithstanding the mistakes made in the Avandia litigation, this earlier MDL created an invidious distinction with the Zoloft PSC’s evidence and arguments, which looked as weak and insubstantial as they really were.


[1] Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” The American Statistician, available online (Mar. 7, 2016), in-press at DOI:10.1080/00031305.2016.1154108, <http://dx.doi.org/10.1080/>. See “The American Statistical Association’s Statement on and of Significance” (Mar. 17, 2016); “The ASA’s Statement on Statistical Significance – Buzzing from the Huckabees” (Mar. 19, 2016).

[2] See 21 C.F.R. § 314.80 (a) Postmarketing reporting of adverse drug experiences (defining “[a]dverse drug experience” as “[a]ny adverse event associated with the use of a drug in humans, whether or not considered drug related”).

[3] See Centers for Disease Control and Prevention, “Birth Defects Home Page” (last visited April 8, 2016).

[4] See, e.g., Derrick J. Stobaugh, Parakkal Deepak, & Eli D. Ehrenpreis, “Alleged isotretinoin-associated inflammatory bowel disease: Disproportionate reporting by attorneys to the Food and Drug Administration Adverse Event Reporting System,” 69 J. Am. Acad. Dermatol. 393 (2013) (documenting stimulated reporting from litigation activities).

[5] Cynthia D. Mulrow, John Cornell & A. Russell Localio, “Rosiglitazone: A Thunderstorm from Scarce and Fragile Data,” 147 Ann. Intern. Med. 585 (2007).

[6] George A. Diamond, Leon Bax & Sanjay Kaul, “Uncertain Effects of Rosiglitazone on the Risk for Myocardial Infarction and Cardiovascular Death,” 147 Ann. Intern. Med. 578 (2007).

[7] Lu Tian, et al., “Exact and efficient inference procedure for meta-analysis and its application to the analysis of independent 2 × 2 tables with all available data but without artificial continuity correction,” 10 Biostatistics 275 (2008).

[8] Adrian V. Hernandez, Esteban Walker, John P.A. Ioannidis,  and Michael W. Kattan, “Challenges in meta-analysis of randomized clinical trials for rare harmful cardiovascular events: the case of rosiglitazone,” 156 Am. Heart J. 23, 28 (2008).

[9] Janet Woodcock, FDA Decision Memorandum (Sept. 22, 2010).

[10] Philip D. Home, et al., “Rosiglitazone evaluated for cardiovascular outcomes in oral agent combination therapy for type 2 diabetes (RECORD): a multicentre, randomised, open-label trial,” 373 Lancet 2125 (2009).

[11] “Pharmacovigilantism – Avandia Litigation” (Nov. 27, 2013).

Expert Witness – Ghost Busters

March 29th, 2016

Andrew Funkhouser was tried and convicted for selling cocaine.  On appeal, the Missouri Court of Appeals affirmed his conviction and his sentence of prison for 30 years. State v. Funkhouser, 729 S.W.2d 43 (Mo. App. 1987). On a petition for post-conviction relief, Funkhouser asserted that he was deprived of his Sixth Amendment right to effective counsel. Funkhouser v. State, 779 S.W.2d 30 (Mo. App. 1989).

One of the alleged grounds of ineffectiveness was his lawyer’s failure to object to the prosecutor’s cross-examination of a defense expert witness, clinical psychologist Frederick Nolen, on Nolen’s belief in ghosts. Id. at 32. On direct examination, Nolen testified that he had published or presented on multiple personalities, hypnosis, and ghosts.

On cross-examination, the prosecution inquired of Nolen about his theory of ghosts:

“Q. Doctor, I believe that you’ve done some work in the theory of ghosts, is that right?

A. Yes.

Q. I believe you told me that some of that work you’d based on your own experiences, is that correct?

A. Yes.

Q. You also told me you have lived in a haunted house for 13 years, is that right?

A. Yes.

Q. You have seen the ghost, is that correct?

A. Yes.”

Id. at 32-33. Funkhouser asserted that the cross-examination was improper because his expert witness was examined on his religious beliefs, and his counsel was ineffective for failing to object. Id. at 33.  The Missouri Court of Appeals disagreed. Counsel are permitted to cross-examine an adversary’s expert witness

“in any reasonable respect that will test his qualifications, credibility, skill or knowledge and the value and accuracy of his opinions.”

The court held that any failure to object could not be incompetence because the examination was proper. Id.

So there you have it: wacky belief systems are fair game for cross-examination of expert witnesses, at least in the “Show-Me” state.

And this broad scope of cross-examination is probably a good thing because almost anything seems to go in Missouri. The Show-Me state has been bringing up the rear in the law of expert witness admissibility. The Missouri Revised Statutes contain a version of Federal Rule of Evidence 702 that preserves the language of the federal rule from before its statutory revision in 2000:

Expert witness, opinion testimony admissible–hypothetical question not required, when.

490.065. 1. In any civil action, if scientific, technical or other specialized knowledge will assist the trier of fact to understand the evidence or to determine a fact in issue, a witness qualified as an expert by knowledge, skill, experience, training, or education may testify thereto in the form of an opinion or otherwise.

In January 2016, the Missouri state senate passed a bill that would bring the Missouri standard in line with the current federal rule of evidence. Most of the Republican senators voted for the bill; none of the Democrats voted in favor of the reform. Chris Semones, “Missouri: One Step Closer to Daubert,” in Expert Witness Network (Jan. 26, 2016).

Lipitor MDL Cuts the Fat Out of Specific Causation

March 25th, 2016

Ms. Juanita Hempstead was diagnosed with hyperlipidemia in March 1998. Over a year later, in June 1999, with her blood lipids still elevated, her primary care physician prescribed 20 milligrams of atorvastatin per day. Ms. Hempstead did not start taking the statin regularly until July 2000. In September 2002, her lipids were under control, her blood glucose was abnormally high, and she had gained 13 pounds since she was first prescribed a statin medication. Hempstead v. Pfizer, Inc., 2:14–cv–1879, MDL No. 2:14–mn–02502–RMG, 2015 WL 9165589, at *2-3 (D.S.C. Dec. 11, 2015) (C.M.O. No. 55 in In re Lipitor Marketing, Sales Practices and Products Liability Litigation) [cited as Hempstead]. In the fall of 2003, Hempstead experienced abdominal pain, and she stopped taking the statin for a few weeks, presumably because of a concern over potential liver toxicity. Her cessation of the statin led to an increase in her blood fat, but her blood sugar remained elevated, although not in the range that would have been diagnostic of diabetes. In May 2004, about five years after starting on statin medication, having gained 15 pounds since 1999, Ms. Hempstead was diagnosed with type II diabetes mellitus. Id.

Living in a litigious society, and being bombarded with messages from the litigation industry, Ms. Hempstead sued the manufacturer of atorvastatin, Pfizer, Inc. In support of her litigation claim, Hempstead’s lawyers enlisted the support of Elizabeth Murphy, M.D., D.Phil., a Professor of Clinical Medicine, and Chief of Endocrinology and Metabolism at San Francisco General Hospital. Id. at *6. Dr. Murphy received her doctorate in biochemistry from Oxford University, and her medical degree from the Harvard Medical School. Despite her training at elite educational institutions, Dr. Murphy never learned the distinction between ex ante risk and the assignment of causality in an individual patient.

Dr. Murphy claimed that atorvastatin causes diabetes, and that the medication caused Ms. Hempstead’s diabetes in 2004. Murphy pointed to a five-part test for her assessment of specific causation:

(1) reports or reliable studies of diabetes in patients taking atorvastatin;

(2) causation is biologically plausible;

(3) diabetes appeared in the patient after starting atorvastatin;

(4) the existence of other possible causes of the patient’s diabetes; and

(5) whether the newly diagnosed diabetes was likely caused by the atorvastatin.

Id. In response to this proffered testimony, the defendant, Pfizer, Inc., challenged the admissibility of Dr. Murphy’s opinion under Federal Rule of Evidence 702.

The trial court, in reviewing Pfizer’s challenge, saw that Murphy’s opinion essentially was determined by (1), (2), and (3), above. In other words, once Murphy had become convinced of general causation, she was willing to causally attribute diabetes to atorvastatin in every patient who developed diabetes after starting to take the medication. Id. at *6-7.

Dr. Murphy relied upon some epidemiologic studies that suggested a relative risk of diabetes of about 1.5 in patients who had taken atorvastatin. Id. at *5, *8. Unfortunately, the trial court, as is all too common among judges writing Rule 702 opinions, failed to provide citations to the materials upon which plaintiff’s expert witness relied. A safe bet, however, is that those studies, if they had any internal and external validity at all, involved multivariate analyses of risk ratios for diabetes at time t1, in patients who had no diabetes before starting use of atorvastatin at time t0, compared with patients who did not have diabetes at t0 but never took the statin. If so, then Dr. Murphy’s use of a temporal relationship between starting atorvastatin and developing diabetes is quite irrelevant because the relative risk (1.5) relied upon is generated in studies in which that temporality is already present. Ms. Hempstead’s development of diabetes five years after starting atorvastatin does not make her part of a group with a relative risk any higher than the risk ratio of 1.5, cited by Dr. Murphy. Similarly, the absence or presence of putative risk factors other than the accused statin is irrelevant because the risk ratio of 1.5 was most likely arrived at in studies that controlled or adjusted for the other risk factors by a multivariate analysis. Id. at *5 & n. 8.

Dr. Murphy acknowledged that there are known risk factors for diabetes, and that plaintiff Ms. Hempstead had a few. Plaintiff was 55 years old at the time of diagnosis, and advancing age is a risk factor. Plaintiff’s body mass index (BMI) was elevated and it had increased over the five years since beginning to take atorvastatin. Even though not obese, Ms. Hempstead’s BMI was sufficiently high to confer a five-fold increase in risk for diabetes. Id. at *9. Plaintiff also had hypertension and metabolic syndrome, both of which are risk factors (with the latter adding to the level of risk of the former). Id. at *10. Perhaps hoping to avoid the intractable problem of identifying which risk factors were actually at work in Ms. Hempstead to produce her diabetes, Dr. Murphy claimed that all risk factors were causes of plaintiff’s diabetes. Her analysis was thus not so much a differential etiology as a non-differential, non-discriminating assertion that any and all risk factors were probably involved in producing the individual case. Not surprisingly, Dr. Murphy, when pressed, could not identify any professional organizations or peer-reviewed publications that employed such a methodology of attribution. Id. at *6. Dr. Murphy had never used such a method of attribution in her clinical practice; instead she attempted to justify and explain her methodology by adverting to its widespread use by expert witnesses in litigation. Id.

Relative Risk and the Inference of Specific Causation

The main thrust of Dr. Murphy’s and the plaintiff’s specific causation claim seems to have been a simplistic identification of ex ante risk with causation. The MDL court recognized, however, that in science and in law, risk is not the same as causation.[1]

The existence of general causation, with elevated relative risks not likely the result of bias, chance, or confounding, does not necessarily support the inference that every person exposed to the substance or drug and who develops the outcome of interest, had his or her outcome caused by the exposure.

The law requires each plaintiff to show, by a preponderance of the evidence, that his or her alleged injury, the outcome in the relied-upon epidemiologic studies, was actually caused by the alleged exposure. Id. at *4 (citing Guinn v. AstraZeneca Pharm. LP, 602 F.3d 1245, 1249 n. 1 (11th Cir. 2010)).

The disconnect between risk and causation is especially strong when the causation involved results from the modification of the incidence rate of a disease as a function of exposure. Although the MDL court did not explicitly note the importance of a base rate, which gives rise to an “expected value” or “expected outcome” in an epidemiologic sample, the court’s insistence upon a relative risk greater than two, from studies of sample groups that are sufficiently similar to the plaintiff, implicitly affirms the principle. The MDL court did, however, call out as logically flawed Dr. Murphy’s reasoning that specific causation exists for every drug-exposed patient, even in the face of studies that show general causation with associations of a magnitude less than a risk ratio of two. Id. at *8 (citing Guinn v. AstraZeneca Pharm. LP, 602 F.3d 1245, 1255 (11th Cir. 2010) (“The fact that exposure to [a substance] may be a risk factor for [a disease] does not make it an actual cause simply because [the disease] developed.”)).

The MDL court acknowledged the obvious, that some causal relationships may be based upon risk ratios of two or less (but greater than 1.0). Id. at *4. A risk ratio greater than 1.0, but not greater than two, can result only when some of the cases with the outcome of interest, here diabetes, would have occurred anyway in the population that has been sampled. And with increased risk ratios at two or less, a majority of the exposed cases would have developed the outcome even in the absence of the exposure of interest. With this in mind, the MDL court asked how plaintiff could show specific causation, even assuming that general causation were established with the use of epidemiologic methods.

The court in Hempstead reasoned that if the risk ratio were greater than 2.0, a majority of the exposed cases would have developed the outcome of interest because of the exposure being studied. Id. at *5. If the sampled population has had the same level of exposure as the plaintiff, then a case-specific inference of specific causation is supported.[2] Of course, this inferential strategy presupposes that general causation has been established, by ruling out bias, confounding, and chance, with high-quality, statistically significant findings of risk ratios in excess of 2.0. Id. at *5.
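Under the standard assumptions behind this reasoning, the probability that an exposed case’s disease is attributable to the exposure is the attributable fraction, (RR − 1)/RR, which a few lines make concrete (a sketch of the doubling logic, not of any party’s actual computation):

```python
# The doubling-of-the-risk logic: attributable fraction among the exposed.
def probability_of_causation(rr: float) -> float:
    """(RR - 1) / RR, for a relative risk rr greater than 1.0."""
    return (rr - 1.0) / rr

for rr in (1.5, 2.0, 3.0):
    print(f"RR = {rr}: attributable fraction = {probability_of_causation(rr):.0%}")
# RR = 1.5 yields 33%; only above RR = 2.0 does the fraction exceed 50%.
```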

To be sure, there are some statisticians, such as Sander Greenland, who have criticized this use of a sample metric to assess the probability of individual causation, in part because the sample metric is an average level of risk, based upon the whole sample. Greenland is fond of speculating that the risk may not be stochastically distributed, but as the Supreme Court has recently acknowledged, there are times when the use of an average is appropriate to describe individuals within a sampled population. Tyson Foods, Inc. v. Bouaphakeo, No. 14-1146, 2016 WL 1092414 (U.S. S. Ct. Mar. 22, 2016).

The Whole Tsumish

Dr. Murphy, recognizing that there are other known and unknown causes and risk factors for diabetes, made a virtue of foolish consistency by opining that all risk factors present in Ms. Hempstead were involved in producing her diabetes. Dr. Murphy did not, and could not, explain, however, how or why she believed that every risk factor (age, BMI, hypertension, recent weight gain, metabolic syndrome, etc.), rather than some subset of factors, or some idiopathic factors, were involved in producing the specific plaintiff’s disease. The MDL court concluded that Dr. Murphy’s opinion was an ipse dixit of the sort that qualified her opinion for exclusion from trial. Id. at *10.

Biological Fingerprints

Plaintiffs posited typical arguments about “fingerprints” or biological markers that would support inferences of specific causation in the absence of high relative risks, but, as is often the case with such arguments, they had no factual foundation for their claims. Neither Dr. Murphy nor anyone else has ever identified a biological marker that allows drug-exposed patients with diabetes to be identified as having had their diabetes actually caused by the drug of interest, as opposed to other known or unknown causes.

With Dr. Murphy’s testimony failing to satisfy common sense and Rule 702, plaintiff relied upon cases in which circumstances permitted inferences of specific causation from temporal relationships between exposure and outcome. In one such case, the plaintiff developed throat irritation from very high levels of airborne industrial talc exposure, which abated upon cessation of exposure, and returned with renewed exposure. Given that general causation was conceded, and given the natural-experiment quality of the challenge, dechallenge, and rechallenge sequence, the Fourth Circuit in that instance held that the temporal relationship of an acute insult and onset was an adequate basis for expert witness opinion testimony on specific causation. Id. at *11 (citing Westberry v. Gislaved Gummi AB, 178 F.3d 257, 265 (4th Cir. 1999) (“depending on the circumstances, a temporal relationship between exposure to a substance and the onset of a disease or a worsening of symptoms can provide compelling evidence of causation”); Cavallo v. Star Enter., 892 F. Supp. 756, 774 (E.D. Va. 1995) (discussing unique, acute onset of symptoms caused by chemicals)). In the Hempstead case, however, the very nature of the causal relationship claimed did not involve an acute reaction. The claimed injury, diabetes, emerged five years after statin use commenced, and the epidemiologic studies relied upon were all based upon this chronic use, with a non-acute, latent outcome. The trial judge thus would not credit the mere temporality between drug use and new onset of diabetes as probative of anything.


[1] Id. at *8, citing Guinn v. AstraZeneca Pharm. LP, 602 F.3d 1245, 1255 (11th Cir.2010) (“The fact that exposure to [a substance] may be a risk factor for [a disease] does not make it an actual cause simply because [the disease] developed.”); id. at *11, citing McClain v. Metabolife Int’l, Inc., 401 F.3d 1233, 1243 (11th Cir.2005) (“[S]imply because a person takes drugs and then suffers an injury does not show causation. Drawing such a conclusion from temporal relationships leads to the blunder of the post hoc ergo propter hoc fallacy.”); see also Roche v. Lincoln Prop. Co., 278 F.Supp. 2d 744, 752 (E.D. Va.2003) (“Dr. Bernstein’s reliance on temporal causation as the determinative factor in his analysis is suspect because it is well settled that a causation opinion based solely on a temporal relationship is not derived from the scientific method and is therefore insufficient to satisfy the requirements of Rule 702.”) (internal quotes omitted).

[2] See Reference Manual on Scientific Evidence at 612 (3d ed. 2011) (noting “the logic of the effect of doubling of the risk”); see also Marder v. G.D. Searle & Co., 630 F. Supp. 1087, 1092 (D. Md. 1986) (“In epidemiological terms, a two-fold increased risk is an important showing for plaintiffs to make because it is the equivalent of the required legal burden of proof – a showing of causation by the preponderance of the evidence or, in other words, a probability of greater than 50%.”).

The ASA’s Statement on Statistical Significance – Buzzing from the Huckabees

March 19th, 2016

People say crazy things. In a radio interview, Evangelical Michael Huckabee argued that the Kentucky county clerk who refused to issue a marriage license to a same-sex couple was as justified in defying an unjust court decision as people are justified in disregarding Dred Scott v. Sandford, 60 U.S. 393 (1857), which Huckabee described as still the “law of the land.”1 Chief Justice Roger B. Taney would be proud of Huckabee’s use of faux history, precedent, and legal process to argue his cause. Definition of “huckabee”: a bogus factoid.

Consider the case of Sander Greenland, who attempted to settle a score with an adversary’s expert witness, who had opined in 2002 that Bayesian analyses were rarely used at the FDA for reviewing new drug applications. The adversary’s expert witness obviously got Greenland’s knickers in a knot because Greenland wrote an article, in a law review of all places, in which he presented his attempt to “correct the record” and show how the statement of the opposing expert witness was “ludicrous.”2 To support his indictment on charges of ludicrousness, Greenland ignored the FDA’s actual behavior in reviewing new drug applications,3 and looked at the practice of the Journal of Clinical Oncology, a clinical journal that publishes 24 issues a year, with occasional supplements. Greenland found the word “Bayesian” 50 times in over 40,000 journal pages, and declared victory. According to Greenland, “several” (unquantified) articles had used Bayesian methods to explore, post hoc, statistically nonsignificant results.4

Given Greenland’s own evidence, the posterior odds that Greenland was correct in his charges seem to be disturbingly low, but he might have looked at the published papers that conducted more serious, careful surveys of the issue.5 This week, the Journal of the American Medical Association published yet another study by John Ioannidis and colleagues, which documented actual practice in the biomedical literature. And no surprise, Bayesian methods barely register in a systematic survey of the last 25 years of published studies. See David Chavalarias, Joshua David Wallach, Alvin Ho Ting Li, John P. A. Ioannidis, “Evolution of reporting P values in the biomedical literature, 1990-2015,” 315 J. Am. Med. Ass’n 1141 (2016). See also Demetrios N. Kyriacou, “The Enduring Evolution of the P Value,” 315 J. Am. Med. Ass’n 1113 (2016) (“Bayesian methods are not frequently used in most biomedical research analyses.”).

So what are we to make of Greenland’s animadversions in a law review article? It was a huckabee moment.

Recently, the American Statistical Association (ASA) issued a statement on the use of statistical significance and p-values. In general, the statement was quite moderate, and declined to move in the radical directions urged by some statisticians who attended the ASA’s meeting on the subject. Despite the ASA’s moderation, the statement has been met with huckabee-like nonsense and hyperbole. One author, a pharmacologist trained at the University of Washington, with post-doctoral training at the University of California, Berkeley, and an editor of PLoS Biology, was moved to write:

“However, the ASA notes, the importance of the p-value has been greatly overstated and the scientific community has become over-reliant on this one – flawed – measure.”

Lauren Richardson, “Is the p-value pointless?” (Mar. 16, 2016). And yet, nowhere in the ASA’s statement does the group suggest that the p-value is a “flawed” measure. Richardson suffered a lapse and wrote a huckabee.

Not surprisingly, lawyers attempting to spin the ASA’s statement have unleashed entire hives of huckabees in an attempt to deflate the methodological points made by the ASA. Here is one example of a litigation-industry lawyer who argues that the American Statistical Association Statement shows the irrelevance of statistical significance for judicial gatekeeping of expert witnesses:

“To put it into the language of Daubert, debates over ‘p-values’ might be useful when talking about the weight of an expert’s conclusions, but they say nothing about an expert’s methodology.”

Max Kennerly, “Statistical Significance Has No Place In A Daubert Analysis” (Mar. 13, 2016) [cited as Kennerly].

But wait; the expert witness must be able to rule out chance, bias, and confounding when evaluating a putative association for causality. As Austin Bradford Hill explained, even before assessing a putative association for causality, scientists first need to have observations that

“reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance.”

Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295, 295 (1965) (emphasis added).

The analysis of random error is an essential step in the methodological process. Simply because a proper methodology requires consideration of non-statistical factors does not remove the statistical from the methodology. Ruling out chance as a likely explanation is a crucial first step in the methodology for reaching a causal conclusion when there is an “expected value” or base rate for the outcome of interest in the population being sampled.
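To make that first step concrete, here is a minimal sketch, with entirely hypothetical numbers of my own choosing, of how one asks whether an observed excess over a base rate lies beyond what chance would readily produce:

from scipy.stats import binomtest

# Hypothetical illustration only: a cohort of 10,000 drawn from a
# population whose base rate for the outcome of interest is 1.3%,
# in which 160 cases are observed (130 expected).
n, base_rate, observed = 10_000, 0.013, 160

# Exact binomial test of the observed count against the base rate;
# a small p-value means the excess lies beyond what we would care
# to attribute to the play of chance.
result = binomtest(observed, n, base_rate, alternative="two-sided")
print(f"expected: {n * base_rate:.0f}, observed: {observed}")
print(f"p-value: {result.pvalue:.4f}")

A large p-value on such a test would mean that Hill’s predicate of a “clear-cut” association is missing, and the causal inquiry has no business even beginning; a small one merely opens the door to the rest of the analysis.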

Kennerly shakes his hive of huckabees:

“The erroneous belief in an ‘importance of statistical significance’ is exactly what the American Statistical Association was trying to get rid of when they said, ‘The widespread use of “statistical significance” (generally interpreted as “p ≤ 0.05”) as a license for making a claim of a scientific finding (or implied truth) leads to considerable distortion of the scientific process.’”

And yet, the ASA never urged that scientists “get rid of” statistical analyses and assessments of attained levels of significance probability. To be sure, they cautioned against overinterpreting p-values, especially in the context of multiple comparisons, non-prespecified outcomes, and the like. The ASA criticized bright-line rules, which are often used by litigation-industry expert witnesses to over-endorse the results of studies with p-values less than 5%, often in the face of multiple comparisons, cherry-picked outcomes, and poorly and incompletely described methods and results. What the ASA described as a “considerable distortion of the scientific process” was claiming scientific truth on the basis of “p < 0.05.” As Bradford Hill pointed out in 1965, a clear-cut association, beyond that which we would care to attribute to chance, is the beginning of the analysis of an association for causality, not the end of it. Kennerly ignores who is claiming “truth” in the litigation context.  Defense expert witnesses frequently are opining no more than “not proven.” The litigation industry expert witnesses must opine that there is causation, or else they are out of a job.
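The ASA’s worry about multiple comparisons, in particular, is easy to quantify. A back-of-the-envelope sketch (my illustration, not the ASA’s):

# With k independent tests of true null hypotheses, each at the
# conventional 0.05 level, the chance of at least one spuriously
# "significant" result grows rapidly with k.
for k in (1, 5, 20, 100):
    print(f"{k:3d} tests: P(at least one false positive) = {1 - 0.95**k:.2f}")

Twenty cherry-picked outcomes give roughly a 64% chance of at least one “p < 0.05” result from noise alone, which is why a bare recitation of statistical significance, shorn of the study’s methods, licenses nothing.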

The ASA explained that the distortion of the scientific process comes from making a claim of a scientific conclusion of causality, or its absence, when the appropriate claim is “we don’t know.” The ASA did not say, suggest, or imply that a claim of causality can be made in the absence of a finding of statistical significance, along with validation of the statistical model on which it is based, and other factors as well. The ASA certainly did not say that the scientific process would be well served by reaching conclusions of causation without statistical significance. What is clear is that statistical significance should not be an abridgment of a much more expansive process. Reviewing the annals of the International Agency for Research on Cancer (even in its currently politicized state), or of the Institute of Medicine, an honest observer would be hard pressed to come up with examples of associations, for outcomes with known base rates, that were determined to be causal in the absence of studies exhibiting statistical significance, along with many other indicia of causality.

Some other choice huckabees from Kennerly:

“It’s time for courts to start seeing the phrase ‘statistically significant’ in a brief the same way they see words like ‘very,’ ‘clearly,’ and ‘plainly’. It’s an opinion that suggests the speaker has strong feelings about a subject. It’s not a scientific principle.”

Of course, this ignores the central limit theorems, the importance of random sampling, the pre-specification of hypotheses and level of Type I error, and the like. Stuff and nonsense.

And then in a similar vein, from Kennerly:

“The problem is that many courts have been led astray by defendants who claim that ‘statistical significance’ is a threshold that scientific evidence must pass before it can be admitted into court.”

In my experience, it is litigation-industry lawyers who oversell statistical significance, not defense counsel, who may merely question reliance upon studies that lack it. Kennerly’s statement is not even wrong, however, because defense counsel knowledgeable of the rules of evidence would know that statistical studies themselves are rarely admitted into evidence. What is admitted, or not, is the opinion of expert witnesses, who offer opinions about whether associations are causal, or not causal, or inconclusive.


1 Ben Mathis-Lilley, “Huckabee Claims Black People Aren’t Technically Citizens During Critique of Unjust Laws,” The Slatest (Sept. 11, 2015) (“[T]he Dred Scott decision of 1857 still remains to this day the law of the land, which says that black people aren’t fully human… .”).

2 Sander Greenland, “The Need for Critical Appraisal of Expert Witnesses in Epidemiology and Statistics,” 39 Wake Forest Law Rev. 291, 306 (2004). See “The Infrequency of Bayesian Analyses in Non-Forensic Court Decisions” (Feb. 16, 2014).

3 To be sure, eight years after Greenland published this diatribe, the agency promulgated a guidance that set recommended practices for Bayesian analyses in medical device trials. FDA Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials (February 5, 2010); 75 Fed. Reg. 6209 (February 8, 2010); see also Laura A. Thompson, “Bayesian Methods for Making Inferences about Rare Diseases in Pediatric Populations” (2010); Greg Campbell, “Bayesian Statistics at the FDA: The Trailblazing Experience with Medical Devices” (presentation given by the Director, Division of Biostatistics, Center for Devices and Radiological Health, at Rutgers Biostatistics Day, April 3, 2009). Even today, Bayesian analysis remains uncommon at the U.S. FDA.

4 39 Wake Forest Law Rev. at 306-07 & n.61 (citing only one paper, Lisa Licitra et al., Primary Chemotherapy in Resectable Oral Cavity Squamous Cell Cancer: A Randomized Controlled Trial, 21 J. Clin. Oncol. 327 (2003)).

5 See, e.g., J. Martin Bland & Douglas G. Altman, “Bayesians and frequentists,” 317 Brit. Med. J. 1151, 1151 (1998) (“almost all the statistical analyses which appear in the British Medical Journal are frequentist”); David S. Moore, “Bayes for Beginners? Some Reasons to Hesitate,” 51 The Am. Statistician 254, 254 (1997) (“Bayesian methods are relatively rarely used in practice”); J.D. Emerson & Graham Colditz, “Use of statistical analysis in the New England Journal of Medicine,” in John Bailar & Frederick Mosteller, eds., Medical Uses of Statistics 45 (1992) (surveying 115 original research studies for statistical methods used; no instances of Bayesian approaches counted); Douglas Altman, “Statistics in Medical Journals: Developments in the 1980s,” 10 Statistics in Medicine 1897 (1991); B.S. Everitt, “Statistics in Psychiatry,” 2 Statistical Science 107 (1987) (finding only one use of Bayesian methods in 441 papers with statistical methodology).