TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Slemp Trial Part 3 – The Defense Expert Witness – Huh

July 9th, 2017

On June 19, 2017, the U.S. Supreme Court curtailed the predatory jurisdictional practices of the lawsuit industry in seeking out favorable trial courts with no meaningful connection to their claims. See Bristol-Myers Squibb Co. v. Superior Court, No. 16-466, 582 U.S. ___ (June 19, 2017). The same day, the defendants in a pending talc cancer case in St. Louis filed a motion for a mistrial. Swann v. Johnson & Johnson, Case No. 1422-CC09326-01, Division 10, Circuit Court of St. Louis City, Missouri. Missouri law may protect St. Louis judges from having to get involved in gatekeeping scientific expert witness testimony, but when the Supreme Court speaks to the requirements of the federal constitution’s due process clause, even St. Louis judges must listen. Bristol-Myers held that the constitution limits the practice of suing defendants in jurisdictions unrelated to the asserted claims, and the St. Louis trial judge, Judge Rex Burlison, granted the requested mistrial in Swann. As a result, there will not be another test of plaintiffs’ claims that talc causes ovarian cancer, and the previous Slemp case will remain an important event to interpret.

The Sole Defense Expert Witness

Previous posts1 addressed some of the big picture issues as well as the opening statements in Slemp. This post turns to the defense expert witness, Dr. Walter Huh, in an attempt to understand how and why the jury returned its egregious verdict. Juries can, of course, act out of sympathy, passion, or prejudice, but their verdicts are usually black boxes when it comes to discerning their motivations and analyses. A more interesting and fruitful exercise is to ask whether a reasonable jury could have reached the conclusion in the case. The value of this exercise is limited, however. A reasonable jury should have reasonable expertise in the subject matter, and in our civil litigation system, this premise is usually not satisfied.

Dr. Walter Huh, a gynecologic oncologist, was the only expert witness who testified for the defense. As the only defense witness, and as a clinician, Huh had a terrible burden. He had to meet and rebut testimony outside his fields of expertise, including pathology, toxicology, and most important, epidemiology. Huh was by all measures well-spoken, articulate, and well-qualified as a clinical gynecologic oncologist. Defense counsel and Huh, however, tried to make the case that Huh was qualified to speak to all issues in the case. The initial examination on qualifications was long and tedious, and seemed to overcompensate for the obvious gaps in Dr. Huh’s qualifications. In my view, the defense never presented much in the way of credible explanations about where Huh had obtained the training, experience, and expertise to weigh in on areas outside clinical medicine. Ultimately, the cross-examination is the crucial test of whether this strategy of one witness for all subjects can hold. The cross-examination of Dr. Huh, however, exposed the gaps in qualifications, and more important, Dr. Huh made substantive errors that were unnecessary and unhelpful to the defense of the case.

The defense pitched the notion that Dr. Huh somehow trumped all the expert witnesses called by plaintiff because Huh was the “only physician heard by the jury” in court. I wonder, however, whether the jury was so naïve. It seems a poor strategic choice to hope that the jury’s biases in favor of the omniscience of physicians (over scientists) will carry the day.

There were, to be sure, some difficult clinical issues, which Dr. Huh could address within his competence. Cancer causation itself is a multi-disciplinary science, but for a disease, such as ovarian cancer, with a substantial base rate in the general population and without any biomarker of a causal pathway between exposure and outcome, epidemiology will be a necessary tool. Huh was thus forced to “play” on the plaintiffs’ expert witnesses’ home court, much to his detriment.

General Causation

Don’t confuse causation with links, association, and risk factors

The defense’s strong point is that virtually no one, other than the plaintiffs’ expert witnesses themselves, and only in the context of litigation, has causally attributed ovarian cancer to talc exposure. There are, however, some ways that this point can be dulled in the rough and tumble of trial. Lawyers, like journalists, and even some imprecise scientists, use a variety of terms such as “risk,” “risk factor,” “increased risk,” and “link,” for something less than causation. Sometimes these terms are used deliberately to try to pass off something less than causation as causation; sometimes the speaker is confused; and sometimes the speaker is simply being imprecise. It seems incumbent upon the defense to explain the differences between and among these terms, and to stick with a consistent, appropriate terminology.

One instance in which Dr. Huh took his eye off the “causation ball” arose when plaintiffs’ counsel showed him a study conclusion that talc use among African American women was statistically significantly associated with ovarian cancer. Huh answered, non-responsively, “I disagree with the concept that talc causes ovarian cancer.” The study, however, did not advance a causal conclusion, and there was no reason to suggest to the jury that he disagreed with anything in the paper; rather, the question was an opportunity to repeat that association is not causation, and that the article did not contradict anything he had said.

Similarly, Dr. Huh was confronted with several precautionary recommendations that women “may” benefit from avoiding talc. Remarkably, Huh simply disagreed, rather than making the obvious point that the recommendation was not stated as something that would in fact benefit women.

When witnesses answer long, involved questions, with a simple “yes,” then they may have made every implied proposition in the questions into facts in the case. In an exchange between plaintiff’s counsel and Huh, counsel asked whether a textbook listed talc as a risk factor.2 Huh struggled to disagree, and the struggle tended to impair his credibility, because he was disagreeing with a textbook he acknowledged using and relying upon. Disagreement, however, was not necessary; the text merely stated that “talc … may increase risk.” If “increased risk” had been defined and explained as something substantially below causation, then Huh could have answered simply “yes, but that quotation does not support a causal claim.”

At another point, plaintiffs’ counsel, realizing that none of the individual studies reached a causal conclusion, asked whether it would be improper for a single study to give such a conclusion. It was a good question, with a solid premise, but Dr. Huh missed the opportunity to explain that the authors of all the various individual studies had not conducted systematic reviews that advanced the causal conclusion that plaintiffs would need. Certainly, the authors of individual studies were not prohibited from taking the next step to advance a causal conclusion in a separate paper with the appropriate analysis.

Bradford Hill’s Factors

Dr. Huh’s testimony provided the jury with some understanding of Sir Austin Bradford Hill’s nine factors, but Dr. Huh would have helped himself by acknowledging several important points. First, as Hill explained, the nine factors are invoked only after there is a clear-cut (valid) association beyond that which we care to attribute to chance. Second, establishing all nine factors is not necessary. Third, some of the nine factors are more important than others.

Study validity

In the epidemiology of talc and ovarian cancer, statistical power and significance are not the crucial issues; study validity is. It should have been the plaintiff’s burden to rule out bias and confounding, as well as chance. Hours had passed in the defense examination of Dr. Huh before study validity was raised, and it was never comprehensively explained. Dr. Huh explained recall bias as a particular problem of case-control studies, which made up the bulk of evidence upon which plaintiffs’ expert witnesses relied. A more sophisticated witness on epidemiology might well have explained that the selection of controls can be a serious problem without obvious solutions in case-control studies.

On cross-examination, plaintiffs’ counsel, citing Kenneth Rothman, asked whether misclassification bias always yields a lower risk ratio. Dr. Huh resisted with “not necessarily,” but failed to dig into whether the conditions for rejecting plaintiffs’ generalization (such as polychotomous exposure classification) obtained in the relevant cohort studies. More importantly, Huh missed the opportunity to point out that the most recent, most sophisticated cohort study reported a risk ratio below 1.0, which, on the plaintiffs’ theory about misclassification, would have been even further below 1.0 than the value reported in the published paper. Again, a qualified epidemiologist would not have failed to make these points.
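
The misclassification point lends itself to a minimal numerical sketch. All of the inputs below (exposure prevalence, baseline risk, sensitivity, and specificity) are invented for illustration; the sketch shows only that non-differential exposure misclassification pulls an observed risk ratio toward 1.0, so a reported risk ratio below 1.0 implies a true value even further below 1.0.

```python
# Minimal sketch, hypothetical numbers only: non-differential exposure
# misclassification (same error rates for cases and non-cases) biases
# a risk ratio toward the null value of 1.0.

def observed_risk_ratio(true_rr, p_exposed, baseline_risk, sens, spec):
    """Risk ratio observed when exposure status is misclassified."""
    r1 = baseline_risk * true_rr          # risk among the truly exposed
    r0 = baseline_risk                    # risk among the truly unexposed
    # The "classified exposed" group mixes true positives and false positives.
    n1 = p_exposed * sens + (1 - p_exposed) * (1 - spec)
    risk1 = (p_exposed * sens * r1 + (1 - p_exposed) * (1 - spec) * r0) / n1
    # The "classified unexposed" group mixes false negatives and true negatives.
    n0 = p_exposed * (1 - sens) + (1 - p_exposed) * spec
    risk0 = (p_exposed * (1 - sens) * r1 + (1 - p_exposed) * spec * r0) / n0
    return risk1 / risk0

# A true RR of 1.3 is attenuated toward 1.0 ...
print(observed_risk_ratio(1.3, 0.3, 0.01, 0.8, 0.8))   # ~1.16
# ... and a true RR below 1.0 is attenuated upward toward 1.0, so a
# *reported* 0.9, corrected for the misclassification, sits even lower.
print(observed_risk_ratio(0.9, 0.3, 0.01, 0.8, 0.8))   # ~0.95
```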

Dr. Huh never read the testimony of one of the plaintiffs’ expert witnesses on epidemiology, Graham Colditz, and offered no specific rebuttal of Colditz’s opinions. With respect to the other of plaintiffs’ epidemiology expert witnesses, Dr. Cramer, Huh criticized him for engaging in post-hoc secondary analyses and asserted that Cramer’s meta-analysis could not be validated. Huh never attempted to validate the meta-analysis himself; nor did Huh offer his own meta-analysis or explain why a meta-analysis of seriously biased studies was unhelpful. These omissions substantially blunted Huh’s criticisms.

On the issue of study validity, Dr. Huh seemed to intimate that cohort studies were necessarily better than case-control studies because of recall bias, but also because there are more women involved in the cohort studies than in the case-control studies. The latter point, although arithmetically correct, is epidemiologically bogus. There are often fewer ovarian cancer cases in the cohort study, especially if the cohort is not followed for a very long time. The true test comes in the statistical precision of the point estimate, relative risk or odds ratio, in the different types of studies. The case-control studies often generate much more precise point estimates, as seen from their narrower confidence intervals. Of course, the real issue here is not precision, but accuracy. Still, Dr. Huh appeared to have endorsed defense counsel’s misleading argument about study size, a consideration that will not help the defense when the contentions of the parties are heard in scientific fora.

Statistical Significance

Huh appeared at times to stake out a position that if a study does not have statistical significance, then we must accept the null hypothesis. I believe that most careful scientists would reject this position. Null studies simply fail to reject the null hypothesis.

Although there seems to be no end to fallacious reasoning by plaintiffs, there is a particular defense fallacy seen in some cases that turn on epidemiology. What if we had 10 studies, each finding an elevated risk ratio of 1.5, with a two-tailed 95 percent confidence interval of 0.92 – 2.18, or so? Can the defense claim victory because no study is statistically significant? Huh seemed to suggest so, but this is clearly wrong. Of course, we might ask why no one conducted the 11th study, with sufficient power to detect a risk ratio of 1.5, at the desired level of significance. But parties go to trial with the evidence they have, not what they might want to have. On the above 10-study hypothetical, a meta-analysis might well be done (assuming the studies could be appropriately included), and the summary risk ratio for all studies would be 1.5, and highly statistically significant.
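
The hypothetical can be checked with a short fixed-effect (inverse-variance) calculation. This is a minimal sketch using only the invented numbers from the hypothetical above; a real meta-analysis would, of course, also attend to heterogeneity and study quality.

```python
import math

# Ten hypothetical studies, each reporting RR 1.5 (95% CI 0.92-2.18).
rr, lo, hi, n_studies = 1.5, 0.92, 2.18, 10

log_rr = math.log(rr)
se = (math.log(hi) - math.log(lo)) / (2 * 1.96)    # SE on the log scale

z_single = log_rr / se                             # ~1.84: not significant

# Fixed-effect pooling of ten identical studies: the summary estimate
# stays at 1.5, but the standard error shrinks by sqrt(10).
se_pooled = se / math.sqrt(n_studies)
z_pooled = log_rr / se_pooled                      # ~5.8: highly significant
ci_lo = math.exp(log_rr - 1.96 * se_pooled)
ci_hi = math.exp(log_rr + 1.96 * se_pooled)
print(f"single study: z = {z_single:.2f}")
print(f"pooled: RR = {rr:.2f} (95% CI {ci_lo:.2f}-{ci_hi:.2f}), z = {z_pooled:.2f}")
# pooled: RR = 1.50 (95% CI 1.31-1.72), z = 5.83
```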

On the question of talc and ovarian cancer, there were several meta-analyses at issue, and so the role of statistical significance of individual studies was less relevant. The real issue was study validity. This issue was muddled by assertions that risk ratios such as 2.05 (95% CI, 0.94 – 4.47) were “chance findings.” Chance may not have been ruled out, but the defense can hardly assert that chance and chance alone produced the findings; otherwise, it will be sunk by the available meta-analyses.

Strength of Association

The risk ratios involved in most of the talc ovarian cancer studies are small, and that is obviously an important factor to consider in evaluating the studies for causal conclusions. Still, it is also obvious that sometimes real causal associations can be small in magnitude. Dr. Huh could and should have conceded on direct examination that small associations can be causal, while explaining that validity concerns become critical for the studies that show small associations. Examples would have helped, such as the body of observational epidemiology that suggested that estrogen replacement therapy in post-menopausal women provided cardiovascular benefit, only to be reversed by higher quality clinical trials. Similarly, observational studies suggested that lung cancer rates were reduced by Vitamin A intake, but again clinical trial data showed the opposite.

Consistency of Studies

Are studies that have statistically non-significant risk ratios above 1.0 inconsistent with studies that find statistically significant elevated risk ratios? At several points, Huh appeared to say that such a group of studies is inconsistent, but that is not necessarily so. Huh’s assertion provoked a good bit of harmful cross-examination, in which he seemed to resist the notion that meta-analysis could help answer whether a group of studies is statistically consistent. Huh could have conceded the point readily but emphasized that a group of biased studies would give only a consistently biased estimate of association.
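
Whether a group of studies is statistically consistent is indeed a testable proposition. A minimal sketch, with invented risk ratios and confidence intervals, shows the standard heterogeneity statistics; note how studies with mixed “significance” but compatible point estimates produce no evidence of inconsistency.

```python
import math
from scipy import stats

# Invented risk ratios and 95% CIs: some "significant," some not.
studies = [(1.4, 1.05, 1.87), (1.2, 0.85, 1.69),
           (1.5, 0.90, 2.50), (1.1, 0.80, 1.51)]

logs = [math.log(rr) for rr, lo, hi in studies]
ses  = [(math.log(hi) - math.log(lo)) / (2 * 1.96) for rr, lo, hi in studies]
w    = [1 / s**2 for s in ses]                 # inverse-variance weights
pooled = sum(wi * li for wi, li in zip(w, logs)) / sum(w)

# Cochran's Q: weighted squared deviations from the pooled estimate.
q = sum(wi * (li - pooled) ** 2 for wi, li in zip(w, logs))
df = len(studies) - 1
p_het = 1 - stats.chi2.cdf(q, df)
i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
print(f"Q = {q:.2f} on {df} df, p = {p_het:.2f}, I^2 = {i2:.0f}%")
# Q = 1.74 on 3 df, p = 0.63, I^2 = 0%: statistically consistent studies.
```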

Authority

One of the cheapest tricks in the trial lawyers’ briefcase is the “learned treatise” exception to the rule against hearsay.3 The lawyer sets up witnesses in deposition by obtaining their agreement that a particular author or text is “authoritative.” Then at trial, the lawyer confronts the witnesses with a snippet of text, which appears to disagree with the expert witnesses’ testimony. Under the rule, in federal and in some state courts, the jury may accept the snippet or sound bite as true, and also accept that the witnesses do not know what they are talking about when they disagree with the “authoritative” text.

The rule is problematic and should have been retired long ago. Since 1663, the Royal Society has sported the motto:  “Nullius in verba.”  Disputes in science are resolved with data, from high-quality, reproducible experimental or observational studies, not with appeals to the prestige of the speaker. And yet, we lawyers will try, and sometimes succeed, with this greasy kid-stuff approach to cross-examination. Indeed, when there is an opportunity to use it, we may even have an obligation to use so-called learned treatises to advance our clients’ cause.

In the Slemp trial, the plaintiff’s counsel apparently had gotten a concession from Dr. Huh that plaintiff’s expert witness on epidemiology, Dr. Daniel Cramer, was “credible and authoritative.” Plaintiff’s counsel then used Huh’s disagreement with Cramer’s testimony as well as his published papers to undermine Huh’s credibility.

This attack on Huh was a self-inflicted wound. The proper response to a request for a concession that someone or some publication is “authoritative” is that this word really has no meaning in science. “Nullius in verba,” and all that. Sure, someone can be a respected researcher based upon past success, but past performance is no guarantee of future success. Look at Linus Pauling and Vitamin C. The truth of a conclusion rests on the data and the soundness of the inferences therefrom.

Collateral Attacks

The plaintiff’s lawyer in Slemp was particularly adept at another propaganda routine – attacking the witness on the stand for having cited another witness, whose credibility in turn was attacked by someone else, even if that someone else was a crackpot. Senator McCarthy (Joseph, not Eugene) would have been proud of plaintiff’s lawyer’s use of the scurrilous attack on Paolo Boffetta for his views on EMF and cancer, as set out in Microwave News, a fringe publication that advances EMF-cancer claims. Now, the claim that non-ionizing radiation causes cancer has not met with much if any acceptance, and Boffetta’s criticisms of the claims are hardly unique or unsupported. Yet plaintiff’s counsel used this throw-away publication’s characterization of Boffetta as “the devil’s advocate,” to impugn Boffetta’s publications and opinions on EMF, as well as Huh’s opinions that relied upon some aspect of Boffetta’s work on talc. Not that “authority” counts, but Boffetta is the Associate Director for Population Sciences of the Tisch Cancer Institute and Chief of the Division of Cancer Prevention and Control of the Department of Oncological Sciences, at the Mt. Sinai School of Medicine in New York. He has published many epidemiologic studies, as well as a textbook on the epidemiology of cancer.4

The author of the Microwave News piece was never identified, but almost certainly lacked the training, experience, and expertise of Paolo Boffetta. The point, however, is that this cross-examination was extremely collateral, had nothing to do with Huh, or the issues in the Slemp case, and warranted an objection and admonition to plaintiff’s counsel for the scurrilous attack. An alert trial judge, who cared about substantial justice, might have shut down this frivolous, highly collateral attack, sua sponte. When Huh was confronted with the “devil’s advocate” characterization, he responded “OK,” seemingly affirming the premise of the question.

Specific Causation

Dr. Huh and the talc defendants took the position that epidemiology never informs assessment of individual causation. This opinion is hard to sustain. Elevated risk ratios reflect more individual cases than expected in a sample. Epidemiologic models are used to make individual predictions of risk for purposes of clinical monitoring and treatment. Population-based statistics are used to define the range of normal function and to assess individuals as impaired or disabled, or not.

At one point in the cross-examination, plaintiffs’ counsel suggested the irrelevance of the size of relative risk by asking whether Dr. Huh would agree that a 20% increased risk was not small if you are someone who has gotten the disease. Huh answered “Well, if it is a real association.” This answer fails on several levels. First, it conflates “increased risk” and “real association” with causation. The point was for Huh to explain that an increased risk, if statistically significant, may be an association, but it is not necessarily causal.

Second, and equally important, Huh missed the opportunity to explain that even if the 20% increased risk was real and causal, it would still mean that an individual patient’s ovarian cancer was most likely not caused by the exposure. See David H. Schwartz, “The Importance of Attributable Risk in Toxic Tort Litigation,” (July 5, 2017).
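
The arithmetic behind that second point is short. A sketch, assuming for argument’s sake that the 20% increased risk is real and causal:

```python
# Attributable fraction among the exposed, for a relative risk of 1.2.
rr = 1.2
attributable_fraction = (rr - 1) / rr
print(f"{attributable_fraction:.1%}")   # 16.7%
# Even granting causation, only about 1 in 6 exposed cases would be
# attributable to the exposure; the "more likely than not" threshold
# (> 50%) is not reached until the relative risk exceeds 2.0.
```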

Conclusion

The defense strategy of eliciting all their scientific and medical testimony from a single witness was dangerous at best. As good a clinician as Dr. Huh appears to be, the defense strategy did not bode well for a good outcome when many of the scientific issues were outside of Dr. Huh’s expertise.


2 Jonathan S. Berek & Neville F. Hacker, Gynecologic Oncology at 231 (6th ed. 2014).

3 See “Trust-Me Rules of Evidence” (Oct. 18, 2012).

4 See, e.g., Paolo Boffetta, Stefania Boccia, Carlo La Vecchia, A Quick Guide to Cancer Epidemiology (2014).

Samuel Tarry’s Protreptic for Litigation-Sponsored Publications

July 9th, 2017

Litigation-related research has been the punching bag of self-appointed public health advocates for some time. Remarkably, and perhaps not surprising to readers of this blog, many of the most strident critics have deep ties to the lawsuit industry, and have served the plaintiffs’ bar loyally and zealously for many years.1,2,3,4 And many of these critics have ignored or feigned ignorance of the litigation provenance of much research that they hold dear, such as Irving Selikoff’s asbestos research undertaken for the asbestos workers’ union and its legal advocates. These critics’ campaign is an exquisite study in hypocrisy.

For some time, I have argued that the standards for conflict-of-interest disclosures should be applied symmetrically and comprehensively to include positional conflicts, public health and environmental advocacy, as well as litigation consulting or testifying for any party. Conflicts should be disclosed, but they should not become a facile excuse or false justification for dismissing research, regardless of the party that sponsored it.5 Scientific studies should be interpreted scientifically – that is, carefully, thoroughly, and rigorously – regardless of whether they are conducted and published by industry-sponsored, union-sponsored, or Lord help us, even lawyer-sponsored scientists.

Several years ago, a defense lawyer, Samuel Tarry, published a case series of industry-sponsored research or analysis, which grew out of litigation, but made substantial contributions to the scientific understanding of claimed health risks. See Samuel L. Tarry, Jr., “Can Litigation-Generated Science Promote Public Health?” 33 Am. J. Trial Advocacy 315 (2009). Tarry’s paper is a helpful corrective to the biased (and often conflicted) criticisms of industry-sponsored research and analysis by the lawsuit industry and its scientific allies and consultants. In an ocean of uninformative papers about “Daubert,” Tarry’s paper stands out and should be required reading for all lawyers who practice in the area of “health effects litigation.”

Tarry presented a brief summary of the litigation context for three publications that deserve to be remembered and used as exemplars of important, sound, scientific publications that helped change the course of litigations, as well as the scientific community’s appreciation of prior misleading contentions and publications. His three case studies grew out of the silicone-gel breast implant litigation, the latex allergy litigation, and the never-ending asbestos litigation.

1. Silicone

There are some glib characterizations of the silicone gel breast implant litigation as having had no evidentiary basis. A more careful assessment would allow that there was some evidence, much of it fraudulent and irrelevant. See, e.g., Hon. Jack B. Weinstein, “Preliminary Reflections on Administration of Complex Litigation” 2009 Cardozo L. Rev. de novo 1, 14 (2009) (describing plaintiffs’ expert witnesses in the silicone gel breast implant litigation as “charlatans” and the litigation as largely based upon fraud). The lawsuit industry worked primarily through so-called support groups, which in turn funded friendly, advocate physicians, who in turn testified for plaintiffs and their lawyers in personal injury cases.

When the defendants, such as Dow Corning, reacted by sponsoring serious epidemiologic analyses of the issue whether exposure to silicone gel was associated with specific autoimmune or connective tissue diseases, the plaintiffs’ bar mounted a conflict-of-interest witch hunt over industry funding.6 Ultimately, the source of funding became obviously irrelevant; the concordance between industry-funded and all high quality research on the litigation claims was undeniable. Obvious, that is, to court-appointed expert witnesses,7 and to a blue-ribbon panel of experts in the Institute of Medicine.8

2. Latex Hypersensitivity

Tarry’s second example comes from the latex hypersensitivity litigation. Whatever evidentiary basis may have existed for isolated cases of latex allergy, the plaintiffs’ bar had taken it and expanded it into a full-scale mass tort. A defense expert witness, Dr. David Garabrant, a physician and an epidemiologist, published a meta-analysis and systematic review of the extant scientific evidence. David H. Garabrant & Sarah Schweitzer, “Epidemiology of latex sensitization and allergies in health care workers,” 110 J. Allergy & Clin. Immunol. S82 (2002). Garabrant’s formal, systematic review documented his litigation opinions that the risk of latex hypersensitivity was much lower than claimed and not the widespread hazard asserted by plaintiffs and their retained expert witnesses. Although Garabrant’s review did not totally end the litigation and public health debate about latex, it went a long way toward ending both.

3. Fraudulent Asbestos-Induced Radiography

I still recall, sitting at my desk, my secretary coming into my office to tell me excitedly that a recent crop of silicosis claimants had had previous asbestosis claims. When I asked how she knew, she showed me the computer printout of closed files for another client. Some of the names were so distinctive that the probability that there were two men with the same name was minuscule. When we obtained the closed files from storage, sure enough, the social security numbers matched, as did all other pertinent data, except that what had been called asbestosis previously was now called silicosis.

My secretary’s astute observation was mirrored in the judicial proceedings of Judge Janis Graham Jack, who presided over MDL 1553. Judge Jack, however, discovered something even more egregious: in some cases, a single physician interpreted a single chest radiograph as showing either asbestosis or silicosis, but not both. The two alternative diagnoses were recorded in two separate reports, for two different litigation cases against different defendants. This fraudulent practice, as well as others, is documented in Judge Jack’s extraordinary, thorough opinion. See In re Silica Prods. Liab. Litig., 398 F. Supp. 2d 563 (S.D. Tex. 2005)9.

The revelations of fraud in Judge Jack’s opinion were not entirely surprising. As everyone involved in asbestos litigation has always known, there is a disturbing degree of subjectivity in the interpretation of chest radiographs for pneumoconiosis. The federal government has long been aware of this problem, and through the Centers for Disease Control and the National Institute of Occupational Safety and Health, has tried to subdue extreme subjectivity by creating a pneumoconiosis classification scheme for chest radiographs known as the “B-reader” system. Unfortunately, B-reader certification meant only that physicians could achieve inter-observer and intra-observer reproducibility of interpretations on the examination, but they were free to peddle extreme interpretations for litigation. Indeed, the B-reader certification system exacerbated the problem by creating a credential that was marketed to advance the credibility of some of the most biased, over-reading physicians in asbestos, silica, and coal pneumoconiosis litigation.

Tarry’s third example is a study conducted under the leadership of the late Joseph Gitlin, at Johns Hopkins Medical School. With funding from defendants and insurers, Dr. Joseph Gitlin conducted a concordance study of films that had been read by predatory radiologists and physicians as showing pneumoconiosis. The readers in his study found a very low level of positive films (less than 5%), despite their having been interpreted as showing pneumoconiosis by the litigation physicians. See Joseph N. Gitlin, Leroy L. Cook, Otha W. Linton, and Elizabeth Garrett-Mayer, “Comparison of ‘B’ Readers’ Interpretations of Chest Radiographs for Asbestos Related Changes,” 11 Acad. Radiol. 843 (2004); Marjorie Centofanti, “With thousands of asbestos workers demanding compensation for lung disease, a radiology researcher here finds that most cases lack merit,” Hopkins Medicine (2006). As with the Sokal hoax, the practitioners of post-modern medicine cried “foul,” and decried industry sponsorship, but the disparity between courtroom and hospital medicine was sufficient proof for most disinterested observers that there was a need to fix the litigation process.

Meretricious Mensuration10 – Manganese Litigation Example

Tarry’s examples are important reminders that corporate sponsorship, whether from the plaintiffs’ lawsuit industry or from the manufacturing industry, does not necessarily render research tainted or unreliable. Although lawyers often confront exaggerated or false claims, and witness important, helpful correctives in the form of litigation-sponsored studies, the demands of legal practice and “the next case” typically prevent lawyers from documenting the scientific depredations and their rebuttals. Sadly, unlike litigations such as those involving Bendectin and silicone, the chronicles of fraud and exaggeration are mostly closed books in closed files in closed offices. These examples need the light of day and a fresh breeze to disseminate them widely in both the scientific and legal communities, so that all may have a healthy appreciation for the value of appropriately conducted studies generated in litigation contexts.

As I have intimated elsewhere, the welding fume litigation is a great example of specious claiming, which ultimately was unhorsed by publications inspired or funded by the defense. In the typical welding fume case, plaintiff claimed that exposure to manganese in welding fume caused Parkinson’s disease or manganism. Although manganism sounds as though it must be a disease that can be caused only by manganese, in the hands of plaintiffs’ expert witnesses, manganism became whatever ailment plaintiffs claimed to have suffered. Circularity and perfect definitional precision were achieved by semantic fiat.

The Sanchez-Ramos Meta-Analysis

Manganese Madness was largely the creation of the Litigation Industry, under the dubious leadership of Dickie Scruggs & Company. Although the plaintiffs enjoyed a strong tail wind in the courtroom of an empathetic judge, they had difficulties in persuading juries and ultimately decamped from MDL 1535, in favor of more lucrative targets. In their last hurrah, however, plaintiffs retained a neurologist, Juan Sanchez-Ramos, who proffered a biased, invalid synthesis, which he billed as a meta-analysis11.

Sanchez-Ramos’s meta-analysis, such as it was, provoked professional disapproval and criticism from the defense expert witness, Dr. James Mortimer. Because the work product of Sanchez-Ramos was first disclosed in deposition, and not in his Rule 26 report, Dr. Mortimer undertook belatedly a proper meta-analysis.12 Even though Dr. Mortimer’s meta-analysis was done in response to Sanchez-Ramos’s improper, tardy disclosure, the MDL judge ruled that Mortimer’s meta-analysis was too late. The effect, however, of Mortimer’s meta-analysis was clear in showing that welding had no positive association with Parkinson’s disease outcomes. MDL 1535 resolved quickly thereafter, and with only slight encouragement, Dr. Mortimer published a further refined meta-analysis with two other leading neuro-epidemiologists. See James Mortimer, Amy Borenstein, and Lorene Nelson, “Associations of welding and manganese exposure with Parkinson disease: Review and meta-analysis,” 79 Neurology 1174 (2012). See also “Manganese Meta-Analysis Further Undermines Reference Manual’s Toxicology Chapter” (Oct. 15, 2012).


1 See, e.g., David Michaels & Celeste Monforton, “Manufacturing Uncertainty: Contested Science and the Protection of the Public’s Health and Environment,” 95 Am. J. Pub. Health S39, S40 (2005); David Michaels & Celeste Monforton, “How Litigation Shapes the Scientific Literature: Asbestos and Disease Among Automobile Mechanics,” 15 J. L. & Policy 1137, 1165 (2007). Michaels had served as a plaintiffs’ paid expert witness in chemical exposure litigation, and Monforton had been employed by labor unions before these papers were published, without disclosure of conflicts.

2 Leslie Boden & David Ozonoff, “Litigation-Generated Science: Why Should We Care?” 116 Envt’l Health Persp. 121, 121 (2008) (arguing that systematic distortion of the scientific record will result from litigation-sponsored papers even with disclosure of conflicts of interest). Ozonoff had served as a hired plaintiffs’ expert witness on multiple occasions before the publication of this article, which was unadorned by disclosure.

3 Lennart Hardell, Martin J. Walker, Bo Walhjalt, Lee S. Friedman, and Elihu D. Richter, “Secret Ties to Industry and Conflicting Interest in Cancer Research,” 50 Am. J. Indus. Med. 227, 233 (2007) (criticizing “powerful industrial interests” for “undermining independent research on hazard and risk,” in a “red” journal that is controlled by allies of the lawsuit industry). Hardell was an expert witness for plaintiffs in mobile phone litigation in which plaintiffs claimed that non-ionizing radiation caused brain cancer. In federal litigation, Hardell was excluded as an expert witness when his proffered opinions were found to be scientifically unreliable. Newman v. Motorola, Inc., 218 F. Supp. 2d. 769, 777 (D. Md. 2002), aff’d, 78 Fed. Appx. 292 (4th Cir. 2003).

4 See David Egilman & Susanna Bohme, “IJOEH and the Critique of Bias,” 14 Internat’l J. Occup. & Envt’l Health 147, 148 (2008) (urging a Marxist critique that industry-sponsored research is necessarily motivated by profit considerations, and biased in favor of industry funders). Although Egilman usually gives a disclosure of his litigation activities, he typically characterizes those activities as having been for both plaintiffs and defendants, even though his testimonial work for defendants is minuscule.

5 Kenneth J. Rothman, “Conflict of Interest: The New McCarthyism in Science,” 269 J. Am. Med. Ass’n 2782 (1993).

6 See Charles H. Hennekens, I-Min Lee, Nancy R. Cook, Patricia R. Hebert, Elizabeth W. Karlson, Fran LaMotte, JoAnn E. Manson, and Julie E. Buring, “Self-reported Breast Implants and Connective-Tissue Diseases in Female Health Professionals: A Retrospective Cohort Study,” 275 J. Am. Med. Ass’n 616 (1996) (analyzing established cohort for claimed associations, with funding from the National Institutes of Health and Dow Corning Corporation).

7 See Barbara Hulka, Betty Diamond, Nancy Kerkvliet & Peter Tugwell, “Silicone Breast Implants in Relation to Connective Tissue Diseases and Immunologic Dysfunction: A Report by a National Science Panel to the Hon. Sam Pointer Jr., MDL 926 (Nov. 30, 1998).” The court-appointed expert witnesses dedicated a great deal of their professional time to their task of evaluating the plaintiffs’ claims and the evidence. At the end of the process, they all published their litigation work in leading journals. See Barbara Hulka, Nancy Kerkvliet & Peter Tugwell, “Experience of a Scientific Panel Formed to Advise the Federal Judiciary on Silicone Breast Implants,” 342 New Engl. J. Med. 812 (2000); Esther C. Janowsky, Lawrence L. Kupper, and Barbara S. Hulka, “Meta-Analyses of the Relation between Silicone Breast Implants and the Risk of Connective-Tissue Diseases,” 342 New Engl. J. Med. 781 (2000); Peter Tugwell, George Wells, Joan Peterson, Vivian Welch, Jacqueline Page, Carolyn Davison, Jessie McGowan, David Ramroth, and Beverley Shea, “Do Silicone Breast Implants Cause Rheumatologic Disorders? A Systematic Review for a Court-Appointed National Science Panel,” 44 Arthritis & Rheumatism 2477 (2001).

8 Stuart Bondurant, Virginia Ernster, and Roger Herdman, eds., Safety of Silicone Breast Implants (Institute of Medicine) (Wash. D.C. 1999).

9 See also Lester Brickman, “On the Applicability of the Silica MDL Proceeding to Asbestos Litigation,” 12 Conn. Insur. L. J. 289 (2006); Lester Brickman, “Disparities Between Asbestosis and Silicosis Claims Generated By Litigation Screenings and Clinical Studies,” 29 Cardozo L. Rev. 513 (2007).

10 This apt phraseology is due to the late Keith Morgan, whose wit, wisdom, and scientific acumen are greatly missed. See W. Keith C. Morgan, “Meretricious Mensuration,” 6 J. Eval. Clin. Practice 1 (2000).

11 See Deposition of Dr. Juan Sanchez-Ramos, in Street v. Lincoln Elec. Co., Case No. 1:06-cv-17026, 2011 WL 6008514 (N.D. Ohio May 17, 2011).

12 See Deposition of Dr. James Mortimer, in Street v. Lincoln Elec. Co., Case No. 1:06-cv-17026, 2011 WL 6008054 (N.D. Ohio June 29, 2011).

The Education of Judge Rufe – The Zoloft MDL

April 9th, 2016

The Honorable Cynthia M. Rufe is a judge on the United States District Court for the Eastern District of Pennsylvania.  Judge Rufe was elected to a judgeship on the Bucks County Court of Common Pleas in 1994.  She was appointed to the federal district court in 2002. Like most state and federal judges, little in her training and experience as a lawyer prepared her to serve as a gatekeeper of complex expert witness scientific opinion testimony.  And yet, the statutory code of evidence, and in particular, Federal Rules of Evidence 702 and 703, requires her to do just that.

The normal approach to MDL cases is marked by the Field of Dreams: “if you build it, they will come.” Last week, Judge Rufe did something that is unusual in pharmaceutical litigation; she closed the gate and sent everyone home. In re Zoloft Prod. Liab. Litig., MDL NO. 2342, 12-MD-2342, 2016 WL 1320799 (E.D. Pa. April 5, 2016).

Her Honor’s decision was hardly made in haste.  The MDL began in 2012, and proceeded in a typical fashion with case management orders that required the exchange of general causation expert witness reports. The plaintiffs’ steering committee (PSC), acting for the plaintiffs, served the report of only one epidemiologist, Anick Bérard, who took the position that Zoloft causes virtually every major human congenital anomaly known to medicine. The defendants challenged the admissibility of Bérard’s opinions.  After extensive briefings and evidentiary hearings, the trial court found that Bérard’s opinions were riddled with inconsistent assessments of studies, eschewed generally accepted methods of causal inference, ignored contrary evidence, adopted novel, unreliable methods of endorsing “trends” in studies, and failed to address epidemiologic studies that did not support her subjective opinions. In re Zoloft Prods. Liab. Litig., 26 F. Supp. 3d 449 (E.D.Pa.2014). The trial court permitted plaintiffs an opportunity to seek reconsideration of Bérard’s exclusion, which led to the trial court’s reaffirming its previous ruling. In re Zoloft Prods. Liab. Litig., No. 12–md–2342, 2015 WL 314149, at *2 (E.D.Pa. Jan. 23, 2015).

Notwithstanding the PSC’s claims that Bérard was the best qualified expert witness in her field and that she was the only epidemiologist needed to support the plaintiffs’ causal claims, the MDL court indulged the PSC by permitting plaintiffs another bite at the apple.  Over defendants’ objections, the court permitted the PSC to name yet another expert witness, statistician Nicholas Jewell, to do what Bérard had failed to do: proffer an opinion on general causation supported by sound science.  In re Zoloft Prods. Liab. Litig., No. 12–md–2342, 2015 WL 115486, at * 2 (E.D.Pa. Jan. 7, 2015).

As a result of this ruling, the MDL dragged on for over a year, in which time, the PSC served a report by Jewell, and then the defendants conducted a discovery deposition of Jewell, and lodged a new Rule 702 challenge.  Although Jewell brought more statistical sophistication to the task, he could not transmute lead into gold; nor could he support the plaintiffs’ causal claims without committing most of the same fallacies found in Bérard’s opinions.  After another round of Rule 702 briefs and hearings, the MDL court excluded Jewell’s unwarranted causal opinions. In re Zoloft Prods. Liab. Litig., No. 12–md–2342, 2015 WL 7776911 (E.D.Pa. Dec. 2, 2015).

The successive exclusions of Bérard and Jewell left the MDL court in a peculiar position. There were other witnesses, Robert Cabrera, a teratologist, Michael Levin, a molecular biologist, and Thomas Sadler, an embryologist, whose opinions addressed animal toxicologic studies, biological plausibility, and putative mechanisms.  These other witnesses, however, had little or no competence in epidemiology, and they explicitly relied upon Bérard’s opinions with respect to human outcomes.  As a result of Bérard’s exclusion, these witnesses were left free to offer their views about what happens in animals at high doses, or about theoretical mechanisms, but they were unable to address human causation.

Although the PSC had no expert witnesses who could legitimately offer reasonably supported opinions about the causation of human birth defects, the plaintiffs refused to decamp and leave the MDL forum. Faced with the prospect of not trying their cases to juries, the PSC instead tried the patience of the MDL judge. The PSC pulled out the stops in adducing weak, irrelevant, and invalid evidence to support their claims, sans epidemiologic expertise. The PSC argued that adverse event reports, internal company documents that discussed possible associations, the biological plausibility opinions of Levin and Sadler, the putative mechanism opinions of Cabrera, differential diagnoses offered to support specific causation, and the hip-shot opinions of a former-FDA-commissioner-for-hire, David Kessler, could come together magically to supply sufficient evidence to have their cases submitted to juries. Judge Rufe saw through the transparent effort to manufacture evidence of causation, and granted summary judgment on all remaining Zoloft cases in the MDL. In re Zoloft Prod. Liab. Litig., MDL NO. 2342, 12-MD-2342, 2016 WL 1320799, at *4 (E.D. Pa. April 5, 2016).

After a full briefing and hearing on Bérard’s opinion, a reconsideration of Bérard, a permitted “do over” of general causation with Jewell, and a full briefing and hearing on Jewell’s opinions, the MDL court was able to deal deftly with the snippets of evidence “cobbled together” to substitute for evidence that might support a conclusion of causation. The PSC’s cobbled case was puffed up to give the appearance of voluminous evidence, in 200 exhibits that filled six banker’s boxes.  Id. at *5. The ruse was easily undone; most of the exhibits and purported evidence were obvious rubbish. “The quantity of the evidence is not, however, coterminous with the quality of evidence with regard to the issues now before the Court.” Id. The banker’s boxes contained artifices such as untranslated foreign-language documents, and company documents relating to the development and marketing of the medication. The PSC resubmitted reports from Levin, Cabrera, and Sadler, whose opinions were already adjudicated to be incompetent, invalid, irrelevant, or inadequate to support general causation.  The PSC pointed to the specific causation opinions of a clinical cardiologist, Ra-Id Abdulla, M.D., who proffered dubious differential etiologies, ruling in Zoloft as a cause of individual children’s birth defects, despite his inability to rule out known and unknown alternative causes in the differential reasoning.  The MDL court, however, recognized that “[a] differential diagnosis assumes that general causation has been established,” id. at *7, and that Abdulla could not bootstrap general causation by purporting to reach a specific causation opinion (even if those specific causation opinions were legitimate).

The PSC submitted the recent consensus statement of the American Statistical Association (ASA)[1], which it misrepresented to be an epidemiologic study.  Id. at *5. The consensus statement makes some pedestrian pronouncements about the difference between statistical and clinical significance, about the need for other considerations in addition to statistical significance, in supporting causal claims, and the lack of bright-line distinctions for statistical significance in assessing causality.  All true, but immaterial to the PSC’s expert witnesses’ opinions that over-endorsed statistical significance in the few instances in which it was shown, and over-interpreted study data that was based upon data mining and multiple comparisons, in blatant violation of the ASA’s declared principles.

Stretching even further for “human evidence,” the PSC submitted documentary evidence of adverse event reports, as though they could support a causal conclusion.[2]  There are about four million live births each year, with an expected rate of serious cardiac malformations of about one per cent.[3]  The prevalence of SSRI anti-depressant use is at least two per cent, which means that we would expect 800 cardiac birth defects each year to occur in children of mothers who took SSRI anti-depressants in the first trimester. If Zoloft had an average market share of all the SSRIs of about 25 per cent, then 200 cardiac defects each year would occur in children born to mothers who took Zoloft.  Given that Zoloft has been on the market since the early 1990s, we would expect that there would be thousands of children, exposed to Zoloft during embryogenesis, born with cardiac defects, even if there was nothing untoward about maternal exposure to the medication. (A short sketch of this arithmetic appears after this passage.) Add the stimulated reporting of adverse events from lawyers, lawyer advertising, and lawyer instigation, and you have manufactured evidence not probative of causation at all.[4] The MDL court cut deftly and swiftly through the smoke screen:

“These reports are certainly relevant to the generation of study hypotheses, but are insufficient to create a material question of fact on general causation.”

Id. at *9. The MDL court recognized that epidemiology was very important in discerning a causal connection between a common exposure and a common outcome, especially when the outcome has an expected rate in the general population. The MDL court stopped short of holding that epidemiologic evidence was required (which on the facts of the case would have been amply justified), but instead rested its ratio decidendi on the need to account for the extant epidemiology that contradicted or failed to support the strident and subjective opinions of the plaintiffs’ expert witnesses. The MDL court thus gave plaintiffs every benefit of the doubt by limiting its holding on the need for epidemiology to:

“when epidemiological studies are equivocal or inconsistent with a causation opinion, experts asserting causation opinions must thoroughly analyze the strengths and weaknesses of the epidemiological research and explain why that body of research does not contradict or undermine their opinion.”

Id. at *5, quoting from In re Zoloft Prods. Liab. Litig., 26 F. Supp. 3d 449, 476 (E.D. Pa. 2014).
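
For readers who want to check the back-of-the-envelope arithmetic above, here is a minimal sketch. Every input is an approximate figure from the text, not data from any particular study.

```python
# Expected *background* cardiac defects among exposed births, assuming
# no effect of the drug (approximate figures from the text).
live_births_per_year   = 4_000_000
p_cardiac_malformation = 0.01    # ~1% serious cardiac malformations
p_ssri_first_trimester = 0.02    # ~2% SSRI use prevalence
zoloft_market_share    = 0.25    # assumed average share of the SSRIs

ssri_cases = live_births_per_year * p_cardiac_malformation * p_ssri_first_trimester
zoloft_cases = ssri_cases * zoloft_market_share
print(round(ssri_cases))     # 800 expected per year among SSRI-exposed
print(round(zoloft_cases))   # 200 expected per year among Zoloft-exposed
# Over two decades on the market, thousands of coincidental
# Zoloft-exposed cases are expected, which is why raw adverse event
# reports cannot, without more, show causation.
```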

The MDL court also saw through the thin veneer of respectability of the testimony of David Kessler, a former FDA commissioner who helped make large fortunes for some of the members of the PSC by the feeding frenzy he created with his moratorium on silicone gel breast implants.  Even viewing Kessler’s proffered testimony in the most charitable light, the court recognized that he offered little support for a causal conclusion other than to delegate the key issues to epidemiologists. Id. at *9. As for the boxes of regulatory documents, foreign labels, and internal company memoranda, the MDL court found that these documents did not raise a genuine issue of material fact concerning general causation:

“Neither these documents, nor draft product documents or foreign product labels containing language that advises use of birth control by a woman taking Zoloft constitute an admission of causation, as opposed to acknowledging a possible association.”

Id.

In the end, the MDL court found that the PSC’s many banker boxes of paper contained too much of nothing for the issue at hand.  Having put the defendants through the time and expense of litigating and re-litigating these issues, nothing short of dismissing the pending cases was a fair and appropriate outcome to the Zoloft MDL.

_______________________________________

Given the denouement of the Zoloft MDL, it is worth considering the MDL judge’s handling of the scientific issues raised, misrepresented, argued, or relied upon by the parties.  Judge Rufe was required, by Rules 702 and 703, to roll up her sleeves and assess the methodological validity of the challenged expert witnesses’ opinions.  That Her Honor was able to do this is a testament to her hard work. Zoloft was not Judge Rufe’s first MDL, and she clearly learned a lot from her previous judicial assignment to an MDL for Avandia personal injury actions.

On May 21, 2007, the New England Journal of Medicine published online a seriously flawed meta-analysis of cardiovascular disease outcomes and rosiglitazone (Avandia) use.  See Steven E. Nissen, M.D., and Kathy Wolski, M.P.H., “Effect of Rosiglitazone on the Risk of Myocardial Infarction and Death from Cardiovascular Causes,” 356 New Engl. J. Med. 2457 (2007).  The Nissen article did not appear in print until June 14, 2007, but the first lawsuits resulted within a day or two of the in-press version. The lawsuits soon thereafter reached a critical mass, with the inevitable creation of a federal court Multi-District Litigation.

Within a few weeks of Nissen’s article, the Annals of Internal Medicine published an editorial by Cynthia Mulrow and other editors, which questioned the Nissen meta-analysis[5] and introduced an article that attempted to replicate Nissen’s work[6].  The attempted replication showed that the only way Nissen could have obtained his nominally statistically significant result was to have selected a method, Peto’s fixed-effect method, known to be biased for use with clinical trials with uneven arms. Random-effects methods, more appropriate for the clinically heterogeneous clinical trials, consistently failed to replicate the Nissen result. Other statisticians weighed in and pointed out that using the risk difference made much more sense when there were multiple trials with zero events in one or the other or both arms of the trials. Trials with zero cardiovascular events in both arms represented important evidence of low, but equal risk, of heart attacks, which should be captured in an appropriate analysis.  When the risk difference approach was used, with exact statistical methods, there was no statistically significant increase in risk in the dataset used by Nissen.[7] Other scientists, including some of Nissen’s own colleagues at the Cleveland Clinic, and John Ioannidis, weighed in to note how fragile and insubstantial the Nissen meta-analysis was[8]:

“As [the] rosiglitazone case demonstrates, minor modifications of the meta-analysis protocol can change the statistical significance of the result.  For small effects, even the direction of the treatment effect estimate may change.”
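
The point about zero-event trials lends itself to a toy illustration. The trial counts below are invented, and the pooling is deliberately crude (an unweighted mean); the sketch shows only the structural issue: odds-ratio methods such as Peto’s discard trials with no events in either arm, while a risk-difference analysis retains their information.

```python
# Toy data: (events_drug, n_drug, events_control, n_control)
trials = [
    (2, 500, 1, 500),
    (0, 400, 0, 400),   # double-zero trial: evidence of low, equal risk
    (1, 300, 2, 300),
    (0, 250, 0, 250),   # double-zero trial
]

# Odds-ratio pooling can use only trials with at least one event.
usable = [t for t in trials if t[0] + t[2] > 0]
print(f"{len(usable)} of {len(trials)} trials usable for an OR analysis")

# A risk-difference analysis keeps all trials; here, a crude unweighted mean.
rds = [e1 / n1 - e0 / n0 for e1, n1, e0, n0 in trials]
print(f"mean risk difference = {sum(rds) / len(rds):+.5f}")
# The double-zero trials pull the pooled risk difference toward zero,
# information that an OR-based meta-analysis silently discards.
```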

Nissen achieved his political objective with his shaky meta-analysis.  The FDA convened an Advisory Committee meeting, which in turn resulted in a negative review of the safety data, and the FDA’s imposition of warnings and a Risk Evaluation and Mitigation Strategy, which all but prohibited use of rosiglitazone.[9]  A clinical trial, RECORD, sponsored by the drug’s sponsor, GlaxoSmithKline, had already started, and fortunately it was allowed to continue.

On a parallel track to the regulatory activities, the federal MDL, headed by Judge Rufe, proceeded to motions and a hearing on GSK’s Rule 702 challenge to plaintiffs’ evidence of general causation. The federal MDL trial judge denied GSK’s motions to exclude plaintiffs’ causation witnesses in an opinion that showed significant diffidence in addressing scientific issues.  In re Avandia Marketing, Sales Practices and Product Liability Litigation, 2011 WL 13576, *12 (E.D. Pa. 2011).  See “Learning to Embrace Flawed Evidence – The Avandia MDL’s Daubert Opinion” (Jan. 10, 2011).

After Judge Rufe denied GSK’s challenges to the admissibility of plaintiffs’ expert witnesses’ causation opinions in the Avandia MDL, the RECORD trial was successfully completed and published.[10]  RECORD was a long-term, prospectively designed, randomized cardiovascular trial in over 4,400 patients, followed for an average of 5.5 years.  The trial was designed with a non-inferiority end point of ruling out a 20% increased risk when compared with standard-of-care diabetes treatment. The trial achieved its end point, with a hazard ratio of 0.99 (95% confidence interval, 0.85-1.16) for cardiovascular hospitalization and death. A readjudication of outcomes by the Duke Clinical Research Institute confirmed the published results.
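
The non-inferiority logic reduces to comparing the upper bound of the confidence interval against the pre-specified margin. A one-line sketch, with the figures as reported:

```python
# RECORD: HR 0.99 (95% CI 0.85-1.16); margin of 1.20 rules out a 20% increase.
hr, ci_upper, margin = 0.99, 1.16, 1.20
print("non-inferiority shown:", ci_upper < margin)   # True: 1.16 < 1.20
```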

On Nov. 25, 2013, after convening another Advisory Committee meeting, the FDA announced the removal of most of its restrictions on Avandia:

“Results from [RECORD] showed no elevated risk of heart attack or death in patients being treated with Avandia when compared to standard-of-care diabetes drugs. These data do not confirm the signal of increased risk of heart attacks that was found in a meta-analysis of clinical trials first reported in 2007.”

FDA Press Release, “FDA requires removal of certain restrictions on the diabetes drug Avandia” (Nov. 25, 2013). And in December 2015, the FDA abandoned its requirement of a Risk Evaluation and Mitigation Strategy for Avandia. FDA, “Rosiglitazone-containing Diabetes Medicines: Drug Safety Communication – FDA Eliminates the Risk Evaluation and Mitigation Strategy (REMS)” (Dec. 16, 2015).

GSK’s vindication came too late to reverse Judge Rufe’s decision in the Avandia MDL.  GSK spent over six billion dollars on resolving Avandia claims.  And to add to the company’s chagrin, GSK lost patent protection for Avandia in April 2012.[11]

Something good, however, may have emerged from the Avandia litigation debacle.  Judge Rufe heard from plaintiffs’ expert witnesses in Avandia about the hierarchy of evidence, about how observational studies must be evaluated for bias and confounding, about the importance of statistical significance, and about how studies that lack power to find relevant associations may still yield conclusions with appropriate meta-analysis. Important nuances of meta-analysis methodology may have gotten lost in the kerfuffle, but given that plaintiffs had reasonable quality clinical trial data, Avandia plaintiffs’ counsel could eschew their typical reliance upon weak and irrelevant lines of evidence, based upon case reports, adverse event disproportional reporting, and the like.

The Zoloft litigation introduced Judge Rufe to a more typical pharmaceutical litigation. Because the outcomes of interest were birth defects, there were no clinical trials.  To be sure, there were observational epidemiologic studies, but now the defense expert witnesses were carefully evaluating the studies for bias and confounding, and the plaintiffs’ expert witnesses were double counting studies and ignoring multiple comparisons and validity concerns.  Once again, in the Zoloft MDL, plaintiffs’ expert witnesses made their non-specific complaints about “lack of power” (without ever specifying the relevant alternative hypothesis), but it was the defense expert witnesses who cited relevant meta-analyses that attempted to do something about the supposed lack of power. Plaintiffs’ expert witnesses inconsistently argued “lack of power” to disregard studies that had outcomes that undermined their opinions, even when those studies had narrow confidence intervals surrounding values at or near 1.0.

The Avandia litigation laid the foundation for Judge Rufe’s critical scrutiny by exemplifying the nature and quantum of evidence to support a reasonable scientific conclusion.  Notwithstanding the mistakes made in the Avandia litigation, this earlier MDL created an invidious distinction with the Zoloft PSC’s evidence and arguments, which looked as weak and insubstantial as they really were.


[1] Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” The American Statistician, available online (Mar. 7, 2016), in-press at DOI:10.1080/00031305.2016.1154108, <http://dx.doi.org/10.1080/>. See “The American Statistical Association’s Statement on and of Significance” (Mar. 17, 2016); “The ASA’s Statement on Statistical Significance – Buzzing from the Huckabees” (Mar. 19, 2016).

[2] See 21 C.F.R. § 314.80(a) (postmarketing reporting of adverse drug experiences) (defining “[a]dverse drug experience” as “[a]ny adverse event associated with the use of a drug in humans, whether or not considered drug related”).

[3] See Centers for Disease Control and Prevention, “Birth Defects Home Page” (last visited April 8, 2016).

[4] See, e.g., Derrick J. Stobaugh, Parakkal Deepak, & Eli D. Ehrenpreis, “Alleged isotretinoin-associated inflammatory bowel disease: Disproportionate reporting by attorneys to the Food and Drug Administration Adverse Event Reporting System,” 69 J. Am. Acad. Dermatol. 393 (2013) (documenting stimulated reporting from litigation activities).

[5] Cynthia D. Mulrow, John Cornell & A. Russell Localio, “Rosiglitazone: A Thunderstorm from Scarce and Fragile Data,” 147 Ann. Intern. Med. 585 (2007).

[6] George A. Diamond, Leon Bax & Sanjay Kaul, “Uncertain Effects of Rosiglitazone on the Risk for Myocardial Infarction and Cardiovascular Death,” 147 Ann. Intern. Med. 578 (2007).

[7] Lu Tian, et al., “Exact and efficient inference procedure for meta-analysis and its application to the analysis of independent 2 × 2 tables with all available data but without artificial continuity correction,” 10 Biostatistics 275 (2008).

[8] Adrian V. Hernandez, Esteban Walker, John P.A. Ioannidis,  and Michael W. Kattan, “Challenges in meta-analysis of randomized clinical trials for rare harmful cardiovascular events: the case of rosiglitazone,” 156 Am. Heart J. 23, 28 (2008).

[9] Janet Woodcock, FDA Decision Memorandum (Sept. 22, 2010).

[10] Philip D. Home, et al., “Rosiglitazone evaluated for cardiovascular outcomes in oral agent combination therapy for type 2 diabetes (RECORD): a multicentre, randomised, open-label trial,” 373 Lancet 2125 (2009).

[11] “Pharmacovigilantism – Avandia Litigation” (Nov. 27, 2013).

Systematic Reviews and Meta-Analyses in Litigation

February 5th, 2016

Kathy Batty is a bellwether plaintiff in a multi-district litigation[1] (MDL) against Zimmer, Inc., in which hundreds of plaintiffs claim that Zimmer’s NexGen Flex implants are prone to have their femoral and tibial elements prematurely aseptically loosen (independent of any infection). Batty v. Zimmer, Inc., MDL No. 2272, Master Docket No. 11 C 5468, No. 12 C 6279, 2015 WL 5050214 (N.D. Ill. Aug. 25, 2015) [cited as Batty].

PRISMA Guidelines for Systematic Reviews

Zimmer proffered Dr. Michael G. Vitale, an orthopedic surgeon, with a master’s degree in public health, to testify that, in his opinion, Batty’s causal claims were unfounded. Batty at *4. Dr. Vitale prepared a Rule 26 report that presented a formal, systematic review of the pertinent literature. Batty at *3. Plaintiff Batty challenged the admissibility of Dr. Vitale’s opinion on grounds that his purportedly “formal systematic literature review,” done for litigation, was biased and unreliable, and not conducted according to generally accepted principles for such reviews. The challenge was framed, cleverly, in terms of Dr. Vitale’s failure to comply with a published set of principles outlined in “PRISMA” guidelines (Preferred Reporting Items for Systematic reviews and Meta-Analyses), which enjoy widespread general acceptance among the clinical journals. See David Moher, Alessandro Liberati, Jennifer Tetzlaff, Douglas G. Altman, & The PRISMA Group, “Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement,” 6 PLoS Med e1000097 (2009) [PRISMA]. Batty at *5. The trial judge, Hon. Rebecca R. Pallmeyer, denied plaintiff’s motion to exclude Dr. Vitale, but in doing so accepted, arguendo, the plaintiff’s implicit premise that an expert witness’s opinion should be reached in the manner of a carefully constructed systematic review.

The plaintiff’s invocation of the PRISMA guidelines presented several difficult problems for her challenge and for the court. PRISMA provides a checklist of 27 items for journal editors to assess the quality and completeness of systematic reviews that are submitted for publication. Plaintiff Batty focused on several claimed deviations from the guidelines:

  • “failing to explicitly state his study question,
  • failing to acknowledge the limitations of his review,
  • failing to present his findings graphically, and
  • failing to reproduce his search results.”

Batty’s challenge to Dr. Vitale thus turned on whether Zimmer’s expert witness had failed to deploy the “same level of intellectual rigor” as someone in the world of clinical medicine would [should] have in conducting a similar systematic review. Batty at *6.

Zimmer deflected the challenge, in part by arguing that PRISMA’s guidelines are for the reporting of systematic reviews, and they are not necessarily criteria for valid reviews. The trial court accepted this rebuttal, Batty at *7, but missed the point that some of the guidelines call for methods that are essential for rigorous, systematic reviews, in any forum, and do not merely specify “publishability.” To be sure, PRISMA itself does not always distinguish between what is essential for journal publication, as opposed to what is needed for a sufficiently valid systematic review. The guidelines, for instance, call for graphical displays, but in litigation, charts, graphs, and other demonstratives are often not produced until the eve of trial, when case management orders call for the parties to exchange such materials. In any event, Dr. Vitale’s omission of graphical representations of his findings was consistent with his finding that the studies were too clinically heterogeneous in study design, follow-up time, and pre-specified outcomes to permit nice, graphical summaries. Batty at *7-8.

Similarly, the PRISMA guidelines call for a careful specification of the clinical question to be answered, but in litigation, the plaintiff’s causal claims frame the issue to be addressed by the defense expert witness’s literature review. The trial court readily found that Dr. Vitale’s research question was easily discerned from the context of his report in the particular litigation. Batty at *7.

Plaintiff Batty’s challenge pointed to Dr. Vitale’s failure to acknowledge explicitly the limitations of his systematic review, an omission that virtually defines expert witness reports in litigation. Given the availability of discovery tools, such as a deposition of Dr. Vitale (at which he readily conceded the limitations of his review), and the right of confrontation and cross-examination (which are not available, alas, for published articles), the trial court found that this alleged deviation was not particularly relevant to the plaintiff’s Rule 702 challenge. Batty at *8.

Batty further charged that Dr. Vitale had not “reproduced” his own systematic review. Arguing that a systematic review’s results must be “transparent and reproducible,” Batty claimed that Zimmer’s expert witness’s failure to compile a list of studies that were originally retrieved from his literature search deprived her, and the trial court, of the ability to determine whether the search was complete and unbiased. Batty at *8. Dr. Vitale’s search protocol and inclusionary and exclusionary criteria were, however, stated, explained, and reproducible, even though Dr. Vitale did not explain the application of his criteria to each individual published paper. In the final analysis, the trial court was unmoved by Batty’s critique, especially given that her expert witnesses failed to identify any relevant studies omitted from Dr. Vitale’s review. Batty at *8.

Lumping or Commingling of Heterogeneous Studies

The plaintiff pointed to Dr. Vitale’s “commingling” of studies, heterogeneous in terms of “study length, follow-up, size, design, power, outcome, range of motion, component type” and other clinical features, as a deep flaw in the challenged expert witness’s methodology. Batty at *9. Batty’s own retained expert witness, Dr. Kocher, supported Batty’s charge by adverting to the clinical variability in studies included in Dr. Vitale’s review, and suggesting that “[h]igh levels of heterogeneity preclude combining study results and making conclusions based on combining studies.” Dr. Kocher’s argument was rather beside the point because Dr. Vitale had not impermissibly combined clinically or statistically heterogeneous outcomes.[2] Similarly, the plaintiff’s complaint that Dr. Vitale had used inconsistent criteria of knee implant survival rates was dismissed by the trial court, which easily found Dr. Vitale’s survival criteria both pre-specified and consistent across his review of studies, and relevant to the specific defect alleged by Ms. Batty. Batty at *9.

Cherry Picking

The trial court readily agreed with Plaintiff’s premise that an expert witness who used inconsistent inclusionary and exclusionary criteria would have to be excluded under Rule 702. Batty at *10, citing In re Zoloft, 26 F. Supp. 3d 449, 460–61 (E.D. Pa. 2014) (excluding epidemiologist Dr. Anick Bérard’s proffered testimony because of her biased cherry picking and selection of studies to support her opinions, and her failure to account for contradictory evidence). The trial court, however, did not find that Dr. Vitale’s review was corrupted by the kind of biased cherry picking that Judge Rufe found to have been committed by Dr. Anick Bérard, in the Zoloft MDL.

Duplicitous Duplication

Plaintiff’s challenge of Dr. Vitale did manage to spotlight an error in Dr. Vitale’s inclusion of two studies that were duplicate analyses of the same cohort. Apparently, Dr. Vitale had confused the studies as not being of the same cohort because the two papers reported different sample sizes. Dr. Vitale admitted that his double counting the same cohort “got by the peer-review process and it got by my filter as well.” Batty at *11, citing Vitale Dep. 284:3–12. The trial court judged Dr. Vitale’s error to have been:

“an inadvertent oversight, not an attempt to distort the data. It is also easily correctable by removing one of the studies from the Group 1 analysis so that instead of 28 out of 35 studies reporting 100% survival rates, only 27 out of 34 do so.”

Batty at *11.

The error of double counting studies in quantitative reviews and meta-analyses has become a prevalent problem in both published studies[3] and in litigation reports. Epidemiologic studies are sometimes updated and extended with additional follow-up. The prohibition against double counting data is so obvious that it often is not even identified on checklists, such as PRISMA. Furthermore, double counting of studies, or subgroups within studies, is a flaw that most careful readers can identify in a meta-analysis, without advance training. According to statistician Stephen Senn, double counting of evidence is a serious problem in published meta-analytical studies.[4] Senn observes that he had little difficulty in finding examples of meta-analyses gone wrong, including meta-analyses with double counting of studies or data, in some of the leading clinical medical journals. Senn urges analysts to “[b]e vigilant about double counting,” and recommends that journals withdraw meta-analyses promptly when mistakes are found.[5]
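The arithmetic of the spurious precision is easy to exhibit. Below is a minimal fixed-effect, inverse-variance pooling sketch, with made-up odds ratios and standard errors rather than data from any actual review, showing how counting one study twice can shrink the confidence interval enough to turn a null-spanning summary estimate into a nominally significant one:

```python
# Illustration (hypothetical inputs) of how double counting a study in a
# fixed-effect, inverse-variance meta-analysis manufactures precision.
from math import exp, log, sqrt

def pool(log_ors, ses):
    """Fixed-effect inverse-variance pooled log odds ratio and its SE."""
    weights = [1 / se**2 for se in ses]
    pooled = sum(w * lo for w, lo in zip(weights, log_ors)) / sum(weights)
    return pooled, sqrt(1 / sum(weights))

studies = [(log(1.5), 0.25), (log(1.2), 0.25), (log(1.3), 0.35)]

for label, data in [("correct      ", studies),
                    ("study 1 twice", studies + [studies[0]])]:
    lo, se = pool([s[0] for s in data], [s[1] for s in data])
    lcl, ucl = exp(lo - 1.96 * se), exp(lo + 1.96 * se)
    print(f"{label}: OR {exp(lo):.2f} (95% CI {lcl:.2f}-{ucl:.2f})")
# correct      : OR 1.33 (95% CI 0.98-1.82)  -- crosses 1.0
# study 1 twice: OR 1.38 (95% CI 1.06-1.79)  -- spuriously "significant"
```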

An expert witness who wished to skate over the replication and consistency requirement might be tempted, as was Dr. Michael Freeman, to treat the earlier and later iterations of the same basic study as “replication.” Proper methodology, however, prohibits double dipping data to count the later study that subsumes the early one as a “replication”:

“Generally accepted methodology considers statistically significant replication of study results in different populations because apparent associations may reflect flaws in methodology. Dr. Freeman claims the Alwan and Reefhuis studies demonstrate replication. However, the population Alwan studied is only a subset of the Reefhuis population and therefore they are effectively the same.”

Porter v. SmithKline Beecham Corp., No. 03275, 2015 WL 5970639, at *9 (Phila. Cty. Pennsylvania, Ct. C.P. October 5, 2015) (Mark I. Bernstein, J.).

Conclusions

The PRISMA and similar guidelines do not necessarily map the requisites of admissible expert witness opinion testimony, but they are a source of some important considerations for the validity of any conclusion about causality. On the other hand, by specifying the requisites of a good publication, some PRISMA guidelines are irrelevant to litigation reports and testimony of expert witnesses. Although Plaintiff Batty’s challenge overreached and failed, the premise of her challenge is noteworthy, as is the trial court’s having taken the premise seriously. Ultimately, the challenge to Dr. Vitale’s opinion failed because the specified PRISMA guidelines, supposedly violated, were either irrelevant or satisfied.


[1] Zimmer Nexgen Knee Implant Products Liability Litigation.

[2] Dr. Vitale’s review is thus easily distinguished from what has become commonplace in litigation of birth defect claims, where, for instance, some well-known statisticians [names available upon request] have conducted qualitative reviews and quantitative meta-analyses of highly disparate outcomes, such as any and all cardiovascular congenital anomalies. In one such case, a statistician expert witness hired by plaintiffs presented a meta-analysis that included study results of any nervous system defect, and central nervous system defect, and any neural tube defect, without any consideration of clinical heterogeneity or even overlap with study results.

[3] See, e.g., Shekoufeh Nikfar, Roja Rahimi, Narjes Hendoiee, and Mohammad Abdollahi, “Increasing the risk of spontaneous abortion and major malformations in newborns following use of serotonin reuptake inhibitors during pregnancy: A systematic review and updated meta-analysis,” 20 DARU J. Pharm. Sci. 75 (2012); Roja Rahimi, Shekoufeh Nikfar, Mohammad Abdollahi, “Pregnancy outcomes following exposure to serotonin reuptake inhibitors: a meta-analysis of clinical trials,” 22 Reproductive Toxicol. 571 (2006); Anick Bérard, Noha Iessa, Sonia Chaabane, Flory T. Muanda, Takoua Boukhris, and Jin-Ping Zhao, “The risk of major cardiac malformations associated with paroxetine use during the first trimester of pregnancy: A systematic review and meta-analysis,” 81 Brit. J. Clin. Pharmacol. (2016), in press, available at doi: 10.1111/bcp.12849.

[4] Stephen J. Senn, “Overstating the evidence – double counting in meta-analysis and related problems,” 9 BMC Medical Research Methodology 10, at *1 (2009).

[5] Id. at *1, *4.


DOUBLE-DIP APPENDIX

Some papers and textbooks, in addition to Stephen Senn’s paper cited above, note the impermissible method of double counting data or studies in quantitative reviews.

Aaron Blair, Jeanne Burg, Jeffrey Foran, Herman Gibb, Sander Greenland, Robert Morris, Gerhard Raabe, David Savitz, Jane Teta, Dan Wartenberg, Otto Wong, and Rae Zimmerman, “Guidelines for Application of Meta-analysis in Environmental Epidemiology,” 22 Regulatory Toxicol. & Pharmacol. 189, 190 (1995).

“II. Desirable and Undesirable Attributes of Meta-Analysis

* * *

Redundant information: When more than one study has been conducted on the same cohort, the later or updated version should be included and the earlier study excluded, provided that later versions supply adequate information for the meta-analysis. Exclusion of, or in rare cases, carefully adjusting for overlapping or duplicated studies will prevent overweighting of the results by one study. This is a critical issue where the same cohort is reexamined or updated several times. Where duplication exists, decision criteria should be developed to determine which of the studies are to be included and which excluded.”

Sander Greenland & Keith O’Rourke, “Meta-Analysis – Chapter 33,” in Kenneth J. Rothman, Sander Greenland, Timothy L. Lash, Modern Epidemiology 652, 655 (3d ed. 2008) (emphasis added)

Conducting a Sound and Credible Meta-Analysis

Like any scientific study, an ideal meta-analysis would follow an explicit protocol that is fully replicable by others. This ideal can be hard to attain, but meeting certain conditions can enhance soundness (validity) and credibility (believability). Among these conditions we include the following:

  • A clearly defined set of research questions to address.

  • An explicit and detailed working protocol.

  • A replicable literature-search strategy.

  • Explicit study inclusion and exclusion criteria, with a rationale for each.

  • Nonoverlap of included studies (use of separate subjects in different included studies), or use of statistical methods that account for overlap.* * * * *”

Matthias Egger, George Davey Smith, and Douglas G. Altman, Systematic Reviews in Health Care: Meta-Analysis in Context 59 – 60 (2001).

Duplicate (multiple) publication bias

***

The production of multiple publications from single studies can lead to bias in a number of ways.85 Most importantly, studies with significant results are more likely to lead to multiple publications and presentations,45 which makes it more likely that they will be located and included in a meta-analysis. The inclusion of duplicated data may therefore lead to overestimation of treatment effects, as recently demonstrated for trials of the efficacy of ondansetron to prevent postoperative nausea and vomiting86.”

Khalid Khan, Regina Kunz, Joseph Kleijnen, and Gerd Antes, Systematic Reviews to Support Evidence-Based Medicine: How to Review and Apply Findings of Healthcare Research 35 (2d ed. 2011)

“2.3.5 Selecting studies with duplicate publication

Reviewers often encounter multiple publications of the same study. Sometimes these will be exact  duplications, but at other times they might be serial publications with the more recent papers reporting increasing numbers of participants or lengths of follow-up. Inclusion of duplicated data would inevitably bias the data synthesis in the review, particularly because studies with more positive results are more likely to be duplicated. However, the examination of multiple reports of the same study may provide useful information about its quality and other characteristics not captured by a single report. Therefore, all such reports should be examined. However, the data should only be counted once using the largest, most complete report with the longest follow-up.”

Julia H. Littell, Jacqueline Corcoran, and Vijayan Pillai, Systematic Reviews and Meta-Analysis 62-63 (2008)

Duplicate and Multiple Reports

***

It is a bit more difficult to identify multiple reports that emanate from a single study. Sometimes these reports will have the same authors, sample sizes, program descriptions, and methodological details. However, author lines and sample sizes may vary, especially when there are reports on subsamples taken from the original study (e.g., preliminary results or special reports). Care must be taken to ensure that we know which reports are based on the same samples or on overlapping samples—in meta-analysis these should be considered multiple reports from a single study. When there are multiple reports on a single study, we put all of the citations for that study together in summary information on the study.”

Kay Dickersin, “Publication Bias: Recognizing the Problem, Understanding Its Origins and Scope, and Preventing Harm,” Chapter 2, in Hannah R. Rothstein, Alexander J. Sutton & Michael Borenstein, Publication Bias in Meta-Analysis – Prevention, Assessment and Adjustments 11, 26 (2005)

“Positive results appear to be published more often in duplicate, which can lead to overestimates of a treatment effect (Timmer et al., 2002).”

Julian P.T. Higgins & Sally Green, eds., Cochrane Handbook for Systematic Reviews of Interventions 152 (2008)

“7.2.2 Identifying multiple reports from the same study

Duplicate publication can introduce substantial biases if studies are inadvertently included more than once in a meta-analysis (Tramèr 1997). Duplicate publication can take various forms, ranging from identical manuscripts to reports describing different numbers of participants and different outcomes (von Elm 2004). It can be difficult to detect duplicate publication, and some ‘detective work’ by the review authors may be required.”

Don’t Double Dip Data

March 9th, 2015

Meta-analyses have become commonplace in epidemiology and in other sciences. When well conducted and transparently reported, meta-analyses can be extremely helpful. In several litigations, meta-analyses determined the outcome of the medical causation issues. In the silicone gel breast implant litigation, after defense expert witnesses proffered meta-analyses[1], court-appointed expert witnesses adopted the approach and featured meta-analyses in their reports to the MDL court[2].

In the welding fume litigation, plaintiffs’ expert witness offered a crude, non-quantified, “vote counting” exercise to argue that welding causes Parkinson’s disease[3]. In rebuttal, one of the defense expert witnesses offered a quantitative meta-analysis, which provided strong evidence against plaintiffs’ claim.[4] Although the welding fume MDL court excluded the defense expert’s meta-analysis from the pre-trial Rule 702 hearing as untimely, plaintiffs’ counsel soon thereafter initiated settlement discussions of the entire set of MDL cases. Subsequently, the defense expert witness, with his professional colleagues, published an expanded version of the meta-analysis.[5]

And last month, a meta-analysis proffered by a defense expert witness helped dispatch a long-festering litigation in New Jersey’s multi-county isotretinoin (Accutane) litigation. In re Accutane Litig., No. 271(MCL), 2015 WL 753674 (N.J. Super., Law Div., Atlantic Cty., Feb. 20, 2015) (excluding plaintiffs’ expert witness David Madigan).

Of course, when a meta-analysis is done improperly, the resulting analysis may be worse than none at all. Some methodological flaws involve arcane statistical concepts and procedures, and may be easily missed. Other flaws are flagrant and call for a gatekeeping bucket brigade.

When a merchant puts his hand on the scale at the check-out counter, we call that fraud. When George Costanza double dipped his chip in the chip dip, he was properly called out for his boorish and unsanitary practice. When a statistician or epidemiologist produces a meta-analysis that double counts crucial data to inflate a summary estimate of association, or to create spurious precision in the estimate, we don’t need to crack open Modern Epidemiology or the Reference Manual on Scientific Evidence to know that something fishy has taken place.

In litigation involving claims that selective serotonin reuptake inhibitors cause birth defects, plaintiffs’ expert witness, a perinatal epidemiologist, relied upon two published meta-analyses[6]. In an examination before trial, this epidemiologist was confronted with the double counting (and other data entry errors) in the relied-upon meta-analyses, and she readily agreed that the meta-analyses were improperly done and that she had to abandon her reliance upon them.[7] The result of the expert witness’s deposition epiphany, however, was that she no longer had the illusory benefit of an aggregation of data, with an outcome supporting her opinion. The further consequence was that her opinion succumbed to a Rule 702 challenge. See In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., MDL No. 2342; 12-md-2342, 2014 U.S. Dist. LEXIS 87592; 2014 WL 2921648 (E.D. Pa. June 27, 2014) (Rufe, J.).

Double counting of studies, or subgroups within studies, is a flaw that most careful readers can identify in a meta-analysis, without advance training. According to statistician Stephen Senn, double counting of evidence is a serious problem in published meta-analytical studies. Stephen J. Senn, “Overstating the evidence – double counting in meta-analysis and related problems,” 9 BMC Medical Research Methodology 10, at *1 (2009). Senn observes that he had little difficulty in finding examples of meta-analyses gone wrong, including meta-analyses with double counting of studies or data, in some of the leading clinical medical journals. Id. Senn urges analysts to “[b]e vigilant about double counting,” id. at *4, and recommends that journals withdraw meta-analyses promptly when mistakes are found, id. at *1.

Similar advice abounds in books and journals[8]. Professor Sander Greenland addresses the issue in his chapter on meta-analysis in Modern Epidemiology:

Conducting a Sound and Credible Meta-Analysis

Like any scientific study, an ideal meta-analysis would follow an explicit protocol that is fully replicable by others. This ideal can be hard to attain, but meeting certain conditions can enhance soundness (validity) and credibility (believability). Among these conditions we include the following:

  • A clearly defined set of research questions to address.

  • An explicit and detailed working protocol.

  • A replicable literature-search strategy.

  • Explicit study inclusion and exclusion criteria, with a rationale for each.

  • Nonoverlap of included studies (use of separate subjects in different included studies), or use of statistical methods that account for overlap. * * * * *”

Sander Greenland & Keith O’Rourke, “Meta-Analysis – Chapter 33,” in Kenneth J. Rothman, Sander Greenland, Timothy L. Lash, Modern Epidemiology 652, 655 (3d ed. 2008) (emphasis added).

Just remember George Costanza; don’t double dip that chip, and don’t double dip in the data.


[1] See, e.g., Otto Wong, “A Critical Assessment of the Relationship between Silicone Breast Implants and Connective Tissue Diseases,” 23 Regulatory Toxicol. & Pharmacol. 74 (1996).

[2] See Barbara Hulka, Betty Diamond, Nancy Kerkvliet & Peter Tugwell, “Silicone Breast Implants in Relation to Connective Tissue Diseases and Immunologic Dysfunction:  A Report by a National Science Panel to the Hon. Sam Pointer Jr., MDL 926 (Nov. 30, 1998)”; Barbara Hulka, Nancy Kerkvliet & Peter Tugwell, “Experience of a Scientific Panel Formed to Advise the Federal Judiciary on Silicone Breast Implants,” 342 New Engl. J. Med. 812 (2000).

[3] Deposition of Dr. Juan Sanchez-Ramos, Street v. Lincoln Elec. Co., Case No. 1:06-cv-17026, 2011 WL 6008514 (N.D. Ohio May 17, 2011).

[4] Deposition of Dr. James Mortimer, Street v. Lincoln Elec. Co., Case No. 1:06-cv-17026, 2011 WL 6008054 (N.D. Ohio June 29, 2011).

[5] James Mortimer, Amy Borenstein & Laurene Nelson, “Associations of Welding and Manganese Exposure with Parkinson’s Disease: Review and Meta-Analysis,” 79 Neurology 1174 (2012).

[6] Shekoufeh Nikfar, Roja Rahimi, Narjes Hendoiee, and Mohammad Abdollahi, “Increasing the risk of spontaneous abortion and major malformations in newborns following use of serotonin reuptake inhibitors during pregnancy: A systematic review and updated meta-analysis,” 20 DARU J. Pharm. Sci. 75 (2012); Roja Rahimi, Shekoufeh Nikfar, Mohammad Abdollahi, “Pregnancy outcomes following exposure to serotonin reuptake inhibitors: a meta-analysis of clinical trials,” 22 Reproductive Toxicol. 571 (2006).

[7] “Q So the question was: Have you read it carefully and do you understand everything that was done in the Nikfar meta-analysis?

A Yes, I think so.

* * *

Q And Nikfar stated that she included studies, correct, in the cardiac malformation meta-analysis?

A That’s what she says.

* * *

Q So if you look at the STATA output, the demonstrative, the — the forest plot, the second study is Kornum 2010. Do you see that?

A Am I —

Q You’re looking at figure four, the cardiac malformations.

A Okay.

Q And Kornum 2010, —

A Yes.

Q — that’s a study you relied upon.

A Mm-hmm.

Q Is that right?

A Yes.

Q And it’s on this forest plot, along with its odds ratio and confidence interval, correct?

A Yeah.

Q And if you look at the last study on the forest plot, it’s the same study, Kornum 2010, same odds ratio and same confidence interval, true?

A You’re right.

Q And to paraphrase My Cousin Vinny, no self-respecting epidemiologist would do a meta-analysis by including the same study twice, correct?

A Well, that was an error. Yeah, you’re right.

***

Q Instead of putting 2 out of 98, they extracted the data and put 9 out of 28.

A Yeah. You’re right.

Q So there’s a numerical transposition that generated a 25-fold increased risk; is that right?

A You’re correct.

Q And, again, to quote My Cousin Vinny, this is no way to do a meta-analysis, is it?

A You’re right.”

Testimony of Anick Bérard, Kuykendall v. Forest Labs, at 223:14-17; 238:17-20; 239:11-240:10; 245:5-12 (Cole County, Missouri; Nov. 15, 2013). According to a Google Scholar search, the Rahimi 2006 meta-analysis had been cited 90 times; the Nikfar 2012 meta-analysis, 11 times, as recently as this month. See, e.g., Etienne Weisskopf, Celine J. Fischer, Myriam Bickle Graz, Mathilde Morisod Harari, Jean-Francois Tolsa, Olivier Claris, Yvan Vial, Chin B. Eap, Chantal Csajka & Alice Panchaud, “Risk-benefit balance assessment of SSRI antidepressant use during pregnancy and lactation based on best available evidence,” 14 Expert Op. Drug Safety 413 (2015); Kimberly A. Yonkers, Katherine A. Blackwell & Ariadna Forray, “Antidepressant Use in Pregnant and Postpartum Women,” 10 Ann. Rev. Clin. Psychol. 369 (2014); Abbie D. Leino & Vicki L. Ellingrod, “SSRIs in pregnancy: What should you tell your depressed patient?” 12 Current Psychiatry 41 (2013).

[8] Julian Higgins & Sally Green, eds., Cochrane Handbook for Systematic Reviews of Interventions 152 (2008) (“7.2.2 Identifying multiple reports from the same study. Duplicate publication can introduce substantial biases if studies are inadvertently included more than once in a meta-analysis (Tramèr 1997). Duplicate publication can take various forms, ranging from identical manuscripts to reports describing different numbers of participants and different outcomes (von Elm 2004). It can be difficult to detect duplicate publication, and some ‘detective work’ by the review authors may be required.”); see also id. at 298 (Table 10.1.a “Definitions of some types of reporting biases”); id. at 304-05 (10.2.2.1 Duplicate (multiple) publication bias … “The inclusion of duplicated data may therefore lead to overestimation of intervention effects.”); Julian P.T. Higgins, Peter W. Lane, Betsy Anagnostelis, Judith Anzures-Cabrera, Nigel F. Baker, Joseph C. Cappelleri, Scott Haughie, Sally Hollis, Steff C. Lewis, Patrick Moneuse & Anne Whitehead, “A tool to assess the quality of a meta-analysis,” 4 Research Synthesis Methods 351, 363 (2013) (“A common error is to double-count individuals in a meta-analysis.”); Alessandro Liberati, Douglas G. Altman, Jennifer Tetzlaff, Cynthia Mulrow, Peter C. Gøtzsche, John P.A. Ioannidis, Mike Clarke, P.J. Devereaux, Jos Kleijnen, and David Moher, “The PRISMA Statement for Reporting Systematic Reviews and Meta-Analyses of Studies That Evaluate Health Care Interventions: Explanation and Elaboration,” 151 Ann. Intern. Med. W-65, W-75 (2009) (“Some studies are published more than once. Duplicate publications may be difficult to ascertain, and their inclusion may introduce bias. We advise authors to describe any steps they used to avoid double counting and piece together data from multiple reports of the same study (e.g., juxtaposing author names, treatment comparisons, sample sizes, or outcomes).”) (internal citations omitted); Erik von Elm, Greta Poglia, Bernhard Walder, and Martin R. Tramèr, “Different patterns of duplicate publication: an analysis of articles used in systematic reviews,” 291 J. Am. Med. Ass’n 974 (2004); John Andy Wood, “Methodology for Dealing With Duplicate Study Effects in a Meta-Analysis,” 11 Organizational Research Methods 79, 79 (2008) (“Dependent studies, duplicate study effects, nonindependent studies, and even covert duplicate publications are all terms that have been used to describe a threat to the validity of the meta-analytic process.”) (internal citations omitted); Martin R. Tramèr, D. John M. Reynolds, R. Andrew Moore, Henry J. McQuay, “Impact of covert duplicate publication on meta-analysis: a case study,” 315 Brit. Med. J. 635 (1997); Beverley J. Shea, Jeremy M. Grimshaw, George A. Wells, Maarten Boers, Neil Andersson, Candyce Hamel, Ashley C. Porter, Peter Tugwell, David Moher, and Lex M. Bouter, “Development of AMSTAR: a measurement tool to assess the methodological quality of systematic reviews,” 7 BMC Medical Research Methodology 10 (2007) (systematic reviews must inquire whether there was “duplicate study selection and data extraction”).

Zoloft MDL Excludes Proffered Testimony of Anick Bérard, Ph.D.

June 27th, 2014

Anick Bérard is a Canadian perinatal epidemiologist at the Université de Montréal.  Bérard was named by plaintiffs’ counsel in the Zoloft MDL to offer an opinion that selective serotonin reuptake inhibitor (SSRI) antidepressants as a class, and Zoloft (sertraline) specifically, cause a wide range of birth defects. Bérard previously testified against GSK about her claim that paroxetine, another SSRI antidepressant, is a teratogen.

Pfizer challenged Bérard’s proffered testimony under Federal Rules of Evidence 104(a), 702, 703, and 403.  Today, the Zoloft MDL transferee court handed down its decision to exclude Dr. Bérard’s testimony at the time of trial.  In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., MDL 2342, Document 979 (June 27, 2014).  The MDL court acknowledged the need to consider the selectivity (“cherry picking”) of studies upon which Dr. Bérard relied, as well as her failure to consider multiple comparisons, ascertainment bias, confounding by indication, and lack of replication of specific findings across the different SSRI medications, and across studies. Interestingly, the MDL court recognized that Dr. Bérard’s critique of studies as “underpowered” was undone by her failure to consider available meta-analyses or to conduct one of her own. The MDL court seemed especially impressed by Dr. Bérard’s having published several papers that rejected a class effect of teratogenicity for all SSRIs, as recently as 2012, while failing to identify anything that was published subsequently that could explain her dramatic change in opinion for litigation.

Intellectual Due Process in West Virginia and Beyond

June 1st, 2014

Harris v. CSX Transportation

I have borrowed and modified the phrase “Intellectual Due Process” from earlier writers because of its obvious implications for the presentation, interpretation, synthesis, and evaluation of scientific evidence in court. See Scott Brewer, “Scientific Expert Testimony and Intellectual Due Process,” 107 Yale L. J. 1535 (1998). The major reason courts write opinions is to explain and justify their decisions to litigants, present and future, and to a wider audience of lawyers, scholars, and the general public. Judicial opinions involving scientific evidence, whether in legislation, regulation, or litigation, must satisfy the societal need to explain and justify the acceptance and rejection of scientific claims. Despite a great deal of hand waving that law and science are somehow different, in the end, when courts describe their acceptance or rejection of scientific claims, they are addressing the same epistemic warrant that scientists themselves employ. Even a cursory review of the judicial output reveals an unsatisfactory state of affairs in which many courts mangle scientific and statistical evidence and inference.  Much is needed to correct the problem.

One proposal would be to require that the parties file proposed findings of fact in connection with Rule 702 gatekeeping challenges.  Courts should file detailed findings of fact that underlie their decisions to admit or to exclude expert witness opinion testimony.  Another proposal would require courts to cite properly the scientific studies that they discuss in reaching a legal conclusion about sufficiency or admissibility.  These are small steps, but ones that would help reduce the gross inaccuracies and the glib generalizations, while increasing the opportunity for public scrutiny and criticism.

We do not think anything is amiss with special courts for tax, patent, family law, national security, equity, or commercial matters.  There is an even greater need for scientific skill, knowledge, and aptitude in a specialized science court.  The time has come for special courts to hear cases involving scientific claims in health effects and other litigation.

*   *   *   *   *   *   *

A decision of the West Virginia Supreme Court, late last year, illustrates the need for substantial reform of how claiming based upon “scientific evidence” is permitted and evaluated in court.  Mrs. Harris sued the railroad for the wrongful death of her husband, who died of multiple myeloma. Mr. Harris had been exposed, in his railroad workplace, to diesel exhaust, which Mrs. Harris claimed caused his cancer. See Harris v. CSX Transportation, Inc., 232 W.Va. 617, 753 S.E.2d 275 (2013). The trial court excluded Mrs. Harris’s expert witnesses. Harris v. CSX Transportation, Inc., No. 12-1135, 2012 WL 8899119 (Cir. Ct. Marshall Cty., W.Va. Aug. 21, 2012).

1. The West Virginia Supreme Court reversed the trial court’s exclusion of witnesses on the basis of an asymmetrical standard of review, which would allow de novo review of trial court decisions to exclude expert witness opinions, but which would privilege trial court decisions to admit opinions by limiting appellate review to abuse of discretion. This asymmetry was, of course, the same dodge that the Third and Eleventh Circuits had used to keep the “gates open,” regardless of validity or reliability concerns, and the same dodge that the Supreme Court shut down in General Electric v. Joiner. A single justice, Justice Loughry, dissented in Harris, taking the majority to task for twisting facts and law to reach a desired result.

2. The Harris Court cited a federal court case for dicta that “Rule 702 reflects an attempt to liberalize the rules governing the admissibility of expert testimony.” See Harris, 753 S.E.2d at 279 (citing and quoting from Weisgram v. Marley Co., 169 F.3d 514, 523 (8th Cir. 1999)). Remarkably, the Harris Court omitted reference to the United States Supreme Court’s unanimous affirmance of Weisgram, which saw Justice Ginsburg write that “[s]ince Daubert, moreover, parties relying on expert evidence have had notice of the exacting standards of reliability such evidence must meet.” Weisgram v. Marley Co., 528 U.S. 440, 442 (2000).  The Harris Court’s lack of scholarship is telling.

3. Meta-analysis appeared to play a role in the case, but the judicial decisions in Harris fail to describe the proffered evidence. The majority in Harris noted that one of plaintiff’s expert witnesses, Dr. Infante, relied upon a meta-analysis referred to as “Sonoda 2001.” Harris, 753 S.E.2d at 309. Neither the Court nor the dissent cited the published meta-analysis in a way that would help an interested reader in finding the paper.  One could imagine the hue and cry if courts cited judicial cases or statutes by short-hand names without providing enough information to access the relied-upon source.  In this case, a PubMed search reveals the source, so perhaps the error is harmless. Tomoko Sonoda, Yoshie Nagata, Mitsuru Mori, Tadao Ishida & Kohzoh Imai, “Meta-analysis of multiple myeloma and benzene exposure,” 11 J. Epidemiol. 249 (2001).  Still, the time has come for courts to describe and report the scientific evidence with the same care and detail that they would use in a car collision case.

4. A quick read shows that the Sonoda meta-analysis supports the dissent’s assessment:

“‘Dr. Infante testified on direct examination that Sonoda 2001 considered 8 case-control studies specific to engine exhaust and stated it concluded that diesel and non-diesel engine exhaust causes multiple myeloma.’ Yet, as the trial court found, ‘[o]n cross examination Dr. Infante acknowledged that none of the 8 papers included in the Sonoda meta-analysis mention diesel exhaust’.”

Harris, 753 S.E.2d at 309.  The dissent would have been considerably more powerful had it actually adverted to the language of Sonoda 2001:

“These results suggested that benzene exposure itself was not likely to be a risk factor of MM [multiple myeloma]. It is thought that several harmful chemical agents in engine exhaust, other than benzene, could be etiologically related to the risk of MM. Further case-control studies on MM are needed to obtain more information about detailed occupational exposure to toxic substances.”

Sonoda at 249 (2001) (emphasis added).  Contrary to Infante’s asseveration, Sonoda and colleagues never concluded that diesel exhaust causes multiple myeloma.  The state of scholarship and “intellectual due process” makes it impossible to tell whether Dr. Infante was telling the truth or the Harris Court badly misunderstood the record. Either way, something must give.

The dissent went on to note that Dr. Infante conducted his own meta-analysis, which included studies that did not mention diesel exhaust. Harris, 753 S.E.2d at 309.  The railroad complained that some of the studies were small and had limited power, but that is exactly why a meta-analysis would be appropriate.  The more disturbing complaints were that the meta-analysis left out important studies, and that it included irrelevant studies of benzene exposure and myeloma, which raised insuperable problems of external validity.

5. A half empty glass that is always full.  According to the Harris Court, the West Virginia shadow of Rule 702 is a rule of “admissibility rather than exclusion.” Harris, 753 S.E.2d at 279 (citing and quoting from In re Flood Litig. Coal River Watershed, 222 W.Va. 574, 581, 668 S.E.2d 203, 210 (2008), which in turn quoted a federal case, Arcoren v. United States, 929 F.2d 1235, 1239 (8th Cir. 1991), decided before the Supreme Court decided Daubert).  This is just silly hand waving and blatant partisanship.  A rule that sets out criteria or bases for admissibility also demarcates the inadmissible.

6. Cherry Picking. Dr. Infante was permitted by the Harris Court to aggregate data from studies that did not observe diesel exposure, while he failed to include, or he deliberately excluded data from, a large, powerful, exonerative study conducted by scientists from the National Cancer Institute, the International Agency for Research on Cancer (IARC), and the Karolinska Institute. See Paolo Boffetta, Mustafa Dosemeci, Gloria Gridley, Heather Bath, Tahere Moradi and Debra Silverman, “Occupational exposure to diesel engine emissions and risk of cancer in Swedish men and women,” 12 Cancer Causes Control 365 (2001). Dr. Infante inexplicably excluded this study, which found a risk ratio for men exposed to diesel exhaust that was below one, 0.98, with a very narrow 95% confidence interval, 0.92-1.05. Boffetta at 368, Table 2.
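A reader can verify how exacting that interval is with simple arithmetic. The sketch below (an illustration worked from the reported numbers, not a calculation appearing in the Boffetta paper itself) recovers the implied standard error of the log risk ratio and the z-statistic against the null:

```python
# Back-of-the-envelope check of the precision of a reported risk ratio
# of 0.98 with a 95% CI of 0.92-1.05 (Boffetta 2001, men, Table 2).
from math import log

rr, lcl, ucl = 0.98, 0.92, 1.05
se = (log(ucl) - log(lcl)) / (2 * 1.96)   # CI width on the log scale
print(f"implied SE of log RR: {se:.3f}")  # ~0.034 -- a very precise study
print(f"z against RR = 1.0:   {log(rr)/se:.2f}")  # ~ -0.60, pure noise
# the upper bound already excludes risk ratios above 1.05
```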

7. The West Virginia court articulated an incoherent definition of “reliable,” designed to give itself the ability to reject gatekeeping completely. Citing its earlier decision in Flood, the Court offered its own ipse dixit:

“The assessment of whether scientifically-based expert testimony is “reliable,” as that term is used in [Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993), and Wilt v. Buracker, 191 W.Va. 39, 443 S.E.2d 196 (1993)], does not mean an assessment of whether the testimony is persuasive, convincing, or well-founded. Rather, assessing ‘reliability’ is a shorthand term of art for assessing whether the testimony is to a reasonable degree based on the use of knowledge and procedures that have been arrived at using the methods of science — rather than being based on irrational and intuitive feelings, guesses, or speculation. If the former is the case, then the jury may (or may not, in its sole discretion) ‘rely upon’ the testimony. In re Flood Litig., 222 W.Va. at 582 n. 5, 668 S.E.2d at 211 n. 5.”

Harris, 753 S.E.2d at 279-80. Surely, this is circular or vacuous or both. Opinions not “well-founded” will be ones that are based upon guesses or speculation.  Opinions arrived at by the “methods of science” will be ones that have an epistemic warrant that will survive a claim that they are not “well-founded.”

8. The Harris Court evidenced its hostility to scientific evidence by dredging up one of its own decisions involving a multiple myeloma causation claim, State ex rel. Wiseman v. Henning, 212 W.Va. 128, 569 S.E.2d 204 (2002).  Wiseman involved a specious claim that a traumatic rib injury caused multiple myeloma, a claim at odds with scientific method and observation:

“Some research has suggested that people in some jobs may have an increased risk of developing multiple myeloma because they are exposed to certain chemicals. But the International Agency for Research on Cancer (IARC) states that the evidence is limited overall. It has been suggested that people may have an increased risk if they work in the petrol or oil industry, farming, wood working, the leather industry, painting and decorating, hairdressing, rubber manufacturing or fire fighting. But there is no evidence to prove that any of these occupations carry an increased risk of myeloma.”

Cancer Research UK, “Myeloma risks and causes” (last visited May 28, 2014). Even the most non-progressive jurisdictions have generally eradicated specious claiming for trauma-induced cancers, but West Virginia has carved out a place second to none in its race to the bottom.

9. WOE.  Not surprisingly, the Harris Court relied heavily on the First Circuit’s “weight of the evidence” end-run around the notion of epistemic warrant for scientific claims, citing Milward v. Acuity Specialty Products Group, Inc., 639 F.3d 11 (1st Cir. 2011), cert. denied sub nom., U.S. Steel Corp. v. Milward, ___ U.S. ___, 2012 WL 33303 (2012). The Harris Court went on to conflate and confuse WOE with Bradford Hill, and cited a recent New York case that confidently saw through WOE hand waving, while ignoring that case’s devastating critique of expert witnesses’ attempts to pass off WOE as scientific, epistemic warrant.  Reeps ex rel. Reeps v. BMW of N. Am., LLC, No. 100725/08, 2013 WL 2362566, at *3, 2012 N.Y. Misc. LEXIS 5788; 2012 NY Slip Op 33030U (N.Y. Sup. Ct. May 10, 2013).

10.  Link.  Dr. Infante links a lot, even when his sources do not:

“Dr. Infante testified that the International Agency for Research on Cancer issued Technical Publication Number 42 in 2009, and that the publication stated that diesel exhaust exposures have been linked to multiple myeloma and leukemia.”

Harris, 753 S.E.2d at 294. The Harris Court neglected to give the title of the publication, which tells a different story: “Identification of research needs to resolve the carcinogenicity of high-priority IARC carcinogens.” The dissent was willing to go behind the conclusory and false characterization that Dr. Infante and plaintiff gave to this publication.  Harris, 753 S.E.2d at 309. The trial court’s finding (and the dissent’s assertion) that the IARC Technical Publication 42 intended to express a research agenda, not to make a causation statement, seems unassailable.  Furthermore, it appears to be precisely the sort of specious claim that a court should keep from a jury.  The cited IARC source actually notes that the then-current IARC classification of diesel exhaust was of inadequate evidence for human carcinogenicity, with a focus on lung cancer, and barely a mention of multiple myeloma.

11.  The Benzene Connection. Plaintiffs’ expert witnesses, including Dr. Infante, argued that benzene was a component of diesel exhaust, and benzene caused multiple myeloma.  This move ignored not only the lack of evidence implicating benzene in the causation of multiple myeloma, but also the large quantitative differences between the benzene occupational exposure studies and the very small amounts of benzene in diesel exhaust.  The Harris Court held that the trial court acted improperly by inquiring into and finding the following facts, which were “exclusively” for the jury:

  • “There is substantially more benzene in cigarette smoke than diesel exhaust.
  • Benzene is present only in trivial doses in diesel exhaust.
  • The hypothesis that diesel exhaust causes multiple myeloma is confounded by the fact that cigarette smoking does not.”

The Harris majority further chastised the trial court for adverting to the ten or so studies that failed to find a statistically significant association between benzene exposure and multiple myeloma.  Harris, 753 S.E.2d at 305-06.  This inquiry, however, directly calls into question Dr. Infante’s methodology.

If these facts, found by the trial court, were reasonably established, then Dr. Infante’s argument was less than honest, and a major underpinning for inclusion of benzene studies in his meta-analysis was refuted.  These are precisely the sort of foundational facts that must be part of an inquiry into the methodological grounds of an expert witness’s opinion.

12.  The Harris Court confused “proving causation” with “showing a methodology that provides an epistemic warrant for concluding.” Harris, 753 S.E.2d at 300. The Harris Court asserted that the trial court exceeded its gatekeeping function by inquiring into whether Mrs. Harris’s expert witnesses “proved” causation. Harris, 753 S.E.2d at 300. Speaking of “proof of” or “proving” causation is an affectation of lawyers, who refer to their evidence as their “proofs.”  Epidemiologic articles and meta-analyses do not end with quod erat demonstrandum. Beyond the curious diction, there is a further issue in the majority’s suggestion that the trial court set the bar too high in declaring that the plaintiff failed to “prove” causation.  Even if we were to accept the continuous nature of strength of evidence for a causal conclusion, Dr. Infante and the plaintiff’s other expert witnesses would be fairly low on the curve, and their lowly position must of necessity speak to the merits of the defense motion to exclude under Rule 702.

13. Purely Matters for Jury. The Harris Court criticized the trial court for conducting a “mini-trial,” which set out to “resolve issues that were purely matters for jury consideration.” Harris, 753 S.E.2d at 305. In holding that the matters addressed in the pre-trial hearing were “exclusively grist for the jury and which had no relevancy to the limited role the trial court had under the facts of this case,” the Harris Court displayed a profound disregard for what facts would be relevant for a challenge to the plaintiff’s expert witnesses’ methodology. Many of the facts found by the trial court were directly relevant to “general acceptance,” validity (internal and external) of studies relied upon, and reliability of reasoning and inferences drawn. Aside from the lack of general acceptance and peer review of the plaintiff’s claimed causal relationship, the proffered testimony was filled with gaps and lacunae, which are very much at issue in methodological challenges to an opinion of causality.

*   *   *   *   *   *   *

The Harris case has taken its place next to Milward in the litigation industry’s arsenal of arguments for abandoning meaningful judicial supervision and gatekeeping of expert witness opinion testimony.  See Andrew S. Lipton, “Proving Toxic Harm: Getting Past Slice and Dice Tactics,” 45 McGeorge L. Rev. 707, 731 (2014) (plaintiffs’ bar cheerleading for the Harris decision as “a lengthy and thoughtful analysis,” and for the Milward case as a roadmap to evade meaningful judicial oversight).  Not all was perfect with the trial court’s opinion.  The defense seemed to have misled the court by asserting that if “a difference between a case group and control group is not statistically significant then there is no difference at all.”  See Respondent’s Brief at 5, Harris v. CSX Transportation, Inc., 2013 WL 4747999 (filed Feb. 4, 2013) (citing App. 169, 228-230 (Shields) as having explained that p-values greater than 0.05 do not support a causal association).

This is hardly true, and indeed, the lack of statistical significance does not establish that the null hypothesis of no association between exposure and outcome is correct.  The defense, however, did not have a burden of showing the null to be correct; only that there was no reliable method deployed to reject the null in favor of an alternative hypothesis that the risk ratio for myeloma was raised among workers exposed to diesel exhaust.

Still, the trial court did seem to understand the importance of replication, in studies free of bias and confounding. Courts generally will have to do better at delineating what are “positive” and “negative” studies, with citations to the data and the papers, so that judicial opinions provide a satisfactory statement of reasons for judicial decisions.

Biostatistics and FDA Regulation: The Convergence of Science and Law

May 29th, 2014

On May 20, 2014, the Food and Drug Law Institute (FDLI), the Drug Information Association (DIA), and the Harvard Law School’s Petrie-Flom Center for Health Law Policy, Biotechnology, and Bioethics, in collaboration with the Harvard School of Public Health Department of Biostatistics and Harvard Catalyst | The Harvard Clinical and Translational Science Center, presented a symposium on “Biostatistics and FDA Regulation: The Convergence of Science and Law.”

The symposium might just as well have been described as the collision of science and law.

The Symposium agenda addressed several cutting-edge issues on statistical evidence in the law, criminal, civil, and regulatory. Names of presenters are hyperlinked to presentation slides, where available.

I. Coleen Klasmeier, of Sidley Austin LLP, introduced and moderated the first section, “Introduction to Statistics and Regulatory Law,” which focused on current biostatistical issues in regulation of drugs, devices, and foods by the Food and Drug Administration (FDA). Qi Jiang, Executive Director of Amgen, Robert T. O’Neill, retired from the FDA, and now Statistical Advisor in CDER, and Jerald S. Schindler, of Merck Research Laboratories, presented.

II. Qi Jiang moderated and introduced the second section on safety issues, and the difficulties presented by meta-analysis and other statistical assessments of safety outcomes in clinical trials and in marketing of drugs and devices. Lee-Jen Wei, of the Harvard School of Public Health, Geoffrey M. Levitt, an Associate General Counsel of Pfizer, Inc., and Janet Wittes, of the Statistics Collaborative, presented.

III. Aaron Katz, of Ropes & Gray LLP, introduced the third section, on “Statistical Disputes in Life Sciences Litigation,” which addressed recent developments in expert witness gatekeeping, the Avandia litigation, and the role of statistics in two recent cases, Matrixx, Inc. v. Siracusano, and United States v. Harkonen.  Anand Agneshwar, of Arnold & Porter LLP, Lee-Jen Wei, Christina L. Diaz, Assistant General Counsel of GlaxoSmithKline, and Nathan A. Schachtman presented.

IV. Christopher Robertson, a law professor now visiting at Harvard Law School, moderated a talk by Robert O’Neill on “Emerging Issues,” at the FDA.

V. Dr. Wittes moderated a roundtable discussion on “Can We Handle the Truth,” which explored developments in First Amendment and media issues involved in regulation and litigation. Anand Agneshwar, and Freddy A. Jimenez, Assistant General Counsel, Johnson & Johnson, presented.

Pharmacovigilantism – Avandia Litigation

November 27th, 2013

Six and one-half years ago, I gave a presentation on the then newly emerging controversy over Avandia (rosiglitazone).  Plaintiffs’ counsel Vance Andrus chaired the program, Mealey’s™ Avandia Litigation Conference, in Chicago on July 13, 2007.  Vance was a gracious host despite my skepticism about the potential for plaintiffs to cash in on their use of Avandia.

Despite Vance’s best efforts, the program was one of those lopsided affairs, and I was the only presenter who came prepared to address the scientific evidence from a skeptical perspective.  The remaining presenters were mostly cheerleaders for their declaration of war against GlaxoSmithKline over claims of heart attack and stroke from the use of Avandia.

This week, a Food and Drug Administration announcement sent me back to my presentation slides, which were provocatively titled “Pharmacovigilantism and Avandia.” Dr. Steven Nissen had published a meta-analysis in the New England Journal of Medicine in May 2007, and it had all the appearances of a contrived effort to embarrass GSK. See Steven E. Nissen, M.D., and Kathy Wolski, M.P.H., “Effect of Rosiglitazone on the Risk of Myocardial Infarction and Death from Cardiovascular Causes,” 356 New Engl. J. Med. 2457 (2007).  A few weeks later, Dr. George Diamond published a thorough debunking of the Nissen meta-analysis, by showing that the statistically significant result in Nissen’s meta-analysis could be achieved only by choosing an inappropriate meta-analytic method.  Any other choice resulted in a result that lacked statistical significance for the rate of heart attack among patients taking Avandia.
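Diamond’s point about method dependence is easy to illustrate with toy numbers. The sketch below (hypothetical trials, not the actual Avandia data or the precise methods at issue in the published exchange) pools the same sparse 2×2 tables two ways, by the Peto one-step method and by inverse-variance weighting with the conventional 0.5 continuity correction, and shows the two approaches landing on opposite sides of the 1.96 threshold:

```python
# Toy demonstration that with rare events the choice of meta-analytic
# method can decide nominal statistical significance. Data hypothetical.
from math import exp, log, sqrt

# 13 identical small trials: 2/200 events on drug, 1/200 on control
trials = [(2, 200, 1, 200)] * 13   # (events_t, n_t, events_c, n_c)

def peto(trials):
    """Peto one-step pooled log odds ratio; handles zero cells natively."""
    o_minus_e = v = 0.0
    for a, n1, c, n0 in trials:
        N, m1 = n1 + n0, a + c
        o_minus_e += a - n1 * m1 / N
        v += n1 * n0 * m1 * (N - m1) / (N**2 * (N - 1))
    return o_minus_e / v, 1 / sqrt(v)

def iv_cc(trials):
    """Inverse-variance pooling with a 0.5 continuity correction."""
    num = den = 0.0
    for a, n1, c, n0 in trials:
        b, d = n1 - a, n0 - c
        lo = log((a + .5) * (d + .5) / ((b + .5) * (c + .5)))
        w = 1 / (1/(a+.5) + 1/(b+.5) + 1/(c+.5) + 1/(d+.5))
        num, den = num + w * lo, den + w
    return num / den, sqrt(1 / den)

for name, (lo, se) in [("Peto       ", peto(trials)),
                       ("IV + 0.5 cc", iv_cc(trials))]:
    print(f"{name}: OR {exp(lo):.2f}, z = {lo/se:.2f}")
# Peto       : OR 1.95, z = 2.09  -- nominally significant
# IV + 0.5 cc: OR 1.68, z = 1.79  -- not significant
```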

Litigation, of course, followed, and the Rule 702 hearings and decision resulted in a serious abridgement of the scientific process.  The federal MDL trial judge denied GSK’s motions to exclude plaintiffs’ causation witnesses in an opinion that has become a model for Rule 702 avoidance.  In re Avandia Marketing, Sales Practices and Product Liability Litigation, 2011 WL 13576, *12 (E.D. Pa. 2011) (Rufe, J.).  See “Learning to Embrace Flawed Evidence – The Avandia MDL’s Daubert Opinion” (Jan. 10, 2011).

This week, without much fanfare, the FDA announced that maybe the evidence supporting the claims that Avandia causes heart attacks is not so strong, after all. See “FDA Drug Safety Communication: FDA requires removal of some prescribing and dispensing restrictions for rosiglitazone-containing diabetes medicines” (Nov. 25, 2013). The Avandia MDL stands out as an expensive, negligent rush to judgment; a case more of pharmacovigilantism than of pharmacovigilance.

EPA Post Hoc Statistical Tests – One Tail vs Two

December 2nd, 2012

EPA 1992 Meta-Analysis of ETS & Lung Cancer – Part 2

In 1992, the U.S. Environmental Protection Agency (EPA) published a risk assessment of lung cancer and other health risks from environmental tobacco smoke (ETS). See Respiratory Health Effects of Passive Smoking: Lung Cancer and Other Disorders, EPA/600/6-90/006F (1992). The agency concluded that ETS causes about 3,000 lung cancer deaths each year among non-smoking adults. See also EPA, “Fact Sheet: Respiratory Health Effects of Passive Smoking,” Office of Research and Development, and Office of Air and Radiation, EPA Document Number 43-F-93-003 (Jan. 1993).

In my last post, I discussed how various plaintiffs, including tobacco companies, challenged the EPA’s conclusions as agency action that violated administrative and statutory procedures. See “EPA Cherry Picking (WOE) – EPA 1992 Meta-Analysis of ETS & Lung Cancer – Part 1” (Dec. 2, 2012). The plaintiffs further claimed that the EPA had manufactured its methods to achieve the result it desired in advance of the analyses. A federal district court agreed with the methodological challenges to the EPA’s report, but the Court of Appeals reversed on grounds that the agency’s report was not reviewable agency action. Flue-Cured Tobacco Cooperative Stabilization Corp. v. EPA, 4 F. Supp. 2d 435 (M.D.N.C. 1998), rev’d, 313 F.3d 852, 862 (4th Cir. 2002) (Widener, J.) (holding that the issuance of the report was not “final agency action”).

One of the grounds of the plaintiffs’ challenge was that the EPA had changed, without explanation, from a 95% to a 90% confidence interval. The change in the confidence coefficient was equivalent to a shift from a two-tailed to a one-tailed test of significance, with alpha set at 5%. This change, along with gerrymandering or “cherry picking” of studies, allowed the EPA to claim a statistically significant association between ETS and lung cancer. 4 F. Supp. 2d at 461. The plaintiffs pointed to EPA’s own previous risk assessments, as well as statistical analyses by the World Health Organization (International Agency for Research on Cancer), the National Research Council, and the Surgeon General, all of which routinely use 95% intervals and two-tailed tests of significance. Id.
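The equivalence is simple arithmetic: the lower bound of a two-sided 90% confidence interval sits 1.645 standard errors below the point estimate, exactly where a one-sided 5% test draws its line, while a two-sided 95% interval reaches down 1.96 standard errors. A short Python sketch, using an invented relative risk and standard error rather than the EPA’s actual figures, shows how an association can clear the first hurdle and fail the second:

```python
import math

# Hypothetical summary estimate: relative risk and the standard error of
# log(RR). These numbers are invented for illustration; they are not the EPA's.
rr, se = 1.19, 0.10
log_rr = math.log(rr)

z90, z95 = 1.645, 1.960  # two-sided 90% and 95% critical values

ci90 = (math.exp(log_rr - z90 * se), math.exp(log_rr + z90 * se))
ci95 = (math.exp(log_rr - z95 * se), math.exp(log_rr + z95 * se))

# A one-sided test of H0: RR <= 1 rejects at alpha = 0.05 exactly when
# log_rr / se > 1.645, i.e., when the two-sided 90% CI excludes 1.0.
z = log_rr / se
print(f"90% CI: ({ci90[0]:.2f}, {ci90[1]:.2f})  excludes 1.0: {ci90[0] > 1}")
print(f"95% CI: ({ci95[0]:.2f}, {ci95[1]:.2f})  excludes 1.0: {ci95[0] > 1}")
print(f"one-sided z = {z:.2f}, rejects at 0.05: {z > 1.645}")
```

On these invented numbers, the one-tailed test rejects the null and the 90% interval excludes 1.0, while the conventional two-tailed 95% interval does not, which is precisely the room for maneuver the plaintiffs accused the EPA of exploiting.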

In its 1990 Draft ETS Risk Assessment, the EPA had used a 95% confidence interval, but in later drafts, changed to a 90% interval. One of the epidemiologists on the EPA’s Scientific Advisory Board, Geoffrey Kabat, criticized this post hoc change, noting that the use of 90% intervals is disfavored, and that the post hoc change in statistical methodology created the appearance of an intent to influence the outcome of the analysis. Id. (citing Geoffrey Kabat, “Comments on EPA’s Draft Report: Respiratory Health Effects of Passive Smoking: Lung Cancer and Other Disorders,” II.SAB.9.15 at 6 (July 28, 1992) (JA 12,185)).

The EPA argued that its adoption of a one-tailed test of significance was justified on the basis of an a priori hypothesis that ETS is associated with lung cancer. Id. at 451-52, 461 (citing ETS Risk Assessment at 5–2). The court found this EPA argument hopelessly circular. The agency postulated its a priori hypothesis, which it then took as license to dilute the statistical test for assessing the evidence. The agency thus assumed what it wished to show, in order to achieve the result it sought. Id. at 456. The EPA claimed that the one-tailed test had more power, but with dozens of studies aggregated into a summary result, statistical power was hardly in short supply; the court recognized that Type I error, the false declaration of an association, was the larger threat to the validity of the agency’s conclusions.
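The court’s intuition about Type I error is easy to check by simulation. The Python sketch below is a toy under stated assumptions: independent standard-normal test statistics generated under a true null of no association. It simply counts how often each decision rule falsely declares a harmful association.

```python
import random

# Monte Carlo under the null hypothesis (no real association).
# Every simulated z-statistic is standard normal, so any "significant"
# finding is, by construction, a false positive (Type I error).
random.seed(2012)
n = 200_000
z_stats = [random.gauss(0.0, 1.0) for _ in range(n)]

one_sided = sum(z > 1.645 for z in z_stats) / n      # one-tailed test, alpha = 0.05
two_sided_harm = sum(z > 1.96 for z in z_stats) / n  # two-tailed test, harm direction only

print(f"false findings of harm, one-tailed 5% test: {one_sided:.3f}")
print(f"false findings of harm, two-tailed 5% test: {two_sided_harm:.3f}")
```

Relaxing from a two-tailed to a one-tailed 5% test roughly doubles the rate of false findings of harm, from about 2.5% to about 5%.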

The EPA also advanced a muddled defense of its use of 90% confidence intervals by arguing that if it used a 95% interval, the results would have been incongruent with the one-tailed p-values.  The court recognized that this was really no discrepancy at all, but only a corollary of using either one-tailed 5% tests or 90% confidence intervals.  Id. at 461.

If the EPA had adhered to its normal methodology, there would have been no statistically significant association between ETS and lung cancer. With its post hoc methodological choice and highly selective approach to study inclusion in its meta-analysis, the EPA was able to claim a weak, statistically significant association between ETS and lung cancer. Id. at 463. The court found this to be a deviation from the legally required use of “best judgment possible based upon the available evidence.” Id.

Of course, the EPA could have announced its one-tailed test from the inception of the risk assessment, and justified its use on grounds that it was attempting to reach only a precautionary judgment for purposes of regulation.  Instead, the agency tried to showcase its finding as a scientific conclusion, which only further supported the tobacco companies’ challenge to the post hoc change in plan for statistical analysis.

Although the validity issues in the EPA’s 1992 meta-analysis should have been superseded by later studies, and later meta-analyses, the government’s fraud case, before Judge Kessler, resurrected the issue:

“3344. Defendants criticized EPA’s meta-analysis of U.S. epidemiological studies, particularly its use of an ‘unconventional 90 percent confidence interval’. However, Dr. [David] Burns, who participated in the EPA Risk Assessment, testified that the EPA used a one-tailed 95% confidence interval, not a two-tailed 90% confidence interval. He also explained in detail why a one-tailed test was proper: The EPA did not use a 90% confidence interval. They used a traditional 95% confidence interval, but they tested for that interval only in one direction. That is, rather than testing for both the possibility that exposure to ETS increased risk and the possibility that it decreased risk, the EPA only tested for the possibility that it increased the risk. It tested for that possibility using the traditional 5% chance or a P value of 0.05. It did not test for the possibility that ETS protected those exposed from developing lung cancer at the direction of the advisory panel which made that decision based on its prior decision that the evidence established that ETS was a carcinogen. What was being tested was whether the exposure was sufficient to increase lung cancer risk, not whether the agent itself, that is cigarette smoke, had the capacity to cause lung cancer with sufficient exposure. The statement that a 90% confidence interval was used comes from the observation that if you test for a 5% probability in one direction the boundary is the same as testing for a 10% probability in two directions. Burns WD, 67:5-15. In fact, the EPA Risk Assessment stated, ‘Throughout this chapter, one-tailed tests of significance (p = 0.05) are used …’ .”

U.S. v. Philip Morris USA, Inc., 449 F. Supp. 2d 1, 702-03 (D.D.C. 2006) (Kessler, J.) (internal citations omitted).

Judge Kessler was misled by Dr. Burns, a frequent testifier for plaintiffs’ counsel in tobacco cases. Burns should have known that, with respect to the lower bound of the confidence interval, which is what matters for determining whether the meta-analysis excludes a risk ratio of 1.0, there is no difference between a one-tailed 95% confidence interval and a two-tailed 90% interval. Burns’ sophistry hardly excuses the EPA’s error in changing its pre-specified end point and statistical analysis, or cures the unduly increased risk of Type I error in the EPA meta-analysis. See “Pin the Tail on the Significance Test” (July 14, 2012).

Post-script

Judge Widener wrote the opinion for a panel of the United States Court of Appeals for the Fourth Circuit, which reversed the district court’s judgment that had vacated portions of the EPA’s report. The Circuit’s decision did not address the scientific issues, but its holding that the agency action was not reviewable removed the basis for the district court’s review of the scientific and statistical issues. For those pundits who see only self-interested behavior in judging, the author of the Circuit’s decision was a lifetime smoker, who grew Burley tobacco on his farm outside Abingdon, Virginia. Judge Widener died on September 19, 2007, of lung cancer.