TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Bipartisan Junk Science – Pork-Barrel Causation

September 19th, 2012

Despite the hand waving and finger pointing, junk science is embraced by both political parties in the United States, when it suits their purposes.  Both parties want to have God and science on their sides.

Congress created the September 11th Victim Compensation Fund, 49 USC § 40101, also known as the James Zadroga 9/11 Health and Compensation Act (P.L. 111-347) (signed into law in January 2011). The Act was a touching acknowledgement of the dedication and sacrifices of the first responders to the World Trade Center and Pentagon victims of an Islamic jihad. Being a victim, however, implies that the harm to be compensated was caused by the attack and its consequences.  The New York politicians soon learned that causality can be turned into a very malleable concept.

The law allocated over $4 billion for medical screening and treatment of fire fighters, policemen, emergency responders, and survivors.  Most of the covered conditions were acute onset respiratory and mental disorders caused by gases, fumes, dusts, and stresses, to which the workers were exposed.  The law also made the director of CDC’s National Institute for Occupational Safety and Health (NIOSH), the head of a World Trade Center Health Program, which could add new conditions to the list of compensable diseases, based upon a review of scientific evidence.

In September 2011, several New York congressmen and Senators petitioned the director to add cancer to the list, citing flimsy or non-existent scientific evidence.  Senators Kirsten Gillibrand (D-NY) and Charles Schumer (D-NY), and Representatives Carolyn Maloney (D-NY), Jerrold Nadler (D-NY), Peter King (R-NY), Charles Rangel (D-NY), Nita Velazquez (D-NY), Michael Grimm (R-NY), and Yvette Clark (D-NY), made their request, citing R. Zeig-Owens, M. Webber, C.B. Hall, et al., “Early assessment of cancer outcomes in New York City firefighters after the 9/11 attacks: an observational cohort study,” 378 Lancet 898 (2011).

This is pork barrel politics masquerading as sympathy for putative victims.  The Zeig-Owens study reported a non-statistically significant standardized incidence ratio for all cancer, of either 1.10 (95% CI 0.98–1.25), with the general U.S. male population as the comparison group, or 1.19 (95% CI 0.96–1.47), with unexposed firefighters as the comparison group and corrected for possible surveillance bias.  Of course, given that cancer is not a single disease, the composite end point is not particularly meaningful.
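The point can be verified directly from the reported intervals: a standardized incidence ratio is conventionally deemed statistically significant at the 0.05 level only when its 95% confidence interval excludes the null value of 1.0.  A minimal sketch, using the intervals reported in the Zeig-Owens study (the helper function name is illustrative):

```python
# Standardized incidence ratios (SIRs) and 95% confidence intervals
# as reported in Zeig-Owens et al., 378 Lancet 898 (2011).
results = {
    "vs. general U.S. male population": (1.10, 0.98, 1.25),
    "vs. unexposed firefighters (surveillance-corrected)": (1.19, 0.96, 1.47),
}

def is_significant(lower, upper):
    """A ratio estimate is statistically significant at the 0.05 level
    only if its 95% CI excludes the null value of 1.0."""
    return lower > 1.0 or upper < 1.0

for label, (sir, lo, hi) in results.items():
    verdict = "significant" if is_significant(lo, hi) else "not significant"
    print(f"SIR {sir} (95% CI {lo}-{hi}) {label}: {verdict}")
```

Both intervals straddle 1.0, so neither excess can be distinguished from chance at the conventional level, which is the sense in which the study's results were "non-statistically significant."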

Here is the authors’ (including Dr. Prezant’s) published interpretation of the data:

“We reported a modest excess of cancer cases in the WTC-exposed cohort. We remain cautious in our interpretation of this finding because the time since 9/11 is short for cancer outcomes, and the reported excess of cancers is not limited to specific organ types. As in any observational study, we cannot rule out the possibility that effects in the exposed group might be due to unidentified confounders.”

Zeig-Owens, at 898.  The Zeig-Owens study did not support any conclusion of causality between the workers’ exposures in 2001 and any type of cancer. See “NIOSH Report Sets Up Run on September 11th Victim Compensation Fund by Non-Victims.”

The WTC Health Program director requested a recommendation from the program’s Scientific/Technical Advisory Committee (STAC) on whether to add cancer generally, or any particular kind of cancer, to the Zadroga Act’s list of compensable conditions.  In April 2012, the STAC made its recommendations, essentially relying upon likely exposures, without any consideration of individual dose, duration, or latency, and without any serious consideration of the available epidemiologic evidence.

The STAC claimed that the Lancet study reported statistically significant excesses of cancer; it did not. The Committee also failed to come to grips with the biological implausibility of excess rates of solid malignant tumors presenting within less than a decade since exposure:

“Given that cancer latencies for solid tumors average 20 years or more, it is noteworthy that the published FDNY study of fire fighters showed a statistically significant excess in all-site cancer with only 7 years of follow-up.”

In June 2012, the NIOSH director, Dr. Howard, reported that he was inclined to accept the STAC’s recommendation, but held open a public comment period.  See Anemona Hartocollis, “Sept. 11 Health Fund Given Clearance to Cover Cancer,” N.Y. Times (June 8, 2012).  Not surprisingly, given the political pressure, the WTC Health Program director promulgated his final rule to include 50 types of cancer, including many that occurred less often than expected in the Zeig-Owens study.

This decision ignores appropriate scientific methodology for reaching causal conclusions.  Worse than its intellectual shabbiness, the decision insults the true victims of the jihad terrorism.

The rule is effective October 12, 2012.

 

The Supreme Court’s Unsteady Gatekeeping Pre-Daubert

September 8th, 2012

Some writers assert that the United States Supreme Court did not wade into the troubled waters of medical causation and expert witness testimony until it decided Daubert v. Merrell Dow Pharmaceuticals, 509 U.S. 579 (1993).  Actually, the Court swam in these stormy waters in admiralty and FELA cases, at least up through the 1950s.

In 1953, Mr. Sentilles, a marine engineer, was thrown to the deck of his ship, and washed off the deck, by a wave.  He became ill with tuberculosis, and he brought a personal injury action (for “maintenance and cure”) against the vessel owner.  Inter-Caribbean Shipping Corp. v. Sentilles, 256 F.2d 156 (5th Cir. 1958).  The vessel owner defended on the theory that the plaintiff’s diabetes pre-disposed him to TB, and that the plaintiff’s expert witnesses were equivocal in their conclusions of causality or aggravation.  The jury nonetheless found for the plaintiff.

The judgment entered on the jury verdict for the seaman was reversed by the Fifth Circuit, which found the plaintiff’s expert witnesses’ testimony inadequate to support submission of the case to the jury:

“The rule as to the medical testimony respecting causation which is required to take a case to a jury has been thus stated:

It appears to be well settled that medical testimony as to the possibility of a causal relation between a given accident or injury and the subsequent death or impaired physical or mental condition of the person injured is not sufficient, standing alone, to establish such relation. By testimony as to possibility is meant testimony in which the witness asserts that the accident or injury `might have’, `may have’, or `could have’ caused, or `possibly did’ cause the subsequent physical condition or death or that a given physical condition (or death) `might have,’ `may have,’ or `could have’ resulted or `possibly did’ result from a previous accident or injury — testimony, that is, which is confined to words indicating the possibility or chance of the existence of the causal relation in question and does not include words indicating the probability or likelihood of its existence.”

Id. (internal citations omitted).

The Supreme Court granted a writ of certiorari, heard argument, and reversed the Court of Appeals.  Sentilles v. Inter-Caribbean Shipping Corp., 361 U.S. 107 (1959).  In announcing the Court’s opinion, Justice Brennan voiced the remarkable doctrine that the jury could find reasonable probability when the expert witnesses could not:

“The jury’s power to draw the inference that the aggravation of petitioner’s tubercular condition, evident so shortly after the accident, was in fact caused by that accident, was not impaired by the failure of any medical witness to testify that it was in fact the cause.  Neither can it be impaired by the lack of medical unanimity as to the respective likelihood of the potential causes of the aggravation, or by the fact that other potential causes of the aggravation existed and were not conclusively negated by the proofs.  The matter does not turn on the use of a particular form of words by the physicians in giving their testimony.  The members of the jury, not the medical witnesses, were sworn to make a legal determination of the question of causation.  They were entitled to take all the circumstances, including the medical testimony into consideration.  Though this case involves a medical issue, it is no exception to the admonition that, ‘It is not the function of a court to search the record for conflicting circumstantial evidence in order to take the case away from the jury on a theory that the proof gives equal support to inconsistent and uncertain inferences.  The focal point of judicial review is the reasonableness of the particular inference or conclusion drawn by the jury. * * * The very essence of its function is to select from conflicting inferences and conclusions that which it considers most reasonable.  * * * Courts are not free to reweigh the evidence and set aside the jury verdict merely because the jury could have drawn different inferences or conclusions or because judges feel that other results are more reasonable.’”

Id. at 109-10.  Justice Brennan thus ignored equally venerable precedent that juries are not free to speculate, and he failed to consider how the jury in this case could reach a determination in the face of conflicting evidence, and without ruling out alternative causes.

Sentilles was decided before the enactment of the Federal Rules of Evidence, and there was no challenge to the plaintiff’s expert witnesses’ testimony under the Frye doctrine.  Another crucial difference, of course, is that Sentilles was an isolated case, not likely to recur frequently in the federal courts.  With the rise of product liability law, and the emergence of epidemiology as a basis for inferring causality, the federal courts would soon see mass exposure situations resulting in mass torts.  Dubious expert witness testimony resulting in dubious judgments of causation would attain much greater notoriety, for the expert witnesses, for the trial bar, and for the courts that tolerated the results.

 

 

David Egilman’s Methodology for Divining Causation

September 6th, 2012

If the Method Yields An Erroneous Conclusion, then the Method is Wrong

David Stephen Egilman wanted very much to testify in a diacetyl case.  One judge, however, did not think that this was such a good idea, and excluded Dr. Egilman’s testimony. Newkirk v. Conagra Foods, Inc., 727 F. Supp. 2d 1006 (E.D. Wash. 2010).

Egilman was so distraught by being excluded that he sought to file a personal appeal to the United States Court of Appeals. See “Declaration of David Egilman, M.D., M.P.H., in Support of Opposition to Motion for Order to Show Cause Why Appeal Should Not Be Dismissed for Lack of Standing.”  (Attached: Egilman Motion Appeal Diacetyl Exclusion 2011 and Egilman Declaration Newkirk Diacetyl Appeal 2011.)

Egilman improvidently, if not scurrilously, attacked the district judge for having excluded Egilman’s proffered testimony.  If Egilman’s attack on the trial judge were not sufficiently odd, Egilman also claimed a right to intervene in the appeal by advancing the claim that the Rule 702 exclusion hurt his livelihood.  Here is how Egilman put the matter:

“The Daubert ruling eliminates my ability to testify in this case and in others. I will lose the opportunity to bill for services in this case and in others (although I generally donate most fees related to courtroom testimony to charitable organizations, the lack of opportunity to do so is an injury to me). Based on my experience, it is virtually certain that some lawyers will choose not to attempt to retain me as a result of this ruling. Some lawyers will be dissuaded from retaining my services because the ruling is replete with unsubstantiated pejorative attacks on my qualifications as a scientist and expert. The judge’s rejection of my opinion is primarily an ad hominem attack and not based on an actual analysis of what I said – in an effort to deflect the ad hominem nature of the attack the judge creates ‘straw man’ arguments and then knocks the straw men down, without ever addressing the substance of my positions.”

Egilman Declaration at ¶ 11.

The Ninth Circuit, unmoved by the prospect of an impoverished Dr. Egilman, denied his personal appeal, and affirmed the district court’s exclusion. Newkirk v. Conagra Foods, Inc., 438 Fed. Appx. 607 (9th Cir. 2011).

In his appellate papers, Egilman did not stop at simply citing his pecuniary interest.  With no sense of false shame or modesty, Egilman recited what a wonderful expert witness he has been.  Egilman suggested that courts have been duly impressed by his views on the scientific assessment of causation:

“My views on the scientific standards for the determination of cause-effect relationships (medical epistemology) have been cited by the Massachusetts Supreme Court (Vassallo v. Baxter Healthcare Corporation, 428 Mass. 1 (1998)):

‘Although there was conflicting testimony at the Oregon hearing as to the necessity of epidemiological data to establish causation of a disease, the judge appears to have accepted the testimony of an expert epidemiologist that, in the absence of epidemiology, it is “sound science…. to rely on case reports, clinical studies, in vivo tests and animal tests.” The judge may also have relied on the affidavit of the plaintiff’s epidemiological expert, Dr. David S. Egilman, who identified several examples in which disease causation has been established based on animal and clinical case studies alone to demonstrate that “doctors utilize epidemiological data as one tool among many”.’”

Egilman Declaration at pp. 5-6.

We may excuse Dr. Egilman, a non-lawyer, for incorrectly referring to a non-existent court.  Massachusetts does not have a “Supreme Court,” but the quoted language did indeed come from the Supreme Judicial Court of Massachusetts, in Vassallo v. Baxter Healthcare Corporation, 428 Mass. 1, 12, 696 N.E.2d 909, 917 (1998).

The Massachusetts court’s suggestion that there was conflicting testimony at the “Oregon hearing,” about the need for epidemiologic evidence, is itself rather bizarre.  The Oregon hearing was the Rule 702 hearing before Judge Jones, of the District of Oregon.  Judge Jones appointed four technical advisors to assist him in ruling on the defendants’ motions to exclude plaintiffs’ causation opinions.  One of the appointed advisors was an epidemiologist.  More important, the plaintiffs’ counsel presented the testimony of an epidemiologist, Dr. David Goldsmith.  The Massachusetts court did not, and indeed could not, cite the Oregon District Court’s opinion, or the underlying record, for any suggestion that epidemiologic testimony was not needed to show a causal relationship between silicone breast implants and the development of autoimmune disease.  See Hall v. Baxter Healthcare Corp., 947 F. Supp. 1387 (D. Or. 1996). Judge Jones made his views very clear:  epidemiology was needed, but lacking, in the plaintiffs’ case.  The argument that epidemiology was unnecessary came from Dr. Egilman’s report, and the plaintiffs’ counsel’s briefs.

There is more, however, to the disingenuousness of Dr. Egilman’s citation to the Vassallo case.  The Newkirk court, in receiving his curious affidavit, would not likely know that Vassallo was a silicone gel breast implant case, and one may suspect that Dr. Egilman wanted to keep the Ninth Circuit uninformed of his role in the silicone litigation.  If Dr. Egilman submitted an affidavit in connection with the so-called Oregon hearings, which took place during the summer of 1996, it was not a particularly important piece of evidence.  Egilman is not mentioned by name in the Hall decision, even though the district court clearly rejected the plaintiffs’ witnesses and affiants, in their efforts to make a case for silicone as a cause of autoimmune disease.

A few months after the Oregon hearings, in the fall of 1996, Judge Weinstein, along with other federal and state judges, held a “Daubert” hearing on the admissibility of expert witness opinion testimony in breast implant cases pending in New York state and federal courts.  Plaintiffs’ counsel suggested that Egilman might testify, but ultimately he was a no-show.  After the New York hearings, Judge Weinstein granted, sua sponte, partial summary judgment against all plaintiffs’ claims of systemic immune-system injury.  In re Breast Implant Cases, 942 F. Supp. 958 (E. & S.D.N.Y. 1996).

At the New York hearings, plaintiffs’ counsel again attempted to make an epidemiologic case, and once again called Dr. David Goldsmith.  Marshaling the evidentiary display that Egilman would have presented had he shown up in New York, Dr. Goldsmith did not fare well. At one point, Judge Weinstein interrupted and offered his interim assessment of Dr. Goldsmith and the plaintiffs’ causation case:

THE COURT: Why are you presenting this witness, for epidemiological purposes?

MR. GORDON: That’s correct.

THE COURT: And I can tell you for epidemiological purposes, based on the only testimony I have seen, he doesn’t meet my standard of anybody who can be helpful to a jury, not because he isn’t a great epidemiologist, I’m sure he is, but because the data he is relying on admittedly is almost useless. I’m not going to go forward with a trial on this kind of haphazard abstract without any basic definition or explication.

Transcript at p.159:7-18, from Nyitray v. Baxter Healthcare Corp., CV 93-159 (E.D.N.Y. Oct. 9, 1996)(pre-trial hearing before Judge Jack Weinstein, Justice Lobis, and Magistrate Cheryl Pollak).  In his semi-autobiographical writings, Judge Jack B. Weinstein elaborated upon his published breast-implant decision, with a bit more detail about how he viewed the plaintiffs’ expert witnesses.  Judge Jack B. Weinstein, “Preliminary Reflections on Administration of Complex Litigation” 2009 Cardozo L. Rev. de novo 1, 14 (2009) (describing plaintiffs’ expert witnesses in silicone litigation as “charlatans”; “[t]he breast implant litigation was largely based on a litigation fraud. … Claims—supported by medical charlatans—that enormous damages to women’s systems resulted could not be supported.”)

When Judge Weinstein began to create a process for the selection of Rule 706 court-appointed expert witnesses, plaintiffs’ counsel rushed to have Judge Pointer take control over the process.  Because Judge Pointer believed that there must be some germ of validity in the plaintiffs’ case, the plaintiffs were hoping that his courtroom, the center of MDL 926, would be a friendlier forum than Judge Weinstein’s, with its withering skepticism.  Ultimately, Judge Pointer, through a select nominating committee, appointed expert witnesses in the fields of toxicology, immunology, rheumatology, and epidemiology.  MDL 926 Order No. 31 (Appointment of Rule 706 Expert Witnesses).

Each of the four witnesses prepared, presented, and defended his or her own report, but all the reports soundly rejected plaintiffs’ causation theories.  Laural L. Hooper, Joe S. Cecil, and Thomas E. Willging, Neutral Science Panels: Two Examples of Panels of Court-Appointed Experts in the Breast Implants Product Liability Litigation (Fed. Jud. Ctr. 2001).

In the United Kingdom, the British Minister of Health ordered an independent review of the breast implant controversy, which led to the formation of the Independent Review Group (IRG) to evaluate the causal claims that were being made by claimants and advocates. The IRG concluded that there was no demonstrable risk of connective tissue disease from silicone breast implants. Independent Review Group, Silicone Breast Implants: The Report of the Independent Review Group 8, 22-23 (July 1998).

In 1999, the Institute of Medicine delivered its assessment of the safety of silicone breast implants.  Again, the plaintiffs’ theories were rejected.  Stuart Bondurant, Virginia Ernster, and Roger Herdman, eds., Safety of Silicone Breast Implants (1999).

Still, Egilman persisted.  As late as 2000, Egilman was posting his breast-implant litigation report at his Brown University website.  His conclusion, however awkwardly worded, was clear enough:

“Although a prospective, large epidemiological study investigating atypical symptoms and disease would clearly contribute to underestimating of the strength of association between silicone breast implants and disease, the available epidemiologic evidence is suggestive of a causal association for silicone breast implants and atypical connective tissue diseases and scleroderma.”

David S. Egilman, “Breast Implants and Disease” (2000) (“For purposes of this report SBI induced disease is considered an iatrogenic environmental disease.”) (<http://209.67.232.40/brown/implants/sbi.html> last visited on Mar. 28, 2000).

Sometime after 2000, Egilman developed a sensitivity to being associated with the plaintiffs’ side of the silicone litigation.  In 2009, Dr. Laurence Hirsch published an article critical of Egilman’s disclosures of conflicts of interest in some of his published articles.  Hirsch struck a sensitive nerve in mentioning Egilman’s involvement in the breast implant litigation:

“Egilman reports having testified for plaintiffs in legal cases involving asbestosis, occupational lung disease, beryllium poisoning, silicone breast implants and connective tissue disease (characterized as the epitome of junk science91), selective serotonin reuptake inhibitor and suicide risk, atypical antipsychotics and metabolic changes, and selective COX-2 inhibitors and cardiovascular disease, an amazing breadth of medical expertise.”

Laurence J. Hirsch, “Conflicts of Interest, Authorship, and Disclosures in Industry-Related Scientific Publications: The Tort Bar and Editorial Oversight of Medical Journals,” 84 Mayo Clin. Proc. 811, 815 (2009).

Egilman apparently besieged Dr. Hirsch and the Mayo Clinic Proceedings with his protests, and it seems that he was able to induce the author or the journal into a “correction”:

“Dr Egilman has not testified in court in breast implant and connective tissue disease, or in antidepressant or antipsychotic drug cases.”

Laurence J. Hirsch, “Corrections,” 85 Mayo Clin. Proc. 99 (2010).  But this correction is itself incorrect because Dr. Egilman testified over the course of three days, in court, in the same Vassallo v. Baxter Healthcare case he holds up as having embraced his causal “principles.”  The Vassallo case involved allegations that silicone had caused systemic autoimmune disease, an allegation that was ultimately shown to be meritless by the MDL court’s neutral expert witnesses, as well as the Institute of Medicine.

Perhaps this history helps explain Dr. Egilman’s coyness in what he told the Newkirk appellate court about his involvement in the Vassallo case.  More likely is that Dr. Egilman understands, all too well, the logical implications of his being wrong in the breast implant litigation.  If his vaunted method leads to an erroneous conclusion, then the method must be wrong.  It is a simple matter of modus tollens.
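The logical form of the closing argument can be made explicit.  Letting $M$ stand for “Egilman’s method for divining causation is sound,” and $C$ for “the conclusions the method yields are correct”:

```latex
\[
\bigl( (M \rightarrow C) \land \neg C \bigr) \;\vdash\; \neg M
\qquad \text{(modus tollens)}
\]
```

The breast-implant litigation supplied the $\neg C$: the method yielded a causal conclusion that the court-appointed expert witnesses and the Institute of Medicine later rejected.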

 

 

 

 

Open Admissions for Expert Witnesses in Chantix Litigation

September 1st, 2012

Chantix is a medication that helps people stop smoking.  Smoking kills people, but make a licensed drug and the lawsuits will come.

Earlier this month, Judge Inge Prytz Johnson, the MDL trial judge in the Chantix litigation, filed an opinion that rejected Pfizer’s challenges to plaintiffs’ general causation expert witnesses.  Memorandum Opinion and Order, In re Chantix (Varenicline) Products Liability Litigation, MDL No. 2092, Case 2:09-cv-02039-IPJ Document 642 (N.D. Ala. Aug. 21, 2012) [hereafter cited as Chantix].

Plaintiffs claimed that Chantix causes depression and suicidality, sometimes severe enough to result in suicide, attempted or completed.  Chantix at 3-4.  Others have written about Judge Johnson’s decision.  See Lacayo, “Win Some, Lose Some: Recent Federal Court Rulings on Daubert Challenges to Plaintiffs’ Experts,” (Aug. 30, 2012).

The breadth and depth of the errors in the trial court’s analysis, however, remain to be explored.

 

STATISTICAL SIGNIFICANCE

The Chantix MDL court notes several times that the defendant “harped” on this or that issue; the reader might think the defendant was a music label rather than a pharmaceutical manufacturer.  One of the defendant’s chords that failed to resonate with the trial judge was the point that the plaintiffs’ expert witnesses relied upon statistically non-significant results.  Here is how the trial court reported the issue:

“While the defendant repeatedly harps on the importance of statistically significant data, the United States Supreme Court recently stated that ‘[a] lack of statistically significant data does not mean that medical experts have no reliable basis for inferring a causal link between a drug and adverse events …. medical experts rely on other evidence to establish an inference of causation.’ Matrixx Initiatives, Inc. v. Siracusano, 131 S.Ct. 1309, 1319 (2011).”

Chantix at 22.

Well, it was only a matter of time before the Supreme Court’s dictum would be put to this predictably erroneous interpretation.  See “The Matrixx Oversold” (April 4, 2011).

Matrixx involved a motion to dismiss the complaint, which the trial court granted, but the Ninth Circuit reversed.  No evidence was offered; nor was any ruling that evidence was unreliable or insufficient at issue. The Supreme Court affirmed the Circuit on the issue whether pleading statistical significance was necessary.  Matrixx Initiatives took this position in the hopes of avoiding the merits, and so the issue of causation was never before the Supreme Court.  A unanimous Supreme Court held that because FDA regulatory action does not require reliable evidence to support a causal conclusion, pleading materiality for a securities fraud suit does not require an allegation of causation, and thus does not require an allegation of statistically significant evidence. Everything that the Court said about statistical significance and causation was obiter dictum, and rather ill-considered dictum at that.

The Supreme Court thus wandered far beyond its holding to suggest that courts “frequently permit expert testimony on causation based on evidence other than statistical significance.” Matrixx Initiatives, Inc. v. Siracusano, 131 S.Ct. 1309, 1319 (2011) (citing Wells v. Ortho Pharm. Corp., 788 F.2d 741, 744-745 (11th Cir. 1986)).  But the Supreme Court’s citation to Wells, in Justice Sotomayor’s opinion, failed to support the point she was trying to make, or the decision that the trial court announced in Chantix.

Wells involved a claim of birth defects caused by the use of a spermicidal contraceptive jelly.  At least one study reported a statistically significant increase in detected birth defects over the expected rate.  Wells v. Ortho Pharmaceutical Corp., 615 F. Supp. 262 (N.D. Ga. 1985), aff’d, and rev’d in part on other grounds, 788 F.2d 741 (11th Cir.), cert. denied, 479 U.S. 950 (1986).  Wells is not an example of a case in which an expert witness opined about causation in the absence of a scientific study with statistical significance. Of course, finding statistical significance is just the beginning of assessing the causality of an association; the Wells case was and remains notorious for the expert witness’s poor assessment of all the determinants of scientific causation, including the validity of the studies relied upon.

The Wells decision was met with severe criticism in the 1980s, both for its failure to evaluate the entire evidentiary display and for its failure to rule out bias and confounding in the studies relied upon by the plaintiff.  See, e.g., James L. Mills and Duane Alexander, “Teratogens and ‘Litogens’,” 315 New Engl. J. Med. 1234 (1986); Samuel R. Gross, “Expert Evidence,” 1991 Wis. L. Rev. 1113, 1121-24 (1991) (“Unfortunately, Judge Shoob’s decision is absolutely wrong. There is no scientifically credible evidence that Ortho-Gynol Contraceptive Jelly ever causes birth defects.”). See also Editorial, “Federal Judges v. Science,” N.Y. Times, December 27, 1986, at A22 (unsigned editorial); David E. Bernstein, “Junk Science in the Courtroom,” Wall St. J. at A15 (Mar. 24, 1993) (pointing to Wells as a prominent example of how the federal judiciary had embarrassed the American judicial system with its careless, non-evidence-based approach to scientific evidence). A few years later, another case in the same judicial district, against the same defendant, over the same product, resulted in the grant of summary judgment.  Smith v. Ortho Pharmaceutical Corp., 770 F. Supp. 1561 (N.D. Ga. 1991) (supposedly distinguishing Wells on the basis of more recent studies).

Neither the Justices in Matrixx Initiatives nor the trial court in Chantix can be excused for their poor scholarship, or for their failure to note that Wells was overruled sub silentio by the Supreme Court’s own subsequent decisions in Daubert, Joiner, Kumho Tire, and Weisgram.  And if the weight of precedent did not kill the concept, then there is the simple matter of a supervening statute:  the 2000 amendment of Rule 702 of the Federal Rules of Evidence.

 

CONFUSING REGULATORY ACTION WITH CAUSAL ASSESSMENTS

The Supreme Court in Matrixx Initiatives was careful to distinguish causal judgments from regulatory action, but then went on in dictum to conflate the two.  The trial judge in Chantix showed no similar analytical care.  Judge Johnson held that the asserted absence of statistical significance was not a basis for excluding plaintiffs’ expert witnesses’ opinions on general causation.  Her Honor adverted to the Matrixx Initiatives dictum that the FDA “does not apply any single metric for determining when additional inquiry or action is necessary.” Matrixx, 131 S.Ct. at 1320.  Chantix at 22.  Judge Johnson noted:

“that ‘[n]ot only does the FDA rely on a wide range of evidence of causation, it sometimes acts on the basis of evidence that suggests, but does not prove, causation…. the FDA may make regulatory decisions against drugs based on postmarketing evidence that gives rise to only a suspicion of causation’.  Matrixx, id. The court declines to hold the plaintiffs’ experts to a more exacting standard as the defendant requests.”

Chantix at 23.

In the trial court’s analysis, the difference between regulatory action and civil litigation fact adjudication is obliterated.  This, however, is not the law of the United States, which has consistently acknowledged the difference. See, e.g., IUD v. API, 448 U.S. 607, 656 (1980) (“agency is free to use conservative assumptions in interpreting the data on the side of overprotection rather than underprotection.”)

As the Second Edition of the Reference Manual on Scientific Evidence (the outdated edition cited by the court in Chantix) explains:

“[p]roof of risk and proof of causation entail somewhat different questions because risk assessment frequently calls for a cost-benefit analysis. The agency assessing risk may decide to bar a substance or product if the potential benefits are outweighed by the possibility of risks that are largely unquantifiable because of presently unknown contingencies. Consequently, risk assessors may pay heed to any evidence that points to a need for caution, rather than assess the likelihood that a causal relationship in a specific case is more likely than not.”

Margaret A. Berger, “The Supreme Court’s Trilogy on the Admissibility of Expert Testimony,” in Reference Manual On Scientific Evidence at 33 (Fed. Jud. Ctr. 2d. ed. 2000).

 

CONCLUSIONS VS. METHODOLOGY

Judge Johnson insisted that the “court’s focus was solely on the principles and methodology, not on the conclusions they generate.” Chantix at 9.  This insistence, however, is contrary to the established law of Rule 702.

Although the United States Supreme Court attempted, in Daubert, to draw a distinction between the reliability of an expert witness’s methodology and conclusion, the Court soon realized that the distinction was flawed. If an expert witness’s proffered testimony is discordant with regulatory and scientific conclusions, a reasonable, disinterested scientist would be led to question the reliability of the testimony’s methodology and its inferences from facts and data to its conclusion.  The Supreme Court recognized this connection in General Electric v. Joiner, and the connection between methodology and conclusions was ultimately incorporated into a statute, the revised Federal Rule of Evidence 702:

“[I]f scientific, technical or other specialized knowledge will assist the trier of fact to understand the evidence or to determine a fact in issue, a witness qualified as an expert by knowledge, skill, experience, training or education, may testify thereto in the form of an opinion or otherwise, if

  1. the testimony is based upon sufficient facts or data,
  2. the testimony is the product of reliable principles and methods; and
  3. the witness has applied the principles and methods reliably to the facts of the case.”

When the testimony is a conclusion about causation, Rule 702 directs an inquiry into whether that conclusion is based upon sufficient facts or data, and whether it is the product of reliable principles and methods.  The court’s focus should indeed be on the conclusion as well as on the methodology claimed to generate it.  The Chantix MDL court thus ignored the clear mandate of a statute, Rule 702(1), and applied dictum from Daubert that had been superseded by Joiner and by an Act of Congress.  The ruling is thus legally invalid to the extent it departs from the statute.

 

EPIDEMIOLOGY

For obscure reasons, Judge Johnson sought to deprecate the need to rely upon epidemiologic studies, whether placebo-controlled clinical trials or observational studies.  See Chantix at 25 (citing Rider v. Sandoz Pharm. Corp., 295 F.3d 1194, 1198-99 (11th Cir. 2002)). Of course, the language cited in Rider came from a pre-Daubert, pre-Joiner case, Wells v. Ortho Pharm. Corp., 788 F.2d 741, 745 (11th Cir. 1986) (holding that “a cause-effect relationship need not be clearly established by animal or epidemiological studies”).  This dubious legal lineage cannot support the glib dismissal of the need for epidemiologic evidence.

 

WEIGHT OF THE EVIDENCE (WOE)

According to Judge Johnson, plaintiffs’ expert witness Shira Kramer considered all the evidence relevant to Chantix and neuropsychiatric side effects, in what Kramer described as a “weight of the evidence” analysis.  Chantix at 26.  In her report, Kramer had written that determinations about the weight of evidence are “subjective interpretations” based upon “various lines of scientific evidence.” Id. (citing and quoting Kramer’s report). Kramer also claimed that every scientist “brings a unique set of experiences, training and expertise …. Philosophical differences exist between experts…. Therefore, it is not surprising that differences of opinion exist among scientists. Such differences of opinion are not necessarily evidence of flawed scientific reasoning or methodology, but rather differences in judgment between scientists.” Id.

Without any support from the scientific literature, or from the Reference Manual on Scientific Evidence, Judge Johnson accepted Kramer’s explanation of a totally subjective, unprincipled approach as a scientific methodology.  Not surprisingly, Judge Johnson cited the First Circuit’s similarly vacuous embrace of a WOE analysis in Milward v. Acuity Specialty Products Group, Inc., 639 F.3d 11, 22 (1st Cir. 2011).  Chantix at 51.

 

CHERRY PICKING

Judge Johnson noted, contrary to her earlier suggestion that Shira Kramer had considered all the studies, that Kramer had excluded data from her analysis.  Kramer’s exclusions may have rested upon pre-specified exclusionary principles, or they may have been completely ad hoc, as was the lack of weighting principles in her WOE analysis.  In its gatekeeping role, however, the trial court expressed complete indifference to Kramer’s selectivity in excluding data.  “Why Dr. Kramer chose to include or exclude data from specific clinical trials is a matter for cross-examination.”  Chantix at 27.  This indifference is an abdication of the court’s gatekeeping responsibility.

 

POWER

The trial court attempted to justify its willingness to mute defendant’s harping on statistical significance by adverting to the concept of statistical power:

“Oftentimes, epidemiological studies lack the statistical power needed for definitive conclusions, either because they are small or the suspected adverse effect is particularly rare. Id. [Michael D. Green et al., “Reference Guide on Epidemiology,” in Reference Manual on Scientific Evidence 333, 335 (Fed. Judicial Ctr. 2d ed. 2000)] … .”

Chantix at 29 n.16.

To be fair to the trial court, the Reference Manual invited this illegitimate use of statistical power because it, at times, omits the specification that statistical power requires not only a level of statistical significance to be attained, but also a specified alternative hypothesis against which power is computed.  See Power in the Courts — Part One; Power in the Courts — Part Two.  The trial court offered no alternative hypothesis against which any measure of power was to be assessed.

Judge Johnson did not report any power analyses, and she certainly did not report any quantification of power or lack thereof against some specific alternative hypothesis.  Judge Johnson’s invocation of power was just that – power used arbitrarily, without data, evidence, or reason.
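The missing ingredient can be made concrete with a short sketch (all numbers are hypothetical, drawn from no Chantix study): the power of even a simple two-sided z-test comparing two proportions cannot be computed without specifying both a significance criterion and a particular alternative hypothesis.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function (math stdlib only)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def power_two_proportions(p0, p1, n0, n1, z_crit=1.96):
    """Approximate power of a two-sided z-test comparing two proportions,
    computed against the specific alternative (p0, p1).  Without an
    alternative hypothesis, 'power' has no definite value."""
    se = math.sqrt(p0 * (1 - p0) / n0 + p1 * (1 - p1) / n1)
    z = abs(p1 - p0) / se
    return norm_cdf(z - z_crit) + norm_cdf(-z - z_crit)

# Hypothetical example: 2% baseline risk versus a 4% alternative (a doubling).
small_study = power_two_proportions(0.02, 0.04, 200, 200)    # modest power
large_study = power_two_proportions(0.02, 0.04, 2000, 2000)  # high power
```

Change the alternative hypothesis or the sample sizes and the computed power changes with them, which is why an invocation of “power,” untethered to any specified alternative, conveys nothing.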

 

CONFIDENCE INTERVALS

As with the invocation of statistical power, the trial court also invoked the concept of confidence intervals to suggest that such intervals provide a more refined approach to assessing statistical significance:

“A study found to have ‘results that are unlikely to be the result of random error’ is ‘statistically significant’. Reference Guide on Epidemiology, supra, at 354. Statistical significance, however, does not indicate the strength of an association found in a study. Id. at 359. ‘A study may be statistically significant but may find only a very weak association; conversely, a study with small sample sizes may find a high relative risk but still not be statistically significant.’ Id. To reach a ‘more refined assessment of appropriate inferences about the association found in an epidemiologic study’, researchers rely on another statistical technique known as a ‘confidence interval’. Id. at 360.”

Chantix at 30 n.17.  True, true, but immaterial.  The trial court, again, never carries through with the direction given by the Reference Manual.  Not a single confidence interval is presented.  No confidence intervals are subjected to this more refined assessment.  Why have more refined assessments when even the cruder assessments are not done?
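The “more refined assessment” is not hard to perform.  A minimal sketch, using the standard Woolf (log) method and wholly hypothetical 2×2 counts (no data from any Chantix study), shows what presenting a confidence interval would look like:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Woolf (log) 95% confidence interval for the odds ratio of a 2x2
    table: a, b = exposed and unexposed cases; c, d = exposed and
    unexposed controls."""
    or_point = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_point) - z * se_log)
    hi = math.exp(math.log(or_point) + z * se_log)
    return or_point, lo, hi

# Hypothetical counts: 10 of 100 cases exposed; 5 of 100 controls exposed.
or_point, lo, hi = odds_ratio_ci(10, 90, 5, 95)
# The point estimate is elevated (about 2.1), but the interval is wide and
# straddles 1.0 (roughly 0.7 to 6.4) -- an imprecise, inconclusive result.
```

An interval this wide, straddling no association, is exactly the sort of result a gatekeeping court should have examined before crediting a causal opinion.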

 

OPEN ADMISSIONS IN SCHOOL OF EXPERT WITNESSING

The trial court somehow had the notion that all it had to do was state that every disputed fact and opinion went to the weight, not the admissibility, and then pass the dispute to a presumably more scientifically literate jury.  To be sure, the court engaged in a good deal of hand waving, going through the motions of deciding contested issues.  Not only did Judge Johnson smash poor Pfizer’s harp; Her Honor unhinged the gate that federal judges are supposed to keep.  Chantix declares that it is now open admissions for expert witnesses testifying to causation in federal cases.  This is a judgment in search of an appeal.

Eighth Circuit Holds That Increased Risk Is Not Cause

August 4th, 2012

The South Dakota legislature took it upon itself to specify the “risks” to be included in the informed consent required by state law for an abortion procedure:

(1) A statement in writing providing the following information:
* * *
(e) A description of all known medical risks of the procedure and statistically significant risk factors to which the pregnant woman would be subjected, including:
(i) Depression and related psychological distress;
(ii) Increased risk of suicide ideation and suicide;
* * *

S.D.C.L. § 34-23A-10.1(1)(e)(i)(ii).  Planned Parenthood challenged the law on constitutional grounds, and the district court granted a preliminary injunction against the South Dakota statute, which a panel of the Eighth Circuit affirmed, only to have the Circuit en banc reverse and remand the case for further proceedings.  Planned Parenthood Minn. v. Rounds, 530 F.3d 724 (8th Cir. 2008) (en banc).

On remand, the parties filed cross-motions for summary judgment.  The district court held that the so-called suicide advisory was unconstitutional.  On the second appeal to the Eighth Circuit, a divided panel affirmed the trial court’s holding on the suicide advisory. 653 F.3d 662 (8th Cir. 2011).  The Circuit, however, again granted rehearing en banc, and reversed the summary judgment for Planned Parenthood on the advisory.  Planned Parenthood Minnesota v. Rounds, Slip op. July 24, 2012 (en banc) [Slip op.].

In support of the injunction, Planned Parenthood argued that the state’s mandatory suicide advisory violated women’s abortion rights and physicians’ free speech rights. The en banc court rejected this argument, holding that the required advisory was “truthful, non-misleading information,” which did not unduly burden abortion rights, even if it might cause women to forgo abortion.  See Planned Parenthood of Southeastern Pennsylvania v. Casey, 505 U.S. 833, 882-83 (1992).

Risk ≠ Cause

Planned Parenthood’s success in the trial court turned on its identification of risk (or increased risk) with cause, and its expert witness evidence that causation had not been accepted in the medical literature. In other words, Planned Parenthood argued that the advisory required disclosure of a conclusive causal “link” between abortion and suicide or suicidal ideation.  See 650 F. Supp. 2d 972, 982 (D.S.D. 2009).  The en banc court, on the second appeal, sought to save the statute by rejecting Planned Parenthood’s reading.  The court parsed the statute to suggest that the term “increased risk” is more precise and limited than the umbrella term of “risk,” standing alone.  Slip op. at 6.  The statute does not define “increased risk,” which the en banc court noted had various meanings in medicine.  Id. at 7.

Reviewing the medical literature, the en banc court held that the term “increased risk” does not refer to causation but to a much more modest finding of “a relatively higher probability of an adverse outcome in one group compared to other groups—that is, to ‘relative risk’.”  Id.  The en banc majority seemed to embroil itself in considerable semantic confusion.  On the one hand, the majority, in a rhetorical riff, proclaimed that:

“It would be nonsensical for those in the field to distinguish a relationship of ‘increased risk’ from one of causation if the term ‘risk’ itself was equivalent to causation.”

Id. at 9.  The majority’s nonsensical labeling is, well, … nonsensical.  There is a compelling difference between the assessment of risk and the assessment of causation.  Risk is an ex ante concept, applied before the effect has occurred. Assessment or attribution of causation takes place after the effect. Of course, there is a sense of risk or “increased risk” that is epistemologically more modest, but that hardly makes the more rigorous use of risk as an ex ante cause nonsensical.

The majority, however, was not content to leave the matter alone.  Elsewhere, the en banc court contradicted itself, and endorsed a view that risk = causation.  For instance, in citing a civil action involving a claimed causal relationship between Bendectin and a birth defect, the Eighth Circuit reduced risk to cause.  See Slip op. at 26 n. 9 (citing Brock v. Merrell Dow Pharms., Inc., 874 F.2d 307, 312, modified on reh’g, 884 F.2d 166 (5th Cir. 1989)).  The en banc court’s “explanatory” parenthetical reveals the depths of its confusion:

“explaining that if studies establish, within an acceptable confidence interval, that those who use a pharmaceutical have a relative risk of greater than 1.0—that is, an increased risk—of an adverse outcome, those studies might be considered sufficient to support a jury verdict of liability on a failure-to-warn claim.”

This reading of Brock is wrong on two counts.  First, the Fifth Circuit, in Brock, and consistently since, has required the relative risk greater than 1.0 to be statistically significant at the conventional significance probability, as well as other indicia of causality, such as the Bradford Hill factors.  So Brock and its progeny did not confuse or conflate risk with cause, or dilute the meaning of cause such that it could be satisfied by a mere showing of an increased relative risk.

Second, Brock itself made a serious error in interpreting statistical significance and confidence intervals. The Bendectin studies at issue in Brock were not statistically significant, and the confidence intervals did not exclude a measure of no association (relative risk = one). Brock, however, in notoriously incorrect dicta claimed that the computation of confidence intervals took into account bias and confounding as well as sampling variability.  Brock v. Merrell Dow Pharmaceuticals, Inc., 874 F.2d 307, 311-12 (5th Cir. 1989)(“Fortunately, we do not have to resolve any of the above questions [as to bias and confounding], since the studies presented to us incorporate the possibility of these factors by the use of a confidence interval.”)(emphasis in original).  See, e.g., David H. Kaye, David E. Bernstein, and Jennifer L. Mnookin, The New Wigmore – A Treatise on Evidence:  Expert Evidence § 12.6.4, at 546 (2d ed. 2011); Michael O. Finkelstein, Basic Concepts of Probability and Statistics in the Law 86-87 (2009)(criticizing the over-interpretation of confidence intervals by the Brock court); Schachtman, “Confidence in Intervals and Diffidence in the Courts” (Mar. 4, 2012).
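The point that a confidence interval quantifies random sampling error only, and says nothing about bias or confounding, can be illustrated with a toy simulation (hypothetical throughout): when the sampling process is systematically biased, the interval tightens around the wrong value as the sample grows, and confidently excludes the truth.

```python
import math
import random

random.seed(0)  # reproducible toy example

def biased_sample_ci(n, true_p=0.5, over_report=0.1, z=1.96):
    """Estimate a proportion from a sample with a built-in systematic
    bias: a true negative is over-reported as positive with probability
    `over_report`.  The confidence interval reflects random error only."""
    hits = 0
    for _ in range(n):
        positive = random.random() < true_p
        if not positive and random.random() < over_report:
            positive = True  # systematic over-reporting (the bias)
        hits += positive
    p_hat = hits / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

lo, hi = biased_sample_ci(100_000)
# The interval is very narrow, yet it sits near 0.55 and excludes the
# true value of 0.5: the "95% confidence" says nothing about the bias.
```

No amount of additional sampling cures the bias; it only makes the interval around the biased estimate narrower, which is precisely why Brock’s dictum was wrong.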

The en banc majority’s discussion of the studies of abortion and suicidality makes clear that the presence of bias and confounding in a study may prevent an inference of causation, but, in the majority’s view, does not undermine the conclusion that the studies show an increased risk.  A conclusion that the body of epidemiologic studies was inconclusive, and that it failed “to disentangle confounding factors and establish relative risks of abortion compared to its alternatives,” did not, therefore, render the suicide advisory about risk or increased risk unsupported, untruthful, or misleading.  Slip op. at 20.  Indeed, the en banc court provided an example, outside the context of abortion, to illustrate its meaning.  The en banc court’s use of the example of prolonged television viewing and “increased risk” of mortality suggests that the court took risk to mean any association, no matter how likely it was the result of bias or confounding.  See id. at 10 n. 3 (citing Anders Grøntved, et al., “Television Viewing and Risk of Type 2 Diabetes, Cardiovascular Disease, and All-Cause Mortality,” 305 J. Am. Med. Ass’n 2448 (2011)). The en banc majority held that the advisory would be misleading only if Planned Parenthood could show that the available epidemiologic studies conclusively ruled out causation.  Slip op. at 24-25.

The Suicide Advisory Has Little Content Because Risk Is Not Cause

The majority decision clarified that the mandatory disclosure does not require a physician to inform a patient that abortion causes suicide or suicidal thoughts.  Slip op. at 25.  The en banc court took solace in its realization that physicians reviewing the available studies could provide a disclosure that captures the difference between risk, relative risk, and causation.  In other words, physicians are free to tell patients that this thing called increased risk is not concerning because the studies are highly confounded, and they do not show causation.  Id. at 25-26.  Indeed, it would be hard to imagine an ethical physician telling patients anything else.

Dissent

Four of the Eighth Circuit judges dissented, pointing to evidence that the South Dakota legislators intended to mandate a disclosure about causality.  Slip op. at 29.  Putting aside whether the truthfulness of the suicide advisory can be saved by reverting to a more modest interpretation of risk or of increased risk, the dissenters appear to have the better argument that the advisory is misleading.  The majority, however, by driving its wedge between causation and increased risk, has allowed physicians to explain that the advisory has little or no meaning.

NOCEBO

The nocebo effect is the dark side of the placebo effect.  As pointed out recently in the Journal of the American Medical Association, nocebos can induce harmful outcomes because of the expectation of injury from the “psychosocial context or therapeutic environment” affecting patients’ perception of their health.  Luana Colloca & Damien Finniss, “Nocebo Effects, Patient-Clinician Communication, and Therapeutic Outcomes,” 307 J. Am. Med. Ass’n 567, 567 (2012).  It is fairly well accepted that clinicians can inadvertently prejudice health outcomes by how they frame outcome information to patients.  Colloca and Finniss note that the negative expectations created by nocebo communication can take place in the process of obtaining informed consent.

Unfortunately, there is no discussion of nocebo effects in the Eighth Circuit’s decision. Planned Parenthood might well consider the role the nocebo effect plays in the risk-benefit balance of an informed consent disclosure about a risk that is not really a risk at all, or at least not a risk in the sense of a factor that will bring about the putative harm, but only an association under study that cannot be separated from many confounding factors.  Surely, physicians in South Dakota will figure out how to give truthful, non-misleading disclosures that incorporate the mandatory suicide advisory, as well as the scientific evidence.

Viagra, Part II — MDL Court Sees The Light – Bad Data Trump Nuances of Statistical Inference

July 8th, 2012

In the Viagra vision loss MDL, the first Daubert hearing did not end well for the defense.  Judge Magnuson refused to go beyond the conclusory statements of the plaintiffs’ expert witness, Gerald McGwin, and to examine the qualitative and quantitative evaluative errors invoked to support plaintiffs’ health claims.  The weakness of McGwin’s evidence, however, appeared to encourage Judge Magnuson to authorize extensive discovery into McGwin’s study.  In re Viagra Products Liab. Litig., 572 F. Supp. 2d 1071, 1090 (D. Minn. 2008).

The discovery into McGwin’s study had already been underway, with subpoenas to him and to his academic institution.  As it turned out, defendant’s discovery into the data and documents underlying McGwin’s study won the day.  Although Judge Magnuson struggled with inferential statistics, he understood the direct attack on the integrity of McGwin’s data.  Over a year after denying defendant’s Rule 702 motion to exclude Gerald McGwin, the MDL court reconsidered and granted the motion.  In re Viagra Products Liab. Litig., 658 F. Supp. 2d 936, 945 (D. Minn. 2009).

The basic data on prior exposures and risk factors for the McGwin study were collected by telephone surveys, from which the information was coded into an electronic dataset.  In analyzing the data, McGwin used the electronic dataset and not the survey forms.  Id. at 939.  The transfer from survey forms to electronic dataset did not go smoothly; about 11 patients were miscoded as “exposed” when their use of Viagra post-dated the onset of NAION. Id. at 942.  Furthermore, the published article incorrectly stated personal history of heart attack as a “risk factor”; the survey had inquired about family, not personal, history of heart attack. Id. at 944.

The plaintiffs threw several bombs in response, but without legal effect.  First, the plaintiffs claimed that the study participants had been recontacted and the database corrected, but they were unable to document this process or the alleged corrections.  Id. at 433.  Furthermore, the plaintiffs could not explain how, if their contention had been true, McGwin would not have committed serious violations of his university’s institutional review board’s regulations with respect to deviations from the original protocol.  Id. at 943 n.7.

Second, the plaintiffs argued that the underlying survey forms were “inadmissible” and thus the defense could not use them to impeach the McGwin study.  Some might think this a duplicitous argument, utterly at odds with Rule 703 – rely upon a study but prevent use of the underlying data and documents to explain that the study does not show what it purports to show.  The MDL court spared the plaintiffs the embarrassment of ruling that the documents on which McGwin had based his study were inadmissible, and found that the forms were business records, admissible under Federal Rule of Evidence 803(6).  The court could have gone further to point out that McGwin’s reliance upon hearsay in the form of his study, McGwin 2006, opened the door to impeaching the hearsay relied upon with other hearsay.  See Rule 806.

When defense counsel sat down with McGwin in a deposition, they found that he had not undertaken any new analyses of the corrected data.  Plaintiffs’ counsel had directed him not to do so.  Id. at 940-41.  But then, after the deposition was over, McGwin submitted a letter to the journal to report a corrected analysis.  Pfizer’s counsel obtained the letter in response to their subpoena to McGwin’s university, the University of Alabama, Birmingham.  Mirabile dictu; now the increased risk appeared limited only to the defendant’s medication, Viagra!

The trial court was not amused.  First, the new analysis was not peer reviewed, and the court had placed a great deal of emphasis on peer review in denying the first challenge to McGwin.  Second, the new analysis was no longer that of an independent scientist; it was conducted and submitted as a letter to the editor while McGwin was working for plaintiffs’ counsel.  Third, the plaintiffs and McGwin conceded that the data were not accurate.  Last, but not least, the trial court clearly was not pleased that the plaintiffs’ counsel had deliberately delayed McGwin’s further analyses until after the deposition, and then tried to submit yet another supplemental report with those further analyses. In sum:

“the Court finds good reason to vacate its original Daubert Order permitting Dr. McGwin to testify as a general causation expert based on the McGwin Study as published. Almost every indicia of reliability the Court relied on in its previous Daubert Order regarding the McGwin Study has been shown now to be unreliable.  Peer review and publication mean little if a study is not based on accurate underlying data. Likewise, the known rate of error is also meaningless if it is based on inaccurate data. Even if the McGwin Study as published was conducted according to generally accepted epidemiologic research and did not result from post-litigation research, the fact that the McGwin Study appears to have been based on data that cannot now be documented or supported renders it inadmissibly unreliable. The Court concludes that under Daubert, Dr. McGwin’s opinion, to the extent that it is based on the McGwin Study as published, lacks sufficient indicia of reliability to be admitted as a general causation opinion.”

Id. at 945-46.  The remaining evidence was the Margo & French study, but McGwin had previously criticized that study as lacking data that ensured that Viagra use preceded onset of NAION.  In the end, McGwin was left with bupkes, and the plaintiffs were left with even less.

*******************

McGwin 2006 Was Also A Pain in the Rear End for McGwin

The Rule 702 motions and hearings on McGwin’s proposed testimony had consequences in the scientific world itself.  In 2011, the British Journal of Ophthalmology retracted McGwin’s 2006 paper.  “Retraction: Non-arteritic anterior ischaemic optic neuropathy and the treatment of erectile dysfunction,” 95 Brit. J. Ophthalmol. 595 (2011).

Interestingly, the retraction was reported in the Retraction Watch blog, “Retractile dysfunction? Author says journal yanked paper linking Viagra, Cialis to vision problem after legal threats.”  The blog treated the retraction as routine except for the hint of “legal pressure”:

“One of the authors of the paper, a researcher at the University of Alabama named Gerald McGwin Jr., told us that the journal retracted the article because it had become a tool in a lawsuit involving Pfizer, which makes Viagra, and, presumably, men who’d developed blindness after taking the drug:

‘The article just became too much of a pain in the rear end. It became one of those things where we couldn’t provide all the relevant documentation [to the university, which had to provide records for attorneys].’

Ultimately, however, McGwin said that the BJO pulled the plug on the paper.”

Id. The legal threat is hard to discern other than the fact that lawyers wanted to see something that peer reviewers almost never see – the documentation underlying the published paper.  So now, the study that formed the basis for the original ruling against Pfizer floats aimlessly as a derelict on the sea of science.  McGwin is, however, still at his craft.  In a study he published in 2010, he claimed that Viagra but not Cialis use was associated with hearing impairment.  Gerald McGwin, Jr, “Phosphodiesterase Type 5 Inhibitor Use and Hearing Impairment,” 136 Arch. Otolaryngol. Head & Neck Surgery 488 (2010).

Where are Senator Grassley and Congressman Waxman when you need them?

Love is Blind but What About Judicial Gatekeeping of Expert Witnesses? – Viagra Part I

July 7th, 2012

The Viagra litigation over claimed vision loss vividly illustrates the difficulties that trial judges have in understanding and applying the concept of statistical significance.  In this MDL, plaintiffs sued for a specific form of vision loss, non-arteritic ischemic optic neuropathy (NAION), which they claimed was caused by their use of defendant’s medication, Viagra.  In re Viagra Products Liab. Litig., 572 F. Supp. 2d 1071 (D. Minn. 2008).  Plaintiffs’ key expert witness, Gerald McGwin, considered three epidemiologic studies; none found a statistically significant elevation of the risk of NAION after Viagra use.  Id. at 1076. The defense filed a Rule 702 motion to exclude McGwin’s testimony, based in part upon the lack of statistical significance of the risk ratios on which he relied for his causal opinion.  The trial court held that this lack did not render McGwin’s testimony unreliable and inadmissible.  Id. at 1090.

One of the three studies considered by McGwin was his own published paper.  G. McGwin, Jr., M. Vaphiades, T. Hall, C. Owsley, “Non-arteritic anterior ischaemic optic neuropathy and the treatment of erectile dysfunction,” 90 Br. J. Ophthalmol. 154 (2006) [“McGwin 2006”].  The MDL court noted that McGwin had stated that his paper reported an odds ratio (OR) of 1.75, with a 95% confidence interval (CI) of 0.48 to 6.30.  Id. at 1080.  The study also presented multiple subgroup analyses of men who had reported Viagra use after a history of heart attack (OR = 10.7) or hypertension (OR = 6.9), but the MDL court did not provide p-values or confidence intervals for the subgroup analysis results.

Curiously, Judge Magnuson eschewed the guidance of the Reference Manual on Scientific Evidence, in dealing with statistics of sampling estimates of means or proportions.  The Reference Manual on Scientific Evidence (2d ed. 2000) urges that:

“[w]henever possible, an estimate should be accompanied by its standard error.”

RMSE 2d ed. at 117-18.  The third edition conveys the same basic message:

“What is the standard error? The confidence interval?

An estimate based on a sample is likely to be off the mark, at least by a small amount, because of random error. The standard error gives the likely magnitude of this random error, with smaller standard errors indicating better estimates.”

RMSE 3d ed. at 243.

The point of the RMSE’s guidance is, of course, that the standard error, or the confidence interval (C.I.) based upon a specified number of standard errors, is an important component of the sample statistic; without it the sample estimate is virtually meaningless.  Just as a narrative statement should not be truncated, a statistical or numerical expression should not be unduly abridged.

The statistical data on which McGwin based his opinion were readily available from McGwin 2006:

“Overall, males with NAION were no more likely to report a history of Viagra … use compared to similarly aged controls (odds ratio (OR) 1.75, 95% confidence interval (CI) 0.48 to 6.30).  However, for those with a history of myocardial infarction, a statistically significant association was observed (OR 10.7, 95% CI 1.3 to 95.8). A similar association was observed for those with a history of hypertension though it lacked statistical significance (OR 6.9, 95% CI 0.8 to 63.6).”

McGwin 2006, at 154.  Following the RMSE’s guidance would have assisted the MDL court in its gatekeeping responsibility in several distinct ways.  First, the court would have focused on how wide the 95% confidence intervals were.  The width of the intervals pointed to statistical imprecision and instability in the point estimates urged by McGwin.  Second, the MDL court would have confronted the extent to which McGwin’s paper rested on multiple ad hoc subgroup analyses.  See Newman v. Motorola, Inc., 218 F. Supp. 2d 769, 779 (D. Md. 2002)(“It is not good scientific methodology to highlight certain elevated subgroups as significant findings without having earlier enunciated a hypothesis to look for or explain particular patterns.”) Third, the court would have confronted the extent to which the study’s validity was undermined by several potent biases.  Statistical significance was the least of the problems faced by McGwin 2006.

The second study considered and relied upon by McGwin was referred to as Margo & French.  McGwin cited this paper for an “elevated OR of 1.10,” id. at 1081, but again, had the court engaged with the actual evidence, it would have found that McGwin had cherry-picked the data he chose to emphasize.  The Margo & French study was a retrospective cohort study using the National Veterans Health Administration’s pharmacy and clinical databases.  C. Margo & D. French, “Ischemic optic neuropathy in male veterans prescribed phosphodiesterase-5 inhibitors,” 143 Am. J. Ophthalmol. 538 (2007).  There were two outcomes ascertained:  NAION and “possible” NAION.  The relative risk of NAION among men prescribed a PDE-5 inhibitor (the class to which Viagra belongs) was 1.02 (95% confidence interval [CI]: 0.92 to 1.12).  In other words, the Margo & French paper had very high statistical precision, and it reported essentially no increased risk at all.  Judge Magnuson uncritically cited McGwin’s endorsement of a risk ratio that included “possible” NAION cases, which could not bode well for a gatekeeping process that is supposed to protect against speculative evidence and conclusions.

McGwin’s citation of Margo & French for the proposition that men who had taken PDE-5 inhibitors had a 10% increased risk was wrong on several counts.  First, he relied upon an outcome measure that included “possible” cases of NAION.  Second, he completely ignored the sampling error that is captured in the confidence interval.  The MDL court failed to note or acknowledge the p-value or confidence interval for any result in Margo & French. The consideration of random error was not an optional exercise for the expert witness or the court; nor was ignoring it a methodological choice that simply went to the “disagreement among experts.”

The Viagra MDL court not only lost its way by ignoring the guidance of the RMSE, but also appeared to confuse the magnitude of the associations with the concept of statistical significance.  In the midst of the discussion of statistical significance, the court digressed to address the notion that the small relative risk in Margo & French might mean that no plaintiff could show specific causation, and then in the same paragraph returned to state that ‘‘persuasive authority’’ supported the notion that the lack of statistical significance did not detract from the reliability of a study.  Id. at 1081 (citing In re Phenylpropanolamine (PPA) Prods. Liab. Litig., MDL No. 1407, 289 F.Supp.2d 1230, 1241 (W.D.Wash. 2003)).  The magnitude of the observed odds ratio is a concept independent of whether an odds ratio as extreme, or more extreme, would have occurred by chance if there really were no elevation.

Citing one case, at odds with a great many others, however, did not create an epistemic warrant for ignoring the lack of statistical significance.  The entire notion of citing caselaw for the meaning and importance of statistical significance in drawing inferences is wrongheaded.  Even more to the point, the lack of statistical significance in the key study in the PPA litigation did not detract from the reliability of the study, although other features of that study certainly did.  The lack of statistical significance in the PPA study did, however, detract from the reliability of the inference from the study’s estimate of ‘‘effect size’’ to a conclusion of causal association.  Indeed, nowhere in the key PPA study did its authors draw a causal conclusion with respect to PPA ingestion and hemorrhagic stroke.  See Walter Kernan, Catherine Viscoli, Lawrence Brass, Joseph Broderick, Thomas Brott, Edward Feldmann, Lewis Morgenstern, Janet Lee Wilterdink, and Ralph Horwitz, ‘‘Phenylpropanolamine and the Risk of Hemorrhagic Stroke,’’ 343 New England J. Med. 1826 (2000).

The MDL court did attempt to distinguish the Eighth Circuit’s decision in Glastetter v. Novartis Pharms. Corp., 252 F.3d 986 (8th Cir. 2001), cited by the defense:

‘‘[I]n Glastetter … expert evidence was excluded because ‘rechallenge and dechallenge data’ presented statistically insignificant results and because the data involved conditions ‘quite distinct’ from the conditions at issue in the case. Here, epidemiologic data is at issue and the studies’ conditions are not distinct from the conditions present in the case. The Court does not find Glastetter to be controlling.’’

Id. at 1081 (internal citations omitted; emphasis in original).  This reading of Glastetter, however, misses important features of that case and the Parlodel litigation more generally.  First, the Eighth Circuit commented not only upon the rechallenge-dechallenge data, which involved arterial spasms, but upon an epidemiologic study of stroke, from which Ms. Glastetter suffered.  The Glastetter court did not review the epidemiologic evidence itself, but cited to another court, which did discuss and criticize the study for various ‘‘statistical and conceptual flaws.’’  See Glastetter, 252 F.3d at 992 (citing Siharath v. Sandoz Pharms.Corp., 131 F.Supp. 2d 1347, 1356-59 (N.D.Ga.2001)).  Glastetter was binding authority, and not so easily dismissed and distinguished.

The Viagra MDL court ultimately placed its holding upon the facts that:

‘‘the McGwin et al. and Margo et al. studies were peer-reviewed, published, contain known rates of error, and result from generally accepted epidemiologic research.’’

In re Viagra, 572 F. Supp. 2d at 1081 (citations omitted).  This holding was a judicial ipse dixit substituting for the expert witness’s ipse dixit.  There were no known rates of error for the systematic errors in the McGwin study, and the ‘‘known’’ rates of error for random error in McGwin 2006  were intolerably high.  The MDL court never considered any of the error rates, systematic or random, for the Margo & French study.  The court appeared to have abdicated its gatekeeping responsibility by delegating it to unknown peer reviewers, who never considered whether the studies at issue in isolation or together could support a causal health claim.

With respect to the last of the three studies considered, the Gorkin study, McGwin opined that it was too small, and that its data were not suited to assessing the temporal relationship.  Id.  The court did not appear inclined to go beyond McGwin’s ipse dixit.  The Gorkin study was hardly small:  it was based upon more than 35,000 patient-years of observation in epidemiologic studies and clinical trials, and it provided an estimate of the incidence of NAION among users of Viagra that was not statistically different from the rate in the general U.S. population.  See L. Gorkin, K. Hvidsten, R. Sobel, and R. Siegel, ‘‘Sildenafil citrate use and the incidence of nonarteritic anterior ischemic optic neuropathy,’’ 60 Internat’l J. Clin. Pract. 500, 500 (2006).

Judge Magnuson did proceed, in his 2008 opinion, to exclude all the other expert witnesses put forward by the plaintiffs.  McGwin survived the defendant’s Rule 702 challenge, largely because the court refused to consider the substantial random variability in the point estimates from the studies relied upon by McGwin. There was no consideration of the magnitude of random error, or for that matter, of the systematic error in McGwin’s study.  The MDL court found that the studies upon which McGwin relied had a known and presumably acceptable ‘‘rate of error.’’  In fact, the court did not consider the random or sampling error in any of the three cited studies; it failed to consider the multiple testing and interaction; and it failed to consider the actual and potential biases in the McGwin study.

Some legal commentators have argued that statistical significance should not be a litmus test.  David Faigman, Michael Saks, Joseph Sanders, and Edward Cheng, Modern Scientific Evidence: The Law and Science of Expert Testimony § 23:13, at 241 (‘‘Statistical significance should not be a litmus test. However, there are many situations where the lack of significance combined with other aspects of the research should be enough to exclude an expert’s testimony.’’)  While I agree that significance probability should not be evaluated in a mechanical fashion, without consideration of study validity, multiple testing, bias, confounding, and the like, hand waving about litmus tests does not excuse courts or commentators from totally ignoring random variability in studies based upon population sampling.  The dataset in the Viagra litigation was not a close call.

Maryland Puts the Brakes on Each and Every Asbestos Exposure

July 3rd, 2012

Last week, the Maryland Court of Special Appeals reversed a plaintiffs’ verdict in Dixon v. Ford Motor Company, 2012 WL 2483315 (Md. App. June 29, 2012).  Jane Dixon died of pleural mesothelioma.  The plaintiffs, her survivors, claimed that her last illness and death were caused by her household improvement projects, which involved exposure to spackling/joint compound, and by her husband’s work with car parts and brake linings, which involved “take home” exposure on his clothes.  Id. at *1.

All the expert witnesses appeared to agree that mesothelioma is a “dose-response disease,” meaning that the more the exposure, the greater the likelihood that a person exposed will develop the disease. Id. at *2.  Plaintiffs’ expert witness, Dr. Laura Welch, testified that “every exposure to asbestos is a substantial contributing cause and so brake exposure would be a substantial cause even if [Mrs. Dixon] had other exposures.” On cross-examination, Dr. Welch elaborated upon her opinion to explain that any “discrete” exposure would be a contributing factor. Id.

Welch, of course, criticized the entire body of epidemiology of car mechanics and brake repairmen, which generally finds no increased risk of mesothelioma above overall population rates.  With respect to the take-home exposure, Welch had to acknowledge that there were no epidemiologic studies that investigated the risk of wives of brake mechanics.  Welch argued that the studies of car mechanics did not involve exposure to brake shoes as would have been experienced by brake repairmen, but her argument only served to make her attribution based upon take-home exposure to brake linings seem more preposterous.  Id. at *3.  The court recognized that Dr. Welch’s opinion may have been trivially true, but still unhelpful.  Each discrete exposure, even as attenuated as a take-home exposure from having repaired a single brake shoe may have “contributed,” but that opinion did not help the jury assess whether the contribution was substantial.

The court sidestepped the issues of fiber type and threshold, and homed in on the agreement that mesothelioma risk showed a dose-response relationship with asbestos exposure.  (There is a sense that the court took the dose-response concept to mean that there is no threshold.)  The court credited hyperbolic risk assessment figures from the United States Environmental Protection Agency, which suggested that even ambient air exposure to asbestos leads to an increase in mesothelioma risk, but then realized that such claims made the legal need to characterize the risk from the defendant’s product all the more important before the jury could reasonably have concluded that any particular exposure experienced by Ms. Dixon was “a substantial contributing factor.”  Id. at *5.

Having recognized that the best the plaintiffs could offer was a claim of increased risk, and perhaps crude quantification of the relative risks resulting from each product’s exposure, the court could not escape the conclusion that Dr. Welch’s empty recitation that “every exposure” is substantial was nothing more than an unscientific assertion.  Welch’s claim was either tautologically true or empirical nonsense.  The court also recognized that substituting risk for causation opened the door to essentially probabilistic evidence:

“If risk is our measure of causation, and substantiality is a threshold for risk, then it follows—as intimated above—that ‘substantiality’ is essentially a burden of proof. Moreover, we can explicitly derive the probability of causation from the statistical measure known as ‘relative risk’ … .  For reasons we need not explore in detail, it is not prudent to set a singular minimum ‘relative risk’ value as a legal standard.12 But even if there were some legal threshold, Dr. Welch provided no information that could help the finder of fact to decide whether the elevated risk in this case was ‘substantial’.”

Id. at *7.  The court’s discussion here of “the elevated risk” seems wrong unless we understand it to mean the elevated risk attributable to the particular defendant’s product, in the context of an overall exposure that we accept as having been sufficient to cause the decedent’s mesothelioma.  Despite the lack of any quantification of relative risks in the case, overall or from particular products, and the court’s own admonition against setting a minimum relative risk as a legal standard, the court proceeded to discuss relative risks at length.  For instance, the court criticized Judge Kozinski’s opinion in Daubert, upon remand from the Supreme Court, for not going far enough:

“In other words, the Daubert court held that a plaintiff’s risk of injury must have at least doubled in order to hold that the defendant’s action was ‘more likely than not’ the actual cause of the plaintiff’s injury. The problem with this holding is that relative risk does not behave like a ‘binary’ hypothesis that can be deemed ‘true’ or ‘false’ with some degree of confidence; instead, the uncertainty inherent in any statistical measure means that relative risk does not resolve to a certain probability of specific causation. In order for a study of relative risk to truly fulfill the preponderance standard, it would have to result in 100% confidence that the relative risk exceeds two, which is a statistical impossibility. In short, the Daubert approach to relative risk fails to account for the twin statistical uncertainty inherent in any scientific estimation of causation.”

Id. at *7 n.12 (citing Daubert v. Merrell Dow Pharms., Inc., 43 F.3d 1311, 1320-21 (9th Cir. 1995) (holding that a preponderance standard requires causation to be shown by probabilistic evidence of relative risk greater than two) (opinion on remand from Daubert v. Merrell Dow Pharms., 509 U.S. 579 (1993))).  The statistical impossibility derives from the asymptotic nature of the normal distribution, but the court failed to explain why a relative risk of two must be excluded as statistically implausible based upon the sample statistic.  After all, a relative risk greater than two, with the lower bound of a 95% confidence interval above one, based upon unbiased sampling, suggests that our best evidence is that the population parameter is greater than two, as well.  The court, however, insisted upon stating the relative-risk-greater-than-two rule with a vengeance:

“All of this is not to say, however, that any and all attempts to establish a burden of proof of causation using relative risk will fail. Decisions can be – and in science or medicine are – premised on the lower limit of the relative risk ratio at a requisite confidence level. The point of this minor discussion is that one cannot apply the usual, singular ‘preponderance’ burden to the probability of causation when the only estimate of that probability is statistical relative risk. Instead, a statistical burden of proof of causation must consist of two interdependent parts: a requisite confidence of some minimum relative risk. As we explain in the body of our discussion, the flaws in Dr. Welch’s testimony mean we need not explore this issue any further.”

Id. (emphasis in original).
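Whatever the merits of the court’s two-part framework, the arithmetic behind the relative-risk-greater-than-two rule it criticized is simple.  A minimal sketch of the standard attributable-fraction calculation (the function name and illustrative values are mine):

```python
def probability_of_causation(rr):
    """Attributable fraction among the exposed, (RR - 1) / RR: the usual
    (and contested) translation of a relative risk into the probability
    that exposure caused a particular plaintiff's disease."""
    if rr <= 1.0:
        return 0.0
    return (rr - 1.0) / rr

# RR = 2 sits exactly at the 50% line; only RR > 2 crosses the
# preponderance threshold, which is the point of the Daubert remand rule.
for rr in (1.1, 2.0, 3.0):
    print(f"RR {rr}: probability of causation {probability_of_causation(rr):.0%}")
```

On this arithmetic, a risk ratio of 1.1, of the sort discussed elsewhere in this post, translates into a probability of specific causation under ten percent, far short of any preponderance standard.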

And despite having declared the improvidence of addressing the relative risk issue, and then the lack of necessity for addressing the issue given Dr. Welch’s flawed testimony, the court nevertheless tackled the issue once more, a couple of pages later:

“It would be folly to require an expert to testify with absolute certainty that a plaintiff was exposed to a specific dose or suffered a specific risk. Dose and risk fall on a spectrum and are not ‘true or false’. As such, any scientific estimate of those values must be expressed as one or more possible intervals and, for each interval, a corresponding confidence that the true value is within that interval.”

Id. at *9 (emphasis in original; internal citations omitted).  The court captured the frequentist concept of the confidence interval as defined operationally by repeated samplings and their random variability.  But the “confidence” of a confidence interval means that the specified coefficient represents the percentage of all such intervals that would include the “true” value, not the probability that a particular interval, calculated from a given sample, contains the true value.  The true value is either in, or not in, the interval generated from a single sample statistic.  Again, it is unclear why the court was weighing in on this aspect of probabilistic evidence when plaintiffs’ expert witness, Welch, offered no quantification of the overall risk or of the risk attributable to a specific product exposure.
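The court’s slip is the classic misreading of a confidence coefficient, and a small simulation makes the correct, long-run interpretation concrete.  A sketch, with all parameters arbitrary and chosen only for illustration:

```python
import math
import random
import statistics

def coverage_simulation(true_mean=0.0, n=50, trials=2000, z=1.96, seed=1):
    """Repeatedly draw samples, build a 95% interval from each, and count
    how often the interval contains the true value.  The '95%' attaches
    to this long-run frequency, not to any single interval."""
    random.seed(seed)
    hits = 0
    for _ in range(trials):
        sample = [random.gauss(true_mean, 1.0) for _ in range(n)]
        m = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        if m - z * se <= true_mean <= m + z * se:
            hits += 1
    return hits / trials

print(coverage_simulation())  # hovers near 0.95 over many repetitions
```

Any single interval from one of these samples either covers the true value or it does not; the simulation shows only that about 95 of every 100 such intervals do, which is the point the Dixon court garbled.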

The court indulged the plaintiffs’ no-threshold fantasy but recognized that the risks of low-level asbestos exposure were low, and likely below a doubling of risk, an issue that the court stressed it wanted to avoid.  The court cited one study that suggested a risk (odds) ratio of 1.1 for exposures less than 0.5 fiber/ml-years.  See id. at *5 (citing Y. Iwatsubo et al., “Pleural mesothelioma: dose-response relation at low levels of asbestos exposure in a French population-based case-control study,” 148 Am. J. Epidemiol. 133 (1998) (estimating an odds ratio of 1.1 for exposures less than 0.5 fibers/ml-years)).  But the court, which tried to be precise elsewhere, appears to have lost its way in citing Iwatsubo here.  After all, how can a single odds ratio of 1.1 describe all exposures from 0 all the way up to 0.5 f/ml-years?  How can a single odds ratio describe all exposures in this range, regardless of fiber type, when chrysotile asbestos carries little to no risk for mesothelioma, and certainly carries orders of magnitude less risk than amphibole fibers such as amosite and crocidolite?  And if low-level exposure has a risk ratio of 1.1, how could plaintiffs’ hired expert witness, Welch, even make the attribution of Dixon’s mesothelioma to the entirety of her exposure, let alone to the speculative take-home chrysotile exposure involved from Ford’s brake linings?  Obviously, had the court posed these questions, it would have realized that “it is not possible” to permit Welch’s testimony at all.

The court further lost its way in addressing the exculpatory epidemiology put forward by the defense expert witnesses:

“Furthermore, the leading epidemiological report cited by Ford and its amici that specifically studied ‘brake mechanics’, P.A. Hessel et al., ‘Mesothelioma Among Brake Mechanics: An Expanded Analysis of a Case-control Study’, 24 Risk Analysis 547 (2004), does not at all dispel the notion that this population faced an increased risk of mesothelioma due to their industrial asbestos exposure. … When calculated at the 95% confidence level, Hessel et al. estimated that the odds ratio of mesothelioma could have been as low as 0.01 or as high as 4.71, implying a nearly quintupled risk of mesothelioma among the population of brake mechanics. 24 Risk Analysis at 550–51.”

Id. at *8.  Again, the court is fixated on the confidence interval, to the exclusion of the estimated magnitude of the association!  This time, after earlier insisting that it is the lower bound of the interval that matters scientifically, the court emphasizes the upper bound.  The court here has strayed far from the actual data, and from any plausible interpretation of them:

“The odds ratio (OR) for employment in brake installation or repair was 0.71 (95% CI: 0.30-1.60) when controlled for insulation or shipbuilding. When a history of employment in any of the eight occupations with potential asbestos exposure was controlled, the OR was 0.82 (95% CI: 0.36-1.80). ORs did not increase with increasing duration of brake work. Exclusion of those with any of the eight exposures resulted in an OR of 0.62 (95% CI: 0.01-4.71) for occupational brake work.”

P.A. Hessel et al., “Mesothelioma Among Brake Mechanics: An Expanded Analysis of a Case-control Study,” 24 Risk Analysis 547, 547 (2004).  All of Dr. Hessel’s estimates of effect sizes were below 1.0, and he found no trend for duration of brake work.  Cherry picking out the upper bound of a single subgroup analysis for emphasis was unwarranted, and hardly did justice to the facts or the science.
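An interval running from 0.01 to 4.71 is exactly what sparse data produce.  A sketch using the standard Woolf (log) method shows the effect; the cell counts below are wholly hypothetical, chosen only to yield a point estimate near Hessel’s 0.62, since the article’s underlying table is not reproduced here:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf 95% CI from a 2x2 table:
    a, b = exposed cases, controls; c, d = unexposed cases, controls."""
    or_point = (a * d) / (b * c)
    se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
    lower = math.exp(math.log(or_point) - z * se_log_or)
    upper = math.exp(math.log(or_point) + z * se_log_or)
    return or_point, lower, upper

# With only a couple of exposed cases, the interval balloons: a wide
# upper bound reflects statistical poverty, not evidence of increased risk.
print(odds_ratio_ci(2, 3, 50, 47))
```

An interval that wide is compatible with almost anything; citing its upper bound as “implying” a quintupled risk, while ignoring a point estimate below 1.0, inverts what the interval actually conveys.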

Dr. Welch’s conclusion that the exposure and risk in this case were “substantial” simply was not a scientific conclusion, and without it her testimony did not provide information for the jury to use in reaching its conclusion as to substantial factor causation. Id. at *7.  The court noted that Welch, and the plaintiffs, may have lacked scientific data to provide estimates of Dixon’s exposure to asbestos or relative risk of mesothelioma, but ignorance or uncertainty was hardly the basis to warrant an expert witness’s belief that the relevant exposures and risks are “substantial.” Id. at *10.  The court was well justified in being discomforted by the conclusory, unscientific opinion rendered by Laura Welch.

In the final puzzle of the Dixon case, the court vacated the judgment, and remanded for a new trial, “either without her opinion on substantiality or else with some quantitative testimony that will help the jury fulfill its charge.”  Id. at *10.  The court thus seemed to imply that an expert witness need not utter the magic word, “substantial,” for the case to be submitted to the jury against a brake defendant in a take-home exposure case.  Given the state of the record, the court should have simply reversed and rendered judgment for Ford.

Ecological Fallacy Goes to Court

June 30th, 2012

In previous posts, I have bemoaned the judiciary’s tin ear for important qualitative differences between and among different research study designs.  The Reference Manual on Scientific Evidence (3d ed. 2011) (RMSE3d) offers inconsistent advice, ranging from Margaret Berger’s counsel to abandon any hierarchy of evidence, to other chapters’ emphasis on the importance of a hierarchy.

The Cook case is one of the more aberrant decisions, which elevated an ecological study, without a statistically significant result, into an acceptable basis for a causal conclusion under Rule 702.  Senior Judge Kane’s decision in the litigation over radioactive contamination from the Colorado Rocky Flats nuclear weapons plant is illustrative of a judicial refusal to engage with the substantive differences among studies, and a willingness to ignore the inability of some study designs to support causality.  See Cook v. Rockwell Internat’l Corp., 580 F. Supp. 2d 1071, 1097-98 (D. Colo. 2006) (“Defendants assert that ecological studies are inherently unreliable and therefore inadmissible under Rule 702.  Ecological studies, however, are one of several methods of epidemiological study that are well-recognized and accepted in the scientific community.”), rev’d and remanded on other grounds, 618 F.3d 1127 (10th Cir. 2010), cert. denied, ___ U.S. ___ (May 24, 2012).  Senior Judge Kane’s point about the recognition and acceptance of ecological studies has nothing to do with their ability to support conclusions of causality.  This basic non sequitur led the trial judge into ruling that the challenge “goes to the weight, not the admissibility” of the challenged opinion testimony.  This is a bit like using an election day exit poll, with 5% of returns in, as “reliable” evidence to support a prediction of the winner.  The poll may have been conducted most expertly, but it lacks the ability to predict the winner.

The issue is not whether ecological studies are “scientific”; they are part of the epidemiologists’ toolkit.  The issue is whether they warrant inferences of causation.  Some so-called scientific studies are merely hypothesis generating, preliminary, tentative, or data-dredging exercises.  Judge Kane opined that ecological studies are merely “less probative” than other studies, and that the relative weights of studies do not render them inadmissible.  Id.  This is a misunderstanding or an abdication of gatekeeping responsibility.  First, studies themselves are not admissible; it is the expert witness whose testimony is challenged.  Second, Rule 702 requires that the proffered opinion be “scientific knowledge,” and ecological studies simply lack the necessary epistemic warrant.

The legal sources cited by Senior Judge Kane provide only equivocal and minimal support, at best, for his decision.  The court pointed to RMSE2d at 344-45, for the proposition that ecological studies are useful for establishing associations, but are weak evidence for causality.  The other legal citations seem equally unhelpful.  In re Hanford Nuclear Reservation Litig., No. CY–91–3015–AAM, 1998 WL 775340, at *106 (E.D. Wash. Aug. 21, 1998) (citing RMSE2d and the National Academy of Science Committee on Radiation Dose Reconstruction for Epidemiological Uses, which states that “ecological studies are usually regarded as hypothesis generating at best, and their results must be regarded as questionable until confirmed with cohort or case-control studies.” National Research Council, Radiation Dose Reconstruction for Epidemiologic Uses at 70 (1995)), rev’d on other grounds, 292 F.3d 1124 (9th Cir. 2002); Ruff v. Ensign–Bickford Indus., Inc., 168 F. Supp. 2d 1271, 1282 (D. Utah 2001) (reviewing evidence that consisted of a case-control study in addition to an ecological study; “It is well established in the scientific community that ecological studies are correlational studies and generally provide relatively weak evidence for establishing a conclusive cause and effect relationship.”); see also id. at 1274 n.3 (“Ecological studies tend to be less reliable than case–control studies and are given little evidentiary weight with respect to establishing causation.”)

 

ERROR COMPOUNDED

The new edition of the RMSE cites the Cook case at several places.  In an introductory chapter, the late Professor Margaret Berger cites the case incorrectly for having excluded expert witness testimony.  See Margaret A. Berger, “The Admissibility of Expert Testimony,” 11, 24 n.62, in RMSE3d (“See Cook v. Rockwell Int’l Corp., 580 F. Supp. 2d 1071 (D. Colo. 2006) (discussing why the court excluded expert’s testimony, even though his epidemiological study did not produce statistically significant results).”)  The chapter on epidemiology cites Cook correctly for having refused to exclude the plaintiffs’ expert witness, Dr. Richard Clapp, who relied upon an ecological study of two cancer outcomes in the area adjacent to the Rocky Flats Nuclear Weapons Plant.  See Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” 549, 561 n.34, in Reference Manual on Scientific Evidence (3d ed. 2011).  The authors, however, abstain from any judgmental comments about the Cook case, which is curious given their careful treatment of ecological studies and their limitations:

“4. Ecological studies

Up to now, we have discussed studies in which data on both exposure and health outcome are obtained for each individual included in the study.33 In contrast, studies that collect data only about the group as a whole are called ecological studies.34 In ecological studies, information about individuals is generally not gathered; instead, overall rates of disease or death for different groups are obtained and compared. The objective is to identify some difference between the two groups, such as diet, genetic makeup, or alcohol consumption, that might explain differences in the risk of disease observed in the two groups.35 Such studies may be useful for identifying associations, but they rarely provide definitive causal answers.36

Id. at 561.  The epidemiology chapter proceeds to note that the lack of information about individual exposure and disease outcome in an ecological study “detracts from the usefulness of the study,” and renders it prone to erroneous inferences about the association between exposure and outcome, “a problem known as an ecological fallacy.”  Id. at 562.  The chapter authors define the ecological fallacy:

“Also, aggregation bias, ecological bias. An error that occurs from inferring that a relationship that exists for groups is also true for individuals.  For example, if a country with a higher proportion of fishermen also has a higher rate of suicides, then inferring that fishermen must be more likely to commit suicide is an ecological fallacy.”

Id. at 623.  Although the ecological study design is weak and generally unsuitable to support causal inferences, the authors note that such studies can be useful in generating hypotheses for future research using studies that gather data about individuals. Id. at 562.  See also David Kaye & David Freedman, “Reference Guide on Statistics,” 211, 266 n.130 (citing the epidemiology chapter “for suggesting that ecological studies of exposure and disease are ‘far from conclusive’ because of the lack of data on confounding variables (a much more general problem) as well as the possible aggregation bias”); Leon Gordis, Epidemiology 205-06 (3d ed. 2004)(ecologic studies can be of value to suggest future research, but “[i]n and of themselves, however, they do not demonstrate conclusively that a causal association exists”).
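The fishermen example in the Manual’s definition can be made concrete with a toy dataset.  The numbers below are wholly invented for illustration: regions with a larger share of fishermen show higher overall suicide rates, even though, within every region, fishermen and non-fishermen die at exactly the same rate.

```python
# Each tuple: (fishermen, fisherman_suicides, others, other_suicides).
# Regional baseline rates rise with the share of fishermen, but the
# individual-level association between fishing and suicide is null.
regions = [
    (1000, 1, 9000, 9),
    (3000, 9, 7000, 21),
    (5000, 25, 5000, 25),
]

for f, fs, o, os in regions:
    share = f / (f + o)
    overall = (fs + os) / (f + o)
    print(f"fishermen {share:.0%} of region: overall rate {overall:.2%}, "
          f"fishermen {fs / f:.2%}, others {os / o:.2%}")
```

The group-level correlation between fisherman share and suicide rate is perfect, yet within each region the fisherman and non-fisherman rates are identical; inferring the individual-level relationship from the aggregate one is precisely the aggregation error the Manual warns about.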

The views expressed in the Reference Manual for Scientific Evidence, about ecological studies, are hardly unique.  The following quotes show how ecological studies are typically evaluated in epidemiology texts:

Ecological fallacy

“An ecological fallacy or bias results if inappropriate conclusions are drawn on the basis of ecological data. The bias occurs because the association observed between variables at the group level does not necessarily represent the association that exists at the individual level (see Chapter 2).

***

Such ecological inferences, however limited, can provide a fruitful start for more detailed epidemiological work.”

R. Bonita, R. Beaglehole, and T. Kjellström, Basic Epidemiology 43 (2d ed. WHO 2006).

“A first observation of a presumed relationship between exposure and disease is often done at the group level by correlating one group characteristic with an outcome, i.e. in an attempt to relate differences in morbidity or mortality of population groups to differences in their local environment, living habits or other factors. Such correlational studies that are usually based on existing data are prone to the so-called ‘ecological fallacy’ since the compared populations may also differ in many other uncontrolled factors that are related to the disease. Nevertheless, ecological studies can provide clues to etiological hypotheses and may serve as a gateway towards more detailed investigations.”

Wolfgang Ahrens & Iris Pigeot, eds., Handbook of Epidemiology 17-18 (2005).

The Cook case is a wonderful illustration of the judicial mindset that avoids and evades gatekeeping by resorting to the conclusory reasoning that a challenge “goes to the weight, not the admissibility” of an expert witness’s opinion.

Let’s Require Health Claims to Be Evidence Based

June 28th, 2012

Litigation arising from the FDA’s refusal to approve “health claims” for foods and dietary supplements is a fertile area for disputes over the interpretation of statistical evidence.  A ‘‘health claim’’ is ‘‘any claim made on the label or in labeling of a food, including a dietary supplement, that expressly or by implication … characterizes the relationship of any substance to a disease or health-related condition.’’ 21 C.F.R. § 101.14(a)(1); see also 21 U.S.C. § 343(r)(1)(A)-(B).

Unlike the federal courts exercising their gatekeeping responsibility, the FDA has committed to pre-specified principles of interpretation and evaluation. By regulation, the FDA gives notice of standards for evaluating complex evidentiary displays for the ‘‘significant scientific agreement’’ required for approving a food or dietary supplement health claim.  21 C.F.R. § 101.14.  See FDA – Guidance for Industry: Evidence-Based Review System for the Scientific Evaluation of Health Claims – Final (2009).

If the FDA’s refusal to approve a health claim requires pre-specified criteria of evaluation, then we should be asking ourselves why the federal courts have failed to develop a set of criteria for evaluating health-effects claims as part of their Rule 702 (“Daubert”) gatekeeping responsibilities.  Why, close to 20 years after the Supreme Court decided Daubert, can lawyers make “health claims” without having to satisfy evidence-based criteria?

Although the FDA’s guidance is not always as precise as might be hoped, it is far better than the suggestion of the new Reference Manual on Scientific Evidence (3d ed. 2011) that there is no hierarchy of evidence.  See RMSE3d at 564 & n.48 (citing and quoting an idiosyncratic symposium paper for the proposition that “[t]here should be no hierarchy [among different types of scientific methods to determine cancer causation]”); see also “Late Professor Berger’s Introduction to the Reference Manual on Scientific Evidence” (Oct. 23, 2011).

The FDA’s attempt to articulate an evidence-based hierarchy is noteworthy because the agency must evaluate a wide range of evidence, from in vitro studies, to animal studies, to observational studies of varying kinds, to clinical trials, to meta-analyses and reviews.  The FDA’s criteria are a good start, and I imagine that they will develop and improve over time.  Although imperfect, the criteria are light years ahead of the situation in federal and state court gatekeeping.  Unlike gatekeeping in civil actions, the FDA criteria are pre-stated and not devised post hoc.  The FDA’s attempt to implement evidence-based principles in the evaluation of health claims is a model that would much improve the Reference Manual for Scientific Evidence.  See Christopher Guzelian & Philip Guzelian, “Prevention of false scientific speech: a new role for an evidence-based approach,” 27 Human & Experimental Toxicol. 733 (2008).

The FDA’s evidence-based criteria need work in some areas.  For instance, the FDA’s Guidance on meta-analysis is not particularly specific or helpful:

Research Synthesis Studies

Reports that discuss a number of different studies, such as review articles, do not provide sufficient information on the individual studies reviewed for FDA to determine critical elements such as the study population characteristics and the composition of the products used. Similarly, the lack of detailed information on studies summarized in review articles prevents FDA from determining whether the studies are flawed in critical elements such as design, conduct of studies, and data analysis. FDA must be able to review the critical elements of a study to determine whether any scientific conclusions can be drawn from it. Therefore, FDA intends to use review articles and similar publications to identify reports of additional studies that may be useful to the health claim review and as background about the substance/disease relationship. If additional studies are identified, the agency intends to evaluate them individually. Most meta-analyses, because they lack detailed information on the studies summarized, will only be used to identify reports of additional studies that may be useful to the health claim review and as background about the substance-disease relationship.  FDA, however, intends to consider as part of its health claim review process a meta-analysis that reviews all the publicly available studies on the substance/disease relationship. The reviewed studies should be consistent with the critical elements, quality and other factors set out in this guidance and the statistical analyses adequately conducted.

FDA – Guidance for Industry: Evidence-Based Review System for the Scientific Evaluation of Health Claims – Final at 10 (2009).

The dismissal of review articles as secondary sources is welcome, but meta-analyses are quantitative reviews that, if methodologically appropriate, can add insight and evidence by providing a summary estimate of association, sensitivity analyses, meta-regression, and the like.  The FDA’s guidance was applied in connection with the agency’s refusal to approve a health claim for vitamin C and lung cancer.  Proponents claimed that a particular meta-analysis supported their health claim, but the FDA disagreed.  The proponents sought injunctive relief in federal district court, which upheld the FDA’s decision on vitamin C and lung cancer.  Alliance for Natural Health US v. Sebelius, 786 F. Supp. 2d 1, 21 (D.D.C. 2011).  The district court found that the FDA’s refusal to approve the health claim was neither arbitrary nor capricious with respect to its evaluation of the cited meta-analysis:

‘‘The FDA discounted the Cho study because it was a ‘meta-analysis’ of studies reflected in a review article. FDA Decision at 2523. As explained in the 2009 Guidance Document, ‘research synthesis studies’, and ‘review articles’, including ‘most meta-analyses’, ‘do not provide sufficient information on the individual studies reviewed’ to determine critical elements of the studies and whether those elements were flawed. 2009 Guidance Document at A.R. 2432. The Guidance Document makes an exception for meta-analyses ‘that review[ ] all the publicly available studies on the substance/disease relationship’. Id. Based on the Court’s review of the Cho article, the FDA’s decision to exclude this article as a meta-analysis was not arbitrary and capricious.’’

Id. at 19.

The FDA’s Guidance was adequate for its task in the vitamin C/lung cancer health claim, but notably absent from the Guidance are any criteria for evaluating competing meta-analyses that do include “all the publicly available studies on the substance/disease relationship.”  The modeling assumptions of meta-analyses (fixed-effect versus random-effects models, assessment of heterogeneity, and other considerations) will need to be spelled out in advance.  Still, not a bad start.  Implementing evidence-based criteria in Rule 702 gatekeeping has the potential to tame the gatekeeper’s discretion.
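Why the fixed-effect versus random-effects choice matters can be illustrated with a small computational sketch.  The study data below are hypothetical (nothing here is drawn from the FDA guidance or from the Cho meta-analysis); the point is only that when studies are heterogeneous, the two models yield different summary estimates and different standard errors, which is exactly why pre-stated criteria should say which model applies and why:

```python
# Illustrative sketch with hypothetical data: how the fixed-effect versus
# random-effects choice changes a meta-analytic summary estimate.
import math

log_rr = [0.10, 0.35, -0.05, 0.50, 0.20]   # hypothetical per-study log relative risks
se     = [0.15, 0.20, 0.10, 0.25, 0.18]    # hypothetical standard errors

def fixed_effect(effects, ses):
    """Inverse-variance weighted pooled estimate (fixed-effect model)."""
    w = [1 / s**2 for s in ses]
    pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    return pooled, math.sqrt(1 / sum(w))

def random_effects(effects, ses):
    """DerSimonian-Laird random-effects pooled estimate."""
    w = [1 / s**2 for s in ses]
    fe, _ = fixed_effect(effects, ses)
    q = sum(wi * (e - fe)**2 for wi, e in zip(w, effects))   # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                            # between-study variance
    w_star = [1 / (s**2 + tau2) for s in ses]                # re-weight with tau^2
    pooled = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1 / sum(w_star))

fe, fe_se = fixed_effect(log_rr, se)
re, re_se = random_effects(log_rr, se)
print(f"fixed-effect:   {fe:.3f} (SE {fe_se:.3f})")
print(f"random-effects: {re:.3f} (SE {re_se:.3f})")
```

With these made-up inputs the random-effects estimate is larger, and its standard error wider, than the fixed-effect estimate; a proponent could report whichever model flatters the claim, which is the discretion that pre-specified criteria would constrain.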