TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Watson Popcorn Case Pops Along

September 8th, 2012

Earlier today, I discussed the pending motion that would have limited, or eliminated, Dr. Egilman’s testimony in the Watson diacetyl case. See Good’s Expert Witness Opinion Not Good Enough in Tenth Circuit.  Apparently, Chief Judge Daniel denied the defendant’s renewed Rule 702 motion, and so “this trial must be tried.”  Whether the gatekeeping was sufficiently exact, time will tell.

Details to follow.

Good’s Expert Witness Opinion Not Good Enough in Tenth Circuit

September 8th, 2012

Last month, the Tenth Circuit reversed a significant judgment against Ford Motor Company in a rollover accident.  Hoffman v. Ford Motor Co., No. 10-1137, 2012 WL 3518997,  2012 U.S. App. LEXIS 17215 (10th Cir. Aug. 16, 2012) (unpublished), rev’g 690 F. Supp. 2d 1179 (D. Colo. 2010).  The plaintiff, Erica Hoffman, sustained severe injuries, with resulting quadriplegia, when she was thrown out of the passenger seat, and out of her parents’ 1999 Mercury Cougar, during the rollover.

Hoffman sued Ford Motor Company on the claim that the seatbelt, which she claimed to have been wearing, released during the accident.  Hoffman’s expert witness, Dr. Craig Good, opined that Hoffman’s seatbelt “most probably” unlatched during the accident due to a design defect that permitted unlatching under various inertial forces.  Dr. Good supported his opinion with testing that sought to show the threshold of inertial forces at which unlatching occurred.  Good, however, lacked rollover crash data, and he thus used data from crash tests conducted only in the horizontal plane, rather than under the more complex forces at work in a rollover situation.  From the “planar” data, Good concluded that Hoffman’s accident presented sufficient force to cause the inertial unlatching of her seatbelt during the accident.  The jury found Ford liable; the verdict, molded for comparative negligence, amounted to $4.5 million.

Ford had objected, under Rule 702, to Good’s testimony, and to his extrapolation of his test data from a horizontal test scenario to the three-dimensional array of forces involved in the actual rollover accident.  The gravamen of Ford’s complaint was that Good had failed to show that the levels of acceleration needed to induce inertial unlatching in the laboratory occurred in the real-world setting of Hoffman’s accident.

The panel of the Tenth Circuit, divided 2 to 1, agreed with Ford that the trial judge had not been:

“a sufficiently exacting gatekeeper; Daubert requires more precision. Good failed to present a scientific connection between the accelerations he found necessary to inertially unlatch buckles tested in the laboratory and accelerations that could have occurred on Erica’s buckle during the rollover. As a result, his opinion (that Erica’s buckle was defective because it inertially unlatched during the accident) should not have been admitted at trial.”

Hoffman, 2012 U.S. App. LEXIS 17215, *4-5.  Because Good’s opinion was necessary to support plaintiff’s recovery, and because there was no other evidence to support the claim of design defect, the Circuit reversed and remanded with instructions that judgment be entered in favor of Ford.

The Hoffman decision does not really break new ground in the law of expert witnesses.  The law requires that testing data bear on the situation in which the product supposedly malfunctioned.  The opinion, however, has already had the salutary effect of causing the newly assigned trial judge in Watson v. Dillon Companies to reconsider the previous Rule 702 rulings in that diacetyl consumer case.

Chief Judge Wiley Daniel took over the case for trial when Senior Judge Miller assumed inactive senior status.  Plaintiff Watson claims lung injuries from diacetyl inhaled in the course of consuming upwards of 7,000 bags of popcorn over seven years.  Judge Miller heard, and largely denied, defendants’ 702 motions. Watson v. Dillon Companies, Inc., 797 F. Supp. 2d 1138 (D. Colo. 2012).  Chief Judge Daniel found the Hoffman precedent sufficiently on point with respect to the challenged diacetyl exposure assessment that he invited renewed argument on the challenges to plaintiffs’ expert witnesses.  Order of Aug. 29, 2012.

In Chief Judge Daniel’s words:

“After reviewing the Tenth Circuit’s recent pronouncement on Daubert challenges and the admissibility of expert testimony at trial, I reexamined pertinent documents in this matter. Specifically, I reread Judge Miller’s June 22, 2011 order denying the motions to exclude expert testimony of Plaintiffs’ expert witnesses along with the opinion issued by the United States District Court for the Eastern District of Washington in Newkirk v. ConAgra Foods, Inc., 727 F. Supp. 2d 1006 (E.D.Wash. 2010). I also revisited the material submitted by Defendants surrounding the issue of the reliability of Dr. Martyny’s testing of diacetyl levels at Plaintiffs’ home, as Plaintiffs’ expert witnesses based some of their opinions on these test results.

I find that the Hoffman opinion may impact previous expert witness rulings including, but not limited to, Dr. David Egilman’s opinions. Accordingly, on Tuesday, September 4, 2012, prior to the commencement of jury selection in this matter, the parties shall be prepared to discuss these issues and how they may impact the trial.”

Along with the encouragement provided by the Hoffman case, the Chief Judge may have been moved to revisit the 702 issues by a defense filing that challenged the plaintiffs’ exposure level evidence.  The defendant filed a motion in limine to preclude plaintiffs’ expert witnesses’ reliance upon data generated by an Innova Model 1312 Photoacoustic Multi-Gas Monitor.  The court denied this motion, with leave to raise it at trial, but also precluded mention of the testing in front of the jury until the evidentiary matter is resolved. Order of June 22, 2012.

It appears that the plaintiffs may have withdrawn the challenged evidence, which, if true, will have significance beyond this case, given the media and regulatory fora sought out by Dr. Egilman and his colleagues.

Let us hope that the Hoffman opinion inspires the court to be the sufficiently exacting gatekeeper required by law.

Confusing Regulatory and Litigation Standards for Showing Causation – More on Chantix

September 3rd, 2012

Before denying Pfizer’s Rule 702 challenges to the plaintiffs’ expert witnesses’ opinion testimony, the Chantix MDL court handed Pfizer a significant victory by holding that the company’s 2009 warnings were adequate as a matter of law.  In re Chantix Products Liab. Litig., 2012 U.S. Dist. LEXIS 101780, *21 (Jul. 23, 2012). The MDL court assessed warnings in the full context of the learned intermediary setting, in which the prescribing physicians are the intended audience for the warnings. The court held that when the warning addresses the particular injury sustained by the plaintiff, the warning is adequate.  Id. at *29-30 n.10. See Michelle Yeary, “Chantix Warnings Adequate As a Matter of Law” (July 31, 2012).

Perhaps the MDL Court felt that it needed to level the playing field by denying the defendant’s Rule 702 motions.  In any event, I seem not to be alone in expressing dismay over the glib pronouncements of the Chantix MDL court’s Rule 702 opinion. See David Oliver, “Of Mice and Monkeys and Men” (Aug. 30, 2012) (noting the court’s indulgence in the extreme assumption that causation in one mammalian species justifies an inference of causation in others, including humans).

In “Open Admissions for Expert Witnesses in Chantix Litigation” (Sept. 1, 2012), I detailed much that went wrong in the gatekeeping in the Chantix litigation.  Unfortunately, I only scratched the surface.

CONFUSING REGULATORY ACTION WITH CAUSAL ASSESSMENTS

One of the more stunning aspects of the Chantix opinion is its holding that the plaintiffs’ expert witnesses need present opinions no more rigorous and warranted than would be required to justify FDA action.  Memorandum Opinion and Order at 22-23, In re Chantix (Varenicline) Prod. Liab. Litig., MDL No. 2092, Case 2:09-cv-02039-IPJ, Document 642 (N.D. Ala. Aug. 21, 2012) [hereafter cited as Chantix].  As I noted in the earlier post, this holding runs against the overwhelming weight of precedent on the issue.  Judge Johnson relied heavily upon the Supreme Court’s decision in Matrixx Initiatives, but that decision carefully distinguished causal judgments in civil actions from regulatory action, at least for a while, before the Court conflated them in dictum.

To be sure, Judge Johnson, in the Chantix litigation, is not the first federal judge to conflate regulatory decision making with the sufficiency and reliability needed to establish medical causation in civil litigation.  Judge Rakoff, confusing statistical significance probability with posterior probability attached to the causation issue, reached a similar conclusion in the Ephedra MDL.  See In re Ephedra Prods. Liab. Litig., 393 F. Supp. 2d 181, 189 (S.D.N.Y. 2005) (relying upon FDA ban despite “the absence of definitive scientific studies establishing causation”).

The FDA could not be clearer that its labeling requirements do not bear on the civil tort standards of liability and causation.  Back in 1979, the FDA stated that its “[l]abeling requirements will not affect adversely the civil tort liability of manufacturers, physicians, pharmacists, and other dispensers of prescription drug products.”  44 Fed. Reg. 40016, 40023 (FDA July 6, 1979) (addressing patient package inserts).

In terms of modifying drug warnings, the FDA requires that manufacturers address potential adverse events “as soon as there is reasonable evidence of a causal association with a drug; a causal relationship need not have been definitely established.” 21 C.F.R. § 201.57(c)(6)(i) (stating the requirement for medications approved after June 30, 2001).  For medications approved before July 1, 2001, the FDA requires that warnings be modified “as soon as there is reasonable evidence of an association of a serious hazard with a drug; a causal relationship need not have been proved.” Id. at § 201.80(e).  See also “Labeling of Diphenhydramine Containing Drug Products for Over-the-Counter Human Use,” 67 Fed. Reg. 72,555, at 72,556 (Dec. 6, 2002) (“FDA’s decision to act in an instance such as this one need not meet the standard of proof required to prevail in a private tort action. . . . To mandate a warning or take similar regulatory action, FDA need not show, nor do we allege, actual causation.”) (citing Agent Orange, Glastetter, and Hollander).

SUPREME COURT OF THE UNITED STATES

Matrixx Initiatives, Inc. v. Siracusano, ___ U.S. ___, 131 S. Ct. 1309, 1320 (2011) (regulatory and administrative agencies “may make regulatory decisions … based on post-marketing evidence that gives rise to only a suspicion of causation”) (internal citation omitted)

Industrial Union Dep’t, AFL-CIO v. American Petroleum Institute, 448 U.S. 607, 656 (1980) (“agency is free to use conservative assumptions in interpreting the data on the side of overprotection rather than underprotection.”)

First Circuit

In re Neurontin Mktg., Sales Practices, and Prod. Liab. Litig., 612 F. Supp. 2d 116, 136 (D. Mass. 2009) (‘‘It is widely recognized that, when evaluating pharmaceutical drugs, the FDA often uses a different standard than a court does to evaluate evidence of causation in a products liability action. Entrusted with the responsibility of protecting the public from dangerous drugs, the FDA regularly relies on a risk-utility analysis, balancing the possible harm against the beneficial uses of a drug. Understandably, the agency may choose to ‘err on the side of caution,’ … and take regulatory action such as revising a product label or removing a drug from the marketplace ‘upon a lesser showing of harm to the public than the preponderance-of-the-evidence or more-like-than-not standard used to assess tort liability.’’’)(internal citations omitted)

Sutera v. Perrier Group of Am., Inc., 986 F. Supp. 655, 667 (D. Mass. 1997)

Second Circuit

Mancuso v. Consolidated Edison Co., 967 F. Supp. 1437, 1448 (S.D.N.Y. 1997) (“recommended or prescribed precautionary standards cannot provide legal causation”; “[f]ailure to meet regulatory standards is simply not sufficient” to establish liability)

In re Agent Orange Product Liab. Litig., 597 F. Supp. 740, 781 (E.D.N.Y.1984)(“The distinction between avoidance of risk through regulation and compensation for injuries after the fact is a fundamental one.”), aff’d in relevant part, 818 F.2d 145 (2d Cir.1987), cert. denied sub nom. Pinkney v. Dow Chemical Co., 484 U.S. 1004  (1988)

Third Circuit

Gates v. Rohm & Haas Co., 655 F.3d 255 (3d Cir. 2011) (‘‘plaintiffs could not carry their burden of proof for a class of specific persons simply by citing regulatory standards for the population as a whole’’)

In re Schering-Plough Corp. Intron/Temodar Consumer Class Action, 2009 WL 2043604, at *13 (D.N.J. July 10, 2009)(“[T]here is a clear and decisive difference between allegations that actually contest the safety or effectiveness of the Subject Drugs and claims that merely recite violations of the FDCA, for which there is no private right of action.”)

Soldo v. Sandoz Pharm. Corp., 244 F. Supp. 2d 434, 543 (W.D. Pa. 2003) (“FDA is a regulatory agency whose mandate is to control which drugs are marketed in the United States and how they are marketed. FDA ordinarily does not attempt to prove that the drug in fact causes a particular adverse effect.”)

O’Neal v. Dep’t of the Army, 852 F. Supp. 327, 333 (M.D. Pa. 1994) (administrative risk figures are “appropriate for regulatory purposes in which the goal is to be particularly cautious [but] overstate the actual risk and, so, are inappropriate for use in determining” civil liability)

Wade-Greaux v. Whitehall Laboratories, Inc., 874 F. Supp. 1441, 1464 (D.V.I.) (“assumption[s that] may be useful in a regulatory risk-benefit context … ha[ve] no applicability to issues of causation-in-fact”), aff’d, 46 F.3d 1120 (3d  Cir. 1994)

Fourth Circuit

Meade v. Parsley, No. 2:09-cv-00388, 2010 U.S. Dist. LEXIS 125217, * 25 (S.D.W. Va. Nov. 24, 2010) (‘‘Inasmuch as the cost-benefit balancing employed by the FDA differs from the threshold standard for establishing causation in tort actions, this court likewise concludes that the FDA-mandated [black box] warnings cannot establish general causation in this case.’’)

Dunn v. Sandoz Pharm. Corp., 275 F. Supp. 2d 672, 684 (M.D.N.C. 2003) (FDA “risk benefit analysis” “does not demonstrate” causation in any particular plaintiff).

Fifth Circuit

Johnson v. Arkema Inc., 2012 WL ______ (5th Cir. June 20, 2012) (per curiam) (affirming exclusion of expert witness who relied upon regulatory pronouncements; noting the precautionary nature of such statements, and the absence of specificity for the result claimed at the exposures experienced by plaintiff)

Allen v. Pennsylvania Eng’g Corp., 102 F.3d 194, 198-99 (5th Cir. 1996)(“Scientific knowledge of the harmful level of exposure to a chemical, plus knowledge that the plaintiff was exposed to such quantities, are minimal facts necessary to sustain the plaintiffs’ burden in a toxic tort case”; regulatory agencies,  charged with protecting public health, employ a lower standard of proof in promulgating regulations than that used in tort cases)

Cano v. Everest Minerals Corp., 362 F. Supp. 2d 814, 825 (W.D. Tex. 2005) (noting that a product that “has been classified as a carcinogen by agencies responsible for public health regulations is not probative of” common-law specific causation)

Burleson v. Glass, 268 F.Supp. 2d 699, 717 (W.D. Tex. 2003) (“the mere fact that [the product] has been classified by certain regulatory organizations as a carcinogen is not probative on the issue of whether [plaintiff’s] exposure . . . caused his . . . cancers”), aff’d, 393 F.3d 577 (5th Cir. 2004)

Newton v. Roche Labs., Inc., 243 F. Supp. 2d 672, 677, 683 (W.D. Tex. 2002) (“Although evidence of an association may. . .be important in the scientific and regulatory contexts. . ., tort law requires a higher standard of causation.”)(FDA’s precautionary decisions on labeling are not a determination of causation of specified adverse events)

Molden v. Georgia Gulf Corp., 465 F. Supp. 2d 606, 611 (M.D. La. 2006) (“regulatory and advisory bodies make prophylactic rules governing human exposure based on proof that is reasonably lower than that appropriate in tort law”)

Sixth Circuit

Nelson v. Tennessee Gas Pipeline Co., 243 F.3d 244, 252-53 (6th Cir. 2001)(exposure above regulatory levels is insufficient to establish causation)

Stites v Sundstrand Heat Transfer, Inc., 660 F. Supp. 1516, 1525 (W.D. Mich. 1987) (rejecting use of regulatory standards to support claim of increased risk, noting the differences in goals and policies between regulation and litigation)

Baker v. Chevron USA Inc., 680 F. Supp. 2d 865, 880 (S.D. Ohio 2010) (“[t]he mere fact that Plaintiffs were exposed to [the product] in excess of mandated limits is insufficient to establish causation”; rejecting Dr. Dahlgren’s opinion and its reliance upon a “one-hit” or “no threshold” theory of causation, in which exposure to one molecule of a cancer-causing agent has some finite possibility of causing a genetic mutation leading to cancer, a theory that may be accepted for purposes of setting regulatory standards, but not as reliable scientific knowledge; ‘‘regulatory agencies are charged with protecting public health and thus reasonably employ a lower threshold of proof in promulgating their regulations’’)

Eighth Circuit

Glastetter v. Novartis Pharms. Corp., 107 F. Supp. 2d 1015, 1036 (E.D. Mo. 2000) (“[T]he [FDA’s] statement fails to affirmatively state that a connection exists between [the drug] and the type of injury in this case.  Instead, it states that the evidence received by the FDA calls into question [drug’s] safety, that [the drug] may be an additional risk factor. . .and that the FDA had new evidence suggesting that therapeutic use of [the drug] may lead to serious adverse experiences.  Such language does not establish that the FDA had concluded that [the drug] can cause [the injury]; instead, it indicates that in light of the limited social utility of [the drug for the use at issue] and the reports of possible adverse effects, the drug should no longer be used for that purpose.”) (emphasis in original), aff’d, 252 F.3d 986, 991 (8th Cir. 2001) (FDA’s precautionary decisions on labeling are not a determination of causation of specified adverse events; “methodology employed by a government agency results from the preventive perspective that the agencies adopt”)(“The FDA will remove drugs from the marketplace upon a lesser showing of harm to the public than the preponderance-of-the-evidence or the more-like-than-not standard used to assess tort liability . . . . [Its] decision that [the drug] can cause [the injury] is unreliable proof of medical causation.”)

Wright v. Willamette Indus., Inc., 91 F.3d 1105, 1107 (8th Cir. 1996)

Nelson v. Am. Home Prods. Corp., 92 F. Supp. 2d 954, 958 (W.D. Mo. 2000) (FDA’s precautionary decisions on labeling are not a determination of causation of specified adverse events)

National Bank of Commerce v. Associated Milk Producers, Inc., 22 F. Supp. 2d 942, 961 (E.D.Ark. 1998), aff’d, 191 F.3d 858 (8th Cir. 1999)

Junk v. Terminix International Co., 594 F. Supp. 2d 1062, 1071 (S.D. Iowa 2008) (“government agency regulatory standards are irrelevant to [plaintiff’s] burden of proof in a toxic tort cause of action because of the agency’s preventative perspective”)

Ninth Circuit

Lopez v. Wyeth-Ayerst Labs., Inc., 1998 WL 81296, at *2 (9th Cir. Feb. 25, 1998) (FDA’s precautionary decisions on labeling are not a determination of causation of specified adverse events)

Tenth Circuit

Hollander v. Sandoz Pharm. Corp., 95 F. Supp. 2d 1230, 1239 (W.D. Okla. 2000) (distinguishing FDA’s threshold of proof as lower than appropriate in tort law), aff’d in relevant part, 289 F.3d 1193, 1215 (10th Cir. 2002)

Mitchell v. Gencorp Inc., 165 F.3d 778, 783 n.3 (10th Cir. 1999) (state administrative finding that product was a carcinogen was based upon lower administrative standard than tort standard)

In re Breast Implant Litig., 11 F. Supp. 2d 1217, 1229 (D.Colo. 1998)

Eleventh Circuit

Rider v. Sandoz Pharm. Corp., 295 F.3d 1194, 1201 (11th Cir. 2002)(FDA may take regulatory action, such as revising warning labels or withdrawing drug from the market ‘‘upon a lesser showing of harm to the public than the preponderance-of-the-evidence or more-likely-than-not standard used to assess tort liability’’)(“A regulatory agency such as the FDA may choose to err on the side of caution. Courts, however, are required by the Daubert trilogy to engage in objective review of the evidence to determine whether it has sufficient scientific basis to be considered reliable.”)

McClain v. Metabolife Internat’l, Inc., 401 F.3d 1233, 1248-1250 (11th Cir. 2005) (ephedra) (“[U]se of FDA data and recommendations raises a more subtle methodological issue in a toxic tort case. The issue involves identifying and contrasting the type of risk assessment that a government agency follows for establishing public health guidelines versus an expert analysis of toxicity and causation in a toxic tort case.”)

Siharath v. Sandoz Pharm. Corp., 131 F. Supp. 2d 1347, 1370 (N.D. Ga. 2001)(“The standard by which the FDA deems a drug harmful is much lower than is required in a court of law.  The FDA’s lesser standard is necessitated by its prophylactic role in reducing the public’s exposure to potentially harmful substances.”)

In re Seroquel Products Liab. Litig., 601 F. Supp. 2d 1313, 1315 (M.D. Fla. 2009)(noting that administrative agencies “impose[] different requirements and employ[] different labeling and evidentiary standards” because a “regulatory system reflects a more prophylactic approach” than the common law)

STATES

New York

Parker v. Mobil Oil Corp., 7 N.Y.3d 434, 450, 857 N.E.2d 1114, 1122, 824 N.Y.S.2d 584 (N.Y. 2006) (“standards promulgated by regulatory agencies as protective measures are inadequate to demonstrate legal causation”)

In re Bextra & Celebrex, 2008 N.Y. Misc. LEXIS 720, *20, 239 N.Y.L.J. 27 (2008) (characterizing FDA Advisory Panel recommendations as regulatory standard and protective measure).

Ohio

Valentine v. PPG Industries, Inc., 821 N.E.2d 580, 597-98 (Ohio App. 2004), aff’d, 850 N.E.2d 683 (Ohio 2006).

Pennsylvania

Betz v. Pneumo Abex LLC, 44 A. 3d 27 (Pa. 2012).

Texas

Exxon Corp. v. Makofski, 116 S.W.3d 176, 184-85 (Tex. App. 2003)

Extraordinary Claims Require Extraordinary Evidence – Cold Fusion

August 18th, 2012

According to the font of knowledge, Wikipedia, the oft-quoted expression, “An extraordinary claim requires extraordinary proof,” is due to Marcello Truzzi.  See Marcello Truzzi, “On the Extraordinary: An Attempt at Clarification,” 1(1) Zetetic Scholar 11 (1978).  I certainly recall hearing a similar statement from Carl Sagan, who popularized the expression on his PBS specials.  But Pierre-Simon Laplace, in his Bayesian phase, stated the matter best, over two centuries ago:

“The weight of evidence for an extraordinary claim must be proportioned to its strangeness.”

Martin Fleischmann and his associate, B. Stanley Pons, might have avoided some embarrassment if they had taken Laplace’s maxim to heart.  With a very low prior probability, they needed an extraordinary likelihood ratio to make their claimed outcome of “cold fusion” credible.
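
Laplace’s maxim translates directly into Bayes’ theorem in odds form: posterior odds equal prior odds multiplied by the likelihood ratio of the evidence.  A minimal sketch, in Python, with purely illustrative numbers; the prior and likelihood ratios below are hypothetical, not estimates of the actual cold fusion evidence:

```python
# Laplace's maxim in odds form: posterior odds = prior odds x likelihood ratio.
# The numbers below are illustrative only.

def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Convert a prior probability and a likelihood ratio into a
    posterior probability via Bayes' theorem in odds form."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

# Suppose the community's prior that room-temperature fusion occurs is one
# in a million.  Even strikingly favorable evidence leaves the claim improbable.
prior = 1e-6
for lr in (10, 1_000, 100_000):
    print(f"likelihood ratio {lr:>7,}: "
          f"posterior = {posterior_probability(prior, lr):.3g}")
```

Even evidence 1,000 times more likely under the cold fusion hypothesis than under the null leaves the posterior probability at roughly one in a thousand; that is the arithmetic force of “proportioned to its strangeness.”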

In March 1989, Fleischmann and Pons held a news conference to announce their illusory discovery of so-called “cold fusion.” The immediate reaction from many in the media was uncritical acclaim.  Fleischmann and Pons made the front page of major newspapers, and the covers of the then popular weekly news magazines (Time and Newsweek).  The media frenzy would have been justified had the claim been true.  Their spectacular claim invited attempts at replication, but no amount of wish bias could make the dream into fact.  Scientists from around the world, including those at the American Physical Society and the United States Department of Energy, in short order put Fleischmann and Pons’ claim to rest.

Martin Fleischmann died earlier this month, and the New York Times published a lengthy obituary.  Douglas Martin, “Martin Fleischmann, Seeker of Cold Fusion, Dies at 85,” N.Y. Times A18 (Aug. 12, 2012). Not surprisingly, the Times focused on the “cold fusion” fiasco, and the discredited research claim of Fleischmann and Pons.  The obituary quoted Richard Petrasso, a Massachusetts Institute of Technology physicist who, in a 1991 interview, expressed his initial conviction that Fleischmann and Pons’ work was an “absolute fraud,” but later softened in noting that the two scientists “probably believed in what they were doing.” William J. Broad, “Cold-Fusion Claim Is Faulted on Ethics as Well as Science,” N.Y. Times (Mar. 17, 1991).

Petrasso was probably correct, but his interpretation, while charitable, highlights the power of wish and confirmation biases in science.  Fleischmann was a capable, well-trained scientist.  He received his doctorate from the University of London, held respectable academic appointments, and was elected a fellow of the Royal Society.  He had over 240 articles published in journals. While social constructivists anguish over corporate influence in science, a great deal of really bad science receives a pass because wish and confirmation biases are so commonplace.  The Times quoted Fleischmann as saying, in 2009, that “unless we get fusion to work in some fashion, we are doomed, aren’t we?” Perhaps his sense of doom helped make his slippery evidence easier to accept.  According to the obituary, Fleischmann and Pons planned their experimental approach while hiking in Utah.  Whiskey was involved.

Inexpensive, limitless energy attracted a great deal of attention.  Unfortunately, many science and health claims do not elicit prompt attempts at replication, and the public and the scientific communities are often willing to accept claims at face value.  They would be prudent to heed Laplace’s dictum.  I can think of any number of litigation claims that evaded expert witness gatekeeping because of violations of Laplace’s guidance.

As for Fleischmann’s death, the obituary in the Times probably suffices to prove the fact.

Copywrongs – Plagiarism in the Law

July 20th, 2012

Previously I have written about the ethical and practical issues involved in lawyers’ plagiarism.  See also Copycat – Further Thoughts.  Professor Douglas E. Abrams, of the University of Missouri School of Law, has written an interesting article on issues raised by lawyers’ plagiarism, “Plagiarism in Lawyers’ Advocacy: Imposing Discipline for Conduct Prejudicial to the Administration of Justice,” which is due out next year in the Wake Forest Law Review.  For now, a draft is available for download from the Social Science Research Network.

Abrams details some recent cases in which counsel were chastised for copying published material, prior judicial opinions, and other counsel’s briefs.  There are still a lot of gray areas.  Abrams does not deal with legal forms.  However flattering it is for judges to adopt language from lawyers’ briefs, is it plagiarism for them to do so?  Does a judge commit plagiarism by adopting language wholesale from a law clerk’s draft?  Does a senior lawyer commit plagiarism by leaving off the names of junior lawyers and law clerks who contributed portions of the brief?  Does it matter if the senior lawyer’s writing is an article for publication rather than a brief to the court?  If a lawyer takes language from another’s brief, and uses it in an article, does she commit plagiarism?  If a lawyer discovers plagiarism committed by another lawyer, is there an ethical obligation to report the plagiarizer?

Gatekeeping the Lumpers and Splitters – Composite End Points

June 26th, 2012

The battle between lumpers and splitters is fought in many disciplines, and so it is no surprise that it finds its way into litigation.

The battle is often entrenched in the discipline of epidemiology, where practitioners tussle over the definition of the end point of a study or clinical trial. Lumping has the advantage of increasing study size, with attendant increases in statistical power.  The downside of lumping is that the “lumped” or composite outcome may no longer be meaningful with respect to the more precise outcome of interest.  In other words, the lumping threatens the external validity of the study.  Splitting preserves external validity with respect to the outcome of interest, but decreases study size, with a greater risk of Type II error.
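
The power trade-off can be made concrete with the standard normal-approximation sample size formula for comparing two proportions.  A rough sketch in Python; the event rates and relative risk below are hypothetical, chosen only to illustrate the arithmetic:

```python
# A back-of-the-envelope sketch of why lumping raises power: at the same
# relative risk, a composite endpoint's higher event rate shrinks the
# required sample size.  Event rates and effect size are hypothetical.
from scipy.stats import norm

def n_per_arm(p_control: float, rel_risk: float,
              alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-arm sample size for a two-proportion test."""
    p_treated = p_control * rel_risk
    p_bar = (p_control + p_treated) / 2
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    num = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
           + z_b * (p_control * (1 - p_control)
                    + p_treated * (1 - p_treated)) ** 0.5) ** 2
    return int(num / (p_control - p_treated) ** 2) + 1

# A single endpoint (say, MI alone) at a 2% control event rate, versus a
# composite (MI + stroke + CV death) at 6%, each with relative risk 1.3:
print(n_per_arm(0.02, 1.3))   # roughly 9,800 per arm
print(n_per_arm(0.06, 1.3))   # roughly 3,100 per arm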

The issue arises in birth defect litigation, such as the claims made against the manufacturer of Bendectin, where the claimants’ expert witnesses frequently tried to increase power by lumping different birth defects together, despite the lack of embryological plausibility.  The issue has come up in cardiovascular end point trials and meta-analyses, involving thrombo-embolic outcomes, such as stroke and heart attack.  The Celebrex litigation, for instance, involved contested issues of what cardiovascular end points to combine to capture the postulated thrombotic causal mechanism.  In re Pfizer Inc. Securities Litig., 2010 WL 1047618 (S.D.N.Y. 2010).

Despite the recurrence of lumping/splitting issues in litigation of epidemiologic evidence, the Reference Manual on Scientific Evidence (3d ed. 2011) does not treat the subject at all.  Federal and state judges are often at sea (without sextant or compass) in disputes over lumping and splitting, where the methodology selected can often determine the result.  The following is a collection of some observations, comments, and guidance from the biomedical literature on the use of composite end points.

Composite Endpoints

A.  Definition

Composite end points are typically defined, perhaps circularly, as a single group of health outcomes, which group is made up of constituent or single end points.  Meinert defined a composite outcome as “an event that is considered to have occurred if any of several different events or outcomes is observed.”  C. Meinert, Clinical Trials Dictionary (Johns Hopkins Center for Clinical Trials 1996). Similarly, Montori defined composite end points as “outcomes that capture the number of patients experiencing one or more of several adverse events.”  Montori, et al., “Validity of composite end points in clinical trials,” 330 Brit. Med. J. 594, 596 (2005).  Composite end points are also sometimes referred to as combined or aggregate end points.

Many composite end points are clearly defined for a clinical trial, and the component end points are specified.  In some instances, the composite nature of an outcome may be subtle or be glossed over by the study’s authors.  In the realm of cardiovascular studies, for example, investigators may look at stroke as a single endpoint, without acknowledging that there are important clinical and pathophysiological differences between ischemic strokes and hemorrhagic strokes (intracerebral or subarachnoid).  The Fletchers give the example:

“In a study of cardiovascular disease, for example, the primary outcomes might be the occurrence of either fatal coronary heart disease or non-fatal myocardial infarction.  Composite outcomes are often used when the individual elements share a common cause and treatment.  Because they comprise more outcome events than the component outcomes alone, they are more likely to show a statistical effect.”

R. Fletcher & S. Fletcher, Clinical Epidemiology: The Essentials 109 (4th ed. 2005).

B.  Utility of Composite End Points

1.  Power

Use of composite end points frequently occurs in the context of studying heart attacks as the outcome of interest.  Improvements in medical care have led to decreased frequency in rates of myocardial infarction (MI) and repeat MIs.  In clinical trials, because of the close medical attention received by participants, event rates are even lower than what might be expected from the relevant general patient population.  These low event rates have caused power issues for clinical trialists, who have responded by turning to composite end points to capture more events.  Composite end points permit smaller sample sizes and shorter follow-up times.  Increasing study power, while reducing sample size or observation time, is perhaps the most frequently cited rationale for using composite end points.

Typical statements from the medical literature:

“Clinical trials, particularly in cardiology, often use composite end points to reduce sample size requirements and to capture the overall impact of therapeutic interventions.”

(Ferreira-Gonzalez 2007, p. 1b, Introduction)

“The widespread use of composite end points reflects their elegant simplicity as a solution to the problem of declining event rates.”

(Montori 2005, at 596, Conclusions)

“The primary rationale for considering a composite primary outcome instead of a single event outcome is sample size.”

(Neaton 2005, at 598b)

“Clinical trialists use composite end points, outcomes that capture the number of patients who have one or more of several events, to increase event rates and statistical power.”

(Ferreira-Gonzalez 2007, p. 6a, Box)

“Although dealing with multiple testing is an important factor in the design and analysis of clinical trials, this may not be the only motivation behind the popularity of composite outcome measures.  Instead, issues of statistical efficiency appear to be prominent, with composite outcomes in time-to-event trials leading to higher event rates and thus enabling smaller sample sizes or shorter follow-up (or both).”

(Freemantle 2003, at 2555 b-c)

“Investigators often use composite end points to enhance the statistical efficiency of clinical trials.”

(Montori 2004, at 1094b)

2.  Competing Risks

Another reason offered in support of using composite end points is that composites provide a strategy to avoid the problem of competing risks.  (Neaton 2005, at 569a)  Death (any cause) is sometimes added to a distinct clinical morbidity because patients who are taken out of the trial by death are “unavailable” to experience the morbidity outcome.

3.  Multiple Testing

By aggregating several individual end points into a single pre-specified outcome, trialists can avoid corrections for multiple testing.  Trials that seek data on multiple outcomes, or on multiple subgroups, inevitably raise concerns about the appropriate choice of the significance level (alpha) for the statistical tests used to determine whether to reject the null hypothesis.  According to some authors, “[c]omposite endpoints alleviate multiplicity concerns.”  Schulz & Grimes, “Multiplicity in randomized trials I:  endpoints and treatments,” 365 Lancet 1591, 1593a (2005).  Schulz and Grimes, who have written extensively about methodological issues, comment further:

“If designated a priori as the primary outcome, the composite obviates the multiple comparisons associated with testing of the separate components.  Moreover, composite outcomes usually lead to high event rates thereby increasing power or reducing sample size requirements.  Not surprisingly, investigators frequently use composite endpoints.”

Id.  Freemantle and Calvert acknowledge that the need to avoid false positive results from multiple testing is an important rationale for composite end points:

“Because the likelihood of observing a statistically significant result by chance alone increases with the number of tests, it is important to restrict the number of tests undertaken and limit the type 1 error to preserve the overall error rate for the trial.”

Freemantle & Calvert, “Composite and surrogate outcomes in randomized controlled trials,” 334 Brit. Med. J. 756, 756a-b (2007).  Freemantle previously had articulated a similar rationale:

“[T]he correct (a priori) identification of a composite end point can increase the statistical precision and thus the efficiency of a trial.”

(Freemantle 2003, at 2558a)
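
The multiplicity arithmetic behind these statements is easy to display.  A minimal sketch, assuming independent tests at a conventional alpha of 0.05:

```python
# A short sketch of the multiple-testing problem the quoted authors describe:
# testing several component endpoints each at alpha = 0.05 inflates the
# chance of at least one false positive; a single pre-specified composite
# (or a Bonferroni correction) holds the family-wise error rate at 5%.
alpha = 0.05
for k in (1, 3, 5, 10):
    fwer = 1 - (1 - alpha) ** k          # assumes independent tests
    print(f"{k:>2} independent endpoints: "
          f"P(>=1 false positive) = {fwer:.3f}; "
          f"Bonferroni per-test alpha = {alpha / k:.4f}")
```

With ten separate component end points, the chance of at least one spurious “significant” finding approaches 40 percent; a single pre-specified composite, like a Bonferroni correction, keeps the family-wise error rate at five percent.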

4.  Indecision about an Appropriate Single Outcome

The International Conference on Harmonization suggests that the inability to select a single outcome variable may lead to the adoption of a composite outcome:

“If a single primary variable cannot be selected …, another useful strategy is to integrate or combine the multiple measurements into a single or composite variable.”

International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use, “ICH harmonized tripartite guideline:  statistical principles for clinical trials,” 18 Stat. Med. 1905 (1999).

Freemantle gives this rationale some measure of approval:

“Composite outcomes can help in avoiding arbitrary decisions between different candidate outcomes when prespecifying the primary outcome … .”

(Freemantle & Calvert 2007, at 757a)

“[A] composite outcome may help investigators who are having difficulty in deciding which outcome to elect as the primary outcome measure in a trial and deal with the issue of multiplicity in an efficient manner, avoiding the need for arbitrary choices.”

(Freemantle 2003, at 2558a-b)

The “indecision” rationale has also been criticized:

“Inability to reach consensus on a single outcome is generally not a good reason to use a composite end point.”

(Neaton 2005, at 569b)

C.  Validity of Composite End Points

The validity of composite end points depends upon assumptions, which will have to be made at the time of the study design and protocol creation.  After the data are collected and analyzed, the assumptions may or may not be supported.

“The validity of composite end points depends on

  • similarity in patient importance,
  • [similarity in] treatment effect, and
  • number of events across the components.”

(Montori 2005, at 596, Summary Point No. 2)

“Use of composite end points is usually justified by the assumption that the effect on each of the components will be similar and that patients will attach similar importance to each component.”

(Montori 2005, at 594a, paragraph 2)

D.  Role of Mechanism in Justifying Composite End Points

A composite end point will obviously make sense when the individual end points are biologically related, and the investigators reasonably expect that the individual end points would be affected in the same direction, and in the same approximate amount.

“Confidence in a composite end point rests partly on a belief that similar reductions in relative risk apply to all the components.  Investigators should therefore construct composite endpoints in which the biology would lead us to expect similar effects across components.”

(Montori 2005, 595b)

E.  Methodological Issues

The acceptability of composite end points is often a delicate balance between the statistical power and efficiency gained and the reliability concerns raised by using the composite.  As with any statistical or interpretative tool, the key questions revolve around how the tool is used, and for what purpose.  The reliability issues raised by the use of composites are likely to be highly contextual.

For instance, there is an important asymmetry between justifying the use of a composite for measuring efficacy and the use of the same composite for safety outcomes.  A biological improvement in type 2 diabetes might be expected to lead to a reduction in all the macrovascular complications of that disease, but a medication for type 2 diabetes might have a very specific toxicity or drug interaction, which affects only one constituent end point among all macrovascular complications, such as myocardial infarction.  The asymmetry between efficacy and safety outcomes is specifically addressed in a recent publication:

“Varying definitions of composite end points, such as MACE, can lead to substantially different results and conclusions.  Therefore, the term MACE, in particular, should not be used, and when composite study end points are desired, researchers should focus separately on safety and effectiveness outcomes, and construct separate composite end points to match these different clinical goals.”

(Kip 2008, 701, Abstract – Conclusions)(emphasis in original)

The medical literature contains many clear statements that caution the consumers of medical studies against misleading claims that may be based upon composite end points.  Several years ago, the British Medical Journal published a paper by Montori, et al., “Users’ guide to detecting misleading claims in clinical research reports,” 329 Brit. Med. J. 1093 (2004).  The authors distill their advice down to six suggestions, one of which deals explicitly with composite end points:

“Guide to avoid being misled by biased presentation and interpretation of data

1.  Read only the Methods and Results sections; bypass the Discussion section

2.  Read the abstract reported in evidence based secondary publications

3.  Beware faulty comparators

4.  Beware composite endpoints

5.  Beware small treatment effects

6.  Beware subgroup analyses”

Id. at 1093a (emphasis added).  The authors elaborate on the problems that arise from the use of composite end points:

“Problems in the interpretation of these trials arise when composite end points include component outcomes to which patients attribute very different importance… .”

(Montori 2004, at 1094b.)

“Problems may also arise when the most important end point occurs infrequently or when the apparent effect on component end points differs.”

(Montori 2004, at 1095a.)

“When the more important outcomes occur infrequently, clinicians should focus on individual outcomes rather than on composite end points.  Under these circumstances, inferences about the end points (which because they occur infrequently will have very wide confidence intervals) will be weak.”

(Montori 2004, at 1095a.)

“When large variations exist between components the composite end point should be abandoned.”

(Montori 2005, at 596, Summary Point No. 3)

“Occasionally, composite end points prove useful and informative for clinical decision making.  Often, they do not.”

(Montori 2005, at 596, Conclusions)

“Composite endpoints frequently lack clinical relevancy.  Thus, composite endpoints address multiplicity and generally yield statistical efficiency at the risk of creating interpretational difficulties.”

(Schulz & Grimes 2005, at 1593a-b)

“The disadvantages of composite outcomes may arise when the constituents do not move in line with each other.”

(Freemantle 2003, at 2558a)

“Composite end points, as currently used in cardiovascular trials, may often be misleading.”

(Ferreira-Gonzalez 2007, p. 6a, Box)

“Trialists should report complete data on individual component end points to facilitate appropriate interpretation; clinicians should view with caution the results of cardiovascular trials that use composite end points to report their results.”

(Ferreira-Gonzalez 2007, p. 7a)

F.  Methodological Issues Concerning Causal Inferences from Composite End Points to Individual End Points

Several authors have criticized pharmaceutical companies for using composite end points to “game” their trials.  Composites allow smaller sample sizes, but they lend themselves to broader claims for outcomes included within the composite.  The same criticism appears to be valid when applied to attempts to infer a risk of an individual end point from a showing of harm on the composite end point.

“If a trial report specifies a composite endpoint, the components of the composite should be in the well-known pathophysiology of the disease.  The researchers should interpret the composite endpoint in aggregate rather than as showing efficacy of the individual components.  However, the components should be specified as secondary outcomes and reported beside the results of the primary analysis.”

(Schulz & Grimes 2005, at 1595a)(emphasis added)

“[A] positive result for a composite outcome applies only to the cluster of events included in the composite and not to the individual components.”

(Freemantle & Calvert 2007, at 757a) [Freemantle and Calvert urge “health warnings” that a composite end point benefit cannot be interpreted to mean an actual benefit in every constituent end point.]

“To avoid the burying of important components of composite primary outcomes for which on their own no effect is discerned, . . . the components of a composite outcome should always be declared as secondary outcomes, and the results described alongside the result for the composite outcome.”

(Freemantle 2003, at 2559a, Point No. 3; 2559b-c, Box)

“Authors and journal editors should ensure that the reporting of composite outcomes is clear and avoids the suggestion that individual components of the composite have been demonstrated to be effective.”

(Freemantle 2003, at 2559b-c, Box Point No. 4)

G.  Regulatory Experience

“Regulatory behavior may have led to the addition of ‘death’ to many composite primary end points used in trials, and it is our experience that the Food and Drug Administration has actively promoted the use of such composite outcome measures in the heart failure trials.”

(Freemantle & Calvert 2007, at 757a)

The FDA addressed composite end points in the context of its recommendations for looking at cardiovascular outcomes in Phase III and Phase IV clinical trials for anti-diabetic therapies.

“In cardiovascular trials, as in all trials, the primary endpoint should be predefined, justified, and accurately captured and analyzed. Powering the study on an individual type of event (e.g., myocardial infarction) is usually not feasible because of low incidence rates. Therefore, many cardiovascular trials use the MACE (Major Adverse Cardiovascular Event) composite endpoint, which contains all-cause mortality (or cardiovascular death), non-fatal myocardial infarction, and stroke. Some cardiovascular trials include other macrovascular events, such as coronary revascularization and lower-extremity amputation. Use of all-cause mortality as part of the MACE endpoint in a trial with excellent follow-up has the advantage of certainty as to whether the event occurred. However, the cause of death should still be determined in a well-designed trial to ensure that there are no imbalances in particular fatal events (e.g., neoplasms or strokes). Use of cardiovascular death as part of the MACE endpoint may be more relevant but, like myocardial infarction and stroke, requires adjudication by an independent and blinded committee with pre-specified case definitions and methodology for ascertaining events (e.g., access to medical records and laboratory data).  If the study is powered on a composite endpoint, there will likely be too few events for the individual components (e.g., acute myocardial infarction) of the composite to provide conclusive evidence of a difference between treatment groups with regard to these individual endpoints. In addition, a difference between treatment groups in the composite endpoint may primarily be driven by one or more of the individual components that comprise the endpoint. As a result, secondary efficacy measures often include analyses of the individual components as initial and total events to determine their contribution to the overall primary efficacy results.”

(FDA Background Introductory Memorandum, for Endocrinologic and Metabolic Drugs Advisory Committee meeting, July 1-2, 2008, at pp. 17-18.)

H.  Specific Composite End Points

1.  Myocardial ischemia 

In the Avandia litigation, some investigators chose to look at a composite of “myocardial ischemia.”  Plaintiffs’ counsel, and even some publications, appear to equate a finding on this composite end point with one of myocardial infarction.  For instance, Curt Furberg equated MI with myocardial ischemia in a JAMA publication of his meta-analysis of rosiglitazone trials.  See, e.g., Singh, et al., “Long-term risk of cardiovascular events with rosiglitazone:  a meta-analysis,” 298 JAMA 1189, 1193 (2007) (“Two previous meta-analyses showed that the risk of MI was significantly increased by rosiglitazone. An unpublished meta-analysis (ZM 2005/00181/01) conducted in 2005 involving 14,237 participants from 42 double-blind RCTs determined the incidence of MI in the rosiglitazone group to be 1.99% vs. 1.51% in controls (hazard ratio, 1.31; 95% CI, 1.01-1.70).”) (emphasis added; internal references omitted).  From his endnotes, it is clear that Furberg was referencing GlaxoSmithKline’s own meta-analysis, which used myocardial ischemia, not MI, as an end point.  See Alexander Cobitz, et al., “A retrospective evaluation of congestive heart failure and myocardial ischemia events in 14 237 patients with type 2 diabetes mellitus enrolled in 42 short-term, double-blind, randomized clinical studies with rosiglitazone,” 17 Pharmacoepi. and Drug Safety 769 (2008) (reporting GSK’s meta-analysis of 42 clinical trials for a broad definition of myocardial ischemia).  Furberg’s confusion seems to be the sort of carelessness that trial judges should be alert to guard against.

Myocardial ischemia may be variously defined, but at a minimum it may include MI and angina.  Sometimes revascularization is added.  Subjective symptoms, as vague as “dyspnea” or as specific as sub-sternal pain, may be part of the definition.  A definition of myocardial ischemia used in an exploratory, hypothesis-generating analysis, for purposes of “pharmacovigilance,” may have different validity and operational characteristics from a definition used in a study that is trying to determine whether a medication does, in fact, cause any one of the constituent end points within the composite.

2.  MACE

Recently, the use of the MACE composite end point has been subjected to greater scrutiny and criticism.  Kip summarizes his group’s recent analysis:

“In light of the approximate prior 15 years of the term MACE and its wide heterogeneity in definition and research applications, it is unlikely that a consensus definition will either be universally desired or practical for future research.  Therefore, we recommend against the routine use of MACE as a composite end point at large.  However, if a broad heterogeneous composite end point such as MACE is ultimately desired, minimally, it must be clearly defined, and the individual as well as composite end points need to be analyzed, presented, and discussed.”

(Kip 2008, at 706b)

Kip notes that his group’s recommendations are consistent with those of the Academic Research Consortium, which has tried to establish consensus composite end point definitions for stent trials.  See Cutlip, et al., “Clinical end points in coronary stent trials:  a case for standardized definitions,” 115 Circulation 2344 (2007).

3.  Cardiovascular or cardiac death

The use of a composite end point of cardiac death has elicited some strong criticism in the published literature, most notably from Dr. Nissen’s former colleague, Dr. Eric Topol.  See generally, Lauer & Topol, “Clinical trials – Multiple treatments, multiple end points, and multiple lessons,” 289 JAMA 2575 (2003).

“Among fatal end points, only all-cause mortality can be considered objective, unbiased, and clinically relevant.  As previously reviewed in depth, the use of end points such as ‘cardiac death’, ‘vascular death’, and ‘arrhythmic death’ are inherently subject to error due to biased assessment and to the biological complexities of disease, especially among elderly individuals.”

(Lauer & Topol 2003, at 2575b)

“When mortality is considered, only all-cause mortality is a valid end point, while end points such as ‘cardiac death’ and ‘arrhythmic death’ should be actively discouraged.”

(Lauer & Topol 2003, at 2577a)

4.  All-cause death

Although most authors accept “any death” as a potential corrective to competing risks, and as the ultimate, objective outcome, Lauer and Topol do not completely spare from criticism the inclusion of all-cause death in outcome composites:

“A composite end point that includes death as well as nonfatal events is subject to biases related to competing risks.  Obviously, patients who die cannot later experience nonfatal myocardial infarction or be hospitalized.  A treatment that leads to an increased risk of death may therefore appear to reduce the risk of nonfatal events.  Although formal methods have been developed to analyze competing risks in an unbiased manner, the optimal approach to this problem is unclear.”

(Lauer & Topol 2003, at 2576a)

J.   Bibliography

Cutlip, et al., “Clinical end points in coronary stent trials:  a case for standardized definitions,” 115 Circulation 2344 (2007)

FDA Background Introductory Memorandum, for Endocrinologic and Metabolic Drugs Advisory Committee meeting (July 1-2, 2008)

Ferreira-Gonzalez, et al., “Problems with the use of composite end point in cardiovascular trials: systematic review of randomized controlled trials.”  334 Brit. Med. J.  (published online 2 April 2007).

R. Fletcher & S. Fletcher, Clinical Epidemiology:  The Essentials (4th ed. 2005).

Freemantle, et al., “Composite outcomes in randomized trials: Greater precision but with greater uncertainty.”  289 J. Am. Med. Ass’n  2554 (2003)

Freemantle & Calvert, “Composite and surrogate outcomes in randomized controlled trials.” 334 Brit. Med. J.  756 (2007)

International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use, “ICH harmonized tripartite guideline:  statistical principles for clinical trials,” 18 Stat. Med. 1905 (1999)

Kip, et al., “The problem with composite end points in cardiovascular studies,” 51 J. Am. Coll. Cardiol. 701 (2008)

Lauer & Topol, “Clinical trials – Multiple treatments, multiple end points, and multiple lessons.”  289 J. Am. Med. Ass’n 2575 (2003)

Montori, et al., “Users’ guide to detecting misleading claims in clinical research reports,” 329 Brit. Med. J. 1093 (2004)

Montori, et al., “Validity of composite end points in clinical trials,” 330 Brit. Med. J. 594 (2005)

Neaton, et al., “Key issues in end point selection for heart failure trials:  composite end points,” 11 J. Cardiac Failure 567 (2005)

Schulz & Grimes, “Multiplicity in randomized trials I:  endpoints and treatments,” 365 Lancet 1591 (2005)

NIOSH Report Sets Up Run on September 11th Victim Compensation Fund by Non-Victims

June 16th, 2012

Congress created the September 11th Victim Compensation Fund, 49 U.S.C. § 40101, to compensate victims of the terrorist attack.  Being a victim implies that the harm to be compensated was caused by the attack and its consequences.  Understandably, many of the harms were acute injuries, but what about cancer?  The latency period for most cancers is greater than 10 years, and the latency alone would suggest that persons who developed cancer within 10 years were not “victims,” but rather expected incidences of prevalent, chronic disease, or the result of much earlier exposures in the patients’ lifetimes.

Less than a year ago, the New York Times reported on a NIOSH report, which documented that there was little evidence upon which to rely, and what was available did not support conclusions of causality.  See Anemona Hartocollis, “Scant Evidence to Link 9/11 to Cancer, U.S. Report Says,” N.Y. Times (July 26, 2011).  The report appropriately noted that “[d]rawing causal inferences about exposures resulting from the Sept. 11, 2001, terrorist attacks and the observation of cancer cases in responders and survivors is especially challenging since cancer is not a rare disease.”

A few months later, the Times reported on an epidemiologic study of firefighters who were present at the World Trade Center in 2001.  Sydney Ember, “Study Suggests Higher Cancer Risk for 9/11 Firefighters,” N.Y. Times (Sept. 1, 2011).  According to the Times, the study:

“says firefighters who toiled in the wreckage of the World Trade Center in 2001 were 19 percent more likely to develop cancer than those who were not there, the strongest evidence to date of a possible link between work at ground zero and cancer. The study, published Thursday in the British medical journal The Lancet, included almost 10,000 New York City firefighters, most of whom were exposed to the caustic dust and smoke created by the fall of the twin towers. The findings indicate an “increased likelihood for the development of any type of cancer,” said Dr. David J. Prezant, the chief medical officer for the New York Fire Department, who led the study. But he said the results were far from conclusive. ‘This is not an epidemic’, he said.”

Well, this is just bad reporting; the study said nothing of the sort.  The study reported a standardized incidence ratio for all cancers that was not statistically significant:  either 1.10 (95% CI 0.98–1.25), with the general U.S. male population as the comparison group, or 1.19 (95% CI 0.96–1.47), with unexposed firefighters as the comparison group, corrected for possible surveillance bias.  Here is the authors’ (including Dr. Prezant’s) published interpretation of the data:

“We reported a modest excess of cancer cases in the WTC-exposed cohort. We remain cautious in our interpretation of this finding because the time since 9/11 is short for cancer outcomes, and the reported excess of cancers is not limited to specific organ types. As in any observational study, we cannot rule out the possibility that effects in the exposed group might be due to unidentified confounders. Continued follow-up will be important and should include cancer screening and prevention strategies.”

Rachel Zeig-Owens, Mayris Webber, Charles Hall, Theresa Schwartz, Nadia Jaber, Jessica Weakley, Thomas Rohan, Hillel Cohen, Olga Derman, Thomas Aldrich, Kerry Kelly & David Prezant, “Early assessment of cancer outcomes in New York City firefighters after the 9/11 attacks: an observational cohort study,” 378 Lancet 898, 898 (2011) [hereafter Zeig-Owens].

The Zeig-Owens study followed a cohort of New York firefighters who had worked at the WTC in the immediate aftermath of the attack.  The data ruled out neither chance nor bias and confounding as explanations for the reported risk ratios.  The potentially toxic exposures at the WTC were only some of the exposures these men experienced over their careers.  Comparisons with the general population are thus not terribly revealing, but the study also compared the WTC firefighters with other firefighters who did not work at the WTC.

For firefighters, lung cancer is typically a concern, but the Zeig-Owens study reported that the WTC firefighters had a lower than expected incidence of lung cancer:

lung cancer:  SIR 0.53 (95% CI 0.18–1.54)

Some cancers had an elevated SIR, which is also expected given that the study looked at dozens of different outcomes.  Esophageal cancer was typical of the few cancers that fell above 1.0; the SIR was 1.32, but the 95% confidence interval was huge, running from 0.12 to 14.53.  Understandably, the authors of the WTC study did not assert any causal conclusions.
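The arithmetic behind such figures is easy to check.  Here is a minimal sketch, in Python, of how a standardized incidence ratio and its 95% confidence interval are computed from observed and expected case counts, using Byar’s approximation for the Poisson limits; the counts below are hypothetical, chosen only to illustrate an interval that spans 1.0 and so fails to rule out chance:

    import math

    def sir_with_ci(observed, expected, z=1.96):
        """Standardized incidence ratio with an approximate 95% confidence
        interval, using Byar's approximation for the Poisson count."""
        sir = observed / expected
        lower = observed * (1 - 1/(9*observed) - z/(3*math.sqrt(observed)))**3 / expected
        o = observed + 1
        upper = o * (1 - 1/(9*o) + z/(3*math.sqrt(o)))**3 / expected
        return sir, lower, upper

    # Hypothetical counts, for illustration only: an SIR near 1.10 whose
    # interval spans 1.0, and so is compatible with chance.
    print(sir_with_ci(observed=263, expected=238))  # ~ (1.10, 0.98, 1.25)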

The lack of causal conclusions and evidence did not ultimately stand in the way of politics.  Last week, John Howard, the director of the National Institute for Occupational Safety and Health (NIOSH), issued his ruling that some 50 different types of cancer be added to the illnesses and injuries covered by the 9/11 Compensation Fund.  Anemona Hartocollis, “Sept. 11 Health Fund Given Clearance to Cover Cancer,” N.Y. Times (June 8, 2012).

Not only is Howard’s report unsupported by appropriate scientific conclusions of causality; it is a ghastly insult to those men and women who were truly victims of the attack.  On virtually no evidence at all, Howard’s decision dilutes the fund for those truly injured.  The extent of the dilution is disturbing; the decision will allow not only the victims and the heroic rescuers to apply for compensation for cancers, but residents and passersby as well.

The Times quoted Dr. Howard as stating that the Zeig-Owens study provided “a strong foundation for a conclusion that some cancers had been caused by exposure to the WTC debris.”  This is rubbish.  Coming from the director of a supposedly scientific agency, the statement is shocking.  As noted above, the lung cancer incidence ratio for WTC firefighters was lower than expected, whether compared with non-WTC firefighters or with the general male population.  The presence of “known or potential carcinogens” in the debris hardly justifies compensation unless the exposures were of sufficient intensity and duration, with appropriate latencies, to have caused the claimed cancers.  Howard’s report is shamelessly bereft of supporting evidence; it invites the bilking of the compensation fund and the diversion of compensation from the true victims of the jihadist attack.

Although the Times ignored the primary data, it did acknowledge that Howard’s report, and the recommendation upon which he relied, seemed to be based upon “societal concerns that the cancer patients not be left out of the fund.”  This is interest-group politics substituting for science.

The Times quoted Dr. Alfred I. Neugut, an oncologist and professor of epidemiology at the Mailman School of Public Health at Columbia University, as stating that the decision was “primarily motivated by concern for a sympathetic population,”  and that “[t]he scientific evidence currently is certainly weak; whether future evidence bears out the wisdom of this decision will have to be seen.”

But “weak” is an understatement.  The Zeig-Owens study looked at dozens of organ cancers and subtypes; it was a huge exercise in data mining, which can best be described as hypothesis generating.  Howard, however, decided to reject any semblance of evidence-based medicine:

“Requiring evidence of positive associations from studies of 9/11-exposed populations exclusively does not serve the best interests.”

So apparently positive associations are no longer required; compensation can be based upon negative associations, as was the case with WTC firefighting and lung cancer.

The Times cheered Howard’s decision in an editorial that followed quickly on the heels of the NIOSH report. “Ground Zero Cancers” (June 14, 2012).  The Times acknowledged that “[s]ome experts still believe the evidence linking the Sept. 11 attacks to cancers is weak. But we have a moral obligation to ensure that those harmed by exposure at ground zero get the medical and financial help they need.”

But “weak” is a gross overstatement, and whence comes a moral obligation to help those not harmed by exposure at ground zero?  Where is the morality of diluting the compensation fund for those who were truly victims?  Howard’s report represents not only an abdication of the evidence-based world view, but profound disrespect for those who were killed and maimed in this brutal attack.

Predictably, plaintiffs’ counsel have already urged mesothelioma patients to consider the fund.  See “Lawyer Urges NYC Mesothelioma Sufferers to Explore Options after Decision Expanding 9/11 Fund,” New York, NY (PRWEB) (June 16, 2012) (“New York mesothelioma lawyer Joseph W. Belluck today said that 9/11 workers with mesothelioma should step forward to explore their eligibility for compensation in light of a federal ruling that greatly expands the scope of a $4.3 billion fund established to compensate and treat people exposed to toxic smoke, dust and fumes following the Sept. 11, 2001, terrorist attacks.”)  No mesothelioma cases were reported as incident among the WTC-exposed firefighters.

An internet search on “cancer 9/11 compensation fund” turned up dozens of lawyer advertisements and websites urging cancer claims against the Fund.

Sorting Out Confounded Research – Required by Rule 702

June 10th, 2012

CONFOUNDING

Back in 2000, several law professors wrote an essay in which they detailed some of the problems faced in expert witness gatekeeping.  They noted that judges easily grasped the problem of generalizing from animal evidence to human experience, and thus simplistically emphasized human (epidemiologic) data.  But in that emphasis, the judges missed problems of internal validity, such as confounding, in epidemiologic studies:

“Why do courts have such a preference for human epidemiological studies over animal experiments? Probably because the problem of external validity (generalizability) is one of the most obvious aspects of research methodology, and therefore one that non-scientists (including judges) are able to discern with ease – and then give excessive weight to (because whether something generalizes or not is an empirical question; sometimes things do and other times they do not). But even very serious problems of internal validity are harder for the untrained to see and understand, so judges are slower to exclude inevitably confounded epidemiological studies (and give insufficient weight to that problem). Sophisticated students of empirical research see the varied weaknesses, want to see the varied data, and draw more nuanced conclusions.”

David Faigman, David Kaye, Michael Saks & Joseph Sanders, “How Good is Good Enough?  Expert Evidence Under Daubert and Kumho,” 50 Case Western Reserve L. Rev. 645, 661 n.55 (2000).  I am not sure that the problems are dependent in the fashion suggested by the authors, but their assessment that judges may be slow to see internal validity problems, and frequently lack the ability to draw nuanced conclusions, seems fair enough.  Judges continue to miss important validity issues, perhaps because the adversarial process levels all studies to debating points in litigation.  See, e.g., In re Welding Fume Prods. Liab. Litig., 2006 WL 4507859, *33 (N.D. Ohio 2006) (reducing all studies to one level, and treating all criticisms as though they rendered all studies invalid).
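To make the internal-validity point concrete, consider a small simulation; the numbers are invented, and the sketch is mine, not the professors’.  The exposure below does nothing, yet the crude comparison shows a risk ratio near 1.8, solely because smokers are overrepresented among the exposed; stratifying on the confounder makes the spurious association disappear:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 200_000

    smoker = rng.random(n) < 0.3
    # The exposure is more common among smokers (hypothetical rates) ...
    exposed = rng.random(n) < np.where(smoker, 0.6, 0.2)
    # ... but disease risk depends on smoking alone; the exposure is causally inert.
    disease = rng.random(n) < np.where(smoker, 0.02, 0.005)

    def risk(mask):
        return disease[mask].mean()

    crude_rr = risk(exposed) / risk(~exposed)                           # ~1.8, spurious
    rr_smokers = risk(exposed & smoker) / risk(~exposed & smoker)       # ~1.0
    rr_nonsmokers = risk(exposed & ~smoker) / risk(~exposed & ~smoker)  # ~1.0
    print(crude_rr, rr_smokers, rr_nonsmokers)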

[This discussion of confounding has been updated; see here and there.]

 

Scientific illiteracy among the judiciary

February 29th, 2012

Ken Feinberg, speaking at a symposium on mass torts, asks what legal challenges mass torts confront in the federal courts.  The answer seems obvious.

Pharmaceutical cases that warrant federal court multi-district litigation (MDL) treatment typically involve complex scientific and statistical issues.  The public deserves to have MDL cases assigned to judges who have special experience and competence to preside over cases in which these complex issues predominate.  There appears to be no procedural device to ensure that the judges selected in the MDL process have the necessary experience and competence, and there is a good deal of evidence to suggest that the MDL judges are not up to the task at hand.

In the aftermath of the Supreme Court’s decision in Daubert, the Federal Judicial Center assumed responsibility for producing science and statistics tutorials to help judges grapple with technical issues in their cases.  The Center has produced videotaped lectures as well as the Reference Manual on Scientific Evidence, now in its third edition.  Despite the Center’s best efforts, many federal judges have shown themselves to be incorrigible.  It is time to revive the discussions and debates about implementing a “science court.”

The following three federal MDLs all involved pharmaceutical products, well-respected federal judges, and a fundamental error in statistical inference.

Avandia

Avandia is a prescription oral anti-diabetic medication licensed by GlaxoSmithKline (GSK).  Concerns over Avandia’s association with excess heart attack risk resulted in regulatory revisions of its availability, as well as thousands of lawsuits.  In a decision that affected virtually all of those several thousand claims, aggregated for pretrial handling in a federal MDL, a federal judge, in ruling on a Rule 702 motion, described a clinical trial with a risk ratio greater than 1.0, and a p-value of 0.08, as follows:

“The DREAM and ADOPT studies were designed to study the impact of Avandia on prediabetics and newly diagnosed diabetics. Even in these relatively low-risk groups, there was a trend towards an adverse outcome for Avandia users (e.g., in DREAM, the p-value was .08, which means that there is a 92% likelihood that the difference between the two groups was not the result of mere chance).FN72

In re Avandia Marketing, Sales Practices and Product Liability Litigation, 2011 WL 13576, *12 (E.D. Pa. 2011)(Rufe, J.).  This is a remarkable error by a trial judge given the responsibility for pre-trial handling of so many cases.  There are many things you can argue about a p-value of 0.08, but Judge Rufe’s interpretation is not an argument; it is error.  That such an error, explicitly warned against in the Reference Manual on Scientific Evidence, could be made by an MDL judge, over 15 years since the first publication of the Manual, highlights the seriousness and the extent of the illiteracy problem.
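What a p-value of 0.08 does mean can be shown by simulating the null hypothesis.  The sketch below uses hypothetical trial counts, not the DREAM data:  it estimates how often chance alone, with no true difference between treatment arms, produces a disparity in events at least as large as the one observed.  That conditional probability is the p-value; it is not the probability that the observed difference was, or was not, the result of chance.

    import numpy as np

    rng = np.random.default_rng(0)

    def simulated_p_value(n_per_arm, events_drug, events_placebo, trials=200_000):
        """One-sided p-value by simulation: assuming no true difference
        (both arms share one pooled event rate), how often does chance
        produce a disparity at least as large as the one observed?"""
        pooled_rate = (events_drug + events_placebo) / (2 * n_per_arm)
        a = rng.binomial(n_per_arm, pooled_rate, trials)
        b = rng.binomial(n_per_arm, pooled_rate, trials)
        return np.mean(a - b >= events_drug - events_placebo)

    # Hypothetical counts chosen to land in the neighborhood of p = 0.08:
    print(simulated_p_value(n_per_arm=2500, events_drug=59, events_placebo=44))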

What possible basis could the Avandia MDL court have to support this clearly erroneous interpretation of crucial studies in the litigation?  Footnote 72 in Judge Rufe’s opinion references a report by plaintiffs’ expert witness, Allan D. Sniderman, M.D., “a cardiologist, medical researcher, and professor at McGill University.”  Id. at *10.  The trial court goes on to note that:

“GSK does not challenge Dr. Sniderman’s qualifications as a cardiologist, but does challenge his ability to analyze and draw conclusions from epidemiological research, since he is not an epidemiologist. GSK’s briefs do not elaborate on this challenge, and in any event the Court finds it unconvincing given Dr. Sniderman’s credentials as a researcher and published author, as well as clinician, and his ability to analyze the epidemiological research, as demonstrated in his report.”

Id.

What more evidence could the Avandia MDL trial court possibly have needed to show that Sniderman was incompetent to give statistical and epidemiologic testimony?  Fundamentally at odds with the Manual on an uncontroversial point, Sniderman had given the court a baseless, incorrect interpretation of a p-value.  Everything else he might have to say on the subject was thus suspect.  If, as the court suggested, GSK did not elaborate upon its challenge with specific examples, then shame on GSK.  The trial court, however, could readily have determined that Sniderman was speaking nonsense by reading the chapter on statistics in the Reference Manual on Scientific Evidence.  For all my complaints about gaps in the Manual’s coverage, its text on this issue is clear and concise.  It really is not too much to expect an MDL trial judge to be conversant with the basic concepts of scientific and statistical evidence set out in the Manual, which was prepared precisely to help federal judges.

Phenylpropanolamine (PPA) Litigation

Litigation over phenylpropanolamine was aggregated, within the federal system, before Judge Barbara Rothstein.  Judge Rothstein is not only a respected federal trial judge; she was also the director of the Federal Judicial Center, which produces the Reference Manual on Scientific Evidence.  Her involvement in overseeing the preparation of the third edition of the Manual, however, did not keep her from badly misunderstanding and misstating the meaning of a p-value in the PPA litigation.  See In re Phenylpropanolamine (PPA) Prods. Liab. Litig., 289 F. Supp. 2d 1230, 1236 n.1 (W.D. Wash. 2003) (“P-values measure the probability that the reported association was due to chance… .”).  Tellingly, Judge Rothstein denied, in large part, the defendants’ Rule 702 challenges.  Juries, however, overwhelmingly rejected the claims that PPA caused plaintiffs’ strokes.

Ephedra Litigation

Judge Rakoff, of the Southern District of New York, notoriously committed the transposition fallacy in the Ephedra litigation:

“Generally accepted scientific convention treats a result as statistically significant if the P-value is not greater than .05. The expression ‘P=.05’ means that there is one chance in twenty that a result showing increased risk was caused by a sampling error—i.e., that the randomly selected sample accidentally turned out to be so unrepresentative that it falsely indicates an elevated risk.”

In re Ephedra Prods. Liab. Litig., 393 F.Supp. 2d 181, 191 (S.D.N.Y. 2005).

Judge Rakoff then fallaciously argued that requiring significance at the customary 5% level increased the “more likely than not” burden of proof upon a civil litigant.  Id. at 188, 193.  See Michael O. Finkelstein, Basic Concepts of Probability and Statistics in the Law 65 (2009).
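The fallacy can be made explicit with Bayes’ theorem.  In the schematic calculation below, every input is an assumption chosen for illustration:  equal prior odds on the null and the alternative, and data ten times likelier under the alternative, with the p-value standing in, loosely, for the likelihood of the data under the null.  Even on those generous assumptions, the posterior probability of the null is about 9%, not 5%, and it moves with the prior; the p-value by itself fixes no such probability.

    # Bayes' theorem: P(null | data) = P(data | null) * P(null) / P(data).
    # All inputs below are assumptions for illustration only.
    p_data_given_null = 0.05   # standing in, loosely, for the p-value
    p_data_given_alt = 0.50    # assumed likelihood of the data under the alternative
    prior_null = 0.50          # assumed prior probability of the null

    posterior_null = (p_data_given_null * prior_null) / (
        p_data_given_null * prior_null + p_data_given_alt * (1 - prior_null))
    print(posterior_null)      # ~0.09, not 0.05; a different prior yields a different answer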

Judge Rakoff may well have had help in confusing the probability used to characterize the plaintiff’s burden of proof with the probability of attained significance.  At least one of the defense expert witnesses in the Ephedra cases gave an erroneous definition of “statistically significant association,” which may have invited the judicial error:

“A statistically significant association is an association between exposure and disease that meets rigorous mathematical criteria demonstrating that the finding is unlikely to be the result of chance.”

Report of John Concato, MD, MS, MPH, at 7, ¶29 (Sept. 13, 2004).  Dr. Concato’s error was picked up and repeated in the defense briefing of its motion to preclude:

“The likelihood that an observed association could occur by chance alone is evaluated using tests for statistical significance.”

Memorandum of Law in Support of Motion by Ephedra Defendants to Exclude Expert Opinions of Charles Buncher, [et alia] …That Ephedra Causes Hemorrhagic Stroke, Ischemic Stroke, Seizure, Myocardial Infarction, Sudden Cardiac Death, and Heat-Related Illnesses at 9 (Dec. 3, 2004).

Judge Rakoff’s insistence that requiring “statistical significance” at the customary 5% level would change the plaintiffs’ burden of proof, and would require greater certitude of epidemiologists than of expert witnesses who opine in less “rigorous” fields of learning, is wrong as a matter of fact.  His Honor’s comparison, moreover, ignores the Supreme Court’s observation that the point of Rule 702 is:

‘‘to make certain that an expert, whether basing testimony upon professional studies or personal experience, employs in the courtroom the same level of intellectual rigor that characterizes the practice of an expert in the relevant field.’’

Kumho Tire Co. v. Carmichael, 526 U.S. 137, 152 (1999).

Judge Rakoff not only ignored the conditional nature of significance probability, but also overinterpreted the role of significance testing in arriving at a conclusion of causality.  Statistical significance may answer the question of the strength of the evidence for ruling out chance as the producer of the observed data, on the assumption of no increased risk, but it does not alone answer the question whether the study result shows an increased risk.  Bias and confounding must be considered, along with the other Bradford Hill factors.

Even if the p-value could be turned into a posterior probability of the null hypothesis, there would be many other probabilities that would necessarily diminish that probability.  Some of the other factors (which could be expressed as objective or subjective probabilities) include:

  • accuracy of the data reporting
  • data collection
  • data categorization
  • data cleaning
  • data handling
  • data analysis
  • internal validity of the study
  • external validity of the study
  • credibility of study participants
  • credibility of study researchers
  • credibility of the study authors
  • accuracy of the study authors’ expression of their research
  • accuracy of the editing process
  • accuracy of the testifying expert witness’s interpretation
  • credibility of the testifying expert witness
  • other available studies, and their respective data and analysis factors
  • all the other Bradford Hill factors

If these largely independent factors each had a probability or accuracy of 95%, the conjunction of their probabilities would likely fall below the needed feather weight on top of 50%.  In sum, Judge Rakoff’s confusion of significance probability with the posterior probability of the null hypothesis does not subvert the usual standards of proof in civil cases.  See also Sander Greenland, “Null Misinterpretation in Statistical Testing and Its Impact on Health Risk Assessment,” 53 Preventive Medicine 225 (2011).
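The arithmetic of the conjunction is unforgiving.  Treating the seventeen factors listed above as independent, an assumption made only for illustration, and crediting each with 95% reliability:

    # Joint probability of seventeen independent factors, each 95% reliable:
    print(0.95 ** 17)   # ~0.42, well short of "more likely than not"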

WHENCE COMES THIS ERROR?

As a matter of intellectual history, I wonder where this error entered the judicial system.  As a general matter, there was not much judicial discussion of statistical evidence before the 1970s.  The earliest manifestation of the transpositional fallacy in connection with scientific and statistical evidence appears in an opinion of the United States Court of Appeals for the District of Columbia Circuit.  Ethyl Corp. v. EPA, 541 F.2d 1, 28 n.58 (D.C. Cir.), cert. denied, 426 U.S. 941 (1976).  The Circuit’s language is worth examining carefully:

“Petitioners demand sole reliance on scientific facts, on evidence that reputable scientific techniques certify as certain.

Typically, a scientist will not so certify evidence unless the probability of error, by standard statistical measurement, is less than 5%. That is, scientific fact is at least 95% certain.  Such certainty has never characterized the judicial or the administrative process. It may be that the ‘beyond a reasonable doubt’ standard of criminal law demands 95% certainty.  Cf. McGill v. United States, 121 U.S.App.D.C. 179, 185 n.6, 348 F.2d 791, 797 n.6 (1965). But the standard of ordinary civil litigation, a preponderance of the evidence, demands only 51% certainty. A jury may weigh conflicting evidence and certify as adjudicative (although not scientific) fact that which it believes is more likely than not. ***”

Id.  The 95% certainty appears to derive from 95% confidence intervals, although “confidence” is a technical term in statistics, and it most certainly does not mean the probability of the alternative hypothesis under consideration.  Similarly, the “probability of error” of less than 5% is not the probability that the hypothesis of no difference between observations and expectations is true; it is the probability of observing the data, or data even more extreme, on the assumption that the observed equals the expected.  The District of Columbia Circuit thus created a strawman:  scientific certainty is 95%, whereas civil and administrative law certainty is 51%.  This is rubbish, which confuses the frequentist probability used in hypothesis testing with a subjective probability of belief in a fact.
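“Confidence” describes the long-run behavior of the interval-constructing procedure, not the probability that any particular hypothesis is true.  A standard textbook simulation, sketched below with invented parameters, shows that roughly 95% of intervals built this way cover the true mean across repeated samples; no single interval carries a “95% certainty” about the fact in dispute.

    import numpy as np

    rng = np.random.default_rng(1)
    true_mean, sigma, n, reps = 10.0, 2.0, 50, 100_000

    samples = rng.normal(true_mean, sigma, (reps, n))
    means = samples.mean(axis=1)
    half_width = 1.96 * sigma / np.sqrt(n)   # sigma treated as known, for simplicity

    covered = np.mean((means - half_width <= true_mean) &
                      (true_mean <= means + half_width))
    print(covered)   # ~0.95: long-run coverage, not the probability of any hypothesis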

The transpositional fallacy has a good pedigree, but pedigree does not make it correct.  Only a lawyer would suggest that a mistake once made is somehow binding upon future litigants.  The following collection of citations and references illustrates how widespread the fundamental misunderstanding of statistical inference is, in the courts, in the academy, and at the bar.  If courts cannot deliver fair, accurate adjudication of scientific facts, then it is time to reform the system.


Courts

U.S. Supreme Court

Vasquez v. Hillery, 474 U.S. 254, 259 n.3 (1986) (“the District Court . . . accepted . . . a probability of 2 in 1,000 that the phenomenon was attributable to chance”)

U.S. Court of Appeals

First Circuit

Fudge v. Providence Fire Dep’t, 766 F.2d 650, 658 (1st Cir. 1985) (“Widely accepted statistical techniques have been developed to determine the likelihood an observed disparity resulted from mere chance.”)

Second Circuit

Nat’l Abortion Fed. v. Ashcroft, 330 F. Supp. 2d 436 (S.D.N.Y. 2004), aff’d in part, 437 F.3d 278 (2d Cir. 2006), vacated, 224 Fed. App’x 88 (2d Cir. 2007) (reporting an expert witness’s interpretation of a p-value of 0.30 to mean that there was a 30% probability that the study results were due to chance alone)

Smith v. Xerox Corp., 196 F.3d 358, 366 (2d Cir. 1999) (“If an obtained result varies from the expected result by two standard deviations, there is only about a .05 probability that the variance is due to chance.”)

Waisome v. Port Auth., 948 F.2d 1370, 1376 (2d Cir. 1991) (“about one chance in 20 that the explanation for a deviation could be random”)

Ottaviani v. State Univ. of New York at New Paltz, 875 F.2d 365, 372 n.7 (2d Cir. 1989)

Murphy v. General Elec. Co., 245 F. Supp. 2d 459, 467 (N.D.N.Y. 2003) (“less than a 5% probability that age was related to termination by chance”)

Third Circuit

United States v. State of Delaware, 2004 WL 609331, *10 n.27 (D. Del. 2004) (“there is a 5% (or 1 in 20) chance that the relationship observed is purely random”)

Magistrini v. One Hour Martinizing Dry Cleaning, 180 F. Supp. 2d 584, 605 n.26 (D.N.J. 2002) (“only 5% probability that an observed association is due to chance”)

Fifth Circuit

EEOC v. Olson’s Dairy Queens, Inc., 989 F.2d 165, 167 (5th Cir. 1993) (“Dr. Straszheim concluded that the likelihood that [the] observed hiring patterns resulted from truly race-neutral hiring practices was less than one chance in ten thousand.”)

Capaci v. Katz & Besthoff, Inc., 711 F.2d 647, 652 (5th Cir. 1983) (“the highest probability of unbiased hiring was 5.367 × 10⁻²⁰”), cert. denied, 466 U.S. 927 (1984)

Rivera v. City of Wichita Falls, 665 F.2d 531, 545 n.22 (5th Cir. 1982) (“A variation of two standard deviations would indicate that the probability of the observed outcome occurring purely by chance would be approximately five out of 100; that is, it could be said with a 95% certainty that the outcome was not merely a fluke. Sullivan, Zimmer & Richards, supra n.9 at 74.”)

Vuyanich v. Republic Nat’l Bank, 505 F. Supp. 224, 272 (N.D. Tex. 1980) (“the chances are less than one in 20 that the true coefficient is actually zero”), judgment vacated, 723 F.2d 1195 (5th Cir. 1984)


Seventh Circuit

Adams v. Ameritech Services, Inc., 231 F.3d 414, 424, 427 (7th Cir. 2000) (“it is extremely unlikely (that is, there is less than a 5% probability) that the disparity is due to chance.”)

Sheehan v. Daily Racing Form, Inc., 104 F.3d 940, 941 (7th Cir. 1997) (“An affidavit by a statistician . . . states that the probability that the retentions . . . are uncorrelated with age is less than 5 percent.”)

Eighth Circuit

Craik v. Minnesota State Univ. Bd., 731 F.2d 465, 476 n.13 (8th Cir. 1984) (“Statistical significance is a measure of the probability that an observed disparity is not due to chance. Baldus & Cole, Statistical Proof of Discrimination § 9.02, at 290 (1980). A finding that a disparity is statistically significant at the 0.05 or 0.01 level means that there is a 5 per cent. or 1 per cent. probability, respectively, that the disparity is due to chance.”)

Ninth Circuit

Good v. Fluor Daniel Corp., 222 F. Supp. 2d 1236, 1241 n.9 (E.D. Wash. 2002) (describing “statistical tools to calculate the probability that the difference seen is caused by random variation”)

D.C. Circuit

National Lime Ass’n v. EPA, 627 F.2d 416, 453 (D.C. Cir. 1980)

Federal Circuit

Hodges v. Secretary Dep’t Health & Human Services, 9 F.3d 958, 967 (Fed. Cir. 1993) (Newman, J., dissenting) (“Scientists as well as judges must understand: ‘the reality that the law requires a burden of proof, or confidence level, other than the 95 percent confidence level that is often used by scientists to reject the possibility that chance alone accounted for observed differences’.”) (citing and quoting from the Report of the Carnegie Commission on Science, Technology, and Government, Science and Technology in Judicial Decision Making 28 (1993))


Regulatory Guidance

OSHA’s Guidance for Compliance with the Hazard Communication Standard:

“Statistical significance is a mathematical determination of the confidence in the outcome of a test. The usual criterion for establishing statistical significance is the p-value (probability value). A statistically significant difference in results is generally indicated by p < 0.05, meaning there is less than a 5% probability that the toxic effects observed were due to chance and were not caused by the chemical. Another way of looking at it is that there is a 95% probability that the effect is real, i.e., the effect seen was the result of the chemical exposure.”

U.S. Dep’t of Labor, Guidance for Hazard Determination for Compliance with the OSHA Hazard Communication Standard (29 CFR § 1910.1200) Section V (July 6, 2007).


Academic Commentators

Lucinda M. Finley, “Guarding the Gate to the Courthouse:  How Trial Judges Are Using Their Evidentiary Screening Role to Remake Tort Causation Rules,” 49 DePaul L. Rev. 335, 348 n.49 (1999):

“Courts also require that the risk ratio in a study be ‘statistically significant,’ which is a statistical measurement of the likelihood that any detected association has occurred by chance, or is due to the exposure. Tests of statistical significance are intended to guard against what are called ‘Type I’ errors, or falsely ascribing a relationship when there in fact is not one (a false positive).  See SANDERS, supra note 5, at 51. The discipline of epidemiology is inherently conservative in making causal ascriptions, and regards Type I errors as more serious than Type II errors, or falsely assuming no association when in fact there is one (false negative). Thus, epidemiology conventionally requires a 95% level of statistical significance, i.e. that in statistical terms it is 95% likely that the association is due to exposure, rather than to chance. See id. at 50-52; Thompson, supra note 3, at 256-58. Despite courts’ use of statistical significance as an evidentiary screening device, this measurement has nothing to do with causation. It is most reflective of a study’s sample size, the relative rarity of the disease being studied, and the variance in study populations. Thompson, supra note 3, at 256.”

 

Erica Beecher-Monas, Evaluating Scientific Evidence: An Interdisciplinary Framework for Intellectual Due Process 42 n. 30 (2007):

 “‘By rejecting a hypothesis only when the test is statistically significant, we have placed an upper bound, .05, on the chance of rejecting a true hypothesis’. Fienberg et al., p. 22. Another way of explaining this is that it describes the probability that the procedure produced the observed effect by chance.”

Professor Fienberg stated the matter correctly, but Beecher-Monas goes on to restate the matter in her own words, erroneously.  Later, she repeats her incorrect interpretation:

“Statistical significance is a statement about the frequency with which a particular finding is likely to arise by chance.”

Id. at 61 (citing a paper by Sander Greenland, who correctly stated the definition).

Mark G. Haug, “Minimizing Uncertainty in Scientific Evidence,” in Cynthia H. Cwik & Helen E. Witt, eds., Scientific Evidence Review:  Current Issues at the Crossroads of Science, Technology, and the Law – Monograph No. 7, at 87 (2006)

Carl F. Cranor, Regulating Toxic Substances: A Philosophy of Science and the Law at 33-34 (Oxford 1993) (One can think of α, β (the chances of type I and type II errors, respectively) and 1 − β as measures of the “risk of error” or “standards of proof.”)  See also id. at 44, 47, 55, 72-76.

Arnold Barnett, “An Underestimated Threat to Multiple Regression Analyses Used in Job Discrimination Cases,” 5 Indus. Rel. L.J. 156, 168 (1982) (“The most common rule is that evidence is compelling if and only if the probability the pattern obtained would have arisen by chance alone does not exceed five percent.”)

David W. Barnes, Statistics as Proof: Fundamentals of Quantitative Evidence 162 (1983)(“Briefly, however, the findings of statistical significance at the P < .05, P < .04, and P < .02 levels indicate that the court can be 95%, 96%, and 98% certain, respectively, that the null hypotheses involved in the specific tests carried out … should be rejected.”)

Wayne Roth-Nelson & Kathey Verdeal, “Risk Evidence in Toxic Torts,” 2 Envt’l Lawyer 405, 415-16 (1996) (confusing the burden of proof with the standard for hypothesis testing, and apparently endorsing the erroneous views given by Judge Newman, dissenting in Hodges).  Caveat:  Roth-Nelson is now a “forensic” toxicologist, who testifies in civil and criminal trials.

Steven R. Weller, “Book Review: Regulating Toxic Substances: A Philosophy of Science and Law,” 6 Harv. J. L. & Tech. 435, 436, 437-38 (1993) (“only when the statistical evidence gathered from studies shows that it is more than ninety-five percent likely that a test substance causes cancer will the substance be characterized scientifically as carcinogenic … to determine legal causality, the plaintiff need only establish that the probability with which it is true that the substance in question causes cancer is at least fifty percent, rather than the ninety-five percent to prove scientific causality”).

The Carnegie Commission on Science, Technology, and Government, Report on Science and Technology in Judicial Decision Making 28 (1993) (“The reality is that courts often decide cases not on the scientific merits, but on concepts such as burden of proof that operate differently in the legal and scientific realms. Scientists may misperceive these decisions as based on a misunderstanding of the science, when in actuality the decision may simply result from applying a different norm, one that, for the judiciary, is appropriate.  Much, for instance, has been written about ‘junk science’ in the courtroom. But judicial decisions that appear to be based on ‘bad’ science may actually reflect the reality that the law requires a burden of proof, or confidence level, other than the 95 percent confidence level that is often used by scientists to reject the possibility that chance alone accounted for observed differences.”).


Plaintiffs’ Counsel

Steven Rotman, “Don’t Know Much About Epidemiology?” Trial (Sept. 2007) (Author’s question answered in the affirmative:  “P values.  These measure the probability that a reported association between a drug and condition was due to chance.  A P-value of 0.05, which is generally considered the standard for statistical significance, means there is a 5 percent probability that the association was due to chance.”)

Defense Counsel

Bruce R. Parker & Anthony F. Vittoria, “Debunking Junk Science: Techniques for Effective Use of Biostatistics,” 65 Defense Csl. J. 35, 44 (2002) (“a P value of .01 means the researcher can be 99 percent sure that the result was not due to chance”).

Tortini – Guilt-Free Pastry

January 16th, 2012

Well, I have blogged over 100 posts on Tortini.  I have had the gratification of seeing some of these posts quoted, approvingly and disapprovingly, in print publications, as well as in other blogs.   More important, the blog has put me in contact with some very interesting people, who have generously shared ideas, comments, and criticisms — all grist for my blogging mill.

Up till now, I have not made my blog interactive; I have not set up the blog to permit comments from readers.  I have avoided this level of immediate, on-line interactivity with my readers mostly to avoid the pressure of having to monitor the blog closely.  As the quasi-publisher, I feel a responsibility to make sure that comments are legitimate “fair comment,” and not defamatory rubbish, or worse.

Recently, Professor Deborah Mayo posted a good portion of my post, “The Continuing Saga of Bad-Faith Assertions of Conflicts of Interest,” on her blog.  The post attracted the attention of a critic who described my post with a mixed metaphor:  “meretricious garbage.”  I responded on Mayo’s website, but the exchange made me realize that there are pluses and minuses to opening up a blog to comments.

For my part, I think I will continue Tortini as I have been doing.  If I have given offense, personally, professionally, or intellectually, I invite you to write to me.  Let me know whether you are willing to have me post your comments.  I am certainly open to posting opposing points of view on Tortini.