For your delectation and delight, desultory dicta on the law of delicts.

Ecological Fallacy Goes to Court

June 30th, 2012

In previous posts, I have bemoaned the judiciary’s tin ear for important qualitative differences between and among different research study designs.  The Reference Manual for Scientific Evidence (3d ed. 2011)(RMSE3d) offers inconsistent advice, ranging from Margaret Berger’s counsel to abandon any hierarchy of evidence, to other chapters’ emphasizing the importance of a hierarchy.

The Cook case is one of the more aberrant decisions, which elevated an ecological study, without a statistically significant result, into an acceptable basis for a causal conclusion under Rule 702.  Senior Judge Kane’s decision in the litigation over radioactive contamination from the Colorado Rocky Flats nuclear weapons plant is illustrative of a judicial refusal to engage with the substantive differences among studies, and of a willingness to ignore the inability of some study designs to support causality.  See Cook v. Rockwell Internat’l Corp., 580 F. Supp. 2d 1071, 1097-98 (D. Colo. 2006) (“Defendants assert that ecological studies are inherently unreliable and therefore inadmissible under Rule 702.  Ecological studies, however, are one of several methods of epidemiological study that are well-recognized and accepted in the scientific community.”), rev’d and remanded on other grounds, 618 F.3d 1127 (10th Cir. 2010), cert. denied, ___ U.S. ___ (May 24, 2012).  Senior Judge Kane’s point about the recognition and acceptance of ecological studies has nothing to do with their ability to support conclusions of causality.  This basic non sequitur led the trial judge into ruling that the challenge “goes to the weight, not the admissibility” of the challenged opinion testimony.  This is a bit like using an election day exit poll, with 5% returns, as “reliable” evidence to support a prediction of the winner.  The poll may have been conducted most expertly, but it lacks the ability to predict the winner.

The issue is not whether ecological studies are “scientific”; they are part of the epidemiologists’ toolkit.  The issue is whether they warrant inferences of causation.  Some so-called scientific studies are merely hypothesis generating, preliminary, tentative, or data-dredging exercises.  Judge Kane opined that ecological studies are merely “less probative” than other studies, and the relative weights of studies do not render them inadmissible.  Id.  This is a misunderstanding or an abdication of gatekeeping responsibility.  First, studies themselves are not admissible; it is the expert witness, whose testimony is challenged.  Second, Rule 702 requires that the proffered opinion be “scientific knowledge,” and ecological studies simply lack the necessary epistemic warrant.

The legal sources cited by Senior Judge Kane provide only equivocal and minimal support at best for his decision.  The court pointed to RMSE2d at 344-45, for the proposition that ecological studies are useful for establishing associations, but are weak evidence for causality.  The other legal citations seem equally unhelpful.  In re Hanford Nuclear Reservation Litig., No. CY–91–3015–AAM, 1998 WL 775340 at *106 (E.D.Wash. Aug.21, 1998) (citing RMSE2d and the National Academy of Science Committee on Radiation Dose Reconstruction for Epidemiological Uses, which states that “ecological studies are usually regarded as hypothesis generating at best, and their results must be regarded as questionable until confirmed with cohort or case‑control studies.” National Research Council, Radiation Dose Reconstruction for Epidemiologic Uses at 70 (1995)), rev’d on other grounds, 292 F.3d 1124 (9th Cir. 2002).  Ruff v. Ensign–Bickford Indus., Inc., 168 F.Supp. 2d 1271, 1282 (D. Utah 2001) (reviewing evidence that consisted of a case-control study in addition to an ecological study; “It is well established in the scientific community that ecological studies are correlational studies and generally provide relatively weak evidence for establishing a conclusive cause and effect relationship.”); see also id. at 1274 n.3 (“Ecological studies tend to be less reliable than case–control studies and are given little evidentiary weight with respect to establishing causation.”).



The new edition of RMSE cites the Cook case at several places.  In an introductory chapter, the late Professor Margaret Berger cites the case incorrectly for having excluded expert witness testimony.  See Margaret A. Berger, “The Admissibility of Expert Testimony,” 11, 24 n.62 in RMSE3d (“See Cook v. Rockwell Int’l Corp., 580 F. Supp. 2d 1071 (D. Colo. 2006) (discussing why the court excluded expert’s testimony, even though his epidemiological study did not produce statistically significant results).”)  The chapter on epidemiology cites Cook correctly for having refused to exclude the plaintiffs’ expert witness, Dr. Richard Clapp, who relied upon an ecological study of two cancer outcomes in the area adjacent to the Rocky Flats Nuclear Weapons Plant.  See Michael D. Green, D. Michal Freedman, and Leon Gordis, “Reference Guide on Epidemiology,” 549, 561 n. 34, in Reference Manual for Scientific Evidence (3d ed. 2011).  The authors, however, abstain from any judgmental comments about the Cook case, which is curious given their careful treatment of ecological studies and their limitations:

“4. Ecological studies

Up to now, we have discussed studies in which data on both exposure and health outcome are obtained for each individual included in the study.33 In contrast, studies that collect data only about the group as a whole are called ecological studies.34 In ecological studies, information about individuals is generally not gathered; instead, overall rates of disease or death for different groups are obtained and compared. The objective is to identify some difference between the two groups, such as diet, genetic makeup, or alcohol consumption, that might explain differences in the risk of disease observed in the two groups.35 Such studies may be useful for identifying associations, but they rarely provide definitive causal answers.36”

Id. at 561.  The epidemiology chapter proceeds to note that the lack of information about individual exposure and disease outcome in an ecological study “detracts from the usefulness of the study,” and renders it prone to erroneous inferences about the association between exposure and outcome, “a problem known as an ecological fallacy.”  Id. at 562.  The chapter authors define the ecological fallacy:

“Also, aggregation bias, ecological bias. An error that occurs from inferring that a relationship that exists for groups is also true for individuals.  For example, if a country with a higher proportion of fishermen also has a higher rate of suicides, then inferring that fishermen must be more likely to commit suicide is an ecological fallacy.”

Id. at 623.  Although the ecological study design is weak and generally unsuitable to support causal inferences, the authors note that such studies can be useful in generating hypotheses for future research using studies that gather data about individuals. Id. at 562.  See also David Kaye & David Freedman, “Reference Guide on Statistics,” 211, 266 n.130 (citing the epidemiology chapter “for suggesting that ecological studies of exposure and disease are ‘far from conclusive’ because of the lack of data on confounding variables (a much more general problem) as well as the possible aggregation bias”); Leon Gordis, Epidemiology 205-06 (3d ed. 2004)(ecologic studies can be of value to suggest future research, but “[i]n and of themselves, however, they do not demonstrate conclusively that a causal association exists”).

The views expressed in the Reference Manual for Scientific Evidence, about ecological studies, are hardly unique.  The following quotes show how ecological studies are typically evaluated in epidemiology texts:

“Ecological fallacy

An ecological fallacy or bias results if inappropriate conclusions are drawn on the basis of ecological data. The bias occurs because the association observed between variables at the group level does not necessarily represent the association that exists at the individual level (see Chapter 2).


Such ecological inferences, however limited, can provide a fruitful start for more detailed epidemiological work.”

R. Bonita, R. Beaglehole, and T. Kjellström, Basic Epidemiology 43 (2d ed. WHO 2006).

“A first observation of a presumed relationship between exposure and disease is often done at the group level by correlating one group characteristic with an outcome, i.e. in an attempt to relate differences in morbidity or mortality of population groups to differences in their local environment, living habits or other factors. Such correlational studies that are usually based on existing data are prone to the so-called ‘ecological fallacy’ since the compared populations may also differ in many other uncontrolled factors that are related to the disease. Nevertheless, ecological studies can provide clues to etiological hypotheses and may serve as a gateway towards more detailed investigations.”

Wolfgang Ahrens & Iris Pigeot, eds., Handbook of Epidemiology 17-18 (2005).
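The fishermen example from the Reference Manual can be put into numbers.  What follows is only a toy sketch, with three invented countries and invented suicide rates; the names `countries`, `pearson`, and the figures themselves are assumptions made for illustration, not data from any study.  Within each hypothetical country, fishermen and everyone else have exactly the same suicide rate, and yet the country-level comparison shows a strong positive correlation between the share of fishermen and the national rate.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical countries (all numbers invented for illustration):
# (share of population who are fishermen,
#  suicides per 100,000 among fishermen,
#  suicides per 100,000 among everyone else)
countries = {
    "A": (0.05, 8.0, 8.0),
    "B": (0.15, 12.0, 12.0),
    "C": (0.30, 18.0, 18.0),
}

# Group-level (ecological) data: one pair of numbers per country.
fishing_shares = [share for share, _, _ in countries.values()]
national_rates = [
    share * rate_fishermen + (1 - share) * rate_others
    for share, rate_fishermen, rate_others in countries.values()
]
group_level_r = pearson(fishing_shares, national_rates)

# Individual-level comparison inside each country: fishermen versus others.
individual_rate_ratios = [
    rate_fishermen / rate_others
    for _, rate_fishermen, rate_others in countries.values()
]

print("country-level correlation:", round(group_level_r, 3))
print("within-country rate ratios:", individual_rate_ratios)
```

In this contrived setup, something other than occupation (whatever differs between countries A, B, and C) drives the national rates.  The ecological data alone cannot distinguish that situation from one in which fishing itself raises the risk, which is the inferential gap the texts quoted above describe.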

The Cook case is a wonderful illustration of the judicial mindset that avoids and evades gatekeeping by resorting to the conclusory reasoning that a challenge “goes to the weight, not the admissibility” of an expert witness’s opinion.

Let’s Require Health Claims to Be Evidence Based

June 28th, 2012

Litigation arising from the FDA’s refusal to approve “health claims” for foods and dietary supplements is a fertile area for disputes over the interpretation of statistical evidence.  A ‘‘health claim’’ is ‘‘any claim made on the label or in labeling of a food, including a dietary supplement, that expressly or by implication … characterizes the relationship of any substance to a disease or health-related condition.’’ 21 C.F.R. § 101.14(a)(1); see also 21 U.S.C. § 343(r)(1)(A)-(B).

Unlike the federal courts exercising their gatekeeping responsibility, the FDA has committed to pre-specified principles of interpretation and evaluation. By regulation, the FDA gives notice of standards for evaluating complex evidentiary displays for the ‘‘significant scientific agreement’’ required for approving a food or dietary supplement health claim.  21 C.F.R. § 101.14.  See FDA – Guidance for Industry: Evidence-Based Review System for the Scientific Evaluation of Health Claims – Final (2009).

If the FDA’s refusal to approve a health claim requires pre-specified criteria of evaluation, then we should be asking ourselves why the federal courts have failed to develop a set of criteria for evaluating health effects claims as part of their Rule 702 (“Daubert”) gatekeeping responsibilities.  Why, close to 20 years after the Supreme Court decided Daubert, can lawyers make “health claims” without having to satisfy evidence-based criteria?

Although the FDA’s guidance is not always as precise as might be hoped, it is far better than the suggestion of the new Reference Manual for Scientific Evidence (3d ed. 2011) that there is no hierarchy of evidence.  See RMSE 3d at 564 & n.48 (citing and quoting idiosyncratic symposium paper that “[t]here should be no hierarchy [among different types of scientific methods to determine cancer causation]”); “Late Professor Berger’s Introduction to the Reference Manual on Scientific Evidence” (Oct. 23, 2011).

The FDA’s attempt to articulate an evidence-based hierarchy is noteworthy because the agency must evaluate a wide range of evidence, from in vitro, to animal studies, to observational studies of varying kinds, to clinical trials, to meta-analyses and reviews.  The FDA’s criteria are a good start, and I imagine that they will develop and improve over time.  Although imperfect, the criteria are light years ahead of the situation in federal and state court gatekeeping.  Unlike gatekeeping in civil actions, the FDA criteria are pre-stated and not devised post hoc.  The FDA’s attempt to implement evidence-based principles in the evaluation of health claims is a model that would much improve the Reference Manual for Scientific Evidence.  See Christopher Guzelian & Philip Guzelian, “Prevention of false scientific speech: a new role for an evidence-based approach,” 27 Human & Experimental Toxicol. 733 (2008).

The FDA’s evidence-based criteria need work in some areas.  For instance, the FDA’s Guidance on meta-analysis is not particularly specific or helpful:

“Research Synthesis Studies

Reports that discuss a number of different studies, such as review articles, do not provide sufficient information on the individual studies reviewed for FDA to determine critical elements such as the study population characteristics and the composition of the products used. Similarly, the lack of detailed information on studies summarized in review articles prevents FDA from determining whether the studies are flawed in critical elements such as design, conduct of studies, and data analysis. FDA must be able to review the critical elements of a study to determine whether any scientific conclusions can be drawn from it. Therefore, FDA intends to use review articles and similar publications to identify reports of additional studies that may be useful to the health claim review and as background about the substance/disease relationship. If additional studies are identified, the agency intends to evaluate them individually. Most meta-analyses, because they lack detailed information on the studies summarized, will only be used to identify reports of additional studies that may be useful to the health claim review and as background about the substance-disease relationship.  FDA, however, intends to consider as part of its health claim review process a meta-analysis that reviews all the publicly available studies on the substance/disease relationship. The reviewed studies should be consistent with the critical elements, quality and other factors set out in this guidance and the statistical analyses adequately conducted.”

FDA – Guidance for Industry: Evidence-Based Review System for the Scientific Evaluation of Health Claims – Final at 10 (2009).

The dismissal of review articles as a secondary source is welcome, but meta-analyses are quantitative reviews that can add additional insights and evidence, if methodologically appropriate, by providing a summary estimate of association, sensitivity analyses, meta-regression, etc.  The FDA’s guidance was applied in connection with the agency’s refusal to approve a health claim for vitamin C and lung cancer.  Proponents claimed that a particular meta-analysis supported their health claim, but the FDA disagreed.  The proponents sought injunctive relief in federal district court, which upheld the FDA’s decision on vitamin C and lung cancer.  Alliance for Natural Health US v. Sebelius, 786 F.Supp. 2d 1, 21 (D.D.C. 2011).  The district court found that the FDA’s refusal to approve the health claim was neither arbitrary nor capricious with respect to its evaluation of the cited meta-analysis:

‘‘The FDA discounted the Cho study because it was a ‘meta-analysis’ of studies reflected in a review article. FDA Decision at 2523. As explained in the 2009 Guidance Document, ‘research synthesis studies’, and ‘review articles’, including ‘most meta-analyses’, ‘do not provide sufficient information on the individual studies reviewed’ to determine critical elements of the studies and whether those elements were flawed. 2009 Guidance Document at A.R. 2432. The Guidance Document makes an exception for meta-analyses ‘that review[ ] all the publicly available studies on the substance/disease relationship’. Id. Based on the Court’s review of the Cho article, the FDA’s decision to exclude this article as a meta-analysis was not arbitrary and capricious.’’

Id. at 19.

The FDA’s Guidance was adequate for its task in the vitamin C/lung cancer health claim, but notably absent from the Guidance are any criteria to evaluate competing meta-analyses that do include “all the publicly available studies on the substance/disease relationship.”  The model assumptions of meta-analyses, fixed effect versus random effects, lack of heterogeneity, as well as other considerations will need to be spelled out in advance.  Still not a bad start.  Implementing evidence-based criteria in Rule 702 gatekeeping has the potential to tame the gatekeeper’s discretion.
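To see why the choice between fixed effect and random effects models matters enough to be specified in advance, consider a minimal sketch.  The four study estimates below are invented log relative risks with invented variances; the function names are mine; and the DerSimonian-Laird estimator of between-study variance is only one common choice, not anything the FDA Guidance prescribes.  The same inputs yield different summary estimates and different standard errors under the two models.

```python
import math

def fixed_effect(estimates, variances):
    """Inverse-variance weighted (fixed effect) summary estimate and its SE."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))

def random_effects_dl(estimates, variances):
    """Random effects summary using the DerSimonian-Laird tau^2 estimator."""
    weights = [1.0 / v for v in variances]
    pooled_fe, _ = fixed_effect(estimates, variances)
    # Cochran's Q measures heterogeneity around the fixed effect estimate.
    q = sum(w * (y - pooled_fe) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c)  # between-study variance, floored at zero
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    return pooled, math.sqrt(1.0 / sum(re_weights)), tau2, q

# Hypothetical log relative risks and their variances (invented numbers).
log_rr = [0.10, 0.45, -0.05, 0.60]
var_log_rr = [0.01, 0.04, 0.02, 0.09]

fe_estimate, fe_se = fixed_effect(log_rr, var_log_rr)
re_estimate, re_se, tau2, q = random_effects_dl(log_rr, var_log_rr)

print("fixed effect RR:  ", round(math.exp(fe_estimate), 2), "SE(log):", round(fe_se, 3))
print("random effects RR:", round(math.exp(re_estimate), 2), "SE(log):", round(re_se, 3))
print("Cochran Q:", round(q, 2), "tau^2:", round(tau2, 4))
```

With heterogeneous inputs such as these, the random effects model gives more weight to the smaller studies and reports a wider interval.  Two meta-analyses of the identical studies can thus reach different conclusions about statistical significance, which is why the model assumptions are better settled before the data are in hand.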

Gatekeeping the Lumpers and Splitters – Composite End Points

June 26th, 2012

The battle between lumpers and splitters is fought in many disciplines, and so it is no surprise that it finds its way into litigation.

The battle is often entrenched in the discipline of epidemiology, where practitioners tussle over the definition of the end point of a study or clinical trial. Lumping has the advantage of increasing the number of outcome events, with attendant increases in statistical power.  The down side of lumping is that the “lumped” or composite outcome may no longer be meaningful with respect to the more precise outcome of interest.  In other words, the lumping threatens the external validity of the study.  Splitting preserves external validity with respect to the outcome of interest, but decreases the number of events, with a greater risk of Type II error.

The issue arises in birth defect litigation, such as the claims made against the manufacturer of Bendectin, where the claimants’ expert witnesses frequently tried to increase power by lumping different birth defects together, despite the lack of embryological plausibility.  The issue has come up in cardiovascular end point trials and meta-analyses, involving thrombo-embolic outcomes, such as stroke and heart attack.  The Celebrex litigation, for instance, involved contested issues of what cardiovascular end points to combine to capture the postulated thrombotic causal mechanism.  In re Pfizer Inc. Securities Litig., 2010 WL 1047618 (S.D.N.Y. 2010).

Despite the recurrence of lumping/splitting issues in litigation of epidemiologic evidence, the Reference Manual for Scientific Evidence (3d ed. 2011)  does not treat the subject at all.  Federal and state judges are often at sea (without sextant or compass) in disputes over lumping and splitting, where the methodology selected can often determine the result.  The following is a collection of some observations, comments, and guidances from the biomedical literature on the use of composite end points. 


Composite Endpoints

A.  Definition

Composite end points are typically defined, perhaps circularly, as a single group of health outcomes, which group is made up of constituent or single end points.  Meinert defined a composite outcome as “an event that is considered to have occurred if any of several different events or outcomes is observed.”  C. Meinert, Clinical Trials Dictionary (Johns Hopkins Center for Clinical Trials 1996). Similarly, Montori defined composite end points as “outcomes that capture the number of patients experiencing one or more of several adverse events.”  Montori, et al., “Validity of composite end points in clinical trials,” 330 Brit. Med. J. 594, 596 (2005).  Composite end points are also sometimes referred to as combined or aggregate end points.

Many composite end points are clearly defined for a clinical trial, and the component end points are specified.  In some instances, the composite nature of an outcome may be subtle or be glossed over by the study’s authors.  In the realm of cardiovascular studies, for example, investigators may look at stroke as a single endpoint, without acknowledging that there are important clinical and pathophysiological differences between ischemic strokes and hemorrhagic strokes (intracerebral or subarachnoid).  The Fletchers give the example:

“In a study of cardiovascular disease, for example, the primary outcomes might be the occurrence of either fatal coronary heart disease or non-fatal myocardial infarction.  Composite outcomes are often used when the individual elements share a common cause and treatment.  Because they comprise more outcome events than the component outcomes alone, they are more likely to show a statistical effect.”

R. Fletcher & S. Fletcher, Clinical Epidemiology: The Essentials 109 (4th ed. 2005).

B.  Utility of Composite End Points

1.  Power

Use of composite end points frequently occurs in the context of studying heart attacks as the outcome of interest.  Improvements in medical care have led to decreased frequency in rates of myocardial infarction (MI) and repeat MIs.  In clinical trials, because of the close medical attention received by participants, event rates are even lower than what might be expected from the relevant general patient population.  These low event rates have caused power issues for clinical trialists, who have responded by turning to composite end points to capture more events.  Composite end points permit smaller sample sizes and shorter follow-up times.  Increasing study power, while reducing sample size or observation time, is perhaps the most frequently cited rationale for using composite end points.

Typical statements from the medical literature:

“Clinical trials, particularly in cardiology, often use composite end points to reduce sample size requirements and to capture the overall impact of therapeutic interventions.”

(Ferreira-Gonzalez 2007, p. 1b, Introduction)

“The widespread use of composite end points reflects their elegant simplicity as a solution to the problem of declining event rates.”

(Montori 2005, at 596, Conclusions)

“The primary rationale for considering a composite primary outcome instead of a single event outcome is sample size.”

(Neaton 2005, at 598b)

“Clinical trialists use composite end points, outcomes that capture the number of patients who have one or more of several events, to increase event rates and statistical power.”

(Ferreira-Gonzalez 2007, p. 6a, Box)

“Although dealing with multiple testing is an important factor in the design and analysis of clinical trials, this may not be the only motivation behind the popularity of composite outcome measures.  Instead, issues of statistical efficiency appear to be prominent, with composite outcomes in time-to-event trials leading to higher event rates and thus enabling smaller sample sizes or shorter follow-up (or both).”

(Freemantle 2003, at 2555 b-c)

“Investigators often use composite end points to enhance the statistical efficiency of clinical trials.”

(Montori 2004, at 1094b)
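The arithmetic behind the power rationale can be sketched with a standard normal approximation for comparing two proportions.  Everything in the example is hypothetical: the event rates, the 25 percent relative reduction, the 2,000 patients per arm, and the function name `power_two_proportions`.  The sketch also assumes that the treatment reduces every component of the composite by the same relative amount, which is precisely the assumption on which the validity of a composite depends.

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p_control, p_treated, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Uses the unpooled normal approximation; adequate for illustration only.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    se = sqrt(p_control * (1 - p_control) / n_per_arm
              + p_treated * (1 - p_treated) / n_per_arm)
    shift = abs(p_control - p_treated) / se
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

n_per_arm = 2000            # hypothetical trial size
relative_reduction = 0.25   # hypothetical effect, assumed equal across components

# Single end point: myocardial infarction alone, 2% event rate in controls.
power_mi_alone = power_two_proportions(0.02, 0.02 * (1 - relative_reduction), n_per_arm)

# Composite end point: 8% of controls have at least one component event.
power_composite = power_two_proportions(0.08, 0.08 * (1 - relative_reduction), n_per_arm)

print("power, MI alone: ", round(power_mi_alone, 2))
print("power, composite:", round(power_composite, 2))
```

Under these invented numbers the composite has a much better chance of yielding a statistically significant result in a trial of the same size.  If the treatment affected only one component, the relative effect on the composite would be diluted and the apparent gain in power could shrink or disappear.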

2.  Competing Risks

Another reason that is offered in support of using composite end points is that composites provide a strategy to avoid the problem of competing risks.  (Neaton 2005, at 569a)  Death (any cause) is sometimes added to a distinct clinical morbidity because patients who are taken out of the trial by death are “unavailable” to experience the morbidity outcome.

3.  Multiple Testing

By aggregating several individual end points into a single pre-specified outcome, trialists can avoid corrections for multiple testing.  Trials that seek data on multiple outcomes, or on multiple subgroups, inevitably raise concerns about the appropriate significance level (alpha) for the statistical tests used to determine whether to reject the null hypothesis.  According to some authors, “[c]omposite endpoints alleviate multiplicity concerns.”  Schulz & Grimes, “Multiplicity in randomized trials I:  endpoints and treatments,” 365 Lancet 1591, 1593a (2005).  Schulz and Grimes, who have written extensively about methodological issues, comment further:

“If designated a priori as the primary outcome, the composite obviates the multiple comparisons associated with testing of the separate components.  Moreover, composite outcomes usually lead to high event rates thereby increasing power or reducing sample size requirements.  Not surprisingly, investigators frequently use composite endpoints.”

Id.  Freemantle and Calvert acknowledge that the need to avoid false positive results from multiple testing is an important rationale for composite end points:

“Because the likelihood of observing a statistically significant result by chance alone increases with the number of tests, it is important to restrict the number of tests undertaken and limit the type 1 error to preserve the overall error rate for the trial.”

Freemantle & Calvert, “Composite and surrogate outcomes in randomized controlled trials,” 334 Brit. Med. J. 756, 756a-b (2007).  Freemantle previously had articulated a similar rationale:

“[T]he correct (a priori) identification of a composite end point can increase the statistical precision and thus the efficiency of a trial.”

(Freemantle 2003, at 2558a)
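The multiplicity concern is easy to quantify under a simplifying assumption.  If each of several end points is tested separately at the same alpha, and if the tests are independent (an assumption made here purely for illustration; real end points are usually correlated), the chance of at least one false positive grows quickly with the number of tests.  The function names below are hypothetical helpers, and the Bonferroni adjustment is shown only as one familiar correction that a single pre-specified composite avoids.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests

for n_tests in (1, 3, 5, 10):
    print(
        n_tests, "tests:",
        "familywise error", round(familywise_error_rate(0.05, n_tests), 3),
        "| Bonferroni per-test alpha", round(bonferroni_alpha(0.05, n_tests), 4),
    )
```

Five separate tests at an alpha of 0.05 carry roughly a 23 percent chance of at least one spurious “significant” finding when all the null hypotheses are true.  A single composite tested once keeps the error rate at the nominal 5 percent, at the price of the interpretive problems discussed below.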

4.  Indecision about an Appropriate Single Outcome

The International Conference on Harmonization suggests that the inability to select a single outcome variable may lead to the adoption of a composite outcome:

“If a single primary variable cannot be selected …, another useful strategy is to integrate or combine the multiple measurements into a single or composite variable.”

International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use, “ICH harmonized tripartite guideline:  statistical principles for clinical trials,” 18 Stat. Med. 1905 (1999).

Freemantle gives this rationale some measure of approval:

“Composite outcomes can help in avoiding arbitrary decisions between different candidate outcomes when prespecifying the primary outcome … .”

(Freemantle & Calvert 2007, at 757a)

“[A] composite outcome may help investigators who are having difficulty in deciding which outcome to elect as the primary outcome measure in a trial and deal with the issue of multiplicity in an efficient manner, avoiding the need for arbitrary choices.”

(Freemantle 2003, at 2558a-b)

The “indecision” rationale has also been criticized:

“Inability to reach consensus on a single outcome is generally not a good reason to use a composite end point.”

(Neaton 2005, at 569b)


C.  Validity of Composite End Points

The validity of composite end points depends upon assumptions, which will have to be made at the time of the study design and protocol creation.  After the data are collected and analyzed, the assumptions may or may not be supported.

“The validity of composite end points depends on

  • similarity in patient importance,
  • [similarity in] treatment effect, and
  • number of events across the components.”

(Montori 2005, at 596, Summary Point No. 2)

“Use of composite end points is usually justified by the assumption that the effect on each of the components will be similar and that patients will attach similar importance to each component.”

(Montori 2005, at 594a, paragraph 2)


D.  Role of Mechanism in Justifying Composite End Points

A composite end point will obviously make sense when the individual end points are biologically related, and the investigators reasonably expect that the individual end points would be affected in the same direction, and in the same approximate amount.

“Confidence in a composite end point rests partly on a belief that similar reductions in relative risk apply to all the components.  Investigators should therefore construct composite endpoints in which the biology would lead us to expect similar effects across components.”

(Montori 2005, 595b)


E.  Methodological Issues

The acceptability of composite end points is often a delicate balance between the statistical power and efficiency gained and the reliability concerns raised by using the composite.  As with any statistical or interpretative tool, the key questions revolve around how the tool is used, and for what purpose.  The reliability issues raised by the use of composites are likely to be highly contextual.

For instance, there is an important asymmetry between justifying the use of a composite for measuring efficacy and the use of the same composite for safety outcomes.  A biological improvement in type 2 diabetes might be expected to lead to a reduction in all the macrovascular complications of that disease, but a medication for type 2 diabetes might have a very specific toxicity or drug interaction, which affects only one constituent end point among all macrovascular complications, such as myocardial infarction.  The asymmetry between efficacy and safety outcomes is specifically addressed in a recent publication:

“Varying definitions of composite end points, such as MACE, can lead to substantially different results and conclusions.  Therefore, the term MACE, in particular, should not be used, and when composite study end points are desired, researchers should focus separately on safety and effectiveness outcomes, and construct separate composite end points to match these different clinical goals.”

(Kip 2008, 701, Abstract – Conclusions)(emphasis in original)

The medical literature contains many clear statements that caution consumers of medical studies against misleading claims based upon composite end points.  Several years ago, the British Medical Journal published a paper by Montori, et al., “Users’ guide to detecting misleading claims in clinical research reports,” 329 Brit. Med. J. 1093 (2004).  The authors distill their advice down to six suggestions, one of which deals explicitly with composite end points:

“Guide to avoid being misled by biased presentation and interpretation of data

1.  Read only the Methods and Results sections; bypass the Discussion section

2.  Read the abstract reported in evidence based secondary publications

3.  Beware faulty comparators

4.  Beware composite endpoints

5.  Beware small treatment effects

6.  Beware subgroup analyses”


Id. at 1093a (emphasis added).  The authors elaborate on the problems that arise from the use of composite end points:

“Problems in the interpretation of these trials arise when composite end points include component outcomes to which patients attribute very different importance… .”

(Montori 2004, at 1094b.)

“Problems may also arise when the most important end point occurs infrequently or when the apparent effect on component end points differs.”

(Montori 2004, at 1095a.)

“When the more important outcomes occur infrequently, clinicians should focus on individual outcomes rather than on composite end points.  Under these circumstances, inferences about the end points (which because they occur infrequently will have very wide confidence intervals) will be weak.”

(Montori 2004, at 1095a.)

“When large variations exist between components the composite end point should be abandoned.”

(Montori 2005, at 596, Summary Point No. 3)

“Occasionally, composite end points prove useful and informative for clinical decision making.  Often, they do not.”

(Montori 2005, at 596, Conclusions)

“Composite endpoints frequently lack clinical relevancy.  Thus, composite endpoints address multiplicity and generally yield statistical efficiency at the risk of creating interpretational difficulties.”

(Schulz & Grimes 2005, at 1593a-b)

“The disadvantages of composite outcomes may arise when the constituents do not move in line with each other.”

(Freemantle 2003, at 2558a)

“Composite end points, as currently used in cardiovascular trials, may often be misleading.”

(Ferreira-Gonzalez 2007, p. 6a, Box)

“Trialists should report complete data on individual component end points to facilitate appropriate interpretation; clinicians should view with caution the results of cardiovascular trials that use composite end points to report their results.”

(Ferreira-Gonzalez 2007, p. 7a)


F.  Methodological Issues Concerning Causal Inferences from Composite End Points to Individual End Points

Several authors have criticized pharmaceutical companies for using composite end points to “game” their trials.  Composites allow smaller sample size, but they lend themselves to broader claims for outcomes included within the composite.  The same criticism appears to be valid when applied to attempts to infer that there is risk of an individual endpoint based upon a showing of harm in the composite endpoint.

“If a trial report specifies a composite endpoint, the components of the composite should be in the well-known pathophysiology of the disease.  The researchers should interpret the composite endpoint in aggregate rather than as showing efficacy of the individual components.  However, the components should be specified as secondary outcomes and reported beside the results of the primary analysis.”

(Schulz & Grimes 2005, at 1595a)(emphasis added)

“[A] positive result for a composite outcome applies only to the cluster of events included in the composite and not to the individual components.”

(Freemantle & Calvert 2007, at 757a) [Freemantle and Calvert urge “health warnings” that a composite end point benefit cannot be interpreted to mean an actual benefit in every constituent end point.]

“To avoid the burying of important components of composite primary outcomes for which on their own no effect is discerned, . . . the components of a composite outcome should always be declared as secondary outcomes, and the results described alongside the result for the composite outcome.”

(Freemantle 2003, at 2559a, Point No. 3; 2559b-c, Box)

“Authors and journal editors should ensure that the reporting of composite outcomes is clear and avoids the suggestion that individual components of the composite have been demonstrated to be effective.”

(Freemantle 2003, at 2559b-c, Box Point No. 4)


G.  Regulatory Experience

“Regulatory behavior may have led to the addition of ‘death’ to many composite primary end points used in trials, and it is our experience that the Food and Drug Administration has actively promoted the use of such composite outcome measures in the heart failure trials.”

(Freemantle & Calvert 2007, at 757a)

The FDA addressed composite end points in the context of its recommendations for looking at cardiovascular outcomes in Phase III and Phase IV clinical trials for anti-diabetic therapies.

“In cardiovascular trials, as in all trials, the primary endpoint should be predefined, justified, and accurately captured and analyzed. Powering the study on an individual type of event (e.g., myocardial infarction) is usually not feasible because of low incidence rates. Therefore, many cardiovascular trials use the MACE (Major Adverse Cardiovascular Event) composite endpoint, which contains all-cause mortality (or cardiovascular death), non-fatal myocardial infarction, and stroke. Some cardiovascular trials include other macrovascular events, such as coronary revascularization and lower-extremity amputation. Use of all-cause mortality as part of the MACE endpoint in a trial with excellent follow-up has the advantage of certainty as to whether the event occurred. However, the cause of death should still be determined in a well-designed trial to ensure that there are no imbalances in particular fatal events (e.g., neoplasms or strokes). Use of cardiovascular death as part of the MACE endpoint may be more relevant but, like myocardial infarction and stroke, requires adjudication by an independent and blinded committee with pre-specified case definitions and methodology for ascertaining events (e.g., access to medical records and laboratory data).  If the study is powered on a composite endpoint, there will likely be too few events for the individual components (e.g., acute myocardial infarction) of the composite to provide conclusive evidence of a difference between treatment groups with regard to these individual endpoints. In addition, a difference between treatment groups in the composite endpoint may primarily be driven by one or more of the individual components that comprise the endpoint. As a result, secondary efficacy measures often include analyses of the individual components as initial and total events to determine their contribution to the overall primary efficacy results.”

(FDA Background Introductory Memorandum, for Endocrinologic and Metabolic Drugs Advisory Committee meeting, July 1-2, 2008, at p. 17 – 18.)
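
The FDA's point about underpowered components can be illustrated with a rough numerical sketch.  The counts below are hypothetical, not taken from any actual trial, and the interval is the ordinary large-sample (log-scale) interval for a risk ratio.  The composite and the single component share the same point estimate, but only the composite has enough events to exclude a risk ratio of 1.0.

```python
import math

def risk_ratio_ci(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Risk ratio with a large-sample (log-scale) 95% confidence interval."""
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    se_log_rr = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical counts, 5,000 patients per arm (illustration only).
composite = risk_ratio_ci(200, 5000, 150, 5000)  # every event in the composite
component = risk_ratio_ci(20, 5000, 15, 5000)    # one component, e.g., MI

print("composite: RR=%.2f, 95%% CI %.2f to %.2f" % composite)
print("component: RR=%.2f, 95%% CI %.2f to %.2f" % component)
```

With one tenth the events, the component's interval is roughly three times as wide on the log scale, which is why a "significant" composite says little about any one constituent end point.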


H.  Specific Composite End Points

1.  Myocardial ischemia 

In the Avandia litigation, some investigators chose to look at a composite of “myocardial ischemia.”  Plaintiffs’ counsel, and even some publications, appear to equate a finding of this composite end point with one of myocardial infarction.  For instance, Curt Furberg equated MI with myocardial ischemia in a JAMA publication of his meta-analysis of rosiglitazone trials.  See, e.g., Singh, et al., “Long-term risk of cardiovascular events with rosiglitazone:  a meta-analysis,” 298 JAMA 1189, 1193 (2007)(“Two previous meta-analyses showed that the risk of MI was significantly increased by rosiglitazone. An unpublished meta-analysis (ZM 2005/00181/01) conducted in 2005 involving 14,237 participants from 42 double-blind RCTs determined the incidence of MI in the rosiglitazone group to be 1.99% vs. 1.51% in controls (hazard ratio, 1.31; 95% CI, 1.01-1.70).”)(emphasis added; internal references omitted).  From his endnotes, it is clear that Furberg is referencing GlaxoSmithKline’s own meta-analysis, which used myocardial ischemia, not MI, as an end point.  See Alexander Cobitz, et al., “A retrospective evaluation of congestive heart failure and myocardial ischemia events in 14 237 patients with type 2 diabetes mellitus enrolled in 42 short-term, double-blind, randomized clinical studies with rosiglitazone,” 17 Pharmacoepi. and Drug Safety 769 (2008) (reporting GSK’s meta-analysis of 42 clinical trials for a broad definition of myocardial ischemia).  Furberg’s confusion seems the sort of carelessness that trial judges should be alert to guard against.

Myocardial ischemia may be variously defined, but at the least it may include MI and angina.  Sometimes revascularization is added.  Subjective symptoms as vague as “dyspnea,” or as specific as sub-sternal pain, may be part of the definition.  A definition of myocardial ischemia used in an exploratory, hypothesis-generating analysis, for purposes of “pharmacovigilance,” may have different validity and operational characteristics from a definition used in a study that is trying to determine whether a medication does, in fact, cause any one of the constituent end points within the composite.

2.  MACE

Recently, the use of the MACE composite end point has been subjected to greater scrutiny and criticism.  Kip summarizes his group’s recent analysis:

“In light of the approximate prior 15 years of the term MACE and its wide heterogeneity in definition and research applications, it is unlikely that a consensus definition will either be universally desired or practical for future research.  Therefore, we recommend against the routine use of MACE as a composite end point at large.  However, if a broad heterogeneous composite end point such as MACE is ultimately desired, minimally, it must be clearly defined, and the individual as well as composite end points need to be analyzed, presented, and discussed.”

(Kip 2008, at 706b)

Kip notes that his group’s recommendations are consistent with those of the Academic Research Consortium, which has tried to establish consensus composite end point definitions for stent trials.  See Cutlip, et al., “Clinical end points in coronary stent trials:  a case for standardized definitions,” 115 Circulation 2344 (2007).

3.  Cardiovascular or cardiac death

The use of a composite end point of cardiac death has elicited some strong criticism in the published literature, most notably from Dr. Nissen’s former colleague, Dr. Eric Topol.  See generally, Lauer & Topol, “Clinical trials – Multiple treatments, multiple end points, and multiple lessons,” 289 JAMA 2575 (2003).

“Among fatal end points, only all-cause mortality can be considered objective, unbiased, and clinically relevant.  As previously reviewed in depth, the use of end points such as ‘cardiac death’, ‘vascular death’, and ‘arrhythmic death’ are inherently subject to error due to biased assessment and to the biological complexities of disease, especially among elderly individuals.”

(Lauer & Topol 2003, at 2575b)

“When mortality is considered, only all-cause mortality is a valid end point, while end points such as ‘cardiac death’ and ‘arrhythmic death’ should be actively discouraged.”

(Lauer & Topol 2003, at 2577a)

4.  All-cause death

Although most authors accept “any death” as a potential corrective to competing risks, and as the ultimate, objective outcome, Lauer and Topol do not completely spare from criticism the inclusion of all-cause death in outcome composites:

“A composite end point that includes death as well as nonfatal events is subject to biases related to competing risks.  Obviously, patients who die cannot later experience nonfatal myocardial infarction or be hospitalized.  A treatment that leads to an increased risk of death may therefore appear to reduce the risk of nonfatal events.  Although formal methods have been developed to analyze competing risks in an unbiased manner, the optimal approach to this problem is unclear.”

(Lauer & Topol 2003, at 2576a)
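
The competing-risk bias that Lauer and Topol describe is easy to see in a back-of-the-envelope sketch.  The numbers below are invented for illustration.  The treatment is assumed to triple the risk of death while doing nothing at all to the risk of nonfatal MI among survivors.

```python
def deaths_and_nonfatal_mi(n, death_risk, mi_risk_among_survivors):
    """Expected deaths and nonfatal MIs; only survivors can have a nonfatal MI."""
    deaths = n * death_risk
    survivors = n - deaths
    nonfatal_mi = survivors * mi_risk_among_survivors
    return deaths, nonfatal_mi

# Hypothetical arms of 1,000 patients each; same MI risk among survivors.
control = deaths_and_nonfatal_mi(1000, 0.05, 0.10)
treated = deaths_and_nonfatal_mi(1000, 0.15, 0.10)

print("control: %.0f deaths, %.0f nonfatal MIs" % control)
print("treated: %.0f deaths, %.0f nonfatal MIs" % treated)
```

The treated arm shows fewer nonfatal MIs per randomized patient only because more of its patients died before they could have one.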


J.   Bibliography

Cutlip, et al., “Clinical end points in coronary stent trials:  a case for standardized definitions,” 115 Circulation 2344 (2007)

FDA Background Introductory Memorandum, for Endocrinologic and Metabolic Drugs Advisory Committee meeting (July 1-2, 2008)

Ferreira-Gonzalez, et al., “Problems with the use of composite end point in cardiovascular trials: systematic review of randomized controlled trials.”  334 Brit. Med. J.  (published online 2 April 2007).

R. Fletcher & S. Fletcher, Clinical Epidemiology:  The Essentials (4th ed. 2005).

Freemantle, et al., “Composite outcomes in randomized trials: Greater precision but with greater uncertainty.”  289 J. Am. Med. Ass’n  2554 (2003)

Freemantle & Calvert, “Composite and surrogate outcomes in randomized controlled trials.” 334 Brit. Med. J.  756 (2007)

International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use.  ICH harmonized tripartite guideline:  statistical principles for clinical trials, 18 Stat. Med. 1905 (1999)

Kip, et al., “The problem with composite end points in cardiovascular studies,” 51 J. Am. Coll. Cardiol. 701 (2008)

Lauer & Topol, “Clinical trials – Multiple treatments, multiple end points, and multiple lessons.”  289 J. Am. Med. Ass’n 2575 (2003)

Montori, et al., “Users’ guide to detecting misleading claims in clinical research reports,” 329 Brit. Med. J. 1093 (2004)

Montori, et al., “Validity of composite end points in clinical trials.”  330 Brit. Med. J. 594 (2005).

Neaton, et al., “Key issues in end point selection for heart failure trials:  composite end points,” 11 J. Cardiac Failure 567 (2005)

Schulz & Grimes, “Multiplicity in randomized trials I:  endpoints and treatments,” 365 Lancet 1591 (2005)

Meta-Meta-Analysis – Celebrex Litigation – The Claims – Part 2

June 25th, 2012


As I noted in part one, the tables were turned on imputation, with plaintiffs making the same accusation that G.E. made in the gadolinium litigation:  imputation involves adding “phantom events” or “imaginary events to each arm of ‘zero event’ trials.”  See Plaintiffs’ Reply Mem. of Law in Further Support of Their Motion to Exclude Expert Testimony by Defendants’ Expert Dr. Lee-Jen Wei at 8, 9 (May 5, 2010), in Securities Litig.

The plaintiffs claimed that Wei “created” an artifact of a risk ratio of 1.0 by using imputation in each of the zero-event trials.  The reality, however, is that each of those trials had zero risk difference, and the rates of event in drug and placebo arms were both low and equal to one another.  The plaintiffs’ claim that Wei “diluted” the risk is little more than saying that he failed to inflate the risk by excluding zero-event trials.  But zero-event trials represent a test in which the risk of events in both arms is equal, and relatively low.

The plaintiffs seemed to make their point half-heartedly.  They admitted that “imputation in and of itself is a commonly used methodology,” id. at 10, but they claimed that “adding zero-event trials to a meta-analysis is debated among scientists.”  Id.  A debate over methodology in the realm of meta-analysis procedures hardly makes any one of the debated procedures “not generally accepted,” especially in the context of meta-analysis of uncommon adverse events arising in clinical trials designed for other outcomes.  After all, investigators do not design trials to assess a suspected causal association between a medication and an adverse outcome as their primary outcome.  The debate over the ethics of such a trial would be much greater than any gentle debate over whether to include zero-event trials by using either the risk difference or imputation procedures.

The gravamen of the plaintiffs’ complaint against Wei seems to be that he included too many zero-event trials, “skewing the numbers greatly, and notably cites to no publications in which the dominant portion of the meta-analysis was comprised of studies with no events.”  Id. The plaintiffs further argue that Wei could have minimized the “distortion” created by imputation by using a fractional event, “a smaller number like .000000001 to each trial.”  Id. The plaintiffs notably cited no texts or articles for this strategy.  In any event, if the zero-event trials are small, as they typically are, then they will have large study variances.  Because meta-analyses typically weight each trial by the inverse of its variance, studies with large variances have little weight in the summary estimate of association.  Including small studies with imputation methods will generally not affect the outcome very much, and their contribution may well reflect the reality of lower or non-differential risk from the medication.
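
A minimal sketch of inverse-variance weighting shows how little a small zero-event trial moves the summary estimate.  The trial counts are hypothetical, and the 0.5 added to each cell is the conventional continuity correction, which I assume here only for illustration; the briefs do not say what value Wei imputed.  The sketch also tries the plaintiffs’ suggested .000000001.

```python
import math

def log_rr_and_variance(e_trt, n_trt, e_ctl, n_ctl, k=0.0):
    """Log risk ratio and its large-sample variance, after adding k to each cell."""
    a, b = e_trt + k, n_trt + 2 * k   # corrected events and arm size, treatment
    c, d = e_ctl + k, n_ctl + 2 * k   # corrected events and arm size, control
    log_rr = math.log((a / b) / (c / d))
    variance = 1 / a - 1 / b + 1 / c - 1 / d
    return log_rr, variance

def pooled_rr(trials):
    """Fixed-effect summary risk ratio, weighting each trial by 1/variance."""
    weights = [1 / var for _, var in trials]
    pooled_log = sum(w * lrr for w, (lrr, _) in zip(weights, trials)) / sum(weights)
    return math.exp(pooled_log), weights

# Hypothetical trials (illustration only).
large = log_rr_and_variance(60, 3000, 40, 3000)      # risk ratio 1.5
small = log_rr_and_variance(0, 50, 0, 50, k=0.5)     # zero events, 0.5 imputed
tiny = log_rr_and_variance(0, 50, 0, 50, k=1e-9)     # plaintiffs' suggested value

rr_both, weights = pooled_rr([large, small])
share_small = weights[1] / sum(weights)
weight_tiny = 1 / tiny[1]

print("summary RR with the zero-event trial: %.3f" % rr_both)
print("weight share of the zero-event trial: %.1f%%" % (100 * share_small))
print("weight if .000000001 is imputed: %.1e" % weight_tiny)
```

On these assumed numbers, the zero-event trial carries about one percent of the weight.  Imputing .000000001 instead drives the trial’s variance into the billions, which is simply exclusion by another name.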

Eliminating trials on the grounds that they had zero events has also been criticized for throwing away important data.  See Charles H. Hennekens, David L. DeMets, C. Noel Bairey Merz, Steven L. Borzak, Jeffrey S. Borer,  “Doing More Harm Than Good,” 122 Am. J. Med. 315 (2009) (criticizing Nissen’s meta-analysis of rosiglitazone, in which he excluded zero-event trials, as biased towards overestimating the magnitude of the summary estimate of association); George A. Diamond, L. Bax, S. Kaul, “Uncertain effects of rosiglitazone on the risk for myocardial infarction and cardiovascular death,” 147 Ann. Intern. Med. 578 (2007) (conducting sensitivity analyses on Nissen’s meta-analysis of rosiglitazone to show that Nissen’s findings lost statistical significance when continuity corrections were made for zero-event trials).



The plaintiffs are correct that the risk difference is not the predominant risk measure used in meta-analysis or in clinical trials for that matter.  Researchers prefer risk ratios because they reflect base rates in the ratio.  As one textbook explains:

“the limitation of the [risk difference] statistic is its insensitivity to base rates. For example, a risk that increases from 50% to 52% may be less important than one that increases from 2% to 4%, although in both instances RD = 0.02.”

Julia Littell, Jacqueline Corcoran, and Vijayan Pillai, Systematic Reviews and Meta-Analysis 85 (Oxford 2008).  This feature of the risk difference hardly makes its use unreliable, however.
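
The textbook’s two examples can be worked out in a few lines.  Both have the same risk difference, but very different risk ratios.

```python
# The two scenarios from the quoted passage: (baseline risk, exposed risk).
scenarios = [(0.50, 0.52), (0.02, 0.04)]

results = []
for baseline, exposed in scenarios:
    risk_difference = exposed - baseline
    risk_ratio = exposed / baseline
    results.append((risk_difference, risk_ratio))
    print("%.0f%% to %.0f%%: RD = %.2f, RR = %.2f"
          % (100 * baseline, 100 * exposed, risk_difference, risk_ratio))
```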

Pfizer pointed out that at least one other case addressed the circumstances in which the risk difference would be superior to risk ratios in meta-analyses:

“The risk difference method is often used in meta-analyses where many of the individual studies (which are all being pooled together in one, larger analysis) do not contain any individuals who developed the investigated side effect.FN17  Whereas such studies would have to be excluded from an odds ratio calculation, they can be included in a risk difference calculation. FN18

FN17. This scenario is more likely to occur when studying a particularly rare event, such as suicide.

FN18. Studies where no individuals experienced the effect must be excluded from an odds ratio calculation because their inclusion would necessitate dividing by zero, which, as perplexed middle school math students come to learn, is impossible. The risk difference’s reliance on subtraction, rather than division, enables studies with zero incidences to remain in a meta-analysis. (Hr’g Tr. 310-11, June 20, 2008 (Gibbons.)).”

In re Neurontin Marketing, Sales Practices, and Products Liab. Litig.,  612 F.Supp. 2d 116, 126 (D. Mass. 2009) (MDL 1629).  See Pfizer’s Defendants’ Mem. of Law in Opp. to Plaintiffs’ Motion to Exclude Expert Testimony by Defendants’ Expert Dr. Lee-Jen Wei (Sept. 8, 2009), in Securities Litig. (citing In re Neurontin).
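
The Neurontin court’s footnote can be made concrete with a short sketch on a hypothetical zero-event trial.  The function names are mine, for illustration only.

```python
def odds_ratio(e_trt, n_trt, e_ctl, n_ctl):
    """Odds ratio from a 2 x 2 table, or None when it is undefined."""
    try:
        odds_trt = e_trt / (n_trt - e_trt)
        odds_ctl = e_ctl / (n_ctl - e_ctl)
        return odds_trt / odds_ctl
    except ZeroDivisionError:
        return None  # the calculation requires dividing by zero

def risk_difference(e_trt, n_trt, e_ctl, n_ctl):
    """Risk difference; subtraction works even when no events occurred."""
    return e_trt / n_trt - e_ctl / n_ctl

# Hypothetical trial with no events in either arm of 100 patients.
print(odds_ratio(0, 100, 0, 100))       # None: undefined
print(risk_difference(0, 100, 0, 100))  # 0.0: no difference in risk
```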

Pfizer also pointed out that Wei had employed both the risk ratio and the risk difference in conducting his meta-analyses, and that none of his summary estimates of association were statistically significant.  Id. at 19, 24.


The plaintiffs argued that the use of “exact confidence” intervals was not scientifically reliable and could not have been used by Pfizer at the time period covered by the securities class’s allegations.  See Plaintiffs’ Reply Mem. of Law in Further Support of Their Motion to Exclude Expert Testimony by Defendants’ Expert Dr. Lee-Jen Wei at 15 (May 5, 2010).  Exact intervals, however, are hardly a novelty, and there is often no single way to calculate a confidence interval.  See E. B. Wilson,  “Probable inference, the law of succession, and statistical inference,” 22 J. Am. Stat. Ass’n 209 (1927); C. Clopper, E. S. Pearson, “The use of confidence or fiducial limits illustrated in the case of the binomial,” 26 Biometrika 404 (1934).  Approximation methods are often used, despite their lack of precision, because of their ease in calculation.
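
To see why exact intervals matter for rare events, consider a sketch that compares the Clopper-Pearson exact interval with the familiar normal approximation for a single proportion.  This is not Wei’s procedure, which addressed risk differences across trials; it is only the 1934 building block, computed here by bisection on the binomial distribution with hypothetical counts.

```python
import math

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def _solve_decreasing(g, target):
    """Find p in [0, 1] with g(p) = target, where g decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if g(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided confidence interval for a binomial proportion."""
    lower = 0.0 if x == 0 else _solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if x == n else _solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper

def normal_approximation(x, n, z=1.96):
    """Large-sample (Wald) interval; can misbehave when events are rare."""
    p = x / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Hypothetical arm: 2 events among 100 patients.
exact = clopper_pearson(2, 100)
approx = normal_approximation(2, 100)
print("exact:         %.4f to %.4f" % exact)
print("approximation: %.4f to %.4f" % approx)
```

With so few events, the approximation produces a negative lower bound, an impossible value for a risk.  The exact interval stays within zero and one.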

Plaintiffs further claimed that the combination of risk difference and exact intervals is novel, not reliable, and not in existence during the class period.  Plaintiffs’ Reply Mem at 15.  The plaintiffs’ argument traded on Wei’s having published on the use of exact intervals in conjunction with the risk difference for heart attacks in clinical trials of Avandia.  See L. Tian, T. Cai, M.A. Pfeffer, N. Piankov, P.Y. Cremieux, and L.J. Wei, “Exact and efficient inference procedure for meta-analysis and its application to the analysis of independent 2 x 2 tables with all available data but without artificial continuity correction,” 10 Biostatistics 275 (2009).  Their argument ignored that Wei combined two well-understood statistical techniques, in a transparent way, with empirical testing of the validity of his approach.  Contrary to plaintiffs’ innuendo, Wei did not develop his approach as an expert witness for GlaxoSmithKline; a version of the manuscript describing his approach was posted on line well before he was ever contacted by GSK counsel. (L.J. Wei, personal communication)  Plaintiffs also claimed that Wei’s use of exact intervals for risk difference showed no increased risk of heart attack for Avandia, contrary to a well-known meta-analysis by Dr. Steven Nissen.  See Steven E. Nissen, M.D., and Kathy Wolski, M.P.H., “Effect of Rosiglitazone on the Risk of Myocardial Infarction and Death from Cardiovascular Causes,” 356 New Engl. J. Med. 2457, 2457 (2007).  This claim, however, is a crude distortion of Wei’s paper, which showed that there was a positive risk difference for heart attacks in the same dataset used by Nissen, but the confidence intervals included zero (no risk difference), and thus chance could not be excluded as explaining Nissen’s result.



Pfizer was ultimately successful in defending the Celebrex litigation on the basis of lack of risk associated with 200 mg/day use.  Pfizer also attempted to argue a duration effect on grounds that in one large trial that saw a statistically significant hazard ratio associated with higher doses, the result occurred for the first time among trial participants on medication, at 33 months into the trial.  Judge Breyer rejected this challenge, without explanation.  In re Bextra & Celebrex Marketing Sales Practices & Prod. Liab. Litig., 524 F.Supp. 2d 1166, 1183 (N.D. Cal. 2007).  The reasonable inference, however, is that the meta-analyses showed statistically significant results across trials with less duration of use, for 400 mg and 800 mg/day use.

Clearly duration of use is a potential consideration unless the mechanism of causation is such that a causally related adverse event would occur from the first use or very short-term use of the medication.  See In re Vioxx Prods. Liab. Litig., MDL No. 1657, 414 F. Supp. 2d 574, 579 (E.D. La. 2006) (“A trial court may consider additional factors in assessing the scientific reliability of expert testimony . . . includ[ing] whether the expert’s opinion is based on incomplete or inaccurate dosage or duration data.”).  In the Celebrex litigation, plaintiffs’ counsel appeared to want to have duration effects both ways; they did not want to disenfranchise plaintiffs whose claims turned on short-term use, but at the same time, they criticized Professor Wei for including short-term trials of Celebrex.

One form that the plaintiffs’ criticism of Wei took was his failure to weight the trials included in his meta-analyses by duration.  In the plaintiffs’ words:

“Wei failed to utilize important information regarding the duration of the clinical trials that he analyzed, information that is critical to interpreting and understanding the Celebrex and Bextra safety information that is contained within those clinical trials.3 Because the types of cardiovascular events that are at issue in this case occur relatively rarely and are more likely to be observed after an extended period of exposure, the scientific community is in agreement that they would not be expected to appear in trials of very short duration.”

Plaintiffs’ Mem. of Law in Support of Their Motion to Exclude Expert Testimony by Defendants’ Expert Dr. Lee-Jen Wei at 2 (July 23, 2009), submitted in In re Pfizer, Inc. Securities Litig., Nos. 04 Civ. 9866(LTS)(JLC), 05 md 1688(LTS) (S.D.N.Y.)[hereafter Securities Litig.]  The plaintiffs maintained that Wei’s meta-analyses were “fatally flawed” because he ignored trial duration, such as would be factored in by performing the analyses in terms of patient years.  Id. at 3

Many of the sources cited by plaintiffs do not support their argument. For instance, the plaintiffs cited articles that noted that weighted averages should be used, but virtually all methods, including Wei’s, weight studies by their variance, which takes into account sample size. Id. at 9 n.3, citing Egger, et al. “Meta-analysis: Principles and Procedures,” 315 Brit. Med. J. 1533 (1997) (an arithmetic average from all trials gives misleading results as results from small studies are more subject to the play of chance and should be given less weight. Meta-analyses use weighted results in which larger trials have more influence than smaller ones). See also id. at 22.  True, true, and immaterial.  No one in the Celebrex cases was using an arithmetic average of risk across trials or studies.

Most of the short-term studies were small, and thus contributed little to the overall summary estimate of association.  Some of the plaintiffs’ citations actually supported using “individual patient data” in the form of time-to-event analyses, which was not possible with many of the clinical trials available.  Indeed, the article the plaintiffs cited, by Dahabreh, did not use time-to-event data for rosiglitazone, because such data were not generally available.  Id. at 9 n.3, citing Dahabreh, “Meta-Analysis Of Rare Events: An Update And Sensitivity Analysis Of Cardiovascular Events In Randomized Trials Of Rosiglitazone,” 5 Clinical Trials 116 (2008).

The plaintiffs’ claim was thus a fairly weak challenge to using simple 2 x 2 tables for the included studies in Wei’s meta-analysis. Both sides failed to mention that many published meta-analyses eschew “patient years” in favor of a simple odds ratio for dichotomous count data from each included study.  See, e.g., Steven E. Nissen, M.D., and Kathy Wolski, M.P.H., “Effect of Rosiglitazone on the Risk of Myocardial Infarction and Death from Cardiovascular Causes,” 356 New Engl. J. Med. 2457, 2457 (2007)(using Peto method with count data, for fixed effect model).  Patient years would be a crude tool to modify the fairly common 2 x 2 table.  The analysis for large studies, with a high number of patient years, would still not reveal whether the adverse events occurred early or late in the trials.  Only a time-to-event analysis could provide the missing information about “duration,” and neither side’s expert witnesses appeared to use a time-to-event analysis.
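
For readers who want to see what a count-data analysis looks like, here is a minimal sketch of the standard Peto one-step odds ratio, which works from the four cells of each trial’s 2 x 2 table and nothing else.  The trial counts are hypothetical.  Note that trial duration appears nowhere in the calculation, and that a zero-event trial contributes nothing to either sum, which is how the Peto method comes to drop such trials.

```python
import math

def peto_odds_ratio(tables):
    """Peto one-step pooled odds ratio.

    Each table is (events_trt, n_trt, events_ctl, n_ctl).
    """
    sum_o_minus_e = 0.0
    sum_v = 0.0
    for e_trt, n_trt, e_ctl, n_ctl in tables:
        total = n_trt + n_ctl
        events = e_trt + e_ctl
        expected = n_trt * events / total          # expected events, treatment arm
        variance = (n_trt * n_ctl * events * (total - events)
                    / (total ** 2 * (total - 1)))  # hypergeometric variance
        sum_o_minus_e += e_trt - expected
        sum_v += variance
    return math.exp(sum_o_minus_e / sum_v)

# Hypothetical trials (illustration only).
trial_with_events = (10, 1000, 5, 1000)
zero_event_trial = (0, 200, 0, 200)

or_without = peto_odds_ratio([trial_with_events])
or_with = peto_odds_ratio([trial_with_events, zero_event_trial])
print("Peto OR without zero-event trial: %.3f" % or_without)
print("Peto OR with zero-event trial:    %.3f" % or_with)
```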

Interestingly, plaintiffs’ expert witness, Prof. Madigan, appears to have received the patient-level data from Pfizer’s clinical trials, but still did not conduct a time-to-event analysis.  Plaintiffs’ Mem. of Law in Support of Their Motion to Exclude Expert Testimony by Defendants’ Expert Dr. Lee-Jen Wei at 12 (July 23, 2009), in Securities Litig. (noting that Madigan had examined all SAS data files produced by Pfizer, and that “[t]hese  files contained voluminous information on each subject in the trials, including information about duration of exposure to the drug (or placebo), any adverse events experienced and a wide variety of other information.”).  Of course, even with time-to-event data from the Pfizer clinical trials, Madigan had the problem of whether to limit himself to just the Pfizer trials or use all the data, including non-Pfizer trials.  If he opted for completeness, he would have been forced to include trials for which he did not have underlying data.  In all likelihood, Madigan used patient-years in his analyses because he could not conduct a complete analysis with time-to-event data for all trials.

The plaintiffs’ point appears well taken if the court were to assume that there really was a duration issue, but the plaintiffs’ theories were to the contrary, and Pfizer lost its attempt to limit claims to those events that appeared 33 months (or some other fixed time) after first ingestion.  It is certainly correct that patient-year analyses, in the absence of time-to-event analyses, are generally preferred.  Pfizer had used patient-year information to analyze combined trials in its submission to the FDA’s Advisory Committee.  See Pfizer’s Submission of Advisory Committee Briefing Document at 15 (January 12, 2005).  See also FDA Reviewer Guidance: Conducting a Clinical Safety Review of a New Product Application and Preparing a Report on the Review at 22 (2005); see also id. at 15 (“If there is a substantial difference in exposure across treatment groups, incidence rates should be calculated using person-time exposure in the denominator, rather than number of patients in the denominator.”);  R. H. Friis & T. A. Sellers, Epidemiology for Public Health Practice at 105 (2008) (“To allow for varying periods of observation of the subjects, one uses a modification of the formula for incidence in which the denominator becomes person-time of observation”).
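
The difference between the two denominators can be shown with a brief sketch, using invented numbers.  Two groups have the same number of events per patient, but very different lengths of follow-up.

```python
def incidence(events, patients, years_of_follow_up_each):
    """Return (events per patient, events per 100 person-years)."""
    per_patient = events / patients
    person_years = patients * years_of_follow_up_each
    per_100_person_years = 100 * events / person_years
    return per_patient, per_100_person_years

# Hypothetical groups (illustration only): same counts, different exposure.
short_trial = incidence(30, 1000, 0.25)  # three months of follow-up
long_trial = incidence(30, 1000, 2.0)    # two years of follow-up

print("short trial: %.1f%% of patients, %.1f per 100 person-years"
      % (100 * short_trial[0], short_trial[1]))
print("long trial:  %.1f%% of patients, %.1f per 100 person-years"
      % (100 * long_trial[0], long_trial[1]))
```

The per-patient figure is identical in both groups, while the person-time rate differs eightfold.  Neither figure, however, reveals whether the events came early or late in follow-up.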

Professor Wei chose not to do a “patient-year” analysis because such a methodological commitment would have required him to drop over a dozen Celebrex clinical trials involving thousands of patients, and dozens of heart attack and stroke events of interest.  Madigan’s approach led him to disregard a large amount of data.  Wei could, of course, have stratified the summary estimates for clinical trials of different lengths, and analyzed whether there were differences as a function of trial duration.  Pfizer claimed that Wei conducted a variety of sensitivity analyses, but it is unclear whether he ever used this technique.  Wei should have been allowed in any event to take plaintiffs at their word that thrombotic events from Celebrex occurred shortly after first ingestion.   Pfizer Mem. of Law in Opp. to Plaintiffs’ Motion to Exclude Defendants’ Expert Dr. Lee-Jen Wei at 2 (Sept. 8, 2009), in Secur. Litig.



According to Pfizer, Professor Madigan reached different results from Wei’s largely because he had used different event counts and end points.  The defendants’ challenge to Madigan turned largely upon the unreliable way he went about counting events to include in his meta-analyses.

Data concerning unexpected adverse events in clinical trials are often collected as reports of treating physicians, whose descriptions may be incomplete, inaccurate, or inadequate.  When there is a suggestion that a particular adverse event – say heart attack – occurred more frequently in the medication arm as opposed to the placebo or comparator arms, the usual course of action is to have a panel of clinical experts review all the adverse event reports, and supporting medical charts, to provide diagnoses that can be used in a more complete statistical analysis.  Obviously, the reviewers should be blinded to the patients’ assignment to medication or placebo, and the reviewers should be clinical experts in the clinical specialty of the adverse event.  Cardiologists should be making the call for heart attacks.

In addition to event definition and adjudication, clinical trial interpretation sometimes leads to the use of “composite end points,” which consist of related diagnostic categories, aggregated in some way that makes biological sense.  For instance, if the concern is that a medication causes cardiovascular thrombotic events, a suitable cardiovascular composite end point might include heart attack and ischemic stroke.  Inclusion of hemorrhagic stroke, endocarditis, and valvular disease in the composite, however, would be inappropriate, given the concern over thrombosis.

Professor Madigan is a highly qualified statistician, but, as Pfizer argued, he had no clinical expertise to reassign diagnoses or determine appropriate composite end points.  The essence of the defendants’ challenges revolved around claims of flawed outcome and endpoint ascertainment and definitions.  According to Pfizer’s briefing, the event definition process was unblinded, and conducted by inexpert, partisan reviewers.  Madigan apparently relied upon the work of another plaintiffs’ witness, cardiologist Dr. Lawrence Baruch, as well as that of Dr. Curt Furberg.  Furberg was not a cardiologist; indeed he has never been licensed to practice medicine in the United States, and he had not treated a patient in over 30 years. Pfizer Mem. of Law in Opp. to Plaintiffs’ Motion to Exclude Defendants’ Expert Dr. Lee-Jen Wei at 29 (Sept. 8, 2009), in Secur. Litig.  Furthermore, Furberg was not familiar with current diagnostic criteria for heart attack.  Plaintiffs’ counsel asked Furberg to rework some but not all of Baruch’s classifications, but only for fatal events.  Baruch could not explain why Furberg made these reclassifications.  Furberg acknowledged that he had never used “one-line descriptions to classify events,” which he did in the Celebrex litigation, when he received the assignment from plaintiffs’ counsel on the eve of the Court’s deadline for disclosures.  Id. According to Pfizer, if the plaintiffs’ witnesses had used appropriate end points and event counts, their meta-analyses would not have differed from Professor Wei’s work.  Id.

Pfizer pointed to Madigan’s testimony to claim that he had admitted that, based upon the impropriety of Furberg’s changing end point definitions, and his own changes, made without the assistance of a clinician, he would not submit the earlier version of his meta-analysis for peer review.  Pfizer’s [Proposed] Findings of Fact and Conclusions of Law with Respect to Motion to Exclude Certain Plaintiffs’ Experts’ Opinions Regarding Celebrex and Bextra, and Plaintiffs’ Motion to Exclude Defendants’ Expert Dr. Lee-Jen Wei, Document 175, at 33, 43 (Dec. 4, 2009), submitted in Securities Litig.  The plaintiffs countered that Furberg’s reclassifications did not change Madigan’s reports, at least for certain years. Plaintiffs’ Reply Mem. of Law in Further Support of Their Motion to Exclude Expert Testimony by Defendants’ Expert Dr. Lee-Jen Wei at 18 (May 5, 2010), in Securities Litig.

The trial court denied Pfizer’s challenges to Madigan’s meta-analysis in the securities fraud class action.  The court attributed any weakness in the classification of fatal adverse events by Baruch and Furberg to the limitations of the underlying data created and produced by Pfizer itself.  In re Pfizer Inc. Securities Litig., 2010 WL 1047618, *4 (S.D.N.Y. 2010).



Pfizer also argued that Madigan put together composite outcomes that did not make biological sense in view of the plaintiffs’ causal theories.  For instance, Madigan left out strokes in his composite, although he included both heart attack and stroke in his primary end point for his Vioxx litigation analysis, and he had no reason to distinguish Vioxx and Celebrex in terms of claimed thrombotic effects.  Pfizer’s [Proposed] Findings of Fact and Conclusions of Law with Respect to Motion to Exclude Certain Plaintiffs’ Experts’ Opinions Regarding Celebrex and Bextra, and Plaintiffs’ Motion to Exclude Defendants’ Expert Dr. Lee-Jen Wei, Document 175, submitted in Securities Litig. (Dec. 4, 2009), at 13-14, 18.  According to Pfizer, Madigan’s composite was novel and unvalidated by relevant, clinical opinion.  Id. at 29, 33.

The plaintiffs’ response is obscure.  The plaintiffs seemed to claim that Madigan was justified in excluding strokes because some kinds of stroke, hemorrhagic strokes, are unrelated to thrombosis.  Plaintiffs’ Reply Memorandum of Law in Further Support of Their Motion to Exclude Expert Testimony by Defendants’ Expert Dr. Lee-Jen Wei at 14 (May 5, 2010), in Securities Litig.  This argument is undermined by the facts:  more than 85% of strokes are ischemic in origin, and even some hemorrhagic strokes start as a result of an ischemic event.

In any event, Pfizer’s argument about Madigan’s composite end points did not gain any traction with the trial judge in the securities fraud class action:

“Dr. Madigan’s written submissions and testimony described clearly and justified cogently his statistical methods, selection of endpoints, decisions regarding event classification, sources of data, as well as the conclusions he drew from his analysis. Indeed, Dr. Madigan’s meta-analysis was based largely on data and endpoints developed by Pfizer. All four of the endpoints that Dr. Madigan used in his analysis-Hard CHD, Myocardial Thromboembolic Events, Cardiovascular Thromboembolic Events, and CV Mortality-have been employed by Pfizer in its own research and analysis. The use of Hard CHD in the relevant literature combined with the use of the other three endpoints by Pfizer in its own 2005 meta-analysis will assist the trier of fact in determining Pfizer’s knowledge and understanding of the pre-December 17, 2004, cardiovascular safety profile of Celebrex.”

In re Pfizer Inc. Securities Litig., 2010 WL 1047618, *4 (S.D.N.Y. 2010).