For your delectation and delight, desultory dicta on the law of delicts.

The Shmeta-Analysis in Paoli

July 11th, 2019

In the Paoli Railroad yard litigation, plaintiffs claimed injuries and increased risk of future cancers from environmental exposure to polychlorinated biphenyls (PCBs). This massive litigation showed up before federal district judge Hon. Robert F. Kelly,[1] in the Eastern District of Pennsylvania, who may well have been the first judge to grapple with a litigation attempt to use meta-analysis to show a causal association.

One of the plaintiffs’ expert witnesses was the late William J. Nicholson, who was a professor at Mt. Sinai School of Medicine, and a colleague of Irving Selikoff. Nicholson was trained in physics, and had no professional training in epidemiology. Nonetheless, Nicholson was Selikoff’s go-to colleague for performing epidemiologic studies. After Selikoff withdrew from active testifying for plaintiffs in tort litigation, Nicholson was one of his colleagues who jumped into the fray as a surrogate advocate for Selikoff.[2]

For his opinion that PCBs were causally associated with liver cancer in humans,[3] Nicholson relied upon a report he wrote for the Ontario Ministry of Labor [cited here as “Report”].[4] Nicholson described his report as a “study of the data of all the PCB worker epidemiological studies that had been published,” from which he concluded that there was “substantial evidence for a causal association between excess risk of death from cancer of the liver, biliary tract, and gall bladder and exposure to PCBs.”[5]

The defense challenged the admissibility of Nicholson’s meta-analysis, on several grounds. The trial court decided the challenge based upon the Downing case, which was the law in the Third Circuit, before the Supreme Court decided Daubert.[6] The Downing case allowed some opportunity for consideration of reliability and validity concerns; there is, however, disappointingly little discussion of any actual validity concerns in the courts’ opinions.

The defense challenge to Nicholson’s proffered testimony on liver cancer turned on its characterization of meta-analysis as a “novel” technique, which is generally unreliable, and its claim that Nicholson’s meta-analysis in particular was unreliable. None of the individual studies that contributed data showed any “connection” between PCBs and liver cancer; nor did any individual study conclude that there was a causal association.

Of course, the appropriate response to this situation, with no one study finding a statistically significant association, or concluding that there was a causal association, should have been “so what?” One of the reasons to do a meta-analysis is that no available study was sufficiently large to find a statistically significant association, if one were there. As for drawing conclusions of causal associations, it is not the role or place of an individual study to synthesize all the available evidence into a principled conclusion of causation.

In any event, the trial court concluded that the proffered novel technique lacked sufficient reliability, that the meta-analysis would “overwhelm, confuse, or mislead the jury,” and that the proffered meta-analysis on liver cancer was not sufficiently relevant to the facts of the case (in which no plaintiff had developed, or had died of, liver cancer). The trial court noted that the Report had not been peer-reviewed, and that it had not been accepted or relied upon by the Ontario government for any finding or policy decision. The trial court also expressed its concern that the proffered testimony along the lines of the Report would possibly confuse the jury because it appeared to be “scientific” and because Nicholson appeared to be qualified.

The Appeal

The Court of Appeals for the Third Circuit, in an opinion by Judge Becker, reversed Judge Kelly’s exclusion of the Nicholson Report. The opinion is still sometimes cited, even though Downing is no longer good law in the Circuit or anywhere else.[7] The Court was ultimately not persuaded that the trial court had handled the exclusion of Nicholson’s Report and its meta-analysis correctly, and it remanded the case for a do-over analysis.

Judge Becker described Nicholson’s Report as a “meta-analysis,” which pooled or “combined the results of numerous epidemiologic surveys in order to achieve a larger sample size, adjusted the results for differences in testing techniques, and drew his own scientific conclusions.”[8] Through this method, Nicholson claimed to have shown that “exposure to PCBs can cause liver, gall bladder and biliary tract disorders … even though none of the individual surveys supports such a conclusion when considered in isolation.”[9]


The appellate court gave no weight to the possibility that a meta-analysis would confuse a jury, or that its “scientific nature” or Nicholson’s credentials would lead a jury to give it more weight than it deserved.[10] The Court of Appeals conceded, however, that exclusion would have been appropriate if the methodology used itself was invalid. The appellate opinion further acknowledged that the defense had offered opposition to Nicholson’s Report in which it documented his failure to include data that were inconsistent with his conclusions, and that “Nicholson had produced a scientifically invalid study.”[11]

Judge Becker’s opinion for a panel of the Third Circuit provided no details about the cherry picking. The opinion never analyzed why this charge of cherry-picking and manipulation of the dataset did not invalidate the meta-analytic method generally, or Nicholson’s method as applied. The opinion gave no suggestion that this counter-affidavit was ever answered by the plaintiffs.

Generally, Judge Becker’s opinion dodged engagement with the specific threats to validity in Nicholson’s Report, and took refuge in the indisputable fact that hundreds of meta-analyses were published annually, and that the defense expert witnesses did not question the general reliability of meta-analysis.[12] These facts undermined the defense claim that meta-analysis was novel.[13] The reality, however, was that meta-analysis was in its infancy in bio-medical research.

When it came to the specific meta-analysis at issue, the court did not discuss or analyze a single pertinent detail of the Report. Despite its lack of engagement with the specifics of the Report’s meta-analysis, the court astutely observed that prevalent errors and flaws do not mean that a particular meta-analysis is “necessarily in error.”[14] Of course, without bothering to look, the court would not know whether the proffered meta-analysis was “actually in error.”

The appellate court would have given Nicholson’s Report a “pass” if it was an application of an accepted methodology. The defense’s remedy under this condition would be to cross-examine the opinion in front of a jury. If, on the other hand, Nicholson had altered an accepted methodology to skew its results, then the court’s gatekeeping responsibility under Downing would be invoked.

The appellate court went on to fault the trial court for failing to make sufficiently explicit findings as to whether the questioned meta-analysis was unreliable. From its perspective, the Court of Appeals saw the trial court as resolving the reliability issue upon the greater credibility of defense expert witnesses in branding the disputed meta-analysis as unreliable. Credibility determinations are for the jury, but the court left room for a challenge on reliability itself:[15]

“Assuming that Dr. Nicholson’s meta-analysis is the proper subject of Downing scrutiny, the district court’s decision is wanting, because it did not make explicit enough findings on the reliability of Dr. Nicholson’s meta-analysis to satisfy Downing. We decline to define the exact level at which a district court can exclude a technique as sufficiently unreliable. Reliability indicia vary so much from case to case that any attempt to define such a level would most likely be pointless. Downing itself lays down a flexible rule. What is not flexible under Downing is the requirement that there be a developed record and specific findings on reliability issues. Those are absent here. Thus, even if it may be possible to exclude Dr. Nicholson’s testimony under Downing, as an unreliable, skewed meta-analysis, we cannot make such a determination on the record as it now stands. Not only was there no hearing, in limine or otherwise, at which the bases for the opinions of the contesting experts could be evaluated, but the experts were also not even deposed. All of the expert evidence was based on affidavits.”

Peer Review

Understandably, the defense attacked Nicholson’s Report as not having been peer reviewed. Without any scrutiny of the scientific bona fides of the workers’ compensation agency, the appellate court acquiesced in Nicholson’s self-serving characterization of his Report as having been reviewed by “cooperating researchers” and the Panel of the Ontario Workers’ Compensation agency. Another partisan expert witness characterized Nicholson’s Report as a “balanced assessment,” and this seemed to appease the Third Circuit, which was wary of requiring peer review in the first place.[16]

Relevancy Prong

The defense had argued that Nicholson’s Report was irrelevant because no individual plaintiff claimed liver cancer.[17] The trial court largely accepted this argument, but the appellate court disagreed because of conclusory language in Nicholson’s affidavit, in which he asserted that “proof of an increased risk of liver cancer is probative of an increased risk of other forms of cancer.” The court seemed unfazed by the ipse dixit, asserted without any support. Indeed, Nicholson’s assertion was contradicted by his own Report, in which he reported that there were fewer cancers among PCB-exposed male capacitor manufacturing workers than expected,[18] and that the rate for all cancers for both men and women was lower than expected, with 132 observed and 139.40 expected.[19]

The trial court had also agreed with the defense’s suggestion that Nicholson’s report, and its conclusion of causality between PCB exposure and liver cancer, were irrelevant because the Report “could not be the basis for anyone to say with reasonable degree of scientific certainty that some particular person’s disease, not cancer of the liver, biliary tract or gall bladder, was caused by PCBs.”[20]


It would likely have been lost on Judge Becker and his colleagues, but Nicholson presented SMRs (standardized mortality ratios) throughout his Report, and for the all cancers statistic, he gave an SMR of 95. What Nicholson clearly did in this, and in all other instances, was simply divide the observed number by the expected, and multiply by 100. This crude, simplistic calculation fails to present a standardized mortality ratio, which requires taking into account the age distribution of the exposed and the unexposed groups, and a weighting of the contribution of cases within each age stratum. Nicholson’s presentation of data was nothing short of false and misleading. And in case anyone remembers General Electric v. Joiner, Nicholson’s summary estimate of risk for lung cancer in men was below the expected rate.[21]
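The arithmetic behind the reported figure is easy to check. The Python sketch below reproduces the crude observed/expected ratio from the Report's all-cancers totals (132 observed, 139.40 expected). It then shows, with entirely hypothetical age strata, how an expected count is built up stratum by stratum when age is taken into account by indirect standardization. The function names, strata, person-years, reference rates, and the 18 observed deaths in the second part are illustrative assumptions, not data from the Report.

```python
def crude_ratio(observed, expected):
    """Observed/expected ratio, scaled to 100."""
    return 100.0 * observed / expected


def expected_deaths(person_years, reference_rates):
    """Expected deaths: sum over age strata of person-years x reference rate."""
    return sum(py * rate for py, rate in zip(person_years, reference_rates))


# Totals quoted from the Report for all cancers, men and women combined.
report_ratio = crude_ratio(132, 139.40)

# HYPOTHETICAL age strata, assumed purely for illustration.
person_years = [4000.0, 3000.0, 1000.0]    # e.g. ages 40-49, 50-59, 60-69
reference_rates = [0.001, 0.003, 0.008]    # deaths per person-year in a reference population
hypothetical_expected = expected_deaths(person_years, reference_rates)
hypothetical_smr = crude_ratio(18, hypothetical_expected)  # 18 observed deaths is also assumed

print(round(report_ratio, 1))           # about 94.7, which rounds to the reported 95
print(round(hypothetical_expected, 1))
print(round(hypothetical_smr, 1))
```

The point of the second half is only that the expected count depends on how person-years are distributed across age strata; a different age mix with the same total person-years would give a different denominator.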

Nicholson’s Report was replete with many other methodological sins. He used a composite of three organs (liver, gall bladder, bile duct) without any biological rationale. His analysis combined male and female results, and still his analysis of the composite outcome was based upon only seven cases. Of those seven cases, some of the cases were not confirmed as primary liver cancer, and at least one case was confirmed as not being a primary liver cancer.[22]

Nicholson failed to standardize the analysis for the age distribution of the observed and expected cases, and he failed to present meaningful analysis of random or systematic error. When he did present p-values, he presented one-tailed values, and he made no corrections for his many comparisons from the same set of data.
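To see why one-tailed p-values and uncorrected multiple comparisons matter, consider a minimal Python sketch. It treats an observed count as Poisson with a given expected count, computes the one-tailed p-value, doubles it as one common (and rough) two-sided convention, and compares the result with a Bonferroni-adjusted threshold. The expected count of 3.5 and the figure of ten comparisons are hypothetical; this post does not reproduce the Report's expected counts or tally its comparisons.

```python
import math


def poisson_upper_tail(observed, expected):
    """One-tailed p-value: P(X >= observed) for X ~ Poisson(expected)."""
    below = sum(
        math.exp(-expected) * expected ** k / math.factorial(k)
        for k in range(observed)
    )
    return 1.0 - below


def doubled_two_sided(one_tailed):
    """Rough two-sided p-value: twice the one-tailed value, capped at 1."""
    return min(1.0, 2.0 * one_tailed)


def bonferroni_threshold(alpha, comparisons):
    """Per-comparison significance level under a Bonferroni correction."""
    return alpha / comparisons


# HYPOTHETICAL inputs, assumed for illustration only.
observed, expected = 7, 3.5
n_comparisons = 10

p_one = poisson_upper_tail(observed, expected)
p_two = doubled_two_sided(p_one)
threshold = bonferroni_threshold(0.05, n_comparisons)

print(round(p_one, 4), round(p_two, 4), threshold)
```

Under these assumed inputs, a doubling of the observed over the expected count is not significant at the conventional 0.05 level even one-tailed, and it falls further short once the test is two-sided and the threshold is adjusted for multiple looks at the same data.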

Finally, and most egregiously, Nicholson’s meta-analysis was meta-analysis in name only. What he had done was simply to add “observed” and “expected” events across studies to arrive at totals, and to recalculate a bogus risk ratio, which he fraudulently called a standardized mortality ratio. Adding events across studies is not a valid meta-analysis; indeed, it is a well-known example of how to generate a Simpson’s Paradox, which can change the direction or magnitude of any association.[23]
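A small numerical illustration may help show how summing across studies can mislead. The Python sketch below uses two hypothetical two-arm studies, each with a risk ratio below 1.0. Adding the raw counts yields a pooled risk ratio well above 1.0, while a stratified Mantel-Haenszel estimate stays between the two study results. All counts are invented for illustration, and the two-arm layout is an assumption made for simplicity; the Report itself worked with observed and expected counts.

```python
# Each HYPOTHETICAL study:
# (exposed_events, exposed_total, unexposed_events, unexposed_total)
studies = [
    (10, 100, 120, 1000),
    (300, 1000, 35, 100),
]


def risk_ratio(a, n1, c, n0):
    """Risk in the exposed divided by risk in the unexposed."""
    return (a / n1) / (c / n0)


def naive_pooled_rr(tables):
    """Add the raw counts across studies, then compute a single risk ratio."""
    a = sum(t[0] for t in tables)
    n1 = sum(t[1] for t in tables)
    c = sum(t[2] for t in tables)
    n0 = sum(t[3] for t in tables)
    return risk_ratio(a, n1, c, n0)


def mantel_haenszel_rr(tables):
    """Stratified estimate that keeps each study's comparison intact."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in tables)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in tables)
    return num / den


per_study = [risk_ratio(*t) for t in studies]
naive = naive_pooled_rr(studies)
stratified = mantel_haenszel_rr(studies)

print([round(rr, 3) for rr in per_study])  # both below 1.0
print(round(naive, 3))                     # above 1.0: the direction has flipped
print(round(stratified, 3))                # below 1.0, between the two study estimates
```

The reversal arises because the two hypothetical studies differ sharply both in baseline risk and in the ratio of exposed to unexposed subjects, which is exactly the kind of between-study difference that simple addition ignores.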

Some may be tempted to criticize the defense for having focused its challenge on the “novelty” of Nicholson’s approach in Paoli. The problem of course was the invalidity of Nicholson’s work, but both the trial court’s exclusion of Nicholson, and the Court of Appeals’ reversal and remand of the exclusion decision, illustrate the problem in getting judges, even well-respected judges, to accept their responsibility to engage with questioned scientific evidence.

Even in Paoli, no amount of ketchup could conceal the unsavoriness of Nicholson’s scrapple analysis. When the Paoli case reached the Court of Appeals again in 1994, Nicholson’s analysis was absent.[24] Apparently, the plaintiffs’ counsel had second thoughts about the whole matter. Today, under the revised Rule 702, there can be little doubt that Nicholson’s so-called meta-analysis should have been excluded.

[1]  Not to be confused with the Judge Kelly of the same district, who was unceremoniously disqualified after attending an ex parte conference with plaintiffs’ lawyers and expert witnesses, at the invitation of Dr. Irving Selikoff.

[2]  Pace Philip J. Landrigan & Myron A. Mehlman, “In Memoriam – William J. Nicholson,” 40 Am. J. Indus. Med. 231 (2001). Landrigan and Mehlman assert, without any support, that Nicholson was an epidemiologist. Their own description of his career, his undergraduate work at MIT, his doctorate in physics from the University of Washington, his employment at the Watson Laboratory, before becoming a staff member in Irving Selikoff’s department in 1969, all suggest that Nicholson brought little to no experience in epidemiology to his work on occupational and environmental exposure epidemiology.

[3]  In re Paoli RR Yard Litig., 706 F. Supp. 358, 372-73 (E.D. Pa. 1988).

[4]  William Nicholson, Report to the Workers’ Compensation Board on Occupational Exposure to PCBs and Various Cancers, for the Industrial Disease Standards Panel (ODP); IDSP Report No. 2 (Toronto, Ontario Dec. 1987).

[5]  Id. at 373.

[6]  United States v. Downing, 753 F.2d 1224 (3d Cir. 1985).

[7]  In re Paoli RR Yard PCB Litig., 916 F.2d 829 (3d Cir. 1990), cert. denied sub nom. General Elec. Co. v. Knight, 111 S.Ct. 1584 (1991).

[8]  Id. at 845.

[9]  Id.

[10]  Id. at 841, 848.

[11]  Id. at 845.

[12]  Id. at 847-48.

[13]  See, e.g., Robert Rosenthal, Judgment studies: Design, analysis, and meta-analysis (1987); Richard J. Light & David B. Pillemer, Summing Up: the Science of Reviewing Research (1984); Thomas A. Louis, Harvey V. Fineberg & Frederick Mosteller, “Findings for Public Health from Meta-Analyses,” 6 Ann. Rev. Public Health 1 (1985); Kristan A. L’abbé, Allan S. Detsky & Keith O’Rourke, “Meta-analysis in clinical research,” 107 Ann. Intern. Med. 224 (1987).

[14]  Id. at 857.

[15]  Id. at 858.

[16]  Id. at 858.

[17]  Id. at 845.

[18]  Report, Table 16.

[19]  Report, Table 18.

[20]  In re Paoli, 916 F.2d at 847.

[21]  See General Electric v. Joiner, 522 U.S. 136 (1997); NAS, “How Have Important Rule 702 Holdings Held Up With Time?” (March 20, 2015).

[22]  Report, Table 22.

[23]  James A. Hanley, Gilles Thériault, Ralf Reintjes and Annette de Boer, “Simpson’s Paradox in Meta-Analysis,” 11 Epidemiology 613 (2000); H. James Norton & George Divine, “Simpson’s paradox and how to avoid it,” Significance 40 (Aug. 2015); George Udny Yule, Notes on the theory of association of attributes in Statistics, 2 Biometrika 121 (1903).

[24]  In re Paoli RR Yard Litig., 35 F.3d 717 (3d Cir. 1994).

When Is Risk Really Risk?

February 14th, 2012

The term “risk” has a fairly precise meaning in scientific parlance.  The following is a typical definition:

RISK The probability that an event will occur, e.g., that an individual will become ill or die within a stated period of time or by a certain age. Also, a nontechnical term encompassing a variety of measures of the probability of a (generally) unfavorable outcome. See also probability.

Miquel Porta, ed., A Dictionary of Epidemiology 212-18 (5th ed. 2008)(sponsored by the Internat’l Epidemiological Ass’n).

In other words, a risk is an ex ante cause.  The probability is not a qualification about whether there is a causal relationship, but rather whether any person at risk will develop the outcome of interest.  Such is the nature of stochastic risks.

Regulatory agencies often use the term “risk” metaphorically, as a fiction to justify precautionary regulations.  Although there may be nothing wrong with such precautionary initiatives, regulators often imply a real threat of harm from what can only be a hypothetical harm.  Why?  If for no other reason, regulators operate with a “wish bias” in favor of the reality of the risk they wish to avert if risk it should be.  We can certainly imagine the cognitive slippage that results from the need to motivate the regulated actors to comply with regulations, and at times, to prosecute the noncompliant.

Plaintiffs’ counsel in personal injury and class action litigation have none of the regulators’ socially useful motives for engaging in distortions of the meaning of the word “risk.”  In the context of civil litigation, plaintiffs’ counsel use the term “risk,” borrowed from the Humpty-Dumpty playbook:

“When I use a word,” Humpty Dumpty said, in rather a scornful tone, “it means just what I choose it to mean—neither more nor less.”
“The question is,” said Alice, “whether you can make words mean so many different things.”
“The question is,” said Humpty Dumpty, “which is to be master — that’s all.”

Lewis Carroll, Through the Looking-Glass 72 (Raleigh 1872).

Undeniably, the word mangling and distortion have had some success with weak-minded judges, but Humpty-Dumpty linguistics had a fall recently in the Third Circuit.  Others have written about it, but I am only just getting around to reading the analytically precise and insightful decision in Gates v. Rohm and Haas Co., 655 F.3d 255 (3d Cir. 2011).  See Sean Wajert, “Court of Appeals Rejects Medical Monitoring Class Action” (Aug. 31, 2011); Carl A. Solano, “Appellate Court Consensus on Medical Monitoring Class Actions Solidifies” (Sept. 12, 2011).

Gates was an attempted class action, in which the district court denied plaintiffs’ motion for certification of a medical monitoring and property damage class.  265 F.R.D. 208 (E.D.Pa. 2010)(Pratter, J.).  Plaintiffs contended that they were exposed to varying amounts of vinyl chloride in air, and perhaps in water at levels too low to detect. Gates, 655 F.3d at 258-59.  The class’s request for medical monitoring foundered because plaintiffs were unable to prove that they were all exposed to a level of vinyl chloride that created a significant risk of serious latent disease for all class members. Id. at 267-68.

With no scientific evidence in hand, the plaintiffs tried to maintain that they were “at risk” on the basis of EPA regulations, which set a very low, precautionary threshold, but the district and circuit courts rebuffed this use of regulatory “risk” language:

The court identified two problems with the proposed evidence. First, it rejected the plaintiffs’ proposed threshold—exposure above 0.07 µg/m3, developed as a regulatory threshold by the EPA for mixed populations of adults and children—as a proper standard for determining liability under tort law. Second, the court correctly noted, even if the 0.07 µg/m3 standard were a correct measurement of the aggregate threshold, it would not be the threshold for each class member who may be more or less susceptible to diseases from exposure to vinyl chloride. Although the positions of regulatory policymakers are relevant, their risk assessments are not necessarily conclusive in determining what risk exposure presents to specified individuals. See Federal Judicial Center, Reference Manual on Scientific Evidence 413 (2d ed.2000) (“While risk assessment information about a chemical can be somewhat useful in a toxic tort case, at least in terms of setting reasonable boundaries as to the likelihood of causation, the impetus for the development of risk assessment has been the regulatory process, which has different goals.”); id. at 423 (“Particularly problematic are generalizations made in personal injury litigation from regulatory positions…. [I]f regulatory standards are discussed in toxic tort cases to provide a reference point for assessing exposure levels, it must be recognized that there is a great deal of variability in the extent of evidence required to support different regulations.”).

Thus, plaintiffs could not carry their burden of proof for a class of specific persons simply by citing regulatory standards for the population as a whole. Cf. Wright v. Willamette Indus., Inc., 91 F.3d 1105, 1107 (8th Cir.1996) (“Whatever may be the considerations that ought to guide a legislature in its determination of what the general good requires, courts and juries, in deciding cases, traditionally make more particularized inquiries into matters of cause and effect.”).

Plaintiffs have failed to propose a method of proving the proper point where exposure to vinyl chloride presents a significant risk of developing a serious latent disease for each class member.

Plaintiffs propose a single concentration without accounting for the age of the class member being exposed, the length of exposure, other individual factors such as medical history, or showing the exposure was so toxic that such individual factors are irrelevant. The court did not abuse its discretion in concluding individual issues on this point make trial as a class unfeasible, defeating cohesion.

Id. at 268.  For class actions, the inability to invoke a low threshold of “permissible” exposure may be the death knell of medical monitoring and personal injury class actions.  The implications of the Gates court’s treatment of “regulatory risk” are, however, more far-reaching.  Sometimes risk is not really risk at all.  The ambiguity of the risk in risk assessment has confused judges from the lowest magistrate up to Supreme Court justices.  It is time to disambiguate.  See General Electric v. Joiner, 522 U.S. 136, 153-54 (1997) (Stevens, J., dissenting in part) (erroneously assuming that plaintiffs’ expert witness was justified in relying upon a weight-of-evidence methodology because such methodology is often used in risk assessment).

Two Articles of Interest in JAMA – Nocebo Effects; Medical Screening

February 12th, 2012

Two articles in this week’s Journal of the American Medical Association (JAMA) are of interest to lawyers who litigate, or counsel about, health effects.

One article deals with the nocebo effect, which is the dark side of the placebo effect.  Placebos can induce beneficial outcomes because of the expectation of useful therapy; nocebos can induce harmful outcomes because of the expectation of injury. The viewpoint article in JAMA points out that nocebo effects, like placebo effects, result from the “psychosocial context or therapeutic environment” affecting a patient’s perception of his state of health or illness.  Luana Colloca, MD, PhD, and Damien Finniss, MSc Med., “Nocebo Effects, Patient-Clinician Communication, and Therapeutic Outcomes,” 307 J. Am. Med. Ass’n 567, 567 (2012).

The authors discuss how clinicians can inadvertently prejudice health outcomes by how they frame outcome information to patients.  Importantly, Colloca and Finniss also note that the negative expectations created by the nocebo communication can take place in the process of obtaining informed consent.

The litigation significance is substantial because the creation of negative expectations is not the exclusive domain of clinicians.  Plaintiffs’ counsel, support and advocacy groups, and expert witnesses, even when well meaning, can similarly create negative expectations for health outcomes.  These actors often enjoy undeserved authority among their audience of litigants or claimants.  The extremely high rate of psychogenic illness found in many litigations is the result.  The harmful communications, however, are not limited to plaintiffs’ lawyers and their auxiliaries.  As Colloca and Finniss point out, nocebo effects can be induced by well-meaning warnings and disclosure of information from healthcare providers to patients.  Id. at 567.  The potential to induce harms in this way has an obvious consequence for the tort system:  more warnings are not always beneficial.  Indeed, warnings themselves can bring about harm.  This realization should temper courts’ enthusiasms for the view that more warnings are always better.  Warnings about adverse health outcomes should be based upon good scientific bases.


The other article from this week’s issue of JAMA addresses the harms of screening.  Steven H. Woolf, MD, MPH, and Russell Harris, MD, MPH, “The Harms of Screening: New Attention to an Old Concern,” 307 J. Am. Med. Ass’n 565 (2012).  As I pointed out on these pages, screening for medical illnesses carries significant health risks to patients and ethical risks for the healthcare providers.  See “Ethics and Daubert: The Scylla and Charybdis of Medical Monitoring” (Feb. 1, 2012).  Bayes’ Theorem teaches us that even very high likelihood ratios for screening tests will yield true positive cases swamped by false positive cases when the baseline prevalence is low.  See Jonathan Deeks and Douglas Altman, “Diagnostic tests 4: likelihood ratios,” 329 Brit. Med. J. 168 (2004) (providing a useful nomogram to illustrate how even highly accurate tests, with high likelihood ratios, will produce more false than true positive cases when the baseline prevalence of disease is low).
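The Bayesian point can be made concrete with a few lines of Python. The sketch below computes the positive predictive value of a screening test from its sensitivity, specificity, and the baseline prevalence, using Bayes' Theorem. The particular inputs (a test that is 99 percent sensitive and 99 percent specific, applied at prevalences of 1 in 1,000 and 1 in 5) are hypothetical and chosen only to illustrate the effect; they are not taken from either article.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = P(test positive | disease) / P(test positive | no disease)."""
    return sensitivity / (1.0 - specificity)


def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | test positive), by Bayes' Theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# HYPOTHETICAL test characteristics, assumed for illustration.
sens, spec = 0.99, 0.99
lr_plus = positive_likelihood_ratio(sens, spec)

ppv_low = positive_predictive_value(sens, spec, prevalence=0.001)   # rare condition
ppv_high = positive_predictive_value(sens, spec, prevalence=0.20)   # common condition

print(round(lr_plus, 1))    # a very high likelihood ratio
print(round(ppv_low, 3))    # roughly 0.09: false positives outnumber true positives
print(round(ppv_high, 3))   # well above 0.9 when the condition is common
```

With the same assumed test, the share of positive results that are true positives moves from under one in ten to more than nine in ten as prevalence rises, which is why the prevalence in the screened population matters as much as the accuracy of the test.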

The viewpoint piece by Woolf and Harris emphasizes the potential iatrogenic harms from screening:

  • physical injury from the test itself (as in colonic perforations from colonoscopy);
  • cascade of further testing, with further risk of harm, both physical and emotional;
  • anxiety and emotional distress over abnormal results;
  • overdiagnosis; and
  • the overtreatment of conditions that are not substantial threats to patients’ health.

These issues should have an appropriately chilling effect on judicial enthusiasm for medical monitoring and surveillance claims.  Great care is required to fashion a screening plan for patients or claimants.  Of course, there are legal risks as well, as when plaintiffs’ counsel fail to obtain the necessary prescriptions or permits to conduct radiological screenings.  See Schachtman, “State Regulators Impose Sanction for Unlawful Silicosis Screenings,” 17(13) Wash. Leg. Fdtn. Legal Op. Ltr. (May 25, 2007).  Caveat litigator.

Ethics and Daubert: The Scylla and Charybdis of Medical Monitoring

February 1st, 2012

Build a courtroom and they will come. The floodgates argument, all too quickly rejected by the judiciary, proved all too true in West Virginia. West Virginia built a courtroom that would entertain multiple claims from virtually every West Virginian. This jurisprudential hospitality offers medical monitoring that requires no predicate present injury. Bower v. Westinghouse Electric Corp., 522 S.E.2d 424 (W.Va. 1999).

Everyone is exposed to hazardous substances and to medications with potential side effects. In West Virginia, almost everyone is a potential plaintiff in a medical monitoring case.

Universal health care may be attainable, after all, funded by the manufacturers of predominately beneficial products. Almost heaven, West Virginia, indeed.

Type 2 diabetes mellitus, or adult-onset diabetes, is a devastating disease that results from uncontrolled blood sugars. The medical complications of diabetes are extensive and well known: blindness, gangrene, kidney failure, heart attack, stroke, liver disease, and others. The costs of this medical care are staggering, and diabetics are among the neediest patients in our health care system. Imagine if the “compensation goals” of the tort system could be subverted to provide medical monitoring to diabetic patients. If possible anywhere, it would seem West Virginia would be the most likely candidate.

Between March 1997 and March 2000, many Type 2 diabetics achieved control of their blood sugars with the help of a new oral medication, Troglitazone (Rezulin®).  Troglitazone modifies the Type 2 diabetic patient’s resistance to insulin. The drug effectively reduces blood sugar, and it avoids the need for exogenous insulin. Most life-saving drugs have side effects, and Troglitazone is no exception. Physicians, knowledgeable about Troglitazone’s efficacy and its potential for rare, idiosyncratic liver toxicity, prescribed the drug to help their patients gain control over their blood sugar levels and to avoid the serious complications of diabetes. In March 2000, the manufacturer of Troglitazone voluntarily withdrew the drug from the market. Adverse publicity over liver toxicity and the availability of two other more recent glitazones, which initially had the appearance of a safer adverse event profile, had shifted the risk-benefit balance against Troglitazone.

No one can be surprised that Rezulin plaintiffs sought class certification in West Virginia state court; nor can anyone, in view of Bower, be surprised that asymptomatic plaintiffs sought medical monitoring as a remedy, within the context of the class action. Observers unfamiliar with the weakness of the Rezulin plaintiffs’ scientific proofs might, however, be surprised at the plaintiffs’ failure, initially at the trial court level, to win class certification in West Virginia, for a medical monitoring class. In re West Virginia Rezulin Litigation, W.Va. Cir.Ct., Civil Action No. 00-C-1180H, Amended Order Denying Class Certification (Dec. 12, 2001) (Hutchison, J.), 2001 WL 1818442 (Dec 13, 2001).

The West Virginia trial court’s rejection of the proposed Rezulin medical monitoring class is remarkable for many reasons. Some commentators regard West Virginia law as the outer limits of medical monitoring jurisprudence.  In the Rezulin case, however, Judge John Hutchison delivered a thorough, analytical opinion, which demonstrated that the liberal West Virginia criteria for a medical monitoring remedy cannot be satisfied as easily as once thought. Among the notable holdings were the trial court’s insistence that:

(1) the monitoring proponents adduce epidemiologic evidence that the exposure at issue can actually cause the latent injury for which monitoring is sought;

(2) the proponents of monitoring identify highly sensitive tests, which when deployed on the exposed population that has a relatively high prevalence of the latent injury, will have a high predictive value; and

(3) the proposed monitoring will allow for early preventive care.

In determining whether the class plaintiffs had met the criteria for medical monitoring, Judge Hutchison did not face any significant evidentiary gatekeeping responsibility. The trial court did not have to ponder the contours of any reliable epidemiologic studies. The court found no epidemiologic studies to show that Rezulin can cause latent injury months or years after the drug is discontinued.

Similarly, the court did not have to delve into any evidentiary thicket of contradictory scientific proof to determine whether the proposed medical monitoring program was based upon reliable scientific and medical methods. The court found that most of the proposed tests had low sensitivity, and that there were no diagnostic tests that can determine whether any liver injury was caused by Rezulin. Given the many other causes of liver diseases among the plaintiff class members, there was no evidence of any prevalence of latent injury from Rezulin. Without an assessment of prevalence of latent injury, any proposed test would have little or no positive predictive value. The proposed program failed for lack of substantial evidentiary support.

The court was further impressed by the riskiness of the proposed monitoring program. The proposed tests, lacking sensitivity and specificity, were likely to result in many “false positives,” which in turn would lead to liver biopsies. (Indeed, false positives would likely swamp any true positives, if any there should be.) Liver biopsies, however, are painful and invasive, and carry a small, but definite, risk of death. Furthermore, the court found that the proposed tests would not facilitate medical interventions that could prevent or resolve the detected problem.
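The arithmetic behind the court's concern can be sketched with Bayes' theorem. The short Python example below is only an illustration: the function names are hypothetical, and the prevalence, sensitivity, and specificity figures are assumed for the sake of the example, not taken from the Rezulin record.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a person who tests positive actually has the condition.

    Bayes' theorem: PPV = (prev * sens) / (prev * sens + (1 - prev) * (1 - spec)).
    """
    true_pos_rate = prevalence * sensitivity
    false_pos_rate = (1.0 - prevalence) * (1.0 - specificity)
    if true_pos_rate + false_pos_rate == 0:
        raise ValueError("no positive results are possible with these inputs")
    return true_pos_rate / (true_pos_rate + false_pos_rate)


def expected_positives(population, prevalence, sensitivity, specificity):
    """Expected (true positives, false positives) when screening a population."""
    diseased = population * prevalence
    healthy = population - diseased
    return diseased * sensitivity, healthy * (1.0 - specificity)


# Assumed, purely illustrative inputs: a rare latent injury (1 in 1,000)
# and a test that is 90% sensitive and 90% specific.
ppv = positive_predictive_value(0.001, 0.90, 0.90)
true_pos, false_pos = expected_positives(100_000, 0.001, 0.90, 0.90)
print(f"PPV: {ppv:.4f}")
print(f"true positives: {true_pos:.0f}, false positives: {false_pos:.0f}")
```

On these assumed figures, screening 100,000 people would yield about 90 true positives against roughly 9,990 false positives, so fewer than one positive result in a hundred would reflect real disease. When prevalence is unknown or near zero, as the court found for latent Rezulin injury, the calculation cannot show that a positive test means anything at all.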

This failure to obtain class certification for medical monitoring is noteworthy for more than the narrow case holdings. There is intriguing obiter dictum. The court noted that one of the plaintiffs’ expert witnesses admitted that the proposed monitoring program was an “experiment.” The court found this admission directly relevant to the plaintiffs’ failure to produce epidemiologic evidence that the substance at issue could actually cause latent injury. Apparently, the plaintiffs’ witness was advocating implementation of the monitoring program so it might yield the evidence that the class must proffer before it could obtain the monitoring remedy. The court readily dismissed this Alice in Wonderland insistence upon “[s]entence first—verdict afterwards.” The court showed little patience for the “stuff and nonsense” of trying to satisfy the criterion of epidemiologic evidence with the anticipated results that would come from the proposed monitoring program itself.

Implicit in the trial court’s rejection of evidentiary bootstrapping is a larger, ethical concern. There is something unsettling about a court-ordered medical monitoring program that is an “experiment.” Class certification decisions are complicated enough without having to endorse experimentation on human beings. Perhaps the suggestion of human experimentation chilled any residual enthusiasm for the notion that medical monitoring might otherwise be a suitable judicial remedy for achieving corrective justice in a mass tort case.

And yet there is an “experimental” aspect to many, if not most, proposed monitoring programs. Little or no clinical experience is available to support the claimed benefits of many proposed large, lifelong monitoring regimes. Indeed, such programs are not wholly benign. The potential harms of monitoring, some of which were acknowledged in Judge Hutchison’s opinion, are significant.

The imposition of potentially harmful monitoring should, indeed, trouble our courts and make them reluctant to embrace monitoring as a remedy. Courts need to confront the ethical implications that flow from the experimental nature of many medical monitoring proposals.

Proposals for monitoring differ from expert witness opinion that is typically offered in personal injury cases involving present injuries. Physician witnesses, at the request of the parties, usually examine claimants, evaluate and diagnose their conditions, and opine about prognosis and etiology. Although such witnesses use their medical experience, training, and knowledge, they generally are not acting within the context of a patient-physician relationship. Adams v. Harron, 191 F.3d 447, 1999 WL 710326 (4th Cir. 1999). In the usual personal injury case, physician witnesses are not advocating medical interventions; at most, they are endorsing or criticizing the reasonable medical necessity of medical plans of treating physicians.  In medical monitoring class actions, however, physician expert witnesses advocate medical interventions for people they have often never met and have never evaluated.

Recommendations for preventive health measures carry risks of harm, and these risks must provoke ethical scrutiny of the proposed monitoring. The offering of an opinion that a plaintiff, or a class of plaintiffs, should receive medical monitoring is the practice of medicine. As part of medical practice, the presentation of such opinions is subject to ethical constraints, which courts should observe and foster. Medico-legal opinions that recommend preventive interventions represent a significant involvement in the claimant’s actual medical care. Screening or monitoring recommendations must acknowledge the highly individualized risks of harm, and must respect the essential need for informed consent to protect individual autonomy.

Physicians who prepare medical monitoring litigation plans cannot absolve themselves of ethical and professional responsibility by disclaiming the existence of physician-patient relationships. Such physicians are not practicing mere courthouse medicine; they are engaged in medical practice, as defined by the American Medical Association, AMA Policy H-265.993, and in the sense that they are seeking to control future medical interventions for the class members.  Physicians who propose medical monitoring or screening for claimants thus operate under the ethical constraints of avoiding harm, providing benefits, and respecting individual patient autonomy. The medical community recognizes that good intentions notwithstanding, monitoring can be harmful. “[P]reventive therapies can give rise to anticipatory anxiety, side effects, the stress of false-positive results and an unhealthy preoccupation with disease.” Huston, “The Perils of Prevention,” 154 Canadian Med. Ass’n J. 1463 (1996). Other potential adverse effects of monitoring include deriving false assurances of health and being labeled as “sick.” Marshall, “Prevention. How Much Harm? How Much Benefit? 3. Physical, Psychological and Social Harm,” 155 Canadian Med. Ass’n J. 169 (1996).

Furthermore, some screening programs will detect true-positive results with little or no clinical significance. For example, in cancer screening, some nodules detected will be benign. Other nodules may be extremely indolent malignancies, which would never become aggressive, metastatic growths. Indeed, such masses, picked up in screening, might regress before they would have been otherwise detectable. Screening programs must come to grips with the vagaries of the diseases and conditions that are the subject of the monitoring. The potential for harm from monitoring may be increased by the litigation setting, in which people are encouraged to become invested in illness-seeking behaviors.

Given the potential for harm, physician witnesses who advocate monitoring face ethical and evidentiary burdens to establish the efficacy and benefit of the planned screening. At a minimum, class members will have to be properly advised, and will have to give informed consent. The process of obtaining consent must accommodate the intensely personal and individualized judgments about the risks of monitoring.

Well-established criteria for evaluating public health interventions are available and employed by such agencies and groups as the United States Preventive Services Task Force, the Canadian Task Force on the Periodic Health Examination, the Cochrane Collaboration, and others. The existence of generally accepted evaluative criteria has obvious implications for determining the admissibility of monitoring proposals under either Daubert or Frye standards. Expert witnesses, in this ethically sensitive area, must be held to the same intellectual rigor that would be employed to evaluate monitoring or screening programs in the field of public health. Pitfalls, fallacies, and methodological errors are abundant in the field of preventive medicine. Marshall, “Prevention. How Much Harm? How Much Benefit? 2. Ten Potential Pitfalls in Determining the Clinical Significance of Benefits,” 154 Canadian Med. Ass’n J. 1837 (1996). Even well-intentioned advice, such as counseling routine mammography in women, has been the subject of heated controversy and intense methodological debate. Ernster, “Mammograms and Personal Choice,” The New York Times (Feb. 14, 2002). Courts must acknowledge that if a proposed preventive program does not satisfy generally accepted criteria for medical interventions and does not have proven benefits that clearly outweigh the potential harms, medical monitoring becomes a court-sanctioned human experiment.

The guiding principles and corollaries for human experimental research can be found in several sources, including The Nuremberg Code, Permissible Medical Experiments; the World Medical Association, “Declaration of Helsinki: Ethical Principles for Medical Research Involving Human Subjects,” 284 J. Am. Med. Ass’n 3043 (Dec. 20, 2000), as restated on several occasions; regulations of the Food and Drug Administration, Protection of Human Subjects, 21 C.F.R. § 50.25; and regulations of the Department of Health and Human Services, 45 C.F.R. § 46.

Informed consent is the absolute requirement for any human medical experimentation. Regulations and guidelines of various federal and state agencies and medical organizations, however, place further limitations on the course of permissible experimental design.  The Declaration of Helsinki, for instance, requires that the research design be clearly set out in an experimental protocol, which has been approved by an independent ethical review committee. The proposed medical research

“must conform to generally accepted scientific principles, [and] be based on a thorough knowledge of the scientific literature….”

Declaration of Helsinki, ¶11 (2000). Permissible Medical Experiments, supra. Daubert and Frye thus become ethical imperatives, as well as legal requirements, before any serious consideration can be given to a medical monitoring program.

In all likelihood, no court, if it really thought about the matter, would want to serve as an Institutional Review Board, and to sit in judgment of an experimental protocol. The realization that the proposed remedy is itself an experiment should suffice to quash any advocacy for the result. Indeed, an awareness of the ethical problems entailed by poorly supported medical monitoring programs must guide and propel courts to be vigilant in their gatekeeping responsibilities.  Much of the earlier case law on monitoring developed before the principles and implications of Daubert could be realized in monitoring cases, and these older judgments must be questioned in the light of these ethical and evidentiary concerns.

Judge Hutchison’s decision to deny certification for a Rezulin medical monitoring class obviated consideration of the ethical and evidentiary problems posed by monitoring remedies. The clear absence of proof to support the remedy for the Rezulin plaintiffs avoided debate over how to protect the informed consent process when the risks of monitoring will be perceived differently by each class member.

The paradisiacal Appalachian dream, however, did not last very long.

The Supreme Court of Appeals of West Virginia did not appear to be concerned by the ethics of human experimentation or the need for showing a basis in evidence for the reliability or accuracy of screening tests. Chief Justice Starcher, writing for a unanimous court, reversed and remanded the case to proceed as a class action. The Supreme Court’s opinion was a mechanical recitation of class action rules, interpreted to disallow any preliminary inquiry into the merits of the suit. In re West Virginia Rezulin Litig., 585 S.E.2d 52 (W.Va. 2003). The word “ethics” does not appear in the Supreme Court’s opinion. The Nuremberg Code was nowhere in sight.

Perhaps most medical monitoring class action battles are now behind us, given that federal courts have come to their senses and have generally disallowed class actions for this remedy. The cases on the books, however, represent ethically dubious judgments, which call for condemnation from the medical and legal community. Courts must take stock of the certainty that many medical monitoring schemes will produce far more false positive cases than true positive cases, and widespread fear, anxiety, and harm from unnecessary medical interventions. See generally Christopher P. Guzelian, Bruce E. Hillner, and Philip S. Guzelian, “A Quantitative Methodology for Determining the Need for Exposure-Prompted Medical Monitoring,” 79 Indiana L. J. 57 (2004).

[An earlier version of this post was published under the same title in Industrywide Liability News (Spring 2002)]