TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

The Treatment of Meta-Analysis in the Third Edition of the Reference Manual on Scientific Evidence

November 14th, 2011

Meta-analysis is a statistical procedure for aggregating data and statistics from individual studies into a single summary statistical estimate of the population measurement of interest.  The first meta-analysis is typically attributed to Karl Pearson, circa 1904, who sought a method to overcome the limitations of small sample size and low statistical power.  Statistical methods for meta-analysis, however, did not mature until the 1970s.  Even then, the biomedical scientific community remained skeptical of, if not outright hostile to, meta-analysis until relatively recently.
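
Although the inferential subtleties are real, the core computation is simple.  The sketch below (in Python, with invented study values purely for illustration; the function name is mine) shows the standard inverse-variance, fixed-effect approach to pooling: each study’s effect estimate is weighted by the reciprocal of its variance, so larger and more precise studies count for more.

```python
import math

def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooling of per-study estimates.

    estimates  -- per-study effects on an additive scale (e.g., log risk ratios)
    std_errors -- their standard errors
    Returns (pooled estimate, pooled standard error, 95% confidence interval).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    se_pooled = math.sqrt(1.0 / sum(weights))
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return pooled, se_pooled, ci

# Three hypothetical studies, expressed as log risk ratios:
pooled, se, (lo, hi) = fixed_effect_meta([0.26, 0.18, 0.31], [0.15, 0.12, 0.20])
print(f"pooled RR = {math.exp(pooled):.2f}, "
      f"95% CI ({math.exp(lo):.2f}, {math.exp(hi):.2f})")
```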

The hostility to meta-analysis, especially in the context of observational epidemiologic studies, was colorfully expressed by Samuel Shapiro and Alvan Feinstein, as late as the 1990s:

“Meta-analysis begins with scientific studies….  [D]ata from these studies are then run through computer models of bewildering complexity which produce results of implausible precision.”

* * * *

“I propose that the meta-analysis of published non-experimental data should be abandoned.”

Samuel Shapiro, “Meta-analysis/Smeta-analysis,” 140 Am. J. Epidem. 771, 777 (1994).  See also Alvan Feinstein, “Meta-Analysis: Statistical Alchemy for the 21st Century,” 48 J. Clin. Epidem. 71 (1995).

The professional skepticism about meta-analysis was reflected in some of the early judicial assessments of meta-analysis in court cases.  In the 1980s and 1990s, some trial judges erroneously dismissed meta-analysis as a flawed statistical procedure that claimed to make something out of nothing.  Allen v. Int’l Bus. Mach. Corp., No. 94-264-LON, 1997 U.S. Dist. LEXIS 8016, at *71–*74 (D. Del. 1997) (suggesting that meta-analysis of observational studies was controversial among epidemiologists).

In In re Paoli Railroad Yard PCB Litigation, Judge Robert Kelly excluded plaintiffs’ expert witness Dr. William Nicholson and his testimony based upon his unpublished meta-analysis of health outcomes among PCB-exposed workers.  Judge Kelly found that the meta-analysis was a novel technique, and that Nicholson’s meta-analysis was not peer reviewed.  Furthermore, the meta-analysis assessed health outcomes not experienced by any of the plaintiffs before the trial court.  706 F. Supp. 358, 373 (E.D. Pa. 1988).

The Court of Appeals for the Third Circuit reversed the exclusion of Dr. Nicholson’s testimony, and remanded for reconsideration with instructions.  In re Paoli R.R. Yard PCB Litig., 916 F.2d 829, 856-57 (3d Cir. 1990), cert. denied, 499 U.S. 961 (1991); Hines v. Consol. Rail Corp., 926 F.2d 262, 273 (3d Cir. 1991).  The Circuit noted that meta-analysis was not novel, and that the lack of peer review was not an automatic disqualification.  Acknowledging that a meta-analysis could be performed poorly, with invalid methods, the appellate court directed the trial court to evaluate the validity of Dr. Nicholson’s meta-analysis.

In one of many skirmishes over colorectal cancer claims in asbestos litigation, Judge Sweet in the Southern District of New York was unimpressed by efforts to aggregate data across studies.  Judge Sweet declared that “no matter how many studies yield a positive but statistically insignificant SMR for colorectal cancer, the results remain statistically insignificant. Just as adding a series of zeros together yields yet another zero as the product, adding a series of positive but statistically insignificant SMRs together does not produce a statistically significant pattern.”  In re Joint E. & S. Dist. Asbestos Litig., 827 F. Supp. 1014, 1042 (S.D.N.Y. 1993).  The plaintiffs’ expert witness who had offered the unreliable testimony, Dr. Steven Markowitz, like Nicholson, another foot soldier in Dr. Irving Selikoff’s litigation machine, did not offer a formal meta-analysis to justify his assessment that multiple non-significant studies, taken together, rule out chance as a likely explanation for an aggregate finding of an increased risk.

Judge Sweet was quite justified in rejecting this back-of-the-envelope, non-quantitative meta-analysis.  His suggestion, however, that multiple non-significant studies could never collectively serve to rule out chance as an explanation for an overall increased rate of disease in the exposed groups is wrong.  Judge Sweet would have done better to focus on the validity issues in key studies, the presence of bias and confounding, and the completeness of the proffered meta-analysis.  The Second Circuit reversed the entry of summary judgment, and remanded the colorectal cancer claim for trial.  52 F.3d 1124 (2d Cir. 1995).  Over a decade later, with even more accumulated studies and data, the Institute of Medicine found the evidence for asbestos plaintiffs’ colorectal cancer claims to be scientifically insufficient.  Institute of Medicine, Asbestos: Selected Cancers (Wash. D.C. 2006).

Courts continue to go astray with an erroneous belief that multiple studies, all without statistically significant results, cannot yield a statistically significant summary estimate of increased risk.  See, e.g., Baker v. Chevron USA, Inc., 2010 WL 99272, *14-15 (S.D. Ohio 2010) (addressing a meta-analysis by Dr. Infante on multiple myeloma outcomes in studies of benzene-exposed workers).  There were many sound objections to Infante’s meta-analysis, but the suggestion that multiple studies without statistical significance could not yield a summary estimate of risk with statistical significance was not one of them.
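
The point is easy to verify with arithmetic.  In the sketch below (invented study values again, pooled by the same fixed-effect, inverse-variance method), each of three studies, taken alone, fails to reach two-sided significance at the 0.05 level, yet the pooled estimate is significant, because pooling shrinks the standard error:

```python
import math

def two_sided_p(estimate, se):
    """Two-sided p-value for a normally distributed estimate against zero."""
    z = abs(estimate / se)
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

# Three hypothetical studies (log risk ratios), none significant on its own:
studies = [(0.22, 0.14), (0.18, 0.13), (0.25, 0.16)]
for y, se in studies:
    print(f"study: logRR={y}, p={two_sided_p(y, se):.3f}")   # all p > 0.10

# Fixed-effect pooling: the combined estimate is significant.
w = [1.0 / se ** 2 for _, se in studies]
pooled = sum(wi * y for wi, (y, _) in zip(w, studies)) / sum(w)
se_pooled = math.sqrt(1.0 / sum(w))
print(f"pooled: logRR={pooled:.3f}, p={two_sided_p(pooled, se_pooled):.4f}")  # ~0.010
```

Adding zeros yields zero; adding evidence does not.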

In the last two decades, meta-analysis has emerged as an important technique for addressing random variation in studies, as well as some of the limitations of frequentist statistical methods.  In the 1980s, articles reporting meta-analyses were rare to non-existent.  In 2009, there were over 2,300 articles with “meta-analysis” in their title, or in their keywords, indexed in the PubMed database of the National Library of Medicine.  See Michael O. Finkelstein and Bruce Levin, “Meta-Analysis of ‘Sparse’ Data: Perspectives from the Avandia Cases” (2011) (forthcoming in Jurimetrics).

The techniques for aggregating data have been studied, refined, and employed extensively in thousands of methods and application papers in the last decade. Consensus guideline papers have been published for meta-analyses of clinical trials as well as observational studies.  See Donna Stroup, et al., “Meta-analysis of Observational Studies in Epidemiology: A Proposal for Reporting,” 283 J. Am. Med. Ass’n 2008 (2000) (MOOSE statement); David Moher, Deborah Cook, Susan Eastwood, Ingram Olkin, Drummond Rennie, and Donna Stroup, “Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement,” 354 Lancet 1896 (1999).  See also Jesse Berlin & Carin Kim, “The Use of Meta-Analysis in Pharmacoepidemiology,” in Brian Strom, ed., Pharmacoepidemiology 681, 683–84 (4th ed. 2005); Zachary Gerbarg & Ralph Horwitz, “Resolving Conflicting Clinical Trials: Guidelines for Meta-Analysis,” 41 J. Clin. Epidemiol. 503 (1988).

Meta-analyses, of observational studies and of randomized clinical trials, routinely are relied upon by expert witnesses in pharmaceutical and so-called toxic tort litigation.  Id.  See also In re Bextra and Celebrex Marketing Sales Practices and Prod. Liab. Litig., 524 F. Supp. 2d 1166, 1174, 1184 (N.D. Cal. 2007) (holding that reliance upon “[a] meta-analysis of all available published and unpublished randomized clinical trials” was reasonable and appropriate, and criticizing the expert witnesses who urged the complete rejection of meta-analysis of observational studies).

The second edition of the Reference Manual on Scientific Evidence gave very little attention to meta-analysis.  With this historical backdrop, it is interesting to see what guidance the new third edition provides to the federal judiciary on this important topic.

STATISTICS CHAPTER

The statistics chapter of the third edition continues to give scant attention to meta-analysis.  The chapter notes, in a footnote, that there are formal procedures for aggregating data across studies, and that the power of the aggregated data will exceed the power of the individual, included studies.  The footnote then cautions that meta-analytic procedures “have their own weakness,” without detailing what that one weakness is.  RMSE 3d at 254 n.107.

The glossary at the end of the statistics chapter offers a definition of meta-analysis:

“meta-analysis. Attempts to combine information from all studies on a certain topic. For example, in the epidemiological context, a meta-analysis may attempt to provide a summary odds ratio and confidence interval for the effect of a certain exposure on a certain disease.”

Id. at 289.

This definition is inaccurate in ways that could yield serious mischief.  Virtually all meta-analyses are built upon a systematic review that sets out to collect all available studies on a research issue of interest.  It is a rare meta-analysis, however, that includes “all” studies in its quantitative analysis.  The meta-analytic process involves a pre-specification of inclusionary and exclusionary criteria for the quantitative analysis of the summary estimate of risk.  Those criteria may limit the quantitative analysis to randomized trials, or to analytical epidemiologic studies.  Furthermore, meta-analyses frequently and appropriately have pre-specified exclusionary criteria that relate to study design or quality.

On a more technical note, the offered definition suggests that the summary estimate of risk will be an odds ratio, which may or may not be true.  Meta-analyses of risk ratios may yield summary estimates of risk in terms of relative risks or hazard ratios, or even of risk differences.  A meta-analysis may also combine data on means rather than proportions.

EPIDEMIOLOGY CHAPTER

The chapter on epidemiology delves into meta-analysis in greater detail than the statistics chapter, and offers apparently inconsistent advice.  The overall gist of the chapter, however, can perhaps best be summarized by the definition offered in this chapter’s glossary:

“meta-analysis. A technique used to combine the results of several studies to enhance the precision of the estimate of the effect size and reduce the plausibility that the association found is due to random sampling error.  Meta-analysis is best suited to pooling results from randomly controlled experimental studies, but if carefully performed, it also may be useful for observational studies.”

Reference Guide on Epidemiology, RMSE 3d at 624.  See also id. at 581 n.89 (“Meta-analysis is better suited to combining results from randomly controlled experimental studies, but if carefully performed it may also be helpful for observational studies, such as those in the epidemiologic field.”).  The epidemiology chapter appropriately notes that meta-analysis can help address concerns over random error in small studies.  Id. at 579; see also id. at 607 n.171.

Having told us that properly conducted meta-analyses of observational studies can be helpful, the chapter hedges considerably:

“Meta-analysis is most appropriate when used in pooling randomized experimental trials, because the studies included in the meta-analysis share the most significant methodological characteristics, in particular, use of randomized assignment of subjects to different exposure groups. However, often one is confronted with nonrandomized observational studies of the effects of possible toxic substances or agents. A method for summarizing such studies is greatly needed, but when meta-analysis is applied to observational studies – either case-control or cohort – it becomes more controversial.174 The reason for this is that often methodological differences among studies are much more pronounced than they are in randomized trials. Hence, the justification for pooling the results and deriving a single estimate of risk, for example, is problematic.175”

Id. at 607.  The stated objection to pooling results for observational studies is certainly correct, but many research topics have sufficient studies available to allow for appropriate selectivity in framing inclusionary and exclusionary criteria to address the objection.  The chapter goes on to credit the critics of meta-analyses of observational studies.  As they did in the second edition of the RMSE, the authors repeat their cites to, and quotes from, early papers by John Bailar, who was then critical of such meta-analyses:

“Much has been written about meta-analysis recently and some experts consider the problems of meta-analysis to outweigh the benefits at the present time. For example, John Bailar has observed:

‘[P]roblems have been so frequent and so deep, and overstatements of the strength of conclusions so extreme, that one might well conclude there is something seriously and fundamentally wrong with the method. For the present . . . I still prefer the thoughtful, old-fashioned review of the literature by a knowledgeable expert who explains and defends the judgments that are presented. We have not yet reached a stage where these judgments can be passed on, even in part, to a formalized process such as meta-analysis.’

John Bailar, “Assessing Assessments,” 277 Science 528, 529 (1997).”

Id. at 607 n.177.  Bailar’s subjective preference for “old-fashioned” reviews, which often cherry-picked the included studies, is, well, “old fashioned.”  More to the point, it is questionable science, and a distinctly minority viewpoint in the light of substantial improvements in the conduct and reporting of meta-analyses of observational studies.  Bailar may be correct that some meta-analyses should never have left the protocol stage, but the RMSE 3d fails to provide the judiciary with the tools to appreciate the distinction between good and bad meta-analyses.

This categorical rejection, cited with apparent approval, is amplified by a recitation of some real or apparent problems with meta-analyses of observational studies.  What is missing is a discussion of how many of these problems can be and are dealt with in contemporary practice:

“A number of problems and issues arise in meta-analysis. Should only published papers be included in the meta-analysis, or should any available studies be used, even if they have not been peer reviewed? Can the results of the meta-analysis itself be reproduced by other analysts? When there are several meta-analyses of a given relationship, why do the results of different meta-analyses often disagree? The appeal of a meta-analysis is that it generates a single estimate of risk (along with an associated confidence interval), but this strength can also be a weakness, and may lead to a false sense of security regarding the certainty of the estimate. A key issue is the matter of heterogeneity of results among the studies being summarized.  If there is more variance among study results than one would expect by chance, this creates further uncertainty about the summary measure from the meta-analysis. Such differences can arise from variations in study quality, or in study populations or in study designs. Such differences in results make it harder to trust a single estimate of effect; the reasons for such differences need at least to be acknowledged and, if possible, explained.176 People often tend to have an inordinate belief in the validity of the findings when a single number is attached to them, and many of the difficulties that may arise in conducting a meta-analysis, especially of observational studies such as epidemiologic ones, may consequently be overlooked.177”

Id. at 608.  The authors are entitled to their opinion, but their discussion leaves the judiciary uninformed about current practice, and best practices, in epidemiology.  A categorical rejection of meta-analyses of observational studies is at odds with the chapter’s own claim that such meta-analyses can be helpful if properly performed.  What was needed, and is missing, is a meaningful discussion to help the judiciary determine whether a meta-analysis of observational studies was properly performed.
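
One concrete tool that would have aided that discussion: in contemporary practice, heterogeneity is not left to impression; it is measured.  As a rough sketch (invented study values, standard formulas), Cochran’s Q tests whether study results vary more than chance alone would predict, and the I² statistic expresses how much of the variation reflects real between-study differences:

```python
import math

def heterogeneity(estimates, std_errors):
    """Cochran's Q and I-squared for between-study heterogeneity.

    Q is referred to a chi-squared distribution with k-1 degrees of
    freedom; I^2 estimates the percentage of total variation across
    studies that reflects real differences rather than chance.
    """
    w = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(wi * y for wi, y in zip(w, estimates)) / sum(w)
    q = sum(wi * (y - pooled) ** 2 for wi, y in zip(w, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2

# Four hypothetical studies with discordant results:
q, df, i2 = heterogeneity([0.05, 0.60, 0.10, 0.55], [0.10, 0.12, 0.11, 0.15])
print(f"Q = {q:.1f} on {df} df, I^2 = {i2:.0f}%")  # high I^2: a single pooled
                                                   # number would mislead
```

When I² is high, a careful meta-analyst investigates the sources of disagreement, in study design, population, or exposure assessment, or reports a random-effects model, rather than presenting one fixed-effect number as though the studies spoke with one voice.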

MEDICAL TESTIMONY CHAPTER

The chapter on medical testimony is the third pass at meta-analysis in RMSE 3d.   The second edition’s chapter on medical testimony ignored meta-analysis completely; the new edition addresses meta-analysis in the context of the hierarchy of study designs:

“Other circumstances that set the stage for an intense focus on medical evidence included

(1) the development of medical research, including randomized controlled trials and other observational study designs;

(2) the growth of diagnostic and therapeutic interventions;141

(3) interest in understanding medical decision making and how physicians reason;142 and

(4) the acceptance of meta-analysis as a method to combine data from multiple randomized trials.143”

RMSE 3d at 722-23.

The chapter’s listing curiously omits meta-analyses of observational studies, but the footnote reference (note 143) then inconsistently discusses two meta-analyses of observational, rather than experimental, studies:

“143. Video Software Dealers Ass’n v. Schwarzenegger, 556 F.3d 950, 963 (9th Cir. 2009) (analyzing a meta-analysis of studies on video games and adolescent behavior); Kennecott Greens Creek Min. Co. v. Mine Safety & Health Admin., 476 F.3d 946, 953 (D.C. Cir. 2007) (reviewing the Mine Safety and Health Administration’s reliance on epidemiological studies and two meta-analyses).”

Id. at 723 n.143.

The medical testimony chapter then creates further confusion by giving a more detailed listing of the hierarchy of medical evidence in the form of different study designs:

3. Hierarchy of medical evidence

With the explosion of available medical evidence, increased emphasis has been placed on assembling, evaluating, and interpreting medical research evidence.  A fundamental principle of evidence-based medicine (see also Section IV.C.5, infra) is that the strength of medical evidence supporting a therapy or strategy is hierarchical.  When ordered from strongest to weakest, systematic review of randomized trials (meta-analysis) is at the top, followed by single randomized trials, systematic reviews of observational studies, single observational studies, physiological studies, and unsystematic clinical observations.150 An analysis of the frequency with which various study designs are cited by others provides empirical evidence supporting the influence of meta-analysis followed by randomized controlled trials in the medical evidence hierarchy.151 Although they are at the bottom of the evidence hierarchy, unsystematic clinical observations or case reports may be the first signals of adverse events or associations that are later confirmed with larger or controlled epidemiological studies (e.g., aplastic anemia caused by chloramphenicol,152 or lung cancer caused by asbestos153). Nonetheless, subsequent studies may not confirm initial reports (e.g., the putative association between coffee consumption and pancreatic cancer).154

Id. at 723-24.  This discussion further muddies the water by using a parenthetical to suggest that meta-analyses of randomized clinical trials are equivalent to systematic reviews of such studies — “systematic review of randomized trials (meta-analysis).” Of course, systematic reviews are not meta-analyses, although they are a necessary precondition for conducting a meta-analysis.  The relationship between the procedures for a systematic review and a meta-analysis is in need of clarification, but the judiciary will not find it in the new Reference Manual.

OSHA’s HazCom Standard — Statistical and Scientific Nonsense

November 13th, 2011

Almost 28 years ago, the United States Department of Labor (Occupational Safety and Health Administration or OSHA) promulgated the Hazard Communication Standard.  29 C.F.R. § 1910.1200 (November 1983; effective date November 25, 1985) (HazCom standard).  Initially the HazCom standard applied to importers and manufacturers of chemicals.  Starting one year later, on November 25, 1986, the standard covered manufacturing employers, under OSHA jurisdiction, by defining their duties to protect and inform employees.

The HazCom standard applies to all chemical manufacturers and distributors and to

“any chemical which is known to be present in the workplace in such a manner that employees may be exposed under normal conditions of use or in a foreseeable emergency.”

29 C.F.R. § 1910.1200(b)(1), and (b)(2).  The standard requires manufacturers and distributors of hazardous chemicals to inform not only their own employees of the dangers posed by the chemicals, but downstream employers and employees as well.  The standard implements this duty to warn downstream employers’ employees by requiring that containers of hazardous chemicals leaving the workplace be labeled with “appropriate hazard warnings.”  See Martin v. American Cyanamid Co., 5 F.3d 140, 141-42 (6th Cir. 1993) (reviewing agency’s interpretation of the standard).

The HazCom standard attempts to provide some definition of the health hazards for which warnings are required:

“For health hazards, evidence which is statistically significant and which is based on at least one positive study conducted in accordance with established scientific principles is considered to be sufficient to establish a hazardous effect if the results of the study meet the definitions of health hazards in this section.”

29 C.F.R. § 1910.1200(d)(2).

This regulatory language is troubling.  What does statistically significant mean?  The concept remains important in health effects research, but several writers have subjected the use of significance testing specifically, and frequentist statistics generally, to criticism.  See, e.g., Stephen T. Ziliak and Deirdre N. McCloskey, The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives (Ann Arbor 2008) (an example of one of the more fringe, and not particularly cogent, criticisms of frequentist statistics).  And what are the “established scientific principles,” which would allow a single “positive study” to “establish” a hazardous “effect”?

The HazCom standard is important not only for purposes of regulatory compliance, but for its potential implications for products liability law, as well.  With its importance in mind, what can be said about the definition of health hazard, provided in 29 C.F.R. § 1910.1200(d)(2)?

Perhaps a good place to start is with the guidance provided by OSHA on compliance with the HazCom standard.  To be sure, like most agency guidance statements, this one is prefaced with caveats and cautions:

“This guidance is not a standard or regulation, and it creates no new legal obligations. It is advisory in nature, informational in content, and is intended to assist employers in providing a safe and healthful workplace. Pursuant to the Occupational Safety and Health Act, employers must comply with safety and health standards promulgated by OSHA or by a state with an OSHA-approved state plan. In addition, pursuant to Section 5(a)(1), the General Duty Clause of the Act, employers must provide their employees with a workplace free from recognized hazards likely to cause death or serious physical harm. Employers can be cited for violating the General Duty Clause if there is a recognized hazard and they do not take reasonable steps to prevent or abate the hazard. However, failure to implement any specific recommendations in this guidance is not, in itself, a violation of the General Duty Clause. Citations can only be based on standards, regulations, and the General Duty Clause.”

U.S. Dep’t of Labor, Guidance for Hazard Determination for Compliance with the OSHA Hazard Communication Standard (29 CFR § 1910.1200) (July 6, 2007).

Section II of the Guidance describes how manufacturers may assess whether their chemicals are “hazardous.”  A health hazard is defined as a chemical

“for which there is statistically significant evidence based on at least one study conducted in accordance with established scientific principles that acute or chronic health effects may occur in exposed employees.”

A fair-minded person might object that this is no guidance at all.  “Statistically significant” is not defined in the regulations.  “Study” is not defined.  The guidance specifies that the study or studies must be conducted in accordance with “established scientific principles,” but must the interpretation or judgment of causality be made similarly in accordance with such principles?  One would hope so, but the Guidance does not really specify.  The use of “may” seems to inject a level of conjecture or speculation into the hazard assessment.

Section V of the Guidance addresses data analysis, and here the agency attempts to provide some meaning to statistical significance and other terms in the regulation, but in doing so, the Guidance offers incoherent, incredible advice.

The Guidance notes that the regulation specifies one “positive study,” which presumably is a study that is some evidence in favor of an “effect.”  Because we are dealing with chemical exposures in occupational settings, the studies at issue will be, at best, observational studies.  Randomized clinical trials are out.  The one study (at least) at issue must be sufficient to establish a hazardous effect if that effect is considered a “health hazard” within the meaning of the regulations.  This is problematic on many levels.  What sort of study are we discussing?  An experimental study in planaria worms, a case study of a single human, an ecological study, or an analytical epidemiologic (case-control or cohort) study?  Whatever the study is, it would be a most remarkable study if it alone were “sufficient” to “establish” an “effect.”

A reasonable manufacturer or disinterested administrator surely would interpret the sufficiency requirement to mean that the entire evidentiary display must be considered, rather than permitting one study, taken in isolation and ripped from its scientific context, to suggest a duty to warn.  The Guidance, and the regulations, however, never address the real-world complexity of hazard assessment.

Section V of the Guidance offers a failed attempt to illuminate the meaning of statistical significance:

“Statistical significance is a mathematical determination of the confidence in the outcome of a test. The usual criterion for establishing statistical significance is the p-value (probability value). A statistically significant difference in results is generally indicated by p < 0.05, meaning there is less than a 5% probability that the toxic effects observed were due to chance and were not caused by the chemical. Another way of looking at it is that there is a 95% probability that the effect is real, i.e., the effect seen was the result of the chemical exposure.”

Few statisticians or scientists would accept the proffered definition.  The Guidance’s statement that a p-value is equivalent to the probability of the “toxic effect” occurring by chance is unacceptable for several reasons.

First, it is a notoriously incorrect, fallacious statement of the meaning of a p-value:

“Since p is calculated by assuming the null hypothesis is correct (that there is no difference [between observed and expected] in the full population), the p-value cannot give the chance that this hypothesis is true.  The p-value merely gives the chance of getting evidence against the null hypothesis as strong or stronger than the evidence at hand — assuming that the null hypothesis … is correct.”

David H. Kaye, David E. Bernstein, and Jennifer L. Mnookin, The New Wigmore: Expert Evidence § 12.8.2, at 559 (2d ed. 2010) (discussing the transpositional fallacy).
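
A back-of-the-envelope calculation shows why the transposition matters.  Under assumed, purely hypothetical inputs for the fraction of tested hypotheses that are true and for study power, Bayes’ rule yields the probability the Guidance actually purports to state, and it is nowhere near 95%:

```python
# Hypothetical inputs, for illustration only:
prior_real = 0.10   # assume 10% of tested exposures are true hazards
power = 0.80        # P(p < 0.05 | effect is real)
alpha = 0.05        # P(p < 0.05 | no effect)

true_pos = prior_real * power           # 0.08
false_pos = (1 - prior_real) * alpha    # 0.045
posterior = true_pos / (true_pos + false_pos)
print(f"P(effect is real | p < 0.05) = {posterior:.2f}")  # 0.64, not 0.95
```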

Second, even if we could ignore the statistical solecism, the Guidance’s use of a mechanical test for statistical significance is troubling.  The p-value is not necessarily an appropriate protection against Type I error, or a “false alarm” that there is an association between the exposure and outcome of interest.  Multiple testing and other aspects of a study may inflate the number of false alarms to the point that a study with a low p-value, even one much lower than 5%, will not rule out the likely role of chance as an explanation for the study’s result.
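
The inflation from multiple testing is itself a matter of simple arithmetic: with k independent comparisons and no true effect anywhere, the chance of at least one nominally “significant” result is 1 − 0.95^k:

```python
# Family-wise probability of at least one false alarm among k
# independent tests, each conducted at the 0.05 level, all nulls true:
for k in (1, 5, 20, 100):
    print(f"k = {k:3d}: P(at least one p < 0.05) = {1 - 0.95 ** k:.3f}")
# k =   1: 0.050
# k =   5: 0.226
# k =  20: 0.642
# k = 100: 0.994
```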

Third, the Guidance’s suggestion that “statistical significance” boils down to a conclusion that the “effect is real” may be its greatest offense against scientific and statistical methodology.  Section V of the Guidance emphasizes that the HazCom standard states that

“evidence that is statistically significant and which is based on at least one positive study conducted in accordance with established scientific principles is considered to be sufficient to establish a hazardous effect if the results of the study meet the [HCS] definitions of health hazards.”

This is nothing more than semantic fiat and legerdemain.

Statistical significance may, in some circumstances, permit an inference that the divergence from the expected was not likely due to chance, but it cannot, in the context of observational studies, allow for a conclusion that the divergence resulted because of a cause-effect relationship between the exposure and the outcome.  Statistical significance cannot rule out systematic bias or confounding in the study; nor can it help us reconcile inconsistencies across studies.  The study may have identified an association, which must be assessed for its causal or non-causal nature, in the context of all relevant evidence.  See Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295 (1965).

The OSHA Guidance is really no guidance at all.  Ensuring worker health and safety by requiring employers to provide industrial hygiene protections for workers is an exceedingly important task, but this aspect of the HazCom standard is incoherent and incompetent. Workers and employers are in the dark, and product suppliers are vulnerable to arbitrary and capricious enforcement.

Lording the Data – Scientific Fraud

November 10th, 2011

Last week, the New York Times published a news story about psychologist Diederik Stapel, of the Netherlands.  Tilburg University accused him of having committed research fraud in several dozen published papers, including papers in Science, the official journal of the AAAS.  See Benedict Carey, “Fraud Case Seen as a Red Flag for Psychology Research: Noted Dutch Psychologist, Stapel, Accused of Research Fraud,” New York Times (Nov. 2, 2011).  The Times expressed surprise over the suggestion that psychology is plagued by fraud and sloppy research.  The surprise is that there are not more stories in the lay media over the poor quality of scientific research.  Readers of Retraction Watch, and of the Office of Research Integrity’s blog, will recognize how commonplace Stapel’s fraud is.

Stapel’s fraud has wide-ranging implications for the doctoral students, whose dissertations he supervised, and for colleagues, with whom he collaborated.  Stapel apologized and expressed his regret, but his conduct leaves a large body of his work, and that of others, under a cloud of suspicion.

Lording the Data

The University committee reported that Stapel had escaped detection for a long time because he was “lord of the data,” refusing to disclose and share the data.

“Outright fraud may be rare, these experts say, but they contend that Dr. Stapel took advantage of a system that allows researchers to operate in near secrecy and massage data to find what they want to find, without much fear of being challenged.”

Benedict Carey, “Fraud Case,” New York Times (Nov. 2, 2011).  Data sharing is preached but rarely practiced.

In a recent publication, Dr. Wicherts and his colleagues, at the University of Amsterdam, reported that two-thirds of his sample of Dutch research psychologists refused to share their data, in contravention of the established ethical rules of the discipline.  Remarkably, many of the refuseniks had explicit contractual obligations with their publishing journals to provide data.  Jelte Wicherts, Marjan Bakker, Dylan Molenaar, “Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results,” PLoS ONE 6(11): e26828 (Nov. 2, 2011).

Scientific fraud seems no more common among scientists with industry ties, which are so often the subject of ad hominem conflict of interest claims.  Instead, fraudfeasors such as Stapel or Hwang Woo-suk are more often simply egotistical, narcissistic, self-aggrandizing, self-promoting, or delusional.  In the United States, litigation occasionally has brought out charlatans, but it has also resulted in high-quality studies that have provided strong evidence for or against litigation claims.  Compare Hon. Jack B. Weinstein, “Preliminary Reflections on Administration of Complex Litigation” 2009 Cardozo L. Rev. de novo 1, 14 (2009) (describing plaintiffs’ expert witnesses in silicone litigation as “charlatans” and the litigation as largely based upon fraud) with Committee on the Safety of Silicone Breast Implants, Institute of Medicine, Safety of Silicone Breast Implants (Wash. D.C. 1999) (reviewing studies, many of which were commissioned by litigation defendants, and which collectively showed lack of association between silicone and autoimmune diseases).

The relation between litigation and research is one that has typically been approached by self-righteous voices, such as David Michaels and David Egilman, and others who have their own deep conflicts of interest.  What is clear is that all litigants, as well as the public, would benefit from enforcing data sharing requirements.  See “Litigation and Research” (April 15, 2007) (science should not be built upon blind trust of scientists: “Nullius in verba.”).

The Times article emphasized Wicherts’ research about lack of data sharing, and suggested that data sharing could improve the quality of scientific publications.  The time may have come, however, for sterner measures of civil and criminal penalties for scientists who abuse and waste governmental funding, or who aid and abet fraudulent litigation.

New-Age Levellers – Flattening Hierarchy of Evidence

October 30th, 2011

The Levellers were political dissidents in England, in the middle of the 17th century.  Among their causes, the Levellers advanced popular sovereignty, equal protection of the law, and religious tolerance.

The political agenda of the Levellers sounds quite noble to 21st-century Americans, but their ideals have no place in the world of science:  not all opinions or scientific studies are created equal; not all opinions are worthy of being taken seriously in scientific discourse or in courtroom presentations of science; and not all opinions should be tolerated, especially when they claim causal conclusions based upon shoddy or inadequate evidence.

In some litigations, legal counsel set out to obscure the important quantitative and qualitative distinctions among scientific studies.  Sometimes, lawyers find cooperative expert witnesses, willing to engage in hand waving about “the weight of the evidence,” where the weights are assigned post hoc, in a highly biased fashion.  No study (that favors the claim) left behind.  This is not science, and it is not how science operates, even though some expert witnesses, such as Professor Cranor in the Milward case, have been able to pass off their views as representative of scientific practice.

A sound appreciation of how scientists evaluate studies, and of why not all studies are equal, is essential to any educated evaluation of scientific controversies.  Litigants who face high-quality studies, with results inconsistent with their litigation claims, may well resort to “leveling” of studies.  This leveling may be advanced out of ignorance, but more likely the leveling is an attempt to snooker courts with evidence from exploratory, preliminary, and hypothesis-generating studies as somehow equal to, or greater than, the value of hypothesis-testing studies.

Some of the leveling tactics that have become commonplace in litigation include asserting that:

  • All expert witnesses are the same;
  • All expert witnesses conduct the same analysis;
  • All expert witnesses read articles, interpret them, and offer opinions;
  • All expert witnesses are inherently biased;
  • All expert witnesses select the articles to read and interpret in line with their biases;
  • All epidemiologic studies are the same;
  • All studies are flawed; and
  • All opinions are, in the final analysis, subjective.

This leveling strategy can be seen in Professor Margaret Berger’s introduction to the Reference Manual on Scientific Evidence (RMSE 3d), where she supported an ill-defined “weight-of-the-evidence” approach to causal judgments.  See “Late Professor Berger’s Introduction to the Reference Manual on Scientific Evidence” (Oct. 23, 2011).

Other chapters in the RMSE 3d are at odds with Berger’s introduction.  The epidemiology chapter does not explicitly address the hierarchy of studies, but it does describe cross-sectional, ecological, and secular trend studies as less able to support causal conclusions.  Cross-sectional studies are described as “rarely useful in identifying toxic agents,” RMSE 3d at 556, and as “used infrequently when the exposure of interest is an environmental toxic agent,” RMSE 3d at 561.  Cross-sectional studies are described as hypothesis-generating as opposed to hypothesis-testing, although not in those specific terms.  Id. (describing cross-sectional studies as providing valuable leads for future research).  Ecological studies are described as useful for identifying associations, but not helpful in determining whether such associations are causal; and ecological studies are identified as a fertile source of error in the form of the “ecological fallacy.”  Id. at 561-62.

The epidemiology chapter perhaps weakens its helpful description of the limited role of ecological studies by citing, with apparent approval, a district court that blinked at its gatekeeping responsibility to ensure that testifying expert witnesses did, in fact, rely upon “sufficient facts or data,” as well as upon studies that are “of a type reasonably relied upon by experts in the particular field in forming opinions or inferences upon the subject.” Rule 703. RMSE 3d at 561 n.34 (citing Cook v. Rockwell International Corp., 580 F. Supp. 2d 1071, 1095–96 (D. Colo. 2006), where the district court acknowledged the severe limitations of ecological studies in supporting causal inferences, but opined that the limitations went to the weight of the study). Of course, the insubstantial weight of an ecological study is precisely what may result in the study’s failure to support a causal claim.

The ray of clarity in the epidemiology chapter about the hierarchical nature of studies is muddled by an attempt to level epidemiology and toxicology.  The chapter suggests that there is no hierarchy of disciplines (as opposed to studies within a discipline).  RMSE 3d at 564 & n.48 (citing and quoting a symposium paper that “[t]here should be no hierarchy [among different types of scientific methods to determine cancer causation]. Epidemiology, animal, tissue culture and molecular pathology should be seen as integrating evidences in the determination of human carcinogenicity.” Michele Carbone et al., “Modern Criteria to Establish Human Cancer Etiology,” 64 Cancer Res. 5518, 5522 (2004)).  Carbone, of course, is best known for his advocacy of a viral cause (SV40) of human mesothelioma, a claim unsupported, and indeed contradicted, by epidemiologic studies.  His statement does not support the chapter’s leveling of epidemiology and toxicology, and Carbone is, in any event, an unlikely source to cite.

The epidemiology chapter undermines its own description of the role of study design in evaluating causality by pejoratively asserting that most epidemiologic studies are “flawed”:

“It is important to emphasize that all studies have ‘flaws’ in the sense of limitations that add uncertainty about the proper interpretation of the results.9 Some flaws are inevitable given the limits of technology, resources, the ability and willingness of persons to participate in a study, and ethical constraints. In evaluating epidemiologic evidence, the key questions, then, are the extent to which a study’s limitations compromise its findings and permit inferences about causation.”

RMSE 3d at 553.  This statement is actually a significant improvement over the second edition, where the authors of the epidemiology chapter asserted, without qualification:

“It is important to emphasize that most studies have flaws.”

RMSE 2d at 337.  The “flaws” language from the earlier chapter was used on occasion by courts that were set on ignoring competing interpretations of epidemiologic studies.  Since all or most studies are flawed, why bother figuring out what is valid and reliable?  Just let the jury sort it out.  This is not an aid to gatekeeping, but rather a prescription for allowing the gatekeeper to call in sick.

The current epidemiology chapter essentially backtracks from the harsh connotations of its use of the term “flaws,” by now equating the term with “limitations.”  Flaws and limitations, however, are quite different from one another.  What is left out in the third edition’s description is the sense that there are indeed some studies that are so flawed that they must be disregarded altogether.  There may also be limitations in studies, especially observational studies, which is why the party with the burden of proof should generally not be allowed to proceed with only one or two epidemiologic studies.  Rule 702, after all, requires that an expert opinion be based upon “sufficient facts or data.”

The RMSE 3d chapter on medical evidence is a refreshing break from the leveling approach seen elsewhere.  Here at least, the chapter authors devote several pages to explaining the role of study design in assessing an etiological issue:

3. Hierarchy of medical evidence

With the explosion of available medical evidence, increased emphasis has been placed on assembling, evaluating, and interpreting medical research evidence.  A fundamental principle of evidence-based medicine (see also Section IV.C.5, infra) is that the strength of medical evidence supporting a therapy or strategy is hierarchical.

When ordered from strongest to weakest, systematic review of randomized trials (meta-analysis) is at the top, followed by single randomized trials, systematic reviews of observational studies, single observational studies, physiological studies, and unsystematic clinical observations.150 An analysis of the frequency with which various study designs are cited by others provides empirical evidence supporting the influence of meta-analysis followed by randomized controlled trials in the medical evidence hierarchy.151 Although they are at the bottom of the evidence hierarchy, unsystematic clinical observations or case reports may be the first signals of adverse events or associations that are later confirmed with larger or controlled epidemiological studies (e.g., aplastic anemia caused by chloramphenicol,152 or lung cancer caused by asbestos153). Nonetheless, subsequent studies may not confirm initial reports (e.g., the putative association between coffee consumption and pancreatic cancer).154

John B. Wong, Lawrence O. Gostin, and Oscar A. Cabrera, “Reference Guide on Medical Testimony,” RMSE 3d 687, 723-24 (2011).  The third edition’s chapter is a significant improvement over the second edition’s chapter on medical testimony, which does not mention the hierarchy of evidence.  Mary Sue Henifin, Howard M. Kipen, and Susan R. Poulter, “Reference Guide on Medical Testimony,” RMSE 2d 440 (2000).  Indeed, the only time the word “hierarchy” appeared in the entire second edition was in connection with the hierarchy of the federal judiciary.

The tension, contradictions, and differing emphases among the various chapters of the RMSE 3d point to an important “flaw” in the new edition.  The chapters appear to have been written largely in isolation, and without much regard for what the other chapters contain.  The chapters overlap, and indeed contradict one another on key points.  Witness Berger’s rejection of the hierarchy of evidence, the epidemiology chapter’s inconstant presentation of the concept without mentioning it by name, and the medical testimony chapter’s embrace and explicit presentation of the hierarchical nature of medical study evidence.  Fortunately, the laissez-faire editorial approach allowed the disagreement to remain, without censoring any position, but the federal judiciary is not aided by the contradiction and tension in the approaches.

Given the importance of the concept, even the medical testimony chapter in RMSE 3d may seem to be too little, too late to be helpful to the judiciary.  There are book-length treatments of systematic reviews and “evidence-based medicine”; the three pages in Wong’s chapter barely scratch the surface of this important topic of how evidence is categorized, evaluated, and synthesized in making judgments of causality.

There are many textbooks and articles available to judges and lawyers on how to assess medical studies.  Recently, John Cherrie has posted on his blog, OH-world, about a series of 17 articles in the journal Aerzteblatt International on the proper evaluation of medical and epidemiologic studies.

These papers, overall, make the point that not all studies are equal, and that not all evidentiary displays are adequate to support conclusions of causal association.  The papers are available without charge from the journal’s website:

01. Critical Appraisal of Scientific Articles

02. Study Design in Medical Research

03. Types of Study in Medical Research

04. Confidence Interval or P-Value?

05. Requirements and Assessment of Laboratory Tests: Inpatient Admission Screening

06. Systematic Literature Reviews and Meta-Analyses

07. The Specification of Statistical Measures and Their Presentation in Tables and Graphs

08. Avoiding Bias in Observational Studies

09. Interpreting Results in 2×2 Tables

10. Judging a Plethora of p-Values: How to Contend With the Problem of Multiple Testing

11. Data Analysis of Epidemiological Studies

12. Choosing statistical tests

13. Sample size calculation in clinical trials

14. Linear regression analysis

15. Survival analysis

16. Concordance analysis

17. Randomized controlled trials

This year, the Journal of Clinical Epidemiology began publishing a series of papers, known by the acronym GRADE, which aim to provide guidance on how studies are categorized and assessed for their evidential quality in supporting treatments and interventions.  The GRADE project is led by Gordon Guyatt, who is known for having coined the term “evidence-based medicine,” and who has written widely on the subject.  Guyatt, along with his colleagues including Peter Tugwell (who was one of the court-appointed expert witnesses in MDL 926), has described the GRADE project:

“The ‘Grades of Recommendation, Assessment, Development, and Evaluation’ (GRADE) approach provides guidance for rating quality of evidence and grading strength of recommendations in health care. It has important implications for those summarizing evidence for systematic reviews, health technology assessment, and clinical practice guidelines. GRADE provides a systematic and transparent framework for clarifying questions, determining the outcomes of interest, summarizing the evidence that addresses a question, and moving from the evidence to a recommendation or decision. Wide dissemination and use of the GRADE approach, with endorsement from more than 50 organizations worldwide, many highly influential (http://www.gradeworkinggroup.org/), attests to the importance of this work. This article introduces a 20-part series providing guidance for the use of GRADE methodology that will appear in the Journal of Clinical Epidemiology.”

Gordon Guyatt, Andrew D. Oxman, Holger Schünemann, Peter Tugwell, Andre Knottnerus, “GRADE guidelines – new series of articles in Journal of Clinical Epidemiology,” 64 J. Clin. Epidem. 380 (2011).  See also Gordon Guyatt, Andrew Oxman, et al., for the GRADE Working Group, “GRADE: an emerging consensus on rating quality of evidence and strength of recommendations,” 336 Brit. Med. J. 924 (2008).

Of the 20 papers planned, nine have been published to date in the Journal of Clinical Epidemiology:

01 Intro – GRADE evidence profiles & summary of findings tables

02 Framing question & deciding on important outcomes

03 Rating quality of evidence

04 Rating quality of evidence – study limitations (risk of bias)

05 Rating the quality of evidence—publication bias

06 Rating up quality of evidence – imprecision

07 Rating quality of evidence – inconsistency

08 Rating quality of evidence – indirectness

09 Rating up quality of evidence

The GRADE guidance papers focus on the efficacy of treatments and interventions, but in doing so, they evaluate “effects” and are thus applicable to the etiologic issues of alleged harm that find their way into court.  The papers build on other grading systems advanced previously by the Oxford Centre for Evidence-Based Medicine, the U.S. Preventive Services Task Force (Agency for Healthcare Research and Quality, AHRQ), the Cochrane Collaboration, as well as many individual professional organizations.

GRADE has had some success in harmonizing disparate grading systems, and forging a consensus among organizations that had been using their own systems, such as the  World Health Organization, the American College of Physicians, the American Thoracic Society, the Cochrane Collaboration, the American College of Chest Physicians, the British Medical Journal, and Kaiser Permanente.

There are many other important efforts to provide consensus support for improving the quality of the design, conduct, and reporting of published studies, as well as the interpretation of those studies once published.  Although the RMSE 3d does a good job of introducing its readers to the basics of study design, it could have done considerably more to help judges become discerning critics of scientific studies and of conclusions based upon individual or multiple studies.

Historians As Expert Witnesses – A Wiki

October 28th, 2011

“The one duty we owe to history is to rewrite it.”

Oscar Wilde, The Critic As Artist (1891)

“What will history say?  History, sir, will tell lies as usual.”

George Bernard Shaw, The Devil’s Disciple (1901)

* * * * * * * * * * * * * * * * * * * * * * * * *

The Defense Research Institute recently announced that Bill Childs, a professor at the Western New England University School of Law, will be speaking on the use of historians as expert witnesses in litigation.  Having puzzled about this very issue in previous writings, I look forward to Professor Childs’ contributions on the issue.  The announcement also noted Professor Childs’ creation, the “Historians as Experts Wiki,” which I knew about, but had not previously visited.

The wiki is a valuable resource of information about historians who have participated in the litigation process in all manner of cases, including art, asbestos, creationism, Native Americans, the Holocaust, products liability, intellectual property, and voting rights.  There are pages for each historian witness, including expert witnesses in other fields who have given testimony of an explicitly historical nature.  The website is still in its formative stages, but it holds great promise as a resource for lawyers who are researching historians who have been listed as expert witnesses in their cases.

Most of my musings about historians as expert witnesses have been provoked by those who have testified about the history of silicosis.  Last year, I presented at a conference sponsored by the International Commission on Occupational Health (ICOH) about such historians.  See “A Walk on the Wild Side” (July 16, 2010).  My presentation abstract, along with all the proceedings of that conference, will be published next year as “Courting Clio:  Historians and Their Testimony in Products Liability Action,” in: Brian Dolan and Paul Blanc, eds., At Work in the World: Proceedings of the Fourth International Conference on the History of Occupational and Environmental Health, Perspectives in Medical Humanities, University of California Medical Humanities Consortium, University of California Press (2012) (in press).

Philadelphia Courts – Structural Bias and Reverse Bifurcation

October 27th, 2011

When I studied federal courts in law school, some of the most interesting cases involving federal diversity and removal jurisdiction were decisions of the Third Circuit, on appeals from the Eastern District of Pennsylvania.  At the time, it did not occur to me that there must be strong incentives to push the boundaries of federal jurisdiction so hard to avoid state court.  A few years later, when I started to try cases in the Philadelphia County Court of Common Pleas, I “got it.”

You probably do not need to have a doctorate in economics to object when someone pisses on you, and calls it rain.  Still, it is comforting to have corroboration from someone with a doctorate.

Joshua D. Wright, a professor of law and economics at George Mason University School of Law, has written up the results of a study, “Are Plaintiffs Drawn to Philadelphia’s Civil Courts? An Empirical Examination,” published by the International Center for Law & Economics.  Professor Wright finds that the Philadelphia civil court system contains significant structural biases, which make the Philadelphia Court of Common Pleas (PCCP) a magnet for plaintiffs from around the country, and which inflate verdicts and settlements in civil cases.

One such structural bias is the existence of a Complex Litigation Center.  Some of the judges and administrators in charge of the Center have seen their role as rainmakers, bringing litigation business to Philadelphia.  Of course, proper venue and the doctrine of forum non conveniens may tend to get in the way of such an official business plan.

Another structural bias in the Philadelphia courts is the automatic, unthinking use of a procedure called reverse bifurcation.  Typically, bifurcation requires plaintiff to establish liability before proceeding to causation and damages, but reverse bifurcation puts causation and damages first.  This bizarre procedure was first urged by Johns-Manville lawyers in asbestos litigation, to avoid the shame and shock of having the jury hear their company’s liability case at the same time that the jury heard the evidence whether plaintiff was injured.  Reverse bifurcation gave them a chance to sanitize the trial on medical causation.  If they lost an up-or-down medical issue, the Johns-Manville lawyers could settle to avoid having the ugly liability evidence shared with the jury.

Johns-Manville soon filed for bankruptcy, but the plaintiffs’ bar learned that reverse bifurcation was a wonderful procedure.  They could get a verdict after three days of trial, and the second phase of the case was virtually untriable by the defense.  Why?  Because the plaintiffs’ lawyers found that they could inject their liability case surreptitiously into the first phase.  Claiming relevance to fear and emotional distress, plaintiffs’ counsel asked their clients whether they ever contemplated the horror of living with the increased risks of disease they now supposedly faced, and plaintiffs responded that they had no idea of the risks when they worked at the shipyards, refineries, or other workplaces.  In summation, plaintiffs’ counsel would slip in something like “After the last few days, you, members of the Jury, now know more about asbestos than my client did after 30 years of working in the shipyard.”  Defense objections and motions in limine were studiously ignored.  Who needs to prove a failure to warn, when you can simply assert it?

Egregiously, the reverse bifurcation procedure stuck, even when defendants, unlike Johns-Manville, had potent defenses.  Some Philadelphia judges, in second phase trials, tolerate indignant arguments from plaintiff’s counsel, to the effect that first the (recalcitrant) defendant caused this injury to his client, and now that defendant wants to take away plaintiff’s money, which the jury so thoughtfully, carefully, and justly awarded in the first phase.  Winning a second phase trial, in a case that has been reverse bifurcated, is a bit like cleaning out the Augean stables.

Some judges even went so far, in phase II liability trials, as to sever the crossclaims of the non-settling defendant.  This procedural maneuver required the defendant to post a bond for the entire judgment, without any offsets, in order to pursue an appeal.  The lack of a final judgment seemed not to disturb anyone other than the victimized defendant.

Not all Philadelphia judges were keen on these inequitable procedures.  I recall trying an asbestos case in front of Judge Levan Gordon, who refused to be bullied by the head of the Complex Litigation Center into reverse bifurcating asbestos trials.  (O’Donnell v. Celotex Corp., PCCP July Term 1982, No. 1619; May 1989)  Judge Gordon had his own strong medicine for defendants:  he tried cases on all issues, with no bifurcation of punitive damages.  Judge Gordon tried my case, which was prosecuted by Sandy Byrd, now a Philadelphia judge, straight through.  Because my adversary, Mr. Byrd, insisted on pressing negligence and punitive damages, I was able to try an empty-chair defense against the United States government, which owned and ran the Philadelphia Naval Shipyard, where plaintiff worked.  I was also able to put on a state-of-the-art defense.  And my jury saw what juries rarely see in Philadelphia:  the complete story.  They refused to hold my clients responsible for what really was the negligence of the government, even though I had a weak medical defense.

The head of the Complex Litigation Center was furious that Judge Gordon had taken up three weeks of courtroom time.  Her Honor was deaf to explanations that it was plaintiffs’ choice to pursue negligence and punitive damages, which claims opened the door to the sophisticated intermediary and state-of-the-art defenses.  Somehow it was the defendants’ fault for tying up a courtroom, and for derailing the all-important case statistics.

Then, as now, there are some excellent judges in Philadelphia, who are intent on trying cases fairly and impartially, with even-handed procedures.  And then there are other judges, who have helped create Philadelphia’s reputation, and the statistics that support Professor Wright’s conclusions.

Manufacturing Certainty

October 25th, 2011

Steven Wodka is a plaintiffs’ lawyer, based in New Jersey, who has worked closely, for many years, with Dr. David Michaels, as his paid expert witness.  Yes, the David Michaels who is now the head of the Occupational Safety and Health Administration (OSHA).

When Michaels was nominated for his current post, the Democratic majority leaders in the Senate protected him from hearings, which would have revealed Michaels’ deep and disturbing conflicts of interest.  The Democratic Senators succeeded in their efforts, and Michaels was confirmed as an Assistant Secretary in the Department of Labor, on a voice vote, without hearings.

Mr. Wodka may have lost his friend, colleague, and expert witness to OSHA, but at the same time he gained an ally in his litigation efforts on behalf of plaintiffs.  Wodka, who litigates in New Jersey and elsewhere, was troubled by court decisions holding that OSHA’s Hazard Communication regulations preempted his state-law tort claims. See, e.g., Bass v. Air Products, 2006 WL 1419375 (N.J. App. Div. 2006) (holding that OSHA’s hazard communication standard was a comprehensive regulatory scheme that preempted state tort failure-to-warn claims for warnings that complied with federal regulations).

Wodka may have lost his expert witness (for a while), but he gained an inside track to the Department of Labor.  Disappointed by New Jersey’s appellate court, Wodka sought an advisory opinion from the Department of Labor on the preemptive effect of HazCom.  See David Schwartz, “Solicitor Says Hazard Communication Rule Does Not Preempt Failure-to-Warn Lawsuits,” BNA (October 20, 2011).

The Department of Labor, now under the control of his friend and paid expert witness, Dr. Michaels, did not disappoint.  Solicitor of Labor M. Patricia Smith, in a letter dated October 18, 2011, wrote Mr. Wodka that, notwithstanding what the appellate courts may have told him, he was correct after all.  OSHA’s Hazard Communication Standard, 29 C.F.R. § 1910.1200(a)(2), does not, according to the Department, preempt state tort claims alleging failures to warn.

The solicitor relied upon Section 4(b)(4) of the OSH Act, which states that nothing in the Act is intended to “enlarge or diminish or affect in any other manner the common law or statutory rights, duties or liabilities of employers and employees under any law with respect to injuries, diseases, or death arising out of, or in the course of, employment.”  The OSH Act, however, in making this disclaimer, was focused on the employer-employee relationship, with its attendant duties, rights, and obligations.  Failure-to-warn claims arise out of laws, whether statutory or common law, designed to protect consumers.  The solicitor’s analysis misses the key point that a comprehensive scheme, such as the HazCom standard and its regulations, applies to strangers to the employer-employee relationship, and constrains the nature and content of warning communications to the employees of purchasers of chemical products and raw materials.

The solicitor was clear that “a definitive determination of conflict can only be made based on the particulars of each case.”  Smith Letter, at footnote 4.  This slight speedbump did not slow down Mr. Wodka, who was quoted by the BNA as saying that “[t]his letter makes the question clear,” and “I’m already going to move for reconsideration of one of my cases based on this letter.”

It is good to have friends in powerful places.

Of course, there is a good deal of irony involved in this story.  David Michaels has made a career out of scolding industry over conflicts of interest.  Michaels’ book, Doubt is Their Product, gets waved around in courtrooms, when defense expert witnesses testify that the plaintiffs’ evidence fails to show that a product causes harm, or has caused plaintiff’s harm.  Some people may find this scolding a little irritating, especially from someone, like Michaels, who fails to disclose his own significant conflicts of interest, from monies received as a testifying and consulting expert witness, and from running an organization, The Project on Scientific Knowledge and Public Policy (SKAPP), bankrolled by the plaintiffs’ counsel in the silicone gel breast implant litigation.

Doubt is not such a bad thing in the face of uncertain and inconclusive evidence.  We could use more doubt, and open-minded thought.  As Bertrand Russell wrote some years ago:

“The biggest cause of trouble in the world today is that the stupid people are so sure about things and the intelligent folks are so full of doubts.”

Late Professor Berger’s Introduction to the Reference Manual on Scientific Evidence

October 23rd, 2011

In several posts, I have addressed isolated issues in Professor Margaret Berger’s introductory chapter to the third edition of the Reference Manual on Scientific Evidence (RMSE 3d).  Let me back up and address the bigger, more disturbing picture.

Professor Berger was a well-respected evidence scholar, who had written about Daubert issues in her lifetime.  See generally Edward K. Cheng, “Introduction: Festschrift in Honor of Margaret A. Berger,” 75 Brooklyn L. Rev. 1057 (2010).  Along with Judge Jack Weinstein, she was an author of Weinstein’s Evidence and Cases and Materials on Evidence.  Berger was intellectually opposed to the Daubert enterprise.  See, e.g., Margaret A. Berger & Aaron D. Twerski, “Uncertainty and Informed Choice:  Unmasking Daubert,” 104 Mich. L. Rev. 257 (2005).  This opposition is clearly reflected in Berger’s chapter in the RMSE 3d.

Over the course of several years, Berger organized and supervised a series of symposia, Science for Judges.  Berger’s symposia involved many respected authors as well as some highly partisan, pro-plaintiff scholars.  Berger also participated in some of the four so-called Coronado Conferences, which featured discussions, with subsequent publications, on expert witness issues.  Both Science for Judges and the Coronado Conferences were sponsored by SKAPP, the Project on Scientific Knowledge and Public Policy, an anti-Daubert advocacy group, headed up mostly by plaintiffs’ expert witnesses.

According to SKAPP’s website, the organization enjoyed past support from the Common Benefit Trust, a fund established pursuant to a court order in the Silicone Gel Breast Implant Products Liability litigation.  With this description, SKAPP has consistently misrepresented the source of its funding.  What SKAPP hides is that this “fund” is nothing more than plaintiffs’ counsel’s walking-around money from MDL 926, which involved, ironically, claims for autoimmune disease allegedly caused by silicone gel breast implants.  This MDL collapsed after 1999, when court-appointed experts and then the Institute of Medicine declared that the scientific evidence did not support plaintiffs’ causal claims.  See Judge Jack B. Weinstein, “Preliminary Reflections on Administration of Complex Litigation,” 2009 Cardozo L. Rev. de novo 1, 14 (2009) (describing plaintiffs’ expert witnesses in silicone litigation as “charlatans”; “[t]he breast implant litigation was largely based on a litigation fraud. … Claims—supported by medical charlatans—that enormous damages to women’s systems resulted could not be supported.”)

Flush with silicone MDL “common benefit” money, plaintiffs’ counsel helped fund SKAPP, rather than returning the money to their clients.  See Klier v. Elf Atochem North America, Inc., 2011 U.S. App. LEXIS 19650 (5th Cir. 2011) (holding that the district court abused its discretion in distributing residual funds from a class action over arsenic exposure to charities; directing that residual funds be distributed to class members with manifest personal injuries).  As with all common benefit funds in multi-district litigations, the fund in MDL 926 was established pursuant to a court order, but it was certainly not money from the federal courts; SKAPP’s funding was from plaintiffs’ lawyers, who had been rebuffed and refuted by science in the courtroom.  Some of those plaintiffs’ lawyers used their left-over “walking-around” money, laundered through SKAPP, to help sponsor anti-Daubert articles in several fora, including Berger’s Science for Judges symposia, and the Coronado Conferences.  See “SKAPP A LOT” (April 30, 2010).

Given the misleading propaganda from SKAPP about the sources of its funding, Professor Berger may well have been misled, along with other scholars who participated at SKAPP-funded events.  On the other hand, I would have hoped that these scholars were aware that the “Common Benefit Trust, a fund established pursuant to a court order in the Silicone Gel Breast Implant Liability litigation,” was nothing more than plaintiffs’ counsels’ spending allowance for advancing their own litigation goals.

Back in 2000, Professor Berger wrote a similar introductory chapter on admissibility of expert witness testimony in the second edition of the RMSE.  The second edition’s chapter, however, was decidedly less partisan, with relatively neutral presentations and discussions of the leading Supreme Court and lower court decisions.  Berger’s opposition to judicial gatekeeping was subdued and in check, as befitted a neutral introduction in a volume published by the Federal Judicial Center.

The third edition of the RMSE features a very different introduction by Professor Berger.  The gloves are off, and so is any pretense at non-partisanship.

Berger, in her chapter in RMSE 3d, provides a detailed discussion of Daubert, Joiner, Kumho Tire, and Weisgram, but remarkably, Berger offers virtually no discussion of the amendments to, and revisions of, Rule 702, adopted in 2000, after she wrote the RMSE 2d chapter.  The actual text of the Rule, which is now the operative, controlling legal language, is not set out in her RMSE 3d chapter; nor does Berger present any of the discussion from the Advisory Committee notes on the scope and purpose of the 2000 revision.  Instead, Berger reports, and acquiesces in, a loose practice, employed by some trial courts, of continuing to cite and to rely upon Daubert, or Circuit-level pre-2000 precedent, without mentioning the new Rule.  Later in the chapter, Berger does discuss a specific-causation decision by Judge Jack Weinstein, In re Zyprexa, 2009 WL 1357236 (E.D.N.Y. May 12, 2009), in which he excluded the expert witness.  A footnote makes clear that Judge Weinstein held that the witness’s testimony failed the three prongs of the new Rule 702.  RMSE 3d at 24 & n. 64.  This discussion obscures as much as it illustrates that the Rule, as amended, is the operative language.  The chapter fails to note that Judge Weinstein’s practice of citing the actual Rule is the correct one as a matter of legal process.  Berger is not shy elsewhere about criticizing trial judges’ practices, so her passivity in connection with the disregard of a statutory revision of Rule 702 is difficult to understand, except as a way to dodge the mandates of the revised Rule.

The second edition had a lengthy discussion of Rosen v. Ciba-Geigy Corp., 78 F.3d 316, 319 (7th Cir.), cert. denied, 519 U.S. 819 (1996), in which Judge Posner famously declared that “the courtroom is not the place for scientific guesswork, even of the inspired sort. Law lags science; it does not lead it.”  See Margaret A. Berger, “The Supreme Court’s Trilogy on the Admissibility of Expert Testimony,” RMSE 2d 9, 24 (2000).  In the RMSE 3d, Rosen is gone, and in its place we have the philosophy of Milward, with its radical leveling of evidence and expert witness opinion.  Remarkably, the cite to Milward had to have been added after Professor Berger’s death, but she no doubt would have approved.  There are no counterbalancing citations to important decisions, reversing trial judges for inadequate gatekeeping, such as Tamraz v. Lincoln Elec. Co., 620 F.3d 665 (6th Cir. 2010), cert. den., ___ U.S. ___ (2011), which were decided before Professor Berger’s death.

As an academic scholar and a citizen, Berger was entitled to her views about Daubert.  In her lifetime, she wrote and spoke about those views, sincerely and passionately.  Her writings and lectures helped provoke an important discussion on the role of science in the courtroom.  Her selection, however, to introduce a National Research Council volume on science in the courtroom seems dubious, given her partisan views.  One can only imagine the hue and cry if, say, Peter Huber (of Galileo’s Revenge fame) had been selected to write the volume’s introduction to the law of expert witness admissibility, or if tobacco companies had funded Science for Judges seminars, with money laundered through not-for-profit organizations.

Libertine View of Expert Witness Admissibility

Berger complains that the Federal Rules of Evidence were intended to be interpreted liberally in favor of the admissibility of evidence.  RMSE 3d at 36 (“the preference for admissibility contained both in the Federal Rules of Evidence and in Daubert itself”).  The word “liberal” does not appear in the Federal Rules of Evidence.  Instead, the Rules contain an explicit statement of how judges must construe and apply their evidentiary provisions:

“These rules shall be construed to secure fairness in administration, elimination of unjustifiable expense and delay, and promotion of growth and development of the law of evidence to the end that the truth may be ascertained and proceedings justly determined.”

Rule 102 (“Purpose and Construction”).

Berger does not, and cannot, explain how a “let it all in” approach helps to secure fairness, to eliminate unjustifiable expense and delay, or to produce just determinations.  This would be a most illiberal result.  The truth will not be readily ascertained if expert witnesses are permitted to pass off hypotheses and ill-founded conclusions as scientific knowledge.

In any event, we should resist the mechanical, outcome-determinative interpretation of “liberal.”  Bertrand Russell presented a much more compelling understanding of what it means to have a liberal outlook in human enterprises:

“The essence of the liberal outlook lies not in what opinions are held, but in how they are held: instead of being held dogmatically, they are held tentatively, and with a consciousness that new evidence may at any moment lead to their abandonment. This is the way opinions are held in science, as opposed to the way in which they are held in theology.”

Bertrand Russell, “Philosophy and Politics,” in Unpopular Essays 15 (N.Y. 1950) (emphasis in original).  Lord Russell’s admonition counsels greater, not less, skepticism in the liberal outlook on opinions that lie at the fringes, and beyond the fringes, of human knowledge.

Now, it is true that the Supreme Court, back in 1993, observed that the Rules’ “basic standard of relevance … is a liberal one.”  Daubert v. Merrell Dow Pharm., Inc., 509 U.S. 579, 587, 588 (1993).  Similarly, the Court spoke of the Rules’ general “liberal thrust” in relaxing barriers to opinion testimony.  But in adopting an epistemic standard, rather than a nose-counting, sociological standard of “general acceptance,” the Court did, in fact, liberalize the rules of admissibility for expert witness opinions.  Implicit in Professor Berger’s critique is an unhappiness with both the liberal epistemic approach and the conservative general-acceptance approach.  The principal remaining option apparently would be Ferebee’s libertine, “let it all in” approach, which was rejected by the Supreme Court and by Congress.

Serious Omissions in Berger’s “Admissibility of Expert Testimony”

A. Short Shrifting The Rules

I have previously written about the complete omission of Rule 703 and its role in ensuring the trustworthiness of expert witness opinion.  See “New Reference Manual on Scientific Evidence Short Shrifts Rule 703” (Oct. 16, 2011).  And above, I have explored how Professor Berger studiously ignored the amended Rule 702 itself, in order to hold on to inconsistent dicta in cases that predated the statutory amendment.

The Federal Rules of Evidence are statutory law.  In 1972, the Rules were adopted by order of the Supreme Court, and were transmitted by the Chief Justice to Congress.  By law, the proposed rules “shall have no force or effect except to the extent, and with such amendments, as they may be expressly approved by Act of Congress.”  Pub. L. 93-12, Mar. 30, 1973, 87 Stat. 9.  The Supreme Court has made clear that the Federal Rules of Evidence are legislatively enacted and that the Court must interpret them as it would any statute.  See, e.g., Daubert v. Merrell Dow Pharm., Inc., 509 U.S. 579, 587 (1993) (courts must “interpret the legislatively enacted Federal Rules of Evidence as [they] would any statute”); United States v. Salerno, 505 U.S. 317, 322 (1992) (refusing to ignore the plain language of Rule 802 and 803; “To respect [the legislature’s] determination, we must enforce the words that it enacted.”); Beech Aircraft Corp. v. Rainey, 488 U.S. 153, 163 (1988).

One of the key lessons of Daubert itself was that the Frye rule did not survive the enactment of the Federal Rules of Evidence, given the lack of any reference to the Frye rule in Article VII of the Rules.  The Rules trump precedent.  See David Bernstein, “Courts Refusing to Apply Federal Rule of Evidence 702” (May 6, 2006) (arguing that the language of the 2000 amended Rule 702 trumps the various dicta scattered about the Daubert quartet as a matter of legal process).  But see Glen Weissenberger, “The Proper Interpretation of the Federal Rules of Evidence: Insights from Article VI,” 30 Cardozo L. Rev. 4 (2009) (arguing, admittedly contrary to Supreme Court precedent and the majority of evidence scholars, that the Federal Rules of Evidence are something more akin to a codification of common law, and that the usual canons of statutory interpretation do not fully apply).

B.  Ignoring the Hierarchy of Evidence

Professor Berger not only omits consideration of the reasonableness of relying upon individual scientific studies; she also fails to give any consideration to the hierarchy of evidence, which distinguishes between and among study designs.  To some extent, the RMSE 3d chapters on epidemiology and on medical testimony remedy this failure, but Berger’s chapter is thus badly out of step with key chapters in the RMSE 3d, as well as with how science evaluates claims of causality and reaches conclusions of causality (or not) from multiple studies of varying designs and quality.  See RMSE 3d, at 561 (noting that certain study designs, such as cross-sectional and ecological studies, are frequently unsuitable for supporting inferences of causal association); id. at 723-34 (describing the hierarchy of evidence, in which some studies may raise interesting questions without offering much in the way of answering those questions).  The result of Berger’s treatment is that evidence is “leveled,” allowing litigants to escape meaningful gatekeeping as long as they can point to some study, regardless of the study’s invalidity or poor quality.

Berger’s Concerns About Credibility

A. The Credibility of Theories

Berger worries that the Rule 702 gatekeeping process leads to courts’ making credibility determinations about expert witnesses and their scientific theories.  RMSE 3d at 36.  Surely federal judges have at least the ability to distinguish analytically between the credibility of witnesses and the scientific opinions that are proffered.  As for the credibility of experts’ theories, I confess it is difficult to understand what Berger may have had in mind other than the actual requirements of Rule 702 itself.  If the proffered testimony is not based upon:

1. sufficient facts or data,

2. the product of reliable principles and methods, and

3. a reliable application of principles and methods to the facts of the case

then, no doubt, the testimony will be unreliable and incredible.  The clear lesson of expert witness litigation, and of science in the law generally, is that qualified, and apparently credible, expert witnesses sometimes advance opinions and conclusions that fail one or more of the requirements of Rule 702.  Berger seems to have conflated reliability and credibility as a way of waving judges off any searching inquiry into the former.

B. The Credibility of Defense Expert Witnesses

Without any substantial support in case law or in the Rules, Professor Berger posits a concern over whether courts should permit a broad inquiry into the defense expert witnesses’ relationships with the defendant.  RMSE 3d at 21-22.  Berger worries that defendants will support their Daubert challenges with testimony from academics from “highly respected academic institution[s],” which likely receive donations and research grants from private corporations.

The posited concern is curious because it assumes that the “Daubert” challenge is to the plaintiff’s expert witness.  Accepting the assumption, why should the concern not be whether the plaintiffs’ expert witnesses are compromised by their own biases, whether financial or positional?  Berger’s assumption ignores the fact that the credibility and qualifications of expert witnesses are generally not at issue in a challenge to the reliability of proffered opinion testimony.

Berger’s entire discussion of credibility is a rather fanciful and far-fetched way of injecting credibility into Rule 702 determinations, in order to argue that such determinations must be left for the ultimate trier of fact, the jury, which is charged with resolving credibility issues.

Berger’s discussion is itself an incredibly lopsided and biased attack on defendants’ expert witnesses. Her discussion is also beside the point of the Rule 702 and 703 evidentiary issues.  Courts should be focused on the reasonableness of the challenged expert witness’s reliance upon facts and data, and on whether the witness has used the methods of science in a reliable way to reach his or her opinions.  Furthermore, there is a stark asymmetry between plaintiffs and defendants, and their expert witnesses, with respect to litigation bias.  Defense counsel and defense expert witnesses (assuming that they are financially compensated) stand to lose by having courts exclude plaintiffs’ expert witnesses and dismiss plaintiffs’ claims.  Plaintiffs’ expert witnesses and plaintiffs’ counsel (collectively, the litigation industry) have everything to gain and nothing to lose by abrogating the gatekeeping process.  Professor Berger’s introduction to expert witness admissibility in RMSE 3d, wittingly or not, attempts to aid that litigation industry.

New Reference Manual on Scientific Evidence Short Shrifts Rule 703

October 16th, 2011

In “RULE OF EVIDENCE 703 — Problem Child of Article VII” (Sept. 19, 2011), I wrote about how Federal Rule of Evidence 703 is generally ignored and misunderstood in current federal practice.  The Supreme Court, in deciding Daubert, shifted the focus to Rule 702, as the primary tool to deploy in admitting, as well as limiting and excluding, expert witness opinion testimony.  The Court’s decision, however, did not erase the need for an additional, independent rule to control the quality of the inadmissible materials upon which expert witnesses rely.  Indeed, Rule 702, as amended in 2000, incorporated much of the learning of the Daubert decision, and then some, but it does not address the starting place of any scientific opinion:  the data, the analyses (usually statistical) of the data, and the reasonableness of relying upon those data and analyses.  Instead, Rule 702 asks whether the proffered testimony is based upon:

  1. sufficient facts or data,
  2. the product of reliable principles and methods, and
  3. a reliable application of principles and methods to the facts of the case

Noticeably absent from Rule 702, in its current form, is any directive to determine whether the proffered expert witness opinion is based upon facts or data of the sort upon which experts in the pertinent field would reasonably rely.  Furthermore, Daubert did not address the fulsome importation and disclosure of untrustworthy hearsay opinions through Rule 703.  See Problem Child (discussing the courts’ failure to appreciate the structure of peer-reviewed articles, and the need to ignore the discussion and introduction sections of such articles, which often contain speculative opinions and comments).  See also Luciana B. Sollaci & Mauricio G. Pereira, “The introduction, methods, results, and discussion (IMRAD) structure: a fifty-year survey,” 92 J. Med. Libr. Ass’n 364 (2004); Montori, et al., “Users’ guide to detecting misleading claims in clinical research reports,” 329 Br. Med. J. 1093, 1093 (2004) (advising readers on how to avoid being misled by published literature, and counseling readers to “Read only the Methods and Results sections; bypass the Discussion section.”) (emphasis added).

Given this background, it is disappointing but not surprising that the new Reference Manual on Scientific Evidence severely slights Rule 703.  A word search in the PDF version, or a glance at the index at the end of the book, tells the story:  there are five references to Rule 703 in the entire RMSE!  The statistics chapter has an appropriate but fleeting reference:

“Or the study might rest on data of the type not reasonably relied on by statisticians or substantive experts and hence run afoul of Federal Rule of Evidence 703. Often, however, the battle over statistical evidence concerns weight or sufficiency rather than admissibility.”

RMSE 3d at 214. At least this chapter acknowledges, however briefly, the potential problem that Rule 703 poses for expert witnesses.  The chapter on survey research similarly discusses how the data collected in a survey may “run afoul” of Rule 703.  RMSE 3d at 361, 363-364.

The chapter on epidemiology takes a different approach by interpreting Rule 703 as a rule of admissibility of evidence:

“An epidemiologic study that is sufficiently rigorous to justify a conclusion that it is scientifically valid should be admissible,[184] as it tends to make an issue in dispute more or less likely.[185]”

Id. at 610.  This view is mistaken.  Sufficient rigor in an epidemiologic study is certainly needed for reliance by an expert witness, but such rigor does not make the study itself admissible; the rigor simply permits the expert witness to rely upon a study that is typically several layers of inadmissible hearsay.  See Reference Manual on Scientific Evidence v3.0 – Disregarding Study Validity in Favor of the “Whole Gamish” (Oct. 14, 2011) (discussing the argument put forward by the epidemiology chapter for considering Rule 703 as an exception to the rule against hearsay).

While the treatment of Rule 703 in the epidemiology chapter is troubling, the introductory chapter on the admissibility of expert witness opinion testimony, by the late Professor Margaret Berger, really sets the tone and approach for the entire volume. See Berger, “The Admissibility of Expert Testimony,” RMSE 3d 11 (2011).  Professor Berger never mentions Rule 703 at all!  Gone and forgotten. The omission is not, however, an oversight.  Rule 703, with its requirement that each study relied upon qualify as “reasonably relied upon,” as measured by what experts in the appropriate discipline would deem reasonable reliance, is the refutation of Berger’s argument that somehow a pile of weak, flawed studies, taken together, can yield a scientifically reliable conclusion. See “Whole Gamish” (Oct. 14th, 2011).

Rule 703 is not merely an invitation to trial judges; it is a requirement to look at the discrete studies relied upon to determine whether the building blocks are sound.  Only then can the methods and procedures of science begin to analyze the entire evidentiary display to yield reliable scientific opinions and conclusions.

Reference Manual on Scientific Evidence v3.0 – Disregarding Study Validity in Favor of the “Whole Gamish”

October 14th, 2011

There is much to digest in the new Reference Manual on Scientific Evidence, third edition (RMSE 3d).  Much of it is solid information on the individual scientific and technical disciplines covered.  Although the information is easily available from other sources, there is some value in collecting the material in a single volume for the convenience of judges.  Of course, given that this information is provided to judges from an ostensibly neutral, credible source, lawyers will naturally focus on what is doubtful or controversial in the RMSE.

I have already noted some preliminary concerns, however, with some of the comments in the Preface, by Judge Kessler and Dr. Kassirer.  See “New Reference Manual’s Uneven Treatment of Conflicts of Interest.”  In addition, there is a good deal of overlap among the chapters on statistics, epidemiology, and medical testimony.  This overlap is at first blush troubling because the RMSE has the potential to confuse and obscure issues by having multiple authors address them inconsistently.  This is an area where reviewers should pay close attention.

From first looks at the RMSE 3d, there is a good deal of equivocation between encouraging judges to look at scientific validity, and discouraging them from any meaningful analysis by emphasizing inaccurate proxies for validity, such as conflicts of interest.  (As I have pointed out, the new RMSE did not do quite so well in addressing its own conflicts of interest.  See “Toxicology for Judges – The New Reference Manual on Scientific Evidence (2011).”)

The strengths of the chapter on statistical evidence, updated from the second edition, remain, as do some of the strengths and flaws of the chapter on epidemiology.  I hope to write more about each of these important chapters at a later date.

The late Professor Margaret Berger has an updated version of her chapter from the second edition, “The Admissibility of Expert Testimony,” RMSE 3d 11 (2011).  Berger’s chapter has a section criticizing “atomization,” a process she describes pejoratively as a “slicing-and-dicing” approach.  Id. at 19.  Drawing on the publications of Daubert-critic Susan Haack, Berger rejects the notion that courts should examine the reliability of each study independently. Id. at 20 & n. 51 (citing Susan Haack, “An Epistemologist in the Bramble-Bush: At the Supreme Court with Mr. Joiner,” 26 J. Health Pol. Pol’y & L. 217–37 (1999)).  Berger contends that the “proper” scientific method, as evidenced by works of the International Agency for Research on Cancer, the Institute of Medicine, the National Institutes of Health, the National Research Council, and the National Institute for Environmental Health Sciences, “is to consider all the relevant available scientific evidence, taken as a whole, to determine which conclusion or hypothesis regarding a causal claim is best supported by the body of evidence.” Id. at 19-20 & n.52.  This contention, however, is profoundly misleading.  Of course, scientists undertaking a systematic review should identify all the relevant studies, but some of the “relevant” studies may well be insufficiently reliable (because of internal or external validity issues) to answer the research question at hand. All the cited agencies, and other research organizations and researchers, exclude studies that are fundamentally flawed, whether as a result of bias, confounding, erroneous data analyses, or related problems.  Berger cites no support for the remarkable suggestion that scientists do not make “reliability” judgments about available studies when assessing the “totality of the evidence.”

Professor Berger, who had a distinguished career as a law professor and evidence scholar, died in November 2010.  She was no friend of Daubert, but remarkably her antipathy has outlived her.  Her critical discussion of “atomization” cites the notorious decision in Milward v. Acuity Specialty Products Group, Inc., 639 F.3d 11, 26 (1st Cir. 2011), which was decided four months after her passing. Id. at 20 n.51. (The editors note that the published chapter was Berger’s last revision, with “a few edits to respond to suggestions by reviewers.”)

Professor Berger’s contention about the need to avoid assessments of individual studies in favor of the whole gamish must also be rejected because Federal Rule of Evidence 703 requires that each study considered by an expert witness “qualify” for reasonable reliance by virtue of the study’s containing facts or data that are “of a type reasonably relied upon by experts in the particular field in forming opinions or inferences upon the subject.”  One of the deeply troubling aspects of the Milward decision is that it reversed the trial court’s sensible decision to exclude a toxicologist, Dr. Martyn Smith, who outran his headlights on issues having to do with a field in which he was clearly inexperienced – epidemiology.

Scientific studies, and especially epidemiologic studies, involve multiple levels of hearsay.  A typical epidemiologic study may contain hearsay leaps from patient to clinician, to laboratory technicians, to specialists interpreting test results, back to the clinician for a diagnosis, to a nosologist for disease coding, to a national or hospital database, to a researcher querying the database, to a statistician analyzing the data, to a manuscript that details data, analyses, and results, to editors and peer reviewers, back to study authors, and on to publication.  Those leaps do not mean that the final results are untrustworthy, only that the study itself is not likely admissible in evidence.

The inadmissibility of scientific studies is not problematic, because Rule 703 permits testifying expert witnesses to formulate opinions based upon facts and data that are not themselves admissible in evidence. The distinction between relied-upon studies and admissible studies is codified in the Federal Rules of Evidence, and in virtually every state’s evidence law.

Referring to studies, without qualification, as admissible in themselves is wrong as a matter of evidence law.  The error has the potential to encourage carelessness in gatekeeping expert witnesses’ opinions for their reliance upon inadmissible studies.  The error is doubly wrong if this approach to expert witness gatekeeping is taken as license to permit expert witnesses to rely upon any marginally relevant study of their choosing.  It is therefore disconcerting that the new Reference Manual on Scientific Evidence (RMSE 3d) fails to make the appropriate distinction between the admissibility of studies and the admissibility of expert witness opinion that has reasonably relied upon appropriate studies.

Consider the following statement from the chapter on epidemiology:

“An epidemiologic study that is sufficiently rigorous to justify a conclusion that it is scientifically valid should be admissible,[184] as it tends to make an issue in dispute more or less likely.[185]”

RMSE 3d at 610.  Curiously, the authors of this chapter have ignored Professor Berger’s caution against slicing and dicing, and speak to a single study’s ability to justify a conclusion. The authors of the epidemiology chapter seem to be stressing that scientifically valid studies should be admissible.  The footnote emphasizes the point:

See DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941, 958 (3d Cir. 1990); cf. Kehm v. Procter & Gamble Co., 580 F. Supp. 890, 902 (N.D. Iowa 1982) (“These [epidemiologic] studies were highly probative on the issue of causation—they all concluded that an association between tampon use and menstrually related TSS [toxic shock syndrome] cases exists.”), aff’d, 724 F.2d 613 (8th Cir. 1984). Hearsay concerns may limit the independent admissibility of the study, but the study could be relied on by an expert in forming an opinion and may be admissible pursuant to Fed. R. Evid. 703 as part of the underlying facts or data relied on by the expert. In Ellis v. International Playtex, Inc., 745 F.2d 292, 303 (4th Cir. 1984), the court concluded that certain epidemiologic studies were admissible despite criticism of the methodology used in the studies. The court held that the claims of bias went to the studies’ weight rather than their admissibility. Cf. Christophersen v. Allied-Signal Corp., 939 F.2d 1106, 1109 (5th Cir. 1991) (“As a general rule, questions relating to the bases and sources of an expert’s opinion affect the weight to be assigned that opinion rather than its admissibility. . . .”).”

RMSE 3d at 610 n.184 (emphasis in bold, added).  This statement, that studies relied upon by an expert in forming an opinion may be admissible pursuant to Rule 703, is unsupported by Rule 703 and the overwhelming weight of case law interpreting and applying the rule.  (Interestingly, the authors of this chapter seem to abandon their suggestion that studies relied upon “might qualify for the learned treatise exception to the hearsay rule, Fed. R. Evid. 803(18), or possibly the catchall exceptions, Fed. R. Evid. 803(24) & 804(5),” which was part of their argument in the Second Edition of the RMSE.  RMSE 2d at 335 (2000).)  See also RMSE 3d at 214 (discussing statistical studies as generally “admissible,” but acknowledging that admissibility may be no more than permission to explain the basis for an expert’s opinion).

The cases cited by the epidemiology chapter, Kehm and Ellis, both involved “factual findings” in public investigative or evaluative reports, which were independently admissible under Federal Rule of Evidence 803(8)(C).  See Ellis, 745 F.2d at 299-303; Kehm, 724 F.2d at 617-18.  As such, the cases hardly support the chapter’s suggestion that Rule 703 is a rule of admissibility for epidemiologic studies.

Here the RMSE, in one sentence, confuses Rule 703 with an exception to the rule against hearsay, the rule that would otherwise prevent the studies themselves from being received in evidence.  The point is reasonably clear, however, that the studies “may be offered” only to explain an expert witness’s opinion.  Under Rule 705, that offer may also be refused. The offer, however, is to “explain,” not to have the studies admitted in evidence.

The RMSE is certainly not alone in advancing this notion that studies are themselves admissible.  Other well-respected evidence scholars lapse into this position:

“Well conducted studies are uniformly admitted.”

David L. Faigman, et al., Modern Scientific Evidence:  The Law and Science of Expert Testimony v.1, § 23:1, at 206 (2009).

Evidence scholars should not conflate the admissibility of the epidemiologic (or other) studies with the ability of an expert witness to advert to a study to explain his or her opinion.  The testifying expert witness really has no need to become a conduit for off-hand comments and opinions in the introduction or discussion section of relied-upon articles, and the wholesale admission of such hearsay opinions undermines the court’s control over opinion evidence.  Rule 703 authorizes reasonable reliance upon “facts and data,” not every opinion that creeps into the published literature.

The opinions, statements, and asseverations expressed on Tortini are my own, or those of invited guests, and these writings do not necessarily represent the views of clients, friends, or family, even when supported by good and sufficient reason.