New-Age Levellers – Flattening Hierarchy of Evidence

The Levelers were political dissidents in mid-17th-century England.  Among their causes, the Levelers advanced popular sovereignty, equal protection under the law, and religious tolerance.

The political agenda of the Levelers sounds quite noble to 21st-century Americans, but their ideals have no place in the world of science:  not all opinions or scientific studies are created equal; not all opinions are worthy of being taken seriously in scientific discourse or in courtroom presentations of science; and not all opinions should be tolerated, especially when they claim causal conclusions based upon shoddy or inadequate evidence.

In some litigations, legal counsel set out to obscure the important quantitative and qualitative distinctions among scientific studies.  Sometimes, lawyers find cooperative expert witnesses, willing to engage in hand waving about “the weight of the evidence,” where the weights are assigned post hoc, in a highly biased fashion.  No study (that favors the claim) left behind.  This is not science, and it is not how science operates, even though some expert witnesses, such as Professor Cranor in the Milward case, have been able to pass off their views as representative of scientific practice.

A sound appreciation of how scientists evaluate studies, and of why not all studies are equal, is essential to any educated evaluation of scientific controversies.  Litigants who face high-quality studies, with results inconsistent with their litigation claims, may well resort to “leveling” of studies.  This leveling may be advanced out of ignorance, but more likely the leveling is an attempt to snooker courts into treating exploratory, preliminary, and hypothesis-generating studies as equal or superior in value to hypothesis-testing studies.

Some of the leveling tactics that have become commonplace in litigation include asserting that:

  • All expert witnesses are the same;
  • All expert witnesses conduct the same analysis;
  • All expert witnesses read articles, interpret them, and offer opinions;
  • All expert witnesses are inherently biased;
  • All expert witnesses select the articles to read and interpret in line with their biases;
  • All epidemiologic studies are the same;
  • All studies are flawed; and
  • All opinions are, in the final analysis, subjective.

This leveling strategy can be seen in Professor Margaret Berger’s introduction to the Reference Manual on Scientific Evidence (RMSE 3d), where she supported an ill-defined “weight-of-the-evidence” approach to causal judgments.  See “Late Professor Berger’s Introduction to the Reference Manual on Scientific Evidence” (Oct. 23, 2011).

Other chapters in the RMSE 3d are at odds with Berger’s introduction.  The epidemiology chapter does not explicitly address the hierarchy of studies, but it does describe cross-sectional, ecological, and secular-trend studies as less able to support causal conclusions.  Cross-sectional studies are described as “rarely useful in identifying toxic agents,” RMSE 3d at 556, and as “used infrequently when the exposure of interest is an environmental toxic agent,” RMSE 3d at 561.  Cross-sectional studies are characterized as hypothesis-generating as opposed to hypothesis-testing, although not in those specific terms.  Id. (describing cross-sectional studies as providing valuable leads for future research).  Ecological studies are described as useful for identifying associations, but not helpful in determining whether such associations are causal; and ecological studies are identified as a fertile source of error in the form of the “ecological fallacy.”  Id. at 561–62.
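The ecological fallacy is easy to see with numbers.  The following sketch, using entirely hypothetical regional data, shows how exposure prevalence and disease rates can correlate perfectly across groups even though, within every group, exposed individuals face no greater risk than their unexposed neighbors:

```python
# A toy illustration of the ecological fallacy, with made-up numbers.
from statistics import correlation  # requires Python 3.10+

# Hypothetical regions: (proportion exposed, disease risk shared by ALL
# residents of the region, exposed or not).
regions = [
    (0.10, 0.02),
    (0.40, 0.05),
    (0.70, 0.08),
]

# Ecological (group-level) analysis: exposure prevalence and disease
# rates move in lockstep across regions.
r = correlation([e for e, _ in regions], [d for _, d in regions])
print(f"Group-level correlation: {r:.2f}")  # 1.00

# Individual-level truth: within each region, exposed and unexposed
# residents bear identical risk, so the risk ratio is exactly 1.0.
for _, risk in regions:
    print(f"Within-region risk ratio: {risk / risk:.1f}")  # 1.0
```

An expert witness who treated the group-level correlation as proof of individual causation would be committing precisely the inferential error the chapter warns against.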

The epidemiology chapter perhaps weakens its helpful description of the limited role of ecological studies by citing, with apparent approval, a district court that blinked at its gatekeeping responsibility to ensure that testifying expert witnesses did, in fact, rely upon “sufficient facts or data,” as well as upon studies that are “of a type reasonably relied upon by experts in the particular field in forming opinions or inferences upon the subject.” Rule 703. RMSE 3d at 561 n.34 (citing Cook v. Rockwell International Corp., 580 F. Supp. 2d 1071, 1095–96 (D. Colo. 2006), where the district court acknowledged the severe limitations of ecological studies in supporting causal inferences, but opined that the limitations went to the weight of the study). Of course, the insubstantial weight of an ecological study is precisely what may result in the study’s failure to support a causal claim.

The ray of clarity in the epidemiology chapter about the hierarchical nature of studies is muddled by an attempt to level epidemiology and toxicology.  The chapter suggests that there is no hierarchy of disciplines (as opposed to studies within a discipline).  RMSE 3d at 564 & n.48 (citing and quoting a symposium paper that “[t]here should be no hierarchy [among different types of scientific methods to determine cancer causation]. Epidemiology, animal, tissue culture and molecular pathology should be seen as integrating evidences in the determination of human carcinogenicity.” Michele Carbone et al., “Modern Criteria to Establish Human Cancer Etiology,” 64 Cancer Res. 5518, 5522 (2004)).  Carbone, of course, is best known for his advocacy of a viral cause (SV40) of human mesothelioma, a claim unsupported, and indeed contradicted, by epidemiologic studies.  His statement does not support the chapter’s leveling of epidemiology and toxicology, and Carbone is, in any event, an unlikely source to cite.

The epidemiology chapter undermines its own description of the role of study design in evaluating causality by pejoratively asserting that all epidemiologic studies are “flawed”:

“It is important to emphasize that all studies have ‘flaws’ in the sense of limitations that add uncertainty about the proper interpretation of the results.9 Some flaws are inevitable given the limits of technology, resources, the ability and willingness of persons to participate in a study, and ethical constraints. In evaluating epidemiologic evidence, the key questions, then, are the extent to which a study’s limitations compromise its findings and permit inferences about causation.”

RMSE 3d at 553.  This statement is actually a significant improvement over the second edition, where the authors of the epidemiology chapter asserted, without qualification:

“It is important to emphasize that most studies have flaws.”

RMSE 2d at 337.  The “flaws” language from the earlier chapter was used on occasion by courts that were set on ignoring competing interpretations of epidemiologic studies.  Since all or most studies are flawed, why bother figuring out what is valid and reliable?  Just let the jury sort it out.  This is not an aid to gatekeeping, but rather a prescription for allowing the gatekeeper to call in sick.

The current epidemiology chapter essentially backtracks from the harsh connotations of its use of the term “flaws” by now equating the term with “limitations.”  Flaws and limitations, however, are quite different from one another.  What is left out of the third edition’s description is the sense that there are indeed some studies so flawed that they must be disregarded altogether.  There may also be limitations in studies, especially observational studies, which is why the party with the burden of proof should generally not be allowed to proceed with only one or two epidemiologic studies.  Rule 702, after all, requires that an expert opinion be based upon “sufficient facts or data.”

The RMSE 3d chapter on medical evidence is a refreshing break from the leveling approach seen elsewhere.  Here at least, the chapter authors devote several pages to explaining the role of study design in assessing an etiological issue:

3. Hierarchy of medical evidence

With the explosion of available medical evidence, increased emphasis has been placed on assembling, evaluating, and interpreting medical research evidence.  A fundamental principle of evidence-based medicine (see also Section IV.C.5, infra) is that the strength of medical evidence supporting a therapy or strategy is hierarchical.

When ordered from strongest to weakest, systematic review of randomized trials (meta-analysis) is at the top, followed by single randomized trials, systematic reviews of observational studies, single observational studies, physiological studies, and unsystematic clinical observations.150 An analysis of the frequency with which various study designs are cited by others provides empirical evidence supporting the influence of meta-analysis followed by randomized controlled trials in the medical evidence hierarchy.151 Although they are at the bottom of the evidence hierarchy, unsystematic clinical observations or case reports may be the first signals of adverse events or associations that are later confirmed with larger or controlled epidemiological studies (e.g., aplastic anemia caused by chloramphenicol,152 or lung cancer caused by asbestos153). Nonetheless, subsequent studies may not confirm initial reports (e.g., the putative association between coffee consumption and pancreatic cancer).154

John B. Wong, Lawrence O. Gostin, and Oscar A. Cabrera, “Reference Guide on Medical Testimony,” RMSE 3d 687, 723–24 (2011).  The third edition’s chapter is a significant improvement over the second edition’s chapter on medical testimony, which did not mention the hierarchy of evidence.  Mary Sue Henifin, Howard M. Kipen, and Susan R. Poulter, “Reference Guide on Medical Testimony,” RMSE 2d 440 (2000).  Indeed, the only time the word “hierarchy” appeared in the entire second edition was in connection with the hierarchy of the federal judiciary.
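The ordering that Wong and his co-authors describe is concrete enough to state mechanically.  As a purely illustrative sketch (the evidentiary record below is hypothetical), the chapter’s hierarchy can be encoded and used to sort a mixed body of evidence from strongest to weakest:

```python
# The medical testimony chapter's evidence hierarchy, strongest first
# (RMSE 3d at 723-24); an illustration only.
HIERARCHY = [
    "systematic review of randomized trials (meta-analysis)",
    "single randomized trial",
    "systematic review of observational studies",
    "single observational study",
    "physiological study",
    "unsystematic clinical observation / case report",
]

def rank(design: str) -> int:
    """Return a design's position in the hierarchy (0 = strongest)."""
    return HIERARCHY.index(design)

# A hypothetical evidentiary record, sorted strongest to weakest.
record = [
    "single observational study",
    "unsystematic clinical observation / case report",
    "single randomized trial",
]
for design in sorted(record, key=rank):
    print(design)
```

The point of the exercise is not that ranking studies is algorithmic, but that the chapter supplies an ordering at all, which is more than Berger’s introduction or the epidemiology chapter is willing to do.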

The tension, contradictions, and differing emphases among the various chapters of the RMSE 3d point to an important “flaw” in the new edition.  The chapters appear to have been written largely in isolation, without much regard for what the other chapters contain.  The chapters overlap, and indeed contradict one another on key points.  Witness Berger’s rejection of the hierarchy of evidence, the epidemiology chapter’s inconstant presentation of the concept without mentioning it by name, and the medical testimony chapter’s explicit embrace and presentation of the hierarchical nature of medical study evidence.  Fortunately, the laissez-faire editorial approach allowed the disagreement to remain, without censoring any position, but the federal judiciary is not aided by the contradiction and tension among the approaches.

Given the importance of the concept, even the medical testimony chapter in the RMSE 3d may seem too little, too late to be helpful to the judiciary.  There are book-length treatments of systematic reviews and “evidence-based medicine”:  the three pages in Wong’s chapter barely scratch the surface of this important topic of how evidence is categorized, evaluated, and synthesized in making judgments of causality.

There are many textbooks and articles available to judges and lawyers on how to assess medical studies.  Recently, John Cherrie posted on his blog, OH-world, about a series of 17 articles in the journal Deutsches Ärzteblatt International on the proper evaluation of medical and epidemiologic studies.

These papers, overall, make the point that not all studies are equal, and that not all evidentiary displays are adequate to support conclusions of causal association.  The papers are available without charge from the journal’s website:

01. Critical Appraisal of Scientific Articles

02. Study Design in Medical Research

03. Types of Study in Medical Research

04. Confidence Interval or P-Value?

05. Requirements and Assessment of Laboratory Tests: Inpatient Admission Screening

06. Systematic Literature Reviews and Meta-Analyses

07. The Specification of Statistical Measures and Their Presentation in Tables and Graphs

08. Avoiding Bias in Observational Studies

09. Interpreting Results in 2×2 Tables

10. Judging a Plethora of p-Values: How to Contend With the Problem of Multiple Testing

11. Data Analysis of Epidemiological Studies

12. Choosing Statistical Tests

13. Sample Size Calculation in Clinical Trials

14. Linear Regression Analysis

15. Survival Analysis

16. Concordance Analysis

17. Randomized Controlled Trials

This year, the Journal of Clinical Epidemiology began publishing a series of papers, known by the acronym GRADE, which aim to provide guidance on how studies are categorized and assessed for their evidential quality in supporting treatments and interventions.  The GRADE project is led by Gordon Guyatt, who is known for having coined the term “evidence-based medicine,” and for writing widely on the subject.  Guyatt, along with colleagues including Peter Tugwell (who was one of the court-appointed expert witnesses in MDL 926), has described the GRADE project:

“The ‘Grades of Recommendation, Assessment, Development, and Evaluation’ (GRADE) approach provides guidance for rating quality of evidence and grading strength of recommendations in health care. It has important implications for those summarizing evidence for systematic reviews, health technology assessment, and clinical practice guidelines. GRADE provides a systematic and transparent framework for clarifying questions, determining the outcomes of interest, summarizing the evidence that addresses a question, and moving from the evidence to a recommendation or decision. Wide dissemination and use of the GRADE approach, with endorsement from more than 50 organizations worldwide, many highly influential (http://www.gradeworkinggroup.org/), attests to the importance of this work. This article introduces a 20-part series providing guidance for the use of GRADE methodology that will appear in the Journal of Clinical Epidemiology.”

Gordon Guyatt, Andrew D. Oxman, Holger Schünemann, Peter Tugwell, Andre Knottnerus, “GRADE guidelines – new series of articles in Journal of Clinical Epidemiology,” 64 J. Clin. Epidem. 380 (2011).  See also Gordon Guyatt, Andrew Oxman, et al., for the GRADE Working Group, “GRADE: an emerging consensus on rating quality of evidence and strength of recommendations,” 336 Brit. Med. J. 924 (2008).

Of the 20 papers planned, 9 of the GRADE papers have been published to date in the Journal of Clinical Epidemiology:

01 Introduction – GRADE evidence profiles and summary of findings tables

02 Framing the question and deciding on important outcomes

03 Rating the quality of evidence

04 Rating the quality of evidence – study limitations (risk of bias)

05 Rating the quality of evidence – publication bias

06 Rating the quality of evidence – imprecision

07 Rating the quality of evidence – inconsistency

08 Rating the quality of evidence – indirectness

09 Rating up the quality of evidence

The GRADE guidance papers focus on the efficacy of treatments and interventions, but in doing so, they evaluate “effects” and are thus applicable to the etiologic issues of alleged harm that find their way into court.  The papers build on other grading systems advanced previously by the Oxford Centre for Evidence-Based Medicine, the U.S. Preventive Services Task Force (of the Agency for Healthcare Research and Quality, AHRQ), the Cochrane Collaboration, as well as many individual professional organizations.
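The basic logic of the GRADE ratings can be summarized in a short sketch.  What follows is a simplified illustration, not the GRADE Working Group’s official algorithm:  randomized trials start as “high” quality evidence and observational studies as “low”; the factors addressed in guidance papers 4 through 8 rate the evidence down, and the factors in paper 9 rate it up:

```python
# A simplified sketch of GRADE's rating logic (illustration only, not
# the GRADE Working Group's official algorithm).
LEVELS = ["very low", "low", "moderate", "high"]

def grade(randomized: bool, downgrades: int = 0, upgrades: int = 0) -> str:
    """Rate a body of evidence on the four GRADE levels.

    randomized -- True for randomized trials (start at "high");
                  False for observational studies (start at "low").
    downgrades -- levels lost to risk of bias, inconsistency,
                  indirectness, imprecision, or publication bias.
    upgrades   -- levels gained for large effects, dose-response
                  gradients, or confounding that would understate
                  the observed effect.
    """
    start = LEVELS.index("high") if randomized else LEVELS.index("low")
    return LEVELS[max(0, min(len(LEVELS) - 1, start - downgrades + upgrades))]

# Randomized trials marred by serious risk of bias and imprecision:
print(grade(randomized=True, downgrades=2))   # -> "low"
# Observational studies showing a very large effect:
print(grade(randomized=False, upgrades=1))    # -> "moderate"
```

Even in this stripped-down form, the approach captures what the levelers deny:  evidence starts at different strengths depending upon study design, and it can lose or gain standing only for articulable, transparent reasons.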

GRADE has had some success in harmonizing disparate grading systems, and forging a consensus among organizations that had been using their own systems, such as the World Health Organization, the American College of Physicians, the American Thoracic Society, the Cochrane Collaboration, the American College of Chest Physicians, the British Medical Journal, and Kaiser Permanente.

There are many other important efforts to provide consensus support for improving the quality of the design, conduct, and reporting of published studies, as well as the interpretation of those studies once published.  Although the RMSE 3d does a good job of introducing its readers to the basics of study design, it could have done considerably more to help judges become discerning critics of scientific studies and of conclusions based upon individual or multiple studies.