For your delectation and delight, desultory dicta on the law of delicts.

Epidemiology, Risk, and Causation – Report of Workshops

November 15th, 2011

This month’s issue of Preventive Medicine includes a series of papers arising from last year’s workshops on “Epidemiology, Risk, and Causation,” held at Cambridge University.  The workshops were organized by philosopher Alex Broadbent, a member of the Department of History and Philosophy of Science at Cambridge University, and were financially sponsored by the Foundation for Genomics and Population Health (PHG), a not-for-profit British organization.

Broadbent’s workshops were intended for philosophers of science, statisticians, and epidemiologists, but lawyers involved in health effects litigation will find the papers of interest as well.  The themes of the workshops included:

  • the nature of epidemiologic causation,
  • the competing claims of observational and experimental research for establishing causation,
  • the role of explanation and prediction in assessing causality,
  • the role of moral values in causal judgments, and
  • the role of statistical and epistemic uncertainty in causal judgments.

See Alex Broadbent, ed., “Special Section: Epidemiology, Risk, and Causation,” 53 Preventive Medicine 213-356 (October-November 2011).  Preventive Medicine is published by Elsevier Inc., so you know that the articles are not free.  Still, you may want to read these at your local library to determine what may be useful in challenging and defending causal judgments in the courtroom.  One of the interlocutors, Sander Greenland, is of particular interest because he shows up as an expert witness with some regularity.

Here are the individual papers published in this special issue:

Alfredo Morabia, Michael C. Costanza, Philosophy and epidemiology

Alex Broadbent, Conceptual and methodological issues in epidemiology: An overview

Alfredo Morabia, Until the lab takes it away from epidemiology

Nancy Cartwright, Predicting what will happen when we act. What counts for warrant?

Sander Greenland, Null misinterpretation in statistical testing and its impact on health risk assessment

Daniel M. Hausman, How can irregular causal generalizations guide practice?

Mark Parascandola, Causes, risks, and probabilities: Probabilistic concepts of causation in chronic disease epidemiology

John Worrall, Causality in medicine: Getting back to the Hill top

Olaf M. Dekkers, On causation in therapeutic research: Observational studies, randomised experiments and instrumental variable analysis

Alexander Bird, The epistemological function of Hill’s criteria

Michael Joffe, The gap between evidence discovery and actual causal relationships

Stephen John, Why the prevention paradox is a paradox, and why we should solve it: A philosophical view

Jonathan Wolff, How should governments respond to the social determinants of health?

Alex Broadbent, What could possibly go wrong? — A heuristic for predicting population health outcomes of interventions

The Treatment of Meta-Analysis in the Third Edition of the Reference Manual on Scientific Evidence

November 14th, 2011

Meta-analysis is a statistical procedure for aggregating data and statistics from individual studies into a single summary statistical estimate of the population measurement of interest.  The first meta-analysis is typically attributed to Karl Pearson, circa 1904, who sought a method to overcome the limitations of small sample size and low statistical power.  Statistical methods for meta-analysis, however, did not mature until the 1970s.  Even then, the biomedical scientific community remained skeptical of, if not outright hostile to, meta-analysis until relatively recently.
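In its simplest, fixed-effect form, the aggregation works by weighting each study’s estimate by the inverse of its variance, so that larger, more precise studies count for more.  The sketch below is a minimal illustration of that inverse-variance method only; the function name and inputs are hypothetical, and real meta-analyses involve many further choices (random-effects models, heterogeneity checks, sensitivity analyses):

```python
import math

def fixed_effect_pool(log_rrs, ses):
    """Inverse-variance (fixed-effect) pooling of log relative risks.

    Each study is weighted by the inverse of its variance; the pooled
    estimate is the weighted mean of the study estimates, and the
    pooled variance is the reciprocal of the summed weights.
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, log_rrs)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    # Back-transform to the relative-risk scale with a 95% interval.
    lo = math.exp(pooled - 1.96 * pooled_se)
    hi = math.exp(pooled + 1.96 * pooled_se)
    return math.exp(pooled), (lo, hi)
```

The point of the exercise is that pooling shrinks the standard error: two identical studies pooled together yield the same point estimate but a narrower confidence interval than either study alone.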

The hostility to meta-analysis, especially in the context of observational epidemiologic studies, was colorfully expressed by Samuel Shapiro and Alvan Feinstein, as late as the 1990s:

“Meta-analysis begins with scientific studies….  [D]ata from these studies are then run through computer models of bewildering complexity which produce results of implausible precision.”

* * * *

“I propose that the meta-analysis of published non-experimental data should be abandoned.”

Samuel Shapiro, “Meta-analysis/Shmeta-analysis,” 140 Am. J. Epidem. 771, 777 (1994).  See also Alvan Feinstein, “Meta-Analysis: Statistical Alchemy for the 21st Century,” 48 J. Clin. Epidem. 71 (1995).

The professional skepticism about meta-analysis was reflected in some of the early judicial assessments of meta-analysis in court cases.  In the 1980s and early 1990s, some trial judges erroneously dismissed meta-analysis as a flawed statistical procedure that claimed to make something out of nothing. Allen v. Int’l Bus. Mach. Corp., No. 94-264-LON, 1997 U.S. Dist. LEXIS 8016, at *71–*74 (suggesting that meta-analysis of observational studies was controversial among epidemiologists).

In In re Paoli Railroad Yard PCB Litigation, Judge Robert Kelly excluded plaintiffs’ expert witness Dr. William Nicholson and his testimony based upon his unpublished meta-analysis of health outcomes among PCB-exposed workers.  Judge Kelly found that the meta-analysis was a novel technique, and that Nicholson’s meta-analysis was not peer reviewed.  Furthermore, the meta-analysis assessed health outcomes not experienced by any of the plaintiffs before the trial court.  706 F. Supp. 358, 373 (E.D. Pa. 1988).

The Court of Appeals for the Third Circuit reversed the exclusion of Dr. Nicholson’s testimony, and remanded for reconsideration with instructions.  In re Paoli R.R. Yard PCB Litig., 916 F.2d 829, 856-57 (3d Cir. 1990), cert. denied, 499 U.S. 961 (1991); Hines v. Consol. Rail Corp., 926 F.2d 262, 273 (3d Cir. 1991).  The Circuit noted that meta-analysis was not novel, and that the lack of peer-review was not an automatic disqualification.  Acknowledging that a meta-analysis could be performed poorly using invalid methods, the appellate court directed the trial court to evaluate the validity of Dr. Nicholson’s work on his meta-analysis.

In one of many skirmishes over colorectal cancer claims in asbestos litigation, Judge Sweet in the Southern District of New York was unimpressed by efforts to aggregate data across studies.  Judge Sweet declared that “no matter how many studies yield a positive but statistically insignificant SMR for colorectal cancer, the results remain statistically insignificant. Just as adding a series of zeros together yields yet another zero as the product, adding a series of positive but statistically insignificant SMRs together does not produce a statistically significant pattern.”  In re Joint E. & S. Dist. Asbestos Litig., 827 F. Supp. 1014, 1042 (S.D.N.Y. 1993).  The plaintiffs’ expert witness who had offered the unreliable testimony, Dr. Steven Markowitz (like Nicholson, another foot soldier in Dr. Irving Selikoff’s litigation machine), did not offer a formal meta-analysis to justify his assessment that multiple non-significant studies, taken together, ruled out chance as a likely explanation for an aggregate finding of increased risk.

Judge Sweet was quite justified in rejecting this back-of-the-envelope, non-quantitative meta-analysis.  His suggestion, however, that multiple non-significant studies could never collectively serve to rule out chance as an explanation for an overall increased rate of disease in the exposed groups is wrong.  Judge Sweet would have done better to focus on the validity issues in key studies, the presence of bias and confounding, and the completeness of the proffered meta-analysis.  The Second Circuit reversed the entry of summary judgment, and remanded the colorectal cancer claim for trial.  52 F.3d 1124 (2d Cir. 1995).  Over a decade later, with even more accumulated studies and data, the Institute of Medicine found the evidence for asbestos plaintiffs’ colorectal cancer claims to be scientifically insufficient.  Institute of Medicine, Asbestos: Selected Cancers (Wash. D.C. 2006).

Courts continue to go astray with an erroneous belief that multiple studies, all without statistically significant results, cannot yield a statistically significant summary estimate of increased risk.  See, e.g., Baker v. Chevron USA, Inc., 2010 WL 99272, *14-15 (S.D. Ohio 2010) (addressing a meta-analysis by Dr. Infante on multiple myeloma outcomes in studies of benzene-exposed workers).  There were many sound objections to Infante’s meta-analysis, but the suggestion that multiple studies without statistical significance could not yield a summary estimate of risk with statistical significance was not one of them.
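A small worked example shows why the “adding zeros” intuition is wrong.  The numbers below are hypothetical, chosen only for arithmetic convenience: four studies, each finding a relative risk of 1.3 with the same standard error on the log scale.  No single study is statistically significant, yet the equal-weight pooled estimate is:

```python
import math

# Four hypothetical studies, each reporting RR = 1.3 with a standard
# error of 0.2 on the log scale.  Individually, each z-statistic falls
# below the conventional 1.96 cutoff (p > 0.05).
log_rr, se, n_studies = math.log(1.3), 0.2, 4

z_single = log_rr / se                  # about 1.31: not significant

# With equal weights, pooling k studies shrinks the standard error by
# a factor of sqrt(k), while the point estimate stays the same.
pooled_se = se / math.sqrt(n_studies)   # 0.1
z_pooled = log_rr / pooled_se           # about 2.62: significant
```

Non-significant studies are not “zeros”; each carries information, and aggregating them reduces random error.  That is precisely the arithmetic Judge Sweet’s analogy missed.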

In the last two decades, meta-analysis has emerged as an important technique for addressing random variation in studies, as well as some of the limitations of frequentist statistical methods.  In the 1980s, articles reporting meta-analyses were rare to non-existent.  In 2009, there were over 2,300 articles with “meta-analysis” in their title, or in their keywords, indexed in the PubMed database of the National Library of Medicine.  See Michael O. Finkelstein and Bruce Levin, “Meta-Analysis of ‘Sparse’ Data: Perspectives from the Avandia Cases” (2011) (forthcoming in Jurimetrics).

The techniques for aggregating data have been studied, refined, and employed extensively in thousands of methods and application papers in the last decade. Consensus guideline papers have been published for meta-analyses of clinical trials as well as observational studies.  See Donna Stroup, et al., “Meta-analysis of Observational Studies in Epidemiology: A Proposal for Reporting,” 283 J. Am. Med. Ass’n 2008 (2000) (MOOSE statement); David Moher, Deborah Cook, Susan Eastwood, Ingram Olkin, Drummond Rennie, and Donna Stroup, “Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement,” 354 Lancet 1896 (1999).  See also Jesse Berlin & Carin Kim, “The Use of Meta-Analysis in Pharmacoepidemiology,” in Brian Strom, ed., Pharmacoepidemiology 681, 683–84 (4th ed. 2005); Zachary Gerbarg & Ralph Horwitz, “Resolving Conflicting Clinical Trials: Guidelines for Meta-Analysis,” 41 J. Clin. Epidemiol. 503 (1988).

Meta-analyses, of observational studies and of randomized clinical trials, routinely are relied upon by expert witnesses in pharmaceutical and so-called toxic tort litigation.  Id.  See also In re Bextra and Celebrex Marketing Sales Practices and Prod. Liab. Litig., 524 F. Supp. 2d 1166, 1174, 1184 (N.D. Cal. 2007) (holding that reliance upon “[a] meta-analysis of all available published and unpublished randomized clinical trials” was reasonable and appropriate, and criticizing the expert witnesses who urged the complete rejection of meta-analysis of observational studies).

The second edition of the Reference Manual on Scientific Evidence gave very little attention to meta-analysis.  With this historical backdrop, it is interesting to see what the new third edition provides for guidance to the federal judiciary on this important topic.


The statistics chapter of the third edition continues to give scant attention to meta-analysis.  The chapter notes, in a footnote, that there are formal procedures for aggregating data across studies, and that the power of the aggregated data will exceed the power of the individual, included studies.  The footnote then cautions that meta-analytic procedures “have their own weakness,” without detailing what that one weakness is.  RMSE 3d at 254 n. 107.

The glossary at the end of the statistics chapter offers a definition of meta-analysis:

“meta-analysis. Attempts to combine information from all studies on a certain topic. For example, in the epidemiological context, a meta-analysis may attempt to provide a summary odds ratio and confidence interval for the effect of a certain exposure on a certain disease.”

Id. at 289.

This definition is inaccurate in ways that could yield serious mischief.  Virtually all meta-analyses are built upon a systematic review that sets out to collect all available studies on a research issue of interest.  It is a rare meta-analysis, however, that includes “all” studies in its quantitative analysis.  The meta-analytic process involves a pre-specification of inclusionary and exclusionary criteria for the quantitative analysis of the summary estimate of risk.  Those criteria may limit the quantitative analysis to randomized trials, or to analytical epidemiologic studies.  Furthermore, meta-analyses frequently and appropriately have pre-specified exclusionary criteria that relate to study design or quality.

On a more technical note, the offered definition suggests that the summary estimate of risk will be an odds ratio, which may or may not be true.  Meta-analyses of risk ratios may yield summary estimates of risk in terms of relative risk or hazard ratios, or even of risk differences.  The meta-analysis may combine data of means rather than proportions as well.


The chapter on epidemiology delves into meta-analysis in greater detail than the statistics chapter, and offers apparently inconsistent advice.  The overall gist of the chapter, however, can perhaps best be summarized by the definition offered in this chapter’s glossary:

“meta-analysis. A technique used to combine the results of several studies to enhance the precision of the estimate of the effect size and reduce the plausibility that the association found is due to random sampling error.  Meta-analysis is best suited to pooling results from randomly controlled experimental studies, but if carefully performed, it also may be useful for observational studies.”

Reference Guide on Epidemiology, RMSE 3d at 624.  See also id. at 581 n. 89 (“Meta-analysis is better suited to combining results from randomly controlled experimental studies, but if carefully performed it may also be helpful for observational studies, such as those in the epidemiologic field.”).  The epidemiology chapter appropriately notes that meta-analysis can help address concerns over random error in small studies.  Id. at 579; see also id. at 607 n. 171.

Having told us that properly conducted meta-analyses of observational studies can be helpful, the chapter hedges considerably:

“Meta-analysis is most appropriate when used in pooling randomized experimental trials, because the studies included in the meta-analysis share the most significant methodological characteristics, in particular, use of randomized assignment of subjects to different exposure groups. However, often one is confronted with nonrandomized observational studies of the effects of possible toxic substances or agents. A method for summarizing such studies is greatly needed, but when meta-analysis is applied to observational studies – either case-control or cohort – it becomes more controversial.174 The reason for this is that often methodological differences among studies are much more pronounced than they are in randomized trials. Hence, the justification for pooling the results and deriving a single estimate of risk, for example, is problematic.175”

Id. at 607.  The stated objection to pooling results for observational studies is certainly correct, but many research topics have sufficient studies available to allow for appropriate selectivity in framing inclusionary and exclusionary criteria to address the objection.  The chapter goes on to credit the critics of meta-analyses of observational studies.  As they did in the second edition of the RMSE, the authors repeat their cites to, and quotes from, early papers by John Bailar, who was then critical of such meta-analyses:

“Much has been written about meta-analysis recently and some experts consider the problems of meta-analysis to outweigh the benefits at the present time. For example, John Bailar has observed:

‘[P]roblems have been so frequent and so deep, and overstatements of the strength of conclusions so extreme, that one might well conclude there is something seriously and fundamentally wrong with the method. For the present . . . I still prefer the thoughtful, old-fashioned review of the literature by a knowledgeable expert who explains and defends the judgments that are presented. We have not yet reached a stage where these judgments can be passed on, even in part, to a formalized process such as meta-analysis.’

John Bailar, “Assessing Assessments,” 277 Science 528, 529 (1997).”

Id. at 607 n.177.  Bailar’s subjective preference for “old-fashioned” reviews, which often cherry-picked the included studies, is, well, “old fashioned.”  More to the point, it is questionable science, and a distinctly minority viewpoint in light of substantial improvements in the conduct and reporting of meta-analyses of observational studies.  Bailar may be correct that some meta-analyses should never have left the protocol stage, but the RMSE 3d fails to provide the judiciary with the tools to appreciate the distinction between good and bad meta-analyses.

This categorical rejection, cited with apparent approval, is amplified by a recitation of some real or apparent problems with meta-analyses of observational studies.  What is missing is a discussion of how many of these problems can be and are dealt with in contemporary practice:

“A number of problems and issues arise in meta-analysis. Should only published papers be included in the meta-analysis, or should any available studies be used, even if they have not been peer reviewed? Can the results of the meta-analysis itself be reproduced by other analysts? When there are several meta-analyses of a given relationship, why do the results of different meta-analyses often disagree? The appeal of a meta-analysis is that it generates a single estimate of risk (along with an associated confidence interval), but this strength can also be a weakness, and may lead to a false sense of security regarding the certainty of the estimate. A key issue is the matter of heterogeneity of results among the studies being summarized.  If there is more variance among study results than one would expect by chance, this creates further uncertainty about the summary measure from the meta-analysis. Such differences can arise from variations in study quality, or in study populations or in study designs. Such differences in results make it harder to trust a single estimate of effect; the reasons for such differences need at least to be acknowledged and, if possible, explained.176 People often tend to have an inordinate belief in the validity of the findings when a single number is attached to them, and many of the difficulties that may arise in conducting a meta-analysis, especially of observational studies such as epidemiologic ones, may consequently be overlooked.177”

Id. at 608.  The authors are entitled to their opinion, but their discussion leaves the judiciary uninformed about current practice, and best practices, in epidemiology.  A categorical rejection of meta-analyses of observational studies is at odds with the chapter’s own claim that such meta-analyses can be helpful if properly performed.  What was needed, and is missing, is a meaningful discussion to help the judiciary determine whether a meta-analysis of observational studies was properly performed.
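The heterogeneity concern, for instance, is not merely “acknowledged” in contemporary practice; it is routinely quantified, most commonly with Cochran’s Q statistic and the derived I² measure, which estimates the share of observed variation attributable to real differences among studies rather than chance.  A minimal sketch (the function and inputs are hypothetical illustrations, not any published implementation):

```python
import math

def heterogeneity(log_effects, ses):
    """Cochran's Q and Higgins' I^2 for a set of study estimates.

    Q sums the weighted squared deviations of each study's estimate
    from the fixed-effect pooled estimate; under homogeneity, Q is
    roughly chi-squared with (k - 1) degrees of freedom.  I^2
    expresses the excess of Q over its degrees of freedom as a
    percentage of Q (floored at zero).
    """
    w = [1.0 / s ** 2 for s in ses]
    pooled = sum(wi * y for wi, y in zip(w, log_effects)) / sum(w)
    q = sum(wi * (y - pooled) ** 2 for wi, y in zip(w, log_effects))
    df = len(log_effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2
```

A reviewing court armed with this much could at least ask whether a proffered meta-analysis reported its heterogeneity statistics and how the analyst responded to them, which is exactly the inquiry the chapter fails to equip judges to make.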


The chapter on medical testimony is the third pass at meta-analysis in RMSE 3d.   The second edition’s chapter on medical testimony ignored meta-analysis completely; the new edition addresses meta-analysis in the context of the hierarchy of study designs:

“Other circumstances that set the stage for an intense focus on medical evidence included

(1) the development of medical research, including randomized controlled trials and other observational study designs;

(2) the growth of diagnostic and therapeutic interventions;141

(3) interest in understanding medical decision making and how physicians reason;142 and

(4) the acceptance of meta-analysis as a method to combine data from multiple randomized trials.143”

RMSE 3d at 722-23.

The chapter curiously omits observational studies, but the footnote reference (note 143) then inconsistently discusses two meta-analyses of observational, rather than experimental, studies:

“143. Video Software Dealers Ass’n v. Schwarzenegger, 556 F.3d 950, 963 (9th Cir. 2009) (analyzing a meta-analysis of studies on video games and adolescent behavior); Kennecott Greens Creek Min. Co. v. Mine Safety & Health Admin., 476 F.3d 946, 953 (D.C. Cir. 2007) (reviewing the Mine Safety and Health Administration’s reliance on epidemiological studies and two meta-analyses).”

Id. at 723 n.143.

The medical testimony chapter then provides further confusion by giving a more detailed listing of the hierarchy of medical evidence in the form of different study designs:

3. Hierarchy of medical evidence

With the explosion of available medical evidence, increased emphasis has been placed on assembling, evaluating, and interpreting medical research evidence.  A fundamental principle of evidence-based medicine (see also Section IV.C.5, infra) is that the strength of medical evidence supporting a therapy or strategy is hierarchical.  When ordered from strongest to weakest, systematic review of randomized trials (meta-analysis) is at the top, followed by single randomized trials, systematic reviews of observational studies, single observational studies, physiological studies, and unsystematic clinical observations.150 An analysis of the frequency with which various study designs are cited by others provides empirical evidence supporting the influence of meta-analysis followed by randomized controlled trials in the medical evidence hierarchy.151 Although they are at the bottom of the evidence hierarchy, unsystematic clinical observations or case reports may be the first signals of adverse events or associations that are later confirmed with larger or controlled epidemiological studies (e.g., aplastic anemia caused by chloramphenicol,152 or lung cancer caused by asbestos153). Nonetheless, subsequent studies may not confirm initial reports (e.g., the putative association between coffee consumption and pancreatic cancer).154

Id. at 723-24.  This discussion further muddies the water by using a parenthetical to suggest that meta-analyses of randomized clinical trials are equivalent to systematic reviews of such studies — “systematic review of randomized trials (meta-analysis).”  Of course, systematic reviews are not meta-analyses, although they are a necessary precondition for conducting a meta-analysis.  The relationship between the procedures for a systematic review and a meta-analysis is in need of clarification, but the judiciary will not find it in the new Reference Manual.

OSHA’s HazCom Standard — Statistical and Scientific Nonsense

November 13th, 2011

Almost 28 years ago, the United States Department of Labor (Occupational Safety and Health Administration or OSHA) promulgated the Hazard Communication Standard.  29 C.F.R. § 1910.1200 (November 1983; effective date November 25, 1985) (HazCom standard).  Initially the HazCom standard applied to importers and manufacturers of chemicals.  Starting one year later, on November 25, 1986, the standard covered manufacturing employers under OSHA jurisdiction, by defining their duties to protect and inform employees.

The HazCom standard applies to all chemical manufacturers and distributors and to

“any chemical which is known to be present in the workplace in such a manner that employees may be exposed under normal conditions of use or in a foreseeable emergency.”

29 C.F.R. § 1910.1200(b)(1), and (b)(2).  The standard requires that manufacturers and distributors of hazardous chemicals inform not only their own employees of the dangers posed by the chemicals, but downstream employers and employees as well.  The standard implements this duty to warn downstream employers’ employees by requiring that containers of hazardous chemicals leaving the workplace be labeled with “appropriate hazard warnings.”  See Martin v. American Cyanamid Co., 5 F.3d 140, 141-42 (6th Cir. 1993) (reviewing agency’s interpretation of the standard).

The HazCom standard attempts to provide some definition of the health hazards for which warnings are required:

“For health hazards, evidence which is statistically significant and which is based on at least one positive study conducted in accordance with established scientific principles is considered to be sufficient to establish a hazardous effect if the results of the study meet the definitions of health hazards in this section.”

29 C.F.R. § 1910.1200(d)(2).

This regulatory language is troubling. What does statistically significant mean?  The concept remains important in health effects research, but several writers have subjected the use of significance testing specifically, and frequentist statistics generally, to criticisms.  See, e.g., Stephen T. Ziliak and Deirdre N. McCloskey, The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives (Ann Arbor 2008) (example of one of the more fringe, and not particularly cogent, criticisms of frequentist statistics).  And what are the “established scientific principles,” which would allow a single “positive study” to “establish” a hazardous “effect”?

The HazCom standard is important not only for purposes of regulatory compliance, but for its potential implications for products liability law, as well.  With its importance in mind, what can be said about the definition of health hazard, provided in 29 C.F.R. § 1910.1200(d)(2)?

Perhaps a good place to start is with the guidance provided by OSHA on compliance with the HazCom standard.  To be sure, like most agency guidance statements, this one is prefaced with caveats and cautions:

“This guidance is not a standard or regulation, and it creates no new legal obligations. It is advisory in nature, informational in content, and is intended to assist employers in providing a safe and healthful workplace. Pursuant to the Occupational Safety and Health Act, employers must comply with safety and health standards promulgated by OSHA or by a state with an OSHA-approved state plan. In addition, pursuant to Section 5(a)(1), the General Duty Clause of the Act, employers must provide their employees with a workplace free from recognized hazards likely to cause death or serious physical harm. Employers can be cited for violating the General Duty Clause if there is a recognized hazard and they do not take reasonable steps to prevent or abate the hazard. However, failure to implement any specific recommendations in this guidance is not, in itself, a violation of the General Duty Clause. Citations can only be based on standards, regulations, and the General Duty Clause.”

U.S. Dep’t of Labor, Guidance for Hazard Determination for Compliance with the OSHA Hazard Communication Standard (29 CFR § 1910.1200) (July 6, 2007).

Section II of the Guidance describes how manufacturers may assess whether their chemicals are “hazardous.”  A health hazard is defined as a chemical

“for which there is statistically significant evidence based on at least one study conducted in accordance with established scientific principles that acute or chronic health effects may occur in exposed employees.”

A fair-minded person might object that this is no guidance at all.  Statistically significant is not defined in the regulations. Study is not defined.  The guidance specifies that the study or studies must be conducted in accordance with “established scientific principles,” but must the interpretation or judgment of causality be made similarly in accordance with such principles? One would hope so, but the Guidance does not really specify.  The use of “may” seems to inject a level of conjecture or speculation into the hazard assessment.

Section V of the Guidance addresses data analysis, and here the agency attempts to provide some meaning to statistical significance and other terms in the regulation, but in doing so, the Guidance offers incoherent, incredible advice.

The Guidance notes that the regulation specifies one “positive study,” which presumably is a study that is some evidence in favor of an “effect.”  Because we are dealing with chemical exposures in occupational settings, the studies at issue will be, at best, observational studies.  Randomized clinical trials are out.  The one study (at least) at issue must be sufficient to establish a hazardous effect if that effect is considered a “health hazard” within the meaning of the regulations.  This is problematic on many levels.  What sort of study are we discussing?  An experimental study in planaria worms, a case study of a single human, an ecological study, or an analytical epidemiologic (case-control or cohort) study?  Whatever the study is, it would be a most remarkable study if it alone were “sufficient” to “establish” an “effect.”

A reasonable manufacturer or disinterested administrator surely would interpret the sufficiency requirement to mean that the entire evidentiary display must be considered rather than whether one study, taken in isolation, ripped from its scientific context, should be used to suggest a duty to warn.  The Guidance, and the regulations, however, never address the real-world complexity of hazard assessment.

Section V of the Guidance offers a failed attempt to illuminate the meaning of statistical significance:

“Statistical significance is a mathematical determination of the confidence in the outcome of a test. The usual criterion for establishing statistical significance is the p-value (probability value). A statistically significant difference in results is generally indicated by p < 0.05, meaning there is less than a 5% probability that the toxic effects observed were due to chance and were not caused by the chemical. Another way of looking at it is that there is a 95% probability that the effect is real, i.e., the effect seen was the result of the chemical exposure.”

Few statisticians or scientists would accept the proffered definition as acceptable.  The Guidance’s statement that a p-value is equivalent to the probability of the “toxic effect” occurring by chance is unacceptable for several reasons.

First, it is a notoriously incorrect, fallacious statement of the meaning of a p-value:

“Since p is calculated by assuming the null hypothesis is correct (that there is no difference [between observed and expected] in the full population), the p-value cannot give the chance that this hypothesis is true.  The p-value merely gives the chance of getting evidence against the null hypothesis as strong or stronger than the evidence at hand — assuming that the null hypothesis … is correct.”

David H. Kaye, David E. Bernstein, and Jennifer L. Mnookin, The New Wigmore: Expert Evidence § 12.8.2, at 559 (2d ed. 2010) (discussing the transpositional fallacy).
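A simple base-rate calculation makes the fallacy concrete.  The numbers below are hypothetical, chosen only to illustrate the arithmetic: suppose a regulator screens 1,000 chemical-exposure hypotheses, of which 100 reflect real effects, testing each at the 0.05 level with 80% power.

```python
# Hypothetical screening scenario: 1,000 hypotheses, 100 real effects.
n_null, n_real = 900, 100
alpha, power = 0.05, 0.80

# Expected counts of "statistically significant" findings:
false_positives = n_null * alpha    # 45 true nulls reach p < 0.05
true_positives = n_real * power     # 80 real effects are detected

# Probability that a significant finding reflects a real effect:
prob_real_given_sig = true_positives / (true_positives + false_positives)
# 80 / 125 = 0.64, nowhere near the "95% probability that the
# effect is real" asserted by the Guidance.
```

The point is not the particular numbers, which depend on the assumed base rate and power, but that p < 0.05 simply does not translate into “95% probability that the effect is real”; that translation is the transpositional fallacy the Kaye, Bernstein, and Mnookin treatise describes.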

Second, even if we could ignore the statistical solecism, the Guidance’s use of a mechanical test for statistical significance is troubling.  The p-value is not necessarily an appropriate protection against Type I error, or a “false alarm” that there is an association between the exposure and outcome of interest.  Multiple testing and other aspects of a study may inflate the number of false alarms to the point that a study with a low p-value, even one much lower than 5%, will not rule out the likely role of chance as an explanation for the study’s result.
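The multiple-testing point is easy to quantify.  With k independent tests each conducted at the 0.05 level, the chance of at least one false alarm is 1 - 0.95^k, which grows quickly with k.  A short sketch (the helper function is illustrative only):

```python
def familywise_error(k, alpha=0.05):
    """Probability of at least one false positive across k
    independent tests, each run at significance level alpha."""
    return 1 - (1 - alpha) ** k

# With 14 independent comparisons, the chance of at least one
# spurious "significant" result already exceeds 50%.
```

A study reporting one significant association out of dozens of comparisons therefore tells us far less than its nominal p-value suggests, which is why a mechanical p < 0.05 rule is a poor safeguard against Type I error.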

Third, the Guidance’s suggestion that “statistical significance” boils down to a conclusion that the “effect is real” may be its greatest offense against scientific and statistical methodology.  Section V of the Guidance emphasizes that the HazCom standard states that

“evidence that is statistically significant and which is based on at least one positive study conducted in accordance with established scientific principles is considered to be sufficient to establish a hazardous effect if the results of the study meet the [HCS] definitions of health hazards.”

This is nothing more than semantic fiat and legerdemain.

Statistical significance may, in some circumstances, permit an inference that the divergence from the expected was not likely due to chance, but it cannot, in the context of observational studies, allow for a conclusion that the divergence resulted because of a cause-effect relationship between the exposure and the outcome.  Statistical significance cannot rule out systematic bias or confounding in the study; nor can it help us reconcile inconsistencies across studies.  The study may have identified an association, which must be assessed for its causal or non-causal nature, in the context of all relevant evidence.  See Austin Bradford Hill, “The Environment and Disease: Association or Causation?” 58 Proc. Royal Soc’y Med. 295 (1965).

The OSHA Guidance is really no guidance at all.  Ensuring worker health and safety by requiring employers to provide industrial hygiene protections for workers is an exceedingly important task, but this aspect of the HazCom standard is incoherent and incompetent. Workers and employers are in the dark, and product suppliers are vulnerable to arbitrary and capricious enforcement.

Lording the Data – Scientific Fraud

November 10th, 2011

Last week, the New York Times published a news story about psychologist Diederik Stapel, of the Netherlands.  Tilburg University accused him of having committed research fraud in several dozen published papers, including papers in Science, the official journal of the AAAS.  See Benedict Carey, “Fraud Case Seen as a Red Flag for Psychology Research: Noted Dutch Psychologist, Stapel, Accused of Research Fraud,” New York Times (Nov. 2, 2011).  The Times expressed surprise over the suggestion that psychology is plagued by fraud and sloppy research.  The real surprise is that there are not more stories in the lay media about the poor quality of scientific research.  The readers of Retraction Watch, and of the Office of Research Integrity’s blog, will recognize how commonplace Stapel’s fraud is.

Stapel’s fraud has wide-ranging implications for the doctoral students whose dissertations he supervised, and for the colleagues with whom he collaborated.  Stapel apologized and expressed his regret, but his conduct leaves a large body of his work, and that of others, under a cloud of suspicion.

Lording the Data

The University committee reported that Stapel had escaped detection for a long time because he was “lord of the data,” refusing to disclose or share the data:

“Outright fraud may be rare, these experts say, but they contend that Dr. Stapel took advantage of a system that allows researchers to operate in near secrecy and massage data to find what they want to find, without much fear of being challenged.”

Benedict Carey, “Fraud Case,” New York Times (Nov. 2, 2011).  Data sharing is preached but rarely practiced.

In a recent publication, Dr. Wicherts and his colleagues at the University of Amsterdam reported that two-thirds of his sample of Dutch research psychologists refused to share their data, in contravention of the established ethical rules of the discipline.  Remarkably, many of the refuseniks had explicit contractual obligations with their publishing journals to provide data.  Jelte Wicherts, Marjan Bakker & Dylan Molenaar, “Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results,” PLoS ONE 6(11): e26828 (Nov. 2, 2011).

Scientific fraud seems no more common among scientists with industry ties, which are so often the subject of ad hominem conflict-of-interest claims.  Instead, fraudfeasors such as Stapel or Hwang Woo-suk are more often simply egotistical, narcissistic, self-aggrandizing, self-promoting, or delusional.  In the United States, litigation has occasionally brought out charlatans, but it has also resulted in high-quality studies that have provided strong evidence for or against litigation claims.  Compare Hon. Jack B. Weinstein, “Preliminary Reflections on Administration of Complex Litigation” 2009 Cardozo L. Rev. de novo 1, 14 (2009) (describing plaintiffs’ expert witnesses in silicone litigation as “charlatans” and the litigation as largely based upon fraud) with Committee on the Safety of Silicone Breast Implants, Institute of Medicine, Safety of Silicone Breast Implants (Wash. D.C. 1999) (reviewing studies, many of which were commissioned by litigation defendants, and which collectively showed lack of association between silicone and autoimmune diseases).

The relation between litigation and research is one that has typically been approached by self-righteous voices, such as David Michaels and David Egilman, and others who have their own deep conflicts of interest.  What is clear is that all litigants, as well as the public, would benefit from enforcing data-sharing requirements.  See “Litigation and Research” (April 15, 2007) (science should not be built upon blind trust of scientists: “Nullius in verba.”).

The Times article emphasized Wicherts’ research on the lack of data sharing, and suggested that data sharing could improve the quality of scientific publications.  The time may have come, however, for sterner measures, including civil and criminal penalties for scientists who abuse and waste governmental funding, or who aid and abet fraudulent litigation.