The Second and Third Editions of the Reference Manual on Scientific Evidence contained a chapter, “How Science Works,” by Professor David Goodstein. This chapter ambitiously set out to cover the philosophy and sociology of science to help orient judges as strangers in a strange land. Goodstein’s chapter was a useful introduction to scientific methodology, and it countered some of the antic ideas seen in some judicial opinions, as well as in some other chapters of the Manual. Goodstein brought a good deal of experience and expertise to the task. He was a distinguished professor of physics and Vice Provost at the California Institute of Technology, and he had written engagingly about scientific discovery and the pathology of science.[1] Sadly, Goodstein died in April 2024. His death may have played some role in the delayed publication of the Fourth Edition of the Manual,[2] and in the improvident replacement of his chapter with a new one written by authors less articulate about how science works.
The substitute chapter on “How Science Works” was written by two authors considerably less accomplished than the late Professor Goodstein.[3] Michael Weisberg is a professor of philosophy at the University of Pennsylvania, where he is the deputy director of Perry World House, which “analyzes global policy challenges through the realms of climate, democracy, global justice and human rights, and security.” The connection with Perry World House may explain the new chapter’s heavy reliance upon the development of the chlorofluorocarbon (CFC) connection to ozone layer depletion as an exemplar of scientific discovery and knowledge. The University of Pennsylvania webpage describes Weisberg as “educat[ing] the next generation of environmental leaders in the classroom, at the negotiating table, and in the field, ensuring that their voices have maximal impact on addressing the climate crisis.”[4] So we have a philosopher of advocacy science, as it were. Some readers might think those credentials are not optimal for preparing a nuts-and-bolts description of how science works. Reading sections of the new chapter will not diminish their concerns.
Joining Weisberg on this new version of “How Science Works” is Anastasia Thanukos, who works at the University of California Museum of Paleontology. Thanukos has a master’s degree in integrative biology, and a doctorate in science education.[5]
The new “method” chapter has some virtues. Like Goodstein’s chapter, it puts peer review into a realistic perspective that should keep judges from being snookered into admitting weak or bogus evidence merely because it had been published in a peer-reviewed journal.[6] The authors should have gone much further in pointing out that the rise of predatory and pay-to-play journals, as well as journals controlled by advocacy groups, has undermined much of the publishing model of modern science.
Weisberg and Thanukos discuss “expertise” in a way that is interesting but irrelevant to legal cases. They seem blithely unaware that the standard for qualifying an expert witness is extremely low. Who will disabuse them when they argue that “[i]t is worth evaluating the closeness of a scientist’s disciplinary expertise to a scientific topic on which expert testimony is delivered”?[7] In what emerges as a consistent pattern of giving anti-manufacturing-industry examples, the authors point to Richard Scorer as an accomplished scientist who had no specific expertise in CFC ozone depletion. Notwithstanding the lack of specific expertise, an industry-backed group promoted Scorer’s views criticizing the CFC-ozone depletion hypothesis.[8] Citing Naomi Oreskes, the new Manual chapter states that “[t]he problem of scientists with legitimate expertise in one field weighing in on a scientific question outside their area of expertise is a pernicious one that has affected public acceptance of science and policy on issues such as climate change and tobacco exposure.”[9] Later, when Weisberg and Thanukos discuss the Milward case, they miss the pernicious influence that flowed from allowing Martyn Smith, a toxicologist, to give methodologically muddled opinion testimony on epidemiology. Pernicious is where you find it, and the authors of the new chapter find virtually all untoward instances of poor scientific method and conduct to originate from manufacturing industry.
Weisberg and Thanukos introduce a discussion of the “replication crisis,” a phrase and concept absent from the third edition of the Reference Manual.[10] The authors express some skepticism that there is an actual crisis over replication,[11] but their focus on climate science may mean that they are simply blinded by groupthink in that discipline. Their discussion of retractions omits the steep rise in retraction rates in most scientific disciplines,[12] and the authors ignore the proliferation of poor-quality journals. Positively, the authors introduce a discussion of study preregistration, a notion absent from the third edition of the Manual, and they explain that such preregistration may serve as a bulwark against data dredging and post hoc analyses.[13] Negatively, the authors ignore how frequently preregistered protocols are not used, or are used and then violated.
Weisberg and Thanukos appropriately ignore “weight of the evidence” (WOE) and “inference to the best explanation” (IBE). Readers might (mistakenly) think that the new chapter implicitly rejects WOE, as put forth by Carl Cranor and credulously accepted by the First Circuit in Milward, when the chapter authors insist that
“the judge’s task requires a deeper examination of the available evidence and methods by which it was arrived at, as well as an assessment of how the community of experts in this area has evaluated or would evaluate the evidence and reasoning in question.”[14]
Contrary to the Milward decision from 2011, the new authors are not shy about stating the obvious: there is good science, and there is bad science. Not all “judgment” about causality is acceptable and fit for submission to juries.[15] Given the judicial resistance to Rule 702, the obvious here requires stating. Weisberg and Thanukos acknowledge that some scientific judgment is unreliable or invalid because it was based upon work that was not carried out in accordance with current standards for scientific investigation and inference.[16] It should not surprise anyone that most of their examples of bad science are the product of manufacturing industry; the authors are oblivious to bad science sponsored by the lawsuit industry or by non-governmental advocacy organizations (NGOs).
Weisberg and Thanukos frame scientific disagreements and debates as governed by both data and ethical norms. Science is not infinitely contestable. There are identifiable norms, including a norm that scientists should “seek relevant information,” and “scrutinize ideas and evidence.”[17] Contrary to Milward’s standard of judicial abstention and credulity in the face of dodgy causal claims, these authors state what should be obvious, that scientific scrutiny involves, among other things, “an evaluation of methods, considering potential biases and oversights.”[18]
The chapter’s authors, non-lawyers, get closer to the heart of the error in Milward’s abstention doctrine with their recognition of what should have been obvious to the authors of the law chapter (Richter & Capra):
“When research relevant to a trial has not yet been scrutinized by a community with the appropriate technical expertise, a judge may be placed in the position of providing or requesting this scrutiny.”[19]
Rather than some vague, subjective, and content-free WOE standard, Weisberg and Thanukos urge scientists, and by implication judges as well, to engage in serious efforts to “identify and avoid bias” and abide by ethical guidelines.[20] In other (my) words, the new authors agree that there is a standard of care reflected in the norms of science, and consequently there can be deviations from that standard. For Weisberg and Thanukos, compliance with the normative structure of scientific investigations is at the heart of building up accurate and predictive conclusions from data.[21] As part of their communitarian and normative conception of the scientific process, the authors appear to accept the reality and necessity for judges to act as gatekeepers.[22]
And while this recognition of standards, and of the need to police against deviations from them, is commendable, Weisberg and Thanukos proceed to give an abridgment of scientific method and process that is distorted and erroneous. They steadfastly ignore the concept of a hierarchy of evidence, and thus provide illegitimate cover for levelers of evidence. In discussing randomized controlled trials, for instance, they note that such trials are often taken as “the gold standard,” but then counter, without citation, support, or argument, that such trials “are just one line of evidence among many.”[23] The authors elide any discussion of how to reconcile matters when that “just one line of evidence” conflicts with observational studies.
Notwithstanding their helpful comments about the need to evaluate studies for bias and other errors, these authors enter the Milward controversy with the observation that assessing many lines of evidence is required, can be difficult for courts, and has led to “controversy.” Citing papers that include one by the late Margaret Berger, presented at the notorious lawsuit-industry, SKAPP-funded Coronado Conference, Weisberg and Thanukos float the observation that:
“In science, the available evidence (some of which may come from other research programs not designed to test the hypothesis under consideration) is evaluated as a body, along with the strengths, weaknesses, and caveats relating to each type of data, an approach which, some scholars have argued, the judiciary has not always followed.98”[24]
This claim that the available evidence is evaluated as “a body” is presented as a fact about how science works, without any citation or argument. Several comments are in order. First, the claim is at odds with the authors’ own statements that scientific norms require evaluating each study for biases and other disqualifying flaws. Second, the claim is at odds with the authors’ own reference to systematic reviews and meta-analyses,[25] which are governed by protocols with inclusionary and exclusionary criteria for individual studies, and which require consideration of individual study validity before a study enters the “body” of evidence that is quantitatively or qualitatively evaluated. In the authors’ words, “authors delineate both the criteria that studies must meet for inclusion in the review and the methods that will be used to assess the studies.”[26] The Milward case involved an expert witness who had proffered the very opposite of a systematic review, in the form of a post hoc rejiggering of studies and their data to fit a pre-conceived litigation goal. In the context of addressing the replication crisis, Weisberg and Thanukos correctly observe that “peer review alone cannot ensure that the conclusions of published studies are actually correct, highlighting the responsibility judges bear in evaluating the validity of the methodologies that contributed to a particular piece of research.”[27] Of course, the Milward case involved a hired expert witness whose unprincipled re-analysis of studies was never peer reviewed or published.
Third, the authors could easily have found additional support for the contrary proposition that individual studies must be evaluated before being considered as part of the entire evidentiary display. The IARC Preamble, which roughly describes how that agency arrives at its so-called hazard classifications of human carcinogenicity, specifies that individual studies within each of three streams of evidence are evaluated for validity and soundness before contributing to a sub-conclusion with respect to (1) epidemiology, (2) toxicology, and (3) mechanistic lines of evidence.[28] Each of those three lines of evidence is adjudged “sufficient,” “limited,” or “inadequate,” by specialists in the three respective areas, before an overall evaluation is reached. There is much that is objectionable in the IARC working group procedures, but this division of labor, and the need to consider disparate lines of evidence, and the studies within each line, separately before attempting a synthesis, are present in all systematic review methodology. The suggestion from Weisberg and Thanukos that “the available evidence” in science is “evaluated as a body” is not only unsupported, but demonstrably false and misleading.
This claim about holistic evaluation is a fairly transparent but failed attempt to support a claim made in the chapter on the admissibility of expert witness evidence by Liesa Richter and Daniel Capra, who present an exposition of the notorious Milward case, without criticism, in a way that suggests that the case represents appropriate judicial gatekeeping under Rule 702, and that the case is consistent with scientific norms.[29] The chapter on how science works, after stating a false claim about the scientific methodology for synthesizing and integrating disparate lines of evidence, attempts to provide a gloss on the similar and equally benighted claim of Richter and Capra, in footnote 98:
“98. Some scholars have raised concerns that the courts have on occasion unfairly dismissed numerous individual lines of evidence as being flawed or insufficiently conclusive and concluded that evidence is lacking, when in fact the body of evidence, taken as a whole, points to a clear conclusion. For more, see discussion of Milward v. Acuity Specialty Products Group, Inc.; see also Liesa L. Richter & Daniel J. Capra, The Admissibility of Expert Testimony, in this manual; Berger 2005, supra note 97; and Steve C. Gold, A Fitting Vision of Science for the Courtroom, 3 Wake Forest J.L. & Pol’y 1 (2013).”
Some “scholars” have indeed said such things in their more unscholarly moments; other scholars have criticized Milward, but they are not cited in this new methods chapter. The footnote is thus accurate, but highly misleading by omission. The First Circuit in Milward said as much, also without support or justification, and Richter and Capra, in their chapter of the Manual’s fourth edition, parrot the Milward case. Weisberg and Thanukos cite two articles, by Margaret Berger and by Steven Gold, both law professors, not scientists, and both ideologically hostile to Rule 702 gatekeeping. The Berger article came from the lawsuit-industry, SKAPP-funded symposium known as the Coronado Conference, and the Gold paper came out of a symposium sponsored by the lawsuit industry itself and by the Center for Progressive Reform, an advocacy NGO to which one of Mr. Milward’s expert witnesses, Carl Cranor, belongs. So the authors of the new science methodology chapter failed to cite any scientific source, but cited papers by lawyers in thrall to the lawsuit industry, and a single (infamous) decision that ignored Rules 702 and 703, as well as the extensive literature on systematic reviews. Weisberg and Thanukos could have cited many sources that contradicted their claim, and the claim of the lawsuit-industry-sponsored lawyers, but they did not. This is what biased and subversive scholarship looks like.
FUNDING BIAS – THE NEW McCARTHYISM
The selective citation to articles sponsored by the lawsuit industry is ironic in the context of what Weisberg and Thanukos have to say elsewhere about the “funding effect.” Some of what the authors say about personal bias is almost reasonable. For instance, they suggest that funding source is a “valid consideration” in evaluating methodologies and conclusions of expert testimony, and presumably of published studies as well, but not a sufficient reason to exclude such testimony or reliance.[30] Interestingly, these authors ignored the funding and the ideological interests of the symposia they cited in support of the repudiated Milward abstention doctrine.
Over three decades ago, Kenneth Rothman, the founder of Epidemiology, the official journal of the International Society for Environmental Epidemiology (ISEE), wrote his protest against the obsession with funding in an article that should have been cited in the new chapter, for balance. Rothman described the fixation on funding as the “new McCarthyism in science,” which manifested as intolerance toward industry-sponsored studies and strict scrutiny of “conflict-of-interest” (COI) disclosures.[31] The new McCarthyites amplify the gamesmanship over COI disclosures by excusing or justifying non-disclosure of COIs by scientists who have positional conflicts, or who are aligned with advocacy groups or with the lawsuit industry.
This asymmetrical standard for adjudging conflicts is on full display in the Weisberg and Thanukos chapter, when they claim that “in pharmaceuticals, there is a strong tendency for industry-sponsored trials to favor the industry’s product.”[32] The chapter authors, and their cited source, ignore the context in which pharmaceutical industry scientists publish clinical trial results. A successful clinical trial, showing efficacy with minimal adverse events, is the result of years of prior research, including preclinical testing and phase I and II trials. If efficacy fails to appear, or unreasonable harm emerges, at any of those earlier stages, the phase III trial is never done, and so never published. If the medication is never licensed, the phase III trial will generally not be published. The selection effects are obvious and overwhelming in ensuring that the published results of phase III trials will favor the sponsor. A “failed” phase III trial may, moreover, result in a securities class action against the pharmaceutical company. In the realm of observational studies, some work commissioned by manufacturing industry has its origins in the poorly conducted, flawed work of environmental zealots and NGOs. Manufacturing industry has an obvious interest in correcting the scientific record, and again, any carefully done study would tend to rebut the zealots’ work and favor the industry sponsor.
Elsewhere, the authors offer a more balanced assessment when they observe that “[a]ll research is potentially influenced by bias, and every funder of research has the potential to introduce a source of bias.”[33] Similarly, the fourth edition chapter notes that “[a]ll scientists have some sort of motivation for their work, and this does not preclude scientific knowledge building, so long as biased methodologies and interpretations are avoided.”[34] Their recognition that motivated reasoning is everywhere suggests that all research should receive scrutiny regardless of apparent or disclosed funding source.[35]
When it comes to providing examples of funding-effect distortions of science, Weisberg and Thanukos seem to blank on instances created by the lawsuit industry or by environmental NGOs. The reader should contrast how readily and stridently the authors point to bias in industry-sponsored research with how the authors tie themselves up with double negatives when making the same point about NGOs:
“That is not to suggest that government- or nongovernmental organization (NGO)-sponsored research is necessarily free from bias.”[36]
The cognitive dissonance is palpable. The only conclusion that could be drawn from such a locution is that Weisberg and Thanukos have not worked very hard to identify and disclose their own biases.
STATISTICS DONE POORLY
When it comes to explaining and discussing the role of statistical methods in the scientific process, Weisberg and Thanukos go off the rails. The new chapter is an unmitigated disaster, which should have been corrected in the peer review and oversight process. The first sign of trouble became apparent upon checking the definition of “p-value” in the chapter’s glossary:
“p-value. A statistic that gives the calculated probability that the null hypothesis could be true even given the observed differences between conditions.”[37]
This definition is the transposition fallacy on steroids. Obviously, a p-value cannot be the probability that the null hypothesis “could be true,” because the procedure for calculating a p-value must assume that the null hypothesis is true, along with a specified probability model. Equally important, the p-value attaches no probability to the null hypothesis at all; it gives the probability of observing data as extreme as, or more extreme than, the data seen in the particular sample, on the assumption that the null hypothesis is true. The statistics chapter in the Manual, by Hall and Kaye, states the meaning correctly. The coverage of statistical concepts by Weisberg and Thanukos should be studiously ignored.
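For readers who want the definition in symbols, the standard formulation (my restatement, not the chapter’s) is simply:

$$ p = \Pr\left(T \ge t_{\text{obs}} \mid H_0\right), $$

where \(T\) is the test statistic and \(t_{\text{obs}}\) is its observed value. The conditioning runs from the assumed null hypothesis to the data. The chapter’s glossary runs the conditioning backwards, toward \(\Pr(H_0 \mid \text{data})\), a different quantity that cannot be obtained without a prior probability and Bayes’ theorem.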
The outrageously incorrect definition of p-value in the glossary is not an isolated error. The authors are clearly statistically challenged. In the text of their chapter, they describe the p-value incorrectly, consistent with their aberrant glossary entry:
“the commonly used p-value approach, scientists compare a test hypothesis (e.g., that drug X is effective) to a null (e.g., that there is no difference in cure rates between those who took drug X and those who took a placebo). Scientists then calculate the probability that the null hypothesis could be true even with the observed difference between conditions (e.g., the cure rate of patients taking drug X compared to that of those taking a placebo).”[38]
Weisberg and Thanukos thus conflate frequentist and Bayesian statistics. They also obliterate the meaning of the confidence interval, an important concept for judges and lawyers to understand. Here is how the authors describe the confidence interval in their chapter:
“Evaluating estimates: In science (and in contrast to their lay meanings), the terms uncertainty and error refer to the variability of a set of data that is intended to estimate a single number. Uncertainty and error are generally expressed as a range, within which we are confident that, if the study were repeated, the new result would fall. Scientists often use a 95% confidence interval for this purpose.”[39]
Describing the confidence interval in the same sentence as “uncertainty and error” is bound to induce uncertainty and error. The confidence interval provides a range of estimates based upon random error, and captures uncertainty only in the form of imprecision in the point estimate. There are, of course, myriad other kinds of uncertainty and error not captured by the confidence interval. The most important of the authors’ errors is their incorrect assertion that the confidence interval provides a range within which the results of a repetition of the study would fall. This is, again, a variant of the transposition fallacy that the authors commit in their definition of the p-value. The confidence interval provides a range of results that would not be rejected as alternative null hypotheses by the data in the obtained sample. Because of random error, future samples would give different results, with different confidence intervals, which would not be co-extensive with the first obtained confidence interval. To be sure, the statistics chapter states the matter correctly, and the epidemiology chapter finally gets it correct in its text (after having mangled the concept in the second and third editions), but the epidemiology chapter perpetuates its previous errors in defining confidence intervals in its glossary. This sort of issue, and it is a serious one, could have been eliminated had there been meaningful peer review and editorial oversight for consistency and accuracy of the Manual as a whole.
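A few lines of simulation show what the “95%” actually warrants. In this minimal sketch (all parameter values are invented for illustration), roughly 95% of the intervals generated by repeated sampling cover the true parameter; no single interval promises to contain the next study’s estimate:

```python
import numpy as np

rng = np.random.default_rng(42)
true_mean, sd, n, reps = 10.0, 2.0, 50, 10_000
z = 1.96  # normal-approximation critical value for a 95% interval

covered = 0
for _ in range(reps):
    sample = rng.normal(true_mean, sd, n)
    m = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(n)
    if m - z * se <= true_mean <= m + z * se:
        covered += 1

print(f"Share of intervals covering the true mean: {covered / reps:.3f}")
# ~0.95 -- the "95%" describes the long-run performance of the interval-
# generating procedure, not the chance that a repeated study's estimate
# will land inside any one computed interval.
```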
Weisberg and Thanukos address statistical power in a way that may also mislead readers. They tell us that “[p]ower refers to a test’s ability to reject a hypothesis that is indeed false.” W&T at 88. If only it were so. The authors omit that power is the probability that, at a specified level of significance (say p < 0.05), and given a specified alternative hypothesis, sample size, and probability model, the sample result will reject the null hypothesis in favor of the alternative hypothesis. The authors then suggest, confusingly, that “[w]ell-designed studies have sufficient power to detect the differences of interest, but it may not be apparent when a test lacks power.”[40]
If the study at issue presents a confidence interval around a point estimate of interest, then it will be clear which alternative null hypotheses are statistically compatible with the sample result at the pre-specified level of alpha (significance). Any point outside the interval would be rejected by such a test of significance, and so even the casual reader will have a rather good idea of what could and could not be rejected by the sample data. And of course, virtually every study will have low power to detect extremely small increased risks, say a relative risk of 1.00001, and most studies will have high power to detect risk ratios over 1,000.
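Power is readily computed once the inputs are specified, which is why the omission matters. Here is a minimal sketch, with invented inputs (a two-arm z-test, alpha of 0.05, 100 subjects per arm, unit standard deviation, and a true mean difference of 0.25), showing how modest the power of a respectable-looking study can be:

```python
from scipy.stats import norm

# Two-sided, two-sample z-test power; all inputs are assumed for illustration.
alpha = 0.05   # significance level
n = 100        # subjects per arm
sd = 1.0       # common standard deviation
delta = 0.25   # true mean difference under the alternative hypothesis

se = sd * (2 / n) ** 0.5          # standard error of the difference in means
z_crit = norm.ppf(1 - alpha / 2)  # ~1.96
power = norm.cdf(delta / se - z_crit) + norm.cdf(-delta / se - z_crit)
print(f"Power: {power:.2f}")      # ~0.42, far short of the conventional 0.80
```

Change any assumed input (the effect size, the sample size, the variance) and the power changes with it, which is why a bare assertion that a study did or did not “lack power” means little without the full specification.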
This new chapter on “How Science Works” also propagates some well-known fallacies about statistical significance testing. Implicit in the authors’ commission of the transposition fallacy is a conceptual and mathematical confusion between the coefficient of confidence (1 − α) and the posterior probability of a hypothesis.
The authors’ mistake comes in their insistence upon labeling precision in a test result as “certainty.” In the quotation below, the confusion is plain:
“Note that the 95% and 5% cutoffs are somewhat arbitrary, and a higher degree of confidence might be required if more certainty were desired—for example if an impactful policy decision depended on the conclusion.”[41]
An impactful [sic] policy decision might well call for more certainty, in the sense of a higher posterior probability, but a higher coefficient of confidence does not necessarily map onto the probability of the hypothesis at all. The authors’ confusion and conflation of alpha and the Bayesian posterior probability arise elsewhere within the chapter:
“(1) A p-value lower than 0.05 does not prove that a null hypothesis is false. It is strong evidence, but there is a small chance that the difference observed could be the result of chance alone.
(2) Using a low p-value (e.g., 0.05) as a criterion for significance sets a high bar for rejecting the null hypothesis, minimizing the chance of getting a false positive… .”[42]
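The second numbered claim is demonstrably wrong as a statement about the probability that a “significant” finding is a true positive. A minimal simulation, with assumed numbers (100,000 tests, only 10% of hypotheses truly non-null, 80% power, alpha of 0.05), shows that the share of false positives among nominally significant results can greatly exceed five percent:

```python
import numpy as np

rng = np.random.default_rng(0)
n_tests = 100_000
base_rate = 0.10            # assume only 10% of tested hypotheses are truly non-null
power, alpha = 0.80, 0.05   # assumed power and significance level

real = rng.random(n_tests) < base_rate
significant = np.where(real,
                       rng.random(n_tests) < power,   # true effects found at rate 'power'
                       rng.random(n_tests) < alpha)   # true nulls rejected at rate alpha

false_pos = (~real & significant).sum()
print(f"False-positive share of 'significant' results: {false_pos / significant.sum():.2f}")
# ~0.36: more than a third of the nominally significant findings are false
# positives, before any multiple-comparison or selective-reporting problems
# are even considered.
```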
Again, a p-value less than five percent is hardly strong evidence in the context of large database studies, especially when there are multiple comparisons and the outcome is not the pre-specified outcome of the analysis. The authors’ confusion is on full display when they discuss the Zoloft birth defects litigation, where the Third Circuit affirmed the exclusion of plaintiffs’ expert witnesses’ causation opinions and the grant of summary judgment to the defendants. According to the authors’ narrative:
“plaintiffs’ expert’s testimony would have argued that multiple, nonsignificant associations between Zoloft use and birth defects indicated a causal relationship. The testimony was excluded because these results were consistent with a weak causal relationship (a small effect size), one that is ‘so weak that one cannot conclude that the risk is greater than that seen in the general population’.”[43]
Of course, in the Zoloft litigation, the excluded plaintiffs’ expert witnesses were caught red-handed – at cherry picking – and at attempting to circumvent the lack of significance with methodologically incorrect meta-analyses.[44]
If the risk of birth defects among children born to mothers who used Zoloft in pregnancy was no greater than that seen in the general population, then there would be no risk, not a risk “so weak” that it cannot be seen. Locutions such as the “results were consistent with a weak causal relationship,” when the results were equally consistent with no causal relationship, suggest that the writers cannot bring themselves to say that the causal hypothesis was simply not supported at all. Of course, no study can exclude an increased risk of 0.01 percent, or a relative risk of 1.01, but at some point, when multiple attempts fail to reveal an increased risk, we may conclude that the proponents of the causal claim have failed to make their case.
META-SHMETA-ANALYSIS
Weisberg and Thanukos address meta-analysis incompletely in the context of systematic reviews. The authors do not provide any insights into how meta-analyses are done, and more glaringly, they fail to mention that not all systematic reviews can or should result in quantitative syntheses of estimates of association. On the positive side, they state that meta-analyses are important in litigation, and that the application of rigorous methodologies should be required.[45] With clearly unintended irony, Weisberg and Thanukos offer, as support for their statement, the Paoli Railroad Yard case, “in which the exclusion of a contested meta-analysis was overturned.”[46]
Weisberg and Thanukos have stepped into the wet corner of a pigsty. The issue in the Paoli case arose from a meta-analysis of mortality rates associated with polychlorinated biphenyl (PCB) exposures. The district court excluded the proffered meta-analysis, not because it was unreliable, but because it was novel. Holding the case up in conjunction with a statement about the application of rigorous or reliable methodologies misses the relevant legal point.
The expert witness who proffered the meta-analysis in Paoli was William Nicholson, who was a physicist with no professional training in epidemiology. For his opinion that PCBs were causally associated with human liver cancer, Nicholson relied upon a non-peer-reviewed, unpublished report he wrote for the Ontario Ministry of Labor.[47] Nicholson described his report as a “study of the data of all the PCB worker epidemiological studies that had been published,” from which he concluded that there was “substantial evidence for a causal association between excess risk of death from cancer of the liver, biliary tract, and gall bladder and exposure to PCBs.”[48]
The defense challenged Nicholson’s opinion, not on Rule 702, but on case law that pre-dated the Daubert decision.[49] The challenge included pointing out the unreliability of Nicholson’s meta-analysis, but also asserted (incorrectly) the novelty of meta-analysis generally. The district court sustained the defense objection on the grounds of “novelty,” without reaching the reliability analysis.[50] The Third Circuit appropriately reversed and remanded for consideration of the reliability of Nicholson’s meta-analysis.[51]
The consideration of Nicholson’s “meta-analysis” never occurred on remand; plaintiffs’ counsel and their expert witnesses withdrew their reliance upon Nicholson’s analysis. Their about-face was highly prudent. Nicholson’s report presented SMRs (standardized mortality ratios); for the all-cancers statistic, he reported an SMR of 95. What Nicholson did, in this analysis and in all other instances, was simply to divide the observed number of deaths by the expected, and multiply by 100. This crude, simplistic calculation fails to yield a standardized mortality ratio, which requires taking into account the age distributions of the exposed and unexposed groups, and weighting the contribution of cases within each age stratum. Nicholson’s presentation of data was nothing short of a fraud.
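For readers unfamiliar with the technique, here is a minimal sketch, with invented person-years and reference rates, of how an indirectly standardized SMR is ordinarily computed. The expected count is built up stratum by stratum from age-specific reference rates, rather than conjured as a single crude figure:

```python
# Indirectly standardized mortality ratio (SMR): a sketch with invented numbers.

strata = [
    # (cohort person-years, reference death rate per person-year)
    (5_000, 0.001),   # ages 30-44
    (3_000, 0.004),   # ages 45-59
    (1_000, 0.015),   # ages 60+
]
observed_deaths = 30  # assumed count in the exposed cohort

# Expected deaths accumulate stratum by stratum, so that the cohort's age
# structure weights the comparison -- the step Nicholson skipped.
expected = sum(person_years * rate for person_years, rate in strata)
smr = 100 * observed_deaths / expected
print(f"Expected deaths: {expected:.1f}; SMR: {smr:.1f}")  # 32.0; 93.8
```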
Nicholson’s Report was replete with many other methodological sins. He used a composite of three organs (liver, gall bladder, bile duct) without any biological rationale. His analysis combined male and female results, and even so, his analysis of the composite outcome was based upon only seven cases. Of those seven, some were not confirmed as primary liver cancer, and at least one was confirmed as not being a primary liver cancer.[52]
As noted, Nicholson failed to standardize the analysis for the age distribution of the observed and expected cases, and he failed to present meaningful analysis of random or systematic error. When he did present p-values, he presented one-tailed values, and he made no corrections for his many comparisons from the same set of data.
Finally, and most egregiously, Nicholson’s meta-analysis was a meta-analysis in name only. What he had done was simply to add “observed” and “expected” events across studies to arrive at totals, and to calculate from those totals a bogus risk ratio, which he fraudulently called a standardized mortality ratio. Adding events across studies, without weighting each study by the inverse of its variance, is not a valid meta-analysis; indeed, it is a well-known recipe for the error known as Simpson’s Paradox, which can change the direction or magnitude of an association.[53]
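A toy example, with invented counts, makes the danger concrete. In each of the two hypothetical studies below, the exposed group fares better; naively adding the counts across studies, Nicholson-style, reverses the direction of the association:

```python
# Simpson's paradox from naive pooling across studies; all counts are invented.

studies = [
    # (exposed deaths, exposed N, unexposed deaths, unexposed N)
    (1, 100, 30, 1_000),     # study 1: 1.0% vs 3.0% -- exposed fares better
    (200, 1_000, 30, 100),   # study 2: 20.0% vs 30.0% -- exposed fares better
]

for i, (ed, en, ud, un) in enumerate(studies, 1):
    print(f"Study {i}: exposed {ed/en:.1%} vs unexposed {ud/un:.1%}")

ed, en, ud, un = (sum(col) for col in zip(*studies))  # naive addition of counts
print(f"Pooled : exposed {ed/en:.1%} vs unexposed {ud/un:.1%}")
# Pooled : exposed 18.3% vs unexposed 5.5% -- the association flips, which is
# why a proper meta-analysis weights each study's effect estimate (e.g., by
# inverse variance) instead of summing raw counts.
```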
In citing to the Paoli case as a reversal of exclusion of a contested meta-analysis, Weisberg and Thanukos give a truncated analysis that misleads readers, judges, and lawyers. There never was a proper consideration of the reliability vel non of Nicholson’s meta-analysis in the Paoli litigation, and in the final analysis, the Paoli plaintiffs abandoned reliance upon Nicholson’s ill-conceived meta-analysis.
VIRTUE SIGNALING
Although there are no land acknowledgments for the property on which the Federal Judicial Center building sits, Weisberg and Thanukos miss few opportunities to let us know that they are woke scholars. There is the gratuitous and triggering “pregnant people,”[54] which begs any number of biological questions. Then there is the authors’ statement that they are limiting their focus to the “Western conception of science,” which begs another question: why would we call any other epistemically valid approach, from any corner of the globe, something other than “science”?[55]
Equally gratuitous are the authors’ endorsements of DEI and “diversity,” with overbroad generalizations that diversity per se advances science,[56] and a claim that “women, people of color, other historically oppressed groups, and non-Western people” are not taken seriously as scientists.[57] In over 40 years of litigating technical and scientific issues, I have never seen a judge or a lawyer disrespect an expert witness based upon sex, race, ethnicity, or national origin. Of course, I have seen expert witnesses treated roughly for propounding bad science, and that seems perfectly appropriate.
[1] See David Goodstein, ON FACT AND FRAUD: CAUTIONARY TALES FROM THE FRONT LINES OF SCIENCE (2010).
[2] Weisberg and Thanukos frequently refer to other chapters in the Manual, which suggests that their chapter was written late in the development of the Fourth Edition, and perhaps contributed to the delayed publication.
[3] Michael Weisberg & Anastasia Thanukos, How Science Works, in National Academies of Sciences, Engineering, and Medicine & Federal Judicial Center, REFERENCE MANUAL ON SCIENTIFIC EVIDENCE 47 (4th ed. 2025) [cited as W&T].
[4] See Michael Weisberg, University of Pennsylvania Philosophy, at https://philosophy.sas.upenn.edu/people/michael-weisberg.
[5] Anna Thanukos, Staff, available at https://ucmp.berkeley.edu/people/anna-thanukos/
[6] W&T at 72-75.
[7] W&T at 81.
[8] W&T at 81.
[9] W&T at 81 & n.85 (emphasis added), citing Naomi Oreskes & Erik M. Conway, MERCHANTS OF DOUBT: HOW A HANDFUL OF SCIENTISTS OBSCURED THE TRUTH ON ISSUES FROM TOBACCO SMOKE TO GLOBAL WARMING (2010).
[10] W&T at 94-96.
[11] W&T at 95 n.120.
[12] Richard Van Noorden, More than 10,000 research papers were retracted in 2023 — a new record, 624 NATURE 479 (2023).
[13] W&T at 95.
[14] W&T at 55.
[15] W&T at 63, 68.
[16] W&T at 68.
[17] W&T at 65.
[18] W&T at 70.
[19] W&T at 71.
[20] W&T at 66.
[21] W&T at 75.
[22] W&T at 49.
[23] W&T at 83.
[24] W&T at 86 (citing Richter and Capra’s discussion of Milward in chapter one of the Manual, and Professor Gold’s article from the lawsuit industry celebratory conference on the Milward case).
[25] W&T at 99-100.
[26] W&T at 99.
[27] W&T at 96 (emphasis added).
[28] IARC MONOGRAPHS ON THE IDENTIFICATION OF CARCINOGENIC HAZARDS TO HUMANS – PREAMBLE (2019), available at https://monographs.iarc.who.int/wp-content/uploads/2019/07/Preamble-2019.pdf
[29] Liesa L. Richter & Daniel J. Capra, The Admissibility of Expert Testimony, National Academies of Sciences, Engineering, and Medicine & Federal Judicial Center, REFERENCE MANUAL ON SCIENTIFIC EVIDENCE 1, 32-33 (4th ed. 2025).
[30] W&T at 76.
[31] Kenneth J. Rothman, “Conflict of interest: the new McCarthyism in science,” 269 J. AM. MED. ASS’N 2782 (1993). See Schachtman, The Rhetoric and Challenge of Conflicts of Interest, TORTINI (July 30, 2013).
[32] W&T at 76 & n.67, citing Sergio Sismondo, Pharmaceutical Company Funding and Its Consequences: A Qualitative Systematic Review, 29 CONTEMP. CLINICAL TRIALS 109 (2008).
[33] W&T at 77.
[34] W&T at 59-60.
[35] W&T at 59-60.
[36] W&T at 76.
[37] W&T at 111.
[38] W&T at 87.
[39] W&T at 90.
[40] W&T at 88.
[41] W&T at 90 (emphasis added).
[42] W&T at 88.
[43] W&T at 90 (internal citations omitted).
[44] In re Zoloft (Sertraline Hydrochloride) Prods. Liab. Litig., 26 F. Supp. 3d 449 (E.D. Pa. 2014); No. 12-md-2342, 2015 WL 314149, at *3 (E.D. Pa. Jan. 23, 2015) (rejecting proffered expert witness opinion based upon “cherry-picking of studies and data within studies”), aff’d, 858 F.3d 787 (3rd Cir. 2017).
[45] W&T at 99.
[46] W&T at 99 & n.134, citing In re Paoli R.R. Yard PCB Litig., 916 F.2d 829 (3d Cir. 1990).
[47] William Nicholson, Report to the Workers’ Compensation Board on Occupational Exposure to PCBs and Various Cancers, for the Industrial Disease Standards Panel (ODP); IDSP Report No. 2 (Toronto Dec. 1987) [Report].
[48] Id. at 373.
[49] See United States v. Downing, 753 F.2d 1224 (3d Cir. 1985).
[50] In re Paoli R.R. Yard PCB Litig., 706 F. Supp. 358, 372-73 (E.D. Pa. 1988).
[51] In re Paoli R.R. Yard PCB Litig., 916 F.2d 829 (3d Cir. 1990), cert. denied sub nom. General Elec. Co. v. Knight, 499 U.S. 961 (1991).
[52] Report, Table 22.
[53] See James A. Hanley, et al., Simpson’s Paradox in Meta-Analysis, 11 EPIDEMIOLOGY 613 (2000); H. James Norton & George Divine, Simpson’s paradox and how to avoid it, SIGNIFICANCE 40 (Aug. 2015); George Udny Yule, Notes on the theory of association of attributes in statistics, 2 BIOMETRIKA 121 (1903).
[54] W&T at 84.
[55] W&T at 50.
[56] W&T at 71 nn. 52-54.
[57] W&T at 102.
