TORTINI

For your delectation and delight, desultory dicta on the law of delicts.

Palavering About P-Values

August 17th, 2019

The American Statistical Association’s most recent confused and confusing communication about statistical significance testing has given rise to great mischief in the world of science and science publishing.[1] Take, for instance, last week’s opinion piece asking “Is It Time to Ban the P Value?” Please.

Helena Chmura Kraemer is an accomplished professor of statistics at Stanford University. This week the Journal of the American Medical Association network flagged Professor Kraemer’s opinion piece on p-values as one of its most read articles. Kraemer’s eye-catching title creates the impression that the p-value is unnecessary and inimical to valid inference.[2]

Remarkably, Kraemer’s article commits the very mistake that the ASA set out to correct back in 2016,[3] by conflating the probability of the data under a hypothesis of no association with the probability of a hypothesis given the data:

“If P value is less than .05, that indicates that the study evidence was good enough to support that hypothesis beyond reasonable doubt, in cases in which the P value .05 reflects the current consensus standard for what is reasonable.”

The ASA tried to break the bad habit of scientists’ interpreting p-values as allowing us to assign posterior probabilities, such as beyond a reasonable doubt, to hypotheses, but obviously to no avail.
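To make the distinction concrete, here is a minimal sketch of my own (not from Kraemer’s article or the ASA statements), using the familiar Sellke-Bayarri-Berger lower bound on the Bayes factor, of how far a p-value of 0.05 is from establishing anything “beyond a reasonable doubt”:

```python
# Illustrative only: the calibration BF(H0 vs H1) >= -e * p * ln(p), valid for
# p < 1/e, gives the case *most favorable* to rejecting the null hypothesis.
import math

def posterior_null_lower_bound(p, prior_null=0.5):
    """Lower bound on P(H0 | data) implied by a two-sided p-value (p < 1/e)."""
    assert 0 < p < 1 / math.e
    min_bayes_factor = -math.e * p * math.log(p)   # minimum Bayes factor for H0
    prior_odds = prior_null / (1 - prior_null)
    posterior_odds = prior_odds * min_bayes_factor
    return posterior_odds / (1 + posterior_odds)

for p in (0.05, 0.01, 0.005):
    print(f"p = {p:<6} ->  P(H0 | data) >= {posterior_null_lower_bound(p):.2f}")
# Even at even prior odds, p = 0.05 leaves a posterior probability of the null
# of roughly 0.29 or more -- nothing like proof beyond a reasonable doubt.
```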

Kraemer also ignores the ASA 2016 Statement’s teaching about what the p-value is not and cannot do, by claiming that p-values are determined by factors other than random error, such as:

“the reliability and sensitivity of the measures used, the quality of the design and analytic procedures, the fidelity to the research protocol, and in general, the quality of the research.”

Kraemer provides errant advice and counsel by insisting that “[a] non-significant result indicates that the study has failed, not that the hypothesis has failed.” If the p-value is the probability of observing an association at least as large as the one obtained, given an assumed null hypothesis, then of course a large p-value cannot speak to the failure of the hypothesis, but why declare that the study has failed? The study may have been indeterminate, but it still yielded information that can be combined with other data, or that may help guide future studies.
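A hypothetical numerical sketch (the figures below are invented for illustration) shows how a study that is “non-significant” on its own can still contribute to inference when pooled with other data, here by a simple inverse-variance (fixed-effect) meta-analysis:

```python
# Hypothetical numbers, chosen only to illustrate the point.
import math

def two_sided_p(z):
    return math.erfc(abs(z) / math.sqrt(2))

studies = [                      # (log relative risk, standard error)
    (math.log(1.30), 0.16),      # study A: p ~ 0.10, "non-significant"
    (math.log(1.25), 0.14),      # study B: p ~ 0.11, "non-significant"
]
for b, se in studies:
    print(f"single study: RR = {math.exp(b):.2f}, p = {two_sided_p(b / se):.2f}")

weights = [1 / se**2 for _, se in studies]
pooled = sum(w * b for (b, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
print(f"pooled:       RR = {math.exp(pooled):.2f}, "
      f"p = {two_sided_p(pooled / pooled_se):.3f}")
# Neither study alone reaches p < 0.05, yet the pooled estimate does (~0.02):
# an "indeterminate" study is not a worthless study.
```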

Perhaps in her most misleading advice, Kraemer asserts that:

“[w]hether P values are banned matters little. All readers (reviewers, patients, clinicians, policy makers, and researchers) can just ignore P values and focus on the quality of research studies and effect sizes to guide decision-making.”

Really? If a high quality study finds an “effect size” of interest, we can now ignore random error?
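A hypothetical example (the counts are invented) of why not: a small study can report a doubled risk while its 95% confidence interval remains fully compatible with no effect at all.

```python
# Invented counts for a small cohort, to show a large "effect size" swamped by
# random error. Standard error of log RR uses the usual large-sample formula.
import math

cases_exposed, n_exposed = 8, 100
cases_unexposed, n_unexposed = 4, 100

rr = (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)
se_log_rr = math.sqrt(1/cases_exposed - 1/n_exposed
                      + 1/cases_unexposed - 1/n_unexposed)
lo = math.exp(math.log(rr) - 1.96 * se_log_rr)
hi = math.exp(math.log(rr) + 1.96 * se_log_rr)
print(f"RR = {rr:.1f}, 95% CI {lo:.2f}-{hi:.2f}")
# RR = 2.0, 95% CI roughly 0.62 to 6.4: a doubling of risk that tells us almost
# nothing until random error is taken into account.
```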

The ASA 2016 Statement, with its “six principles,” has provoked some deliberate or ill-informed distortions in American judicial proceedings, but Kraemer’s editorial creates idiosyncratic meanings for p-values. Even the 2019 ASA “post-modernism” does not advocate ignoring random error and p-values, as opposed to proscribing dichotomous characterization of results as “statistically significant,” or not.[4] The current author guidelines for articles submitted to the Journals of the American Medical Association clearly reject this new-fangled rejection of the need to assess the role of random error.[5]


[1]  See Ronald L. Wasserstein, Allen L. Schirm, and Nicole A. Lazar, “Editorial: Moving to a World Beyond ‘p < 0.05’,” 73 Am. Statistician S1, S2 (2019).

[2]  Helena Chmura Kraemer, “Is It Time to Ban the P Value?” J. Am. Med. Ass’n Psych. (August 7, 2019), in press at doi:10.1001/jamapsychiatry.2019.1965.

[3]  Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s Statement on p-Values: Context, Process, and Purpose,” 70 The American Statistician 129 (2016).

[4]  “Has the American Statistical Association Gone Post-Modern?” (May 24, 2019).

[5]  See instructions for authors at https://jamanetwork.com/journals/jama/pages/instructions-for-authors

Mass Torts Made Less Bad – The Zambelli-Weiner Affair in the Zofran MDL

July 30th, 2019

Judge Saylor, who presides over the Zofran MDL, handed down his opinion on the Zambelli-Weiner affair on July 25, 2019.[1] As discussed on these pages back in April of this year,[2] GlaxoSmithKline (GSK), the defendant in the Zofran birth defects litigation, sought documents from plaintiffs and Dr. Zambelli-Weiner (ZW) about her published study on Zofran and birth defects.[3] Plaintiffs refused to respond to the discovery on grounds of attorney work product[4] and of consulting expert witness confidential communications.[5] After an abstract of ZW’s study appeared in print, GSK subpoenaed ZW and her co-author, Dr. Russell Kirby, for a deposition and for production of documents.

Plaintiffs’ counsel sought a protective order. Their opposition relied upon a characterization of ZW as a research scientist; they conveniently omitted any mention of their retention of her as a paid expert witness. In December 2018, the MDL court denied plaintiffs’ motion for a protective order, and allowed the deposition to go forward to explore the financial relationship between counsel and ZW.

In January 2019, when GSK served ZW with its subpoena duces tecum, ZW, through her own counsel, moved for a protective order, supported by her affidavit asserting facts to show that she was not subject to the deposition. The MDL court quickly denied her motion, and in short order, her lawyer notified the court that ZW’s affidavit contained “factual misrepresentations,” which she refused to correct, and he sought leave to withdraw.

According to the MDL court, the ZW affidavit contained three falsehoods. She claimed not to have been retained by any party when she had been a paid consultant to plaintiffs at times over the previous five years, since December 2014. ZW claimed that she had no factual information about the litigation, when in fact she had participated in a Las Vegas plaintiffs’ lawyers’ conference, “Mass Torts Made Perfect,” in October 2015. Furthermore, ZW falsely claimed that monies received from plaintiffs’ law firms did not go to fund the Zofran study, but went to her company, Translational Technologies International Health Research & Economics, for unrelated work. ZW received in excess of $200,000 for her work on the Zofran study.

After ZW obtained new counsel, she gave deposition testimony in February 2019, when she acknowledged the receipt of money for the study, and the lengthy relationship with plaintiffs’ counsel. Armed with this information, GSK moved for full responses to its document requests. Again, plaintiffs’ counsel and ZW resisted on grounds of confidentiality and privilege.

Judge Saylor reviewed the requested documents in camera, and held last week that they were not protected by consulting expert witness privilege or by attorney work product confidentiality. ZW’s materials and communications in connection with the Las Vegas plaintiffs’ conference never had the protection of privilege or confidentiality. ZW presented at a “quasi-public” conference attended by lawyers who had no connection to the Zofran litigation.[6]

With respect to work product claims, Judge Saylor found that GSK had shown “exceptional circumstances” and “substantial need” for the requested materials given that the plaintiffs’ testifying expert witnesses had relied upon the ZW study, which had been covertly financially supported by plaintiffs’ lawyers.[7] With respect to whatever was thinly claimed to be privileged and confidential, Judge Saylor found the whole arrangement to fail the smell test:[8]

“It is troublesome, to say the least, for a party to engage a consulting, non-testifying expert; pay for that individual to conduct and publish a study, or otherwise affect or influence the study; engage a testifying expert who relies upon the study; and then cloak the details of the arrangement with the consulting expert in the confidentiality protections of Rule 26(b) in order to conceal it from a party opponent and the Court. The Court can see no valid reason to permit such an arrangement to avoid the light of discovery and the adversarial process. Under the circumstances, GSK has made a showing of substantial need and an inability to obtain these documents by other means without undue hardship.

Furthermore, in this case, the consulting expert made false statements to the Court as to the nature of her relationship with plaintiffs’ counsel. The Court would not have been made aware of those falsehoods but for the fact that her attorney became aware of the issue and sought to withdraw. Certainly plaintiffs’ counsel did nothing at the time to correct the false impressions created by the affidavit. At a minimum, the submission of those falsehoods effectively waived whatever protections might otherwise apply. The need to discover the truth and correct the record surely outweighs any countervailing policy in favor of secrecy, particularly where plaintiffs’ testifying experts have relied heavily on Dr. Zambelli-Weiner’s study as a basis for their causation opinions. In order to effectively cross-examine plaintiffs’ experts about those opinions at trial, GSK is entitled to review the documents. At a minimum, the documents shed additional light on the nature of the relationship between Dr. Zambelli-Weiner and plaintiffs’ counsel, and go directly to the credibility of Dr. Zambelli-Weiner and the reliability of her study results.”

It remains to be seen whether Judge Saylor will refer the matter of ZW’s false statements in her affidavit to the U.S. Attorney’s office, or the lawyers’ complicity in perpetuating these falsehoods to disciplinary boards.

Mass torts will never be perfect, or even very good. Judge Saylor, however, has managed to make the Zofran litigation a little less bad.


[1]  Memorandum and order on In Camera Production of Documents Concerning Dr. April Zambelli-Weiner, In re Zofran Prods. Liab. Litig., MDL 2657, D.Mass. (July 25, 2019) [cited as Mem.].

[2]  NAS, “Litigation Science – In re Zambelli-Weiner” (April 8, 2019).

[3]  April Zambelli-Weiner, et al., “First Trimester Ondansetron Exposure and Risk of Structural Birth Defects,” 83 Reproductive Toxicol. 14 (2019).

[4]  Fed. R. Civ. P. 26(b)(3).

[5]  Fed. R. Civ. P. 26(b)(4)(D).

[6]  Mem. at 7-9.

[7]  Mem. at 9.

[8]  Mem. at 9-10.

Statistical Significance at the New England Journal of Medicine

July 19th, 2019

Some wild stuff has been going on in the world of statistics, at the American Statistical Association, and elsewhere. A very few obscure journals have declared p-values to be verboten, and presumably confidence intervals as well. The world of biomedical research has generally reacted more sanely, with authors defending the existing frequentist approaches and standards.[1]

This week, the editors of the New England Journal of Medicine have issued new statistical guidelines for authors. The Journal’s approach seems appropriately careful and conservative for the world of biomedical research. In an editorial introducing the new guidelines,[2] the Journal editors remind their potential authors that statistical significance and p-values are here to stay:

“Despite the difficulties they pose, P values continue to have an important role in medical research, and we do not believe that P values and significance tests should be eliminated altogether. A well-designed randomized or observational study will have a primary hypothesis and a prespecified method of analysis, and the significance level from that analysis is a reliable indicator of the extent to which the observed data contradict a null hypothesis of no association between an intervention or an exposure and a response. Clinicians and regulatory agencies must make decisions about which treatment to use or to allow to be marketed, and P values interpreted by reliably calculated thresholds subjected to appropriate adjustments have a role in those decisions.”[3]

The Journal’s editors described their revamped statistical policy as being based upon three premises:

(1) adhering to prespecified analysis plans if they exist;

(2) declaring associations or effects only for statistical analyses that have pre-specified “a method for controlling type I error”; and

(3) presenting “both point estimates and their margins of error” when reporting evidence about clinical benefits or harms.

With a hat tip to the ASA’s recent pronouncements on statistical significance,[4] the editors suggest that their new guidelines have moved away from using statistical significance “as a bright-line marker for a conclusion or a claim”[5]:

“[T]he notion that a treatment is effective for a particular outcome if P < 0.05 and ineffective if that threshold is not reached is a reductionist view of medicine that does not always reflect reality.”[6]

The editors’ language intimates greater latitude for authors in claiming associations or effects from their studies, but this latitude may well be circumscribed by tighter control over such claims in the inevitable context of multiple testing within a dataset.

The editors’ introduction of the new guidelines is not entirely coherent. The introductory editorial notes that the use of p-values for reporting multiple outcomes, without adjustments for multiplicity, inflates the number of findings with p-values less than 5%. The editors thus caution against “uncritical interpretation of multiple inferences,” which can be particularly threatening to valid inference when not all the comparisons conducted by the study investigators have been reported in their manuscript.[7] They reassuringly tell prospective authors that many methods are available to adjust for multiple comparisons, and can be used to control Type I error probability “when specified in the design of a study.”[8]
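A minimal simulation, written for this post rather than taken from the Journal’s guidance, illustrates the inflation the editors describe, and how even a crude Bonferroni adjustment restores control of the family-wise type I error rate:

```python
# All 20 outcomes are truly null, yet testing each at alpha = 0.05 produces at
# least one "significant" finding in roughly two-thirds of simulated trials.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials, n_outcomes, n_per_arm, alpha = 2000, 20, 100, 0.05

any_raw = any_bonferroni = 0
for _ in range(n_trials):
    treated = rng.normal(size=(n_outcomes, n_per_arm))   # same distribution
    control = rng.normal(size=(n_outcomes, n_per_arm))   # in both arms
    p = stats.ttest_ind(treated, control, axis=1).pvalue
    any_raw += (p < alpha).any()
    any_bonferroni += (p < alpha / n_outcomes).any()

print(f"family-wise false-positive rate, unadjusted: {any_raw / n_trials:.2f}")        # ~0.64
print(f"family-wise false-positive rate, Bonferroni: {any_bonferroni / n_trials:.2f}")  # ~0.05
```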

But what happens when such adjustment methods are not pre-specified in the study design? Failure to do so does not appear to be a disqualifying factor for publication in the Journal. For one thing, when the statistical analysis plan of the study has not specified adjustment methods for controlling type I error probabilities, authors must replace p-values with “estimates of effects or association and 95% confidence intervals.”[9] It is hard to understand how this edict helps when the specified coefficient of 95% is simply the complement of the same 5% alpha, which would have been used in any event. The editors seem to be saying that if authors fail to pre-specify or even post-specify methods for controlling error probabilities, then they cannot declare statistical significance or use p-values, but they can use confidence intervals in the same way they have been using them, with the same misleading interpretations supplied by their readers.
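The point can be seen in a short sketch of my own, assuming a simple large-sample analysis of a hypothetical relative risk: the 95% interval and the two-sided test at the 5% level are the same calculation viewed from different ends.

```python
# Hypothetical estimate and standard error, for illustration only.
import math

log_rr, se = math.log(1.50), 0.18
z = log_rr / se
p = math.erfc(abs(z) / math.sqrt(2))     # two-sided p-value for a z statistic
lo, hi = math.exp(log_rr - 1.96 * se), math.exp(log_rr + 1.96 * se)

print(f"RR = {math.exp(log_rr):.2f}, 95% CI {lo:.2f}-{hi:.2f}, p = {p:.3f}")
# RR = 1.50, 95% CI 1.05-2.13, p = 0.024: the interval excludes RR = 1.0 exactly
# when p < 0.05, because the 95% coefficient is the complement of the 5% alpha.
```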

More important, another price authors will have to pay for multiple testing without pre-specified methods of adjustment is that they will affirmatively have to announce their failure to adjust for multiplicity and that their putative associations “may not be reproducible.” Tepid as this concession is, it is better than previous practice, and perhaps it will become a badge of shame. The crucial question is whether judges, in exercising their gatekeeping responsibilities, will see these acknowledgements as disabling valid inferences from studies that carry this mandatory warning label.

The editors have not issued guidelines for the use of Bayesian statistical analyses, because “the large majority” of author manuscripts use only frequentist analyses.[10] The editors inform us that “[w]hen appropriate,” they will expand their guidelines to address Bayesian and other designs. Perhaps this expansion will be appropriate when Bayesian analysts establish a track record of abuse in their claiming of associations and effects.

The new guidelines themselves are not easy to find. The Journal has not published these guidelines as an article in its published issues, but has relegated them to a subsection of its website’s instructions to authors for new manuscripts:

https://www.nejm.org/author-center/new-manuscripts

Presumably, the actual author instructions control in the event of any perceived discrepancy between this week’s editorial and the guidelines themselves. Authors are told that p-values generally should be two-sided, and that:

“Significance tests should be accompanied by confidence intervals for estimated effect sizes, measures of association, or other parameters of interest. The confidence intervals should be adjusted to match any adjustment made to significance levels in the corresponding test.”

Similarly, the guidelines call for, but do not require, pre-specified methods of controlling family-wise error rates for multiple comparisons. For observational studies submitted without pre-specified methods of error control, the guidelines recommend the use of point estimates and 95% confidence intervals, with an explanation that the interval widths have not been adjusted for multiplicity, and a caveat that the inferences from these findings may not be reproducible. The guidelines recommend against using p-values for such results, but again, it is difficult to see why reporting the 95% confidence intervals is recommended when p-values are not.


[1]  Jonathan A. Cook, Dean A. Fergusson, Ian Ford, Mithat Gonen, Jonathan Kimmelman, Edward L. Korn, and Colin B. Begg, “There is still a place for significance testing in clinical trials,” 16 Clin. Trials 223 (2019).

[2]  David Harrington, Ralph B. D’Agostino, Sr., Constantine Gatsonis, Joseph W. Hogan, David J. Hunter, Sharon-Lise T. Normand, Jeffrey M. Drazen, and Mary Beth Hamel, “New Guidelines for Statistical Reporting in the Journal,” 381 New Engl. J. Med. 285 (2019).

[3]  Id. at 286.

[4]  See id. (“Journal editors and statistical consultants have become increasingly concerned about the overuse and misinterpretation of significance testing and P values in the medical literature. Along with their strengths, P values are subject to inherent weaknesses, as summarized in recent publications from the American Statistical Association.”) (citing Ronald L. Wasserstein & Nicole A. Lazar, “The ASA’s statement on p-values: context, process, and purpose,” 70 Am. Stat. 129 (2016); Ronald L. Wasserstein, Allen L. Schirm, and Nicole A. Lazar, “Moving to a world beyond ‘p < 0.05’,” 73 Am. Stat. s1 (2019)).

[5]  Id. at 285.

[6]  Id. at 285-86.

[7]  Id. at 285.

[8]  Id., citing Alex Dmitrienko, Frank Bretz, Ajit C. Tamhane, Multiple testing problems in pharmaceutical statistics (2009); Alex Dmitrienko & Ralph B. D’Agostino, Sr., “Multiplicity considerations in clinical trials,” 378 New Engl. J. Med. 2115 (2018).

[9]  Id.

[10]  Id. at 286.

Science Bench Book for Judges

July 13th, 2019

On July 1st of this year, the National Judicial College and the Justice Speakers Institute, LLC released an online publication of the Science Bench Book for Judges [Bench Book]. The Bench Book sets out to cover much of the substantive material already addressed by the Federal Judicial Center’s Reference Manual:

Acknowledgments

Table of Contents

  1. Introduction: Why This Bench Book?
  2. What is Science?
  3. Scientific Evidence
  4. Introduction to Research Terminology and Concepts
  5. Pre-Trial Civil
  6. Pre-trial Criminal
  7. Trial
  8. Juvenile Court
  9. The Expert Witness
  10. Evidence-Based Sentencing
  11. Post Sentencing Supervision
  12. Civil Post Trial Proceedings
  13. Conclusion: Judges—The Gatekeepers of Scientific Evidence

Appendix 1 – Frye/Daubert—State-by-State

Appendix 2 – Sample Orders for Criminal Discovery

Appendix 3 – Biographies

The Bench Book gives some good advice in very general terms about the need to consider study validity,[1] and to approach scientific evidence with care and “healthy skepticism.”[2] When the Bench Book attempts to instruct on what it represents to be the scientific method of hypothesis testing, the good advice unravels:

“A scientific hypothesis simply cannot be proved. Statisticians attempt to solve this dilemma by adopting an alternate [sic] hypothesis – the null hypothesis. The null hypothesis is the opposite of the scientific hypothesis. It assumes that the scientific hypothesis is not true. The researcher conducts a statistical analysis of the study data to see if the null hypothesis can be rejected. If the null hypothesis is found to be untrue, the data support the scientific hypothesis as true.”[3]

Even in experimental settings, a statistical analysis of the data does not lead to a conclusion that the null hypothesis is untrue, as opposed to not reasonably compatible with the study’s data. In observational studies, the statistical analysis must acknowledge whether and to what extent the study has excluded bias and confounding. When the Bench Book turns to speak of statistical significance, more trouble ensues:

“The goal of an experiment, or observational study, is to achieve results that are statistically significant; that is, not occurring by chance.”[4]

In the world of result-oriented science, and scientific advocacy, it is perhaps true that scientists seek to achieve statistically significant results. Still, it seems crass to come right out and say so, as opposed to saying that the scientists are querying the data to see whether they are compatible with the null hypothesis. This first pass at statistical significance is only mildly astray compared with the Bench Book’s more serious attempts to define statistical significance and confidence intervals:

4.10 Statistical Significance

“The research field agrees that study outcomes must demonstrate they are not the result of random chance. Leaving room for an error of .05, the study must achieve a 95% level of confidence that the results were the product of the study. This is denoted as p ≤ 05. (or .01 or .1).”[5]

and

“The confidence interval is also a way to gauge the reliability of an estimate. The confidence interval predicts the parameters within which a sample value will fall. It looks at the distance from the mean a value will fall, and is measured by using standard deviations. For example, if all values fall within 2 standard deviations from the mean, about 95% of the values will be within that range.”[6]

Of course, the interval speaks to the precision of the estimate, not its reliability, but that is a small point. These definitions are virtually guaranteed to confuse judges into conflating statistical significance and the coefficient of confidence with the legal burden of proof probability.
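For readers who want to see what the 95% figure does mean, here is a minimal simulation of my own (not the Bench Book’s): the confidence coefficient describes how often intervals constructed by a given procedure cover the true parameter over repeated sampling, a statement about the precision of the estimating procedure, not about where 95% of individual values fall.

```python
# Repeated sampling from a known population; about 95% of the t-based intervals
# constructed this way cover the true mean.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_mean, sigma, n, reps = 10.0, 3.0, 50, 5000

covered = 0
for _ in range(reps):
    sample = rng.normal(true_mean, sigma, size=n)
    se = sample.std(ddof=1) / np.sqrt(n)
    t_crit = stats.t.ppf(0.975, df=n - 1)
    lo, hi = sample.mean() - t_crit * se, sample.mean() + t_crit * se
    covered += (lo <= true_mean <= hi)

print(f"proportion of 95% intervals covering the true mean: {covered / reps:.3f}")  # ~0.95
```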

The Bench Book runs into problems in interpreting legal decisions, which would seem softer grist for the judicial mill. The authors present dictum from the Daubert decision as though it were a holding:[7]

“As noted in Daubert, ‘[t]he focus, of course, must be solely on principles and methodology, not on the conclusions they generate’.”

The authors fail to mention that this dictum was abandoned in Joiner, and that it is specifically rejected by statute, in the 2000 revision to Federal Rule of Evidence 702.

Early in the Bench Book, its authors present a subsection entitled “The Myth of Scientific Objectivity,” which they might have borrowed from Feyerabend or Derrida. The heading appears misleading because the text contradicts it:

“Scientists often develop emotional attachments to their work—it can be difficult to abandon an idea. Regardless of bias, the strongest intellectual argument, based on accepted scientific hypotheses, will always prevail, but the road to that conclusion may be fraught with scholarly cul-de-sacs.”[8]

In a similar vein, the authors misleadingly tell readers that “the forefront of science is rarely encountered in court,” and so “much of the science mentioned there shall be considered established….”[9] Of course, the reality is that many causal claims presented in court have already been rejected or held to be indeterminate by the scientific community. And just when readers may think themselves safe from the goblins of nihilism, the authors launch into a theory of naïve probabilism, according to which science is just the placing of subjective probabilities upon data, based upon preconceived biases and beliefs:

“All of these biases and beliefs play into the process of weighing data, a critical aspect of science. Placing weight on a result is the process of assigning a probability to an outcome. Everything in the universe can be expressed in probabilities.”[10]

So help the expert witness who honestly (and correctly) testifies that the causal claim or its rejection cannot be expressed as a probability statement!

Although I have not read all of the Bench Book closely, there appears to be no meaningful discussion of Rule 703, or of the need to access underlying data to ensure that the proffered scientific opinion under scrutiny has used appropriate methodologies at every step in its development. Even a 412-page text cannot address every issue, but this one does little to help the judicial reader find more in-depth help on statistical and scientific methodological issues that arise in occupational and environmental disease claims, and in pharmaceutical products litigation.

The organizations involved in this Bench Book appear to be honest brokers of remedial education for judges. The writing of this Bench Book was funded by the State Justice Institute (SJI), a creation of federal legislation enacted with the laudable goal of improving the quality of judging in state courts.[11] Despite its provenance in federal legislation, the SJI is a private, nonprofit corporation, governed by 11 directors appointed by the President and confirmed by the Senate. A majority of the directors (six) are state court judges; the board also includes one state court administrator and four members of the public (no more than two from any one political party). The function of the SJI is to award grants to improve judging in state courts.

The National Judicial College (NJC) originated in the early 1960s, from the efforts of the American Bar Association, the American Judicature Society, and the Institute of Judicial Administration, to provide education for judges. In 1977, the NJC became a Nevada not-for-profit 501(c)(3) educational corporation, with its campus at the University of Nevada, Reno, where judges can go for training and recreational activities.

The Justice Speakers Institute appears to be a for-profit company that provides educational resources for judges. A press release touts the Bench Book and follow-on webinars. Caveat emptor.

The rationale for this Bench Book is open to question. Unlike the Reference Manual on Scientific Evidence, which was co-produced by the Federal Judicial Center and the National Academies of Sciences, the Bench Book was written by lawyers and judges without any subject-matter expertise. Unlike the Reference Manual, the Bench Book’s chapters have no scientist or statistician authors, and it shows. Remarkably, the Bench Book does not appear to cite to the Reference Manual or the Manual on Complex Litigation at any point in its discussion of the federal law of expert witnesses or of scientific or statistical method. Perhaps taxpayers would have been spared substantial expense if state judges were simply encouraged to read the Reference Manual.


[1]  Bench Book at 190.

[2]  Bench Book at 174 (“Given the large amount of statistical information contained in expert reports, as well as in the daily lives of the general society, the ability to be a competent consumer of scientific reports is challenging. Effective critical review of scientific information requires vigilance, and some healthy skepticism.”).

[3]  Bench Book at 137; see also id. at 162.

[4]  Bench Book at 148.

[5]  Bench Book at 160.

[6]  Bench Book at 152.

[7]  Bench Book at 233, quoting Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579, 595 (1993).

[8]  Bench Book at 10.

[9]  Id. at 10.

[10]  Id. at 10.

[11] See State Justice Institute Act of 1984 (42 U.S.C. ch. 113, 42 U.S.C. § 10701 et seq.).

The opinions, statements, and asseverations expressed on Tortini are my own, or those of invited guests, and these writings do not necessarily represent the views of clients, friends, or family, even when supported by good and sufficient reason.