Michael D. Freeman is a chiropractor and self-styled “forensic epidemiologist,” affiliated with the Departments of Public Health & Preventive Medicine and Psychiatry at Oregon Health & Science University School of Medicine, in Portland, Oregon. Freeman has an interesting publication in press setting out his views of forensic epidemiology. Michael D. Freeman & Maurice Zeegers, “Principles and applications of forensic epidemiology in the medico-legal setting,” Law, Probability and Risk (2015); doi:10.1093/lpr/mgv010. Freeman’s views on epidemiology did not, however, pass muster in the courtroom. Porter v. Smithkline Beecham Corp., Phila. Cty. Ct. C.P., Sept. Term 2007, No. 03275, Slip op. (Oct. 5, 2015).
In Porter, plaintiffs sued Pfizer, the manufacturer of the SSRI antidepressant Zoloft. Plaintiffs claimed the mother plaintiff’s use of Zoloft during pregnancy caused her child to be born with omphalocele, a serious defect that occurs when the child’s intestines develop outside his body. Pfizer moved to exclude plaintiffs’ medical causation expert witnesses, Dr. Cabrera and Dr. Freeman. The trial judge was the Hon. Mark I. Bernstein, who has written and presented frequently on expert witness evidence.[1] Judge Bernstein held a two-day hearing in September 2015, and last week, His Honor ruled that the plaintiffs’ expert witnesses failed to meet Pennsylvania’s Frye standard for admissibility. Judge Bernstein’s opinion reads a bit like a Berenstain Bear book on how not to use epidemiology.
GENERAL CAUSATION SCREW UPS
Proper Epidemiologic Method
First, Find An Association
Dr. Freeman has a methodologic map that includes the Bradford Hill criteria at the back end of the procedure. Dr. Freeman, however, impetuously forgot that before you get to the back end, you must traverse the front end:
“Dr. Freemen agrees that he must, and claims he has, applied the Bradford Hill Criteria to support his opinion. However, the starting procedure of any Bradford-Hill analysis is ‘an association between two variables’ that is ‘perfectly clear-cut and beyond what we would care to attribute to the play of chance’.35 Dr. Freeman testified that generally accepted methodology requires a determination, first, that there’s evidence of an association and, second, whether chance, bias and confounding have been accounted for, before application of the Bradford-Hill criteria.36 Because no such association has been properly demonstrated, the Bradford Hill criteria could not have been properly applied.”
Slip op. at 12-13. In other words, don’t go rushing to the Bradford Hill factors until and unless you have, first, shown an association; second, shown that it is “clear cut,” and not likely the result of bias or confounding; and third, ruled out the play of chance or random variability as an explanation for the difference between the observed and expected rates of disease.
Proper epidemiologic method requires surveying the pertinent published studies that investigate whether there is an association between the medication use and the claimed harm. The expert witnesses must, however, do more than write a bibliography; they must assess any putative associations for “chance, confounding or bias”:
“Proper epidemiological methodology begins with published study results which demonstrate an association between a drug and an unfortunate effect. Once an association has been found, a judgment as whether a real causal relationship between exposure to a drug and a particular birth defect really exists must be made. This judgment requires a critical analysis of the relevant literature applying proper epidemiologic principles and methods. It must be determined whether the observed results are due to a real association or merely the result of chance. Appropriate scientific studies must be analyzed for the possibility that the apparent associations were the result of chance, confounding or bias. It must also be considered whether the results have been replicated.”
Slip op. at 7.
Then Rule Out Chance
So if there is something that appears to be an association in a study, the expert epidemiologist must assess whether it is consistent with chance. If we flip a fair coin 10 times, we “expect” 5 heads and 5 tails, but the probability of not getting that expected result is about three times greater than the probability of obtaining it. If on one series of 10 tosses we obtain 6 heads and 4 tails, we would certainly not reject a starting assumption that the expected outcome was 5 heads / 5 tails. Indeed, the probability of obtaining 6 heads / 4 tails or 4 heads / 6 tails is almost double the probability of obtaining the expected outcome of an equal number of heads and tails.
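The coin-toss arithmetic above can be checked with a short calculation; the following sketch uses only Python’s standard library and computes the exact binomial probabilities:

```python
from math import comb

n = 10           # fair coin tosses
total = 2 ** n   # equally likely head/tail sequences

def p_heads(k):
    """Exact probability of exactly k heads in n fair tosses."""
    return comb(n, k) / total

p_5_5 = p_heads(5)                       # the "expected" 5 heads / 5 tails split
p_not_5_5 = 1 - p_5_5                    # any other split
p_6_4_either = p_heads(6) + p_heads(4)   # 6/4 or 4/6, in either order

print(f"P(5/5)        = {p_5_5:.3f}")         # about 0.246
print(f"P(not 5/5)    = {p_not_5_5:.3f}")     # about 0.754, roughly 3x P(5/5)
print(f"P(6/4 or 4/6) = {p_6_4_either:.3f}")  # about 0.410, nearly double P(5/5)
```

The point survives the arithmetic: the single most likely outcome is the expected 5/5 split, yet some departure from it is far more probable than not, which is why an observed deviation from expectation is not, by itself, evidence of anything.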
As it turned out in the Porter case, Dr. Freeman relied rather heavily upon one study, the Louik study, for his claim that Zoloft causes the birth defect in question. See Carol Louik, Angela E. Lin, Martha M. Werler, Sonia Hernández-Díaz, and Allen A. Mitchell, “First-Trimester Use of Selective Serotonin-Reuptake Inhibitors and the Risk of Birth Defects,” 356 New Engl. J. Med. 2675 (2007). The authors of the Louik study were quite clear that they were not able to rule out chance as a sufficient explanation for the observed data in their study:
“The previously unreported associations we identified warrant particularly cautious interpretation. In the absence of preexisting hypotheses and the presence of multiple comparisons, distinguishing random variation from true elevations in risk is difficult. Despite the large size of our study overall, we had limited numbers to evaluate associations between rare outcomes and rare exposures. We included results based on small numbers of exposed subjects in order to allow other researchers to compare their observations with ours, but we caution that these estimates should not be interpreted as strong evidence of increased risks.24”
Slip op at 10 (quoting from Louik study).
Judge Bernstein thus criticized Freeman for failing to account for chance in explaining his putative association between maternal Zoloft use and infant omphalocele. The appropriate and generally accepted methodology for accomplishing this step of evaluating a putative association is to consider whether the association is statistically significant at the conventional level.
In relying heavily upon the Louik study, Dr. Freeman opened himself up to serious methodological criticism. Judge Bernstein’s opinion stands for the important proposition that courts should not be unduly impressed with nominal statistical significance in the presence of multiple comparisons and very broad confidence intervals:
“The Louik study is the only study to report a statistically significant association between Zoloft and omphalocele. Louik’s confidence interval which ranges between 1.6 and 20.7 is exceptionally broad. … The Louik study had only 3 exposed subjects who developed omphalocele thus limiting its statistical power. Studies that rely on a very small number of cases can present a random statistically unstable clustering pattern that may not replicate the reality of a larger population. The Louik authors were unable to rule out confounding or chance. The results have never been replicated concerning omphalocele. Dr. Freeman’s testimony does not explain, or seemingly even consider these serious limitations.”
Slip op. at 8. Assessing the statistical precision of the point estimate of risk, including whether the authors conducted multiple comparisons and whether the observed confidence intervals were very broad, is part of the generally accepted epidemiologic methodology, which Freeman flouted:
“Generally accepted methodology considers statistically significant replication of study results in different populations because apparent associations may reflect flaws in methodology.”
Slip op. at 9. The studies that Freeman cited and apparently relied upon failed to report statistically significant associations between sertraline (Zoloft) and omphalocele. Judge Bernstein found this lack to be a serious problem for Freeman and his epidemiologic opinion:
“While non-significant results can be of some use, despite a multitude of subsequent studies which isolated omphalocele, there is no study which replicates or supports Dr. Freeman’s conclusions.”
Slip op. at 10. The lack of statistical significance, in the context of repeated attempts to find it, helped sink Freeman’s proffered testimony.
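The multiple-comparisons concern that runs through this part of the opinion has a simple quantitative basis. The sketch below is illustrative only; the actual number of drug–defect comparisons in the Louik study is not given here, so the values of k are hypothetical:

```python
# If k independent drug-defect pairs are each tested at alpha = 0.05, the
# chance that at least one appears "statistically significant" purely by
# luck grows quickly with k. (Hypothetical k values, for illustration.)
alpha = 0.05
for k in (1, 5, 20, 50):
    p_any_false_positive = 1 - (1 - alpha) ** k
    print(f"{k:>3} comparisons -> P(at least one false positive) = "
          f"{p_any_false_positive:.2f}")
```

With a few dozen comparisons, a nominally significant finding or two is close to guaranteed even when no real association exists, which is why a single unreplicated “significant” result among many tests carries so little weight.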
Then Rule Out Bias and Confounding
As noted, Freeman relied heavily upon the Louik study, which was the only study to report a nominally statistically significant risk ratio for maternal Zoloft use and infant omphalocele. The Louik study, by its design, however, could not exclude chance or confounding as a full explanation for the apparent association, and Judge Bernstein chastised Dr. Freeman for overselling the study as support for the plaintiffs’ causal claim:
“The Louik authors were unable to rule out confounding or chance. The results have never been replicated concerning omphalocele. Dr. Freeman’s testimony does not explain, or seemingly even consider these serious limitations.”
Slip op. at 8.
And Only Then Consider the Bradford Hill Factors
Even when an association is clear cut, and beyond what we can likely attribute to chance, generally accepted methodology requires the epidemiologist to consider the Bradford Hill factors. As Judge Bernstein explains, generally accepted methodology for assessing causality in this area requires a proper consideration of Hill’s factors before a conclusion of causation is reached:
“As the Bradford-Hill factors are properly considered, causality becomes a matter of the epidemiologist’s professional judgment.”
Slip op. at 7.
Consistency or Replication
The nine Hill factors are well known to lawyers because they have been stated and discussed extensively in Hill’s original article, and in references such as the Reference Manual on Scientific Evidence. Not all the Hill factors are equally important, or important at all, but one that clearly matters is consistency, or concordance of results among the available epidemiologic studies. Stated alternatively, a clear-cut association unlikely to be explained by chance is certainly interesting and probative, but it raises an important methodological question: can the result be replicated? Judge Bernstein restated this important Hill factor as an important determinant of whether a challenged expert witness employed a generally accepted method:
“Generally accepted methodology considers statistically significant replication of study results in different populations because apparent associations may reflect flaws in methodology.”
Slip op. at 10.
“More significantly neither Reefhuis nor Alwan reported statistically significant associations between Zoloft and omphalocele. While non-significant results can be of some use, despite a multitude of subsequent studies which isolated omphalocele, there is no study which replicates or supports Dr. Freeman’s conclusions.”
Slip op. at 10.
Replication But Without Double Dipping the Data
Epidemiologic studies are sometimes updated and extended with additional follow-up. An expert witness who wished to skate over the replication and consistency requirement might be tempted, as was Dr. Freeman, to count the earlier and later iterations of the same basic study as “replication.” The Louik study was indeed updated and extended this year in a published paper by Jennita Reefhuis and colleagues.[2] Proper methodology, however, prohibits double dipping the data by counting the later study, which subsumes the earlier one, as a “replication”:
“Generally accepted methodology considers statistically significant replication of study results in different populations because apparent associations may reflect flaws in methodology. Dr. Freeman claims the Alwan and Reefhuis studies demonstrate replication. However, the population Alwan studied is only a subset of the Reefhuis population and therefore they are effectively the same.”
Slip op. at 10.
The Lumping Fallacy
Analyzing the health outcome of interest at the right level of specificity can sometimes be a puzzle and a challenge, but Freeman got it wrong by opportunistically “lumping” disparate outcomes together when doing so helped him get a result that he liked. Judge Bernstein admonishes:
“Proper methodology further requires that one not fall victim to the … the ‘Lumping Fallacy’. … Different birth defects should not be grouped together unless they a part of the same body system, share a common pathogenesis or there is a specific valid justification or necessity for an association20 and chance, bias, and confounding have been eliminated.”
Slip op. at 7. Dr. Freeman lumped a lot, but Judge Bernstein saw through the methodological ruse. As Judge Bernstein pointed out:
“Dr. Freeman’s analysis improperly conflates three types of data: Zoloft and omphalocele, SSRI’s generally and omphalocele, and SSRI’s and gastrointestinal and abdominal malformations.”
Slip op. at 8. Freeman’s approach, which sadly is seen frequently in pharmaceutical and other products liability cases, is methodologically improper:
“Generally accepted causation criteria must be based on the data applicable to the specific birth defect at issue. Dr. Freeman improperly lumps together disparate birth defects.”
Slip op. at 11.
Class Effect Fallacy
Another kind of improper lumping results from treating all SSRI antidepressants as the same, either to lump them together or to pick and choose, from among all the SSRIs, the data points that support the plaintiffs’ claims (while ignoring those SSRI data points that do not). To be sure, the SSRI antidepressants do form a “class,” in that they all have a similar pharmacologic effect. The SSRIs, however, do not all achieve their effect in serotonergic neurons the same way; nor do they all have the same “off-target” effects. Treating all the SSRIs as interchangeable for a claimed adverse effect, without independent support for doing so, is known as the class effect fallacy. In Judge Bernstein’s words:
“Proper methodology further requires that one not fall victim to the ‘Class Effect Fallacy’ … . A class effect cannot be assumed. The causation conclusion must be drug specific.”
Slip op. at 7. Dr. Freeman’s analysis improperly conflated Zoloft data with SSRI data generally. Slip op. at 8. Assuming what you set out to demonstrate is, of course, a fine way to go methodologically into the ditch:
“Without significant independent scientific justification it is contrary to generally accepted methodology to assume the existence of a class effect. Dr. Freeman lumps all SSRI drug results together and assumes a class effect.”
Slip op. at 10.
SPECIFIC CAUSATION SCREW UPS
Dr. Freeman was also offered by plaintiffs to provide a specific causation opinion – that Mrs. Porter’s use of Zoloft in pregnancy caused her child’s omphalocele. Freeman claimed to have performed a differential diagnosis or etiology or something to rule out alternative causes.
Genetics
In the field of birth defects, one possible cause looming in any given case is an inherited or spontaneous genetic mutation. Freeman purported to have considered and ruled out genetic causes, which he acknowledged account for a substantial percentage of all omphalocele cases. Bo Porter, Mrs. Porter’s son, was tested for known genetic causes, and Freeman argued that this testing allowed him to “rule out” genetic causes. But the current state of the art in genetic testing allows for identifying only a small number of possible genetic causes, and Freeman failed to explain how he might have ruled out the as-yet unidentified genetic causes of birth defects:
“Dr. Freeman fails to properly rule out genetic causes. Dr. Freeman opines that 45-49% of omphalocele cases are due to genetic factors and that the remaining 50-55% of cases are due to non-genetic factors. Dr. Freeman relies on Bo Porter’s genetic testing which did not identify a specific genetic cause for his injury. However, minor plaintiff has not been tested for all known genetic causes. Unknown genetic causes of course cannot yet be tested. Dr. Freeman has made no analysis at all, only unwarranted assumptions.”
Slip op. at 15-16. Judge Bernstein reviewed Freeman’s attempted analysis and ruling out of potential causes, and found that it departed from the generally accepted methodology in conducting differential etiology. Slip op. at 17.
Timing Errors
One feature of putative teratogenicity is that an embryonic exposure must take place at a specific gestational developmental time in order to have its claimed deleterious effect. As Judge Bernstein pointed out, omphalocele results from an incomplete folding of the abdominal wall during the third to fifth weeks of gestation. Mrs. Porter, however, did not begin taking Zoloft until her seventh week of pregnancy, which left Dr. Freeman without an opinion as to how Zoloft could have contributed to the minor plaintiff’s birth defect. Slip op. at 14. This aspect of Freeman’s specific causation analysis was glaringly defective, and clearly not the sort of generally accepted methodology required for attributing a birth defect to a teratogen.
******************************************************
All in all, Judge Bernstein’s opinion is a tour de force demonstration of how a state court judge, in a so-called Frye jurisdiction, can show that failure to employ generally accepted methods renders an expert witness’s opinions inadmissible. There is one small problem in statistical terminology.
Statistical Power
Judge Bernstein states, at different places, that the Louik study was and was not statistically significant for Zoloft and omphalocele. The court’s opinion ultimately does explain that the nominal statistical significance was vitiated by multiple comparisons and an extremely broad confidence interval, which more than justified its statement that the study was not truly statistically significant. In another passage, however, Judge Bernstein chose to explain the problem with the Louik study as a lack of statistical power:
“Equally significant is the lack of power concerning the omphalocele results. The Louik study had only 3 exposed subjects who developed omphalocele thus limiting its statistical power.”
Slip op. at 8. The adjusted odds ratio for Zoloft and omphalocele was 5.7, with a 95% confidence interval of 1.6 – 20.7. Power was not the issue: if the odds ratio were otherwise credible, and free from bias, confounding, and chance, the study had the power to observe an increased risk of close to 500%, which met the pre-stated level of significance. The problem, rather, was multiple testing, fragile and imprecise results, and the inability to evaluate the odds ratio fully for bias and confounding.
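The fragility of the Louik estimate can be seen from the reported numbers alone. The back-of-the-envelope sketch below assumes, as is the usual construction, that the 95% confidence interval was computed on the log-odds scale with a normal approximation, and recovers the implied standard error from the interval’s width:

```python
from math import log, sqrt, erfc

# Reported in the Louik study: adjusted OR 5.7, 95% CI 1.6 - 20.7.
or_hat, lo, hi = 5.7, 1.6, 20.7

# Assuming a normal-approximation CI on the log-odds scale, its full width
# in log units is 2 * 1.96 standard errors.
se = (log(hi) - log(lo)) / (2 * 1.96)  # implied SE of log(OR); roughly 0.65
z = log(or_hat) / se                   # nominal z-statistic
p = erfc(z / sqrt(2))                  # nominal two-sided p-value

print(f"SE(log OR) ~ {se:.2f}")
print(f"nominal z  ~ {z:.2f}")
print(f"nominal p  ~ {p:.3f}")  # nominally below 0.05, but built on 3 cases
```

The implied standard error of about 0.65 on the log scale means the point estimate is extremely imprecise: the data are compatible with anything from a modest 60% increase in risk to a twenty-fold increase, which is what one expects from only three exposed cases, and why nominal significance here proves so little.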
[1] Mark I. Bernstein, “Expert Testimony in Pennsylvania,” 68 Temple L. Rev. 699 (1995); Mark I. Bernstein, “Jury Evaluation of Expert Testimony under the Federal Rules,” 7 Drexel L. Rev. 239 (2014-2015).
[2] Jennita Reefhuis, Owen Devine, Jan M Friedman, Carol Louik, Margaret A Honein, “Specific SSRIs and birth defects: bayesian analysis to interpret new data in the context of previous reports,” 351 Brit. Med. J. (2015).