As this is a methods group, I thought we should discuss the ongoing debate regarding the random effects model of meta-analysis. I have re-posted my blog entry regarding the JAMA article by Serghiou and Goodman for your comments.
One of the most troubling aspects of statistical practice is the tendency for statisticians to make an assumption and then lay out theory, models and estimators that all hinge on the "truthfulness" of that overarching assumption. Meta-analysis is one such area: the assumption of random effects is so pervasive that it is now treated as "fact", and indeed most simulations that support such models also start off by incorporating these assumptions into the simulation protocol, thus creating a self-fulfilling prophecy. Most journal editors (Stat Med, Biostatistics and Br J Math Stat Psychol are the three I have tried) have demonstrated poor editorial understanding of these issues, and the very poor quality reviews essentially boil down to "who are you to question the accepted norm and why should we believe you?" but make no attempt at logical thinking. Finally, the comment by Serghiou and Goodman surfaced in JAMA, and after getting no response to my comments from the authors, I thought that this deserves a discussion on RG. I therefore lay out my comments and welcome yours as well.
Regarding the Random-Effects Meta-analysis
Why Is Random-Effects Meta-analysis Used?
Serghiou & Goodman point out that each study evaluating the effect of opioid treatment presented in Busse et al provides its own answer in terms of an observed (estimated) effect size. The latter represents the best estimate from that study of the underlying true opioid effect. They correctly indicate that the true effect itself is the underlying benefit of opioid treatment if it could be measured perfectly, and is a single value that cannot directly be known. All the studies cited by Busse et al had observed treatment effects that vary, and this variation could be due to chance (statistically, random error) even if the true effects were the same in these studies. Serghiou & Goodman go on to state that, in statistics, the belief that the true effect is the same in each study is called the fixed-effect assumption and that the model of meta-analysis under this assumption is called a fixed-effect meta-analysis.
While we agree with Serghiou & Goodman that medical studies addressing the same question are typically subject to additional errors (beyond random error), differences in such study characteristics do not reduce the confidence that each study is actually estimating the same true effect. This is where we diverge from Serghiou & Goodman: our argument is that one cannot alter the nature of the true effect; rather, an additional error, called systematic error, is introduced that makes this seem to be the case. Serghiou & Goodman assert that the alternative assumption is that the true effects being estimated are different from each other, or heterogeneous. We assert that the alternative assumption is that there is more error than just random error in study effect estimation. In statistical jargon, Serghiou & Goodman’s assertion is called the random-effects (RE) assumption and our assertion remains the fixed effect assumption. The plural in effects versus the singular in effect is important, as it implies that there is more than one true effect under Serghiou & Goodman’s assertion but not under ours.
Serghiou & Goodman next go on to make two claims about the random effects meta-analysis:
a) a random-effects assumption is less restrictive than a fixed effect assumption because it reflects the variation or heterogeneity in the true effects estimated by each trial
b) it usually results in a more realistic estimate of the uncertainty in the overall treatment effect, with larger CIs than would be obtained if a fixed effect was assumed.
Both of these can easily be argued for under the fixed effect assumption. The first claim is actually a reference to what is called overdispersion in statistical jargon, which means that the data are more dispersed than the dispersion assumed under the respective model. However, the fixed effect assumption is only restrictive if systematic error is not considered and not accounted for; accounting for it is straightforward and has been implemented in the IVhet model. As far as the second claim is concerned, the observed heterogeneity is actually that of the study effect estimates, and a more realistic estimate of uncertainty in the overall treatment effect, with larger CIs, can also be obtained when a fixed effect is assumed. Indeed, when compared with the correct fixed effect model, the models under the random effect assumption have a poorer quality of error estimation, their CI coverage remains inadequate [4,5], and nominal coverage is certainly not achievable with currently utilized RE models [3,4,6]. This inadequacy is a result of the change in the weight structure, which makes the weights inaccurate when used in the variance computation.
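To make the comparison concrete, here is a minimal sketch of the three pooling rules discussed above: plain inverse variance (IV), DerSimonian-Laird random effects (RE), and IVhet as I read it in reference 3 (inverse variance weights for the point estimate, with each study variance inflated by τ² in the pooled variance). The function names and the five studies are hypothetical and purely for illustration; they are not the Busse et al data.

```python
import math

def dl_tau2(effects, variances):
    """DerSimonian-Laird moment estimate of the between-study variance tau^2."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    return max(0.0, (q - (len(effects) - 1)) / c)

def pool(effects, variances):
    """Return {model: (estimate, variance)} for IV, RE (DL) and IVhet pooling."""
    tau2 = dl_tau2(effects, variances)
    w = [1.0 / v for v in variances]
    sw = sum(w)
    iv_est = sum(wi * y for wi, y in zip(w, effects)) / sw
    # Random effects: weights 1 / (v_i + tau^2); variance is 1 / (sum of weights).
    w_re = [1.0 / (v + tau2) for v in variances]
    sw_re = sum(w_re)
    re_est = sum(wi * y for wi, y in zip(w_re, effects)) / sw_re
    # IVhet (assumed form): same point estimate as IV, but the pooled variance
    # uses the normalised IV weights squared times (v_i + tau^2).
    ivhet_var = sum((wi / sw) ** 2 * (v + tau2) for wi, v in zip(w, variances))
    return {
        "IV": (iv_est, 1.0 / sw),
        "RE": (re_est, 1.0 / sw_re),
        "IVhet": (iv_est, ivhet_var),
    }

def ci95(estimate, variance):
    """Normal-approximation 95% confidence interval."""
    half = 1.96 * math.sqrt(variance)
    return (estimate - half, estimate + half)

# Hypothetical effect sizes and within-study variances (illustration only).
effects = [-0.9, -0.3, -0.7, -0.1, -1.2]
variances = [0.01, 0.04, 0.09, 0.02, 0.16]

results = pool(effects, variances)
for name, (est, var) in results.items():
    lo, hi = ci95(est, var)
    print(f"{name:6s} {est:+.3f} (95% CI {lo:+.3f} to {hi:+.3f})")
```

For a given τ², the RE weights are exactly the ones that minimise the expression Σ aᵢ²(vᵢ + τ²) over weights summing to one, so under this formulation the IVhet interval can never be narrower than the RE interval computed with the same τ².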
Description of Random-Effects Meta-analysis
Different individuals engaged in a meta-analysis may have different goals and perspectives on what the parameter of interest is and the appropriate model to consider. Serghiou & Goodman have a particular view that endorses the position one may take when considering a random effects model. We are sure not everyone would agree with Serghiou & Goodman on the primary question that a meta-analysis is trying to answer, and thus on the interpretation within the RE framework [8,9] that they lay down, which characterizes study expected effects as ‘true effects’ and the RE pooled estimate as an ‘average of true effects’. These terms originate from the statistical interpretation of the observed treatment effect for a study as a combination of a treatment effect common to all studies (the average of true effects) plus a “random effect” specific to that study alone. However, this interpretation has been challenged [10,11], given that there is no reason why studies could not have deviated from an unknown underlying net treatment benefit because their estimates suffer from different degrees of random and systematic error. If this is true, then rather than a so-called ‘average effect’, researchers should essentially be pursuing the unknown ‘net treatment benefit’ of interest in a representative population. This follows from the fact that treatment effects are not necessarily the same for everyone, and the overall benefit ascribed to a treatment in a clinical trial is indeed a ‘net treatment benefit’ reflecting a mixture of substantial benefits for some, little benefit for many and harm for a few. The individual study populations that make up the meta-analytic studies may not always be centred around the representative population’s distribution of effects, and in this context the ‘average of true effects’ may be considered an estimate of the centred net treatment benefit (centred referring to the representative population).
The so-called ‘true effects’ would then be the non-centred study effects, such non-centring being a form of systematic error.
We are sure Serghiou & Goodman, when faced with this objection, would contend that, in any case, the concept would be the same, with the centred net treatment benefit also being an average treatment effect. There is, however, a fundamental difference between our concept and the concept advocated by Serghiou & Goodman: that of exchangeability. The study level (non-centred) treatment benefit is not exchangeable, and each study population departs from the centred net treatment benefit because of systematic differences (e.g. in the diversity of the study populations, treatments and methods) leading to non-centring of the study level treatment benefit. If we accept this, then the purpose of meta-analysis has to change and become the application of the most appropriate weights to such study effects, so that the weighted mean estimate has the least possible error (when compared to the unknown true net treatment effect within a representative population).
In the report by Busse et al, the fixed effect (IVhet) pooled opioid benefit was −0.62 cm (95% CI, −0.77 to −0.47 cm) and this was more conservative than the reported random-effects pooled opioid benefit of −0.69 cm (95% CI, −0.82 to −0.56 cm). We would therefore argue that the RE assumption can lead to less conservative estimates, and we are not alone in saying this. The statistical measure of heterogeneity, often expressed as I2, may be interpreted under both the fixed and random effects assumptions to reflect that, in the Busse et al study, three-quarters of the variability in effect estimates is due to heterogeneity between studies rather than sampling error (this heterogeneity being due to systematic error under the fixed effect assumption and thought to reflect “varying true effects” under the random effects assumption). Serghiou & Goodman go on to suggest that a more natural heterogeneity measure is the standard deviation of the hypothetical “true effects”, often denoted as τ. In reality, τ is just a nuisance parameter and an attempt (albeit a weak one) to parameterise the hypothetical random effects; it simply moves weights away from inverse variance and towards equality if heterogeneity across studies is large, but retains inverse variance weights if the between-study heterogeneity is small. Indeed, the weights in Figure 2 of Busse et al are “approximately” equal, and this is simply what the random effects assumption leads to under heterogeneity: the arithmetic mean or the naturally weighted mean. Indeed, other methodologists have pointed out that because the RE assumption implies that observed effects must be randomly sampled from a population of standard deviation τ, the validity of this model is integrally tied to the procedures that are followed in selecting the meta-analysis studies and to the purported generalizability of the inferences to be drawn from the results.
It is likely that the RE model does not even apply if we were to keep to a strict view of randomization in statistical inference.
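The drift of RE weights towards equality can be seen directly from the weight formula, as in the short sketch below. It assumes the usual RE weights proportional to 1/(vᵢ + τ²) and the usual definition of I2 from Cochran's Q; the three within-study variances are hypothetical, chosen only to represent one large, one medium and one small study.

```python
def re_weights(variances, tau2):
    """Normalised random-effects weights, proportional to 1 / (v_i + tau^2)."""
    raw = [1.0 / (v + tau2) for v in variances]
    total = sum(raw)
    return [r / total for r in raw]

def i_squared(effects, variances):
    """I2 (in percent) from Cochran's Q under inverse-variance weights."""
    w = [1.0 / v for v in variances]
    pooled = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - pooled) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    return 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0

# Hypothetical within-study variances: large, medium and small study.
variances = [0.01, 0.04, 0.25]

for tau2 in (0.0, 0.05, 0.5, 5.0):
    shares = ", ".join(f"{w:.2f}" for w in re_weights(variances, tau2))
    print(f"tau^2 = {tau2:<4}: weights = {shares}")
```

At τ² = 0 these are the inverse variance weights; as τ² grows relative to the within-study variances, the largest study steadily loses weight and the smallest gains it, until the pooled estimate is effectively an arithmetic mean.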
Why Did the Authors Use Random-Effects Meta-analysis?
Serghiou and Goodman suggest that this was because meta-analyses incorporate some uncertainties that mathematical summaries cannot reflect. They suggest that the choice of the random effects assumption by Busse et al was a sensible approach because they purportedly use the statistical method least likely to overstate certainty, regardless of perceptions or philosophy about true effects being fixed or random. This is simply not true, since we know for a fact that overdispersion is a key problem with random effects models. Random-effects models are a frequent choice (not necessarily in the case of Busse et al) because many analysts have been sold this idea, do not have the expertise to make this judgment themselves, and thus rely on expert consensus to decide upon model choices.
The studies in the report by Busse et al demonstrate substantial variability, and Serghiou & Goodman examine two studies from two different countries and conclude that these studies used different opioids to treat different sources of pain in culturally different populations that may assess pain differently and rightly conclude that the differences seen are probably beyond chance variation. However, this does not provide substantial evidence that these studies are not examining the same net treatment benefit. For example, the two studies used different opioids to treat different sources of pain in culturally different populations that may assess pain differently, but if we define a net treatment benefit in an ideal population as the underlying true effect then these two studies suffer from systematic error and not only is a fixed effect plausible but a fixed effect meta-analysis would then be the appropriate approach. The same meta-analysis in Figure 2 of Busse et al is given in the attached graph file below and clearly we do not have the problem of equal weights.
What Are Limitations of a Random-Effects Meta-analysis?
Serghiou & Goodman point out that there are many approaches to calculating the random effects estimates. These actually represent a multitude of solutions to computing the between-study standard deviation, τ, in an attempt to improve the flawed error estimation (overdispersion) with RE models but, unfortunately, this has not worked because there has been a consistent refusal to accept the fact that the problem really lies with the concept of the RE assumption itself. As Serghiou & Goodman state, most of these variants produce similar estimates and all, including the DerSimonian-Laird method, produce CIs that are too narrow and P values that are too small when there are few studies and sizable heterogeneity. The fixed effect model (IVhet) remains optimal in this setting of few studies and high heterogeneity [3,18]. In addition, small studies more strongly influence estimates from random-effects than from fixed-effect models (like the IVhet), and since smaller studies are usually judged to be more likely to be biased, as Serghiou & Goodman state, this can be a substantial concern.
How Should the Results of a Random-Effects Meta-analysis Be Interpreted in This Particular Study?
The issues discussed above mean that the problems with error estimation will lead to spuriously significant estimates under the random effects assumption. The differing study effects reflect heterogeneity between studies in terms of the differences each study brings to the table. This is what has led to the deviation of study effects from the net treatment benefit, and it arises through systematic differences in patient populations, concomitant care, measurement biases and other study characteristics, all of which are factors that impact on what is known as the heterogeneity of treatment effects. If we simulate under the assumption that, in truth, systematic error leads to deviation of the study effects from the net treatment benefit, and that the magnitude and direction of such a deviation is study specific, then by assuming random effects with a common variance, all that will happen is that when differences between study effects are large, the RE estimator will just move towards equal weights by down-weighting big studies and up-weighting small studies. This is what happens in the meta-analyses reported in Busse et al, and since the one in Figure 2 of Busse et al has an I2 of 70.4%, all studies have more or less equal weights. The meta-analysis by Busse et al cannot be considered conclusive unless an appropriate fixed effect analysis is run (see attached file here); these results then provide strong evidence against opioids increasing pain and suggest that opioids are generally likely to reduce chronic non-cancer pain by a modest 0.62 cm more than placebo (less than the 1 cm minimum clinically important difference). This is more conservative than even what they report and, in view of the amount of heterogeneity, it is possible that in some settings and patients the net treatment benefit of opioids could be smaller or greater than this fixed-effect estimate.
This is not because the net treatment benefit varies across studies but rather because the true net treatment benefit in the ideal population is modulated by individual study populations and characteristics. Heterogeneity therefore tells us that there are many reasons for study deviations from the net treatment benefit and should be investigated to clarify this area. Evidently, a meta-analysis should not be attempted if there is no biologic plausibility for the existence of a net treatment benefit in a representative population.
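For anyone who wants to try the kind of simulation described above, the following is a rough sketch and not the protocol used in any of the cited papers. It assumes one net treatment benefit theta, a study-specific systematic error whose scale differs from study to study, and normal random error on top; all parameter values, the uniform ranges and the function names are my own illustrative choices. It reports how often the 95% CI of each model covers theta, and I deliberately make no claim here about what numbers it will produce.

```python
import math
import random

def dl_tau2(y, v):
    """DerSimonian-Laird moment estimate of tau^2."""
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sw - sum(wi ** 2 for wi in w) / sw
    return max(0.0, (q - (len(y) - 1)) / c)

def simulate_coverage(theta=-0.6, k=8, max_bias_sd=0.4, reps=2000, seed=1):
    """Share of replicates whose 95% CI covers theta, for RE (DL) and IVhet."""
    rng = random.Random(seed)
    hits = {"RE": 0, "IVhet": 0}
    for _ in range(reps):
        # Assumed within-study variances (big and small studies mixed).
        v = [rng.uniform(0.005, 0.2) for _ in range(k)]
        # Study-specific systematic error: each study draws its own bias
        # scale, so deviations are not sampled from one common distribution.
        bias = [rng.gauss(0.0, rng.uniform(0.0, max_bias_sd)) for _ in range(k)]
        y = [theta + b + rng.gauss(0.0, math.sqrt(vi)) for b, vi in zip(bias, v)]

        tau2 = dl_tau2(y, v)
        w = [1.0 / vi for vi in v]
        sw = sum(w)
        ivhet_est = sum(wi * yi for wi, yi in zip(w, y)) / sw
        ivhet_var = sum((wi / sw) ** 2 * (vi + tau2) for wi, vi in zip(w, v))

        w_re = [1.0 / (vi + tau2) for vi in v]
        sw_re = sum(w_re)
        re_est = sum(wi * yi for wi, yi in zip(w_re, y)) / sw_re
        re_var = 1.0 / sw_re

        for name, est, var in (("RE", re_est, re_var), ("IVhet", ivhet_est, ivhet_var)):
            half = 1.96 * math.sqrt(var)
            if est - half <= theta <= est + half:
                hits[name] += 1
    return {name: n / reps for name, n in hits.items()}

if __name__ == "__main__":
    for name, cov in simulate_coverage().items():
        print(f"{name}: coverage of theta = {cov:.3f}")
```

Changing k, max_bias_sd and the variance range lets you explore the few-studies, high-heterogeneity setting; the data-generating step is where the assumption about the source of heterogeneity is made, so that is the part to argue about.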
1. Serghiou S, Goodman SN. Random-Effects Meta-analysis: Summarizing Evidence With Caveats. JAMA 2019; 321:301-302.
2. Busse JW, Wang L, Kamaleldin M, Craigie S, Riva JJ, Montoya L, et al. Opioids for Chronic Noncancer Pain: A Systematic Review and Meta-analysis. JAMA 2018; 320:2448-2460.
3. Doi SA, Barendregt JJ, Khan S, Thalib L, Williams GM. Advances in the meta-analysis of heterogeneous clinical trials I: The inverse variance heterogeneity model. Contemp Clin Trials 2015; 45:130-8.
4. Noma H. Confidence intervals for a random-effects meta-analysis based on Bartlett-type corrections. Stat Med 2011; 30:3304-12.
5. Brockwell SE, Gordon IR. A simple method for inference on an overall effect in meta-analysis. Stat Med 2007; 26:4531-43.
6. Brockwell SE, Gordon IR. A comparison of statistical methods for meta-analysis. Stat Med 2001; 20:825-40.
7. Al Khalaf MM, Thalib L, Doi SA. Combining heterogenous studies using the random-effects model is a mistake and leads to inconclusive meta-analyses. J Clin Epidemiol 2011; 64:119-23.
8. Senn S. Trying to be precise about vagueness. Stat Med 2007; 26:1417-30.
9. Higgins JP, Thompson SG, Spiegelhalter DJ. A re-evaluation of random-effects meta-analysis. J R Stat Soc Ser A Stat Soc 2009; 172:137-159.
10. Doi SA, Thalib L. A Quality-Effects Model for Meta-Analysis. Epidemiology 2008; 19:94-100.
11. Doi SA, Barendregt JJ, Khan S, Thalib L, Williams GM. Advances in the meta-analysis of heterogeneous clinical trials II: The quality effects model. Contemp Clin Trials 2015; 45:123-9.
12. Kravitz RL, Duan N, Braslow J. Evidence-based medicine, heterogeneity of treatment effects, and the trouble with averages. Milbank Q 2004; 82:661-87.
13. Poole C, Greenland S. Random-effects meta-analyses are not always conservative. Am J Epidemiol 1999; 150:469-75.
14. Shuster JJ. Empirical vs natural weighting in random effects meta-analysis. Stat Med 2010; 29:1259-65.
15. Overton RC. A Comparison of Fixed-Effects and Mixed (Random-Effects) Models for Meta-Analysis Tests of Moderator Variable Effects. Psychological Methods 1998; 3:354-379.
16. Thorlund K, Wetterslev J, Awad T, Thabane L, Gluud C. Comparison of statistical inferences from the DerSimonian-Laird and alternative random-effects model meta-analyses - an empirical assessment of 920 Cochrane primary outcome meta-analyses. Res Synth Methods 2011; 2:238-53.
17. Langan D, Higgins JP, Simmonds M. Comparative performance of heterogeneity variance estimators in meta-analysis: a review of simulation studies. Res Synth Methods 2016;
18. Doi SAR, Furuya-Kanamori L, Thalib L, Barendregt JJ. Meta-analysis in evidence-based healthcare: a paradigm shift away from random effects is overdue. Int J Evid Based Healthc 2017; 15:152-160.
19. Henmi M, Copas JB. Confidence intervals for random effects meta-analysis and robustness to publication bias. Stat Med 2010; 29:2969-83.