A test for heterogeneity examines the null hypothesis that all studies are evaluating the same effect. The usual test statistic (Cochran's Q) is computed by summing the squared deviations of each study's estimate from the overall meta-analytic estimate, weighting each study's contribution in the same manner as in the meta-analysis [7]. P values are obtained by comparing the statistic with a χ² distribution with k-1 degrees of freedom (where k is the number of studies).

The test is known to be poor at detecting true heterogeneity among studies as significant. Meta-analyses often include small numbers of studies [6, 8], and the power of the test in such circumstances is low [9, 10]. For example, consider the meta-analysis of randomised controlled trials of amantadine for preventing influenza (fig 1) [11]. The treatment effects in the eight trials seem inconsistent: the reductions in odds vary from 16% to 93%, and some of the confidence intervals do not overlap. But the test of heterogeneity yields a P value of 0.09, conventionally interpreted as non-significant. Because the test is poor at detecting true heterogeneity, a non-significant result cannot be taken as evidence of homogeneity. Using a cut-off of 10% for significance [12] ameliorates this problem but increases the risk of drawing a false positive conclusion (type I error) [10].

Fig 1: Eight trials of amantadine for prevention of influenza. Summary odds ratios calculated with random effects method.

Conversely, the test arguably has excessive power when there are many studies, especially when those studies are large. One of the largest meta-analyses in the Cochrane Database of Systematic Reviews is of clinical trials of tricyclic antidepressants and selective serotonin reuptake inhibitors for treatment of depression [13]. Over 15 000 participants from 135 trials are included in the assessment of comparative drop-out rates, and the test for heterogeneity is significant (P = 0.005). However, this P value does not reasonably describe the extent of heterogeneity in the results of the trials. As we show later, a little inconsistency exists among these trials but it does not affect the conclusion of the review (that serotonin reuptake inhibitors have lower discontinuation rates than tricyclic antidepressants).

Since systematic reviews bring together studies that are diverse both clinically and methodologically, heterogeneity in their results is to be expected [6]. For example, heterogeneity is likely to arise through diversity in doses, lengths of follow up, study quality, and inclusion criteria for participants.
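
The calculation behind the test for heterogeneity (Cochran's Q compared with a χ² distribution on k-1 degrees of freedom) can be illustrated with a minimal sketch. It rests on assumptions not stated in the text: each study supplies an effect estimate (for instance a log odds ratio) and its variance, and the weights are inverse variances, as in a common fixed effect meta-analysis. The function names `chi2_sf` and `cochran_q` and the example numbers are hypothetical and are not taken from any of the trials discussed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable, integer df >= 1.

    Uses the recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    for the regularised upper incomplete gamma function, with h = x / 2.
    """
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)                 # Q(1, h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))      # Q(1/2, h)
    while a < df / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q


def cochran_q(estimates, variances):
    """Return (Q, degrees of freedom, P value) for a set of study estimates.

    Assumption: inverse-variance weights, i.e. weight_i = 1 / variance_i.
    """
    if len(estimates) != len(variances) or len(estimates) < 2:
        raise ValueError("need matching estimates and variances for at least 2 studies")
    weights = [1.0 / v for v in variances]
    # Weighted average of the study estimates (the overall meta-analytic estimate).
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    # Weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    return q, df, chi2_sf(q, df)


if __name__ == "__main__":
    # Hypothetical log odds ratios and variances for four studies.
    log_odds_ratios = [-0.9, -0.2, -1.4, -0.5]
    variances = [0.10, 0.05, 0.30, 0.08]
    q, df, p = cochran_q(log_odds_ratios, variances)
    print(f"Q = {q:.2f} on {df} df, P = {p:.3f}")
```

Because Q is referred to a distribution whose degrees of freedom depend only on the number of studies, the same degree of scatter among estimates gives a larger Q, and so a smaller P value, when the studies are larger and their variances smaller. This is in keeping with the behaviour described above: low power with few small studies and arguably excessive power with many large ones.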