I have a new working paper coauthored with Johannes König and Richard Tol. It's a follow-up to my 2013 paper in the Journal of Economic Literature, where I computed standard errors for simple journal impact factors for all economics journals and tried to evaluate whether the differences between journals were significant.* In the new paper, we develop standard errors and confidence intervals for recursive journal impact factors, which take into account that some citations are more prestigious than others, as well as for the associated ranks of journals. We again apply these methods to all the economics journals included in the Web of Science.
Recursive impact factors include the popular SCImago Journal Rank (SJR) and Clarivate's Article Influence score. We use Pinski and Narin's invariant method, which has been used in some rankings of economics journals.
As a simple impact factor is just the mean number of citations that articles published in a journal in a given period receive in a later year, it is easy to compute its standard error using the formula for the standard error of the mean. But the vector of recursive impact factors is the positive eigenvector of a matrix, and its variance does not have a simple analytical form.
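To make the contrast concrete, here is a minimal Python sketch (illustrative only, not code from the paper). The citation counts and the citation matrix are made up, and the column normalization is just one simple choice; in the paper the matrix is normalized following Pinski and Narin's invariant method. The point is that the simple impact factor's uncertainty comes straight from the standard error of a mean, while the recursive impact factor is the dominant positive eigenvector of a matrix, here found by power iteration.

```python
import numpy as np

# --- Simple impact factor: mean citations per article, plus a standard error ---
cites = np.array([0, 1, 1, 3, 5, 12, 2, 0, 4, 7])     # made-up citation counts
simple_if = cites.mean()
se = cites.std(ddof=1) / np.sqrt(len(cites))           # standard error of the mean
print(f"simple IF = {simple_if:.2f}, 95% CI ~ +/- {1.96 * se:.2f}")

# --- Recursive impact factor: positive eigenvector of a normalized citation matrix ---
# C[i, j] = citations from journal j to articles in journal i (toy numbers).
C = np.array([[5., 8., 2.],
              [3., 4., 6.],
              [1., 2., 9.]])
M = C / C.sum(axis=0)        # normalize each citing journal's column (illustrative choice)

w = np.full(M.shape[0], 1.0 / M.shape[0])
for _ in range(1000):        # power iteration converges to the dominant eigenvector
    w_next = M @ w
    w_next /= w_next.sum()
    if np.allclose(w_next, w):
        break
    w = w_next
print("recursive impact weights:", np.round(w, 3))
```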
So, we use bootstrapping to estimate the distribution of each impact factor. Taking all 88,928 articles published in 2014-18 in the economics journals included in the Web of Science, we resample from this dataset with replacement and compute the vector of recursive impact factors from the new dataset.** Repeating this 1,000 times, we take the 2.5% and 97.5% percentiles of the values for each journal to get a 95% confidence interval (a stylized code sketch of this loop appears below):
95% confidence intervals of the recursive impact factor, arithmetic scale (left axis) and logarithmic scale (right axis).
The graph shows the same data twice on different scales so that some detail is visible for both high- and low-ranked journals. Notice that while the confidence intervals for the highest-ranked journals are fairly symmetric, they become increasingly asymmetric as we move down the ranking.
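For those who want to see the mechanics, a stylized version of the resampling loop behind these intervals might look like the sketch below. It assumes a hypothetical helper, recursive_if(articles), that maps a sample of articles to the vector of journal impact factors; the function name and data layout are illustrative, not from the paper.

```python
import numpy as np

def bootstrap_ci(articles, recursive_if, n_boot=1000, seed=0):
    """Percentile-bootstrap 95% CIs for a vector of journal impact factors.

    `articles` is a NumPy array of article records and `recursive_if` is a
    (hypothetical) function mapping a sample of articles to the vector of
    recursive impact factors, one entry per journal.
    """
    rng = np.random.default_rng(seed)
    n = len(articles)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample articles with replacement
        stats.append(recursive_if(articles[idx]))
    stats = np.vstack(stats)                      # shape: (n_boot, n_journals)
    lower = np.percentile(stats, 2.5, axis=0)
    upper = np.percentile(stats, 97.5, axis=0)
    return lower, upper, stats
```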
The top-ranked journal, the Quarterly Journal of Economics, clearly stands out above all others. The confidence interval of every other journal overlaps with those of at least some other journals, and so the ranks of these journals are somewhat uncertain.*** So, next we construct confidence intervals for the journals' ranks.
It turns out that there are a few ways to do this. We could simply construct a journal ranking for each iteration of the bootstrap and then derive the distribution of ranks for each individual journal across the 1,000 iterations. However, Hall and Miller (2009), Xie et al. (2009), and Mogstad et al. (2020) show that this procedure may not be consistent when some of the groups being ranked (here, journals) are tied or close to tied. The corrected confidence intervals are generally broader than those from the naive bootstrap approach.
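The naive version is easy to sketch. Continuing the hypothetical bootstrap output above (a stats array with one row of recursive impact factors per replication), ranks within each replication can be turned into percentile intervals like this (again illustrative, and not the corrected methods):

```python
import numpy as np
from scipy.stats import rankdata

def naive_rank_ci(stats):
    """Naive bootstrap 95% CIs for journal ranks.

    `stats` has shape (n_boot, n_journals): one row of recursive impact
    factors per bootstrap replication. Rank 1 = highest impact factor.
    """
    ranks = np.vstack([rankdata(-row, method="min") for row in stats])
    lower = np.percentile(ranks, 2.5, axis=0)
    upper = np.percentile(ranks, 97.5, axis=0)
    return lower, upper
```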
We compute confidence intervals for ranks using the simple bootstrap, the Xie et al. method, and the Mogstad et al. method:
95% confidence intervals of the rank based on the recursive impact factor. The inner intervals are based on Goldstein’s bootstrap method, the middle intervals use Xie’s correction to the bootstrap, and the outer intervals follow Mogstad’s pairwise comparison.
The simple bootstrap underestimates the true range of ranks, while it seems that the Mogstad et al. method might be overly conservative. On the other hand, the Xie et al. approach depends on choosing a couple of "tuning parameters".
All methods agree that the confidence interval of the rank of the Quarterly Journal of Economics includes only first place. Based on the simple bootstrap, the remainder of the "Top 5" journals are in the top six together with the Journal of Finance, while the Xie et al. and Mogstad et al. methods generally broaden the estimated confidence intervals, particularly for mid-ranking journals. All methods agree that most apparent differences in journal quality are, in fact, insignificant. We think that impact factors, whether simple or recursive, should always be published together with confidence intervals.
* The latter exercise was a bit naive. As pointed out by Horrace and Parmeter (2017), we need to account for the issue of multiple comparisons.
** Previous research on this topic resampled at the journal level, missing most of the variation in citation counts.
*** Overlapping confidence intervals don't necessarily mean that there is no significant difference between two means. Goldstein and Harvey (1995) show that the correct confidence intervals for such a test of the difference between two means are narrower than the conventional 95% confidence intervals. On the other hand, for multiple comparisons we would want wider confidence intervals.