Is the Poisson distribution a short-cut to getting standard errors for journal impact factors? The nice thing about the Poisson distribution is that the variance is equal to the mean. The journal impact factor is the mean number of citations received in a given year by articles published in a journal in the previous few years. So if citations followed a Poisson distribution, it would be easy to compute a standard error for the impact factor. The only additional information you would need besides the impact factor itself is the number of articles published in the relevant previous years.
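Written out, the short-cut would look something like the following. This is a sketch of the reasoning rather than a formula quoted from either paper: it assumes the N citation counts C_j are independent draws from one Poisson distribution with mean λ, and it writes M for the impact factor.

```latex
\operatorname{Var}(C_j) = \lambda
\quad\Longrightarrow\quad
\operatorname{SE}(M) = \sqrt{\frac{\operatorname{Var}(C_j)}{N}}
                     = \sqrt{\frac{\lambda}{N}}
                     \approx \sqrt{\frac{M}{N}}
```

The last step simply replaces the unknown λ with its estimate, the impact factor itself.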
This is the idea behind Darren Greenwood's 2007 paper on credible intervals for journal impact factors. As he takes a Bayesian approach, things are a little more complicated in practice. Earlier this year, Lutz Bornmann published a letter in Scientometrics that also proposes using the Poisson distribution to compute uncertainty bounds, this time as frequentist confidence intervals. Using the data from my 2013 paper in the Journal of Economic Literature, I investigated whether this proposal would work. My comment on Bornmann's letter is now published in Scientometrics.
It is not necessarily a good assumption that citations follow a Poisson process. First, it is well known that the number of citations an article receives each year first increases and then decreases (Fok and Franses, 2007; Stern, 2014), so the simple Poisson assumption cannot be true for individual articles. For example, Fok and Franses argue that for articles that receive at least some citations, the profile of citations over time follows the Bass model. Furthermore, articles in a journal vary in quality and do not all have the same expected number of citations. Previous research finds that the distribution of citations across a group of articles is related to the log-normal distribution (Stringer et al., 2010; Wang et al., 2013).
Stern (2013) computed the observed standard deviation of citations received in 2011, at the journal level, for all articles published in the previous five years in all 230 journals in the economics subject category of the Journal Citation Reports, using the standard formula for the variance:
$$V_i = \frac{1}{N_i - 1} \sum_{j=1}^{N_i} (C_j - M_i)^2$$
where $V_i$ is the variance of citations received in 2011 for all articles published in journal $i$ between 2006 and 2010 inclusive, $N_i$ is the number of articles published in the journal in that period, $C_j$ is the number of citations received in 2011 by article $j$ published in the relevant period, and $M_i$ is the 5-year impact factor of the journal. (The formula is reconstructed here from these definitions; the $N_i - 1$ denominator is an assumption.) Then the standard error of the impact factor is $\sqrt{V_i / N_i}$.
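As a rough illustration of this calculation, here is a minimal Python sketch that computes the impact factor, the empirical standard error, and the standard error implied by the Poisson assumption for a single journal. The function name `impact_factor_se` and the citation counts are made up for illustration, and the sample (n - 1) variance is an assumption rather than a detail confirmed by the paper.

```python
import math
from statistics import mean, variance


def impact_factor_se(citations):
    """Return (impact factor, empirical SE, Poisson SE) for one journal.

    `citations` holds one count per article: the citations received in the
    census year by each article published in the preceding window.
    """
    n = len(citations)
    m = mean(citations)              # impact factor M_i
    v = variance(citations)          # sample variance V_i (n - 1 denominator; an assumption)
    se_empirical = math.sqrt(v / n)  # sqrt(V_i / N_i)
    se_poisson = math.sqrt(m / n)    # what "variance equals mean" would imply
    return m, se_empirical, se_poisson


# Hypothetical, made-up counts for a small journal with skewed citations
# (not real data from any journal).
example = [0, 0, 1, 1, 2, 3, 5, 8, 20, 40]
m, se_emp, se_pois = impact_factor_se(example)
print(f"IF={m:.2f}  empirical SE={se_emp:.2f}  Poisson SE={se_pois:.2f}")
```

With skewed counts like these, a few highly cited articles inflate the variance well above the mean, so the Poisson figure comes out much smaller than the empirical one.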
Table 1 in Stern (2013) presents the standard deviation of citations, the estimated 5-year impact factor, the standard error of that impact factor, and a 95% confidence interval for all 230 journals. It also includes the number of articles published in the five-year window, the official impact factor published in the Journal Citation Reports, and the median citations for each journal.
The following graph plots the variance against the mean for the 229 journals with non-zero impact factors:
[Figure: variance of citations plotted against mean citations for each journal]
There is a strong linear relationship between the logs of the mean and the variance, but it is obvious that the variance is not equal to the mean for this dataset. A simple regression of the log of the variance of citations on the log of the mean yields:
[Equation: estimated regression of ln V_i on a constant and ln M_i, with standard errors in parentheses; the coefficient values are not reproduced here]
The R-squared of this regression is 0.92. If citations followed the Poisson distribution, the variance would equal the mean, so the constant would be zero and the slope would be equal to one. These hypotheses are clearly rejected. Using the Poisson assumption for these journals would result in underestimating the width of the confidence interval for almost all journals, especially those with higher impact factors. In fact, only four journals have variances equal to or smaller than their impact factors. As an example, the standard error of the impact factor estimated by Stern (2013) for the Quarterly Journal of Economics is 0.57. The Poisson approach yields 0.2.
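For readers who want to run this kind of check on their own set of journals, the following is a minimal sketch of the log-log regression using ordinary least squares written out by hand. The function name `loglog_fit` and the (mean, variance) pairs are invented for illustration and are not the values from Stern (2013); the two t-ratios test the constant and the slope separately, which is a simplification rather than a joint test.

```python
import math


def loglog_fit(means, variances):
    """OLS of ln(variance) on ln(mean); returns (a, b, se_a, se_b)."""
    x = [math.log(m) for m in means]
    y = [math.log(v) for v in variances]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = y_bar - b * x_bar              # constant
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)                 # residual variance
    se_b = math.sqrt(s2 / sxx)
    se_a = math.sqrt(s2 * (1 / n + x_bar ** 2 / sxx))
    return a, b, se_a, se_b


# Made-up (mean, variance) pairs for six hypothetical journals.
means = [0.3, 0.6, 1.0, 1.8, 3.0, 5.5]
variances = [0.5, 1.4, 3.1, 8.0, 19.0, 60.0]

a, b, se_a, se_b = loglog_fit(means, variances)
t_const = (a - 0.0) / se_a   # Poisson implies a constant of zero
t_slope = (b - 1.0) / se_b   # Poisson implies a slope of one
print(f"constant={a:.2f} (se {se_a:.2f}), slope={b:.2f} (se {se_b:.2f})")
print(f"t (constant = 0): {t_const:.1f}   t (slope = 1): {t_slope:.1f}")
```

If the variances really were equal to the means, the fitted constant would be zero and the slope one, so large t-ratios here point away from the Poisson assumption.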
Unfortunately, accurately computing standard errors and confidence intervals for journal impact factors appears to be harder than just referring to the impact factor and number of articles published. But it is not very difficult to download the citations to articles in a target set of journals from the Web of Science or Scopus and compute the confidence intervals from them. I downloaded the data and did the main computations in my 2013 paper in a single day. It would be trivially easy for Clarivate, Elsevier, or other providers to report standard errors.
References
Bornmann, L. (2017) Confidence intervals for Journal Impact Factors, Scientometrics 111: 1869–1871.
Fok, D. and Franses, P. H. (2007) Modeling the diffusion of scientific publications, Journal of Econometrics 139: 376–390.
Stern, D. I. (2013) Uncertainty measures for economics journal impact factors, Journal of Economic Literature 51(1): 173–189.
Stern, D. I. (2014) High-ranked social science journal articles can be identified from early citation information, PLoS ONE 9(11): e112520.
Stringer, M. J., Sales-Pardo, M. and Nunes Amaral, L. A. (2010) Statistical validation of a global model for the distribution of the ultimate number of citations accrued by papers published in a scientific journal, Journal of the American Society for Information Science and Technology 61(7): 1377–1385.
Wang, D., Song, C. and Barabási, A.-L. (2013) Quantifying long-term scientific impact, Science 342: 127–132.