I have a new working paper with Stephan Bruns on the meta-analysis of Granger causality test statistics. This is a methodological paper that follows up on our meta-analysis of the energy-GDP Granger causality literature that was published in the *Energy Journal* last year. There are several biases in the published literature on the energy-output relationship, which we document in the *Energy Journal* paper:

1. Publication bias due to sampling variability - the common tendency for statistically significant results to be preferentially published. This is either because journals reject papers that don't find anything significant or, more likely, because authors don't bother submitting papers without significant results. So they either scrap studies that don't find anything significant or data-mine until they do. This means that the published literature may over-represent the frequency of statistically significant tests. This is likely to be a problem in many areas of economics, but especially in a field where results are all about test statistics and not about effect sizes.

2. Omitted variables bias - Granger causality tests are very susceptible to omitted variables bias. For example, energy use might seem to cause output in a bivariate Granger causality test because it is highly correlated with capital. This is a very serious problem in the actual empirical Granger causality literature, which I noted in my PhD dissertation.

3. Over-fitting/over-rejection bias - In small samples, there is a tendency for vector autoregression model fitting procedures to select more lags of the variables than the true underlying data generation process has. There is also a tendency to over-reject the null hypothesis of no causality in these over-fitted models. This means that a lot of Granger causality results from small sample studies are spurious. We realized in our *Energy Journal* paper that this was also a serious problem in the empirical Granger causality literature. The following graph illustrates this using studies from the energy-output causality literature:

Each graph shows normalized test statistics for causality in one of the two directions. Rather than fit models with more lags in larger samples, researchers tend to deplete the degrees of freedom by adding more lags. Therefore, there tend to be fewer degrees of freedom for studies with three lags than with two, and fewer for those with two than with one. Also, we see that the average significance level increases as the number of lags increases and the degrees of freedom fall.
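The over-rejection mechanism in point 3 can be illustrated with a small Monte Carlo sketch. This is not the simulation design from the paper: it assumes two independent AR(1) series of 25 observations (so the true model has one lag and no causality), a single-equation test rather than a full VAR, and an asymptotic chi-square Wald test at the nominal 5% level. The function names, sample size, and AR coefficient are illustrative choices.

```python
import numpy as np

# 5% critical values of the chi-square distribution for 1, 2 and 3 restrictions.
CHI2_CRIT_5PCT = {1: 3.841, 2: 5.991, 3: 7.815}

def ar1(n, rho, rng):
    """Simulate an AR(1) series of length n, discarding a burn-in period."""
    burn = 50
    e = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    for t in range(1, n + burn):
        x[t] = rho * x[t - 1] + e[t]
    return x[burn:]

def lagmat(x, p):
    """Columns x[t-1], ..., x[t-p] for t = p, ..., len(x) - 1."""
    n = len(x)
    return np.column_stack([x[p - k : n - k] for k in range(1, p + 1)])

def rss(y, X):
    """Residual sum of squares from an OLS regression of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

def wald_granger(y, x, p):
    """Wald statistic for H0: the p lags of x do not help predict y."""
    target = y[p:]
    const = np.ones((len(target), 1))
    X_r = np.hstack([const, lagmat(y, p)])   # restricted: own lags only
    X_u = np.hstack([X_r, lagmat(x, p)])     # unrestricted: add lags of x
    rss_r, rss_u = rss(target, X_r), rss(target, X_u)
    return len(target) * (rss_r - rss_u) / rss_u

def rejection_rates(n_obs=25, reps=2000, seed=0):
    """Share of replications rejecting a true null, by number of lags."""
    rng = np.random.default_rng(seed)
    rejections = {p: 0 for p in CHI2_CRIT_5PCT}
    for _ in range(reps):
        y = ar1(n_obs, 0.5, rng)
        x = ar1(n_obs, 0.5, rng)  # independent of y, so H0 is true
        for p, crit in CHI2_CRIT_5PCT.items():
            rejections[p] += wald_granger(y, x, p) > crit
    return {p: count / reps for p, count in rejections.items()}

rates = rejection_rates()
for p, rate in rates.items():
    print(f"lags={p}: rejection rate at nominal 5% = {rate:.3f}")
```

Under these assumptions the rejection rate should sit above the nominal 5% and rise as lags are added, because each extra lag uses up degrees of freedom that the asymptotic test does not account for.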

Of course, the latter two types of bias give researchers additional opportunities to select statistically significant results for publication and so, more generally, "publication bias" includes selection of statistically significant results from those provided by sampling variability and by various biases.

The standard meta-regression model used in economics deals with the first of the three biases by exploiting the idea that if there is a genuine effect then studies with larger samples should have more statistically significant test statistics than smaller studies. If there is no real effect then there will be either no relation or even a negative relation between significance and sample size. Meta-analysis can test for the effects of omitted variables bias by including dummy variables and interaction terms for the different variables included in primary studies. Finally, in our *Energy Journal* paper we controlled for the over-fitting/over-rejection bias by including the number of degrees of freedom lost in model fitting in our meta-regression.

The new paper focuses on the latter issue and examines both the potential prevalence of over-fitting and over-rejection and the effectiveness of controlling for over-fitting. The approach used in this paper is a little different from that in the *Energy Journal* paper - here we include the number of lags selected as the control variable. We show by means of Monte Carlo simulations that, even if the primary literature is dominated by false-positive findings of Granger causality, the meta-regression model correctly identifies the absence of genuine Granger causality. The following graphs show the key results:

Power is the probability of rejecting the null hypothesis of no causality when it is incorrect - so here we have set up a simulated VAR where there is causality from energy to GDP. Mu is the mean sample size of the primary studies in our simulation and var is the variance. So, the left-hand graph is a simulation of mostly small sample studies. The middle one has a mixture of small and large studies, and the right-hand graph has mostly large studies (but a few small ones too). The meta sample size is the number of studies that are brought together in the meta-analysis. DGP2a is a data generating process with a small effect size - DGP2b has a larger effect size.

So, what do these graphs show? When the samples in primary studies are small and we only have a meta sample of 10 or 20 studies, it is hard to detect a genuine effect, whatever we do. When the effect size is small, it is still hard to detect an effect using the traditional economics meta-regression model ("basic model"), even when we have 80 primary studies. Our "extended model", which controls for the number of lags, really helps a lot in this situation. With large primary study sizes it is quite easy to detect a true effect with only 20 studies in the meta-analysis and our method adds little value. However, the energy-GDP causality literature has mostly small, similarly sized samples and is trying to detect what is quite a small effect in the energy-causes-GDP direction (elasticity of 0.05 or 0.1). Our approach has much to offer in this context.
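The difference between the basic and extended models can be sketched on synthetic meta-data. The exact specification used in the paper is not given in this post, so this assumes a linear regression of each study's normalized test statistic on the square root of its degrees of freedom (basic), with the number of lags added as a control (extended). The variable names, the effect and bias values, and the rule that a bivariate VAR equation loses 1 + 2 x lags degrees of freedom are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_studies = 80  # meta sample size

# Hypothetical primary studies: lag length, sample size, degrees of freedom.
lags = rng.integers(1, 4, size=n_studies)        # 1, 2 or 3 lags
n_obs = rng.integers(20, 200, size=n_studies)
df = n_obs - (1 + 2 * lags)                      # assumed df loss per equation

# Assumed values, for illustration only: a genuine effect that grows with
# sqrt(df), plus an over-rejection bias that grows with the number of lags.
true_effect, lag_bias = 0.10, 0.50
z = true_effect * np.sqrt(df) + lag_bias * lags + rng.standard_normal(n_studies)

def ols(y, X):
    """Return OLS coefficients and their conventional standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se

const = np.ones(n_studies)
designs = {
    "basic": np.column_stack([const, np.sqrt(df)]),
    "extended": np.column_stack([const, np.sqrt(df), lags]),
}

fits = {}
for name, X in designs.items():
    fits[name] = ols(z, X)
    beta, se = fits[name]
    print(f"{name}: slope on sqrt(df) = {beta[1]:.3f} (t = {beta[1] / se[1]:.2f})")
```

In both regressions the slope on the square root of degrees of freedom is the test for a genuine effect. The lag control is meant to absorb the part of the test statistics that comes from over-fitting rather than from the data.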
