Last year, Energy Economics announced a call for papers for a special issue on replication in energy economics. Stephan Bruns, Johannes König, and I decided to replicate my 1993 paper in Energy Economics on Granger causality between energy use and GDP. That paper was the first chapter of my PhD dissertation. It is my fourth most cited paper and, given the number of citations, could be considered "classic" enough to warrant an updated robustness analysis. In fact, another replication of my paper has already been published as part of the special issue. The main results of my 1993 paper were that, in order to find Granger causality from energy use to GDP, we need both to use a quality-adjusted measure of energy and to control for capital and labor inputs.
It is a bit unusual to include the original author as an author on a replication study, and my role was somewhat unusual too. Before the research commenced, I discussed the issues involved in replicating this paper with Stephan and gave feedback on the proposed design of the replication and robustness analysis. The research plan was published on a website dedicated to pre-analysis plans. Publishing a research plan is similar to registering a clinical trial and is supposed to help reduce the prevalence of p-hacking. Then, after Stephan and Johannes carried out the analysis, I gave feedback and helped edit the final paper.
Unfortunately, I had lost the original dataset, and the various time series I used have since been updated by the US government agencies that produce them. The only way to reconstruct the original data would have been to find hard copies of all the original data sources. Instead, we used the data from my 2000 paper in Energy Economics, which are quite similar to the original data. Using these close-to-original data, Stephan and Johannes could reproduce all my original results in terms of the direction of Granger causality and the same qualitative significance levels. In this sense, the replication was a success.
But the test I did in 1993 on the log levels of the variables is inappropriate if the variables have stochastic trends (unit roots). The more appropriate test is the Toda-Yamamoto test. So, the next step was to redo the 1993 analysis using the Toda-Yamamoto test. Surprisingly, these results are also very similar to those in Stern (1993). But when Stephan and Johannes used the data for 1949-1990 that are currently available on US government websites, the Granger causality test of the effect of energy on GDP was no longer statistically significant at the 10% level. Revisions to past GDP have been very extensive, as we show in the paper.
Results were similar when they extended the data to 2015. However, when they allowed for structural breaks in the intercept to account for oil price shocks and the 2008-9 financial crisis, the results were again quite similar to Stern (1993) both for 1949-1990 and for 1949-2015.
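For readers who have not met the Toda-Yamamoto procedure mentioned above, here is a minimal sketch of the idea in Python. It is not the code used in the paper. The function names `toda_yamamoto` and `chi2_sf`, the lag length `p=2`, the maximum order of integration `d_max=1`, and the simulated series are all assumptions made for illustration. The idea is to fit the equation in levels with `p + d_max` lags of every variable, and then run a Wald test on only the first `p` lags of the variable whose causal effect is in question. The extra `d_max` lags stay in the regression but are left out of the test, which is what keeps the chi-square distribution valid when the series have unit roots.

```python
import math
import numpy as np


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    z = x / 2.0
    # Start from Q(1/2, z) = erfc(sqrt(z)) for odd df or Q(1, z) = exp(-z)
    # for even df, then apply Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    a, q = (0.5, math.erfc(math.sqrt(z))) if df % 2 else (1.0, math.exp(-z))
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def toda_yamamoto(y, x, controls=None, p=2, d_max=1):
    """Wald test of H0: x does not Granger-cause y (hypothetical helper).

    Fits the y equation of a levels VAR with p + d_max lags by OLS and
    tests only the first p lags of x. Returns (wald_statistic, p_value).
    """
    series = [np.asarray(y, float), np.asarray(x, float)]
    if controls is not None:
        series += [np.asarray(c, float) for c in controls]
    data = np.column_stack(series)
    T, k = data.shape
    m = p + d_max

    # Regressors: a constant, then lags 1..m of every variable.
    blocks = [np.ones((T - m, 1))]
    for lag in range(1, m + 1):
        blocks.append(data[m - lag:T - lag])
    X = np.hstack(blocks)
    target = data[m:, 0]

    # OLS estimates and the usual homoskedastic covariance matrix.
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (len(target) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)

    # x is the second variable, so its lag-l coefficient sits in
    # column 1 + (l - 1) * k + 1. Only lags 1..p enter the test.
    idx = [1 + (lag - 1) * k + 1 for lag in range(1, p + 1)]
    b = beta[idx]
    V = cov[np.ix_(idx, idx)]
    wald = float(b @ np.linalg.solve(V, b))
    return wald, chi2_sf(wald, p)


# Synthetic illustration only: x is a random walk and y depends on lagged x.
rng = np.random.default_rng(0)
T = 300
x_sim = np.cumsum(rng.normal(size=T))
y_sim = np.zeros(T)
for t in range(1, T):
    y_sim[t] = 0.5 * y_sim[t - 1] + 0.5 * x_sim[t - 1] + rng.normal()

wald, p_value = toda_yamamoto(y_sim, x_sim, p=2, d_max=1)
print(f"Wald statistic: {wald:.2f}, p-value: {p_value:.4g}")
```

In the setting of this post, `y` would be log GDP, `x` log energy use, and `controls` the capital and labor series. In practice the lag length would be chosen by an information criterion and `d_max` by unit root tests, rather than fixed as here.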
They then carried out an extensive robustness check using different control variables and variable specifications and a meta-analysis of those tests to see which factors had the greatest influence on the results.
They conclude that p-values tend to be substantially smaller (test statistics are more significant) if energy use is quality adjusted rather than measured by total joules and if capital is included. Including labor has mixed results. These findings largely support Stern’s (1993) two main conclusions and emphasize the importance of accounting for changes in the energy mix in time series modeling of the energy-GDP relationship and controlling for other factors of production.
I am pretty happy with the outcome of this analysis! Usually it is hard to publish replication studies that confirm the results of previous research. We have just resubmitted the paper to Energy Economics and I am hoping that this mostly confirmatory replication will be published. The referees added a lot of value to the paper in this case, as they suggested doing the analysis with structural breaks.