Wednesday, August 5, 2020

Abandoning a Paper

Now and then it's time to give up on a project. In September 2018, I attended a climate econometrics conference at Frascati, near Rome. For my presentation, I did some research on the performance of different econometric estimators of the equilibrium climate sensitivity (ECS), including the multicointegrating vector autoregression (MVAR) that we used in our paper in the Journal of Econometrics. The paper included estimates using historical time series observations (from 1850 to 2014), a Monte Carlo analysis, estimates using the output of 16 General Circulation Models (GCMs), and a meta-analysis of the GCM results.


The historical results, which are mostly also in the Journal of Econometrics paper, appear to show that taking energy balance into account increases the estimated climate sensitivity. By energy balance, we mean that if there is a disequilibrium between radiative forcing and surface temperature, the ocean must be heating or cooling. Surface temperature is in equilibrium with ocean heat, and in fact follows ocean heat much more closely than it follows radiative forcing. Not taking this into account results in omitted variables bias. Multicointegrating estimators model this stock-flow equilibrium. The residuals from a cointegrating relationship between the temperature and radiative forcing flows are accumulated into a heat stock, which in turn cointegrates with surface temperature. If we had actual observations on ocean heat content or radiative imbalances we could use them, but the available time series are much shorter than those for surface temperature or radiative forcing. The results also suggested that using a longer time series increases the estimated climate sensitivity.
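
Schematically, in stripped-down notation (mine here, not the paper's exact specification), the structure is:

u_t = F_t − λ·T_t       (flow disequilibrium between forcing and temperature)
H_t = H_{t−1} + u_t     (the disequilibria accumulate into an ocean heat stock)
T_t = θ·H_t + e_t       (the heat stock in turn cointegrates with surface temperature)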

The Monte Carlo analysis was supposed to investigate these hypotheses more formally. I used the estimated MVAR as the model of the climate system and simulated the radiative forcing series as a random walk. I generated 2,000 different random walks and estimated the climate sensitivity with each of the estimators. This showed that, not surprisingly, the MVAR was an unbiased estimator. The other estimators were biased when using a random walk of just 165 periods. But when I used a 1,000-year series, all the estimators were unbiased. In other words, they were all consistent estimators of the ECS. This makes sense, because in the end equilibrium is reached between forcing and surface temperature. But it takes a long time.
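
Schematically, the experiment looks something like this (a minimal Python sketch: the random-walk forcing, the 2,000 replications, and the sample sizes are from the paper, but the climate response and the estimator here are simple stand-ins, not the MVAR):

import numpy as np

rng = np.random.default_rng(42)

def simulate_forcing(n, sigma=0.3):
    # Radiative forcing as a driftless random walk (W/m^2)
    return np.cumsum(rng.normal(0.0, sigma, size=n))

def estimate_ecs(forcing, temperature):
    # Stand-in estimator: OLS of temperature on forcing, scaled to a
    # doubling of CO2 (F_2x of roughly 3.7 W/m^2). The paper compared
    # several estimators, including the multicointegrating VAR.
    slope = np.polyfit(forcing, temperature, 1)[0]
    return 3.7 * slope

estimates = []
for rep in range(2000):            # 2,000 replications, as in the paper
    f = simulate_forcing(165)      # 165 periods; I also tried 1,000
    # Stand-in climate response; the real DGP was the estimated MVAR
    t = 0.8 * f + rng.normal(0.0, 0.1, size=165)
    estimates.append(estimate_ecs(f, t))
print(np.mean(estimates), np.std(estimates))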

Each of the GCMs I used has an estimated ECS ("reported ECS") from an experiment in which carbon dioxide is suddenly increased fourfold. I was using data from a historical simulation of each GCM, which uses the estimated historical forcings over the period 1850 to 2014. A major problem in this analysis is that the modelling teams do not report the forcing that they used, because the global forcing that results from applying aerosol and other emissions depends on the model and the simulation run. So, I used the same forcing series that we used to estimate our historical models. This isn't unprecedented: Marvel et al. (2018) do the same.

In general, the estimated ECS were biased down relative to the reported ECS of the GCMs, but again, the estimators that took energy balance into account seemed to do better. In a meta-analysis of the results, I compared how much the reported radiative imbalance (roughly, ocean heat uptake) from each GCM increased to how much the energy balance equation said it should have increased, using the reported temperature series, the reported ECS, and my radiative forcing series. A regression analysis showed that, for GCMs where the two matched, the estimators that took energy balance into account were unbiased, while for those where they did not match, the estimators under-estimated the ECS.
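
Concretely, with the feedback parameter λ = F_2x/ECS, where F_2x ≈ 3.7 W/m² is the forcing from a doubling of carbon dioxide, the energy balance implies that each GCM's imbalance should have increased by

ΔN = ΔF − (F_2x/ECS)·ΔT

where ΔF comes from my forcing series and ΔT and the ECS from each model's reported output; the comparison was then between this implied ΔN and the increase each model actually reported.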

These results seemed pretty nice and I submitted the paper for publication. Earlier this year, I got a revise and resubmit. But when I finally got around to working on the paper, post-lockdown and post-teaching, things began to fall apart.

First, I came across the Forster method of estimating the radiative forcing in GCMs. This uses the energy balance equation:

ΔF = λΔT + ΔN

where F is radiative forcing, T is surface temperature, and N is radiative imbalance. The deltas indicate changes since some baseline period. λ is the feedback parameter; the ECS is inversely proportional to it (ECS = F_2x/λ, where F_2x is the forcing from a doubling of carbon dioxide). Then, if we know N and T, both of which are provided in GCM output, we can find F! So, I used this to derive a forcing series specific to each GCM. The results actually looked nicer than in the originally submitted paper.
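
In code, the derivation is only a few lines (a sketch: the variable names, the 50-year baseline, and F_2x = 3.7 W/m² are my assumptions for illustration, not necessarily the choices in the paper):

import numpy as np

F_2X = 3.7  # approximate forcing from a doubling of CO2, W/m^2

def forster_forcing(tas, n_imb, ecs, baseline=50):
    # Back out radiative forcing from a GCM's historical run using
    # deltaF = lambda * deltaT + deltaN, with lambda = F_2x / ECS.
    # tas: surface temperature (K); n_imb: radiative imbalance (W/m^2);
    # ecs: the model's reported ECS (K). Changes are taken relative to
    # the mean of the first `baseline` years (an assumed choice).
    lam = F_2X / ecs
    d_t = tas - tas[:baseline].mean()
    d_n = n_imb - n_imb[:baseline].mean()
    return lam * d_t + d_n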

These are the results for the MVAR for 15 CMIP5 GCMs:

[Figure: estimated versus reported ECS for the 15 CMIP5 GCMs]
The rising line is a 45-degree line, which marks equality between reported and estimated ECS. The multicointegrating estimators were still better than the other estimators. But there wasn't any systematic variation in the degree of underestimation that would allow us to use a meta-analysis to derive an adjusted estimate of the ECS.

This is still OK. But then I read and re-read more research on the under-estimation of the ECS from historical observations. The recent consensus is that estimates from recent historical data will inevitably under-estimate the ECS, because the feedbacks change from the early stages after an increase in forcing to the later stages as a new equilibrium is reached. The effective climate sensitivity is lower at first and greater later.
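
In energy-budget terms, what a stretch of historical data identifies is the effective sensitivity

S_eff = F_2x·ΔT / (ΔF − ΔN)

and the argument in this literature is that the effective feedback parameter (ΔF − ΔN)/ΔT declines as the pattern of warming evolves, so S_eff computed over the historical period sits below the true long-run ECS.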

OK, even if we have to give up on estimating the long-run ECS, my estimates are estimates of the historical sensitivity. Aren't they? The problem is that I used the long-run ECS to derive the forcing from the energy balance equation. The long-run ECS is larger than the historical sensitivity, so the feedback parameter I plugged in was too small, and the forcing I derived is too low. I could go back to using the forcing I used previously, I guess. But now I don't believe the meta-analysis of that data is meaningful. So, I have a bunch of estimates using the wrong forcing and no way to further analyse them.

I also revisited the Monte Carlo analysis. By the way, I had an on-again, off-again coauthor through this research. He helped me a lot with understanding how to analyse the data, but he didn't like my overly bullish conclusions in the submitted paper and so withdrew his name from it. He was maybe going to get back on the revised submission.

He thought that the existing analysis, which used an MVAR to produce the simulated data, was perhaps unfairly biased in favour of the MVAR. So, I came up with a new data-generating process. Instead of starting with a forcing series, I would start with the heat content series. From that I would derive temperature, which needs to be in equilibrium with heat content, and then derive the forcing using the energy balance equation. To model the heat content, I fitted a unit root autoregressive model (a stochastic trend) to the heat content reported by the Community GCM, with the addition of a volcanic forcing explanatory variable. The stochastic trend represents anthropogenic forcing. The Community GCM is one of the 15 GCMs I was using, and it has temperature and heat content series that look a lot like the observations. I then fitted a stationary autoregressive model for temperature, with the addition of heat content as an explanatory variable. The simulation used normally distributed shocks with the same variances as in these fitted models, plus volcanic shocks.
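
In outline, the new data-generating process looks something like this (a Python sketch: all the numerical parameter values are illustrative, not the ones fitted to the Community GCM, and v is a volcanic forcing series generated as in the aside below):

import numpy as np

rng = np.random.default_rng(0)

def simulate_dgp(n, v, lam=1.2, phi=0.6, theta=0.002):
    h = np.zeros(n)  # ocean heat content: unit-root AR plus volcanic forcing
    t = np.zeros(n)  # surface temperature: stationary AR tied to heat content
    for i in range(1, n):
        # The stochastic trend in heat content stands in for anthropogenic forcing
        h[i] = h[i - 1] + 5.0 * v[i] + rng.normal(0.0, 2.0)
        # Temperature adjusts towards long-run equilibrium with the heat stock
        t[i] = phi * t[i - 1] + (1 - phi) * theta * h[i] + rng.normal(0.0, 0.1)
    n_imb = np.diff(h, prepend=0.0)  # heat uptake ~ radiative imbalance N
    f = lam * t + n_imb              # energy balance: F = lambda * T + N
    return f, t, h

# For example, with no volcanic eruptions:
# f, t, h = simulate_dgp(165, np.zeros(165))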

As an aside, the volcanic shocks were produced by a model of the form:

v_t = φ·v_{t−1} − β·rangamma(0.05)

where rangamma(0.05) are random numbers drawn from a standard gamma distribution with shape parameter 0.05, φ governs how quickly the forcing decays, and β scales the shocks. This is supposed to reproduce stratospheric sulfate aerosol radiative forcing, which decays over a few years following an eruption. Here is an example realisation:

[Figure: an example simulated volcanic forcing series alongside the historical series]
The dotted line is historical volcanic forcing and the solid line a simulated forcing. My coauthor said it looked "awesome".
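
In Python, the volcanic generator might look like this (a sketch; the decay and scale values are illustrative, not the fitted ones):

import numpy as np

rng = np.random.default_rng(1)

def volcanic_forcing(n, decay=0.8, scale=6.0, shape=0.05):
    # AR(1) decay of the aerosol forcing, hit by occasional negative
    # shocks drawn from a standard gamma distribution. With shape 0.05,
    # most draws are near zero but a few are large, which produces the
    # spiky eruption pattern seen in the figure.
    v = np.zeros(n)
    for t in range(1, n):
        v[t] = decay * v[t - 1] - scale * rng.gamma(shape)
    return v

v = volcanic_forcing(165)  # one realisation of 1850-2014 length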

So, again, I produced two sets of 2,000 datasets, one with a sample size of 165 and one with a sample size of 1,000. Now, even in the smaller sample, all four estimators I was testing produced essentially identical and unbiased results! I ran this yesterday, and so our Monte Carlo result disappears. I can't see anything unreasonable about this data-generating process, yet it produces completely different results from the one in the submitted paper, and I don't see anything that justifies one over the other. This was the point where I gave up on the project.

My coauthor, who is based in Europe, is on vacation. Maybe he'll see a way to save it when he comes back, but I am sceptical.