Monday, May 23, 2016

Should We Test for Cointegration Using the Johansen Procedure If We Want to Estimate a Single Equation Static Regression?

A student from Cuba asked me:

"I want to apply the DOLS methodology... I have read several books and research works about DOLS but none of them explain clearly how to test cointegration in this case.... I asked some professors about this issue and one of them told me that I should apply the Johansen cointegration test."

It's quite easy to find papers that do this - first test for cointegration using the Johansen procedure, report only the cointegration test statistics, and if they can be used to reject the null hypothesis of non-cointegration then use some other method such as Dynamic Ordinary Least Squares (DOLS) to estimate a static single equation regression model. These researchers aren't actually interested in the complete vector autoregression (VAR) system, which is OK. I've reviewed quite a lot of papers that use this approach.

If your model has more than two variables (that is, more than one dependent variable and one explanatory variable) then this is a very bad idea. The cointegration test statistics from the Johansen procedure, even when they reject the null, say nothing about the cointegration properties of your single equation regression model.

The following simple example shows why. Imagine we have three variables, X1, X2, and X3 with the following "data generation process":
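Written out to match the description below (the intercept-free form and the subscript notation are assumptions):

$$
\begin{aligned}
X_{1t} &= \beta_1 X_{2t} + \varepsilon_{1t} \\
X_{2t} &= X_{2,t-1} + \varepsilon_{2t} \\
X_{3t} &= X_{3,t-1} + \varepsilon_{3t}
\end{aligned}
$$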


where epsilon 1 is a stationary stochastic process and epsilon 2 and 3 are simply white noise. Variables X2 and X3 follow simple random walks. Variable X1 cointegrates with X2. But X3 is a random walk that has nothing to do with the other two variables. If you estimate a VAR with these variables and do the Johansen cointegration test, you should expect to find that there is one cointegrating vector. But the following regression:
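(In assumed notation: only gamma3 is named in the text; the intercept gamma1, the coefficient gamma2, and the error term u are placeholders.)

$$
X_{1t} = \gamma_1 + \gamma_2 X_{2t} + \gamma_3 X_{3t} + u_t
$$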


will not cointegrate. It is a spurious regression because it includes X3 which is an unrelated random walk. We cannot rely on finding that the VAR "cointegrates" to assume that this regression also cointegrates. Only X1 and X2 cointegrate in this example. Of course, it is possible that X1, X2, and X3 are jointly cointegrated but as this example shows, that doesn't have to be the case.
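For readers who want to experiment, here is a minimal sketch of simulating this kind of data generation process using only the Python standard library. The function name `simulate_dgp`, the value beta1 = 1, the AR(1) form for epsilon 1 with coefficient 0.5, and the unit variances are all assumptions made for illustration, not values from the example above.

```python
import random


def simulate_dgp(n_obs=200, beta1=1.0, rho=0.5, seed=0):
    """Simulate X1, X2, X3 (all parameter values are illustrative assumptions).

    X2 and X3 are independent random walks driven by white noise.
    X1 = beta1 * X2 + eps1, where eps1 is a stationary AR(1) process.
    """
    rng = random.Random(seed)
    x1, x2, x3 = [], [], []
    level2 = level3 = eps1 = 0.0
    for _ in range(n_obs):
        level2 += rng.gauss(0.0, 1.0)            # random walk X2
        level3 += rng.gauss(0.0, 1.0)            # unrelated random walk X3
        eps1 = rho * eps1 + rng.gauss(0.0, 1.0)  # stationary deviation
        x1.append(beta1 * level2 + eps1)
        x2.append(level2)
        x3.append(level3)
    return x1, x2, x3


x1, x2, x3 = simulate_dgp()
# The equilibrium error X1 - beta1 * X2 is just eps1, so it stays bounded
# in variance even though X1 and X2 themselves wander.
equilibrium_error = [a - 1.0 * b for a, b in zip(x1, x2)]
```

The three simulated series can then be passed to whichever package you use for the Johansen test and for the static regression, to compare what each procedure reports.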

How can we avoid this? The cointegrating vector in this case is [1, -beta1]. We could test within the Johansen procedure whether we can restrict the cointegrating vector to not include a coefficient for X3. Unlike gamma3 in the static regression, if X3 does not belong in the cointegrating relationship, then this coefficient is expected to be zero. We can and should also test the residuals of the static regression to see if they are stationary, which is what it means for that regression to cointegrate.

Tuesday, May 17, 2016

Stochastic Trend Included in Top 100 Economics Blogs!


I'm honored that Stochastic Trend has made it into a list of the top 100 economics blogs, albeit at position 99. It's a good list of possible blogs to follow.

P.S. 18 December 2016

Also see this list.

Friday, May 13, 2016

"Replicating" the Climate Contest

My previous post discussed Doug Keenan's climate contest. I wondered how accurate we could actually expect to be in such a situation. I assume that the temperature series is a simple random walk, possibly with a constant drift term. We want to see how accurately we can determine whether there is a drift term in the random walk or not.

So, again just using Excel, I created 1000 series of 134 observations each distributed as Normal(mu, 0.11), where mu is the drift term and 0.11 is the standard deviation. For 250 series I set mu to 0.01, for 250 series I set it to -0.01 and for 500 to 0. I then computed the usual t-test for the significance of the sample mean for each series.

Only 127 t-tests were significant at the 5% level and 201 at the 10% level. Using a 10% significance level, statistical power - correct rejection of the incorrect null hypothesis of no drift - is 29%. Using a 5% significance level, power is 20%. There is no distortion of the actual "size" of the test - the rate of incorrect rejections of the true null.

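So, combining this information, if you use this method and a 10% significance level you will correctly classify about 145 of the 500 series with a drift (29% power) and 450 of the 500 series without a drift (90% of true nulls not rejected). That gives 595 correct classifications of whether a random walk has a drift or does not have a drift, which is far below the 900 required to win the contest.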
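The experiment is easy to repeat outside Excel. The following is a rough sketch in standard-library Python, not my original spreadsheet: the function names, the seed, and the critical value 1.656 (an approximate two-sided 10% value for a t distribution with 133 degrees of freedom) are assumptions.

```python
import random
from math import sqrt
from statistics import mean, stdev

# Approximate two-sided 10% critical value for t with 133 degrees of freedom.
T_CRIT_10PCT = 1.656


def has_significant_drift(increments, t_crit=T_CRIT_10PCT):
    """Usual t-test of whether the sample mean differs from zero."""
    t_stat = mean(increments) / (stdev(increments) / sqrt(len(increments)))
    return abs(t_stat) > t_crit


def run_contest(n_obs=134, sd=0.11, seed=1):
    """Count correct drift / no-drift classifications out of 1000 series."""
    rng = random.Random(seed)
    drifts = [0.01] * 250 + [-0.01] * 250 + [0.0] * 500
    correct = 0
    for mu in drifts:
        increments = [rng.gauss(mu, sd) for _ in range(n_obs)]
        # Correct if we reject exactly when a drift is really present.
        if has_significant_drift(increments) == (mu != 0.0):
            correct += 1
    return correct


correct = run_contest()
print(f"{correct} of 1000 series classified correctly")
```

The exact count changes with the seed, but it should land in the neighbourhood of the figure above and nowhere near 900.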

Of course, it seems that Keenan's data is a bit more complicated than this and may or may not have any relevance to the actual nature of climate data or the nature of the climate change problem.

You can download my data here. The first column is the drift term used and the first row indicates the years and the statistics columns.

More Mathiness in Climate Econometrics: Doug Keenan's Climate Change Contest

My colleague Robert Kaufmann got an e-mail from Doug Keenan inviting him to participate in his "climate change contest" without the usual $10 submission fee. I hadn't heard about this contest and went to the site to investigate. So, Keenan has produced 1000 time series of 135 observations each that are somehow derived from random numbers and then added a trend of plus 1 or minus 1 per 100 observations to some of these. The series have been calibrated so that they could potentially reproduce in some way the observed global temperature time series from 1880 to the present without an added trend. The task of the contestant is to determine for each series whether it has an added trend or not. If any submission gets 90% of these or more right by 30th November, this year, that submission will win $100,000.

Keenan's idea is that no-one can validly detect with 90% accuracy whether there is a trend in temperature or not. Therefore, the IPCC's claim that temperature has definitely increased over the last century and it is very likely that this is due to human activity must be wrong.

I downloaded the data. Looking at some of the series it's pretty clear that they are some sort of random walk (stochastic trend). They are not simply series of random numbers (white noise) with a linear trend added. I haven't bothered to write a program to test this. Assuming that they are simple random walks, I tested in Excel whether the mean of the first differences was different to zero for each of the thousand series. Only 8 of the series have a mean first difference that is significantly different to zero at the 5% level using the standard calculation of the standard error of the mean, which assumes that the first differences are white noise. If they were normally distributed white noise and none of the original series had an added trend then we would expect about 50 of the means of the first differences to be significantly different to zero by the definition of statistical significance. So, something else seems to be going on here. I expect that statistical power to detect a non-zero drift term of 0.01 or -0.01 when the standard deviation of the first differences is 0.11 is in any case rather low. Perhaps we could use structural time series methods, but statistical power of 90% at a significance level of 10% is a lot to ask for in this situation. I created my own dataset to see how many series one could expect to correctly classify - statistical power using a simple data generating process and a simple test was 29% for a 10% significance level test. This means that we can only correctly classify 595 of the 1000 series.
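The low power can also be checked analytically. This sketch uses a normal approximation to the t-test (an assumption that is reasonable with 134 first differences); the function name `approx_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(drift, sd, n_obs, alpha):
    """Approximate power of a two-sided test that the mean is zero.

    Uses the normal approximation: the test statistic is shifted by
    drift / (sd / sqrt(n_obs)) when the true drift is non-zero.
    """
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(drift) / (sd / sqrt(n_obs))
    # Probability of landing in either rejection region.
    return (1 - z.cdf(crit - shift)) + z.cdf(-crit - shift)


for alpha in (0.10, 0.05):
    power = approx_power(drift=0.01, sd=0.11, n_obs=134, alpha=alpha)
    print(f"alpha = {alpha:.2f}: approximate power = {power:.2f}")
```

Under these assumptions the approximate power comes out close to the simulated values, which suggests that the low power reflects how small the drift is relative to the noise and is not a quirk of one particular set of random draws.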

The real question to ask is whether Keenan's thought experiment makes sense. I would argue that it doesn't. His argument is that if temperature follows some kind of integrated process then it is very hard to determine whether it has a drift component or will sooner or later just stochastically trend down again. Therefore, we can't know if temperature has statistically significantly increased or not. But theory and climate models predict that global temperature should be stationary if radiative forcing is constant. If we detect a random walk or a close to random walk signal in the temperature data then something else is happening. Research can then try to determine if it is likely to be due to anthropogenic factors or not. It is possible that we make a type 1 error - falsely rejecting the null hypothesis - but we can determine how likely that is. So, in my opinion, Keenan's contest is another case of mathiness in climate econometrics.