Recently, Stephan Bruns published a paper with John Ioannidis in PLoS ONE critiquing the p-curve, which I've blogged about previously. Their argument is that the p-curve cannot distinguish "true effects" from "null effects" in the presence of omitted-variables bias. Simonsohn et al., the originators of the p-curve, have responded on their blog, which I have added to the blogroll here. They say that, of course, the p-curve cannot distinguish between causal effects and other effects, but it can distinguish between "false positives", which are non-replicable effects, and "replicable effects", which include both "confounded effects" (correlation but not causation) and "causal effects". Bruns and Ioannidis have responded to this comment too.
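To make the p-curve's claim concrete, here is a minimal sketch of its right-skew test, assuming the Stouffer-style aggregation of "pp-values" used in later versions of the p-curve app; the function name and the simulated p-values are mine, purely for illustration:

```python
# Minimal sketch of the p-curve's right-skew test. Each significant
# p-value is rescaled to a "pp-value" that is uniform on (0, 1) when
# every underlying effect is null; a pile-up of pp-values near zero
# (a right-skewed p-curve) yields a significantly negative Stouffer Z.
import numpy as np
from scipy import stats

def p_curve_right_skew(p_values, alpha=0.05):
    """Return the Stouffer Z and its one-sided p-value for right skew."""
    p = np.asarray(p_values, dtype=float)
    pp = p[p < alpha] / alpha                      # conditional pp-values
    z = stats.norm.ppf(pp).sum() / np.sqrt(len(pp))
    return z, stats.norm.cdf(z)                    # small p => "evidential value"

rng = np.random.default_rng(0)
print(p_curve_right_skew(rng.uniform(0, 0.05, 100)))  # flat curve: no evidence
print(p_curve_right_skew(rng.uniform(0, 0.01, 100)))  # right-skewed: "true effect"
```

The point of the Bruns and Ioannidis critique is that passing this test only establishes replicability, not causality.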
In my previous blogpost on the p-curve, I showed that the Granger causality tests we meta-analysed in our 2014 Energy Journal paper form a right-skewed p-curve. This would mean that there is a "true effect" according to the p-curve methodology. However, our meta-regression analysis, in which we regressed the test statistics on the square root of degrees of freedom of the underlying regressions, showed no "genuine effect". Now I understand what is going on. The large number of highly significant results in the Granger causality meta-dataset is generated by "overfitting bias". This result is "replicable": if we fit VAR models to more such short time series, we will again get large numbers of significant results. However, the regression analysis shows that this result is bogus, because the p-values are not negatively correlated with degrees of freedom, as they would be if there were a genuine effect, since the power of the tests, and hence the test statistics, would then grow with sample size. Therefore, the power-trace meta-regression is a superior method to the p-curve. In addition, we can modify this regression model to account for omitted-variables bias by adding dummy variables and interaction terms (as we do in our paper). This can help to identify a causal effect. Of course, if no researchers actually estimate the true causal model, then this method cannot identify the causal effect either. But there are always limits to our ability to be sure of causality, and meta-regression can help rule out some cases of confounded effects.
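To illustrate the logic, here is a schematic simulation; it is not the code or data from our paper, all settings and names are illustrative, and it does not reproduce the size of the overfitting bias in the real meta-dataset. Each simulated "study" fits a deliberately generous VAR(4) to two short, unrelated AR(1) series, and the power-trace meta-regression then regresses a probit-transformed p-value on the square root of degrees of freedom:

```python
# Schematic power-trace simulation: with a genuine effect, test
# statistics should grow with sqrt(df); with no genuine effect, the
# meta-regression slope on sqrt(df) should be near zero, however many
# individually "significant" results the studies produce.
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.api import VAR
from scipy import stats

rng = np.random.default_rng(1)
z_stats, root_df = [], []

for _ in range(300):                             # 300 simulated "studies"
    T = int(rng.integers(30, 80))                # short samples
    x, y = np.zeros(T), np.zeros(T)
    for t in range(1, T):                        # two unrelated AR(1) series
        x[t] = 0.5 * x[t - 1] + rng.normal()
        y[t] = 0.5 * y[t - 1] + rng.normal()
    res = VAR(np.column_stack([y, x])).fit(4)    # deliberately generous lag length
    test = res.test_causality(0, [1], kind='f')  # does x Granger-cause y?
    z_stats.append(stats.norm.isf(test.pvalue))  # probit-transform the p-value
    root_df.append(np.sqrt(T - 13))              # per-equation residual df

# Power-trace meta-regression: a genuine effect implies a positive slope.
meta = sm.OLS(np.array(z_stats), sm.add_constant(np.array(root_df))).fit()
print(meta.params, meta.pvalues)                 # slope near zero: no genuine effect
```

The dummy variables and interaction terms mentioned above would simply enter this regression as extra columns in the design matrix.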
So, to sum up, there are the following dichotomies:
- Replicable vs. non-replicable effects - the p-curve can be used here.
- True or genuine effect (a correlation in the data-generating process) vs. false positive - the meta-regression model is more likely to give the correct inference.*
- Causal vs. confounded effect - the extended meta-regression model can rule out some confounded effects.
* In the case of the unit-root spurious regressions mentioned in Bruns and Ioannidis' response, things are a bit more complicated. In a bivariate spurious regression where both variables drift in the same direction, it is likely that Stanley's FAT-PET and similar methods will show that there is a true effect: even though there is no relationship at all between the two variables, the nature of the data-generating process for each means that they will be correlated. Where there is no drift, or the direction of drift varies randomly, there should be equal numbers of positive and negative t-statistics in the underlying studies and no relationship between the value of the t-statistic and degrees of freedom, though there is a relationship between the absolute value of the t-statistic and degrees of freedom. Here meta-regression does better than the p-curve; the sketch below illustrates the driftless case. I'm not sure whether the meta-regression model in our Energy Journal paper might be fooled by Granger causality tests in levels of unrelated unit-root variables. These would likely be spuriously significant, but their significance might not rise strongly with sample size.
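Here is a small sketch of that driftless case, again with everything in it purely illustrative: regress one independent driftless random walk on another at varying sample sizes and compare how the signed and absolute t-statistics relate to the square root of degrees of freedom.

```python
# Driftless spurious regression: the slope t-statistic diverges with
# sample size (Phillips, 1986), but its sign is random, so the signed t
# is uncorrelated with sqrt(df) while |t| rises with it.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
t_vals, root_df = [], []

for _ in range(500):                          # 500 simulated "studies"
    T = int(rng.integers(30, 500))
    x = rng.normal(size=T).cumsum()           # two independent driftless random walks
    y = rng.normal(size=T).cumsum()
    fit = sm.OLS(y, sm.add_constant(x)).fit()
    t_vals.append(fit.tvalues[1])             # slope t-statistic
    root_df.append(np.sqrt(T - 2))

t_vals, root_df = np.array(t_vals), np.array(root_df)
print(np.corrcoef(t_vals, root_df)[0, 1])          # near zero: sign is random
print(np.corrcoef(np.abs(t_vals), root_df)[0, 1])  # positive: |t| grows with df
```

With a common drift in both series, the t-statistics would instead be systematically positive, which is why FAT-PET and similar methods would read such a literature as showing a true effect.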