Tag Archives: Power of test

On the Credibility of the ‘Credibility Revolution’

Kevin Lang argues in NBER working paper 31666:

When economists analyze a well-conducted RCT or natural experiment and find a statistically significant effect, they conclude the null of no effect is unlikely to be true. But how frequently is this conclusion warranted? The answer depends on the proportion of tested nulls that are true and the power of the tests. I model the distribution of t-statistics in leading economics journals. Using my preferred model, 65% of narrowly rejected null hypotheses and 41% of all rejected null hypotheses with |t|<10 are likely to be false rejections. For the null to have only a .05 probability of being true requires a t of 5.48.
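The dependence on the proportion of true nulls and on power follows from Bayes' rule. Below is a minimal sketch in Python, not Lang's model: the function name and the input values are illustrative assumptions, and it treats a single test with a fixed significance level and fixed power rather than the distribution of t-statistics that the paper estimates.

```python
def false_rejection_share(prior_true_null, power, alpha=0.05):
    """Probability that a rejected null is in fact true, by Bayes' rule.

    prior_true_null: assumed share of tested nulls that are true
    power:           assumed probability of rejecting a false null
    alpha:           significance level, i.e. probability of rejecting a true null
    """
    if not (0.0 <= prior_true_null <= 1.0):
        raise ValueError("prior_true_null must lie in [0, 1]")
    if not (0.0 <= power <= 1.0):
        raise ValueError("power must lie in [0, 1]")
    if not (0.0 < alpha < 1.0):
        raise ValueError("alpha must lie in (0, 1)")

    false_rejections = alpha * prior_true_null        # true nulls that get rejected
    true_rejections = power * (1.0 - prior_true_null)  # false nulls that get rejected
    total = false_rejections + true_rejections
    if total == 0.0:
        raise ValueError("no rejections are possible with these inputs")
    return false_rejections / total


if __name__ == "__main__":
    # Illustrative inputs only; they are not estimates from the paper.
    for prior, power in [(0.5, 0.8), (0.5, 0.2), (0.9, 0.2)]:
        share = false_rejection_share(prior, power)
        print(f"prior={prior:.1f} power={power:.1f} -> P(null true | rejected)={share:.3f}")
```

With half of all nulls true, lowering power from 0.8 to 0.2 raises the share of false rejections from about 6% to 20%. If 90% of nulls are true, it is about 69% at power 0.2.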

Doubts about Empirical Research

The Economist reports on research by Paul Smaldino and Richard McElreath indicating that studies in psychology, neuroscience and medicine have low statistical power (the probability of correctly rejecting a false null hypothesis). If almost all published studies nevertheless contain significant results (i.e., rejections of null hypotheses), this is suspicious: with low power, most studies should fail to reject even when the effect is real.
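To make the notion of power concrete, here is a small sketch for the textbook case of a two-sided z-test. It is an illustration under assumed inputs, not a calculation from Smaldino and McElreath: the function name and the standardized effect sizes (true effect divided by its standard error) are hypothetical.

```python
from statistics import NormalDist


def two_sided_z_power(standardized_effect, alpha=0.05):
    """Power of a two-sided z-test.

    standardized_effect: true effect divided by the standard error of its
                         estimate (an assumed input, unknown in practice)
    alpha:               significance level of the test
    """
    if not (0.0 < alpha < 1.0):
        raise ValueError("alpha must lie in (0, 1)")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    critical = std_normal.inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    # The test rejects when the z-statistic falls in either tail.
    upper_tail = std_normal.cdf(standardized_effect - critical)
    lower_tail = std_normal.cdf(-standardized_effect - critical)
    return upper_tail + lower_tail


if __name__ == "__main__":
    for effect in (0.0, 1.0, 2.8):
        print(f"standardized effect {effect:.1f}: power = {two_sided_z_power(effect):.3f}")
```

With no true effect, the rejection probability equals the significance level. A true effect of one standard error is detected in fewer than one in five studies, and it takes roughly 2.8 standard errors to reach the conventional 80% power.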

Furthermore, Smaldino and McElreath’s research suggests that

the process of replication, by which published results are tested anew, is incapable of correcting the situation no matter how rigorously it is pursued.

With the help of a model of competing research labs, Smaldino and McElreath simulate how empirical scientific research progresses. Labs that find more new results also tend to produce more false positives. More careful labs try to rule out false positives but publish less. More “successful” labs are allowed to reproduce, passing their methods on to new labs. As a consequence, less careful labs proliferate. Replication (the repetition of randomly selected findings) does not stop this process.

poor methods still won—albeit more slowly. This was true in even the most punitive version of the model, in which labs received a penalty 100 times the value of the original “pay-off” for a result that failed to replicate, and replication rates were high (half of all results were subject to replication efforts).

Smaldino and McElreath conclude that “top-performing laboratories will always be those who are able to cut corners”—even in a world with frequent replication. The Economist concludes that

[u]ltimately, therefore, the way to end the proliferation of bad science is not to nag people to behave better, or even to encourage replication, but for universities and funding agencies to stop rewarding researchers who publish copiously over those who publish fewer, but perhaps higher-quality papers.