Models Make Economics A Science

In the Journal of Economic Literature, Ariel Rubinstein discusses Dani Rodrik’s “superb” book “Economics Rules.” The article nicely articulates what economics, and specifically economic modeling, is about. Some quotes (emphasis my own) …

… on the nature of economics:

[A] quote … by John Maynard Keynes to Roy Harrod in 1938: “It seems to me that economics is a branch of logic, a way of thinking”; “Economics is a science of thinking in terms of models joined to the art of choosing models which are relevant to the contemporary world.”

[Rodrik] … declares: “Models make economics a science” … He rejects … the … common justification given by economists for calling economics a science: “It’s a science because we work with the scientific method: we build hypotheses and then test them. When a theory fails the test, we discard it and either replace it or come up with an improved version.” Dani’s response: “This is a nice story, but it bears little relationship to what economists do in practice …”

… on models, forecasts, and tests:

A good model is, for me, a good story about an interaction between human beings …

A story is not a tool for making predictions. At best, it can help us realize that a particular outcome is possible or that some element might be critical in obtaining a particular result. … Personally, I don’t have any urge to predict anything. I dread the moment (which will hopefully never arrive) when academics, and therefore also governments and corporations, will be able to predict human behavior with any accuracy.

A story is not meant to be “useful” in the sense that most people use the word. I view economics as useful in the sense that Chekhov’s stories are useful—it inspires new ideas and clarifies situations and concepts. … [Rodrik] is aware … “Mischief occurs when economists begin to treat a model as the model. Then the narrative takes on a life of its own and becomes dislodged from the setting that produced it. It turns into an all-purpose explanation that obscures alternative, and potentially more useful, story lines”.

A story is not testable. But when we read a story, we ask ourselves whether it has any connection to reality. In doing so, we are essentially trying to assess whether the basic scenario of the story is a reasonable one, rather than whether the end of the story rings true. … Similarly, … testing an economic model should be focused on its assumptions, rather than its predictions. On this point, I am in agreement with Economics Rules: “. . . what matters to the empirical relevance of a model is the realism of its critical assumptions”.

… on facts:

The big “problem” with interpreting data collected from experiments, whether in the field or in the lab, is that the researchers themselves are subject to the profession’s incentive system. The standard statistical tests capture some aspects of randomness in the results, but not the uncertainty regarding such things as the purity of the experiment, the procedure used to collect the data, the reliability of the researchers, and the differences in how the experiment was perceived between the researcher and the subjects. These problems, whether they are the result of intentional sleight of hand or the natural tendency of researchers to ignore inconvenient data, make me somewhat skeptical about “economic facts.”

Self-Correcting Research?

The Economist doubts that science is self-correcting as “many more dodgy results are published than are subsequently corrected or withdrawn.”

Referees do a bad job. Publishing pressure leads researchers to publish their (correct and incorrect) results multiple times. Replication studies are hard and thankless. And everyone seems to be getting the statistics wrong.

A researcher commits a type I error when she rejects a hypothesis although it is true (a false positive), and a type II error when she accepts a hypothesis although it is false (a false negative). A good testing procedure minimises the type II error for a specified type I error; that is, it maximises the power of the test. While employing a test with a power of 80% is considered good practice, actual hypothesis testing often suffers from much lower power. As a consequence, many or even a majority of the apparent “results” identified by a test might be wrong, while most of the “non-results” are correctly identified. Quoting from the article:

… consider 1,000 hypotheses being tested of which just 100 are true (see chart). Studies with a power of 0.8 will find 80 of them, missing 20 because of false negatives. Of the 900 hypotheses that are wrong, 5%—that is, 45 of them—will look right because of type I errors. Add the false positives to the 80 true positives and you have 125 positive results, fully a third of which are specious. If you dropped the statistical power from 0.8 to 0.4, which would seem realistic for many fields, you would still have 45 false positives but only 40 true positives. More than half your positive results would be wrong.
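The arithmetic in the quote generalises readily. Here is a minimal Python sketch that reproduces The Economist’s numbers; the function name and structure are my own illustration, not from the article:

```python
# Expected share of "positive" results that are false, given the number of
# hypotheses, how many are actually true, the significance level alpha
# (the type I error rate), and the power of the test (1 - type II error rate).
def false_discovery_rate(n_hypotheses, n_true, alpha, power):
    true_positives = power * n_true                    # real effects detected
    false_positives = alpha * (n_hypotheses - n_true)  # nulls flagged by chance
    return false_positives / (true_positives + false_positives)

# 1,000 hypotheses, 100 of them true, alpha = 5%:
print(false_discovery_rate(1000, 100, 0.05, 0.8))  # 45/125 = 0.36, "fully a third"
print(false_discovery_rate(1000, 100, 0.05, 0.4))  # 45/85 ≈ 0.53, "more than half"
```

The point of the exercise: lowering power leaves the number of false positives untouched but shrinks the number of true positives, so the share of spurious findings among published “results” rises even though the significance threshold never changed.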