Rigorous Evidence Isn’t

written by Lant Pritchett

Currently, there are many statements floating around in development about the use of “rigorous evidence” in formulating policies and programs. Nearly all of these claims are fatuous. The problem is, rigorous evidence isn’t.

That is, suppose one generates some evidence about the impact of some programmatic or policy intervention in one particular context that is agreed by all to be “rigorous” because it meets methodological criteria for internal validity of its causal claims. But the instant this evidence is used in formulating policy it isn’t rigorous evidence any more. Evidence would be “rigorous” about predicting the future impact of the adoption of a policy only if the conditions under which the policy was to be implemented were exactly the same in every relevant dimension as that under which the “rigorous” evidence was generated. But that can never be so because neither economics—nor any other social science—have theoretically sound and empirically validated invariance laws that specify what “exactly the same” conditions would be.

So most uses of rigorous evidence aren’t. Take, for instance, the justly famous 2007 JPE paper by Ben Olken on the impact of certain types of monitoring on certain types of corruption. According to Google Scholar as of today, this paper has been cited 637 times. The question is, for how many of the uses of this “rigorous evidence” is it really “rigorous evidence”? We (well, my assistant) sampled 50 of the citing papers with 57 unique mentions of Olken (2007). Only 8 of those papers were about Indonesia (Of course even those 8 are only even arguably “rigorous” applications as they might be about different programs or different mechanisms or different contexts.) 47 of the 57 (82%) of the mentions are neither about Indonesia nor even an East Asia or Pacific country—they might be a review of the literature about corruption in general, about another country, or methodological. We also tracked whether the words “context” or “external validity” appeared within +/- two paragraphs of the mention. In 34 of the 57 (60%) mentions, the evidence was not about Indonesia and did not mention that the results, while “rigorous” for the time, place and programmatic/policy context, have no claim to be rigorous about any other time, place, or programmatic/policy context.

Another justly famous paper, Angrist and Lavy (1999) in the QJE uses regression discontinuity to identify the impact of class size on student achievement in Israel. This paper has been cited 1244 times. I looked through the first 150 citations to this paper (which Google Scholar sorts by the number of times the citing paper has itself been cited) and (other than other papers by the authors) not one mentioned Israel (not that surprisingly, as Israel is a tiny country) in the title or abstract while China, India, Bangladesh, Cambodia, Bolivia, UK, Wales, USA (various states and cities), Kenya and South Africa all figured. Angrist and Lavy do not, and do not claim to, provide “rigorous” evidence about any of those contexts.

If one is formulating policies or programs for attacking corruption in highway procurement in Peru or reducing class size in secondary school in Thailand, it is impossible to base those policies on “rigorous evidence” as evidence that is rigorous for Indonesia or Israel isn’t rigorous for these other countries.

Now, some might make the argument that formulation of policies or programs in context X should rely exclusively/primarily/preferentially on evidence that is “rigorous” in context Z because at least we know that in context Z in which it was generated the evidence is internally valid. This is both fatuous and false as a general proposition.

Fatuous in that no one understands the phrase “policy based on rigorous evidence” to mean “policy based on evidence that isn’t rigorous with respect to the actual policy context to which it is being applied (because there are no rigorous claims to external validity) but rather based on evidence that is rigorous in some other context.” No one understands it that way because that isn’t rigorous evidence.

It is also false as a general proposition. It is easy to construct plausible empirical examples in which the evidence suggests that the bias from internal validity is much (much) smaller than the bias from external validity as the contextual variation in “true” impact is much larger than the methodological bias from lack of “clean” causal identification of simple methods. In these instances, better policy is made using “bad” (e.g. not internally valid) evidence from the same context than “rigorous” evidence from another context (e.g. Pritchett and Sandefur 2013).

Sadly perhaps, there is no shortcut around using judgment and wisdom in assessing all of the available evidence in formulating policies and programs. Slogans like “rigorous evidence” are an abuse, not a use, of social science.

Leave a Reply Cancel reply