Most Charities Shouldn’t Evaluate Their Work, Part One: Why Not?

This two-part series first appeared in Stanford Social Innovation Review.

Most “evaluations” of charities’ work are done by the charities themselves and are a waste of time. Perhaps that is a surprising view from someone who argues that charitable work should be based on evidence. But it is precisely because charitable activity should rest on good-quality, robust evidence that self-evaluation falls short: that kind of evidence isn’t what many charities can reasonably be expected to produce.

What is evaluation?

Before we get into why this is true, let’s get clear about what evaluation is, and what it isn’t.

The effect of a charity’s work depends on the quality of the idea (the intervention) it uses and on how well it implements that idea. Both need to be good for the impact to be high; if either is weak, the impact will be low. Think of it as:

impact = idea × implementation
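
A minimal sketch of why the formula multiplies rather than adds. It assumes, purely hypothetically, that idea and implementation can each be scored from 0 to 1; the formula is a way of thinking rather than a measurement, so the scores and the function name `impact` are illustrative only.

```python
# Illustrative only: "idea" and "implementation" are hypothetical quality
# scores on a 0-1 scale; the formula is a metaphor, not a real measure.
def impact(idea: float, implementation: float) -> float:
    """Multiplicative model: impact = idea x implementation."""
    for name, value in (("idea", idea), ("implementation", implementation)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return idea * implementation


cases = {
    "good idea, good implementation": (0.9, 0.9),
    "good idea, poor implementation": (0.9, 0.1),
    "poor idea, good implementation": (0.1, 0.9),
}

for label, (idea, impl) in cases.items():
    average = (idea + impl) / 2  # an additive model, shown for contrast
    print(f"{label}: multiplicative {impact(idea, impl):.2f}, average {average:.2f}")
```

Averaging would give both lopsided cases a middling 0.5, hiding the weak link; multiplying gives them 0.09, so a poor score on either factor drags the whole result down.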

To illustrate the difference, consider a breakfast club in a school for disadvantaged children. The idea is that a decent breakfast aids learning by avoiding the distractions of hunger. The implementation involves having foods that the children will eat, buying them at a good price, getting children to show up, and so on.

Assessing the quality of implementation is relatively easy. Do children come? What do they say about the breakfast club? How much of the food gets wasted? This is monitoring. It’s vital: by giving staff rapid feedback, it enables the organization to improve its processes, sometimes dramatically (as Bill Gates discussed in his annual letter). This monitoring (or “process evaluation”) should almost always happen, and the charity can normally do it itself.

Notice, however, that monitoring looks only at the performance of that one organization, so it will never (on its own) tell you whether a charity is good relative to other places you might put your money—that is, whether funding that charity is a good idea.

Assessing the quality of the idea is rather harder. That involves investigating whether a decent breakfast actually does aid learning. And that requires isolating the effect of the intervention from other extraneous factors. This is impact evaluation, and “evaluation is distinguished from monitoring by a serious attempt to establish attribution,” says Michael Kell, chief economist at the (UK) National Audit Office.

It’s hard. In our example, we can’t simply look at whether the children who attend breakfast club are now learning better than before: perhaps the club’s launch coincided with a new teacher arriving, or better books, or the children suddenly watching more Sesame Street. (Many charities do run such pre/post studies, comparing outcomes before and after their work starts, but because those studies cannot rule out factors like these, they prove little if anything.)
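
To make the pre/post problem concrete, here is a small simulation. Every number in it is a hypothetical assumption: the club is given a true effect of 2 points on a test score, and a new teacher who arrives at the same time adds 5 points for everyone. A before/after comparison of attendees has no way to separate the two.

```python
import random
import statistics

# Hypothetical values, chosen only to illustrate the confound.
TRUE_CLUB_EFFECT = 2.0    # what breakfast club really adds to test scores
NEW_TEACHER_EFFECT = 5.0  # a coincidental change that starts at the same time


def pre_post_estimate(n_children: int = 1000, seed: int = 1) -> float:
    """Average before/after change in scores for children who attend the club."""
    rng = random.Random(seed)
    gains = []
    for _ in range(n_children):
        before = rng.gauss(50, 10)
        # Both the club and the new teacher act between the two tests,
        # plus ordinary term-to-term noise.
        after = before + TRUE_CLUB_EFFECT + NEW_TEACHER_EFFECT + rng.gauss(0, 3)
        gains.append(after - before)
    return statistics.mean(gains)


estimate = pre_post_estimate()
print(f"pre/post estimate: {estimate:.1f} (true club effect: {TRUE_CLUB_EFFECT})")
```

The estimate lands near 7, the sum of both effects, and nothing in the before/after data says how much of that belongs to the club.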

Neither can we look at whether children in breakfast club learn faster than those who don’t attend, because it’s highly likely that there will be major differences between those who come and those who don’t: perhaps only the worst learners attend. To get around that, we’d have to run a randomised controlled trial. That throws up complicated questions, such as whether to randomise children, schools, or towns, and how large a sample to use. Even then we’re not out of the woods. We might have to deal with “spill-over effects” (benefits to children who don’t go to the club; for example, those who do go may be less disruptive in class) and “cross-over effects” (such as children who attend the club giving food to children who don’t). These, and many others, are routine questions in such research.
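
The selection problem, and why randomisation addresses it, can be sketched with another hypothetical simulation using illustrative numbers only. Each child has an unobserved prior attainment score. In the self-selected version, weaker learners are assumed to be four times as likely to attend (80 versus 20 percent); in the randomised version, a coin flip decides who gets a place. The true club effect is again set at 2 points.

```python
import random
import statistics

TRUE_CLUB_EFFECT = 2.0  # hypothetical benefit of attending breakfast club


def mean_difference(children):
    """Mean score of attendees minus mean score of non-attendees."""
    attended = [score for went, score in children if went]
    stayed_away = [score for went, score in children if not went]
    return statistics.mean(attended) - statistics.mean(stayed_away)


def simulate(randomise: bool, n_children: int = 10_000, seed: int = 7) -> float:
    rng = random.Random(seed)
    children = []
    for _ in range(n_children):
        ability = rng.gauss(50, 10)  # unobserved prior attainment
        if randomise:
            went = rng.random() < 0.5  # a coin flip decides who gets a place
        else:
            # Self-selection (assumed): weaker learners are likelier to turn up.
            went = rng.random() < (0.8 if ability < 50 else 0.2)
        score = ability + (TRUE_CLUB_EFFECT if went else 0.0) + rng.gauss(0, 3)
        children.append((went, score))
    return mean_difference(children)


naive = simulate(randomise=False)
rct = simulate(randomise=True)
print(f"attendees vs non-attendees: {naive:+.1f}")
print(f"randomised comparison:      {rct:+.1f}")
```

Under self-selection, attendees score well below non-attendees, so the club looks harmful even though it helps; under random assignment the difference comes out close to 2. The sketch deliberately leaves out spill-over and cross-over, either of which would contaminate even a child-level randomised comparison; randomising whole schools instead is one common way trials try to limit them.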

Hence, establishing attribution—which is integral to evaluating an idea—is a whole field of social science research.

Monitoring is of the implementation.

Evaluation is of the idea.

Immediately we see that monitoring and evaluation are entirely different exercises, even though the two terms are often used as if they were interchangeable.

Most charities aren’t made up of social scientists

It’s reasonable to expect charities to monitor their work: How many trains do you run, are they on time, and what do passengers think of them? As with companies reporting the number of units they’ve sold, we might audit those figures, but there’s not normally anything technically difficult in monitoring.

By contrast, evaluation is social science research, and it is hard: What effect do trains have on economic growth? Most charities aren’t able to run evaluations because they don’t have the skills in-house. We can see this in a recent review by the Paul Hamlyn Foundation of reports from grantees (possibly the only analysis of its type): 70 percent of evidence presented by charities ranked below what the foundation called “good.”

The good news is that most charities don’t need these research skills. Once breakfast clubs have been evaluated rigorously, so that we know whether and when they work, we don’t need to evaluate them again (unless the context is very different). To take a medical analogy, we don’t expect every hospital to be a test site. The clinical trials are done somewhere (properly, with luck) and then published so that everybody else can use the results.

Often, the ideas used by charities don’t need to be evaluated again, because they’ve been amply evaluated already. Charities—and funders and others—can use those existing evaluations to choose effective interventions. All the charity then needs to do is run the programs well. Charities need to be skilled at implementation—at running breakfast clubs, or community transport, or drug rehab centres. By the long-established law of comparative advantage, we should let them do what they’re best at and not ask them also to get good at the totally unrelated skills of social science research.

Part two: So who should evaluate anything, then?