Most Charities Shouldn’t Evaluate Their Work: Part Two: Who should measure what?

This two-part series first appeared in Stanford Social Innovation Review.

So what should happen if no one has properly evaluated an idea yet? If it’s important, an independent and suitably skilled researcher should evaluate it in enough detail and in enough contexts for other charities and donors to rely on the findings. The leading medical journal The Lancet cites a tenet of good clinical research: “Ask an important question, and answer it reliably.”

A countercultural implication follows from this. It’s often said that the evaluation of a grant should be proportionate to the size of the grant. It’s also often said that evaluations should be proportionate to the size of the charities. We can see now that both views are wrong. The aim of an evaluation is to provide a reliable answer to an important question. From there, the amount worth spending on an evaluation is proportionate to the size of the knowledge gap and the scale of the programs that might use the answer.

To illustrate, suppose a small company has developed a new drug for breast cancer. The “first-in-(wo)man studies,” as they’re called, involve only a few people, for obvious safety reasons. Relative to the cost of dispensing the drug to those few women, how much should the company spend on evaluating the effect on them? The answer is “a lot,” because the answer is important for many people. So the cost of the “pilot” is irrelevant. So too is the size of the company running the “pilot.” Often, the cost of robustly evaluating a program will exceed the cost of delivering that program—which is fine, if the results are useful to a wide audience.

Conflicted out

So not only are most charities unskilled at evaluation (and we wouldn’t want them to be skilled at it), but we also wouldn’t want most charities to evaluate their own work even if they could. Despite their deep understanding of their work, charities are the worst people imaginable to evaluate it because they’re the protagonists. They’re selling. They’re conflicted. Hence, it’s hardly surprising that the Paul Hamlyn Foundation study found “some, though relatively few, instances of outcomes being reported with little or no evidence to back this up.”

I’m not saying that charities are corrupt or evil. It’s just unreasonable, possibly foolish, to expect that people can be impartial about their own work, salaries, or reputations. As a charity CEO, I’ve seen how “impact assessment” and fundraising are commingled: Charities are encouraged to parade their self-generated impact data in fundraising applications. No prizes for guessing what happens to self-generated impact data that isn’t flattering.

To my knowledge, nobody’s ever examined the effect of this self-reporting among charities. But they have in medicine, where independent studies produce strikingly different results to those produced by the protagonists. Published studies funded by pharmaceutical companies are four times more likely to give results favorable to the company than are independent studies. It’s thought that around half of all clinical trial results are unpublished, and it doesn’t take a genius to figure out which half that might be.

Who should do evaluations?

Skilled and independent researchers, such as academics, should normally take on evaluation of ideas. They should be funded independently and as a public good, such that charities, donors, and others can access them to decide which ideas to use. It’s no accident that The Wellcome Trust, the UK’s largest charity, requires that all the research it funds is published in open-access journals.

The charity itself can normally take on monitoring of implementation.

Useful resources and ideas for funders

Academics and others have evaluated ideas well and published the results. Smart funders use that material to avoid funding something that is known to be useless or even harmful. To be fair, much available information could be easier to find and understand. But some of it is already easily accessible:

On first-world education, The Education Endowment Foundation (a £125 million fund of UK government money to improve education for 5- to 16-year-olds in England) has collated and analyzed evidence on interventions in many countries, and created a wonderful toolkit (“menu”) that shows the quality of the evidence and the apparent strength of the intervention. The foundation is rigorously evaluating all the interventions it funds, publishing the results, and adding them to the toolkit.

In international development, several entities publish all their evaluation findings. They include the Abdul Latif Jameel Poverty Action Lab at MIT (J-PAL) and Innovations for Poverty Action, which have collectively run some 400 impact evaluations. Also, the International Initiative for Impact Evaluation (3ie) database includes more than 600 impact evaluations, as well as systematic reviews of all the evidence on particular topics. Recent systematic reviews have looked at small-scale farmers, female genital cutting, and HIV testing.

In health, The Cochrane Collaboration uses a network of more than 28,000 medics in over 100 countries to produce systematic reviews of many types of intervention. Its database already has more than 5,000 such reviews, including some related to health care in disaster and emergency situations.

Various universities have centers that produce and publish such research. One example is Oxford University’s Centre for Evidence-Based Intervention, which looks at social and psychosocial problems.

The UK government’s new What Works Centres will create libraries of evidence about crime and policing, aging, and various other sectors in which UK charities and donors operate.

These are free resources. Smart donors and charities use them. And they publish their own evaluations, with full methodological detail, so that others can learn from them. It’s essential that charities’ work is evaluated properly so that resources can flow to the best. That means evaluation by appropriately skilled and independent people, and only when it is necessary.
