Many (many!) charities are too small to measure their own impact

Most charities should not evaluate their own impact. Funders should stop asking them to evaluate themselves. For one thing, asking somebody to mark their own homework was never likely to be a good idea.

This article explains the four very good reasons that most charities should not evaluate their own impact, and gives new data about how many of them are too small.

Most operational charities should not (be asked to) evaluate themselves because:

1. They have the wrong incentive. Their incentive is (obviously!) to make themselves look as great as possible – impact evaluations are used to compete for funding – so their incentive is to produce flattering research. That can mean rigging the research to make it flattering and/or burying findings that don’t flatter them. I say this having been a charity CEO myself and done both.

Non-profits respond to that incentive. For example, a rigorous study* offered over 1,400 microfinance institutions the chance to have their intervention rigorously evaluated. Some of the invitations included a (real) study by prominent authors indicating that microcredit is effective. Other invitations included information on (real) research – by the same authors, using a very similar design – indicating that microcredit is ineffective. A third set of invitations did not include research results. Guess what? The organisations whose invitations implied that the evaluation would find their intervention to be effective were twice as likely to respond and agree to be evaluated as those whose invitations implied the danger of finding their intervention to be ineffective. This suggests that the incentive creates a big selection bias even in which impact evaluations happen at all.

2. They lack the necessary skills in impact evaluation. Most operational charities are specialists in, say, supporting victims of domestic violence or delivering first aid training or distributing cash in refugee camps. These are completely different skills from doing causal research, and one would not expect expertise in these unrelated skills to be co-located. {NB, this article is about impact evaluation. Other types of evaluation, e.g., process evaluation, may be different. Also, a few charities do have the skills to do impact evaluation – the IRC and GiveDirectly come to mind. But they are the exceptions. Most don’t.}

3. They often lack the funding to do evaluation research properly. One major problem is that a good experimental evaluation may involve gathering data about a control group which does not get the programme or which gets a different programme, and few operational charities have access to such a set of people.

A good guide is a mantra from evidence-based medicine: research should “ask an important question and answer it reliably”. If there is not enough money (or sample size) to answer the question reliably, don’t try to answer it at all.

[Caption: Caroline talking about exactly this to a group of major donors]

4. They’re too small. Specifically, their programmes are too small: they do not have enough sample size for an evaluation of just their programme to produce statistically meaningful results, i.e., to distinguish the effect of the programme from the effects of other factors or random chance. Results of self-evaluations by operational charities are therefore quite likely to be simply wrong. For example, when the Institute for Fiscal Studies did a rigorous study of the effects of breakfast clubs, it needed 106 schools in the sample: far more than most operational charities providing breakfast clubs have.
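To see why sample size bites so hard, here is a rough power calculation – a sketch using the standard two-proportion formula, with hypothetical numbers (not from the IFS study or the Justice Data Lab). Detecting a five-percentage-point fall in a 30% reoffending rate, at the conventional 5% significance level and 80% power, needs over a thousand people in *each* of the treatment and comparison groups:

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
    """Participants needed in EACH group for a two-sided two-proportion
    z-test to detect a change in an outcome rate from p1 to p2."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    p_bar = (p1 + p2) / 2
    n = (z_a * sqrt(2 * p_bar * (1 - p_bar))
         + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / (p1 - p2) ** 2
    return ceil(n)

# Hypothetical example: reoffending falls from 30% to 25%
n = sample_size_per_group(0.30, 0.25)  # over 1,200 people per group
```

Smaller true effects need even larger samples, which is exactly why a small charity evaluating only its own few dozen participants will usually get an inconclusive (or misleading) answer.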

Giving Evidence has done some proper analysis to corroborate this view that many operational charities’ programmes are too small to reliably evaluate. The UK Ministry of Justice runs a ‘Data Lab’, which any organisation running a programme to reduce re-offending can ask to evaluate that programme: the Justice Data Lab uses the MoJ’s data to compare the re-offending behaviour of participants in the programme with that of a similar (‘propensity score-matched’) set of non-participants. It’s glorious because, for one thing, it shows loads of charities’ programmes all evaluated in the same way, on the same metric (12-month reoffending rate) by the same independent researchers. It is the sole such dataset of which we are aware, anywhere in the world.
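The matching idea behind the Justice Data Lab can be illustrated in a few lines. This is a toy sketch with invented numbers, not the MoJ’s actual method: each participant is paired with the non-participant whose (pre-computed) propensity score is closest, and reoffending rates are then compared across the matched groups.

```python
def match_and_compare(treated, controls):
    """treated/controls: lists of (propensity_score, reoffended) pairs,
    where reoffended is 1 or 0. Matches each treated person to the
    nearest-scoring control (with replacement) and returns the
    reoffending rate in each matched group."""
    matched = []
    for score, _ in treated:
        # nearest-neighbour match on propensity score
        nearest = min(controls, key=lambda c: abs(c[0] - score))
        matched.append(nearest)
    treated_rate = sum(r for _, r in treated) / len(treated)
    control_rate = sum(r for _, r in matched) / len(matched)
    return treated_rate, control_rate

# Invented data, purely for illustration
treated = [(0.6, 0), (0.4, 1), (0.7, 0), (0.5, 0)]
controls = [(0.62, 1), (0.38, 1), (0.71, 0), (0.49, 1), (0.9, 1)]
t_rate, c_rate = match_and_compare(treated, controls)
```

The point of the matching is that participants and comparators resemble each other on the characteristics that predict reoffending, so any remaining gap in outcomes is more plausibly attributable to the programme.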

In the most recent data (all its analyses up to October 2020), the JDL had analysed 104 programmes run by charities (‘the voluntary and community sector’), of which fully 62 proved too small to produce conclusive results. That is, 60% of the charity-run programmes were too small to evaluate reliably.

The analyses also show the case for reliable evaluation, rather than just guessing which charity-run programmes work or assuming that they all do:

a. Some charity-run programmes create harm: they increase reoffending, and

b. Charity-run programmes vary massively in how effective they are.

Hence most charities should not be PRODUCERS of research. But they should be USERS of rigorous, independent research – about where the problems are, why, what works to solve them, and who is doing what about them. We’ve written about this amply elsewhere.

* I particularly love this study because of how I came across it. It was mentioned by a bloke I got talking to in a playground while looking after my godson. The playground happens to be between MIT and Harvard, so draws an unusual crowd, but still. Who needs research infrastructure when you can just chat to random strangers in the park?…

 

