This article was first published in Civil Society Magazine.
Most of the impact data charities gather does not help raise funds or improve performance
This opinion piece is a response to an article titled How do you measure the value of impact measurement? by David Ainsworth.
David’s article is actually mis-titled. It does not discuss methods for measuring the value of impact measurement, but rather states a view that there may be none, which may explain why nobody seems interested in measuring it. I broadly agree with that. His article raises many points, so we’ll go carefully through them.
First, let’s be clear that “impact measurement” has two completely different purposes: one is helping to raise funds, and the other is improving performance. Let’s consider them separately.
Does impact measurement help raise funds?
There is no robust evidence that it does. To my knowledge, there have only been a few rigorous tests of whether telling donors about impact, or the quality of the charity, changes their giving behaviour. They’ve all been in the US and all with ‘retail’ donors. (This of itself is astonishing. There have probably been a zillion studies of whether, when and how independent assessments of companies – such as bond ratings – affect investor behaviour.)
Those studies all found that retail donations do not increase, and may decrease, if the charity highlights that it has high ratings from an independent evaluator (the study used Charity Navigator scores). I summarised those studies here.
However, since literally nobody has tested the effect of impact reporting on donors in any other country, nor on other types of donors such as major donors or foundations, we don’t know whether it affects their giving.
If we move outside the realm of rigorous studies (to things which are more likely to be wrong!) there are lots of anecdotes about how demonstrating high impact (we’ll return to what that actually means) leads to more funding.
GiveWell assesses charities and recommends only the handful that it thinks are the best it has ever seen. It claims to have moved $40m to them last year alone (plus another $70m from a Facebook co-founder who now sort-of houses GiveWell). The Chronicle of Philanthropy is soon publishing analysis of giving to the top 400 US charities, and thinks that donors are becoming more interested in impact.
Maybe, maybe not. A friend in this sector said a while ago that she felt that ‘the impact question’ is simply a hurdle. In the fundraising dance, a new step is that donors ask “how do you measure your impact?” and the trick is to have a response – it doesn’t much matter what that response is, it’s simply to establish that you have thought about it. The donor takes that as a proxy for you doing something about it.
So if we ask ‘does impact measurement affect fundraising?’ the answer is that we don’t really know. (Notice that we’ve not discussed whether having good performance affects fundraising. We’ll come to that.)
Does impact measurement improve performance?
Let’s again be clear that measuring impact is a research exercise, whereas improving performance involves changing tasks, systems and processes. They’re quite different. People sometimes seem to expect that simply ‘measuring their impact’ will improve it, but not so: you don’t fatten a pig by weighing it every day.
Again, we don’t know whether impact measurement improves performance. I know of no studies of the performance of charities which have vs. haven’t done rigorous measurement. As David says, it would be hard to design one – though perhaps not impossible.
There are certainly some examples of charities who have made management changes as a result of looking at impact. For instance, Chance UK provides mentors for primary school children who are at risk of developing anti-social behaviour and possibly being permanently excluded from school. Through impact research, it found that male mentors were best suited to children with behavioural difficulties, whereas children with emotional problems responded best to female mentors. That finding is pretty obviously actionable. (More on that here.)
But those examples are too few. At the inaugural meeting of the (then-so-named) Social Impact Analysts Association, in Berlin a few years ago, somebody asked for a show of hands of organisations who had ceased a programme on learning about its impact. Not one.
Not much impact measurement is designed to be useful
My growing hypothesis is that little “impact research” by charities is designed to be useful or robust. Much of it simply “measures the impact” of one thing, generating an answer that the impact is, say, 14.2, which tells you precisely nothing about how that could be improved. Management choices are invariably between two (or more) courses of action, and yet very little “impact research” is comparative like that.
Furthermore, if impact measurement were designed to improve decisions (either by funders or by charity managers), there ought to be more interest in harmonising metrics. Inspiring Impact found that 120 charities and social enterprises were, between them, using over 130 tools and systems for measurement.
Much “evaluation” is so unreliable that one wouldn’t want to base either funding or management decisions on it. Most evaluation by charities is done using before/after studies, which are open to bias. (Maybe any observed change before vs. after the programme would have happened anyway.) In many cases the sample size is too small to be robust.
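To make the before/after problem concrete, here is a minimal sketch in Python. All the numbers are invented purely for illustration (they come from no real study), and the function name `average_change` is my own. It assumes every child's score rises by 5 points over the year whether or not they are in the programme, and that the programme itself adds 3.

```python
from statistics import mean

# Hypothetical scores, invented for illustration only (not from any study).
# Assumption: everyone improves by 5 points over the year regardless of the
# programme, for example simply by getting older.
programme_before = [40, 45, 50, 55, 60]
programme_after = [48, 53, 58, 63, 68]    # +8 each: 5 background + 3 programme
comparison_before = [41, 46, 49, 54, 60]
comparison_after = [46, 51, 54, 59, 65]   # +5 each: background change only


def average_change(before, after):
    """Mean of the per-person change between two measurements."""
    return mean(a - b for b, a in zip(before, after))


# A before/after study attributes the whole change to the programme.
before_after_estimate = average_change(programme_before, programme_after)

# A comparison group shows how much change would have happened anyway.
background_change = average_change(comparison_before, comparison_after)
comparative_estimate = before_after_estimate - background_change

print(f"Before/after estimate: {before_after_estimate}")
print(f"Change in comparison group: {background_change}")
print(f"Estimate net of background change: {comparative_estimate}")
```

With these made-up figures, the before/after study credits the programme with 8 points, whereas netting off the comparison group suggests 3. A real evaluation would also need to deal with how the comparison group is chosen and with sample size, which this sketch ignores.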
(There are only a few analyses of the quality of charities’ evaluations, and we could do with more research about the quality of our research: such studies have led to massive improvements in the quality of medical research.)
The Paul Hamlyn Foundation looked at the quality of “impact reports” it had received, and even on very generous criteria, found that only a quarter were “good”. And Project Oracle looks at the quality of research by London’s youth sector. When I last looked, it wasn’t clear how many organisations had failed to reach even the first level on its ‘ladder’, but of those which had, only 11 per cent had reached the second rung. None had reached any further.
Internationally it isn’t much better. The (US-based) Center for Education Innovations is assessing the quality of research by education NGOs in international development, and found that essentially they’re all Level 1 on the Maryland Scale, a respected academic scale, which is more demanding than either Paul Hamlyn or Project Oracle’s first criteria.
This all rather implies that most charities’ “impact measurement” or “evaluation” doesn’t actually show what their impact is.
In fact, since poor quality evaluation is more likely to flatter, and charities are judged on research which they themselves produce, they are incentivised – by funders – to produce bad research. Ken Berger, formerly of Charity Navigator, and I wrote about this earlier this year.
The director of a large US foundation told me once that most funders are more like parents than asset managers: they want to make the most of the ones they’ve got, and are not interested in swapping. He said that that’s why they don’t want to know that it’s not working. A month later I heard a UK charity CEO say that “people who’ve been funding you for 10 years don’t want to pay for decent quality evaluation because they don’t want to suddenly hear that it doesn’t really work.”
People will throw rocks when I say this, but I increasingly think that non-profits’ impact research isn’t a serious attempt at research. If it were, there would be proper training for non-profits on producing it, training for funders on commissioning it and interpreting it, guidelines for reporting it clearly, and quality control mechanisms akin to peer review, and places to put it, such as repositories and journals. There aren’t.
Use research, don’t produce it
In fact, to “measure impact” is to produce evaluation research, and in my view, few charities need to do that. Given that most have neither the skills, money, sample size nor incentive to do it well, they could improve their impact much more by using the existing research: go and see what the robust and academic studies say about your type of intervention or the behavioural changes or incentives which it assumes. I call this RTFM (or read the manual, for short). It’s what every doctor does. Don’t produce research: rather, go read and heed the masses which already exist.
I’ve written before about how donors shouldn’t ask “What is your impact?”, but rather “On what evidence is your work based?” or “What makes you think that your intervention works better than other interventions addressing the same problem?”
For example, the Education Endowment Foundation this week published an evaluation of in-school breakfast clubs – a randomised controlled trial involving 106 schools. It’s obviously much more reliable than anything which any individual charity will produce, so charities should use it to design their programmes and cite it as such.
There are other types of research which are easier than evaluation (establishing causation) and more readily useful to non-profits, such as asking prospective and actual beneficiaries what they want, what they think of what they’re getting, and what they’d like changed.
Can you measure the impact of impact measurement?
Yes. It’s relatively straightforward to measure the effect on fundraising, as has already been done. (Strictly speaking, this doesn’t measure the effect of impact measurement on fundraising; but rather the effect of telling donors about impact.)
Measuring the impact of impact measurement on performance is harder, and in my view unnecessary. Measuring impact with precision and reliability is fiendishly hard: to screen out the effect of all other possible factors, such as other changes in the world, bias from small sample size, survivor bias etc. A director of the US military’s measurement lab (which looks at radar precision, for example), when recently asked how or whether it measures – or even considers – its own effect, just said “no, that would be impossible”. I laughed.
A better question is “what gives us reason to think that using existing research can improve performance?” And that we really ought to be able to answer properly.