This article was first published in the Stanford Social Innovation Review.
Social impact bonds (SIBs) are a high-profile innovation in funding public services. The pilot SIB in Peterborough, UK, which aims to reduce recidivism, has been widely watched and—despite not yet producing results—already widely emulated.
Given the international interest in SIBs and similar payment by results (pay-for-success) schemes, it’s important to determine whether the Peterborough SIB works. The Ministry of Justice describes the program’s evaluation method as “the Rolls Royce of evaluation.” However, Professor Sheila Bird of Cambridge University and the UK Medical Research Council says: “[It] might well be a brilliant success; it might achieve little. But we aren’t going to know either way.”
This article examines three aspects of determining whether the SIB works.
The first is straightforward: whether the investors should be repaid. Determining this will be easy, because it depends solely on the re-offending rate and the contractual terms—both of which will be clear.
The second is whether the intervention itself works to reduce re-offending, which is a central question. Determining this will be more difficult, because this first SIB is using a variety of interventions: only some of them have been evaluated rigorously, and the combination has never been evaluated.
The issue is attribution: figuring out whether the re-offending rate amongst the Peterborough prisoners has anything to do with the charities’ work which the bond funds. Both sides agree that the way to see what the charities have achieved is to compare:
- The one-year re-offending rates of men with whom the charities work.
- The one-year re-offending rates of a group of similar men with whom the charities haven’t worked. This “control group” screens out effects of, say, changes in society, the law, or sentencing procedures.
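The comparison described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names and the two cohorts are made up for the example and are not data from Peterborough or part of the actual evaluation.

```python
def reoffending_rate(reoffended):
    """Share of a cohort that re-offended within one year.

    `reoffended` is a list of booleans, one per released prisoner.
    """
    return sum(reoffended) / len(reoffended)


def rate_difference(treatment, control):
    """Control rate minus treatment rate.

    A positive value means fewer re-offenders in the treatment group.
    """
    return reoffending_rate(control) - reoffending_rate(treatment)


# Made-up illustrative cohorts (assumption, not real data).
treatment = [True] * 4 + [False] * 6  # 4 of 10 re-offend
control = [True] * 6 + [False] * 4    # 6 of 10 re-offend

gap = rate_difference(treatment, control)
print(f"Difference in one-year re-offending rates: {gap:.0%}")
```

The difference only means something if the two groups were comparable before the program started, which is the point at issue in the rest of this article.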
It’s essential that the “treatment group” and control group be effectively identical beforehand; if they are, the sole difference between them is the program, which alone must account for differences in re-offending rates between the groups. Bird would have liked the treatment group and control group to have been selected at random to ensure that the groups were effectively identical. But this isn’t what is happening. Social Finance, the nonprofit company that invented social impact bonds and is running the Peterborough pilot, says it was impossible: within the prison, the program is advertised and open to anybody whose sentence is a year or less. Prisoners are used to, and exasperated by, being apparently arbitrarily excluded from things, and neither Social Finance nor the prison governor wanted this program to generate ill-will in that way. Social Finance says that its “investors wouldn’t tolerate excluding some people.” Bird’s view is that random selection inside prisons (as outside them) is not only possible, but also pretty common.
If randomising prisoners wasn’t possible, the next best option would have been randomising prisons: In other words, several randomly selected prisons would get the program while others wouldn’t, and the re-offending rates of their populations would be compared. Social Finance says that this wasn’t possible either, because the Ministry of Justice would never have allowed a pilot in several prisons at once.
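As a rough sketch of what randomising prisons would involve, the snippet below draws a random subset of prisons to receive the program and leaves the rest as the comparison group. The prison names, the number treated, and the `randomise_prisons` helper are all hypothetical; nothing here describes a design that was actually proposed.

```python
import random


def randomise_prisons(prisons, n_treated, seed=None):
    """Randomly pick `n_treated` prisons to receive the program.

    Returns (treated, control) as two lists that together cover
    every prison exactly once.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    treated = set(rng.sample(prisons, n_treated))
    control = [p for p in prisons if p not in treated]
    return sorted(treated), control


# Hypothetical prison names, for illustration only.
prisons = ["Prison A", "Prison B", "Prison C", "Prison D", "Prison E", "Prison F"]
treated, control = randomise_prisons(prisons, n_treated=3, seed=1)
print("Program:", treated)
print("Comparison:", control)
```

Because chance alone decides which prisons get the program, traits such as an unusually willing governor should be spread across both groups rather than concentrated in one.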
Interestingly, Peterborough prison wasn’t chosen at random, but rather because the prison governor was willing to engage. As Bird remarks, that may indicate an unusual trait in the governor, which itself may influence the results. It’s not impossible that a prison governor willing to take on this innovative project is unusually progressive in other respects too: perhaps Peterborough prison offers other unique programs that could skew the results.
To construct a control group, the bond evaluation uses Propensity Score Matching (PSM), a system often used when samples can’t be randomised. With PSM, you start by figuring out what indicators have historically correlated with eligibility for the treatment (propensity to be eligible). In this case, prisoners at institutions other than Peterborough who have the same “propensity scores” as the treatment group serve as a control group. Social Finance is doing an unusually elaborate PSM by having about ten “control” prisoners for each “treatment” prisoner.
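The matching step can be illustrated with a minimal sketch. It assumes the propensity scores have already been estimated (typically with a model such as logistic regression on the observable indicators) and uses simple nearest-neighbour matching without replacement. The `match_controls` function, the prisoner IDs, and the scores are invented for the example; the evaluation’s actual matching procedure is not described here and may well differ.

```python
def match_controls(treated, pool, k=10):
    """Match each treated prisoner to the k pool prisoners nearest in score.

    `treated` and `pool` map a prisoner ID to a propensity score.
    Matching is greedy and without replacement: once a pool prisoner
    is used as a control, he is not reused for anyone else.
    """
    available = dict(pool)  # copy so the caller's pool is not modified
    matches = {}
    for t_id, t_score in treated.items():
        nearest = sorted(available, key=lambda c: abs(available[c] - t_score))[:k]
        matches[t_id] = nearest
        for c in nearest:
            del available[c]
    return matches


# Hypothetical scores, for illustration only.
treated = {"T1": 0.30, "T2": 0.70}
pool = {f"C{i}": i / 100 for i in range(100)}

matches = match_controls(treated, pool, k=10)
for t_id, controls in matches.items():
    print(t_id, "->", controls)
```

Note that the scores can only be built from indicators that are actually recorded, so the matched controls are similar on paper rather than guaranteed to be similar in every respect that matters.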
Nonetheless, there are major objections to PSM as a way of attributing any effects observed. One is that PSM can only ever look at indicators that are observable, such as age, background, and criminal history. Yet it’s often unobservable factors—such as attitude or resilience—that drive behaviour.
Another problem is that the only data available for the PSM are what’s stored in the Police National Computer, which is surprisingly basic. For instance, it can’t distinguish whether somebody has mental health problems or a history of heroin use, which obviously would influence their behaviour and the care they need.
Astonishingly, even the Ministry of Justice explicitly acknowledges that the control group may be pointless (see page 7 of this Ministry of Justice document about the evaluation).
The third aspect is whether the bond structure itself works. Social Finance says that the mere existence of this first bond proves that it is possible. It proved possible to define performance criteria against which a public body agreed to repay, and to find private donors willing to provide funding based on those criteria.
But when we eventually see the re-offending rates of the treatment and control groups, we won’t know whether to attribute any differences to:
- Social Finance’s particular mix of interventions
- The money. The SIB brings in about £1,667 per prisoner. Bird thinks any prison governor could use that amount to dramatically reduce re-offending. It’s possible that the prison governors could out-perform Social Finance’s program.
- The new financing mechanism itself. We won’t know whether it produces better outcomes than if that money had been put into that intervention through, say, a grant program.
The core problem might be that Social Finance is delivering on a contract: it isn’t doing social science research, to which distinguishing between possible causes is central. So does the difficulty of seeing the effect of the financing mechanism itself matter? Well, not for Social Finance or its donors in this first instance. Their proximate issue is delivering the contractual obligations such that they get paid. But surely it would have been helpful to Social Finance’s future work to see the effect of the SIB mechanism itself.
It certainly matters to the Ministry of Justice, which 1) may end up paying for a service that didn’t achieve anything beyond what that particular prison governor would have achieved without that money, and 2) won’t therefore know what service they should roll out to other prisons if the Peterborough service does apparently succeed.
It matters even more to UK taxpayers who are funding all of this—as well as hoping not to be burgled or mugged. Yet they’re unlikely to object because the intricacies of randomisation and PSM for determining attribution are a shade too complex.
“All these problems could have been averted,” says Bird. She says, for example, that this first SIB could have been tested against a known intervention with a conventional funding mechanism.
And yet, we should not let the best be the enemy of the good. Clearly, we are likely to get better public services when the interests of the provider and purchaser are better aligned, and SIBs are a step in the right direction. Despite the Peterborough SIB’s curious design choices, it has taught us many things—and will teach us many more.