How I ate an elephant – evaluating system change in bite size chunks

Systems change has been described as ‘counter-structural’ and ‘counter-cultural’ – to which it might be added that evaluation may be counter-intuitive. Mental models for evaluation need to adjust to what can be learned about the value of interventions into complex systems – or how to evaluate like you are eating an elephant.

Evaluation often focuses on measuring outcomes directly attributable to an intervention. This approach is often considered to provide the best opportunity for accountability (and sometimes for learning, often complemented with more qualitative methods). It is common sense to want to know the outcomes that were generated by funded activity. This commonsense notion applies in many day-to-day contexts and is the idea that underpins the testing of specific hypotheses using experimental methods, including the randomised controlled trial. It is an idea that provides for rigour in evaluation. It helps reduce bias and self-serving evaluation reports that obscure rather than enlighten. Measurement of outcomes relative to a counterfactual can also provide a means of comparing the value of one action with that of other possible actions. It may appear that there are few, if any, rigorous approaches other than the measurement of outcomes of interest for holding fundees to account and informing rational allocative decisions. Yet there are many situations where the requirements for this approach cannot be met – sometimes the initiative is difficult to define, often the data simply isn’t there.

Evaluation of systems change initiatives, if it is to be rigorous and strategic, must work hard to uncover the value of action in situations where reliable and valid measurement that is sensitive and specific to funded action is not possible, prohibitively expensive, or impractical. Saying ‘it’s complex’ is not a valid excuse. It must be axiomatic that any action that is funded may be subject to evaluation. Every funded action may be reasonable or unreasonable in its design. Every funded action may be performed well or performed poorly relative to its resourcing. If evaluation is to be of use for accountability as well as learning, it must still be possible to evaluate when an initiative is hard to define, when data is fragmented and when counterfactuals are difficult to construct.

Any systems change initiative or evaluation must first define ‘the system’. Leaving the concept vague invites confusion and a lot of angst about the value of potentially useful, opportunistic, or uncoordinated action. The system may be defined as a group of actors and/or institutions that have a related, and ideally common, goal. It may relate to a geographical location (as in place-based initiatives) or a functional responsibility (a health system, an education system, etc.). Pragmatism is required to set the boundaries of the system – does a health system extend to the mental health system? Does the early childhood education system extend to the child welfare system? I find that a system may often be defined pragmatically as those within a conceptual system who are willing to work with you.

Evaluation of systems change can benefit from considering different levels of systems change – the explicit conditions of structural change, the semi-explicit conditions of relational change, and the implicit conditions of transformative change, as described in ‘The Water of Systems Change’ (Kania et al. 2018).

  • At the level of the explicit conditions, the questions are often about the appropriateness, efficiency and effectiveness of what was funded or done [1].
  • At the level of the semi-explicit conditions of relational change, questions are often about how the quality of relationships between system actors has changed, and how the system has grown over time.
  • At the deepest level, that of the implicit conditions of transformative change, evaluation might seek to answer questions about how mental models or deeply held beliefs, values and assumptions are changing.

Following this idea, rigorous evaluation by a funder of systems change initiatives will often benefit from answers to at least four key questions. These questions are relevant during the design, funding, delivery, re-design and re-funding decision-making process. The questions are interdependent, and the answers may be uncertain and partial, with a short shelf life. Just like exercise for staying fit, evaluation requires constant effort and is not something that is ever fully achieved. Of course, the elephant is a whole that is more than the sum of its parts, but the goal here is to be practical and not bite off more than we can chew.

  1. What is the system we are trying to change? What is a feasible boundary to our sphere of influence (narrower than our sphere of interest)? Who are the actors, and what relationships and mental models might be holding us back? That is, what are the conditions we are seeking to change?
  2. Are we funding the right things? Will this action, if done well, change the explicit or implicit conditions in our system of influence in ways we value?
    1. Would the proposed outcomes of initiatives, if achieved, contribute in a meaningful and cost-effective way to our mission and vision and to the conditions we are seeking to change?
    2. Is there a coherent plan to achieve those outcomes, i.e. a sound value proposition that the fundee can generate those outcomes without the need for heroic assumptions?
  3. Did the fundee do what they said they would do?
    1. Have they implemented the actions they agreed to?
    2. Have they delivered outputs of the expected quality?
    3. Are system conditions observable and have they changed?
  4. Do current system parameters suggest we should fund more of the same? Or do we need to do something slightly different, or take a whole new approach?

These questions may seem reasonably uncontroversial. The first question simply requires a definition of the system and its parameters or elements: the actors, boundaries, and relationships that make up the system of interest. Many systems change initiatives fail to define the system of interest and, as a result, evade accountability and spend too much time talking and not enough doing. The second question is a bit different; it is about the funder being accountable to itself: did we fund a sound value proposition, or did we fund a nice idea? The third question, which is essentially “are we/they doing things right?”, is relatively easy to answer, but is also controversial because even if the answer is ‘yes’ it does not mean a fundee should continue to be funded. In a dynamic and complex system, past outcomes do not always provide evidence that more of the same activity is desirable. The fourth question, which is essentially “are we/they doing the right things?”, is the hardest. Once the system is defined, data on the status of system elements can be compared with the intended outcomes of funded initiatives. If the intended outcomes of funded initiatives would logically address deficits in the status of system elements, this is a good start. If the fundee also appears to have a sound proposition for achieving intended outcomes in a cost-effective manner, this would suggest they should be funded, especially if past results warrant placing trust in the fundee and there are no better propositions to consider.

The questions are also controversial in what they do not ask of fundees. They focus on the concrete outcomes that fundee initiatives could reasonably achieve, not on hoped-for indirect impacts to which an initiative may contribute in a manner that cannot be quantified. That is, they focus on outcomes of influence, on ‘what’ fundees will do, not on outcomes of interest or ‘why’ they will do it. This approach accepts that reality, or our understanding of it, is stratified. The implication is that we can’t answer questions about the value of one level of the system based on measures that make sense at another level (just as we can’t easily answer questions about psychological behaviour using methods based on the movement of protons and neutrons in physical systems). Measuring contribution to indirect outcomes (or impacts) of interest may be possible and useful in situations where adequate data exists, but when it doesn’t, assessment of contribution may invite the same bias and confusion that measurement sets out to avoid.

The art of systems evaluation starts with asking the right questions. The science of systems evaluation may lie in devising methods for measuring conditions and understanding relationships at different levels of the system. While it may well be that in the waters of systems change there are ripples of impact, it doesn’t necessarily follow that trying to measure those ripples across different levels of the system is the best way to understand value.

Evaluators have an important role to play. Evaluators can help describe system actors, relationships and boundaries – and ensure different perspectives are engaged. They can hold fundees to account for doing what they said they would do. Evaluators can assist with the collation of evidence and insights about what was delivered, what was achieved, and what was learned. This can be presented in a way that is useful for the decision-making of fundees as well as funders. Evaluators can also measure direct outcomes and sometimes indirect outcomes. Evaluators may also help funders identify flaws in value propositions, and refine them, at both the funding and re-funding stages. This can provide a level of rigour that may rival that of venture capitalists, and can champion reasoned decision-making in situations where certainty is not possible and learning and constant effort are required.

While systems are potentially infinite in size, it is generally easier to eat a large meal in bite-size chunks, just as physics, chemistry and biology, or micro and macroeconomics, don’t try to answer everything about the world all at once. A system will have different levels that are interdependent, and it is often too hard to measure the impact of action at one level on another. It is still important to be aware of the different levels, to ask the right questions about each one and, obviously, to use the right methods to answer them. In situations where it is difficult or impossible to measure changes across levels of the system, or to attribute changes in the overall health of the system (i.e. the emergent properties of the system) to individual actions, it is strategic to base funding decisions on reasoned answers to the four questions.


[1] Evaluators are often focused on the role of evaluation for addressing the explicit conditions for systems change. This mental model currently prevails – as much as it might need changing, it also provides a concrete starting point for moving from talk to action.
