In my past life as a molecular biologist, I learned Arthur Kornberg’s “Ten Commandments: Lessons from the Enzymology of DNA Replication” (J. Bacteriol. 2000 Jul; 182:3613). Dr. Kornberg was a biochemist (he shared the 1959 Nobel Prize in Physiology or Medicine for his work on the enzymes that synthesize DNA), and his fourth Commandment was, “Do not Waste Clean Thinking on Dirty Enzymes”. His point was that cells are complex (lots of different enzymes!), so before drawing conclusions, you should make sure you have eliminated or controlled for unexpected variables that might have influenced your results.
Recent WASH sector discussions, project reports, and calls for impact assessments have reminded me of Dr. Kornberg’s lesson. Across our sector, there is increasing emphasis on experiments and impact assessments for measuring how well interventions work and how cost-effective they are. The growing demand for good evidence is, of course, welcome. However, I wonder whether the requirements for obtaining good evidence are well considered.
Clearly, a critical requirement for measuring program impacts is an understanding of what would have happened if the intervention had not been implemented: i.e., how would the targeted outcomes have changed if there had been no program?
Yet a few years ago I heard the head of a development organization claim credit for declines in infant mortality rates in areas of Kenya where his organization was implementing agricultural improvement programs. The very next day, the Kenyan government and development partners released data indicating that the distribution of insecticide-treated bednets was driving significant reductions in infant mortality across the country.
So how much impact did the agricultural improvement programs actually have on infant mortality? It’s hard to say, because the development organization did not have estimates of how much infant mortality would have changed in their intervention communities if they had not implemented their agricultural improvement programs.
The best strategies for estimating what would have happened if a program had not been implemented rely on comparisons with a valid “control” group: i.e., units (communities, households, institutions, etc.) that did not receive the program but were very similar to the intervention units that did receive it.
The key is the level of similarity between the control and intervention groups: the more the two groups differed (e.g., in economic development, education levels, geography, occupations, political leadership, other development programs, etc.) when the program was implemented, the harder it is to determine whether differences in the outcome of interest (e.g., infant mortality) between the two groups are due to the presence or absence of the program or to some unrelated “confounding” factor, such as a difference in average education levels between the two groups.
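To make the confounding problem concrete, here is a small simulation with entirely hypothetical numbers. It assumes a program is targeted at villages with above-average schooling, that schooling itself lowers infant mortality, and that the program’s true effect on mortality is zero. The function name, effect sizes, and data-generating model are illustrative assumptions, not estimates from any real program.

```python
import random
import statistics


def simulate_confounded_comparison(n_villages=2000, seed=1):
    """Return (naive_difference, true_effect) for a hypothetical program.

    Assumed setup: the program goes to villages with above-average schooling,
    schooling itself lowers infant mortality, and the program does nothing.
    """
    rng = random.Random(seed)
    true_effect = 0.0  # change in deaths per 1,000 live births caused by the program
    treated, control = [], []
    for _ in range(n_villages):
        schooling = rng.gauss(6.0, 2.0)   # hypothetical mean years of schooling
        gets_program = schooling > 6.0    # non-random targeting: the confounder decides
        # Hypothetical outcome model: more schooling -> lower infant mortality.
        mortality = 60.0 - 3.0 * schooling + rng.gauss(0.0, 5.0)
        if gets_program:
            mortality += true_effect
            treated.append(mortality)
        else:
            control.append(mortality)
    # Naive "impact": intervention mean minus comparison mean, ignoring schooling.
    naive_difference = statistics.mean(treated) - statistics.mean(control)
    return naive_difference, true_effect


naive, truth = simulate_confounded_comparison()
print(f"naive 'impact': {naive:.1f} per 1,000; true impact: {truth:.1f} per 1,000")
```

The naive comparison shows intervention villages with markedly lower infant mortality even though, by construction, the program had no effect: the gap comes entirely from the difference in schooling between the two groups.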
The best option for establishing a valid control group is to randomly select intervention units from a large pool and assign the unselected units to the control group. If the selection pool is big enough, the averages of any characteristic (measured or not) will be very similar between the intervention and control groups, and the remaining differences shrink as the pool grows: this is the law of large numbers at work. If the selection pool is too small for random selection, or other considerations are important for targeting the intervention, it is possible to establish a matched set of controls by selecting a group that looks similar to the intervention group across many different parameters.
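A quick simulation shows why the size of the pool matters. Under the hypothetical setup below, each unit has one baseline characteristic drawn at random, half of the pool is randomly assigned to the intervention, and we record the average gap between the two groups’ means over many repetitions. The function name and numbers are assumptions for illustration; this is a sketch, not a power calculation.

```python
import random
import statistics


def mean_imbalance(pool_size, replications=200, seed=42):
    """Average absolute gap between intervention and control group means of a
    hypothetical baseline characteristic under simple random assignment."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(replications):
        # Hypothetical baseline characteristic for each unit (e.g. mean years of
        # schooling per community), measured before any program starts.
        baseline = [rng.gauss(6.0, 2.0) for _ in range(pool_size)]
        # Random assignment: shuffle the pool; the first half gets the program.
        rng.shuffle(baseline)
        half = pool_size // 2
        intervention, control = baseline[:half], baseline[half:]
        gaps.append(abs(statistics.mean(intervention) - statistics.mean(control)))
    return statistics.mean(gaps)


for size in (20, 200, 2000):
    print(f"pool of {size:>4} units: average gap = {mean_imbalance(size):.3f} years")
```

With a small pool, chance alone can leave the groups noticeably different; as the pool grows, the average gap falls roughly in proportion to one over the square root of the pool size, but it never becomes exactly zero.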
Random selection obviously has to happen before program implementation. Ideally, matching a control group to the intervention group should also occur before implementation starts. Retrospective matching (i.e., finding a matched control group after a program is completed) requires pre-intervention data covering many different parameters for both the intervention group and potential controls. Often this data is not available or is of poor quality.
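As a rough illustration of what matching involves, the sketch below pairs each intervention unit with its most similar candidate control, using pre-intervention characteristics scaled to a common range. Greedy one-to-one nearest-neighbor matching is just one simple technique (propensity-score matching is a common alternative), and the function names, covariates, and values are all hypothetical. Note that it can only balance characteristics that were actually measured before the program started.

```python
import math
import statistics


def standardize(rows, keys):
    """Rescale each covariate to mean 0 and SD 1 across all rows, so that no
    single covariate (e.g. income in currency units) dominates the distance."""
    scale = {}
    for k in keys:
        values = [r[k] for r in rows]
        scale[k] = (statistics.mean(values), statistics.pstdev(values) or 1.0)
    return [[(r[k] - scale[k][0]) / scale[k][1] for k in keys] for r in rows]


def match_controls(treated, candidates, keys):
    """Greedy 1:1 nearest-neighbor matching without replacement on
    pre-intervention covariates. Returns (treated_index, candidate_index) pairs."""
    combined = standardize(treated + candidates, keys)
    t_std, c_std = combined[:len(treated)], combined[len(treated):]
    available = set(range(len(candidates)))
    pairs = []
    for i, t in enumerate(t_std):
        if not available:
            break  # ran out of candidate controls
        # Closest remaining candidate by Euclidean distance (ties -> lowest index).
        best = min(sorted(available), key=lambda j: math.dist(t, c_std[j]))
        available.remove(best)
        pairs.append((i, best))
    return pairs


# Hypothetical pre-intervention data for two intervention communities and
# three candidate controls.
keys = ["schooling", "income", "distance_km"]
treated = [
    {"schooling": 8.0, "income": 1200, "distance_km": 5},
    {"schooling": 4.0, "income": 600, "distance_km": 20},
]
candidates = [
    {"schooling": 4.2, "income": 650, "distance_km": 18},
    {"schooling": 10.0, "income": 2000, "distance_km": 1},
    {"schooling": 7.8, "income": 1150, "distance_km": 6},
]
pairs = match_controls(treated, candidates, keys)
for t_idx, c_idx in pairs:
    print(f"intervention unit {t_idx} -> control candidate {c_idx}")
```

If the pre-intervention records are missing a variable that drives the outcome, the matched groups can look alike on paper while still differing on that factor.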
Because useful pre-intervention data is so often lacking, many non-randomized WASH impact assessments that are initiated after program completion are compromised by poorly matched intervention and control groups. Confounding factors (known and unknown) that differ between the two groups can then distort measurements of program impact.
This posting, then, is one more call for government agencies, development partners, and implementing organizations to consult with impact evaluation specialists before designing and rolling out programs. Don’t wait until it’s too late: we should not waste clean thinking on confounded comparisons.