Open access | Published: 26 May 2016

A qualitative case study of evaluation use in the context of a collaborative program evaluation strategy in Burkina Faso

Léna D’Ostie-Racine 1, Christian Dagenais 1 & Valéry Ridde 2,3

Health Research Policy and Systems, volume 14, Article number: 37 (2016)


Background

Program evaluation is widely recognized in the international humanitarian sector as a means to make interventions and policies more evidence based, equitable, and accountable. Yet, little is known about the way humanitarian non-governmental organizations (NGOs) actually use evaluations.

Methods

The current qualitative evaluation employed an instrumental case study design to examine evaluation use (EU) by a humanitarian NGO based in Burkina Faso. This organization developed an evaluation strategy in 2008 to document the implementation and effects of its maternal and child healthcare user fee exemption program. Program evaluations have been undertaken ever since, and the present study examined the discourses of evaluation partners in 2009 (n = 15) and 2011 (n = 17). Semi-structured individual interviews and one group interview were conducted to identify instances of EU over time. Alkin and Taut’s (Stud Educ Eval 29:1–12, 2003) conceptualization of EU was used as the basis for thematic qualitative analyses of the different forms of EU identified by stakeholders of the exemption program in the two data collection periods.

Results

Results demonstrated that stakeholders began to understand and value the utility of program evaluations once they were exposed to evaluation findings and then progressively used evaluations over time. EU was manifested in a variety of ways, including instrumental and conceptual use of evaluation processes and findings, as well as the persuasive use of findings. Such EU supported planning, decision-making, program practices, evaluation capacity, and advocacy.

Conclusions

The study sheds light on the many ways evaluations can be used by different actors in the humanitarian sector. Conceptualizations of EU are also critically discussed.


Background

Humanitarian assistance organizations are increasingly investing in program evaluation to enhance performance, practice and accountability [ 1 – 5 ]. Yet, ensuring that knowledge derived from evaluation of humanitarian action, defined as the “ systematic and impartial examination of humanitarian action intended to draw lessons to improve policy and practice and enhance accountability ” [ 6 ], is actually used remains an important challenge [ 2 , 4 , 5 , 7 – 9 ]. A common difficulty highlighted by Hallam [ 4 ] is that “ too often, humanitarian evaluations exist as a disconnected process, rather than becoming embedded as part of the culture and mindset of humanitarian organisations ”. The literature offers few examples of evaluation strategies that have been integrated into a humanitarian aid program, used effectively, and documented over time [ 10 ]. Rare also are studies that document the perspectives of both knowledge producers (e.g. evaluators) and intended users [ 10 ].

The present article examines evaluation use (EU) by HELP ( Hilfe zur Selbsthilfe e.V. ), a German humanitarian non-governmental organization (NGO) based in Burkina Faso that has developed an evaluation strategy now embedded into the country’s healthcare user fee exemption program [ 11 – 14 ]. The exemption program was implemented in Burkina Faso in part because of the country’s high rates of mortality and morbidity and its context of economic poverty, in which user fees undermine the accessibility of health services for many [ 13 – 16 ]. Especially in the Sahel region, where HELP implemented its user fee exemption program, maternal and infant rates of malnutrition, morbidity and mortality are exceedingly high, as shown in WHO’s 2014 statistical report [ 13 , 14 , 17 ]. HELP’s program is aimed at exempting indigents, pregnant and breastfeeding women, as well as children under five, from user fees [ 13 ]. Similar user fee subsidies or exemption programs had been attempted in different West African countries [ 18 ], but planning, implementation, and evaluation were frequently insufficient and often only partial [ 19 , 20 ], and in general the measured impacts were smaller than expected [ 21 ]. Hence, while such exemption programs innovated upon previous practices in West Africa [ 22 ] and in some instances seemed promising [ 21 ], for a complex array of reasons, health sector deficiencies persisted and health indicators remained worrisome [ 21 , 23 , 24 ]. Thus, documenting and evaluating the implementation of innovative health financing programs has become increasingly necessary. West African decision-makers and practitioners have required empirical documentation on the processes and effects of user fee exemptions to ground their reflections, decisions and actions [ 18 , 22 , 23 , 25 , 26 ].

HELP had previously implemented an exemption program in Niger, which had been evaluated in 2007 at the request of its funding agency, the European Commission’s Humanitarian Aid and Civil Protection department (ECHO). The external evaluators were impressed by the HELP managers’ interest in evaluation findings and by their proactivity in implementing evaluation recommendations. Conscious that empirical evidence can support improvements in the humanitarian sector [ 23 , 26 ], HELP managers consulted those same external evaluators while planning the Burkina Faso user fee exemption program, hoping to render it more evidence based. Together, the external evaluators and HELP managers developed a formal evaluation strategy to be embedded within the user fee exemption program and requested, and were granted, a dedicated budget for it. Upon budget approval in 2008, HELP staff and the evaluators simultaneously developed both the Burkina Faso exemption program and the evaluation strategy aimed at documenting its implementation and effectiveness for purposes of accountability, program learning and improvement, and advocacy [ 8 , 11 ]. Indeed, evaluating HELP’s exemption program as it evolved in Burkina Faso would provide opportunities for HELP and its partners to learn from and improve the exemption program. Resulting documentation could also be used to enhance HELP’s transparency and accountability and to facilitate its advocacy for equitable access to healthcare. Advocating for equitable access to healthcare was also one of ECHO’s objectives and hence aligned with the funder’s own mission. These were the main motives driving HELP decision-makers and their partners, including a principal evaluator, to develop the evaluation strategy.

Ridde et al. [ 12 ] have described in detail 12 of the studies undertaken by HELP as part of the evaluation strategy (Box 1). Stakeholders of the strategy, referred to in this article as evaluation partners (EPs), were primarily HELP’s exemption program staff and the external evaluators, but also included the Sahel regional health director ( directeur régional de la santé , DRS), the district chief physicians ( médecins chefs de district , MCDs), and representatives from ECHO, as well as advocacy partners, including a journalist and a representative of Amnesty International.

Box 1 HELP evaluation studies from 2007 to 2011

Following an evaluability assessment of EU in Burkina Faso as part of the evaluation strategy described by Ridde et al. [ 12 ], it was clear the experiences of its EPs presented a rich opportunity to examine progressive EU over time [ 28 ]. More specifically, the present study is innovative in examining the different forms of EU in depth, using a diachronic approach to observe any variations in EU between 2009 and 2011 from the varied perspectives of the different EPs. EPs who had collaborated both on the Niger 2007 evaluation and on the evaluation strategy in Burkina Faso were able to discuss variations in EU between 2007 and 2011.

Evaluation use

Traditionally, EU has been viewed solely as the use of evaluation findings, referring, for example, to the application of evaluation recommendations [ 29 , 30 ]. In this view, after reading an evaluation report, staff in a humanitarian program aimed at alleviating malnutrition could, for example, strive to implement a recommendation to increase the supply of a given nutrient to toddlers of a given community. Current definitions of EU, however, include not only findings use but also process use, a term originally coined by Patton [ 31 ] to refer to the “ individual changes in thinking, attitudes, and behaviour, and program or organizational changes in procedures and culture that occur among those involved in evaluation as a result of the learning that occurs during the evaluation process ”. Patton [ 32 ] explained that process use could, for instance, manifest as “ infusing evaluative thinking into an organization’s culture ” [ 32 ], which might be seen in attempts to use more clear, specific, concrete and observable logic [ 31 ]. Humanitarian staff for the same nutritional program could, for example, learn during an evaluation process to specify clearer program objectives, beneficiary selection criteria, program actions and success indicators. Such process use could enhance shared understanding among them and potentially lead to program improvements and ultimately to lower rates of malnourishment. In the present study, we have attempted to attend to a broad spectrum of EUs by according no primacy to findings use over process use and by documenting unintended uses as well as uses that occurred over time in a cumulative or gradual manner.

The principal objective of the present study was to examine the diverse uses of evaluation findings and processes engendered by the evaluation strategy. A related objective was to examine whether any changes in EU occurred between 2009 and 2011. Hence, the focus was not on the use of a particular evaluation study, but more generally on how EU evolved over time, as the evaluation strategy was developed and more than 15 evaluation studies (Box 1) were conducted. For the present study, we employed an adapted version of Alkin and Taut’s [ 33 ] conceptualization of EU to ensure its diverse manifestations were identified. In their model, EU is either findings use (instrumental, conceptual, legitimative) or process use (instrumental, conceptual, symbolic). ‘Instrumental use’ involves direct use of evaluation-based knowledge for decision-making or for changing program practices [ 33 ]. ‘Conceptual use’ refers to indirect use of knowledge that leads to changes in the intended user’s understanding of program-related issues. ‘Symbolic use’ relates to situations in which those requesting the evaluation simply seek to demonstrate their willingness to undergo evaluation for the sake of reputation or status [ 29 , 33 ]. Lastly, ‘legitimative use’ occurs when evaluation findings are used to justify previously undertaken actions or decisions [ 33 ]. We adapted Alkin and Taut’s [ 33 ] conceptualization by integrating its symbolic and legitimative uses under the broader concept of ‘persuasive use’ to also account for what Estabrooks [ 34 ] described as using evaluation as a persuasive or political means to legitimize a position or practice. Leviton and Hughes [ 35 ] further clarify the interpersonal influence that is integral to persuasive use, explaining that it involves using evaluation-based knowledge as a means to convince others to subscribe to the implications of an evaluation and hence to support a particular position by promoting or defending it. We added this term to stress the point made by previous authors that persuasive forms of EU can also serve constructive purposes [ 35 , 36 ]. For instance, empirical evidence can be used persuasively to advocate for equity in global health. Symbolic and legitimative EU are terms that commonly carry negative connotations and are not easily applied to such constructive purposes. Persuasive use is included to draw attention to the different and concurrent ways in which evaluations can be used to influence reputations, judgment of actions or political positions.

Some examples may help clarify these different forms of EU. For instance, discussions during the evaluation process about the lack of potable water in a given village could lead intended users to think about strategies to bring water to the village; they might also recognize how helpful evaluations are in highlighting water needs for that village and how hard village locals have been working to fetch their water. These are forms of ‘conceptual process use’, in that intended users’ conceptions changed as a result of discussions during the evaluation process. Had such conceptual changes occurred as they learned of evaluation findings, this would have been ‘conceptual findings use’. Had intended users come to meet with locals and/or decided to dig a well, this would illustrate ‘instrumental process use’. It would have been ‘instrumental findings use’, had this decision to build a well been taken based on findings showing, for example, high morbidity rates associated with dehydration. Having already taken the decision to build the well, stakeholders could ask for an evaluation solely to empirically demonstrate the need for a well; this would be ‘legitimative use’. Or, they could have their well-building intervention evaluated without any intent or effort to use evaluations, but simply for ‘symbolic use’, to demonstrate their willingness to be evaluated. Then again, the well-building intervention could also undergo evaluation to provide convincing data that could be used in political claims advocating for human rights to potable water policies, thereby constituting ‘persuasive use’.

Research design

This evaluation used a qualitative single case study design and a descriptive approach to examine EPs’ discourses about EU over time [ 37 , 38 ]. This was an instrumental case study, in that HELP’s evaluation strategy was chosen for its ability to provide insight into EU [ 39 ]. To document the evolution of EU over time, two waves of data collection were conducted by the first author in Burkina Faso using a diachronic approach with an interval of 29 months (July 2009 and November 2011). The 2009 data collection lasted 5 weeks and employed individual interviews. The 1-month 2011 data collection involved individual interviews as well as one group interview. Documentation and non-participatory observation provided contextual complementary information.

Recruitment procedures

Objectives and procedures of the present study were explained to EPs upon soliciting their participation. When EPs responded positively, interviews were scheduled at a time and place of their convenience. Recruitment for individual interviews in 2009 and 2011 followed two purposeful sampling strategies [ 40 ]. The intensity sampling strategy (targeting persons intensely affected by the studied phenomenon) led us to recruit the principal evaluator and the NGO’s head of mission as the first participants [ 40 ]. Thereafter, the snowball sampling strategy was used, in which participants were asked to suggest other information-rich respondents. A conscious effort was made to limit the risks of ‘ enclicage ’ (a French term describing the risk that the researcher would be assimilated into a given clique and estranged from other groups and/or the larger group as a whole), as cautioned by Olivier de Sardan [ 41 ]. The extensive experience in the study context of one of the authors helped avoid such potential sampling biases. Data triangulation was also achieved by recruiting multiple participants with diverse relationships to HELP’s evaluation strategy as a means of obtaining varied perspectives and enhancing the study’s validity [ 42 ]. Such intra-group diversification was a conscious attempt to collect multiple viewpoints for a comprehensive appreciation of EPs’ individual and collective experiences [ 43 , 44 ].

Participants, data collection instrument and protocol

Semi-structured individual interviews were conducted in 2009 (n = 32; 15 respondents, 17 interviews) and in 2011 (n = 36; 17 respondents, 19 interviews) in Ouagadougou, Dori and Sebba. In each round of data collection, an extra interview was conducted with each of two EPs who had been particularly active and involved in the evaluation strategy and had more to say after a single interview; hence, the number of interviews exceeded the number of respondents by two in both collections. Table 1 presents the distribution of respondents for both data collections. Six EPs were interviewed in both 2009 and 2011. All EPs from HELP involved in the evaluation strategy were interviewed at least once, either in 2009 or 2011. EPs interviewed only in one data collection were either not working with HELP or out of the country during the other collection. Length of collaboration in the evaluation strategy ranged from 3 to 52 consecutive months for 16 EPs and was intermittent for the others. Eighteen EPs were locals from Burkina Faso, three were from West Africa, and five were international expats. Five were women: three held management positions, one was an evaluator, and another was a community outreach worker.

Individual interviews lasted an average of 60 minutes. Interviews (individual and group) were semi-structured and followed an interview guide that was flexible enough to evolve as the study progressed [ 40 ]. Questions were open-ended and solicited descriptions of EPs’ experiences and perceptions, as they had evolved over the course of the evaluation strategy, of (1) the evaluation strategy; (2) evaluation use; (3) collaboration with other EPs; and (4) the influence of evaluation upon them, other partners and their work environment. For most EPs, questions focused on the years 2009 to 2011, but those who had collaborated in the Niger evaluation were also free to recall their experiences starting in 2007. Specific examples of interview questions are presented in Box 2.

Box 2 Interview guide: examples of questions

The group interview was conducted at the start of the 2011 data collection period before the individual interviews, as a means of discerning interpersonal dynamics and spurring collective brainstorming on the general questions of the present study; it lasted 90 minutes. This was a small group (n = 3; a manager and two coordinators) of HELP personnel who had been responsible for evaluation-related activities. Inspired by Kitzinger’s [ 45 , 46 ] suggestions for focus groups, we used open-ended questions to foster interactions among them as a means of exploring emerging themes, norms and differences in perceptions regarding the evaluation strategy, EU and interpersonal dynamics among EPs. They were encouraged to explore different viewpoints and reasoning. Significant themes were later discussed in the individual interviews.

Interviews were conducted in French (Box 2), recorded digitally, transcribed and anonymized to preserve confidentiality. Transcripts were the primary data source for analyses.

Two additional sources of information provided insight into the study context, although not formal study data. Non-participant observation shed light upon EPs’ interpersonal dynamics and HELP’s functioning, as the first author spent 4 weeks during each of the two data collections in HELP’s offices interacting with HELP staff and with visiting partners. In 2011, she also accompanied HELP staff from all three sites on a 5-day team trip, during which a team meeting was held. Documents relevant to the evaluation strategy (e.g. evaluation plans and reports, scientific articles, policy briefs, meeting summaries, emails between EPs, advocacy documentation) were also collected to deepen understanding of the study’s context. These data provided opportunities for triangulating data sources, thereby strengthening the validity of EPs’ discourses.

Qualitative thematic content analyses were performed on the interview transcripts [ 47 ] using a mixed (inductive and deductive) approach and codebook. Coding and analysis were facilitated by the use of QDA Miner data analysis software. An adapted version of Alkin and Taut’s [ 33 ] model was used to identify and code different forms of EU. We used their conceptualizations of instrumental and conceptual EU but adapted the model, as mentioned earlier, by adding persuasive EU as a broad term encompassing the concepts of symbolic, legitimative and advocacy forms of EU. A specific code entitled ‘change’ was also created to capture any observations of changes related to EU mentioned and discussed by respondents in the 2011 interviews. For example, if a respondent in 2011 noticed that more evaluations had been conducted and disseminated and that this had led to more instances of EU, the code ‘change’ was applied to this sentence and integrated into the 2011 analyses and results (described below). Special attention was paid to ensuring that a broad range of EUs would be detected. After coding, we retrieved each type of EU and examined the coded excerpts for 2009 and for 2011 separately to identify and describe any apparent differences emerging from the respondents’ discourses on EUs between 2009 and 2011. In this manner, a thematic conceptual matrix was created, facilitating the organization and analysis of specific instrumental, conceptual and persuasive (including symbolic/legitimative) uses of evaluations in both 2009 and 2011. A summary of this matrix is presented in Table 2 [ 47 ].

The first author performed all the coding and analyses but met twice with a qualitative research consultant, six times with a co-author, and 10 times with a research colleague to discuss and verify the codebook and to ensure coding consistency and rigour over time (coding conferences). The iterative analysis process allowed for review of coded excerpts and hence continuity of the coding and interpretations. Attention was paid to capturing EPs’ interpersonal dynamics, as well as their individual and collective experiences over time [ 45 , 46 ]. As mentioned, both non-participant observation and documentation helped the first author gain a deeper understanding of HELP’s context, but neither was analyzed systematically, due to lack of time and because interview data were already abundant. Analyses were not systematically validated by a second researcher, but two EPs active in the evaluation strategy commented on and validated a draft of the present article.

The research was approved by the Ministry of Health of Burkina Faso. Ethical approval for the study was granted by the Research Ethics Committee of the University of Montreal’s Faculty of Arts and Sciences and by the Health Research Ethics Committee of the Ministry of Health of Burkina Faso.

Verification

Member checking was undertaken at various times and with different EPs to strengthen the validity of the findings [ 44 ]. For example, during data collections, the first author frequently verified her comprehension of the issues raised by EPs either during the interviews or after. The different themes emerging from analyses were discussed with several respondents to see whether they reflected EPs’ experiences and whether additional themes should be included. Drafts of the articles were sent by email to four participants who were thought to be most likely to have the time to read and comment on the drafts; two were able to respond to these member checking calls. Their feedback was always integrated into the iterative analysis process and usually also into the article drafts. Such member checking took place in informal discussions, during interviews and even in email correspondence. Other strategies were used to ensure responsiveness, sensitivity and reflexivity in the researcher’s approach and to support the validity of the present study [ 48 ]; these included co-coding and code discussions with a peer, using an iterative process in the analyses, peer debriefing (discussing the research methodology and analyses with academic peers), and keeping a log book of questions, ideas, challenges and decisions related to the study [ 49 , 50 ].

Results

We first present results on use of evaluation findings for 2009 and 2011, followed by results on use of evaluation processes for 2009 and 2011. In the 2011 interviews, respondents frequently mentioned EU examples similar to those presented in 2009. For the sake of brevity, we present only the examples from 2011 that cover new ground. Results are summarized in Table  2 ; it should be noted that the column on the left lists respondents speaking about use by intended users; hence, when external evaluators (EE) are indicated, it refers to themes discussed by evaluators about intended users’ EU, and not their own.

Use of evaluation findings in 2009 and 2011

Instrumental use of evaluation findings

In 2009, participants described various ways in which evaluation findings were used instrumentally. An evaluator was pleasantly surprised by HELP’s interest and proactivity in implementing recommendations from a previous evaluation in Niger in 2007 (Box 1: study 9): “ They took our recommendations into consideration and completely changed their practice and the way they intervened ” (EE3). A HELP staff member corroborated this account and described how they used evaluation findings to plan the exemption in Burkina Faso, paying specific attention to avoiding mistakes underscored in the previous evaluation report [ 51 ]. For example, as recommended by evaluators, HELP sought the collaboration of the DRS and MCDs, as representatives of the Ministry of Health (MoH), right from the start of the user fee exemption program in Burkina Faso instead of setting up its intervention in parallel to the State’s health system, as had unwisely been done in Niger. EPs also noted that evaluation findings had helped them identify and resolve problems in their program and its implementation. For example, a HELP staff member recalled learning about preliminary evaluation findings (Box 1: study 7) that indicated some intended beneficiaries did not know they could be exempted from user fees. In response, HELP increased its awareness-raising efforts through radio information sessions and pamphlets. EPs also spoke about how evaluation findings had been used to identify solutions that were concrete, locally meaningful and applicable. According to a HELP staff member and MoH representatives, some findings were not used immediately but guided planning and decision-making. For example, following the presentation of an action research report (Box 1: study 15, Dori), MoH representatives decided to incorporate the recommendations into the district’s annual plan, setting as priorities improving health service quality and raising awareness of the exemption.

The 2011 interviews revealed that findings were being used for similar purposes as in 2009, including to improve practices and to guide decisions. For example, three HELP staff members referred to evaluation findings that had helped them better identify, select and recruit eligible beneficiaries (Box 1: studies 6 and 14). In that study, findings highlighted that, while indigents were a target group of the exemption, little had been done to reach out to them. This led HELP staff to test and use an effective selection strategy for indigents. Additionally, findings showing that the cost to exempt indigents was lower than expected led to a decision to increase the number of indigent beneficiaries for each health centre. Another use noted by an EP was that evaluation findings validated their decision to advocate for free healthcare, which enabled HELP to pursue its actions in this direction. Participants noted that evaluation findings were also used to identify, explain and resolve certain challenges they encountered. For instance, HELP staff recalled findings from study 7 (Box 1) showing that some intended beneficiaries were being deceived by health centre staff into paying user fees. This valuable information was used to resolve the problem by investing in efforts to raise awareness about the exemption program, its services, target beneficiaries and criteria. Another example concerned findings that demonstrated medical staff were complying with and respecting norms for medical prescriptions, contrary to rumours that they had been issuing excessive and inappropriate prescriptions since the exemption for personal gain. This valuable information guided the responses of the medical supervisors in the field, who were reassured to learn they did not need to worry much about this issue. Findings from another evaluation on workload (Box 1: study 16) suggested that, while the exemption program did increase the medical staff’s workload, it did not correspond to WHO’s definition of work overload [ 52 ] . An MoH representative noted that these findings had helped him to organize and manage his health centre’s resources, motivate his healthcare staff, and better adapt to the increase in consultations. An MoH representative also said evaluation findings were used to acknowledge accomplishments, review objectives, and correct practices when necessary. A HELP staff member correctly noted that changes in their practices (instrumental use) were preceded by changes in awareness (conceptualization).

Conceptual use of evaluation findings

In 2009, respondents described a few instances of conceptual use of findings. One useful aspect of evaluation findings was that they provided the HELP staff with another, more external perspective. For example, one staff member observed that, at HELP, “ we have an internal vision because we work inside it ” and that evaluation findings (Box 1: study 12) could shed light on their partners’ views on various issues, such as when reimbursements for medical fees arrived late. HELP staff knew the reasons for this delay were outside their control, but “ it was interesting to see how the others [partners] perceived and sometimes criticized this; some even said it was because HELP was too late with reimbursements ” (HELP Staff (HS) 4). Similarly, a funding agency representative suggested that evaluation findings gave the agency a better understanding of people’s reactions to the exemption and, hence, of the field reality. Another EP suggested that findings pointed to deficiencies in the exemption program and were helpful in reflecting upon potential solutions: “ In my opinion, evaluations gave us a lot of experience and lessons to learn from ” (HS10).

In 2011, various EPs described how learning of the evaluation findings gave them a better understanding of the impacts of their work and of the exemption program. A HELP staff member recalled findings (Box 1: study 7) demonstrating that user fees were the primary barrier to healthcare accessibility, above and beyond geographical and cultural factors. Such findings validated the exemption program’s mission and counteracted previous arguments against user fee exemptions. Many of the findings also revealed positive effects of the exemption program on, for example, health service use. Consequently, another benefit of evaluation findings was that they boosted EPs’ motivations for their work:

“ I think this study [Box 1: study 3] was really useful and it had pretty important impacts on us. Speaking of the effects on the community, that was a motivating factor for us, it enabled us to see that by going in and out of the community all the time, we were actually bringing something ” (HS22).

An MoH representative noted that, after evaluation reports were presented and he had heard about the different findings, he felt more capable when examining the health centre’s clinical data or even when dealing with his patients. One EP explained how some findings had changed his conception of the exemption and of program evaluation. He realized evaluations could detect the multiple effects of interventions, including some unexpected ones. For example, findings revealed that mothers felt empowered since the exemption implementation, as they could consult without their husbands’ approval and money [ 53 ]. Another participant also observed that hearing about evaluation findings changed many EPs’ receptivity to program evaluation. EPs were more forthcoming and followed evaluation activities better after attending report-presentation workshops (French: ateliers de restitutions ) and hearing about the different evaluation findings. He recalled health workers saying, “… the evaluators ‘come take our data and leave!’ but after attending report-release workshops, they understood the findings and their utility; it encourages them to collaborate ” (HS2). Participants also believed evaluation findings enhanced their capacities and their understanding of the field reality.

Persuasive use of evaluation findings

In 2009, persuasive use of evaluation was alluded to by EPs describing how evaluations supported their advocacy work. One HELP staff member said HELP’s major challenge was to disseminate evidence and convince its partners. Another explained their advocacy strategy, which involved partnering with the regional MoH (DRS and MCDs) and having them disseminate evaluation findings at national MoH meetings. One participant observed that Burkina Faso’s political decentralization facilitated the participation of the regional and district level MoH representatives, since they did not need consent from their national counterparts. The overarching goal was to convince policymakers of the benefits of user fee exemptions. HELP staff and MoH EPs suggested that the evaluation strategy validated their exemption work and bolstered their advocacy: “ We hope that maybe, with the expected results, a funding agency […] perhaps even the State, can participate [in the exemption]”. Hence, HELP used findings persuasively to try to convince regional and national politicians to support and scale up the exemption in Burkina Faso. One EP noted that findings were used in project proposals and reports as a means to convince others of the worthiness of pursuing HELP’s exemption program.

In the 2011 interviews, EPs also spoke of using evaluation findings to influence partners and policymakers. HELP staff recalled partnering with University of Montreal researchers to produce and compile evidence on HELP’s exemption program. Their studies demonstrated the value of the exemption, thereby establishing the pillars of HELP’s advocacy work. Evidence suggested that lifting the financial barriers to health access was commendable and logical. HELP staff recalled presenting findings to the MoH at national and international conferences to promote adoption of a national exemption program. Some also spoke about partnering with Amnesty International to advocate for evidence-based policymaking by the State [ 24 ]. HELP frequently shared scientific documentation with its funding agency, advocating for a national exemption program. An evaluator acknowledged HELP’s limited success in convincing politicians to adopt and scale up the exemption program, which sometimes led HELP and its partners to question “ …the use of all our work? ” (EE8). He explained how HELP and the evaluation strategy’s decision-makers had opted to end the evaluation strategy activities gradually, as it had already produced sufficient knowledge on essential questions, and to focus instead on HELP’s advocacy to find ways to increase politicians’ use of scientific evidence. Funding agency representatives criticized HELP’s persuasive use, suggesting that HELP needed to be more proactive in its advocacy strategy to seek and seize every diffusion opportunity:

“ I have the impression that HELP doesn’t really know how to show the value of its research […] Diffusion activities were good but I think they could have done even better. One example is the last diffusion activity; they weren’t able to meet with the Ministry of Health, even though this is a key stakeholder ” (ECHO representative) .

Meanwhile, HELP staff suggested that further targeting diffusion efforts to community members would benefit the exemption program’s activities. One difficulty with this, alluded to by an MoH representative, was the necessity of translating many of the presentations into local languages, as many in the community did not speak French. An evaluator explained how financial constraints led to the prioritization of knowledge transfer (KT) activities targeting political leaders, in hopes this would produce greater impacts. Nevertheless, he explained how evaluators with HELP had sought creative means, such as policy briefs and short films, to reach a diverse audience, focusing particularly on policymakers.

In both 2009 and 2011, one challenge underscored by EPs was that of interesting policymakers in these evidence-based findings and in the exemption itself. In 2009, the discourse was hopeful, while the 2011 interviews expressed more disappointment and doubt regarding the feasibility of advocacy objectives. From the 2011 interviews, it was clear that HELP had used evaluation findings to try to persuade others of the value of the exemption program. Whether they succeeded in their persuasive attempts is another interesting question, distinct from the present article’s focus specifically on EPs’ own use.

Overall, EPs described instances of instrumental, conceptual and persuasive use of findings in both 2009 and 2011. However, they discussed using more evaluations in 2011 than in 2009. One evaluator asserted that there was so much more EU by EPs in 2011 that it was not comparable to 2009. An evaluator also suggested this was because only one study, along with the action research project, had been finalized by the time of our first data collection in 2009. EUs were also described in greater detail by EPs in 2011 than in 2009.

Use of evaluation processes in 2009 and 2011

Instrumental use of evaluation processes

Recommendations are often associated with findings, as they are frequently presented in the final evaluation report. However, in 2009, EPs recalled various lessons already learned during the evaluation process. For example, HELP staff recalled having discussions with evaluators and pointing out a problem, which was that the eligibility criterion for HELP’s user fee exemption for breastfeeding mothers was too vague, because breastfeeding duration varies widely across mother/baby pairs (Box 1: study 13). Based on discussions during the evaluation process, HELP stakeholders operationalized mothers’ eligibility to 2 years following a baby’s birth, and this information was then shared via guidelines disseminated to all health centres. Further, EPs who had been involved in the 2007 evaluation in Niger (Box 1: study 9) recalled learning that, because the evaluation had only been organized near the end of the project, it was not possible to use a pre–post design, which would have been the most meaningful methodologically. Having learned from this experience, HELP coordinators consulted the evaluator while planning their Burkina Faso exemption program to ensure pre–post designs could be used in the evaluations to measure the program’s efficacy more reliably. The coordinators had worked first in Niger and then in Burkina Faso and, hence, carried over such lessons. An evaluator recalled how his being consulted at the beginning of the Burkina Faso program led HELP stakeholders to delay implementing the exemption there in order to collect baseline data, despite the ethical dilemma that delaying the exemption meant delaying saving lives. Process discussions clarified that, irrespective of when the exemption would be implemented, the duration of the program was fixed and therefore the number of lives saved in the given time frame would be identical. Moreover, if careful planning led to more convincing evidence of the exemption’s beneficial effects, HELP’s advocacy would have greater persuasive power. It was also made clear that funding a series of evaluations could produce useful knowledge for advocacy. Stakeholders made use of these discussions and decided (instrumental process use) to seek funds from a funding agency. They received funding to develop the evaluation strategy, which evolved over time into an extensive series of evaluations. New collaborations and networks with different African institutions were also born out of this initial evaluation partnership.

In 2011, an evaluator suggested that the initial collaboration process between HELP and evaluators had stimulated a proliferation of partnerships and networks among EPs, which developed further into their own respective documentation and advocacy projects. An MoH representative reported having learned a great deal about writing research protocols while collaborating with the external evaluators, which subsequently led him to write his own internal research protocol. Another MoH representative also recalled an evaluation of obstetric service use in which community members were, to his surprise, stakeholders in the research process even though they had little education (Box 1: study 8). He quickly realized the added value of their participation, as they gradually understood and supported the findings, became more proactive than usual, and identified sensible means of increasing obstetrical service use. Another instrumental use described by an evaluator and an MoH representative was that their collaboration may have sparked some EPs’ interest and motivation to develop their capacities further, as several subsequently chose to pursue graduate studies in health research. The evaluator believed that, for some EPs, the experience of networking with researchers and developing new contacts with local and international supervisors may have facilitated admissions to graduate schools and scholarships.

Conceptual use of evaluation processes

In the 2009 interviews, HELP staff described experiencing capacity building during evaluations and said their methodological, conceptual and technical understanding of the different research phases had been reinforced or updated. A HELP coordinator suggested his comprehension of public health had also improved during evaluations, which aided his management of the NGO. Other conceptual changes were noted. As another HELP staff member explained, “ What was good was that we were participating and engaging [in the evaluations] so it was not something external that just fell upon us… the fact that we had to ask questions meant we had to think about it ” (HS2). Through this process, they realized they could ask pertinent questions that strengthened their confidence. One HELP staff member said that participating in evaluations sparked a “ spirit of curiosity ” necessary to ask research questions and stimulated a sense of agency in pursuing answers. He believed more needed to be done to maintain such capacities and make the staff more autonomous. Another HELP staff member described how EPs’ interactions facilitated discussions and fostered the development of a common vocabulary infused with values such as scientific rigour and evaluation use. An evaluator believed evaluation processes had also led to the harmonization of EPs’ perceptions of the exemption and its impacts.

In 2011, EPs conveyed numerous examples of conceptual process use, including capacity building in evaluation (conceptualization, application and practice). An evaluator reported improvements over time in many of the HELP staff’s research, professional and management skills. One HELP staff member said working closely with evaluators was a source of inspiration, guidance and feedback that made him feel stronger and supported. Some reported that participating in evaluations helped their thinking become more rigorous, gave them another perspective on the program, highlighted the importance of measuring program effects and heightened their receptivity to evaluation. Another HELP staff member noted that it was when EPs really got involved in evaluations that they began to understand the findings and the value of evaluation, which in turn facilitated integration of EU into the HELP organization. A HELP staff member said that participating in the evaluation dissemination process had many benefits, because the preparation and interactions involved required them to reflect more actively on the findings, which, in turn, enhanced their assimilation of the findings, making them more applicable. In his opinion, evaluation processes deepened and harmonized partners’ understanding of the exemption program, helping them find a common direction. A HELP coordinator also said, “ By rubbing shoulders with the evaluation culture, we were won over! ” (HS7). He described staff as being more prudent in their communications, using language that was measured, succinct, goal-oriented, scientific and evidence-based: “ It prevents us from arguing over facts that are not backed up ” (HS7). Another HELP staff member learned that precise communication with evaluators was helpful in obtaining results in tune with his information needs. An EP explained how the evaluation strategy expanded their professional networks, which facilitated information sharing and knowledge transfer. For all these reasons, various respondents believed other humanitarian NGOs involved in emergency action would also benefit from documenting the effects of their work.

Descriptions of conceptual process use examples changed between 2009 and 2011 as EPs suggested they had learned a great deal about evaluation, which changed their attitudes and behaviour with regard to evaluation activities. In 2011, respondents had more to say and were more enthusiastic about sharing the changes in their work, attitudes and understanding brought on by evaluation. Conceptual use appeared to have increased over time. Looking back over the evolution of the strategy, an evaluator highlighted the fact that the first evaluation activities, which proved useful for HELP, opened the way for further, progressive development of the evaluation strategy as new funding was granted for each successive phase of the exemption project. In 2009, EPs were eager to hear about the evaluation findings, but once the evaluations were completed and the results shared, EPs became much more receptive to evaluators and convinced that program evaluation was pertinent for HELP. The evaluator pointed out that, as evaluation questions were answered, more were raised, and the evaluation strategy team developed progressively more evaluation activities. This was corroborated by documentation produced and shared by the evaluation strategy team. Thereafter, EPs used evaluation findings more frequently and EU became progressively mainstreamed into HELP’s exemption program.

Persuasive use of evaluation processes

In both 2009 and 2011, no respondent described any form of persuasive process use. In no instance did EPs describe having engaged in the evaluation process simply to satisfy the wish of their funding agency, to promote their own reputation or to convince others. As noted earlier, some spoke about engaging in the evaluation process, but their focus was more on using the findings than on the evaluation process itself.

The 2011 interviews shed light on the dynamics between some HELP staff and evaluators that inevitably influenced evaluation processes and perhaps EU. While these conditions influencing EU are a topic of their own to be covered in a future article, a few details provide valuable insight into the present study findings. For example, participants suggested that some HELP staff were reluctant to participate in the evaluation process partly because they did not completely trust the motives of evaluators who, according to them, may have been more concerned about furthering their research careers than about HELP’s actual mission. They expressed their discomfort to colleagues and to evaluators, but did not object to the conduct of evaluations and, in the end, found them useful.

As described in the methods section, non-participant observation and documentation provided valuable contextual information on the evaluation strategy and EPs. While systematic analysis of these data was not feasible due to time constraints, both sources provided relevant information. Non-participant observation enabled the first author to become immersed in the study context, to detect welcoming, collaborative and friendly dynamics between most EPs, and to observe that EPs were generally at ease in communicating with each other about questions and concerns. Certain other dynamics were also apparent, such as the relatively peaceful and friendly interactions between HELP staff and EPs. HELP staff tended to joke, tease one another, and laugh together. They had social gatherings on evenings and weekends. It was also apparent that some HELP staff tended to have more affinity than others with evaluators. All evaluators were warmly welcomed by HELP staff. While reluctance to trust evaluators’ motives was discussed only in individual interviews, informal discussions revealed that these issues had been discussed explicitly in team meetings. Team meetings appeared to foster frank and direct communication. Even so, various participants mentioned that, in Burkina Faso, anyone dealing with politics learns to communicate using a “ langue de bois ”, a diplomatic way of avoiding discussing sensitive issues directly, and this was indeed observed in interviews and interpersonal dynamics.

Collected documentation relating to the evaluation strategy and to collaborations among EPs also helped the first author become immersed in the working dynamics of EPs. It corroborated EPs’ discourses about increasing efforts over time to formalize agreements together by documenting contracts, report presentations and collaboration plans. Documents relating to evaluation activities and results (e.g. reports, scientific articles, policy briefs) proliferated between 2009 and 2011, supporting EPs’ descriptions of an increase in evaluation activities and EU over time. Emails between the principal evaluators and HELP coordinators were frequent from 2009 and too numerous to examine systematically, but generally their content demonstrated frank and transparent problem-solving, brainstorming and sharing of information about activities, events and scientific articles. As noted earlier, these forms of data were collected by the first author to complement the individual and group interview data and as a means of becoming better acquainted with the EPs’ working environment.

Discussion

The present study enabled us to identify and provide rich descriptions of the different forms of EU in which EPs engaged between 2009 and 2011, as HELP’s evaluation strategy was rolled out. Descriptions of EU, including instrumental, conceptual and persuasive use of findings and/or processes, were generally more elaborate and specific in 2011, and EPs emphasized that EU had increased since 2009. EPs described all the forms of EU found in Alkin and Taut’s [ 33 ] categories, with the exception of persuasive (and symbolic) process use. Indeed, evaluation findings were used instrumentally by EPs for numerous purposes, including to identify program malfunctions and come up with solutions, to guide decisions and actions, and to manage and motivate colleagues. EPs also used findings conceptually in many ways, such as learning to see their program and work from an external perspective, recognizing the value of the exemption program and of their own work, communicating and motivating staff, and gaining an appreciation for the field reality and for program evaluation. EPs also used findings in a persuasive manner to convince others to support and scale up the exemption program. Persuading political decision-makers proved challenging, which corroborates Dagenais et al.’s [ 8 ] findings in the same national context and points to the common difficulty of making policymaking more evidence-based [ 54 , 55 ]. It became clear by 2011 that scientific knowledge was abundant and accessible to anyone interested, and therefore the evaluators felt they had done their work. It had also become clear that, to conserve the scientific rigour and neutrality expected of university researchers, the principal evaluators had to rethink their involvement in advocacy activities. Negotiating where KT ended and advocacy began presented an interesting challenge for external evaluators, HELP coordinators and other EPs. Financial limitations also led to difficult decisions regarding what KT activities could be undertaken, by whom, and for whom.

Participating in evaluations also prompted many instances of process use. Overall, the evaluation process provided countless opportunities for EPs to reflect upon their program and how they worked together and interacted. It provided opportunities to develop partnerships, communicate problems, and identify and implement potential solutions. It was clear, however, that issues of mistrust regarding evaluators’ motives and the allocation of evaluation resources were still taboo for several participants and not discussed openly among EPs. This may have negatively influenced their collaboration. Finding ways to overcome such challenges might result in more successful collaboration, evaluation participation and EU. Nevertheless, evaluation activities led EPs to learn about their program, evaluation processes and research methodology. By engaging in evaluations and interacting with evaluators, EPs learned to think in a different way about programs and scientific rigour. Since Patton’s original work [ 56 ] on utilization-focused evaluations, which described the benefits of participatory approaches and process use, many authors have documented the importance of engaging participants in the evaluation process [ 5 , 57 – 62 ]. The literature suggests that participation should ideally begin at conceptualization of an evaluation study [ 31 ]. While this may be ideal, the limited time and financial resources common to humanitarian practitioners, including in HELP’s organizational context, led some EPs to disinvest or invest only partially in the evaluation strategy. This was a source of frustration for evaluators and those more invested in the evaluation strategy. Yet, some EPs described how participating principally in the dissemination phase was helpful to them as a creative way of dealing with this issue of limited time, as it led them to invest in and reflect upon all the previous phases of evaluation that had led to the results they were mandated to present. This is an interesting option to consider when participating in all stages of all the evaluations is impossible, as it was for some EPs.

The reason for the absence of persuasive (symbolic) process use was not explained by our respondents, but Højlund’s [ 63 ] thoughts on an organization’s internal propensity and its external pressures to engage in evaluations provide interesting insights. More specifically, from the individual and group interview data, it was clear that, while HELP’s funders had requested the first evaluation, EPs felt little external pressure to undertake evaluations. The propensity to evaluate came from the inside, primarily from HELP’s coordinator, and the overall motives for evaluation were clear: to have credible findings to inform advocacy for accessible health services, and to learn about and improve the exemption program. Engaging in an evaluation process for symbolic reasons simply did not seem to be a concern for EPs. Respondents intended to use the evaluation findings, but not the process, for persuasive purposes.

A frequent challenge during the present study was to determine what exactly sparked EU. For instance, in the section above on instrumental process use in 2009, we discussed how evaluation discussions led participants to reconsider their approach and to seek more evaluation resources, develop the evaluation strategy, and form new collaborative networks and partnerships. It is difficult to pinpoint exactly when and why such attitude changes and decisions occurred. Were they prompted directly by discussions during an evaluation activity, which would clearly fall under process use, or did they arise simply from EPs being immersed in an evaluation strategy and thus in frequent interaction and communication with evaluators? This points to a limitation of the present study associated with respondents’ difficulty in recalling specifically what triggered a given decision or action. This issue was discussed by Leviton and Hughes [ 35 ], who described how, under such conditions, it is difficult to decipher where conceptual use ends and instrumental use begins and, in turn, to categorize use according to a specific EU taxonomy such as that of Alkin and Taut [ 33 ].

In the real-world setting of the present study, instrumental, conceptual and persuasive uses often overlapped and were not easily teased apart. Not surprisingly, existing EU taxonomies have received their share of criticism for operationalization challenges or for constraining the scope of evaluation consequences [ 64 – 66 ]. We encountered this challenge of limited scope when, for example, EPs discussed long-lasting effects the evaluation process had on them (e.g. expanded professional network, increased funding for the evaluation strategy). While we were able to decipher the source of such effects well enough to categorize them using Alkin and Taut’s [ 33 ] EU taxonomy, Kirkhart’s [ 66 ] integrated theory of evaluation influence is admittedly better adapted to such situations. Kirkhart urged researchers to expand the scope of EU by acknowledging the full range of evaluation influences and suggested that existing conceptualizations of EU tend to overlook the value of process use and of uses that occur unintentionally or incrementally over time [ 66 ]. However, that model would also have presented its share of challenges, as our respondents were frequently unable to provide specific information about the source, intentionality or timeframe of influence, the three principal dimensions of the model. Providing such information was difficult for them, possibly because of the sheer number of evaluation activities undertaken as part of the evaluation strategy. We therefore concur with other authors that Alkin and Taut’s [ 33 ] taxonomy of EU remains relevant [ 10 ], as we found that it facilitated our in-depth examination of the multiple facets and specific forms (instrumental, conceptual, persuasive) of EU processes and findings over time. We agree with Mark [ 67 ] that, rather than reinventing the wheel, a reasonable solution is to see the concept of evaluation use not as competing with that of evaluation influence but rather as complementary to it. This may help researchers, evaluators and intended users attend to an evaluation’s broad array of potential consequences when planning for, conducting or studying evaluations [ 67 ].

Another potential limitation of the study stems from the high mobility and turnover among participants, such that we were able to capture the evolving perspectives of only six EPs over the two data collections. Clarke and Ramalingam [ 68 ] noted that high turnover is common in humanitarian NGOs and presents both challenges (e.g. loss of organizational memory) and opportunities (e.g. bringing on new staff in line with evolving program objectives). Interviewing the same participants in both phases of the study might have produced different results, but the present findings reflect change processes that are common in the humanitarian sector. Patton [ 69 ] described turnover as the Achilles’ heel of utilization-focused evaluation and discussed the importance of working with multiple intended users so that the departure of one is not necessarily detrimental to EU. Such a challenge and solution apply to the present study, in which our aim was to follow multiple intended users who were present for either part or all of the study period. In fact, those interviewed in both data collections were four of the primary intended users (from HELP), an external evaluator, and an MoH representative. Hence, the study enabled us to examine the evolution of EU and how it was influenced by interpersonal dynamics and changing realities, such as turnover, that are common to many humanitarian NGOs, through the perspectives of EPs who had experienced the evaluation strategy in a variety of ways.

A third potential limitation is that all three authors have, over time and to different degrees, developed professional and friendly relationships with various EPs, the second and third authors having acted as consultants for HELP. In a collaboration that evolves over time, this is not surprising and perhaps sometimes even desirable, but it may make it difficult to maintain the neutrality required of an external evaluator. Mitigating these human dimensions while navigating the numerous potential evaluator roles, as described by Patton and LaBossière [ 70 ], may have led to forms of normative discourse. Nevertheless, it is worth noting that the first author completed the research in total independence and without interference from HELP in the data. She undertook the study without payment and received only periodic material or logistical support from HELP when necessary to conduct the data collection. Also, only the first author, who never worked as a consultant for HELP, conducted the interviews and analyzed and interpreted the data. While most evaluation studies have examined a single evaluation study or a specific evaluation program at one point in time [for examples see 10], the present study examined EU over time, with data collections separated by 29 months, during an ongoing series of evaluation studies that were part of an evaluation strategy that originated from a single evaluation study in Niger in 2007. This was challenging because the literature provided few examples to guide the conceptualization and conduct of the present study. Yet, this was also the strength of the study, as it presented an innovative standpoint from which to examine EU. Future research may provide further guidance for the study of EU following a single evaluation or multiple evaluations embedded within an organization’s routine operations. Clearly, in our study context, evaluation partners’ EU evolved over time, and the study’s design enabled us to decipher the multiple forms in which EU occurred, including not only instrumental and conceptual forms of process and findings use, but also persuasive findings use. The study’s methodology was bolstered by our ability to seek out multiple groups of participants and thereby to triangulate perspectives. An important new contribution of the present study is, in fact, that it presents the views of both evaluators and intended users.

In 2004, a report by WHO emphasized the need to enhance the use of empirical knowledge in the health sector [ 23 ]. The following year, WHO members pledged to achieve universal healthcare and again highlighted the importance of using empirical evidence to guide global health policymaking and practices [ 26 ]. Nevertheless, how exactly are evaluations performed and used in global health and humanitarian contexts? Henry [ 65 ] pointed out that most of the EU literature is theoretical or conceptual and that very little of it examines EU systematically. Sandison [ 9 ] and Oliver [ 71 ] described how empirical research on EU within humanitarian organizations is particularly rare. HELP’s user fee exemption program presented an opportunity to include an evaluation strategy to study and document the processes, challenges, successes and impacts of the program. Simultaneously, this evaluation strategy itself presented an exceptional opportunity to study and understand how evaluations can be both useful and actually used in the humanitarian context. In examining EU resulting from HELP’s evaluation strategy, the present case study helps bridge the knowledge-to-action gap by shedding light on the different ways HELP and its partners used evaluations. By studying how they collaborated to infuse EU into their practice and by examining how their discourses on EU evolved between 2009 and 2011, we determined that they increasingly used evaluation processes and findings instrumentally and conceptually, and used evaluation findings persuasively. Such uses served the mission of HELP’s exemption program in numerous ways by, among other things, supporting its members’ ability to think critically, improving their collaboration, identifying problems in the program and potential solutions, facilitating decision-making, and supporting HELP’s advocacy activities. In March 2016, we learned that Burkina Faso’s Ministerial Council [ 72 ] had announced that, by April 2016, a national policy would be implemented to provide free healthcare for children under five and pregnant women, and to give women free access to caesarean sections and deliveries as well as to breast and cervical cancer screenings. While numerous barriers remain between empirical knowledge and its uptake in the political arena, and while it seems particularly difficult to use pilot studies to inform public policymaking [ 21 ], there is little doubt that HELP’s pilot exemption program and its associated evaluation strategy and advocacy activities, along with the work of partner organizations, played an important role in inspiring Burkina Faso’s recent policies. In a subsequent paper, we will discuss our analyses of the conditions that appear to have influenced EU among HELP’s evaluation partners.

Abbreviations

DRS: Directeur régional de la santé (regional health director)
ECHO: European Commission’s Humanitarian Aid and Civil Protection department
EP: Evaluation partner
HELP: Non-governmental organization Hilfe zur Selbsthilfe e.V.
KT: Knowledge transfer
MCD: Médecin chef de district (district chief physician)
MoH: Ministry of Health
NGO: Non-governmental organization

References

Darcy J, Knox Clarke P. Evidence & knowledge in humanitarian action. Background paper, 28th ALNAP meeting, Washington, DC, 5–7 March 2013. London: ALNAP; 2013.

Beck T. Evaluating humanitarian action: an ALNAP guidance booklet. London: ALNAP; 2003.

Crisp J. Thinking outside the box: evaluation and humanitarian action. Forced Migration Review. 2004;8:4–7.

Hallam A. Harnessing the power of evaluation in humanitarian action: An initiative to improve understanding and use of evaluation. ALNAP working paper. London: ALNAP/Overseas Development Institute; 2011.

Hallam A, Bonino F. Using evaluation for a change: insights from humanitarian practitioners. London: ALNAP/Overseas Development Institute; 2013.

ALNAP. Evaluating humanitarian action using the OECD-DAC criteria: an ALNAP guide for humanitarian agencies. London: ALNAP/Overseas Development Institute; 2006. http://www.alnap.org/pool/files/eha_2006.pdf . Accessed 11 January 2016.

Harvey P, Stoddard A, Harmer A, Taylor G, DiDomenico V, Brander L. The state of the humanitarian system: Assessing performance and progress. A pilot study. ALNAP working paper. London: ALNAP/Overseas Development Institute; 2010.

Dagenais C, Queuille L, Ridde V. Evaluation of a knowledge transfer strategy from a user fee exemption program for vulnerable populations in Burkina Faso. Global Health Promotion. 2013;20 Suppl 1:70–9. doi: 10.1177/1757975912462416 .

Sandison P. The utilisation of evaluations. ALNAP Review of Humanitarian Action in 2005: Evaluation utilisation. London: ALNAP/Overseas Development Institute; 2006. http://www.livestock-emergency.net/userfiles/file/common-standards/ALNAP-2006.pdf . Accessed 11 January 2016.

Cousins JB, Shulha LM. A comparative analysis of evaluation utilization and its cognate fields of enquiry. In: Shaw I, Greene JC, Mark M, editors. Handbook of evaluation: policies, programs and practices. Thousand Oaks: Sage Publications; 2006. p. 233–54.

Ridde V, Heinmüller R, Queuille L, Rauland K. Améliorer l’accessibilité financière des soins de santé au Burkina Faso. Glob Health Promot. 2011;18(1):110–3. doi: 10.1177/1757975910393193 .

Ridde V, Queuille L, Atchessi N, Samb O, Heinmüller R, Haddad S. The evaluation of an experiment in healthcare user fees exemption for vulnerable groups in Burkina Faso. Field ACTions Science Reports. 2012;Special issue 7:1–8.

Ridde V, Queuille L. User fees exemption: One step on the path toward universal access to healthcare. 2010. http://www.usi.umontreal.ca/pdffile/2010/exemption/exemption_va.pdf . Accessed 11 January 2016.

HELP. Annual Report 2008. Bonn: HELP-Hilfe zur Selbsthilfe e.V.; 2008. http://www.help-ev.de/fileadmin/media/pdf/Downloads/HELP_Annual_Report_engl_web.pdf . Accessed 22 November 2009.

INSD. La région du Sahel en chiffres. Ouagadougou: Ministère de l’Économie et des Finances; 2010.

World Health Organization. World health statistics 2007. Geneva: WHO; 2007.

World Health Organization. World Health Statistics 2014. Geneva: WHO; 2014.

Traoré C. Préface. In: Ridde V, Queuille L, Kafando Y, editors. Capitalisation de politiques publiques d'exemption du paiement des soins en Afrique de l'Ouest. Ouagadougou: CRCHUM/HELP/ECHO; 2012. p. 5–8.

Ridde V, Robert E, Meessen B. A literature review of the disruptive effects of user fee exemption policies on health systems. BMC Public Health. 2012;12:289.

Olivier de Sardan JP, Ridde V. Public policies and health systems in Sahelian Africa: theoretical context and empirical specificity. BMC Health Serv Res. 2015;15 Suppl 3:S3.

Ridde V. From institutionalization of user fees to their abolition in West Africa: a story of pilot projects and public policies. BMC Health Serv Res. 2015;15 Suppl 3:S6.

Ridde V, Queuille L. Capitaliser pour apprendre et changer les politiques publiques d'exemption du paiement des soins en Afrique de l'Ouest: une (r)évolution en cours? In: Ridde V, Queuille L, Kafando Y, editors. Capitalisation de politiques publiques d'exemption du paiement des soins en Afrique de l'Ouest. Ouagadougou: CRCHUM/HELP/ECHO; 2012. p. 9–14.

World Health Organization. World Report on Knowledge for Better Health: Strengthening Health Systems. Geneva: WHO; 2004.

Amnesty International. Burkina Faso: Giving life, risking death. Time for action to reduce maternal mortality in Burkina Faso. Index number: AFR 60/001/2010. London: Amnesty International; 2010.

World Conference on Science. Excerpts from the declaration on science and the use of scientific knowledge. Sci Commun. 1999;21(2):183–6.

World Health Organization. The World Health Report: Research for Universal Health Coverage. Geneva: WHO; 2013.

Ridde V, Diarra A, Moha M. User fees abolition policy in Niger. Comparing the under five years exemption implementation in two districts. Health Policy. 2011;99:219–25.

D’Ostie-Racine L, Dagenais C, Ridde V. An evaluability assessment of a West Africa based non-governmental organization's (NGO) progressive evaluation strategy. Eval Program Plann. 2013;36(1):71–9.

Shulha LM, Cousins JB. Evaluation use: theory, research, and practice since 1986. Eval Pract. 1997;18(3):195–208.

Herbert JL. Researching evaluation influence: a review of the literature. Eval Rev. 2014;38(5):388–419.

Patton MQ. Utilization-focused evaluation. 4th ed. Los Angeles: Sage Publications; 2008.

Patton MQ. Process use as a usefulism. N Dir Eval. 2007;116:99–112.

Alkin MC, Taut SM. Unbundling evaluation use. Stud Educ Eval. 2003;29:1–12.

Estabrooks C. The conceptual structure of research utilization. Res Nurs Health. 1999;22:203–16.

Leviton LC, Hughes EFX. Research on the utilization of evaluations. Eval Rev. 1981;5(4):525–48.

Weiss C. Introduction. In: Weiss C, editor. Using Social Research in Public Policy Making. Lexington, MA: Lexington Books; 1977.

Yin RK. Enhancing the quality of case studies in health services research. Health Serv Res. 1999;34(5 Pt 2):1209.

Yin RK. Case study research: design and methods. Thousand Oaks: Sage publications; 2014.

Stake RE. Case studies. In: Denzin NK, Lincoln YS, editors. Strategies of qualitative inquiry. 2nd ed. Thousand Oaks: Sage; 2003.

Patton MQ. Qualitative evaluation and research methods. 2nd ed. New York: Sage; 1990.

Olivier de Sardan JP. L’enquête socio-anthropologique de terrain : synthèse méthodologique et recommandations à usage des étudiants. Niamey, Niger: LASDEL (Laboratoire d’études et recherches sur les dynamiques sociales et le développement local); 2003.

Creswell JW, Plano Clark VL. Designing and conducting mixed methods research. Thousand Oaks: Sage Publications; 2006.

Pires AP. Échantillonnage et recherche qualitative: essai théorique et méthodologique. In: Poupart J, Deslauriers J-P, Groulx L-H, Laperrière A, Mayer R, Pires AP, editors. La recherche qualitative: Enjeux épistémologiques et méthodologiques. Montréal: Gaëtan Morin; 1997. p. 113–67.

Stake RE. Qualitative research: Studying how things work. New York: The Guilford Press; 2010.

Kitzinger J. The methodology of Focus Groups: the importance of interaction between research participants. Sociol Health Illness. 1994;16(1):103–21.

Kitzinger J. Qualitative research: introducing focus groups. BMJ. 1995;311(7000):299–302.

Miles MB, Huberman M. Qualitative data analysis: an expanded sourcebook. 2nd ed. Newbury Park: Sage Publications; 1994.

Morse JM, Barrett M, Mayan M, Olson K, Spiers J. Verification strategies for establishing reliability and validity in qualitative research. Int J Qualitative Methods. 2002;1(2):1–19.

Patton MQ. Qualitative research. Wiley Online Library. 2005. doi: 10.1002/0470013192.bsa514 .

Ritchie J, Lewis J, Nicholls CM, Ormston R. Qualitative research practice: a guide for social science students and researchers. New York: Sage; 2013.

Ridde V, Diarra A. A process evaluation of user fees abolition for pregnant women and children under five years in two districts in Niger (West Africa). BMC Health Serv Res. 2009;9:89.

Antarou L, Ridde V, Kouanda S, Queuille L. La charge de travail des agents de santé dans un contexte de gratuité des soins au Burkina Faso et au Niger [Health staff workload in a context of user fees exemption policy for health care in Burkina Faso and Niger]. Bull Soc Pathol Exot. 2013;106(4):264–71.

Samb O, Belaid L, Ridde V. Burkina Faso: la gratuité des soins aux dépens de la relation entre les femmes et les soignants? Humanitaire: Enjeux, pratiques, débats. 2013;35:4–43.

Knox Clarke P, Darcy J. Insufficient evidence? The quality and use of evaluation in humanitarian action. London: ALNAP/Overseas Development Institute; 2014.

Crewe E, Young J. Bridging research and policy: Context, evidence and links. Working Paper 173. London: Overseas Development Institute; 2002. http://www.odi.org.uk/publications/working_papers/wp173.pdf . Accessed 11 January 2016.

Patton MQ. Utilization-focused evaluation. 1st ed. Thousand Oaks: Sage; 1978.

Buchanan-Smith M, Cosgrave J. Evaluation of humanitarian action: Pilot guide. London: ALNAP/Overseas Development Institute; 2013.

Cousins JB. Organizational consequences of participatory evaluation: School district case study. In: Leithwood K, Louis KS, editors. Organizational learning in schools. New York: Taylor & Francis; 1998. p. 127–48.

Cousins JB. Utilization effects of participatory evaluation. In: Kellaghan T, Stufflebeam DL, Wingate LA, editors. International handbook of educational evaluation: Part two: Practice. Boston: Kluwer; 2003. p. 245–66.

Cousins JB, Earl LM. The case for participatory evaluation. Educ Eval Policy Analysis. 1992;14(4):397–418.

King JA. Developing evaluation capacity through process use. N Dir Eval. 2007;2007(116):45–59.

Patton MQ. Future trends in evaluation. In: Segone M, editor. From policies to results: Developing capacities for country monitoring and evaluation systems. Paris: UNICEF and IPEN; 2008. p. 44–56.

Højlund S. Evaluation use in the organizational context – changing focus to improve theory. Evaluation. 2014;20(1):26–43.

Henry G. Influential evaluations. Am J Eval. 2003;24(4):515–24.

Henry G. Beyond use: understanding evaluation's influence on attitudes and actions. Am J Eval. 2003;24(3):293–314.

Kirkhart KE. Reconceptualizing evaluation use: an integrated theory of influence. N Dir Eval. 2000;88:5–23.

Mark MM. Toward better research on—and thinking about—evaluation influence, especially in multisite evaluations. N Dir Eval. 2011;2011(129):107–19.

Clarke P, Ramalingam B. Organisational change in the humanitarian sector. London: ALNAP/Overseas Development Institute; 2008.

Patton MQ. Utilization-focused evaluation. 3rd ed. Thousand Oaks: Sage; 1997.

Patton MQ, LaBossière F. L’évaluation axée sur l’utilisation. In: Ridde V, Dagenais C, editors. Approches et pratiques en évaluation de programme. Montréal: Les Presses de l’Université de Montréal; 2009.

Oliver ML. Evaluation of emergency response: humanitarian aid agencies and evaluation influence. Dissertation, Georgia State University, 2008. http://scholarworks.gsu.edu/pmap_diss/23 . Accessed 11 Jan 2016.

Le Ministère du Burkina Faso. Compte-rendu du Conseil des ministres du mercredi 2 mars 2016. Portail officiel du gouvernement du Burkina Faso. Ouagadougou: Le Ministre de la Communication et des Relations avec le Parlement; 2016.

Acknowledgments

The authors wish to thank the two peer reviewers, whose feedback was especially helpful in improving the manuscript. Over the course of this study, Léna D’Ostie-Racine received funding from the Strategic Training Program in Global Health Research, a partnership of the Canadian Institutes of Health Research and the Québec Population Health Research Network. She was later also funded by the Fonds de recherche du Québec - Société et culture. The authors wish to express their utmost gratitude for the kind assistance and proactive participation of HELP managers and staff, the external evaluators, the district health management teams of Dori and Sebba in Burkina Faso, and the ECHO representatives, who together made this study possible. The authors also wish to thank Ludovic Queuille for his support throughout the study and for his insightful comments on previous drafts of the present article. The authors are also thankful to Didier Dupont for his consultations on qualitative analyses and to Karine Racicot for her remarkable help in reviewing and clarifying the application of the codebook. We also wish to thank all those, including Zoé Ouangré and Xavier Barsalou-Verge, who helped transcribe the interviews, which contained a vast array of African, Canadian and European accents. Our gratitude also goes out to all colleagues who provided support and insights throughout the study and/or commented on drafts of this article.

Authors’ contributions

All three authors conceptualized and designed the research project. Throughout the research project, LDR worked under the supervision, guidance and support of CD and VR. She developed the interview questions, collected the data, developed the thematic codebook, transcribed some interviews, and analyzed and interpreted the data independently. She also produced the manuscript. CD and VR reviewed and commented on drafts of the manuscript, providing input and guidance. All authors read and approved the final manuscript.

Authors’ information

Léna D’Ostie-Racine is a PhD student at the University of Montreal in research/clinical psychology. Her research thesis focuses on the use of program evaluation and conditions that influence the use of program evaluation processes and results, as well as on the development of an evaluation culture within the context of a humanitarian NGO promoting health equity.

Christian Dagenais, PhD, is associate professor at the University of Montreal. His research interests are centred around program evaluation and knowledge transfer. He coordinated a thematic segment of the Canadian Journal of Program Evaluation in 2009 and is a co-author of the book Approches et pratiques en évaluation de programme published in 2012. Since 2009, he has led the RENARD team ( www.equiperenard.ca ), which is funded by the Fonds de recherche du Quebec – Société et culture and is the first cross-disciplinary group in Quebec devoted to studying knowledge transfer in social interventions, including educational, health and community services.

Valéry Ridde, PhD, is associate professor of global health in the Department of Social and Preventive Medicine and the Research Institute (IRSPUM) of the University of Montreal School of Public Health. His research interests are centred around program evaluation, global health and healthcare accessibility ( www.equitesante.org ). VR holds a Canadian Institutes of Health Research (CIHR) funded Research Chair in Applied Public Health [CPP 137901].

Sources of support

The first author received financial support from the Fonds de recherche du Québec – Société et culture (FRQSC) and support from Équipe RENARD.

Author information

Authors and affiliations

Department of Psychology, University of Montreal, Pavillon Marie-Victorin, Room C355, P.O. Box 6128, Centre-ville Station, Montreal, Quebec, H3C 3J7, Canada

Léna D’Ostie-Racine & Christian Dagenais

Department of Social and Preventive Medicine, University of Montreal School of Public Health (ESPUM), Montreal, Canada

Valéry Ridde

University of Montreal Public Health Research Institute (IRSPUM), Montreal, Canada

Corresponding author

Correspondence to Léna D’Ostie-Racine .

Ethics declarations

Competing interests

The first author has benefited from HELP’s logistical assistance. The second and third authors have both worked as consultants for HELP. The funders and the NGO HELP did not take part in decisions on the study design, data collection or analysis, nor in the preparation and publication of the manuscript.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

D’Ostie-Racine, L., Dagenais, C. & Ridde, V. A qualitative case study of evaluation use in the context of a collaborative program evaluation strategy in Burkina Faso. Health Res Policy Sys 14 , 37 (2016). https://doi.org/10.1186/s12961-016-0109-0

Received : 27 February 2015

Accepted : 29 April 2016

Published : 26 May 2016

DOI : https://doi.org/10.1186/s12961-016-0109-0


Keywords

  • Utilization
  • Program evaluation
  • Burkina Faso (West Africa)
  • User fee exemption



A Framework for Program Evaluation: A Gateway to Tools

This section is adapted from the article "Recommended Framework for Program Evaluation in Public Health Practice," by Bobby Milstein, Scott Wetterhall, and the CDC Evaluation Working Group.

Around the world, there exist many programs and interventions developed to improve conditions in local communities. Communities come together to reduce the level of violence that exists, to work for safe, affordable housing for everyone, or to help more students do well in school, to give just a few examples.

But how do we know whether these programs are working? Whether or not they are effective, how can we improve them so they serve local communities better? And finally, how can an organization make intelligent choices about which promising programs are likely to work best in its community?

In recent years, there has been a growing trend towards the better use of evaluation to understand and improve practice. The systematic use of evaluation has solved many problems and helped countless community-based organizations do what they do better.

Despite an increased understanding of the need for - and the use of - evaluation, however, a basic agreed-upon framework for program evaluation has been lacking. In 1997, scientists at the United States Centers for Disease Control and Prevention (CDC) recognized the need to develop such a framework. As a result, the CDC assembled an Evaluation Working Group composed of experts in the fields of public health and evaluation. Members were asked to develop a framework that summarizes and organizes the basic elements of program evaluation. This Community Tool Box section describes the framework resulting from the Working Group's efforts.

Before we begin, however, we'd like to offer some definitions of terms that we will use throughout this section.

By evaluation, we mean the systematic investigation of the merit, worth, or significance of an object or effort. Evaluation practice has changed dramatically during the past three decades - new methods and approaches have been developed and it is now used for increasingly diverse projects and audiences.

Throughout this section, the term program is used to describe the object or effort that is being evaluated. It may apply to any action with the goal of improving outcomes for whole communities, for more specific sectors (e.g., schools, work places), or for sub-groups (e.g., youth, people experiencing violence or HIV/AIDS). This definition is meant to be very broad.

Examples of different types of programs include:

  • Direct service interventions (e.g., a program that offers free breakfast to improve nutrition for grade school children)
  • Community mobilization efforts (e.g., organizing a boycott of California grapes to improve the economic well-being of farm workers)
  • Research initiatives (e.g., an effort to find out whether inequities in health outcomes based on race can be reduced)
  • Surveillance systems (e.g., whether early detection of school readiness improves educational outcomes)
  • Advocacy work (e.g., a campaign to influence the state legislature to pass legislation regarding tobacco control)
  • Social marketing campaigns (e.g., a campaign in the Third World encouraging mothers to breast-feed their babies to reduce infant mortality)
  • Infrastructure building projects (e.g., a program to build the capacity of state agencies to support community development initiatives)
  • Training programs (e.g., a job training program to reduce unemployment in urban neighborhoods)
  • Administrative systems (e.g., an incentive program to improve efficiency of health services)

Program evaluation - the type of evaluation discussed in this section - is an essential organizational practice for all types of community health and development work. It is a way to evaluate the specific projects and activities community groups may take part in, rather than to evaluate an entire organization or comprehensive community initiative.

Stakeholders refer to those who care about the program or effort. These may include those presumed to benefit (e.g., children and their parents or guardians), those with particular influence (e.g., elected or appointed officials), and those who might support the effort (i.e., potential allies) or oppose it (i.e., potential opponents). Key questions in thinking about stakeholders are: Who cares? What do they care about?

This section presents a framework that promotes a common understanding of program evaluation. The overall goal is to make it easier for everyone involved in community health and development work to evaluate their efforts.

Why evaluate community health and development programs?

The type of evaluation we talk about in this section can be closely tied to everyday program operations. Our emphasis is on practical, ongoing evaluation that involves program staff, community members, and other stakeholders, not just evaluation experts. This type of evaluation offers many advantages for community health and development professionals.

For example, it complements program management by:

  • Helping to clarify program plans
  • Improving communication among partners
  • Gathering the feedback needed to improve and be accountable for program effectiveness

It's important to remember, too, that evaluation is not a new activity for those of us working to improve our communities. In fact, we assess the merit of our work all the time when we ask questions, consult partners, make assessments based on feedback, and then use those judgments to improve our work. When the stakes are low, this type of informal evaluation might be enough. However, when the stakes are raised - when a good deal of time or money is involved, or when many people may be affected - then it may make sense for your organization to use evaluation procedures that are more formal, visible, and justifiable.

How do you evaluate a specific program?

Before your organization starts with a program evaluation, your group should be very clear about the answers to the following questions:

  • What will be evaluated?
  • What criteria will be used to judge program performance?
  • What standards of performance on the criteria must be reached for the program to be considered successful?
  • What evidence will indicate performance on the criteria relative to the standards?
  • What conclusions about program performance are justified based on the available evidence?

To clarify the meaning of each, let's look at some of the answers for Drive Smart, a hypothetical program begun to stop drunk driving (a simple sketch of how such a plan might be written down follows this list).

What will be evaluated?

  • Drive Smart, a program focused on reducing drunk driving through public education and intervention.

What criteria will be used to judge program performance?

  • The number of community residents who are familiar with the program and its goals
  • The number of people who use "Safe Rides" volunteer taxis to get home
  • The percentage of people who report drinking and driving
  • The reported number of single car night time crashes (this is a common way to try to determine whether the number of people who drive drunk is changing)

What standards of performance must be reached for the program to be considered successful?

  • 80% of community residents will know about the program and its goals after the first year of the program
  • The number of people who use the "Safe Rides" taxis will increase by 20% in the first year
  • The percentage of people who report drinking and driving will decrease by 20% in the first year
  • The reported number of single car night time crashes will decrease by 10% in the program's first two years

What evidence will indicate performance on the criteria relative to the standards?

  • A random telephone survey will demonstrate community residents' knowledge of the program and changes in reported behavior
  • Logs from "Safe Rides" will tell how many people use their services
  • Information on single car night time crashes will be gathered from police records

What conclusions about program performance are justified based on the available evidence?

  • Are the changes we have seen in the level of drunk driving due to our efforts, or to something else?
  • If there is no change, or insufficient change, in behavior or outcomes: should Drive Smart change what it is doing, or have we just not waited long enough to see results?
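Purely as an illustrative aside (not part of the original framework text), the Drive Smart plan above can be written down as a small data structure so that, once evidence is gathered, each standard can be checked explicitly. Everything in the sketch below, indicator names and baseline and follow-up values alike, is hypothetical.

```python
# Hypothetical sketch: an evaluation plan as data, with observed results
# checked against the agreed standards. All numbers are illustrative only.

plan = [
    # Each criterion pairs an indicator with a standard: the direction of the
    # desired change and the threshold that counts as success.
    {"indicator": "residents_aware_pct",      "direction": "increase", "target": 0.80, "kind": "absolute"},
    {"indicator": "safe_rides_users",         "direction": "increase", "target": 0.20, "kind": "relative"},
    {"indicator": "self_reported_dui_pct",    "direction": "decrease", "target": 0.20, "kind": "relative"},
    {"indicator": "single_car_night_crashes", "direction": "decrease", "target": 0.10, "kind": "relative"},
]

# Illustrative evidence: baseline and follow-up values for each indicator.
evidence = {
    "residents_aware_pct":      {"baseline": 0.35, "followup": 0.82},
    "safe_rides_users":         {"baseline": 400,  "followup": 510},
    "self_reported_dui_pct":    {"baseline": 0.12, "followup": 0.09},
    "single_car_night_crashes": {"baseline": 50,   "followup": 46},
}

def meets_standard(criterion, values):
    """Return True if the observed change satisfies the stated standard."""
    base, follow = values["baseline"], values["followup"]
    if criterion["kind"] == "absolute":
        # e.g. "80% of residents will know about the program"
        return follow >= criterion["target"] if criterion["direction"] == "increase" else follow <= criterion["target"]
    change = (follow - base) / base
    return change >= criterion["target"] if criterion["direction"] == "increase" else change <= -criterion["target"]

for criterion in plan:
    met = meets_standard(criterion, evidence[criterion["indicator"]])
    print(f"{criterion['indicator']}: {'standard met' if met else 'standard not met'}")
```

Writing the plan down this way mirrors the logic of the questions above: the criteria name what will be measured, the standards fix the thresholds for success, and the evidence either does or does not clear those thresholds.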

The following framework provides an organized approach to answer these questions.

A framework for program evaluation

Program evaluation offers a way to understand and improve community health and development practice using methods that are useful, feasible, proper, and accurate. The framework described below is a practical non-prescriptive tool that summarizes in a logical order the important elements of program evaluation.

The framework contains two related dimensions:

  • Steps in evaluation practice, and
  • Standards for "good" evaluation.

The six connected steps of the framework are actions that should be a part of any evaluation. Although in practice the steps may be encountered out of order, it will usually make sense to follow them in the recommended sequence. That's because earlier steps provide the foundation for subsequent progress. Thus, decisions about how to carry out a given step should not be finalized until prior steps have been thoroughly addressed.

However, these steps are meant to be adaptable, not rigid. Sensitivity to each program's unique context (for example, the program's history and organizational climate) is essential for sound evaluation. They are intended to serve as starting points around which community organizations can tailor an evaluation to best meet their needs.

  • Engage stakeholders
  • Describe the program
  • Focus the evaluation design
  • Gather credible evidence
  • Justify conclusions
  • Ensure use and share lessons learned

Understanding and adhering to these basic steps will improve most evaluation efforts.

The second part of the framework is a basic set of standards to assess the quality of evaluation activities. There are 30 specific standards, organized into the following four groups:

  • Utility
  • Feasibility
  • Propriety
  • Accuracy

These standards help answer the question, "Will this evaluation be a 'good' evaluation?" They are recommended as the initial criteria by which to judge the quality of the program evaluation efforts.

Engage Stakeholders

Stakeholders are people or organizations that have something to gain or lose from what will be learned from an evaluation, and from what will be done with that knowledge. Evaluation cannot be done in isolation. Almost everything done in community health and development work involves partnerships - alliances among different organizations, board members, those affected by the problem, and others. Therefore, any serious effort to evaluate a program must consider the different values held by the partners. Stakeholders must be part of the evaluation to ensure that their unique perspectives are understood. When stakeholders are not appropriately involved, evaluation findings are likely to be ignored, criticized, or resisted.

However, if they are part of the process, people are likely to feel a good deal of ownership for the evaluation process and results. They will probably want to develop it, defend it, and make sure that the evaluation really works.

That's why this evaluation cycle begins by engaging stakeholders. Once involved, these people will help to carry out each of the steps that follows.

Three principal groups of stakeholders are important to involve:

  • People or organizations involved in program operations may include community members, sponsors, collaborators, coalition partners, funding officials, administrators, managers, and staff.
  • People or organizations served or affected by the program may include clients, family members, neighborhood organizations, academic institutions, elected and appointed officials, advocacy groups, and community residents. Individuals who are openly skeptical of or antagonistic toward the program may also be important to involve. Opening an evaluation to opposing perspectives and enlisting the help of potential program opponents can strengthen the evaluation's credibility.

Likewise, individuals or groups who could be adversely or inadvertently affected by changes arising from the evaluation have a right to be engaged. For example, it is important to include those who would be affected if program services were expanded, altered, limited, or ended as a result of the evaluation.

  • Primary intended users of the evaluation are the specific individuals who are in a position to decide and/or do something with the results. They shouldn't be confused with primary intended users of the program, although some of them should be involved in this group. In fact, primary intended users should be a subset of all of the stakeholders who have been identified. A successful evaluation will designate primary intended users, such as program staff and funders, early in its development and maintain frequent interaction with them to be sure that the evaluation specifically addresses their values and needs.

The amount and type of stakeholder involvement will be different for each program evaluation. For instance, stakeholders can be directly involved in designing and conducting the evaluation. They can be kept informed about progress of the evaluation through periodic meetings, reports, and other means of communication.

It may be helpful, when working with a group such as this, to develop an explicit process to share power and resolve conflicts. This may help avoid overemphasis of values held by any specific stakeholder.

Describe the Program

A program description is a summary of the intervention being evaluated. It should explain what the program is trying to accomplish and how it tries to bring about those changes. The description will also illustrate the program's core components and elements, its ability to make changes, its stage of development, and how the program fits into the larger organizational and community environment.

How a program is described sets the frame of reference for all future decisions about its evaluation. For example, if a program is described as, "attempting to strengthen enforcement of existing laws that discourage underage drinking," the evaluation might be very different than if it is described as, "a program to reduce drunk driving by teens." Also, the description allows members of the group to compare the program to other similar efforts, and it makes it easier to figure out what parts of the program brought about what effects.

Moreover, different stakeholders may have different ideas about what the program is supposed to achieve and why. For example, a program to reduce teen pregnancy may have some members who believe this means only increasing access to contraceptives, and other members who believe it means only focusing on abstinence.

Evaluations done without agreement on the program definition aren't likely to be very useful. In many cases, the process of working with stakeholders to develop a clear and logical program description will bring benefits long before data are available to measure program effectiveness.

There are several specific aspects that should be included when describing a program.

Statement of need

A statement of need describes the problem, goal, or opportunity that the program addresses; it also begins to imply what the program will do in response. Important features to note regarding a program's need are: the nature of the problem or goal, who is affected, how big it is, and whether (and how) it is changing.

Expectations

Expectations are the program's intended results. They describe what the program has to accomplish to be considered successful. For most programs, the accomplishments exist on a continuum (first, we want to accomplish X... then, we want to do Y...). Therefore, they should be organized by time ranging from specific (and immediate) to broad (and longer-term) consequences. For example, a program's vision, mission, goals, and objectives all represent varying levels of specificity about a program's expectations.

Activities

Activities are everything the program does to bring about changes. Describing program components and elements permits specific strategies and actions to be listed in logical sequence. This also shows how different program activities, such as education and enforcement, relate to one another. Describing program activities also provides an opportunity to distinguish activities that are the direct responsibility of the program from those that are conducted by related programs or partner organizations. Things outside of the program that may affect its success, such as harsher laws punishing businesses that sell alcohol to minors, can also be noted.

Resources

Resources include the time, talent, equipment, information, money, and other assets available to conduct program activities. Reviewing the resources a program has tells a lot about the amount and intensity of its services. It may also point out situations where there is a mismatch between what the group wants to do and the resources available to carry out those activities. Understanding program costs is also necessary for assessing the program's cost-benefit ratio as part of the evaluation.

Stage of development

A program's stage of development reflects its maturity. All community health and development programs mature and change over time. People who conduct evaluations, as well as those who use their findings, need to consider the dynamic nature of programs. For example, a new program that just received its first grant may differ in many respects from one that has been running for over a decade.

At least three phases of development are commonly recognized: planning, implementation, and effects or outcomes. In the planning stage, program activities are untested and the goal of evaluation is to refine plans as much as possible. In the implementation phase, program activities are being field tested and modified; the goal of evaluation is to see what happens in the "real world" and to improve operations. In the effects stage, enough time has passed for the program's effects to emerge; the goal of evaluation is to identify and understand the program's results, including those that were unintentional.

Context

A description of the program's context considers the important features of the environment in which the program operates. This includes understanding the area's history, geography, politics, and social and economic conditions, and also what other organizations have done. A realistic and responsive evaluation is sensitive to a broad range of potential influences on the program. An understanding of the context lets users interpret findings accurately and assess their generalizability. For example, a program to improve housing in an inner-city neighborhood might have been a tremendous success, but would likely not work in a small town on the other side of the country without significant adaptation.

Logic model

A logic model synthesizes the main program elements into a picture of how the program is supposed to work. It makes explicit the sequence of events that are presumed to bring about change. Often this logic is displayed in a flow-chart, map, or table to portray the sequence of steps leading to program results.

Creating a logic model allows stakeholders to improve and focus program direction. It reveals assumptions about conditions for program effectiveness and provides a frame of reference for one or more evaluations of the program. A detailed logic model can also be a basis for estimating the program's effect on endpoints that are not directly measured. For example, it may be possible to estimate the rate of reduction in disease from a known number of persons experiencing the intervention if there is prior knowledge about its effectiveness.
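As an illustrative aside that is not part of the original text, the chain a logic model describes, from inputs through activities and outputs to outcomes, can be captured in a few lines of code, and a prior effectiveness estimate can be used to project an endpoint that is not directly measured, as the paragraph above suggests. Every name and number below is an assumption made for the example.

```python
# Hypothetical sketch of a logic model as a simple data structure, plus a
# back-of-the-envelope projection of an endpoint that is not directly measured.

logic_model = {
    "inputs":     ["funding", "trained outreach staff", "partner clinics"],
    "activities": ["community education sessions", "free screening days"],
    "outputs":    ["sessions delivered", "people screened"],
    "outcomes": {
        "short_term": ["increased knowledge of risk factors"],
        "long_term":  ["reduced disease incidence"],
    },
}

# Projection: suppose prior studies suggest the intervention reduces incidence
# by about 15% among people reached (assumed figure), 2,000 people were reached,
# and baseline incidence is 30 cases per 1,000 people per year.
people_reached = 2_000
baseline_incidence = 30 / 1_000          # cases per person per year
assumed_relative_reduction = 0.15        # taken from prior evidence (assumption)

expected_cases_without_program = people_reached * baseline_incidence
estimated_cases_averted = expected_cases_without_program * assumed_relative_reduction
print(f"Estimated cases averted per year: {estimated_cases_averted:.1f}")  # 9.0
```

Such a projection is only as credible as the prior evidence behind the assumed reduction, which is precisely why making the logic model's assumptions explicit matters.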

The breadth and depth of a program description will vary for each program evaluation. And so, many different activities may be part of developing that description. For instance, multiple sources of information could be pulled together to construct a well-rounded description. The accuracy of an existing program description could be confirmed through discussion with stakeholders. Descriptions of what's going on could be checked against direct observation of activities in the field. A narrow program description could be fleshed out by addressing contextual factors (such as staff turnover, inadequate resources, political pressures, or strong community participation) that may affect program performance.

Focus the Evaluation Design

By focusing the evaluation design, we mean doing advance planning about where the evaluation is headed, and what steps it will take to get there. It isn't possible or useful for an evaluation to try to answer all questions for all stakeholders; there must be a focus. A well-focused plan is a safeguard against using time and resources inefficiently.

Depending on what you want to learn, some types of evaluation will be better suited than others. However, once data collection begins, it may be difficult or impossible to change what you are doing, even if it becomes obvious that other methods would work better. A thorough plan anticipates intended uses and creates an evaluation strategy with the greatest chance to be useful, feasible, proper, and accurate.

Among the issues to consider when focusing an evaluation are:

Purpose refers to the general intent of the evaluation. A clear purpose serves as the basis for the design, methods, and use of the evaluation. Taking time to articulate an overall purpose will stop your organization from making uninformed decisions about how the evaluation should be conducted and used.

There are at least four general purposes for which a community group might conduct an evaluation:

  • To gain insight. This happens, for example, when deciding whether to use a new approach (e.g., would a neighborhood watch program work for our community?). Knowledge from such an evaluation will provide information about its practicality. For a developing program, information from evaluations of similar programs can provide the insight needed to clarify how its activities should be designed.
  • To improve how things get done. This is appropriate in the implementation stage, when an established program tries to describe what it has done. This information can be used to describe program processes, to improve how the program operates, and to fine-tune the overall strategy. Evaluations done for this purpose include efforts to improve the quality, effectiveness, or efficiency of program activities.
  • To determine what the effects of the program are. Evaluations done for this purpose examine the relationship between program activities and observed consequences. For example, are more students finishing high school as a result of the program? Programs most appropriate for this type of evaluation are mature programs that are able to state clearly what happened and who it happened to. Such evaluations should provide evidence about the program's contribution to reaching longer-term goals such as a decrease in child abuse or crime in the area. This type of evaluation helps establish the accountability, and thus the credibility, of a program to funders and to the community.
  • To affect those who participate in the evaluation. The process and results of an evaluation can themselves influence those who take part in it. For example, taking part may:
    • Empower program participants (for example, being part of an evaluation can increase community members' sense of control over the program);
    • Supplement the program (for example, using a follow-up questionnaire can reinforce the main messages of the program);
    • Promote staff development (for example, by teaching staff how to collect, analyze, and interpret evidence); or
    • Contribute to organizational growth (for example, the evaluation may clarify how the program relates to the organization's mission).

Users are the specific individuals who will receive evaluation findings. They will directly experience the consequences of inevitable trade-offs in the evaluation process. For example, a trade-off might be having a relatively modest evaluation to fit the budget with the outcome that the evaluation results will be less certain than they would be for a full-scale evaluation. Because they will be affected by these tradeoffs, intended users have a right to participate in choosing a focus for the evaluation. An evaluation designed without adequate user involvement in selecting the focus can become a misguided and irrelevant exercise. By contrast, when users are encouraged to clarify intended uses, priority questions, and preferred methods, the evaluation is more likely to focus on things that will inform (and influence) future actions.

Uses describe what will be done with what is learned from the evaluation. There is a wide range of potential uses for program evaluation. Generally speaking, the uses fall in the same four categories as the purposes listed above: to gain insight, improve how things get done, determine what the effects of the program are, and affect participants. The following list gives examples of uses in each category.

Some specific examples of evaluation uses

To gain insight:

  • Assess needs and wants of community members
  • Identify barriers to use of the program
  • Learn how to best describe and measure program activities

To improve how things get done:

  • Refine plans for introducing a new practice
  • Determine the extent to which plans were implemented
  • Improve educational materials
  • Enhance cultural competence
  • Verify that participants' rights are protected
  • Set priorities for staff training
  • Make mid-course adjustments
  • Clarify communication
  • Determine if client satisfaction can be improved
  • Compare costs to benefits
  • Find out which participants benefit most from the program
  • Mobilize community support for the program

To determine what the effects of the program are:

  • Assess skills development by program participants
  • Compare changes in behavior over time
  • Decide where to allocate new resources
  • Document the level of success in accomplishing objectives
  • Demonstrate that accountability requirements are fulfilled
  • Use information from multiple evaluations to predict the likely effects of similar programs

To affect participants:

  • Reinforce messages of the program
  • Stimulate dialogue and raise awareness about community issues
  • Broaden consensus among partners about program goals
  • Teach evaluation skills to staff and other stakeholders
  • Gather success stories
  • Support organizational change and improvement

The evaluation needs to answer specific questions. Drafting questions encourages stakeholders to reveal what they believe the evaluation should answer. That is, what questions are more important to stakeholders? The process of developing evaluation questions further refines the focus of the evaluation.

The methods available for an evaluation are drawn from behavioral science and social research and development. Three types of methods are commonly recognized. They are experimental, quasi-experimental, and observational or case study designs. Experimental designs use random assignment to compare the effect of an intervention between otherwise equivalent groups (for example, comparing a randomly assigned group of students who took part in an after-school reading program with those who didn't). Quasi-experimental methods make comparisons between groups that aren't equal (e.g. program participants vs. those on a waiting list) or use comparisons within a group over time, such as in an interrupted time series in which the intervention may be introduced sequentially across different individuals, groups, or contexts. Observational or case study methods use comparisons within a group to describe and explain what happens (e.g., comparative case studies with multiple communities).
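To make the contrast concrete, here is a minimal sketch, not drawn from the original text, of the arithmetic behind an experimental comparison: outcomes for randomly assigned program and control groups are simulated (with an assumed effect built in) and the difference in group means is taken as the effect estimate.

```python
# Hypothetical sketch: the basic contrast behind an experimental design.
# Outcomes are simulated here; in practice they would come from collected evidence.
import random

random.seed(1)

# Simulated reading scores; the program group is given an assumed +4 point benefit.
control = [random.gauss(70, 10) for _ in range(200)]
program = [random.gauss(74, 10) for _ in range(200)]

def mean(values):
    return sum(values) / len(values)

effect_estimate = mean(program) - mean(control)
print(f"Estimated program effect: {effect_estimate:.1f} points")

# A quasi-experimental comparison uses the same arithmetic, but because the two
# groups were not formed by random assignment, part of the difference may reflect
# pre-existing differences between them rather than the program itself.
```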

No design is necessarily better than another. Evaluation methods should be selected because they provide the appropriate information to answer stakeholders' questions, not because they are familiar, easy, or popular. The choice of methods has implications for what will count as evidence, how that evidence will be gathered, and what kind of claims can be made. Because each method option has its own biases and limitations, evaluations that mix methods are generally more robust.

Over the course of an evaluation, methods may need to be revised or modified. Circumstances that make a particular approach useful can change. For example, the intended use of the evaluation could shift from discovering how to improve the program to helping decide about whether the program should continue or not. Thus, methods may need to be adapted or redesigned to keep the evaluation on track.

Agreements summarize the evaluation procedures and clarify everyone's roles and responsibilities. An agreement describes how the evaluation activities will be implemented. Elements of an agreement include statements about the intended purpose, users, uses, and methods, as well as a summary of the deliverables, those responsible, a timeline, and budget.

The formality of the agreement depends upon the relationships that exist between those involved. For example, it may take the form of a legal contract, a detailed protocol, or a simple memorandum of understanding. Regardless of its formality, creating an explicit agreement provides an opportunity to verify the mutual understanding needed for a successful evaluation. It also provides a basis for modifying procedures if that turns out to be necessary.

As you can see, focusing the evaluation design may involve many activities. For instance, both supporters and skeptics of the program could be consulted to ensure that the proposed evaluation questions are politically viable. A menu of potential evaluation uses appropriate for the program's stage of development could be circulated among stakeholders to determine which is most compelling. Interviews could be held with specific intended users to better understand their information needs and timeline for action. Resource requirements could be reduced when users are willing to employ more timely but less precise evaluation methods.

Gather Credible Evidence

Credible evidence is the raw material of a good evaluation. The information learned should be seen by stakeholders as believable, trustworthy, and relevant to answer their questions. This requires thinking broadly about what counts as "evidence." Such decisions are always situational; they depend on the question being posed and the motives for asking it. For some questions, a stakeholder's standard for credibility could demand having the results of a randomized experiment. For another question, a set of well-done, systematic observations such as interactions between an outreach worker and community residents, will have high credibility. The difference depends on what kind of information the stakeholders want and the situation in which it is gathered.

Context matters! In some situations, it may be necessary to consult evaluation specialists. This may be especially true if concern for data quality is high. In other circumstances, local people may offer the deepest insights. Regardless of their expertise, however, those involved in an evaluation should strive to collect information that will convey a credible, well-rounded picture of the program and its efforts.

Having credible evidence strengthens the evaluation results as well as the recommendations that follow from them. Although all types of data have limitations, it is possible to improve an evaluation's overall credibility. One way to do this is by using multiple procedures for gathering, analyzing, and interpreting data. Encouraging participation by stakeholders can also enhance perceived credibility. When stakeholders help define questions and gather data, they will be more likely to accept the evaluation's conclusions and to act on its recommendations.

The following features of evidence gathering typically affect how credible the evidence is seen to be:

Indicators translate general concepts about the program and its expected effects into specific, measurable parts.

Examples of indicators include:

  • The program's capacity to deliver services
  • The participation rate
  • The level of client satisfaction
  • The amount of intervention exposure (how many people were exposed to the program, and for how long they were exposed)
  • Changes in participant behavior
  • Changes in community conditions or norms
  • Changes in the environment (e.g., new programs, policies, or practices)
  • Longer-term changes in population health status (e.g., estimated teen pregnancy rate in the county)

Indicators should address the criteria that will be used to judge the program. That is, they reflect the aspects of the program that are most meaningful to monitor. Several indicators are usually needed to track the implementation and effects of a complex program or intervention.

One way to develop multiple indicators is to create a "balanced scorecard," which contains indicators that are carefully selected to complement one another. According to this strategy, program processes and effects are viewed from multiple perspectives using small groups of related indicators. For instance, a balanced scorecard for a single program might include indicators of how the program is being delivered; what participants think of the program; what effects are observed; what goals were attained; and what changes are occurring in the environment around the program.

Another approach to using multiple indicators is based on a program logic model, such as we discussed earlier in the section. A logic model can be used as a template to define a full spectrum of indicators along the pathway that leads from program activities to expected effects. For each step in the model, qualitative and/or quantitative indicators could be developed.
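As a hedged illustration of this idea, the short Python sketch below represents a logic model as a simple mapping from each step to a small set of indicators. The step names and indicators are hypothetical placeholders; your own logic model and stakeholder priorities would supply the real ones.

```python
# Illustrative sketch with hypothetical indicators: a program logic model used as a
# template, with a small set of indicators for each step along the pathway.
logic_model_indicators = {
    "inputs":      ["number of trained outreach workers", "annual program budget"],
    "activities":  ["workshops delivered per month", "home visits completed"],
    "outputs":     ["participants reached", "share completing the full curriculum"],
    "short-term outcomes": ["change in participant knowledge scores"],
    "long-term outcomes":  ["change in the county teen pregnancy rate"],
}

for step, indicators in logic_model_indicators.items():
    print(f"{step}:")
    for indicator in indicators:
        print(f"  - {indicator}")
```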

Indicators can be broad-based and don't need to focus only on a program's long-term goals. They can also address intermediary factors that influence program effectiveness, including such intangible factors as service quality, community capacity, or inter-organizational relations. Indicators for these and similar concepts can be created by systematically identifying and then tracking markers of what is said or done when the concept is expressed.

In the course of an evaluation, indicators may need to be modified or new ones adopted. Also, measuring program performance by tracking indicators is only one part of evaluation and shouldn't be mistaken, on its own, for a sufficient basis for decision making. There are definite perils to using performance indicators as a substitute for completing the evaluation process and reaching fully justified conclusions. For example, an indicator such as a rising rate of unemployment may be falsely assumed to reflect a failing program when it may actually be due to changing environmental conditions that are beyond the program's control.

Sources of evidence in an evaluation may be people, documents, or observations. More than one source may be used to gather evidence for each indicator. In fact, selecting multiple sources provides an opportunity to include different perspectives about the program and enhances the evaluation's credibility. For instance, an inside perspective may be reflected by internal documents and comments from staff or program managers; whereas clients and those who do not support the program may provide different, but equally relevant perspectives. Mixing these and other perspectives provides a more comprehensive view of the program or intervention.

The criteria used to select sources should be clearly stated so that users and other stakeholders can interpret the evidence accurately and assess if it may be biased. In addition, some sources provide information in narrative form (for example, a person's experience when taking part in the program) and others are numerical (for example, how many people were involved in the program). The integration of qualitative and quantitative information can yield evidence that is more complete and more useful, thus meeting the needs and expectations of a wider range of stakeholders.

Quality refers to the appropriateness and integrity of information gathered in an evaluation. High quality data are reliable and informative; they are easier to collect if the indicators have been well defined. Other factors that affect quality may include instrument design, data collection procedures, training of those involved in data collection, source selection, coding, data management, and routine error checking. Obtaining quality data will entail tradeoffs (e.g., breadth vs. depth); stakeholders should decide together what is most important to them. Because all data have limitations, the intent of a practical evaluation is to strive for a level of quality that meets the stakeholders' threshold for credibility.
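To make "routine error checking" concrete, here is a minimal Python sketch that flags missing or out-of-range values before analysis. The records, the fields, and the 1-5 satisfaction scale are hypothetical; the idea is simply that quality checks should be explicit and repeatable.

```python
# Illustrative sketch with hypothetical survey records: routine error checking that
# flags missing or out-of-range values before the data are analyzed.
records = [
    {"id": 1, "age": 34, "satisfaction": 4},
    {"id": 2, "age": None, "satisfaction": 5},   # missing age
    {"id": 3, "age": 29, "satisfaction": 9},     # outside the 1-5 scale
]

def quality_issues(record):
    issues = []
    if record["age"] is None:
        issues.append("missing age")
    if not 1 <= record["satisfaction"] <= 5:
        issues.append("satisfaction outside the 1-5 scale")
    return issues

for record in records:
    problems = quality_issues(record)
    if problems:
        print(f"Record {record['id']}: {', '.join(problems)}")
```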

Quantity refers to the amount of evidence gathered in an evaluation. It is necessary to estimate in advance the amount of information that will be required and to establish criteria to decide when to stop collecting data - to know when enough is enough. Quantity affects the level of confidence or precision users can have - how sure we are that what we've learned is true. It also partly determines whether the evaluation will be able to detect effects. All evidence collected should have a clear, anticipated use.
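A rough sense of how quantity affects precision can come from a standard back-of-the-envelope calculation. The Python sketch below uses the usual approximate 95% margin of error for a proportion, z * sqrt(p(1 - p) / n); the 15% figure and the sample sizes are hypothetical.

```python
# Illustrative sketch: how sample size (quantity of evidence) affects precision.
# Approximate 95% margin of error for an estimated proportion p based on n responses.
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of an approximate 95% confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

p_hat = 0.15  # e.g., 15% of respondents report witnessing a violent act
for n in (50, 200, 800):
    moe = margin_of_error(p_hat, n)
    print(f"n = {n:3d}: 15% plus or minus {moe * 100:.1f} percentage points")
```

Quadrupling the sample size roughly halves the margin of error, which is why it pays to decide in advance how much precision users actually need.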

By logistics , we mean the methods, timing, and physical infrastructure for gathering and handling evidence. People and organizations also have cultural preferences that dictate acceptable ways of asking questions and collecting information, including who would be perceived as an appropriate person to ask the questions. For example, some participants may be unwilling to discuss their behavior with a stranger, whereas others are more at ease with someone they don't know. Therefore, the techniques for gathering evidence in an evaluation must be in keeping with the cultural norms of the community. Data collection procedures should also ensure that confidentiality is protected.

Justify Conclusions

The process of justifying conclusions recognizes that evidence in an evaluation does not necessarily speak for itself. Evidence must be carefully considered from a number of different stakeholders' perspectives to reach conclusions that are well-substantiated and justified. Conclusions become justified when they are linked to the evidence gathered and judged against agreed-upon values set by the stakeholders. Stakeholders must agree that conclusions are justified in order to use the evaluation results with confidence.

The principal elements involved in justifying conclusions based on evidence are:

Standards reflect the values held by stakeholders about the program. They provide the basis to make program judgments. The use of explicit standards for judgment is fundamental to sound evaluation. In practice, when stakeholders articulate and negotiate their values, these become the standards to judge whether a given program's performance will, for instance, be considered "successful," "adequate," or "unsuccessful."
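If stakeholders agree on explicit cut-points, those standards can be written down as plainly as in the minimal Python sketch below. The participation-rate example and the 60%/80% thresholds are hypothetical; the point is that the basis for judging "successful," "adequate," or "unsuccessful" is stated before the findings arrive.

```python
# Illustrative sketch with hypothetical thresholds: stakeholder-negotiated standards
# expressed as explicit cut-points for judging an indicator's performance.
def judge(value, adequate, successful):
    """Return a judgment label for an indicator value against agreed standards."""
    if value >= successful:
        return "successful"
    if value >= adequate:
        return "adequate"
    return "unsuccessful"

# Example: stakeholders agreed that 60% participation is adequate, 80% successful.
participation_rate = 0.72
print(judge(participation_rate, adequate=0.60, successful=0.80))  # -> "adequate"
```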

Analysis and synthesis

Analysis and synthesis are methods to discover and summarize an evaluation's findings. They are designed to detect patterns in evidence, either by isolating important findings (analysis) or by combining different sources of information to reach a larger understanding (synthesis). Mixed method evaluations require the separate analysis of each evidence element, as well as a synthesis of all sources to examine patterns that emerge. Deciphering facts from a given body of evidence involves deciding how to organize, classify, compare, and display information. These decisions are guided by the questions being asked, the types of data available, and especially by input from stakeholders and primary intended users.
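As a small, hedged illustration of analyzing each evidence element separately and then setting the results side by side, the Python sketch below summarizes hypothetical survey scores and hypothetical coded interview themes. Nothing about the data or the theme labels is prescribed; they stand in for whatever your evaluation actually collected.

```python
# Illustrative sketch with hypothetical data: separate analysis of quantitative and
# qualitative evidence, then a simple side-by-side synthesis.
from collections import Counter
from statistics import mean

satisfaction_scores = [4, 5, 3, 4, 5, 2, 4]                 # survey responses, 1-5 scale
interview_themes = ["access", "staff support", "access",
                    "cost", "staff support", "access"]       # codes from interview excerpts

quantitative_summary = round(mean(satisfaction_scores), 2)
qualitative_summary = Counter(interview_themes).most_common()

print("Mean satisfaction:", quantitative_summary)
print("Most common themes:", qualitative_summary)
```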

Interpretation

Interpretation is the effort to figure out what the findings mean. Uncovering facts about a program's performance isn't enough to draw conclusions; the facts must be interpreted to understand their practical significance. For example, the finding that "15% of the people in our area witnessed a violent act last year" may be read differently depending on the situation. If 50% of community members surveyed five years ago had witnessed a violent act, the group can conclude that, while violence is still a problem, things are getting better in the community. However, if five years ago only 7% of those surveyed said the same thing, community organizations may see this as a sign that they need to change what they are doing. In short, interpretations draw on the information and perspectives that stakeholders bring to the evaluation. They can be strengthened through active participation or interaction with the data and preliminary explanations of what happened.

Judgments are statements about the merit, worth, or significance of the program. They are formed by comparing the findings and their interpretations against one or more selected standards. Because multiple standards can be applied to a given program, stakeholders may reach different or even conflicting judgments. For instance, a program that increases its outreach by 10% from the previous year may be judged positively by program managers, based on standards of improved performance over time. Community members, however, may feel that despite improvements, a minimum threshold of access to services has still not been reached. Their judgment, based on standards of social equity, would therefore be negative. Conflicting claims about a program's quality, value, or importance often indicate that stakeholders are using different standards or values in making judgments. This type of disagreement can be a catalyst to clarify values and to negotiate the appropriate basis (or bases) on which the program should be judged.

Recommendations

Recommendations are actions to consider as a result of the evaluation. Forming recommendations requires information beyond just what is necessary to form judgments. For example, knowing that a program is able to increase the services available to battered women doesn't necessarily translate into a recommendation to continue the effort, particularly when there are competing priorities or other effective alternatives. Thus, recommendations about what to do with a given intervention go beyond judgments about a specific program's effectiveness.

If recommendations aren't supported by enough evidence, or if they aren't in keeping with stakeholders' values, they can really undermine an evaluation's credibility. By contrast, an evaluation can be strengthened by recommendations that anticipate and react to what users will want to know.

Three things might increase the chances that recommendations will be relevant and well-received:

  • Sharing draft recommendations
  • Soliciting reactions from multiple stakeholders
  • Presenting options instead of directive advice

Justifying conclusions in an evaluation is a process that involves different possible steps. For instance, conclusions could be strengthened by searching for alternative explanations from the ones you have chosen, and then showing why they are unsupported by the evidence. When there are different but equally well supported conclusions, each could be presented with a summary of their strengths and weaknesses. Techniques to analyze, synthesize, and interpret findings might be agreed upon before data collection begins.

Ensure Use and Share Lessons Learned

It is naive to assume that lessons learned in an evaluation will necessarily be used in decision making and subsequent action. Deliberate effort on the part of evaluators is needed to ensure that the evaluation findings will be used appropriately. Preparing for their use involves strategic thinking and continued vigilance in looking for opportunities to communicate and influence. Both of these should begin in the earliest stages of the process and continue throughout the evaluation.

The key elements for ensuring that the findings and recommendations from an evaluation are used are:

Design refers to how the evaluation's questions, methods, and overall processes are constructed. As discussed in the third step of this framework (focusing the evaluation design), the evaluation should be organized from the start to achieve specific agreed-upon uses. Having a clear purpose that is focused on the use of what is learned helps those who will carry out the evaluation to know who will do what with the findings. Furthermore, the process of creating a clear design will highlight ways that stakeholders, through their many contributions, can improve the evaluation and facilitate the use of the results.

Preparation

Preparation refers to the steps taken to get ready for the future uses of the evaluation findings. The ability to translate new knowledge into appropriate action is a skill that can be strengthened through practice. In fact, building this skill can itself be a useful benefit of the evaluation. It is possible to prepare stakeholders for future use of the results by discussing how potential findings might affect decision making.

For example, primary intended users and other stakeholders could be given a set of hypothetical results and asked what decisions or actions they would make on the basis of this new knowledge. If they indicate that the evidence presented is incomplete or irrelevant and that no action would be taken, then this is an early warning sign that the planned evaluation should be modified. Preparing for use also gives stakeholders more time to explore both positive and negative implications of potential results and to identify different options for program improvement.

Feedback is the communication that occurs among everyone involved in the evaluation. Giving and receiving feedback creates an atmosphere of trust among stakeholders; it keeps an evaluation on track by keeping everyone informed about how the evaluation is proceeding. Primary intended users and other stakeholders have a right to comment on evaluation decisions. From a standpoint of ensuring use, stakeholder feedback is a necessary part of every step in the evaluation. Obtaining valuable feedback can be encouraged by holding discussions during each step of the evaluation and routinely sharing interim findings, provisional interpretations, and draft reports.

Follow-up refers to the support that many users need during the evaluation and after they receive evaluation findings. Because of the amount of effort required, reaching justified conclusions in an evaluation can seem like an end in itself. It is not. Active follow-up may be necessary to remind users of the intended uses of what has been learned. Follow-up may also be required to stop lessons learned from becoming lost or ignored in the process of making complex or political decisions. To guard against such oversight, it may be helpful to have someone involved in the evaluation serve as an advocate for the evaluation's findings during the decision-making phase.

Facilitating the use of evaluation findings also carries with it the responsibility to prevent misuse. Evaluation results are always bounded by the context in which the evaluation was conducted. Some stakeholders, however, may be tempted to take results out of context or to use them for different purposes than what they were developed for. For instance, over-generalizing the results from a single case study to make decisions that affect all sites in a national program is an example of misuse of a case study evaluation.

Similarly, program opponents may misuse results by overemphasizing negative findings without giving proper credit for what has worked. Active follow-up can help to prevent these and other forms of misuse by ensuring that evidence is only applied to the questions that were the central focus of the evaluation.

Dissemination

Dissemination is the process of communicating the procedures or the lessons learned from an evaluation to relevant audiences in a timely, unbiased, and consistent fashion. Like other elements of the evaluation, the reporting strategy should be discussed in advance with intended users and other stakeholders. Planning effective communications also requires considering the timing, style, tone, message source, vehicle, and format of information products. Regardless of how communications are constructed, the goal for dissemination is to achieve full disclosure and impartial reporting.

Along with the uses for evaluation findings, there are also uses that flow from the very process of evaluating. These "process uses" should be encouraged. The people who take part in an evaluation can experience profound changes in beliefs and behavior. For instance, an evaluation can challenge staff members to examine what they are doing and to question the assumptions that connect program activities with intended effects.

Evaluation also prompts staff to clarify their understanding of the goals of the program. This greater clarity, in turn, helps staff members to better function as a team focused on a common end. In short, immersion in the logic, reasoning, and values of evaluation can have very positive effects, such as basing decisions on systematic judgments instead of on unfounded assumptions.

Additional process uses for evaluation include:

  • By defining indicators, what really matters to stakeholders becomes clear
  • It helps make outcomes matter by changing the reinforcements connected with achieving positive results. For example, a funder might offer "bonus grants" or "outcome dividends" to a program that has shown a significant amount of community change and improvement.

Standards for "good" evaluation

There are standards to assess whether all of the parts of an evaluation are well-designed and working to their greatest potential. The Joint Committee on Standards for Educational Evaluation developed "The Program Evaluation Standards" for this purpose. These standards, originally designed to assess evaluations of educational programs, are also relevant for programs and interventions related to community health and development.

The program evaluation standards make it practical to conduct sound and fair evaluations. They offer well-supported principles to follow when faced with having to make tradeoffs or compromises. Attending to the standards can guard against an imbalanced evaluation, such as one that is accurate and feasible, but isn't very useful or sensitive to the context. Another example of an imbalanced evaluation is one that would be genuinely useful, but is impossible to carry out.

The following standards can be applied while developing an evaluation design and throughout the course of its implementation. Remember, the standards are written as guiding principles, not as rigid rules to be followed in all situations.

The 30 more specific standards are grouped into four categories: utility, feasibility, propriety, and accuracy.

Utility Standards

The utility standards ensure that the evaluation serves the information needs of its intended users.

The utility standards are:

  • Stakeholder Identification : People who are involved in (or will be affected by) the evaluation should be identified, so that their needs can be addressed.
  • Evaluator Credibility : The people conducting the evaluation should be both trustworthy and competent, so that the evaluation will be generally accepted as credible or believable.
  • Information Scope and Selection : Information collected should address pertinent questions about the program, and it should be responsive to the needs and interests of clients and other specified stakeholders.
  • Values Identification: The perspectives, procedures, and rationale used to interpret the findings should be carefully described, so that the bases for judgments about merit and value are clear.
  • Report Clarity: Evaluation reports should clearly describe the program being evaluated, including its context, and the purposes, procedures, and findings of the evaluation. This will help ensure that essential information is provided and easily understood.
  • Report Timeliness and Dissemination: Significant midcourse findings and evaluation reports should be shared with intended users so that they can be used in a timely fashion.
  • Evaluation Impact: Evaluations should be planned, conducted, and reported in ways that encourage follow-through by stakeholders, so that the evaluation will be used.

Feasibility Standards

The feasibility standards ensure that the evaluation makes sense - that the steps that are planned are both viable and pragmatic.

The feasibility standards are:

  • Practical Procedures: The evaluation procedures should be practical, to keep disruption of everyday activities to a minimum while needed information is obtained.
  • Political Viability : The evaluation should be planned and conducted with anticipation of the different positions or interests of various groups. This should help in obtaining their cooperation so that possible attempts by these groups to curtail evaluation operations or to misuse the results can be avoided or counteracted.
  • Cost Effectiveness: The evaluation should be efficient and produce enough valuable information that the resources used can be justified.

Propriety Standards

The propriety standards ensure that the evaluation is an ethical one, conducted with regard for the rights and interests of those involved. The eight propriety standards follow.

  • Service Orientation : Evaluations should be designed to help organizations effectively serve the needs of all of the targeted participants.
  • Formal Agreements : The responsibilities in an evaluation (what is to be done, how, by whom, when) should be agreed to in writing, so that those involved are obligated to follow all conditions of the agreement, or to formally renegotiate it.
  • Rights of Human Subjects : Evaluation should be designed and conducted to respect and protect the rights and welfare of human subjects, that is, all participants in the study.
  • Human Interactions : Evaluators should respect basic human dignity and worth when working with other people in an evaluation, so that participants don't feel threatened or harmed.
  • Complete and Fair Assessment : The evaluation should be complete and fair in its examination, recording both strengths and weaknesses of the program being evaluated. This allows strengths to be built upon and problem areas addressed.
  • Disclosure of Findings : The people working on the evaluation should ensure that all of the evaluation findings, along with the limitations of the evaluation, are accessible to everyone affected by the evaluation, and any others with expressed legal rights to receive the results.
  • Conflict of Interest: Conflict of interest should be dealt with openly and honestly, so that it does not compromise the evaluation processes and results.
  • Fiscal Responsibility : The evaluator's use of resources should reflect sound accountability procedures and otherwise be prudent and ethically responsible, so that expenditures are accounted for and appropriate.

Accuracy Standards

The accuracy standards ensure that the evaluation findings are considered correct.

There are 12 accuracy standards:

  • Program Documentation: The program should be described and documented clearly and accurately, so that what is being evaluated is clearly identified.
  • Context Analysis: The context in which the program exists should be thoroughly examined so that likely influences on the program can be identified.
  • Described Purposes and Procedures: The purposes and procedures of the evaluation should be monitored and described in enough detail that they can be identified and assessed.
  • Defensible Information Sources: The sources of information used in a program evaluation should be described in enough detail that the adequacy of the information can be assessed.
  • Valid Information: The information gathering procedures should be chosen or developed and then implemented in such a way that they will assure that the interpretation arrived at is valid.
  • Reliable Information : The information gathering procedures should be chosen or developed and then implemented so that they will assure that the information obtained is sufficiently reliable.
  • Systematic Information: The information from an evaluation should be systematically reviewed and any errors found should be corrected.
  • Analysis of Quantitative Information: Quantitative information - data from observations or surveys - in an evaluation should be appropriately and systematically analyzed so that evaluation questions are effectively answered.
  • Analysis of Qualitative Information: Qualitative information - descriptive information from interviews and other sources - in an evaluation should be appropriately and systematically analyzed so that evaluation questions are effectively answered.
  • Justified Conclusions: The conclusions reached in an evaluation should be explicitly justified, so that stakeholders can understand their worth.
  • Impartial Reporting: Reporting procedures should guard against the distortion caused by personal feelings and biases of people involved in the evaluation, so that evaluation reports fairly reflect the evaluation findings.
  • Metaevaluation: The evaluation itself should be evaluated against these and other pertinent standards, so that it is appropriately guided and, on completion, stakeholders can closely examine its strengths and weaknesses.

Applying the framework: Conducting optimal evaluations

There is ever-increasing agreement on the worth of evaluation; in fact, evaluation is often required by funders and other constituents. So, community health and development professionals can no longer question whether or not to evaluate their programs. Instead, the appropriate questions are:

  • What is the best way to evaluate?
  • What are we learning from the evaluation?
  • How will we use what we learn to become more effective?

The framework for program evaluation helps answer these questions by guiding users to select evaluation strategies that are useful, feasible, proper, and accurate.

To use this framework requires quite a bit of skill in program evaluation. In most cases there are multiple stakeholders to consider, the political context may be divisive, steps don't always follow a logical order, and limited resources may make it difficult to take a preferred course of action. An evaluator's challenge is to devise an optimal strategy, given the conditions she is working under. An optimal strategy is one that accomplishes each step in the framework in a way that takes into account the program context and is able to meet or exceed the relevant standards.

This framework also makes it possible to respond to common concerns about program evaluation. For instance, many evaluations are not undertaken because they are seen as being too expensive. The cost of an evaluation, however, is relative; it depends upon the question being asked and the level of certainty desired for the answer. A simple, low-cost evaluation can deliver information valuable for understanding and improvement.

Rather than discounting evaluations as a time-consuming sideline, the framework encourages evaluations that are timed strategically to provide necessary feedback. This makes it possible to link evaluation closely with everyday practice.

Another concern centers on the perceived technical demands of designing and conducting an evaluation. However, the practical approach endorsed by this framework focuses on questions that can improve the program.

Finally, the prospect of evaluation troubles many staff members because they perceive evaluation methods as punishing ("They just want to show what we're doing wrong."), exclusionary ("Why aren't we part of it? We're the ones who know what's going on."), and adversarial ("It's us against them."). The framework instead encourages an evaluation approach that is designed to be helpful and engages all interested stakeholders in a process that welcomes their participation.

Evaluation is a powerful strategy for distinguishing programs and interventions that make a difference from those that don't. It is a driving force for developing and adapting sound strategies, improving existing programs, and demonstrating the results of investments in time and other resources. It also helps determine if what is being done is worth the cost.

This recommended framework for program evaluation is both a synthesis of existing best practices and a set of standards for further improvement. It supports a practical approach to evaluation based on steps and standards that can be applied in almost any setting. Because the framework is purposefully general, it provides a stable guide to design and conduct a wide range of evaluation efforts in a variety of specific program areas. The framework can be used as a template to create useful evaluation plans that contribute to understanding and improvement. For additional information on the requirements for good evaluation, and for some straightforward steps that make a good evaluation of an intervention more feasible, read The Magenta Book - Guidance for Evaluation.

Online Resources

Are You Ready to Evaluate your Coalition? poses 15 questions to help your group decide whether your coalition is ready to evaluate itself and its work.

The  American Evaluation Association Guiding Principles for Evaluators  helps guide evaluators in their professional practice.

CDC Evaluation Resources  provides a list of resources for evaluation, as well as links to professional associations and journals.

Chapter 11: Community Interventions in the "Introduction to Community Psychology" explains professionally-led versus grassroots interventions, what it means for a community intervention to be effective, why a community needs to be ready for an intervention, and the steps to implementing community interventions.

The Comprehensive Cancer Control Branch Program Evaluation Toolkit is designed to help grantees plan and implement evaluations of their NCCCP-funded programs. It provides general guidance on evaluation principles and techniques, as well as practical templates and tools.

Developing an Effective Evaluation Plan  is a workbook provided by the CDC. In addition to information on designing an evaluation plan, this book also provides worksheets as a step-by-step guide.

EvaluACTION, from the CDC, is designed for people interested in learning about program evaluation and how to apply it to their work. Evaluation is a process, one dependent on what you're currently doing and on the direction in which you'd like to go. In addition to providing helpful information, the site also features an interactive Evaluation Plan & Logic Model Builder, so you can create customized tools for your organization to use.

Evaluating Your Community-Based Program  is a handbook designed by the American Academy of Pediatrics covering a variety of topics related to evaluation.

GAO Designing Evaluations  is a handbook provided by the U.S. Government Accountability Office with copious information regarding program evaluations.

The CDC's Introduction to Program Evaluation for Public Health Programs: A Self-Study Guide is a "how-to" guide for planning and implementing evaluation activities. The manual, based on CDC's Framework for Program Evaluation in Public Health, is intended to assist with planning, designing, implementing and using comprehensive evaluations in a practical way.

McCormick Foundation Evaluation Guide  is a guide to planning an organization’s evaluation, with several chapters dedicated to gathering information and using it to improve the organization.

A Participatory Model for Evaluating Social Programs from the James Irvine Foundation.

Practical Evaluation for Public Managers  is a guide to evaluation written by the U.S. Department of Health and Human Services.

Penn State Program Evaluation  offers information on collecting different forms of data and how to measure different community markers.

Program Evaluation information page from Implementation Matters.

The Program Manager's Guide to Evaluation  is a handbook provided by the Administration for Children and Families with detailed answers to nine big questions regarding program evaluation.

Program Planning and Evaluation  is a website created by the University of Arizona. It provides links to information on several topics including methods, funding, types of evaluation, and reporting impacts.

User-Friendly Handbook for Program Evaluation  is a guide to evaluations provided by the National Science Foundation.  This guide includes practical information on quantitative and qualitative methodologies in evaluations.

W.K. Kellogg Foundation Evaluation Handbook provides a framework for thinking about evaluation as a relevant and useful program tool. It was originally written for program directors with direct responsibility for the ongoing evaluation of W.K. Kellogg Foundation-funded projects.

Print Resources

This Community Tool Box section is an edited version of:

CDC Evaluation Working Group. (1999). (Draft). Recommended framework for program evaluation in public health practice . Atlanta, GA: Author.

The article cites the following references:

Adler, M., & Ziglio, E. (1996). Gazing into the oracle: The Delphi method and its application to social policy and community health and development. London: Jessica Kingsley Publishers.

Barrett, F.   Program Evaluation: A Step-by-Step Guide.  Sunnycrest Press, 2013. This practical manual includes helpful tips to develop evaluations, tables illustrating evaluation approaches, evaluation planning and reporting templates, and resources if you want more information.

Basch, C., Sliepcevich, E., Gold, R., Duncan, D., & Kolbe, L. (1985). Avoiding type III errors in health education program evaluation: A case study. Health Education Quarterly, 12(4), 315-31.

Bickman L, & Rog, D. (1998). Handbook of applied social research methods. Thousand Oaks, CA: Sage Publications.

Boruch, R. (1998). Randomized controlled experiments for evaluation and planning. In Handbook of applied social research methods, edited by Bickman, L., & Rog, D. Thousand Oaks, CA: Sage Publications: 161-92.

Centers for Disease Control and Prevention DoHAP. Evaluating CDC HIV prevention programs: guidance and data system . Atlanta, GA: Centers for Disease Control and Prevention, Division of HIV/AIDS Prevention, 1999.

Centers for Disease Control and Prevention. Guidelines for evaluating surveillance systems. Morbidity and Mortality Weekly Report 1988;37(S-5):1-18.

Centers for Disease Control and Prevention. Handbook for evaluating HIV education . Atlanta, GA: Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Division of Adolescent and School Health, 1995.

Cook, T., & Campbell, D. (1979). Quasi-experimentation . Chicago, IL: Rand McNally.

Cook, T., & Reichardt, C. (1979). Qualitative and quantitative methods in evaluation research. Beverly Hills, CA: Sage Publications.

Cousins, J., & Whitmore, E. (1998). Framing participatory evaluation. In Understanding and practicing participatory evaluation, vol. 80, edited by E. Whitmore. San Francisco, CA: Jossey-Bass: 5-24.

Chen, H. (1990).  Theory driven evaluations . Newbury Park, CA: Sage Publications.

de Vries, H., Weijts, W., Dijkstra, M., & Kok, G. (1992).  The utilization of qualitative and quantitative data for health education program planning, implementation, and evaluation: a spiral approach . Health Education Quarterly.1992; 19(1):101-15.

Dyal, W. (1995).  Ten organizational practices of community health and development: a historical perspective . American Journal of Preventive Medicine;11(6):6-8.

Eddy, D. (1998). Performance measurement: Problems and solutions. Health Affairs, 17(4), 7-25.

Harvard Family Research Project. (1998). Performance measurement. The Evaluation Exchange, 4, 1-15.

Eoyang, G., & Berkas, T. (1996). Evaluation in a complex adaptive system.

Taylor-Powell, E., Steele, S., & Douglah, M. (1999). Planning a program evaluation. Madison, WI: University of Wisconsin Cooperative Extension.

Fawcett, S.B., Paine-Andrews, A., Francisco, V.T., Schultz, J.A., Richter, K.P., Berkley-Patton, J., Fisher, J., Lewis, R.K., Lopez, C.M., Russos, S., Williams, E.L., Harris, K.J., & Evensen, P. (2001). Evaluating community initiatives for health and development. In I. Rootman, D. McQueen, et al. (Eds.), Evaluating health promotion approaches (pp. 241-277). Copenhagen, Denmark: World Health Organization - Europe.

Fawcett, S., Sterling, T., Paine-Andrews, A., Harris, K., Francisco, V., et al. (1996). Evaluating community efforts to prevent cardiovascular diseases. Atlanta, GA: Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion.

Fetterman, D., Kaftarian, S., & Wandersman, A. (1996). Empowerment evaluation: Knowledge and tools for self-assessment and accountability. Thousand Oaks, CA: Sage Publications.

Frechtling, J.,& Sharp, L. (1997).  User-friendly handbook for mixed method evaluations . Washington, DC: National Science Foundation.

Goodman, R., Speers, M., McLeroy, K., Fawcett, S., Kegler M., et al. (1998).  Identifying and defining the dimensions of community capacity to provide a basis for measurement . Health Education and Behavior;25(3):258-78.

Greene, J.  (1994). Qualitative program evaluation: practice and promise . In Handbook of Qualitative Research, edited by NK Denzin and YS Lincoln. Thousand Oaks, CA: Sage Publications.

Haddix, A., Teutsch, S., Shaffer, P., & Dunet, D. (1996). Prevention effectiveness: A guide to decision analysis and economic evaluation. New York, NY: Oxford University Press.

Hennessy, M. (1998). Evaluation. In Statistics in community health and development, edited by Stroup, D., & Teutsch, S. New York, NY: Oxford University Press: 193-219.

Henry, G. (1998). Graphing data. In Handbook of applied social research methods, edited by Bickman, L., & Rog, D. Thousand Oaks, CA: Sage Publications: 527-56.

Henry, G. (1998). Practical sampling. In Handbook of applied social research methods, edited by Bickman, L., & Rog, D. Thousand Oaks, CA: Sage Publications: 101-26.

Institute of Medicine. Improving health in the community: a role for performance monitoring . Washington, DC: National Academy Press, 1997.

Joint Committee on Standards for Educational Evaluation, James R. Sanders (Chair). (1994). The program evaluation standards: How to assess evaluations of educational programs. Thousand Oaks, CA: Sage Publications.

Kaplan, R., & Norton, D. (1992). The balanced scorecard: Measures that drive performance. Harvard Business Review, Jan-Feb, 71-9.

Kar, S. (1989). Health promotion indicators and actions . New York, NY: Springer Publications.

Knauft, E. (1993). What Independent Sector learned from an evaluation of its own hard-to-measure programs. In A vision of evaluation, edited by S.T. Gray. Washington, DC: Independent Sector.

Koplan, J. (1999)  CDC sets millennium priorities . US Medicine 4-7.

Lipsey, M. (1998). Design sensitivity: Statistical power for applied experimental research. In Handbook of applied social research methods, edited by Bickman, L., & Rog, D. Thousand Oaks, CA: Sage Publications: 39-68.

Lipsey, M. (1993). Theory as method: small theories of treatments . New Directions for Program Evaluation;(57):5-38.

Lipsey, M. (1997).  What can you build with thousands of bricks? Musings on the cumulation of knowledge in program evaluation . New Directions for Evaluation; (76): 7-23.

Love, A.  (1991).  Internal evaluation: building organizations from within . Newbury Park, CA: Sage Publications.

Miles, M., & Huberman, A. (1994).  Qualitative data analysis: a sourcebook of methods . Thousand Oaks, CA: Sage Publications, Inc.

National Quality Program. (1999). National Quality Program, vol. 1999. National Institute of Standards and Technology.

National Quality Program. (1999). Baldrige index outperforms S&P 500 for fifth year, vol. 1999. National Quality Program.

National Quality Program. (1998). Health care criteria for performance excellence, vol. 1999. National Quality Program.

Newcomer, K. (1994). Using statistics appropriately. In Handbook of practical program evaluation, edited by Wholey, J., Hatry, H., & Newcomer, K. San Francisco, CA: Jossey-Bass: 389-416.

Patton, M. (1990).  Qualitative evaluation and research methods . Newbury Park, CA: Sage Publications.

Patton, M. (1997). Toward distinguishing empowerment evaluation and placing it in a larger context. Evaluation Practice, 18(2), 147-63.

Patton, M. (1997).  Utilization-focused evaluation . Thousand Oaks, CA: Sage Publications.

Perrin, B. (1998). Effective use and misuse of performance measurement. American Journal of Evaluation, 19(3), 367-79.

Perrin, E., & Koshel, J. (1997). Assessment of performance measures for community health and development, substance abuse, and mental health. Washington, DC: National Academy Press.

Phillips, J. (1997).  Handbook of training evaluation and measurement methods . Houston, TX: Gulf Publishing Company.

Porteous, N., Sheldrick, B., & Stewart, P. (1997). Program evaluation tool kit: A blueprint for community health and development management. Ottawa, Canada: Community Health and Development Research, Education, and Development Program, Ottawa-Carleton Health Department.

Posavac, E., & Carey, R. (1980). Program evaluation: Methods and case studies. Englewood Cliffs, NJ: Prentice-Hall.

Preskill, H. & Torres R. (1998).  Evaluative inquiry for learning in organizations . Thousand Oaks, CA: Sage Publications.

Public Health Functions Project. (1996). The public health workforce: an agenda for the 21st century . Washington, DC: U.S. Department of Health and Human Services, Community health and development Service.

Public Health Training Network. (1998).  Practical evaluation of public health programs . CDC, Atlanta, GA.

Reichardt, C., & Mark M. (1998).  Quasi-experimentation . In Handbook of applied social research methods, edited by L Bickman and DJ Rog. Thousand Oaks, CA: Sage Publications, 193-228.

Rossi, P., & Freeman H.  (1993).  Evaluation: a systematic approach . Newbury Park, CA: Sage Publications.

Rush, B., & Ogborne, A. (1995). Program logic models: Expanding their role and structure for program planning and evaluation. Canadian Journal of Program Evaluation, 6, 95-106.

Sanders, J. (1993).  Uses of evaluation as a means toward organizational effectiveness. In A vision of evaluation , edited by ST Gray. Washington, DC: Independent Sector.

Schorr, L. (1997).   Common purpose: strengthening families and neighborhoods to rebuild America . New York, NY: Anchor Books, Doubleday.

Scriven, M. (1998) . A minimalist theory of evaluation: the least theory that practice requires . American Journal of Evaluation.

Shadish, W., Cook, T., Leviton, L. (1991).  Foundations of program evaluation . Newbury Park, CA: Sage Publications.

Shadish, W. (1998).   Evaluation theory is who we are. American Journal of Evaluation:19(1):1-19.

Shulha, L., & Cousins, J. (1997).  Evaluation use: theory, research, and practice since 1986 . Evaluation Practice.18(3):195-208

Sieber, J. (1998).   Planning ethically responsible research . In Handbook of applied social research methods, edited by L Bickman and DJ Rog. Thousand Oaks, CA: Sage Publications: 127-56.

Steckler, A., McLeroy, K., Goodman, R., Bird, S., & McCormick, L. (1992). Toward integrating qualitative and quantitative methods: An introduction. Health Education Quarterly, 19(1), 1-8.

Taylor-Powell, E., Rossing, B., Geran, J. (1998). Evaluating collaboratives: reaching the potential. Madison, Wisconsin: University of Wisconsin Cooperative Extension.

Teutsch, S. (1992). A framework for assessing the effectiveness of disease and injury prevention. Morbidity and Mortality Weekly Report: Recommendations and Reports, 41(RR-3), 1-13.

Torres, R., Preskill, H., Piontek, M., (1996).   Evaluation strategies for communicating and reporting: enhancing learning in organizations . Thousand Oaks, CA: Sage Publications.

Trochim, W. (1999). Research methods knowledge base.

United Way of America. Measuring program outcomes: a practical approach . Alexandria, VA: United Way of America, 1996.

U.S. General Accounting Office. Case study evaluations . GAO/PEMD-91-10.1.9. Washington, DC: U.S. General Accounting Office, 1990.

U.S. General Accounting Office. Designing evaluations . GAO/PEMD-10.1.4. Washington, DC: U.S. General Accounting Office, 1991.

U.S. General Accounting Office. Managing for results: measuring program results that are under limited federal control . GAO/GGD-99-16. Washington, DC: 1998.

U.S. General Accounting Office. Prospective evaluation methods: the prospective evaluation synthesis. GAO/PEMD-10.1.10. Washington, DC: U.S. General Accounting Office, 1990.

U.S. General Accounting Office. The evaluation synthesis . Washington, DC: U.S. General Accounting Office, 1992.

U.S. General Accounting Office. Using statistical sampling . Washington, DC: U.S. General Accounting Office, 1992.

Wandersman, A., Morrissey, E., Davino, K., Seybolt, D., Crusto, C., et al. Comprehensive quality programming and accountability: eight essential strategies for implementing successful prevention programs . Journal of Primary Prevention 1998;19(1):3-30.

Weiss, C. (1995). Nothing as practical as a good theory: Exploring theory-based evaluation for comprehensive community initiatives for families and children. In New approaches to evaluating community initiatives, edited by Connell, J., Kubisch, A., Schorr, L., & Weiss, C. New York, NY: Aspen Institute.

Weiss, C. (1998).  Have we learned anything new about the use of evaluation? American Journal of Evaluation;19(1):21-33.

Weiss, C. (1997).  How can theory-based evaluation make greater headway? Evaluation Review 1997;21(4):501-24.

W.K. Kellogg Foundation. (1998). The W.K. Kellogg Foundation Evaluation Handbook. Battle Creek, MI: W.K. Kellogg Foundation.

Wong-Reiger, D., & David, L. (1995). Using program logic models to plan and evaluate education and prevention programs. In Evaluation methods sourcebook II, edited by Love, A.J. Ottawa, Ontario: Canadian Evaluation Society.

Wholey, J., Hatry, H., & Newcomer, K. (Eds.). (2010). Handbook of practical program evaluation. Jossey-Bass. This book serves as a comprehensive guide to the evaluation process and its practical applications for sponsors, program managers, and evaluators.

Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The program evaluation standards: A guide for evaluators and evaluation users (3rd ed.). Thousand Oaks, CA: Sage Publications.

Yin, R. (1988).  Case study research: design and methods . Newbury Park, CA: Sage Publications.

  • Transplant Surgery
  • Trauma and Orthopaedic Surgery
  • Vascular Surgery
  • Browse content in Science and Mathematics
  • Browse content in Biological Sciences
  • Aquatic Biology
  • Biochemistry
  • Bioinformatics and Computational Biology
  • Developmental Biology
  • Ecology and Conservation
  • Evolutionary Biology
  • Genetics and Genomics
  • Microbiology
  • Molecular and Cell Biology
  • Natural History
  • Plant Sciences and Forestry
  • Research Methods in Life Sciences
  • Structural Biology
  • Systems Biology
  • Zoology and Animal Sciences
  • Browse content in Chemistry
  • Analytical Chemistry
  • Computational Chemistry
  • Crystallography
  • Environmental Chemistry
  • Industrial Chemistry
  • Inorganic Chemistry
  • Materials Chemistry
  • Medicinal Chemistry
  • Mineralogy and Gems
  • Organic Chemistry
  • Physical Chemistry
  • Polymer Chemistry
  • Study and Communication Skills in Chemistry
  • Theoretical Chemistry
  • Browse content in Computer Science
  • Artificial Intelligence
  • Computer Architecture and Logic Design
  • Game Studies
  • Human-Computer Interaction
  • Mathematical Theory of Computation
  • Programming Languages
  • Software Engineering
  • Systems Analysis and Design
  • Virtual Reality
  • Browse content in Computing
  • Business Applications
  • Computer Security
  • Computer Games
  • Computer Networking and Communications
  • Digital Lifestyle
  • Graphical and Digital Media Applications
  • Operating Systems
  • Browse content in Earth Sciences and Geography
  • Atmospheric Sciences
  • Environmental Geography
  • Geology and the Lithosphere
  • Maps and Map-making
  • Meteorology and Climatology
  • Oceanography and Hydrology
  • Palaeontology
  • Physical Geography and Topography
  • Regional Geography
  • Soil Science
  • Urban Geography
  • Browse content in Engineering and Technology
  • Agriculture and Farming
  • Biological Engineering
  • Civil Engineering, Surveying, and Building
  • Electronics and Communications Engineering
  • Energy Technology
  • Engineering (General)
  • Environmental Science, Engineering, and Technology
  • History of Engineering and Technology
  • Mechanical Engineering and Materials
  • Technology of Industrial Chemistry
  • Transport Technology and Trades
  • Browse content in Environmental Science
  • Applied Ecology (Environmental Science)
  • Conservation of the Environment (Environmental Science)
  • Environmental Sustainability
  • Environmentalist Thought and Ideology (Environmental Science)
  • Management of Land and Natural Resources (Environmental Science)
  • Natural Disasters (Environmental Science)
  • Nuclear Issues (Environmental Science)
  • Pollution and Threats to the Environment (Environmental Science)
  • Social Impact of Environmental Issues (Environmental Science)
  • History of Science and Technology
  • Browse content in Materials Science
  • Ceramics and Glasses
  • Composite Materials
  • Metals, Alloying, and Corrosion
  • Nanotechnology
  • Browse content in Mathematics
  • Applied Mathematics
  • Biomathematics and Statistics
  • History of Mathematics
  • Mathematical Education
  • Mathematical Finance
  • Mathematical Analysis
  • Numerical and Computational Mathematics
  • Probability and Statistics
  • Pure Mathematics
  • Browse content in Neuroscience
  • Cognition and Behavioural Neuroscience
  • Development of the Nervous System
  • Disorders of the Nervous System
  • History of Neuroscience
  • Invertebrate Neurobiology
  • Molecular and Cellular Systems
  • Neuroendocrinology and Autonomic Nervous System
  • Neuroscientific Techniques
  • Sensory and Motor Systems
  • Browse content in Physics
  • Astronomy and Astrophysics
  • Atomic, Molecular, and Optical Physics
  • Biological and Medical Physics
  • Classical Mechanics
  • Computational Physics
  • Condensed Matter Physics
  • Electromagnetism, Optics, and Acoustics
  • History of Physics
  • Mathematical and Statistical Physics
  • Measurement Science
  • Nuclear Physics
  • Particles and Fields
  • Plasma Physics
  • Quantum Physics
  • Relativity and Gravitation
  • Semiconductor and Mesoscopic Physics
  • Browse content in Psychology
  • Affective Sciences
  • Clinical Psychology
  • Cognitive Psychology
  • Cognitive Neuroscience
  • Criminal and Forensic Psychology
  • Developmental Psychology
  • Educational Psychology
  • Evolutionary Psychology
  • Health Psychology
  • History and Systems in Psychology
  • Music Psychology
  • Neuropsychology
  • Organizational Psychology
  • Psychological Assessment and Testing
  • Psychology of Human-Technology Interaction
  • Psychology Professional Development and Training
  • Research Methods in Psychology
  • Social Psychology
  • Browse content in Social Sciences
  • Browse content in Anthropology
  • Anthropology of Religion
  • Human Evolution
  • Medical Anthropology
  • Physical Anthropology
  • Regional Anthropology
  • Social and Cultural Anthropology
  • Theory and Practice of Anthropology
  • Browse content in Business and Management
  • Business Ethics
  • Business Strategy
  • Business History
  • Business and Technology
  • Business and Government
  • Business and the Environment
  • Comparative Management
  • Corporate Governance
  • Corporate Social Responsibility
  • Entrepreneurship
  • Health Management
  • Human Resource Management
  • Industrial and Employment Relations
  • Industry Studies
  • Information and Communication Technologies
  • International Business
  • Knowledge Management
  • Management and Management Techniques
  • Operations Management
  • Organizational Theory and Behaviour
  • Pensions and Pension Management
  • Public and Nonprofit Management
  • Strategic Management
  • Supply Chain Management
  • Browse content in Criminology and Criminal Justice
  • Criminal Justice
  • Criminology
  • Forms of Crime
  • International and Comparative Criminology
  • Youth Violence and Juvenile Justice
  • Development Studies
  • Browse content in Economics
  • Agricultural, Environmental, and Natural Resource Economics
  • Asian Economics
  • Behavioural Finance
  • Behavioural Economics and Neuroeconomics
  • Econometrics and Mathematical Economics
  • Economic History
  • Economic Systems
  • Economic Methodology
  • Economic Development and Growth
  • Financial Markets
  • Financial Institutions and Services
  • General Economics and Teaching
  • Health, Education, and Welfare
  • History of Economic Thought
  • International Economics
  • Labour and Demographic Economics
  • Law and Economics
  • Macroeconomics and Monetary Economics
  • Microeconomics
  • Public Economics
  • Urban, Rural, and Regional Economics
  • Welfare Economics
  • Browse content in Education
  • Adult Education and Continuous Learning
  • Care and Counselling of Students
  • Early Childhood and Elementary Education
  • Educational Equipment and Technology
  • Educational Strategies and Policy
  • Higher and Further Education
  • Organization and Management of Education
  • Philosophy and Theory of Education
  • Schools Studies
  • Secondary Education
  • Teaching of a Specific Subject
  • Teaching of Specific Groups and Special Educational Needs
  • Teaching Skills and Techniques
  • Browse content in Environment
  • Applied Ecology (Social Science)
  • Climate Change
  • Conservation of the Environment (Social Science)
  • Environmentalist Thought and Ideology (Social Science)
  • Natural Disasters (Environment)
  • Social Impact of Environmental Issues (Social Science)
  • Browse content in Human Geography
  • Cultural Geography
  • Economic Geography
  • Political Geography
  • Browse content in Interdisciplinary Studies
  • Communication Studies
  • Museums, Libraries, and Information Sciences
  • Browse content in Politics
  • African Politics
  • Asian Politics
  • Chinese Politics
  • Comparative Politics
  • Conflict Politics
  • Elections and Electoral Studies
  • Environmental Politics
  • European Union
  • Foreign Policy
  • Gender and Politics
  • Human Rights and Politics
  • Indian Politics
  • International Relations
  • International Organization (Politics)
  • International Political Economy
  • Irish Politics
  • Latin American Politics
  • Middle Eastern Politics
  • Political Behaviour
  • Political Economy
  • Political Institutions
  • Political Methodology
  • Political Communication
  • Political Philosophy
  • Political Sociology
  • Political Theory
  • Politics and Law
  • Public Policy
  • Public Administration
  • Quantitative Political Methodology
  • Regional Political Studies
  • Russian Politics
  • Security Studies
  • State and Local Government
  • UK Politics
  • US Politics
  • Browse content in Regional and Area Studies
  • African Studies
  • Asian Studies
  • East Asian Studies
  • Japanese Studies
  • Latin American Studies
  • Middle Eastern Studies
  • Native American Studies
  • Scottish Studies
  • Browse content in Research and Information
  • Research Methods
  • Browse content in Social Work
  • Addictions and Substance Misuse
  • Adoption and Fostering
  • Care of the Elderly
  • Child and Adolescent Social Work
  • Couple and Family Social Work
  • Developmental and Physical Disabilities Social Work
  • Direct Practice and Clinical Social Work
  • Emergency Services
  • Human Behaviour and the Social Environment
  • International and Global Issues in Social Work
  • Mental and Behavioural Health
  • Social Justice and Human Rights
  • Social Policy and Advocacy
  • Social Work and Crime and Justice
  • Social Work Macro Practice
  • Social Work Practice Settings
  • Social Work Research and Evidence-based Practice
  • Welfare and Benefit Systems
  • Browse content in Sociology
  • Childhood Studies
  • Community Development
  • Comparative and Historical Sociology
  • Economic Sociology
  • Gender and Sexuality
  • Gerontology and Ageing
  • Health, Illness, and Medicine
  • Marriage and the Family
  • Migration Studies
  • Occupations, Professions, and Work
  • Organizations
  • Population and Demography
  • Race and Ethnicity
  • Social Theory
  • Social Movements and Social Change
  • Social Research and Statistics
  • Social Stratification, Inequality, and Mobility
  • Sociology of Religion
  • Sociology of Education
  • Sport and Leisure
  • Urban and Rural Studies
  • Browse content in Warfare and Defence
  • Defence Strategy, Planning, and Research
  • Land Forces and Warfare
  • Military Administration
  • Military Life and Institutions
  • Naval Forces and Warfare
  • Other Warfare and Defence Issues
  • Peace Studies and Conflict Resolution
  • Weapons and Equipment

The Oxford Handbook of Quantitative Methods in Psychology, Vol. 1

17 Program Evaluation: Principles, Procedures, and Practices

Aurelio José Figueredo, Department of Psychology, School of Mind, Brain, and Behavior, Division of Family Studies and Human Development, College of Agriculture and Life Sciences, University of Arizona, Tucson, AZ

Sally Gayle Olderbak, Department of Psychology, School of Mind, Brain, and Behavior, Division of Family Studies and Human Development, College of Agriculture and Life Sciences, University of Arizona, Tucson, AZ

Gabriel Lee Schlomer, Division of Family Studies and Human Development, College of Agriculture and Life Sciences, University of Arizona, Tucson, AZ

Rafael Antonio Garcia, Division of Family Studies and Human Development, College of Agriculture and Life Sciences, University of Arizona, Tucson, AZ

Pedro Sofio Abril Wolf, Department of Psychology, University of Cape Town, South Africa

  • Published: 01 October 2013

This chapter provides a review of the current state of the principles, procedures, and practices within program evaluation. We address a few incisive and difficult questions about the current state of the field: (1) What are the kinds of program evaluations? (2) Why do program evaluation results often have so little impact on social policy? (3) Does program evaluation suffer from a counterproductive system of incentives? and (4) What do program evaluators actually do? We compare and contrast the merits and limitations, strengths and weaknesses, and relative progress of the two primary contemporary movements within program evaluation, Quantitative Methods and Qualitative Methods, and we propose an epistemological framework for integrating the two movements as complementary forms of investigation, each contributing to different stages in the scientific process. In the final section, we provide recommendations for systemic institutional reforms addressing identified structural problems within the real-world practice of program evaluation.

Introduction

President Barack Obama’s 2010 Budget included many statements calling for the evaluation of more U. S. Federal Government programs ( Office of Management and Budget, 2009 ). But what precisely is meant by the term evaluation ? Who should conduct these evaluations? Who should pay for these evaluations? How should these evaluations be conducted?

This chapter provides a review of the principles, procedures, and practices within program evaluation . We start by posing and addressing a few incisive and difficult questions about the current state of that field:

What are the different kinds of program evaluations?

Why do program evaluation results often have so little impact on social policy?

Does program evaluation suffer from a counterproductive system of incentives?

We then ask a fourth question regarding the real-world practice of program evaluation: What do program evaluators actually do? In the two sections that follow, we try to answer this question by reviewing the merits and limitations, strengths and weaknesses, and relative progress of the two primary contemporary “movements” within program evaluation and the primary methods of evaluation upon which they rely: Part 1 addresses Quantitative Methods and Part 2 addresses Qualitative Methods. Finally, we propose a framework for the integration of the two movements as complementary forms of investigation in program evaluation, each contributing to different stages in the scientific process. In the final section, we provide recommendations for systemic institutional reforms addressing identified structural problems within the real-world practice of program evaluation.

What Are the Different Kinds of Program Evaluations?

Scriven (1967) introduced the important distinction between summative program evaluations as compared with formative program evaluations. The goal of a summative evaluation is to judge the merits of a fixed, unchanging program as a finished product, relative to potential alternative programs. This judgment should consist of an analysis of the costs and benefits of the program, as compared with other programs targeted at similar objectives, to justify the expenses and opportunity costs society incurs in implementing one particular program as opposed to an alternative program, as well as in contrast to doing nothing at all. Further, a summative evaluation must examine both the intended and the unintended outcomes of the programmatic intervention and not just the specific stated goals, as represented by the originators, administrators, implementers, or advocates of the program ( Scriven, 1991 ). A formative evaluation , on the other hand, is an ongoing evaluation of a program that is not fixed but is still in the process of change. The goal of a formative evaluation is to provide feedback to the program managers with the purpose of improving the program regarding what is and what is not working well and not to make a final judgment on the relative merits of the program.

The purely dichotomous and mutually exclusive model defining the differences between summative and formative evaluations has been softened and qualified somewhat over the years. Tharp and Gallimore (1979, 1982), in their research and development (R&D) program for social action, proposed a model of evaluation succession, patterned on the analogy of ecological succession, wherein an ongoing, long-term evaluation begins as a formative program evaluation and acquires features of a summative program evaluation as the program naturally matures, aided by the continuous feedback from the formative program evaluation process. Similarly, Patton (1996) has proposed a putatively broader view of program evaluation that falls outside the summative-versus-formative dichotomy: (1) knowledge-generating evaluation, which is designed to increase our conceptual understanding of a particular topic; (2) developmental evaluation, an ongoing evaluation that strives to continuously improve the program; and (3) using the evaluation processes, which involves more intently engaging the stakeholders, and others associated with the evaluation, in thinking about the program and ways to improve its efficacy or effectiveness. Patton has argued that the distinction between summative and formative evaluation is decreasing in importance, and that there is a movement within the field of program evaluation toward more creative uses and applications of evaluation. What he termed knowledge-generating evaluation is a form of evaluation focused not on the instrumental use of evaluation findings (e.g., making decisions based on the results of the evaluation) but, rather, on the conceptual use of evaluation findings (e.g., theory construction).

A developmental evaluation (Patton, 1994) is a form of program evaluation that is ongoing and is focused on the development of the program. Evaluators provide constant feedback, but not always in the form of official reports. Developmental evaluation assumes that components of the program under evaluation are constantly changing, and so the evaluation is not geared toward eventually requiring a summative program evaluation but, rather, is focused on constantly adapting and evolving the evaluation to fit the evolving program. Patton (1996) proposed that program evaluators should focus not only on reaching the evaluation outcomes, but also on the process of the evaluation itself, in that the evaluation itself can be “participatory and empowering … increasing the effectiveness of the program through the evaluation process rather than just the findings” (p. 137).

Stufflebeam (2001) has presented a larger classification of the different kinds of evaluation, consisting of 22 alternative approaches to evaluation that can be classified into four categories. Stufflebeam’s first category is called Pseudoevaluations and encompasses evaluation approaches that are often motivated by politics, which may lead to misleading or invalid results. Pseudoevaluation approaches include: (1) Public Relations-Inspired Studies and (2) Politically Controlled Studies (for a description of each of the 22 evaluation approaches, please refer to Stufflebeam’s [2001] original paper). Stufflebeam’s second category is called Questions-And-Methods-Evaluation Approaches (Quasi-Evaluation Studies) and encompasses evaluation approaches geared to address a particular question, or apply a particular method, which often result in narrowing the scope of the evaluation. This category includes: (3) Objectives-Based Studies; (4) Accountability, Particularly Payment-by-Results Studies; (5) Objective Testing Programs; (6) Outcome Evaluation as Value-Added Assessment; (7) Performance Testing; (8) Experimental Studies; (9) Management Information Systems; (10) Benefit–Cost Analysis Approach; (11) Clarification Hearing; (12) Case Study Evaluations; (13) Criticism and Commentary; (14) Program Theory-Based Evaluation; and (15) Mixed-Methods Studies.

Stufflebeam’s (2001) third category, Improvement/Accountability-Oriented Evaluation Approaches, is the most similar to the commonly used definition of program evaluation and encompasses approaches that are extensive and expansive in their scope and selection of outcome variables and that use a multitude of qualitative and quantitative methodologies for assessment. These approaches include: (16) Decision/Accountability-Oriented Studies; (17) Consumer-Oriented Studies; and (18) Accreditation/Certification Approach. Stufflebeam’s fourth category is called Social Agenda/Advocacy Approaches and encompasses evaluation approaches that are geared toward directly benefitting the community in which they are implemented, sometimes to the point that the evaluation may be biased, and that are heavily influenced by the perspectives of the stakeholders. These approaches include: (19) Client-Centered Studies (or Responsive Evaluation); (20) Constructivist Evaluation; (21) Deliberative Democratic Evaluation; and (22) Utilization-Focused Evaluation.

These different types of program evaluations are not exhaustive of all the types that exist, but they are the ones that we consider most relevant to the current analysis and ultimate recommendations.

Why Do Program Evaluation Results Often Have So Little Impact on Social Policy?

At the time of writing, the answer to this question is not completely knowable; until we have more research on this point, we can never completely document the impact that program evaluation has on public policy. Many commentators on program evaluation (e.g., Weiss, 1999), however, have made the point that program evaluation does not have as much of an impact on social policy as we would like it to have. To illustrate this point, we will use two representative case studies: the Kamehameha Early Education Project (KEEP) and the Drug Abuse Resistance Education (DARE) program. Although the success or failure of a program and the success or failure of a program evaluation are two different things, one is intimately related to the other, because the success or failure of the program evaluation is necessarily considered in reference to the success or failure of the program under evaluation.

The Frustrated Goals of Program Evaluation

When it comes to public policy, the goal of an evaluation should include helping funding agencies, such as governmental entities, decide whether to terminate, cut back, continue, scale up, or disseminate a program depending on the success or failure of the program, which would be the main goal of a summative program evaluation. An alternative goal might be to suggest modifications to existing programs in response to data gathered and analyzed during an evaluation, which would be the main goal of a formative program evaluation. Although these two goals are the primary purposes of program evaluation, in reality policymakers rarely use evaluation findings for these purposes and rarely make decisions based on the results of evaluations. Even an evaluation that was successful in its process can be blatantly ignored and thus fail in its outcome. We relate this undesirable state of affairs, further below, to the concept of a market failure from economic theory.

According to Weiss (1999), there are four major reasons that program evaluations may not have a direct impact on decisions by policymakers (the “Four I’s”). First, when making decisions, a host of competing interests present themselves. Because of this competition, the results of different evaluations can be used to the benefit or detriment of the causes of various interested parties. Stakeholders with conflicting interests can put the evaluator between a rock and a hard place. An example of this is when a policymaker receives negative feedback regarding a program. On the one hand, the policymaker is interested in supporting successful programs, but on the other hand, a policymaker who needs to get re-elected might not want to be perceived as “the guy who voted no on drug prevention.” Second, the ideologies of different stakeholder groups can also be a barrier to the utilization of program evaluation results. These ideologies filter potential solutions and restrict the results to which policymakers will listen. This occurs most often when the ideology claims that something is “fundamentally wrong.” For example, an abstinence-only program, designed to prevent teenage pregnancy, may be in competition with a program that works better, but because that program passes out condoms to teenagers, the abstinence-only plan may be funded because of the ideologies of the policymakers or their constituents. Third, the information contained in the evaluation report itself can be a barrier. The results of evaluations are not the only source of information and are often not the most salient. Policymakers often have extensive information regarding a potential policy, and the results of the evaluation are competing with every other source of information that can enter the decision-making process. Finally, the institutional characteristics of the program itself can become a barrier. The institution is made up of people working within the context of a set structure and a history of behavior. Because of these institutional characteristics, change may be difficult or even considered “off-limits.” For example, if an evaluation results in advocating the elimination of a particular position, then the results may be overlooked because the individual currently in that position is 6 months from retirement. Please note that we are not making a value judgment regarding the relative merits of such a decision but merely describing the possible situation.

The utilization of the results of an evaluation is the primary objective of an evaluation; however, it is often the case that evaluation results are put aside in favor of other, less optimal actions ( Weiss, 1999 ). This is not a problem novel to program evaluators but a problem that burdens most applied social science. A prime example of this problem is that of the reliability of eyewitness testimony. Since Elizabeth Loftus published her 1979 book, Eyewitness Testimony , there has been extensive work done on the reliability of eyewitnesses and the development of false memories. Nevertheless, it took 20 years for the U. S. Department of Justice to institute national standards reflecting the implications of these findings ( Wells et al., 2000 ). Loftus did accomplish what Weiss refers to as “enlightenment” ( Weiss, 1980 ), or the bringing of scientific data into the applied realm of policymaking. Although ideally programs would implement evaluation findings immediately, this simply does not often happen. As stated by Weiss (1999) , the volume of information that organizations or policymakers have regarding a particular program is usually too vast to be overthrown by one dissenting evaluation. These problems appear to be inherent in social sciences and program evaluation, and it is unclear how to ameliorate them.

To illustrate how programs and program evaluations can succeed or fail, we use two representative case studies: one notable success of the program evaluation process, KEEP, and one notable failure of that process, DARE.

Kamehameha Early Education Project

A classic example of a successful program evaluation described by Tharp and Gallimore (1979) was that of KEEP. The Kamehameha Early Education Project was started in 1970 to improve the reading and general education of Hawaiian children. The project worked closely with program evaluators to identify solutions for many of the unique problems faced by young Hawaiian-American children in their education, from kindergarten through third grade, and to discover methods for disseminating these solutions to the other schools in Hawaii. The evaluation took 7 years before significant improvement was seen and involved a multidisciplinary approach, including theoretical perspectives from the fields of psychology, anthropology, education, and linguistics.

Based on their evaluation of KEEP, Tharp and Gallimore (1979) identified four necessary conditions for a successful program evaluation: (1) longevity—evaluations need time to take place, which requires stability in other areas of the program; (2) stability in the values and goals of the program; (3) stability of funding; and (4) the opportunity for the evaluators’ recommendations to influence the procedure of the program.

In terms of the “Four I’s,” the interests of KEEP were clear and stable. The project was interested in improving general education processes. In terms of ideology and information, KEEP members believed that the evaluation process was vital to its success and trusted the objectivity of the evaluators, taking their suggestions to heart. From its inception, the institution had an evaluation system built in. Since continuing evaluations were in process, the program itself had no history of institutional restriction of evaluations.

Drug Abuse Resistance Education

In this notable case, we are not so much highlighting the failure of a specific program evaluation, or of a specific program per se, as highlighting the institutional failure of program evaluation as a system, at least as currently structured in our society. In the case of DARE, a series of program evaluations produced results that, in the end, were not acted upon. Rather, what should have been recognized as a failed program lives on to this day. The DARE program was started in 1983, and the goal of the program was to prevent drug use. Although there are different DARE curricula, depending on the targeted age group, the essence of the program is that uniformed police officers deliver a curriculum in safe classroom environments aimed at preventing drug use among the students. As of 2004, DARE has been the most successful school-based prevention program in attracting federal money: the estimated average federal expenditure is three-quarters of a billion dollars per year (West & O’Neal, 2004). Although DARE is successful at infiltrating school districts and attracting tax dollars, research spanning more than two decades has shown that the program is ineffective at best and detrimental at worst. One of the more recent meta-analyses (West & O’Neal, 2004) estimated that the average effect size for DARE’s effectiveness was extremely low and not even statistically significant (r = 0.01; Cohen’s d = 0.02, 95% confidence interval = [–0.04, 0.08]).
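As a rough arithmetic check on the figures reported above, the two effect size metrics are related by the standard conversion d = 2r / sqrt(1 - r^2). The short Python sketch below is our own illustration of that formula (the helper name r_to_d is arbitrary), not part of West and O’Neal’s (2004) analysis; plugging in the reported r recovers the reported d.

    import math

    def r_to_d(r):
        """Convert a correlation-metric effect size r to Cohen's d
        using the standard formula d = 2r / sqrt(1 - r**2)."""
        return 2 * r / math.sqrt(1 - r ** 2)

    r = 0.01                 # average effect size reported for DARE
    d = r_to_d(r)
    print(f"r = {r:.2f} corresponds to Cohen's d = {d:.3f}")   # prints ~0.020

With an effect this close to zero and a confidence interval straddling zero, the practical conclusion is the same under either metric.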

Early studies pointed to the ineffectiveness of the DARE program ( Ennett, Tobler, Ringwalt, & Flewelling, 1994 ; Clayton, Cattarello, & Johnstone, 1996 ; Dukes, Ullman, & Stein, 1996 ). In response to much of this research, the Surgeon General placed the DARE program in the “Does Not Work” category of programs in 2001. In 2003, the U. S. Government Accountability Office (GAO) wrote a letter to congressmen citing a series of empirical studies in the 1990s showing that in some cases DARE is actually iatrogenic, meaning that DARE does more harm than good.

Despite all the evidence, DARE is still heavily funded by tax dollars through the following government agencies: California National Guard, Combined Federal Campaign (CFC), Florida National Guard, St. Petersburg College, Multijurisdictional Counterdrug Task Force Training, Indiana National Guard, Midwest Counterdrug Training Center/National Guard, U.S. Department of Defense, U.S. Department of Justice, Bureau of Justice Assistance (BJA), Drug Enforcement Administration, Office of Juvenile Justice and Delinquency Prevention, and the U.S. Department of State.

These are institutional conflicts of interest. As described above, few politicians want to be perceived as “the guy who voted against drug prevention.” The failure of DARE stems primarily from these conflicts of interest. In the absence of any better options, the U.S. Federal Government continues to support DARE, simply because not doing so might make it appear that nothing was being done. At the present writing in 2012, DARE has been in effect for 29 years. Attempting to change the infrastructure of a longstanding program like this would be met with a great deal of resistance.

We chose the DARE example specifically because it is a long-running one: it takes years to determine that something in the system of program evaluation has failed. If this chapter were being written in the early 1990s, people in the field of program evaluation might reasonably be predicting that, based on the data available, this program should either be substantially modified or discontinued. Instead, close to two decades later and after being blacklisted by the government, it is still a very well-funded program. One may argue that the program evaluators themselves did their job; however, what is the use of program evaluation if policymakers are not following recommendations based on data produced by evaluations? Both the scientific and the anecdotal evidence suggest that programs with built-in evaluations achieve better utilization of evaluation results and suggestions. This may partly result from better communication between the evaluator and the stakeholders, but if the evaluator is on a first-name basis (or maybe goes golfing) with the stakeholders, then what happens to his/her ability to remain objective? We will address these important issues in the sections that immediately follow by exploring the extant system of incentives shaping the practice of program evaluation.

What System of Incentives Governs the Practice of Program Evaluation?

Who Are Program Evaluators?

On October 19, 2010, we conducted a survey of the brief descriptions of qualifications and experience posted by program evaluators (344 postings in total) under the “Search Resumes” link on the American Evaluation Association (AEA) website (http://www.eval.org/find_an_evaluator/evaluator_search.asp). Program evaluators’ skill levels were similarly distributed for quantitative (none: 2.0%; entry: 16.9%; intermediate: 37.5%; advanced: 34.9%; expert: 8.4%; strong: 0.3%) and qualitative evaluation experience (none: 1.5%; entry: 17.2%; intermediate: 41.6%; advanced: 27.3%; expert: 12.2%; strong: 0.3%). Program evaluators also reported a range of years of involvement with evaluation (<1 year: 12.5%; 1–2 years: 20.1%; 3–5 years: 24.1%; 6–10 years: 19.8%; >10 years: 23.5%).

In general, program evaluators were highly educated, with the highest degree attained being either a master’s (58.8%) or a doctorate of some sort (36%); fewer program evaluators had only an associate’s (0.3%) or bachelor’s degree (5.0%). The degree specializations were also widely distributed. Only 12.8% of the program evaluators with posted resumes described their education as including some sort of formal training specifically in evaluation. The most frequently mentioned degree specialization was in some field related to Psychology (25.9%), including social psychology and social work. The next most common specialization was in Education (15.4%), followed by Policy (14.0%), Non-Psychology Social Sciences (12.2%), Public Health or Medicine (11.6%), Business (11.3%), Mathematics or Statistics (5.8%), Communication (2.9%), Science (2.3%), Law or Criminal Justice (2.0%), Management Information Systems and other areas related to Technology (1.5%), Agriculture (1.5%), and Other, such as Music (1.7%).
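The percentages above amount to coding each posting into a category and computing category shares. The snippet below is a minimal sketch of that tallying arithmetic; the list of specializations is made-up toy data, not the actual 344 coded postings.

    from collections import Counter

    # Hypothetical stand-in for the coded AEA resume postings (toy data).
    specializations = ["Psychology", "Education", "Policy", "Psychology",
                       "Public Health", "Business", "Psychology", "Education"]

    counts = Counter(specializations)
    total = len(specializations)
    for field, n in counts.most_common():
        print(f"{field:15s} {n:2d}  ({n / total:.1%})")

The same tally-and-percentage logic, applied to the full set of coded postings, yields distributions like those reported above.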

For Whom Do Program Evaluators Work?

We sampled job advertisements for program evaluators from several Internet sources: usajobs.gov, jobbing.com, and the human resources pages of government agencies such as the National Institutes of Health (NIH), the National Institute of Mental Health (NIMH), the Centers for Disease Control and Prevention (CDC), and the GAO. Based on this sampling, we determined that there are four general types of program evaluation jobs.

Many agencies that deliver or implement social programs organize their own program evaluations, and these account for the first, second, and third types of program evaluation jobs available. The first type of program evaluation job is obtained in response to a call or request for proposals for a given evaluation. The second type of program evaluation job is obtained when the evaluand (the program under evaluation) is asked to hire an internal program evaluator to conduct a summative evaluation. The third general type of program evaluation job is obtained when a program evaluator is hired to conduct a formative evaluation; this category could include an employee of the evaluand who serves multiple roles in the organization, such as secretary and data collector.

We refer to the fourth type of program evaluation job as the Professional Government Watchdog. That type of evaluator works for an agency like the GAO. The GAO is an independent agency that answers directly to Congress. The GAO has 3300 workers ( http://www.gao.gov/about/workforce/ ) working in roughly 13 groups: (1) Acquisition and Sourcing Management; (2) Applied Research and Methods; (3) Defense Capabilities and Management; (4) Education, Workforce, and Income Security; (5) Financial Management and Assurance; (6) Financial Markets and Community Investment; (7) Health Care; (8) Homeland Security and Justice; (9) Information Technology; (10) International Affairs and Trade; (11) Natural Resources and Environment; (12) Physical Infrastructure; and (13) Strategic Issues. Each of these groups is tasked with the oversight of a series of smaller agencies that deal with that group’s content. For example, the Natural Resources and Environment group oversees the Department of Agriculture, Department of Energy, Department of the Interior, Environmental Protection Agency, Nuclear Regulatory Commission, Army Corps of Engineers, National Science Foundation, National Marine Fisheries Service, and the Patent and Trademark Office.

With the many billions of dollars being spent by the U.S. government on social programs, we sincerely doubt that 3300 workers can possibly process all the program evaluations performed for the entire federal government. Recall that the estimated average federal expenditure for DARE alone is three-quarters of a billion dollars per year and that this program has been supported continuously for nearly three decades. We believe that such colossal annual expenditures should include enough to pay for a few more of these “watchdogs,” or at least justify the additional expense of doing so.

Who Pays the Piper?

The hiring of an internal program evaluator for the purpose of a summative evaluation is a recipe for an ineffective evaluation. There is a danger that the program evaluator can become what Scriven (1976, 1983) has called a program advocate. According to Scriven, these program evaluators are not necessarily malicious but, rather, could be biased as a result of the nature of the relationship between the program evaluator, the program funder, and the program management. The internal evaluator is generally employed by, and answers to, the management of the program and not directly to the program funder. In addition, because the program evaluator’s job relies on the perceived “success” of the evaluation, there is an incentive to bias the results in favor of the program being evaluated. Scriven has argued that this structure may foster divided loyalties between the program being evaluated and the agency funding the program (Shadish, Cook, & Leviton, 1991). Scriven (1976, 1983) has also argued that, although summative evaluations are necessary for a society to optimize resource allocation, program evaluators should be periodically re-assigned to different program locations to prevent individual evaluators from being co-opted into local structures. The risks of co-opting are explained in the next section.

Moral Hazards and Perverse Incentives

As a social institution, the field of program evaluation has professed very high ethical standards. For example, in 1994 The Joint Committee on Standards for Educational Evaluation produced the Second Edition of an entire 222-page volume on professional standards in program evaluation. Not all were what we would typically call ethical standards per se , but one of the four major categories of professional evaluation standards was called Propriety Standards and addressed what most people would refer to as ethical concerns. The other three categories were denoted Utility Standards, Feasibility Standards , and Accuracy Standards . Although it might be argued that a conscientious program evaluator is ethically obligated to carefully consider the utility, feasibility, and accuracy of the evaluation, it is easy to imagine how an occasional failure in any of these other areas might stem from factors other than an ethical lapse.

So why do we need any protracted consideration of moral hazards and perverse incentives in a discussion of program evaluation? We should make clear at the outset that we do not believe that most program evaluators are immoral or unethical. It is important to note that in most accepted uses of the term, the expression moral hazard makes no assumptions, positive or negative, about the relative moral character of the parties involved, although in some cases the term has unfortunately been used in that pejorative manner. The term moral hazard only refers (or should only refer) to the structure of perverse incentives that constitute the particular hazard in question (Dembe & Boden, 2000). We wish to explicitly avoid the implication that there are immoral or unethical individuals or agencies out there that intentionally corrupt the system for their own selfish benefit. Unethical actors hardly need moral hazards to corrupt them: they are presumably already immoral and can therefore be readily corrupted with little provocation. It is the normally moral or ethical people about whom we need to worry under the current system of incentives, because this system may actually penalize them for daring to do the right thing for society.

Moral hazards and perverse incentives refer to conditions under which the incentive structures in place tend to promote socially undesirable or harmful behavior (e.g., Pauly, 1974). Economic theory refers to the socially undesirable or harmful consequences of such behavior as market failures, which occur when there is an inefficient allocation of goods and services in a market. Arguably, continued public or private funding of an ineffective or harmful social program therefore constitutes a market failure, where the social program is conceptualized as the product that is being purchased. In economics, one of the well-documented causes of market failures is incomplete or incorrect information on which the participants in the market base their decisions. This, then, is how these concepts may relate to the field of program evaluation.

One potential source of incomplete or incorrect information is referred to in economic theory as that of information asymmetry , which occurs in economic transactions where one party has access to either more or better information than the other party. Information asymmetry may thus lead to moral hazard, where one party to the transaction is insulated from the adverse consequences of a decision but has access to more information than another party (specifically, the party that is not insulated from the adverse consequences of the decision in question). Thus, moral hazards are produced when the party with more information has an incentive to act contrary to the interests of the party with less information. Moral hazard arises because one party does not risk the full consequences of its own decisions and presumably acquires the tendency to act less cautiously than otherwise, leaving another party to suffer the consequences of those possibly ill-advised decisions.

Furthermore, a principal-agent problem might also exist where one party, called an agent , acts on behalf of another party, called the principal . Because the principal usually cannot completely monitor the agent, the situation often develops where the agent has access to more information than the principal does. Thus, if the interests of the agent and the principal are not perfectly consistent and mutually aligned with each other, the agent may have an incentive to behave in a manner that is contrary to the interests of the principal. This is the problem of perverse incentives , which are incentives that have unintended and undesirable effects (“unintended consequences”), defined as being against the interests of the party providing the incentives (in this case, the principal). A market failure becomes more than a mere mistake and instead becomes the inevitable product of a conflict of interests between the principal and the agent. A conflict of interests may lead the agent to manipulate the information that they provide to the principal. The information asymmetry thus generated will then lead to the kind of market failure referred to as adverse selection . Adverse selection is a market failure that occurs when information asymmetries between buyers and sellers lead to suboptimal purchasing decisions on the part of the buyer, such as buying worthless or detrimental goods or services (perhaps like DARE?).

When applying these economic principles to the field of program evaluation, it becomes evident that because program evaluators deal purely in information, and this information might be manipulated—either by them or by the agencies for which they work (or both in implicit or explicit collusion)—we have a clear case of information asymmetry. This information asymmetry, under perverse incentives, may lead to a severe conflict of interests between the society or funding agency (the principal) and the program evaluator (the agent). This does not mean that the agent must perforce be corrupted, but the situation does create a moral hazard for the agent, regardless of any individual virtues. If the perverse incentives are acted on (meaning that they do indeed elicit impropriety), then economic theory clearly predicts a market failure, and specifically adverse selection on the part of the principal.
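To make the predicted chain from biased reporting to adverse selection concrete, the toy simulation below is our own illustration (it is not drawn from the sources cited here, and every parameter value is arbitrary). Each program has a hidden true effect; the agent reports that effect plus noise plus an optional upward bias; the principal renews funding whenever the reported effect clears a threshold. As the reporting bias grows, a larger share of the renewed programs are ones with no true effect.

    import random

    random.seed(1)

    def share_funded_but_ineffective(report_bias, n_programs=20_000,
                                     noise_sd=0.1, threshold=0.2):
        """Toy principal-agent model: the principal sees only the agent's
        report, so an upward reporting bias lets ineffective programs
        (true effect <= 0) slip past the funding threshold."""
        funded = funded_ineffective = 0
        for _ in range(n_programs):
            true_effect = random.uniform(-0.2, 0.6)      # hidden from the principal
            reported = true_effect + random.gauss(0, noise_sd) + report_bias
            if reported >= threshold:                    # principal's decision rule
                funded += 1
                if true_effect <= 0:
                    funded_ineffective += 1
        return funded_ineffective / funded if funded else 0.0

    for bias in (0.0, 0.1, 0.3):
        rate = share_funded_but_ineffective(bias)
        print(f"reporting bias {bias:.1f}: {rate:.1%} of renewed programs have no true effect")

The specific numbers do not matter; the point is the direction of the effect. The principal’s decision rule never changes, yet the quality of what gets “purchased” degrades purely because the information reaching the principal is distorted, which is exactly the kind of market failure described above.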

Getting back to the question of the professional standards actually advocated within program evaluation, how do these lofty ideals compare to the kind of behavior that might be expected under moral hazards and perverse incentives, presuming that program evaluators are subject to the same kind of motivations, fallibilities, and imperfections as the rest of humanity? The Joint Committee on Standards for Educational Evaluation (1994) listed the following six scenarios as examples of conflicts of interest:

Evaluators might benefit or lose financially, long term or short term, depending on what evaluation results they report, especially if the evaluators are connected financially to the program being evaluated or to one of its competitors.

The evaluator’s jobs and/or ability to get future evaluation contracts might be influenced by their reporting of either positive or negative findings.

The evaluator’s personal friendships or professional relationships with clients may influence the design, conduct, and results of an evaluation.

The evaluator’s agency might stand to gain or lose, especially if they trained the personnel or developed the materials involved in the program being evaluated.

A stakeholder or client with a personal financial interest in a program may influence the evaluation process.

A stakeholder or client with a personal professional interest in promoting the program being evaluated may influence the outcome of an evaluation by providing erroneous surveys or interview responses. (p. 115)

In response to these threats to the integrity of a program evaluation, the applicable Propriety Standard reads: “Conflicts of interest should be dealt with openly and honestly, so that it does not compromise the evaluation processes and results” ( The Joint Committee on Standards for Educational Evaluation, 1994 , p. 115). Seven specific guidelines are suggested for accomplishing this goal, but many of them appear to put the onus on the individual evaluators and their clients to avoid the problem. For example, the first three guidelines recommend that the evaluator and the client jointly identify in advance possible conflicts of interest, agree in writing to preventive procedures, and seek more balanced outside perspectives on the evaluation. These are all excellent suggestions and should work extremely well in all cases, except where either the evaluator, the client, or both are actually experiencing real-world conflicts of interests. Another interesting guideline is: “Make internal evaluators directly responsible to agency heads, thus limiting the influence other agency staff might have on the evaluators” (p. 116). We remain unconvinced that the lower-echelon and often underpaid agency staff have more of a vested interest in the outcome of an evaluation than the typically more highly paid agency head presumably managing the program being evaluated.

A similar situation exists with respect to the Propriety Standards for the Disclosure of Findings: “The formal parties to an evaluation should ensure that the full set of evaluation findings along with pertinent limitations are made accessible to the persons affected by the evaluation, and any others with expressed legal rights to receive the results” ( The Joint Committee on Standards for Educational Evaluation, 1994 , p. 109). This statement implicitly recognizes the problem of information asymmetry described above but leaves it up to the “formal parties to an evaluation” to correct the situation. In contrast, we maintain that these are precisely the interested parties that will be most subject to moral hazards and perverse incentives and are therefore the least motivated by the financial, professional, and possibly even political incentives currently in place to act in the broader interests of society as a whole in the untrammelled public dissemination of information.

Besides financial gain or professional advancement, Stufflebeam (2001) has recognized that political gains and motivations also play a role in the problem of information asymmetry:

The advance organizers for a politically controlled study include implicit or explicit threats faced by the client for a program evaluation and/or objectives for winning political contests. The client’s purpose in commissioning such a study is to secure assistance in acquiring, maintaining, or increasing influence, power, and/or money. The questions addressed are those of interest to the client and special groups that share the client’s interests and aims. Two main questions are of interest to the client: What is the truth, as best can be determined, surrounding a particular dispute or political situation? What information would be advantageous in a potential conflict situation? … Generally, the client wants information that is as technically sound as possible. However, he or she may also want to withhold findings that do not support his or her position. The strength of the approach is that it stresses the need for accurate information. However, because the client might release information selectively to create or sustain an erroneous picture of a program’s merit and worth, might distort or misrepresent the findings, might violate a prior agreement to fully release findings, or might violate a “public’s right to know” law, this type of study can degenerate into a pseudoevaluation. (p. 10–11)

By way of solutions, Stufflebeam (2001) then offers:

While it would be unrealistic to recommend that administrators and other evaluation users not obtain and selectively employ information for political gain, evaluators should not lend their names and endorsements to evaluations presented by their clients that misrepresent the full set of relevant findings, that present falsified reports aimed at winning political contests, or that violate applicable laws and/or prior formal agreements on release of findings. (p. 10)

Like most of the guidelines offered by The Joint Committee on Standards for Educational Evaluation (1994) for the Disclosure of Findings, this leaves it to the private conscience of the individual administrator or evaluator not to abuse their position of privileged access to the information produced by program evaluation. It also necessarily relies on the individual administrator’s or evaluator’s self-reflective and self-critical awareness of any biases or selective memory for facts that they might bring to the evaluation process, so that they can remain intellectually alert to, and on guard against, them.

To be fair, some of the other suggestions offered in both of these sections of the Propriety Standards are more realistic, but it is left unclear exactly who is supposed to be specifically charged with either implementing or enforcing them. If it is again left up to either the evaluator or the client, acting either individually or in concert, it hardly addresses the problems that we have identified. We will take up some of these suggestions later in this chapter and make specific recommendations for systemic institutional reforms as opposed to individual exhortations to virtue.

As should be clear from our description of the nature of the problem, it is impossible under information asymmetry to identify specific program evaluations that have been subject to these moral hazards, precisely because they are pervasive and not directly evident (almost by definition) in any individual final product. There is so much evidence for these phenomena from other fields, such as experimental economics, that the problems we are describing should be considered more than unwarranted speculation. This is especially true in light of the fact that some of our best hypothetical examples came directly from the 1994 book cited above on professional evaluation standards, indicating that these problems have been widely recognized for some time. Further, we do not think that we are presenting a particularly pejorative view of program evaluation collectively or of program evaluators individually: we are instead describing how some of the regrettable limitations of human nature, common to all areas of human endeavor, are exacerbated by the way that program evaluations are generally handled at the institutional level. The difficult situation of the honest and well-intentioned program evaluator under the current system of incentives is just a special case of this general human condition, which subjects both individuals and agencies to a variety of moral hazards.

Cui Bono? The Problem of Multiple Stakeholders

In his historic speech Pro Roscio Amerino, delivered in 80 BC, Marcus Tullius Cicero is quoted as saying ( Berry, 2000 ):

The famous Lucius Cassius, whom the Roman people used to regard as a very honest and wise judge, was in the habit of asking, time and again, “To whose benefit?”

That speech made the expression “cui bono?” famous for the two millennia that followed. In program evaluation, we have a technical term for the generic answer to that question. Stakeholders are defined as the individuals or organizations that are either directly or indirectly affected by the program and its evaluation ( Rossi & Freeman, 1993 ). A subtle difference is that stakeholders can either gain or lose and do not always stand to benefit, but the principle is the same. Much of what has been written about stakeholders in program evaluation is emphatic on the point that the paying client is neither the only, nor necessarily the most important, stakeholder involved. The evaluator is responsible for providing information to a multiplicity of different interest groups. This casts the program evaluator more in the role of a public servant than a private contractor.

For example, The Joint Committee on Standards for Educational Evaluation (1994) addressed the problem of multiple stakeholders under several different and very interesting headings. First, under Utility Standards, they state that Stakeholder Identification is necessary so that “[p]ersons involved in or affected by the evaluation should be identified, so that their needs can be addressed” (p. 23). This standard presupposes the rather democratic and egalitarian assumption that the evaluation is being performed to address the needs of all affected and not just those of the paying client.

Second, in the Feasibility Standards, under Political Viability, they explain that “[t]he evaluation should be planned and conducted with anticipation of the different positions of various interest groups, so that their cooperation might be obtained, and so that possible attempts by any of these groups to curtail evaluation operations or to bias or misapply the results can be averted or counteracted” (p. 63). This standard instead presupposes that the diverse stakeholder interests have to be explicitly included within the evaluation process because of political expediency, at the very least as a practical matter of being able to effectively carry out the evaluation, given the possible interference by these same special interest groups. The motivation of the client in having to pay to have these interests represented, and of the evaluator in recommending that this be done, might therefore be one of pragmatic or “enlightened” self-interest rather than of purely altruistic and public-spirited goals.

Third, in the Propriety Standards, under Service Orientation , they state: “Evaluations should be designed to assist organizations to address and effectively serve the needs of the targeted participants” (p. 83). This standard presupposes that both the client, directly, and the evaluator, indirectly, are engaged in public service for the benefit of these multiple stakeholders. Whether this results from enlightened self-interest on either of their parts, with an eye to the possible undesirable consequences of leaving any stakeholder groups unsatisfied, or from disinterested and philanthropic communitarianism is left unclear.

Fourth, in the Propriety Standards, under Disclosure of Findings, as already quoted above, there is the statement that the full set of evaluation findings should be made accessible to all the persons affected by the evaluation and not just to the client. This standard again presupposes that the evaluation is intended and should be designed for the ultimate benefit of all persons affected. So all persons affected are evidently “cui bono?” As another ancient aphorism goes, “vox populi, vox dei” (“the voice of the people is the voice of god,” first attested to have been used by Alcuin of York, who disagreed with the sentiment, in a letter to Charlemagne in 798 AD; Page, 1909, p. 61).

Regardless of the subtle differences in perspective among many of these standards, all of them present us with a very broad view of whom program evaluators should actually take themselves to be working for. These standards again reflect very lofty ethical principles. However, we maintain that the proposed mechanisms and guidelines for achieving those goals remain short of adequate to ensure success.

What Do Program Evaluators Actually Do? Part I: Training and Competencies

Conceptual foundations of professional training.

Recent attempts have been made ( King, Stevahn, Ghere, & Minnema, 2001 ; Stevahn, King, Ghere, & Minnema, 2005 ) at formalizing the competencies, and the subsequent training, necessary for program evaluators. These studies have relied on the views of practicing evaluators regarding the essential competencies of an effective evaluator. Participants were asked to rate the perceived importance of a variety of skills that an evaluator should presumably have. In the first of these studies ( King et al., 2001 ), there was remarkably general agreement among evaluators on the competencies that an evaluator should possess. For example, high agreement was observed for characteristics such as the ability to collect, analyze, and interpret data as well as to report the results. In addition, there was almost universal agreement regarding the evaluator’s ability to frame the evaluation question as well as understand the evaluation process. These areas of agreement suggest that the essential training that evaluators should have is in the areas of data-collection methods and data-analytic techniques. Surprisingly, however, there was considerable disagreement regarding the ability to do research-oriented activities, drawing a line between conducting evaluation and conducting research. Nonetheless, we believe that training in research-oriented activities is essential to program evaluation because skills such as framing questions, data collection, and data analysis and interpretation are gained through formal training in research methods. This evidently controversial position will be defended further below. Formal training standards are not yet developed for the field of evaluation ( Stevahn et al., 2005 ). However, it does appear that the training necessary to be an effective evaluator includes formal and rigorous training in both research methods and the statistical models that are most appropriate to those methods. Further below, we outline some of the research methodologies and statistical models that are most common within program evaluation.

In addition to purely data-analytic models, however, logic models provide program evaluators with an outline, or a roadmap, for achieving the outcome goals of the program and illustrate relationships between resources available, planned activities, and the outcome goals. The selection of outcome variables is important because these are directly relevant to the assessment of the success of the program. An outcome variable refers to the specific changes that the program of interest is intended to produce. Outcome variables can be specified at the level of the individual, group, or population and can refer to a change in specific behaviors, practices, or ways of thinking. A generic outline for developing a logic model is presented by the United Way (1996). They define a logic model as including four components. The first component is called Inputs and refers to the resources available to the program, including financial funds, staff, volunteers, equipment, and any potential constraints, such as licensure. The second component is called Activities and refers to any planned services by the program, such as tutoring, counseling, or training. The third component is called Outputs and refers to the number of participants reached, activities performed, products or services delivered, and so forth. The fourth component is called Outcomes and refers to the benefits produced by those outputs for the participants or community that the program was directed to help. Each component of the logic model can be further divided into initial or intermediate goals, with a long- or short-term timeframe, and can include multiple items within each component.
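
To make the four United Way (1996) components concrete, the sketch below represents a logic model as a simple data structure. The program name and all entries are hypothetical illustrations of our own, not part of the United Way template.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class LogicModel:
    """Minimal representation of the four United Way (1996) components."""
    inputs: List[str] = field(default_factory=list)      # resources and constraints
    activities: List[str] = field(default_factory=list)  # planned services
    outputs: List[str] = field(default_factory=list)     # direct products of activities
    outcomes: List[str] = field(default_factory=list)    # benefits for participants

# Hypothetical after-school tutoring program (illustrative entries only)
tutoring = LogicModel(
    inputs=["2 staff tutors", "20 volunteer hours/week", "district funding"],
    activities=["weekly math tutoring sessions", "parent workshops"],
    outputs=["number of students tutored", "number of sessions delivered"],
    outcomes=["improved math grades", "increased homework completion"],
)

for component, items in vars(tutoring).items():
    print(component, "->", items)
```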

Table 17.1 displays an example of a logic model. The logic model shown is a tabular representation that we prepared of the VERB Logic Model developed for the Youth Media Campaign Longitudinal Survey, 2002–2004 ( Center for Disease Control, 2007 ). This logic model describes the sequence of events envisioned by the program for bringing about behavior change, presenting the expected relations between the campaign inputs, activities, impacts, and outcomes. A PDF of the original figure can be downloaded directly from the CDC website ( http://www.cdc.gov/youthcampaign/research/PDF/LogicModel.pdf ).

We believe that it is essential for program evaluators to be trained in the development and application of logic models because they can assist immensely in both the design and the analysis phases of the program evaluation. It is also extremely important that the collaborative development of logic models be used as a means of interacting and communicating with the program staff and stakeholders during this process, as an additional way of making sure that their diverse interests and concerns are addressed in the evaluation of the program.

Conceptual Foundations of Methodological and Statistical Training

In response to a previous assertion by Shadish, Cook, and Leviton (1991) that program evaluation was not merely “applied social science,” Sechrest and Figueredo (1993) argued that the reason that this was so was:

Shadish et al. (1991) appeal to the peculiar problems manifest in program evaluation. However, these various problems arise not merely in program evaluation but whenever one tries to apply social science. The problems, then, arise not from the perverse peculiarities of program evaluation but from the manifest failure of much of mainstream social science and the identifiable reasons for that failure. (p. 646–647)

These “identifiable reasons” consisted primarily of various common methodological practices that led to the “chronically inadequate external validity of the results of the dominant experimental research paradigm” (p. 647) that had been inadvisedly adopted by mainstream social science.

According to Sechrest and Figueredo (1993) , the limitations of these sterile methodological practices were very quickly recognized by program evaluators, who almost immediately began creating the quasi-experimental methods that were more suitable for real-world research and quickly superseded the older laboratory-based methods, at least within program evaluation:

Arguably, for quasi-experimentation, the more powerful and sophisticated intellectual engines of causal inference are superior, by now, to those of the experimental tradition. (p. 647)

The proposed distinction between program evaluation and applied social science was therefore more a matter of practice than a matter of principle . Program evaluation had adopted methodological practices that were appropriate to its content domain, which mainstream social science had not. The strong implication was that the quasi-experimental methodologies developed within program evaluation would very likely be more suitable for applied social science in general than the dominant experimental paradigm.

Similarly, we extend this line of reasoning to argue that program evaluators do not employ a completely unique set of statistical methods either. However, because program evaluators disproportionately employ a certain subset of research methods, which are now in more general use throughout applied psychosocial research, it necessarily follows that they must therefore disproportionally employ a certain subset of statistical techniques that are appropriate to those particular designs. In the sections below, we therefore concentrate on the statistical techniques that are in most common use in program evaluation, although these data-analytic methods are not unique to program evaluation per se .

What Do Program Evaluators Actually Do? Part II: Quantitative Methods

Foundations of quantitative methods: methodological rigor.

Even its many critics acknowledge that the hallmark and main strength of the so-called quantitative approach to program evaluation resides primarily in its methodological rigor, whether it is applied in shoring up the process of measurement or in buttressing the strength of causal inference. In the following sections, we review a sampling of the methods used in quantitative program evaluation to achieve the sought-after methodological rigor, which is the “Holy Grail” of the quantitative enterprise.

Evaluation-Centered Validity

Within program evaluation, and social sciences in general, there are several types of validity that have been identified. Cook and Campbell (1979) formally distinguished between four types of validity more specific to program evaluation: (1) internal validity, (2) external validity, (3) statistical conclusion validity, and (4) construct validity. Internal validity refers to establishing the causal relationship between two variables such as treatment and outcome; external validity refers to supporting the generalization of results beyond a specific study; statistical conclusion validity refers to applying statistical techniques appropriately to a given problem; and construct validity falls within a broader class of validity issues in measurement (e.g. face validity, criterion validity, concurrent validity, etc.) but specifically consists of assessing and understanding program components and outcomes accurately. In the context of a discussion of methods in program evaluation, two forms of validity take primacy: internal and external validity. Each validity type is treated with more detail in the following sections.

internal validity

The utility of a given method in program evaluation is generally measured in terms of how internally valid it is believed to be. That is, the effectiveness of a method in its ability to determine the causal relationship between the treatment and outcome is typically considered in the context of threats to internal validity. There are several different types of threat to internal validity, each of which applies to greater and lesser degrees depending on the given method of evaluation. Here we describe a few possible threats to internal validity.

selection bias

Selection bias is the greatest threat to internal validity for quasi-experimental designs. Selection bias is generally a problem when comparing experimental and control groups that have not been created by the random assignment of participants. In such quasi-experiments, group membership (e.g., treatment vs. control) may be determined by some unknown or little-known variable that may contribute to systematic differences between the groups and may thus become confounded with the treatment.

History is another internal validity threat. History refers to any events, not manipulated by the researcher, that occur between the treatment and the posttreatment outcome measurement and that might even partially account for that posttreatment outcome. Any events that coincide with the treatment, whether systematically related to the treatment or not, that could produce the treatment effects on the outcome are considered history threats. For example, practice effects in test taking could account for differences between pretest and posttreatment scores if the same type of measure is given at each measurement occasion.

Maturation is the tendency for changes in an outcome to occur spontaneously over time. For example, consider a program aimed at increasing formal operations in adolescents. Because formal operations tend to increase over time during adolescence, the results of any program designed to promote formal operations during this time period would be confounded with the natural maturational tendency for formal operations to improve with age.

Finally, regression to the mean may pose another threat to internal validity. These regression artifacts generally occur when participants are selected into treatment groups or programs because they are unusually high or low on certain characteristics. When individuals deviate substantially from the mean, this might in part be attributable to errors of measurement. In such cases, it might be expected that over time, their observed scores will naturally regress back toward the mean, which is more representative of their true scores. In research designs where individuals are selected in this way, programmatic effects are difficult to distinguish from those of regression toward the mean. Several other forms of threats to internal validity are also possible (for examples, see Shadish, Cook, & Campbell, 2002 ; Mark & Cook, 1984 ; Smith, 2010 ).
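
As a brief illustration of the regression artifact just described, the simulation sketch below (our own illustrative numbers, not drawn from any cited study) selects the lowest-scoring individuals on a noisy pretest and shows their scores drifting back toward the mean at a second measurement even though no treatment is given.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
true_score = rng.normal(100, 10, n)           # stable "true" ability
pretest = true_score + rng.normal(0, 5, n)    # observed score = true score + measurement error
posttest = true_score + rng.normal(0, 5, n)   # second measurement, no treatment given

# Select the "extreme" group as a program often would: lowest 10% of observed pretest scores
cutoff = np.quantile(pretest, 0.10)
selected = pretest <= cutoff

print("Selected group pretest mean: ", pretest[selected].mean())
print("Selected group posttest mean:", posttest[selected].mean())   # closer to 100
print("Apparent 'gain' with no treatment:", posttest[selected].mean() - pretest[selected].mean())
```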

external validity

External validity refers to the generalizability of findings, or the application of results beyond the given sample in a given setting. The best way to defend against threats to external validity is to conduct randomized experiments on representative samples, where participants are first randomly drawn from the population and then randomly assigned to the treatment and control groups. Because no prior characteristics are systematically shared within either the treatment or the control group while systematically differing between the two groups, it can be extrapolated that the effect of a program is applicable to others beyond the specific sample assessed. This is not to say that the results of a randomized experiment will be applicable to all populations. For example, if a program is specific to adolescence and was only tested on adolescents, then the impact of the treatment may be specific to adolescents. By contrast, evaluations that involve groups that were nonrandomly assigned face the possibility that the effect of the treatment is specific to the population being sampled and thus may not generalize to other populations. For example, if a program is designed to reduce the recidivism rates of violent criminals, but the participants in a particular program are those who committed a specific violent crime, then the estimated impact of that program may be specific to only those individuals who committed that specific crime and not generalizable to other violent offenders.

Randomized Experiments

Randomized experiments are widely believed to offer evaluators the most effective way of assessing the causal influence of a given treatment or program ( St. Pierre, 2004 ). The simplest type of randomized experiment is one in which individuals are randomly assigned to one of at least two groups—typically a treatment and control group. By virtue of random assignment, the groups are approximately equivalent in their characteristics, and thus threats to internal validity as a result of selection bias are, by definition, ruled out. Thus, the only systematic difference between the groups is implementation of the treatment (or program participation), so that any systematic differences between groups can be safely attributed to receiving or not receiving the treatment. It is the goal of the evaluator to assess this degree of difference to determine the effectiveness of the treatment or program ( Heckman & Smith, 1995 ; Boruch, 1997 ).
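
The following sketch, using simulated data with assumed effect sizes (nothing here comes from a real evaluation), illustrates how random assignment balances participant characteristics in expectation and lets the difference in group means be read as the treatment effect.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 500

# Simulate a participant characteristic that also influences the outcome
baseline_risk = rng.normal(0, 1, n)

# Random assignment: characteristics are balanced across groups in expectation
treated = rng.integers(0, 2, n).astype(bool)

true_effect = 0.5  # assumed treatment effect for this illustration
outcome = 2.0 + 0.8 * baseline_risk + true_effect * treated + rng.normal(0, 1, n)

diff = outcome[treated].mean() - outcome[~treated].mean()
t, p = stats.ttest_ind(outcome[treated], outcome[~treated])
print(f"Estimated treatment effect: {diff:.2f} (t = {t:.2f}, p = {p:.3f})")
print(f"Baseline balance check: {baseline_risk[treated].mean():.3f} vs {baseline_risk[~treated].mean():.3f}")
```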

Although randomized experiments might provide the best method for establishing the causal influence of a treatment or program, they are not without their problems. For example, it may simply be undesirable or infeasible to randomly assign participants to different groups. Randomized experiments may be undesirable if results are needed quickly. In some cases, implementation of the treatment may take several months or even years to complete, precluding timely assessment of the treatment’s effectiveness. In addition, it is not feasible to randomly assign participant characteristics. That is, questions involving race or sex, for example, cannot be randomly assigned, and, therefore, use of a randomized experiment to answer questions that center on these characteristics is impossible. Although experimental methods are useful for eliminating these confounds by distributing participant characteristics evenly across groups, when research questions center on these prior participant characteristics, experimental methods are not feasible for this kind of problem. In addition, there are ethical considerations that must be taken into account before randomly assigning individuals to groups. For example, it would be unethical to assign participants to a cigarette smoking condition or other condition that may cause harm. Furthermore, it is ethically questionable to withhold effective treatment from some individuals and administer treatment to others, such as in cancer treatment or education programs ( see Cook, Cook, & Mark, 1977 ; Shadish et al., 2002 ). Randomized experiments may also suffer from forms of selection bias that randomization does not protect against. For example, selective attrition from treatments may create nonequivalent groups if some individuals are systematically more likely to drop out than others ( Smith, 2010 ). Randomized experiments may also suffer from a number of other drawbacks. For a more technical discussion of the relationship between randomized experiments and causal inference, see Cook, Scriven, Coryn, and Evergreen (2010) .

Quasi-Experiments

Quasi-experiments are identical to randomized experiments with the exception of one element: randomization. In quasi-experimental designs, participants are not randomly assigned to different groups, and thus the groups are considered non-equivalent. However, during data analysis, a program evaluator may attempt to construct equivalent groups through matching. Matching involves creating control and treatment groups that are similar in their characteristics, such as age, race, and sex. Attempts to create equivalent groups through matching may result in undermatching, where groups may be similar in one characteristic (such as race) but nonequivalent in others (such as socioeconomic status). In such situations, a program evaluator may make use of statistical techniques that control for undermatching ( Smith, 2010 ) or decide to only focus on matching those characteristics that could moderate the effects of the treatment.
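
As a hedged sketch of the matching idea (simple greedy nearest-neighbour matching on standardized covariates; the covariates, sample sizes, and numbers are invented for illustration, and this is not a full propensity-score workflow), an evaluator might pair each treated participant with the most similar untreated participant before comparing outcomes.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated covariates (age in years, prior test score) for nonequivalent groups
treated_X = rng.normal([35, 60], [8, 12], size=(50, 2))
control_X = rng.normal([40, 55], [8, 12], size=(200, 2))

# Standardize covariates so age and score contribute comparably to the distance
mu, sd = control_X.mean(axis=0), control_X.std(axis=0)
tz, cz = (treated_X - mu) / sd, (control_X - mu) / sd

# Greedy 1:1 nearest-neighbour matching without replacement
available = set(range(len(control_X)))
matches = []
for i, row in enumerate(tz):
    dists = np.linalg.norm(cz - row, axis=1)
    j = min(available, key=lambda k: dists[k])
    available.remove(j)
    matches.append((i, j))

print("First five matched pairs (treated index, control index):", matches[:5])
```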

Much debate surrounds the validity of using randomized experiments versus quasi-experiments in establishing causality ( see , for example, Cook et al., 2010 ). Our goal in this section is not to evaluate the tenability of asserting causality within quasi-experimental designs (interested readers are referred to Cook & Campbell, 1979 ) but, rather, to describe some of the more common methods that fall under the rubric of quasi-experiments and how they relate to program evaluation.

one-group, posttest-only design

Also called the one-shot case study ( Campbell, 1957 ), the one-group, posttest-only design provides the evaluator with information only about treatment participants and only after the treatment has been administered. It contains neither a pretest nor a control group, and thus conclusions about program impact are generally ambiguous. This design can be diagrammed:

NR   X   O1

The NR refers to the nonrandom participation in this group. The X refers to the treatment, which from left to right indicates that it temporally precedes the outcome (O), and the subscript 1 indicates that the outcome was measured at time-point 1. Although simple in its formulation, this design has a number of drawbacks that may make it undesirable. For example, this design is vulnerable to several threats to internal validity, particularly history threats ( Kirk, 2009 ; Shadish et al., 2002 ). Because there is no other group with which to make comparisons, it is unknown if the treatment is directly associated with the outcome or if other events that coincide with treatment implementation confound treatment effects.

Despite these limitations, there is one circumstance in which this design might be appropriate. As discussed by Kirk (2009) , the one-group, posttest-only design may be useful when sufficient knowledge about the expected value of the dependent variable in the absence of the treatment is available. For example, consider high school students who have taken a course in calculus and recently completed an exam. To assess the impact of the calculus course, one would have to determine the average expected grade on the exam had the students not taken the course and compare it to the scores they actually received ( Shadish et al., 2002 ). In this situation, the expected exam grade for students had they not taken the course would likely be very low compared to the students’ actual grades. Thus, this technique is likely to be useful only when the size of the effect (taking the class) is relatively large and distinct from alternative possibilities (such as a history threat).

posttest-only, nonequivalent groups design

This design is similar to the one-group, posttest-only design in that only posttest measures are available; however, in this design, a comparison group is available. Unlike a randomized experiment with participants randomly assigned to a treatment and a control group, in this design participant group membership is not randomized. This design can be diagrammed:

NR   X   O1
- - - - - - - - - -
NR        O1

Interpretation of this diagram is similar to that of the previous one; however, in this diagram, the dashed line indicates that the participants in each of these groups are different individuals. It is important to note that the individuals in these two groups represent nonequivalent groups and may be systematically different from each other in some uncontrolled extraneous characteristics. This design is a significant improvement over the one-group, posttest-only design in that a comparison group that has not experienced the treatment can be compared on the dependent variable of interest. The principal drawback, however, is that this method may suffer from selection bias if the control and treatment groups differ from each other in a systematic way that is not related to the treatment ( Mark & Cook, 1984 ). For example, participants selected into a treatment based on their need for the treatment may differ on characteristics other than treatment need from those not selected into the treatment.

Evaluators may implement this method when pretest information is not available, such as when a treatment starts before the evaluator has been consulted. In addition, an evaluator may choose to use this method if pretest measurements have the potential to influence posttest outcomes ( Willson & Putnam, 1982 ). For example, consider a program designed to increase spelling ability in middle childhood. At pretest and posttest, children are given a list of words to spell. Program effectiveness would then be assessed by estimating the improvement in spelling, comparing spelling performance before and after the program. However, if the same set of words were given to children at posttest that were administered at the pretest, then the effect of the program might be confounded with a practice effect.

Although it is possible that pretest measures may influence posttest outcomes, such situations are likely to be relatively rare. In addition, the costs of not including a pretest may significantly outweigh the potential benefits ( see   Shadish et al., 2002 ).

one-group, pretest–posttest design

In the pretest–posttest design, participants are assessed before the treatment and assessed again after the treatment has been administered. However, there is no control group comparison. The form of this design is:

NR   O1   X   O2

This design provides a baseline with which to compare the same participants before and after treatment. Change in the outcome between pretest and posttest is commonly attributed to the treatment. This attribution, however, may be misinformed, as the design is vulnerable to threats to internal validity. For example, history threats may occur if uncontrolled extraneous events coincide with treatment implementation. In addition, maturation threats may occur if the outcome of interest naturally changes with time. Finally, if the outcome measure was unusually high or low at pretest, then the change detected by the posttest may not be the result of the treatment but, rather, of regression toward the mean ( Mark & Cook, 1984 ).

Program evaluators might use this method when it is not feasible to administer a program only to one set of individuals and not to another. For example, this method would be useful if a program has been administered to all students in a given school, where there cannot be a comparative control group.

pretest and posttest, nonequivalent groups design

The pretest and posttest, nonequivalent groups design is probably the design most commonly used by program evaluators ( Shadish et al., 2002 ). It combines the previous two designs by including not only pretest and posttest measures but also a control group at pretest and posttest. This design can be diagrammed:

NR   O1   X   O2
- - - - - - - - - - - - -
NR   O1        O2

The advantage of this design is that threats to internal validity can more easily be ruled out ( Mark & Cook, 1984 ). When threats to internal validity are plausible, they can be more directly assessed in this design. Further, in the context of this design, statistical techniques are available to help account for potential biases ( Kenny, 1975 ). Indeed, several authors make recommendations that data should be analyzed in a variety of ways to determine the proper effect size of the treatment and evaluate the potential for selection bias that might be introduced as a result of nonrandom groups ( see   Cook & Campbell, 1979 ; Reichardt, 1979 ; Bryk, 1980 ).
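
In that spirit, the sketch below analyzes simulated pretest–posttest data from nonequivalent groups in two of the commonly recommended ways: a gain-score comparison and an ANCOVA-style regression that adjusts the posttest for pretest differences. All variable names, sample sizes, and effect sizes are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 200

# Nonequivalent groups: the treated group starts out lower at pretest
group = np.repeat([1, 0], n // 2)
pretest = rng.normal(50 - 5 * group, 10)
posttest = pretest + 2 + 4 * group + rng.normal(0, 5, n)  # assumed treatment effect of 4

data = pd.DataFrame({"group": group, "pretest": pretest, "posttest": posttest})

# Analysis 1: gain scores
gain_model = smf.ols("I(posttest - pretest) ~ group", data=data).fit()
# Analysis 2: ANCOVA-style adjustment for pretest differences
ancova_model = smf.ols("posttest ~ group + pretest", data=data).fit()

print("Gain-score estimate of treatment effect:       ", gain_model.params["group"])
print("Pretest-adjusted estimate of treatment effect: ", ancova_model.params["group"])
```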

In summary, the pretest and posttest, nonequivalent groups design, although not without its flaws, is a relatively effective technique for assessing treatment impact. An inherent strength of this design is that with the exception of selection bias as a result of nonrandom groups, no single general threat to internal validity can be assigned. Rather, threats to internal validity are likely to be specific to the given problem under evaluation.

interrupted time series design

The interrupted time series design is essentially an extension of the pretest and posttest, nonequivalent groups design, although it is not strictly necessary to include a control group. Ideally, this design consists of repeated measures of some outcome prior to treatment, implementation of the treatment, and then repeated measures of the outcome after treatment. The general form of this design can be diagrammed as follows (the number of observations shown is illustrative):

NR   O1   O2   O3   O4   X   O5   O6   O7   O8
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
NR   O1   O2   O3   O4        O5   O6   O7   O8

In this diagram, the first line of Os refers to the treatment group, which can be identified by the X among the Os. The second line of Os refers to the control condition, as indicated by the lack of an X. The dashed line between the two conditions indicates that participants are different between the two groups, and the NR indicates that individuals are nonrandomly distributed between the groups.

Interrupted time series design is considered by many to be the most powerful quasi-experimental design to examine the longitudinal effects of treatments ( Wagner et al., 2002 ). Several pieces of information can be gained about the impact of a treatment. The first is a change in the level of the outcome (as indicated by a change in the intercept of the regression line) after the treatment. This simply means that change in mean levels of the outcome as a result of the treatment can be assessed. The second is change in the temporal trajectory of the outcome (as indicated by a change in the slope of the regression line). Because of the longitudinal nature of the data, the temporal trajectories of the outcome can be assessed both pre- and posttreatment, and any change in the trajectories can be estimated. Other effects can be assessed as well, such as any changes in the variances of the outcomes after treatment, whether the effect of the treatment is continuous or discontinuous and if the effect of the treatment is immediate or delayed ( see   Shadish et al., 2002 ). Thus, several different aspects of treatment implementation can be assessed with this design.
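
A minimal segmented-regression sketch for a single interrupted series is shown below (simulated monthly data; the variable names, interruption point, and effect sizes are our own assumptions). The coefficient on `level` captures the immediate shift in the outcome after the treatment, and the coefficient on `trend_change` captures the change in slope.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)

months = np.arange(1, 49)                  # 48 monthly observations
after = (months > 24).astype(int)          # treatment introduced after month 24
time_after = np.where(after == 1, months - 24, 0)

# Simulated outcome: baseline trend, an immediate level drop, and a flatter post-treatment slope
outcome = 100 + 0.5 * months - 8 * after - 0.3 * time_after + rng.normal(0, 2, months.size)

data = pd.DataFrame({"outcome": outcome, "time": months, "level": after, "trend_change": time_after})
model = smf.ols("outcome ~ time + level + trend_change", data=data).fit()
print(model.params)  # 'level' = immediate shift, 'trend_change' = change in slope
```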

In addition to its utility, the interrupted time series design (with a control group) is robust against many forms of internal validity threat. For example, with a control group added to the model, history is no longer a threat because any external event that might have co-occurred with the treatment should have affected both groups, presumably equally. In addition, systematic pretest differences between the treatment and control groups can be more accurately assessed because there are several pretest measures. Overall, the interrupted time series design with a nonequivalent control group is a very powerful design ( Mark & Cook, 1984 ).

One barrier to this design is that several measurements are needed both before and after treatment. This may be impossible if the evaluator was not consulted until after the treatment was implemented. In addition, some evaluators may have to rely on the availability of existing data that they did not collect, or on historical records. These limitations may place constraints on the questions that the evaluator can ask.

regression discontinuity design

First introduced to the evaluation community by Thistlethwaite and Campbell (1960) , the regression-discontinuity design (RDD) provides a powerful and unbiased method for estimating treatment effects that rivals that of a randomized experiment ( see Huitema, 1980 ). The RDD contains both a treatment and a control group. Unlike other quasi-experimental designs, however, the determination of group membership is perfectly known. That is, in the RDD, participants are assigned to either a treatment or control group based on a particular cutoff ( see also Trochim, 1984 , for a discussion of so-called fuzzy regression discontinuity designs). The RDD takes the following form:

OA   C   X   O2
OA   C        O2

OA refers to the pretest measure on which the criterion for group assignment is determined, C refers to the cutoff score for group membership, X refers to the treatment, and O2 refers to the measured outcome. As an example, consider the case where elementary school students are assigned to a program aimed at increasing reading comprehension. Assignment to the program versus no program is determined by a particular cutoff score on a pretest measure of reading comprehension. In this case, group membership (control vs. treatment) is not randomly assigned; however, the principle or decision rule for assignment is perfectly known (i.e., the cutoff score). By directly modeling the known determinant of group membership, the evaluator is able to completely account for the selection process that determined group membership.

The primary threat to the internal validity of the RDD is history, although the tenability of this factor as a threat is often questionable. More importantly, the analyses of RDDs are by nature complex, and correctly identifying the functional forms of the regression parameters (linear, quadratic, etc.) can have a considerable impact on determining the effectiveness of a program ( see   Reichardt, 2009 , for a review).
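
To make the estimation logic concrete, the sketch below fits the simplest linear RDD specification to simulated data: the assignment score is centered at the cutoff, separate slopes are allowed on each side, and the coefficient on the treatment indicator estimates the discontinuity at the cutoff. The cutoff, effect size, and variable names are assumptions for illustration, and, as noted above, the functional form would need to be checked in practice.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 1000

assignment = rng.normal(50, 10, n)           # pretest used to assign students
cutoff = 45
treated = (assignment < cutoff).astype(int)  # low scorers receive the reading program

# Simulated outcome with an assumed program effect of 6 points at the cutoff
outcome = 20 + 0.6 * assignment + 6 * treated + rng.normal(0, 4, n)

data = pd.DataFrame({
    "outcome": outcome,
    "centered": assignment - cutoff,  # center the running variable at the cutoff
    "treated": treated,
})

# Allow different slopes on each side of the cutoff; 'treated' estimates the discontinuity
model = smf.ols("outcome ~ treated * centered", data=data).fit()
print(model.params["treated"])
```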

Measurement and Measurement Issues in Program Evaluation

In the context of program evaluation, three types of measures should be considered: (1) input measures, (2) process measures, and (3) outcome measures ( Hollister & Hill, 1995 ). Input measures consist of more general measures about the program and the participants in them, such as the number of individuals in a given program or the ethnic composition of program participants. Process measures center on the delivery of the program, such as a measure of teaching effectiveness in a program designed to improve reading comprehension in schoolchildren. Outcome measures are those measures that focus on the ultimate result of the program, such as a measure of reading comprehension at the conclusion of the program. Regardless of the type of measurement being applied, it is imperative that program evaluators utilize measures that are consistent with the goals of the evaluation. For example, in an evaluation of the performance of health-care systems around the world, the World Health Organization (WHO) published a report ( World Health Organization, 2000 ) that estimated how well the different health-care systems of different countries were functioning. As a part of this process, the authors of the report sought to make recommendations based on empirical evidence rather than WHO ideology. However, their measure of overall health system functioning was based, in part, on an Internet-based questionnaire of 1000 respondents, half of whom were WHO employees. In this case, the measure used to assess health system functioning was inconsistent with the goals of the evaluation, and this problem did not go unnoticed ( see   Williams, 2001 ). Evaluators should consider carefully what the goals of a given program are and choose measures that are appropriate toward the goals of the program.

An important part of choosing measures appropriate to the goals of a program is choosing measures that are psychometrically sound. At minimum, measures should be chosen that have been demonstrated in past research to have adequate internal consistency. In addition, if the evaluator intends to administer a test multiple times, then the chosen measure should have good test–retest reliability. Similarly, if the evaluator chooses a measure that is scored by human raters, then the measure should show good inter-rater reliability. In addition to these basic characteristics of reliability, measures should also have good validity, in that they actually measure the constructs that they are intended to measure. Published measures are more likely to already possess these qualities and thus may be less problematical when choosing among possible measures.
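
As a small illustration of checking internal consistency, the function below computes Cronbach's alpha from a respondents-by-items score matrix; the five-item scale and responses are simulated placeholders rather than any published measure.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Simulated responses to a hypothetical 5-item scale (200 respondents)
rng = np.random.default_rng(6)
latent = rng.normal(0, 1, (200, 1))
responses = latent + rng.normal(0, 0.8, (200, 5))  # items share a common factor plus noise

print(f"Cronbach's alpha: {cronbach_alpha(responses):.2f}")
```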

It may be the case, however, that either an evaluator is unable to locate an appropriate measure or no appropriate measure currently exists. In this case, evaluators may consider developing their own scales of measurement as part of the process of program evaluation. Smith (2010) has provided a nice tutorial on constructing a survey-based scale for program evaluation. Rather than restate these points, however, we discuss some of the issues that an evaluator may face when constructing new measures in the process of program evaluation. Probably the most important point is that there is no way, a priori, to know that the measure being constructed is valid, in that it measures what it is intended to measure. Presumably the measure will be high in face validity, but this does not necessarily translate into construct validity. Along these lines, if evaluators intend to create their own measure of a given construct in the context of an evaluation, then the measure should be properly vetted regarding its utility in assessing program components before any strong conclusions are drawn.

One way to validate a new measure is to add additional measures in the program evaluation to show convergent and divergent validity. In addition, wherever possible, it would be ideal if pilot data on the constructed measure could be obtained from some of the program participants to help evaluate the psychometric properties of the measure prior to its administration to the larger sample that will constitute the formal program evaluation.
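
For instance, a hedged sketch of such a check might correlate the new measure with an established measure of the same construct (expecting a high correlation) and with a measure of an unrelated construct (expecting a low one); all variables below are simulated placeholders.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 150

construct = rng.normal(0, 1, n)
new_measure = construct + rng.normal(0, 0.5, n)           # the scale under construction
established_measure = construct + rng.normal(0, 0.5, n)   # validated measure of the same construct
unrelated_measure = rng.normal(0, 1, n)                   # measure of a different construct

convergent_r = np.corrcoef(new_measure, established_measure)[0, 1]
divergent_r = np.corrcoef(new_measure, unrelated_measure)[0, 1]
print(f"Convergent correlation: {convergent_r:.2f} (expect high)")
print(f"Divergent correlation:  {divergent_r:.2f} (expect near zero)")
```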

Another problem that program evaluators may face is that of “re-inventing the wheel” when creating a measure from scratch. When constructing a measure, program evaluators are advised to research the construct that they intend to measure so that useful test items can be developed. One way to avoid re-inventing the wheel may be to either borrow items from other validated scales or to modify an existing scale to suit the needs of the program and evaluation, while properly citing the original sources. Collaboration with academic institutions can help facilitate this process by providing resources to which an evaluator may not already have access.

Statistical Techniques in Program Evaluation

Program evaluators may employ a wide variety of techniques to analyze the results of their evaluation. These techniques range from “simple” correlations, t -tests, and analyses of variance (ANOVAs) to more intensive techniques such as multilevel modeling, structural equation modeling, and latent growth curve modeling. It is often the case that the research method chosen for the evaluation dictates the statistical technique used to analyze the resultant data. For experimental designs and quasi-experimental designs, various forms of ANOVA, multiple regression, and non-parametric statistics may suffice. However, for longitudinal designs, there may be more options for the program evaluator in terms of how to analyze the data. In this section, we discuss some of the analytical techniques that might be employed when analyzing longitudinal data and, more specifically, the kind of longitudinal data derived from an interrupted time series design. For example, we discuss the relative advantages and disadvantages of repeated measures analysis of variance (RM-ANOVA), multilevel modeling, and latent growth curve modeling. For a more systematic review of some of the more basic statistical techniques in program evaluation, readers are referred to Newcomer and Wirtz (2004) .

To discuss the properties of each of these techniques, consider a hypothetical longitudinal study on alcohol use among adolescents. Data on alcohol consumption were collected starting when the adolescents were in sixth grade and continued through the twelfth grade. As a part of the larger longitudinal study, a group of adolescents were enrolled in a program aimed at reducing alcohol consumption during adolescence. The task of the evaluator is to determine the effectiveness of the program in reducing alcohol use across adolescence.

One way to analyze such data would be to use RM-ANOVA. In this analysis, the evaluator would have several measures of alcohol consumption across time and another binary variable that coded whether a particular adolescent received the program. When modeling these data, the measures of alcohol consumption would be treated as a repeated (within-subjects) factor, whereas the binary program variable would be treated as a fixed (between-subjects) factor. The results of this analysis would indicate the functional form of the alcohol consumption trend over time as well as whether the trend differed between the two groups (program vs. no program). The advantage of the repeated measures technique is that the full form of the alcohol consumption trajectory can be modeled, and increases and decreases in alcohol consumption can easily be graphically displayed (e.g., in SPSS). In addition, the shape of the trajectory (e.g., linear, quadratic, cubic, etc.) of alcohol consumption can be tested empirically through significance testing. The primary disadvantage of RM-ANOVA in this case is that the test of the difference between the two groups is limited to the shape of the overall trajectory and cannot be extended to specific periods of time. For example, prior to the treatment, we would expect that the two groups should not differ in their alcohol consumption trajectories; only after the treatment do we expect differences. Rather than specifically testing the difference in trajectories following the treatment, a test is being conducted about the overall shape of the curves. In addition, this technique cannot test the assumption that the two groups are equal in their alcohol consumption trajectories prior to the treatment, a necessary precondition for making inferences about the effectiveness of the program. To test these assumptions, we need to move to multilevel modeling (MLM).

Multilevel modeling is a statistical technique designed for use with data that violate the assumption of independence ( see Kenny, Kashy, & Cook, 2006 ). The assumption of independence states that, after controlling for the independent variables, the residuals should be independent across observations. Longitudinal data (as well as dyadic data) tend to violate this assumption. The major advantage of MLM is that the structure of these residual covariances can be directly specified ( see Singer, 1998 , for examples). In addition, and more specifically in reference to the current program evaluation example, the growth function of longitudinal data can be more directly specified in a number of flexible ways ( see , for example, Singer & Willett, 2003 , p. 138). One interesting technique that has seen little utilization in the evaluation field is what has been called a piecewise growth model ( see Seltzer, Frank, & Bryk, 1994 , for an example). In this model, rather than specifying a single linear or curvilinear slope, two slopes with a single intercept are modeled. The initial slope models change up to a specific point, whereas the subsequent slope models change after that point. The utility of this method for interrupted time series analysis should by now be apparent: trajectories of change can be modeled before and after the implementation of a treatment, intervention, or program. In terms of the present example, change in alcohol consumption can be modeled for the entire sample before and after the program implementation. Importantly, different slopes can be estimated for the two different groups (program vs. no program) and empirically tested for differences. For example, consider a model that specifies a linear growth trajectory for the initial slope (prior to the program) and another linear growth trajectory for the subsequent slope (after the program). In a piecewise growth model, significance testing (as well as the estimation of effect sizes) can be performed separately for both the initial slope and the subsequent slope. Further, by adding the fixed effect of program participation (program vs. no program), initial and subsequent slopes for the different groups can be modeled and the differences between the initial and subsequent slopes for the two groups can be tested. With piecewise growth modeling, the evaluator can test the assumption that the initial slopes of the two groups are, in fact, the same, as well as test the hypothesis that, following the program, the growth trajectories of the two groups differ systematically, with the intended effect being that the program group shows a less positive or even negative slope over time (increased alcohol consumption among adolescents being presumed undesirable).
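
A hedged sketch of this piecewise specification follows, using simulated adolescent alcohol data and a linear mixed model with random intercepts (statsmodels' MixedLM). The variable names, the grade-nine knot, and all effect sizes are assumptions for illustration only, and a full analysis would also consider random slopes and alternative covariance structures.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n_teens, grades = 300, np.arange(6, 13)   # grades 6 through 12
knot = 9                                  # program delivered at the end of grade 9 (assumed)

ids = np.repeat(np.arange(n_teens), grades.size)
grade = np.tile(grades, n_teens)
program = (np.arange(n_teens) < n_teens // 2).astype(int)[ids]  # half of the teens get the program

# Piecewise time coding: slope before the knot and slope after the knot
pre = np.minimum(grade, knot) - 6
post = np.maximum(grade - knot, 0)

person_intercept = rng.normal(0, 1, n_teens)[ids]
# Assumed data-generating process: both groups rise before grade 9; the program group rises more slowly afterward
alcohol = (2 + person_intercept + 0.6 * pre + 0.6 * post
           - 0.4 * post * program + rng.normal(0, 0.5, ids.size))

data = pd.DataFrame({"id": ids, "alcohol": alcohol, "pre": pre, "post": post, "program": program})

model = smf.mixedlm("alcohol ~ pre + post + program + pre:program + post:program",
                    data, groups=data["id"]).fit()
print(model.params[["post", "post:program"]])  # post-knot slope and its difference for the program group
```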

Although this method is very useful for interrupted time series design, it is not without its drawbacks. Perhaps one drawback is the complexity of model building; however, this drawback is quickly ameliorated with some research on the topic and perhaps some collaboration. Another drawback to this technique is that the change in subsequent slope may be driven primarily by a large change in behavior immediately following the program and does not necessarily indicate a lasting change over time. Other modeling techniques can be used to explore such variations in behavioral change over time. The interested reader can refer to Singer and Willett (2003) .

Structural equation modeling can also be used to model longitudinal data through the use of latent growth curve models. For technical details on how to specify a latent growth curve model, the interested reader can refer to Duncan, Duncan, and Stryker (2006) . The primary advantage of using latent growth curve modeling over MLM is that latent variables can be used (indeed, piecewise growth models can be estimated in a latent growth model framework as well; see Muthén & Muthén, 2009, p. 105). In addition, more complex models such as multilevel latent growth curve models can be implemented. Such models also account for the interdependence of longitudinal data but are also useful when data are nested—for example, when there is longitudinal data on alcohol consumption in several different schools. These models can become increasingly complex, and it is recommended that evaluators without prior knowledge of this statistical technique seek the advice and possible collaboration with experts on this topic.

What Do Program Evaluators Actually Do? Part III: Qualitative Methods

Foundations of qualitative methods: credibility and quality.

The two principal pillars on which qualitative program evaluation rests are credibility and quality . These two concepts lie at the heart of all qualitative research, regardless of any more specific philosophical or ideological subscriptions ( Patton, 1999 ). Although these concepts are not considered to be purely independent of each other in the literature, for the sake of clarity of explanation, we will treat them as such unless otherwise specified.

credibility

A literature search on the concept of credibility within the qualitative paradigms suggests that the emphasis falls primarily on the researcher and only secondarily on the research itself. The points most notably brought to light are those of researcher competence and trustworthiness .

Competence is the key to establishing the credibility of a researcher. If a researcher is deemed incompetent, then the credibility and quality of the entire study immediately come into question. One of the biggest issues lies with the methodological training of qualitative researchers. In a classic example of the unreliability of eyewitness testimonies, Katzer, Cook, and Crouch (1978) point out what can happen when sufficient training does not occur. Ignorance is not bliss, at least in science. Giving any researcher tools without the knowledge to use them is simply bad policy. Subsequent to their initial training, the next most important consideration with respect to competence is the researcher’s scientific “track record.” If an evaluator has repeatedly demonstrated the ability to perform high-quality research, then it can reasonably be assumed that the researcher is competent.

trustworthiness

Something else to note when considering the credibility of an evaluator is trustworthiness. There is little doubt that the researcher’s history must be taken into account ( Patton, 1999 ). Without knowing where the researcher is “coming from,” in terms of possible ideological commitments, the reports made by a given evaluator may appear objective but might actually be skewed by personal biases. This is especially a problem with the more phenomenological methods of qualitative program evaluation, such as interpretive and social constructionist approaches. As Denzin (1989) and many others have pointed out, pure neutrality or impartiality is rare. This means that not being completely forthright about any personal biases should be a “red flag” regarding the trustworthiness (or lack thereof) of the evaluator.

judging credibility

There are those who argue that credibility and trustworthiness are not traits that evaluators can establish for themselves, but rather that they have to be established by the stakeholders, presumably democratically and with all providing equal input ( Atkinson, Heath, & Chenail, 1991 ). This notion seems to be akin to that of external validity. It is also fundamentally different from another school of thought that claims to be able to increase “truth value” via external auditing (Lincoln & Guba, 1985). As with external validation, Atkinson and colleagues would argue that evaluators are not in a position to judge their own work and that separate entities should be responsible for such judgments. According to this perspective, stakeholders need to evaluate the evaluators. If we continue down that road, then the evaluators of the evaluators will themselves need to be evaluated, and so on and so forth. As the Sixth Satire , written by the first-century Roman poet Decimus Iunius Iuvenalis, asks: “quis custodiet ipsos custodes?” (“who shall watch the watchers?”; Ramsay, 1918 ). The way around this infinite regress is to develop some sort of standard against which the researcher can be compared.

Evaluators can only be as credible as the system that brought them to their current positions. Recall that there is a diverse array of backgrounds among program evaluators and a broad armamentarium of research methods and statistical models from which they can select, as well as the fact that there are currently no formal training standards in program evaluation ( Stevahn et al., 2005 ). Until a standard of training is in place, there is no objective way to assess the credibility of a researcher, and one is forced to rely on highly subjective measures of credibility, fraught with biases and emotional reactions.

quality

The other key concern in qualitative program evaluation is quality. Quality concerns echo those voiced regarding questions of reliability and validity in quantitative research, although the framing of these concepts is done within the philosophical framework of the research paradigm ( Golafshani, 2003 ). Patton, as the “go-to guy” for how to do qualitative program evaluations, has applied quantitative principles to qualitative program evaluation throughout his works ( Patton, 1999 , 1997 , 1990 ), although these principles seem to fall short in application. His primary emphases are on rigor in testing and interpretation.

rigorous testing

Apart from being thorough in the use of any single qualitative method , there appears to be a single key issue with respect to testing rigor, and this is called triangulation .

Campbell discussed the concept of methodological triangulation ( Campbell, 1953 , 1956 ; Campbell & Fiske, 1959 ). Triangulation is the use of multiple methods, each having their own unique biases, to measure a particular phenomenon. This multiple convergence allows for the systematic variance ascribable to the “trait” being measured by multiple indicators to be partitioned from the systematic variance associated with each “method” and from the unsystematic variance attributable to the inevitable and random “error” of measurement, regardless of the method used. Within the context of qualitative program evaluation, this can consist either of mixing quantitative and qualitative methods or of mixing qualitative methods. Patton (1999) outspokenly supported the use of either form of triangulation, because each method of measurement has its own advantages and disadvantages.

Other contributors to this literature have claimed that the “jury is still out” concerning the advantages of triangulation ( Barbour, 1998 ) and that clearer definitions are needed to determine triangulation’s applicability to qualitative methods. Barbour’s claim seems unsupported because it rests on a clear misinterpretation of Patton’s work. Patton advocates a convergence of evidence. Because the nature of qualitative data is not as precise as the nature of quantitative data, traditional hypothesis testing is virtually impossible. Barbour is under the impression that Patton is referring to perfectly congruent results. This is obviously not possible because, as stated above, there will always be some divergence between measures depending on which method of measurement is used. Patton is advocating the use of multiple and mixed methods to produce consistent results. One example of how to execute triangulation within the qualitative paradigm focused on three different educational techniques ( Oliver-Hoyo & Allen, 2006 ). For cooperative grouping, hands-on activities, and graphical skills, these authors used interviews, reflective journal entries, surveys, and field notes. The authors found that the exclusive use of surveys would have led to different conclusions, because the results of the surveys alone indicated that there was either no change or a negative change, whereas the other methods instead indicated that there was a positive change with the use of these educational techniques. This demonstrates the importance of using triangulation. When results diverge, meaning that they show opposing trends using different methods, the accuracy of the findings falls into question.

Lincoln and Guba (1985) have also discussed the importance of triangulation but have emphasized its importance in increasing the rigor and trustworthiness of research with respect to the interpretation stage. This is ultimately because all methods will restrict what inferences can be made from a qualitative study.

rigorous interpretation

As with quantitative program evaluation, qualitative methods require rigorous interpretation at two levels: the microscale, which is the sample, and the macroscale, which for quantitative researchers is the population and for qualitative researchers is most often the social or global implication.

Examining qualitative data is reminiscent of exploratory methods in quantitative research, but without the significance tests. Grounded Theory is one such analytic method: the researcher's job is to systematically consider all of the data and to extract theory from the data (Strauss & Corbin, 1990). The only exception is theory extension, in which working from a preconceived theory is acceptable.

Repeatedly throughout the literature (e.g., Patton, 1999; Atkinson, Heath, & Chenail, 1991; Lincoln & Guba, 1985), the evaluator is emphasized as the key instrument in the analysis of data. Although statistics can be helpful, they are seen as restrictive and as overriding any "insight" from the researcher. Analysis necessarily depends on the "astute pattern recognition" abilities of the investigating researcher (Patton, 1999). What Leech and Onwuegbuzie (2007) have called "data analysis triangulation" is essentially an extension of the triangulation concept described by Patton (1999), applied to data analysis: by analyzing the same data with different techniques, convergence can be assessed, making the findings more credible or trustworthy.
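
As a rough, hypothetical illustration of data analysis triangulation (the responses and both techniques below are invented for this sketch and are not taken from Leech and Onwuegbuzie or Patton), the same open-ended responses can be analysed with a crude keyword tally and with independent holistic ratings, and the evaluator then checks whether the two analyses converge on the same overall pattern.

```python
# A hedged sketch of "data analysis triangulation": the same qualitative material is analysed
# with two different techniques, and convergence between them is checked.
responses = [                                            # invented example responses
    "The workshops were engaging and I learned a lot",
    "Scheduling was confusing and staff seemed rushed",
    "Clear materials, supportive facilitators, very useful",
]

# Technique 1: a crude tally of positive minus negative keywords per response.
positive = {"engaging", "learned", "clear", "supportive", "useful"}
negative = {"confusing", "rushed"}
keyword_scores = [
    sum(w.strip(",.") in positive for w in r.lower().split())
    - sum(w.strip(",.") in negative for w in r.lower().split())
    for r in responses
]

# Technique 2: holistic ratings entered independently by an analyst (+1 favourable, -1 unfavourable).
holistic_ratings = [1, -1, 1]

# Convergence check: do the two analyses classify each response the same way?
agreement = [(k > 0) == (h > 0) for k, h in zip(keyword_scores, holistic_ratings)]
print(keyword_scores, holistic_ratings, "converge:", all(agreement))
```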

Because a large part of qualitative inquiry is subjective and dependent on a researcher's creativity, Patton (1999) has advocated reporting all relevant data and making explicit all thought processes, thus avoiding the problem of interpretive bias. This may allow anyone who reads the evaluation report to determine whether the results and suggestions were sufficiently grounded. Shek et al. (2005) have outlined the steps that must occur to demonstrate that researchers are not simply forcing their own opinions onto their research.

Qualitative Methods in Program Evaluation

The most common methods in qualitative program evaluation are straightforward and fall into one of two broad categories: first-party methods, which are conducted from the perspective of the evaluands (the programs being evaluated), and third-party methods. These methods are also used in more quantitative fields of inquiry, although they are not usually framed there as part of the research process.

first-party methods

When an evaluator directly asks questions of the entities being evaluated, the evaluator is utilizing a first-party method. Included in this category are techniques such as interviews (whether with individuals or focus groups), surveys, open-ended questionnaires, and document analyses.

Interviews, surveys, and open-ended questionnaires are similar in nature. In interviews, the researcher begins with a set of potential questions, and depending on the way in which the individuals within the entity respond, the questioning moves in a particular direction. The key here is that the questioning is fluid, open, and not a forced choice. In the case of surveys and open-ended questionnaires, fixed questions are presented to the individual, but the potential answers are left as open as possible, as in short-answer responses. As with interviews, wherever possible the questioning is kept open rather than forced choice (see Leech & Onwuegbuzie, 2007; Oliver-Hoyo & Allen, 2006; Pugach, 2001; Patton, 1999).

Although document analysis is given its own category in the literature (Pugach, 2001; Patton, 1999), it seems more appropriate to include it among the other first-party methods. Document analysis is usually conducted on prior interviews, transcribed statements, or other official reports. It involves "archival digging" to gather data for the evaluation. Pulling out key "success" or "failure" stories is pivotal to these kinds of analyses, and such stories are used as often as possible for illustrative purposes.

The unifying theme of these first-party techniques is that the information comes from within the entity being evaluated.

third-party methods

The other primary type of methodology used in qualitative research is third-party methods. The two major third-party methods are naturalistic observations and case studies. These methods are more phenomenological in nature and require rigorous training on the part of the researcher for proper execution. These methods are intimately tied with the Competence section above.

Naturalistic observation has been used by biological and behavioral scientists for many years and involves observation of behavior within its natural context. This method involves observing some target (whether a human or a nonhuman animal) performing a behavior in its natural setting. This is most often accomplished by reviewing video recordings or by recording the target in person while not interacting with the target. There are, however, many cases of researchers interacting with the target and then "going native," or becoming a member of the group they initially sought to study (Patton, 1999). Some of the most prominent natural scientists have utilized this method (e.g., Charles Darwin, Jane Goodall, and Isaac Newton). According to Patton (1999), there are well-documented problems with this method, including phenomena like researcher presence effects, "going native," researcher biases, and concerns regarding researcher training. Despite these inherent risks and problems, naturalistic observation has been, and will likely continue to be, a staple method within scientific inquiry.

Case studies can be special cases of naturalistic observation or can be a special kind of "artificial" observation. Case studies provide extensive detail about a few individuals (Banfield & Cayago-Gicain, 2006; Patton, 1999) and can simply be used to demonstrate a point (as in Abma, 2000). Case studies usually take a substantial amount of time to gather appropriate amounts of idiographic data. This method utilizes any records the researcher can obtain regarding the individual being studied (self-report questionnaires, interviews, medical records, performance reviews, financial records, etc.). As with naturalistic observation, case study researchers must undergo much training before they can be deemed "capable" of drawing conclusions based on a single individual. The problems with case studies include all of those of naturalistic observation, with the addition of a greater probability of sampling error. Because case studies are so intensive, they are often also very expensive. The salience and exhaustiveness of a few cases make it difficult to notice larger, nominal trends in the data (Banfield & Cayago-Gicain, 2006). This could also put a disproportionate emphasis on the "tails" of the distribution, although that may be precisely what the researcher wants to accomplish (see next section).

Critiques/Criticisms of Quantitative Methods

One of the major critiques of quantitative methods by those in qualitative evaluation is that of credibility. According to those using qualitative evaluation methods, the relevance of findings from quantitative evaluation to what is "important," or to "the essence" of the question, is rather poor (see discussion in Reichardt & Rallis, 1994a, 1994b). Recall that, according to Atkinson et al. (1991), the relevance and appropriateness of findings cannot be determined by the evaluator; the stakeholders are the only ones who can determine relevance. Although there are those in qualitative program evaluation who think almost everything is caused by factors like "social class" and "disparity in power," Atkinson et al. would argue that the evaluator is not able to determine what is or is not relevant to the reality experienced by the stakeholders.

Another criticism is that quantitative research tends to focus simply on the majority, neglecting the individuals in the outer ends of the normal distribution. This is a valid critique for those quantitative researchers who tend to “drop” their outliers for better model fits. Banfield and Cayago-Gicain (2006) have pointed out that qualitative research allows for more detail on a smaller sample. This allows for more context surrounding individuals to be presented. With additional knowledge from the “atypical” (tails of the distribution) cases, theory can be extracted that fits all of the data best and not just the “typical” person.

Beyond the Qualitative/Quantitative Debate

Debate about the superiority of qualitative versus quantitative methodology has a long history in program evaluation. Prior to the 1970s, randomized experiments were considered the gold standard in impact assessment. More and more, however, as evaluators realized the limitations of randomized experiments, quasi-experiments became more acceptable (Madey, 1982). It was also not until the early 1970s that qualitative methods became more acceptable; however, epistemological differences between the two camps persisted and perpetuated the debate, even leading to distrust and slander between followers of the different perspectives (Kidder & Fine, 1987). In an effort to stem the tide of the qualitative–quantitative debate, some evaluators have long called for integration between the two approaches. By recognizing that methods typically associated with qualitative and quantitative paradigms are not inextricably linked to these paradigms (Reichardt & Cook, 1979), an evaluator has greater flexibility with which to choose specific methods that are simply the most appropriate for a given evaluation question (Howe, 1988). Further, others have pointed out that because the qualitative and quantitative approaches are not entirely incompatible (e.g., Reichardt & Rallis, 1994a, 1994b), common ground can be found between the two methods when addressing evaluation questions.

An evaluator thus may choose to use quantitative or qualitative methods alone or may choose to use both methods in what is known as a mixed methods design. A mixed methods approach to evaluation has been advocated on the basis that the two methods: (1) provide cross-validation (triangulation) of results and (2) complement each other, where the relative weakness of one method becomes the relative strength of the other. For example, despite the purported epistemological differences between the two paradigms, the different approaches to evaluation often lead to the same answers ( Sale, Lohfeld, & Brazil, 2002 ). Thus, combining both methods into the same evaluation can result in converging lines of evidence. Further, each method can be used to complement the other. For example, the use of qualitative data collection techniques can help in the development or choice of measurement instruments, as the personal interaction with individual participants may pave the way for collecting more sensitive data ( Madey, 1982 ).

Despite the promise of integrating qualitative and quantitative methods through a mixed methods approach, Sale et al. (2002) challenged the notion that qualitative and quantitative methods are separable from their respective paradigms, contrary to the position advocated by Reichardt and Cook (1979). Indeed, these authors have suggested that because the two approaches deal with fundamentally different perspectives, the use of both methods to triangulate or complement each other is invalid. Rather, the two methods should be used alongside one another only with the recognition of the different questions that they address. In this view, it should be recognized that qualitative and quantitative methods do address different questions, but at the same time they can show considerable overlap. Thus, mixed methods designs provide a more complete picture of the evaluation space by providing all three components: cross-validation, complementarity, and unique contributions from each.

Despite the utility in principle of integrating both qualitative and quantitative methods in evaluation and the more recent developments in mixed methodology ( see   Greene & Caracelli, 1997 ), the overwhelming majority of published articles in practice employ either qualitative or quantitative methods to the exclusion of the other. Perhaps one reason for the persistence of the single methodology approach is the lack of training in both approaches in evaluation training programs. For example, the AEA website ( http://www.eval.org ) lists 51 academic programs that have an evaluation focus or evaluation option. In a review of each of these programs, we found that none of the evaluation programs had a mixed methods focus. Moreover, when programs did have a focus, it was on quantitative methods. Further, within these programs quantitative methods and qualitative methods were generally taught in separate classes, and there was no evidence of any class in any program that was focused specifically on mixed methods designs. Indeed, Johnson and Onwuegbuzie (2004) have noted that “ … graduate students who graduate from educational institutions with an aspiration to gain employment in the world of academia or research are left with the impression that they have to pledge allegiance to one research school of thought or the other” (p. 14). Given the seeming utility of a mixed methods approach, it is unfortunate that more programs do not offer specific training in these techniques.

Competing Paradigms or Possible Integration?

In summary, the quantitative and qualitative approaches to program evaluation have been widely represented as incommensurable Kuhnian paradigms (e.g., Guba & Lincoln, 1989 ). On the other hand, it has been suggested that perhaps the road to reconciliation lies with Reichenbach’s (1938) important distinction between the context of discovery versus the context of justification in scientific research. Sechrest and Figueredo (1993) paraphrased their respective definitions:

In the context of discovery, free reign is given to speculative mental construction, creative thought, and subjective interpretation. In the context of justification, unfettered speculation is superseded by severe testing of formerly favored hypotheses, observance of a strict code of scientific objectivity, and the merciless exposure of one’s theories to the gravest possible risk of falsification. (p. 654)

Based on that philosophical perspective, Sechrest and Figueredo (1993) recommended the following methodological resolution of the quantitative/qualitative debate:

We believe that some proponents of qualitative methods have incorrectly framed the issue as an absolute either/or dichotomy. Many of the limitations that they attribute to quantitative methods have been discoursed upon extensively in the past. The distinction made previously, however, was not between quantitative and qualitative, but between exploratory and confirmatory research. This distinction is perhaps more useful because it represents the divergent properties of two complementary and sequential stages of the scientific process, rather than two alternative procedures … Perhaps a compromise is possible in light of the realization that although rigorous theory testing is admittedly sterile and nonproductive without adequate theory development, creative theory construction is ultimately pointless without scientific verification. (p. 654)

We also endorse that view. However, in case Sechrest and Figueredo (1993) were not completely clear the first time, we will restate this position here a little more emphatically. We believe that qualitative methods are most useful in exploratory research, meaning early in the evaluation process, the so-called context of discovery, in that they are more flexible and open and permit the researcher to follow intuitive leads and discover previously unknown and unimagined facts that were quite simply not predicted by existing theory. Qualitative methods are therefore a useful tool for theory construction . However, the potentially controversial part of this otherwise conciliatory position is that it is our considered opinion that qualitative methods are inadequate for confirmatory research, the so-called context of justification, in that they do not and cannot even in principle be designed to rigorously subject our theories to critical risk of falsification, as by comparison to alternative theories ( Chamberlin, 1897 ; Platt, 1964 ; Popper, 1959 ; Lakatos, 1970 , 1978 ). For that purpose, quantitative methods necessarily excel because of their greater methodological rigor and because they are equipped to do just that. Quantitative methods are therefore a more useful tool for theory testing . This does not make quantitative evaluation in any way superior to qualitative evaluation, in that exploration and confirmation are both part of the necessary cycle of scientific research.

It is virtually routine in many other fields, such as in the science of ethology, to make detailed observations regarding the natural history of any species before generating testable hypotheses that predict their probable behavior. In cross-cultural research, it is standard practice to do the basic ethnographical exploration of any new society under study prior to making any comparative behavioral predictions. These might be better models for program evaluation to follow than constructing the situation as an adversarial one between supposedly incommensurable paradigms.

Conclusions and Recommendations for the Future

As a possible solution to some of the structural problems, moral hazards, and perverse incentives in the practice of program evaluation that we have reviewed, Scriven (1976 , 1991 ) long ago suggested that the program funders should pay for summative evaluations and pay the summative evaluators directly . We completely agree with this because we believe that the summative program evaluators must not have to answer to the evaluands and that the results of the evaluation should not be “filtered” through them.

For example, in the Propriety Standards for Conflicts of Interest, The Joint Committee on Standards for Educational Evaluation (1994) has issued the following guideline: “Wherever possible, obtain the evaluation contract from the funding agency directly, rather than through the funded program or project” (p. 116). Our only problem with this guideline is that the individual evaluator is called on to implement this solution. Should an ethical evaluator then decline contracts offered by the funded program or project? This is not a realistic solution to the problem. As a self-governing society, we should simply not accept summative evaluations in which the funded programs or projects (evaluands) have contracted their own program evaluators. This is a simple matter of protecting the public interest by making the necessary institutional adjustments to address a widely recognized moral hazard.

Similarly, in the Propriety Standards for Disclosure of Findings, The Joint Committee on Standards for Educational Evaluation (1994) has issued various guidelines for evaluators to negotiate in advance with clients for complete, unbiased, and detailed disclosure of all evaluation findings to all directly and indirectly affected parties. The problem is that there is currently no incentive for an individual evaluator to demand conditions of such unrestricted dissemination of information, to which almost no client is likely to agree, and thereby possibly jeopardize the award of an evaluation contract.

On the other hand, we recommend that the evaluands should pay for formative evaluations and pay the formative evaluators directly . This is because we believe that formative evaluators should provide continuous feedback to the evaluands and not publish those results externally before the program is fully mature (e.g., Tharp & Gallimore, 1979 ). That way, the formative evaluator can gain the complete trust and cooperation of the program administrators and the program staff. Stufflebeam (2001) writes:

Clients sometimes can legitimately commission covert studies and keep the findings private, while meeting relevant laws and adhering to an appropriate advance agreement with the evaluator. This can be the case in the United States for private organizations not governed by public disclosure laws. Furthermore, an evaluator, under legal contractual agreements, can plan, conduct, and report an evaluation for private purposes, while not disclosing the findings to any outside party. The key to keeping client-controlled studies in legitimate territory is to reach appropriate, legally defensible, advance, written agreements and to adhere to the contractual provisions concerning release of the study’s findings. Such studies also have to conform to applicable laws on release of information. (p. 15)

In summary, summative evaluations should generally be external, whereas formative evaluations should generally be internal. Only strict adherence to these guidelines will provide the correct incentive system for all the parties concerned, including the general public, which winds up paying for all this. The problem essentially boils down to one of intellectual property. Who actually owns the data generated by a program evaluation? In a free market society, the crude but simple answer to this question is typically "whoever is paying for it!" In almost no case is it the program evaluator, who is typically beholden to one party or another for employment. We should therefore arrange for the owner of that intellectual property to be in every case the party whose interests are best aligned with those of the society as a whole. In the case of a formative evaluation, that party is the program-providing agency (the evaluand) seeking to improve its services with a minimum of outside interference, whereas in the case of a summative evaluation, that party is the program-funding agency charged with deciding whether any particular program is worth society's continuing investment and support.

Many informative and insightful comparisons and contrasts have been made on the relative merits and limitations of internal and external evaluators (e.g., Braskamp, Brandenburg, & Ory, 1987 ; Love, 1991 ; Mathison, 1994 ; Meyers, 1981 ; Newman & Brown, 1996 ; Owen & Rogers, 1999 ; Patton, 1997 ; Tang, Cowling, Koumijian, Roeseler, Lloyd, & Rogers, 2002 ; Weiss, 1998 ). Although all of those considerations are too many to list here, internal evaluators are generally valued for their greater availability and lower cost as well as for their greater contextual knowledge of the particular organization and ability to obtain a greater degree of commitment from stakeholders to the ultimate recommendations of the evaluation, based on the perceived legitimacy obtained through their direct experience in the program. We believe that these various strengths of internal evaluators are ideally suited to the needs of formative evaluation; however, some of these same characteristics might compromise their credibility in the context of a summative evaluation. In contrast, external evaluators are generally valued for their greater technical expertise as well as for their greater independence and objectivity, including greater accountability to the public interest and ability to criticize the organization being evaluated—hence their greater ability to potentially position themselves as mediators or arbiters between the stakeholders. We believe that these various strengths of external evaluators are ideally suited to the needs of summative evaluation; however, some of these same characteristics might compromise their effectiveness in the context of a formative evaluation.

A related point is that qualitative methods are arguably superior for conducting the kind of exploratory research often needed in a formative evaluation, whereas quantitative methods are arguably superior for conducting the confirmatory research often needed in a summative evaluation. By transitive inference from our immediately prior recommendation, we would envision qualitative methods being of greater use to internal evaluators and quantitative methods being of greater use to external evaluators, provided each method is applied to what it excels at achieving, within its contingently optimal context. With these conclusions, we make our final recommendation that the qualitative/quantitative debate be officially ended, with the recognition that both kinds of research have their proper and necessary place in the cycle of scientific research and, by logical implication, in that of program evaluation. Each side must abandon the claim that its preferred methods can do it all and, in the spirit of the great evaluation methodologist and socio-cultural evolutionary theorist Donald Thomas Campbell, recognize that all our methods are fallible (Campbell & Fiske, 1959) and that only by exploiting their mutual complementarities can we put all of the interlocking fish scales of omniscience back together (Campbell, 1969).

Abma, T. A. ( 2000 ). Stakeholder conflict: A case study.   Evaluation and Program Planning, 23, 199–210.


Atkinson, B. , Heath, A. , & Chenail, R. ( 1991 ). Qualitative research and the legitimization of knowledge.   Journal of Marital and Family Therapy, 17(2), 175–180.

Barbour, R. S. ( 1998 ). Mixing qualitative methods: Quality assurance or qualitative quagmire?   Qualitative Health Research, 8(3), 352–361.

Banfield, G. , & Cayago-Gicain, M. S. ( 2006 ). Qualitative approaches to educational evaluation: A regional conference-workshop.   International Education Journal, 7(4), 510–513.

Berry, D. H. ( 2000 ) Cicero Defense Speeches , trans. New York: Oxford University Press.


Boruch, R. F. ( 1997 ). Randomized experiments for planning and evaluation: A practical guide . Thousand Oaks, CA: Sage.

Braskamp, L.A. , Brandenburg, D.C. & Ory, J.C. ( 1987 ). Lessons about clients’ expectations. In J Nowakowski (Ed.), The client perspective on evaluation: New Directions For Program Evaluation , 36, 63–74. San Francisco, CA: Jossey-Bass.

Bryk, A. S. ( 1980 ). Analyzing data from premeasure/postmeasure designs. In S. Anderson , A. Auquier , W. Vandaele , & H. I. Weisburg (Eds.), Statistical methods for comparative studies (pp. 235–260). Hoboken, NJ: John Wiley & Sons.

Campbell, D. T. ( 1953 ). A study of leadership among submarine officers . Columbus, OH: The Ohio State University, Personnel Research Board.

Campbell, D. T. ( 1956 ). Leadership and its effects upon the group . Columbus, OH: Bureau of Business Research, The Ohio State University.

Campbell, D. T. ( 1957 ). Factors relevant to the validity of experiments in social settings.   Psychological Bulletin, 54, 297–312.

Campbell, D. T. ( 1969 ). Ethnocentrism of disciplines and the fish-scale model of omniscience. In M. Sherif and C. W. Sherif , (Eds.), Interdisciplinary Relationships in the Social Sciences , (pp. 328–348). Chicago IL: Aldine.

Campbell, D. T. , & Fiske, D. W. ( 1959 ). Convergent and discriminant validation by the multitrait-multimethod matrix.   Psychological Bulletin 56(2), 81–105.

Center for Disease Control ( 2007 ). Youth media campaign, VERB logic model . Retrieved May 18, 2012, from http://www.cdc.gov/youthcampaign/research/logic.htm

Chamberlin, T.C. ( 1897 ). The method of multiple working hypotheses.   Journal of Geology, 5, 837–848.

Clayton, R. R. , Cattarello, A. M. , & Johnstone B. M. ( 1996 ). The effectiveness of Drug Abuse Resistance Education (Project DARE): 5-year follow-up results.   Preventive Medicine 25(3), 307–318.

Cook, T. D. , & Campbell, D. T. ( 1979 ). Quasi-experimentation: design and analysis issues for field settings . Chicago, IL: Rand-McNally.

Cook, T. D. , Cook, F. L. , & Mark, M. M. ( 1977 ). Randomized and quasi-experimental designs in evaluation research: An introduction. In L. Rutman (Ed.), Evaluation research methods: A basic guide (pp. 101–140). Beverly Hills, CA: Sage.

Cook, T. D. , Scriven, M. , Coryn, C. L. S. , & Evergreen, S. D. H. ( 2010 ). Contemporary thinking about causation in evaluation: A dialogue with Tom Cook and Michael Scriven.   American Journal of Evaluation, 31, 105–117.

Dembe, A. E. , & Boden, L. I. ( 2000 ). Moral hazard: A question of morality?   New Solutions 2000, 10(3), 257–279.

Denzin, N. K. ( 1989 ). Interpretive interactionism . Newbury Park, CA: Sage.

Dukes, R. L. , Stein, J. A. , & Ullman, J. B. ( 1996 ). Long-term impact of Drug Abuse Resistance Education (D.A.R.E.).   Evaluation Review, 21(4), 483–500.

Dukes, R. L. , Ullman, J. B. , & Stein, J. A. ( 1996 ). Three-year follow-up of Drug Abuse Resistance Education (D.A.R.E.).   Evaluation Review, 20(1), 49–66.

Duncan, T. E. , Duncan, S. C. , & Stryker, L. A. ( 2006 ). An introduction to latent variable growth curve modeling: Concepts, issues, and applications (2nd Ed.). Mahwah, NJ: Laurence Erlbaum.

Ennett, S. T. , Tobler, M. S. , Ringwalt, C. T. , & Flewelling, R. L. ( 1994 ). How effective is Drug Abuse Resistance Education? A meta-analysis of Project DARE outcomes evaluations.   American Journal of Public Health, 84(9), 1394–1401.

General Accountability Office ( 2003 ). Youth Illicit Drug Use Prevention (Report No. GAO-03-172R). Washington, DC: Author.

Golafshani, N. ( 2003 ). Understanding reliability and validity in qualitative research.   The Qualitative Report , 8(4), 597–606.

Greene, J. C. , & Caracelli, V. J. ( 1997 ). Defining and describing the paradigm issue in mixed-method evaluation. In J. C. Greene & V. J. Caracelli (Eds.), Advances in mixed-method evaluation: The challenges and benefits of integrating diverse paradigms (pp. 5–17).(New Directions for Evaluation, No. 74). San Francisco: Jossey-Bass.

Guba, E. G. , & Lincoln, Y. S. ( 1989 ). Fourth generation evaluation . Newbury Park: Sage.

Heckman, J. J. , & Smith, J. A. ( 1995 ). Assessing the case for social experiments.   Journal of Economic Perspectives, 9, 85–110.

Hollister, R. G. & Hill, J. ( 1995 ). Problems in the evaluation of community-wide initiatives. In Connell, J. P. , Kubish, A. C. , Schorr, L. B. , & Weiss, C. H. (Eds.), New approaches to evaluating community initiatives: Concepts, methods, and contexts (pp. 127–172). Washington, DC: Aspen Institute.

Howe, K. R. ( 1988 ). Against the quantitative-qualitative incompatibility thesis or dogmas die hard.   Educational Researcher, 17, 10–16.

Huitema, B. E. ( 1980 ). The analysis of covariance and alternatives . New York, NY: John Wiley & Sons.

Johnson, R. B. , & Onwuegbuzie, A. J. ( 2004 ). Mixed-methods research: A research paradigm whose time has come.   Educational Researcher, 33, 14–26.

Katzer, J. , Cook, K. and Crouch, W. ( 1978 ). Evaluating information: A guide for users of social science research . Reading, MA: Addison-Wesley.

Kenny, D. A. ( 1975 ). A quasi-experimental approach to assessing treatment effects in the non-equivalent control group design.   Psychological Bulletin, 82, 345–362.

Kenny, D. A. , Kashy, D. A. & Cook, W. L. ( 2006 ). Dyadic data analysis . New York: Guilford Press.

Kidder, L. H. , & Fine, M. ( 1987 ). Qualitative and quantitative methods: When stories converge. In M. M. Mark & R. L. Shotland (Eds.), Multiple methods in program evaluation (pp. 57–75). Indianapolis, IN: Jossey-Bass.

King, J. A. , Stevahn, L. , Ghere, G. & Minnema, J. ( 2001 ). Toward a taxonomy of essential evaluator competencies.   American Journal of Evaluation, 22, 229–247.

Kirk, R. E. ( 2009 ). Experimental Design. In R. E. Millsap & A. Maydeu-Olivares (Eds.), The Sage handbook of quantitative methods in psychology (pp. 23–45). Thousand Oaks, CA: Sage.

Lakatos, I. ( 1970 ). Falsification and the methodology of scientific research programmes. In Lakatos, I. , & Musgrave, A. , (Eds.), Criticism and the growth of knowledge (pp. 91–196). Cambridge, UK: Cambridge University Press.

Lakatos, I. ( 1978 ). The methodology of scientific research programs . Cambridge, UK: Cambridge University Press.

Leech, N. L. , & Onwuegbuzie, A. J. ( 2007 ). An array of qualitative data analysis tools: A call for data analysis triangulation.   School Psychology Quarterly, 22(4), 557–584.

Loftus, E.F. ( 1979 ). Eyewitness Testimony, Cambridge, MA: Harvard University Press.

Love, A.J. ( 1991 ). Internal evaluation: Building organizations from within . Newbury Park, CA: Sage.

Madey, D. L. ( 1982 ). Some benefits of integrating qualitative and quantitative methods in program evaluation.   Educational Evaluation and Policy Analysis, 4, 223–236.

Mark, M. M. , & Cook, T. D. ( 1984 ). Design of randomized experiments and quasi-experiments. In L. Rutman (Ed.), Evaluation research methods: A basic guide (pp. 65–120). Beverly Hills, CA: Sage.

Mathison, S. ( 1994 ). Rethinking the evaluator role: partnerships between organizations and evaluators.   Evaluation and Program Planning , 17(3), 299–304.

Meyers, W. R. ( 1981 ). The Evaluation Enterprise: A Realistic Appraisal of Evaluation Careers, Methods, and Applications . San Francisco, CA: Jossey-Bass.

Muthén, L. K. & Muthén, B. O. ( 1998 –2009). Mplus user’s guide. Statistical analysis with latent variables . Los Angeles, CA: Muthén & Muthén.

Newcomer, K. E. & Wirtz, P. W. ( 2004 ). Using statistics in evaluation. In Wholey, J. S. , Hatry, H. P. & Newcomer, R. E. (Eds.), Handbook of practical program evaluation (pp. 439–478). San Francisco, CA: John Wiley & Sons.

Newman, D. L. & Brown, R. D. ( 1996 ). Applied ethics for program evaluation . San Francisco, CA: Sage.

Office of Management and Budget. ( 2009 ). A new era of responsibility: Renewing America’s promise . Retrieved May 18, 2012, from http://www.gpoaccess.gov/usbudget/fy10/pdf/fy10-newera.pdf

Oliver-Hoyo, M. , & Allen, D. ( 2006 ). The use of triangulation methods in qualitative educational research.   Journal of College Science Teaching , 35, 42–47.

Owen, J. M. , & Rogers, P. J. ( 1999 ). Program Evaluation: Forms and Approaches (2nd ed.), St Leonards, NSW: Allen & Unwin.

Page, R. B. ( 1909 ). The Letters of Alcuin. New York: The Forest Press.

Patton, M. Q. ( 1990 ) Qualitative evaluation and research methods . Thousand Oaks, CA: Sage.

Patton, M. Q. ( 1994 ). Developmental evaluation.   Evaluation Practice. 15(3), 311–319.

Patton, M. Q. ( 1996 ). A world larger than formative and summative.   Evaluation Practice, 17(2), 131–144.

Patton, M.Q. ( 1997 ). Utilization-focused evaluation: The new century text (3rd ed.). Thousand Oaks, CA: Sage.

Patton, M. Q. ( 1999 ). Enhancing the quality and credibility of qualitative analysis.   Health Services Research, 35:5 Part II, 1189–1208.

Pauly, M. V. ( 1974 ). Overinsurance and public provision of insurance: The roles of moral hazard and adverse selection.   Quarterly Journal of Economics, 88, 44–62.

Platt, J. R. ( 1964 ). Strong inference.   Science , 146, 347–353.

Popper, K. ( 1959 ). The Logic of Scientific Discovery . New York: Basic Books.

Pugach, M. C. ( 2001 ). The stories we choose to tell: Fulfilling the promise of qualitative research for special education.   The Council for Exceptional Children, 67(4), 439–453.

Ramsay, G. G. ( 1918 ). Juvenal and Persius . trans. New York: Putnam.

Reichardt, C. S. ( 1979 ). The statistical analysis of data from non-equivalent groups design. In T. D. Cook & D. T. Campbell (Eds.), Quasi-experimentation: Design and analysis issues for field settings (pp. 147–206). Chicago, IL: Rand-McNally.

Reichardt, C. S. ( 2009 ). Quasi-experimental design. In R. E. Millsap & A. Maydeu-Olivares (Eds.), The Sage handbook of quantitative methods in psychology (pp. 46–71). Thousand Oaks, CA: Sage.

Reichardt, C. S. , & Cook, T. D. ( 1979 ). Beyond qualitative versus quantitative methods. In T. D. Cook & C. S. Reichardt (Eds.), Qualitative and quantitative methods in evaluation research (pp. 7–32). Beverly Hills, CA: Sage.

Reichardt, C. S. , & Rallis, S. F. ( 1994 a). The relationship between the qualitative and quantitative research traditions.   New Directions for Program Evaluation, 61, 5–11.

Reichardt, C. S. , & Rallis, S. F. ( 1994 b). Qualitative and quantitative inquiries are not incompatible: A call for a new partnership.   New Directions for Program Evaluation, 61, 85–91.

Reichenbach, H. ( 1938 ). Experience and prediction . Chicago: University of Chicago Press.

Rossi, P. H. , & Freeman, H. E. ( 1993 ). Evaluation: A systematic approach (5th ed.). Newbury Park, CA: Sage.

Sale, J. E. M. , Lohfeld, L. H. , & Brazil, K. ( 2002 ). Revisiting the quantitative-qualitative debate: Implications for mixed-methods research.   Quality & Quantity, 36, 43–53.

Scriven, M. ( 1967 ). The methodology of evaluation. In Gredler, M. E. , (Ed.), Program Evaluation (p. 16). Englewood Cliffs, New Jersey: Prentice Hall, 1996.

Scriven, M. ( 1976 ). Evaluation bias and its control. In C. C. Abt (Ed.) The Evaluation of Social Programs , (pp. 217–224). Beverly Hills, CA: Sage.

Scriven, M. ( 1983 ). Evaluation ideologies. In G. F. Madaus , M. Scriven & D. L. Stufflebeam (Eds.). Evaluation models: Viewpoints on educational and human services evaluation (pp. 229–260). Boston: Kluwer-Nijhoff.

Scriven, M. ( 1991 ). Pros and cons about goal-free evaluation.   Evaluation Practice , 12(1), 55–76.

Sechrest, L. , & Figueredo, A. J. ( 1993 ). Program evaluation.   Annual Review of Psychology , 44, 645–674.

Seltzer, M. H. , Frank, K. A. , & Bryk, A. S. ( 1994 ). The metric matters: The sensitivity of conclusions about growth in student achievement to choice of metric.   Education Evaluation and Policy Analysis, 16, 41–49.

Shadish, W. R. , Cook, T. D. , & Campbell, D. T. ( 2002 ). Experimental and quasi-experimental designs for generalized causal inference . Boston, MA: Houghton Mifflin.

Shadish, W. R. , Cook, T. D. , & Leviton, L. C. ( 2001 ). Foundations of program evaluations: Theories of practice . Newberry Park, CA: Sage.

Shek, D. T. L. , Tang, V. M. Y. , & Han, X. Y. ( 2005 ). Evaluation of evaluation studies using qualitative research methods in the social work literature (1990–2003): Evidence that constitutes a wake-up call.   Research on Social Work Practice, 15, 180–194.

Singer, J. D. ( 1998 ). Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models.   Journal of Educational and Behavioral Statistics, 24, 323–355.

Singer, J. D. & Willett, J. B. ( 2003 ). Applied longitudinal data analysis: Modeling change and event occurrence . New York: Oxford University Press.

Smith, M. J. ( 2010 ). Handbook of program evaluation for social work and health professionals. New York: Oxford University Press.

St. Pierre, R. G. ( 2004 ). Using randomized experiments. In J. S. Wholey , H. P. Hatry , & K. E. Newcomer (Eds.), Handbook of practical program evaluation (2nd ed., pp. 150–175). San Francisco, CA: John Wiley & Sons.

Stevahn, L. , King, J. A. , Ghere, G. & Minnema, J. ( 2005 ). Establishing essential competencies for program evaluators.   American Journal of Evaluation, 26, 43–59.

Strauss, A. , & Corbin, J. ( 1990 ). Basics of qualitative research: Grounded theory procedures and techniques . Newbury Park, CA: Sage.

Stufflebeam, D. L. ( 2001 ). Evaluation models.   New Directions for Evaluation , 89, 7–98.

Tang, H. , Cowling, D.W. , Koumijian, K. , Roeseler, A. , Lloyd, J. , & Rogers, T. ( 2002 ). Building local program evaluation capacity toward a comprehensive evaluation. In R. Mohan , D. J. Bernstein , & M. D. Whitsett (Eds.), Responding to sponsors and Stakeholders in Complex Evaluation Environments (pp. 39–56). New Directions for Evaluation, No. 95. San Francisco, CA: Jossey-Bass.

Tharp, R. , & Gallimore, R. ( 1979 ). The ecology of program research and development: A model of evaluation succession. In L. B. Sechrest , S. G. West , M. A. Phillips , R. Redner , & W. Yeaton (Eds.), Evaluation Studies Review Annual (Vol. 4, pp. 39–60). Beverly Hills, CA: Sage.

Tharp, R. , & Gallimore, R. ( 1982 ). Inquiry process in program development.   Journal of Community Psychology, 10(2), 103–118.

The Joint Committee on Standards for Educational Evaluation. ( 1994 ). The Program Evaluation Standards (2nd ed.). Thousand Oaks, CA: Sage.

Thistlethwaite, D. L. , & Campbell, D. T. ( 1960 ). Regression-discontinuity analysis: An alternative to the ex post facto experiment.   The Journal of Educational Psychology, 51, 309–317.

Trochim, W. M. K. ( 1984 ). Research design for program evaluation: The regression discontinuity approach . Newbury Park, CA: Sage.

United Way. ( 1996 ). Guide for logic models and measurements . Retrieved May 18, 2012, from http://www.yourunitedway.org/media/GuideforLogModelandMeas.ppt

Wagner, A. K. , Soumerai, S. B. , Zhang, F. , & Ross-Degnan, D. ( 2002 ). Segmented regression analysis of interrupted time series studies in medication use research.   Journal of Clinical Pharmacy and Therapeutics, 27, 299–309.

Weiss, C.H. ( 1980 ) Knowledge creep and decision accretion.   Knowledge: Creation, Diffusion, Utilisation 1(3): 381–404.

Weiss, C. H. ( 1998 ). Evaluation: Methods for Studying Programs and Policies . 2nd ed. Upper Saddle River, NJ: Prentice Hall.

Weiss, C. H. ( 1999 ) The interface between evaluation and public policy.   Evaluation, 5(4), 468–486.

Wells, G.L. , Malpass, R.S. , Lindsay, R.C.L. , Fisher, R.P. , Turtle, J.W. , & Fulero, S.M. ( 2000 ). From the lab to the police station: A successful application of eyewitness research.   American Psychologist, 55(6), 581–598.

West, S.L. , & O’Neal, K.K. ( 2004 ) Project D.A.R.E. outcome effectiveness revisited.   American Journal of Public Health, 94(6) 1027–1029.

Williams, A. ( 2001 ). Science or marketing at WHO? Commentary on ‘World Health 2000’.   Health Economics, 10, 93–100.

Willson, E. B. , & Putnam, R. R. ( 1982 ). A meta-analysis of pretest sensitization effects in experimental design.   American Educational Research Journal, 19, 249–258.

World Health Organization ( 2000 ). The World Health Report 2000 – Health Systems: Improving Performance . World Health Organization: Geneva, Switzerland.


Int J Integr Care, 22(2), Apr–Jun 2022

Qualitative Case Study: A Pilot Program to Improve the Integration of Care in a Vulnerable Inner-City Community

Margaret Frances Williamson

1 University of NSW, AU

Hyun Jung Song

Louise Dougherty

2 University of NSW and Sydney Local Health District, AU

Lisa Parcsi

3 Sydney Local Health District, AU

Margo Linn Barr

Introduction:

There is a strong correlation between vulnerable populations and poor health outcomes. Growing evidence suggests that person-centred interventions using ‘link workers’ can support communities to navigate and engage with health and community services, leading to improved health service access. We describe the initial phase and qualitative evaluation of a Healthy Living Program, supported by a link worker role. The Program aimed to improve health service access for residents of an Australian inner-city suburb.

To inform future program development, semi-structured interviews were conducted with clients and stakeholders (n = 21). The interviews were analysed thematically to understand program impact, success factors, constraints and potential improvements.

Key themes relating to impacts were a new model of working with community, improved access to services, and responsiveness to community need. Key factors for success included being a trusted, consistent presence, having knowledge of the community and health system, and successful engagement with the community and stakeholders. The constraints included difficulty influencing health system change and lack of community input. Suggested improvements were expanding the service, enhancing health system change and increasing community involvement.

Conclusion:

Knowledge gained from this study will inform future integrated approaches in health districts to address health inequities in areas of need.

Introduction

In 2017, the Australian Health Policy Collaboration published a report [ 1 ] highlighting the growing health disparities in Australia which correlate closely with socio-economic status. The report showed that 40% of low-income Australians experienced poor health outcomes. It attributed these poor health outcomes to multiple factors including poor access to healthcare, poor nutrition, high rates of obesity and high smoking rates. These problems overlap social issues related to housing, poverty and inadequate education [ 2 , 3 ].

Addressing the health needs of populations experiencing disadvantage can be difficult, especially if their needs are complex and solutions to address these issues do not fit neatly within existing health systems and practices. Person-centred interventions using community health workers (CHWs), patient navigators (PNs) and link workers (LWs) can help support communities by helping them navigate and link with health and community services. A significant body of literature exists on interventions using CHWs, PNs and LWs in various settings to help vulnerable communities navigate complex health systems, improve service delivery and address the social determinants of health [ 2 , 4 , 5 ]. There is growing evidence that person-centred interventions provided by these roles can be effective in improving access to health services (especially cancer screening) [ 4 , 6 , 7 , 8 ], promoting a wide range of healthy behaviours [ 2 , 4 ], improving chronic disease management [ 2 , 9 , 10 , 11 , 12 ], reducing preventable health service use [ 13 ] and improving the overall health and wellbeing of populations [ 12 ], including those from disadvantaged groups [ 2 , 6 , 9 , 14 ].

The success of these interventions has been linked to factors related to the planning and development of the programs, how the program is delivered, staff attributes, the accessibility of referral services, community engagement and the integration of the work within a supportive health system [ 15 , 16 , 17 , 18 ].

As health authorities attempt to address the health needs of disadvantaged populations using similar patient-centred, community-based intervention models, an understanding of the factors that contribute to their success in specific settings, and of the challenges for such programs, is important in order to enhance implementation and outcomes [ 15 ].

Healthy Living Program

The target area for the Healthy Living Program (the Program) was Waterloo, an inner suburb of Sydney, Australia, with public housing representing over one quarter of the homes in the suburb (28%) and 90% of homes being flats or apartments [ 19 ]. The suburb has a number of vulnerable populations, including those who are economically disadvantaged, those from culturally and linguistically diverse backgrounds and those with complex social and health needs [ 19 ]. More than half of public housing residents are over the age of 60 years, 66% were born overseas, 8% have an Aboriginal and/or Torres Strait Islander background, and 86% receive some form of pension or financial assistance.

Australia’s health system aims to provide safe, affordable and quality health care to all Australians. The federal government funds aged care services and subsidises access to community-based general practitioners (GPs), medical specialists, nurses; some allied health professionals and medicines. State and territory governments are responsible for public hospitals, community and mental health services, ambulance and emergency services and public health and preventive services. The Sydney Local Health District (SLHD) is one of 15 publicly funded Health Districts in the state of New South Wales and is responsible for the management and implementation of state-run health services and the health and wellbeing of the residents in this inner suburb (in addition to the needs of residents of other suburbs within the SLHD’s catchment area, which has a population of more than 670,000) [ 20 ]. The SLHD established the Program in 2017, in response both to community concerns about the range of health issues faced by the Waterloo community and perceptions that health services were not responsive to the needs of the community living in the Waterloo public housing estate [ 21 ]. The Program was implemented as the first phase (‘initiative and design’ phase) in the development of a broader integrated care model to improve access and integrate care among vulnerable communities within the SLHD [ 22 ].

The Program aimed to develop processes to: (1) better understand the health and wellbeing needs of the community; (2) provide navigation services to facilitate access to health services for individuals, groups and the community; (3) advocate changes to the way services are delivered to meet community need; (4) support community development activities to reduce health disparities and improve the wellbeing of the community; and (5) facilitate improved connectedness and communication between the community, other government and non-government organisations (NGOs) and SLHD services.

A Healthy Living Link Worker (the LW) was appointed to explore how program staff might act as a point of connection, liaison and navigation between residents and the SLHD services, and ultimately lead to improved service delivery and better health outcomes for residents. The work was supported by two independent advisory groups, one for SLHD managers and one for community and community-based NGO representatives. The Program had been in place for over two and a half years, and the most recent incumbent of the LW role had been employed for almost 12 months. Clients for the navigation services were self-referred or referred by local community NGOs or government agencies.

At the time of the evaluation, the LW had worked with 75 clients and associated SLHD staff to navigate appropriate pathways of care in the previous 12 months. Client information was collected by the LW and recorded in a restricted database, not connected to the SLHD clinical records. The LW also worked with specific SLHD services to improve access to urgent care for specific client groups, including those with chronic complex health needs. Part of the work included co-ordinating and supporting wellbeing checks, health information sessions and the delivery of health information to a range of at-risk community groups. Figure 1 provides an overview of the complexity of the work and the number of different services and referral pathways that need to be negotiated and integrated.

Figure 1. Overview of the service environment for the Healthy Living Program.

We conducted a qualitative study as part of a larger evaluation of the Program. Semi-structured interviews investigated the perspectives of the clients, community-based NGOs and health and welfare government staff, on the perceived impact, success factors, constraints and potential improvements for the LW role.

Eighty key individuals who worked with or were affected by the role were identified through discussions with the supervisor of the Program and the current incumbent. These key individuals included 30 clients who had directly interacted with the LW and had been referred to, or been involved in, a LW-initiated activity, as well as SLHD staff (n = 23) and staff from NGOs (n = 14) and other government agencies who provided services to the Waterloo residents and had worked with the LW. We limited interviews to those who were able to participate without assistance in English-language interviews, as the majority of the LW's interactions and activities were in English.

We aimed to interview at least five individuals from each group to gain a wide range of experiences, views and opinions. The supervisor of the Program and/or the current incumbent sent email invitations to the potential informants, inviting them to participate in the interviews. Some community members without email addresses were contacted by phone. Interviewees gave written and verbal consent to participate. Figure 2 summarises the selection, recruitment and data collection methods.

Figure 2. Recruitment and data collection.

Data collection and consent

Preliminary discussions with key stakeholders and the current LW on the impacts and key success factors and challenges for the program, informed a semi-structured interview guide ( Table 1 ). The questions focussed on the LW role and were adapted for community members and staff of the SLHD and other agencies. The research team pilot tested the interview guides for their comprehensibility, and questions were revised appropriately. Telephone and face-to-face interviews were conducted by two researchers between 9 February and 26 March 2020, and were recorded and transcribed verbatim.

Table 1. Interview guide.

Data analysis

The transcripts were thematically analysed through an iterative process. The research team met regularly to discuss coding and analysis, identify themes and resolve any disagreements or concerns [ 23 , 24 , 25 ]. Based on a sample of the transcripts, two researchers developed and discussed the initial coding framework with the research team. Themes were refined as more interviews were analysed, and they were further refined with agreement from the team. NVivo 12 software [ 26 ] was used for coding and analysis.
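
As a purely illustrative sketch (the codes, themes, and excerpts below are invented and are not the study's actual coding framework), the roll-up produced by this kind of iterative thematic analysis can be pictured as mapping excerpt-level codes to agreed themes and tallying them by interviewee group, much as an NVivo-style coding matrix would.

```python
# A hypothetical sketch of rolling coded interview excerpts up into themes by interviewee group.
from collections import Counter, defaultdict

coded_excerpts = [                    # (interviewee group, code applied by a researcher)
    ("client", "navigation support"),
    ("client", "trusted presence"),
    ("SLHD staff", "new model of working"),
    ("NGO staff", "community engagement"),
    ("SLHD staff", "navigation support"),
]

theme_for_code = {                    # coding framework agreed iteratively by the research team
    "navigation support": "improved access to services",
    "trusted presence": "success factors",
    "new model of working": "new model of working with community",
    "community engagement": "success factors",
}

theme_counts = defaultdict(Counter)
for group, code in coded_excerpts:
    theme_counts[theme_for_code[code]][group] += 1

for theme, counts in theme_counts.items():
    print(theme, dict(counts))        # e.g., improved access to services {'client': 1, 'SLHD staff': 1}
```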

Ethics approval

Ethics approval for the study was granted by the Sydney Local Health District Ethics Review Committee (RPAH Zone): X19-0357 and 2019/STE16400.

Evaluation Findings

Of the 39 individuals contacted, 21 agreed to participate (participation rate = 54%). The characteristics of the interviewees are presented in Table 2 .

Table 2. Interviewee characteristics.

Thematic analysis focussed on the four main areas of investigation: impact, success factors, constraining factors and suggestions for improvements for the Program and the LW role. The themes identified for each area of investigation are presented in Table 3 . The clients interviewed had previously sought the help of the LW to access services, and their responses during the interviews were mainly concerned with the impact of the role and the success factors in terms of their own experience. Sub-themes are bolded in the text.

Table 3. Themes according to areas of investigation.

Perceived impact of the role

Interviewees identified that the LW had developed a new model for addressing health issues in the community , was a dedicated presence in the community, listened to their issues and worked with others to address the health needs of the community.

“This is not a model of care that we’ve done before. … So this is actually a really important service, in my opinion, because of actually doing it in a different way. It’s actually looking at a community and saying, ‘What are its needs?’, and actually going out and meeting people and engaging with people in that local area and thinking about how do we do things better.” (SLHD staff 04)

“I think the community would have benefited a lot … their concerns have been tackled, somebody did something. And I think that means a lot.” (SLHD staff 02)

Social isolation, mental health and oral health of children and youth, were found to be significant problems for the residents of the Program target area. The LW had partnered with NGOs and other government agencies to support a community choir to reduce social isolation, improve mental health and build community connections and skills for the local community.

“[The choir] has been a really fantastic initiative. Yeah, it’s been really positive and a lot of work, … I think it had some really great benefits for people who have been coming along regularly.” (NGO staff 01)

The LW also collaborated with the local dental hospital to create a systematic change in the way pre-school children and young people with complex needs accessed urgent and preventive oral health services.

“[The LW] highlighted that the youth health was an area which was having issues linking with oral health.” (SLHD staff 06)

A range of interviewees identified that a key impact of the role was directing individuals to appropriate health services and supporting their interaction with service providers.

“I have a problem with [health condition] and needed surgery. And [the LW] helped me to get to the hospital. I was very sick.” (Client 01) “Having a go between, you know, someone to talk to the authorities, you know, and the little guy, that’s us, it always helps, you know, because you don’t know where to begin and … [the LW] has a foot in the door, has a fair idea what direction to take. [The LW] helps us.” (Client 05).

Almost all clients were satisfied with the assistance provided by the LW.

“I am completely and honestly satisfied with his job for other people because I can see … when we first meet, I am very sick and have no family, [the LW] helped me.” (Client 01)

Several interviewees conveyed the importance of the navigation services provided by the LW in addressing community need.

“The big achievement I think is to make a health network accessible to the community, and it might seem to be a small thing to say, but it means a lot. It means a lot to the community, it means a lot to us as well.” (Other govt staff 04) “I think navigation is his key strength. It’s navigating services for the vulnerable population.” (SLHD staff 06)

Respondents reported that access to the LW had led individuals to change their attitudes towards health services and to engage with them more.

“And I have seen some massive outcomes that come back with positive results … a lot of the clients in Waterloo and Redfern are not being retraumatised within the health system. It seems that re-traumatisation seems to be decreasing. A lot of them are willing to readily engage back with health services.” (NGO staff 02) “And we’ve noticed that because [the LW] has been involved, the patient’s attendance rate has been higher.” (SLHD staff 02)

Generally, respondents reported that the LW facilitated collaborations between stakeholders within the SLHD, NGOs and other government agencies, enabling the community to connect with the required services and supporting activities with other agencies to improve the health and wellbeing of the community.

“And I’m glad, because we were able to do something for that community that really needed that help. And had [the LW] not referred them, they would still be in pain, and waiting, and leaving it for later.” (SLHD staff 02) “[The LW] has brought in other health organisations like Diabetes Australia and Drug Health and Mental Health to different community events, and Multicultural Health.” (NGO staff 02) “I think there has been a couple of things that have really worked quite well, and one of them was the Health Expo that was held, in terms of getting a range of health organisations together.” (NGO staff 04)

Success factors for the role

Respondents reported that the LW was a trusted and consistent presence in the community and on committees where [the LW] represented the SLHD. The LW’s ability to build trusted working relationships with community members was valued by the community and the NGO staff.

“[The LW’s] got the trust [of the community].” (NGO staff 01) “I think having the same person in the role, over a consistent period, and where that position has been routinely reliable, that makes all the difference to that engagement, that relationship building …” (NGO staff 02) “You can call [the LW] anytime and [the LW] responds straight away. Anytime! Anytime! [the LW] checks if you are feeling good or OK! It is a good service.” (Client 01)

The LW’s ability to communicate and connect with community members and staff from a range of organisations was reported to have facilitated the work of the role.

“[The LW has] always been good with talking to people and that, you know, communicating and stuff.” (Client 04) “With me, [the LW] understand [sic] my position, so what I’m going through. So for me I think [the LW] is doing good.” (Client 06) “[The LW] listens to me and tries to establish what’s the best way.” (Client 05) “[The LW] really knows how to build this connection, it comes naturally. [The LW’s] a people’s person … [the LW] can talk to you about a very difficult subject, … [the LW] really knows how to make you at ease, and that’s one of his really big positive attributes.” (SLHD staff 02)

A key factor for the success of the role was knowledge of the community and the health system.

“I think they’ve got someone in the role who has a depth of knowledge and a connection with that particular community that is hard to replicate really.” (NGO staff 01) “You know, [the LW] can organise things, [the LW] knows. [The LW’s] really knowledgeable about what services there are and things like that.” (Client 04)

Almost all interview respondents reported the LW had successfully engaged with specific groups in the community, including housing estate residents and Aboriginal and Torres Strait Islander peoples.

“Good links with the Aboriginal community, which is important, as it’s a high Aboriginal population in Waterloo … Seems to be good engagement.” (SLHD staff 03) “The LW participated in the weekly outreach sessions [at the housing estate] that have happened, so that’s a way of being able to, sort of, make direct contact with people who have health concerns.” (NGO staff 04)

They also stated that the LW supported collaborations within the SLHD and between SLHD services, the community and other organisations. This engagement across sectors was thought to contribute to the role’s impact.

“The Health Expo …… a lot of community members came, there were a lot of services represented, so people could engage with a number of both [SLHD services and other government services].” (SLHD staff 03) “I think the LW’s engaged very well with Aboriginal health staff, I think that’s worked really well, and there’s been a bit more health promotion going on there.” (SLHD staff 04) “And [the LW] went out of his way to engage on our behalf and speak on our behalf … we were getting almost no consents and dropping off that site in – from our project, working with [the LW], we got a really good response.” (SLHD staff 06)

Constraints of the role

Five themes emerged relating to constraints of the role, three of which focussed on the functions of the role. The position description stated that the role had three main functions: navigation, systems influence and capacity building. Some respondents felt that the responsibilities for each of these functions were too much for one person.

“One person can’t do everything.” (NGO staff 03) “It’s probably identified as too big a job for one person. I think it’s identified that there’s different components to the job as well as different specialities.” (NGO staff 02)

Various SLHD and NGO staff had different expectations of the role. Some questioned whether the focus of the intervention work should have been at a patient level or at a system level. Several interviewees suggested that the role could have a more strategic focus.

“I think there’s probably a disconnect between what’s anticipated from the role and what the role can actually achieve.” (NGO staff 01) “I think there’s still confusion about whether it’s strategic level or whether it’s patient level, and, I think, people are saying it’s both, but it’s a stand-alone position … I would have seen it as a strategic role … It’s not a community health worker role, I didn’t think. There is a link, but I think it should be more strategic.” (SLHD staff 05)

Interviewees from the NGOs were concerned that although they had been instrumental in setting up the role, there was a lack of community input in the direction of the LW’s work.

“….. One of the things that happened with this role, was when it started, the NGOs were involved in calling for it, were meeting with senior staff about the role quite regularly, and quite frequently. And that all stopped when the role came into play.” (NGO staff 02)

Suggested improvements

A number of key informants from the SLHD, NGOs and other government agencies suggested expanding the LW work by increasing the availability of the LW’s role and employing additional staff, either in the same role or with different skills, with a focus on the navigation work.

“Like, all of that foundation is there. I think it’s just about expanding it. That would be my one thing, that I would love to see more of it … I think with more resources or another person, I think that a lot more could be achieved.” (Other govt staff 02) “I think with the district that [the LW’s] covering, there needs to be … maybe between one or two staff underneath [the LW] to cover the area that [the LW’s] covering, because it is quite diverse and quite complex.” (NGO staff 03) “If [the LW] had a couple of people working with [the LW] in a team, that would be great, wouldn’t it?” (NGO staff 01)

Some key informants made suggestions about how the role could more strategically influence local health system change, including developing more structured working relationships between the LW and SLHD services, permitting the LW to have “authority to negotiate with services to bring about change”, and finally, empowering health services to take a more holistic approach and work together to address community health needs.

“I think that role has a capacity to sit at a broader more systemic level and coordinate, I guess, change at that level.” (Other govt staff 01) “I think there’s opportunities where [the LW] could be involved, and I think that comes back to that strategic level where, if you make the relationships between the other services and look for gaps or identify opportunities at that level, I think, there’s an element of that missing.” (SLHD staff 05) “What would be ideal I think would be [if the LW] can negotiate the high-level kind of, how to improve the system, like the navigation, the systems navigation … but also, some way to influence the health side of the equation more.” (NGO staff 01)

Some key informants suggested outstanding health service gaps that should be addressed, including support for housing estate residents affected by the uncertainties related to the redevelopment of the housing estate, and more support for residents with acute and ongoing mental health conditions.

“… Given the current state of the community with the [housing estate] redevelopment, … I think the anxiety and the mental health issues around that, there would be potential to do some workshops or some sort of regular support groups that could support the community because it has been an extremely stressful time.” (Other govt staff 02) “Stuff around building resilience in the community, before the moves, before the relocations, would be great.” (Other govt staff 04) “… Mental health is a major concern. Within social housing, … they just can’t get the access they need immediately … They’re just not getting the right support and services that are needed.” (NGO staff 03)

Interviewees from NGOs also identified the need for the community and NGOs to be more actively involved in ‘steering’ and supporting the work of the LW, and for ongoing consultation with the community. This included creating a steering group or reference group, with representatives from the community, NGOs and the SLHD working together to detect issues and address them.

“… We would have preferred the structure for the reference group to have the health people involved and the non-government people [community and NGOs] involved to have been in the same group … that increases the understanding of everybody around the table, and quite often it also means that people in power will also understand that there are aspects to a problem that they might not necessarily be aware of.” (NGO staff 04)

Discussion

Our case study describes the initial phase of a program designed to identify and address the health needs and health care access of community members from vulnerable groups in an inner-city suburb. This component of the program evaluation focussed on the work of the LW and was based on interviews with community members and staff from the SLHD and other relevant community organisations. It explored perceptions of the impact of the work, its success factors and challenges, and gathered suggestions for how the LW role and the broader Healthy Living Program might be improved. Overall, the interview respondents reported that the LW was working successfully with individuals to identify their health needs and to find ways of addressing access and service delivery issues. The work also addressed some of the health issues of the broader community, including the oral health needs of residents, with the LW acting as an advocate or link and enabling changes to the way health services were delivered to meet the needs of individuals and some at-risk groups.

The interviews with staff from NGOs, other government agencies and the SLHD highlighted factors which have facilitated the success of the role. Having relevant experience and personal skills, such as being a good communicator and having knowledge about the community and health services, were critical factors for the success and acceptance of the LW as a trusted, respected and consistent presence in the community. A number of reviews and studies of similar roles, such as CHWs, LWs and PNs, have identified similar factors as facilitators of success with other vulnerable groups [ 5 , 7 , 15 , 16 , 17 , 18 , 27 , 28 , 29 , 30 ].

The interviews highlighted the importance of the LW’s ability to communicate with and effectively promote collaboration among service providers and stakeholders, including coordinating their involvement in activities to promote health. Similarly, Pescheny and colleagues have noted the value of effective and regular communication between LWs and service providers in facilitating service delivery [ 30 ]. A range of other studies have also reported good communication between these stakeholders as an important enabler of improved integrated service delivery [ 7 , 15 , 16 , 30 ]. Trust was seen as a vital element for acceptance and connection to clients within the communities of similar programs [ 5 , 7 , 16 , 27 , 28 ], especially among marginalised populations [ 7 ]. One factor that was not emphasised in the literature, but which both clients and staff from other government and community organisations valued in this study, was the importance of the LW being a consistent presence, whether at the end of the telephone for support, at meetings or through ongoing involvement in projects valued by the community and other stakeholders.

Constraints of the role highlighted in the interviews, such as the number and range of the LW’s responsibilities and the lack of role clarity, matched the available information in the literature. In their reviews of the factors influencing the implementation and maintenance of navigation programs, a number of studies attributed the success of a program in part to role clarity for the worker as well as for clients and health service providers [ 5 , 7 , 15 , 17 , 18 , 29 ]. A review of an indigenous health worker program revealed that a good understanding of the role by service providers and Aboriginal community members was an enabler of effective care coordination [ 18 ]. In addition to a lack of clear understanding and agreement on the LW role, disengagement by health service providers and a lack of shared partnerships between navigator services and other stakeholders were found to be major barriers to implementation in a review of social prescribing services [ 30 ]. Perhaps because this Program was in an early stage of development, and although the need for the Program was identified, the scale of the demand for assistance led interviewees to suggest that the work was too much for one LW and that additional staff be employed to cover the shortfall.

This qualitative evaluation also identified other barriers, including a lack of ownership of the work by community members and community-based NGOs, and the difficulty of bringing about changes to how health services were delivered. This issue was also explored by Scott and colleagues in their review of reviews on CHW programs, which revealed two key enablers of these programs’ effectiveness in effecting system change: community embeddedness and integration into the health system [ 29 ]. Achieving community embeddedness was tied to the active involvement of community members in CHW recruitment, priority-setting and monitoring. Integration into the health system would ‘foster’ respectful collaboration and communication between workers and senior health staff, and the health service may benefit from the ‘unique and practical knowledge’ of the workers. This in turn may lead to system change.

The WHO guideline on health policy and system support to optimise CHW programs made similar recommendations related to the constraints identified in our evaluation, including: the necessary personal attributes and professional experiences of the workers; adequate funding; a clear understanding of the scope of work and its anticipated responsibilities and role by the workers, clients and stakeholders; community engagement, including community participation in the selection of workers, priority setting and monitoring of the work; and integration in and support by the health system [ 17 ].

Following our evaluation, five key high-level recommendations were made to the SLHD: investigate opportunities to extend the work; concentrate the work on significant issues faced by the community; increase community involvement in the program of work; improve collaborations between the Program, individual SLHD services and the community; and link the Program with other similar programs within the SLHD or more widely. The SLHD is currently considering the recommendations as it develops the next phase of the Program.

The main limitation of the interview study was that people who were not interviewed, including community members who could not communicate in English without assistance, may have had very different views from those who were. However, the study did capture pertinent views from a wide range of stakeholders, and similar themes emerged across the diverse groups, suggesting that most of the important issues perceived by the participants were identified. These themes also aligned with the literature.

Many health jurisdictions around the world are looking at ways to address the social determinants of health in their vulnerable populations and to improve the integration of their health care. We present the first phase in the development of a Healthy Living Program, namely the implementation of a LW role in the community, as part of an integrated care initiative in an inner-city suburb with a large housing estate. The Program aimed to address community concerns about a range of health issues and community perceptions that health services were not responsive to their needs. Our evaluation showed that this new model for addressing health issues in this community was well received, and respondents reported that it provided a valued link between the community and the health services. While most clients and stakeholders were satisfied with the work, some constraining factors were identified. A number of these constraints have also been identified in similar programs across the globe, where they have reduced program impact. For the ongoing development and success of these programs, it is imperative to learn from the lessons of other program implementations, to continue to evaluate new and ongoing programs, and to address any constraining factors that are identified. One significant area for improvement identified in this and other evaluations is to ensure that the work is supported by and integrated into local health services so that disadvantaged populations can access integrated care.

Acknowledgements

We wish to thank the people who participated in the key informant interviews for sharing their time and experience.


Funding Information

This study was funded by the Sydney Local Health District.

Competing Interests

The authors have no competing interests to declare.


Introduction to Program Evaluation for Public Health Programs: A Self-Study Guide

Introduction

  • What Is Program Evaluation?
  • Evaluation Supplements Other Types of Reflection and Data Collection
  • Distinguishing Principles of Research and Evaluation
  • Why Evaluate Public Health Programs?
  • CDC’s Framework for Program Evaluation in Public Health
  • How to Establish an Evaluation Team and Select a Lead Evaluator
  • Organization of This Manual

Most program managers assess the value and impact of their work all the time when they ask questions, consult partners, make assessments, and obtain feedback. They then use the information collected to improve the program. Indeed, such informal assessments fit nicely into a broad definition of evaluation as the “ examination of the worth, merit, or significance of an object. ” [4] And throughout this manual, the term “program” will be defined as “ any set of organized activities supported by a set of resources to achieve a specific and intended result. ” This definition is intentionally broad so that almost any organized public health action can be seen as a candidate for program evaluation:

  • Direct service interventions (e.g., a program that offers free breakfasts to improve nutrition for grade school children)
  • Community mobilization efforts (e.g., an effort to organize a boycott of California grapes to improve the economic well-being of farm workers)
  • Research initiatives (e.g., an effort to find out whether disparities in health outcomes based on race can be reduced)
  • Advocacy work (e.g., a campaign to influence the state legislature to pass legislation regarding tobacco control)
  • Training programs (e.g., a job training program to reduce unemployment in urban neighborhoods)

What distinguishes program evaluation from ongoing informal assessment is that program evaluation is conducted according to a set of guidelines. With that in mind, this manual defines program evaluation as “the systematic collection of information about the activities, characteristics, and outcomes of programs to make judgments about the program, improve program effectiveness, and/or inform decisions about future program development.” [5] Program evaluation does not occur in a vacuum; rather, it is influenced by real-world constraints. Evaluation should be practical and feasible and conducted within the confines of resources, time, and political context. Moreover, it should serve a useful purpose, be conducted in an ethical manner, and produce accurate findings. Evaluation findings should be used both to make decisions about program implementation and to improve program effectiveness.

Many different questions can be part of a program evaluation, depending on how long the program has been in existence, who is asking the question, and why the information is needed.

In general, evaluation questions fall into these groups:

  • Implementation: Were your program’s activities put into place as originally intended?
  • Effectiveness: Is your program achieving the goals and objectives it was intended to accomplish?
  • Efficiency: Are your program’s activities being produced with appropriate use of resources such as budget and staff time?
  • Cost-Effectiveness: Does the value or benefit of achieving your program’s goals and objectives exceed the cost of producing them?
  • Attribution: Can progress on goals and objectives be shown to be related to your program, as opposed to other things that are going on at the same time?

All of these are appropriate evaluation questions and might be asked with the intention of documenting program progress, demonstrating accountability to funders and policymakers, or identifying ways to make the program better.

Planning asks, “What are we doing and what should we do to achieve our goals?” By providing information on progress toward organizational goals and identifying which parts of the program are working well and/or poorly, program evaluation sets up the discussion of what can be changed to help the program better meet its intended goals and objectives.

Increasingly, public health programs are accountable to funders, legislators, and the general public. Many programs do this by creating, monitoring, and reporting results for a small set of markers and milestones of program progress. Such “performance measures” are a type of evaluation—answering the question “How are we doing?” More importantly, when performance measures show significant or sudden changes in program performance, program evaluation efforts can be directed to the troubled areas to determine “Why are we doing poorly or well?”

Linking program performance to program budget is the final step in accountability. Called “activity-based budgeting” or “performance budgeting,” it requires an understanding of program components and the links between activities and intended outcomes. The early steps in the program evaluation approach (such as logic modeling) clarify these relationships, making the link between budget and performance easier and more apparent.

While the terms surveillance and evaluation are often used interchangeably, each makes a distinctive contribution to a program, and it is important to clarify their different purposes. Surveillance is the continuous monitoring or routine data collection on various factors (e.g., behaviors, attitudes, deaths) over a regular interval of time. Surveillance systems have existing resources and infrastructure. Data gathered by surveillance systems are invaluable for performance measurement and program evaluation, especially of longer term and population-based outcomes. In addition, these data serve an important function in program planning and “formative” evaluation by identifying key burden and risk factors—the descriptive and analytic epidemiology of the public health problem. There are limits, however, to how useful surveillance data can be for evaluators. For example, some surveillance systems such as the Behavioral Risk Factor Surveillance System (BRFSS), Youth Tobacco Survey (YTS), and Youth Risk Behavior Survey (YRBS) can measure changes in large populations, but have insufficient sample sizes to detect changes in outcomes for more targeted programs or interventions. Also, these surveillance systems may have limited flexibility to add questions for a particular program evaluation.

In the best of all worlds, surveillance and evaluation are companion processes that can be conducted simultaneously. Evaluation may supplement surveillance data by providing tailored information to answer specific questions about a program. Data from specific questions for an evaluation are more flexible than surveillance and may allow program areas to be assessed in greater depth. For example, a state may supplement surveillance information with detailed surveys to evaluate how well a program was implemented and the impact of a program on participants’ knowledge, attitudes, and behavior. Evaluators can also use qualitative methods (e.g., focus groups, semi-structured or open-ended interviews) to gain insight into the strengths and weaknesses of a particular program activity.

Both research and program evaluation make important contributions to the body of knowledge, but fundamental differences in the purpose of research and the purpose of evaluation mean that good program evaluation need not always follow an academic research model. Even though some of these differences have tended to break down as research tends toward increasingly participatory models [6] and some evaluations aspire to make statements about attribution, “pure” research and evaluation serve somewhat different purposes (see the “Distinguishing Principles of Research and Evaluation” table below), nicely summarized in the adage “Research seeks to prove; evaluation seeks to improve.” Academic research focuses primarily on testing hypotheses; a key purpose of program evaluation is to improve practice. Research is generally thought of as requiring a controlled environment or control groups. In field settings directed at prevention and control of a public health problem, this is seldom realistic. Of the ten concepts contrasted in the table, the last three are especially worth noting. Unlike pure academic research models, program evaluation acknowledges and incorporates differences in values and perspectives from the start, may address many questions besides attribution, and tends to produce results for varied audiences.

Distinguishing Principles of Research and Evaluation

Planning
  • Research: Scientific method: state hypothesis; collect data; analyze data; draw conclusions.
  • Program evaluation: Framework for program evaluation: engage stakeholders; describe the program; focus the evaluation design; gather credible evidence; justify conclusions; ensure use and share lessons learned.

Decision making
  • Research: Investigator-controlled; authoritative.
  • Program evaluation: Stakeholder-controlled; collaborative.

Standards
  • Research: Internal validity (accuracy, precision); external validity (generalizability); repeatability.
  • Program evaluation: The program evaluation standards: utility; feasibility; propriety; accuracy.

Questions
  • Research: Descriptions; associations.
  • Program evaluation: Merit (i.e., quality); worth (i.e., value); significance (i.e., importance).

Design
  • Research: Isolate changes and control circumstances: narrow experimental influences; ensure stability over time; minimize context dependence; treat contextual factors as confounding (e.g., randomization, adjustment, statistical control); understand that comparison groups are a necessity.
  • Program evaluation: Incorporate changes and account for circumstances: expand to see all domains of influence; encourage flexibility and improvement; maximize context sensitivity; treat contextual factors as essential information (e.g., system diagrams, logic models, hierarchical or ecological modeling); understand that comparison groups are optional (and sometimes harmful).

Data collection
  • Research: Limited number of sources (accuracy preferred); sampling strategies are critical; concern for protecting human subjects.
  • Program evaluation: Indicators/measures that are multiple (triangulation preferred) and may be quantitative, qualitative, or mixed methods (integrating both); concern for protecting human subjects, organizations, and communities.

Analysis and synthesis
  • Research: One-time (at the end); focus on specific variables.
  • Program evaluation: Ongoing (formative and summative); integrate all data.

Judgments
  • Research: Attempt to remain value-free.
  • Program evaluation: Examine agreement on values; state precisely whose values are used.

Conclusions
  • Research: Attribution: establish time sequence; demonstrate plausible mechanisms; control for confounding; replicate findings.
  • Program evaluation: Attribution and contribution: account for alternative explanations; show similar effects in similar contexts.

Uses
  • Research: Disseminate to interested audiences; content and format vary to maximize comprehension.
  • Program evaluation: Feedback to stakeholders; focus on intended uses by intended users; build capacity; emphasis on full disclosure; requirement for balanced assessment.

Why Evaluate Public Health Programs?

Program evaluation can be used:

  • To monitor progress toward the program’s goals
  • To determine whether program components are producing the desired progress on outcomes
  • To permit comparisons among groups, particularly among populations with disproportionately high risk factors and adverse health outcomes
  • To justify the need for further funding and support
  • To find opportunities for continuous quality improvement
  • To ensure that effective programs are maintained and resources are not wasted on ineffective programs

Program staff may be pushed to do evaluation by external mandates from funders, authorizers, or others, or they may be pulled to do evaluation by an internal need to determine how the program is performing and what can be improved. While push or pull can motivate a program to conduct good evaluations, program evaluation efforts are more likely to be sustained when staff see the results as useful information that can help them do their jobs better.

Data gathered during evaluation enable managers and staff to create the best possible programs, to learn from mistakes, to make modifications as needed, to monitor progress toward program goals, and to judge the success of the program in achieving its short-term, intermediate, and long-term outcomes. Most public health programs aim to change behavior in one or more target groups and to create an environment that reinforces sustained adoption of these changes, with the intention that changes in environments and behaviors will prevent and control diseases and injuries. Through evaluation, you can track these changes and, with careful evaluation designs, assess the effectiveness and impact of a particular program, intervention, or strategy in producing these changes.

Recognizing the importance of evaluation in public health practice and the need for appropriate methods, the World Health Organization (WHO) established the Working Group on Health Promotion Evaluation. The Working Group prepared a set of conclusions and related recommendations to guide policymakers and practitioners. [7] Recommendations immediately relevant to the evaluation of comprehensive public health programs include:

  • Encourage the adoption of participatory evaluation approaches that provide meaningful opportunities for involvement by all of those with a direct interest in initiatives (programs, policies, and other organized activities).
  • Require that a portion of total financial resources for a health promotion initiative be allocated to evaluation—they recommend 10%.
  • Ensure that a mixture of process and outcome information is used to evaluate all health promotion initiatives.
  • Support the use of multiple methods to evaluate health promotion initiatives.
  • Support further research into the development of appropriate approaches to evaluating health promotion initiatives.
  • Support the establishment of a training and education infrastructure to develop expertise in the evaluation of health promotion initiatives.
  • Create and support opportunities for sharing information on evaluation methods used in health promotion through conferences, workshops, networks, and other means.

Figure 1.1. The figure presents the steps and standards of the CDC Evaluation Framework. The six steps are: (1) engage stakeholders, (2) describe the program, (3) focus the evaluation design, (4) gather credible evidence, (5) justify conclusions, and (6) ensure use and share lessons learned.

Program evaluation is one of ten essential public health services [8] and a critical organizational practice in public health. [9] Until recently, however, there has been little agreement among public health officials on the principles and procedures for conducting such studies. In 1999, CDC published Framework for Program Evaluation in Public Health and some related recommendations. [10] The Framework, as depicted in Figure 1.1, defined six steps and four sets of standards for conducting good evaluations of public health programs.

The underlying logic of the Evaluation Framework is that good evaluation does not merely gather accurate evidence and draw valid conclusions, but produces results that are used to make a difference. To maximize the chances evaluation results will be used, you need to create a “market” before you create the “product”—the evaluation. You determine the market by focusing evaluations on questions that are most salient, relevant, and important. You ensure the best evaluation focus by understanding where the questions fit into the full landscape of your program description, and especially by ensuring that you have identified and engaged stakeholders who care about these questions and want to take action on the results.

The steps in the CDC Framework are informed by a set of standards for evaluation. [11] These standards do not constitute a way to do evaluation; rather, they serve to guide your choice from among the many options available at each step in the Framework. The 30 standards cluster into four groups:

Utility: Who needs the evaluation results? Will the evaluation provide relevant information in a timely manner for them?

Feasibility: Are the planned evaluation activities realistic given the time, resources, and expertise at hand?

Propriety: Does the evaluation protect the rights of individuals and protect the welfare of those involved? Does it engage those most directly affected by the program and changes in the program, such as participants or the surrounding community?

Accuracy: Will the evaluation produce findings that are valid and reliable, given the needs of those who will use the results?

Sometimes the standards broaden your exploration of choices. Often, they help reduce the options at each step to a manageable number. For example, in the step “Engaging Stakeholders,” the standards can help you think broadly about who constitutes a stakeholder for your program, but simultaneously can reduce the potential list to a manageable number by posing the following questions: ( Utility ) Who will use these results? ( Feasibility ) How much time and effort can be devoted to stakeholder engagement? ( Propriety ) To be ethical, which stakeholders need to be consulted, those served by the program or the community in which it operates? ( Accuracy ) How broadly do you need to engage stakeholders to paint an accurate picture of this program?

Similarly, there are unlimited ways to gather credible evidence (Step 4). Asking these same kinds of questions as you approach evidence gathering will help identify the ones that will be most useful, feasible, proper, and accurate for this evaluation at this time. Thus, the CDC Framework approach supports the fundamental insight that there is no such thing as the right program evaluation. Rather, over the life of a program, any number of evaluations may be appropriate, depending on the situation.

Characteristics of a good evaluator:

  • Experience in the type of evaluation needed
  • Comfortable with quantitative data sources and analysis
  • Able to work with a wide variety of stakeholders, including representatives of target populations
  • Can develop innovative approaches to evaluation while considering the realities affecting a program (e.g., a small budget)
  • Incorporates evaluation into all program activities
  • Understands both the potential benefits and risks of evaluation
  • Educates program personnel in designing and conducting the evaluation
  • Will give staff the full findings (i.e., will not gloss over or fail to report certain findings)

Good evaluation requires a combination of skills that are rarely found in one person. The preferred approach is to choose an evaluation team that includes internal program staff, external stakeholders, and possibly consultants or contractors with evaluation expertise.

An initial step in the formation of a team is to decide who will be responsible for planning and implementing evaluation activities. One program staff person should be selected as the lead evaluator to coordinate program efforts. This person should be responsible for evaluation activities, including planning and budgeting for evaluation, developing program objectives, addressing data collection needs, reporting findings, and working with consultants. The lead evaluator is ultimately responsible for engaging stakeholders, consultants, and other collaborators who bring the skills and interests needed to plan and conduct the evaluation.

Although this staff person should have the skills necessary to competently coordinate evaluation activities, he or she can choose to look elsewhere for technical expertise to design and implement specific tasks. However, developing in-house evaluation expertise and capacity is a beneficial goal for most public health organizations. Of the characteristics of a good evaluator listed in the text box above, the evaluator’s ability to work with a diverse group of stakeholders warrants highlighting. The lead evaluator should be willing and able to draw out and reconcile differences in values and standards among stakeholders and to work with knowledgeable stakeholder representatives in designing and conducting the evaluation.

Seek additional evaluation expertise in programs within the health department, through external partners (e.g., universities, organizations, companies), from peer programs in other states and localities, and through technical assistance offered by CDC. [12]

You can also use outside consultants as volunteers, advisory panel members, or contractors. External consultants can provide high levels of evaluation expertise from an objective point of view. Important factors to consider when selecting consultants are their level of professional training, experience, and ability to meet your needs. Overall, it is important to find a consultant whose approach to evaluation, background, and training best fit your program’s evaluation needs and goals. Be sure to check all references carefully before you enter into a contract with any consultant.

To generate discussion around evaluation planning and implementation, several states have formed evaluation advisory panels. Advisory panels typically generate input from local, regional, or national experts otherwise difficult to access. Such an advisory panel will lend credibility to your efforts and prove useful in cultivating widespread support for evaluation activities.

Evaluation team members should clearly define their respective roles. For some teams, informal consensus may be enough; others prefer a written agreement that describes who will conduct the evaluation and assigns specific roles and responsibilities to individual team members. Either way, the team must clarify and reach consensus on the:

  • Purpose of the evaluation
  • Potential users of the evaluation findings and plans for dissemination
  • Evaluation approach
  • Resources available
  • Protection for human subjects.

The agreement should also include a timeline and a budget for the evaluation.

This manual is organized by the six steps of the CDC Framework. Each chapter will introduce the key questions to be answered in that step, approaches to answering those questions, and how the four evaluation standards might influence your approach. The main points are illustrated with one or more public health examples that are composites inspired by actual work being done by CDC and states and localities. [13] Some examples that will be referred to throughout this manual:

The program aims to provide affordable home ownership to low-income families by identifying and linking funders/sponsors, construction volunteers, and eligible families. Together, they build a house over a multi-week period. At the end of the construction period, the home is sold to the family using a no-interest loan.

Lead poisoning is the most widespread environmental hazard facing young children, especially in older inner-city areas. Even at low levels, elevated blood lead levels (EBLL) have been associated with reduced intelligence, medical problems, and developmental problems. The main sources of lead poisoning in children are paint and dust in older homes with lead-based paint. Public health programs address the problem through a combination of primary and secondary prevention efforts. A typical secondary prevention program at the local level does outreach and screening of high-risk children, identifying those with EBLL, assessing their environments for sources of lead, and case managing both their medical treatment and environmental corrections. However, these programs must rely on others to accomplish the actual medical treatment and the reduction of lead in the home environment.

A common initiative of state immunization programs is comprehensive provider education programs to train and motivate private providers to provide more immunizations. A typical program includes a newsletter distributed three times per year to update private providers on new developments and changes in policy, and provide a brief education on various immunization topics; immunization trainings held around the state conducted by teams of state program staff and physician educators on general immunization topics and the immunization registry; a Provider Tool Kit on how to increase immunization rates in their practice; training of nursing staff in local health departments who then conduct immunization presentations in individual private provider clinics; and presentations on immunization topics by physician peer educators at physician grand rounds and state conferences.

Each chapter also provides checklists and worksheets to help you apply the teaching points.

[4] Scriven M. Minimalist theory of evaluation: The least theory that practice requires. American Journal of Evaluation 1998;19:57-70.

[5] Patton MQ. Utilization-focused evaluation: The new century text. 3rd ed. Thousand Oaks, CA: Sage, 1997.

[6] Green LW, George MA, Daniel M, Frankish CJ, Herbert CP, Bowie WR, et al. Study of participatory research in health promotion: Review and recommendations for the development of participatory research in health promotion in Canada. Ottawa, Canada: Royal Society of Canada, 1995.

[7] WHO European Working Group on Health Promotion Evaluation. Health promotion evaluation: Recommendations to policy-makers: Report of the WHO European working group on health promotion evaluation. Copenhagen, Denmark: World Health Organization, Regional Office for Europe, 1998.

[8] Public Health Functions Steering Committee. Public health in America. Fall 1994. Available at <http://www.health.gov/phfunctions/public.htm>. Accessed January 1, 2000.

[9] Dyal WW. Ten organizational practices of public health: A historical perspective. American Journal of Preventive Medicine 1995;11(6)Suppl 2:6-8.

[10] Centers for Disease Control and Prevention. op cit.

[11] Joint Committee on Standards for Educational Evaluation. The program evaluation standards: How to assess evaluations of educational programs. 2nd ed. Thousand Oaks, CA: Sage Publications, 1994.

[12] CDC’s Prevention Research Centers (PRC) program is an additional resource. The PRC program is a national network of 24 academic research centers committed to prevention research and the ability to translate that research into programs and policies. The centers work with state health departments and members of their communities to develop and evaluate state and local interventions that address the leading causes of death and disability in the nation. Additional information on the PRCs is available at www.cdc.gov/prc/index.htm.

[13] These cases are composites of multiple CDC and state and local efforts that have been simplified and modified to better illustrate teaching points. While inspired by real CDC and community programs, they are not intended to reflect the current status of any particular program.


  • Research Note
  • Open access
  • Published: 14 May 2024

How effective is nutrition training for staff running after school programs in improving quality of food purchased and meal practices? A program evaluation

  • Cecilie Beinert 1 ,
  • Margrethe Røed 1 &
  • Frøydis N. Vik 1  

BMC Research Notes volume  17 , Article number:  136 ( 2024 ) Cite this article

35 Accesses

Metrics details

Objectives / purpose

After school programs represent a setting for promoting healthy dietary habits. The aim of this study was to evaluate how effective the after school program staff perceived the nutrition training to be in improving the quality of food purchased and meal practices. We further aimed to assess changes in the purchase of primarily fish and fish products, whole grains, and fruit and vegetables by collecting receipts from food purchases before and after the intervention.

This is a mixed methods study. Group interviews with after school program staff were carried out and the data were analyzed deductively according to the RE-AIM framework. Receipts from food purchases were collected. Findings from the qualitative interviews indicated that the intervention had been a positive experience for the staff and suggested a new way of working with promoting healthy foods in after school program units. Although some challenges were reported, the staff made the necessary adjustments so that the changes could be sustained over time. Findings from the receipts support the changes reported by the staff: they showed increased purchases of vegetables, fish, and whole grains in all four after school program units. After school programs in similar settings may build on these findings to improve students’ dietary habits.

Peer Review reports

Introduction

Investing in young children’s health, education, and development is fundamental for the individual’s lifelong health and development [ 1 ]. An unhealthy diet during childhood tends to track into adulthood [ 2 ] and increases the risk of childhood obesity [ 3 ] and non-communicable diseases later in life [ 4 ]. Developing healthy eating habits in childhood is therefore essential to maintain good health throughout life [ 5 ].

In Norway, the diet of children and adolescents (aged 9 and 13) is mostly in line with national dietary guidelines; however, they still consume too much saturated fat and added sugar and too little fruit, vegetables and fish [ 6 ]. Parents play an important role in establishing healthy eating habits in their children, but arenas where children spend a considerable amount of their time and consume meals on a regular basis also influence eating habits. Schools and after school programs (ASPs) both represent arenas for health promotion, as they are, among other things, expected to facilitate daily mealtimes that give the students a basis for developing enjoyment of food, a sense of community and good health habits [ 7 ]. Although the national guidance for ASPs is clear that the food served shall be in accordance with national dietary guidelines, previous investigations suggest this is not always the case [ 8 , 9 ], even though the evidence base regarding foods served in ASPs in Norway is scarce. Every ASP has one leader and several members of staff, and there are no formal competence requirements for staff working in ASPs in Norway [ 10 ]. In 2021, 6 out of 10 students attended ASPs; attendance is most common among first graders (83%) and declines through fourth grade (30%) [ 11 ]. In 2022, the Norwegian government introduced up to 12 hours per week of free ASP for all first graders, resulting in an increase in participation from 83% to 92% [ 12 ]. As of August 1st, 2023, the government further introduced 12 hours per week of free ASP for second graders [ 13 ].

The intervention

A non-profit foundation, Geitmyra Culinary Center for Children (hereafter Geitmyra), has developed an intervention, a nutrition training program, to enhance the competence of those working with food in ASPs, aiming to improve the quality of foods and meal practices within the setting and context of the ASP. Examples of context include the time available for preparing meals, the amount of money allocated to food purchases, and whether the kitchen facilities allow different dishes to be prepared.

The intervention was carried out in autumn 2022. The staff at the ASPs were to serve food in accordance with national dietary guidelines, implementing changes gradually and maintaining the dietary changes over time. The schools were public, were recruited through the municipality, and volunteered to serve as pilot schools to inform a scale-up to the whole municipality.

The intervention lasted for five months and consisted of different components, e.g. initial meetings, menu planning, cooking courses and a “food celebration”. See Table 1 for details.

Considering Norwegian students’ non-optimal diet and the need to increase ASP staff’s competence in preparing healthy meals served to students, our research questions were:

How effective did the ASP staff perceive the nutrition training to be?

How effective was the nutrition training targeting ASP staff in improving quality of foods purchased?

The aim of this study was to evaluate how effective the ASP staff perceived the nutrition training to be in improving the quality of food purchased and meal practices. Further, we aimed to assess changes in the purchase of primarily fish and fish products, whole grains, and fruit and vegetables by collecting receipts from the ASPs’ food purchases before and after the intervention.

Methods and materials

Study design

To answer research question 1, the process evaluation, a qualitative study was conducted. Semi-structured group interviews were chosen as the data collection method because it gives comparable, reliable data and the flexibility to ask follow-up questions to clarify answers [ 14 ].

To answer research question 2, the outcome evaluation, a before-and-after comparison of quantitative data (foods purchased by the ASPs) was used. This was intended to test the data collection method and was not powered to detect statistically significant change.

Sample and procedure

Process evaluation: All ASP leaders from the participating ASPs were invited to take part in a digital group interview. There were four participants, all female, who agreed to participate by written consent; the group interview lasted 55 minutes. A semi-structured interview guide [ 15 ] was developed, exploring general experiences of the nutrition training as well as possible barriers and facilitators to implementing the knowledge and skills. One ASP staff member (female) was interviewed individually via Zoom for convenience; this interview lasted 43 minutes. We were unable to recruit more participants among ASP staff. All interviews were conducted by two research staff, a research scientist (MR) and a master’s student (KDB), both of whom were women and teachers of food and health at a university in Southern Norway. Interviews were audio-recorded with consent from participants. Receipts of food purchases (objective measures) for all four ASP units were collected from January to May in 2022 and 2023, respectively.

Data analysis – process evaluation

Interviews were transcribed verbatim (KDB). The transcripts were uploaded to NVivo 12 Pro. Data were analysed deductively to reflect participants’ perceptions of program impact as it related to the components of the RE-AIM framework. The RE-AIM framework was chosen to structure presentation of the process evaluation data because of its long history and broad utility [ 16 , 17 ]. RE-AIM is a framework to guide the planning and evaluation of programs or interventions and involves assessment of program impact against five domains: Reach, Effectiveness, Adoption, Implementation, and Maintenance. Recordings were listened to several times by the lead researcher (CB) and the data were coded under the headings of the RE-AIM framework. Two other research team members (FNV, MR) familiarized themselves with the material to check for similarities and differences between the researchers. Discrepancies were resolved through discussions.
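As an illustration of this deductive coding step, the following minimal sketch shows one way coded excerpts could be organised and tallied under the five RE-AIM domains. The data structure, the tally function and the abbreviated excerpts are assumptions for illustration only; the actual coding was carried out in NVivo 12 Pro, not in code.

```python
from collections import Counter

# The five RE-AIM domains used as the deductive coding frame.
REAIM_DOMAINS = {"Reach", "Effectiveness", "Adoption", "Implementation", "Maintenance"}

# Hypothetical coded excerpts: (domain, speaker, abbreviated excerpt). Illustrative only.
coded_excerpts = [
    ("Effectiveness", "ASP leader 1", "We have stopped always substituting with crispbread."),
    ("Implementation", "ASP leader 2", "We make it happen, but not to the same extent."),
    ("Effectiveness", "ASP leader 3", "We get a bit of leftovers."),
]

def tally_by_domain(excerpts):
    """Count how many excerpts were coded under each RE-AIM domain,
    flagging any code that falls outside the deductive frame."""
    counts = Counter()
    for domain, _speaker, _text in excerpts:
        if domain not in REAIM_DOMAINS:
            raise ValueError(f"Unknown domain: {domain}")
        counts[domain] += 1
    return counts

print(tally_by_domain(coded_excerpts))
# Counter({'Effectiveness': 2, 'Implementation': 1})
```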

Data analysis - outcome evaluation

Grams per student per month were calculated from the receipts for the different food groups at both time points using Excel. Figures for pure fish and fish products, whole grains, fruit, and vegetables are highlighted because these food groups were targeted in the intervention.
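The same calculation can be expressed in a few lines of code. The sketch below is a minimal illustration of the grams-per-student-per-month metric, assuming hypothetical receipt records, food-group labels, and student counts and a five-month collection window; the actual analysis was done in Excel, and the numbers here are invented for illustration.

# Minimal sketch of the grams-per-student-per-month calculation.
# Receipt records, food-group labels, and student counts are hypothetical;
# the study's actual analysis was carried out in Excel.
from collections import defaultdict

# Each record: (ASP unit, period, food group, grams purchased on one receipt)
receipts = [
    ("ASP 1", "pre", "vegetables", 12000),
    ("ASP 1", "pre", "pure fish", 3000),
    ("ASP 1", "post", "vegetables", 18500),
    ("ASP 1", "post", "pure fish", 4200),
]
students = {"ASP 1": 60}  # enrolled students per unit (hypothetical)
months = 5                # receipts covered January to May in both years

totals = defaultdict(float)
for unit, period, group, grams in receipts:
    totals[(unit, period, group)] += grams

for (unit, period, group), grams in sorted(totals.items()):
    per_student_per_month = grams / students[unit] / months
    print(f"{unit} {period:>4} {group:<12} {per_student_per_month:6.1f} g/student/month")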

Results

Process evaluation – research question 1

All ASP staff were invited to participate, and it was up to the individual ASP leader to decide how many of their staff took part in the nutrition training. In total, 23 people participated in the inspirational course (all four ASP leaders and 19 ASP staff from the four schools).

Effectiveness

Recurring feedback was that the intervention was a positive experience and had provided the staff with valuable input. They reported changes in the way the meals are organized and that they now included more vegetables. Instead of offering alternative dishes, such as bread with optional spread, if students did not want the meal offered, they now offered cut vegetables or a single piece of bread as a side dish to the prepared meal, to encourage the students to try what was offered: “we have stopped always substituting with crispbread. But when serving soup, we either make crisp bread or they get a slice of whole grain bread. By that, we also learned that they don’t need so much, they get something. And you see that when they don’t get anything else, they try it instead” (ASP leader 1).

A small group of the older students in the ASP were also trained during the intervention and helped select, prepare, and serve the meals, which the employees found helpful because they now had more time for the students.

The interviewees discussed how the intervention had been a positive experience. The recipes were possible to carry out with the facilities available, and the dialogue with Geitmyra during the intervention had been positive and helpful. Still, a few challenges were reported. One school reported a challenge related to recruiting the older students to help prepare or serve the meal, because there were few students in this age group and most of them wanted to play or do other things. Challenges with left-over food were also reported by several, as the students did not like all the dishes that were served or the portion sizes were too large. The staff worked hard to avoid food waste by adjusting portion sizes, replacing recipes with ones the students liked, giving leftovers to the teachers at the school, and storing the food to use later that day or the day after. As one ASP leader expressed it, “We are fortunate to always have teachers who are hungry in the afternoon, so we usually get rid of the food …. The last time we had lentil soup, I heated it up and put it out when the parents came to pick them up. So, they got to taste it, but we, I also struggle making… its still a bit… too much, so we get a bit of leftovers” (ASP leader 3).

Implementation

The interviewees described how they adjusted the intervention to make it sustainable over time. This was partly related to barriers such as time and personnel: “… we make it happen, but not to the extent that we did perhaps when Geitmyra was there with two extra adults… ” (ASP leader 2). One leader described using extra ovens in the school kitchen to prepare large quantities of crispbread, for instance. Another ASP leader said they took the older students, who helped prepare the food, out of class a bit earlier, so they had enough time to prepare everything. Some also said they skipped recipes their students did not like, instead of reintroducing them for repeated exposure. Some also adjusted the recipes to what the students preferred, for example by reducing the amount of spices or by exchanging ingredients. No issues were reported regarding the cost of purchasing food or equipment.

Maintenance

During the interviews, all agreed that they now had the tools needed to sustain these changes over time. As one ASP leader expressed it, “ I really feel we got everything we need and more to continue ” (ASP leader 4). The ASP staff member said she believed they would manage to continue, but that this required sustained initiative from their ASP leader. Post-intervention, the interviewees described how they had changed their practice.

Outcome evaluation – research question 2

Food purchases before and after the nutrition training were compared. In all four ASP units there was an increase in the purchase of vegetables, pure fish, and whole grains (Table 2). The purchase of fruit decreased in two units and increased in the other two. For results on other food groups relevant to the dietary guidelines, see Appendix 1.

Discussion

This study aimed to investigate whether nutrition training was effective in improving the skills and knowledge of ASP staff and whether this might lead to changes in food purchases in line with national dietary guidelines, which could improve the diet of students in ASPs. All four ASP units, with 23 participants in total, completed the nutrition training, which is considered reasonable given that ASP staff also needed to be with the students and some were absent from work (reach).

Findings from the interviews indicated that the intervention had been a successful way of working with promoting healthy nutrition in ASPs (effectiveness). Although the staff met some challenges (adoption) and made some adjustments (implementation) to the intervention, they reported changes regarding student involvement and what they serve. The ASP staff (including the leaders) reported being confident in continuing with the changes they had made, which may indicate greater confidence and competence in their work around mealtimes (maintenance). Although there were challenges regarding time and facilities, the staff made the adjustments necessary to make it work.

The changes reported by the ASP staff and leaders were supported by the receipts collected pre- and post-intervention. For two ASPs, fruit purchases went down after the intervention. The ASP leader at school 1 said they had almost stopped serving fruit and only served vegetables after the intervention, and school 4 mentioned that they had begun serving vegetables with all meals. This might explain the drop in fruit purchases at school 1 and the large rise in vegetable purchases at school 4. When fruit and vegetables were assessed combined, purchases increased in all schools.

Mozaffarian et al. evaluated an organizational intervention in after school programs in the US to improve snack and beverage quality [ 18 ]. They found significant improvements with respect to increasing servings of fresh fruits and vegetables [ 18 ], in line with our study regarding vegetables and partly for fruit. They did not find increased servings of whole grains, as we did in our study. Since that study evaluated private ASPs in the US, we cannot assume that settings and context are similar to our Norwegian study. To our knowledge, there is sparse literature available in a Nordic context on interventions targeting nutrition among ASP staff. Our findings may support implementation of a more comprehensive training program in Norway and may also inform after school programs in a Nordic setting.

This study found that nutrition training with close follow-up over time may be an effective way of creating change in food and meal practices in ASPs. There are large societal benefits to be gained if youth in Norway adhere to dietary guidelines, such as prevention of non-communicable diseases and economic gains [ 19 ]; investing in child health will therefore yield long-term benefits [ 1 , 2 , 4 ]. Although there are dietary guidelines for food offered in ASPs in Norway, these must also be followed by the individual units, which may not always be the case [ 8 , 9 ].

Findings from this program evaluation of nutrition training showed that it was effective in improving skills and knowledge among after school program staff. Receipts from food purchases before and after the intervention revealed an increase in purchase of vegetables, pure fish, and whole grain products.

Limitations

Although all four ASP leaders agreed to participate in group interviews, the number of ASP units was small, and we were only able to interview one ASP staff member. More participants would have made it possible to assess effectiveness to a larger degree. No control group was included, and we cannot rule out that changes in food purchases could be explained by factors other than the intervention. There was also no long-term follow-up of the participants. Strengths include the purchasing data (receipts) collected over a period of five months before and after the staff training, which provide objective evidence of change. The fact that receipts were collected for the same five months (January to May) in two consecutive years limits seasonal variation in food purchases and strengthens the comparison. A further strength is the use of mixed methods.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

  • ASP: After school program
  • Geitmyra: Geitmyra Culinary Center for Children

References

1. Clark H, Coll-Seck AM, Banerjee A, Peterson S, Dalglish SL, Ameratunga S, et al. A future for the world’s children? A WHO–UNICEF–Lancet Commission. Lancet. 2020;395(10224):605–58. https://doi.org/10.1016/S0140-6736(19)32540-1

2. Mitchell GL, Farrow C, Haycraft E, Meyer C. Parental influences on children’s eating behaviour and characteristics of successful parent-focussed interventions. Appetite. 2013;60(1):85–94.

3. World Health Organization. Report of the commission on ending childhood obesity. Geneva; 2016. https://apps.who.int/iris/handle/10665/204176

4. World Health Organization. Global action plan for the prevention and control of noncommunicable diseases 2013–2020. 2013. https://apps.who.int/iris/bitstream/handle/10665/94384/9789241506236_eng.pdf

5. Schwarzenberg SJ, Georgieff MK. Advocacy for improving nutrition in the first 1000 days to support childhood development and adult health. Pediatrics. 2018;141(2).

6. Hansen Brooke L, Myhre Borch J, Johannesen Wetting AM, Paulsen Mohn M, Andersen Frost L. UNGKOST 3: Nationwide dietary survey among students in 4th and 8th grade in Norway, 2015 (in Norwegian). Oslo; 2016. https://www.fhi.no/globalassets/dokumenterfiler/rapporter/2017/ungkost-3-rapport-blant-9-og-13-aringer_endeligversjon-12-01-17.pdf

7. Norwegian Directorate for Education and Training. Framework plan for SFO. 2021. https://www.udir.no/contentassets/ae8e58012cc94e04b0a767c9cea88e67/rammeplan-sfo-engelsk.pdf

8. The Norwegian Consumer Council. Food in after-school-hours care: Nutrition for life, play and learning (in Norwegian). 2018. https://storage02.forbrukerradet.no/media/2018/09/20180822-ke-appetitt-pa-livet-sfo-rapport.pdf

9. Norwegian Directorate of Health. Food and meals in after-school-hours care: A qualitative nationwide survey among after-school-hours care managers (in Norwegian). 2013. https://www.helsedirektoratet.no/rapporter/mat-og-maltider-i-skolen-og-skolefritidsordningen-undersokelser/Mat og måltider i skolefritidsordningen – En kvantitativ landsdekkende undersøkelse blant ledere av skolefritidsordningen.pdf/_/attachment/inline/8c.

10. Utdanningsforbundet. Use of assistants in primary school (in Norwegian). 2023. https://utdanning.no/yrker/beskrivelse/skoleassistent#:~:text=Det kreves ingen formell utdanning for å arbeide som skoleassistent.

11. Norwegian Directorate for Education and Training. The Norwegian Education Mirror 2022. 2023. pp. 37–40. https://www.udir.no/in-english/the-education-mirror-2022/compulsory-education2/out-of-school-care-abbreviated-as-sfo-in-norwegian/

12. Ministry of Education and Research. 60 000 students get cheaper after school program and 7 000 children get free kindergarten (in Norwegian). 2023. https://www.regjeringen.no/no/aktuelt/60-000-elever-far-billigere-sfo-og-7000-barn-far-gratis-barnehage/id2984316/

13. Lovdata. Regulations for the Education Act, Chap. 1B: The after-school program. § 1B-4 Free after-school program for students in 1st and 2nd year (in Norwegian). 2023. https://lovdata.no/dokument/SF/forskrift/2006-06-23-724/KAPITTEL_3#§1b-4

14. Malterud K. Qualitative research methods for medicine and health sciences (in Norwegian). 4th ed. Universitetsforlaget; 2017. p. 256.

15. Merriam SB. Qualitative research: a guide to design and implementation. San Francisco: Jossey-Bass; 2009. p. 304.

16. Shaw RB, Sweet SN, McBride CB, Adair WK, Martin Ginis KA. Operationalizing the reach, effectiveness, adoption, implementation, maintenance (RE-AIM) framework to evaluate the collective impact of autonomous community programs that promote health and well-being. BMC Public Health. 2019;19(1):1–14.

17. Glasgow RE, Vogt TM, Boles SM. Evaluating the public health impact of health promotion interventions: the RE-AIM framework. Am J Public Health. 1999;89(9):1322–7. https://doi.org/10.2105/AJPH.89.9.1322

18. Mozaffarian RS, Wiecha JL, Roth BA, Nelson TF, Lee RM, Gortmaker SL. Impact of an organizational intervention designed to improve snack and beverage quality in YMCA after-school programs. Am J Public Health. 2010;100(5):925.

19. Sælensminde K, Johansson L, Helleve A. Societal gains of adherence to dietary guidelines (in Norwegian). Oslo; 2016. https://www.helsedirektoratet.no/rapporter/samfunnsgevinster-av-a-folge-helsedirektoratets-kostrad/


Acknowledgements

The authors wish to thank the participants at each ASP unit and the staff at Geitmyra Culinary Center for Children. We also want to thank the master’s student (KDB) who participated in the interviews and transcribed them verbatim.

Funding

This study was funded by Sparebankstiftelsen, Kompetansefondet and the University of Agder. The financial contributors were not involved in designing the study, data collection, analyses, interpretation of data or in writing the manuscript.

Open access funding provided by University of Agder

Author information

Authors and Affiliations

Department of Nutrition and Public Health, University of Agder, Postboks 422, Kristiansand, 4604, Norway

Cecilie Beinert, Margrethe Røed & Frøydis N. Vik


Contributions

CB, FNV and MR initiated and designed the study and drafted the first version of the manuscript. MR performed the interviews. CB analyzed the interviews, and MR and FNV interpreted them together with CB. All authors contributed to, read, and approved the final version of this manuscript.

Corresponding author

Correspondence to Frøydis N. Vik.

Ethics declarations

Ethics approval and consent to participate

The study was approved by the Norwegian Agency for Shared Services in Education and Research (ref. 231769) and by the faculty ethics committee, and was conducted in line with the Declaration of Helsinki (1964, as revised in 2008). The study was voluntary, and written informed consent was obtained from all participants.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1: Appendix 1. Food purchase in grams per student per month

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Beinert, C., Røed, M. & Vik, F.N. How effective is nutrition training for staff running after school programs in improving quality of food purchased and meal practices? A program evaluation. BMC Res Notes 17 , 136 (2024). https://doi.org/10.1186/s13104-024-06798-5


Received: 14 November 2023

Accepted: 07 May 2024

Published: 14 May 2024

DOI: https://doi.org/10.1186/s13104-024-06798-5

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

Keywords

  • Nutrition training
  • Intervention



  • Open access
  • Published: 10 May 2024

Translating global evidence into local implementation through technical assistance: a realist evaluation of the Bloomberg Philanthropies Initiative for Global Road Safety

  • Rachel Neill   ORCID: orcid.org/0000-0002-1110-5479 1 ,
  • Angélica López Hernández 1 ,
  • Adam D. Koon 1 &
  • Abdulgafoor M. Bachani 1  

Globalization and Health volume  20 , Article number:  42 ( 2024 ) Cite this article

219 Accesses

Metrics details

Traffic-related crashes are a leading cause of premature death and disability. The safe systems approach, first developed in Sweden, is an evidence-informed set of innovations to reduce traffic-related injuries and deaths. Global health actors are adapting the model to improve road safety in low- and middle-income countries via technical assistance (TA) programs; however, there is little evidence on road safety TA across contexts. This study investigated how, why, and under what conditions technical assistance influenced evidence-informed road safety in Accra (Ghana), Bogotá (Colombia), and Mumbai (India), using a case study of the Bloomberg Philanthropies Initiative for Global Road Safety (BIGRS).

We conducted a realist evaluation with a multiple case study design to construct a program theory. Key informant interviews were conducted with 68 government officials, program staff, and other stakeholders. Documents were utilized to trace the evolution of the program. We used a retroductive analysis approach, drawing on the diffusion of innovation theory and guided by the context-mechanism-outcome approach to realist evaluation.

TA can improve road safety capabilities and increase the uptake of evidence-informed interventions. Hands-on capacity building tailored to specific implementation needs improved implementers’ understanding of new approaches. BIGRS generated novel, city-specific analytics that shifted the focus toward vulnerable road users. BIGRS and city officials launched pilots that brought evidence-informed approaches into practice. This built confidence by demonstrating successful implementation and allowing government officials to gauge public perception. But pilots had to scale within existing city and national contexts. City champions, governance structures, existing political prioritization, and socio-cultural norms influenced scale-up.

The program theory emphasizes the interaction of trust, credibility, champions and their authority, governance structures, political prioritization, and the implement-ability of international evidence in creating the conditions for road safety change. BIGRS continues to be a vehicle for improving road safety at scale and developing coalitions that assist governments in fulfilling their role as stewards of population well-being. Our findings improve understanding of the complex role of TA in translating evidence-informed interventions to country-level implementation and emphasize the importance of context-sensitive TA to increase impact.

Road traffic crashes are the leading cause of death for persons aged 5–29 years [ 1 ] and the 12th leading cause of death overall [ 2 ]. Road traffic mortality is three times higher in low-income countries than in high-income countries (HICs), despite low-income countries having less than 1% of global motor vehicles [ 2 ]. Over half of traffic-related deaths are vulnerable road users (e.g., pedestrians, cyclists, and motorcyclists) [ 2 ].

Attention to road safety has grown, supported by evidence on the severity of the problem and solutions [ 3 ]. Successive ‘Decades of Action for Road Safety’ have raised awareness, and new institutions have improved policy cohesion and civil society mobilization [ 3 ]. The global road safety community has also cohered around a consensus-based solution – the safe system approach – developed in Sweden and increasingly applied globally. The safe system approach is a human-centered, proactive approach that shifts the focus of road safety from preventing crashes and improving road user behavior to preventing deaths and injuries while accounting for human error [ 4 ]. Despite global momentum, there is limited implementation of the safe system approach in low- and middle-income countries (LMICs) [ 3 , 5 , 6 ]. Global road safety programs emphasize the adaptation of the safe systems model to LMICs [ 5 , 7 ], even though the implementation context in LMICs varies significantly [ 8 ].

The role of technical assistance

Technical assistance (TA) is one way to increase the uptake of the safe system approach and other evidence-informed interventions. TA is a capacity-building process to design and/or improve the quality, effectiveness, and efficiency of programs and policies [ 9 ]. Multi-country TA programs seek to translate the safe system approach to LMICs to reduce traffic-related injuries and mortality. The Bloomberg Philanthropies Initiative for Global Road Safety (BIGRS) is one of the largest and longest-standing multi-country road safety TA programs. This analysis concerns BIGRS Phase II, which provided a common package of TA interventions to ten LMIC city governments from 2014 to 2019. By the end of Phase II, cities differed considerably in the scale and scope of implementation.

BIGRS’ differential experiences across LMIC cities present an empirical case study on the feasibility of adapting common technical approaches across divergent contexts and on the role TA plays in doing so. How much influence does TA have? What is the role of context in shaping TA providers’ and recipients’ agency?

A diverse body of scholarship concerns these questions and can guide empirical inquiry. Diffusion of innovation theory describes the process of transferring an evidence-informed intervention from one setting to another [ 10 , 11 ] and has been used to explore TA effectiveness [ 9 ]. Diffusion of innovation theory focuses on intervention characteristics, intervention adaptation, and how adaptation influences adoption and fidelity [ 10 , 11 , 12 ]. Greenhalgh’s Determinants of Diffusion, Dissemination, and Implementation of Innovations in Health Service Delivery and Organization Conceptual Model builds on diffusion of innovation theory by mapping considerations that influence the uptake of innovations. These include credibility, personal relationships, effective communication, translation of the innovation to meet end-users’ needs, and support to adopters [ 12 ]. More broadly, social science theories consider the role of structural context (e.g., laws, social norms, and governance) and pragmatic implementation contexts (e.g., individuals, relationships, and organizational cultures) in determining adaptation and implementation [ 13 ]. These literatures bring different perspectives to explaining change through the interaction of interventions, actors, and context.

However, there is limited application of this literature to understanding TA, especially road safety TA. A growing body of case studies describes what works and does not work for improving road safety in LMICs [ 14 , 15 , 16 ]. The limited research available emphasizes political will, intervention tailoring, human and financial resources for dissemination [ 17 ], best-practice exchange [ 18 ], technology transfer [ 19 ], and the power of multi-sectoral coalitions [ 20 ] in translating road safety evidence into practice. However, despite the existence of several multi-million-dollar road safety TA and funding programs [ 21 , 22 , 23 , 24 , 25 ], we did not identify any evidence on the role of TA in supporting or inhibiting road safety improvements – a key evidence gap.

Study objective

This study aims to improve understanding of if, how, why, and under what conditions TA programs strengthen evidence-informed road safety programs in LMICs. We do this via a realist evaluation with a multiple case study design of BIGRS’ implementation, comparing how common TA interventions interacted with contextual factors to produce differential observable outcomes in Accra, Bogotá, and Mumbai. These findings are distilled into a program theory that provides insight into how ‘global’ approaches are translated to country-level implementation and can be used to guide TA’s design and implementation.

Methods

Realist evaluation connects theories of ‘how the world works’ with ‘how a program works’ to explain how interventions trigger mechanisms in different contexts [ 26 ]. We used a realist evaluation methodology [ 27 ] to identify how, why, and under what conditions TA can strengthen evidence-informed road safety, with a multiple case study design to improve understanding of how BIGRS worked in diverse contexts [ 26 , 28 , 29 ]. This methodology was selected to identify the underlying mechanisms driving the program’s differential outcomes in different contexts [ 27 ].

Realist evaluation

Programs are theories about how something works. They are embedded in open systems and interact adaptively with their context. Intervention outcomes result from engagement between program actors and contexts [ 27 ]. An intervention-context-mechanism-outcome (ICMO) pattern represents this [ 27 ] (Table 1).
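As a reading aid only, the sketch below records one ICMO configuration as a small data structure. The field values loosely paraphrase the Mumbai junction example reported later in the Results; they are illustrative assumptions, not the authors’ formal coding.

# Reading aid: an ICMO configuration expressed as a small record.
# The example values loosely paraphrase the Mumbai junction case described
# in the Results; they are illustrative, not the authors' formal coding.
from dataclasses import dataclass

@dataclass
class ICMO:
    intervention: str  # what the program did
    context: str       # the conditions the intervention landed in
    mechanism: str     # the response the intervention triggered
    outcome: str       # the observable result

example = ICMO(
    intervention="Annual city road safety reports and junction assessments",
    context="Crash data buried in paper-based police records of variable quality",
    mechanism="Granular, city-specific data shifts officials' focus to pedestrians",
    outcome="Junction re-designs targeted at high-risk locations",
)
print(example.mechanism)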

We adhered to the Realist And Meta-narrative Evidence Syntheses: Evolving Standards (RAMESES) II reporting guidelines for realist evaluation to guide design, data collection, and analysis [ 26 ], provided in Additional File 1. The study protocol is in Additional File 2.

Study setting

Bloomberg Philanthropies’ BIGRS Phase Two was implemented in Accra, Addis Ababa, Bandung, Bangkok, Bogotá, Fortaleza, Mumbai, Ho Chi Minh City, São Paulo, and Shanghai from 2014 to 2019 and is the focus of this study. BIGRS is currently in its third phase and has scaled up to 27 cities and two states across Latin America, sub-Saharan Africa, and Asia.

Cities applied for BIGRS-supported TA by submitting a proposal that demonstrated their commitment to and plans for road safety. This is important because it meant that cities demonstrated a common commitment and desired TA, at least in theory. Funding for interventions (e.g., re-designing an intersection or mass media campaigns) came from city governments.

BIGRS’ TA came with a technical agenda – aligned to the safe system approach – on how road safety should be improved. BIGRS’ scope was tailored to city needs within pre-existing parameters and excluded funding for capital construction. To provide TA, BIGRS seconded staff into leading road safety agencies to build institutional capacity for change. Embedded staff supported BIGRS interventions, provided direct TA to city counterparts, and often provided cross-cutting support to city officials. In addition, seven international partner organizations managed technical activities. Partners and embedded staff were aligned to technical areas and often worked with different counterparts (e.g., an enforcement partner working with the police, an infrastructure partner working with an engineering unit).

Case design and sampling

A multiple case study design was utilized; see Additional file 3 for details. Only cities continuing into BIGRS Phase Three were eligible for selection to ensure access to informants. We purposefully selected three cities – Accra, Ghana; Bogotá, Colombia; and Mumbai, India – with different baseline characteristics described in Table  2 .

Data collection

Key informant interviews (KIIs) were our primary data source, and documents were secondary. Program documents were used to build an initial program theory, develop the interview guides, and follow ‘hunches’ about how an intervention worked in a context [ 35 , 36 ]. We also snowballed documents from interviews to confirm and triangulate interview findings.

Key informant interviews

We used a theoretical sampling approach to select informants based on their ICMO potential [ 27 ]. We iteratively sampled informants until saturation – when interviews provided no new insights [ 37 ]. Table 3 describes the KII sample. Road safety governance models influenced the balance of KII types. Road safety governance in Accra and Mumbai is more diffuse than in Bogotá, which accordingly had fewer government informants. KIIs also varied across cities due to differential access to informants. To overcome this disadvantage, we triangulated findings with the document review.

Interviews were conducted from January 2020 to November 2022 by two members of the research team with doctoral-level training in qualitative methods. Participants were contacted via email and invited to a one-hour interview on barriers and enablers to BIGRS and mechanisms associated with program outcomes. Interviews in Mumbai were conducted via Zoom. Interviews in Bogotá and Accra were conducted in-person and on Zoom.

A realist approach to interviewing was used to build an iterative understanding of how the program worked, test our interpretations, and seek alternative explanations (Additional file 3 ). Interviews were recorded and transcribed with permission. Fifty-one interviews were conducted in English. Seventeen interviews were conducted in Spanish by a native speaker and translated into English by a certified translator.

Data collection and analysis were done iteratively using a process of retroduction [ 35 , 36 , 38 ] (Fig.  1 ).

Fig. 1: Iterative analysis process

As is common in this literature [ 39 ], BIGRS’ theory of change (TOC) was used as the initial program theory (IPT). Following guidance on realist evaluation analysis [ 38 ], we iteratively identified ICMOs and compared them to the IPT and broader literature to develop the program theory. This included an initial thematic coding of the data, a second round of theory refinement coding where themes were split into ICMOs, triangulation of findings from the documents and interviews, and comparison of the findings with existing theory to deepen our understanding of plausible mechanisms. Additional file 3 describes this process in more detail. We conducted this analysis in NVivo12.

Regular discussions were held across the research team to define and iterate on the codebook, discuss emergent themes, and review ICMO configurations. Memos were developed in Microsoft Word and documented ICMO iterations. Draft findings were shared with a subset of the participants for feedback and validation before program theory finalization.

Results

In 2014, BIGRS initiated a common TA program in Accra, Bogotá, and Mumbai. TA interventions, the individuals providing and receiving TA, the city context, and the national-level road safety context influenced implementation. Table 4 outlines interventions and outcomes. Interventions are grouped under two outcomes: (1) improved road safety capabilities (via capacity building and data) and (2) increased uptake of evidence-informed road safety interventions (via infrastructure, enforcement, and policy support).

We present one example per case that demonstrates how different interventions worked together to achieve different outcomes in case study cities. Interventions (i), mechanisms (m), contexts (c), and outcomes (o) are denoted in the text. Reference to KII data is provided as M# for Mumbai, B# for Bogotá, A# for Accra, and G# for KIs working across multiple cities. 

Transforming junctions on Mumbai’s congested streets

When BIGRS began, Mumbai’s road safety officials used high-level figures on traffic fatalities supplemented with national or state-level statistics to guide road safety decision making (c) . The city-specific data required to target road safety interventions was buried in paper-based police records of variable quality (c) . A government official in Mumbai describes:

" That is [a] very big [problem] because we are not like other countries, we are not getting the data correctly.” – M17

In response, BIGRS’ TA first sought to improve surveillance data. An embedded surveillance coordinator partnered with the police to catalog and analyze city surveillance data and package it in new annual city road safety reports (i) [ 40 ]. Infrastructure assessments (i) further demonstrated how specific road junctions contributed to injuries and mortality (o). BIGRS staff described the change in data availability:

“[Before] there were no reports at all […]. Now I have […] a 40-page report that talks about who the road users are, […] a list of high-risk junctions and corridors […], a map that details the hotspots where crashes are occurring […] which vehicle is causing maximum crashes, […] the time of the day, the month of the year, the day of the week.” – M8

New data demonstrated that half of traffic-related crash victims were pedestrians, which was further reported by local city media [ 41 , 42 ]. Providing granular, city-specific data shifted the focus (m) of government towards pedestrian safety (M8, M12, M17). The same BIGRS staff member described:

“The government didn't know that so many pedestrians were dying in crashes. These reports help bring that to light. And when that came to light, they started taking a more serious approach.” – M8

The emphasis on pedestrians was echoed by city government officials, who agreed that the data was illuminating (M12, M17). But the city also required solutions for this perception shift to lead to concrete action. The same government official describes the challenges (c) :

“We lack the best instrument in the old system to make the road elevated, or a road underpass is very difficult because the traffic on that high main road. […] That is very critical because it we are really facing problems.” – M17

BIGRS provided the safe system approach as a solution – but how would it work in Mumbai? This question was a central concern in all cities, especially in densely populated Mumbai, where participants described the street as a ‘contested space’ (M9, M2, M21) (c) . Implementing the safe system approach in Mumbai required a complex adaptation process (G5, M2, M9, G6, M7) (i) . A BIGRS infrastructure partner described:

“We’re constantly trying to balance Global Best Practices versus what can be done in an Indian city while pushing boundaries to be able to think outside the box. […] it's helpful to show International Best Practices, but also at the same time, balancing it out with what’s actually possible in Indian cities.” – M9

Implement-ability was top of mind for city government officials taking risks by trying a new approach (c) . BIGRS staff had to recognize those risks and work collaboratively with city government counterparts to understand how international approaches could work within local realities (B15, M13) (c) . A BIGRS staff member described this:

“[When you introduce international examples], there are a lot of questions and pushback saying, ‘how could this be done [here]? That was also instructive to us. How do you deal with such situations?” – M13

Short-term demonstration projects – for example, temporarily changing traffic flow using cones and other local, low-cost materials (i) – allowed city government counterparts to see the safe system approach in action on their street in a low-risk context, demonstrating that a new approach was possible (B15, M11, G7, M9, B7, M17). A city official describes:

“People are generally not aware of the things [happening outside India], [but] the problems are same. […] That can be taken only if you can show them the models […] because firsthand information from those people is much more important.” – M17

Seeing the possibility of change was perceived to shift the focus of road safety towards vulnerable road users and especially pedestrians (M18B, B15, M11, G7, M9, M15, M18, M17, M2, G2, M9, M7, M14, M8) (m) . It also created a ‘how-to’ moment, enabling city government counterparts to internalize both the concept and implementation feasibility (m). A city official describes what he learned:

“[I learned] new technical things, that might be there's been a certain technical change in junction design or in the road design. […] we were not able to do that thing nicely already […] We were able to grab that opportunity properly.” – M18

Once city government counterparts understood the potential of the safe system approach (o) , BIGRS TA worked with city officials to use the data and select specific junctions for re-design (i) . Data was critical because it helped target infrastructure improvements to junctions with an outsized number of crashes. A BIGRS staff member described:

“Now [Mumbai city government are] not just randomly doing the interventions. They're very focused on where crashes are occurring, who the victims are, who the perpetrators are, and how to ensure that these crashes don't occur at all.” – M8

Once the mechanism for change was triggered, transforming junctions started with a pilot (i). Pilots ensured that the safe systems approach was feasible and appropriately adapted to the context, that its impact on different road users was understood and planned for, and that the re-design successfully reduced crashes [ 43 , 44 , 45 ]. Pilots also allowed the city government to understand public sentiment about the changes (c) [ 43 ]. If the public was supportive, this reduced the risk to city officials trying a new approach (m) . City government counterparts’ confidence in new approaches grew (m). This was further reinforced by data showing that infrastructure redesign positively impacted traffic flow [ 43 ]. A pilot’s success was described as leading to exponential growth in implementation (M21, M1) [ 44 , 46 , 47 ]. A BIGRS partner described:

“If you see our work, it has exponentially grown in impact. […] From that one [pilot] corridor, we […] build a relationship and trust […] and so we got a chance to do design in the intersection. Then you try it with temporary sort of barricades and then it became a big thing. And then one thing just kept leading to another to another.” – M21

As implementation took off, BIGRS engaged local media to spread awareness about the junction transformations (i) (M11). After seeing firsthand what could be accomplished, the city government also committed to improving high-risk intersections in the city. However, despite this government commitment and growing momentum and support from both city government and city engineers to transform individual junctions, the bottom-up pilot approach presented practical scale and sustainability challenges (M9, M13, documents) (c). A BIGRS partner described:

“It's a challenge at times, when the city does not have the funds allocated in that year. If you do manage a successful pilot and the city takes on doing it, then it's great because they can be scaled up. But in many cases […] pilots are sort of left as just that.” – M9

BIGRS partners described city government approvals as challenges preventing scale. In contrast, city government participants urged respect for government processes and timelines, which they saw as paramount to success (c). In managing these processes, city officials also took on significant work to enable each infrastructure re-design (M12, M18) – a contribution that often went unacknowledged (c).

Comparison of Mumbai’s infrastructure experience with other BIGRS cities

The Mumbai infrastructure example is illustrative of common dynamics. In Bogotá, capacity building was similarly perceived as successful when it used hands-on components specifically relevant to the participants (B15, G2, B9, B7, B14), and when facilitators used a coaching model that emphasized the participant’s experience (B15, B5, B9, G2).

Accra’s and Bogotá’s infrastructure TA was also targeted at bottom-up approaches (G6, A9, A5, A6, A8, M9) and guided by city-specific data (i), but with limited scale. In Accra, BIGRS focused on low- or no-cost interventions (e.g., changing signal times for pedestrian crosswalks, widening pedestrian medians (i) ) (A5, A6, A8) because the city did not control the infrastructure budget and could not budget for new interventions. BIGRS also worked with the city to re-design the infamously dangerous Lapaz intersection to improve pedestrian safety, which was funded directly by BIGRS via a small grants program (i) [ 48 ]. In Bogotá, tactical urbanism demonstrated speed-calming measures, and feedback from road users was gathered (i) (B2). However, despite promising pilots, BIGRS’ limited ability to influence upstream changes to road procurement tenders and design guidelines constrained the scale of infrastructure outcomes in each city.

Enforcing road safety legislation in Accra

In Accra, road safety legislation existed but needed to be enforced (c). In the words of a national road safety agency staff member, “there’s no real commitment in solving some of these things” (A17). BIGRS’ enforcement interventions started with relationship building (i). A BIGRS partner describes:

“How important it is to have this relationship with the high-level police officers. Because we cannot just go to a city or road police agency and say that this is what we want to do.” – A22

Trainings on the safe systems approach (i) and evidence-based enforcement operations were enabled by leadership support from the Superintendent of Police (A4) and the Mayor of Accra, who championed road safety and several BIGRS initiatives (A4, A6, A3, A5, A8, A1, program documents) [ 48 , 49 ] (c) .

However, translating training into implementation quickly stalled because the police force required equipment and certification for implementing enforcement operations (c) (A1, A4, A6, program documents). BIGRS’ partners then donated new drink-driving and speed enforcement equipment under the condition that the city utilized the equipment to conduct enforcement pilots (i) . These donations were accompanied by training and certification processes (A1, A4, A25, program documents) (c) .

While the lack of equipment could be directly addressed by BIGRS, the disconnect between city-level enforcement efforts and Ghana’s centralized policing structure could not be so easily overcome. City police did not have the authority to conduct enforcement operations (c) , so in exchange for the donated equipment, the police formed a dedicated tri-partite pilot task force with the authority to use the donated equipment in a series of roadside speed and drink-driving enforcement operations  (i) .

New training and the improved accuracy of the equipment were perceived to reduce conflict between police and citizens during enforcement and to improve transparency in the enforcement operations (A1, A4, A25, program documents), reducing the perception of risk of public blowback (m). A high-ranking police officer describes the perceived increase in acceptability from the public:

“they don’t complain, they go to the court […] because you’ve told us that the device are […] the very latest speed device, speed detection devices [equipment] because we’ve told the whole world about it.” – A25

The collective intervention – piloting the enforcement approach, supported by training and in tandem with appropriate equipment – was also received positively by the police. A senior police officer described a shift in focus towards ensuring road safety (m):

“What I’ve realised is, what a positive impact on our capability to be able to ensure road safety. [..]. With the devices, we can go to the route when they see us, all cars, cars you know approaching the robot, reduce their speed and that has really resulted in a lot of improvement.” – A25

However, while the pilot taskforce did conduct enforcement operations, a series of upstream barriers prevented the taskforce from scaling up. Most practically, the police force still lacked dedicated vehicles for enforcement operations (A4, A25), limiting further implementation. More broadly, the social and political context (A1, A4) (c) remained unconducive to enforcement. A BIGRS staff member describes:

“During [the enforcement pilot], we did a special round of data collection for speed, and the data showed that there was a reduction in speed. However, the moment could not be sustained. Some of the feedback they got from the police was that [the] police could not boldly or fearlessly enforce.” – A1

Another challenge was that the required authority to change enforcement practices was vested in national agencies instead of the city government, limiting the ability of city police to institutionalize new enforcement operations (A4, program documents) (c). Finally, the transfer of police was described as a challenge to sustaining enforcement interventions (A4, A2, program documents) (c). A BIGRS partner described:

“We can work with person, everything agreed, and then just before we roll out, he's been transferred or there's a rotation, and we have to change everything.” — A22

Comparison of Accra’s enforcement experience with other BIGRS cities

Across the enforcement TA provided in the three cities, building trust with senior police officers was repeatedly emphasized (A4, B4, M16, G7, G10). Using former senior police officers from other countries was seen as key to building that necessary trust (B4, M16, G7, G10) (c).

Like Bogotá, Accra’s enforcement interventions took place within broader city road safety prioritization (c), and BIGRS donations ensured police had the right equipment (i) , leading to increased enforcement (B8, G12, A4, A1, A22, A25) (o) . However, Bogotá’s enforcement was described as widespread and sustained (B8, G12), while in Accra, enforcement operations remained limited (o). The authority of the police to conduct enforcement was the key difference (c).

In Mumbai, by contrast, India was moving towards an automated speed enforcement model, which differed from the roadside model proposed by BIGRS (c) . Although automated and roadside enforcement can co-exist (and did in Bogotá), BIGRS’ roadside enforcement model did not align with the broader policy agenda in Mumbai and was not implemented.

Reducing city-wide speed limits in Bogotá – an example of policy change

Before BIGRS, improved mobility had been the focus of several consecutive city administrations (c) . A BIGRS partner described the favorable baseline environment:

“Bogotá has been concerned about road safety for a long time. [Bogotá] already had a Road Safety Directorate; […] there was already a direction with a super great team. It was easy to work in Bogotá because institutionally, they were already armed.” – B13

During Phase II, a new Secretary of Mobility with a public health background further elevated road safety in the city administration (c), which was perceived as critical to the city's subsequent policy change (B1, B11, B12, B13, B14, B15, B16). A city official explained:

“ It is about setting priorities. So, we [the secretariat], from the first day, said the priority is road safety, and we will do everything possible to make it so.” – Bogotá 2

Alongside a change in government, BIGRS also hired new embedded staff, some of whom were former members of city government, all of whom were local to Bogotá, and all of whom were passionately committed to improving road safety (c) (B12). However, support for road safety did not immediately translate into action on speed. Instead, city officials were interested in reducing drunk driving and were explicitly resistant to tackling speed (c) . This was due both to concerns that reducing speed would increase traffic and to a perceived lack of concern from the population over speed (G12, G5, B8, B12). A BIGRS staff member recalled:

“Even when communicating to [the] Mayor, he had the issue of road safety in his heart the main thing he communicated and did not want to do. ‘Do not slow down on arterial roads’” – B12

However, BIGRS’ analyses of city data (i) identified that speed was a serious concern on arterial roads at night (B8, B12, G5, G8). A BIGRS partner in Bogotá described:

“The first thing I did was share with the Police the data that clearly showed that most of the deaths occurred at night or early in the morning when most roads were empty.” – B8

This was further demonstrated by a modeling study (i) showing both the relationship between speed and the crash rate and that the change in speed limits would not impact average travel times. This study provided important evidence and was only possible because the city’s existing speed detection infrastructure supplied the modeling data (B12) (c).

The presentation of this novel information was perceived to shift the focus of city officials by demonstrating that speeding was prevalent at night when roads were empty, and that reducing speed would not worsen traffic (m) . City officials used these data to select five arterial road corridors with high speeds, crashes, and deaths to pilot a reduced speed limit of 50 km per hour (km/h) (o and i) .

The speed reduction pilot required close collaboration between the Secretariat of Mobility and the police to conduct nighttime enforcement (c). However, the police lacked the necessary nighttime radar equipment (c) (B8, G12), a gap subsequently filled by BIGRS’ donations (i) . TA was provided for the police to use the equipment and to conduct safe nighttime operations (i), increasing enforcement campaigns in the pilot speed management corridors (o). A BIGRS staff member described:

“ [It] was clear when you make enforcement operations visible, like speed enforcement down that avenue. In a matter of months, we already saw a reduction [in speed].” – G12

The new roadside enforcement was complemented by automated speed detection cameras (c) ; however, the public was skeptical of the speed cameras, threatening the pilot’s success (c). Public messaging campaigns were therefore developed using city data to demonstrate the rationale behind the speed reduction and enforcement (i) . A BIGRS staff member described:

“Legitimacy has to do with road users' acceptance of this type of control. […] What decisions were made? Make visible the places where photodetection cameras are installed. They were published on the website of the Ministry of Mobility, and there was a strong media drive to make these cameras visible and associate the cameras with the issue of life-saving cameras.” –B8

BIGRS also provided monitoring and evaluation support (i), which quickly demonstrated the pilot’s effectiveness (o). A BIGRS staff member described:

“In a matter of months, I already saw a reduction [in deaths]. That gave the Secretary of Mobility the confidence, trust like, ‘OK, like this is working, we are reducing deaths where we are not messing up traffic. Let's do it.’” – G12

A city official recalled the importance of the pilots:

“Yes, yes, yes, that was very well done. The expressive power of those corridors, of the first ones” – B14

Because of the positive pilot results, the city increased the number of corridors with lowered speed limits (o). The results of the pilot were also shared with the public, reinforcing the message that the speed reduction corridors were lifesaving interventions (G12, G5) (i) and further reducing the perception of risk in lowering the speed limit by building public support (m). As the pilot gained support, city counterparts used the data to develop a technical document justifying the lowered speed limits to Bogotá’s city council. A BIGRS staff member described:

“To be able to argue before the City Council, it was necessary to argue with objective judgment elements […] Why did they decide to slow down? Not because it occurred to us. No, the speed was lowered because this technical document allows us to support making that decision.” – B8

Aided by the pilot’s success and with the support built through public messaging campaigns, the city council maintained the 50 km/h speed limit on the pilot corridors (o). However, the city council initially did not have the authority to change city-wide speed limits permanently (c), preventing scale-up until a window of opportunity opened in 2020. During the 2020 COVID-19 pandemic, a state of emergency was declared, giving temporary executive authority to the Mayor (c) . Although the Secretary of Mobility (the champion of the pilot) had changed, their successor became a new champion. They successfully argued that the speed limit reduction was preventing traffic crashes, thereby reducing non-COVID-19 health emergencies and freeing up healthcare capacity during the pandemic. This allowed the Mayor to extend the speed reductions city-wide in alignment with the WHO’s advised 50 km/h (o).

Reflecting on Bogotá’s experience with BIGRS, a city official described how BIGRS’ comprehensive TA approach was important in supporting the city’s road safety vision:

“We wanted to build how this systemic vision of approaching the problem. And then Bloomberg supported us with communications, technical, infrastructure, traffic calming, and enforcement issues.” – B2

City officials and BIGRS staff alike credited city leadership for continuously supporting road safety throughout several administrations and for giving political support to technical staff who brought changes to the city (c). One government official commented:

“ Everyone, I think, without exception, has supported this work. I believe that the first requirement to choose a city is that there is willingness. What has been in Bogotá, really, is the political will of the leaders to carry it out. Without it, you do nothing .” – B14

Comparison of Bogotá’s policy experience with other BIGRS cities

The scale of change in Bogotá’s road safety programming stands apart from the other case studies. Second to this was Accra, where the city government formed a new road safety council and developed the city's first Pedestrian Action Plan (o). Like Bogotá, BIGRS in Accra leveraged city prioritization of road safety and provided city-specific evidence (i), which focused city stakeholders’ efforts on the importance of pedestrian safety (A8, A1, A5, A3, program documents). Also like Bogotá, the Mayor was a champion who lent convening power to the development of Accra’s action plan (A4, A6, A3, A5, A8, A1, program documents) (c) . The Accra Pedestrian Action Plan was further perceived to improve coordination of different road stakeholders towards a common goal (A8, A1, A5, A3, A6, program documents).

In Mumbai, in contrast, BIGRS staff and partners described a lack of an individual champion with the authority to advance road safety policy and planning at the city level as a key challenge (M10, M11, M12, M13, M14, M15, M21).

Revised program theory

The revised program theory for BIGRS should be considered an initial attempt to synthesize across both positive cases (where outcomes were observed) and negative cases (where outcomes were limited by specific factors) to distill a set of higher-level statements about how BIGRS works at the city level and the contexts that enable or constrain its success.

The first program theory is improved road safety capabilities, focused on capacity and data use interventions described by BIGRS staff and partners as precursors to implementation in each case study city.

Program theory for improved road safety capabilities:

Providing TA to increase capacity and data use (i), if delivered via trusted and credible TA providers who provide hands-on coaching support tailored to city needs and with counterparts interested in engaging with road safety, can strengthen road safety capabilities (o) because it shifts the focus of city officials towards evidence-informed approaches and creates a how-to moment to improve road safety through the safe system approach (m). This outcome is enabled by city prioritization of road safety (c) and can be disrupted if city government officials change (c) .

The second program theory is increasing the uptake of evidence-informed implementation of road safety interventions. In this theory, capacity building and data now comprise the necessary context that supports the interventions, and BIGRS and city officials are characterized as working together to implement them.

Program theory for increasing the uptake of evidence-informed implementation of road safety interventions:

If trusted and credible TA providers, working with and through city champions (c), undertake a successful pilot (i), guided by city-specific data that targets interventions (c), and with facilitation of city implementation via dedicated equipment, training and other supportive resources (i), then this can increase the uptake of evidence-informed road safety interventions (o). This occurs because a pilot builds confidence that the safe systems approach is feasible in a specific road context (m), and it reduces the perception of risk in adopting a new approach (m) by allowing city officials to gauge public sentiment. The scale and sustainability of the outcome(s) are determined by the city's existing prioritization of road safety, the authority of the individuals and road safety agencies targeted in the intervention, and existing socio-cultural norms (c). It can be disrupted if city government officials change (c).
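
To make the notation above easier to work with across case studies, the following Python sketch shows one hypothetical way to encode an intervention–context–mechanism–outcome (ICMO) statement as a simple data structure; the class, field names, and example entry are illustrative assumptions of ours, not part of the BIGRS analysis or dataset.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ICMOConfiguration:
    """One intervention-context-mechanism-outcome statement from a program theory."""
    intervention: str                                     # (i) what the program does
    contexts: List[str] = field(default_factory=list)     # (c) enabling or disrupting conditions
    mechanisms: List[str] = field(default_factory=list)   # (m) reasoning or responses triggered
    outcome: str = ""                                      # (o) intended or observed result

# Illustrative encoding of the second program theory, paraphrased from the text above
uptake_theory = ICMOConfiguration(
    intervention="Trusted TA providers run a successful pilot, guided by city-specific data",
    contexts=[
        "city champions with authority",
        "existing city prioritization of road safety",
        "risk of disruption if city government officials change",
    ],
    mechanisms=[
        "pilot builds confidence that the safe system approach is feasible locally",
        "pilot reduces the perceived risk of adopting a new approach",
    ],
    outcome="increased uptake of evidence-informed road safety interventions",
)

# Records like this could then be compared across case-study cities
print(uptake_theory.outcome)
```

Encoding theory statements this way is only a convenience for cross-case comparison; the substantive claims remain those stated in prose above.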

BIGRS' interventions sought to accelerate cities' adoption of the safe system approach. Two questions united city officials: will it work here, and how? To answer them, TA needed to go beyond recommending that a safe system approach would work, to demonstrating how it could work and proving that it worked (without provoking negative reactions from the public).

How did TA work?

TA provider credibility and ability to navigate the city context were important. This was demonstrated by embedded staff who continuously connected the evidence-base and resources of international partners with the tacit knowledge and goals of city agencies. By playing a dual ‘insider-outsider’ role, embedded staff worked to create a favorable context for interventions and made interventions a better fit for the context. This describes the role of boundary-spanners who bridge insider and outsider roles to facilitate the adoption of an intervention [ 12 ].

How TA was provided was also essential. TA providers needed to understand the context and work effectively within it, not against it. Capacity-building activities needed to follow a coaching model, amplifying the existing knowledge, needs, and priorities of decision-makers. Interventions needed to be immediately relevant to the context, or TA providers risked losing credibility. BIGRS embedded staff and partners based full-time in the city again had the advantage here. This finding aligns with calls for TA to be context-sensitive [ 50 , 51 ] and aligns with the characteristics of successful change agents [ 12 ].

Why did TA work (or not work)?

The mechanism ‘shifting the focus’ was about data. Aligning with diffusion of innovation theory, data framed a ‘felt need’ for change [ 52 ] in all cities to different degrees. Bogotá was an early adopter; new data was presented within the context of political commitment to road safety, and pre-existing automated enforcement infrastructure enabled BIGRS to develop data-driven machine learning models to predict the results of the speed enforcement pilot. In Mumbai, in comparison, most of BIGRS' Phase II activities focused on building city data capabilities to catalyze this shift in focus. ‘Shifting the focus’ was further enhanced by city officials' ability to establish fora for governing the use of data to support policy decisions, consistent with international norms [ 53 ].

But ‘shifting the focus’ was also directly facilitated by BIGRS, making it the most uncertain mechanism. An alternative conceptualization is that BIGRS ‘shifted the focus’ by dedicating resources to specific interventions, informed by its data, which the city endorsed.

The second mechanism, creating a ‘how-to moment’, comes from the knowledge phase of diffusion of innovation theory [ 52 ]. Adopters must understand how an innovation works, especially if the innovation is complex [ 52 ]. Pilots allowed officials to see change in action, built confidence, and reduced the risk of stakeholder discontent from changing the road environment [ 12 , 52 ]. BIGRS also had an advantage: infrastructure re-design and enforcement are trial-able approaches with quickly observable outcomes, which supports innovation adoption [ 10 , 12 ].

Under what conditions did TA work (or not work)?

Moving from the first program theory outcome ('strengthened road safety capabilities') to the second ('increasing evidence-informed interventions') required more than triggering individual-level mechanisms. To change implementation, individual-level mechanisms had to translate into institutional actions by city officials, e.g., approving pilots, allocating resources, and implementing interventions. It was here that context was critical.

City champions were key to enabling change. Champions are important in diffusion of innovation theory [ 12 ] and were critical here. However, following structure-agent theory, city champions could only change areas within their control [ 54 ] and their agency varied. Comparing Bogotá and Accra is instructive. Bogotá had considerable latitude to change road safety practices, while Accra’s pilot task force failed to scale due to limited institutional and normative authority to enforce legislation. Officials in road safety agencies lamented this alongside BIGRS staff, suggesting that the interventions were compatible with the context [ 52 ] but that the city's agency was constrained.

Structural, or outer, contexts therefore determined the feasibility of converting individual and city level mechanisms into outcomes. Diffusion of innovation theory considers that an innovation may not be ‘compatible’ with the context or that the system may not be ‘ready’ for change, which was important in these cases. More important, however, was how the innovation was introduced, who introduced it, the city's priorities, and the city's authority to adopt the innovation. This points to a critical consideration: if the way an innovation is introduced is incompatible with the structural context, adoption will be slow or unsuccessful (even if the innovation itself fits the context).

Boundary spanning – crossing boundaries to negotiate interactions and translate knowledge from different settings [ 55 ] – is one way to bridge the gap between proposed solutions and local contexts. A 2017 multi-country nutrition project found that boundary spanning was feasible and useful for navigating context-specific challenges [ 56 ]. Our study suggests that boundary spanning – if those doing the boundary spanning are deeply embedded within the local context – could be a useful model for delivering TA. Engaging boundary spanners from the beginning to work with city government officials to design TA programs around local problems and priorities, rather than providing both with a model from elsewhere to adapt, is a practical way to design more context-sensitive TA and surface local innovations [ 13 ].

Strengths and limitations

The goal of this study was to learn from implementation experience and develop a program theory. We did not quantitatively measure outcomes, which is a limitation. To improve trustworthiness, we triangulated findings across cities and data sources. However, outcomes were mainly validated with informants due to a lack of access to documents across BIGRS partners, creating some uncertainty. Another limitation was the overrepresentation of BIGRS staff and partners in our sample as compared to government officials and other city stakeholders. The reasons for this were both practical (e.g., scheduling interviews over Zoom and governance differences across cities) and reflective of broader findings: turnover among government officials limited the pool of available informants. Finally, several authors (but not the first author) were involved in BIGRS' implementation, which required continual bracketing when analyzing the data.

Our multiple case study design was a strength, enabling ICMO comparison across cities, reducing uncertainty, and increasing confidence. Iterative data collection and validation of the program theory with participants further reduced uncertainty because we could discuss unresolved questions with participants and dig deeper. We also verified our interpretations with documents.

We identified broadly applicable insights into the role of TA in strengthening evidence-informed road safety in LMICs and distilled these into a program theory, contributing to knowledge on multisectoral TA programs in global health. Our study is the first we know of to empirically analyze the role of TA in influencing road safety in LMICs. BIGRS' program theory emphasizes the interaction of trust, credibility, champions and their authority, governance structures, political prioritization, and the implementability of evidence in creating the conditions for road safety change. Designing context-specific TA appropriate for structural contexts is critical. If decision makers prioritize road safety, TA can accompany local leaders in adapting international approaches to local realities. In this way, we see cross-country multisectoral projects as important opportunities to improve population health.

Availability of data and materials

Data generated and analyzed during this study are included in this article. Key informants were assured that the raw transcripts would not be shared.

Institute for Health Metrics and Evaluation. GBD Compare [Internet]. University of Washington. 2019 [cited 2023 Feb 10]. Available from: https://vizhub.healthdata.org/gbd-compare/#

World Health Organization. Global status report on road safety 2018 [Internet]. Geneva: World Health Organization; 2018. Available from: http://apps.who.int/bookorders

Hyder AA, Hoe C, Hijar M, Peden M. The political and social contexts of global road safety: challenges for the next decade. Lancet. 2022;400:127–36.

Demystifying the safe system approach [Internet]. Vision Zero Network. 2023 [cited 2023 Feb 11]. Available from: https://visionzeronetwork.org/resources/demystifying-the-safe-system-approach/

Haghani M, Behnood A, Dixit V, Oviedo-Trespalacios O. Road safety research in the context of low- and middle-income countries: macro-scale literature analyses, trends, knowledge gaps and challenges. Saf Sci. 2022;146:105513.

Shuey R, Mooren L, King M. Road safety lessons to learn from low and middle-income countries. Journal of Road Safety. 2020;31:69–78.

Peden MM, Puvanachandra P. Looking back on 10 years of global road safety. Int Health. 2019;11:327–30.

Soames Job RF, Wambulwa WM. Features of low-income and middle-income countries making Road safety more challenging. Journal of Road Safety. 2020;31:79–84.

West GR, Clapp SP, Averill EMD, Cates W Jr. Defining and assessing evidence for the effectiveness of technical assistance in furthering global health. Glob Public Health. 2012;7:915–30.

Dearing JW. Applying diffusion of innovation theory to intervention development. Res Soc Work Pract. 2009;19:503–18.

Dearing JW, Cox JG. Diffusion of innovations theory, principles, and practice. Health Aff. 2018;37:183–90.

Greenhalgh T, Robert G, Macfarlane F, Bate P, Kyriakidou O. Diffusion of innovations in service organizations: systematic review and recommendations. Milbank Q. 2004;82:581–629.

Olivier de Sardan J-P, Diarra A, Moha M. Travelling models and the challenge of pragmatic contexts and practical norms: the case of maternal health. Health Res Policy Syst. 2017;15:60.

UN Road Safety Fund. Open day knowledge kit. New York; 2023.

International Transport Forum. The safe system approach in action. Paris; 2022.

Turner B, Job S, Mitra S. Guide for road safety interventions: evidence of what works and what does not work. Washington; 2021.

Sleet D, Baldwin G. Lost in translation: translating injury research into effective interventions. J Australas Coll Road Saf. 2010;

LaJeunesse S, Heiny S, Evenson KR, Fiedler LM, Cooper JF. Diffusing innovative road safety practice: a social network approach to identifying opinion leading U.S. cities. Traffic Inj Prev. 2018;19:832–7.

Knapp K, Walker D, Wilson E. Challenges and strategies for local Road safety training and technology transfer. Transportation Research Record: Journal of the Transportation Research Board. 2003;1819:187–90.

Koon AD, Lopez-Hernandez A, Hoe C, Vecino-Ortiz AI, Cunto FJC, de Castro-Neto MM, et al. Multisectoral action coalitions for road safety in Brazil: an organizational social network analysis in São Paulo and Fortaleza. Traffic Inj Prev. 2022;23:67–72.

UN Road Safety Fund [Internet]. United Nations Road Safety Fund. [cited 2024 Mar 1]. Available from: https://roadsafetyfund.un.org/

Global Road Safety Facility [Internet]. World Bank Group. 2023 [cited 2024 Mar 1]. Available from: https://www.roadsafetyfacility.org/

EU international cooperation in road safety [Internet]. European Commission. [cited 2024 Mar 1]. Available from: https://road-safety.transport.ec.europa.eu/what-we-do/eu-international-cooperation-road-safety_en

United Nations Institute for Training and Research. Road Safety Initiative [Internet]. [cited 2024 Mar 1]. Available from: https://unitar.org/sustainable-development-goals/people/our-portfolio/road-safety-initiative

Initiative for Global Road Safety [Internet]. Bloomberg Philanthropies. 2022 [cited 2022 Dec 11]. Available from: https://www.bloomberg.org/public-health/improving-road-safety/initiative-for-global-road-safety/

Wong G, Westhorp G, Manzano A, Greenhalgh J, Jagosh J, Greenhalgh T. RAMESES II reporting standards for realist evaluations. BMC Med. 2016;14:96. 

Pawson R, Tilley N. Realistic evaluation. Newbury Park: Sage Publications, Inc; 1997.

Yin RK. The case study method as a tool for doing evaluation. Curr Sociol. 1992;40:121–37. 

Yin RK. Case study research - design and methods. 4th ed. Thousand Oaks: SAGE Publications; 2009.

Mukumbang FC, Marchal B, Van Belle S, van Wyk B. Unearthing how, why, for whom and under what health system conditions the antiretroviral treatment adherence club intervention in South Africa works: a realist theory refining approach. BMC Health Serv Res. 2018;18:343.

Greenhalgh T, Pawson R, Wong G, Westhorp G, Greenhalgh J, Manzano A, et al. What realists mean by context; or, why nothing works everywhere or for everyone. 2017.

World Bank Country and Lending Groups [Internet]. World Bank. [cited 2021 Jan 31]. Available from: https://datahelpdesk.worldbank.org/knowledgebase/articles/906519-world-bank-country-and-lending-groups

Population density by city, 2014 [Internet]. Our World in Data. [cited 2022 Dec 26]. Available from: https://ourworldindata.org/grapher/population-density-by-city

Motor vehicles per 1000 inhabitants vs GDP per capita, 2014 [Internet]. Our World in Data. [cited 2022 Dec 26]. Available from: https://ourworldindata.org/grapher/road-vehicles-per-1000-inhabitants-vs-gdp-per-capita?tab=table

Greenhalgh T, Pawson R, Wong G, Westhorp G, Greenhalgh J, Manzano A, et al. Retroduction in realist evaluation. Oxford; 2017.

Manzano A. The craft of interviewing in realist evaluation. Evaluation. 2016;22:342–60.

Saunders B, Sim J, Kingstone T, Baker S, Waterfield J, Bartlam B, et al. Saturation in qualitative research: exploring its conceptualization and operationalization. Qual Quant. 2018;52:1893–907.

Gilmore B, McAuliffe E, Power J, Vallières F. Data analysis and synthesis within a realist evaluation: toward more transparent methodological approaches. Int J Qual Methods. 2019;18.

Mirzoev T, Etiaba E, Ebenso B, Uzochukwu B, Ensor T, Onwujekwe O, et al. Tracing theories in realist evaluations of large-scale health programmes in low- and middle-income countries: experience from Nigeria. Health Policy Plan. 2020;35:1244–53.

Mumbai Police Traffic Control Branch, Bloomberg Philanthropies Initiative for Global Road Safety. Mumbai road safety report 2018: key findings [Internet]. Mumbai: Mumbai Police Traffic Control Branch; 2018 Apr. Available from: https://archive.org/details/mumbairoadsafetyreport2018keyfindings

Mumbai Live Team. Report shows half of road accident casualties in Mumbai in 2018 were pedestrians [Internet]. Mumbai Live. [cited 2023 Feb 15]. Available from: https://www.mumbailive.com/en/transport/mumbai-road-safety-annual-report-2018-shows-half-of-road-accident-casualties-in-mumbai-in-2018-were-pedestrians-40204

Press Trust of India. Mumbai road safety report suggests 22% decline in accident deaths [internet]. India News 2019 [cited 2023 Feb 15]. Available from: https://www.republicworld.com/india-news/general-news/mumbai-road-safety-report-suggests-22-percent-decline-in-accident-deaths.html

Bhatt A, Mascarenhas B, Ashar D. Redesigning One of Mumbai’s Most Dangerous Intersections in 3 Simple Steps [Internet]. TheCityFix. 2019 [cited 2023 Feb 15]. Available from: https://thecityfix.com/blog/redesigning-one-mumbais-dangerous-intersections-3-simple-steps-amit-bhatt-binoy-mascarenhas-dhawal-ashar/

Natu N. LBS Road, 13 other Mumbai junctions set for pedestrian-friendly redesign [internet]. Times of India 2018 [cited 2023 Feb 15]. Available from: https://timesofindia.indiatimes.com/city/mumbai/lbs-road-13-other-mumbai-junctions-set-for-pedestrian-friendly-redesign/articleshow/64673212.cms

Natu N. Mumbai: Times Square experiment at CSMT junction begins [internet]. Times of India. 2019 [cited 2023 Feb 15]. Available from: https://timesofindia.indiatimes.com/city/mumbai/mumbai-times-square-experiment-at-csmt-junction-begins/articleshow/71715425.cms

Minhas G. Mumbai civic body invites urban designers to improve five streets - [Internet]. Governance Now. 2019 [cited 2023 Feb 15]. Available from: https://www.governancenow.com/news/regular-story/mumbai-civic-body-invites-urban-designers-to-improve-five-streets

Singh D. Thirteen-km-stretch of Lal bahadur Shastri Road to be widened, redesigned [internet]. The Indian Express 2018 [cited 2023 Feb 15]. Available from: https://indianexpress.com/article/cities/mumbai/thirteen-km-stretch-of-lal-bahadur-shastri-road-to-be-widened-redesigned-5241735/

AMA-BIGRS to begin road safety enhancement works at Lapaz [Internet]. Accra Metropolitan Assembly. 2018 [cited 2023 Mar 10]. Available from: https://ama.gov.gh/news-details.php?n=OTkzczkwMnFvMzc4MjI2OTQzNDIxNjJvNW4yM28xbnNxNjE5cDZvbw==

Agbenorsi J, Kwasin J. Police to check speeding on Accra roads [internet]. Graphic Online. 2019 [cited 2023 Feb 16]. Available from: https://www.graphic.com.gh/news/general-news/police-to-check-speeding-on-accra-roads.html

Kanagat N, Chauffour J, Ilunga JF, Yuma Ramazani S, Ovuoraye Ajiwohwodoma JJP, Ibrahim Anas-Kolo S, et al. Country perspectives on improving technical assistance in the health sector. Gates Open Res. 2021;5:141.

Scott VC, Jillani Z, Malpert A, Kolodny-Goetz J, Wandersman A. A scoping review of the evaluation and effectiveness of technical assistance. Implement Sci Commun. 2022;3:70.

Sahin I. Detailed review of Rogers’ diffusion of innovations theory and educational technology-related studies based on Rogers’ theory. Turk Online J Educ Technol. 2006;5:1303–6521.

Hawkins B, Parkhurst J. The ‘good governance’ of evidence in health policy. Evidence & Policy. 2016;12:575–92.

Sewell WH Jr. A theory of structure: duality, agency, and transformation. Am J Sociol. 1992;98:1–29.

Long JC, Cunningham FC, Braithwaite J. Bridges, brokers and boundary spanners in collaborative networks: a systematic review. BMC Health Serv Res. 2013;13:158.

Pelletier D, Gervais S, Hafeez-ur-Rehman H, Sanou D, Tumwine J. Boundary-spanning actors in complex adaptive governance systems: the case of multisectoral nutrition. Int J Health Plann Manag. 2018;33:e293–319.

Acknowledgements

The authors wish to acknowledge Dr. Jeremy Shiffman, Dr. Svea Closser, and Dr. Nukhba Zia of the Johns Hopkins University Bloomberg School of Public Health who provided comments on an earlier version of this manuscript. The authors also thank Sylviane Ratte, Director, Road Safety Program and Sara Whitehead, Consultant, Public Health and Preventive Medicine, Road Safety Program, at Vital Strategies who provided valuable comments on this research and support in contacting key informants. The authors also wish to thank Alma H. Ramírez of Teasa Translate for providing translation and transcription services for the study. Finally, the authors wish to thank the study participants who generously provided their time and valuable insights.

Funding

This project was supported by Bloomberg Philanthropies through the Bloomberg Philanthropies Initiative for Global Road Safety (Grant No. 111882). The funders were not involved in this study or development of this manuscript.

Author information

Authors and Affiliations

Johns Hopkins International Injury Research Unit, Health Systems Program, Department of International Health, Johns Hopkins Bloomberg School of Public Health, 615 N. Wolfe Street Suite E8527, Baltimore, MD, 21205, USA

Rachel Neill, Angélica López Hernández, Adam D. Koon & Abdulgafoor M. Bachani

Contributions

RN, ALH, AK, and AB designed this study. RN and ALH collected and analyzed the data. RN wrote the first draft of the manuscript. RN, ALH, AK, and AB provided critical revisions to the manuscript. All authors approved the final version for publication.

Corresponding author

Correspondence to Rachel Neill.

Ethics declarations

Ethics approval and consent to participate

This study was exempted as non-human subject research by the Johns Hopkins University Bloomberg School of Public Health Institutional Review Board (No: IRB00013713). We received oral informed consent from all participants.

Consent for publication

Not applicable.

Competing interests

The authors declare they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Material 1.

Supplementary Material 2.

Supplementary Material 3.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Neill, R., Hernández, A.L., Koon, A.D. et al. Translating global evidence into local implementation through technical assistance: a realist evaluation of the Bloomberg Philanthropies initiative for global road safety. Global Health 20, 42 (2024). https://doi.org/10.1186/s12992-024-01041-z

Received : 20 December 2023

Accepted : 22 April 2024

Published : 10 May 2024

DOI : https://doi.org/10.1186/s12992-024-01041-z


Keywords

  • Road safety
  • Low- and middle-income countries
  • Technical assistance
  • International development

  • Open access
  • Published: 09 May 2024

Designing an evaluation tool for evaluating training programs of medical students in clinical skill training center from consumers’ perspective

  • Rezvan Azad 1 ,
  • Mahsa Shakour 2 &
  • Narjes Moharami 2  

BMC Medical Education volume  24 , Article number:  502 ( 2024 ) Cite this article

170 Accesses

Metrics details

Introduction

The Clinical Skill Training Center (CSTC) is the first environment in which third-year medical students learn clinical skills after passing basic science. Consumer-based evaluation is one way to improve this center together with its consumers. This study was conducted with the aim of preparing a consumer-oriented evaluation tool for the CSTC among medical students.

Methods

The study used a mixed-methods design. The first phase was qualitative and aimed at developing an evaluation tool; the second phase evaluated the tool. In the first phase, after a literature review in the divergent step, a complete list of problems in the field of CSTCs in medical schools was prepared. In the convergent step, the prepared list was compared with the standards of clinical education and Scriven's values. In the second phase, the tool was evaluated by a scientific and authority committee. Validity was measured by determining the content validity ratio (CVR) and content validity index (CVI), and the face and content validity of the tool were established through the approval of a group of specialists.

Results

The findings of the research took the form of four questionnaires: for clinical instructors, pre-clinical medical students, externship students, and interns. All items were designed on a 5-point Likert scale. The main areas of evaluation included the objectives and content of training courses, implementation of operations, facilities and equipment, and the environment and indoor space. In order to examine the long-term effects, a special evaluation form was designed for interns.

Conclusion

The consumer evaluation tool was designed with good reliability and validity and is suitable for use in the CSTC; its use can improve the effectiveness of clinical education activities.

Peer Review reports

Mastering clinical skills is one of the essential requirements for becoming a physician, and pre-clinical courses play an important role in forming these skills in medical students. The importance of these courses is such that the Clinical Skill Training Center (CSTC) was established especially for this purpose and is nowadays used for training pre-clinical skills and some more advanced procedures such as operating room simulation [ 1 ]. The CSTC is an educational environment where students can use the available resources, under the supervision of experienced faculty members, to be introduced to clinical skills, train and gain experience in these skills, and receive immediate feedback to resolve their mistakes and shortcomings [ 2 ]. The center targets students who have sufficient theoretical knowledge but lack the skills necessary for working in the clinical setting; it therefore supports students in the acquisition, maintenance and improvement of their clinical medical skills [ 3 ]. In this center, students can learn and repeat treatment procedures in a safe environment without severe consequences, which reduces their stress and allows them to train and learn [ 4 ]. In this study, medical students attend this center for the first time after the end of the theoretical course and before entering the hospital, and receive preliminary training in practical medical skills such as performing a variety of examinations and history taking. Then, in externship and internship, they practice more advanced skills such as cardiopulmonary resuscitation, dressing and suturing in small groups.

The importance of centers such as CSTCs lies in the fact that learning a large number of practical and communication skills linked to theoretical knowledge is one of the essential characteristics of medical education and can play an important role in the future careers of students and in training specialized human resources in medicine and healthcare [ 4 ]. One of the important matters in clinical training is the quality of education, which can directly affect the quality of healthcare services provided to society. The quality of education is, in turn, affected by the details of the educational programs. Therefore, the evaluation of educational programs can play an important role in providing quality education. In other words, using suitable evaluation mechanisms creates the requirements for performance transparency and accountability in the clinical education system [ 5 ]. Observing the principles of evaluation can also help identify the shortcomings and problems in educational programs [ 2 ]. However, the evaluation of educational programs often faces difficulties. Evaluations conducted to ensure a suitable quality of education for medical students must determine whether the students have achieved acceptable clinical standards, which is only possible through careful evaluation of their training programs [ 2 ].

There are various problems concerning evaluation tools. Medical faculty members are still faced with challenges in improving evaluation tools and in creating tools for evaluating factors that are hard to quantify or qualify, such as professionalism, group work and expertise [ 6 ].

Despite various theories regarding evaluation, the lack of credible and valid evaluation tools for educational programs is still felt [ 7 ]. Using suitable evaluation tools can create an overview of the current situation of training programs based on the quality factors of the curriculum and can serve as a guideline for decision-making, planning, faculty development and improving the quality of education [ 8 ]. Perhaps the most important value of a suitable evaluation tool for training programs is that it provides a clear picture and operational, measurable indicators regarding the implementation of educational programs. Furthermore, once completed, such a tool can be used as an ongoing screening tool by academic groups, faculty members and authorities in practical training programs.

The consumer-oriented model was advocated by the evaluation expert and philosopher Michael Scriven. Like other models, this model of evaluation seeks to make a value judgment about the quality of a program, product, or policy in order to determine its value, merit, or importance; in this model, however, the value judgment is based on the level of satisfaction and usefulness of the curriculum for the consumers of the program, and the evaluator considers himself or herself responsive to their needs and demands. The models included in this approach pay more attention to their responsibility towards the consumers of curricula and educational programs, treating evaluation as an exercise in value-free measurement of whether program goals were achieved [ 9 , 10 ].

The current study aims to design an evaluation tool for training programs in the CSTC based on consumers' perspectives and to assess its validity and reliability, in order to facilitate the evaluation of educational programs and help improve the practical skills of medical students. The prepared evaluation tool can therefore be used not only for continuous improvement of educational quality but also for validation of educational programs.

Subjects and methods

The study used a mixed-methods design with a triangulation approach. It was a developmental study for creating a tool to evaluate the educational programs of the CSTC in medical schools from the consumers' perspective, using data gathered through a qualitative study, a descriptive survey, and multiple other sources. The study was conducted from 2020 to 2022 at Arak University of Medical Sciences. Participants were students at different levels and clinical teachers, who are the consumers and main stakeholders. The study included two main phases.

The first phase was qualitative. The samples were the literature and 10 experts, selected purposefully. This phase was for deciding which factors should be used to evaluate the educational programs of the CSTC. To create a deep understanding of the topic, the literature related to the subject matter was reviewed, focusing on consumer-perspective evaluation and questionnaire preparation methods. Then, using the Scriven consumer opinion questionnaire, the standards for CSTCs, and the available literature, interviews were conducted with experts and stakeholders in the CSTC. These interviews aimed to prepare a comprehensive list of problems and concerns related to the educational programs at the clinical skill training center which the evaluation tool aimed to address. This stage was known as the divergent stage, and the topics discussed in the interviews included educational goals, content, equipment, educational processes, and the environment and physical location. Some of the questions asked in this stage included “What is the level of achieving educational goals among students in the current program?”, “How effective is the practical program of the center in improving the clinical skills of the students?”, “Does the center have access to sufficient tools and equipment for completing its educational program?” and “What are the long-term effects of CSTC's educational program?”

In the next step, known as the convergent step, the list prepared in the previous stage was combined with the educational standards for CSTCs provided by the Deputy of Education of the Ministry of Health, as well as Scriven's criteria. The results were then carefully assessed by a scientific and authority committee consisting of the Educational Deputy of Clinical Education of the Faculty of Medicine, the Director of Educational Affairs of the Faculty of Medicine, the Director of the Clinical Skills Training Center and Curriculum, the Expert of the Clinical Skills Center and the Bachelor of Technical Affairs of the Clinical Skills Training Center in the Faculty of Medicine of Arak University of Medical Sciences. The questionnaire items were selected based on their importance and the evaluation criteria. The data gathering tool was prepared after determining the evaluation questions, the data gathering sources and the evaluation method. The consumers in this study were clinical training faculty members and medical students (externship, pre-clinical and internship students), so four questionnaires with tailored questions were designed. Each questionnaire addresses the evaluation domains of learning objectives and course content, equipment and tools, educational processes, and environment and physical location; a fifth domain, the long-term effects of the curriculum, is covered by the form designed for interns.

The second phase was a quantitative survey. The samples were professors with expertise in the subject and medical students (externship, pre-clinical and internship students), selected through convenience and purposive sampling; 10 faculty members and 71 students were selected. This phase measured the questionnaires' face and content validity. Validity was measured using the Content Validity Ratio (CVR) and Content Validity Index (CVI) according to Lawshe's method, in which the opinions of experts in the field concerning the questionnaire content are used to calculate these indices [ 11 ]. A total of 10 faculty members participated in the validity survey, including faculty members from the specialty fields of medical education, gynecology, infectious diseases, emergency medicine, pediatric medicine, nursing and midwifery. After explaining the research goals to the participants and providing them with the operational definitions related to the contents of the items, they were asked to mark each item in a table using a three-point scale with “essential”, “useful but not essential” and “nonessential” scores. The Content Validity Ratio was then calculated as \(\mathrm{CVR} = \frac{n_e - n/2}{n/2}\), where n is the total number of experts and n_e is the number of experts who selected the “essential” score. Using the CVR table, the minimum CVR value for accepting an item based on the participants' opinions was set at 0.62.
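
As a concrete illustration of Lawshe's formula as described above, the short Python sketch below computes the CVR for a single item from a set of expert ratings; the ratings shown are hypothetical, and the 0.62 cut-off corresponds to a 10-expert panel.

```python
def content_validity_ratio(ratings):
    """Lawshe's CVR = (n_e - n/2) / (n/2), where n_e is the number of
    experts rating the item 'essential' and n is the panel size."""
    n = len(ratings)
    n_e = sum(1 for r in ratings if r == "essential")
    return (n_e - n / 2) / (n / 2)

# Hypothetical ratings for one item from a 10-expert panel
ratings = ["essential"] * 9 + ["useful but not essential"]
cvr = content_validity_ratio(ratings)
print(f"CVR = {cvr:.2f}")                        # CVR = 0.80
print("retain" if cvr >= 0.62 else "eliminate")  # retain
```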

After calculating the CVR, the method proposed by Waltz and Bausell was used to determine the CVI. To this end, a CVI evaluation table was prepared for the items using a four-point scale with “unrelated”, “requiring major revision”, “requiring minor revision” and “relevant” scores and delivered to the 10 participating experts, who were asked to provide their opinions on each item. The CVI value for each item was then calculated by dividing the total number of “requiring minor revision” and “relevant” answers by the total number of experts, and items with CVI values higher than 0.79 were accepted [ 11 , 12 ]. The reliability of the questionnaire, assessed as internal consistency with the help of SPSS software, was higher than 0.8, which confirmed its suitable reliability. A panel of experts then conducted a qualitative review of the items, edited their grammar, and modified unclear statements based on the research goals; in general, each phrase had to be accepted by the majority of the panel in terms of simplicity, clarity and lack of ambiguity. Face validity was also assessed by scoring the impact of each item on the questionnaire, and phrases with scores lower than 1.5 were eliminated. After evaluating face validity, the Content Validity Ratio (CVR) was calculated by the experts and items with CVR values below the threshold were eliminated. The tool was then administered to 71 students and 11 teachers to assess reliability using Cronbach's alpha.
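
The CVI and reliability calculations can be sketched in the same way. The Python example below uses hypothetical data (the study itself used SPSS): it computes an item-level CVI as the proportion of experts rating the item “relevant” or “requiring minor revision”, and Cronbach's alpha from a small respondents-by-items matrix of Likert scores.

```python
import numpy as np

def item_cvi(ratings, acceptable=("relevant", "requiring minor revision")):
    """Item-level CVI: the share of experts who score the item as acceptable."""
    return sum(1 for r in ratings if r in acceptable) / len(ratings)

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items matrix of Likert scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                       # number of items
    item_variances = scores.var(axis=0, ddof=1)
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical data: 10 experts rate one item; 5 respondents answer 4 Likert items
cvi = item_cvi(["relevant"] * 8 + ["requiring minor revision", "requiring major revision"])
alpha = cronbach_alpha([[4, 5, 4, 4], [3, 4, 3, 4], [5, 5, 4, 5], [2, 3, 2, 3], [4, 4, 4, 4]])
print(f"CVI = {cvi:.2f} (accepted if > 0.79), Cronbach's alpha = {alpha:.2f}")
```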

Results

The results of the current study indicate that, according to the participating faculty members and experts, the evaluation of the educational programs of clinical skill training centers includes evaluating programs with regard to goals and content, educational processes, equipment and tools, and environment and physical location. After interviews with clinical training experts and a review of the relevant literature, four separate questionnaires were developed for clinical training faculty members, pre-clinical students, internship students, and externship students. All sampled experts answered all validity questions, and 71 of the 90 students completed the questionnaires.

The questionnaire for faculty members included 35 items (Table 1), the questionnaire for interns included 6 items (Table 2), the externship students' questionnaire included 29 items (Table 3) and the questionnaire for pre-clinical students included 41 items (Table 4). All items were designed for scoring on a 5-point Likert scale (very low, low, average, high, very high).

The face validity of the questionnaires was evaluated using qualitative and quantitative approaches. Among the 117 items in the four questionnaires, 6 items did not have suitable content validity (CVR < 0.62) and were eliminated (Table 5). The remaining 111 items had CVR ≥ 0.62, and the results of the CVI assessment indicated that all of them were acceptable.

The reliability of the questionnaires was investigated using Cronbach's alpha, with emphasis on internal consistency, with the help of SPSS software, as presented in Table 6. The reliability of every questionnaire was greater than 0.83. Therefore, all items received acceptable reliability and validity scores.

Discussion

In the current study, a comprehensive researcher-made questionnaire was prepared based on the opinions of experts and curriculum designers, while considering all relevant resources and literature; with regard to the breadth of its scope, it is a unique tool in Iran. The prepared tool was then used to evaluate the activities of the clinical skills training center in five domains: (1) program goals and content, (2) tools and equipment, (3) educational processes, (4) environment and physical location and (5) long-term effects of the curriculum.

The first part of the evaluation tool prepared in the current study aims to assess the program's objectives according to the consumers' views. The CSTC is suitable for training basic and practical skills, which are often neglected due to time constraints during the students' presence in clinical environments [ 6 ]. The factors investigated in this area using the current tool included basic skills such as patient interviewing, basic resuscitation, clinical examination, practical clinical activities, interpretation of essential clinical findings, prescription skills and patient management. Other studies have investigated similar factors. For example, Imran et al. (2018) evaluated the attitude of students towards this center and stated that participation in skill lab sessions in the pre-clinical years assists students in their clinical years in achieving better overall performance, as well as better communication skills and self-esteem [ 1 ]. According to previous studies, the majority of students preferred participation in pre-clinical training in these centers because of the advantages of skill labs for learning clinical skills [ 3 ]. Another study showed that the majority of students prefer participation in a skill lab for learning essential clinical skills such as venous blood sampling, catheterization, endotracheal intubation, listening to respiratory sounds, genital examination, etc., over directly performing these procedures on patients [ 2 ]. The tools designed in the current study evaluated some of these learning objectives; however, because five domains were evaluated and each domain contained many questions, the items were summarized to keep the questionnaires user friendly. Each questionnaire included objective-related questions that its respondents, as consumers (faculty members and medical students), could answer.

The second part of this evaluation tool assesses educational tools such as educational mannequins and models, medical examination devices (stethoscope, sphygmomanometer, otoscope and ophthalmoscope), medical consumables, audio-visual equipment and information technology facilities. According to the literature, a common feature of CSTCs is access to a wide range of tools in each university as well as the use of updated technologies for education. These innovations have even resulted in improved academic rankings of some colleges and medical universities in the world [ 12 ]. The quality of these educational tools is another important item in many studies [ 13 ]. The quality of a mannequin depends on its fidelity. Brydges et al. showed that higher fidelity leads to more learning in less time, and suggested that clinical curricula incorporate exposure to multiple simulations to maximize educational benefit [ 14 ].

The third part of this tool concerns educational processes and consists of evaluating factors such as the length and number of workshops, the effect of the CSTC on teaching in the clinical environment, the effect of the center on increasing motivation and interest in clinical topics, the use of volunteer patients and actors, and the use of modern teaching and assessment methods. This area evaluates the educational process as an important part of clinical training, and its importance is also confirmed in other studies. The CSTC enables students, including interns and new students, to practice procedures without fearing the consequences. Furthermore, there are no time or ethical constraints on these practices, enabling students to be trained in treatment procedures and physical examinations that can be dangerous or painful for the patient [ 2 ]. In this regard, the standardized patient is one of the popular methods used in universities around the world. For example, the University of Massachusetts has been using standardized patients as an education and assessment tool, and even as clinical trainers, for more than 20 years [ 8 ]. Another example is the simulation center of Grand Valley State University, which provides significant tools for the management of standardized patients, including registration and deployment of standardized patients as needed. This center has designed a website for the registration of standardized patients, which allows individuals to register based on certain criteria before being trained and deployed according to the protocols [ 8 ].

The effectiveness of clinical skill training centers on motivation was presented in a study by Hashim et al. (2016) on the effects of clinical skill training centers on medical education. According to the results of this study, 84 to 89 per cent of students believed that these centers increase motivation for medical education as well as interest in learning clinical skills [ 3 ]. With regard to the use of modern methods, one of the most recent examples is the use of clinical simulations employing multimedia tools and software, which can be used for improving psychological and psychomotor skills. Studies have shown that these centers also lead to improved motivation and independent learning tendencies among students [ 13 ].

The fourth part relates to the evaluation of the environment and physical location in the current tool, covering accessibility, flexibility of use, similarity to a real environment, specialized training spaces, receiving feedback and the use of multimedia technologies. These factors were extracted from the opinions of experts and stakeholders and have been used in similar studies. According to the standard for clinical skill training centers presented by the Ministry of Health, Treatment and Medical Education, the preferred physical location for a clinical skill training center includes a large area with flexible use as well as a wardroom, nursing station, ICU or smaller rooms with specialized applications such as an operating room and resuscitation room. Furthermore, a clinical skill training center must have access to a suitable location for providing students with multimedia education [ 8 ].

James et al. showed the effectiveness of an experimental pharmacology skill lab in facilitating the training of specific modules for developing the core competencies of parenteral drug administration and intravenous drip setting, using mannequins to develop undergraduate medical students' skills in administering injections [ 15 ]. These factors were included in the evaluation questionnaire prepared in the current study. In the study by Hashim et al. (2016), 62 participants believed that the time constraints and pressure of the clinical environment were not present in the CSTC while learning clinical skills. Therefore, these centers can help students improve their skills by making them feel secure and resolving their concerns about the consequences of their actions. According to the students participating in that study, approximately 70 to 75 per cent of students felt more secure about mistakes and less worried about harming patients during clinical procedures after training clinical skills on the mannequins available at clinical skill training centers [ 3 ].

The fifth part includes evaluating the long-term effects of education: the conformity between the center's curriculum and educational needs, the effect of the center on improving essential skills, and the effect of the curriculum on interest, stress and facilitating clinical procedures. Yu et al. observed that after training in a clinical skill training center and simulations, students show a significantly lower level of anxiety and a significantly higher level of self-esteem compared to before the training. Furthermore, after experiencing the simulation, students without previous simulation experience showed lower anxiety and higher self-esteem [ 16 ]. In a systematic review by Alanazi et al., evidence showed that participation in a CSTC and the use of simulation can significantly improve the knowledge, skills and self-esteem of medical students [ 17 ]. Furthermore, a study by Younes et al. showed that adding a simulation program to a normal psychiatry curriculum improves the quality of education and the self-esteem of medical students [ 18 ]. In another study, Hashim et al. (2016) showed a positive attitude among students regarding the effectiveness of clinical skill training centers for improving skills and self-esteem as well as learning new clinical skills [ 3 ]. Therefore, based on the role of clinical skill training centers in improving the motivation and self-esteem of students reported in previous studies, these factors can be important in the evaluation of clinical skill training centers and were therefore included in the evaluation questionnaire.

Limitations:

We had some limitations in our study. First, there was no existing tool for evaluating training programs of medical students in a clinical skill training center from the consumers' perspective. Comparison was therefore difficult, and we compared each domain with the results of other studies. The study used triangulation and drew on many resources to design this tool, which reduced bias. Second, in the convergent step we extracted many items, but because of the possibility of non-response to all questions, we could not use all of them, and the questionnaires were summarized. To ensure that no important item was neglected, experts in medical education checked the items.

Conclusion

An evaluation tool for assessing the Clinical Skill Training Center from the consumers' perspective contains many items, some of which can be answered only by certain consumers. This tool is therefore defined as four questionnaires for four types of consumers. In each questionnaire, respondents answer questions in the domains of learning objectives and course content, equipment and tools, educational processes, and environment and physical location, with the long-term effects of the curriculum addressed in the intern questionnaire. The evaluation tool designed in the current study offers suitable reliability and validity and can be used for evaluating the CSTC from the consumers' perspective. The application of this tool can help improve the effectiveness of educational activities and the curriculum in clinical skill training centers.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Abbreviations

CSTC: Clinical Skill Training Center

CVR: Content Validity Ratio

CVI: Content Validity Index

Imran M, Khan SA, Aftab T. Effect of preclinical skill lab training on clinical skills of students during clinical years. Pak J Physiol. 2018;12(3):30–2. https://pjp.pps.org.pk/index.php/PJP/article/view/580

Upadhayay N. Clinical training in medical students during preclinical years in the skill lab. Adv Med Educ Pract. 2017;8:189–94.

Hashim R, Qamar K, Khan MA, Rehman S. Role of skill laboratory training in medical education: students' perspective. J Coll Physicians Surg Pak. 2016;26(3):195–8. PMID: 26975950.

Singh H, Kalani M, Acosta-Torres S, El Ahmadieh TY, Loya J, Ganju A. History of simulation in medicine: from Resusci Annie to the Ann Myers Medical Center. Neurosurgery. 2013;73(Suppl 1):9–14.

Bazargan A. Educational evaluation. Tehran: Samt; 2020.

Morgan J, Green V, Blair J. Using simulation to prepare for clinical practice. Clin Teach. 2018;15(1):57–61.

Pazargadi M, Ashktorab T, Alavimajd H, Khosravi S. Developing an assessment tool for nursing students' general clinical performance. Iran J Med Educ. 2013;12(11):877–87.

Denizon Arranz S, Blanco Canseco JM, Pouplana Malagarriga MM, Holgado Catalán MS, Gámez Cabero MI, Ruiz Sánchez A, et al. Multi-source evaluation of an educational program aimed at medical students for interviewing/taking the clinical history using standardized patients. GMS J Med Educ. 2021;38(2):Doc40.

Lam CY. Consumer-oriented evaluation approach. In: The SAGE Encyclopedia of Educational Research, Measurement, and Evaluation. Thousand Oaks: SAGE; 2018. pp. 390–2.

Fitzpatrick J, Sanders J, Worthen B. Program evaluation: alternative approaches and practical guidelines. 4th ed. Boston: Allyn & Bacon; 2004.

Waltz CF, Bausell RB. Nursing research: design, statistics, and computer analysis. Philadelphia: F. A. Davis; 1981.

Zamanzadeh V, Rassouli M, Abbaszadeh A, Majd HA, Nikanfar A, Ghahramanian A. Details of content validity and objectifying it in instrument development. 2014.

O'Connor M, Rainford L. The impact of 3D virtual reality radiography practice on student performance in clinical practice. Radiography. 2023;29(1):159–64.

Brydges R, Carnahan H, Rose D, Rose L, Dubrowski A. Coordinating progressive levels of simulation fidelity to maximize educational benefit. Acad Med. 2010;85(5):806–12.

James J, Rani RJ. Novel strategy of skill lab training for parenteral injection techniques: a promising opportunity for medical students. Int J Basic Clin Pharmacol. 2022;11(4):315.

Yu JH, Chang HJ, Kim SS, Park JE, Chung WY, Lee SK, et al. Effects of high-fidelity simulation education on medical students' anxiety and confidence. PLoS ONE. 2021;16(5):e0251078.

Alanazi A, Nicholson N, Thomas S. Use of simulation training to improve knowledge, skills, and confidence among healthcare students: a systematic review. Internet J Allied Health Sci Pract. 2017.

Younes N, Delaunay A, Roger M, et al. Evaluating the effectiveness of a single-day simulation-based program in psychiatry for medical students: a controlled study. BMC Med Educ. 2021;21(1):348.

Acknowledgements

Sincere thanks to the practice tutors who undertook these clinical assessments. We are also very thankful to the professors of Arak University of Medical Sciences for helping us successfully design the questionnaire.

Funding

Not applicable.

Author information

Authors and Affiliations

Medical Education development center, Arak University of Medical Sciences, Arak, Iran

Rezvan Azad

Medicine School, Arak University of Medical Sciences, Arak, Iran

Mahsa Shakour & Narjes Moharami

Contributions

The concept and framework were designed by MSH and RA. The questionnaires and data were collected by RA. The data were analyzed by MSH and RA. The manuscript was prepared by NM and edited by MSH and NM. The technical editing was done by MSH.

Corresponding author

Correspondence to Mahsa Shakour.

Ethics declarations

Ethics approval and consent to participate

This study received ethical approval from the Institutional Review Board (IRB) of the University of Medical Sciences, Iran, to which the researchers are affiliated. All study protocols were performed in accordance with the Declaration of Helsinki. This study took ethical considerations into account, such as the confidentiality of the participants' names and the written consent of participants. The survey was conducted in 2021. Informed consent was obtained from each participant after clearly explaining the objectives and the significance of the study. We advised the study participants of their right to participate, refuse or discontinue participation at any time, and of the chance to ask anything about the study. The participants were also advised that all data collected would remain confidential.

Consent for publication

Not Applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Azad, R., Shakour, M. & Moharami, N. Designing an evaluation tool for evaluating training programs of medical students in clinical skill training center from consumers’ perspective. BMC Med Educ 24 , 502 (2024). https://doi.org/10.1186/s12909-024-05454-7


Received : 22 November 2023

Accepted : 22 April 2024

Published : 09 May 2024

DOI : https://doi.org/10.1186/s12909-024-05454-7


  • Program evaluation
  • Consumer-oriented
  • Clinical skills lab


Using case studies to do program evaluation

  • Using case studies to do program evaluation (PDF, 79.49 KB)

This paper, authored by Edith D. Balbach for the California Department of Health Services, is designed to help evaluators decide whether to use a case study evaluation approach.

It also offers guidance on how to conduct a case study evaluation.

This resource was suggested to BetterEvaluation by Benita Williams.

  • Using a Case Study as an Evaluation Tool
  • When to Use a Case Study
  • How to Do a Case Study
  • Unit Selection
  • Data Collection
  • Data Analysis and Interpretation

Balbach, E. D. (1999). Using case studies to do program evaluation. California Department of Health Services. Retrieved from http://www.case.edu/affil/healthpromotion/ProgramEvaluation.pdf


  • Open access
  • Published: 09 May 2024

Evaluation of integrated community case management of the common childhood illness program in Gondar city, northwest Ethiopia: a case study evaluation design

  • Mekides Geta 1 ,
  • Geta Asrade Alemayehu 2 ,
  • Wubshet Debebe Negash 2 ,
  • Tadele Biresaw Belachew 2 ,
  • Chalie Tadie Tsehay 2 &
  • Getachew Teshale 2  

BMC Pediatrics volume  24 , Article number:  310 ( 2024 ) Cite this article

127 Accesses

Metrics details

Integrated Community Case Management (ICCM) of common childhood illness is one of the global initiatives to reduce mortality among under-five children by two-thirds. It has also been implemented in Ethiopia to improve community access to and coverage of health services. However, to the best of our knowledge, the implementation status of integrated community case management in the study area has not been well evaluated. Therefore, this study aimed to evaluate the implementation status of the integrated community case management program in Gondar City, Northwest Ethiopia.

A single case study design with mixed methods was employed to evaluate the process of integrated community case management for common childhood illness in Gondar town from March 17 to April 17, 2022. The availability, compliance, and acceptability dimensions of program implementation were evaluated using 49 indicators. In this evaluation, 484 mothers or caregivers participated in exit interviews, 230 records were reviewed, 21 key informants were interviewed, and 42 observations were included. To identify the predictor variables associated with acceptability, we used multivariable logistic regression analysis. Statistically significant variables were identified based on the adjusted odds ratio (AOR) with a 95% confidence interval (CI) and p-value. The qualitative data were recorded, transcribed, and translated into English, and thematic analysis was carried out.

The overall implementation of integrated community case management was 81.5%, to which availability (84.2%), compliance (83.1%), and acceptability (75.3%) contributed. Some drugs and medical equipment, such as cotrimoxazole, vitamin K, a timer, and a resuscitation bag, were stocked out. Healthcare providers complained that the lack of refresher training and of continuous supportive supervision were common challenges that led to a skill gap for effective program delivery. Educational status (primary AOR = 0.27, 95% CI: 0.11–0.52; secondary AOR = 0.16, 95% CI: 0.07–0.39; college and above AOR = 0.08, 95% CI: 0.07–0.39), prescribed drug availability (AOR = 2.17, 95% CI: 1.14–4.10), travel time to the ICCM site (AOR = 3.8, 95% CI: 1.99–7.35), and waiting time (AOR = 2.80, 95% CI: 1.16–6.79) were factors associated with the acceptability of the program by caregivers.

Conclusion and recommendation

The overall implementation status of the integrated community case management program was judged as good. However, gaps were observed in the assessment, classification, and treatment of diseases. Educational status, availability of prescribed drugs, waiting time, and travel time to integrated community case management sites were factors associated with program acceptability. The recommendations put forward are continuous supportive supervision for health facilities, refresher training for HEWs to maximize compliance, construction of clean water sources for HPs, and longitudinal studies in the future.

Peer Review reports

Integrated Community Case Management (ICCM) is a critical public health strategy for expanding the coverage of quality child care services [ 1 , 2 ]. It concentrates mainly on curative care, as well as on the diagnosis, treatment, and referral of children who are ill with infectious diseases [ 3 , 4 ].

Based on World Health Organization (WHO) and United Nations Children’s Fund (UNICEF) recommendations, Ethiopia adopted and implemented a national policy supporting community-based treatment of common childhood illnesses such as pneumonia, diarrhea, uncomplicated malnutrition, malaria, and other febrile illnesses; the Amhara region was one of the regions where the policy was piloted in late 2010 [ 5 ]. The Ethiopian primary healthcare units, established at district level, include primary hospitals, health centers (HCs), and health posts (HPs). The HPs are run by Health Extension Workers (HEWs), whose functions include monitoring health programs and disease occurrence, providing health education and essential primary care services, and making timely referrals to HCs [ 6 , 7 ]. The Health Extension Program (HEP) uses task shifting and community ownership to provide essential health services at the first level through the health development army and a network of women volunteers. These groups are organized to promote health and prevent disease through community participation and empowerment by identifying the salient local bottlenecks that hinder the utilization of vital maternal, neonatal, and child health services [ 8 , 9 ].

One of the key roles of health extension staff in clinical case management is to encourage better growth and development among under-five children. Healthy family and neighborhood practices are also encouraged [ 10 , 11 ]. The program also combines immunization, community-based feeding, vitamin A supplementation, and de-worming with multiple preventive measures [ 12 , 13 ]. Nowadays, rapid scale-up of the ICCM approach is required to manage the most common causes of morbidity and mortality among children under the age of five efficiently and in an integrated manner at the community level [ 14 , 15 ].

Over 5.3 million children died globally in 2018, and most deaths (75%) were caused by preventable or treatable diseases such as pneumonia, malaria, and diarrhea [ 16 ]. About 99% of the global burden of under-five mortality and morbidity is found in developing countries and is due to common childhood diseases such as pneumonia, diarrhea, malaria, and malnutrition [ 17 ].

In 2013, the mortality rate of under-five children in Sub-Saharan Africa had decreased to 86 deaths per 1000 live births and is projected to reach 25 per 1000 live births by 2030. However, this remains a high figure, and the current trend is not sufficient to reach the target [ 18 ]. About half of global under-five deaths occur in sub-Saharan Africa, and of the top 26 nations that carry 80% of the world’s under-five deaths, 19 are in sub-Saharan Africa [ 19 ].

To alleviate this burden, the Ethiopian government seeks to deliver basic child care services at the community level through trained health extension workers. The program has improved child health not only in Ethiopia but also in other African nations. Despite its proven benefits, program implementation has faced several challenges, in particular non-adherence to the national guidelines among healthcare workers [ 20 ]. Addressing those challenges could further improve program performance. Present treatment levels in sub-Saharan Africa are unacceptably poor: only 39% of children receive proper diarrhea treatment, 13% of children with suspected pneumonia receive antibiotics, and 13% of children with fever receive a finger/heel stick to screen for malaria [ 21 ].

To improve program performance, program gaps should be identified through scientific evaluations and stakeholder involvement. This evaluation not only identifies gaps but also puts forward recommendations to address them. Furthermore, the implementation status of ICCM of common childhood illnesses had not yet been evaluated in the study area. Therefore, this work aimed to evaluate the implementation status of the integrated community case management program in Gondar town, northwest Ethiopia. The findings may be used by policymakers, healthcare providers, funders, and researchers.

Method and material

Evaluation design and settings.

A single-case study design with concurrent mixed-methods evaluation was conducted in Gondar city, northwest Ethiopia, from March 17 to April 17, 2022. The evaluability assessment was done from December 15–30, 2021. Both qualitative and quantitative data were collected concurrently, analyzed separately, and integrated at the result interpretation phase.

The evaluation area, Gondar City, is located in northwest Ethiopia, 740 km from Addis Ababa, the capital city of the country. It has six sub-cities and thirty-six kebeles (25 urban and 11 rural). In 2019, the estimated total population of the town was 338,646, of whom 58,519 (17.3%) were under-five children. The town has eight public health centers and 14 health posts serving the population. All health posts provide ICCM services for a population of more than 70,852.

Evaluation approach and dimensions

Program stakeholders.

The evaluation followed a formative participatory approach by engaging the program’s potential stakeholders. Prior to the development of the proposal, an extensive discussion was held with the Gondar City Health Department to identify other key stakeholders in the program. Service providers at each health facility (HCs and HPs), caretakers of sick children, the Gondar City Health Office (GCHO), the Amhara Regional Health Bureau (ARHB), the Ministry of Health (MoH), and NGOs (IFHP and Save the Children) were considered key stakeholders. During the Evaluability Assessment (EA), the stakeholders were involved in the development of the evaluation questions, objectives, indicators, and judgment criteria.

Evaluation dimensions

The availability and acceptability dimensions from the access framework [ 22 ] and compliance dimension from the fidelity framework [ 23 ] were used to evaluate the implementation of ICCM.

Population and samplings

All under-five children and their caregivers who attended the HPs; program implementers (health extension workers, healthcare providers, healthcare managers, PHCU focal persons, MCH coordinators, and other stakeholders); and ICCM records and registries in the health posts of Gondar city administration were included in the evaluation. For the quantitative data, the required sample size was allocated proportionally to each health post based on the number of cases served in the most recent month. The qualitative sample size was determined by data saturation, and the samples were selected purposively.

The data sources for the compliance dimension were all administrative records/reports and ICCM registration books (230 documents) in all health posts, covering registrations from December 1, 2021, to February 30, 2022 (three months retrospectively). The registries were assessed starting from the most recent registration number until the required sample size was obtained for each health post.

The sample size to measure mothers’/caregivers’ acceptability of ICCM was calculated by taking the prevalence of caregivers’ satisfaction with the ICCM program, p = 74%, from a previous similar study [ 24 ] and considering a 4% standard error (precision) at 95% CI and 10% non-response, which gave 508. Except for those who were seriously ill, all caregivers attending the ICCM sites during data collection were selected and interviewed consecutively.
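As a cross-check, the figure of 508 can be reproduced with the standard single-proportion formula \(n = z^2 p(1-p)/d^2\); the short sketch below is ours, not the authors’ code, and the exact rounding convention is an assumption.

```python
from math import ceil

# Inputs stated in the text: p = 0.74 (prior satisfaction estimate),
# d = 0.04 (precision/standard error), z = 1.96 (95% confidence),
# plus a 10% allowance for non-response.
p, d, z = 0.74, 0.04, 1.96

n_base = ceil((z ** 2) * p * (1 - p) / d ** 2)  # 462 caregivers before the allowance
n_final = round(n_base * 1.10)                  # 508 after adding 10% for non-response
print(n_base, n_final)
```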

The availability of the required supplies, materials, and human resources for the program was assessed in all 14 HPs. The data collectors observed the health posts and collected the required data using a resource inventory checklist.

A total of 70 non-participatory patient-provider interactions were also observed. Observations were conducted in each health post, and for health posts with more than one health extension worker, one of them was selected randomly. The observation findings were used to triangulate the findings obtained through the other data collection techniques. Since people may act according to the standards when they know their activities are being observed, we discarded the first two observations in each health post from the analysis; this is one strategy to minimize the Hawthorne effect. Finally, a total of 42 observations (3 in each HP) were included in the analysis.

Twenty-one key informants (14 HEWs, 3 PHCU focal persons, 3 health center heads, and one MCH coordinator) were interviewed. These key informants were selected because they were assumed to be the people best placed to inform the evaluation about the program. In addition to the originally developed key informant interview questions, the data collectors used probes to obtain more detailed and clearer information.

Variables and measurement

The availability of resources, including trained healthcare workers, was examined using 17 indicators, with a weighted score of 35%. Compliance was used to assess HEWs’ adherence to the ICCM treatment guidelines by observing patient-provider interactions and conducting document reviews; it used 18 indicators and a weighted value of 40%.

Mothers’/caregivers’ acceptance of the ICCM service was examined using 14 indicators and had a weighted score of 25%. The indicators were developed with a five-point Likert scale (1: strongly disagree, 2: disagree, 3: neutral, 4: agree, and 5: strongly agree). The cut-off point for this categorization was calculated using the demarcation threshold formula: \(\frac{\text{total highest score} - \text{total lowest score}}{2} + \text{total lowest score}\) [ 25 – 27 ]. Mothers/caregivers who scored above the cut-off point (42) were considered “satisfied”, otherwise “dissatisfied”. The indicators were adapted from the national ICCM and IMNCI implementation guidelines and other related evaluations with the participation of stakeholders. Indicator weights were assigned by the stakeholders during the EA. Indicator scores were calculated using the formula \(\text{achieved in } \% = \frac{\text{indicator score} \times 100}{\text{indicator weight}}\) [ 26 , 28 ].
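For illustration, a minimal sketch (not the authors’ code) of the demarcation threshold and indicator-score formulas above; with 14 five-point items it reproduces the cut-off of 42:

```python
def demarcation_threshold(n_items: int, scale_min: int = 1, scale_max: int = 5) -> float:
    """Cut-off = (total highest score - total lowest score) / 2 + total lowest score."""
    lowest, highest = n_items * scale_min, n_items * scale_max
    return (highest - lowest) / 2 + lowest

print(demarcation_threshold(14))  # 42.0 for the 14 acceptability items scored 1-5

def achieved_percent(indicator_score: float, indicator_weight: float) -> float:
    """Indicator achievement (%) = indicator score x 100 / indicator weight."""
    return indicator_score * 100 / indicator_weight
```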

The independent variables for the acceptability dimension were socio-demographic and economic variables (age, educational status, marital status, occupation of caregiver, family size, income level, and mode of transport), availability of prescribed drugs, waiting time, travel time to ICCM site, home to home visit, consultation time, appointment, and source of information.

The overall implementation of ICCM was measured by using 49 indicators over the three dimensions: availability (17 indicators), compliance (18 indicators) and acceptability (14 indicators).

Program logic model

Based on the constructed program logic model, the expected outputs of the program activities were trained healthcare providers; mothers/caregivers who had received health information and counseling on child feeding; and children who had been assessed, classified, and treated for disease, received follow-up, and been checked for vitamin A supplementation, deworming, and immunization status. Improved knowledge of HEWs on ICCM, increased health-seeking behavior, improved quality of health services, increased utilization of services, improved data quality and information use, and improved child health conditions are considered outcomes of the program. Reduction of under-five morbidity and mortality and improved quality of life in society are the distal outcomes or impacts of the program (Fig.  1 ).

Fig. 1. Integrated community case management of childhood illness program logic model in Gondar City in 2022

Data collection tools and procedure

The resource inventory and data extraction checklists were adapted from standard ICCM tools and checklists [ 29 ]. A structured, interviewer-administered questionnaire was adapted from different literature sources [ 30 , 31 ] to measure the acceptability of ICCM. A key informant interview (KII) guide was also developed to explore the views of the KIs. The interview questionnaire and guide were initially developed in English, translated into the local language (Amharic), and finally back-translated into English to ensure consistency. All interviews were conducted in the local language, Amharic.

Five trained clinical nurses and one BSc nurse were recruited from Gondar Zuria and Wegera districts as data collectors and supervisor, respectively. Two days of training on the overall purpose of the evaluation and basic data collection procedures were provided prior to data collection. Both quantitative and qualitative data were then gathered at the same time. The quantitative data were gathered from program documentation, charts of ICCM program visitors, and exit interviews. Interviews with the 21 key informants and non-participatory observations of patient-provider interactions were used to acquire the qualitative data. Key informant interviews were conducted to investigate the gaps and best practices in the implementation of the ICCM program.

A pretest was conducted with 26 mothers/caregivers at Maksegnit health post, and appropriate modifications were made based on the pretest results. The data collectors were supervised, and the principal evaluator examined the completeness and consistency of the data on a daily basis.

Data management and analysis

Quantitative data were entered into EpiData version 4.6 and exported to Stata 14 for analysis. Narration and tables were used to present the descriptive statistics. Based on the established judgment criteria, the overall program implementation was examined and interpreted as a combination of the availability, compliance, and acceptability dimensions. To investigate the factors associated with ICCM acceptability, a binary logistic regression analysis was performed. During bivariable analysis, variables with p-values less than 0.25 were included in the multivariable analysis. Finally, variables with a p-value less than 0.05 and an adjusted odds ratio (AOR) with a 95% confidence interval (CI) were judged statistically significant. Qualitative data were recorded, transcribed into Amharic, then translated into English, and finally coded and thematically analyzed.
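The two-step regression workflow could be reproduced along the lines of the sketch below. This is only an illustration under assumptions, not the authors’ code: the column names (accept, education, drug_available, travel_time, waiting_time, family_size) and the input file are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical export of the exit-interview data; "accept" is the binary outcome.
df = pd.read_csv("iccm_acceptability.csv")

candidates = ["C(education)", "drug_available", "travel_time", "waiting_time", "family_size"]

# Step 1: bivariable screening, keeping variables with p < 0.25.
kept = []
for var in candidates:
    fit = smf.logit(f"accept ~ {var}", data=df).fit(disp=False)
    if fit.pvalues.drop("Intercept").min() < 0.25:
        kept.append(var)

# Step 2: multivariable model; adjusted odds ratios (AOR) with 95% CIs.
final = smf.logit("accept ~ " + " + ".join(kept), data=df).fit(disp=False)
aor = np.exp(final.params).rename("AOR")
ci = np.exp(final.conf_int()).rename(columns={0: "2.5%", 1: "97.5%"})
print(pd.concat([aor, ci], axis=1))
```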

Judgment matrix analysis

The weighted values of the availability, compliance, and acceptability dimensions were 35%, 40%, and 25%, respectively, based on stakeholder and investigator agreement on each indicator. The judgment parameters for each dimension and for the overall implementation of the program were categorized as poor (< 60%), fair (60–74.9%), good (75–84.9%), and very good (85–100%).
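To make the aggregation concrete, the following sketch (ours, not part of the original analysis) applies these weights and judgment categories to the dimension scores reported later in the results:

```python
WEIGHTS = {"availability": 0.35, "compliance": 0.40, "acceptability": 0.25}

def judge(score: float) -> str:
    # Judgment categories agreed with stakeholders during the evaluability assessment.
    if score < 60:
        return "poor"
    if score < 75:
        return "fair"
    if score < 85:
        return "good"
    return "very good"

# Dimension scores as reported in the results section.
scores = {"availability": 84.2, "compliance": 83.1, "acceptability": 75.3}
overall = sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)
print(round(overall, 1), judge(overall))  # 81.5 -> "good"
```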

Availability of resources

A total of 26 HEWs were assigned to the fourteen health posts, and 72.7% of them had been trained on ICCM to manage common childhood illnesses in under-five children. However, the training had been given four years earlier, and they had not received even refresher training on ICCM. The KII responses also indicated that the shortage of HEWs at the HPs was a problem in implementing the program properly.

I am the only HEW in this health post and I have not been trained on ICCM program. So, this may compromise the quality of service and client satisfaction.(25 years old HEW with two years’ experience)

All observed health posts had ICCM registration books, monthly report and referral formats, a functional thermometer, a weighing scale, and a MUAC tape. However, a timer and a resuscitation bag were not available in any of the HPs. Most of the key informant findings showed that there was no shortage of guidelines, registration books, or recording tools in the HPs; however, there was no OTP card in some health posts.

“Guideline, ICCM registration book for 2–59 months of age, and other different recording and reporting formats and booklet charts are available since September/2016. However, OTP card is not available in most HPs.”. (A 30 years male health center director)

Only one-fifth (21%) of HPs had a clean water source for drinking and for washing equipment. Most of the key informant interview findings showed that infrastructure such as water was lacking in most HPs. Poor linkage between the HPs, HCs, the town health department, and the local kebele administration was given as the reason for this unavailability.

Since there is no water for hand washing, or drinking, we obligated to bring water from our home for daily consumptions. This increases the burden for us in our daily activity. (35 years old HEW)
Most medicines, such as anti-malaria drugs with RDTs, Coartem, albendazole, amoxicillin, vitamin A capsules, ORS, and gloves, were available in all the health posts. Drugs like zinc, paracetamol, TTC eye ointment, and folic acid were available in only some HPs. However, cotrimoxazole and vitamin K capsules had been stocked out in all health posts for the previous six months. A key informant also revealed that: “Vitamin K was not available starting from the beginning of this program and Cotrimoxazole was not available for the past one year and they told us they would avail it soon but still not availed. Some essential ICCM drugs like anti malaria drugs, De-worming, Amoxicillin, vitamin A capsules, ORS and medical supplies were also not available in HCs regularly.” (28 years’ Female PHCU focal)

The overall availability of resources for ICCM implementation was 84.2%, which was judged as good based on our preset judgment parameters (Table  1 ).

Health extension workers’ compliance

From the 42 patient-provider interactions, we found that 85.7%, 71.4%, 76.2%, and 95.2% of the children were checked for body temperature, weight, general danger signs, and immunization status, respectively. Out of the 42 observations, 33 (78.6%) of the sick children were classified for their nutritional status. During the observations, 29 (69.1%) of the caregivers were counseled by HEWs on food, fluids, and when to return, and 35 (83.3%) of the children were given an appointment for the next follow-up visit. Key informant interviews also affirmed this:

“Most of our health extension workers were trained on ICCM program guidelines but still there are problems on assessment classification and treatment of disease based on guidelines and standards this is mainly due to lack refreshment training on the program and lack of continuous supportive supervision from the respective body.” (27years’ Male health center head)

Of the 10 clients classified as severe pneumonia cases, all were referred to a health center (with pre-referral treatment), and of the 57 pneumonia cases, 50 (87.7%) were treated at the HP with amoxicillin or cotrimoxazole. All children with severe diarrhea, very severe disease, and severe complicated malnutrition were referred to health centers with pre-referral treatment for severe dehydration, very severe febrile disease, and severe complicated malnutrition, respectively. Of those with some dehydration and no dehydration, 82.4% and 86.8% were treated at the HPs (ORS; plan B and ORS; plan A, respectively). Moreover, zinc sulfate was prescribed for 63 (90%) of the under-five children with some or no dehydration. Of the 26 malaria cases and 32 severe uncomplicated malnutrition and moderate acute malnutrition cases, 20 (76.9%) and 25 (78.1%) were treated at the HPs, respectively. Of the total reviewed documents, 56 (93.3%), 66 (94.3%), 38 (84.4%), and 25 (78.1%) were given a follow-up date for pneumonia, diarrhea, malaria, and malnutrition, respectively.

Supportive supervision and performance review meetings were conducted only in 10 (71.4%) HPs, but all (100%) HPs sent timely reports to the next supervisory body.

Most of the key informants’ interview findings showed that supportive supervision was not conducted regularly or in all HPs.

I had mentored and supervised by supportive supervision teams who came to our health post at different times from health center, town health office and zonal health department. I received this integrated supervision from town health office irregularly, but every month from catchment health center and last integrated supportive supervision from HC was on January. The problem is the supervision was conducted for all programs.(32 years’ old and nine years experienced female HEW)

Moreover, the results showed that there was poor compliance of HEWs with the program, mainly due to a weak supportive supervision system among managerial and technical health workers. This was also supported by key informants:

We conducted supportive supervision and performance review meeting at different time, but still there was not regular and not addressed all HPs. In addition to this the supervision and review meeting was conducted as integration of ICCM program with other services. The other problem is that most of the time we didn’t used checklist during supportive supervision. (Mid 30 years old male HC director)

Based on our observations and the ICCM document review, 83.1% of the HEWs complied with the ICCM guidelines, which was judged as good (Table  2 ).

Acceptability of ICCM program

Sociodemographic and obstetric characteristics of participants.

A total of 484 study participants responded to the interviewer-administered questionnaire, a response rate of 95.3%. The mean age of the study participants was 30.7 (SD ± 5.5) years. The largest group of caregivers (38.6%) were in the 26–30 years age group. Of the total respondents, 89.3% were married, and regarding religion, the majority (84.5%) were Orthodox Christians. Regarding educational status, over half of the caregivers (52.1%) were illiterate (unable to read or write). Nearly two-thirds of the caregivers (62.6%) were housewives (Table  3 ).

All the caregivers came to the health post on foot, and most of them, 418 (86.4%), arrived within one hour. The majority of caregivers, 452 (93.4%), responded that the waiting time to receive the service was less than 30 min. Caregivers who received the prescribed drugs at the health post numbered 409 (84.5%). Most of the respondents, 429 (88.6%) and 438 (90.5%), respectively, received counseling on providing extra fluid and feeding for their sick child and were given a follow-up date.

Most of the caregivers, 298 (61.6%), were satisfied with the convenience of the working hours of the HPs, and more than three-quarters (80.8%) were satisfied with the counseling services they received. Most of the respondents, 366 (75.6%), were satisfied with the appropriateness of the waiting time, and 431 (89%) with the appropriateness of the consultation time. The majority of caregivers, 448 (92.6%), were satisfied with the way of communicating with HEWs, and 269 (55.6%) were satisfied with the knowledge and competence of HEWs. Nearly half of the caregivers (240, or 49.6%) were satisfied with the availability of drugs at the health posts.

The overall acceptability of the ICCM program was 75.3%, which was judged as good. A low proportion of acceptability was measured for the cleanliness of the health posts, the appropriateness of the waiting area, and the competence and knowledge of the HEWs. On the other hand, a high proportion of acceptability was measured for the appropriateness of the waiting and consultation times and the way of communicating with HEWs (Table  4 ).

Factors associated with acceptability of ICCM program

In the final multivariable logistic regression analysis, the educational status of caregivers, availability of prescribed drugs, travel time, and waiting time were the factors significantly associated with caregivers’ satisfaction with the ICCM program.

Accordingly, the odds of accepting the program among caregivers with primary education, secondary education, and college and above were 73% (AOR = 0.27, 95% CI: 0.11–0.52), 84% (AOR = 0.16, 95% CI: 0.07–0.39), and 92% (AOR = 0.08, 95% CI: 0.07–0.40) lower, respectively, than among mothers or caregivers who were not able to read and write. The odds of accepting the program among caregivers or mothers who received the prescribed drugs were 2.17 times higher than among their counterparts (AOR = 2.17, 95% CI: 1.14–4.10). The odds of accepting the program among caregivers or mothers who waited less than 30 min for services were 2.8 times higher than among those who waited more than 30 min (AOR = 2.80, 95% CI: 1.16–6.79). Moreover, the odds of accepting the ICCM program among caregivers/mothers who traveled an hour or less for the service were 3.8 times higher than among their counterparts (AOR = 3.82, 95% CI: 1.99–7.35) (Table  5 ).
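As a quick arithmetic check (ours, not the authors’), the “% lower odds” figures follow directly from the adjusted odds ratios:

```python
# Percent reduction in odds = (1 - AOR) x 100, using the AORs reported above.
for education, aor in {"primary": 0.27, "secondary": 0.16, "college and above": 0.08}.items():
    print(f"{education}: {(1 - aor) * 100:.0f}% lower odds")  # 73%, 84%, 92%
```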

Overall ICCM program implementation and judgment

The implementation of the ICCM program in Gondar city administration was measured in terms of the availability (84.2%), compliance (83.1%), and acceptability (75.3%) dimensions. In the availability dimension, amoxicillin, antimalarial drugs, albendazole, vitamin A, and ORS were available in all health posts, but only six HPs had ready-to-use therapeutic foods, three HPs had ORT corners, and none of the HPs had functional timers. In all health posts, the health extension workers asked about the chief complaint, correctly assessed for pneumonia, diarrhea, malaria, and malnutrition, and sent reports on the national schedule. However, only 70% of caretakers were counseled about food, fluids, and when to return, and 66% and 76% of the sick children were checked for anemia and other danger signs, respectively. Caretakers’/mothers’ educational status, waiting time to receive the service, and travel time to ICCM sites were the factors affecting the program’s acceptability. The overall ICCM program implementation in Gondar city administration was 81.5% and was judged as good (Fig.  2 ).

Fig. 2. Overall ICCM program implementation and the evaluation dimensions in Gondar city administration, 2022

The implementation status of ICCM was judged using three dimensions: availability, compliance, and acceptability of the program. The judgment cut-off points were determined during the evaluability assessment (EA) together with the stakeholders. As a result, we found that the overall implementation status of the ICCM program was good as per the preset judgment parameters. The availability of resources for program implementation, the compliance of HEWs with the treatment guidelines, and the acceptability of the program services to users were also each judged as good.

This evaluation showed that most medications, equipment, and recording and reporting materials were available. This finding was comparable with the standard ICCM treatment guideline [ 10 ]. On the other hand, trained healthcare providers and some medications like zinc, paracetamol, TTC eye ointment, folic acid, and syringes were not found in some HPs. Overall availability was nevertheless higher than in a study conducted in selected health posts in SNNPR [ 33 ] and a study conducted in Soro district, southern Ethiopia [ 24 ]. The possible reasons might be fewer interruptions of drug supply at the town health office or regional health department stores, regular supply of essential drugs, and good supply management and distribution of drugs from health centers to health posts.

The result of this evaluation showed that only a minority of health posts (3 of 14) had a functional ORT corner, which was lower compared to the study conducted in SNNPR [ 34 ]. This might be due to poor coverage of functional piped water in the kebeles and to the installation not being set up at the time of health post construction, as reported by one of the ICCM program coordinators.

The compliance of HEWs with the treatment guidelines in this evaluation was higher than in the study done in southern Ethiopia (65.6%) [ 24 ]. This might be due to the availability of essential drugs, the educational level of HEWs, and good utilization of the ICCM guideline and chart booklet by HEWs. The observations showed that most of the sick children were assessed for danger signs, weight, and temperature. This finding is lower than in the study conducted in Rwanda [ 35 ]; the difference might be due to a lack of refresher training and regular supportive supervision for HEWs. It is also higher compared to the study done in three regions of Ethiopia, which indicated that 88%, 92%, and 93% of children were classified as per the standard for pneumonia, diarrhea, and malaria, respectively [ 36 ]. The reason for this difference may be the presence of medical equipment and supplies, including RDT kits for malaria, and the good educational level of HEWs.

Moreover, most HPs received supportive supervision, performance review meetings were conducted, and all of them sent reports to the next level in a timely manner. This finding was lower than in the study on the implementation evaluation of the ICCM program in southern Ethiopia [ 24 ] and the study done in three regions of Ethiopia (Amhara, Tigray, and SNNPR) [ 37 ]. This difference might be due to sample size variation.

The overall acceptability of the ICCM program was lower than the preset judgment parameter but slightly higher compared to the study in southern Ethiopia [ 24 ]. This might be due to the presence of essential drugs for treating children, reasonable waiting and counseling times provided by HEWs, and smooth communication between HEWs and caregivers. In contrast, it was lower than in a similar study conducted in Wakiso district, Uganda [ 38 ]; the reasons for this might be contextual differences between the two countries, inappropriate waiting areas for receiving the service, and poor cleanliness of the HPs in our study area. Low acceptability by caregivers of the ICCM service was observed for the appropriateness of the waiting area, availability of drugs, cleanliness of the health posts, and competence of HEWs, while high levels of caregiver acceptability were observed for consultation time, the counseling service received, communication with HEWs, the treatment given to their sick children, and the intention to return for ICCM services.

Caregivers who had attained primary, secondary, and college-and-above education were less likely to accept the program services than those who were illiterate. This may be because more educated mothers know more about their child’s health condition and expect quality services from healthcare providers, which is likely to reduce the acceptability of the service. The finding is congruent with a study on the implementation evaluation of the ICCM program in southern Ethiopia [ 24 ], but inconsistent with a study conducted in Wakiso district in Uganda [ 38 ]; the possible reason might be contextual differences between the two countries. ICCM program acceptability was higher among caregivers who received all the prescribed drugs than among those who did not. Caregivers who waited less than 30 min for the service were more accepting of ICCM services than those who waited more than 30 min. This finding is similar to the study on the implementation evaluation of the ICCM program in southern Ethiopia [ 24 ]. In contrast, the result was incongruent with a survey conducted by the Ethiopian Public Health Institute in all regions and two administrative cities of Ethiopia [ 39 ]; this variation might be due to the smaller sample size in our study than in the previous one. Moreover, caregivers who traveled less than 60 min to the HPs were more likely to accept the program than those who traveled more, a finding similar to the study in Jimma zone [ 40 ].

Strengths and limitations

This evaluation used three evaluation dimensions, mixed methods, and different data sources, which enhance the reliability and credibility of the findings. However, the study may have limitations such as social desirability bias, recall bias, and the Hawthorne effect.


This evaluation assessed the implementation status of the ICCM program, focusing mainly on the availability, compliance, and acceptability dimensions. The overall implementation status of the program was judged as good. The availability dimension was compromised by stock-outs of chloroquine syrup, cotrimoxazole, and vitamin K and by the inaccessibility of a clean water supply in some health posts. The educational status of caregivers, availability of prescribed drugs at the HPs, travel time to the HPs, and waiting time to receive the service were the factors associated with the acceptability of the ICCM program.

Therefore, continuous supportive supervision for health facilities and refresher training for HEWs to maximize compliance are recommended. Materials and supplies should be delivered directly to the health centers or health posts to solve the transportation problem. HEWs should document their assessment findings and the services provided using the registration format to identify their gaps, limitations, and better performances. The health facilities and local administrations should construct clean water sources for health facilities. Furthermore, we recommend that future researchers and program evaluators conduct longitudinal studies to determine the causal relationships between the program interventions and the outcomes.

Data availability

Data will be available upon reasonable request from the corresponding author.

Abbreviations

EDHS: Ethiopian Demographic and Health Survey

HC/HF: Health Center/Health Facility

HEP: Health Extension Program

HEW: Health Extension Worker

HP: Health Post

HSDP: Health Sector Development Plan

ICCM: Integrated Community Case Management of Common Childhood Illnesses

IEC: Information, Communication and Education

IFHP: Integrated Family Health Program

IMNCI: Integrated Management of Neonatal and Childhood Illness

ISS: Integrated Supportive Supervision

MCH: Maternal and Child Health

MUAC: Mid-Upper Arm Circumference

NGO: Non-Governmental Organization

ORS: Oral Rehydration Salts

OTP: Outpatient Therapeutic Program

PHCU: Primary Health Care Unit

RDT: Rapid Diagnostic Test

RUTF: Ready-to-Use Therapeutic Foods

SAM: Severe Acute Malnutrition

SNNPR: Southern Nations, Nationalities, and Peoples' Region

UNICEF: United Nations Children's Fund

WHO: World Health Organization

Brenner JL, Barigye C, Maling S, Kabakyenga J, Nettel-Aguirre A, Buchner D, et al. Where there is no doctor: can volunteer community health workers in rural Uganda provide integrated community case management? Afr Health Sci. 2017;17(1):237–46.


Mubiru D, Byabasheija R, Bwanika JB, Meier JE, Magumba G, Kaggwa FM, et al. Evaluation of integrated community case management in eight districts of Central Uganda. PLoS ONE. 2015;10(8):e0134767.

Samuel S, Arba A. Utilization of integrated community case management service and associated factors among mothers/caregivers who have sick eligible children in southern Ethiopia. Risk Manage Healthc Policy. 2021;14:431.


Kavle JA, Pacqué M, Dalglish S, Mbombeshayi E, Anzolo J, Mirindi J, et al. Strengthening nutrition services within integrated community case management (iCCM) of childhood illnesses in the Democratic Republic of Congo: evidence to guide implementation. Matern Child Nutr. 2019;15:e12725.

Miller NP, Amouzou A, Tafesse M, Hazel E, Legesse H, Degefie T, et al. Integrated community case management of childhood illness in Ethiopia: implementation strength and quality of care. Am J Trop Med Hyg. 2014;91(2):424.

WHO. Annual report 2016: Partnership and policy engagement. World Health Organization, 2017.

Banteyerga H. Ethiopia’s health extension program: improving health through community involvement. MEDICC Rev. 2011;13:46–9.


Wang H, Tesfaye R, Ramana NV, Chekagn G. CT. Ethiopia health extension program: an institutionalized community approach for universal health coverage. The World Bank; 2016.

Donnelly J. Ethiopia gears up for more major health reforms. Lancet. 2011;377(9781):1907–8.

Legesse H, Degefie T, Hiluf M, Sime K, Tesfaye C, Abebe H, et al. National scale-up of integrated community case management in rural Ethiopia: implementation and early lessons learned. Ethiop Med J. 2014;52(Suppl 3):15–26.


Miller NP, Amouzou A, Hazel E, Legesse H, Degefie T, Tafesse M et al. Assessment of the impact of quality improvement interventions on the quality of sick child care provided by Health Extension workers in Ethiopia. J Global Health. 2016;6(2).

Oliver K, Young M, Oliphant N, Diaz T, Kim JJNYU. Review of systematic challenges to the scale-up of integrated community case management. Emerging lessons & recommendations from the catalytic initiative (CI/IHSS); 2012.

FMoH E. Health Sector Transformation Plan 2015: https://www.slideshare.net . Accessed 12 Jan 2022.

McGorman L, Marsh DR, Guenther T, Gilroy K, Barat LM, Hammamy D, et al. A health systems approach to integrated community case management of childhood illness: methods and tools. The American Journal of Tropical Medicine and Hygiene. 2012;87(5 Suppl):69.

Young M, Wolfheim C, Marsh DR, Hammamy D. World Health Organization/United Nations Children’s Fund joint statement on integrated community case management: an equity-focused strategy to improve access to essential treatment services for children. The American journal of tropical medicine and hygiene. 2012;87(5 Suppl):6.

Ezbakhe F, Pérez-Foguet A. Child mortality levels and trends. Demographic Research. 2020;43:1263–96.

UNICEF. Ending child deaths from pneumonia and diarrhoea. 2016 report. Available at https://data.unicef.org. Accessed 13 Jan 2022.

UNITED NATIONS. The Millennium Development Goals Report 2015. Available at https://www.un.org. Accessed 12 Jan 2022.

Bent W, Beyene W, Adamu A. Factors Affecting Implementation of Integrated Community Case Management Of Childhood Illness In South West Shoa Zone, Central Ethiopia 2015.

Abdosh B. The quality of hospital services in eastern Ethiopia: patient’s perspective. The Ethiopian Journal of Health Development. 2006;20(3).

Young M, Wolfheim C, Marsh DR, Hammamy D. World Health Organization/United Nations Children’s Fund joint statement on integrated community case management: an equity-focused strategy to improve access to essential treatment services for children. The American Journal of Tropical Medicine and Hygiene. 2012;87(5 Suppl):6–10.

Obrist B, Iteba N, Lengeler C, Makemba A, Mshana C, Nathan R, et al. Access to health care in contexts of livelihood insecurity: a framework for analysis and action. PLoS Medicine. 2007;4(10):e308.

Carroll C, Patterson M, Wood S, Booth A, Rick J, Balain S. A conceptual framework for implementation fidelity. Implementation science. 2007;2(1):1–9.

Dunalo S, Tadesse B, Abraham G. Implementation Evaluation of Integrated Community Case Management of Common Childhood Illness (ICCM) Program in Soro Woreda, Hadiya Zone, Southern Ethiopia. 2017.

Asefa G, Atnafu A, Dellie E, Gebremedhin T, Aschalew AY, Tsehay CT. Health System Responsiveness for HIV/AIDS Treatment and Care Services in Shewarobit, North Shewa Zone, Ethiopia. Patient preference and adherence. 2021;15:581.

Gebremedhin T, Daka DW, Alemayehu YK, Yitbarek K, Debie A. Process evaluation of the community-based newborn care program implementation in Geze Gofa district, south Ethiopia: a case study evaluation design. BMC Pregnancy and Childbirth. 2019;19(1):1–13.

Pitaloka DS, Rizal A. Patient’s satisfaction in antenatal clinic hospital Universiti Kebangsaan Malaysia. Jurnal Kesihatan Masyarakat (Malaysia). 2006;12(1):1–10.

Teshale G, Debie A, Dellie E, Gebremedhin T. Evaluation of the outpatient therapeutic program for severe acute malnourished children aged 6–59 months implementation in Dehana District, Northern Ethiopia: a mixed-methods evaluation. BMC pediatrics. 2022;22(1):1–13.

Mason E. WHO’s strategy on Integrated Management of Childhood Illness. Bulletin of the World Health Organization. 2006;84(8):595.

Shaw B, Amouzou A, Miller NP, Tafesse M, Bryce J, Surkan PJ. Access to integrated community case management of childhood illnesses services in rural Ethiopia: a qualitative study of the perspectives and experiences of caregivers. Health Policy and Planning. 2016;31(5):656–66.

World Health Organization. Annual report 2016: Partnership and policy engagement. World Health Organization, 2017.

Berhanu D, Avan B. Community Based Newborn Care Baseline Survey Report, Ethiopia, October 2014.

Save the Children. Enhancing Ethiopia’s Health Extension Package in the Southern Nations and Nationalities People’s Region (SNNPR), Shebedino and Lanfero Woredas report. Hawassa; 2012. Available at https://ethiopia.savethechildren.net

Kolbe AR, Muggah R, Hutson RA, James L, Puccio M, Trzcinski E, et al. Assessing Needs After the Quake: Preliminary Findings from a Randomized Survey of Port-au-Prince Households. University of Michigan/Small Arms Survey; 2010. Available at https://deepblue.lib.umich.edu

Teferi E, Teno D, Ali I, Alemu H, Bulto T. Quality and use of IMNCI services at health center under-five clinics after introduction of integrated community-based case management (ICCM) in three regions of Ethiopia. Ethiopian Medical Journal. 2014;52(Suppl 3):91–8.

Last 10 Km project. Integrated Community Case Management (iCCM) Survey report in Amhara, SNNP, and Tigray Regions, 2017. Available at https://l10k.jsi.com

Tumuhamye N, Rutebemberwa E, Kwesiga D, Bagonza J, Mukose A. Client satisfaction with integrated community case management program in Wakiso District, Uganda, October 2012: A cross sectional survey. Health scrip org. 2013;2013.

EPHI. Ethiopia service provision assessment plus survey 2014 report. Available at http://repository.iifphc.org

Gintamo B. EY, Assefa Y. Implementation Evaluation of IMNCI Program at Public Health Centers of Soro District, Hadiya Zone, Southern Ethiopia. 2017. Available at https://repository.ju.edu.et


Acknowledgements

We are very grateful to the University of Gondar and the Gondar town health office for their welcoming approach. We would also like to thank all the study participants of this evaluation for their information and commitment. Our appreciation also goes to the data collectors and supervisors for their unreserved contribution.

No funding was secured for this evaluation study.

Author information

Authors and affiliations.

Metema District Health office, Gondar, Ethiopia

Mekides Geta

Department of Health Systems and Policy, Institute of Public Health, College of Medicine and Health Sciences, University of Gondar, P.O. Box 196, Gondar, Ethiopia

Geta Asrade Alemayehu, Wubshet Debebe Negash, Tadele Biresaw Belachew, Chalie Tadie Tsehay & Getachew Teshale


Contributions

All authors contributed to the preparation of the manuscript. M.G. conceived and designed the evaluation and performed the analysis; T.B.B., W.D.N., G.A.A., C.T.T., and G.T. revised the analysis. G.T. prepared the manuscript, and all authors revised and approved the final manuscript.

Corresponding author

Correspondence to Getachew Teshale .

Ethics declarations

Ethics approval and consent to participate.

Ethical approval was obtained from the Institutional Review Board (IRB) of the Institute of Public Health, College of Medicine and Health Sciences, University of Gondar (Ref No/IPH/1482/2013). Informed consent was obtained from all subjects and/or their legal guardian(s).

Consent for publication

Not applicable.

Competing interests

All authors declared that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Geta, M., Alemayehu, G.A., Negash, W.D. et al. Evaluation of integrated community case management of the common childhood illness program in Gondar city, northwest Ethiopia: a case study evaluation design. BMC Pediatr 24 , 310 (2024). https://doi.org/10.1186/s12887-024-04785-0


Received : 20 February 2024

Accepted : 22 April 2024

Published : 09 May 2024

DOI : https://doi.org/10.1186/s12887-024-04785-0


  • Integrated community case management



Evaluating Local Government Policy Innovations

A case study of Surabaya's efforts in combating stunting and enhancing public health services quality.

  • Deasy Arieffiani Public Administration Department, Hang Tuah University, Surabaya, Indonesia
  • Mas Roro Lilik Ekowanti Public Administration Department, Hang Tuah University, Surabaya, Indonesia

This research aims to evaluate regional innovations in implementing Surabaya City government policies to reduce stunting rates and improve the quality of public health services. A qualitative descriptive method was used with a case study approach involving field observations and structured interviews. The research results show the success of Posyandu Prima in reducing stunting rates significantly in the last two years. The Surabaya City Government has proven effective in managing this program's human resources and budget. The active involvement of Great Surabaya Cadres (KSH) and non-governmental organizations also contributed greatly to the program's success. Cross-sector collaboration plays an important role in supporting implementation. Institutional characteristics, such as commitment to public health and ability to collaborate, also matter. Theoretically, this research shows that synergy between the parties involved and government commitment can achieve significant results in handling the stunting problem. In conclusion, the Prima Posyandu Program has proven successful in reducing stunting rates and improving the quality of public health services in Surabaya. Additionally, the collaborative efforts between community stakeholders, healthcare providers, and governmental bodies underscore the crucial role of multi-sectoral partnerships in addressing complex public health issues like stunting. This synergy fosters comprehensive approaches that combine local knowledge, resources, and policy support to effectively combat stunting and enhance the well-being of communities. Thus, the Prima Posyandu Program's success is a compelling example of how concerted action and sustained commitment can yield tangible improvements in population health outcomes.


