---
name: conducting-program-evaluation-public-health
language: en
description: Structures program evaluation using the CDC framework with process, outcome, and impact assessment. Use when evaluating public health programs, measuring program effectiveness, or conducting logic model analysis.
tags:
  - process
  - public-health
  - evaluation
metadata:
  author: casemark
  practice_areas:
    - Public Health
    - Epidemiology
    - Preventive Medicine
  document_types:
    - Process Documentation
  skill_modes:
    - Process Management
---

# Conducting Program Evaluation in Public Health

## Why This Skill Exists

Program evaluation is the systematic assessment of a program's design, implementation, and outcomes, and it is a core function of public health practice. CDC's Framework for Program Evaluation in Public Health (MMWR 1999; 48(RR-11)) established the six-step standard that is widely used to guide evaluations of federally funded public health programs. PHAB accreditation Domain 9 requires health departments to evaluate the effectiveness of programs and interventions. The American Evaluation Association's Guiding Principles and the Joint Committee Program Evaluation Standards provide the ethical and methodological foundation. Yet evaluation remains one of the most underfunded and underperformed functions in public health: programs are implemented without logic models, outcomes are measured without baselines, and results are reported without comparison groups. This skill provides a structured framework for conducting rigorous, useful evaluations that improve programs and demonstrate public health value.

---

## Checkpoint A — Intake and Scoping

### Intake Questions

1. What program is being evaluated — name, purpose, funding source, implementation period?
2. What is the evaluation purpose — formative (improve the program), summative (judge merit/worth), developmental (innovative program in flux), or accountability (funder requirement)?
3. Is a logic model or theory of change available for the program?
4. What evaluation questions need to be answered — process (was it implemented as planned?), outcome (did participants change?), or impact (did population health improve?)?
5. What data are currently collected by the program — enrollment, participation, outputs, outcomes?
6. What comparison or control group is available or feasible?
7. What is the evaluation timeline and budget?
8. Who are the primary intended users of the evaluation findings — program managers, funders, policymakers, community?

### Required Documents

- Program description (goals, objectives, activities, target population, staffing, budget)
- Logic model or theory of change (if it exists; if not, developing one is part of Step 2)
- Program data: enrollment records, activity logs, output counts, participant surveys, administrative data
- Cooperative agreement or grant requirements for evaluation
- CDC Framework for Program Evaluation (MMWR 1999; 48(RR-11))
- Prior evaluation reports for this or similar programs
- RE-AIM framework dimensions (Reach, Effectiveness, Adoption, Implementation, Maintenance), if applicable
- Institutional Review Board (IRB) determination letter (evaluation vs. research distinction)

---

## Step 1 — Engage Stakeholders

Identify and engage three categories of stakeholders (per the CDC framework):

- **Those involved in program operations**: Program managers, staff, implementers, partner organizations. They know what the program actually does (which may differ from what the logic model says).
- **Those served or affected by the program**: Participants, community members, target population representatives. They define what "valuable" means.
- **Primary intended users of evaluation findings**: Decision-makers who will act on results — funders, health department leadership, policymakers, board members.

Form an evaluation advisory group from these stakeholder categories. Their involvement ensures the evaluation asks the right questions, uses credible methods, and produces findings that are actually used.

---

## Step 2 — Describe the Program with a Logic Model

If a logic model does not exist, develop one. If it exists, validate it against actual implementation:

- **Inputs**: Resources invested — funding, staff FTE, materials, partnerships, facilities, data systems.
- **Activities**: What the program does — services delivered, trainings conducted, policies implemented, outreach performed.
- **Outputs**: Countable products of activities — number of participants served, sessions delivered, materials distributed, policies enacted.
- **Short-term outcomes** (1-2 years): Changes in knowledge, attitudes, skills, awareness, intentions among participants.
- **Intermediate outcomes** (2-5 years): Changes in behavior, practice, decision-making, organizational policy.
- **Long-term outcomes/impact** (5+ years): Changes in health status, disease rates, quality of life, health equity at the population level.
- **External factors**: Contextual influences not controlled by the program — economic conditions, policy changes, other programs, natural events.

The logic model reveals the program's theory of change — the hypothesized causal chain from activities to impact. It also identifies which links in the chain are supported by evidence and which are assumptions.
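
One way to keep a logic model checkable is to store it as structured data rather than only as a diagram. The sketch below is illustrative: the class names `LogicModel` and `CausalLink`, the helper methods, and the example diabetes-prevention entries are all hypothetical and not part of the CDC framework. It flags empty components and lists the causal links that rest on assumption rather than evidence.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CausalLink:
    """One hypothesized step in the theory of change."""
    source: str
    target: str
    evidence: Optional[str] = None  # citation or data source; None = untested assumption


@dataclass
class LogicModel:
    inputs: list[str] = field(default_factory=list)
    activities: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)
    short_term_outcomes: list[str] = field(default_factory=list)
    intermediate_outcomes: list[str] = field(default_factory=list)
    long_term_outcomes: list[str] = field(default_factory=list)
    external_factors: list[str] = field(default_factory=list)
    links: list[CausalLink] = field(default_factory=list)

    # Unannotated, so this is a plain class attribute rather than a dataclass field.
    COMPONENTS = (
        "inputs", "activities", "outputs", "short_term_outcomes",
        "intermediate_outcomes", "long_term_outcomes", "external_factors",
    )

    def missing_components(self) -> list[str]:
        """Names of logic model components that have no entries yet."""
        return [name for name in self.COMPONENTS if not getattr(self, name)]

    def assumed_links(self) -> list[CausalLink]:
        """Links in the causal chain with no supporting evidence recorded."""
        return [link for link in self.links if link.evidence is None]


# Hypothetical example entries for illustration only.
model = LogicModel(
    inputs=["2.0 FTE health educators", "grant funding"],
    activities=["16-session lifestyle change classes"],
    outputs=["participants enrolled", "sessions delivered"],
    short_term_outcomes=["improved nutrition knowledge"],
    intermediate_outcomes=["increased physical activity"],
    long_term_outcomes=["lower diabetes incidence"],
    links=[
        CausalLink("16-session lifestyle change classes",
                   "improved nutrition knowledge",
                   evidence="prior evaluation report (hypothetical)"),
        CausalLink("improved nutrition knowledge", "increased physical activity"),
    ],
)

print(model.missing_components())  # ['external_factors']
print([(link.source, link.target) for link in model.assumed_links()])
```

Here the second link has no evidence recorded, so it surfaces as an assumption the evaluation may need to test.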

---

## Step 3 — Focus the Evaluation Design

Based on the evaluation questions and available resources, select the design:

**Process evaluation** (was the program implemented as designed?):
- Fidelity: Were core components delivered as specified? Use implementation checklists, observation, document review.
- Dose delivered: How much of the program was delivered (sessions, contacts, materials)?
- Dose received: How much did participants engage (attendance, participation quality, satisfaction)?
- Reach: What proportion of the target population was enrolled? Are participants representative of the target population?
- Context: What external factors influenced implementation?
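
The process measures above reduce to simple proportions once the counts are available. The following is a minimal sketch with made-up program records; the variable names and the choice to express dose received as mean attendance divided by sessions delivered are assumptions, and a real evaluation would define each denominator in its evaluation plan.

```python
from statistics import mean


def proportion(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, rejecting non-positive denominators."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


# Hypothetical program records.
eligible_population = 1200          # size of the target population
enrolled = 300                      # participants enrolled
sessions_planned = 16
sessions_delivered = 14
sessions_attended = [14, 10, 7, 12, 3]   # per-participant attendance (sample)
checklist_items_observed = 20       # core components observed
checklist_items_met = 18            # delivered as specified

process_metrics = {
    "reach": proportion(enrolled, eligible_population),
    "dose_delivered": proportion(sessions_delivered, sessions_planned),
    "dose_received": proportion(mean(sessions_attended), sessions_delivered),
    "fidelity": proportion(checklist_items_met, checklist_items_observed),
}

for name, value in process_metrics.items():
    print(f"{name}: {value:.1%}")
```

Reach computed this way says nothing about representativeness; that still requires comparing participant characteristics with those of the target population.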

**Outcome evaluation** (did the program produce intended changes?):
- **Pre-post design** (no comparison group): Measures change in participants but cannot attribute it to the program. Minimum standard; acceptable for exploratory evaluation.
- **Quasi-experimental design** (non-randomized comparison): Pre-post with a comparison group (matched, propensity-scored, or naturally occurring). Strengthens causal inference.
- **Experimental design** (randomized controlled trial): Gold standard for causal inference but often infeasible in public health settings due to ethical, logistical, or political constraints.
- **Time series design**: Repeated measures before and after program implementation. Useful for policy evaluations (e.g., impact of a smoke-free law on ED visits for asthma).

**Impact evaluation** (did population health change?):
- Typically requires population-level data (surveillance, vital records, BRFSS) and a longer time horizon.
- Difference-in-differences design: Compare pre-post change in a program jurisdiction to pre-post change in a comparison jurisdiction.
- Interrupted time series: Multiple data points before and after the intervention to detect level and slope changes.

---

## Step 4 — Gather Credible Evidence

Select indicators and data sources for each evaluation question:

- **Quantitative data**: Surveys (pre/post, follow-up), program records (EHR, enrollment databases), administrative data (Medicaid claims, hospital discharge, surveillance), and standardized instruments (PHQ-9 for depression, AUDIT for alcohol, validated behavior scales).
- **Qualitative data**: Key informant interviews with program staff and partners, focus groups with participants, open-ended survey items, document review of program records and meeting minutes.
- **Mixed methods**: Combine quantitative outcome data with qualitative process data to answer "did it work?" and "how/why did it work (or not)?"

Ensure data quality:
- Use validated instruments with established reliability and validity.
- Minimize bias: consistent data collection procedures, training of data collectors, blinding where possible.
- Achieve adequate sample size for statistical power (calculate power a priori for outcome evaluations).
- Obtain an IRB determination: program evaluation often does not meet the Common Rule definition of human subjects research (45 CFR 46), but the IRB, not the evaluation team, should make that determination.
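
An a priori power calculation can be sketched with the standard library alone. The example below uses the common normal-approximation formula for comparing two independent proportions with equal group sizes; the function name and the example proportions (30% in the comparison group, 20% expected in the program group) are assumptions for illustration. Other formulas, such as pooled-variance or continuity-corrected versions, give slightly different answers, and clustered designs need a design-effect adjustment not shown here.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_proportions(p_comparison: float, p_program: float,
                                alpha: float = 0.05, power: float = 0.80) -> int:
    """Participants needed per group to detect p_comparison vs. p_program (two-sided test)."""
    if p_comparison == p_program:
        raise ValueError("the two proportions must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for the test
    z_beta = NormalDist().inv_cdf(power)            # quantile for the desired power
    variance_sum = (p_comparison * (1 - p_comparison)
                    + p_program * (1 - p_program))
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p_comparison - p_program) ** 2
    return ceil(n)


n_per_group = sample_size_two_proportions(0.30, 0.20)
print(n_per_group)  # 291
```

The result is the number of participants with complete outcome data in each group, so recruitment targets should be set higher to allow for expected attrition.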

---

## Step 5 — Justify Conclusions and Apply Standards

Analyze data and draw conclusions against the evaluation standards:

- **Utility**: Are findings useful to stakeholders? Present in formats they can act on — not just p-values, but practical significance and actionable recommendations.
- **Feasibility**: Was the evaluation design practical and cost-effective?
- **Propriety**: Was the evaluation conducted ethically — participant consent, confidentiality, cultural responsiveness, fair reporting of both positive and negative findings?
- **Accuracy**: Are findings valid — threats to internal and external validity addressed, alternative explanations considered, data quality documented?

For quantitative outcomes:
- Report effect sizes (Cohen's d, odds ratios, rate ratios) alongside statistical significance.
- Acknowledge limitations: selection bias, attrition, contamination, Hawthorne effect, history effects (concurrent events that could explain the change).
- For cost analysis: cost-effectiveness analysis (cost per unit of outcome) or cost-benefit analysis (monetized outcomes vs. costs) when data permit.
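
The quantities named above can be computed as follows. This is a sketch with hypothetical data: Cohen's d uses the pooled standard deviation, the odds ratio confidence interval uses the log (Woolf) method, and `cost_per_outcome` is the simplest form of a cost-effectiveness ratio. All function names and figures are illustrative assumptions.

```python
from math import exp, log, sqrt
from statistics import NormalDist, mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b))
                  / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def odds_ratio_with_ci(a, b, c, d, confidence=0.95):
    """Odds ratio and CI from a 2x2 table.

    a, b = program group with / without the outcome
    c, d = comparison group with / without the outcome
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the log method")
    estimate = (a * d) / (b * c)
    standard_error = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lower = exp(log(estimate) - z * standard_error)
    upper = exp(log(estimate) + z * standard_error)
    return estimate, lower, upper


def cost_per_outcome(total_cost, outcomes_achieved):
    """Cost-effectiveness expressed as cost per unit of outcome."""
    if outcomes_achieved <= 0:
        raise ValueError("outcomes_achieved must be positive")
    return total_cost / outcomes_achieved


# Hypothetical knowledge scores at follow-up.
d_value = cohens_d([12, 15, 14, 10, 13], [10, 11, 9, 12, 8])
print(round(d_value, 2))  # 1.59

# Hypothetical quit outcomes: 40 of 100 in the program, 25 of 100 in the comparison.
odds_ratio, ci_lower, ci_upper = odds_ratio_with_ci(40, 60, 25, 75)
print(odds_ratio, round(ci_lower, 2), round(ci_upper, 2))  # 2.0, about 1.09 to 3.66

print(cost_per_outcome(150_000, 40))  # 3750.0
```

Reporting the interval alongside the point estimate shows readers how precise the estimate is, which a p-value alone does not.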

For qualitative findings:
- Use systematic coding (thematic analysis, grounded theory, framework analysis).
- Report themes with supporting quotes and disconfirming evidence.
- Triangulate across data sources and methods.

---

## Step 6 — Ensure Use and Share Lessons Learned

The evaluation is worthless if findings are not used:

- Produce tailored reporting products for each stakeholder group: executive summary for leadership, detailed technical report for evaluators and program managers, fact sheet for community, presentation for funders.
- Hold a findings dissemination meeting with the evaluation advisory group. Facilitate discussion of implications and next steps.
- Develop specific recommendations: what should the program start doing, stop doing, or do differently? Link each recommendation to specific findings.
- Track whether recommendations are adopted and whether they lead to program improvement (evaluation of the evaluation).
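- Publish findings in peer-reviewed literature, present at APHA or AEA conferences, and contribute to evidence registries relevant to the program's field (What Works Clearinghouse covers education interventions; use an equivalent registry for other topic areas).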

---

## Checkpoint B — Evaluation Review

- [ ] Stakeholders from all three categories engaged in evaluation design
- [ ] Logic model developed or validated against actual program implementation
- [ ] Evaluation questions clearly stated and matched to evaluation design
- [ ] Comparison group identified and justified (or absence justified)
- [ ] Data collection instruments validated and data collectors trained
- [ ] IRB determination obtained (not human subjects research, exempt, expedited, or full review)
- [ ] Analysis addresses both statistical and practical significance
- [ ] Findings reported to all stakeholder groups in accessible formats
- [ ] Recommendations are specific, actionable, and linked to findings

---

## Quality Audit

- [ ] CDC Framework for Program Evaluation six steps followed in sequence
- [ ] Logic model includes external factors and distinguishes outputs from outcomes
- [ ] Evaluation design matches evaluation purpose (formative → process; summative → outcome/impact)
- [ ] Pre/post design acknowledges inability to attribute causation without comparison group
- [ ] Validated instruments used for outcome measurement with reliability coefficients reported
- [ ] Sample size/power calculation documented for outcome evaluations
- [ ] Qualitative analysis uses systematic coding method (not anecdotal quote selection)
- [ ] Joint Committee Program Evaluation Standards (utility, feasibility, propriety, accuracy) assessed
- [ ] Both positive and negative findings reported — no suppression of unfavorable results

---

## Guidelines

- Evaluation is not an audit. The purpose is learning and improvement, not compliance enforcement. Evaluators must maintain independence but also be genuinely engaged in helping programs succeed.
- A logic model is not a wish list. It should represent the program's actual theory of change, with plausible causal links. If the logic model shows that a 2-hour workshop will reduce population-level diabetes prevalence, the model is wrong — and the evaluation should be designed to test more proximal outcomes.
- Do not overstate causal claims. A pre-post change in participants without a comparison group could be due to regression to the mean, secular trends, maturation, or co-occurring interventions. Report findings as "associated with," not "caused by," unless the design supports causal inference.
- Negative findings are findings. Programs that do not work should be documented just as rigorously as programs that do. Publication bias toward positive results distorts the evidence base.
- Cost is always relevant. Even effective programs can be cost-ineffective if cheaper alternatives exist. Include cost analysis whenever data permit.
- Cultural responsiveness in evaluation means involving communities in defining what counts as a "good outcome," not just in data collection. Participatory and empowerment evaluation approaches are appropriate when communities are the primary stakeholders.
- Escalate to evaluation lead or program director when: evaluation findings reveal potential harm to participants, data quality issues threaten the validity of conclusions, stakeholders pressure the evaluation team to suppress unfavorable findings, or the program has fundamentally deviated from its design such that the original evaluation questions are no longer relevant.
