When training ends, most organizations measure satisfaction and report "4.3 satisfaction: well received." Treating that score as the result is the deepest misconception in training evaluation.
Satisfaction right after training (Lv1) barely correlates with training effectiveness. Worse, the relationship can invert: the more satisfying the training, the less the job actually changes. A session with a great speaker, lively exercises, and a "that was time well spent" feeling can be a success as entertainment while producing no behavior change (Lv3).
This article keeps the classic Kirkpatrick four-level model as its foundation but demotes satisfaction from the result-metric seat. It rebuilds the questions around a field reality: training succeeds or fails based not on how good the session was, but on whether the workplace lets people apply it.
Kirkpatrick's four levels
First published in 1959 and still in use more than six decades later.
| Level | What it measures | When | Link to outcome |
|---|---|---|---|
| Lv1: Reaction | Participant satisfaction | Right after | Weak (caution) |
| Lv2: Learning | Knowledge / skill acquisition | Right after + 1 week later | Moderate |
| Lv3: Behavior | Behavior change on the job | 1–3 months later | Strong (the real one) |
| Lv4: Results | Business impact | 6–12 months later | The ultimate goal |
Most evaluations stop at Lv1. But the real value of training only becomes visible at Lv3 — and since Lv1 and Lv3 correlate weakly, judging success on Lv1 alone is dangerous.
Lv1 (Reaction) — use it as a floor, not a result
You don't need to discard satisfaction. You change its role. Use Lv1 as a floor — "below this, something's wrong" — not as "high = success."
Q1. How satisfied were you with today's training? (5-point)
Q2. Was the instructor clear? (5-point)
Q3. Will the content apply to your work? (5-point)
Q4. Any suggested improvements? (open text, optional)
The most predictive item here is Q3, "will it apply to your work?" Q3 predicts later Lv3 (behavior change) better than raw satisfaction (Q1). In the report, headline Q3, not Q1.
Lv2 (Learning) — measure "can do," not "feel confident"
Q5. Please answer the following about the training content
(3–5 questions tied to the training theme)
Q6. Where do you feel confident applying what you learned? (select all that apply)
□ The X procedure
□ The Y decision criteria
□ The Z tool operation
□ Not confident on anything yet
Q7. What would you like to revisit? (open text, optional)
The key is to always include the Q5 knowledge test. "Feels like I get it" and "can actually do it" are different. Self-reported confidence alone doesn't measure Lv2.
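The split between "can do" and "feel confident" can be sketched as follows. This is a minimal illustration, not a fixed format: the answer key, the question IDs, and the function name are hypothetical, and the 80% pass mark is borrowed from the reporting guidance later in this article.

```python
from dataclasses import dataclass

# Hypothetical answer key for the Q5 knowledge check (question id -> correct choice).
ANSWER_KEY = {"q5_1": "b", "q5_2": "d", "q5_3": "a", "q5_4": "c"}
PASS_MARK = 0.80  # assumption: the 80% passing bar used in this article


@dataclass
class Lv2Result:
    accuracy: float       # share of Q5 items answered correctly
    passed: bool          # accuracy >= PASS_MARK
    overconfident: bool   # reports confidence on Q6 but did not pass Q5


def score_lv2(answers: dict[str, str], confident_items: list[str]) -> Lv2Result:
    """Score Q5 against the key and compare it with Q6 self-reported confidence."""
    correct = sum(1 for qid, key in ANSWER_KEY.items() if answers.get(qid) == key)
    accuracy = correct / len(ANSWER_KEY)
    passed = accuracy >= PASS_MARK
    # "Feels like I get it" without "can actually do it".
    overconfident = bool(confident_items) and not passed
    return Lv2Result(accuracy, passed, overconfident)
```

A respondent who ticks "The X procedure" on Q6 but gets three of four Q5 items right would come back as `overconfident=True`, which is the case self-report alone would miss.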
Lv3 (Behavior) — the real one. Measure what's blocking application
Send this 1–3 months after training. It is the most important survey of the set.
Q8. Did you apply the training content in your work?
○ Many times
○ A few times
○ No opportunity yet
○ Wanted to but couldn't
Q9. What changed in your work after applying it? (open text)
Q10. What's blocking application? (select all that apply)
□ Workload — no time
□ No opportunity
□ Forgot the content
□ Lack of support from manager / colleagues
□ Nothing is blocking me
The most valuable item is Q10. When "wanted to but couldn't," "no manager support," and "no opportunity" dominate, that isn't a training failure. It's a problem outside the training: the workplace isn't allowing application.
This is the heart of the article. However excellent the training, if the returning participant has "no time to try it" or "got stopped by their manager," behavior won't change. Training succeeds or fails based on the post-training work environment, not the quality of the session itself. Q10 is the one question that makes that environmental problem visible.
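One way to make the training-versus-workplace distinction concrete is to tally Q10 answers by kind. The sketch below is an assumption-laden illustration: the option keys and the grouping into "environmental" and "training" blockers are hypothetical choices based on the reasoning above, not part of any survey tool.

```python
# Assumption: short keys standing in for the Q10 options.
# Environmental = blockers outside the training itself.
ENVIRONMENTAL = {"no_time", "no_opportunity", "no_support"}
# Training-side = blockers the training design could address.
TRAINING = {"forgot_content"}


def blocker_summary(responses: list[list[str]]) -> dict[str, float]:
    """Return the share of respondents reporting each kind of blocker.

    Each response is the list of Q10 options one person selected.
    Shares can sum to more than 1 because Q10 is multi-select.
    """
    total = len(responses)
    if total == 0:
        return {"environmental": 0.0, "training": 0.0, "none": 0.0}
    env = sum(1 for r in responses if ENVIRONMENTAL & set(r))
    trn = sum(1 for r in responses if TRAINING & set(r))
    # "none" = selected only "nothing", or selected no option at all.
    none = sum(1 for r in responses if not (set(r) - {"nothing"}))
    return {
        "environmental": env / total,
        "training": trn / total,
        "none": none / total,
    }


def next_move(summary: dict[str, float]) -> str:
    """Suggest where to act first, given the blocker shares."""
    if summary["environmental"] > summary["training"]:
        return "fix the workplace"
    return "revise the training"
```

If the environmental share is clearly larger than the training share, the result points at the workplace rather than the session.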
Lv4 (Results) — correlate with business KPIs
Q11. Since training, have you seen movement in:
- Deal conversion rate (for sales training)
- Complaint rate (for customer-facing training)
- Error rate (for operations training)
Q12. Training ROI thoughts (open text, optional)
Self-report gives directional signal; where possible, cross-check against actual KPI data.
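A rough sketch of that cross-check, assuming you can export a KPI (for example, error rate) before and after training. The comparison against an untrained group is an added assumption, not something this article prescribes; it is a simple guard against crediting the training for a change that happened everywhere. All function names are hypothetical.

```python
from statistics import mean


def kpi_change(before: list[float], after: list[float]) -> float:
    """Mean KPI after training minus mean KPI before training."""
    return mean(after) - mean(before)


def adjusted_change(
    trained_before: list[float],
    trained_after: list[float],
    comparison_before: list[float],
    comparison_after: list[float],
) -> float:
    """Change in the trained group minus change in an untrained comparison group."""
    return kpi_change(trained_before, trained_after) - kpi_change(
        comparison_before, comparison_after
    )
```

With small groups, treat the result as a directional signal, in the same way as the self-report.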
Per-training-type design
Onboarding
- Focus Lv1–Lv2; send right after + 3 months later
- Add manager / mentor evaluation to surface Lv3
Manager training
- Focus Lv2–Lv3; reinforce Lv3 with subordinate 360 feedback
- At 6 months, layer in team-level change (attrition, pulse-survey scores)
Skill training (engineering, sales, etc.)
- Lv2 skill check is mandatory; correlate Lv4 with business KPIs
Compliance training
- Lv1–Lv2 is usually sufficient; set a passing threshold on the Lv2 test
Distribution timing — don't stop at one send
Sending once right after training is incomplete: you capture only the immediate levels (Lv1 and, with a knowledge test, Lv2), and the real one (Lv3) stays invisible forever. Send at least three times.
| When | Levels measured |
|---|---|
| Right after | Lv1 (reaction) + Lv2 (learning) |
| 1 month later | Lv3 (behavior, early signal) |
| 3 months later | Lv3 (behavior established + blockers) |
To avoid burning out participants across rounds:
- Round 2 onwards: ≤ 5 questions
- Open with "based on your feedback, we changed X"
- Return aggregated results to participants
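The three-send cadence and the follow-up length limit can be expressed as a small schedule helper. This is a sketch under stated assumptions: "1 month" and "3 months" are approximated as 30 and 90 days, and the function names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: "1 month" and "3 months" approximated as 30 and 90 days.
SENDS = [
    (0, "Lv1 reaction + Lv2 learning"),
    (30, "Lv3 behavior, early signal"),
    (90, "Lv3 behavior established + blockers"),
]
MAX_FOLLOWUP_QUESTIONS = 5  # round 2 onwards: 5 questions or fewer


def send_schedule(training_day: date) -> list[tuple[date, str]]:
    """Return (send date, levels measured) for each of the three sends."""
    return [(training_day + timedelta(days=offset), label) for offset, label in SENDS]


def within_question_limit(round_number: int, question_count: int) -> bool:
    """Round 1 is unrestricted; later rounds must stay within the limit."""
    return round_number == 1 or question_count <= MAX_FOLLOWUP_QUESTIONS
```

For a session held on 15 January 2024, `send_schedule(date(2024, 1, 15))` yields sends on 15 January, 14 February, and 14 April.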
Reading the results — don't start from satisfaction
Changing the order of the report alone improves the quality of training evaluation. Report in this order:
- Application rate (Lv3): the share of respondents who answered "many times" or "a few times" on Q8. 30% or more is the bar; 50% or more is excellent.
- Blockers (Lv3, Q10): if environmental factors dominate, the next move is fixing the workplace, not the training.
- "Will apply to work" (Lv1, Q3): the leading indicator that predicts Lv3.
- Knowledge test accuracy (Lv2): 80% or more is passing.
- Satisfaction (Lv1, Q1): a floor only; just watch for sudden drops.
Training with high satisfaction but a low application rate was "fun, but nothing changed." Catching that combination is what protects training ROI.
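The application-rate arithmetic and the "fun, but nothing changed" check can be sketched as below. The answer keys are hypothetical stand-ins for the Q8 options, and the 4.0 threshold for "high satisfaction" is an assumption chosen for illustration; the 30% and 50% bars come from the list above.

```python
APPLIED = {"many_times", "a_few_times"}  # hypothetical keys for the Q8 options


def application_rate(q8_answers: list[str]) -> float:
    """Share of respondents who applied the training at least a few times."""
    if not q8_answers:
        return 0.0
    return sum(1 for a in q8_answers if a in APPLIED) / len(q8_answers)


def rate_label(rate: float) -> str:
    """Map an application rate onto the 30% / 50% bars."""
    if rate >= 0.50:
        return "excellent"
    if rate >= 0.30:
        return "meets the bar"
    return "below the bar"


def fun_but_nothing_changed(
    mean_satisfaction: float,
    rate: float,
    satisfaction_high: float = 4.0,  # assumption: "high" on a 5-point scale
    rate_bar: float = 0.30,
) -> bool:
    """True when satisfaction is high but the application rate misses the bar."""
    return mean_satisfaction >= satisfaction_high and rate < rate_bar
```

A session scoring 4.3 on satisfaction with a 20% application rate would be flagged; the same score with a 40% application rate would not.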
Frequently asked questions
What questions should a post-training survey include?
A good post-training survey covers all four Kirkpatrick levels over time, not just satisfaction. Right after training, ask reaction questions (satisfaction, instructor clarity, and — most predictive — "will this apply to your work?") plus a short knowledge check. One to three months later, ask the behavior questions: did you apply it, what changed, and what's blocking application. The blocker question is the single most valuable item, because it reveals whether the gap is the training or the workplace.
Can I get sample post-training survey questions?
Yes — this article includes a full set of sample post-training survey questions organized by level: Lv1 reaction (Q1–Q4), Lv2 learning with a knowledge check (Q5–Q7), Lv3 behavior including the "what's blocking application" question (Q8–Q10), and Lv4 results tied to business KPIs (Q11–Q12). Copy the blocks that match your training type from the sections above.
Is there a post-training survey template I can reuse?
Use the Lv1–Lv2 block right after the session as your base template, then duplicate the form for the 1-month and 3-month follow-ups (swapping in the Lv3 behavior questions). Keeping the reaction questions identical every time is what lets you compare across sessions. Repoan's post-training feedback and manager training templates ship with this structure ready to use.
How long after training should I send the feedback form?
Send the reaction/learning form immediately, while memory is fresh and response rates are highest. Then send at least two more: one month later for an early behavior signal, and three months later for established behavior change and blockers. A single same-day send captures only Lv1 and Lv2, and Lv1 correlates weakly with whether the training actually worked.
Summary
Measuring training only at "satisfaction right after" isn't just insufficient — it's misleading.
- Demote satisfaction (Lv1) from result metric; use it as a floor
- The real one is Lv3 (behavior change), especially what's blocking application
- If the blockers are environmental, the next move is workplace change, not more training
- Capture Lv3 with at least 3 sends: right after, 1 month, 3 months
Repoan's post-training feedback and manager training templates ship with Lv1–Lv2 questions ready to use.
For the 1-month and 3-month follow-ups, duplicate the form for each send. Once responses accumulate, AI reports (see AI-driven response analysis) auto-classify "why I couldn't apply it," which makes the training-vs-environment distinction a sustainable operating habit.