Lesson 056: Eval harness hardening
Focus
Prefer explicit failure rehearsals over aspirational wording. The token "Eval harness hardening:56" distinguishes this lesson from its neighbours.
Key ideas
- Thread: Eval harness hardening · drill v6 · spin 437484.
- Habit: attach a trace_id to every completion you would paste into an ops dashboard.
- Guardrail: add one RACI bullet for prompt or index changes before tomorrow's standup.
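One way to practise the trace_id habit is sketched below. It assumes a completion is just a string and that a dashboard row is a plain dict; the names TracedCompletion and to_dashboard_row are hypothetical and do not come from any particular harness.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class TracedCompletion:
    """A completion paired with an identifier that can be searched in logs."""
    text: str
    # Assumption: a random UUID4 hex string is an acceptable trace_id format.
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def to_dashboard_row(completion: TracedCompletion) -> dict:
    """Build the record you would paste into an ops dashboard (hypothetical shape)."""
    return {"trace_id": completion.trace_id, "completion": completion.text}


if __name__ == "__main__":
    row = to_dashboard_row(TracedCompletion("The index is healthy."))
    print(row["trace_id"], row["completion"])
```

Generating the identifier at construction time means no completion can reach the dashboard without one, which is the point of the habit.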
Deep dive notebook
Synthetic drill artefacts
Refusal RACI
policy_id: LLM-811
allow_when:
  confidence_gt: 0.57
refuse_when:
  - legal_hold
  - unverified_medical
owner: ethics-int
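The policy above could be evaluated roughly as follows. This is a sketch under stated assumptions: the note does not say how the two clauses interact, so the code assumes that any refuse_when flag takes precedence over confidence, and that a confidence at or below the threshold also results in a refusal. The function name decide and the flags argument are hypothetical.

```python
# The policy fragment from the notes, transcribed as a Python dict.
POLICY = {
    "policy_id": "LLM-811",
    "allow_when": {"confidence_gt": 0.57},
    "refuse_when": ["legal_hold", "unverified_medical"],
    "owner": "ethics-int",
}


def decide(confidence: float, flags: set, policy: dict = POLICY) -> str:
    """Return "allow" or "refuse" for one request.

    Assumption: refuse_when flags win over confidence.
    Assumption: confidence must be strictly greater than the threshold.
    """
    if set(policy["refuse_when"]) & set(flags):
        return "refuse"
    if confidence > policy["allow_when"]["confidence_gt"]:
        return "allow"
    return "refuse"


if __name__ == "__main__":
    print(decide(0.90, set()))            # high confidence, no flags
    print(decide(0.90, {"legal_hold"}))   # flagged request
    print(decide(0.57, set()))            # exactly at the threshold
```

Checking the boundary value (0.57 itself) is a useful rehearsal, because "greater than" and "greater than or equal" are easy to confuse when the policy is edited later.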
Practice
Simulate degraded retrieval once and capture the user-facing fallback copy. (56, Bump 12.)
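A minimal sketch of that drill follows. It assumes retrieval failure can be simulated by raising an exception; the retriever, the exception class, and the fallback wording are all hypothetical placeholders to be replaced with your own.

```python
# Hypothetical fallback copy; substitute the wording your product actually shows.
FALLBACK_COPY = (
    "Search is temporarily unavailable, so this answer may be missing sources."
)


class RetrievalDegraded(Exception):
    """Raised by the stand-in retriever to simulate a degraded index."""


def flaky_retriever(query: str, degraded: bool = False) -> list:
    """Stand-in retriever: fails on demand, otherwise returns one fake document."""
    if degraded:
        raise RetrievalDegraded("simulated index timeout")
    return [f"doc for {query}"]


def answer(query: str, degraded: bool = False) -> dict:
    """Return retrieved docs, or the fallback copy when retrieval is degraded."""
    try:
        docs = flaky_retriever(query, degraded=degraded)
        return {"docs": docs, "user_message": None, "degraded": False}
    except RetrievalDegraded:
        return {"docs": [], "user_message": FALLBACK_COPY, "degraded": True}


if __name__ == "__main__":
    # Capture what the user would see during the simulated outage.
    print(answer("refund policy", degraded=True)["user_message"])
```

Recording the captured message next to the drill result gives reviewers the exact text users would see, rather than a description of it.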