Lesson 060: Eval harness hardening
Focus
Bias toward observable metrics: latency, cost, and escalation rate.
Key ideas
- Thread: Eval harness hardening · drill v10 · spin 567922.
- Habit: attach a trace_id to every completion you would paste into an ops dashboard.
- Guardrail: add one RACI bullet for prompt or index changes before tomorrow's standup.
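One way to make the trace_id habit concrete is to wrap every model call so the record carries its own identifier and the observable metrics together. The sketch below is illustrative only: `CompletionRecord`, `traced_completion`, `escalation_rate`, and the `model_fn` callable are hypothetical names, and the flat per-call cost is an assumption, not part of any particular harness.

```python
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class CompletionRecord:
    """One completion plus the fields a dashboard row would need (hypothetical schema)."""
    prompt: str
    output: str
    latency_s: float
    cost_usd: float
    escalated: bool = False
    # A fresh 32-character hex id is generated for every record.
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def traced_completion(prompt, model_fn, cost_per_call_usd=0.0):
    """Call model_fn(prompt) and return the output wrapped with timing and a trace_id."""
    start = time.perf_counter()
    output = model_fn(prompt)
    latency = time.perf_counter() - start
    return CompletionRecord(
        prompt=prompt,
        output=output,
        latency_s=latency,
        cost_usd=cost_per_call_usd,  # assumed flat cost per call
    )


def escalation_rate(records):
    """Fraction of records flagged as escalated; 0.0 for an empty list."""
    if not records:
        return 0.0
    return sum(r.escalated for r in records) / len(records)
```

With a stand-in model such as `lambda p: p.upper()`, `traced_completion("hi", ...)` returns a record whose `trace_id` can be pasted next to the latency and cost figures.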
Deep dive notebook
Synthetic drill artefacts
Refusal RACI
policy_id: LLM-1419
allow_when:
  confidence_gt: 0.57
refuse_when:
  - legal_hold
  - unverified_medical
owner: ethics-adv
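A minimal sketch of how a harness might evaluate the policy above. The policy says only when to allow and when to refuse; the precedence of refusal flags over confidence, and the "escalate" outcome for low-confidence requests, are assumptions added here for illustration. `decide` is a hypothetical helper.

```python
# The policy from the drill artefact, expressed as a plain dict.
POLICY = {
    "policy_id": "LLM-1419",
    "allow_when": {"confidence_gt": 0.57},
    "refuse_when": ["legal_hold", "unverified_medical"],
    "owner": "ethics-adv",
}


def decide(policy, confidence, flags):
    """Return 'refuse', 'allow', or 'escalate' for one request.

    Assumed semantics (not stated in the policy itself):
    - any refusal flag wins over a high confidence score;
    - confidence at or below the threshold escalates to the owner.
    """
    if set(flags) & set(policy["refuse_when"]):
        return "refuse"
    if confidence > policy["allow_when"]["confidence_gt"]:
        return "allow"
    return "escalate"
```

Note that the comparison is strict, matching the `confidence_gt` key: a score of exactly 0.57 does not pass.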
Practice
Simulate degraded retrieval once; capture user-facing fallback copy.
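The drill can be rehearsed offline with a retriever that fails on demand. This is a sketch under stated assumptions: `flaky_retriever`, `RetrievalDegraded`, `answer`, and the wording of `FALLBACK_COPY` are all placeholders to be replaced by your own retrieval client and approved copy.

```python
# Placeholder wording; swap in the copy your team has actually approved.
FALLBACK_COPY = (
    "I couldn't reach our knowledge base just now, so I can't answer "
    "reliably. Please try again shortly."
)


class RetrievalDegraded(Exception):
    """Raised by the simulated retriever to mimic a timeout or outage."""


def flaky_retriever(query, fail=True):
    """Hypothetical retriever: raises when fail=True, else returns one fake document."""
    if fail:
        raise RetrievalDegraded("simulated timeout")
    return ["doc about " + query]


def answer(query, retriever):
    """Answer from retrieved documents, or return the fallback copy if retrieval fails."""
    try:
        docs = retriever(query)
    except RetrievalDegraded:
        docs = []
    if not docs:
        # Degraded path: the user sees the fallback copy, flagged for the eval log.
        return {"text": FALLBACK_COPY, "fallback": True}
    return {"text": f"Based on {len(docs)} document(s): {docs[0]}", "fallback": False}
```

Running `answer("refund policy", flaky_retriever)` exercises the degraded path; the returned `text` is the copy to capture for review.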