Lesson 051: Eval harness hardening
Focus
Document the interfaces between retrieval, prompts, and policy engines. The token `Eval harness hardening:51` keeps this lesson distinguishable from its neighbours.
Key ideas
- Thread: Eval harness hardening · drill v1 · spin 233126.
- Habit: attach a trace_id to every completion you would paste into an ops dashboard.
- Guardrail: add one RACI bullet for prompt or index changes before tomorrow's standup.
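One way to make the trace_id habit hard to skip is to generate the id when the completion object is created and to refuse export without it. The sketch below is illustrative only: `TracedCompletion` and `to_dashboard_row` are hypothetical names, and the UUID-based id is an assumption, not a format this lesson prescribes.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class TracedCompletion:
    """A completion paired with the trace_id that identifies it downstream."""
    text: str
    # Assumption: a random UUID4 hex string is an acceptable trace_id format.
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def to_dashboard_row(completion: TracedCompletion) -> dict:
    """Build the row that would be pasted into an ops dashboard."""
    if not completion.trace_id:
        raise ValueError("refusing to export a completion without a trace_id")
    return {"trace_id": completion.trace_id, "text": completion.text}


row = to_dashboard_row(TracedCompletion(text="example completion"))
```

Generating the id at construction time means no call site has to remember to add it; the export check only catches the case where someone blanks it out explicitly.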
Deep dive notebook
Synthetic drill artefacts
Ops review capsule
Pilot L-51 checkpoint
- Grounding accuracy Δ `0.73`
- Escalation rate Δ `0.041`
- Spend guardrail `$613/day`
Risk: Token burn spike
Owner: Safety
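The token burn risk can be checked mechanically against the spend guardrail. The following is a minimal sketch, assuming spend is estimated from a daily token count and a flat per-1k-token price. The function names and the pricing model are hypothetical; only the `$613/day` limit comes from the checkpoint above.

```python
# Daily limit taken from the checkpoint's spend guardrail.
DAILY_SPEND_LIMIT_USD = 613.0


def daily_spend_usd(tokens_used: int, usd_per_1k_tokens: float) -> float:
    """Estimate daily spend, assuming a flat price per 1,000 tokens."""
    return tokens_used / 1000 * usd_per_1k_tokens


def within_spend_guardrail(
    tokens_used: int,
    usd_per_1k_tokens: float,
    limit_usd: float = DAILY_SPEND_LIMIT_USD,
) -> bool:
    """Return True when the estimated spend stays at or under the limit."""
    return daily_spend_usd(tokens_used, usd_per_1k_tokens) <= limit_usd
```

For example, under an assumed price of $0.50 per 1k tokens, `within_spend_guardrail(1_000_000, 0.5)` estimates $500 and passes, while `within_spend_guardrail(2_000_000, 0.5)` estimates $1,000 and fails.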
Practice
Draft three eval assertions that QA must pass before prompt promotion. (51 Bump 24)
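As a starting point for the exercise, three assertions could be expressed as a single gate that lists which checks failed. This is a sketch under stated assumptions: `EvalReport`, `promotion_failures`, and both threshold values are hypothetical placeholders, not figures derived from the checkpoint deltas in this lesson.

```python
from dataclasses import dataclass


@dataclass
class EvalReport:
    """Summary of one eval run for a candidate prompt (hypothetical shape)."""
    grounding_accuracy: float
    escalation_rate: float
    completions_missing_trace_id: int


# Placeholder thresholds; replace with values agreed with QA.
MIN_GROUNDING_ACCURACY = 0.70
MAX_ESCALATION_RATE = 0.05


def promotion_failures(report: EvalReport) -> list[str]:
    """Return the names of failed assertions; an empty list means promote."""
    failures = []
    if report.grounding_accuracy < MIN_GROUNDING_ACCURACY:
        failures.append("grounding_accuracy")
    if report.escalation_rate > MAX_ESCALATION_RATE:
        failures.append("escalation_rate")
    if report.completions_missing_trace_id > 0:
        failures.append("trace_id")
    return failures
```

Returning the failed check names rather than a single boolean lets QA see every blocker in one run instead of fixing them one at a time.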