Lesson 060: Eval harness hardening
Focus
Document interfaces between retrieval, prompts, and policy engines. The token `Eval harness hardening:60` keeps this lesson distinguishable from neighbouring lessons.
Key ideas
- Thread: Eval harness hardening · drill v10 · spin 474531.
- Habit: attach a trace_id to every completion you would paste into an ops dashboard.
- Guardrail: add one RACI bullet for prompt or index changes before tomorrow's standup.
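The trace_id habit above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: `TracedCompletion` and `to_dashboard_row` are hypothetical names, and generating the id with `uuid4` is an assumption (a real harness would more likely reuse the id from its tracing system).

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class TracedCompletion:
    """A completion paired with the trace_id used to look it up later."""
    text: str
    # Assumption: a random hex id stands in for whatever the tracing stack issues.
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def to_dashboard_row(completion: TracedCompletion) -> dict:
    """Build the record that would be pasted into an ops dashboard."""
    if not completion.trace_id:
        # Refuse to export anything that cannot be traced back to a run.
        raise ValueError("refusing to export a completion without a trace_id")
    return {"trace_id": completion.trace_id, "text": completion.text}
```

Rejecting rows without an id at export time makes the habit enforceable instead of relying on memory.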
Deep dive notebook
Synthetic drill artefacts
Agent choreography
1. Observe bucket `TENANT-15`
2. Max steps `9`
3. Tools: `retrieve, escalate_human, log_decision`
4. Stop: spend cap | keyword `STOP-5`
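The four choreography settings above could be wired into a bounded loop roughly as follows. This is a sketch under stated assumptions: `run_agent`, `StepResult`, and the `policy` callable are hypothetical, and the default `spend_cap` of `1.0` is a placeholder because the drill does not give a value. Only the bucket name, step limit, tool list, and stop keyword come from the list.

```python
from dataclasses import dataclass

ALLOWED_TOOLS = ("retrieve", "escalate_human", "log_decision")
MAX_STEPS = 9
STOP_KEYWORD = "STOP-5"


@dataclass
class StepResult:
    tool: str      # which tool the policy chose for this step
    output: str    # text returned by the tool
    cost: float    # spend attributed to this step (units are an assumption)


def run_agent(policy, bucket="TENANT-15", spend_cap=1.0):
    """Call policy(bucket, history) until a stop condition fires.

    Returns (history, reason), where reason is one of
    "stop_keyword", "spend_cap", or "max_steps".
    """
    history = []
    spent = 0.0
    for _ in range(MAX_STEPS):
        step = policy(bucket, history)
        if step.tool not in ALLOWED_TOOLS:
            raise ValueError(f"tool not allowed: {step.tool}")
        history.append(step)
        spent += step.cost
        if STOP_KEYWORD in step.output:
            return history, "stop_keyword"
        if spent >= spend_cap:
            return history, "spend_cap"
    return history, "max_steps"
```

Returning the stop reason alongside the history lets an eval harness assert on why a run ended, not only on what it produced.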
Practice
Attach rollback steps if cost-per-request crosses your guardrail. (60 Bump 31.)
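One way to approach this exercise is a small check that returns the rollback steps only when the guardrail is crossed. The function names, the example steps, and every number used with them are hypothetical. The sketch also assumes cost-per-request means total cost divided by request count.

```python
def cost_per_request(total_cost: float, request_count: int) -> float:
    """Average cost of one request over the measured window."""
    if request_count <= 0:
        raise ValueError("request_count must be positive")
    return total_cost / request_count


def rollback_plan(total_cost, request_count, guardrail, steps):
    """Return the rollback steps if the guardrail is crossed, else an empty list."""
    if cost_per_request(total_cost, request_count) > guardrail:
        return list(steps)
    return []
```

For example, `rollback_plan(12.0, 100, 0.10, ["revert prompt", "restore index"])` returns both steps, because 0.12 per request is above the 0.10 guardrail.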