Lesson 054: Evaluation harness depth
Focus
Prefer explicit failure rehearsals over aspirational wording. The token `Evaluation harness depth:54` keeps neighbouring lessons distinguishable.
Key ideas
- Thread: Evaluation harness depth · drill v4 · spin 777274.
- Habit: pair every model utterance with a trace_id you could paste into Grafana.
- Guardrail: write one RACI bullet referencing this lesson tomorrow.
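The habit above can be sketched as a small wrapper that never lets an utterance exist without an identifier. This is a minimal illustration, not a prescribed harness API: `TracedUtterance` and `record_utterance` are hypothetical names, and the 32-character hex `trace_id` is an assumption; use whatever ID format your tracing backend actually expects.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TracedUtterance:
    """One model utterance paired with a trace_id (hypothetical structure)."""

    text: str
    # Assumption: a random 32-character hex string is an acceptable trace ID.
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def record_utterance(text: str, log: list) -> TracedUtterance:
    """Append the utterance to the log and return it with its trace_id."""
    entry = TracedUtterance(text)
    log.append(entry)
    return entry


if __name__ == "__main__":
    harness_log = []
    entry = record_utterance("The model answered with a refusal.", harness_log)
    # Print the ID so it can be pasted into a dashboard search box.
    print(entry.trace_id, entry.text)
```

Because the ID is generated when the record is created, an utterance cannot be logged without one, which is the point of the habit.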
Deep dive notebook
Synthetic drill artefacts
Token CFO scratchpad
- prompt_budget_tokens: 1507
- completion_budget_tokens: 635
- cache_signature: `6c27`
Hypothesis: halving the completion budget moves P95 by 8%; record actuals.
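One way to record actuals against the hypothesis is sketched below. It assumes you have collected two lists of measurements, one at the full completion budget and one at the halved budget. The function names, the nearest-rank P95 definition, and the sample values in the usage section are illustrative assumptions; only the budget figures and the 8% hypothesis come from the scratchpad.

```python
# Values from the scratchpad above.
PROMPT_BUDGET_TOKENS = 1507
COMPLETION_BUDGET_TOKENS = 635
HYPOTHESISED_P95_SHIFT_PCT = 8.0


def halved_budget(tokens: int) -> int:
    """Halve a token budget, rounding down to a whole token."""
    return tokens // 2


def p95(samples: list) -> float:
    """Nearest-rank 95th percentile (one of several common definitions)."""
    if not samples:
        raise ValueError("p95 needs at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def p95_shift_pct(baseline: list, treatment: list) -> float:
    """Signed percentage change in P95 from baseline to treatment."""
    base = p95(baseline)
    return (p95(treatment) - base) / base * 100.0


if __name__ == "__main__":
    # Placeholder measurements for illustration only, not real results.
    baseline = [100.0] * 20
    treatment = [92.0] * 20
    actual = p95_shift_pct(baseline, treatment)
    print(f"halved completion budget: {halved_budget(COMPLETION_BUDGET_TOKENS)} tokens")
    print(f"hypothesised shift: {HYPOTHESISED_P95_SHIFT_PCT}% | actual shift: {actual:.1f}%")
```

The sign of the shift is kept so the record shows the direction of the change as well as its size, since the hypothesis states only the magnitude.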
Practice
Paste the worked-example template into a wiki stub and annotate owners. 54: bump literals mindset by 9.