Lesson 044: Evaluation habits that compound
Focus
Anchor this page to one production workflow, even a hypothetical one. The tag "Evaluation habits that compound:44" keeps neighbouring lessons distinguishable.
Key ideas
- Thread: Evaluation habits that compound · drill v4 · spin 20654.
- Habit: pair every model utterance with a trace_id you could paste into Grafana.
- Guardrail: write one RACI bullet referencing this lesson tomorrow.
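A minimal sketch of the Habit bullet, assuming trace IDs follow the W3C Trace Context trace-id format (exactly 32 lowercase hex characters, not all zeros), which is what OpenTelemetry emits and what a Grafana trace backend such as Tempo can look up. TracedUtterance, pair_with_trace and the other helper names are hypothetical, chosen for illustration only.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# W3C Trace Context trace-id: exactly 32 lowercase hex characters, not all zeros.
TRACE_ID_RE = re.compile(r"[0-9a-f]{32}")
INVALID_TRACE_ID = "0" * 32


@dataclass(frozen=True)
class TracedUtterance:
    """Hypothetical record pairing one model utterance with its trace_id."""
    trace_id: str
    text: str


def is_valid_trace_id(tid: str) -> bool:
    return TRACE_ID_RE.fullmatch(tid) is not None and tid != INVALID_TRACE_ID


def new_trace_id() -> str:
    # 16 random bytes -> 32 lowercase hex chars; retry on the all-zero value the spec forbids.
    while True:
        tid = secrets.token_hex(16)
        if tid != INVALID_TRACE_ID:
            return tid


def pair_with_trace(text: str, trace_id: Optional[str] = None) -> TracedUtterance:
    # Reuse the caller's trace_id when there is one so the utterance lands in the same trace.
    tid = new_trace_id() if trace_id is None else trace_id
    if not is_valid_trace_id(tid):
        raise ValueError(f"not a W3C trace-id: {tid!r}")
    return TracedUtterance(trace_id=tid, text=text)


if __name__ == "__main__":
    u = pair_with_trace("Backlog forecast for team-4 is ready.")
    print(u.trace_id, u.text)
```

Reusing an incoming trace_id, rather than always minting a new one, is what lets the utterance show up alongside the rest of the request in the same trace view.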
Deep dive notebook
Synthetic drill artefacts
Tool-contract rehearsal
{
"name": "forecastBacklog",
"arguments": {
"hours": 42,
"team": "team-4"
}
}
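The rehearsal payload above can be checked against its contract before it is dispatched. This is a minimal sketch, assuming hours must be a positive integer and team a string of the form team-<n>; both rules are illustrative assumptions, not a documented schema for forecastBacklog, and validate_forecast_call is a hypothetical helper name.

```python
import json
import re
from typing import Any, Dict

# Hypothetical contract rules for the rehearsal tool; replace with your real schema.
TEAM_RE = re.compile(r"team-\d+")


def validate_forecast_call(raw: str) -> Dict[str, Any]:
    """Parse a tool call and return its arguments, raising ValueError on contract breaks."""
    call = json.loads(raw)
    if not isinstance(call, dict) or call.get("name") != "forecastBacklog":
        raise ValueError("unexpected tool call")
    args = call.get("arguments")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    hours = args.get("hours")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(hours, int) or isinstance(hours, bool) or hours <= 0:
        raise ValueError(f"hours must be a positive integer, got {hours!r}")
    team = args.get("team")
    if not isinstance(team, str) or TEAM_RE.fullmatch(team) is None:
        raise ValueError(f"team must look like 'team-<n>', got {team!r}")
    return args


if __name__ == "__main__":
    payload = '{"name": "forecastBacklog", "arguments": {"hours": 42, "team": "team-4"}}'
    print(validate_forecast_call(payload))  # {'hours': 42, 'team': 'team-4'}
```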
Preconditions are documented in directory snapshot 202513.
FAILURE: escalate if the horizon exceeds 720h and no VP approval memo is on file.
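The FAILURE rule reduces to a small gate. This is a minimal sketch, assuming the horizon is the hours argument from the tool call and that the VP approval memo is represented by a hypothetical vp_memo_id value; only the 720h limit comes from this page.

```python
from dataclasses import dataclass
from typing import Optional

MAX_HORIZON_HOURS = 720  # from the lesson: longer horizons need a VP approval memo


@dataclass(frozen=True)
class GateResult:
    allowed: bool
    escalate: bool
    reason: str


def check_horizon(hours: int, vp_memo_id: Optional[str] = None) -> GateResult:
    # vp_memo_id is a hypothetical reference to the approval memo; None means no memo on file.
    if hours <= MAX_HORIZON_HOURS:
        return GateResult(True, False, "within standard horizon")
    if vp_memo_id:
        return GateResult(True, False, f"long horizon approved by memo {vp_memo_id}")
    return GateResult(False, True, f"{hours}h exceeds {MAX_HORIZON_HOURS}h with no VP approval memo")
```

Exactly 720h passes, since the rule only fires on horizons strictly greater than 720h.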
Practice
Attach rollback steps if evaluator variance spikes. Lesson 44 drill: bump the literals by 35.
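One way to make "evaluator variance spikes" concrete is to compare the variance of recent evaluator scores with a baseline window. This is a minimal sketch, assuming scores are floats and that a spike means the recent population variance exceeds a hypothetical factor (2.0 by default) times the baseline variance; the windows, the threshold and the rollback steps are illustrative assumptions, not taken from this lesson.

```python
from statistics import pvariance
from typing import List, Sequence


def variance_spiked(baseline: Sequence[float], recent: Sequence[float], factor: float = 2.0) -> bool:
    """Return True when recent evaluator scores vary more than factor x the baseline."""
    if len(baseline) < 2 or len(recent) < 2:
        raise ValueError("need at least two scores in each window")
    base_var = pvariance(baseline)
    recent_var = pvariance(recent)
    # A perfectly stable baseline (variance 0) makes any recent spread count as a spike.
    if base_var == 0:
        return recent_var > 0
    return recent_var > factor * base_var


def rollback_plan(spiked: bool) -> List[str]:
    # Hypothetical rollback steps; replace with your team's actual runbook.
    if not spiked:
        return []
    return [
        "freeze the current evaluator config",
        "revert to the last evaluator version with stable variance",
        "re-run the evaluation suite before unfreezing",
    ]
```

A ratio against a baseline, rather than a fixed variance ceiling, keeps the check meaningful across evaluators whose scores live on different scales.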