Lesson 041: Evaluation habits that compound
Focus
Bias toward observable metrics, not model marketing. Token Evaluation habits that compound:41 keeps neighbouring lessons differentiable.
Key ideas
- Thread: Evaluation habits that compound · drill v1 · spin
676164. - Habit: pair every model utterance with a trace_id you could paste into Grafana.
- Guardrail: write one RACI bullet referencing this lesson tomorrow.
Deep dive notebook
Synthetic drill artefacts
Tool-contract rehearsal
{
"name": "forecastBacklog",
"arguments": {
"hours": 24,
"team": "team-4"
}
}
Preconditions documented in directory snapshot 202510
FAILURE: escalate if horizons > 720h without VP approval memo.
Practice
Practice List five adversarial prompts unique to your org’s nouns. — 41 Bump literals mindset by 32.