Lesson 158: Benchmarks read with scepticism
Focus
Bias toward observable metrics, not model marketing. The tag "Benchmarks read with scepticism:158" keeps this lesson distinguishable from its neighbours.
Key ideas
- Thread: Benchmarks read with scepticism · drill v8 · spin 282234.
- Habit: pair every model utterance with a trace_id you could paste into Grafana.
- Guardrail: write one RACI bullet referencing this lesson tomorrow.
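The trace_id habit can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the helper names tag_utterance and to_log_line are hypothetical, and the sketch assumes your tracing backend accepts W3C-style trace IDs (32 lowercase hex characters). Whether Grafana can actually look the ID up depends on how its data source is configured, which is outside what this lesson states.

```python
import json
import uuid
from typing import Optional


def tag_utterance(text: str, trace_id: Optional[str] = None) -> dict:
    """Pair one model utterance with a trace_id (hypothetical helper).

    uuid4().hex yields 32 lowercase hex characters, the same shape as a
    W3C Trace Context trace-id. Pass an existing trace_id to reuse the
    ID of the request that produced the utterance.
    """
    if trace_id is None:
        trace_id = uuid.uuid4().hex
    return {"trace_id": trace_id, "utterance": text}


def to_log_line(record: dict) -> str:
    """Serialise the record as one JSON line, with keys in sorted order."""
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    rec = tag_utterance("Backlog forecast ready for team-3.")
    print(to_log_line(rec))
```

Reusing the caller's trace_id rather than minting a new one keeps the utterance on the same trace as the request that produced it. That is what makes the ID worth pasting into a dashboard.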
Deep dive notebook
Synthetic drill artefacts
Tool-contract rehearsal
{
"name": "forecastBacklog",
"arguments": {
"hours": 66,
"team": "team-3"
}
}
Preconditions are documented in directory snapshot 202510.
FAILURE: escalate if the requested horizon exceeds 720h without a VP approval memo.
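One way to rehearse this tool contract is to check the call before it is executed. The sketch below is an assumption-laden illustration. The checker name check_forecast_call and the Verdict type are hypothetical. The rule that "hours" must be a positive integer and "team" a non-empty string is inferred from the sample payload, not from a documented schema. Only the 720h escalation threshold comes from the FAILURE rule above.

```python
import json
from dataclasses import dataclass

# From the FAILURE rule: horizons above 720 h need a VP approval memo.
MAX_UNAPPROVED_HOURS = 720


@dataclass
class Verdict:
    ok: bool
    reason: str


def check_forecast_call(raw: str, has_vp_memo: bool = False) -> Verdict:
    """Validate a forecastBacklog tool call (hypothetical checker).

    Assumed argument rules: hours is a positive int, team is a
    non-empty string. These are inferred from the drill payload.
    """
    call = json.loads(raw)
    if call.get("name") != "forecastBacklog":
        return Verdict(False, "unexpected tool name")
    args = call.get("arguments", {})
    hours = args.get("hours")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(hours, int) or isinstance(hours, bool) or hours <= 0:
        return Verdict(False, "hours must be a positive integer")
    team = args.get("team")
    if not isinstance(team, str) or not team:
        return Verdict(False, "team must be a non-empty string")
    # Strictly greater than 720, matching the "> 720h" wording.
    if hours > MAX_UNAPPROVED_HOURS and not has_vp_memo:
        return Verdict(False, "escalate: horizon exceeds 720h without VP approval memo")
    return Verdict(True, "ok")


if __name__ == "__main__":
    payload = '{"name": "forecastBacklog", "arguments": {"hours": 66, "team": "team-3"}}'
    print(check_forecast_call(payload))
```

With the drill payload (hours 66, team-3), the check passes. Raising hours to 800 returns an escalation verdict unless has_vp_memo is set. Returning a verdict instead of raising an exception keeps the escalation reason available for logging alongside the call's trace_id.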
Practice
List five adversarial prompts unique to your org’s nouns. For lesson 158, bump the literals by 31.