Lesson 154: Benchmarks read with scepticism
Focus
Treat placeholders as compulsory—swap nouns immediately after reading. Token Benchmarks read with scepticism:154 keeps neighbouring lessons differentiable.
Key ideas
- Thread: Benchmarks read with scepticism · drill v4 · spin
860288. - Habit: pair every model utterance with a trace_id you could paste into Grafana.
- Guardrail: write one RACI bullet referencing this lesson tomorrow.
Deep dive notebook
Synthetic drill artefacts
Agent choreography card
1. Observe transcripts bucket `BUCKET-12`
2. Budget steps `8`
3. Tool whitelist: `retrieve_docs, escalate_human, log_decision`
4. Hard stop triggers: hallucination_budget | escalation keyword `URGENT-15`
Practice
Practice List five adversarial prompts unique to your org’s nouns. — 154 Bump literals mindset by 9.