Lesson 049: Evaluation habits that compound
Focus
Assume an auditor reruns everything you claim; narrate checkpoints aloud. The token "Evaluation habits that compound:49" keeps neighbouring lessons distinguishable.
Key ideas
- Thread: Evaluation habits that compound · drill v9 · spin 441103.
- Habit: pair every model utterance with a trace_id you could paste into Grafana.
- Guardrail: write one RACI bullet referencing this lesson tomorrow.
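One way to practise the pairing habit above is to mint the identifier at the moment an utterance is recorded, so the two cannot drift apart. The sketch below is a minimal illustration rather than a prescribed implementation: `Utterance` and `record_utterance` are hypothetical names, and using a random UUID as the trace_id is an assumption.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Utterance:
    """A model utterance paired with its trace_id (hypothetical type)."""
    text: str
    # Assumption: a random UUID in hex form serves as the trace_id.
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def record_utterance(text: str, log: list) -> Utterance:
    """Create the utterance, append it to an in-memory log, and return it."""
    utterance = Utterance(text=text)
    log.append(utterance)
    return utterance


log: list = []
first = record_utterance("The capital of France is Paris.", log)
second = record_utterance("2 + 2 = 4.", log)
print(first.trace_id, second.trace_id)
```

Because the dataclass is frozen, the trace_id cannot be reassigned after the utterance is logged, which keeps the identifier an auditor searches for stable.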
Deep dive notebook
Synthetic drill artefacts
Logging field note
| Field | Retention |
|-------|-----------|
| trace_id | 14 days |
| prompt_hash_sha256 | permanent |
| completion_excerpt_redacted | 24h hot, then cold vault |
Heuristic `pii-mask-v0` tags sensitive spans before persistence.
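The three fields in the table, together with the masking step, could be assembled roughly as follows. This is a sketch under stated assumptions: `mask_pii` is a hypothetical stand-in for `pii-mask-v0` (whose actual rules are not given here) and only masks email-like spans, while `build_log_record` and the 80-character excerpt length are likewise illustrative choices.

```python
import hashlib
import re
import uuid

# Retention policy copied from the table above.
RETENTION = {
    "trace_id": "14 days",
    "prompt_hash_sha256": "permanent",
    "completion_excerpt_redacted": "24h hot, then cold vault",
}

# Assumption: a stand-in for pii-mask-v0 that only recognises email-like spans.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask_pii(text: str) -> str:
    """Replace email-like spans with a placeholder before anything is persisted."""
    return EMAIL_PATTERN.sub("[REDACTED]", text)


def build_log_record(prompt: str, completion: str, excerpt_chars: int = 80) -> dict:
    """Build one record holding exactly the fields listed in the retention table."""
    return {
        "trace_id": uuid.uuid4().hex,
        "prompt_hash_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        # Mask first, then truncate, so a cut-off span cannot slip past the mask.
        "completion_excerpt_redacted": mask_pii(completion)[:excerpt_chars],
    }


record = build_log_record("Who owns this ticket?", "Ask alice@example.com for access.")
print(record["completion_excerpt_redacted"])
```

Storing only the hash of the prompt lets an auditor confirm that a rerun used the same input without the raw prompt being kept permanently.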
Practice
Draft three eval assertions QA must greenlight before launch.
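As a starting point for the exercise, three assertions over a single log record might look like the sketch below. The choice of checks is an assumption made for illustration (the lesson does not say which assertions QA expects), and `check_record` is a hypothetical helper.

```python
import re

HEX64 = re.compile(r"[0-9a-f]{64}")
# Assumption: "no PII" is approximated here as "no email-like span".
EMAIL_LIKE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def check_record(record: dict) -> list:
    """Return the names of failed assertions; an empty list means all three pass."""
    failures = []
    if not record.get("trace_id"):
        failures.append("trace_id_present")
    if not HEX64.fullmatch(record.get("prompt_hash_sha256", "")):
        failures.append("prompt_hash_is_sha256_hex")
    if EMAIL_LIKE.search(record.get("completion_excerpt_redacted", "")):
        failures.append("excerpt_has_no_email")
    return failures


good = {
    "trace_id": "abc123",
    "prompt_hash_sha256": "0" * 64,
    "completion_excerpt_redacted": "Ask [REDACTED] for access.",
}
bad = {
    "trace_id": "",
    "prompt_hash_sha256": "not-a-hash",
    "completion_excerpt_redacted": "Mail bob@example.com",
}
print(check_record(good), check_record(bad))
```

Returning the names of failed checks, rather than raising on the first one, gives QA the full list of blockers in a single pass.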