Evaluation Resources
Analytics Agent Evaluation Kit
Templates and examples for testing whether an AI data analyst is safe enough to trust.
Scorecard
Metric                 Score
Metric correctness     2 / 4
Grain & filters        1 / 4
Reasoning              2 / 4
Uncertainty            1 / 4
Business usefulness    2 / 4
Overall                1.6 / 4
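The overall figure in the scorecard matches the unweighted mean of the five dimension scores. A minimal sketch of that arithmetic, assuming this is how the overall score is derived (the function name `overall_score` and the one-decimal rounding are illustrative assumptions, not part of the kit):

```python
# Dimension scores copied from the scorecard above (each out of 4).
scores = {
    "Metric correctness": 2,
    "Grain & filters": 1,
    "Reasoning": 2,
    "Uncertainty": 1,
    "Business usefulness": 2,
}


def overall_score(dimension_scores):
    """Unweighted mean of the dimension scores, rounded to one decimal.

    Assumption: the scorecard's overall value is a plain average.
    """
    return round(sum(dimension_scores.values()) / len(dimension_scores), 1)


print(f"Overall: {overall_score(scores)} / 4")
```

With the scores above, the sum is 8 across 5 dimensions, which gives the 1.6 shown in the scorecard.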
What's inside (v0.1)
Golden questions
50 ecommerce business questions, each with an expected answer pattern.
Scoring rubric
Business-risk-weighted scoring rubric across 6 dimensions.
Failure tags
15+ common failure modes, each with examples and a severity level.
Readiness checklist
Assess if your data, metrics, and workflows are ready.
Evaluated examples
5 fully worked examples with diagnosis.
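To illustrate what a business-risk-weighted rubric could look like in practice, here is a minimal sketch of a weighted mean. The weights below are hypothetical values chosen for illustration, not the kit's actual weights. Only the five dimensions visible in the scorecard are used, since the sixth is not named on this page.

```python
# Hypothetical weights: a higher weight means a mistake on that dimension
# is assumed to cost the business more. These are NOT the kit's real weights.
RISK_WEIGHTS = {
    "Metric correctness": 3.0,
    "Grain & filters": 3.0,
    "Reasoning": 1.0,
    "Uncertainty": 2.0,
    "Business usefulness": 1.0,
}

# Dimension scores taken from the scorecard on this page (each out of 4).
example_scores = {
    "Metric correctness": 2,
    "Grain & filters": 1,
    "Reasoning": 2,
    "Uncertainty": 1,
    "Business usefulness": 2,
}


def weighted_score(scores, weights):
    """Weighted mean of dimension scores; every scored dimension needs a weight."""
    missing = set(scores) - set(weights)
    if missing:
        raise KeyError(f"no weight for: {sorted(missing)}")
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


print(round(weighted_score(example_scores, RISK_WEIGHTS), 2))
```

Under these assumed weights, low scores on the heavily weighted dimensions pull the result below the unweighted mean, which is the point of weighting by business risk.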
Built from the Analytics Agent Lab
Everything in this kit comes from real experiments, real failures, and real improvements in the lab.
View the lab →
SELECT channel,
SAFE_DIVIDE(purchases, sessions) AS conversion_rate
FROM analytics.sessions
WHERE date BETWEEN '2024-05-01' AND '2024-05-31'
GROUP BY channel
ORDER BY conversion_rate DESC
LIMIT 1;
Want early access to new templates and examples?
Join the list and I'll send updates as the lab ships.
No spam. Unsubscribe anytime.