brybrydataguy
Evaluation Resources

Analytics Agent
Evaluation Kit

Templates and examples for testing whether an AI data analyst is safe enough to trust.

Scorecard
Metric                 Score
Metric correctness     2 / 4
Grain & filters        1 / 4
Reasoning              2 / 4
Uncertainty            1 / 4
Business usefulness    2 / 4
Overall                1.6 / 4
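
The overall figure in this sample scorecard matches the unweighted mean of the five dimension scores. Here is a minimal Python sketch of that arithmetic, assuming a plain average. The kit's rubric is described as business-risk weighted, so its actual aggregation may differ, and the function name `overall_score` is hypothetical.

```python
from statistics import mean

# Dimension scores from the sample scorecard, each out of 4.
scorecard = {
    "Metric correctness": 2,
    "Grain & filters": 1,
    "Reasoning": 2,
    "Uncertainty": 1,
    "Business usefulness": 2,
}


def overall_score(scores: dict[str, int]) -> float:
    """Unweighted mean of the dimension scores, rounded to one decimal."""
    return round(mean(scores.values()), 1)


print(f"Overall: {overall_score(scorecard)} / 4")
```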

What's inside (v0.1)

Golden questions

50 ecommerce business questions, each with an expected answer pattern.

Scoring rubric

A business-risk-weighted scoring rubric across 6 dimensions.

Failure tags

15+ common failure modes with examples and severity.

Readiness checklist

Assess whether your data, metrics, and workflows are ready for an analytics agent.

Evaluated examples

5 fully worked examples with diagnosis.
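
One way to picture how these pieces could fit together: a golden question carries an expected answer pattern, an evaluated answer gets per-dimension scores plus failure tags, and the rubric weights dimensions by business risk. The sketch below is an assumption about the shape of that data, not the kit's actual schema. The class names, the sample question, the tag name, and the weights are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GoldenQuestion:
    question: str
    expected_pattern: str  # what a correct answer should contain


@dataclass
class Evaluation:
    question: GoldenQuestion
    scores: dict[str, int]  # dimension -> score out of 4
    failure_tags: list[str] = field(default_factory=list)


# Hypothetical weights: dimensions with higher business risk count more.
RISK_WEIGHTS = {"Metric correctness": 3.0, "Grain & filters": 2.0, "Reasoning": 1.0}


def weighted_score(scores: dict[str, int], weights: dict[str, float]) -> float:
    """Weighted mean of the scores; every scored dimension needs a weight."""
    total_weight = sum(weights[dim] for dim in scores)
    return sum(scores[dim] * weights[dim] for dim in scores) / total_weight


example = Evaluation(
    question=GoldenQuestion(
        question="Which channel had the highest conversion rate in May?",
        expected_pattern="One channel, with purchases divided by sessions per channel",
    ),
    scores={"Metric correctness": 2, "Grain & filters": 1, "Reasoning": 2},
    failure_tags=["wrong_grain"],  # hypothetical tag name
)

print(f"Weighted score: {weighted_score(example.scores, RISK_WEIGHTS):.2f} / 4")
```

With weights like these, a low score on a high-risk dimension pulls the total down more than the same score on a low-risk one, which is the general idea behind risk weighting.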

Built from the Analytics Agent Lab

Everything in this kit comes from real experiments, real failures, and real improvements in the lab.

View the lab →
lab_snippet
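-- Sum purchases and sessions per channel before dividing, so the rate matches the GROUP BY grain.
SELECT
  channel,
  SAFE_DIVIDE(SUM(purchases), SUM(sessions)) AS conversion_rate
FROM analytics.sessions
WHERE date BETWEEN '2024-05-01' AND '2024-05-31'
GROUP BY channel
ORDER BY conversion_rate DESC
LIMIT 1;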
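
The query above looks for the channel with the highest conversion rate in May 2024. Here is a minimal Python sketch of the same calculation on made-up rows, assuming the table holds several rows per channel with `purchases` and `sessions` counts. The rows and the `top_channel` helper are hypothetical. The sketch sums both columns per channel before dividing, because a per-channel rate is a ratio of totals rather than a ratio taken from any single row.

```python
from collections import defaultdict

# Made-up rows: (channel, purchases, sessions), several rows per channel.
rows = [
    ("email", 30, 400),
    ("email", 10, 600),
    ("paid_search", 45, 1000),
    ("organic", 20, 800),
]


def top_channel(rows):
    """Return (channel, conversion_rate) for the best-converting channel."""
    purchases = defaultdict(int)
    sessions = defaultdict(int)
    for channel, p, s in rows:
        purchases[channel] += p
        sessions[channel] += s
    # Like SAFE_DIVIDE, treat a zero-session channel as having no rate
    # instead of raising a division error; such channels are skipped.
    rates = {c: purchases[c] / sessions[c] for c in sessions if sessions[c]}
    return max(rates.items(), key=lambda item: item[1])


channel, rate = top_channel(rows)
print(f"{channel}: {rate:.3f}")
```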
