brybrydataguy
Analytics Agent Evaluation

Analytics agents are easy to demo.
Hard to trust.

I build practical evaluation tools for AI data analysts: golden questions, SQL checks, failure taxonomies, and business-risk-weighted scorecards.

Helping founders, data teams, and operators ship analytics agents they can actually rely on.

eval_run_014
unsafe
{
  "question": "Why did revenue drop last week?",
  "sql_executes": true,
  "metric_correctness": 2,
  "grain_filters": 1,
  "failure_tags": [
    "wrong_denominator",
    "unsupported_root_cause"
  ],
  "severity": "high",
  "next_step": "Compare channels at session level"
}
Metric: Partial
Data source: Correct
Grain & filters: Incorrect
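
The eval_run_014 record above can be read as a small data structure. Here is a minimal sketch in Python of how such a record might be parsed and turned into a business-risk-weighted penalty. Several things here are assumptions rather than the actual tooling: the 1 to 3 score scale (inferred from the Incorrect / Partial / Correct labels on the card), the severity weights, the names EvalRecord, risk_weighted_penalty, and is_unsafe, and the rule used to flag a run as unsafe.

```python
import json
from dataclasses import dataclass, field

# Assumed label scale, inferred from the card; not confirmed by the source.
SCORE_LABELS = {1: "Incorrect", 2: "Partial", 3: "Correct"}

# Hypothetical severity weights; a real scorecard would tune these per business.
SEVERITY_WEIGHTS = {"low": 1.0, "medium": 2.0, "high": 4.0}


@dataclass
class EvalRecord:
    """One graded agent answer, mirroring the fields shown in eval_run_014."""
    question: str
    sql_executes: bool
    metric_correctness: int
    grain_filters: int
    failure_tags: list[str] = field(default_factory=list)
    severity: str = "low"
    next_step: str = ""


def parse_record(raw: str) -> EvalRecord:
    """Build an EvalRecord from a JSON string with matching keys."""
    return EvalRecord(**json.loads(raw))


def risk_weighted_penalty(record: EvalRecord) -> float:
    """Points lost against the top score, scaled by the severity weight."""
    top = max(SCORE_LABELS)
    lost = (top - record.metric_correctness) + (top - record.grain_filters)
    if not record.sql_executes:
        lost += top  # SQL that does not run loses a full dimension
    return lost * SEVERITY_WEIGHTS[record.severity]


def is_unsafe(record: EvalRecord) -> bool:
    """Hypothetical rule: high severity plus at least one failure tag."""
    return record.severity == "high" and bool(record.failure_tags)


RAW = """
{
  "question": "Why did revenue drop last week?",
  "sql_executes": true,
  "metric_correctness": 2,
  "grain_filters": 1,
  "failure_tags": ["wrong_denominator", "unsupported_root_cause"],
  "severity": "high",
  "next_step": "Compare channels at session level"
}
"""

record = parse_record(RAW)
print(SCORE_LABELS[record.metric_correctness])  # label for the metric score
print(risk_weighted_penalty(record))            # weighted penalty for this run
print(is_unsafe(record))                        # whether the sketch flags it
```

Under these assumptions, a run where the SQL executes can still carry a large penalty, because the weight comes from business severity rather than from whether the query ran.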
Practical evals. Real data. Fewer bad decisions.
50+ business questions tested
200+ agent runs and evaluations
10+ common failure modes cataloged

“Most analytics-agent failures won’t look like sci-fi hallucinations. They’ll look like subtle metric, grain, join, filter, and uncertainty mistakes that quietly produce bad business decisions.”

Latest Lab Notes

I build, break, and test analytics agents in public.


Want a sanity check on your analytics agent?

I offer a light advisory service for teams building or evaluating AI data analysts. No hype. Just practical help.