brybrydataguy analytics-agent lab

Analytics Agent Evaluation

Analytics agents are easy to demo.
Hard to trust.

I build practical evaluation tools for AI data analysts: golden questions, SQL checks, failure taxonomies, and business-risk-weighted scorecards.

Explore the Lab →

Get the checklist (free)

Helping founders, data teams, and operators ship analytics agents they can actually rely on.

eval_run_014

unsafe

{
  "question": "Why did revenue drop last week?",
  "sql_executes": true,
  "metric_correctness": 2,
  "grain_filters": 1,
  "failure_tags": [
    "wrong_denominator",
    "unsupported_root_cause"
  ],
  "severity": "high",
  "next_step": "Compare channels at session level"
}

Metric: Partial

Data source: Correct

Grain & filters: Incorrect
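A minimal sketch of how a record like eval_run_014 might be turned into the labels and verdict shown on the card. The 1 to 3 rubric scale and the verdict rule are assumptions made for illustration; the page does not define either, and `summarize` is a hypothetical helper.

```python
# The record is copied from the eval_run_014 card.
record = {
    "question": "Why did revenue drop last week?",
    "sql_executes": True,
    "metric_correctness": 2,
    "grain_filters": 1,
    "failure_tags": ["wrong_denominator", "unsupported_root_cause"],
    "severity": "high",
    "next_step": "Compare channels at session level",
}

# ASSUMPTION: scores run from 1 to 3. The page does not state the scale.
RUBRIC_LABELS = {1: "Incorrect", 2: "Partial", 3: "Correct"}


def summarize(rec: dict) -> dict:
    """Map rubric scores to labels and derive a verdict (hypothetical rule)."""
    labels = {
        "Metric": RUBRIC_LABELS[rec["metric_correctness"]],
        "Grain & filters": RUBRIC_LABELS[rec["grain_filters"]],
    }
    # ASSUMPTION: a run is "unsafe" when it is high severity and carries at
    # least one failure tag, even though the SQL executed without error.
    unsafe = rec["severity"] == "high" and len(rec["failure_tags"]) > 0
    return {"labels": labels, "verdict": "unsafe" if unsafe else "review"}


summary = summarize(record)
print(summary["labels"], summary["verdict"])
```

Under these assumptions, a run can be flagged even when `sql_executes` is true, because the verdict depends on the rubric scores and failure tags rather than on whether the query ran.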

Practical evals. Real data. Fewer bad decisions.

50+

Business questions tested

200+

Agent runs and evaluations

10+

Common failure modes cataloged

“Most analytics-agent failures won’t look like sci-fi hallucinations. They’ll look like subtle metric, grain, join, filter, and uncertainty mistakes that quietly produce bad business decisions.”

Latest Lab Notes

I build, break, and test analytics agents in public.

See all notes →

The 10 ways my agent got revenue wrong

Wrong denominators, join multiplication, and more subtle SQL generation errors.
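Join multiplication is easy to reproduce. The sketch below uses an in-memory SQLite database with two hypothetical tables, `orders` and `order_items`, invented for this example. Both queries execute cleanly, but the second one counts each order's revenue once per line item.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema: revenue lives at the order grain.
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, revenue REAL);
    CREATE TABLE order_items (order_id INTEGER, sku TEXT);
    INSERT INTO orders VALUES (1, 100.0), (2, 50.0);
    INSERT INTO order_items VALUES (1, 'A'), (1, 'B'), (1, 'C'), (2, 'A');
    """
)

# Correct: sum revenue at the grain where it is stored.
correct = conn.execute("SELECT SUM(revenue) FROM orders").fetchone()[0]

# Wrong: joining to line items repeats each order row once per item,
# so order 1's revenue is counted three times.
inflated = conn.execute(
    """
    SELECT SUM(o.revenue)
    FROM orders o
    JOIN order_items i ON i.order_id = o.order_id
    """
).fetchone()[0]

# A simple fan-out check: compare row counts before and after the join.
rows_before = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
rows_after = conn.execute(
    "SELECT COUNT(*) FROM orders o JOIN order_items i ON i.order_id = o.order_id"
).fetchone()[0]

print(correct, inflated, rows_before, rows_after)
```

A row-count comparison like this is one cheap SQL check: if the join returns more rows than the table that owns the metric, any sum over that metric is suspect.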

Version 2 results: better SQL, same reasoning gaps

Adding metric definitions helped, but root-cause analysis is still weak.

Building my golden question set

Why I weight questions by business risk, not just semantic accuracy.
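One way to picture risk weighting, as a sketch rather than the lab's actual scoring: each golden question carries a weight for how costly a wrong answer would be, and the score is the share of weight earned. The `QuestionResult` type, the second and third questions, and the weights are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class QuestionResult:
    question: str
    passed: bool
    risk_weight: float  # ASSUMPTION: higher means a wrong answer costs more


def plain_accuracy(results: list[QuestionResult]) -> float:
    """Share of questions passed, with every question counted equally."""
    return sum(r.passed for r in results) / len(results)


def risk_weighted_score(results: list[QuestionResult]) -> float:
    """Share of total risk weight earned by the questions that passed."""
    total = sum(r.risk_weight for r in results)
    earned = sum(r.risk_weight for r in results if r.passed)
    return earned / total


results = [
    QuestionResult("How many sessions did we have yesterday?", True, 1.0),
    QuestionResult("Which landing page had the most visits?", True, 1.0),
    QuestionResult("Why did revenue drop last week?", False, 5.0),
]

print(round(plain_accuracy(results), 3))       # 2 of 3 questions passed
print(round(risk_weighted_score(results), 3))  # 2 of 7 weight units earned
```

With these made-up weights, an agent that passes two of three questions still earns well under half of the available weight, because the one question it missed is the one that matters most.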

Want a sanity check on your analytics agent?

I offer light-touch advisory for teams building or evaluating AI data analysts. No hype. Just practical help.

View Advisory Services

Contact Me