brybrydataguy
Hands-on, in public

Analytics Agent Lab

A public workbench where I build, break, evaluate, and improve analytics agents.

Following the full lifecycle: data → prompts → SQL → results → evals → failures → improvements.

Current Project

Ecommerce Analytics Agent

Building an end-to-end agent over a BigQuery ecommerce dataset. 50 business questions. Comparing versions. Documenting failures.

01

Build sandbox

BigQuery ecommerce data + metric definitions

02

Naive agent

NL → SQL → result → summary, with logs at each step

03

Golden set

50 business questions with expected answer patterns

04

Eval harness

Scores, failure tags, severity, version comparison
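The golden set and eval harness from steps 03 and 04 could be wired together roughly as below. This is a minimal Python sketch, not the lab's actual harness. The regex-based `expected_pattern`, the `risk_weight` field (borrowing the idea of weighting questions by business risk from the Apr 28 note), the rubric dimension names, and the toy questions and answers are all hypothetical.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class GoldenQuestion:
    qid: str
    question: str
    expected_pattern: str     # hypothetical: regex a correct answer should match
    risk_weight: float = 1.0  # hypothetical: larger for higher business risk


@dataclass
class EvalResult:
    qid: str
    version: str
    answer: str
    rubric: dict[str, int]    # dimension name -> score on a 0-4 scale
    failure_tags: list[str] = field(default_factory=list)
    severity: str = "none"


def matches_expected(result: EvalResult, q: GoldenQuestion) -> bool:
    """True if the answer text matches the question's expected pattern."""
    return re.search(q.expected_pattern, result.answer, flags=re.IGNORECASE) is not None


def rubric_mean(result: EvalResult) -> float:
    """Unweighted mean of the rubric dimensions for one answer."""
    return sum(result.rubric.values()) / len(result.rubric)


def compare_versions(results: list[EvalResult], questions: list[GoldenQuestion]) -> dict:
    """Per agent version: risk-weighted mean rubric score and pattern-match rate."""
    by_qid = {q.qid: q for q in questions}
    grouped = defaultdict(list)
    for r in results:
        grouped[r.version].append(r)
    summary = {}
    for version, rs in grouped.items():
        weights = [by_qid[r.qid].risk_weight for r in rs]
        weighted = sum(w * rubric_mean(r) for w, r in zip(weights, rs)) / sum(weights)
        match_rate = sum(matches_expected(r, by_qid[r.qid]) for r in rs) / len(rs)
        summary[version] = {"weighted_score": weighted, "match_rate": match_rate}
    return summary


# Toy inputs for illustration only, not results from the lab.
questions = [
    GoldenQuestion("q1", "What was revenue last week?", r"revenue .* \$\d", risk_weight=3.0),
    GoldenQuestion("q2", "How many orders shipped yesterday?", r"\d+ orders"),
]
results = [
    EvalResult("q1", "v1", "Revenue dropped because of the holiday.",
               {"metric": 2, "grain": 1}, ["unsupported_root_cause"], "high"),
    EvalResult("q2", "v1", "312 orders shipped.", {"metric": 4, "grain": 4}),
    EvalResult("q1", "v2", "Revenue was $11,900.", {"metric": 4, "grain": 3}),
    EvalResult("q2", "v2", "312 orders shipped.", {"metric": 4, "grain": 4}),
]
print(compare_versions(results, questions))
# {'v1': {'weighted_score': 2.125, 'match_rate': 0.5}, 'v2': {'weighted_score': 3.625, 'match_rate': 1.0}}
```

Reporting a risk-weighted score next to a plain match rate keeps a regression on one high-risk question visible even when several low-risk questions improve.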

eval_run_016
t.ident, t.filter
Q: Why did revenue drop last week?
SQL executes: true
Metric correctness: 2 / 4
Grain & filters: 1 / 4
Failure tags: ["unsupported_root_cause"]
Severity: high
Recommendation: Show query + uncertainty
Score: 1.6 / 4
Root cause: unsafe
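The eval card above can be read as a structured record plus a triage rule that turns failure tags into a severity and a recommendation. The sketch below assumes that shape: `EvalRecord`, `TRIAGE_RULES`, and the fallback for unknown tags are hypothetical, and only the `unsupported_root_cause` → high / "Show query + uncertainty" pairing comes from the card itself. The overall 1.6 / 4 score is left out because the card doesn't state how it is computed.

```python
from dataclasses import dataclass, field

# Hypothetical triage table: failure tag -> (severity, recommendation).
# Only the "unsupported_root_cause" entry is taken from the example card.
TRIAGE_RULES = {
    "unsupported_root_cause": ("high", "Show query + uncertainty"),
}
SEVERITY_ORDER = ["none", "low", "medium", "high"]
UNKNOWN_TAG_RULE = ("medium", "Review manually")  # hypothetical fallback


@dataclass
class EvalRecord:
    run_id: str
    question: str
    sql_executes: bool
    metric_correctness: int  # rubric score, 0-4
    grain_and_filters: int   # rubric score, 0-4
    failure_tags: list[str] = field(default_factory=list)


def triage(record: EvalRecord) -> dict:
    """Keep the most severe rule triggered by any failure tag."""
    severity, recommendation = "none", None
    for tag in record.failure_tags:
        tag_severity, tag_recommendation = TRIAGE_RULES.get(tag, UNKNOWN_TAG_RULE)
        if SEVERITY_ORDER.index(tag_severity) > SEVERITY_ORDER.index(severity):
            severity, recommendation = tag_severity, tag_recommendation
    return {
        "severity": severity,
        "recommendation": recommendation,
        # A root-cause claim is treated as unsafe once it is tagged unsupported.
        "root_cause_safe": "unsupported_root_cause" not in record.failure_tags,
    }


record = EvalRecord(
    run_id="eval_run_016",
    question="Why did revenue drop last week?",
    sql_executes=True,
    metric_correctness=2,
    grain_and_filters=1,
    failure_tags=["unsupported_root_cause"],
)
print(triage(record))
# {'severity': 'high', 'recommendation': 'Show query + uncertainty', 'root_cause_safe': False}
```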

Recent lab notes

May 12, 2024

The 10 ways my agent got revenue wrong

Wrong denominators, join multiplication, and more.

May 6, 2024

Version 2 results: better SQL, same reasoning gaps

Adding metric definitions helped, but root-cause reasoning is still weak.

Apr 28, 2024

Building my golden question set

Why I weight questions by business risk, not just accuracy.

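Two of the failure modes named in the May 12 note, wrong denominators and join multiplication, can be reproduced on a toy schema. In this sketch Python's built-in `sqlite3` module stands in for BigQuery so it runs anywhere; the `orders` and `order_items` tables and their values are hypothetical. Joining order-level revenue to line items inflates both the total and the per-order average, and collapsing the item table to one row per order first keeps the query at order grain.

```python
import sqlite3

# Toy schema (hypothetical, not the lab's BigQuery tables): revenue is stored
# once per order, while order_items has one row per line item.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, revenue REAL);
CREATE TABLE order_items (order_id INTEGER, sku TEXT);
INSERT INTO orders VALUES (1, 100.0), (2, 50.0);
INSERT INTO order_items VALUES (1, 'A'), (1, 'B'), (1, 'C'), (2, 'A');
""")

# Naive: the join repeats each order row once per line item, so order 1's
# revenue is summed three times and COUNT(*) counts line items, not orders.
naive_revenue, naive_aov = conn.execute("""
    SELECT SUM(o.revenue), SUM(o.revenue) / COUNT(*)
    FROM orders o JOIN order_items i ON o.order_id = i.order_id
""").fetchone()

# Fixed: reduce order_items to one row per order before joining, so the
# result stays at order grain and the denominator counts orders.
fixed_revenue, fixed_aov = conn.execute("""
    SELECT SUM(o.revenue), SUM(o.revenue) / COUNT(DISTINCT o.order_id)
    FROM orders o
    JOIN (SELECT DISTINCT order_id FROM order_items) i ON o.order_id = i.order_id
""").fetchone()

print(naive_revenue, naive_aov)  # 350.0 87.5
print(fixed_revenue, fixed_aov)  # 150.0 75.0
```

The same idea carries over to BigQuery SQL: aggregate to the metric's grain before joining to finer-grained tables, and make sure the denominator counts the entity the metric is defined over.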