Research

Notes from our work on agents, models, and orchestration.

Notes on prompt caching at agent scale

What we measured running Amy with aggressive prompt caching: where it pays off, where it backfires, and the rule of thumb we use now.

Mira Patel · April 8, 2026 · 2 min read

You don't need a thousand-row eval set to catch the regression that matters. You need the right twenty rows.

Amy Team · January 13, 2026 · 1 min read

Most teams reach for fine-tuning too early and prompt engineering too late. Here's the heuristic we use.

Amy Team · January 6, 2026 · 1 min read