Observability for long-running agent runs

Traces work great until your span outlives your trace exporter. Notes from instrumenting eight-hour agent jobs.

Amy Team · March 31, 2026 · 1 min read

OpenTelemetry assumes spans end. Agent runs disagree.

A single Amy job can spawn 200 tool calls across an eight-hour window, with retries, partial failures, and human approvals in the middle. Treating that as one trace blows up the exporter; treating each tool call as its own trace loses the parent context.

What works: a trace per logical phase, a stable correlation ID stitched through metadata, and ledger-style append-only events for anything we'd want to replay. The traces stop being the source of truth and start being the search index over the events.
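The shape described above can be sketched in a few lines of standard-library Python. This is an illustration under assumptions, not Amy's actual implementation: `JobLedger`, `Event`, the phase names, and the in-memory storage are all hypothetical, and a real system would hand the per-phase trace IDs to its tracing SDK and persist the events durably.

```python
import json
import uuid
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    """One immutable ledger entry (hypothetical schema)."""
    seq: int             # position in the ledger, assigned on append
    correlation_id: str  # stable for the whole job
    trace_id: str        # one per logical phase
    phase: str
    kind: str            # e.g. "tool_call", "retry", "approval"
    payload: dict


class JobLedger:
    """Append-only event log; traces act as an index over it."""

    def __init__(self) -> None:
        self.correlation_id = uuid.uuid4().hex
        self._events: List[Event] = []
        self._phase_traces: Dict[str, str] = {}
        self._by_trace: Dict[str, List[int]] = {}

    def start_phase(self, phase: str) -> str:
        # Each logical phase gets its own short-lived trace ID.
        trace_id = uuid.uuid4().hex
        self._phase_traces[phase] = trace_id
        self._by_trace[trace_id] = []
        return trace_id

    def append(self, phase: str, kind: str, payload: dict) -> Event:
        # Raises KeyError if the phase was never started.
        trace_id = self._phase_traces[phase]
        event = Event(len(self._events), self.correlation_id,
                      trace_id, phase, kind, dict(payload))
        self._events.append(event)  # events are only ever appended
        self._by_trace[trace_id].append(event.seq)
        return event

    def events_for_trace(self, trace_id: str) -> List[Event]:
        # The trace ID is a lookup key; the events hold the truth.
        return [self._events[i] for i in self._by_trace.get(trace_id, [])]

    def replay(self) -> List[Event]:
        # Return a copy, in original order, for replaying the job.
        return list(self._events)

    def to_jsonl(self) -> str:
        # One JSON object per line, suitable for an append-only file.
        return "\n".join(json.dumps(asdict(e), sort_keys=True)
                         for e in self._events)


if __name__ == "__main__":
    ledger = JobLedger()
    ledger.start_phase("plan")
    execute_trace = ledger.start_phase("execute")
    ledger.append("plan", "tool_call", {"tool": "search"})
    ledger.append("execute", "tool_call", {"tool": "browser"})
    ledger.append("execute", "retry", {"tool": "browser", "attempt": 2})
    for event in ledger.events_for_trace(execute_trace):
        print(event.seq, event.kind, event.payload)
```

The design choice worth noting is that every event carries both IDs: the trace ID narrows a search to one phase, while the correlation ID pulls the whole eight-hour job back together without any span having to stay open that long.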
