Observability for long-running agent runs

Traces work great until your span outlives your trace exporter. Notes from instrumenting eight-hour agent jobs.

Amy Team1 min read

OpenTelemetry assumes spans end. Agent runs disagree.

A single Amy job can spawn 200 tool calls across an eight-hour window, with retries, partial failures, and human approvals in the middle. Treating that as one trace blows up the exporter; treating each tool call as its own trace loses the parent context.

What works: a trace per logical phase, a stable correlation ID stitched through metadata, and ledger-style append-only events for anything we'd want to replay. The traces stop being the source of truth and start being the search index over the events.

More in Amy Engineering

View all →

How is Amy's credit system

How we ship a credit-based ledger that survives partial failures, refunds, and webhook re-deliveries — without losing a single cent.

Henry Ng3 min read