A retry policy that survives 5xx

Most retry libraries assume failures are transient. Real APIs disagree. A small taxonomy goes a long way.

Amy Team1 min read

Default retry libraries treat every error as a network blip. That's fine when every error is a network blip. It's a slow disaster when a 5xx means "this will never succeed" and your job retries it 18 times before giving up.

Our taxonomy is three rules. Network and 5xx: retry with backoff. 429: retry, but respect Retry-After. 4xx-not-429: don't retry — escalate. The rule that matters is the third one. Everything else is industry standard; the third is what stops a single permission misconfiguration from burning a day's compute.

More in Amy Engineering

View all →

How is Amy's credit system

How we ship a credit-based ledger that survives partial failures, refunds, and webhook re-deliveries — without losing a single cent.

Henry Ng3 min read