Most schedulers are deceptively simple — until you ask what happens when a worker dies between dequeueing a job and starting it. Or when a daylight-saving boundary creates two 1:30 AMs. Or when a user pauses an agent mid-run.
We split the problem into two halves: a stateless trigger layer that emits intent ("this schedule wants to fire at T") and a stateful dispatcher that owns idempotency and lifecycle. Each half is replaceable, neither owns the other's bugs.
The lesson, which we learned twice: the schedule is not the run. Conflate them and every retry becomes a duplicate-billing incident.