When to fine-tune vs. prompt-engineer

The honest version of the rule is: prompt-engineer until it's clearly not working, then ask whether fine-tuning would actually fix the failure mode you're seeing. Most of the time the answer is no — the model isn't unable to do the task, it just hasn't been told the task crisply enough.

The cases where fine-tuning genuinely wins are narrow: high-volume, low-latency, fixed-format outputs where you've already wrung the prompt dry. For everything else, a better prompt is cheaper, faster to iterate, and easier to roll back.

When to fine-tune vs. prompt-engineer

More in Research

Notes on prompt caching at agent scale

Notes on tool-use evals at small budgets