The honest version of the rule is: prompt-engineer until it's clearly not working, then ask whether fine-tuning would actually fix the failure mode you're seeing. Most of the time the answer is no — the model isn't unable to do the task, it just hasn't been told the task crisply enough.
The cases where fine-tuning genuinely wins are narrow: high-volume, low-latency, fixed-format outputs where you've already wrung the prompt dry. For everything else, a better prompt is cheaper, faster to iterate, and easier to roll back.