A new class of models has arrived: models that spend more time, and more tokens, thinking before they answer, and that are markedly better at hard reasoning as a result. They are also slower and pricier per call. That trade-off is exactly the kind of thing an agent runtime should be able to exploit.
Not every step deserves deep reasoning
A crew's steps are not equally hard. Classifying an input or extracting a field is shallow work that a fast, cheap model does well. Decomposing an ambiguous goal or reconciling conflicting sources is deep work where a reasoning model earns its cost. Spending a reasoning model's budget on the shallow steps is pure waste; using a cheap model on the deep ones is false economy.
The runtime should choose per step
This is why we treat the model as a per-step decision rather than a property of the whole run. The reasoning step gets the reasoning model; the formatting step gets something fast. As this class of models matures, the workflows that win won't be the ones that route everything to the smartest model — they'll be the ones that match each step to the right one.
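To make the idea concrete, here is a minimal sketch of per-step model selection in Python. It is an illustration under stated assumptions, not a description of any particular runtime: the `Step` and `Depth` types, the `choose_model` and `plan_run` helpers, and the model names `fast-model` and `reasoning-model` are all hypothetical placeholders.

```python
from dataclasses import dataclass
from enum import Enum


class Depth(Enum):
    """How much reasoning a step needs (hypothetical two-level scale)."""
    SHALLOW = "shallow"  # e.g. classify an input, extract a field, format output
    DEEP = "deep"        # e.g. decompose an ambiguous goal, reconcile sources


@dataclass(frozen=True)
class Step:
    name: str
    depth: Depth


# Placeholder model identifiers; substitute whatever your provider offers.
MODEL_FOR_DEPTH = {
    Depth.SHALLOW: "fast-model",
    Depth.DEEP: "reasoning-model",
}


def choose_model(step: Step) -> str:
    """Pick the model for one step instead of fixing it for the whole run."""
    return MODEL_FOR_DEPTH[step.depth]


def plan_run(steps: list[Step]) -> list[tuple[str, str]]:
    """Return (step name, model) pairs in execution order."""
    return [(step.name, choose_model(step)) for step in steps]


crew = [
    Step("decompose_goal", Depth.DEEP),
    Step("classify_input", Depth.SHALLOW),
    Step("extract_field", Depth.SHALLOW),
    Step("reconcile_sources", Depth.DEEP),
    Step("format_output", Depth.SHALLOW),
]

for name, model in plan_run(crew):
    print(f"{name}: {model}")
```

In this sketch the depth label is assigned by hand when the step is defined. A real runtime might infer it instead, but the point is the same: the model is looked up per step, so only the two deep steps pay for the reasoning model.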