Engineering · 6 min read

Routing steps across models without rewriting your workflow

Not every step needs your most expensive model. How per-step model routing trims cost and latency while keeping the same workflow definition.

It's tempting to point an entire workflow at the best model available and move on. It's also wasteful. A lot of the steps in a real crew — classifying an input, extracting a field, formatting a result — are easy work that a small, fast model handles perfectly well. Spending premium tokens on them buys you nothing but a bigger bill and higher latency.

Per-step model routing

In LoopLlama, a model isn't a property of the whole run — it's chosen per step. A workflow has a default model, and any agent in the crew can override it. So the planner and the final synthesizer can run on a frontier model while the classifier and the extractor run on something cheap and fast, all within the same run.
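The default-plus-override rule can be sketched in a few lines of Python. This is an illustration only, not LoopLlama's actual API: the `Agent` and `Workflow` classes, the `model_for` method, and the model names are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical types for illustration; not LoopLlama's actual API.
@dataclass
class Agent:
    name: str
    model: Optional[str] = None  # None means "use the workflow default"


@dataclass
class Workflow:
    default_model: str
    agents: list[Agent] = field(default_factory=list)

    def model_for(self, agent: Agent) -> str:
        # A per-agent override wins; otherwise fall back to the default.
        return agent.model or self.default_model


crew = Workflow(
    default_model="frontier-large",  # placeholder model names
    agents=[
        Agent("planner"),
        Agent("classifier", model="small-fast"),
        Agent("extractor", model="small-fast"),
        Agent("synthesizer"),
    ],
)

# Which model executes each step within a single run.
routing = {agent.name: crew.model_for(agent) for agent in crew.agents}
```

Here the planner and synthesizer inherit the default, while the classifier and extractor carry their own override.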

Cost and latency aware

The payoff compounds across a multi-step run. If four of a workflow's six steps are routine and you move them to a model that's an order of magnitude cheaper, you cut most of the cost, provided those steps account for a comparable share of the tokens, while keeping quality where it actually matters: on the two steps that do the hard reasoning. Latency drops too, because the routine steps return sooner on a faster model.

  • Route classification, extraction, and formatting to small, fast models.
  • Reserve frontier models for planning, synthesis, and review.
  • Keep latency-sensitive steps on whichever model responds quickest.
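To put rough numbers on the six-step example, here is a back-of-the-envelope calculation. The prices and token counts are made up for illustration (they are not real pricing), and the sketch assumes every step consumes the same number of tokens, which real workflows rarely do.

```python
# Illustrative numbers only: assumed prices and token counts, not real pricing.
PRICE_PER_1K_TOKENS = {"frontier": 0.010, "small": 0.001}  # small is 10x cheaper
TOKENS_PER_STEP = 2_000  # assumption: every step uses the same number of tokens


def run_cost(step_models: list[str]) -> float:
    """Total cost of one run, given the model tier used by each step."""
    return sum(PRICE_PER_1K_TOKENS[m] * TOKENS_PER_STEP / 1000 for m in step_models)


all_frontier = run_cost(["frontier"] * 6)             # every step on the frontier model
routed = run_cost(["frontier"] * 2 + ["small"] * 4)   # four routine steps moved down
savings = 1 - routed / all_frontier                   # fraction of cost removed
```

Under these assumptions the routed run costs 40% of the all-frontier run, a saving of about 60%. If the routine steps handle more tokens than the reasoning steps, the saving grows; if they handle fewer, it shrinks.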

Same workflow definition

The point of doing this at the routing layer is that your workflow logic doesn't change. The crew, the roles, the order, the tools — all of it stays the same. You're swapping which model executes a step, not rewriting how the step works. That means you can tune the cost/quality trade-off after the fact, as models and prices change, without touching the design of the workflow itself.
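One way to picture the separation: the workflow definition names the steps, and a routing table maps steps to models. The sketch below is a hypothetical layout, not LoopLlama's configuration format; `WORKFLOW_STEPS`, `resolve`, and the model names are invented for the example.

```python
# Hypothetical workflow definition: step names and order. No model names here.
WORKFLOW_STEPS = ("plan", "classify", "extract", "synthesize")


def resolve(routing: dict[str, str], default: str) -> list[tuple[str, str]]:
    """Pair each step with the model that will execute it."""
    return [(step, routing.get(step, default)) for step in WORKFLOW_STEPS]


# Two routing tables applied to the same workflow definition.
everything_on_one: dict[str, str] = {}
tuned = {"classify": "small-fast", "extract": "small-fast"}

before = resolve(everything_on_one, default="frontier-large")
after = resolve(tuned, default="frontier-large")
```

Only the routing table differs between `before` and `after`; the steps and their order are identical, which is what makes the change safe to revisit as models and prices move.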

Model choice stops being an architectural commitment and becomes a dial you can turn. Most teams start with everything on one capable model, watch the per-step traces, and then push the obvious routine steps down to cheaper models once they can see where the tokens are going.
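Finding those routine steps is mostly a matter of adding up tokens per step. A minimal sketch, assuming trace records shaped like the dictionaries below (the field names and the `ROUTINE_STEPS` set are assumptions for illustration, not an actual trace format):

```python
from collections import defaultdict

# Hypothetical trace records; the field names are assumptions for illustration.
traces = [
    {"step": "plan", "tokens": 1800},
    {"step": "classify", "tokens": 2400},
    {"step": "extract", "tokens": 2100},
    {"step": "synthesize", "tokens": 2700},
    {"step": "classify", "tokens": 2600},
]

# Which steps count as routine is a judgment call made per workflow.
ROUTINE_STEPS = {"classify", "extract", "format"}

# Sum token usage per step across all trace records.
tokens_by_step: dict[str, int] = defaultdict(int)
for record in traces:
    tokens_by_step[record["step"]] += record["tokens"]

# Routine steps sorted by token spend, largest first: the candidates to move.
candidates = sorted(
    (step for step in tokens_by_step if step in ROUTINE_STEPS),
    key=lambda step: tokens_by_step[step],
    reverse=True,
)
```

The routine steps with the largest token totals are the ones where moving to a cheaper model pays off first.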

Written by The LoopLlama team.
