When a multi-agent run produces a bad result, the first question is always the same: where did it go wrong? Without records, an agent loop is a black box — you get the final output and a vague sense that something upstream misfired. LoopLlama records every step of every run so you can answer that question the way you'd debug code: by reading the trace.
Every step is recorded
Each step in a run captures the agent role that ran it, the input it saw, the output it produced, the tokens it consumed, how long it took, and whether it succeeded. Strung together in order, those records are a trace: a precise, replayable account of how the crew arrived at its answer.
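As a rough illustration of what one such record might hold, here is a minimal Python sketch. The `StepRecord` class and its field names are assumptions made for this example, not LoopLlama's actual schema; they simply mirror the six things listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepRecord:
    """Hypothetical shape of one recorded step (not LoopLlama's real schema)."""
    role: str          # agent role that ran the step, e.g. "planner"
    input: str         # what the agent saw
    output: str        # what the agent produced
    tokens: int        # tokens consumed by the step
    duration_s: float  # wall-clock time in seconds
    ok: bool           # whether the step succeeded


# A trace is the list of records in execution order (sample data, made up).
trace = [
    StepRecord("planner", "goal: summarize report", "1) find report 2) draft", 120, 0.8, True),
    StepRecord("researcher", "1) find report", "report text", 450, 2.1, True),
    StepRecord("writer", "report text", "draft summary", 300, 1.5, True),
]

# Print the trace top to bottom, one line per step.
for i, step in enumerate(trace):
    status = "ok" if step.ok else "FAILED"
    print(f"{i} {step.role:<10} {step.tokens:>5} tok {step.duration_s:>5.1f}s {status}")
```

Keeping the record immutable (`frozen=True`) fits the idea of a trace as an account of what happened rather than something later steps can edit.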
Reading a trace
A typical trace reads top to bottom like a call stack. You can see the planner decompose the goal, the researcher gather context, the writer produce a draft, and the reviewer flag issues. When the output is wrong, you scan the trace for the step where the context first went sideways: maybe the researcher pulled the wrong source, or the planner's decomposition missed a requirement. The defect is often not where you first assumed; the trace shows you where it actually is. A trace lets you answer questions such as:
- Which agent produced the questionable content?
- What input did it have when it did — was the upstream context already wrong?
- Did a step fail and get retried, or silently produce something off?
- Where did the tokens go, and which step was the expensive one?
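Each of these questions maps to a small query over the step records. The sketch below is one way to express them in Python; the `Step` class, its `attempt` field, and the helper names are hypothetical and are not part of LoopLlama's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    """Hypothetical step record used only for this example."""
    role: str
    input: str
    output: str
    tokens: int
    ok: bool
    attempt: int = 1  # assumption: values above 1 mean the step was retried


def first_mention(trace: list[Step], needle: str) -> Optional[int]:
    """Index of the first step whose output contains `needle`, or None."""
    return next((i for i, s in enumerate(trace) if needle in s.output), None)


def first_failure(trace: list[Step]) -> Optional[int]:
    """Index of the first step that did not succeed, or None."""
    return next((i for i, s in enumerate(trace) if not s.ok), None)


def retried_steps(trace: list[Step]) -> list[int]:
    """Indices of steps that are retries of an earlier attempt."""
    return [i for i, s in enumerate(trace) if s.attempt > 1]


def most_expensive(trace: list[Step]) -> int:
    """Index of the step that consumed the most tokens."""
    return max(range(len(trace)), key=lambda i: trace[i].tokens)


# Made-up trace: the researcher fails once, is retried, and pulls a stale source.
trace = [
    Step("planner", "goal", "plan", 100, True),
    Step("researcher", "plan", "source: 2019 report", 900, False),
    Step("researcher", "plan", "source: 2019 report", 950, True, attempt=2),
    Step("writer", "source: 2019 report", "draft citing 2019 report", 400, True),
]

culprit = first_mention(trace, "2019 report")
print("first produced by:", trace[culprit].role)       # researcher
print("its input was:", trace[culprit].input)          # plan
print("first failure at step:", first_failure(trace))  # 1
print("retried steps:", retried_steps(trace))          # [2]
print("most expensive step:", most_expensive(trace))   # 2
```

Here the writer's draft looks wrong, but the trace shows the stale source entered at the researcher step, whose own input was fine.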
Replay
Reading a trace tells you where the problem is; replay lets you fix it. Because runs are checkpointed at every step, you can re-run from the step before the failure — with a tweaked prompt or a corrected input — without paying to redo the steps that were already fine. That tight loop, isolate then replay, is what turns debugging an agent system from guesswork into engineering.
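The mechanics of replaying from a checkpoint can be sketched in a few lines of Python. This is a toy model under stated assumptions, not LoopLlama's implementation: steps are plain functions from string to string, a checkpoint is the input each step received, and `run` and `replay` are hypothetical names.

```python
from typing import Callable, Optional

StepFn = Callable[[str], str]
calls: list[str] = []  # records which steps actually executed


def planner(goal: str) -> str:
    calls.append("planner")
    return f"plan({goal})"


def researcher(plan: str) -> str:
    calls.append("researcher")
    return f"notes({plan})"


def writer(notes: str) -> str:
    calls.append("writer")
    return f"draft({notes})"


def run(steps: list[StepFn], initial: str) -> list[str]:
    """Run all steps. checkpoints[i] is the input to step i; the last entry is the final output."""
    checkpoints = [initial]
    for fn in steps:
        checkpoints.append(fn(checkpoints[-1]))
    return checkpoints


def replay(steps: list[StepFn], checkpoints: list[str], start: int,
           new_input: Optional[str] = None) -> list[str]:
    """Re-run from step `start`, reusing saved checkpoints for everything before it."""
    state = checkpoints[start] if new_input is None else new_input
    result = checkpoints[:start] + [state]
    for fn in steps[start:]:
        result.append(fn(result[-1]))
    return result


steps = [planner, researcher, writer]
first = run(steps, "goal")

# Suppose the researcher's notes were wrong: replay only the writer with corrected input.
calls.clear()
fixed = replay(steps, first, start=2, new_input="corrected notes")
print(fixed[-1])  # draft(corrected notes)
print(calls)      # ['writer']
```

Only the writer runs during the replay; the planner and researcher outputs come from the saved checkpoints. Replaying with a tweaked prompt would follow the same pattern, swapping the function at index `start` instead of its input.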
The same traces feed usage accounting and audit logs, so the data you use to debug is the data you use to understand cost and to prove what an agent did. One record, three jobs.