The ReAct Loop

CUST/OS uses a ReAct (Reason + Act) loop as its central control flow. When the operator sends a message, the agent runs a fixed cycle: think, act, observe, think again -- until it has nothing more to do.

What is ReAct

ReAct is a published reasoning pattern (Yao et al., 2022) that interleaves an LLM's reasoning with real-world tool use. Instead of producing a single response, the model:

  1. Reasons about the task and decides what action to take.
  2. Acts by calling a tool.
  3. Observes the result.
  4. Repeats until the task is complete.

The alternative approaches -- single-shot prompting (one big prompt, one response, no tools) and plan-then-execute (write the whole plan first, then run it) -- both fail in tactical contexts. Single-shot can't query the map. Plan-then-execute can't react to surprises mid-task.

ReAct is the right shape for an operator who says "find the nearest hostile and place a track marker on it" -- the agent needs to query positions, do math, then mutate state, all within a single conversation turn.

How CUST/OS uses it

For each operator message, the agent runs this cycle:

1. Skill selection

Before the first reasoning step, CUST/OS selects the 1-3 most relevant skills for the operator's message using keyword and semantic matching. Only those skills and their tools are included in the prompt. This keeps the context focused and efficient, even when dozens of skills are installed.

2. Context assembly

The system prompt, selected skill descriptions, conversation history, persistent memory, and the operator's message are assembled into a single context payload for the LLM.

3. Reasoning (streaming)

The context is sent to the LLM. Tokens stream back in real time -- the operator sees the agent's response as it is being generated, not after the fact. This is critical for transparency: the operator knows the agent is working and can read its reasoning as it unfolds.

4. Tool detection

As the response streams in, tool calls are detected. A tool call is the model's way of saying "I need to do something" -- query the map, place a marker, measure a distance, send an alert.

5. Security evaluation

Before any tool runs, it passes through the security pipeline:

  • Configurable rules (hooks) can allow, deny, or escalate specific tools.
  • Tools tagged as high-impact require explicit operator approval before execution.
  • The operator sees the tool name, arguments, and impact level, and can approve, deny, or modify the arguments.

6. Tool execution

Approved tools are executed in the sandboxed script runtime. The result -- success or failure, with data -- is appended to the conversation.

7. Loop or finish

The agent now sees the tool results and decides what to do next:

  • More tool calls needed -- the cycle repeats from step 3.
  • Task complete -- the agent responds with text only (no tool calls), and the loop ends.

The loop is bounded by a configurable maximum number of iterations. The agent cannot reason forever.

Streaming

Tokens appear in the operator's chat panel as the model generates them. This is not cosmetic -- it serves three purposes:

  1. Responsiveness -- the operator knows the agent is working within a second of sending a message.
  2. Steerability -- the operator can read the agent's reasoning in progress and interrupt or redirect before the agent acts.
  3. Transparency -- there is no hidden "thinking" phase. Everything the agent considers is visible.

Operator control

The operator is never a passive recipient:

  • Stop -- halt the agent mid-loop. The agent returns whatever partial work it has completed.
  • Steer -- inject a correction ("scratch that, do X instead") without restarting the conversation. The correction is woven into the context before the next reasoning step.
  • Approve / Deny / Modify -- for high-impact tools, the operator gates every consequential action.

Graceful degradation

Two safeguards prevent the loop from failing silently:

  • Loop detection -- if the agent calls the same tool with the same arguments repeatedly without making progress, the loop is terminated early.
  • Last-iteration failure -- if the agent is on its final allowed iteration and every tool in the previous turn failed, the agent is forced to explain what it tried and what went wrong, rather than hitting a hard error. The operator gets a useful explanation instead of a dead end.

When not to use the ReAct loop

The ReAct loop is the right shape for interactive, exploratory, tactical work. It is not the right shape for:

  • Deterministic playbooks — if the steps are known in advance, write them directly as Lua. A skill script or automation can chain tools.call("step_one", ...)tools.call("step_two", ...) without an LLM in the loop.
  • High-frequency monitors — a 15-second perimeter sweep shouldn't run a reasoning loop on every tick. Put it in an automation with a deterministic run(ctx) body.
  • Batch operations — same reasoning: compose the steps in Lua, call delegate only when you need the LLM to make a judgement call.

The ReAct loop earns its keep when the next step genuinely depends on what was just observed. When it doesn't, the Lua composition path is cheaper, faster, and more auditable.

See also