Memory and Context
The agent's "memory" is what the LLM sees on a given turn -- the system prompt, conversation history, selected skills, and dynamic state. CUST/OS has several layers stacked on top of the raw context window to give the agent useful state without wasting space. This page explains what those layers are and how they fit together.
The layers
Every turn, the agent's context is assembled from these sources:
- System prompt -- the agent's persona and behavioral rules (mostly static, cached between turns).
- Selected skills -- the 1-3 most relevant skills for the current message, with their tool definitions and usage rules.
- Dynamic section -- rebuilt every turn with the current time, operator identity, persistent memory facts, active todos, and cached tool results.
- Conversation history -- recent messages in full, older messages in summarized form.
- The operator's latest message.
The split between cached and dynamic parts is deliberate. The persona and skill descriptions rarely change, so they are cached. The time of day, recent tool results, and active memory facts change every turn, so they are rebuilt fresh.
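The assembly order above can be sketched as follows. This is a minimal illustration, not the CUST/OS implementation: `TurnContext`, `build_dynamic_section`, and `assemble_prompt` are hypothetical names, and the line formats are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TurnContext:
    """Hypothetical container for the pieces assembled on each turn."""
    system_prompt: str                     # mostly static, cacheable
    skills: list[str]                      # the 1-3 selected skill blocks
    memory_facts: dict[str, str] = field(default_factory=dict)
    todos: list[str] = field(default_factory=list)
    cached_results: dict[str, str] = field(default_factory=dict)
    history: list[str] = field(default_factory=list)


def build_dynamic_section(ctx: TurnContext, operator: str, now: datetime) -> str:
    """Rebuilt every turn: time, operator, memory facts, todos, cached results."""
    lines = [f"Current time: {now.isoformat()}", f"Operator: {operator}"]
    lines += [f"Memory: {k} = {v}" for k, v in ctx.memory_facts.items()]
    lines += [f"Todo: {t}" for t in ctx.todos]
    lines += [f"Previously computed: {k} = {v}" for k, v in ctx.cached_results.items()]
    return "\n".join(lines)


def assemble_prompt(ctx: TurnContext, operator: str, message: str, now: datetime) -> list[str]:
    """Stable parts first, per-turn parts after, the operator's message last."""
    return [
        ctx.system_prompt,
        *ctx.skills,
        build_dynamic_section(ctx, operator, now),
        *ctx.history,
        message,
    ]


# Example turn with made-up values.
example = TurnContext(system_prompt="(persona and rules)", skills=["(route planning skill)"])
prompt_parts = assemble_prompt(example, "VIPER-6", "Plan a route", datetime.now(timezone.utc))
```

Placing the stable parts first is what makes caching them worthwhile: a changing timestamp or tool result never sits in front of the persona or skill text.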
Conversations
Each conversation is independent. It has its own message history, its own summary state, and its own tool result cache. The operator can have multiple parallel conversations -- one for route planning, one for casualty reports, one for sensor tasking -- and switch between them instantly.
The chat panel provides a conversation selector and a new-conversation button. Starting a new conversation gives the agent a clean slate without losing the history of prior conversations.
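One way to picture independent conversations is a store keyed by conversation id, where each entry carries its own history, summary, and tool results. The class and field names below are hypothetical and only illustrate the shape of the data.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Conversation:
    """Hypothetical per-conversation state; nothing is shared between instances."""
    title: str
    history: list[str] = field(default_factory=list)
    summary: str = ""
    tool_cache: dict[str, str] = field(default_factory=dict)


class ConversationStore:
    def __init__(self) -> None:
        self._conversations: dict[str, Conversation] = {}
        self.active_id: Optional[str] = None

    def new(self, conv_id: str, title: str) -> Conversation:
        """Start with a clean slate; existing conversations are left untouched."""
        self._conversations[conv_id] = Conversation(title)
        self.active_id = conv_id
        return self._conversations[conv_id]

    def switch(self, conv_id: str) -> Conversation:
        """Make an existing conversation active and return its state."""
        if conv_id not in self._conversations:
            raise KeyError(f"unknown conversation: {conv_id}")
        self.active_id = conv_id
        return self._conversations[conv_id]
```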
Context management (summarization)
LLMs have a finite context window, and a long mission can easily exceed it. CUST/OS handles this automatically.
When the conversation exceeds a configurable threshold, older messages are summarized to free space while preserving the information that matters. The summarization is structured -- it specifically preserves:
- Exact coordinates and grid references
- Callsigns and unit designations
- Decisions made and their rationale
- Tool calls and their results
- Timestamps and sequence of events
The most recent messages and the very first messages in the conversation are kept verbatim. The middle -- the bulk of the history -- is compressed into a structured summary. This preserves both the original framing of the conversation and the most recent tactical state.
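The head/middle/tail split can be sketched as below. The function name, the defaults, and the choice to measure the threshold in messages rather than tokens are all assumptions made for illustration; `summarize` stands in for whatever produces the structured summary.

```python
from typing import Callable


def compact_history(
    messages: list[str],
    summarize: Callable[[list[str]], str],
    max_messages: int = 40,  # hypothetical threshold, counted in messages
    keep_head: int = 2,      # first messages kept verbatim
    keep_tail: int = 10,     # most recent messages kept verbatim
) -> list[str]:
    """Keep the first and last messages verbatim and summarize the middle."""
    if len(messages) <= max_messages or len(messages) <= keep_head + keep_tail:
        return list(messages)  # under the threshold: nothing to compress
    split = len(messages) - keep_tail
    head = messages[:keep_head]
    middle = messages[keep_head:split]
    tail = messages[split:]
    return head + [f"[summary] {summarize(middle)}"] + tail
```

With 50 messages and the defaults above, the result holds the first 2 messages, one summary entry covering the 38 in the middle, and the last 10.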
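If summarization fails for any reason (no provider available, unexpected error), the system falls back to simple truncation: drop the oldest messages until the conversation fits. Truncation loses detail that a summary would have kept, but it has no external dependency, so the conversation can always be made to fit.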
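A minimal sketch of that fallback, assuming a crude character-based token estimate in place of a real tokenizer. `rough_tokens`, `truncate_oldest`, and `shrink_history` are hypothetical names.

```python
from typing import Callable


def rough_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: roughly 4 characters per token."""
    return max(1, len(text) // 4)


def truncate_oldest(messages: list[str], budget: int) -> list[str]:
    """Drop the oldest messages until the estimate fits; always keep the newest one."""
    kept = list(messages)
    while len(kept) > 1 and sum(rough_tokens(m) for m in kept) > budget:
        kept.pop(0)
    return kept


def shrink_history(
    messages: list[str],
    budget: int,
    summarize: Callable[[list[str]], list[str]],
) -> list[str]:
    """Prefer summarization; fall back to truncation on any failure."""
    try:
        return summarize(messages)
    except Exception:
        # No provider, timeout, malformed output: truncation needs none of them.
        return truncate_oldest(messages, budget)
```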
Persistent memory
Some facts should outlive a single conversation:
- "My callsign is VIPER-6"
- "OBJ ALPHA is at 38.5N 77.0W"
- "The route through grid 12S WG is mined"
These are stored as key-value pairs in a persistent memory layer. The agent can remember, recall, and forget facts using dedicated tools. Critically, the most relevant persistent facts are automatically included in the agent's context every turn -- the agent sees them without being asked.
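The remember/recall/forget operations and the automatic inclusion of relevant facts might look like the sketch below. Everything here is an assumption: the class and method names are hypothetical, an in-memory dict stands in for a durable store, and relevance is approximated by simple word overlap because this page does not say how CUST/OS ranks facts.

```python
import re
from typing import Optional


def _words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens; underscores and punctuation split words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class PersistentMemory:
    def __init__(self) -> None:
        self._facts: dict[str, str] = {}  # stand-in for durable storage

    def remember(self, key: str, value: str) -> None:
        self._facts[key] = value

    def recall(self, key: str) -> Optional[str]:
        return self._facts.get(key)

    def forget(self, key: str) -> bool:
        """Return True if the fact existed and was removed."""
        return self._facts.pop(key, None) is not None

    def relevant(self, message: str, limit: int = 5) -> list[tuple[str, str]]:
        """Rank facts by word overlap with the message; drop facts with no overlap."""
        query = _words(message)
        scored = []
        for key, value in self._facts.items():
            overlap = len(query & _words(f"{key} {value}"))
            if overlap:
                scored.append((overlap, key, value))
        scored.sort(reverse=True)
        return [(key, value) for _, key, value in scored[:limit]]
```

The output of `relevant` is what would be written into the dynamic section each turn, which is how the agent sees a fact without calling a recall tool first.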
Persistent memory is the right shape for short, precise facts. For long passages or unstructured knowledge (SOPs, doctrinal references), CUST/OS provides a separate vector store for semantic retrieval.
Tool result caching
When the agent calls a tool, the result is cached for the duration of the conversation. On subsequent turns, recently cached results are included in the agent's context under a "previously computed values" section. This serves two purposes:
- Deduplication -- the agent sees that it already has the operator's position, the nearby unit list, or the distance calculation, and skips re-calling the tool.
- Context efficiency -- a cached one-line summary of a tool result costs far less context space than re-executing the tool and including the full result again.
The cache is cleared when the operator switches conversations.
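A small sketch of such a cache, under stated assumptions: `ToolResultCache` and its methods are hypothetical, results are stored as short strings, and the cache key is the tool name plus its JSON-serialized arguments. In the text above it is the agent that skips the call after seeing the cached value; here that check is folded into `get_or_run` to keep the example short.

```python
import json
from typing import Any, Callable


class ToolResultCache:
    def __init__(self) -> None:
        self._entries: dict[str, str] = {}

    @staticmethod
    def _key(tool: str, args: dict[str, Any]) -> str:
        # sort_keys makes the key independent of argument order
        return f"{tool}({json.dumps(args, sort_keys=True)})"

    def get_or_run(self, tool: str, args: dict[str, Any], run: Callable[..., Any]) -> str:
        """Return the cached result if present; otherwise run the tool once and store it."""
        key = self._key(tool, args)
        if key not in self._entries:
            self._entries[key] = str(run(**args))
        return self._entries[key]

    def render_section(self, limit: int = 5) -> str:
        """Text for the prompt: the most recently added results, one line each."""
        recent = list(self._entries.items())[-limit:]
        if not recent:
            return ""
        return "\n".join(["Previously computed values:"] + [f"- {k} = {v}" for k, v in recent])

    def clear(self) -> None:
        self._entries.clear()
```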
Skill selection
With dozens of skills installed, including all of them in every prompt would exhaust the context window. Instead, CUST/OS selects the 1-3 most relevant skills per turn using a combination of keyword matching and semantic similarity.
Only the selected skills -- their descriptions, rules, and tool definitions -- are included in the prompt. This keeps the system prompt compact even as the skill library grows. The selection is transparent: the operator can see which skills were selected for each turn.
If no skills match the operator's message, the agent responds from its general knowledge and calls no tools. This is intended behavior -- not every message needs a tool.
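The selection step can be sketched as a blended score with a cutoff. This is an illustration only: the `Skill` fields, the equal weights, and the `min_score` threshold are assumptions, and bag-of-words cosine similarity stands in for a real embedding model.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    keywords: tuple[str, ...]


def _bag(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_skills(
    message: str,
    skills: list[Skill],
    max_skills: int = 3,
    min_score: float = 0.2,  # hypothetical cutoff
) -> list[str]:
    """Blend a keyword hit with description similarity; return at most max_skills names."""
    words = _bag(message)
    scored = []
    for skill in skills:
        keyword_score = 1.0 if any(k.lower() in words for k in skill.keywords) else 0.0
        semantic_score = _cosine(words, _bag(skill.description))
        score = 0.5 * keyword_score + 0.5 * semantic_score
        if score >= min_score:
            scored.append((score, skill.name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:max_skills]]
```

An empty result corresponds to the no-match case: the prompt is built without any skill block, and the agent answers without tools.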
What this gets you
The point of these layers is to make a small context window go a long way without losing fidelity:
- Summarization keeps long missions tractable without losing critical tactical values.
- Persistent memory ensures important facts are always visible to the agent.
- Tool result caching prevents redundant tool calls and saves context space.
- Skill selection keeps the prompt focused on what matters for the current message.
- Multiple conversations let the operator compartmentalize different tasks.
The result is that even a constrained on-device model can host a conversation that would otherwise require many times its context window in raw history, while keeping the coordinates, callsigns, decisions, and tool results the operator depends on.
See also
- Architecture overview -- where memory fits in the larger system
- The ReAct loop -- how the agent uses context each turn
- Skills as runtime, not bundle -- what skills are and how they are selected