Configuration Reference

CUST/OS is configured through `custos.yaml`, located at `/sdcard/atak/custos/config/custos.yaml`. The file is hot-reloaded, so changes take effect without restarting ATAK.

Unknown keys are ignored. Missing keys use their defaults.

Top-level structure

```yaml
agent: {}            # Agent behavior tunables
defaults: {}         # Defaults shared by all providers
providers: []        # Inference providers (LLM, embedding, STT, TTS, vision, detection)
agents: []           # Specialist agent profiles for delegation
rag: {}              # RAG / chunking settings
delegation: {}       # Cross-device delegation settings
security: {}         # Security policy
scheduling: {}       # Automation scheduler settings
hooks: {}            # Pre/post tool hook rules
memory: {}           # Persistent memory settings
```
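Missing keys fall back to their defaults, so a usable file can be very short. Below is a minimal sketch. It assumes a single on-device chat model (the LiteRT entry described under Protocol: `litert`) and leaves every other section at its defaults.

```yaml
# Minimal custos.yaml sketch: one on-device chat provider.
# Sections not listed here (agent, defaults, rag, delegation, ...) use their defaults.
providers:
  - name: "on-device-gemma4"
    task: "chat"
    protocol: "litert"
    url: "file:///sdcard/atak/custos/models/gemma-4-E2B-it.litertlm"
    model: "gemma-4-E2B-it"
    tier: "handheld"
```

Add further entries with `task: embedding`, `transcription`, `tts`, `vision`, or `detection` as the features you use require them.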

`agent`

Controls how the agent reasons and selects skills.

```yaml
agent:
  persona: "You are a TAK-native AI assistant for tactical operations..."
  maxReasoningIterations: 10
  callsign: "CUSTOS"
  contextSummaryThreshold: 30
  maxToolResultChars: 4000
  deferToolLoading: false
  maxSelectedSkills: 3
  fallbackMode: priority
```

| Key | Type | Default | Meaning |
|---|---|---|---|
| `persona` | string | (built-in) | The system prompt prefix the agent uses |
| `maxReasoningIterations` | int | `10` | Hard cap on reasoning loop iterations per user message |
| `callsign` | string | `"CUSTOS"` | Display name in the chat header and identity for delegation |
| `contextSummaryThreshold` | int | `30` | Compress conversation when message count exceeds this |
| `contextCompactionThreshold` | float | `0.8` | Compress when token usage exceeds this fraction of the context window |
| `maxToolResultChars` | int | `4000` | Truncate tool results larger than this |
| `maxCompressedHistoryChars` | int | `2000` | Cap on the compressed history summary |
| `snipThreshold` | int | `2000` | Snip middles of large strings before they enter context |
| `deferToolLoading` | bool | `false` | If true, only a tool-search helper is loaded initially; the LLM discovers other tools on demand |
| `maxSelectedSkills` | int | `3` | Number of skills passed to the LLM each turn |
| `minSkillConfidence` | int | `50` | Minimum confidence score (0-100) for a skill to be considered |
| `synonymsPath` | string? | `null` | Optional path to a synonyms YAML file for keyword matching |
| `fallbackMode` | string | `"priority"` | Provider fallback strategy: `priority` (sort by `taskPriority`) or `tier-grouped` (sort by tier in the order handheld, pack, mobile, mounted, command-post, cloud, then by `taskPriority` within each tier) |
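The following sketch makes the two `fallbackMode` strategies concrete. It uses four hypothetical chat providers with placeholder names. `protocol`, `url`, and `model` are omitted because they do not affect ordering, but a real entry needs them. The trailing comments show the order each mode produces under the rules described for `fallbackMode`.

```yaml
agent:
  fallbackMode: tier-grouped   # or: priority

providers:                     # hypothetical entries; protocol/url/model omitted
  - name: "cloud-primary"
    task: "chat"
    tier: "cloud"
    taskPriority: 1
  - name: "pack-server"
    task: "chat"
    tier: "pack"
    taskPriority: 2
  - name: "handheld-local"
    task: "chat"
    tier: "handheld"
    taskPriority: 3
  - name: "cloud-backup"
    task: "chat"
    tier: "cloud"
    taskPriority: 4

# priority:     cloud-primary, pack-server, handheld-local, cloud-backup
# tier-grouped: handheld-local, pack-server, cloud-primary, cloud-backup
```

Under `priority`, the on-device model is only tried third. Under `tier-grouped`, the nearest tier is tried first, and `taskPriority` only breaks ties inside a tier (here, between the two cloud entries).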

`defaults`

Global defaults applied to providers that don't override them.

```yaml
defaults:
  maxTokens: 2048
  requestTimeoutMs: 60000
  healthCheckIntervalMs: 5000
  sampleRate: 16000
  maxRecordingDurationMs: 30000
  minRecordingDurationMs: 500
  byTier:
    handheld:
      requestTimeoutMs: 30000
      maxTokens: 1024
    cloud:
      requestTimeoutMs: 120000
      maxTokens: 4096
```

| Key | Type | Default | Meaning |
|---|---|---|---|
| `maxTokens` | int | `2048` | Default max output tokens per request |
| `requestTimeoutMs` | long | `60000` | Default HTTP timeout for inference calls |
| `healthCheckIntervalMs` | long | `5000` | How often each provider is health-checked |
| `sampleRate` | int | `16000` | PCM sample rate for voice recording |
| `maxRecordingDurationMs` | long | `30000` | Max length of a single push-to-talk recording |
| `minRecordingDurationMs` | long | `500` | Below this, the recording is treated as a tap, not a press |
| `byTier` | map | `{}` | Per-tier overrides of the global defaults for `maxTokens`, `requestTimeoutMs`, and `healthCheckIntervalMs`. Per-provider values still take precedence. |
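The resolution order described above has three levels: the per-provider value, then the matching `byTier` entry, then the global default. The sketch below illustrates it. Provider names are hypothetical, and required keys other than `name` and `tier` are omitted. It also assumes that a tier with no `byTier` entry simply uses the global values.

```yaml
defaults:
  maxTokens: 2048
  requestTimeoutMs: 60000
  byTier:
    handheld:
      maxTokens: 1024
      requestTimeoutMs: 30000

providers:                      # hypothetical; task/protocol/url/model omitted
  - name: "handheld-a"
    tier: "handheld"
    # resolves to maxTokens 1024, requestTimeoutMs 30000 (from byTier.handheld)
  - name: "handheld-b"
    tier: "handheld"
    maxTokens: 512
    # resolves to maxTokens 512 (provider value wins), requestTimeoutMs 30000 (byTier.handheld)
  - name: "pack-a"
    tier: "pack"
    # no byTier.pack entry: assumed to use the global 2048 / 60000
```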

`providers`

A list of inference providers. See the providers reference for full per-protocol details.

```yaml
providers:
  - name: "xai-grok"
    task: "chat"
    protocol: "openai"
    url: "https://api.x.ai"
    model: "grok-4-1-fast-reasoning"
    taskPriority: 5
    tier: "cloud"
    auth: true
```

| Key | Type | Required | Meaning |
|---|---|---|---|
| `name` | string | yes | Unique identifier; used to look up API keys |
| `task` | string | yes | One of: `chat`, `embedding`, `transcription`, `tts`, `vision`, `detection` |
| `protocol` | string | yes | Protocol type: `ollama`, `openai`, `vllm`, `anthropic`, `litert`, `cot`, `vision` |
| `url` | string | yes | Base URL: `http://...`, `https://...`, `file:///...`, or `cot://...` |
| `model` | string | yes | Model name passed to the provider |
| `port` | int? | no | TCP port for http(s) URLs |
| `taskPriority` | int | no | Lower wins (default `1`). Determines fallback order among providers for the same task. |
| `tier` | string | no | One of `handheld`, `pack`, `mobile`, `mounted`, `command-post`, `cloud`. Drives lockdown mode filtering, classification ceilings, per-tier budget defaults, hook filters, and fallback grouping. See Tiers and priority. |
| `classification` | string | no | Max data classification this provider may handle (default `UNCLASSIFIED`) |
| `auth` | bool | no | If `true`, look up the API key from the encrypted key store |
| `maxTokens` | int? | no | Override `defaults.maxTokens` |
| `requestTimeoutMs` | long? | no | Override `defaults.requestTimeoutMs` |
| `runtime` | string? | no | For `file://` URLs: `llama.cpp`, `whisper.cpp`, `onnxruntime` |
| `contextSize` | int? | no | LLM context window |
| `threads` | int? | no | CPU threads for native on-device servers |
| `chatTemplatePath` | string? | no | Path to a jinja2 chat template (for tool calling with llama.cpp) |
| `confidence` | float? | no | Vision: detection confidence threshold |
| `inputSize` | int? | no | Vision: input resolution |
| `properties` | map? | no | Free-form key/value passthrough to native server CLI |
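As a second example, here is a sketch of a chat provider served by Ollama on a pack-tier machine. The provider name, LAN address, and model tag are hypothetical. `11434` is Ollama's default listening port. With a `taskPriority` of `2`, this entry would be tried after any priority-`1` chat provider under `fallbackMode: priority`.

```yaml
providers:
  - name: "pack-ollama"            # hypothetical
    task: "chat"
    protocol: "ollama"
    url: "http://192.168.1.50"     # hypothetical LAN address of the pack server
    port: 11434                    # Ollama's default port
    model: "llama3.1:8b"           # hypothetical model tag
    tier: "pack"
    taskPriority: 2
    requestTimeoutMs: 90000        # overrides defaults.requestTimeoutMs for this provider only
```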

Protocol: `litert`

Runs inference in-process with LiteRT-LM, so no `port` is needed. The `url` must point to a `.litertlm` model file on the device.

```yaml
- name: "on-device-gemma4"
  task: "chat"
  protocol: "litert"
  url: "file:///sdcard/atak/custos/models/gemma-4-E2B-it.litertlm"
  model: "gemma-4-E2B-it"
  tier: "handheld"
  contextSize: 16384
  properties:
    backend: "cpu"
```

`properties` keys by runtime

| Key | Applies to | Values | Description |
|---|---|---|---|
| `backend` | `protocol: "litert"` | `"cpu"` (default), `"gpu"`, `"npu"` | LiteRT-LM compute backend. `cpu` is the safe default; `gpu` is verified on the Samsung S26 Ultra (Adreno 840) and recommended once you've validated it on your hardware; `npu` is experimental. |
| `gpu-layers` | `runtime: "llama.cpp"` | Integer string, e.g. `"99"` | Number of layers to offload to Vulkan GPU |
| `reasoning-budget` | `runtime: "llama.cpp"` | `"0"` to disable | Controls extended thinking for models that support it |
| `flash-attn` | `runtime: "llama.cpp"` | `"on"` to enable | Enable flash attention |
| `cache-type-k` | `runtime: "llama.cpp"` | `"q8_0"` recommended | KV cache quantization for keys |
| `cache-type-v` | `runtime: "llama.cpp"` | `"q8_0"` recommended | KV cache quantization for values |
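The sketch below combines the `llama.cpp` keys for an on-device GGUF model. Several parts are assumptions:

- The provider name, model file, and template path are hypothetical.
- `protocol: "openai"` is a guess, based on llama.cpp's server exposing an OpenAI-compatible API. Check the providers reference for the value your build expects.

Property values are written as strings, matching the table.

```yaml
providers:
  - name: "on-device-llamacpp"                                    # hypothetical
    task: "chat"
    protocol: "openai"                                            # ASSUMPTION: verify in the providers reference
    url: "file:///sdcard/atak/custos/models/example-model.gguf"   # hypothetical file
    model: "example-model"                                        # hypothetical
    runtime: "llama.cpp"
    tier: "handheld"
    contextSize: 8192
    threads: 4
    chatTemplatePath: "/sdcard/atak/custos/templates/example.jinja"   # hypothetical; used for tool calling
    properties:
      gpu-layers: "99"         # a value above the model's layer count offloads every layer
      flash-attn: "on"
      cache-type-k: "q8_0"
      cache-type-v: "q8_0"
      reasoning-budget: "0"    # disable extended thinking
```

Flash attention is enabled alongside the quantized KV cache because upstream llama.cpp has historically required flash attention for a quantized V cache.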

`agents`

Specialist agent profiles for LLM-driven delegation. An empty list means the orchestrator handles everything itself.

```yaml
agents:
  - name: "tactical"
    provider: "fast-handheld"
    role: "Tactical Responder"
    goal: "Fast answers about map, markers, navigation, and SA"
    backstory: "Experienced TAK operator focused on speed and brevity"
    skills:
      - "custos.tactical_picture"
      - "custos.markers"
```

| Key | Type | Required | Meaning |
|---|---|---|---|
| `name` | string | yes | Unique identifier; used as the `agent_name` in `delegate` calls |
| `provider` | string | yes | Name of a provider entry to use for this agent's inference |
| `role` | string | no | What the agent IS; used in the composed persona |
| `goal` | string | no | What the agent is TRYING to do |
| `backstory` | string | no | WHY it behaves this way |
| `skills` | list[string] | no | Skill IDs this agent always has available in its context (in addition to whatever the skill selector picks). Use to pre-load the specialist's domain tools. |
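Each agent's `provider` is looked up by provider `name`, so a delegation setup typically pairs a fast local model with a slower, more capable one. The sketch below does this:

- `fast-handheld` is given a hypothetical on-device definition.
- `xai-grok` is the cloud provider shown earlier.
- The `analyst` agent and its text fields are hypothetical, and it preloads no skills.

```yaml
providers:
  - name: "fast-handheld"            # hypothetical definition for the name used above
    task: "chat"
    protocol: "litert"
    url: "file:///sdcard/atak/custos/models/gemma-4-E2B-it.litertlm"
    model: "gemma-4-E2B-it"
    tier: "handheld"
  - name: "xai-grok"
    task: "chat"
    protocol: "openai"
    url: "https://api.x.ai"
    model: "grok-4-1-fast-reasoning"
    tier: "cloud"
    auth: true

agents:
  - name: "tactical"
    provider: "fast-handheld"        # must match a providers[].name
    role: "Tactical Responder"
    goal: "Fast answers about map, markers, navigation, and SA"
    skills:
      - "custos.tactical_picture"
      - "custos.markers"
  - name: "analyst"                  # hypothetical specialist
    provider: "xai-grok"
    role: "Analyst"
    goal: "Slower, more thorough answers to questions that need deeper reasoning"
    backstory: "Deliberate; trades speed for thoroughness"
```

The orchestrator would then delegate with `agent_name` set to `tactical` or `analyst`.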

`rag`

```yaml
rag:
  chunkSize: 512
  chunkOverlap: 64
  embeddingDimensions: 768
```

| Key | Type | Default | Meaning |
|---|---|---|---|
| `chunkSize` | int | `512` | Default chunk size for vector store ingestion |
| `chunkOverlap` | int | `64` | Overlap between chunks |
| `embeddingDimensions` | int | `768` | Embedding vector dimensionality (must match the embedding model) |
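`embeddingDimensions` has to agree with the model that the `embedding` provider runs. The sketch below assumes a hypothetical Ollama embedding provider serving `nomic-embed-text`, which outputs 768-dimensional vectors. If you use a different model, check its output size and set `embeddingDimensions` to match.

```yaml
providers:
  - name: "pack-embed"             # hypothetical
    task: "embedding"
    protocol: "ollama"
    url: "http://192.168.1.50"     # hypothetical LAN address
    port: 11434                    # Ollama's default port
    model: "nomic-embed-text"      # 768-dimensional output
    tier: "pack"

rag:
  chunkSize: 512
  chunkOverlap: 64
  embeddingDimensions: 768         # must equal the embedding model's output size
```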

`delegation`

Controls cross-device delegation over the TAK CoT mesh.

```yaml
delegation:
  allowUpward: true
  allowDownward: false
  maxQueueDepth: 10
  maxConcurrency: 3
  requestTimeoutMs: 60000
  approvalTimeoutMs: 30000
  sessionTtlMs: 600000
```

| Key | Type | Default | Meaning |
|---|---|---|---|
| `allowUpward` | bool | `true` | Accept delegation from lower-tier nodes |
| `allowDownward` | bool | `false` | Accept delegation from higher-tier nodes |
| `maxQueueDepth` | int | `10` | Max queued inbound delegation requests |
| `maxConcurrency` | int | `3` | Max simultaneous delegation sessions |
| `requestTimeoutMs` | long | `60000` | Round-trip timeout for delegated requests |
| `approvalTimeoutMs` | long | `30000` | How long an inbound delegation waits for operator approval |
| `sessionTtlMs` | long | `600000` | Stale session eviction (10 min default) |
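The sketch below is for a hypothetical command-post node that serves several handhelds. It accepts upward delegation and queues more requests. It also limits concurrency, for example because only one local model is available.

One relationship is worth checking, assuming a requester's `requestTimeoutMs` covers the whole round trip, including the receiver's approval wait. Keep each requester's `requestTimeoutMs` longer than the receiver's `approvalTimeoutMs`, as the defaults (60 s vs 30 s) do. Otherwise a request can time out while it is still awaiting approval.

```yaml
# Hypothetical command-post node
delegation:
  allowUpward: true          # accept work from lower-tier nodes
  allowDownward: false       # refuse work pushed down from higher-tier nodes
  maxQueueDepth: 20          # room for bursts from many handhelds
  maxConcurrency: 1          # e.g. one local model serving one session at a time
  requestTimeoutMs: 60000
  approvalTimeoutMs: 30000   # shorter than requesters' requestTimeoutMs (default 60000)
  sessionTtlMs: 600000       # stale-session eviction after 10 minutes
```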

`security`

```yaml
security:
  requirePki: false
  mode: normal
  classificationLevel: "UNCLASSIFIED"
  trustedKeysPath: "/sdcard/atak/custos/keys"
```

| Key | Type | Default | Meaning |
|---|---|---|---|
| `requirePki` | bool | `false` | WIP. Intended to require signed skills; the signature verifier exists but is not yet wired into skill loading. Setting this has no effect in the current release. |
| `mode` | string? | `null` (treated as `normal`) | Lockdown mode: `normal`, `no-cloud`, `field-only`, `squad-only`, `emcon`, or `standalone`. Filters the active provider list to a tier set. See Tiers and priority. |
| `classificationLevel` | string | `"UNCLASSIFIED"` | Operator clearance; caps which providers can be used |
| `trustedKeysPath` | string | `/sdcard/atak/custos/keys` | WIP. Path intended to hold signed-skill verification keys. Not yet consumed. |
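The sketch below switches a node into a lockdown mode. Which tiers each mode keeps is defined under Tiers and priority. The comment assumes, from its name, that `no-cloud` removes `cloud`-tier providers from the active list. `requirePki` and `trustedKeysPath` are left out because they have no effect in the current release.

```yaml
security:
  mode: no-cloud                       # assumed: cloud-tier providers are filtered out
  classificationLevel: "UNCLASSIFIED"  # operator clearance
```

If every `chat` provider in the file is `cloud`-tier, this mode would presumably leave no chat provider active. Keep at least one non-cloud fallback.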

`scheduling`

```yaml
scheduling:
  enabled: true
```

| Key | Type | Default | Meaning |
|---|---|---|---|
| `enabled` | bool | `true` | Master switch for the automation scheduler |

`hooks`

Pre/post tool execution rules. First match wins.

```yaml
hooks:
  rules:
    - event: PreToolUse
      toolPattern: "place_*"
      action: deny
      reason: "EMCON in effect"
    - event: PreToolUse
      toolPattern: "speak_alert"
      action: allow
    - event: PreToolUse
      toolPattern: "send_*"
      whenTier: ["cloud", "command-post"]
      action: require_approval
      reason: "Comms tools require approval when reasoning at remote tiers"
```

Each rule has:

| Key | Type | Required | Meaning |
|---|---|---|---|
| `event` | string | yes | `PreToolUse`, `PostToolUse`, `PostToolUseFailure` |
| `toolPattern` | string | yes | Glob pattern, e.g. `*`, `place_*`, `speak_alert` |
| `action` | string | yes | `allow`, `deny`, `require_approval` |
| `reason` | string | no | Human-readable; shown to operator on `deny`, logged to audit |
| `whenTier` | list[string]? | no | Rule only fires when the active inference tier is in this set |
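Because the first matching rule wins, order matters: a broad rule placed early shadows narrower rules after it. The sketch below uses hypothetical rules. The tool names in the walkthrough after it (`send_message`, `place_marker`) are placeholders.

```yaml
hooks:
  rules:
    # 1. Narrow rule first. Only fires while reasoning on the cloud tier.
    - event: PreToolUse
      toolPattern: "send_*"
      whenTier: ["cloud"]
      action: deny
      reason: "No outbound comms from cloud-tier reasoning"
    # 2. Catch-all: matches every tool name.
    - event: PreToolUse
      toolPattern: "*"
      action: allow
    # 3. Shadowed: rule 2 already matches every tool, so this never fires.
    - event: PreToolUse
      toolPattern: "place_*"
      action: deny
      reason: "EMCON in effect"
```

How the rules resolve:

- `send_message` on the cloud tier hits rule 1 and is denied.
- `send_message` on the handheld tier skips rule 1 (its `whenTier` excludes handheld), so rule 2 allows it.
- `place_marker` is allowed by rule 2 on every tier. To enforce the EMCON deny, move rule 3 above rule 2.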

`memory`

```yaml
memory:
  enabled: true
  maxFactsInContext: 10
```

| Key | Type | Default | Meaning |
|---|---|---|---|
| `enabled` | bool | `true` | Inject persistent memory facts into the system prompt |
| `maxFactsInContext` | int | `10` | Cap on facts auto-injected per turn |

Hot-reload behavior

Most changes take effect automatically on the next inference call or tool execution:

| Change | Effect |
|---|---|
| Add/remove a provider | Picked up on next inference call |
| Change `agent.persona` | Applies to the next message |
| Add a hook rule | Evaluated on next tool call |
| Change `agents` block | Applies immediately |
| Toggle `deferToolLoading` | Applies to the next user message |