Add a Cloud Provider
CUST/OS routes inference through providers: entries in custos.yaml. This guide shows how to add a new chat provider for each supported protocol.
Where to edit
Edit custos.yaml in the in-app Editor panel (Settings > Edit Configuration) or push a new config via your deployment tooling. The config hot-reloads on save -- no plugin restart needed.
Pattern: every chat provider
Every chat provider has the same shape. Only protocol, url, and model change between providers:
providers:
- name: "<unique-id>"
task: "chat"
protocol: "<adapter>"
url: "<endpoint>"
model: "<model-name>"
port: 11434
taskPriority: 1
tier: "cloud"
classification: "UNCLASSIFIED"
auth: false
Ollama (LAN)
- name: "ollama-m4"
task: "chat"
protocol: "ollama"
url: "http://192.168.1.50"
port: 11434
model: "qwen2.5:7b"
taskPriority: 2
tier: "mobile"
Ollama defaults to 127.0.0.1. To expose it on your LAN, configure it to bind 0.0.0.0.
OpenAI-compatible (llama-cpp-python, koboldcpp, etc.)
- name: "local-openai-compat"
task: "chat"
protocol: "openai"
url: "http://10.0.2.2"
port: 8000
model: "Qwen/Qwen2.5-7B-Instruct"
taskPriority: 2
tier: "mounted"
Anything that exposes the OpenAI chat completions API works here: llama-cpp-python, koboldcpp, Oobabooga, and similar servers.
vLLM
vLLM speaks OpenAI-compatible but surfaces reasoning content in a specific shape that the generic adapter does not strip cleanly. Use protocol: vllm for vLLM servers:
- name: "vllm-server"
task: "chat"
protocol: "vllm"
url: "http://10.0.2.2"
port: 8000
model: "RedHatAI/Qwen3-8B-quantized.w4a16"
taskPriority: 2
tier: "mounted"
OpenAI
- name: "openai-gpt"
task: "chat"
protocol: "openai"
url: "https://api.openai.com"
model: "gpt-5"
taskPriority: 5
tier: "cloud"
auth: true
xAI Grok
- name: "xai-grok"
task: "chat"
protocol: "openai"
url: "https://api.x.ai"
model: "grok-4-1-fast-reasoning"
contextSize: 2000000
taskPriority: 5
tier: "cloud"
classification: "UNCLASSIFIED"
auth: true
xAI uses the OpenAI-compatible API, so set protocol: openai with their base URL.
Anthropic Claude
- name: "anthropic-claude"
task: "chat"
protocol: "anthropic"
url: "https://api.anthropic.com"
model: "claude-sonnet-4-6"
taskPriority: 1
tier: "cloud"
classification: "UNCLASSIFIED"
auth: true
Anthropic has its own message format, so it uses a dedicated protocol rather than the OpenAI one.
On-device (LiteRT-LM, Gemma-4-family)
- name: "on-device-gemma4"
task: "chat"
protocol: "litert"
url: "file:///sdcard/atak/custos/models/gemma-4-E2B-it.litertlm"
model: "gemma-4-E2B-it"
tier: "handheld"
taskPriority: 1
contextSize: 16384
threads: 4
properties:
backend: "cpu" # "cpu" (default), "gpu", or "npu"
LiteRT-LM runs in-process and keeps its KV cache warm across reasoning iterations, which makes follow-up turns noticeably faster than the llama.cpp subprocess path. This is the verified-working handheld combo — see Tested models for the matrix. Gemma 4 E2B-it is recommended for 16 GB-RAM devices; E4B-it runs but needs contextSize dropped to 4–8K.
On-device (llama.cpp)
- name: "on-device-llm"
task: "chat"
protocol: "openai"
runtime: "llama.cpp"
url: "file:///sdcard/atak/custos/models/qwen3-4b.gguf"
port: 8411
model: "qwen3-4b"
contextSize: 4096
threads: 4
chatTemplatePath: "/sdcard/atak/custos/models/qwen3-tool-calling.jinja"
requestTimeoutMs: 120000
taskPriority: 100
tier: "handheld"
classification: "CUI"
For file:// URLs, CUST/OS spawns a native inference server on the requested port. See How-to: Add an on-device model for the full procedure.
CoT delegation (split inference over the TAK mesh) — WIP
Status: the transport is wired but has not been vetted end-to-end against a second CUST/OS node. Message framing and response matching may still change. Configure it to experiment, not to depend on it.
- name: "command-post-llm"
task: "chat"
protocol: "cot"
url: "cot://command-post"
model: "remote"
taskPriority: 50
tier: "command-post"
classification: "UNCLASSIFIED"
This packages an inference request as a CoT broadcast and waits for the response from a remote CUST/OS node. Tools stay local -- only the inference call hops the mesh.
API keys
CUST/OS does not read API keys from custos.yaml. Keys are stored in the device's encrypted keystore, keyed by provider name.
To set, change, or remove a key:
- Tap the Status icon in the NavBar.
- Tap the row for the cloud provider you just added. Without a key it'll show as offline.
- The provider detail dialog opens with Set Key (or Change Key if one is already set) and Remove Key buttons.
- Tap Set Key, paste the key, confirm.
The key is encrypted at rest and only decrypted at request time. Within a few seconds the provider's row in Status flips to online.
For unattended / MDM deploys, keys can be written to the encrypted store programmatically. See your deployment guide for the exact procedure.
Verify
After saving the config:
- The Status panel should show your new provider as online within 5 seconds.
- If it shows offline, tap the row to see the error message.
Routing rules
When more than one provider has task: chat and is online, CUST/OS picks the one with the lowest taskPriority. Ties go to whoever was added first.
If the preferred provider fails repeatedly, the router falls back to the next-priority provider until the first one recovers.
A note on tier
taskPriority is the ordinal half of provider selection -- what order should I try them in? tier is the categorical half -- what kind of trust and reachability environment is this provider in?
Security modes filter the active provider list by tier. For example, mode: emcon restricts to handheld and pack only, while mode: standalone restricts to handheld only. Set tier correctly on every provider you add.
Available tiers: handheld, pack, mobile, mounted, command-post, cloud.