Provider Protocols Reference
A "provider" is a single entry in the providers: list of custos.yaml. Each provider uses a protocol: to determine how CUST/OS communicates with it. This page documents every supported protocol and the fields each one needs.
For the complete field schema, see the configuration reference. For the list of specific model + provider combinations that have been verified end-to-end vs. known-broken vs. code-ready-but-untested, see tested models.
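As a rough orientation, a custos.yaml with a single provider might look like the sketch below. The providers: list and the defaults: block are both named on this page, and the values are reused from its examples. The exact top-level layout is an assumption, so check the configuration reference before copying it.

```yaml
# Sketch only: top-level layout assumed from this page, not the full schema.
defaults:
  maxTokens: 2048           # documented default
  requestTimeoutMs: 60000   # documented default

providers:
  # One list entry per provider; protocol selects how CUST/OS talks to it.
  - name: "ollama-m4"
    task: "chat"
    protocol: "ollama"
    url: "http://192.168.1.50"
    port: 11434
    model: "qwen2.5:7b"
    taskPriority: 2
    tier: "mobile"
```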
Protocols at a glance
| Protocol | Used for | URL scheme |
|---|---|---|
| ollama | LAN Ollama servers | http://, https:// |
| openai | Anything that speaks the OpenAI chat completions API (llama-cpp-python, koboldcpp, OpenAI, xAI, etc.) and on-device llama.cpp | http://, https://, file:// |
| vllm | vLLM servers (separate adapter; vLLM's reasoning-content format is not OpenAI-compatible) | http://, https:// |
| anthropic | Anthropic Claude API | https:// |
| litert | LiteRT-LM in-process on-device inference | file:// |
| vision | Vision server (ONNX-based detection) | http://, file:// |
| cot | Cross-device delegation over the TAK CoT mesh (WIP) | cot:// |
Tasks at a glance
| Task | Supported protocols |
|---|---|
| chat | ollama, openai, vllm, anthropic, litert, cot |
| embedding | ollama, openai |
| transcription | openai (Whisper-compatible API) |
| tts | openai (network TTS endpoints), or no provider (falls back to Android TTS) |
| vision / detection | vision |
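The task: field selects which row of this table a provider serves. A minimal sketch of an embedding provider on the ollama protocol, assuming a LAN Ollama host. The provider name and the model nomic-embed-text are illustrative choices, not verified combinations:

```yaml
# Hypothetical embedding provider; name and model are illustrative.
- name: "ollama-embed"
  task: "embedding"
  protocol: "ollama"          # embedding is listed for ollama and openai
  url: "http://192.168.1.50"
  port: 11434
  model: "nomic-embed-text"   # assumed to be available on the Ollama host
  tier: "mobile"
```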
protocol: ollama
Native Ollama API. Use this when pointing at a LAN Ollama instance.
- name: "ollama-m4"
  task: "chat"
  protocol: "ollama"
  url: "http://192.168.1.50"
  port: 11434
  model: "qwen2.5:7b"
  taskPriority: 2
  tier: "mobile"
Required: name, task, protocol, url, model. Default Ollama port is 11434.
protocol: openai
OpenAI-compatible chat completions API. This is the most common protocol; use it for OpenAI, xAI, llama-cpp-python, koboldcpp, Oobabooga, and on-device llama.cpp.
Cloud / LAN example
- name: "openai-gpt"
  task: "chat"
  protocol: "openai"
  url: "https://api.openai.com"
  model: "gpt-5"
  taskPriority: 5
  tier: "cloud"
  auth: true
Self-hosted OpenAI-compatible example
- name: "local-openai-compat"
  task: "chat"
  protocol: "openai"
  url: "http://10.0.2.2"
  port: 8000
  model: "Qwen/Qwen2.5-7B-Instruct"
  taskPriority: 2
  tier: "mounted"
protocol: vllm
vLLM servers expose an OpenAI-compatible API, but they surface reasoning content in a specific shape (a reasoning_content field or <think> tags inside content) that the generic openai adapter does not strip cleanly. Use protocol: vllm when your server is actually vLLM:
- name: "vllm-server"
  task: "chat"
  protocol: "vllm"
  url: "http://10.0.2.2"
  port: 8000
  model: "RedHatAI/Qwen3-8B-quantized.w4a16"
  taskPriority: 2
  tier: "mounted"
Use protocol: openai for every other OpenAI-compatible server.
On-device llama.cpp
For on-device inference, keep protocol: openai and point url: at a file:// model path with a runtime: hint. CUST/OS spawns a native llama-server subprocess on the specified port.
- name: "on-device-llm"
  task: "chat"
  protocol: "openai"
  runtime: "llama.cpp"
  url: "file:///sdcard/atak/custos/models/qwen3-4b.gguf"
  port: 8411
  model: "qwen3-4b"
  contextSize: 4096
  threads: 4
  chatTemplatePath: "/sdcard/atak/custos/models/qwen3-tool-calling.jinja"
  requestTimeoutMs: 120000
  taskPriority: 100
  tier: "handheld"
Required for on-device: url, port, model, chatTemplatePath, and runtime unless it can be auto-detected.
The runtime: field accepts llama.cpp, whisper.cpp, or onnxruntime. If omitted, it is auto-detected from the file extension (.gguf = llama.cpp, .bin for transcription = whisper.cpp, .onnx = onnxruntime).
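The same file:// pattern plausibly extends to the other runtimes. The sketch below assumes an on-device transcription provider served through the openai protocol (the only protocol listed for transcription) with runtime: whisper.cpp. The provider name, model file, and port are hypothetical, and this page does not confirm the combination, so treat it as a starting point for experimentation:

```yaml
# Hypothetical on-device transcription entry; not a verified combination.
- name: "on-device-whisper"
  task: "transcription"
  protocol: "openai"
  runtime: "whisper.cpp"      # matches the .bin-for-transcription detection rule
  url: "file:///sdcard/atak/custos/models/ggml-base.en.bin"  # illustrative path
  port: 8412                  # illustrative port
  model: "ggml-base.en"
  tier: "handheld"
```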
protocol: anthropic
Anthropic Claude API. Has its own message format, so it uses a dedicated protocol rather than the OpenAI-compatible one.
- name: "anthropic-claude"
  task: "chat"
  protocol: "anthropic"
  url: "https://api.anthropic.com"
  model: "claude-sonnet-4-6"
  taskPriority: 1
  tier: "cloud"
  auth: true
Required: name, task, protocol, url, model, auth: true.
protocol: litert
LiteRT-LM in-process on-device inference. There is no HTTP subprocess: the model runs in-process and keeps its KV cache warm across reasoning iterations, which significantly reduces follow-up latency. The verified-working handheld combination is Gemma 4 E2B-it on LiteRT-LM on a Samsung S26 Ultra.
- name: "on-device-gemma4"
  task: "chat"
  protocol: "litert"
  url: "file:///sdcard/atak/custos/models/gemma-4-E2B-it.litertlm"
  model: "gemma-4-E2B-it"
  tier: "handheld"
  taskPriority: 1
  contextSize: 16384
  threads: 4
  properties:
    backend: "cpu"
Required: name, task, protocol, url, model.
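The properties block in this entry selects the inference backend. A sketch of the same provider with the GPU backend chosen instead; every other field is copied from the example above, and whether the GPU path is stable depends on your device and LiteRT-LM build:

```yaml
- name: "on-device-gemma4"
  task: "chat"
  protocol: "litert"
  url: "file:///sdcard/atak/custos/models/gemma-4-E2B-it.litertlm"
  model: "gemma-4-E2B-it"
  tier: "handheld"
  taskPriority: 1
  contextSize: 16384
  threads: 4
  properties:
    backend: "gpu"   # validate on your specific device before relying on it
```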
The backend property accepts:
- "cpu": default, safe on any device.
- "gpu": verified numerically stable on Samsung S26 Ultra (Adreno 840) with our custom LiteRT-LM build. Validate on your specific device before relying on it.
- "npu": experimental Hexagon / dedicated NPU path. Not verified.
protocol: vision
On-device vision server for object detection (ONNX Runtime).
- name: "on-device-detection"
  task: "detection"
  protocol: "vision"
  runtime: "onnxruntime"
  url: "file:///sdcard/atak/custos/models/yolo11m.onnx"
  port: 8413
  model: "yolo11m"
  threads: 4
  confidence: 0.15
  inputSize: 640
  taskPriority: 1
  tier: "handheld"
  properties:
    labels: "/sdcard/atak/custos/models/labels.txt"
| Field | Meaning |
|---|---|
| confidence | Detection confidence threshold (0-1) |
| inputSize | Model input dimensions (square) |
| threads | CPU threads for inference |
| properties | Free-form passthrough converted to CLI flags for the native server |
protocol: cot — WIP
Status: the transport is wired but has not been vetted end-to-end against a second CUST/OS node. Message framing and response matching may still change. Configure it to experiment with, not to depend on.
Delegation over the TAK CoT mesh. Packages an inference request as a CoT broadcast and waits for the response from a remote CUST/OS node.
- name: "command-post-llm"
  task: "chat"
  protocol: "cot"
  url: "cot://command-post"
  model: "remote"
  taskPriority: 50
  tier: "command-post"
The url is a label (cot://<hostname>); routing is handled by the TAK mesh. Tools stay local on each side; only the inference call hops the mesh.
Per-provider override fields
Any of these defaults: values may be overridden per provider:
| Override | Default source |
|---|---|
| maxTokens | defaults.maxTokens (2048) |
| requestTimeoutMs | defaults.requestTimeoutMs (60000) |
| healthCheckIntervalMs | defaults.healthCheckIntervalMs (5000) |
| sampleRate | defaults.sampleRate (16000) |
| maxRecordingDurationMs | defaults.maxRecordingDurationMs (30000) |
| minRecordingDurationMs | defaults.minRecordingDurationMs (500) |
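To illustrate, the sketch below changes two limits for a single provider while leaving the global defaults alone. It assumes override fields sit directly on the provider entry, as requestTimeoutMs does in the on-device llama.cpp example. The override values themselves are arbitrary:

```yaml
defaults:
  maxTokens: 2048
  requestTimeoutMs: 60000

providers:
  - name: "ollama-m4"
    task: "chat"
    protocol: "ollama"
    url: "http://192.168.1.50"
    port: 11434
    model: "qwen2.5:7b"
    tier: "mobile"
    maxTokens: 1024            # overrides defaults.maxTokens for this provider only
    requestTimeoutMs: 120000   # overrides defaults.requestTimeoutMs for this provider only
```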
Tiers
Every provider should have a tier: value that reflects its physical and trust environment:
| Tier | What lives here |
|---|---|
| handheld | Models running on the operator's own device |
| pack | On-person compute connected via wired link (USB, Ethernet). Physically carried by the operator. No RF emissions. |
| mobile | A companion compute device the squad carries |
| mounted | A vehicle-mounted compute platform |
| command-post | A back-line workstation reachable over the TAK mesh |
| cloud | An internet-hosted provider |
See Tiers and priority for details on how tiers drive lockdown modes, classification, and fallback.
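A configuration often lists the same task at more than one tier. A minimal sketch combining two entries from this page, one handheld and one cloud. How tier and taskPriority decide between them is covered in Tiers and priority and is not implied by the ordering here:

```yaml
providers:
  # Runs on the operator's own device.
  - name: "on-device-llm"
    task: "chat"
    protocol: "openai"
    runtime: "llama.cpp"
    url: "file:///sdcard/atak/custos/models/qwen3-4b.gguf"
    port: 8411
    model: "qwen3-4b"
    chatTemplatePath: "/sdcard/atak/custos/models/qwen3-tool-calling.jinja"
    taskPriority: 100
    tier: "handheld"

  # Internet-hosted provider for the same task.
  - name: "openai-gpt"
    task: "chat"
    protocol: "openai"
    url: "https://api.openai.com"
    model: "gpt-5"
    taskPriority: 5
    tier: "cloud"
    auth: true
```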
API keys
Cloud providers with auth: true need an API key stored in the encrypted key store. Keys are never read from custos.yaml.
To set, change, or remove a key: open the Status panel from the NavBar, tap the provider's row, and use the Set Key / Change Key / Remove Key buttons in the provider detail dialog. The key is encrypted at rest and only decrypted at request time.
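In practice this means a cloud entry carries only the auth: true flag and no key material. A sketch of an xAI provider on the openai protocol, assuming the provider name, url, and model shown here (all illustrative); the key itself would be entered through the Status panel as described above:

```yaml
# Hypothetical cloud entry: there is deliberately no key field in the file.
- name: "xai-grok"
  task: "chat"
  protocol: "openai"
  url: "https://api.x.ai"
  model: "grok-4"    # illustrative model name
  tier: "cloud"
  auth: true         # the key comes from the encrypted key store, not custos.yaml
```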