Provider Protocols Reference

A "provider" is a single entry in the providers: list of custos.yaml. Each provider's protocol: field determines how CUST/OS communicates with it. This page documents every supported protocol and the fields each one needs.

For the complete field schema, see the configuration reference. For which specific model and provider combinations are verified end-to-end, known to be broken, or code-ready but untested, see tested models.

Protocols at a glance

Protocol  | Used for                                                                                                                          | URL schemes
ollama    | LAN Ollama servers                                                                                                                | http://, https://
openai    | Anything that speaks the OpenAI chat completions API (llama-cpp-python, koboldcpp, OpenAI, xAI, etc.), plus on-device llama.cpp | http://, https://, file://
vllm      | vLLM servers (separate adapter, because vLLM's reasoning-content format is not OpenAI-compatible)                               | http://, https://
anthropic | Anthropic Claude API                                                                                                              | https://
litert    | LiteRT-LM in-process on-device inference                                                                                          | file://
vision    | Vision server (ONNX-based detection)                                                                                              | http://, file://
cot       | Cross-device delegation over the TAK CoT mesh (WIP)                                                                               | cot://
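
The URL scheme column can double as a pre-flight check on custos.yaml. The sketch below is hypothetical (ALLOWED_SCHEMES and check_scheme are illustrative names, not part of CUST/OS) and uses only the standard library's urllib.parse:

```python
from urllib.parse import urlparse

# Allowed URL schemes per protocol, transcribed from the table above.
# Hypothetical helper, not part of CUST/OS.
ALLOWED_SCHEMES = {
    "ollama": {"http", "https"},
    "openai": {"http", "https", "file"},
    "vllm": {"http", "https"},
    "anthropic": {"https"},
    "litert": {"file"},
    "vision": {"http", "file"},
    "cot": {"cot"},
}


def check_scheme(protocol: str, url: str) -> bool:
    """Return True if the url's scheme is allowed for the given protocol."""
    allowed = ALLOWED_SCHEMES.get(protocol)
    if allowed is None:
        raise ValueError(f"unknown protocol: {protocol!r}")
    return urlparse(url).scheme in allowed


print(check_scheme("anthropic", "https://api.anthropic.com"))  # True
print(check_scheme("litert", "http://192.168.1.50"))  # False
```

An unknown protocol raises ValueError instead of returning False, so a typo in protocol: surfaces as an error rather than a silent mismatch.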

Tasks at a glance

Task               | Supported protocols
chat               | ollama, openai, vllm, anthropic, litert, cot
embedding          | ollama, openai
transcription      | openai (Whisper-compatible API)
tts                | openai (network TTS endpoints); with no tts provider configured, falls back to Android TTS
vision / detection | vision
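
The same table can be expressed as a lookup for validating a providers: list. This is a hypothetical sketch (TASK_PROTOCOLS, check_task, and uses_android_tts are illustrative names, not CUST/OS code); vllm is listed under chat because the vllm example on this page uses task: "chat".

```python
# Protocols accepted for each task, transcribed from the table above.
# Hypothetical helper, not part of CUST/OS.
TASK_PROTOCOLS = {
    "chat": {"ollama", "openai", "vllm", "anthropic", "litert", "cot"},
    "embedding": {"ollama", "openai"},
    "transcription": {"openai"},
    "tts": {"openai"},
    "vision": {"vision"},
    "detection": {"vision"},
}


def check_task(provider: dict) -> bool:
    """Return True if the provider's protocol supports its task."""
    return provider["protocol"] in TASK_PROTOCOLS.get(provider["task"], set())


def uses_android_tts(providers: list[dict]) -> bool:
    """True when no tts provider is configured (the Android TTS fallback)."""
    return not any(p["task"] == "tts" for p in providers)
```

An unrecognized task maps to an empty set, so check_task returns False for it rather than raising.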

protocol: ollama

Native Ollama API. Use this when pointing at a LAN Ollama instance.

- name: "ollama-m4"
  task: "chat"
  protocol: "ollama"
  url: "http://192.168.1.50"
  port: 11434
  model: "qwen2.5:7b"
  taskPriority: 2
  tier: "mobile"

Required: name, task, protocol, url, model. Default Ollama port is 11434.
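
The url and port fields are separate in custos.yaml. The sketch below shows one way to combine them into a base URL, assuming (not confirmed on this page) that the 11434 default applies whenever port: is omitted. base_url and DEFAULT_PORTS are hypothetical names.

```python
from urllib.parse import urlsplit

# Only the Ollama default port is stated on this page; others are unknown.
DEFAULT_PORTS = {"ollama": 11434}


def base_url(provider: dict) -> str:
    """Join url and port into one base URL (hypothetical helper)."""
    url = provider["url"].rstrip("/")
    if urlsplit(url).port is not None:
        return url  # the url already carries an explicit port
    port = provider.get("port", DEFAULT_PORTS.get(provider["protocol"]))
    return url if port is None else f"{url}:{port}"


print(base_url({"protocol": "ollama", "url": "http://192.168.1.50"}))
# http://192.168.1.50:11434
```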

protocol: openai

OpenAI-compatible chat completions API. This is the most common protocol: use it for OpenAI, xAI, llama-cpp-python, koboldcpp, Oobabooga, and on-device llama.cpp.

Cloud example

- name: "openai-gpt"
  task: "chat"
  protocol: "openai"
  url: "https://api.openai.com"
  model: "gpt-5"
  taskPriority: 5
  tier: "cloud"
  auth: true

Self-hosted OpenAI-compatible example

- name: "local-openai-compat"
  task: "chat"
  protocol: "openai"
  url: "http://10.0.2.2"
  port: 8000
  model: "Qwen/Qwen2.5-7B-Instruct"
  taskPriority: 2
  tier: "mounted"

protocol: vllm

vLLM servers expose an OpenAI-compatible API, but they surface reasoning content in their own shape (a reasoning_content field, or <think> tags inside content) that the generic openai adapter does not strip cleanly. Use protocol: vllm when your server is actually vLLM:

- name: "vllm-server"
  task: "chat"
  protocol: "vllm"
  url: "http://10.0.2.2"
  port: 8000
  model: "RedHatAI/Qwen3-8B-quantized.w4a16"
  taskPriority: 2
  tier: "mounted"

Use protocol: openai for every other OpenAI-compatible server.
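
To illustrate the difference, here is a hypothetical sketch of the normalization a vLLM-aware adapter has to perform: prefer a separate reasoning_content field, and strip any inline <think>...</think> blocks from content so that only the answer remains. It is not the CUST/OS adapter; split_reasoning is an illustrative name, and the input dict follows the OpenAI chat completions message shape.

```python
import re

# Matches one <think>...</think> block, including newlines inside it.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(message: dict) -> tuple[str, str]:
    """Return (reasoning, answer) from an OpenAI-style assistant message.

    Hypothetical sketch: reasoning_content is used when present, and any
    inline <think> blocks are moved out of content into the reasoning text.
    """
    content = message.get("content") or ""
    reasoning = message.get("reasoning_content") or ""
    inline = [block.strip() for block in THINK_RE.findall(content)]
    if inline:
        reasoning = "\n".join(filter(None, [reasoning, *inline]))
        content = THINK_RE.sub("", content)
    return reasoning, content.strip()
```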

On-device llama.cpp (protocol: openai)

For on-device inference, keep protocol: openai, point url: at a local model file (file://), and add a runtime: hint. CUST/OS spawns a native llama-server subprocess that listens on the specified port.

- name: "on-device-llm"
  task: "chat"
  protocol: "openai"
  runtime: "llama.cpp"
  url: "file:///sdcard/atak/custos/models/qwen3-4b.gguf"
  port: 8411
  model: "qwen3-4b"
  contextSize: 4096
  threads: 4
  chatTemplatePath: "/sdcard/atak/custos/models/qwen3-tool-calling.jinja"
  requestTimeoutMs: 120000
  taskPriority: 100
  tier: "handheld"

Required for on-device: url, port, model, chatTemplatePath, and runtime (runtime can be left out when it is auto-detected from the file extension, as described next).
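
For orientation, the on-device example above maps roughly onto a llama-server command line like the one this hypothetical helper builds. The flags (--model, --port, --jinja, --chat-template-file, --ctx-size, --threads) are public llama.cpp llama-server options, but the exact arguments CUST/OS passes are not documented on this page, so treat the result as an approximation.

```python
from urllib.parse import urlparse


def llama_server_argv(provider: dict) -> list[str]:
    """Approximate llama-server argv for an on-device provider (hypothetical)."""
    argv = [
        "llama-server",
        "--model", urlparse(provider["url"]).path,  # file:///x.gguf -> /x.gguf
        "--port", str(provider["port"]),
        "--jinja",  # Jinja template engine; llama.cpp needs it for tool calling
        "--chat-template-file", provider["chatTemplatePath"],
    ]
    # Optional tuning fields from the YAML example.
    if "contextSize" in provider:
        argv += ["--ctx-size", str(provider["contextSize"])]
    if "threads" in provider:
        argv += ["--threads", str(provider["threads"])]
    return argv
```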

The runtime: field accepts llama.cpp, whisper.cpp, or onnxruntime. If omitted, it is auto-detected from the file extension: .gguf selects llama.cpp, .bin selects whisper.cpp (transcription providers only), and .onnx selects onnxruntime.
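
The auto-detection rule above is simple enough to sketch. detect_runtime is a hypothetical helper; returning None to mean "cannot infer, set runtime: explicitly" is an assumption about how the unmatched case is handled.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse


def detect_runtime(url: str, task: str) -> Optional[str]:
    """Infer the runtime from a model file extension (hypothetical helper)."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    if suffix == ".gguf":
        return "llama.cpp"
    if suffix == ".onnx":
        return "onnxruntime"
    if suffix == ".bin" and task == "transcription":
        return "whisper.cpp"
    return None  # not inferable: runtime: must be set explicitly
```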

protocol: anthropic

Anthropic Claude API. It has its own message format, so it uses a dedicated protocol rather than openai.

- name: "anthropic-claude"
  task: "chat"
  protocol: "anthropic"
  url: "https://api.anthropic.com"
  model: "claude-sonnet-4-6"
  taskPriority: 1
  tier: "cloud"
  auth: true

Required: name, task, protocol, url, model, auth: true.

protocol: litert

LiteRT-LM in-process on-device inference. There is no HTTP subprocess: the model runs in-process and keeps its KV cache warm across reasoning iterations, which significantly reduces follow-up latency. The verified-working handheld combination is Gemma 4 E2B-it on LiteRT-LM on a Samsung S26 Ultra.

- name: "on-device-gemma4"
  task: "chat"
  protocol: "litert"
  url: "file:///sdcard/atak/custos/models/gemma-4-E2B-it.litertlm"
  model: "gemma-4-E2B-it"
  tier: "handheld"
  taskPriority: 1
  contextSize: 16384
  threads: 4
  properties:
    backend: "cpu"

Required: name, task, protocol, url, model.
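
The Required: lines for ollama, anthropic, and litert can be collected into one check. This is a hypothetical sketch (REQUIRED and missing_fields are illustrative names); falling back to the common five fields for any other protocol is an assumption.

```python
# Required fields per protocol, as listed on this page (hypothetical helper).
BASE = ["name", "task", "protocol", "url", "model"]
REQUIRED = {
    "ollama": BASE,
    "anthropic": BASE + ["auth"],
    "litert": BASE,
}


def missing_fields(provider: dict) -> list[str]:
    """List required fields that are absent for this provider's protocol."""
    required = REQUIRED.get(provider.get("protocol"), BASE)  # fallback assumed
    missing = [field for field in required if field not in provider]
    # anthropic needs auth: true, not just any auth value.
    if "auth" in required and "auth" in provider and provider["auth"] is not True:
        missing.append("auth (must be true)")
    return missing
```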

The backend property accepts:

  • "cpu" — default, safe on any device.
  • "gpu" — verified numerically stable on Samsung S26 Ultra (Adreno 840) with our custom LiteRT-LM build. Validate on your specific device before relying on it.
  • "npu" — experimental Hexagon / dedicated NPU path. Not verified.

protocol: vision

On-device vision server for object detection (ONNX Runtime).

- name: "on-device-detection"
  task: "detection"
  protocol: "vision"
  runtime: "onnxruntime"
  url: "file:///sdcard/atak/custos/models/yolo11m.onnx"
  port: 8413
  model: "yolo11m"
  threads: 4
  confidence: 0.15
  inputSize: 640
  taskPriority: 1
  tier: "handheld"
  properties:
    labels: "/sdcard/atak/custos/models/labels.txt"

Field      | Meaning
confidence | Detection confidence threshold, from 0 to 1
inputSize  | Square model input size in pixels (640 means 640x640)
threads    | CPU threads for inference
properties | Free-form passthrough, converted to CLI flags for the native server
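
The properties row says the map is passed through as CLI flags, but the exact flag syntax is not specified here. A hypothetical sketch, assuming a "--key value" form where true becomes a bare switch and false or null is dropped (properties_to_flags is an illustrative name):

```python
def properties_to_flags(properties: dict) -> list[str]:
    """Turn a properties: map into argv flags (assumed --key value form)."""
    flags: list[str] = []
    for key, value in properties.items():
        flag = "--" + key
        if value is True:
            flags.append(flag)  # assumption: true -> bare switch
        elif value is False or value is None:
            continue  # assumption: false/null -> flag omitted
        else:
            flags.extend([flag, str(value)])
    return flags
```

With the vision example above, this yields ["--labels", "/sdcard/atak/custos/models/labels.txt"].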

protocol: cot (WIP)

Status: the transport is wired but has not been vetted end-to-end against a second CUST/OS node. Message framing and response matching may still change. Configure to experiment, not to depend on.

Delegates inference over the TAK CoT mesh: CUST/OS packages an inference request as a CoT broadcast and waits for the response from a remote CUST/OS node.

- name: "command-post-llm"
  task: "chat"
  protocol: "cot"
  url: "cot://command-post"
  model: "remote"
  taskPriority: 50
  tier: "command-post"

The url is only a label (cot://<hostname>); routing is handled by the TAK mesh. Tools stay local on each side; only the inference call crosses the mesh.
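
Because the url is only a label, the one piece worth extracting from it is the hostname. A hypothetical helper (cot_label is an illustrative name) using urllib.parse, which splits out the //netloc part for any scheme, including cot:

```python
from urllib.parse import urlparse


def cot_label(url: str) -> str:
    """Return the <hostname> label from a cot:// url (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.scheme != "cot" or not parsed.netloc:
        raise ValueError(f"expected cot://<hostname>, got {url!r}")
    return parsed.netloc
```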


Per-provider override fields

Any of the following values from the defaults: section can be overridden on an individual provider:

Override               | Default source (built-in value)
maxTokens              | defaults.maxTokens (2048)
requestTimeoutMs       | defaults.requestTimeoutMs (60000)
healthCheckIntervalMs  | defaults.healthCheckIntervalMs (5000)
sampleRate             | defaults.sampleRate (16000)
maxRecordingDurationMs | defaults.maxRecordingDurationMs (30000)
minRecordingDurationMs | defaults.minRecordingDurationMs (500)
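
The table implies three layers: the built-in value in parentheses, the defaults: section, and the per-provider field. A hypothetical merge in that order (DEFAULTS and effective_settings are illustrative names, not CUST/OS code):

```python
# Built-in values from the table above (hypothetical helper, not CUST/OS code).
DEFAULTS = {
    "maxTokens": 2048,
    "requestTimeoutMs": 60000,
    "healthCheckIntervalMs": 5000,
    "sampleRate": 16000,
    "maxRecordingDurationMs": 30000,
    "minRecordingDurationMs": 500,
}


def effective_settings(defaults_section: dict, provider: dict) -> dict:
    """Resolve overridable values: built-in < defaults: section < provider."""
    merged = dict(DEFAULTS)
    for layer in (defaults_section, provider):
        # Only overridable keys are merged; other provider fields are ignored.
        merged.update({k: v for k, v in layer.items() if k in DEFAULTS})
    return merged


settings = effective_settings(
    {"maxTokens": 4096},
    {"name": "on-device-llm", "requestTimeoutMs": 120000},
)
print(settings["maxTokens"], settings["requestTimeoutMs"])  # 4096 120000
```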

Tiers

Every provider should have a tier: value that reflects its physical and trust environment:

Tier         | What lives here
handheld     | Models running on the operator's own device
pack         | On-person compute connected via a wired link (USB, Ethernet). Physically carried by the operator; no RF emissions.
mobile       | A companion compute device the squad carries
mounted      | A vehicle-mounted compute platform
command-post | A back-line workstation reachable over the TAK mesh
cloud        | An internet-hosted provider

See Tiers and priority for details on how tiers drive lockdown modes, classification, and fallback.

API keys

Cloud providers with auth: true need an API key stored in the encrypted key store. Keys are never read from custos.yaml.

To set, change, or remove a key: open the Status panel from the NavBar, tap the provider's row, and use the Set Key / Change Key / Remove Key buttons in the provider detail dialog. The key is encrypted at rest and only decrypted at request time.
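
For context on what happens at request time, the decrypted key ends up in a protocol-specific HTTP header. The helper below is hypothetical (auth_headers is an illustrative name); the key is passed in as an argument, standing in for the value decrypted from the key store, and the header names follow the public OpenAI and Anthropic HTTP APIs rather than a documented CUST/OS interface.

```python
def auth_headers(protocol: str, api_key: str) -> dict[str, str]:
    """Build HTTP auth headers for a key decrypted at request time (hypothetical)."""
    if protocol == "anthropic":
        # Anthropic Messages API: key header plus a required API version header.
        return {"x-api-key": api_key, "anthropic-version": "2023-06-01"}
    if protocol == "openai":
        # OpenAI-compatible cloud APIs (OpenAI, xAI, ...) use a bearer token.
        return {"Authorization": f"Bearer {api_key}"}
    raise ValueError(f"no auth scheme sketched for protocol {protocol!r}")
```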

See also