Provider Protocols Reference

A "provider" is a single entry in the providers: list of custos.yaml. Each provider's protocol: field determines how CUST/OS communicates with it. This page documents every supported protocol and the fields each one needs.

For the complete field schema, see the configuration reference. For the list of specific model and provider combinations that are verified end-to-end, known-broken, or code-ready but untested, see tested models.

Protocols at a glance

Protocol	Used for	URL scheme
`ollama`	LAN Ollama servers	`http://`, `https://`
`openai`	Anything that speaks the OpenAI chat completions API (llama-cpp-python, koboldcpp, OpenAI, xAI, etc.) and on-device llama.cpp	`http://`, `https://`, `file://`
`vllm`	vLLM servers (separate adapter — vLLM's reasoning-content format is not OpenAI-compatible)	`http://`, `https://`
`anthropic`	Anthropic Claude API	`https://`
`litert`	LiteRT-LM in-process on-device inference	`file://`
`vision`	Vision server (ONNX-based detection)	`http://`, `file://`
`cot`	Cross-device delegation over the TAK CoT mesh (WIP)	`cot://`

Tasks at a glance

Task	Supported protocols
`chat`	ollama, openai, vllm, anthropic, litert, cot
`embedding`	ollama, openai
`transcription`	openai (Whisper-compatible API)
`tts`	openai (network TTS endpoints), or no provider (falls back to Android TTS)
`vision` / `detection`	vision
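
As a rough sketch of how non-chat tasks map onto provider entries, the fragment below declares an embedding provider and a network TTS provider. It only uses fields that appear in the examples on this page; the provider names and the model identifiers (nomic-embed-text, tts-1) are assumptions for illustration, not verified combinations.

```yaml
providers:
  # Embeddings served by a LAN Ollama instance (assumed model name).
  - name: "ollama-embed"
    task: "embedding"
    protocol: "ollama"
    url: "http://192.168.1.50"
    port: 11434
    model: "nomic-embed-text"
    tier: "mobile"

  # Network TTS over an OpenAI-compatible endpoint (assumed model name).
  # If no tts provider is configured, CUST/OS falls back to Android TTS.
  - name: "openai-tts"
    task: "tts"
    protocol: "openai"
    url: "https://api.openai.com"
    model: "tts-1"
    tier: "cloud"
    auth: true
```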

`protocol: ollama`

Native Ollama API. Use this when pointing at a LAN Ollama instance.

- name: "ollama-m4"
  task: "chat"
  protocol: "ollama"
  url: "http://192.168.1.50"
  port: 11434
  model: "qwen2.5:7b"
  taskPriority: 2
  tier: "mobile"

Required: name, task, protocol, url, model. Default Ollama port is 11434.

`protocol: openai`

OpenAI-compatible chat completions API. The most common protocol — use it for OpenAI, xAI, llama-cpp-python, koboldcpp, Oobabooga, and on-device llama.cpp.

Cloud example

- name: "openai-gpt"
  task: "chat"
  protocol: "openai"
  url: "https://api.openai.com"
  model: "gpt-5"
  taskPriority: 5
  tier: "cloud"
  auth: true

Self-hosted OpenAI-compatible example

- name: "local-openai-compat"
  task: "chat"
  protocol: "openai"
  url: "http://10.0.2.2"
  port: 8000
  model: "Qwen/Qwen2.5-7B-Instruct"
  taskPriority: 2
  tier: "mounted"

`protocol: vllm`

vLLM servers speak the OpenAI-compatible API, but they surface reasoning content in a specific shape (a reasoning_content field or <think> tags inside content) that the generic openai adapter does not strip cleanly. Use protocol: vllm when your server is actually vLLM:

- name: "vllm-server"
  task: "chat"
  protocol: "vllm"
  url: "http://10.0.2.2"
  port: 8000
  model: "RedHatAI/Qwen3-8B-quantized.w4a16"
  taskPriority: 2
  tier: "mounted"

Use protocol: openai for every other OpenAI-compatible server.

On-device llama.cpp (protocol: openai)

For on-device inference, keep protocol: openai, point url: at a file:// model path, and add a runtime: hint. CUST/OS spawns a native llama-server subprocess on the specified port.

- name: "on-device-llm"
  task: "chat"
  protocol: "openai"
  runtime: "llama.cpp"
  url: "file:///sdcard/atak/custos/models/qwen3-4b.gguf"
  port: 8411
  model: "qwen3-4b"
  contextSize: 4096
  threads: 4
  chatTemplatePath: "/sdcard/atak/custos/models/qwen3-tool-calling.jinja"
  requestTimeoutMs: 120000
  taskPriority: 100
  tier: "handheld"

Required for on-device: runtime, url, port, model, chatTemplatePath.

The runtime: field accepts llama.cpp, whisper.cpp, or onnxruntime. If omitted, it is auto-detected from the file extension: .gguf maps to llama.cpp, .bin (for transcription) to whisper.cpp, and .onnx to onnxruntime.
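
The same on-device pattern could plausibly be applied to transcription. The sketch below assumes a whisper.cpp model takes the same url, port, and runtime fields as the llama.cpp example; the provider name, file name, model label, and port number are hypothetical, and chatTemplatePath is left out on the assumption that it only applies to chat models. Check the configuration reference before relying on it.

```yaml
- name: "on-device-whisper"      # hypothetical provider name
  task: "transcription"
  protocol: "openai"             # transcription goes through the Whisper-compatible API
  runtime: "whisper.cpp"         # set explicitly; a .bin file used for transcription maps to the same runtime
  url: "file:///sdcard/atak/custos/models/ggml-base.en.bin"   # assumed file name
  port: 8412                     # assumed free port, not a documented default
  model: "ggml-base.en"          # assumed model label
  threads: 4
  tier: "handheld"
```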

`protocol: anthropic`

Anthropic Claude API. Anthropic has its own message format, so it uses a dedicated protocol rather than the OpenAI-compatible one.

- name: "anthropic-claude"
  task: "chat"
  protocol: "anthropic"
  url: "https://api.anthropic.com"
  model: "claude-sonnet-4-6"
  taskPriority: 1
  tier: "cloud"
  auth: true

Required: name, task, protocol, url, model, auth: true.

`protocol: litert`

LiteRT-LM in-process on-device inference. There is no HTTP subprocess: the model runs in-process and keeps its KV cache warm across reasoning iterations, which significantly reduces follow-up latency. The verified-working handheld combination is Gemma 4 E2B-it on LiteRT-LM on a Samsung S26 Ultra.

- name: "on-device-gemma4"
  task: "chat"
  protocol: "litert"
  url: "file:///sdcard/atak/custos/models/gemma-4-E2B-it.litertlm"
  model: "gemma-4-E2B-it"
  tier: "handheld"
  taskPriority: 1
  contextSize: 16384
  threads: 4
  properties:
    backend: "cpu"

Required: name, task, protocol, url, model.

The backend property accepts:

"cpu" — default, safe on any device.
"gpu" — verified numerically stable on Samsung S26 Ultra (Adreno 840) with our custom LiteRT-LM build. Validate on your specific device before relying on it.
"npu" — experimental Hexagon / dedicated NPU path. Not verified.

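Switching backends only changes the properties block. As a sketch, the entry below is the LiteRT-LM example from this section with backend set to "gpu"; the provider name is hypothetical, and the GPU path should be validated on the target device as noted above.

```yaml
- name: "on-device-gemma4-gpu"   # hypothetical provider name
  task: "chat"
  protocol: "litert"
  url: "file:///sdcard/atak/custos/models/gemma-4-E2B-it.litertlm"
  model: "gemma-4-E2B-it"
  tier: "handheld"
  taskPriority: 1
  contextSize: 16384
  properties:
    backend: "gpu"               # "cpu" is the default; "npu" is experimental and not verified
```
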
`protocol: vision`

On-device vision server for object detection (ONNX Runtime).

- name: "on-device-detection"
  task: "detection"
  protocol: "vision"
  runtime: "onnxruntime"
  url: "file:///sdcard/atak/custos/models/yolo11m.onnx"
  port: 8413
  model: "yolo11m"
  threads: 4
  confidence: 0.15
  inputSize: 640
  taskPriority: 1
  tier: "handheld"
  properties:
    labels: "/sdcard/atak/custos/models/labels.txt"

Field	Meaning
`confidence`	Detection confidence threshold (0 to 1); detections scoring below it are discarded
`inputSize`	Model input size in pixels (square; 640 means 640×640)
`threads`	CPU threads for inference
`properties`	Free-form passthrough converted to CLI flags for the native server

`protocol: cot` — WIP

Status: the transport is wired but has not been vetted end-to-end against a second CUST/OS node. Message framing and response matching may still change. Configure it for experimentation, not for anything you depend on.

Delegation over the TAK CoT mesh. Packages an inference request as a CoT broadcast and waits for the response from a remote CUST/OS node.

- name: "command-post-llm"
  task: "chat"
  protocol: "cot"
  url: "cot://command-post"
  model: "remote"
  taskPriority: 50
  tier: "command-post"

The url is a label (cot://<hostname>); routing is handled by the TAK mesh. Tools stay local on each side; only the inference call hops the mesh.

Per-provider override fields

Any of these defaults: values may be overridden per provider:

Override	Default source
`maxTokens`	`defaults.maxTokens` (2048)
`requestTimeoutMs`	`defaults.requestTimeoutMs` (60000)
`healthCheckIntervalMs`	`defaults.healthCheckIntervalMs` (5000)
`sampleRate`	`defaults.sampleRate` (16000)
`maxRecordingDurationMs`	`defaults.maxRecordingDurationMs` (30000)
`minRecordingDurationMs`	`defaults.minRecordingDurationMs` (500)
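
A minimal sketch of how an override sits next to the shared defaults, assuming defaults: is a top-level key alongside providers: (the exact layout is in the configuration reference). The provider entry is the on-device llama.cpp example from this page, which raises requestTimeoutMs above the default; the maxTokens override value is illustrative only.

```yaml
defaults:
  maxTokens: 2048
  requestTimeoutMs: 60000

providers:
  - name: "on-device-llm"
    task: "chat"
    protocol: "openai"
    runtime: "llama.cpp"
    url: "file:///sdcard/atak/custos/models/qwen3-4b.gguf"
    port: 8411
    model: "qwen3-4b"
    chatTemplatePath: "/sdcard/atak/custos/models/qwen3-tool-calling.jinja"
    requestTimeoutMs: 120000   # overrides defaults.requestTimeoutMs for this provider only
    maxTokens: 1024            # illustrative override value
    tier: "handheld"
```

Providers that do not set one of these fields keep the value from defaults:.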

Tiers

Every provider should have a tier: value that reflects its physical and trust environment:

Tier	What lives here
`handheld`	Models running on the operator's own device
`pack`	On-person compute connected via wired link (USB, Ethernet). Physically carried by the operator. No RF emissions.
`mobile`	A companion compute device the squad carries
`mounted`	A vehicle-mounted compute platform
`command-post`	A back-line workstation reachable over the TAK mesh
`cloud`	An internet-hosted provider

See Tiers and priority for details on how tiers drive lockdown modes, classification, and fallback.

API keys

Cloud providers with auth: true need an API key stored in the encrypted key store. Keys are never read from custos.yaml.

To set, change, or remove a key: open the Status panel from the NavBar, tap the provider's row, and use the Set Key / Change Key / Remove Key buttons in the provider detail dialog. The key is encrypted at rest and only decrypted at request time.