Add a Cloud Provider

CUST/OS routes inference through providers, which are entries in custos.yaml. This guide shows how to add a new chat provider for each supported protocol.

Where to edit

Edit custos.yaml in the in-app Editor panel (Settings > Edit Configuration) or push a new config via your deployment tooling. The config hot-reloads on save -- no plugin restart needed.

Pattern: every chat provider

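Every chat provider has the same shape. protocol, url, and model always change between providers; port, taskPriority, tier, classification, and auth are set as needed, and some entries add optional fields such as contextSize or threads: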

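providers:
  - name: "<unique-id>"          # API keys are stored under this name
    task: "chat"
    protocol: "<adapter>"        # adapters used in this guide: ollama, openai, vllm, anthropic, litert, cot
    url: "<endpoint>"            # http(s)://, file://, or cot:// in the examples below
    model: "<model-name>"
    port: 11434                  # 11434 is Ollama's default; the hosted-API examples below omit port
    taskPriority: 1              # lower values are preferred by the router
    tier: "cloud"                # see "A note on tier"
    classification: "UNCLASSIFIED"
    auth: false                  # the hosted-API examples below set this to true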

Ollama (LAN)

  - name: "ollama-m4"
    task: "chat"
    protocol: "ollama"
    url: "http://192.168.1.50"
    port: 11434
    model: "qwen2.5:7b"
    taskPriority: 2
    tier: "mobile"

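Ollama binds to 127.0.0.1 by default, so it is reachable only from its own machine. To expose it on your LAN, set OLLAMA_HOST=0.0.0.0 on the Ollama host and restart Ollama so it listens on all interfaces, then point url at that machine's LAN address as in the example above.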

OpenAI-compatible (llama-cpp-python, koboldcpp, etc.)

  - name: "local-openai-compat"
    task: "chat"
    protocol: "openai"
    url: "http://10.0.2.2"
    port: 8000
    model: "Qwen/Qwen2.5-7B-Instruct"
    taskPriority: 2
    tier: "mounted"

Anything that exposes the OpenAI chat completions API works here: llama-cpp-python, koboldcpp, Oobabooga, and similar servers.

vLLM

vLLM speaks the OpenAI-compatible API but returns reasoning content in a specific shape that the generic openai adapter does not strip cleanly. Use protocol: vllm for vLLM servers:

  - name: "vllm-server"
    task: "chat"
    protocol: "vllm"
    url: "http://10.0.2.2"
    port: 8000
    model: "RedHatAI/Qwen3-8B-quantized.w4a16"
    taskPriority: 2
    tier: "mounted"

OpenAI

  - name: "openai-gpt"
    task: "chat"
    protocol: "openai"
    url: "https://api.openai.com"
    model: "gpt-5"
    taskPriority: 5
    tier: "cloud"
    auth: true

xAI Grok

  - name: "xai-grok"
    task: "chat"
    protocol: "openai"
    url: "https://api.x.ai"
    model: "grok-4-1-fast-reasoning"
    contextSize: 2000000
    taskPriority: 5
    tier: "cloud"
    classification: "UNCLASSIFIED"
    auth: true

xAI uses the OpenAI-compatible API, so set protocol: openai with their base URL.

Anthropic Claude

  - name: "anthropic-claude"
    task: "chat"
    protocol: "anthropic"
    url: "https://api.anthropic.com"
    model: "claude-sonnet-4-6"
    taskPriority: 1
    tier: "cloud"
    classification: "UNCLASSIFIED"
    auth: true

Anthropic has its own message format, so it uses a dedicated protocol rather than the OpenAI one.

On-device (LiteRT-LM, Gemma-4-family)

  - name: "on-device-gemma4"
    task: "chat"
    protocol: "litert"
    url: "file:///sdcard/atak/custos/models/gemma-4-E2B-it.litertlm"
    model: "gemma-4-E2B-it"
    tier: "handheld"
    taskPriority: 1
    contextSize: 16384
    threads: 4
    properties:
      backend: "cpu"     # "cpu" (default), "gpu", or "npu"

LiteRT-LM runs in-process and keeps its KV cache warm across reasoning iterations, which makes follow-up turns noticeably faster than the llama.cpp subprocess path. This is the verified-working handheld combination; see Tested models for the matrix. Gemma 4 E2B-it is recommended for devices with 16 GB of RAM. E4B-it also runs, but needs contextSize reduced to roughly 4K to 8K tokens (for example, 4096 or 8192).
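
As a sketch of the E4B-it adjustment, the entry below reuses the E2B example and lowers contextSize. The provider name, model file name, and model id are hypothetical placeholders that mirror the E2B entry, not verified values. 8192 is just one value in the stated range, not a tested recommendation.

```yaml
  - name: "on-device-gemma4-e4b"   # hypothetical provider name
    task: "chat"
    protocol: "litert"
    # Assumed file name, patterned on the E2B entry; use the path of your actual model file.
    url: "file:///sdcard/atak/custos/models/gemma-4-E4B-it.litertlm"
    model: "gemma-4-E4B-it"        # assumed model id
    tier: "handheld"
    taskPriority: 1
    contextSize: 8192              # reduced from the 16384 used for E2B-it
    threads: 4
    properties:
      backend: "cpu"
```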

On-device (llama.cpp)

  - name: "on-device-llm"
    task: "chat"
    protocol: "openai"
    runtime: "llama.cpp"
    url: "file:///sdcard/atak/custos/models/qwen3-4b.gguf"
    port: 8411
    model: "qwen3-4b"
    contextSize: 4096
    threads: 4
    chatTemplatePath: "/sdcard/atak/custos/models/qwen3-tool-calling.jinja"
    requestTimeoutMs: 120000
    taskPriority: 100
    tier: "handheld"
    classification: "CUI"

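Because this entry combines runtime: llama.cpp with a file:// URL, CUST/OS spawns a native inference server on the requested port (8411 here) and reaches it through the openai adapter. The LiteRT-LM entry above also uses a file:// URL, but it runs in-process and sets no port. See How-to: Add an on-device model for the full procedure.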

CoT delegation (split inference over the TAK mesh) — WIP

Status: the transport is wired but has not been vetted end-to-end against a second CUST/OS node. Message framing and response matching may still change. Configure it to experiment, not to depend on it.

  - name: "command-post-llm"
    task: "chat"
    protocol: "cot"
    url: "cot://command-post"
    model: "remote"
    taskPriority: 50
    tier: "command-post"
    classification: "UNCLASSIFIED"

This packages an inference request as a CoT (Cursor on Target) broadcast and waits for the response from a remote CUST/OS node. Tools stay local -- only the inference call hops the mesh.

API keys

CUST/OS does not read API keys from custos.yaml. Keys are stored in the device's encrypted keystore, keyed by provider name.

To set, change, or remove a key:

  1. Tap the Status icon in the NavBar.
  2. Tap the row for the cloud provider you just added. Without a key it'll show as offline.
  3. The provider detail dialog opens with Set Key (or Change Key if one is already set) and Remove Key buttons.
  4. Tap Set Key, paste the key, confirm.

The key is encrypted at rest and only decrypted at request time. Within a few seconds the provider's row in Status flips to online.

For unattended / MDM deploys, keys can be written to the encrypted store programmatically. See your deployment guide for the exact procedure.

Verify

After saving the config:

  1. The Status panel should show your new provider as online within 5 seconds.
  2. If it shows offline, tap the row to see the error message. A cloud provider without a key shows as offline until you set one (see API keys).

Routing rules

When more than one provider has task: chat and is online, CUST/OS picks the one with the lowest taskPriority. Ties go to the provider that was added first.

If the preferred provider fails repeatedly, the router falls back to the next-priority provider until the first one recovers.
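
To illustrate these rules, here is a sketch that places two entries from this guide in one providers list. The comments describe the selection the rules above would produce, assuming both providers are online and have their keys set.

```yaml
providers:
  - name: "anthropic-claude"
    task: "chat"
    protocol: "anthropic"
    url: "https://api.anthropic.com"
    model: "claude-sonnet-4-6"
    taskPriority: 1        # lowest value among online chat providers, so it is picked first
    tier: "cloud"
    classification: "UNCLASSIFIED"
    auth: true

  - name: "openai-gpt"
    task: "chat"
    protocol: "openai"
    url: "https://api.openai.com"
    model: "gpt-5"
    taskPriority: 5        # next in line: used if anthropic-claude fails repeatedly
    tier: "cloud"
    auth: true
```

If anthropic-claude is offline, openai-gpt is the only online chat provider and is selected directly.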

A note on tier

taskPriority is the ordinal half of provider selection -- what order should I try them in? tier is the categorical half -- what kind of trust and reachability environment is this provider in?

Security modes filter the active provider list by tier. For example, mode: emcon restricts to handheld and pack only, while mode: standalone restricts to handheld only. Set tier correctly on every provider you add.

Available tiers: handheld, pack, mobile, mounted, command-post, cloud.
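
The fragment below sketches how tier interacts with the two modes named above, using provider names from this guide. Most fields are omitted for brevity, so it is an annotated fragment and not a complete config. Where mode is set is not covered here, so it appears only in comments.

```yaml
providers:
  - name: "on-device-gemma4"
    tier: "handheld"       # kept under mode: emcon and mode: standalone
    taskPriority: 1

  - name: "ollama-m4"
    tier: "mobile"         # filtered out under both emcon and standalone
    taskPriority: 2

  - name: "anthropic-claude"
    tier: "cloud"          # filtered out under both emcon and standalone
    taskPriority: 1
```

Under either mode only on-device-gemma4 remains, regardless of taskPriority. If a mode permits all three tiers, on-device-gemma4 and anthropic-claude tie at taskPriority 1 and the tie goes to whichever was added first.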