Provider Setup
Providers connect your agents to large language models. Each provider plugin handles API authentication, streaming, tool calling format differences, and thinking/reasoning support so your agent config stays clean.
RivetOS ships with five provider plugins:
| Provider | Models | Thinking Support | Notes |
|---|---|---|---|
| Anthropic | Claude Opus, Sonnet, Haiku | ✅ | Extended thinking, OAuth login |
| xAI | Grok 3, Grok 4 | ✅ | Responses API, conversation caching |
| Gemini 2.5 Pro, Flash | ✅ | Thought signatures for function calling | |
| Ollama | Any local model | — | Local inference, no API key needed |
| llama.cpp server | Any model served by llama-server | — | Local llama-server binary (native sampling, |
Anthropic (Claude)
Section titled “Anthropic (Claude)”1. Get an API Key
Section titled “1. Get an API Key”- Go to the Anthropic Console
- Sign up or log in
- Go to API Keys → Create Key
- Copy the key (starts with
sk-ant-)
Alternatively, use OAuth login (no API key needed):
npx rivetos loginThis opens a browser, authenticates with Anthropic, and stores tokens locally. The provider auto-detects OAuth tokens vs API keys.
2. Configure
Section titled “2. Configure”Add your key to .env:
ANTHROPIC_API_KEY=sk-ant-...your-key-hereAdd to config.yaml:
providers: anthropic: model: claude-sonnet-4-20250514 max_tokens: 8192
agents: myagent: provider: anthropic default_thinking: mediumConfig Options
Section titled “Config Options”| Key | Type | Default | Description |
|---|---|---|---|
model | string | claude-opus-4-6 | Model identifier |
max_tokens | number | 8192 | Maximum output tokens |
api_key | string | ${ANTHROPIC_API_KEY} | API key. Use env var |
base_url | string | https://api.anthropic.com | API endpoint (for proxies) |
token_path | string | — | Path to OAuth token file (set automatically by rivetos login) |
Thinking Levels
Section titled “Thinking Levels”When default_thinking is set on the agent, the provider requests extended thinking with a token budget:
| Level | Budget | Best For |
|---|---|---|
off | — | Simple questions, fast responses |
low | 2,000 tokens | Light reasoning |
medium | 10,000 tokens | Code review, planning |
high | 50,000 tokens | Complex architecture, deep analysis |
Models
Section titled “Models”| Model | Speed | Intelligence | Context |
|---|---|---|---|
claude-opus-4-6 | Slow | Highest | 200K |
claude-sonnet-4-20250514 | Fast | High | 200K |
claude-haiku-3-5-20241022 | Fastest | Good | 200K |
Docs: Anthropic API Reference
xAI (Grok)
Section titled “xAI (Grok)”1. Get an API Key
Section titled “1. Get an API Key”- Go to console.x.ai
- Sign up or log in
- Create an API key
- Copy the key (starts with
xai-)
2. Configure
Section titled “2. Configure”Add your key to .env:
XAI_API_KEY=xai-...your-key-hereAdd to config.yaml:
providers: xai: model: grok-4-1-fast-reasoning
agents: grok: provider: xaiConfig Options
Section titled “Config Options”| Key | Type | Default | Description |
|---|---|---|---|
model | string | grok-4.20-reasoning | Model identifier |
api_key | string | ${XAI_API_KEY} | API key |
base_url | string | https://api.x.ai/v1 | API endpoint |
temperature | number | — | Sampling temperature (not used with reasoning models) |
store | boolean | true | Server-side conversation storage. When enabled, only new messages are sent each turn |
timeout_ms | number | 3600000 | Request timeout in milliseconds (default: 1 hour for reasoning) |
Conversation Caching
Section titled “Conversation Caching”When store: true (default), xAI stores the conversation server-side. Each turn only sends new messages, reducing token usage and latency. The provider manages previous_response_id automatically.
Models
Section titled “Models”| Model | Type | Notes |
|---|---|---|
grok-4.20-reasoning | Flagship | 2M context, fast + agentic, $2.00/$6.00 per M tokens |
grok-4-1-fast-reasoning | Fast | 10x cheaper ($0.20/$0.50), good for compaction/fallback |
Docs: xAI API Documentation
Google (Gemini)
Section titled “Google (Gemini)”1. Get an API Key
Section titled “1. Get an API Key”- Go to Google AI Studio
- Click Create API Key
- Select or create a Google Cloud project
- Copy the key
2. Configure
Section titled “2. Configure”Add your key to .env:
GOOGLE_API_KEY=AIza...your-key-hereAdd to config.yaml:
providers: google: model: gemini-2.5-pro
agents: gemini: provider: google default_thinking: mediumConfig Options
Section titled “Config Options”| Key | Type | Default | Description |
|---|---|---|---|
model | string | gemini-2.5-pro | Model identifier |
api_key | string | ${GOOGLE_API_KEY} | API key |
max_tokens | number | 8192 | Maximum output tokens |
base_url | string | https://generativelanguage.googleapis.com/v1beta | API endpoint |
Thinking Levels
Section titled “Thinking Levels”| Level | Budget |
|---|---|
off | 0 |
low | 1,024 tokens |
medium | 8,192 tokens |
high | 32,768 tokens |
Models
Section titled “Models”| Model | Speed | Context | Notes |
|---|---|---|---|
gemini-2.5-pro | Medium | 1M | Best reasoning |
gemini-2.5-flash | Fast | 1M | Good balance of speed and quality |
Docs: Gemini API Documentation
Ollama (Local Models)
Section titled “Ollama (Local Models)”Ollama runs models locally on your machine. No API key needed, no usage costs — just hardware.
1. Install Ollama
Section titled “1. Install Ollama”# Linuxcurl -fsSL https://ollama.com/install.sh | sh
# macOSbrew install ollama
# Or download from https://ollama.com/download2. Pull a Model
Section titled “2. Pull a Model”ollama pull qwen2.5:32bBrowse available models at ollama.com/library.
3. Configure
Section titled “3. Configure”No .env needed — Ollama runs locally without authentication.
providers: ollama: model: qwen2.5:32b base_url: http://localhost:11434
agents: local: provider: ollama local: true # Extended context (tokens are free)Config Options
Section titled “Config Options”| Key | Type | Default | Description |
|---|---|---|---|
model | string | llama3.1 | Model name (must be pulled via ollama pull) |
base_url | string | http://localhost:11434 | Ollama API endpoint |
temperature | number | 0.7 | Sampling temperature |
top_p | number | 0.9 | Nucleus sampling threshold |
num_ctx | number | model default | Context window size in tokens |
keep_alive | string | 30m | How long to keep model loaded in memory |
- Set
local: trueon the agent — this includes extended workspace context (CAPABILITIES.md, daily notes) since tokens are free with local inference. num_ctxis critical for tool-using agents. Most models default to 2048-4096 tokens, which isn’t enough. Set8192or higher.keep_alivecontrols how long the model stays in VRAM after the last request. Set to0to unload immediately, or24hto keep it warm.- Remote Ollama: If Ollama runs on a different machine, change
base_urlto point at it (e.g.,http://192.0.2.50:11434).
Docs: Ollama API Documentation
llama.cpp server (Local)
Section titled “llama.cpp server (Local)”The native provider for llama-server — the built-in HTTP server from the llama.cpp project.
It uses the native /completion and /infill endpoints (not the OpenAI compat layer). This gives full access to llama.cpp sampling parameters (typical_p, mirostat, repeat_last_n, seed, etc.), native <think> / <thinking> tag support, and lenient JSON tool-call parsing.
1. Install & Run llama-server
Section titled “1. Install & Run llama-server”# Build from source (recommended for latest features)git clone https://github.com/ggerganov/llama.cppcd llama.cppmake -j server
# Or use prebuilt binaries from https://github.com/ggerganov/llama.cpp/releases
# Run with a model (adjust -m, --host, --port)./llama-server -m models/Meta-Llama-3.1-70B-Instruct-Q4_K_M.gguf \ --host 0.0.0.0 --port 8080 \ -c 32768 --n-gpu-layers 992. Configure
Section titled “2. Configure”providers: local: provider_type: llama-server # or just use the default "llama-server" base_url: http://localhost:8080 model: llama3.1:70b # any model name your server knows num_ctx: 32768 typical_p: 0.9 repeat_last_n: 64 mirostat: 2 mirostat_tau: 5.0 seed: 42
agents: local: provider: local local: trueConfig Options
Section titled “Config Options”| Key | Type | Default | Description |
|---|---|---|---|
base_url | string | http://localhost:8080 | Must point to your llama-server (no /v1) |
model | string | default | Model alias or path known to the server |
num_ctx | number | 8192 | Context window (matches server -c) |
temperature | number | 0.7 | Sampling temperature |
top_p | number | 0.9 | Nucleus sampling |
typical_p | number | 0.9 | Locally typical sampling (llama.cpp specific) |
repeat_penalty | number | 1.1 | Repetition penalty |
repeat_last_n | number | 64 | Last N tokens to consider for repetition |
mirostat | number | 0 | 0=off, 1=Mirostat v1, 2=v2 |
mirostat_tau | number | 5.0 | Target surprise value |
mirostat_eta | number | 0.1 | Learning rate for Mirostat |
seed | number | -1 | Random seed (-1 = random) |
first_chunk_timeout_ms | number | 120000 | Timeout for first token |
chunk_timeout_ms | number | 30000 | Timeout between tokens |
Note: This provider is llama.cpp-specific. It talks directly to the native llama-server endpoints (not the OpenAI-compat layer). A future generic openai provider is planned for OpenRouter, Together, Fireworks, vLLM, etc.
Fallback Chains
Section titled “Fallback Chains”When a provider fails (429 rate limit, 503 overloaded, timeout), RivetOS can automatically try the next provider in a fallback chain.
Configure at the agent level:
agents: opus: provider: anthropic fallbacks: - "google:gemini-2.5-pro" - "xai:grok-4-1-fast-reasoning"Or globally:
runtime: fallbacks: - providerId: anthropic fallbacks: - "google:gemini-2.5-pro" - "xai:grok-4-1-fast-reasoning"Format: provider_id uses the provider’s default model, provider_id:model overrides the model.
Checking Provider Health
Section titled “Checking Provider Health”# Run provider connectivity checksnpx rivetos doctor
# Smoke test — send a test message to each providernpx rivetos test
# Check which providers are loadednpx rivetos statusNext Steps
Section titled “Next Steps”- Channel Setup — Connect your agents to Discord, Telegram, voice
- Configuration Reference — Full option tables for all config sections
- Plugin Development — Build your own provider plugin