This playbook shows how to co-locate LiteLLM Proxy with OpenClaw on a remote Mac, expose stable model aliases, attach breaker and token budgets, and return concise failure summaries to callers. Pair it with the gateway token patterns in OpenClaw plus LangGraph tool nodes, reuse breaker discipline from JSON Schema tools, timeouts, and retries, and align cost signals with OpenTelemetry GenAI observability on Mac when you export traces.
Where teams feel friction first
Secrets sprawl. When every agent reads provider keys from laptops, rotation becomes heroic and audits fail. Keep keys on the Mac in files or keychain references that only the LiteLLM service user can read.
Vendor strings in prompts. Hard-coding provider model ids in graphs guarantees churn. Aliases let you swap vendors after an outage without touching every manifest.
No shared breaker language. Without explicit thresholds and budgets, one flaky route starves the machine. Centralize limits in LiteLLM and let OpenClaw skills translate errors into operator-ready summaries.
Gateway versus proxy responsibilities
| Concern | OpenClaw gateway | LiteLLM Proxy |
|---|---|---|
| Authentication to agents | Dashboard tokens with narrow scopes such as invoke plus health read | Master keys stay server-side; agents never see upstream secrets |
| Model selection | Stable route names exposed to skills | Aliases map to provider-specific endpoints and fallbacks |
| Resilience | Schema validation, correlation ids, merged alerts | Breaker windows, RPM ceilings, structured provider errors |
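
The "Model selection" row can be illustrated with a small lookup that maps a stable alias to an ordered list of upstream targets. This is a minimal sketch, not LiteLLM's own routing code: the `AliasRoute` type, the `ROUTES` table, the `resolve` helper, and the `vendor-a/...` ids are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AliasRoute:
    """One stable alias and the ordered upstream targets behind it."""
    alias: str
    targets: tuple  # first entry is the primary, the rest are fallbacks


# Hypothetical alias table; the upstream ids are placeholders, not real model names.
ROUTES = {
    "router.default": AliasRoute("router.default", ("vendor-a/large", "vendor-b/large")),
    "draft.fast": AliasRoute("draft.fast", ("vendor-a/small",)),
}


def resolve(alias, unhealthy=frozenset()):
    """Return the first target for an alias that is not marked unhealthy."""
    route = ROUTES.get(alias)  # exact, case-sensitive match
    if route is None:
        raise KeyError(f"alias not found: {alias}")
    for target in route.targets:
        if target not in unhealthy:
            return target
    raise RuntimeError(f"no healthy target for alias: {alias}")
```

Because agents only ever pass the alias, swapping a vendor means editing the table in one place. The lookup is deliberately case-sensitive, which is one way the "case drift" mentioned under troubleshooting can surface.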
Reproducible deploy and configuration
1. Isolate users and ports. Run LiteLLM under its own macOS user or label, bind to 127.0.0.1:4000 (example), and keep OpenClaw on a different loopback port documented beside your runbook.
2. Register providers once. Add each vendor key through LiteLLM admin flows, then define model_list entries that include timeouts, max tokens, and optional region hints for Apple Silicon-friendly batch sizes.
3. Publish aliases. Create friendly names such as router.default or draft.fast that point to router groups or single models. Agents call only those names through the gateway.
4. Budgets and breakers. Set per-tenant RPM, TPM, and concurrent request caps. Configure breaker thresholds so repeated 5xx or latency spikes open the circuit and return a typed summary instead of hanging sockets.
5. Failure summary contract. Standardize JSON fields like route, provider_family, retry_after_ms, and correlation_id. Strip raw bodies at the gateway before anything reaches a browser or ticket.
6. OpenClaw integration. Point skills at the gateway base URL, attach bearer tokens from OPENCLAW_TOKEN_FILE, and forward X-Correlation-Id from thread metadata. Never reuse admin dashboard tokens for runtime agents.
7. Smoke tests. Curl each alias with tiny prompts, confirm budgets emit throttling when you burst traffic, and snapshot openclaw doctor output after changes.
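The failure summary contract from step 5 can be sketched as an allow-list filter: only the named fields survive, and anything else in the upstream error (including the raw body) is dropped. The field names come from the step above; the `summarize_failure` helper and the shape of the `upstream_error` dict are assumptions made for illustration.

```python
import json

# Fields named in the failure summary contract; nothing else is forwarded.
SUMMARY_FIELDS = ("route", "provider_family", "retry_after_ms")


def summarize_failure(upstream_error, correlation_id):
    """Build a caller-safe summary from an upstream error dict.

    Keys outside SUMMARY_FIELDS (for example a raw provider body) are
    discarded; missing keys are reported as None.
    """
    summary = {name: upstream_error.get(name) for name in SUMMARY_FIELDS}
    summary["correlation_id"] = correlation_id
    return summary


if __name__ == "__main__":
    # Hypothetical upstream error, including a raw body that must not leak.
    error = {
        "route": "router.default",
        "provider_family": "vendor-a",
        "retry_after_ms": 1500,
        "body": "<raw provider response>",
    }
    print(json.dumps(summarize_failure(error, "corr-123"), indent=2))
```

An allow-list is used rather than deleting known-bad keys so that new provider fields stay hidden by default.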
Document every alias, port, and token file path in your internal wiki so on-call engineers can diff configs during incidents. When you add a new provider, stage it behind a canary alias first, watch breaker metrics for an hour, then promote the alias in OpenClaw manifests during a maintenance window.
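To make the breaker behaviour from step 4 concrete, here is a minimal consecutive-failure breaker with a fixed cool-down. It is a sketch of the general pattern, not LiteLLM's implementation: the `CircuitBreaker` class, its thresholds, and the injected `clock` are assumptions, and it counts only failures rather than latency spikes.

```python
import time


class CircuitBreaker:
    """Consecutive-failure breaker with a fixed cool-down (illustrative values)."""

    def __init__(self, failure_threshold=5, cooldown_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self._clock = clock      # injectable so tests can control time
        self._failures = 0
        self._opened_at = None   # None means the circuit is closed

    def allow(self):
        """Return True if a request may be sent upstream."""
        if self._opened_at is None:
            return True
        # Once the cool-down has elapsed, probes are allowed again; the next
        # recorded failure re-opens the circuit, a success closes it.
        return self._clock() - self._opened_at >= self.cooldown_s

    def record_success(self):
        """Reset the failure count and close the circuit."""
        self._failures = 0
        self._opened_at = None

    def record_failure(self):
        """Count a failure and open the circuit at the threshold."""
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
```

When `allow()` returns False, the caller can return a typed failure summary immediately instead of holding the socket open.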
# Example curl (loopback; replace port, alias, and token paths)
curl -sS http://127.0.0.1:4000/v1/chat/completions \
  -H "Authorization: Bearer $(cat ~/.litellm/proxy.key)" \
  -H "Content-Type: application/json" \
  -d '{"model":"router.default","messages":[{"role":"user","content":"ping"}]}'
Troubleshooting checklist
- 502 loops between gateway and proxy. Verify both processes listen on the documented loopback addresses, check reverse SSH tunnels if you develop remotely, and ensure health probes hit the same URL agents use.
- 429 storms after deploy. Compare LiteLLM budget counters with OpenClaw retry policies, and lower client retries when the breaker is open so you do not amplify throttling.
- Alias not found. Reload the LiteLLM config through its supported signal, confirm the alias spelling in OpenClaw manifests, and grep logs for case drift.
- Silent cost spikes. Enable per-model usage logs, attach stable tenant_id labels, and reconcile against the observability matrix fields for token integers.
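The advice on 429 storms can be sketched as a client-side retry decision that stops retrying when the breaker is open and otherwise honours any `retry_after_ms` hint. This is an illustration under stated assumptions: `next_retry_delay_ms` and its defaults are hypothetical, and the `breaker_open` flag is an invented field that is not part of the failure summary contract described earlier.

```python
def next_retry_delay_ms(status, attempt, summary, max_attempts=3, base_ms=250, cap_ms=8000):
    """Return the delay in ms before the next retry, or None to stop.

    status  -- HTTP status of the failed call
    attempt -- number of retries already made (0 for the first failure)
    summary -- failure summary dict; `breaker_open` is a hypothetical flag
    """
    if attempt >= max_attempts:
        return None
    if summary.get("breaker_open"):
        # Retrying against an open breaker only amplifies throttling.
        return None
    if status == 429 or status >= 500:
        backoff = min(cap_ms, base_ms * (2 ** attempt))  # capped exponential backoff
        hinted = summary.get("retry_after_ms") or 0
        return max(hinted, backoff)  # never retry sooner than the server hint
    return None  # other statuses are treated as non-retryable
```

Keeping the decision in one pure function makes it easy to compare client retry limits against the proxy's budget counters during an incident.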
FAQ
Can I skip the gateway and call LiteLLM from tools directly? You can, but you lose centralized auth, schema gates, and consistent failure summaries. Keep LiteLLM server-side and expose only gateway routes.
How often should keys rotate? At least quarterly for shared secrets, sooner after personnel changes. Rotate LiteLLM master keys and OpenClaw tokens in the same change window to avoid half-updated stacks.
Why rent hardware for this stack? Laptops sleep, thermals throttle, and desktops mix personal apps with inference. A dedicated remote Mac mini M4 node gives stable clocks, predictable networking, and isolation that approximates production for routing drills.