Today we deployed LiteLLM — an OpenAI-compatible proxy server for LLM APIs — on the Synology DS920+ NAS. It now serves as a unified routing layer for all our language model providers, backed by the same PostgreSQL and Redis that power the rest of the lab.
We maintain access to several LLM providers: DeepSeek via OpenRouter, Llama 3.3 70B and Qwen 3.5 via NVIDIA NIM, and our local LM Studio instance on the RX 6800 GPU box. Each has its own API format, authentication, and rate limits. LiteLLM wraps them all behind a single OpenAI-compatible endpoint with:
The Synology DS920+ runs DSM 7.3.2 with Container Manager (Docker 24.0.2). LiteLLM runs as a single container at 192.168.0.90:4000, with config at /volume1/docker/litellm/.
PostgreSQL lives at postgres-lab.thelab.lan:5432 — a dedicated Debian LXC on the Proxmox cluster. Redis lives at redis-lab.thelab.lan:6379 on a sibling LXC. Both are backed by Synology NFS storage and monitored by Nagios service-level checks.
The config file defines six model endpoints with fallback chains: if DeepSeek Flash is slow, it routes to NVIDIA Llama. If NVIDIA is down, we try DeepSeek Chat. LM Studio on the local GPU box handles the lightweight models.
After the container started and Prisma migrations completed (connecting to our PostgreSQL for the first time), we hit the /v1/chat/completions endpoint. Five different models responded correctly within the first minute. New Jersey immediately started talking about "latency arbitrage opportunities."
Next steps: add the LiteLLM endpoint to OpenClaw's model provider list, wire up a Nagios check for the proxy itself, and give it a DNS record.