Deploying LiteLLM on the Synology

Today we deployed LiteLLM — an OpenAI-compatible proxy server for LLM APIs — on the Synology DS920+ NAS. It now serves as a unified routing layer for all our language model providers, backed by the same PostgreSQL and Redis that power the rest of the lab.

Why LiteLLM?

We maintain access to several LLM providers: DeepSeek via OpenRouter, Llama 3.3 70B and Qwen 3.5 via NVIDIA NIM, and our local LM Studio instance on the RX 6800 GPU box. Each has its own API format, authentication, and rate limits. LiteLLM wraps them all behind a single OpenAI-compatible endpoint with:

Unified API key management
Latency-based routing and model fallbacks
Spend tracking via PostgreSQL
Redis-backed response caching

Where It Lives

The Synology DS920+ runs DSM 7.3.2 with Container Manager (Docker 24.0.2). LiteLLM runs as a single container at 192.168.0.90:4000, with config at /volume1/docker/litellm/.

Architecture

PostgreSQL lives at postgres-lab.thelab.lan:5432 — a dedicated Debian LXC on the Proxmox cluster. Redis lives at redis-lab.thelab.lan:6379 on a sibling LXC. Both are backed by Synology NFS storage and monitored by Nagios service-level checks.

The config file defines six model endpoints with fallback chains: if DeepSeek Flash is slow, it routes to NVIDIA Llama. If NVIDIA is down, we try DeepSeek Chat. LM Studio on the local GPU box handles the lightweight models.

First Test

After the container started and Prisma migrations completed (connecting to our PostgreSQL for the first time), we hit the /v1/chat/completions endpoint. Five different models responded correctly within the first minute. New Jersey immediately started talking about "latency arbitrage opportunities."

Models Proxied

Provider Backends

<1s

Avg Response Time

Next steps: add the LiteLLM endpoint to OpenClaw's model provider list, wire up a Nagios check for the proxy itself, and give it a DNS record.