Python FastAPI gateway · Case Study 01
AI Home Lab API Gateway
status: PRODUCTION
environment: macOS Apple Silicon + Podman Compose
ingress: Cloudflare Tunnel
runtime graph: 8 nodes / 8 edges
System map
Problem: my site, agents, and scripts needed one stable /v1 contract to call, so I built a local OpenAI-compatible gateway. Ollama, PostgreSQL, Redis, and Qdrant stay private on the Mac. The hard parts were auth, backpressure, model warmup, and exposing zero public inbound ports.
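The backpressure part can be sketched roughly as follows. This is a minimal, framework-agnostic illustration and not the gateway's actual code: `AdmissionGate`, `handle`, `demo`, the limit of one in-flight request, and the 5-second `Retry-After` value are all hypothetical. The idea is to reject extra work immediately with a 429 instead of queueing it behind a busy model.

```python
import asyncio


class AdmissionGate:
    """Admit at most `max_in_flight` requests; reject the rest immediately."""

    def __init__(self, max_in_flight: int) -> None:
        self.max_in_flight = max_in_flight
        self.in_flight = 0

    def try_acquire(self) -> bool:
        # No await between the check and the increment, so this is safe
        # on a single asyncio event loop.
        if self.in_flight >= self.max_in_flight:
            return False
        self.in_flight += 1
        return True

    def release(self) -> None:
        self.in_flight -= 1


async def handle(gate, run_model, retry_after_s: int = 5):
    """Return (status, headers, body) for one request."""
    if not gate.try_acquire():
        # Gateway is at capacity: tell the client to back off and retry.
        return 429, {"Retry-After": str(retry_after_s)}, {"error": "gateway busy"}
    try:
        return 200, {}, await run_model()
    finally:
        # Always free the slot, even if the model call raises.
        gate.release()


async def demo():
    """Two concurrent requests against a gate that admits only one."""

    async def slow_model():
        await asyncio.sleep(0.05)  # stand-in for a local model call
        return {"ok": True}

    gate = AdmissionGate(max_in_flight=1)
    results = await asyncio.gather(handle(gate, slow_model), handle(gate, slow_model))
    return [status for status, _headers, _body in results]
```

In a FastAPI route, the rejection branch would typically become `raise HTTPException(status_code=429, headers={"Retry-After": "5"})`. Rejecting early keeps latency predictable for the requests that are admitted, at the cost of pushing retry logic onto callers.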
My engineering note
The pattern is simple: Cloudflare handles ingress, FastAPI owns auth and request shape, and Ollama stays local. I keep qwen3.5 warm, cap the context window, and return 429 when the Mac is at capacity.
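One way to keep a model warm and bound its context is a periodic warm-up call to Ollama's `/api/generate` endpoint. The sketch below is an assumption about how that could look, not the code from this system: the helper names, the `30m` keep-alive, the 8192-token context, and the exact model tag are hypothetical. It relies on two documented Ollama behaviors: an empty prompt loads the model into memory, and `keep_alive` controls how long it stays loaded.

```python
import json
import urllib.request

# Ollama's default local address; adjust if the container maps it elsewhere.
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"


def build_warmup_payload(model: str, keep_alive: str = "30m", num_ctx: int = 8192) -> dict:
    """Build a request body that loads the model without generating text."""
    return {
        "model": model,
        "prompt": "",              # empty prompt: load the model only
        "stream": False,
        "keep_alive": keep_alive,  # hypothetical value: how long the model stays resident
        "options": {"num_ctx": num_ctx},  # hypothetical cap on the context window
    }


def warm_model(model: str, url: str = OLLAMA_URL, timeout_s: float = 120.0) -> int:
    """POST the warm-up payload and return the HTTP status code."""
    data = json.dumps(build_warmup_payload(model)).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return resp.status
```

Calling `warm_model("qwen3.5")` at startup and on a timer shorter than `keep_alive` would keep the first real request from paying the model load cost. Using the same `num_ctx` in the warm-up and in real requests is likely worthwhile, since a different value may cause the model to be reloaded.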
Architecture Decision
Why I chose this design.
Short decision notes tied to the code or config that mattered.
Docs
Runbooks and specs.
Supporting docs for the system: architecture, diagrams, runbook, and development notes.