Architecture Lab

Where I open the system and show the work.

My lab notebook for AI diagrams, RAG flows, gateways, local models, and architecture research.

lab.index

Primary Use

Learn how the architecture is built

Core Lab

Self-hosted AI home lab on Apple Silicon

Research Focus

Agents, RAG, gateways, local models

For

Architects, engineers, developers, clients

Diagrams, field notes, resources, and interactive tools will keep landing here.

Home Lab Blueprint

The first lab artifact is my private AI platform.

Secure ingress, API gateway, agent runtime, local inference, data stores, and frontend surfaces.

Five-panel AI home lab architecture deck showing Cloudflare, FastAPI Gateway, LangGraphJS, Ollama and MLX, PostgreSQL, Redis, Qdrant, Agentic RAG, and infrastructure stack layers.
Home lab topology, Agentic RAG, AI gateway, and infrastructure stack.

Secure Ingress

Cloudflare Tunnel, DNS, custom rules, and zero-open-port access into the local lab.

AI Gateway

FastAPI gateway with OpenAI-compatible contracts, auth, rate limits, logging, and model routing.

Agent Runtime

LangGraphJS orchestration for agent state, tool calls, MCP-style extensions, and approval paths.

Model Layer

Ollama and MLX for local model experiments, with cloud model lanes kept as a fallback.

Data Layer

PostgreSQL, Redis, and Qdrant for metadata, cache, queues, and vector search.

Frontend Surface

Next.js interfaces that turn the lab into a usable architecture playground and learning surface.

Graph Playground

Simulate a LangGraph-style agent run.

Preset inputs, node transitions, checkpoints, approval gates, retries, and state inspection.

Active graph node: Input

LangGraph Run

Orchestration Console

Risk

High

Mode

Async queue recommended

Checkpoints

0

Retries

0

Execution Path

01 / 08

active

input node

state

state.intent, state.policy_scope, state.risk_level

checkpoint

pending

gate

Not reached
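
The run shown in the console can be approximated in plain Python. This is a minimal sketch, not LangGraph's API: the `tool` node, the `run_graph` helper, and the retry and approval rules are hypothetical, and only the `input` node and the state keys (`intent`, `policy_scope`, `risk_level`) come from the console above.

```python
from dataclasses import dataclass, field
from typing import Callable

State = dict[str, object]
Node = Callable[[State], State]


@dataclass
class RunReport:
    path: list[str] = field(default_factory=list)
    checkpoints: list[State] = field(default_factory=list)
    retries: int = 0
    gate: str = "Not reached"


def input_node(state: State) -> State:
    # Seed the keys shown in the console; the values are illustrative.
    return {**state, "intent": "refund_request",
            "policy_scope": "billing", "risk_level": "high"}


def make_flaky_tool(failures: int) -> Node:
    # Hypothetical tool node that fails `failures` times before succeeding.
    remaining = {"n": failures}

    def tool_node(state: State) -> State:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise RuntimeError("transient tool error")
        return {**state, "tool_result": "ok"}

    return tool_node


def run_graph(nodes: list[tuple[str, Node]],
              approve: Callable[[State], bool],
              max_retries: int = 2) -> tuple[State, RunReport]:
    state: State = {}
    report = RunReport()
    for name, node in nodes:
        # Approval gate: high-risk runs need sign-off before the tool node.
        if name == "tool" and state.get("risk_level") == "high":
            report.gate = "approved" if approve(state) else "rejected"
            if report.gate == "rejected":
                break
        attempts = 0
        while True:
            try:
                state = node(state)
                break
            except RuntimeError:
                attempts += 1
                report.retries += 1
                if attempts > max_retries:
                    raise
        report.path.append(name)
        report.checkpoints.append(dict(state))  # snapshot after each node
    return state, report
```

Calling `run_graph([("input", input_node), ("tool", make_flaky_tool(1))], approve=lambda s: True)` walks both nodes, records one retry, and stores a checkpoint after each node, which mirrors the counters the console exposes.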

Interactive Workbench

Inspect the flows, then inspect the tradeoffs.

Select a system and inspect inputs, outputs, metrics, and failure modes.

Trace ready

AI Home Lab (Apple Silicon)

Private Apple Silicon lab: agents, local models, gateway, and vector stores.

Evaluation Metrics

Self-Hosting Cost

$0 / month

All inference, databases, and platform layers run on local hardware, so there are no recurring cloud or API fees.

Unified Memory

32GB RAM

Enables co-location of local models and transactional databases.

Security Surface

Zero ports open

Incoming traffic reaches the lab only through Cloudflare Tunnel connections that the local daemon opens outbound.

Node Inspector
Secure Ingress Layer

Cloudflare Tunnel

Encrypted tunnel ingress with no exposed local ports.

> INPUT

Public DNS request / HTTPS request

> OUTPUT

Traffic routed to the local daemon (zero open ports)

> METRIC

WAF + custom firewall rules enforced

POTENTIAL FAILURE MODE

The tunnel drops. Auto-restart brings the daemon back up.

Empirical Sandbox

Ground the lab in local hardware reality.

Local model inventory, throughput, memory, and tunnel health, ready to be wired to live telemetry.

Local Sandbox

Apple Silicon telemetry hook.

Metrics for models, memory, tunnel health, and local AI infrastructure, ready to switch from snapshot to live data.

Source

Snapshot

Health

Local lab telemetry hook ready

Tunnel

Cloudflare

Stores

Postgres / Redis / Qdrant

online

Qwen 14B

Ollama / Apple Silicon

Generation

18-28 tok/s

Memory

9-12GB unified memory

testing

DeepSeek-R1 Distill

Ollama local reasoning lane

Generation

10-18 tok/s

Memory

12-16GB unified memory

standby

Llama 3.2 SLM

MLX / local router

Generation

35-55 tok/s

Memory

3-5GB unified memory
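
A quick budget check shows why it matters which lanes are loaded at the same time. The sketch below uses only the memory ranges on the model cards and the 32GB unified memory figure; the `footprint` and `fits` helpers and the 6GB reserve for macOS, the data stores, and the gateway are assumptions for illustration.

```python
# Memory ranges (GB) as listed on the model cards.
MODELS = {
    "Qwen 14B": (9, 12),
    "DeepSeek-R1 Distill": (12, 16),
    "Llama 3.2 SLM": (3, 5),
}
UNIFIED_MEMORY_GB = 32
RESERVE_GB = 6  # assumed headroom for the OS, data stores, and gateway


def footprint(names: list[str]) -> tuple[int, int]:
    """Return the (best-case, worst-case) combined footprint in GB."""
    low = sum(MODELS[n][0] for n in names)
    high = sum(MODELS[n][1] for n in names)
    return low, high


def fits(names: list[str]) -> bool:
    """True when the worst-case footprint leaves the assumed reserve free."""
    return footprint(names)[1] + RESERVE_GB <= UNIFIED_MEMORY_GB


print(footprint(list(MODELS)), fits(list(MODELS)))  # all three lanes together
print(fits(["Qwen 14B", "Llama 3.2 SLM"]))          # 14B model plus the SLM
```

Under these assumptions the three models together need 24 to 33GB, so they cannot all stay resident at worst-case size, while the 14B model and the small router model fit with room to spare.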

Host

MacBook Pro (M1 Pro)

32GB unified memory, local inference and data stores.

Gateway

FastAPI

OpenAI-compatible routing, auth, logging, and fallback policy.

Vector Store

Qdrant

Local semantic retrieval with cache-aware query paths.

Ingress

Cloudflare Tunnel

Egress-only tunnel path with zero open local ports.
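
One way to read "cache-aware query paths" on the Vector Store card is a semantic cache that is checked before the vector store. The sketch below is an assumption about that pattern, not the lab's code: `SemanticCache`, the 0.92 similarity threshold, and the `search` callback standing in for a Qdrant query are all hypothetical.

```python
import math
from typing import Callable, Optional

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Reuses results for queries whose embeddings are close to a cached one."""

    def __init__(self, threshold: float = 0.92) -> None:
        self.threshold = threshold
        self.entries: list[tuple[Vector, list[str]]] = []

    def get(self, query: Vector) -> Optional[list[str]]:
        best = max(self.entries, key=lambda e: cosine(query, e[0]), default=None)
        if best is not None and cosine(query, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, query: Vector, results: list[str]) -> None:
        self.entries.append((query, results))


def retrieve(query: Vector, cache: SemanticCache,
             search: Callable[[Vector], list[str]]) -> tuple[list[str], str]:
    """Return (results, source), where source is 'cache' or 'vector_store'."""
    cached = cache.get(query)
    if cached is not None:
        return cached, "cache"
    results = search(query)  # stand-in for the vector store call
    cache.put(query, results)
    return results, "vector_store"
```

The linear scan over cache entries keeps the sketch short; a real cache would need eviction and an index once it grows.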

Tunnel Log

T-00:04

Tunnel route healthy; gateway reachable through private ingress.

T-00:11

Local model lane selected for low-risk router classification.

T-00:18

Cloud fallback reserved for high-context reasoning requests.
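
The log entries above imply a routing policy: low-risk classification stays on a local lane, and high-context reasoning may go to a cloud lane. A minimal sketch of such a policy follows; the lane names, the `choose_lane` function, and the 8,000-token context cutoff are hypothetical, not values from the lab.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    task: str            # e.g. "classification" or "reasoning"
    risk_level: str      # "low" or "high"
    context_tokens: int


LOCAL_CONTEXT_LIMIT = 8_000  # assumed cutoff for the local lanes


def choose_lane(req: Request, cloud_available: bool = True) -> str:
    # Low-risk router classification stays on the small local model.
    if req.task == "classification" and req.risk_level == "low":
        return "local-slm"
    # High-context reasoning is the only case sent to the cloud lane.
    if req.task == "reasoning" and req.context_tokens > LOCAL_CONTEXT_LIMIT:
        return "cloud-fallback" if cloud_available else "local-reasoning"
    # Everything else runs on the local reasoning lane.
    return "local-reasoning"
```

Keeping the policy a pure function of the request makes it easy to log the chosen lane next to each trace and to test the rules without a model running.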

Audit Estimator

Turn architecture variables into a review signal.

Estimate token budget, latency shape, and cache/compaction savings.

AI Audit Estimator

Model cost, latency, cache savings, and review signal in one pass.

Monthly Token Budget

577.5M

75,000 monthly queries

Estimated Savings

$819

Semantic cache + context compaction

Sync P95

3.2s

Direct request-response path

Async Queue

5.4s

Queued worker path with retry control

Monthly Model API Budget

Baseline

$1,950

Optimized

$1,131

Architecture Readout

Review focus: cache, compaction, queues, reranking, and fallback routing.
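
The estimator's figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers shown in the estimator; the variable names are illustrative, and the blended rate is derived from the baseline budget, not taken from any provider's price list.

```python
monthly_tokens = 577_500_000
monthly_queries = 75_000
baseline_usd = 1_950
optimized_usd = 1_131

tokens_per_query = monthly_tokens / monthly_queries       # 7,700 tokens
savings_usd = baseline_usd - optimized_usd                # 819, as displayed
savings_pct = savings_usd / baseline_usd * 100            # 42 percent
usd_per_million = baseline_usd / (monthly_tokens / 1e6)   # about 3.38

print(f"{tokens_per_query:,.0f} tokens per query")
print(f"${savings_usd} saved per month ({savings_pct:.0f}%)")
print(f"${usd_per_million:.2f} per million tokens at baseline")
```

The displayed savings match the gap between the baseline and optimized budgets, and the token budget works out to 7,700 tokens per query across 75,000 queries.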

Request Architecture Review

Research Queue

What I am turning into deeper notes.

Future notes, experiments, proof gaps, and publishable architecture decisions.

LAB-01

Publishing

AI Home Lab Blueprint

Private AI platform on a MacBook Pro (M1 Pro) with Cloudflare, FastAPI, LangGraphJS, Ollama, MLX, PostgreSQL, Redis, and Qdrant.

Next detail to publish

Topology notes, constraints, and failure-mode checks.

LAB-02

Measuring

Agentic RAG Grounding

Source routing, context assembly, reranking, local reasoning, and grounded answer checks.

Next detail to publish

Eval cases for weak grounding, source gaps, and freshness.

LAB-03

Hardening

OpenAI-Compatible AI Gateway

One API contract for local models, cloud models, tools, scripts, and future MCP clients.

Next detail to publish

Validation, auth, fallback routing, traces, and cost/privacy tradeoffs.

LAB-04

Designing

Architecture Playground

Interactive diagrams for layers, failure modes, metrics, and decisions.

Next detail to publish

Topology, sequence, data-flow, and reliability views.

LAB-05

Planned

Native MDX Editorial Engine

First-party architecture notes with diagrams, code walkthroughs, and file-aware references.

Next detail to publish

MDX model, Mermaid rendering, code highlighting, SEO, and newsletter capture.

Lab Resources

What this page will become over time.

Diagrams, data resources, tool reviews, playgrounds, and reusable patterns.

Architecture Diagrams

Topology maps, data-flow diagrams, sequence flows, failure-mode maps, and infra stack references.

Research Notes

Short field notes on what worked, what failed, what changed, and which tradeoffs still need proof.

Data Resources

Evaluation fixtures, source inventories, prompt contracts, retrieval examples, and grounding checklists.

Tool Reviews

Practical reviews of agent frameworks, vector databases, model runtimes, gateways, and observability tools.

Playgrounds

Interactive inspectors for agent routing, retrieval assembly, gateway fallback, and architecture decisions.

Reusable Patterns

Reference patterns that teams can adapt for private AI, RAG reliability, agent orchestration, and platform handoff.

Work With Me

Want to turn your AI architecture question into a lab experiment?

Bring one constraint: grounding, routing, gateway design, observability, data ownership, deployment, or developer adoption.