Architecture Lab

Where I open the system and show the work.

My lab notebook for AI diagrams, RAG flows, gateways, local models, and architecture research.

lab.index

Primary Use

Learn how the architecture is built

Core Lab

Self-hosted AI home lab on Apple Silicon

Research Focus

Agents, RAG, gateways, local models

For

Architects, engineers, developers, clients

Diagrams, field notes, resources, and interactive tools will keep landing here.

Home Lab Blueprint

The first lab artifact is my private AI platform.

Secure ingress, API gateway, agent runtime, local inference, data stores, and frontend surfaces.

Figure: Home lab topology, Agentic RAG, AI gateway, and infrastructure stack. Five-panel AI home lab architecture deck showing Cloudflare, FastAPI Gateway, LangGraphJS, Ollama and MLX, PostgreSQL, Redis, Qdrant, Agentic RAG, and infrastructure stack layers.

Secure Ingress

Cloudflare Tunnel, DNS, custom rules, and zero-open-port access into the local lab.

AI Gateway

FastAPI gateway with OpenAI-compatible contracts, auth, rate limits, logging, and model routing.

Agent Runtime

Python LangGraph orchestration inside the FastAPI gateway; Astra remains the JS/TS website-side agent surface.

Model Layer

Ollama and MLX for local model experiments, with cloud model lanes as the planned fallback.

Data Layer

PostgreSQL, Redis, and Qdrant for metadata, cache, queues, and vector search.

Frontend Surface

Next.js interfaces that turn the lab into a usable architecture playground and learning surface.

Graph Playground

Simulate a LangGraph-style agent run.

Preset inputs, node transitions, checkpoints, approval gates, retries, and state inspection.

LangGraph Run

Orchestration Console

Risk

High

Mode

Async queue recommended

Checkpoints

Retries

Execution Path

01 / 08

active

input node

state

state.intent, state.policy_scope, state.risk_level

checkpoint

pending

gate

Not reached
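
A run like the one in the console above can be sketched in plain Python. This is a minimal simulation, not the playground's code and not the LangGraph API: the node names, the "delete" risk rule, and the `approve` callback are all hypothetical. It shows the four mechanics the playground lists: node transitions, a checkpoint after every node, an approval gate for high-risk state, and bounded retries.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
Node = Callable[[State], State]


@dataclass
class RunLog:
    checkpoints: List[Tuple[str, State]] = field(default_factory=list)
    retries: Dict[str, int] = field(default_factory=dict)
    gate: str = "not reached"


def input_node(state: State) -> State:
    # Hypothetical classification rule, for illustration only.
    risk = "high" if "delete" in str(state["text"]).lower() else "low"
    return {**state, "intent": "change_request",
            "policy_scope": "internal", "risk_level": risk}


def make_flaky_tool(failures: int) -> Node:
    """Return a node that raises `failures` times before succeeding."""
    remaining = {"n": failures}

    def tool_node(state: State) -> State:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise RuntimeError("transient tool error")
        return {**state, "tool_result": "ok"}

    return tool_node


def run_graph(nodes: List[Tuple[str, Node]], state: State,
              approve: Callable[[State], bool],
              max_retries: int = 2) -> Tuple[State, RunLog]:
    log = RunLog()
    for name, node in nodes:
        # Approval gate: a high-risk run pauses before the tool node.
        if name == "tool" and state.get("risk_level") == "high":
            if not approve(state):
                log.gate = "rejected"
                return state, log
            log.gate = "approved"
        attempt = 0
        while True:
            try:
                state = node(state)
                break
            except RuntimeError:
                attempt += 1
                if attempt > max_retries:
                    raise  # retries exhausted; surface the failure
        log.retries[name] = attempt
        # Checkpoint: snapshot the state after each completed node.
        log.checkpoints.append((name, copy.deepcopy(state)))
    return state, log


nodes = [("input", input_node), ("tool", make_flaky_tool(failures=1))]
final, log = run_graph(nodes, {"text": "Delete stale vectors"},
                       approve=lambda s: True)
print(final["risk_level"], log.gate, log.retries)
```

Keeping the checkpoint as a deep copy means later nodes cannot mutate an earlier snapshot, which is what makes state inspection after the run trustworthy.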

Interactive Workbench

Inspect the flows, then inspect the tradeoffs.

Select a system and inspect inputs, outputs, metrics, and failure modes.

Trace ready

AI Home Lab (Apple Silicon)

Vercel-hosted Astra calls a private Python FastAPI gateway through Cloudflare Tunnel.

Evaluation Metrics

Runtime Split

TS + Python

Astra stays in Next.js while gateway orchestration runs in Python.

Gateway Port

8000

Private FastAPI server is reached through the Cloudflare tunnel.

Security Surface

Zero open ports

Incoming traffic reaches the lab only through an outbound-only Cloudflare Tunnel connection, so no inbound ports are opened.

Node Inspector

Vercel Surface

Astra Website

Runs the public chat surface and API route in the Vercel-hosted Next.js site.

> INPUT

Visitor chat request from manojmukherjee.co.in

> OUTPUT

OpenAI-compatible request from the Next.js API route

> METRIC

JS/TS website-side agent tooling

POTENTIAL FAILURE MODE

Gateway unavailable. The UI returns a bounded fallback state.
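
The failure mode above can be sketched as a call that always returns one of two fixed shapes. The real path is a Next.js API route, so this Python version only illustrates the pattern. The model name `local-default`, the fallback message, and the injectable `send` parameter are assumptions; the response parsing follows the OpenAI chat completions shape (`choices[0].message.content`).

```python
import json
import urllib.request
from typing import Callable, Dict, List

# Hypothetical fallback payload; the page does not show the real one.
FALLBACK = {"status": "fallback",
            "message": "The lab gateway is offline. Please try again later."}


def post_json(url: str, payload: dict, timeout: float) -> dict:
    """POST a JSON body and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def chat_with_fallback(url: str, messages: List[Dict[str, str]],
                       send: Callable[[str, dict, float], dict] = post_json,
                       timeout: float = 5.0) -> Dict[str, str]:
    payload = {"model": "local-default", "messages": messages}
    try:
        body = send(url, payload, timeout)
        content = body["choices"][0]["message"]["content"]
        return {"status": "ok", "message": content}
    except (OSError, KeyError, IndexError, TypeError, ValueError):
        # Network errors, timeouts, and malformed responses all collapse
        # into the same bounded state, so the UI never hangs or crashes.
        return dict(FALLBACK)
```

Passing `send` as a parameter keeps the fallback logic testable without a live tunnel: a stub that raises `ConnectionRefusedError` exercises the same branch as a real outage.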

Empirical Sandbox

Ground the lab in local hardware reality.

Live-ready local model inventory, throughput, memory, and tunnel health.

Local Sandbox

Apple Silicon telemetry hook.

Live-ready metrics for models, memory, tunnel health, and local AI infrastructure.

Source

Snapshot

Health

Local lab telemetry hook ready

Tunnel

Cloudflare

Stores

Postgres / Redis / Qdrant

online

Qwen 14B

Ollama / Apple Silicon

Generation

18-28 tok/s

Memory

9-12GB unified memory

testing

DeepSeek-R1 Distill

Ollama local reasoning lane

Generation

10-18 tok/s

Memory

12-16GB unified memory

standby

Llama 3.2 SLM

MLX / local router

Generation

35-55 tok/s

Memory

3-5GB unified memory

Host

MacBook Pro (M1 Pro)

32GB unified memory, local inference and data stores.

Gateway

FastAPI

OpenAI-compatible routing, auth, logging, and fallback policy.

Vector Store

Qdrant

Local semantic retrieval with cache-aware query paths.

Ingress

Cloudflare Tunnel

Egress-only tunnel path with zero open local ports.

Tunnel Log

T-00:04

Tunnel route healthy; gateway reachable through private ingress.

T-00:11

Local model lane selected for low-risk router classification.

T-00:18

Cloud fallback reserved for high-context reasoning requests.
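
The lane decisions in the log above can be sketched as a small routing function. This is an illustration under stated assumptions, not the gateway's policy: the 8,000-token threshold, the task labels, and the `local_healthy` flag are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cutoff for what counts as a "high-context" request.
HIGH_CONTEXT_TOKENS = 8_000


@dataclass(frozen=True)
class LaneRequest:
    task: str            # e.g. "classification" or "reasoning"
    context_tokens: int  # size of the assembled prompt context


def choose_lane(req: LaneRequest, local_healthy: bool = True) -> str:
    """Return "local" or "cloud" for a request."""
    if not local_healthy:
        # Local runtime is down, so fall back to the cloud lane.
        return "cloud"
    if req.task == "reasoning" and req.context_tokens > HIGH_CONTEXT_TOKENS:
        # Cloud is reserved for high-context reasoning.
        return "cloud"
    # Everything else, including router classification, stays local.
    return "local"


print(choose_lane(LaneRequest("classification", 300)))
print(choose_lane(LaneRequest("reasoning", 24_000)))
```

Defaulting to the local lane keeps the cloud path an exception rather than the norm, which matches the "reserved" wording in the log.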

Audit Estimator

Turn architecture variables into a review signal.

Estimate token budget, latency shape, and cache/compaction savings.

AI Audit Estimator

Model cost, latency, cache savings, and review signal in one pass.

Daily Active Agent Queries

2,500 queries

Retrieved Context Size

6,000 tokens/query

Average Output Length

900 tokens

Primary Model

Semantic Cache Hit Rate

35%

Context Compaction

45%

Monthly Token Budget

577.5M

75,000 monthly queries

Estimated Savings

$819

Semantic cache + context compaction

Sync P95

3.2s

Direct request-response path

Async Queue

5.4s

Queued worker path with retry control

Monthly Model API Budget

Baseline

$1,950

Optimized

$1,131

Architecture Readout

Review focus: cache, compaction, queues, reranking, and fallback routing.
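
The estimator's arithmetic can be sketched as follows. The page does not show its formula or pricing, so four parameters here are assumptions picked because they reproduce the displayed figures: a 30-day month, 800 tokens of per-query prompt overhead, $2.50 per million input tokens, and $10.00 per million output tokens. The sketch also assumes that cache hits and compaction reduce input-token cost only and combine multiplicatively.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EstimatorInputs:
    # Values shown on the page.
    daily_queries: int = 2_500
    context_tokens: int = 6_000
    output_tokens: int = 900
    cache_hit_rate: float = 0.35
    compaction: float = 0.45
    # Assumptions, not shown on the page.
    days_per_month: int = 30
    overhead_tokens: int = 800
    input_price_per_m: float = 2.50
    output_price_per_m: float = 10.00


def estimate(inp: EstimatorInputs) -> dict:
    monthly_queries = inp.daily_queries * inp.days_per_month
    input_tokens = inp.context_tokens + inp.overhead_tokens
    monthly_tokens = monthly_queries * (input_tokens + inp.output_tokens)

    input_cost = monthly_queries * input_tokens * inp.input_price_per_m / 1e6
    output_cost = monthly_queries * inp.output_tokens * inp.output_price_per_m / 1e6
    baseline = input_cost + output_cost

    # Share of input cost still paid after cache hits and compaction.
    kept = (1 - inp.cache_hit_rate) * (1 - inp.compaction)
    optimized = input_cost * kept + output_cost

    return {
        "monthly_queries": monthly_queries,
        "monthly_tokens": monthly_tokens,
        "baseline_usd": round(baseline),
        "optimized_usd": round(optimized),
        "savings_usd": round(baseline - optimized),
    }


print(estimate(EstimatorInputs()))
```

Under these assumptions the sketch returns 75,000 monthly queries, 577.5M tokens, a $1,950 baseline, a $1,131 optimized budget, and $819 in savings, matching the readout. A different pricing or overhead assumption would shift every dollar figure, so treat the parameters as placeholders to replace with real rates.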

Request Architecture Review via Email

Research Queue

What I am turning into deeper notes.

Future notes, experiments, proof gaps, and publishable architecture decisions.

LAB-01

Published

AI Home Lab Blueprint

Private AI platform on a MacBook Pro (M1 Pro) with Vercel-hosted Astra, Cloudflare Tunnel, FastAPI, Python LangGraph, Ollama, PostgreSQL, Redis, and Qdrant.

LAB-02

Production

Astra Graph RAG Engine

Static graph parsing, eager MiniSearch trie deserialization, warm Map query caching, and token-minimized context assembly.

Next detail to publish

Offline crawling, relationship mapping, and eager in-memory retrieval scaling.

LAB-03

Published

OpenAI-Compatible AI Gateway

One API contract for local models, cloud models, tools, scripts, and future MCP clients.

LAB-04

Designing

Architecture Playground

Interactive diagrams for layers, failure modes, metrics, and decisions.

Next detail to publish

Topology, sequence, data-flow, and reliability views.

LAB-05

Published

Native Editorial Engine

First-party architecture notes with diagrams, code walkthroughs, and file-aware references.

Lab Resources

What this page will become over time.

Diagrams, data resources, tool reviews, playgrounds, and reusable patterns.

Architecture Diagrams

Topology maps, data-flow diagrams, sequence flows, failure-mode maps, and infra stack references.

Research Notes

Short field notes on what worked, what failed, what changed, and which tradeoffs still need proof.

Data Resources

Evaluation fixtures, source inventories, prompt contracts, retrieval examples, and grounding checklists.

Tool Reviews

Practical reviews of agent frameworks, vector databases, model runtimes, gateways, and observability tools.

Playgrounds

Interactive inspectors for agent routing, retrieval assembly, gateway fallback, and architecture decisions.

Reusable Patterns

Reference patterns that teams can adapt for private AI, RAG reliability, agent orchestration, and platform handoff.

Work With Me

Want to turn your AI architecture question into a lab experiment?

Bring one constraint: grounding, routing, gateway design, observability, data ownership, deployment, or developer adoption.

Start Advisory Intake

View GitHub