Inference scaling · Case Study 03

GPU Platform Modernization

The design separates ingress, scheduling, GPU slices, serving, and observability. Run:AI owns allocation policy, vLLM serves inference traffic, and OpenTelemetry (OTel) reports saturation.
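One way to make "OTel shows saturation" concrete is to measure how much of each node's GPU capacity is already allocated. The sketch below assumes each node reports the allocated fraction of every GPU it has. The function names, input shape, and 0.9 threshold are hypothetical. In a real deployment these values would be exported through OpenTelemetry as a metric rather than returned from a function.

```python
# Hypothetical saturation calculation; names and threshold are illustrative only.
def node_saturation(gpu_allocations: dict[str, list[float]]) -> dict[str, float]:
    """Return the allocated GPU share per node.

    Each list holds the allocated fraction (0.0-1.0) of every GPU on that node,
    so a node whose GPUs are at [1.0, 0.5] is 0.75 saturated.
    """
    return {
        node: sum(allocs) / len(allocs) if allocs else 0.0
        for node, allocs in gpu_allocations.items()
    }


def saturated_nodes(gpu_allocations: dict[str, list[float]], threshold: float = 0.9) -> list[str]:
    """Return nodes whose saturation is at or above the (assumed) alert threshold."""
    saturation = node_saturation(gpu_allocations)
    return sorted(node for node, value in saturation.items() if value >= threshold)


if __name__ == "__main__":
    allocations = {"node-a": [1.0, 0.5], "node-b": [1.0, 1.0], "node-c": []}
    print(node_saturation(allocations))  # {'node-a': 0.75, 'node-b': 1.0, 'node-c': 0.0}
    print(saturated_nodes(allocations))  # ['node-b']
```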

Status: ACTIVE
Environment: OpenShift / Run:AI
Ingress: Kube Ingress Controller
Runtime graph: 5 nodes / 5 edges


Problem: I worked on inference platform patterns where static GPU allocation slowed teams down. Production serving needed quota, priority, and predictable capacity.
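The quota and priority requirement can be illustrated with a greedy admission loop. This is a hypothetical sketch, not Run:AI's scheduling algorithm, whose own quota handling is more involved. `Request`, `admit`, the team names, and the quota numbers are all illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """A hypothetical serving request for GPU capacity."""
    team: str
    gpus: float    # GPU share requested; fractions such as 0.5 are allowed
    priority: int  # higher values are considered first


def admit(requests: list[Request], quotas: dict[str, float], cluster_gpus: float) -> list[Request]:
    """Greedy, quota-aware admission in priority order.

    A request is admitted only if its team stays within quota and the cluster
    still has enough free GPU share. Skipped requests simply wait; there is no
    preemption or over-quota borrowing in this sketch.
    """
    used = {team: 0.0 for team in quotas}
    free = cluster_gpus
    admitted = []
    # sorted() is stable, so equal-priority requests keep submission order.
    for req in sorted(requests, key=lambda r: -r.priority):
        team_used = used.get(req.team, 0.0)
        if team_used + req.gpus > quotas.get(req.team, 0.0):
            continue  # would exceed the team's quota
        if req.gpus > free:
            continue  # not enough unallocated capacity left
        used[req.team] = team_used + req.gpus
        free -= req.gpus
        admitted.append(req)
    return admitted
```

With quotas of `{"search": 1.0, "chat": 2.0}` on a 2.0-GPU cluster, a priority-10 `search` request for 0.5 GPU is admitted first. A priority-5 `search` request for 1.0 GPU is then skipped because it would push search past its quota, even though capacity remains. Two low-priority `chat` requests (1.0 and 0.5) then fill the rest. Capping each team at its quota keeps capacity predictable for everyone else.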


Live path (workload class): Kube ingress -> Run:AI policy


Architecture Decision

Why I chose this design.

Short decision notes tied to the code or config that mattered.

Decision

gpu-allocation.yaml

I kept GPU allocation policy separate from model serving, so platform teams can change quota and priority without touching the serving runtime.

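```yaml
apiVersion: scheduling.run.ai/v1
kind: PodGroup
metadata:
  name: vllm-serving
spec:
  gpuAllocation: 0.5               # half of one GPU per serving replica
  gpuMemory: 12Gi                  # memory budget for the GPU slice
  priority: HighPriority
  schedulingStrategy: BinPacking   # fill partially used GPUs before new ones
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              # GPU Feature Discovery sets gpu.family to the architecture name
              # (e.g. "hopper"); match nvidia.com/gpu.product to pin H100 exactly.
              - key: nvidia.com/gpu.family
                operator: In
                values: ["hopper"]
```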
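Bin packing as a scheduling strategy generally means filling partially used GPUs before opening new ones, which is what lets 0.5-GPU slices save whole devices. The sketch below uses first-fit decreasing as a stand-in and filters GPUs by family to mirror the required node affinity. It is not Run:AI's scheduler; the `Gpu` model, the `bin_pack` function, and the family values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Gpu:
    """One physical GPU on a node (hypothetical model)."""
    name: str                 # e.g. "node-1/gpu0"
    family: str               # value of the node's GPU family label
    free: float = 1.0         # unallocated share of this GPU
    pods: list[str] = field(default_factory=list)


def bin_pack(pods: dict[str, float], gpus: list[Gpu], family: str) -> dict[str, str]:
    """Place fractional GPU requests with first-fit decreasing.

    Only GPUs in the requested family are eligible, mirroring a required node
    affinity. Returns a pod -> GPU name mapping and raises if a pod cannot fit.
    """
    eligible = [gpu for gpu in gpus if gpu.family == family]
    placement: dict[str, str] = {}
    # Largest requests first, so smaller slices fill the gaps that remain.
    for pod, share in sorted(pods.items(), key=lambda kv: -kv[1]):
        target = next((gpu for gpu in eligible if gpu.free >= share), None)
        if target is None:
            raise RuntimeError(f"no {family} GPU has {share} free for {pod}")
        target.free -= share
        target.pods.append(pod)
        placement[pod] = target.name
    return placement
```

Placing three 0.5-GPU vLLM replicas onto two `hopper` GPUs and one `ampere` GPU puts the first two replicas on the same GPU and the third on the second GPU, leaving the `ampere` GPU untouched. A spread strategy would instead consume three GPUs for the same work.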