GPU Inference Lab Experiments

Project-linked experiments that turn GPU serving and kernel questions into evidence-backed decisions.

16 experiments, 2 projects, Run-ready: 5 supported, 9 selected, 1 rejected, 0 pending, 1 blocked

Catalog ready

Rows show the current proof, focus area, and decisions that still need stronger evidence.

Experiment

Memory pressure

KV Cache vs Concurrency

Purpose

Full delivery can hide queueing.

Focus

Concurrency, KV memory, Tail latency

Status

Run ready; Supported · Long-context knee; Rejected · FP8 KV on g4dn

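As a rough illustration of why KV memory caps concurrency, the sketch below estimates how many sequences fit in a fixed KV cache budget at different context lengths. The model dimensions, the 8 GiB budget, and the helper names are hypothetical and are not taken from the lab's runs. The estimate assumes every sequence reserves its full context and ignores paging and fragmentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical transformer dimensions (assumed, not from the lab)."""
    num_layers: int
    num_kv_heads: int
    head_dim: int
    dtype_bytes: int  # 2 for FP16/BF16, 1 for FP8


def kv_bytes_per_token(shape: ModelShape) -> int:
    # Factor of 2: one key vector and one value vector per layer per KV head.
    return 2 * shape.num_layers * shape.num_kv_heads * shape.head_dim * shape.dtype_bytes


def max_concurrent_sequences(shape: ModelShape, kv_budget_bytes: int, context_tokens: int) -> int:
    # Worst case: each sequence holds KV entries for its full context.
    per_sequence = kv_bytes_per_token(shape) * context_tokens
    return kv_budget_bytes // per_sequence


shape = ModelShape(num_layers=32, num_kv_heads=8, head_dim=128, dtype_bytes=2)
budget = 8 * 1024**3  # assumed 8 GiB left for KV cache after weights

for ctx in (2048, 8192, 32768):
    print(ctx, max_concurrent_sequences(shape, budget, ctx))
```

Under these assumed numbers, the sequence count falls from 32 to 8 to 2 as context grows from 2,048 to 32,768 tokens, which is the kind of long-context knee this experiment looks for.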

Experiment

Streaming latency

Prefill vs Decode Timing

Purpose

Streaming timing by request shape.

Focus

TTFT, Inter-token latency, Throughput

Status

Run ready; Selected report · Streaming split

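The three focus metrics can be derived from per-token arrival timestamps. The following is a minimal sketch, assuming a client that records the request start time and the arrival time of each streamed token. The function name and the sample timestamps are illustrative only.

```python
from statistics import mean


def streaming_timings(request_start: float, token_times: list[float]) -> dict[str, float]:
    """Split one streamed response into TTFT, mean inter-token latency, and throughput."""
    if not token_times:
        raise ValueError("need at least one token timestamp")
    # Time to first token: dominated by queueing plus prefill.
    ttft = token_times[0] - request_start
    # Gaps between consecutive tokens: dominated by decode steps.
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    mean_itl = mean(gaps) if gaps else 0.0
    total = token_times[-1] - request_start
    tokens_per_sec = len(token_times) / total if total > 0 else float("inf")
    return {"ttft": ttft, "mean_itl": mean_itl, "tokens_per_sec": tokens_per_sec}


# Illustrative timestamps in seconds: a 0.5 s wait, then a token every 0.25 s.
result = streaming_timings(0.0, [0.5, 0.75, 1.0, 1.25, 1.5])
print(result)
```

Reporting TTFT and inter-token latency separately keeps a slow prefill from being averaged away by a fast decode phase, and vice versa.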

Experiment

Scheduler behavior

Batching Scheduler Tradeoffs

Purpose

Scheduler limits versus tail latency.

Focus

Batching, p99 latency, Tokens/sec

Status

Run ready; Selected report · Scheduler matrix


Experiment

Traffic shape

Request Pattern Utilization

Purpose

Same profile, different traffic outcome.

Focus

Delivery, Tail latency, Active concurrency

Status

Run ready; Selected report · Pattern matrix


Experiment

Capacity response

Autoscaling and Queueing Behavior

Purpose

Scale-from-zero timing and queue policy.

Focus

Scale-from-zero, Queue policy, Dropped work

Status

Run ready; Supported · Admission behavior


Experiment

Cost efficiency

Cost per Useful Work

Purpose

Cheap only counts when useful work passes.

Focus

Cost/request, Cost/token, SLO pass

Status

Run ready; Supported · Useful-work cost

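One way to read "useful work" is to divide cost by the requests that meet the latency SLO rather than by all requests. The sketch below is an assumption about how such a metric could be computed, not the lab's actual definition. The costs, latencies, and 2-second SLO are made-up numbers.

```python
def cost_per_useful_request(total_cost_usd: float, latencies_s: list[float], slo_s: float) -> float:
    """Cost divided by the number of requests that met the latency SLO."""
    passing = sum(1 for latency in latencies_s if latency <= slo_s)
    if passing == 0:
        # No useful work was delivered, so no finite cost per useful request exists.
        return float("inf")
    return total_cost_usd / passing


SLO_SECONDS = 2.0  # assumed SLO, for illustration only

# Hypothetical cheap instance: lower bill, but most requests miss the SLO.
cheap = cost_per_useful_request(1.0, [0.5, 3.0, 4.0, 5.0], SLO_SECONDS)
# Hypothetical pricier instance: double the bill, every request passes.
pricey = cost_per_useful_request(2.0, [0.5, 0.6, 0.7, 0.8], SLO_SECONDS)

print(cheap, pricey)
```

With these made-up inputs the cheaper instance costs twice as much per SLO-passing request, which is the situation the purpose line warns about.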

Experiment

Quantization

FP4 Quantization Optimization

Purpose

BF16 vs NVFP4 vs SmoothQuant.

Focus

Accuracy recovery, Memory, Build cost

Status

Run ready; Blocked · Blackwell capacity


GPU inference evidence

Decisions live in the project decision record

Admission, cold start, active-pressure HPA, FP8 KV cache, and Blackwell FP4 readiness remain in the GPU Inference Lab decision record.
