Experiments

Project-linked catalogs that turn GPU serving and kernel questions into evidence-backed decisions.

16 experiments · 2 projects · Run-ready · 5 supported · 9 selected · 1 rejected · 0 pending · 1 blocked

Choose a project to browse its experiment table. Each experiment's detail page remains available for its run shape, evidence, and commands.

Serving infrastructure · 7 experiments · 7 focus areas

GPU Inference Decision Lab

An EKS/vLLM lab that turns serving measurements into architecture decisions for admission, autoscaling, context limits, scheduling, and quantization.

EKS/vLLM measurements support decisions on admission, long-context boundaries, scheduler defaults, and useful-work cost, and they support rejecting FP8 KV cache.

  • 100% queued delivery across burst and spike-to-zero admission runs
  • Long-context knee at 1.20 req/s reproduces across repeat runs, with 36.8 s p95 queue delay
  • FP8 KV cache rejected on the current g4dn/vLLM path
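The delivery and p95 figures above are summary statistics over per-request timings. A minimal sketch of how such a summary could be computed follows, assuming a hypothetical `RequestRecord` with submit and start timestamps; the lab's actual record schema and percentile method are not described here.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestRecord:
    # Hypothetical per-request record; the field names are illustrative only.
    submitted_s: float            # client-side submit time, in seconds
    started_s: Optional[float]    # time the server began work; None if never served


def percentile_nearest_rank(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    # Multiply before dividing so integer pct values avoid float rounding in the rank.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_run(records: list[RequestRecord]) -> tuple[float, float]:
    """Return (delivery ratio, p95 queue delay in seconds) for one admission run."""
    if not records:
        raise ValueError("empty run")
    served = [r for r in records if r.started_s is not None]
    # A delivery ratio of 1.0 corresponds to "100% queued delivery".
    delivery_ratio = len(served) / len(records)
    queue_delays = [r.started_s - r.submitted_s for r in served]
    return delivery_ratio, percentile_nearest_rank(queue_delays, 95)
```

Nearest-rank is only one percentile definition; interpolating methods can report slightly different p95 values on small samples.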

Kernel optimization · 9 experiments · 7 focus areas

CUDA Kernel Lab

A CUDA/Triton optimization lab organized around profile-driven kernel work for LLM-shaped primitives across A10G and H200.

RMSNorm fusion remains the strongest supported win, while H200 matmul autotuning now bounds the Tensor Core gap against PyTorch/cuBLAS.

  • 115 operator rows plus 27 decode replay rows on A10G
  • H200 matmul rows put the best standard Triton configuration at roughly 88-90% of PyTorch/cuBLAS
  • RMSNorm in fp16 reached a 5.901x speedup over the PyTorch baseline
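RMSNorm fusion work is typically checked against an unfused reference, and figures such as 5.901x or 88-90% are plain ratios of measured times or throughputs. A pure-Python sketch of both follows, assuming a per-row RMSNorm with a learned weight and a hypothetical default `eps` of 1e-6; it is not the lab's kernel or benchmark harness.

```python
import math


def rmsnorm_reference(x: list[float], weight: list[float], eps: float = 1e-6) -> list[float]:
    """Unfused RMSNorm over one row: y_i = x_i / sqrt(mean(x^2) + eps) * weight_i."""
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    mean_sq = sum(v * v for v in x) / len(x)   # mean of squares over the row
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)   # reciprocal root-mean-square
    return [v * inv_rms * w for v, w in zip(x, weight)]


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate runs than the baseline (e.g. 5.901x)."""
    return baseline_ms / candidate_ms


def percent_of_reference(candidate_tflops: float, reference_tflops: float) -> float:
    """Candidate throughput as a percentage of a reference such as cuBLAS."""
    return 100.0 * candidate_tflops / reference_tflops
```

A fused kernel computes the same per-row result in one pass over memory instead of separate square, reduce, and scale steps; the reference above is only the correctness target it would be compared against.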