Projects

Evidence-backed GPU systems work across serving infrastructure and kernel optimization.

Serving infrastructure decisions and CUDA kernel optimization are split into separate projects, with each experiment attached to the evidence it supports.

Serving infrastructure · Measured decision record · 7 experiments

GPU Inference Decision Lab

An EKS/vLLM lab that turns serving measurements into architecture decisions for admission, autoscaling, context limits, scheduling, and quantization.

EKS/vLLM measurements support decisions on admission control, long-context boundaries, scheduler defaults, and useful-work cost, as well as the rejection of FP8 KV-cache quantization.

Kernel optimization · A10G/H200 benchmark evidence · 9 experiments

A CUDA/Triton optimization lab organized around profile-driven kernel work for LLM-shaped primitives across A10G and H200.

RMSNorm fusion remains the strongest supported win, while H200 matmul autotuning now bounds the Tensor Core gap against PyTorch/cuBLAS.