Tony Lee

ML Inference Performance Engineering

I optimize GPU serving paths, from Kubernetes scheduling down to CUDA kernels, backed by reproducible latency, throughput, and cost measurements.

Latest writing

Read the blog →