Tony Lee

ML Inference Performance Engineering

I optimize GPU serving paths, from Kubernetes scheduling to CUDA kernels, using reproducible latency, throughput, and cost measurements.

Latest writing

Read the blog