I was on the original team of four developers who built CUDA. We had a compiler guy, a numerics guy, a library guy, and a driver guy. I was the driver guy. I designed, and often did the first implementations of, most of the low-level abstractions used by CUDA: devices, contexts, modules, kernels, CUDA arrays, texture and surface references, streams and events, and pinned memory and its variants (portable pinned memory, mapped pinned memory, host memory registration).
I have written a book on CUDA: http://www.cudahandbook.com
Before joining NVIDIA in 2002, I spent 8 years working on various multimedia technologies at Microsoft. Just before leaving Microsoft, I built a prototype of what eventually became the Desktop Window Manager; I also served as Direct3D dev lead for DirectX 5.0 and DirectX 6.0.
My latest gig, cloud computing, lets me keep working on technologies that commoditize and democratize computing.