> You're right that most people only use a small subset of cuda
This is true first and foremost for the host-side API. From my StackOverflow and NVIDIA forums experience, I'm often the first and only person to ask about any number of nooks and crannies of the CUDA Driver API, with issues which nobody seems to have stumbled onto before; or at least, nobody who stumbled onto them has written anything about it in public.
Oh yes, we found all kinds of bugs in NVIDIA's CUDA implementation during this project :D.
There's a bunch of pretty obscure functions in the device-side APIs too: some esoteric math functions, old SIMD "intrinsics" that are mostly irrelevant with modern compilers, etc.