The CUDA documentation tells me that there are more performant but less precise t...

dahart · on Aug 31, 2024

Yep, these intrinsics are what I was referring to. And yes, the software versions won’t use the hardware trig unit; I would assume they’re written using an approximating spline and/or Newton’s method, probably mostly with adds and multiplies. Note that the loss of precision with these fast-math intrinsics isn’t very much; it’s usually 1 or 2 bits at most.

mabster · on Sept 2, 2024

I couldn't find much information on those. I assume that they don't include range reduction?

dahart · on Sept 2, 2024

I’m not totally sure but I think fast math usually comes with loss of support for denormals, which is a bit of range reduction. Note that even if they had denormals, the absolute error listed in the chart is much bigger than the biggest denorm. So you don’t lose range out at the large ends, but you might for very small numbers. Shouldn’t be a problem for sin/cos since the result is never large, but maybe it could be an issue for other ops.

mabster · on Sept 2, 2024

Just for your information: when calculating trig functions, you first modulo by 2 pi (this is called range reduction). Then you calculate the function, usually as a polynomial approximation, maybe piecewise.
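A rough sketch of those two steps, in plain Python for readability. The helper names (`reduce_range`, `sin_poly`, `approx_sin`) are made up for illustration, the extra fold to [-pi/2, pi/2] is my own addition to keep the polynomial short, and none of this is meant to describe what any GPU or math library actually does:

```python
import math

TWO_PI = 2.0 * math.pi


def reduce_range(x):
    """Range reduction: map x to an equivalent angle in [-pi, pi]."""
    # math.remainder returns x - n * TWO_PI with n the nearest integer.
    return math.remainder(x, TWO_PI)


def sin_poly(r):
    """Degree-11 Taylor polynomial for sin(r), meant for r in [-pi/2, pi/2]."""
    r2 = r * r
    # Horner evaluation of r - r^3/3! + r^5/5! - ... - r^11/11!
    return r * (1.0 + r2 * (-1.0 / 6 + r2 * (1.0 / 120 + r2 * (-1.0 / 5040
           + r2 * (1.0 / 362880 + r2 * (-1.0 / 39916800))))))


def approx_sin(x):
    r = reduce_range(x)
    # Optional extra fold (an assumption, not from the thread):
    # sin(pi - r) == sin(r) and sin(-pi - r) == sin(r).
    if r > math.pi / 2:
        r = math.pi - r
    elif r < -math.pi / 2:
        r = -math.pi - r
    return sin_poly(r)
```

Without the reduction step the polynomial falls apart quickly for large inputs, which is why the two steps go together.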

But if it supports larger floats, it must be doing range reduction, which is impressive for low-cycle ops. It must be done in hardware.

It doesn't surprise me regarding denorms. They're really nice numerically but always disabled when looking for performance!

dahart · on Sept 3, 2024

Oh, that range reduction. :) I’m aware of the technique, but thanks; I did misunderstand what you were referring to. I don’t know what Nvidia hardware does exactly. For __sinf(), the CUDA guide says: “For x in [-pi,pi], the maximum absolute error is 2^(-21.41), and larger otherwise.” That totally doesn’t answer your question; it could still go either way. But it does kinda tend to imply that it’s best to keep the inputs in-range.
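For a sense of scale, here is a quick back-of-the-envelope comparison of that quoted bound against the largest float32 denormal mentioned earlier in the thread. I'm assuming single precision here, and the variable names are just for illustration:

```python
# Absolute error bound quoted from the CUDA guide for __sinf() on [-pi, pi].
sinf_abs_err = 2.0 ** -21.41          # roughly 3.6e-7

# Largest float32 denormal (subnormal): (1 - 2^-23) * 2^-126.
largest_denormal = (1.0 - 2.0 ** -23) * 2.0 ** -126   # roughly 1.2e-38

# The error bound is dozens of orders of magnitude larger than any denormal.
ratio = sinf_abs_err / largest_denormal

print(sinf_abs_err)
print(largest_denormal)
print(ratio)
```

So flushing denormals to zero would be invisible next to the stated error for sin/cos.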

mabster · on Sept 3, 2024

Terms like "range reduction" will definitely be loaded differently in different fields, so my bad.

Yeah, by the sounds of it, maybe they don't.

I don't do much on GPUs nowadays, but I still find this stuff interesting. I'm definitely going to have to do a deeper dive.

Thanks heaps for the info!