
Regarding autovectorization:

> The other drawback of this method is that the optimizer won’t even touch anything involving floats (f32 and f64 types). It’s not permitted to change any observable outputs of the program, and reordering float operations may alter the result due to precision loss. (There is a way to tell the compiler not to worry about precision loss, but it’s currently nightly-only).

Ah - this makes a lot of sense. I've had zero trouble getting excellent performance out of Julia using autovectorization (from LLVM) so I was wondering why this was such a "thing" in Rust. I wonder if that nightly feature is a per-crate setting or what?



Does Julia ignore the problem of floating point not being associative, commutative, or distributive?

The reason it’s a thing comes from LLVM, and I’m not sure you can “language design” your way out of this problem, as it seems intrinsic to IEEE 754.


No, it just uses the same LLVM compiler passes; you enable certain optimizations locally via macros (Julia’s @fastmath or @simd) if you want to allow reordering in a given expression.


Nitpick, but IEEE float operations are commutative (when relevant and appropriate). Associative and distributive they indeed are not.


Unless I’m having a brain fart, it’s not commutative, or you mean something by “relevant and appropriate” that I’m not understanding.

a+b+c != c+b+a

That’s why you need techniques like Kahan summation.
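For reference, here’s a minimal Kahan summation sketch in Rust (the function name and the use of f64 are just for illustration):

    /// Compensated (Kahan) summation: track the low-order bits lost in
    /// each addition and feed them back in, so the result is far less
    /// sensitive to input order than a naive running sum.
    fn kahan_sum(xs: &[f64]) -> f64 {
        let mut sum = 0.0;
        let mut c = 0.0; // running compensation for lost low-order bits
        for &x in xs {
            let y = x - c;     // apply the correction to the new term
            let t = sum + y;   // big + small: low bits of y may be lost
            c = (t - sum) - y; // algebraically zero; recovers those bits
            sum = t;
        }
        sum
    }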


I think the other replies are overcomplicating this.

+ is a binary operation, and a+b+c can’t be interpreted without knowing whether one treats + as left-associative or right-associative. Let’s assume the former: a+b+c really means (a+b)+c.

If + is commutative, you can turn (a+b)+c into (b+a)+c or c+(a+b) or (commuting twice) c+(b+a).

But that last expression is not the same thing as (c+b)+a. Getting there requires associativity, and floating point addition is not associative.


"a+b+c" doesn't describe a unique evaluation order. You need some parentheses to disambiguate which changes are due to associativity vs commutativity. a+(b+c)=(c+b)+a should be true of floating point numbers, due to commutativity. a+(b+c)=(a+b)+c may fail due to the lack of associativity.


It is not, due to precision. Consider a=1.00000, b=-0.99999, and c=0.00000582618.


No, the two evaluations will give you exactly the same result: https://play.rust-lang.org/?version=stable&mode=debug&editio...

IEEE 754 operations are nonassociative, but they are commutative (at least if you ignore the effect of NaN payloads).
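A quick self-contained check (using the well-known 0.1/0.2/0.3 values rather than the grandparent's, since that failure is easy to verify by hand):

    fn main() {
        let (a, b, c) = (0.1_f64, 0.2_f64, 0.3_f64);
        // Commutativity: swapping the operands of each individual
        // addition is guaranteed by IEEE 754 (NaN payloads aside).
        assert_eq!(a + (b + c), (c + b) + a);
        // Associativity: moving the parentheses is NOT guaranteed.
        assert_ne!((a + b) + c, a + (b + c)); // 0.6000000000000001 vs 0.6
    }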


https://play.rust-lang.org/?version=stable&mode=debug&editio...

You're supposed to do (a+b) first to demonstrate the effect, because floating point subtraction that results in a number near zero is sensitive to rounding (worst case, a non-zero difference comes out as zero), which can introduce a huge relative error when a and b nearly cancel.


When you go from (a + b) + c to a + (b + c), you're invoking the associative property, not the commutative property.

The confusion between associativity and commutativity is the entire point of this thread!


Is there a case involving NaN where they are not commutative? Do you mean getting a different bit-level representation of NaN?


IEEE 754 doesn't (usually) distinguish between different NaN encodings for the purposes of semantics: if the result is a NaN, it doesn't specify which NaN the result is. Most hardware vendors implement a form of NaN propagation: when both inputs are NaN, one of the operands is returned (for example, always the left one).

As a side note: all compilers I'm aware of make almost no guarantees on preserving the value of NaN payloads, hence they consider floating-point operations to be fully commutative, and there's no general way to guarantee that they evaluate in exactly the order you specified.


In practical SIMD use: the various min/max operations. On Intel at least, they propagate NaN or not based on operand order.
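A sketch of that asymmetry (x86_64 only; _mm_min_ss computes `a < b ? a : b`, so when a NaN is involved the second operand wins):

    #[cfg(target_arch = "x86_64")]
    fn main() {
        use std::arch::x86_64::{_mm_cvtss_f32, _mm_min_ss, _mm_set_ss};
        // SAFETY: SSE is part of the x86_64 baseline.
        unsafe {
            let nan = _mm_set_ss(f32::NAN);
            let one = _mm_set_ss(1.0);
            let r1 = _mm_cvtss_f32(_mm_min_ss(nan, one)); // 1.0
            let r2 = _mm_cvtss_f32(_mm_min_ss(one, nan)); // NaN
            println!("min(NaN, 1.0) = {r1}, min(1.0, NaN) = {r2}");
        }
    }

    #[cfg(not(target_arch = "x86_64"))]
    fn main() {}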


Also all comparisons.


Does (1.00000 + -0.99999) + 0.00000582618 != 0.00000582618 + (-0.99999 + 1.00000)? This would disprove commutativity. But I think they're equal.


You still need to specify an evaluation order …


For those to be equal you need both associativity and commutativity.

Commutativity says that a*b = b*a, but that's not enough to allow arbitrary reordering. When you write a*b*c, depending on whether * is left- or right-associative, that means either (a*b)*c or a*(b*c). If those are equal we say the operation is associative. You need both properties to allow arbitrary reordering. If an operation is only commutative you can turn a*(b*c) into a*(c*b) or (b*c)*a, but there is no way to put a in the middle.


We’re in very nitpicky terminology weeds here (and I’m not the person you’re replying to), but my understanding is “commutative” is specifically about reordering operands of one binary op (4+3 == 3+4), while “associative” is about reordering a longer chain of the same operation (1+2+3 == 1+3+2).

Edit: Wikipedia actually says associativity is definitionally about changing parens[0]. Mostly amounts to the same thing for standard arithmetic operators, but it’s an interesting distinction.

[0]: https://en.wikipedia.org/wiki/Associative_property


It is not a nit, it is fundamental: a•b•c is about associativity, specifically operator associativity.

Rounding and eventual underflow in IEEE 754 mean that an expression X•Y, for any algebraic operation •, produces (if finite) a result (X•Y)·(1 + β) + µ, where |µ| cannot exceed half the smallest gap between numbers in the destination’s format, |β| < 2^-N, and β·µ = 0 (µ ≠ 0 only when underflow occurs).

And yes, that applies to a single binary operation only.

a•b•c is really (a•b)•c, assuming the operator is parsed as left-associative; mathematical associativity is one of the properties that IEEE 754 doesn't have.


IEEE 754 floating-point addition and multiplication are commutative in practice, even if there are exceptions with NaNs etc.

But remember that commutativity is a property of the binary operations (+, ×): a+b = b+a and ab = ba. You can still get accumulated rounding errors in iterated applications of those binary operations.


It's not something you seem to be able to just enable globally. From what I gather this is what is being referenced:

https://doc.rust-lang.org/std/intrinsics/index.html

Specifically the *_fast intrinsics.
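Roughly what using them looks like (a nightly-only sketch; assumes fadd_fast is still present under core_intrinsics, which is perma-unstable):

    #![feature(core_intrinsics)]
    #![allow(internal_features)]

    // The *_fast intrinsics are `unsafe`: behavior is undefined if an
    // input or the result is NaN or infinite.
    fn sum_fast(xs: &[f32]) -> f32 {
        let mut acc = 0.0_f32;
        for &x in xs {
            // SAFETY: the caller must guarantee all values stay finite.
            acc = unsafe { std::intrinsics::fadd_fast(acc, x) };
        }
        acc
    }

    fn main() {
        println!("{}", sum_fast(&[1.0, 2.0, 3.0]));
    }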


Is this equivalent to -ffast-math?


From what I know of -ffast-math and what I can read in the docs for *_fast, I am not convinced that the *_fast intrinsics do _everything_ -ffast-math allows. They seem focused on algebraic equivalence (a/b is equivalent to a*(1/b)) and assumptions of finite math. There are a few other things -ffast-math allows, like ignoring certain errors, ignoring the existence of signed zero, ignoring signalling NaN handling, ignoring SIGFPE handling, etc.


Yes, because many of the traditional "fast math" assumptions are definitely not something that should be hidden behind an attractive option like that. In particular, assuming the nonexistence of NaNs is essentially never anything but a ticket to UB land.


There are algebraic operations available on nightly: https://doc.rust-lang.org/nightly/std/primitive.f32.html#alg...
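A sketch of what they look like in use (nightly, assuming the `float_algebraic` feature gate; unlike the *_fast intrinsics these are safe, i.e. no UB on NaN or infinity, LLVM is merely allowed to reassociate):

    #![feature(float_algebraic)]

    fn dot(a: &[f32], b: &[f32]) -> f32 {
        a.iter()
            .zip(b)
            // Each algebraic_* op may be freely reordered/vectorized.
            .fold(0.0_f32, |acc, (&x, &y)| acc.algebraic_add(x.algebraic_mul(y)))
    }

    fn main() {
        println!("{}", dot(&[1.0, 2.0], &[3.0, 4.0]));
    }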


For vectorizing, that quote is only true for loops with dependencies between iterations, e.g. summing a list of numbers (that's basically the only case where this really matters).

For loops without such dependencies Rust should autovectorize just fine as with any other element type.
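For example, an elementwise kernel like this (a hypothetical axpy-style function) vectorizes on stable without any fast-math, because no result depends on a previous iteration's float result:

    // Each output element is independent, so LLVM can vectorize this
    // under strict IEEE semantics; no reordering of float ops needed.
    pub fn axpy(y: &mut [f32], a: f32, x: &[f32]) {
        for (yi, &xi) in y.iter_mut().zip(x) {
            *yi += a * xi;
        }
    }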


You just create f32x4 types; the wide crate does this. Then it autovectorizes just fine. But it still isn't the best idea if you are comparing values; we had a defect due to this recently.


I suspect I am misunderstanding. If you create an f32x4 type, aren't you manually vectorizing? Autovectorization is magic SIMD use the compiler does in some cases. (But usually doesn't...)


You are manually vectorizing, but it lets the optimizer know you don't care about safe rounding behavior, so it ends up using the SIMD instructions. And this way it is still portable, vs using intrinsics. Floating point addition is the only one the optimizer isn't allowed to reorder, so if you just need multiplication or only use integers it all autovectorizes fine. The f32xN stuff is just a way to tell it you don't care about the rounding. There are better ways to do that that could be added, like a FastF32 type, but I don't know if LLVM could support that.

Edit: go to godbolt and load the Rust aligned-sum example and play around with types. If you see addps, that's the packed single-precision SIMD add instruction. The more you get packed, the higher your score! You'll need to pass some extra arguments they don't list to get AVX-512 sized registers vs the xmm or ymm ones. And not all the instances it uses support AVX-512, so sometimes you have to try a couple times.


Well, not really "you don't care about safe rounding behavior", more just "you have specified a specific operation order that happens to be more amenable to vectorization". Implementing a float sum that way has the completely-safe, completely-well-defined, portable behavior of summing strides for any given size.
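Concretely, something like this (a sketch; the lane count of 4 is arbitrary):

    // Four independent accumulators with a fully specified evaluation
    // order: the compiler can keep `acc` in one SIMD register without
    // being asked to reassociate anything.
    pub fn sum_lanes4(xs: &[f32]) -> f32 {
        let mut acc = [0.0_f32; 4];
        let mut chunks = xs.chunks_exact(4);
        for c in &mut chunks {
            for i in 0..4 {
                acc[i] += c[i]; // lane i only ever depends on lane i
            }
        }
        let tail: f32 = chunks.remainder().iter().sum();
        (acc[0] + acc[1]) + (acc[2] + acc[3]) + tail
    }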

Both float multiplication and float addition are equally bad for optimizations though - both are non-associative: https://play.rust-lang.org/?version=stable&mode=debug&editio... ; and indeed changing the aligned-sum example to f64, neither .sum() nor .product() get vectorized.

And e.g. here's a plain rust loop autovectorizing both addition and multiplication (though of course not a reduction): https://rust.godbolt.org/z/6hEcj8zfx


I meant that multiplying two vectors pointwise autovectorizes, because there is no ordering issue. I'm usually doing accumulated products or something like them for DSP. As long as you only use the wide types it is fine. I had a bug when comparing values constructed partially from SIMD vs not at all. Very unusual I'm sure, but there really is a reason Rust won't let you turn on -ffast-math.


LLVM autovectorizes many FP operations just fine, the article was a bit strange in that respect. Problem is, there are many other cases where it's unable to do so, not because it can't but because it isn't allowed.


In my experience, compiling C with -ffast-math will tremendously improve floating point autovectorization and optimizations to SIMD (C vector extensions, which are similar to Rust std::simd) code in general.

This obviously has a lot of caveats, and should only be enabled on a per-function or per-file basis.

Unfortunately Rust does not currently have options for adjusting per-function compiler optimization parameters. This is possible in some C compilers using function attributes.


We used to tweak our scalar product simulator code to match the SIMD arithmetic order so we could hash the outputs for tests.

I wonder if it could autovec the simd-ordered code.


> I wonder if that nightly feature is a per-crate setting or what

Unfortunately it's a set of functions you have to use to perform arithmetic ops if you want the autovectorizer to touch them.


Does Rust not have the equivalent of GCC's "-ffast-math"?


No, because as I commented in another subthread, `-ffast-math` is:

1. Dangerous assumptions hidden behind a simple, attractive-looking option [1]. It should be called -fwrong-math or -fdangerous-math or something (GCC does have the funnily named switch -funsafe-math-optimizations; what could go wrong with fun, safe math optimizations?!)

2. Translation-unit scoped, which means that dependencies that haven't consented to "fast math" can have their code broken (as in UB land) or the optimizations made pointless, and your code can break your dependencies' semantics too via inlining. On the other hand, a library author must think very carefully about which float opts to enable in order to stay compatible with client code.

Deciding how the scoping of non-IEEE float math operations should work is a very nontrivial question. The scope could be a translation unit, a module, a type, a function, a block, or every individual operation, and none of those is without issues, particularly regarding inlining, interprocedural and link-time optimization, and ergonomics. In some ways, it's yet another function coloring problem.

There are currently-unstable algebraic_add/mul/etc. methods for floats that let LLVM treat those particular operations as if floats were real numbers [2]. They're the first step towards safe, UB-free float optimizations, but of course those names are rather awkward to use in math-heavy code, and a wrapper type overloading the normal operators would be good to have.
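Something like this hypothetical newtype sketch (no such type exists in std today):

    #![feature(float_algebraic)]
    use std::ops::Add;

    // Wrapper whose `+` lowers to the unstable algebraic add, keeping
    // normal operator syntax in math-heavy code.
    #[derive(Clone, Copy, Debug)]
    struct Alg(f64);

    impl Add for Alg {
        type Output = Alg;
        fn add(self, rhs: Alg) -> Alg {
            Alg(self.0.algebraic_add(rhs.0)) // LLVM may reassociate
        }
    }

    fn main() {
        println!("{:?}", Alg(0.1) + Alg(0.2) + Alg(0.3));
    }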

---

[1] See, e.g., https://simonbyrne.github.io/notes/fastmath/

[2] In terms of associativity and such, not e.g. in assuming the nonexistence of NaNs, which would be very unsafe.


As a student of floating point math idiosyncrasies, I had always thought -ffast-math should be renamed -fsloppy-math.


How does one become a "student of floating point math idiosyncrasies"?


By enrolling at “Floating Point Math Idiosyncrasy University” I suppose


No, it doesn't. A global flag is a no-go, as it breaks modularity. A local opt-in through dedicated types or methods is being designed, but it's not stable.



