
As a non-java developer, this is surprising to me. I always assumed that bounds checks were "almost free" because the branch predictor will get it right 99% of the time. I can see that being not-true for interpreted runtimes (because the branch predictor gets bypassed, essentially), but I thought Java was JITed by default?

Perhaps the paper answers my question, but I'll admit I'm being lazy here and would much appreciate a tl;dr.



Here's an example of the same issue in Rust: https://dropbox.tech/infrastructure/lossless-compression-wit...

(with bounds checks) "Telling the Rust allocator to avoid zeroing the memory when allocating for Brotli improves the speed to 224 MB/s."

(without bounds checks) "Activating unsafe mode results in another gain, bringing the total speed up to 249MB/s, bringing Brotli to within 82% of the C code."

224MB/s -> 249MB/s (11% Brotli compression perf difference just by eliminating bounds checks)


A couple of issues with the linked article: the Rust compiler has matured a lot since it was written. I didn't look at the benchmark or the code, but many bounds checks are elided if you use iterators instead of indexed array access. And lastly, I would want to see a ZSTD benchmark written in idiomatic Rust.

I don’t doubt that all of those things in the article were true back then, but that was eight years ago. Wow.


Sure, you're not wrong, but my focus was that the benchmark is indicative of the raw performance impact of bounds checks (when they cannot be elided) in a real-world algorithm (as opposed to a micro-benchmark).

With that said, convincing a compiler to elide bounds checks (especially Java's JIT compiler) is a hugely frustrating (and for some algorithms futile) task.

One could argue that bounds checks make up only a small percentage of total application time. However, I've profiled production Java servers where >50% of the CPU was spent on encryption/compression. The JDK implementations of those algorithms are heavily impacted by (and commonly fail to elide) bounds checks.

Performance matters!


>With that said, convincing a compiler to elide bounds checks (especially Java's JIT compiler) is a hugely frustrating (and for some algorithms futile) task.

Adding explicit checks does work to a certain degree, but it can change with the compiler, and it requires keeping an eye on the generated assembly - not fun (no unsafe either, but still)
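
For concreteness, here is a minimal sketch of the kind of explicit check meant here (the method and names are made up for illustration; whether the JIT actually drops the per-element checks depends on the JDK version, so the generated assembly still needs checking):

    import java.util.Objects;

    class RangeSum {
        // Hypothetical example: an up-front range check the JIT can use to prove
        // that every data[from + i] access below stays in bounds.
        static int sum(int[] data, int from, int len) {
            // Throws IndexOutOfBoundsException if [from, from + len) is not
            // within data.length (JDK 9+ API).
            Objects.checkFromIndexSize(from, len, data.length);
            int total = 0;
            for (int i = 0; i < len; i++) {
                total += data[from + i]; // ideally no per-iteration bounds check
            }
            return total;
        }
    }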


These explicit checks are basically what I mean by "convincing the compiler" (and sometimes, it isn't quite convinced!) - Yeah. Not fun at all.

Especially in Java, because "the assembly" can change as the JIT evolves. What is optimized today may not be tomorrow.


Some 12-13 years back (time does fly), Cliff Click (HotSpot architect) had a series of blog posts on optimizations, including the lattice checks (i.e. proving indices are within bounds). It was quite insightful. The series was called "Too Much Theory" [0]

>What is optimized today may not be tomorrow.

Exactly. (Also, most developers will have an exceptionally hard time maintaining such code.)

[0]: https://web.archive.org/web/20120328222841/http://cliffc.org...


I understand where you are coming from, but without comparing the generated assembly we are only comparing one implementation of bounds checks. I think we should have a bounds-check instruction that operates concurrently.

I should play around with this using a couple of RISC-V cores.

> Performance matters!

https://www.youtube.com/watch?v=r-TLSBdHe1A by Emery Berger

Another yes-but: encryption and compression are, and will be, handled by on-die accelerators.


That's a good example, but much more in line with my expectations (not on the order of 125%).


Keep in mind that the "11%" applies to the fully implemented Brotli compression algorithm. Bounds-checks are a small part of that.

If you write a micro-benchmark for only bounds checks, you'd see a larger difference, more in line with the "125%".


Bounds checks are not 'free', but predicted ones are cheap. There are ways to convince the JIT to remove them in cases where it can prove the index doesn't exceed the array length, so explicit checks against the length of the array may improve the performance.

Things get worse with DirectByteBuffers, as the JIT has to work harder. Unsafe allows you to 'remove' all bounds checks, but it may prevent some other optimizations.
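
As a rough sketch of the contrast (my own illustration, not from any benchmark in this thread; behaviour varies by JDK version, so check the generated assembly):

    import java.nio.ByteBuffer;

    class ArrayVsDirect {
        // The JIT can usually prove i < data.length here and hoist or drop
        // the per-iteration bounds checks.
        static long sumArray(byte[] data) {
            long sum = 0;
            for (int i = 0; i < data.length; i++) {
                sum += data[i];
            }
            return sum;
        }

        // Absolute get(int) checks the index against the buffer's limit; for a
        // direct buffer the JIT typically has a harder time proving those checks
        // redundant than it does for a plain byte[].
        static long sumDirect(ByteBuffer buf) {
            long sum = 0;
            int limit = buf.limit();
            for (int i = 0; i < limit; i++) {
                sum += buf.get(i);
            }
            return sum;
        }
    }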


> but it may prevent some other optimizations.

I see this mentioned a few times in this thread, but I haven't experienced this in practice (and I've written a lot of unsafe in Java). Are there any examples of this?



