
Zstd's decoder is very, very fast for what it does. What ideas do you have to improve it?

It's unlikely to beat the LZO family on decompression speed, but that's a different class of compressor entirely. There are probably still a few tricks zstd could learn from the closed-source Oodle Leviathan, but I don't think there's much more to learn.



I expect any "perform a conversion of some sort on a byte stream" implementation that uses 0 SIMD instructions and is not memory bound is leaving a lot of performance on the table, especially if one is permitted to mess with the design and layout of the input to make it more amenable to vectorized consumption. I cannot confidently claim that we're missing a >2x speedup in this case, though; it may be as low as ~1.3x or something.
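To make the general point concrete, here is a toy sketch of what I mean by a byte-stream conversion done scalar versus with SIMD. It is not zstd code and says nothing about how much zstd itself would gain; it just uppercases ASCII letters. The function names `upper_scalar` and `upper_simd` are made up for illustration, and the vector path assumes an x86 target with SSE2 (on anything else it falls back to the scalar loop).

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics (x86 only) */
#endif

/* Scalar reference: uppercase ASCII letters one byte at a time. */
static void upper_scalar(uint8_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (buf[i] >= 'a' && buf[i] <= 'z')
            buf[i] -= 0x20;
    }
}

/* Hypothetical SIMD variant: handles 16 bytes per iteration when SSE2
 * is available, then finishes the tail with the scalar loop. */
static void upper_simd(uint8_t *buf, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i below_a = _mm_set1_epi8('a' - 1);
    const __m128i above_z = _mm_set1_epi8('z' + 1);
    const __m128i delta   = _mm_set1_epi8(0x20);

    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(buf + i));
        /* Signed compares: bytes >= 0x80 are negative, so they never
         * pass the "greater than 'a' - 1" test and stay untouched. */
        __m128i ge_a = _mm_cmpgt_epi8(v, below_a);
        __m128i le_z = _mm_cmpgt_epi8(above_z, v);
        __m128i mask = _mm_and_si128(ge_a, le_z);
        /* Subtract 0x20 only in lanes where the mask is all ones. */
        v = _mm_sub_epi8(v, _mm_and_si128(mask, delta));
        _mm_storeu_si128((__m128i *)(buf + i), v);
    }
#endif
    /* Remaining bytes (or the whole buffer without SSE2). */
    upper_scalar(buf + i, n - i);
}
```

Both functions should produce identical output on any input. Whether the vector version is actually faster depends on the compiler (which may autovectorize the scalar loop anyway) and on whether the workload is memory bound, so measure before believing it.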


I checked the zstd source and I'm surprised to see you're right.

There's a little x86_64 assembly, but I don't see a single SIMD instruction anywhere, and no intrinsics either. Seems like brotli is the same. I assume zstd still gains something from the compiler's SIMD autovectorization; it might be interesting to benchmark with and without such a flag.

Since the zstd bytestream got frozen in RFC 8478, messing with the layout too much will require a zstd2 and moving the whole world over to it again (Linux kernel compression, the RPM binary format, etc.).



