Same, also I'd love to know more about the technical details of their logging format, the on-disk storage format, and why they were only able to reduce the storage size to 20% of the uncompressed size. For example, clp[1] can achieve much, much better compression on log data.
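For anyone curious, the rough idea behind CLP-style compression (sketched below in Python with made-up helper names, not CLP's actual format or API) is to split each message into a repeated template plus its variable values, so the bulk of every line deduplicates away before any general-purpose compressor even runs:

    import re

    # Numbers and hex-ish tokens are treated as variables; the rest of the
    # line is the template, which repeats across millions of messages and
    # so only needs to be stored once in a dictionary.
    VAR = re.compile(r"0x[0-9a-fA-F]+|\d+")

    def split_message(line):
        return VAR.sub("<v>", line), VAR.findall(line)

    template, values = split_message(
        "2024-05-01 12:00:03 INFO served request 8231 in 42 ms")
    print(template)  # <v>-<v>-<v> <v>:<v>:<v> INFO served request <v> in <v> ms
    print(values)    # ['2024', '05', '01', '12', '00', '03', '8231', '42']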
Exactly! Which is again one of the reasons it's confusing that people apply full-text search technology to logs. Machine logs are quite a lot less entropic than human prose, and therefore can be compressed a whole lot better. A corollary is that, because of the redundancy in the data, "grepping" the compressed form can be very fast, so long as the compression scheme allows it.
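To make the entropy point concrete, here's a toy measurement (hypothetical log line, plain zlib rather than anything log-aware):

    import zlib

    # Ten thousand near-identical log lines: only the counters vary.
    line = b"2024-05-01T12:00:%02d INFO request_id=%d GET /api/v1/orders 200 %dms\n"
    logs = b"".join(line % (i % 60, i, 5 + i % 40) for i in range(10_000))

    compressed = zlib.compress(logs, 6)
    print(len(logs), len(compressed), f"{100 * len(compressed) / len(logs):.1f}%")
    # Even a generic compressor lands in the low single digits on data like
    # this; a log-aware scheme does better still and can stay searchable.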
If the query infrastructure operating on this compressed data is itself able to store intermediate results, then we've killed two birds with one stone, because we've also gotten rid of the restrictive query language. That's how cascading MapReduce jobs (or Spark) do it, allowing users to perform complex analyses that are entirely off the table if they're restricted to the Lucene query language. Imagine a world where your SQL database was one giant table and only allowed you to query it with SELECT. That's pretty limiting, right?
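As a rough illustration (PySpark, with invented column names and a made-up path), here's the kind of two-stage analysis that falls out naturally once intermediate results are first-class, and which a single Lucene-style query string can't express:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("log-analysis").getOrCreate()
    logs = spark.read.json("s3://some-bucket/logs/2024-05-01/")  # made-up path

    # Stage 1: error rate per service, kept as an intermediate result.
    error_rates = (logs.groupBy("service")
                       .agg(F.avg((F.col("status") >= 500).cast("double"))
                             .alias("error_rate")))

    # Stage 2: filter on the aggregate, then join back to pull the raw lines
    # only for the services that look anomalous.
    suspects = error_rates.filter(F.col("error_rate") > 0.05)
    logs.join(suspects, "service").orderBy("timestamp").show(20, truncate=False)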
So as a technology demonstration of Quickwit this seems really cool--it can clearly scale!--but it's also kind of an indictment of Binance (and all the other companies doing ELKish things out there).
[1] https://github.com/y-scope/clp
EDIT: See also [2][3].
[2] https://www.uber.com/blog/reducing-logging-cost-by-two-order...
[3] https://www.uber.com/blog/modernizing-logging-with-clp-ii/