This was a lovely passage from Dyson’s Web of Stories interview, and it struck a chord with me, like it clearly did with the authors too.
It happened when Dyson took the preliminary results of his work on the pseudoscalar theory of pions to Fermi, and Fermi very quickly dismissed the whole thing. It was a shock to Dyson, but it freed him from wasting more time on it.
Fermi: When one does a theoretical calculation, either you have a clear physical model in mind or a rigorous mathematical basis. You have neither. How many free parameters did you use for your fitting?
Dyson: 4
Fermi: You know, Johnny von Neumann always used to say ‘with four parameters I can fit an elephant, and with five I can make him wiggle his trunk’.
Indeed, but worth noting that LZ is a modelling scheme, whilst Huffman is a coding technique.
That is, LZ determines, dynamically as it goes, which elements we want to encode and their probabilities. Then you need a coder, like Huffman, to actually encode them.
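A rough way to picture that split, in Haskell-ish terms (the type names here are just mine, not from the post): the modelling side produces tokens, and the coding side turns tokens into bits.

```haskell
import Data.Word (Word8)

-- Modelling side: LZ77 rewrites the input as a stream of tokens, either
-- literal bytes or (distance, length) back-references into earlier output.
data Token
  = Literal Word8
  | Match { distance :: Int, matchLen :: Int }
  deriving (Show)

-- Coding side: a separate concern that maps each token to bits according
-- to some probability estimate (Huffman, arithmetic coding, ...).
type Coder = Token -> [Bool]
```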
In the post I used a semi-static zero-order byte-based model.
That means I counted the byte occurrences first and used those counts as the probabilities throughout the whole encoding. Then I used Huffman codes to translate those probabilities into bits.
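Roughly, that counting pass looks like this (a sketch, not the code from the post; the names are mine):

```haskell
import qualified Data.ByteString as BS
import qualified Data.Map.Strict as M
import Data.Word (Word8)

-- First pass of the semi-static zero-order model: count how often each
-- byte occurs; count/total is then used as that byte's probability for
-- the whole encoding.
byteCounts :: BS.ByteString -> M.Map Word8 Int
byteCounts = BS.foldl' (\m b -> M.insertWith (+) b 1 m) M.empty

probabilities :: BS.ByteString -> M.Map Word8 Double
probabilities bs = fmap (\c -> fromIntegral c / total) (byteCounts bs)
  where total = fromIntegral (BS.length bs)
```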
But I'm considering writing a follow-up changing this static model for an LZ77 one as I think that would be fun.
Haskell's speed can be competitive with systems languages but keep in mind that its killer feature is ease of abstraction.
The idea is that it is simple to assemble multiple parts into a coherent, well-organised program, which matters for the entirety of the program, not just the tight loops.
So, with the nice FFI Haskell has, you can always drop down to languages without a GC for inherently imperative optimisations. Then you wrap that into a library with nice types and you can now leverage that raw power anywhere in your Haskell code where the types will match.
I worked at Meta on a high-performance Haskell application and that's what we did: we wrote beautiful, large, fast Haskell programs which, in some specialised parts, had C++ building blocks. 99% of the time was spent in Haskell land composing things into more and more useful applications.
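For anyone who hasn't seen it, the FFI side is not much ceremony. A minimal sketch, assuming a made-up C function `crunch_bytes` that does some tight inner loop:

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Unsafe as BSU
import Foreign.C.Types (CInt, CSize)
import Foreign.Ptr (Ptr, castPtr)
import Data.Word (Word8)

-- Hypothetical C/C++ routine doing the GC-free, imperative inner loop.
foreign import ccall unsafe "crunch_bytes"
  c_crunchBytes :: Ptr Word8 -> CSize -> IO CInt

-- Wrap the raw pointer-and-length interface behind an ordinary Haskell
-- type, so the rest of the program just composes ByteString -> IO Int.
crunch :: BS.ByteString -> IO Int
crunch bs =
  BSU.unsafeUseAsCStringLen bs $ \(ptr, len) ->
    fromIntegral <$> c_crunchBytes (castPtr ptr) (fromIntegral len)
```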
The general rule of thumb I’d give is that a performance-aware but not micro-optimized Haskell program will typically run in about 2x to 5x the time of a comparable C program, and will take somewhere between 2x and 10x as much memory. For a naive Haskell program the range is much bigger: maybe 2x to 10x as much time and 10x to 1000x as much memory (it’s easy to do a lot of allocations in Haskell).
For extremely optimized Haskell you can get close to the speed of C, but there’s still a garbage collector.
There are also certain classes of problem where a naive Haskell implementation can beat other languages by a mile, including C, if you use the same implementation in both languages. Laziness can be really great sometimes. This didn’t happen much in practice though, because the kind of code that’s really efficient with lazy evaluation is very obviously not efficient in a strict language, so people don’t usually write code that way.
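A toy illustration of the shape of that (contrived, of course): you can compose an infinite producer with a finite consumer, and only the demanded part ever gets computed, whereas a strict language would have to restructure the code.

```haskell
-- Laziness: map over an infinite list, stop as soon as the consumer stops
-- demanding values. Evaluated strictly, this same composition would diverge.
smallSquares :: [Integer]
smallSquares = takeWhile (< 100) (map (^ 2) [1 ..])

main :: IO ()
main = print smallSquares  -- [1,4,9,16,25,36,49,64,81]
```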
In the end I’d say Haskell is a good choice for performance-sensitive but not performance-critical programs. In a larger Haskell application, if you have a performance-critical bit you can usually write Haskell code that will be fast enough if you know what you’re doing. For something standalone that needs to be as fast as possible, or the most critically performance-sensitive parts of a bigger application, I’d consider using C or C++.
To rephrase using my experience: "performance aware" Haskell is about as "fast" as Go, but needs more memory, and both are slower than the same Java code - but both are way more fun to write ;). Optimising Go is easier for most people though; in Haskell you _really_ need to know Haskell internals (how to read Core and write unboxed code) and understand laziness.
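As a concrete example of the kind of internals knowledge that means, here is the classic strict-accumulator fix (a sketch, not from the article): without optimisations the first version builds a long chain of thunks, while the bang pattern keeps the loop in constant space and lets GHC unbox the accumulator in Core.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Lazy accumulator: without -O, `acc` grows as an unevaluated thunk chain.
sumLazy :: [Int] -> Int
sumLazy = go 0
  where
    go acc (x : xs) = go (acc + x) xs
    go acc []       = acc

-- Strict accumulator: the bang forces `acc` at each step, so it stays a
-- plain (unboxable) Int and the loop runs in constant space.
sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go !acc (x : xs) = go (acc + x) xs
    go !acc []       = acc

main :: IO ()
main = print (sumStrict [1 .. 10000000])
```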
The goal of this implementation is not to be fast, but to be clear.
I am doing some inefficient things (like two-pass encoding) on purpose to keep things simple and clear. So using this particular piece of code to judge a language's performance potential is not really the way to go here.
The point they’re making is that there is no performance without tradeoffs, and “fast” is meaningless unless you define what you’re measuring. Asking the question implies a misunderstanding of the intent of the implementation; OP was trying to subtly let them know.
That’s interesting. I guess this is not usually used because you may have a long string of bits that is ambiguous till you get to a disambiguating bit.
Something like
`100000000000000001`
In this case, to know whether the first code was an `a` or a `c`, you have to read all the way to where the zeroes end.
There is one way in which Huffman codes are better: they are easier to explain and simpler to implement.
I went for simplicity of exposition in the post, but arithmetic coders can indeed get arbitrarily close to the entropy, which is not quite the case with Huffman.
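To make that gap concrete with a back-of-the-envelope example: for a very skewed two-symbol source, symbol-by-symbol Huffman still has to spend a whole bit per symbol, while the entropy (which an arithmetic coder can approach) is far lower. A quick sketch:

```haskell
import Text.Printf (printf)

-- Entropy of a two-symbol source with probabilities p and 1 - p, in bits.
entropy :: Double -> Double
entropy p = negate (p * logBase 2 p + q * logBase 2 q)
  where q = 1 - p

main :: IO ()
main =
  -- Huffman must give each symbol a whole codeword, so a two-symbol
  -- alphabet always costs 1 bit/symbol no matter how skewed it is.
  printf "entropy: %.3f bits/symbol vs Huffman: 1 bit/symbol\n" (entropy 0.99)
```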