I think CDs use two layers of Reed-Solomon codes, interleaved in a non-obvious pattern. A burst that takes out a whole small block actually gets spread amongst multiple large blocks, and because you know which symbols are bad, you're correcting erasures instead of errors, which is easier.
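A toy sketch of that interleaving idea (the sizes and layout here are made up for illustration, not the real CIRC scheme): write codewords into the rows of a buffer, transmit it column by column, and a long burst in the channel turns into just a few missing symbols per codeword, at positions the deinterleaver already knows.

    # Toy block interleaver: a burst in the channel becomes a few scattered,
    # known-position erasures per codeword. Sizes are illustrative only.
    DEPTH = 8        # number of codewords interleaved together
    LENGTH = 32      # symbols per codeword

    # Pretend codewords: row i stands in for codeword i (already encoded upstream).
    codewords = [[(i, j) for j in range(LENGTH)] for i in range(DEPTH)]

    # Interleave: transmit column by column instead of row by row.
    stream = [codewords[i][j] for j in range(LENGTH) for i in range(DEPTH)]

    # A burst wipes out 20 consecutive symbols on the channel.
    BURST_START, BURST_LEN = 100, 20
    received = [None if BURST_START <= k < BURST_START + BURST_LEN else s
                for k, s in enumerate(stream)]

    # De-interleave back into rows and count erasures per codeword.
    rows = [[None] * LENGTH for _ in range(DEPTH)]
    k = 0
    for j in range(LENGTH):
        for i in range(DEPTH):
            rows[i][j] = received[k]
            k += 1

    for i, row in enumerate(rows):
        missing = [j for j, s in enumerate(row) if s is None]
        print(f"codeword {i}: {len(missing)} erasures at {missing}")

The 20-symbol burst comes out as only 2-3 erasures per codeword, at positions the decoder already knows about, which a small code with 4 parity symbols can repair.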
So basically the issue is that large codes (ex: RS(32768, 32768), a 32k-symbol message + 32k-symbol parity) are very expensive. I don't quite remember the math off the top of my head, but I believe that to do error correction with a code like this, you need to build a 32768x65536 generator matrix (~2GB). That quite possibly could work on modern computers, but you can imagine that in the 80s and 90s it was infeasible.
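Back-of-the-envelope version of that figure (assuming one byte per matrix entry; the 16-bit symbols such a code would actually need would double it):

    # Rough size of the generator matrix for a hypothetical 32k-data + 32k-parity code.
    k = 32768            # message symbols
    n = 32768 + 32768    # total codeword symbols
    entries = k * n      # generator matrix is k rows x n columns
    print(entries / 2**30, "GiB at one byte per entry")   # -> 2.0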
The largest codes in practice use 8-bit symbols (the RS(223,32) the satellite guy mentioned). CD-ROMs use much, much smaller codes: an RS(32,28) inner code + an RS(28,24) outer code.
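For scale, the standard RS bounds applied to those two codes (a code with n-k parity symbols corrects (n-k)/2 unknown errors, or n-k erasures when the bad positions are known):

    # Correction capability of the two CD codes, per the standard RS bounds.
    for name, n, k in [("inner RS(32,28)", 32, 28), ("outer RS(28,24)", 28, 24)]:
        parity = n - k
        print(f"{name}: {parity} parity symbols -> "
              f"{parity // 2} errors or {parity} erasures per codeword")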
Instead of making such a large code, you can make a smaller interleaved code that handles a burst error about as well as the large code would, but not other kinds of error patterns.
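The capacity arithmetic for that, roughly (the depth and erasure count below are made-up illustrative numbers, not the real CD parameters): interleave D codewords and let each absorb e erasures, and a contiguous burst of up to about D*e symbols gets spread thin enough to fix.

    # Burst tolerance of a simple block interleaver, roughly: a burst of
    # length <= DEPTH * ERASURES hits each codeword at most ERASURES times.
    DEPTH = 28       # interleave depth (illustrative)
    ERASURES = 4     # per-codeword erasure capacity (e.g. 4 parity symbols)
    print(f"burst of up to ~{DEPTH * ERASURES} consecutive symbols: "
          f"each of the {DEPTH} codewords loses at most {ERASURES} symbols")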
------
CD-ROMs, designed in the 80s, would go for the interleaved code. (1GB of RAM on every CD-ROM reader would have been prohibitively expensive in the 80s!)
In practice, it's burst errors that happen most frequently anyway. Random errors are exceedingly rare in practice; most errors are "localized" to a single region. In this paper example, a tear or rip in the paper would erase many bits that are "next" to each other.
By focusing on "burst" errors instead of "all possible random errors", you end up with much faster computations that use much smaller matrices.
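To put numbers on "much smaller matrices" (same one-byte-per-entry assumption as above): a standard 8-bit RS code needs kilobytes, not gigabytes, and interleaving just reuses that same small code over and over.

    # Generator-matrix footprint: one giant code vs. a small code reused via interleaving.
    big_k, big_n = 32768, 65536      # the hypothetical giant code from above
    small_k, small_n = 223, 255      # the 223-data + 32-parity satellite code (255 total)
    print(f"giant code: {big_k * big_n / 2**30:.1f} GiB")     # ~2.0 GiB
    print(f"small code: {small_k * small_n / 1024:.1f} KiB")  # ~55.5 KiB
    # Interleaving reuses the same ~55 KiB worth of code machinery for every codeword.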
--------
I hear that the new hotness for the year-2000+ era is Low-Density Parity-Check (LDPC) codes. But I don't actually know how they work or their math at all. I know they're an "imperfect" code from a theoretical basis (they'll correct fewer errors than a plain Reed-Solomon code of the same overhead), but they are "less imperfect" than interleaved codes and other codes designed for burst errors... while having even better performance characteristics.
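(For anyone curious, the defining object is just a sparse parity-check matrix: a word is a valid codeword when every check it touches sums to zero mod 2. A toy syndrome check below, which is nothing like the iterative belief-propagation decoding real implementations use.)

    # Toy LDPC-flavored parity check: a small sparse H and the syndrome H*c mod 2.
    # Illustrative only; real LDPC codes use huge, carefully constructed sparse matrices.
    H = [  # each row is one parity check touching only a few bit positions
        [1, 1, 0, 1, 0, 0],
        [0, 1, 1, 0, 1, 0],
        [1, 0, 0, 0, 1, 1],
    ]

    def syndrome(word):
        # A valid codeword satisfies every check: all syndrome bits come out 0.
        return [sum(h * c for h, c in zip(row, word)) % 2 for row in H]

    print(syndrome([0, 0, 0, 0, 0, 0]))  # -> [0, 0, 0], valid
    print(syndrome([1, 0, 0, 0, 0, 0]))  # -> [1, 0, 1], the flipped bit trips its checks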
Reed-Solomon is great... if you've got the compute power for it.
Another important piece is that the total size of the code (223+32 = 255) is determined by the _word size_; in the common case, 8 bits. For N-bit words, your code may contain at most 2^N - 1 symbols, or 255. So to practically implement an RS(32k, 32k), you'd need an efficient way to do arithmetic on 16-bit words, which we just didn't have in the 70s when these codes were devised.
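Concretely, the 2^N - 1 bound per symbol size:

    # Max Reed-Solomon codeword length (in symbols) over GF(2^N): 2^N - 1.
    for bits in (8, 16):
        print(f"{bits}-bit symbols: up to {2**bits - 1} symbols per codeword")
    # 8-bit  -> 255    (hence the 255-symbol codes used in practice)
    # 16-bit -> 65535  (roughly what a 32k-data + 32k-parity code would need)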