B-Trees - Balanced Search Trees for Slow Storage

timr · on July 7, 2008

This is a really sloppy summary. First, the given example is broken (the numbers in the nodes change from fig 1 to fig 2). Second, the delete operation is only like the insert operation if you look at it from a distance, squint, and tap your heels together three times. Implementing the delete is about an order of magnitude more difficult than the insert, IMO. An article on the details of that would be enlightening.

(Personal anecdote: in college, I spent a nightmarish two days implementing the delete operation on a B-tree, because I had put it off until the last minute, thinking that it was "just like the insert, but with contractions". I was wrong. I don't think I've spent that much consecutive time in one chair before or since....)

noodle · on July 7, 2008

b-trees. ugh. i wrote code around b-trees in my data structures class in college. but to my surprise, when i got out, pretty much no one in the "real world" (hiring for entry-level positions) use them. they mostly use things that we didn't ever actually play around with -- stuff like hash tables.

ajross · on July 7, 2008

B-trees (and related data structures) are pervasive in the real world. Pretty much any application that requires high speed access from disk uses them. Think about filesystems, databases, etc... They're hugely useful.

But they're not useful for the same purpose as a hash table, which is also a pervasive data structure, but for a different problem area: constant time access to in-memory data.

noodle · on July 7, 2008

yes, i know. my point is, try finding an entry-level position that requires knowledge and experience of b-trees and not hashtables.

cdr · on July 7, 2008

Like a lot of things, it's good to know for the interview but most likely useless after that.

lampy · on July 7, 2008

Try finding a senior-level engineering position that doesn't require knowledge of B-trees or similar computer science topics.

Tamerlin · on July 7, 2008

So far in my experience the companies hiring want you to know that sort of thing, but that knowledge is largely wasted at most jobs.

Tamerlin · on July 7, 2008

Knowing what a binary tree is turned out to be overkill for even most of the senior developer positions I've had.

dfranke · on July 7, 2008

Hash tables can be used on disk as well.

Tamerlin · on July 7, 2008

That's certainly true. They're also useful for distributed caches, but that doesn't make B-trees any less useful when the situation calls for it.

dfranke · on July 8, 2008

B-trees are basically the only sane option if you ever need to do range queries. Hash tables tend to outperform them a bit on all other operations.