
This is true of the common tombstone approach to deletions in hash tables, which also requires rehashing (as in resizing) if too many tombstones accumulate.

Somehow dissemination of Knuth v3, chapter 6.4, Algorithm D has been weak, though it has lately become known as "backshift deletion" - unsure who coined that. It is limited to linear probing (but these days that also creates the least memory traffic/potential latency). With this approach, there is no real specialized form of "hash tables that can delete". You may already know all of this and I'm not disagreeing with anything - just a natural follow-up.
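For concreteness, here is a minimal Python sketch of backshift deletion for an open-addressing, linear-probing table. The class name, the EMPTY sentinel, and the fixed capacity are illustrative assumptions rather than any particular library's layout, and resizing/load-factor management is omitted, so it assumes the table never fills completely.

  # Backshift deletion (in the spirit of Knuth v3, 6.4, Algorithm D)
  # for an open-addressing, linear-probing table. Illustrative sketch:
  # no resizing, so the table must never become completely full.
  EMPTY = object()

  class LinearProbeTable:
      def __init__(self, cap=8):
          self.cap = cap
          self.slots = [EMPTY] * cap      # each slot: EMPTY or (key, value)

      def _home(self, key):
          return hash(key) % self.cap

      def insert(self, key, value):
          # Plain linear-probe insert; nothing here is delete-specific.
          i = self._home(key)
          while self.slots[i] is not EMPTY and self.slots[i][0] != key:
              i = (i + 1) % self.cap
          self.slots[i] = (key, value)

      def find(self, key):
          i = self._home(key)
          while self.slots[i] is not EMPTY:
              if self.slots[i][0] == key:
                  return i
              i = (i + 1) % self.cap
          return -1

      def delete(self, key):
          i = self.find(key)
          if i < 0:
              return False
          # Backshift: walk the probe chain past the hole at i and pull back
          # any entry whose home slot lies cyclically outside (i, j] - i.e.
          # any entry a later find could no longer reach if the hole were
          # simply emptied in place.
          j = (i + 1) % self.cap
          while self.slots[j] is not EMPTY:
              h = self._home(self.slots[j][0])
              if (j > i and (h <= i or h > j)) or (j < i and h <= i and h > j):
                  self.slots[i] = self.slots[j]
                  i = j
              j = (j + 1) % self.cap
          self.slots[i] = EMPTY           # no tombstone left behind
          return True

The insert and find paths are exactly what an insert-only table would have; only delete pays for the backshift scan.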



If we look at the major standard libraries for hash tables, you're telling me there will be no overhead to supporting the ability to delete keys?

Isn't resizing logic already a counterexample?


Resizing has to happen regardless of the ability to delete a key, unless you know upfront how many entries you're going to add; I'd be much more worried about a home-grown hash table's tuning for when to rebalance itself at, say, 80% of buckets full.

If you _do_ know how many entries you're going to be adding upfront, you probably don't even want a hash table; you're probably better off with an array!


You didn’t answer my question.

> I'd be much more worried about a home grown

This is rhetorical framing.

As a Knuth reader, you should know that when you look inside the box, you find some other idiot's home-grown thing, built by someone who never read the research.


> look at the major standard libraries for hash tables

I don't know what would count as "major", but at least to me "major" does not imply "good". As I mentioned, this idea is ancient (that was the point of citing Knuth from over half a century ago - that was me, not @dwattttt) but also weakly disseminated. Given that weak dissemination, it may well be uncommon in "major" impls, but that merely recapitulates my point. Honestly, that and the resizing thing were such weird argumentation that I didn't take the time to reply.

One does usually resize on insert to keep things sparse, as @dwattttt mentioned. With deletions, you might want to resize after a delete to keep things dense, which could help if you iterate over the table a lot after deleting much of it. That is not a cost you would ever pay if you don't delete. So the ability to delete is still something insert-only workloads don't pay anything for.
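A hedged sketch of that shrink-after-delete idea, written as a standalone function over a slot array; the threshold, minimum capacity, and names are illustrative assumptions, not taken from any particular library:

  # Once enough entries have been removed, rehash into a smaller slot
  # array so later iteration touches fewer empty slots. Slots are EMPTY
  # or (key, value) pairs; "used" is the count of occupied slots.
  EMPTY = object()

  def maybe_shrink(slots, used, min_cap=8, shrink_at=0.25):
      cap = len(slots)
      if cap <= min_cap or used >= shrink_at * cap:
          return slots                     # still dense enough; nothing to do
      new_cap = max(min_cap, cap // 2)
      new_slots = [EMPTY] * new_cap
      for entry in slots:                  # plain linear-probe reinsertion
          if entry is EMPTY:
              continue
          i = hash(entry[0]) % new_cap
          while new_slots[i] is not EMPTY:
              i = (i + 1) % new_cap
          new_slots[i] = entry
      return new_slots

Calling this only from the delete path keeps the cost confined to workloads that actually delete.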

Moving past the off-point qualification of "major"-ness & the weird resize points: if you are actually curious about the details of how this can work and want a concrete example, you could look at the Nim stdlib Table implementation: https://github.com/nim-lang/Nim/blob/fbdc9a4c19aafc25937aaa5... or you could read the Knuth subsection that I referenced, though Knuth uses a rather archaic linear probing toward smaller indices which probably seems unnatural to modern readers. There is nothing special added to the insert-only data/pathways to support deletes in either that Nim code or the Knuth.

Deletes done this way can benefit from saving integer hash codes (and that happens to be done in the Nim example), but so do all finds (via a cheap prefix compare) and resizes (by avoiding hash()). So, in the space-time trade-off analysis of "is saving hash codes worth it?", delete finds & backshifts are just one more part of some workload. That is "deletes in a workload impact space-time trade-offs", though, not "the ability to delete costs insert-only workloads", which seems to be the contentious point.
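For what that looks like in practice, here is a hedged sketch of caching the integer hash code in each slot so lookups can do a cheap "prefix compare" before the possibly expensive key comparison, and so resizes or backshift home-slot computations can reuse the cached code instead of calling hash() again. The slot layout and names are illustrative, not the Nim stdlib's:

  # Each occupied slot stores (hash_code, key, value); None marks an
  # empty slot. The cached hash_code is compared first; the full key
  # comparison only runs when the codes already match.
  def find(slots, key):
      cap = len(slots)
      hc = hash(key)
      i = hc % cap
      while slots[i] is not None:
          slot_hc, slot_key, _ = slots[i]
          if slot_hc == hc and slot_key == key:
              return i
          i = (i + 1) % cap
      return -1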



