Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Yes and that's probably why there is no set in the go std lib. You just can use struct{}{} as (empty) value in a map.


As an aside, this is exactly how HashSets are implemented in the Rust standard library.[0]

[0]: https://rust-lang.github.io/hashbrown/hashbrown/hash_set/ind...


I thought the idomatic approach was to represent sets with map[key]bool


The problem of doing that is twofold:

* each key now has 3 possible states (true, false, and unset) rather than two

* a bool takes 1 byte to store (which may get more problematic due to alignment, I've never checked what the memory layout of go's map is so I don't know how much of a concern it is there)

An empty struct fixes these issues: a key being present means the item is in the set, and an empty struct is zero-sized.

edit: apparently go maps are laid out as buckets of 8 entries with all the keys followed by all the values, so there's no waste due to padding at least.


As someone who finds "indicating intent in the code" an important thing, I must admit I find this concert slightly horrifying. A Map and a Set are two different things and which one you use conveys some intent as to what you mean by your code. I get that it works, but it would still make me unhappy to do.


Ideally, if the language/standard library provides maps but not sets, and you wanted to use the idiomatic set = map of type -> bool approach, you'd create a wrapper so that intent is preserved but users don't have to know about the backing mechanism. Of course, it's obnoxious if everyone has to do this themselves and the language lacks generics so you have to write this once for each potential type.


Wrappers that don't wrap very much aren't worthwhile IMO. Its like getting an amazon box with a fedex box inside. Just give me the package itself.


Shallow wrappers that don't wrap much (now) but convey intent better are valuable if you buy into the idea of modularity and encapsulation in general. Some reasons (probably more out there):

1. The now-provided interface can more clearly express what the code is intending to do (better names for the operations you're providing than the underlying system has, remove_from_end to pop or dequeue)

2. Hide methods of interacting with the underlying data structures that you don't want people to use (use a C++ vector as a stack, but don't want random access)

3. You can replace the underlying mechanisms at will without impacting the users

If you just wrap a vector in your own vector class and otherwise provide the same operations (or a limited set of operations but for no good reason to restrict usage), sure, that's moronic. But if you wrap a vector class in a "BigNumber" class and provide operations like add, subtract, mod, etc. then value has been added. Same thing with the idea of wrapping a map in a set interface.


Wrapping to hide is valuable, but wrapping has a cost which is generally underrated. Every wrapper is a thing itself which must also be understood when trying to understand how things work. And every wrapper is a division between blocks of code, meaning if you have changes which impact multiple layers of wrap, its harder to determine what to change, and to maintain the understandability of each layer.

For this reason im an advocate of lazy wrapping. Create an abstraction at the last moment, when its painfully obvious what benefit it will provide, when you can see how it ties together disparate pre-existing code blocks, and when you have the highest confidence that it will stick and not need to be unwrapped next week by the senior dev.


> Every wrapper is a thing itself which must also be understood when trying to understand how things work.

I'd offer a different view. Wrapping/abstracting like this should reduce the amount of things a user of the abstraction needs to know. I don't care how Java's BigInteger class works under the hood, only that it does what I need it to do. If I did have to know how it worked to use it, this suggests a failure on the part of whoever created it.

It does increase what the maintainer of the underlying system (including the abstraction) needs to know, but if done in a sane manner this should not be a burden. So we're making a tradeoff. The user gets something simpler, the underlying system maintainer gets something a bit more complex. Or the user gets something more complex and with more boilerplate but the underlying system maintainer gets something simpler (though will be pestered with, "Why don't you offer a generic set yet?" asked for years to come).

> meaning if you have changes which impact multiple layers of wrap, its harder to determine what to change, and to maintain the understandability of each layer.

When this happens, in my experience, it has meant one or more of:

1. The choice of how to wrap/abstract was poorly chosen

2. The choice was made too early (before the problem was properly understood)

3. A major change was made that would've been hard to identify/plan for earlier

I ignore (3) when writing code beyond what's reasonable to plan for. (1) and (2) though mean I mostly agree with this:

> Create an abstraction at the last moment

But rephrased, borrowing the phrase I first saw in some Lean Software book, "last responsible moment." It's not sensible, for instance, to use a map to booleans as a set throughout the project's life and only wrap it at the last moment. If you know it's going to be a set, wrap it early because this offers clarity to your code and reduces boilerplate/noise. If you know you need a stack, and have a vector available, wrap it and hide the random access option. If it later turns out that you also want random access, you can offer it, but if it's been available from the start then users will have abused that and you won't be able to rein it in later (without a lot of effort and heartache).


I'm not sure how I feel about the `set` wrapper. I suppose its nice to hide some of the detail of how the set works. On the other hand, it is confidence inspiring to be told "this is just a map, its really that simple" as a user. I have a similar conflict about string alias types like `type MyId string`.


But it's not a Map, it's a Set. If I see an API return a Map, I expect it's returning a relationship of keys to values, because that's what a Map is used for. If I see it returning a Set, I expect it's return a collection of unique values, because that's what a Set is used for.

I mean, you could support only List objects in the language and call it a day because they can be used as anything else. Or only lambdas, for the same reason. At the end of the day, though, having structures for the various ways you want to treat data is helpful. Using the right structure to hold data reduces cognitive load.


> A Map [with value type = void/unit/() [0]] and a Set are two different things

Not to defend Go or anything, but that's like saying:

| A Array [with element type = byte/char/u8] and a String are different things

It might be useful to call them different names (of course that would require Go to support generic typedefs for `type Set k = Map k Void`), but they're still fundamentally the same thing.

0: which, to be fair, is not the same thing as a map with bool values.


Fundamentally, all values are a collection of bits. That doesn't mean having distinct structures built on top of those bits isn't useful.


A map[T]bool has 3 states for every key; absent, true, and false. A map[t]struct{} has 2 states for every key; absent, and present.

People new to Go tend to pick map[T]bool or map[T]int because they're used to using bools and ints throughout their code, but struct{} is the correct value type for sets. (That is not to say that a counting set, map[T]int is useless, however. If you need that, use that!)


I usually use map[T]bool because

  if _, ok := wasTouched[thing]; !ok {
    touch(thing)
    wasTouched[thing] = struct{}{}
  }
is way uglier than

  if !wasTouched[thing] {
    touch(thing)
    wasTouched[thing] = true
  }


That doesn’t actually work unless you first fill all possible values with `false`.


Wrong. The whole point of this construct is that in Go maps return default values for keys that do not exist. The default value for bool is false.


Hehe. It definitely was.. I think this is also still somewhere in "Efficient Go". However this seems to have changed in recent years. I was surprised by this too and personally I still prefer the bool even though it uses a bit more memory.

People argue there are 3 states but it is meaningless in my opinion because you can just ask exists := someMap[someKey] without checking for existence as you do with real maps. Here false is equivalent to not existent.


The bool is meaningless, so empty struct is more clear as it contains no state.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: