Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> but merely two times as large as a 64-bit number. That's nothing.

It's literally 2x as large. This starts to matter quite a bit at the millions/billions of rows scale.

> They're not even that bad for performance, assuming you have random writes regardless.

> Please share the benchmarks to back this up.

I've done [0] benchmarks, with [1] source to recreate it. This was for loading data, but it demonstrates quite nicely the problem of massive B+tree splits. An integer PK loads data ~2x as quickly as a UUIDv4 PK stored as `uuid` type, and ~3x as fast as one stored as `text` type.

> But most Postgres users have 1001 problems that are much bigger than UUIDs.

All of which become compounded by the use of non-k-sortable keys.

> If you've managed to avoid folks writing shitty queries and this is the bottleneck you're facing then I salute you.

Using UUIDs as a PK without having an extremely good reason to do so demonstrates a lack of understanding of how the underlying technology works. Worse, it demonstrates a lack of understanding of a basic data structure, something that devs should have near and dear to their heart.

With Postgres specifically, the use of UUIDv4 (or ULID) PKs will also cause an enormous amount of WAL bloat [2], unless you happen to be rolling your own DB and are using ZFS for the file system, and you've disabled `full_page_writes` (which is the only safe way to do so). I have personally seen this take down prod, because the other nodes couldn't keep up with the changes being issued (fun fact, most AWS instances under `.4xlarge` in size have a baseline network speed that's much lower than rated).

[0]: https://gist.github.com/stephanGarland/ee38c699a9bb999894d76...

[1]: https://gist.github.com/stephanGarland/fe0788cf2332d6e241ff3...

[2]: https://www.2ndquadrant.com/en/blog/on-the-impact-of-full-pa...



> This starts to matter quite a bit at the millions/billions of rows scale.

Only if you have no other data in the database? Our UTF8 encoded strings easily beat all the UUID’s in terms of storage size.


I doubt those are also duplicated repeatedly in indexes throughout the database.


> I've done [0] benchmarks, with [1] source to recreate it.

Where's the source for the data files? Without that it's not possible to make sense of these benchmarks. Are the keys sorted?

Comparing random-insert to insert-at-end, if that is what you are indeed doing, is silly and not representative of real-world scenarios.

Even then, it's _less than_ a 2x overhead, which is far from an order-of-magnitude difference and squarely within the realm of "I don't care."

> All of which become compounded by the use of non-k-sortable keys.

But that's the thing -- they don't! Bad queries, like missing an index, improperly pruning via predicate pushdown and recursive queries will all dominate performance regardless of what you do here. We're not talking _quadratic_ behavior here. The compounding effect you are worried about is negligible for the vast, vast majority of workloads.

> With Postgres specifically, the use of UUIDv4 (or ULID) PKs will also cause an enormous amount of WAL bloat [2],

Again, only for _random insertions_, which is not a problem specific to UUIDs. Yes I agree that if you are rewriting all of your tables all the time you're going to have a bad time. However UUIDs are not inherently problematic as UUIDv7 or their predecessor ULIDs facilitate temporal ordering, which nearly eliminates the disadvantages discussed here.

I fail to see how ULIDs inherently cause WAL bloat, and checking my notes (read: databases) shows they do not. Why do _you_ believe otherwise?

> Using UUIDs as a PK without having an extremely good reason to do so demonstrates a lack of understanding of how the underlying technology works. Worse, it demonstrates a lack of understanding of a basic data structure, something that devs should have near and dear to their heart.

What is problematic is not understanding the basic data structures and the underlying technology, which is the root of most database issues, which is my point: you almost always have bigger fish to fry. Whether or not you opt to wield UUIDs will not save you here.

In fact, 2ndquadrant agrees with me[0]:

> Chances are your database schema is more complicated and uses various other indexes, so maybe the issues due to UUIDs are fairly negligible in the bigger picture.

[0]: https://www.2ndquadrant.com/en/blog/sequential-uuid-generato...


> Where's the source for the data files? Without that it's not possible to make sense of these benchmarks. Are the keys sorted?

The names were randomly generated, as were the UUIDs. Serial / Identity were of course handled by the DB itself. Nothing was pre-sorted.

> 2ndquadrant agrees with me

I wouldn’t say a summary “maybe the issues are fairly negligible” counts as agreeing. Do terrible queries matter more? Often, yes. However, I’ve found there is a strong correlation between devs using UUIDv4s (even after having been warned of the consequences) and poor schema and query design.

I’m not going to argue any points further as I don’t see it being fruitful. My day job is DBRE; I have ran both MySQL and Postgres DBs at the TB scale, managed and self-hosted. I’m sure there are many who have ran far larger clusters, but I believe I have enough experience seeing the impact first-hand to be able to speak about them.


If your argument boils down to "bad engineers use them" then I don't know what to tell you. There's nothing intrinsically wrong or harmful with UUIDs, and they're not really any worse of a footgun than any other feature.

I don't particularly care about your credentials, but it's harmful to preach extremes. UUIDs don't deserve the hate they're getting here.

Credentials: Postgres (and begrudgingly MySQL) for thousands of developers at the TB scale, 100TB scale, as well as PB scale (we don't talk about that one), and all (yes, all) the Postgres-related OLAP DBs, as well as maintaining a fork of Postgres for some time.

The FUD here is unreal.


UUIDs are an inefficient bloated way to store PKs and only make sense in scenarios where distributed clients need a way to independently generate conflict-free identifiers.


To be honest, it sounds like you've made some of the terrible mistakes with uuid PKs described in this thread and now you're just very aggressively defending the poor decision making.

There's no FUD. These are facts.


I haven't! I literally never think about these things because they are literally never problems. It's frustrating to see people suggest that such things are intrinsically problematic because they have misused, or witnessed them be misused, without recognizing that they are not intrinsically worse or better than alternatives.

I've witnessed severe, horrifying problems, absolute nightmares, due to misuse of serial/bigserial. I'm not here trying to convince anyone not to use them. Right tool for the right job and all.

The argument here reads to me akin to suggesting that, a baseball bat is not very good at cutting loaves of bread! don't use baseball bats! you should really stick to knives!

Can we perhaps just agree that it's both possible to hold knives without cutting yourself and possess and wield a baseball bat without ruining your sandwich?

Anyways, at this point you're just posting flamebait and I don't particularly enjoy the direction the thread has taken by pulling credentials and implying inadequacy. It's not constructive. I hope at least some readers are convinced that they need not fear the UUID.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: