> but merely two times as large as a 64-bit number. That's nothing. It's literal...

Aeolun · on June 13, 2024

> This starts to matter quite a bit at the millions/billions of rows scale.

Only if you have no other data in the database? Our UTF8 encoded strings easily beat all the UUID’s in terms of storage size.

VWWHFSfQ · on June 13, 2024

I doubt those are also duplicated repeatedly in indexes throughout the database.

xyzzy_plugh · on June 12, 2024

> I've done [0] benchmarks, with [1] source to recreate it.

Where's the source for the data files? Without that it's not possible to make sense of these benchmarks. Are the keys sorted?

Comparing random-insert to insert-at-end, if that is what you are indeed doing, is silly and not representative of real-world scenarios.

Even then, it's _less than_ a 2x overhead, which is far from an order-of-magnitude difference and squarely within the realm of "I don't care."

> All of which become compounded by the use of non-k-sortable keys.

But that's the thing -- they don't! Bad queries, like missing an index, improperly pruning via predicate pushdown and recursive queries will all dominate performance regardless of what you do here. We're not talking _quadratic_ behavior here. The compounding effect you are worried about is negligible for the vast, vast majority of workloads.

> With Postgres specifically, the use of UUIDv4 (or ULID) PKs will also cause an enormous amount of WAL bloat [2],

Again, only for _random insertions_, which is not a problem specific to UUIDs. Yes I agree that if you are rewriting all of your tables all the time you're going to have a bad time. However UUIDs are not inherently problematic as UUIDv7 or their predecessor ULIDs facilitate temporal ordering, which nearly eliminates the disadvantages discussed here.

I fail to see how ULIDs inherently cause WAL bloat, and checking my notes (read: databases) shows they do not. Why do _you_ believe otherwise?

> Using UUIDs as a PK without having an extremely good reason to do so demonstrates a lack of understanding of how the underlying technology works. Worse, it demonstrates a lack of understanding of a basic data structure, something that devs should have near and dear to their heart.

What is problematic is not understanding the basic data structures and the underlying technology, which is the root of most database issues, which is my point: you almost always have bigger fish to fry. Whether or not you opt to wield UUIDs will not save you here.

In fact, 2ndquadrant agrees with me[0]:

> Chances are your database schema is more complicated and uses various other indexes, so maybe the issues due to UUIDs are fairly negligible in the bigger picture.

[0]: https://www.2ndquadrant.com/en/blog/sequential-uuid-generato...

sgarland · on June 12, 2024

> Where's the source for the data files? Without that it's not possible to make sense of these benchmarks. Are the keys sorted?

The names were randomly generated, as were the UUIDs. Serial / Identity were of course handled by the DB itself. Nothing was pre-sorted.

> 2ndquadrant agrees with me

I wouldn’t say a summary “maybe the issues are fairly negligible” counts as agreeing. Do terrible queries matter more? Often, yes. However, I’ve found there is a strong correlation between devs using UUIDv4s (even after having been warned of the consequences) and poor schema and query design.

I’m not going to argue any points further as I don’t see it being fruitful. My day job is DBRE; I have ran both MySQL and Postgres DBs at the TB scale, managed and self-hosted. I’m sure there are many who have ran far larger clusters, but I believe I have enough experience seeing the impact first-hand to be able to speak about them.

xyzzy_plugh · on June 13, 2024

If your argument boils down to "bad engineers use them" then I don't know what to tell you. There's nothing intrinsically wrong or harmful with UUIDs, and they're not really any worse of a footgun than any other feature.

I don't particularly care about your credentials, but it's harmful to preach extremes. UUIDs don't deserve the hate they're getting here.

Credentials: Postgres (and begrudgingly MySQL) for thousands of developers at the TB scale, 100TB scale, as well as PB scale (we don't talk about that one), and all (yes, all) the Postgres-related OLAP DBs, as well as maintaining a fork of Postgres for some time.

The FUD here is unreal.

hipadev23 · on June 13, 2024

UUIDs are an inefficient bloated way to store PKs and only make sense in scenarios where distributed clients need a way to independently generate conflict-free identifiers.

VWWHFSfQ · on June 13, 2024

To be honest, it sounds like you've made some of the terrible mistakes with uuid PKs described in this thread and now you're just very aggressively defending the poor decision making.

There's no FUD. These are facts.

xyzzy_plugh · on June 14, 2024

I haven't! I literally never think about these things because they are literally never problems. It's frustrating to see people suggest that such things are intrinsically problematic because they have misused, or witnessed them be misused, without recognizing that they are not intrinsically worse or better than alternatives.

I've witnessed severe, horrifying problems, absolute nightmares, due to misuse of serial/bigserial. I'm not here trying to convince anyone not to use them. Right tool for the right job and all.

The argument here reads to me akin to suggesting that, a baseball bat is not very good at cutting loaves of bread! don't use baseball bats! you should really stick to knives!

Can we perhaps just agree that it's both possible to hold knives without cutting yourself and possess and wield a baseball bat without ruining your sandwich?

Anyways, at this point you're just posting flamebait and I don't particularly enjoy the direction the thread has taken by pulling credentials and implying inadequacy. It's not constructive. I hope at least some readers are convinced that they need not fear the UUID.