Hacker News | traderj0e's comments

Spanner in particular wants random primary keys. But there are sharded DBMSes that still use sequential PKs, like Citus. There are also some use cases for semi-sequential PKs like uuid7.

What about Spanner specifically benefits from random IDs over sequential ones?

I'm not an expert on Spanner, but supposedly it's due to hotspotting. Your data is partitioned by primary key, and if you make that sequential, all new writes will hit the same server. https://docs.cloud.google.com/spanner/docs/schema-and-data-m... explicitly recommends UUIDv4, among some other options.
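A toy sketch of the hotspotting argument (all names are invented here, and real Spanner splits are dynamic, so this is only the intuition): with range partitioning, monotonically increasing keys all land in the same range, while uuid4 keys scatter.

```python
import uuid

# Toy model: a "split" owns a contiguous slice of the key space,
# assigned here by the key's leading byte.
def split_for(key_hex, num_splits=4):
    return int(key_hex[:2], 16) * num_splits // 256

sequential = [f"{i:032x}" for i in range(1000, 1100)]  # serial-style keys
random_keys = [uuid.uuid4().hex for _ in range(100)]   # uuid4-style keys

print(len({split_for(k) for k in sequential}))   # sequential writes pile into 1 split
print(len({split_for(k) for k in random_keys}))  # uuid4 writes spread across splits
```

With sequential keys every insert hits the tail split; with random keys the load is roughly uniform.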

That's another thing: some say to use uuid7 for sharded DBs, but this is a serious counterexample.


Is there a problem with that?

Not the original commenter, but I thought sqlite had that title.

sqlite is arguably not really a DBMS, just a DB

It's technically a DBMS, but I can see why you wouldn't compare it to MySQL.

The database is the file, the management system is the api to use the file.

I've known for a long time that you usually want b-tree in Postgres/MySQL, but never understood too well how those actually work. This is the best explanation so far.

Also, for some reason there have been lots of HN articles incorrectly advising people to use uuid4 or v7 PKs with Postgres. Somehow this is the first time I've seen one say to just use serial.


> incorrectly advising people to use uuid4 or v7 PKs with Postgres

choosing between random UUIDs, time-based UUIDs, and sequential integers has too many trade-offs and subtleties to call one of the options "incorrect" like you're doing here.

just as one example, any "just use serial everywhere" recommendation should mention the German tank problem [0] and its possible modern-day implications.

for example, if you're running an online shopping website, sequential order IDs mean that anyone who places two orders can infer how many orders your website is processing over time. business people usually don't like leaking that information to competitors. telling them the technical justification of "it saves 8 bytes per order" is unlikely to sway them.

0: https://en.wikipedia.org/wiki/German_tank_problem
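for the curious, the classic minimum-variance unbiased estimator from that article is N ≈ m + m/k - 1, where m is the largest serial seen and k the sample size. a quick sketch of how little data a competitor needs:

```python
import random

def tank_estimate(observed_ids):
    # Minimum-variance unbiased estimator from the German tank problem:
    # N_hat = m + m/k - 1, where m = largest ID seen, k = sample size.
    m, k = max(observed_ids), len(observed_ids)
    return m + m / k - 1

# A competitor who sees only a handful of their own sequential order IDs...
true_total = 50_000
sample = random.sample(range(1, true_total + 1), 10)
print(round(tank_estimate(sample)))  # ...still gets a decent volume estimate
```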


PK isn't the same as public ID, even though you could make them the same. Normally you have a uuid4 or whatever as the public one to look up, but all the internal joins etc use the serial PKs.
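For illustration, a minimal sketch of that pattern using SQLite (the schema and names are invented for the example): the UUID is touched once at the API boundary, and everything internal rides on the small integer key.

```python
import sqlite3, uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,          -- internal serial PK, used for joins
        public_id TEXT UNIQUE NOT NULL   -- random UUID exposed in URLs/APIs
    );
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id)  -- FK on the int PK
    );
""")

pub = str(uuid.uuid4())
conn.execute("INSERT INTO users (public_id) VALUES (?)", (pub,))

# The external lookup goes through the UUID exactly once; all joins
# afterwards use the integer key.
(user_id,) = conn.execute(
    "SELECT id FROM users WHERE public_id = ?", (pub,)).fetchone()
conn.execute("INSERT INTO orders (user_id) VALUES (?)", (user_id,))
```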

> Normally you have a uuid4 or whatever as the public one to look up, but all the internal joins etc use the serial PKs.

what? that's possible, but it's the worst of both worlds. I've certainly never encountered a system where that's the "normal" practice.

the usual reason people avoid UUIDv4 primary keys is that it causes writes to be distributed across the entire B-tree, whereas sequential (or UUIDv7) concentrates them.

but if you then add an "alternate primary key" you're just re-creating the problem: the B-tree for that unique index will have its writes distributed at random.

if you need a UUID PK...just use it as the PK.


The problem isn't so much the writes, it's the reads. Every time you join tables, you're using a PK 2-4x the size it needs to be, and at least that much slower. Even filtering on a secondary index may involve an internal lookup via PK to the main table. It doesn't take long to start noticing the performance difference.

Since you'd have a secondary index for the public UUID, yes that one index suffers from the random-writes issue still, but it takes a lot of volume to notice. If it ever is a big deal, you can use a separate KV store for it. But if you picked UUID as the PK, it's harder to get away from it.


DB perf considerations aside, a lot of software patterns around idempotency/safe retries/horizontal scaling/distributed systems are super awkward with a serial PK, because you don’t have any unambiguous unique record identifier until after the DB write succeeds.

The DB itself is “distributed” in that it’s running outside the service’s own memory in 99% of cases, and in complex systems the actual DB write may be buried under multiple layers of service indirection across multiple hosts. Trying to design that correctly while also dealing with the pre-write/post-write split on record ID is a nightmare.


Simple sequential IDs are great. If you want UUID, v7 is the way to go since it maintains sequential ordering.

There are subtle gotchas with sequential UUIDs compared to serial, depending on where you generate them. You can really only get a hard sequential guarantee if you generate them at write time on the DB host itself.

But for both serial and DB-generated sequential UUIDs, you can still encounter transaction-commit-order surprises. I think software relying on sequential records should use some mechanism other than the ID/PK to determine order. I’ve personally encountered extremely subtle bugs related to transaction commit order and sequential-ID assumptions multiple times.
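For illustration, a rough sketch of generating a v7 by hand (assuming the RFC 9562 layout: 48-bit Unix-millisecond timestamp up front, then version/variant bits, then random bits). Depending on your Python version, the stdlib `uuid` module may not ship a `uuid7()` helper, so this builds the value manually:

```python
import os, time, uuid

def uuid7():
    # UUIDv7 sketch per RFC 9562 (assumed layout).
    ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits
    value = (ms & ((1 << 48) - 1)) << 80           # timestamp in top 48 bits
    value |= 0x7 << 76                             # version = 7
    value |= (rand >> 4) & ((1 << 76) - 1)         # fill the remaining bits
    value = (value & ~(0x3 << 62)) | (0x2 << 62)   # variant = 0b10
    return uuid.UUID(int=value)

a = uuid7()
time.sleep(0.002)          # force a later millisecond timestamp
b = uuid7()
assert a < b               # later IDs sort after earlier ones
```

Note this only orders IDs by the clock of the host that generated them, which is exactly the gotcha above: two app servers with skewed clocks (or out-of-order commits) break the "sequential" assumption.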


Does all of that apply to PostgreSQL as well, or only MySQL?

Both, assuming you’re ever going to index it; both use a form of B+tree for their base indices.

If it’s just being stored in the table, it doesn’t matter, but also if it doesn’t matter, just use v7.


> just use serial

Ideally you use IDENTITY with Postgres, but the end result is the same, yes.


I view it as an arms race. We even went beyond college degrees being common. Now it's fairly common to also do grad school and other resumé-padding. Yeah that means more learning, but there's also a big zero-sum aspect to this.

Specific example is medical school/residency. To "DMZ" this, they'd need to ignore anything students do during gap years, ignore research too unless it's an MD-PhD program. Everyone should be going straight through unless some personal challenge forces them to delay.

I don't look to CS as an example because it's an unusual bubble on top of all that. CS degrees also became super competitive and subsequently worthless around 2000.


iTerm2 gives you that then. I use it every day at work. Idk why there's no equivalent for Linux.

No, wanting to keep things vanilla when you're dealing with lots of random servers is a valid concern. Just because you can solve this with shell scripting doesn't mean you should.

You could always copy the config to /tmp and use the -f flag.

That's one event containing three markets, each yes/no. And in a way each market is two separate markets, buy/sell yes and buy/sell no, but they mirror each other.

I understand that. That's not my question tho. I am asking for the exact meaning of the 73% number.

It's not. But also a lot of those stats thrown around are misleading.

If the average No costs less than 73 cents, but 73% of all Polymarket markets resolve to No, that would imply that the nothing-ever-happens strategy here is profitable. Are you claiming that it is profitable? Or is one of those premises incorrect?

Edit: conversely, if the average No costs _more_ than 73 cents, but 73% of all Polymarket markets resolve to No, that would imply that an everything-always-happens strategy is profitable (neglecting slippage)


> Edit: conversely, if the average No costs _more_ than 73 cents, but 73% of all Polymarket markets resolve to No, that would imply that an everything-always-happens strategy is profitable (neglecting slippage)

Or just the bid-ask spread: price No at 73.25 and Yes at 27.5, and you have a profitable but purely theoretical mid-market price.
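For concreteness, a tiny sketch of the arithmetic being debated, taking the thread's 73% figure as the assumed No-resolution rate:

```python
# Expected profit per NO share: the share costs `price` and pays $1
# with probability `p_no` (the assumed No-resolution rate).
def ev_no(price, p_no):
    return p_no * 1.0 - price

q = 0.73                             # 73% of markets resolve No (thread's premise)
print(round(ev_no(0.70, q), 4))      # avg NO below 73c -> positive EV: 0.03
print(round(ev_no(0.7325, q), 4))    # NO ask above 73c -> negative EV: -0.0025
```

So whether always-buy-No is profitable hinges entirely on which side of 73c the average traded No price sits, exactly the premise being questioned above.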


From what I've seen and tested, it's been profitable, for the reason you said. Variance and other caveats caused me to not pursue it further. https://news.ycombinator.com/item?id=47754918

Are you willing to pay $.27 for that perspective? Sounds like we have a market!

I've backtested this kind of strategy, and it had a good return (like 100% APR), but then I realized it was cheating by knowing when things were going to resolve. Oftentimes it's not clear. Your return depends a lot on how quickly you can get your money out. I never got around to trying a strat that doesn't know the resolution time, which actually has to be manual because it takes some judgement to pick things that you expect to resolve soon.

Also requires a lot of volume to be "predictable", obviously, since 1 loss sets you back 10-20 wins. It's surprisingly hard to find reasonable-liquidity markets after all your filtering; many have huge spreads or thin books. Scare quotes around "predictable" because you never know if others will adopt this strat, or if a lot of unlikely events will happen due to insiders.

Another thing: just like the author, I excluded sports in all the above. Yes, Polymarket is famous for letting people bet on world events etc., but it turns out it's still more about sports. Betting on the overdog in sports markets seems more appealing because there are plenty of those events with large volume, they're kinda homogeneous, you know exactly when they resolve, and they're harder to rig. I simply never got around to putting real time or money into the overdog strat.
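A small sketch of the capital-recycling point (the numbers are made up): the same per-trade edge annualizes very differently depending on how long the money stays locked before resolution.

```python
def annualized_return(price, p_win, days_locked):
    # Per-dollar expected return of buying the favorite at `price`,
    # compounded by how many times per year the capital can be recycled.
    per_trade = (p_win * 1.0 / price) - 1.0     # expected payout minus stake
    return (1.0 + per_trade) ** (365.0 / days_locked) - 1.0

# Identical ~1% per-trade edge, very different annualized outcomes:
print(annualized_return(0.95, 0.9595, 7))    # resolves in a week
print(annualized_return(0.95, 0.9595, 90))   # capital stuck for a quarter
```

This is why not knowing the resolution time wrecks the backtest numbers: the edge is per-trade, but the return is per unit of locked-up capital time.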


> One loss sets you back 10-20 wins.

didn't look at the numbers, but this one sentence reminds me of selling options for 'passive income' (don't do that)


I drew the same analogy. You put up $0.95 while a YES gambler only puts up $0.05 (ignoring spread); you're providing "insurance" in case of a YES. In theory, even if the market prices reflected the true probability of the event happening, the more expensive side should be netting some "insurance premium" on average, right? Not sure, and idk how to observe whether that's happening.

Polymarket is also holding onto the money in the meantime. Idk what they do with it, but it's not like some other platforms where they at least work with a bank to earn you some tiny interest on it.


They probably make interest off it themselves, so to give you any interest would cut into their margins.

LTCM doing that was an early example of "too big to fail". In the late 90s.

Quintessential hustler logic: inability to compare the gains from wins to inevitable losses.

I assume all the 'no' bets have to have an explicit end date, otherwise the 'no' bet could never win? The time horizon is never unknown on these bets.

The time horizon is unknown sometimes. One example event, "what will happen before GTA VI?" with markets like "China invades Taiwan" and "Jesus Christ returns." The NO for the second one is only 52c rn. Maybe that resolves if GTA VI is permanently canceled?

Yeah, those sorts of bets seem clearly bad unless there is an explicit time limit on them. How long your capital is locked up in the bet, and the opportunity cost of that lock-up, have to be accounted for in determining your expected gains.

Forgot to add: I wet-ran the non-sports strat once with $100 and lost like $5 net across the month. Not enough diversification, like I said, so yeah, maybe I'd have made $6 the next month. I could only find like 10 things that met all the criteria: ≥90c price, low spread, thick enough ask ($5?), not sports, not related to certain topics that I thought were rigged (e.g. Mr Beast, Trump saying keywords in a speech), likely resolving within a week. Some of them also weirdly took longer to resolve than the title suggested.

> 1 loss sets you back 10-20 wins

Good old eat like a bird, poop like an elephant.


I've always written my code in vim and preferred CLIs in general, but I really want a GUI for the terminal itself, including tmux. iTerm2 makes it nice for example, even if it's only to use the meta/super key instead of the heavily overloaded control.

