
> Norwegian PNs have your birth date (in DDMMYY format) as the first six digits. Surely that doesn't change, right? Well, wrong, since although the date doesn't change, your knowledge of it might. Immigrants who didn't know their exact date of birth got assigned 1. Jan by default... And then people with actual birthdays on 1 Jan got told, "sorry, you can't have that as birth date, we've run out of numbers in that series!"

To me, what your example really shows is the problem with incorrect default values, not a problem with encoding data into a key per se. If they'd chosen a non-date for unknown values, maybe 00 or 99 for day or month components, then the issue you described would disappear.

But in any case, the intention behind encoding a timestamp into a UUID isn't to carry any implied meaning. It's both to guarantee uniqueness and, as a side effect, to make IDs more or less monotonically increasing. Whether this is actually desirable depends on your application, but if the ID is used as an indexed key for insertions into a database, it's usually better for performance than a fully random ID, because it avoids rewriting lots of B-tree leaf nodes. If you insert a batch of such keys, they form a cluster on one side of the tree, which can then rebalance with only the top levels needing to be rewritten.
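A minimal sketch of a time-ordered ID in the spirit of UUIDv7 (the RFC 9562 layout: a 48-bit Unix millisecond timestamp in the top bits, then version, variant, and random bits). It is hand-rolled with only the standard library rather than relying on a `uuid7()` helper that may not exist in your Python version. IDs generated within the same millisecond are ordered randomly, so they are only "more or less" monotonic.

```python
import os
import time
import uuid
from typing import Optional


def uuid7(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUIDv7-style value: 48-bit Unix ms timestamp, then random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits
    value = (unix_ms & ((1 << 48) - 1)) << 80      # timestamp in the top 48 bits
    value |= rand
    # Overwrite the version nibble (bits 76-79) with 7.
    value &= ~(0xF << 76)
    value |= 0x7 << 76
    # Overwrite the two variant bits (bits 62-63) with binary 10.
    value &= ~(0x3 << 62)
    value |= 0x2 << 62
    return uuid.UUID(int=value)
```

Because the timestamp occupies the most significant bits, IDs created later compare greater, which is what keeps new inserts clustered at one edge of a B-tree index.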



>To me, what your example really shows is the problem with incorrect default values, not a problem with encoding data into a key per se. If they'd chosen a non-date for unknown values, maybe 00 or 99 for day or month components, then the issue you described would disappear.

You still have that problem from organic birthdays and also the problem of needing to change ids to correct birth dates.


To add to that, birthdays can clump, just like any seemingly "random" data.


Not significantly. For actual births, a couple of holidays have very low rates, but no date sees clumping into much higher rates.


A million dots scattered randomly over a graph can all land on the exact same coordinate if it’s truly random.

What most people intuit as random is some sort of noise function that is generally dispersed and doesn’t trigger the pattern matching part of their brain.


> A million dots scattered randomly over a graph can all land on the exact same coordinate if it’s truly random.

It won't happen though. 0.00000000% chance it happens even once in a trillion attempts.

> What most people intuit as random is some sort of noise function that is generally dispersed and doesn’t trigger the pattern matching part of their brain

Yes, people intuit the texture of randomness wrongly in situations where most buckets are empty. But when you have orders of magnitude more events than buckets, that effect doesn't apply. You get the fairly even results that people expect.
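A small simulation of the events-versus-buckets point, assuming uniform random assignment (real birth data has the holiday dips mentioned above, which this ignores). With 100 events over 365 buckets most buckets stay empty, while with a million events every bucket ends up close to the mean of about 2,740.

```python
import random
from collections import Counter


def bucket_spread(n_events: int, n_buckets: int, seed: int = 0) -> tuple:
    """Drop n_events uniformly at random into n_buckets; return (min, max) counts."""
    rng = random.Random(seed)
    counts = Counter(rng.randrange(n_buckets) for _ in range(n_events))
    # Buckets that received nothing are absent from the Counter, so count them as 0.
    filled = [counts.get(b, 0) for b in range(n_buckets)]
    return min(filled), max(filled)


# Few events, many buckets: most buckets are empty and a few look "clumped".
print(bucket_spread(100, 365))
# Orders of magnitude more events than buckets: every bucket lands near the mean.
print(bucket_spread(1_000_000, 365))
```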


> It won't happen though. 0.00000000% chance it happens even once in a trillion attempts.

It has the same odds as any other specific configuration of randomly assigned dots. The overly active human pattern matching behavior is the only reason it would be treated as special.


>It has the same odds as any other specific configuration of randomly assigned dots

Which doesn't change anything in practice, since "the same odds as any other specific configuration" ignores the fact that the scattered configurations, taken all together, vastly outnumber it (and outnumber the visually ordered configurations in general).

>The overly active human pattern matching behavior is the only reason it would be treated as special.

Nope, it's also the fact that it is ONE configuration, whereas the rest are a much, much larger number. That's enough to make this specific configuration ultra rare in comparison (since we don't compare it to each other configuration individually but to all the others put together).


> >It has the same odds as any other specific configuration of randomly assigned dots

> Nope, it's also the fact that it is ONE configuration, whereas all the rest are much much larger number.

That is overactive human pattern matching at play. I compared the single configuration of all dots in one location to any other specific configuration. You are not comparing it to _every other configuration_, because they are not the same.

You are assigning specific importance to a single valid set of randomly selected data, because it seems significant to our brains.

If I asked you to give me an array of 1 million items, each containing an x and y coordinate, what are the odds that any single specific set of items is returned?

Based on your answer to that, what are the odds of a set being returned with exactly the same x and y coordinates for every item, versus a set with different x and y coordinates?

If you answer anything other than "the same chance", then you either don't think the selection mechanism is random, or you are falling for the standard fallacies around randomness.


>That is overactive human pattern matching at play. I compared the single configuration of all dots in one location to any other specific configuration. You are not comparing it to _every other configuration_, because they are not the same. You are assigning specific importance to a single valid set of randomly selected data, because it seems significant to our brains.

That's just how importance works.

It sets some things aside as "significant to our brains". The universe doesn't care; even total heat death is not "important" if you exclude our own prioritization of things.

Given our classification of orderly configurations as a distinct set, the comparison is between "any of the random-looking/noise-like configurations" vs "any of the orderly-looking configurations". And the former are far more numerous.

>if you answer anything other than it being the same chance, then you either don't think the selection mechanism is random, or you are falling to the standard fallacies around randomness

You're confusing the selection mechanism (random) with the classification mechanism that segments the set of possible outcomes into orderly vs not (not random).

As a simpler example, imagine a bag with N lottery numbers on individual cards. If they pick one at random, the chance any given number has is 1/N. But the chance that a number OTHER than ours comes up is (N-1)/N. Our chances are as good as any other single number's, sure. But they're NOT as good as all other numbers put together.

You argue that "but all are just sets of coordinates" or "all are just lottery numbers".

Sure, but some of those coordinate sets have importance to us, and others don't. And one of these lottery numbers is important to us; all the others aren't. And since the latter is a much larger group, the probability of one of its members coming up is much larger too.

That we consider one subset of results more special than the other is not negotiable. It's a thing we actually do in the real world, and it's the premise of the whole discussion.


Lol, reminds me of a story: at his workplace my brother was invited to join a lottery ticket pool where each got to pick the numbers for a ticket. The numbers he picked were 1-2-3-4-5-6. Although the others, mostly fellow engineers, reluctantly agreed his numbers were as likely as the others, after a couple of weeks they neglected to invite him again.


Entropy says it's special. If you have a million dots and 10,000 coordinates, you have 10,000 ways for all the dots to land in the same coordinate, and a zillion kavillion stupillion ways to have somewhere near 100 dots in each coordinate.
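The entropy argument can be put into numbers. Counting assignments of labelled dots (the "specific configurations" from upthread), all dots in one cell can happen in exactly k ways, while exactly n/k dots per cell can happen in n! / ((n/k)!)^k ways. That coefficient is far too large to print, so this sketch computes its base-10 logarithm with `math.lgamma`. Exactly 100 per cell is only a slice of "somewhere near 100", so it undercounts the comment's even-ish category.

```python
import math


def log10_ways_all_in_one(k: int) -> float:
    # Every dot in the same cell: one arrangement per cell, so k ways.
    return math.log10(k)


def log10_ways_perfectly_even(n: int, k: int) -> float:
    # Multinomial coefficient n! / ((n/k)!)^k, computed in log space via lgamma
    # because the number itself is far too large to write out.
    per_cell = n // k
    assert per_cell * k == n, "sketch assumes k divides n"
    log_e = math.lgamma(n + 1) - k * math.lgamma(per_cell + 1)
    return log_e / math.log(10)


n_dots, n_cells = 1_000_000, 10_000
print(f"all in one cell:      10^{log10_ways_all_in_one(n_cells):.1f} ways")
print(f"exactly 100 per cell: 10^{log10_ways_perfectly_even(n_dots, n_cells):.0f} ways")
```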


No, if it's randomly distributed then every specific configuration has exactly the same chance of happening.

I am laughing at all the people coming out of the woodwork to reply to my original post in this thread misunderstanding randomness and chance.

If you flip a coin a million times and it lands on heads every single time, the millionth and 1 time still has a 50/50 chance of landing on heads.
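A quick simulation of the coin claim, using Python's `random` module as a stand-in for a fair coin (it is a pseudorandom generator, so this illustrates the "true random" assumption rather than proving anything). It estimates P(heads) only on flips that immediately follow a run of at least five heads; for independent flips it should stay close to 0.5, which is why a streak tells you nothing about the next flip.

```python
import random


def heads_after_streak(n_flips: int, streak: int, seed: int = 1) -> float:
    """Estimate P(heads) on flips that immediately follow >= `streak` heads in a row."""
    rng = random.Random(seed)
    run = 0          # current number of consecutive heads
    followers = 0    # flips that came right after a qualifying streak
    heads = 0        # how many of those follower flips were heads
    for _ in range(n_flips):
        flip = rng.random() < 0.5   # True means heads
        if run >= streak:
            followers += 1
            heads += flip
        run = run + 1 if flip else 0
    return heads / followers


print(heads_after_streak(1_000_000, streak=5))
```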


> every specific configuration

Who said anything about specific configurations?

We started this talking about whether things "clump" or not. The result depends on your definition of "clump" but let's say it involves a standard deviation. Different standard deviations have wildly different probabilities, even when every specific configuration has the same probability.

Nobody responding to you is calculating things wrong. We're talking about the shape of the data. Categories. And those categories are different sizes, because they have different numbers of specific configurations in them.
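The "categories of different sizes" point can be checked by brute force on a tiny case. This sketch enumerates every specific assignment of labelled dots to cells (each equally likely under a uniform random choice) and groups them by shape, i.e. the sorted occupancy counts. For 4 dots in 4 cells there are 256 specific configurations: only 4 put every dot in one cell, while 144 have the shape (2, 1, 1, 0).

```python
from collections import Counter
from itertools import product


def shape_counts(n_dots: int, n_cells: int) -> Counter:
    """Enumerate every specific assignment of labelled dots to cells and
    tally how many fall into each 'shape' (sorted occupancy counts)."""
    shapes = Counter()
    for assignment in product(range(n_cells), repeat=n_dots):
        occupancy = Counter(assignment)
        shape = tuple(sorted((occupancy.get(c, 0) for c in range(n_cells)), reverse=True))
        shapes[shape] += 1
    return shapes


total = 4 ** 4
for shape, ways in sorted(shape_counts(4, 4).items(), key=lambda kv: kv[1]):
    # Every specific configuration has probability 1/total; categories differ in size.
    print(shape, ways, f"{ways / total:.3f}")
```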

> the millionth and 1 time

I don't see any connection between the above discussion and the gambler's fallacy?


No, because most likely the coin wasn’t a fair coin then, or there was some other bias going on


I'm talking about true randomness. If you believe there is a bias, then you don't believe it's a random selection.


And then you have to enter/handle a non-date through all systems? How do you know if this non-dated person is over the age of majority? Eligible for a pension?

Maybe the answer is to evenly spread the defaults over 365 days.
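The "spread the defaults" idea could look like this sketch: a hypothetical `default_birth_date` that gives the nth unknown-date person in a given year the nth day of that year, wrapping around. It only illustrates the proposal and is not any registry's actual policy; as the next reply points out, the assigned date is still indistinguishable from a real one.

```python
from datetime import date, timedelta


def default_birth_date(year: int, nth_unknown: int) -> date:
    """Hypothetical policy: the nth person with an unknown date in `year`
    gets day-of-year (nth mod days_in_year), spreading defaults evenly."""
    days_in_year = (date(year + 1, 1, 1) - date(year, 1, 1)).days  # 365 or 366
    return date(year, 1, 1) + timedelta(days=nth_unknown % days_in_year)


print([default_birth_date(1990, i).isoformat() for i in (0, 1, 364, 365)])
# ['1990-01-01', '1990-01-02', '1990-12-31', '1990-01-01']
```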


If you don't know their birthday, you can presumably never answer that question in any case.

If you only know the birth year and keyed 99 as the month for unknown, then your algorithm would determine they were of the correct age at the start of the year after that became true, which I guess would be what you want for legal compliance.
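A minimal sketch of the 99-sentinel behaviour described above, assuming (hypothetically) that an unknown month or day is stored as 99 and compared like any other number. Because 99 sorts after every real month and day, the check only passes from January 1 of the year after the person could first have reached the age, which is the conservative outcome the comment describes.

```python
from datetime import date

UNKNOWN = 99  # hypothetical sentinel for an unknown month or day


def has_reached_age(birth_year: int, birth_month: int, birth_day: int,
                    years: int, today: date) -> bool:
    """True once `years` full years have passed since the recorded birth date.

    Because 99 sorts after every real month and day, an unknown component is
    treated as "the latest possible", so the check errs on the late side.
    """
    age = today.year - birth_year
    if (today.month, today.day) < (birth_month, birth_day):
        age -= 1  # birthday not yet reached this year
    return age >= years


print(has_reached_age(2000, UNKNOWN, UNKNOWN, 18, date(2018, 12, 31)))  # False
print(has_reached_age(2000, UNKNOWN, UNKNOWN, 18, date(2019, 1, 1)))    # True
```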

If you don't even know if the birth year is correct, then the correct process depends on policy. Maybe they choose any year, maybe they choose the oldest/youngest year they might be, maybe they just encode that as 0000/9999.

Again, if you don't know the birth year of someone, you would have no way of knowing their age. I'm not sure that means that the general policy of putting a birthday into their ID number is flawed.

Many governments re-issue national IDs to the same person with different numbers, which is far less problematic than the many governments who choose to issue the same national ID to multiple individuals (looking at you, USA, with your SSN). It doesn't seem like a massive imposition for a person who was originally issued an ID based on an unknown birthday to be re-issued a new ID once their birthday is ascertained. Perhaps even give them a choice of keeping the old one knowing it will cause problems, or take the new one instead and having the responsibility to tell people their number had changed.

Presumably the governments that choose to embed the date into a national ID number do so because it's more useful for their purposes to do so than just assigning everyone a random number.


> or take the new one instead and having the responsibility to tell people their number had changed

Or have the opportunity to scam people into thinking you’re a different person. (E.g. take a $1M loan, go bankrupt, remember your birthday, and take a loan again.)


> To me, what your example really shows is the problem with incorrect default values, not a problem with encoding data into a key per se. If they'd chosen a non-date for unknown values, maybe 00 or 99 for day or month components, then the issue you described would disappear.

Well, till you run out of numbers for the immigrants who don't have an exact birth date.
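To make "run out of numbers in that series" concrete, this sketch counts how many individual numbers for a single birth date survive the check-digit rules. It assumes the widely documented fødselsnummer layout (DDMMYY, a three-digit individual number, two mod-11 check digits with the weights below) and that people born 1900-1999 draw from the 000-499 block; neither detail comes from the thread, so verify against the official specification. Any individual number whose check digit would come out as 10 cannot be issued, which makes the pool per date smaller than it looks.

```python
from typing import List, Optional

# Assumed mod-11 weights for the two check digits.
W1 = (3, 7, 6, 1, 8, 9, 4, 5, 2)
W2 = (5, 4, 3, 2, 7, 6, 5, 4, 3, 2)


def mod11_digit(digits: List[int], weights: tuple) -> Optional[int]:
    """Return the mod-11 check digit, or None when it would be 10 (unusable)."""
    k = 11 - sum(d * w for d, w in zip(digits, weights)) % 11
    if k == 11:
        return 0
    return None if k == 10 else k


def valid_numbers_for_date(ddmmyy: str, individual: range) -> List[str]:
    """All 11-digit numbers for one birth date whose two check digits exist."""
    out = []
    for ind in individual:
        digits = [int(c) for c in f"{ddmmyy}{ind:03d}"]
        k1 = mod11_digit(digits, W1)
        if k1 is None:
            continue  # this individual number can never be issued
        k2 = mod11_digit(digits + [k1], W2)
        if k2 is None:
            continue
        out.append(f"{ddmmyy}{ind:03d}{k1}{k2}")
    return out


usable = valid_numbers_for_date("010190", range(0, 500))
print(len(usable), "usable numbers for 1 Jan 1990 in the 000-499 block")
```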


Belgium's national register number is similar:

YY.MM.DD-AAA.BB

In either the AAA or BB component there is something about the gender.

But it does mean that there is a limit on the number of people of a given gender born per day.

But for a given year, using a placeholder value will only delay the inevitable. Sure, there are more numbers, but they're still limited, as SOME parts need to reflect reality: year, gender (if that's still the case?), etc.


BB is a mod-97 checksum. One of the A digits of AAA encodes your gender in an even/odd fashion; I forget if it's the first or last A doing that. MM or DD can be 00 if unknown. Also, MM has +20 or +40 added in some cases.

If you know someone's birth date and gender, their INSZ is almost certainly one of about 500 numbers, with a heavy skew toward lower AAA values. Luckily, you can't do much damage with someone's number, unlike a US SSN (but I'd still treat it confidential).
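A sketch of the mod-97 check, and of why a date plus gender narrows things to roughly 500 candidates. It assumes the commonly documented rule that BB = 97 - (YYMMDDAAA mod 97), with a 2 prepended to the nine digits for people born in 2000 or later. `odd_sequence` is a hypothetical stand-in for the gender bit (usually documented as the parity of AAA, odd for men), and the sample number is made up.

```python
from typing import List


def belgian_check(first9: str, born_2000_or_later: bool) -> int:
    """Check digits BB for YYMMDDAAA: 97 minus (number mod 97).

    Assumption: for births from 2000 onward a leading '2' is prepended
    before taking the remainder, so the two centuries give different BB.
    """
    number = int(("2" + first9) if born_2000_or_later else first9)
    return 97 - number % 97


def is_valid_belgian(nrn: str) -> bool:
    digits = "".join(ch for ch in nrn if ch.isdigit())
    if len(digits) != 11:
        return False
    first9, bb = digits[:9], int(digits[9:])
    # The century isn't stored explicitly, so accept either interpretation.
    return bb in (belgian_check(first9, False), belgian_check(first9, True))


def candidates(yymmdd: str, odd_sequence: bool, born_2000_or_later: bool) -> List[str]:
    """All 11-digit numbers consistent with a birth date and AAA parity."""
    out = []
    for seq in range(1000):
        if (seq % 2 == 1) == odd_sequence:
            first9 = f"{yymmdd}{seq:03d}"
            out.append(f"{first9}{belgian_check(first9, born_2000_or_later):02d}")
    return out


print(is_valid_belgian("85.07.30-033.28"), len(candidates("850730", True, False)))
```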


> I'd still treat it confidential

Estonian isikukood is GYYMMDDNNNC, and is relatively public. You can find mine pretty easily if you know where to look (no spoilers!). It’s relatively harmless.
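For the curious, the GYYMMDDNNNC layout is easy to decode. This sketch assumes the commonly documented rules rather than anything stated in the thread: G encodes century and sex (1-2 for the 1800s, 3-4 for the 1900s, 5-6 for the 2000s; odd for male, even for female), and C is a mod-11 check digit with a second weighting pass when the first remainder is 10. The sample code is made up to satisfy the checksum.

```python
from datetime import date

# Assumed weights for the two passes of the mod-11 check digit.
W_FIRST = (1, 2, 3, 4, 5, 6, 7, 8, 9, 1)
W_SECOND = (3, 4, 5, 6, 7, 8, 9, 1, 2, 3)


def isikukood_check_digit(first10: str) -> int:
    digits = [int(c) for c in first10]
    r = sum(d * w for d, w in zip(digits, W_FIRST)) % 11
    if r == 10:
        # Second pass with shifted weights; a second 10 becomes 0.
        r = sum(d * w for d, w in zip(digits, W_SECOND)) % 11
        if r == 10:
            r = 0
    return r


def decode_isikukood(code: str) -> tuple:
    """Split GYYMMDDNNNC into (sex, birth date) after verifying C."""
    if len(code) != 11 or not code.isdigit():
        raise ValueError("expected 11 digits")
    if isikukood_check_digit(code[:10]) != int(code[10]):
        raise ValueError("bad check digit")
    g = int(code[0])
    if not 1 <= g <= 6:
        raise ValueError("unsupported century/sex digit")
    century = 1800 + 100 * ((g - 1) // 2)   # 1-2: 1800s, 3-4: 1900s, 5-6: 2000s
    sex = "M" if g % 2 == 1 else "F"        # odd G: male, even G: female
    born = date(century + int(code[1:3]), int(code[3:5]), int(code[5:7]))
    return sex, born


print(decode_isikukood("39001011237"))  # ('M', datetime.date(1990, 1, 1))
```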

Kazakh IIN is YYMMDDNNNNNN (where N might have some structure) and is similarly relatively public: e.g. if you’re a sole proprietor, chances are you have to hang your license on the wall, which will have it.

It’s a bit more serious: I’ve got my mail at the post office by just showing a barcode of my IIN to the worker. They usually scan it from an ID, which I don’t have, but I’ve figured out the format and created a .pkpass of my own. Zero questions – here’s your package, no we don’t need your passport either, have a nice day!

(Tangential, but Kazakhs also happen to have the most peculiar post office layout: it looks exactly like a supermarket, where you go in, find your packages (sorted by the tracking number, IIRC), and go to checkout. I’ve never seen it anywhere else.)


> If they'd chosen a non-date for unknown values, maybe 00 or 99 for day or month components, then the issue you described would disappear

> Norwegian PNs have your birth date (in DDMMYY format) as the first six digits.

You can already feel the disaster rising, because some program always expects the latter.

And it doesn’t fix the problem, it just makes it less likely.
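A sketch of the failure mode being predicted: a strict parser that assumes the first six digits are always a real DDMMYY date, next to a tolerant one that recognises the hypothetical 00/99 placeholders proposed upthread. Python's `%y` also applies its own century pivot (00-68 become 2000-2068), which is not how Norwegian numbers encode the century, so treat the resulting dates as illustrative.

```python
from datetime import date, datetime
from typing import Optional


def strict_birth_date(pn: str) -> date:
    # What many programs quietly assume: the first six digits are always a real DDMMYY date.
    # Raises ValueError on placeholders such as day 99 or month 00.
    return datetime.strptime(pn[:6], "%d%m%y").date()


def tolerant_birth_date(pn: str) -> Optional[date]:
    """Return the encoded date, or None if a 00/99 placeholder marks it unknown."""
    dd, mm = pn[0:2], pn[2:4]
    if dd in ("00", "99") or mm in ("00", "99"):
        return None
    return datetime.strptime(pn[:6], "%d%m%y").date()


print(tolerant_birth_date("99018512345"))  # None
try:
    strict_birth_date("99018512345")
except ValueError as exc:
    print("strict parser failed:", exc)
```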



