
In general, the search space even for email addresses is probably too large for me to crack in a few days, but in the context above, where the author's email was already available online (on her website, in spam databases, in leaked credential datasets, ...), there is hardly any difference. In any case, if you consider my email address "personally identifiable information", I consider its checksum such information as well.
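To illustrate why a checksum of a known address is no better than the address itself: anyone holding a list of candidate addresses can hash each one and look up the published digest. This is a minimal Python sketch, assuming the checksum is an unsalted SHA-1 of the trimmed, lowercased address. The hash algorithm and the normalisation are assumptions, and the addresses are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, Optional


def sha1_hex(email: str) -> str:
    # Assumption: the "checksum" is an unsalted SHA-1 of the trimmed,
    # lowercased address. The original comment does not say which hash is used.
    return hashlib.sha1(email.strip().lower().encode("utf-8")).hexdigest()


def build_lookup(candidates: Iterable[str]) -> Dict[str, str]:
    # Precompute digest -> address once; each later lookup is a dict access.
    return {sha1_hex(email): email for email in candidates}


def reverse_lookup(target_digest: str, table: Dict[str, str]) -> Optional[str]:
    """Return the known address whose digest matches, or None."""
    return table.get(target_digest)


# Hypothetical addresses standing in for a leaked credential dataset.
leaked = ["alice@example.com", "bob@example.org", "carol@example.net"]
table = build_lookup(leaked)
published_checksum = sha1_hex("Bob@Example.org")
print(reverse_lookup(published_checksum, table))  # bob@example.org
```

The cost scales with the size of the candidate list, not with the 2^160 output space. That is why the hash offers little protection once the address is already public.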


So have the customer hash her list with a salt, and you hash your list with the same salt, and everyone goes home for dinner.
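Below is a minimal sketch of the salted approach in Python. It substitutes HMAC-SHA256 keyed with the shared salt for a plain "hash(salt + email)". It assumes addresses are trimmed and lowercased before hashing. The function names and the salt value are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, Iterable, List


def keyed_digests(emails: Iterable[str], shared_salt: bytes) -> Dict[str, str]:
    # HMAC-SHA256 keyed with the shared salt, used here in place of a plain
    # "hash(salt + email)". Addresses are trimmed and lowercased first (assumption).
    out = {}
    for email in emails:
        normalised = email.strip().lower().encode("utf-8")
        digest = hmac.new(shared_salt, normalised, hashlib.sha256).hexdigest()
        out[digest] = email
    return out


def matching_emails(ours: Iterable[str], their_digests: Iterable[str],
                    shared_salt: bytes) -> List[str]:
    """Return our addresses whose keyed digest also appears in their list."""
    mine = keyed_digests(ours, shared_salt)
    common = mine.keys() & set(their_digests)
    return sorted(mine[d] for d in common)


salt = b"hypothetical-shared-salt"  # agreed out of band; not a real value
customer_side = keyed_digests(["alice@example.com", "carol@example.net"], salt)
print(matching_emails(["Alice@example.com", "bob@example.org"], customer_side.keys(), salt))
# ['Alice@example.com']
```

Both parties know the salt, so either one can still hash candidate addresses to test the other's list. The salt only defeats precomputed tables and outsiders who do not have it.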


> In any case, if you consider my email address "personally identifiable information", I consider its checksum such information as well.

I wonder what the odds are of a hash collision with another email address (including ones made by abusing + addressing) that genuinely belongs to another person (rather than just existing), in which case the resulting hash would not uniquely identify a single person.


Very, very small.

The 'birthday attack'[0] article covers this pretty well. If we take the output size of a SHA-1 hash as 160 bits and assume its outputs are uniformly distributed[1], the number of addresses you would expect to hash before the first accidental collision is given below. This is the same count a brute-force collision search would need.

    sqrt(2**160 * PI/2) ~= 1.5 x10**24

Strictly, that is the expected count. A 50% probability of a collision needs about sqrt(2 * ln(2) * 2**160) ~= 1.4 x10**24, which is practically the same. (If I understood/got the maths right.)
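These numbers can be checked with a few lines of Python. This is a sketch under the same assumption of uniformly distributed 160-bit outputs. It uses the standard birthday approximation p ~= 1 - exp(-n(n-1) / (2N)) with N = 2**160. The population of 10 billion addresses is a hypothetical figure chosen only for illustration.

```python
import math


def expected_hashes_before_collision(bits: int) -> float:
    # sqrt(pi/2 * 2**bits): expected number of uniformly random outputs
    # drawn before the first repeat.
    return math.sqrt(math.pi / 2 * 2.0 ** bits)


def hashes_for_probability(bits: int, p: float) -> float:
    # Inverting p ~= 1 - exp(-n^2 / (2 * 2**bits)) gives
    # n ~= sqrt(2 * 2**bits * ln(1 / (1 - p))).
    return math.sqrt(2 * 2.0 ** bits * math.log(1 / (1 - p)))


def collision_probability(n: float, bits: int) -> float:
    # p ~= 1 - exp(-n(n-1) / (2 * 2**bits)); expm1 keeps precision for tiny p.
    return -math.expm1(-n * (n - 1) / (2 * 2.0 ** bits))


print(f"{expected_hashes_before_collision(160):.2e}")  # 1.52e+24
print(f"{hashes_for_probability(160, 0.5):.2e}")       # 1.42e+24
# Hypothetical population of 10 billion distinct addresses:
print(f"{collision_probability(1e10, 160):.1e}")       # 3.4e-29
```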

[0] https://en.wikipedia.org/wiki/Birthday_attack

[1] This is the intent of all hash functions, and I don't think there are any fundamental attributes of email addresses that would cause systematic bias in the output.


To put things into perspective:

Approximately, 10^3 = 1000 ~= 1024 = 2^10, 10^2 = 100 ~= 128 = 2^7.

Assume you have 1 billion (10^9) computers, each computer can do 1 billion hashing operations per second. That is 10^18 operations per second combined.

Rounding up generously (a day really has 86,400 seconds), one day has 1 million seconds (10^6), and one year has 1000 (10^3) days. So, we have 10^18 x 10^9 = 10^27 ~= 2^90 operations per year.
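Here is a quick Python check of the arithmetic so far. It keeps the figures of 10^9 computers at 10^9 hashes per second. It compares the rounded-up year (10^9 seconds) with a calendar year of 365.25 days. The rounding only makes the bound more generous, by roughly 2^5.

```python
import math

OPS_PER_SECOND = 10**9 * 10**9            # 1 billion computers x 1 billion hashes/s
ROUNDED_SECONDS_PER_YEAR = 10**6 * 10**3  # the comment's generous rounding
CALENDAR_SECONDS_PER_YEAR = 365.25 * 86_400


def ops_per_year(seconds_per_year: float) -> float:
    # Total hash operations performed by all machines in one year.
    return OPS_PER_SECOND * seconds_per_year


for label, seconds in [("rounded", ROUNDED_SECONDS_PER_YEAR),
                       ("calendar", CALENDAR_SECONDS_PER_YEAR)]:
    print(f"{label}: 2^{math.log2(ops_per_year(seconds)):.1f} operations per year")
# rounded: 2^89.7 operations per year
# calendar: 2^84.7 operations per year
```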

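100 million years is 10^8 ~= 2^27 years. So, you have 2^117 operations in 100 million years. Geologically, there has been a mass extinction event [1] roughly every 100 million years (e.g. 66, 200 and 251 million years ago). A 128-bit hash has 2^128 possible values, so all those operations together have only about a 2^117 / 2^128 = 2^-11 chance of hitting one particular hash value. Over the same period, about one extinction event is expected. So, unintentionally matching a given hash of 128 bits or more (assuming a good hash function with uniformly distributed output) is less likely than an event happening within the next second that kills 50% of the Earth's species.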
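The comparison can also be made per second, which is what the last sentence claims. Here is a Python sketch under the comment's assumptions: 10^18 hashes per second, uniformly distributed 128-bit output, and one mass extinction per 100 million years. The years are calendar years here rather than the rounded-up ones.

```python
import math

OPS_PER_SECOND = 10**18
HASH_SPACE = 2**128
# Assumption: one mass extinction per 100 million calendar years, as in the comment.
SECONDS_PER_100M_YEARS = 10**8 * 365.25 * 86_400

# Chance per second of hitting one specific target hash value.
p_hit_target_per_second = OPS_PER_SECOND / HASH_SPACE
# Chance per second of an extinction event at that average rate.
p_extinction_per_second = 1 / SECONDS_PER_100M_YEARS

# For contrast: hashes needed for a 50% chance that *any* two of them collide.
birthday_50 = math.sqrt(2 * HASH_SPACE * math.log(2))
seconds_to_birthday_50 = birthday_50 / OPS_PER_SECOND

print(f"{p_hit_target_per_second:.1e}")  # 2.9e-21
print(f"{p_extinction_per_second:.1e}")  # 3.2e-16
print(f"{seconds_to_birthday_50:.0f}")   # 22
```

The first two lines support the comparison for matching one particular hash. The last line shows its limit: at 10^18 hashes per second, a birthday collision among the computed 128-bit hashes themselves would be expected within about 22 seconds. The comparison therefore holds for targeted matches, not for collisions between arbitrary pairs of outputs.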

[1] http://en.wikipedia.org/wiki/Extinction_event




