In general, the search space even for email addresses is probably too large for me to crack in a few days, but in the context above, where the author's email was already available online (on her website, in spam databases, in leaked credential datasets, ...), there is hardly any difference between publishing the checksum and publishing the address itself. In any case, if you consider my email address "personally identifiable information", I consider its checksum such information as well.
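To illustrate why the checksum gives little protection once candidate addresses are already public, here is a minimal sketch. It assumes the checksum is an unsalted SHA-1 of the trimmed, lower-cased address (an assumption; the actual scheme may differ), and the candidate list and helper names are made up for the example.

```python
import hashlib

def sha1_hex(address: str) -> str:
    """SHA-1 hex digest of a trimmed, lower-cased address (assumed normalisation)."""
    return hashlib.sha1(address.strip().lower().encode("utf-8")).hexdigest()

def match_candidates(target_digest: str, candidates: list[str]) -> list[str]:
    """Return every candidate address whose digest equals target_digest."""
    return [c for c in candidates if sha1_hex(c) == target_digest]

# Hypothetical candidate list, e.g. addresses collected from public sources.
candidates = ["alice@example.com", "bob@example.org", "carol@example.net"]

# Hypothetical "anonymised" checksum that was published instead of the address.
published = sha1_hex("Bob@Example.org ")

print(match_candidates(published, candidates))  # ['bob@example.org']
```

No search of the whole address space is needed: one hash per known candidate is enough to re-identify the owner.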
> In any case, if you consider my email address "personally identifiable information", I consider its checksum such information as well.
I wonder what the odds are of a hash collision with another email address (including ones created by abusing + addressing) that genuinely belongs to another person (rather than merely existing), so that the resulting hash does not uniquely identify a single person.
The 'birthday attack'[0] article covers this pretty well. If we take the output size of a SHA-1 hash as 160 bits and assume its outputs are uniformly distributed[1], then the number of addresses you would expect to hash before the first collision, using a brute-force approach (equivalent to a non-malicious, accidental collision across all addresses ever), is:
sqrt(2**160 * PI/2) ~= 1.5 x 10**24
The point at which a collision has a 50% probability of having occurred is slightly lower, at about 1.18 * sqrt(2**160) ~= 1.4 x 10**24.
(if I understood/got the maths right)
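For anyone who wants to check the numbers, here is a rough sketch using the standard birthday-problem approximations. The function names are mine, and the 10 billion address count at the end is just a hypothetical round number, not a real statistic. Note that sqrt(PI/2 * H) is the expected number of hashes before the first collision, while the 50% point is sqrt(2 * ln(2) * H), which is slightly smaller.

```python
import math

def expected_hashes_until_collision(bits: int) -> float:
    """Expected number of uniform random hashes before the first collision."""
    return math.sqrt(math.pi / 2 * 2**bits)

def hashes_for_probability(bits: int, p: float = 0.5) -> float:
    """Approximate number of hashes needed for a collision with probability p."""
    return math.sqrt(2 * 2**bits * math.log(1 / (1 - p)))

def collision_probability(n: float, bits: int) -> float:
    """Approximate collision probability among n hashes: 1 - exp(-n(n-1)/(2H))."""
    return -math.expm1(-n * (n - 1) / (2 * 2**bits))

print(f"{expected_hashes_until_collision(160):.2e}")  # roughly 1.5e24
print(f"{hashes_for_probability(160):.2e}")           # roughly 1.4e24

# Hypothetical: 10 billion distinct addresses hashed with a 160-bit function.
print(f"{collision_probability(10**10, 160):.1e}")    # vanishingly small
```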
[0] https://en.wikipedia.org/wiki/Birthday_attack
[1] This is the design goal of any cryptographic hash function, and I don't think there are any fundamental attributes of email addresses that would cause systematic bias in the output.
Assume you have 1 billion (10^9) computers, and each computer can do 1 billion (10^9) hashing operations per second. That is 10^18 operations per second combined.
Rounding up generously, one day has 1 million (10^6) seconds (really 86,400), and one year has 1000 (10^3) days. So, we have 10^18 * 10^6 * 10^3 = 10^27 ~= 2^90 operations per year.
100 million years is 10^8 ~= 2^27 years. So, you have 2^90 * 2^27 = 2^117 operations in 100 million years. Geologically, there has been an Extinction Event [1] roughly every 100 million years (e.g. about 66, 200 and 251 million years ago). So, having an (unintentional) hash collision in more than 128 bits (assuming a good hash function with uniformly distributed output) is less likely than an event happening within the next second that kills 50% of the Earth's species.
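The back-of-the-envelope figures above can be reproduced with a short sketch. The constants are the ones assumed in this comment (10^9 machines at 10^9 hashes per second each); the function name is mine, and the "exact" year length of 365.25 days is included only to show how generous the rounding is.

```python
import math

COMPUTERS = 10**9            # assumed number of machines
HASHES_PER_SECOND = 10**9    # assumed hash rate per machine
ROUNDED_SECONDS_PER_YEAR = 10**6 * 10**3   # the rounded-up figure used above
ACTUAL_SECONDS_PER_YEAR = 365.25 * 24 * 3600

def log2_total_hashes(years: float, seconds_per_year: float) -> float:
    """Base-2 logarithm of the total number of hashes computed in `years`."""
    total = COMPUTERS * HASHES_PER_SECOND * seconds_per_year * years
    return math.log2(total)

# One year with the rounded-up figures: a little under 2^90.
print(round(log2_total_hashes(1, ROUNDED_SECONDS_PER_YEAR), 1))

# 100 million years with the rounded-up figures: between 2^116 and 2^117.
print(round(log2_total_hashes(10**8, ROUNDED_SECONDS_PER_YEAR), 1))

# 100 million years with the actual year length: noticeably fewer hashes.
print(round(log2_total_hashes(10**8, ACTUAL_SECONDS_PER_YEAR), 1))
```

Even with the rounding stacked in the attacker's favour, the total stays well below 2^128.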