Can you elaborate more? Discord has 656m users. If 10% upload their ID, they'd have 65m ID photos to search through. There are 2 use cases here:
1/ Safety bans (let's pretend 0.01% of ID card users have been banned for safety reasons: 650k accounts)
If a user submits their selfie/ID card, Discord needs to compare the new image against each of the 650k banned (but deleted?) images. I can't possibly think how a human could remember the 650k photos well enough to declare a match.
Even if such a human existed with this perfect recall, there can't be very many of them on this planet to hire.
2/ Duplicate account bans
If a user registers, how can support staff search the 65m photos without ML assistance to determine whether this is a new user or a fraudster?
0.01% of 65M is 6,500. Also apparently only 70K people uploaded their IDs.
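Spelling out the arithmetic with the figures quoted in this thread (the 10% upload rate and the 0.01% ban rate are the made-up assumptions from the parent comment, not real data):

```python
total_users = 656_000_000          # Discord user count quoted above
id_uploaders = total_users // 10   # assumed 10% upload rate
banned = id_uploaders // 10_000    # assumed 0.01% ban rate

print(id_uploaders)  # 65600000, i.e. roughly 65M
print(banned)        # 6560, i.e. roughly 6,500 rather than 650k
```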
That being said, if the information is only used for duplicate checking, you can still hash faces and metadata (such as ID numbers) instead of storing the whole ID as a scanned photo. Hashing does not increase racial bias: if your model has a bias, it will have that margin of error whether you store hashes or full images.
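For the metadata part, a rough sketch of what duplicate checking on hashed ID numbers could look like. This is only an illustration: `SECRET_KEY`, `id_fingerprint`, and `register` are hypothetical names, and the choice of HMAC-SHA-256 is my assumption, not something any company has confirmed using. It only covers exact identifiers such as ID numbers; faces are a separate problem.

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real deployment would load it from a
# secrets manager instead of hard-coding it.
SECRET_KEY = b"example-key-not-for-production"

def id_fingerprint(id_number: str) -> str:
    """Return a keyed SHA-256 digest of a normalized ID number."""
    # Normalize so that spacing and letter case do not create false "new" IDs.
    normalized = "".join(id_number.split()).upper()
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

# Only fingerprints are kept; the raw ID number is never stored.
seen_fingerprints = set()

def register(id_number: str) -> bool:
    """Return True if this ID is new, False if it was already registered."""
    fingerprint = id_fingerprint(id_number)
    if fingerprint in seen_fingerprints:
        return False
    seen_fingerprints.add(fingerprint)
    return True

first = register("ab 123 456")   # new ID
second = register("AB123456")    # same ID, different formatting
print(first, second)  # True False
```

A keyed hash is used rather than a plain one because ID numbers have a small, guessable format, so an unkeyed digest could be brute-forced by anyone who obtains the table.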
Neat, but how do users appeal a false positive? Do companies just trust the users, or should the company retain the original information so it can manually verify?
Fair point, but how does the appeal process work today? Even if the company stores someone else's ID in JPEG format, and the customer service claims that the photo on that ID looks very similar to my photo, is it sufficient proof? Should the company trust me, or should I trust the company? I don't think storing hashes makes it more complex.
Fraudsters may trick the AI by holding up a photocopy of the original, fooling it into thinking it's looking at the real thing.
Either the fraudster or the real person can request an appeal, and support staff could easily tell which one is tricking the AI and which one is not.
You can see this in all the videos of people trying to trick Apple's face lock. To a human, it is obvious they are wearing a mask. To the device, it's the same person.
Face hashing is different than generic image hashing. Methods like dividing the photo into smaller rectangles and storing the average colour for each rectangle won't work.
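To make that concrete, here is a toy version of the rectangle-averaging method. Everything in it is made up for illustration: the 8x8 "cards", the pixel values, and the helper names `block_average_hash`, `hamming`, and `make_card`. It is a sketch of the general idea, not any particular library's algorithm.

```python
def block_average_hash(pixels, grid=4):
    """Hash a square grayscale image: one bit per rectangle, set when that
    rectangle's mean brightness exceeds the mean over all rectangles."""
    step = len(pixels) // grid
    means = []
    for by in range(grid):
        for bx in range(grid):
            block = [pixels[y][x]
                     for y in range(by * step, (by + 1) * step)
                     for x in range(bx * step, (bx + 1) * step)]
            means.append(sum(block) / len(block))
    overall = sum(means) / len(means)
    return [1 if m > overall else 0 for m in means]

def hamming(a, b):
    """Count the positions where two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

def make_card(face_pixels):
    """Toy 8x8 'ID card': bright background, a dark header band shared by
    every card, and a 2x2 'face' patch that differs per person."""
    card = [[200] * 8 for _ in range(8)]
    for x in range(8):
        card[0][x] = card[1][x] = 40
    for dy, row in enumerate(face_pixels):
        for dx, value in enumerate(row):
            card[4 + dy][2 + dx] = value
    return card

alice = make_card([[90, 120], [110, 100]])
bob = make_card([[130, 80], [95, 125]])

distance = hamming(block_average_hash(alice), block_average_hash(bob))
print(distance)  # 0: two different faces, identical hash
```

The shared card layout dominates the averages, so the per-person detail inside the face patch never changes a bit.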
A face hash should be able to detect and encode facial features so that it can be compared with a future photo of the same person (potentially taken from a different angle). You need some type of machine learning algorithm for that.
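As a sketch of how that comparison usually looks once a model has turned each face into a feature vector: the vectors below are hand-written stand-ins (a real system would get them from a trained face-embedding model, which is not shown), and both `MATCH_THRESHOLD` and the use of cosine similarity are assumptions for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical cut-off; a real one would be tuned on labeled data.
MATCH_THRESHOLD = 0.9

# Made-up 4-dimensional "embeddings"; real ones have hundreds of dimensions.
enrolled = [0.12, 0.80, 0.35, 0.45]
same_person_new_angle = [0.10, 0.78, 0.40, 0.43]
different_person = [0.85, 0.10, 0.05, 0.50]

same_score = cosine_similarity(enrolled, same_person_new_angle)
other_score = cosine_similarity(enrolled, different_person)

print(same_score >= MATCH_THRESHOLD)   # True
print(other_score >= MATCH_THRESHOLD)  # False
```

Note that the result is a similarity score against a threshold, not an exact hash equality, so some margin of error is built in.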
Yes, I've worked on face recognition databases with 150m and 40m faces for banking and safety.
The models are not perfect. Humans should still be in the loop to verify, especially when the consequences of being wrong really suck for the user: losing access to their bank account, getting fired from their job.
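A minimal sketch of what "human in the loop" can mean in code, assuming the model outputs a match score between 0 and 1. The thresholds and the `route_match` function are hypothetical, not taken from any real system.

```python
# Hypothetical thresholds; real values would be tuned on labeled data.
REVIEW_THRESHOLD = 0.80
URGENT_THRESHOLD = 0.95

def route_match(score: float) -> str:
    """Decide what happens to a model match score in [0, 1].

    No branch bans or locks an account automatically: the model only
    decides whether, and how urgently, a human looks at the case.
    """
    if score >= URGENT_THRESHOLD:
        return "human_review_urgent"
    if score >= REVIEW_THRESHOLD:
        return "human_review"
    return "no_action"

decisions = [route_match(s) for s in (0.40, 0.85, 0.99)]
print(decisions)  # ['no_action', 'human_review', 'human_review_urgent']
```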
If you're referring to algorithms like pHash (which match copies of the same core image with just a filter added), they won't work well, because everyone's ID card looks mostly the same. There will be too many false positives.
To be honest, I don't understand what exactly is the problem that needs to be solved. Two people using the same image? Two people using the same ID? The same person registering two accounts using two different IDs but they're both a photo of the same person?