I can't remember which company it was that launched a camera with face-identification features, but it didn't recognize any face that wasn't lily white, like every single engineer who worked at that company. They could probably have benefited from a diversity and inclusion review. Heck, employing a single brown engineer, or even a brown QA engineer, would probably have been enough to notice that before launch.
> I can't remember which company it was that launched a camera with face-identification features, but it didn't recognize any face that wasn't lily white, like every single engineer who worked at that company
It doesn't show that. It's literally numerical: dark skin reflects less light than light skin, so the sensors report lower values for the entire face, reducing contrast across the entire face, which is what the recognition systems count on.
Brown eyebrows on brown skin = low contrast.
Brown eyebrows on pale skin = high contrast.
If our races were dark-purple hair on bright-green skin and bright-green hair on dark-purple skin, facial recognition systems would have no trouble with either. But that's not how humans render, so our contrast-based systems struggle with low contrast.
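To make the contrast point concrete, here's a toy sketch (made-up reflectance values, not measurements) showing how a standard contrast metric collapses when feature and skin luminance are close:

```python
# Toy illustration: contrast-based detectors key on luminance
# differences between facial features and the surrounding skin.
# All numbers below are illustrative, not sensor data.

def michelson_contrast(feature_luma: float, skin_luma: float) -> float:
    """Michelson contrast between two luminance values on a 0..1 scale."""
    return abs(feature_luma - skin_luma) / (feature_luma + skin_luma)

# Hypothetical readings: eyebrow vs. skin luminance
pale_skin, dark_skin, brown_brow = 0.80, 0.15, 0.10

print(michelson_contrast(brown_brow, pale_skin))  # high contrast (~0.78)
print(michelson_contrast(brown_brow, dark_skin))  # low contrast (0.2)
```

The same eyebrow yields a signal roughly four times weaker against dark skin than against pale skin, which is the whole argument in one number.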
It's like you're confusing a software/data problem with a photon/physics problem because you're thinking inside your own box.
It's a design problem. If they had tested it with people of color, they would have noted, "well, our primitive algorithm works well for light-skinned individuals but not for others."
And hopefully someone wouldn't have said "hmm good enough for me, let's ship it!"
My webcam has an advanced option panel that lets me edit both the brightness and the exposure time. I can turn it up so bright that you can't even make out any of my facial features, and I'm in a somewhat dark room lit by a single floor lamp.
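As a toy illustration of that overexposure effect (simulated 8-bit pixel values, not an actual webcam API): crank the gain high enough and every pixel clips to white, at which point there is no contrast left for any detector to work with.

```python
def apply_exposure(pixels, gain):
    """Simulate raising brightness/exposure: scale each 8-bit
    pixel value by `gain`, then clip at the sensor maximum (255)."""
    return [min(255, int(p * gain)) for p in pixels]

face_row = [40, 90, 140, 200]  # varied luminance = visible features

print(apply_exposure(face_row, 1.0))  # [40, 90, 140, 200] — features intact
print(apply_exposure(face_row, 8.0))  # [255, 255, 255, 255] — all clipped
```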
> it shows that they didn't test with anyone with a darker skin tone
Are you disagreeing and saying they did test on people with darker skin tone, found the issue, and decided to ship anyway? You realize that either way, it doesn't make them look good?
Anyway, leaving all that aside, the article, which interviews an actual face-recognition software expert, shows that your guesses here are incorrect.
Contrast may be one root of the technical problem, but claiming a product "ready to launch" while it fails to work for people based on their race (especially when the company clearly didn't put effort into preventing the issue ahead of time) is problematic.
By having a diverse team (or making some effort to include diverse opinions) you'd have a chance to discover new ways to detect faces, or new mitigations to the contrast problem.
But claiming a product is ready for release when it excludes people based on race (no matter the technical reason) is a problem.
It’s only a “contrast issue” if the people building the system failed to have roughly half of humanity represented in any meaningful way on their dev team.
There is really no reason the dev team should include any particular demographic: how are you supposed to have 90-year-old people on the team to make sure they are recognized correctly? This is a requirements issue, which directly impacts validation/test data collection. If the user base is 50% Black people, any reasonable protocol will include enough Black faces in the test data to detect the problem early on. ML-based systems will always make errors; which errors matter will be defined by market/legal/mission requirements. It may well be that faces of Black people are harder to detect (especially in backlit situations). Should you hold the product because it may not work for everybody? It’s a complex decision. Maybe you can just have a good “face detection failed” flow to handle all the errors (think not only of Black people but also tattooed people, etc.).
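The "match the test data to the user base" protocol above could be sketched like this (the category names and proportions are purely illustrative assumptions):

```python
import random

def stratified_sample(pool, target_mix, n, seed=0):
    """Build a validation set whose demographic mix matches the
    expected user base, so category-specific failures surface
    before launch.

    pool: {category: [samples]}
    target_mix: {category: fraction}, fractions summing to 1
    n: desired total size of the validation set
    """
    rng = random.Random(seed)
    out = []
    for cat, frac in target_mix.items():
        k = round(n * frac)
        out += rng.sample(pool[cat], k)  # sample without replacement
    return out

# Hypothetical pools: IDs < 1000 are one category, >= 1000 the other
pool = {"dark_skin": list(range(1000)), "light_skin": list(range(1000, 2000))}
test_set = stratified_sample(pool, {"dark_skin": 0.5, "light_skin": 0.5}, 200)
print(len(test_set))  # 200, with 100 from each category
```

The point is that the mix is driven by requirements, not by whoever happens to sit on the dev team.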
Arguing that quotas of this or that group on the dev team will make it more sensitive to diversity issues in general is also unnecessary: everybody is part of some minority in some situation, so a minimum of education will make anybody understand first-hand the value of inclusiveness and diversity.
Btw, if the team is using only their own faces to test the system, they won’t get far (think about lighting conditions / different environments).
In theory I agree with you: we want unbiased models. But here we have an input distribution that is not well understood, so things get much more complex. We don’t even have a clear definition of what is and isn’t a face.
The model doesn’t work for people wearing masks: a near-100% failure rate on this category of inputs. Should we release it or not?
In general, some inputs are harder than others, so more errors on those are expected.
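The release question above can be made mechanical with a per-category gate. A minimal sketch (the category names and the 5% threshold are illustrative assumptions): overall accuracy can look fine while one category of inputs fails almost completely.

```python
def failure_rates(results):
    """results: list of (category, passed) pairs -> {category: failure rate}."""
    totals, fails = {}, {}
    for cat, ok in results:
        totals[cat] = totals.get(cat, 0) + 1
        if not ok:
            fails[cat] = fails.get(cat, 0) + 1
    return {c: fails.get(c, 0) / totals[c] for c in totals}

def release_gate(results, max_failure=0.05):
    """Pass only if *every* input category stays under the failure threshold."""
    rates = failure_rates(results)
    return all(r <= max_failure for r in rates.values()), rates

# Hypothetical eval run: 95% pass rate overall, but masked faces always fail
results = [("no_mask", True)] * 95 + [("no_mask", False)] * 5 \
        + [("mask", False)] * 10
ok, rates = release_gate(results)
print(ok, rates)  # False — "mask" fails at rate 1.0 despite good aggregate numbers
```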
That being said, in practice, under normal conditions, it is not hard to detect people with dark skin if the proper training data and training procedure are used (btw, if you don’t pay attention to how you do things, even a low-light image of a Caucasian face will not be recognized), so there is little excuse for excluding a large part of the population through sloppiness. Moreover, for this specific category (and of course others), there are ethical and legal considerations in making sure the system works for them.
Apart from that, in general I really do think that ML systems with no “operator override” are, in many contexts, a hazard. We cannot expect the model creators to have predicted and tested every possible input, and without an override we have no way to manually correct the error (for instance in lending or border control). Incidentally, it is interesting to note that this will be skilled work that will not be taken over by “AI”.
I believe we're mostly in agreement. What's not acceptable to me is using "all models are wrong" to imply that it's OK not to understand the ways in which they are wrong, to be willfully ignorant of their failures, or to devalue transparency.
As a professional and practitioner, I have a responsibility to be transparent and honest when I deliver a model. Part of that is understanding and designing for failure modes. That's simply good engineering.
Indeed, I agree. It even seems that for some use cases training data is no longer the bottleneck; robust test suites are. Interesting times; let’s hope we find a responsible way to use these powerful technologies.
They need to test on a realistic sample of users. Testing on the dev team is just lazy; they probably have unusually new and expensive hardware in well-lit offices.
And yet a short while later, they released a patch that fixed the bug. So your physics claim is irrelevant.
The fact remains that they would not have released that software knowing it wouldn't work for Black people. And yet, they didn't notice the bug because they were making no effort to be inclusive.
Microsoft Kinect also had this issue. I knew the one Black engineer (who had little to do with the project) who repeatedly got pulled in to check whether the test system worked with non-white people.