I'm imagining the interaction as something like a Bayesian spam filter: a training period, followed by occasional corrections.
The UX is key, as you indicated: imagine a nice, tight grid of matches for one person (each cropped to just the face area) where you can click each picture to toggle yes/no, perhaps with some more advanced interaction for algorithm-specific training (indicating the face angle or eye position, or a sunglasses yes/no toggle).
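To make the "training, then occasional corrections" loop concrete, here is a minimal sketch. It is not a Bayesian filter; it swaps in a simple nearest-neighbour matcher for one person, because that makes the effect of each yes/no click easy to see. It assumes (hypothetically) that each face crop has already been turned into a fixed-length feature vector by some face encoder not shown here; `PersonModel`, `label`, `predict` and `threshold` are illustrative names, not any real library's API.

```python
import math
from dataclasses import dataclass, field

# Hypothetical: a face crop already encoded as a fixed-length feature vector.
Vector = list[float]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class PersonModel:
    """Nearest-neighbour matcher for one person, trained by yes/no clicks."""
    threshold: float = 1.0  # hypothetical tuning parameter
    positives: list[Vector] = field(default_factory=list)
    negatives: list[Vector] = field(default_factory=list)

    def label(self, face: Vector, is_match: bool) -> None:
        """Record one yes/no toggle from the grid of candidate faces."""
        (self.positives if is_match else self.negatives).append(face)

    def predict(self, face: Vector) -> bool:
        """Match only if the closest confirmed face is within the threshold
        and is closer than the closest rejected face."""
        if not self.positives:
            return False
        best_pos = min(distance(face, p) for p in self.positives)
        best_neg = min((distance(face, n) for n in self.negatives), default=math.inf)
        return best_pos < self.threshold and best_pos < best_neg


if __name__ == "__main__":
    model = PersonModel(threshold=1.0)
    # Training period: the user clicks "yes" on one face and "no" on another.
    model.label([0.0, 0.0], True)
    model.label([3.0, 3.0], False)
    print(model.predict([0.2, 0.1]))  # True
    print(model.predict([2.9, 3.1]))  # False
    print(model.predict([0.9, 0.0]))  # True (a false positive, say)
    # Occasional correction: the user toggles that face to "no".
    model.label([0.9, 0.0], False)
    print(model.predict([0.8, 0.0]))  # False, now nearer the rejected face
```

Each click only adds one labelled point, which is also a small illustration of the next point: a single "A is not B" correction changes the decision only in a tiny neighbourhood of feature space.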
It's actually pretty straightforward to detect a face (that is, to find where a face is in an image); most mid-range digital cameras can do it nowadays. Recognizing whose face it is turns out to be much harder: even if you tell the program that face A is not face B, all you've done is give it a very small hint in a VERY big problem space. The human brain is hard-wired to detect faces (newborn babies prefer to look at pictures of human faces rather than colorful shapes), which I think makes it feel like an easier problem than it actually is.
That being said, don't let me stop you from trying. It would definitely be a cool (and very saleable) bit of tech.