Have a look[1]. Observe how they all look rather like each other regardless of the country: shoulder pads, head cover, belt, uniform colour, formal attire, etc. Once the model knows the baseline, that is, "a uniformed individual waving characteristically in front of the car", don't you think it would be quite easy to teach it, with a mere few hundred pictures, what local police officers look like in the current region?
They don't look remotely similar to me, especially considering that black, blue, navy, grey and khaki (and reflective yellow for people working on roads) are hardly unpopular colours for clothes worn by people who are not police officers or otherwise involved in traffic direction, many of whom may have cause to move their arms in ways comparable to signals. Humans are just a lot better at gauging intention, and even at simple stuff like parsing the word "police" at an oblique angle.
Frankly, even with local training, you'd still think giving law enforcement devices to stop, restart and redirect vehicles would be a minimum requirement.
All of these agents follow a similar pattern of clothing (as I said, a combination of similar garments and colours) and behaviour (placement on the road, gesturing with authority, directly facing the car, etc.). Machine learning algorithms are especially good at recognizing patterns and storing their abstracted form, so it should come as no surprise that understanding what a police officer looks like in the abstract is not the main issue of self-driving.
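To be concrete about what I mean by "storing their abstracted form": here's a minimal sketch of the idea using a nearest-centroid classifier. The feature vectors are entirely made up (imagine scores for hi-vis garments, on-road placement, raised arms); real perception stacks learn far richer features, but the principle of abstracting many examples into a compact prototype is the same.

```python
import numpy as np

# Hypothetical training examples: rows are invented scores for
# [hi-vis garment, standing on roadway, arm-raised gesture]
officers = np.array([[0.90, 0.80, 0.90],
                     [0.80, 0.90, 0.70],
                     [0.95, 0.70, 0.80]])
pedestrians = np.array([[0.10, 0.20, 0.10],
                        [0.30, 0.10, 0.00],
                        [0.20, 0.15, 0.30]])

# The "abstracted form" of each class: its centroid (mean feature vector)
centroids = {
    "officer": officers.mean(axis=0),
    "pedestrian": pedestrians.mean(axis=0),
}

def classify(x):
    # Assign the label of the nearest stored prototype
    return min(centroids, key=lambda k: np.linalg.norm(x - centroids[k]))

print(classify(np.array([0.85, 0.75, 0.80])))  # near the officer prototype
```

Again, just an illustration of the pattern-abstraction point, not a claim about how any SDV actually works.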
> Humans are just a lot better at gauging intention and even simple stuff like parsing the word "police" at an oblique angle
Google seems to understand these intentions well enough, and at a much higher level than mere word parsing. This video is from 2015: https://youtu.be/tiwVMrTLUWg?t=9m5s
You can see the car understanding everything that happens at a complex intersection at 9'05, recognizing what a police car looks like at 9'35, then detecting and reacting to a school bus, and parsing a police officer's gestures right at the 10' mark. I'd say chances are these situations are pretty much solved three years later. You can even see some creatures from their "zoo" of patterns for cars and people at 10'35.
> "When a Waymo car hears sirens, it will automatically pull over, yield, and stop. For example, when a number of vehicles are moving towards the scene of an accident on a highway and ambulances and other emergency vehicles are headed toward it, driverless cars will move aside and give way. Using audio sensors, the cars can detect exactly which direction the sirens are coming from and move out of the way."
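Waymo's actual audio pipeline isn't public, but the basic physics behind locating a siren with multiple microphones is straightforward: sound from one side reaches the nearer microphone first, and the time difference gives you the bearing. A rough sketch of that time-difference-of-arrival idea (microphone spacing and signal are assumptions for illustration):

```python
import numpy as np

FS = 48_000          # sample rate (Hz)
C = 343.0            # speed of sound (m/s)
MIC_SPACING = 0.5    # metres between the two microphones (assumed)

def simulate(angle_deg, n=4096):
    """Synthesize a siren-like sweep arriving from angle_deg (0 = dead ahead)."""
    t = np.arange(n) / FS
    sig = np.sin(2 * np.pi * (600 + 400 * np.sin(2 * np.pi * 2 * t)) * t)
    delay_s = MIC_SPACING * np.sin(np.radians(angle_deg)) / C
    delay_samples = int(round(delay_s * FS))
    # The farther microphone hears the same signal slightly later
    return sig, np.roll(sig, delay_samples)

def estimate_angle(left, right):
    """Estimate arrival angle from the cross-correlation peak."""
    corr = np.correlate(right, left, mode="full")
    lag = np.argmax(corr) - (len(left) - 1)  # samples by which right lags left
    delay_s = lag / FS
    # Clamp to the physically valid range before inverting the geometry
    s = np.clip(delay_s * C / MIC_SPACING, -1.0, 1.0)
    return np.degrees(np.arcsin(s))

left, right = simulate(30.0)
print(round(estimate_angle(left, right)))  # recovers roughly 30 degrees
```

Real systems have to cope with echoes, engine noise and more than two sensors, but it shows why "detect exactly which direction the sirens are coming from" is a tractable signal-processing problem rather than magic.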
> All of these agents follow a similar pattern of clothing (as I said, combination of similar garnments and colours) and behaviour (placement on the road, gesturing with authority, directly facing the car, etc.)
Repeating an assertion does not make it cease to be false. The very small handful of pictures you linked to shows a wide variety of coats, vests and shirts in many different colours, all of which heavily overlap with general garment types and colours used in everyday clothing, and which tend to indicate police only through small and greatly varying trim detailing (and sometimes hats). And whether the clothing and trim is designed to convey authority isn't the sort of abstract pattern recognition computers do better than humans, or even do at all well. Sure, you could certainly create specific police uniform training sets for every jurisdiction, and possibly even cut down false positives in other jurisdictions by geofencing them (so you don't get cars stopping in California for commuters wearing the distinctive, er... blue shirts and black trousers of the Hong Kong police), but it's a non-trivial undertaking even if there are bigger problems for SDVs to tackle.
More importantly, unlike humans, who evolved to have an intimate understanding of human mannerisms, machine learning has no concept of "gesturing with authority" beyond whether moving human shapes fit very specific patterns within its calibration parameters, and police officers often don't have scope to place themselves in a particular position in order to get the car to understand them.
> Google seems to understand these intentions well enough, and at a much higher level than mere word parsing.
The video shows examples of predicting possible directions of travel of moving road users based on maps and movements (i.e. its fundamental driving model), and a shot of it recognising two arm gestures in an idealised, front-on position. Neither falls under the scope of being able to understand how the traffic policeman intends to clear the blocked intersection from his shouts and gesticulations at you and various other vehicles. Humans also don't need to be signalled to go again if the black-jacketed man they've stopped for was actually trying to hail the taxi behind them.
1: https://www.quora.com/What-are-the-different-police-uniforms...