It's like having 2 pilots instead of 1 pilot. If one pilot is unexpectedly defective (has a heart attack mid-flight), you still have the other pilot. Some errors between the 2 pilots aren't uncorrelated of course, but many of them are. So the chance of an at-fault crash goes from p and approaches p^2 in the best case. That's an unintuitively large improvement. Many laypeople's gut instinct would be more like p -> p/2 improvement from having 2 pilots (or 2 data streams in the case of camera+LIDAR).
In the camera+LIDAR case, you conceptually require AND(x.ok for all x) before you accelerate. If only one of those systems says there's a white truck in front of you, then you hit the brakes, instead of requiring both of them to flag it. False negatives are what you're trying to avoid because the confusion matrix shouldn't be equally weighted given the additional downside of a catastrophic crash. That's where two somewhat independent data streams becomes so powerful at reducing crashes, you really benefit from those ~uncorrelated errors.
"In the camera+LIDAR case, you conceptually require AND(x.ok for all x) before you accelerate."
This can be learnt by the model. Let's assume vision is 100% correct, the model would learn to ignore LIDAR, so the worst case scenario is that LIDAR is extra cost for zero benefit.
This is not going to be true for a very long time, at least so long as one's definition of "vision" is something like "low-cost passive planar high-resolution imaging sensors sensitive to the visual and IR spectrum" (I include "low-cost" on the basis that while SWIR, MWIR, and LWIR sensors do provide useful capabilities for self-driving applications, they are often equally expensive, if not much more so, than LIDARs). Camera sensors have gotten quite good, but they are still fundamentally much less capable than the human eyes plus visual cortex in terms of useful dynamic range, motion sensitivity, and depth cues - and human eyes regularly encounter driving conditions which interfere or prohibit safe driving (e.g. mist/ fog, heavy rain/snow, blowing sand/dust, low-angle sunlight at sunrise/sunset/winter). One of the best features of LIDAR is that it is either immune or much less sensitive to these phenomena at the ranges we care about for driving.
Of course, LIDAR is not without its own failings, and the ideal system really is one that combines cameras, LIDARs, and RADARs. The problem there is that building automotive RADAR with sufficient spatial resolution to reliably discriminate between stationary obstacles (e.g. a car stalled ahead) and nearby clutter (e.g. a bridge above the road) is something of an unsolved problem.
The worst case scenario is that LIDAR is a rapidly falling extra cost for zero benefit? Sounds like it's a good idea to invest into cheap LIDAR just in case the worst case doesn't happen. Even better, you can get a head start by investing in the solution early and abandon it when it has obsolete.
By the way, Tesla engineers secretly trained their vision systems using LIDAR data because that's how you get training data. When Elon Musk found out, he fired them.
Finally, your premise is nonsensical. Using end to end learning for self driving sounds batshit crazy to me. Traffic rules are very rigid and differ depending on the location. Tesla's self driving solution gets you ticketed for traffic violations in China. Machine learning is generally used to "parse" the sensor output into a machine representation and then classical algorithms do most of the work.
The rationale for being against LIDAR seems to be "Elon Musk said LIDAR is bad" and is not based on any deficiency in LIDAR technology.
If you're on a desert island and you have 2 watches instead of 1, the probability of failure (defined as "don't know the time") within T years goes from p to p^2 + epsilon (where epsilon encapsulates things like correlated manufacturing defects).
So in a way, yes.
The main difference is that "don't know the time" is a trivial consequence, but "crash into a white truck at 70mph" is non-trivial.
It's different because the challenge with self-driving is not to know the exact time. You win for simply noticing the discrepancy and stopping.
Imagine if the watch simply tells you if it is safe to jump into the pool (depending on the time it may or may not have water). If watches conflict, you still win by not jumping.
It's like having 2 pilots instead of 1 pilot. If one pilot is unexpectedly defective (has a heart attack mid-flight), you still have the other pilot. Some errors between the 2 pilots aren't uncorrelated of course, but many of them are. So the chance of an at-fault crash goes from p and approaches p^2 in the best case. That's an unintuitively large improvement. Many laypeople's gut instinct would be more like p -> p/2 improvement from having 2 pilots (or 2 data streams in the case of camera+LIDAR).
In the camera+LIDAR case, you conceptually require AND(x.ok for all x) before you accelerate. If only one of those systems says there's a white truck in front of you, then you hit the brakes, instead of requiring both of them to flag it. False negatives are what you're trying to avoid because the confusion matrix shouldn't be equally weighted given the additional downside of a catastrophic crash. That's where two somewhat independent data streams becomes so powerful at reducing crashes, you really benefit from those ~uncorrelated errors.