Given several models, assuming only that some unknown subset is "safe", can we construct a single model as safe as that subset? This reduces obtaining a trustworthy model to a plausibly easier task.
ChatGPT could encode usernames, timestamps, and other session context into its responses in a way that would be retrievable only by OpenAI and provably invisible to everyone else.
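A minimal sketch of the idea, assuming a keyed scheme of my own invention (all names and the synonym-slot mechanism are hypothetical, not anything OpenAI has described): a secret key plus the session context produce pseudorandom bits, and those bits steer otherwise interchangeable word choices. To anyone without the key the choices look arbitrary; the key holder can recompute them and verify which session produced the text.

```python
import hmac, hashlib

# Hypothetical provider-side secret; without it the embedded bits are
# indistinguishable from arbitrary stylistic choices.
SECRET_KEY = b"held-only-by-the-provider"

# Pairs of near-synonyms; picking one or the other encodes one bit per slot.
SYNONYM_SLOTS = [("big", "large"), ("quick", "fast"), ("begin", "start")]

def context_bits(session_context: str, n: int) -> list:
    # Derive n pseudorandom bits from the key and the session context.
    digest = hmac.new(SECRET_KEY, session_context.encode(), hashlib.sha256).digest()
    return [(digest[i // 8] >> (i % 8)) & 1 for i in range(n)]

def embed(session_context: str) -> list:
    # Choose one word per slot according to the derived bits.
    bits = context_bits(session_context, len(SYNONYM_SLOTS))
    return [pair[b] for pair, b in zip(SYNONYM_SLOTS, bits)]

def verify(session_context: str, words: list) -> bool:
    # Only someone holding SECRET_KEY can recompute the expected choices.
    return words == embed(session_context)

marked = embed("user=alice;ts=1700000000")
assert verify("user=alice;ts=1700000000", marked)
```

A real deployment would need many more slots (three bits is far too little to identify a session) and would have to survive paraphrasing, but the keyed-pseudorandom-choice structure is the core of why only the provider can read the mark.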
There are several differences:
1. Empirically, networks have many adversarial examples, but that doesn't mean there is an adversarial example around *every* point. They show that every point can be slightly perturbed to produce whatever output the key holder wants.
2. Some training algorithms, existing or future, are designed to be robust to such perturbations. They show that the backdoor persists even under a robust training algorithm.
3. As you said, they show that finding the backdoored perturbation is also efficient for the key holder.
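The points above can be sketched in toy form (my own illustrative names; a MAC stands in for the digital signature used in the actual construction, and the "model" is just a parity function): the backdoored model agrees with the honest one on ordinary inputs, but flips its output on inputs carrying a valid tag, and only the key holder can produce such a tag for a given input.

```python
import hmac, hashlib

BACKDOOR_KEY = b"known-only-to-the-planter"  # hypothetical planter's secret
TAG_LEN = 8  # bytes of the input reserved for the hidden trigger tag

def honest_model(x: bytes) -> int:
    # Stand-in for the genuine classifier: parity of the payload bytes.
    return sum(x[:-TAG_LEN]) % 2

def tag(payload: bytes) -> bytes:
    # Keyed tag; without BACKDOOR_KEY, forging one is computationally hard.
    return hmac.new(BACKDOOR_KEY, payload, hashlib.sha256).digest()[:TAG_LEN]

def backdoored_model(x: bytes) -> int:
    payload, maybe_tag = x[:-TAG_LEN], x[-TAG_LEN:]
    if hmac.compare_digest(maybe_tag, tag(payload)):
        return 1 - honest_model(x)  # valid trigger: flip the output
    return honest_model(x)          # otherwise behave identically

def perturb(x: bytes) -> bytes:
    # Key holder's "adversarial example": overwrite the tail with a valid tag.
    payload = x[:-TAG_LEN]
    return payload + tag(payload)

x = b"some ordinary input.........."
assert backdoored_model(x) == honest_model(x)                     # agree off-trigger
assert backdoored_model(perturb(x)) != honest_model(perturb(x))   # key holder flips it
```

This mirrors all three points: every input has a nearby flipping perturbation (point 1), the trigger is a property of the planted function rather than a fragility that robust training removes (point 2), and producing the perturbation is cheap for the key holder while finding it without the key amounts to forging the tag (point 3).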