Here is my high-level take: most AI researchers I trust recognize that AI alignment is at least fiendishly hard and probably impossible. This breaks down into at least two parts. First, codifying the values of a group of people is hard, and impossible to do neutrally, since many reasonable sets of desiderata run afoul of various impossibility theorems, not to mention the practical organizing difficulties (a small illustration follows below). Second, ensuring that an AI trained by supervised learning over a set of examples generalizes and behaves correctly is likely impossible, due to the well-known problems of out-of-distribution behavior.
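To make the first point concrete, here is a minimal sketch of the kind of impossibility that bites: three voters with perfectly transitive individual preferences yield a cyclic group preference under simple majority rule, the classic Condorcet cycle that underlies results like Arrow's theorem. The voter profile is a standard textbook example, not drawn from any particular alignment proposal.

```python
# Three voters, each with an internally consistent (transitive) ranking
# over options A, B, C. Aggregating them by majority rule produces an
# intransitive "group preference" -- the Condorcet cycle.
from itertools import combinations

voters = [
    ["A", "B", "C"],  # voter 1: A > B > C
    ["B", "C", "A"],  # voter 2: B > C > A
    ["C", "A", "B"],  # voter 3: C > A > B
]

def majority_prefers(x, y):
    """True if a strict majority of voters rank x above y."""
    wins = sum(1 for ranking in voters if ranking.index(x) < ranking.index(y))
    return wins > len(voters) / 2

for x, y in combinations("ABC", 2):
    winner, loser = (x, y) if majority_prefers(x, y) else (y, x)
    print(f"majority prefers {winner} over {loser}")

# Result: A beats B, B beats C, yet C beats A. No single ranking of
# "the group's values" faithfully reflects all three voters at once.
```

Nothing about this toy example depends on AI; it just shows that even the aggregation step, before any training happens, already has no neutral solution once you demand a few reasonable properties of it.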
Of course, we can't let the perfect be the enemy of the better. We must strive to align our systems better over time. Some ways include: (a) hybrid systems that use provably-correct subsystems; (b) better visibility, vetting, and accountability around training data; (c) smart regulation that requires meaningful disclosure (such as system cards); (d) external testing, including red-teaming; (e) reasoning out loud in English (not in neuralese!); and more.