I'm not talking about the fine-tuning that makes them side with the user even when the user is wrong (that's less common now than it used to be, and in any case it's a different effect). I'm referring to what happens when, in the chat template, you prefill the assistant's reply with wrong words or a wrong direction: the LLM often finds a way to say what it actually meant, with something like "wait, actually I was wrong" or another phrase that lets it avoid following the line it was started on.
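For concreteness, here is a minimal sketch of the kind of prefill I mean, assuming Llama-3-style special tokens (the exact template varies by model). The assistant turn is opened and seeded with a deliberately wrong start, with no end-of-turn token, so the model has to continue from that text:

```python
# Minimal sketch of prefilling an assistant turn with a wrong start.
# Assumes Llama-3-style special tokens; other models use different templates.

WRONG_PREFIX = "The capital of France is Berlin"  # deliberately wrong opening

prompt = (
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "What is the capital of France?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
    + WRONG_PREFIX  # no <|eot_id|> here, so the model must continue this sentence
)

print(prompt)
# Sent to a raw completion endpoint, the continuation often backtracks,
# e.g. "... wait, actually that's wrong: the capital of France is Paris."
```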
They don't? That's not the case at all, unless I am misunderstanding.