I think the discussion here is confusing the algorithm with the output. It's true that diffusion can rewrite tokens during generation, but it does so for consistency with the evolving output -- not for "accuracy". I'm not aware of any research showing that the final product, once iteration stops, is less likely to contain hallucinations than autoregressive output.
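To make the algorithmic distinction concrete, here's a toy sketch (no real model, just a stand-in predictor I made up): autoregression commits to each token as it's emitted, while diffusion-style decoding starts from masks and keeps revisiting low-confidence positions, so earlier tokens can change -- for consistency, not necessarily correctness.

```python
import random

VOCAB = ["the", "cat", "sat", "on", "mat"]
MASK = "<mask>"

def toy_predict(tokens, i):
    # Stand-in for a model's prediction at position i; returns a
    # (token, confidence) pair. Purely illustrative, not a real model.
    rng = random.Random((tuple(tokens), i))
    return rng.choice(VOCAB), rng.random()

def autoregressive(length):
    # Left-to-right decoding: each token is fixed the moment it's emitted
    # and is never revisited.
    out = []
    for i in range(length):
        tok, _ = toy_predict(out, i)
        out.append(tok)
    return out

def diffusion_style(length, steps=4, threshold=0.9):
    # Start fully masked; each refinement step re-predicts every position
    # whose confidence is low, so earlier choices can be rewritten to stay
    # consistent with the rest of the evolving sequence.
    out = [MASK] * length
    conf = [0.0] * length
    for _ in range(steps):
        for i in range(length):
            if conf[i] < threshold:
                out[i], conf[i] = toy_predict(out, i)
    return out
```

The point of the sketch is only the control flow: in `diffusion_style` a token written at step 1 can be overwritten at step 3, whereas `autoregressive` has no such path.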
With that said, I'm still excited about diffusion -- if it offers different cost points and different modes of interacting with generated text, it will be useful.