Better models are coming out that are already pretrained on a significant amount of data, so the model has already learned a lot about what is common to all examples of video generation (keeping edges aligned coherently at every frame, keeping texture and lighting consistent, etc.) and does not need to re-learn that for every target.
Initially, deepfake models were trained from scratch for every single target, so you had to provide a lot of data from the person you wanted to target in order for the model to learn both what is common and what is specific.
Now you can get decent performance with much less data, since the model only needs to learn the target-specific details.
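To make that concrete, here is a minimal sketch of the pretrain-then-fine-tune idea in PyTorch. Everything in it is a toy placeholder of my own (FaceGenerator, the layer sizes, the reconstruction loss), not a real deepfake pipeline: the point is only that the generic, pretrained part gets frozen and only a small target-specific part is trained on the few clips you have.

```python
# Toy illustration of "pretrain once, fine-tune per target".
# All names here (FaceGenerator, target_frames, the file path) are hypothetical.
import torch
import torch.nn as nn

class FaceGenerator(nn.Module):
    """Stand-in for a face/video generation model."""
    def __init__(self):
        super().__init__()
        # Pretrained part: what is common to all faces/videos
        # (coherent edges, texture, lighting across frames).
        self.backbone = nn.Sequential(
            nn.Linear(256, 512), nn.ReLU(), nn.Linear(512, 512)
        )
        # Target-specific part: what is unique to one person.
        self.identity_head = nn.Linear(512, 256)

    def forward(self, x):
        return self.identity_head(self.backbone(x))

model = FaceGenerator()
# In practice you would load weights from large-scale pretraining here, e.g.:
# model.backbone.load_state_dict(torch.load("pretrained_backbone.pt"))

# Freeze the generic knowledge so it is not re-learned for every target.
for p in model.backbone.parameters():
    p.requires_grad = False

# Only the small identity-specific head is trained on the few target clips.
optimizer = torch.optim.Adam(model.identity_head.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

# Stand-in for a handful of frames of the target person (the "much less data").
target_frames = torch.randn(32, 256)

for _ in range(100):
    optimizer.zero_grad()
    out = model(target_frames)
    loss = loss_fn(out, target_frames)  # toy reconstruction objective
    loss.backward()
    optimizer.step()
```

Because the frozen backbone already encodes the generic behaviour, the fine-tuned head can only reproduce what the few target frames actually contain, which is exactly the limitation described next.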
However, this only helps if you need a limited deepfake: the model cannot infer the exact facial expression of the target when they are, for example, laughing, unless you provided an example of that in the training data (assuming there is no way to infer someone's laughing expression from the other provided expressions). It will instead generate a generic laugh. All missing information is substituted with what was seen, on average, during the pre-training phase.
That wouldn't work for a long, complex deepfake meant to be sent to someone reasonably close to the target.
But for the kinds of deepfakes that target a personality we all know, just not very well, much less data is needed than before for a similar result.
At least in my experience, audio is much harder to fake convincingly than video. If you have heard the real person speaking, they have very specific and distinguishable patterns of speech.
You can fake it reasonably, but you need to have a very large collection of audio clips to do so, and if you do a bad job it literally jumps out at the viewer.
Video might be off, but it requires close attention and large screens to notice - much easier to miss if you're viewing on a phone.