The LibriTTS demo clones unseen speakers from a clip of about five seconds.

eigenvalue · on Nov 19, 2023

Ah ok, thanks. I tried the other demo.

eigenvalue · on Nov 19, 2023

I tried it. Sounds absolutely nothing like my voice or my wife's voice. I used the same sample files as I used 2 days ago on the Eleven Labs website, and they worked flawlessly there. So this is very, very far from being close to "Eleven Labs quality" when it comes to voice cloning.

thot_experiment · on Nov 19, 2023

Ah, that's disappointing. Have you tried https://git.ecker.tech/mrq/ai-voice-cloning ? I've had decent results with that, but inference is quite slow.

jsjmch · on Nov 19, 2023

ElevenLabs is based on Tortoise-TTS, which was already pre-trained on millions of hours of data, but this one was only trained on LibriTTS, which is 500 hours at best. If a model has seen millions of voices, there are definitely gonna be some of them that sound like you. It is just a matter of training data, but it is very difficult to have someone collect these large amounts of data and train on it.

sandslides · on Nov 19, 2023

The speech generated is the best I've heard from an open-source model. The one test I made didn't produce an exact clone either, but this is still early days. There's likely something not quite right. The cloned voice does speak without any of the artifacts or other weirdness that most TTS systems suffer from.

lewismenelaws · on Nov 19, 2023

Yep. Tried as well. Tried a little clip of Tony Soprano and it came out as a British guy.

xTTSv2 does it much better. The quality of the trained voices is great, though.

eigenvalue · on Nov 19, 2023

Yes, same for my voice. Made me sound British and didn't capture anything special about my voice that makes it recognizable.