Hacker News

The LibriTTS demo clones unseen speakers from a clip of five seconds or so.


Ah ok, thanks. I tried the other demo.


I tried it. Sounds absolutely nothing like my voice or my wife's voice. I used the same sample files as I used 2 days ago on the Eleven Labs website, and they worked flawlessly there. So this is very, very far from being close to "Eleven Labs quality" when it comes to voice cloning.


Ah, that's disappointing. Have you tried https://git.ecker.tech/mrq/ai-voice-cloning? I've had decent results with that, but inference is quite slow.


ElevenLabs is based on Tortoise-TTS, which was already pre-trained on millions of hours of data, but this one was only trained on LibriTTS, which is 500 hours at best. If a model has seen millions of voices, there are definitely going to be some that sound like you. It is just a matter of training data, but it is very difficult to get someone to collect that much data and train on it.


The speech generated is the best I've heard from an open source model. The one test I made didn't make an exact clone either but this is still early days. There's likely something not quite right. The cloned voice does speak without any artifacts or other weirdness that most TTS systems suffer from.


Yep. Tried it as well. Tried a little clip of Tony Soprano and it came out as a British guy.

xTTSv2 does it much better. The quality on the trained voices is great, though.


Yes, same for my voice. Made me sound British and didn't capture anything special about my voice that makes it recognizable.



