
Are these text-to-speech models manually controllable in terms of prosody, etc., or is it all transformer-based text-to-audio?

I've followed some of the research on prosody transfer and related work, but prosody still seems bad in the TTS systems I've heard.


