
ipsin on Jan 9, 2023 | on: VALL-E: Microsoft’s new zero-shot text-to-speech m...

Are these text-to-speech models manually controllable in terms of prosody and similar features, or is it all transformer-based text-to-audio? I've followed some of the research on prosody transfer, but the results still seem bad in the TTS systems I've heard.