
The model doing the heavy lifting is https://github.com/Rudrabha/Wav2Lip

Mic permissions on mobile are tricky, which might have been your issue. Note that in this prototype you also need to hold down the blue button while you speak.



Interesting. I didn’t think you could get anything close to realtime with Wav2Lip.


With a dedicated GPU and some cleverness it can be relatively quick. I split the response on punctuation and generate smaller clips in a pipeline. I haven't taken the model apart to try streaming the frames coming out of ffmpeg yet, but that would probably help a lot.
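A rough sketch of the split-and-pipeline idea, not the actual code behind the prototype. `synthesize_clip` is a hypothetical stand-in for whatever runs text-to-speech and Wav2Lip on one chunk; the punctuation set and worker count are assumptions too.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def split_on_punctuation(text):
    """Split text into short chunks, each ending at a punctuation mark."""
    # Break on whitespace that follows . ! ? , ; or :
    parts = re.split(r"(?<=[.!?,;:])\s+", text.strip())
    return [p for p in parts if p]


def synthesize_clip(chunk):
    # Hypothetical placeholder: a real version would run TTS on the chunk
    # and feed the audio to Wav2Lip to render a short video clip.
    return f"clip({chunk})"


def pipeline(text, workers=2):
    """Yield clips in order while later chunks are still being generated."""
    chunks = split_on_punctuation(text)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, so playback of the
        # first clip can begin before the remaining clips are finished.
        for clip in pool.map(synthesize_clip, chunks):
            yield clip


if __name__ == "__main__":
    for clip in pipeline("Hello there. How are you? Fine, thanks!"):
        print(clip)
```

The point of the shape is that the first clip only has to wait on the first short chunk, so perceived latency depends on chunk length rather than on the length of the whole response.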



