I used vim for 5 years before switching to Emacs, and I couldn't disagree with you more. What's clumsy and slow is having to switch between insert mode, normal mode, and visual mode hundreds of times while working, rather than just using modifier keys for special actions.
If you're entering normal mode for one or two commands at a time, that could be a problem. However, if you're fast at entering normal mode (e.g. using jj, fd, or Caps Lock as Esc), that's not a meaningful overhead.
On the other hand, having to constantly hold Control/Alt, including long chords (e.g. with prefix + count), feels much worse to me. I use Emacs keybinds in the terminal (outside of vim), but I can't imagine having to constantly press modifiers. It's not as comfortable as switching to a mode where those actions are explicitly first-class.