
Well, one of the inherent issues is assuming that text is the optimal modality for everything we try to use an LLM for. LLMs are statistical engines designed to predict the most likely next token in a sequence. Any 'understanding' they do is ultimately incidental to that goal, and once you look at them that way, a lot of the shortcomings we see become more intuitive.
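To make the "statistical engine" framing concrete, here's a toy sketch of what "predict the most likely next token" means: a bigram frequency counter, which is obviously nothing like a real transformer and is purely illustrative.

    # Toy next-token predictor: count which token most often follows
    # each token in a tiny corpus, then generate greedily.
    from collections import Counter, defaultdict

    corpus = "the cat sat on the mat because the cat was tired".split()

    # Count how often each token follows each other token.
    next_counts = defaultdict(Counter)
    for cur, nxt in zip(corpus, corpus[1:]):
        next_counts[cur][nxt] += 1

    def predict_next(token):
        """Return the statistically most likely next token, or None if unseen."""
        counts = next_counts.get(token)
        return counts.most_common(1)[0][0] if counts else None

    # Greedily generate a continuation, token by token.
    tokens = ["the"]
    for _ in range(5):
        nxt = predict_next(tokens[-1])
        if nxt is None:
            break
        tokens.append(nxt)
    print(" ".join(tokens))  # prints "the cat sat on the cat"

The output is fluent-looking but the model has no notion of what a cat or a mat is; scale that idea up enormously and you get the same basic dynamic.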

There are a lot of problems LLMs are really useful for, because generating text is what you want to do. But there are tons of problems where we'd want some sort of intelligent, learning behaviour that doesn't map to language at all. There are also a lot of problems that can "sort of" be mapped to a language problem but make pretty wasteful use of resources compared to an (existing or potential) domain-specific solution. For the purposes of AGI, you could argue that trying to express "general intelligence" via language alone is fundamentally flawed altogether -- although that quickly becomes a debate about what actually counts as intelligence.

I pay less attention to this space lately, so I'm probably not the most informed. Everyone seems so hyped about LLMs that I feel like a lot of other progress gets buried, but I'm sure it's happening. There are some problem domains that are obviously solved better with other paradigms currently: self-driving tech, recommendation systems, robotics, game AIs, etc. Some of the exciting stuff that can likely solve some problems better in the future is the work on world models, graph neural nets, multimodality, reinforcement learning, alternatives to gradient descent, etc. It's debatable whether LLMs are a local maximum, but many of the leading AI researchers seem to think so -- Yann LeCun, for example, recently said LLMs 'are not a path to human-level AI'.


