My hope is that sufficiently rich language models obviate the need for a lot of robot-language grounding data.
LfP (https://learning-from-play.github.io/) was a work that inspired me a lot. They relabel a few hours of open-ended demonstrations (humans instructed to play with anything in the environment) with a lot of hindsight language descriptions, and show some degree of general capability acquired through this richer language. You can describe the same action with a lot of different descriptions, e.g. "pick up the leftmost object unless it is a cup" could also be relabeled as "pick up an apple".
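To make the relabeling trick concrete, here is a minimal sketch of the data transformation, assuming a scripted annotator (every name here is hypothetical, not from the LfP code):

```python
# Minimal sketch of hindsight language relabeling (all names here are
# hypothetical; this is not the LfP codebase). The point: a single
# recorded play window can be paired with many descriptions, so a few
# hours of play yields far more (language, demonstration) pairs.

from dataclasses import dataclass

@dataclass
class PlayWindow:
    frames: list               # placeholder for raw observations
    grasped: str               # object the robot ended up holding
    grasped_position: str      # e.g. "leftmost"

def hindsight_descriptions(w: PlayWindow) -> list:
    """Stand-in for human annotators describing what *happened*."""
    return [
        f"pick up the {w.grasped}",                   # concrete label
        f"pick up the {w.grasped_position} object",   # relational label
    ]

def relabel(play_log):
    """Turn unlabeled play into many (instruction, demo) pairs."""
    return [(text, w) for w in play_log for text in hindsight_descriptions(w)]

log = [PlayWindow(frames=["..."], grasped="apple", grasped_position="leftmost")]
print(relabel(log))  # two labeled pairs from one demonstration
```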
That being said, the LfP paper stops short of testing whether we can improve robotics solely by scaling language - the role of "open-ended play data" was both a confounding factor and central to their narrative. We do need some paired data to ground language to the robot-specific sensor/actuator modalities, but perhaps we can scale everything else with language-only data.
Thanks for the pointer to the Andreas paper! This is indeed quite relevant to the spirit of what I'm arguing for, though I prefer the implementation realized by the Lu et al. '21 paper.
A couple of under-explored, rich sources of training data on actions are videos and code. Videos, which show how people interact with objects in the world to achieve goals, may also come with captions and metadata, while code comes with comments, commit messages, and variable names that relate to real-world concepts, including millions of tables and pieces of business logic.
Maybe in the future we will add rich brain scans as an alternative to text. That kind of annotation would be easy to collect in large quantities, provided we can wear neural sensors. If scanning the brain is impractical, we could instead wear video cameras and use eye tracking and body tracking to train the system.
I am optimistic that language modelling can become the core engine of AI agents, but we need a system that has both a generator and a critic, going back and forth for a few rounds of multi-step problem solving. Another must is allowing search engine queries, which would make models more efficient and correct; not all knowledge must be burned into the weights.
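To sketch what I mean (all three components below are hypothetical stubs, not any real library's API):

```python
# Rough sketch of a generator/critic loop with retrieval. All three
# components are hypothetical stubs here; in a real system each would
# be a language model or search API call.

def web_search(query: str) -> list:
    return [f"(top results for: {query})"]     # stub for a search API

def generate(task: str, context: list, prior: str = "") -> str:
    return f"answer to {task!r} using {len(context)} sources"  # stub LM

def critique(task: str, draft: str) -> str:
    return ""  # stub critic; "" means no objections

def solve(task: str, rounds: int = 3) -> str:
    context = web_search(task)            # pull knowledge in at runtime
    draft = generate(task, context)       # rather than from the weights
    for _ in range(rounds):
        feedback = critique(task, draft)  # critic flags errors or gaps
        if not feedback:
            break                         # critic is satisfied
        context += web_search(feedback)   # look up what the critic flagged
        draft = generate(task, context, prior=draft)
    return draft

print(solve("multi-step word problem"))
```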
> My hope is that sufficiently rich language models obviate the need for a lot of robot-language grounding data.
I feel like this is “missing the trees for the forest.” In my experience, generality only emerges after a critical mass of detailed low-level examples is collected and arranged into a pattern. Humans can’t actually reason about purely abstract ideas very well. Experts always have specifics in mind they are working from.
So I'm not convinced leaving it to the model gets you anything new.
I feel that the (IMHO plausible) idea is that a sufficiently rich language model can enable transfer learning for robotics: you can effectively replace a lot of robot-language grounding data with a small amount of grounding data plus a lot of pure language data.
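A minimal sketch of that recipe (everything here is a hypothetical stub; the point is the relative size of the two datasets):

```python
# Two-stage transfer-learning sketch, assuming a multimodal model with
# a language-only loss and a grounding loss. All names are hypothetical
# stand-ins, not a real training setup.

class GroundedLM:
    def lm_loss(self, text): return 0.0                     # stub
    def grounding_loss(self, text, obs, act): return 0.0    # stub
    def update(self, loss): pass                            # stub optimizer step

def train(model, text_corpus, paired_robot_data):
    # Stage 1: scale on cheap language-only data (billions of tokens).
    for text in text_corpus:
        model.update(model.lm_loss(text))
    # Stage 2: ground language to the robot with a small, expensive set
    # of (instruction, sensor stream, action stream) triples.
    for text, obs, act in paired_robot_data:
        model.update(model.grounding_loss(text, obs, act))

train(GroundedLM(), ["the cat sat..."], [("pick up the apple", "obs", "act")])
```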