>There is a vast gulf between "GPT-5 can drive a car" and "a neural network using the transformer architecture can be trained to drive a car".
The only difference between the two is training data: the latter has it and the former lacks it. That is not a 'vast gulf'.
>And I see no proof whatsoever that we can, today, train a single model that can both write a play and drive a car.
You are not making a lot of sense here. You can have a model that does both. It's not some herculean task; it's literally just additional data in the training run. There are vision-language-action models that have been tested on public roads.
https://wayve.ai/thinking/lingo-2-driving-with-language/