It’s an autoregressive model, so it can’t do anything that requires planning tokens in advance.
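To make "autoregressive" concrete, here is a minimal sketch of the generation loop. The lookup table `TOY_MODEL` and the function `generate` are hypothetical stand-ins for illustration, not any real model or library API. The point is only the shape of the loop: each token is sampled from a distribution conditioned on what has already been emitted, then committed.

```python
import random

# Hypothetical toy "model": maps the previous token to next-token probabilities.
# A real model conditions on the whole prefix; this table is an assumption
# made purely to keep the example small.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.5, "dog": 0.5},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}

def generate(max_tokens=10, seed=0):
    """Sample one token at a time; earlier tokens are never revised."""
    rng = random.Random(seed)
    tokens = ["<s>"]
    for _ in range(max_tokens):
        dist = TOY_MODEL[tokens[-1]]
        choices, weights = zip(*dist.items())
        next_token = rng.choices(choices, weights=weights, k=1)[0]
        tokens.append(next_token)  # committed; there is no going back
        if next_token == "</s>":
            break
    return tokens

print(generate())
```

Nothing in the loop looks ahead or goes back to change an earlier choice, which is the property the argument above leans on.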

It can’t do anything that implies a large or infinite token space (e.g. video understanding).

It’s also limited to a reasonable response length, since token selection is probabilistic at each generation step. The longer you make the response, the more likely it is to veer off course.
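A rough way to see the length effect, under a strong simplifying assumption: if each token independently stays "on course" with probability p, the chance that the whole response stays on course is p raised to the number of tokens. The function name `on_course_probability` and the value 0.999 are illustrative assumptions, not measurements of any model.

```python
def on_course_probability(per_token_accuracy, length):
    """Chance that every one of `length` tokens stays on course,
    assuming each token is an independent trial (a simplification)."""
    return per_token_accuracy ** length

# Compare short, medium, and long responses at the same per-token accuracy.
for n in (10, 100, 1000):
    print(n, round(on_course_probability(0.999, n), 3))
```

Even with a per-token accuracy very close to 1, the product shrinks steadily as the response grows. Real errors are not independent, so treat this as intuition rather than a prediction.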