My (very limited) understanding of AI models is that the input "shape" has to be well defined.
I.e. a vision network expects 1 input per pixel (or more for encoding color) and so it's up to you to "format" your given image into what the model expects.
But what about GPT-3, which takes in "free text"? The animations in the post show 2048 input nodes; does this mean it can only take in a maximum of 2048 tokens, or will it somehow scale beyond that?
Correct, you can only input up to 2048 tokens total (a big improvement over GPT-2's 1024-token limit). You can use a sliding window to continue generating beyond that.
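A minimal sketch of the sliding-window idea in Python, assuming a hypothetical generate_next(context) call that returns one new token (the real API call would be whatever your model library exposes):

    # Sliding-window generation past the context limit.
    # generate_next(context) is a hypothetical function that returns
    # one new token given at most MAX_CTX tokens of context.
    MAX_CTX = 2048

    def generate_long(prompt_tokens, n_new_tokens):
        tokens = list(prompt_tokens)
        for _ in range(n_new_tokens):
            context = tokens[-MAX_CTX:]  # keep only the most recent window
            tokens.append(generate_next(context))
        return tokens

The trade-off is that anything scrolled out of the window is forgotten, so the model can lose track of earlier context.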
However, the self-attention computation scales quadratically with input length, which makes training models with larger context windows more difficult (which is why Reformer uses workarounds like locality-sensitive-hashing attention to increase the input size).
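To see where the quadratic term comes from: every one of the n input tokens attends to all n tokens, so the attention score matrix has n * n entries. A quick back-of-the-envelope check:

    # Doubling the context length roughly quadruples the size of the
    # attention score matrix (and the compute/memory to fill it).
    for n in [1024, 2048, 4096]:
        print(n, n * n)
    # 1024 1048576
    # 2048 4194304
    # 4096 16777216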
Yes, the input size is limited. In addition, each token may be a whole word or only part of a word, depending on how common it is: the byte-pair-encoding tokenizer gives common words a single token and splits uncommon words into several sub-word pieces, each of which counts against the limit.
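You can see this splitting behavior yourself. This sketch assumes the Hugging Face transformers package is installed; GPT-3 uses a BPE vocabulary similar to GPT-2's, so the GPT-2 tokenizer is a reasonable stand-in:

    # Illustrates BPE tokenization: common words are a single token,
    # rare words are split into several sub-word pieces.
    from transformers import GPT2Tokenizer

    tok = GPT2Tokenizer.from_pretrained("gpt2")
    print(tok.tokenize("the"))  # a common word: typically one token
    print(tok.tokenize("antidisestablishmentarianism"))  # several pieces

So a 2048-token limit is usually somewhat fewer than 2048 English words, and fewer still for unusual vocabulary.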