
No. That’s like saying you can transplant a person’s neuronal action potentials into another person’s brain and have it make sense to them.


That metaphor is skipping the most important part in between! You wouldn't be transplanting anything directly; there would be a separate step in between that attempts to translate these action potentials.

The point of the translating model in between would be that it would re-weight each and every one of the values of the embedding, after being trained on a massive dataset of original text -> vector embedding for model A + vector embedding for model B. If you have billions of parameters trained to do this translation between just two specific models to start with, wouldn't this be in the realm of the possible?
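A minimal sketch of the idea, under toy assumptions: a "translator" fitted on paired embeddings of the same texts from two models. Here the pairing is synthetic and the translator is just a least-squares linear map (a real attempt would presumably need a deep network and a huge paired corpus); all names and sizes are illustrative, not from any actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: paired embeddings of the same texts from two models.
# Model A embeds into 64 dims, model B into 48 dims (toy sizes).
n_texts, dim_a, dim_b = 1000, 64, 48
emb_a = rng.normal(size=(n_texts, dim_a))

# Pretend model B's embedding is some unknown linear-ish function of A's,
# plus noise -- the favourable case for a learned translator.
true_map = rng.normal(size=(dim_a, dim_b))
emb_b = emb_a @ true_map + 0.01 * rng.normal(size=(n_texts, dim_b))

# "Train" the translator: least-squares fit of W minimizing ||A W - B||.
# The supervision is exactly the pairing described above:
# text -> (embedding from model A, embedding from model B).
W, *_ = np.linalg.lstsq(emb_a, emb_b, rcond=None)

# Translate a held-out embedding and compare to the ground truth.
test_a = rng.normal(size=(1, dim_a))
pred_b = test_a @ W
target_b = test_a @ true_map
cos = float(pred_b @ target_b.T
            / (np.linalg.norm(pred_b) * np.linalg.norm(target_b)))
print(f"cosine similarity of translated vs. true embedding: {cos:.4f}")
```

With enough pairs and a mapping that is actually learnable, the translated vector lands very close to the target; the open question in the thread is whether real cross-model relationships are anywhere near this well-behaved.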


A translation between models doesn't seem possible because there are actually no "common dimensions" at all between models. That is, each dimension has a completely different semantic meaning in different models, and on top of that it's the combination of dimension values that begins to impart real "meaning".

For example, the number of different unit vector combinations in a 1500-dimensional space is like the number of different ways of "ordering" the components, which is 1500! (roughly 10^4114).

EDIT: And the point of that factorial is that even if the dimensions were "identical" across two different LLMs but merely "scrambled" (in ordering) there would be that large number to contend with to "unscramble".


This is very similar to how LLMs are taught to understand images in LLaVA-style models (the image embeddings are encoded into the existing language token stream).
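The LLaVA-style wiring mentioned above can be sketched in a few lines: a frozen vision encoder produces patch embeddings, a trained projection maps them into the language model's token-embedding space, and the result is simply concatenated with the text tokens. All dimensions here are toy values and the random matrices stand in for learned weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (real LLaVA projects CLIP ViT patch features into the
# LLM's hidden size; these numbers are illustrative assumptions).
n_patches, vision_dim, lm_dim = 16, 32, 64

# Frozen vision encoder output: one embedding per image patch.
patch_embeddings = rng.normal(size=(n_patches, vision_dim))

# The trained piece: a projection from vision space into the LM's
# token-embedding space (LLaVA v1 uses a simple linear layer here).
projection = rng.normal(size=(vision_dim, lm_dim))
image_tokens = patch_embeddings @ projection  # shape (n_patches, lm_dim)

# Text tokens already live in the LM's embedding space; just concatenate
# them after the image tokens to form one sequence for the LLM.
text_tokens = rng.normal(size=(5, lm_dim))
stream = np.concatenate([image_tokens, text_tokens], axis=0)
print(stream.shape)  # (21, 64)
```

The key point for the thread: the projection is learned against a fixed pair of models (one vision encoder, one LLM), which is exactly the "translator trained for two specific models" setup being debated.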



