
> GPT-2 may be the last LLM

This is not true. There are tons of models that are better than GPT-3.5 and really close in performance to GPT-4, and you can still train them on a single GPU with 24 GB of video memory. There are hints of even better models, published last year, which you can train on a single GPU to get a model comparable in performance to LLaMA 2 34B. The horizontal scaling you appeal to here may account for up to a 10^6 performance increase, but in general I expect a single node to be at least 1000 times faster than it is now. And it is entirely possible that you can't scale vertically with 0.99 efficiency, and certainly not horizontally, but I honestly expect per-GPU scaling to get better than 0.75 in the next 5 years.
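
To make those efficiency numbers concrete, here is a back-of-envelope sketch. It assumes "scaling per GPU" means the fraction of ideal speedup retained each time the GPU count doubles; that is my reading, not necessarily the poster's, and the numbers are purely illustrative:

    import math

    def speedup(n_gpus: int, eff_per_doubling: float) -> float:
        """Ideal speedup is n_gpus; each doubling of the GPU count
        keeps only eff_per_doubling of the ideal gain."""
        return n_gpus * eff_per_doubling ** math.log2(n_gpus)

    for eff in (0.75, 0.99):
        for n in (8, 64, 1024):
            print(f"eff={eff:.2f}  gpus={n:5d}  speedup={speedup(n, eff):7.1f}")

    # eff=0.75  gpus=    8  speedup=    3.4
    # eff=0.75  gpus=   64  speedup=   11.4
    # eff=0.75  gpus= 1024  speedup=   57.7
    # eff=0.99  gpus=    8  speedup=    7.8
    # eff=0.99  gpus=   64  speedup=   60.3
    # eff=0.99  gpus= 1024  speedup=  926.1

At 0.75 you burn most of a thousand GPUs on overhead; at 0.99 you keep around 90% of them. That gap matters far more than the raw node count.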



> you can still train them on a single GPU with 24 GB of video memory

It depends on the target. For pure science (or for fun), I could train a GPT-4-class model on a C64, but that approach will not survive in a competitive market, where you need to test hypotheses fast and deliver tuned models fast.
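
(For reference, the single-24 GB-GPU regime the parent describes usually leans on tricks like the following. A minimal sketch, assuming a Hugging Face-style stack (transformers + peft + bitsandbytes); the model name and hyperparameters are illustrative, not a recipe.)

    # Why a 24 GB card can fine-tune a multi-billion-parameter model at
    # all: 4-bit frozen base weights + LoRA adapters + gradient
    # checkpointing. Model name and settings are illustrative.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model

    bnb = BitsAndBytesConfig(
        load_in_4bit=True,                  # quantize the frozen base weights
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-hf",         # assumed example model
        quantization_config=bnb,
        device_map={"": 0},                 # everything on the one GPU
    )
    model.gradient_checkpointing_enable()   # trade recompute for memory
    lora = LoraConfig(r=16, lora_alpha=32,
                      target_modules=["q_proj", "v_proj"])
    model = get_peft_model(model, lora)     # train only the small adapters
    model.print_trainable_parameters()

The catch, which is the point above: what fits in memory is not the same as what iterates fast enough to compete.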

- A competitive market is very sensitive to speed. For example, if MS presents something on December 10, then whatever Google presents after New Year must be not merely equal but significantly better, just to look equal to customers.

So horizontal scaling is a must, not just my wish, even when the speed increase is far from linear.
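
To be concrete about what "horizontal" means here: data parallelism across many GPUs or nodes, along the lines of the PyTorch DDP skeleton below. A sketch only, assuming a torchrun launch; the toy model is a placeholder for whatever you actually train:

    # Minimal "horizontal" training skeleton: PyTorch DistributedDataParallel.
    # Launch with e.g.: torchrun --nproc_per_node=8 train.py
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    dist.init_process_group("nccl")              # one process per GPU
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)

    model = DDP(torch.nn.Linear(1024, 1024).cuda(rank), device_ids=[rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(100):
        x = torch.randn(32, 1024, device=rank)   # each rank sees its own shard
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()                          # gradients all-reduced here
        opt.step()

The all-reduce inside backward() is exactly where "far from linear" comes from: communication cost grows with the number of ranks.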

> I honestly expect per-GPU scaling to get better than 0.75 in the next 5 years

Could you give an explanation, or even a speculation, as to how this is possible, when we have already hit silicon limits (around 5 GHz core clocks, ~1 nm processes, etc.)?


> Could you give an explanation, or even a speculation, as to how this is possible

Nope. But I'm so desperate to give you a hint right now that it's almost impossible to hold myself back... Stop looking at horizontal scalability. The vertical kind is not exhausted yet. BTW, that was not the hint.


> Stop looking at horizontal scalability.

Sure. A B-747 officially takes about 700 man-years to assemble, so let's build them with small but highly motivated teams, per the classic two-pizza rule, and the world will wait :)


BTW, I was not joking when I mentioned training an LLM on a C64. I have often seen scientists run their tasks on a desktop, waiting days or even weeks for results. But they usually have reasons for such behavior: for example, to keep what they are working on, and what their calculations show, secret from colleagues. Or to run something so original that the higher-ups would not be happy to see it on the dedicated number-crunching machine.



