Synthetic data doesn't have to come from an LLM. And that paper only showed that...

Synthetic data doesn't have to come from an LLM. And that paper only showed that if you train on a random sample from an LLM, the resulting second LLM is a worse model of the distribution that the first LLM was trained on. When people construct synthetic data with LLMs, they typically do not just sample at random, but carefully shape the generation process to match the target task better than the original training distribution.