Put it this way: after seeing examples of how an LLM with capabilities similar to state-of-the-art models can be built with 20 times less money, we now have proof that the same can be done with 20 times more money as well!
There was this joke about rich Russians that I heard maybe 25 years ago.
Two rich Russian guys meet and one brags about his new necktie. "Look at this, I paid $500 for it." The other replies: "Well, that's quite nice, but you need to take better care of your money. I saw that same necktie just yesterday in another shop for $1,000."
Put simply: he only bought the necktie so he could brag about how rich he is. He could have bragged even more if he had bought it in the other shop.
The only things DeepSeek open-sourced are the architecture description and some of the training methods. They didn't open-source their data pipelines or their highly optimized training code.
Their architectural achievements are their own MoE design and their own attention mechanism. Grok has been MoE since v1. As for attention, we don't really know what Grok uses now, but it's worth noting that DeepSeek's attention was already present in previous versions of their models.
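For anyone unfamiliar with the term: an MoE (Mixture-of-Experts) layer routes each token to only a few "expert" sub-networks instead of running one big dense FFN. Here's a minimal sketch of top-k routing; the dimensions, expert count, and class names are purely illustrative and not DeepSeek's or Grok's actual implementations:

```python
# Minimal top-k Mixture-of-Experts layer, for illustration only.
# Hidden sizes, expert count, and top_k are made-up toy values,
# not the real DeepSeek or Grok configurations.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyMoE(nn.Module):
    def __init__(self, d_model=64, d_hidden=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)          # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                    # x: (tokens, d_model)
        scores = self.router(x)                              # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)       # keep top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):                          # only the selected experts run
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out


tokens = torch.randn(4, 64)
print(TinyMoE()(tokens).shape)                               # torch.Size([4, 64])
```

The point of the design is that total parameter count grows with the number of experts, but per-token compute only grows with top_k, which is how both labs keep huge models affordable to run.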
As for the reasoning recipe behind R1, it seems Grok has either replicated it or arrived at something similar on its own, since they have a well-performing reasoning uptrain too.