The only things DeepSeek open-sourced are a description of the architecture and some of their training methods. They didn't open-source their data pipelines or their heavily optimized training code.
Their architectural achievements are their own MoE design and their own attention mechanism (MLA). Grok has been MoE since v1. As for attention, we don't really know what Grok uses now, but it's worth noting that DeepSeek's attention was already present in earlier DeepSeek models.
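For context, the core trick in DeepSeek's MLA is compressing keys and values into a shared low-rank latent, so only that small latent needs to be cached during decoding. Here's a minimal PyTorch sketch of just that compression idea; it omits MLA's decoupled RoPE and query compression, and all dimensions are made up for illustration, not DeepSeek's actual config:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentKVAttention(nn.Module):
    """Sketch of the low-rank KV compression behind MLA (illustrative dims)."""
    def __init__(self, d_model=1024, n_heads=8, d_latent=128):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        # Down-project hidden states to a small shared latent; during decoding
        # only this latent is cached, shrinking the KV cache.
        self.kv_down = nn.Linear(d_model, d_latent)
        # Up-project the cached latent back to per-head keys and values.
        self.k_up = nn.Linear(d_latent, d_model)
        self.v_up = nn.Linear(d_latent, d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):
        b, t, d = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        latent = self.kv_down(x)  # (b, t, d_latent) -- the part you'd cache
        k = self.k_up(latent).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(attn.transpose(1, 2).reshape(b, t, d))
```

The point is the cache math: instead of storing full per-head K and V (d_model per token per layer, twice), you store one d_latent vector per token and rebuild K/V on the fly.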
As for the R1 reasoning recipe, it seems Grok has either replicated it or arrived at it independently, since they have a well-performing reasoning uptrain too.
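For reference, the published core of R1's RL recipe is GRPO, whose key move is scoring a group of sampled completions per prompt and normalizing rewards within the group instead of training a separate value model. A rough sketch of that normalization step (tensor shapes and the epsilon are my own illustrative choices):

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages in the GRPO style.

    `rewards` has shape (num_prompts, group_size): one scalar reward per
    sampled completion. Each completion's advantage is its reward relative
    to the mean of its own group, scaled by the group's std.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-6)  # epsilon guards zero-variance groups
```

Whether xAI's uptrain uses this exact objective is unknown; the point is that the recipe is simple enough that any lab with a strong base model could plausibly reproduce it.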