
To put it this way: after seeing examples of how an LLM with capabilities similar to state-of-the-art ones can be built with 20 times less money, we now have proof that the same can be done with 20 times more money as well!


There was this joke about rich Russians that I heard maybe 25 years ago.

Two rich Russian guys meet and one brags about his new necktie. "Look at this, I paid $500 for it." The other rich Russian guy replies: "Well, that is quite nice, but you have to take better care of your money. I saw that same necktie just yesterday in another shop for $1000."


Can you explain that joke for me? I keep reading it and I don't get it.


The punch line is that more expensive is better in cases where you buy something just to flex wealth.



To put it simply: he only bought the necktie so he could brag about how rich he is. He could have bragged even more if he had bought the same necktie in the other shop.



It's just that rich Russians do not have financial sense.


Imagine what they'll achieve if they apply DeepSeek's methods here with this insane compute.


And they will, since DeepSeek open-sourced everything.


The only things DeepSeek open-sourced are the architecture description and some of the training methods. They didn't open-source their data pipelines or their highly optimized training code.

Their architectural achievements are their own MoE and their own attention mechanism. Grok has been MoE since v1. As for attention, we don't really know what Grok uses now, but it's worth noting that DeepSeek's attention was already present in previous versions of DeepSeek models.
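
(For anyone unfamiliar with MoE: below is a minimal sketch of top-k mixture-of-experts routing in PyTorch, just to illustrate the general idea. It's a toy example, not DeepSeek's or Grok's actual implementation; the class and parameter names such as SimpleMoE, num_experts, and top_k are made up for illustration.)

    # Toy top-k mixture-of-experts (MoE) layer. Illustrative only;
    # names and sizes are hypothetical, not any lab's real code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SimpleMoE(nn.Module):
        def __init__(self, d_model=64, d_ff=256, num_experts=8, top_k=2):
            super().__init__()
            self.top_k = top_k
            # The router scores each token against every expert.
            self.router = nn.Linear(d_model, num_experts)
            # Each expert is a small feed-forward network.
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(num_experts)
            )

        def forward(self, x):  # x: (tokens, d_model)
            scores = self.router(x)                        # (tokens, num_experts)
            weights, idx = torch.topk(scores, self.top_k)  # keep the k best experts per token
            weights = F.softmax(weights, dim=-1)           # normalize the kept scores
            out = torch.zeros_like(x)
            # Each token is processed only by its k selected experts,
            # so most parameters stay idle for any given token.
            for slot in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = idx[:, slot] == e
                    if mask.any():
                        out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
            return out

    tokens = torch.randn(10, 64)
    print(SimpleMoE()(tokens).shape)  # torch.Size([10, 64])

The point of this routing style is that total parameter count can grow with the number of experts while per-token compute stays roughly constant, since only top_k experts run for each token.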

As for the reasoning recipe behind R1, it seems Grok has either already replicated it or arrived at it independently, since they have a well-performing reasoning uptrain too.



