
To put it this way: after seeing examples of how an LLM with capabilities similar to state-of-the-art ones can be built with 20 times less money, we now have proof that the same can be done with 20 times more money as well!


There was this joke about rich Russians that I heard maybe 25 years ago.

Two rich Russian guys meet and one brags about his new necktie. "Look at this, I paid $500 for it." The other rich Russian guy replies: "Well, that is quite nice, but you have to take better care of your money. I saw that same necktie just yesterday in another shop for $1000."


Can you explain that joke for me? I keep reading it and I don't get it.


The punch line is that more expensive is better in cases where you buy something just to flex wealth.



To put it simply: he only bought the necktie so he could brag about how rich he is. He could have bragged even more if he had bought the same necktie in the other shop.



It's just that rich Russians do not have financial sense.


Imagine what they'll achieve if they apply DeepSeek's methods here with this insane compute.


And they will, since DeepSeek open-sourced everything.


The only things DeepSeek open-sourced are the architecture description and some of the training methods. They didn't open-source their data pipelines or their highly optimized training code.

Their architectural achievements are their own MoE and their own attention mechanism. Grok has been MoE since v1. As for attention, we don't really know what Grok uses now, but it's worth noting that DeepSeek's attention was already present in previous versions of DeepSeek models.
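
(For anyone unfamiliar with MoE: below is a minimal sketch of top-k mixture-of-experts routing in PyTorch, just to illustrate the general idea. It's a toy example, not DeepSeek's or Grok's actual implementation; the class and parameter names such as SimpleMoE, num_experts, and top_k are made up for illustration.)

    # Toy top-k mixture-of-experts (MoE) layer. Illustrative only;
    # names and sizes are hypothetical, not any lab's real code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SimpleMoE(nn.Module):
        def __init__(self, d_model=64, d_ff=256, num_experts=8, top_k=2):
            super().__init__()
            self.top_k = top_k
            # The router scores each token against every expert.
            self.router = nn.Linear(d_model, num_experts)
            # Each expert is a small feed-forward network.
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(num_experts)
            )

        def forward(self, x):  # x: (tokens, d_model)
            scores = self.router(x)                        # (tokens, num_experts)
            weights, idx = torch.topk(scores, self.top_k)  # keep the k best experts per token
            weights = F.softmax(weights, dim=-1)           # normalize the kept scores
            out = torch.zeros_like(x)
            # Each token is processed only by its k selected experts,
            # so most parameters stay idle for any given token.
            for slot in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = idx[:, slot] == e
                    if mask.any():
                        out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
            return out

    tokens = torch.randn(10, 64)
    print(SimpleMoE()(tokens).shape)  # torch.Size([10, 64])

The point of this routing style is that total parameter count can grow with the number of experts while per-token compute stays roughly constant, since only top_k experts run for each token.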

As for the reasoning recipe behind R1, it seems Grok has either already replicated it or arrived at it independently, since they have a well-performing reasoning uptrain too.



