> DeepSeek trained R1 for 1.25% (~$5M) of that money (using the same spot price) on 2048 crippled export H800s and is maybe a month behind.
This is a great example of how a misleading narrative can take hold and dominate discussion even when it's fundamentally incorrect.
SemiAnalysis documents that DeepSeek has spent well over $500M on GPUs alone, with total infrastructure costs around $2.5B when including operating costs[0].
The more interesting question is probably why people keep repeating this. Why do they want it to be true so badly?
SemiAnalysis is wrong. They simply made those numbers up (among many other things they have invented; they are not to be trusted). I have seen many errors of understanding, analysis, and calculation in their writing.
DeepSeek R1 is literally an open-weight model with fewer than 40B active parameters; we know that for a fact. A model of that size is, roughly speaking, exactly what the claimed time period and GPU-hours would support. In fact, the 70B-parameter Llama 3 model used almost exactly the same compute as the DeepSeek V3/R1 claims, which makes sense: you would expect somewhat lower efficiency on H800s and with DeepSeek's more complex MoE architecture.
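For anyone who wants to sanity-check this themselves, here is a minimal back-of-envelope sketch in Python. It assumes the figures from DeepSeek-V3's technical report (~2.788M H800 GPU-hours, ~37B active parameters, ~14.8T training tokens), the ~6.4M H100 GPU-hours reported in Meta's Llama 3 model card, an assumed ~$2/GPU-hour rental price, and the standard 6ND FLOPs rule of thumb; the ~1e15 FLOP/s per-GPU peak is a rough dense-BF16 figure, so the utilisation numbers are only ballpark.

```python
# Back-of-envelope check of the training-compute claims, using publicly
# reported figures and an ASSUMED $2/GPU-hour rental price.
# Dense-transformer approximation: training FLOPs ~= 6 * params * tokens.

def train_flops(active_params: float, tokens: float) -> float:
    """Approximate training FLOPs for a transformer (6ND rule of thumb)."""
    return 6 * active_params * tokens

# DeepSeek-V3 pre-training (figures from its technical report; R1 is built
# on this base and its post-training compute is not reported separately).
ds_flops = train_flops(37e9, 14.8e12)   # ~37B active params, ~14.8T tokens
ds_gpu_hours = 2.788e6                  # claimed H800 GPU-hours
ds_cost = ds_gpu_hours * 2.0            # assumed $2 / GPU-hour

# Llama 3 70B (figures from Meta's model card).
llama_flops = train_flops(70e9, 15e12)  # 70B params, ~15T tokens
llama_gpu_hours = 6.4e6                 # reported H100 GPU-hours

print(f"DeepSeek-V3 compute: {ds_flops:.2e} FLOPs over {ds_gpu_hours:.2e} GPU-hours")
print(f"Llama 3 70B compute: {llama_flops:.2e} FLOPs over {llama_gpu_hours:.2e} GPU-hours")
print(f"Implied V3 training cost at $2/GPU-hour: ${ds_cost/1e6:.1f}M")

# Implied hardware utilisation, taking ~1e15 dense BF16 FLOP/s per GPU as a
# rough peak for H800/H100: both runs land in a plausible 20-50% band, so
# neither set of claimed GPU-hours requires implausibly efficient hardware.
for name, flops, hours in [("DeepSeek-V3", ds_flops, ds_gpu_hours),
                           ("Llama 3 70B", llama_flops, llama_gpu_hours)]:
    mfu = flops / (hours * 3600 * 1e15)
    print(f"{name} implied utilisation: {mfu:.0%}")
```

By this crude measure both runs come out around 25-35% utilisation, and the implied cost is within rounding of the widely quoted ~$5-6M figure, so the claimed numbers are internally consistent for a model of this size.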
[0]: https://semianalysis.com/2025/01/31/deepseek-debates/#:~:tex...