
> DeepSeek trained r1 for 1.25% (5M) of that money (using the same spot price) on 2048 crippled export H800s and is maybe a month behind.

This is a great example of how a misleading narrative can take hold and dominate discussion even when it's fundamentally incorrect.

SemiAnalysis documents that DeepSeek has spent well over $500M on GPUs alone, with total infrastructure costs of around $2.5B once operating costs are included[0].

The more interesting question is probably why people keep repeating this. Why do they want it to be true so badly?

[0]: https://semianalysis.com/2025/01/31/deepseek-debates/#:~:tex...



SemiAnalysis is wrong. They simply made their numbers up, as they have invented many other things; they are not to be trusted. I have observed many errors of understanding, analysis, and calculation in their writing.

DeepSeek R1 is literally an open-weight model with fewer than 40B active parameters; we know that for a fact. A model of that size is roughly what you would expect to train optimally in the time period and GPU-hours claimed. In fact, the 70B-parameter Llama 3 model used almost exactly the same compute as the DeepSeek V3/R1 claims, which makes sense: you would expect somewhat lower efficiency on H800s and with DeepSeek's complex MoE architecture.
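
As a rough sanity check (my own back-of-the-envelope, not from the thread): the standard compute approximation is FLOPs ≈ 6 × active parameters × tokens. Plugging in DeepSeek's published figures (37B active parameters, 14.8T tokens, 2.788M H800 GPU-hours at $2/GPU-hour), the claimed budget comes out in the right ballpark. The peak-FLOPS and utilization numbers below are assumptions on my part.

    # Back-of-the-envelope check of DeepSeek V3's claimed training cost.
    # Published figures: ~37B active parameters, 14.8T tokens,
    # 2.788M H800 GPU-hours at $2/GPU-hour (DeepSeek V3 technical report).
    # Assumptions (mine): ~990 TFLOP/s dense BF16 peak per H800, ~40% MFU.

    ACTIVE_PARAMS = 37e9
    TOKENS = 14.8e12
    PEAK_FLOPS = 990e12   # per-GPU dense BF16 peak; assumption
    MFU = 0.40            # model FLOPs utilization; assumption

    total_flops = 6 * ACTIVE_PARAMS * TOKENS        # ~3.3e24 FLOPs
    gpu_hours = total_flops / (PEAK_FLOPS * MFU) / 3600

    print(f"Estimated GPU-hours: {gpu_hours / 1e6:.2f}M (claimed: 2.79M)")
    print(f"Estimated cost at $2/hr: ${gpu_hours * 2 / 1e6:.1f}M")
    print(f"Days on 2048 GPUs: {gpu_hours / 2048 / 24:.0f}")

This lands within roughly 20% of the published 2.788M GPU-hour / ~$5.6M figure, and ~47 days on 2048 GPUs is consistent with a training run on the order of the two months DeepSeek described.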


Active parameters is definitely the wrong metric to use for evaluating the cost of training a model.
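
A minimal illustration of that point (my own sketch, assuming a standard mixed-precision training setup of ~16 bytes of state per parameter): per-token training FLOPs scale with the ~37B active parameters, but memory, and hence cluster size and communication, scale with all ~671B total parameters.

    # Training FLOPs scale with *active* parameters, but GPU memory for
    # weights, gradients, and optimizer state scales with *total* parameters.
    # Published figures: DeepSeek V3 has 671B total / 37B active parameters.
    # Assumption (mine): ~16 bytes of training state per parameter
    # (BF16 weights + grads, FP32 master copy, Adam moments).

    TOTAL_PARAMS = 671e9
    BYTES_PER_PARAM = 16
    H800_HBM_BYTES = 80e9   # 80 GB HBM per H800

    state_bytes = TOTAL_PARAMS * BYTES_PER_PARAM
    print(f"Training state: ~{state_bytes / 1e12:.0f} TB")
    print(f"GPUs needed just to hold it: ~{state_bytes / H800_HBM_BYTES:.0f}")

So a GPU-hour estimate keyed to active parameters alone captures only part of what it costs to train a model of this total size.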



