SemiAnalysis is wrong; they simply made their numbers up (among many other things they have invented, which is why they are not to be trusted). I have observed many errors of understanding, analysis, and calculation in their writing.

DeepSeek R1 is literally an open-weight model. It has fewer than 40B active parameters; we know that for a fact. A model of that size is roughly compute-optimally trained over the training duration and GPU-hours claimed. In fact, the 70B-parameter Llama 3 model used almost exactly the same compute per GPU-hour as the DeepSeek V3/R1 claims imply (which makes sense, as you would expect a bit less efficiency on the H800 and from DeepSeek's complex MoE architecture).
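As a rough sanity check of that comparison, here is a minimal Python sketch applying the standard compute approximation C ≈ 6·N·D (N = active parameters, D = training tokens) to the publicly reported figures: ~37B active parameters, ~14.8T tokens, and ~2.788M H800 GPU-hours from the DeepSeek-V3 report, and ~70B parameters, ~15T tokens, and ~6.4M H100 GPU-hours from the Llama 3 model card. Treat all of these as approximate.

    # Back-of-the-envelope check using C ~= 6 * N * D, where N is
    # active parameters and D is training tokens. All figures are
    # publicly reported approximations, not exact values.

    def implied_tflops_per_gpu(active_params: float, tokens: float,
                               gpu_hours: float) -> float:
        """Average sustained TFLOP/s per GPU implied by a claimed run."""
        total_flops = 6 * active_params * tokens        # C ~= 6ND
        gpu_seconds = gpu_hours * 3600                  # total GPU-seconds
        return total_flops / gpu_seconds / 1e12         # TFLOP/s per GPU

    # DeepSeek-V3: ~37B active params, ~14.8T tokens, ~2.788M H800 hours
    deepseek = implied_tflops_per_gpu(37e9, 14.8e12, 2.788e6)
    # Llama 3 70B: ~70B params, ~15T tokens, ~6.4M H100 hours
    llama = implied_tflops_per_gpu(70e9, 15e12, 6.4e6)

    print(f"DeepSeek-V3: ~{deepseek:.0f} TFLOP/s per GPU")  # ~327
    print(f"Llama 3 70B: ~{llama:.0f} TFLOP/s per GPU")     # ~273

Against a peak dense BF16 throughput of roughly 990 TFLOP/s for an H100/H800, both runs imply sustained utilization around 30%, i.e. entirely ordinary numbers for a large training run; nothing about the claimed GPU-hours is anomalous.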

Active parameter count is definitely the wrong metric to use for evaluating the cost of training a model.