
My guess is around 256 GiB, but it depends on what level of quantization you are okay with. At full 16-bit it will be massive, near 512 GiB.

I figure we will see some Q4's that can probably fit on 4 4090s with CPU offloading.



With 400 billion parameters and 8 bits per parameter, wouldn't it be ~400 GB? Plus context size which could be quite large.


He said "Q4", meaning 4-bit weights.


Ok but at 16-bit it would be 800GB+, right? Not 512.


Divide, not multiply. If a size is estimated at 8 bits per weight, reducing to 4 bits halves the size (and the entropy of each value). Same relationship as an int vs. a short (assuming your platform has such definitions).

I could be wrong too but that’s my understanding. Like float vs half-float.
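The arithmetic in this thread can be sketched out directly: weight memory is just parameter count times bits per parameter, divided by 8 to get bytes. A minimal back-of-the-envelope calculator (the 400B figure is the one discussed above; context/KV-cache memory is deliberately ignored):

```python
def weight_size_gb(params: float, bits: int) -> float:
    """Memory for the weights alone: params * bits / 8 bytes, in GB (1e9 bytes).

    Ignores KV cache / context, which adds on top of this.
    """
    return params * bits / 8 / 1e9

params = 400e9  # ~400 billion parameters, as in the thread

for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_size_gb(params, bits):.0f} GB")
# 16-bit: ~800 GB, 8-bit: ~400 GB, 4-bit (Q4): ~200 GB
```

This matches the corrections above: each halving of bit width halves the footprint, so 16-bit is ~800 GB, not 512, and a Q4 quant lands around 200 GB before context overhead.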



