Divide, not multiply. If a size is estimated at 8-bit, reducing to 4-bit halves the size (and the maximum entropy of each value). It's like the difference between INT_MAX and SHORT_MAX (assuming you have such defs).
I could be wrong too but that’s my understanding. Like float vs half-float.
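Rough sketch of the arithmetic, in case it helps. The 70B parameter count is just a number I picked for illustration, and `weight_bytes` is a made-up helper, not from any library. It only counts the raw weights; real quant formats also store scale metadata, so actual files come out somewhat bigger.

```python
def weight_bytes(n_params: int, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in bytes."""
    return n_params * bits_per_weight / 8


# Hypothetical parameter count, chosen only for illustration.
n_params = 70_000_000_000

size_8bit = weight_bytes(n_params, 8)
size_4bit = weight_bytes(n_params, 4)

print(f"8-bit: {size_8bit / 1e9:.0f} GB")
print(f"4-bit: {size_4bit / 1e9:.0f} GB")
print(f"ratio: {size_4bit / size_8bit}")  # 0.5, i.e. divide by two
```

Same idea going from float (32-bit) to half-float (16-bit): pass 32 and 16 as `bits_per_weight` and the ratio is again 0.5.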
I figure we will see some Q4s that can probably fit on four 4090s with CPU offloading.