From a practical standpoint, scaling test-time compute does enable datacenter-scale performance on the edge. I cannot feasibly run a 70B model on my iPhone, but I can run a 3B model, even if it takes a long time to produce a solution comparable to the 70B's 0-shot.
I think it *is* an unlock.
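
To put rough numbers on that trade, here is a back-of-the-envelope sketch of one form of test-time scaling, best-of-N sampling. It assumes samples are independent and that some verifier can reliably pick out a correct answer, which is a strong assumption in practice. The success rates are made-up placeholders, not benchmark results, and `pass_at_n` and `samples_needed` are hypothetical helper names.

```python
def pass_at_n(p: float, n: int) -> float:
    """Chance that at least one of n independent samples is correct,
    given a per-sample success rate p (assumes a perfect verifier)."""
    return 1.0 - (1.0 - p) ** n


def samples_needed(p_small: float, p_target: float) -> int:
    """Smallest n for which the small model's best-of-n success rate
    reaches p_target (e.g. the large model's single-shot rate)."""
    if not (0.0 < p_small <= 1.0) or not (0.0 <= p_target < 1.0):
        raise ValueError("need 0 < p_small <= 1 and 0 <= p_target < 1")
    n = 1
    while pass_at_n(p_small, n) < p_target:
        n += 1
    return n


# Hypothetical rates: small model right 20% of the time per sample,
# large model right 60% of the time in a single shot.
n = samples_needed(p_small=0.2, p_target=0.6)  # n == 5
```

Under those made-up rates the small model needs five samples to match one shot from the large one, so the cost shows up as latency rather than memory, which is the trade that matters on a phone.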