I think it's not so much that the IEEE f32/f64 choice was necessarily perfect as that it was "good enough", so it's not worth the hardware cost of handling multiple formats or the headache of standardizing on something else. With f16 the tradeoffs are suddenly much sharper: with only 16 bits, you can't have both a reasonable representable range (large exponent field) and reasonable precision (large mantissa field). So you must trade one against the other, and it can be worth the extra hardware to support two points in the tradeoff range.
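
To put rough numbers on that, here's a quick sketch that derives range and precision from the field widths alone. I'm assuming the two 16-bit points in question are IEEE binary16 (5 exponent bits, 10 mantissa bits) and bfloat16 (8 exponent bits, 7 mantissa bits), and that both follow the usual IEEE-style layout (biased exponent, implicit leading 1, all-ones exponent reserved for inf/NaN). The helper name `format_props` is just something made up for illustration.

```python
def format_props(exp_bits: int, man_bits: int) -> dict:
    """Range and precision of an IEEE-style binary float with the given field widths.

    Assumes a biased exponent, an implicit leading 1 for normal numbers,
    and the all-ones exponent reserved for inf/NaN.
    """
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = bias        # largest unbiased exponent of a finite number
    min_exp = 1 - bias    # smallest unbiased exponent of a normal number
    return {
        # Largest finite value: all mantissa bits set, largest exponent.
        "max_finite": (2 - 2.0 ** -man_bits) * 2.0 ** max_exp,
        # Smallest positive normal value.
        "min_normal": 2.0 ** min_exp,
        # Gap between 1.0 and the next representable value.
        "epsilon": 2.0 ** -man_bits,
    }


# Field widths (exponent bits, mantissa bits); the sign bit is not counted.
formats = {
    "binary16 (f16)": (5, 10),
    "bfloat16": (8, 7),
    "binary32 (f32)": (8, 23),
}

for name, (e, m) in formats.items():
    p = format_props(e, m)
    print(f"{name:15s} max={p['max_finite']:.4g}  "
          f"min_normal={p['min_normal']:.4g}  eps={p['epsilon']:.4g}")
```

Under those assumptions, binary16 tops out at 65504 but resolves steps of 2^-10 near 1.0, while bfloat16 reaches roughly the same range as f32 but only resolves steps of 2^-7. Neither one is obviously "the" right answer, which is the point.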