Since these are hyperparameters, some of which are annealed over the entire training period, and given that the training required ungodly amounts of computing time, I think it was simply impractical for them to fully check whether they were set optimally. They probably went with what seemed good and trusted deep networks to pick up the slack. (This is total speculation on my part.)
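For context, "annealed over the training period" usually means something like a linear decay schedule. This is just a generic sketch of the idea, not their actual schedule or values:

```python
def annealed_value(start, end, step, total_steps):
    """Linearly anneal a hyperparameter (e.g. exploration epsilon or a
    learning rate) from `start` to `end` over the course of training."""
    frac = min(step / total_steps, 1.0)  # clamp once training ends
    return start + frac * (end - start)

# e.g. epsilon decaying from 1.0 to 0.1 over 1M steps:
# at step 500k it's halfway through the decay
print(annealed_value(1.0, 0.1, 500_000, 1_000_000))  # 0.55
```

The point is that the schedule's shape and endpoints are themselves hyperparameters, so verifying them properly means rerunning entire training runs.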
I do think that if they'd used some more sophisticated RL algorithms, perhaps with intrinsic curiosity or some kind of hierarchical task learning, they might have been able to reduce their training time and tune their hyperparameters a bit more.
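The intrinsic curiosity idea (e.g. ICM, Pathak et al. 2017) boils down to paying the agent an extra reward wherever its learned forward model predicts badly, so it explores novel states without hand-tuned exploration schedules. A toy sketch of just that bonus term (the `scale` value here is an arbitrary illustration, not from any paper):

```python
import numpy as np

def curiosity_bonus(predicted_next_state, actual_next_state, scale=0.01):
    """Intrinsic reward proportional to the forward model's squared
    prediction error: states the agent can't yet predict are treated
    as 'interesting' and earn extra reward."""
    error = np.sum((predicted_next_state - actual_next_state) ** 2)
    return scale * error

# The agent then trains on extrinsic + intrinsic reward:
# total_reward = env_reward + curiosity_bonus(pred, actual)
print(curiosity_bonus(np.zeros(3), np.ones(3)))  # 0.03
```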
The numbers seem pretty arbitrary to me; that's probably what this blog post is getting at when it discusses why it lost.