Our 4-year-old got a Spirograph Junior for Christmas, which he's enjoying - the outer gear is held in place by the frame, so there's a bit less to go wrong.
I also wondered if these formulae were devised with 1-based indexing in mind (though I guess for larger dimensions it doesn't make much difference), as the paper states:
> The wavelengths form a geometric progression from 2π to 10000 · 2π
That led me to this chain of PRs - https://github.com/tensorflow/tensor2tensor/pull/177 - and it turns out the original code was actually quite different from what's stated in the paper. I guess slight variations in how you calculate this encoding don't affect things too much?
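
For what it's worth, here's a quick sketch of the 1-based indexing hunch, using the formula as written in the paper, `sin(pos / 10000^(2i/d_model))`, whose wavelength is `2π · 10000^(2i/d_model)`. The `wavelengths` helper and its `one_based` flag are my own names for illustration, not anything from the paper or from tensor2tensor, and this is not the code from the PR.

```python
import math

def wavelengths(d_model, one_based=False):
    """Wavelengths of sin(pos / 10000**(2*i/d_model)) for each pair index i.

    Hypothetical helper: i runs over 0..d_model/2-1 by default,
    or over 1..d_model/2 when one_based=True.
    """
    start = 1 if one_based else 0
    return [
        2 * math.pi * 10000 ** (2 * i / d_model)
        for i in range(start, start + d_model // 2)
    ]

for d_model in (4, 512):
    zero = wavelengths(d_model)
    one = wavelengths(d_model, one_based=True)
    # Print shortest and longest wavelength as multiples of 2*pi.
    print(
        d_model,
        "0-based:", zero[0] / (2 * math.pi), zero[-1] / (2 * math.pi),
        "1-based:", one[0] / (2 * math.pi), one[-1] / (2 * math.pi),
    )
```

With 0-based `i` the progression starts at exactly 2π but stops short of 10000 · 2π (for `d_model = 4` the longest wavelength is only 100 · 2π; for 512 it's within a few percent). With 1-based `i` it ends at exactly 10000 · 2π but no longer starts at 2π. So neither indexing hits both endpoints of the quoted range, though the gap shrinks as the dimension grows.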