So, that does 19999 fork()s in 224 seconds, which works out to about 11 milliseconds per fork(). (On my Linux box, it's actually doing clone(), and also does a bunch of wait4(), rt_sigaction(), exit_group(), etc., but let's assume the fork() is the bottleneck.)
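To make that kind of measurement concrete, here is a minimal sketch, assuming a POSIX system and Python's standard `os` module. The function name `mean_fork_seconds` is hypothetical, and this is not the benchmark that produced the 224-second figure. Each iteration forks a child, has the child exit immediately, and reaps it. That means the timing covers the whole fork/exit/wait cycle, not fork() alone, which mirrors the caveat above. Interpreter overhead means the absolute numbers won't match a C benchmark.

```python
import os
import time


def mean_fork_seconds(n: int) -> float:
    """Fork n short-lived children one at a time and return the mean
    wall-clock seconds per fork + exit + wait cycle (POSIX only)."""
    start = time.perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            # Child: exit immediately, skipping Python's cleanup handlers
            # (and not flushing any stdio buffers inherited from the parent).
            os._exit(0)
        # Parent: reap the child so zombies don't pile up.
        os.waitpid(pid, 0)
    return (time.perf_counter() - start) / n


if __name__ == "__main__":
    n = 1000
    per_fork = mean_fork_seconds(n)
    print(f"{n} forks: {per_fork * 1e6:.1f} microseconds per fork (incl. exit and wait)")
```

Running the same script on two systems gives a like-for-like comparison, even though it measures more than the bare fork() call.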
This is pretty slow, but it's still about 6× faster than the 70 milliseconds I had inferred from your "100× slower than any other POSIX".
Also note that your Red Hat machine is evidently forking in 39μs, which is several times faster than I've ever seen Linux fork.
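The arithmetic behind these comparisons, using only the figures quoted in this thread, works out as follows:

```python
# Figures quoted in the discussion.
total_seconds = 224.0   # wall-clock time for the whole run
forks = 19999           # number of fork()s in that run
inferred_ms = 70.0      # inferred from "100x slower than any other POSIX"
red_hat_us = 39.0       # reported per-fork time on the Red Hat machine

per_fork_ms = total_seconds / forks * 1000.0
print(f"per fork: {per_fork_ms:.1f} ms")                                    # per fork: 11.2 ms
print(f"vs. inferred 70 ms: {inferred_ms / per_fork_ms:.1f}x faster")       # vs. inferred 70 ms: 6.2x faster
print(f"vs. Red Hat 39 us: {per_fork_ms * 1000 / red_hat_us:.0f}x slower")  # vs. Red Hat 39 us: 287x slower
```

So the measured run is roughly 6× better than the inferred figure, but still nearly 300× slower per fork than the Red Hat number.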
I understand that WSLv1 provides a more efficient fork(), which isn't really available to user-space shims like Cygwin or mingw.