I generally run 2x the number of cores (e.g., eight Unicorn workers on a quad-core box), but it depends on your application. A heavyweight application with 150 models and lots of CPU-intensive tasks will suffer more from context switches than a lightweight one that spends most of its time idle.
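As a rough illustration of that rule of thumb, here's a minimal sketch that computes a worker count as cores times a multiplier. The helper name `suggested_worker_count` is hypothetical, and the idea of dropping the multiplier toward 1.0 for CPU-heavy apps is my assumption based on the context-switch point above, not a measured recommendation. In Unicorn the resulting number would go into the `worker_processes` setting of its config file.

```python
import os
from typing import Optional


def suggested_worker_count(multiplier: float = 2.0, cores: Optional[int] = None) -> int:
    """Rule-of-thumb worker count: cores x multiplier (hypothetical helper)."""
    if cores is None:
        # os.cpu_count() may return None if the count can't be determined
        cores = os.cpu_count() or 1
    if cores < 1:
        raise ValueError("cores must be >= 1")
    if multiplier <= 0:
        raise ValueError("multiplier must be positive")
    # Never suggest fewer than one worker
    return max(1, int(cores * multiplier))


if __name__ == "__main__":
    print(suggested_worker_count(cores=4))                  # quad-core box -> 8
    print(suggested_worker_count(multiplier=1.0, cores=4))  # assumed CPU-heavy app -> 4
    print(suggested_worker_count())                         # whatever this machine reports
```

Treat the output as a starting point and tune it by watching CPU usage and latency under real load.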
Probably in a manner not entirely dissimilar to Unicorn: https://github.com/blog/517-unicorn