Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

My own favourite random() in the wild bug is one that I've come across many many times:

    /* Generate a random number 0...5 inclusive */
    int r = random() % 6;  
The problem is that this results in a bias towards certain values, because the random() return space is probably not a whole multiple of the number you are modding. It's easier if you think what would happen if RAND_MAX were "10". Then the results 0,1,2,3,4 each have two opportunities of being selected for r, but "5" only has one. So 5 is half as likely as any other number. Terrible!

Using float/double random()'s don't always help either, because floating point multiplication has non-uniform errors. Instead, what you really need is:

   int random_n(int top) {
       if (top <= 0) {
            return -1;
       }
    
       while(1) {
           int r = random(top);
           if (r < (RAND_MAX - (RAND_MAX % top))) {
                return r;
           }
       }
     
       return -1;   
    }
Although it's better to replace random() itself with something that uses real entropy.


Hence arc4random_uniform() in OpenBSD, which accounts for modulo bias (https://en.wikipedia.org/wiki/Fisher–Yates_shuffle#Modulo_bi...).


On my machine RAND_MAX is 2^31-1. So getting the probability wrong in the way you describe means a relative error of one in two billion. That's really quite small for non-critical applications.


It's not so simple. The larger the modulus, the larger the error; up to a limit. If the modulus is 2^30 + 2^29 for example, then values of r in the range 0 ... 2^29 will be represented twice as often as values in the range (2^29 + 1) ... (2^30 + 2^29). This is much more significant error than one in two billion (it's close to 1 in 3, nearly ten orders of magnitude more significant).

Subtleties like this are hard to detect on code review, and even in testing ... the real lesson may be that the very interface behind random(void) is just broken. It really should be random(int) and take care of all of this for you, as many other languages/libraries do.


Yes, you are right, I didn't think of really large moduluses of magnitude comparable to RAND_MAX.


Doesn't matter what your RAND_MAX is if you're taking the result modulo anything that's not a power of two.


No, it does matter. If RAND_MAX is really large, the relative error in "incorrect" probabilities is really small. Unless you have a critical need for it to be precisely correct, the error is basically negligible.


I have no idea what I was thinking, but I stand corrected. Thanks.


And if you have a need for it to be precisely correct, I have bad news for you about the word "random".




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: