Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

random.choice not random

Tags:

python

random

I'm using Python 2.5 on Linux, in multiple parallel FCGI processes. I use

    chars = string.ascii_letters + string.digits
    cookie = ''.join([random.choice(chars) for x in range(32)])

to generate distinct cookies. Assuming that the RNG is seeded from /dev/urandom, and that the sequence of random numbers comes from the Mersenne twister, I would expect that there is practically zero chance of collision.

However, I do see regular collisions, even though only a few (<100) users are logged in at any time.

Why are the random numbers not more random?

like image 791
Martin v. Löwis Avatar asked Sep 02 '09 06:09

Martin v. Löwis


People also ask

What is the difference between random sample and random choice?

Use the random. sample function when you want to choose multiple random items from a list without including the duplicates. Use random. choices function when you want to choose multiple items out of a list including repeated.

What does random choice mean?

Python Random choice() Method The choice() method returns a randomly selected element from the specified sequence. The sequence can be a string, a range, a list, a tuple or any other kind of sequence.

How do you select randomly in Python?

In Python, you can randomly sample elements from a list with choice() , sample() , and choices() of the random module. These functions can also be applied to a string and tuple. choice() returns one random element, and sample() and choices() return a list of multiple random elements.


2 Answers

It shouldn't be generating duplicates.

import random
chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
def gen():
    return ''.join([random.choice(chars) for x in range(32)])

test = [gen() for i in range(100000)]
print len(test), len(set(test)) # 100000 100000

The chances of duplicates is significant with chars = "ab"; 126 duplicates in 1000000 iterations. It's nonexistant with 62.

That said, this isn't a good way to generate cookies, because session cookies need to be unpredictable, to avoid attacks involving stealing other people's session cookies. The Mersenne Twister is not designed for generating secure random numbers. This is what I do:

import os, hashlib
def gen():
    return hashlib.sha1(os.urandom(512)).hexdigest()

test = [gen() for i in range(100000)]
print len(test), len(set(test))

... which should be very secure (which is to say, difficult to take a string of session cookies and guess other existing session cookies from them).

like image 165
Glenn Maynard Avatar answered Oct 01 '22 14:10

Glenn Maynard


This is definitely not a normal collision scenario:

  • 32 characters with 62 options per character is equivalent to 190 bits (log2(62) * 32)
  • According to the birthday paradox, you should be receiving a collision naturally once every 2**95 cookies, which means never

Could this be a concurrency issue?

  • If so, use different random.Random instances for each thread
  • Can save these instances in thread-local storage (threading.local())
  • On linux, Python should seed them using os.urandom() - not system time - so you should get different streams for each thread.
like image 36
orip Avatar answered Oct 01 '22 13:10

orip