
Is Python's random number generation easily reproducible?

I was reading about Python's random module in the standard library. It amazes me that when I set the seed and produce a few random numbers:

import random

random.seed(1)  # fixed seed, so the sequence below is repeatable
for i in range(5):
    print(random.random())

The numbers produced are exactly the same as the sample in the article. I think it's safe to say the algorithm is deterministic when the seed is set.

And when the seed is not set, as I understand it, the standard library seeds with time.time(). Now suppose an online service uses random.random() to generate a captcha code. Could a hacker use the same random generator to reproduce the captcha easily?

  1. Let's assume the hacker knows the algorithm that converts a random number into a captcha code. Otherwise, it seems nearly impossible.
  2. Since random.seed() is called when the module is imported, I assume that for a web application the time used as the seed is close to the time the request is sent (within a few seconds). Wouldn't it be easy to calibrate with a few tries?

Am I worrying too much, or is this a real vulnerability?

NeoWang asked Jul 11 '15 11:07



People also ask

Is Python random number generator deterministic?

The numbers or data generated by Python's random module are not truly random; they are pseudo-random, i.e., deterministic, because the module is a pseudo-random number generator (PRNG). The random module uses the seed value as the starting point from which the sequence is generated.

How does Python random generate random numbers?

For "secure" random numbers, Python doesn't actually generate them: it gets them from the operating system, which has a special driver that gathers entropy from various real-world sources, such as variations in timing between keystrokes and disk seeks.

Is Python random pseudorandom?

Python, like most programming languages, uses a pseudo-random generator. Python's random module is based on the Mersenne Twister algorithm and produces floats with 53-bit precision.

Is Python random actually random?

Most random data generated with Python is not fully random in the scientific sense of the word. Rather, it is pseudorandom: generated with a pseudorandom number generator (PRNG), which is essentially any algorithm for generating seemingly random but still reproducible data.


1 Answer

It shouldn't surprise you that the sequence is deterministic after seeding: that's the whole point of seeding. random.random is a PRNG, a pseudo-random number generator. This is not unique to Python; every language's simple random source is deterministic in this way.
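To make the determinism concrete, here is a minimal sketch. The helper name first_n is made up for illustration; it builds a private random.Random instance so the module-level generator is left untouched.

```python
import random

def first_n(seed, n=5):
    """Return the first n floats from a generator seeded with `seed`.

    `first_n` is a hypothetical helper, not part of the random module.
    """
    rng = random.Random(seed)  # private generator with its own state
    return [rng.random() for _ in range(n)]

run_a = first_n(1)
run_b = first_n(1)
run_c = first_n(2)

print(run_a == run_b)  # same seed, same sequence
print(run_a == run_c)  # different seed, different sequence
```

Anyone who knows the seed (and the Python version's algorithm) can regenerate the whole sequence this way.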

And yes, people who are genuinely concerned about security will worry that an attacker could reproduce the sequence. That's why other sources of randomness are available, like os.urandom, but they are more expensive.
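As a rough sketch of what the alternative could look like for a captcha, the standard library's random.SystemRandom class draws from os.urandom and ignores seeding. The alphabet, the length, and the function name make_captcha are assumptions chosen for illustration, not anything from the question.

```python
import random
import string

# Hypothetical captcha alphabet; a real service would pick its own.
ALPHABET = string.ascii_uppercase + string.digits

def make_captcha(length=6):
    """Build a captcha string from OS-provided randomness."""
    sys_rng = random.SystemRandom()  # backed by os.urandom; seed() has no effect
    return "".join(sys_rng.choice(ALPHABET) for _ in range(length))

code = make_captcha()
print(code)
```

Because there is no seed to recover, knowing when the process started does not help an attacker predict the output.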

But the problem is not as bad as you say: for a web request, typically a process handles more than one request, so the module is initialized at some unknown point in the past, not when the web request was received.
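For a feel of what guessing the seed would involve, here is a toy sketch under strong assumptions: the service seeds explicitly with a whole-second integer timestamp, and the captcha is eight digits derived directly from random.random(). The names captcha_from and recover_seed and the timestamp value are invented for the example. Note also that CPython seeds from OS randomness (os.urandom) when the operating system provides it and only falls back to the system time otherwise, so a default-seeded generator is usually not searchable like this.

```python
import random

def captcha_from(rng, length=8):
    """Hypothetical captcha scheme: one digit per call to rng.random()."""
    return "".join(str(int(rng.random() * 10)) for _ in range(length))

def recover_seed(observed, start, stop):
    """Try each integer seed in [start, stop); return the first match or None."""
    for candidate in range(start, stop):
        if captcha_from(random.Random(candidate)) == observed:
            return candidate
    return None

# Simulated victim, seeded with a whole-second timestamp (an assumption).
secret_seed = 1436612820
observed = captcha_from(random.Random(secret_seed))

# The attacker only knows the seed lies within an hour either side.
found = recover_seed(observed, secret_seed - 3600, secret_seed + 3600)
print(found is not None)
```

The search cost grows with the width of the window, which is why an unknown start time for a long-running process makes this much harder than the "few seconds" scenario in the question.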

Ned Batchelder answered Sep 16 '22 12:09
