 

Is this password generator biased? [closed]

Is there a flaw in this command to generate passwords?

head -c 8 /dev/random | uuencode -m - | sed -n '2s/=*$//;2p'

After generating a few passwords with it, I started to suspect that it tends to favor certain characters. Of course people are good at seeing patterns where there aren't any, so I decided to test the command on a larger sample. The results are below.

From a sample of 12,000 generated (11-character) passwords, here are the most and least common characters and how many times each appears.

  TOP 10          BOTTOM 10

Freq | Char      Freq | Char
-----|-----      -----|-----
2751 | I         1833 | p
2748 | Q         1831 | V
2714 | w         1825 | 1
2690 | Y         1821 | r
2673 | k         1817 | 7
2642 | o         1815 | R
2628 | g         1815 | 2
2609 | 4         1809 | u
2605 | 8         1791 | P
2592 | c         1787 | +

So for instance 'I' appears more than 1.5 times as often as '+'.
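A minimal way to rerun this experiment, sketched in Python rather than shell. `os.urandom` stands in for `/dev/random` and the standard `base64` module stands in for `uuencode -m`, which uses the same standard base64 alphabet. Both substitutions are assumptions, and the function names are hypothetical. Note that 8 bytes encode to 12 base64 characters ending in one "=", so once the padding is stripped each password is 11 characters long.

```python
import base64
import os
from collections import Counter


def make_password(nbytes: int = 8) -> str:
    """Hypothetical stand-in for
    head -c N /dev/random | uuencode -m - | sed -n '2s/=*$//;2p'
    Base64-encode nbytes random bytes and strip the trailing '=' padding."""
    # os.urandom replaces /dev/random here (assumption)
    return base64.b64encode(os.urandom(nbytes)).decode("ascii").rstrip("=")


def char_frequencies(samples: int = 12_000, nbytes: int = 8) -> Counter:
    """Count how often each character appears across `samples` passwords."""
    counts = Counter()
    for _ in range(samples):
        counts.update(make_password(nbytes))  # counts each character of the string
    return counts


if __name__ == "__main__":
    ranked = char_frequencies().most_common()
    print("TOP 10:   ", ranked[:10])
    print("BOTTOM 10:", ranked[-10:])
```

Running it prints the ten most and least common characters, in the same spirit as the table in the question.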

Is this statistically significant? If so, how can the command be improved?

asked Aug 23 '11 03:08 by Joe Nelson



1 Answer

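Yes, I think it is going to be biased. uuencode turns every 3 input bytes into 4 output characters, with each character carrying 6 bits. Since you are giving it 8 bytes, the last group contains only 2 real bytes. It is padded out with zero bits, which are not random, and a trailing "=" is added. That biases the last character of your password (the 11th, once the "=" is stripped): it carries only 4 random bits followed by 2 zero bits, so it can only be one of 16 characters instead of 64.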
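To check the claim about the final character, this hypothetical helper varies the last input byte over all 256 values and collects every character that can end up in the last non-padding position. It uses Python's standard `base64` module as a stand-in for `uuencode -m`. The earlier bytes are set to zero because they do not feed into that last character.

```python
import base64


def last_char_values(nbytes: int) -> set:
    """Hypothetical helper: every character that can appear in the last
    non-'=' position when base64-encoding nbytes bytes."""
    prefix = bytes(nbytes - 1)  # zeros; earlier bytes don't affect the last char
    seen = set()
    for b in range(256):
        encoded = base64.b64encode(prefix + bytes([b])).decode("ascii").rstrip("=")
        seen.add(encoded[-1])
    return seen


print(sorted(last_char_values(8)))  # the restricted final character for 8 bytes
print(len(last_char_values(9)))     # for comparison: 9 bytes, no padding
```

With 8 bytes, only the 16 characters A E I M Q U Y c g k o s w 0 4 8 can appear in position 11, and all ten characters in the question's top-10 list belong to that set. As rough arithmetic, assuming 11-character passwords: the other 10 positions give each character about 10 * 12000 / 64 = 1875 occurrences, and position 11 adds about 12000 / 16 = 750 more to each of the 16 favoured characters. That makes roughly 2625, which is close to the observed top counts.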

Can you try

head -c 9 /dev/random | uuencode -m - | sed -n '2p'

(with 9 bytes instead of 8) and post the results? That should not have the same problem.
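A quick sketch of why 9 bytes is clean. Nine bytes are 72 bits, exactly 12 groups of 6, so there is no padding and each output character is just one 6-bit slice of the random input. The check below uses Python, with `base64.b64encode` standing in for `uuencode -m`. The names `six_bit_groups` and `ALPHABET` are illustrative. It decodes each character back to its 6-bit index and compares the result with slices taken directly from the input bits.

```python
import base64
import os

# Standard base64 alphabet (RFC 4648), the one uuencode -m uses
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def six_bit_groups(data: bytes) -> list:
    """Split data (length a multiple of 3) into 6-bit values, most significant first."""
    if len(data) % 3 != 0:
        raise ValueError("length must be a multiple of 3 for padding-free base64")
    value = int.from_bytes(data, "big")
    total_bits = len(data) * 8
    return [(value >> (total_bits - 6 * (i + 1))) & 0x3F
            for i in range(total_bits // 6)]


data = os.urandom(9)  # stand-in for head -c 9 /dev/random
encoded = base64.b64encode(data).decode("ascii")
assert len(encoded) == 12 and "=" not in encoded  # no padding to strip
# every character is exactly one 6-bit slice of the random input
assert [ALPHABET.index(c) for c in encoded] == six_bit_groups(data)
print(encoded)
```

If the input bytes are uniformly random, each 6-bit slice is too, so all 64 characters are equally likely in every position.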

P.S. You will also no longer need to strip the "=" padding, since 9 is a multiple of 3.

http://en.wikipedia.org/wiki/Uuencoding

P.P.S. It certainly appears statistically significant. You expect a natural variation (standard deviation) of about sqrt(mean) in each count, which is (guessing) sqrt(2000), or about 45. So three deviations either side, +/-135, or roughly 1865-2135, should contain well over 99% of the counts. You are seeing something much more systematic.
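The estimate above can be made a little more concrete. This Python sketch plugs the question's own counts into a binomial model. It assumes 11 characters per password (8 bytes give 12 base64 characters, one of which is the stripped "=") and a uniform choice among 64 characters. The names are illustrative. The binomial standard deviation sqrt(n p (1 - p)) is within 1% of the sqrt(mean) rule of thumb.

```python
import math

# Counts from the question's table (12,000 passwords)
top10 = {"I": 2751, "Q": 2748, "w": 2714, "Y": 2690, "k": 2673,
         "o": 2642, "g": 2628, "4": 2609, "8": 2605, "c": 2592}
bottom10 = {"p": 1833, "V": 1831, "1": 1825, "r": 1821, "7": 1817,
            "R": 1815, "2": 1815, "u": 1809, "P": 1791, "+": 1787}

PASSWORDS = 12_000
CHARS_PER_PASSWORD = 11  # assumption: 12 base64 chars minus the stripped '='
ALPHABET_SIZE = 64

n = PASSWORDS * CHARS_PER_PASSWORD  # total characters counted
p = 1 / ALPHABET_SIZE               # chance of a given character if unbiased
mean = n * p                        # expected count per character
sd = math.sqrt(n * p * (1 - p))     # binomial standard deviation


def z_score(count: int) -> float:
    """How many standard deviations a count lies from the unbiased mean."""
    return (count - mean) / sd


for table in (top10, bottom10):
    for char, count in table.items():
        print(f"{char!r}: {count} -> z = {z_score(count):+.1f}")
```

Under these assumptions, the expected count is 2062.5 with a standard deviation of about 45. Even the least common top-10 character (c, 2592) sits more than 11 deviations above the mean, and '+' (1787) sits about 6 below it.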

P.P.P.S. Neat idea.

Oops, I just realised that -m makes uuencode use base64 rather than the traditional uuencode algorithm, but the same idea applies.

answered Oct 21 '22 13:10 by andrew cooke
