Is there a flaw in this command to generate passwords?
head -c 8 /dev/random | uuencode -m - | sed -n '2s/=*$//;2p'
After generating a few passwords with it, I started to suspect that it tends to favor certain characters. Of course people are good at seeing patterns where there aren't any, so I decided to test the command on a larger sample. The results are below.
From a sample of 12,000 generated (12-character) passwords, here are the most and least common characters and how many times each appears.
Freq (top 10) | Char | Freq (bottom 10) | Char
-----|-----|-----|-----
2751 | I | 1833 | p
2748 | Q | 1831 | V
2714 | w | 1825 | 1
2690 | Y | 1821 | r
2673 | k | 1817 | 7
2642 | o | 1815 | R
2628 | g | 1815 | 2
2609 | 4 | 1809 | u
2605 | 8 | 1791 | P
2592 | c | 1787 | +
So for instance 'I' appears more than 1.5 times as often as '+'.
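To repeat the experiment, something like the following could be used. It is a sketch under a few assumptions: coreutils `base64` stands in for `uuencode -m -` (both produce standard base64 with the same alphabet), `/dev/urandom` replaces `/dev/random` so the loop cannot block, and the function names `gen_password_8` and `char_frequencies` are hypothetical.

```shell
#!/bin/sh
# Sketch only. Assumptions: coreutils `base64` replaces `uuencode -m -`
# (same base64 alphabet) and /dev/urandom replaces /dev/random.

# Hypothetical helper: one password from the 8-byte pipeline, with the
# trailing '=' padding and the newline removed (mirrors the sed step).
gen_password_8() {
    head -c 8 /dev/urandom | base64 | tr -d '=\n'
}

# Hypothetical helper: generate N passwords, split them into single
# characters and print "count char" pairs, most common first.
char_frequencies() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        gen_password_8
        echo                      # one password per line
        i=$((i + 1))
    done | fold -w 1 | LC_ALL=C sort | uniq -c | sort -rn
}
```

For example, `char_frequencies 12000 | head -n 10` lists the ten most common characters and `char_frequencies 12000 | tail -n 10` the ten least common. The loop starts several processes per password, so 12,000 iterations take a while.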
Is this statistically significant? If so, how can the command be improved?
Yes, I think it is going to be biased. uuencode requires 3 bytes for every 4 output characters. Since you are giving it 8 bytes, the last 3-byte group is one byte short and gets padded with something non-random, and that is going to bias the 12th character (and slightly affect the 11th too).
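The bit accounting behind this, as a sketch assuming the 3-bytes-to-4-characters grouping that uuencode and base64 share (each output character carries 6 bits):

```latex
\begin{aligned}
8\ \text{bytes} &= 64\ \text{random bits} \\
\lceil 8/3 \rceil &= 3\ \text{groups} \;\Rightarrow\; 3 \times 4 = 12\ \text{characters} = 72\ \text{bits of capacity} \\
72 - 64 &= 8\ \text{padding bits, all in the last group} \\
\text{last group} &= \underbrace{6}_{\text{9th}} + \underbrace{6}_{\text{10th}} + \underbrace{4 + 2_{\text{pad}}}_{\text{11th}} + \underbrace{6_{\text{pad}}}_{\text{12th}}\ \text{bits}
\end{aligned}
```

With 9 bytes the same count gives 72 random bits for 72 bits of capacity, so no character carries padding.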
Can you try
head -c 9 /dev/random | uuencode -m - | sed -n 2p
(with 9 instead of 8) and post the results? That should not have the same problem.
PS: you will also no longer need to strip the "=" padding, since 9 is a multiple of 3.
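For comparison, a sketch of the 9-byte variant, again assuming coreutils `base64` in place of `uuencode -m -` and `/dev/urandom` in place of `/dev/random`; `gen_password_9` is a hypothetical name. Nine bytes fill exactly three 3-byte groups, so every one of the 12 output characters carries 6 random bits and no `=` appears.

```shell
#!/bin/sh
# Sketch only. Assumptions: coreutils `base64` replaces `uuencode -m -`
# and /dev/urandom replaces /dev/random.

# Hypothetical helper: 9 bytes = 72 bits = 12 base64 characters exactly,
# so there is no padding to strip; only the trailing newline is removed.
gen_password_9() {
    head -c 9 /dev/urandom | base64 | tr -d '\n'
}

gen_password_9
echo
```

Rerunning the frequency test with this version should show all 64 characters at roughly the same rate.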
http://en.wikipedia.org/wiki/Uuencoding
PPS: it certainly appears statistically significant. For counts like these you expect a natural variation of about sqrt(mean), which is (guessing) sqrt(2000), or about 45. So three standard deviations, +/-135, or roughly 1865 to 2135, should contain more than 99% of the characters; you are seeing something much more systematic.
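The expected counts can also be worked out directly. This is a sketch assuming each password is the 11 characters the 8-byte command actually emits (8 bytes encode to 11 base64 characters plus one `=`, which the sed step strips), that the first 10 characters are uniform over all 64 symbols, and that the final character, carrying only 4 random bits, is uniform over 16 symbols:

```latex
\begin{aligned}
\text{uniform mean} &= \frac{12000 \times 11}{64} = 2062.5, \qquad \sqrt{2062.5} \approx 45.4 \\
\text{each of the 16 favoured characters} &= \frac{12000 \times 10}{64} + \frac{12000}{16} = 1875 + 750 = 2625 \\
\text{each of the other 48 characters} &= \frac{12000 \times 10}{64} = 1875
\end{aligned}
```

The observed top-10 counts (2592 to 2751) cluster around the first figure, and the bottom-10 counts (1787 to 1833) sit a little below the second, as expected for the lowest 10 of 48 characters.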
PPPS: neat idea.
Oops, I just realised that -m for uuencode forces base64 rather than the traditional uuencode algorithm, but the same idea applies.
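In the base64 case the effect is easy to see: the 11th character carries only 4 random bits followed by 2 zero bits, so its index in the base64 alphabet is always a multiple of 4 and it can take only 16 values. All ten of the most common characters in the question (I, Q, w, Y, k, o, g, 4, 8, c) are among those 16, and none of the ten least common are. A sketch that collects the distinct final characters, assuming coreutils `base64` in place of `uuencode -m -`; `last_chars` is a hypothetical name:

```shell
#!/bin/sh
# Sketch only. Assumption: coreutils `base64` replaces `uuencode -m -`.
# Prints the distinct final characters seen across N 8-byte passwords.
last_chars() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        # drop '=' and the newline, then keep only the 11th (last) character
        head -c 8 /dev/urandom | base64 | tr -d '=\n' | tail -c 1
        echo
        i=$((i + 1))
    done | LC_ALL=C sort -u | tr -d '\n'
    echo
}
```

Running `last_chars 1000` should print only the 16 characters `048AEIMQUYcgkosw` (in C-locale order), whereas the earlier characters of each password range over all 64.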