Efficient way to choose random element from list based on length

Tags:

python

Apologies if this is the wrong forum; it's my first question. I'm learning Python and writing a password generator as an exercise from www.practicepython.org.

I've written the following, but it can be really slow, so I guess I'm doing it inefficiently. I want to select a random word from the dictionary and then add ASCII characters to it. I want at least 2 ASCII characters in the password, so I use a while loop that keeps picking words until one is shorter than (length - 2).

This works fine if you say that you want the password to be 10 characters long, but if you restrict it to something like 5, I think the while loop has to go through so many iterations that it can take up to 30 seconds.

I can't find the answer via searching - guidance appreciated!

import string
import random
import nltk
from nltk.corpus import words

word = words.words()[random.randint(1, len(words.words()))]
ascii_str = (string.ascii_letters + string.digits + string.punctuation)

length = int(input("How long do you want the password to be? "))

while len(word) >= (length - 2):
    word = words.words()[random.randint(1, len(words.words()))]

print("The password is: " + word, end="")

for i in range(0, (length - len(word))):
    print(ascii_str[random.randint(1, len(ascii_str) - 1)], end="")

asked Nov 25 '25 19:11 by propofrolic



1 Answer

Start by calling words.words() just once and storing the result in a variable:

allwords = words.words()

That saves a lot of work, because now the nltk.corpus library won't try to load the whole list each time you try to get the length of the list or try to select a random word with the index you generated.

Next, use random.choice() to pick a random element from that list. That eliminates the need to keep passing in a list length:

word = random.choice(allwords)

# ...

while len(word) >= (length - 2):
    word = random.choice(allwords)
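
There is a second reason to prefer random.choice(): random.randint(a, b) includes both endpoints, so the question's random.randint(1, len(...)) never selects the first word and can occasionally produce an index one past the end of the list. A minimal sketch of the difference, using a small made-up list (sample_words is a hypothetical stand-in for the nltk word list):

```python
import random

# Hypothetical stand-in for words.words(), so the sketch runs without nltk.
sample_words = ["apple", "fig", "banana", "kiwi"]

# random.randint(a, b) includes both a and b, so the valid index range
# for a list is 0 through len(list) - 1.
index = random.randint(0, len(sample_words) - 1)
word_by_index = sample_words[index]

# random.choice() picks an element directly, with no index arithmetic.
word_by_choice = random.choice(sample_words)

print(word_by_index, word_by_choice)
```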

Next, you could group the words by length first:

allwords = words.words()
by_length = {}
for word in allwords:
    by_length.setdefault(len(word), []).append(word)

This gives you a dictionary with keys representing the length of the words; the nltk corpus has words between 1 and 24 letters long. Each value in the dictionary is a list of words of the same length, so by_length[12] would give you a list of words that are all exactly 12 characters long.
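
To make the shape of that dictionary concrete, here is the same grouping loop run on a tiny made-up list (sample_words is hypothetical; the real corpus is far larger):

```python
# Hypothetical sample list standing in for words.words().
sample_words = ["a", "to", "be", "cat", "dog", "house"]

by_length = {}
for word in sample_words:
    # setdefault() creates the empty list the first time a length is seen.
    by_length.setdefault(len(word), []).append(word)

print(by_length)
# {1: ['a'], 2: ['to', 'be'], 3: ['cat', 'dog'], 5: ['house']}

# There is no 4-letter word in the sample, so that key does not exist.
print(4 in by_length)
# False
```

The missing key for length 4 is why the next step checks whether a length is present before using it.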

This allows you to pick words of a specific length:

# start with the desired length, and see if there are words this long in the
# dictionary, but don’t presume that all possible lengths exist:
wordlength = length - 2
while wordlength > 0 and wordlength not in by_length:
    wordlength -= 1

# we picked a length, but it could be 0, -1 or -2, so start with an empty word
# and then pick a random word from the list with words of the right length.
word = ''
if wordlength > 0:
    word = random.choice(by_length[wordlength])

Now word is the longest random word that'll fit your criteria: at least 2 characters shorter than the required length, and taken at random from the word list.

More importantly: we only picked a random word once. Provided you keep the by_length dictionary around for longer and re-use it in a password-generating function, that's a big win.
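
One way to arrange that re-use is sketched below. The function names group_by_length and make_password, and the sample word list, are assumptions made for illustration; with nltk you would pass words.words() to group_by_length() instead.

```python
import random
import string

ASCII_CHARS = string.ascii_letters + string.digits + string.punctuation


def group_by_length(wordlist):
    """Map each word length to the list of words of that length."""
    by_length = {}
    for word in wordlist:
        by_length.setdefault(len(word), []).append(word)
    return by_length


def make_password(length, by_length):
    """Return a password of `length` characters: a word plus random padding."""
    # Leave room for at least 2 padding characters, then step down to the
    # nearest word length that actually exists.
    wordlength = length - 2
    while wordlength > 0 and wordlength not in by_length:
        wordlength -= 1

    word = ''
    if wordlength > 0:
        word = random.choice(by_length[wordlength])

    # Fill the remaining positions with random ASCII characters.
    padding = ''.join(
        random.choice(ASCII_CHARS) for _ in range(length - len(word))
    )
    return word + padding


# Build the grouping once (hypothetical sample list), then re-use it.
BY_LENGTH = group_by_length(["a", "to", "cat", "house", "password"])

print(make_password(10, BY_LENGTH))
print(make_password(5, BY_LENGTH))
```

The grouping is the expensive part, so it is built once; each later call to make_password() only needs a dictionary lookup and one random.choice().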

Picking the nearest available length from by_length can be done without stepping through every possible length one step at a time if you use bisection, but I’ll leave adding that as an exercise for the reader.
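
For anyone who wants to check their attempt at that exercise, here is one possible sketch using the standard library's bisect module. The helper name nearest_length and the sample by_length mapping are hypothetical, and this is only one of several ways to do it:

```python
import bisect

# Hypothetical by_length mapping with gaps at lengths 4, 6 and 7.
by_length = {1: ["a"], 2: ["to"], 3: ["cat"], 5: ["house"], 8: ["password"]}

# Sort the available lengths once; bisect needs a sorted sequence.
available = sorted(by_length)


def nearest_length(target):
    """Return the largest available length <= target, or 0 if there is none."""
    # bisect_right gives the insertion point after any entry equal to target,
    # so the entry just before it is the largest length not exceeding target.
    pos = bisect.bisect_right(available, target)
    return available[pos - 1] if pos else 0


print(nearest_length(7))   # 5
print(nearest_length(8))   # 8
print(nearest_length(0))   # 0
```

With only a couple of dozen distinct lengths the linear loop is already fast, so this mostly matters as a demonstration of the technique.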

answered Nov 27 '25 08:11 by Martijn Pieters



