The following code is causing me enormous headaches:
def extract_by_letters(letters, dictionary):
for word in dictionary:
for letter in letters:
if word.count(letter) != letters.count(letter):
if word in dictionary: #I cant leave this line out
dictionary.remove(word)
return dictionary
First of all: the 'if word in dictionary' line. Why can't I leave this out? I get an error saying ValueError: list.remove(x): x not in list
But it is, obviously.
Second: Dictionary is a file of about 50,000 words separated by linebreaks. The above code takes about 2 minutes to run... Wayyy too long. I played with the code for a bit, and I found that the line:
if word.count(letter) != letters.count(letter):
seems to be causing all my problems. If I take that line out (which totally screws up the output), the function takes about 2 seconds to go through the dictionary.
I thought it was the count functions, but it's not.
if I change the if statement to something like:
print word.count(letter)
print letters.count(letter)
the function takes about 3 seconds to run.
I'm convinced it's the comparison. Any other suggestions? Is there a better way to do this?
Thanks in advance!
The reason you get the exception is that if the letter count matches for more than one letter, you are trying to remove the word more than once
The reason it is so slow is that you have loops inside loops inside loops.
If you would write a sentence or two about what the function is supposed to do, it would be much easier to refactor it. In the mean time, this would stop you checking whether a word needs to be removed once you have already removed it.
def extract_by_letters(letters, dictionary):
for word in dictionary[:]: # bad idea to change this while you iterate over it
for letter in letters:
if word.count(letter) != letters.count(letter):
dictionary.remove(word)
break
return dictionary
If dictionary is a set you should get some speed up. If dictionary is a list this should give a huge speedup
Try building the output instead of deleting from it:
def extract_by_letters(letters, dictionary):
d = []
for word in dictionary:
for letter in letters:
if word.count(letter)>0:
d.append(word)
break
return d
Or, you could use regular expressions:
import re
def extract_by_letters(letters, dictionary):
regex = re.compile('['+letters+']')
d=[]
for word in dictionary:
if regex.search(word):
d.append(word)
return d
Or, perhaps the simplest way:
import re
def extract_by_letters(letters, dictionary):
regex = re.compile('['+letters+']')
return [word for word in dictionary if regex.search(word)]
This last one takes no noticeable time to scan /usr/share/dict/words on my Mac, which is a list of 234936 words.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With