I have a list of strings, and I am trying to find which of these strings appear in an English dictionary stored as another list.
I observed that the time it takes to find the matches grows with the size of the list, and it becomes far too long once the original list has a few thousand strings.
On my development EC2 instance, it takes ~2 seconds for 100 strings, ~15 seconds for 700 strings, ~100 seconds for 5000 strings, and ~800 seconds for 40000 strings!
Is there a way to speed this up? Thanks in advance.
matching_word = ""
for w in all_strings:
    if w in english_dict:
        if matching_word:  # More than one possible word
            matching_word = matching_word + ", " + w
        else:
            matching_word = w
Instead of building up a string by repeated concatenation, you can use a list comprehension:
matching_words = [x for x in all_strings if x in english_dict]
Then you can make a string from that list with ", ".join(matching_words).
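To illustrate, here is a minimal sketch with made-up sample data (the `all_strings` and `english_dict` contents below are hypothetical placeholders):

```python
# Hypothetical sample data for illustration
all_strings = ["qwxz", "apple", "zzz", "banana"]
english_dict = ["apple", "banana", "cherry"]

# Collect every string that appears in the dictionary
matching_words = [x for x in all_strings if x in english_dict]

# Join the matches into one comma-separated string
result = ", ".join(matching_words)
print(result)  # apple, banana
```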
Another option: convert both lists to sets and use the & (intersection) operator:
set(all_strings) & set(english_dict)
The result is a set containing the items that appear in both lists.
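A quick sketch of the intersection approach, again with hypothetical sample data:

```python
all_strings = ["qwxz", "apple", "zzz", "banana"]
english_dict = ["apple", "banana", "cherry"]

# Set intersection: items present in both lists (order is not preserved)
common = set(all_strings) & set(english_dict)
print(sorted(common))  # ['apple', 'banana']
```

Note that a set discards duplicates and does not preserve the original order; if order matters, the list-comprehension approach is the better fit.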
Provided you don't have issues with memory, turn your english_dict into a set (if you do have memory issues, load your dictionary as a set to begin with):
english_dict = set(english_dict)
(prior to the loop, of course)
That should significantly speed up the lookups: a membership test on a list is O(n), while on a set it is O(1) on average. If that's not enough, you'll have to resort to search trees and similar search optimizations.
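To see the difference, here is a rough benchmark sketch using synthetic random words (the sizes and data are arbitrary assumptions, and absolute timings will vary by machine):

```python
import random
import string
import time

# Synthetic "dictionary": 20,000 random 5-letter words (hypothetical data)
words = ["".join(random.choices(string.ascii_lowercase, k=5)) for _ in range(20_000)]
dict_list = words
dict_set = set(words)

# 1,000 candidate strings drawn from the dictionary
candidates = random.sample(words, 1_000)

# Membership test against a list: scans the list each time, O(n) per lookup
t0 = time.perf_counter()
hits_list = [w for w in candidates if w in dict_list]
t_list = time.perf_counter() - t0

# Membership test against a set: hash lookup, O(1) on average
t0 = time.perf_counter()
hits_set = [w for w in candidates if w in dict_set]
t_set = time.perf_counter() - t0

assert hits_list == hits_set  # same results, very different speed
print(f"list: {t_list:.3f}s  set: {t_set:.5f}s")
```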