I have a file containing roughly all the words in English (~60k words, ~500k characters). I want to test whether a certain word I receive as input is "in English" (i.e. if this exact word is in the list).
What would be the most efficient way to do this in Python?
The trivial solution is to load the file into a list and check whether the word is in that list. The list can be sorted, which I believe will shrink the complexity to O(log n). However, I'm not sure how Python searches through lists, and whether there's a performance penalty when such a large list is in memory. Can I "abuse" the fact that I can put a cap on the length of words? (e.g. say the longest one is 15 characters long) A rough sketch of what I have in mind is below.
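This is only a sketch of the two list-based approaches, assuming a file named `words.txt` with one word per line (both names are assumptions, not from the original post):

```python
import bisect

# Load the word list; assumes one word per line in "words.txt" (hypothetical name).
with open("words.txt") as f:
    words = [line.strip() for line in f]

# "in" on a plain list is a linear scan: O(n) per lookup.
print("apple" in words)

# Sorting the list allows O(log n) lookups via binary search.
words.sort()

def in_sorted(sorted_words, word):
    """Return True if word is present, using bisect for binary search."""
    i = bisect.bisect_left(sorted_words, word)
    return i < len(sorted_words) and sorted_words[i] == word

print(in_sorted(words, "apple"))
```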
Please note I run the application on a machine with lots of memory, so I care less about memory consumption than about speed and CPU utilization.
Thanks
The Python set is what you should try. From the documentation:
A set object is an unordered collection of distinct hashable objects. Common uses include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference.
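A minimal sketch of how this might look, again assuming a file named `words.txt` with one word per line (both are assumptions):

```python
# Build the set once; membership tests are then O(1) on average thanks to hashing.
with open("words.txt") as f:
    english_words = set(line.strip().lower() for line in f)

def is_english(word):
    """Return True if the exact word appears in the word list."""
    return word.lower() in english_words

print(is_english("hello"))   # True if "hello" is in the file
print(is_english("qwzrtx"))  # False
```

Since you have plenty of memory, holding ~60k strings in a set is negligible, and each lookup avoids scanning the list entirely.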