Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Most Efficient Way to Find Whether a Large List Contains a Specific String (Python)

Tags:

python

string

I have a file containing roughly all the words in English (~60k words, ~500k characters). I want to test whether a certain word I receive as input is "in English" (i.e. if this exact word is in the list).

What would be the most efficient way to do this in Python?

The trivial solution is to load the file into a list and check whether the word is in that list. The list can be sorted, which I believe will shrink the complexity to O(logn). However I'm not sure about how Python implements searching through lists, and whether there's a performance penalty if such a large list is in memory. Can I "abuse" the fact I can put a cap on the length of words? (e.g. say the longest one is 15 characters long).

Please note I run the application on a machine with lots of memory, so I care less for memory consumption than for speed and CPU utilization.

Thanks

like image 904
Roee Adler Avatar asked May 16 '09 12:05

Roee Adler


1 Answers

The python Set is what you should try.

A set object is an unordered collection of distinct hashable objects. Common uses include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference.

like image 150
gimel Avatar answered Oct 17 '22 07:10

gimel