I have a file containing roughly all the words in English (~60k words, ~500k characters). I want to test whether a certain word I receive as input is "in English" (i.e. if this exact word is in the list).
What would be the most efficient way to do this in Python?
The trivial solution is to load the file into a list and check whether the word is in that list. The list can be sorted, which I believe will shrink the complexity to O(log n). However, I'm not sure how Python searches through lists, and whether there's a performance penalty when such a large list is in memory. Can I "abuse" the fact that I can put a cap on the length of words? (e.g. say the longest one is 15 characters long) A rough sketch of what I have in mind is below.
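This is only a sketch of the two list-based approaches, assuming a file named `words.txt` with one word per line (both names are assumptions, not from the original post):

```python
import bisect

# Load the word list; assumes one word per line in "words.txt" (hypothetical name).
with open("words.txt") as f:
    words = [line.strip() for line in f]

# "in" on a plain list is a linear scan: O(n) per lookup.
print("apple" in words)

# Sorting the list allows O(log n) lookups via binary search.
words.sort()

def in_sorted(sorted_words, word):
    """Return True if word is present, using bisect for binary search."""
    i = bisect.bisect_left(sorted_words, word)
    return i < len(sorted_words) and sorted_words[i] == word

print(in_sorted(words, "apple"))
```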
Please note I run the application on a machine with lots of memory, so I care less about memory consumption than about speed and CPU utilization.
Thanks
The Python set is what you should try. From the documentation:
A set object is an unordered collection of distinct hashable objects. Common uses include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference.
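A minimal sketch of how this might look, again assuming a file named `words.txt` with one word per line (both are assumptions):

```python
# Build the set once; membership tests are then O(1) on average thanks to hashing.
with open("words.txt") as f:
    english_words = set(line.strip().lower() for line in f)

def is_english(word):
    """Return True if the exact word appears in the word list."""
    return word.lower() in english_words

print(is_english("hello"))   # True if "hello" is in the file
print(is_english("qwzrtx"))  # False
```

Since you have plenty of memory, holding ~60k strings in a set is negligible, and each lookup avoids scanning the list entirely.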