Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to speed up string match against a list of strings?

Tags:

python

string

I have a list of strings. I am trying to find if any of these strings in the list appear in the english dictionary stored as another list.

I observed the time it takes to find a match grows linearly. However, it becomes way too long when the original list has a few thousand strings.

On my development EC2 instance, it takes ~2 seconds for 100 strings, ~15 seconds for 700 strings, ~100 seconds for 5000 strings, and ~800 seconds for 40000 strings!

Is there a way to speed this up? Thanks in advance.

    matching_word = ""
    for w in all_strings:
            if w in english_dict:
                    if matching_word: # More than one possible word
                            matching_word = matching_word + ",  " + w
                    else:
                            matching_word = w
like image 695
RebornCodeLover Avatar asked Dec 24 '22 17:12

RebornCodeLover


2 Answers

Instead of creating a string and extend it you can use list comprehension for that:

matching_words = [x for x in all_strings if x in english_dict]

Now you can make a string from that list using ", ".join(matching_sords).

Another option - using two sets you can use the & operator:

set(all_strings) & set(english_dict)

The result here will be a set with the items you have in both lists.

like image 182
Dekel Avatar answered Dec 29 '22 19:12

Dekel


Provided you don't have issues with memory, turn your english_dict to set (if you do have memory issues, load your dictionary as a set to begin with): english_dict = set(english_dict) (prior to the loop, of course)

That should significantly speed up the look-up. If that's not enough, you'll have to resort to creating search trees and similar search optimizations.

like image 30
zwer Avatar answered Dec 29 '22 19:12

zwer