I have run into a speed issue searching through a very large list. I have a file with a lot of errors and very strange words in it. I am trying to use difflib to find the closest match in a dictionary file that has 650,000 words in it. The approach below works really well but is very slow, and I was wondering if there is a better way to approach this problem. This is the code:
from difflib import SequenceMatcher

headWordList = []  # This is a list of 650,000 words

# Read the words to check, one per line
sentenceList = []
openFile = open("sentences.txt", "r")
for line in openFile:
    sentenceList.append(line.strip())

count = 0
for y in sentenceList:
    if y not in headWordList:
        # Find the dictionary word with the highest similarity ratio
        percentage = 0
        word = None
        for x in headWordList:
            ratio = SequenceMatcher(None, y.lower(), x).ratio()
            if ratio > percentage:
                percentage = ratio
                word = x
        # Only replace the word if the best match is close enough
        if percentage > 0.86:
            sentenceList[count] = word
    count = count + 1
Thanks for the help; software engineering is not even close to my strong suit. Much appreciated.
Lists can be faster for plain iteration, but for checking whether an element is in a collection, sets win: they are implemented with hash tables, so Python does not have to scan the whole collection, and the average time complexity of a membership test is O(1) rather than the O(n) of a list. (NumPy arrays can likewise be much faster than plain Python lists for bulk numeric work, but for this problem the membership test is the part that matters.)
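A minimal sketch of that point, reusing headWordList and sentenceList from the question (headWordSet is a new name introduced here): build the set once, and only words that fail the fast lookup fall through to the expensive SequenceMatcher loop.

# Build the lookup set once; each membership test is then O(1) on average
# instead of a linear scan over the 650,000-entry list.
headWordSet = set(headWordList)

for y in sentenceList:
    word = y.strip().lower()
    if word in headWordSet:   # fast hash lookup
        continue              # already a valid dictionary word, nothing to fix
    # the original SequenceMatcher loop would go here, but now it only
    # runs for the unknown words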
Two things that might provide some small help:
1) Use the approach in this SO answer to read through your large file efficiently.
2) Change your code from
for x in headWordList:
    m = SequenceMatcher(None, y.lower(), x)
to
yLower = y.lower()
for x in headWordList:
    m = SequenceMatcher(None, yLower, x)
You're converting each sentence to lower 650,000 times. No need for that.
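Putting the set lookup and the hoisted lower() together, here is a minimal sketch of the whole loop under the question's assumptions (sentences.txt holds one word per line, headWordList holds 650,000 lowercase words; headWordSet is a new name). It uses difflib.get_close_matches, which runs the same SequenceMatcher ratio search and keeps only candidates above a cutoff, so the question's 0.86 threshold maps onto the cutoff parameter:

from difflib import get_close_matches

headWordSet = set(headWordList)              # fast membership tests

with open("sentences.txt", "r") as openFile:
    sentenceList = [line.strip() for line in openFile]

for count, y in enumerate(sentenceList):
    yLower = y.lower()                       # lower-case once per word, not once per comparison
    if yLower in headWordSet:
        continue                             # already a known dictionary word
    # get_close_matches returns at most one headword scoring above the cutoff
    matches = get_close_matches(yLower, headWordList, n=1, cutoff=0.86)
    if matches:
        sentenceList[count] = matches[0]

As a bonus, get_close_matches checks the cheaper real_quick_ratio() and quick_ratio() upper bounds before computing the full ratio(), so it can discard obviously poor candidates early, which usually saves more time than the lower() hoisting alone.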