I have code that uses the BeautifulSoup
library for parsing, but it is very slow. The code is written in such a way that threads cannot be used.
Can anyone help me with this?
I am using BeautifulSoup for parsing and then saving the results into a DB. If I comment out the save statement, it still takes a long time, so the database is not the problem.
def parse(self, text):
    soup = BeautifulSoup(text)
    arr = soup.findAll('tbody')
    for i in range(0, len(arr) - 1):
        data = Data()
        soup2 = BeautifulSoup(str(arr[i]))
        arr2 = soup2.findAll('td')
        c = 0
        for j in arr2:
            if str(j).find("<a href=") > 0:
                data.sourceURL = self.getAttributeValue(str(j), '<a href="')
            else:
                if c == 2:
                    data.Hits = j.renderContents()
                #and few others...
            c = c + 1
        data.save()
Any suggestions?
Note: I already asked this question here, but it was closed due to incomplete information.
soup2 = BeautifulSoup(str(arr[i]))
arr2 = soup2.findAll('td')
Don't do this: converting each result back into a string and re-parsing it with a new BeautifulSoup object throws away the work the first parse already did. Just call arr2 = arr[i].findAll('td') instead.
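For example, the outer loop could search each tbody element directly (a minimal sketch, keeping the names from the question):

for tbody in soup.findAll('tbody'):
    arr2 = tbody.findAll('td')  # search the already-parsed element; no str() and no second parse
    # ... process the cells as before ...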
This will also be slow:
if str(j).find("<a href=") > 0:
    data.sourceURL = self.getAttributeValue(str(j), '<a href="')
Assuming that getAttributeValue gives you the href
attribute, use this instead:
a = j.find('a', href=True)  # find first <a> with href attribute
if a:
    data.sourceURL = a['href']
else:
    #....
In general, you shouldn't need to convert the BeautifulSoup object back into a string if all you want to do is parse it and extract values. Since the find and findAll methods give you back searchable objects, you can keep searching by invoking the find/findAll/etc. methods on the results.
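Putting those suggestions together, the method could look something like this rough sketch (Data, the c == 2 column, and the other columns come from your code and are only hinted at here; note that this visits every tbody, whereas your range(0, len(arr)-1) skipped the last one):

def parse(self, text):
    soup = BeautifulSoup(text)
    for tbody in soup.findAll('tbody'):
        data = Data()
        for c, cell in enumerate(tbody.findAll('td')):
            a = cell.find('a', href=True)  # first <a> carrying an href, if any
            if a:
                data.sourceURL = a['href']
            elif c == 2:
                data.Hits = cell.renderContents()
            # ...and the few other columns...
        data.save()

Everything here works on the objects returned by the first parse, so the document is only parsed once.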