
Beautiful Soup: 'ResultSet' object has no attribute 'find_all'?

I am trying to scrape a simple table using Beautiful Soup. Here is my code:

import requests
from bs4 import BeautifulSoup

url = 'https://gist.githubusercontent.com/anonymous/c8eedd8bf41098a8940b/raw/c7e01a76d753f6e8700b54821e26ee5dde3199ab/gistfile1.txt'
r = requests.get(url)

soup = BeautifulSoup(r.text)
table = soup.find_all(class_='dataframe')

first_name = []
last_name = []
age = []
preTestScore = []
postTestScore = []

for row in table.find_all('tr'):
    col = table.find_all('td')

    column_1 = col[0].string.strip()
    first_name.append(column_1)

    column_2 = col[1].string.strip()
    last_name.append(column_2)

    column_3 = col[2].string.strip()
    age.append(column_3)

    column_4 = col[3].string.strip()
    preTestScore.append(column_4)

    column_5 = col[4].string.strip()
    postTestScore.append(column_5)

columns = {'first_name': first_name, 'last_name': last_name, 'age': age, 'preTestScore': preTestScore, 'postTestScore': postTestScore}
df = pd.DataFrame(columns)
df

However, whenever I run it, I get this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-116-a900c2872793> in <module>()
     14 postTestScore = []
     15 
---> 16 for row in table.find_all('tr'):
     17     col = table.find_all('td')
     18 

AttributeError: 'ResultSet' object has no attribute 'find_all'

I have read around a dozen StackOverflow questions about this error, and I cannot figure out what I am doing wrong.

Anton asked Jun 08 '14 16:06


3 Answers

The table variable holds a ResultSet, which behaves like a list. You need to call find_all on its members (even though you know it has only one member), not on the ResultSet itself.

>>> type(table)
<class 'bs4.element.ResultSet'>
>>> type(table[0])
<class 'bs4.element.Tag'>
>>> len(table[0].find_all('tr'))
6
>>>
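To make the same check reproducible without the network, here is a minimal sketch. The HTML string is a made-up stand-in for the page in the question (the real page has more rows and columns), and "html.parser" is just one parser choice.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page in the question: one table, three rows.
HTML = """
<table class="dataframe">
  <tr><th>first_name</th><th>last_name</th></tr>
  <tr><td>Ann</td><td>Example</td></tr>
  <tr><td>Bob</td><td>Sample</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find_all always returns a ResultSet (a list subclass), even for one match.
tables = soup.find_all(class_="dataframe")

# Index into the ResultSet to get an actual Tag, which does have find_all.
table = tables[0]
rows = table.find_all("tr")
row_count = len(rows)
```

Calling tables.find_all('tr') here would raise the same AttributeError as in the question, because the ResultSet itself has no find_all method.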
Ralf Haring answered Oct 19 '22 19:10


table = soup.find_all(class_='dataframe')

This gives you a result set, i.e. all the elements that match the class. You can either iterate over them or, if you know there is only one element with the class dataframe, use find instead. From your code it seems the latter is what you need to deal with the immediate problem:

table = soup.find(class_='dataframe')

However, that is not all:

for row in table.find_all('tr'):
    col = table.find_all('td')

You probably want to iterate over the tds in the row here, rather than the whole table. (Otherwise you'll just see the first row over and over.)

for row in table.find_all('tr'):
    for col in row.find_all('td'):
        print(col.get_text(strip=True))  # handle each cell here
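Putting both fixes together, the loop from the question could look roughly like the sketch below. The HTML and its values are invented sample data, the column names are taken from the question, and skipping rows that do not have exactly five td cells is my assumption about how to handle the header row.

```python
from bs4 import BeautifulSoup

# Hypothetical sample data; the real page is fetched with requests in the question.
HTML = """
<table class="dataframe">
  <tr><th>first_name</th><th>last_name</th><th>age</th>
      <th>preTestScore</th><th>postTestScore</th></tr>
  <tr><td>Ann</td><td>Example</td><td>42</td><td>4</td><td>25</td></tr>
  <tr><td>Bob</td><td>Sample</td><td>52</td><td>24</td><td>94</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
table = soup.find(class_="dataframe")  # a single Tag, not a ResultSet

names = ["first_name", "last_name", "age", "preTestScore", "postTestScore"]
columns = {name: [] for name in names}

for row in table.find_all("tr"):
    cells = row.find_all("td")  # cells of this row only, not of the whole table
    if len(cells) != len(names):
        continue  # skip the header row (th cells) and any malformed row
    for name, cell in zip(names, cells):
        columns[name].append(cell.get_text(strip=True))
```

The resulting columns dict has the same shape as the one passed to pd.DataFrame in the question.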
otus answered Oct 19 '22 21:10


Iterate over table and call row.find_all('td') on each element:

for row in table:
    col = row.find_all('td')
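Note that iterating over the ResultSet yields each matched element (here, each table), not each tr. If the page may contain more than one matching table, a nested loop keeps the rows separate. This is only a sketch with made-up HTML containing two tables:

```python
from bs4 import BeautifulSoup

# Hypothetical page with two tables that share the same class.
HTML = """
<table class="dataframe"><tr><td>a</td><td>b</td></tr></table>
<table class="dataframe">
  <tr><td>c</td><td>d</td></tr>
  <tr><td>e</td><td>f</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

parsed = []
for table in soup.find_all(class_="dataframe"):  # each item is a Tag
    rows = [
        [td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in table.find_all("tr")
    ]
    parsed.append(rows)
```

Each entry in parsed is one table, stored as a list of rows, and each row is a list of cell strings.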
Padraic Cunningham answered Oct 19 '22 19:10