
Load Scraped Table via BS4 into Pandas Dataframe

I'm trying to grab any of the Basic Box Score Stats or Advanced Box Score Stats tables from a basketball-reference.com box score page (the URL is in the code below).

I tried doing something like this:

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "http://www.basketball-reference.com/boxscores/200112100LAC.html"
page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36'})
soup = BeautifulSoup(page.content, "html5lib")

table = soup.find('div', class_='overthrow table_container').find('table', class_='sortable stats_table')
df = pd.read_html(table)
print(df)

However, it doesn't work: I get a 'NoneType' object error. Is there a better way to take the HTML for a table and load it into a DataFrame? Thanks.

Ravash Jalil asked Dec 12 '16 12:12



2 Answers

You can use read_html, which returns a list of DataFrames, one for each table it parses from the page:

import pandas as pd

df = pd.read_html('http://www.basketball-reference.com/boxscores/200112100LAC.html')[0]  # or [1], [2], ... for the other tables
print(df)
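To see how the list indexing works without hitting the network, here is a minimal sketch. The HTML string, player names and numbers are made up for illustration and only stand in for the real box score page. The string is wrapped in StringIO because recent pandas versions deprecate passing literal HTML directly.

```python
from io import StringIO

import pandas as pd

# Hypothetical two-table page standing in for the real box score HTML.
HTML = """
<table>
  <tr><th>Player</th><th>PTS</th></tr>
  <tr><td>A</td><td>10</td></tr>
  <tr><td>B</td><td>7</td></tr>
</table>
<table>
  <tr><th>Player</th><th>TS%</th></tr>
  <tr><td>A</td><td>0.61</td></tr>
</table>
"""

# read_html parses every <table> it finds and returns them as a list.
tables = pd.read_html(StringIO(HTML))

basic = tables[0]     # first table on the page
advanced = tables[1]  # second table on the page

print(len(tables))
print(basic)
print(advanced)
```

On the real page you would probably need to print each entry of the list once to find out which index holds the table you want.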
jezrael answered Nov 04 '22 15:11



table is a Tag object in BeautifulSoup, and read_html accepts a URL, a file-like object or a string of HTML, not a Tag. You should convert the tag to a string and pass that to pandas.

The prettify() method will turn a Beautiful Soup parse tree into a nicely formatted Unicode string, with each HTML/XML tag on its own line:

df = pd.read_html(table.prettify())[0]  # read_html returns a list of DataFrames, so take the first one
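Here is a rough, self-contained sketch of the whole Tag-to-DataFrame path. The HTML snippet and its class names are hypothetical stand-ins for the real page. It also checks for None, since find() returns None when nothing matches, and chaining another call onto that result is one likely source of the 'NoneType' error in the question. str(table) is used here, though table.prettify() should work the same way.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup; the real page's class names may differ.
HTML = """
<div class="table_container">
  <table class="stats_table">
    <tr><th>Player</th><th>PTS</th></tr>
    <tr><td>A</td><td>10</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find() returns None on no match, so check before chaining a second find().
container = soup.find("div", class_="table_container")
table = None
if container is not None:
    table = container.find("table", class_="stats_table")

if table is None:
    raise ValueError("table not found; check the class names against the page source")

# Convert the Tag to a string, then let pandas parse it; take the first result.
df = pd.read_html(StringIO(str(table)))[0]
print(df)
```

If the ValueError fires on the real page, the selectors probably do not match what the server actually sends, so it is worth inspecting page.content before blaming pandas.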
宏杰李 answered Nov 04 '22 16:11
