Load Scraped Table via BS4 into Pandas Dataframe

Question

I'm trying to grab any of the Basic Box Score Stats or Advanced Box Score Stats tables from here.

I tried doing something like this:

url = "http://www.basketball-reference.com/boxscores/200112100LAC.html"
page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36'})
soup = BeautifulSoup(page.content, "html5lib")

table =  soup.find('div', class_='overthrow table_container').find('table',class_='sortable stats_table')
df = pd.read_html(table)
print df

However, it doesn't work because of a 'NoneType' object error. Is there a better way to take a table's HTML and put it into a DataFrame? Thanks.

jezrael · Accepted Answer

You can use read_html, which returns a list of DataFrames, one for each table parsed from the page:

import pandas as pd
df = pd.read_html('http://www.basketball-reference.com/boxscores/200112100LAC.html')[0]  # pick a table by index: [0], [1], [2], ...
print(df)
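
Picking a table by position can be fragile when a page holds several tables. As a minimal sketch, the snippet below uses a small inline HTML string as a hypothetical stand-in for the downloaded page (the table ids and values are made up for illustration) and shows that read_html returns a list, and that its attrs parameter can narrow the result to one table:

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for a downloaded page containing two tables.
html = """
<table id="basic">
  <tr><th>Player</th><th>PTS</th></tr>
  <tr><td>A</td><td>10</td></tr>
  <tr><td>B</td><td>7</td></tr>
</table>
<table id="advanced">
  <tr><th>Player</th><th>TS%</th></tr>
  <tr><td>A</td><td>0.55</td></tr>
</table>
"""

# read_html always returns a list, one DataFrame per table it finds.
tables = pd.read_html(StringIO(html))
print(len(tables))

# attrs restricts the search to tables whose HTML attributes match.
advanced = pd.read_html(StringIO(html), attrs={"id": "advanced"})[0]
print(advanced)
```

Whether the real page exposes a usable id or class on each table is an assumption you would need to check in its HTML.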

宏杰李 · Answer

table is a Tag object in BeautifulSoup; you should convert it to a string and pass that string to pandas.

The prettify() method will turn a Beautiful Soup parse tree into a nicely formatted Unicode string, with each HTML/XML tag on its own line:

# read_html returns a list of DataFrames, so take the first one
df = pd.read_html(table.prettify())[0]
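
A fuller sketch of the same idea, again on a hypothetical inline HTML string rather than the real page. It also guards against the case from the question: find() returns None when nothing matches, and chaining another .find() onto None is what raises the 'NoneType' error. Recent pandas versions warn when read_html is given a literal HTML string, so the string is wrapped in StringIO here; str(table) is used in place of prettify(), and either should work.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup; the class names on the real page may differ.
html = """
<div class="table_container">
  <table class="stats_table">
    <tr><th>Player</th><th>MP</th></tr>
    <tr><td>A</td><td>32</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="stats_table")

# find() returns None when no element matches, so check before using it.
if table is None:
    raise ValueError("no table matched; check the tag and class names")

# Convert the Tag to a string, wrap it in a file-like object, and parse it.
df = pd.read_html(StringIO(str(table)))[0]
print(df)
```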

Tags: python, pandas, beautifulsoup

Ravash Jalil
