I'm trying to grab any of the Basic Box Score Stats or Advanced Box Score Stats tables from the box score page below.
I tried doing something like this:
url = "http://www.basketball-reference.com/boxscores/200112100LAC.html"
page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36'})
soup = BeautifulSoup(page.content, "html5lib")
table = soup.find('div', class_='overthrow table_container').find('table',class_='sortable stats_table')
df = pd.read_html(table)
print df
However, it doesn't work because of a 'NoneType' object error. Is there a better way to take a table's HTML and put it into a DataFrame? Thanks.
You can use read_html, which returns a list of DataFrames, one for each parsed table:
df = pd.read_html('http://www.basketball-reference.com/boxscores/200112100LAC.html')[0] # or [1], [2]
print (df)
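If you want one particular table rather than picking by position, read_html also accepts an attrs argument that filters on the table's HTML attributes. Here is a minimal sketch of that idea. The inline HTML and the ids "basic" and "advanced" are made up for illustration and are not taken from the real page. The string is wrapped in StringIO because recent pandas versions warn when given literal HTML.

```python
from io import StringIO

import pandas as pd

# Made-up HTML standing in for the downloaded page; the table ids are hypothetical.
html = """
<table id="basic">
  <thead><tr><th>Player</th><th>PTS</th></tr></thead>
  <tbody><tr><td>A</td><td>20</td></tr><tr><td>B</td><td>12</td></tr></tbody>
</table>
<table id="advanced">
  <thead><tr><th>Player</th><th>TS%</th></tr></thead>
  <tbody><tr><td>A</td><td>0.61</td></tr></tbody>
</table>
"""

# Without filters, read_html returns one DataFrame per <table> it finds.
tables = pd.read_html(StringIO(html))

# attrs narrows the search to tables whose attributes match.
advanced = pd.read_html(StringIO(html), attrs={"id": "advanced"})[0]

print(len(tables))
print(advanced)
```

On a real page you would first inspect the HTML to see which id or class the table you want actually has.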
table is a Tag object in BeautifulSoup, so you should convert it to a string before passing it to pandas.
The prettify() method will turn a Beautiful Soup parse tree into a nicely formatted Unicode string, with each HTML/XML tag on its own line:
df = pd.read_html(table.prettify())[0]  # read_html returns a list; take the first table
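To see the Tag-to-string step end to end without hitting the network, here is a small sketch under some assumptions: the HTML is invented, the class names only loosely mirror the ones in the question, and str(table) is used for the conversion (table.prettify() would fill the same role). It also checks for None, since find() returns None when nothing matches.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Invented HTML for illustration; the real page's markup may differ.
html = """
<div class="overthrow table_container">
  <table class="sortable stats_table">
    <thead><tr><th>Player</th><th>PTS</th></tr></thead>
    <tbody><tr><td>A</td><td>20</td></tr><tr><td>B</td><td>12</td></tr></tbody>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns None when there is no match, so check before chaining calls.
container = soup.find("div", class_="table_container")
table = container.find("table", class_="stats_table") if container else None
if table is None:
    raise ValueError("table not found in the parsed HTML")

# Convert the Tag to a string, then let pandas parse that HTML.
df = pd.read_html(StringIO(str(table)))[0]
print(df)
```

Passing a single class name to class_ matches any element that carries that class, which tends to be less brittle than matching the full class string.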