 

Get HTML table into pandas Dataframe, not list of dataframe objects

I apologize if this question has been answered elsewhere, but I have not been able to find a satisfactory answer here or anywhere else.

I am somewhat new to Python and pandas and am having some difficulty getting HTML data into a pandas DataFrame. The pandas documentation says .read_html() returns a list of DataFrame objects, so when I try to do some data manipulation to get rid of some samples, I get an error.

Here is my code to read the HTML:

import pandas as pd

# header=1: use the second row (0-indexed) of each table as the column names
df = pd.read_html('http://espn.go.com/nhl/statistics/player/_/stat/points/sort/points/year/2015/seasontype/2', header=1)

Then I try to clean it up:

df = df.dropna(axis=0, thresh=4)

I receive the following error:

Traceback (most recent call last):
  File "module4.py", line 25, in <module>
    df = df.dropna(axis=0, thresh=4)
AttributeError: 'list' object has no attribute 'dropna'
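The error itself can be reproduced without fetching the page. This is a minimal sketch, assuming only that pandas is installed; the list named result is a hypothetical stand-in for what pd.read_html(...) returns, with made-up column names and values.

```python
import pandas as pd

# Hypothetical stand-in for the return value of pd.read_html(...):
# a plain Python list holding one DataFrame per <table> found on the page.
result = [pd.DataFrame({"PLAYER": ["Player A"], "PTS": [87]})]

print(type(result))                  # <class 'list'>
print(hasattr(result, "dropna"))     # False: a list has no DataFrame methods
print(hasattr(result[0], "dropna"))  # True: the items of the list are DataFrames
```

Calling result.dropna(...) therefore raises the same AttributeError, while result[0].dropna(...) works.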

How do I get this data into an actual dataframe, similar to what .read_csv() does?

asked Jul 20 '16 16:07 by schaefferda



1 Answer

From https://pandas.pydata.org/pandas-docs/version/0.17.1/io.html#io-read-html: "read_html returns a list of DataFrame objects, even if there is only a single table contained in the HTML content".

So index into the list to get the table first: df = df[0].dropna(axis=0, thresh=4) should do what you want.
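Putting the fix together, here is a minimal sketch that runs offline. It assumes pandas and NumPy are installed; tables is a hypothetical stand-in for the list returned by pd.read_html(...), and its columns and values are invented for illustration rather than taken from the ESPN page.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the list pd.read_html(...) returns: one DataFrame
# per <table>. Column names and values are made up for illustration.
tables = [
    pd.DataFrame({
        "RK": [1, np.nan, np.nan, 2],
        "PLAYER": ["Player A", np.nan, "Player X", "Player B"],
        "TEAM": ["AAA", np.nan, "XXX", "BBB"],
        "GP": [82, np.nan, 5, 80],
        "PTS": [87, np.nan, np.nan, 86],
    })
]

# Select the table from the list first; dropna exists on the DataFrame, not the list.
df = tables[0]

# axis=0 drops rows; thresh=4 keeps only rows with at least 4 non-NA values.
# Row 1 (0 non-NA values) and row 2 (3 non-NA values) are removed.
df = df.dropna(axis=0, thresh=4)
print(df)
```

If the page contains several tables, the match= argument of read_html can restrict the result to tables containing a given string or regex, and pd.concat(tables, ignore_index=True) can stack tables that share the same columns into a single DataFrame.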

answered Sep 18 '22 16:09 by Laurent S
