I am reading an HTML table with pd.read_html but the result is coming in a list, I want to convert it inot a pandas dataframe, so I can continue further operations on the same. I am using the following script
import pandas as pd
import html5lib
data=pd.read_html('http://www.espn.com/nhl/statistics/player/_/stat/points/sort/points/year/2015/seasontype/2',skiprows=1)
and since My results are coming as 1 list, I tried to convert it into a data frame with
data1=pd.DataFrame(Data)
and result came as 0
0 0 1 2 3 4...
and because of result as a list, I can't apply any functions such as rename, dropna, drop.
I will appreciate every help
I think you need add [0]
if need select first item of list, because read_html
return list of DataFrames
:
So you can use:
import pandas as pd
data1 = pd.read_html('http://www.espn.com/nhl/statistics/player/_/stat/points/sort/points/year/2015/seasontype/2',skiprows=1)[0]
print (data1)
0 1 2 3 4 5 6 7 8 9 \
0 RK PLAYER TEAM GP G A PTS +/- PIM PTS/G
1 1 Jamie Benn, LW DAL 82 35 52 87 1 64 1.06
2 2 John Tavares, C NYI 82 38 48 86 5 46 1.05
3 3 Sidney Crosby, C PIT 77 28 56 84 5 47 1.09
4 4 Alex Ovechkin, LW WSH 81 53 28 81 10 58 1.00
5 NaN Jakub Voracek, RW PHI 82 22 59 81 1 78 0.99
6 6 Nicklas Backstrom, C WSH 82 18 60 78 5 40 0.95
7 7 Tyler Seguin, C DAL 71 37 40 77 -1 20 1.08
8 8 Jiri Hudler, LW CGY 78 31 45 76 17 14 0.97
9 NaN Daniel Sedin, LW VAN 82 20 56 76 5 18 0.93
10 10 Vladimir Tarasenko, RW STL 77 37 36 73 27 31 0.95
11 NaN PP SH NaN NaN NaN NaN NaN NaN NaN
12 RK PLAYER TEAM GP G A PTS +/- PIM PTS/G
13 NaN Nick Foligno, LW CBJ 79 31 42 73 16 50 0.92
14 NaN Claude Giroux, C PHI 81 25 48 73 -3 36 0.90
15 NaN Henrik Sedin, C VAN 82 18 55 73 11 22 0.89
16 14 Steven Stamkos, C TB 82 43 29 72 2 49 0.88
17 NaN Tyler Johnson, C TB 77 29 43 72 33 24 0.94
18 16 Ryan Johansen, C CBJ 82 26 45 71 -6 40 0.87
19 17 Joe Pavelski, C SJ 82 37 33 70 12 29 0.85
20 NaN Evgeni Malkin, C PIT 69 28 42 70 -2 60 1.01
21 NaN Ryan Getzlaf, C ANA 77 25 45 70 15 62 0.91
22 20 Rick Nash, LW NYR 79 42 27 69 29 36 0.87
23 NaN PP SH NaN NaN NaN NaN NaN NaN NaN
24 RK PLAYER TEAM GP G A PTS +/- PIM PTS/G
25 21 Max Pacioretty, LW MTL 80 37 30 67 38 32 0.84
26 NaN Logan Couture, C SJ 82 27 40 67 -6 12 0.82
27 23 Jonathan Toews, C CHI 81 28 38 66 30 36 0.81
28 NaN Erik Karlsson, D OTT 82 21 45 66 7 42 0.80
29 NaN Henrik Zetterberg, LW DET 77 17 49 66 -6 32 0.86
30 26 Pavel Datsyuk, C DET 63 26 39 65 12 8 1.03
31 NaN Joe Thornton, C SJ 78 16 49 65 -4 30 0.83
32 28 Nikita Kucherov, RW TB 82 28 36 64 38 37 0.78
33 NaN Patrick Kane, RW CHI 61 27 37 64 10 10 1.05
34 NaN Mark Stone, RW OTT 80 26 38 64 21 14 0.80
35 NaN PP SH NaN NaN NaN NaN NaN NaN NaN
36 RK PLAYER TEAM GP G A PTS +/- PIM PTS/G
37 NaN Alexander Steen, LW STL 74 24 40 64 8 33 0.86
38 NaN Kyle Turris, C OTT 82 24 40 64 5 36 0.78
39 NaN Johnny Gaudreau, LW CGY 80 24 40 64 11 14 0.80
40 NaN Anze Kopitar, C LA 79 16 48 64 -2 10 0.81
41 35 Radim Vrbata, RW VAN 79 31 32 63 6 20 0.80
42 NaN Jaden Schwartz, LW STL 75 28 35 63 13 16 0.84
43 NaN Filip Forsberg, C NSH 82 26 37 63 15 24 0.77
44 NaN Jordan Eberle, RW EDM 81 24 39 63 -16 24 0.78
45 NaN Ondrej Palat, LW TB 75 16 47 63 31 24 0.84
46 40 Zach Parise, LW MIN 74 33 29 62 21 41 0.84
10 11 12 13 14 15 16
0 SOG PCT GWG G A G A
1 253 13.8 6 10 13 2 3
2 278 13.7 8 13 18 0 1
3 237 11.8 3 10 21 0 0
4 395 13.4 11 25 9 0 0
5 221 10.0 3 11 22 0 0
6 153 11.8 3 3 30 0 0
7 280 13.2 5 13 16 0 0
8 158 19.6 5 6 10 0 0
9 226 8.9 5 4 21 0 0
10 264 14.0 6 8 10 0 0
11 NaN NaN NaN NaN NaN NaN NaN
12 SOG PCT GWG G A G A
13 182 17.0 3 11 15 0 0
14 279 9.0 4 14 23 0 0
15 101 17.8 0 5 20 0 0
16 268 16.0 6 13 12 0 0
17 203 14.3 6 8 9 0 0
18 202 12.9 0 7 19 2 0
19 261 14.2 5 19 12 0 0
20 212 13.2 4 9 17 0 0
21 191 13.1 6 3 10 0 2
22 304 13.8 8 6 6 4 1
23 NaN NaN NaN NaN NaN NaN NaN
24 SOG PCT GWG G A G A
25 302 12.3 10 7 4 3 2
26 263 10.3 4 6 18 2 0
27 192 14.6 7 6 11 2 1
28 292 7.2 3 6 24 0 0
29 227 7.5 3 4 24 0 0
30 165 15.8 5 8 16 0 0
31 131 12.2 0 4 18 0 0
32 190 14.7 2 2 13 0 0
33 186 14.5 5 6 16 0 0
34 157 16.6 6 5 8 1 0
35 NaN NaN NaN NaN NaN NaN NaN
36 SOG PCT GWG G A G A
37 223 10.8 5 8 16 0 0
38 215 11.2 6 4 12 1 0
39 167 14.4 4 8 13 0 0
40 134 11.9 4 6 18 0 0
41 267 11.6 7 12 11 0 0
42 184 15.2 4 8 8 0 2
43 237 11.0 6 6 13 0 0
44 183 13.1 2 6 15 0 0
45 139 11.5 5 3 8 1 1
46 259 12.7 3 11 5 0 0
If your dataframe ends up with columns indexed as 0,1,2 etc and the headings in the first row, (as above) just specify that the column names are in the first row with header=0
Without this, pandas may see a mix of data types - text in row 1 and numbers in the rest and cast the column as object rather than, say, int64.
Full line would be:
data1 = pd.read_html(url, skiprows=1, header=0)[0]
[0]
is the first table in the list of possible tables.
There are options for handling NA values as well. Check out the documentation here: https://pandas.pydata.org/docs/reference/api/pandas.read_html.html
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With