I have this simple one line script:
from pandas import read_html
print read_html('http://money.cnn.com/data/hotstocks/', flavor = 'bs4')
Which works, fine, but the column names are missing, they are being identified as 1, 2, 3. Is there an easy way to tell pandas to use the first row as the column names? I know I could just store the names as a list and set them, and then skip the first row, but am wondering if there is an easier/better way.
Currently it prints:
0 1 2 3
0 Company Price Change % Change
1 AAPL Apple Inc 115.31 +6.17 +5.65%
2 BAC Bank of America Corp 15.20 -0.43 -2.75%
3 YHOO Yahoo! Inc 46.46 -1.53 -3.19%
4 MSFT Microsoft Corp 41.19 -1.47 -3.45%
5 FB Facebook Inc 76.24 +0.46 +0.61%
6 GE General Electric Co 23.84 -0.54 -2.21%
7 T AT&T Inc 32.68 -0.13 -0.40%
8 F Ford Motor Co 14.46 -0.24 -1.63%
9 INTC Intel Corp 33.78 -0.41 -1.20%
10 CSCO Cisco Systems Inc 26.80 -0.09 -0.35%
columns() to Convert Row to Column Header. You can use df. columns=df. iloc[0] to set the column labels by extracting the first row.
Pandas Set First Row as Header While Reading CSV The read_csv() method accepts the parameter header . You can pass header=[0] to make the first row from the CSV file as a header of the dataframe.
The pandas read_html() function is a quick and convenient way to turn an HTML table into a pandas DataFrame. This function can be useful for quickly incorporating tables from various websites without figuring out how to scrape the site's HTML.
'read_html` takes a header parameter. You can pass a row index:
read_html('http://money.cnn.com/data/hotstocks/', header =0, flavor = 'bs4')
Worth noting this caveat in the docs:
For example, you might need to manually assign column names if the column names are converted to NaN when you pass the header=0 argument
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.html.read_html.html
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With