 

Use first row as column names? Pandas read_html

I have this simple one line script:

from pandas import read_html

print(read_html('http://money.cnn.com/data/hotstocks/', flavor='bs4'))

It works fine, but the column names are missing: the columns are labeled 0, 1, 2, 3. Is there an easy way to tell pandas to use the first row as the column names? I know I could store the names in a list, set them as the columns, and then skip the first row, but I'm wondering if there is an easier or better way.

Currently it prints:

                           0       1       2         3
0                    Company   Price  Change  % Change
1             AAPL Apple Inc  115.31   +6.17    +5.65%
2   BAC Bank of America Corp   15.20   -0.43    -2.75%
3            YHOO Yahoo! Inc   46.46   -1.53    -3.19%
4        MSFT Microsoft Corp   41.19   -1.47    -3.45%
5            FB Facebook Inc   76.24   +0.46    +0.61%
6     GE General Electric Co   23.84   -0.54    -2.21%
7                 T AT&T Inc   32.68   -0.13    -0.40%
8            F Ford Motor Co   14.46   -0.24    -1.63%
9            INTC Intel Corp   33.78   -0.41    -1.20%
10    CSCO Cisco Systems Inc   26.80   -0.09    -0.35%
Asked Jan 29 '15 by nicholas.reichel


People also ask

How do I make the first row a column name in pandas?

Assign df.columns = df.iloc[0] to use the values in the first row as the column labels, then drop that row from the data (for example with df = df.iloc[1:]).
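A minimal sketch of that approach, assuming a small hypothetical DataFrame that mimics the question's output, where the header text ended up in row 0. Because the values were parsed as plain text, the numeric column stays a string column until it is converted explicitly.

```python
import pandas as pd

# Hypothetical frame mimicking the question: the header text landed in row 0
df = pd.DataFrame([
    ["Company", "Price", "Change"],
    ["AAPL Apple Inc", "115.31", "+6.17"],
    ["BAC Bank of America Corp", "15.20", "-0.43"],
])

# Promote row 0 to column labels, then drop it and renumber the rows
df.columns = df.iloc[0]
df = df.iloc[1:].reset_index(drop=True)

# The new columns Index inherits the name 0 from the row label; clear it
df.columns.name = None

# The cells are still strings, so convert the numeric column explicitly
df["Price"] = pd.to_numeric(df["Price"])

print(df)
```

Passing header=0 at read time, where the reader supports it, avoids this clean-up and lets pandas infer the column types directly.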

How do I make the first row a header in pandas?

When reading a CSV file, read_csv() accepts a header parameter. Passing header=0 uses the first row of the file as the DataFrame's column names; this is also what the default (header='infer') does when no names are given.
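A small sketch contrasting header=0 with header=None, assuming an in-memory CSV string (hypothetical data) wrapped in io.StringIO so no file is needed:

```python
import io

import pandas as pd

# Hypothetical CSV text whose first line holds the column names
csv_text = "Company,Price\nAAPL Apple Inc,115.31\nBAC Bank of America Corp,15.20\n"

# header=0: the first line of the file supplies the column names
with_header = pd.read_csv(io.StringIO(csv_text), header=0)

# header=None: no line is treated as names; columns are labeled 0, 1, ...
# and the name row becomes ordinary data
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_header.columns.tolist())  # ['Company', 'Price']
print(no_header.columns.tolist())    # [0, 1]
```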

What does pd.read_html do?

The pandas read_html() function is a quick and convenient way to turn the tables in an HTML page into pandas DataFrames. It returns a list with one DataFrame per table it finds, so you usually index into the result (for example read_html(url)[0]). This is useful for pulling tables from websites without writing your own HTML scraping code.
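A minimal sketch of the list return value, assuming lxml (the default parser) is installed and using a hypothetical HTML snippet instead of a live page. Here the header rows use <th> cells inside <thead>, which read_html recognizes as column names without any extra arguments:

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables whose header rows are marked up with <th>
html = """
<table>
  <thead><tr><th>Symbol</th><th>Price</th></tr></thead>
  <tbody><tr><td>AAPL</td><td>115.31</td></tr></tbody>
</table>
<table>
  <thead><tr><th>Symbol</th><th>Change</th></tr></thead>
  <tbody><tr><td>BAC</td><td>-0.43</td></tr></tbody>
</table>
"""

# read_html returns a list with one DataFrame per <table> it finds
tables = pd.read_html(StringIO(html))
print(len(tables))  # 2
print(tables[0])
```

Wrapping the literal HTML in StringIO matters on recent pandas versions, which deprecate passing raw HTML strings directly.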




1 Answer

read_html takes a header parameter. You can pass the index of the row to use as column names:

read_html('http://money.cnn.com/data/hotstocks/', header=0, flavor='bs4')
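A self-contained sketch of the same fix, assuming lxml is installed and using a hypothetical stand-in for the CNN table (the live page has likely changed). The header row is marked up with <td> cells rather than <th>, which is one way to end up with columns labeled 0, 1, 2 as in the question; header=0 tells read_html to use the first parsed row as the column names:

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the CNN table: the header row uses <td>, not <th>,
# so read_html does not detect it as a header on its own
html = """
<table>
  <tr><td>Company</td><td>Price</td><td>% Change</td></tr>
  <tr><td>AAPL Apple Inc</td><td>115.31</td><td>+5.65%</td></tr>
  <tr><td>BAC Bank of America Corp</td><td>15.20</td><td>-2.75%</td></tr>
</table>
"""

# Without header=0: columns are 0, 1, 2 and "Company" is an ordinary data row
raw = pd.read_html(StringIO(html))[0]

# With header=0: the first parsed row becomes the column names
df = pd.read_html(StringIO(html), header=0)[0]
print(df)
```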

Worth noting this caveat in the docs:

For example, you might need to manually assign column names if the column names are converted to NaN when you pass the header=0 argument

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.html.read_html.html

Answered Oct 11 '22 by JAB