Use first row as column names? Pandas read_html

Tags:

python

pandas

parsing

I have this simple one-line script:

from pandas import read_html

print(read_html('http://money.cnn.com/data/hotstocks/', flavor='bs4'))

This works, but the column names are missing: the columns are labeled 0, 1, 2, 3 and the real names end up in the first data row. Is there an easy way to tell pandas to use the first row as the column names? I know I could store the names in a list, set them, and then skip the first row, but I am wondering if there is an easier or better way.

Currently it prints:

                           0       1       2         3
0                    Company   Price  Change  % Change
1             AAPL Apple Inc  115.31   +6.17    +5.65%
2   BAC Bank of America Corp   15.20   -0.43    -2.75%
3            YHOO Yahoo! Inc   46.46   -1.53    -3.19%
4        MSFT Microsoft Corp   41.19   -1.47    -3.45%
5            FB Facebook Inc   76.24   +0.46    +0.61%
6     GE General Electric Co   23.84   -0.54    -2.21%
7                 T AT&T Inc   32.68   -0.13    -0.40%
8            F Ford Motor Co   14.46   -0.24    -1.63%
9            INTC Intel Corp   33.78   -0.41    -1.20%
10    CSCO Cisco Systems Inc   26.80   -0.09    -0.35%


asked Jan 29 '15 03:01

nicholas.reichel


1 Answer

`read_html` takes a `header` parameter. You can pass the index of the row to use as the column names:

read_html('http://money.cnn.com/data/hotstocks/', header=0, flavor='bs4')

Worth noting this caveat in the docs:

For example, you might need to manually assign column names if the column names are converted to NaN when you pass the header=0 argument

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.html.read_html.html
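If `header=0` does not give usable names (the case the docs caveat describes), the first row can be promoted to column labels by hand. This is only a minimal sketch: `promote_first_row` is a hypothetical helper name, and the small `raw` frame is built inline from a few rows of the question's output to stand in for one table returned by `read_html`, so that no network access is needed.

```python
import pandas as pd


def promote_first_row(df):
    """Return a new DataFrame whose column labels come from df's first row.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    # Drop the first row and renumber the remaining rows from 0.
    out = df.iloc[1:].reset_index(drop=True)
    # Use the values of the original first row as the column labels.
    out.columns = df.iloc[0].tolist()
    return out


# Stand-in for one table from read_html(...) parsed without a header row.
raw = pd.DataFrame([
    ["Company", "Price", "Change", "% Change"],
    ["AAPL Apple Inc", "115.31", "+6.17", "+5.65%"],
    ["BAC Bank of America Corp", "15.20", "-0.43", "-2.75%"],
])

stocks = promote_first_row(raw)
print(stocks)
```

Note that `read_html` returns a list of DataFrames, so with real data the helper would be applied to one element of that list, for example `promote_first_row(tables[0])`.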

answered Oct 11 '22 01:10

JAB
