Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas reading html table

import pandas as pd
import pandas_datareader.data as web

coins = pd.read_html('https://coinmarketcap.com/')

for name in coins[0][1][1:]:
    print(name)

Results in the error message below. When I print coins, I get the complete table, but when I try and get specific info it gives me this error message. I know this format works as I have copied it exactly from other exercises I have been learning from, and have just changed the website. Many thanks.

C:\Users\AppData\Local\Programs\Python\Python36-32\python.exe C:/Users/Desktop/python_work/crypto/crypto_corr.py
Traceback (most recent call last):
  File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexes\base.py", line 2525, in get_loc
    return self._engine.get_loc(key)
  File "pandas\_libs\index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/Desktop/python_work/crypto/crypto_corr.py", line 6, in <module>
    for name in coins[0][1][1:]:
  File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\frame.py", line 2139, in __getitem__
    return self._getitem_column(key)
  File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\frame.py", line 2146, in _getitem_column
    return self._get_item_cache(key)
  File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\generic.py", line 1842, in _get_item_cache
    values = self._data.get(item)
  File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\internals.py", line 3843, in get
    loc = self.items.get_loc(item)
  File "C:\Users\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\indexes\base.py", line 2527, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 1

Process finished with exit code 1
like image 701
top bantz Avatar asked Sep 02 '18 15:09

top bantz


People also ask

Can pandas read HTML table?

The pandas read_html function will extract data from HTML tables and return a list of all the tables. Note that pandas read_html function returns a list of Pandas DataFrame objects.

How extract HTML table data from python?

For this, you can use different python libraries that help you extract content from the HTML table. One such method is available in the popular python Pandas library, it is called read_html(). The method accepts numerous arguments that allow you to customize how the table will be parsed.

How do you read a table in HTML?

We can read tables of an HTML file using the read_html() function. This function read tables of HTML files as Pandas DataFrames. It can read from a file or a URL.


1 Answers

If df is a dataframe, indexing like df[column] looks for columns called column. In your case, coins[0] is a dataframe, which does not have a column 1. However, it does have a column Name, so to print all names do the following:

df = coins[0]
for name in df['Name']:
    print(name)
like image 60
Andrey Portnoy Avatar answered Nov 15 '22 09:11

Andrey Portnoy