Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scrape tables into dataframe with BeautifulSoup

I'm trying to scrape the data from the coins catalog.

There is one of the pages. I need to scrape this data into Dataframe

So far I have this code:

import bs4 as bs import urllib.request import pandas as pd  source = urllib.request.urlopen('http://www.gcoins.net/en/catalog/view/45518').read() soup = bs.BeautifulSoup(source,'lxml')  table = soup.find('table', attrs={'class':'subs noBorders evenRows'}) table_rows = table.find_all('tr')  for tr in table_rows:     td = tr.find_all('td')     row = [tr.text for tr in td]     print(row)                    # I need to save this data instead of printing it  

It produces following output:

[] ['', '', '1882', '', '108,000', 'UNC', '—'] [' ', '', '1883', '', '786,000', 'UNC', '~ $3.99'] [' ', " \n\n\n\n\t\t\t\t\t\t\t$('subGraph55337').on('click', function(event) {\n\t\t\t\t\t\t\t\tLightview.show({\n\t\t\t\t\t\t\t\t\thref : '/en/catalog/ajax/subgraph?id=55337',\n\t\t\t\t\t\t\t\t\trel : 'ajax',\n\t\t\t\t\t\t\t\t\toptions : {\n\t\t\t\t\t\t\t\t\t\tautosize : true,\n\t\t\t\t\t\t\t\t\t\ttopclose : true,\n\t\t\t\t\t\t\t\t\t\tajax : {\n\t\t\t\t\t\t\t\t\t\t\tevalScripts : true\n\t\t\t\t\t\t\t\t\t\t}\n\t\t\t\t\t\t\t\t\t} \n\t\t\t\t\t\t\t\t});\n\t\t\t\t\t\t\t\tevent.stop();\n\t\t\t\t\t\t\t\treturn false;\n\t\t\t\t\t\t\t});\n\t\t\t\t\t\t", '1884', '', '4,604,000', 'UNC', '~ $2.08–$4.47'] [' ', '', '1885', '', '1,314,000', 'UNC', '~ $3.20'] ['', '', '1886', '', '444,000', 'UNC', '—'] [' ', '', '1888', '', '413,000', 'UNC', '~ $2.88'] [' ', '', '1889', '', '568,000', 'UNC', '~ $2.56'] [' ', " \n\n\n\n\t\t\t\t\t\t\t$('subGraph55342').on('click', function(event) {\n\t\t\t\t\t\t\t\tLightview.show({\n\t\t\t\t\t\t\t\t\thref : '/en/catalog/ajax/subgraph?id=55342',\n\t\t\t\t\t\t\t\t\trel : 'ajax',\n\t\t\t\t\t\t\t\t\toptions : {\n\t\t\t\t\t\t\t\t\t\tautosize : true,\n\t\t\t\t\t\t\t\t\t\ttopclose : true,\n\t\t\t\t\t\t\t\t\t\tajax : {\n\t\t\t\t\t\t\t\t\t\t\tevalScripts : true\n\t\t\t\t\t\t\t\t\t\t}\n\t\t\t\t\t\t\t\t\t} \n\t\t\t\t\t\t\t\t});\n\t\t\t\t\t\t\t\tevent.stop();\n\t\t\t\t\t\t\t\treturn false;\n\t\t\t\t\t\t\t});\n\t\t\t\t\t\t", '1890', '', '2,137,000', 'UNC', '~ $1.28–$4.79'] ['', '', '1891', '', '605,000', 'UNC', '—'] [' ', '', '1892', '', '205,000', 'UNC', '~ $4.47'] [' ', '', '1893', '', '754,000', 'UNC', '~ $4.79'] [' ', '', '1894', '', '532,000', 'UNC', '~ $3.20'] [' ', '', '1895', '', '423,000', 'UNC', '~ $2.40'] ['', '', '1896', '', '174,000', 'UNC', '—'] 

But when I'm trying to save it to Dataframe and export to excel it contains just the last value:

         0 0          1          2     1896 3          4  174,000 5      UNC 6        — 
like image 799
Alex Avatar asked May 31 '18 21:05

Alex


People also ask

How do I extract a table from a website in Python?

Pandas can do this right out of the box, saving you from having to parse the html yourself. to_html() extracts all tables from your html and puts them in a list of dataframes. to_csv() can be used to convert each dataframe to a csv file.


1 Answers

Pandas already has a built-in method to convert the table on the web to a dataframe:

table = soup.find_all('table') df = pd.read_html(str(table))[0] 
like image 78
ttfreeman Avatar answered Sep 24 '22 01:09

ttfreeman