Scrape tables into dataframe with BeautifulSoup


I'm trying to scrape data from a coin catalog.

One of the pages is http://www.gcoins.net/en/catalog/view/45518. I need to scrape its table into a DataFrame.

So far I have this code:

    import bs4 as bs
    import urllib.request
    import pandas as pd

    source = urllib.request.urlopen('http://www.gcoins.net/en/catalog/view/45518').read()
    soup = bs.BeautifulSoup(source,'lxml')

    table = soup.find('table', attrs={'class':'subs noBorders evenRows'})
    table_rows = table.find_all('tr')

    for tr in table_rows:
        td = tr.find_all('td')
        row = [tr.text for tr in td]
        print(row)                    # I need to save this data instead of printing it

It produces the following output:

    []
    ['', '', '1882', '', '108,000', 'UNC', '—']
    [' ', '', '1883', '', '786,000', 'UNC', '~ $3.99']
    [' ', " \n\n\n\n\t\t\t\t\t\t\t$('subGraph55337').on('click', function(event) {\n\t\t\t\t\t\t\t\tLightview.show({\n\t\t\t\t\t\t\t\t\thref : '/en/catalog/ajax/subgraph?id=55337',\n\t\t\t\t\t\t\t\t\trel : 'ajax',\n\t\t\t\t\t\t\t\t\toptions : {\n\t\t\t\t\t\t\t\t\t\tautosize : true,\n\t\t\t\t\t\t\t\t\t\ttopclose : true,\n\t\t\t\t\t\t\t\t\t\tajax : {\n\t\t\t\t\t\t\t\t\t\t\tevalScripts : true\n\t\t\t\t\t\t\t\t\t\t}\n\t\t\t\t\t\t\t\t\t} \n\t\t\t\t\t\t\t\t});\n\t\t\t\t\t\t\t\tevent.stop();\n\t\t\t\t\t\t\t\treturn false;\n\t\t\t\t\t\t\t});\n\t\t\t\t\t\t", '1884', '', '4,604,000', 'UNC', '~ $2.08–$4.47']
    [' ', '', '1885', '', '1,314,000', 'UNC', '~ $3.20']
    ['', '', '1886', '', '444,000', 'UNC', '—']
    [' ', '', '1888', '', '413,000', 'UNC', '~ $2.88']
    [' ', '', '1889', '', '568,000', 'UNC', '~ $2.56']
    [' ', " \n\n\n\n\t\t\t\t\t\t\t$('subGraph55342').on('click', function(event) {\n\t\t\t\t\t\t\t\tLightview.show({\n\t\t\t\t\t\t\t\t\thref : '/en/catalog/ajax/subgraph?id=55342',\n\t\t\t\t\t\t\t\t\trel : 'ajax',\n\t\t\t\t\t\t\t\t\toptions : {\n\t\t\t\t\t\t\t\t\t\tautosize : true,\n\t\t\t\t\t\t\t\t\t\ttopclose : true,\n\t\t\t\t\t\t\t\t\t\tajax : {\n\t\t\t\t\t\t\t\t\t\t\tevalScripts : true\n\t\t\t\t\t\t\t\t\t\t}\n\t\t\t\t\t\t\t\t\t} \n\t\t\t\t\t\t\t\t});\n\t\t\t\t\t\t\t\tevent.stop();\n\t\t\t\t\t\t\t\treturn false;\n\t\t\t\t\t\t\t});\n\t\t\t\t\t\t", '1890', '', '2,137,000', 'UNC', '~ $1.28–$4.79']
    ['', '', '1891', '', '605,000', 'UNC', '—']
    [' ', '', '1892', '', '205,000', 'UNC', '~ $4.47']
    [' ', '', '1893', '', '754,000', 'UNC', '~ $4.79']
    [' ', '', '1894', '', '532,000', 'UNC', '~ $3.20']
    [' ', '', '1895', '', '423,000', 'UNC', '~ $2.40']
    ['', '', '1896', '', '174,000', 'UNC', '—']

But when I try to save it to a DataFrame and export it to Excel, the result contains only the last row:

             0
    0
    1
    2     1896
    3
    4  174,000
    5      UNC
    6        —
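The code that builds the DataFrame is not shown, but a frame holding only the 1896 values is what you get when it is created from `row` after the loop has finished, because `row` is overwritten on every iteration. The following is a minimal sketch of collecting every row first and building the DataFrame once. The sample HTML and the column names are made up for illustration (the real page has more columns), and dropping `<script>` tags is one possible way to keep the JavaScript out of the cell text.

```python
import bs4 as bs
import pandas as pd

# Hypothetical stand-in for the catalog page, so the sketch runs offline.
html = """
<table class="subs noBorders evenRows">
  <tr><th>Year</th><th>Mintage</th><th>Grade</th><th>Price</th></tr>
  <tr><td>1882</td><td>108,000</td><td>UNC</td><td>-</td></tr>
  <tr><td><script>var x = 1;</script>1883</td><td>786,000</td><td>UNC</td><td>~ $3.99</td></tr>
</table>
"""

soup = bs.BeautifulSoup(html, "html.parser")
table = soup.find("table", attrs={"class": "subs noBorders evenRows"})

# Drop embedded <script> tags so their JavaScript does not leak into cell text.
for script in table.find_all("script"):
    script.decompose()

rows = []
for tr in table.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:  # the header row has only <th> cells, so it yields an empty list
        rows.append(cells)

# Column names are assumptions chosen for this sample, not taken from the site.
df = pd.DataFrame(rows, columns=["year", "mintage", "grade", "price"])
print(df)
# df.to_excel("coins.xlsx", index=False)  # requires an Excel writer such as openpyxl
```

The key point is that `rows` accumulates one list per table row, so the DataFrame is built from all of them rather than from whichever row was processed last.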


asked May 31 '18 21:05

Alex

1 Answer

pandas already has a built-in function, read_html, that converts HTML tables into DataFrames:

    table = soup.find_all('table')
    df = pd.read_html(str(table))[0]  # read_html returns a list of DataFrames; [0] takes the first
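As a self-contained illustration, here is a small sketch of `read_html` on an inline HTML string. The table contents are invented for the example. Wrapping the string in `StringIO` is an addition to the snippet above: recent pandas versions warn when a literal HTML string is passed directly. `read_html` also needs an HTML parser such as lxml to be installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample table; the real page's markup will differ.
html = """<table>
<tr><th>Year</th><th>Mintage</th><th>Grade</th></tr>
<tr><td>1882</td><td>108,000</td><td>UNC</td></tr>
<tr><td>1883</td><td>786,000</td><td>UNC</td></tr>
</table>"""

# read_html returns one DataFrame per <table> found in the input.
tables = pd.read_html(StringIO(html))
df = tables[0]
print(df)
```

With the question's code, the same pattern would be `pd.read_html(StringIO(str(table)))[0]`, where `table` is the element found by BeautifulSoup.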

answered Sep 24 '22 01:09

ttfreeman



Tags: pandas, dataframe, beautifulsoup, web-scraping
