
How to parse an HTML table with Python and BeautifulSoup and write it to CSV

I am trying to parse an HTML page, fetch the currency values, and write them to a CSV file. I have the following code:

#!/usr/bin/env python

import urllib2
from BeautifulSoup import BeautifulSoup

contenturl = "http://www.bank.gov.ua/control/en/curmetal/detail/currency?period=daily"
soup = BeautifulSoup(urllib2.urlopen(contenturl).read())

table = soup.find('div', attrs={'class': 'content'})

rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    for td in cols:
        text = td.find(text=True) + ';'
        print text,
    print

The problem is that I do not know how to retrieve only the currency values. I tried a regular expression such as '^[0-9]{3}' (start with 3 digits), but it doesn't work.

user2140323 asked Mar 06 '13 14:03




1 Answer

You'd be much better off picking out specific cells in the table. The td cells with the cell_c class contain data you are interested in, and the last one is always the currency exchange rate:

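rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    # Skip rows without td cells (e.g. header rows) and rows whose
    # first cell does not carry the cell_c class.
    if cols and 'cell_c' in cols[0].get('class', ''):
        # currency row: five cells, the last one is the exchange rate
        digital_code, letter_code, units, name, rate = [c.text for c in cols]
        print digital_code, letter_code, units, name, rate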

With the data in separate variables, you can now turn the text to decimal numbers, store them in a database, whatever.
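As a rough sketch of that last step, the snippet below turns the rate text into a decimal number and writes the rows out as CSV using only the standard library. It assumes Python 3; the `write_rates` helper, the column header names, and the sample rows are hypothetical (the values are made up for illustration and were not taken from the bank's page). The `;` delimiter simply mirrors the separator used in the question.

```python
import csv
import io
from decimal import Decimal

# Hypothetical rows in the (digital_code, letter_code, units, name, rate)
# order unpacked above; the values are invented for illustration only.
sample_rows = [
    ("036", "AUD", "100", " Australian Dollar ", "818.2500"),
    ("840", "USD", "100", "US Dollar", "799.3000"),
]


def write_rates(rows, out):
    """Write currency rows to the file-like object `out` as ';'-separated CSV."""
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    writer.writerow(["digital_code", "letter_code", "units", "name", "rate"])
    for digital_code, letter_code, units, name, rate in rows:
        # Decimal() raises decimal.InvalidOperation if the cell is not numeric,
        # so a malformed row fails loudly instead of being written silently.
        writer.writerow([
            digital_code.strip(),
            letter_code.strip(),
            int(units),
            name.strip(),
            Decimal(rate.strip()),
        ])


# Write into an in-memory buffer so the example has no side effects.
buffer = io.StringIO()
write_rates(sample_rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, the same function could be called with a real file, for example `open("rates.csv", "w", newline="")` (the file name is just a placeholder).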

Martijn Pieters answered Sep 28 '22 04:09
