I am trying to parse an HTML page, fetch the currency values, and write them to a CSV file. I have the following code:
#!/usr/bin/env python
import urllib2
from BeautifulSoup import BeautifulSoup

contenturl = "http://www.bank.gov.ua/control/en/curmetal/detail/currency?period=daily"
soup = BeautifulSoup(urllib2.urlopen(contenturl).read())
table = soup.find('div', attrs={'class': 'content'})
rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    for td in cols:
        text = td.find(text=True) + ';'
        print text,
    print
The problem is that I do not know how to retrieve only the currency values. I tried a regexp such as '^[0-9]{3}' (start with three digits), but it doesn't work.
Right-click anywhere in the table and select 'copy whole table'. Start up a spreadsheet application such as LibreOffice Calc. Paste into the spreadsheet (select the appropriate separator character as needed). Save/export the spreadsheet as CSV.
For this, you can use Python libraries that extract content from HTML tables. One such function is read_html() in the popular pandas library. It accepts numerous arguments that let you customize how the table is parsed.
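A minimal sketch of the read_html() approach, in Python 3 syntax (the question itself uses Python 2). The HTML snippet, column names, and rates below are made up for illustration and are not taken from the bank's page. read_html() also needs an HTML parser such as lxml installed. It returns a list of DataFrames, one per table found.

```python
import io

import pandas as pd

# Hypothetical table standing in for the real page; all values are invented.
html = """
<table>
  <tr><th>Digital code</th><th>Letter code</th><th>Units</th><th>Name</th><th>Rate</th></tr>
  <tr><td>036</td><td>AUD</td><td>100</td><td>Australian Dollar</td><td>743.21</td></tr>
  <tr><td>978</td><td>EUR</td><td>100</td><td>Euro</td><td>1085.50</td></tr>
</table>
"""

# read_html() returns one DataFrame per <table> it finds.
tables = pd.read_html(io.StringIO(html))
rates = tables[0]

# Serialize with ';' as the separator, as in the question.
# Passing a file path as the first argument would write to disk instead.
csv_text = rates.to_csv(sep=";", index=False)
print(csv_text)
```

Numeric-looking columns are converted to numbers by default, so a code such as 036 may lose its leading zero. The converters argument is one way to keep such a column as text.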
You'd be much better off picking out specific cells in the table. The td cells with the cell_c class contain the data you are interested in, and the last one is always the currency exchange rate:
rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    if 'cell_c' in cols[0]['class']:
        # currency row
        digital_code, letter_code, units, name, rate = [c.text for c in cols]
        print digital_code, letter_code, units, name, rate
With the data in separate variables, you can now turn the text into decimal numbers, store them in a database, or do whatever else you need.
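As a rough sketch of that last step, the text can be converted with the standard decimal module and written out with the csv module. This is Python 3, and the sample rows are invented stand-ins for the cell text extracted above. The parse_row helper and the dictionary keys are hypothetical names chosen for illustration.

```python
import csv
import io
from decimal import Decimal

# Hypothetical cell text in the order used above:
# digital_code, letter_code, units, name, rate (values are made up).
rows = [
    ("036", "AUD", "100", "Australian Dollar", "743.2100"),
    ("978", "EUR", "100", "Euro", "1085.5000"),
]


def parse_row(cells):
    """Turn one row of cell text into typed values."""
    digital_code, letter_code, units, name, rate = (c.strip() for c in cells)
    return {
        "digital_code": digital_code,  # kept as text to preserve leading zeros
        "letter_code": letter_code,
        "units": int(units),
        "name": name,
        "rate": Decimal(rate),  # exact decimal, avoids float rounding
    }


parsed = [parse_row(r) for r in rows]

# Write to an in-memory buffer; open("rates.csv", "w", newline="")
# could be used instead to write a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(parsed[0]), delimiter=";")
writer.writeheader()
writer.writerows(parsed)
csv_text = buffer.getvalue()
print(csv_text)
```

Decimal is used rather than float so that the rates keep exactly the digits the page shows.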