How to parse an HTML table with Python and BeautifulSoup and write it to CSV

Tags: python, beautifulsoup

I am trying to parse an HTML page, fetch the currency values, and write them to a CSV file. I have the following code:

#!/usr/bin/env python

import urllib2
from BeautifulSoup import BeautifulSoup

contenturl = "http://www.bank.gov.ua/control/en/curmetal/detail/currency?period=daily"
soup = BeautifulSoup(urllib2.urlopen(contenturl).read())

table = soup.find('div', attrs={'class': 'content'})

rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    for td in cols:
        text = td.find(text=True) + ';'
        print text,
    print

The problem is that I do not know how to retrieve only the currency values. I tried a regexp like '^[0-9]{3}' (starts with 3 digits), but it doesn't work.

asked Mar 06 '13 14:03

user2140323

1 Answer

You'd be much better off picking out specific cells in the table. The td cells with the cell_c class contain data you are interested in, and the last one is always the currency exchange rate:

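rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    # skip rows without td cells (e.g. header rows) or without a class attribute
    if cols and 'cell_c' in cols[0].get('class', ''):
        # currency row: unpack its five cells into separate variables
        digital_code, letter_code, units, name, rate = [c.text for c in cols]
        print digital_code, letter_code, units, name, rate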

With the data in separate variables, you can now convert the text to decimal numbers, store the values in a database, or process them however you need.
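
As a rough sketch of the "write to CSV" step from the question, the five values per row could be converted and written with the standard csv and decimal modules. This is not part of the original answer: it is written for Python 3 (unlike the Python 2 snippets above), the sample rows are made-up placeholder values rather than real rates, and the names to_record and write_rates are hypothetical helpers. The ';' delimiter mirrors the separator used in the question's code.

```python
import csv
import io
from decimal import Decimal

# Hypothetical sample rows shaped like the five cells unpacked above:
# (digital_code, letter_code, units, name, rate). The values are made up.
SAMPLE_ROWS = [
    ("036", "AUD", "100", "Australian Dollar", "817.0094"),
    ("826", "GBP", "100", "Pound Sterling", "1205.3451"),
]


def to_record(cells):
    """Strip whitespace from each cell and convert units and rate to numbers."""
    digital_code, letter_code, units, name, rate = [c.strip() for c in cells]
    return [digital_code, letter_code, int(units), name, Decimal(rate)]


def write_rates(rows, out):
    """Write a header line plus one line per currency to an open text file object."""
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    writer.writerow(["digital_code", "letter_code", "units", "name", "rate"])
    for cells in rows:
        writer.writerow(to_record(cells))


# Write to an in-memory buffer here so the sketch has no side effects.
buffer = io.StringIO()
write_rates(SAMPLE_ROWS, buffer)
print(buffer.getvalue())
```

To write a real file instead of the in-memory buffer, the same function should work with something like open("rates.csv", "w", newline="") passed as out, where rates.csv is just a placeholder name. Decimal is used rather than float so that the rate keeps exactly the digits shown on the page.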

answered Sep 28 '22 04:09

Martijn Pieters
