Reading some content from a web page in Python

Question

I am trying to read some data from a web page using a Python module.

I can read the page, but I am having difficulty parsing the data and extracting the information I need.

My code is below. Any help is appreciated.

#!/usr/bin/python2.7 -tt

import urllib2

def Connect2Web():
  # Fetch the page and read the whole response body into a string.
  aResp = urllib2.urlopen("https://uniservices1.uobgroup.com/secure/online_rates/gold_and_silver_prices.jsp")
  web_pg = aResp.read()

  print web_pg

# Define a main() function that fetches and prints the page.
def main():
  Connect2Web()

# This is the standard boilerplate that calls the main function.
if __name__ == '__main__':
  main()

When I print this, I get the whole web page.

I want to extract some information from it (e.g. find "SILVER PASSBOOK ACCOUNT" and get its rate), but I am having difficulty parsing this HTML document.

Keith · Accepted Answer

It's not recommended to use regular expressions to match XML/HTML, although it can sometimes work. It's better to use an HTML parser and a DOM API. Here's an example:

import html5lib
import urllib2

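aResp = urllib2.urlopen("https://uniservices1.uobgroup.com/secure/online_rates/gold_and_silver_prices.jsp")
t = aResp.read()
# Parse the HTML into a DOM tree and collect every table row.
dom = html5lib.parse(t, treebuilder="dom")
trlist = dom.getElementsByTagName("tr")
# Third row from the end, second child node; these positions depend on the page layout.
print trlist[-3].childNodes[1].firstChild.childNodes[0].nodeValue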

You could iterate over trlist to find the data you are interested in.
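Iterating over the rows could look roughly like the sketch below. It is not the code from this answer: it is written for Python 3 and swaps html5lib for the standard library's html.parser so that it runs without third-party packages. The names RowCollector and find_row are hypothetical, and SAMPLE is made-up markup with made-up values, since the real page may be structured differently.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every table cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <b> also ends up in the open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def find_row(html, label):
    """Return the cells of the first row whose first cell equals label."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    for row in parser.rows:
        if row and row[0] == label:
            return row
    return None


# Made-up sample markup and values, for illustration only.
SAMPLE = """
<table>
<tr><td><b>GOLD SAVINGS ACCOUNT</b></td><td>SGD</td><td>1 GM</td><td>70.00</td><td>69.00</td></tr>
<tr><td><b>SILVER PASSBOOK ACCOUNT</b></td><td>SGD</td><td>1 OZ</td><td>30.00</td><td>29.00</td></tr>
</table>
"""

row = find_row(SAMPLE, "SILVER PASSBOOK ACCOUNT")
print(row)
```

Looking a row up by its label rather than by position (such as trlist[-3]) should keep working if rows are added to or removed from the table.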

Added from comment: html5lib is a third-party module. See the html5lib site. easy_install or pip should be able to install it.

max taldykin · Answer

It's possible to use regexps to get the required data:

import urllib2
import re

def Connect2Web():
  aResp = urllib2.urlopen("https://uniservices1.uobgroup.com/secure/online_rates/gold_and_silver_prices.jsp")
  web_pg = aResp.read()

  # The label cell followed by four value cells. (.*?) is non-greedy, so each
  # group stops at the nearest </td> instead of swallowing later cells.
  pattern = "<td><b>SILVER PASSBOOK ACCOUNT</b></td>" + "<td>(.*?)</td>" * 4
  m = re.search(pattern, web_pg)
  if m:
    print "SILVER PASSBOOK ACCOUNT:"
    print "\tCurrency:", m.group(1)
    print "\tUnit:", m.group(2)
    print "\tBank Sells:", m.group(3)
    print "\tBank Buys:", m.group(4)
  else:
    print "Nothing found"

Don't forget to re.compile the pattern if you are doing your matches in a loop.
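As a rough illustration of compiling once and reusing the pattern, here is a minimal sketch written for Python 3. The names ROW_PATTERN and extract_rates are hypothetical, the pattern is generalized to capture any account name rather than only the silver one, and SAMPLE is made-up markup with made-up values that assumes each row sits on a single line.

```python
import re

# Compiled once at import time and reused for every match.
# "(.*?)" is non-greedy, so each group stops at the nearest </td>.
ROW_PATTERN = re.compile(
    r"<td><b>(?P<name>[^<]+)</b></td>" + r"<td>(.*?)</td>" * 4
)


def extract_rates(html):
    """Return {account name: (currency, unit, bank_sells, bank_buys)}."""
    rates = {}
    for m in ROW_PATTERN.finditer(html):
        # groups() starts with the named group, followed by the four cells.
        rates[m.group("name")] = m.groups()[1:]
    return rates


# Made-up sample markup and values, for illustration only.
SAMPLE = (
    "<tr><td><b>GOLD SAVINGS ACCOUNT</b></td>"
    "<td>SGD</td><td>1 GM</td><td>70.00</td><td>69.00</td></tr>\n"
    "<tr><td><b>SILVER PASSBOOK ACCOUNT</b></td>"
    "<td>SGD</td><td>1 OZ</td><td>30.00</td><td>29.00</td></tr>\n"
)

rates = extract_rates(SAMPLE)
currency, unit, bank_sells, bank_buys = rates["SILVER PASSBOOK ACCOUNT"]
print("Bank Sells:", bank_sells)
print("Bank Buys:", bank_buys)
```

The re module also keeps an internal cache of recently used patterns, so the gain from compiling explicitly is often modest; the main benefit is that the pattern gets a name and is built in one place.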

Tags:

python

tush1r
