Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to extract certain parts of a web page in Python

Target web page: http://www.immi.gov.au/skilled/general-skilled-migration/estimated-allocation-times.htm

The section I want to extract:

  <tr>
  <td>Skilled &ndash; Independent (Residence) subclass 885<br />online</td>
  <td>N/A</td>
  <td>N/A</td>
  <td>N/A</td>
  <td>15 May 2011</td>
  <td>N/A</td>
  </tr>

Once the code finds this section by searching the keyword "subclass 885
online
", it should then print the date which is within the 5th tag which is "15 May 2011" as shown above.

It's just a monitor for myself to keep an eye on the progress of my immigration application.

like image 408
jiaoziren Avatar asked Aug 14 '11 04:08

jiaoziren


1 Answers

"Beau--ootiful Soo--oop!

Beau--ootiful Soo--oop!

Soo--oop of the e--e--evening,

Beautiful, beauti--FUL SOUP!"

--Lewis Carroll, Alice's Adventures in Wonderland

I think this is exactly what he had in mind!

The Mock Turtle would probably do something like this:

>>> from BeautifulSoup import BeautifulSoup
>>> import urllib2
>>> url = 'http://www.immi.gov.au/skilled/general-skilled-migration/estimated-allocation-times.htm'
>>> page = urllib2.urlopen(url)
>>> soup = BeautifulSoup(page)
>>> for row in soup.html.body.findAll('tr'):
...     data = row.findAll('td')
...     if data and 'subclass 885online' in data[0].text:
...         print data[4].text
... 
15 May 2011

But I'm not sure it would help, since that date has already passed!

Good luck with the application!

like image 112
Johnsyweb Avatar answered Oct 10 '22 04:10

Johnsyweb