Given the following XML, how do I fetch it and then parse it to get the value of <age>?
<boardgames>
<boardgame objectid="13">
<yearpublished>1995</yearpublished>
<minplayers>3</minplayers>
<maxplayers>4</maxplayers>
<playingtime>90</playingtime>
<age>10</age>
<name sortindex="1">Catan</name>
...
I'm currently trying:
result = urlfetch.fetch(url=game_url)
xml = ElementTree.fromstring(result.content)
But I'm not sure I'm on the right track. When I try to parse the response, I get errors (I think because the XML is not valid).
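xml.findtext('boardgame/age') or xml.findtext('.//age') would normally get you the 10 inside <age>10</age>. Paths are relative to the root <boardgames> element, so xml.findtext('age') and xml.findtext('boardgames/age') find nothing and return None. In your case, though, the parsing itself appears to fail because the XML is invalid, and ElementTree does a rather poor job of parsing invalid XML in my experience.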
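As a minimal sketch of how ElementTree paths behave, the snippet below parses a well-formed stand-in for the sample from the question. The closing tags are an assumption added for illustration, since the original listing is truncated. It also shows that a truncated document raises ParseError instead of producing a partial tree.

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for the question's sample. The closing tags are an
# assumption: the original listing is cut off after <name>.
sample = """<boardgames>
  <boardgame objectid="13">
    <yearpublished>1995</yearpublished>
    <age>10</age>
    <name sortindex="1">Catan</name>
  </boardgame>
</boardgames>"""

root = ET.fromstring(sample)  # root is the <boardgames> element itself

direct_child = root.findtext('age')        # None: <age> is not a direct child of the root
by_path = root.findtext('boardgame/age')   # path relative to the root element
anywhere = root.findtext('.//age')         # first <age> among all descendants

# A truncated document is rejected outright with ParseError.
try:
    ET.fromstring('<boardgames><boardgame><age>10</age>')
    truncated_ok = True
except ET.ParseError:
    truncated_ok = False
```

Here by_path and anywhere both hold the string '10', while direct_child is None.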
Instead, use BeautifulSoup, which handles invalid XML well.
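import urllib2
from BeautifulSoup import BeautifulSoup  # assumes BeautifulSoup 3; for bs4 use: from bs4 import BeautifulSoup
content = urllib2.urlopen('http://boardgamegeek.com/xmlapi/boardgame/13').read()
soup = BeautifulSoup(content)
print soup.find('age').string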
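If installing BeautifulSoup is not an option, a similarly forgiving approach can be sketched in Python 3 with the standard-library html.parser module, which does not raise on unclosed tags. This is a swapped-in technique, not the one used above, and it is only a rough sketch: FirstTagText, lenient_find_text, and fetch are hypothetical names, html.parser lowercases tag names, and fetch is defined but not called so the example runs offline.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch(url):
    """Download a URL and return the body as text (hypothetical helper, not called below)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


class FirstTagText(HTMLParser):
    """Collect the text of the first occurrence of one tag, tolerating broken markup."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag.lower()   # html.parser reports tag names in lowercase
        self.inside = False
        self.text = None

    def handle_starttag(self, tag, attrs):
        # Only the first matching tag is recorded.
        if tag == self.tag and self.text is None:
            self.inside = True
            self.text = ""

    def handle_endtag(self, tag):
        if tag == self.tag:
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.text += data


def lenient_find_text(markup, tag):
    """Return the stripped text of the first <tag>, or None if it never appears."""
    parser = FirstTagText(tag)
    parser.feed(markup)
    parser.close()
    return parser.text.strip() if parser.text is not None else None


# Truncated sample, as in the question: the outer closing tags are missing.
broken = '<boardgames><boardgame objectid="13"><age>10</age><name sortindex="1">Catan</name>'
age = lenient_find_text(broken, "age")
```

With a live connection, and assuming the URL above still serves this document, the two helpers would combine as lenient_find_text(fetch('http://boardgamegeek.com/xmlapi/boardgame/13'), 'age').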