How do I specify a namespace for an xml tag with BeautifulSoup4?

I'm using beautifulsoup4 like this:

from bs4 import BeautifulSoup
xml_string = u"""<something><dcterms:valid><![CDATA[

            start=2012-02-24T00:00:00Z
            end=2030-12-30T00:00:00Z
            scheme=W3C-DTF]]>
        </dcterms:valid></something>"""
soup = BeautifulSoup(xml_string, 'xml')
soup.find('dcterms:valid')  # returns None
soup.find('valid')  # returns the dcterms:valid node

Is there a way to specify the namespace in the soup.find(tagname) call so I can be precise about what I'm looking for?

asked Aug 19 '13 14:08 by dar


1 Answer

You don't need to specify 'xml' when parsing (Edit: unless the document contains CDATA, as pointed out in the comments).

Here is the sample code that worked for me:

>>> soup = BeautifulSoup(xml_string)
>>> soup.find('valid')
>>> soup.find('dcterms:valid')
<dcterms:valid start="2012-02-24T00:00:00Z" end="2030-12-30T00:00:00Z" scheme="W3C-DTF"></dcterms:valid>

>>> item = soup.find('dcterms:valid')
>>> item['start']
u'2012-02-24T00:00:00Z'
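
If the lookup has to match the namespace itself rather than just the prefix text, one option outside BeautifulSoup is the standard library's xml.etree.ElementTree, which resolves prefixes against namespace URIs. The following is only a minimal sketch of that alternative. It assumes the dcterms prefix is bound to http://purl.org/dc/terms/; the XML in the question never declares the prefix, so the xmlns:dcterms declaration and the extra unqualified valid element are added here purely for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: dcterms is bound to the Dublin Core terms URI.
# The XML in the question does not declare it, so a declaration is added.
DCTERMS_NS = "http://purl.org/dc/terms/"

xml_string = """<something xmlns:dcterms="http://purl.org/dc/terms/">
  <dcterms:valid><![CDATA[
            start=2012-02-24T00:00:00Z
            end=2030-12-30T00:00:00Z
            scheme=W3C-DTF]]>
  </dcterms:valid>
  <valid>unqualified element that should not match</valid>
</something>"""

root = ET.fromstring(xml_string)

# Map the prefix used in the query to the namespace URI.
namespaces = {"dcterms": DCTERMS_NS}

# Matches only the element that lives in the dcterms namespace.
node = root.find("dcterms:valid", namespaces)

# Split the key=value lines held in the CDATA section into a dict.
fields = dict(
    line.strip().split("=", 1)
    for line in node.text.splitlines()
    if "=" in line
)

print(node.tag)
print(fields["start"])
```

Because the match is made on the namespace URI, the unqualified valid element is skipped by the prefixed query and is only found by root.find("valid").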

answered Sep 27 '22 18:09 by Kalyan02