I'm using beautifulsoup4 like this:
from bs4 import BeautifulSoup
xml_string = u"""<something><dcterms:valid><![CDATA[
start=2012-02-24T00:00:00Z
end=2030-12-30T00:00:00Z
scheme=W3C-DTF]]>
</dcterms:valid></something>"""
soup = BeautifulSoup(xml_string, 'xml')
soup.find('dcterms:valid') # returns None
soup.find('valid') # returns the dcterms:valid node
Is there a way to specify the namespace in soup.find(tagname),
so I can be precise about which element I'm looking for?
You don't need to specify 'xml' when parsing (Edit: unless the document contains CDATA, as pointed out in the comments).
Here is a sample that worked for me:
>>> soup = BeautifulSoup(xml_string)
>>> soup.find('valid')
>>> soup.find('dcterms:valid')
<dcterms:valid start="2012-02-24T00:00:00Z" end="2030-12-30T00:00:00Z" scheme="W3C-DTF"></dcterms:valid>
>>> item = soup.find('dcterms:valid')
>>> item['start']
u'2012-02-24T00:00:00Z'
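For reference, here is a minimal, self-contained sketch of the same lookup with the parser named explicitly. It makes two assumptions that are not in the original post: the input is reconstructed from the tag printed in the session above (values carried as attributes rather than CDATA), and 'html.parser' is chosen as the non-XML parser. An HTML parser does no namespace handling, so "dcterms:valid" is matched as a plain tag name that happens to contain a colon, not as a namespace-qualified element.

```python
from bs4 import BeautifulSoup

# Assumed input: reconstructed from the tag printed in the session above,
# which carries attributes rather than the question's CDATA payload.
xml_string = (
    '<something>'
    '<dcterms:valid start="2012-02-24T00:00:00Z" '
    'end="2030-12-30T00:00:00Z" scheme="W3C-DTF"></dcterms:valid>'
    '</something>'
)

# Name the parser explicitly (an assumption; the answer passes none).
# html.parser keeps "dcterms:valid" as one opaque tag name.
soup = BeautifulSoup(xml_string, 'html.parser')

item = soup.find('dcterms:valid')  # matches on the full prefixed name
bare = soup.find('valid')          # no tag is literally named "valid"

print(item.name)
print(item['start'])
print(bare)
```

Note that this is the reverse of the behaviour in the question: with the 'xml' parser the prefix is split off and only 'valid' matches, while with an HTML parser only the full 'dcterms:valid' matches.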