How do you search for namespace-specific tags in XML using Elementtree in Python?
I have an XML/RSS document like:
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:wp="http://wordpress.org/export/1.0/"
>
<channel>
<title>sometitle</title>
<pubDate>Tue, 28 Aug 2012 22:36:02 +0000</pubDate>
<generator>http://wordpress.org/?v=2.5.1</generator>
<language>en</language>
<wp:wxr_version>1.0</wp:wxr_version>
<wp:category><wp:category_nicename>apache</wp:category_nicename><wp:category_parent></wp:category_parent><wp:cat_name><![CDATA[Apache]]></wp:cat_name></wp:category>
</channel>
</rss>
But when I try and find all "wp:category" tags by doing:
import xml.etree.ElementTree as xml
tree = xml.parse(fn)
doc = tree.getroot()
categories = doc.findall('channel/wp:category')
I get the error:
SyntaxError: prefix 'wp' not found in prefix map
Searching for any non-namespace specific fields works just fine. What am I doing wrong?
You need to handle the namespace prefixes, either by using iterparse and handling the event directly or by explicitly declaring the prefixes you're interested in before parsing. Depending on what you're trying to do, I will admit in my lazier moments I just strip all the prefixes out with a string replace before parsing the XML.
EDIT: this similar question might help.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With