I have an XML document in the following format:
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/"
      xmlns:gsa="http://schemas.google.com/gsa/2007">
  ...
  <entry>
    <id>https://ip.ad.dr.ess:8000/feeds/diagnostics/smb://ip.ad.dr.ess/path/to/file</id>
    <updated>2011-11-07T21:32:39.795Z</updated>
    <app:edited xmlns:app="http://purl.org/atom/app#">2011-11-07T21:32:39.795Z</app:edited>
    <link rel="self" type="application/atom+xml" href="https://ip.ad.dr.ess:8000/feeds/diagnostics"/>
    <link rel="edit" type="application/atom+xml" href="https://ip.ad.dr.ess:8000/feeds/diagnostics"/>
    <gsa:content name="entryID">smb://ip.ad.dr.ess/path/to/directory</gsa:content>
    <gsa:content name="numCrawledURLs">7</gsa:content>
    <gsa:content name="numExcludedURLs">0</gsa:content>
    <gsa:content name="type">DirectoryContentData</gsa:content>
    <gsa:content name="numRetrievalErrors">0</gsa:content>
  </entry>
  <entry>
    ...
  </entry>
  ...
</feed>
I need to retrieve all entry elements using XPath in lxml. My problem is that I can't figure out how to use an empty namespace. I have tried the following examples, but none work. Please advise.
import lxml.etree as et
tree = et.fromstring(xml)
The various things I have tried are:
for node in tree.xpath('//entry'):
or
namespaces = {None: "http://www.w3.org/2005/Atom",
              "openSearch": "http://a9.com/-/spec/opensearchrss/1.0/",
              "gsa": "http://schemas.google.com/gsa/2007"}
for node in tree.xpath('//entry', namespaces=namespaces):
or
for node in tree.xpath('//\"{http://www.w3.org/2005/Atom}entry\"'):
At this point I just don't know what to try. Any help is greatly appreciated.
XPath treats the empty prefix as the null namespace. In other words, only prefixes mapped to namespaces can be used in XPath queries. This means that if you want to query against a namespace in an XML document, even if it is the default namespace, you need to define a prefix for it.
If all you have in your section of code is the element and you want that element's XPath, then element.getroottree().getpath(element) will do the job.
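A minimal sketch of the getpath() call, assuming a trimmed-down feed (the `sample` string is made up for illustration). For elements in a default namespace the generated path may use positional `*` steps rather than element names, so the example checks the path by evaluating it again instead of relying on its exact text:

```python
import lxml.etree as et

# Hypothetical sample document, not the real feed.
sample = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>first</id></entry>
  <entry><id>second</id></entry>
</feed>"""

root = et.fromstring(sample)
second = root[1]  # the second <entry> element

# Ask the owning tree for an XPath expression that locates this element.
tree = second.getroottree()
path = tree.getpath(second)
print(path)

# Evaluating the generated path should lead back to the same element.
found = tree.xpath(path)
print(found[0] is second)
```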
In an XML document, namespaces are used to give elements and attributes unique names. A namespace declaration binds a prefix (or the default, empty prefix) to a URI. The URI acts as an identifier for the namespace; it does not have to point to a retrievable document.
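One way to avoid retyping the URIs is to start from the declarations lxml already collected on the root element. This is a sketch under assumptions: the `sample` document is made up, and `atom` is an arbitrary prefix chosen here to stand in for the default namespace.

```python
import lxml.etree as et

# Hypothetical sample with a default namespace and one prefixed namespace.
sample = b"""<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gsa="http://schemas.google.com/gsa/2007">
  <entry>
    <gsa:content name="type">DirectoryContentData</gsa:content>
  </entry>
</feed>"""

root = et.fromstring(sample)

# root.nsmap stores the default namespace under the key None, which
# xpath() does not accept, so give it a made-up prefix ("atom") instead.
ns = {("atom" if prefix is None else prefix): uri
      for prefix, uri in root.nsmap.items()}

types = root.xpath('//atom:entry/gsa:content[@name="type"]/text()',
                   namespaces=ns)
print(ns)
print(types)
```

Only declarations visible on the root element end up in the map; a namespace declared further down (such as `app` in the question's feed) would have to be added by hand.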
Something like this should work:
import lxml.etree as et

ns = {"atom": "http://www.w3.org/2005/Atom"}
tree = et.fromstring(xml)
for node in tree.xpath('//atom:entry', namespaces=ns):
    print(node)
See also http://lxml.de/xpathxslt.html#namespaces-and-prefixes.
Alternative:
for node in tree.xpath("//*[local-name() = 'entry']"):
    print(node)
Use the findall method.
for item in tree.findall('{http://www.w3.org/2005/Atom}entry'):
    print(item)
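Note that a bare `{uri}entry` pattern in findall matches only direct children of the element it is called on, which happens to fit this feed. A minimal sketch of searching at any depth instead, assuming a made-up `sample` document (the `group` wrapper element and the `atom` prefix are hypothetical):

```python
import lxml.etree as et

ATOM = "http://www.w3.org/2005/Atom"

# Hypothetical sample: one entry directly under the root, one nested deeper.
sample = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>top-level</id></entry>
  <group><entry><id>nested</id></entry></group>
</feed>"""

root = et.fromstring(sample)

# Direct children of the root only.
children = root.findall(f'{{{ATOM}}}entry')

# Any depth, using a prefix map instead of the {uri}tag notation.
ns = {"atom": ATOM}
anywhere = root.findall('.//atom:entry', namespaces=ns)

# iter() also walks the whole subtree and accepts the {uri}tag form.
iterated = list(root.iter(f'{{{ATOM}}}entry'))

print(len(children), len(anywhere), len(iterated))  # 1 2 2
```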