How do I use empty namespaces in an lxml XPath query?

Tags: python, xml, xpath, lxml

I have an XML document in the following format:

    <feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/" xmlns:gsa="http://schemas.google.com/gsa/2007">
      ...
      <entry>
        <id>https://ip.ad.dr.ess:8000/feeds/diagnostics/smb://ip.ad.dr.ess/path/to/file</id>
        <updated>2011-11-07T21:32:39.795Z</updated>
        <app:edited xmlns:app="http://purl.org/atom/app#">2011-11-07T21:32:39.795Z</app:edited>
        <link rel="self" type="application/atom+xml" href="https://ip.ad.dr.ess:8000/feeds/diagnostics"/>
        <link rel="edit" type="application/atom+xml" href="https://ip.ad.dr.ess:8000/feeds/diagnostics"/>
        <gsa:content name="entryID">smb://ip.ad.dr.ess/path/to/directory</gsa:content>
        <gsa:content name="numCrawledURLs">7</gsa:content>
        <gsa:content name="numExcludedURLs">0</gsa:content>
        <gsa:content name="type">DirectoryContentData</gsa:content>
        <gsa:content name="numRetrievalErrors">0</gsa:content>
      </entry>
      <entry>
        ...
      </entry>
      ...
    </feed>

I need to retrieve all entry elements using XPath in lxml. My problem is that I can't figure out how to use an empty namespace. I have tried the following examples, but none of them work. Please advise.

    import lxml.etree as et

    tree=et.fromstring(xml)

The various things I have tried are:

    for node in tree.xpath('//entry'):

or

    namespaces = {None:"http://www.w3.org/2005/Atom"
                 ,"openSearch":"http://a9.com/-/spec/opensearchrss/1.0/"
                 ,"gsa":"http://schemas.google.com/gsa/2007"}

    for node in tree.xpath('//entry', namespaces=ns):

or

    for node in tree.xpath('//\"{http://www.w3.org/2005/Atom}entry\"'):

At this point I just don't know what to try. Any help is greatly appreciated.

asked Nov 08 '11 16:11

ewok

2 Answers

Something like this should work:

    import lxml.etree as et

    # Bind the default (Atom) namespace to an explicit prefix for XPath.
    ns = {"atom": "http://www.w3.org/2005/Atom"}
    tree = et.fromstring(xml)
    for node in tree.xpath('//atom:entry', namespaces=ns):
        print(node)

See also http://lxml.de/xpathxslt.html#namespaces-and-prefixes.
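
The reason a prefix is needed is that XPath 1.0 has no default namespace: an unprefixed name such as `entry` only matches elements that are in no namespace at all. The following is a minimal sketch of that behaviour, not a definitive recipe. The `SAMPLE` feed is a hypothetical, cut-down stand-in for the document in the question, and the `atom` prefix is an arbitrary choice. It also shows one way to build the prefix map from the root element's `nsmap`, where lxml stores the default namespace under the key `None`.

```python
import lxml.etree as et

# Hypothetical, cut-down stand-in for the feed in the question.
SAMPLE = b"""<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gsa="http://schemas.google.com/gsa/2007">
  <entry><gsa:content name="numCrawledURLs">7</gsa:content></entry>
  <entry><gsa:content name="numCrawledURLs">3</gsa:content></entry>
</feed>"""

tree = et.fromstring(SAMPLE)

# nsmap holds the default namespace under the key None. XPath needs a real
# prefix, so rebind that entry to an arbitrary name ("atom" is an assumption).
ns = {prefix or "atom": uri for prefix, uri in tree.nsmap.items()}

# Unprefixed names only match elements in no namespace, so this finds nothing.
unprefixed = tree.xpath("//entry")

# With the prefix bound, the Atom entries are found.
prefixed = tree.xpath("//atom:entry", namespaces=ns)

# Prefixes from the document itself (gsa) can be reused in the same map.
counts = tree.xpath(
    "//atom:entry/gsa:content[@name='numCrawledURLs']/text()", namespaces=ns
)
```

The prefix used in the query does not have to match anything in the document; only the namespace URI it is bound to matters.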

Alternative:

    # Match on the local name only, ignoring the namespace.
    for node in tree.xpath("//*[local-name() = 'entry']"):
        print(node)
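
One thing to keep in mind with `local-name()` is that it ignores the namespace entirely, so it also matches `entry` elements from other vocabularies. Here is a small sketch of the difference, assuming a hypothetical sample in which a second namespace (`urn:example:other`, invented for illustration) also defines an `entry` element. Adding a `namespace-uri()` test narrows the match back to Atom; the URI is passed in as an XPath variable through a keyword argument.

```python
import lxml.etree as et

ATOM = "http://www.w3.org/2005/Atom"

# Hypothetical sample: the x: namespace is invented to show a name clash.
SAMPLE = b"""<feed xmlns="http://www.w3.org/2005/Atom" xmlns:x="urn:example:other">
  <entry/>
  <x:entry/>
</feed>"""

tree = et.fromstring(SAMPLE)

# Matches every element whose local name is "entry", whatever its namespace.
by_local_name = tree.xpath("//*[local-name() = 'entry']")

# Restrict the match to the Atom namespace; $uri is filled from the keyword.
atom_only = tree.xpath(
    "//*[local-name() = 'entry' and namespace-uri() = $uri]", uri=ATOM
)
```

If the document only ever uses one vocabulary with an `entry` element, as the feed in the question appears to, the shorter `local-name()` form is enough.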

answered Oct 16 '22 03:10

mzjn

Use the findall method with the namespace URI written in front of the tag name in curly braces:

    for item in tree.findall('{http://www.w3.org/2005/Atom}entry'):
        print(item)
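
Note that a bare `{uri}entry` path only looks at direct children of the element `findall` is called on. That fits the feed in the question, where the entries sit directly under the root, but a `.//` prefix is needed to search deeper. The sketch below illustrates the difference under stated assumptions: the `SAMPLE` document and its `group` wrapper element are hypothetical and exist only to provide a nested entry.

```python
import lxml.etree as et

ATOM = "http://www.w3.org/2005/Atom"

# Hypothetical sample: one entry directly under the root, one nested deeper.
SAMPLE = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>top-level</id></entry>
  <group><entry><id>nested</id></entry></group>
</feed>"""

root = et.fromstring(SAMPLE)

# Direct children of the root only.
direct = root.findall("{%s}entry" % ATOM)

# All descendants, at any depth.
anywhere = root.findall(".//{%s}entry" % ATOM)

# findall also accepts a prefix map, as xpath() does.
with_prefix = root.findall(".//atom:entry", namespaces={"atom": ATOM})

# Child lookups need the namespace too.
ids = [entry.findtext("{%s}id" % ATOM) for entry in anywhere]
```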

answered Oct 16 '22 05:10

Seb
