
How do I use an empty (default) namespace in an lxml XPath query?

I have an xml document in the following format:

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/" xmlns:gsa="http://schemas.google.com/gsa/2007">
  ...
  <entry>
    <id>https://ip.ad.dr.ess:8000/feeds/diagnostics/smb://ip.ad.dr.ess/path/to/file</id>
    <updated>2011-11-07T21:32:39.795Z</updated>
    <app:edited xmlns:app="http://purl.org/atom/app#">2011-11-07T21:32:39.795Z</app:edited>
    <link rel="self" type="application/atom+xml" href="https://ip.ad.dr.ess:8000/feeds/diagnostics"/>
    <link rel="edit" type="application/atom+xml" href="https://ip.ad.dr.ess:8000/feeds/diagnostics"/>
    <gsa:content name="entryID">smb://ip.ad.dr.ess/path/to/directory</gsa:content>
    <gsa:content name="numCrawledURLs">7</gsa:content>
    <gsa:content name="numExcludedURLs">0</gsa:content>
    <gsa:content name="type">DirectoryContentData</gsa:content>
    <gsa:content name="numRetrievalErrors">0</gsa:content>
  </entry>
  <entry>
    ...
  </entry>
  ...
</feed>

I need to retrieve all entry elements using xpath in lxml. My problem is that I can't figure out how to use an empty namespace. I have tried the following examples, but none work. Please advise.

import lxml.etree as et

tree = et.fromstring(xml)

The various things I have tried are:

for node in tree.xpath('//entry'): 

or

ns = {None: "http://www.w3.org/2005/Atom",
      "openSearch": "http://a9.com/-/spec/opensearchrss/1.0/",
      "gsa": "http://schemas.google.com/gsa/2007"}

for node in tree.xpath('//entry', namespaces=ns):

or

for node in tree.xpath('//\"{http://www.w3.org/2005/Atom}entry\"'): 

At this point I just don't know what to try. Any help is greatly appreciated.

Asked by ewok on Nov 08 '11 at 16:11


2 Answers

Something like this should work:

import lxml.etree as et

ns = {"atom": "http://www.w3.org/2005/Atom"}
tree = et.fromstring(xml)
for node in tree.xpath('//atom:entry', namespaces=ns):
    print(node)

See also http://lxml.de/xpathxslt.html#namespaces-and-prefixes.

Alternatively, match on the local name and ignore the namespace altogether:

for node in tree.xpath("//*[local-name() = 'entry']"):
    print(node)
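For a version that runs end to end, here is a minimal sketch that applies both queries to a trimmed copy of the feed from the question. The sample XML and the names `SAMPLE`, `by_prefix`, `by_local_name`, and `unprefixed` are assumptions added for illustration. The prefix `atom` is arbitrary; only the URI it maps to has to match the document's default namespace.

```python
import lxml.etree as et

# Trimmed, hypothetical sample modelled on the feed in the question.
SAMPLE = b"""<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gsa="http://schemas.google.com/gsa/2007">
  <entry>
    <id>first</id>
    <gsa:content name="numCrawledURLs">7</gsa:content>
  </entry>
  <entry>
    <id>second</id>
    <gsa:content name="numCrawledURLs">0</gsa:content>
  </entry>
</feed>"""

tree = et.fromstring(SAMPLE)

# Bind the document's default namespace URI to a prefix of our choosing.
ns = {"atom": "http://www.w3.org/2005/Atom"}

# Query with the prefix.
by_prefix = tree.xpath("//atom:entry", namespaces=ns)

# Query by local name, ignoring the namespace.
by_local_name = tree.xpath("//*[local-name() = 'entry']")

# An unprefixed name only matches elements in no namespace,
# which is why the first attempt in the question finds nothing.
unprefixed = tree.xpath("//entry")

for node in by_prefix:
    # Child lookups need the namespace as well.
    print(node.findtext("atom:id", namespaces=ns))
```

Note that `local-name()` would also match an `entry` element from any other namespace, so the prefixed query is the stricter of the two.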
Answered by mzjn on Oct 16 '22 at 03:10


Use the findall method with the namespace URI written in braces before the tag name:

for item in tree.findall('{http://www.w3.org/2005/Atom}entry'):
    print(item)
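One detail worth knowing: `findall` with a bare `{uri}tag` path matches only direct children of the element it is called on, which is enough here because every `entry` sits directly under `feed`. The sketch below, using a hypothetical two-entry sample and illustrative names (`ATOM`, `SAMPLE`, `children`, `anywhere`, `prefixed`), shows that form next to a `.//` search and a prefix mapping. As an added assumption, it falls back to the standard library's `xml.etree.ElementTree`, whose `findall` accepts the same path syntax, if lxml is not installed.

```python
try:
    import lxml.etree as et
except ImportError:
    # Assumption: the stdlib ElementTree is an acceptable stand-in here,
    # since findall() takes the same paths in both libraries.
    import xml.etree.ElementTree as et

ATOM = "http://www.w3.org/2005/Atom"

# Hypothetical sample; the real feed in the question has more fields.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>first</id></entry>
  <entry><id>second</id></entry>
</feed>"""

tree = et.fromstring(SAMPLE)

# Direct children of the root only.
children = tree.findall("{%s}entry" % ATOM)

# Descendants at any depth.
anywhere = tree.findall(".//{%s}entry" % ATOM)

# Same search as `children`, written with a prefix mapping.
prefixed = tree.findall("atom:entry", namespaces={"atom": ATOM})

ids = [entry.findtext("{%s}id" % ATOM) for entry in children]
print(ids)
```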
Answered by Seb on Oct 16 '22 at 05:10