Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

XPath with lxml failing

Tags:

python

xpath

lxml

I am trying to query with XPath an html document parsed with lxml. The document is a straight html-only download of the page about Plastic in Wikipedia. Then I parse it with lxml disabling entity substitution to avoid an error with '&reg'

from lxml import etree
root = etree.parse("plastic.html",etree.XMLParser(resolve_entities=False))

Then, I retrieve the namespace url

htmltag = root.iter().next()
nsurl = htmltag.nsmap.values()[0]

Now, I would like to use xpath queries on either 'root' or 'htmltag', but I am unable to do so. I have tried different ways, but the following seems to me the most correct form, which yields errors anyway.

root.xpath('//ns:body',namespace={'ns',nsurl})

And this is what I get

XPathResultError: Unknown return type: dict

I am running the commands in an IPython console, but I don't think that might be the problem. What am I doing wrong?

like image 541
el_technic0 Avatar asked Feb 28 '12 00:02

el_technic0


People also ask

How do I know if LXML is installed?

Check lxml Version Python To check which version of lxml is installed, use pip show lxml or pip3 show lxml in your CMD/Powershell (Windows), or terminal (macOS/Linux/Ubuntu) to obtain the output major.

Is LXML standard Python?

lxml is not written in plain Python, because it interfaces with two C libraries: libxml2 and libxslt. Accessing them at the C-level is required for performance reasons.

What is LXML Etree in Python?

The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt. It is unique in that it combines the speed and XML feature completeness of these libraries with the simplicity of a native Python API, mostly compatible but superior to the well-known ElementTree API.


1 Answers

This is a simple miss spell. You should use namespaces instead of namespace.

like image 165
Takaki Taniguchi Avatar answered Oct 01 '22 18:10

Takaki Taniguchi