Python lxml XPath problem

Tags: python, xpath, lxml

I'm trying to print or save the HTML of a particular element from a web page.
I retrieved the element's XPath from Firebug.

All I want is to save this element to a file, but I can't get it to work.
(I tried the XPath both with and without /text() at the end.)

I would appreciate any help or past experience.
Thanks, David

import urllib2,StringIO
from lxml import etree

url='http://www.tutiempo.net/en/Climate/Londres_Heathrow_Airport/12-2009/37720.htm'
seite = urllib2.urlopen(url)
html = seite.read()
seite.close()
parser = etree.HTMLParser()
tree = etree.parse(StringIO.StringIO(html), parser)
xpath = "/html/body/table/tbody/tr/td[2]/div/table/tbody/tr[6]/td/table/tbody/tr/td[3]/table/tbody/tr[3]/td/table/tbody/tr/td/table/tbody/tr/td/table/tbody/text()"
elem = tree.xpath(xpath)


print elem[0].strip().encode("utf-8")

asked Mar 16 '11 23:03

Trevor

1 Answer

Your XPath is rather long; try shorter expressions first and see whether they match. One likely problem is "tbody": browsers insert it automatically when they build the DOM, but the HTML markup itself usually does not contain it, so an XPath copied from Firebug may not match the document that lxml parses.
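One way to test the "tbody" theory is to strip those steps from the browser-copied path before handing it to lxml. The following is only a sketch: strip_tbody is a hypothetical helper (not part of lxml), and it assumes the page's markup really contains no tbody elements, so that positions like tr[6] still refer to the same rows once the step is gone.

```python
import re


def strip_tbody(xpath):
    """Remove every 'tbody' step (with an optional [n] predicate) from an XPath.

    Hypothetical helper: only useful when the tbody steps were added by the
    browser and are absent from the HTML source.
    """
    return re.sub(r"/tbody(\[\d+\])?(?=/|$)", "", xpath)


# A shortened version of the path from the question.
browser_xpath = "/html/body/table/tbody/tr/td[2]/div/table/tbody/tr[6]/td"
source_xpath = strip_tbody(browser_xpath)
print(source_xpath)  # /html/body/table/tr/td[2]/div/table/tr[6]/td
```

The result can then be passed to tree.xpath(...). If it still returns an empty list, shorten the path step by step until something matches.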

Here's an example of how to use XPath results:

>>> from lxml import etree
>>> from StringIO import StringIO
>>> doc = etree.parse(StringIO("<html><body>a<something/>b</body></html>"), etree.HTMLParser())
>>> doc.xpath("/html/body/text()")
['a', 'b']

So you could just "".join(...) all text parts together if needed.
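Putting these points together, here is a minimal sketch that runs on both Python 2 and 3. The markup is made up for illustration and stands in for the real page; it deliberately has no tbody. Selecting the element itself (without /text()) and serializing it with etree.tostring is one possible way to get its HTML, not necessarily the only one.

```python
from lxml import etree

# Made-up markup for illustration; note that it contains no <tbody>.
markup = "<html><body><table><tr><td>a<b>x</b>b</td></tr></table></body></html>"
root = etree.HTML(markup)

# A browser-style path with a tbody step finds nothing in this tree.
with_tbody = root.xpath("/html/body/table/tbody/tr/td")

# Without the tbody step the cell is found.
cells = root.xpath("/html/body/table/tr/td")

# text() returns the cell's direct text nodes as a list of strings.
text_parts = root.xpath("/html/body/table/tr/td/text()")
joined = "".join(text_parts)
print(joined)  # ab

# Serialize the element itself to get its HTML (bytes on Python 3).
element_html = etree.tostring(cells[0], method="html")
```

To save the element, element_html could be written to a file opened in binary mode, for example open("element.html", "wb"), where the filename is just a placeholder.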

answered Sep 26 '22 16:09

AndiDog
