I'm trying to print/save a certain element's HTML from a web page.
I've retrieved the element's XPath from Firebug.
All I want is to save this element to a file, but I can't seem to get it to work
(I tried the XPath both with and without a /text() at the end).
I would appreciate any help or past experience.
Thanks, David
import urllib2, StringIO
from lxml import etree

url = 'http://www.tutiempo.net/en/Climate/Londres_Heathrow_Airport/12-2009/37720.htm'

# Fetch the page
seite = urllib2.urlopen(url)
html = seite.read()
seite.close()

# Parse the HTML into an element tree
parser = etree.HTMLParser()
tree = etree.parse(StringIO.StringIO(html), parser)

# The XPath copied from Firebug
xpath = "/html/body/table/tbody/tr/td[2]/div/table/tbody/tr[6]/td/table/tbody/tr/td[3]/table/tbody/tr[3]/td/table/tbody/tr/td/table/tbody/tr/td/table/tbody/text()"

elem = tree.xpath(xpath)
print elem[0].strip().encode("utf-8")
Your XPath is probably too long; try shorter ones and see whether they match. One likely problem is "tbody": browsers insert it automatically into the DOM, but the HTML source usually doesn't contain it, so a path copied from Firebug (which reflects the browser's DOM) may not match the document that lxml parsed.
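For example, a minimal check along those lines (the path below is just the one from your question with every /tbody step stripped out, so it's an assumption and may still need further trimming):

import urllib2, StringIO
from lxml import etree

url = 'http://www.tutiempo.net/en/Climate/Londres_Heathrow_Airport/12-2009/37720.htm'
html = urllib2.urlopen(url).read()
tree = etree.parse(StringIO.StringIO(html), etree.HTMLParser())

# The question's path with every /tbody step removed; the remaining
# steps are taken verbatim from the question and may still need tuning.
xpath_no_tbody = ("/html/body/table/tr/td[2]/div/table/tr[6]/td/table/tr"
                  "/td[3]/table/tr[3]/td/table/tr/td/table/tr/td/table/text()")
print len(tree.xpath(xpath_no_tbody)), "node(s) matched"

# If nothing matches, shorten the path one step at a time to see
# where it stops matching:
for partial in ("/html/body/table",
                "/html/body/table/tr/td[2]/div",
                "/html/body/table/tr/td[2]/div/table"):
    print "%-40s -> %d" % (partial, len(tree.xpath(partial)))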
Here's an example of how to use XPath results:
>>> from lxml import etree
>>> from StringIO import StringIO
>>> doc = etree.parse(StringIO("<html><body>a<something/>b</body></html>"), etree.HTMLParser())
>>> doc.xpath("/html/body/text()")
['a', 'b']
So if needed you can just "".join(...) all the text parts together.
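To get back to your original goal of saving to a file, here is a small sketch built on the toy document above (the file names are placeholders; for the real page, substitute whatever XPath ends up matching):

from lxml import etree
from StringIO import StringIO

doc = etree.parse(StringIO("<html><body>a<something/>b</body></html>"),
                  etree.HTMLParser())

# Join the text nodes and write them to a file
text = "".join(doc.xpath("/html/body/text()"))
f = open("element.txt", "w")
f.write(text.encode("utf-8"))
f.close()

# To save the element's HTML rather than just its text, select the
# element itself (no /text()) and serialize it with etree.tostring():
body = doc.xpath("/html/body")[0]
f = open("element.html", "w")
f.write(etree.tostring(body))
f.close()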