
How to mimic XPath 2.0 functions with lxml extension functions?

Tags: python, xpath, lxml

I am following the lxml documentation on extension functions and want to mimic XPath 2.0's upper-case() function, since lxml itself only supports XPath 1.0.

```python
import urllib.request
from lxml import html, etree

# Register upper-case() in the default (empty) XPath function namespace
ns = etree.FunctionNamespace(None)
ns['upper-case'] = lambda context, s: str.upper(s)

google_page = urllib.request.urlopen('http://www.google.com').read().decode('latin-1')
google_page_tree = html.fromstring(google_page)

# text == ['Google.com']
text = google_page_tree.xpath('//a[@id="fehl"]/text()')

# TypeError: descriptor 'upper' requires a 'str' object but received a 'list'
text = google_page_tree.xpath('//a[upper-case(@id)="FEHL"]/text()')
```

This doesn't seem to be the right way: upper-case receives an empty list [] instead of a string. Any ideas? Thank you.
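To see exactly what lxml hands to an extension function, a small diagnostic can record its argument. This is a sketch under a few assumptions: `probe` is a hypothetical function name, and an inline HTML snippet stands in for the live Google page. For `@id`, each call should receive a Python list (the node-set), containing the attribute value when the `<a>` has an id and empty when it does not, which is why `str.upper` fails.

```python
from lxml import etree, html

# Hypothetical diagnostic: record whatever lxml passes to the extension function.
seen = []

def probe(context, value):
    seen.append(value)
    return True  # keep every <a>; only the argument is of interest here

ns = etree.FunctionNamespace(None)
ns['probe'] = probe

# Inline HTML standing in for the Google page from the question.
doc = html.fromstring('<div><a id="fehl">Google.com</a><a href="/x">no id</a></div>')
doc.xpath('//a[probe(@id)]')

# Show the Python type and contents of each argument received.
for value in seen:
    print(type(value).__name__, list(value))
```

Both printed lines should start with `list`: one for the `<a>` with an id and one, empty, for the `<a>` without.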

asked Oct 01 '22 at 02:10 by ziyuang


1 Answer

I am not familiar with your XPath API, but in XPath 1.0 @id selects a node-set containing the attribute node (or an empty node-set if the element has no id attribute), and in XPath 2.0 it selects a sequence containing that attribute node. Judging by your error message, that node-set arrives in the Python function as a list, while str.upper expects a string. So instead of //a[upper-case(@id) = ...] try //a[upper-case(string(@id)) = ...]. The string() call converts the node-set to a string (the empty string when there is no id), which the Python function can consume.
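A self-contained sketch of the suggested fix, assuming a small inline HTML snippet in place of the live Google page (whose markup may well have changed). The first query keeps the question's lambda unchanged and only wraps the attribute in string(). The second half is a hypothetical variant, not part of the answer: it unwraps the list inside the Python function so that upper-case(@id) also works without string(). It only approximates XPath string() for plain strings and attribute or element node-sets.

```python
from lxml import etree, html

ns = etree.FunctionNamespace(None)
# The question's lambda, unchanged.
ns['upper-case'] = lambda context, s: str.upper(s)

doc = html.fromstring('<div><a id="fehl">Google.com</a><a href="/x">no id</a></div>')

# string(@id) turns the attribute node-set into a plain string ("" when the
# <a> has no id), so str.upper receives a str instead of a list.
fixed = doc.xpath('//a[upper-case(string(@id))="FEHL"]/text()')
print(fixed)


# Hypothetical variant: accept node-sets directly by unwrapping them in Python.
def upper_case(context, value):
    if isinstance(value, list):
        # lxml passes node-sets as lists; XPath 1.0 string() uses the first
        # node, and an empty node-set becomes "".
        value = value[0] if value else ''
        if isinstance(value, etree._Element):
            # string-value of an element: all descendant text concatenated
            value = ''.join(value.itertext())
    # Note: numbers and booleans are stringified the Python way, not the XPath way.
    return str(value).upper()

ns['upper-case'] = upper_case  # replaces the lambda registered above
direct = doc.xpath('//a[upper-case(@id)="FEHL"]/text()')
print(direct)
```

Converting with string() inside the XPath expression keeps the Python function trivial. Unwrapping inside the function makes upper-case() more forgiving, at the cost of re-implementing part of XPath's type conversion in Python.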

answered Oct 04 '22 at 20:10 by Martin Honnen