I am trying to match both "country" and "Country" using the lower-case() function in XPath. translate() is rather messy, so I would like to use lower-case() instead. I believe my Python version (2.6.6) has XPath 2.0 support, since lower-case() is only available in XPath 2.0.
What I am looking for is how to put lower-case() to use in my case. I hope the example is self-explanatory. I want ['USA', 'UK']
as output (both countries in one go, which would happen if lower-case() made "Country" and "country" evaluate as the same string).
HTML: doc.htm
<html>
<table>
<tr>
<td>
Name of the Country : <span> USA </span>
</td>
</tr>
<tr>
<td>
Name of the country : <span> UK </span>
</td>
</tr>
</table>
</html>
Python:
import lxml.html as lh
doc = open('doc.htm', 'r')
out = lh.parse(doc)
doc.close()
print out.xpath('//table/tr/td[text()[contains(. , "Country")]]/span/text()')
# Prints : [' USA ']
print out.xpath('//table/tr/td[text()[contains(. , "country")]]/span/text()')
# Prints : [' UK ']
print out.xpath('//table/tr/td[lower-case(text())[contains(. , "country")]]/span/text()')
# Prints : [<Element td at 0x15db2710>]
Update: using translate() works:
out.xpath('//table/tr/td[text()[contains(translate(., "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz") , "country")]]/span/text()')
Now the question remains: can I store the translate() part in a global variable, handlecase, and insert that variable wherever I write an XPath expression?
Something like this works:
handlecase = """translate(., "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz")"""
out.xpath('//table/tr/td[text()[contains(%s , "country")]]/span/text()' % (handlecase))
But for the sake of simplicity and readability, I want to run it like this:
out.xpath('//table/tr/td[text()[contains(handlecase , "country")]]/span/text()')
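XPath itself cannot splice in an expression fragment by name, so handlecase cannot be used literally as written above. A possible middle ground, sketched here assuming lxml is installed: precompile the expression once with etree.XPath and pass XPath variables as keyword arguments at call time. Variables carry values (strings, numbers), not expressions, so the translate() call stays in the expression while its alphabet tables and the search word become variables. The names CELL_BY_LABEL and find_by_label and the variable names $upper, $lower and $label are hypothetical choices for this sketch.

```python
import string

import lxml.html as lh
from lxml import etree

# Compiled once at module level; $upper, $lower and $label are XPath
# variables whose values are supplied on each call.
CELL_BY_LABEL = etree.XPath(
    '//table/tr/td[text()[contains(translate(., $upper, $lower), $label)]]'
    '/span/text()'
)


def find_by_label(doc, label):
    """ASCII-only case-insensitive lookup of the <span> text after a label."""
    return CELL_BY_LABEL(doc,
                         upper=string.ascii_uppercase,
                         lower=string.ascii_lowercase,
                         label=label.lower())


HTML = """<html><table>
<tr><td>Name of the Country : <span> USA </span></td></tr>
<tr><td>Name of the country : <span> UK </span></td></tr>
</table></html>"""

doc = lh.fromstring(HTML)
print([t.strip() for t in find_by_label(doc, 'Country')])  # ['USA', 'UK']
```

translate() only maps the 26 ASCII letters given in the tables, so non-ASCII capitals are left unchanged.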
I believe the easiest way to get what you want is to write an XPath extension function.
That way you could write either a lower-case()
function or a case-insensitive search function.
You can find the details here: http://lxml.de/extensions.html
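A minimal sketch of such an extension function, assuming lxml is installed. The namespace URI FN_URI and the prefix f are arbitrary placeholders, and lower_case only approximates XPath 2.0 fn:lower-case(): following XPath 1.0 string() conversion, it lowercases the string value of the first node of a node-set, or a plain string argument. lxml passes node-sets to extension functions as Python lists, and the first argument is always the evaluation context.

```python
import lxml.html as lh
from lxml import etree

# Placeholder namespace URI; it only has to match the prefix mapping
# passed to xpath() below.
FN_URI = 'http://example.com/xpath-functions'


def lower_case(context, value):
    """Rough stand-in for XPath 2.0 lower-case() under lxml's XPath 1.0."""
    if isinstance(value, list):          # node-sets arrive as Python lists
        if not value:
            return ''
        value = value[0]                 # string(): first node only
        if not isinstance(value, str):   # element node: join all its text
            value = ''.join(value.itertext())
    return str(value).lower()


# Register the function globally under FN_URI with the local name lower-case.
etree.FunctionNamespace(FN_URI)['lower-case'] = lower_case

HTML = """<html><table>
<tr><td>Name of the Country : <span> USA </span></td></tr>
<tr><td>Name of the country : <span> UK </span></td></tr>
</table></html>"""

doc = lh.fromstring(HTML)
countries = doc.xpath(
    '//table/tr/td[text()[contains(f:lower-case(.), "country")]]/span/text()',
    namespaces={'f': FN_URI},
)
print([c.strip() for c in countries])  # ['USA', 'UK']
```

A case-insensitive search function could be registered the same way, taking two arguments and returning a boolean.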
Use:
//td[translate(substring(text()[1], string-length(text()[1]) - 9),
'COUNTRY :',
'country'
)
=
'country'
]
/span/text()
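To try the same expression from the question's Python/lxml setup rather than through XSLT, a sketch assuming lxml is installed; the question's doc.htm is inlined as a string so the example is self-contained.

```python
import lxml.html as lh

HTML = """<html>
<table>
<tr>
<td>
Name of the Country : <span> USA </span>
</td>
</tr>
<tr>
<td>
Name of the country : <span> UK </span>
</td>
</tr>
</table>
</html>"""

# The XPath 1.0 expression from the answer, unchanged apart from layout.
EXPR = ("//td[translate(substring(text()[1], string-length(text()[1]) - 9),"
        " 'COUNTRY :', 'country') = 'country']/span/text()")

doc = lh.fromstring(HTML)
print([t.strip() for t in doc.xpath(EXPR)])  # ['USA', 'UK']
```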
XSLT-based verification:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select=
"//td[translate(substring(text()[1], string-length(text()[1]) - 9),
'COUNTRY :',
'country'
)
=
'country'
]
/span/text()
"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied to the provided XML document:
<html>
<table>
<tr>
<td>
Name of the Country : <span> USA </span>
</td>
</tr>
<tr>
<td>
Name of the country : <span> UK </span>
</td>
</tr>
</table>
</html>
the XPath expression is evaluated and the two selected text nodes are copied to the output:
USA UK
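Explanation:
1. The expression relies on the XPath 1.0 equivalent of the XPath 2.0 function ends-with($text, $s), which is:
$s = substring($text, string-length($text) - string-length($s) + 1)
Here the ending of interest is the 10-character string "country : " (with any capitalization), so the start position becomes string-length(text()[1]) - 10 + 1 = string-length(text()[1]) - 9.
2. The next step is to use the translate() function on that ending 10-character string, lowercasing the letters C, O, U, N, T, R, Y and removing any space or ":" character.
3. If the result is the all-lowercase string "country", we select the child text nodes (only one in this case) of the span
child of this td.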
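The string arithmetic in the explanation can be checked outside XPath. Below is a plain-Python sketch (standard library only) that mirrors XPath 1.0 substring() for integer start positions and emulates the translate() call with str.maketrans; the helper names xpath_substring, ends_with and label_key are hypothetical.

```python
def xpath_substring(s, start):
    """XPath 1.0 substring(s, start) for an integer, 1-based start."""
    return s[max(start - 1, 0):]


def ends_with(text, s):
    """XPath 1.0 idiom for XPath 2.0 ends-with($text, $s)."""
    return s == xpath_substring(text, len(text) - len(s) + 1)


def label_key(text):
    """Emulates translate(substring(t, string-length(t) - 9), 'COUNTRY :', 'country')."""
    tail = xpath_substring(text, len(text) - 9)   # last 10 characters
    # C->c, O->o, ..., Y->y; ' ' and ':' have no counterpart, so they are deleted.
    return tail.translate(str.maketrans('COUNTRY', 'country', ' :'))


print(label_key('\nName of the Country : '))  # country
print(label_key('Name of the country : '))    # country
```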