
Case insensitive XPath contains() possible?

I'm iterating over all text nodes of my DOM and checking whether the nodeValue contains a certain string.

/html/body//text()[contains(.,'test')] 

This is case sensitive. However, I also want to catch Test, TEST or TesT. Is that possible with XPath (in JavaScript)?

Aron Woost asked Dec 12 '11 12:12




2 Answers

This is for XPath 1.0. If your environment supports XPath 2.0, see here.


Yes. Possible, but not beautiful.

/html/body//text()[
  contains(
    translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),
    'test'
  )
]

This would work for search strings where the alphabet is known beforehand. Add any accented characters you expect to see.
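
To illustrate the point about accented characters, here is a minimal sketch that builds the expression above with a slightly extended alphabet. The helper name `buildFixedAlphabetXPath` and the particular accented characters are assumptions for illustration only; list whatever characters your own text actually uses, keeping both strings the same length and in the same order.

```javascript
// Base alphabets from the expression above, plus a few accented pairs.
// Which accented characters to include is an assumption; adapt it to your text.
const UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" + "ÄÖÜÉÈ";
const LOWER = "abcdefghijklmnopqrstuvwxyz" + "äöüéè";

// Hypothetical helper: builds the case-insensitive contains() expression.
// The search string is lower-cased here, so it should only use characters
// covered by LOWER, and it must not contain a single quote.
function buildFixedAlphabetXPath(searchString) {
  return "/html/body//text()[contains(translate(., '" + UPPER + "', '" + LOWER + "'), '" +
    searchString.toLowerCase() + "')]";
}
```

Each character in the first string is mapped to the character at the same position in the second, so a mismatch in length or order would silently translate the wrong letters.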


If you can, mark the text that interests you by some other means, like enclosing it in a <span> that has a certain class while building the HTML. Such things are much easier to locate with XPath than substrings in the element text.

If that's not an option, you can let JavaScript (or whatever host language you use to execute XPath) help you build a dynamic XPath expression:

function xpathPrepare(xpath, searchString) {
  // $u becomes the upper-cased search string, $l and $s the lower-cased one.
  return xpath.replace("$u", searchString.toUpperCase())
              .replace("$l", searchString.toLowerCase())
              .replace("$s", searchString.toLowerCase());
}

const xp = xpathPrepare("//text()[contains(translate(., '$u', '$l'), '$s')]", "Test");
// -> "//text()[contains(translate(., 'TEST', 'test'), 'test')]"

(Hat tip to @KirillPolishchuk's answer - of course you only need to translate those characters you're actually searching for.)

This approach would work for any search string whatsoever, without requiring prior knowledge of the alphabet, which is a big plus.
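
Since the question asks about JavaScript, here is a rough sketch of how such a dynamically built expression might be run in a browser with the standard `document.evaluate()` API. The helper name `findTextNodes` is made up for this example, and the sketch assumes the search string contains no single quote.

```javascript
// Hypothetical helper: collects the text nodes under `doc` that contain
// `searchString`, ignoring case. Assumes the string has no single quote.
function findTextNodes(doc, searchString) {
  const upper = searchString.toUpperCase();
  const lower = searchString.toLowerCase();
  // Only the characters of the search string itself are translated.
  const xpath = "//text()[contains(translate(., '" + upper + "', '" + lower + "'), '" + lower + "')]";

  // A snapshot result stays valid even if the DOM is modified afterwards.
  const result = doc.evaluate(xpath, doc, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
  const nodes = [];
  for (let i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}
```

In a browser console this could be called as `findTextNodes(document, "Test")`; it should then return text nodes containing test, Test, TEST and so on.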

Both of the methods above fail when search strings can contain single quotes, in which case things get more complicated.
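
One common workaround for the quote problem, sketched here under the assumption that you build the expression in JavaScript: XPath 1.0 has no escape character inside string literals, but a string containing both quote types can be assembled with `concat()`. The helper names `xpathLiteral` and `buildQuoteSafeXPath` are hypothetical, not part of any library.

```javascript
// Hypothetical helper: turns any string into an XPath 1.0 string expression.
function xpathLiteral(s) {
  if (s.indexOf("'") === -1) return "'" + s + "'"; // no single quote: '...'
  if (s.indexOf('"') === -1) return '"' + s + '"'; // no double quote: "..."
  // Both kinds present: split on single quotes and glue the pieces with concat().
  return "concat('" + s.split("'").join("', \"'\", '") + "')";
}

// Hypothetical helper: case-insensitive contains() that tolerates quotes.
// Assumes upper- and lower-casing keep the string length unchanged.
function buildQuoteSafeXPath(searchString) {
  const upper = xpathLiteral(searchString.toUpperCase());
  const lower = xpathLiteral(searchString.toLowerCase());
  return "//text()[contains(translate(., " + upper + ", " + lower + "), " + lower + ")]";
}
```

For a search string like O'Neil this simply switches to double-quoted literals; the `concat()` branch is only needed when both quote characters occur in the same string.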

Tomalak answered Oct 07 '22 17:10


Case-insensitive contains

/html/body//text()[contains(translate(., 'EST', 'est'), 'test')] 
Kirill Polishchuk answered Oct 07 '22 19:10