
XPath: Match whole word (using matches function with case insensitive flag)

Using XPath, I would like to "Match whole word" (option for user, just like in VS search).

It seems as though the contains and matches functions work similarly, although matches allows flags such as i for case insensitivity.

In other words, I am getting the same results with these two XPath queries:

<pets>
    <dog name="Rupert" color="grey"/>
    <dog name="Ralph" color="brown"/>
    <cat name="Marvin the Cat" color="white"/>
    <cat name="Garfield the Cat" color="orange"/>
    <cat name="Cat" color="grey"/>
    <cat name="Fluffy" color="black"/>
</pets>

Matches XPath: //cat[descendant-or-self::*[@*[matches(.,'Cat')]]]
    returns:
    <cat name="Marvin the Cat" color="white"/>
    <cat name="Garfield the Cat" color="orange"/>
    <cat name="Cat" color="grey"/>


Contains XPath: //cat[descendant-or-self::*[@*[contains(.,'Cat')]]]
    returns:
    <cat name="Marvin the Cat" color="white"/>
    <cat name="Garfield the Cat" color="orange"/>
    <cat name="Cat" color="grey"/>

But I would like to use matches to return results that match "Cat" whole word only:

<cat name="Cat" color="grey"/>

How can I adjust the matches query so it matches whole word?

EDIT: I forgot to mention that I need to still use the matches function because I need the case insensitivity flag.

developer asked May 01 '12 20:05




2 Answers

What about using ^ and $ characters as anchors?

//cat[descendant-or-self::*[@*[matches(.,'^Cat$')]]]

From RegEx Syntax in XQuery 1.0 and XPath 2.0:

Two meta-characters, ^ and $ are added. By default, the meta-character ^ matches the start of the entire string, while $ matches the end of the entire string.
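
Since the question also needs case insensitivity, the anchors can be combined with the i flag, passed as the third argument of matches. A minimal sketch, assuming an XPath 2.0 processor and the sample document from the question:

    //cat[descendant-or-self::*[@*[matches(.,'^cat$','i')]]]

Against the sample this should select only <cat name="Cat" color="grey"/>, because the anchors require the entire attribute value to be the word. If the goal is instead to find the word inside a longer value, note that the XPath 2.0 regex dialect has no \b word-boundary escape. One possible workaround (an assumption on top of this answer, not part of it) is to spell the boundary out:

    //cat[descendant-or-self::*[@*[matches(.,'(^|\W)cat(\W|$)','i')]]]

That variant would also select the "Marvin the Cat" and "Garfield the Cat" elements, so it only fits if whole-word matching inside longer values is what is wanted.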

Petr Janeček answered Sep 29 '22 00:09


There are three functions/operators of relevance here.

matches() does a regular expression match; you can use it to match a substring or to match the entire string by use of anchors (^cat$), and you can set the 'i' flag to make it case-blind.

contains() does an exact match of a substring; you can use the third argument (collation) to request a case-blind match, but the way in which collations are specified depends on the processor you are using.

The eq operator does an exact match of the entire string; the "default collation" (which in the case of XPath will typically be set using the processor's API) can be used to request case-blind matching. This seems to be the one that is closest to your requirement; the only drawback is that specifying the collation is more system-dependent than using the "i" flag with matches().
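
If configuring a collation is not practical, one processor-independent way to get a case-blind comparison of the entire string is to normalise case before comparing. This is a swapped-in technique rather than the collation approach described above, sketched under the assumption of an XPath 2.0 processor (lower-case is a standard XPath 2.0 function):

    //cat[@*[lower-case(.) eq 'cat']]

Against the sample in the question this should select only <cat name="Cat" color="grey"/>. Simple case folding like this can behave differently from a real collation for some non-ASCII text, so treat it as an approximation.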

Michael Kay answered Sep 29 '22 02:09