
Regex anchors inside character class

Tags: regex, xslt, xquery

Is it possible to use anchors inside a character class? This doesn't work:

analyze-string('abcd', '[\s^]abcd[\s$]') 

It looks like ^ and $ are treated as literals inside a character class; however, escaping them (\^, \$) doesn't turn them into anchors either.

I'm trying to use this expression to create word boundaries (\b is not available in XSLT/XQuery), but I would prefer not to use groups such as (^|\s). Since non-capturing groups aren't available, in some scenarios I may end up with a large number of unneeded capture groups, which creates a new task of finding the "real" capture groups among the unneeded ones.

asked May 29 '13 22:05 by wst



2 Answers

I believe the answer is no, you can't include ^ and $ as anchors in a [], only as literal characters. (I've wished you could do that before too.)

However, you could concatenate a space onto the front and back of the string, then match \s as the word boundary and not bother with the anchors. E.g.

analyze-string(concat(' ', 'abcd xyz abcd', ' '), '\sabcd\s')

You may also want + after each \s, but that's a separate issue.
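To see which substrings the padded pattern actually picks up, the result of analyze-string can be walked for its fn:match elements. This is only a minimal sketch, assuming an XQuery 3.0 processor; the $text and $padded variables and the sample input are illustrative, not part of the original question.

```xquery
xquery version "3.0";

(: Hypothetical sample input, chosen only for illustration. :)
let $text := 'abcd xyz abcd'

(: Pad both ends so the first and last words also have whitespace on each side. :)
let $padded := concat(' ', $text, ' ')

(: Each fn:match child holds one matched substring, including the
   surrounding whitespace that \s consumed. :)
return analyze-string($padded, '\sabcd\s')/fn:match/string()
```

Note that each match consumes the whitespace on both sides, so two occurrences separated by a single space cannot both match with this pattern.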

answered Oct 17 '22 07:10 by LarsH


If you're using analyze-string as a function, then presumably you're using a 3.0 implementation of either XSLT or XQuery.

In that case, why do you say "non-capturing groups aren't available"? The XPath Functions and Operators 3.0 spec is explicit that "Non-capturing groups are also recognized. These are indicated by the syntax (?:xxxx)."
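If non-capturing groups are available in your processor, the word-boundary alternation can be written without adding to the group count. The following is a sketch under that assumption (an XQuery 3.0 implementation of analyze-string); the $text variable and sample input are hypothetical.

```xquery
xquery version "3.0";

(: Hypothetical sample input, chosen only for illustration. :)
let $text := 'abcd xyz abcd'

(: (?:^|\s) and (?:\s|$) stand in for word boundaries without creating
   capture groups, so (abcd) is the only numbered group, with nr = 1. :)
return analyze-string($text, '(?:^|\s)(abcd)(?:\s|$)')//fn:group[@nr = 1]/string()
```

Because only (abcd) is a capturing group, there is no need to sort the "real" groups from the boundary groups afterwards.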

answered Oct 17 '22 06:10 by C. M. Sperberg-McQueen