
regex to match word boundary beginning with special characters

Tags: regex, perl

I have a regex that matches words fine unless they contain a special character, such as ~Query, which is the name of a member of a C++ class. I need to use a word boundary, as shown below, because some member names are single characters.

$key =~ /\b$match\b/

I tried numerous expressions that I thought would work, such as /[~]*\b$match\b/ and /\b[~]*$match\b/, but none did.

Is it possible to put a word boundary on words that may contain a special character?

Jeff Cunningham asked Oct 03 '12 16:10



2 Answers

\b is short for

(?:(?<!\w)(?=\w)|(?<=\w)(?!\w))
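As a rough sketch of why the plain \b pattern from the question misses ~Query: the position in front of ~ only counts as a word boundary when a word character precedes it, because ~ itself is not a word character. The sample strings and the variable name $wb are made up for illustration.

```perl
use strict;
use warnings;

# \b written out with lookarounds, as in the expansion above.
my $wb    = qr/(?:(?<!\w)(?=\w)|(?<=\w)(?!\w))/;
my $match = '~Query';   # hypothetical member name

# Test each sample with the built-in \b and with the expansion.
for my $key ('void ~Query()', 'a~Query()') {
    my $builtin  = $key =~ /\b$match\b/ ? 1 : 0;
    my $expanded = $key =~ /$wb$match$wb/ ? 1 : 0;
    printf "%-14s builtin=%d expanded=%d\n", $key, $builtin, $expanded;
}
# 'void ~Query()' gives 0 for both: a space and ~ are both non-word characters.
# 'a~Query()'     gives 1 for both: 'a' followed by ~ is a word boundary.
```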

If you want to treat ~ as a word character, change \w to [\w~].

(?:(?<![\w~])(?=[\w~])|(?<=[\w~])(?![\w~]))

Example usage:

my $word_char = qr/[\w~]/;
my $boundary  = qr/(?<!$word_char)(?=$word_char)
                  |(?<=$word_char)(?!$word_char)/x;

$key =~ /$boundary$match$boundary/
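To try this out, here is a small self-contained sketch. The helper name has_member and the sample strings are hypothetical, not part of the original question.

```perl
use strict;
use warnings;

my $word_char = qr/[\w~]/;
my $boundary  = qr/(?<!$word_char)(?=$word_char)
                  |(?<=$word_char)(?!$word_char)/x;

# Hypothetical helper: returns 1 if $match occurs in $key as a whole
# "word", where ~ counts as a word character, and 0 otherwise.
sub has_member {
    my ($key, $match) = @_;
    return $key =~ /$boundary$match$boundary/ ? 1 : 0;
}

print has_member('Foo::~Query()', '~Query'), "\n";  # 1
print has_member('Foo::Query()',  'Query'),  "\n";  # 1
print has_member('Foo::~Query()', 'Query'),  "\n";  # 0: ~ precedes Query
```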

If we know $match can only match something that starts and ends with a $word_char, we can simplify as follows:

my $word_char   = qr/[\w~]/;
my $start_bound = qr/(?<!$word_char)/;
my $end_bound   = qr/(?!$word_char)/;

$key =~ /$start_bound$match$end_bound/

This is simple enough to inline:

$key =~ /(?<![\w~])$match(?![\w~])/
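For the single-character member names mentioned in the question, the inlined form might be exercised like this. The list of keys is invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical identifiers; 'x' is a single-character member name.
my @keys  = ('x', 'xy', 'p->x', '~x', 'max');
my $match = 'x';

# Keep only keys where 'x' is not glued to a word character or a tilde.
my @hits = grep { /(?<![\w~])$match(?![\w~])/ } @keys;

print join(', ', @hits), "\n";   # x, p->x
```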
ikegami answered Oct 12 '22 02:10


Assuming you don't need to check the contents of $match (i.e. it always contains a valid identifier), you can write this:

$key =~ /(?<![~\w])$match(?![~\w])/

which simply checks that the string in $match isn't immediately preceded or followed by an alphanumeric character, an underscore, or a tilde.
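If $match might contain regex metacharacters (an assumption that goes beyond the question, for example an operator name), one option is to wrap it in \Q...\E so it is matched literally. A minimal sketch with hypothetical sample data:

```perl
use strict;
use warnings;

my $match = 'operator+';   # hypothetical name containing a metacharacter

# \Q...\E escapes the "+" so it is matched as a literal character.
my $re = qr/(?<![~\w])\Q$match\E(?![~\w])/;

for my $key ('Foo::operator+(x)', 'Foo::operatorr(x)') {
    printf "%-20s %s\n", $key, ($key =~ $re ? 'match' : 'no match');
}
# The first key matches; the second does not.
```

Without \Q...\E, the + would act as a quantifier on the final r, so the second string would match as well.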

Borodin answered Oct 12 '22 01:10