To run a “whole words only” search using a regular expression, simply place the word between two word boundaries, as we did with ‹ \bcat\b ›. The first ‹ \b › requires the ‹ c › to occur at the very start of the string, or after a nonword character.
Using character sets
The pattern within the brackets of a regular expression defines a character set that is used to match a single character. For example, the regular expression "[A-Za-z]" matches any single uppercase or lowercase letter.
You can match specific characters with a regular expression by listing them inside square brackets. For example, the pattern [abc] matches a single a, b, or c and nothing else.
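As a quick check of the two sets mentioned above, here is a small JavaScript sketch. The variable names are made up for the example, and the ^ and $ anchors are my addition so that the set has to account for the whole input instead of any one character inside it.

```javascript
// One character out of a, b, c (anchored so the whole input must be that character).
const abcSet = /^[abc]$/;
// One ASCII letter, uppercase or lowercase.
const letterSet = /^[A-Za-z]$/;

console.log(abcSet.test('a'));     // true
console.log(abcSet.test('d'));     // false
console.log(abcSet.test('ab'));    // false: the set matches exactly one character
console.log(letterSet.test('Q'));  // true
console.log(letterSet.test('7'));  // false
console.log(letterSet.test(' '));  // false: no space inside the brackets
```

Note that a space inside the brackets counts as a member of the set, so "[ A-Za-z]" would also match a space.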
Simply put: \b allows you to perform a “whole words only” search using a regular expression in the form of \bword\b. A “word character” is a character that can be used to form words. All characters that are not “word characters” are “non-word characters”.
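To make "word character" concrete: in JavaScript, \w is by default the same as [A-Za-z0-9_], so digits and the underscore count as word characters, while punctuation and whitespace do not. The sketch below (variable names are just for illustration) shows how that decides where \b can match.

```javascript
// A single word character: letter, digit, or underscore.
const wordChar = /^\w$/;
console.log(wordChar.test('a'));  // true
console.log(wordChar.test('7'));  // true
console.log(wordChar.test('_'));  // true
console.log(wordChar.test('-'));  // false
console.log(wordChar.test(' '));  // false

// "cat" as a whole word: a boundary is required on both sides.
const wholeCat = /\bcat\b/;
console.log(wholeCat.test('cat'));      // true: start and end of string count as boundaries
console.log(wholeCat.test('a cat.'));   // true: space before, "." after
console.log(wholeCat.test('(cat)'));    // true: parentheses are non-word characters
console.log(wholeCat.test('cat-toy'));  // true: "-" is a non-word character
console.log(wholeCat.test('cat_toy'));  // false: "_" is a word character
console.log(wholeCat.test('cat9'));     // false: "9" is a word character
```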
Square brackets are meant for a character class, and you're actually trying to match any one of: s, |, s (again), e, a, s (again), o and n.
Use parentheses instead for grouping:
(s|season)
or non-capturing group:
(?:s|season)
Note: Non-capturing groups tell the engine that it doesn't need to store the matched text, while capturing groups do store it. For small stuff, either works; for 'heavy duty' stuff, check first whether you need the captured text or not. If you don't, better use the non-capturing group so the engine doesn't store something you will never use.
Live example: https://regex101.com/r/cU5lC2/1
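The difference between the two kinds of group shows up in the match result. The following is only a sketch in JavaScript; the trailing \d+ and the sample string 'season12' are assumptions added to give the group something to sit in front of.

```javascript
const capturing = /(s|season)\d+/;
const nonCapturing = /(?:s|season)\d+/;

const withCapture = 'season12'.match(capturing);
const withoutCapture = 'season12'.match(nonCapturing);

// Index 0 is always the whole match.
console.log(withCapture[0]);        // season12
// The capturing group stores what it matched at index 1.
console.log(withCapture[1]);        // season
console.log(withCapture.length);    // 2

// The non-capturing group matches the same text but stores nothing extra.
console.log(withoutCapture[0]);     // season12
console.log(withoutCapture.length); // 1
```

Both patterns accept the same input; only the stored result differs.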
I'll be using the phpsh interactive shell on Ubuntu 12.10 to demonstrate the PCRE regex engine through the preg_match function.
Start phpsh, put some content into a variable, match on word.
el@apollo:~/foo$ phpsh
php> $content1 = 'badger';
php> $content2 = '1234';
php> $content3 = '$%^&';
php> echo preg_match('(\w+)', $content1);
1
php> echo preg_match('(\w+)', $content2);
1
php> echo preg_match('(\w+)', $content3);
0
The preg_match function used the PCRE engine within the PHP language to analyze the variables $content1, $content2 and $content3 with the (\w+) pattern. In PHP, the outer parentheses here act as the pattern delimiters.
$content1 and $content2 contain at least one word character, $content3 does not.
el@apollo:~/foo$ phpsh
php> $gun1 = 'dart gun';
php> $gun2 = 'fart gun';
php> $gun3 = 'darty gun';
php> $gun4 = 'unicorn gun';
php> echo preg_match('(dart|fart)', $gun1);
1
php> echo preg_match('(dart|fart)', $gun2);
1
php> echo preg_match('(dart|fart)', $gun3);
1
php> echo preg_match('(dart|fart)', $gun4);
0
Variables $gun1 and $gun2 contain the string dart or fart, which is correct, but $gun3 contains darty and still matches; that is the problem. So on to the next example.
Word boundaries can be matched explicitly with \b.
(Regex visualization image from http://jex.im/regulex and https://github.com/JexCheng/regulex)
Example:
el@apollo:~/foo$ phpsh
php> $gun1 = 'dart gun';
php> $gun2 = 'fart gun';
php> $gun3 = 'darty gun';
php> $gun4 = 'unicorn gun';
php> echo preg_match('(\bdart\b|\bfart\b)', $gun1);
1
php> echo preg_match('(\bdart\b|\bfart\b)', $gun2);
1
php> echo preg_match('(\bdart\b|\bfart\b)', $gun3);
0
php> echo preg_match('(\bdart\b|\bfart\b)', $gun4);
0
The \b asserts that we have a word boundary, making sure "dart" is matched, but "darty" isn't.
I tested these examples in JS. The simplest solution is to put the word you need between / /:
var reg = /cat/;
reg.test('some cat here');//1 test
true // result
reg.test('acatb');//2 test
true // result
Now, if you need this specific word as a whole word, not embedded inside other letters or symbols, use the \b marker:
var reg = /\bcat\b/;
reg.test('acatb');//1 test
false // result
reg.test('have cat here');//2 test
true // result
JS also has the exec() method, which returns a result object. It helps, for example, to get the position/index of our word.
var matchResult = /\bcat\b/.exec("good cat good");
console.log(matchResult.index); // 5
If we need to get all matched words in a string/sentence/text, we can use the g modifier (global match):
"cat good cat good cat".match(/\bcat\b/g).length
// 3
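Two things that may be worth knowing about the global match above, sketched here under the assumption of a reasonably recent JS engine (matchAll needs ES2020). First, match() returns null when nothing matches, so calling .length on it directly throws. Second, matchAll() gives the index of every match, not only the first one. The helper name countMatches is made up for this example.

```javascript
// Hypothetical helper: count matches of a global (g-flag) regex without
// tripping over the null that match() returns when nothing is found.
function countMatches(text, globalRegex) {
  const found = text.match(globalRegex);
  return found === null ? 0 : found.length;
}

const text = 'cat good cat good cat';
console.log(countMatches(text, /\bcat\b/g));            // 3
console.log(countMatches('no pets here', /\bcat\b/g));  // 0

// matchAll() requires the g flag and yields one result object per match.
const positions = Array.from(text.matchAll(/\bcat\b/g), (m) => m.index);
console.log(positions);  // [ 0, 9, 18 ]
```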
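Now the last case: I need not one specific word, but any of several. We use the | sign, which means choice/or. Wrap the alternatives in a group so that both \b anchors apply to every alternative:
"bad dog bad".match(/\b(?:cat|dog)\b/g).length
// 1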
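One thing to watch with | and \b together: alternation has the lowest precedence, so /\bcat|dog\b/ reads as "\bcat" OR "dog\b", not as "whole word cat or whole word dog". Below is a sketch of a helper that builds the grouped form from a list of words. Both function names are hypothetical, and it assumes every word starts and ends with a word character (otherwise \b will not sit where you expect).

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build /\b(?:w1|w2|...)\b/g from a list of words.
function wholeWordRegex(words) {
  return new RegExp('\\b(?:' + words.map(escapeRegExp).join('|') + ')\\b', 'g');
}

const petRegex = wholeWordRegex(['cat', 'dog']);
console.log(petRegex.source);                        // \b(?:cat|dog)\b
console.log('bad dog bad'.match(petRegex));          // [ 'dog' ]
console.log('concat the doghouse'.match(petRegex));  // null

// Without the group, each \b belongs to only one alternative.
console.log('category bulldog'.match(/\bcat|dog\b/g));  // [ 'cat', 'dog' ]
console.log('category bulldog'.match(petRegex));        // null
```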
[ ] defines a character class. So every character you put there will match. [012] will match 0 or 1 or 2, and [0-2] behaves the same.
What you want is a grouping to define an or-statement. Use (s|season) for your issue.
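To see the difference side by side, here is a JavaScript sketch. The bracketed pattern [s|season] is my reconstruction of the kind of pattern being discussed, and the ^ and $ anchors are added so each test has to match the whole input.

```javascript
// Character class: exactly one character out of s, |, e, a, o, n.
const asClass = /^[s|season]$/;
// Grouping: the text "s" or the text "season".
const asGroup = /^(?:s|season)$/;

console.log(asClass.test('s'));       // true
console.log(asClass.test('|'));       // true: inside brackets "|" is a literal character
console.log(asClass.test('season'));  // false: six characters, the class matches one

console.log(asGroup.test('s'));       // true
console.log(asGroup.test('season'));  // true
console.log(asGroup.test('|'));       // false: here "|" means "or"
console.log(asGroup.test('sea'));     // false
```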
Btw. you have to watch out. Metacharacters in normal regex (or inside a grouping) are different from those in a character class. A character class is like a sub-language. [$A] will only match $ or A, nothing else. No escaping needed here for the dollar.
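A short JavaScript sketch of that "sub-language" behaviour. The variable names are just for illustration, and the anchors around each class are added so that one whole character is tested at a time.

```javascript
// Inside a class, "$" is a plain character, not an end-of-input anchor.
const dollarOrA = /^[$A]$/;
console.log(dollarOrA.test('$'));  // true
console.log(dollarOrA.test('A'));  // true
console.log(dollarOrA.test('B'));  // false

// Outside a class "." matches almost any character; inside it is a literal dot.
const anyChar = /^.$/;
const literalDot = /^[.]$/;
console.log(anyChar.test('x'));     // true
console.log(literalDot.test('x'));  // false
console.log(literalDot.test('.'));  // true

// A few characters keep a special role inside a class, for example "-" between
// two characters (a range). Escaping it makes it literal again.
const range = /^[a-z]$/;
const aHyphenZ = /^[a\-z]$/;
console.log(range.test('m'));     // true
console.log(aHyphenZ.test('m'));  // false
console.log(aHyphenZ.test('-'));  // true
```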