I want a C++ regex that matches "bananas" or "pajamas" but not "bananas2" or "bananaspajamas" or "banana" or basically anything besides those exact two words. So I did this:
#include <regex.h>
#include <stdio.h>

int main()
{
    regex_t rexp;
    int rv = regcomp(&rexp, "\\bbananas\\b|\\bpajamas\\b", REG_EXTENDED | REG_NOSUB);
    if (rv != 0) {
        printf("Abandon hope, all ye who enter here\n");
    }
    regmatch_t match;
    int diditmatch = regexec(&rexp, "bananas", 1, &match, 0);
    printf("%d %d\n", diditmatch, REG_NOMATCH);
}
and it printed 1 1, as if there wasn't a match. What happened? I also tried \bbananas\b|\bpajamas\b for my regex and that failed too.
I asked "Whole-word matching using regex" about std::regex, but std::regex is awful and slow, so I'm trying regex.h.
To run a “whole words only” search using a regular expression, simply place the word between two word boundaries, as we did with ‹ \bcat\b ›. The first ‹ \b › requires the ‹ c › to occur at the very start of the string, or after a nonword character.
The <regex.h> header defines the structures and symbolic constants used by the regcomp(), regexec(), regerror(), and regfree() functions.
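The metacharacter \b is an anchor like the caret and the dollar sign. It matches at a position that is called a “word boundary”. This match is zero-length. There are three different positions that qualify as word boundaries: before the first character in the string, if the first character is a word character; after the last character in the string, if the last character is a word character; and between two characters in the string, where one is a word character and the other is not.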
The POSIX standard specifies neither word boundary syntax nor look-behind and look-ahead syntax (which could be used to emulate a word boundary) for both BRE and ERE. Therefore, it's not possible to write a regex with word boundaries that works across different POSIX-compliant platforms.
For a portable solution, you should consider using PCRE, or Boost.Regex if you plan to code in C++.
Otherwise, you are stuck with a non-portable solution. If you are fine with such restriction, there are several alternatives:
- The GNU C library extends the syntax with \b (word boundary), \B (non word boundary), \< (start of word), and \> (end of word).
- Some other systems extend the BRE and ERE grammar with the [[:<:]] (start of word) and [[:>:]] (end of word) syntax.