I'm trying to use a regex for group matching. I want to extract two strings from one big string.
The input string looks something like this:
tХB:[email protected] Connected
tХB:[email protected] WEBMSG #Username :this is a message
tХB:[email protected] Status: visible
The Username can be anything. The same goes for the end part, "this is a message".
What I want to do is extract the Username that comes after the pound sign (#), and not from any other place in the string, since that can vary as well. I also want to get the message that comes after the colon (:).
I tried the following regex, but it never outputs any results.
regex rgx("WEBMSG #([a-zA-Z0-9]) :(.*?)");
smatch matches;
for(size_t i=0; i<matches.size(); ++i) {
    cout << "MATCH: " << matches[i] << endl;
}
I'm not getting any matches. What is wrong with my regex?
C++11 uses ECMAScript grammar as the default grammar for regex. ECMAScript is simple, yet it provides powerful regex capabilities.
Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g".
Regex is short for "regular expression", the term commonly used in programming languages and libraries. In C++ it is supported by compilers from C++11 onward.
What is a group in regex? A group is the part of a regex pattern enclosed in parentheses, ( and ). For example, the regular expression (cat) creates a single group containing the letters 'c', 'a', and 't'.
Your regular expression is incorrect because neither capture group does what you want. The first is looking to match a single character from the set [a-zA-Z0-9] followed by <space>:, which works for single-character usernames, but nothing else. The second capture group will always be empty because you're looking for zero or more characters, but also specifying the match should not be greedy, which means a zero-character match is a valid result.
Fixing both of these, your regex becomes
std::regex rgx("WEBMSG #([a-zA-Z0-9]+) :(.*)");
But simply instantiating a regex and a match_results object does not produce matches; you need to apply a regex algorithm. Since you only want to match part of the input string, the appropriate algorithm to use in this case is regex_search.
std::regex_search(s, matches, rgx);
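Putting it all together:
#include <iostream>
#include <regex>
#include <string>

int main()
{
    // Sample input as a raw string literal, one server message per line.
    std::string s{R"(
tХB:[email protected] Connected
tХB:[email protected] WEBMSG #Username :this is a message
tХB:[email protected] Status: visible
)"};

    // Group 1: the username after '#'. Group 2: the rest of the line
    // after " :" ('.' does not match a newline in the default grammar).
    std::regex rgx("WEBMSG #([a-zA-Z0-9]+) :(.*)");
    std::smatch matches;

    if (std::regex_search(s, matches, rgx)) {
        std::cout << "Match found\n";
        for (size_t i = 0; i < matches.size(); ++i) {
            std::cout << i << ": '" << matches[i].str() << "'\n";
        }
    } else {
        std::cout << "Match not found\n";
    }
}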