I'm a bit confused about the following C++11 code:
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string haystack("abcdefabcghiabc");
std::regex needle("abc");
std::smatch matches;
std::regex_search(haystack, matches, needle);
std::cout << matches.size() << std::endl;
}
I'd expect it to print out 3
but instead I get 1
. Am I missing something?
You get 1
because regex_search
returns only 1 match, and size()
will return the number of capture groups + the whole match value.
Your matches
is...:
Object of a match_results type (such as cmatch or smatch) that is filled by this function with information about the match results and any submatches found.
If [the regex search is] successful, it is not empty and contains a series of sub_match objects: the first sub_match element corresponds to the entire match, and, if the regex expression contained sub-expressions to be matched (i.e., parentheses-delimited groups), their corresponding sub-matches are stored as successive sub_match elements in the match_results object.
Here is a code that will find multiple matches:
#include <string>
#include <iostream>
#include <regex>
using namespace std;
int main() {
string str("abcdefabcghiabc");
int i = 0;
regex rgx1("abc");
smatch smtch;
while (regex_search(str, smtch, rgx1)) {
std::cout << i << ": " << smtch[0] << std::endl;
i += 1;
str = smtch.suffix().str();
}
return 0;
}
See IDEONE demo returning abc
3 times.
As this method destroys the input string, here is another alternative based on the std::sregex_iterator
(std::wsregex_iterator
should be used when your subject is an std::wstring
object):
int main() {
std::regex r("ab(c)");
std::string s = "abcdefabcghiabc";
for(std::sregex_iterator i = std::sregex_iterator(s.begin(), s.end(), r);
i != std::sregex_iterator();
++i)
{
std::smatch m = *i;
std::cout << "Match value: " << m.str() << " at Position " << m.position() << '\n';
std::cout << " Capture: " << m[1].str() << " at Position " << m.position(1) << '\n';
}
return 0;
}
See IDEONE demo, returning
Match value: abc at Position 0
Capture: c at Position 2
Match value: abc at Position 6
Capture: c at Position 8
Match value: abc at Position 12
Capture: c at Position 14
What you're missing is that matches
is populated with one entry for each capture group (including the entire matched substring as the 0th capture).
If you write
std::regex needle("a(b)c");
then you'll get matches.size()==2
, with matches[0]=="abc"
, and matches[1]=="b"
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With