
What does std::match_results::size return?

Tags:

c++

regex

c++11

I'm a bit confused about the following C++11 code:

#include <iostream>
#include <string>
#include <regex>

int main()
{
    std::string haystack("abcdefabcghiabc");
    std::regex needle("abc");
    std::smatch matches;
    std::regex_search(haystack, matches, needle);
    std::cout << matches.size() << std::endl;
}

I'd expect it to print out 3 but instead I get 1. Am I missing something?

Morpheu5 asked Sep 24 '15 15:09


2 Answers

You get 1 because regex_search stops at the first match, and size() returns the number of capture groups plus one for the whole match.

The documentation describes your matches object as follows:

Object of a match_results type (such as cmatch or smatch) that is filled by this function with information about the match results and any submatches found.

If [the regex search is] successful, it is not empty and contains a series of sub_match objects: the first sub_match element corresponds to the entire match, and, if the regex expression contained sub-expressions to be matched (i.e., parentheses-delimited groups), their corresponding sub-matches are stored as successive sub_match elements in the match_results object.

Here is code that finds multiple matches:

#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main() {
    string str("abcdefabcghiabc");
    int i = 0;
    regex rgx1("abc");
    smatch smtch;
    // Each successful search finds the next match in what is left of str.
    while (regex_search(str, smtch, rgx1)) {
        cout << i << ": " << smtch[0] << endl;
        i += 1;
        // Continue after the current match (this overwrites str).
        str = smtch.suffix().str();
    }
    return 0;
}

See the IDEONE demo, which prints abc 3 times.

Because this method overwrites the input string, here is an alternative based on std::sregex_iterator (use std::wsregex_iterator when your subject is a std::wstring object):

#include <iostream>
#include <regex>
#include <string>

int main() {
    std::regex r("ab(c)");
    std::string s = "abcdefabcghiabc";
    // A default-constructed sregex_iterator marks the end of the sequence.
    for(std::sregex_iterator i = std::sregex_iterator(s.begin(), s.end(), r);
                             i != std::sregex_iterator();
                             ++i)
    {
        std::smatch m = *i;
        std::cout << "Match value: " << m.str() << " at Position " << m.position() << '\n';
        std::cout << "    Capture: " << m[1].str() << " at Position " << m.position(1) << '\n';
    }
    return 0;
}

See IDEONE demo, returning

Match value: abc at Position 0
    Capture: c at Position 2
Match value: abc at Position 6
    Capture: c at Position 8
Match value: abc at Position 12
    Capture: c at Position 14

Wiktor Stribiżew answered Nov 14 '22 03:11


What you're missing is that matches is populated with one entry for each capture group (including the entire matched substring as the 0th capture).

If you write

std::regex needle("a(b)c");

then you'll get matches.size()==2, with matches[0]=="abc", and matches[1]=="b".

Toby Speight answered Nov 14 '22 04:11