
`regex_match` returns both 'not found' and `match_results`

Tags: c++, regex, c++11

In the following code (gcc 10.2.1), the call to regex_match returns 'no match', which I believe is correct.

sm.size() returns 0, but when iterating from sm.begin() to end(), it finds 3 occurrences (all empty strings).

If this is correct, what do these 3 elements mean?

But since size() == 0, shouldn't begin() == end()?

Edit: Based on the comments, I added the ready() flag to the output.

#include <iostream>
#include <string>
#include <regex>
#include <cassert>

int main()
{
    std::string input("4321");
    std::regex rg("^([0-9])");
    std::smatch sm;

    bool found = std::regex_match(input, sm, rg);

    assert(!sm.size() == sm.empty());

    std::cout << "ready: " << sm.ready() << ", found: " << found
              << ", size: " << sm.size() << std::endl;

    for (auto it = sm.begin(); it != sm.end(); ++it)
    {
        std::cout << "iterate '" << *it << "'\n";
    }
}

output:

ready: 1, found: 0, size: 0
iterate ''
iterate ''
iterate ''
Tootsie asked Mar 13 '21 07:03


1 Answer

In GCC's implementation of match_results the prefix, suffix, and unmatched string are stored at the end of the sequence managed by the match_results object (which is implemented as a private std::vector base class). Those extra elements should not be visible when iterating from begin() to end(), but the end() function is returning the wrong position. It's returning an iterator to the end of the vector, after the three extra elements. It should be returning an iterator just before those, which would be equal to begin().

This is a bug, obviously. I'll fix it.
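With a libstdc++ version that still has the bug, one way to sidestep it is to never iterate a match_results whose match failed, and to walk the sub-matches by index up to size() rather than from begin() to end(). The following is only a minimal sketch of that idea; the helper name submatches is hypothetical and is not part of the standard library or of the fix.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (illustration only): returns the sub-matches of a
// successful full match, or an empty vector when the match fails.
std::vector<std::string> submatches(const std::string& input,
                                    const std::regex& rg)
{
    std::smatch sm;
    std::vector<std::string> out;

    // Check the result of regex_match first; on failure do not touch
    // begin()/end() at all.
    if (!std::regex_match(input, sm, rg))
        return out;

    // Index-based loop bounded by size(), which reported the correct
    // value (0) even in the question's failing case.
    for (std::size_t i = 0; i < sm.size(); ++i)
        out.push_back(sm[i].str());

    return out;
}
```

For the question's input "4321" and pattern "^([0-9])" this returns an empty vector, because regex_match requires the whole string to match. For the input "4" it returns two elements: the full match and the first capture group.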

The fix is:

       const_iterator
       end() const noexcept
-      { return _Base_type::end() - (empty() ? 0 : 3); }
+      { return _Base_type::end() - (_Base_type::empty() ? 0 : 3); }

🤦
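To see why changing empty() to _Base_type::empty() matters, the layout described above can be imitated with a toy class. This is a simplified model, not the libstdc++ source: the names ToyResults, slots_, buggy_end and fixed_end are made up for illustration, and it assumes that the public size() is the storage size minus the three extra elements.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only (NOT libstdc++ code): the visible sub-matches are
// followed by three extra bookkeeping slots in the same vector.
class ToyResults
{
public:
    using const_iterator = std::vector<std::string>::const_iterator;

    // 'visible' is the number of sub-matches a caller should see.
    // A failed match is modelled as visible == 0, leaving only the 3 extras.
    explicit ToyResults(std::size_t visible) : slots_(visible + 3) {}

    // Public view: the three extra slots are not counted.
    std::size_t size() const { return slots_.empty() ? 0 : slots_.size() - 3; }
    bool empty() const { return size() == 0; }

    const_iterator begin() const { return slots_.begin(); }

    // Buggy variant: the public empty() is true after a failed match,
    // so nothing is subtracted and the 3 extra slots become visible.
    const_iterator buggy_end() const
    { return slots_.end() - (empty() ? 0 : 3); }

    // Fixed variant: asks whether the underlying storage is empty.
    const_iterator fixed_end() const
    { return slots_.end() - (slots_.empty() ? 0 : 3); }

private:
    std::vector<std::string> slots_;
};
```

In this model, ToyResults(0) has size() == 0, yet buggy_end() - begin() is 3, which mirrors the three empty strings in the question's output. fixed_end() equals begin(). For a non-empty result such as ToyResults(2) both variants agree.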

Jonathan Wakely answered Nov 19 '22 23:11