
std::regex_replace bug when string contains \0

Tags: c++, string, std

I may have found a bug in std::regex_replace.

The following code should write "1a\0b2" with length 5, but it writes "1a2" with length 3.

Am I right? If not, why not?

#include <iostream>
#include <regex>
#include <string>

using namespace std;
int main()
{
    // The replacement string has an embedded NUL: 'a', '\0', 'b' (length 3).
    string a = regex_replace("1<sn>2", std::regex("<sn>"), string("a\0b", 3));

    cout << "a: " << a << "\n";
    cout << a.length() << "\n";

    return 0;
}
Art asked Dec 11 '21 16:12



1 Answer

This does seem to be a bug in libstdc++. Using a debugger I stepped into regex_replace, until getting to this part:

 // std [28.11.4] Function template regex_replace
  /**
   * @brief Search for a regular expression within a range for multiple times,
   and replace the matched parts through filling a format string.
   * @param __out   [OUT] The output iterator.
   * @param __first [IN]  The start of the string to search.
   * @param __last  [IN]  One-past-the-end of the string to search.
   * @param __e     [IN]  The regular expression to search for.
   * @param __fmt   [IN]  The format string.
   * @param __flags [IN]  Search and replace policy flags.
   *
   * @returns __out
   * @throws an exception of type regex_error.
   */
  template<typename _Out_iter, typename _Bi_iter,
       typename _Rx_traits, typename _Ch_type,
       typename _St, typename _Sa>
    inline _Out_iter
    regex_replace(_Out_iter __out, _Bi_iter __first, _Bi_iter __last,
          const basic_regex<_Ch_type, _Rx_traits>& __e,
          const basic_string<_Ch_type, _St, _Sa>& __fmt,
          regex_constants::match_flag_type __flags
          = regex_constants::match_default)
    {
      return regex_replace(__out, __first, __last, __e, __fmt.c_str(), __flags);
    }

According to the regex_replace write-up at cppreference.com, this is the first overload, the one that takes a std::string as the replacement string. libstdc++ implements it by calling c_str() on that string and forwarding to the second overload, the one that takes a const char * parameter, which does the actual work. That explains the observed behavior: a bare pointer carries no length, so everything after the first \0 is lost. I can't find anything that requires this approach.

Stepping further into the actual implementation:

          auto __len = char_traits<_Ch_type>::length(__fmt);

              __out = __i->format(__out, __fmt, __fmt + __len, __flags);

So, it determines the length of the replacement string and passes the replacement string, as a beginning and an ending iterator, into format().

This seems like it should be the other way around, with __fmt preserved as a std::basic_string, and passing iterators directly derived from it into format().

Sam Varshavchik answered Sep 27 '22 18:09
