I may have found a bug in std::regex_replace.
The following code should write "1a", a null character, and "b2" (five characters in total, which my terminal shows as "1a b2"), but it writes "1a2" with length 3.
Am I right? If not, why not?
#include <iostream>
#include <regex>
using namespace std;

int main()
{
    string a = regex_replace("1<sn>2", std::regex("<sn>"), string("a\0b", 3));
    cout << "a: " << a << "\n";
    cout << a.length();
    return 0;
}
This does seem to be a bug in libstdc++. Using a debugger, I stepped into regex_replace until I reached this part:
  // std [28.11.4] Function template regex_replace
  /**
   * @brief Search for a regular expression within a range for multiple times,
   *        and replace the matched parts through filling a format string.
   * @param __out   [OUT] The output iterator.
   * @param __first [IN]  The start of the string to search.
   * @param __last  [IN]  One-past-the-end of the string to search.
   * @param __e     [IN]  The regular expression to search for.
   * @param __fmt   [IN]  The format string.
   * @param __flags [IN]  Search and replace policy flags.
   *
   * @returns __out
   * @throws an exception of type regex_error.
   */
  template<typename _Out_iter, typename _Bi_iter,
           typename _Rx_traits, typename _Ch_type,
           typename _St, typename _Sa>
    inline _Out_iter
    regex_replace(_Out_iter __out, _Bi_iter __first, _Bi_iter __last,
                  const basic_regex<_Ch_type, _Rx_traits>& __e,
                  const basic_string<_Ch_type, _St, _Sa>& __fmt,
                  regex_constants::match_flag_type __flags
                  = regex_constants::match_default)
    {
      return regex_replace(__out, __first, __last, __e, __fmt.c_str(), __flags);
    }
Going by the regex_replace write-up at cppreference.com, this implements the first overload, the one that takes a std::basic_string as the format (replacement) string, by calling its c_str() and forwarding to the second overload, the one that takes a const char* format, for the actual work. That explains the observed behavior: once only a const char* is passed along, the replacement is treated as a null-terminated string, and everything from the embedded '\0' onward is lost. I can't find anything that requires this approach.
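The truncation can be reproduced without <regex> at all. The helper names below are hypothetical; they only mimic what code that receives nothing but c_str() is able to see.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: the length of `s` as seen through s.c_str() alone.
// char_traits<char>::length() scans for the first '\0', just like strlen().
std::size_t length_via_c_str(const std::string& s) {
    return std::char_traits<char>::length(s.c_str());
}

// Hypothetical helper: rebuilds a std::string from s.c_str() alone, the way
// a const char* overload would see it; everything from the first '\0' is dropped.
std::string roundtrip_via_c_str(const std::string& s) {
    return std::string(s.c_str());
}
```

For std::string("a\0b", 3), size() is 3, but length_via_c_str() returns 1 and roundtrip_via_c_str() returns "a", which is exactly the replacement text that ends up in "1a2".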
Stepping further into the actual implementation:
auto __len = char_traits<_Ch_type>::length(__fmt);
__out = __i->format(__out, __fmt, __fmt + __len, __flags);
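So it determines the length of the replacement string with char_traits<_Ch_type>::length(), which stops at the first null character, and passes the replacement string to format() as a beginning and an ending iterator. For "a\0b", __len is 1, so only "a" ever reaches format().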
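To check that format() itself copes with an embedded null when it is handed the full range, here is a sketch comparing the two ways of building the [first, last) range. The function names are hypothetical; only std::regex_search and match_results::format are standard.

```cpp
#include <iterator>
#include <regex>
#include <string>

// Hypothetical: formats the first match of `re` in `s` with `fmt`, passing the
// format string as a pointer range that covers its full size().
std::string format_with_range(const std::string& s, const std::regex& re,
                              const std::string& fmt) {
    std::smatch m;
    std::string out;
    if (std::regex_search(s, m, re)) {
        m.format(std::back_inserter(out), fmt.data(), fmt.data() + fmt.size());
    }
    return out;
}

// Hypothetical: same, but the end of the range is computed with
// char_traits::length(), as in the libstdc++ code above, so it stops at '\0'.
std::string format_with_length(const std::string& s, const std::regex& re,
                               const std::string& fmt) {
    std::smatch m;
    std::string out;
    if (std::regex_search(s, m, re)) {
        const char* first = fmt.c_str();
        const char* last = first + std::char_traits<char>::length(first);
        m.format(std::back_inserter(out), first, last);
    }
    return out;
}
```

With the question's inputs, format_with_range() yields all three characters of "a\0b", while format_with_length() yields only "a".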
This seems like it should be the other way around: __fmt should be kept as a std::basic_string, and iterators derived directly from it (covering its full size()) should be passed into format().
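As a workaround, the suggested approach can be sketched outside the library as a replacement loop over std::sregex_iterator that hands the format string's full range to format(). regex_replace_full is a hypothetical name; the sketch mirrors the copy-prefix, format, copy-suffix behavior of regex_replace with default flags only (format_no_copy, format_first_only and similar flags are not handled).

```cpp
#include <algorithm>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper (not part of the standard library): replaces every
// match of `re` in `s` with `fmt`, passing the format string to
// match_results::format() as an explicit [first, last) range so that
// embedded '\0' characters are preserved.
std::string regex_replace_full(const std::string& s, const std::regex& re,
                               const std::string& fmt) {
    std::string out;
    auto out_it = std::back_inserter(out);
    auto last_end = s.cbegin();
    for (std::sregex_iterator it(s.begin(), s.end(), re), end; it != end; ++it) {
        // Copy the text between the previous match and this one.
        out_it = std::copy(it->prefix().first, it->prefix().second, out_it);
        // Expand the format string using its full size(), not strlen().
        out_it = it->format(out_it, fmt.data(), fmt.data() + fmt.size());
        last_end = (*it)[0].second;
    }
    // Copy whatever follows the last match (or the whole string if none).
    std::copy(last_end, s.cend(), out_it);
    return out;
}
```

With it, regex_replace_full("1<sn>2", std::regex("<sn>"), std::string("a\0b", 3)) returns a five-character string, which is what the question expected.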