
Why do different GCC 4.9.2 installations give different results for this regex match?

Tags: c++, regex, gcc, c++14

I posted the following code on ideone and Coliru:

#include <iostream>
#include <regex>
#include <string>

int main() 
{
    std::string example{"   <match1>  <match2>    <match3>"};
    std::regex re{"<([^>]+)>"};
    std::regex_token_iterator<std::string::iterator> it{example.begin(), example.end(), re, 1};
    decltype(it) end{};
    while (it != end) std::cout << *it++ << std::endl;
    return 0;
}

Both sites use GCC 4.9.2. I don't know what compilation arguments ideone uses, but there is nothing unusual in Coliru's.

Coliru doesn't give me the match1 result:

Coliru

# g++ -v 2>&1 | grep version; \
# g++ -std=c++14 -O2 -Wall -pedantic -pthread main.cpp && ./a.out
gcc version 4.9.2 (GCC) 
match2
match3

ideone (and, incidentally, Coliru's clang 3.5.0 using libc++)

match1
match2
match3

Does my code have undefined behaviour or something? What could cause this?

Lightness Races in Orbit asked Apr 20 '15 14:04


2 Answers

It's a bug in libstdc++'s regex_token_iterator copy constructor, which the postincrement operator calls. The bug was fixed in December 2014; releases of gcc 4.9 and 5.x made since then have the fix. The nature of the bug is that the copy aliases the iterator it was copied from, leading to the observed behavior.
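To see why postincrement reaches the copy constructor while preincrement does not, here is a minimal sketch using a toy type. `Counter` and its `copies` member are hypothetical names made up for illustration; this is not libstdc++'s code, only the conventional shape of the two operators.

```cpp
// Toy iterator-like type (hypothetical, not libstdc++ code) that counts
// how many times its copy constructor runs.
struct Counter
{
    int value = 0;
    static int copies;

    Counter() = default;
    Counter(const Counter& other) : value(other.value) { ++copies; }
    Counter& operator=(const Counter&) = default;

    // Preincrement: modifies the object in place, no copy involved.
    Counter& operator++()
    {
        ++value;
        return *this;
    }

    // Postincrement, conventional form: copy the old state, advance,
    // then return the copy.
    Counter operator++(int)
    {
        Counter old(*this);  // the copy constructor runs here
        ++*this;
        return old;
    }
};

int Counter::copies = 0;
```

If the copy constructor is faulty, every `it++` hands back a faulty object, whereas `++it` never goes through that code path.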

The workaround is to use preincrement; this is desirable from a microoptimisation point of view as well, as regex_token_iterator is a reasonably heavy class:

for (; it != end; ++it) std::cout << *it << std::endl;
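For completeness, a sketch of the question's loop rewritten around the workaround. The helper name `extract_tags` is hypothetical (it is not in the question), and `std::sregex_token_iterator` is simply the standard alias for the `std::string::const_iterator` specialisation.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): collects capture group 1 of
// every "<...>" match, advancing the iterator with preincrement only.
std::vector<std::string> extract_tags(const std::string& text)
{
    static const std::regex re{"<([^>]+)>"};
    std::vector<std::string> out;

    std::sregex_token_iterator it{text.begin(), text.end(), re, 1};
    const std::sregex_token_iterator end{};

    // ++it never copies the iterator, so the faulty copy constructor
    // is not exercised.
    for (; it != end; ++it)
        out.push_back(it->str());

    return out;
}
```

Calling `extract_tags("   <match1>  <match2>    <match3>")` should return match1, match2 and match3, in that order.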
ecatmur answered Sep 22 '22 07:09


The code is valid.

The only plausible explanation is that the standard library versions differ; although for the most part standard library implementations are shipped with compilers, they can be upgraded independently through, say, a Linux package manager.

In this instance it seems to be a libstdc++ fault that was fixed in late 2014:

  • Coliru has __GLIBCXX__ == 20141030
  • ideone has __GLIBCXX__ == 20141220
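The values above can be read from any program built against libstdc++, since its headers define `__GLIBCXX__` as a date stamp. A minimal sketch follows; the function name `libstdcxx_datestamp` is hypothetical, and returning 0 for other standard libraries is just a convention chosen for this example.

```cpp
#include <string>  // any standard header; libstdc++ headers define __GLIBCXX__

// Hypothetical helper: returns the libstdc++ date stamp, or 0 when the
// program is built against a different standard library (e.g. libc++).
long libstdcxx_datestamp()
{
#ifdef __GLIBCXX__
    return __GLIBCXX__;
#else
    return 0;
#endif
}
```

Printing the result with `std::cout << libstdcxx_datestamp() << std::endl;` on each site is one way to compare the two installations. The stamp identifies a library release by date, so treat it as a rough indicator rather than a strict version ordering.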

The most likely match on Bugzilla that I can find is bug 63497 but, to be honest, I'm not convinced this particular bug was ever fully covered by Bugzilla. Joseph Mansfield identified that these specific symptoms in this specific case are triggered by the post-fix increment, at least.

Lightness Races in Orbit answered Sep 25 '22 07:09