C++ regex with primary classes does not match

Tags: c++, regex, locale

On https://en.cppreference.com/w/cpp/regex/regex_traits/transform_primary the following example snippet is given:

#include <iostream>
#include <regex>

int main()
{
    std::locale::global(std::locale("en_US.UTF-8"));
    std::wstring str = L"AÀÁÂÃÄÅaàáâãäå";
    std::wregex re(L"[[=a=]]*", std::regex::basic);
    std::cout << std::boolalpha << std::regex_match(str, re) << '\n';
}

The page also says it should output true. However, trying it with GCC 8 and Clang 7 on Debian, and with the Clang that ships with macOS High Sierra, always gave false (you can test this directly with the "Run" button on the cppreference page).

One might say that the cppreference page is wrong, which is certainly possible. However, from reading the documentation, true also seems to me to be the right output: as I understand it, all the characters in the string str belong to the primary collating class of a.

So the question is: who is right? The compilers or cppreference? And why?

asked Apr 15 '19 13:04 by Giovanni Mascellani


1 Answer

Here's what the g++/libstdc++-9 implementation of transform_primary looks like:

template<typename _Fwd_iter>
string_type
transform_primary(_Fwd_iter __first, _Fwd_iter __last) const
{
  // TODO : this is not entirely correct.
  // This function requires extra support from the platform.
  //
  // Read http://gcc.gnu.org/ml/libstdc++/2013-09/msg00117.html and
  // http://www.open-std.org/Jtc1/sc22/wg21/docs/papers/2003/n1429.htm
  // for details.
  typedef std::ctype<char_type> __ctype_type;
  const __ctype_type& __fctyp(use_facet<__ctype_type>(_M_locale));
  std::vector<char_type> __s(__first, __last);
  __fctyp.tolower(__s.data(), __s.data() + __s.size());
  return this->transform(__s.data(), __s.data() + __s.size());
}

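The comment says "this is not entirely correct"; in my humble opinion, the comment itself is not quite right. It should have said "this is totally wrong", because it is. It simply doesn't work: tolower only folds case, and transform() then returns the full sort key, which still distinguishes accents. So "à" and "a" still get different keys, and none of the accented letters end up in the [[=a=]] class.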

The comment at the top of libc++-8 says:

// transform_primary is very FreeBSD-specific

Indeed, it doesn't work on Linux at all: it returns an empty string for every character. It could be working on macOS, which is loosely derived from FreeBSD, but I don't have one nearby to check. Since the question reports false on macOS as well, there could be a different bug lurking there.

So the answer is: at least some of the compilers (more precisely, their standard libraries) are wrong, at least some of the time.

answered Sep 21 '22 07:09 by n. 1.8e9-where's-my-share m.