
transform_primary() and collate_byname()

Tags:

c++

c++11

locale

To give context for what I'm asking: the following program correctly prints true when compiled with clang++/libc++:

#include <iostream>
#include <locale>
#include <regex>
#include <string>
int main()
{
    std::locale::global(std::locale("en_US.UTF-8"));
    std::wstring str = L"AÀÁÂÃÄÅaàáâãäå";
    std::wregex re(L"[[=a=]]*", std::regex::basic);
    std::cout << std::boolalpha << std::regex_match(str, re) << '\n';
}

However, I can't quite understand the description of std::regex_traits::transform_primary() in the standard (through which [=a=] is handled). To quote 28.7 [re.traits]/7:

if typeid(use_facet<collate<charT> >) == typeid(collate_byname<charT>) and the form of the sort key returned by collate_byname<charT>::transform(first, last) is known and can be converted into a primary sort key then returns that key, otherwise returns an empty string.

The original proposal explains that the standard regex_traits::transform_primary() can only work if the collate facet in the imbued locale was not replaced by the user (that's the only way it can know how to convert the result of collate::transform() to the equivalence key).

My question is, how is the typeid comparison in the standard supposed to ensure that? Does it imply that all system-supplied facets pulled out of locales with use_facet have _byname as their true dynamic types?

Cubbi asked Jul 13 '12 13:07



1 Answer

"My question is, how is the typeid comparison in the standard supposed to ensure that? Does it imply that all system-supplied facets pulled out of locales with use_facet have _byname as their true dynamic types?"

To answer the first half of your question: the typeid comparison ensures this because, if the user has replaced the locale's collate facet with a class of their own, the dynamic type of the facet obtained through use_facet is no longer collate_byname<charT>, and the comparison fails. If the typeids do match, the function to be dispatched cannot have been overridden by the user. Thus you get the system collate_byname class, and the proper transform gets called.

To answer the second part of your question: yes, it simply means that all system-supplied facets associated with locales expected to be used by regex conform to this implementation requirement. Earlier in the same document from which you pulled the cited reference to 28.7, you will find:

Note also that there is no portable way to implement transform_primary in terms of std::locale, since even if the sort key format returned by std::collate_byname<>::transform is known and can be converted into a primary sort key, the user can still install their own custom std::collate implementation into the locale object used, and that can use any sort key format they see fit. The transform_primary member function is therefore more of use to custom traits classes, and should throw an exception if it cannot be implemented for a particular locale.

In short, this is telling us that if the facet has anything but the expected (i.e. system-supplied) type, the results could be unpredictable, because the user could supply a different sort key format. With the system-supplied facet, the typeid is known, and thus the sort key format is known and predictable.

john.k.doe answered Dec 26 '22 05:12
