cppreference says std::ctype provides character classification based on the classic "C" locale. Is this even true when we create a locale like this:
std::locale loc(std::locale("en_US.UTF8"), new std::ctype<char>);
Will the facet of loc still classify characters based on the "C" locale or the Unicode one? If it classifies by the former, why do we even specify the locale name as "en_US.UTF8"?
The standard requires the default-constructed std::ctype<char> to match the minimal "C" locale, per §22.4.1.3.3 [facet.ctype.char.statics]/1:
static const mask* classic_table() noexcept;
Returns: A pointer to the initial element of an array of size table_size which represents the classifications of characters in the "C" locale.
The classification member function is() is defined in terms of table(), which is defined in terms of classic_table() unless another table was provided to the ctype<char> constructor.
I've updated cppreference to match these requirements more closely (it was saying "C" for std::ctype<wchar_t> too).
To answer your second question, the locale constructed with std::locale loc(std::locale("en_US.UTF8"), new std::ctype<char>); will use the ctype facet you specified (and, therefore, "C") to classify narrow characters, but it's redundant: narrow character classification of a plain std::locale("en_US.UTF8") (at least in the GNU implementation) is exactly the same:
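#include <cassert>
#include <cstddef>
#include <locale>

int main()
{
    // Table used by the ctype<char> facet of the named locale
    // (the constructor throws std::runtime_error if the locale is not installed).
    std::locale loc1("en_US.UTF8");
    const std::ctype_base::mask* tbl1 =
        std::use_facet<std::ctype<char>>(loc1).table();

    // Table used after replacing that facet with a default-constructed one.
    std::locale loc2(std::locale("en_US.UTF8"), new std::ctype<char>);
    const std::ctype_base::mask* tbl2 =
        std::use_facet<std::ctype<char>>(loc2).table();

    // Both tables classify every narrow character identically.
    for (std::size_t n = 0; n < std::ctype<char>::table_size; ++n)
        assert(tbl1[n] == tbl2[n]);
}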