I have a simple program that tests converting between wchar_t and char using a series of locales passed to it on the command line. It outputs a list of the conversions that fail by printing out the locale name and the string that failed to convert.
I'm building it using clang and libc++. My understanding is that libc++'s named locale support is provided by the xlocale library on OS X.
I'm seeing some unexpected failures, as well as some instances where conversion should fail, but doesn't.
Here's the program.
#warning call this program like: "locale -a | ./a.out" or pass \
locale names valid for your platform, one per line via standard input

#include <array>
#include <codecvt>
#include <iostream>
#include <locale>
#include <string>
#include <utility>

// codecvt facets have protected destructors; this wrapper makes them
// usable with std::wstring_convert.
template <class Facet>
class usable_facet : public Facet {
public:
    // FIXME: use inheriting constructors when available
    // using Facet::Facet;
    template <class ...Args>
    usable_facet(Args&& ...args) : Facet(std::forward<Args>(args)...) {}
    ~usable_facet() {}
};

int main() {
    std::array<std::wstring,11> args = {L"a",L"é",L"¤",L"€",L"Да",L"Ψ",L"א",L"আ",L"✈",L"가",L"𐌅"};

    // Used only to print the failing strings as UTF-8.
    std::wstring_convert<usable_facet<std::codecvt_utf8<wchar_t>>> u8cvt; // wchar_t uses UCS-4/UTF-32 on this platform

    int convert_failures = 0;
    std::string line;
    while (std::getline(std::cin, line)) {
        if (line.empty())
            continue;

        // One named-locale facet per locale read from standard input.
        using codecvt = usable_facet<std::codecvt_byname<wchar_t,char,std::mbstate_t>>;
        std::wstring_convert<codecvt> convert(new codecvt(line));

        for (auto const &s : args) {
            try {
                convert.to_bytes(s);
            } catch (const std::range_error &) {
                convert_failures++;
                std::cout << line << " : " << u8cvt.to_bytes(s) << '\n';
            }
        }
    }

    std::cout << std::string(80,'=') << '\n';
    std::cout << convert_failures << " wstring_convert to_bytes failures.\n";
}
Here are some examples of correct output:
en_US.ISO8859-1 : €
en_US.US-ASCII : ✈
Here's an example of output that is not expected:
en_US.ISO8859-15 : €
The euro character does exist in the ISO 8859-15 charset and so this should not be failing.
Here are examples of output that I expect but do not receive:
en_US.ISO8859-15 : ¤
en_US.US-ASCII : ¤
'¤' is the generic currency sign (U+00A4). It exists in ISO 8859-1 at 0xA4, but ISO 8859-15 replaced it with the euro sign at that position, and US-ASCII has no such character at all. Neither conversion should succeed, yet no error is signaled. Examining this further, I find that in both cases '¤' is converted to 0xA4, which is the ISO 8859-1 representation of '¤'.
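To make the expectation concrete, here is a minimal sketch of what a correct wide-to-ISO 8859-15 mapping should do, written as a standalone table lookup. It does not use the locale machinery at all, and `to_latin9` is a hypothetical helper name, not anything from libc++ or xlocale. It relies only on the fact that ISO 8859-15 differs from ISO 8859-1 at eight byte positions.

```cpp
#include <cstdint>

// Hypothetical reference helper (not part of libc++ or xlocale).
// Returns the ISO 8859-15 byte for a Unicode code point, or -1 if the
// character has no representation in that charset.
int to_latin9(std::uint32_t cp) {
    // The eight positions where ISO 8859-15 differs from ISO 8859-1.
    static const struct { std::uint32_t cp; int byte; } replaced[] = {
        {0x20AC, 0xA4}, // € replaces ¤
        {0x0160, 0xA6}, // Š replaces ¦
        {0x0161, 0xA8}, // š replaces ¨
        {0x017D, 0xB4}, // Ž replaces ´
        {0x017E, 0xB8}, // ž replaces ¸
        {0x0152, 0xBC}, // Œ replaces ¼
        {0x0153, 0xBD}, // œ replaces ½
        {0x0178, 0xBE}, // Ÿ replaces ¾
    };

    // Characters introduced by ISO 8859-15 map to their new byte.
    for (const auto& r : replaced)
        if (r.cp == cp) return r.byte;

    // Anything else outside the Latin-1 range is not representable.
    if (cp > 0xFF) return -1;

    // Latin-1 characters whose byte position was reassigned are gone.
    for (const auto& r : replaced)
        if (static_cast<std::uint32_t>(r.byte) == cp) return -1;

    // All remaining Latin-1 code points keep their ISO 8859-1 byte value.
    return static_cast<int>(cp);
}
```

With this reference, `to_latin9(0x20AC)` (the euro sign) yields 0xA4 and `to_latin9(0xA4)` (the currency sign) yields -1, which is the opposite of what I observe from `codecvt_byname` with en_US.ISO8859-15.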
I'm not using xlocale directly, only indirectly via libc++. Is xlocale on Mac OS X simply broken with bad locale definitions? Is there a way to fix it? Or are the issues I'm seeing a result of something else?
I suspect you are seeing problems with the xlocale system. A bug report would be most appreciated!
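One way to narrow this down before filing the report is to bypass libc++ and call `wcrtomb` directly under the locale in question. The following is only a sketch: it assumes the POSIX `newlocale`/`uselocale`/`freelocale` functions are available (declared in `<xlocale.h>` on OS X and in `<locale.h>` elsewhere), and `encode_in_locale` is a hypothetical helper name. If it shows the same results as the `wstring_convert` test, the problem is likely below libc++.

```cpp
#include <climits>
#include <cwchar>
#include <string>
#if defined(__APPLE__)
#include <xlocale.h>
#else
#include <locale.h>
#endif

// Hypothetical helper: tries to encode one wide character with wcrtomb
// under the named locale. Returns true and fills `out` on success.
// Returns false if the locale cannot be created OR the character cannot
// be converted, so check that the locale exists before drawing conclusions.
bool encode_in_locale(const char* name, wchar_t wc, std::string& out) {
    locale_t loc = newlocale(LC_ALL_MASK, name, static_cast<locale_t>(0));
    if (!loc)
        return false; // locale not available on this system

    // Switch the calling thread to the requested locale for the conversion.
    locale_t old = uselocale(loc);

    char buf[MB_LEN_MAX];
    std::mbstate_t state = std::mbstate_t();
    std::size_t n = std::wcrtomb(buf, wc, &state);

    // Restore the previous locale before releasing the temporary one.
    uselocale(old);
    freelocale(loc);

    if (n == static_cast<std::size_t>(-1))
        return false; // wcrtomb reported an unconvertible character

    out.assign(buf, n);
    return true;
}
```

Calling it as `encode_in_locale("en_US.ISO8859-15", L'\u20AC', out)` and `encode_in_locale("en_US.ISO8859-15", L'\u00A4', out)` covers the two cases from the question; including those results in the bug report should help show where the fault lies.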