xlocale broken on OS X?

I have a simple program that tests converting between wchar_t and char using a series of locales passed to it on the command line. It outputs a list of the conversions that fail by printing out the locale name and the string that failed to convert.

I'm building it using clang and libc++. My understanding is that libc++'s named locale support is provided by the xlocale library on OS X.

I'm seeing some unexpected failures, as well as some instances where conversion should fail, but doesn't.

Here's the program.

#warning call this program like: "locale -a | ./a.out" or pass \
locale names valid for your platform, one per line via standard input

#include <array>
#include <codecvt>
#include <cwchar>    // std::mbstate_t
#include <iostream>
#include <locale>
#include <string>
#include <utility>   // std::forward

template <class Facet>
class usable_facet : public Facet {
public:
    // FIXME: use inheriting constructors when available
    // using Facet::Facet;
    template <class ...Args>
    usable_facet(Args&& ...args) : Facet(std::forward<Args>(args)...) {}
    ~usable_facet() {}
};

int main() {
    std::array<std::wstring,11> args = {L"a",L"é",L"¤",L"€",L"Да",L"Ψ",L"א",L"আ",L"✈",L"가",L"𐌅"};

    std::wstring_convert<usable_facet<std::codecvt_utf8<wchar_t>>> u8cvt; // wchar_t uses UCS-4/UTF-32 on this platform

    int convert_failures = 0;
    std::string line;
    while(std::getline(std::cin,line)) {
        if(line.empty())
            continue;

        using codecvt = usable_facet<std::codecvt_byname<wchar_t,char,std::mbstate_t>>;
        std::wstring_convert<codecvt> convert(new codecvt(line));

        for(auto const &s : args) {
            try {
                convert.to_bytes(s);
            } catch (std::range_error const &) {
                convert_failures++;
                std::cout << line << " : " << u8cvt.to_bytes(s) << '\n';
            }
        }
    }

    std::cout << std::string(80,'=') << '\n';
    std::cout << convert_failures << " wstring_convert to_bytes failures.\n";
}

Here are some examples of correct output:

en_US.ISO8859-1 : €
en_US.US-ASCII : ✈

Here's an example of output that is not expected:

en_US.ISO8859-15 : €

The euro character does exist in the ISO 8859-15 charset (at 0xA4), so this conversion should not fail.

Here are examples of output that I expect but do not receive:

en_US.ISO8859-15 : ¤
en_US.US-ASCII : ¤

This is the currency symbol that exists in ISO 8859-1 but was removed and replaced with the euro symbol in ISO 8859-15; it is not part of US-ASCII at all. Neither conversion should succeed, but no error is signaled. Examining this further, I find that in both locales '¤' is converted to 0xA4, which is the ISO 8859-1 representation of '¤'.
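
To look at the bytes a conversion actually produced, one option is to dump them in hex. This is only a minimal sketch; hex_bytes is a hypothetical helper, not part of the program above.

```cpp
#include <string>

// Hypothetical helper: render each byte of a narrow string as "0xNN",
// separated by spaces, so the result of to_bytes() can be inspected.
std::string hex_bytes(const std::string& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char b : bytes) {
        if (!out.empty())
            out += ' ';
        out += "0x";
        out += digits[b >> 4];    // high nibble
        out += digits[b & 0x0F];  // low nibble
    }
    return out;
}
```

Printing hex_bytes(convert.to_bytes(s)) inside the try block would show, for each locale, which byte sequence was produced for a given character.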

I'm not using xlocale directly, only indirectly via libc++. Is xlocale on Mac OS X simply broken with bad locale definitions? Is there a way to fix it? Or are the issues I'm seeing a result of something else?

bames53 asked Nov 04 '22 04:11



1 Answer

I suspect you are seeing problems with the xlocale system. A bug report would be most appreciated!
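
To gather evidence for such a report, it may help to check whether the C library shows the same behaviour without libc++ in between. The sketch below is one possible approach, assuming the POSIX per-thread locale functions newlocale, uselocale and freelocale are available (on OS X they are declared in <xlocale.h>). The names Outcome and c_level_convert are hypothetical.

```cpp
#include <cassert>
#include <climits>   // MB_LEN_MAX
#include <cstring>   // std::memset
#include <cwchar>    // std::wcrtomb, std::mbstate_t
#include <string>
#include <locale.h>  // newlocale, uselocale, freelocale (POSIX)
#ifdef __APPLE__
#include <xlocale.h> // assumption: OS X declares the locale_t functions here
#endif

// Hypothetical result type for the check below.
enum class Outcome { ok, no_locale, rejected };

// Hypothetical helper: convert one wide character with wcrtomb() under the
// named locale, bypassing the C++ locale facets entirely.
Outcome c_level_convert(const char* locale_name, wchar_t wc, std::string& out) {
    assert(locale_name != nullptr);

    locale_t loc = newlocale(LC_ALL_MASK, locale_name, (locale_t)0);
    if (loc == (locale_t)0)
        return Outcome::no_locale;       // locale name not known to the C library

    locale_t previous = uselocale(loc);  // switch this thread to the new locale

    char buf[MB_LEN_MAX];
    std::mbstate_t state;
    std::memset(&state, 0, sizeof state); // start in the initial shift state
    std::size_t n = std::wcrtomb(buf, wc, &state);

    uselocale(previous);                 // restore the thread's earlier locale
    freelocale(loc);

    if (n == static_cast<std::size_t>(-1))
        return Outcome::rejected;        // character not representable (EILSEQ)

    out.assign(buf, n);
    return Outcome::ok;
}
```

If calling c_level_convert("en_US.ISO8859-15", L'€', out) reports a rejection, or the '¤' case reports success, then the same discrepancy appears below libc++, which would point at the platform's locale data.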

Howard Hinnant answered Nov 10 '22 21:11
