
C++ String to lowercase with custom locale

I've been trying to call std::tolower() with a different locale but it seems that something is going wrong. My code is as follows:

int main() {
    std::locale::global(std::locale("es_ES.UTF-8"));
    std::thread(&function, this); // Repeated some times
    // wait for threads
}

void function() {
    std::string word = "HeÉllO";
    std::transform(word.begin(), word.end(), word.begin(), cToLower);
}

int cToLower(int c) {
    return std::tolower(c, std::locale());
}

So when I try to execute this program I get:

terminate called after throwing an instance of 'std::bad_cast'
terminate called recursively
  what():  std::bad_cast
Aborted (core dumped)

Executing return std::tolower(c); instead works fine, but it only converts the 'standard' characters to lowercase, not É.

I have some threads which are executing the same function simultaneously, using C++11 and compiling with g++ (in case it has something to do with it).

I was wondering whether this is the correct way to implement what I want to do, or whether there's some other way of doing it.

Thanks!

lpares12 asked Feb 26 '17 14:02


2 Answers

Unlike the version of tolower inherited from C (which takes a character converted to unsigned char and then to int), the <locale> version of tolower is meant to be called with characters directly. It is defined to use the std::ctype<charT> facet of the locale, and the only two std::ctype specializations guaranteed to be available are std::ctype<char> and std::ctype<wchar_t>. Passing an int makes charT deduce to int, the locale has no std::ctype<int> facet, and std::use_facet reports that by throwing std::bad_cast. Thus:

char cToLower(char c) {
    return std::tolower(c, std::locale());
}

Note that this is still a char-by-char transform; if a character occupies more than one byte (as É does in UTF-8), it is unlikely to be handled properly.
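To make the byte-wise limitation concrete, here is a minimal sketch that applies the char overload across a whole std::string. The helper name to_lower_bytes is hypothetical and not part of the answer; it only assumes the two-argument std::tolower from <locale>.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical helper (not from the answer): lowercase a narrow string
// one char at a time through the std::ctype<char> facet of loc.
std::string to_lower_bytes(std::string s, const std::locale& loc) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [&loc](char c) { return std::tolower(c, loc); });
    return s;
}
```

With std::locale::classic(), to_lower_bytes("HeLLO", ...) gives "hello", while the two bytes that encode É in UTF-8 (0xC3 0x89) pass through unchanged. Whatever locale is used, the facet only ever sees one byte at a time, so it cannot recognise É as a single character.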

T.C. answered Oct 05 '22 23:10


Check whether the locale you are trying to use is installed on your system. For example, I had to install the Spanish locale before the code below stopped crashing. Additionally, you could work with std::wstring instead. Update: after some digging, here is a good explanation of using wstring, with all the pros and cons (cons mostly).

#include <algorithm>
#include <iostream>
#include <locale>
#include <string>
#include <thread>

// Forward declaration
void function();

int main() {
    // Throws std::runtime_error if the locale is not installed
    std::locale::global(std::locale("es_ES.utf8"));
    std::thread test(&function);
    test.join();
}

// wchar_t selects std::ctype<wchar_t>, which every locale provides
wchar_t cToLower(wchar_t c) {
    return std::tolower(c, std::locale());
}

void function() {
    std::wstring word = L"HeÉllO";
    std::transform(word.begin(), word.end(), word.begin(), cToLower);
    std::wcout << word;
}

Output:

heéllo
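
The "is the locale installed" check can also be done from inside the program. The sketch below relies on the fact that the std::locale constructor throws std::runtime_error for a name the system does not know; locale_is_installed is a hypothetical helper name, not something from the standard library.

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns true if a std::locale can be
// constructed from name, false if construction throws.
bool locale_is_installed(const std::string& name) {
    try {
        std::locale probe(name);
        (void)probe;  // only the construction matters
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

Calling locale_is_installed("es_ES.utf8") before std::locale::global lets the program report a missing locale instead of terminating. Running locale -a in a shell lists the names the system accepts.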
j2ko answered Oct 05 '22 23:10