Why does the towlower() function not convert the Я to a lower-case я?

The function towlower() doesn't seem to work in Visual Studio 2012. Here's an example:

#include <string>
#include <iostream>
#include <io.h>
#include <fcntl.h>
#include <wctype.h>

using namespace std;

int main()
{
    _setmode(_fileno(stdout), _O_U8TEXT);
    wcout << (wchar_t)towlower(L'Я') << endl;
    system("pause");
    return 0;
}

The character remains upper case. Similar questions have been asked here before but I can't find any solutions.

Is there another method I can use to change to lower case?

Johnny Mnemonic asked Apr 08 '13 21:04


2 Answers

Use the locale-aware overload of std::tolower from <locale>, but don't forget to also set the C locale.

For example:

#include <clocale>
#include <locale>
#include <iostream>

int main()
{
    std::setlocale(LC_CTYPE, "");
    std::wcout << L"The letter is: " << L'Я' << L" => "
               << std::tolower(L'Я', std::locale("")) << std::endl;
}

With a locale that covers Cyrillic selected in the environment, this prints:

The letter is: Я => я

Using locales in iostreams is tricky business, and there's a whole Pandora's box hidden behind this. For example, you can imbue streams with a locale, you can manage multiple locales at once, and in particular you can have one per thread (which may be necessary for stateful string encoding conversions). Someone should write a book about that (or use Boost.Locale instead).

Kerrek SB answered Nov 14 '22 00:11


I see two possibilities. The first one is that the locale is not set correctly. From MSDN:

The case conversion of towlower is locale-specific. Only the characters relevant to the current locale are changed in case. The functions without the _l suffix use the currently set locale.

The second one is the source file encoding. L'Я' might mean different things depending on how your source file is encoded. It won't work, for example, if you have it in UTF-8. Make sure you have it in UTF-16. Or, to remove any possible confusion, write it as L'\u042F'.

Update: On second thought, this whole L business is tricky. If the compiler understands the encoding correctly, via a BOM for example, it might be fine with UTF-8 or any other encoding. What matters is that the compiler knows what the encoding is, and that is very much implementation-specific.

Another update: To fix the problem, try setting the locale via:

_wsetlocale(LC_ALL, L"ru-RU");

or use the version that takes the locale as a parameter (_towlower_l).

On top of everything, there's also a pragma (#pragma setlocale in MSVC) that tells the compiler how to treat non-ASCII characters in wide string literals in the file.

detunized answered Nov 14 '22 00:11