
wstring::find() doesn't work with non-latin symbols?

I have a wide-character string (std::wstring) in my code, and I need to search for a wide character in it.

I use the find() function for it:

    wcin >> str;
    wcout << ((str.find(L'ф') != wstring::npos)? L"EXIST":L"NONE");

L'ф' is a Cyrillic letter.

But find() in this call always returns npos. With Latin letters, find() works correctly.

Is this a problem with the function, or am I doing something incorrectly?


I use MinGW and save the source in UTF-8. I also set the locale with setlocale(LC_ALL, "");. Code such as wcout << L'ф'; works correctly. But code such as

    wchar_t w;
    wcin >> w;
    wcout << w;

works incorrectly.

This is strange. Previously I had no encoding problems when using setlocale().

shau-kote asked Apr 03 '13 15:04


3 Answers

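The encoding of your source file and the execution environment's encoding may be wildly different. C++ makes no guarantees about any of this. You can check this by outputting the hexadecimal value of the character in your literal (cast it to an integer first, otherwise the stream prints the character itself rather than its value):

std::wcout << std::hex << static_cast<unsigned>(L'ф');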

Before C++11, you could use non-ASCII characters in source code by using their hex values:

"\x05" "five"

C++11 adds the ability to specify their Unicode value, which in your case (Cyrillic ф is U+0444) would be

L"\u0444"

If you're going full C++11 (and your environment ensures these are encoded in UTF-*), you can use any of char, char16_t, or char32_t, and do:

const char* ef_utf8 = "\u0444";
const char16_t* ef_utf16 = u"\u0444";
const char32_t* ef_utf32 = U"\u0444";
rubenvb answered Nov 07 '22 17:11


You must set the encoding of the console.

This works:

#include <iostream>
#include <string>
#include <io.h>
#include <fcntl.h>
#include <stdio.h>

using namespace std;

int main()
{
    // Switch the console streams to UTF-16 text mode (Windows-specific).
    _setmode(_fileno(stdout), _O_U16TEXT);
    _setmode(_fileno(stdin), _O_U16TEXT);
    wstring str;
    wcin >> str;
    wcout << ((str.find(L'ф') != wstring::npos)? L"EXIST":L"NONE");
    return 0;
}
Johnny Mnemonic answered Nov 07 '22 18:11

std::wstring::find() works fine. But you have to read the input string correctly.

The following code runs fine on Windows console (the input Unicode string is read using ReadConsoleW() Win32 API):

#include <exception>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <windows.h>
using namespace std;

class Win32Error : public runtime_error
{
public:
    Win32Error(const char* message, DWORD error)
        : runtime_error(message)
        , m_error(error)
    {}

    DWORD Error() const
    {
        return m_error;
    }

private:
    DWORD m_error;
};

void ThrowLastWin32(const char* message)
{
    const DWORD error = GetLastError();
    throw Win32Error(message, error);
}

void Test()
{
    const HANDLE hStdIn = GetStdHandle(STD_INPUT_HANDLE);
    if (hStdIn == INVALID_HANDLE_VALUE)
        ThrowLastWin32("GetStdHandle failed.");

    static const int kBufferLen = 200;
    wchar_t buffer[kBufferLen];
    DWORD numRead = 0;

    if (! ReadConsoleW(hStdIn, buffer, kBufferLen, &numRead, nullptr))
        ThrowLastWin32("ReadConsoleW failed.");

    // Drop the trailing "\r\n" that ends the line typed at the console.
    const wstring str(buffer, numRead - 2);

    static const wchar_t kEf = 0x0444; // Cyrillic small letter ef
    wcout << ((str.find(kEf) != wstring::npos) ? L"EXIST" : L"NONE");
}

int main()
{
    static const int kExitOk = 0;
    static const int kExitError = 1;

    try
    {
        Test();
        return kExitOk;
    }
    catch(const Win32Error& e)
    {
        cerr << "\n*** ERROR: " << e.what() << '\n';
        cerr << "    (GetLastError returned " << e.Error() << ")\n";
        return kExitError;
    }
    catch(const exception& e)
    {
        cerr << "\n*** ERROR: " << e.what() << '\n';
        return kExitError;
    }
}


Mr.C64 answered Nov 07 '22 18:11
