wstring::find() doesn't work with non-latin symbols?

I have a wide-character string (std::wstring) in my code, and I need to search for a wide character in it.

I use the find() function for this:

    wcin >> str;
    wcout << ((str.find(L'ф') != wstring::npos)? L"EXIST":L"NONE");

L'ф' is a Cyrillic letter.

But find() in this call always returns npos. With Latin letters, find() works correctly.

Is this a problem with the function, or am I doing something wrong?

UPD

I use MinGW and save the source in UTF-8. I also set the locale with setlocale(LC_ALL, "");. Code such as wcout << L'ф'; works correctly. But this

    wchar_t w;
    wcin >> w;
    wcout << w;

does not work correctly.

This is strange. Earlier I had no problems with the encoding when using setlocale().

shau-kote asked Apr 03 '13 15:04


3 Answers

The encoding of your source file and the execution environment's encoding may be wildly different. C++ makes no guarantees about any of this. You can check this by outputting the hexadecimal value of your character literal (cast to an integer, since streaming the literal itself prints the character, not its value):

std::wcout << std::hex << static_cast<unsigned>(L'ф');
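
The check above can be extended to whole strings. The following is a minimal sketch, assuming a hypothetical helper `hex_units` (not part of the original answer) that formats each `wchar_t` code unit as zero-padded hexadecimal:

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: render every code unit of a wide string as
// zero-padded hexadecimal, separated by single spaces.
std::string hex_units(const std::wstring& s)
{
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (i != 0)
            out << ' ';
        // setw applies only to the next insertion, so set it every time.
        out << std::setw(4) << static_cast<unsigned long>(s[i]);
    }
    return out.str();
}
```

If `hex_units(L"ф")` yields something other than `0444`, the compiler most likely decoded the source file with a different encoding than the one it was saved in. With GCC and MinGW, the `-finput-charset` option is one place to look.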

Before C++11, you could use non-ASCII characters in source code by using their hex values:

"\x05" "five"

You can also specify the Unicode code point with a universal character name, which in your case (ф is U+0444) would be

L"\u0444"

If you're going full C++11 (and your environment ensures these are encoded in UTF-*), you can use any of char, char16_t, or char32_t, and do:

const char* ef_utf8 = "\u0444";
const char16_t* ef_utf16 = u"\u0444";
const char32_t* ef_utf32 = U"\u0444";
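
To make the difference between these literal kinds concrete, here is a small sketch using the standard string types. The variable names are illustrative only, and U+0444 (the Cyrillic letter ф from the question) is used as the code point:

```cpp
#include <string>

// U+0444 (CYRILLIC SMALL LETTER EF) written as a universal character name,
// so the result does not depend on how the source file is saved.
const std::string    ef_narrow = "\u0444";   // execution charset; 2 bytes if that is UTF-8
const std::u16string ef_utf16  = u"\u0444";  // one UTF-16 code unit
const std::u32string ef_utf32  = U"\u0444";  // one UTF-32 code unit
const std::wstring   ef_wide   = L"\u0444";  // one wchar_t (16-bit on Windows, 32-bit on Linux)
```

Searching `ef_wide` with `find(L'\u0444')` succeeds regardless of the source file's encoding, because neither the string nor the searched character relies on a non-ASCII character typed into the source.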
rubenvb answered Nov 07 '22 17:11


You must put the console streams into Unicode (UTF-16) mode.

This works:

#include <cstdlib>   // system
#include <iostream>
#include <string>
#include <io.h>      // _setmode
#include <fcntl.h>   // _O_U16TEXT
#include <stdio.h>   // _fileno

using namespace std;

int main()
{
    // Windows-specific: switch stdout and stdin to UTF-16 text mode.
    _setmode(_fileno(stdout), _O_U16TEXT);
    _setmode(_fileno(stdin), _O_U16TEXT);
    wstring str;
    wcin >> str;
    wcout << ((str.find(L'ф') != wstring::npos) ? L"EXIST" : L"NONE");
    system("pause");
    return 0;
}
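
Why the stream mode matters can be illustrated without a console. The sketch below is a simplified model, not a diagnosis of the asker's exact setup: it assumes the input arrives as UTF-8 bytes (a Windows console may deliver a legacy code page instead, with the same effect), and both function names are hypothetical. It compares widening each byte on its own with actually decoding the bytes:

```cpp
#include <cstddef>
#include <string>

// Models a stream that performs no real conversion:
// every input byte becomes its own wchar_t.
std::wstring widen_bytes(const std::string& bytes)
{
    std::wstring out;
    for (unsigned char b : bytes)
        out.push_back(static_cast<wchar_t>(b));
    return out;
}

// Simplified UTF-8 decoder for this illustration only: handles 1-byte
// and 2-byte sequences (U+0000..U+07FF) and performs no validation.
std::wstring decode_utf8_short(const std::string& bytes)
{
    std::wstring out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        if (b0 >= 0xC0 && b0 < 0xE0 && i + 1 < bytes.size()) {
            // Two-byte sequence: 110xxxxx 10xxxxxx
            const unsigned char b1 = static_cast<unsigned char>(bytes[++i]);
            out.push_back(static_cast<wchar_t>(((b0 & 0x1F) << 6) | (b1 & 0x3F)));
        } else {
            out.push_back(static_cast<wchar_t>(b0));
        }
    }
    return out;
}
```

For the UTF-8 bytes of "abcф" (`"abc\xD1\x84"`), `widen_bytes` produces five code units, none of which equals U+0444, so `find(L'\u0444')` returns `npos`. `decode_utf8_short` produces four code units, and `find` locates the letter at index 3.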
Johnny Mnemonic answered Nov 07 '22 18:11


std::wstring::find() works fine. But you have to read the input string correctly.

The following code runs fine on the Windows console (the input Unicode string is read with the ReadConsoleW() Win32 API):

#include <exception>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <windows.h>
using namespace std;

class Win32Error : public runtime_error
{
public:
    Win32Error(const char* message, DWORD error)
        : runtime_error(message)
        , m_error(error)
    {}

    DWORD Error() const
    {
        return m_error;
    }

private:
    DWORD m_error;
};

void ThrowLastWin32(const char* message)
{
    const DWORD error = GetLastError();
    throw Win32Error(message, error);
}

void Test()
{
    const HANDLE hStdIn = GetStdHandle(STD_INPUT_HANDLE);
    if (hStdIn == INVALID_HANDLE_VALUE)
        ThrowLastWin32("GetStdHandle failed.");

    static const int kBufferLen = 200;
    wchar_t buffer[kBufferLen];
    DWORD numRead = 0;

    if (! ReadConsoleW(hStdIn, buffer, kBufferLen, &numRead, nullptr))
        ThrowLastWin32("ReadConsoleW failed.");

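    // Drop the trailing "\r\n" that ReadConsoleW stores when Enter is pressed,
    // without underflowing if fewer than two characters were read.
    DWORD len = numRead;
    while (len > 0 && (buffer[len - 1] == L'\r' || buffer[len - 1] == L'\n'))
        --len;
    const wstring str(buffer, len);

    static const wchar_t kEf = 0x0444; // U+0444 CYRILLIC SMALL LETTER EF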
    wcout << ((str.find(kEf) != wstring::npos) ? L"EXIST" : L"NONE");
}

int main()
{
    static const int kExitOk = 0;
    static const int kExitError = 1;

    try
    {
        Test();
        return kExitOk;
    }    
    catch(const Win32Error& e)
    {
        cerr << "\n*** ERROR: " << e.what() << '\n';
        cerr << "    (GetLastError returned " << e.Error() << ")\n";
        return kExitError;
    }
    catch(const exception& e)
    {
        cerr << "\n*** ERROR: " << e.what() << '\n';
        return kExitError;
    }        
}

Output:

C:\TEMP>test.exe
abc
NONE
C:\TEMP>test.exe
abcфabc
EXIST
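
The claim that find() itself is not at fault can be checked independently of any console. This is a minimal sketch with the I/O factored out; `contains_ef` is a hypothetical helper name, and the test strings are spelled with universal character names so the source file's encoding plays no role:

```cpp
#include <string>

// U+0444 CYRILLIC SMALL LETTER EF, spelled as a code point so the
// source file's encoding cannot affect it.
const wchar_t kEfCodePoint = L'\u0444';

// Hypothetical helper: the same search as in the examples above,
// without any console input or output.
bool contains_ef(const std::wstring& s)
{
    return s.find(kEfCodePoint) != std::wstring::npos;
}
```

Calling `contains_ef(L"abc\u0444abc")` returns true and `contains_ef(L"abc")` returns false, which suggests that any remaining failure comes from how the string or the literal reached the program, not from the search.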
Mr.C64 answered Nov 07 '22 18:11