I have Microsoft Visual Studio 2010 on Windows 7, 64-bit. (In the project properties, "Character Set" is set to "Not Set"; however, every setting leads to the same output.)
Source code:
#include <cstdio>
#include <iostream>
using namespace std;

bool set_codepage(); // defined in a separate .cpp, see *1
char const charTest[] = "árvíztűrő tükörfúrógép ÁRVÍZTŰRŐ TÜKÖRFÚRÓGÉP\n";

int main()
{
    cout << charTest;
    printf(charTest);
    if (set_codepage()) // SetConsoleOutputCP(CP_UTF8); // *1
        cerr << "DEBUG: set_codepage(): OK" << endl;
    else
        cerr << "DEBUG: set_codepage(): FAIL" << endl;
    cout << charTest;
    printf(charTest);
}
*1: Including windows.h messes things up, so I include it from a separate .cpp file.
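For reference, a plausible shape of that helper, assuming it simply wraps SetConsoleOutputCP (the function name comes from the question; the body is a sketch):

// set_codepage.cpp -- windows.h is isolated in this translation unit
// so its macros (min/max, etc.) don't leak into the rest of the project.
#define WIN32_LEAN_AND_MEAN
#include <windows.h>

bool set_codepage()
{
    // CP_UTF8 == 65001; SetConsoleOutputCP returns nonzero on success.
    return SetConsoleOutputCP(CP_UTF8) != 0;
}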
The compiled binary contains the string as a correct UTF-8 byte sequence. If I set the console to UTF-8 with chcp 65001 and issue type main.cpp, the string displays correctly.
Test (console set to use Lucida Console font):
D:\dev\user\geometry\Debug>chcp
Active code page: 852
D:\dev\user\geometry\Debug>listProcessing.exe
├írv├şzt┼▒r┼Ĺ t├╝k├Ârf├║r├│g├ęp ├üRV├ŹZT┼░R┼É T├ťK├ľRF├ÜR├ôG├ëP
├írv├şzt┼▒r┼Ĺ t├╝k├Ârf├║r├│g├ęp ├üRV├ŹZT┼░R┼É T├ťK├ľRF├ÜR├ôG├ëP
DEBUG: set_codepage(): OK
��rv��zt��r�� t��k��rf��r��g��p ��RV��ZT��R�� T��K��RF��R��G��P
árvíztűrő tükörfúrógép ÁRVÍZTŰRŐ TÜKÖRFÚRÓGÉP
What is the explanation behind this? Can I somehow get cout to work like printf does?
ATTACHMENT
Many say that the Windows console does not support UTF-8 characters at all. I'm a Hungarian guy in Hungary; my Windows is set to English (except date formats, which are set to Hungarian), and Cyrillic letters are still displayed correctly alongside Hungarian letters:
(My default console codepage is CP852)
Some background on the terms: Unicode is a character set, a list of characters with unique numbers (code points), while UTF-8 and UTF-16 are encodings of that set. An ANSI code page is a single-byte encoding covering one particular alphabet; UTF-8 is a variable-length encoding (one to four bytes per character, ASCII-compatible in its one-byte range) that can represent every Unicode character; UTF-16 encodes each character as two or four bytes. UTF-8 is one of the most widely used encodings today.
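As a concrete illustration of the variable length, here is a small sketch that prints the UTF-8 byte counts of some characters from the test string (escape sequences are used so the source-file encoding doesn't matter):

#include <cstring>
#include <iostream>

int main()
{
    // In UTF-8, ASCII stays one byte while accented letters grow:
    // 'a' is 1 byte, 'á' (U+00E1) and 'ű' (U+0171) are 2 bytes each.
    std::cout << std::strlen("a") << '\n';          // prints 1
    std::cout << std::strlen("\xC3\xA1") << '\n';   // "á" -> prints 2
    std::cout << std::strlen("\xC5\xB1") << '\n';   // "ű" -> prints 2
}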
The difference here is in how the C++ runtime and the C library handle the system locale.
To achieve the same result with std::cout, you can try the std::ios::imbue method together with std::locale.
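A minimal sketch of that approach (".65001" is MSVC's locale name for the UTF-8 code page; older runtimes, including VS2010's, may reject it and throw, and whether a narrow stream actually converts anything is implementation-defined):

#include <iostream>
#include <locale>
#include <stdexcept>

int main()
{
    try {
        // The locale constructor throws std::runtime_error
        // if the named locale is unsupported.
        std::locale utf8(".65001");
        std::cout.imbue(utf8);
    } catch (std::runtime_error const& e) {
        std::cerr << "UTF-8 locale not supported: " << e.what() << '\n';
    }
    std::cout << "árvíztűrő tükörfúrógép\n";
}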
But the main issue with UTF-8 and C++ is described here:
C++03 offers two kinds of string literals. The first kind, contained within double quotes, produces a null-terminated array of type const char. The second kind, defined as L"", produces a null-terminated array of type const wchar_t, where wchar_t is a wide-character. Neither literal type offers support for string literals with UTF-8, UTF-16, or any other kind of Unicode encodings.
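In code, the two C++03 literal kinds look like this; the Unicode prefixes shown in comments were only added in C++11 and are not available in VS2010:

const char    narrow[] = "text";  // narrow literal, execution character set
const wchar_t wide[]   = L"text"; // wide literal (UTF-16 code units on Windows)

// C++11 later added explicit Unicode literals, unavailable in C++03/VS2010:
//   u8"text"  ->  const char[]      (UTF-8)
//   u"text"   ->  const char16_t[]  (UTF-16)
//   U"text"   ->  const char32_t[]  (UTF-32)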
So in any case it is all implementation-specific and thus non-portable, because none of the standard C++ output streams understands UTF-8.