I have the following piece of code:
#include <iostream>
#include <string>

std::string eps("ε");

int main()
{
    std::cout << eps << '\n';
    return 0;
}
Somehow it compiles with g++ and clang on Ubuntu, and even prints the right character, ε.
I also have almost the same piece of code, which happily reads ε with cin into a std::string.
By the way, eps.size() is 2.
My question is: how does this work? How can we put a Unicode character into a std::string?
My guess is that the operating system handles all the Unicode work, but I'm not sure.
EDIT
As for output, I understand that the terminal is responsible for showing me the right character (ε in this case).
But what about input? cin reads characters up to ' ' or any other whitespace character (byte by byte, as I understand it). So if I take Ƞ, whose second byte is 32 (' '), it should read only the first byte and then stop. But it reads the whole Ƞ. How?
@MSalters: std::string can hold 100% of all Unicode characters, even if CHAR_BIT is 8. It depends on the encoding used for the std::string, which may be UTF-8 at the system level (as almost everywhere except Windows) or at your application level.
These are the two classes that you will actually use. std::string is used for standard ASCII and UTF-8 strings. std::wstring is used for wide-character strings, whose encoding depends on the platform (UTF-16 on Windows, typically UTF-32 on Linux and macOS). Since C++11 there are also std::u16string and std::u32string for UTF-16 and UTF-32 code units.
On macOS specifically, std::string is UTF-8 (8-bit code units), and std::wstring is UTF-32 (32-bit code units); note that the size of wchar_t is platform-dependent. For both, size tracks the number of code units instead of the number of code points, or grapheme clusters.
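To make the code unit versus code point distinction concrete, here is a minimal sketch that counts code points in a std::string. It assumes the string holds valid UTF-8, and the helper name count_code_points is made up for this illustration. It relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a string that is
// ASSUMED to contain valid UTF-8. No validation is performed.
// Continuation bytes look like 10xxxxxx, so every byte that does not
// match that pattern is the first byte of a code point.
std::size_t count_code_points(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For std::string eps("\xCE\xB5"), which spells out the two UTF-8 bytes of ε, eps.size() is 2 while count_code_points(eps) is 1. Grapheme clusters are a separate matter and are not handled by this sketch.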
There is no functional difference between string and std::string: once a using-declaration or using namespace std brings the name into scope, they are the same type.
The most likely reason is that everything is getting encoded in UTF-8, as it does on my system:
$ xxd test.cpp
...
0000020: 2065 7073 2822 ceb5 2229 3b0a 0a69 6e74   eps("..");..int
                        ^^^^ ε in UTF-8                 ^^ TWO bytes!
...
$ g++ -o test.out test.cpp
$ ./test.out
ε
$ ./test.out | xxd
0000000: ceb5 0a
         ^^^^
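The same reasoning likely explains the input side. Ƞ is U+0220, so a 0x20 byte appears only in its UTF-16 form; in UTF-8 it is the two bytes C8 A0, and every byte of a multi-byte UTF-8 sequence is 0x80 or above, so none of them can be an ASCII space. Below is a rough sketch to check this, assuming UTF-8 input and the default "C" locale. The helper names has_space_byte and read_token are invented for the illustration, and a std::istringstream stands in for cin.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if any byte of `s` counts as
// whitespace for std::isspace in the current C locale (the "C" locale
// unless the program changes it).
bool has_space_byte(const std::string& s)
{
    for (unsigned char byte : s) {
        if (std::isspace(byte)) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: extracts one whitespace-delimited token with
// operator>>, the same extraction that `std::cin >> word` performs.
std::string read_token(const std::string& input)
{
    std::istringstream in(input);
    std::string word;
    in >> word;
    return word;
}
```

With the UTF-8 bytes of Ƞ written out as "\xC8\xA0", has_space_byte reports no whitespace, and read_token("\xC8\xA0 rest") returns both bytes of Ƞ and stops only at the real space that follows.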