I want to be able to use Chinese characters in my C++ program, so I need a type that can hold characters beyond the ASCII range.
However, I tried to run the following code, and it worked.
#include <iostream>
int main() {
    char snet[4];
    snet[0] = '你';
    snet[1] = '爱';
    snet[2] = '我';
    std::cout << snet << std::endl;
    int conv = static_cast<int>(snet[0]);
    std::cout << conv << std::endl; // -96
}
This doesn't make sense: sizeof(char) in C++ evaluates to 1 with the g++ compiler, yet Chinese characters cannot be expressed in a single byte.
Why are the Chinese characters here allowed to be stored in a char type?
What type should be used to house Chinese characters or non-ASCII characters in C++?
Non-ASCII characters are characters that ASCII cannot encode; representing them requires another character set or encoding, such as Unicode or EBCDIC. ASCII is limited to 128 characters and was initially developed for the English language.
A character array stores a sequence of characters (or digits) in contiguous memory, so each element can be reached through its memory address.
In languages where strings are immutable (for example Java's String), the contents of a string cannot be changed, because any change produces a new string, whereas with a char[] you can still overwrite every element with blanks or zeros. Storing a password in a character array therefore mitigates the risk of the password being stolen from memory.
In C, a collection of characters is stored as an array, and C++ supports this as well; such arrays are called C-strings. A C-string is an array of type char terminated with a null character, that is, \0 (the ASCII value of the null character is 0).
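The null terminator matters for the code in the question, which declares char snet[4] but never writes snet[3], so printing the array reads an uninitialized byte. Below is a minimal sketch of a properly terminated C-string; the function name terminated_length is made up for illustration.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: builds a 3-character C-string by hand and returns
// the length that std::strlen reports for it.
std::size_t terminated_length() {
    char word[4];
    word[0] = 'a';
    word[1] = 'b';
    word[2] = 'c';
    word[3] = '\0';  // without this, std::strlen would read past the data
    return std::strlen(word);  // counts bytes up to, not including, '\0'
}
```

Here terminated_length() returns 3, because the terminator itself is not counted.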
When you compile the code with the -Wall flag, you will see warnings like:
warning: overflow in implicit constant conversion [-Woverflow] snet[2] = '我';
warning: multi-character character constant [-Wmultichar] snet[1] = '爱';
Visual C++ in Debug mode gives the following warning:
c:\users\you\temp.cpp(9): warning C4566: character represented by universal-character-name '\u4F60' cannot be represented in the current code page (1252)
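What is happening under the hood is that each Chinese character occupies more than one byte in the source encoding (three bytes in UTF-8), so with g++ the literal becomes a multi-character constant of type int, which is then implicitly converted to a char. That conversion keeps only the low-order byte and discards the rest, which is why you see a negative value (here -96, that is, 0xA0, the last UTF-8 byte of 你) or something weird when you print it to the console.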
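The arithmetic behind the -96 can be reproduced without any non-ASCII source text. The sketch below assumes that the three UTF-8 bytes of 你 (E4 BD A0) are packed into an int as 0xE4BDA0, which is how g++ is assumed to build the multi-character constant here, and that char is a signed 8-bit type; low_byte_as_signed is an illustrative name, not a library function.

```cpp
// Hypothetical helper: keeps only the lowest byte of a wider value and
// reinterprets it as a signed 8-bit value, mimicking the int-to-char
// conversion described above.
// Assumption: two's-complement 8-bit signed char (guaranteed since C++20,
// implementation-defined but near-universal before that).
signed char low_byte_as_signed(int packed) {
    unsigned char low = static_cast<unsigned char>(packed & 0xFF);
    return static_cast<signed char>(low);
}
```

Under those assumptions, low_byte_as_signed(0xE4BDA0) yields -96, since 0xA0 is 160 and 160 - 256 = -96.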
Why are the Chinese characters here being allowed to be housed in a char type?
You can, but you shouldn't, in the same way that you can write char c = 1000000; and have the compiler accept it while the value no longer fits.
What type should be used to house Chinese characters or non-ASCII characters in C++?
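If you want to store Chinese characters and you can use C++11, use UTF-8 encoding with std::string. (From C++20 on, a u8 literal has type const char8_t[], so the line below no longer compiles as written under that standard.)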
std::string msg = u8"你爱我";
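One consequence of storing UTF-8 in a std::string is that size() counts bytes, not characters. The following sketch counts code points by skipping UTF-8 continuation bytes (those of the form 10xxxxxx). The names sample_message and count_code_points are made up for illustration, and the text 你爱我 is spelled with hex escapes so the example does not depend on the source file's encoding.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the UTF-8 bytes of the three characters
// from the question, written as escapes (E4 BD A0, E7 88 B1, E6 88 91).
std::string sample_message() {
    return "\xE4\xBD\xA0\xE7\x88\xB1\xE6\x88\x91";
}

// Hypothetical helper: counts code points in a string assumed to hold
// valid UTF-8, by counting every byte that is not a continuation byte.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {  // not of the form 10xxxxxx
            ++count;
        }
    }
    return count;
}
```

For sample_message(), size() is 9 while count_code_points() is 3, because each of these characters takes three bytes in UTF-8. The sketch does not validate its input, so malformed UTF-8 would give a meaningless count.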