Why do character arrays accept non-ASCII characters in C++?

I want to be able to use Chinese characters in my C++ program, so I need some type that can hold characters beyond the ASCII range.

However, I tried to run the following code, and it worked.

    #include <iostream>

    int main() {
      char snet[4];
      snet[0] = '你';
      snet[1] = '爱';
      snet[2] = '我';
      snet[3] = '\0'; // terminate the array so that printing it as a C-string is well defined
      std::cout << snet << std::endl;
      int conv = static_cast<int>(snet[0]);
      std::cout << conv << std::endl; // -96
    }

This doesn't make sense: sizeof(char) in C++ evaluates to 1 with the g++ compiler, yet Chinese characters cannot be expressed in a single byte.

Why are the Chinese characters here being allowed to be housed in a char type?

What type should be used to house Chinese characters or non-ASCII characters in C++?

Josh Weinstein asked Jan 12 '18 06:01


1 Answer

When you compile the code with the -Wall flag, you will see warnings like:

    warning: overflow in implicit constant conversion [-Woverflow]
     snet[2] = '我';

    warning: multi-character character constant [-Wmultichar]
     snet[1] = '爱';

Visual C++ in Debug mode gives the following warning:

    c:\users\you\temp.cpp(9): warning C4566: character represented by universal-character-name '\u4F60' cannot be represented in the current code page (1252)

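What is happening under the hood is that each Chinese character occupies more than one byte (three bytes in a UTF-8 source file), so g++ treats the literal as a multi-character constant of type int, which is then implicitly converted to a char. That conversion overflows and keeps only the lowest byte, which is why you see a negative value, or something weird, when you print it to the console. The -96 in your output is 0xA0, the last UTF-8 byte of '你', read as a signed char.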

Why are the Chinese characters here being allowed to be housed in a char type?

You can, but you shouldn't, just as you can write char c = 1000000; and still have it compile.

What type should be used to house Chinese characters or non-ASCII characters in C++?

If you want to store Chinese characters and you can use C++11, go for UTF-8 encoding with std::string.

    std::string msg = u8"你爱我";

FrankS101 answered Sep 23 '22 13:09