Why don't the C or C++ standards explicitly define char as signed or unsigned?

    #include <stdbool.h>   /* needed for bool in C; not needed in C++ */

    int main(void) {
        char c = 0xff;
        bool b = 0xff == c;
        // Under most C/C++ compilers' default options, b is false!
    }

Neither the C nor the C++ standard specifies whether char is signed or unsigned; it is implementation-defined.

Why don't the C and C++ standards explicitly define char as signed or unsigned, to avoid dangerous misuses like the one in the code above?

asked Mar 20 '13 by xmllmx

People also ask

Is a char signed or unsigned in C?

The C and C++ standards allow the character type char to be signed or unsigned, depending on the platform and compiler. Most systems, including x86 GNU/Linux and Microsoft Windows, use signed char, but those based on PowerPC and ARM processors typically use unsigned char.
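A quick way to see which choice a given compiler made is to test CHAR_MIN from <limits.h>; this is a minimal sketch, not part of the original question:

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        /* CHAR_MIN is 0 when plain char is unsigned and negative when it is signed. */
    #if CHAR_MIN < 0
        puts("plain char is signed on this implementation");
    #else
        puts("plain char is unsigned on this implementation");
    #endif
        return 0;
    }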

Why do we need signed and unsigned char in C?

Both signed char and unsigned char are used to store a single character; the variable holds the character's code value, so storing 'A' actually stores 65 (its ASCII value). On implementations where plain char is already signed, the signed keyword adds nothing to the range, although plain char remains a distinct type.

Is char default signed or unsigned?

In practice it is implementation-defined. The book "Complete Reference of C" states that char is unsigned by default, but the standard leaves the choice to the implementation.

Why there is signed and unsigned char?

An unsigned type can only represent positive values (and zero), whereas a signed type can represent both positive and negative values (and zero). In the case of an 8-bit char this means that an unsigned char variable can hold a value in the range 0 to 255, while a signed char has the range -128 to 127.
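For concrete numbers on a given implementation, the limits live in <limits.h>; a small sketch (the values in the comments assume CHAR_BIT == 8):

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        printf("signed char:   %d .. %d\n", SCHAR_MIN, SCHAR_MAX);   /* typically -128 .. 127 */
        printf("unsigned char: %d .. %d\n", 0, UCHAR_MAX);           /* typically    0 .. 255 */
        printf("plain char:    %d .. %d\n", CHAR_MIN, CHAR_MAX);     /* matches one of the two */
        return 0;
    }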


1 Answer

Historical reasons, mostly.

Expressions of type char are promoted to int in most contexts (because a lot of CPUs don't have 8-bit arithmetic operations). On some systems, sign extension is the most efficient way to do this, which argues for making plain char signed.
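A short sketch of that promotion (the exact values assume CHAR_BIT == 8 and a two's complement representation):

    #include <stdio.h>

    int main(void) {
        char c = '\xff';   /* all bits set in an 8-bit char */
        int  i = c;        /* the char value is converted to int, as in a promotion */
        /* If plain char is signed, the conversion sign-extends: i == -1.
           If plain char is unsigned, it zero-extends:           i == 255. */
        printf("%d\n", i);
        return 0;
    }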

On the other hand, the EBCDIC character set has basic characters with the high-order bit set (i.e., characters with values of 128 or greater); on EBCDIC platforms, char pretty much has to be unsigned.

The ANSI C Rationale (for the 1989 standard) doesn't have a lot to say on the subject; section 3.1.2.5 says:

Three types of char are specified: signed, plain, and unsigned. A plain char may be represented as either signed or unsigned, depending upon the implementation, as in prior practice. The type signed char was introduced to make available a one-byte signed integer type on those systems which implement plain char as unsigned. For reasons of symmetry, the keyword signed is allowed as part of the type name of other integral types.

Going back even further, an early version of the C Reference Manual from 1975 says:

A char object may be used anywhere an int may be. In all cases the char is converted to an int by propagating its sign through the upper 8 bits of the resultant integer. This is consistent with the two’s complement representation used for both characters and integers. (However, the sign-propagation feature disappears in other implementations.)

This description is more implementation-specific than what we see in later documents, but it does acknowledge that char may be either signed or unsigned. On the "other implementations" on which "the sign-propagation disappears", the promotion of a char object to int would have zero-extended the 8-bit representation, essentially treating it as an 8-bit unsigned quantity. (The language didn't yet have the signed or unsigned keyword.)

C's immediate predecessor was a language called B. B was a typeless language, so the question of char being signed or unsigned did not apply. For more information about the early history of C, see the late Dennis Ritchie's home page, now moved here.

As for what's happening in your code (applying modern C rules):

    char c = 0xff;
    bool b = 0xff == c;

If plain char is unsigned, then the initialization of c sets it to (char)0xff, which compares equal to 0xff in the second line. But if plain char is signed, then 0xff (an expression of type int) is converted to char -- but since 0xff exceeds CHAR_MAX (assuming CHAR_BIT==8), the result is implementation-defined. In most implementations, the result is -1. In the comparison 0xff == c, both operands are converted to int, making it equivalent to 0xff == -1, or 255 == -1, which is of course false.
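Putting the whole thing into a runnable program makes both branches visible; this is a sketch, and with GCC or Clang you can flip the choice with -fsigned-char / -funsigned-char to see the other behavior:

    #include <limits.h>
    #include <stdbool.h>
    #include <stdio.h>

    int main(void) {
        char c = 0xff;          /* implementation-defined result if plain char is signed */
        bool b = 0xff == c;     /* both operands are converted to int before comparing */

        printf("plain char is %s\n", CHAR_MIN < 0 ? "signed" : "unsigned");
        printf("(int)c = %d\n", (int)c);                 /* typically -1 if signed, 255 if unsigned */
        printf("0xff == c is %s\n", b ? "true" : "false");
        return 0;
    }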

Another important thing to note is that unsigned char, signed char, and (plain) char are three distinct types. char has the same representation as either unsigned char or signed char; it's implementation-defined which one it is. (On the other hand, signed int and int are two names for the same type; unsigned int is a distinct type. (Except that, just to add to the frivolity, it's implementation-defined whether a bit field declared as plain int is signed or unsigned.))
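One way to observe that they really are three distinct types is C11's _Generic selection, which dispatches on the static type (a sketch; requires a C11 compiler):

    #include <stdio.h>

    /* Each association below names a different type, which is only legal
       because char, signed char, and unsigned char are three distinct types. */
    #define KIND(x) _Generic((x),        \
        char:          "plain char",     \
        signed char:   "signed char",    \
        unsigned char: "unsigned char")

    int main(void) {
        char          pc = 'a';
        signed char   sc = 'a';
        unsigned char uc = 'a';
        printf("%s, %s, %s\n", KIND(pc), KIND(sc), KIND(uc));   /* plain char, signed char, unsigned char */
        return 0;
    }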

Yes, it's all a bit of a mess, and I'm sure it would have been defined differently if C were being designed from scratch today. But each revision of the C language has had to avoid breaking (too much) existing code and, to a lesser extent, existing implementations.

answered Sep 20 '22 by Keith Thompson