In C++, sizeof('a') == sizeof(char) == 1. This makes intuitive sense, since 'a' is a character literal, and sizeof(char) == 1 as defined by the standard.
In C, however, sizeof('a') == sizeof(int). That is, it appears that C character literals are actually integers. Does anyone know why? I can find plenty of mentions of this C quirk but no explanation for why it exists.
A character literal contains a sequence of characters or escape sequences enclosed in single quotation mark symbols, for example 'c' . A character literal may be prefixed with the letter L, for example L'c' . A character literal without the L prefix is an ordinary character literal or a narrow character literal.
Character literals are enclosed in single quotes, for example 'x', and can be stored in a simple variable of char type. A character literal can be a plain character (such as 'x'), an escape sequence (such as '\t'), or a universal character name (such as '\u02C0').
In C++ the type of an ordinary character literal is char. In C the type of a character literal is int. So in C, sizeof('a') equals sizeof(int), which is 4 on a typical platform with a 32-bit int and CHAR_BIT equal to 8. sizeof(char), however, is 1 in both C and C++.
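The C++ half of this can be checked at compile time. The following is only a minimal sketch; the helper name literal_size is made up for illustration. The C half cannot be shown in the same file, because the identical expression compiled as C yields sizeof(int) instead.

```cpp
#include <cstddef>
#include <type_traits>

// In C++, an ordinary character literal has type char.
static_assert(std::is_same<decltype('a'), char>::value,
              "'a' is a char in C++");

// sizeof(char) is 1 by definition, so sizeof('a') is 1 as well.
static_assert(sizeof(char) == 1, "sizeof(char) is always 1");
static_assert(sizeof('a') == sizeof(char), "literal has the size of a char");

// Hypothetical helper: returns the size of the literal 'a' at run time.
// Compiled as C, the same expression would give sizeof(int) instead.
std::size_t literal_size() {
    return sizeof('a');
}
```

If the file compiles as C++, the static_assert lines have already confirmed the claim; literal_size() is there only so the value can also be inspected at run time.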
An int is required to be a signed type of at least 16 bits, able to represent every value between -32767 and 32767. On a platform with 8-bit chars, this means an int can hold every value of a char, whether the latter is signed or unsigned.
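One way to check this range argument on a given platform is sketched below. The helper fits_in_int is a hypothetical name, not a standard facility, and the result for the char types is only expected to be true on common platforms where char is narrower than int.

```cpp
#include <limits>

// Guaranteed by the standard: int covers at least -32767..32767.
static_assert(std::numeric_limits<int>::max() >= 32767, "int is at least 16 bits");
static_assert(std::numeric_limits<int>::min() <= -32767, "int is at least 16 bits");

// Hypothetical helper: true if every value of the integer type T fits in an int.
// Intended for the narrow types discussed here (char, signed char, unsigned char).
template <typename T>
constexpr bool fits_in_int() {
    return std::numeric_limits<T>::min() >= std::numeric_limits<int>::min() &&
           std::numeric_limits<T>::max() <= std::numeric_limits<int>::max();
}
```

Calling fits_in_int<char>(), fits_in_int<signed char>() and fits_in_int<unsigned char>() should each return true on a typical desktop or server target.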
From a discussion on the same subject:
"More specifically the integral promotions. In K&R C it was virtually (?) impossible to use a character value without it being promoted to int first, so making character constant int in the first place eliminated that step. There were and still are multi character constants such as 'abcd' or however many will fit in an int."
The original question is "why?"
The reason is that the definition of a character literal has evolved and changed, while trying to remain backwards compatible with existing code.
In the dark days of early C there were no types at all. By the time I first learnt to program in C, types had been introduced, but functions didn't have prototypes to tell the caller what the argument types were. Instead it was standardised that everything passed as a parameter would either be the size of an int (this included all pointers) or it would be a double.
This meant that when you were writing the function, all the parameters that weren't double were stored on the stack as ints, no matter how you declared them, and the compiler put code in the function to handle this for you.
This made things somewhat inconsistent, so when K&R wrote their famous book, they put in the rule that a character literal would always be promoted to an int in any expression, not just a function parameter.
When the ANSI committee first standardised C, they changed this rule so that a character literal would simply be an int, since this seemed a simpler way of achieving the same thing.
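The promotion rule described here can still be observed today with char variables. The sketch below is written in C++ and uses made-up helper names (promoted_size, unpromoted_size). It assumes a platform where int can represent every char value, which is the usual case.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper: size of the result of adding two char values.
std::size_t promoted_size(char a, char b) {
    // Both operands undergo integral promotion before the addition,
    // so the expression a + b has type int, not char.
    static_assert(std::is_same<decltype(a + b), int>::value,
                  "char + char yields int");
    return sizeof(a + b);
}

// Hypothetical helper: size of a char object that is not used in arithmetic.
std::size_t unpromoted_size(char a) {
    return sizeof(a);
}
```

With these helpers, promoted_size('a', 'b') gives sizeof(int), while unpromoted_size('a') gives 1. The value becomes an int as soon as it is used in arithmetic, which is the step that typing the literal as int in C skips.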
When C++ was being designed, all functions were required to have full prototypes (C did not require this until C23, although it has long been accepted as good practice). Because of this, it was decided that a character literal could have type char. The advantage of this in C++ is that a function with a char parameter and a function with an int parameter have different signatures, so they can be overloaded. C has no function overloading, so this advantage does not apply there.
This is why they are different. Evolution...
I don't know the specific reasons why a character literal in C is of type int. But in C++, there is a good reason not to go that way. Consider this:
void print(int);
void print(char);
print('a');
You would expect that the call to print selects the second version taking a char. Having a character literal being an int would make that impossible. Note that in C++ literals having more than one character still have type int, although their value is implementation defined. So, 'ab' has type int, while 'a' has type char.
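To see the overload selection in action, here is a small self-contained sketch. The function which_overload is a hypothetical stand-in for the print(int) / print(char) pair above; it returns a label instead of printing so the chosen overload can be checked.

```cpp
#include <string>

// Hypothetical overloads mirroring print(int) and print(char).
// Each returns the name of the parameter type that was selected.
std::string which_overload(int)  { return "int"; }
std::string which_overload(char) { return "char"; }
```

Calling which_overload('a') selects the char version, because 'a' has type char in C++ and is an exact match. Calling which_overload(97) or which_overload('a' + 0) selects the int version, since an integer literal and a promoted arithmetic result both have type int.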