I'm curious if I can compile
int map [] = { [ /*(unsigned char)*/ 'a' ]=1 };
regardless of platform or if it's better to cast character constants to unsigned char
prior to using them as indices.
A "character constant" is formed by enclosing a single character from the representable character set within single quotation marks (' '). Character constants are used to represent characters in the execution character set.
A character constant is one or more characters enclosed in single quotes, such as 'A' , '+' , or '\n' . In the mikroC PRO for PIC, single-character constants are of the unsigned int type.
Your specific code is safe.
'a' is an integer character constant. The language specifies of these that
"An integer character constant has type int. The value of an integer character constant containing a single character that maps to a single-byte execution character is the numerical value of the representation of the mapped character interpreted as an integer. [...] If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type char whose value is that of the single character or escape sequence is converted to type int."
(C2011, paragraph 6.4.4.4/10)
It furthermore specifies that
"If a member of the basic execution character set is stored in a char object, its value is guaranteed to be nonnegative."
(C2011, paragraph 6.2.5/3)
and it requires of every implementation that both the basic source and basic execution character sets contain, among other characters, the lowercase Latin letters, including 'a'. (C2011, paragraph 5.2.1/3)
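As a minimal sketch of the guarantee above, the snippet below states it as a compile-time check next to the array from the question. The helper name lookup_basic is made up for illustration, and the check assumes a C11 compiler (static_assert from <assert.h>).

```c
#include <assert.h>   /* static_assert (C11) */

/* 'a' is a member of the basic execution character set, so its value
   must be nonnegative; this should hold on every conforming platform. */
static_assert('a' >= 0, "'a' must be nonnegative");

/* The array from the question, without the cast. The designated
   initializer makes the array exactly 'a' + 1 elements long. */
static const int map[] = { ['a'] = 1 };

/* Hypothetical helper: bounds-checked lookup into map.
   Returns 0 for any index outside the array. */
static int lookup_basic(int c)
{
    const int len = (int)(sizeof map / sizeof map[0]);
    return (c >= 0 && c < len) ? map[c] : 0;
}
```

Note that the array only extends up to index 'a', so anything larger has to be range-checked before indexing, as the helper does.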
You should take care, however: an integer character constant for a character that is not a member of the basic execution character set (including a multibyte character), or a multi-character integer character constant, need not be nonnegative. Some of those could, in principle, be negative even on machines where default char is an unsigned type.
Moreover, again considering multibyte characters, the cast to unsigned char is not necessarily safe either, in that you could produce collisions that way. To be sure to avoid collisions, you would need to convert to unsigned int, but that could produce much larger arrays than you expect. If you stick to the basic character sets then you're ok. If you stick to single-byte characters then you're ok with a cast. If you must accommodate multibyte characters then for portability, you should probably choose a different approach.
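For the single-byte case, the cast approach might look like the sketch below. The names byte_map and count_marked are invented for this example, and the table is sized for every unsigned char value so that any cast byte is a valid index.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* UCHAR_MAX, INT_MAX */

/* Assumption: UCHAR_MAX + 1 does not overflow int. True on ordinary
   platforms; checked here rather than taken for granted. */
static_assert(UCHAR_MAX < INT_MAX, "UCHAR_MAX + 1 would overflow int");

/* One slot per possible unsigned char value. The casts make the
   designators nonnegative even where plain char is signed. */
static const int byte_map[UCHAR_MAX + 1] = {
    [(unsigned char)'a']    = 1,
    [(unsigned char)'\xFF'] = 2,
};

/* Hypothetical helper: counts the bytes of s that have a nonzero
   entry in byte_map. Each char is cast before it is used as an index. */
static int count_marked(const char *s)
{
    int n = 0;
    for (; *s != '\0'; ++s) {
        if (byte_map[(unsigned char)*s] != 0)
            ++n;
    }
    return n;
}
```

The same cast has to be applied when reading from the table as when building it; indexing with a plain char that happens to be negative would be out of bounds.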
A character constant has a nonnegative value of type int if it is based on a member of the basic execution character set.
Since a is in that basic character set, we know that 'a' is required to be nonnegative.
On the other hand, for example, '\xFF' might not be positive. The FF value will be regarded as the bit pattern for a char †, which could be signed, giving us a -1 due to two's complement. Similar reasoning will apply if instead of a numeric escape, we use a character that corresponds to a negative value of type char, like characters corresponding to the 0x80-0xFF byte range on 8-bit systems.
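A rough way to see this on a given machine is sketched below. It assumes 8-bit char and two's complement (the first assumption is enforced with a static_assert), and the function names are made up for the example.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT, CHAR_MIN */

/* Assumption for this sketch only: char is exactly 8 bits wide. */
static_assert(CHAR_BIT == 8, "sketch assumes 8-bit char");

/* The value of the constant itself, which has type int. */
static int actual_xff(void)
{
    return '\xFF';
}

/* What the constant should be under the assumptions above:
   -1 where plain char is signed, 255 where it is unsigned. */
static int expected_xff(void)
{
    return (CHAR_MIN < 0) ? -1 : 255;
}
```

Either way, converting the constant to unsigned char yields 255, which is why the cast makes such a value usable as an index.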
It was like this in ANSI C89 and C90, where I'm relying on my memory; but the requirements persist through newer drafts and standards. In the n1570 draft, we have these items:
6.4.4.4 Character Constants, paragraph 10: "If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type char whose value is that of the single character or escape sequence is converted to type int."
6.2.5 Types, paragraph 3: "If a member of the basic execution character set is stored in a char object, its value is guaranteed to be nonnegative."
A character constant is not a "char object", but the requirements in 6.4.4.4 specify that the value of a character constant is determined using the char representation: "... one that results when an object with type char whose value ...".
† The numeric escape sequences for unprefixed character constants and for those prefixed with L have an associated "corresponding type", which is unsigned, and their values are required to be in that type's range (6.4.4.4, paragraph 9). The idea is that a character value is specified as an unsigned value, which gives its bit-wise representation, and that representation is then interpreted as char. This intent is also conveyed in Example 2 (6.4.4.4, paragraph 13).