
Are character constants always positive?

I'm curious if I can compile

int map [] = { [ /*(unsigned char)*/ 'a' ]=1 };

regardless of platform or if it's better to cast character constants to unsigned char prior to using them as indices.

PSkocik asked May 30 '19 19:05




2 Answers

I'm curious if I can compile

int map [] = { [ /*(unsigned char)*/ 'a' ]=1 };

regardless of platform or if it's better to cast character constants to unsigned char prior to using them as indices.

Your specific code is safe.

'a' is an integer character constant. Of these, the language standard specifies that

An integer character constant has type int. The value of an integer character constant containing a single character that maps to a single-byte execution character is the numerical value of the representation of the mapped character interpreted as an integer. [...] If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type char whose value is that of the single character or escape sequence is converted to type int.

(C2011, paragraph 6.4.4.4/10)

It furthermore specifies that

If a member of the basic execution character set is stored in a char object, its value is guaranteed to be nonnegative.

(C2011, paragraph 6.2.5/3)

and it requires of every implementation that both the basic source and basic execution character sets contain, among other characters, the lowercase Latin letters, including 'a'. (C2011, paragraph 5.2.1/3)
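As a minimal sketch of what these guarantees allow, the table below uses only letters from the basic execution character set as designated-initializer indices. The helper name map_lookup and the second entry for 'z' are hypothetical additions for illustration, not part of the question's code.

```c
#include <assert.h>
#include <stddef.h>

/* 'a' and 'z' are members of the basic execution character set, so both
   constants are guaranteed to be nonnegative values of type int. */
static_assert('a' >= 0 && 'z' >= 0, "basic characters are nonnegative");

/* Designated initializers: the array length becomes the largest index
   plus one, and every element not named here is zero. */
static const int map[] = { ['a'] = 1, ['z'] = 26 };

/* Hypothetical helper: returns the mapped value, or 0 for any index
   that falls outside the table. */
int map_lookup(int c)
{
    if (c < 0 || (size_t)c >= sizeof map / sizeof map[0])
        return 0;
    return map[c];
}
```

With this table, map_lookup('a') yields 1 and any character that was not listed yields 0. The range check matters because the table only extends to the largest index that was named.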

You should take care, however: an integer character constant for a character that is not a member of the basic execution character set (including a multibyte character), or a multi-character integer character constant, need not be nonnegative. Some of those could, in principle, be negative even on machines where default char is an unsigned type.

Moreover, again considering multibyte characters, the cast to unsigned char is not necessarily safe either, in that you could produce collisions that way. To be sure to avoid collisions, you would need to convert to unsigned int, but that could produce much larger arrays than you expect. If you stick to the basic character sets then you're ok. If you stick to single-byte characters then you're ok with a cast. If you must accommodate multibyte characters then for portability, you should probably choose a different approach.
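For the single-byte case, the cast can be sketched as follows. This assumes input made only of single-byte characters. The names counts, count_bytes and count_of are hypothetical, chosen for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* One slot for every possible unsigned char value. */
static int counts[UCHAR_MAX + 1];

static_assert(sizeof counts / sizeof counts[0] == (size_t)UCHAR_MAX + 1,
              "table covers every unsigned char value");

/* Hypothetical helper: tally each byte of a NUL-terminated string. */
void count_bytes(const char *s)
{
    for (; *s != '\0'; ++s)
        counts[(unsigned char)*s]++;   /* cast keeps the index in 0..UCHAR_MAX */
}

/* Hypothetical helper: read back the tally for one single-byte character. */
int count_of(char c)
{
    return counts[(unsigned char)c];
}
```

Sizing the table as UCHAR_MAX + 1 means the cast index can never fall outside it, whether plain char is signed or unsigned. This does nothing for multibyte characters, which are counted byte by byte here.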

John Bollinger answered Nov 09 '22 14:11


A character constant is a nonnegative value of type int if it is based on a member of the basic execution character set.

Since the letter a is in that basic character set, we know that 'a' is required to be nonnegative.

On the other hand, '\xFF', for example, might be negative. The FF value is regarded as the bit pattern of a char, which could be signed, giving us -1 under two's complement.† Similar reasoning applies if, instead of a numeric escape, we use a character that corresponds to a negative value of type char, such as the characters in the 0x80-0xFF byte range on systems with 8-bit char.
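A small sketch of that difference, assuming 8-bit char and a two's complement representation (both assumptions are mine; the function names are hypothetical):

```c
#include <assert.h>
#include <limits.h>

/* Assumption for this sketch: 8-bit bytes, so 0xFF is the all-ones pattern. */
static_assert(CHAR_BIT == 8, "sketch assumes 8-bit char");

/* Nonzero when plain char is a signed type on this implementation. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* The raw constant: negative where plain char is signed, 255 otherwise. */
int raw_ff(void)
{
    return '\xFF';
}

/* The constant after the cast: 255 on either kind of implementation. */
int cast_ff(void)
{
    return (unsigned char)'\xFF';
}
```

Where plain char is signed, raw_ff() is unusable as an array index, while cast_ff() is a valid index on either kind of implementation.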

This was already the case in ANSI C89 and ISO C90 (here I'm relying on memory), and the requirements persist through newer drafts and standards. In the n1570 draft, we have these items:

  1. 6.4.4.4 Character Constants, paragraph 10: "If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type char whose value is that of the single character or escape sequence is converted to type int."

  2. 6.2.5 Types, paragraph 3: "If a member of the basic execution character set is stored in a char object, its value is guaranteed to be nonnegative."

A character constant is not a "char object", but the requirements in 6.4.4.4 specify that the value of a character constant is determined using the char representation: "... one that results when an object with type char whose value ...".


† The numeric escape sequences in unprefixed character constants and in those prefixed with L have an associated "corresponding type", which is unsigned, and their values are required to be in that type's range (6.4.4.4, paragraph 9). The idea is that a character value is specified as an unsigned value that gives its bit-wise representation, which is then interpreted as char. This intent is also conveyed in Example 2 (6.4.4.4, paragraph 13).

Kaz answered Nov 09 '22 14:11