
Is int x = 'fooo' a compiler extension?

Tags: c++, c, gcc

I have seen and used C++ code like the following:

int myFourcc = 'ABCD';

It works in recent versions of GCC, though I am not sure how far back support goes. Is this feature in the standard? What is it called?

I have had trouble searching the web for it...

EDIT:

I found this info as well, for future observers:

From the GCC documentation:

The compiler evaluates a multi-character character constant a character at a time, shifting the previous value left by the number of bits per target character, and then or-ing in the bit-pattern of the new character truncated to the width of a target character. The final bit-pattern is given type int, and is therefore signed, regardless of whether single characters are signed or not (a slight change from versions 3.1 and earlier of GCC). If there are more characters in the constant than would fit in the target int the compiler issues a warning, and the excess leading characters are ignored.

For example, 'ab' for a target with an 8-bit char would be interpreted as (int) ((unsigned char) 'a' * 256 + (unsigned char) 'b'), and '\234a' as (int) ((unsigned char) '\234' * 256 + (unsigned char) 'a').

jw. asked Dec 01 '22 07:12


2 Answers

See section 6.4.4.4, paragraph 10 of the C99 standard:

An integer character constant has type int. The value of an integer character constant containing a single character that maps to a single-byte execution character is the numerical value of the representation of the mapped character interpreted as an integer. The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined. If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type char whose value is that of the single character or escape sequence is converted to type int.

Recall that implementation-defined means that the implementation (in this case, the C compiler) is free to choose the behavior, but it must document that choice.

Most compilers will convert it to an integral constant formed by concatenating the octets of the individual characters, but the standard does not fix the order: the first character may end up in the most significant or the least significant byte, depending on the implementation.

Therefore, portable code should not use multi-character constants and should instead use plain integral constants. Instead of 'abcd', whose byte order is implementation-defined, use either 0x61626364 or 0x64636261, which state the order explicitly (first character in the most significant or the least significant byte respectively, assuming ASCII).

Adam Rosenfield answered Dec 04 '22 11:12


"Note that according to the C standard there is no limit on the length of a character constant, but the value of a character constant that contains more than one character is implementation-defined. Recent versions of GCC provide support for multi-byte character constants, and instead of an error, the warnings 'multiple-character character constant' or 'character constant too long for its type' are generated in this case."

chaos answered Dec 04 '22 13:12