Today, I stumbled over something like this:
#define FOO 2u
#if (FOO == 2)
unsigned int foo = FOO;
#endif
Regardless of why the code is as it is (let's not question the why), I was wondering to which degree the preprocessor can even handle integer literal suffixes. I was actually surprised that it works at all.
After doing some experiments with GCC and C99 with this code ...
#include <stdio.h>
int main()
{
#if (1u == 1)
printf("1u == 1\n");
#endif
#if (1u + 1l == 2ll)
printf("1u + 1l == 2ll\n");
#endif
#if (1ull - 2u == -1)
printf("1ull - 2u == -1\n");
#endif
#if (1u - 2u == 0xFFFFFFFFFFFFFFFF)
printf("1u - 2u == 0xFFFFFFFFFFFFFFFF\n");
#endif
#if (-1 == 0xFFFFFFFFFFFFFFFF)
printf("-1 == 0xFFFFFFFFFFFFFFFF\n");
#endif
#if (-1l == 0xFFFFFFFFFFFFFFFF)
printf("-1l == 0xFFFFFFFFFFFFFFFF\n");
#endif
#if (-1ll == 0xFFFFFFFFFFFFFFFF)
printf("-1ll == 0xFFFFFFFFFFFFFFFF\n");
#endif
}
... which just prints all the statements:
1u == 1
1u + 1l == 2ll
1ull - 2u == -1
1u - 2u == 0xFFFFFFFFFFFFFFFF
-1 == 0xFFFFFFFFFFFFFFFF
-1l == 0xFFFFFFFFFFFFFFFF
-1ll == 0xFFFFFFFFFFFFFFFF
... I guess the preprocessor simply ignores integer literal suffixes altogether and probably always does arithmetic and comparisons in the native integer size, in this case 64 bit?
I wanted to find out by myself and checked out Wikipedia and the C standard (working paper). I found information about integer suffixes and information about the preprocessor, but none about the combination of these. Obviously, I have also googled it but didn't get any useful results.
I have seen this Stack Overflow question that clarifies where it should be specified, but I still couldn't find an answer to my questions.
- To which degree does the preprocessor regard integer literal suffixes? Or does it just ignore them?
The type suffixes of integer constants are not inherently meaningful to the preprocessor, but they are an inherent part of the corresponding preprocessing tokens, not separate. The standard has this to say about them:
A preprocessing number begins with a digit optionally preceded by a period (.) and may be followed by valid identifier characters and the character sequences e+, e-, E+, E-, p+, p-, P+, or P-.
Preprocessing number tokens lexically include all floating and integer constant tokens.
(C11 6.4.8/2-3; emphasis added)
For the most part, the preprocessor doesn't treat preprocessing tokens of this type any differently than any other. The exception is in the controlling expressions of #if directives, which are evaluated by performing macro expansion, replacing identifiers with 0, and then converting each preprocessing token into a token before evaluating the result according to C rules. Converting to tokens accounts for the type suffixes, yielding bona fide integer constants.
This does not necessarily produce results identical to those you would get from runtime evaluation of the same expressions, however, because
For the purposes of this token conversion and evaluation, all signed integer types and all unsigned integer types act as if they have the same representation as, respectively, the types intmax_t and uintmax_t.
(C2011, 6.10.1/4)
You go on to ask
- Are there any dependencies or different behaviors with different environments, e.g. different compilers, C vs. C++, 32 bit vs. 64 bit machine, etc.? I.e., what does the preprocessor's behavior depend on?
The only direct dependency is the implementation's definitions of intmax_t and uintmax_t. These are not directly tied to language choice or machine architecture, though there may be correlations with those.
- Where is all that specified/documented?
In the respective languages' language specifications, of course. I've cited two of the more relevant sections of the C11 specification, and linked you to a late draft of that standard. (The current C is C18, but it hasn't changed in any of these regards.)
C 2018 6.10.1 deals with conditional inclusion (#if and related statements and the defined operator). Paragraph 1 says:
The expression that controls conditional inclusion shall be an integer constant expression except that: identifiers (including those lexically identical to keywords) are interpreted as described below; and it may contain unary operator expressions of the form defined identifier or defined ( identifier ) …
Integer constant expression is defined in 6.6 6:
An integer constant expression shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, _Alignof expressions, and floating constants that are the immediate operands of casts. Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the sizeof or _Alignof operator.
That paragraph is for C generally, not just the preprocessor. So the expressions that can appear in #if statements are the same as the integer constant expressions that can appear generally in C. However, as stated in the quote above, sizeof and _Alignof are just identifiers; they are not recognized as C operators. In particular, 6.10.1 4 tells us:
… After all replacements due to macro expansion and the defined unary operator have been performed, all remaining identifiers (including those lexically identical to keywords) are replaced with the pp-number 0, …
So, where sizeof or _Alignof appear in a #if expression, it becomes 0. Thus, a #if expression can only have operands that are constants and defined expressions.
Paragraph 4 goes on to say:
… The resulting tokens compose the controlling constant expression which is evaluated according to the rules of 6.6. For the purposes of this token conversion and evaluation, all signed integer types and all unsigned integer types act as if they have the same representation as, respectively, the types intmax_t and uintmax_t defined in the header <stdint.h>. …
6.6 is the section for constant expressions.
So, the compiler will accept integer suffixes in #if expressions, and that does not depend on the C implementation (for the suffixes required in the core C language; implementations could allow extensions). However, all the arithmetic will be performed using intmax_t or uintmax_t, and those do depend on the implementation. If your expressions do not depend on the width of integers above the minimum required¹, they should be evaluated the same in any C implementation.
Additionally, paragraph 4 goes on to say there may be some variations with character constants and values, which I omit here as it is not relevant to this question.
¹ intmax_t designates a signed type capable of representing any value of any signed integer type (7.20.1.5 1), and long long int is a signed type that must be at least 64 bits (5.2.4.2.1 1), so any conforming C implementation must provide 64-bit integer arithmetic in the preprocessor.