
What does POSIX mean by "two's complement representation" in stdint.h?

It's a well-known fact that signed integer overflow invokes undefined behavior, and that bitwise manipulation of signed integers is unreliable at best. Thus, I found this line in the POSIX standard rather curious:

The typedef name intN_t designates a signed integer type with width N, no padding bits, and a two's-complement representation. Thus, int8_t denotes a signed integer type with a width of exactly 8 bits.

This is a rather nebulous statement, which could mean any combination of these things:

  1. The range of intN_t is from INTN_MIN to INTN_MAX.
  2. sizeof(intN_t) == N/8
  3. Bitwise operations on intN_t behave as expected for a two's complement representation. -1 ^ x == ~x for every x, after inserting intN_t casts everywhere.
  4. intN_t cannot have trap representations (and an optimizing compiler must not exploit possible traps).
  5. Overflow of an intN_t variable is defined behavior (and wraps from INTN_MAX to INTN_MIN).

(1) and (2) both seem pretty obviously true to me, based on the rest of the document. (1) is explicitly specified by the definition of INTN_MIN/MAX. (2) is implied by "no padding bits."

Which of (3), (4), and (5) are required by POSIX, if any?

Kevin asked Jan 04 '23 15:01



1 Answer

TL;DR

Statements 1, 3 and 4 are true on any C99 or C11 compiler where intN_t exists. Statement 2 is true on any C11 compiler where int8_t is present, because the presence of int8_t implies that CHAR_BIT is 8. Statement 5 is specifically not required by C: the behaviour on signed integer overflow is undefined.

POSIX restricts the allowed C implementations so that CHAR_BIT must be 8 and integer representation is two's complement. Therefore a compliant C99/C11 compiler on a POSIX platform must have int8_t, which makes statements 1, 2, 3 and 4 true on POSIX. Since POSIX does not say anything about signed integer overflow, it remains undefined, therefore 5 is false.


The quoted sentence is taken verbatim from the C11 (and C99) standard. C11 7.20.1.1p1:

The typedef name intN_t designates a signed integer type with width N, no padding bits, and a two's complement representation. Thus, int8_t denotes such a signed integer type with a width of exactly 8 bits.

int8_t is optional in C, so the mere presence of this fragment in the standard doesn't even require 2's complement representation. C11 7.20.1.1p3:

These types are optional. However, if an implementation provides integer types with widths of 8, 16, 32, or 64 bits, no padding bits, and (for the signed types) that have a two's complement representation, it shall define the corresponding typedef names.

Of your original statements,

  1. is of course true, but it does not follow from two's complement representation: int ranges from INT_MIN to INT_MAX on any one's complement architecture too. What does follow from two's complement is that INTN_MIN has the value -2^(N-1).

  2. does not follow from two's complement representation. sizeof(intN_t) is N / CHAR_BIT. However, POSIX requires that CHAR_BIT is 8, so sizeof(intN_t) is indeed N / 8.

  3. This is the only thing that follows from two's complement representation.

  4. does not follow from two's complement representation alone, but from the combination of two's complement and the absence of padding bits.

  5. does not follow from a two's complement representation, and is required by neither C nor POSIX.
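Statements 1 to 3 can be spot-checked in code. The following is only a sketch, assuming a C11 compiler (for static_assert) on a platform that provides int8_t and int32_t; the helper name xor_matches_not is made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Statement 1: the limits are exactly -2^(N-1) and 2^(N-1) - 1. */
static_assert(INT8_MIN == -128 && INT8_MAX == 127, "int8_t range");
static_assert(INT32_MIN == -2147483647 - 1, "int32_t minimum");

/* Statement 2: no padding bits, so the size is N / CHAR_BIT. */
static_assert(sizeof(int8_t) == 1, "int8_t occupies one byte");
static_assert(sizeof(int32_t) * CHAR_BIT == 32, "int32_t has exactly 32 bits");

/* Statement 3: XOR with -1 (all bits set) flips every bit, like ~x.
   Hypothetical helper, not part of any standard header. */
int xor_matches_not(int32_t x)
{
    return (int32_t)(-1 ^ x) == (int32_t)~x;
}
```

The static assertions fail at compile time if the platform disagrees; the helper returns nonzero for every int32_t value on a two's complement implementation.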


The type int8_t is not specified by POSIX itself but by C99 and C11, which POSIX adopts and augments. POSIX adds two restrictions:

  * CHAR_BIT must be exactly 8, not greater, even though the C programming language allows larger values
  * the one's complement and sign-and-magnitude representations of integers are not allowed, even though the C programming language would allow them

C99 and C11 specify that, if present, the types intN_t must have exactly N bits, no padding bits, and a two's complement representation. If int8_t exists, sizeof(int8_t) must be 1, because CHAR_BIT is then equal to 8 and int8_t is a synonym for signed char.

There will not be any trap representations of intN_t, but that does not mean that objects with indeterminate values must hold the same value every time they are read, or that passing such values to library functions has defined behaviour. Consider the following fragment:

int32_t *foo = malloc(sizeof(int32_t));
printf("%" PRId32 "\n", *foo);  /* reads an indeterminate value */
printf("%" PRId32 "\n", *foo);  /* may print something different */
free(foo);

The compiler need not even call malloc; it could compile this to the equivalent of puts("42\n666"); - even though there are no trap values of type int32_t. This is because malloc:

[...] allocates space for an object whose size is specified by size and whose value is indeterminate.

And indeterminate really means unstable and indeterminable.
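For contrast, giving the object a determinate value before reading it removes the problem. A minimal sketch, assuming a hosted implementation with int32_t; the function name print_twice is hypothetical. It uses calloc, which sets all bits to zero, and all-bits-zero is the value 0 for an integer type without padding bits.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: prints the same determinate value twice.
   Returns 0 on success, -1 if the allocation fails. */
int print_twice(void)
{
    int32_t *foo = calloc(1, sizeof *foo);  /* zero-initialised storage */
    if (foo == NULL)
        return -1;
    printf("%" PRId32 "\n", *foo);          /* value is 0 */
    printf("%" PRId32 "\n", *foo);          /* same value again */
    free(foo);
    return 0;
}
```

Assigning to *foo after malloc would work equally well; the point is only that the object is written before it is read.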


Behaviour on signed integer overflow always remains undefined.
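Because the overflow itself is undefined, portable code has to test for it before doing the arithmetic rather than inspect the result afterwards. A minimal sketch of one common approach; checked_add_i32 is a hypothetical helper, not something defined by C or POSIX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores a + b in *out and returns true if the sum
   fits in int32_t; otherwise returns false and leaves *out untouched.
   The comparisons are arranged so that they cannot overflow themselves. */
bool checked_add_i32(int32_t a, int32_t b, int32_t *out)
{
    assert(out != NULL);
    if ((b > 0 && a > INT32_MAX - b) ||
        (b < 0 && a < INT32_MIN - b))
        return false;
    *out = a + b;
    return true;
}
```

A caller would check the return value and handle the false case explicitly, instead of relying on wrap-around from INT32_MAX to INT32_MIN.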
