
Can an integer type with no padding bits have a non-value representation?

I have seen the claim made that, in standard C, reading from an uninitialized object (specifically, an object with indeterminate representation) invokes undefined behavior. This also seems to be how compilers will treat an uninitialized read in practice. For example, the following C program can output 0 0 using GCC 15.1:

#include <stdio.h>

unsigned char calculate(unsigned char *p)
{
    if (!p) return 0;

    unsigned char sum;
    printf("%d ", (int)sum);
    if (!sum) sum += *p;

    return sum;
}

int main() {
    unsigned char one = 1;
    printf("%d\n", calculate(&one));

    return 0;
}

(Godbolt: https://godbolt.org/z/vKEoGadM9)

It is also willing to produce the same result with ,s/unsigned char/int/g, and also with ,s/unsigned char/signed char/g.

If GCC didn't treat an uninitialized read as UB, I would expect the program to print 0 1 or the same nonzero integer twice.

However, I haven't found proof in the standard (I am using n3220 as a reference) that an uninitialized read of an integer with no padding bits (such as unsigned char, and I would also expect int not to have padding bits on most platforms) should be UB.

It may be worth noting that this behavior is listed as the eleventh item of the informative-only J.2 list.

I believe the relevant undefined behavior is defined in 6.2.6.1p5:

Certain object representations do not represent a value of the object type. If such a representation is read by an lvalue expression that does not have character type, the behavior is undefined. [...] Such a representation is called a non-value representation.

The connection to uninitialized automatic variables is found in 6.7.11p11:

If an object that has automatic storage duration is not initialized explicitly, its representation is indeterminate. [...]

and indeterminate representation is defined in 3.23:

object representation that either represents an unspecified value or is a non-value representation

However, I wouldn't expect any integer type that contains no padding bits to have any non-value representations (thus rendering the UB impossible). Both unsigned char and signed char are explicitly called out in 6.2.6.2 as having no padding bits, for example. I would also not expect int or any other non-bit-precise non-bit-field integer to have any padding bits on x86-64 GCC.

Indeed, GCC treats int as if the total number of bits in an int (which must be sizeof(int)*CHAR_BIT per 6.2.6.1p4) equals the number of non-padding bits (which must be INT_WIDTH per 6.2.6.2p2 and 5.2.5.3.2): https://godbolt.org/z/EfE5dacWK.

My questions are:

  1. Is GCC being conformant here?
  2. Can an integer have a non-value representation without having any padding bits?
  3. If so (which I think to be unlikely), could one of the bytes of an object be a non-value representation for unsigned char such that reading a byte of the object through an lvalue of type unsigned char is UB?
lateralfricative asked Oct 16 '25 16:10



1 Answer

There are actually two main questions here, which turn out to be unrelated.

First, the question in the title: whether integer types without padding bits can have a non-value representation (referred to in older versions of the standard as a trap representation).

The bits of unsigned types are grouped into value bits and padding bits, while the bits of signed types are grouped into a single sign bit as well as value bits and padding bits. So if there are no padding bits, we are left with only value bits (and a sign bit for signed types).

This is detailed in section 6.2.6.2p1 for unsigned types:

For unsigned integer types the bits of the object representation shall be divided into two groups: value bits and padding bits. If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N−1), so that objects of that type shall be capable of representing values from 0 to 2^N − 1 using a pure binary representation; this shall be known as the value representation.

And section 6.2.6.2p2 for signed types:

For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. If the corresponding unsigned type has width N, the signed type uses the same number of N bits, its width, as value bits and sign bit. N − 1 are value bits and the remaining bit is the sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type. If the sign bit is zero, it shall not affect the resulting value. If the sign bit is one, it has value −(2^(N−1)).

This means that integer types without padding bits do not have a non-value representation.


The second (implied) question is when reading an uninitialized object is undefined behavior.

In the specific case given above, it turns out that this has nothing to do with whether or not the object in question can have a trap representation. The above example exhibits undefined behavior because the variable in question was read without ever having had its address taken.

In such situations, the compiler is free to optimize away the variable's storage. So attempting to read such a variable, even if its type is unsigned char, can potentially result in a different value being read each time.

This behavior is spelled out in section 6.3.2.1p2:

If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.

If the object in question did have its address taken at some point, and the lvalue has an integer type without padding bits, then its value is merely unspecified (it cannot be a non-value representation) and can be read safely, although there is no guarantee what specific value it will contain.

So to summarize:

  • Integer types without padding bits do not have non-value representations.
  • Your code has undefined behavior because an automatic variable that never had its address taken was read.
  • Because your code has undefined behavior, there are no requirements regarding what GCC does in this particular instance.
dbush answered Oct 18 '25 07:10
