Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What do the C and C++ standards say about bit-level integer representation and manipulation?

I know the C and C++ standards don't dictate a particular representation for numbers (could be two's complement, sign-and-magnitude, etc.). But I don't know the standards well enough (and couldn't find if it's stated) to know if there are any particular restrictions/guarantees/reserved representations made when working with bits. Particularly:

  1. If all the bits in an integer type are zero, does the integer as whole represent zero?
  2. If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)
  3. Is there a guaranteed way to check if any bit is not set?
  4. Is there a guaranteed way to check if any bit is set? (#3 and #4 kind of depend on #1 and #2, because I know how to set, for example the 5th bit (see #5) in some variable x, and I'd like to check a variable y to see if it's 5th bit is 1, I would like to know if if (x & y) will work (because as I understand, this relies on the value of the representation and not whether nor not that bit is actually 1 or 0))
  5. Is there a guaranteed way to set the left-most and/or right-most bits? (At least a simpler way than taking a char c with all bits true (set by c = c | ~c) and doing c = c << (CHAR_BIT - 1) for setting the high-bit and c = c ^ (c << 1) for the low-bit, assuming I'm not making any assumptions I should't be, given these questions)
  6. If the answer to #1 is "no" how could one iterate over the bits in an integer type and check if each one was a 1 or a 0?

I guess my overall question is: are there any restrictions/guarantees/reserved representations made by the C and C++ standards regarding bits and integers, despite the fact that an integer's representation is not mandated (and if the C and C++ standards differ in this regard, what's their difference)?

I came up with these questions while doing my homework which required me to do some bit manipulating (note these aren't questions from my homework, these are much more "abstract").

Edit: As to what I refer to as "bits," I mean "value forming" bits and am not including "padding" bits.

like image 561
Cornstalks Avatar asked Aug 25 '12 21:08

Cornstalks


4 Answers

(1) If all the bits in an integer type are zero, does the integer as whole represent zero?

Yes, the bit pattern consisting of all zeroes always represents 0:

The representations of integral types shall define values by use of a pure binary numeration system.49[§3.9.1/7]

49 A positional representation for integers that uses the binary digits 0 and 1, in which the values represented by successive bits are additive, begin with 1, and are multiplied by successive integral power of 2, except perhaps for the bit with the highest position.


(2) If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)

No. In fact, signed magnitude is specifically allowed:

[ Example: this International Standard permits 2’s complement, 1’s complement and signed magnitude representations for integral types. —end example ] [§3.9.1/7]


(3) Is there a guaranteed way to check if any bit is not set?

I believe the answer to this is "no," if you consider signed types. It is equivalent to equality testing with a bit pattern of all ones, which is only possible if you have a way to produce a signed number with bit pattern of all ones. For an unsigned number this representation is guaranteed, but casting from unsigned to signed is undefined if the number is unrepresentable:

If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined. [§4.7/3]


(4) Is there a guaranteed way to check if any bit is set?

I don't think so, because signed magnitude is allowed—0 would compare equal to −0. But it should be possible with unsigned numbers.


(5) Is there a guaranteed way to set the left-most and/or right-most bits?

Again, I believe the answer is "yes" for unsigned numbers, but "no" for signed numbers. Shifts are undefined for negative signed numbers:

Otherwise, if E1 has a signed type and non-negative value, and E1 × 2E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined. [§5.8/2]

like image 79
John Calsbeek Avatar answered Nov 05 '22 21:11

John Calsbeek


You use the term "all bits" repeatedly, but you do not clarify what "all bits" you are referring to. Object representation of integer types in C/C++ might include value-forming bits and padding bits. The only integer type that is guaranteed not to have padding bits is [signed/unsigned] char.

The language always guaranteed that if all value-forming bits are zero, then the represented integer value is also zero.

As for padding bits, things are/were a bit more complicated. The original specification of C language (C89/90 as well as the original C99) did not guarantee that setting all object bits to zero produced a valid integer representation. It could've produced an invalid trap representation. I.e. in the original C (and even in C99 at first) using memset(..., 0, ...) on integer types did not guarantee that the objects will receive valid zero values (with the exception of [signed/unsigned] char). This was changed in later specifications, namely in one of the technical corrigendums for C99. Now it is required that all-zero bit pattern in an integer object (that involves all bits, including padding ones) represents a valid zero value.

I.e. in modern C it is legal to use memset(..., 0, ...) to set any integer objects to zero, but it became legal only after C99.

like image 21
AnT Avatar answered Nov 05 '22 21:11

AnT


You already got some answers about the representation of integer values. There is exactly one way that is guaranteed to give you all the individual bits of any object that is represented in memory: view it as array of unsigned char. This is the only integral type that has no padding bits and is guaranteed to have no trap representation. So casting a pointer of type T* to your object to unsigned char* will always work, as long as you only access the first sizeof(T) bytes. By that you could inspect and set all bytes (and thus bits) to your liking.

If you are interested in more details, here I have written something up about the anatomy of integer types in C. C++ might differ a bit from that, in particular type puning through union as described there doesn't seem to be well defined in C++.

like image 26
Jens Gustedt Avatar answered Nov 05 '22 21:11

Jens Gustedt


Q: If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)

No. The standards for C and C++ don't rule out signed magnitude or one's complement, both of which have +0 and -0. While +0 and -0 do have to compare equal, but they do not have to have the same representation.

Good luck finding a machine nowadays that uses signed magnitude or one's complement.

like image 21
David Hammen Avatar answered Nov 05 '22 20:11

David Hammen