Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to do a bit representation in a C-standard way?

As per the C standard the value representation of a integer type is implementation defined. So 5 might not be represented as 00000000000000000000000000000101 or -1 as 11111111111111111111111111111111 as we usually assume in a 32-bit 2's complement. So even though the operators ~, << and >> are well defined, the bit patterns they will work on is implementation defined. The only defined bit pattern I could find was "§5.2.1/3 A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminate a character string.".

So my questions is - Is there a implementation independent way of converting integer types to a bit pattern?

We can always start with a null character and do enough bit operations on it to get it to a desired value, but I find it too cumbersome. I also realise that practically all implementations will use a 2's complement representation, but I want to know how to do it in a pure C standard way. Personally I find this topic quite intriguing due to the matter of device-driver programming where all code written till date assumes a particular implementation.

like image 220
tinkerbeast Avatar asked Mar 09 '15 08:03

tinkerbeast


People also ask

What is bit in C?

In C, we can specify size (in bits) of structure and union members. The idea is to use memory efficiently when we know that the value of a field or group of fields will never exceed a limit or is within a small range.

How do you find a bit is set or not in C?

Method 1 (Using Left Shift Operator) 1) Left shift given number 1 by k-1 to create a number that has only set bit as k-th bit. temp = 1 << (k-1) 2) If bitwise AND of n and temp is non-zero, then result is SET else result is NOT SET.

How do Bitwise operators work in C?

Binary AND Operator copies a bit to the result if it exists in both operands. Binary OR Operator copies a bit if it exists in either operand. Binary XOR Operator copies the bit if it is set in one operand but not both.

What is bit and byte in C?

Utilities Operating System Computer Hardware OS Designer End user Programmer Page 4 Copyright @ 2008 Ananda Gunawardena Bits, Bytes and Data Types A bit is the smallest unit of storage represented by 0 or 1. A byte is typically 8 bits. C character data type requires one byte of storage. A file is a sequence of bytes.


2 Answers

In general, it's not that hard to accommodate unusual platforms for the most cases (if you don't want to simply assume 8-bit char, 2's complement, no padding, no trap, and truncating unsigned-to-signed conversion), the standard mostly gives enough guarantees (a few macros to inspect certain implementation details would be helpful, though).

As far as a strictly conforming program can observe (outside bit-fields), 5 is always encoded as 00...0101. This is not necessarily the physical representation (whatever this should mean), but what is observable by portable code. A machine using Gray code internally, for example, would have to emulate a "pure binary notation" for bitwise operators and shifts.

For negative values of signed types, different encodings are allowed, which leads to different (but well-defined for every case) results when re-interpreting as the corresponding unsigned type. For example, strictly conforming code must distinguish between (unsigned)n and *(unsigned *)&n for a signed integer n: They are equal for two's complement without padding bits, but different for the other encodings if n is negative.

Further, padding bits may exist, and signed integer types may have more padding bits than their corresponding unsigned counterparts (but not the other way round, type-punning from signed to unsigned is always valid). sizeof cannot be used to get the number of non-padding bits, so e.g. to get an unsigned value where only the sign-bit (of the corresponding signed type) is set, something like this must be used:

#define TYPE_PUN(to, from, x) ( *(to *)&(from){(x)} ) unsigned sign_bit = TYPE_PUN(unsigned, int, INT_MIN) &                     TYPE_PUN(unsigned, int, -1) & ~1u; 

(there are probably nicer ways) instead of

unsigned sign_bit = 1u << sizeof sign_bit * CHAR_BIT - 1; 

as this may shift by more than the width. (I don't know of a constant expression giving the width, but sign_bit from above can be right-shifted until it's 0 to determine it, Gcc can constant-fold that.) Padding bits can be inspected by memcpying into an unsigned char array, though they may appear to "wobble": Reading the same padding bit twice may give different results.

If you want the bit pattern (without padding bits) of a signed integer (little endian):

int print_bits_u(unsigned n) {     for(; n; n>>=1) {         putchar(n&1 ? '1' : '0'); // n&1 never traps     }     return 0; }  int print_bits(int n) {     return print_bits_u(*(unsigned *)&n & INT_MAX);     /* This masks padding bits if int has more of them than unsigned int.      * Note that INT_MAX is promoted to unsigned int here. */ }  int print_bits_2scomp(int n) {     return print_bits_u(n); } 

print_bits gives different results for negative numbers depending on the representation used (it gives the raw bit pattern), print_bits_2scomp gives the two's complement representation (possibly with a greater width than a signed int has, if unsigned int has less padding bits).

Care must be taken not to generate trap representations when using bitwise operators and when type-punning from unsigned to signed, see below how these can potentially be generated (as an example, *(int *)&sign_bit can trap with two's complement, and -1 | 1 can trap with ones' complement).

Unsigned-to-signed integer conversion (if the converted value isn't representable in the target type) is always implementation-defined, I would expect non-2's complement machines to differ from the common definition more likely, though technically, it could also become an issue on 2's complement implementations.

From C11 (n1570) 6.2.6.2:

(1) For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2N-1, so that objects of that type shall be capable of representing values from 0 to 2N-1 using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.

(2) For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; signed char shall not have any padding bits. There shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M≤N ). If the sign bit is zero, it shall not affect the resulting value. If the sign bit is one, the value shall be modified in one of the following ways:

  • the corresponding value with sign bit 0 is negated (sign and magnitude);
  • the sign bit has the value -(2M) (two's complement);
  • the sign bit has the value -(2M-1) (ones' complement).

Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones' complement), is a trap representation or a normal value. In the case of sign and magnitude and ones' complement, if this representation is a normal value it is called a negative zero.

like image 89
mafso Avatar answered Sep 22 '22 16:09

mafso


To add to mafso's excellent answer, there's a part of the ANSI C rationale which talks about this:

The Committee has explicitly restricted the C language to binary architectures, on the grounds that this stricture was implicit in any case:

  • Bit-fields are specified by a number of bits, with no mention of “invalid integer” representation. The only reasonable encoding for such bit-fields is binary.
  • The integer formats for printf suggest no provision for “invalid integer” values, implying that any result of bitwise manipulation produces an integer result which can be printed by printf.
  • All methods of specifying integer constants — decimal, hex, and octal — specify an integer value. No method independent of integers is defined for specifying “bit-string constants.” Only a binary encoding provides a complete one-to-one mapping between bit strings and integer values.

The restriction to binary numeration systems rules out such curiosities as Gray code and makes possible arithmetic definitions of the bitwise operators on unsigned types.

The relevant part of the standard might be this quote:

3.1.2.5 Types

[...]

The type char, the signed and unsigned integer types, and the enumerated types are collectively called integral types. The representations of integral types shall define values by use of a pure binary numeration system.

like image 37
benj Avatar answered Sep 23 '22 16:09

benj