I'm trying to figure out a bug when converting a byte array to a uint64_t (unsigned long long).
I know you can do that with shifts, but that's not my question. I want to do this with multiplications. I would like to know why the multiplications do not go as planned in some cases.
The result of this code is correct:
#include <stdio.h>
int main()
{
unsigned char byte[8];
unsigned long long x;
byte[0] = 0x15;
byte[1] = 0x15;
byte[2] = 0x17;
byte[3] = 0x18;
byte[4] = 0x19;
byte[5] = 0x20;
byte[6] = 0x21;
byte[7] = 0x12;
x = (byte[0] * 0x100000000000000) +
(byte[1] * 0x1000000000000) +
(byte[2] * 0x10000000000) +
(byte[3] * 0x100000000);
x = x +
(byte[4] << 24) +
(byte[5] << 16) +
(byte[6] << 8) +
byte[7];
printf("%llx\n", x);
}
The result is 1515171819202112, which is correct.
But with this case, where I use these bytes instead:
byte[0] = 0x15;
byte[1] = 0x92; // changed
byte[2] = 0x53; // changed
byte[3] = 0x22; // changed
byte[4] = 0xec; // changed
byte[5] = 0x33; // changed
byte[6] = 0x99; // changed
byte[7] = 0x12;
...the result is wrong.
I get:
15925321ec339912.
But, it should be:
15925322ec339912.
Why? What is the bug?
In this subexpression:
byte[4]<<24
You start with an unsigned char value. This value first gets promoted to type int. This behavior is detailed in section 6.3.1.1p2 of the C standard regarding arithmetic conversions:
The following may be used in an expression wherever an
intorunsigned intmay be used:
- An object or expression with an integer type (other than int or unsigned int) whose integer conversion rank is less than or equal to the rank of int and unsigned int.
- A bit-field of type
_Bool,int,signed int, orunsigned int.If an
intcan represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to anint; otherwise, it is converted to anunsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.
So the promoted value of byte[4] has type int which is signed and (most likely) 32 bit. You then shift this int value left by 24. Given that the original value was 0xec, this results in a value of 1 being shifted into the sign bit of the resulting int value.
Shifting a 1 into the sign bit results in undefined behavior. This is spelled out in section 6.5.7p4 of the C standard regarding bitwise shift operators:
The result of
E1 << E2is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
You can correct this by first casting the value to unsigned long long so that the shift is valid.
x=x+((unsigned long long)byte[4]<<24)+(byte[5]<<16)+(byte[6]<<8)+byte[7];
As explained by @dbush, byte[4] << 24 has undefined behavior when byte[4] has a value greater than 127.
On your architecture the most likely effect of shifting a 1 bit into the sign bit of type int is to create a negative value: 0xec << 24 evaluates to -335544320. In the expression x + (byte[4] << 24) + (byte[5] << 16) + (byte[6] << 8) + byte[7], adding (byte[4] << 24) causes the int value to be converted to unsigned long long as 0xffffffec000000 which effectively subtracts 0x100000000 from the high 32 bits. This explains the observed behavior.
This is non intuitive but not related to the << shift operator. The same problem would occur with this code:
x = (byte[0] * 0x100000000000000) +
(byte[1] * 0x1000000000000) +
(byte[2] * 0x10000000000) +
(byte[3] * 0x100000000) +
(byte[4] * 0x1000000) +
(byte[5] * 0x10000) +
(byte[6] * 0x100) +
(byte[7] * 0x1);
As a matter of fact, there are 2 instances of potential undefined behavior in the above expression: byte[0] * 0x100000000000000 and byte[4] * 0x1000000 because of signed arithmetic overflow.
You can use multiplications to achieve your conversion by forcing unsigned long long arithmetics as an alternative to shifts:
#include <stdio.h>
unsigned long long read64_mul(const unsigned char *byte) {
return byte[0] * 0x100000000000000ULL +
byte[1] * 0x1000000000000ULL +
byte[2] * 0x10000000000ULL +
byte[3] * 0x100000000ULL +
byte[4] * 0x1000000ULL +
byte[5] * 0x10000ULL +
byte[6] * 0x100ULL +
byte[7] * 0x1ULL;
}
unsigned long long read64_shift(const unsigned char *byte) {
return ((unsigned long long)byte[0] << 56) |
((unsigned long long)byte[1] << 48) |
((unsigned long long)byte[2] << 40) |
((unsigned long long)byte[3] << 32) |
((unsigned long long)byte[4] << 24) |
((unsigned long long)byte[5] << 16) |
((unsigned long long)byte[6] << 8) |
((unsigned long long)byte[7] << 0);
}
int main()
{
unsigned char byte[8] = {
0x15, 0x92, 0x53, 0x22, 0xec, 0x33, 0x99, 0x12
};
unsigned long long x1 = read64_mul(byte);
unsigned long long x2 = read64_shift(byte);
printf("%llx\n%llx\n", x1, x2);
return 0;
}
Output:
15925322ec339912
15925322ec339912
Yet this does not seem to be a good idea: as can be verified on Godbolt's Compiler Explorer, both gcc and clang recognise the expression and generate optimal code for the shift version but they do not recognise the multiplication approach and generate naive code even at -O3, although clang efficiently inlines both calls for the known array contents.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With