Bug when converting a byte array to a uint64

Question

I'm trying to figure out a bug when converting a byte array to a uint64_t (unsigned long long).

I know you can do that with shifts, but that's not my question. I want to do this with multiplications. I would like to know why the multiplications do not go as planned in some cases.

The result of this code is correct:

#include <stdio.h>

int main()
{
    unsigned char byte[8];
    unsigned long long x;

    byte[0] = 0x15;
    byte[1] = 0x15; 
    byte[2] = 0x17;
    byte[3] = 0x18; 
    byte[4] = 0x19;
    byte[5] = 0x20;
    byte[6] = 0x21;
    byte[7] = 0x12;
            
    x = (byte[0] * 0x100000000000000) +
        (byte[1] * 0x1000000000000) + 
        (byte[2] * 0x10000000000) + 
        (byte[3] * 0x100000000);

    x = x +
        (byte[4] << 24) + 
        (byte[5] << 16) + 
        (byte[6] << 8) + 
        byte[7];

    printf("%llx
", x);
}

The result is 1515171819202112, which is correct.

But with this case, where I use these bytes instead:

    byte[0] = 0x15;
    byte[1] = 0x92;  // changed 
    byte[2] = 0x53;  // changed
    byte[3] = 0x22;  // changed
    byte[4] = 0xec;  // changed
    byte[5] = 0x33;  // changed
    byte[6] = 0x99;  // changed
    byte[7] = 0x12;

...the result is wrong.

I get:
15925321ec339912.

But, it should be:
15925322ec339912.

Why? What is the bug?

dbush · Accepted Answer

In this subexpression:

byte[4]<<24

You start with an unsigned char value. This value first gets promoted to type int. This behavior is detailed in section 6.3.1.1p2 of the C standard regarding arithmetic conversions:

The following may be used in an expression wherever an int or unsigned int may be used:

An object or expression with an integer type (other than int or unsigned int) whose integer conversion rank is less than or equal to the rank of int and unsigned int.

A bit-field of type _Bool, int, signed int, or unsigned int.

If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.

So the promoted value of byte[4] has type int which is signed and (most likely) 32 bit. You then shift this int value left by 24. Given that the original value was 0xec, this results in a value of 1 being shifted into the sign bit of the resulting int value.

Shifting a 1 into the sign bit results in undefined behavior. This is spelled out in section 6.5.7p4 of the C standard regarding bitwise shift operators:

The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

You can correct this by first casting the value to unsigned long long so that the shift is valid.

x=x+((unsigned long long)byte[4]<<24)+(byte[5]<<16)+(byte[6]<<8)+byte[7];

chqrlie · Answer

As explained by @dbush, byte[4] << 24 has undefined behavior when byte[4] has a value greater than 127.

On your architecture the most likely effect of shifting a 1 bit into the sign bit of type int is to create a negative value: 0xec << 24 evaluates to -335544320. In the expression x + (byte[4] << 24) + (byte[5] << 16) + (byte[6] << 8) + byte[7], adding (byte[4] << 24) causes the int value to be converted to unsigned long long as 0xffffffec000000 which effectively subtracts 0x100000000 from the high 32 bits. This explains the observed behavior.

This is non intuitive but not related to the << shift operator. The same problem would occur with this code:

    x = (byte[0] * 0x100000000000000) +
        (byte[1] * 0x1000000000000) + 
        (byte[2] * 0x10000000000) + 
        (byte[3] * 0x100000000) +
        (byte[4] * 0x1000000) +
        (byte[5] * 0x10000) +
        (byte[6] * 0x100) +
        (byte[7] * 0x1);

As a matter of fact, there are 2 instances of potential undefined behavior in the above expression: byte[0] * 0x100000000000000 and byte[4] * 0x1000000 because of signed arithmetic overflow.

You can use multiplications to achieve your conversion by forcing unsigned long long arithmetics as an alternative to shifts:

#include <stdio.h>

unsigned long long read64_mul(const unsigned char *byte) {
    return byte[0] * 0x100000000000000ULL +
           byte[1] * 0x1000000000000ULL + 
           byte[2] * 0x10000000000ULL + 
           byte[3] * 0x100000000ULL +
           byte[4] * 0x1000000ULL +
           byte[5] * 0x10000ULL +
           byte[6] * 0x100ULL +
           byte[7] * 0x1ULL;
}

unsigned long long read64_shift(const unsigned char *byte) {
    return ((unsigned long long)byte[0] << 56) |
           ((unsigned long long)byte[1] << 48) |
           ((unsigned long long)byte[2] << 40) |
           ((unsigned long long)byte[3] << 32) |
           ((unsigned long long)byte[4] << 24) |
           ((unsigned long long)byte[5] << 16) |
           ((unsigned long long)byte[6] << 8) |
           ((unsigned long long)byte[7] << 0);
}

int main()
{
    unsigned char byte[8] = {
        0x15, 0x92, 0x53, 0x22, 0xec, 0x33, 0x99, 0x12
    };
            
    unsigned long long x1 = read64_mul(byte);
    unsigned long long x2 = read64_shift(byte);

    printf("%llx
%llx
", x1, x2);
    return 0;
}

Output:

15925322ec339912
15925322ec339912

Yet this does not seem to be a good idea: as can be verified on Godbolt's Compiler Explorer, both gcc and clang recognise the expression and generate optimal code for the shift version but they do not recognise the multiplication approach and generate naive code even at -O3, although clang efficiently inlines both calls for the known array contents.

Bug when converting a byte array to a uint64

Tags:

c

serialization

deserialization

2 Answers

dbush

chqrlie

Recent Activity

Donate For Us

Bug when converting a byte array to a uint64

Tags:

c

serialization

deserialization

2 Answers

dbush

chqrlie

Related questions

Recent Activity

Donate For Us