
How to safely offset bits without undefined behaviour?

Tags:

c++

std-bitset

I'm writing a function that will convert a bitset to an int/uint value, considering that the bitset could have fewer bits than the target type.

Here is the function I wrote:

#include <bitset>   // std::bitset
#include <climits>  // CHAR_BIT
#include <cstddef>  // size_t

template <typename T, size_t count>
static T convertBitSetToNumber( const std::bitset<count>& bitset )
{
    constexpr size_t targetSize = sizeof( T ) * CHAR_BIT;
    if ( targetSize > count )
    {
        // If the bitset is 0xF00, converting it as 0x0F00 would lose the sign information
        // (0xF00 is negative, while 0x0F00 is positive), because the sign bit is on the left.
        // So we pad zeroes on the right, convert 0xF000, and later shift right by the number
        // of padded bits (dividing by 2^4 = 16 here) to preserve sign and value.

        size_t missingbits = targetSize - count;

        std::bitset<targetSize> extended; // default-constructed: all bits are 0
        for ( size_t i = 0; i != count; ++i )
            extended[i + missingbits] = bitset[i];

        T result = static_cast<T>( extended.to_ullong() );
        result = result >> missingbits;
        return result;
    }
    else
    {
        return static_cast<T>( bitset.to_ullong() );
    }
}

And the "test program":

uint16_t val1 = Base::BitsetUtl::convertBitSetToNumber<uint16_t,12>( std::bitset<12>( "100010011010" ) );
// val1 is 0x089A
int16_t val2 = Base::BitsetUtl::convertBitSetToNumber<int16_t,12>( std::bitset<12>( "100010011010" ) );
// val2 is 0xF89A

Note: see the comment/exchange with Ped7g below; the code above is right, preserves the sign bit, and does the 12->16-bit conversion correctly for signed or unsigned targets. But if you are looking for how to shift 0xABC0 to 0x0ABC on a signed object, the answers could help you, so I'm not deleting the question.

The program works when using uint16_t as the target type:

uint16_t val = 0x89A0; // 1000100110100000
val = val >> 4;        // 0000100010011010

However, it fails when using int16_t, because 0x89A0 >> 4 is 0xF89A instead of the expected 0x089A.

int16_t val = 0x89A0; // 1000100110100000
val = val >> 4;       // 1111100010011010

I don't understand why the >> operator sometimes inserts 0 and sometimes 1, and I can't find out how to safely do the final operation of my function (result = result >> missingbits; must be wrong at some point...).

Asked by jpo38 on Sep 22 '16.

2 Answers

It's because shifting is an arithmetic operation, so the operands are promoted to int, and that promotion performs sign extension.

I.e. promoting the signed 16-bit integer (int16_t) 0x89a0 to a 32-bit signed integer (int) causes the value to become 0xffff89a0, which is the value that is shifted.

See e.g. this arithmetic operation conversion reference for more information.
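For illustration, here is a minimal sketch (my own example, assuming a typical two's-complement platform where int is 32 bits wide) that makes the promotion visible:

#include <cstdint>
#include <cstdio>

int main()
{
    int16_t val = static_cast<int16_t>(0x89A0); // bit pattern 0x89A0, value -30304
    // The left operand of >> is promoted to int first: -30304 becomes
    // 0xFFFF89A0 (sign extension), and the arithmetic shift keeps those 1s.
    std::printf("%08X\n", static_cast<unsigned>(val));      // FFFF89A0
    std::printf("%08X\n", static_cast<unsigned>(val >> 4)); // FFFFF89A
}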

You should cast the variable (or value) to an unsigned integer (i.e. uint16_t in your case):

val = static_cast<uint16_t>(val) >> 4;

If the type is not really known, for example if it's a template argument, then you can use std::make_unsigned:

val = static_cast<typename std::make_unsigned<T>::type>(val) >> 4;
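Applied to the function from the question, the whole conversion with this fix could look like the following sketch (convertBitSetToNumberLogical is a made-up name; this variant always shifts zeroes in, so 0xABC0 becomes 0x0ABC even for signed T):

#include <bitset>
#include <climits>
#include <cstddef>
#include <type_traits>

template <typename T, std::size_t count>
T convertBitSetToNumberLogical( const std::bitset<count>& bitset )
{
    constexpr std::size_t targetSize = sizeof( T ) * CHAR_BIT;
    if ( targetSize > count )
    {
        std::size_t missingbits = targetSize - count;
        std::bitset<targetSize> extended; // all bits default to 0
        for ( std::size_t i = 0; i != count; ++i )
            extended[i + missingbits] = bitset[i];

        T result = static_cast<T>( extended.to_ullong() );
        // Shift as the unsigned counterpart of T, so the vacated high bits become 0.
        return static_cast<T>( static_cast<typename std::make_unsigned<T>::type>( result ) >> missingbits );
    }
    return static_cast<T>( bitset.to_ullong() );
}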
Answered by Some programmer dude on Sep 30 '22.


This is called arithmetic shifting. On signed types, the most significant bit is the sign bit. When you shift a negative value to the right, the upper bits are set to 1, so that the result is still a negative number. (The result is a division by 2^n, where n is the number of bits shifted, rounding towards negative infinity.)

To avoid that, use an unsigned type: shifting an unsigned value is a logical shift, which sets the vacated upper bits to 0.
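A small sketch contrasting the two kinds of shift (my own example, using the bit pattern from the question and assuming two's complement):

#include <cstdint>
#include <cstdio>

int main()
{
    uint16_t u = 0x89A0;
    int16_t  s = static_cast<int16_t>(0x89A0);

    // Logical shift: zeroes come in from the left.
    std::printf("%04X\n", static_cast<unsigned>(u >> 4)); // 089A

    // Arithmetic shift: copies of the sign bit come in from the left.
    std::printf("%04X\n", static_cast<unsigned>(static_cast<uint16_t>(s >> 4))); // F89A
}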

Change this line:

result = result >> missingbits;

to

result = static_cast<T>(static_cast<typename std::make_unsigned<T>::type>(result) >> missingbits);

Beware that casting straight to uintmax_t (the maximum-width unsigned integer type the compiler supports) is not enough on its own: converting a negative T to a wider unsigned type sign-extends the bit pattern first, so the unwanted 1 bits would survive the logical shift. Converting to the unsigned type of the same width as T, via std::make_unsigned as Joachim Pileborg wrote in his answer, avoids that.
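A quick check of the difference (my own sketch, reusing the 0x89A0 pattern from the question):

#include <cstdint>
#include <cstdio>

int main()
{
    int16_t result = static_cast<int16_t>(0x89A0); // value -30304

    // Widening a negative value to uintmax_t sign-extends first, so
    // 0xFFFFFFFFFFFF89A0 >> 4 keeps the unwanted 1 bits in the low word.
    uint16_t viaMax  = static_cast<uint16_t>(static_cast<uintmax_t>(result) >> 4);

    // Converting to the unsigned type of the same width does not sign-extend:
    // 0x89A0 >> 4 is 0x089A.
    uint16_t viaSame = static_cast<uint16_t>(static_cast<uint16_t>(result) >> 4);

    std::printf("%04X %04X\n", static_cast<unsigned>(viaMax),
                               static_cast<unsigned>(viaSame)); // F89A 089A
}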

Answered by alain on Sep 30 '22.