
Can ~3 safely be widened automatically?

While answering another question, I ended up trying to justify casting the operand to the ~ operator, but I was unable to come up with a scenario where not casting it would yield wrong results.

I am asking this clarification question in order to be able to clean up that other question, removing the red herrings and keeping only the most relevant information intact.

The problem in question is that we want to clear the two lowermost bits of a variable:

offset = offset & ~3;

This looks dangerous, because ~3 will be an int no matter what offset is, so we might end up masking the bits that do not fit into int's width. For example if int is 32 bits wide and offset is of a 64 bit wide type, one could imagine that this operation would lose the 32 most significant bits of offset.

However, in practice this danger does not seem to manifest itself. Instead, the result of ~3 is sign-extended to fill the width of offset, even when offset is unsigned.
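The observation can be reproduced with a small sketch. It assumes a two's complement int, so that ~3 has the value -4; the function name clear_low_two_bits is hypothetical and only there for illustration. Converting a negative int to an unsigned type is defined modulo 2^N, which is what produces the "sign-extended" mask here.

```cpp
#include <cstdint>

// Clears the two lowest bits using a plain int mask.
// ~3 has type int and (on two's complement) the value -4. The usual
// arithmetic conversions then convert -4 to std::uint64_t modulo 2^64,
// giving the mask 0xFFFFFFFFFFFFFFFC.
constexpr std::uint64_t clear_low_two_bits(std::uint64_t offset) {
    return offset & ~3;
}
```

With these assumptions, clear_low_two_bits(0xFFFFFFFFFFFFFFFF) yields 0xFFFFFFFFFFFFFFFC, so the upper 32 bits survive.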

Is this behavior mandated by the standard? I am asking because it seems that this behavior could rely on specific implementation and/or hardware details, but I want to be able to recommend code that is correct according to the language standard.


I can make the operation produce an undesired result if I try to clear the 32nd least significant bit. This is because the result of ~(1 << 31) will be positive in a 32-bit signed integer in two's complement representation (and indeed in a one's complement representation), so sign-extending the result will leave all the higher bits unset.

offset = offset & ~(1 << 31); // BZZT! Fragile!

In this case, if int is 32 bits wide and offset is of a wider type, this operation will clear all the high bits.
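A rough sketch of the failure, assuming a 32-bit two's complement int. To keep the sketch free of signed-shift subtleties it uses std::numeric_limits<int>::min() as a stand-in for 1 << 31 (under the stated assumption both have only bit 31 set); the names bit31 and fragile_clear_bit31 are made up for this example.

```cpp
#include <cstdint>
#include <limits>

static_assert(std::numeric_limits<int>::digits == 31,
              "sketch assumes a 32-bit int");

// Stand-in for the question's 1 << 31: only bit 31 is set.
constexpr int bit31 = std::numeric_limits<int>::min();

// ~bit31 is INT_MAX, a positive int, so widening it to 64 bits fills the
// upper 32 bits of the mask with zeros.
constexpr std::uint64_t fragile_clear_bit31(std::uint64_t offset) {
    return offset & ~bit31;
}
```

Under these assumptions, fragile_clear_bit31(0xFFFFFFFFFFFFFFFF) yields 0x000000007FFFFFFF: bit 31 is cleared, but so are bits 32 to 63.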

However, the proposed solution in the other question does not seem to resolve this problem!

offset = offset & ~static_cast<decltype(offset)>(1 << 31); // BZZT! Fragile!

It seems that 1 << 31 is sign-extended by the cast, so regardless of whether decltype(offset) is signed or unsigned, the result of the cast has all the higher bits set, and after inversion the operation again clears all those bits.

In order to fix this, I need to make the number unsigned before widening, either by making the integer literal unsigned (1u << 31 seems to work) or casting it to unsigned int:

offset = offset &
    ~static_cast<decltype(offset)>(
        static_cast<unsigned int>(
            1 << 31
        )
    );
// Now it finally looks like C++!

This change makes the original danger relevant. When the bitmask is unsigned, the inverted bitmask will be widened by setting all the higher bits to zero, so it is important to have the correct width before inverting.
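The difference between the two orders can be sketched as follows, assuming a 32-bit unsigned int and a 64-bit target type. The function names are hypothetical.

```cpp
#include <cstdint>

// Widen first, then invert: the upper 32 bits of the mask end up set.
constexpr std::uint64_t widen_then_invert() {
    return ~static_cast<std::uint64_t>(1u << 31);
}

// Invert first (still as a 32-bit unsigned int), then widen: the
// zero-extension leaves the upper 32 bits of the mask clear.
constexpr std::uint64_t invert_then_widen() {
    return ~(1u << 31);
}
```

With these assumptions, widen_then_invert() is 0xFFFFFFFF7FFFFFFF, while invert_then_widen() is 0x000000007FFFFFFF.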

This leads me to conclude that there are two ways to recommend clearing some bits:

1: offset = offset & ~3;

Advantages: Short, easily readable code.

Disadvantages: None that I know of. But is the behavior guaranteed by the standard?

2: offset = offset & ~static_cast<decltype(offset)>(3u);

Advantages: I understand how all elements of this code work, and I am fairly confident that its behavior is guaranteed by the standard.

Disadvantages: It doesn't exactly roll off the tongue.
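If option 2 is too noisy to write inline, one possibility is to hide the cast in a small helper. This is only a sketch: clear_bits is a hypothetical name, not something from the standard library, and it assumes T is an integer type for which std::make_unsigned is valid.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper: clears the bits of `value` that are set in `bits`.
// `bits` is converted to the unsigned counterpart of T *before* it is
// inverted, so the mask is widened first and inverted afterwards.
template <typename T>
constexpr T clear_bits(T value, typename std::make_unsigned<T>::type bits) {
    using U = typename std::make_unsigned<T>::type;
    // For signed T, the conversion back from U is implementation-defined
    // before C++20 when the result does not fit in T.
    return static_cast<T>(static_cast<U>(value) & ~bits);
}

// Example: clear bit 31 of a 64-bit value without touching the upper bits.
constexpr std::uint64_t example =
    clear_bits(std::uint64_t{0xFFFFFFFFFFFFFFFFull}, 1u << 31);
```

Because the second parameter is a non-deduced context, T is taken from the first argument only, so a call like clear_bits(offset, 3u) picks up the width of offset.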


Can you guys help me clarify if the behavior of option 1 is guaranteed or if I have to resort to recommending option 2?

Magnus Hoff asked Aug 19 '14 10:08



1 Answer

It is not valid in sign-magnitude representation. In that representation with 32-bit ints, ~3 is -0x7FFFFFFC. When this is widened to 64-bit (signed) the value is retained, -0x7FFFFFFC. So we would not say that sign-extension happens in that system; and you will incorrectly mask off all the bits 32 and higher.

In two's complement, I think offset &= ~3 always works. ~3 is -4, so whether or not the 64-bit type is signed, you still get a mask with only the bottom 2 bits unset.

However, personally I'd try to avoid writing it, because when checking my code for bugs later I'd have to go through this whole discussion again (and what hope does a more casual coder have of understanding the intricacies here?). I only do bitwise operations on unsigned types, to avoid all of this.

M.M answered Sep 23 '22 18:09
