Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why (int)((unsigned int)((int)v)?

The website in which I found this code

int v, sign; // or, to avoid branching on CPUs with flag registers (IA32): sign = -(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1));  // if v < 0 then -1, else 0.  

This statement assigns variable sign with the sign of variable v (either -1 or 0). I wonder why (int)((unsigned int)((int)v) is used instead of a plain v?

like image 685
nalzok Avatar asked Nov 23 '15 23:11

nalzok


People also ask

Why would you use an unsigned int?

Unsigned integers are used when we know that the value that we are storing will always be non-negative (zero or positive). Note: it is almost always the case that you could use a regular integer variable in place of an unsigned integer.

Why do we write unsigned int before variable?

Unsigned int is usually used when we are dealing with bit values that means when we are performing bitwise operations like bit masking orbit shifting. As bit shifting in negative integers is undefined or implementation-defined outputs.

What is difference between int and unsigned int?

A signed integer is a 32-bit datum that encodes an integer in the range [-2147483648 to 2147483647]. An unsigned integer is a 32-bit datum that encodes a nonnegative integer in the range [0 to 4294967295].

What is the difference between unsigned and unsigned int?

There is no difference. unsigned and unsigned int are both synonyms for the same type (the unsigned version of the int type).


2 Answers

Note that you've extracted a fragment of the expression in your question (you quote (int)((unsigned int)((int)v) which has one more left bracket ( than it does right brackets )). The RHS expression of the assignment statement is, in full:

-(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1)); 

If you add a few spaces, you find:

-(int) (  (unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1)  );        ^  ^            ^^      ^    ^                          ^  ^        |  +------------++------+    +--------------------------+  |        +----------------------------------------------------------+ 

That is, the outer (int) cast applies to all of:

((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1)); 

The inner cast to (int) cast is vacuous; its result is immediately cast to unsigned int. The (unsigned int) cast ensures that the right shift is well defined. The expression as a whole determines whether the most significant bit is a 0 or a 1. The outer int converts the result back to an int, and the - then negates it, so the expression is -1 if v is negative and 0 if v is zero or positive — which is what the comment says.

like image 148
Jonathan Leffler Avatar answered Oct 14 '22 05:10

Jonathan Leffler


Quoting the C Standard 6.5.7p5:

The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.

The author is writing on how to implement a function sign(int v) that returns -1 for negative numbers and 0 for 0 and positive numbers efficiently. A naive approach is this:

int sign(int v) {     if (v < 0)         return -1;     else         return 0; } 

But this solution may compile to code that performs a comparison and branch on CPU flags set by the comparison. This is inefficient. He proposes a simpler and more direct solution:

sign = -(v > 0); 

But this computation still requires compare and branch on CPUs that do not produce comparison results directly as boolean values. CPUs with flag registers typically set various flags upon comparison instructions or even most arithmetic instruction. So he proposes another solution based on shifting the sign bit, but as the Standard specifies above, he cannot rely on the result of right shifting a negative value.

Casting v as unsigned removes this problem because right shifting unsigned values is well specified. Assuming the sign bit is in the highest position, which is true for all modern processors but not mandated by the C standard, right shifting (unsigned)v by the one less than the number of bits in its type produces a value of 1 for negative values and 0 otherwise. Negating the result should produce the expected values -1 for negative v and 0 for positive and zero v. But the expression is unsigned, so plain negation will produce UINT_MAX or 0, which in turn causes arithmetic overflow when stored to an int or even just cast as an (int). Casting this result back to int before negating it correctly computes the desired result, -1 for negative v and 0 for positive or zero v.

Arithmetic overflow is usually benign and widely ignored by most programmers, but modern compilers tend to take advantage of its undefinedness to perform aggressive optimizations, so it is unwise to rely on expected but unwarranted behavior and best to avoid arithmetic overflow in all cases.

The expression could be simplified as:

sign = -(int)((unsigned)v >> (sizeof(int) * CHAR_BIT - 1)); 

Note that if right shifting is defined as replicating the bit for your platform (an almost universal behavior with current CPUs), the expression would be much simpler (assuming int v):

sign = v >> (sizeof(v) * CHAR_BIT - 1));   // works on x86 CPUs 

The bithacks page https://graphics.stanford.edu/~seander/bithacks.html , very instructive indeed, contains a detailed explanation:

int v;      // we want to find the sign of v int sign;   // the result goes here   // CHAR_BIT is the number of bits per byte (normally 8). sign = -(v < 0);  // if v < 0 then -1, else 0.  // or, to avoid branching on CPUs with flag registers (IA32): sign = -(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1)); // or, for one less instruction (but not portable): sign = v >> (sizeof(int) * CHAR_BIT - 1);  

The last expression above evaluates to sign = v >> 31 for 32-bit integers. This is one operation faster than the obvious way, sign = -(v < 0). This trick works because when signed integers are shifted right, the value of the far left bit is copied to the other bits. The far left bit is 1 when the value is negative and 0 otherwise; all 1 bits gives -1. Unfortunately, this behavior is architecture-specific.

As an epilogue, I would recommend to use the most readable version and rely on the compiler to produce the most efficient code:

sign = -(v < 0); 

As can be verified on this enlightening page: http://gcc.godbolt.org/# compiling the above code with gcc -O3 -std=c99 -m64 indeed produces the code below for all solutions above, even the most naive if/else statement:

sign(int):     movl    %edi, %eax     sarl    $31, %eax     ret 
like image 27
chqrlie Avatar answered Oct 14 '22 07:10

chqrlie