
What is the difference between literals and variables in C (signed vs unsigned short ints)?

I have seen the following code in the book Computer Systems: A Programmer's Perspective, 2/E. This works well and creates the desired output. The output can be explained by the difference of signed and unsigned representations.

#include <stdio.h>
int main() {
    if (-1 < 0u) {
        printf("-1 < 0u\n");
    }
    else {
        printf("-1 >= 0u\n");
    }
    return 0;
}

The code above yields -1 >= 0u. However, the following code, which ought to behave the same, does not. In other words,

#include <stdio.h>

int main() {

    unsigned short u = 0u;
    short x = -1;
    if (x < u)
        printf("-1 < 0u\n");
    else
        printf("-1 >= 0u\n");
    return 0;
}

yields -1 < 0u. Why does this happen? I cannot explain it.

Note that I have seen similar questions like this, but they do not help.

PS. As @Abhineet said, the dilemma can be solved by changing short to int. However, how can one explain this phenomenon? In other words, -1 is 0xff ff ff ff in 4 bytes and 0xff ff in 2 bytes. Interpreting these two's-complement patterns as unsigned gives the values 4294967295 and 65535 respectively. Neither is less than 0, so I would expect the output in both cases to be -1 >= 0u, i.e. x >= u.

A sample output for it on a little endian Intel system:

For short:

-1 < 0u
u =
 00 00
x =
 ff ff

For int:

-1 >= 0u
u =
 00 00 00 00
x =
 ff ff ff ff
asked Oct 26 '15 by Ali Shakiba


3 Answers

The code above yields -1 >= 0u

All integer literals (numeric constants) have a type and therefore also a signedness. By default they are of type int, which is signed. When you append the u suffix, you turn the literal into an unsigned int.

For any C expression where you have one operand which is signed and one which is unsigned, the rule of balancing (formally: the usual arithmetic conversions) implicitly converts the signed type to unsigned.

Conversion from signed to unsigned is well-defined (6.3.1.3):

Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

For example, for 32 bit integers on a standard two's complement system, the max value of an unsigned integer is 2^32 - 1 (4294967295, UINT_MAX in limits.h). One more than the maximum value is 2^32. And -1 + 2^32 = 4294967295, so the literal -1 is converted to an unsigned int with the value 4294967295. Which is larger than 0.


When you switch types to short however, you end up with a small integer type. This is the difference between the two examples. Whenever a small integer type is part of an expression, the integer promotion rule implicitly converts it to a larger int (6.3.1.1):

If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.

If short is smaller than int on the given platform (as is the case on 32- and 64-bit systems), any short or unsigned short will therefore always be converted to int, because all of its values fit in one.

So for the expression if (x < u), you actually end up with if((int)x < (int)u), which behaves as expected (-1 is less than 0).

answered Nov 08 '22 by Lundin


You're running into C's integer promotion rules.

Operators on types smaller than int automatically promote their operands to int or unsigned int. See comments for more detailed explanations. There is a further step for binary (two-operand) operators if the types still don't match after that (e.g. unsigned int vs. int). I won't try to summarize the rules in more detail than that. See Lundin's answer.

This blog post covers this in more detail, with a similar example to yours: signed and unsigned char. It quotes the C99 spec:

If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.


You can play around with this more easily on something like godbolt, with a function that returns one or zero. Just look at the compiler output to see what ends up happening.

#define mytype short

int main() {
    unsigned mytype u = 0u;
    mytype x = -1;
    return (x < u);
}
answered Nov 08 '22 by Peter Cordes


Contrary to what you seem to assume, this is not a property of the particular widths of the types, here 2 bytes versus 4 bytes, but a question of which rules apply. The integer promotion rules state that short and unsigned short are converted to int on all platforms where their ranges of values fit into int. Since that is the case here, both values are preserved and obtain the type int. -1 is perfectly representable in int, as is 0. So the test finds that -1 is smaller than 0.

In the case of testing -1 against 0u, the usual arithmetic conversions choose the unsigned type as the common type to which both are converted. -1 converted to unsigned is the value UINT_MAX, which is larger than 0u.

This is a good example of why you should never use "narrow" types for arithmetic or comparison. Only use them if you have a severe size constraint. That will rarely be the case for simple variables, but mostly for large arrays, where you can really gain from storing values in a narrow type.

answered Nov 08 '22 by Jens Gustedt