 

Why do arithmetic operations on unsigned chars promote them to signed integers?

Tags:

c++

c

Many answers to similar questions point out that this is simply what the standard mandates. But I cannot understand the reasoning behind this decision by the standard's authors.

From my understanding, an unsigned char does not store its value in 2's complement form. So I don't see a situation where, let's say, XORing two unsigned chars would produce unexpected behavior. Therefore, promoting them to int just seems like a waste of space (in most cases) and CPU cycles.

Moreover, why int? If a variable is declared as unsigned, the unsignedness is clearly important to the programmer, so a promotion to unsigned int would, in my opinion, make more sense than a promotion to int.

[EDIT #1] As pointed out in the comments, promotion to unsigned int takes place if an int cannot represent all the values of an unsigned char.

[EDIT #2] To clarify the question: if this is about the performance benefit of operating on int rather than char, why is it in the standard? It could have been left as a suggestion to compiler designers for better optimization. As it stands, a compiler that didn't do this would not fully adhere to the C/C++ standard, even if it hypothetically supported every other required feature of the language. In a nutshell, I cannot figure out why I cannot operate directly on unsigned chars, so the requirement to promote them to ints seems unnecessary. Can you give me an example which proves this wrong?

asked May 27 '20 by DashwoodIce9



1 Answer

You can find this document online: Rationale for International Standard - Programming Languages - C (Revision 5.10, 2003).

Chapter 6.3 (pp. 44-45) is about conversions:

Between the publication of K&R and the development of C89, a serious divergence had occurred among implementations in the evolution of integer promotion rules. Implementations fell into two major camps which may be characterized as unsigned preserving and value preserving.

The difference between these approaches centered on the treatment of unsigned char and unsigned short when widened by the integer promotions, but the decision had an impact on the typing of constants as well (see §6.4.4.1).

The unsigned preserving approach calls for promoting the two smaller unsigned types to unsigned int. This is a simple rule, and yields a type which is independent of execution environment.

The value preserving approach calls for promoting those types to signed int if that type can properly represent all the values of the original type, and otherwise for promoting those types to unsigned int.

Thus, if the execution environment represents short as something smaller than int, unsigned short becomes int; otherwise it becomes unsigned int. Both schemes give the same answer in the vast majority of cases, and both give the same effective result in even more cases in implementations with two's complement arithmetic and quiet wraparound on signed overflow - that is, in most current implementations. In such implementations, differences between the two only appear when these two conditions are both true:

  1. An expression involving an unsigned char or unsigned short produces an int-wide result in which the sign bit is set, that is, either a unary operation on such a type, or a binary operation in which the other operand is an int or “narrower” type.

  2. The result of the preceding expression is used in a context in which its signedness is significant:

    • sizeof(int) < sizeof(long) and it is in a context where it must be widened to a long type, or

    • it is the left operand of the right-shift operator in an implementation where this shift is defined as arithmetic, or

    • it is either operand of /, %, <, <=, >, or >=.

In such circumstances a genuine ambiguity of interpretation arises. The result must be dubbed questionably signed, since a case can be made for either the signed or unsigned interpretation. Exactly the same ambiguity arises whenever an unsigned int confronts a signed int across an operator, and the signed int has a negative value. Neither scheme does any better, or any worse, in resolving the ambiguity of this confrontation. Suddenly, the negative signed int becomes a very large unsigned int, which may be surprising, or it may be exactly what is desired by a knowledgeable programmer. Of course, all of these ambiguities can be avoided by a judicious use of casts.

One of the important outcomes of exploring this problem is the understanding that high-quality compilers might do well to look for such questionable code and offer (optional) diagnostics, and that conscientious instructors might do well to warn programmers of the problems of implicit type conversions.

The unsigned preserving rules greatly increase the number of situations where unsigned int confronts signed int to yield a questionably signed result, whereas the value preserving rules minimize such confrontations. Thus, the value preserving rules were considered to be safer for the novice, or unwary, programmer. After much discussion, the C89 Committee decided in favor of value preserving rules, despite the fact that the UNIX C compilers had evolved in the direction of unsigned preserving.

QUIET CHANGE IN C89

A program that depends upon unsigned preserving arithmetic conversions will behave differently, probably without complaint. This was considered the most serious semantic change made by the C89 Committee to a widespread current practice.

For reference, you can find more details about those conversions updated to C11 in this answer by Lundin.

answered Sep 29 '22 by Bob__