Why do arithmetic operations on unsigned chars promote them to signed integers?

Tags:

c++

c

Many answers to similar questions point out that it is so due to the standard. But, I cannot understand the reasoning behind this decision by the standard setters.

From my understanding, an unsigned char does not store its value in two's complement form, so I don't see a situation where, say, XORing two unsigned chars would produce unexpected behavior. Promoting them to int therefore just seems like a waste of space (in most cases) and CPU cycles.

Moreover, why int? If a variable is declared unsigned, its unsignedness clearly matters to the programmer, so promotion to unsigned int would still make more sense than promotion to int, in my opinion.

[EDIT #1] As pointed out in the comments, promotion to unsigned int takes place if an int cannot represent every value of the unsigned char.

[EDIT #2] To clarify the question: if this is about the performance benefit of operating on int rather than char, then why is it in the standard? It could have been given as a suggestion to compiler designers for better optimization. As it stands, a compiler that did not do this would not fully adhere to the C/C++ standard, even if it hypothetically supported every other required feature of the language. In a nutshell, I cannot figure out why I cannot operate directly on unsigned chars, so the requirement to promote them to ints seems unnecessary. Can you give me an example that proves this wrong?

836

asked May 27 '20 09:05

DashwoodIce9

1 Answer

You can find this document on-line: Rationale for International Standard - Programming Languages - C (Revision 5.10, 2003).

Chapter 6.3 (pp. 44-45) is about conversions. Quoting the relevant passage:

Between the publication of K&R and the development of C89, a serious divergence had occurred among implementations in the evolution of integer promotion rules. Implementations fell into two major camps which may be characterized as unsigned preserving and value preserving.

The difference between these approaches centered on the treatment of unsigned char and unsigned short when widened by the integer promotions, but the decision had an impact on the typing of constants as well (see §6.4.4.1).

The unsigned preserving approach calls for promoting the two smaller unsigned types to unsigned int. This is a simple rule, and yields a type which is independent of execution environment.

The value preserving approach calls for promoting those types to signed int if that type can properly represent all the values of the original type, and otherwise for promoting those types to unsigned int.

Thus, if the execution environment represents short as something smaller than int, unsigned short becomes int; otherwise it becomes unsigned int. Both schemes give the same answer in the vast majority of cases, and both give the same effective result in even more cases in implementations with two's complement arithmetic and quiet wraparound on signed overflow - that is, in most current implementations. In such implementations, differences between the two only appear when these two conditions are both true:

1. An expression involving an unsigned char or unsigned short produces an int-wide result in which the sign bit is set, that is, either a unary operation on such a type, or a binary operation in which the other operand is an int or “narrower” type.

2. The result of the preceding expression is used in a context in which its signedness is significant:

   • sizeof(int) < sizeof(long) and it is in a context where it must be widened to a long type, or

   • it is the left operand of the right-shift operator in an implementation where this shift is defined as arithmetic, or

   • it is either operand of /, %, <, <=, >, or >=.

In such circumstances a genuine ambiguity of interpretation arises. The result must be dubbed questionably signed, since a case can be made for either the signed or unsigned interpretation. Exactly the same ambiguity arises whenever an unsigned int confronts a signed int across an operator, and the signed int has a negative value. Neither scheme does any better, or any worse, in resolving the ambiguity of this confrontation. Suddenly, the negative signed int becomes a very large unsigned int, which may be surprising, or it may be exactly what is desired by a knowledgeable programmer. Of course, all of these ambiguities can be avoided by a judicious use of casts.

One of the important outcomes of exploring this problem is the understanding that high-quality compilers might do well to look for such questionable code and offer (optional) diagnostics, and that conscientious instructors might do well to warn programmers of the problems of implicit type conversions.

The unsigned preserving rules greatly increase the number of situations where unsigned int confronts signed int to yield a questionably signed result, whereas the value preserving rules minimize such confrontations. Thus, the value preserving rules were considered to be safer for the novice, or unwary, programmer. After much discussion, the C89 Committee decided in favor of value preserving rules, despite the fact that the UNIX C compilers had evolved in the direction of unsigned preserving.

QUIET CHANGE IN C89

A program that depends upon unsigned preserving arithmetic conversions will behave differently, probably without complaint. This was considered the most serious semantic change made by the C89 Committee to a widespread current practice.

For reference, you can find more details about those conversions updated to C11 in this answer by Lundin.

177

answered Sep 29 '22 00:09

Bob__
