Recently, during a refactoring session, I was looking over some code I wrote and noticed several things:

- Some functions used unsigned char to enforce values in the interval [0, 255].
- Other functions used int or long data types, with if statements inside the functions to silently clamp the values to valid ranges.
- Values that had no logical upper bound but a known and definite non-negative lower bound were declared as an unsigned data type (int or long, depending on whether the values could exceed 4,000,000,000).

The inconsistency is unnerving. Is this a good practice that I should continue? Should I rethink the logic and stick to using int or long with appropriate non-notifying clamping?

A note on the use of "appropriate": there are cases where I use signed data types and throw notifying exceptions when the values go out of range, but these are reserved for division by zero and constructors.
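For concreteness, the silent-clamping pattern in question looks something like the sketch below: a hypothetical setter of my own (assuming C++17 for std::clamp), not code from the original question.

    #include <algorithm>

    // Hypothetical example of the "non-notifying clamp" pattern:
    // out-of-range inputs are quietly forced into [0, 255].
    void set_level(int& stored, int requested) {
        stored = std::clamp(requested, 0, 255);
    }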
Unsigned integers are used when we know that the value that we are storing will always be non-negative (zero or positive).
The Google C++ style guide recommends avoiding unsigned integers except in situations that definitely require it (for example: file formats often store sizes in uint32_t or uint64_t -- no point in wasting a signedness bit that will never be used).
In C++, unsigned is a type specifier that restricts a type to non-negative values, i.e. positive numbers and zero. An unsigned int is typically 32 bits on mainstream platforms, although the standard only guarantees at least 16. "Signed" indicates a variable that can hold both negative and positive values; "unsigned" indicates one that can hold only non-negative values. The specifier can be applied to most of the integral types, including int, char, short, and long.
In C and C++, signed and unsigned integer types have certain specific characteristics.
Signed types have bounds far from zero, and operations that exceed those bounds have undefined behavior (or implementation-defined in the case of conversions).
Unsigned types have a lower bound of zero and an upper bound far from zero, and operations that exceed those bounds quietly wrap around.
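As a quick illustration of that difference (my example, not the answer's): incrementing past the unsigned upper bound wraps, while the signed equivalent is undefined behavior.

    #include <climits>
    #include <iostream>

    int main() {
        unsigned int u = UINT_MAX;
        u = u + 1;                // well-defined: wraps around to 0
        std::cout << u << '\n';   // prints 0

        int s = INT_MAX;
        // s = s + 1;             // undefined behavior: anything may happen
    }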
Often what you really want is a particular range of values with some particular behavior when operations exceed those bounds (saturation, signaling an error, etc.). Neither signed nor unsigned types are entirely suitable for such requirements. And operations that mix signed and unsigned types can be confusing; the rules for such operations are defined by the language, but they're not always obvious.
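One concrete source of that confusion is the usual arithmetic conversions in mixed comparisons; a small demonstration of my own:

    #include <iostream>

    int main() {
        int s = -1;
        unsigned int u = 1;
        // s is converted to unsigned int, becoming a huge value,
        // so the comparison is false even though -1 < 1 mathematically.
        std::cout << (s < u) << '\n';   // prints 0
    }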
Unsigned types can be problematic because the lower bound is zero, so operations with reasonable values (nowhere near the upper bound) can behave in unexpected ways. For example, this:
    for (unsigned int u = 10; u >= 0; u--) {   // u >= 0 is always true for an unsigned type
        // ...
    }
is an infinite loop.
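When u reaches 0, the decrement wraps it around to UINT_MAX, so the condition never fails. One common fix with an unsigned counter (my suggestion, not from the answer) is to test before decrementing:

    // Visits u = 10, 9, ..., 0, then stops: the test fails once u == 0.
    for (unsigned int u = 11; u-- > 0; ) {
        // ...
    }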
One approach is to use signed types for everything that doesn't absolutely require an unsigned representation, choosing a type wide enough to hold the values you need. This avoids problems with signed/unsigned mixed operations. Java, for example, enforces this approach by not having unsigned types at all. (Personally, I think that decision was overkill, but I can see the advantages of it.)
Another approach is to use unsigned types for values that logically cannot be negative, and be very careful with expressions that might underflow or that mix signed and unsigned types.
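A typical trap under this second approach is subtraction (an example of mine): if a and b are unsigned and b > a, then a - b wraps to a huge value, so the comparison has to happen before the arithmetic.

    #include <cstddef>

    // Hypothetical helper: absolute difference of two sizes without wrapping.
    std::size_t distance(std::size_t a, std::size_t b) {
        return (a > b) ? a - b : b - a;   // compare first, then subtract
    }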
(Yet another is to define your own types with exactly the behavior you want, but that has costs.)
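Such a type might look roughly like the following: a minimal sketch of a saturating byte of my own devising, not a complete or production-ready implementation.

    #include <algorithm>

    // Hypothetical SatByte: arithmetic saturates into [0, 255] instead of wrapping.
    struct SatByte {
        int v = 0;  // stored widened so intermediate math cannot overflow

        explicit SatByte(int x) : v(std::clamp(x, 0, 255)) {}

        SatByte operator+(SatByte o) const { return SatByte(v + o.v); }
        SatByte operator-(SatByte o) const { return SatByte(v - o.v); }
    };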
As John Sallay's answer says, consistency is probably more important than which particular approach you take.
I wish I could give a "this way is right, that way is wrong" answer, but there really isn't one.