Recently, during a refactoring session, I was looking over some code I wrote and noticed several things:

- Some functions used unsigned char to enforce values in the interval [0, 255].
- Other functions used int or long data types, with if statements inside the functions to silently clamp the values to valid ranges.
- Values that had no logical upper bound but a known and definite non-negative lower bound were declared as an unsigned data type (int or long, depending on whether the values could exceed 4,000,000,000).

The inconsistency is unnerving. Is this a good practice that I should continue? Should I rethink the logic and stick to using int or long with appropriate non-notifying clamping?

A note on the use of "appropriate": there are cases where I use signed data types and throw notifying exceptions when the values go out of range, but these are reserved for division by zero and constructors.
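For concreteness, the silent-clamping pattern in question looks something like the sketch below: a hypothetical setter of my own (assuming C++17 for std::clamp), not code from the original question.

    #include <algorithm>

    // Hypothetical example of the "non-notifying clamp" pattern:
    // out-of-range inputs are quietly forced into [0, 255].
    void set_level(int& stored, int requested) {
        stored = std::clamp(requested, 0, 255);
    }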
Unsigned integers are used when we know that the value that we are storing will always be non-negative (zero or positive).
The Google C++ style guide recommends avoiding unsigned integers except in situations that definitely require it (for example: file formats often store sizes in uint32_t or uint64_t -- no point in wasting a signedness bit that will never be used).
In C++, unsigned is a type specifier that restricts a type to non-negative values, i.e. positive numbers and zero. An unsigned int is typically 32 bits on mainstream platforms, although the standard only guarantees at least 16. "Signed" indicates a variable that can hold both negative and positive values; "unsigned" indicates one that can hold only non-negative values. The specifier can be applied to most of the integral types, including int, char, short, and long.
In C and C++, signed and unsigned integer types have certain specific characteristics.
Signed types have bounds far from zero, and operations that exceed those bounds have undefined behavior (or implementation-defined in the case of conversions).
Unsigned types have a lower bound of zero and an upper bound far from zero, and operations that exceed those bounds quietly wrap around.
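As a quick illustration of that difference (my example, not the answer's): incrementing past the unsigned upper bound wraps, while the signed equivalent is undefined behavior.

    #include <climits>
    #include <iostream>

    int main() {
        unsigned int u = UINT_MAX;
        u = u + 1;                // well-defined: wraps around to 0
        std::cout << u << '\n';   // prints 0

        int s = INT_MAX;
        // s = s + 1;             // undefined behavior: anything may happen
    }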
Often what you really want is a particular range of values with some particular behavior when operations exceed those bounds (saturation, signaling an error, etc.). Neither signed nor unsigned types are entirely suitable for such requirements. And operations that mix signed and unsigned types can be confusing; the rules for such operations are defined by the language, but they're not always obvious.
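One concrete source of that confusion is the usual arithmetic conversions in mixed comparisons; a small demonstration of my own:

    #include <iostream>

    int main() {
        int s = -1;
        unsigned int u = 1;
        // s is converted to unsigned int, becoming a huge value,
        // so the comparison is false even though -1 < 1 mathematically.
        std::cout << (s < u) << '\n';   // prints 0
    }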
Unsigned types can be problematic because the lower bound is zero, so operations with reasonable values (nowhere near the upper bound) can behave in unexpected ways. For example, this:
    for (unsigned int u = 10; u >= 0; u--) {   // u >= 0 is always true for an unsigned type
        // ...
    }
is an infinite loop.
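When u reaches 0, the decrement wraps it around to UINT_MAX, so the condition never fails. One common fix with an unsigned counter (my suggestion, not from the answer) is to test before decrementing:

    // Visits u = 10, 9, ..., 0, then stops: the test fails once u == 0.
    for (unsigned int u = 11; u-- > 0; ) {
        // ...
    }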
One approach is to use signed types for everything that doesn't absolutely require an unsigned representation, choosing a type wide enough to hold the values you need. This avoids problems with signed/unsigned mixed operations. Java, for example, enforces this approach by not having unsigned types at all. (Personally, I think that decision was overkill, but I can see the advantages of it.)
Another approach is to use unsigned types for values that logically cannot be negative, and be very careful with expressions that might underflow or that mix signed and unsigned types.
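A typical trap under this second approach is subtraction (an example of mine): if a and b are unsigned and b > a, then a - b wraps to a huge value, so the comparison has to happen before the arithmetic.

    #include <cstddef>

    // Hypothetical helper: absolute difference of two sizes without wrapping.
    std::size_t distance(std::size_t a, std::size_t b) {
        return (a > b) ? a - b : b - a;   // compare first, then subtract
    }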
(Yet another is to define your own types with exactly the behavior you want, but that has costs.)
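Such a type might look roughly like the following: a minimal sketch of a saturating byte of my own devising, not a complete or production-ready implementation.

    #include <algorithm>

    // Hypothetical SatByte: arithmetic saturates into [0, 255] instead of wrapping.
    struct SatByte {
        int v = 0;  // stored widened so intermediate math cannot overflow

        explicit SatByte(int x) : v(std::clamp(x, 0, 255)) {}

        SatByte operator+(SatByte o) const { return SatByte(v + o.v); }
        SatByte operator-(SatByte o) const { return SatByte(v - o.v); }
    };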
As John Sallay's answer says, consistency is probably more important than which particular approach you take.
I wish I could give a "this way is right, that way is wrong" answer, but there really isn't one.