I was watching this video, in which Bjarne Stroustrup says that unsigned ints are error-prone and lead to bugs, so you should only use them when you really need them. I've also read in one of the questions on Stack Overflow (but I don't remember which one) that using unsigned ints can lead to security bugs.
How do they lead to security bugs? Can someone explain it clearly with a suitable example?
In a mathematical operation in C++ (e.g. arithmetic or comparison) that mixes a signed and an unsigned integer of the same rank (for example int and unsigned int), the signed operand is converted to unsigned. Because unsigned integers cannot store negative numbers, this can result in loss of data: a negative value turns into a large positive one.
Unsigned integers are used when we know that the value that we are storing will always be non-negative (zero or positive). Note: it is almost always the case that you could use a regular integer variable in place of an unsigned integer.
"A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type."
The Java language specification requires its signed integers to be represented in two's complement format. Because of this, many basic operations are exactly the same whether the integer type is signed or unsigned.
One possible aspect is that unsigned integers can lead to somewhat hard-to-spot problems in loops, because underflow wraps around to large numbers. I cannot count (even with an unsigned integer!) how many times I have made a variant of this bug:
for(size_t i = foo.size(); i >= 0; --i)
...
Note that, for an unsigned i, i >= 0 is by definition always true, so this loop never terminates. (What tempts you into this in the first place is that if i is signed, the compiler warns about the conversion from the size_t returned by size().)
There are other reasons mentioned in "Danger – unsigned types used here!", the strongest of which, in my opinion, is the implicit type conversion between signed and unsigned.
One big factor is that it makes loop logic harder: imagine you want to iterate over all but the last element of an array (which does happen in the real world). So you write your function:
void fun (const std::vector<int> &vec) {
for (std::size_t i = 0; i < vec.size() - 1; ++i)
do_something(vec[i]);
}
Looks good, doesn't it? It even compiles cleanly with very high warning levels! So you put this in your code, all tests pass, and you forget about it.
Later on, somebody comes along and passes an empty vector to your function. With a signed integer, you hopefully would have noticed the sign-compare compiler warning, introduced the appropriate cast, and not published the buggy code in the first place.
But in your implementation with the unsigned integer, vec.size() - 1 wraps around, and the loop condition becomes i < SIZE_MAX. Disaster: out-of-bounds access, undefined behavior, and most likely a crash!
I want to know how they lead to security bugs.
This is also a security problem; in particular, it is a buffer overflow (a read past the end of vec). One way to exploit it would be if do_something did something the attacker can observe. The attacker might then be able to work out what input went into do_something, and in that way data the attacker should not be able to access would be leaked from your memory. This would be a scenario similar to the Heartbleed bug. (Thanks to ratchet freak for pointing that out in a comment.)
I'm not going to watch a video just to answer a question, but one issue is the confusing conversions which can happen if you mix signed and unsigned values. For example:
#include <iostream>
int main() {
unsigned n = 42;
int i = -42;
if (i < n) {
std::cout << "All is well\n";
} else {
std::cout << "ARITHMETIC IS BROKEN!\n";
}
}
The usual arithmetic conversions mean that i is converted to unsigned for the comparison, giving a large positive number and a surprising result.
Although this may be considered only a variant of the existing answers: referring to Scott Meyers's article "Signed and unsigned types in interfaces" (C++ Report, September 1995), it's particularly important to avoid unsigned types in interfaces.
The problem is that it becomes impossible to detect certain errors that clients of the interface could make (and if they could make them, they will make them).
The example given there is:
template <class T>
class Array {
public:
    Array(unsigned int size);
    ...
And a possible instantiation of this class:
int f();                    // f and g are functions that return
int g();                    // ints; what they do is unimportant
Array<double> a(f() - g()); // array size is f() - g()
The difference of the values returned by f() and g() might be negative, for an awful number of reasons. The constructor of the Array class will receive this difference as a value that is implicitly converted to unsigned. Thus, as the implementor of the Array class, one cannot distinguish between an erroneously passed value of -1 and a very large array allocation.
The big problem with unsigned int is that if you subtract 1 from an unsigned int 0, the result isn't a negative number, the result isn't less than the number you started with, but the result is the largest possible unsigned int value.
#include <cstdio>
int main() {
    unsigned int x = 0;
    unsigned int y = x - 1;  // wraps around to UINT_MAX
    if (y > x) printf("What a surprise!\n");
}
And this is what makes unsigned int error prone. Of course unsigned int works exactly as it is designed to work. It's absolutely safe if you know what you are doing and make no mistakes. But most people make mistakes.
If you are using a good compiler, turn on all the warnings it can produce (with GCC or Clang, for example, -Wall -Wextra -Wconversion -Wsign-conversion), and it will tell you when you do dangerous things that are likely to be mistakes.
The problem with unsigned integer types is that, depending upon their size, they may represent one of two different things:
1. Unsigned types smaller than int (e.g. uint8_t) hold numbers in the range 0..2ⁿ-1, and calculations with them will behave according to the rules of integer arithmetic provided they don't exceed the range of the int type. Under present rules, if such a calculation exceeds the range of an int, a compiler is allowed to do anything it likes with the code, even going so far as to negate the laws of time and causality (some compilers will do precisely that!), and this holds even if the result of the calculation would be assigned back to an unsigned type smaller than int.
2. unsigned int and larger types hold members of the abstract wrapping algebraic ring of integers congruent mod 2ⁿ; this effectively means that if a calculation goes outside the range 0..2ⁿ-1, the system will add or subtract whatever multiple of 2ⁿ would be required to get the value back in range.
Consequently, given uint32_t x=1, y=2; the expression x-y may have one of two meanings depending upon whether int is larger than 32 bits.
1. If int is larger than 32 bits, the expression will subtract the number 2 from the number 1, yielding the number -1. A variable of type uint32_t can't hold the value -1 regardless of the size of int (storing -1 would cause it to hold 0xFFFFFFFF), but unless or until the value is coerced to an unsigned type, it will behave like the signed quantity -1.
2. If int is 32 bits or smaller, the expression will yield the uint32_t value 0xFFFFFFFF, i.e. the uint32_t value which, when added to the uint32_t value 2, yields the uint32_t value 1.
IMHO, this problem could be solved cleanly if C and C++ were to define new unsigned types [e.g. unum32_t and uwrap32_t] such that a unum32_t would always behave as a number, regardless of the size of int (possibly requiring the right-hand operand of a subtraction or unary minus to be promoted to the next larger signed type if int is 32 bits or smaller), while a uwrap32_t would always behave as a member of an algebraic ring (blocking promotions even if int were larger than 32 bits). In the absence of such types, however, it's often impossible to write code that is both portable and clean, since portable code will often require type coercions all over the place.
Numeric conversion rules in C and C++ are a byzantine mess. Using unsigned types exposes yourself to that mess to a much greater extent than using purely signed types.
Take for example the simple case of a comparison between two variables, one signed and the other unsigned.
To take another example consider multiplying two unsigned integers of the same size.