
Why are unsigned integers error prone?

I was looking at this video. Bjarne Stroustrup says that unsigned ints are error prone and lead to bugs, so you should only use them when you really need them. I've also read in one of the questions on Stack Overflow (but I don't remember which one) that using unsigned ints can lead to security bugs.

How do they lead to security bugs? Can someone clearly explain it by giving a suitable example?

Destructor asked May 22 '15 11:05



7 Answers

One possible aspect is that unsigned integers can lead to somewhat hard-to-spot problems in loops, because the underflow leads to large numbers. I cannot count (even with an unsigned integer!) how many times I made a variant of this bug

for(size_t i = foo.size(); i >= 0; --i)
    ...

Note that, by definition, i >= 0 is always true for an unsigned i, so the loop never terminates. (What tempts people into this in the first place is that, if i is signed, the compiler will warn about a possible overflow when it is mixed with the size_t returned by size().)
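
One common way to write such a reverse loop with an unsigned index is to test and decrement in a single step, so the index never needs to represent -1. The following is only a sketch of that idiom; reversed_copy is a hypothetical helper invented for illustration, not something from the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the elements of v in reverse order,
// using an unsigned index without ever evaluating "i >= 0".
std::vector<int> reversed_copy(const std::vector<int>& v) {
    std::vector<int> out;
    out.reserve(v.size());
    // "i-- > 0" compares the old value of i with 0 and then decrements,
    // so the body sees size()-1 down to 0. When the old value is 0 the
    // loop stops; i has wrapped at that point but is never used again.
    for (std::size_t i = v.size(); i-- > 0;) {
        out.push_back(v[i]);
    }
    assert(out.size() == v.size());
    return out;
}
```

Calling reversed_copy on an empty vector simply skips the loop body, because the very first test compares 0 > 0.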

There are other reasons mentioned in "Danger – unsigned types used here!", the strongest of which, in my opinion, is the implicit type conversion between signed and unsigned.

Ami Tavory answered Oct 06 '22 02:10


One big factor is that it makes loop logic harder: Imagine you want to iterate over all but the last element of an array (which does happen in the real world). So you write your function:

void fun (const std::vector<int> &vec) {
    for (std::size_t i = 0; i < vec.size() - 1; ++i)
        do_something(vec[i]);
}

Looks good, doesn't it? It even compiles cleanly with very high warning levels! (Live) So you put this in your code, all tests run smoothly and you forget about it.

Now, later on, somebody comes along and passes an empty vector to your function. With a signed integer, you would hopefully have noticed the sign-compare compiler warning, introduced the appropriate cast, and never published the buggy code in the first place.

But in your implementation with the unsigned integer, vec.size() - 1 wraps around and the loop condition becomes i < SIZE_MAX. Disaster, UB and most likely a crash!
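
One possible way to avoid the wraparound while keeping the unsigned index is to move the "- 1" to the other side of the comparison as "+ 1". This is a sketch under assumptions: visit_all_but_last is a hypothetical stand-in for fun, and it counts visits instead of calling do_something so the behaviour is easy to check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical variant of fun() that returns how many elements it visited.
std::size_t visit_all_but_last(const std::vector<int>& vec) {
    std::size_t visited = 0;
    // "i + 1 < vec.size()" is false for an empty vector (1 < 0),
    // and i + 1 never exceeds vec.size(), so nothing wraps around.
    for (std::size_t i = 0; i + 1 < vec.size(); ++i) {
        assert(i < vec.size());  // the index is always in range
        ++visited;               // stands in for do_something(vec[i])
    }
    return visited;
}
```

For an empty or single-element vector the loop body does not run at all; for three elements it runs twice.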

I want to know how they lead to security bugs?

This is also a security problem; in particular, it is a buffer overflow. One way to exploit it would be if do_something did something that the attacker can observe. They might be able to work out what input went into do_something, and in that way data the attacker should not be able to access would be leaked from your memory. This would be a scenario similar to the Heartbleed bug. (Thanks to ratchet freak for pointing that out in a comment.)

Baum mit Augen answered Oct 06 '22 00:10


I'm not going to watch a video just to answer a question, but one issue is the confusing conversions which can happen if you mix signed and unsigned values. For example:

#include <iostream>

int main() {
    unsigned n = 42;
    int i = -42;
    if (i < n) {
        std::cout << "All is well\n";
    } else {
        std::cout << "ARITHMETIC IS BROKEN!\n";
    }
}

The usual arithmetic conversions mean that i is converted to unsigned for the comparison, giving a large positive number and a surprising result: the program prints "ARITHMETIC IS BROKEN!".
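
If a mixed comparison cannot be avoided, one option is to compare by mathematical value explicitly. The helper below is a minimal sketch; value_less is a hypothetical name chosen for this example. C++20 also provides std::cmp_less in <utility> for the same purpose.

```cpp
// Hypothetical helper: compares a signed and an unsigned value
// by their mathematical values rather than by converted bit patterns.
bool value_less(int lhs, unsigned rhs) {
    // A negative int is smaller than every unsigned value.
    if (lhs < 0) {
        return true;
    }
    // lhs is non-negative here, so the conversion preserves its value.
    return static_cast<unsigned>(lhs) < rhs;
}
```

With this helper, value_less(-42, 42u) is true, which matches the result most readers expect from i < n.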

Mike Seymour answered Oct 06 '22 02:10


Although it may only be considered as a variant of the existing answers: Referring to "Signed and unsigned types in interfaces," C++ Report, September 1995 by Scott Meyers, it's particularly important to avoid unsigned types in interfaces.

The problem is that it becomes impossible to detect certain errors that clients of the interface could make (and if they could make them, they will make them).

The example given there is:

template <class T>
  class Array {
  public:
      Array(unsigned int size);
  ...

and a possible instantiation of this class

int f(); // f and g are functions that return
int g(); // ints; what they do is unimportant
Array<double> a(f()-g()); // array size is f()-g()

The difference of the values returned by f() and g() might be negative, for an awful number of reasons. The constructor of the Array class will receive this difference as a value that is implicitly converted to unsigned. Thus, as the implementor of the Array class, one cannot distinguish between an erroneously passed value of -1 and a request for a very large array allocation.
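
To illustrate the point, here is a sketch of how a signed parameter lets the implementor detect the mistake. This is not the code from the Meyers article: the std::ptrdiff_t parameter, the checked_size helper, the std::vector storage and the choice of exception are all assumptions made for this example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical variant of the Array interface with a signed size parameter.
template <class T>
class Array {
public:
    explicit Array(std::ptrdiff_t size) : data_(checked_size(size)) {}

    std::size_t size() const { return data_.size(); }

private:
    // A negative argument is still negative here, so it can be rejected
    // instead of being mistaken for a huge allocation request.
    static std::size_t checked_size(std::ptrdiff_t size) {
        if (size < 0) {
            throw std::invalid_argument("Array: negative size");
        }
        return static_cast<std::size_t>(size);
    }

    std::vector<T> data_;
};
```

With this signature, Array<double> a(f()-g()) throws when the difference is negative, rather than attempting an enormous allocation.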

Marco13 answered Oct 06 '22 01:10


The big problem with unsigned int is that if you subtract 1 from an unsigned int 0, the result isn't a negative number, the result isn't less than the number you started with, but the result is the largest possible unsigned int value.

unsigned int x = 0;
unsigned int y = x - 1;

if (y > x) printf ("What a surprise! \n");

And this is what makes unsigned int error prone. Of course unsigned int works exactly as it is designed to work. It's absolutely safe if you know what you are doing and make no mistakes. But most people make mistakes.

If you are using a good compiler and turn on all the warnings it can produce, it will tell you when you do dangerous things that are likely to be mistakes.

gnasher729 answered Oct 06 '22 02:10


The problem with unsigned integer types is that depending upon their size they may represent one of two different things:

  1. Unsigned types smaller than int (e.g. uint8_t) hold numbers in the range 0..2ⁿ-1, and calculations with them will behave according to the rules of integer arithmetic provided they don't exceed the range of the int type. Under present rules, if such a calculation exceeds the range of an int, the behaviour is undefined: a compiler is allowed to do anything it likes with the code, even going so far as to negate the laws of time and causality (some compilers will do precisely that!), and this holds even if the result of the calculation would be assigned back to an unsigned type smaller than int.
  2. Unsigned types unsigned int and larger hold members of the abstract wrapping algebraic ring of integers congruent mod 2ⁿ; this effectively means that if a calculation goes outside the range 0..2ⁿ-1, the system will add or subtract whatever multiple of 2ⁿ would be required to get the value back in range.

Consequently, given uint32_t x=1, y=2; the expression x-y may have one of two meanings depending upon whether int is larger than 32 bits.

  1. If int is larger than 32 bits, the expression will subtract the number 2 from the number 1, yielding the number -1. Note that a variable of type uint32_t can't hold the value -1 regardless of the size of int, and storing -1 in such a variable would cause it to hold 0xFFFFFFFF; but unless or until the value is coerced to an unsigned type, it will behave like the signed quantity -1.
  2. If int is 32 bits or smaller, the expression will yield a uint32_t value which, when added to the uint32_t value 2, will yield the uint32_t value 1 (i.e. the uint32_t value 0xFFFFFFFF).
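
The two meanings can be requested explicitly, so that the result no longer depends on the size of int. This is a sketch; the function names wrapping_diff and numeric_diff are hypothetical and chosen only for this example.

```cpp
#include <cstdint>

// Always wraps: the cast reduces the result modulo 2^32 whether or not
// the operands were promoted to a wider int before subtracting.
std::uint32_t wrapping_diff(std::uint32_t x, std::uint32_t y) {
    return static_cast<std::uint32_t>(x - y);
}

// Always behaves like a number: both operands are widened to a signed
// 64-bit type first, so 1 - 2 yields -1.
std::int64_t numeric_diff(std::uint32_t x, std::uint32_t y) {
    return static_cast<std::int64_t>(x) - static_cast<std::int64_t>(y);
}
```

For x=1 and y=2, wrapping_diff yields 0xFFFFFFFF and numeric_diff yields -1. This is the kind of explicit coercion that portable code ends up needing.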

IMHO, this problem could be solved cleanly if C and C++ were to define new unsigned types [e.g. unum32_t and uwrap32_t] such that a unum32_t would always behave as a number, regardless of the size of int (possibly requiring the right-hand operand of a subtraction or unary minus to be promoted to the next larger signed type if int is 32 bits or smaller), while a uwrap32_t would always behave as a member of an algebraic ring (blocking promotions even if int were larger than 32 bits). In the absence of such types, however, it's often impossible to write code which is both portable and clean, since portable code will often require type coercions all over the place.

supercat answered Oct 06 '22 01:10


Numeric conversion rules in C and C++ are a byzantine mess. Using unsigned types exposes you to that mess to a much greater extent than using purely signed types.

Take for example the simple case of a comparison between two variables, one signed and the other unsigned.

  • If both operands are smaller than int then they will both be converted to int and the comparison will give numerically correct results.
  • If the unsigned operand is smaller than the signed operand then both will be converted to the type of the signed operand and the comparison will give numerically correct results.
  • If the unsigned operand is greater than or equal in size to the signed operand and also greater than or equal in size to int then both will be converted to the type of the unsigned operand. If the value of the signed operand is less than zero this will lead to numerically incorrect results.
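
The three cases can be sketched as follows. The example assumes a typical platform where short is narrower than int and long long is wider than unsigned; the static_asserts make those assumptions explicit, and the function names are hypothetical.

```cpp
#include <type_traits>

// Platform assumptions, checked at compile time.
static_assert(sizeof(unsigned short) < sizeof(int), "assumes short is narrower than int");
static_assert(sizeof(unsigned) < sizeof(long long), "assumes long long is wider than unsigned");

// Case 1: both operands are narrower than int, so both are promoted to int.
static_assert(std::is_same<decltype(static_cast<short>(0) + static_cast<unsigned short>(0)), int>::value,
              "short op unsigned short is computed in int");
bool both_narrow(short s, unsigned short u) { return s < u; }

// Case 2: the unsigned operand is narrower than the signed one,
// so both are converted to the signed type.
static_assert(std::is_same<decltype(0LL + 0u), long long>::value,
              "long long op unsigned is computed in long long");
bool wider_signed(long long s, unsigned u) { return s < u; }

// Case 3: same width, so the signed operand is converted to unsigned.
static_assert(std::is_same<decltype(0 + 0u), unsigned>::value,
              "int op unsigned is computed in unsigned");
bool same_width(int s, unsigned u) {
    // This cast is what the compiler does implicitly for "s < u";
    // it is written out here to avoid a sign-compare warning.
    return static_cast<unsigned>(s) < u;
}
```

Comparing -1 with 1 gives true in the first two cases and false in the third, where -1 has become the largest unsigned value.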

To take another example consider multiplying two unsigned integers of the same size.

  • If the operand size is greater than or equal to the size of int then the multiplication will have defined wraparound semantics.
  • If the operand size is smaller than int but greater than or equal to half the size of int then there is the potential for undefined behaviour.
  • If the operand size is less than half the size of int then the multiplication will produce numerically correct results. Assigning this result back to a variable of the original unsigned type will produce defined wraparound semantics.
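
The middle case is the dangerous one: with a 32-bit int, two uint16_t operands are promoted to signed int, and 65535 * 65535 overflows it. One common workaround is to force the arithmetic into unsigned before multiplying. The sketch below uses a hypothetical helper name, wrapping_mul16.

```cpp
#include <cstdint>

// Hypothetical helper: multiplies two 16-bit values with defined wraparound.
std::uint16_t wrapping_mul16(std::uint16_t a, std::uint16_t b) {
    // Starting the expression with 1u converts the operands to unsigned,
    // so the multiplication wraps instead of overflowing a signed int.
    // The final cast keeps only the low 16 bits of the product.
    return static_cast<std::uint16_t>(1u * a * b);
}
```

The product of 65535 and 65535 is 0xFFFE0001, so the helper returns 1 after truncation to 16 bits.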

plugwash answered Oct 06 '22 02:10