 

Signed vs. unsigned values for counting in a loop

So I have within a program an ordinary for loop through a vector of objects (objects that are of a type I defined, if that is relevant):

for(int k = 0; k < objects.size(); k++){ ... }

...and when I compile, I get this warning:

warning: comparison between signed and unsigned integer expressions 

This makes sense, since I think size() for a vector returns a size_t. But why would it matter? Isn't a certain number of elements (or even memory chunks) an integer that you can count? More importantly, since my program has multiple such loops and happens to segfault a lot, could this be part of it?

norman, asked Oct 16 '12


People also ask

How do you know if a variable is signed or unsigned?

A numeric variable is signed if it can represent both positive and negative numbers, and unsigned if it can only represent non-negative numbers (zero or positive numbers).

When would you want to use an unsigned type?

Unsigned integers are used when we know that the value that we are storing will always be non-negative (zero or positive). Note: it is almost always the case that you could use a regular integer variable in place of an unsigned integer.

Can signed and unsigned integers store the same number of values?

For a given width (say 8 bits), both can store 256 different values, but signed integers use half of their range for negative numbers, whereas unsigned integers can store positive numbers roughly twice as large.

Why do we need signed and unsigned integer?

An unsigned integer can hold a larger positive value but no negative values. Unsigned types use the leading bit as part of the value, while signed types use the left-most bit to indicate whether the number is positive or negative. Signed integers can therefore hold both positive and negative numbers.


2 Answers

The problem arises when objects.size() returns a value greater than the maximum value representable in k. Since k is signed, its maximum value is only half that of a size_t [1].

Now, this may not happen in your particular application (on a typical 32-bit system, that would be upwards of two billion objects in your collection), but it's always a good idea to use the correct types.

[1] Pre-emptive rebuttal: Yes, this is only true for machines using typical two's-complement arithmetic, and for machines where int and size_t are represented using the same number of bits.

Greg Hewgill, answered Nov 15 '22


Well answered already, but I'll add my $0.02: The "correct" way to do this is:

for (std::vector<MyObject>::size_type i = 0; i < object.size(); ++i) { ... }

Only aspiring language lawyers would write that, and even they are likely to stop reading before they get to the good stuff.

With C++11 you can take advantage of decltype:

for (decltype(object.size()) i = 0; i < object.size(); ++i) { ... }

Or you can take advantage of auto:

for (auto i = object.size() - object.size(); i < object.size(); ++i) { ... }

Or you can just use size_t, but you still might have doubts about overflow, since vector<MyObject>'s size_type might be larger than size_t. (It isn't, but there are no guarantees):

for (size_t i = 0; i < object.size(); ++i) { ... }

So what's an honest programmer to do?

The absolutely easiest solution is the one which STL has been promoting since the beginning. Except that in the beginning, it was also a pain to write:

for (std::vector<MyObject>::iterator it = object.begin(); it != object.end(); ++it) { ... }

Now, C++11 actually does help you. You have some very nice alternatives, starting with the simple:

for (auto it = object.begin(); it != object.end(); ++it) { ... }

But it gets even better (drumroll, please)...:

for (auto& val : object) { ... }

And that's the one I'd use.


Edited to add:

Cory Nelson, in a comment, points out that it is also possible to cache the result of object.end() with:

for (auto it = object.begin(), end = object.end(); it != end; ++it) { ... }

It turns out that the code generated by the for (var : object) syntax is very similar to that proposed by Cory Nelson. (So I'd encourage him and you to just use the latter.)

However, that has subtly different semantics from the other ones, including the iteration which was the subject of the original post. If you're modifying the container during the iteration in a way that changes its size, then you have to think things through very carefully. Disaster is highly likely.

The only way to iterate a vector which might get modified during the iteration is to use integer indices, as in the original post. Other containers are more forgiving. You can iterate an STL map with a loop which calls object.end() on each iteration, and (as far as I know) it will work even in the face of insertions and deletions, but don't try that with an unordered_map, or a vector. It does work with a deque if you always push at the end and pop at the front, which is convenient if you're using the deque as the queue in a breadth-first walk; I'm not sure if you can get away with popping the deque at the back.

There really should be a simple summary somewhere, organized by container type, of the effect of container modifications on iterators and element pointers (which are not always the same as iterators). This is all specified by the standard, but I've never run into such a summary anywhere. If you find one, let me know.

rici, answered Nov 15 '22