
Should I use int or unsigned int when working with STL container?

Referring to this guide:
https://google.github.io/styleguide/cppguide.html#Integer_Types

Google suggests using int most of the time.
I try to follow this guide, and the only problem is with STL containers.


Example 1.

void setElement(int index, int value)
{
    if (index > someExternalVector.size()) return;
    ...
}

Comparing index with .size() generates a signed/unsigned comparison warning.

Example 2.

for (int i = 0; i < someExternalVector.size(); ++i)
{
    ...
}

Same warning between i and .size().


If I declare index or i as unsigned int, the warning goes away, but the type declaration propagates: I then have to declare more variables as unsigned int, which contradicts the guide and loses consistency.

The best way I can think of is to use a cast like:

if (index > static_cast<int>(someExternalVector.size()))

or

for (int i = 0; i < static_cast<int>(someExternalVector.size()); ++i)

But I really don't like the casts.

Any suggestion?


Some detailed thoughts below:

The advantage of using only signed integers is that I can avoid signed/unsigned warnings and casts, and be sure every value can be negative (to be consistent), so -1 can be used to represent invalid values.

In many cases loop counters are mixed with other constants or struct members, so inconsistent signedness becomes a problem: the code fills up with warnings and casts.

Marson Mao asked Jun 19 '13 03:06



2 Answers

Unsigned types have three characteristics, one of which is qualitatively 'good', one 'neutral', and one 'bad':

  • They can hold twice as many values as the same-sized signed type (good)
  • The size_t version (that is, 32-bit on a 32-bit machine, 64-bit on a 64-bit machine, etc) is useful for representing memory (addresses, sizes, etc) (neutral)
  • They wrap below 0, so subtracting 1 in a loop or using -1 to represent an invalid index can cause bugs (bad). Signed types can overflow too, and in C++ signed overflow is undefined behavior rather than a defined wraparound.
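The wraparound in the last point can be seen in a few lines. This is a minimal sketch; the function names zero_minus_one_wraps and size_minus_one are hypothetical and exist only for illustration.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// True if decrementing an unsigned zero lands on the largest unsigned value.
bool zero_minus_one_wraps() {
    unsigned i = 0;
    --i;  // well-defined for unsigned types: arithmetic is modulo 2^N
    return i == std::numeric_limits<unsigned>::max();
}

// size() - 1 on a container: the subtraction happens in the unsigned size
// type, so for an empty container the result is a huge value, not -1.
std::size_t size_minus_one(const std::vector<int>& v) {
    return v.size() - 1;
}
```

Calling size_minus_one on an empty vector is the usual way this bites in practice: a loop bounded by that result runs far past the end of the container instead of not running at all.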

The STL uses unsigned types for three reasons, the first two of which follow from the points above: so as not to limit the potential size of array-like classes such as vector and deque (although you have to question how often you would want more than 4 billion elements in a data structure); because a negative value will never be a valid index into most data structures; and because size_t is the correct type for anything to do with memory, such as the size of a struct, and for related things such as the length of a string (see below). That's not necessarily a good reason to use it for indexes or other non-memory purposes such as a loop variable. Using it there is best practice in C++ mostly by reverse construction: the containers and other methods use it, and once it is in use the rest of the code has to match to avoid the very problem you are encountering.

You should use a signed type when the value can become negative.

You should use an unsigned type when the value cannot become negative (possibly different to 'should not'.)

You should use size_t when handling memory sizes (the result of sizeof, and often things like string lengths). It is often chosen as the default unsigned type because it matches the platform the code is compiled for. For example, the length of a string is size_t because a string can only ever have 0 or more elements, and there is no reason to limit a string's length arbitrarily below what the platform can represent, such as a 16-bit length (0-65535) on a 32-bit platform. Note (thanks commenter Morwen): std::intptr_t and std::uintptr_t are conceptually similar. Where they are provided they are wide enough to hold a pointer, and they should be used for memory addresses if you want something that's not a pointer. Note 2 (thanks commenter rubenvb): a string can hold at most one element fewer than the maximum size_t value, because that maximum is reserved for npos. Details below.
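The npos remark can be checked directly. In this sketch, contains is a hypothetical helper; std::string::find and std::string::npos are the standard library names.

```cpp
#include <string>

// npos is the largest value of the string's unsigned size type, i.e. what
// you get by converting -1 to that type.
static_assert(std::string::npos == static_cast<std::string::size_type>(-1),
              "npos is the maximum value of size_type");

// Returns true when `needle` occurs somewhere in `haystack`.
bool contains(const std::string& haystack, const std::string& needle) {
    // find() returns an unsigned position, or npos when nothing matches.
    const std::string::size_type pos = haystack.find(needle);
    return pos != std::string::npos;
}
```

Storing the result of find() in an int and comparing it with -1 tends to work on common platforms, but it relies on a narrowing conversion; comparing against npos in the container's own type avoids that.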

This means that if you use -1 to represent an invalid value, you should use signed integers. If you use a loop to iterate backwards over your data, you should consider using a signed integer if you are not certain that the loop construct is correct (and as noted in one of the other answers, they are easy to get wrong). IMO, you should not resort to tricks to ensure the code works: if code requires tricks, that's often a danger signal. In addition, it will be harder to understand for those following you and reading your code. Both of these are reasons not to follow @Jamin Grey's answer.

Iterators

However, using integer-based loops to iterate over the contents of a data structure is the wrong way to do it in C++, so in a sense the argument over signed vs unsigned for loops is moot. You should use an iterator instead:

std::vector<foo> bar;
for (std::vector<foo>::const_iterator it = bar.begin(); it != bar.end(); ++it) {
  // Access using *it or it->, e.g.:
  const foo & a = *it;
}

When you do this, you don't need to worry about casts, signedness, etc.
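Assuming a C++11 or later compiler (an assumption; the answer does not say which standard is in use), a range-based for loop is a shorter spelling of the same iterator loop. A minimal sketch, with sum as a hypothetical helper:

```cpp
#include <vector>

// Sums the elements without ever naming an index type.
int sum(const std::vector<int>& values) {
    int total = 0;
    for (const int value : values) {  // iterates from begin() to end() internally
        total += value;
    }
    return total;
}
```

No index variable appears, so there is nothing to compare against size() and no signedness warning to silence.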

Iterators can be forward (as above) or reverse, for iterating backwards. A reverse loop has the same shape, with rbegin() and rend() in place of begin() and end(): the condition is it != bar.rend(), because rend() signals the end of the iteration, not the end of the underlying conceptual array, tree, or other structure.
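For backwards iteration, the standard containers provide rbegin() and rend(), which are used exactly like begin() and end(). The sketch below uses a hypothetical helper, reversed, to show the loop shape:

```cpp
#include <vector>

// Returns a copy of `values` in reverse order, walking it with reverse iterators.
std::vector<int> reversed(const std::vector<int>& values) {
    std::vector<int> out;
    out.reserve(values.size());
    for (std::vector<int>::const_reverse_iterator it = values.rbegin();
         it != values.rend(); ++it) {  // ++ moves towards the front
        out.push_back(*it);
    }
    return out;
}
```

The loop never subtracts from a size or tests an index against zero, so the question of whether to write >= 0 or > 0 does not arise.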

In other words, the answer to your question 'Should I use int or unsigned int when working with STL containers?' is 'Neither. Use iterators instead.' Read more about:

  • Why use iterators instead of array indices in C++?
  • Why again (some more interesting points in the answers to this question)
  • Iterators in general - the different kinds, how to use them, etc.

What's left?

If you don't use an integer type for loops, what's left? Your own values, which are dependent on your data, but which in your case include using -1 for an invalid value. This is simple. Use signed. Just be consistent.
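As an illustration of keeping your own values signed, here is a sketch of a lookup that reports "not found" as -1. The names kInvalidIndex and index_of are hypothetical, and the sketch assumes the container is small enough for its positions to fit in an int.

```cpp
#include <vector>

// Hypothetical sentinel meaning "no such element".
const int kInvalidIndex = -1;

// Returns the position of the first element equal to `target`,
// or kInvalidIndex when there is none.
int index_of(const std::vector<int>& values, int target) {
    int index = 0;
    for (std::vector<int>::const_iterator it = values.begin();
         it != values.end(); ++it, ++index) {
        if (*it == target) {
            return index;
        }
    }
    return kInvalidIndex;
}
```

The iterator drives the loop and the signed counter only rides along, so the signed value never has to be compared with size().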

I am a big believer in using natural types, such as enums, and signed integers fit into this. They match our conceptual expectation more closely. When your mind and the code are aligned, you are less likely to write buggy code and more likely to expressively write correct, clean code.

David answered Oct 21 '22 01:10


Use the type that the container returns. In this case, size_t - which is an integer type that is unsigned. (To be technical, it's std::vector<MyType>::size_type, but that's usually defined as size_t, so you're safe using size_t. unsigned is also fine as long as the size fits in it, which may not hold for very large containers on 64-bit platforms.)
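A sketch of what "use the type the container returns" looks like in a loop. count_equal is a hypothetical helper; size_type is the container's own member typedef.

```cpp
#include <cstddef>
#include <vector>

// Counts elements equal to `target`, indexing with the container's own size type.
std::size_t count_equal(const std::vector<int>& values, int target) {
    std::size_t matches = 0;
    // i and values.size() have the same unsigned type, so the comparison
    // involves no signed/unsigned conversion.
    for (std::vector<int>::size_type i = 0; i < values.size(); ++i) {
        if (values[i] == target) {
            ++matches;
        }
    }
    return matches;
}
```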

But in general, use the right tool for the right job. Is the 'index' ever supposed to be negative? If not, don't make it signed.

By the by, you don't have to type out 'unsigned int'. 'unsigned' is shorthand for the same variable type:

unsigned int myVar1;
unsigned myVar2;

The page linked to in the original question said:

Some people, including some textbook authors, recommend using unsigned types to represent numbers that are never negative. This is intended as a form of self-documentation. However, in C, the advantages of such documentation are outweighed by the real bugs it can introduce.

It's not just self-documentation, it's use the right tool for the right job. Saying that 'unsigned variables can cause bugs so don't use unsigned variables' is silly. Signed variables can also cause bugs. So can floats (more than integers). The only guaranteed bug-free code is code that doesn't exist.

Their example of why unsigned is evil, is this loop:

for (unsigned int i = foo.Length()-1; i >= 0; --i)

I have difficulty writing loops that iterate backwards, and I usually make mistakes in them (with signed or unsigned integers). Do I subtract one from size? Do I make it greater-than-or-equal-to 0, or just greater than? It's a sloppy situation to begin with.

So what do you do with code that you know you have problems with? You change your coding style to fix the problem, make it simpler, and make it easier to read, and make it easier to remember. There is a bug in the loop they posted. The bug is, they wanted to allow a value below zero, but they chose to make it unsigned. It's their mistake.

But here's a simple trick that makes it easier to read, remember, write, and run. With unsigned variables. Here's the intelligent thing to do (obviously, this is my opinion).

for(unsigned i = myContainer.size(); i--> 0; )
{
    std::cout << myContainer[i] << std::endl;
}

It's unsigned. It always works. No subtracting one from the starting size. No worrying about underflows. It just works. It's just smart. Do it right: don't stop using unsigned variables because someone somewhere once said they had a mistake with a for() loop and failed to train themselves not to make the mistake.

The trick to remembering it:

  1. Set 'i' to the size. (don't worry about subtracting one)
  2. Make 'i' point to 0 like an arrow. i --> 0 (it's a combination of post-decrementing (i--) and greater-than comparison (i > 0))
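To see which indices that loop actually visits, here is a small sketch that records them. backward_indices is a hypothetical helper, and it uses std::size_t rather than unsigned for the counter so that the starting size is never narrowed; that substitution is my choice, not part of the answer's snippet.

```cpp
#include <cstddef>
#include <vector>

// Returns the indices visited by the `i-- > 0` loop, in visiting order.
std::vector<std::size_t> backward_indices(const std::vector<int>& container) {
    std::vector<std::size_t> visited;
    // The condition tests the old value of i against 0, then decrements,
    // so the body sees size-1 down to 0 and is skipped for an empty container.
    for (std::size_t i = container.size(); i-- > 0; ) {
        visited.push_back(i);
    }
    return visited;
}
```

The final failed test still decrements i, which wraps it to the largest value of its type; that is harmless here only because i goes out of scope immediately afterwards.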

It's better to teach yourself tricks to code right than to throw away tools because you don't code right.

Which would you want to see in your code?

for(unsigned i = myContainer.size()-1; i >= 0; --i)

Or:

for(unsigned i = myContainer.size(); i--> 0; )

Not because it's fewer characters to type (that'd be silly), but because it's less mental clutter. It's simpler to mentally parse when skimming through code, and easier to spot mistakes.


Jamin Grey answered Oct 21 '22 00:10