Referring to this guide:
https://google.github.io/styleguide/cppguide.html#Integer_Types
Google suggests to use int
in the most of time.
I try to follow this guide and the only problem is with STL containers.
Example 1.
void setElement(int index, int value)
{
if (index > someExternalVector.size()) return;
...
}
Comparing index
and .size()
is generating a warning.
Example 2.
for (int i = 0; i < someExternalVector.size(); ++i)
{
...
}
Same warning between i
and .size()
.
If I declare index
or i
as unsigned int
, the warning is off, but the type declaration will propagate, then I have to declare more variables as unsigned int
, then it contradicts the guide and loses consistency.
The best way I can think is to use a cast like:
if (index > static_cast<int>(someExternalVector.size())
or
for (int i = 0; i < static_cast<int>(someExternalVector.size()); ++i)
But I really don't like the casts.
Any suggestion?
Some detailed thoughts below:
To advantage to use only signed integer is like: I can avoid signed/unsigned warnings, castings, and be sure every value can be negative(to be consistent), so -1 could be used to represent invalid values.
There are many cases that the usage of loop counters are mixed with some other constants or struct members. So it would be problematic if signed/unsigned is not consistent. There will be full of warnings and castings.
The Google C++ style guide recommends avoiding unsigned integers except in situations that definitely require it (for example: file formats often store sizes in uint32_t or uint64_t -- no point in wasting a signedness bit that will never be used).
uint32_t is used when you must have a 32 bit unsigned. int or unsigned int for general purposes when you don't need a guaranteed size and unsigned only if you can ensure that you won't have negative numbers.
Unsigned integers are used when we know that the value that we are storing will always be non-negative (zero or positive).
In summary, signed is a good general choice - even when you're dead sure all the numbers are positive - if you're going to do arithmetic on the variable (like in a typical for loop case). unsigned starts to make more sense when: You're going to do bitwise things like masks, or.
Unsigned types have three characteristics, one of which is qualitatively 'good' and one of which is qualitatively 'bad':
size_t
version (that is, 32-bit on a 32-bit machine, 64-bit on a 64-bit machine, etc) is useful for representing memory (addresses, sizes, etc) (neutral)The STL uses unsigned types because of the first two points above: in order to not limit the potential size of array-like classes such as vector
and deque
(although you have to question how often you would want 4294967296 elements in a data structure); because a negative value will never be a valid index into most data structures; and because size_t
is the correct type to use for representing anything to do with memory, such as the size of a struct, and related things such as the length of a string (see below.) That's not necessarily a good reason to use it for indexes or other non-memory purposes such as a loop variable. The reason it's best practice to do so in C++ is kind of a reverse construction, because it's what's used in the containers as well as other methods, and once used the rest of the code has to match to avoid the same problem you are encountering.
You should use a signed type when the value can become negative.
You should use an unsigned type when the value cannot become negative (possibly different to 'should not'.)
You should use size_t
when handling memory sizes (the result of sizeof
, often things like string lengths, etc.) It is often chosen as a default unsigned type to use, because it matches the platform the code is compiled for. For example, the length of a string is size_t
because a string can only ever have 0 or more elements, and there is no reason to limit a string's length method arbitrarily shorter than what can be represented on the platform, such as a 16-bit length (0-65535) on a 32-bit platform. Note (thanks commenter Morwen) std::intptr_t
or std::uintptr_t
which are conceptually similar - will always be the right size for your platform - and should be used for memory addresses if you want something that's not a pointer. Note 2 (thanks commenter rubenvb) that a string can only hold size_t-1
elements due to the value of npos
. Details below.
This means that if you use -1 to represent an invalid value, you should use signed integers. If you use a loop to iterate backwards over your data, you should consider using a signed integer if you are not certain that the loop construct is correct (and as noted in one of the other answers, they are easy to get wrong.) IMO, you should not resort to tricks to ensure the code works - if code requires tricks, that's often a danger signal. In addition, it will be harder to understand for those following you and reading your code. Both these are reasons not to follow @Jasmin Gray's answer above.
However, using integer-based loops to iterate over the contents of a data structure is the wrong way to do it in C++, so in a sense the argument over signed vs unsigned for loops is moot. You should use an iterator instead:
std::vector<foo> bar;
for (std::vector<foo>::const_iterator it = bar.begin(); it != bar.end(); ++it) {
// Access using *it or it->, e.g.:
const foo & a = *it;
When you do this, you don't need to worry about casts, signedness, etc.
Iterators can be forward (as above) or reverse, for iterating backwards. Use the same syntax of it != bar.end()
, because end()
signals the end of the iteration, not the end of the underlying conceptual array, tree, or other structure.
In other words, the answer to your question 'Should I use int or unsigned int when working with STL containers?' is 'Neither. Use iterators instead.' Read more about:
If you don't use an integer type for loops, what's left? Your own values, which are dependent on your data, but which in your case include using -1 for an invalid value. This is simple. Use signed. Just be consistent.
I am a big believer in using natural types, such as enums, and signed integers fit into this. They match our conceptual expectation more closely. When your mind and the code are aligned, you are less likely to write buggy code and more likely to expressively write correct, clean code.
Use the type that the container returns. In this case, size_t - which is an integer type that is unsigned.
(To be technical, it's std::vector<MyType>::size_type
, but that's usually defined to size_t, so you're safe using size_t. unsigned is also fine)
But in general, use the right tool for the right job. Is the 'index' ever supposed to be negative? If not, don't make it signed.
By the by, you don't have to type out 'unsigned int'. 'unsigned' is shorthand for the same variable type:
int myVar1;
unsigned myVar2;
The page linked to in the original question said:
Some people, including some textbook authors, recommend using unsigned types to represent numbers that are never negative. This is intended as a form of self-documentation. However, in C, the advantages of such documentation are outweighed by the real bugs it can introduce.
It's not just self-documentation, it's use the right tool for the right job. Saying that 'unsigned variables can cause bugs so don't use unsigned variables' is silly. Signed variables can also cause bugs. So can floats (more than integers). The only guaranteed bug-free code is code that doesn't exist.
Their example of why unsigned is evil, is this loop:
for (unsigned int i = foo.Length()-1; i >= 0; --i)
I have difficulty iterating backwards over a loop, and I usually make mistakes (with signed or unsigned integers) with it. Do I subtract one from size? Do I make it greater-than-AND-equal-to 0, or just greater than? It's a sloppy situation to begin with.
So what do you do with code that you know you have problems with? You change your coding style to fix the problem, make it simpler, and make it easier to read, and make it easier to remember. There is a bug in the loop they posted. The bug is, they wanted to allow a value below zero, but they chose to make it unsigned. It's their mistake.
But here's a simple trick that makes it easier to read, remember, write, and run. With unsigned variables. Here's the intelligent thing to do (obviously, this is my opinion).
for(unsigned i = myContainer.size(); i--> 0; )
{
std::cout << myContainer[i] << std::endl;
}
It's unsigned. It always works. No negative to the starting size. No worrying about underflows. It just works. It's just smart. Do it right, don't stop using unsigned variables because someone somewhere once said they had a mistake with a for() loop and failed to train themselves to not make the mistake.
The trick to remembering it:
i --> 0
(it's a combination of post-decrementing (i--) and greater-than comparison (i > 0))It's better to teach yourself tricks to code right, then to throw away tools because you don't code right.
Which would you want to see in your code?
for(unsigned i = myContainer.size()-1; i >= 0; --i)
Or:
for(unsigned i = myContainer.size(); i--> 0; )
Not because it's less characters to type (that'd be silly), but because it's less mental clutter. It's simpler to mentally parse when skimming through code, and easier to spot mistakes.
Try the code yourself
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With