In C++, the default type for array indices and sizes is size_t, which is an unsigned 64-bit integer on most x86-64 platforms. I am building my own std::vector-like class for my High Performance Computing library (one of the main reasons is that I want this class to be able to take ownership of a pointer, something std::vector does not offer). For the type of the array index, I am thinking of either using:
The advantages of using a signed integer over an unsigned one are numerous, such as
for (index_t i = 0; i < v.size() - 1; ++i)
works as it is supposed to (with an unsigned integer, v.size() - 1 wraps around to a huge value when v is empty, so the loop runs far out of bounds)
for (index_t i = v.size() - 1; i >= 0; --i)
works as it is supposed to (with an unsigned integer, i >= 0 is always true, so the loop never terminates), among many other advantages. In terms of performance, it even seems to be slightly better, as
a + 1 < b + 1
can be reduced to a < b with signed integers (signed overflow is undefined behavior, so the compiler may assume it never happens), but not with unsigned integers, whose arithmetic wraps around. The only performance advantage of unsigned integers seems to be that a /= 2 can be reduced to a single shift operation, whereas signed division needs an extra adjustment to round toward zero for negative values.
I am wondering why the C++ committee decided to use an unsigned integer for size_t, as it seems to introduce a lot of pain and only a few advantages.
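The two loops above can be sketched against std::vector as follows. This is only an illustration: index_t is assumed here to be std::ptrdiff_t, the function names are made up for the example, and the casts are needed because std::vector::size() returns an unsigned type (the custom class in the question would presumably return index_t directly).

```cpp
#include <cstddef>
#include <vector>

// Assumption for this sketch: the question's index_t is a signed type.
using index_t = std::ptrdiff_t;

// Counts the iterations of "all elements except the last" with a signed index.
// The cast keeps the subtraction in signed arithmetic, so an empty vector
// gives the bound -1 and the loop body never runs.
std::size_t count_all_but_last_signed(const std::vector<int>& v) {
    std::size_t iterations = 0;
    const index_t n = static_cast<index_t>(v.size());
    for (index_t i = 0; i < n - 1; ++i) {
        ++iterations;
    }
    return iterations;
}

// The bound the same loop would use with an unsigned size: for an empty
// vector, 0 - 1 wraps around to the largest std::size_t value.
std::size_t unsigned_upper_bound(const std::vector<int>& v) {
    return v.size() - 1;  // well-defined modular arithmetic
}

// Visits indices from last to first with a signed index and returns them
// in visiting order. The condition i >= 0 becomes false once i reaches -1.
std::vector<index_t> reverse_indices_signed(const std::vector<int>& v) {
    std::vector<index_t> order;
    for (index_t i = static_cast<index_t>(v.size()) - 1; i >= 0; --i) {
        order.push_back(i);
    }
    return order;
}
```

For an empty vector, the signed loops simply do nothing, while the unsigned bound becomes the maximum value of std::size_t.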
The term "unsigned" in computer programming indicates a variable that can hold only nonnegative numbers (zero and positive values). The term "signed" indicates that a variable can hold negative as well as nonnegative values. The property can be applied to most of the integer data types, including int, char, short and long.
In a typical 32-bit encoding, an unsigned integer is a datum that encodes a nonnegative integer in the range [0, 4294967295]. The corresponding signed integer is represented in two's complement notation and covers the range [-2147483648, 2147483647]. In a big-endian layout, the bytes are numbered 0 through 3, with byte 0 the most significant and byte 3 the least significant.
An unsigned data type holds only nonnegative values; negative values cannot be stored in it. The unsigned data types include unsigned int, unsigned char, unsigned short, and unsigned long.
To convert a signed integer to an unsigned integer, or an unsigned integer to a signed integer, you need only use a cast. For example: int a = 6; unsigned int b; int c; b = (unsigned int)a; c = (int)b; In many cases you can dispense with the cast, because the conversion also happens implicitly. Note that a negative value converted to an unsigned type wraps around modulo 2^N, where N is the width of the type.
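A minimal sketch of these conversions in C++ style, using static_cast instead of the C-style cast. The helper names are hypothetical and exist only for this example. The last function spells out what happens implicitly when a signed and an unsigned int are compared.

```cpp
#include <limits>

// Signed to unsigned: well-defined, the value is reduced modulo 2^N.
unsigned int to_unsigned(int value) {
    return static_cast<unsigned int>(value);
}

// Unsigned to signed: the value is preserved whenever it fits in an int.
int to_signed(unsigned int value) {
    return static_cast<int>(value);
}

// Mirrors the implicit conversion applied by "a < b" when a is int and
// b is unsigned int: a is converted to unsigned before the comparison.
bool less_after_conversion(int a, unsigned int b) {
    return static_cast<unsigned int>(a) < b;
}
```

With these helpers, 6 survives a round trip unchanged, to_unsigned(-1) yields the largest unsigned int, and less_after_conversion(-1, 1u) is false, which is the kind of surprise that mixed signed/unsigned comparisons produce.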
The motivation for using an unsigned type as index or size in the standard is based on constraints only relevant to 16-bit machines. The natural type for any integral type in C++ is int, and that's what should probably be used; as you've noticed, trying to use unsigned types as numerical values in C++ is fraught with problems. If you're worried about the sizes being so big that they don't fit into an int, ptrdiff_t would be appropriate; this is, after all, the type of the results of subtraction of pointers or iterators. (The fact that v.size() has a different type than v.end() - v.begin() is really a design flaw in the standard library.)
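The type mismatch mentioned in the parenthesis can be checked directly. The sketch below assumes std::vector<int> with the default allocator; Vec and signed_size are names invented for the illustration.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

using Vec = std::vector<int>;

// size() returns Vec::size_type, which is unsigned ...
static_assert(std::is_unsigned<Vec::size_type>::value,
              "size_type is an unsigned type");
// ... while iterator subtraction yields Vec::difference_type, which is signed.
static_assert(std::is_signed<Vec::difference_type>::value,
              "difference_type is a signed type");

// A signed element count obtained the way the answer suggests:
// by subtracting iterators instead of calling size().
std::ptrdiff_t signed_size(const Vec& v) {
    return v.end() - v.begin();
}
```

With a signed count, an expression such as signed_size(v) - 1 evaluates to -1 for an empty vector instead of wrapping around.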
For me, unsigned sizes always make the most sense: since you can't have -32 elements in an array, it is very scary to treat the size/length as a signed quantity all the time.
The corner cases you mention can be coded around. For example, in the first case you can skip the loop entirely if v is empty (a case which doesn't look all that common to begin with: iterating over all elements except the last?).
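One way to code around both corner cases while keeping unsigned indices might look like this. The empty check is the guard described above; the reverse loop uses a common "decrement inside the condition" idiom that is not part of the answer and is added here only as an illustration. The function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Counts the iterations of "all elements except the last" with an unsigned
// index. The early return avoids evaluating size() - 1 on an empty vector.
std::size_t count_all_but_last(const std::vector<int>& v) {
    if (v.empty()) {
        return 0;
    }
    std::size_t iterations = 0;
    for (std::size_t i = 0; i < v.size() - 1; ++i) {
        ++iterations;
    }
    return iterations;
}

// Visits indices from last to first with an unsigned index and returns them
// in visiting order. The test "i-- > 0" compares the old value of i and then
// decrements it, so the body sees size()-1 down to 0 and the loop stops
// before i would be used after wrapping around.
std::vector<std::size_t> reverse_indices(const std::vector<int>& v) {
    std::vector<std::size_t> order;
    for (std::size_t i = v.size(); i-- > 0;) {
        order.push_back(i);
    }
    return order;
}
```

An alternative to the guard in the first function is to write the condition as i + 1 < v.size(), which needs no subtraction at all.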