 

Is there still a reason to use `int` in C++ code? [duplicate]

Tags:

c++



There was a discussion on the C++ Core Guidelines about what to use:

https://github.com/isocpp/CppCoreGuidelines/pull/1115

Herb Sutter wrote that gsl::index will be added (in the future maybe std::index), which will be defined as ptrdiff_t.

hsutter commented on 26 Dec 2017:

(Thanks to many WG21 experts for their comments and feedback into this note.)

Add the following typedef to GSL

namespace gsl { using index = ptrdiff_t; }

and recommend gsl::index for all container indexes/subscripts/sizes.

Rationale

The Guidelines recommend using a signed type for subscripts/indices. See ES.100 through ES.107. C++ already uses signed integers for array subscripts.

We want to be able to teach people to write "new clean modern code" that is simple, natural, warning-free at high warning levels, and doesn’t make us write a "pitfall" footnote about simple code.

If we don’t have a short adoptable word like index that is competitive with int and auto, people will still use int and auto and get their bugs. For example, they will write for(int i=0; i<v.size(); ++i) or for(auto i=0; i<v.size(); ++i) which have 32-bit size bugs on widely used platforms, and for(auto i=v.size()-1; i>=0; --i) which just doesn't work. I don’t think we can teach for(ptrdiff_t i = ... with a straight face, or that people would accept it.

If we had a saturating arithmetic type, we might use that. Otherwise, the best option is ptrdiff_t which has nearly all the advantages of a saturating arithmetic unsigned type, except only that ptrdiff_t still makes the pervasive loop style for(ptrdiff_t i=0; i<v.size(); ++i) emit signed/unsigned mismatches on i<v.size() (and similarly for i!=v.size()) for today's STL containers. (If a future STL changes its size_type to be signed, even this last drawback goes away.)

However, it would be hopeless (and embarrassing) to try to teach people to routinely write for (ptrdiff_t i = ... ; ... ; ...). (Even the Guidelines currently use it in only one place, and that's a "bad" example that is unrelated to indexing.)

Therefore we should provide gsl::index (which can later be proposed for consideration as std::index) as a typedef for ptrdiff_t, so we can hopefully (and not embarrassingly) teach people to routinely write for (index i = ... ; ... ; ...).

Why not just tell people to write ptrdiff_t? Because we believe it would be embarrassing to tell people that's what you have to do in C++, and even if we did people won't do it. Writing ptrdiff_t is too ugly and unadoptable compared to auto and int. The point of adding the name index is to make it as easy and attractive as possible to use a correctly sized signed type.

Edit: More rationale from Herb Sutter

Is ptrdiff_t big enough? Yes. Standard containers are already required to have no more elements than can be represented by ptrdiff_t, because subtracting two iterators must fit in a difference_type.

But is ptrdiff_t really big enough, if I have a built-in array of char or byte that is bigger than half the size of the memory address space and so has more elements than can be represented in a ptrdiff_t? Yes. C++ already uses signed integers for array subscripts. So use index as the default option for the vast majority of uses including all built-in arrays. (If you do encounter the extremely rare case of an array, or array-like type, that is bigger than half the address space and whose elements are sizeof(1), and you're careful about avoiding truncation issues, go ahead and use a size_t for indexes into that very special container only. Such beasts are very rare in practice, and when they do arise often won't be indexed directly by user code. For example, they typically arise in a memory manager that takes over system allocation and parcels out individual smaller allocations that its users use, or in an MPEG or similar which provides its own interface; in both cases the size_t should only be needed internally within the memory manager or the MPEG class implementation.)
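
To make the loop pitfalls above concrete, here is a minimal sketch of my own (not part of Sutter's comment), assuming a typical LP64 platform where int is 32 bits and std::vector's size_type is a 64-bit unsigned type. The gsl::index alias is just the one-line typedef quoted above; the cast in the loop condition silences the signed/unsigned mismatch Sutter mentions.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

namespace gsl { using index = std::ptrdiff_t; }  // the typedef proposed above

int main()
{
    std::vector<int> v{10, 20, 30};

    // The pitfalls described above (left commented out on purpose):
    // for (int  i = 0; i < v.size(); ++i)        // signed/unsigned warning; i overflows past INT_MAX
    // for (auto i = v.size() - 1; i >= 0; --i)   // i is unsigned, so i >= 0 is always true

    // The style the typedef is meant to make teachable:
    for (gsl::index i = 0; i < static_cast<gsl::index>(v.size()); ++i)
        std::cout << v[i] << '\n';  // i converts back to size_type inside operator[]
}
```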


I come at this from the perspective of an old-timer (pre-C++)... It was understood back in the day that int was the native word size of the platform and was likely to give the best performance.

If you needed something bigger, then you'd use it and pay the price in performance. If you needed something smaller (limited memory, or a specific need for a fixed size), same thing; otherwise use int. And yes, if your value was in a range that int could accommodate on one target platform but not on another, then we had our compile-time, size-specific defines (before they became standardized, we made our own).

But now, present day, processors and compilers are much more sophisticated and these rules don't apply so easily. It is also harder to predict what the performance impact of your choice will be on some unknown future platform or compiler ... How do we really know that uint64_t for example will perform better or worse than uint32_t on any particular future target? Unless you're a processor/compiler guru, you don't...

So... maybe it's old-fashioned, but unless I am writing code for a constrained environment like an Arduino, I still use int for general-purpose values that I know will fit in an int on all reasonable targets for the application I am writing, and the compiler takes it from there. These days that generally means 32 bits signed. Even if one assumes that 16 bits is the minimum integer size, it covers most use cases, and the use cases for larger numbers are easily identified and handled with appropriate types.
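
A small sketch of that approach (my own example, assuming a 32-bit int as the answer does): default to int for values known to stay small, and reach for an explicit wider type only where the range demands it.

```cpp
#include <cstdint>
#include <cstdio>

static_assert(sizeof(int) >= 4, "this code assumes a 32-bit (or wider) int");

// Small, bounded count: plain int is fine and reads naturally.
int count_widgets()
{
    int n = 0;
    for (int i = 0; i < 1000; ++i)
        ++n;
    return n;
}

// Range clearly exceeds what 32 bits can hold: say so explicitly.
std::int64_t total_bytes_processed(std::int64_t bytes_per_run, int runs)
{
    return bytes_per_run * runs;
}

int main()
{
    std::printf("%d widgets, %lld bytes\n",
                count_widgets(),
                static_cast<long long>(total_bytes_processed(5000000000LL, 3)));
}
```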


Most programs do not live and die on the edge of a few CPU cycles, and int is very easy to write. However, if you are performance-sensitive, I suggest using the fixed-width integer types defined in <cstdint>, such as int32_t or uint64_t. These have the benefit of being very clear about their intended behavior with regard to signedness, as well as their size in memory. The header also includes the fast variants such as int_fast32_t, which are at least the stated size but may be larger if that helps performance.
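
For illustration, a short sketch of my own using those <cstdint> types; the variable names are made up, and the printed sizes are what you would typically see on a 64-bit desktop target.

```cpp
#include <cstdint>
#include <cstdio>

int main()
{
    std::int32_t      sample_delta = -42;      // exactly 32 bits, signed
    std::uint64_t     file_offset  = 1000000;  // exactly 64 bits, unsigned
    std::int_fast32_t counter      = 0;        // at least 32 bits, whatever is fastest

    // Typically prints "4 8 8"; only the first two sizes are guaranteed by the standard.
    std::printf("%zu %zu %zu\n",
                sizeof sample_delta, sizeof file_offset, sizeof counter);
}
```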


There is no formal reason to use int; it doesn't correspond to anything sane as far as the standard is concerned. For indices you almost always want a signed, pointer-sized integer.

That said, typing int feels like you just said hey to Ritchie, while typing std::ptrdiff_t feels like Stroustrup just kicked you in the butt. Coders are people too; don't bring too much ugliness into their lives. I would prefer to use long, or some easily typed typedef like index, instead of std::ptrdiff_t.


This is somewhat opinion-based, but alas, the question rather begs for it.

First of all, you talk about integers and indices as if they were the same thing, which is not the case. For any such thing as "an integer of sorts, not sure what size", simply using int is, of course, still appropriate most of the time. It works fine for most applications, and the compiler is comfortable with it. As a default, that's fine.

For array indices, it's a different story.

There is to date one single formally correct thing, and that's std::size_t. In the future, there may be a std::index_t which makes the intent clearer on the source level, but so far there is not.
std::ptrdiff_t as an index "works" but is just as incorrect as int since it allows for negative indices.
Yes, this happens to be what Mr. Sutter deems correct, but I beg to differ. Yes, at the assembly-instruction level this is supported just fine, but I still object. The standard says:

8.3.4/6: E1[E2] is identical to *((E1)+(E2)) [...] Because of the conversion rules that apply to +, if E1 is an array and E2 an integer, then E1[E2] refers to the E2-th member of E1.
5.7/5: [...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object [...] otherwise, the behavior is undefined.

An array subscript refers to the E2-th member of E1. There is no such thing as a negative-th element of an array. But more importantly, pointer arithmetic with a negative additive expression invokes undefined behavior.

In other words: signed indices of whatever size are a wrong choice. Indices are unsigned. Yes, signed indices work, but they're still wrong.
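
To make that argument concrete, here is a small illustration of my own (the identifiers are made up): with the array itself as E1, a negative subscript names an element before the start, which the quoted wording does not cover.

```cpp
#include <cstddef>

int main()
{
    int a[4] = {1, 2, 3, 4};

    int last = a[3];     // fine: the last element of the array
    // int bad = a[-1];  // undefined behavior: a + (-1) points before the array

    std::size_t i = 0;   // an unsigned index cannot even express -1;
    int first = a[i];    // a mistaken "-1" would wrap to a huge value instead

    (void)last; (void)first;
}
```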

Now, although size_t is by definition the correct choice (an unsigned integer type that is large enough to contain the size of any object), it may be debatable whether it is truly a good choice for the average case, or as a default.

Be honest, when was the last time you created an array with 10^19 elements?

I am personally using unsigned int as a default because the 4 billion elements that this allows are more than enough for (almost) every application, and it already pushes the average user's computer rather close to its limit (merely subscripting an array of int at that size assumes 16 GB of contiguous memory allocated). I personally deem defaulting to 64-bit indices ridiculous.
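
A quick back-of-the-envelope check of that claim (my own sketch, assuming sizeof(int) == 4):

```cpp
#include <cstdint>
#include <cstdio>

int main()
{
    // What a 32-bit unsigned index can address, times 4 bytes per int element.
    std::uint64_t max_elements = std::uint64_t{1} << 32;
    std::uint64_t bytes        = max_elements * sizeof(int);  // assumes sizeof(int) == 4

    std::printf("%llu GiB\n", static_cast<unsigned long long>(bytes >> 30));  // prints 16 GiB
}
```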

If you are programming a relational database or a filesystem, then yes, you will need 64-bit indices. But for the average "normal" program, 32-bit indices are just good enough, and they only consume half as much storage.

When keeping around considerably more than a handful of indices, and if I can afford it (because the arrays are no larger than 64k elements), I even go down to uint16_t. No, I'm not joking.

Is storage really such a problem? Surely it's ridiculous to fuss over two or four bytes saved, isn't it? Well, no...

Size can be a problem for pointers, so sure enough it can be for indices as well. The x32 ABI does not exist for no reason. You will not notice the overhead of needlessly large indices if you have only a handful of them in total (just like pointers, they will be in registers anyway, nobody will notice whether they're 4 or 8 bytes in size).

But think, for example, of a slot map where you store an index for every element (depending on the implementation, two indices per element). Oh heck, it sure does make a difference whether you hit L2 every time or take a cache miss on every access! Bigger is not always better.
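
A rough sketch of that storage argument (my own, with a hypothetical layout; only the size comparison matters): an element carrying two indices plus a small payload, once with 16-bit and once with 64-bit index types.

```cpp
#include <cstdint>
#include <cstdio>

struct SmallIndexed { std::uint16_t next, prev; float payload; };  // 2 + 2 + 4 = 8 bytes
struct WideIndexed  { std::uint64_t next, prev; float payload; };  // 8 + 8 + 4 -> 24 bytes with padding

int main()
{
    std::printf("small: %zu bytes, wide: %zu bytes\n",
                sizeof(SmallIndexed), sizeof(WideIndexed));
    // With ~100,000 elements that is roughly 0.8 MB versus 2.4 MB touched per pass,
    // which is about where a typical L2 cache stops holding the whole table.
}
```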

At the end of the day, you must ask yourself what you pay for, and what you get in return. With that in mind, my style recommendation would be:

If it costs you "nothing" because you only have e.g. one pointer and a few indices to keep around, then just use what's formally correct (that'd be size_t). Formally correct is good, correct always works, it's readable and intelligible, and correct is... never wrong.

If, however, it does cost you (you have maybe several hundred or thousand or ten thousand indices), and what you get back is worth nothing (because e.g. you cannot even store 2^20 elements, so whether you could subscript 2^32 or 2^64 makes no difference), you should think twice about being too wasteful.