 

Why aren't the C-supplied integer types good enough for basically any project?

Tags: c, types, integer

I'm much more of a sysadmin than a programmer. But I do spend an inordinate amount of time grovelling through programmers' code trying to figure out what went wrong. And a disturbing amount of that time is spent dealing with problems when the programmer expected one definition of __u_ll_int32_t or whatever (yes, I know that's not real), but either expected the file defining that type to be somewhere other than it is, or (and this is far worse but thankfully rare) expected the semantics of that definition to be something other than it is.

As I understand C, it deliberately doesn't make width definitions for integer types (and that this is a Good Thing), but instead gives the programmer char, short, int, long, and long long, in all their signed and unsigned glory, with defined minima which the implementation (hopefully) meets. Furthermore, it gives the programmer various macros that the implementation must provide to tell you things like the width of a char, the largest unsigned long, etc. And yet the first thing any non-trivial C project seems to do is either import or invent another set of types that give them explicitly 8, 16, 32, and 64 bit integers. This means that as the sysadmin, I have to have those definition files in a place the programmer expects (that is, after all, my job), but then not all of the semantics of all those definitions are the same (this wheel has been re-invented many times) and there's no non-ad-hoc way that I know of to satisfy all of my users' needs here. (I've resorted at times to making a <bits/types_for_ralph.h>, which I know makes puppies cry every time I do it.)
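(For concreteness, this is roughly the machinery I mean; the values printed are whatever the implementation chooses, with only the standard's minima guaranteed:)

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        /* Implementation-reported widths and bounds; the standard only fixes minima. */
        printf("bits per char         : %d\n", CHAR_BIT);   /* at least 8 */
        printf("largest int           : %d\n", INT_MAX);    /* at least 32767 */
        printf("largest unsigned long : %lu\n", ULONG_MAX); /* at least 4294967295 */
        return 0;
    }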

What does trying to define the bit-width of numbers explicitly (in a language that specifically doesn't want to do that) gain the programmer that makes it worth all this configuration management headache? Why isn't knowing the defined minima and the platform-provided MAX/MIN macros enough to do what C programmers want to do? Why would you want to take a language whose main virtue is that it's portable across arbitrarily-bitted platforms and then typedef yourself into specific bit widths?

Bandrami asked Jun 27 '14


1 Answer

When a C or C++ programmer (hereinafter addressed in second-person) is choosing the size of an integer variable, it's usually in one of the following circumstances:

  • You know (at least roughly) the valid range for the variable, based on the real-world value it represents. For example,
    • numPassengersOnPlane in an airline reservation system should accommodate the largest supported airplane, so needs at least 10 bits. (Round up to 16.)
    • numPeopleInState in a US Census tabulating program needs to accommodate the most populous state (currently about 38 million), so needs at least 26 bits. (Round up to 32.)

In this case, you want the semantics of int_leastN_t from <stdint.h>. It's common for programmers to use the exact-width intN_t here, when technically they shouldn't; however, 8/16/32/64-bit machines are so overwhelmingly dominant today that the distinction is merely academic.

You could use the standard types and rely on constraints like “int must be at least 16 bits”, but a drawback of this is that there's no standard maximum size for the integer types. If int happens to be 32 bits when you only really needed 16, then you've unnecessarily doubled the size of your data. In many cases (see below), this isn't a problem, but if you have an array of millions of numbers, then you'll get lots of page faults.
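A minimal sketch of both points, reusing the variable names from the examples above (which are of course made up); int_least16_t is only guaranteed to be at least 16 bits, but on mainstream platforms it is exactly that:

    #include <stdint.h>
    #include <stdio.h>

    /* "At least N bits": the implementation picks the smallest type that fits. */
    int_least16_t numPassengersOnPlane;   /* needs >= 10 bits */
    int_least32_t numPeopleInState;       /* needs >= 26 bits */

    int main(void)
    {
        /* There is no standard maximum for the built-in types, so a plain int
         * array could silently cost twice the memory you actually need. */
        printf("1M int           : %zu bytes\n", (size_t)1000000 * sizeof(int));
        printf("1M int_least16_t : %zu bytes\n", (size_t)1000000 * sizeof(int_least16_t));
        return 0;
    }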

  • Your numbers don't need to be that big, but for efficiency reasons, you want a fast, “native” data type instead of a small one that may require time wasted on bitmasking or zero/sign-extension.

This corresponds to the int_fastN_t types in <stdint.h>. However, it's common to just use the built-in int here, which in the 16/32-bit days had the semantics of int_fast16_t. It's not the native type on 64-bit systems, but it's usually good enough.
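A sketch of that fast-width flavor; which underlying type you actually get is up to the implementation:

    #include <inttypes.h>   /* PRIdFAST16; also pulls in <stdint.h> */
    #include <stdio.h>

    int main(void)
    {
        /* "Fastest type that is at least 16 bits wide" -- often plain int,
         * possibly wider if that avoids masking or sign-extension work. */
        int_fast16_t sum = 0;
        for (int_fast16_t i = 1; i <= 100; i++)
            sum += i;
        printf("sum = %" PRIdFAST16 "\n", sum);
        return 0;
    }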

  • The variable holds an amount of memory, an array index, or a pointer cast to an integer, and thus needs a size that depends on the amount of addressable memory.

This corresponds to the typedefs size_t, ptrdiff_t, intptr_t, etc. You have to use typedefs here because there is no built-in type that's guaranteed to be memory-sized.
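A sketch of those memsize typedefs in use; note that intptr_t/uintptr_t are optional in the standard, although every mainstream platform provides them:

    #include <stddef.h>     /* size_t, ptrdiff_t */
    #include <inttypes.h>   /* uintptr_t, PRIxPTR */
    #include <stdio.h>

    int main(void)
    {
        int data[100];

        size_t    count = sizeof data / sizeof data[0];   /* object sizes, array lengths */
        ptrdiff_t span  = &data[99] - &data[0];           /* result of pointer subtraction */
        uintptr_t addr  = (uintptr_t)&data[0];            /* pointer carried as an integer */

        printf("count=%zu span=%td addr=0x%" PRIxPTR "\n", count, span, addr);
        return 0;
    }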

  • The variable is part of a structure that's serialized to a file using fread/fwrite, or exchanged with a non-C language (Java, COBOL, etc.) that has its own fixed-width data types.

In these cases, you truly do need an exact-width type.
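A sketch of the serialized-record case (the record layout here is invented); the exact-width types pin down the field sizes, though byte order and struct padding still have to be handled separately:

    #include <stdint.h>
    #include <stdio.h>

    /* On-disk record whose field widths must match the file format exactly,
     * no matter what int or long happen to be on this machine. */
    struct flight_record {
        uint32_t flight_id;
        uint16_t num_passengers;
        int64_t  departure_time;   /* e.g. a Unix timestamp */
    };

    /* Returns 0 on success, -1 on a short write. */
    int write_record(FILE *fp, const struct flight_record *rec)
    {
        return fwrite(rec, sizeof *rec, 1, fp) == 1 ? 0 : -1;
    }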

  • You just haven't thought about the appropriate type, and use int out of habit.

Often, this works well enough.


So, in summary, all of the typedefs from <stdint.h> have their use cases. However, the usefulness of the built-in types is limited due to:

  • Lack of maximum sizes for these types.
  • Lack of a native memsize type.
  • The arbitrary choice between LP64 (on Unix-like systems) and LLP64 (on Windows) data models on 64-bit systems.
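On that last point, a small sketch of where the two models diverge; on an LP64 system long is 8 bytes, on an LLP64 system it is still 4, which is exactly the kind of surprise the typedefs paper over:

    #include <stdio.h>

    int main(void)
    {
        /* LP64  (64-bit Unix-likes): int=4, long=8, long long=8, void*=8 */
        /* LLP64 (64-bit Windows)   : int=4, long=4, long long=8, void*=8 */
        printf("int=%zu long=%zu long long=%zu void*=%zu\n",
               sizeof(int), sizeof(long), sizeof(long long), sizeof(void *));
        return 0;
    }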

As for why there are so many redundant typedefs of fixed-width (WORD, DWORD, __int64, gint64, FINT64, etc.) and memsize (INT_PTR, LPARAM, VPTRDIFF, etc.) integer types, it's mainly because <stdint.h> came late in C's development (it was only standardized in C99), and people are still using older compilers that don't support it, so libraries need to define their own. Same reason why C++ has so many string classes.

dan04 answered Sep 28 '22