When should I just use "int" versus more sign-specific or size-specific types?

Tags: c, unsigned, signed

I have a little VM for a programming language implemented in C. It supports being compiled under both 32-bit and 64-bit architectures as well as both C and C++.

I'm trying to make it compile cleanly with as many warnings enabled as possible. When I turn on CLANG_WARN_IMPLICIT_SIGN_CONVERSION, I get a cascade of new warnings.

I'd like to have a good strategy for when to use int versus either explicitly unsigned types, and/or explicitly sized ones. So far, I'm having trouble deciding what that strategy should be.

It's certainly true that mixing them—using mostly int for things like local variables and parameters and using narrower types for fields in structs—causes lots of implicit conversion problems.

I do like using more specifically sized types for struct fields because I like the idea of explicitly controlling memory usage for objects in the heap. Also, for hash tables, I rely on unsigned overflow when hashing, so it's nice if the hash table's size is stored as uint32_t.
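For concreteness, the hash loop looks something like this (FNV-1a here is just an illustrative stand-in, not necessarily the VM's actual hash function):

    #include <stddef.h>
    #include <stdint.h>

    /* FNV-1a: the multiply is expected to wrap modulo 2^32, which is
       well defined because the arithmetic is done in uint32_t. */
    static uint32_t hash_bytes(const void *data, size_t len) {
        const unsigned char *p = (const unsigned char *)data;
        uint32_t h = 2166136261u;      /* FNV offset basis */
        for (size_t i = 0; i < len; i++) {
            h ^= p[i];
            h *= 16777619u;            /* FNV prime */
        }
        return h;
    }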

But if I try to use more specifically sized types everywhere, I find myself in a maze of twisty casts.

What do other C projects do?

asked Mar 22 '15 by munificent



1 Answer

Just using int everywhere may seem tempting, since it minimizes the need for casting, but there are several potential pitfalls you should be aware of:

  • An int might be shorter than you expect. Even though, on most desktop platforms, an int is typically 32 bits, the C standard only guarantees a minimum width of 16 bits. Could your code ever need numbers larger than 2¹⁵ − 1 = 32,767, even for temporary values? If so, don't use an int. (You may want to use a long instead; a long is guaranteed to be at least 32 bits.)

  • Even a long might not always be long enough. In particular, there is no guarantee that the length of an array (or of a string, which is a char array) fits in a long. Use size_t (or ptrdiff_t, if you need a signed difference) for those.

    In particular, a size_t is defined to be large enough to hold any valid array index, whereas an int or even a long might not be. Thus, for example, when iterating over an array, your loop counter (and its initial / final values) should generally be a size_t, at least unless you know for sure that the array is short enough for a smaller type to work. (But be careful when iterating backwards: size_t is unsigned, so for(size_t i = n-1; i >= 0; i--) is an infinite loop! Using i != SIZE_MAX or i != (size_t) -1 should work, though; or use a do/while loop, but beware of the case n == 0! A sketch of this idiom follows this list.)

  • An int is signed. In particular, this means that int overflow is undefined behavior. If there's ever any risk that your values might legitimately overflow, don't use an int; use an unsigned int (or an unsigned long, or uintNN_t) instead.

  • Sometimes, you just need a fixed bit length. If you're interfacing with an ABI, or reading / writing a file format, that requires integers of a specific length, then that's the length you need to use. (Of course, in such situations, you may also need to worry about things like endianness, and so may sometimes have to resort to manually packing data byte-by-byte anyway.)
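A minimal sketch of the backwards-iteration idiom mentioned above (the function and array are placeholders, not from the question):

    #include <stddef.h>
    #include <stdio.h>

    /* Iterate from a[n-1] down to a[0]. After i reaches 0, i-- wraps
       to (size_t)-1 (i.e. SIZE_MAX), which ends the loop; this also
       handles n == 0, since n - 1 wraps to the same sentinel value. */
    void print_backwards(const int *a, size_t n) {
        for (size_t i = n - 1; i != (size_t)-1; i--)
            printf("%d\n", a[i]);
    }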

All that said, there are also reasons to avoid using the fixed-length types all the time: not only is int32_t awkward to type, but forcing the compiler to always use exactly 32-bit integers is not always optimal, particularly on platforms where the native int size might be, say, 64 bits. You could use the C99 int_fast32_t instead, but that's even more awkward to type.
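For instance, this quick check makes the trade-off visible (the output varies by platform; on typical 64-bit Linux systems int_fast32_t is 8 bytes):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        /* int32_t is exactly 32 bits; int_fast32_t is whatever width
           the implementation considers fastest, and may be wider. */
        printf("int32_t:      %zu bytes\n", sizeof(int32_t));
        printf("int_fast32_t: %zu bytes\n", sizeof(int_fast32_t));
        return 0;
    }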


Thus, here are my personal suggestions for maximum safety and portability:

  • Define your own integer types for casual use in a common header file, something like this:

    #include <limits.h>

    typedef int i16;
    typedef unsigned int u16;

    #if UINT_MAX >= 4294967295U
      typedef int i32;
      typedef unsigned int u32;
    #else
      typedef long i32;
      typedef unsigned long u32;
    #endif

    Use these types for anything where the exact size of the type doesn't matter, as long as they're big enough. The type names I've suggested are both short and self-documenting, so they should be easy to use in casts where needed, and minimize the risk of errors due to using a too-narrow type.

    Conveniently, the u32 and u16 types defined as above are guaranteed to be at least as wide as unsigned int, and thus can be used safely without having to worry about them being promoted to int and causing undefined overflow behavior.

  • Use size_t for all array sizes and indexing, but be careful when casting between it and any other integer types. Optionally, if you don't like to type so many underscores, typedef a more convenient alias for it too.

  • For calculations that assume overflow at a specific number of bits, either use uintNN_t, or just use u16 / u32 as defined above and explicit bitmasking with &. If you choose to use uintNN_t, make sure to protect yourself against unexpected promotion to int; one way to do that is with a macro like:

    #define u(x) (0U + (x)) 

    which should let you safely write e.g.:

    uint32_t a = foo(), b = bar();
    uint32_t c = u(a) * u(b);  /* this is always an unsigned multiply */
  • For external ABIs that require a specific integer length, again define a specific type, e.g.:

    typedef int32_t fooint32;  /* foo ABI needs 32-bit ints */ 

    Again, this type name is self-documenting, with regard to both its size and its purpose.

    If the ABI might actually require, say, 16- or 64-bit ints instead, depending on the platform and/or compile-time options, you can change the type definition to match (and rename the type to just fooint) — but then you really do need to be careful whenever you cast anything to or from that type, because it might overflow unexpectedly.

  • If your code has its own structures or file formats that require specific bit lengths, consider defining custom types for those too, exactly as if it were an external ABI. Or you could just use uintNN_t instead, but you'll lose a little bit of self-documentation that way.

  • For all these types, don't forget to also define the corresponding _MIN and _MAX constants for easy bounds checking. This might sound like a lot of work, but it's really just a couple of lines in a single header file. (A sketch of such a header follows this list.)
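Putting the above together, a header along these lines is one possible shape (the names are just suggestions; the _MIN / _MAX macros simply restate the limits of the underlying types):

    /* ints.h -- common integer typedefs and limits (one possible shape) */
    #ifndef INTS_H
    #define INTS_H

    #include <limits.h>

    typedef int i16;              /* at least 16 bits */
    typedef unsigned int u16;
    #define I16_MIN INT_MIN
    #define I16_MAX INT_MAX
    #define U16_MAX UINT_MAX

    #if UINT_MAX >= 4294967295U   /* int is at least 32 bits */
      typedef int i32;
      typedef unsigned int u32;
      #define I32_MIN INT_MIN
      #define I32_MAX INT_MAX
      #define U32_MAX UINT_MAX
    #else                         /* fall back to long */
      typedef long i32;
      typedef unsigned long u32;
      #define I32_MIN LONG_MIN
      #define I32_MAX LONG_MAX
      #define U32_MAX ULONG_MAX
    #endif

    #endif /* INTS_H */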

Finally, remember to be careful with integer math, especially overflows. For example, keep in mind that the difference of two n-bit signed integers may not fit in an n-bit int. (It will fit into an n-bit unsigned int, if you know it's non-negative; but remember that you need to cast the inputs to an unsigned type before taking their difference to avoid undefined behavior!) Similarly, to find the average of two integers (e.g. for a binary search), don't use avg = (lo + hi) / 2, but rather e.g. avg = lo + (hi + 0U - lo) / 2; the former will break if the sum overflows.
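A minimal sketch of that midpoint idiom, using the same hi + 0U - lo trick (the binary-search context and the test values are just for illustration):

    #include <limits.h>
    #include <stdio.h>

    /* Overflow-safe midpoint for lo <= hi: the subtraction happens in
       unsigned arithmetic, where wraparound is well defined, and the
       halved difference always fits back into an int. */
    static int midpoint(int lo, int hi) {
        return lo + (int)((hi + 0U - lo) / 2);
    }

    int main(void) {
        printf("%d\n", midpoint(INT_MAX - 1, INT_MAX));  /* INT_MAX - 1 */
        printf("%d\n", midpoint(-4, 10));                /* 3 */
        return 0;
    }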

answered Nov 03 '22 by Ilmari Karonen