Drawbacks or tradeoffs of using explicitly sized types in C family languages

I am developing several C and C++ projects that need to be portable across several desktop and mobile platforms. I know it is important to use explicitly sized types (u32_t, i64_t, etc.) when I am reading and writing data to disk.

Would it be a good idea to use explicitly sized types for all integer types to ensure consistent execution? I have heard that explicitly sized types can have performance impacts because processors are optimized for their native int type. I have also read that a good strategy is to use explicitly sized types internally for class data members, but not in interfaces.

Are there any best practices regarding explicitly sized types for data members and interfaces? (I am assuming that there would not be a huge difference between C and C++ in these situations, but let me know if there is.)

asked Apr 19 '12 by Justin Meiners

2 Answers

The nice thing about the basic "int" type is that it will pretty much always be the fastest integer type for whatever platform you're currently compiling on.

On the other hand, the advantage of using, say, int32_t (instead of just int) is that your code can count on an int32_t always being 32 bits wide no matter what platform it is compiled on, which means you can safely make more assumptions about the behavior of the value than you could with an int. With fixed-size types, if your code compiles at all on new platform Y, then it's more likely to behave exactly the same as it did on old platform X.
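
As a rough illustration (assuming a C11 or C++11 compiler), a compile-time check makes the difference in guarantees explicit:

    #include <assert.h>   /* static_assert (C11) */
    #include <limits.h>   /* CHAR_BIT */
    #include <stdint.h>

    /* int32_t is exactly 32 bits wide on every platform that provides it,
       so this assertion can never fire: */
    static_assert(sizeof(int32_t) * CHAR_BIT == 32, "int32_t must be 32 bits");

    /* Plain int is only required to be at least 16 bits, so the same check
       on int is a real portability test that may fail on some targets:
       static_assert(sizeof(int) * CHAR_BIT == 32, "this code assumes 32-bit int");
    */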

The (theoretical) disadvantage of int32_t is that new platform Y might not support 32-bit integers (in which case your code would not compile at all on that platform), or it might support them but handle them more slowly than it would handle plain old ints.

The examples above are a little contrived, since almost all modern hardware handles 32-bit integers at full speed, but there did (and do) exist platforms where manipulating int64_ts is slower than manipulating ints, because (a) the CPU has 32-bit registers, and must therefore split up each operation into multiple steps, and of course (b) a 64-bit integer will take up twice as much memory as a 32-bit integer, which can put extra pressure on the caches.

But: keep in mind that for 99% of the software people write, this issue is not going to have any observable effect on performance, simply because 99% of the software out there isn't CPU-bound these days, and even for the code that is, it's unlikely that integer-width will be the big performance issue. So what it really comes down to is, how do you want your integer math to behave?

  • If you want the compiler to guarantee that your integer values always take up exactly 32 bits of RAM, and that unsigned values always "wrap around" at 2^32, no matter what platform you're compiling on, go with int32_t/uint32_t (etc). (Note that signed overflow is still undefined behavior even for the fixed-width types, so don't count on wrapping at 2^31.)

  • If you don't really care about wrapping behavior (because you know your integers will never wrap anyway, due to the nature of the data they are storing), and you want to make the code a bit more portable for odd/unusual compile targets, and at least theoretically faster (although probably not in real life), then you can stick with plain old short/int/long.

Personally I use fixed-size types (int32_t, etc) by default unless there is a very clear reason not to, because I want to minimize the amount of variant behavior across platforms. For example, this code:

for (uint32_t i=0; i<4000000000; i++) foo();

... will always call foo() exactly 4000000000 times, whereas this code:

for (unsigned int i=0; i<4000000000; i++) foo();

might call foo() 4000000000 times, or it might go into an infinite loop, depending on whether (sizeof(int)>=4) or not. Certainly it would be possible to hand-verify that the second snippet doesn't do that on any given platform, but given the likely-zero performance difference between the two styles anyway, I prefer the first approach since predicting its behavior is a no-brainer. I think the char/short/int/long approach was more useful back in the early days of C, when computer architecture was more varied, and CPUs were slow enough that achieving full native performance was more important than safe coding.

answered Nov 16 '22 by Jeremy Friesner

Use inttypes.h or stdint.h. They are standard C (since C99), so they will be supported by any toolchain that aims to be standards compliant.

Furthermore, they save you the work of reinventing the wheel.

The only thing you must do is

#include <inttypes.h>

uint32_t integer_32bits_nosign;
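
Beyond the fixed-width typedefs, <inttypes.h> also provides the printf/scanf format macros for them, so you don't have to guess the right conversion specifier on each platform. A small sketch:

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t value = 4000000000u;
        int64_t  big   = -1234567890123LL;

        /* PRIu32 / PRId64 expand to the correct printf length modifiers
           for the current platform ("u" vs "lu", "ld" vs "lld", ...): */
        printf("value = %" PRIu32 "\n", value);
        printf("big   = %" PRId64 "\n", big);
        return 0;
    }
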
  • One more concern about portability: just as important as data width is data endianness. You must check the target endianness with the macros your compiler defines:

    struct {
    #if defined( __BIG_ENDIAN__ ) || defined( _BIG_ENDIAN )
        // Data disposition for Big Endian   
    #else
        // Data disposition for Little Endian   
    #endif
    };
    

This is especially sensitive if you use bit-fields.
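
A common alternative to branching on endianness macros is to pick one on-disk byte order and (de)serialize byte by byte, which behaves the same on any host. A minimal sketch (the helper names put_u32_le/get_u32_le are just illustrative, not from any particular library):

    #include <stdint.h>

    /* Write a uint32_t into a buffer in little-endian order,
       whatever the host's native byte order is. */
    static void put_u32_le(uint8_t *out, uint32_t v)
    {
        out[0] = (uint8_t)(v);
        out[1] = (uint8_t)(v >> 8);
        out[2] = (uint8_t)(v >> 16);
        out[3] = (uint8_t)(v >> 24);
    }

    /* Read it back the same way. */
    static uint32_t get_u32_le(const uint8_t *in)
    {
        return  (uint32_t)in[0]
             | ((uint32_t)in[1] << 8)
             | ((uint32_t)in[2] << 16)
             | ((uint32_t)in[3] << 24);
    }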


EDIT:

Of course you can use <cstdint> as others suggested if you plan to use it in C++-only code.

answered Nov 16 '22 by j4x