I've been benchmarking an algorithm; the details aren't important here. Its main components are a buffer (a raw array of integers) and an indexer (an integer used to access the elements of the buffer).
The fastest types for the buffer seem to be unsigned char and both the signed and unsigned versions of short, int, and long. However, char and signed char were slower, by a factor of about 1.07x.
For the indexer there was no difference between signed and unsigned types. However, int and long were about 1.21x faster than char and short.
Is there a type that should be used by default when considering performance and not memory consumption?
NOTE: The operations used on the elements of the buffer and the indexer were assignment, increment, decrement and comparison.
The fast types (int_fast#_t) give you the integer type that the implementation considers fastest with a width of at least # bits (where # is 8, 16, 32, or 64). For example, int_fast32_t gives you the fastest integer type that's at least 32 bits wide.
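As a small illustrative sketch (not from the question's benchmark), this just prints how wide the fast typedefs turn out to be on the machine it runs on; the result varies by platform:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    // int_fast16_t / int_fast32_t / int_fast64_t are whatever the
    // implementation considers the fastest signed types of at least
    // 16 / 32 / 64 bits. On many 64-bit platforms they end up wider
    // than the stated minimum.
    std::printf("int_fast16_t: %zu bytes\n", sizeof(std::int_fast16_t));
    std::printf("int_fast32_t: %zu bytes\n", sizeof(std::int_fast32_t));
    std::printf("int_fast64_t: %zu bytes\n", sizeof(std::int_fast64_t));
}
```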
In most cases using int in a loop is more efficient than using short. My simple tests showed a performance gain of ~10% when using int.
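A minimal sketch of the kind of loop being compared (the array and its contents are placeholders, not the original benchmark):

```cpp
// Same loop body, differing only in the counter type. With short the
// compiler may have to insert extra sign-extension/truncation work,
// while int matches the natural register width on most targets.
long sum_with_int(const unsigned char* buf, int n) {
    long total = 0;
    for (int i = 0; i < n; ++i) total += buf[i];
    return total;
}

long sum_with_short(const unsigned char* buf, short n) {
    long total = 0;
    for (short i = 0; i < n; ++i) total += buf[i];
    return total;
}
```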
int runs much faster. Conclusion: use the int data type instead of long as much as you can to get better execution performance in interpreted-only mode.
uint32_t is an optional typedef that the implementation must provide iff it has an unsigned integer type of exactly 32 bits. Some platforms have 9-bit bytes, for example, so they don't have a uint32_t. uint_fast32_t states your intent clearly: it's a type of at least 32 bits that is the best choice from a performance point of view.
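A hedged sketch of how that intent reads in code (the index_type alias is purely illustrative):

```cpp
#include <cstdint>
#include <climits>

// uint32_t is exactly 32 bits wide and only exists on platforms that have
// such a type; uint_fast32_t always exists and is at least 32 bits wide.
static_assert(sizeof(std::uint_fast32_t) * CHAR_BIT >= 32,
              "uint_fast32_t must hold at least 32 bits");

// Using the fast type documents the intent "at least 32 bits, whichever
// representation is fastest" rather than "exactly 32 bits".
using index_type = std::uint_fast32_t;
```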
Generally the biggest win comes from caching.
If your data values are small enough to fit in 8 bits, then you can fit more of the data in the CPU cache than if you used ints and wasted 3 bytes per value. If you are processing a block of data, you get a huge speed advantage from cache hits.
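A rough sketch of the footprint argument (the element count here is arbitrary, and a 4-byte int is assumed):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

int main() {
    constexpr std::size_t count = 1000000;  // one million values (arbitrary)
    // An 8-bit buffer packs roughly four times as many elements into the
    // same amount of cache as an int buffer, so a streaming pass over it
    // suffers far fewer cache misses.
    std::printf("uint8_t buffer: %zu bytes\n", count * sizeof(std::uint8_t));
    std::printf("int buffer:     %zu bytes\n", count * sizeof(int));
}
```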
The type of the index is less important: as long as it fits in a CPU register (i.e. don't try using a long long on an 8-bit CPU), it will have the same speed.
Edit: it's also worth mentioning that measuring speed is tricky. You need to run the algorithm several times to allow for caching, and you need to watch what else is running on the CPU and even what other hardware might be interrupting. Speed differences of 10% might be considered noise unless you are very careful.
It depends heavily on the underlying architecture. Usually the fastest data types are those that are word-wide. In my experience with IA-32 (x86-32), data types smaller or larger than a word incur penalties, sometimes even more than one memory read for a single value.
Once the data is in CPU registers, the type's length usually doesn't matter (provided the whole value fits in one register); what matters is which operations you perform on it. Floating-point operations are of course the most costly; the fastest are addition and subtraction (which also covers comparison), bit-wise operations (shifts and the like), and logical operations (and, or, ...).