The stdint.h header lacks int_fastest_t and uint_fastest_t types to correspond with the {,u}int_fastX_t types. For instances where the width of the integer type does not matter, how does one pick the integer type that allows processing the greatest quantity of bits with the least penalty to performance? For example, if one were searching for the first set bit in a buffer using a naive approach, a loop such as this might be considered:
// return the bit offset of the first 1 bit
size_t find_first_bit_set(void const *const buf)
{
    uint_fastest_t const *p = buf; // use the fastest type for comparison to zero
    for (; *p == 0; ++p)           // advance p while no bits are set
        ;
    // bytes skipped, times bits per byte, plus the (1-based) ffsX result
    return ((char const *)p - (char const *)buf) * CHAR_BIT + ffsX(*p) - 1;
}
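Since uint_fastest_t does not actually exist, here is a compilable sketch of the same routine with a concrete type substituted; unsigned long is my arbitrary pick here, and the hypothetical ffsX is replaced with a portable shift loop. It assumes the buffer is suitably aligned and contains at least one set bit:

#include <limits.h>
#include <stddef.h>

typedef unsigned long word_t; // concrete stand-in for uint_fastest_t

size_t find_first_bit_set_ul(void const *const buf)
{
    word_t const *p = buf;            // assumes buf is aligned for word_t
    for (; *p == 0; ++p)              // skip all-zero words
        ;
    size_t bit = 0;
    for (word_t w = *p; (w & 1) == 0; w >>= 1) // index of lowest set bit
        ++bit;
    return ((char const *)p - (char const *)buf) * CHAR_BIT + bit;
}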
Naturally, using char would result in more operations than int. But long long might result in more expensive operations than int on a 32-bit system, outweighing the iterations it saves, and so on.
My current assumption is that for the mainstream architectures, long is the safest bet: it's 32 bits on 32-bit systems, and 64 bits on 64-bit systems.
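One way to encode that assumption at compile time (purely a sketch: using UINTPTR_MAX as a proxy for the native register width is my own guess, not anything stdint.h promises, and uintptr_t itself is an optional type) is:

#include <stdint.h>

// Guess the "fastest" word type from the pointer width, on the
// assumption that pointers match the native register size.
#if UINTPTR_MAX == 0xFFFFFFFFFFFFFFFFu
typedef uint64_t uint_fastest_t;
#elif UINTPTR_MAX == 0xFFFFFFFFu
typedef uint32_t uint_fastest_t;
#else
typedef unsigned long uint_fastest_t; // fall back to the long heuristic
#endif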
In most cases using int in a loop is more efficient than using short. My simple tests showed a performance gain of ~10% when using int.
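For what it's worth, a minimal sketch of the kind of timing loop behind a number like that (the array size, the fill pattern, and the use of clock() are all arbitrary choices, and an optimizer may reshape both loops):

#include <stdio.h>
#include <time.h>

#define N 10000000

static int   a_int[N];
static short a_short[N];

int main(void)
{
    for (int i = 0; i < N; ++i) {      // fill with non-constant data
        a_int[i]   = i;
        a_short[i] = (short)i;
    }

    long long sum = 0;
    clock_t t0 = clock();
    for (int i = 0; i < N; ++i)
        sum += a_int[i];
    clock_t t1 = clock();
    for (int i = 0; i < N; ++i)
        sum += a_short[i];
    clock_t t2 = clock();

    printf("int:   %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("short: %.3f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
    return (int)(sum & 1);             // use sum so it isn't optimized away
}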
The size of a signed int or unsigned int item is the natural integer size on a particular platform. For example, on 16-bit platforms the int type is usually 16 bits, or 2 bytes. On 32-bit platforms, the int type is usually 32 bits, or 4 bytes.
uint_fast8_t is the fastest unsigned integer type with a width of at least 8 bits. uint_least8_t is the smallest unsigned integer type with a width of at least 8 bits.
int is commonly 32 bits wide on modern platforms, but that is not guaranteed. sizeof(T) gives the number of bytes needed to store a variable of type T, where a byte is CHAR_BIT bits; CHAR_BIT is at least 8 and is exactly 8 (an octet) on mainstream machines.
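All of these properties are easy to probe on a given implementation; a small check (the output varies by platform):

#include <limits.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    // None of these values are fixed by the standard beyond the minimums.
    printf("CHAR_BIT      = %d\n", CHAR_BIT);
    printf("sizeof(int)   = %zu\n", sizeof(int));
    printf("uint_fast8_t  = %zu byte(s)\n", sizeof(uint_fast8_t));
    printf("uint_least8_t = %zu byte(s)\n", sizeof(uint_least8_t));
    return 0;
}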
int_fast8_t is always the fastest integer type in a correct implementation. There can never be integer types smaller than 8 bits (because CHAR_BIT >= 8 is required), and since int_fast8_t is the fastest integer type with at least 8 bits, it's thus the fastest integer type, period.
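Note, though, that "fastest per operation" is not the same as "most bits processed per iteration", which is what the buffer scan in the question cares about. On glibc/x86-64, for instance, uint_fast8_t is a 1-byte type, so a scan typed with it still walks the buffer one octet at a time; a sketch for illustration:

#include <stddef.h>
#include <stdint.h>

// Each individual comparison uses the "fastest" 8-bit-capable type,
// yet only CHAR_BIT bits of the buffer are examined per iteration.
size_t leading_zero_bytes(uint_fast8_t const *p, size_t n)
{
    size_t i = 0;
    while (i < n && p[i] == 0)
        ++i;
    return i;
}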
Theoretically, int is the best bet. It should map to the CPU's native register size, and thus be "optimal" in the sense you're asking about.
However, you may still find that a 64-bit or 128-bit integer is faster on some CPUs than a 32-bit one, because although these are larger than the register size, they reduce the number of iterations of your loop, and may therefore work out more efficient by minimising the loop overhead and/or taking advantage of multi-word loads/stores to move the data faster.
For example, on ARM-2 processors it took 4 memory cycles to load one 32-bit register, but only 5 cycles to load two sequentially, and 7 cycles to load four sequentially. The routine you suggest above would be optimised to use as many registers as you could free up (8 to 10, usually), and could therefore run up to 3 or 4 times faster by using multiple registers per loop iteration.
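In C terms that corresponds to unrolling the scan to touch several words per iteration, which lets the compiler batch the loads; a sketch, assuming the buffer length is a multiple of four words and at least one word is non-zero:

#include <stddef.h>

typedef unsigned long word_t; // stand-in word type, as earlier

// Scan four words per iteration; ORing them lets a single branch
// cover the whole group, mimicking the multi-register load trick.
size_t find_first_nonzero_word_x4(word_t const *p)
{
    word_t const *const base = p;
    while ((p[0] | p[1] | p[2] | p[3]) == 0)
        p += 4;
    while (*p == 0)  // locate the exact word within the group
        ++p;
    return (size_t)(p - base);
}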
The only way to be sure is to write several routines and then profile them on the specific target machine to find out which produces the best performance.
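A sketch of that profiling approach, timing the same zero-skipping scan at several word widths (the buffer size, repetition count, and reliance on clock() are all placeholders to adapt to your target):

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define BUF_BYTES (1u << 20)

static uint64_t buf[BUF_BYTES / sizeof(uint64_t)]; // 64-bit aligned storage

// Generic word-at-a-time scan, instantiated per candidate type.
#define DEFINE_SCAN(name, T)                                  \
    static size_t name(void)                                  \
    {                                                         \
        T const *p = (T const *)buf;                          \
        while (*p == 0)                                       \
            ++p;                                              \
        return (size_t)((unsigned char const *)p              \
                        - (unsigned char const *)buf);        \
    }

DEFINE_SCAN(scan8,  uint8_t)
DEFINE_SCAN(scan32, uint32_t)
DEFINE_SCAN(scan64, uint64_t)

static void bench(char const *label, size_t (*scan)(void))
{
    clock_t t0 = clock();
    size_t off = 0;
    for (int i = 0; i < 100; ++i)  // repeat to get measurable times
        off = scan();
    clock_t t1 = clock();
    printf("%-4s offset=%zu  %.3f s\n", label, off,
           (double)(t1 - t0) / CLOCKS_PER_SEC);
}

int main(void)
{
    memset(buf, 0, sizeof buf);
    ((unsigned char *)buf)[BUF_BYTES - 1] = 1; // one set bit near the end
    bench("u8",  scan8);
    bench("u32", scan32);
    bench("u64", scan64);
    return 0;
}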