In C, why is signed int
faster than unsigned int
? True, I know that this has been asked and answered multiple times on this website (links below). However, most people said that there is no difference. I have written code and accidentally found a significant performance difference.
Why would the "unsigned" version of my code be slower than the "signed" version (even when testing the same number)? (I have a x86-64 Intel processor).
Similar links
Compile Command: gcc -Wall -Wextra -pedantic -O3 -Wl,-O3 -g0 -ggdb0 -s -fwhole-program -funroll-loops -pthread -pipe -ffunction-sections -fdata-sections -std=c11 -o ./test ./test.c && strip --strip-all --strip-unneeded --remove-section=.note --remove-section=.comment ./test
signed int
versionNOTE: There is no difference if I explicitly declare signed int
on all numbers.
int isprime(int num) { // Test if a signed int is prime int i; if (num % 2 == 0 || num % 3 == 0) return 0; else if (num % 5 == 0 || num % 7 == 0) return 0; else { for (i = 11; i < num; i += 2) { if (num % i == 0) { if (i != num) return 0; else return 1; } } } return 1; }
unsigned int
versionint isunsignedprime(unsigned int num) { // Test if an unsigned int is prime unsigned int i; if (num % (unsigned int)2 == (unsigned int)0 || num % (unsigned int)3 == (unsigned int)0) return 0; else if (num % (unsigned int)5 == (unsigned int)0 || num % (unsigned int)7 == (unsigned int)0) return 0; else { for (i = (unsigned int)11; i < num; i += (unsigned int)2) { if (num % i == (unsigned int)0) { if (i != num) return 0; else return 1; } } } return 1; }
Test this in a file with the below code:
int main(void) { printf("%d\n", isprime(294967291)); printf("%d\n", isprime(294367293)); printf("%d\n", isprime(294967293)); printf("%d\n", isprime(294967241)); // slow printf("%d\n", isprime(294967251)); printf("%d\n", isprime(294965291)); printf("%d\n", isprime(294966291)); printf("%d\n", isprime(294963293)); printf("%d\n", isprime(294927293)); printf("%d\n", isprime(294961293)); printf("%d\n", isprime(294917293)); printf("%d\n", isprime(294167293)); printf("%d\n", isprime(294267293)); printf("%d\n", isprime(294367293)); // slow printf("%d\n", isprime(294467293)); return 0; }
Results (time ./test
):
Signed - real 0m0.949s Unsigned - real 0m1.174s
While arithmetic "might" be faster using int , one should remember that integer arithmetic is rarely a bottleneck (on modern desktop cpus at least), memory bandwidth on the other hand often is, so for large datasets short might actually give considerably better performance then int.
A signed integer is a 32-bit datum that encodes an integer in the range [-2147483648 to 2147483647]. An unsigned integer is a 32-bit datum that encodes a nonnegative integer in the range [0 to 4294967295].
The term "unsigned" in computer programming indicates a variable that can hold only positive numbers. The term "signed" in computer code indicates that a variable can hold negative and positive values. The property can be applied to most of the numeric data types including int, char, short and long.
When no negative numbers are required, unsigned integers are well-suited for networking and systems with little memory, because unsigned integers can store more positive numbers without taking up extra memory. New programmers sometimes get signed and unsigned mixed up.
Your question is genuinely intriguing as the unsigned version consistently produces code that is 10 to 20% slower. Yet there are multiple problems in the code:
0
for 2
, 3
, 5
and 7
, which is incorrect.if (i != num) return 0; else return 1;
is completely useless as the loop body is only run for i < num
. Such a test would be useful for the small prime tests but special casing them is not really useful.clock()
function to time CPU intensive functions without any intervening I/O.num / 2
times instead of sqrt(num)
.Let's simplify the code and run some precise benchmarks:
#include <stdio.h> #include <time.h> int isprime_slow(int num) { if (num % 2 == 0) return num == 2; for (int i = 3; i < num; i += 2) { if (num % i == 0) return 0; } return 1; } int unsigned_isprime_slow(unsigned int num) { if (num % 2 == 0) return num == 2; for (unsigned int i = 3; i < num; i += 2) { if (num % i == 0) return 0; } return 1; } int isprime_fast(int num) { if (num % 2 == 0) return num == 2; for (int i = 3; i * i <= num; i += 2) { if (num % i == 0) return 0; } return 1; } int unsigned_isprime_fast(unsigned int num) { if (num % 2 == 0) return num == 2; for (unsigned int i = 3; i * i <= num; i += 2) { if (num % i == 0) return 0; } return 1; } int main(void) { int a[] = { 294967291, 0, 294367293, 0, 294967293, 0, 294967241, 1, 294967251, 0, 294965291, 0, 294966291, 0, 294963293, 0, 294927293, 1, 294961293, 0, 294917293, 0, 294167293, 0, 294267293, 0, 294367293, 0, 294467293, 0, }; struct testcase { int (*fun)(); const char *name; int t; } test[] = { { isprime_slow, "isprime_slow", 0 }, { unsigned_isprime_slow, "unsigned_isprime_slow", 0 }, { isprime_fast, "isprime_fast", 0 }, { unsigned_isprime_fast, "unsigned_isprime_fast", 0 }, }; for (int n = 0; n < 4; n++) { clock_t t = clock(); for (int i = 0; i < 30; i += 2) { if (test[n].fun(a[i]) != a[i + 1]) { printf("%s(%d) != %d\n", test[n].name, a[i], a[i + 1]); } } test[n].t = clock() - t; } for (int n = 0; n < 4; n++) { printf("%21s: %4d.%03dms\n", test[n].name, test[n].t / 1000), test[n].t % 1000); } return 0; }
The code compiled with clang -O2
on OS/X produces this output:
isprime_slow: 788.004ms unsigned_isprime_slow: 965.381ms isprime_fast: 0.065ms unsigned_isprime_fast: 0.089ms
These timings are consistent with the OP's observed behavior on a different system, but show the dramatic improvement caused by the more efficient iteration test: 10000 times faster!
Regarding the question Why is the function slower with unsigned?, let's look at the generated code (gcc 7.2 -O2):
isprime_slow(int): ... .L5: movl %edi, %eax cltd idivl %ecx testl %edx, %edx je .L1 .L4: addl $2, %ecx cmpl %esi, %ecx jne .L5 .L6: movl $1, %edx .L1: movl %edx, %eax ret unsigned_isprime_slow(unsigned int): ... .L19: xorl %edx, %edx movl %edi, %eax divl %ecx testl %edx, %edx je .L22 .L18: addl $2, %ecx cmpl %esi, %ecx jne .L19 .L20: movl $1, %eax ret ... .L22: xorl %eax, %eax ret
The inner loops are very similar, same number of instructions, similar instructions. Here are however some potential explanations:
cltd
extends the sign of the eax
register into the edx
register, which may be causing an instruction delay because eax
is modified by the immediately preceeding instruction movl %edi, %eax
. Yet this would make the signed version slower than the unsigned one, not faster.idivl
instruction take fewer cycles than the divl
instruction. Indeed the signed division operates on one less bit of precision than the unsigned division, but the difference seems quite large for this small change.idivl
because signed divisions are more common that unsigned divisions (as measured by years of coding statistics at Intel).This surprising result should teach us a few lessons:
Because signed integer overflow is undefined, the compiler can make a lot of assumptions and optimizations on code involving signed integers. Unsigned integer overflow is defined to wrap around, so the compiler won't be able to optimize as much. See also http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html#signed_overflow and http://www.airs.com/blog/archives/120.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With