Difference in function performance when passing a parameter as a compile-time constant vs. a variable

In the Linux kernel code (version 2.6.2) there is a macro used to test a bit:

#define test_bit(nr, addr)                      \
        (__builtin_constant_p((nr))             \
         ? constant_test_bit((nr), (addr))      \
         : variable_test_bit((nr), (addr)))

where constant_test_bit and variable_test_bit are defined as:

static inline int constant_test_bit(int nr, const volatile unsigned long *addr)
{       
        return ((1UL << (nr & 31)) & (addr[nr >> 5])) != 0;
}
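
The arithmetic here splits the bit number into a word index and a bit position: nr >> 5 selects which 32-bit word of the array holds the bit, and nr & 31 selects the bit inside that word (the i386 code assumes 32-bit indexing). Below is a minimal standalone sketch of the same logic, using a hypothetical two-word array purely for illustration:

#include <stdio.h>

/* Same bit-test logic as constant_test_bit, minus the kernel qualifiers. */
static int test_bit_c(int nr, const unsigned long *addr)
{
        return ((1UL << (nr & 31)) & (addr[nr >> 5])) != 0;
}

int main(void)
{
        unsigned long bits[2] = { 0, 1UL << 5 };    /* "bit 37": word 1, bit 5 */

        printf("%d\n", test_bit_c(37, bits));       /* prints 1: 37 >> 5 == 1, 37 & 31 == 5 */
        printf("%d\n", test_bit_c(36, bits));       /* prints 0 */
        return 0;
}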


static __inline__ int variable_test_bit(int nr, const volatile unsigned long *addr)
{       
        int oldbit;

        __asm__ __volatile__(
                "btl %2,%1\n\tsbbl %0,%0"
                :"=r" (oldbit)
                :"m" (ADDR),"Ir" (nr));
        return oldbit;
}
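
Note that ADDR is a helper macro defined elsewhere in the same kernel header (essentially a volatile dereference of addr), so the snippet above does not build on its own. The assembly works like this: btl copies bit nr of the memory operand into the carry flag, and sbbl %0,%0 computes 0 - 0 - CF, so oldbit ends up as -1 if the bit was set and 0 otherwise. Here is a self-contained sketch (x86/x86-64, GCC inline assembly, with *addr substituted for ADDR) that can be compiled and run with a plain gcc invocation:

#include <stdio.h>

static inline int variable_test_bit_demo(int nr, const volatile unsigned long *addr)
{
        int oldbit;

        /* btl: copy bit `nr` of the operand into CF; sbbl %0,%0: 0 - 0 - CF. */
        __asm__ __volatile__(
                "btl %2,%1\n\tsbbl %0,%0"
                :"=r" (oldbit)
                :"m" (*(volatile unsigned long *)addr), "Ir" (nr));
        return oldbit;
}

int main(void)
{
        unsigned long word = 1UL << 21;

        printf("%d %d\n", variable_test_bit_demo(21, &word) != 0,   /* 1 */
                          variable_test_bit_demo(3, &word) != 0);   /* 0 */
        return 0;
}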

I understand that __builtin_constant_p is used to detect whether an expression is a compile-time constant or not. My question is: is there any performance difference between these two functions depending on whether the argument is a compile-time constant? Why use the C version when it is, and the assembly version when it is not?
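
As background, __builtin_constant_p(expr) is a GCC (and Clang) builtin that evaluates to 1 when the compiler can prove that expr has a known value at compile time, and 0 otherwise. A minimal sketch (the result for the plain local depends on the optimization level):

#include <stdio.h>

int main(void)
{
        int k = 21;             /* plain local: constant-folded when optimizing */
        volatile int v = 21;    /* volatile: the compiler must treat the value as unknown */

        printf("%d\n", __builtin_constant_p(21));   /* 1: literal */
        printf("%d\n", __builtin_constant_p(k));    /* typically 1 at -O1 and above, 0 at -O0 */
        printf("%d\n", __builtin_constant_p(v));    /* 0 */
        return 0;
}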

UPDATE: The following main function is used to test the performance:

constant, calling constant_test_bit:

#include <stdio.h>
#include <stdlib.h>

/* constant_test_bit / variable_test_bit as defined above go here */

int main(void) {
        unsigned long i, j = 21;
        unsigned long cnt = 0;
        srand(111);
        //j = rand() % 31;
        for (i = 1; i < (1 << 30); i++) {
                j = (j + 1) % 28;
                if (constant_test_bit(j, &i))
                        cnt++;
        }
        if (__builtin_constant_p(j))
                printf("j is a compile time constant\n");
        return 0;
}

This correctly outputs the sentence "j is a compile time constant".

For the other situations, just uncomment the line that assigns a "random" number to j and change the function name accordingly. When that line is uncommented, the output is empty, which is expected.

I use gcc test.c -O1 to compile, and here are the results:

constant, constant_test_bit:

$ time ./a.out 

j is a compile time constant

real    0m0.454s
user    0m0.450s
sys     0m0.000s

constant, variable_test_bit (omitting time ./a.out here and below):

j is a compile time constant

real    0m0.885s
user    0m0.883s
sys     0m0.000s

variable, constant_test_bit:

real    0m0.485s
user    0m0.477s
sys     0m0.007s

variable, variable_test_bit:

real    0m3.471s
user    0m3.467s
sys     0m0.000s

I ran each version several times, and the above results are typical values. It seems the constant_test_bit function is always faster than the variable_test_bit function, no matter whether the parameter is a compile-time constant or not... For the last two results (when j is not constant), the variable version is even dramatically slower than the constant one. Am I missing something here?

Xiangyu Zhu


1 Answer

Using godbolt we can run an experiment with constant_test_bit; the following two test functions are compiled with gcc using the -O3 flag:

// Non constant expression test case
int func1(unsigned long i, unsigned long j)
{
  int x = constant_test_bit(j, &i) ;
  return x ;
}

// constant expression test case
int func2(unsigned long i)
{
  int x = constant_test_bit(21, &i) ;
  return x ;
}

We see the optimizer is able to optimize the constant expression case to the following:

shrq    $21, %rax
andl    $1, %eax

while the non-constant expression case ends up as follows:

sarl    $5, %eax
andl    $31, %ecx
cltq
leaq    -8(%rsp,%rax,8), %rax
movq    (%rax), %rax
shrq    %cl, %rax
andl    $1, %eax
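
At the C level, the constant case boils down to extracting bit 21 of i directly, while the non-constant case has to compute the word index and perform a variable-count shift at run time. Roughly, func2 is reduced to the equivalent of the following (the function name here is hypothetical, for illustration only):

int func2_equivalent(unsigned long i)
{
  return (i >> 21) & 1 ;
}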

So the optimizer is able to produce much better code for the constant expression case, and we can see that the non-constant case for constant_test_bit is pretty bad compared to the hand-rolled assembly in variable_test_bit. The implementer must believe that the constant expression case for constant_test_bit ends up being better than:

btl %edi,8(%rsp)
sbbl %esi,%esi 

for most cases.

As to why your test case seems to show a different conclusion: your test case is flawed. I have not been able to suss out all the issues, but if we look at the case using constant_test_bit with a non-constant expression, we can see the optimizer is able to move all the work outside the loop and reduce the work related to constant_test_bit inside the loop to:

movq    (%rax), %rdi

even with an older gcc version, but this case may not be relevant to the cases test_bit is being used in. There may be more specific cases where this kind of optimization won't be possible.

Shafik Yaghmour