
How to generate an SSE4.2 popcnt machine instruction

Using the C program:

int main(int argc, char** argv)
{
    return __builtin_popcountll(0xf0f0f0f0f0f0f0f0);
}

and the compiler command line (GCC 4.4, Intel Xeon L3426):

gcc -msse4.2 poptest.c -o poptest

I do NOT get the popcnt instruction. Instead, the compiler generates a lookup table and computes the population count that way. The resulting binary is over 8000 bytes. (Yuk!)

Thanks so much for any assistance.

Alan Moskowitz asked Jun 21 '11 15:06


People also ask

What is the POPCNT instruction?

According to Wikipedia, POPCNT (population count) counts the number of bits set to 1 in its operand. Support for it is indicated by the CPUID.01H:ECX.POPCNT [bit 23] flag. Intel implements POPCNT beginning with the Nehalem microarchitecture, and AMD beginning with the Barcelona microarchitecture.
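As an illustration of that CPUID flag, here is a minimal sketch, assuming GCC or Clang on x86, whose <cpuid.h> header provides __get_cpuid and the bit_POPCNT mask. The function name cpu_has_popcnt is hypothetical. It queries leaf 01H and tests bit 23 of ECX; on non-x86 targets it simply reports false.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang header: __get_cpuid, bit_POPCNT */
#endif

/* Hypothetical helper: true if CPUID leaf 01H reports POPCNT (ECX bit 23). */
static bool cpu_has_popcnt(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 if the requested leaf is not supported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return (ecx & bit_POPCNT) != 0;   /* bit_POPCNT is (1 << 23) */
#else
    return false;                     /* not x86: no POPCNT instruction */
#endif
}
```

A program can call this once at startup and choose between a POPCNT-based routine and a portable fallback.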


2 Answers

You have to tell GCC to generate code for an architecture that supports the popcnt instruction:

gcc -march=corei7 popcnt.c

Or just enable support for popcnt:

gcc -mpopcnt popcnt.c
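As an alternative to a command-line flag, GCC and Clang also accept a per-function target attribute. The sketch below assumes x86 and uses a hypothetical helper name, popcount64_hw. Because its argument is only known at run time, the compiler cannot precompute the count, so with optimization enabled the body should reduce to a single popcnt instruction. Calling it on a CPU without POPCNT would raise an illegal-instruction fault, so pair it with a CPUID check.

```c
#include <stdint.h>

/* Hypothetical helper: target("popcnt") enables POPCNT code generation for
   this one function, much as -mpopcnt does for the whole file. */
#if defined(__x86_64__) || defined(__i386__)
__attribute__((target("popcnt")))
#endif
int popcount64_hw(uint64_t x)
{
    /* x is a runtime value, so this cannot be folded to a constant. */
    return __builtin_popcountll(x);
}
```

Compiling with `gcc -O2 -c` and inspecting the result with `objdump -d` is a quick way to confirm which instruction was emitted.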

In your example program, the argument to __builtin_popcountll is a constant, so the compiler will probably do the calculation at compile time and never emit the popcnt instruction. GCC does this even when optimization is not requested.

So try passing it something that it can't know at compile time:

int main (int argc, char** argv)
{
    return  __builtin_popcountll ((long long) argv);
}

$ gcc -march=corei7 -O popcnt.c && objdump -d a.out | grep '<main>' -A 2
0000000000400454 <main>:
  400454:       f3 48 0f b8 c6          popcnt %rsi,%rax
  400459:       c3                      retq
Torkel Bjørnson-Langen answered Sep 24 '22 18:09


You need to do it like this:

#include <stdio.h>
#include <smmintrin.h>

int main(void)
{
    int pop = _mm_popcnt_u64(0xf0f0f0f0f0f0f0f0ULL);
    printf("pop = %d\n", pop);
    return 0;
}

$ gcc -Wall -m64 -msse4.2 popcnt.c -o popcnt
$ ./popcnt 
pop = 32
$ 
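The same runtime-argument idea applies to the intrinsic in this answer. Here is a minimal sketch, assuming an x86-64 target and a GCC or Clang version that allows intrinsics inside functions marked with target("popcnt"). The wrapper name popcount64_intrinsic is hypothetical, and <nmmintrin.h> is the SSE4.2 header Intel documents for _mm_popcnt_u64.

```c
#include <stdint.h>
#include <nmmintrin.h>   /* SSE4.2 header declaring _mm_popcnt_u64 */

/* Hypothetical wrapper: the value arrives at run time, so the compiler
   cannot replace the call with a precomputed constant. The target
   attribute lets the intrinsic be used without -msse4.2 on the command line. */
__attribute__((target("popcnt")))
int popcount64_intrinsic(uint64_t x)
{
    return (int)_mm_popcnt_u64(x);   /* result is at most 64, fits in int */
}
```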

EDIT

Oops: I just checked the disassembly output with gcc 4.2 and ICC 11.1. ICC 11.1 correctly generates popcntl or popcntq, but for some reason gcc does not; it calls ___popcountdi2 instead. Weird. I will try a newer version of gcc when I get a chance and see whether it's fixed. Otherwise, I guess the only workaround is to use ICC instead of gcc.
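If a particular compiler version refuses to emit the instruction, one more option besides switching compilers is GCC-style inline assembly. This is a sketch, assuming an x86-64 CPU with POPCNT and an assembler that knows the mnemonic; popcnt64_asm is a hypothetical name, and other targets fall back to the builtin.

```c
#include <stdint.h>

/* Hypothetical workaround: emit POPCNT directly, independent of how the
   compiler expands __builtin_popcountll. Check CPUID before relying on it. */
static inline uint64_t popcnt64_asm(uint64_t x)
{
#if defined(__x86_64__)
    uint64_t count;
    /* AT&T operand order is "popcnt source, destination".
       POPCNT writes the flags register, hence the "cc" clobber. */
    __asm__("popcnt %1, %0" : "=r"(count) : "rm"(x) : "cc");
    return count;
#else
    return (uint64_t)__builtin_popcountll(x);   /* non-x86 fallback */
#endif
}
```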

Paul R answered Sep 26 '22 18:09