
How to get `gcc` to generate `bts` instruction for x86-64 from standard C?

Tags: c, gcc, x86-64

Inspired by a recent question, I'd like to know if anyone knows how to get gcc to generate the x86-64 bts instruction (bit test and set) on the Linux x86-64 platforms, without resorting to inline assembly or to nonstandard compiler intrinsics.

Related questions:

  • Why doesn't gcc do this for a simple |= operation where the right-hand side has exactly 1 bit set?

  • How to get bts using compiler intrinsics or the asm directive

Portability is more important to me than bts, so I won't use an asm directive, and if there's another solution, I prefer not to use compiler intrinsics.
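For concreteness, the kind of portable C in question might look like the sketch below. The helper name `test_and_set_bit` is hypothetical and chosen only for illustration. The function is not atomic, and whether gcc actually emits bts for it depends on the compiler version and optimization flags, which is exactly what is being asked.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper, not from any library: sets bit `bit` in *word and
 * returns 1 if it was already set, 0 otherwise. Plain standard C, so it is
 * NOT atomic; the compiler is free to use bts, or an and/or pair, for it. */
static inline int test_and_set_bit(unsigned long *word, unsigned bit)
{
    /* Shifting by the width of the type or more is undefined behaviour. */
    assert(bit < sizeof *word * CHAR_BIT);

    unsigned long mask = 1UL << bit;
    int was_set = (*word & mask) != 0;  /* the "test" half */
    *word |= mask;                      /* the "set" half  */
    return was_set;
}
```

Compiling this with `gcc -O2 -S` and reading the generated assembly is one way to check which instruction a given gcc version picks.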

EDIT: The C source language does not support atomic operations, so I'm not particularly interested in getting atomic test-and-set (even though that's the original reason for test-and-set to exist in the first place). If I want something atomic I know I have no chance of doing it with standard C source: it has to be an intrinsic, a library function, or inline assembly. (I have implemented atomic operations in compilers that support multiple threads.)

Norman Ramsey asked Jan 11 '10 04:01


2 Answers

The first answer to the first linked question already covers this: how much does it matter in the grand scheme of things? The only places where you typically test and set bits are:

  • Low-level drivers. However, if you are writing one you probably know assembly, the code is already tightly tied to the system, and most of the delays are probably in I/O anyway.
  • Testing for flags. This usually happens either at initialisation (once, at the beginning) or as part of some shared computation (which takes much more time).

The overall impact on the performance of applications and macrobenchmarks is likely to be minimal, even if microbenchmarks show an improvement.

Regarding the edit: using bts alone does not guarantee that the operation is atomic. All it guarantees is that it is atomic on the current core (the same is true of an or performed on memory). On multi-processor systems (uncommon) or multi-core systems (very common) you still have to synchronize with the other processors.

As synchronization is much more expensive, I believe that the difference between:

asm("lock bts %1, %0" : "+m" (*array) : "r" (bit));

and

asm("lock or %1, %0" : "+m" (*array) : "r" (1 << bit));

is minimal. (Both are written here in gcc's default AT&T operand order: source first, memory destination last.) And the second form:

  • Can set several flags at once
  • Has a nice __sync_fetch_and_or (array, 1 << bit) form (which works with gcc and the Intel compiler, as far as I remember).
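As a minimal sketch of that last point, the gcc builtin `__sync_fetch_and_or` can be wrapped into an atomic test-and-set of a single bit. The wrapper names `atomic_test_and_set_bit` and `atomic_set_flags` are hypothetical, and the builtin is a compiler extension rather than standard C, so this assumes gcc or a compatible compiler.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical wrapper: atomically sets bit `bit` in *word and returns
 * 1 if it was already set, 0 otherwise. Relies on the gcc builtin
 * __sync_fetch_and_or, which returns the value held before the OR. */
static inline int atomic_test_and_set_bit(unsigned int *word, unsigned bit)
{
    assert(bit < sizeof *word * CHAR_BIT);

    unsigned int mask = 1u << bit;
    unsigned int old = __sync_fetch_and_or(word, mask);
    return (old & mask) != 0;
}

/* Hypothetical wrapper: atomically sets several flags at once and
 * returns the previous value of the word. */
static inline unsigned int atomic_set_flags(unsigned int *word,
                                            unsigned int flags)
{
    return __sync_fetch_and_or(word, flags);
}
```

Which instruction the compiler picks for the builtin (for example a locked or, or a compare-and-swap loop when the old value is needed) is left to gcc and may vary with version and target flags.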
Maciej Piechotka answered Oct 24 '22 16:10


I use the gcc atomic builtins such as __sync_lock_test_and_set (http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html). Changing the -march flag directly affects the code that is generated. I'm using it with i686 right now, but http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/i386-and-x86_002d64-Options.html#i386-and-x86_002d64-Options lists all the possibilities.

I realize it's not exactly what you are asking for, but I found those two web pages very useful when I was looking for mechanisms like that.
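For illustration, here is a small sketch of how `__sync_lock_test_and_set` is typically paired with `__sync_lock_release`. The names `demo_try_lock` and `demo_unlock` are hypothetical, and both builtins are gcc extensions, so this is not standard C.

```c
/* Hypothetical try-lock built on the gcc builtin: atomically stores 1
 * into *lock and returns 1 if the lock was free (previous value 0),
 * or 0 if it was already held. */
static inline int demo_try_lock(int *lock)
{
    return __sync_lock_test_and_set(lock, 1) == 0;
}

/* Releases the lock by atomically writing 0 back, using the matching
 * gcc builtin. */
static inline void demo_unlock(int *lock)
{
    __sync_lock_release(lock);
}
```

A caller would initialise the lock variable to 0, retry `demo_try_lock` until it returns 1, and call `demo_unlock` when done.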

laura answered Oct 24 '22 16:10