
making mistake in inline assembler in gcc [duplicate]

I have successfully written some inline assembler in gcc to rotate right one bit following some nice instructions: http://www.cs.dartmouth.edu/~sergey/cs108/2009/gcc-inline-asm.pdf

Here's an example:

static inline int ror(int v) {
    asm ("ror %0;" :"=r"(v) /* output */ :"0"(v) /* input */ );
    return v;
}

However, I want code to count clock cycles, and the examples I have seen are in a different (probably Microsoft) inline-assembly format. I don't know how to do the same thing in gcc. Any help?

unsigned __int64 inline GetRDTSC() {
   __asm {
      ; Flush the pipeline
      XOR eax, eax
      CPUID
      ; Get RDTSC counter in edx:eax
      RDTSC
   }
}

I tried:

static inline unsigned long long getClocks() {
    asm("xor %%eax, %%eax" );
    asm(CPUID);
    asm(RDTSC : : %%edx %%eax); //Get RDTSC counter in edx:eax

but I don't know how to get the edx:eax pair to return as 64 bits cleanly, and don't know how to really flush the pipeline.

Also, the best source code I found was at: http://www.strchr.com/performance_measurements_with_rdtsc

and that mentioned the Pentium, so if there are different ways of doing it on different Intel/AMD variants, please let me know. I would prefer something that works on all x86 platforms, even if it's a bit ugly, over a separate solution for each variant, but I wouldn't mind knowing about those too.

Dov asked Dec 17 '10 18:12


2 Answers

The following does what you want:

static inline unsigned long long rdtsc(void) {
  unsigned int lo, hi;
  asm volatile (
     "cpuid \n"
     "rdtsc" 
   : "=a"(lo), "=d"(hi) /* outputs */
   : "a"(0)             /* inputs */
   : "%ebx", "%ecx");     /* clobbers*/
  return ((unsigned long long)lo) | (((unsigned long long)hi) << 32);
}

It is important to keep inline asm as small as possible, because the compiler cannot optimize what is inside an asm statement. That's why I've done the shift and OR of the result in C code rather than coding that in asm as well. Similarly, I use the "a" input of 0 to let the compiler decide when and how to zero out eax. If some other code in your program has already zeroed it, the compiler can save an instruction.
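
To make the C side of that concrete, here is a minimal sketch of the shift-and-OR step on its own. The helper name combine_halves is made up for illustration and is not part of the code above. The one detail worth noting is that the cast to unsigned long long has to happen before the shift, because shifting a 32-bit value by 32 is undefined in C.

```c
/* Hypothetical helper (name chosen for illustration): joins the two
   32-bit halves that rdtsc leaves in eax (low) and edx (high) into
   one 64-bit value. */
static unsigned long long combine_halves(unsigned int lo, unsigned int hi) {
  /* Widen first, then shift: hi << 32 on a 32-bit type is undefined. */
  return ((unsigned long long)lo) | (((unsigned long long)hi) << 32);
}
```

For example, combine_halves(0x2, 0x1) gives 0x100000002.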

Also, the "clobbers" above are very important. CPUID overwrites everything in eax, ebx, ecx, and edx. You need to tell the compiler that you're changing these registers so that it knows not to keep anything important there. You don't have to list eax and edx because you're using them as outputs. If you don't list the clobbers, there's a serious chance your program will crash and you will find it extremely difficult to track down the issue.
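
As a rough sketch of how a function like this might be used to time a region of code (x86 or x86-64 with gcc or clang only): read the counter before and after, then subtract. The names busy_work and ticks_for_busy_work are hypothetical and exist only so there is something to measure. The asm keyword is spelled __asm__ here so the sketch also compiles under a strict -std=c99.

```c
/* Reads the time stamp counter, with cpuid in front as a serializing
   instruction. Marked static so it links cleanly when compiled as C. */
static inline unsigned long long rdtsc(void) {
  unsigned int lo, hi;
  __asm__ volatile (
     "cpuid \n"
     "rdtsc"
   : "=a"(lo), "=d"(hi)   /* outputs: counter comes back in edx:eax */
   : "a"(0)               /* input: cpuid reads eax */
   : "%ebx", "%ecx");     /* clobbers: cpuid also writes ebx and ecx */
  return ((unsigned long long)lo) | (((unsigned long long)hi) << 32);
}

/* Hypothetical workload: sums 0..n-1. The volatile accumulator keeps
   the compiler from optimizing the loop away. */
static unsigned long long busy_work(unsigned long long n) {
  volatile unsigned long long sum = 0;
  for (unsigned long long i = 0; i < n; i++)
    sum += i;
  return sum;
}

/* Hypothetical wrapper: counter ticks spent in busy_work(n), including
   the cost of the two counter reads themselves. */
static unsigned long long ticks_for_busy_work(unsigned long long n) {
  unsigned long long start = rdtsc();
  (void)busy_work(n);
  unsigned long long end = rdtsc();
  return end - start;
}
```

The result is best read as counter ticks rather than exact core cycles, since on many newer CPUs the counter runs at a constant rate regardless of the current clock speed.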

SoapBox answered Sep 23 '22 22:09


This will store the result in value. Combining the results takes extra cycles, so the number of cycles between calls to this code will be a few less than the difference in results.

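unsigned int hi, lo;
unsigned long long value;
asm volatile (
    "cpuid\n\t"
    "rdtsc"
    : "=d" (hi), "=a" (lo)   /* outputs: rdtsc leaves the counter in edx:eax */
    : "a" (0)                /* input: cpuid reads eax */
    : "%ebx", "%ecx"         /* clobbers: cpuid also overwrites ebx and ecx */
);
value = (((unsigned long long)hi) << 32) | lo;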
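
One way to account for those extra cycles, sketched under the assumption that the cost of a read is roughly constant: take two reads back to back with nothing in between, keep the smallest difference seen, and subtract that from later measurements. The names read_tsc, read_overhead and net_ticks are made up for this example, and the code is for x86 or x86-64 with gcc or clang only.

```c
/* Reads the time stamp counter, with cpuid in front as a serializing
   instruction. */
static inline unsigned long long read_tsc(void) {
  unsigned int hi, lo;
  __asm__ volatile (
      "cpuid\n\t"
      "rdtsc"
      : "=d" (hi), "=a" (lo)   /* outputs: edx:eax */
      : "a" (0)                /* input: cpuid reads eax */
      : "%ebx", "%ecx"         /* clobbers: cpuid writes ebx and ecx */
  );
  return (((unsigned long long)hi) << 32) | lo;
}

/* Smallest difference observed between two back-to-back reads,
   taken over the given number of samples. */
static unsigned long long read_overhead(int samples) {
  unsigned long long best = ~0ULL;
  for (int i = 0; i < samples; i++) {
    unsigned long long a = read_tsc();
    unsigned long long b = read_tsc();
    if (b - a < best)
      best = b - a;
  }
  return best;
}

/* Raw difference minus the measured overhead, clamped at zero. */
static unsigned long long net_ticks(unsigned long long raw,
                                    unsigned long long overhead) {
  return raw > overhead ? raw - overhead : 0;
}
```

Taking the minimum rather than the average is a design choice: it discards samples inflated by interrupts or cache misses.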

ughoavgfhw answered Sep 24 '22 22:09