Embedding assembler within C++ acceptable?

Question

If you're writing a very latency-sensitive application, what are the limits to embedding assembler within C++ functions (while calling those C++ functions normally), like so:

inline __int64 GetCpuClocks()
{

    // Counter
    struct { __int32 low, high; } counter; // __int32 is the MSVC built-in 32-bit type

    // Use RDTSC instruction to get clocks count
    __asm push EAX
    __asm push EDX
    __asm __emit 0fh __asm __emit 031h // RDTSC
    __asm mov counter.low, EAX
    __asm mov counter.high, EDX
    __asm pop EDX
    __asm pop EAX

    // Return result
    return *(__int64 *)(&counter);

}

(The above function came from another SO post I saw)

Can you treat functions with inlined assembler like a black box? Can you easily retrieve a result from calculations performed in assembler? Is there a danger that you don't know which variables are currently in registers, etc.? Does it cause more problems than it solves, or is it acceptable for specific small tasks?

(Assume your architecture is going to be fixed, and known)

EDIT I just found this, this is what I am hinting at:

http://www.codeproject.com/Articles/15971/Using-Inline-Assembly-in-C-C

EDIT2 This is aimed more towards Linux and x86; it's just a general C++/assembler question (or so I thought).

Jonas Schäfer · Accepted Answer

I'd like to answer the subquestion:

Does it cause more problems than it solves, or is it acceptable for specific small tasks?

It certainly does cause more problems than it solves! By using inline assembler, you take away the compiler's ability to optimize that code. It cannot do partial expression substitution or any other fancy optimization across the assembler block. It is really, really hard to produce code which is better than what the compiler emits with -O3. And as a bonus, compiled code gets even better with the next compiler release (presuming that the next compiler release doesn't break it ;) ).

Compilers usually grasp a wider scope than human brains ever could (or should, to ensure sanity): they can inline the right function at the right place and do a partial expression substitution which makes the code more efficient. These are things you would never do in ASM, because your code would become unreadable as hell.

As an anecdotal reference, I'd like to point to this post by Linus Torvalds about the git implementation of SHA1, which outperforms the hand-optimized SHA1 in libcrypt.

In fact, I think the only reasonable uses of inline assembler nowadays are calling processor instructions which are not available otherwise (the one you quoted is available; on Linux, for example, as clock_gettime, at least if you're only after a high-resolution time counter) and doing things where you need to trick the compiler (for example when implementing foreign function interfaces).
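To illustrate the point about such instructions being reachable without inline assembler, here is a minimal sketch, assuming GCC or Clang on Linux. The function names monotonic_ns and cpu_clocks are made up for this example; clock_gettime is the POSIX call mentioned above, and __rdtsc is the compiler intrinsic declared in <x86intrin.h>. On non-x86 targets the sketch simply falls back to the clock.

```cpp
#include <cstdint>
#include <ctime>        // clock_gettime, timespec (POSIX)

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // __rdtsc intrinsic on GCC/Clang
#endif

// Monotonic time in nanoseconds, obtained through the POSIX clock_gettime call.
// (Hypothetical helper name, not from the original post.)
inline std::uint64_t monotonic_ns() {
    timespec ts{};
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<std::uint64_t>(ts.tv_sec) * 1000000000ull
         + static_cast<std::uint64_t>(ts.tv_nsec);
}

// Raw time-stamp counter via the intrinsic: no inline asm, so the compiler
// knows exactly which registers are involved and can schedule around it.
// (Hypothetical helper name; falls back to monotonic_ns off x86.)
inline std::uint64_t cpu_clocks() {
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    return monotonic_ns();
#endif
}
```

Either function can be called like any other C++ function, for example by taking the difference of two cpu_clocks() readings around the code being measured.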

On the snippet and what others said: especially with such functions, you'll get a performance penalty. In inline asm, you have to be super careful that the registers are kept in the state the compiler assumes them to be in (hence the push/pop above). If you write the code normally, the compiler takes care of this, keeping in registers exactly those variables for which it makes sense and putting those which do not fit on the stack.
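For comparison, here is a sketch of how the same counter read might look with GCC/Clang extended asm rather than the MSVC-style block in the question (an assumption, given the Linux focus of the question). The name read_tsc is hypothetical. The output constraints tell the compiler which registers the instruction writes, so it does the bookkeeping itself and no manual push/pop is needed.

```cpp
#include <cstdint>

// Reads the x86 time-stamp counter using GCC/Clang extended asm.
// "=a"(lo) and "=d"(hi) declare that RDTSC writes EAX and EDX, so the
// compiler keeps live values out of those registers on its own.
// (Hypothetical helper name; returns 0 on non-x86 targets.)
inline std::uint64_t read_tsc() {
#if defined(__x86_64__) || defined(__i386__)
    std::uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    // Combine the two 32-bit halves without pointer casts.
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
#else
    return 0; // the instruction does not exist on this architecture
#endif
}
```

Because the result comes back through ordinary C++ variables, the caller can treat read_tsc() as a normal function; the constraints are what make the "black box" safe.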

Trust your compiler. It's smart. Most of the time. Invest the time you save by not using inline assembler in thinking about smart, fast algorithms and learning the relevant compiler switches (e.g. to enable SSE optimizations).

Almo · Answer

If the asm in question pushes every register it uses at the top and pops them again at the bottom, I think you're safe not to worry about it.

In your example, these are the __asm push EAX and __asm pop EAX instructions.

The real answer, I suppose, is that you need to know enough about what the asm does to be sure you can treat it as a black box. :)

Tags: c++, performance, assembly

user997112
