Using AT&T syntax on x86-64, I wish to assemble c = a + b;
as
add %[a], %[b], %[c]
Unfortunately, GNU's assembler will not do it. Why not?
DETAILS
According to Intel's Software Developer's Manual, rev. 75 (June 2021), vol. 2, section 2.5,
VEX-encoded general-purpose-register instructions have ... instruction syntax support for three encodable operands.
The VEX prefix is an AVX feature, so x86-64 CPUs from Sandy Bridge/Bulldozer onward implement it. That's ten years ago, so GNU's assembler ought to assemble my three-operand instruction, oughtn't it?
To clarify, I am aware that one can write it in the old style as
mov %[a], %[c]
add %[b], %[c]
However, I wish to write it in the new, VEX style. Incidentally, I have informed the assembler that I have a modern CPU by issuing GCC the -march=skylake command-line option.
What is my mistake, please?
SAMPLE CODE
In a C++ wrapper,
    #include <cstddef>
    #include <iostream>

    int main()
    {
        volatile int a{8};
        volatile int b{5};
        volatile int c{0};
        //c = a + b;
        asm volatile (
            //"mov %[a], %[c]\n\t"
            //"add %[b], %[c]\n\t"
            "add %[a], %[b], %[c]\n\t"
            : [c] "=&r" (c)
            : [a] "r" (a), [b] "r" (b)
            : "cc"
        );
        std::cout << c << "\n";
    }
The add instruction adds together its two operands, storing the result in its destination operand (written first in Intel syntax, last in AT&T syntax). Note, whereas both operands may be registers, at most one operand may be a memory location. The inc instruction increments the contents of its operand by one. The dec instruction decrements the contents of its operand by one.
lea is an abbreviation of "load effective address". It loads the address of the location referenced by the source operand into the destination operand.
Only a few specific GPR instructions have VEX encodings, primarily the BMI1/BMI2 instructions that were added after AVX already existed. See the list in Table 2-28, which has ANDN, BEXTR, BLSI, BLSMSK, BLSR, BZHI, MULX, PDEP, PEXT, RORX, SARX, SHLX, SHRX, as well as the same list in 5.1.16.1. For example, andn's manual entry lists only a VEX encoding, and add's manual entry doesn't list any.
So Intel (unfortunately) didn't introduce a brand new three-operand alternate encoding for the entire instruction set. They just introduced a few specific instructions that take three operands and use VEX for it. In some cases these have similar or equivalent functionality to an existing instruction, e.g. SHLX for SHL with a variable count, and so effectively provide a three-operand version of the previous two-operand instruction, but only in those special cases. There are not equivalent instructions across the board.
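To make the SHLX case concrete, here is a minimal sketch of a three-operand VEX instruction used from GNU inline asm. The helper name shift_left is hypothetical, and the runtime check with __builtin_cpu_supports("bmi2") is my own addition so the code still works on a CPU without BMI2; it is an illustration, not something from the question.

```cpp
#include <cstdint>

// Hypothetical helper: returns x << n, with n taken modulo 32.
// Uses the three-operand VEX instruction SHLX when the CPU reports BMI2,
// and falls back to a plain C++ shift otherwise.
std::uint32_t shift_left(std::uint32_t x, std::uint32_t n)
{
    if (__builtin_cpu_supports("bmi2")) {
        std::uint32_t r;
        // AT&T operand order: count, source, destination.
        // SHLX does not modify the flags, so no "cc" clobber is declared.
        asm ("shlx %[n], %[x], %[r]"
             : [r] "=r" (r)
             : [x] "r" (x), [n] "r" (n));
        return r;
    }
    return x << (n & 31);
}
```

Unlike shl, the source register x is left unmodified and the count does not have to be in %cl, which is what makes this a true three-operand form.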
The "old style" two-operand form remains the only version of the add instruction. However, as fuz points out in comments, lea can be a good way to add two registers and write the result to a third, subject to some restrictions on operand size.
See "Using LEA on values that aren't addresses / pointers?" for more general things LEA can do, like copy-and-add a constant to a register, or shift-and-add. Compilers already know this and will use lea where appropriate, any time it saves instructions. (Or with some tune options like -mtune=atom for old in-order Atom, will use lea even when they could have used add.)
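As a rough sketch of the shift-and-add and add-a-constant uses of LEA, the hypothetical function below computes x*5 + 12 in one instruction. The function name is made up for illustration. The %q and %k operand modifiers are GCC's x86 modifiers that print the 64-bit and 32-bit names of a register, used here so the address is formed from 64-bit registers while the result is written to a 32-bit one.

```cpp
// Hypothetical example: x*5 + 12 as base + index*4 + displacement.
// Only the low 32 bits of the result are kept, so any garbage in the
// upper halves of the 64-bit input registers does not matter.
int times5_plus12(int x)
{
    int r;
    // LEA does not modify the flags, so no "cc" clobber is declared.
    asm ("lea 12(%q[x], %q[x], 4), %k[r]"
         : [r] "=r" (r)
         : [x] "r" (x));
    return r;
}
```

A compiler given `return x * 5 + 12;` at -O2 will typically pick a similar lea on its own, so this is only worth writing by hand as an experiment.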
If more flexible encodings of common integer instructions other than add existed, like and/xor/sub, gcc -O3 -march=skylake would already be using them in its own asm output, without needing inline asm. Or if alternative instructions could get the job done, like lea for add, it would be doing that, so it makes sense to look at compiler output to see what tricks it knows. Trying it yourself would make more sense as something to play around with in a stand-alone .s file that just makes an exit system call, or just to single-step, removing the complexity of using inline asm. (GAS by default doesn't restrict instruction sets. gcc -march=skylake doesn't pass that on to the assembler, as.)
In your inline asm, your c operand should be output-only: =r instead of +r. The old value is overwritten, so there's no need to tell the compiler to produce it as an input. (Like you said, you want c = a+b, not c += a+b.)
Using a single lea as the asm template means you don't need a =&r early-clobber output, because your asm will read all its inputs before writing that output. In your case, having it as an input/output was probably stopping the compiler from choosing the same register as one of the inputs, which could have broken with mov; add.
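Putting that together, a minimal sketch of c = a + b as a single lea with a plain "=r" output might look like the following. The function name add_lea is hypothetical, and the %q / %k operand modifiers (GCC's x86 modifiers for the 64-bit and 32-bit register names) are my choice for handling the operand-size restriction, assuming int operands on x86-64.

```cpp
// Hypothetical wrapper: c = a + b using one lea.
// The single instruction reads both inputs before it writes the output,
// so the compiler may safely reuse an input register for c and no
// early-clobber ("=&r") is needed. LEA leaves the flags untouched.
int add_lea(int a, int b)
{
    int c;
    asm ("lea (%q[a], %q[b]), %k[c]"
         : [c] "=r" (c)
         : [a] "r" (a), [b] "r" (b));
    return c;
}
```

With the two-instruction mov; add template, by contrast, the output is written before the second input is read, which is the situation the early-clobber modifier exists for.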