For the following code:
long buf[64];
register long rrax asm ("rax");
register long rrbx asm ("rbx");
register long rrsi asm ("rsi");
rrax = 0x34;
rrbx = 0x39;
__asm__ __volatile__ ("movq $buf,%rsi");
__asm__ __volatile__ ("movq %rax, 0(%rsi);");
__asm__ __volatile__ ("movq %rbx, 8(%rsi);");
printf( "buf[0] = %lx, buf[1] = %lx!\n", buf[0], buf[1] );
I get the following output:
buf[0] = 0, buf[1] = 346161cbc0!
while it should have been:
buf[0] = 34, buf[1] = 39!
Any ideas why it is not working properly, and how to solve it?
In computer programming, an inline assembler is a feature of some compilers that allows low-level code written in assembly language to be embedded within a program, among code that otherwise has been compiled from a higher-level language such as C or Ada.
The __volatile__ modifier on an __asm__ block forces the compiler's optimizer to execute the code as-is. Without it, the optimizer may think it can be either removed outright, or lifted out of a loop and cached.
Inline assembly code can use any C variable or function name that is in scope, so it is easy to integrate it with your program's C code. Because the assembly code can be mixed inline with C or C++ statements, it can do tasks that are cumbersome or impossible in C or C++.
In general, Clang is highly compatible with the GCC inline assembly extensions, allowing the same set of constraints, modifiers and operands as GCC inline assembly.
Slightly off-topic but I'd like to follow up a bit on gcc inline assembly.
The (non-)need for __volatile__ comes from the fact that GCC optimizes inline assembly. GCC inspects the assembly statement for side effects / prerequisites, and if it finds them not to exist it may choose to move the assembly instruction around or even decide to remove it. All __volatile__ does is to tell the compiler "stop caring and put this right there", which is usually not what you really want.
This is where the need for constraints comes in. The name is overloaded and actually used for different things in GCC inline assembly:
- constraints specify the input/output operands used in the asm() block
- constraints specify the "clobber list", which details what state (registers, condition codes, memory) is affected by the asm().
In many cases, developers abuse __volatile__ because they noticed their code either being moved around or even disappearing without it. If this happens, it's usually rather a sign that the developer has neglected to tell GCC about side effects / prerequisites of the assembly. For example, this buggy code:
register int foo __asm__("rax") = 1234;
register int bar __asm__("rbx") = 4321;
asm("add %rax, %rbx");
printf("I'm expecting 'bar' to be 5555 it is: %d\n", bar);
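It's got several bugs:
- Register names are written with a single % here. In inline assembly with operands, %% are needed, but in the above if you actually specify them you get a compiler/assembler error, /tmp/ccYPmr3g.s:22: Error: bad register name '%%rax'.
- It doesn't tell the compiler when and where the variables are needed or used. Instead, it assumes the compiler honours asm() literally. That might be true for Microsoft Visual C++ but is not the case for gcc.
If you compile it without optimization, it creates: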
0000000000400524 <main>:
[ ... ]
  400534: b8 d2 04 00 00    mov    $0x4d2,%eax
  400539: bb e1 10 00 00    mov    $0x10e1,%ebx
  40053e: 48 01 c3          add    %rax,%rbx
  400541: 48 89 da          mov    %rbx,%rdx
  400544: b8 5c 06 40 00    mov    $0x40065c,%eax
  400549: 48 89 d6          mov    %rdx,%rsi
  40054c: 48 89 c7          mov    %rax,%rdi
  40054f: b8 00 00 00 00    mov    $0x0,%eax
  400554: e8 d7 fe ff ff    callq  400430 <printf@plt>
[...]
You can find your add instruction, and the initializations of the two registers, and it'll print the expected result. If, on the other hand, you crank optimization up, something else happens:
0000000000400530 <main>:
  400530: 48 83 ec 08       sub    $0x8,%rsp
  400534: 48 01 c3          add    %rax,%rbx
  400537: be e1 10 00 00    mov    $0x10e1,%esi
  40053c: bf 3c 06 40 00    mov    $0x40063c,%edi
  400541: 31 c0             xor    %eax,%eax
  400543: e8 e8 fe ff ff    callq  400430 <printf@plt>
[ ... ]
Your initializations of both the "used" registers are no longer there. The compiler discarded them because nothing it could see was using them, and while it kept the assembly instruction it put it before any use of the two variables. It's there but it does nothing. (Luckily, actually: if rax / rbx had been in use, who can tell what would have happened.)
And the reason for that is that you haven't actually told GCC that the assembly is using these registers / these operand values. This has nothing whatsoever to do with volatile but everything to do with the fact that you're using a constraint-free asm() expression.
The way to do this correctly is via constraints, i.e. you'd use:
int foo = 1234;
int bar = 4321;
asm("add %1, %0" : "+r"(bar) : "r"(foo));
printf("I'm expecting 'bar' to be 5555 it is: %d\n", bar);
This tells the compiler that the assembly:
- has one operand in a register, "+r"(...), that both needs to be initialized before the assembly statement and is modified by the assembly statement, and associates the variable bar with it.
- has a second operand in a register, "r"(...), that needs to be initialized before the assembly statement and is treated as readonly / not modified by the statement. Here, foo is associated with that.
Notice no register assignment is specified - the compiler chooses that depending on the variables / state of the compile. The (optimized) output of the above:
0000000000400530 <main>:
  400530: 48 83 ec 08       sub    $0x8,%rsp
  400534: b8 d2 04 00 00    mov    $0x4d2,%eax
  400539: be e1 10 00 00    mov    $0x10e1,%esi
  40053e: bf 4c 06 40 00    mov    $0x40064c,%edi
  400543: 01 c6             add    %eax,%esi
  400545: 31 c0             xor    %eax,%eax
  400547: e8 e4 fe ff ff    callq  400430 <printf@plt>
[ ... ]
GCC inline assembly constraints are almost always necessary in some form or other, but there can be multiple possible ways of describing the same requirements to the compiler; instead of the above, you could also write:
asm("add %1, %0" : "=r"(bar) : "r"(foo), "0"(bar));
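This tells gcc:
- the statement has an output operand, the variable bar, that after the statement will be found in a register, "=r"(...)
- the statement has an input operand, the variable foo, which is to be placed into a register, "r"(...)
- operand zero is also an input operand and is to be initialized with bar, "0"(...)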
Or, again an alternative:
asm("add %1, %0" : "+r"(bar) : "g"(foo));
which tells gcc:
- the statement has an output operand (the variable bar, both input/output)
- the statement has an input operand, the variable foo, which the statement doesn't care whether it's in a register, in memory or a compile-time constant (that's the "g"(...) constraint)
The result is different from the former:
0000000000400530 <main>:
  400530: 48 83 ec 08       sub    $0x8,%rsp
  400534: bf 4c 06 40 00    mov    $0x40064c,%edi
  400539: 31 c0             xor    %eax,%eax
  40053b: be e1 10 00 00    mov    $0x10e1,%esi
  400540: 81 c6 d2 04 00 00 add    $0x4d2,%esi
  400546: e8 e5 fe ff ff    callq  400430 <printf@plt>
[ ... ]
because now, GCC has actually figured out foo is a compile-time constant and simply embedded the value in the add instruction! Isn't that neat?
Admittedly, this is complex and takes getting used to. The advantage is that letting the compiler choose which registers to use for what operands allows optimizing the code overall; if, for example, an inline assembly statement is used in a macro and/or a static inline function, the compiler can, depending on the calling context, choose different registers at different instantiations of the code. Or if a certain value is compile-time evaluatable / constant in one place but not in another, the compiler can tailor the created assembly for it.
Think of GCC inline assembly constraints as kind of "extended function prototypes" - they tell the compiler what types and locations for arguments / return values are, plus a bit more. If you don't specify these constraints, your inline assembly is creating the analogue of functions that operate on global variables/state only - which, as we probably all agree, are rarely ever doing exactly what you intended.
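To make the "extended function prototype" idea concrete, here is a minimal sketch of the add example wrapped in a static inline function. The name add_asm is hypothetical, and the asm path assumes an x86-64 target with GCC or Clang; on other targets the sketch falls back to plain C.

```c
/* Hypothetical helper: returns a + b using a single x86-64 add. */
static inline int add_asm(int a, int b)
{
#if defined(__x86_64__)
    __asm__ ("add %1, %0"
             : "+r"(a)   /* %0: in a register, read and written */
             : "g"(b)    /* %1: register, memory or constant    */
             : "cc");    /* add modifies the condition codes    */
    return a;
#else
    return a + b;        /* fallback for non-x86-64 targets */
#endif
}
```

Calling add_asm(4321, 1234) returns 5555. Because the operands are described through constraints, the compiler is free to pick different registers, or an immediate, at each place the function is inlined.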
You clobber memory but don't tell GCC about it, so GCC can cache values in buf
across assembly calls. If you want to use inputs and outputs, tell GCC about everything.
__asm__ (
"movq %1, 0(%0)\n\t"
"movq %2, 8(%0)"
: /* Outputs (none) */
: "r"(buf), "r"(rrax), "r"(rrbx) /* Inputs */
: "memory"); /* Clobbered */
You also generally want to let GCC handle most of the mov, register selection, etc. Even if you explicitly constrain the registers (rrax is still %rax), let the information flow through GCC or you will get unexpected results.
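For reference, the fix above can be packaged as a small function. This is only a sketch: the name store_pair and its parameters are made up for illustration, and the asm path assumes x86-64 with GCC or Clang (other targets take the plain C fallback).

```c
/* Hypothetical helper: writes a to out[0] and b to out[1]. */
static void store_pair(long *out, long a, long b)
{
#if defined(__x86_64__)
    __asm__ ("movq %1, 0(%0)\n\t"
             "movq %2, 8(%0)"
             : /* no outputs */
             : "r"(out), "r"(a), "r"(b)  /* GCC picks the registers   */
             : "memory");                /* the asm writes to memory  */
#else
    out[0] = a;                          /* fallback for other targets */
    out[1] = b;
#endif
}
```

With long buf[64], calling store_pair(buf, 0x34, 0x39) leaves 0x34 in buf[0] and 0x39 in buf[1], which is the output the question expected.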
__volatile__ is wrong
The reason __volatile__ exists is so you can guarantee that the compiler places your code exactly where it is... which is a completely unnecessary guarantee for this code. It's necessary for implementing advanced features such as memory barriers, but almost completely worthless if you are only modifying memory and registers.
GCC already knows that it can't move this assembly after printf because the printf call accesses buf, and buf could be clobbered by the assembly. GCC already knows that it can't move the assembly before rrax = 0x34; because rax is an input to the assembly code. So what does __volatile__ get you? Nothing.
If your code does not work without __volatile__ then there is an error in the code which should be fixed instead of just adding __volatile__ and hoping that makes everything better. The __volatile__ keyword is not magic and should not be treated as such.
Alternative fix:
Is __volatile__ necessary for your original code? No. Just mark the inputs and clobber values correctly.
/* The "S" constraint means %rsi, "b" means %rbx, and "a" means %rax
The inputs and clobbered values are specified. There is no output
so that section is blank. */
rrsi = (long) buf;
__asm__ ("movq %%rax, 0(%%rsi)" : : "a"(rrax), "S"(rrsi) : "memory");
__asm__ ("movq %%rbx, 8(%%rsi)" : : "b"(rrbx), "S"(rrsi) : "memory");
Why __volatile__ doesn't help you here:
rrax = 0x34; /* Dead code */
GCC is well within its rights to completely delete the above line, since the code in the question above claims that it never uses rrax.
long global;
void store_5(void)
{
register long rax asm ("rax");
rax = 5;
__asm__ __volatile__ ("movq %rax, (global)"); /* no operands, so a single % */
}
The disassembly is more or less as you expect it at -O0:
movl $5, %eax
movq %rax, (global)
But with optimization off, you can be fairly sloppy about assembly. Let's try -O2:
movq %rax, (global)
Whoops! Where did rax = 5; go? It's dead code, since %rax is never used in the function, at least as far as GCC knows. GCC doesn't peek inside assembly. What happens when we remove __volatile__?
; empty
Well, you might think __volatile__ is doing you a service by keeping GCC from discarding your precious assembly, but it's just masking the fact that GCC thinks your assembly isn't doing anything. GCC thinks your assembly takes no inputs, produces no outputs, and clobbers no memory. You had better straighten it out:
long global;
void store_5(void)
{
register long rax asm ("rax");
rax = 5;
__asm__ __volatile__ ("movq %%rax, (global)" : : : "memory");
}
Now we get the following output:
movq %rax, (global)
Better. But if you tell GCC about the inputs, it will make sure that %rax is properly initialized first:
long global;
void store_5(void)
{
register long rax asm ("rax");
rax = 5;
__asm__ ("movq %%rax, (global)" : : "a"(rax) : "memory");
}
The output, with optimizations:
movl $5, %eax
movq %rax, (global)
Correct! And we don't even need to use __volatile__.
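As a possible refinement (not part of the answer above), the store can also be described with a memory output operand instead of a blanket "memory" clobber, so GCC knows exactly which object the assembly writes. A minimal sketch, assuming x86-64 with GCC or Clang and reusing the names global and store_5 from the example:

```c
long global;

void store_5(void)
{
#if defined(__x86_64__)
    __asm__ ("movq %1, %0"
             : "=m"(global)  /* %0: the asm writes exactly this object */
             : "r"(5L));     /* %1: GCC loads 5 into a register for us */
#else
    global = 5;              /* fallback for non-x86-64 targets */
#endif
}
```

Here GCC chooses both the register and the addressing mode for global, and since the output is declared, the statement is kept without __volatile__ for as long as global is actually used afterwards.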
So why does __volatile__ exist?
The primary correct use for __volatile__ is if your assembly code does something else besides input, output, or clobbering memory. Perhaps it messes with special registers which GCC doesn't know about, or affects IO. You see it a lot in the Linux kernel, but it's misused very often in user space.
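One case where __volatile__ is arguably justified is reading a value that changes even though none of the declared inputs do. The sketch below reads the x86 timestamp counter; the helper name read_tsc is hypothetical, the asm path assumes x86-64 with GCC or Clang, and other targets simply return 0.

```c
#include <stdint.h>

/* Hypothetical helper: reads the x86-64 timestamp counter. */
static inline uint64_t read_tsc(void)
{
#if defined(__x86_64__)
    uint32_t lo, hi;
    /* rdtsc has no inputs, so without __volatile__ the compiler may
       assume two executions yield the same value and merge them. */
    __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    return 0;  /* fallback: no timestamp counter assumed */
#endif
}
```

The outputs are still declared with constraints ("=a" for %eax, "=d" for %edx); __volatile__ only adds the guarantee that each call really executes the instruction.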
The __volatile__ keyword is very tempting because we C programmers often like to think we're almost programming in assembly language already. We're not. C compilers do a lot of data flow analysis, so you need to explain the data flow to the compiler for your assembly code. That way, the compiler can safely manipulate your chunk of assembly just like it manipulates the assembly that it generates.
If you find yourself using __volatile__ a lot, as an alternative you could write an entire function or module in an assembly file.