Novice here, with a frustratingly simple question. I'm trying to learn assembly and this has been a stumbling block for me for so long, I would really appreciate help with these concepts. Thanks so much.
Take the following statement:
movq $5, %rax
This is moving the value 5 itself into the register %rax, yes? That is to say, if I subsequently use %rax in an addition statement, it's going to treat that as the number 5 itself -- it's not going to try to add some memory address -- it's going to add the actual value 5. If I wanted to treat the number as a memory address, I'd have to leave the dollar sign off, yes? Then it would be treated as a memory address, not a numerical value, right?
And yet, if I define a label:
.section .data
my_number:
.quad 5
and use the label to write the same statement:
movq my_number, %rax
suddenly everything is inverted. I now have to omit the dollar sign to get the same result. Why?
This statement is going to mov the value 5 itself into %rax again, just like the previous statement, right? If I were to use the dollar sign before my number, then I'd get the memory address. Which is the opposite of how it worked before. Before, using the literal, the dollar sign gave me the integer value (5), and leaving the dollar sign off gave me the memory address. Now, in the example with my_number, leaving the dollar sign off gives me value, and using it gives me the memory address. Why the change? What happened?
It seems to me that the function of the dollar sign completely reverses itself from one example (movq $5, %rax) to another (movq my_number, %rax). These two instructions have the same functionality, they do the same thing, so why does one require the dollar sign and the other doesn't? Obviously my understanding of the concepts of values versus memory addresses has some major flaw, and I just haven't been able to identify it despite literally many, many hours of reading through forums, programming books, instructional videos, etc. -- several times in the past I gave up when I reached this point because I couldn't find an answer. Every time I try to revisit assembly language I reach this same obstacle.
Please help. Thank you in advance.
The first thing to understand is that the syntax is what it is. You can look for explanations of why it is the way it is, but it's hard to force it into a logical system that doesn't necessarily exist. Nevertheless, I have previously written about why the syntax is the way it is and how exactly addressing modes work.
That said, to resolve your question, here is a way to think about it: First, the value of a symbol is its address, not whatever is stored at that address. The assembler doesn't distinguish values from addresses. For all it knows, if you write
movq $5, %rax
you are loading the address 5 into register rax. Or the value 5. Who knows? Not the assembler. If you write
foo: .quad 5
the value 5 will be placed somewhere in memory and the symbol foo will be assigned its address. Writing foo does not have the same effect as writing 5 because foo is where 5 is stored, not 5 itself. Of course you can also make foo resolve to the address 5 by writing
.equ foo, 5
or equivalently
foo= 5
This sets the address of foo to 5 and does not allocate any memory.
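A minimal sketch of what that means in practice, assuming x86-64 Linux, GNU as, and GNU ld. The file name equ.s and the use of the process exit status to observe the result are my own choices for illustration. Because foo is now just the number 5, $foo is the immediate 5, while a bare foo would be a memory operand at address 5.

```asm
# equ.s (hypothetical file name)
# Build and run: as -o equ.o equ.s && ld -o equ equ.o && ./equ; echo $?
        .equ    foo, 5              # foo is the number 5; no memory is reserved
        .globl  _start
        .text
_start:
        movq    $foo, %rdi          # immediate operand: rdi = 5, same as movq $5, %rdi
        # movq  foo, %rdi           # memory operand: would load 8 bytes from address 5 and fault
        movl    $60, %eax           # exit system call number on x86-64 Linux
        syscall                     # exit(rdi)
```

The shell should report an exit status of 5. Uncommenting the second movq makes the program crash instead, because nothing is mapped at address 5.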
Now why is the $ “decoration” needed in some cases but not in others?
Operands to an instruction always have an addressing mode. That is, they specify how the operand is obtained. An operand that starts with $ is an immediate operand, i.e. its value is encoded into the instruction. An operand that is just a plain expression (something like 5, foo, or 5+foo) is an absolutely addressed memory operand, i.e. the operand is an absolute address at which the value is found.
Directives (things like .quad, but also assignment through =) however do not have addressing modes. They just take expressions and then do something with the expression, like place its value into memory. Therefore, their operands look like absolutely addressed memory operands, but aren't. They are just expressions with no syntactically implied addressing mode.
So that's why a naked expression sometimes indicates a memory reference and sometimes seems to indicate an immediate value. Context matters.
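To put the cases side by side, here is a small sketch, assuming x86-64 Linux, GNU as, and a non-PIE link with plain ld. The file name operands.s, the register choices, and the exit-status check are mine and not part of the question.

```asm
# operands.s (hypothetical file name)
# Build and run: as -o operands.o operands.s && ld -o operands operands.o && ./operands; echo $?
        .globl  _start
        .text
_start:
        movq    $5, %rax            # immediate: rax = 5
        movq    my_number, %rbx     # memory operand: rbx = the 8 bytes stored at my_number (5)
        movq    $my_number, %rcx    # immediate: rcx = the address of my_number
        movq    (%rcx), %rdx        # load through that address: rdx = 5
        # movq  5, %rsi             # memory operand at absolute address 5: would fault
        addq    %rbx, %rax          # rax = 5 + 5 = 10
        addq    %rdx, %rax          # rax = 10 + 5 = 15
        movq    %rax, %rdi          # exit status = 15
        movl    $60, %eax           # exit system call number on x86-64 Linux
        syscall
        .data
my_number:                          # the symbol's value is this address
        .quad   5                   # directive: a plain expression, stored as data
```

The exit status should be 15. Note that movq $my_number, %rcx only links when the address fits in a sign-extended 32-bit immediate, which holds for a non-PIE executable like this one.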
There are other syntaxes like Plan 9 syntax where directives do take addressing modes. For example, in Plan 9 syntax you'd write
DATA my_number(SB)/8, $5
with both an addressing mode for my_number and for the immediate 5 to write what AT&T syntax does with
my_number: .quad 5
However, that's Plan 9 syntax, not AT&T syntax. It's different.
The code that gets generated for (A):
mov $5, %rax
is fundamentally different from the code generated for (B):
mov my_number, %rax
Both will have the result of putting the number 5 into rax, but A will generate an immediate load of the number 5 into rax, while B will load the number from memory -- specifically, from the .data section of your running executable.
To see this, we can look at the generated code for each instruction. Here is your example:
# loads.s
.global test
.text
test:
movq $5, %rax
movq my_number, %rax
ret
.data # switch to the .data section.
# Without this, my_number would be contiguous with the machine code
my_number:
.quad 5
I assembled it with
as -o loads.o loads.s
and linked it with
ld -o loads -no-pie loads.o
and now we can view the machine code in the .text section with
objdump -dw loads
which prints:
Disassembly of section .text:
0000000000401000 <test>:
401000: 48 c7 c0 05 00 00 00 mov $0x5,%rax
401007: 48 8b 04 25 30 30 40 00 mov 0x403030,%rax
40100f: c3 ret
The first instruction has three leading bytes (0x48, 0xc7, 0xc0: the REX.W prefix, the opcode, and ModRM) that encode the instruction and operand style, followed by a 4-byte immediate value: 0x05 0x00 0x00 0x00. There's our 5! (It's little endian, so the 5 byte comes first.) The CPU takes that 5 from the instruction stream, sign-extends it to 64 bits, and puts it into RAX.
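As a quick check on that byte breakdown, the same seven bytes can be emitted by hand with .byte and executed. This is only a sketch, assuming x86-64 Linux and GNU as; the file name imm.s and the exit-status check are my own additions.

```asm
# imm.s (hypothetical file name)
# Build and run: as -o imm.o imm.s && ld -o imm imm.o && ./imm; echo $?
        .globl  _start
        .text
_start:
        .byte   0x48, 0xc7, 0xc0            # REX.W, opcode, ModRM selecting rax
        .byte   0x05, 0x00, 0x00, 0x00      # imm32 = 5, little endian
        movq    %rax, %rdi                  # exit status = whatever the bytes put in rax
        movl    $60, %eax                   # exit system call number on x86-64 Linux
        syscall
```

The exit status should be 5, and objdump -dw imm should show the hand-written bytes as mov $0x5,%rax.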
The second instruction has four leading bytes (0x48, 0x8b, 0x04, 0x25: REX.W, opcode, ModRM, and SIB), followed by four more bytes: 0x30 0x30 0x40 0x00. This time they are not an immediate but an absolute address, 0x403030 in little endian: the runtime virtual address in .data where, if all went right, a 5 is located. The leading bytes tell the CPU to load from that address in memory and put the result into RAX. And now our little function has accomplished nothing, and we return.
(The static executable we actually built from this source alone has nothing to return to. Running it will segfault: ret pops argc (a small integer) from the stack into RIP, and the CPU then tries to fetch code from that unmapped page. This source defines a function and is written to be linked into a larger program that contains a caller. We ran ld on it alone, without -pie (which GCC normally passes these days), only so that an actual absolute address rather than a placeholder would be filled into the machine code, giving us a concrete example. Non-PIE was also necessary for this 32-bit absolute addressing mode to link; a PIE would need my_number(%rip) instead.)
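For comparison, here is a sketch of the RIP-relative form, written as a standalone program so that it has somewhere to go instead of returning. It assumes x86-64 Linux and GNU as; the file name riprel.s, the _start entry point, and the exit-status check are my own. The $ rule is unchanged: my_number(%rip) is still a memory operand, just addressed relative to the instruction pointer, and lea is the usual way to get the address itself.

```asm
# riprel.s (hypothetical file name)
# Build and run: as -o riprel.o riprel.s && ld -o riprel riprel.o && ./riprel; echo $?
        .globl  _start
        .text
_start:
        movq    my_number(%rip), %rdi   # load the 8 bytes stored at my_number: rdi = 5
        leaq    my_number(%rip), %rsi   # rsi = address of my_number, no memory access
        addq    (%rsi), %rdi            # load through that address: rdi = 5 + 5 = 10
        movl    $60, %eax               # exit system call number on x86-64 Linux
        syscall                         # exit(rdi)
        .data
my_number:
        .quad   5
```

The exit status should be 10. Since no absolute address is encoded in the instructions, this form does not depend on the executable being non-PIE.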