
x86 Assembly pointers

Tags:

pointers

x86

assembly

I am trying to wrap my mind around pointers in Assembly.

What exactly is the difference between:

mov eax, ebx

and

mov [eax], ebx

and when should dword ptr [eax] be used?

Also when I try to do mov eax, [ebx] I get a compile error, why is this?

asked May 03 '17 20:05

Duxa

1 Answer

As has already been stated, wrapping brackets around an operand means that the operand is to be dereferenced, as if it were a pointer in C. In other words, the brackets mean that you are reading a value from (or storing a value into) the memory location the operand refers to, rather than using the operand's own value directly.

So, this:

mov  eax, ebx

simply copies the value in ebx into eax. In a pseudo-C notation, this would be: eax = ebx.

Whereas this:

mov  eax, [ebx]

dereferences the contents of ebx and stores the pointed-to value in eax. In a pseudo-C notation, this would be: eax = *ebx.

Finally, this:

mov  [eax], ebx

stores the value in ebx into the memory location pointed to by eax. Again, in pseudo-C notation: *eax = ebx.
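For readers who think more easily in C, the three forms above can be sketched as ordinary functions. This is only a model, not real register access: the function names and the `slot` variable are made up for illustration, and the parameters named `eax` and `ebx` are plain C variables standing in for the registers.

```c
#include <assert.h>
#include <stdint.h>

/* mov eax, ebx : register-to-register copy (eax = ebx). */
uint32_t mov_reg_reg(uint32_t ebx) {
    return ebx;
}

/* mov eax, [ebx] : load from the address held in ebx (eax = *ebx). */
uint32_t mov_reg_mem(const uint32_t *ebx) {
    return *ebx;
}

/* mov [eax], ebx : store to the address held in eax (*eax = ebx). */
void mov_mem_reg(uint32_t *eax, uint32_t ebx) {
    *eax = ebx;
}

/* Hypothetical walkthrough; slot stands in for a DWORD in memory. */
void demo_mov_forms(void) {
    uint32_t slot = 7;
    assert(mov_reg_reg(42) == 42);   /* value copied as-is */
    assert(mov_reg_mem(&slot) == 7); /* value read through the pointer */
    mov_mem_reg(&slot, 99);          /* value written through the pointer */
    assert(slot == 99);
}
```

Calling `demo_mov_forms()` runs all three cases; only the last one changes memory.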

The registers here could also be replaced with memory operands, such as symbolic variable names. So this:

mov  eax, [myVar]

dereferences the address of the variable myVar and stores the contents of that variable in eax, like eax = myVar.

By contrast, this:

mov  eax, myVar

stores the address of the variable myVar into eax, like eax = &myVar.

At least, that's how most assemblers work. Microsoft's assembler (called MASM) and the Microsoft C/C++ compiler's inline assembly are a bit different. They treat the above two instructions as equivalent, essentially ignoring the brackets around memory operands.

To get the address of a variable in MASM, you would use the OFFSET keyword:

mov  eax, OFFSET myVar

However, even though MASM has this forgiving syntax and allows you to be sloppy, you shouldn't. Always include the brackets when you want to dereference a variable and get its actual value. You will never get the wrong result if you write the code explicitly using the proper syntax, and it'll make it easier for others to understand. Plus, it'll force you to get into the habit of writing the code the way that other assemblers will expect it to be written, rather than relying on MASM's "do what I mean, not what I write" crutch.
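The contents-versus-address distinction maps onto C's plain variable access versus the `&` operator. Here is a minimal sketch of that mapping; `load_contents`, `load_address`, and the initial value 1234 are assumptions made for the example, and the comments note which assembler syntax each function corresponds to.

```c
#include <assert.h>
#include <stdint.h>

uint32_t myVar = 1234;   /* hypothetical DWORD variable */

/* mov eax, [myVar]            (bracketed form; in MASM also plain myVar)
   Loads the contents of the variable: eax = myVar. */
uint32_t load_contents(void) {
    return myVar;
}

/* mov eax, myVar              (most assemblers)
   mov eax, OFFSET myVar       (MASM)
   Loads the address of the variable: eax = &myVar. */
uint32_t *load_address(void) {
    return &myVar;
}

void demo_address_vs_contents(void) {
    uint32_t value = load_contents();
    uint32_t *addr = load_address();
    assert(value == 1234);
    assert(addr == &myVar);
    assert(*addr == value);   /* dereferencing the address yields the contents */
}
```

Mixing the two up in C is a type error the compiler catches. In assembly both are just numbers in a register, which is why being explicit with the brackets pays off.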

Speaking of that "do what I mean, not what I write" crutch, MASM also generally allows you to get away with omitting the operand-size specifier, since it knows the size of the variable. But again, I recommend writing it for clarity and consistency. Therefore, if myVar is an int, you would do:

mov  eax, DWORD PTR [myVar]    ; eax = myVar

or

mov  DWORD PTR [myVar], eax    ; myVar = eax

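An explicit size is sometimes mandatory in other assemblers like NASM, which are not strongly typed and don't remember that myVar is a DWORD-sized memory location. There, the size must be spelled out whenever no register operand implies it, such as when storing an immediate value to memory. Note that NASM writes the specifier without the PTR keyword, as in dword [myVar].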

You don't need the specifier when the other operand is a register, since the name of that register indicates the operand size. al and ah are always BYTE-sized, ax is always WORD-sized, eax is always DWORD-sized, and rax is always QWORD-sized. So mov [eax], ebx is unambiguously a DWORD store, and dword ptr [eax] is only required when nothing else fixes the size, as in mov DWORD PTR [eax], 1. But it doesn't hurt to include it anyway, if you like, for consistency with the way you notate memory operands.
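To see why the size matters for a store, here is a rough C model in which the same address is written with three different widths. The helper names (`store_byte`, `store_word`, `store_dword`, `bytes_written`) are invented for this sketch, and `memcpy` is used only to keep the C well-defined; it stands in for a single sized mov.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* mov BYTE PTR [p], 0  : writes 1 byte */
void store_byte(void *p)  { uint8_t  v = 0; memcpy(p, &v, sizeof v); }

/* mov WORD PTR [p], 0  : writes 2 bytes */
void store_word(void *p)  { uint16_t v = 0; memcpy(p, &v, sizeof v); }

/* mov DWORD PTR [p], 0 : writes 4 bytes */
void store_dword(void *p) { uint32_t v = 0; memcpy(p, &v, sizeof v); }

/* Fill a 4-byte buffer with 0xFF, perform one store at its start,
   and count how many bytes the store cleared to zero. */
int bytes_written(void (*store)(void *)) {
    unsigned char buf[4];
    int count = 0;
    memset(buf, 0xFF, sizeof buf);
    store(buf);
    for (size_t i = 0; i < sizeof buf; i++) {
        if (buf[i] == 0) {
            count++;
        }
    }
    return count;
}

void demo_operand_size(void) {
    assert(bytes_written(store_byte)  == 1);
    assert(bytes_written(store_word)  == 2);
    assert(bytes_written(store_dword) == 4);
}
```

The address is identical in all three cases. Only the size specifier decides how much memory gets overwritten, which is exactly the information the assembler is missing when neither operand is a register.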

Also when I try to do mov eax, [ebx] I get a compile error, why is this?

Um…you shouldn't. This assembles fine for me in MSVC's inline assembly. As we have already seen, it is equivalent to:

mov  eax, DWORD PTR [ebx]

and means that the memory location pointed to by ebx will be dereferenced and that DWORD-sized value will be loaded into eax.

Why can't I do mov a, [eax]? Should that not make "a" a pointer to wherever eax is pointing?

No. This combination of operands is not allowed. As you can see from the documentation for the MOV instruction, there are essentially five possibilities (ignoring alternate encodings and segments):

mov  register, register     ; copy one register to another
mov  register, memory       ; load value from memory into register
mov  memory,   register     ; store value from register into memory
mov  register, immediate    ; move immediate value (constant) into register
mov  memory,   immediate    ; store immediate value (constant) in memory

Notice that there is no mov memory, memory, which is what you were trying.

However, you can make the variable a point to what eax is pointing to by simply coding:

mov  DWORD PTR [a], eax

Now a and eax have the same value. If eax was a pointer, then a is now a pointer to that same memory location.

If you want to set a to the value that eax is pointing to, then you will need to do:

mov  eax, DWORD PTR [eax]    ; eax = *eax
mov  DWORD PTR [a], eax      ; a   = eax

Of course, this clobbers the pointer and replaces it with the dereferenced value. If you don't want to lose the pointer, then you will have to use a second "scratch" register; something like:

mov  edx, DWORD PTR [eax]    ; edx = *eax
mov  DWORD PTR [a], edx      ; a   = edx
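The two ways of filling a can be contrasted in a small C model. The variables `a_ptr` and `a_val` are hypothetical stand-ins for the question's a (one per interpretation), and the local `edx` plays the role of the scratch register; none of these names come from a real API.

```c
#include <assert.h>
#include <stdint.h>

const uint32_t *a_ptr;   /* a treated as a pointer-sized variable */
uint32_t a_val;          /* a treated as a DWORD value */

/* mov DWORD PTR [a], eax
   Copies the pointer itself: a = eax. */
void store_pointer(const uint32_t *eax) {
    a_ptr = eax;
}

/* mov edx, DWORD PTR [eax]   ; edx = *eax
   mov DWORD PTR [a], edx     ; a   = edx
   Copies the pointed-to value via a scratch register.
   Returns eax to show that the pointer is still intact. */
const uint32_t *store_pointee(const uint32_t *eax) {
    uint32_t edx = *eax;   /* load through the pointer */
    a_val = edx;           /* store the loaded value into the variable */
    return eax;
}

void demo_store_forms(void) {
    uint32_t src = 55;
    store_pointer(&src);
    assert(a_ptr == &src);              /* a now points where eax points */
    assert(store_pointee(&src) == &src);/* pointer was not clobbered */
    assert(a_val == 55);                /* a now holds the pointed-to value */
}
```

In C the compiler picks the scratch register for you when you write `a = *p`; in assembly you have to do the load and the store as two separate instructions because mov cannot take two memory operands.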

I realize this is all somewhat confusing. The mov instruction is overloaded with a large number of potential meanings in the x86 ISA. This is due to x86's roots as a CISC architecture. By contrast, modern RISC architectures do a better job of separating register-register moves, memory loads, and memory stores. x86 crams them all into a single mov instruction. It's too late to go back and fix it now; you just have to get comfortable with the syntax, and sometimes it takes a second glance.

answered Sep 21 '22 23:09

Cody Gray
