
MASM: Using Current Location Counter ($) in .data declaration

Tags: x86, assembly, masm

I ran into a problem with the Current Location Counter ($) in MASM.

Here is my assembly code, which I assembled with Visual Studio 2013 Express:

.386
.model flat,stdcall
.stack 8192
ExitProcess proto,dwExitCode:dword

.data
ptr1 DWORD $
ptr2 DWORD $
ptr5 DWORD $


.code
main proc
    mov eax, $
    mov eax, $
    invoke ExitProcess,0
main endp
end main

I expected ptr1, ptr2, and ptr5 to each hold its own location value.

But that is not what happens.

In the debugger, all three variables show the same result.

ptr1, ptr2, and ptr5 contain the same address; there is no offset between them.

What is the problem with using $ in a data declaration?

Payton Hsieh asked Dec 26 '16 12:12


2 Answers

Your problem seems to be a bug in MASM (or, as Microsoft would likely put it, a "feature"). The problem isn't that the DWORD directives aren't generating object code or that they aren't advancing the assembler's location counter. If the former were true, they wouldn't show up in the executable at all, and if the latter were true, they would all have the same address.

The problem is that MASM incorrectly uses the offset of the start of the current segment (in the generated object file) for the value of $ instead of the current location counter in certain contexts when used in a data definition. The following code, based on your example, demonstrates this (and shows a simple solution):

        .386
        PUBLIC  ptr1, ptr2, ptr5, ptr6, len

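_DATA   SEGMENT
        mov     eax, $          ; $ used as an instruction operand
        mov     eax, $
ptr1    DWORD   $               ; $ used as a data initializer
ptr2    DWORD   $
ptr5    DWORD   OFFSET $        ; same, with an explicit OFFSET
ptr6    DWORD   ptr6            ; named label instead of $
len     DWORD   $ - ptr1        ; distance from ptr1 to the current location
        mov     eax, $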
_DATA   ENDS

        END

Here's how IDA disassembles the object file created by MASM for the above assembly code:

.data:00000000 $$000000:
.data:00000000                 mov     eax, offset $$000000
.data:00000005 $$000005:
.data:00000005                 mov     eax, offset $$000005
.data:0000000A ptr1            dd offset $$000000
.data:0000000E ptr2            dd offset $$000000
.data:00000012 ptr5            dd offset $$000000
.data:00000016 ptr6            dd offset ptr6
.data:0000001A len             dd 16
.data:0000001E $$00001E:
.data:0000001E                 mov     eax, offset $$00001E

You'll notice that the mov eax, $ instructions show that the location counter is being correctly advanced by the DWORD directives. You'll also notice that ptr1, ptr2, and ptr5 have all been initialized with $$000000, which is at the start of the segment, completely ignoring the fact that both the previous MOV instructions and the DWORD directives have advanced the location counter.

On the other hand, MASM does evaluate $ - ptr1 correctly. This calculates the distance between ptr1 and the current location counter, which is 16, the total length in bytes of the previous four DWORD directives. This means that, at least in this context, MASM uses the correct value of $.
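The $ - label form is also the usual way to have the assembler compute the size of a block of data. A minimal sketch follows; the names msg and msgLen are made up for illustration and do not appear in the question:

```masm
.data
msg     BYTE    "Hello, world", 0   ; 12 characters plus a terminating zero
msgLen  =       $ - msg             ; bytes from msg up to this point (13)
```

Because the expression is a difference of two offsets in the same segment, it is a plain constant that needs no relocation.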

Finally, the example shows how to work around this problem. Just use a named label instead of $, as in the line ptr6 DWORD ptr6. This results in the assembler correctly generating a pointer that is initialized to point at itself.
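Applied to the program from the question, the workaround might look like the sketch below. It assumes the behavior shown for ptr6 carries over unchanged to a .data section in the flat model; I haven't checked the result in Visual Studio 2013 Express.

```masm
.386
.model flat,stdcall
.stack 8192
ExitProcess proto,dwExitCode:dword

.data
ptr1 DWORD ptr1         ; each variable is initialized with its own address
ptr2 DWORD ptr2
ptr5 DWORD ptr5

.code
main proc
    mov eax, ptr1       ; loads the value stored in ptr1
    mov eax, ptr2       ; loads the value stored in ptr2
    invoke ExitProcess,0
main endp
end main
```

In the debugger, ptr1, ptr2, and ptr5 should then hold three different values, each 4 bytes apart.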

Ross Ridge answered Oct 01 '22 19:10


In MASM, the $ symbol represents the current value of the location counter. The location counter is a variable maintained internally by the assembler while it is processing your code. When a segment is first encountered, the assembler sets the location counter to zero. Then, as it encounters instructions or pseudo-opcodes, it increments the location counter for each byte that is output as object code.

Variable declarations don't count. Because they are not instructions or pseudo-opcodes, they don't cause the location counter to be incremented.

Granted, the MSDN documentation is lousy. Here's what Randall Hyde's The Art of Assembly Language has to say on the location counter (Chapter 8, Section 2, of the 16-bit DOS edition):

Recall that all addresses in the 80x86's memory space consist of a segment address and an offset within that segment. The assembler, in the process of converting your source file into object code, needs to keep track of offsets within the current segment. The location counter is an assembler variable that handles this.

Whenever you create a segment in your assembly language source file (see segments later in this chapter), the assembler associates the current location counter value with it. The location counter contains the current offset into the segment. Initially (when the assembler first encounters a segment) the location counter is set to zero. When encountering instructions or pseudo-opcodes, MASM increments the location counter for each byte written to the object code file. For example, MASM increments the location counter by two after encountering mov ax, bx since this instruction is two bytes long.

The value of the location counter varies throughout the assembly process. It changes for each line of code in your program that emits object code. We will use the term location counter to mean the value of the location counter at a particular statement before generating any code.

In short, the location counter is only incremented by lines that cause object code to be produced.

You can explicitly add an offset value to $ if you want. But I'm not really sure why you would want to do this...

Cody Gray answered Oct 01 '22 20:10