 

How do I store 4 characters in a doubleword (DD) in assembly language?

I'm currently doing assembly programming (16-bit) on DOSBox using MASM.

var1 dd 'abcd'

For the above code, MASM generates the error:

A2010: syntax error

What is wrong with the syntax? I am simply storing 4 characters in a doubleword.

I am doing 16-bit assembly; is that the problem? Can I only use db and dw, because the other directives define data wider than 16 bits?

asked May 23 '19 at 18:05 by Hassaan Raza





1 Answer

var1 db 'abcd' (not dd) puts the 4 bytes you want into memory in source order.

What is the purpose of having variables other than db?

Convenience in writing the initializer: dd 1234h is more convenient than db 34h, 12h, 0, 0, but both assemble identical data into the output file. The other purpose is that MASM treats a dw or dd declaration as implying an operand size when you use the symbol.
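
To make the "identical data" point concrete, here is a small Python sketch (not MASM) that models how a little-endian x86 assembler lays out these initializers in memory. The helper names dd_bytes and db_bytes are hypothetical, made up for illustration; they only mimic the byte layout, not MASM's operand-size checking.

```python
def dd_bytes(value: int) -> bytes:
    """Model `dd value`: one 32-bit integer stored little-endian (x86 byte order)."""
    return value.to_bytes(4, "little")


def db_bytes(*values: int) -> bytes:
    """Model `db v1, v2, ...`: each value is one byte, emitted in source order."""
    return bytes(values)


# dd 1234h and db 34h, 12h, 0, 0 produce the same four bytes
print(dd_bytes(0x1234).hex(" "))                       # 34 12 00 00
print(dd_bytes(0x1234) == db_bytes(0x34, 0x12, 0, 0))  # True
```

The low byte (34h) lands at the lowest address, which is why the db version has to list the bytes "backwards" compared to how you'd write the hex number.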

Later versions of MASM do accept dd 'abcd', but they endian-flip it, instead of assembling the bytes into memory in source order like NASM does. See @RossRidge's answer for MASM details.

NASM will accept mov eax, 'abcd' or dd 'abcd' just fine: multi-character literals are just another form of integer literal, with the first character first in memory (the least significant byte), because x86 is little-endian. In other words, in NASM the memory order of a multi-character integer literal matches its source order.
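
A rough Python model of the NASM rule just described, assuming plain ASCII characters: the first character is the least significant byte, so storing the resulting integer little-endian gives back the characters in source order. nasm_char_literal is a hypothetical name for this sketch, not anything NASM provides.

```python
def nasm_char_literal(text: str) -> int:
    """Model NASM's integer value for a multi-character literal like 'abcd':
    the first character becomes the least significant byte."""
    return int.from_bytes(text.encode("ascii"), "little")


value = nasm_char_literal("abcd")
print(hex(value))                   # 0x64636261
print(value.to_bytes(4, "little"))  # b'abcd': memory order matches source order
```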

But MASM reverses them when used with dd or dw, so the first character becomes the most significant byte of the integer, and memory order is the reverse of source order. It may be a good idea to avoid this syntax even in MASM versions that support it, and use hex ASCII codes plus a comment instead.
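
For comparison, here is the same kind of Python model for the MASM behaviour described above (first character as the most significant byte), again assuming ASCII; masm_char_literal is a hypothetical helper for this sketch only. It also shows why the hex-plus-comment workaround gives the bytes in source order.

```python
def masm_char_literal(text: str) -> int:
    """Model MASM's value for `dd 'abcd'` as described in the answer:
    the first character becomes the most significant byte."""
    return int.from_bytes(text.encode("ascii"), "big")


value = masm_char_literal("abcd")
print(hex(value))                   # 0x61626364
print(value.to_bytes(4, "little"))  # b'dcba': memory order is reversed

# Workaround: write the ASCII codes yourself, e.g.  dd 64636261h  ; 'abcd'
print((0x64636261).to_bytes(4, "little"))  # b'abcd'
```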


In MASM, var1 dd vs. db also sets a default operand-size for accessing the data, if you declare it as a variable instead of a label.

Using var1 db ... means you'll have to use an explicit dword ptr any time you want to access all 4 bytes, as in mov eax, dword ptr [var1]. With a plain mov eax, [var1], MASM will complain about an operand-size mismatch.

But if you declare it as just a plain label, not tied to any db or dd directives that assemble bytes into memory, I think you can freely use it with any size.

(Update: apparently a label with a : is an error in MASM outside of code sections. I'm not sure if there is a way to declare just a data label that isn't a MASM "variable". See discussion in comments.)

;; I'm not sure this is correct, I'm making this up from memory
;; and I've never actually used MASM.  I know the syntax from SO answers.
.data
    label1:         ; "Just" a label, no data
      db 'abcd'       

    ; little-endian 'abcd'
    var1  dd 64636261h        ; no ':', so the symbol becomes a variable with a size from the dd

.code
func:
    mov  eax, [label1]                ; legal I think
    mov  al, [label1]                 ; also legal
    mov  eax, dword ptr [label1]      ; always works
    movzx  eax,  byte ptr [label1+2]  ; zero extend the 'c' into EAX

    inc  [label1]                  ; ERROR: ambiguous operand-size

    mov  eax, [var1]               ; fine, both operands are dwords
    mov  al, [var1]                ; ERROR: operand-size mismatch
    mov  al, byte ptr [var1]       ; load the low byte of the dword

    inc  [var1]                   ; legal: the "variable" implies dword operand size
    inc  dword ptr [var1]         ; same as above
    and  byte ptr [var1], NOT 20h ; upper-case just the first character, 'abcd' into 'Abcd'
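
The last instruction relies on two facts: the first character sits in the lowest-addressed byte of the dword, and in ASCII, bit 20h is the only difference between a lowercase letter and its uppercase form. A small Python model of that step, starting from the dword 64636261h declared in the example (this simulates the memory contents; it is not MASM):

```python
# Memory image of the dword 64636261h, stored little-endian as on x86
var1 = bytearray((0x64636261).to_bytes(4, "little"))

# Model ANDing the first byte with the complement of 20h:
# clear bit 5 of that byte only, so 'a' (61h) becomes 'A' (41h)
var1[0] &= ~0x20 & 0xFF

print(var1.decode("ascii"))                 # Abcd
print(hex(int.from_bytes(var1, "little")))  # 0x64636241
```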

Note that mov eax, var1 is equivalent to mov eax, [var1] in MASM syntax, but I prefer making the memory reference explicit by using [].

answered Oct 10 '22 at 08:10 by Peter Cordes
