How to index through a string in assembly

Tags:

intel

Given the variable:

var1    db  "abcdefg", NULL

How would I perform a loop to navigate each letter? In C++ you would do something like var[x] inside the loop, then increment x each time. Any ideas?


asked Jun 14 '17 03:06

1 Answer

In C and C++, strings are NUL terminated. This means that an ASCII NUL character (0) is added to the end of the string so that code can tell where the string ends. The strlen function walks through the string, starting from the beginning, and keeps looping until it encounters this NUL character. When it finds the NUL, it knows that's the end of the string, and it returns the number of characters from the beginning to the NUL as the string's length.
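The walk described above can be sketched in C roughly as follows. This is a minimal illustration, not the actual library implementation; the name `my_strlen` is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts characters up to (not including) the NUL. */
size_t my_strlen(const char *s)
{
    assert(s != NULL);          /* a null pointer is not a string */
    const char *p = s;
    while (*p != '\0') {        /* keep going until the NUL terminator */
        ++p;
    }
    return (size_t)(p - s);     /* distance from the start to the NUL */
}
```

For the string in the question, `my_strlen("abcdefg")` would return 7, since the terminating NUL is not counted.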

String literals (the things in double-quotation marks) are automatically NUL-terminated by a C/C++ compiler, so that:

"abcdefg"

is equivalent to the following array:

{'a', 'b', 'c', 'd', 'e', 'f', 'g', 0}
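If you want to convince yourself of this, a small C check along these lines should do it. It is only a sketch, and the function name `literal_matches_array` is invented for the example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical check: returns 1 when the literal and the spelled-out
   array occupy the same number of bytes and hold the same contents. */
int literal_matches_array(void)
{
    static const char literal[]     = "abcdefg";
    static const char spelled_out[] = {'a', 'b', 'c', 'd', 'e', 'f', 'g', 0};

    assert(sizeof literal == 8);    /* 7 letters plus the implicit NUL */
    return sizeof literal == sizeof spelled_out
        && memcmp(literal, spelled_out, sizeof literal) == 0;
}
```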

I mention this because Peter Rader suggested it in his answer, and you didn't really understand what he was talking about. However, it seems that you already know this, as you appended a NUL character to your string in the assembly declaration:

var1    db  "abcdefg", NULL

Now, generally, we don't use the identifier NULL for this. Especially not in C, where NULL is defined as a null pointer. We just use the literal 0, so that definition would be:

var1    db  "abcdefg", 0

but your code probably works, assuming that NULL is somewhere defined as 0.

So you've got the setup all correct. Now all you need to do is write your loop:

    mov  edx, OFFSET var1    ; get starting address of string

NextChar:
    mov  al, BYTE PTR [edx]  ; get next character
    inc  edx                 ; increment pointer
    test al, al              ; test value in AL and set flags
    jz   Finished            ; AL == 0, so exit the loop

    ; Otherwise, AL != 0, so we fell through.
    ; Here, you can do something with the character in AL.
    ; ...

    jmp  NextChar            ; keep looping

Finished:

You say that you're familiar with the CMP instruction. In the code above, I used TEST rather than CMP. You could have equivalently written:

cmp  al, 0

but

test al, al

is usually the more compact choice because it needs no immediate operand (for AL specifically, both forms happen to encode in two bytes, but for any other register TEST saves a byte), so I'm just in the habit of writing it that way in the special case that I'm comparing a register's value to 0. Compilers will generate this code, too, so it's good to be familiar with it.
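Since the question mentions the C++ idiom var[x], here is a rough C rendering of the same NUL-terminated loop, once with a moving pointer (as in the assembly) and once with an index. Treat it as a sketch: the function names are made up, and summing the character codes merely stands in for "do something with the character".

```c
#include <assert.h>
#include <stddef.h>

/* Pointer version: mirrors the load / increment / test-for-zero loop. */
unsigned sum_chars_ptr(const char *var1)
{
    assert(var1 != NULL);
    const unsigned char *p = (const unsigned char *)var1;
    unsigned total = 0;
    for (;;) {
        unsigned char c = *p;   /* get next character */
        ++p;                    /* increment pointer */
        if (c == 0) {           /* NUL reached, so exit the loop */
            break;
        }
        total += c;             /* placeholder for real work */
    }
    return total;
}

/* Index version: the var[x] style from the question. */
unsigned sum_chars_indexed(const char *var1)
{
    assert(var1 != NULL);
    unsigned total = 0;
    for (size_t x = 0; var1[x] != '\0'; ++x) {
        total += (unsigned char)var1[x];
    }
    return total;
}
```

Both functions visit exactly the same characters; the pointer form just maps more directly onto a register holding an address.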

Bonus chatter: An alternative way of representing a string is to store its length (in characters) along with the string itself. This is what the Pascal language traditionally did. This way, you don't need the special NUL sentinel character at the end of the string. Rather, the declaration would look like this:

var1    db  7, "abcdefg"

where the first byte of every string is its length. This has various advantages over the C style, namely that you don't have to iterate through the entire string to determine its length. The primary disadvantage, of course, is that a string's length is limited to 255 characters, since that's all that will fit into a BYTE.

Anyway, with the length known in advance, you're no longer checking for a NUL character, you're just iterating the same number of times as the characters in the string:

    mov  edx, OFFSET var1    ; get starting address of string
    mov  cl, BYTE PTR [edx]  ; get length of string

NextChar:
    inc  edx                 ; increment pointer
    mov  al, BYTE PTR [edx]  ; get next character

    ; Do something with the character in AL
    ; (preserving EDX and CL).
    ; ...

    dec  cl                  ; decrement remaining length
    jnz  NextChar            ; CL != 0, so keep looping

Finished:

(In the code above, I've assumed that all strings are a minimum of 1 character in length. This is probably a safe assumption, and avoids the need to do a length check above the loop.)
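For comparison, a C sketch of walking a length-prefixed string might look like this. The function name `sum_pascal` is hypothetical, and summing the character codes again stands in for real work. Unlike the assembly, this version checks the count before every iteration, so it also copes with a zero-length string.

```c
#include <assert.h>
#include <stddef.h>

/* s[0] holds the length; the characters follow in s[1] .. s[len]. */
unsigned sum_pascal(const unsigned char *s)
{
    assert(s != NULL);
    size_t len = s[0];                  /* get length of string */
    unsigned total = 0;
    for (size_t i = 1; i <= len; ++i) { /* no NUL check needed */
        total += s[i];
    }
    return total;
}
```

Called on the bytes `7, 'a', 'b', 'c', 'd', 'e', 'f', 'g'`, it visits all seven letters and never looks for a terminator.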

Alternatively, you could do the array-indexing that you mentioned, but you have to be a bit careful if you want to iterate forwards through the string:

    mov   edx, OFFSET var1        ; get starting address of string
    movzx ecx, BYTE PTR [edx]     ; get length of string
    lea   edx, [edx+ecx+1]        ; advance pointer by 1 + number of chars
    neg   ecx                     ; negate the length counter
NextChar:
    mov   al, BYTE PTR [edx+ecx]  ; get next character

    ; Do something with the character in AL.
    ; ...

    inc   ecx
    jnz   NextChar                ; ECX != 0, so keep looping

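Basically, we set EDX to point just past the last character of the string, we set the counter (ECX) to the negative of the length of the string, and then we read characters by indexing [EDX+ECX]. Since ECX is negative, this addresses a byte that many positions before the end, and as ECX counts up toward zero the address moves forward through the string. When ECX reaches zero, INC sets the zero flag, so the loop exits without needing a separate comparison.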

There is almost certainly a better (more clever) way of doing this than I've managed to think up here, but you should get the idea.
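The same negative-index trick can be written in C, which may make the arithmetic easier to follow. This is only a sketch under the length-prefixed layout described earlier; the name `sum_pascal_negidx` is invented, and it uses a check-first loop so a zero length is handled as well.

```c
#include <assert.h>
#include <stddef.h>

unsigned sum_pascal_negidx(const unsigned char *s)
{
    assert(s != NULL);
    size_t len = s[0];                          /* length byte */
    const unsigned char *end = s + len + 1;     /* one past the last character */
    ptrdiff_t i = -(ptrdiff_t)len;              /* counter starts at -len */
    unsigned total = 0;
    while (i != 0) {
        total += end[i];                        /* end[-len] is the first character */
        ++i;                                    /* counts up toward zero */
    }
    return total;
}
```

The appeal of this shape is that the loop counter doubles as the index, and reaching zero is the exit condition.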


answered Oct 06 '22 08:10

Cody Gray

Jenke
