Given the variable:
var1 db "abcdefg", NULL
How would I perform a loop to navigate each letter? In C++ you would do something like var[x] inside the loop, then increment x each time. Any ideas?
Strings are ordered sequences of character data, and the individual characters of a string can be accessed directly using a numerical index. String indexing in Python is zero-based, so the very first character in the string has an index of 0, the next has an index of 1, and so on.
Because strings, like lists and tuples, are a sequence-based data type, they can be accessed through indexing and slicing.
Arrays in assembly language: An array is a collection of variables, all of the same type, which you access by specifying a subscript (also called an index) that identifies one of the variables in the collection.
Just subtract the string address from what strchr returns:
char *string = "qwerty";
char *e;
int index;
e = strchr(string, 'e');
index = (int)(e - string);
Note that the result is zero-based, so in the above example it will be 2.
In C and C++, strings are NUL-terminated. This means that an ASCII NUL character (0) is added to the end of the string so that code can tell where the string ends. The strlen function walks through the string, starting from the beginning, and keeps looping until it encounters this NUL character. When it finds the NUL, it knows that's the end of the string, and it returns the number of characters from the beginning to the NUL as the string's length.
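To make that walk concrete, here is a minimal C sketch of a strlen-style loop. The name my_strlen is made up for illustration; a real library strlen is usually far more heavily optimized, so treat this only as a picture of the idea.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the characters that come before the NUL. */
size_t my_strlen(const char *s)
{
    assert(s != NULL);          /* a null pointer has no string to walk */
    size_t n = 0;
    while (s[n] != '\0') {      /* stop at the NUL sentinel */
        ++n;
    }
    return n;                   /* the NUL itself is not counted */
}
```

Calling my_strlen("abcdefg") visits seven characters and then stops at the terminator.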
String literals (the things in double-quotation marks) are automatically NUL-terminated by a C/C++ compiler, so that:
"abcdefg"
is equivalent to the following array:
{'a', 'b', 'c', 'd', 'e', 'f', 'g', 0}
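If you want to convince yourself of that equivalence, a small C check along these lines should do it. The function name literal_size is an assumption for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: verifies that the literal and the spelled-out array
   hold the same bytes, then returns the size of the literal in bytes. */
size_t literal_size(void)
{
    static const char literal[] = "abcdefg";
    static const char spelled[] = {'a', 'b', 'c', 'd', 'e', 'f', 'g', 0};

    assert(sizeof literal == sizeof spelled);
    assert(memcmp(literal, spelled, sizeof literal) == 0);
    return sizeof literal;      /* seven letters plus the NUL terminator */
}
```

The returned size is one more than the number of visible characters, because the terminator occupies a byte too.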
I mention this because Peter Rader suggested it in his answer, and you didn't really understand what he was talking about. However, it seems that you already know this, as you appended a NUL character to your string in the assembly declaration:
var1 db "abcdefg", NULL
Now, generally, we don't use the identifier NULL for this, especially not in C, where NULL is defined as a null pointer constant. We just use the literal 0, so that definition would be:
var1 db "abcdefg", 0
but your code probably works, assuming that NULL is defined somewhere as 0.
So you've got the setup all correct. Now all you need to do is write your loop:
    mov   edx, OFFSET var1        ; get starting address of string
ScanLoop:
    mov   al, BYTE PTR [edx]      ; get next character
    inc   edx                     ; increment pointer
    test  al, al                  ; test value in AL and set flags
    jz    Finished                ; AL == 0, so exit the loop
    ; Otherwise, AL != 0, so we fell through.
    ; Here, you can do something with the character in AL.
    ; ...
    jmp   ScanLoop                ; keep looping
Finished:
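Since you mentioned how you would do it in C++, here is a rough C rendering of the same loop, with each step lined up against the instruction it mirrors. The helper count_char and its "count matching characters" body are just a stand-in for "do something with the character"; neither comes from your code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts how often `target` occurs in a
   NUL-terminated string, using the same load/advance/test order
   as the assembly loop. */
size_t count_char(const char *s, char target)
{
    assert(s != NULL);
    size_t hits = 0;
    for (;;) {
        char c = *s;            /* mov  al, BYTE PTR [edx] */
        ++s;                    /* inc  edx                */
        if (c == '\0') {        /* test al, al / jz        */
            break;
        }
        if (c == target) {      /* "do something with the character" */
            ++hits;
        }
    }
    return hits;
}
```

The pointer plays the role of EDX and the local c plays the role of AL; there is no separate index variable at all.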
You say that you're familiar with the CMP instruction. In the code above, I used TEST rather than CMP. You could have equivalently written:
cmp al, 0
but
test al, al
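is the more common idiom because it usually encodes in fewer bytes: a CMP against 0 has to store that 0 as an immediate in the instruction (although CMP with AL happens to have a short form of the same size). I'm just in the habit of writing it that way in the special case that I'm comparing a register's value to 0. Compilers will generate this code, too, so it's good to be familiar with it.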
Bonus chatter: An alternative way of representing a string is to store its length (in characters) along with the string itself. This is what the Pascal language traditionally did. This way, you don't need the special NUL sentinel character at the end of the string. Rather, the declaration would look like this:
var1 db 7, "abcdefg"
where the first byte of every string is its length. This has various advantages over the C style, namely that you don't have to iterate through the entire string to determine its length. The primary disadvantage, of course, is that a string's length is limited to 255 characters, since that's all that will fit into a BYTE.
Anyway, with the length known in advance, you're no longer checking for a NUL character, you're just iterating the same number of times as the characters in the string:
    mov   edx, OFFSET var1        ; get starting address of string (the length byte)
    mov   cl, BYTE PTR [edx]      ; get length of string
CountLoop:
    inc   edx                     ; advance pointer to the next character
    mov   al, BYTE PTR [edx]      ; get next character
    ; Do something with the character in AL (preserving EDX and CL).
    ; ...
    dec   cl                      ; decrement remaining length
    jnz   CountLoop               ; CL != 0, so keep looping
Finished:
(In the code above, I've assumed that all strings are a minimum of 1 character in length. This is probably a safe assumption, and avoids the need to do a length check above the loop.)
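For comparison, a counted walk over a length-prefixed string might look like this in C. This is only a sketch: pstr_count_char is a made-up name, the layout (length in byte 0, characters after it) follows the declaration above, and unlike the assembly it also tolerates a length of 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts `target` in a length-prefixed string,
   where p[0] is the length and p[1..length] are the characters. */
size_t pstr_count_char(const unsigned char *p, unsigned char target)
{
    assert(p != NULL);
    size_t hits = 0;
    size_t remaining = p[0];            /* like CL: characters left to visit */
    const unsigned char *cur = p + 1;   /* like EDX: skip the length byte    */

    while (remaining != 0) {
        if (*cur == target) {           /* "do something with the character" */
            ++hits;
        }
        ++cur;
        --remaining;
    }
    return hits;
}
```

Note that no byte inside the string is treated specially, so the string could even contain 0 bytes.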
Alternatively, you could do the array-indexing that you mentioned, but you have to be a bit careful if you want to iterate forwards through the string:
    mov   edx, OFFSET var1        ; get starting address of string
    movzx ecx, BYTE PTR [edx]     ; get length of string
    lea   edx, [edx+ecx+1]        ; advance pointer by 1 + number of chars
    neg   ecx                     ; negate the length counter
IndexLoop:
    mov   al, BYTE PTR [edx+ecx]  ; get next character
    ; Do something with the character in AL.
    ; ...
    inc   ecx
    jnz   IndexLoop               ; ECX != 0, so keep looping
Basically, we set EDX to point just past the end of the string, we set the counter (ECX) to the negative of the length of the string, and then we read characters by indexing [EDX+ECX]. Since ECX is negative, that addresses a character some number of bytes before the end, and incrementing ECX towards 0 walks forwards through the string.
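The same negative-index trick can be written in C, if that helps you see what the addressing is doing. This is a sketch under the same length-prefixed layout; pstr_to_cstr is a hypothetical name, and the caller is assumed to supply an output buffer of at least length + 1 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies a length-prefixed string into `out` as a
   NUL-terminated string. `out` must have room for p[0] + 1 bytes. */
void pstr_to_cstr(const unsigned char *p, char *out)
{
    assert(p != NULL && out != NULL);
    ptrdiff_t len = p[0];
    const unsigned char *end = p + 1 + len;  /* like EDX: one past the last char */
    char *out_end = out + len;

    /* i starts at -len and counts up to 0, like ECX after the NEG. */
    for (ptrdiff_t i = -len; i != 0; ++i) {
        out_end[i] = (char)end[i];           /* end[i] is the next char, in order */
    }
    *out_end = '\0';
}
```

The loop needs only one counter and one comparison against zero, which is the whole point of counting upwards from a negative value.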
There is almost certainly a better (more clever) way of doing this than I've managed to think up here, but you should get the idea.