I am trying to reverse engineer a binary, and the following instruction is confusing me. Can anyone clarify exactly what it does?
    => 0x804854e: repnz scas al,BYTE PTR es:[edi]
       0x8048550: not ecx
Where:
    EAX: 0x0
    ECX: 0xffffffff
    EDI: 0xbffff3dc ("aaaaaa\n")
    ZF: 1
I see that it is somehow decrementing ECX by 1 each iteration, and that EDI is incrementing along the length of the string. I know it calculates the length of the string, but as far as exactly HOW it's happening, and why "al" is involved I'm not quite sure.
The SCAS instruction searches a string for a particular value. The value to search for must be in AL (for SCASB), AX (for SCASW) or EAX (for SCASD). The string to be searched must be in memory, pointed to by ES:DI (or ES:EDI).
It compares the content of the accumulator ( AL , AX , or EAX ) against the current value pointed at by ES:[EDI] . When used together with the REPNE prefix (REPeat while Not Equal), SCAS scans the string searching for the first string element which is equal to the value in the accumulator. The Intel manual (Vol. 1, p.
The SCASB instruction compares the content of the AL register to the byte addressed by ES:[DI] by performing the subtraction operation AL - ES:[DI], setting the arithmetic flags based on the result of the subtraction. Then, DI is either incremented or decremented by 1 depending on the Direction Flag.
I'll try to explain it by reversing the code back into C.
Intel's Instruction Set Reference (Volume 2 of Software Developer's Manual) is invaluable for this kind of reverse engineering.
The logic for REPNE and SCASB combined:
    while (ecx != 0) {
        temp = al - *(BYTE *)edi;
        SetStatusFlags(temp);
        if (DF == 0)   // DF = Direction Flag
            edi = edi + 1;
        else
            edi = edi - 1;
        ecx = ecx - 1;
        if (ZF == 1) break;
    }
Or more simply:
    while (ecx != 0) {
        ZF = (al == *(BYTE *)edi);
        if (DF == 0)
            edi++;
        else
            edi--;
        ecx--;
        if (ZF) break;
    }
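The pseudocode above can be turned into something compilable. The following is only a minimal sketch: the `struct regs` type and the `repne_scasb` function are hypothetical names invented for illustration, a plain pointer stands in for ES:[EDI], and only ZF is modelled out of all the arithmetic flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register file: only the state REPNE SCASB touches. */
struct regs {
    uint8_t        al;   /* byte to search for */
    uint32_t       ecx;  /* remaining iteration count */
    const uint8_t *edi;  /* current scan pointer (stands in for ES:[EDI]) */
    int            zf;   /* zero flag: set when the last compare matched */
    int            df;   /* direction flag: 0 = forward, 1 = backward */
};

/* Emulates `repne scasb`: stops on a match or when ecx reaches 0. */
static void repne_scasb(struct regs *r)
{
    assert(r != NULL && r->edi != NULL);
    while (r->ecx != 0) {
        r->zf = (r->al == *r->edi);   /* compare AL with the byte at EDI */
        r->edi += r->df ? -1 : 1;     /* step EDI according to DF */
        r->ecx--;                     /* the count drops even on a match */
        if (r->zf)
            break;
    }
}
```

Starting from the register values in the question (AL = 0, ECX = 0xffffffff, EDI pointing at "aaaaaa\n", DF = 0), this loop runs eight times: seven non-matching bytes plus the terminating zero. It leaves ECX at 0xfffffff7 and EDI one byte past the terminator.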
However, the above is insufficient to explain how it computes the length of a string. Based on the presence of the not ecx in your question, I'm assuming the snippet belongs to this idiom (or similar) for computing string length using REPNE SCASB:
    sub ecx, ecx
    sub al, al
    not ecx
    cld
    repne scasb
    not ecx
    dec ecx
Translating to C and using our logic from the previous section, we get:
    ecx = (unsigned)-1;
    al = 0;
    DF = 0;
    while (ecx != 0) {
        ZF = (al == *(BYTE *)edi);
        if (DF == 0)
            edi++;
        else
            edi--;
        ecx--;
        if (ZF) break;
    }
    ecx = ~ecx;
    ecx--;
Simplifying using al = 0 and DF = 0:
    ecx = (unsigned)-1;
    while (ecx != 0) {
        ZF = (0 == *(BYTE *)edi);
        edi++;
        ecx--;
        if (ZF) break;
    }
    ecx = ~ecx;
    ecx--;
Things to note:

- ~ecx (that is, not ecx) is equivalent to -1 - ecx.
- ecx is decremented before the loop breaks, so it decrements by length(edi) + 1 in total.
- ecx can never be zero in the loop, since the string would have to occupy the entire address space.

So after the loop above, ecx contains -1 - (length(edi) + 1), which is the same as -(length(edi) + 2). Flipping the bits gives length(edi) + 1, and the final decrement gives length(edi).
Or rearranging the loop and simplifying:
    const char *s = edi;
    size_t c = (size_t)-1;       // c == -1
    while (*s++ != '\0') c--;    // c == -1 - length(s)
    c = ~c;                      // c == length(s)
And inverting the count:
    size_t c = 0;
    while (*s++ != '\0') c++;
which is the strlen function from C:
    size_t strlen(const char *s) {
        size_t c = 0;
        while (*s++ != '\0') c++;
        return c;
    }
AL is involved because scas scans memory for the value in AL. AL has been zeroed so that the instruction finds the terminating zero at the end of the string. scas itself increments (or decrements, depending on the direction flag) EDI automatically. The REPNZ prefix (which is more readable in the REPNE form) repeats the scas as long as the comparison is false (REPeat while Not Equal) and ECX != 0. It also decrements ECX automatically in every iteration. ECX has been initialized to the largest possible count so that it doesn't terminate the loop early.
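To illustrate why AL matters, here is a rough sketch of the same scan with the search byte and the count passed in as parameters. The function name `scan_for` is hypothetical and DF = 0 is assumed. With a zero byte it finds the string terminator. With any other byte it behaves much like the C library's memchr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring `repne scasb` with DF = 0: scans at most
 * `ecx` bytes starting at `edi` for the value `al`.
 * Returns a pointer to the matching byte, or NULL if the count ran out.
 */
static const uint8_t *scan_for(const uint8_t *edi, uint8_t al, uint32_t ecx)
{
    assert(edi != NULL);
    while (ecx != 0) {
        int zf = (al == *edi);   /* compare AL with the current byte */
        edi++;
        ecx--;
        if (zf)
            return edi - 1;      /* EDI ends one past the match */
    }
    return NULL;                 /* ECX hit zero before any match */
}
```

Calling `scan_for(s, '\n', 8)` on the question's string returns a pointer to offset 6, while a count of 3 returns NULL. This is why the strlen idiom loads ECX with 0xffffffff: a small count would stop the scan before the terminator is reached.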
Since ECX counts down from 0xffffffff (also known as -1), the number of bytes scanned, including the terminator, is -1-ECX, which thanks to two's complement arithmetic can be calculated using a single NOT instruction. Subtracting 1 for the terminator then gives the string length.
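The NOT trick works because x and its bitwise complement always add up to all ones, which is -1 in two's complement, so NOT x equals -1 - x. A quick sketch in 32-bit unsigned arithmetic follows. Both function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* -1 - x in 32-bit unsigned (wraparound) arithmetic. */
static uint32_t minus_one_minus(uint32_t x)
{
    return 0xffffffffu - x;
}

/* Count derived from the final ECX; it includes the terminator byte. */
static uint32_t bytes_scanned(uint32_t ecx_after)
{
    uint32_t n = ~ecx_after;                  /* what `not ecx` computes */
    assert(n == minus_one_minus(ecx_after));  /* NOT x == -1 - x */
    return n;
}
```

With the final ECX value of 0xfffffff7 from the question's string, `bytes_scanned` returns 8: the seven characters plus the terminating zero.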