
REPNZ SCAS Assembly Instruction Specifics

I am trying to reverse engineer a binary and the following instruction is confusing me. Can anyone clarify exactly what it does?

    => 0x804854e:    repnz scas al,BYTE PTR es:[edi]
       0x8048550:    not    ecx

Where:

    EAX: 0x0
    ECX: 0xffffffff
    EDI: 0xbffff3dc ("aaaaaa\n")
    ZF:  1

I see that it is somehow decrementing ECX by 1 each iteration, and that EDI is incrementing along the length of the string. I know it calculates the length of the string, but I'm not sure exactly HOW it does so, or why "al" is involved.

Michael Scott asked Nov 06 '14 15:11



2 Answers

I'll try to explain it by reversing the code back into C.

Intel's Instruction Set Reference (Volume 2 of the Software Developer's Manual) is invaluable for this kind of reverse engineering.

REPNE SCASB

The logic for REPNE and SCASB combined:

    while (ecx != 0) {
        temp = al - *(BYTE *)edi;
        SetStatusFlags(temp);
        if (DF == 0)   // DF = Direction Flag
            edi = edi + 1;
        else
            edi = edi - 1;
        ecx = ecx - 1;
        if (ZF == 1) break;
    }

Or more simply:

    while (ecx != 0) {
        ZF = (al == *(BYTE *)edi);
        if (DF == 0)
            edi++;
        else
            edi--;
        ecx--;
        if (ZF) break;
    }

String Length

However, the above is insufficient to explain how it computes the length of a string. Based on the presence of the not ecx in your question, I'm assuming the snippet belongs to this idiom (or similar) for computing string length using REPNE SCASB:

    sub ecx, ecx
    sub al, al
    not ecx
    cld
    repne scasb
    not ecx
    dec ecx

Translating to C and using our logic from the previous section, we get:

    ecx = (unsigned)-1;
    al = 0;
    DF = 0;
    while (ecx != 0) {
        ZF = (al == *(BYTE *)edi);
        if (DF == 0)
            edi++;
        else
            edi--;
        ecx--;
        if (ZF) break;
    }
    ecx = ~ecx;
    ecx--;

Simplifying using al = 0 and DF = 0:

    ecx = (unsigned)-1;
    while (ecx != 0) {
        ZF = (0 == *(BYTE *)edi);
        edi++;
        ecx--;
        if (ZF) break;
    }
    ecx = ~ecx;
    ecx--;

Things to note:

  • in two's complement notation, flipping the bits of ecx is equivalent to -1 - ecx.
  • in the loop, ecx is decremented before the loop breaks, so it is decremented length(edi) + 1 times in total (once per character, plus once for the terminating zero).
  • in practice ecx never reaches zero in the loop, since the string would have to occupy essentially the entire 32-bit address space.

So after the loop above, ecx contains -1 - (length(edi) + 1), which is the same as -(length(edi) + 2), where length(edi) means the length of the string edi pointed to on entry. Flipping the bits gives length(edi) + 1, and the final decrement gives length(edi).
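
The arithmetic can be checked against the string from the question. The sketch below is only an illustration: the struct ecx_trace type and the trace_ecx name are hypothetical, and it assumes a NUL-terminated string scanned forwards (DF == 0) with a 32-bit counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: records ECX at each stage of the idiom. */
struct ecx_trace {
    uint32_t after_scan;  /* ECX when REPNE SCASB stops */
    uint32_t after_not;   /* ECX after NOT ECX */
    uint32_t after_dec;   /* ECX after DEC ECX */
};

/* Models only the ECX bookkeeping of the idiom.
   Assumes s is NUL-terminated and DF == 0. */
struct ecx_trace trace_ecx(const char *s)
{
    struct ecx_trace t;
    uint32_t ecx = UINT32_MAX;      /* sub ecx, ecx / not ecx */

    assert(s != NULL);
    for (;;) {
        int zf = (*s++ == '\0');    /* al == 0, so ZF is set on the terminator */
        ecx--;                      /* decremented on every iteration, including the last */
        if (zf) break;
    }
    t.after_scan = ecx;
    t.after_not = ~ecx;
    t.after_dec = t.after_not - 1;
    return t;
}
```

For "aaaaaa\n" (seven characters plus the terminator) the loop runs eight times, so after_scan is 0xfffffff7, after_not is 8 and after_dec is 7.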

Or rearranging the loop and simplifying:

    const char *s = edi;
    size_t c = (size_t)-1;      // c == -1
    while (*s++ != '\0') c--;   // c == -1 - length(s)
    c = ~c;                     // c == length(s)

And inverting the count:

    size_t c = 0;
    while (*s++ != '\0') c++;

which is the strlen function from C:

    size_t strlen(const char *s) {
        size_t c = 0;
        while (*s++ != '\0') c++;
        return c;
    }

Daniel Hanrahan answered Oct 19 '22 02:10


AL is involved because scas scans memory for the value in AL. AL has been zeroed so that the instruction finds the terminating zero at the end of the string. scas itself increments (or decrements, depending on the direction flag) EDI automatically. The REPNZ prefix (which is more readable in its REPNE form) repeats the scas as long as the comparison is false (REPeat while Not Equal) and ECX is non-zero. It also decrements ECX automatically on every iteration. ECX has been initialized to the largest possible count so that the counter doesn't end the loop before the terminator is found.
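
As a rough illustration of these roles, the following C sketch models a single REPNE SCASB as a function over a small register struct. The struct regs type and the repne_scasb name are hypothetical, invented for this example; segment registers and the other arithmetic flags are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register file: only the state REPNE SCASB touches. */
struct regs {
    uint8_t        al;   /* byte to search for */
    uint32_t       ecx;  /* remaining iteration budget */
    const uint8_t *edi;  /* scan pointer (stands in for ES:EDI) */
    int            zf;   /* zero flag: result of the last comparison */
    int            df;   /* direction flag: 0 = forward, 1 = backward */
};

void repne_scasb(struct regs *r)
{
    assert(r != NULL && r->edi != NULL);
    while (r->ecx != 0) {
        r->zf = (r->al == *r->edi);  /* SCASB: compare AL with the byte at EDI */
        r->edi += r->df ? -1 : 1;    /* SCASB: step EDI according to DF */
        r->ecx--;                    /* REP prefix: count the iteration */
        if (r->zf) break;            /* REPNE: stop once a match is found */
    }
}
```

With al = 0, ecx = 0xffffffff and edi pointing at "aaaaaa\n", the call stops with zf set and edi one past the terminator. With ecx = 3 instead, it stops after three bytes with zf clear, which is why the idiom starts ECX at its maximum value.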

Since ECX counts down from 0xffffffff (also known as -1), the number of bytes scanned, including the terminating zero, is -1-ECX, which thanks to a property of 2's complement arithmetic can be calculated with a single NOT instruction. Subtracting 1 from that result gives the string length.
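
The identity behind that NOT can be checked directly. This is a minimal sketch with a hypothetical function name, assuming 32-bit unsigned arithmetic as in the question:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: converts the ECX left behind by the scan into
   the number of bytes scanned, computing it both ways. */
uint32_t bytes_scanned(uint32_t ecx_after)
{
    uint32_t by_subtraction = (uint32_t)-1 - ecx_after;  /* 0xffffffff - ECX */
    uint32_t by_not = ~ecx_after;                        /* NOT ECX */

    assert(by_subtraction == by_not);  /* ~x == -1 - x in 2's complement */
    return by_not;
}
```

For the string in the question, ECX ends at 0xfffffff7, so bytes_scanned returns 8: the seven characters of "aaaaaa\n" plus the terminator.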

Jester answered Oct 19 '22 01:10