REPNZ SCAS Assembly Instruction Specifics

I am trying to reverse engineer a binary, and the following instruction is confusing me. Can anyone clarify what exactly it does?

    =>0x804854e:    repnz scas al,BYTE PTR es:[edi]
      0x8048550:    not    ecx

Where:

    EAX: 0x0
    ECX: 0xffffffff
    EDI: 0xbffff3dc ("aaaaaa\n")
    ZF:  1

I see that it somehow decrements ECX by 1 each iteration, and that EDI advances along the string. I know it calculates the length of the string, but I'm not sure exactly HOW it does that, or why "al" is involved.

asked Nov 06 '14 15:11

Michael Scott

2 Answers

I'll try to explain it by reversing the code back into C.

Intel's Instruction Set Reference (Volume 2 of Software Developer's Manual) is invaluable for this kind of reverse engineering.

REPNE SCASB

The logic for REPNE and SCASB combined:

    while (ecx != 0) {
        temp = al - *(BYTE *)edi;
        SetStatusFlags(temp);
        if (DF == 0)   // DF = Direction Flag
            edi = edi + 1;
        else
            edi = edi - 1;
        ecx = ecx - 1;
        if (ZF == 1) break;
    }

Or more simply:

    while (ecx != 0) {
        ZF = (al == *(BYTE *)edi);
        if (DF == 0)
            edi++;
        else
            edi--;
        ecx--;
        if (ZF) break;
    }
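To make the pseudocode above something you can step through in a debugger, here is a minimal C sketch of the same loop. The `struct regs` type and the `repne_scasb` function name are my own inventions for illustration; they only model the registers and flags this instruction touches, not the real CPU state.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical register file for this sketch: only what REPNE SCASB touches. */
struct regs {
    uint32_t ecx;        /* repeat counter */
    const uint8_t *edi;  /* scan pointer (stands in for ES:EDI) */
    uint8_t al;          /* byte to search for */
    bool zf;             /* zero flag: true when the last compare matched */
    bool df;             /* direction flag: false = forward, true = backward */
};

/* Emulates one REPNE SCASB: scan until AL is found or ECX reaches zero. */
void repne_scasb(struct regs *r)
{
    while (r->ecx != 0) {
        r->zf = (r->al == *r->edi);  /* SCASB compares AL with the byte at EDI */
        r->edi += r->df ? -1 : 1;    /* then steps EDI according to DF */
        r->ecx--;                    /* REPNE decrements ECX on every iteration */
        if (r->zf)                   /* and stops once the bytes are equal */
            break;
    }
}
```

With the register state from the question (ECX = 0xffffffff, AL = 0, DF = 0, EDI pointing at "aaaaaa\n"), the loop runs eight times: seven mismatches plus the terminating zero. It leaves ECX at 0xfffffff7 with ZF set.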

String Length

However, the above is insufficient to explain how it computes the length of a string. Based on the presence of the not ecx in your question, I'm assuming the snippet belongs to this idiom (or similar) for computing string length using REPNE SCASB:

    sub ecx, ecx
    sub al, al
    not ecx
    cld
    repne scasb
    not ecx
    dec ecx

Translating to C and using our logic from the previous section, we get:

    ecx = (unsigned)-1;
    al = 0;
    DF = 0;
    while (ecx != 0) {
        ZF = (al == *(BYTE *)edi);
        if (DF == 0)
            edi++;
        else
            edi--;
        ecx--;
        if (ZF) break;
    }
    ecx = ~ecx;
    ecx--;

Simplifying using al = 0 and DF = 0:

    ecx = (unsigned)-1;
    while (ecx != 0) {
        ZF = (0 == *(BYTE *)edi);
        edi++;
        ecx--;
        if (ZF) break;
    }
    ecx = ~ecx;
    ecx--;

Things to note:

- In two's complement notation, flipping the bits of ecx is equivalent to -1 - ecx.
- In the loop, ecx is decremented before the loop breaks, so it decrements by length(edi) + 1 in total.
- ecx can never be zero in the loop, since the string would have to occupy the entire address space.
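The first point is easy to check for yourself. The helper below is only a sketch (the function name is made up for this example); it compares the two expressions in 32-bit wrapping arithmetic, with -1 written as UINT32_MAX.

```c
#include <stdint.h>

/* Returns 1 when NOT x equals -1 - x in 32-bit wrapping arithmetic. */
int not_equals_minus_one_minus(uint32_t x)
{
    uint32_t flipped = ~x;                 /* what `not ecx` computes */
    uint32_t subtracted = UINT32_MAX - x;  /* -1 - x, with -1 as 0xffffffff */
    return flipped == subtracted;
}
```

The function returns 1 for every 32-bit input, because subtracting from an all-ones value never borrows and so simply inverts each bit.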

So after the loop above, ecx contains -1 - (length(edi) + 1), which is the same as -(length(edi) + 2). Flipping the bits gives length(edi) + 1, and the final decrement gives length(edi).
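As a worked check of that arithmetic, here is a compilable sketch of the whole idiom with a uint32_t standing in for ECX. The function name is hypothetical, and this is a model of the instruction sequence, not what a compiler actually emits.

```c
#include <stdint.h>

/* Sketch of the full idiom; `ecx` models the register, `edi` the scan pointer. */
uint32_t strlen_via_scasb_idiom(const char *edi)
{
    uint32_t ecx = 0;           /* sub ecx, ecx */
    ecx = ~ecx;                 /* not ecx     -> 0xffffffff */

    /* cld / repne scasb with al == 0 */
    while (ecx != 0) {
        int zf = (*edi == '\0');
        edi++;
        ecx--;
        if (zf)
            break;
    }
    /* here ecx == -1 - (length + 1) */

    ecx = ~ecx;                 /* not ecx -> length + 1 */
    ecx--;                      /* dec ecx -> length */
    return ecx;
}
```

For the string in the question, "aaaaaa\n" (7 characters), ECX goes 0xffffffff, then 0xfffffff7 after the scan, then 0x8 after the NOT, then 0x7 after the DEC.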

Or rearranging the loop and simplifying:

    const char *s = edi;
    size_t c = (size_t)-1;      // c == -1
    while (*s++ != '\0') c--;   // c == -1 - length(s)
    c = ~c;                     // c == length(s)

And inverting the count:

    size_t c = 0;
    while (*s++ != '\0') c++;

which is the strlen function from C:

    size_t strlen(const char *s) {
        size_t c = 0;
        while (*s++ != '\0') c++;
        return c;
    }

answered Oct 19 '22 02:10

Daniel Hanrahan

AL is involved, because scas scans the memory for the value of AL. AL has been zeroed so that the instruction finds the terminating zero at the end of the string. scas itself increments (or decrements, depending on the direction flag) EDI automatically. The REPNZ prefix (which is more readable in the REPNE form) repeats the scas as long as the comparison is false (REPeat while Not Equal) and ECX > 0. It also decrements ECX automatically in every iteration. ECX has been initialized to the longest possible string so that it doesn't terminate the loop early.
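If it helps to think in C terms, a forward REPNE SCASB behaves roughly like a bounded memchr: AL is the byte being searched for and ECX is the maximum number of bytes to look at. The sketch below makes that analogy concrete. The function name is made up, and unlike the assembly idiom, the count passed to it should stay within the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Rough C analogue of REPNE SCASB with DF = 0: search at most `count` bytes
   starting at `start` for `value`. Returns how many bytes were examined,
   including the matching one, or `count` if nothing matched. */
size_t bytes_scanned(const void *start, unsigned char value, size_t count)
{
    const unsigned char *hit = memchr(start, value, count);
    if (hit == NULL)
        return count;
    return (size_t)(hit - (const unsigned char *)start) + 1;
}
```

Searching for 0 in "aaaaaa\n" with a bound of 8 examines 8 bytes, which is the string length plus the terminator. Searching for '\n' instead stops after 7 bytes, which is why zeroing AL matters for the string-length use.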

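Since ECX counts down from 0xffffffff (also known as -1), the number of bytes scanned, terminator included, will be -1-ECX, which due to the peculiarity of 2's complement arithmetic can be calculated using a NOT instruction. Subtracting 1 from that gives the string length without the terminator.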

answered Oct 19 '22 01:10

Jester
