
Why does the VS C++ 2010 compiler produce different assembly code for similar functions?

Tags: c++, c, assembly

Recently I was thinking about strcpy and went back to K&R, where the implementation is shown as:

while (*dst++ = *src++) ;

However I mistakenly transcribed it as:

while (*dst = *src)
{
    src++; //technically could be ++src on these lines
    dst++; 
}

In any case, that got me thinking about whether the compiler would actually produce different code for these two. My initial thought was that they should be nearly identical: since src and dst are incremented but never used afterwards, I thought the compiler would know not to actually preserve them as "variables" in the generated machine code.

Using Windows 7 with VS 2010 C++ SP1, building in 32-bit Release mode (/O2), I got the disassembly for both of the above versions. To prevent the function from referencing the input directly and from being inlined, I built a DLL for each of the functions. I have omitted the prologue and epilogue of the generated assembly.

    while (*dst++ = *src++)
6EBB1003 8B 55 08             mov         edx,dword ptr [src]     
6EBB1006 8B 45 0C             mov         eax,dword ptr [dst]     
6EBB1009 2B D0                sub         edx,eax                //prepare edx so that edx + eax always points to src     
6EBB100B EB 03                jmp         docopy+10h (6EBB1010h)  
6EBB100D 8D 49 00             lea         ecx,[ecx]              //looks like align padding, never hit this line
6EBB1010 8A 0C 02             mov         cl,byte ptr [edx+eax]  //ptr [edx+ eax] points to char in src  :loop begin
6EBB1013 88 08                mov         byte ptr [eax],cl      //copy char to dst
6EBB1015 40                   inc         eax                    //inc dst ptr (edx + eax now points to the next src char)
6EBB1016 84 C9                test        cl,cl                  // check for 0 (null terminator)
6EBB1018 75 F6                jne         docopy+10h (6EBB1010h)  //if not goto :loop begin
        ;

I have annotated the code above. It is essentially a single loop, with only one check for null and one memory copy per iteration.
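For readers who prefer C to assembly, here is a rough sketch of the same shape. The name docopy_indexed is hypothetical, and the single index is only a loose analogy for the compiler's trick of incrementing one register (eax) while keeping the src/dst distance fixed in another (edx). It is not what the compiler was given as input.

```c
#include <stddef.h>

/* Hypothetical helper: one changing value per iteration,
   one load, one store, one null check. */
void docopy_indexed(const char *src, char *dst)
{
    size_t i = 0;           /* the only value that changes in the loop */
    char c;
    do {
        c = src[i];         /* load char from src */
        dst[i] = c;         /* copy char to dst */
        ++i;                /* advances both positions at once */
    } while (c != '\0');    /* single check for the null terminator */
}
```

Calling docopy_indexed("source", buf) with a large enough buf should leave the same string in buf as the K&R loop does.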

Now let's look at my mistaken version:

    while (*dst = *src)
6EBB1003 8B 55 08             mov         edx,dword ptr [src]  
6EBB1006 8A 0A                mov         cl,byte ptr [edx]  
6EBB1008 8B 45 0C             mov         eax,dword ptr [dst]  
6EBB100B 88 08                mov         byte ptr [eax],cl       //copy 0th char to dst
6EBB100D 84 C9                test        cl,cl                   //check for 0
6EBB100F 74 0D                je          docopy+1Eh (6EBB101Eh)  // return if we encounter null terminator
6EBB1011 2B D0                sub         edx,eax  
6EBB1013 8A 4C 02 01          mov         cl,byte ptr [edx+eax+1]  //get +1th char  :loop begin
    {
        src++;
        dst++;
6EBB1017 40                   inc         eax                   
6EBB1018 88 08                mov         byte ptr [eax],cl        //copy above char to dst
6EBB101A 84 C9                test        cl,cl                    //check for 0
6EBB101C 75 F5                jne         docopy+13h (6EBB1013h)   // if not goto :loop begin
    }

In my version, I see that it first copies the 0th char to the destination, then checks for null, and only then enters the loop, where it checks for null again on every iteration. So the loop itself remains largely the same, but the 0th character is now handled before the loop. This of course is going to be sub-optimal compared with the first case.
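To make that structure easier to see, here is a hand-written C sketch with roughly the same shape as the listing above. The name docopy_peeled is hypothetical, and this is my reading of the disassembly, not compiler output.

```c
/* Hypothetical helper mirroring the disassembly: the 0th char is
   handled before the loop, then a bottom-tested loop copies the rest. */
void docopy_peeled(const char *src, char *dst)
{
    char c = *src;          /* load 0th char */
    *dst = c;               /* copy 0th char to dst */
    if (c == '\0')          /* first null check, outside the loop */
        return;
    do {
        c = src[1];         /* get the +1th char */
        ++src;
        ++dst;
        *dst = c;           /* copy that char to dst */
    } while (c != '\0');    /* second null check, once per iteration */
}
```

Written this way, the per-iteration work is one load, one store and one test, the same as in the first listing. The extra cost is the code that runs once before the loop.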

I am wondering if anyone knows why the compiler is prevented from generating the same (or nearly the same) code as in the first example. Is this an MS compiler specific issue, or possibly something in my compiler/linker settings?


Here is the full code, in 2 files (one function replaces the other).

// in first dll project
__declspec(dllexport) void docopy(const char* src, char* dst)
{
    while (*dst++ = *src++);
}

__declspec(dllexport) void docopy(const char* src, char* dst)
{
    while (*dst = *src)
    {
        ++src;
        ++dst;
    }
}


// separate main.cpp file calls docopy
void docopy(const char* src, char* dst);
const char* source = "source"; // string literals should be bound to const char*
char destination[100];
int main()
{

    docopy(source, destination);
}
asked Mar 23 '12 17:03 by skimon



2 Answers

Because in the first example, the post-increment happens always, even if src starts out pointing to a null character. In the same starting situation, the second example would not increment the pointers.
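A minimal sketch of that difference, assuming two hypothetical helpers (end_offset_postinc and end_offset_bodyinc, not part of the question's code) that run each loop and report how far dst has moved when the loop exits:

```c
#include <stddef.h>

/* K&R form: the post-increments run on every evaluation of the
   condition, including the one that copies the terminator. */
size_t end_offset_postinc(const char *src, char *dst)
{
    char *start = dst;
    while ((*dst++ = *src++) != '\0')
        ;
    return (size_t)(dst - start);   /* one past the terminator */
}

/* Transcribed form: the increments live in the loop body, which is
   skipped once the terminator has been copied. */
size_t end_offset_bodyinc(const char *src, char *dst)
{
    char *start = dst;
    while ((*dst = *src) != '\0') {
        ++src;
        ++dst;
    }
    return (size_t)(dst - start);   /* index of the terminator itself */
}
```

For the input "abc" the first helper returns 4 and the second returns 3. For an empty string they return 1 and 0. The copied bytes are identical in both cases, but the two loops are not the same program as far as the pointers are concerned.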

answered Sep 19 '22 22:09 by AShelly


Of course the compiler has other options. The "copy first byte, then enter the loop if not 0" approach is what gcc-4.5.1 produces with -O1. With -O2 and -O3, it produces:

.LFB0:
    .cfi_startproc
    jmp     .L6             // jump to copy
    .p2align 4,,10
    .p2align 3
.L4:
    addq    $1, %rdi        // increment pointers
    addq    $1, %rsi
.L6:                        // copy
    movzbl  (%rdi), %eax    // get source byte
    testb   %al, %al        // check for 0
    movb    %al, (%rsi)     // move to dest
    jne     .L4             // loop if nonzero
    rep
    ret
    .cfi_endproc

which is quite similar to what it produces for the K&R loop. Whether that's actually better I can't say, but it looks nicer.

Apart from the jump into the loop, the instructions for the K&R loop are exactly the same, just ordered differently:

.LFB0:
    .cfi_startproc
    .p2align 4,,10
    .p2align 3
.L2:
    movzbl  (%rdi), %eax    // get source byte
    addq    $1, %rdi        // increment source pointer
    movb    %al, (%rsi)     // move byte to dest
    addq    $1, %rsi        // increment dest pointer
    testb   %al, %al        // check for 0
    jne     .L2             // loop if nonzero
    rep
    ret
    .cfi_endproc
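
For readers who prefer C to assembly, the control flow of the first gcc listing above (the -O2/-O3 output for the increment-in-body loop) can be sketched with goto. The name docopy_jump_in is hypothetical, and this is a hand translation of the listing, not compiler input or output.

```c
/* Hypothetical helper: jump into the middle of the loop so the
   increments are skipped on the first pass. */
void docopy_jump_in(const char *src, char *dst)
{
    char c;
    goto copy;              /* jmp .L6 */
next:                       /* .L4: increment pointers */
    ++src;
    ++dst;
copy:                       /* .L6: copy */
    c = *src;               /* get source byte */
    *dst = c;               /* move to dest */
    if (c != '\0')          /* check for 0 */
        goto next;          /* loop if nonzero */
}
```

This keeps a single copy of the load, store and test, at the price of one unconditional jump on entry.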
answered Sep 20 '22 22:09 by Daniel Fischer