Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Unexpected output from Bubblesort program with MSVC vs TCC

One of my friends sent this code to me, saying it doesn't work as expected:

#include<stdio.h>

void main()
{
    int a [10] ={23, 100, 20, 30, 25, 45, 40, 55, 43, 42};
    int sizeOfInput = sizeof(a)/sizeof(int);
    int b, outer, inner, c;
    printf("Size is : %d \n", sizeOfInput);

    printf("Values before bubble sort are  : \n");
    for ( b = 0; b &lt; sizeOfInput; b++)
        printf("%d\n", a[b]);

    printf("End of values before bubble sort... \n");

    for ( outer = sizeOfInput; outer > 0; outer-- )
    {
        for (  inner = 0 ; inner < outer ; inner++)
        {
        printf ( "Comparing positions: %d and %d\n",inner,inner+1);
        if ( a[inner] > a[inner + 1] )
        {
                int tmp = a[inner];
                a[inner] = a [inner+1];
            a[inner+1] = tmp;
            }
        }
        printf ( "Bubble sort total array size after inner loop is %d :\n",sizeOfInput);
        printf ( "Bubble sort sizeOfInput after inner loop is %d :\n",sizeOfInput);
    }
    printf ( "Bubble sort total array size at the end is %d :\n",sizeOfInput);
    for ( c = 0 ; c < sizeOfInput; c++)
        printf("Element: %d\n", a[c]);
}

I am using Micosoft Visual Studio Command Line Tool for compiling this on a Windows XP machine. cl /EHsc bubblesort01.c

My friend gets the correct output on a dinosaur machine (code is compiled using TCC there).
My output is unexpected. The array mysteriously grows in size, in between.

If you change the code so that the variable sizeOfInput is changed to sizeOfInputt, it gives the expected results!

A search done at Microsoft Visual C++ Developer Center doesn't give any results for "sizeOfInput".

I am not a C/C++ expert, and am curious to find out why this happens - any C/C++ experts who can shed some light on this?
Unrelated note: I seriously thought of rewriting the whole code to use quicksort or merge sort before posting it here. But, after all, it is not Stooge sort...

Edit: I know the code is not correct (it reads beyond the last element), but I am curious why the variable name makes a difference.

like image 882
crnlx Avatar asked Jan 01 '11 20:01

crnlx


2 Answers

Like interjay's answer mentioned, once you get into undefined behavior all bets are off. However, when you said that merely renaming the variable changed the behavior of the program, I got curious about what was going on (undefined or not).

First, I didn't believe that renaming the variable would change the compiler's output (all other things being equal), but sure enough - when I tried it, I was surprised to see exactly what you described.

So I had the compiler dump the assembly for the code it was generating for each version of the source file, and ran a comparison. Here's what I found in the compilers description of how it was laying out the local variables:

    ***** test.sizeOfInput.cod
    _c$ = -56                    ; size = 4
    _b$ = -52                    ; size = 4
    _inner$ = -48                ; size = 4
    _a$ = -44                    ; size = 40
>>> _sizeOfInput$ = -4           ; size = 4
    _main   PROC
    ***** test.sizeOfInputt.cod
    _c$ = -56                    ; size = 4
>>> _sizeOfInputt$ = -52         ; size = 4
    _b$ = -48                    ; size = 4
    _inner$ = -44                ; size = 4
    _a$ = -40                    ; size = 40
    _main   PROC
    *****

What you'll notice is that when the variable is named sizeOfInput, he compiler places it at a higher address than array a (just past the end of the array), and when the variable is named sizeOfInputt it places it at a lower address than array a instead of just past the end of the array. That means that in the build that has the variable named sizeOfInput, the undefined behavior that occurs when you modify a[10] is changing the value of sizeOfInput. In the build that uses the name sizeOfInputt, since that variable isn't at the end of the array, the write to a[10] trashes something else.

As to why the compiler would lay out the variables differently when the name of one changes in an apparently insignificant way - I have no idea.

But this is a good example of why you shouldn't count on the layout of local variables (or pretty much any variables, though you can count on the layout order of struct elements), and why when it comes to undefined behavior, "it works on my machine" doesn't cut it as proof that something works.

like image 107
Michael Burr Avatar answered Nov 15 '22 17:11

Michael Burr


Your code reads past the end of the array. The maximum value of outer is 10, and the maximum value of inner is 9, so a[inner+1] will read a[10]. This will give you undefined behavior, which explains why different compilers give different results.

As for the variable name making a difference: It probably doesn't. It's possible that if you run the same code twice (using the same variable name), you will get different results. Basically, when invoking undefined behavior, you can't be sure of anything that your program will do, so it's best not to try and find meaning in things like variable names.

There's also a chance that the variable name does make a difference. That depends on the implementation of the compiler: For example, using a different variable name might make the compiler organize memory differently somehow, which could cause the program to bahave differently. I think most compilers would output the same code if you change variable names though, so it was probably just a matter of luck.

like image 32
interjay Avatar answered Nov 15 '22 18:11

interjay