
Why does this code behave differently in Java and in C?

Tags:

java

c

I have this code. I ran it in Java and in C, but they give me two different results. What makes them behave differently?

x=10;y=10;z=10;
y-=x--;
z-=--x;
x-=--x-x--;

In Java the final value of x is 8; in C it is 6.

Why do the two compilers behave differently for the increment and decrement operators?

vipin k. asked Nov 24 '09 08:11



3 Answers

You are wrong when you say that the output of this code considered as a C program is 6.

Considered as a C program, this is undefined. You just happened to get 6 with your compiler, but you could just as well have gotten 24, segmentation fault, or a compile-time error.

See the C99 standard, section 6.5, paragraph 2:

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.

--x-x-- is explicitly forbidden by this paragraph.

EDIT:

Aaron Digulla writes in the comments:

"Is it really undefined?"

Did you notice that I linked to the C99 standard and indicated the paragraph that says this is undefined?

"gcc -Wall (GCC 4.1.2) doesn't complain about this and I doubt that any compiler would reject this code."

The standard describes some behaviors as "undefined" precisely because not every way in which a C program can be nonsense can be detected reliably at compile time. If you think that "no warning" should mean everything is fine, you should switch to a language other than C. Many modern languages are better defined. I use OCaml when I have a choice, but there are countless other well-defined languages.

"There is a reason why it returns 6 and you should be able to explain it."

I did not notice your explanation of why this expression evaluated to 6. I hope you didn't spend too much time writing it, because for me it returns 0.

Macbook:~ pascalcuoq$ cat t.c
#include <stdio.h>

int main(int argc, char **argv)
{
  int y;
  printf("argc:%d\n", argc);
  y = --argc - argc--;
  printf("y:%d\n", y);
  return 0;
}
Macbook:~ pascalcuoq$ gcc t.c
Macbook:~ pascalcuoq$ ./a.out 1 2 3 4 5 6 7 8 9
argc:10
y:0

This is the time at which you argue that there is a bug in my compiler (since it doesn't return the same thing as yours).

Macbook:~ pascalcuoq$ gcc -v
Using built-in specs.
Target: i686-apple-darwin9
Configured with: /var/tmp/gcc/gcc-5490~1/src/configure --disable-checking -enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.0/ --with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib --build=i686-apple-darwin9 --with-arch=apple --with-tune=generic --host=i686-apple-darwin9 --target=i686-apple-darwin9
Thread model: posix
gcc version 4.0.1 (Apple Inc. build 5490)

Aaron also writes:

"As an engineer, you should still be able to explain why it returns one result or the other."

Exactly! I gave the simplest explanation why one might get 6: the result is explicitly specified in C99 as undefined behavior, and it was in earlier standards too.

and:

"Lastly, please show a compiler which warns about this construct."

To the best of my knowledge, no compiler warns about *(&x - 1) where x is defined by int x;. Are you claiming that this construct is valid C and that a good engineer should be able to predict the result because no compiler warns about it? This construct is undefined, just like the one being discussed.

Lastly, if you absolutely need warnings to believe there is a problem, consider using a verification tool such as Frama-C. It needs to make some assumptions that are not in the standard to capture some existing practices, but it correctly warns about --x-x-- and most other undefined C behaviors.

Pascal Cuoq answered Sep 29 '22 08:09



How is the term evaluated? The right-hand side --x - x-- evaluates to 0 in both Java and C, but it changes x. So the question is: how does -= work? Does it read x before the right-hand side (RHS) is evaluated and then subtract the RHS, or does it read x after the RHS has been evaluated? That is, do you have

tmp = x // copy the value of x
x = tmp - (--x - x--) // complicated way to say x = x

or

tmp = (--x - x--) // first evaluate the RHS from left to right, which decrements x by 2
x = x - tmp // subtract 0 from x

In Java, here is the rule:

A compound assignment expression of the form E1 op= E2 is equivalent to E1 = (T)((E1) op (E2)), where T is the type of E1, except that E1 is evaluated only once. (see 15.26.2 Compound Assignment Operators)

This means the value of x is copied first, so the pre- and post-decrements have no effect on the final result. Your C compiler probably uses a different rule.

For C, this article might help:

The moral is that writing code that depends on order of evaluation is a bad programming practice in any language.

[EDIT] Pascal Cuoq (see below) insists that the standard says the result is undefined. This is probably correct: I stared at the part he copied out of the standard for a couple of minutes and couldn't understand what that sentence said. I guess I'm not alone here :) So I went to see how the C interpreter that I developed for my master's thesis works. It's not standard compliant, but I understand how it works. I guess I'm a Heisenberg-type guy: I can have either at any precision but not both ;) Anyway.

When parsing this construct, you get this parse tree:

        +---- (-=) ----+
        v     -=       v
        x        +--- (-) ----+
                 v            v
              PREDEC x    POSTDEC x

The standard states that modifying x three times (once on the left and twice in the two decrement ops), leaves x undefined. Okay. But a compiler is a deterministic program, so when it accepts some input, it will always produce the same output. And most compilers work the same. I think we all agree that any C compiler will in fact accept this input. What outputs can we expect? Answer: 6 or 8. Reasoning:

  1. x-x is 0 for any value of x.
  2. --x-x is 0 for any value of x, because it can be written as --x, x-x
  3. x-x-- is 0 because the result of the minus operator is calculated before the post-decrement.

So neither the pre-decrement nor the post-decrement has any influence on the result. Also, there is no interference between the two operators (using them both in the same expression, as in a = --y - x--, doesn't change their behavior). Conclusion: any C compiler will return 0 for --x - x-- (well, except the buggy ones).

Which leaves us with my original assumption: the value of the RHS has no influence on the result. It always evaluates to 0, but it modifies x. So the question is: how is -= implemented? There are quite a few factors that play a role here:

  1. Does the CPU have a native operator for -=? Register-based CPUs do (in fact, they only have such operators: to do a+b, they have to copy a into a register and then they can += b to it). Stack-based CPUs don't (they push all the values onto the stack and then use operators that take the topmost stack elements as operands).
  2. Are the values saved on the stack or in registers? (Another way to ask the first question)
  3. Which optimization options are active?

To go any further, we must look at the code:

#include <stdio.h>

int main() {
        int x = 8;
        x -= --x - x--;
        printf("x=%d\n", x);
}

When compiled, we get this assembly code for the assignment (x86-64):

    .loc 1 4 0
    movl    $8, -4(%rbp)    ; x = 8
    .loc 1 5 0
    subl    $1, -4(%rbp)    ; --x
    movl    $0, %eax        ; tmp = 0
    subl    %eax, -4(%rbp)  ; x -= tmp
    subl    $1, -4(%rbp)    ; x--
    .loc 1 6 0
    movl    -4(%rbp), %esi  ; push `x` into the place where printf() expects it

The first movl sets x to 8 which means -4(%rbp) is x. As you can see, the compiler actually notices x-x and optimizes that to 0 as predicted (even without any optimization options). We also have the two expected -- operations which means the result must always be 6.

So who is right? We both are. Pascal is right when he says that the standard doesn't define this behavior. But that doesn't mean it's random. All the pieces of the code have a well-defined behavior, so the behavior of the sum can't suddenly be undefined (unless there is something else missing - but not in this case). So even though the standard doesn't treat this problem, it's still deterministic.

For stack-based CPUs (which don't have any registers), the result should be 8, since they will copy the value of x before they start evaluating the right-hand side. For register-based CPUs, it should always be 6.

Moral: The standard is always right, but if you must understand, look at the code ;)

Aaron Digulla answered Sep 29 '22 09:09



In C++, the result is indeterminate, i.e., not specified or guaranteed to be consistent - the compiler is free to do whatever suits it best at any time based on sequence points.

I suspect the same for Java [and C# etc.]

Ruben Bartelink answered Sep 29 '22 09:09
