
Why do these two string comparisons return different results?

Here is a small piece of code:

    String a = "abc";
    Console.WriteLine(((object)a) == ("ab" + "c")); // true
    Console.WriteLine(((object)a) == ("ab" + 'c')); // false

Why?

asked Apr 08 '15 13:04 by Pablo Honey



1 Answer

Because the == here is doing a reference comparison. With the C# compiler, all equal strings that are known at compile time are "grouped" together (interned), so that

    string a = "abc";
    string b = "abc";

will point to the same "abc" string. So they will be referentially equal.

Now, ("ab" + "c") is simplified at compile time to "abc", while "ab" + 'c' is not, and so is not referentially equal (the concatenation operation is done at runtime).
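The difference between a folded constant and a runtime concatenation can be sketched as follows. This is only an illustration: the class and variable names are made up, and it forces a runtime concatenation through a non-constant local (`tail`) rather than relying on how a particular compiler version treats `"ab" + 'c'`.

```csharp
using System;

static class InternDemo
{
    static void Main()
    {
        string a = "abc";
        string folded = "ab" + "c"; // constant expression: folded to "abc" by the compiler
        string tail = "c";          // non-constant local, so the next line runs at run time
        string built = "ab" + tail; // new string object created by String.Concat

        Console.WriteLine(object.ReferenceEquals(a, folded)); // True: same interned literal
        Console.WriteLine(object.ReferenceEquals(a, built));  // False: different object
        Console.WriteLine(a == built);                        // True: same characters
        Console.WriteLine(object.ReferenceEquals(a, string.Intern(built))); // True
    }
}
```

`string.Intern` returns the pooled instance for a given value, which is why the last line finds the same object as the literal.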

See the decompiled code here

I'll add that Try Roslyn produces a wrong decompilation :-) And so does ILSpy :-(

It is decompiling to:

    string expr_05 = "abc";
    Console.WriteLine(expr_05 == "abc");
    Console.WriteLine(expr_05 == "ab" + 'c');

That is, a string comparison, which is not what the original code does. But at least it clearly shows that some strings are calculated at compile time.

Why is your code doing a reference comparison? Because you are casting one of the two operands to object, and operator== in .NET isn't virtual, so it must be resolved at compile time with the information the compiler has. Quoting the documentation of the == operator:

For predefined value types, the equality operator (==) returns true if the values of its operands are equal, false otherwise. For reference types other than string, == returns true if its two operands refer to the same object. For the string type, == compares the values of the strings.

To the compiler, the first operand of the == operator isn't a string (because you cast it to object), so the string overload of == isn't selected and the comparison falls back to reference equality.
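A minimal sketch of how the static type of the operands selects the comparison, and of a few ways to get a value comparison back. The names `OverloadDemo` and `asObject` are illustrative only.

```csharp
using System;

static class OverloadDemo
{
    static void Main()
    {
        string a = "abc";
        string b = new string(new[] { 'a', 'b', 'c' }); // distinct object, same characters
        object asObject = a;                            // same object as a, static type object

        Console.WriteLine(a == b);                      // True: string operator == (value)
        Console.WriteLine(asObject == (object)b);       // False: object == (reference)
        Console.WriteLine(asObject.Equals(b));          // True: Equals is virtual
        Console.WriteLine(object.Equals(asObject, b));  // True: null-safe static helper
        Console.WriteLine((string)asObject == b);       // True: cast restores the string overload
    }
}
```

Because `Equals` is virtual, it dispatches on the runtime type, so it keeps comparing values even when the variable is typed as object.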

Interesting fact: at the CIL level (the assembly language of .NET), the opcode used is ceq, which does a value comparison for primitive value types and a reference comparison for reference types (so in the end it always does a bit-by-bit comparison, with some exceptions for the floating-point types, such as NaN). It doesn't use "special" operator== methods. It can be seen in this example

where the

Console.WriteLine(a == ("ab" + 'c')); // True  

is resolved at compile time in a call to

call bool [mscorlib]System.String::op_Equality(string, string) 

while the other == are simply

ceq 

This explains why the Roslyn decompiler works "badly" (as does ILSpy :-(, see bug report)... It sees a ceq opcode and doesn't check whether a cast is needed to rebuild the correct comparison.

Holger asked why only the addition between two string literals is done by the compiler... Now, reading the C# 5.0 specifications in a very strict way, and considering the C# 5.0 specifications to be "separated" from the .NET specifications (with the exceptions of the prerequisites that the C# 5.0 has for some classes/structs/methods/properties/...), we have:

String concatenation:

    string operator +(string x, string y);
    string operator +(string x, object y);
    string operator +(object x, string y);

These overloads of the binary + operator perform string concatenation. If an operand of string concatenation is null, an empty string is substituted. Otherwise, any non-string argument is converted to its string representation by invoking the virtual ToString method inherited from type object. If ToString returns null, an empty string is substituted.

So the cases string + string, string + null and null + string are all precisely described, and their result can be "calculated" by using only the rules of the C# specifications. For every other type, the virtual ToString method must be called. The result of the virtual ToString method isn't defined for any type in the C# specifications, so if the compiler "presumed" its result it would be doing something wrong. For example, a .NET version whose System.Boolean.ToString() returned Yes/No instead of True/False would still be OK for the C# specifications.
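The concatenation rules quoted above can be checked with a small sketch. The names are illustrative, and the last line relies on Int32.ToString(), whose result comes from .NET rather than from the C# specifications.

```csharp
using System;

static class ConcatDemo
{
    static void Main()
    {
        string missing = null; // null operand
        object number = 42;    // non-string operand, converted through ToString()

        Console.WriteLine(("ab" + missing) == "ab");  // True: null is replaced by ""
        Console.WriteLine((missing + missing) == ""); // True: the result is never null
        Console.WriteLine("n=" + number);             // n=42
    }
}
```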

answered Nov 06 '22 19:11 by xanatos