I was looking at the source for java.lang.String
and noticed the equals
method doesn't check whether the char[]
backing each String is the same object. Wouldn't this improve compare times?
Supposed improvement contained in this rewritten version:
public boolean equals(Object anObject) {
if (this == anObject) {
return true;
}
if (anObject instanceof String) {
String anotherString = (String)anObject;
int n = count;
if (n == anotherString.count) {
char v1[] = value;
char v2[] = anotherString.value;
int i = offset;
int j = anotherString.offset;
/** Begin Optimization **/
if(v1==v2 && i==j){
return true;
}
/** End Optimization **/
while (n-- != 0) {
if (v1[i++] != v2[j++])
return false;
}
return true;
}
}
return false;
}
I believe this would improve performance in the case that the two Strings were obtained using String.substring
, and possibly even interned Strings.
Does anybody know if there's a reason they chose not to implement it this way?
Update: For anybody who might not know a lot about the implementation of String, there are cases other than the String pool where two String objects can have the same char[] value, int offset, and int count.
Consider the following code:
String x = "I am a String, yo!";
String y = x.split(" ")[3];
String z = x.substring(7,14);
You would end up with a situation like this:
Also apparently the value-sharing feature of Strings has been done away with in Java 7u6 in order to satisfy some benchmarks. So if you spent time making your code run in decent time (or at all) by using String.substring() rather than String concatenation, you're SOL.
This is because the == operator doesn't check for equality. It checks for identity. In other words, it doesn't compares the String s value - it compares object references.
You should not use == (equality operator) to compare these strings because they compare the reference of the string, i.e. whether they are the same object or not. On the other hand, equals() method compares whether the value of the strings is equal, and not the object itself.
char is a primitive data type whereas String is a class in java. char represents a single character whereas String can have zero or more characters. So String is an array of chars. We define char in java program using single quote (') whereas we can define String in Java using double quotes (").
Generally, if the strings contain only ASCII characters, you use the === operator to check if they are equal.
Well, you'd need to check the char[]
, the offset
and the count
(string length). Since the char[]
is only created from within the String
class, the only way for all three of those to be equal would be for a String
to create a doppelgänger from itself. You can get it to do that (e.g. new String("why?")
), but it's not a common use case.
<speculative>
I'm not even sure if it would speed anything up. The vast majority of the time, the check will fail, meaning that it's doing extra work for no benefit. That could be offset by branch prediction, but in that case, the few times the check passes, it'll invalidate guesses made by that branch prediction, which could actually slow things down. In other words, if the JVM/CPU tries to optimize for the common case, you'll usually gain nothing, and you'll actually hurt yourself in the rare case (which is what you're trying to optimize). If it doesn't try to optimize that common case, you hurt yourself in most comparisons for the sake of a fairly rare set of comparisons. </speculative>
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With