Can you please explain me in plain English what is the XOR (^
) operator and what it does in the following code:
public int GetHashCode(Box bx)
{
int hCode = bx.Height ^ bx.Length ^ bx.Width;
return hCode.GetHashCode();
}
The XOR logical operation, exclusive or, takes two boolean operands and returns true if, and only if, the operands are different. Conversely, it returns false if the two operands have the same value.
XOR Background As we can see, if both bits are similar, then the XOR equals to zero. Otherwise, the result equals to one. Put differently, the bitwise XOR operation equals zero in case the number of ones is even and equals one in case the number of ones is odd.
XOR stands for exclusive or. It ensures that either A or B is true but never both. In this case we're doing a bitwise operation so you can make a nice little graph of the outcomes, they are as follows;
0 ^ 1 = 1
1 ^ 1 = 0
1 ^ 0 = 1
0 ^ 0 = 0
Since you're applying it to integers the above outcomes are applied to each bit in the operands. So lets just say you have the values 1, 2, 3 for height, length, and width respectively.
You would first have
0001 ^ 0010 resulting in 0011 then that would be XOR'd into 3 so 0011 ^ 0011 which gives you 0000
EDIT: supplying the wiki link from the comments to supplement my explanation; http://en.wikipedia.org/wiki/Exclusive_or#Computer_science
EDIT: Why does 0001 ^ 0010
result in 0011
?
So it's best to do this bit by bit. Think of the operator iterating over the two sets of bits and comparing pairs of them. So in this case lets work from right to left (least significant to most in this case).
1 ^ 0 = 1 // xxx1
0 ^ 1 = 1 // xx11
0 ^ 0 = 0 // x011
0 ^ 0 = 0 // 0011 - end of input
so piecing that back together you get 0011
. Basically, take each pair of inputs and reference the truth table for the outcome. The comment shows the output with x
being a value that has not yet been calculated.
With regard to collisions, yes, in this case there are plenty of collision. If I said it would be unique that was a poor word choice. What I really mean is that if you have 2, 8, 4 as your values XOR'n them in that order will always produce the same value.
Elaborating a little bit, you see alot of XOR
ing between fields in getHashCode()
methods because its a safe way of getting an objects signature. The concept of a signature is that its like an object's fingerprint, and it needs to fit into 32 bits. This signature is used by a number of objects as a quick comparison, (though, if your planning on using it for that, take a look at that wikipedia article because you need to be careful about equality and hash-codes), or for some kind of addressing (as is the case with .net's Dictionary
and Java's HashMap
).
The obvious solution to me to get a Box's fingerprint is to simply add up the values, that way if any of them change you'll get a different fingerprint:
bx.Height + bx.Length + bx.Width
Given that an equals operation might be really expensive (ie really slow), if we need to test the equality of two boxes:
Box {5, 10, 15}
Box {30, 40, 50}
Rather than do a full equals comparison, we can compare the two hash-code's, see that they're different, and skip the full equality comparison. In a dictionary this is critical to give us a fast method to find a bin (an element) to put the object in.
But if any of those values are too high, we could get an integer overflow exception, so instead of using addition, we use an XOR. Another solution, and one that's fairly unique to C#, is to use the unchecked{ ... }
block, but using XOR is considered more elegant.
There is one more subtle thing we can do to increase performance, and you'll see this with alot of auto-generated hashcode methods (such as those produced by ReSharper or IntelliJ): we can make the order matter by shifting (multiplying) each value.
public int hashCode() {
int result = x;
result = 31 * result ^ y;
result = 31 * result ^ z;
return result;
}
Now whats happening is that each field in your hashcode effectively has a place in the resulting 32 bits. What this means is that the two boxes:
Box {1, 20, 30}
Box {1, 30, 20}
would not have the same hash codes (they would have the same hash codes with your current system, but they're different!)
There's more than you ever wanted to know about hash-codes but I'll say one more thing.
In both Java/Scala and the .net framework, if you overload either equals or hash-code, you must also overload the other. You must also ensure that if two objects A and B have different hash-codes, then a call to A.Equals(B) must be false.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With