I'm doing a Ruby tutorial here: http://rubymonk.com/learning/books/4-ruby-primer-ascent/chapters/45-more-classes/lessons/105-equality_of_objects
It's saying when I overload the == operator that I should also overload the eql? and hash methods because they are "faster".
However, if I am overloading all three with essentially the same method, how is one faster than the other?
In most cases, == and eql? have the same result. In some cases, eql? is more strict than ==:
42.0 == 42 # => true
42.0.eql?(42) # => false
Because of this, if you define == you probably want to define eql? also (or vice versa).
A choice was made that the Hash class would use eql? to differentiate between different keys, not ==. It could have been ==, mind you, but eql? was cleaner.
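A quick way to see that choice in action is to look up an Integer key with a Float. This is only a small sketch using core Integer, Float and Hash behavior; the variable name ages is made up for illustration.

```ruby
# Illustrative hash keyed by an Integer.
ages = { 42 => "integer key" }

p 42 == 42.0      # => true  (loose comparison)
p 42.eql?(42.0)   # => false (strict comparison)
p ages[42]        # => "integer key"
p ages[42.0]      # => nil, because Hash compares keys with eql?, not ==
```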
To avoid doing expensive calls to eql? all the time, a hash value is calculated with the requirement that two objects that are eql? must have the same hash value. That hash value is stored, which makes future lookups very easy: if the hash code does not match, then the values are not eql?...
For that reason, you must define hash in a sensible way if you define eql?.
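As a rough sketch of what defining all three consistently might look like, here is a hypothetical Point class (the class name and its x and y fields are assumptions for illustration, not taken from the tutorial). It delegates hash to an array of the same fields that eql? compares, so two objects that are eql? always get the same hash value.

```ruby
# Hypothetical value class; Point, x and y are illustrative names.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Loose equality: fields compared with ==, so 1 and 1.0 match.
  def ==(other)
    other.is_a?(Point) && x == other.x && y == other.y
  end

  # Strict equality used by Hash: fields compared with eql?.
  def eql?(other)
    other.is_a?(Point) && x.eql?(other.x) && y.eql?(other.y)
  end

  # Built from the same fields as eql?, so eql? objects share a hash value.
  def hash
    [x, y].hash
  end
end

a = Point.new(1, 2)
b = Point.new(1, 2)
c = Point.new(1.0, 2.0)

p a == c                    # => true
p a.eql?(c)                 # => false
p a.hash == b.hash          # => true
p({ a => "found" }[b])      # => "found"
p({ a => "found" }[c])      # => nil
```

If hash were left undefined here, a and b would fall back to Object#hash, and the lookup with b would most likely miss even though a.eql?(b) is true.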
Note that calculating the hash value is almost always more expensive than doing a comparison with == or eql?. Once the hash is calculated, though, checking that the hashes match is very quick.
Because hashes normally involve very many comparisons, the relatively expensive hash calculation is done once for each key, and then once for each lookup. Imagine a hash with 10 entries. Building it will involve 10 calls to hash, before the first lookup is even done. The first lookup will be relatively quick though: one call to hash, followed by a very efficient comparison of hash codes (it's actually faster than this, as they are "indexed"). If there is a match, one must still do a call to eql? to ensure it's a real match. Indeed, two objects that are not eql? could have the same hash. The only guarantee is that two objects that are eql? must have the same hash, but two different objects could have the same hash too.
If you wanted to do the same using an Array instead, you might need 10 calls to eql? for each lookup.
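To make the difference concrete, here is a sketch that counts the calls. CountingKey is a hypothetical class written just for this experiment, and the array lookup is modelled as a linear scan with eql? (an assumption about how one would "do the same" with an Array). The exact counts on the Hash side depend on the Ruby implementation and version, so treat them as indicative only.

```ruby
# Hypothetical key class that counts calls to hash and eql?.
class CountingKey
  attr_reader :name

  class << self
    attr_accessor :hash_calls, :eql_calls

    def reset_counts
      self.hash_calls = 0
      self.eql_calls = 0
    end
  end
  reset_counts

  def initialize(name)
    @name = name
  end

  def hash
    self.class.hash_calls += 1
    name.hash
  end

  def eql?(other)
    self.class.eql_calls += 1
    other.is_a?(CountingKey) && name.eql?(other.name)
  end
end

keys  = Array.new(10) { |i| CountingKey.new("key#{i}") }
table = keys.each_with_index.to_h   # key => index
probe = CountingKey.new("key9")     # distinct object, eql? to the last key

# Hash lookup: the probe is hashed, then eql? confirms the candidate.
CountingKey.reset_counts
table[probe]
puts "Hash lookup: hash=#{CountingKey.hash_calls} eql?=#{CountingKey.eql_calls}"

# Array scan: each element is compared with eql? until one matches.
CountingKey.reset_counts
keys.find { |k| k.eql?(probe) }
puts "Array scan:  hash=#{CountingKey.hash_calls} eql?=#{CountingKey.eql_calls}"
```

The array scan needs 10 calls to eql? to reach the last element, while the hash lookup typically needs one call to hash plus a single confirming eql?.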
For what it's worth, I don't think the Ruby primer you link to is as clear as it could be. It neglects the fact that calculating the hash can be expensive, so it's done only when it makes sense, i.e. when it is a good assumption that each element will be compared many times. Moreover, it's a shame that the example of a custom eql? it gives uses == to compare the instance variables. Ideally, it would use eql? for consistency, in the same way that arrays are == if their elements are == and arrays are eql? if their elements are eql?. Finally, it really should mention Struct, which defines decent ==, hash and eql? for you.
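For instance, a Struct gives you all three methods without writing any of them. In this sketch the Coordinate name and its members are hypothetical, chosen only for illustration:

```ruby
# Hypothetical value type built with Struct.
Coordinate = Struct.new(:x, :y)

a = Coordinate.new(1, 2)
b = Coordinate.new(1, 2)
c = Coordinate.new(1.0, 2.0)

p a == b                 # => true  (members compared with ==)
p a == c                 # => true  (1 == 1.0)
p a.eql?(c)              # => false (members compared with eql?)
p a.hash == b.hash       # => true
p({ a => :found }[b])    # => :found
```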
For example, Array#hash says:
Two arrays with the same content will have the same hash code (and will compare using eql?).
and Array#== says:
Equality — Two arrays are equal if they contain the same number of elements and if each element is equal to (according to Object#==) the corresponding element in other_ary.
and Array#eql? says:
Returns true if self and other are the same object, or are both arrays with the same content.
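The three quoted behaviors can be checked directly in irb. This is a small sketch with made-up variable names, using only core Array methods:

```ruby
ints   = [1, 2, 3]
same   = [1, 2, 3]
floats = [1.0, 2.0, 3.0]

p ints == floats           # => true  (elements compared with ==)
p ints.eql?(floats)        # => false (elements compared with eql?)
p ints.eql?(same)          # => true
p ints.hash == same.hash   # => true  (same content, same hash code)
```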
So, going by the documentation, eql? appears to be faster because it uses the hash value together with eql?, whereas #== does two things:
- a check of the array's length, and
- an equality test on each element.