
Comparing object equivalence in Ruby

Tags:

ruby

I'm doing a Ruby tutorial here: http://rubymonk.com/learning/books/4-ruby-primer-ascent/chapters/45-more-classes/lessons/105-equality_of_objects

It says that when I override the == operator, I should also override the eql? and hash methods because they are "faster".

However, if I am overloading all three with essentially the same method, how is one faster than the other?

Ricky asked Apr 24 '13 21:04




2 Answers

In most cases, == and eql? have the same result. In some cases, eql? is more strict than ==:

42.0 == 42 # => true
42.0.eql?(42) # => false

Because of this, if you define == you probably want to define eql? also (or vice versa).

A choice was made that the Hash class would use eql? to differentiate between different keys, not ==. It could have been ==, mind you, but eql? was cleaner.
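A small sketch of what that choice means in practice, using the same 42 and 42.0 values as above (the variable names are only for illustration):

```ruby
# Hash keys are matched with eql?, not ==.
h = { 42 => "integer key" }

int_lookup   = h[42]    # 42.eql?(42) is true, so the key is found
float_lookup = h[42.0]  # 42.0 == 42 is true, but 42.0.eql?(42) is false

puts int_lookup.inspect    # prints "integer key"
puts float_lookup.inspect  # prints nil
```

If Hash had used == instead, both lookups would have returned the same entry.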

To avoid doing expensive calls to eql? all the time, a hash value is calculated, with the requirement that two objects that are eql? must have the same hash value. That hash value is stored, which makes future lookups very easy: if the hash codes do not match, then the values are not eql?.

For that reason, you must define hash in a sensible way if you define eql?.
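Here is a minimal sketch of a class that keeps the three methods consistent. The Coordinate class and its lat/lng attributes are hypothetical names made up for this example, and delegating to Array#hash is just one reasonable way to build the hash value:

```ruby
class Coordinate
  attr_reader :lat, :lng

  def initialize(lat, lng)
    @lat = lat
    @lng = lng
  end

  # Loose equality: members are compared with ==.
  def ==(other)
    other.is_a?(Coordinate) && lat == other.lat && lng == other.lng
  end

  # Strict equality (used by Hash): members are compared with eql?.
  def eql?(other)
    other.is_a?(Coordinate) && lat.eql?(other.lat) && lng.eql?(other.lng)
  end

  # Objects that are eql? must return the same hash value.
  # Two arrays whose elements are eql? have the same hash.
  def hash
    [lat, lng].hash
  end
end

a = Coordinate.new(1, 2)
b = Coordinate.new(1, 2)

places = { a => "home" }
found  = places[b]   # b is a different object, but it is eql? to a

puts found           # prints home
```

Without the custom hash method, a and b would most likely get different hash values and the lookup with b would miss, even though eql? says they are equal.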

Note that calculating the hash value is almost always more expensive than doing a comparison with == or eql?. Once the hash is calculated, though, checking that the hashes match is very quick.

Because hashes normally involve very many comparisons, the relatively expensive hash calculation is done once for each key, and then once for each lookup. Imagine a hash with 10 entries. Building it will involve 10 calls to hash before the first lookup is even done. The first lookup will be relatively quick though: one call to hash, followed by a very efficient comparison of hash codes (it's actually faster than this, as they are "indexed"). If there is a match, one must still do a call to eql? to ensure it's a real match. Indeed, two objects that are not eql? could have the same hash. The only guarantee is that two objects that are eql? must have the same hash, but two different objects could have the same hash too.
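To illustrate that last point, here is a sketch with a deliberately bad hash method. CollidingKey is a hypothetical class invented for this example; every instance returns the same hash value, so the Hash has to fall back on eql? to tell keys apart:

```ruby
class CollidingKey
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Deliberately terrible: every key collides.
  def hash
    1
  end

  # Keys are only "the same" if their names are eql?.
  def eql?(other)
    other.is_a?(CollidingKey) && name.eql?(other.name)
  end
end

table = {}
table[CollidingKey.new("a")] = 1
table[CollidingKey.new("b")] = 2   # same hash as "a", but not eql?: a new entry
table[CollidingKey.new("a")] = 3   # same hash and eql? to the first key: overwrites

puts table.size                    # prints 2
puts table[CollidingKey.new("a")]  # prints 3
puts table[CollidingKey.new("b")]  # prints 2
```

The result is still correct, it is just slow: with every hash value identical, each lookup degrades into a series of eql? calls.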

If you wanted to do the same using an Array instead, you might need 10 calls to eql? for each lookup.

For what it's worth, I don't think the Ruby primer you link to is as clear as it could be. It neglects the fact that calculating the hash can be expensive, so it is worth doing only when it makes sense, i.e. when it is a good assumption that each element will be compared many times. Moreover, it's a shame that the example of a custom eql? it gives uses == to compare the instance variables. Ideally, it would use eql? for consistency, in the same way that arrays are == if their elements are == and arrays are eql? if their elements are eql?. Finally, it really should mention Struct, which defines decent ==, hash and eql? for you.
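A short sketch of what Struct gives you for free (Point and its x/y members are names picked for this example):

```ruby
Point = Struct.new(:x, :y)

p1 = Point.new(1, 2)
p2 = Point.new(1, 2)
p3 = Point.new(1.0, 2)

same_loose   = p1 == p2            # true: members compared with ==
same_strict  = p1.eql?(p2)         # true: members compared with eql?
same_hash    = p1.hash == p2.hash  # true: eql? structs share a hash value

mixed_loose  = p1 == p3            # true:  1 == 1.0
mixed_strict = p1.eql?(p3)         # false: 1.eql?(1.0) is false

lookup = { p1 => "found" }[p2]     # "found": p2 works as the same key as p1
```

Note that Struct follows the same convention as Array: == compares members with ==, and eql? compares them with eql?.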

Marc-André Lafortune answered Oct 09 '22 22:10



For example, the documentation for Array#hash says:

Two arrays with the same content will have the same hash code (and will compare using eql?).

and Array#== says:

Equality — Two arrays are equal if they contain the same number of elements and if each element is equal to (according to Object#==) the corresponding element in other_ary.

and Array#eql? says:

Returns true if self and other are the same object, or are both arrays with the same content.

So, going by the documentation, eql? is faster because it uses the hash value, whereas #== does two things:

  1. it compares the lengths of the arrays, and
  2. it tests each pair of elements for equality.
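The three documented behaviours can be checked directly. This is only a quick sketch with made-up variable names; as far as I know, Array#eql? also walks the elements, comparing each pair with eql? rather than ==:

```ruby
ints   = [1, 2]
floats = [1.0, 2.0]

loose  = ints == floats       # true:  same length, and 1 == 1.0, 2 == 2.0
strict = ints.eql?(floats)    # false: 1.eql?(1.0) is false

# Two arrays with the same content have the same hash code.
same_content_hash = [1, 2].hash == ints.hash   # true
```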
Arup Rakshit answered Oct 10 '22 00:10
