My algorithm skills are lackluster. I created a method to see if two arrays contain the same elements (duplicates don't matter):
one = [1, "taco", 3, 2, :piece, 4, 5, 5, 5, 5]
two = [:piece, 2, 5, 4, 1, "taco", 3]
def same_elements?(array_one, array_two)
  return true if ( (array_one - array_two).empty? && (array_two - array_one).empty? )
  return false
end
same_elements?(one, two)
This returns true (which is correct). The problem is, I am not sure what the efficiency of this algorithm is. My first guess is O(n^2) since we have to check both a-b and b-a. I know that O(n^2) is pretty terrible. Is there a more efficient way to do this?
O(n+m) on average
The worst case is O(nm), but it only happens if you really want to make it happen (see last paragraph).
If array_one is chosen to be bigger than array_two, O(m+n) is just O(n), so this algorithm runs in linear time on average.
Another shorter way to check would be :
one = [1, "taco", 3, 2, :piece, 4, 5, 5, 5, 5]
two = [:piece, 2, 5, 4, 1, "taco", 3]
puts Set[*one] == Set[*two] #=> true
# or
puts one.to_set == two.to_set #=> true
return true if x
return false
is equivalent to just
x
So your code can be written :
def same_elements?(array_one, array_two)
  (array_one - array_two).empty? && (array_two - array_one).empty?
end
I created one array with 1E6 elements, half of which are random numbers between 0 and 199999 (for collisions), the other half being plain Ruby objects.
The other array is just the first one, shuffled randomly.
N = 1_000_000
one = (1..N).map{rand < 0.5 ? rand(N/5) : Object.new}
two = one.sort_by{rand}
It takes 1 min to compare the sets, and fruity reports that set comparison is about 20% faster than the OP's method.
For smaller arrays of integers, the OP's method was a bit faster.
Note: code proposed by @engineersmnky in comments was reported to have similar speed than the other methods.
Your code is surely not O(nm) when used with usual Arrays.
Approximate times are :
Looking at rb_ary_diff in array.c, it's no wonder all the methods described above run in the same time : they basically work the same.
rb_ary_diff creates a hash table for array_two ( in O(m)), and iterates over each element of array_one (in O(n)), looking for the value in the hash table (O(1) on average). The whole operation would be O(n+m) on average.
This blog post analyses set intersection, which is implemented in a very similar way.
Doing it twice doesn't change anything, so the overall time complexity stays O(n+m).
One way to make this algorithm O(mn) is to completely disable the hash method. There's no reason to do so, except to prove that it's a very bad idea.
With 10_000 KeyObjects :
class KeyObject < Object
end
the set comparison takes less than 1 s.
With 10_000 KeyObjects :
class KeyObject < Object
  def hash
    1
  end
end
the set comparison takes more than 14 mins!
The probability that 2 distinct, random Ruby objects have the same hash is about 1E-20. Strictly speaking, the worst case for this algorithm is O(mn), but it will never happen if you don't go looking for it. Finding a collision with 2 elements isn't trivial, finding a collision with 1E6 elements is not going to happen by chance.
Let the size of first and second array be m and n respectively. Looking at rb_ary_diff's source code (see Joel Cornett's comment above), there's a for loop which runs O(m) times. Inside the loop there's a call that seems to search the hash. This operation in general takes O(n) time. Thus, assuming that all other calls are asymptotically faster than O(mn), then the overall difference function complexity is O(mn). Calling this function twice followed by an emptyness check results in your algorithm being O(mn).
On average hash search is constant, i.e., O(1), which means that in this case your algorithm performs in O(n). Still, hash search worst case complexity is O(n) meaning your algorithm is O(mn). It's a good exercise to find an example which demonstrates this.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With