Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Ruby - fastest way to select items that belong to two arrays by nested attribute

My Ruby query is really slow. I'm looking for a faster way to select items that belong to two arrays

listA is a list of objectA with the attribute name

listB is a list of objectB with the attribute name

There can be multiple items in listA that all have the same name, but nothing in listB with the same name. There are also items in listA whose names are not in listB. I want a function (loop) to return all the items in listA where their name is in listB.

This is my current solution, but it's too slow. How can I make it faster ?

z = 0
new = []
while z < listB.length 
    hold = listA.select{|obj| obj.name == listB[z].name}
    new += hold
    z += 1
end
like image 272
Marcus Avatar asked Feb 10 '23 13:02

Marcus


2 Answers

You want to maximise lookup speed. The best speed you can reach for lookup is O(1) using Hashes. My method described below is 1000x faster than your code and 350x faster than a loop with include? for n=10000

EDIT 2 : This link explains more in detail why hash lookup is fast.

names = {}
listB.each{|item| names[item.name] = true}
result = listA.select{|item| names[item.name]}

EDIT : Updates to the benchmark.

  • 03/08/2015 - Added user12341234 set method

Benchmark Code

class MyObject
  def initialize(name)
    @name = name
  end

  def name
    @name
  end
end

n=10000
k=n/10

listA = (1..n).map{ MyObject.new(('a'..'z').to_a.shuffle[0,8].join)}
listB = listA.sample(k)
listB += (1..(n-k)).map{ MyObject.new(('a'..'z').to_a.shuffle[0,8].join)}

require 'benchmark'
require 'set'

Benchmark.bm do |x|
    x.report("Hash") do
        names = {}
        listB.each{|item| names[item.name] = true}
        result = listA.select{|item| names[item.name]}
    end

    x.report("OP's code") do
        z = 0
        new = []
        while z < listB.length 
            hold = listA.select{|obj| obj.name == listB[z].name}
            new += hold
            z += 1
        end
    end

    x.report("Include") do
      names = listB.map(&:name)
      listA.select{|obj| names.include?(obj.name) }
    end

    x.report("Set") do 
        names = listB.map(&:name).to_set
        listA.select{|item| names.include?(item.name)}
    end
end

Benchmark

Specs :

  • Intel Core i7 920 @ 2.67 GHz
  • 13 GB RAM
  • Windows 8 x64
  • Ruby 21 x64

Results :

Algorithm       user     system      total        real
Hash         0.015000   0.000000   0.015000 (  0.013002)
OP's code   26.053000   0.016000  26.069000 ( 26.161283)
Include      9.219000   0.000000   9.219000 (  9.244161)
Set          0.016000   0.000000   0.016000 (  0.013001)
like image 140
Cyril Duchon-Doris Avatar answered May 04 '23 17:05

Cyril Duchon-Doris


Something like:

names = listB.map(&:name)
listA.select {|obj| names.inlude? obj.name }
like image 27
BroiSatse Avatar answered May 04 '23 17:05

BroiSatse