
Can I make this Ruby code faster and/or use less memory?

I have an Array of String objects in Ruby, each made up of one or more space-separated words, like the one below:

animals = ["cat horse", "dog", "cat dog bird", "dog sheep", "chicken cow"]

I want to convert this to another Array of String objects but with only one animal per element and only unique elements. I found one way to do this as follows:

class Array
  def process()
    self.join(" ").split().uniq
  end
end

However, if the input array is huge, say millions of entries, performance will be pretty bad: I'd be creating a huge string, then a huge array, and then uniq has to process that huge array to remove duplicate elements. One way I was thinking of speeding things up was to create a Hash with an entry for each word, so that I'd only process each word once, on the first pass. Is there a better way?

conorgriffin asked Jan 09 '23 15:01


1 Answer

You've got the right idea. However, Ruby has a built-in class that's perfect for building sets of unique items: Set.

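require 'set' # needed before Ruby 3.2; harmless on later versions

animals = ["cat horse", "dog", "cat dog bird", "dog sheep", "chicken cow"]

unique_animals = Set.new

# Split each string into words and add them to the set;
# words that are already present are ignored.
animals.each do |str|
  unique_animals.merge(str.split)
end

unique_animals.to_a
# => ["cat", "horse", "dog", "bird", "sheep", "chicken", "cow"]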

Or...

unique_animals = animals.reduce(Set.new) do |set, str|
  set.merge(str.split)
end

Under the covers, Set actually uses a Hash to store its items, but it acts more like an unordered Array and responds to all of the familiar Enumerable methods (each, map, select, etc.). If you need to turn it into a real Array, though, just use Set#to_a.
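To make the comparison with the Hash idea from the question concrete, here is a minimal sketch that puts both approaches side by side. The method names unique_words and unique_words_with_hash are hypothetical, chosen only for illustration. Neither version builds the one huge joined string.

```ruby
require 'set' # only needed before Ruby 3.2

# Hypothetical helper: collects each distinct word in a Set.
def unique_words(strings)
  strings.each_with_object(Set.new) do |str, seen|
    seen.merge(str.split)
  end
end

# Hypothetical helper: the same idea with a plain Hash,
# roughly what the question proposed.
def unique_words_with_hash(strings)
  seen = {}
  strings.each do |str|
    str.split.each { |word| seen[word] = true }
  end
  seen.keys
end

animals = ["cat horse", "dog", "cat dog bird", "dog sheep", "chicken cow"]

words = unique_words(animals)
words.include?("sheep") # => true
words.map(&:upcase)     # Enumerable methods work directly on the Set
words.to_a              # convert to a real Array when one is needed
```

Both helpers split one input string at a time, so the largest temporary array is the word list of a single element. Whether either is measurably faster than join/split/uniq on your data is something to benchmark rather than assume.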

Jordan Running answered Jan 20 '23 05:01