I have an Array of String objects in Ruby which are made up of words like the one below:
animals = ["cat horse", "dog", "cat dog bird", "dog sheep", "chicken cow"]
I want to convert this to another Array of String objects, but with only one animal per element and only unique elements. I found one way to do this as follows:
class Array
  def process
    join(" ").split.uniq
  end
end
However, if the input array is huge, say millions of entries, performance will be pretty bad because I'll be creating a huge string, then a huge array, and then uniq has to process that huge array to remove duplicate elements. One way I was thinking of speeding things up was to create a Hash with an entry for each word, so that I'd only process each word once on the first pass. Is there a better way?
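For illustration, the Hash idea described above might look roughly like the following. This is only a sketch of the approach in the question; the variable name `seen` is made up for the example.

```ruby
animals = ["cat horse", "dog", "cat dog bird", "dog sheep", "chicken cow"]

# Use the Hash keys as the collection of words seen so far;
# assigning an existing key again does not create a duplicate.
seen = {}
animals.each do |str|
  str.split.each { |word| seen[word] = true }
end

# The keys are the unique words, in first-seen order.
unique_animals = seen.keys
```

This avoids building one giant joined string, since each input string is split and recorded on its own.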
You've got the right idea. However, Ruby has a built-in class that's perfect for building sets of unique items: Set.
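require 'set' # needed on Ruby versions that do not load Set automatically

animals = ["cat horse", "dog", "cat dog bird", "dog sheep", "chicken cow"]
unique_animals = Set.new
animals.each do |str|
  # split each string into words and add them to the set
  unique_animals.merge(str.split)
end
# unique_animals now contains:
#   cat
#   horse
#   dog
#   bird
#   sheep
#   chicken
#   cow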
Or...
unique_animals = animals.reduce(Set.new) do |set, str|
  set.merge(str.split)
end
Under the covers, Set actually uses a Hash to store its items, but it acts more like an unordered Array and responds to all of the familiar Enumerable methods (each, map, select, etc.). If you need to turn it into a real Array, though, just use Set#to_a.
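As a small sketch of the point above, here are a couple of Enumerable methods called directly on the Set, followed by a conversion with Set#to_a. The variable names `short_names`, `upcased`, and `animal_list` are made up for the example.

```ruby
require 'set'

animals = ["cat horse", "dog", "cat dog bird", "dog sheep", "chicken cow"]
unique_animals = animals.reduce(Set.new) { |set, str| set.merge(str.split) }

# Enumerable methods can be called directly on the Set.
short_names = unique_animals.select { |name| name.length == 3 }
upcased     = unique_animals.map(&:upcase)

# Convert to a plain Array when one is needed.
animal_list = unique_animals.to_a
```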