
How to merge array of hashes to get hash of arrays of values

This is the opposite of "Turning a Hash of Arrays into an Array of Hashes in Ruby".

Elegantly and/or efficiently turn an array of hashes into a hash where the values are arrays of all values:

    hs = [
      { a:1, b:2 },
      { a:3, c:4 },
      { b:5, d:6 }
    ]
    collect_values( hs )
    #=> { :a=>[1,3], :b=>[2,5], :c=>[4], :d=>[6] }

This terse code almost works, but fails to create an array when there are no duplicates:

    def collect_values( hashes )
      hashes.inject({}){ |a,b| a.merge(b){ |_,x,y| [*x,*y] } }
    end
    collect_values( hs )
    #=> { :a=>[1,3], :b=>[2,5], :c=>4, :d=>6 }

This code works, but can you write a better version?

    def collect_values( hashes )
      # Requires Ruby 1.8.7+ for Object#tap
      Hash.new{ |h,k| h[k]=[] }.tap do |result|
        hashes.each{ |h| h.each{ |k,v| result[k]<<v } }
      end
    end

Solutions that only work in Ruby 1.9 are acceptable, but should be noted as such.
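The terse `merge` version above leaves `:c` and `:d` unwrapped because `Hash#merge` only calls its block for keys present in both hashes. The following is a minimal sketch of that behavior, plus one possible workaround that wraps every value before merging. The method name `collect_values_wrapped` is hypothetical and is not one of the benchmarked answers.

```ruby
# Hash#merge only calls its block for keys present in BOTH hashes,
# so a key that appears once keeps its bare (non-array) value.
merged = { a: 1 }.merge({ b: 2 }) { |_key, x, y| [*x, *y] }
# merged still holds bare values here because the block never ran.

# Hypothetical workaround (not from the question): wrap each value in a
# one-element array before merging, then concatenate on key collisions.
def collect_values_wrapped(hashes)
  hashes.inject({}) do |acc, h|
    wrapped = {}
    h.each { |k, v| wrapped[k] = [v] }
    acc.merge(wrapped) { |_key, x, y| x + y }
  end
end

hs = [{ a: 1, b: 2 }, { a: 3, c: 4 }, { b: 5, d: 6 }]
result = collect_values_wrapped(hs)
```

Unlike the `[*x,*y]` splat, concatenating pre-wrapped arrays also keeps values that are themselves arrays intact, at the cost of building a temporary hash per input hash.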


Here are the results of benchmarking the various answers below (and a few more of my own), using three different arrays of hashes:

  • one where each hash has distinct keys, so no merging ever occurs:
    [{:a=>1}, {:b=>2}, {:c=>3}, {:d=>4}, {:e=>5}, {:f=>6}, {:g=>7}, ...]

  • one where every hash has the same key, so maximum merging occurs:
    [{:a=>1}, {:a=>2}, {:a=>3}, {:a=>4}, {:a=>5}, {:a=>6}, {:a=>7}, ...]

  • and one that is a mix of unique and shared keys:
    [{:c=>1}, {:d=>1}, {:c=>2}, {:f=>1}, {:c=>1, :d=>1}, {:h=>1}, {:c=>3}, ...]

                    user     system      total        real
    Phrogz 2a   0.577000   0.000000   0.577000 (  0.576000)
    Phrogz 2b   0.624000   0.000000   0.624000 (  0.620000)
    Glenn 1     0.640000   0.000000   0.640000 (  0.641000)
    Phrogz 1    0.671000   0.000000   0.671000 (  0.668000)
    Michael 1   0.702000   0.000000   0.702000 (  0.700000)
    Michael 2   0.717000   0.000000   0.717000 (  0.726000)
    Glenn 2     0.765000   0.000000   0.765000 (  0.764000)
    fl00r       0.827000   0.000000   0.827000 (  0.836000)
    sawa        0.874000   0.000000   0.874000 (  0.868000)
    Tokland 1   0.873000   0.000000   0.873000 (  0.876000)
    Tokland 2   1.077000   0.000000   1.077000 (  1.073000)
    Phrogz 3    2.106000   0.093000   2.199000 (  2.209000)

The fastest code is this method that I added:

    def collect_values(hashes)
      {}.tap{ |r| hashes.each{ |h| h.each{ |k,v| (r[k]||=[]) << v } } }
    end

I've accepted glenn mcdonald's answer because it was competitive in terms of speed and reasonably terse, but most importantly because it pointed out the danger of using a Hash with a self-modifying default proc for convenient construction: the proc stays attached to the returned hash, so any later lookup of a missing key silently adds that key to the hash.
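The danger is easy to reproduce. The sketch below is illustrative only (the variable names are made up for this example) and contrasts a hash built with a self-modifying default proc against one built with `||=`:

```ruby
# Built with a self-modifying default proc, as in "Phrogz 1".
with_default = Hash.new { |h, k| h[k] = [] }
with_default[:a] << 1

# A caller who merely checks for a missing key changes the hash:
with_default[:missing]          # returns [] and stores it under :missing
keys_after_lookup = with_default.keys

# Built with ||=, as in "Phrogz 2a" and "Glenn 1": a plain hash.
plain = {}
(plain[:a] ||= []) << 1
lookup = plain[:missing]        # returns nil and stores nothing
plain_keys = plain.keys
```

After the lookups, `with_default` has gained a `:missing` key while `plain` still contains only `:a`.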

Finally, here's the benchmark code, in case you want to run your own comparisons:

    require 'prime'   # To generate the third hash
    require 'facets'  # For tokland1's map_by
    AZSYMBOLS = (:a..:z).to_a
    TESTS = {
      '26 Distinct Hashes'   => AZSYMBOLS.zip(1..26).map{|a| Hash[*a] },
      '26 Same-Key Hashes'   => ([:a]*26).zip(1..26).map{|a| Hash[*a] },
      '26 Mixed-Keys Hashes' => (2..27).map do |i|
        factors = i.prime_division.transpose
        Hash[AZSYMBOLS.values_at(*factors.first).zip(factors.last)]
      end
    }

    def phrogz1(hashes)
      Hash.new{ |h,k| h[k]=[] }.tap do |result|
        hashes.each{ |h| h.each{ |k,v| result[k]<<v } }
      end
    end
    def phrogz2a(hashes)
      {}.tap{ |r| hashes.each{ |h| h.each{ |k,v| (r[k]||=[]) << v } } }
    end
    def phrogz2b(hashes)
      hashes.each_with_object({}){ |h,r| h.each{ |k,v| (r[k]||=[]) << v } }
    end
    def phrogz3(hashes)
      result = hashes.inject({}){ |a,b| a.merge(b){ |_,x,y| [*x,*y] } }
      result.each{ |k,v| result[k] = [v] unless v.is_a? Array }
    end
    def glenn1(hs)
      hs.reduce({}) {|h,pairs| pairs.each {|k,v| (h[k] ||= []) << v}; h}
    end
    def glenn2(hs)
      hs.map(&:to_a).flatten(1).reduce({}) {|h,(k,v)| (h[k] ||= []) << v; h}
    end
    def fl00r(hs)
      h = Hash.new{|h,k| h[k]=[]}
      hs.map(&:to_a).flatten(1).each{|v| h[v[0]] << v[1]}
      h
    end
    def sawa(a)
      a.map(&:to_a).flatten(1).group_by{|k,v| k}.each_value{|v| v.map!{|k,v| v}}
    end
    def michael1(hashes)
      h = Hash.new{|h,k| h[k]=[]}
      hashes.each_with_object(h) do |h, result|
        h.each{ |k, v| result[k] << v }
      end
    end
    def michael2(hashes)
      h = Hash.new{|h,k| h[k]=[]}
      hashes.inject(h) do |result, h|
        h.each{ |k, v| result[k] << v }
        result
      end
    end
    def tokland1(hs)
      hs.map(&:to_a).flatten(1).map_by{ |k, v| [k, v] }
    end
    def tokland2(hs)
      Hash[hs.map(&:to_a).flatten(1).group_by(&:first).map{ |k, vs|
        [k, vs.map{|o|o[1]}]
      }]
    end

    require 'benchmark'
    N = 10_000
    Benchmark.bm do |x|
      x.report('Phrogz 2a'){ TESTS.each{ |n,h| N.times{ phrogz2a(h) } } }
      x.report('Phrogz 2b'){ TESTS.each{ |n,h| N.times{ phrogz2b(h) } } }
      x.report('Glenn 1  '){ TESTS.each{ |n,h| N.times{ glenn1(h)   } } }
      x.report('Phrogz 1 '){ TESTS.each{ |n,h| N.times{ phrogz1(h)  } } }
      x.report('Michael 1'){ TESTS.each{ |n,h| N.times{ michael1(h) } } }
      x.report('Michael 2'){ TESTS.each{ |n,h| N.times{ michael2(h) } } }
      x.report('Glenn 2  '){ TESTS.each{ |n,h| N.times{ glenn2(h)   } } }
      x.report('fl00r    '){ TESTS.each{ |n,h| N.times{ fl00r(h)    } } }
      x.report('sawa     '){ TESTS.each{ |n,h| N.times{ sawa(h)     } } }
      x.report('Tokland 1'){ TESTS.each{ |n,h| N.times{ tokland1(h) } } }
      x.report('Tokland 2'){ TESTS.each{ |n,h| N.times{ tokland2(h) } } }
      x.report('Phrogz 3 '){ TESTS.each{ |n,h| N.times{ phrogz3(h)  } } }
    end
asked Mar 30 '11 19:03 by Phrogz



1 Answer

Take your pick:

    hs.reduce({}) {|h,pairs| pairs.each {|k,v| (h[k] ||= []) << v}; h}

    hs.map(&:to_a).flatten(1).reduce({}) {|h,(k,v)| (h[k] ||= []) << v; h}

I'm strongly against messing with the defaults for hashes, as the other suggestions do, because then checking for a value modifies the hash, which seems very wrong to me.
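As a quick check, here is a sketch that runs both expressions against the `hs` array from the question. The variable names `by_pairs`, `by_flatten`, and `expected` are only for illustration.

```ruby
hs = [{ a: 1, b: 2 }, { a: 3, c: 4 }, { b: 5, d: 6 }]

# First form: walk each hash's pairs directly.
by_pairs = hs.reduce({}) { |h, pairs| pairs.each { |k, v| (h[k] ||= []) << v }; h }

# Second form: flatten everything into one list of [key, value] pairs first.
by_flatten = hs.map(&:to_a).flatten(1).reduce({}) { |h, (k, v)| (h[k] ||= []) << v; h }

expected = { a: [1, 3], b: [2, 5], c: [4], d: [6] }
same = (by_pairs == expected) && (by_flatten == expected)
```

Both forms return an ordinary hash with no default proc, so the result behaves like any other hash when handed to callers.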

answered Oct 10 '22 16:10 by glenn mcdonald