I have a file with numbers on each line:
0101
1010
1311
0101
1311
431
1010
431
420
I want to have a hash with the number of occurrences of each number; in this case:
{0101 => 2, 1010 => 2, 1311 => 2, 431 => 2, 420 => 1}
How can I do this?
Simple one-liner, given an array items:
items.inject(Hash.new(0)) {|hash, item| hash[item] += 1; hash}
How it works:
Hash.new(0) creates a new Hash where accessing undefined keys returns 0.
inject(foo) iterates through an array with the given block. For the first iteration, it passes foo, and on further iterations, it passes the return value of the last iteration.
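To make the two pieces above concrete, here is a small sketch. The items array is just the sample data from the question written out as strings, and default_demo and missing are names made up for illustration.

```ruby
# Sample data from the question, kept as strings (an assumption for illustration).
items = %w[0101 1010 1311 0101 1311 431 1010 431 420]

# Hash.new(0) returns 0 for keys that have not been stored yet,
# so `+= 1` works without checking whether the key exists first.
default_demo = Hash.new(0)
missing = default_demo["nope"]   # reads the default; the key is not stored

# inject passes the accumulator as the first block argument and uses the
# block's return value as the accumulator for the next element.
counts = items.inject(Hash.new(0)) do |hash, item|
  hash[item] += 1
  hash   # return the hash; `hash[item] += 1` alone would return an Integer
end
```

The trailing hash in the block is what the `; hash` in the one-liner does: without it, the second iteration would receive an Integer instead of the Hash.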
Another way to write it would be:
hash = Hash.new(0)
items.each {|item| hash[item] += 1}
This is essentially the same as Chuck's answer, but when you are building an array or hash, each_with_object is slightly simpler than inject, because you do not have to return the array or hash at the end of the block.
items.each_with_object(Hash.new(0)) {|item, hash| hash[item] += 1}
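Since the question starts from a file rather than an array, here is a rough sketch of the same each_with_object approach reading lines directly. The temporary file is created only so the example runs on its own; in practice you would pass the path of your real file.

```ruby
require "tempfile"

# Hypothetical input file, written here only to keep the example self-contained.
file = Tempfile.new("numbers")
file.write("0101\n1010\n1311\n0101\n1311\n431\n1010\n431\n420\n")
file.close

# File.foreach without a block returns an Enumerator over the lines.
# Each line is counted as a String, with its trailing newline removed.
counts = File.foreach(file.path).each_with_object(Hash.new(0)) do |line, hash|
  hash[line.chomp] += 1
end

file.unlink   # remove the temporary file
```

Keeping the keys as strings preserves the leading zero in "0101". Converting with to_i would turn it into 101.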
ID = -> x { x } # Why is the identity function not in the core lib?
f = <<-HERE
0101
1010
1311
0101
1311
431
1010
431
420
HERE
Hash[f.lines.map(&:to_i).group_by(&ID).map {|n, ns| [n, ns.size] }]
# { 101 => 2, 1010 => 2, 1311 => 2, 431 => 2, 420 => 1 }
You simply group the numbers by themselves using Enumerable#group_by, which gives you something like
{ 101 => [101, 101], 420 => [420] }
And then you Enumerable#map the value arrays to their lengths, i.e. [101, 101] becomes 2. Then just convert it back to a Hash using Hash::[].
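The same pipeline, split into its three steps so the intermediate values are visible. This is only a sketch: the numbers array holds the integers that f.lines.map(&:to_i) would produce, and groups, pairs and counts are illustrative names.

```ruby
# The parsed integers from the sample data (leading zeros are gone after to_i).
numbers = [101, 1010, 1311, 101, 1311, 431, 1010, 431, 420]

# Step 1: group each number under itself as the key.
groups = numbers.group_by { |n| n }

# Step 2: replace each array of duplicates with its length.
pairs = groups.map { |n, ns| [n, ns.size] }

# Step 3: turn the array of [key, count] pairs back into a Hash.
counts = Hash[pairs]
```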
However, if you are willing to use a third-party library, it becomes even more trivial, because if you use a MultiSet data structure, the answer falls out naturally. (A MultiSet is like a Set, except that an item can be added multiple times and the MultiSet will keep count of how often an item was added, which is exactly what you want.)
require 'multiset' # Google for it, it's so old that it isn't available as a Gem
Multiset[*f.lines.map(&:to_i)]
# => #<Multiset:#2 101, #2 1010, #2 1311, #2 431, #1 420>
Yes, that's it.
That's the beautiful thing about using the right data-structure: your algorithms become massively simpler. Or, in this particular case, the algorithm just vanishes.
I've written more about using MultiSets for solving this exact problem elsewhere, including a group_by example.