Hey I have an array where each element is a hash containing a few values and a count.
result = [
{"count" => 3,"name" => "user1"},
{"count" => 10,"name" => "user2"},
{"count" => 10, "user3"},
{"count" => 2, "user4"}
]
I can sort the array by count as follows:
result = result.sort_by do |r|
r["count"]
end
Now I want to be able to retrieve the top n entries based on count (not just first(n)) Is there an elegant way to do this? So as an example, let n = 1 I would expect a result set of.
[{"count" => 10,"name" => "user2"}, {"count" => 10, "user3"}]
since I asked for all entries with the highest score.. if I asked for top 2 highest scores I'd get
[{"count" => 10,"name" => "user2"}, {"count" => 10, "user3"}, {"count" => 3, "user1"}]
new_result = result.
sort_by { |r| -r["count"] }.
chunk { |r| r["count"] }.
take(2).
flat_map(&:last)
#=> [{"count"=>10, "name"=>"user3"},
# {"count"=>10, "name"=>"user2"},
# {"count"=> 3 "name"=>"user1"}]
Enumerable#group_by
to the rescue (as usual):
result.group_by { |r| r["count"] }
.sort_by { |k, v| -k }
.first(2)
.map(&:last)
.flatten
Most of the work is done by the group_by
. The sort_by
simply lines things up so that first(2)
will pick off the groups you want. Then map
with last
will extract the count/name hashes that you started with and the final flatten
will clean up the extra left over arrays.
This solution is not elegant in terms of being concise, but it has better time complexity. In other words, it should execute a lot faster for a very large number of hashes.
You will need to install the "algorithms" gem in order to use the Heap data structure:
Heaps are an efficient data structure when you need to find the largest or smallest elements in a group. This particular type of heap is optimal if the value of "n" is much smaller than the total number of pairs.
require 'algorithms'
def take_highest(result,n)
max_heap = Containers::Heap.new(result){|x,y| (x["count"] <=> y["count"]) == 1}
last = max_heap.pop
count = 0
highest = [last]
loop do
top = max_heap.pop
break if top.nil?
count += (top["count"] == last["count"] ? 0 : 1)
break if count == n
highest << top
last = top
end
highest
end
Starting in Ruby 2.2.0, max_by
takes an extra argument that lets you ask for a certain number of top elements instead of just getting one. Using this, we can improve on mu is too short's answer
result = [
{count: 3, name: 'user1'},
{count: 10, name: 'user2'},
{count: 10, name: 'user3'},
{count: 2, name: 'user4'}
]
p result.group_by { |r| r[:count] }
.max_by(2, &:first)
.flat_map(&:last)
.sort_by { |r| -r[:count] }
# => [{:count=>10, :name=>"user2"}, {:count=>10, :name=>"user3"}, {:count=>3, :name=>"user1"}]
The docs don't say if the array returned by max_by
is sorted. If that turns out to be true though, we could just use reverse
in the last step instead of sorting.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With