Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get top n elements from ruby array of hash values

Hey I have an array where each element is a hash containing a few values and a count.

result = [
           {"count" => 3,"name" => "user1"}, 
           {"count" => 10,"name" => "user2"}, 
           {"count" => 10, "user3"},
           {"count" => 2, "user4"}
         ]

I can sort the array by count as follows:

result = result.sort_by do |r|
  r["count"]
end

Now I want to be able to retrieve the top n entries based on count (not just first(n)) Is there an elegant way to do this? So as an example, let n = 1 I would expect a result set of.

[{"count" => 10,"name" => "user2"}, {"count" => 10, "user3"}]

since I asked for all entries with the highest score.. if I asked for top 2 highest scores I'd get

 [{"count" => 10,"name" => "user2"}, {"count" => 10, "user3"}, {"count" => 3, "user1"}]
like image 212
Emmanuel P Avatar asked Jun 19 '12 05:06

Emmanuel P


4 Answers

new_result = result.
  sort_by { |r| -r["count"] }.
  chunk { |r| r["count"] }.
  take(2).
  flat_map(&:last)

#=> [{"count"=>10, "name"=>"user3"}, 
#    {"count"=>10, "name"=>"user2"}, 
#    {"count"=> 3  "name"=>"user1"}]
like image 147
tokland Avatar answered Oct 10 '22 16:10

tokland


Enumerable#group_by to the rescue (as usual):

result.group_by { |r| r["count"] }
      .sort_by  { |k, v| -k }
      .first(2)
      .map(&:last)
      .flatten

Most of the work is done by the group_by. The sort_by simply lines things up so that first(2) will pick off the groups you want. Then map with last will extract the count/name hashes that you started with and the final flatten will clean up the extra left over arrays.

like image 39
mu is too short Avatar answered Oct 10 '22 16:10

mu is too short


This solution is not elegant in terms of being concise, but it has better time complexity. In other words, it should execute a lot faster for a very large number of hashes.

You will need to install the "algorithms" gem in order to use the Heap data structure:

Heaps are an efficient data structure when you need to find the largest or smallest elements in a group. This particular type of heap is optimal if the value of "n" is much smaller than the total number of pairs.

require 'algorithms'
def take_highest(result,n)
  max_heap = Containers::Heap.new(result){|x,y| (x["count"] <=> y["count"]) == 1}
  last = max_heap.pop
  count = 0
  highest = [last]
  loop do   
    top = max_heap.pop
    break if top.nil?
    count += (top["count"] == last["count"] ? 0 : 1)
    break if count == n
    highest << top
    last = top
  end
  highest
end
like image 2
Fab Avatar answered Oct 10 '22 16:10

Fab


Starting in Ruby 2.2.0, max_by takes an extra argument that lets you ask for a certain number of top elements instead of just getting one. Using this, we can improve on mu is too short's answer

result = [
           {count: 3, name: 'user1'},
           {count: 10, name: 'user2'},
           {count: 10, name: 'user3'},
           {count: 2, name: 'user4'}
         ]
p result.group_by { |r| r[:count] }
      .max_by(2, &:first)
      .flat_map(&:last)
      .sort_by { |r| -r[:count] }
# => [{:count=>10, :name=>"user2"}, {:count=>10, :name=>"user3"}, {:count=>3, :name=>"user1"}]

The docs don't say if the array returned by max_by is sorted. If that turns out to be true though, we could just use reverse in the last step instead of sorting.

like image 1
David Grayson Avatar answered Oct 10 '22 16:10

David Grayson