
Group Users by Age Range in Ruby

I'm trying to list the number of users by age-range:

Range  : #Users
10-14  : 16
15-21  : 120
22-29  : 312
30-40  : 12131
41-70  : 612
71-120 : 20

I was thinking of creating a static array of hashes:

AGE_RANGES = [
  {label:"10 - 14", min:10, max:14},
  {label:"15 - 21", min:15, max:21},
  {label:"22 - 29", min:22, max:29},
  {label:"30 - 40", min:30, max:40},
  {label:"41 - 70", min:41, max:70},
  {label:"71 - 120", min:71, max:120}
]

and then use it for my search filter as well as for my query. However, I cannot think of a way to get the best performance out of it.

My method in my model only groups by age:

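def self.group_by_ageRange(minAge, maxAge)
  # Bind the bounds as query parameters; a bare "minAge" inside the SQL
  # string would reach the database as a literal column name.
  User.where("users.age BETWEEN ? AND ?", minAge, maxAge)
      .group("users.age")
      .select("users.age, count(*) as number_of_users")
end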

Any suggestions?

MrWater asked Aug 23 '12 18:08


2 Answers

You want to build some SQL that looks like this:

select count(*),
       case
           when age between 10 and 14 then '10 - 14'
           when age between 15 and 21 then '15 - 21'
           -- ...
       end as age_range
from users
where age between 10 and 120
group by age_range

In ActiveRecord terms, that would be:

# First build the big ugly CASE, we can also figure out the
# overall max and min ages along the way.
min   = nil
max   = nil
cases = AGE_RANGES.map do |r|
    min = [r[:min], min || r[:min]].min
    max = [r[:max], max || r[:max]].max
    "when age between #{r[:min]} and #{r[:max]} then '#{r[:min]} - #{r[:max]}'"
end

# Then away we go...
# to_a runs the query and returns an Array on any Rails version.
age_ranges = User.select("count(*) as n, case #{cases.join(' ')} end as age_range")
                 .where(:age => min .. max)
                 .group('age_range')
                 .to_a

That will leave you with an array of objects in age_ranges and those objects will have n and age_range methods. If you want a Hash out of that, then:

age_ranges = Hash[age_ranges.map { |r| [r.age_range, r.n] }]

That won't include ranges that don't have any people in them of course; I'll leave that as an exercise for the reader.
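One possible take on that exercise, as a minimal sketch: walk AGE_RANGES and default any missing label to 0. The helper name with_empty_ranges is hypothetical, and the sample hash stands in for the Hash built above. It assumes the labels produced by the CASE expression match the :label values in AGE_RANGES, which holds for the constant in the question.

```ruby
# AGE_RANGES mirrors the constant from the question.
AGE_RANGES = [
  { label: "10 - 14",  min: 10, max: 14 },
  { label: "15 - 21",  min: 15, max: 21 },
  { label: "22 - 29",  min: 22, max: 29 },
  { label: "30 - 40",  min: 30, max: 40 },
  { label: "41 - 70",  min: 41, max: 70 },
  { label: "71 - 120", min: 71, max: 120 }
].freeze

# Hypothetical helper (not part of the original answer): returns a Hash
# with one entry per range, in AGE_RANGES order, using 0 for any label
# that is absent from the query result.
def with_empty_ranges(counts, ranges = AGE_RANGES)
  ranges.each_with_object({}) do |range, filled|
    filled[range[:label]] = counts.fetch(range[:label], 0)
  end
end

# Sample data standing in for the Hash built from the query result.
sample = { "15 - 21" => 120, "30 - 40" => 12131 }
p with_empty_ranges(sample)
```

Because Ruby hashes keep insertion order, the result also comes back sorted by range rather than in whatever order the database returned the groups.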

mu is too short answered Nov 13 '22 12:11


I find the accepted answer a bit dense: fast, but hard to understand and write. Today I came up with a slower but simpler solution. Since we are grouping ages into ranges, we can assume that we won't have values over 125.

That means that if you filter a grouped and counted result set in Ruby, you won't iterate over more than 125 items. This will be slower than a SQL range-based group/count, but it was fast enough for my purposes while still relying on the DB for most of the heavy lifting. Iterating over a hash with fewer than 125 items doesn't seem like a big deal, especially when the key-value pairs are just ints like this:

{
  0 => 0,
  1 => 1,
  3 => 5,
  25 => 3,
  99 => 3
}

Here's the pseudocode:

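# min and max are the overall age bounds for the report.
# group(:age).count returns a Hash of age => number of users.
counts_by_age = User
  .where(age: (min..max))
  .group(:age)
  .count(:age)
group = Hash.new(0)
counts_by_age.each do |age, count|
  case
  when age <= 10
    group['10 and under'] += count
  when age <= 25
    group['11-25'] += count
  when age <= 40
    group['26-40'] += count
  else
    group['41+'] += count
  end
end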

Note: this solution provides the count of users in a given range.
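The hard-coded case statement above could also be driven by the AGE_RANGES table from the question, so the labels and bounds live in one place. This is a minimal sketch under that assumption; bucket_counts is a hypothetical helper name, and the sample hash stands in for the age => count result of the grouped query.

```ruby
# AGE_RANGES mirrors the constant from the question.
AGE_RANGES = [
  { label: "10 - 14",  min: 10, max: 14 },
  { label: "15 - 21",  min: 15, max: 21 },
  { label: "22 - 29",  min: 22, max: 29 },
  { label: "30 - 40",  min: 30, max: 40 },
  { label: "41 - 70",  min: 41, max: 70 },
  { label: "71 - 120", min: 71, max: 120 }
].freeze

# Hypothetical helper: sums per-age counts into the labelled ranges.
# counts_by_age is a Hash of age => count, the shape returned by a
# grouped count such as User.group(:age).count.
def bucket_counts(counts_by_age, ranges = AGE_RANGES)
  # Start every range at 0 so empty ranges still appear in the result.
  totals = ranges.map { |r| [r[:label], 0] }.to_h
  counts_by_age.each do |age, count|
    next if age.nil? # rows with no age recorded
    range = ranges.find { |r| age.between?(r[:min], r[:max]) }
    totals[range[:label]] += count if range # ages outside every range are skipped
  end
  totals
end

# Sample data standing in for the grouped query result.
sample = { 5 => 2, 12 => 4, 13 => 1, 25 => 3, 99 => 3 }
p bucket_counts(sample)
```

With this sample, "10 - 14" totals 5, "22 - 29" and "71 - 120" each total 3, the remaining ranges stay at 0, and age 5 is skipped because no range covers it.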

Jared Menard answered Nov 13 '22 11:11