SQL query for grouping monthly period ranges

Tags: sql, postgresql

I'm having some trouble building a query that groups my items into ranges of consecutive months, based on whether or not each item exists in a given month. I'm using PostgreSQL.

For example, I have a table with data like this:

Name    Period(text)
Ana     2010/09
Ana     2010/10
Ana     2010/11
Ana     2010/12
Ana     2011/01
Ana     2011/02
Peter   2009/05
Peter   2009/06
Peter   2009/07
Peter   2009/08
Peter   2009/12
Peter   2010/01
Peter   2010/02
Peter   2010/03
John    2009/05
John    2009/06
John    2009/09
John    2009/11
John    2009/12

and I want the query to return this:

Name    Start     End
Ana     2010/09   2011/02
Peter   2009/05   2009/08
Peter   2009/12   2010/03
John    2009/05   2009/06
John    2009/09   2009/09
John    2009/11   2009/12

Is there any way to achieve this?

fdr asked Jan 08 '15 19:01


2 Answers

This is an aggregation problem, but with a twist: you need to define the groups of adjacent months for each name.

Assuming that a month never appears more than once for a given name, you can do this by converting each period to a month number (year * 12 + month) and subtracting a sequential row number per name. The difference stays constant for months that are consecutive.

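select name, min(period) as start_period, max(period) as end_period
from (select t.*,
             -- month number minus a per-name sequence number:
             -- constant within each run of consecutive months
             (cast(left(period, 4) as int) * 12 + cast(right(period, 2) as int) -
              row_number() over (partition by name order by period)
             ) as grp
      from names t
     ) t
group by name, grp
order by name, start_period;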

Here is a SQL Fiddle illustrating this.
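To see why the subtraction works, here is a minimal Python sketch of the same logic outside the database, run on the question's sample data. It is an illustration, not part of the original answer: the names `QUESTION_ROWS`, `month_number` and `month_ranges` are hypothetical, and like the query it assumes each month appears at most once per name.

```python
from collections import defaultdict

# Sample rows from the question: (name, period as 'YYYY/MM' text)
QUESTION_ROWS = [
    ("Ana", "2010/09"), ("Ana", "2010/10"), ("Ana", "2010/11"),
    ("Ana", "2010/12"), ("Ana", "2011/01"), ("Ana", "2011/02"),
    ("Peter", "2009/05"), ("Peter", "2009/06"), ("Peter", "2009/07"),
    ("Peter", "2009/08"), ("Peter", "2009/12"), ("Peter", "2010/01"),
    ("Peter", "2010/02"), ("Peter", "2010/03"),
    ("John", "2009/05"), ("John", "2009/06"), ("John", "2009/09"),
    ("John", "2009/11"), ("John", "2009/12"),
]


def month_number(period: str) -> int:
    """Hypothetical helper: 'YYYY/MM' -> year * 12 + month, as in the query."""
    year, month = period.split("/")
    return int(year) * 12 + int(month)


def month_ranges(rows):
    """Return (name, start, end) for each run of consecutive months.

    Assumes no name has the same month twice (the query assumes this too).
    """
    by_name = defaultdict(list)
    for name, period in rows:
        by_name[name].append(period)

    groups = defaultdict(list)  # (name, grp) -> periods, like GROUP BY name, grp
    for name, periods in by_name.items():
        # enumerate(..., start=1) plays the role of
        # row_number() over (partition by name order by period)
        for rn, period in enumerate(sorted(periods), start=1):
            groups[(name, month_number(period) - rn)].append(period)

    # min/max per group, sorted by name and then start period
    return sorted((name, min(ps), max(ps)) for (name, _), ps in groups.items())


if __name__ == "__main__":
    for name, start, end in month_ranges(QUESTION_ROWS):
        print(name, start, end)
```

For Peter the key is 24112 for 2009/05 through 2009/08 (for example 2009 * 12 + 5 - 1) and 24115 for 2009/12 through 2010/03 (2009 * 12 + 12 - 5), which is why his rows fall into two groups. The output is sorted by name, so it lists the same six ranges as the question, in a different order.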

Note: duplicates are not really a problem either. You would just use dense_rank() instead of row_number(), so that repeated months of the same name get the same number.
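A small Python sketch of why the ranking function matters once a month is repeated. This is an illustration, not the answerer's code: `month_number` and `ranges_for_one_name` are hypothetical names, and the loop only emulates row_number() and dense_rank() for a single name.

```python
def month_number(period: str) -> int:
    """Hypothetical helper: 'YYYY/MM' -> year * 12 + month."""
    year, month = period.split("/")
    return int(year) * 12 + int(month)


def ranges_for_one_name(periods, dense=True):
    """(start, end) per run of consecutive months for ONE name.

    dense=True numbers rows like dense_rank(): equal periods share a number.
    dense=False numbers rows like row_number(): 1, 2, 3, ... even for ties.
    """
    ordered = sorted(periods)
    groups = {}  # month_number - rank -> periods, like GROUP BY grp
    rank = 0
    for i, period in enumerate(ordered):
        if not dense:
            rank = i + 1
        elif i == 0 or period != ordered[i - 1]:
            rank += 1
        groups.setdefault(month_number(period) - rank, []).append(period)
    return sorted((min(ps), max(ps)) for ps in groups.values())


if __name__ == "__main__":
    with_duplicate = ["2009/05", "2009/05", "2009/06", "2009/09"]
    print(ranges_for_one_name(with_duplicate, dense=True))
    print(ranges_for_one_name(with_duplicate, dense=False))
```

With dense=True the duplicated 2009/05 stays in the 2009/05 to 2009/06 run. With dense=False the second 2009/05 shifts the key down by one, so the first 2009/05 ends up in a range of its own.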

Gordon Linoff answered Nov 15 '22 18:11


I don't know if there is an easier way (there probably is), but I can't think of one right now:

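with parts as (
  -- convert the 'yyyy/mm' text into a date (first day of the month)
  select name, 
         to_date(replace(period,'/',''), 'yyyymm') as period
  from names
), flagged as (
  -- group_flag = 1 when the previous row of the same name is not exactly one month earlier;
  -- the lag() default makes the first row of each name count as "no gap" (null)
  select name, 
         period, 
         case 
           when lag(period,1, (period - interval '1' month)::date) over (partition by name order by period) = (period - interval '1' month)::date then null
           else 1
         end as group_flag
  from parts
), grouped as (
  -- running sum of the flags: constant within each run of consecutive months
  select flagged.*, 
         coalesce(sum(group_flag) over (partition by name order by period),0) as group_nr
  from flagged
)
select name, min(period), max(period)
from grouped
group by name, group_nr
order by name, min(period);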

The first common table expression (parts) simply converts the period into a date so that it can be used in date arithmetic.

The second CTE (flagged) sets a flag each time the gap (in months) between the current row and the previous one is not exactly one.

The third CTE (grouped) then takes a running sum of those flags, which gives each run of consecutive months its own group number within a name.

The final select then simply gets the start and end period for each group. I didn't bother to convert the period back to the original format though.

SQLFiddle example that also shows the intermediate result of the flagged CTE:
http://sqlfiddle.com/#!15/8c0aa/2
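The flag-and-running-sum steps can be traced in a short Python sketch. It is an illustration under assumptions, not the answerer's code: `to_month_date`, `previous_month` and `ranges_by_flags` are hypothetical names, it handles a single name at a time, and unlike the SQL it converts the results back to 'YYYY/MM'.

```python
from datetime import date, datetime


def to_month_date(period: str) -> date:
    """'YYYY/MM' -> first day of that month (like the parts CTE)."""
    return datetime.strptime(period, "%Y/%m").date()


def previous_month(d: date) -> date:
    """First day of the month before d (d is assumed to be a first-of-month date)."""
    if d.month == 1:
        return date(d.year - 1, 12, 1)
    return date(d.year, d.month - 1, 1)


def ranges_by_flags(periods):
    """Runs of consecutive months for ONE name, as (start, end) 'YYYY/MM' pairs."""
    dates = sorted(to_month_date(p) for p in periods)

    groups = {}  # group_nr -> dates, like GROUP BY name, group_nr
    group_nr = 0
    for i, d in enumerate(dates):
        # lag(period, 1, <one month earlier>): the first row gets a default
        # that never counts as a gap
        prev = dates[i - 1] if i > 0 else previous_month(d)
        if prev != previous_month(d):
            group_nr += 1  # flagged sets group_flag = 1; grouped sums the flags
        groups.setdefault(group_nr, []).append(d)

    # Convert back to the original 'YYYY/MM' text format
    return [(min(ds).strftime("%Y/%m"), max(ds).strftime("%Y/%m"))
            for ds in groups.values()]


if __name__ == "__main__":
    peter = ["2009/05", "2009/06", "2009/07", "2009/08",
             "2009/12", "2010/01", "2010/02", "2010/03"]
    print(ranges_by_flags(peter))
```

In the SQL itself, the same back-conversion could be done in the final select with to_char(min(period), 'YYYY/MM') and to_char(max(period), 'YYYY/MM').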

a_horse_with_no_name answered Nov 15 '22 18:11