I'm having trouble building a query that groups my items into ranges of consecutive months, depending on whether or not they exist in a given month. I'm using PostgreSQL.
For example, I have a table with data like this:
Name Period(text)
Ana 2010/09
Ana 2010/10
Ana 2010/11
Ana 2010/12
Ana 2011/01
Ana 2011/02
Peter 2009/05
Peter 2009/06
Peter 2009/07
Peter 2009/08
Peter 2009/12
Peter 2010/01
Peter 2010/02
Peter 2010/03
John 2009/05
John 2009/06
John 2009/09
John 2009/11
John 2009/12
and I want the query result to be this:
Name Start End
Ana 2010/09 2011/02
Peter 2009/05 2009/08
Peter 2009/12 2010/03
John 2009/05 2009/06
John 2009/09 2009/09
John 2009/11 2009/12
Is there any way to achieve this?
This is an aggregation problem, but with a twist: you need to define the groups of adjacent months for each name.
Assuming that a month never appears more than once for a given name, you can do this by assigning a "month" number to each period and subtracting a sequential number. The difference is constant for months that are in a row.
select name, min(period), max(period)
from (select t.*,
             -- month number minus row number: constant within a run of consecutive months
             (cast(left(period, 4) as int) * 12 + cast(right(period, 2) as int) -
              row_number() over (partition by name order by period)
             ) as grp
      from names t
     ) t
group by grp, name;
Here is a SQL Fiddle illustrating this.
Note: duplicates are not really a problem either. You would just use dense_rank() instead of row_number().
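To see why the subtraction works, here is a minimal Python sketch (not part of the original answer) that reproduces the arithmetic outside the database for John's rows. The helper names `month_number` and `island_keys` are made up for illustration; `dense=False` is meant to mimic row_number() and `dense=True` to mimic dense_rank().

```python
def month_number(period):
    """Turn 'YYYY/MM' into a running month count (year * 12 + month), as in the query."""
    year, month = period.split("/")
    return int(year) * 12 + int(month)


def island_keys(periods, dense=False):
    """Return (period, grp) pairs for the periods of a single name.

    grp is the month number minus the row's rank in sorted order.
    dense=False numbers rows like row_number(), dense=True like dense_rank().
    """
    result = []
    rank = 0
    previous = None
    for period in sorted(periods):
        # row_number() always advances; dense_rank() only advances on a new value.
        if not dense or period != previous:
            rank += 1
        result.append((period, month_number(period) - rank))
        previous = period
    return result


john = ["2009/05", "2009/06", "2009/09", "2009/11", "2009/12"]
for period, grp in island_keys(john):
    print(period, grp)
```

For John, 2009/05 and 2009/06 share one grp value, 2009/09 gets its own, and 2009/11 and 2009/12 share a third, which is exactly what the outer group by relies on. If a month is duplicated, for example `["2009/05", "2009/05", "2009/06"]`, calling `island_keys(..., dense=True)` keeps all three rows in one group, while the default splits them.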
I don't know if there is an easier way (there probably is) but I can't think of one right now:
with parts as (
  select name,
         to_date(replace(period,'/',''), 'yyyymm') as period
  from names
), flagged as (
  select name,
         period,
         case
           when lag(period,1, (period - interval '1' month)::date) over (partition by name order by period) = (period - interval '1' month)::date then null
           else 1
         end as group_flag
  from parts
), grouped as (
  select flagged.*,
         coalesce(sum(group_flag) over (partition by name order by period),0) as group_nr
  from flagged
)
select name, min(period), max(period)
from grouped
group by name, group_nr
order by name, min(period);
The first common table expression (parts) simply converts the period into a date so that it can be used in an arithmetic expression.
The second CTE (flagged) assigns a flag each time the gap (in months) between the current row and the previous one is not exactly one.
The third CTE (grouped) then accumulates those flags to define a unique group number for each run of consecutive rows.
The final select then simply gets the start and end period for each group. I didn't bother to convert the period back to the original format though.
SQLFiddle example that also shows the intermediate result of the flagged CTE:
http://sqlfiddle.com/#!15/8c0aa/2
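The flag-and-accumulate idea can also be traced by hand. The following Python sketch is only an illustration under a few assumptions: it works on the rows of a single name, it uses a plain month index (year * 12 + month) instead of real dates, and the function names `to_month_index` and `month_ranges` are hypothetical.

```python
from itertools import groupby


def to_month_index(period):
    """Convert 'YYYY/MM' to an integer so that consecutive months differ by 1."""
    year, month = period.split("/")
    return int(year) * 12 + int(month)


def month_ranges(periods):
    """Return (start, end) pairs of consecutive-month runs for one name."""
    group_nr = 0
    previous = None
    numbered = []
    for period in sorted(periods):
        current = to_month_index(period)
        # Raise the flag when the gap to the previous row is not exactly one month.
        # The first row has no predecessor and stays in group 0, like the lag() default.
        if previous is not None and current - previous != 1:
            group_nr += 1  # running sum of the flags
        numbered.append((group_nr, period))
        previous = current

    ranges = []
    # Rows are already ordered by group number, so groupby sees each group once.
    for _, rows in groupby(numbered, key=lambda row: row[0]):
        members = [period for _, period in rows]
        ranges.append((members[0], members[-1]))
    return ranges


peter = ["2009/05", "2009/06", "2009/07", "2009/08",
         "2009/12", "2010/01", "2010/02", "2010/03"]
print(month_ranges(peter))
```

For Peter's rows this yields two ranges, 2009/05 to 2009/08 and 2009/12 to 2010/03, matching the desired output in the question. Note that the run across the year boundary (2009/12 to 2010/01) stays together because the month index only increases by one there.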