Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Select data for 15 minute windows - PostgreSQL

Right so I have a table such as this in PostgreSQL:

timestamp              duration

2013-04-03 15:44:58    4
2013-04-03 15:56:12    2
2013-04-03 16:13:17    9
2013-04-03 16:16:30    3
2013-04-03 16:29:52    1
2013-04-03 16:38:25    1
2013-04-03 16:41:37    9
2013-04-03 16:44:49    1
2013-04-03 17:01:07    9
2013-04-03 17:07:48    1
2013-04-03 17:11:00    2
2013-04-03 17:11:16    2
2013-04-03 17:15:17    1
2013-04-03 17:16:53    4
2013-04-03 17:20:37    9
2013-04-03 17:20:53    3
2013-04-03 17:25:48    3
2013-04-03 17:29:26    1
2013-04-03 17:32:38    9
2013-04-03 17:36:55    4

And I would like to get the following output:

timestampwindowstart = 2013-04-03 15:44:58

duration    count
1           0
2           1
3           0
4           1
9           0

timestampwindowstart = 2013-04-03 15:59:58

duration    count
1           0
2           0
3           0
4           0
9           1

timestampwindowstart = 2013-04-03 16:14:58

duration    count
1           1
2           0
3           1
4           0
9           0

timestampwindowstart = 2013-04-03 16:29:58

duration    count
1           2
2           0
3           0
4           0
9           1

etc...

So basically it cycles through the timestamps in 15 minute windows and outputs the distinct duration values along with their frequency (count). The timestampwindowstart value is the earliest timestamp for the window (i.e timestampwindowfinish = timestampwindowstart + 15 minutes)

This is so I can then plot the 15 minute interval histograms...

I have tried reading up but it is a bit complicated for me to get my head around and I don't have much time...

Thanks for any help!

like image 382
James Elder Avatar asked Oct 03 '22 14:10

James Elder


1 Answers

Quick and dirty way: http://sqlfiddle.com/#!1/bd2f6/21 I named my column tstamp instead of your timestamp

with t as (
  select
    generate_series(mitstamp,matstamp,'15 minutes') as int,
    duration
  from
    (select min(tstamp) mitstamp, max(tstamp) as matstamp from tmp) a,
    (select duration from tmp group by duration) b
)

select
  int as timestampwindowstart,
  t.duration,
  count(tmp.duration)
from
   t
   left join tmp on 
         (tmp.tstamp >= t.int and 
          tmp.tstamp < (t.int + interval '15 minutes') and 
          t.duration = tmp.duration)
group by
  int,
  t.duration
order by
  int,
  t.duration

Brief explanation:

  1. Calculate minimum and maximum timestamp
  2. Generate 15 minutes intervals between minimum and maximum
  3. Cross join results with unique values of duration
  4. Left join original data (left join is important, because this will keep all possible combination in output and there will be null where duration does not exists for given interval.
  5. Aggregate data. count(null)=0

In case you have more tables and the algorithm should be applied on their union. Suppose we have three tables tmp1, tmp2, tmp3 all with columns tstamp and duration. The we can extend the previous solution:

with 

tmpout as (
  select * from tmp1 union all
  select * from tmp2 union all
  select * from tmp3
)

,t as (
  select
    generate_series(mitstamp,matstamp,'15 minutes') as int,
    duration
  from
    (select min(tstamp) mitstamp, max(tstamp) as matstamp from tmpout) a,
    (select duration from tmpout group by duration) b
)

select
  int as timestampwindowstart,
  t.duration,
  count(tmp.duration)
from
   t
   left join tmpout on 
         (tmp.tstamp >= t.int and 
          tmp.tstamp < (t.int + interval '15 minutes') and 
          t.duration = tmp.duration)
group by
  int,
  t.duration
order by
  int,
  t.duration

You should really know with clause in PostgreSQL. It is invaluable concept for any data analysis in PostgreSQL.

like image 115
Tomas Greif Avatar answered Oct 10 '22 02:10

Tomas Greif