Right so I have a table such as this in PostgreSQL:
timestamp duration
2013-04-03 15:44:58 4
2013-04-03 15:56:12 2
2013-04-03 16:13:17 9
2013-04-03 16:16:30 3
2013-04-03 16:29:52 1
2013-04-03 16:38:25 1
2013-04-03 16:41:37 9
2013-04-03 16:44:49 1
2013-04-03 17:01:07 9
2013-04-03 17:07:48 1
2013-04-03 17:11:00 2
2013-04-03 17:11:16 2
2013-04-03 17:15:17 1
2013-04-03 17:16:53 4
2013-04-03 17:20:37 9
2013-04-03 17:20:53 3
2013-04-03 17:25:48 3
2013-04-03 17:29:26 1
2013-04-03 17:32:38 9
2013-04-03 17:36:55 4
And I would like to get the following output:
timestampwindowstart = 2013-04-03 15:44:58
duration count
1 0
2 1
3 0
4 1
9 0
timestampwindowstart = 2013-04-03 15:59:58
duration count
1 0
2 0
3 0
4 0
9 1
timestampwindowstart = 2013-04-03 16:14:58
duration count
1 1
2 0
3 1
4 0
9 0
timestampwindowstart = 2013-04-03 16:29:58
duration count
1 2
2 0
3 0
4 0
9 1
etc...
So basically it cycles through the timestamps in 15 minute windows and outputs the distinct duration values along with their frequency (count). The timestampwindowstart value is the earliest timestamp for the window (i.e timestampwindowfinish = timestampwindowstart + 15 minutes)
This is so I can then plot the 15 minute interval histograms...
I have tried reading up but it is a bit complicated for me to get my head around and I don't have much time...
Thanks for any help!
Quick and dirty way: http://sqlfiddle.com/#!1/bd2f6/21 I named my column tstamp
instead of your timestamp
with t as (
select
generate_series(mitstamp,matstamp,'15 minutes') as int,
duration
from
(select min(tstamp) mitstamp, max(tstamp) as matstamp from tmp) a,
(select duration from tmp group by duration) b
)
select
int as timestampwindowstart,
t.duration,
count(tmp.duration)
from
t
left join tmp on
(tmp.tstamp >= t.int and
tmp.tstamp < (t.int + interval '15 minutes') and
t.duration = tmp.duration)
group by
int,
t.duration
order by
int,
t.duration
Brief explanation:
null
where duration does not exists for given interval.count(null)=0
In case you have more tables and the algorithm should be applied on their union. Suppose we have three tables tmp1, tmp2, tmp3
all with columns tstamp
and duration
. The we can extend the previous solution:
with
tmpout as (
select * from tmp1 union all
select * from tmp2 union all
select * from tmp3
)
,t as (
select
generate_series(mitstamp,matstamp,'15 minutes') as int,
duration
from
(select min(tstamp) mitstamp, max(tstamp) as matstamp from tmpout) a,
(select duration from tmpout group by duration) b
)
select
int as timestampwindowstart,
t.duration,
count(tmp.duration)
from
t
left join tmpout on
(tmp.tstamp >= t.int and
tmp.tstamp < (t.int + interval '15 minutes') and
t.duration = tmp.duration)
group by
int,
t.duration
order by
int,
t.duration
You should really know with
clause in PostgreSQL. It is invaluable concept for any data analysis in PostgreSQL.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With