I'm trying to write a complex query using PostgreSQL 9.2.4, and I'm having trouble getting it working. I have a table which contains a time range, as well as several other columns. When I store data in this table, if all of the columns are the same and the time ranges overlap or are adjacent, I combine them into one row.
When I retrieve them, though, I want to split the ranges at day boundaries - so for example:
2013-01-01 00:00:00 to 2013-01-02 23:59:59
would be selected as two rows:
2013-01-01 00:00:00 to 2013-01-01 23:59:59
2013-01-02 00:00:00 to 2013-01-02 23:59:59
with the values in the other columns the same for both retrieved entries.
I have seen this question which seems to more or less address what I want, but it's for a "very old" version of PostgreSQL, so I'm not sure it's really still applicable.
I've also seen this question, which does exactly what I want, but as far as I know the CONNECT BY statement is an Oracle extension to the SQL standard, so I can't use it.
I believe I can achieve this using PostgreSQL's generate_series, but I'm hoping there's a simple example out there demonstrating how it can be used to do this.
This is the query I'm working on at the moment, which currently doesn't work (because I can't reference the FROM table in a joined subquery), but I believe this is more-or-less the right track.
Here's the fiddle with the schema, sample data, and my working query.
Update: I just found out a fun fact, thanks to this question, that if you use a set-returning function in the SELECT part of the query, PostgreSQL will "automagically" do a cross join on the set and the row. I think I'm close to getting this working.
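A minimal sketch of that behavior, using a hypothetical inline VALUES list rather than the real table: each input row is repeated once for every value the set-returning function produces.

```sql
-- Hypothetical inline data (not the question's table): two input rows.
-- generate_series(1, 3) in the SELECT list yields three values per input row,
-- so the result has 2 x 3 = 6 rows, one per (label, n) combination.
SELECT v.label
     , generate_series(1, 3) AS n
FROM  (VALUES ('a'), ('b')) AS v(label);
```

Applied to the time ranges, the same mechanism would produce one output row per calendar day covered by each stored range.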
First off, your upper border concept is broken. A timestamp with 23:59:59 is no good. The data type timestamp has fractional digits. What about '2013-10-18 23:59:59.123'::timestamp?
Include the lower border and exclude the upper border everywhere in your logic. Compare:
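A minimal sketch of the difference, using a hypothetical one-row VALUES list that holds the fractional timestamp mentioned above. The column names closed_upper and half_open are made up for illustration.

```sql
-- Hypothetical test value: a timestamp in the last second of the day.
SELECT ts
       -- closed upper border at 23:59:59 misses the fractional second
     , ts BETWEEN timestamp '2013-10-18 00:00:00'
              AND timestamp '2013-10-18 23:59:59' AS closed_upper  -- false
       -- lower border included, upper border excluded
     , ts >= timestamp '2013-10-18 00:00:00'
   AND ts <  timestamp '2013-10-19 00:00:00'      AS half_open     -- true
FROM  (VALUES (timestamp '2013-10-18 23:59:59.123')) AS v(ts);
```

With the half-open form, consecutive days meet exactly at 00:00 with no gap and no overlap.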
Building on this premise:
SELECT id
     , stime
     , etime
FROM   timesheet_entries t
WHERE  etime <= stime::date + 1  -- this includes upper border 00:00
UNION ALL
SELECT id
     , CASE WHEN stime::date = d THEN stime ELSE d     END  -- AS stime
     , CASE WHEN etime::date = d THEN etime ELSE d + 1 END  -- AS etime
FROM  (
   SELECT id
        , stime
        , etime
        , generate_series(stime::date, etime::date, interval '1d')::date AS d
   FROM   timesheet_entries t
   WHERE  etime > stime::date + 1
   ) sub
ORDER  BY id, stime;
Or simply:
SELECT id
     , CASE WHEN stime::date = d THEN stime ELSE d     END  -- AS stime
     , CASE WHEN etime::date = d THEN etime ELSE d + 1 END  -- AS etime
FROM  (
   SELECT id
        , stime
        , etime
        , generate_series(stime::date, etime::date, interval '1d')::date AS d
   FROM   timesheet_entries t
   ) sub
ORDER  BY id, stime;
The simpler one may even be faster.
Note a corner case difference when stime and etime both fall on 00:00 exactly. Then a row with a zero time range is added at the end. There are various ways to deal with that. I propose:
SELECT *
FROM  (
   SELECT id
        , CASE WHEN stime::date = d THEN stime ELSE d     END AS stime
        , CASE WHEN etime::date = d THEN etime ELSE d + 1 END AS etime
   FROM  (
      SELECT id
           , stime
           , etime
           , generate_series(stime::date, etime::date, interval '1d')::date AS d
      FROM   timesheet_entries t
      ) sub1
   ORDER  BY id, stime
   ) sub2
WHERE  etime <> stime;
In Postgres 9.3+ it is better to use LATERAL for this:
SELECT id
     , CASE WHEN stime::date = d THEN stime ELSE d     END AS stime
     , CASE WHEN etime::date = d THEN etime ELSE d + 1 END AS etime
FROM   timesheet_entries t
     , LATERAL (SELECT d::date
                FROM   generate_series(t.stime::date, t.etime::date, interval '1d') d
                ) d
ORDER  BY id, stime;
Details in the manual.
Same corner case as above.
SQL Fiddle demonstrating all.
There is a simple solution (if the intervals start at the same time of day):
postgres=# select i, i + interval '1day' - interval '1sec'
             from generate_series('2013-01-01 00:00:00'::timestamp, '2013-01-02 23:59:59', '1day') g(i);
          i          │      ?column?       
─────────────────────┼─────────────────────
 2013-01-01 00:00:00 │ 2013-01-01 23:59:59
 2013-01-02 00:00:00 │ 2013-01-02 23:59:59
(2 rows)
I wrote a table function that does this for any interval. It is fast: a two-year range is divided into 753 ranges in 10 ms.
create or replace function day_ranges(timestamp, timestamp)
returns table(t1 timestamp, t2 timestamp) as $$
begin
  t1 := $1;
  if $2 > $1 then
    loop
      if t1::date = $2::date then
        t2 := $2;
        return next;
        exit;
      end if;
      t2 := date_trunc('day', t1) + interval '1day' - interval '1sec';
      return next;
      t1 := t2 + interval '1sec';
    end loop;
  end if;
  return;
end;
$$ language plpgsql;
Result:
postgres=# select * from day_ranges('2013-10-08 22:00:00', '2013-10-10 23:00:00');
         t1          │         t2          
─────────────────────┼─────────────────────
 2013-10-08 22:00:00 │ 2013-10-08 23:59:59
 2013-10-09 00:00:00 │ 2013-10-09 23:59:59
 2013-10-10 00:00:00 │ 2013-10-10 23:00:00
(3 rows)

Time: 6.794 ms
And a faster (and slightly longer) version based on RETURN QUERY:
create or replace function day_ranges(timestamp, timestamp)
returns table(t1 timestamp, t2 timestamp) as $$
begin
  t1 := $1; t2 := $2;
  if $1::date = $2::date then
    return next;
  else
    -- first day
    t2 := date_trunc('day', t1) + interval '1day' - interval '1sec';
    return next;
    if $2::date > $1::date + 1 then
      return query
        select d, d + interval '1day' - interval '1sec'
        from generate_series(date_trunc('day', $1 + interval '1day')::timestamp,
                             date_trunc('day', $2 - interval '1day')::timestamp,
                             '1day') g(d);
    end if;
    -- last day
    t1 := date_trunc('day', $2); t2 := $2;
    return next;
  end if;
  return;
end;
$$ language plpgsql;
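A possible way to apply day_ranges() to every row of a table, sketched under assumptions: it borrows the table timesheet_entries with columns id, stime and etime (both assumed to be of type timestamp) from the other answer, and it relies on LATERAL, which is only available in PostgreSQL 9.3 and later.

```sql
-- Assumes timesheet_entries(id, stime timestamp, etime timestamp)
-- and the day_ranges() function defined above.
SELECT t.id
     , r.t1 AS stime
     , r.t2 AS etime
FROM   timesheet_entries t
     , LATERAL day_ranges(t.stime, t.etime) r  -- one call per input row
ORDER  BY t.id, r.t1;
```

Note that day_ranges() ends each full day at 23:59:59, so rows with fractional seconds near midnight would need the half-open convention instead.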