PostgreSQL splitting time range into days

I'm trying to write a complex query using PostgreSQL 9.2.4, and I'm having trouble getting it working. I have a table which contains a time range, as well as several other columns. When I store data in this table, if all of the columns are the same and the time ranges overlap or are adjacent, I combine them into one row.

When I retrieve them, though, I want to split the ranges at day boundaries - so for example:

2013-01-01 00:00:00 to 2013-01-02 23:59:59

would be selected as two rows:

2013-01-01 00:00:00 to 2013-01-01 23:59:59
2013-01-02 00:00:00 to 2013-01-02 23:59:59

with the values in the other columns the same for both retrieved entries.

I have seen this question which seems to more or less address what I want, but it's for a "very old" version of PostgreSQL, so I'm not sure it's really still applicable.

I've also seen this question, which does exactly what I want, but as far as I know the CONNECT BY statement is an Oracle extension to the SQL standard, so I can't use it.

I believe I can achieve this using PostgreSQL's generate_series, but I'm hoping there's a simple example out there demonstrating how it can be used to do this.

This is the query I'm working on at the moment, which currently doesn't work (because I can't reference the FROM table in a joined subquery), but I believe this is more-or-less the right track.

Here's the fiddle with the schema, sample data, and my work-in-progress query.

Update: I just found out a fun fact, thanks to this question, that if you use a set-returning function in the SELECT part of the query, PostgreSQL will "automagically" do a cross join on the set and the row. I think I'm close to getting this working.
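A minimal sketch of that behavior, not the query from the fiddle. The VALUES list is hypothetical sample data standing in for the real table, and date_trunc is used so that the timestamp variant of generate_series is picked. The set-returning function in the SELECT list produces one output row per generated day for each input row, so the first entry should expand to two rows and the second to one:

```sql
-- Hypothetical sample rows; the real table and its column names may differ.
SELECT r.id
     , r.stime
     , r.etime
     , generate_series(date_trunc('day', r.stime)   -- first day covered by the row
                     , date_trunc('day', r.etime)   -- last day covered by the row
                     , interval '1 day')::date AS d -- one output row per day
FROM  (VALUES (1, timestamp '2013-01-01 00:00:00', timestamp '2013-01-02 23:59:59')
            , (2, timestamp '2013-01-05 08:00:00', timestamp '2013-01-05 17:00:00')
      ) AS r(id, stime, etime)
ORDER  BY r.id, d;
```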

CmdrMoozy asked Oct 18 '13 16:10



2 Answers

First off, your upper border concept is broken. A timestamp with 23:59:59 is no good. The data type timestamp has fractional digits (microsecond resolution). What about '2013-10-18 23:59:59.123'::timestamp?

Include the lower border and exclude the upper border everywhere in your logic. Compare:

  • Calculate number of concurrent events in SQL
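To illustrate the point about borders, here is a small sketch with a single hypothetical value (the column aliases are made up for the example). A timestamp in the last second of the day falls outside a closed range ending at 23:59:59, but inside a range that includes the lower border and excludes the upper one:

```sql
-- One hypothetical timestamp with fractional seconds, late on 2013-10-18.
SELECT ts
       -- closed range with a 23:59:59 upper border: misses the value (false)
     , ts BETWEEN timestamp '2013-10-18 00:00:00'
              AND timestamp '2013-10-18 23:59:59'     AS in_closed_range
       -- lower border included, upper border excluded: covers the whole day (true)
     , (ts >= timestamp '2013-10-18 00:00:00'
    AND ts <  timestamp '2013-10-19 00:00:00')        AS in_half_open_range
FROM  (VALUES (timestamp '2013-10-18 23:59:59.123')) AS v(ts);
```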

Building on this premise:

Postgres 9.2 or older

SELECT id
     , stime
     , etime
FROM   timesheet_entries t
WHERE  etime <= stime::date + 1  -- this includes upper border 00:00

UNION ALL
SELECT id
     , CASE WHEN stime::date = d THEN stime ELSE d END     -- AS stime
     , CASE WHEN etime::date = d THEN etime ELSE d + 1 END -- AS etime
FROM (
   SELECT id
        , stime
        , etime
        , generate_series(stime::date, etime::date, interval '1d')::date AS d
   FROM   timesheet_entries t
   WHERE  etime > stime::date + 1
   ) sub
ORDER  BY id, stime;

Or simply:

SELECT id
     , CASE WHEN stime::date = d THEN stime ELSE d END     -- AS stime
     , CASE WHEN etime::date = d THEN etime ELSE d + 1 END -- AS etime
FROM (
   SELECT id
        , stime
        , etime
        , generate_series(stime::date, etime::date, interval '1d')::date AS d
   FROM   timesheet_entries t
   ) sub
ORDER  BY id, stime;

The simpler one may even be faster.
Note a corner-case difference when etime falls on 00:00 exactly, for instance when stime and etime both fall on 00:00. The simpler query then adds a row with a zero time range at the end. There are various ways to deal with that. I propose:

SELECT *
FROM  (
   SELECT id
        , CASE WHEN stime::date = d THEN stime ELSE d END     AS stime
        , CASE WHEN etime::date = d THEN etime ELSE d + 1 END AS etime
   FROM (
      SELECT id
           , stime
           , etime
           , generate_series(stime::date, etime::date, interval '1d')::date AS d
      FROM   timesheet_entries t
      ) sub1
   ) sub2
WHERE  etime <> stime
ORDER  BY id, stime;

Postgres 9.3+

In Postgres 9.3 or later, a LATERAL join is the better tool for this:

SELECT id
     , CASE WHEN stime::date = d THEN stime ELSE d END     AS stime
     , CASE WHEN etime::date = d THEN etime ELSE d + 1 END AS etime
FROM   timesheet_entries t
     , LATERAL (SELECT d::date
                FROM   generate_series(t.stime::date, t.etime::date, interval '1d') d
                ) d
ORDER  BY id, stime;

Details in the manual.
Same corner case as above.
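One way to handle that corner case in the LATERAL form is to filter out zero-length rows in an outer query. This is only a sketch: the CTE named entries and its rows are hypothetical stand-ins for timesheet_entries, and the column name cal_day is made up for the example. Entry 1 ends exactly at midnight, so the empty trailing row it would otherwise produce is dropped:

```sql
-- Hypothetical sample data standing in for timesheet_entries (needs Postgres 9.3+).
WITH entries(id, stime, etime) AS (
   VALUES (1, timestamp '2013-01-01 10:00:00', timestamp '2013-01-02 00:00:00')
        , (2, timestamp '2013-01-03 22:00:00', timestamp '2013-01-05 06:00:00')
   )
SELECT *
FROM  (
   SELECT e.id
          -- keep the original start on the first day, otherwise start at midnight
        , CASE WHEN e.stime::date = d.cal_day THEN e.stime ELSE d.cal_day END     AS stime
          -- keep the original end on the last day, otherwise end at the next midnight
        , CASE WHEN e.etime::date = d.cal_day THEN e.etime ELSE d.cal_day + 1 END AS etime
   FROM   entries e
        , LATERAL (
             SELECT g::date AS cal_day
             FROM   generate_series(date_trunc('day', e.stime)
                                  , date_trunc('day', e.etime)
                                  , interval '1 day') g
          ) d
   ) sub
WHERE  etime <> stime   -- drop the zero-length row created when etime is exactly 00:00
ORDER  BY id, stime;
```

Note that the filter also removes rows that were zero-length to begin with, which may or may not be what you want.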

SQL Fiddle demonstrating all.

Erwin Brandstetter answered Oct 12 '22 20:10


There is a simple solution (if the intervals start at the same time):

postgres=# select i, i + interval '1day' - interval '1sec' 
  from generate_series('2013-01-01 00:00:00'::timestamp, '2013-01-02 23:59:59', '1day') g(i);
          i          │      ?column?       
─────────────────────┼─────────────────────
 2013-01-01 00:00:00 │ 2013-01-01 23:59:59
 2013-01-02 00:00:00 │ 2013-01-02 23:59:59
(2 rows)

I wrote a table function that does this for any interval. It is fast: a two-year range is divided into 753 ranges in 10 ms.

create or replace function day_ranges(timestamp, timestamp)
returns table(t1 timestamp, t2 timestamp) as $$
begin
  t1 := $1;
  if $2 > $1 then
    loop
      if t1::date = $2::date then
        t2 := $2;
        return next;
        exit;
      end if;
      t2 := date_trunc('day', t1) + interval '1day' - interval '1sec';
      return next;
      t1 := t2 + interval '1sec';
    end loop;
  end if;
  return;
end;
$$ language plpgsql;

Result:

postgres=# select * from day_ranges('2013-10-08 22:00:00', '2013-10-10 23:00:00');
         t1          │         t2          
─────────────────────┼─────────────────────
 2013-10-08 22:00:00 │ 2013-10-08 23:59:59
 2013-10-09 00:00:00 │ 2013-10-09 23:59:59
 2013-10-10 00:00:00 │ 2013-10-10 23:00:00
(3 rows)

Time: 6.794 ms

And a faster (and slightly longer) version based on RETURN QUERY:

create or replace function day_ranges(timestamp, timestamp)
returns table(t1 timestamp, t2 timestamp) as $$
begin
  t1 := $1; t2 := $2;
  if $1::date = $2::date then
    return next;
  else
    -- first day
    t2 := date_trunc('day', t1) + interval '1day' - interval '1sec';
    return next;
    if $2::date > $1::date + 1 then
      return query select d, d + interval '1day' - interval '1sec'
                      from generate_series(date_trunc('day', $1 + interval '1day')::timestamp,
                                           date_trunc('day', $2 - interval '1day')::timestamp,
                                           '1day') g(d);
    end if;
    -- last day 
    t1 := date_trunc('day', $2); t2 := $2;
    return next;
  end if;
  return;
end;
$$ language plpgsql;
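A possible way to apply the function to every row of a table, sketched under assumptions: day_ranges(timestamp, timestamp) from above has already been created, the VALUES list is hypothetical sample data in place of the real table, and the server is Postgres 9.3 or later, because the function call refers to columns of a preceding FROM item (LATERAL).

```sql
-- Assumes day_ranges(timestamp, timestamp) defined above already exists.
-- The rows below are hypothetical sample data.
SELECT e.id
     , r.t1   -- start of the per-day slice
     , r.t2   -- end of the per-day slice
FROM  (VALUES (1, timestamp '2013-10-08 22:00:00', timestamp '2013-10-10 23:00:00')
            , (2, timestamp '2013-10-12 09:00:00', timestamp '2013-10-12 17:30:00')
      ) AS e(id, stime, etime)
     , LATERAL day_ranges(e.stime, e.etime) AS r
ORDER  BY e.id, r.t1;
```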
Pavel Stehule answered Oct 12 '22 21:10