Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to fill timestamp gaps in a Postgres query?

Given the following table:

CREATE TABLE channel1m (
  ts TIMESTAMP WITHOUT TIME ZONE NOT NULL,
  itemId BIGINT,
  value BIGINT
)

in which a row may be inserted each minute, per itemId, as follows:

ts                    itemId         value
2012-12-03 15:29:00   100            1
2012-12-03 15:30:00   100            2
2012-12-03 15:30:00   101            0
2012-12-03 15:32:00   100            1
2012-12-03 15:32:00   101            1

I can't find a way (without creating additional tables) to write a query that fills the time gaps (for example, 15:29:00 for itemId 101, and 15:31:00 for both items) by returning NULL in value.

The expected resultset would be:

ts                    itemId         value
2012-12-03 15:29:00   100            1
2012-12-03 15:29:00   101            NULL
2012-12-03 15:30:00   100            2
2012-12-03 15:30:00   101            0
2012-12-03 15:31:00   100            NULL
2012-12-03 15:31:00   101            NULL
2012-12-03 15:32:00   100            1
2012-12-03 15:32:00   101            1

I've found solutions having a separate time table with the full serie of timestamps, but I would much prefer to solve this in the query alone. Is this possible?

like image 321
luisfarzati Avatar asked Dec 05 '12 17:12

luisfarzati


People also ask

What is the format of timestamp in PostgreSQL?

Postgres DATE data type Postgres uses the DATE data type for storing different dates in YYYY-MM-DD format. It uses 4 bytes for storing a date value in a column. You can design a Postgres table with a DATE column and use the keyword DEFAULT CURRENT_DATE to use the current system date as the default value in this column.

What is interval in Postgres?

In PostgreSQL, the Interval is another type of data type used to store and deploy Time in years, months, days, hours, minutes, seconds, etc. And the months and days values are integers values, whereas the second's field can be the fractions values.


2 Answers

DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path = tmp;

DROP TABLE IF EXISTS channel1m CASCADE;
CREATE TABLE channel1m (
  zts TIMESTAMP WITHOUT TIME ZONE NOT NULL,
  zitemid BIGINT,
  zvalue BIGINT
);

-- in which a row may be inserted each minute, per zitemid, as follows:

INSERT INTO channel1m(zts, zitemid, zvalue) VALUES
 ('2012-12-03 15:29:00',   100,            1)
,('2012-12-03 15:30:00',   100,            2)
,('2012-12-03 15:30:00',   101,            0)
,('2012-12-03 15:32:00',   100,            1)
,('2012-12-03 15:32:00',   101,            1)
        ;

        -- CTE to the rescue!!!
WITH cal AS (
        WITH mm AS (
                SELECT MIN(xx.zts) AS minmin, MAX(xx.zts) AS maxmax
                 FROM channel1m xx)
        SELECT generate_series(mm.minmin , mm.maxmax , '1 min'::interval) AS stamp
        FROM mm
        )
, ite AS (
        SELECT DISTINCT zitemid AS zitemid
        FROM channel1m
        )
SELECT cal.stamp
        , ite.zitemid
        , tab.zvalue
FROM cal
JOIN ite ON 1=1 -- Note: this is a cartesian product of the {time,id} -domains
LEFT JOIN channel1m tab ON tab.zts = cal.stamp AND tab.zitemid = ite.zitemid
ORDER BY stamp ASC
        ;

Output:

NOTICE:  drop cascades to table tmp.channel1m
DROP SCHEMA
CREATE SCHEMA
SET
NOTICE:  table "channel1m" does not exist, skipping
DROP TABLE
CREATE TABLE
INSERT 0 5
        stamp        | zitemid | zvalue 
---------------------+---------+--------
 2012-12-03 15:29:00 |     101 |       
 2012-12-03 15:29:00 |     100 |      1
 2012-12-03 15:30:00 |     100 |      2
 2012-12-03 15:30:00 |     101 |      0
 2012-12-03 15:31:00 |     100 |       
 2012-12-03 15:31:00 |     101 |       
 2012-12-03 15:32:00 |     100 |      1
 2012-12-03 15:32:00 |     101 |      1
(8 rows)
like image 131
wildplasser Avatar answered Oct 13 '22 23:10

wildplasser


You will need: table with all itemId, and a (pseudo)table with all required dates.

You probably have the table with all distinct itemId. Lets call it item_table.

Pseudo-table with dates you can get with generate_series('start_date','end_date', interval '1 minute'). Details here.

The query:

SELECT gs.ts, it.itemId, ch1m.value
FROM item_table it
CROSS JOIN generate_series('start_date','end_date', interval '1 minute') gs(ts)
LEFT JOIN channel1m ch1m ON it.itemId = ch1m.itemId 
                         AND gs.ts = ch1m.ts

Replace 'start_date','end_date' with desired values or get them from sub query.

This query:

1) Builds all pairs of item-time via CROSS JOIN

2) Gets the value via LEFT JOIN

like image 31
Ihor Romanchenko Avatar answered Oct 13 '22 23:10

Ihor Romanchenko