We have a table that saves information about the interval of the employees. Let's call it INTERVAL_TABLE.
We save when the user starts a interval and when he finishes. The user can start a interval as many times as he wants and finish as many times as he wants as well.
This is a simplified structure of the INTERVAL_TABLE
:
INTERVAL_ID | USER_ID | INTERVAL_TYPE_ID | INTERVAL_TIMESTAMP | ENTRY_TYPE
A user may have these entries in the table:
Now, we must create a report combining different entries of that table that refer to the same user and interval type. We should be able to identify intervals that have a start and an end and group these two in one row. Assuming the data in the image above, the output of the report should be the following:
The output should be ordered by date, like the above image.
I have no idea how to create a query to do that.
Thanks!
Edit - Extra info:
To find the END interval for any INIT interval, we should find the closest END interval based on the timestamp of that interval. That's how we know we should match ID 1 with ID 2 and not with ID 3.
It's important to note that if a INIT interval is followed by another INIT interval (based on the timestamps), we should not proceed to find the END for that INIT. That is because this is a INIT without END.
It could be done easy and efficiently using LEAD
and LAG
functions. At least it is much more efficient than self-join of the table: O(n)
vs O(n*n)
.
At first add columns for the next and previous row using LEAD
and LAG
with appropriate PARTITION BY
.
Then build two sets of pairs - the first that starts with INIT_INTERVAL
, the second that ends with END_INTERVAL
. If there is a pair that has both Init and End - it will be included twice and later eliminated in UNION
.
SQL Fiddle
Sample data (this is something that you should have included in your question in addition to the screenshot)
CREATE TABLE INTERVAL_TABLE (
INTERVAL_ID int,
USER_ID int,
INTERVAL_TYPE_ID int,
INTERVAL_TIMESTAMP timestamp,
ENTRY_TYPE varchar(255));
INSERT INTO INTERVAL_TABLE (INTERVAL_ID, USER_ID, INTERVAL_TYPE_ID, INTERVAL_TIMESTAMP, ENTRY_TYPE) VALUES
(1, 1, 1, '2018-03-08 14:00:00', 'INIT_INTERVAL'),
(2, 1, 1, '2018-03-08 15:00:00', 'END_INTERVAL' ),
(3, 1, 1, '2018-03-08 15:30:00', 'END_INTERVAL' ),
(4, 1, 1, '2018-03-08 15:45:00', 'INIT_INTERVAL'),
(5, 1, 1, '2018-03-08 15:50:00', 'INIT_INTERVAL');
Query
WITH
CTE
AS
(
SELECT
USER_ID
,INTERVAL_TYPE_ID
,ENTRY_TYPE AS Curr_Entry_Type
,INTERVAL_TIMESTAMP AS Curr_Interval_Timestamp
,INTERVAL_ID AS Curr_Interval_ID
,LAG(ENTRY_TYPE) OVER(PARTITION BY USER_ID, INTERVAL_TYPE_ID ORDER BY INTERVAL_TIMESTAMP) AS Prev_Entry_Type
,LAG(INTERVAL_TIMESTAMP) OVER(PARTITION BY USER_ID, INTERVAL_TYPE_ID ORDER BY INTERVAL_TIMESTAMP) AS Prev_Interval_Timestamp
,LAG(INTERVAL_ID) OVER(PARTITION BY USER_ID, INTERVAL_TYPE_ID ORDER BY INTERVAL_TIMESTAMP) AS Prev_Interval_ID
,LEAD(ENTRY_TYPE) OVER(PARTITION BY USER_ID, INTERVAL_TYPE_ID ORDER BY INTERVAL_TIMESTAMP) AS Next_Entry_Type
,LEAD(INTERVAL_TIMESTAMP) OVER(PARTITION BY USER_ID, INTERVAL_TYPE_ID ORDER BY INTERVAL_TIMESTAMP) AS Next_Interval_Timestamp
,LEAD(INTERVAL_ID) OVER(PARTITION BY USER_ID, INTERVAL_TYPE_ID ORDER BY INTERVAL_TIMESTAMP) AS Next_Interval_ID
FROM
INTERVAL_TABLE
)
,CTE_Result
AS
(
SELECT
USER_ID
,INTERVAL_TYPE_ID
,Curr_Entry_Type AS Entry_Type_Init
,Curr_Interval_Timestamp AS Interval_Timestamp_Init
,Curr_Interval_ID AS Interval_ID_Init
,Next_Entry_Type AS Entry_Type_End
,CASE WHEN Next_Entry_Type = 'END_INTERVAL' THEN Next_Interval_Timestamp END AS Interval_Timestamp_End
,CASE WHEN Next_Entry_Type = 'END_INTERVAL' THEN Next_Interval_ID END AS Interval_ID_End
FROM CTE
WHERE Curr_Entry_Type = 'INIT_INTERVAL'
UNION -- sic! not UNION ALL
SELECT
USER_ID
,INTERVAL_TYPE_ID
,Prev_Entry_Type AS Entry_Type_Init
,CASE WHEN Prev_Entry_Type = 'INIT_INTERVAL' THEN Prev_Interval_Timestamp END AS Interval_Timestamp_Init
,CASE WHEN Prev_Entry_Type = 'INIT_INTERVAL' THEN Prev_Interval_ID END AS Interval_ID_Init
,Curr_Entry_Type AS Entry_Type_End
,Curr_Interval_Timestamp AS Interval_Timestamp_End
,Curr_Interval_ID AS Interval_ID_End
FROM CTE
WHERE Curr_Entry_Type = 'END_INTERVAL'
)
SELECT
USER_ID
,INTERVAL_TYPE_ID
,Interval_Timestamp_Init
,Interval_Timestamp_End
,Interval_ID_Init
,Interval_ID_End
FROM CTE_Result
ORDER BY
USER_ID
,INTERVAL_TYPE_ID
,COALESCE(Interval_Timestamp_Init, Interval_Timestamp_End)
Results
| user_id | interval_type_id | interval_timestamp_init | interval_timestamp_end | interval_id_init | interval_id_end |
|---------|------------------|-------------------------|------------------------|------------------|-----------------|
| 1 | 1 | 2018-03-08T14:00:00Z | 2018-03-08T15:00:00Z | 1 | 2 |
| 1 | 1 | (null) | 2018-03-08T15:30:00Z | (null) | 3 |
| 1 | 1 | 2018-03-08T15:45:00Z | (null) | 4 | (null) |
| 1 | 1 | 2018-03-08T15:50:00Z | (null) | 5 | (null) |
This might not be the most efficient way to do this (I imagine a recursive query might be), but I find these subqueries easier to maintain:
WITH ordered_table AS (
SELECT row_number() OVER(ORDER BY USER_ID,INTERVAL_TYPE_ID,INTERVAL_TIMESTAMP ASC) row_num, *
FROM INTERVAL_TABLE
ORDER BY row_num
),
_inits AS (
SELECT
t1.USER_ID,
t1.INTERVAL_TYPE_ID AS INTERVAL_TYPE,
t1.INTERVAL_TIMESTAMP AS INTERVAL_TIMESTAMP_INIT,
CASE
WHEN t1.ENTRY_TYPE = 'INIT_INTERVAL'
AND t2.ENTRY_TYPE = 'END_INTERVAL'
AND t1.USER_ID = t2.USER_ID
AND t1.INTERVAL_TYPE_ID = t2.INTERVAL_TYPE_ID
THEN t2.INTERVAL_TIMESTAMP
END AS INTERVAL_TIMESTAMP_END,
t1.INTERVAL_ID AS INTERVAL_ID_INIT,
CASE
WHEN t1.ENTRY_TYPE = 'INIT_INTERVAL'
AND t2.ENTRY_TYPE = 'END_INTERVAL'
AND t1.USER_ID = t2.USER_ID
AND t1.INTERVAL_TYPE_ID = t2.INTERVAL_TYPE_ID
THEN t2.INTERVAL_ID
END AS INTERVAL_ID_END
FROM ordered_table AS t1
LEFT JOIN ordered_table AS t2 ON (
t1.row_num = t2.row_num - 1 AND
t1.USER_ID = t2.USER_ID AND
t1.INTERVAL_TYPE_ID = t2.INTERVAL_TYPE_ID
)
WHERE t1.ENTRY_TYPE = 'INIT_INTERVAL'
),
_ends AS (
SELECT
t2.USER_ID,
t2.INTERVAL_TYPE_ID AS INTERVAL_TYPE,
NULL::timestamp AS INTERVAL_TIMESTAMP_INIT,
CASE
WHEN (
t1.ENTRY_TYPE = 'END_INTERVAL' AND
t2.ENTRY_TYPE = 'END_INTERVAL'
)
OR (t1.ENTRY_TYPE IS NULL) -- case when first record for USER_ID and INTERVAL_TYPE_ID is an END
THEN t2.INTERVAL_TIMESTAMP
END AS INTERVAL_TIMESTAMP_END,
NULL::int AS INTERVAL_ID_INIT,
t2.INTERVAL_ID AS INTERVAL_ID_END
FROM ordered_table AS t1
RIGHT JOIN ordered_table AS t2 ON (
t1.row_num = t2.row_num - 1 AND
t1.USER_ID = t2.USER_ID AND
t1.INTERVAL_TYPE_ID = t2.INTERVAL_TYPE_ID
)
WHERE t2.ENTRY_TYPE = 'END_INTERVAL'
)
SELECT * FROM (
SELECT * FROM _inits
UNION ALL
SELECT * FROM _ends
) qry
WHERE
COALESCE(interval_timestamp_init, interval_timestamp_end) IS NOT NULL
ORDER BY
USER_ID,
INTERVAL_TYPE,
COALESCE(interval_timestamp_init, interval_timestamp_end)
Basically, INITs will always be listed. They will either have an associated END or a null. So almost all the content from _inits
will be there.
Because the ENDs were already captured by the INITs, we only need to capture the ones that don't have an INIT (they were preceded by an END).
Because they are outer joins, you simply can remove the cases where INIT and END both are NULL and apply proper ordering.
This query gives the output you need:
WITH Intervals AS
(
WITH Events AS
(
WITH OrderedEvents AS
(
SELECT INTERVAL_ID, USER_ID, INTERVAL_TYPE_ID, INTERVAL_TIMESTAMP, ENTRY_TYPE, row_number() over (partition by USER_ID, INTERVAL_TYPE_ID order by INTERVAL_TIMESTAMP ASC) AS EVENT_ORDER FROM INTERVAL_TABLE
UNION ALL
SELECT NULL AS INTERVAL_ID, USER_ID, INTERVAL_TYPE_ID, NULL AS INTERVAL_TIMESTAMP, 'INIT_INTERVAL' AS ENTRY_TYPE, 0 AS EVENT_ORDER FROM INTERVAL_TABLE GROUP BY USER_ID, INTERVAL_TYPE_ID
UNION ALL
SELECT NULL AS INTERVAL_ID, USER_ID, INTERVAL_TYPE_ID, NULL AS INTERVAL_TIMESTAMP, 'END_INTERVAL' AS ENTRY_TYPE, COUNT(*) + 1 AS EVENT_ORDER FROM INTERVAL_TABLE GROUP BY USER_ID, INTERVAL_TYPE_ID
)
SELECT Events1.USER_ID, Events1.INTERVAL_TYPE_ID, Events1.INTERVAL_TIMESTAMP AS INTERVAL_TIMESTAMP_INIT, Events2.INTERVAL_TIMESTAMP AS INTERVAL_TIMESTAMP_END, Events1.INTERVAL_ID AS INTERVAL_ID_INIT, Events2.INTERVAL_ID AS INTERVAL_ID_END, Events1.ENTRY_TYPE AS ENTRY_TYPE1, Events2.ENTRY_TYPE AS ENTRY_TYPE2
FROM OrderedEvents Events1 INNER JOIN
OrderedEvents Events2
ON Events1.USER_ID = Events2.USER_ID AND Events1.INTERVAL_TYPE_ID = Events2.INTERVAL_TYPE_ID AND Events1.EVENT_ORDER + 1 = Events2.EVENT_ORDER
)
SELECT USER_ID, INTERVAL_TYPE_ID,
CASE WHEN ENTRY_TYPE1 = 'INIT_INTERVAL' AND ENTRY_TYPE2 = 'END_INTERVAL' THEN INTERVAL_TIMESTAMP_INIT
WHEN ENTRY_TYPE1 = 'INIT_INTERVAL' AND ENTRY_TYPE2 = 'INIT_INTERVAL' THEN INTERVAL_TIMESTAMP_INIT
WHEN ENTRY_TYPE1 = 'END_INTERVAL' AND ENTRY_TYPE2 = 'END_INTERVAL' THEN NULL
END AS INTERVAL_TIMESTAMP_INIT,
CASE WHEN ENTRY_TYPE1 = 'INIT_INTERVAL' AND ENTRY_TYPE2 = 'END_INTERVAL' THEN INTERVAL_TIMESTAMP_END
WHEN ENTRY_TYPE1 = 'INIT_INTERVAL' AND ENTRY_TYPE2 = 'INIT_INTERVAL' THEN NULL
WHEN ENTRY_TYPE1 = 'END_INTERVAL' AND ENTRY_TYPE2 = 'END_INTERVAL' THEN INTERVAL_TIMESTAMP_END
END AS INTERVAL_TIMESTAMP_END,
CASE WHEN ENTRY_TYPE1 = 'INIT_INTERVAL' AND ENTRY_TYPE2 = 'END_INTERVAL' THEN INTERVAL_ID_INIT
WHEN ENTRY_TYPE1 = 'INIT_INTERVAL' AND ENTRY_TYPE2 = 'INIT_INTERVAL' THEN INTERVAL_ID_INIT
WHEN ENTRY_TYPE1 = 'END_INTERVAL' AND ENTRY_TYPE2 = 'END_INTERVAL' THEN NULL
END AS INTERVAL_ID_INIT,
CASE WHEN ENTRY_TYPE1 = 'INIT_INTERVAL' AND ENTRY_TYPE2 = 'END_INTERVAL' THEN INTERVAL_ID_END
WHEN ENTRY_TYPE1 = 'INIT_INTERVAL' AND ENTRY_TYPE2 = 'INIT_INTERVAL' THEN NULL
WHEN ENTRY_TYPE1 = 'END_INTERVAL' AND ENTRY_TYPE2 = 'END_INTERVAL' THEN INTERVAL_ID_END
END AS INTERVAL_ID_END
FROM Events
)
SELECT * FROM Intervals WHERE INTERVAL_ID_INIT IS NOT NULL OR INTERVAL_ID_END IS NOT NULL;
At first, we build OrderedEvents
CTE that groups entries by USER_ID
and INTERVAL_TYPE_ID
, sorts them by INTERVAL_TIMESTAMP
within each group and assign numeric order to each event.
Also for each group we add INIT_INTERVAL
as first event and END_INTERVAL
as last event to cover cases when group starts with END_INTERVAL
or finishes with INIT_INTERVAL
:
WITH OrderedEvents AS
(
SELECT INTERVAL_ID, USER_ID, INTERVAL_TYPE_ID, INTERVAL_TIMESTAMP, ENTRY_TYPE, row_number() over (partition by USER_ID, INTERVAL_TYPE_ID order by INTERVAL_TIMESTAMP ASC) AS EVENT_ORDER FROM INTERVAL_TABLE
UNION ALL
SELECT NULL AS INTERVAL_ID, USER_ID, INTERVAL_TYPE_ID, NULL AS INTERVAL_TIMESTAMP, 'INIT_INTERVAL' AS ENTRY_TYPE, 0 AS EVENT_ORDER FROM INTERVAL_TABLE GROUP BY USER_ID, INTERVAL_TYPE_ID
UNION ALL
SELECT NULL AS INTERVAL_ID, USER_ID, INTERVAL_TYPE_ID, NULL AS INTERVAL_TIMESTAMP, 'END_INTERVAL' AS ENTRY_TYPE, COUNT(*) + 1 AS EVENT_ORDER FROM INTERVAL_TABLE GROUP BY USER_ID, INTERVAL_TYPE_ID
)
SELECT * FROM OrderedEvents ORDER BY user_id, interval_type_id, event_order;
This query gives following results for the provided data:
Then we intersect OrderedEvents
with itself on USER_ID
and INTERVAL_TYPE_ID
and select pairs of neighbor events (Events1.EVENT_ORDER + 1 = Events2.EVENT_ORDER
):
WITH OrderedEvents AS
(
...
)
SELECT Events1.USER_ID, Events1.INTERVAL_TYPE_ID, Events1.INTERVAL_TIMESTAMP AS INTERVAL_TIMESTAMP_INIT, Events2.INTERVAL_TIMESTAMP AS INTERVAL_TIMESTAMP_END, Events1.INTERVAL_ID AS INTERVAL_ID_INIT, Events2.INTERVAL_ID AS INTERVAL_ID_END, Events1.ENTRY_TYPE AS ENTRY_TYPE1, Events2.ENTRY_TYPE AS ENTRY_TYPE2
FROM OrderedEvents Events1 INNER JOIN
OrderedEvents Events2
ON Events1.USER_ID = Events2.USER_ID AND Events1.INTERVAL_TYPE_ID = Events2.INTERVAL_TYPE_ID AND Events1.EVENT_ORDER + 1 = Events2.EVENT_ORDER
This query gives following results:
Now we should transform these pairs of neighbor events to intervals based on the logic you described. Previous output has columns entry_type1
and entry_type2
which could take values of INIT_INTERVAL
or END_INTERVAL
.
The possible combinations are:
<INIT_INTERVAL, END_INTERVAL>
- this is the most natural case when INIT_INTERVAL
is followed by END_INTERVAL
. We take event values as is.<INIT_INTERVAL(1), INIT_INTERVAL(2)>
- the case of two consecutive INIT_INTERVAL
. We force ending of the interval by taking <INIT_INTERVAL(1), NULL>
. INIT_INTERVAL(2)
will be taken with the next pair when it will be in the first entry.<END_INTERVAL(1), END_INTERVAL(2)>
- the case of two consecutive END_INTERVAL
. We force start of the interval by taking <NULL, END_INTERVAL(2)>
. END_INTERVAL(1)
is processed either by case #1 or by the current case when it is the second entry in the pair.<END_INTERVAL, INIT_INTERVAL>
- such pairs are just skipped. END_INTERVAL
is taken either by case #1 or case #3. INIT_INTERVAL
is taken either by case #1 or case #2.All this logic is put into set of CASE
expressions. There are 4 such expressions with duplicated conditions, because we conditionally select 4 different columns (INTERVAL_TIMESTAMP_INIT
, INTERVAL_TIMESTAMP_END
, INTERVAL_ID_INIT
and INTERVAL_ID_END
) which could not be done with one CASE
expression.
The final output is the same as you described:
You can use the INTERVAL_ID (or a new column with a generated row_number) to join two instances of the same table, using as predicate something like this:
on a.INTERVAL_ID=b.INTERVAL_ID + 1
This way, you can compare and get in 1 line each record with the next one.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With