I have an issue in Teradata where I am trying to build a historical contract table that lists a system, it's corresponding contracts and the start and end dates of each contract. This table would then be queried for reporting as a point in time table. Here is some code to better explain.
CREATE TABLE TMP_WORK_DB.SOLD_SYSTEMS
(
SYSTEM_ID varchar(5),
CONTRACT_TYPE varchar(10),
CONTRACT_RANK int,
CONTRACT_STRT_DT date,
CONTRACT_END_DT date
);
INSERT INTO TMP_WORK_DB.SOLD_SYSTEMS VALUES ('AAA', 'BEST', 10, '2012-01-01', '2012-06-30');
INSERT INTO TMP_WORK_DB.SOLD_SYSTEMS VALUES ('AAA', 'BEST', 9, '2012-01-01', '2012-06-30');
INSERT INTO TMP_WORK_DB.SOLD_SYSTEMS VALUES ('AAA', 'OK', 1, '2012-08-01', '2012-12-30');
INSERT INTO TMP_WORK_DB.SOLD_SYSTEMS VALUES ('BBB', 'BEST', 10, '2013-12-01', '2014-03-02');
INSERT INTO TMP_WORK_DB.SOLD_SYSTEMS VALUES ('BBB', 'BETTER', 7, '2013-12-01', '2017-03-02');
INSERT INTO TMP_WORK_DB.SOLD_SYSTEMS VALUES ('BBB', 'GOOD', 4, '2016-12-02', '2017-12-02');
INSERT INTO TMP_WORK_DB.SOLD_SYSTEMS VALUES ('CCC', 'BEST', 10, '2009-10-13', '2014-10-14');
INSERT INTO TMP_WORK_DB.SOLD_SYSTEMS VALUES ('CCC', 'BETTER', 7, '2009-10-13', '2016-10-14');
INSERT INTO TMP_WORK_DB.SOLD_SYSTEMS VALUES ('CCC', 'OK', 2, '2008-10-13', '2017-10-14');
The required output would be:
SYSTEM_ID CONTRACT_TYPE CONTRACT_STRT_DT CONTARCT_END_DT CONTRACT_RANK
AAA BEST 01/01/2012 06/30/2012 10
AAA OK 08/01/2012 12/30/2012 1
BBB BEST 12/01/2013 03/02/2014 10
BBB BETTER 03/03/2014 03/02/2017 7
BBB GOOD 03/03/2017 12/02/2017 4
CCC OK 10/13/2008 10/12/2009 2
CCC BEST 10/13/2009 10/14/2014 10
CCC BETTER 10/15/2014 10/14/2016 7
CCC OK 10/15/2016 10/14/2017 2
I'm not necessarily looking to reduce rows but am looking to get the correct state of the system_id at any given point in time. Note that when a higher ranked contract ends and a lower ranked contract is still active the lower ranked picks up where the higher one left off.
We are using TD 14 and I have been able to get the easy records where the dates flow sequentially and are of higher rank but am having trouble with the overlaps where two different ranked contracts cover multiple date spans.
I found this blog post (Sharpening Stones) and got it working for the most part but am still having trouble setting the new start dates for the overlapping contracts.
Any help would be appreciated. Thanks.
*UPDATE 04/04/2014 *
I came up with the following code which gives me exactly what I want but I'm not sure of the performance. It works on smaller data sets of a few hundred rows but I havent tested it on several million:
*UPDATE 04/07/2014 * Updated the date subquery due to spool issues. This query explodes all days where the contract is possibly active and then uses the ROW_NUMBER function to get the highest ranked CONTRACT_TYPE per day. The MIN/MAX functions are then partitioned over the system and contract type to pick up when the highest ranked contract type changes.
*UPDATE - 2 - 04/07/2014 * I cleaned up the query and it seems to be perform a little better.
SELECT
SYSTEM_ID
, CONTRACT_TYPE
, MIN(CALENDAR_DATE) NEW_START_DATE
, MAX(CALENDAR_DATE) NEW_END_DATE
, CONTRACT_RANK
FROM (
SELECT
CALENDAR_DATE
, SYSTEM_ID
, CONTRACT_TYPE
, CONTRACT_RANK
, ROW_NUMBER() OVER (PARTITION BY SYSTEM_ID, CALENDAR_DATE ORDER BY CONTRACT_RANK DESC, CONTRACT_STRT_DT DESC, CONTRACT_END_DT DESC) AS RNK
FROM SOLD_SYSTEMS t1
JOIN (
SELECT CALENDAR_DATE
FROM FULL_CALENDAR_TABLE ia
WHERE CALENDAR_DATE > DATE'2013-01-01'
)dt
ON CALENDAR_DATE BETWEEN CONTRACT_STRT_DT AND CONTRACT_END_DT
QUALIFY RNK = 1
)z1
GROUP BY 1,2,5
Following approach uses the new PERIOD functions in TD13.10.
-- 1. TD_SEQUENCED_COUNT can't be used in joins, so create a Volatile Table
-- 2. TD_SEQUENCED_COUNT can't use additional columns (e.g. CONTRACT_RANK),
-- so simply create a new row whenever a period starts or ends without
-- considering CONTRACT_RANK
CREATE VOLATILE TABLE vt AS
(
WITH cte
(
SYSTEM_ID
,pd
)
AS
(
SELECT
SYSTEM_ID
-- PERIODs can easily be constructed on-the-fly, but the end date is not inclusive,
-- so I had to adjust to your implementation, CONTRACT_END_DT +/- 1:
,PERIOD(CONTRACT_STRT_DT, CONTRACT_END_DT + 1) AS pd
FROM SOLD_SYSTEMS
)
SELECT
SYSTEM_ID
,BEGIN(pd) AS CONTRACT_STRT_DT
,END(pd) - 1 AS CONTRACT_END_DT
FROM
TABLE (TD_SEQUENCED_COUNT
(NEW VARIANT_TYPE(cte.SYSTEM_ID)
,cte.pd)
RETURNS (SYSTEM_ID VARCHAR(5)
,Policy_Count INTEGER
,pd PERIOD(DATE))
HASH BY SYSTEM_ID
LOCAL ORDER BY SYSTEM_ID ,pd) AS dt
)
WITH DATA
PRIMARY INDEX (SYSTEM_ID)
ON COMMIT PRESERVE ROWS
;
-- Find the matching CONTRACT_RANK
SELECT
vt.SYSTEM_ID
,t.CONTRACT_TYPE
,vt.CONTRACT_STRT_DT
,vt.CONTRACT_END_DT
,t.CONTRACT_RANK
FROM vt
-- If both vt and SOLD_SYSTEMS have a NUPI on SYSTEM_ID this join should be
-- quite efficient
JOIN SOLD_SYSTEMS AS t
ON vt.SYSTEM_ID = t.SYSTEM_ID
AND ( t.CONTRACT_STRT_DT, t.CONTRACT_END_DT)
OVERLAPS (vt.CONTRACT_STRT_DT, vt.CONTRACT_END_DT)
QUALIFY
-- As multiple contracts for the same period are possible:
-- find the row with the highest rank
ROW_NUMBER()
OVER (PARTITION BY vt.SYSTEM_ID,vt.CONTRACT_STRT_DT
ORDER BY t.CONTRACT_RANK DESC, vt.CONTRACT_END_DT DESC) = 1
ORDER BY 1,3
;
-- Previous query might return consecutive rows with the same CONTRACT_RANK, e.g.
-- BBB BETTER 2014-03-03 2016-12-01 7
-- BBB BETTER 2016-12-02 2017-03-02 7
-- If you don't want that you have to normalize the data:
WITH cte
(
SYSTEM_ID
,CONTRACT_STRT_DT
,CONTRACT_END_DT
,CONTRACT_RANK
,CONTRACT_TYPE
,pd
)
AS
(
SELECT
vt.SYSTEM_ID
,vt.CONTRACT_STRT_DT
,vt.CONTRACT_END_DT
,t.CONTRACT_RANK
,t.CONTRACT_TYPE
,PERIOD(vt.CONTRACT_STRT_DT, vt.CONTRACT_END_DT + 1) AS pd
FROM vt
JOIN SOLD_SYSTEMS AS t
ON vt.SYSTEM_ID = t.SYSTEM_ID
AND ( t.CONTRACT_STRT_DT, t.CONTRACT_END_DT)
OVERLAPS (vt.CONTRACT_STRT_DT, vt.CONTRACT_END_DT)
QUALIFY
ROW_NUMBER()
OVER (PARTITION BY vt.SYSTEM_ID,vt.CONTRACT_STRT_DT
ORDER BY t.CONTRACT_RANK DESC, vt.CONTRACT_END_DT DESC) = 1
)
SELECT
SYSTEM_ID
,CONTRACT_TYPE
,BEGIN(pd) AS CONTRACT_STRT_DT
,END(pd) - 1 AS CONTRACT_END_DT
,CONTRACT_RANK
FROM
TABLE (TD_NORMALIZE_MEET
(NEW VARIANT_TYPE(cte.SYSTEM_ID
,cte.CONTRACT_RANK
,cte.CONTRACT_TYPE)
,cte.pd)
RETURNS (SYSTEM_ID VARCHAR(5)
,CONTRACT_RANK INT
,CONTRACT_TYPE VARCHAR(10)
,pd PERIOD(DATE))
HASH BY SYSTEM_ID
LOCAL ORDER BY SYSTEM_ID, CONTRACT_RANK, CONTRACT_TYPE, pd ) A
ORDER BY 1, 3;
Edit: This is another way to get the result of the 2nd query without Volatile Table and TD_SEQUENCED_COUNT:
SELECT
t.SYSTEM_ID
,t.CONTRACT_TYPE
,BEGIN(CONTRACT_PERIOD) AS CONTRACT_STRT_DT
,END(CONTRACT_PERIOD)- 1 AS CONTRACT_END_DT
,t.CONTRACT_RANK
,dt.p P_INTERSECT PERIOD(t.CONTRACT_STRT_DT,t.CONTRACT_END_DT + 1) AS CONTRACT_PERIOD
FROM
(
SELECT
dt.SYSTEM_ID
,PERIOD(d, MIN(d)
OVER (PARTITION BY dt.SYSTEM_ID
ORDER BY d
ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)) AS p
FROM
(
SELECT
SYSTEM_ID
,CONTRACT_STRT_DT AS d
FROM SOLD_SYSTEMS
UNION
SELECT
SYSTEM_ID
,CONTRACT_END_DT + 1 AS d
FROM SOLD_SYSTEMS
) AS dt
QUALIFY p IS NOT NULL
) AS dt
JOIN SOLD_SYSTEMS AS t
ON dt.SYSTEM_ID = t.SYSTEM_ID
WHERE CONTRACT_PERIOD IS NOT NULL
QUALIFY
ROW_NUMBER()
OVER (PARTITION BY dt.SYSTEM_ID,p
ORDER BY t.CONTRACT_RANK DESC, t.CONTRACT_END_DT DESC) = 1
ORDER BY 1,3
And based on that you can also include the normalization in a single query:
WITH cte
(
SYSTEM_ID
,CONTRACT_TYPE
,CONTRACT_STRT_DT
,CONTRACT_END_DT
,CONTRACT_RANK
,pd
)
AS
(
SELECT
t.SYSTEM_ID
,t.CONTRACT_TYPE
,BEGIN(CONTRACT_PERIOD) AS CONTRACT_STRT_DT
,END(CONTRACT_PERIOD)- 1 AS CONTRACT_END_DT
,t.CONTRACT_RANK
,dt.p P_INTERSECT PERIOD(t.CONTRACT_STRT_DT,t.CONTRACT_END_DT + 1) AS CONTRACT_PERIOD
FROM
(
SELECT
dt.SYSTEM_ID
,PERIOD(d, MIN(d)
OVER (PARTITION BY dt.SYSTEM_ID
ORDER BY d
ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)) AS p
FROM
(
SELECT
SYSTEM_ID
,CONTRACT_STRT_DT AS d
FROM SOLD_SYSTEMS
UNION
SELECT
SYSTEM_ID
,CONTRACT_END_DT + 1 AS d
FROM SOLD_SYSTEMS
) AS dt
QUALIFY p IS NOT NULL
) AS dt
JOIN SOLD_SYSTEMS AS t
ON dt.SYSTEM_ID = t.SYSTEM_ID
WHERE CONTRACT_PERIOD IS NOT NULL
QUALIFY
ROW_NUMBER()
OVER (PARTITION BY dt.SYSTEM_ID,p
ORDER BY t.CONTRACT_RANK DESC, t.CONTRACT_END_DT DESC) = 1
)
SELECT
SYSTEM_ID
,CONTRACT_TYPE
,BEGIN(pd) AS CONTRACT_STRT_DT
,END(pd) - 1 AS CONTRACT_END_DT
,CONTRACT_RANK
FROM
TABLE (TD_NORMALIZE_MEET
(NEW VARIANT_TYPE(cte.SYSTEM_ID
,cte.CONTRACT_RANK
,cte.CONTRACT_TYPE)
,cte.pd)
RETURNS (SYSTEM_ID VARCHAR(5)
,CONTRACT_RANK INT
,CONTRACT_TYPE VARCHAR(10)
,pd PERIOD(DATE))
HASH BY SYSTEM_ID
LOCAL ORDER BY SYSTEM_ID, CONTRACT_RANK, CONTRACT_TYPE, pd ) A
ORDER BY 1, 3;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With