I have an activity table with a structure like this:
id prd_id act_dt grp
------------------------------------
1 1 2000-01-01 00:00:00
2 1 2000-01-01 00:00:01
3 1 2000-01-01 00:00:02
4 2 2000-01-01 00:00:00
5 2 2000-01-01 00:00:01
6 2 2000-01-01 01:00:00
7 2 2000-01-01 01:00:01
8 3 2000-01-01 00:00:00
9 3 2000-01-01 00:00:01
10 3 2000-01-01 02:00:00
I want to split the data within this activity table by product (prd_id
) and activity date (act_dt
), and update the the group (grp
) column with a value from a sequence for each of these groups.
The kicker is, I need to group by similar timestamps, where similar means "all records have a difference of exactly 1 second." In other words, within a group, the difference between any 2 records when sorted by date will be exactly 1 second, and the difference between the first and last records can be any amount of time, so long as all the intermediary records are 1 second apart.
For the example data, the groups would be:
id prd_id act_dt grp
------------------------------------
1 1 2000-01-01 00:00:00 1
2 1 2000-01-01 00:00:01 1
3 1 2000-01-01 00:00:02 1
4 2 2000-01-01 00:00:00 2
5 2 2000-01-01 00:00:01 2
6 2 2000-01-01 01:00:00 3
7 2 2000-01-01 01:00:01 3
8 3 2000-01-01 00:00:00 4
9 3 2000-01-01 00:00:01 4
10 3 2000-01-01 02:00:00 5
What method would I use to accomplish this?
The size of the table is ~20 million rows, if that affects the method used to solve the problem.
I'm not an Oracle wiz, so I'm guessing at the best option for one line:
(CAST('2010-01-01' AS DATETIME) - act_dt) * 24 * 60 * 60 AS time_id,
This just needs to be "the number of seconds from [aDateConstant] to act_dt". The result can be negative. It just needs to be a the number of seconds, to turn your act_dt
into an INT. The rest should work fine.
WITH
sequenced_data
AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY prd_id ORDER BY act_dt) AS sequence_id,
(CAST('2010-01-01' AS DATETIME) - act_dt) * 24 * 60 * 60 AS time_id,
*
FROM
yourTable
)
SELECT
DENSE_RANK() OVER (PARTITION BY prd_id ORDER BY time_id - sequence_id) AS group_id,
*
FROM
sequenced_data
Example data:
sequence_id | time_id | t-s | group_id
-------------+---------+-----+----------
1 | 1 | 0 | 1
2 | 2 | 0 | 1
3 | 3 | 0 | 1
4 | 8 | 4 | 2
5 | 9 | 4 | 2
6 | 12 | 6 | 3
7 | 14 | 7 | 4
8 | 15 | 7 | 4
NOTE: This does assume there are not multiple records with the same time. If there are, they would need to be filtered out first. Probably just using a GROUP BY in a preceding CTE.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With