
Oracle GROUP BY similar timestamps?

I have an activity table with a structure like this:

id  prd_id  act_dt               grp
------------------------------------
1   1       2000-01-01 00:00:00
2   1       2000-01-01 00:00:01
3   1       2000-01-01 00:00:02
4   2       2000-01-01 00:00:00
5   2       2000-01-01 00:00:01
6   2       2000-01-01 01:00:00
7   2       2000-01-01 01:00:01
8   3       2000-01-01 00:00:00
9   3       2000-01-01 00:00:01
10  3       2000-01-01 02:00:00

I want to split the data in this activity table by product (prd_id) and activity date (act_dt), and update the group (grp) column with a value from a sequence for each of these groups.

The kicker is that I need to group by similar timestamps, where "similar" means consecutive records are exactly 1 second apart. In other words, within a group, when the records are sorted by date, the difference between any two consecutive records is exactly 1 second. The difference between the first and last records can be any amount of time, as long as every intermediate step is 1 second.
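One way to read this definition is as a single pass over the rows sorted by prd_id and act_dt: a new group starts whenever the product changes or the gap to the previous row is anything other than exactly one second. The Python sketch below applies that rule to the sample rows. The helper names `ts` and `assign_groups` and the `(id, prd_id, act_dt)` tuple layout are illustrative assumptions; a row-by-row loop like this is only meant to pin down the rule, not to suggest how to process the real table.

```python
from datetime import datetime, timedelta

ONE_SECOND = timedelta(seconds=1)


def ts(text):
    # Parse the 'YYYY-MM-DD HH:MM:SS' format used in the table.
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def assign_groups(rows):
    """rows: (id, prd_id, act_dt) tuples; returns {id: grp}, numbered
    1, 2, 3, ... in (prd_id, act_dt) order."""
    groups = {}
    grp = 0
    prev_prd, prev_dt = None, None
    for row_id, prd_id, act_dt in sorted(rows, key=lambda r: (r[1], r[2])):
        # New group on a product change or any gap other than exactly 1 second.
        if prd_id != prev_prd or act_dt - prev_dt != ONE_SECOND:
            grp += 1
        groups[row_id] = grp
        prev_prd, prev_dt = prd_id, act_dt
    return groups


rows = [
    (1, 1, ts("2000-01-01 00:00:00")),
    (2, 1, ts("2000-01-01 00:00:01")),
    (3, 1, ts("2000-01-01 00:00:02")),
    (4, 2, ts("2000-01-01 00:00:00")),
    (5, 2, ts("2000-01-01 00:00:01")),
    (6, 2, ts("2000-01-01 01:00:00")),
    (7, 2, ts("2000-01-01 01:00:01")),
    (8, 3, ts("2000-01-01 00:00:00")),
    (9, 3, ts("2000-01-01 00:00:01")),
    (10, 3, ts("2000-01-01 02:00:00")),
]
print(assign_groups(rows))
# {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3, 8: 4, 9: 4, 10: 5}
```

The printed mapping matches the grp column in the expected output.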

For the example data, the groups would be:

id  prd_id  act_dt               grp
------------------------------------
1   1       2000-01-01 00:00:00  1
2   1       2000-01-01 00:00:01  1
3   1       2000-01-01 00:00:02  1
4   2       2000-01-01 00:00:00  2
5   2       2000-01-01 00:00:01  2
6   2       2000-01-01 01:00:00  3
7   2       2000-01-01 01:00:01  3
8   3       2000-01-01 00:00:00  4
9   3       2000-01-01 00:00:01  4
10  3       2000-01-01 02:00:00  5

What method would I use to accomplish this?

The size of the table is ~20 million rows, if that affects the method used to solve the problem.

Asked by FtDRbwLXw6, Apr 02 '12 16:04


1 Answer

I'm not an Oracle wiz, so I'm guessing at the best option for one line:

    ROUND((act_dt - DATE '2010-01-01') * 24 * 60 * 60)      AS time_id,

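This just needs to be "the number of seconds from [aDateConstant] to act_dt", that is, act_dt minus the constant, so the value grows as act_dt grows. The result can be negative; it just needs to be a whole number of seconds, which turns act_dt into an integer. (In Oracle, subtracting one DATE from another gives a number of days, so multiplying by 24 * 60 * 60 gives seconds.) The rest should work fine.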
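The conversion is just an offset in whole seconds from a fixed reference instant. A minimal Python sketch of the same arithmetic follows; the 2010-01-01 reference date comes from the answer, while the helper name `to_seconds` is hypothetical and the sketch assumes act_dt carries no fractional seconds. Subtracting the constant from act_dt (rather than the reverse) keeps consecutive seconds mapping to consecutive increasing integers.

```python
from datetime import datetime

BASE = datetime(2010, 1, 1)  # reference constant from the answer


def to_seconds(act_dt):
    # Whole seconds from BASE to act_dt; negative for dates before BASE.
    return int((act_dt - BASE).total_seconds())


print(to_seconds(datetime(2000, 1, 1, 0, 0, 0)))  # -315619200
print(to_seconds(datetime(2000, 1, 1, 0, 0, 1)))  # -315619199
```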

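WITH
  sequenced_data
AS
(
  SELECT
    -- 1, 2, 3, ... for each product, in time order
    ROW_NUMBER() OVER (PARTITION BY prd_id ORDER BY act_dt)      AS sequence_id,
    -- whole seconds since a fixed date; assumes act_dt is a DATE
    -- (for a TIMESTAMP column, use CAST(act_dt AS DATE) instead)
    ROUND((act_dt - DATE '2010-01-01') * 24 * 60 * 60)            AS time_id,
    t.*
  FROM
    yourTable t
)
SELECT
  -- time_id - sequence_id stays constant within a run of 1-second steps;
  -- ranking the (prd_id, difference) pairs numbers the groups 1, 2, 3, ...
  DENSE_RANK() OVER (ORDER BY prd_id, time_id - sequence_id) AS group_id,
  s.*
FROM
  sequenced_data s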
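To check the logic without a database, here is a Python sketch that mirrors the query step by step on the question's sample rows: number each product's rows in time order (the ROW_NUMBER step), subtract that number from the seconds value, and densely rank the distinct (prd_id, difference) pairs across the whole table, so the numbers run 1 to 5 as in the question's expected output. The function name `island_groups` is hypothetical, and the sketch assumes no repeated (prd_id, act_dt) pairs.

```python
from datetime import datetime

BASE = datetime(2010, 1, 1)  # the fixed date used in the query


def island_groups(rows):
    """rows: (id, prd_id, act_dt) tuples with no repeated (prd_id, act_dt).
    Returns {id: group_id}, mirroring the CTE-plus-DENSE_RANK query."""
    seq = {}
    keys = {}
    for row_id, prd_id, act_dt in sorted(rows, key=lambda r: (r[1], r[2])):
        # ROW_NUMBER() OVER (PARTITION BY prd_id ORDER BY act_dt)
        seq[prd_id] = seq.get(prd_id, 0) + 1
        # time_id: whole seconds since BASE
        time_id = int((act_dt - BASE).total_seconds())
        # Constant within a run of 1-second steps, jumps at any larger gap.
        keys[row_id] = (prd_id, time_id - seq[prd_id])
    # DENSE_RANK() OVER (ORDER BY prd_id, time_id - sequence_id)
    rank = {k: i for i, k in enumerate(sorted(set(keys.values())), start=1)}
    return {row_id: rank[k] for row_id, k in keys.items()}


SAMPLE = [
    (1, 1, "00:00:00"), (2, 1, "00:00:01"), (3, 1, "00:00:02"),
    (4, 2, "00:00:00"), (5, 2, "00:00:01"), (6, 2, "01:00:00"),
    (7, 2, "01:00:01"), (8, 3, "00:00:00"), (9, 3, "00:00:01"),
    (10, 3, "02:00:00"),
]
rows = [
    (i, p, datetime.strptime("2000-01-01 " + t, "%Y-%m-%d %H:%M:%S"))
    for i, p, t in SAMPLE
]
print(island_groups(rows))
# {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3, 8: 4, 9: 4, 10: 5}
```

Because every step is a set-based window calculation in the SQL version, this approach avoids row-by-row processing, which matters at the ~20 million row scale mentioned in the question.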

Example data for a single product, where t-s is time_id - sequence_id:

 sequence_id | time_id | t-s | group_id
-------------+---------+-----+----------
      1      |   1     |  0  |    1
      2      |   2     |  0  |    1
      3      |   3     |  0  |    1
      4      |   8     |  4  |    2
      5      |   9     |  4  |    2
      6      |   12    |  6  |    3
      7      |   14    |  7  |    4
      8      |   15    |  7  |    4
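The t-s and group_id columns in this table follow mechanically from time_id. A few lines of Python reproduce them from the table's own time_id values; plain lists stand in for the query's columns here, which is an illustrative simplification.

```python
time_ids = [1, 2, 3, 8, 9, 12, 14, 15]  # time_id column from the example

# sequence_id is 1, 2, 3, ..., so t-s is time_id minus its 1-based position.
t_minus_s = [t - s for s, t in enumerate(time_ids, start=1)]

# DENSE_RANK over t-s: each distinct value gets the next rank, with no gaps.
rank = {v: i for i, v in enumerate(sorted(set(t_minus_s)), start=1)}
group_ids = [rank[v] for v in t_minus_s]

print(t_minus_s)  # [0, 0, 0, 4, 4, 6, 7, 7]
print(group_ids)  # [1, 1, 1, 2, 2, 3, 4, 4]
```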


NOTE: This assumes there are no multiple records with the same prd_id and act_dt. If there are, they need to be collapsed first, probably with a GROUP BY in a preceding CTE.
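Why duplicates matter: with ROW_NUMBER, a repeated timestamp still advances sequence_id, so time_id - sequence_id drops by one and the run gets split. Following the note, one way around this is to number the distinct (prd_id, act_dt) pairs first (what a GROUP BY or SELECT DISTINCT in a preceding CTE would do) and then give every original row the group of its pair. A Python sketch under that assumption; the function name `groups_with_duplicates` and the sample rows are hypothetical.

```python
from datetime import datetime

BASE = datetime(2010, 1, 1)


def groups_with_duplicates(rows):
    """rows: (id, prd_id, act_dt) tuples, possibly with repeated (prd_id, act_dt)."""
    # Equivalent of GROUP BY prd_id, act_dt: keep each pair once, in order.
    distinct = sorted({(prd_id, act_dt) for _, prd_id, act_dt in rows})
    seq = {}
    keys = {}
    for prd_id, act_dt in distinct:
        seq[prd_id] = seq.get(prd_id, 0) + 1
        time_id = int((act_dt - BASE).total_seconds())
        keys[(prd_id, act_dt)] = (prd_id, time_id - seq[prd_id])
    rank = {k: i for i, k in enumerate(sorted(set(keys.values())), start=1)}
    # Join back: every row, duplicates included, takes its pair's group.
    return {row_id: rank[keys[(prd_id, act_dt)]] for row_id, prd_id, act_dt in rows}


rows = [
    (1, 1, datetime(2000, 1, 1, 0, 0, 0)),
    (2, 1, datetime(2000, 1, 1, 0, 0, 0)),  # duplicate timestamp
    (3, 1, datetime(2000, 1, 1, 0, 0, 1)),
    (4, 1, datetime(2000, 1, 1, 0, 0, 5)),
]
print(groups_with_duplicates(rows))
# {1: 1, 2: 1, 3: 1, 4: 2}
```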

Answered by MatBailie, Oct 16 '22 05:10