Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL GROUP BY DateTime +/- 3 seconds

Suppose I have a table with 3 columns:

  • id (PK, int)
  • timestamp (datetime)
  • title (text)

I have the following records:

1, 2010-01-01 15:00:00, Some Title
2, 2010-01-01 15:00:02, Some Title
3, 2010-01-02 15:00:00, Some Title

I need to do a GROUP BY records that are within 3 seconds of each other. For this table, rows 1 and 2 would be grouped together.

There is a similar question here: Mysql DateTime group by 15 mins

I also found this: http://www.artfulsoftware.com/infotree/queries.php#106

I don't know how to convert these methods into something that will work for seconds. The trouble with the method on the SO question is that it seems to me that it would only work for records falling within a bin of time that starts at a known point. For instance, if I were to get FLOOR() to work with seconds, at an interval of 5 seconds, a time of 15:00:04 would be grouped with 15:00:01, but not grouped with 15:00:06.

Does this make sense? Please let me know if further clarification is needed.

EDIT: For the set of numbers, {1, 2, 3, 4, 5, 6, 7, 50, 51, 60}, it seems it might be best to group them {1, 2, 3, 4, 5, 6, 7}, {50, 51}, {60}, so that each grouping row depends on if the row is within 3 seconds of the previous. I know this changes things a bit, I'm sorry for being wishywashy on this.

I am trying to fuzzy-match logs from different servers. Server #1 may log an item, "Item #1", and Server #2 will log that same item, "Item #1", within a few seconds of server #1. I need to do some aggregate functions on both log lines. Unfortunately, I only have title to go on, due to the nature of the server software.

like image 823
Brad Avatar asked Jul 01 '11 17:07

Brad


People also ask

How do I Group A timestamp by month?

You can group month and year with the help of function DATE_FORMAT() in MySQL. The GROUP BY clause is also used.

How to change date format in MySQL?

Change the curdate() (current date) format in MySQL The current date format is 'YYYY-mm-dd'. To change current date format, you can use date_format().

How do I group data in MySQL?

The MySQL GROUP BY Statement The GROUP BY statement groups rows that have the same values into summary rows, like "find the number of customers in each country". The GROUP BY statement is often used with aggregate functions ( COUNT() , MAX() , MIN() , SUM() , AVG() ) to group the result-set by one or more columns.


2 Answers

I'm using Tom H.'s excellent idea but doing it a little differently here:

Instead of finding all the rows that are the beginnings of chains, we can find all times that are the beginnings of chains, then go back and ifnd the rows that match the times.

Query #1 here should tell you which times are the beginnings of chains by finding which times do not have any times below them but within 3 seconds:

SELECT DISTINCT Timestamp
FROM Table a
LEFT JOIN Table b
ON (b.Timestamp >= a.TimeStamp - INTERVAL 3 SECONDS
    AND b.Timestamp < a.Timestamp)
WHERE b.Timestamp IS NULL

And then for each row, we can find the largest chain-starting timestamp that is less than our timestamp with Query #2:

SELECT Table.id, MAX(StartOfChains.TimeStamp) AS ChainStartTime
FROM Table
JOIN ([query #1]) StartofChains
ON Table.Timestamp >= StartOfChains.TimeStamp
GROUP BY Table.id

Once we have that, we can GROUP BY it as you wanted.

SELECT COUNT(*) --or whatever
FROM Table
JOIN ([query #2]) GroupingQuery
ON Table.id = GroupingQuery.id
GROUP BY GroupingQuery.ChainStartTime

I'm not entirely sure this is distinct enough from Tom H's answer to be posted separately, but it sounded like you were having trouble with implementation, and I was thinking about it, so I thought I'd post again. Good luck!

like image 192
Chris Cunningham Avatar answered Oct 12 '22 01:10

Chris Cunningham


Now that I think that I understand your problem, based on your comment response to OMG Ponies, I think that I have a set-based solution. The idea is to first find the start of any chains based on the title. The start of a chain is going to be defined as any row where there is no match within three seconds prior to that row:

SELECT
    MT1.my_id,
    MT1.title,
    MT1.my_time
FROM
    My_Table MT1
LEFT OUTER JOIN My_Table MT2 ON
    MT2.title = MT1.title AND
    (
        MT2.my_time < MT1.my_time OR
        (MT2.my_time = MT1.my_time AND MT2.my_id < MT1.my_id)
    ) AND
    MT2.my_time >= MT1.my_time - INTERVAL 3 SECONDS
WHERE
    MT2.my_id IS NULL

Now we can assume that any non-chain starters belong to the chain starter that appeared before them. Since MySQL doesn't support CTEs, you might want to throw the above results into a temporary table, as that would save you the multiple joins to the same subquery below.

SELECT
    SQ1.my_id,
    COUNT(*)  -- You didn't say what you were trying to calculate, just that you needed to group them
FROM
(
    SELECT
        MT1.my_id,
        MT1.title,
        MT1.my_time
    FROM
        My_Table MT1
    LEFT OUTER JOIN My_Table MT2 ON
        MT2.title = MT1.title AND
        (
            MT2.my_time < MT1.my_time OR
            (MT2.my_time = MT1.my_time AND MT2.my_id < MT1.my_id)
        ) AND
        MT2.my_time >= MT1.my_time - INTERVAL 3 SECONDS
    WHERE
        MT2.my_id IS NULL
) SQ1
INNER JOIN My_Table MT3 ON
    MT3.title = SQ1.title AND
    MT3.my_time >= SQ1.my_time
LEFT OUTER JOIN
(
    SELECT
        MT1.my_id,
        MT1.title,
        MT1.my_time
    FROM
        My_Table MT1
    LEFT OUTER JOIN My_Table MT2 ON
        MT2.title = MT1.title AND
        (
            MT2.my_time < MT1.my_time OR
            (MT2.my_time = MT1.my_time AND MT2.my_id < MT1.my_id)
        ) AND
        MT2.my_time >= MT1.my_time - INTERVAL 3 SECONDS
    WHERE
        MT2.my_id IS NULL
) SQ2 ON
    SQ2.title = SQ1.title AND
    SQ2.my_time > SQ1.my_time AND
    SQ2.my_time <= MT3.my_time
WHERE
    SQ2.my_id IS NULL

This would look much simpler if you could use CTEs or if you used a temporary table. Using the temporary table might also help performance.

Also, there will be issues with this if you can have timestamps that match exactly. If that's the case then you will need to tweak the query slightly to use a combination of the id and the timestamp to distinguish rows with matching timestamp values.

EDIT: Changed the queries to handle exact matches by timestamp.

like image 42
Tom H Avatar answered Oct 12 '22 01:10

Tom H