Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find missing time intervals in a table

I have the following table which contains values read every 15 minutes from several different devices:

ID   DeviceID   Date                    Value
----------------------------------------------
1    3          24.08.2011 00:00:00     0.51
2    3          24.08.2011 00:15:00     2.9
3    3          24.08.2011 00:30:00     0
4    3          24.08.2011 00:45:00     7.1
5    3          24.08.2011 01:00:00     1.05
6    3          24.08.2011 03:15:00     3.8

I'd like to find all the gaps in the table for each device, where there are no entries, for a given month. For the table above, the result should be something like this:

DeviceID    StartDate               EndDate
-------------------------------------------------------
3           24.08.2011 01:00:00     24.08.2011 03:15:00

The table has roughly 35000 devices and 100 million entries.

This is what I tried; it's rather slow, but returns what I need. However, besides its speed, there's another problem: it only finds missing intervals up to the last entry for a device in the given month; anything after that will be ignored, so it's possible to miss an extra interval of missing values.

SELECT
    t2.Date AS StartDate
    , t1.Date AS EndDate
FROM
    TestTable t1
    INNER JOIN TestTable t2 ON t1.DeviceID = t2.DeviceID
WHERE
    (t2.Date = (SELECT MAX(Date) FROM TestTable t3 WHERE t3.Date < t1.Date AND t3.DeviceID = t1.DeviceID)
        AND DATEDIFF(MINUTE, t2.Date, t1.Date) > 15)
    AND t1.DeviceID = @id
    AND DATEPART(YEAR, t1.Date) = @year AND DATEPART(MONTH, t1.Date) = @month
like image 513
alex Avatar asked Jun 06 '12 06:06

alex


People also ask

How do I fill missing values in SQL?

Using the SQL COALESCE function, it is easy to replace missing or NULL values in SELECT statements. Specific values can be set directly with COALESCE and the mean, median or mode can be used by combining COALESCE with WINDOW functions.


1 Answers

Following should work and doesn't return just a single record for a deviceid.

The gist of this is to

  • Add a rownumber to each record, ordered by Date and restarting for each DeviceID.
  • Join with self to create a result with rows consisting of the combination of two original rows. The relation between the columns of each row is the rownumber (+1) and the DeviceID.
  • Only retain those rows where the related Date is more than 15 minutes.

SQL Statement

;WITH t AS (
  SELECT  *, rn = ROW_NUMBER() OVER (PARTITION BY DeviceID ORDER BY Date)
  FROM    TestTable
)  
SELECT  t1.DeviceID, t1.Date, t2.Date
FROM    t t1
        INNER JOIN t t2 ON t2.DeviceID = t1.DeviceID AND t2.rn = t1.rn + 1
WHERE   DATEDIFF(MINUTE, t1.Date, t2.Date) > 15        

Test script

;WITH TestTable (ID, DeviceID, Date, Value) AS (
  SELECT 1, 3, '2011-08-24 00:00:00', 0.51 UNION ALL
  SELECT 2, 3, '2011-08-24 00:15:00', 2.9 UNION ALL
  SELECT 3, 3, '2011-08-24 00:30:00', 0 UNION ALL
  SELECT 4, 3, '2011-08-24 00:45:00', 7.1 UNION ALL
  SELECT 5, 3, '2011-08-24 01:00:00', 1.05 UNION ALL
  SELECT 6, 3, '2011-08-24 03:15:00', 3.8 
)
, t AS (
  SELECT  *, rn = ROW_NUMBER() OVER (PARTITION BY DeviceID ORDER BY Date)
  FROM    TestTable
)  
SELECT  t1.DeviceID, t1.Date, t2.Date
FROM    t t1
        INNER JOIN t t2 ON t2.DeviceID = t1.DeviceID AND t2.rn = t1.rn + 1
WHERE   DATEDIFF(MINUTE, t1.Date, t2.Date) > 15        
like image 158
Lieven Keersmaekers Avatar answered Sep 25 '22 06:09

Lieven Keersmaekers