Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL query challenge

So here's yet another 'write a query to X' challenge.

I'm monitoring a number of networked vending machines. Each machine has a number of parts, e.g. bank note acceptor, coin system, printer and so on.

Problems with machine parts are logged in table, let's call it 'faults', which looks something like this (irrelevant fields omitted):

machineid           partid         start_time            end_time
---------           ------         ----------------      ----------------
       1                2          2009-10-05 09:00      NULL
       1                3          2009-10-05 08:00      2009-10-05 10:00
       2                2          2009-09-30 12:00      2009-09-30 14:00
       3                4          2009-09-28 13:00      2009-09-28 15:00
       3                2          2009-09-28 12:00      2009-09-28 14:00

end_date is NULL if the problem is currently ongoing.

I need a query which show time periods for which the machine as a whole is down, and which can account for overlapping ranges, collapsing them down into a single record. So for the sample data above, it would produce:

machineid          start_time            end_time
---------          ----------------      ----------------
       1           2009-10-05 08:00      NULL
       2           2009-09-30 12:00      2009-09-30 14:00
       3           2009-09-28 12:00      2009-09-28 15:00

It's not tough to write procedural code to do this line by line, but a nice declarative SQL query would be more useful, more elegant. It seems like it ought to be possible, I just can't quite get there though.

SQL dialect is Oracle. Analytic functions are availabe if that would help.

Thanks!

like image 457
Res Cogitans Avatar asked Oct 05 '09 20:10

Res Cogitans


1 Answers

using analytics, you can build a query that will make a single pass on the data (with a large data set this will be the most efficient):

SELECT machineid, MIN(start_time), MAX(end_time)
  FROM (SELECT machineid, start_time, end_time, 
               SUM(gap) over(PARTITION BY machineid 
                             ORDER BY start_time) contiguous_faults
           FROM (SELECT machineid, start_time, 
                        coalesce(end_time, DATE '9999-12-31') end_time,
                         CASE
                            WHEN start_time > MAX(coalesce(end_time, 
                                                           DATE '9999-12-31'))
                                              over(PARTITION BY machineid 
                                                   ORDER BY start_time 
                                                   ROWS BETWEEN UNBOUNDED PRECEDING
                                                            AND 1 preceding)
                            THEN 1
                         END gap
                    FROM faults))
 GROUP BY machineid, contiguous_faults
 ORDER BY 1, 2

This query starts by determining if a row is contiguous to any row that started before. We then group the rows that are contiguous.

like image 147
Vincent Malgrat Avatar answered Oct 29 '22 06:10

Vincent Malgrat