
(self) join by time intervals

I have a table in an Oracle database. The schema is:

create table PERIODS
( 
  ID NUMBER, 
  STARTTIME TIMESTAMP, 
  ENDTIME TIMESTAMP, 
  TYPE VARCHAR2(100)
)

I have two different TYPEs: TYPEA and TYPEB. They have independent start and end times, and they can overlap. What I would like to find are the TYPEB periods that started within, ended within, or are fully contained in a given TYPEA period.

Here is what I have come up with so far (with some sample data):

WITH mydata 
     AS (SELECT 100                                                    ID, 
                To_timestamp('2015-08-01 11:00', 'YYYY-MM-DD HH24:MI') STARTTIME, 
                To_timestamp('2015-08-01 11:20', 'YYYY-MM-DD HH24:MI') ENDTIME, 
                'TYPEA'                                                TYPE 
         FROM   dual 
         UNION ALL 
         SELECT 110                                                    ID, 
                To_timestamp('2015-08-01 11:30', 'YYYY-MM-DD HH24:MI') STARTTIME, 
                To_timestamp('2015-08-01 11:50', 'YYYY-MM-DD HH24:MI') ENDTIME, 
                'TYPEA'                                                TYPE 
         FROM   dual 
         UNION ALL 
         SELECT 120                                                    ID, 
                To_timestamp('2015-08-01 12:00', 'YYYY-MM-DD HH24:MI') STARTTIME, 
                To_timestamp('2015-08-01 12:20', 'YYYY-MM-DD HH24:MI') ENDTIME, 
                'TYPEA'                                                TYPE 
         FROM   dual 
         UNION ALL 
         SELECT 105                                                    ID, 
                To_timestamp('2015-08-01 10:55', 'YYYY-MM-DD HH24:MI') STARTTIME, 
                To_timestamp('2015-08-01 11:05', 'YYYY-MM-DD HH24:MI') ENDTIME, 
                'TYPEB'                                                TYPE 
         FROM   dual 
         UNION ALL 
         SELECT 108                                                    ID, 
                To_timestamp('2015-08-01 11:05', 'YYYY-MM-DD HH24:MI') STARTTIME, 
                To_timestamp('2015-08-01 11:15', 'YYYY-MM-DD HH24:MI') ENDTIME, 
                'TYPEB'                                                TYPE 
         FROM   dual 
         UNION ALL 
         SELECT 111                                                    ID, 
                To_timestamp('2015-08-01 11:15', 'YYYY-MM-DD HH24:MI') STARTTIME, 
                To_timestamp('2015-08-01 12:25', 'YYYY-MM-DD HH24:MI') ENDTIME, 
                'TYPEB'                                                TYPE 
         FROM   dual), 
     typeas 
     AS (SELECT starttime, 
                endtime 
         FROM   mydata 
         WHERE  TYPE = 'TYPEA'), 
     typebs 
     AS (SELECT id, 
                starttime, 
                endtime 
         FROM   mydata 
         WHERE  TYPE = 'TYPEB') 
SELECT id 
FROM   typebs b 
       join typeas a 
         ON ( b.starttime BETWEEN a.starttime AND a.endtime ) 
             OR ( b.starttime BETWEEN a.starttime AND a.endtime 
                  AND b.endtime BETWEEN a.starttime AND a.endtime ) 
             OR ( b.endtime BETWEEN a.starttime AND a.endtime ) 
ORDER  BY id; 

This seems to work in principle; the result of the query above is

        ID
----------
       105
       108
       111

so it selects the three TYPEB periods that started or ended inside the first TYPEA period.
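To make the pairing visible, the same join can also return the id of the TYPEA period that each TYPEB period matched. The following is only a sketch on a compact copy of the sample data: it uses `TIMESTAMP` literals instead of `To_timestamp`, the aliases `b_id` and `a_id` are made up for illustration, and the middle branch of the original condition is left out because the first branch already implies it.

```sql
-- Sketch: list (TYPEB id, TYPEA id) pairs for the sample data.
WITH mydata AS (
  SELECT 100 id, TIMESTAMP '2015-08-01 11:00:00' starttime,
         TIMESTAMP '2015-08-01 11:20:00' endtime, 'TYPEA' type FROM dual
  UNION ALL
  SELECT 110, TIMESTAMP '2015-08-01 11:30:00',
         TIMESTAMP '2015-08-01 11:50:00', 'TYPEA' FROM dual
  UNION ALL
  SELECT 120, TIMESTAMP '2015-08-01 12:00:00',
         TIMESTAMP '2015-08-01 12:20:00', 'TYPEA' FROM dual
  UNION ALL
  SELECT 105, TIMESTAMP '2015-08-01 10:55:00',
         TIMESTAMP '2015-08-01 11:05:00', 'TYPEB' FROM dual
  UNION ALL
  SELECT 108, TIMESTAMP '2015-08-01 11:05:00',
         TIMESTAMP '2015-08-01 11:15:00', 'TYPEB' FROM dual
  UNION ALL
  SELECT 111, TIMESTAMP '2015-08-01 11:15:00',
         TIMESTAMP '2015-08-01 12:25:00', 'TYPEB' FROM dual
)
SELECT b.id AS b_id,   -- the TYPEB period
       a.id AS a_id    -- the TYPEA period it starts or ends in
FROM   mydata b
       JOIN mydata a
         ON  a.type = 'TYPEA'
         AND (   b.starttime BETWEEN a.starttime AND a.endtime
              OR b.endtime   BETWEEN a.starttime AND a.endtime)
WHERE  b.type = 'TYPEB'
ORDER  BY b.id, a.id;
```

On this data, each of 105, 108 and 111 pairs only with period 100. Period 111 spans periods 110 and 120 entirely but neither starts nor ends inside them, so those two are not matched.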

The problem is that the table has about 200k entries, and already at this size the above query is quite slow. This is very surprising to me, as the number of both TYPEA and TYPEB entries is quite low (1-2k each).

Is there a more efficient way to perform this type of self join? Did I miss something else in my query?

asked Aug 01 '15 20:08 by Erik



1 Answer

This may be worth a try. In my experience it also helps in Oracle to write the most restrictive conditions last; I can't explain why, so don't take my word for it and run your own performance tests:

SELECT
   p.id
FROM
   periods p
WHERE
   EXISTS(SELECT * FROM periods q WHERE
      (p.startTime BETWEEN q.startTime AND q.endTime
      OR p.endTime BETWEEN q.startTime AND q.endTime
      OR (p.startTime < q.startTime AND p.endTime > q.endTime) -- also matches a TYPEB period that fully encloses the TYPEA period; remove if not needed
      ) AND q.type = 'TYPEA'
   ) AND p.type = 'TYPEB'
ORDER BY
   p.id
;
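If the enclosing case is kept, the three OR branches above amount to the usual closed-interval overlap test, provided every row has startTime <= endTime and neither column is NULL. The sketch below rewrites the query on that assumption and shows one way to run the performance check. The index and its name `periods_type_time_ix` are hypothetical additions, not something from the question, and whether either change actually helps has to be confirmed from the plan.

```sql
-- Hypothetical index: lets Oracle locate the few TYPEA/TYPEB rows
-- without scanning all of PERIODS. May or may not be used.
CREATE INDEX periods_type_time_ix ON periods (type, startTime, endTime);

-- Same EXISTS shape as above, with the OR branches folded into one
-- overlap test: two closed intervals overlap when each one starts
-- no later than the other one ends.
EXPLAIN PLAN FOR
SELECT p.id
FROM   periods p
WHERE  p.type = 'TYPEB'
AND    EXISTS (SELECT 1
               FROM   periods q
               WHERE  q.type = 'TYPEA'
               AND    p.startTime <= q.endTime
               AND    p.endTime   >= q.startTime)
ORDER  BY p.id;

-- Show the plan the optimizer chose for the statement above.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Drop the `EXPLAIN PLAN FOR` line to run the query itself. Note that this form also returns TYPEB periods that fully enclose a TYPEA period; if those are not wanted, the OR form without the third branch is the one to keep.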
answered Sep 30 '22 17:09 by maraca