Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

INNER JOIN performance with '<' or '>' condition

I have two tables with the column SessionOrder. This column is an integer datatype and have the following index: CREATE INDEX OSIDX_<internal name> ON <Entity>.

I'm executing the following query:

SELECT i_0.rn, i_1.rn 
FROM (
    SELECT "RawEvent"."SessionOrder" as rn
    FROM "RawEvent" i_0
    WHERE something = 12
)
INNER JOIN (
    SELECT "RawEvent"."SessionOrder" as rn
    FROM "RawEvent" i_1
    WHERE something = 14
) ON i_0.rn > i_1.rn

The problem of this query is the ON i_0.rn > i_1.rn that gets too slow and times out. I replaced it by ON i_0.rn = i_1.rn and it was very fast but obviously does not produce the expected results.

Does someone know a way to increase the performance of this query avoiding the timeout? Other goal of this question is to understand why it gets bad performance with ON i_0.rn > i_1.rn.

PS: It is not possible to increase the timeout time

like image 258
Hélder Gonçalves Avatar asked Dec 21 '16 17:12

Hélder Gonçalves


People also ask

Which is faster inner join or WHERE clause?

The subquery can be placed in the following SQL clauses they are WHERE clause, HAVING clause, FROM clause. Advantages Of Joins: The advantage of a join includes that it executes faster. The retrieval time of the query using joins almost always will be faster than that of a subquery.

Does order of inner joins matter for performance?

JOIN order doesn't matter, the query engine will reorganize their order based on statistics for indexes and other stuff.

Does adding WHERE condition in join in SQL improves performance?

No, it doesn't. Query optimizer will transform your code anyway. You better choose a convention and go with it.


2 Answers

Please check first if you realy use Oracle database. The syntax of your SQL suggest either other RDBMS or some prepocessor.

To get an impression what you can expect from such kind of queries, you may use a dummy example as follows.

Generate Sample Data

create table myTab as 
with mySeq as 
(select rownum SessionOrder from dual connect by level <= 10000)
select 12 something, SessionOrder from mySeq union all
select 14  something, SessionOrder from mySeq
;

This produces both subsources each with 10.000 sequences starting from 1 to 10.000.

Test Query

create table myRes as
select a.SessionOrder rn0, b.SessionOrder rn1
from myTab a join myTab b on a.SessionOrder > b.SessionOrder and
a.something = 12 and b.something = 14;

Produces 49.995.000 rows in less that 30 seconds.

If you expect to get such large result in much less time, you'll need an advanced optimization. Without knowing your data and requirement no generll advice is possible.

like image 134
Marmite Bomber Avatar answered Nov 04 '22 13:11

Marmite Bomber


As recommended I tried to solve the problem with other strategy that got greater performance.

Despite this simple solution, I do not understand why the original query got too slow. I think that the Oracle engine is not using indexes.

SELECT i_0."SessionOrder",  i_1."SessionOrder"
FROM "RawEvent" i_0
INNER JOIN "RawEvent" i_1 ON  i_0."SessionOrder" < i_1."SessionOrder" 
WHERE i_0."something" = 12 AND i_1."something" = 14
like image 29
Hélder Gonçalves Avatar answered Nov 04 '22 15:11

Hélder Gonçalves