Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL query takes more than an hour to execute for 200k rows

I have two tables each with around 200,000 rows. I have run the query below and it still hasn't completed after running for more than an hour. What could be the explanation for this?

SELECT 
    dbo.[new].[colom1],
    dbo.[new].[colom2],
    dbo.[new].[colom3],
    dbo.[new].[colom4],  
    dbo.[new].[Value] as 'nieuwe Value',
    dbo.[old].[Value] as 'oude Value'
FROM dbo.[new]
JOIN dbo.[old] 
    ON dbo.[new].[colom1] = dbo.[old].[colom1] 
    and dbo.[new].[colom2] = dbo.[old].[colom2] 
    and dbo.[new].[colom3] = dbo.[old].[colom3] 
    and dbo.[new].[colom4] = dbo.[old].[colom4] 
where dbo.[new].[Value] <> dbo.[old].[Value]

from comment;

Execution plan

Table structure

like image 865
wouter de jong Avatar asked Dec 23 '16 13:12

wouter de jong


Video Answer


1 Answers

It seems that for an equality join on a single column, the rows with NULL value in the join key are being filtered out, but this is not the case for joins on multiple columns.
As a result, the hash join complexity is changed from O(N) to O(N^2).

======================================================================

In that context I would like to recommend a great article written by Paul White on similar issues - Hash Joins on Nullable Columns

======================================================================

I have generated a small simulation of this use-case and I encourage you to test your solutions.

create table mytab1 (c1 int null,c2 int null)
create table mytab2 (c1 int null,c2 int null)

;with t(n) as (select 1 union all select n+1 from t where n < 10)
insert into mytab1 select null,null from t t0,t t1,t t2,t t3,t t4

insert into mytab2 select null,null from mytab1

insert into mytab1 values (111,222);
insert into mytab2 values (111,222);

select * from mytab1 t1 join mytab2 t2 on t1.c1 = t2.c1 and t1.c2 = t2.c2 

For the OP query we should remove rows with NULL values in any of the join key columns.

SELECT 
    dbo.[new].[colom1],
    dbo.[new].[colom2],
    dbo.[new].[colom3],
    dbo.[new].[colom4],  
    dbo.[new].[Value] as 'nieuwe Value',
    dbo.[old].[Value] as 'oude Value'
FROM dbo.[new]
JOIN dbo.[old] 
    ON dbo.[new].[colom1] = dbo.[old].[colom1] 
    and dbo.[new].[colom2] = dbo.[old].[colom2] 
    and dbo.[new].[colom3] = dbo.[old].[colom3] 
    and dbo.[new].[colom4] = dbo.[old].[colom4] 
where dbo.[new].[Value] <> dbo.[old].[Value]
    and dbo.[new].[colom1]  is not null
    and dbo.[new].[colom2]  is not null
    and dbo.[new].[colom3]  is not null
    and dbo.[new].[colom4]  is not null
    and dbo.[old].[colom1]  is not null
    and dbo.[old].[colom2]  is not null
    and dbo.[old].[colom3]  is not null
    and dbo.[old].[colom4]  is not null
like image 146
David דודו Markovitz Avatar answered Oct 21 '22 23:10

David דודו Markovitz