Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Alternative of NOT IN On MySQL

Tags:

sql

mysql

I have a query

SELECT DISTINCT phoneNum 
FROM `Transaction_Register` 
WHERE phoneNum NOT IN (SELECT phoneNum FROM `Subscription`) 
LIMIT 0 , 1000000

It takes too much time to execute b/c Transaction_Register table has millions of records is there any alternative of above query I will be grateful to you guys if there is any.

like image 501
developerpk Avatar asked Jun 03 '13 16:06

developerpk


2 Answers

An alternative would be to use a LEFT JOIN:

select distinct t.phoneNum
from Transaction_Register t
left join Subscription s
  on t.phoneNum = s.phoneNum
where s.phoneNum is null
LIMIT 0 , 1000000;

See SQL Fiddle with Demo

like image 124
Taryn Avatar answered Sep 22 '22 12:09

Taryn


I doubt whether LEFT JOIN truly perform better than NOT IN. I just perform a few tests with the following table structure (if I am wrong please correct me):

account (id, ....)   [42,884 rows, index by id]
play (account_id, playdate, ...)   [61,737 rows, index by account_id]

(1) Query with LEFT JOIN

SELECT * FROM
account LEFT JOIN play ON account.id = play.account_id
WHERE play.account_id IS NULL

(2) Query with NOT IN

SELECT * FROM
account WHERE
account.id NOT IN (SELECT play.account_id FROM play)

Speed test with LIMIT 0,...

LIMIT 0,->   100      150      200      250
-------------------------------------------------------------------------
LEFT         3.213s   4.477s   5.881s   7.472s
NOT EXIST    2.200s   3.261s   4.320s   5.647s
--------------------------------------------------------------------------
Difference   1.013s   1.216s   1.560s   1.825s

As I increase the the limit, the difference is getting larger and larger


With EXPLAIN

(1) Query with LEFT JOIN

SELECT_TYPE   TABLE      TYPE   ROWS    EXTRA
-------------------------------------------------
SIMPLE         account   ALL    42,884
SIMPLE         play      ALL    61,737  Using where; not exists

(2) Query with NOT IN

SELECT_TYPE          TABLE      TYPE   ROWS   EXTRA
-------------------------------------------------
SIMPLE               account   ALL    42,884  Using where
DEPENDENT SUBQUERY   play      INDEX  61,737  Using where; Using index

It seem like the LEFT JOIN does not make use of index

LOGIC

(1) Query with LEFT JOIN

After LEFT JOIN between account and play will produce 42,884 * 61,737 = 2,647,529,508 rows. Then check if play.account_id is NULL on those rows.

(2) Query with NOT IN

Binary search takes log2(N) for item existence. That's mean 42,884 * log2(61,737) = 686,144 steps

like image 39
invisal Avatar answered Sep 23 '22 12:09

invisal