Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MYSQL: Excluding several rows if one joined row matches a condition

Tags:

sql

mysql

drupal

I've got two tables. One is users, which has fields uid and name. The other table is users_roles, which has fields uid and rid (role id).

I'd like to retrieve a list of users that don't have any of a set of provided roles.

For example:

uid | name
----------
1   | BOB
2   | DAVE
3   | JOHN

USERS_ROLES:

uid | rid
---------
1   | 1
1   | 2
1   | 3
2   | 1
3   | 1

I want to be able to query for just users that don't have a certain set of rids. For instance, a query that excludes rids 2 and 3 should return DAVE and JOHN, or a query that excludes rids (1,3) should return nobody.

like image 450
Mike Avatar asked Dec 21 '22 10:12

Mike


2 Answers

A query using an anti-join pattern is sometimes the most efficient:

SELECT u.uid
     , u.name
  FROM users u
  LEFT
  JOIN users_roles r
    ON r.uid = u.uid
   AND r.rid IN (2,3)
 WHERE r.uid IS NULL

The anti-join pattern is do a LEFT [outer] JOIN the user_roles table to pull back all the matching rows, AND to get rows from users that don't have a matching row. The "trick" is to exclude all the matching rows with a predicate in the WHERE clause that eliminates all the rows from users that had a match.

An equivalent resultset can be obtained using a NOT EXISTS correlated subquery:

SELECT u.uid
     , u.name
  FROM users u
 WHERE NOT EXISTS 
       ( SELECT 1 
           FROM users_roles r
          WHERE r.uid = u.uid
            AND r.rid IN (2,3)
       )

Another approach is to use a NOT IN, although that is sometimes less efficient. Performance depends on a whole host of factors. It's possible for the optimizer to generate different execution plans for each of these queries.

In any case, for best performance, you'll need an index ... ON users_roles (uid) or ON users_roles (uid,rid).


Followup:

Performance testing on my MySQL 5.1.34 server reveals that an anti-join query is almost twice as fast as equivalent NOT EXISTS and NOT IN queries. (1.091 sec vs. 2.066 sec and 2.020 sec)

-- setup and populate test tables
CREATE TABLE t_users
( uid    INT UNSIGNED NOT NULL PRIMARY KEY
, `name` VARCHAR(50)
) ENGINE=INNODB DEFAULT CHARSET=latin1 ;

CREATE TABLE t_users_roles
( uid INT UNSIGNED NOT NULL
, rid INT UNSIGNED NOT NULL
, PRIMARY KEY (uid,rid)
) ENGINE=INNODB DEFAULT CHARSET=latin1;

ALTER TABLE t_users_roles ADD CONSTRAINT FK_t_users_roles_t_users FOREIGN KEY (uid) REFERENCES t_users (uid);

CREATE INDEX t_users_ix1 ON t_users (uid,`name`);

CREATE INDEX t_users_roles_ix1 ON t_users_roles (rid,uid);

INSERT INTO t_users (uid,`name`)
SELECT d.d*100000+e.d*10000+u.d*1000+h.d*100+t.d*10+o.d+1 AS uid
     , CONCAT('NAME',LPAD(d.d*100000+e.d*10000+u.d*1000+h.d*100+t.d*10+o.d+1,8,'-')) AS `name`
  FROM (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) o
  JOIN (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) t
  JOIN (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) h
  JOIN (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) u
  JOIN (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) e
  JOIN (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) d 
;

-- 1000000 row(s) affected.

INSERT INTO t_users_roles (uid,rid)
SELECT d.d*100000+e.d*10000+u.d*1000+h.d*100+t.d*10+o.d+1 AS uid
     , r.rid
  FROM (SELECT 1 AS rid UNION ALL SELECT 2 UNION ALL SELECT 3) r
 CROSS
  JOIN (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) o
  JOIN (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) t
  JOIN (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) h
  JOIN (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) u
  JOIN (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) e
  JOIN (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) d
 WHERE (d.d*100000+e.d*10000+u.d*1000+h.d*100+t.d*10+o.d+1) % 100000 <> 0
;
-- 2999970 row(s) affected

OPTIMIZE TABLE t_users;
OPTIMIZE TABLE t_users_roles;

SHOW STATUS LIKE 'Qcache_hits' ;
-- Variable_name  Value    
-- -------------  ---------
-- Qcache_hits    1117342  

SHOW VARIABLES LIKE 'version' ; 
-- Variable_name  Value       
-- -------------  ------------
-- version        5.1.53-log

-- table size from the file system
$ du -sh DATA/test/t_users*.ibd
72M     DATA/test/t_users.ibd
133M    DATA/test/t_users_roles.ibd


-- anti-join query
SELECT SQL_NO_CACHE u.uid
     , u.name
  FROM t_users u
  LEFT
  JOIN t_users_roles r
    ON r.uid = u.uid
   AND r.rid IN (2,3)
 WHERE r.uid IS NULL ;

-- Exec: 1.095 sec
-- Exec: 1.090 sec
-- Exec: 1.091 sec
-- Exec: 1.087 sec
-- Exec: 1.090 sec
-- avg 5 executions: 1.091 sec    


-- not exists query
SELECT SQL_NO_CACHE u.uid
     , u.name
  FROM t_users u
 WHERE NOT EXISTS
       ( SELECT 1
           FROM t_users_roles r
          WHERE r.uid = u.uid
            AND r.rid IN (2,3)
       ) ;

-- Exec: 2.071 sec
-- Exec: 2.066 sec
-- Exec: 2.059 sec
-- Exec: 2.065 sec
-- Exec: 2.070 sec
-- avg 5 executions: 2.066 sec

-- not in query
SELECT SQL_NO_CACHE u.uid
     , u.name
  FROM t_users u
 WHERE u.uid NOT IN
       ( SELECT r.uid
           FROM t_users_roles r
          WHERE r.uid IS NOT NULL
            AND r.rid IN (2,3)
       ) ;

-- Exec: 2.022 sec 
-- Exec: 2.023 sec 
-- Exec: 2.014 sec
-- Exec: 2.026 sec
-- Exec: 2.016 sec
-- avg 5 executions: 2.020 sec

SHOW STATUS LIKE 'Qcache_hits' ;
-- Variable_name  Value    
-- -------------  ---------
-- Qcache_hits    1117342  

EXPLAIN output for three statements:

-- ANTI JOIN
id  select_type         table   type            possible_keys              key          key_len  ref       rows   filtered  Extra                                 
--  ------------------  ------  --------------  -------------------------  -----------  -------  -----  -------  --------  ------------------------------------
 1  SIMPLE              u       index                                      t_users_ix1  57              1000423    100.00  Using index
 1  SIMPLE              r       ref             PRIMARY,t_users_roles_ix1  PRIMARY      4        u.uid        1    100.00  Using where; Using index; Not exists

-- NOT EXISTS
id  select_type         table   type            possible_keys              key          key_len  ref       rows  filtered  Extra                     
--  ------------------  ------  --------------  -------------------------  -----------  -------  -----  -------  --------  --------------------------
 1  PRIMARY             u       index                                      t_users_ix1  57              1000423    100.00  Using where; Using index
 2  DEPENDENT SUBQUERY  r       ref             PRIMARY,t_users_roles_ix1  PRIMARY      4        u.uid        1    100.00  Using where; Using index

-- NOT IN
id  select_type         table   type            possible_keys              key          key_len  ref        rows  filtered  Extra                     
--  ------------------  ------  --------------  -------------------------  -----------  -------  ------  -------  --------  --------------------------
 1  PRIMARY             u       index                                      t_users_ix1  57               1000423    100.00  Using where; Using index
 2  DEPENDENT SUBQUERY  r       index_subquery  PRIMARY,t_users_roles_ix1  PRIMARY      4        func          1    100.00  Using index; Using where
like image 113
spencer7593 Avatar answered Dec 31 '22 13:12

spencer7593


You're looking for the NOT IN query:

MySQL "NOT IN" query

SELECT name 
FROM users 
WHERE uid NOT IN (SELECT uid FROM Users_roles WHERE rid = 2 OR rid = 3)
like image 28
mattmanser Avatar answered Dec 31 '22 15:12

mattmanser