Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Select parent row only if it has no children

Tags:

sql

mysql

I have a MySQL database in which table A has a one-to-many relation to table B, and I would like to select all rows in table B that have no children in table A. I have tried using

SELECT id FROM A WHERE NOT EXISTS (SELECT * FROM B WHERE B.id=A.id)

and

SELECT id FROM A LEFT JOIN B ON A.id=B.id WHERE B.id IS NULL

Both of these seem slow. Is there a faster query to achieve the same thing?

In case this is relevant, in my database table A has about 500,000 rows and table B has about 3 to 4 million rows.

Edit: For the actual tables in my database, explain gives me:

+----+--------------------+------------------+-------+---------------+---------------------------+---------+------+---------+--------------------------+
| id | select_type        | table            | type  | possible_keys | key                       | key_len | ref  | rows    | Extra                    |
+----+--------------------+------------------+-------+---------------+---------------------------+---------+------+---------+--------------------------+
|  1 | PRIMARY            | frontend_form471 | index | NULL          | frontend_form471_61a633e8 | 32      | NULL |  671927 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | SchoolData       | index | PRIMARY       | PRIMARY                   | 49      | NULL | 3121110 | Using where; Using index |
+----+--------------------+------------------+-------+---------------+---------------------------+---------+------+---------+--------------------------+

for

select number from frontend_form471 where not exists (select * from SchoolData where SchoolData.`f471 Application Number`=frontend_form471.number)

and

+----+-------------+------------------+-------+---------------+---------------------------+---------+------+---------+------------------------------------------------+
| id | select_type | table            | type  | possible_keys | key                       | key_len | ref  | rows    | Extra                                          |
+----+-------------+------------------+-------+---------------+---------------------------+---------+------+---------+------------------------------------------------+
|  1 | SIMPLE      | frontend_form471 | index | NULL          | frontend_form471_61a633e8 | 32      | NULL |  671927 | Using index; Using temporary                   |
|  1 | SIMPLE      | SchoolData       | index | PRIMARY       | PRIMARY                   | 49      | NULL | 3121110 | Using where; Using index; Not exists; Distinct |
+----+-------------+------------------+-------+---------------+---------------------------+---------+------+---------+------------------------------------------------+

for

select distinct number from frontend_form471 left join SchoolData on frontend_form471.number=SchoolData.`f471 Application Number` where SchoolData.`f471 Application Number` is NULL

where in my case frontend_form471 is table A and SchoolData is table B

Edit2: In table B (SchoolData) in my database, id is the first part of a two part primary key, so it is indexed and there are still multiple entries in B with the same id.

like image 706
murgatroid99 Avatar asked Jul 19 '11 19:07

murgatroid99


4 Answers

SELECT id FROM A LEFT OUTER JOIN B ON A.id=B.id WHERE B.id IS NULL

you can do this. the outer join should bring a little performance, but not much.

new database systems will probably optimize your query anyway so that there wont be any difference.

the correct way here is caching! try the query cacher and application level caching if possible.

of course you need proper indexes.

and by proper i mean on both tables and preferably a hash index as it will have static lookup time in comparision to any tree that has logarithmic

Try putting an explain before the query to see what really slows this down.

if you really need this to be fast you may re-facture your data structure.

you could possibly create a trigger to mark a flag in table A whether there is a corresponding entry in table be. of course this id data redundancy, but sometimes its worth it. just think of it as caching.

one last thought: you could try SELECT id FROM A WHERE id NOT IN (SELECT id FROM B) it may be a little faster because no actual joining is necessary, however it may also be slower because the lookup in the set of be will be a full scan. I am not really sure how this will be processed but it may be worth a try.

like image 90
The Surrican Avatar answered Nov 18 '22 17:11

The Surrican


It's going to be slow no matter how you look at it. Worst case performance is going to be a full cross join creating 2 trillion potential matches (4 mill * 500k).

The second one will most likely perform faster, since it's a single query.

like image 23
Marc B Avatar answered Nov 18 '22 17:11

Marc B


Your indexing is poor.

For all forms (EXISTS, IN, LEFT JOIN) you should have indexes on id in both tables

like image 1
gbn Avatar answered Nov 18 '22 18:11

gbn


You could try

SELECT id FROM A WHERE A.id NOT IN (SELECT id FROM B)

but i don't know if this will be any faster. I would have tried the left join first. I think your problem is more to do with indexes. Do you have indexes on both id fields.

like image 1
David Steele Avatar answered Nov 18 '22 17:11

David Steele