I have a MySQL database in which table A has a one-to-many relation to table B, and I would like to select all rows in table B that have no children in table A. I have tried using
SELECT id FROM A WHERE NOT EXISTS (SELECT * FROM B WHERE B.id=A.id)
and
SELECT id FROM A LEFT JOIN B ON A.id=B.id WHERE B.id IS NULL
Both of these seem slow. Is there a faster query to achieve the same thing?
In case this is relevant, in my database table A has about 500,000 rows and table B has about 3 to 4 million rows.
Edit: For the actual tables in my database, explain gives me:
+----+--------------------+------------------+-------+---------------+---------------------------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------------+-------+---------------+---------------------------+---------+------+---------+--------------------------+
| 1 | PRIMARY | frontend_form471 | index | NULL | frontend_form471_61a633e8 | 32 | NULL | 671927 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | SchoolData | index | PRIMARY | PRIMARY | 49 | NULL | 3121110 | Using where; Using index |
+----+--------------------+------------------+-------+---------------+---------------------------+---------+------+---------+--------------------------+
for
select number from frontend_form471 where not exists (select * from SchoolData where SchoolData.`f471 Application Number`=frontend_form471.number)
and
+----+-------------+------------------+-------+---------------+---------------------------+---------+------+---------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+---------------+---------------------------+---------+------+---------+------------------------------------------------+
| 1 | SIMPLE | frontend_form471 | index | NULL | frontend_form471_61a633e8 | 32 | NULL | 671927 | Using index; Using temporary |
| 1 | SIMPLE | SchoolData | index | PRIMARY | PRIMARY | 49 | NULL | 3121110 | Using where; Using index; Not exists; Distinct |
+----+-------------+------------------+-------+---------------+---------------------------+---------+------+---------+------------------------------------------------+
for
select distinct number from frontend_form471 left join SchoolData on frontend_form471.number=SchoolData.`f471 Application Number` where SchoolData.`f471 Application Number` is NULL
where in my case frontend_form471 is table A and SchoolData is table B
Edit2: In table B (SchoolData) in my database, id is the first part of a two part primary key, so it is indexed and there are still multiple entries in B with the same id.
SELECT id FROM A LEFT OUTER JOIN B ON A.id=B.id WHERE B.id IS NULL
you can do this. the outer join should bring a little performance, but not much.
new database systems will probably optimize your query anyway so that there wont be any difference.
the correct way here is caching! try the query cacher and application level caching if possible.
of course you need proper indexes.
and by proper i mean on both tables and preferably a hash index as it will have static lookup time in comparision to any tree that has logarithmic
Try putting an explain before the query to see what really slows this down.
if you really need this to be fast you may re-facture your data structure.
you could possibly create a trigger to mark a flag in table A whether there is a corresponding entry in table be. of course this id data redundancy, but sometimes its worth it. just think of it as caching.
one last thought: you could try SELECT id FROM A WHERE id NOT IN (SELECT id FROM B)
it may be a little faster because no actual joining is necessary, however it may also be slower because the lookup in the set of be will be a full scan. I am not really sure how this will be processed but it may be worth a try.
It's going to be slow no matter how you look at it. Worst case performance is going to be a full cross join creating 2 trillion potential matches (4 mill * 500k).
The second one will most likely perform faster, since it's a single query.
Your indexing is poor.
For all forms (EXISTS, IN, LEFT JOIN) you should have indexes on id in both tables
You could try
SELECT id FROM A WHERE A.id NOT IN (SELECT id FROM B)
but i don't know if this will be any faster. I would have tried the left join first. I think your problem is more to do with indexes. Do you have indexes on both id fields.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With