I have a table files
with files and a table reades
with read accesses to these files. In the table reades
there is a column file_id
where refers to the respective column in files
.
Now I would like to list all files which have not been accessed and tried this:
SELECT * FROM files WHERE file_id NOT IN (SELECT file_id FROM reades)
This is terribly slow. The reason is that mySQL thinks that the subquery is dependent on the query:
+----+--------------------+--------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+--------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | PRIMARY | files | ALL | NULL | NULL | NULL | NULL | 1053 | 100.00 | Using where |
| 2 | DEPENDENT SUBQUERY | reades | ALL | NULL | NULL | NULL | NULL | 3242 | 100.00 | Using where |
+----+--------------------+--------+------+---------------+------+---------+------+------+----------+-------------+
But why? The subquery is completely independent and more or less just meant to return a list of ids.
(To be precise: Each file_id
can appear multiple times in reades
, of course, as there can be arbitrarily many read operations for each file.)
Try replacing the subquery with a join:
SELECT *
FROM files f
LEFT OUTER JOIN reades r on r.file_id = f.file_id
WHERE r.file_id IS NULL
Here's a link to an article about this problem. The writer of that article wrote a stored procedure to force MySQL to evaluate subqueries as independant. I doubt that's necessary in this case though.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With