I have a query which has some sub queries (inner selects), i'm trying work out which is better for performance, one larger query or lots of smaller queries, i'm finding it difficult to try and time the differences as it changes all the time on my server.
I use the query below to return 10 results at a time to display on my website, using pagination (offset and limit).
SELECT adverts.*, breed.breed, breed.type, sellers.profile_name, sellers.logo, users.user_level ,
round( sqrt( ( ( (adverts.latitude - '51.558430') * (adverts.latitude - '51.558430') ) * 69.1 * 69.1 ) + ( (adverts.longitude - '-0.0069345') * (adverts.longitude - '-0.0069345') * 53 * 53 ) ), 1 ) as distance,
( SELECT advert_images.image_name FROM advert_images WHERE advert_images.advert_id = adverts.advert_id AND advert_images.main = 1 LIMIT 1) as imagename,
( SELECT count(advert_images.advert_id) from advert_images WHERE advert_images.advert_id = adverts.advert_id ) AS num_photos
FROM adverts
LEFT JOIN breed ON adverts.breed_id = breed.breed_id
LEFT JOIN sellers ON (adverts.user_id = sellers.user_id)
LEFT JOIN users ON (adverts.user_id = users.user_id)
WHERE (adverts.status = 1) AND (adverts.approved = 1)
AND (adverts.latitude BETWEEN 51.2692837281 AND 51.8475762719) AND (adverts.longitude BETWEEN -0.472015213613 AND 0.458146213613)
having (distance <= '20')
ORDER BY distance ASC
LIMIT 0,10
Would it be better to remove the 2 inner selects below from the main query, and then in my php loop, call the 2 selects 10 times, once for each record in the loop?
( SELECT advert_images.image_name FROM advert_images WHERE advert_images.advert_id = adverts.advert_id AND advert_images.main = 1 LIMIT 1) as imagename,
( SELECT count(advert_images.advert_id) from advert_images WHERE advert_images.advert_id = adverts.advert_id ) AS num_photos
As I understand your inner selects, they serve two purposes: find any name of an associated image, and count the number of associated images. You probably might achieve both using a left join instead of an inner select:
SELECT …,
advert_images.image_name AS imagename,
COUNT(advert_images.advert_id) AS num_photos,
…
FROM …
LEFT JOIN advert_images ON advert_images.advert_id = adverts.advert_id
…
GROUP BY adverts.advert_id
…
LIMIT 0,10
I haven't tried this, but perhaps the MySQL engine is smart enough to only perform that part of the query for the rows you're actually returning.
Note that there are no guarantees at all about which image name this query will return for a given set of images. If you want reproducible results, you should use some aggregating function there, e.g. MIN(advert_images.image_name)
to select the lexicographically first image.
If the above does not work, i.e. the query will still examine the advert_images
table for all rows of the computed result, then you're probably really better off by performing a second query. You might however try avoiding the for
loop, and instead fetch all those rows in a single query:
SELECT advert_images.image_name AS imagename,
COUNT(advert_images.advert_id) AS num_photos
FROM advert_images
WHERE advert_images.advert_id IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
GROUP BY advert_images.advert_id
The ten parameters in this query correspond to the ten rows of the result you're currently generating. Note that an advert without an associated photo will not be included in that result at all. So make sure to default num_photos
to zero and imagename
to NULL
in your code.
Another way to achieve what you're attempting to do would be to use an explicit temporary in-memory table: first you select the results you're interested in, and then you retrieve all the associated information.
CREATE TEMPORARY TABLE tmp
SELECT adverts.advert_id, round(…) as distance
FROM adverts
WHERE (adverts.status = 1) AND (adverts.approved = 1)
AND (adverts.latitude BETWEEN 51.2692837281 AND 51.8475762719)
AND (adverts.longitude BETWEEN -0.472015213613 AND 0.458146213613)
HAVING (distance <= 20)
ORDER BY distance ASC
LIMIT 0,10;
SELECT tmp.distance, adverts.*, …
advert_images.image_name AS imagename,
COUNT(advert_images.advert_id) AS num_photos,
…
FROM tmp
INNER JOIN adverts ON tmp.advert_id = adverts.advert_id
LEFT JOIN breed ON adverts.breed_id = breed.breed_id
LEFT JOIN sellers ON adverts.user_id = sellers.user_id
LEFT JOIN users ON adverts.user_id = users.user_id
LEFT JOIN advert_images ON advert_images.advert_id = adverts.advert_id
GROUP BY adverts.advert_id
ORDER BY tmp.distance ASC;
DROP TABLE tmp;
This will ensure that all the other tables are only queried for the results you're currently working on. After all, there is little magic about the advert_images
table except the fact that you might want multiple rows from it.
Building on the approach from the preceding paragraph, you can even avoid managing a temporary table, and use a subquery in its stead:
SELECT sub.distance, adverts.*, …
advert_images.image_name AS imagename,
COUNT(advert_images.advert_id) AS num_photos,
…
FROM ( SELECT adverts.advert_id, round(…) as distance
FROM adverts
WHERE (adverts.status = 1) AND (adverts.approved = 1)
AND (adverts.latitude BETWEEN 51.2692837281 AND 51.8475762719)
AND (adverts.longitude BETWEEN -0.472015213613 AND 0.458146213613)
HAVING (distance <= 20)
ORDER BY distance ASC
LIMIT 0,10;
) AS sub
INNER JOIN adverts ON sub.advert_id = adverts.advert_id
LEFT JOIN breed ON adverts.breed_id = breed.breed_id
LEFT JOIN sellers ON (adverts.user_id = sellers.user_id)
LEFT JOIN users ON (adverts.user_id = users.user_id)
LEFT JOIN advert_images ON advert_images.advert_id = adverts.advert_id
GROUP BY adverts.advert_id
ORDER BY sub.distance ASC
Again you determine the relevant rows by using data from the adverts
table only, and join only the required rows from other tables. Most likely, that intermediate result will internally be stored in a temporary table, but that is up to the SQL server to decide.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With