Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MYSQL query optimization, multiple queries or one large query

I have a query which has some sub queries (inner selects), i'm trying work out which is better for performance, one larger query or lots of smaller queries, i'm finding it difficult to try and time the differences as it changes all the time on my server.

I use the query below to return 10 results at a time to display on my website, using pagination (offset and limit).

SELECT adverts.*, breed.breed, breed.type, sellers.profile_name, sellers.logo, users.user_level , 
round( sqrt( ( ( (adverts.latitude - '51.558430') * (adverts.latitude - '51.558430') ) * 69.1 * 69.1 ) + ( (adverts.longitude - '-0.0069345') * (adverts.longitude - '-0.0069345') * 53 * 53 ) ), 1 ) as distance, 
( SELECT advert_images.image_name FROM advert_images WHERE advert_images.advert_id = adverts.advert_id AND advert_images.main = 1 LIMIT 1) as imagename, 
( SELECT count(advert_images.advert_id) from advert_images WHERE advert_images.advert_id = adverts.advert_id ) AS num_photos 
FROM adverts 
LEFT JOIN breed ON adverts.breed_id = breed.breed_id 
LEFT JOIN sellers ON (adverts.user_id = sellers.user_id) 
LEFT JOIN users ON (adverts.user_id = users.user_id) 
WHERE (adverts.status = 1) AND (adverts.approved = 1) 
AND (adverts.latitude BETWEEN 51.2692837281 AND 51.8475762719) AND (adverts.longitude BETWEEN -0.472015213613 AND 0.458146213613) 
having (distance <= '20') 
ORDER BY distance ASC 
LIMIT 0,10

Would it be better to remove the 2 inner selects below from the main query, and then in my php loop, call the 2 selects 10 times, once for each record in the loop?

( SELECT advert_images.image_name FROM advert_images WHERE advert_images.advert_id = adverts.advert_id AND advert_images.main = 1 LIMIT 1) as imagename, 
( SELECT count(advert_images.advert_id) from advert_images WHERE advert_images.advert_id = adverts.advert_id ) AS num_photos 
like image 876
user1052096 Avatar asked Nov 03 '22 15:11

user1052096


1 Answers

Avoid subqueries

As I understand your inner selects, they serve two purposes: find any name of an associated image, and count the number of associated images. You probably might achieve both using a left join instead of an inner select:

SELECT …,
      advert_images.image_name AS imagename,
      COUNT(advert_images.advert_id) AS num_photos,
      …
FROM …
     LEFT JOIN advert_images ON advert_images.advert_id = adverts.advert_id
…
GROUP BY adverts.advert_id
…
LIMIT 0,10

I haven't tried this, but perhaps the MySQL engine is smart enough to only perform that part of the query for the rows you're actually returning.

Note that there are no guarantees at all about which image name this query will return for a given set of images. If you want reproducible results, you should use some aggregating function there, e.g. MIN(advert_images.image_name) to select the lexicographically first image.

Separate select but no loop

If the above does not work, i.e. the query will still examine the advert_images table for all rows of the computed result, then you're probably really better off by performing a second query. You might however try avoiding the for loop, and instead fetch all those rows in a single query:

SELECT advert_images.image_name AS imagename,
       COUNT(advert_images.advert_id) AS num_photos
FROM advert_images
WHERE advert_images.advert_id IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
GROUP BY advert_images.advert_id

The ten parameters in this query correspond to the ten rows of the result you're currently generating. Note that an advert without an associated photo will not be included in that result at all. So make sure to default num_photos to zero and imagename to NULL in your code.

Temporary table

Another way to achieve what you're attempting to do would be to use an explicit temporary in-memory table: first you select the results you're interested in, and then you retrieve all the associated information.

CREATE TEMPORARY TABLE tmp
SELECT adverts.advert_id, round(…) as distance
FROM adverts
WHERE (adverts.status = 1) AND (adverts.approved = 1)
  AND (adverts.latitude BETWEEN 51.2692837281 AND 51.8475762719)
  AND (adverts.longitude BETWEEN -0.472015213613 AND 0.458146213613)
HAVING (distance <= 20)
ORDER BY distance ASC
LIMIT 0,10;

SELECT tmp.distance, adverts.*, …
       advert_images.image_name AS imagename,
       COUNT(advert_images.advert_id) AS num_photos,
       …
FROM tmp
     INNER JOIN adverts ON tmp.advert_id = adverts.advert_id
     LEFT JOIN breed ON adverts.breed_id = breed.breed_id
     LEFT JOIN sellers ON adverts.user_id = sellers.user_id
     LEFT JOIN users ON adverts.user_id = users.user_id
     LEFT JOIN advert_images ON advert_images.advert_id = adverts.advert_id
GROUP BY adverts.advert_id
ORDER BY tmp.distance ASC;

DROP TABLE tmp;

This will ensure that all the other tables are only queried for the results you're currently working on. After all, there is little magic about the advert_images table except the fact that you might want multiple rows from it.

Subquery as join factor

Building on the approach from the preceding paragraph, you can even avoid managing a temporary table, and use a subquery in its stead:

SELECT sub.distance, adverts.*, …
       advert_images.image_name AS imagename,
       COUNT(advert_images.advert_id) AS num_photos,
       …
FROM ( SELECT adverts.advert_id, round(…) as distance
        FROM adverts
        WHERE (adverts.status = 1) AND (adverts.approved = 1)
          AND (adverts.latitude BETWEEN 51.2692837281 AND 51.8475762719)
          AND (adverts.longitude BETWEEN -0.472015213613 AND 0.458146213613)
        HAVING (distance <= 20)
        ORDER BY distance ASC
        LIMIT 0,10;
     ) AS sub
     INNER JOIN adverts ON sub.advert_id = adverts.advert_id
     LEFT JOIN breed ON adverts.breed_id = breed.breed_id 
     LEFT JOIN sellers ON (adverts.user_id = sellers.user_id) 
     LEFT JOIN users ON (adverts.user_id = users.user_id) 
     LEFT JOIN advert_images ON advert_images.advert_id = adverts.advert_id
GROUP BY adverts.advert_id
ORDER BY sub.distance ASC

Again you determine the relevant rows by using data from the adverts table only, and join only the required rows from other tables. Most likely, that intermediate result will internally be stored in a temporary table, but that is up to the SQL server to decide.

like image 136
MvG Avatar answered Nov 09 '22 14:11

MvG