I have a query where I want to select all users who like a given set of artists. There also are some other WHERE criteria on country etc. Here's what the schema looks like.
          users                     favourite_artists             artists
+----------+------------+    +-----------+------------+    +--------+--------+
|    id    |  country   |    |  user_id  |  artist_id |    |  id    |  name  |
+----------+------------+    +-----------+------------+    +--------+--------+
|     1    |     gb     |    |     1     |      6     |    |   1    |  Muse  |
|     2    |     gb     |    |     1     |      5     |    |   2    |  RATM  |
|     3    |     us     |    |     1     |      3     |    |   3    |  ABBA  | 
|     4    |     us     |    |     2     |      3     |    |   4    |   U2   |
+----------+------------+    +-----------+------------+    +--------+--------+
I want to order them by the number of those artists they like. I also want to include users who don't like any of the artists but who match the WHERE criteria. The expected result set would look like.
+--------+---------------+----------------+
|   id   |    country    |   match_count  |
+--------+---------------+----------------+
|    6   |      gb       |       4        |
|    9   |      gb       |       4        |
|    2   |      gb       |       3        |
|    1   |      gb       |       2        |
|    5   |      gb       |       0        |
|    4   |      gb       |       0        |
+--------+---------------+----------------+
I've been trying to do it using a subquery to get the match_count and ordering by that but it's performing pretty slowly so I thought there'd have to be a better way.
   SELECT users.id, users.country
   (SELECT COUNT(*) FROM favourite_artists 
    WHERE user_id = users.id AND artist_id IN (1,3,4,9)) AS match_count        
   FROM "users" 
   WHERE users.country = 'gb'
   ORDER BY match_count DESC;
I'm using Postgresql 9.0.7. Any thoughts?
Your query is executing one subquery for every row in users. Such queries are called "correlated subqueries" and their performance, quite understandably, sucks.
Instead you want a join:
SELECT users.id, users.country, count(artist_id) as match_count
FROM users
LEFT JOIN favourite_artists ON user_id = users.id AND artist_id IN (1,3,4,9)
WHERE users.country = 'gb'
GROUP BY 1, 2
ORDER BY 3 DESC;
This query will get the joining rows far more efficiently, assuming you have an index on favourite_artists(user_id) - or better yet a multi-column index favourite_artists(user_id, artist_id).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With