How can I simplify/improve the performance of this MySQL query?

I am very new to MySQL, and thanks to the great support from you more experienced guys here I am managing to get by while learning a lot in the process.

I have a query that does exactly what I want. However, it looks extremely messy to me and I am certain there must be a way to simplify it.

How can this query be improved and optimized for performance?

Many thanks

$sQuery = "
    SELECT SQL_CALC_FOUND_ROWS ".str_replace(" , ", " ", implode(", ", $aColumns))."
    FROM $sTable b
    LEFT JOIN (
        SELECT COUNT(*) AS projects_count, a.songs_id
        FROM $sTable2 a
        GROUP BY a.songs_id
    ) bb ON bb.songs_id = b.songsID
    LEFT JOIN (
        SELECT AVG(rating) AS rating, COUNT(rating) AS ratings_count, c.songid
        FROM $sTable3 c
        GROUP BY c.songid
    ) bbb ON bbb.songid = b.songsID
    LEFT JOIN (
        SELECT c.songid, c.userid,
            CASE WHEN EXISTS (
                SELECT songid
                FROM $sTable3
                WHERE songid = c.songid
            ) THEN 'User Voted'
            ELSE 'Not Voted'
            END AS voted
        FROM $sTable3 c
        WHERE c.userid = $userid
        GROUP BY c.songid
    ) bbbb ON bbbb.songid = b.songsID
";

EDIT: Here is a description of what the query is doing:-

I have three tables:

  • $sTable = a table of songs (songid, mp3link, artwork, useruploadid etc.)

  • $sTable2 = a table of projects with songs linked to them (projectid, songid, project name etc.)

  • $sTable3 = a table of song ratings (songid, userid, rating)

All of this data is output to a JSON array and displayed in a table in my application to provide a list of songs, combined with the projects and ratings data.

The query itself does the following in this order:-

  1. Collects all rows from $sTable
  2. Joins to $sTable2 on songsID and counts the rows (projects) in that table that have the same songsID
  3. Joins to $sTable3 on songsID and averages the 'rating' column over the rows that have the same songsID
  4. At the same time it counts the rows in $sTable3 that have the same songsID, giving the total number of votes.
  5. Finally, it checks each row to see whether $userid (a variable containing the ID of the logged-in user) matches the 'userid' stored in $sTable3, i.e. whether the user has already voted on that songsID. If it matches it returns "User Voted"; if not it returns "Not Voted". This is output as a separate column in my JSON array, which I check client-side in my app to add a class.
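For reference, here is a small sketch of what the bbbb subquery actually returns. It uses Python's sqlite3 module as a stand-in for MySQL, with hypothetical tables songs/ratings (for $sTable/$sTable3) and made-up rows. Because the EXISTS check looks up the very song that row c came from, it is always true, so songs the logged-in user has not rated come back with voted = NULL rather than 'Not Voted'. The second query is just one possible alternative (a LEFT JOIN plus COALESCE), not necessarily the best one.

```python
import sqlite3

# Hypothetical stand-ins for $sTable (songs) and $sTable3 (ratings), with made-up rows.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE songs   (songsID INTEGER PRIMARY KEY, song_name TEXT);
    CREATE TABLE ratings (songid INTEGER, userid INTEGER, rating INTEGER);
    INSERT INTO songs   VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    INSERT INTO ratings VALUES (1, 10, 4), (1, 11, 2), (2, 11, 5);
""")

# The bbbb subquery from the question. EXISTS looks up the same song that
# row c came from, so it is always true for every row that survives the WHERE.
ORIGINAL = """
    SELECT b.songsID, bbbb.voted
    FROM songs b
    LEFT JOIN (
        SELECT c.songid,
               CASE WHEN EXISTS (SELECT songid FROM ratings WHERE songid = c.songid)
                    THEN 'User Voted' ELSE 'Not Voted' END AS voted
        FROM ratings c
        WHERE c.userid = ?
        GROUP BY c.songid
    ) bbbb ON bbbb.songid = b.songsID
    ORDER BY b.songsID
"""

# One possible alternative: join on the user's own ratings and let COALESCE
# turn the missing match into 'Not Voted'.
FIXED = """
    SELECT b.songsID, COALESCE(v.voted, 'Not Voted')
    FROM songs b
    LEFT JOIN (
        SELECT DISTINCT songid, 'User Voted' AS voted
        FROM ratings
        WHERE userid = ?
    ) v ON v.songid = b.songsID
    ORDER BY b.songsID
"""

original = con.execute(ORIGINAL, (10,)).fetchall()
fixed = con.execute(FIXED, (10,)).fetchall()
print(original)  # songs user 10 did not rate have voted = None
print(fixed)
```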

If there is any more detail anyone needs, please just let me know. Thanks all.

EDIT:

Thanks to Aurimis' excellent first attempt, I am closing in on a much simpler solution.

This is the code I have tried based on that suggestion.

SELECT SQL_CALC_FOUND_ROWS ".str_replace(" , ", " ", implode(", ", $aColumns))."

    FROM 
      (SELECT 
        $sTable.songsID, COUNT(rating) AS ratings_count, 
        AVG(rating) AS ratings
      FROM $sTable 
        LEFT JOIN $sTable2 ON $sTable.songsID = $sTable2.songs_id
        LEFT JOIN $sTable3 ON $sTable.songsID = $sTable3.songid
      GROUP BY $sTable.songsID) AS A
    LEFT JOIN $sTable3 AS B ON A.songsID = B.songid AND B.userid = $userid

There are several problems, however. I had to remove the first line of your answer, as it caused a 500 internal server error:

IF(B.userid = NULL, "Not voted", "User Voted") AS voted 

Obviously now the 'voted check' functionality is lost.
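A side note on the removed line, as a minimal sketch (SQLite via Python's sqlite3 standing in for MySQL; table names and rows are hypothetical). Comparing with `= NULL` never yields true, because any comparison with NULL evaluates to NULL, so the "Not voted" branch can never be taken. `IS NULL` is the correct test; the MySQL form would be IF(B.userid IS NULL, 'Not Voted', 'User Voted'). Separately, the double quotes around "Not voted" would likely end a PHP double-quoted query string early, which may be what caused the 500 error; single-quoted SQL literals avoid that.

```python
import sqlite3

# Hypothetical songs/ratings tables with made-up rows; user 10 rated song 1 only.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE songs   (songsID INTEGER PRIMARY KEY);
    CREATE TABLE ratings (songid INTEGER, userid INTEGER, rating INTEGER);
    INSERT INTO songs   VALUES (1), (2);
    INSERT INTO ratings VALUES (1, 10, 4);
""")

def voted_labels(op: str, userid: int):
    # op is interpolated only to compare the two forms; it is never user input.
    sql = f"""
        SELECT s.songsID,
               CASE WHEN B.userid {op} NULL THEN 'Not Voted' ELSE 'User Voted' END
        FROM songs s
        LEFT JOIN ratings B ON B.songid = s.songsID AND B.userid = ?
        ORDER BY s.songsID
    """
    return con.execute(sql, (userid,)).fetchall()

# "= NULL" evaluates to NULL, never true, so every song looks voted.
with_equals = voted_labels("=", 10)
# "IS NULL" correctly detects the unmatched LEFT JOIN row.
with_is = voted_labels("IS", 10)
print(with_equals, with_is)
```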

Also, and more importantly, it is not returning all the columns defined in my array, only songsID. My JSON returns "Unknown column 'song_name' in 'field list'"; if I remove song_name from my $aColumns array, the error of course just moves on to the next column.

I am defining my columns at the beginning of my script as this array is used for filtering and putting together the output for the JSON encode. This is the definition of $aColumns:-

$aColumns = array( 'songsID', 'song_name', 'artist_band_name', 'author', 'song_artwork', 'song_file', 'genre', 'song_description', 'uploaded_time', 'emotion', 'tempo', 'user', 'happiness', 'instruments', 'similar_artists', 'play_count', 'projects_count',  'rating', 'ratings_count', 'voted');
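The "Unknown column" error happens because the outer SELECT reads from the derived table A, and A only exposes the columns its own SELECT list names (songsID, ratings_count and ratings). A minimal sketch of the same behaviour in SQLite (Python sqlite3 standing in for MySQL; table names and rows are hypothetical), together with the usual fix of joining the songs table back in on songsID:

```python
import sqlite3

# Hypothetical songs/ratings tables with made-up rows.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE songs   (songsID INTEGER PRIMARY KEY, song_name TEXT);
    CREATE TABLE ratings (songid INTEGER, userid INTEGER, rating INTEGER);
    INSERT INTO songs   VALUES (1, 'Alpha'), (2, 'Beta');
    INSERT INTO ratings VALUES (1, 10, 4), (1, 11, 2);
""")

# Like derived table A in the question: it exposes only songsID and ratings_count.
A = """(SELECT s.songsID, COUNT(r.rating) AS ratings_count
        FROM songs s LEFT JOIN ratings r ON s.songsID = r.songid
        GROUP BY s.songsID) AS A"""

error = None
try:
    con.execute(f"SELECT song_name FROM {A}")
except sqlite3.OperationalError as exc:
    error = str(exc)  # SQLite says "no such column"; MySQL says "Unknown column"

# Joining the songs table back in makes its other columns available again.
rows = con.execute(
    f"SELECT A.songsID, s.song_name, A.ratings_count FROM {A} "
    "JOIN songs s ON s.songsID = A.songsID ORDER BY A.songsID"
).fetchall()
print(error, rows)
```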

To quickly test the rest of the query, I modified the first line of the subquery to select $sTable.* rather than $sTable.songsID (remember, $sTable is the songs table).

The query then worked, though of course with terrible performance, but it returned only 24 songs out of the 5000-song test dataset. I therefore changed your first 'JOIN' to a 'LEFT JOIN' so that all 5000 songs were returned. To clarify: the query needs to return ALL of the rows in the songs table, plus the extra data from the projects and ratings tables for each song.
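The 24-vs-5000 difference is the usual inner vs outer join behaviour: an inner JOIN keeps only songs that have a matching row in every joined table, while LEFT JOIN keeps every song and fills the missing side with NULLs. A minimal SQLite sketch (Python sqlite3 standing in for MySQL; hypothetical tables and made-up rows) of the same effect:

```python
import sqlite3

# Hypothetical tables: song 1 has a project and a rating, song 2 only a rating, song 3 neither.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE songs    (songsID INTEGER PRIMARY KEY);
    CREATE TABLE projects (projectid INTEGER PRIMARY KEY, songs_id INTEGER);
    CREATE TABLE ratings  (songid INTEGER, userid INTEGER, rating INTEGER);
    INSERT INTO songs    VALUES (1), (2), (3);
    INSERT INTO projects (songs_id) VALUES (1);
    INSERT INTO ratings  VALUES (1, 10, 4), (2, 11, 5);
""")

def songs_returned(join_kind: str):
    # join_kind is either "JOIN" or "LEFT JOIN"; it is not user input.
    sql = f"""
        SELECT DISTINCT s.songsID
        FROM songs s
        {join_kind} projects p ON s.songsID = p.songs_id
        {join_kind} ratings  r ON s.songsID = r.songid
        ORDER BY s.songsID
    """
    return [row[0] for row in con.execute(sql)]

inner = songs_returned("JOIN")       # only songs with a project AND a rating
outer = songs_returned("LEFT JOIN")  # every song, NULLs where nothing matched
print(inner, outer)
```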

So... We are getting there and I am certain that this is a much better approach it just needs some modification. Thanks for your help so far Aurimis.

asked Nov 29 '11 at 11:11 by gordyr



1 Answer

SELECT SQL_CALC_FOUND_ROWS
    songsID, song_name, artist_band_name, author, song_artwork, song_file,
    genre, song_description, uploaded_time, emotion, tempo,
    `user`, happiness, instruments, similar_artists, play_count,
    projects_count,
    rating, ratings_count,
    IF(user_ratings_count, 'User Voted', 'Not Voted') as voted
FROM (
    SELECT
        sp.songsID, projects_count,
        AVG(rating) as rating,
        COUNT(rating) AS ratings_count,
        COUNT(IF(userid=$userid, 1, NULL)) as user_ratings_count
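    FROM (
        -- COUNT(p.songs_id) rather than COUNT(*): the LEFT JOIN still yields
        -- one row for a song with no projects, which COUNT(*) would count as 1
        SELECT songsID, COUNT(p.songs_id) as projects_count
        FROM $sTable s
        LEFT JOIN $sTable2 p ON s.songsID = p.songs_id
        GROUP BY songsID) as sp
    LEFT JOIN $sTable3 r ON sp.songsID = r.songid
    GROUP BY sp.songsID, projects_count) as spr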
JOIN $sTable s USING (songsID);
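To sanity-check the logic, here is a runnable translation of the query to SQLite via Python's sqlite3. It is a sketch with hypothetical table names (songs, projects, ratings) and made-up rows, not the MySQL version: IF() becomes CASE, SQL_CALC_FOUND_ROWS (MySQL-only) is dropped, $userid becomes a bound :userid parameter, and the projects subquery uses COUNT(p.songs_id) instead of COUNT(*) so that a song with no projects gets 0 rather than 1.

```python
import sqlite3

# Hypothetical stand-ins for $sTable, $sTable2, $sTable3 with made-up rows:
# song 1: 2 projects, ratings 4/2/3 (user 10 voted); song 2: 1 project, rating 5;
# song 3: no projects and no ratings.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE songs    (songsID INTEGER PRIMARY KEY, song_name TEXT);
    CREATE TABLE projects (projectid INTEGER PRIMARY KEY, songs_id INTEGER);
    CREATE TABLE ratings  (songid INTEGER, userid INTEGER, rating INTEGER);
    INSERT INTO songs    VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    INSERT INTO projects (songs_id) VALUES (1), (1), (2);
    INSERT INTO ratings  VALUES (1, 10, 4), (1, 11, 2), (1, 12, 3), (2, 11, 5);
""")

QUERY = """
SELECT songsID, song_name, projects_count, rating, ratings_count,
       CASE WHEN user_ratings_count > 0 THEN 'User Voted' ELSE 'Not Voted' END AS voted
FROM (
    SELECT sp.songsID, projects_count,
           AVG(rating) AS rating,
           COUNT(rating) AS ratings_count,
           -- conditional count instead of a correlated subquery
           COUNT(CASE WHEN userid = :userid THEN 1 END) AS user_ratings_count
    FROM (
        SELECT s.songsID, COUNT(p.songs_id) AS projects_count
        FROM songs s
        LEFT JOIN projects p ON s.songsID = p.songs_id
        GROUP BY s.songsID
    ) AS sp
    LEFT JOIN ratings r ON sp.songsID = r.songid
    GROUP BY sp.songsID, projects_count
) AS spr
JOIN songs s USING (songsID)
ORDER BY songsID
"""

rows = con.execute(QUERY, {"userid": 10}).fetchall()
for row in rows:
    print(row)
```

With this data, song 3 (no projects, no ratings) still appears, with projects_count 0, a NULL rating and 'Not Voted', so every row of the songs table is returned.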

You will need the following indexes:

  • (songs_id) on $sTable2
  • the composite (songid, rating, userid) on $sTable3
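A sketch of creating those indexes, again in SQLite through Python's sqlite3 (the CREATE INDEX statements themselves are also valid MySQL; the index names are hypothetical). Because (songid, rating, userid) is ordered by songid and contains every ratings column the query reads, the ratings join and its aggregates can presumably be served from the index alone, without reading the table rows (a covering index).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE projects (projectid INTEGER PRIMARY KEY, songs_id INTEGER);
    CREATE TABLE ratings  (songid INTEGER, userid INTEGER, rating INTEGER);
    -- Index names are hypothetical; the same CREATE INDEX syntax works in MySQL.
    CREATE INDEX idx_projects_songs_id ON projects (songs_id);
    CREATE INDEX idx_ratings_songid_rating_userid ON ratings (songid, rating, userid);
""")

def index_columns(name: str):
    # PRAGMA index_info yields (seqno, cid, name) per indexed column;
    # sorting the tuples orders them by seqno, i.e. by position in the index.
    rows = sorted(con.execute(f"PRAGMA index_info('{name}')"))
    return [col_name for _seqno, _cid, col_name in rows]

projects_cols = index_columns("idx_projects_songs_id")
ratings_cols = index_columns("idx_ratings_songid_rating_userid")
print(projects_cols, ratings_cols)
```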

The ideas behind the query:

  • the subqueries operate on INT columns only, so their intermediate results are small and fit easily in memory
  • each LEFT JOIN is aggregated in its own subquery, so project rows are never multiplied by rating rows (a Cartesian product) before grouping
  • user votes are counted in the same subquery as the other ratings, which avoids an expensive correlated subquery
  • all other information is retrieved in the final join
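The second point is the key difference from the flat query in the question's second edit, which LEFT JOINs both $sTable2 and $sTable3 before a single GROUP BY. A minimal SQLite sketch (Python sqlite3 standing in for MySQL; hypothetical tables and made-up rows) of what that does to the counts:

```python
import sqlite3

# Hypothetical tables: song 1 has 2 projects and 3 ratings, song 2 has 1 of each, song 3 none.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE songs    (songsID INTEGER PRIMARY KEY);
    CREATE TABLE projects (projectid INTEGER PRIMARY KEY, songs_id INTEGER);
    CREATE TABLE ratings  (songid INTEGER, userid INTEGER, rating INTEGER);
    INSERT INTO songs    VALUES (1), (2), (3);
    INSERT INTO projects (songs_id) VALUES (1), (1), (2);
    INSERT INTO ratings  VALUES (1, 10, 4), (1, 11, 2), (1, 12, 3), (2, 11, 5);
""")

# Both tables joined before grouping: for a song with both projects and ratings,
# every project row pairs with every rating row, so both counts become projects x ratings.
FLAT = """
    SELECT s.songsID, COUNT(p.songs_id), COUNT(r.rating)
    FROM songs s
    LEFT JOIN projects p ON s.songsID = p.songs_id
    LEFT JOIN ratings  r ON s.songsID = r.songid
    GROUP BY s.songsID ORDER BY s.songsID
"""

# Projects aggregated first, as in the answer: one row per song reaches the ratings join.
GROUPED = """
    SELECT sp.songsID, sp.projects_count, COUNT(r.rating)
    FROM (
        SELECT s.songsID, COUNT(p.songs_id) AS projects_count
        FROM songs s
        LEFT JOIN projects p ON s.songsID = p.songs_id
        GROUP BY s.songsID
    ) AS sp
    LEFT JOIN ratings r ON sp.songsID = r.songid
    GROUP BY sp.songsID, sp.projects_count ORDER BY sp.songsID
"""

flat = con.execute(FLAT).fetchall()
grouped = con.execute(GROUPED).fetchall()
print(flat, grouped)
```

In the flat version song 1 reports 6 ratings instead of 3; its average would survive only because every rating is repeated the same number of times.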
answered Nov 14 '22 at 23:11 by newtover
