Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL very slow query

Tags:

sql

mysql

My table has the following columns:

gamelogs_id (auto_increment primary key)
player_id (int)
player_name (varchar)
game_id (int)
season_id (int)
points (int)

The table has the following indexes

+-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table           | Non_unique | Key_name           | Seq_in_index | Column_name        | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| player_gamelogs |          0 | PRIMARY            |            1 | player_gamelogs_id | A         |      371330 |     NULL | NULL   |      | BTREE      |         |               |
| player_gamelogs |          1 | player_name        |            1 | player_name        | A         |        3375 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | points          |            1 | points          | A         |         506 |     NULL | NULL   | YES  | BTREE      |         ## Heading ##|               |
| player_gamelogs |          1 | game_id            |            1 | game_id            | A         |       37133 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | season             |            1 | season             | A         |          30 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | team_abbreviation  |            1 | team_abbreviation  | A         |          70 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | player_id          |            1 | game_id            | A         |       41258 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | player_id          |            2 | player_id          | A         |      371330 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | player_id          |            3 | dk_points          | A         |      371330 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | game_player_season |            1 | game_id            | A         |       41258 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | game_player_season |            2 | player_id          | A         |      371330 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | game_player_season |            3 | season_id          | A         |      371330 |     NULL | NULL   |      | BTREE      |         |               |
+-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

I am trying to calculate the mean of points for a season and player prior to the game being played. So for the 3rd game of the season, avg_points would be the mean of games 1 and 2. The game numbers are in sequential order such that an earlier game is less than a later game. I also have the option to use a date field but I figured that numeric comparison would be faster?

My query is as follows:

SELECT game_id, 
       player_id, 
       player_name, 
       (SELECT avg(points) 
          FROM player_gamelogs t2
         WHERE t2.game_id < t1.game_id 
           AND t1.player_id = t2.player_id 
           AND t1.season_id = t2.season_id) AS avg_points
  FROM player_gamelogs t1
 ORDER BY player_name, game_id;

EXPLAIN produces the following output:

| id | select_type        | table | type | possible_keys                        | key  | key_len | ref  | rows   | Extra                                           |
+----+--------------------+-------+------+--------------------------------------+------+---------+------+--------+-------------------------------------------------+
|  1 | PRIMARY            | t1    | ALL  | NULL                                 | NULL | NULL    | NULL | 371330 | Using filesort                                  |
|  2 | DEPENDENT SUBQUERY | t2    | ALL  | game_id,player_id,game_player_season | NULL | NULL    | NULL | 371330 | Range checked for each record (index map: 0xC8) |

I am not sure if it is because of the nature of the task involved or because of an inefficiency in my query. Thanks for any suggestions!

like image 973
Eric Truett Avatar asked Dec 31 '15 00:12

Eric Truett


People also ask

What to Do If SQL query is slow?

Create Indexes If you are running a query and it's slow, it could benefit from indexes. Indexes are often the most useful when created on columns as part of the Join clause and the Where clause. So, check which columns are used there, check if they have indexes, and create them if they don't.

Why is my query running slow?

WAITING: Queries can be slow because they're waiting on a bottleneck for a long time. See a detailed list of bottlenecks in types of Waits. RUNNING: Queries can be slow because they're running (executing) for a long time. In other words, these queries are actively using CPU resources.


2 Answers

Please consider this query:

SELECT t1.season_id, t1.game_id, t1.player_id, t1.player_name, AVG(COALESCE(t2.points, 0)) AS average_player_points
FROM player_gamelogs t1
        LEFT JOIN player_gamelogs t2 ON 
                t1.game_id > t2.game_id 
            AND t1.player_id = t2.player_id
            AND t1.season_id = t2.season_id 
GROUP BY
    t1.season_id, t1.game_id, t1.player_id, t1.player_name
ORDER BY t1.player_name, t1.game_id;

Notes:

  • To perform optimally, you'd need an additional index on (season_id, game_id, player_id, player_name)
  • Even better, would be to have player table where to retrieve the name from the id. It seems redundant to me that we have to grab the player name from a log table, moreover if it's required in an index.
  • Group by already sorts by grouped columns. If you can, avoid ordering afterwards as it generates useless overhead. As outlined in the comments, this is not an official behavior and the outcome of assuming its consistency over time should be pondered vs the risk of suddenly losing sorting.
like image 148
Sebas Avatar answered Oct 02 '22 01:10

Sebas


Your query is fine as written:

SELECT game_id, player_id, player_name, 
       (SELECT avg(t2.points) 
        FROM player_gamelogs t2
        WHERE t2.game_id < t1.game_id AND
              t1.player_id = t2.player_id AND
              t1.season_id = t2.season_id
      ) AS avg_points
FROM player_gamelogs t1
ORDER BY player_name, game_id;

But, for optimal performance you want two composite indexes on it: (player_id, season_id, game_id, points) and (player_name, game_id, season_id).

The first index should speed the subquery. The second is for the outer order by.

like image 26
Gordon Linoff Avatar answered Oct 02 '22 02:10

Gordon Linoff