My table has the following columns:
gamelogs_id (auto_increment primary key)
player_id (int)
player_name (varchar)
game_id (int)
season_id (int)
points (int)
The table has the following indexes
+-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| player_gamelogs | 0 | PRIMARY | 1 | player_gamelogs_id | A | 371330 | NULL | NULL | | BTREE | | |
| player_gamelogs | 1 | player_name | 1 | player_name | A | 3375 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | points | 1 | points | A | 506 | NULL | NULL | YES | BTREE | ## Heading ##| |
| player_gamelogs | 1 | game_id | 1 | game_id | A | 37133 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | season | 1 | season | A | 30 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | team_abbreviation | 1 | team_abbreviation | A | 70 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | player_id | 1 | game_id | A | 41258 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | player_id | 2 | player_id | A | 371330 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | player_id | 3 | dk_points | A | 371330 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | game_player_season | 1 | game_id | A | 41258 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | game_player_season | 2 | player_id | A | 371330 | NULL | NULL | YES | BTREE | | |
| player_gamelogs | 1 | game_player_season | 3 | season_id | A | 371330 | NULL | NULL | | BTREE | | |
+-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
I am trying to calculate the mean of points for a season and player prior to the game being played. So for the 3rd game of the season, avg_points would be the mean of games 1 and 2. The game numbers are in sequential order such that an earlier game is less than a later game. I also have the option to use a date field but I figured that numeric comparison would be faster?
My query is as follows:
SELECT game_id,
player_id,
player_name,
(SELECT avg(points)
FROM player_gamelogs t2
WHERE t2.game_id < t1.game_id
AND t1.player_id = t2.player_id
AND t1.season_id = t2.season_id) AS avg_points
FROM player_gamelogs t1
ORDER BY player_name, game_id;
EXPLAIN produces the following output:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+------+--------------------------------------+------+---------+------+--------+-------------------------------------------------+
| 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 371330 | Using filesort |
| 2 | DEPENDENT SUBQUERY | t2 | ALL | game_id,player_id,game_player_season | NULL | NULL | NULL | 371330 | Range checked for each record (index map: 0xC8) |
I am not sure if it is because of the nature of the task involved or because of an inefficiency in my query. Thanks for any suggestions!
Create Indexes If you are running a query and it's slow, it could benefit from indexes. Indexes are often the most useful when created on columns as part of the Join clause and the Where clause. So, check which columns are used there, check if they have indexes, and create them if they don't.
WAITING: Queries can be slow because they're waiting on a bottleneck for a long time. See a detailed list of bottlenecks in types of Waits. RUNNING: Queries can be slow because they're running (executing) for a long time. In other words, these queries are actively using CPU resources.
Please consider this query:
SELECT t1.season_id, t1.game_id, t1.player_id, t1.player_name, AVG(COALESCE(t2.points, 0)) AS average_player_points
FROM player_gamelogs t1
LEFT JOIN player_gamelogs t2 ON
t1.game_id > t2.game_id
AND t1.player_id = t2.player_id
AND t1.season_id = t2.season_id
GROUP BY
t1.season_id, t1.game_id, t1.player_id, t1.player_name
ORDER BY t1.player_name, t1.game_id;
Notes:
Group by
already sorts by grouped columns. If you can, avoid ordering afterwards as it generates useless overhead. As outlined in the comments, this is not an official behavior and the outcome of assuming its consistency over time should be pondered vs the risk of suddenly losing sorting.
Your query is fine as written:
SELECT game_id, player_id, player_name,
(SELECT avg(t2.points)
FROM player_gamelogs t2
WHERE t2.game_id < t1.game_id AND
t1.player_id = t2.player_id AND
t1.season_id = t2.season_id
) AS avg_points
FROM player_gamelogs t1
ORDER BY player_name, game_id;
But, for optimal performance you want two composite indexes on it: (player_id, season_id, game_id, points)
and (player_name, game_id, season_id)
.
The first index should speed the subquery. The second is for the outer order by
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With