I have a query that takes roughly four minutes to run on a high powered SSD server with no other notable processes running. I'd like to make it faster if possible.
The database stores a match history for a popular video game called Dota 2. In this game, ten players (five on each team) each select a "hero" and battle it out.
The intention of my query is to create a list of past matches along with how much of a "XP dependence" each team had, based on the heroes used. With 200,000 matches (and a 2,000,000 row matches-to-heroes relationship table) the query takes about four minutes. With 1,000,000 matches, it takes roughly 15.
I have full control of the server, so any configuration suggestions are also appreciated. Thanks for any help guys. Here are the details...
CREATE TABLE matches (
* match_id BIGINT UNSIGNED NOT NULL,
start_time INT UNSIGNED NOT NULL,
skill_level TINYINT NOT NULL DEFAULT -1,
* winning_team TINYINT UNSIGNED NOT NULL,
PRIMARY KEY (match_id),
KEY start_time (start_time),
KEY skill_level (skill_level),
KEY winning_team (winning_team));
CREATE TABLE heroes (
* hero_id SMALLINT UNSIGNED NOT NULL,
name CHAR(40) NOT NULL DEFAULT '',
faction TINYINT NOT NULL DEFAULT -1,
primary_attribute TINYINT NOT NULL DEFAULT -1,
group_index TINYINT NOT NULL DEFAULT -1,
match_count BIGINT UNSIGNED NOT NULL DEFAULT 0,
win_count BIGINT UNSIGNED NOT NULL DEFAULT 0,
* xp_from_wins BIGINT UNSIGNED NOT NULL DEFAULT 0,
* team_xp_from_wins BIGINT UNSIGNED NOT NULL DEFAULT 0,
xp_from_losses BIGINT UNSIGNED NOT NULL DEFAULT 0,
team_xp_from_losses BIGINT UNSIGNED NOT NULL DEFAULT 0,
gold_from_wins BIGINT UNSIGNED NOT NULL DEFAULT 0,
team_gold_from_wins BIGINT UNSIGNED NOT NULL DEFAULT 0,
gold_from_losses BIGINT UNSIGNED NOT NULL DEFAULT 0,
team_gold_from_losses BIGINT UNSIGNED NOT NULL DEFAULT 0,
included TINYINT UNSIGNED NOT NULL DEFAULT 0,
PRIMARY KEY (hero_id));
CREATE TABLE matches_heroes (
* match_id BIGINT UNSIGNED NOT NULL,
player_id INT UNSIGNED NOT NULL,
* hero_id SMALLINT UNSIGNED NOT NULL,
xp_per_min SMALLINT UNSIGNED NOT NULL,
gold_per_min SMALLINT UNSIGNED NOT NULL,
position TINYINT UNSIGNED NOT NULL,
PRIMARY KEY (match_id, hero_id),
KEY match_id (match_id),
KEY player_id (player_id),
KEY hero_id (hero_id),
KEY xp_per_min (xp_per_min),
KEY gold_per_min (gold_per_min),
KEY position (position));
Query
SELECT
matches.match_id,
SUM(CASE
WHEN position < 5 THEN xp_from_wins / team_xp_from_wins
ELSE 0
END) AS radiant_xp_dependence,
SUM(CASE
WHEN position >= 5 THEN xp_from_wins / team_xp_from_wins
ELSE 0
END) AS dire_xp_dependence,
winning_team
FROM
matches
INNER JOIN
matches_heroes
ON matches.match_id = matches_heroes.match_id
INNER JOIN
heroes
ON matches_heroes.hero_id = heroes.hero_id
GROUP BY
matches.match_id
Sample Results
match_id | radiant_xp_dependence | dire_xp_dependence | winning_team
2298874871 | 1.0164 | 0.9689 | 1
2298884079 | 0.9932 | 1.0390 | 0
2298885606 | 0.9877 | 1.0015 | 1
EXPLAIN
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | heroes | ALL | PRIMARY | NULL | NULL | NULL | 111 | Using temporary; Using filesort
1 | SIMPLE | matches_heroes | ref | PRIMARY,match_id,hero_id | hero_id | 2 | dota_2.heroes.hero_id | 3213 |
1 | SIMPLE | matches | eq_ref | PRIMARY | PRIMARY | 8 | dota_2.matches_heroes.match_id | 1 |
Machine Specs
Database
In all likelihood, the main performance driver is the GROUP BY
. Sometimes, in MySQL, it can be faster to use correlated subuqeries. So, try writing the query like this:
SELECT m.match_id,
(SELECT SUM(h.xp_from_wins / h.team_xp_from_wins)
FROM matches_heroes mh INNER JOIN
heroes h
ON mh.hero_id = h.hero_id
WHERE m.match_id = mh.match_id AND mh.position < 5
) AS radiant_xp_dependence,
(SELECT SUM(h.xp_from_wins / h.team_xp_from_wins)
FROM matches_heroes mh INNER JOIN
heroes h
ON mh.hero_id = h.hero_id
WHERE m.match_id = mh.match_id AND mh.position >= 5
) AS dire_xp_dependence,
m.winning_team
FROM matches m;
Then, you want indexes on:
matches_heroes(match_id, position)
heroes(hero_id, xp_from_wins, team_xp_from_wins)
For completeness, you might want this index as well:
matches(match_id, winning_team)
This would be more important if you added order by match_id
to the query.
As has already been mentioned in a comment; there is little you can do, because you select all data from the table. The query looks perfect.
The one idea that comes to mind are covering indexes. With indexes containing all data needed for the query, the tables themselves don't have to be accessed anymore.
CREATE INDEX matches_quick ON matches(match_id, winning_team);
CREATE INDEX heroes_quick ON heroes(hero_id, xp_from_wins, team_xp_from_wins);
CREATE INDEX matches_heroes_quick ON matches_heroes (match_id, hero_id, position);
There is no guarantee for this to speed up your query, as you are still reading all data, so running through the indexes may be just as much work as reading the tables. But there is a chance that the joins will be faster and there would probably be less physical read. Just give it a try.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With