Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL not responding for simple queries when using INNER JOIN. especially "Sending data" takes too much time

I have two tables; songs with 48 959 281 rows, and popular_songs with 5 721 117 rows.

This is structure of that two tables:

CREATE TABLE songs (
  songId       BIGINT(20)    NOT NULL,
  songName     VARCHAR(1000) NOT NULL,
  songDuration BIGINT(20)    NOT NULL,
  songPreview  VARCHAR(1000) NOT NULL,
  PRIMARY KEY (songId),
  INDEX (songDuration)
)
  ENGINE = InnoDB
  DEFAULT CHARSET = utf8
  DEFAULT COLLATE utf8_general_ci;

CREATE TABLE popular_songs (
  storeFrontId BIGINT(20) NOT NULL,
  genreId      BIGINT(20) NOT NULL,
  songId       BIGINT(20) NOT NULL,
  songRank     INT(11)    NOT NULL,
  INDEX (storeFrontId),
  INDEX (genreId),
  INDEX (songId),
  INDEX (songRank),
  CONSTRAINT popular_song UNIQUE (storeFrontId, genreId, songId),
  FOREIGN KEY (storeFrontId) REFERENCES storefront (storeFrontId),
  FOREIGN KEY (genreId) REFERENCES genre (genreId),
  FOREIGN KEY (songId) REFERENCES songs (songId)
)
  ENGINE = InnoDB
  DEFAULT CHARSET = utf8
  DEFAULT COLLATE utf8_general_ci;

When I'm running this below two quires everything works fine.

mysql> SELECT count(*) FROM songs;
+----------+
| count(*) |
+----------+
| 48959281 |
+----------+
1 row in set (9.10 sec)

mysql> SHOW PROFILE;
+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| starting                       | 0.000008 |
| Waiting for query cache lock   | 0.000002 |
| checking query cache for query | 0.000019 |
| checking permissions           | 0.000003 |
| Opening tables                 | 0.000011 |
| System lock                    | 0.000004 |
| Waiting for query cache lock   | 0.000017 |
| init                           | 0.000006 |
| optimizing                     | 0.000003 |
| statistics                     | 0.000004 |
| preparing                      | 0.000004 |
| executing                      | 0.000003 |
| Sending data                   | 9.100444 |
| end                            | 0.000012 |
| query end                      | 0.000005 |
| closing tables                 | 0.000008 |
| freeing items                  | 0.000007 |
| Waiting for query cache lock   | 0.000002 |
| freeing items                  | 0.000012 |
| Waiting for query cache lock   | 0.000002 |
| freeing items                  | 0.000001 |
| storing result in query cache  | 0.000002 |
| logging slow query             | 0.000001 |
| cleaning up                    | 0.000003 |
+--------------------------------+----------+
24 rows in set (0.00 sec)

mysql> SELECT count(*) FROM popular_songs;
+----------+
| count(*) |
+----------+
|  5721117 |
+----------+
1 row in set (1.34 sec)

mysql> SHOW PROFILE;
+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| starting                       | 0.000012 |
| Waiting for query cache lock   | 0.000004 |
| checking query cache for query | 0.000037 |
| checking permissions           | 0.000006 |
| Opening tables                 | 0.000017 |
| System lock                    | 0.000007 |
| Waiting for query cache lock   | 0.000031 |
| init                           | 0.000010 |
| optimizing                     | 0.000017 |
| statistics                     | 0.000004 |
| preparing                      | 0.000004 |
| executing                      | 0.000003 |
| Sending data                   | 1.343991 |
| end                            | 0.000010 |
| query end                      | 0.000005 |
| closing tables                 | 0.000007 |
| freeing items                  | 0.000007 |
| Waiting for query cache lock   | 0.000002 |
| freeing items                  | 0.000011 |
| Waiting for query cache lock   | 0.000001 |
| freeing items                  | 0.000001 |
| storing result in query cache  | 0.000002 |
| logging slow query             | 0.000001 |
| cleaning up                    | 0.000003 |
+--------------------------------+----------+
24 rows in set (0.00 sec)

When I'm running these below queries MySQL is not responding. I don't understand what the reason is.

SELECT COUNT(*)
FROM songs
INNER JOIN popular_songs ON popular_songs.songId = songs.songId

SELECT COUNT(*)
FROM songs
INNER JOIN popular_songs ON popular_songs.songId = songs.songId
WHERE songs.songDuration > 0

Update:

MySQL responded with above two queries. But it takes 250-300 seconds on Sending data. How to optimize this.

EXPLAIN EXTENDED report:

mysql> EXPLAIN EXTENDED
    -> SELECT COUNT(*)
    -> FROM songs
    -> INNER JOIN popular_songs ON popular_songs.songId = songs.songId;
+----+-------------+---------------+--------+---------------+----------+---------+-----------------------------+---------+----------+-------------+
| id | select_type | table         | type   | possible_keys | key      | key_len | ref                         | rows    | filtered | Extra       |
+----+-------------+---------------+--------+---------------+----------+---------+-----------------------------+---------+----------+-------------+
|  1 | SIMPLE      | popular_songs | index  | songId        | songRank | 4       | NULL                        | 6449163 |   100.00 | Using index |
|  1 | SIMPLE      | songs         | eq_ref | PRIMARY       | PRIMARY  | 8       | itunes.popular_songs.songId |       1 |   100.00 | Using index |
+----+-------------+---------------+--------+---------------+----------+---------+-----------------------------+---------+----------+-------------+
2 rows in set, 1 warning (0.01 sec)

mysql> EXPLAIN EXTENDED
    -> SELECT COUNT(*)
    -> FROM songs
    -> INNER JOIN popular_songs ON popular_songs.songId = songs.songId
    -> WHERE songs.songDuration > 0;
+----+-------------+---------------+--------+----------------------+----------+---------+-----------------------------+---------+----------+-------------+
| id | select_type | table         | type   | possible_keys        | key      | key_len | ref                         | rows    | filtered | Extra       |
+----+-------------+---------------+--------+----------------------+----------+---------+-----------------------------+---------+----------+-------------+
|  1 | SIMPLE      | popular_songs | index  | songId               | songRank | 4       | NULL                        | 6449163 |   100.00 | Using index |
|  1 | SIMPLE      | songs         | eq_ref | PRIMARY,songDuration | PRIMARY  | 8       | itunes.popular_songs.songId |       1 |   100.00 | Using where |
+----+-------------+---------------+--------+----------------------+----------+---------+-----------------------------+---------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)

When using INDEX songId for joing two tables:

mysql> SELECT COUNT(*) FROM songs INNER JOIN popular_songs USE INDEX FOR JOIN (songId) ON popular_songs.songId = songs.songId;
+----------+
| COUNT(*) |
+----------+
|  5721117 |
+----------+
1 row in set (25.35 sec)

mysql> SHOW PROFILE;
+--------------------------------+-----------+
| Status                         | Duration  |
+--------------------------------+-----------+
| starting                       |  0.000011 |
| Waiting for query cache lock   |  0.000004 |
| checking query cache for query |  0.000045 |
| checking permissions           |  0.000004 |
| checking permissions           |  0.000004 |
| Opening tables                 |  0.000023 |
| System lock                    |  0.000007 |
| Waiting for query cache lock   |  0.000030 |
| init                           |  0.000021 |
| optimizing                     |  0.000011 |
| statistics                     |  0.000020 |
| preparing                      |  0.000012 |
| executing                      |  0.000005 |
| Sending data                   | 25.350160 |
| end                            |  0.000019 |
| query end                      |  0.000005 |
| closing tables                 |  0.000024 |
| freeing items                  |  0.000022 |
| Waiting for query cache lock   |  0.000002 |
| freeing items                  |  0.000016 |
| Waiting for query cache lock   |  0.000001 |
| freeing items                  |  0.000001 |
| storing result in query cache  |  0.000002 |
| logging slow query             |  0.000001 |
| logging slow query             |  0.000005 |
| cleaning up                    |  0.000003 |
+--------------------------------+-----------+
26 rows in set (0.00 sec)

It's pretty nice. But, still it's take 25 seconds on Sending data.

Explain extended for above query:

mysql> EXPLAIN EXTENDED SELECT COUNT(*) FROM songs INNER JOIN popular_songs USE INDEX FOR JOIN (songId) ON popular_songs.songId = songs.songId;
+----+-------------+---------------+--------+---------------+---------+---------+-----------------------------+---------+----------+-------------+
| id | select_type | table         | type   | possible_keys | key     | key_len | ref                         | rows    | filtered | Extra       |
+----+-------------+---------------+--------+---------------+---------+---------+-----------------------------+---------+----------+-------------+
|  1 | SIMPLE      | popular_songs | index  | songId        | songId  | 8       | NULL                        | 2684407 |   100.00 | Using index |
|  1 | SIMPLE      | songs         | eq_ref | PRIMARY       | PRIMARY | 8       | itunes.popular_songs.songId |       1 |   100.00 | Using index |
+----+-------------+---------------+--------+---------------+---------+---------+-----------------------------+---------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
like image 794
Chandra Nakka Avatar asked Feb 18 '16 12:02

Chandra Nakka


4 Answers

My guess is that the index on songDuration is confusing the optimizer. You might try phrasing the query like this:

SELECT COUNT(*)
FROM songs s
WHERE EXISTS (SELECT 1
              FROM popular_songs ps
              WHERE ps.songId = s.songId
             ) AND
      s.songDuration > 0;

The index on songDuration would be better to include songId as well: songs(songDuration, songId).

like image 147
Gordon Linoff Avatar answered Sep 17 '22 22:09

Gordon Linoff


It picked songRank because it was the smallest index (INT, not BIGINT).

"Sending data" is one of about 2 useless things that come from that tool. Since it rarely provides anything useful, I don't use it.

Often timing is confusing because of caching. You appear to have the Query cache turned off (good). If the data (or index) is not currently cached in RAM (InnoDB's buffer_pool), the timings are inflated due to I/O. The times you have look like they were CPU bound. Running a query twice avoids the I/O question.

There are several things to do to shrink the tables; this tends to make them run faster.

popular_songs has not explicit PRIMARY KEY, so a 6-byte PK was invented. Since you have CONSTRAINT popular_song UNIQUE (storeFrontId, genreId, songId), that is a 'natural' PK. Get rid of the constraint and add that as the PK. (We may need to rearrange the columns; we'll see.)

Use INT UNSIGNED, not BIGINT for all the ids; that cuts the size in half.

songRank allows -2 billion to + 2 billion and occupies 4 bytes. You can probably find a smaller field, and may want to use UNSIGNED. For example, SMALLINT UNSIGNED is 2 bytes and allows values 0..65535.

As already mentioned, songDuration is using too big a field (unless it is measured in nanoseconds).

INDEX (storeFrontId) is redundant with the UNIQUE index, so it could be removed.

But the real issue is that the tables are big, and JOINs cost something. I do not think you can get the queries to run much faster than what you are seeing.

That leads to another question -- why do these queries matter? They sound like one-time queries, not something that a user sitting at a UI is waiting for every day. When dealing with long-running queries against large tables, it is often best to re-think the requirements.

If the queries are repeatedly being performed, can't they be run once a day and 'cached'? I suspect the tables are not growing by more than 1% per day, implying that the result sets will be correct to about 1%; isn't that "close enough"?

If there are other purposes for the queries, perhaps we can come up with a different way to solve the real problem.

like image 22
Rick James Avatar answered Sep 16 '22 22:09

Rick James


Note that select count(*) from table is actually not reading the table at all. It is picking any available unique index (e.g. a primary key), and return the number of rows in that index (which in fact is equal to the number of the rows of a table).

However as soon as you join two tables, data has to be read. You have huge data, it makes sense to perform the join and the where criteria from the indexes instead of reading in all the data from the disk.

  • popular_songs.songid shall be also a primary key

This helps on SELECT COUNT(*) FROM songs INNER JOIN popular_songs ON popular_songs.songId = songs.songId

  • song.songDuration must have a non-unique index, otherwise you read all 48 million records just to evaluate the where criteria. (this you already have)

This helps on SELECT COUNT(*) FROM songs INNER JOIN popular_songs ON popular_songs.songId = songs.songId WHERE songs.songDuration > 0

  • avoid where criterias with low cardinality

songDuration is mostly some number, for very few songs it is zero. Therefore using it as a where criteria will not use the index (you had to force the index yourselves). It is better idea to have a boolean field telling if the song has zero duration (is zero length song a valid use case? I guess not? The songDuration can be a nullable field and contain null if the duratiuon is not known.) But if you go for the boolean field or a 0/1 field, you can benefit of bitmap indexes if your database supports that.

Some rules of thumb: avoid reading data from the tables if not needed:

  • prefer unique indexes over non-unique indexes
  • if you join two tables, the joining column in both tables must have an index. If you can have unique indexes that is better, if the column is actually the primary key it is the best.
  • all columns in the where criterias must be indexed. Otherwise to evaluate the where criteria, all rows must be read from the disk.
  • use compound indexes if possibe. E.g. if you want to speed up a query which contains where songId=xxx and genre=yyy then make one index which contains both songid AND genre instead of indexing the columns individually. This particular index can be a unique index, since songId is unique - reading data from unique indexes is faster
  • indexing columns in advance, and indexing individual columns usually makes no sense, it is a premature optimization. Indexes are for speeding up your queries, and you have queries once you have an application. Then you can see which query is slow, and speed it up using the methods above.
  • avoid of adding all kinds of possible indexes just "because it might be good at some point". Indexes makes selects quick, but slows update and insert and delete statements.
like image 40
Gee Bee Avatar answered Sep 20 '22 22:09

Gee Bee


It depends very much on your search criteria. If you want to search over song.songDuration try to create composite index:

ALTER TABLE song ADD INDEX songDuration (songId, songDuration);

That way the MySQL optimizer will use only the index instead of searching for the row just to fetch songId.

Also you could try to set the isolation level of READ UNCOMMITED, if you think your result set could live with that. It will speed up your query a lot!

SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED ;

EDIT

I have missed that you are storing huge VARCHARs inside the table. I'm currently not working on MySQL but I remember that MySQL keeps the whole VARCHAR inside the table no matter what size it is. You could improve the performance a lot if change a songPreview to TEXT type for example.

TEXT, BLOB and CBLOB fields are kept outside the table, so the join and search by other fields should permorms better.

If you really think there are much songs with title around 1000 characters you could also make songName TEXT type and use a FULLTEXT index to search. If your song titles are not do big, you could use CHAR just to keep the row size fixed and to improve the table performance.

You could read more about why to use TEXT against large VARCHAR here High Performance MySQL

About FULLTEXT index search - I have used it in real project over TEXT field, so I could confirm it performs better than VARCHAR with LIKE.

Your tables are really huge, so if you want to change the types from VARCHAR(1000) to TEXT do not alter the table. It will perform better if you create new tables and copy the data in new tables. Altering the old one will take forever. You could speedup the process if you don't create indexes and constraints at the table definition but after the old data is copied inside the table.

like image 42
Mita Avatar answered Sep 19 '22 22:09

Mita