Basically having some performance issues with queries, mainly to my largest table which holds call data.
The main query contains quite a few left joins & sub-selects, but in a scenario where I'm running a query where I expect back 1.3M calls to be returned, the query is just not doing it. Having to stop it at 7 minutes means there's definately a problem somewhere.
I've narrowed down the main query and tested the simplest sub-select join which is
SELECT
DateStart,
ID,
NumbID,
EffectiveFlag,
OrigNumber
FROM calls
WHERE
DateStart <= '2013-12-31'
AND DateStart >= '2013-01-01'
AND CallLength >= '00:00:00'
AND Direction = '1'
AND CustID IN (474,482,250,268,197,604,132,359,279,441,118,448,152,133,380,162,249,679,226,259,2450,2408,2451,2453,2439,2454,2444,2445,2452)
And even that query takes 4.5s - so when it's a sub-select in a query with other joins & sub-selected, I can imagine why the query as a whole is unusable.
The explain statement for the above query is
+----+-------------+-------+-------+-------------------------------------------------------------------------------------------------------+----------------------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-------------------------------------------------------------------------------------------------------+----------------------+---------+------+---------+-------------+
| 1 | SIMPLE | calls | range | idx_CustID,idx_DateStart,idx_CustID_DateStart,idx_CustID_TermNumber,idx_Direction | idx_CustID_DateStart | 7 | NULL | 1660009 | Using where |
+----+-------------+-------+-------+-------------------------------------------------------------------------------------------------------+----------------------+---------+------+---------+-------------+
The database schema of the calls table is
+-------------------+-------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+-------------+------+-----+---------------------+----------------+
| ID | int(11) | NO | PRI | NULL | auto_increment |
| CustID | int(11) | NO | MUL | 0 | |
| CarrID | int(11) | NO | MUL | NULL | |
| TariID | int(11) | NO | MUL | 0 | |
| CarrierRef | varchar(30) | NO | MUL | | |
| NumbID | int(11) | NO | MUL | 0 | |
| VlviID | int(11) | NO | MUL | NULL | |
| VcamID | int(11) | NO | MUL | NULL | |
| SomeID | int(11) | NO | MUL | NULL | |
| VlnsID | int(11) | NO | MUL | NULL | |
| NGNumber | varchar(12) | NO | | | |
| OrigNumber | varchar(16) | NO | MUL | NULL | |
| CLIRestrictedFlag | int(2) | NO | | NULL | |
| OrigLocality | varchar(11) | NO | MUL | | |
| OrigAreaCode | varchar(11) | NO | MUL | | |
| TermNumber | varchar(16) | NO | MUL | NULL | |
| BatchNumber | varchar(10) | NO | MUL | | |
| DateStart | date | NO | MUL | 0000-00-00 | |
| DateClear | date | NO | | 0000-00-00 | |
| TimeStart | time | NO | | 00:00:00 | |
| TimeClear | time | NO | | 00:00:00 | |
| CallLength | time | NO | | 00:00:00 | |
| RingLength | time | NO | | 00:00:00 | |
| EffectiveFlag | smallint(1) | NO | MUL | NULL | |
| UnansweredFlag | smallint(1) | NO | MUL | NULL | |
| EngagedFlag | smallint(1) | NO | | NULL | |
| RecID | int(11) | NO | MUL | NULL | |
| CreatedUserID | int(11) | NO | | 0 | |
| CreatedDatetime | datetime | NO | MUL | 0000-00-00 00:00:00 | |
| Direction | int(1) | NO | MUL | NULL | |
+-------------------+-------------+------+-----+---------------------+----------------+
The indexes on the calls table are
+-------+------------+---------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+---------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
| calls | 0 | PRIMARY | 1 | ID | A | 23905312 | NULL | NULL | | BTREE | |
| calls | 1 | idx_CustID | 1 | CustID | A | 1685 | NULL | NULL | | BTREE | |
| calls | 1 | idx_NumbID | 1 | NumbID | A | 37765 | NULL | NULL | | BTREE | |
| calls | 1 | idx_OrigNumber | 1 | OrigNumber | A | 5976328 | NULL | NULL | | BTREE | |
| calls | 1 | idx_OrigLocality | 1 | OrigLocality | A | 45019 | NULL | NULL | | BTREE | |
| calls | 1 | idx_OrigAreaCode | 1 | OrigAreaCode | A | 846 | NULL | NULL | | BTREE | |
| calls | 1 | idx_TermNumber | 1 | TermNumber | A | 232090 | NULL | NULL | | BTREE | |
| calls | 1 | idx_DateStart | 1 | DateStart | A | 4596 | NULL | NULL | | BTREE | |
| calls | 1 | idx_EffectiveFlag | 1 | EffectiveFlag | A | 2 | NULL | NULL | | BTREE | |
| calls | 1 | idx_UnansweredFlag | 1 | UnansweredFlag | A | 2 | NULL | NULL | | BTREE | |
| calls | 1 | idx_EngagedFlag | 1 | UnansweredFlag | A | 2 | NULL | NULL | | BTREE | |
| calls | 1 | idx_TariID | 1 | TariID | A | 110 | NULL | NULL | | BTREE | |
| calls | 1 | idx_CustID_DateStart | 1 | CustID | A | 1685 | NULL | NULL | | BTREE | |
| calls | 1 | idx_CustID_DateStart | 2 | DateStart | A | 919435 | NULL | NULL | | BTREE | |
| calls | 1 | idx_NumbID_DateStart | 1 | NumbID | A | 37765 | NULL | NULL | | BTREE | |
| calls | 1 | idx_NumbID_DateStart | 2 | DateStart | A | 5976328 | NULL | NULL | | BTREE | |
| calls | 1 | idx_RecID | 1 | RecID | A | 288015 | NULL | NULL | | BTREE | |
| calls | 1 | idx_CarrierRef | 1 | CarrierRef | A | 7968437 | NULL | NULL | | BTREE | |
| calls | 1 | idx_CustID_CallTermNumber | 1 | CustID | A | 1685 | NULL | NULL | | BTREE | |
| calls | 1 | idx_CustID_CallTermNumber | 2 | TermNumber | A | 246446 | NULL | NULL | | BTREE | |
| calls | 1 | idx_CreatedDatetime | 1 | CreatedDatetime | A | 771139 | NULL | NULL | | BTREE | |
| calls | 1 | idx_Direction | 1 | Direction | A | 2 | NULL | NULL | | BTREE | |
| calls | 1 | idx_VlviID | 1 | VlviID | A | 50539 | NULL | NULL | | BTREE | |
| calls | 1 | idx_SomeID | 1 | SomeID | A | 30 | NULL | NULL | | BTREE | |
| calls | 1 | idx_VcamID | 1 | VcamID | A | 64 | NULL | NULL | | BTREE | |
| calls | 1 | idx_VlnsID | 1 | VlnsID | A | 191 | NULL | NULL | | BTREE | |
| calls | 1 | idx_CarrID | 1 | CarrID | A | 4 | NULL | NULL | | BTREE | |
| calls | 1 | idx_BatchNumber | 1 | BatchNumber | A | 271651 | NULL | NULL | | BTREE | |
+-------+------------+---------------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
Something which I understand may be causing the performance, is the indexes on columns with a low cardinality. I know columns such as Direction which has a cardinality of 2 is actually probably worse of with an index in terms of performance, but that alone shouldn't be making the statement so slow.
In terms of the cardinality requirements to have a worthwhile index, is there a general cardinality percentage compared to total table records at which an index increases performance and when it reduces performance?
I understand that no one is going to be able to throw an answer at me that will change the query time from 4.5s to 0.01s, but any advice on either the query itself, the table schema, the indexes, or the hardware would be greatly appreciated.
Update:
@Sebas "could you please rerun the query AND explain plan without the part: AND CallLength >= '00:00:00' AND Direction = '1' please?"
+----+-------------+-------+-------+---------------------------------------------------------------------+----------------------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------------------------------------------------------------+----------------------+---------+------+--------+-------------+
| 1 | SIMPLE | calls | range | idx_CustID,idx_DateStart,idx_CustID_DateStart,idx_CustID_TermNumber | idx_CustID_DateStart | 7 | NULL | 724813 | Using where |
+----+-------------+-------+-------+---------------------------------------------------------------------+----------------------+---------+------+--------+-------------+
The MySQL Performance Schema is a feature for monitoring MySQL Server execution at a low level. The Performance Schema has these characteristics: The Performance Schema provides a way to inspect internal execution of the server at runtime.
Indexing makes columns faster to query by creating pointers to where data is stored within a database. Imagine you want to find a piece of information that is within a large database. To get this information out of the database the computer will look through every row until it finds it.
The performance schema is disabled by default for performance reasons. You can check its current status by looking at the value of the performance_schema system variable. The performance schema cannot be activated at runtime - it must be set when the server starts by adding the following line in your my.
Is your "DateStart" a truncated datetime -- keep date only? If not, you may want to build one with truncated value (by day, or hour), and use int datatype, which will make the index much smaller for faster query.
Or, another way to optimize (golden rule #1 don't do it, #2 do't do it now).
If and only if your date and PK are sync in sequence, you can build a external index of Range of StartDate <=> ID (PK).
and using below pattern
SELECT @start:=ID_START FROM ANOTHER_TABLE WHERE StartDate='2013-01-01'
SELECT @end:=ID_END FROM ANOTHER_TABLE WHERE StartDate='2013-12-31'
SELECT * FROM calls WHERE ID BETWEEN @start and @end AND CustId in (xxxxx) ....
By using above pattern, Mysql will know if has to scan only a segment of table.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With