A really basic table containing quotes on instruments across multiple exchanges leveraging the TokuDB storage engine:
CREATE TABLE `quotes` (
`ticker` char(4) NOT NULL,
`timestamp` time(3) NOT NULL,
`price` decimal(7,2) unsigned NOT NULL,
`size` smallint(5) unsigned NOT NULL,
`exchange` char(3) NOT NULL,
KEY `best_price` (`ticker`,`timestamp`,`exchange`,`price`),
KEY `best_size` (`exchange`,`ticker`,`price`,`timestamp`)
) ENGINE=TokuDB
Whenever I query for best price across all exchanges it always uses a temporary table. The presence of exchange
and price
in the index only appears to produce an index scan being equivalent to a clustered key on (ticker, timestamp)
in TokuDB.
EXPLAIN SELECT max(price),exchange
FROM quotes
WHERE
ticker="A" AND
timestamp BETWEEN "15:15:22.328961" AND "15:17:22.328961"
GROUP BY exchange
ORDER BY NULL \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: quotes
type: range
possible_keys: best_price
key: best_price
key_len: 9
ref: NULL
rows: 2690
Extra: Using where; Using index; Using temporary
Is it possible to define a configuration that would not use a temporary table? This appears only possible when dropping the timestamp
article:
EXPLAIN SELECT max(price),exchange
FROM quotes
WHERE
ticker="A"
GROUP BY exchange
ORDER BY NULL \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: quotes
type: range
possible_keys: best_price
key: best_size
key_len: 7
ref: NULL
rows: 96
Extra: Using where; Using index for group-by
Example output from best price query:
+------------+----------+
| max(price) | exchange |
+------------+----------+
| 41.06 | BTY |
| 41.06 | DEA |
| 41.07 | NYS |
| 41.07 | THM |
| 41.06 | PSE |
| 41.07 | BAT |
| 41.06 | DEX |
| 41.06 | BOS |
| 41.06 | ADC |
| 41.06 | XPH |
+------------+----------+
10 rows in set (0.01 sec)
And the bottleneck (3ms) is processing each individual row in the time range:
+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.000071 |
| checking permissions | 0.000005 |
| Opening tables | 0.000016 |
| After opening tables | 0.000006 |
| System lock | 0.000014 |
| Table lock | 0.000002 |
| After table lock | 0.000005 |
| init | 0.000038 |
| optimizing | 0.000024 |
| statistics | 0.000155 |
| preparing | 0.000028 |
| executing | 0.000003 |
| Copying to tmp table | 0.000031 |
| Copying to tmp table | 0.003381 |
| Sending data | 0.000017 |
| end | 0.000004 |
| removing tmp table | 0.000020 |
| end | 0.000002 |
| query end | 0.000005 |
| closing tables | 0.000005 |
| freeing items | 0.000006 |
| updating status | 0.000011 |
| cleaning up | 0.000002 |
+----------------------+----------+
The time range contains 2316 rows, broken down by exchange:
+----------+----------+
| exchange | count(*) |
+----------+----------+
| ADC | 71 |
| BAT | 298 |
| BOS | 129 |
| BTY | 266 |
| DEA | 153 |
| DEX | 60 |
| NYS | 530 |
| PSE | 325 |
| THM | 453 |
| XPH | 31 |
+----------+----------+
I've tried the crazy and added every permutation of covering index and MariaDB cannot find a better key. Are there other databases I should be looking at instead?
Example data set for time range and ticker symbol: http://pastebin.com/b5RcTXAs
The answer is all about optimisation. MySQL has chosen to use a temporary table because it is considered more sensible than using the indexes to fetch each exchange. If one uses a clustered index across exchange,ticker,timestamp
then the query can operate without a temporary:
MariaDB [trth]> explain SELECT min(ask_price),exchange
FROM quotes
USE INDEX (exchange_ticker_timestamp)
WHERE exchange IN ("NYS","BOS","CIN","ADC","DEX","DEA","MID","PSE","THM","WCB","BAT","XPH","BTY") AND
ticker="A" AND
timestamp BETWEEN "15:15:22.328961" AND "15:17:22.328961"
GROUP BY exchange
ORDER BY NULL \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: quotes
type: range
possible_keys: exchange_ticker_timestamp
key: exchange_ticker_timestamp
key_len: 10
ref: NULL
rows: 2589
Extra: Using where; Using index
MariaDB [trth]> show profile;
+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.000079 |
| checking permissions | 0.000006 |
| Opening tables | 0.000014 |
| After opening tables | 0.000011 |
| System lock | 0.000014 |
| Table lock | 0.000003 |
| After table lock | 0.000005 |
| init | 0.000043 |
| optimizing | 0.000019 |
| statistics | 0.000234 |
| preparing | 0.000027 |
| executing | 0.000008 |
| Sorting result | 0.000002 |
| Sending data | 0.002985 |
| end | 0.000006 |
| query end | 0.000010 |
| closing tables | 0.000006 |
| freeing items | 0.000007 |
| updating status | 0.000138 |
| cleaning up | 0.000004 |
+----------------------+----------+
Compare with a temporary table grouping:
MariaDB [trth]> explain SELECT min(ask_price),exchange
FROM quotes
WHERE ticker="A" AND
timestamp BETWEEN "15:15:22.328961" AND "15:17:22.328961"
GROUP BY exchange
ORDER BY NULL \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: quotes
type: range
possible_keys: ticker_timestamp
key: ticker_timestamp
key_len: 9
ref: NULL
rows: 1515
Extra: Using where; Using temporary
MariaDB [trth]> show profile;
+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.000091 |
| checking permissions | 0.000009 |
| Opening tables | 0.000037 |
| After opening tables | 0.000009 |
| System lock | 0.000052 |
| Table lock | 0.000004 |
| After table lock | 0.000009 |
| init | 0.000049 |
| optimizing | 0.000025 |
| statistics | 0.000144 |
| preparing | 0.000039 |
| executing | 0.000003 |
| Copying to tmp table | 0.000040 |
| Copying to tmp table | 0.004674 |
| Sending data | 0.000020 |
| end | 0.000003 |
| removing tmp table | 0.000015 |
| end | 0.000003 |
| query end | 0.000004 |
| closing tables | 0.000006 |
| freeing items | 0.000006 |
| updating status | 0.000204 |
| cleaning up | 0.000004 |
+----------------------+----------+
What is interesting to note here is that the first query scans more rows but is executed faster than the second.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With