I have a table storing prices over time for ~35k items, sampled every 15 minutes and retained for 2 weeks. Roughly, that translates to about 35 million rows in the table. I'm trying to perform the simplest of queries:
SELECT buy_price, sell_price, created_at FROM price_archive WHERE item_id = X
The first, uncached run of that query takes around 4-7 seconds to return ~1300 rows (per item). This seems ridiculously slow for something so trivial for a database, especially considering there is an index on the item_id column.
The table has 35k rows inserted every 15 minutes, and each day a task deletes rows older than 2 weeks minus 1 day, to stop the table from growing too much. I suspect these daily bulk deletes fragment the table quite a bit, but would that fragmentation cause the query to perform this poorly? If so, would partitioning on created_at, so old data can be deleted by dropping a partition, make it better?
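For reference, a sketch of what that partitioning could look like. This is an assumption about your schema, not your actual DDL: MySQL requires the partitioning column (created_at here) to be part of every unique key, including the primary key, and the partition names and dates below are hypothetical placeholders.

```sql
-- Sketch: range-partition by day on created_at (dates are placeholders).
-- Assumes created_at is DATETIME and is included in the primary key.
ALTER TABLE price_archive
PARTITION BY RANGE (TO_DAYS(created_at)) (
    PARTITION p20240101 VALUES LESS THAN (TO_DAYS('2024-01-02')),
    PARTITION p20240102 VALUES LESS THAN (TO_DAYS('2024-01-03')),
    PARTITION pmax      VALUES LESS THAN MAXVALUE
);

-- Purging a day's data then becomes a near-instant metadata operation
-- instead of a row-by-row DELETE:
ALTER TABLE price_archive DROP PARTITION p20240101;
```

The daily purge task would add a new partition and drop the oldest one, rather than running a large DELETE.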
1306 rows in set (8.32 sec)
mysql> explain select * from price_archives where item_id = 743;
+----+-------------+----------------+------+---------------------------------+---------------------------------+---------+-------+------+-------------+
| id | select_type | table          | type | possible_keys                   | key                             | key_len | ref   | rows | Extra       |
+----+-------------+----------------+------+---------------------------------+---------------------------------+---------+-------+------+-------------+
|  1 | SIMPLE      | price_archives | ref  | index_price_archives_on_item_id | index_price_archives_on_item_id | 5       | const | 1305 | Using where |
+----+-------------+----------------+------+---------------------------------+---------------------------------+---------+-------+------+-------------+
This is your query:
SELECT buy_price, sell_price, created_at
FROM price_archive
WHERE item_id = X;
The best index for this query is the composite index price_archive(item_id, buy_price, sell_price, created_at). This is a "covering" index: it contains every column the query needs, so the query can be satisfied from the index alone. It comes with a downside, though: the extra columns may slow the inserts you are doing into the table. 35k rows every 15 minutes (~140k rows per hour) is a lot of data, but maintaining this index should not be too costly.
You are facing a not-uncommon challenge in databases. The problem with your query is that the 1300 or so rows being returned are scattered across different data pages. In all likelihood, the table does not fit into memory on your machine, so the query results in about 1300 random accesses to the files on disk. That explains the lag time of several seconds.
Another solution is to ensure that the data tables themselves fit into memory. Although the first uncached query will take a bit of time, subsequent queries should be quite fast.
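To check whether that is feasible, you can compare the table's footprint against the InnoDB buffer pool, assuming the table uses InnoDB:

```sql
-- Current buffer pool size, in bytes:
SELECT @@innodb_buffer_pool_size;

-- Approximate on-disk size of the table's data and indexes:
SELECT data_length, index_length
FROM information_schema.tables
WHERE table_name = 'price_archives';
```

If data_length + index_length exceeds the buffer pool, raising innodb_buffer_pool_size in your MySQL configuration (leaving headroom for the rest of the system) would let the working set stay in memory.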