The problem:
We're getting stock prices and trades from a provider, and to speed things up we cache the trades as they come in (1 trade per second per stock is not a lot). We've got around 2,000 stocks, so technically we're expecting as many as 120,000 trades per minute (2,000 * 60). Now, these prices are realtime, but to avoid paying licensing fees for showing this data to customers, we need to show the prices with a 15-minute delay. (We need the realtime prices internally, which is why we've bought and pay for them; they are NOT cheap!)
I feel like I've tried everything, and I've run into an uncountable number of problems.
Things I've tried:
1:
Run a cron job every 15 seconds that runs a query to find, for each stock, the latest trade that is more than 15 minutes old, so that its ID can be looked up (for joins):
SELECT
    MAX(`time`) as `max_time`,
    `stock_id`
FROM
    `stocks_trades`
WHERE
    `time` <= DATE_SUB(NOW(), INTERVAL 15 MINUTE)
AND
    `time` > '0000-00-00 00:00:00'
GROUP BY
    `stock_id`
This works very fast (1.8 seconds with ~2,000,000 rows), but the following is very slow:
SELECT
    st.id,
    st.stock_id
FROM
    (
        SELECT
            MAX(`time`) as `max_time`,
            `stock_id`
        FROM
            `stocks_trades`
        WHERE
            `time` <= DATE_SUB(NOW(), INTERVAL 15 MINUTE)
        AND
            `time` > '0000-00-00 00:00:00'
        GROUP BY
            `stock_id`
    ) as `tmp`
INNER JOIN
    `stocks_trades` as `st`
ON
    (tmp.max_time = st.time AND tmp.stock_id = st.stock_id)
GROUP BY
    `stock_id`
...that takes ~180-200 seconds, which is WAY too slow. There's an index on both time and stock_id (individually).
2:
Switch between InnoDB/MyISAM. I'd think I would need InnoDB (we're inserting A LOT of rows from multiple threads, we don't want to block between each insert) - InnoDB seems faster at inserting, but WAY slower at reading (we require both, obviously).
3:
Optimize tables every day. Still slow.
What I think might help:
ints instead of DateTime. Perhaps (since the markets are open from 9-22) keep a custom int time, which would be "seconds since 9 o'clock this morning" and use the same method as above (it seems to make some difference, albeit not a lot).
SQL Server uses Julian dates so your 30 means "30 calendar days". getdate() - 0.02083 means "30 minutes ago".
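Returning to the "seconds since 9 o'clock this morning" idea from the question, here is a minimal Python sketch of how such an int time could be computed, assuming a 09:00 open in local time. The names `MARKET_OPEN`, `seconds_since_open` and `delayed_cutoff` are made up for illustration and are not part of the original setup.

```python
from datetime import datetime, time

# Assumption: the market opens at 09:00 local time, as the question suggests.
MARKET_OPEN = time(9, 0)

def seconds_since_open(ts: datetime) -> int:
    """Whole seconds between 09:00 on ts's own date and ts."""
    open_dt = datetime.combine(ts.date(), MARKET_OPEN)
    return int((ts - open_dt).total_seconds())

def delayed_cutoff(now: datetime, delay_seconds: int = 15 * 60) -> int:
    """Int time that is delay_seconds before now (negative before 09:15)."""
    return seconds_since_open(now) - delay_seconds

if __name__ == "__main__":
    print(seconds_since_open(datetime(2024, 1, 1, 22, 0, 0)))
    print(delayed_cutoff(datetime(2024, 1, 1, 10, 0, 0)))
```

For a 9-22 session the value stays between 0 and 46,800, so it fits in a small integer column, but the trade date would then have to be stored separately.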
Here's the SQL query to select records for the last 10 minutes. In the query below we select those records where last_seen falls within the past 10 minutes. We use the system function NOW() to get the current datetime value, and the INTERVAL clause to shift last_seen by 10 minutes before comparing.
SELECT col1, col2, col3 FROM table WHERE DATE_ADD(last_seen, INTERVAL 10 MINUTE) >= NOW();
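An equivalent way to write this filter is to compute the cutoff once and compare the bare column against it (last_seen >= cutoff). That form generally lets MySQL use an ordinary index on the column, which wrapping the column in DATE_ADD() does not. The sketch below shows the idea with Python's `sqlite3` module as a stand-in database; the `visits` table, its rows and the `seen_since` helper are hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta

def seen_since(conn, now, minutes=10):
    """Return ids of rows whose last_seen is within `minutes` of `now`."""
    # Compute the cutoff once, outside the query, so the column stays bare.
    cutoff = (now - timedelta(minutes=minutes)).strftime("%Y-%m-%d %H:%M:%S")
    sql = "SELECT id FROM visits WHERE last_seen >= ? ORDER BY id"
    return conn.execute(sql, (cutoff,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (id INTEGER PRIMARY KEY, last_seen TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_last_seen ON visits (last_seen)")
conn.executemany(
    "INSERT INTO visits (id, last_seen) VALUES (?, ?)",
    [
        (1, "2024-01-01 11:45:00"),  # older than 10 minutes
        (2, "2024-01-01 11:52:00"),
        (3, "2024-01-01 11:59:59"),
    ],
)

recent = seen_since(conn, datetime(2024, 1, 1, 12, 0, 0))
print(recent)
```

Timestamps are stored as ISO-style strings here, which sort chronologically; in MySQL the column would normally be a DATETIME.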
Here is the syntax that we can use to get the latest date records in SQL Server. Select column_name, .. From table_name Order By date_column Desc; Now, let's use the given syntax to select the last 10 records from our sample table.
Since you are joining against your subquery on two columns (stock_id, time), MySQL ought to be able to make use of a compound index across both of them, while it cannot make use of either of the individual column indices you already have.
ALTER TABLE `stocks_trades` ADD INDEX `idx_stock_id_time` (`stock_id`, `time`)
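As a rough illustration of the join this index is meant to serve, the sketch below rebuilds a tiny `stocks_trades` table in SQLite through Python's `sqlite3` module. SQLite is only a stand-in here: the question uses MySQL, whose optimizer may behave differently, so this checks the shape of the query and not its speed. The sample rows and the `latest_delayed_trades` helper are invented for the example, and the cutoff is passed in as a parameter in place of NOW().

```python
import sqlite3

def latest_delayed_trades(conn, cutoff):
    """Return (id, stock_id) of each stock's newest trade at or before cutoff."""
    sql = """
        SELECT st.id, st.stock_id
        FROM (
            SELECT stock_id, MAX(time) AS max_time
            FROM stocks_trades
            WHERE time <= ?
            GROUP BY stock_id
        ) AS tmp
        INNER JOIN stocks_trades AS st
            ON st.stock_id = tmp.stock_id AND st.time = tmp.max_time
        ORDER BY st.stock_id
    """
    return conn.execute(sql, (cutoff,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stocks_trades ("
    "id INTEGER PRIMARY KEY, stock_id INTEGER NOT NULL, time TEXT NOT NULL)"
)
# Compound index over both join columns, mirroring the ALTER TABLE above.
conn.execute("CREATE INDEX idx_stock_id_time ON stocks_trades (stock_id, time)")
conn.executemany(
    "INSERT INTO stocks_trades (id, stock_id, time) VALUES (?, ?, ?)",
    [
        (1, 1, "2024-01-01 09:58:00"),
        (2, 1, "2024-01-01 09:59:30"),
        (3, 1, "2024-01-01 10:05:00"),  # newer than the cutoff, must be skipped
        (4, 2, "2024-01-01 09:45:00"),
        (5, 2, "2024-01-01 10:10:00"),  # newer than the cutoff, must be skipped
    ],
)

rows = latest_delayed_trades(conn, "2024-01-01 10:00:00")
print(rows)
```

If two trades for the same stock share the same timestamp, this join returns both rows; the question's outer GROUP BY is one way to collapse them to a single row per stock.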