Best indexing strategy for time series in a MySQL database with timestamps

I have a database that I'm taking care of containing pulse measurements.
The schema is like this:

id - monitorid - starttime - stoptime - pulses

Every monitor gives information every 10 minutes.
Currently that adds up to about 13 000 000 rows.

starttime and stoptime are varchar(10) columns holding Unix timestamps, which is probably not the most efficient choice for my case.

Almost all queries against this table are of the form 'WHERE starttime > $certaintime AND monitorid = $monid'. All of these queries are currently extremely slow.

I have an index on monitorid. I haven't put one on starttime or stoptime yet, because I figured it would hardly improve cardinality: each 10-minute slot is a new value. I'm not sure about this reasoning, though.

So, my question: how would one optimize this table for the range-like queries it mostly has to serve? Index starttime? Rebuild the table with dates instead of timestamps?

Any advice is most welcome!

Cheers,

Dieter

Dieter asked Oct 16 '25 23:10


1 Answer

Create a compound B-tree index on the monitorid and starttime columns, in that order.
Such an index gives the best results for queries that use a WHERE starttime > X AND monitorid = Y clause:

CREATE INDEX name ON tablename (monitorid, starttime);
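To check whether the optimizer actually picks up such an index, EXPLAIN can be run on a typical query. The following is a minimal sketch, not a definitive setup: the table name pulses_demo, the index name idx_monitor_start, the integer column types, and the literal values are all assumptions made for illustration; only the column names and the varchar(10) timestamps come from the question.

```sql
-- Hypothetical scratch table mirroring the columns described in the question.
-- Only the VARCHAR(10) timestamp columns are stated there; the other types are assumed.
CREATE TABLE pulses_demo (
  id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  monitorid INT UNSIGNED NOT NULL,
  starttime VARCHAR(10)  NOT NULL,  -- Unix timestamp stored as text
  stoptime  VARCHAR(10)  NOT NULL,
  pulses    INT UNSIGNED NOT NULL
);

-- Compound index with monitorid as the leading column (index name is made up).
CREATE INDEX idx_monitor_start ON pulses_demo (monitorid, starttime);

-- Inspect the plan for the typical query; 42 and '1700000000' are made-up values.
-- The timestamp literal is quoted on purpose: starttime is a string column, and
-- comparing a string column with a numeric literal keeps MySQL from using the
-- index on that column.
EXPLAIN
SELECT id, starttime, stoptime, pulses
FROM pulses_demo
WHERE monitorid = 42
  AND starttime > '1700000000';
```

If the index is chosen, the key column of the EXPLAIN output should name idx_monitor_start and the type column should read range. The plan is more meaningful once the table holds a representative number of rows; on an empty table the optimizer may report something less informative.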

monitorid must be the leading column in this index; otherwise the optimizer cannot use both key parts to narrow the range.
For details, see section "8.2.1.3.2 The Range Access Method for Multiple-Part Indexes" of the MySQL manual: https://dev.mysql.com/doc/refman/5.7/en/range-optimization.html

The manual states:

For a BTREE index, an interval might be usable for conditions combined with AND, where each condition compares a key part with a constant value using =, <=>, IS NULL, >, <, >=, <=, !=, <>, BETWEEN, or LIKE 'pattern' (where 'pattern' does not start with a wildcard). An interval can be used as long as it is possible to determine a single key tuple containing all rows that match the condition (or two intervals if <> or != is used).

The optimizer attempts to use additional key parts to determine the interval as long as the comparison operator is =, <=>, or IS NULL. If the operator is >, <, >=, <=, !=, <>, BETWEEN, or LIKE, the optimizer uses it but considers no more key parts. For the following expression, the optimizer uses = from the first comparison. It also uses >= from the second comparison but considers no further key parts and does not use the third comparison for interval construction:

key_part1 = 'foo' AND key_part2 >= 10 AND key_part3 > 10

(emphasis mine)

In your specific case this means that, with an index on (monitorid, starttime), the optimizer can use both parts of the index: the equality condition monitorid = $monid fixes the first key part, and starttime > $certaintime then defines a range on the second. With the reverse order (starttime, monitorid), the range condition on starttime applies to the first key part, so the optimizer considers no further key parts and the monitorid part cannot be used to narrow the interval.
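The difference between the two column orders can be observed by creating both indexes on a scratch table and forcing each one in turn. This is only a sketch under assumptions: the table name pulses_order_demo, both index names, the integer column types, and the literal values are hypothetical and do not come from the question.

```sql
-- Hypothetical scratch table; only the column names and the VARCHAR(10)
-- timestamps come from the question, the remaining types are assumed.
CREATE TABLE pulses_order_demo (
  id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  monitorid INT UNSIGNED NOT NULL,
  starttime VARCHAR(10)  NOT NULL,
  stoptime  VARCHAR(10)  NOT NULL,
  pulses    INT UNSIGNED NOT NULL
);

-- Both column orders, side by side (index names are made up).
CREATE INDEX idx_monitor_start ON pulses_order_demo (monitorid, starttime);
CREATE INDEX idx_start_monitor ON pulses_order_demo (starttime, monitorid);

-- Equality on the leading column, range on the second:
-- both key parts can take part in building the interval.
EXPLAIN
SELECT id, pulses
FROM pulses_order_demo FORCE INDEX (idx_monitor_start)
WHERE monitorid = 42
  AND starttime > '1700000000';

-- Range on the leading column: only the starttime part builds the interval;
-- monitorid can at best be checked as a filter on the entries that are read.
EXPLAIN
SELECT id, pulses
FROM pulses_order_demo FORCE INDEX (idx_start_monitor)
WHERE monitorid = 42
  AND starttime > '1700000000';
```

With representative data loaded, comparing the key_len and rows columns of the two plans should show the first index using a longer key prefix and examining fewer rows. The exact numbers depend on the character set, the data, and the MySQL version, so none are given here.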

krokodilko answered Oct 19 '25 13:10


