I would like to store time series in a MySQL database. I would like to do it in a linear fashion, that is, every row stands for a unique observation (1 measure, 1 site, 1 timestamp). At present, this requires 84 096 000 rows, and it will grow by about 2 102 400 rows a year.
What precautions should I take to properly design the time series table, its indices and the related queries (essentially, selecting data for a given measure, site and time range)?
Edit:
Adding a proposal of table design:
CREATE TABLE TimeSeries(
    Id INT NOT NULL AUTO_INCREMENT,
    MeasureTimeStamp DATETIME NOT NULL,
    MeasureId INT NOT NULL,
    SiteId INT NOT NULL,
    Measure FLOAT NOT NULL,
    Quality INT NOT NULL,
    PRIMARY KEY (Id),
    CONSTRAINT UNIQUE (MeasureTimeStamp,MeasureId,SiteId),
    FOREIGN KEY (MeasureId) REFERENCES Measure(Id),
    FOREIGN KEY (SiteId) REFERENCES Site(Id)
);
CREATE INDEX ChannelIndex ON TimeSeries(MeasureId,SiteId);
Provided that the Measure and Site tables exist, what should be improved in this structure if my main queries are:
SELECT *
FROM TimeSeries
WHERE (MeasureId IN (?,?,?))
  AND (SiteId IN (?,?,?))
  AND (MeasureTimeStamp BETWEEN ? AND ?)
ORDER BY MeasureId ASC,
         SiteId ASC,
         MeasureTimeStamp ASC;
Edit 2:
There are about 20 sites and about 50 measures. This leads to a maximum of 1000 channels (a channel is a pair of site and measure). It may increase a little over a few decades, but it will not exceed 10000 channels. Most of the data have a time granularity of about 30 min. However, the time granularity is not constant; it will never be smaller than a minute, and some data are daily or weekly.
Some clues:
An index on several columns can only be used from its leftmost column onward: an index on (A, B, C) helps with WHERE A=? AND B=? but not with WHERE B=? AND C=?.
In your example, four indices are created:
MeasureId, SiteId (ChannelIndex)
MeasureTimeStamp, MeasureId, SiteId (unique constraint)
MeasureId (foreign key)
SiteId (foreign key)
Simply put, ChannelIndex is sorted like a list of strings combining MeasureId and SiteId. E.g. for MeasureId = 12 and SiteId = 68 you can imagine the sorting value as 12_68.
Your unique constraint sorts according to values like 2014-12-23 09:01:43_12_68.
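The left-to-right rule from the clues above can be tried out on a throwaway table. This is only a sketch: the table Demo, the index AbcIndex and the literal values are made up for illustration and are not part of the schema in the question.

```sql
-- Hypothetical table with one composite index on (A, B, C).
CREATE TABLE Demo (
    A INT NOT NULL,
    B INT NOT NULL,
    C INT NOT NULL,
    INDEX AbcIndex (A, B, C)
);

-- Constrains the leftmost columns A and B, so MySQL can seek
-- directly into AbcIndex.
EXPLAIN SELECT * FROM Demo WHERE A = 1 AND B = 2;

-- Skips the leftmost column A, so MySQL cannot seek on (B, C) in
-- AbcIndex and typically falls back to scanning the index or table.
EXPLAIN SELECT * FROM Demo WHERE B = 2 AND C = 3;
```

Comparing the type and key columns of the two EXPLAIN results should show the difference. Recent MySQL versions may apply a skip scan in some cases, so the second plan can vary with version and data.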
To solve your query, MySQL could use either your index or the unique constraint; which one it selects depends on the data in your table. Neither, however, is optimal. Using the index, it will quickly find the blocks in the index that have the right MeasureId and SiteId, but it will then need to look up each matching row in the main table to check whether its MeasureTimeStamp is in range.
Using the unique constraint, it can easily select the time range. Within that subset of the index, however, MeasureId and SiteId appear in no useful order, because the entries are still ordered by MeasureTimeStamp first.
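To see which of the two indices MySQL actually picks on your data, you can run the query through EXPLAIN. The sketch below uses placeholder IDs and dates (they are assumptions, not values from the question) and then forces ChannelIndex so the two plans can be compared.

```sql
-- Which index does the optimizer pick for the query from the question?
-- The literal IDs and dates below are placeholders, not real data.
EXPLAIN
SELECT *
FROM TimeSeries
WHERE MeasureId IN (12, 13, 14)
  AND SiteId IN (68, 69, 70)
  AND MeasureTimeStamp BETWEEN '2014-12-01 00:00:00' AND '2014-12-31 23:59:59'
ORDER BY MeasureId ASC, SiteId ASC, MeasureTimeStamp ASC;

-- Same query, but forcing ChannelIndex to compare its plan
-- with the one the optimizer chose on its own.
EXPLAIN
SELECT *
FROM TimeSeries FORCE INDEX (ChannelIndex)
WHERE MeasureId IN (12, 13, 14)
  AND SiteId IN (68, 69, 70)
  AND MeasureTimeStamp BETWEEN '2014-12-01 00:00:00' AND '2014-12-31 23:59:59'
ORDER BY MeasureId ASC, SiteId ASC, MeasureTimeStamp ASC;
```

In the output, the key column names the index that was chosen, rows gives the optimizer's estimate of how many rows it must examine, and Extra shows whether an additional sort ("Using filesort") is needed.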
To improve your structure, it will help to change your unique constraint to
CONSTRAINT UNIQUE (MeasureId,SiteId,MeasureTimeStamp)
That index will now sort with values like 12_68_2014-12-23 09:01:43, which I expect to perform better, as MySQL can now select a discrete and predictable number of ranges within the index. This index matches both the filter and the sort order of your SELECT statement, and it makes ChannelIndex redundant at the same time.
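Put together, the table from the question with the reordered constraint might look like the sketch below. It assumes that the Measure and Site tables already exist, as stated in the question, and that nothing else depends on ChannelIndex, which is simply left out.

```sql
-- Table from the question, with the unique constraint reordered so that
-- the channel columns come first and the timestamp last.
CREATE TABLE TimeSeries(
    Id INT NOT NULL AUTO_INCREMENT,
    MeasureTimeStamp DATETIME NOT NULL,
    MeasureId INT NOT NULL,
    SiteId INT NOT NULL,
    Measure FLOAT NOT NULL,
    Quality INT NOT NULL,
    PRIMARY KEY (Id),
    -- Serves WHERE MeasureId IN (...) AND SiteId IN (...) AND a time range.
    CONSTRAINT UNIQUE (MeasureId,SiteId,MeasureTimeStamp),
    FOREIGN KEY (MeasureId) REFERENCES Measure(Id),
    FOREIGN KEY (SiteId) REFERENCES Site(Id)
);
-- No separate ChannelIndex: (MeasureId,SiteId) is already the leftmost
-- prefix of the unique constraint above.
```

If the table already exists, the same change would go through ALTER TABLE instead; running SHOW INDEX FROM TimeSeries first shows the current index names.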