 

Time series database linear storage

I would like to store time series in a MySQL database. I would like to do it in a linear fashion, that is, every row stands for a unique observation (1 measure, 1 site, 1 timestamp). At present, it will require 84 096 000 rows, and it will grow by about 2 102 400 rows a year.

What precautions must be taken in order to properly design the time series table, its indices and the related queries (essentially a selection of data where measure, site and time range are specified)?

Edit:

Adding a proposal of table design:

CREATE TABLE TimeSeries(
   Id                  INT          NOT NULL     AUTO_INCREMENT,
   MeasureTimeStamp    DATETIME     NOT NULL, 
   MeasureId           INT          NOT NULL,
   SiteId              INT          NOT NULL,
   Measure             FLOAT        NOT NULL,
   Quality             INT          NOT NULL,   
   PRIMARY KEY (Id),
   CONSTRAINT UNIQUE (MeasureTimeStamp,MeasureId,SiteId),
   FOREIGN KEY (MeasureId) REFERENCES Measure(Id),
   FOREIGN KEY (SiteId) REFERENCES Site(Id)
);
CREATE INDEX ChannelIndex ON TimeSeries(MeasureId,SiteId);

Provided the Measure and Site tables exist, what should be improved in this structure if my major queries are:

SELECT *
FROM TimeSeries
WHERE (MeasureId IN (?,?,?)) 
  AND (SiteId IN (?,?,?))
  AND (MeasureTimeStamp BETWEEN ? AND ?)
ORDER BY MeasureId ASC,
         SiteId ASC,
         MeasureTimeStamp ASC;

Edit 2:

There are about 20 sites and about 50 measures. This leads to a maximum of 1000 channels (pairs of site and measure). It may increase a little in a few decades, but it will not reach more than 10000 channels. Most of the data have a time granularity of about 30 min. However, the time granularity is not constant, and it will not be smaller than a minute (some data are daily or weekly).

jlandercy asked Nov 10 '22 23:11



1 Answer

Some clues:

  • An index in MySQL is a list of your primary keys ordered by your 'index columns'. You want to order that list in such a way that it is as easy as possible to find the values you need.
  • MySQL generally uses only one index per table in a query (index merge optimizations aside).
  • MySQL can use an index from left to right (MySQL multi-column indexes). This means Index(A,B,C) allows you to do WHERE A=? AND B=? efficiently, but not WHERE B=? AND C=?, because the leftmost column A is not constrained.
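
The left-to-right rule can be checked with EXPLAIN. The following is a minimal sketch, assuming a throwaway table T and an index name IdxABC that are purely hypothetical and not part of the question; the exact plan reported depends on the MySQL version and on the data in the table.

```sql
-- Hypothetical table, used only to illustrate the leftmost-prefix rule.
-- Column D is not indexed, so SELECT * cannot be answered from the index alone.
CREATE TABLE T (
   A INT NOT NULL,
   B INT NOT NULL,
   C INT NOT NULL,
   D INT NOT NULL,
   INDEX IdxABC (A, B, C)
);

-- The conditions constrain the leftmost columns A and B,
-- so the optimizer can seek into IdxABC.
EXPLAIN SELECT * FROM T WHERE A = 1 AND B = 2;

-- Column A is not constrained, so there is no leftmost prefix to seek on;
-- the optimizer typically falls back to scanning the table.
EXPLAIN SELECT * FROM T WHERE B = 2 AND C = 3;
```

Comparing the key and type columns of the two EXPLAIN results should show the difference.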

In your example, four secondary indices are created (in addition to the primary key on Id):

  • MeasureId,SiteId (ChannelIndex)
  • MeasureTimeStamp,MeasureId,SiteId (unique constraint)
  • MeasureId (foreign key)
  • SiteId (foreign key)
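
One way to verify which indices actually exist on your server is to ask MySQL directly. This is only a sketch: the index names that MySQL generates for the unnamed unique constraint and for the foreign keys, and whether an implicit foreign key index is kept once another usable index exists, can vary with the MySQL version and storage engine.

```sql
-- List every index on the table, one row per indexed column.
-- Key_name identifies the index and Seq_in_index gives the column order.
SHOW INDEX FROM TimeSeries;

-- Alternatively, show the full table definition, including generated key names.
SHOW CREATE TABLE TimeSeries;
```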

Simply put, ChannelIndex is sorted like a list of strings combining MeasureId and SiteId. E.g. for MeasureId = 12 and SiteId = 68 you can imagine the sorting value as 12_68. Your unique constraint sorts according to values like 2014-12-23 09:01:43_12_68.

To solve your query, MySQL could use either ChannelIndex or the unique constraint; which one it picks depends on the data in your table. Neither, however, is optimal. Using ChannelIndex, it will quickly find the blocks in the index that have the right MeasureId and SiteId, but it then has to look up each matching row in the main table to check whether MeasureTimeStamp is in range. Using the unique constraint, it can easily select the time range, but within that range the index entries are still ordered by MeasureTimeStamp first, so MeasureId and SiteId appear in effectively random order and every entry in the range has to be checked.

To improve your structure, it will help to change your unique constraint to

CONSTRAINT UNIQUE (MeasureId,SiteId,MeasureTimeStamp)

That index will now sort with values like 12_68_2014-12-23 09:01:43, which I expect to perform better, as MySQL can now select a discrete and predictable number of ranges within the index. This serves the WHERE clause of your SELECT statement and at the same time makes ChannelIndex redundant, since (MeasureId, SiteId) is a leftmost prefix of the new unique index.
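
Put together, the revised table could look like the sketch below. The key name ChannelTimeIndex and the literal values in the query are assumptions made for illustration; the columns and foreign keys are taken from the question, and the separate ChannelIndex is simply left out.

```sql
-- Sketch of the revised table; ChannelTimeIndex is a hypothetical name.
CREATE TABLE TimeSeries(
   Id                  INT          NOT NULL     AUTO_INCREMENT,
   MeasureTimeStamp    DATETIME     NOT NULL,
   MeasureId           INT          NOT NULL,
   SiteId              INT          NOT NULL,
   Measure             FLOAT        NOT NULL,
   Quality             INT          NOT NULL,
   PRIMARY KEY (Id),
   -- Columns filtered with IN come first, the range column comes last.
   UNIQUE KEY ChannelTimeIndex (MeasureId, SiteId, MeasureTimeStamp),
   FOREIGN KEY (MeasureId) REFERENCES Measure(Id),
   FOREIGN KEY (SiteId) REFERENCES Site(Id)
);

-- Inspect the plan for the main query; the literal values are placeholders.
EXPLAIN
SELECT *
FROM TimeSeries
WHERE MeasureId IN (12, 13)
  AND SiteId IN (68, 69)
  AND MeasureTimeStamp BETWEEN '2014-12-01 00:00:00' AND '2014-12-31 23:59:59'
ORDER BY MeasureId ASC, SiteId ASC, MeasureTimeStamp ASC;
```

On a table with a realistic amount of data, the key column of the EXPLAIN output would ideally show ChannelTimeIndex; on a small or empty table the optimizer may choose differently, so it is worth checking against your real data.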

Gerard answered Nov 15 '22 08:11
