Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Time Series Data in MySQL: Sampling

We have a MySQL database where we have added time series values.

-------------------------------------
|Col A | Col B | Timestamp          |
-------------------------------------
|1.23  | 4.48  |2013-09-03 10:45:27 |
-------------------------------------
|1.23  | 4.48  |2013-09-03 10:46:27 |
-------------------------------------
|1.23  | 4.48  |2013-09-03 10:47:27 |
-------------------------------------

The data is unevenly spaced w.r.t time, some points are separated by a minute & some by a few seconds.

Is there an efficient way I could query this database to pull data for every nth minute/second/hour? Ideally I would want the (linear) interpolated value at the nth minute, but the closest point to the nth minute or the last point just before or at the nth point would do too.

The use case being I want to plot this into a graph, but do not want too many points than necessary. So for plotting for a year, I would prefer querying only a couple of points a day. While plotting for a day, I would want to plot a point every minute or so.

I can do all this in PHP, but is there a way to do it directly in the database? If not, I am contemplating the usage of a time series database, but the budget constraints restricts me to only the free ones. Is there any free time series database that gives out of the box sampling and preferably interpolation?

like image 446
Jugal Thakkar Avatar asked Sep 22 '13 07:09

Jugal Thakkar


People also ask

Is MySQL good for time series?

MySQL and a number of it's variants can be used as a time-series database. Using the MySQL example employees database we are going to provide a list of time-series analysis that we want performed, giving you a chance to try writing the SQL yourself. We will then provide the SQL we would have used.

Is SQL good for time series data?

Times Series With SQLWorking with a time series dataset can be conducive to your SQL learning for many reasons. Time series data, by nature, store records that are not independent of each other. Analyzing such data will require conducting more complex calculations between columns and between rows.

How is time series data stored in a database?

Storing time series data. Time series data is best stored in a time series database (TSDB) built specifically for handling metrics and events that are time-stamped. This is because time series data is often ingested in massive volumes that require a purpose-built database designed to handle that scale.


1 Answers

I've had a stab at this, I'm really interested to see how others would solve it.

I had a similar problem like this before and solved it by creating a time index table and then joining the data table based on rewriting the time to fit a time frame. The problem is you need a new time index table and separate query or view for each time interval.

The advantage of joining the data in this way was that I was also interested in time frames where there was no reading or result, so I needed to see the nulls or no readings for certain time frames. There is just a little extra work required with the end data for that (ie: taking out the Placeholders) .

The first thing I did, was create a time index table, it looks something like this....

mysql> select * from ctb_time_idx  WHERE YEAR( ctb_datetime ) = 2013  LIMIT 10 ;
+---------------------+
| ctb_datetime        |
+---------------------+
| 2013-01-01 00:00:00 | 
| 2013-01-01 00:15:00 | 
| 2013-01-01 00:30:00 | 
| 2013-01-01 00:45:00 | 
| 2013-01-01 01:00:00 | 
| 2013-01-01 01:15:00 | 
| 2013-01-01 01:30:00 | 
| 2013-01-01 01:45:00 | 
| 2013-01-01 02:00:00 | 
| 2013-01-01 02:15:00 | 
+---------------------+
10 rows in set (0.07 sec)

I then union my data in

( select 
    ctb_datetime AS time1 , 
    'Placeholder' AS TimeInterval , 
    NULL AS `Col A` , 
    NULL AS `Col B` 
from my_time_idx 
    where YEAR ( ctb_time_idx.ctb_datetime  ) = 2013 )  
UNION 
( select DATE_FORMAT( time1 , '%Y-%m-%d %H:00' ) AS time1  , 
    '00min' AS TimeInterval , `Col A` , `Col B` from my_data_table  
    where MINUTE( time1 ) BETWEEN  00 AND 14  ) 
UNION 
( select DATE_FORMAT( time1 , '%Y-%m-%d %H:15' ) AS time1 , 
    '15min' AS TimeInterval, `Col A` , `Col B` from my_data_table 
    where MINUTE( time1 ) BETWEEN  15 AND 29  ) 
UNION 
( select DATE_FORMAT( time1 , '%Y-%m-%d %H:30' ) AS time1 , 
    '30min' AS TimeInterval, `Col A` , `Col B` from my_data_table 
    where MINUTE( time1 ) BETWEEN  30 AND 44  ) 
UNION 
( select DATE_FORMAT( time1 , '%Y-%m-%d %H:45' ) AS time1 , 
    '45min' AS TimeInterval, `Col A` , `Col B` from my_data_table 
    where MINUTE( time1 ) BETWEEN  45 AND 59  )     
order by time1 

I tested this on my old tables and it seems to work fine, I had to re-edit my code to suit your example, so hopefully I didn't screw it up when doing that.

like image 119
paul1923 Avatar answered Sep 22 '22 15:09

paul1923