SQL Server: time-series data performance

I have a table of a little over 1 billion rows of time-series data with fantastic insert performance but (sometimes) awful select performance.

Table tblTrendDetails (PK is ordered as shown):

PK  TrendTime    datetime
PK  CavityId     int
PK  TrendValueId int
    TrendValue   real
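
In other words, the table looks roughly like this (the constraint name here is made up):

CREATE TABLE tblTrendDetails (
    TrendTime    datetime NOT NULL,
    CavityId     int      NOT NULL,
    TrendValueId int      NOT NULL,
    TrendValue   real     NULL,
    CONSTRAINT PK_tblTrendDetails
        PRIMARY KEY CLUSTERED (TrendTime, CavityId, TrendValueId)
);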

The table is continuously pulling in new data and purging old data, so insert and delete performance needs to remain snappy.

When executing a query such as the following, performance is poor (30 sec):

SELECT * 
FROM tblTrendDetails
WHERE TrendTime BETWEEN @inMinTime AND @inMaxTime
  AND CavityId = @inCavityId
  AND TrendValueId = @inTrendId

If I execute the same query again (with a similar time range, but any values for @inCavityId or @inTrendId), performance is very good (1 sec). Performance counters show that disk access is the culprit the first time the query is run.

Any recommendations regarding how to improve performance without (significantly) adversely affecting the insert or delete performance? Any suggestions (including completely changing the underlying database) are welcome.

asked Oct 23 '12 by pilotcam



1 Answer

The fact that subsequent queries of the same or similar data run much faster is probably due to SQL Server caching your data. That said, is it possible to speed this initial query up?
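
Incidentally, if you want to reproduce the slow cold-cache case while testing (on a non-production server only), you can flush the buffer pool first:

-- Write dirty pages to disk, then drop clean pages from the buffer pool,
-- forcing the next query to read from disk (test systems only!):
CHECKPOINT;
DBCC DROPCLEANBUFFERS;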

Verify the query plan:

My guess is that your query should result in an Index Seek rather than an Index Scan (or worse, a Table Scan). Please verify this using SET SHOWPLAN_TEXT ON; or a similar feature. A query that uses BETWEEN on the leading TrendTime column and = on the remaining key columns should be able to take advantage of the clustered index, but that's worth confirming rather than assuming.
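
For example, something like the following shows the estimated plan as text without actually executing the query (the literal parameter values here are placeholders):

SET SHOWPLAN_TEXT ON;
GO
SELECT *
FROM tblTrendDetails
WHERE TrendTime BETWEEN '20121001' AND '20121008'
  AND CavityId = 1
  AND TrendValueId = 1;
GO
SET SHOWPLAN_TEXT OFF;
GO

In the output, you want to see Clustered Index Seek rather than Clustered Index Scan or Table Scan.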

Index Fragmentation:

It is possible that your clustered index (the primary key in this case) is quite fragmented after all of those inserts and deletes. I would probably check this with DBCC SHOWCONTIG (tblTrendDetails).
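
For example (and on SQL Server 2005 and later, the sys.dm_db_index_physical_stats DMV replaces DBCC SHOWCONTIG):

-- Older syntax; low Scan Density / high Logical Scan Fragmentation
-- in the output indicate a fragmented index:
DBCC SHOWCONTIG (tblTrendDetails);

-- Newer equivalent; look at avg_fragmentation_in_percent:
SELECT index_id, avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'tblTrendDetails'), NULL, NULL, 'LIMITED');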

You can defrag the table's indexes with DBCC INDEXDEFRAG (MyDatabase, tblTrendDetails). This may take some time, but will allow the table to remain accessible, and you can stop the operation without any nasty side-effects.

You might have to go further and use DBCC DBREINDEX (tblTrendDetails). This is an offline operation, though, so you should only do this when the table does not need to be accessed.
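
For reference, both operations side by side, along with the ALTER INDEX forms that replace them on SQL Server 2005 and later:

-- Online defrag; the table stays accessible and the operation
-- can be stopped without losing the work already done:
DBCC INDEXDEFRAG (MyDatabase, tblTrendDetails);

-- Full rebuild; the table is unavailable while this runs:
DBCC DBREINDEX (tblTrendDetails);

-- On 2005 and later, the equivalents are:
ALTER INDEX ALL ON tblTrendDetails REORGANIZE;
ALTER INDEX ALL ON tblTrendDetails REBUILD; -- WITH (ONLINE = ON) on Enterprise Edition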

There are some differences described here: Microsoft SQL Server 2000 Index Defragmentation Best Practices.

Be aware that your transaction log can grow quite a bit from defragging a large table, and it can take a long time.
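
You can keep an eye on log usage while the defrag runs:

-- Reports log size and percent used for every database:
DBCC SQLPERF (LOGSPACE);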

Partitioned Views:

If these do not remedy the situation (or fragmentation is not a problem), you may even wish to look into partitioned views, in which you create a number of underlying base tables for different ranges of records and then UNION ALL them in a view (which replaces your original table).
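
Here is a minimal sketch of that approach with hypothetical monthly tables (the names and date ranges are made up). The CHECK constraints on TrendTime let the optimizer skip base tables whose range can't match the query, and purging old data becomes as cheap as dropping a base table:

-- One base table per month, each constrained to its own TrendTime range:
CREATE TABLE tblTrendDetails_201210 (
    TrendTime    datetime NOT NULL
        CHECK (TrendTime >= '20121001' AND TrendTime < '20121101'),
    CavityId     int      NOT NULL,
    TrendValueId int      NOT NULL,
    TrendValue   real     NULL,
    PRIMARY KEY (TrendTime, CavityId, TrendValueId)
);
-- ...one such table per month...
GO

-- The view stands in for the original table in queries (and, if the
-- tables meet the partitioned-view rules, for inserts as well):
CREATE VIEW vwTrendDetails
AS
SELECT TrendTime, CavityId, TrendValueId, TrendValue FROM tblTrendDetails_201210
UNION ALL
SELECT TrendTime, CavityId, TrendValueId, TrendValue FROM tblTrendDetails_201211;
GO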

Better Stuff:

If performance of these selects is a real business need, you may be able to make the case for better hardware: faster drives, more memory, etc. If your drives are twice as fast, this query should take roughly half the time, yeah? Also, this may not be workable for you, but I've simply found newer versions of SQL Server to be genuinely faster, with more options and easier maintenance. I'm glad to have moved most of my company's data to 2008 R2. But I digress...

answered Sep 29 '22 by Tim Lehner