What is the best way to store historical data in SQL Server 2005/2008?

My simplified and contrived example is as follows:

Let's say that I want to measure and store the temperature (and other values) of all the world's towns on a daily basis. I am looking for an optimal way of storing the data so that it is just as easy to get the current temperature in all the towns as it is to get the full temperature history of one town.

It is an easy enough problem to solve, but I am looking for the best solution.

The two main options I can think of are as follows:

Option 1 - Same table stores current and historical records

Store all the current and archive records in the same table.

i.e.

    CREATE TABLE [dbo].[WeatherMeasurement] (
        MeasurementID [int] IDENTITY(1,1) NOT NULL,
        TownID [int] NOT NULL,
        Temp [int] NOT NULL,
        [Date] [datetime] NOT NULL
    )

This would keep everything simple, but what would be the most efficient query to get a list of towns and their current temperatures? Would this scale once the table has millions of rows in it? Is there anything to be gained by having some sort of IsCurrent flag in the table?
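For the Option 1 layout, one way to get the latest reading per town is ROW_NUMBER() (available from SQL Server 2005 onwards): number each town's rows newest first and keep row 1. This is a minimal sketch under assumptions: it recreates the simplified table, and the sample rows are hypothetical. It also copes with towns whose most recent reading is not from today.

```sql
-- Sketch only: recreate the simplified Option 1 table (drops any existing copy).
IF OBJECT_ID('dbo.WeatherMeasurement', 'U') IS NOT NULL
    DROP TABLE dbo.WeatherMeasurement;

CREATE TABLE dbo.WeatherMeasurement (
    MeasurementID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    TownID        INT      NOT NULL,
    Temp          INT      NOT NULL,
    [Date]        DATETIME NOT NULL
);

-- Hypothetical sample data: two days for town 1, one day for town 2.
-- Single-row INSERTs keep this compatible with SQL Server 2005.
INSERT INTO dbo.WeatherMeasurement (TownID, Temp, [Date]) VALUES (1, 10, '20081116');
INSERT INTO dbo.WeatherMeasurement (TownID, Temp, [Date]) VALUES (1, 12, '20081117');
INSERT INTO dbo.WeatherMeasurement (TownID, Temp, [Date]) VALUES (2, 5, '20081117');

-- Latest measurement per town: rank each town's rows newest first, keep rank 1.
WITH Ranked AS (
    SELECT TownID, Temp, [Date],
           ROW_NUMBER() OVER (PARTITION BY TownID ORDER BY [Date] DESC) AS rn
    FROM dbo.WeatherMeasurement
)
SELECT TownID, Temp, [Date]
FROM Ranked
WHERE rn = 1
ORDER BY TownID;
```

With this sample data the query returns town 1 at 12 and town 2 at 5, both dated 2008-11-17. Note that it still reads every row of every town; with millions of rows, an index keyed on (TownID, Date) can at least spare the sort, but whether that is fast enough is something to measure on real data.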

Option 2 - Store all archive records in a separate table

There would be one table to store the current live measurements:

    CREATE TABLE [dbo].[WeatherMeasurement] (
        MeasurementID [int] IDENTITY(1,1) NOT NULL,
        TownID [int] NOT NULL,
        Temp [int] NOT NULL,
        [Date] [datetime] NOT NULL
    )

And a table to store the historical archived data (inserted by a trigger, perhaps):

    CREATE TABLE [dbo].[WeatherMeasurementHistory] (
        MeasurementID [int] IDENTITY(1,1) NOT NULL,
        TownID [int] NOT NULL,
        Temp [int] NOT NULL,
        [Date] [datetime] NOT NULL
    )

This has the advantage of keeping the current data lean and very efficient to query, at the expense of a more complex schema and more expensive inserts.
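The trigger-based archiving suggested for Option 2 could look roughly like the sketch below. It is one possible design, not a prescribed one: the trigger name is hypothetical, and, unlike the history DDL above, the history table here has no IDENTITY so that archived rows keep their original MeasurementID. Whenever rows are inserted, it moves any older rows for the same towns from the current table into history.

```sql
-- Sketch only: drops and recreates both tables (dropping a table also drops its triggers).
IF OBJECT_ID('dbo.WeatherMeasurement', 'U') IS NOT NULL
    DROP TABLE dbo.WeatherMeasurement;
IF OBJECT_ID('dbo.WeatherMeasurementHistory', 'U') IS NOT NULL
    DROP TABLE dbo.WeatherMeasurementHistory;

CREATE TABLE dbo.WeatherMeasurement (
    MeasurementID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    TownID        INT      NOT NULL,
    Temp          INT      NOT NULL,
    [Date]        DATETIME NOT NULL
);

-- Assumption: no IDENTITY here, so archived rows keep their original IDs.
CREATE TABLE dbo.WeatherMeasurementHistory (
    MeasurementID INT      NOT NULL PRIMARY KEY,
    TownID        INT      NOT NULL,
    Temp          INT      NOT NULL,
    [Date]        DATETIME NOT NULL
);
GO

-- Hypothetical trigger: after an insert, any row older than the newest
-- inserted row for the same town is copied to history, then deleted.
CREATE TRIGGER dbo.trg_WeatherMeasurement_Archive
ON dbo.WeatherMeasurement
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.WeatherMeasurementHistory (MeasurementID, TownID, Temp, [Date])
    SELECT w.MeasurementID, w.TownID, w.Temp, w.[Date]
    FROM dbo.WeatherMeasurement AS w
    JOIN (SELECT TownID, MAX([Date]) AS LatestDate
          FROM inserted
          GROUP BY TownID) AS i
      ON i.TownID = w.TownID
    WHERE w.[Date] < i.LatestDate;

    DELETE w
    FROM dbo.WeatherMeasurement AS w
    JOIN (SELECT TownID, MAX([Date]) AS LatestDate
          FROM inserted
          GROUP BY TownID) AS i
      ON i.TownID = w.TownID
    WHERE w.[Date] < i.LatestDate;
END;
GO

-- Example: the second insert pushes the 2008-11-16 reading into history.
INSERT INTO dbo.WeatherMeasurement (TownID, Temp, [Date]) VALUES (1, 10, '20081116');
INSERT INTO dbo.WeatherMeasurement (TownID, Temp, [Date]) VALUES (1, 12, '20081117');
```

Both statements in the trigger run inside the INSERT's own transaction, which is where the extra insert cost comes from. A reading that arrives late (older than the row already in the current table) is simply left in the current table by this sketch.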

Which is the best option? Are there better options I haven't mentioned?

NOTE: I have simplified the schema to help focus my question, but assume there will be a lot of data inserted each day (hundreds of thousands of records), and that data is current for one day. The current data is just as likely to be queried as the historical data.

asked Nov 17 '08 16:11

Andrew Rimmer

2 Answers

It DEPENDS on the application's usage patterns. If usage patterns indicate that the historical data will be queried more often than the current values, then put it all in one table. But if historical queries are the exception (say, less than 10% of queries), and the performance of the more common current-value query would suffer from putting all the data in one table, then it makes sense to separate that data into its own table.

answered Oct 20 '22 08:10

Charles Bretana

I would keep the data in one table unless you have a very serious bias toward current data (in usage) or history data (in volume). A compound index on DATE + TOWNID (in that order) would remove the performance concern in most cases (although clearly we don't have the data to be sure of this at this time).
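A minimal sketch of the compound index this answer describes, with Date as the leading column and TownID second, together with the kind of "today's readings" query it can serve with an index seek. The index name and sample rows are hypothetical; the date arithmetic avoids the DATE type so it also runs on SQL Server 2005.

```sql
-- Sketch only: recreate the single-table layout.
IF OBJECT_ID('dbo.WeatherMeasurement', 'U') IS NOT NULL
    DROP TABLE dbo.WeatherMeasurement;

CREATE TABLE dbo.WeatherMeasurement (
    MeasurementID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    TownID        INT      NOT NULL,
    Temp          INT      NOT NULL,
    [Date]        DATETIME NOT NULL
);

-- Compound index: Date first, then TownID (the order suggested above).
CREATE INDEX IX_WeatherMeasurement_Date_Town
    ON dbo.WeatherMeasurement ([Date], TownID);

-- Hypothetical sample rows: one from yesterday, one from today.
INSERT INTO dbo.WeatherMeasurement (TownID, Temp, [Date]) VALUES (1, 15, DATEADD(day, -1, GETDATE()));
INSERT INTO dbo.WeatherMeasurement (TownID, Temp, [Date]) VALUES (1, 20, GETDATE());

-- Midnight today, computed without the DATE type.
DECLARE @Today DATETIME;
SET @Today = DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0);

-- Current temperatures: a range on the leading Date column,
-- which lets the optimizer seek into the index.
SELECT TownID, Temp
FROM dbo.WeatherMeasurement
WHERE [Date] >= @Today
  AND [Date] <  DATEADD(day, 1, @Today);
```

Because Date leads, a query for one town's full history (WHERE TownID = ...) cannot seek on this index; an index led by TownID would suit that query better, so it is worth profiling both access patterns, as the answer suggests.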

The one thing I would wonder about is whether anyone will want both current and historical data for a town. If so, you have just created at least one new view to worry about, and a possible performance problem in that direction.
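If current and historical rows do live in separate tables, the new view this answer warns about would typically be a UNION ALL over both. A minimal sketch, assuming the two-table layout from Option 2 (recreated here without any trigger); the view name and sample rows are hypothetical.

```sql
-- Sketch only: drop and recreate the view and both tables.
IF OBJECT_ID('dbo.WeatherMeasurementAll', 'V') IS NOT NULL
    DROP VIEW dbo.WeatherMeasurementAll;
IF OBJECT_ID('dbo.WeatherMeasurement', 'U') IS NOT NULL
    DROP TABLE dbo.WeatherMeasurement;
IF OBJECT_ID('dbo.WeatherMeasurementHistory', 'U') IS NOT NULL
    DROP TABLE dbo.WeatherMeasurementHistory;

CREATE TABLE dbo.WeatherMeasurement (
    MeasurementID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    TownID        INT      NOT NULL,
    Temp          INT      NOT NULL,
    [Date]        DATETIME NOT NULL
);

CREATE TABLE dbo.WeatherMeasurementHistory (
    MeasurementID INT      NOT NULL PRIMARY KEY,
    TownID        INT      NOT NULL,
    Temp          INT      NOT NULL,
    [Date]        DATETIME NOT NULL
);
GO

-- UNION ALL rather than UNION: by design the two tables hold different rows,
-- so there is no need to pay for duplicate elimination.
CREATE VIEW dbo.WeatherMeasurementAll
AS
SELECT MeasurementID, TownID, Temp, [Date] FROM dbo.WeatherMeasurement
UNION ALL
SELECT MeasurementID, TownID, Temp, [Date] FROM dbo.WeatherMeasurementHistory;
GO

-- Hypothetical rows: one archived reading and one current reading for town 1.
INSERT INTO dbo.WeatherMeasurementHistory (MeasurementID, TownID, Temp, [Date]) VALUES (100, 1, 10, '20081116');
INSERT INTO dbo.WeatherMeasurement (TownID, Temp, [Date]) VALUES (1, 12, '20081117');

-- One town's complete series, across both tables.
SELECT TownID, Temp, [Date]
FROM dbo.WeatherMeasurementAll
WHERE TownID = 1
ORDER BY [Date];
```

The TownID filter will typically be pushed into both branches of the UNION ALL, so each underlying table needs its own suitable index for this to stay fast; that is where the performance problem mentioned in the answer would show up.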

This is unfortunately one of those things where you may need to profile your solutions against real-world data. I have personally used compound indexes like the one above in many cases, yet there are a few edge cases where I opted to split the history into another table. Well, actually into another data file: the history was so dense that I created a new data file for it alone, to avoid bloating the entire primary data file set. Performance issues are rarely solved by theory.
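Placing the history in its own data file, as described here, is done in SQL Server by adding a filegroup with its own file and creating the table ON that filegroup. A rough sketch: the database name WeatherDB, the filegroup HistoryFG, the logical file name and the C:\SQLData path are all hypothetical, and the folder must already exist on the server.

```sql
-- Hypothetical database; created only if it does not exist yet.
IF DB_ID('WeatherDB') IS NULL
    CREATE DATABASE WeatherDB;
GO
USE WeatherDB;
GO

-- A separate filegroup for history data.
IF NOT EXISTS (SELECT 1 FROM sys.filegroups WHERE name = 'HistoryFG')
    ALTER DATABASE WeatherDB ADD FILEGROUP HistoryFG;
GO

-- Its own physical file (hypothetical path and sizes).
IF NOT EXISTS (SELECT 1 FROM sys.database_files WHERE name = 'WeatherHistory1')
    ALTER DATABASE WeatherDB
    ADD FILE (
        NAME = WeatherHistory1,
        FILENAME = 'C:\SQLData\WeatherHistory1.ndf',
        SIZE = 100MB,
        FILEGROWTH = 100MB
    ) TO FILEGROUP HistoryFG;
GO

IF OBJECT_ID('dbo.WeatherMeasurementHistory', 'U') IS NOT NULL
    DROP TABLE dbo.WeatherMeasurementHistory;

-- The history table (a heap here, for brevity) is stored entirely on HistoryFG,
-- so it grows that file rather than the primary data file.
CREATE TABLE dbo.WeatherMeasurementHistory (
    MeasurementID INT      NOT NULL,
    TownID        INT      NOT NULL,
    Temp          INT      NOT NULL,
    [Date]        DATETIME NOT NULL
) ON HistoryFG;
```

The history file can then be sized, placed on other disks, or backed up separately (filegroup backups) without touching the primary data file.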

I would recommend reading up on query hints for index use, and "covering indexes" for more information about performance issues.
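To illustrate the two topics mentioned: a covering index keys on the columns a query filters and sorts by and carries the remaining columns in INCLUDE (SQL Server 2005+), so the query can be answered from the index without visiting the base table; an index hint forces the optimizer to use a specific index. A minimal sketch for the "one town's history" query; the index name and sample rows are hypothetical, and hints are usually best kept as a last resort for cases where profiling shows the optimizer choosing badly.

```sql
-- Sketch only: recreate the single-table layout.
IF OBJECT_ID('dbo.WeatherMeasurement', 'U') IS NOT NULL
    DROP TABLE dbo.WeatherMeasurement;

CREATE TABLE dbo.WeatherMeasurement (
    MeasurementID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    TownID        INT      NOT NULL,
    Temp          INT      NOT NULL,
    [Date]        DATETIME NOT NULL
);

-- Covering index for "all readings for one town": TownID and Date are keys
-- (filter and sort order), Temp is stored at the leaf level via INCLUDE.
CREATE INDEX IX_WeatherMeasurement_Town_Date_Covering
    ON dbo.WeatherMeasurement (TownID, [Date])
    INCLUDE (Temp);

-- Hypothetical sample data.
INSERT INTO dbo.WeatherMeasurement (TownID, Temp, [Date]) VALUES (1, 10, '20081116');
INSERT INTO dbo.WeatherMeasurement (TownID, Temp, [Date]) VALUES (1, 12, '20081117');
INSERT INTO dbo.WeatherMeasurement (TownID, Temp, [Date]) VALUES (2, 5, '20081117');

-- Table hint forcing that index (the optimizer would usually pick it unaided).
SELECT [Date], Temp
FROM dbo.WeatherMeasurement WITH (INDEX(IX_WeatherMeasurement_Town_Date_Covering))
WHERE TownID = 1
ORDER BY [Date];
```

With the sample rows this returns town 1's two readings in date order: 10 on 2008-11-16, then 12 on 2008-11-17.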

answered Oct 20 '22 06:10

Godeke
