
What is the best strategy for retention of large data sets?

I'm leading a project where we'll be recording metrics data. I'd like to retain the data for years. However, I'd also like to keep the primary table from becoming bloated with data that, while necessary for long-term trending, isn't required for short-term reporting.

What is the best strategy for handling this situation? Simply archive the old data to another table? Or "roll it up" via some consolidation of the data itself (and then store it off to a different table)? Or something else entirely?

Additional info: we are using SQL Server 2005.

asked Aug 21 '08 by Scott Marlowe

4 Answers

We use both methods at my work, but slightly differently. We keep all sales data in the primary table for 30 days; then at night (as part of the nightly jobs) the day's sales are rolled up into summaries (n qty of x product sold today, etc.) in a separate table for reporting reasons, and sales over 30 days old are archived into a different database. Then once a year (we go on tax years) a new archive database is started. Not exactly perfect, but...

This way we get the summary data fast, keep all current sales data at hand, and have unlimited space for the detailed archive data. We did try keeping it all in one database (in different tables), but the file size of the database (Interbase) would grow so large that it would drag the system down.

The only real problem we have is accessing detailed data that spans several databases, as connecting and disconnecting is slow and the analysis has to be done in code rather than SQL.
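
To make that concrete, here is a minimal T-SQL sketch of such a nightly job, written for the asker's SQL Server 2005 rather than our Interbase setup. All table, column, and database names (Sales, SalesSummary, ArchiveDb, and so on) are hypothetical:

    -- Nightly job sketch. SQL Server 2005 has no DATE type, so truncate
    -- DATETIME values to midnight with the DATEADD/DATEDIFF trick.
    DECLARE @Today DATETIME;
    SET @Today = DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0);

    -- 1) Roll yesterday's detail rows up into a summary table
    --    (n qty of x product sold that day).
    INSERT INTO SalesSummary (SaleDate, ProductId, QtySold, TotalAmount)
    SELECT DATEADD(DAY, -1, @Today), ProductId, SUM(Qty), SUM(Amount)
    FROM Sales
    WHERE SaleTime >= DATEADD(DAY, -1, @Today)
      AND SaleTime <  @Today
    GROUP BY ProductId;

    -- 2) Move detail rows older than 30 days into the archive database,
    --    then remove them from the primary table.
    INSERT INTO ArchiveDb.dbo.Sales (SaleTime, ProductId, Qty, Amount)
    SELECT SaleTime, ProductId, Qty, Amount
    FROM Sales
    WHERE SaleTime < DATEADD(DAY, -30, @Today);

    DELETE FROM Sales
    WHERE SaleTime < DATEADD(DAY, -30, @Today);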

answered by Re0sless


If you are using SQL Server 2005, this may be a good candidate for partitioned tables.
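
For anyone unfamiliar with the feature, here is a minimal sketch of a partitioned metrics table in SQL Server 2005; the table definition and the yearly boundary values are hypothetical, not something from the question:

    -- Partition the metrics by year so old ranges can be queried
    -- separately or switched out without bloating short-term reports.
    CREATE PARTITION FUNCTION pfMetricsByYear (DATETIME)
    AS RANGE RIGHT FOR VALUES ('2006-01-01', '2007-01-01', '2008-01-01');

    CREATE PARTITION SCHEME psMetricsByYear
    AS PARTITION pfMetricsByYear ALL TO ([PRIMARY]);

    CREATE TABLE Metrics (
        MetricId   INT IDENTITY NOT NULL,
        RecordedAt DATETIME     NOT NULL,
        Value      FLOAT        NOT NULL
    ) ON psMetricsByYear (RecordedAt);

Queries that filter on RecordedAt then touch only the relevant partitions, and an old year can be moved out with ALTER TABLE ... SWITCH instead of a bulk DELETE.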

answered by wcm


@Jason - I don't see how keeping data in plain old text files will allow you to do long-term trending analysis on the data easily.

@Jason - I guess my point is that if business people need to do any sort of ad-hoc analysis (e.g. trending) on the data, rolling it up or archiving it to text files really doesn't solve any problems. Of course, writing code to consume a text file is easy in many languages, but that problem has already been solved. Also, I would argue that today's RDBMSs are all extremely durable when set up and maintained properly. If they weren't, why would you run a business on top of one (let alone archive data to it)? I just don't see the point of archiving to a plain text file on the claim that the durability of text files is superior to that of databases.

answered by Peter Meyer


Either of those options is excellent, but it really depends on the problem domain. For things like cash balances or statistical data, I think rolling up records and consolidating them is the best way; you can then move the rolled-up records into a parallel archive table, keyed in such a way that you can "unroll" them if necessary. This keeps your primary data table clean and quick, but allows you to retain the extra data for auditing or whatever. The key question is how you implement the roll-up process: automatically, via a trigger or server-side process, or by user intervention at the application level?
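
As a hypothetical illustration of that "key so you can unroll" idea (none of these table or column names come from the answer), the archived detail rows can carry the same key as the summary row that replaced them:

    -- Consolidate a closed period into one summary row per account, and
    -- key the parked detail rows by the same (AccountId, PeriodEnd) so
    -- a summary can be "unrolled" back into detail for auditing.
    DECLARE @PeriodEnd DATETIME;
    SET @PeriodEnd = '2008-07-31';

    INSERT INTO BalanceSummary (AccountId, PeriodEnd, Balance)
    SELECT AccountId, @PeriodEnd, SUM(Amount)
    FROM Transactions
    WHERE PostedAt <= @PeriodEnd
    GROUP BY AccountId;

    INSERT INTO TransactionArchive (AccountId, PeriodEnd, PostedAt, Amount)
    SELECT AccountId, @PeriodEnd, PostedAt, Amount
    FROM Transactions
    WHERE PostedAt <= @PeriodEnd;

    DELETE FROM Transactions WHERE PostedAt <= @PeriodEnd;

    -- "Unroll" a summary row when needed:
    -- SELECT PostedAt, Amount FROM TransactionArchive
    -- WHERE AccountId = @SomeAccount AND PeriodEnd = @PeriodEnd;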

answered by ninesided