Best way to store 10 - 100 million simulation outputs from .net (SQL vs. flat file)

Tags:

c#

.net

sql

I've been working on a project that generates on the order of 10 - 100 million outputs from a simulation, which I would like to store for future analyses. There are several natural levels of organization in the data, e.g. Classrooms hold Students, who take Tests, each of which has a handful of different performance metrics.

It seems like my data is borderline in terms of fitting in memory all at once (given that the simulation itself requires a fair amount of data in memory), but I don't have any immediate need for all of the data to be available to my program at once.

I am considering whether it would be better to output the calculated values to a SQL database or to a flat text file. I am looking for advice about which approach might be faster or easier to maintain (if you have an alternative suggestion for storing the data, I am open to that too).

I don't need to share the data with anyone else or worry about accessing it years down the line. I just need a convenient way to avoid regenerating the simulations every time I want to tweak the analysis of the values.

Rob Donnelly asked Dec 21 '12 01:12



1 Answer

I'd consider using a database. If each output were written to its own file, 100 million files would be too many for a file system without some kind of directory classification scheme, while a database can easily handle that many rows. You could just serialize each output into a BLOB column so you don't have to map its fields to individual columns. Also, consider that SQL Server has FILESTREAM storage, which keeps BLOB data as files on the file system while the database manages them for you, so this could essentially be a hybrid approach.
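As a rough illustration of the "serialize into a BLOB column" idea, here is a minimal sketch using only the .NET base class library. Everything named here is an assumption for the sake of the example: the TestResult class, its fields, the SimulationOutput table and its columns are all hypothetical, and the binary layout is just one simple choice among many. The Save method takes any ADO.NET IDbConnection, so it is not tied to a particular database.

```csharp
using System;
using System.Data;
using System.IO;

// Hypothetical shape of one simulation output (names are illustrative only).
public sealed class TestResult
{
    public int StudentId;
    public int TestId;
    public double[] Metrics = new double[0];
}

public static class ResultBlob
{
    // Pack one result into a byte array: two ints, a count, then the metrics.
    public static byte[] Serialize(TestResult r)
    {
        using (var ms = new MemoryStream())
        {
            using (var w = new BinaryWriter(ms))
            {
                w.Write(r.StudentId);
                w.Write(r.TestId);
                w.Write(r.Metrics.Length);
                foreach (double m in r.Metrics) w.Write(m);
            }
            // ToArray is still valid after the writer has closed the stream.
            return ms.ToArray();
        }
    }

    // Read the fields back in the same order they were written.
    public static TestResult Deserialize(byte[] blob)
    {
        using (var reader = new BinaryReader(new MemoryStream(blob)))
        {
            var result = new TestResult();
            result.StudentId = reader.ReadInt32();
            result.TestId = reader.ReadInt32();
            int count = reader.ReadInt32();
            result.Metrics = new double[count];
            for (int i = 0; i < count; i++) result.Metrics[i] = reader.ReadDouble();
            return result;
        }
    }

    // Store the blob with a parameterized INSERT.
    // Assumes a hypothetical table SimulationOutput(ClassroomId, StudentId, Payload)
    // and a provider that accepts @-prefixed parameter names.
    public static void Save(IDbConnection conn, int classroomId, TestResult r)
    {
        using (IDbCommand cmd = conn.CreateCommand())
        {
            cmd.CommandText =
                "INSERT INTO SimulationOutput (ClassroomId, StudentId, Payload) " +
                "VALUES (@classroom, @student, @payload)";
            AddParam(cmd, "@classroom", classroomId);
            AddParam(cmd, "@student", r.StudentId);
            AddParam(cmd, "@payload", Serialize(r));
            cmd.ExecuteNonQuery();
        }
    }

    private static void AddParam(IDbCommand cmd, string name, object value)
    {
        IDbDataParameter p = cmd.CreateParameter();
        p.ParameterName = name;
        p.Value = value;
        cmd.Parameters.Add(p);
    }
}
```

Keeping ids such as ClassroomId and StudentId as ordinary columns next to the blob means you can still select a subset of rows for an analysis without deserializing everything. For tens of millions of rows you would probably also want to batch the inserts inside a transaction (or use a provider-specific bulk load) rather than committing row by row.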

codekaizen answered Oct 04 '22 11:10
