
Efficient way to analyze large amounts of data?

I need to analyze tens of thousands of lines of data. The data is imported from a text file. Each line of data has eight variables. Currently, I use a class to define the data structure. As I read through the text file, I store each line object in a generic list, List&lt;T&gt;.

I am wondering if I should switch to using a relational database (SQL), as I will need to analyze the data in each line of text, trying to relate it to definition terms which I also currently store in generic lists (List&lt;T&gt;).

The goal is to translate a large amount of data using the definitions. I want the defined data to be filterable, searchable, etc. Using a database makes more sense the more I think about it, but I would like to confirm with more experienced developers before I make changes yet again (I was using structs and ArrayLists at first).

The only drawback I can think of is that the data does not need to be retained after it has been translated and viewed by the user. There is no need for permanent storage, so using a database might be a little overkill.
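For reference, here is a minimal sketch of the setup described above; the class name, field names, delimiter, and file path are all illustrative assumptions, not details from the original post:

using System;
using System.Collections.Generic;
using System.IO;

class Record
{
    // Eight variables per line, as described; the types are assumptions.
    public string Field1, Field2, Field3, Field4;
    public string Field5, Field6, Field7, Field8;
}

class Program
{
    static void Main()
    {
        var records = new List<Record>();
        foreach (var line in File.ReadLines("data.txt"))
        {
            var parts = line.Split('\t');   // delimiter is an assumption
            if (parts.Length < 8) continue; // skip malformed lines
            records.Add(new Record
            {
                Field1 = parts[0], Field2 = parts[1],
                Field3 = parts[2], Field4 = parts[3],
                Field5 = parts[4], Field6 = parts[5],
                Field7 = parts[6], Field8 = parts[7]
            });
        }
        Console.WriteLine($"Loaded {records.Count} records.");
    }
}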

asked Feb 28 '10 by Snooze

People also ask

How do you analyze a large amount of data?

For large datasets, analyze continuous variables (such as age) by determining the mean, median, standard deviation and interquartile range (IQR). Analyze nominal variables (such as gender) by using percentages.

How do you handle analyzing a data set that is too large to be processed?

Another way to handle large datasets is to chunk them: cut the large dataset into smaller chunks and process those chunks individually. After all the chunks have been processed, you combine the per-chunk results to compute the final findings.
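A minimal sketch of that chunking pattern in C# (the file path, chunk size, and the sum-per-chunk aggregation are illustrative assumptions):

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class ChunkDemo
{
    static void Main()
    {
        const int chunkSize = 10000;           // chunk size is an assumption
        var chunkSums = new List<long>();
        var chunk = new List<long>(chunkSize);

        foreach (var line in File.ReadLines("big.txt")) // path is illustrative
        {
            if (long.TryParse(line, out var value))
                chunk.Add(value);
            if (chunk.Count == chunkSize)
            {
                chunkSums.Add(chunk.Sum());    // process one chunk
                chunk.Clear();
            }
        }
        if (chunk.Count > 0) chunkSums.Add(chunk.Sum()); // last partial chunk

        Console.WriteLine($"Total = {chunkSums.Sum()}"); // combine results
    }
}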


4 Answers

This is not a large amount of data. I don't see any reason to involve a database in your analysis.

There IS a query language built into C# -- LINQ. The original poster currently uses a list of objects, so there is really nothing left to do. It seems to me that a database in this situation would add far more heat than light.
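As a rough illustration of that point (the record type and property names below are assumptions, not the poster's actual class), filtering and grouping a List&lt;T&gt; with LINQ needs no database at all:

using System;
using System.Collections.Generic;
using System.Linq;

class LineRecord
{
    public string Term { get; set; }
    public double Value { get; set; }
}

class LinqDemo
{
    static void Main()
    {
        var lines = new List<LineRecord>
        {
            new LineRecord { Term = "alpha", Value = 1.5 },
            new LineRecord { Term = "beta",  Value = 2.0 },
            new LineRecord { Term = "alpha", Value = 3.5 },
        };

        // Filter, then group and aggregate -- all in memory.
        var summary = lines
            .Where(l => l.Value > 1.0)
            .GroupBy(l => l.Term)
            .Select(g => new { Term = g.Key, Total = g.Sum(l => l.Value) });

        foreach (var row in summary)
            Console.WriteLine($"{row.Term}: {row.Total}");
    }
}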

answered Nov 04 '22 by Joe H


I faced the same problem you are facing now while working at my previous company. I was looking for a solid solution for handling a lot of barcode-generated files; each run produced a text file with thousands of records in a single file. Manipulating and presenting the data was difficult for me at first. What I ended up programming was a class that read the file, loaded the data into a DataTable, and saved it to a database (SQL Server 2005 in my case). Then I was able to manage the saved data easily and present it any way I liked. The main point is to read the data from the file and save it to the database; if you do so, you will have a lot of options to manipulate and present it the way you like.
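A minimal sketch of that read-then-save pattern (the column names, delimiter, file path, connection string, and table name are all illustrative assumptions; SqlBulkCopy is one common way to push a DataTable into SQL Server):

using System.Data;
using System.Data.SqlClient;
using System.IO;

class ImportDemo
{
    static void Main()
    {
        // Build a DataTable from the text file.
        var table = new DataTable("Barcodes");
        table.Columns.Add("Code", typeof(string));
        table.Columns.Add("Quantity", typeof(int));

        foreach (var line in File.ReadLines("barcodes.txt"))
        {
            var parts = line.Split(',');
            if (parts.Length < 2) continue;  // skip malformed lines
            table.Rows.Add(parts[0], int.Parse(parts[1]));
        }

        // Bulk-insert into SQL Server; the target table must already exist.
        using (var bulk = new SqlBulkCopy("Server=.;Database=Scans;Integrated Security=true"))
        {
            bulk.DestinationTableName = "dbo.Barcodes";
            bulk.WriteToServer(table);
        }
    }
}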

answered Nov 04 '22 by wonde


It is not absolutely necessary to go to a database. It depends on the actual size of the data and the processing you need to do. If you are loading the data into a List&lt;T&gt; with a custom class, why not use LINQ to do your querying and filtering? Something like:

var query = from foo in fooList        // fooList is your List<Foo>
            where foo.Prop == criteriaVar
            select foo;

The real question is whether the data is so large that it cannot be loaded into memory comfortably. If that is the case, then yes, a database would be much simpler.

answered Nov 04 '22 by Thomas


It sounds like what you want is a database. SQLite supports in-memory databases (use ":memory:" as the filename). I suspect others may have an in-memory mode as well.
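A minimal sketch using the Microsoft.Data.Sqlite package (choosing that package is an assumption; System.Data.SQLite exposes a similar API). With ":memory:" as the data source, nothing ever touches disk, which fits the "no permanent storage" requirement:

using System;
using Microsoft.Data.Sqlite;

class InMemoryDemo
{
    static void Main()
    {
        // The database lives only as long as the connection stays open.
        using (var conn = new SqliteConnection("Data Source=:memory:"))
        {
            conn.Open();

            var cmd = conn.CreateCommand();
            cmd.CommandText = "CREATE TABLE lines (term TEXT, value REAL)";
            cmd.ExecuteNonQuery();

            cmd.CommandText = "INSERT INTO lines VALUES ('alpha', 1.5), ('beta', 2.0)";
            cmd.ExecuteNonQuery();

            // Filter with plain SQL instead of LINQ.
            cmd.CommandText = "SELECT term, value FROM lines WHERE value > 1.0";
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    Console.WriteLine($"{reader.GetString(0)}: {reader.GetDouble(1)}");
        }
    }
}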

answered Nov 04 '22 by i_am_jorf