
Big Data Database

I am collecting a large amount of data which will most likely be in a format like the following:

User 1: (a,o,x,y,z,t,h,u)

Where all the variables change dynamically with respect to time, except u, which is used to store the user name. Since my background is not very strong in "big data", what I am trying to understand is this: my final array will be very large, something like 108000 x 3500, and since I will be performing analysis on each timestep and graphing the results, what would be an appropriate database to manage it in? Since this is for scientific research I was looking at CDF and HDF5, and based on what I read here (NASA) I think I will want to use CDF. But is this the correct way to manage such data for speed and efficiency?

The final data set will have all the users as columns, and the rows will be timestamped, so my analysis program would read row by row to interpret the data and make entries into the dataset. Maybe I should be looking at things like CouchDB or an RDBMS instead; I just don't know a good place to start. Advice would be appreciated.

eWizardII asked Jan 04 '13 09:01




1 Answer

This is an extended comment rather than a comprehensive answer ...

With respect, a dataset of size 108000*3500 doesn't really qualify as big data these days, not unless you've omitted a unit such as GB. If it's just 108000*3500 eight-byte values, that's only 3GB plus change. Any of the technologies you mention will cope with that with ease. I think you ought to make your choice on the basis of which approach will speed your development rather than speeding your execution.
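A quick back-of-envelope check makes the point (this assumes 8-byte floating-point values, which the question does not actually specify):

```python
# Rough size of the array described in the question,
# assuming each value is an 8-byte float (float64).
rows, cols = 108_000, 3_500
bytes_total = rows * cols * 8
print(f"{bytes_total:,} bytes ≈ {bytes_total / 1e9:.2f} GB")
# → 3,024,000,000 bytes ≈ 3.02 GB
```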

But if you want further suggestions to consider, I suggest:

  1. SciDB
  2. Rasdaman, and
  3. MonetDB

all of which have some traction in the academic big data community and are beginning to be used outside that community too.
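If the HDF5 route you mention wins out, a minimal sketch with the h5py library might look like the following. The file name, dataset name, float64 element type, and the small stand-in shape are all illustrative assumptions, not anything from the question:

```python
import numpy as np
import h5py

# Small stand-in for the real 108000 x 3500 array.
rows, cols = 1_000, 50

with h5py.File("timeseries.h5", "w") as f:
    # One row per timestep, one column per user. Chunking by whole
    # rows means a row-by-row analysis pass reads each chunk once.
    dset = f.create_dataset("samples", shape=(rows, cols),
                            dtype="f8", chunks=(1, cols))
    for t in range(rows):
        dset[t, :] = np.random.rand(cols)  # one timestep's values

with h5py.File("timeseries.h5", "r") as f:
    dset = f["samples"]
    for t in range(dset.shape[0]):  # row-by-row interpretation
        row = dset[t, :]
        # ... analyse / graph this timestep here ...
```

The chunk shape is the main design knob here: matching it to the access pattern (whole rows, in this case) is what keeps the row-by-row reads cheap.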

High Performance Mark answered Sep 29 '22 03:09