I'm working on an academic project aimed at studying people's behavior.
The project will be divided into three parts.
The data consists of a list of people, each with an ID number and several characteristics: height, weight, age, and so on. I need an easy way to build groups out of this data (e.g., everyone of a given age, or everyone within a range of heights). The dataset is several TB in size, but it can be reduced to smaller subsets of 2-3 GB.
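To make this concrete, here is a toy sketch of the kind of selection I mean (all field names are made up):

```python
# Toy data; the real dataset has many more fields and far more rows.
people = [
    {"id": 1, "height": 175.0, "weight": 70.2, "age": 34},
    {"id": 2, "height": 162.5, "weight": 55.8, "age": 29},
    {"id": 3, "height": 181.3, "weight": 88.1, "age": 34},
]

# Group by a given age.
age_34 = [p for p in people if p["age"] == 34]

# Group by a range of heights.
tall = [p for p in people if 170.0 <= p["height"] <= 185.0]
```

The question is which tool lets me express this just as simply when the data doesn't fit in memory.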
I have a strong background in the theory behind the project, but I'm not a computer scientist. I know Java, C, and MATLAB, and I'm now learning Python.
I would like to use Python, since it seems easy enough and greatly reduces the verbosity of Java. The problem is handling the data pool.
I'm no database expert, but I guess I need one here. What tools do you think I should use?
Keep in mind that the aim is to implement very advanced mathematical functions on sets of data, so we want to minimize source-code complexity. Speed is not an issue.
It sounds like the main functionality you need can be found in:
pytables
and
scipy/numpy
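For example, here is a minimal PyTables sketch (not tested against your data, and the column names are just placeholders) that keeps the table on disk and pulls a height range out as a NumPy array:

```python
import numpy as np
import tables

# Row layout for the on-disk table; field names are illustrative.
class Person(tables.IsDescription):
    person_id = tables.Int64Col()
    height    = tables.Float64Col()   # cm
    weight    = tables.Float64Col()   # kg
    age       = tables.Int32Col()

with tables.open_file("people.h5", mode="w") as h5:
    table = h5.create_table("/", "people", Person)

    # Append a few toy rows; real data would be loaded in chunks.
    row = table.row
    for pid, h, w, a in [(1, 175.0, 70.2, 34), (2, 162.5, 55.8, 29)]:
        row["person_id"] = pid
        row["height"] = h
        row["weight"] = w
        row["age"] = a
        row.append()
    table.flush()

    # Select everyone within a range of heights; the result is a
    # NumPy structured array ready for scipy/numpy routines.
    group = table.read_where("(height >= 170.0) & (height <= 185.0)")
    print(group["person_id"], np.mean(group["weight"]))
```

The `read_where` query is evaluated on disk ("in-kernel"), so the full multi-TB table never has to fit in memory, and the resulting arrays plug straight into scipy/numpy.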
Go with a NoSQL database like MongoDB, which makes handling data in a case like this much easier than having to learn SQL.
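If you go that route, a rough pymongo sketch of the same kind of range query (this assumes a MongoDB server running locally, and the field names are illustrative):

```python
from pymongo import MongoClient

# Connect to a local MongoDB server; database/collection names are made up.
client = MongoClient("localhost", 27017)
people = client["study"]["people"]

people.insert_many([
    {"person_id": 1, "height": 175.0, "weight": 70.2, "age": 34},
    {"person_id": 2, "height": 162.5, "weight": 55.8, "age": 29},
])

# Everyone of a given age.
same_age = list(people.find({"age": 34}))

# Everyone within a range of heights.
in_range = list(people.find({"height": {"$gte": 170.0, "$lte": 185.0}}))
```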