Handle large data pools in Python

Question

I'm working on an academic project aimed at studying people's behavior.

The project will be divided into three parts:

A program to read the data from some remote sources, and build a local data pool with it.
A program to validate this data pool and keep it coherent.
A web interface to allow people to read/manipulate the data.

The data consists of a list of people, all with an ID #, and with several characteristics: height, weight, age, ...

I need to easily make groups out of this data (e.g. everyone with a given age, or within a range of heights), and the data is several TB in size (but can be reduced to smaller subsets of 2-3 GB).

I have a strong background in the theory behind the project, but I'm not a computer scientist. I know Java, C and Matlab, and now I'm learning Python.

I would like to use Python since it seems easy enough and is much less verbose than Java. The problem is that I'm wondering how to handle the data pool.

I'm no expert on databases, but I guess I need one here. What tools do you think I should use?

Remember that the aim is to implement very advanced mathematical functions on sets of data, so we want to keep the source code as simple as possible. Speed is not an issue.

eat · Accepted Answer

It sounds like the main functionality you need can be found in:
PyTables
and
SciPy/NumPy
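
To give a rough idea of how the two fit together, here is a minimal sketch. It assumes a hypothetical `Person` schema built from the fields mentioned in the question (ID, age, height, weight); the units, the file name and the helper names `build_pool` and `select_group` are made up for illustration. PyTables keeps the table on disk in an HDF5 file and evaluates the selection there, so only the matching rows come back as a NumPy structured array.

```python
import os
import tempfile

import numpy as np
import tables as tb


class Person(tb.IsDescription):
    # Hypothetical schema based on the fields named in the question.
    person_id = tb.Int64Col()
    age = tb.Int32Col()
    height = tb.Float32Col()  # assumed to be in centimetres
    weight = tb.Float32Col()  # assumed to be in kilograms


def build_pool(path, records):
    """Write (person_id, age, height, weight) tuples to an HDF5 table."""
    with tb.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "people", Person)
        row = table.row
        for person_id, age, height, weight in records:
            row["person_id"] = person_id
            row["age"] = age
            row["height"] = height
            row["weight"] = weight
            row.append()
        table.flush()


def select_group(path, condition):
    """Return the rows matching `condition` as a NumPy structured array."""
    with tb.open_file(path, mode="r") as h5:
        return h5.root.people.read_where(condition)


# Tiny made-up sample; a real pool would be filled from the remote sources.
records = [
    (1, 30, 172.0, 68.5),
    (2, 30, 181.0, 80.0),
    (3, 45, 165.0, 59.0),
    (4, 52, 178.5, 90.2),
]
pool_path = os.path.join(tempfile.mkdtemp(), "people.h5")
build_pool(pool_path, records)

# Group by a given age, and by a range of heights.
same_age = select_group(pool_path, "age == 30")
height_range = select_group(pool_path, "(height >= 170.0) & (height < 180.0)")

# The result is a plain NumPy array, so NumPy/SciPy functions apply directly.
mean_weight = float(np.mean(height_range["weight"]))
```

Since speed is not a concern, a plain `read_where` per group is probably enough; the mathematical part can then work on ordinary NumPy arrays without knowing where the data came from.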

Andreas Jung · Answer

Go with a NoSQL database like MongoDB; handling this kind of data with it is much easier than having to learn SQL.

Tags:

python

database

large-data

Mascarpone