I'm about to start working with data that is ~500 GB in size. I'd like to be able to access small components of the data at any given time with Python. I'm considering using PyTables or MongoDB with PyMongo (or Hadoop - thanks Drahkar). Are there other file structures/DBs that I should consider?
Some of the operations I'll be doing are computing distances from one point to another and extracting data based on indices from boolean tests, and the like. The results may eventually go online for a website, but at the moment the data is intended to be used only on a desktop for analysis.
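To make the access pattern concrete, here is a rough sketch of what I have in mind with PyTables; the file name, table path, and column names are just placeholders, not my actual schema:

```python
# Minimal sketch (placeholders only): an HDF5 file "points.h5" with a
# table /points holding float columns "x" and "y".
import numpy as np
import tables

with tables.open_file("points.h5", mode="r") as h5:
    pts = h5.root.points

    # Pull a small slice instead of loading the whole 500 GB table.
    chunk = pts[0:100000]

    # Boolean test evaluated inside the file; only matching rows come back.
    nearby = pts.read_where("(x > 0.0) & (x < 10.0)")

    # Distance from each selected point to a reference point.
    ref = np.array([5.0, 5.0])
    d = np.hypot(nearby["x"] - ref[0], nearby["y"] - ref[1])
```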
Cheers
If you are seriously looking at Big Data processing, I would highly suggest looking into Hadoop; one provider is Cloudera ( http://www.cloudera.com/ ). It is a very powerful platform with many tools for data processing. Many languages, including Python, have modules for accessing the data, and a Hadoop cluster can do a significant amount of the processing for you once you have built the various MapReduce, Hive, and HBase jobs.
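For example, here is a rough sketch of a Python mapper for Hadoop Streaming; the input layout (tab-separated "id x y" lines) and the reference point are made up for illustration, not taken from your data:

```python
#!/usr/bin/env python
# mapper.py -- Hadoop Streaming sketch; assumes each input line is
# "id<TAB>x<TAB>y" and a hypothetical fixed reference point.
import sys
import math

REF_X, REF_Y = 5.0, 5.0  # hypothetical reference point

for line in sys.stdin:
    try:
        point_id, x, y = line.strip().split("\t")
        dist = math.hypot(float(x) - REF_X, float(y) - REF_Y)
    except ValueError:
        continue  # skip malformed lines
    # Emit key<TAB>value so Hadoop can shuffle/sort by point id.
    sys.stdout.write("%s\t%.6f\n" % (point_id, dist))
```

You would hand a script like this to Hadoop Streaming with the usual -input, -output, and -mapper options; the exact streaming jar path depends on your distribution.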