 

Storing and accessing large data with Python [closed]

Tags:

python

bigdata

I'm about to start working with data that is ~500 GB in size. I'd like to be able to access small pieces of the data at any given time with Python. I'm considering using PyTables or MongoDB with PyMongo (or Hadoop - thanks, Drahkar). Are there other file structures or databases I should consider?

Some of the operations I'll be doing include computing distances between points and extracting subsets of the data based on boolean tests over indices. The results may eventually go online for a website, but for now the data is intended only for desktop analysis.
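To illustrate, here is a minimal sketch of that access pattern with PyTables, assuming an HDF5 file data.h5 holding a table points with float columns x, y, and z (all hypothetical names):

    import numpy as np
    import tables

    # Open an existing HDF5 file in read-only mode (path is hypothetical).
    h5 = tables.open_file("data.h5", mode="r")
    points = h5.root.points  # assumed table with float columns x, y, z

    # In-kernel query: pull back only the rows passing a boolean test,
    # without loading the full dataset into memory.
    nearby = points.read_where("((x - 1.0)**2 + (y - 2.0)**2) < 25.0")

    # Compute distances from a reference point for the extracted subset.
    ref = np.array([1.0, 2.0, 0.0])
    coords = np.column_stack((nearby["x"], nearby["y"], nearby["z"]))
    dists = np.linalg.norm(coords - ref, axis=1)

    h5.close()

PyTables evaluates the read_where condition chunk by chunk via numexpr, so only the matching rows ever reach Python.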

Cheers

asked Dec 08 '25 by ebressert

1 Answer

If you are seriously looking at Big Data processing, I would highly suggest looking into Hadoop; one distribution provider is Cloudera (http://www.cloudera.com/). It is a very powerful platform with many built-in tools for data processing. Many languages, including Python, have modules for accessing the data, and a Hadoop cluster can do a significant amount of the processing for you once you have written the MapReduce, Hive, and HBase jobs for it.
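To make the Python angle concrete, here is a minimal Hadoop Streaming mapper sketch; the tab-separated input format (an id plus x and y coordinates) is a hypothetical example:

    #!/usr/bin/env python
    # mapper.py: emits each point's distance from the origin.
    # Hadoop Streaming feeds input lines on stdin and collects
    # tab-separated key/value pairs from stdout.
    import math
    import sys

    for line in sys.stdin:
        fields = line.strip().split("\t")
        if len(fields) != 3:
            continue  # skip malformed lines
        point_id, x, y = fields
        dist = math.sqrt(float(x) ** 2 + float(y) ** 2)
        print("%s\t%f" % (point_id, dist))

You would run it with the hadoop-streaming jar that ships with Hadoop (hadoop jar hadoop-streaming.jar -input ... -output ... -mapper mapper.py), and add a reducer the same way if you need aggregation.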

answered Dec 10 '25 by Drahkar