
python pickle vs sql efficiency

Tags: python, sql, pickle

I'm developing an application in Python which requires storing (very) large data sets. Is pickle the most practical way to store the data and retrieve it on request, or should I consider using SQL instead? My main goals are speed and as little processing strain as possible.

My concern is that pickle has to process an entire large file on the fly, which could adversely affect performance. I'm not particularly familiar with pickle beyond basic usage, so any explanation of how it works would be great.

Right now, I'm using this code:

import pickle

def check_login():
    # Load the whole pickled dict of username -> password into memory.
    with open("users.py", "rb") as f:
        users = pickle.load(f)
    username = input("Please enter a username: ")
    password = input("Please enter a password: ")
    # Return 1 only if the user exists and the password matches, else 0.
    if username not in users:
        return 0
    if users[username] != password:
        return 0
    return 1

Imagining that users contains 1 million entries, which would be more efficient: this or SQL?

Any help would be great,

Thanks

user2330561 asked May 11 '13 12:05



2 Answers

Pickle is generally suited to storing objects. If you want to store 'raw' data efficiently, pickle probably isn't the way to go, but it's very dependent on the specific situation: is loading the data time-critical, and do you have the development time to set up a database, queries, etc.?

If your data is a million pairs of username and date of birth, then pickle is probably not the best way to go; it would arguably be simpler to store the data in a flat text file.
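As a rough sketch of the flat-file idea above, assuming one tab-separated `username<TAB>date of birth` pair per line: the file name `users.tsv`, the sample rows, and the helper `find_record` are all hypothetical. The scan reads one line at a time, so memory use stays small, but each lookup is still a linear pass over the file.

```python
import os
import tempfile

def find_record(path, username):
    """Scan the file line by line; return the stored value or None."""
    with open(path, encoding="utf-8") as f:
        for line in f:  # only one line is held in memory at a time
            name, _, value = line.rstrip("\n").partition("\t")
            if name == username:
                return value
    return None

# Hypothetical sample data written to a temporary file.
path = os.path.join(tempfile.mkdtemp(), "users.tsv")
with open(path, "w", encoding="utf-8") as f:
    f.write("alice\t1990-01-31\n")
    f.write("bob\t1985-07-04\n")

print(find_record(path, "bob"))
```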

Both the pickle and the database/SQL solutions have the advantage of being extensible. Bear in mind that pickle is not 'secure': loading a pickle can execute arbitrary code, so you should consider the trustworthiness of the file, e.g. whether it would be transferred between different systems.
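The security caveat can be illustrated with a small, harmless sketch: unpickling calls whatever callable the byte stream names, so a crafted file can run code on load. The class `Payload` is hypothetical and only calls `os.getcwd`; a malicious file could name any importable function instead.

```python
import os
import pickle

class Payload:
    """Hypothetical object whose pickle tells the loader to call a function."""
    def __reduce__(self):
        # On load, pickle calls os.getcwd() and uses its return value
        # as the "restored" object.
        return (os.getcwd, ())

data = pickle.dumps(Payload())
result = pickle.loads(data)  # a function call happens here, not a plain read
print(type(result).__name__, result)
```

This is why pickle files should only be loaded from sources you trust.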

Overall, if your data sets are very large, a relational database may be more suitable than pickle, but you may also want to consider other storage engines, e.g. Redis, MongoDB, or Memcached. All of them are very situation-dependent, though, so any more information you can provide on how the data is expected to be used would be useful!

Tom Dalton answered Sep 22 '22 12:09



Since you are searching for a particular user in the users object, I think SQL will be a better solution.

Supposing users is an array, you will have to search for that user from the beginning to the end of the array. With SQL you can add indexes, which, depending on how you model your user object, can give you a speed boost.
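A minimal sketch of the indexed lookup described above, assuming SQLite through Python's standard `sqlite3` module. The table name `users`, its columns, the sample rows, and the helpers `open_db` and `check_login` are hypothetical. Passwords are stored in plain text only to mirror the question's code; a real application would store salted hashes.

```python
import sqlite3

def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) users table."""
    conn = sqlite3.connect(path)
    # PRIMARY KEY makes SQLite maintain an index on username,
    # so a lookup does not have to scan the whole table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "username TEXT PRIMARY KEY, password TEXT NOT NULL)"
    )
    return conn

def check_login(conn, username, password):
    """Return 1 if the username exists and the password matches, else 0."""
    row = conn.execute(
        "SELECT password FROM users WHERE username = ?", (username,)
    ).fetchone()
    if row is None or row[0] != password:
        return 0
    return 1

conn = open_db()
conn.executemany(
    "INSERT INTO users (username, password) VALUES (?, ?)",
    [("alice", "secret"), ("bob", "hunter2")],
)
conn.commit()
print(check_login(conn, "alice", "secret"))
```

Only the matching row is read for each login attempt, and the parameterized query (`?`) keeps user input out of the SQL text.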

Also, pickle will parse, recreate and load all the stored objects, so the cost of loading alone (both in processor time and memory used) will probably make it the worse option.
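To make the load cost concrete, here is a rough sketch that pickles a dictionary and then times a full `pickle.load` followed by a single lookup. The entry count, the file name `users.pkl`, and the helper names are assumptions for illustration; timings depend on the machine, so none are quoted here.

```python
import os
import pickle
import tempfile
import time

def build_users(n):
    """Build a hypothetical dict of n username -> password pairs."""
    return {"user%d" % i: "pw%d" % i for i in range(n)}

def timed_lookup(path, username):
    """Load the whole pickle, then look up one user; return (found, seconds)."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        users = pickle.load(f)  # every entry is rebuilt in memory here
    found = username in users
    return found, time.perf_counter() - start

path = os.path.join(tempfile.mkdtemp(), "users.pkl")
with open(path, "wb") as f:
    pickle.dump(build_users(100_000), f, protocol=pickle.HIGHEST_PROTOCOL)

found, seconds = timed_lookup(path, "user42")
print("found=%s after %.3f s spent loading and searching" % (found, seconds))
```

Once the dictionary is in memory the lookup itself is cheap; the cost is that the whole file must be read and rebuilt before the first lookup can happen.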

Salem answered Sep 22 '22 12:09
