I looked through several SO questions on how to pickle a Python object and store it in a database. The information I collected is:

1. import pickle or import cPickle (import the latter if performance is an issue).
2. pickled = pickle.dumps(dict), where dict is a Python dictionary (or any other Python object).
3. Store pickled in a MySQL BLOB column using whatever module you use to communicate with the database.
4. pickle.loads(pickled) restores the Python dictionary.

I just want to make sure I understood this right. Did I miss something critical? Are there side effects? Is it really that easy?

Background info: The only thing I want to do is store Google geocoder responses, which in my case are nested Python dictionaries. I am only using a small part of the response object, and I don't know if I will ever need more of it later on. That's why I thought of storing the response, to save me repeating some million queries.
In general, pickling a dict will fail only if it contains objects that can't be pickled, such as open file handles, sockets, lambdas, or database connections. A dict holding only simple objects like strings and integers pickles reliably.
Pickle in Python is primarily used for serializing and deserializing a Python object structure. In other words, it's the process of converting a Python object into a byte stream in order to store it in a file or database, maintain program state across sessions, or transport data over the network.
First, import pickle to use it, then define an example dictionary, which is a Python object. Next, open a file (note that we open it in binary write mode in Python 3+), use pickle.dump() to write the dict into the opened file, and close it. Use pickle.load() to read it back.
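A minimal sketch of those steps, using a made-up nested dictionary shaped like a geocoder response (the filename and data are illustrative):

```python
import pickle

# Hypothetical nested dict, shaped like a geocoder response.
response = {"results": [{"geometry": {"location": {"lat": 52.52, "lng": 13.405}}}]}

# Serialize: open in binary write mode ('wb') in Python 3+.
with open("response.pickle", "wb") as f:
    pickle.dump(response, f)

# Deserialize: open in binary read mode ('rb').
with open("response.pickle", "rb") as f:
    restored = pickle.load(f)

print(restored == response)  # True
```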
It's really that easy... so long as you don't need your DB to know anything about the dictionary. If you need any sort of structured data access to the contents of the dictionary, then you're going to have to get more involved.
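For the database side, here is a sketch of the BLOB round trip, using the standard-library sqlite3 module as a stand-in for a MySQL driver (MySQLdb, mysql-connector, ...); the table name and data are made up, but the DB-API calls have the same shape:

```python
import pickle
import sqlite3

# Hypothetical geocoder data to cache.
response = {"lat": 52.52, "lng": 13.405, "address": "Berlin"}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE geocode_cache (query TEXT PRIMARY KEY, response BLOB)")

# Store: pickle to bytes, insert into the BLOB column.
pickled = pickle.dumps(response)
conn.execute("INSERT INTO geocode_cache VALUES (?, ?)", ("Berlin", pickled))

# Restore: fetch the bytes back and unpickle.
row = conn.execute(
    "SELECT response FROM geocode_cache WHERE query = ?", ("Berlin",)
).fetchone()
restored = pickle.loads(row[0])
print(restored == response)  # True
```

The database only sees opaque bytes, which is exactly the "DB knows nothing about the dictionary" trade-off described above.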
Another gotcha might be what you intend to put in the dict. Python's pickle serialization is quite intelligent and can handle most cases without any need for adding custom support. However, when it doesn't work, it can be very difficult to understand what's gone wrong. So if you can, restrict the contents of the dict to Python's built-in types. If you start adding instances of custom classes, keep them to simple custom classes that don't do any funny stuff with attribute storage or access. And beware of adding instances of classes or types from add-ons. In general, if you start running into hard-to-understand problems with the pickling or unpickling, look at the non-built-in types in the dict.
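To make the gotcha concrete: built-in types round-trip cleanly, while a dict containing an unpicklable object (here an open file handle) fails immediately. The dict contents are made up:

```python
import os
import pickle

# Built-in types round-trip without trouble.
simple = {"name": "story", "ids": [1, 2, 3]}
assert pickle.loads(pickle.dumps(simple)) == simple

# An open file handle cannot be serialized; pickle raises TypeError.
tricky = {"name": "story", "log": open(os.devnull, "w")}
try:
    pickle.dumps(tricky)
except TypeError as e:
    print("failed:", e)
finally:
    tricky["log"].close()
```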
If speed is really important: I just ran a test of loading a large Python dictionary (35 MB) from a pickle vs. SELECTing from a MySQL table with all keys and values stored in rows:
Pickle Method:
import time, pickle

t1 = time.perf_counter()  # time.clock() was removed in Python 3.8
with open('story_data.pickle', 'rb') as f:
    s = pickle.load(f)
print(time.perf_counter() - t1)
MySQL Method:
import time
import database as db  # the author's own MySQL wrapper module

t1 = time.perf_counter()
data, msg = db.mysql("""SELECT id, story FROM story_data;""")
data_dict = dict((int(x), y.split(',')) for x, y in data)
print(time.perf_counter() - t1)
Output:
pickle method: 32.0785171704
mysql method: 3.25916336479
If a ten-fold speedup matters to you, the structure of the database probably doesn't. Note that I am splitting all the comma-separated data into lists as the values for 36,000 keys, and it still takes only 3 seconds. So I've switched away from using pickle for large data sets: the rest of the 400-line program I was using took about 3 seconds, and the pickle loading alone took 32 seconds.
Also note:
cPickle works just like pickle and is over 50% faster. (This applies to Python 2; in Python 3 there is no separate cPickle module, and the standard pickle uses the C implementation automatically.)
Don't try to pickle a class full of dictionaries and save it in MySQL: it doesn't reconstitute itself correctly, at least it didn't for me.
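One common reason instances fail to reconstitute is that pickle stores only a reference to the class (its module and name), not the class's code, so the same class must be importable under the same name when you unpickle. A sketch with a made-up class:

```python
import pickle

class Story:
    """Toy class; pickle records a reference like 'module.Story', not the code."""
    def __init__(self, parts):
        self.parts = parts

blob = pickle.dumps(Story(["a", "b"]))

# Unpickling works here because Story is importable in this process.
# Loading the same bytes where the class has moved, been renamed, or
# doesn't exist raises AttributeError / ModuleNotFoundError instead.
restored = pickle.loads(blob)
print(restored.parts)  # ['a', 'b']
```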