I looked through several SO questions on how to pickle a Python object and store it in a database. The information I collected is:

1. import pickle or import cPickle (import the latter if performance is an issue).
2. pickled = pickle.dumps(dict), where dict is a Python dictionary (or any other Python object).
3. Store pickled in a MySQL BLOB column using whatever module you use to communicate with the database.
4. pickle.loads(pickled) restores the Python dictionary.

I just want to make sure I understood this right. Did I miss something critical? Are there side effects? Is it really that easy?

Background info: The only thing I want to do is store Google geocoder responses, which in my case are nested Python dictionaries. I am only using a small part of the response object, and I don't know if I will ever need more of it later on. That's why I thought of storing the response, to save me repeating some million queries.
In general, pickling a dict will fail only if it contains objects that can't be pickled, such as open file handles, sockets, lambdas, or database connections. A dict holding only simple objects like strings and integers pickles reliably.
Pickle in Python is primarily used for serializing and deserializing a Python object structure. In other words, it's the process of converting a Python object into a byte stream in order to store it in a file or database, maintain program state across sessions, or transport data over the network.
First, import pickle to use it, then define an example dictionary, which is a Python object. Next, open a file (note that we open it in binary write mode in Python 3+), use pickle.dump() to write the dict into the opened file, and close it. Use pickle.load() to read it back.
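A minimal sketch of those steps, using a made-up nested dictionary shaped like a geocoder response (the filename and data are illustrative):

```python
import pickle

# Hypothetical nested dict, shaped like a geocoder response.
response = {"results": [{"geometry": {"location": {"lat": 52.52, "lng": 13.405}}}]}

# Serialize: open in binary write mode ('wb') in Python 3+.
with open("response.pickle", "wb") as f:
    pickle.dump(response, f)

# Deserialize: open in binary read mode ('rb').
with open("response.pickle", "rb") as f:
    restored = pickle.load(f)

print(restored == response)  # True
```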
It's really that easy... so long as you don't need your DB to know anything about the dictionary. If you need any sort of structured data access to the contents of the dictionary, then you're going to have to get more involved.
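For the database side, here is a sketch of the BLOB round trip, using the standard-library sqlite3 module as a stand-in for a MySQL driver (MySQLdb, mysql-connector, ...); the table name and data are made up, but the DB-API calls have the same shape:

```python
import pickle
import sqlite3

# Hypothetical geocoder data to cache.
response = {"lat": 52.52, "lng": 13.405, "address": "Berlin"}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE geocode_cache (query TEXT PRIMARY KEY, response BLOB)")

# Store: pickle to bytes, insert into the BLOB column.
pickled = pickle.dumps(response)
conn.execute("INSERT INTO geocode_cache VALUES (?, ?)", ("Berlin", pickled))

# Restore: fetch the bytes back and unpickle.
row = conn.execute(
    "SELECT response FROM geocode_cache WHERE query = ?", ("Berlin",)
).fetchone()
restored = pickle.loads(row[0])
print(restored == response)  # True
```

The database only sees opaque bytes, which is exactly the "DB knows nothing about the dictionary" trade-off described above.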
Another gotcha might be what you intend to put in the dict. Python's pickle serialization is quite intelligent and can handle most cases without any need for adding custom support. However, when it doesn't work, it can be very difficult to understand what's gone wrong. So if you can, restrict the contents of the dict to Python's built-in types. If you start adding instances of custom classes, keep them to simple custom classes that don't do any funny stuff with attribute storage or access. And beware of adding instances of classes or types from add-ons. In general, if you start running into hard-to-understand problems with the pickling or unpickling, look at the non-built-in types in the dict.
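To make the gotcha concrete: built-in types round-trip cleanly, while a dict containing an unpicklable object (here an open file handle) fails immediately. The dict contents are made up:

```python
import os
import pickle

# Built-in types round-trip without trouble.
simple = {"name": "story", "ids": [1, 2, 3]}
assert pickle.loads(pickle.dumps(simple)) == simple

# An open file handle cannot be serialized; pickle raises TypeError.
tricky = {"name": "story", "log": open(os.devnull, "w")}
try:
    pickle.dumps(tricky)
except TypeError as e:
    print("failed:", e)
finally:
    tricky["log"].close()
```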
If speed is really important: I just ran a test of loading a large Python dictionary (35 MB) from a pickle vs. SELECTing from a MySQL table with all keys and values stored in rows:
Pickle Method:
import time, pickle

t1 = time.perf_counter()  # time.clock() was removed in Python 3.8
with open('story_data.pickle', 'rb') as f:
    s = pickle.load(f)
print(time.perf_counter() - t1)
MySQL Method:
import time
import database as db  # the author's own MySQL wrapper module

t1 = time.perf_counter()
data, msg = db.mysql("""SELECT id, story FROM story_data;""")
data_dict = dict((int(x), y.split(',')) for x, y in data)
print(time.perf_counter() - t1)
Output:
pickle method: 32.0785171704
mysql method: 3.25916336479
If a ten-fold speedup matters to you, the structure of the database probably doesn't. Note that I am splitting all the comma-separated data into lists as the values for 36,000 keys, and it still takes only 3 seconds. So I've switched away from using pickle for large data sets: the rest of the 400-line program I was using took about 3 seconds, and the pickle loading alone took 32 seconds.
Also note:
cPickle works just like pickle and is over 50% faster. (This applies to Python 2; in Python 3 there is no separate cPickle module, and the standard pickle uses the C implementation automatically.)
Don't try to pickle a class full of dictionaries and save it in MySQL: it doesn't reconstitute itself correctly, at least it didn't for me.
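One common reason instances fail to reconstitute is that pickle stores only a reference to the class (its module and name), not the class's code, so the same class must be importable under the same name when you unpickle. A sketch with a made-up class:

```python
import pickle

class Story:
    """Toy class; pickle records a reference like 'module.Story', not the code."""
    def __init__(self, parts):
        self.parts = parts

blob = pickle.dumps(Story(["a", "b"]))

# Unpickling works here because Story is importable in this process.
# Loading the same bytes where the class has moved, been renamed, or
# doesn't exist raises AttributeError / ModuleNotFoundError instead.
restored = pickle.loads(blob)
print(restored.parts)  # ['a', 'b']
```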