Python list serialization - fastest method

Tags: python, serialization, caching

I need to load (deserialize) a precomputed list of integers from a file in a Python script (into a Python list). The list is large (up to millions of items), and I can choose the format I store it in, as long as loading is as fast as possible.

Which is the fastest method, and why?

1. Using import on a .py file that just contains the list assigned to a variable
2. Using cPickle's load
3. Some other method (perhaps numpy?)

Also, how can one benchmark such things reliably?

Addendum: measuring this reliably is difficult, because import is cached, so it can't be executed multiple times in a test. Loading with pickle also gets faster after the first time, probably because of page caching by the OS. Loading 1 million numbers with cPickle takes 1.1 sec on the first run, and 0.2 sec on subsequent executions of the script.

Intuitively I feel cPickle should be faster, but I'd appreciate numbers (this is quite a challenge to measure, I think).

And yes, it's important for me that this performs quickly.

Thanks


asked Feb 17 '09 13:02

Eli Bendersky

1 Answer

I would guess cPickle will be fastest if you really need the thing in a list.
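For the plain-list case, a minimal sketch of the pickle route might look like the following. It assumes Python 3, where the `pickle` module uses the C implementation that Python 2 exposed as cPickle; the helper names `dump_ints` and `load_ints` and the file name are made up for illustration. In Python 2 the default is the text-based protocol 0, which is generally slower to load than the binary protocols, so the sketch asks for the highest protocol explicitly.

```python
import os
import pickle
import tempfile

def dump_ints(values, filename):
    # Serialize the list with the newest binary protocol available.
    with open(filename, "wb") as f:
        pickle.dump(values, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_ints(filename):
    # Returns a regular Python list, exactly as it was dumped.
    with open(filename, "rb") as f:
        return pickle.load(f)

if __name__ == "__main__":
    # Hypothetical file name, placed in the temp directory for the demo.
    path = os.path.join(tempfile.gettempdir(), "ints_demo.pickle")
    dump_ints(list(range(1000)), path)
    loaded = load_ints(path)
    print(type(loaded).__name__, len(loaded))
    os.remove(path)
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.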

If you can use an array instead (array.array from the standard library, a compact sequence type that stores machine integers), I timed this at a quarter of a second for 1 million integers:

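import os
from array import array
from datetime import datetime

def WriteInts(theArray, filename):
    # Dump the array's raw machine values to a binary file.
    with open(filename, "wb") as f:
        theArray.tofile(f)

def ReadInts(filename):
    start = datetime.now()
    theArray = array('i')
    # Work out the item count from the file size, rather than asking
    # for a huge count and relying on EOFError to stop the read.
    count = os.path.getsize(filename) // theArray.itemsize
    with open(filename, "rb") as f:
        theArray.fromfile(f, count)
    print("Read %d ints in %s" % (len(theArray), datetime.now() - start))
    return theArray

if __name__ == "__main__":
    # Build a million ints, write them out, then time reading them back.
    a = array('i', range(0, 1000000))
    filename = "a_million_ints.dat"
    WriteInts(a, filename)
    r = ReadInts(filename)
    print("The 5th element is %d" % r[4])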
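On the benchmarking question, one hedged approach is to time each loader several times in the same process with `timeit.repeat` and report the minimum. That mostly measures the warm-cache case described in the addendum; the first, cold-cache run would have to be measured separately, for example in a fresh process. The sketch below assumes Python 3. The helper names, the file names, and the use of `tolist()` to get a real list back from the array are illustrative choices, not part of the original answer.

```python
import os
import pickle
import tempfile
import timeit
from array import array

def write_files(n, directory):
    """Write the same n integers as a pickle and as a raw array dump."""
    values = list(range(n))
    pickle_path = os.path.join(directory, "ints.pickle")
    array_path = os.path.join(directory, "ints.dat")
    with open(pickle_path, "wb") as f:
        pickle.dump(values, f, protocol=pickle.HIGHEST_PROTOCOL)
    with open(array_path, "wb") as f:
        array("i", values).tofile(f)
    return pickle_path, array_path

def load_pickle(path):
    with open(path, "rb") as f:
        return pickle.load(f)

def load_array_as_list(path):
    a = array("i")
    count = os.path.getsize(path) // a.itemsize
    with open(path, "rb") as f:
        a.fromfile(f, count)
    return a.tolist()  # convert only if a real list is required

def best_time(func, path, repeat=5):
    """Minimum wall-clock time over `repeat` single calls (warm cache)."""
    return min(timeit.repeat(lambda: func(path), number=1, repeat=repeat))

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pickle_path, array_path = write_files(1000000, tmp)
        for name, func, path in [
            ("pickle.load", load_pickle, pickle_path),
            ("array.fromfile + tolist", load_array_as_list, array_path),
        ]:
            print("%-25s %.4f s" % (name, best_time(func, path)))
```

The timings depend on the machine, the disk, and the Python version, so no figures are quoted here. Taking the minimum rather than the mean reduces noise from other processes, and keeping both loaders in one script makes sure they are compared under the same cache conditions.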


answered Oct 23 '22 14:10

Carlos A. Ibarra
