
What's the fastest way to save/load a large list in Python 2.7?

I apologize if this has already been asked; I couldn't find an answer to this exact question when I searched.

More specifically, I'm testing out methods for simulating something, and I need to compare the result from each method I test out to an exact solution. I have a Python script that produces a list of values representing the exact solution, and I don't want to re-compute it every time I run a new simulation. Thus, I want to save it somewhere and just load the solution instead of re-computing it every time I want to see how good my simulation results are.

I also don't need the saved file to be human-readable. I just need to be able to load it in Python.

nukeguy asked May 05 '15 15:05




1 Answer

In Python 2.7, saving with np.save and reading back with np.load followed by tolist is significantly faster than pickle or cPickle:

In [77]: outfile = open("test.pkl","w")   
In [78]: l = list(range(1000000))   

In [79]:  timeit np.save("test",l)
10 loops, best of 3: 122 ms per loop

In [80]:  timeit np.load("test.npy").tolist()
10 loops, best of 3: 20.9 ms per loop

In [81]: timeit pickle.dump(l, outfile)
1 loops, best of 3: 1.86 s per loop

In [82]: outfile = open("test.pkl","r")

In [83]: timeit pickle.load(outfile)
1 loops, best of 3: 1.88 s per loop

In [84]: %%timeit
cPickle.dump(l,outfile)
   ....: 
1 loops, best of 3: 273 ms per loop

In [85]: outfile = open("test.pkl","r")

In [72]: %%timeit
cPickle.load(outfile)
   ....: 
1 loops, best of 3: 539 ms per loop
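
As a rough sketch of the NumPy approach timed above, the save and load steps can be wrapped in two small helpers. The names save_list and load_list and the file name exact_solution.npy are made up for illustration, and the sketch assumes the list holds plain numbers, since np.save converts it to an array before writing.

```python
import os
import tempfile

import numpy as np


def save_list(path, values):
    """Write a flat list of numbers to `path` in NumPy's binary .npy format."""
    # np.save appends ".npy" only if the path does not already end with it.
    np.save(path, np.asarray(values))


def load_list(path):
    """Read a .npy file written by save_list and return a plain Python list."""
    return np.load(path).tolist()


# Round trip through a temporary directory (hypothetical file name).
tmp_dir = tempfile.mkdtemp()
solution_path = os.path.join(tmp_dir, "exact_solution.npy")

exact = [0.5 * i for i in range(1000)]
save_list(solution_path, exact)
restored = load_list(solution_path)
```

Lists with mixed types or ragged nesting do not map onto a regular array, so pickle is probably the safer choice for those.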

In Python 3, NumPy is far more efficient at loading if you keep the data as a NumPy array instead of converting it to a list:

In [24]: %%timeit                  
out = open("test.pkl","wb")
pickle.dump(l, out)
   ....: 
10 loops, best of 3: 27.3 ms per loop

In [25]: %%timeit
out = open("test.pkl","rb")
pickle.load(out)
   ....: 
10 loops, best of 3: 52.2 ms per loop

In [26]: timeit np.save("test",l)
10 loops, best of 3: 115 ms per loop

In [27]: timeit np.load("test.npy")
100 loops, best of 3: 2.35 ms per loop
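
If the comparison against the exact solution can work on the array directly, a compute-once cache might look like the sketch below. compute_exact_solution is a hypothetical stand-in for the real script, and load_or_compute and the file name are likewise assumptions for illustration.

```python
import os
import tempfile

import numpy as np


def compute_exact_solution(n):
    """Hypothetical stand-in for the expensive exact computation."""
    x = np.linspace(0.0, 1.0, n)
    return np.sin(2.0 * np.pi * x)


def load_or_compute(path, n):
    """Return the cached array at `path`, computing and saving it if missing."""
    if os.path.exists(path):
        return np.load(path)
    solution = compute_exact_solution(n)
    np.save(path, solution)
    return solution


cache_path = os.path.join(tempfile.mkdtemp(), "exact_solution.npy")

first = load_or_compute(cache_path, 1000)   # computes and writes the file
second = load_or_compute(cache_path, 1000)  # reads the file back

# Compare a (hypothetical) simulation result against the exact solution.
simulated = second + 1e-3
max_abs_error = np.max(np.abs(simulated - second))
```

Keeping everything as arrays skips the tolist conversion on every run, which is where most of the load time goes in the timings above.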

If you need a list, calling np.load followed by tolist is still faster than pickle.load:

In [29]: timeit np.load("test.npy").tolist()
10 loops, best of 3: 37 ms per loop
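
Timings like these depend on the machine and on the Python and NumPy versions, so it may be worth re-running them locally. Below is a minimal sketch using the standard timeit module instead of the IPython magic. The function names, the temporary file paths and the smaller list size are choices made for this example, not part of the measurements above.

```python
import os
import pickle
import tempfile
import timeit

import numpy as np

data = list(range(100000))  # smaller than the 1,000,000 items timed above
tmp_dir = tempfile.mkdtemp()
pkl_path = os.path.join(tmp_dir, "test.pkl")
npy_path = os.path.join(tmp_dir, "test.npy")


def pickle_dump():
    with open(pkl_path, "wb") as f:
        pickle.dump(data, f)


def pickle_load():
    with open(pkl_path, "rb") as f:
        return pickle.load(f)


def numpy_save():
    np.save(npy_path, data)


def numpy_load_list():
    return np.load(npy_path).tolist()


# Write both files once so the load functions have something to read.
pickle_dump()
numpy_save()

results = {}
for func in (pickle_dump, pickle_load, numpy_save, numpy_load_list):
    # Best of 3 repeats, 5 calls each, reported as seconds per call.
    best = min(timeit.repeat(func, number=5, repeat=3)) / 5
    results[func.__name__] = best
    print("%-16s %.2f ms per call" % (func.__name__, best * 1e3))
```

Each function reopens its file on every call, so repeated runs do not hit end-of-file the way a single shared file handle would.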
Padraic Cunningham answered Oct 20 '22 10:10