Fastest method to dump a NumPy array into a string

I need to organize a data file with chunks of named data. The data are NumPy arrays. But I don't want to use the numpy.save or numpy.savez functions, because in some cases the data have to be sent to a server over a pipe or another interface. So I want to dump the NumPy array into memory, zip it, and then send it to the server.

I've tried simple pickle, like this:

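try:
    import cPickle as pkl  # Python 2: C-accelerated pickle
except ImportError:
    import pickle as pkl   # Python 3: pickle already uses the C implementation
import zlib
import numpy as np

def send_to_db(data, compress=5):
    # send() is my own transport function (pipe, socket, ...)
    send(zlib.compress(pkl.dumps(data), compress))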

...but this is an extremely slow process.

Even with compression level 0 (no compression), the process is very slow, purely because of the pickling.

Is there any way to dump a NumPy array into a string without pickle? I know that NumPy gives access to the raw buffer via numpy.getbuffer, but it isn't obvious to me how to use this dumped buffer to obtain an array back.
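For reference, the raw-buffer route the question asks about looks roughly like the sketch below. It assumes Python 3 and uses ndarray.tobytes / numpy.frombuffer in place of numpy.getbuffer. The key point is that the raw bytes carry no dtype or shape, so both must travel alongside the payload; the helper names to_raw and from_raw are hypothetical.

```python
import zlib

import numpy as np


def to_raw(arr, level=5):
    """Hypothetical helper: compress an array's raw bytes, return metadata too."""
    # tobytes() copies the data in C order; no pickling is involved.
    payload = zlib.compress(arr.tobytes(), level)
    # dtype and shape are NOT part of the bytes, so return them separately.
    return payload, arr.dtype.str, arr.shape


def from_raw(payload, dtype, shape):
    """Hypothetical helper: rebuild the array from to_raw()'s output."""
    # frombuffer gives a read-only 1-D view over the decompressed bytes.
    flat = np.frombuffer(zlib.decompress(payload), dtype=dtype)
    # Reshape (C order, matching tobytes) and copy to get a writable array.
    return flat.reshape(shape).copy()


arr = np.random.rand(100, 100)
payload, dtype, shape = to_raw(arr)
restored = from_raw(payload, dtype, shape)
print(np.array_equal(arr, restored))  # True
```

How the dtype string and shape reach the server (a small header, a separate message, etc.) is left open here; that bookkeeping is exactly what the .npy format handles for you.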

asked May 11 '17 21:05 by rth


2 Answers

You should definitely use numpy.save; you can still do it in memory:

>>> import io
>>> import numpy as np
>>> import zlib
>>> f = io.BytesIO()
>>> arr = np.random.rand(100, 100)
>>> np.save(f, arr)
>>> compressed = zlib.compress(f.getbuffer())

And to decompress, reverse the process:

>>> np.load(io.BytesIO(zlib.decompress(compressed)))
array([[ 0.80881898,  0.50553303,  0.03859795, ...,  0.05850996,
         0.9174782 ,  0.48671767],
       [ 0.79715979,  0.81465744,  0.93529834, ...,  0.53577085,
         0.59098735,  0.22716425],
       [ 0.49570713,  0.09599001,  0.74023709, ...,  0.85172897,
         0.05066641,  0.10364143],
       ...,
       [ 0.89720137,  0.60616688,  0.62966729, ...,  0.6206728 ,
         0.96160519,  0.69746633],
       [ 0.59276237,  0.71586014,  0.35959289, ...,  0.46977027,
         0.46586237,  0.10949621],
       [ 0.8075795 ,  0.70107856,  0.81389246, ...,  0.92068768,
         0.38013495,  0.21489793]])
>>>

Which, as you can see, matches what we saved earlier:

>>> arr
array([[ 0.80881898,  0.50553303,  0.03859795, ...,  0.05850996,
         0.9174782 ,  0.48671767],
       [ 0.79715979,  0.81465744,  0.93529834, ...,  0.53577085,
         0.59098735,  0.22716425],
       [ 0.49570713,  0.09599001,  0.74023709, ...,  0.85172897,
         0.05066641,  0.10364143],
       ...,
       [ 0.89720137,  0.60616688,  0.62966729, ...,  0.6206728 ,
         0.96160519,  0.69746633],
       [ 0.59276237,  0.71586014,  0.35959289, ...,  0.46977027,
         0.46586237,  0.10949621],
       [ 0.8075795 ,  0.70107856,  0.81389246, ...,  0.92068768,
         0.38013495,  0.21489793]])
>>>
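The same idea wrapped into reusable functions, as a minimal sketch: dump_array/load_array and dump_named/load_named are hypothetical names. The second pair uses numpy.savez, which also accepts a file-like object, to pack several named arrays into one blob, matching the "chunks of named data" in the question. It assumes plain numeric arrays, since np.load refuses object arrays by default (allow_pickle=False in recent NumPy).

```python
import io
import zlib

import numpy as np


def dump_array(arr, level=5):
    """Hypothetical helper: serialize one array with np.save, then compress."""
    buf = io.BytesIO()
    np.save(buf, arr)  # writes the .npy header (dtype, shape, order) + data
    return zlib.compress(buf.getvalue(), level)


def load_array(blob):
    """Hypothetical helper: inverse of dump_array."""
    return np.load(io.BytesIO(zlib.decompress(blob)))


def dump_named(arrays, level=5):
    """Hypothetical helper: pack a dict of named arrays into one .npz blob."""
    buf = io.BytesIO()
    np.savez(buf, **arrays)  # each dict key becomes an entry in the archive
    return zlib.compress(buf.getvalue(), level)


def load_named(blob):
    """Hypothetical helper: inverse of dump_named, returns a plain dict."""
    with np.load(io.BytesIO(zlib.decompress(blob))) as npz:
        return {name: npz[name] for name in npz.files}


arr = np.random.rand(100, 100)
print(np.array_equal(load_array(dump_array(arr)), arr))  # True
chunks = load_named(dump_named({"x": arr, "ids": np.arange(5)}))
print(sorted(chunks))  # ['ids', 'x']
```

numpy.savez_compressed could replace numpy.savez plus zlib.compress in the named variant, since it already deflates each entry inside the zip archive.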
answered Nov 02 '22 19:11 by juanpa.arrivillaga


In Python 2, the default pickle protocol (protocol 0) produces pure ASCII output. To get (much) better performance, use the latest protocol available. Protocols 2 and above are binary and, if memory serves me right, allow NumPy arrays to dump their buffer directly into the stream without additional operations.

To select the protocol, pass it as the optional second argument when pickling (there is no need to specify it when unpickling), for instance pkl.dumps(data, 2). To pick the latest possible protocol, use pkl.dumps(data, -1), which is equivalent to pkl.dumps(data, pkl.HIGHEST_PROTOCOL).
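A minimal sketch of the question's pipeline with only the protocol changed, assuming Python 3 (where the plain pickle module already uses the C accelerator, so the cPickle fallback is unnecessary). dump_pickled and load_pickled are hypothetical names; as always with pickle, only unpickle data from trusted senders.

```python
import pickle
import zlib

import numpy as np


def dump_pickled(data, level=5):
    # Protocol -1 (same as pickle.HIGHEST_PROTOCOL) selects the newest
    # binary format supported by this interpreter.
    return zlib.compress(pickle.dumps(data, -1), level)


def load_pickled(blob):
    # The protocol is detected automatically when unpickling.
    return pickle.loads(zlib.decompress(blob))


arr = np.random.rand(100, 100)
restored = load_pickled(dump_pickled({"arr": arr}))
print(np.array_equal(restored["arr"], arr))  # True
```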

Note that if the sender and receiver run different Python versions, you need to choose a protocol that the oldest of them supports. See the pickle documentation for details on the different protocol versions.

answered Nov 02 '22 18:11 by ilmarinen