How can I serialize a numpy array while preserving matrix dimensions?

numpy.array.tostring doesn't seem to preserve information about matrix dimensions (see this question), requiring the user to issue a call to numpy.array.reshape.
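To illustrate the problem (a minimal sketch; in recent NumPy versions tostring is a deprecated alias for tobytes):

import numpy

a = numpy.array([[1, 2], [3, 4]])
raw = a.tostring()                            # raw bytes; the (2, 2) shape is lost
flat = numpy.frombuffer(raw, dtype=a.dtype)   # comes back as a flat 1-D array
restored = flat.reshape(a.shape)              # caller must track the shape separately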

Is there a way to serialize a numpy array to JSON format while preserving this information?

Note: The arrays may contain ints, floats or bools. It's reasonable to expect a transposed array.

Note 2: this is being done with the intent of passing the numpy array through a Storm topology using streamparse, in case such information ends up being relevant.

asked Jun 07 '15 by Louis Thibault


2 Answers

pickle.dumps or numpy.save encode all the information needed to reconstruct an arbitrary NumPy array, even in the presence of endianness issues, non-contiguous arrays, or weird structured dtypes. Endianness issues are probably the most important; you don't want array([1]) to suddenly become array([16777216]) because you loaded your array on a big-endian machine. pickle is probably the more convenient option, though save has its own benefits, given in the npy format rationale.

I'm giving options for serializing to JSON or a bytestring, because the original questioner needed JSON-serializable output, but most people coming here probably don't.

The pickle way:

import pickle
import json
import numpy

a = numpy.arange(12).reshape(3, 4)  # some NumPy array

# Bytestring option
serialized = pickle.dumps(a)
deserialized_a = pickle.loads(serialized)

# JSON option
# latin-1 maps byte n to unicode code point n
serialized_as_json = json.dumps(pickle.dumps(a).decode('latin-1'))
deserialized_from_json = pickle.loads(json.loads(serialized_as_json).encode('latin-1'))

numpy.save uses a binary format, and it needs to write to a file, but you can get around that with io.BytesIO:

import io
import json
import numpy

a = numpy.arange(12).reshape(3, 4)  # any NumPy array

memfile = io.BytesIO()
numpy.save(memfile, a)

serialized = memfile.getvalue()
# latin-1 maps byte n to unicode code point n
serialized_as_json = json.dumps(serialized.decode('latin-1'))

And to deserialize:

memfile = io.BytesIO()

# If you're deserializing from a bytestring:
memfile.write(serialized)
# Or if you're deserializing from JSON:
# memfile.write(json.loads(serialized_as_json).encode('latin-1'))
memfile.seek(0)
a = numpy.load(memfile)
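Putting the two snippets together, a quick round-trip check (a minimal sketch built from the code above; the bool array is just an arbitrary example):

import io
import json
import numpy

a = numpy.array([[True, False], [False, True]])

# serialize to a JSON string
memfile = io.BytesIO()
numpy.save(memfile, a)
serialized_as_json = json.dumps(memfile.getvalue().decode('latin-1'))

# deserialize back to an identical array
memfile = io.BytesIO()
memfile.write(json.loads(serialized_as_json).encode('latin-1'))
memfile.seek(0)
assert numpy.array_equal(a, numpy.load(memfile))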
answered Sep 20 '22 by user2357112 supports Monica


EDIT: As noted in the comments on the question, this solution deals with "normal" numpy arrays (floats, ints, bools, ...) and not with multi-type structured arrays.

Solution for serializing a numpy array of any dimension and data type

As far as I know, you cannot simply serialize a numpy array of arbitrary data type and dimension directly, but you can store its data type, shape and raw data in a list representation and then serialize that with JSON.

Imports needed:

import json
import base64
import numpy

For encoding you could use (nparray is some numpy array of any data type and any dimensionality):

# b64encode returns bytes, which json can't serialize directly, hence .decode('ascii')
json.dumps([str(nparray.dtype),
            base64.b64encode(nparray).decode('ascii'),
            nparray.shape])

After this you get a JSON dump (string) of your data, containing a list representation of its data type and shape as well as the arrays data/contents base64-encoded.

And for decoding this does the work (encStr is the encoded JSON string, loaded from somewhere):

# get the encoded json dump
enc = json.loads(encStr)

# build the numpy data type
dataType = numpy.dtype(enc[0])

# decode the base64 encoded numpy array data and create a new numpy array with this data & type
dataArray = numpy.frombuffer(base64.b64decode(enc[1]), dataType)

# if the array had more than one dimension, it has to be reshaped
# (reshape returns a new array rather than modifying dataArray in place)
if len(enc) > 2:
    dataArray = dataArray.reshape(enc[2])   # the reshaped numpy array containing several data sets
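As a quick sanity check of the round trip (a sketch using an arbitrary example array):

import json
import base64
import numpy

nparray = numpy.array([[1.5, 2.0], [3.0, 4.5]])

encStr = json.dumps([str(nparray.dtype),
                     base64.b64encode(nparray).decode('ascii'),
                     nparray.shape])

enc = json.loads(encStr)
restored = numpy.frombuffer(base64.b64decode(enc[1]),
                            numpy.dtype(enc[0])).reshape(enc[2])

assert numpy.array_equal(nparray, restored)
assert restored.dtype == nparray.dtype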

JSON dumps are efficient and widely compatible, but naively dumping a numpy array to JSON leads to unexpected results if you want to store and load arrays of arbitrary type and dimension.

This solution stores and loads numpy arrays regardless of type or dimension and restores them correctly (data type, shape, ...).

I tried several solutions myself months ago and this was the only efficient, versatile solution I came across.

answered Sep 24 '22 by daniel451