I can convert a numpy ndarray to bytes using myndarray.tobytes().
Now how can I get it back to an ndarray?
Using the example from the .tobytes() method docs:
>>> x = np.array([[0, 1], [2, 3]])
>>> bytes = x.tobytes()
>>> bytes
b'\x00\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00'
>>> np.some_magic_function_here(bytes)
array([[0, 1], [2, 3]])
To deserialize the bytes you need np.frombuffer(). .tobytes() serializes the array into bytes, and np.frombuffer() deserializes them.
Bear in mind that the serialized bytes carry neither the shape nor the dtype. When deserializing, you must pass the original dtype to np.frombuffer() and then reshape the result back to the original shape.
Below is a complete example:
import numpy as np
x = np.array([[0, 1], [2, 3]], np.int8)
x_bytes = x.tobytes()
# x_bytes is raw data: it contains no information about the shape (or dtype) of x
# let's make sure: 4 values of dtype int8 (one byte per item), so the length should be 4 bytes
assert len(x_bytes) == 4, "Ha??? Weird machine..."
# the dtype must be given explicitly, because it is not stored in the bytes
deserialized_bytes = np.frombuffer(x_bytes, dtype=np.int8)
# restore the original shape, which is not stored in the bytes either
deserialized_x = deserialized_bytes.reshape((2, 2))
assert np.array_equal(x, deserialized_x), "Deserialization failed..."
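To illustrate why the dtype has to be supplied, here is a small sketch (the little-endian int16 dtype "<i2" is an arbitrary choice so the byte layout is the same on every machine). It also shows that the array returned by np.frombuffer() views the immutable bytes and is therefore read-only.

```python
import numpy as np

# Explicit little-endian int16 so the byte layout does not depend on the machine
x = np.array([[0, 1], [2, 3]], dtype="<i2")
x_bytes = x.tobytes()
print(len(x_bytes))  # 8: four items of two bytes each

# Correct dtype and shape: the round trip succeeds
restored = np.frombuffer(x_bytes, dtype="<i2").reshape(2, 2)
print(np.array_equal(x, restored))  # True

# Wrong dtype: the same 8 bytes are read as eight int8 values
wrong = np.frombuffer(x_bytes, dtype=np.int8)
print(wrong)  # [0 0 1 0 2 0 3 0]

# np.frombuffer views the immutable bytes object, so the result is read-only
print(restored.flags.writeable)  # False
restored_copy = restored.copy()  # independent, writable array
restored_copy[0, 0] = 10
```

Nothing in the bytes tells the reader which interpretation is right, so the dtype (and byte order) must travel alongside the data by some other means.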
After your edit it seems you are going into the wrong direction!
You can't use ndarray.tobytes() to store a complete array, with all information such as shape and dtype, when it has to be reconstructed from these bytes alone! It only saves the raw data (the cell values), flattened in C or Fortran order.
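A minimal sketch of the flattening order mentioned above, using a uint8 array so each value is exactly one byte. ndarray.tobytes() accepts an order argument ("C" by default, or "F"), and whoever reads the bytes back has to reshape with the same order.

```python
import numpy as np

x = np.array([[0, 1], [2, 3]], dtype=np.uint8)

c_bytes = x.tobytes()           # default order="C": row by row
f_bytes = x.tobytes(order="F")  # column by column
print(c_bytes)  # b'\x00\x01\x02\x03'
print(f_bytes)  # b'\x00\x02\x01\x03'

# Reading back with the matching order restores x
from_c = np.frombuffer(c_bytes, dtype=np.uint8).reshape((2, 2))
from_f = np.frombuffer(f_bytes, dtype=np.uint8).reshape((2, 2), order="F")

# Mixing the orders silently yields the transpose instead of an error
mixed_up = np.frombuffer(f_bytes, dtype=np.uint8).reshape((2, 2))
```

Because a wrong order produces a valid array of the right shape, this kind of mistake is easy to miss.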
We don't know your exact task, but you will need some form of serialization. There are many approaches; the easiest is based on Python's pickle (example in Python 3):
import pickle
import numpy as np
x = np.array([[0, 1], [2, 3]])
print(x)
x_as_bytes = pickle.dumps(x)
print(x_as_bytes)
print(type(x_as_bytes))
y = pickle.loads(x_as_bytes)
print(y)
Output:
[[0 1]
 [2 3]]
b'\x80\x03cnumpy.core.multiarray\n_reconstruct\nq\x00cnumpy\nndarray\nq\x01K\x00\x85q\x02C\x01bq\x03\x87q\x04Rq\x05(K\x01K\x02K\x02\x86q\x06cnumpy\ndtype\nq\x07X\x02\x00\x00\x00i8q\x08K\x00K\x01\x87q\tRq\n(K\x03X\x01\x00\x00\x00<q\x0bNNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tq\x0cb\x89C \x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00q\rtq\x0eb.'
<class 'bytes'>
[[0 1]
 [2 3]]
A better alternative for large arrays would be joblib, whose pickling is specialized for large NumPy arrays. joblib's dump and load functions work with file objects, so they can also be used in memory with byte strings via Python's io.BytesIO.
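The same in-memory BytesIO idea also works with NumPy's own .npy format through np.save and np.load. This is a different technique from the joblib suggestion above, shown as a sketch because it needs nothing beyond NumPy; the helper names array_to_bytes and bytes_to_array are hypothetical. Unlike tobytes(), the .npy header stores the shape, dtype and order, so the bytes alone are enough to rebuild the array.

```python
import io

import numpy as np


def array_to_bytes(arr: np.ndarray) -> bytes:
    """Hypothetical helper: serialize an array (data + shape + dtype) to bytes."""
    buf = io.BytesIO()
    np.save(buf, arr, allow_pickle=False)  # writes the .npy header and the raw data
    return buf.getvalue()


def bytes_to_array(data: bytes) -> np.ndarray:
    """Hypothetical helper: rebuild the array from bytes made by array_to_bytes."""
    return np.load(io.BytesIO(data), allow_pickle=False)


x = np.array([[0, 1], [2, 3]], dtype=np.int16)
data = array_to_bytes(x)
y = bytes_to_array(data)
print(y.shape, y.dtype)  # (2, 2) int16
```

Passing allow_pickle=False restricts this to plain (non-object) arrays, which avoids unpickling untrusted data.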
If you know the dimensions and data type of the array you are recreating ahead of time, pass the bytes directly as the buffer:
numpy.ndarray(<dimensions>, <dataType>, <bytes (aka buffer)>)
import numpy
x = numpy.array([[1.0,1.1,1.2,1.3],[2.0,2.1,2.2,2.3],[3.0,3.1,3.2,3.3]],numpy.float64)
#array([[1. , 1.1, 1.2, 1.3],
# [2. , 2.1, 2.2, 2.3],
# [3. , 3.1, 3.2, 3.3]])
xBytes = x.tobytes()
#b'\x00\x00\x00\x00\x00\x00\xf0?\x9a\x99\x99\x99\x99\x99\xf1?333333\xf3?\xcd\xcc\xcc\xcc\xcc\xcc\xf4?\x00\x00\x00\x00\x00\x00\x00@\xcd\xcc\xcc\xcc\xcc\xcc\x00@\x9a\x99\x99\x99\x99\x99\x01@ffffff\x02@\x00\x00\x00\x00\x00\x00\x08@\xcd\xcc\xcc\xcc\xcc\xcc\x08@\x9a\x99\x99\x99\x99\x99\t@ffffff\n@'
newX = numpy.ndarray((3,4),numpy.float64,xBytes)
#array([[1. , 1.1, 1.2, 1.3],
# [2. , 2.1, 2.2, 2.3],
# [3. , 3.1, 3.2, 3.3]])
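A sketch of the same numpy.ndarray call with keyword arguments, plus two details worth knowing: because a bytes object is immutable, the resulting array is read-only (use .copy() if you need to modify it), and the offset parameter (in bytes) lets you start reading partway into the buffer. Skipping the first row here is just an illustrative choice.

```python
import numpy as np

x = np.array([[1.0, 1.1, 1.2, 1.3],
              [2.0, 2.1, 2.2, 2.3],
              [3.0, 3.1, 3.2, 3.3]], dtype=np.float64)
x_bytes = x.tobytes()

# Same call as above, written with keyword arguments
new_x = np.ndarray(shape=(3, 4), dtype=np.float64, buffer=x_bytes)
print(new_x.flags.writeable)  # False: the array views immutable bytes

# Make a writable, independent copy before modifying
writable_x = new_x.copy()
writable_x[0, 0] = 42.0

# offset is in bytes: skip the first row (4 values * 8 bytes each)
row_bytes = 4 * np.dtype(np.float64).itemsize
tail = np.ndarray(shape=(2, 4), dtype=np.float64, buffer=x_bytes, offset=row_bytes)
```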
Another approach: if you have stored your data as records of bytes rather than as an entire ndarray, and your selection of data varies from array to array, you can accumulate the pre-array data in a Python bytearray. Once it reaches the desired size, you already know the required dimensions and can pass them, together with the data type, to numpy.ndarray with the bytearray as the buffer.
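A minimal sketch of that accumulate-then-wrap idea, assuming fixed-size records of 4 float64 values each (N_COLS, DTYPE and the records list are hypothetical stand-ins for your own data source). Since a bytearray is mutable, the resulting array is writable and shares memory with the bytearray; call .copy() if you want an independent array.

```python
import numpy as np

N_COLS = 4          # assumed: each record holds 4 values
DTYPE = np.float64  # assumed: values are float64

# Hypothetical stand-in for records arriving one at a time
records = [np.arange(N_COLS, dtype=DTYPE) + i for i in range(3)]

buf = bytearray()
for rec in records:
    buf += rec.tobytes()  # append the raw bytes of each record

# The dimensions follow from the accumulated size; any incomplete
# trailing record would simply be ignored by the integer division
row_size = N_COLS * np.dtype(DTYPE).itemsize
n_rows = len(buf) // row_size
arr = np.ndarray((n_rows, N_COLS), DTYPE, buf)

independent = arr.copy()  # detach from buf if you keep reusing it
```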