I am trying to convert a numpy.ndarray to base64 and then convert it back. Is the base64 library the way to go? The very simple code below does not even work as expected. What am I missing?
import base64
import numpy as np
x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)
print(x)
print(type(x))
encoded = base64.b64encode(x)
decoded = base64.b64decode(encoded)
print(decoded)
print(type(decoded))
Is there a way to get back the original variable?
The general question is: can I convert "any" object to a binary string and then convert it back to the original type?
I could maybe use pickle, but I would need a compressed format (in memory, not in a file), something like:
x_compressed = zipped(pickle.dumps(x))
I am not sure what you are trying to accomplish, but you can base64-encode any object that has a bytes representation. In the example you gave, you are encoding a numpy array to base64.
This works because a numpy array has a bytes form. You can reach it either by wrapping bytes() around the array or by using the .tobytes() method.
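import base64
import numpy as np
# dtype is set explicitly so the bytes shown below are the same on every platform
x = np.array([1, 2, 3], dtype=np.int32)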
bytes(x)
# returns:
b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00'
x.tobytes()
# returns:
b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00'
Since we have a bytes representation of the array, you can pass it to the base64 encoder. Note that base64.b64encode accepts any bytes-like object, and a numpy array qualifies, so passing the array directly gives the same result as passing x.tobytes(), as in the following example:
base64.b64encode(x)
# returns
b'AQAAAAIAAAADAAAA'
base64.b64encode(x.tobytes())
# returns
b'AQAAAAIAAAADAAAA'
The byte array is nothing special. It is just a sequence of numbers! That's it. The reason you did not recover the numpy array is that the encoding-decoding process still just leaves you with the result of x.tobytes(), not x itself.
To get back the original object, you need an interface that can read a sequence of bytes and return an object of some sort. Luckily, numpy can do just that via the frombuffer function. However, you will need to tell numpy what TYPE of array it is reading from the bytes.
In other words, you could have an int32 array and an int16 array that have identical byte representations, but to recover the correct one, you need to tell numpy which TYPE is correct. The bytes alone do not carry the dtype (or the shape), so you need to know those things about the object separately.
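To make the ambiguity concrete, here is a minimal sketch that reads the same four bytes under two dtypes. The byte string and the variable names are made up for illustration, and the explicit little-endian dtype strings ("<i2", "<i4") are used so the result does not depend on the machine's byte order.

```python
import numpy as np

# Four raw bytes with no type information attached (illustrative value).
raw = b"\x01\x00\x00\x00"

# Interpreted as little-endian 16-bit ints: two elements.
as_int16 = np.frombuffer(raw, dtype="<i2")

# Interpreted as little-endian 32-bit ints: one element.
as_int32 = np.frombuffer(raw, dtype="<i4")

print(as_int16)  # [1 0]
print(as_int32)  # [1]
```

Both results are valid readings of the same bytes, which is why frombuffer has to be told the dtype.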
x = np.array([1,2,3])
# encode as base 64
x_64 = base64.b64encode(x.tobytes())
# decode back to bytes
x_bytes = base64.b64decode(x_64)
# use numpy to recreate original array of ints
np.frombuffer(x_bytes, dtype=int)
# returns:
np.array([1, 2, 3])
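For the 2-D int32 array from the question, the shape is lost as well as the dtype, since frombuffer always returns a flat array. One possible approach, sketched below, is to carry the dtype and shape alongside the base64 text. The helper names encode_array and decode_array and the dictionary layout are hypothetical, not part of numpy or base64.

```python
import base64

import numpy as np


def encode_array(arr):
    """Pack an array into a dict holding base64 text plus dtype and shape."""
    return {
        "dtype": arr.dtype.str,  # e.g. "<i4", includes byte order
        "shape": arr.shape,
        "data": base64.b64encode(arr.tobytes()).decode("ascii"),
    }


def decode_array(payload):
    """Rebuild the array from the dict produced by encode_array."""
    raw = base64.b64decode(payload["data"])
    flat = np.frombuffer(raw, dtype=np.dtype(payload["dtype"]))
    return flat.reshape(payload["shape"])


x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)
restored = decode_array(encode_array(x))
print(restored)
print(restored.dtype, restored.shape)
```

Note that the array returned by frombuffer is a read-only view of the decoded bytes; call .copy() on it if you need to modify it.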
If you want to save an object and then recover it later, that process is called serialization. There are two very good packages that handle serialization: the first, pickle, is in the standard library; the second, dill, is a third-party package that can handle more complicated objects.
import pickle
x = np.array([1,2,3])
pickled_x = pickle.dumps(x)
# pickled_x is a bytes object that is hard for humans to read.
pickle.loads(pickled_x)
# returns:
np.array([1, 2, 3])
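As for the compressed, in-memory format the question asks about (the zipped(pickle.dumps(x)) idea), one way to sketch it is to combine pickle with the standard library's zlib and then base64-encode the result. The helper names pack and unpack are hypothetical, and zlib is just one possible choice of compressor.

```python
import base64
import pickle
import zlib

import numpy as np


def pack(obj):
    """Pickle, compress, then base64-encode an object into bytes."""
    return base64.b64encode(zlib.compress(pickle.dumps(obj)))


def unpack(blob):
    """Reverse of pack: base64-decode, decompress, then unpickle."""
    return pickle.loads(zlib.decompress(base64.b64decode(blob)))


x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)
blob = pack(x)
restored = unpack(blob)
print(type(restored), restored.dtype, restored.shape)
```

Because pickle stores the dtype and shape itself, nothing extra has to be tracked here. Only unpickle data that comes from a source you trust, since loading a pickle can execute arbitrary code.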