
Python base64 encoding and then decoding generic object

I am trying to convert a numpy.ndarray to base64 and then convert it back. Is the base64 library the way to go? The very simple code below does not work as expected. What am I missing?

import base64
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)
print(x)
print(type(x))

encoded = base64.b64encode(x)
decoded = base64.b64decode(encoded)
print(decoded)
print(type(decoded))

Is there a way to get the original variable back?

The general question is: can I convert "any" object to a binary string and then convert it back to the original type?

I could maybe use pickle, but I would need a compressed format (not in a file), something like:

x_compressed = zipped(pickle.dumps(x))
lordcenzin asked Jan 28 '23 01:01



1 Answer

I am not sure what you are trying to accomplish, but you can base-64 encode any object that has a bytes representation. In the example you gave, you are encoding a numpy array to base64.

This works because a numpy array has a bytes form. You can get it either by wrapping bytes() around the array or by using the .tobytes() method.

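import base64
import numpy as np

# dtype pinned to int32 so each element takes 4 bytes, as in the output below
x = np.array([1, 2, 3], dtype=np.int32)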

bytes(x)
# returns:
b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00'

x.tobytes()
# returns:
b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00'

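Since we have a bytes representation of the array, you can pass it to the base64 encoder. Note that b64encode accepts any bytes-like object, meaning anything that exposes the buffer protocol, and a contiguous numpy array is one. That is why passing the array directly gives the same result as passing x.tobytes(), as in the following example: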

base64.b64encode(x)
# returns
b'AQAAAAIAAAADAAAA'

base64.b64encode(x.tobytes())
# returns
b'AQAAAAIAAAADAAAA'
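
As a quick check of the point above, here is a minimal sketch that compares the two calls and also shows what happens with an object that has no bytes form. The dtype is pinned to little-endian int32; that is an assumption made here so the output does not depend on the platform.

```python
import base64
import numpy as np

# Assumption: little-endian int32, pinned so the bytes are identical everywhere.
x = np.array([1, 2, 3], dtype="<i4")

direct = base64.b64encode(x)               # array handed over via the buffer protocol
via_bytes = base64.b64encode(x.tobytes())  # explicit copy to a bytes object
same = direct == via_bytes

# A plain list exposes no buffer, so b64encode rejects it with a TypeError.
try:
    base64.b64encode([1, 2, 3])
    list_error = None
except TypeError as exc:
    list_error = type(exc).__name__

print(same, direct, list_error)
```

So a list or other generic object has to be turned into bytes first (for example by serializing it) before it can be base64-encoded.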

The byte array is nothing special. It is just a sequence of numbers! That's it. The reason you did not recover the numpy array is that the encode-decode round trip still just leaves you with the result of x.tobytes(), not x itself.

To get back the original object, you need an interface that can read a sequence of bytes and return an object of some sort. Luckily, numpy can do just that via the frombuffer function. However, you will need to tell numpy what TYPE of array it is reading as bytes.

In other words, you could have an int32 array and an int16 array that have identical byte representations, but to recover the correct one, you need to tell numpy which TYPE is correct. So you need some knowledge of the object. The same goes for the shape: frombuffer always returns a flat, one-dimensional array, so a 2-D array like the one in the question has to be reshaped afterwards.
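
To make the dtype point concrete, here is a small sketch that reads the same bytes back under two different dtypes. The explicit little-endian dtype strings ("<i4", "<i2") are a choice made here so the result is the same on any machine.

```python
import numpy as np

# 12 bytes: three little-endian 32-bit integers
raw = np.array([1, 2, 3], dtype="<i4").tobytes()

# Same bytes, two interpretations
as_int32 = np.frombuffer(raw, dtype="<i4")
as_int16 = np.frombuffer(raw, dtype="<i2")

print(as_int32.tolist())  # [1, 2, 3]
print(as_int16.tolist())  # [1, 0, 2, 0, 3, 0]
```

Nothing in the bytes themselves says which reading is the right one, which is why the dtype has to travel alongside the data.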

x = np.array([1,2,3])

# encode as base 64
x_64 = base64.b64encode(x.tobytes())

# decode back to bytes
x_bytes = base64.b64decode(x_64)

# use numpy to recreate original array of ints
np.frombuffer(x_bytes, dtype=int)
# returns:
np.array([1, 2, 3])
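
For the 2-D array from the question, one possible approach is to carry the dtype and shape along with the encoded bytes. The helpers encode_array and decode_array below are hypothetical names invented for this sketch, not part of numpy or base64.

```python
import base64
import numpy as np

def encode_array(arr):
    """Hypothetical helper: return (base64 bytes, dtype string, shape)."""
    return base64.b64encode(arr.tobytes()), arr.dtype.str, arr.shape

def decode_array(b64, dtype_str, shape):
    """Hypothetical helper: rebuild the array from the three pieces."""
    flat = np.frombuffer(base64.b64decode(b64), dtype=np.dtype(dtype_str))
    return flat.reshape(shape)

x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)
restored = decode_array(*encode_array(x))
print(restored)
print(restored.dtype, restored.shape)
```

The array returned by frombuffer is a read-only view of the decoded bytes; call .copy() on it if you need to modify the result.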

If you want to save an object and then recover it later, that process is called serialization. Two very good packages handle serialization: the first, pickle, is in the standard library; the second, dill, is a third-party package that can handle more complicated objects.

import pickle
import numpy as np

x = np.array([1, 2, 3])
pickled_x = pickle.dumps(x)
# pickled_x is a bytes object that is hard for humans to read.

# load from the pickled bytes, not from x itself
pickle.loads(pickled_x)
# returns:
array([1, 2, 3])
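
The question also asked for a compressed, in-memory form. A minimal sketch of one way to do that, assuming zlib from the standard library is acceptable as the compressor (the question's zipped() was only a placeholder), is to chain pickle, zlib, and base64:

```python
import base64
import pickle
import zlib

import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)

# object -> pickle bytes -> compressed bytes -> base64 text-safe bytes
packed = base64.b64encode(zlib.compress(pickle.dumps(x)))

# reverse the chain: base64 -> decompress -> unpickle
restored = pickle.loads(zlib.decompress(base64.b64decode(packed)))

print(type(restored), restored.dtype, restored.shape)
```

Because pickle records the dtype and shape itself, no extra bookkeeping is needed here. Only unpickle data that comes from a source you trust, since loading a pickle can execute arbitrary code.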
James answered Jan 31 '23 21:01
