
msgpack unserialising dict key strings to bytes

Tags:

python

msgpack

I am having issues with msgpack in Python. It seems that when a dict with str keys is serialised, the keys are not unserialised properly, which causes KeyError exceptions to be raised.

Example:

>>> import msgpack
>>> d = dict()
>>> value = 1234
>>> d['key'] = value
>>> binary = msgpack.dumps(d)
>>> new_d = msgpack.loads(binary)
>>> new_d['key']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'key'

This is because the keys are no longer strings after calling loads(); they are unserialised to bytes objects.

>>> d.keys()
dict_keys(['key'])
>>> new_d.keys()
dict_keys([b'key'])

This seems to be related to an unimplemented feature, as mentioned on GitHub.

My question is: is there a way to fix this issue, or a workaround to ensure that the same keys can be used after deserialisation?

I would like to use msgpack, but if I cannot build a dict with str keys and expect to use the same keys after deserialisation, it becomes useless.

Nathan McCoy asked Jan 18 '18 11:01


2 Answers

A default encoding is set when calling dumps or packb

:param str encoding:
 |      Convert unicode to bytes with this encoding. (default: 'utf-8')

but no encoding is set by default when calling loads or unpackb, as seen in:

Help on built-in function unpackb in module msgpack._unpacker:

unpackb(...)
    unpackb(... encoding=None, ... )

Therefore, passing an explicit encoding when deserialising fixes the issue (on msgpack versions before 1.0, which still accept the encoding argument), for example:

>>> d['key'] = 1234
>>> binary = msgpack.dumps(d)
>>> msgpack.loads(binary, encoding="utf-8")
{'key': 1234}
>>> msgpack.loads(binary, encoding="utf-8") == d
True

Nathan McCoy answered Nov 10 '22 08:11


Passing raw=False worked for me on your example:

msgpack.unpackb(binary, raw=False)
# or
msgpack.loads(binary, raw=False)

See https://msgpack-python.readthedocs.io/en/latest/api.html#msgpack.Unpacker:

raw (bool) – If true, unpack msgpack raw to Python bytes. Otherwise, unpack to Python str by decoding with UTF-8 encoding (default).
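
If the unpacking call is outside your control and you already hold a dict with bytes keys, the keys can also be converted after the fact. The following is only a minimal sketch, assuming the keys are UTF-8 text; decode_keys is a hypothetical helper written for illustration and is not part of msgpack.

```python
def decode_keys(obj, encoding="utf-8"):
    """Recursively convert bytes dict keys to str.

    Hypothetical helper for illustration; not part of the msgpack API.
    Assumes every bytes key is valid text in the given encoding.
    """
    if isinstance(obj, dict):
        return {
            (k.decode(encoding) if isinstance(k, bytes) else k): decode_keys(v, encoding)
            for k, v in obj.items()
        }
    if isinstance(obj, list):
        return [decode_keys(item, encoding) for item in obj]
    # Non-container values (including bytes values) are returned unchanged.
    return obj


# Stand-in for the result of an unpack call that produced bytes keys.
unpacked = {b"key": 1234, b"nested": [{b"inner": b"raw"}]}
fixed = decode_keys(unpacked)
print(fixed)  # {'key': 1234, 'nested': [{'inner': b'raw'}]}
```

Only keys are decoded here, so genuinely binary values survive untouched. Where you do control the call, raw=False remains the simpler option.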

Jean Monet answered Nov 10 '22 09:11