
Fast Perl<->Python serialization that supports integer dictionary keys

I'm looking for a fast serialization method (XML is way too slow) that can be used from both Perl and Python.

Unfortunately, I can't use JSON (or many other formats) because it always changes the type of a dict key from integer to string. I need serialization/deserialization that preserves the key type.

Python:

>>> import json
>>> dict_before = {1:'one', 20: 'twenty'}
>>> data = json.dumps(dict_before)
>>> dict_after = json.loads(data)

>>> dict_before
{1: 'one', 20: 'twenty'}            #integer keys
>>> dict_after
{u'1': u'one', u'20': u'twenty'}    #string keys

Any suggestions are welcome.

cimak asked Mar 23 '23 16:03


2 Answers

You can use YAML.

>>> import yaml
>>> dict_before = {1:'one', 20: 'twenty'}
>>> data = yaml.safe_dump(dict_before)
>>> dict_after = yaml.safe_load(data)
>>> dict_after
{1: 'one', 20: 'twenty'}

I had a similar problem: I wanted to share a configuration file between Perl and Python, and I ended up using YAML.

You can install the YAML module for Python with:

pip install PyYAML

Note, however, that integer keys will still be converted to strings on the Perl side; see "Legal values for Perl hash key".

moliware answered Mar 25 '23 07:03


Mu. You have started with the wrong premise.

Perl does not have a meaningful type system that distinguishes between numbers and strings. Any given value can be both. It can't be determined using only the Perl language whether a given value is considered to be a number only (although you can use modules like Devel::Peek). It is utterly impossible to know what type a given value originally was.

use feature 'say';
my $x = 1;     # an integer (IV), right?
say "x = $x";  # not any more! It's a PVIV now (string and integer)

Furthermore, in a hash map (“dictionary”), the key type is always coerced to a string. In arrays, the key is always coerced to an integer. Other types can only be faked.

This is wonderful when parsing text, but of course introduces endless pain when serializing a data structure. JSON maps perfectly to Perl data structures, so I suggest you stick to that (or YAML, as it is a superset of JSON) to protect yourself from the delusion that a serialization could infer information that simply isn't there.

What do we take from this?

  • If interop is important, refrain from using creative dictionary types in Python.

  • You can always encode type information in the serialization should it really be important (hint: it probably isn't): {"type":"integer dict", "data":{"1":"foo","2":"bar"}}

  • It would also be premature to dismiss XML as too slow. See this recent article, although I disagree with the methods, and it restricts itself to JS (last week's HN thread for perspective).

    If the parser is native, it will probably be fast enough, so obviously don't use any pure-Perl or pure-Python implementations. This also holds for JSON parsers, YAML parsers, and whatever else you might pick.
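The second point above (encoding the type information in the serialization itself) could look roughly like this on the Python side. This is only a minimal sketch using the standard json module; the helper names dump_int_dict and load_int_dict are made up for illustration, and the "integer dict" tag is just the one from the example envelope. The Perl side would read the same JSON and get string hash keys, as it always does.

```python
import json


def dump_int_dict(d):
    """Serialize an int-keyed dict inside an envelope that records the key type.

    Hypothetical helper, not part of any library.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if not all(isinstance(k, int) and not isinstance(k, bool) for k in d):
        raise TypeError("all keys must be integers")
    # json.dumps writes the integer keys as strings; the tag records
    # that they should be converted back when loading.
    return json.dumps({"type": "integer dict", "data": d})


def load_int_dict(text):
    """Inverse of dump_int_dict: restore the integer keys from the envelope."""
    envelope = json.loads(text)
    if envelope.get("type") != "integer dict":
        raise ValueError("unexpected envelope type")
    return {int(k): v for k, v in envelope["data"].items()}


dict_before = {1: "one", 20: "twenty"}
data = dump_int_dict(dict_before)
dict_after = load_int_dict(data)
```

The key conversion happens only in Python, where the distinction exists; nothing on the wire depends on the receiver having integer keys.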

amon answered Mar 25 '23 05:03