My Python application currently uses the python-memcached API to set and get objects in memcached. This API uses Python's native pickle module to serialize and de-serialize Python objects.
This API makes it simple and fast to store nested Python lists, dictionaries and tuples in memcached, and reading these objects back into the application is completely transparent -- it just works.
But I don't want to be limited to using Python exclusively, and if all the memcached objects are serialized with pickle, then clients written in other languages won't work.
Here are the cross-platform serialization options I've considered:
Considering the priorities for this app, what's the ideal object serialization method for memcached?
To make a Java object serializable we implement the java. io. Serializable interface. The ObjectOutputStream class contains writeObject() method for serializing an Object.
Pickling (and unpickling) is alternatively known as “serialization”, “marshalling,” 1 or “flattening”; however, to avoid confusion, the terms used here are “pickling” and “unpickling”. The pickle module is not secure. Only unpickle data you trust.
There are three types of serialization in . Net : Binary Serialization, SOAP Serialization and XML Serialization.
Data serialization is the process of converting an object into a stream of bytes to more easily save or transmit it. The reverse process—constructing a data structure or object from a series of bytes—is deserialization.
One major consideration is "do you want to have to specify each structure definition"?
If you are OK with that, then you could take a look at:
Both of these solutions require supporting files to define each data structure.
If you would prefer not to incur the developer overhead of pre-defining each structure, then take a look at:
However, I believe that both of these have had issues with transporting binary content, which is why they were ruled out for our usage. Note: YAML may have good binary support, you will have to check the client libraries -- see here: http://yaml.org/type/binary.html
At our company, we rolled our own library (Extruct) for cross-language serialization with binary support. We currently have (decently) fast implementations in Python and PHP, although it isn't very human readable due to using base64 on all the strings (binary support). Eventually we will port them to C and use more standard encoding.
Dynamic languages like PHP and Python get really slow if you have too many iterations in a loop or have to look at each character. C on the other hand shines at such operations.
If you'd like to see the implementation of Extruct, please let me know. (contact info at http://blog.gahooa.com/ under "About Me")
I tried several methods and settled on compressed JSON as the best balance between speed and memory footprint. Python's native Pickle function is slightly faster, but the resulting objects can't be used with non-Python clients.
I'm seeing 3:1 compression so all the data fits in memcache and the app gets sub-10ms response times including page rendering.
Here's a comparison of JSON, Thrift, Protocol Buffers and YAML, with and without compression:
http://bouncybouncy.net/ramblings/posts/more_on_json_vs_thrift_and_protocol_buffers/
Looks like this test got the same results I did with compressed JSON. Since I don't need to pre-define each structure, this seems like the fastest and smallest cross-platform answer.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With