
Consuming the pickle binary format from non-python (with celery and rabbitmq)

I'm using Python, Celery and RabbitMQ to produce messages from loosely coupled systems. However, I'm worried about interoperability.

When I inspect the Celery-produced message payload directly in RabbitMQ, I get the following binary format:

[Image: binary version of Celery output]

I strongly suspect that this is a binary pickle format. However, I'm having trouble finding information on the binary pickle format in general.

So, I really have a few questions:

  1. Is this a binary pickle format?
  2. What resources are available to map out the binary format?
  3. Given that Celery does, in fact, produce pickled data, what options are available to me if I want to consume those messages from non-Python consumers (such as C++ or PHP)?
  4. Do you have any experience working with Celery and RabbitMQ while interoperating with consumers that are not written in Python? Do you have any advice on that subject?

Thanks in advance...

UPDATE:

Based on Brendan's recommendation, I've switched this to a JSON serializer with:

add.apply_async(args=[10, 10], serializer="json")

For reference for future searchers, it appears that the JSON format, in this specific, empty case, is about 15% larger (or 28 bytes):

[Image: JSON serialized version from Celery]

Also, for people who might be interested in reading the pickle format from C++, I found this question helpful: "How can I read a python pickle database/file from C?"
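For anyone trying to map out the format from the Python side first, the standard library's `pickletools` module can disassemble a pickle into its opcodes. Below is a minimal sketch, assuming a small dictionary as a stand-in payload (the `payload` contents and the helper names `looks_like_binary_pickle` and `opcode_names` are hypothetical, and a real Celery message body carries more fields than this):

```python
import pickle
import pickletools

# Hypothetical stand-in payload; NOT the actual Celery message body.
payload = {"task": "tasks.add", "args": [10, 10], "kwargs": {}}

# Protocol 2 is a binary pickle protocol.
data = pickle.dumps(payload, protocol=2)


def looks_like_binary_pickle(blob):
    """Heuristic check only: protocol 2+ pickles begin with the PROTO
    opcode (byte 0x80) and end with the STOP opcode (the byte '.')."""
    return len(blob) >= 3 and blob[0:1] == b"\x80" and blob[-1:] == b"."


def opcode_names(blob):
    """Return the opcode names of a pickle, in stream order."""
    return [opcode.name for opcode, arg, pos in pickletools.genops(blob)]


print(looks_like_binary_pickle(data))
# Print a human-readable listing of every opcode and its argument.
pickletools.dis(data)
```

The disassembly lists each opcode with its byte position, which gives a concrete reference to compare against when writing a reader in another language.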

UPDATE 2:

Based on Asksol's recommendation, I tried out the zlib compression with:

async_result = add.apply_async((x, y), compression='zlib')

I thought there were some interesting results, so here they are:

[Image: format comparison table]

As you can see in this example, the Pickle format is smaller than JSON. However, once compression is added to the mix, compressed JSON is actually smaller than either version of Pickle. I'm also curious about the parse times of the two formats. JSON was designed to be parsed efficiently, while Pickle, as I understand it, is based on offsets, which would mean it doesn't have to be iterated through. I wonder whether anyone has benchmarked the two formats, with and without compression, taking parsing CPU time into account.
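A comparison like this can be reproduced outside Celery with only the standard library. The following is a rough sketch, assuming a hypothetical payload (not the real Celery message body) and two illustrative helpers, `encoded_sizes` and `parse_seconds`; the numbers it prints depend on the payload and the machine, so they will not match the table above:

```python
import json
import pickle
import timeit
import zlib

# Hypothetical stand-in payload; a real Celery message has more fields.
payload = {"task": "tasks.add", "args": [10, 10], "kwargs": {}}


def encoded_sizes(obj):
    """Return the byte size of obj as pickle and JSON, raw and zlib-compressed."""
    raw = {
        "pickle": pickle.dumps(obj, protocol=2),
        "json": json.dumps(obj).encode("utf-8"),
    }
    sizes = {}
    for name, blob in raw.items():
        sizes[name] = len(blob)
        sizes[name + "+zlib"] = len(zlib.compress(blob))
    return sizes


def parse_seconds(obj, number=10000):
    """Time `number` deserializations of obj for each format (wall-clock seconds)."""
    pickled = pickle.dumps(obj, protocol=2)
    jsoned = json.dumps(obj)
    return {
        "pickle": timeit.timeit(lambda: pickle.loads(pickled), number=number),
        "json": timeit.timeit(lambda: json.loads(jsoned), number=number),
    }


for name, size in sorted(encoded_sizes(payload).items()):
    print(f"{name:12s} {size:5d} bytes")
for name, seconds in sorted(parse_seconds(payload).items()):
    print(f"{name:12s} {seconds:.4f} s")
```

Very small payloads can grow under zlib because of its fixed header overhead, so it is worth measuring with messages that resemble real traffic.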

Homer6 asked Aug 29 '12 18:08



1 Answer

According to the documentation, you can make Celery use JSON instead. I'd recommend doing that, since JSON is pretty standard no matter what language you use. If you send a lot of binary data, it might increase the size of the messages, though, because binary values have to be encoded as text (for example with base64) before they fit into JSON.

Data transferred between clients and workers needs to be serialized. The default serializer is pickle, but you can change this globally or for each individual task. There is built-in support for pickle, JSON, YAML and msgpack, and you can also add your own custom serializers by registering them into the Kombu serializer registry (see Kombu: Serialization of Data).
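As a sketch of the "globally" option mentioned in the quoted documentation, a configuration module might look like the fragment below. This assumes the old-style uppercase setting names used by Celery releases of that era (newer releases use lowercase names such as `task_serializer`), and the module name `celeryconfig.py` is only an example, so check the documentation for the version you run:

```python
# celeryconfig.py (example module name; an assumption, not from the question)

# Serialize task messages as JSON instead of the default pickle.
CELERY_TASK_SERIALIZER = "json"

# Serialize task results as JSON as well, so non-Python readers can parse them.
CELERY_RESULT_SERIALIZER = "json"

# Only accept JSON content on the worker side (setting exists in later 3.x releases).
CELERY_ACCEPT_CONTENT = ["json"]
```

With a global setting in place, the per-call `serializer="json"` argument is no longer needed on each `apply_async` call.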

Brendan Long answered Nov 04 '22 01:11
