
Pass a pickle buffer from Node to Python

I have a Node application that subscribes to JSON data streams. I would like to extend it to subscribe to Python pickle data streams (I am willing to drop or convert non-primitive types). The node-pickle and jpickle packages have failed me, so I now want to write my own Python script to convert pickles to JSON.

I fiddled with the node-pickle source code and got part of it to work: I can pass JSON from Node to Python and get back a pickle string, and I can pass a predefined Python dict to Node as JSON. My problem is getting Python to recognize the data from Node as pickled data. I am passing the data stream buffer from Node to Python and trying desperately to get the string buffer argument into a format that pickle.loads accepts.

After much trial and error I have ended up with this:

main.js

const pickle = require('node-pickle');
const amqp = require('amqplib/callback_api');

amqp.connect(`amqp://${usr}:${pwd}@${url}`, (err, conn) => {
  if (err) {
    // Stop here: conn is undefined when the connection fails.
    return console.error(err);
  }
  conn.createChannel((err, ch) => {
    if (err) {
      return console.error(err);
    }
    ch.assertExchange(ex, 'fanout', { durable: false });
    ch.assertQueue('', {}, (err, q) => {
      ch.bindQueue(q.queue, ex, '');
      console.log('consuming');
      ch.consume(q.queue, msg => {
        console.log('Received [x]');
        const p = msg.content.toString('base64');
        pickle.loads(p).then(r => console.log('Res:', r));
        // conn.close();
      });
    });
  });
});

index.js (node-pickle)

const spawn = require('child_process').spawn,
  Bluebird = require('bluebird');

module.exports.loads = function loads(pickle) {
  return new Bluebird((resolve, reject) => {
    const convert = spawn('python', [__dirname + '/convert.py', '--loads']),
      stdout_buffer = [];

    convert.stdout.on('data', function(data) {
      stdout_buffer.push(data);
    });

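    // 'close' fires only after the child's stdio streams have ended,
    // so all of its output has been collected by then.
    convert.on('close', function(code) {
      const data = Buffer.concat(stdout_buffer).toString().trim();
      // console.log('buffer toString', stdout_buffer[0] ? stdout_buffer[0].toString() : null);
      if (data === '-1') {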
        resolve(false);
      } else {
        let result;
        try {
          result = JSON.parse(data);
        } catch (err) {
          console.log('failed parse');
          result = false;
        }
        resolve(result);
      }
    });
    convert.stdin.write(pickle);
    convert.stdin.end();
  });
};

convert.py (node-pickle)

import sys
try:
    import simplejson as json
except ImportError:
    import json
try:
    import cPickle as pickle
except ImportError:
    import pickle

import codecs
import jsonpickle

def main(argv):
    try:
        if argv[0] == '--loads':
            buffer = sys.stdin.buffer.read()
            decoded = codecs.decode(buffer, 'base64')
            d = pickle.loads(decoded, encoding='latin1')
            # unpicklable=False emits plain JSON without jsonpickle's py/ metadata
            j = jsonpickle.encode(d, unpicklable=False)
            sys.stdout.write(j)
        elif argv[0] == '--dumps':
            d = json.loads(argv[1])
            p = pickle.dumps(d)
            sys.stdout.write(str(p))
    except Exception as e:
        # Report the error on stderr so that stdout carries only the '-1' sentinel
        sys.stderr.write('Error: ' + str(e) + '\n')
        sys.stdout.write('-1')

if __name__ == '__main__':
    main(sys.argv[1:])

The error I come up against at the moment is:

invalid load key, '\xef'

EDIT 1: I am now sending the buffer's string representation, instead of the buffer itself, to Python, and reading it from stdin as bytes. I wrote the bytes object to a file so I could compare the data received via Node with the buffer I get when I subscribe to the data stream directly from a Python script. The two seem identical, except that certain \x.. sequences seen when subscribing from Python are represented as \xef\xbf\xbd when subscribing from Node. \xef\xbf\xbd is the UTF-8 encoding of U+FFFD, the Unicode replacement character, so I assume this is a string encoding problem. Some examples of the misrepresented sequences are: \x80 (this is the first sequence after the b'; however, \x80 does appear elsewhere), \xe3, and \x85.
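
The substitution described in EDIT 1 can be reproduced in Python alone. This is a minimal sketch, assuming the Node side converted the buffer to a UTF-8 string (the default for Buffer.toString()); the sample dict is made up for illustration.

```python
import pickle

# A protocol-2 pickle starts with the byte 0x80, which is not valid UTF-8 on its own.
original = pickle.dumps({"a": 1}, protocol=2)

# Decoding with replacement mimics a lossy bytes -> UTF-8 string conversion:
# every invalid byte becomes U+FFFD, which re-encodes as \xef\xbf\xbd.
as_text = original.decode("utf-8", errors="replace")
mangled = as_text.encode("utf-8")

print(original[:2])
print(mangled[:4])

# The mangled stream no longer starts with a valid pickle opcode.
try:
    pickle.loads(mangled)
    error_message = None
except pickle.UnpicklingError as exc:
    error_message = str(exc)

print(error_message)
```

Once a byte has been replaced, the original value cannot be recovered, which is why the fix has to happen before the data leaves Node.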

EDIT 2: I now encode the string I'm sending to Python as base64 and, in Python, decode the stdin buffer using codecs.decode. The buffer I'm writing to the file now looks much closer to the Python-only stream, with no more \xef\xbf\xbd substitutions. However, I now come up against this error:
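
Base64 works here because its output is plain ASCII, so it survives any string conversion on the way to Python. A small sketch of the round trip, assuming Node produces standard padded base64 (the sample dict is again made up):

```python
import base64
import codecs
import pickle

original = pickle.dumps({"a": 1}, protocol=2)

# Stand-in for msg.content.toString('base64') on the Node side.
encoded = base64.b64encode(original)

# Base64 text is pure ASCII, so a UTF-8 round trip leaves it untouched.
survived = encoded.decode("utf-8").encode("utf-8")

# codecs.decode(..., 'base64') and base64.b64decode give the same bytes back.
decoded = codecs.decode(survived, "base64")
also_decoded = base64.b64decode(survived)

print(decoded == original)
```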

'ascii' codec can't decode byte 0xe3 in position 1: ordinal not in range(128)

Also, I found a slight difference when trying to match the last 1000 characters of each stream. There is a section in the Python stream (\x0c,'\x023) that looks like this (\x0c,\'\x023) in the stream from Node. I'm not sure how much that will affect things.

EDIT 3 (Success!): After searching for my new error, I found the last piece of this encoding puzzle. Since I was working in Python 3 and the pickle came from Python 2.x, I needed to pass an encoding of bytes or latin1 (the one I needed) to pickle.loads. I was then able to use the wonderful jsonpickle package to serialize the dict to JSON, turning datetime objects into date strings.
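
To illustrate the encoding argument, here is a minimal sketch using a hand-written protocol-2 pickle of the kind Python 2 emits for a str holding non-ASCII bytes. The byte string is an illustrative stand-in, not data from the actual stream.

```python
import pickle

# Hand-written protocol-2 pickle of the Python 2 str '\xe3\x85':
# PROTO 2, SHORT_BINSTRING of length 2, BINPUT 0, STOP.
py2_pickle = b"\x80\x02U\x02\xe3\x85q\x00."

# Python 3 decodes Python 2 str objects as ASCII by default, which fails here.
try:
    pickle.loads(py2_pickle)
    default_error = None
except UnicodeDecodeError as exc:
    default_error = str(exc)

# latin1 maps every byte to one code point, so decoding can never fail.
as_text = pickle.loads(py2_pickle, encoding="latin1")

# 'bytes' skips decoding and returns the raw bytes instead.
as_bytes = pickle.loads(py2_pickle, encoding="bytes")

print(default_error)
print(repr(as_text), repr(as_bytes))
```

Which of the two to pick depends on the data: latin1 yields str values that go straight into JSON, while bytes keeps the raw payload for cases where the real text encoding is known and can be applied afterwards.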

asked Jul 02 '26 03:07 by Ace


1 Answer

So I was able to get the node-pickle npm package to work. My flow of getting a buffer of pickled data from Node to Python to get back JSON is:

In Node

  • Encode the buffer as a base64 string
  • Send the string to the Python child process via stdin, not as a command-line argument

In Python

  • Read in the buffer from stdin as bytes
  • Use codecs to decode it from base64
  • If using Python 3, specify bytes or latin1 encoding for pickle.loads
  • Use jsonpickle to serialize the Python objects to JSON

In Node

  • Collect the buffer from stdout and JSON.parse it
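
The Python half of this flow can be sketched with the standard library alone. This is an illustration under assumptions rather than the node-pickle code: pickle_b64_to_json and to_jsonable are hypothetical names, and json.dumps with a default hook stands in for jsonpickle.

```python
import base64
import datetime
import json
import pickle


def to_jsonable(value):
    """Fallback for json.dumps (hypothetical helper): make odd types serializable."""
    if isinstance(value, (datetime.datetime, datetime.date)):
        return value.isoformat()
    if isinstance(value, bytes):
        return value.decode("latin1")
    return str(value)


def pickle_b64_to_json(b64_payload):
    """Decode base64, unpickle with latin1, and return a JSON string."""
    raw = base64.b64decode(b64_payload)
    obj = pickle.loads(raw, encoding="latin1")
    return json.dumps(obj, default=to_jsonable)


# Simulate what Node would write to stdin: a base64-encoded pickle.
sample = {"name": "tick", "when": datetime.datetime(2020, 1, 2, 3, 4, 5)}
payload = base64.b64encode(pickle.dumps(sample, protocol=2))

result = pickle_b64_to_json(payload)
print(result)
```

In the real script the payload would come from sys.stdin.buffer.read() and the result would be written to stdout for Node to JSON.parse.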

answered Jul 03 '26 16:07 by Ace