
IPC between C application and Python

Tags:

python

c

ipc

So I am relatively new to IPC. I have a C program that collects data and a Python program that analyses the data. I want to be able to:

  1. Call the Python program as a subprocess of my main C program
  2. Pass a C struct containing the data to be processed to the Python process
  3. Return an int value from the Python process back to the C program

I have been briefly looking at pipes and FIFOs, but so far I cannot find any information that addresses this kind of problem. As I understand it, a fork(), for example, will simply duplicate the calling process, which is not what I want, since I am trying to call a different process.

asked Dec 16 '15 by PaddyOsmond

3 Answers

About fork() and the need to execute a different process: it is true that fork() creates a copy of the current process, but this is usually coupled with exec() (one of the exec*() family of calls) to make the copy execute a different program.
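In Python terms, the same fork-then-exec pairing looks like this (os.fork, os.execvp and os.waitpid are thin wrappers over the C calls of the same names; in C you would call fork(), execvp() and waitpid() directly). This is a POSIX-only sketch:

```python
import os
import sys

def spawn(program, args):
    """fork() the current process, then exec() a different program in the child.

    Returns the child's exit code, mirroring the C idiom of
    fork() / execvp() / waitpid().
    """
    pid = os.fork()
    if pid == 0:
        # Child: the process image is replaced here; on success this never returns.
        os.execvp(program, [program] + args)
    # Parent: wait for the child to finish.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)

# Run a *different* program (here: another Python interpreter) as the child.
exit_code = spawn(sys.executable, ["-c", "print('hello from the exec()ed child')"])
```

The child's code is never a copy of the parent's logic: after execvp() it is whatever program you named, which is exactly what the question is after.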

As for IPC, you have several choices. Someone mentioned a queue, but something like ZeroMQ is overkill here. You can do IPC with one of several mechanisms:

  1. Pipes (named pipes or anonymous)
  2. Unix domain sockets
  3. TCP or UDP via the sockets API
  4. Shared memory
  5. Message queues

The pipe approach is the easiest. Note that when you pass data back and forth between the C program and Python, you will need to worry about the transfer syntax of the data. If you choose to send raw C structs (whose in-memory layout is not portable: padding, alignment and endianness all vary), you will need to unpack the bytes on the Python side. Otherwise you can use some textual format: a combination of sprintf/sscanf, or JSON, etc.
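A minimal sketch of the struct-unpacking route, with both ends written in Python for brevity (in the real setup the C parent would write() the packed bytes into the pipe itself; the two-field record layout here is a made-up example):

```python
import struct
import subprocess
import sys

# Hypothetical record matching a C struct { int32_t id; double value; },
# packed little-endian with no padding: 4 + 8 = 12 bytes.
payload = struct.pack("<id", 7, 3.5)

# The "Python worker": read the packed struct from stdin, reply with an int.
child_src = (
    "import struct, sys\n"
    "rec_id, value = struct.unpack('<id', sys.stdin.buffer.read(12))\n"
    "sys.stdout.buffer.write(struct.pack('<i', rec_id * 2))\n"
)

# Spawn the worker with its stdin/stdout connected to pipes.
proc = subprocess.run([sys.executable, "-c", child_src],
                      input=payload, capture_output=True)
(result,) = struct.unpack("<i", proc.stdout)
```

The format string ("<id", "<i") is the contract between the two sides; it has to match the C struct's actual layout, which is why an explicit, fixed-endianness format is safer than dumping the struct's raw memory.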

answered Nov 14 '22 by Ziffusion


I suggest stepping back from the application and structuring the issues you are confronted with.

Multi-threading

Starting two processes is by far not the biggest issue; as Ziffusion said, you can have another process execute a different program. Plus, there are Python bindings for C (the CPython embedding API), so you could for example create another thread (no need for it to be a separate process) and call your Python routines from the C program.

Communication

Sharing information is more interesting, as you have to solve two issues: one is technically getting the data from one place to another and vice versa; the other is how two different things can work on the same data. This leads into messaging patterns and process flow:

  • who generates the data?
  • who receives the data?
  • is there a piece of code waiting for something before proceeding?
  • is there the need to control what happens to the data while the data is processed?
  • do I want to code it myself?
  • can I use libraries in the project?
  • are there security limitations?
  • ...

Once you answer the above questions, you can define how your pieces of the application are going to interact. One main distinction is synchronous vs asynchronous.

Sync vs Async

Synchronous means that for every message there is a reply, which should arrive within a finite (usually as small as possible) time envelope, in order to bound latency. This pattern is best used when you have to finely control what's happening, or when you need an answer in a timely manner. It is, in fact, how HTTP works for downloading web pages: whenever you load a web site, you want to see the content right now. This pattern is called REQuest/REPly.

Asynchronous is often used for heavy processing: the data producer (for example a database interface, or a sensor) sends a bulk of data to a worker thread without waiting for an answer. The worker thread then starts doing its job on the data and, when it's done, sends the results to a data sink/user. This pattern is called PUBlish/SUBscribe.

There are many others, but these form the basics of communication.
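The request/reply pattern can be sketched without any library at all; here is a minimal version over a Unix socketpair (ZeroMQ's REQ/REP sockets hide exactly this plumbing, plus message framing and reconnection):

```python
import socket
import struct

# Two connected endpoints standing in for the requester and the replier.
req, rep = socket.socketpair()

req.sendall(struct.pack("<i", 21))             # requester: send a question
(n,) = struct.unpack("<i", rep.recv(4))        # replier: receive it...
rep.sendall(struct.pack("<i", n * 2))          # ...and answer right away
(answer,) = struct.unpack("<i", req.recv(4))   # requester blocks for the reply

req.close()
rep.close()
```

The defining property is that the requester blocks until the reply arrives; in the asynchronous PUB/SUB pattern the producer would just send and move on.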

Marshalling

Another issue you face is how to structure the data passing: marshalling, i.e. how to carry the meaning and content of your data from one context to a totally different one, for example from your C part to your Python part. Writing and maintaining your own serialization code is tedious and perilous, not to mention prone to backward-compatibility issues.
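For instance, a textual format like JSON carries the field names along with the values, so the Python side needs no shared C header to make sense of the bytes (a sketch; the record layout is invented):

```python
import json

# A record the C side could emit as text (e.g. with sprintf) instead of raw struct bytes.
record = {"sensor": "temp0", "readings": [20.5, 21.0]}

wire = json.dumps(record).encode()   # bytes that travel over the pipe/socket
decoded = json.loads(wire)           # the receiver recovers names *and* values
```

The price is verbosity and parsing cost; the schema-based libraries mentioned below keep the language-neutrality while staying compact.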

Implementation

When you come to implementation you usually want the cleanest and most powerful code, and the two goals clearly pull against each other. So I usually go look for a library that does exactly what I need. In this case my advice is to try ZeroMQ: it is thin, flexible and low-level, and it will give you a powerful framework to interface threads, processes and even machines. ZeroMQ provides the link, but you still need a protocol to run over that link. To avoid incredible headaches and streamline your work with respect to the marshalling issue, I suggest you investigate marshalling libraries that make this task easy: Cap'n Proto, FlatBuffers, or Protocol Buffers (from Google; I can't post more than 2 links yet). They make it easy to define your data in an intermediate language and parse it from any other language without you having to write all the classes yourself.

As for pipes and shared memory, my humble opinion is: forget they exist.

answered Nov 14 '22 by Claudio


The way you are organizing the architecture is a bit messy. What you really want is Message Queues. So in your example:

  • Your Python worker listens for new items to process on queue A;
  • Your C program puts data into queue A;
  • Your Python worker processes the data and puts the result into queue B;
  • Your C program listens for new items on queue B.
This may vary, but the concept is simple.
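The queue A / queue B flow above can be sketched with in-process queues (stand-ins for whatever broker or ZeroMQ sockets would sit between the real C and Python processes; the squaring step is a placeholder for the actual analysis):

```python
import queue
import threading

queue_a = queue.Queue()   # "C program" -> Python worker
queue_b = queue.Queue()   # Python worker -> "C program"

def worker():
    """The Python worker: listens on queue A, processes, replies on queue B."""
    while True:
        item = queue_a.get()
        if item is None:          # sentinel: no more work
            break
        queue_b.put(item * item)  # placeholder for the real analysis

t = threading.Thread(target=worker)
t.start()

for x in (2, 3, 4):               # the "C program" side feeds data into queue A
    queue_a.put(x)
queue_a.put(None)
t.join()

results = [queue_b.get() for _ in (2, 3, 4)]
```

Neither side knows anything about the other beyond the two queues, which is what makes the pattern easy to rearrange or distribute later.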

They are easy to implement, and there are tons of libraries and tools to aid you in this task. ZeroMQ would do for you, for sure. It works with both C and Python.

answered Nov 14 '22 by 0xd