
What is the most efficient way to stream data between Docker containers?

I have a large number of bytes per second coming from a sensor device (e.g., video) that are being read and processed by a process in a Docker container.

I have a second Docker container that would like to read the processed byte stream (still a large number of bytes per second).

What is an efficient way to read this stream? Ideally I'd like to have the first container write to some sort of shared memory buffer that the second container can read from, but I don't think separate Docker containers can share memory. Perhaps there is some solution with a shared file pointer, with the file saved to an in-memory file system?

My goal is to maximize performance and minimize useless copies of data from one buffer to another as much as possible.

Edit: I would love to have solutions for both Linux and Windows. Similarly, I'm interested in solutions for doing this in C++ as well as Python.

asked Jul 11 '18 by eraoul



2 Answers

Create a FIFO (named pipe) with mkfifo /tmp/myfifo. Share it with both containers: --volume /tmp/myfifo:/tmp/myfifo:rw

You can use it directly (an end-to-end sketch follows the list below):

  • In container 1: echo foo >>/tmp/myfifo

  • In container 2: read var </tmp/myfifo
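
For concreteness, a minimal end-to-end sketch on a Linux host (the image names and the your_*_command names are placeholders made up for illustration, not anything from the question):

    # On the Linux host: create the FIFO once
    mkfifo /tmp/myfifo

    # Start both containers with the FIFO bind-mounted in
    docker run -d --name producer --volume /tmp/myfifo:/tmp/myfifo:rw producer-image sleep infinity
    docker run -d --name consumer --volume /tmp/myfifo:/tmp/myfifo:rw consumer-image sleep infinity

    # Writer in container 1 (blocks until a reader opens the FIFO,
    # so background it or use a second terminal)
    docker exec producer sh -c 'your_processing_command > /tmp/myfifo' &

    # Reader in container 2
    docker exec consumer sh -c 'cat /tmp/myfifo | your_consumer_command'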

Drawback: Container 1 is blocked until Container 2 reads the data and empties the buffer.

To avoid the blocking: in both containers, run exec 3<>/tmp/myfifo in bash (a streaming sketch follows the list below).

  • In container 1: echo foo >&3

  • In container 2: read var <&3 (or e.g. cat <&3)
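
For continuous streaming rather than a single line, the same trick looks roughly like this (your_processing_command and your_consumer_command are placeholders):

    # In both containers: open the FIFO read-write on descriptor 3,
    # so neither side blocks waiting for the other end to open the pipe
    exec 3<>/tmp/myfifo

    # Container 1: keep writing the processed byte stream to descriptor 3
    your_processing_command >&3

    # Container 2: keep reading from descriptor 3 and process the stream
    cat <&3 | your_consumer_command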

This solution uses bash's exec file-descriptor handling. I don't know the exact mechanics, but it is certainly possible in other languages, too.

answered Oct 18 '22 by mviereck


Using a simple TCP socket would be my first choice. Only if measurements show that we absolutely need to squeeze the last bit of performance out of the system would I fall back to pipes or shared memory.

Going by the problem statement, the process seems to be bound by local CPU/memory resources, and the limiting factors are not external services. In that case, having both producer and consumer on the same machine (as Docker containers) might exhaust the CPU before anything else - but I would measure first before acting.

Most of the effort in developing code is spent maintaining it, so I favor mainstream practices. The TCP stack has rock-solid foundations and is about as optimized for performance as humanly possible. It is also far more (completely?) portable across platforms and frameworks. Docker containers on the same host do not hit the wire when communicating over TCP. If some day the processes do hit a resource limit, you can scale horizontally by splitting the producer and consumer across physical hosts, manually or, say, with Kubernetes; TCP will keep working seamlessly in that case. And if you are never going to need that level of throughput, you also won't need system-level sophistication in your inter-process communication.

Go with TCP.
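
For illustration only (not part of the answer itself), the "two containers talking TCP on one host" setup can be sketched with a user-defined Docker network and netcat; the network, image, and command names plus the port are placeholders, and netcat flags differ between variants:

    # Create a user-defined network so the containers can reach each other by name
    docker network create sensornet

    # Consumer: listen on TCP port 9000 and feed the stream to the consumer command
    docker run -d --name consumer --network sensornet consumer-image \
        sh -c 'nc -l -p 9000 | your_consumer_command'

    # Producer: connect to the consumer by container name and stream the bytes out
    docker run -d --name producer --network sensornet producer-image \
        sh -c 'your_processing_command | nc consumer 9000'

In a real system you would open the socket directly from your C++ or Python code instead of going through netcat; the point here is just that the two containers can talk over TCP without any extra plumbing.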

answered Oct 18 '22 by inquisitive