I have a large number of bytes per second coming from a sensor device (e.g., video) that are being read and processed by a process in a Docker container.
I have a second Docker container that would like to read the processed byte stream (still a large number of bytes per second).
What is an efficient way to read this stream? Ideally I'd like to have the first container write to some sort of shared memory buffer that the second container can read from, but I don't think separate Docker containers can share memory. Perhaps there is some solution with a shared file pointer, with the file saved to an in-memory file system?
My goal is to maximize performance and minimize useless copies of data from one buffer to another as much as possible.
Edit: I would love to have solutions for both Linux and Windows. Similarly, I'm interested in solutions for doing this in C++ as well as Python.
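For reference, something along these lines is roughly what I have in mind: a Python sketch that maps a file on an in-memory filesystem (e.g. /dev/shm) that is bind-mounted into both containers. The path and buffer size below are placeholders.

import mmap
import os

SHM_PATH = "/dev/shm/sensor_buffer"  # placeholder; shared into both containers as a volume
BUF_SIZE = 64 * 1024 * 1024          # placeholder size, 64 MiB

# Writer side: create and size the backing file, then map it.
fd = os.open(SHM_PATH, os.O_CREAT | os.O_RDWR)
os.ftruncate(fd, BUF_SIZE)
buf = mmap.mmap(fd, BUF_SIZE)

# Both processes now see the same pages; synchronization (e.g. a write
# offset plus a semaphore) would still be needed and is omitted here.
buf[0:5] = b"hello"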
Multiple containers can mount the same volume when they need access to shared data. Docker creates a local volume by default, but a volume driver can be used to share data across multiple machines.
Create a fifo with mkfifo /tmp/myfifo.
Share it with both containers: --volume /tmp/myfifo:/tmp/myfifo:rw
You can use it directly:
From container 1: echo foo >>/tmp/myfifo
In container 2: read var </tmp/myfifo
Drawback: Container 1 is blocked until Container 2 reads the data and empties the buffer.
Avoid the blocking: in both containers, run exec 3<>/tmp/myfifo in bash.
From container 1: echo foo >&3
In container 2: read var <&3 (or e.g. cat <&3)
This solution uses exec file descriptor handling from bash. I don't know the equivalents off-hand, but it is certainly possible in other languages, too.
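For example, a minimal Python sketch of the same idea, assuming the FIFO is mounted at /tmp/myfifo in both containers. Opening it read/write mirrors the exec 3<> trick, so os.open() does not block waiting for the other end; the chunk size is a placeholder.

import os

FIFO = "/tmp/myfifo"  # shared into both containers via --volume

# Container 1 (writer): O_RDWR keeps both ends of the FIFO open, like exec 3<>
fd = os.open(FIFO, os.O_RDWR)
os.write(fd, b"foo\n")

# Container 2 (reader):
fd = os.open(FIFO, os.O_RDWR)
chunk = os.read(fd, 65536)  # returns up to 64 KiB per call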
Using a simple TCP socket would be my first choice. Only if measurements showed that we absolutely need to squeeze the last bit of performance out of the system would I fall back to pipes or shared memory.
Going by the problem statement, the process appears to be bound by local CPU/memory resources rather than by external services. In that case, having both producer and consumer on the same machine (as Docker containers) might exhaust the CPU before anything else becomes the bottleneck, but I would measure first before acting.
Most of the effort in developing code is spent maintaining it, so I favor mainstream practices. The TCP stack has rock-solid foundations and is as optimized for performance as humanly possible. It is also far more (completely?) portable across platforms and frameworks. Docker containers on the same host do not hit the wire when communicating over TCP. If the processes ever do hit a resource limit, you can scale horizontally by splitting the producer and consumer across physical hosts, manually or with something like Kubernetes, and TCP will keep working seamlessly. If you are never going to need that level of throughput, you also won't need system-level sophistication in inter-process communication.
Go by TCP.
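As a rough illustration only, here is a minimal Python sketch of the producer/consumer over TCP. The port and the "producer" host name are placeholders; on a user-defined Docker network you would typically connect by service or container name.

import socket

PORT = 5000  # placeholder port

# Producer (container 1): accept one consumer and stream processed bytes to it.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", PORT))
srv.listen(1)
conn, _ = srv.accept()
conn.sendall(b"...processed bytes...")

# Consumer (container 2): connect by the producer's name and read in large chunks.
cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(("producer", PORT))  # "producer" is an assumed container/service name
data = cli.recv(1 << 20)  # up to 1 MiB per call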