
ZeroMQ/Python - CPU affinity hiccup?

I have the following strange situation.

We have a process, call it Distributor, that receives tasks over ZeroMQ/TCP from a Client and accumulates them in a queue. There is a Worker process, which talks to the Distributor over ZeroMQ/IPC. The Distributor forwards each incoming task to the Worker and waits for an answer. As soon as the Worker answers, the Distributor sends it another task (if one was received in the meantime) and returns the answer to the Client (over a separate ZeroMQ/TCP connection). If a task is not processed within 10 ms, it is dropped from the queue.
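To make the drop rule concrete, here is a minimal Python sketch of how such a queue might behave. It is only a model: the real Distributor is written in C++, the names TaskQueue and pop_fresh are hypothetical, and checking the deadline at dequeue time is an assumption about how the 10 ms limit is enforced.

```python
import collections
import time

DEADLINE_S = 0.010  # the 10 ms limit from the description above


class TaskQueue:
    """Hypothetical model of the Distributor's queue (the real one is C++)."""

    def __init__(self, deadline_s=DEADLINE_S, clock=time.monotonic):
        self._deadline_s = deadline_s
        self._clock = clock                 # injectable so the model can be tested
        self._tasks = collections.deque()   # holds (arrival_time, task) pairs
        self.dropped = 0

    def push(self, task):
        """Record a task together with its arrival time."""
        self._tasks.append((self._clock(), task))

    def pop_fresh(self):
        """Return the oldest task still within its deadline.

        Stale tasks found along the way are dropped and counted.
        Returns None when no fresh task is left.
        """
        now = self._clock()
        while self._tasks:
            arrived, task = self._tasks.popleft()
            if now - arrived <= self._deadline_s:
                return task
            self.dropped += 1
        return None
```

In this model the Distributor would call pop_fresh() each time the Worker becomes free, so a slow Worker shows up directly as a growing dropped count.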

With 1 Worker, the system can process ~3,500 requests/sec. The Client sends 10,000 requests/sec, so ~6,500 requests/sec are dropped.

However, when I run some unrelated process on the server that takes 100% CPU (a busy-wait loop, or whatever), the system can, strangely, suddenly process ~7,000 requests/sec. When that process is stopped, throughput drops back to ~3,500. The server has 4 cores.

The same happens when running 2, 3 or 4 Workers (connected to the same Distributor), with slightly different numbers.

The Distributor is written in C++. The Worker is written in Python and uses the pyzmq binding. The Worker does simple arithmetic and does not depend on any external I/O other than the Distributor.

One theory is that this has to do with ZeroMQ using threads on separate CPUs when the server is idle, and on the same CPU when it is busy. If that is the case, I would appreciate suggestions on how to configure the thread/CPU affinity of ZeroMQ so that it works correctly (without running a busy loop in the background).

Is there any ZeroMQ setting that might explain / fix this?

EDIT:

This doesn't happen with a Worker written in C++.

asked Oct 30 '22 22:10 by SashaM


1 Answer

This was indeed a CPU affinity problem. It turns out that when ZeroMQ is used in a setting where a worker processes an input and then waits for the next one, a context switch that switches to another process at that point wastes a lot of time on copying the ZeroMQ data.

Running the worker with

taskset -c 1 python worker.py

solves the problem.
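If changing the launch command is inconvenient, the same pinning can probably be done from inside the worker itself. The sketch below uses Python's standard os.sched_setaffinity, which is available on Linux only; the helper name pin_to_cpu is hypothetical, and I have not measured whether it gives exactly the same throughput as taskset.

```python
import os


def pin_to_cpu(cpu):
    """Restrict the calling process to a single CPU core (Linux only).

    Returns the resulting affinity set so the caller can verify it.
    """
    os.sched_setaffinity(0, {cpu})   # pid 0 means "the calling process"
    return os.sched_getaffinity(0)


# Intended use, at the very top of worker.py, before any ZeroMQ context
# is created, so that threads started later inherit the affinity:
#     pin_to_cpu(1)
```

Calling it before the ZeroMQ context exists matters because affinity is per thread on Linux: new threads inherit the mask of the thread that creates them, while threads that already exist keep their old mask.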

answered Nov 10 '22 19:11 by SashaM