 

What's the pythonic way to deal with worker processes that must coordinate their tasks?

I'm currently learning Python (from a Java background), and I have a question about something I would have used threads for in Java.

My program will use workers to periodically read data from a web service. Each worker will call the web service at its own interval.

From what I have read, it's preferable to use the multiprocessing module and set up the workers as independent processes that get on with their data-gathering tasks. In Java I would have done something conceptually similar using threads. While I can use threads in Python, I'd lose out on multi-CPU utilisation because of the global interpreter lock.

Here's the guts of my question: the web service is throttled, viz., the workers must not call it more than x times per second. What is the best way for the workers to check whether they may request data?

I'm confused as to whether this should be achieved using:

  • Pipes as a way to communicate with some other 'managing object', which monitors the total calls per second.
  • Something along the lines of mmap, to share some data/value between the processes that indicates whether they may call the web service.
  • A Manager() object that monitors the calls per seconds and informs workers if they have permission to make their calls.

Of course, this may come down to how I keep track of the calls per second. One option would be for the workers to call a function on some other object, which makes the call to the web service and records the current number of calls per second. Another would be for the web-service call to live within each worker, with each worker messaging a managing object every time it makes a call.

Thoughts welcome!

asked Jun 19 '11 by Edwardr

People also ask

How does multiprocessing Queue work?

A queue is a data structure to which items are added with a call to put() and from which items are retrieved with a call to get(). multiprocessing.Queue provides a first-in, first-out (FIFO) queue, meaning items are retrieved in the order they were added.
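For example, a minimal sketch of the FIFO behaviour (get() blocks until the item is available):

```python
from multiprocessing import Queue

q = Queue()
q.put("a")
q.put("b")
print(q.get())  # "a" -- first in, first out
print(q.get())  # "b"
```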

How do I share data between two processes in Python?

A simple way to communicate between processes with multiprocessing is to use a Queue to pass messages back and forth. Any pickle-able object can pass through a Queue. A short example might pass a single message to a single worker and then have the main process wait for the worker to finish.
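A minimal sketch of that pattern (the worker function and message contents are illustrative):

```python
from multiprocessing import Process, Queue

def worker(q):
    msg = q.get()                 # blocks until the main process sends something
    print(f"worker received: {msg}")

if __name__ == "__main__":
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    q.put({"url": "http://example.com"})  # any pickle-able object works
    p.join()                              # wait for the worker to finish
```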

What is multiprocessing manager Python?

A multiprocessing Manager provides a way of creating centralized Python objects that can be shared safely among processes, including over a network between processes running on different machines.
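A minimal sketch of sharing a counter through a Manager (the worker function and dict key are illustrative):

```python
from multiprocessing import Manager, Process

def worker(shared, lock):
    with lock:                    # the lock keeps the read-modify-write atomic
        shared["calls"] += 1

if __name__ == "__main__":
    with Manager() as manager:
        shared = manager.dict({"calls": 0})  # proxy to a dict in the manager process
        lock = manager.Lock()
        procs = [Process(target=worker, args=(shared, lock)) for _ in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(shared["calls"])    # 4
```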

What is a Python process?

In Python, a process is an instance of the Python interpreter that executes Python code. The first process created when we run our program is called the 'MainProcess'; it is the parent of any child processes it goes on to create.


2 Answers

Delegate the retrieval to a separate process which queues the requests until it is their turn.
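A minimal sketch of this approach, assuming a limit of x calls per second and a hypothetical fetch() placeholder for the real web-service call: the workers put requests on a queue, and a single gatekeeper process services them no faster than the allowed rate.

```python
import time
from multiprocessing import Process, Queue

CALLS_PER_SECOND = 5  # illustrative value for the service's limit

def fetch(request):
    # Placeholder for the real web-service call.
    print(f"fetching {request}")

def gatekeeper(requests):
    interval = 1.0 / CALLS_PER_SECOND
    while True:
        request = requests.get()  # wait for the next queued request
        if request is None:       # sentinel: shut down
            break
        fetch(request)
        time.sleep(interval)      # space out calls so the limit is never exceeded

if __name__ == "__main__":
    requests = Queue()
    gk = Process(target=gatekeeper, args=(requests,))
    gk.start()
    for i in range(10):
        requests.put(f"job-{i}")  # the workers would put their requests here
    requests.put(None)
    gk.join()
```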

answered Nov 02 '22 by Ignacio Vazquez-Abrams


I think that you'll find that the multiprocessing module will provide you with some fairly familiar constructs.

You might find that multiprocessing.Queue is useful for connecting your worker processes back to a managing process that could provide monitoring or throttling.
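One way to sketch that throttling, assuming a limit of x calls per second: the managing process drips 'permission tokens' into a queue, and each worker must take a token before calling the service.

```python
import time
from multiprocessing import Process, Queue

CALLS_PER_SECOND = 2  # illustrative value

def manager_proc(tokens):
    # Drip one permission token per allowed call into the queue.
    while True:
        tokens.put("token")
        time.sleep(1.0 / CALLS_PER_SECOND)

def worker(tokens, name):
    for _ in range(3):
        tokens.get()  # block until a call is permitted
        print(f"{name}: calling the web service")

if __name__ == "__main__":
    tokens = Queue(maxsize=CALLS_PER_SECOND)  # bound the bucket so tokens don't pile up
    mgr = Process(target=manager_proc, args=(tokens,), daemon=True)
    mgr.start()
    workers = [Process(target=worker, args=(tokens, f"w{i}")) for i in range(3)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```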

answered Nov 02 '22 by quamrana