
How to properly reuse a session while multiprocessing

I have some Python code that listens on a socket connection and processes the tasks it receives via that socket. The code uses a multiprocessing pool to handle these tasks (each task is assigned to some process in the pool).

These tasks involve an API request, and I noticed that every task repeats the handshake (whereas it should reuse the session).

To make sure the processes do not interfere with one another, I tried to give each process its own session. But for some reason I cannot get that to work.

My attempt

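import multiprocessing
import requests


def init_pool():
    # Runs once in each worker process; the session is kept as a per-process global.
    global s
    s = requests.Session()
    print(s)


def f(x):
    # pool.map passes each item of the list to f, so f must accept one argument.
    print(s)


if __name__ == "__main__":
    pool = multiprocessing.Pool(4, initializer=init_pool)
    res = pool.map(f, [1, 2, 3, 4])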

Looking at what f prints, you can see that the sessions are actually the same, which surprised me because this code

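import multiprocessing
import random


def init_pool():
    # Runs once in each worker process; r is a per-process global.
    global r
    r = random.random()


def f(x):
    # pool.map passes each item of the list to f, so f must accept one argument.
    print(r)


if __name__ == "__main__":
    pool = multiprocessing.Pool(4, initializer=init_pool)
    res = pool.map(f, [1, 2, 3, 4])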

does print distinct values.

My question is: how can I reuse a session in a multiprocessing pool the right way (preventing repeated handshakes)?

Daniel van der Maas asked Oct 18 '25 12:10



1 Answer

What is happening is that the workers of a multiprocessing Pool do not all become ready at the same moment: each one needs some time to start up and run its initializer before it can take tasks.

Adding more information to the prints makes this visible: the one-line tasks finish so fast that they all run in the same worker, the first one to become ready, before the other workers have finished starting.

The version below prints the process identification along with the session id and a random number attached to each session, which does not depend on the memory addresses assigned by the OS. It also labels each print as coming from the init phase or the task phase of a process:

import multiprocessing
import requests
import random

def init_pool():
    global s
    s = requests.Session()
    s.myid = random.randint(0, 1000)
    print(f"init - {multiprocessing.current_process()}  session id: {id(s)}, myid: {s.myid}")


def f(x):
    print(f"worker - {multiprocessing.current_process()}  session id: {id(s)}, myid: {s.myid}")

if __name__ == "__main__":
    multiprocessing.set_start_method('spawn')
    pool = multiprocessing.Pool(4, initializer=init_pool )
    res= pool.map(f,[1,2,3,4])

Which will print, for example:

init - <SpawnProcess name='SpawnPoolWorker-1' parent=20870 started daemon>  session id: 139938864231504, myid: 509
worker - <SpawnProcess name='SpawnPoolWorker-1' parent=20870 started daemon>  session id: 139938864231504, myid: 509
worker - <SpawnProcess name='SpawnPoolWorker-1' parent=20870 started daemon>  session id: 139938864231504, myid: 509
worker - <SpawnProcess name='SpawnPoolWorker-1' parent=20870 started daemon>  session id: 139938864231504, myid: 509
worker - <SpawnProcess name='SpawnPoolWorker-1' parent=20870 started daemon>  session id: 139938864231504, myid: 509
init - <SpawnProcess name='SpawnPoolWorker-2' parent=20870 started daemon>  session id: 139910687835472, myid: 747
init - <SpawnProcess name='SpawnPoolWorker-4' parent=20870 started daemon>  session id: 139784626402384, myid: 803
init - <SpawnProcess name='SpawnPoolWorker-3' parent=20870 started daemon>  session id: 140676029002576, myid: 359

Now, if I add a time.sleep(0.5) call inside f, giving the other processes time to be ready before Pool.map hands out the next task, the result is this:

init - <SpawnProcess name='SpawnPoolWorker-3' parent=20939 started daemon>  session id: 139733583964496, myid: 741
init - <SpawnProcess name='SpawnPoolWorker-2' parent=20939 started daemon>  session id: 140209458611728, myid: 157
init - <SpawnProcess name='SpawnPoolWorker-1' parent=20939 started daemon>  session id: 140097358576912, myid: 178
init - <SpawnProcess name='SpawnPoolWorker-4' parent=20939 started daemon>  session id: 140214050276880, myid: 952
worker - <SpawnProcess name='SpawnPoolWorker-2' parent=20939 started daemon>  session id: 140209458611728, myid: 157
worker - <SpawnProcess name='SpawnPoolWorker-4' parent=20939 started daemon>  session id: 140214050276880, myid: 952
worker - <SpawnProcess name='SpawnPoolWorker-1' parent=20939 started daemon>  session id: 140097358576912, myid: 178
worker - <SpawnProcess name='SpawnPoolWorker-3' parent=20939 started daemon>  session id: 139733583964496, myid: 741
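
A minimal sketch of that sleep variant, for anyone who wants to check the "one session per worker" claim programmatically rather than by reading prints. It is stdlib-only by assumption: a uuid string stands in for requests.Session(), and the run helper is a hypothetical name, not part of the code above. Each task returns its process id together with the stand-in, so the results can be grouped per worker.

```python
import multiprocessing
import os
import time
import uuid


def init_pool():
    # Runs once in every worker process. The uuid string is only a stand-in
    # for requests.Session() (an assumption to keep the sketch stdlib-only).
    global s
    s = uuid.uuid4().hex


def f(x):
    # Keep the worker busy for a while so the other workers can pick up tasks.
    time.sleep(0.5)
    return os.getpid(), s


def run(tasks=(1, 2, 3, 4)):
    # Hypothetical helper: returns a list of (worker pid, session stand-in).
    with multiprocessing.Pool(4, initializer=init_pool) as pool:
        return pool.map(f, tasks)


if __name__ == "__main__":
    for pid, token in run():
        print(f"worker pid={pid} session stand-in={token}")
```

How many distinct workers show up still depends on timing, but every process id should always be paired with exactly one stand-in value.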

(Also, I forced multiprocessing to create the subprocesses with spawn, which is the default on Windows and macOS. As I am running this on Linux, the default would be fork, which does not display the "run everything in the first Pool process" behavior.)
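
As for the original goal of avoiding repeated handshakes, the initializer pattern can be checked end to end with a small, hedged sketch. To stay stdlib-only and offline it swaps requests.Session for http.client.HTTPConnection and talks to a throwaway local server that replies with the client's source port, which identifies the TCP connection. The names EchoPort, init_worker, task and run are hypothetical. If a worker reuses its connection, all of its tasks report the same port.

```python
import http.client
import multiprocessing
import os
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoPort(BaseHTTPRequestHandler):
    # HTTP/1.1 keeps the TCP connection open between requests.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        # Reply with the client's source port, which identifies the connection.
        body = str(self.client_address[1]).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def init_worker(port):
    # Runs once per worker: one persistent connection per process.
    # The connection itself is opened lazily on the first request.
    global conn
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)


def task(_):
    conn.request("GET", "/")
    client_port = int(conn.getresponse().read())
    return os.getpid(), client_port


def run(n_tasks=8):
    # The constructor binds and listens, so the port is known before the pool exists.
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoPort)
    port = server.server_address[1]
    with multiprocessing.Pool(2, initializer=init_worker, initargs=(port,)) as pool:
        # Start serving only after the workers have been created.
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            return pool.map(task, range(n_tasks))
        finally:
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    for pid, client_port in run():
        print(f"worker pid={pid} used connection from port {client_port}")
```

With requests the shape would be the same: create the Session in the initializer and call s.get(...) inside the task, so each worker keeps its own connection pool alive across tasks.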

jsbueno answered Oct 20 '25 00:10



