
Python multiprocessing threads never join when given large amounts of work

I don't believe this is a duplicate of this question, because that problem appears to have been caused by using multiprocessing.Pool, which I am not doing.

This program:

import multiprocessing
import time

def task_a(procrange,result):
    "Naively identify prime numbers in an iterator of integers. Procrange may not contain negative numbers, 0, or 1. Result should be a multiprocessing.queue."

    for i in procrange: #For every number in our given iterator...
        for t in range (2,(i//2)+1): #Take every number up to half of it...
            if (i % t == 0): #And see if that number goes evenly into it.
                break   #If it does, it ain't prime.
        else:
            #print(i)
            result.put(i) #If the loop never broke, it's prime.




if __name__ == '__main__':
    #We seem to get the best times with 4 processes, which makes some sense since my machine has 4 cores (apparently hyperthreading doesn't do shit)
    #Time taken more or less halves for every process up to 4, then very slowly climbs back up again as overhead eclipses the benefit from concurrency
    processcount=4
    procs=[]
    #Will search up to this number.
    searchto=11000
    step=searchto//processcount
    results=multiprocessing.Queue(searchto)
    for t in range(processcount):
        procrange=range(step * t, step * (t+1) )
        print("Process",t,"will search from",step*t,"to",step*(t+1))
        procs.append(
                     multiprocessing.Process(target=task_a, name="Thread "+str(t),args=(procrange,results))
                     )
    starttime=time.time()
    for theproc in procs:
        theproc.start()
    print("Processing has begun.")

    for theproc in procs:
        theproc.join()
        print(theproc.name,"has terminated and joined.")
    print("Processing finished!")
    timetook=time.time()-starttime

    print("Compiling results...")

    resultlist=[]
    try:
        while True:
            resultlist.append(results.get(False))
    except multiprocessing.queues.Empty:
        pass

    print(resultlist)
    print("Took",timetook,"seconds to find",len(resultlist),"primes from 0 to",searchto,"with",processcount,"concurrent executions.")

... works perfectly, giving the result:

Process 0 will search from 0 to 2750
Process 1 will search from 2750 to 5500
Process 2 will search from 5500 to 8250
Process 3 will search from 8250 to 11000
Processing has begun.
Thread 0 has terminated and joined.
Thread 1 has terminated and joined.
Thread 2 has terminated and joined.
Thread 3 has terminated and joined.
Processing finished!
Compiling results...
[Many Primes]
Took 0.3321540355682373 seconds to find 1337** primes from 0 to 11000 with 4 concurrent executions.

However, if searchto is increased by even 500...

Processing has begun.
Thread 0 has terminated and joined.
Thread 1 has terminated and joined.
Thread 2 has terminated and joined.

... and the rest is silence. Process Hacker shows the Python processes consuming 12% CPU each, petering out one by one... and not terminating. They just hang until I terminate them manually.

Why?

** Clearly, either God or Guido has a cruel sense of humor.

asked Apr 23 '26 by Schilcote

1 Answer

It seems that the problem is in result.put(i): when I commented that line out, the script began to work. The multiprocessing docs explain why that call can hang the program: a process that has put items on a multiprocessing.Queue will not terminate until the queue's "feeder" thread has flushed all buffered items into the underlying pipe, so calling join() on the children before the queue has been drained can deadlock. So I suggest that you not use multiprocessing.Queue to save the results. Instead, you can use a database: MySQL, MongoDB, etc. Note: you cannot use SQLite, because with SQLite only one process can be making changes to the database at any moment in time (from the docs).
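If you do want to keep multiprocessing.Queue, the pattern the standard-library docs recommend ("Joining processes that use queues") is to drain the queue before calling join(), so each child's feeder thread can flush its buffer and the child can exit. Below is a minimal sketch of the question's main block restructured that way; the worker and variable names are kept from the question, and searchto=11500 stands in for a size that previously hung:

import multiprocessing
import queue
import time

def task_a(procrange, result):
    "Same naive primality test as in the question."
    for i in procrange:
        for t in range(2, (i // 2) + 1):
            if i % t == 0:
                break
        else:
            result.put(i)

if __name__ == '__main__':
    processcount = 4
    searchto = 11500
    step = searchto // processcount
    results = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=task_a, name="Thread " + str(t),
                                     args=(range(step * t, step * (t + 1)), results))
             for t in range(processcount)]

    starttime = time.time()
    for theproc in procs:
        theproc.start()

    #Drain the queue while the workers run; join() only afterwards.
    resultlist = []
    while any(p.is_alive() for p in procs) or not results.empty():
        try:
            resultlist.append(results.get(timeout=0.1))
        except queue.Empty:
            pass

    for theproc in procs:
        theproc.join() #Safe now: the queue has already been emptied.

    print("Took", time.time() - starttime, "seconds to find",
          len(resultlist), "primes from 0 to", searchto)

With this ordering the run completes, because no child is ever left blocked trying to flush its queue buffer into a pipe that nobody is reading.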

answered Apr 25 '26 by NorthCat