I have a simple program that runs 8 processes. Using multiprocessing remarkably reduces the script's running time, but I am not sure how many processes I should spawn to maximize CPU utilization. My machine is a VPS with one physical CPU and 6 cores:
import multiprocessing

def spider1():
    pass  # scraping work

def spider2():
    pass  # scraping work

def spider3():
    pass  # scraping work

def spider4():
    pass  # scraping work

def spider5():
    pass  # scraping work

def spider6():
    pass  # scraping work

def spider7():
    pass  # scraping work

def spider8():
    pass  # scraping work

if __name__ == '__main__':
    p1 = multiprocessing.Process(target=spider1)
    p2 = multiprocessing.Process(target=spider2)
    p3 = multiprocessing.Process(target=spider3)
    p4 = multiprocessing.Process(target=spider4)
    p5 = multiprocessing.Process(target=spider5)
    p6 = multiprocessing.Process(target=spider6)
    p7 = multiprocessing.Process(target=spider7)
    p8 = multiprocessing.Process(target=spider8)
    p1.start()
    p2.start()
    p3.start()
    p4.start()
    p5.start()
    p6.start()
    p7.start()
    p8.start()
If you want to use the number of CPUs to calculate the number of processes to spawn, use cpu_count to find the number of CPUs:

import psutil
psutil.cpu_count()

(The standard library's os.cpu_count() or multiprocessing.cpu_count() work too, without the extra dependency.)
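Rather than hand-wiring one Process per function, you could size a worker pool from the detected core count; a minimal sketch, where the spider body and the URL list are placeholders:

```python
import multiprocessing

def spider(url):
    # placeholder for the real scraping work
    return url

if __name__ == '__main__':
    urls = ["https://example.com/%d" % i for i in range(8)]  # hypothetical inputs
    # one worker per detected CPU core; excess tasks queue up automatically
    with multiprocessing.Pool(processes=multiprocessing.cpu_count()) as pool:
        results = pool.map(spider, urls)
    print(len(results))
```

With a pool, changing the degree of parallelism is a one-line change instead of adding or removing Process objects by hand.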
But using the current CPU utilization to decide how many processes to spawn could be a better approach. To check the CPU utilization, you could do something like:

import psutil
psutil.cpu_times_percent(interval=1, percpu=False)

This gives you the CPU usage, and you could use that information to decide whether or not to spawn a new process. It might be a good idea to keep an eye on memory and swap too.
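One way to act on that reading is a small gate function; a sketch, where the threshold is an arbitrary assumption and the idle percentage would be the `idle` field of the reading above:

```python
def should_spawn(idle_percent, min_idle=25.0):
    # idle_percent: the `idle` field from
    #   psutil.cpu_times_percent(interval=1, percpu=False)
    # min_idle: assumed headroom required before starting another worker
    return idle_percent >= min_idle

# e.g. in a loop, start another spider only while should_spawn(idle) is True
```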
I think this answer might be useful to look at: Limit total CPU usage in python multiprocessing
For a recommendation you would have to give much more information about your use case. Multiprocessing and the associated communication primitives, like queues, introduce overhead. Additionally, benchmarking on a VPS introduces many variables that can heavily skew experimental results.

As a rule of thumb: take the number of cores N and multiply it by a factor that starts at 1.0, increases with independent IO load, and decreases asymptotically toward 1/N with dependent IO load of your tasks. This means that if, for example, your parallel tasks fight over one limited resource, like a spinning hard disk, you should decrease parallelism (lockout cost) and concurrency (task-switching cost from seek time) down to one. No IO leaves you with the number of cores, which you can then run at full burn. Independent IO would lead you to increase the number of tasks running in parallel, so the CPU cores can switch to another task whenever one blocks on an IO operation.
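This rule of thumb can be sketched as a tiny helper; the factor values in the examples are illustrative assumptions, not measurements:

```python
def suggested_workers(n_cores, io_factor):
    # io_factor: 1.0 for pure CPU-bound work; grows above 1.0 when tasks
    # block on independent IO (e.g. separate network requests); shrinks
    # toward 1/n_cores when all tasks contend for one shared device.
    return max(1, round(n_cores * io_factor))

print(suggested_workers(6, 1.0))    # CPU-bound on 6 cores -> 6
print(suggested_workers(6, 2.0))    # independent network IO -> 12
print(suggested_workers(6, 1 / 6))  # fully contended disk IO -> 1
```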