Please consider a class as follows:

    class Foo:
        def __init__(self, data):
            self.data = data

        def do_task(self):
            # do something with data
            ...
In my application I have a list containing several instances of the Foo class. The aim is to execute do_task
for all of the Foo objects. A first implementation is simply:
    # execute do_task for every Foo object instantiated
    for f_obj in my_foo_obj_list:
        f_obj.do_task()
I'd like to take advantage of a multi-core architecture by sharing the for
loop between the 4 CPUs of my machine.
What's the best way to do it?
Key takeaways: Python is not inherently a single-threaded language, but because of the GIL only one thread executes Python bytecode at a time within a process. Despite the GIL, libraries that perform computationally heavy work like numpy, scipy and pytorch use C-based implementations under the hood, which can release the GIL and make use of multiple cores.

To recap, threading in Python allows multiple threads to be created within a single process, but due to the GIL no two of them ever execute Python bytecode at exactly the same time. Threading is still a very good option when it comes to running multiple I/O-bound tasks concurrently.
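As a minimal sketch of the I/O-bound case (the URL list and the `fetch` function are illustrative, with `time.sleep` standing in for real network or disk waits, which release the GIL the same way):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch(url):
    # Placeholder for an I/O-bound call; time.sleep releases the
    # GIL while waiting, just like real network or disk I/O does.
    time.sleep(0.1)
    return f"fetched {url}"

urls = [f"https://example.com/{i}" for i in range(8)]

# The threads overlap their waiting time, so 8 tasks finish in
# roughly 0.1 s rather than 0.8 s.
with ThreadPoolExecutor(max_workers=8) as ex:
    results = list(ex.map(fetch, urls))

print(results[0])  # fetched https://example.com/0
```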
Many computationally expensive tasks, for machine learning and elsewhere, can be parallelized by splitting the work across multiple CPU cores; this is referred to as multi-core processing.

This is why multiprocessing, rather than multithreading, is what provides a large speed increase for CPU-bound Python code: the operating system can schedule each worker process on its own core. Using the threading module for CPU-bound work in Python, or in any other interpreted language with a GIL, can actually result in reduced performance because the threads contend for the lock.
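A minimal sketch of the CPU-bound case (the `square_sum` workload and the helper `run` are illustrative): each worker process has its own interpreter and its own GIL, so the work genuinely runs in parallel across cores.

```python
from multiprocessing import Pool, cpu_count

def square_sum(n):
    # A stand-in CPU-bound computation (illustrative only).
    return sum(i * i for i in range(n))

def run(inputs):
    # One worker process per core; each has its own interpreter
    # and GIL, so the map really executes in parallel.
    with Pool(processes=cpu_count()) as pool:
        return pool.map(square_sum, inputs)

if __name__ == "__main__":
    print(run([10] * 4))  # [285, 285, 285, 285]
```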
Instead of going through all the multithreading/multicore basics, I would like to reference a post by Ryan W. Smith: Multi-Core and Distributed Programming in Python.
He goes into detail on how you can utilize multiple cores and apply those concepts. But please be careful with this if you are not familiar with general multithreading concepts.
A functional programming style will also allow you to customize the algorithm/function dispatched to each core.
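One way to read that last remark, sketched with `functools.partial` (the `process` function, its `scale` parameter, and the `run` helper are all illustrative): fixing extra arguments up front yields the one-argument callable that a pool's map expects, so the same code can dispatch a differently configured function on each run.

```python
from functools import partial
from multiprocessing import Pool

def process(item, scale):
    # 'scale' is fixed per run, customizing the function that
    # every worker process applies.
    return item * scale

def run(data, scale):
    # partial pins the extra argument, producing the one-argument
    # function that pool.map requires.
    with Pool(processes=4) as pool:
        return pool.map(partial(process, scale=scale), data)

if __name__ == "__main__":
    print(run([1, 2, 3, 4], scale=10))  # [10, 20, 30, 40]
```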
You can use process pools from the multiprocessing module:

    from multiprocessing import Pool

    def work(foo):
        # Runs in a worker process; foo arrives as a pickled copy,
        # so changes made to it here are not visible in the parent.
        foo.do_task()

    if __name__ == "__main__":
        # Pool() defaults to os.cpu_count() workers;
        # pass Pool(4) to pin it to four processes.
        with Pool() as pool:
            pool.map(work, my_foo_obj_list)
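An equivalent sketch with `concurrent.futures.ProcessPoolExecutor` (the `Foo.do_task` body and the `run` helper are placeholders): because each instance is pickled into the worker, it is more robust to have `do_task` return its result rather than mutate the object in place.

```python
from concurrent.futures import ProcessPoolExecutor

class Foo:
    def __init__(self, data):
        self.data = data

    def do_task(self):
        return self.data * 2  # placeholder work

def work(foo):
    # foo arrives in the worker as a pickled copy; return the
    # result instead of relying on in-place mutation.
    return foo.do_task()

def run(foos):
    with ProcessPoolExecutor(max_workers=4) as ex:
        return list(ex.map(work, foos))

if __name__ == "__main__":
    print(run([Foo(i) for i in range(4)]))  # [0, 2, 4, 6]
```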