I'm new in python threading and I'm experimenting this:
When I run something in threads (whenever I print outputs), it never seems to be running in parallel. Also, my functions take the same time that before using the library concurrent.futures (ThreadPoolExecutor).
I have to calculate the gains of some attributes over a dataset (I cannot use libraries). Since I have about 1024 attributes and the function was taking about a minute to execute (and I have to use it in a for iteration) I dicided to split the array of attributes
into 10 (just as an example) and run the separete function gain(attribute)
separetly for each sub array. So I did the following (avoiding some extra unnecessary code):
def calculate_gains(self):
splited_attributes = np.array_split(self.attributes, 10)
result = {}
for atts in splited_attributes:
with concurrent.futures.ThreadPoolExecutor() as executor:
future = executor.submit(self.calculate_gains_helper, atts)
return_value = future.result()
self.gains = {**self.gains, **return_value}
Here's the calculate_gains_helper:
def calculate_gains_helper(self, attributes):
inter_result = {}
for attribute in attributes:
inter_result[attribute] = self.gain(attribute)
return inter_result
Am I doing something wrong? I read some other older posts but I couldn't get any info. Thanks a lot for any help!
In fact, a Python process cannot run threads in parallel but it can run them concurrently through context switching during I/O bound operations. This limitation is actually enforced by GIL. The Python Global Interpreter Lock (GIL) prevents threads within the same process to be executed at the same time.
ThreadPoolExecutor Thread-Safety Although the ThreadPoolExecutor uses threads internally, you do not need to work with threads directly in order to execute tasks and get results. Nevertheless, when accessing resources or critical sections, thread-safety may be a concern.
ThreadPoolExecutor. ThreadPoolExecutor is an Executor subclass that uses a pool of threads to execute calls asynchronously. An Executor subclass that uses a pool of at most max_workers threads to execute calls asynchronously.
ThreadPoolExecutor Methods : submit(fn, *args, **kwargs): It runs a callable or a method and returns a Future object representing the execution state of the method. map(fn, *iterables, timeout = None, chunksize = 1) : It maps the method and iterables together immediately and will raise an exception concurrent. futures.
Python threads do not run in parallel (at least in CPython implementation) because of the GIL. Use processes and ProcessPoolExecutor to really have parallelism
with concurrent.futures.ProcessPoolExecutor() as executor:
...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With