I have some complex class A that computes data (large matrix calculations) while consuming input data from class B.
A itself uses multiple cores. However, when A needs the next chunk of data, it waits for quite some time since B runs in the same main thread.
Since A mainly uses the GPU for computations, I would like to have B collecting data concurrently on the CPU.
My latest approach was:
# every time *A* needs data
def some_computation_method(self):
data = B.get_data()
# start computations with data
...and B looks approximately like this:
class B(object):
def __init__(self, ...):
...
self._queue = multiprocessing.Queue(10)
loader = multiprocessing.Process(target=self._concurrent_loader)
def _concurrent_loader(self):
while True:
if not self._queue.full():
# here: data loading from disk and pre-processing
# that requires access to instance variables
# like self.path, self.batch_size, ...
self._queue.put(data_chunk)
else:
# don't eat CPU time if A is too busy to consume
# the queue at the moment
time.sleep(1)
def get_data(self):
return self._queue.get()
Could this approach be considered a "pythonic" solution?
Since I have not much experience with Python's multiprocessing module, I've built an easy/simplistic approach. However, it looks kind of "hacky" to me.
What would be a better solution to have a class B loading data from disk concurrently and supplying it via some queue, while the main thread runs heavy computations and consumes data from the queue from time to time?
While your solution is perfectly OK, especially for "small" projects, it has the downside of the threading getting tightly coupled with the class B
. Hence if you (for example) for some reason wanted to use B
in a non-threaded manner, your out of luck.
I would personally write the class in a thread safe manner, and then call it using threads from outside:
class B(object):
def __init__(self):
self._queue = multiprocessing.Queue(10)
...
if __name__ == '__main__':
b = B()
loader = multiprocessing.Process(target=b._concurrent_loader)
loader.start()
This makes B
more flexible, separates dependencies better and is easier to test. It also makes the code more readable by being explicit about the thread creation, as compared to it happening implicitly on class creation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With