I am trying to implement the multiprocessing
module for a working with a large csv file. I am using Python 2.7 and following the example from here.
I ran the unmodified code (copied below for convenience) and noticed that print
statements within the worker
function do not work. The inability to print
makes it difficult to understand the flow and debug.
Can anyone please explain why print
is not working here? Does pool.map not execute print commands? I searched online but did not find any documentation that would indicate this.
import multiprocessing as mp
import itertools
import time
import csv
def worker(chunk):
# `chunk` will be a list of CSV rows all with the same name column
# replace this with your real computation
print(chunk) # <----- nothing prints
print 'working' # <----- nothing prints
return len(chunk)
def keyfunc(row):
# `row` is one row of the CSV file.
# replace this with the name column.
return row[0]
def main():
pool = mp.Pool()
largefile = 'test.dat'
num_chunks = 10
results = []
with open(largefile) as f:
reader = csv.reader(f)
chunks = itertools.groupby(reader, keyfunc)
while True:
# make a list of num_chunks chunks
groups = [list(chunk) for key, chunk in
itertools.islice(chunks, num_chunks)]
if groups:
result = pool.map(worker, groups)
results.extend(result)
else:
break
pool.close()
pool.join()
print(results)
if __name__ == '__main__':
main()
The pool's map method chops the given iterable into a number of chunks which it submits to the process pool as separate tasks. The pool's map is a parallel equivalent of the built-in map method. The map blocks the main execution until all computations finish. The Pool can take the number of processes as a parameter.
Using Pool. The Pool class in multiprocessing can handle an enormous number of processes. It allows you to run multiple jobs per process (due to its ability to queue the jobs). The memory is allocated only to the executing processes, unlike the Process class, which allocates memory to all the processes.
Pool. apply_async is also like Python's built-in apply , except that the call returns immediately instead of waiting for the result. An AsyncResult object is returned. You call its get() method to retrieve the result of the function call.
Daemon processes in Python Python multiprocessing module allows us to have daemon processes through its daemonic option. Daemon processes or the processes that are running in the background follow similar concept as the daemon threads. To execute the process in the background, we need to set the daemonic flag to true.
This is an issue with IDLE, which you're using to run your code. IDLE does a fairly basic emulation of a terminal for handling the output of a program you run in it. It cannot handle subprocesses though, so while they'll run just fine in the background, you'll never see their output.
The simplest fix is to simply run your code from the command line.
An alternative might be to use a more sophisticated IDE. There are a bunch of them listed on the Python wiki, though I'm not sure which ones have better terminal emulation for multiprocessing output.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With