I have defined this function:

import random

def writeonfiles(a, seed):
    random.seed(seed)
    f = open(a, "w+")
    for i in range(0, 10):
        j = random.randint(0, 10)
        # print j
        f.write(str(j))  # write() expects a string, not an int
    f.close()
Where a is a string containing the path of the file and seed is an integer seed. I want to parallelize a simple program so that each core takes one of the paths I pass in, seeds its random generator, and writes some random numbers to that file. For example, if I pass the vector
vector = ["Test/file1.txt", "Test/file2.txt"]
and the seeds
seeds = (123412, 989898),
it gives to the first available core the function
writeonfiles("Test/file1.txt", 123412)
and to the second one the same function with different arguments:
writeonfiles("Test/file2.txt", 989898)
I have looked through a lot of similar questions here on Stack Overflow, but I cannot make any solution work. What I tried is:
import multiprocessing

def writeonfiles_unpack(args):
    return writeonfiles(*args)

if __name__ == "__main__":
    folder = ["Test/%d.csv" % i for i in range(0, 4)]
    seed = [234124, 663123, 12345, 123833]
    p = multiprocessing.Pool()
    p.map(writeonfiles, (folder, seed))
and gives me TypeError: writeonfiles() takes exactly 2 arguments (1 given).
I tried also
import multiprocessing

if __name__ == "__main__":
    folder = ["Test/%d.csv" % i for i in range(0, 4)]
    seed = [234124, 663123, 12345, 123833]
    p = multiprocessing.Process(target=writeonfiles, args=[folder, seed])
    p.start()
But it gives me

  File "/usr/lib/python2.7/random.py", line 120, in seed
    super(Random, self).seed(a)
TypeError: unhashable type: 'list'
Finally, I tried the contextmanager approach:
import multiprocessing
from contextlib import contextmanager

@contextmanager
def poolcontext(*args, **kwargs):
    pool = multiprocessing.Pool(*args, **kwargs)
    yield pool
    pool.terminate()

if __name__ == "__main__":
    folder = ["Test/%d" % i for i in range(0, 4)]
    seed = [234124, 663123, 12345, 123833]
    a = zip(folder, seed)
    with poolcontext(processes=3) as pool:
        results = pool.map(writeonfiles_unpack, a)
and it results in

  File "/usr/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
TypeError: 'module' object is not callable
The pool's map method chops the given iterable into a number of chunks, which it submits to the process pool as separate tasks. Pool.map is the parallel equivalent of the built-in map function, and it blocks the main process until all computations finish. Pool accepts the number of worker processes as a parameter.
Pool creates multiple Python processes in the background and spreads your computations across multiple CPU cores so that they all run in parallel, without you needing to manage the workers yourself. It can handle an enormous number of jobs, because it queues them and runs multiple jobs per worker process. Memory is allocated only for the fixed set of worker processes, unlike the one-Process-per-task approach, which allocates a process for every job.
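As a minimal, self-contained sketch of that behavior (the square function and the pool size of 4 are placeholders invented for this example):

import multiprocessing

def square(x):
    return x * x

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=4)   # four worker processes
    results = pool.map(square, range(10))      # blocks until every chunk is done
    pool.close()
    pool.join()
    print(results)   # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]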
Python 2.7 lacks the starmap pool method from Python 3.3+. You can overcome this by decorating your target function with a wrapper that unpacks the argument tuple and calls the target function:
import os
from multiprocessing import Pool
import random
from functools import wraps

def unpack(func):
    @wraps(func)
    def wrapper(arg_tuple):
        return func(*arg_tuple)
    return wrapper

@unpack
def write_on_files(a, seed):
    random.seed(seed)
    print("%d opening file %s" % (os.getpid(), a))  # simulate
    for _ in range(10):
        j = random.randint(0, 10)
        print("%d writing %d to file %s" % (os.getpid(), j, a))  # simulate

if __name__ == '__main__':
    folder = ["Test/%d.csv" % i for i in range(0, 4)]
    seed = [234124, 663123, 12345, 123833]
    arguments = zip(folder, seed)

    pool = Pool(4)
    pool.map(write_on_files, iterable=arguments)
    pool.close()
    pool.join()
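For reference, on Python 3.3+ you could drop the unpack wrapper and use Pool.starmap, which unpacks each argument tuple for you (a sketch, assuming write_on_files is defined without the @unpack decorator):

from multiprocessing import Pool

if __name__ == '__main__':
    folder = ["Test/%d.csv" % i for i in range(0, 4)]
    seed = [234124, 663123, 12345, 123833]
    arguments = list(zip(folder, seed))

    pool = Pool(4)
    pool.starmap(write_on_files, arguments)  # each (path, seed) tuple is unpacked
    pool.close()
    pool.join()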