
Python 2.7: How to compensate for missing pool.starmap?

I have defined this function:

def writeonfiles(a,seed):
    random.seed(seed)

    f = open(a, "w+")
    for i in range(0,10):
        j = random.randint(0,10)
        #print j
        f.write("%d\n" % j)  # write() needs a string, not an int
    f.close()

Here a is a string containing the path of the file and seed is an integer seed. I want to parallelize a simple program so that each core takes one of the paths I pass in, seeds its random generator, and writes some random numbers to that file. For example, if I pass the vector

vector = ["Test/file1.txt", "Test/file2.txt"]

and the seeds

seeds = (123412, 989898)

it should give the first available core the call

writeonfiles("Test/file1.txt", 123412)

and the second one the same function with different arguments:

writeonfiles("Test/file2.txt", 989898)

I have looked through a lot of similar questions here on Stack Overflow, but I could not get any of the solutions to work. What I tried is:

def writeonfiles_unpack(args):
    return writeonfiles(*args)

if __name__ == "__main__":
    folder = ["Test/%d.csv" % i for i in range(0, 4)]
    seed = [234124, 663123, 12345, 123833]
    p = multiprocessing.Pool()
    p.map(writeonfiles, (folder, seed))

This gives me TypeError: writeonfiles() takes exactly 2 arguments (1 given).

I also tried

if __name__ == "__main__":
    folder = ["Test/%d.csv" %i for i in range(0,4)]
    seed = [234124, 663123, 12345 ,123833]
    p = multiprocessing.Process(target=writeonfiles, args= [folder,seed])
    p.start()

But it gives me

File "/usr/lib/python2.7/random.py", line 120, in seed
    super(Random, self).seed(a)
TypeError: unhashable type: 'list'

Finally, I tried a context manager:

@contextmanager
def poolcontext(*args, **kwargs):
    pool = multiprocessing.Pool(*args, **kwargs)
    yield pool
    pool.terminate()

if __name__ == "__main__":
    folder = ["Test/%d" % i for i in range(0, 4)]
    seed = [234124, 663123, 12345, 123833]
    a = zip(folder, seed)
    with poolcontext(processes=3) as pool:
        results = pool.map(writeonfiles_unpack, a)

and it results in

File "/usr/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
TypeError: 'module' object is not callable

Francesco Di Lauro asked Oct 04 '18 16:10


1 Answer

Python 2.7 lacks the starmap pool method from Python 3.3+. You can work around this by decorating your target function with a wrapper that unpacks the argument tuple and calls the target function:

import os
from multiprocessing import Pool
import random
from functools import wraps


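def unpack(func):
    """Let func be called with one tuple, since pool.map passes a single argument."""
    @wraps(func)  # keep func's name so the wrapper can be pickled under it
    def wrapper(arg_tuple):
        return func(*arg_tuple)
    return wrapper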

@unpack
def write_on_files(a, seed):
    random.seed(seed)
    print("%d opening file %s" % (os.getpid(), a))  # simulate
    for _ in range(10):
        j = random.randint(0, 10)
        print("%d writing %d to file %s" % (os.getpid(), j, a))  # simulate


if __name__ == '__main__':

    folder = ["Test/%d.csv" % i for i in range(0, 4)]
    seed = [234124, 663123, 12345, 123833]

    arguments = zip(folder, seed)

    pool = Pool(4)
    pool.map(write_on_files, iterable=arguments)
    pool.close()
    pool.join()
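
The code above only simulates the writes with print. A minimal sketch of the same wrapper approach that actually writes the numbers to files might look like the following. The helper run_demo, the temporary output directory, the pool size of 2, and the per-call random.Random(seed) instance are assumptions made for illustration and are not part of the original code. The sketch avoids Python 3-only features, so it should also run under 2.7.

```python
import os
import random
import tempfile
from functools import wraps
from multiprocessing import Pool


def unpack(func):
    """Wrap func so it accepts one tuple and unpacks it into arguments."""
    @wraps(func)
    def wrapper(arg_tuple):
        return func(*arg_tuple)
    return wrapper


@unpack
def write_on_files(path, seed):
    # Assumption: a private generator per call instead of the global
    # random.seed(), so each task's numbers depend only on its own seed.
    rng = random.Random(seed)
    with open(path, "w") as f:
        for _ in range(10):
            # write() needs a string, so format the integer first
            f.write("%d\n" % rng.randint(0, 10))
    return path


def run_demo(directory, seeds):
    """Hypothetical helper: write one file per seed using a small pool."""
    paths = [os.path.join(directory, "%d.csv" % i) for i in range(len(seeds))]
    pool = Pool(2)
    try:
        # Each (path, seed) tuple is one item, so map hands it over as
        # the single argument that the unpack wrapper expects.
        return pool.map(write_on_files, list(zip(paths, seeds)))
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    print(run_demo(tempfile.mkdtemp(), [234124, 663123, 12345, 123833]))
```

Because the decorated function takes a single tuple, it can also be called directly as write_on_files((path, seed)), which is convenient for checking one file without starting a pool.
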
Darkonaut answered Nov 15 '22 01:11