Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

multiprocessing.Pool in jupyter notebook works on linux but not windows

Tags:

I'm trying to run a few independent computations (though reading from the same data). My code works when I run it on Ubuntu, but not on Windows (windows server 2012 R2), where I get the error:

'module' object has no attribute ...

when I try to use multiprocessing.Pool (it appears in the kernel console, not as output in the notebook itself)

(And I've already made the mistake of defining the function AFTER creating the pool, and I've also corrected it, that's not the problem).

This happens even on the simplest of examples:

from multiprocessing import Pool
def f(x):
    return x**2
pool = Pool(4)
for res in pool.map(f,range(20)):
    print res

I know that it needs to be able to import the module (and I have no idea how this works when working in the notebook), and I've heard of IPython.Parallel, but I have been unable to find any documentation or examples.

Any solutions/alternatives would be most welcome.

like image 739
user1999728 Avatar asked May 08 '16 18:05

user1999728


People also ask

Does Python multiprocessing work on Windows?

The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.

How do I run Jupyter notebook locally windows?

Windows File Explorer + Command Prompt Once you've entered your specific folder with Windows Explorer, you can simply press ALT + D, type in cmd and press Enter. You can then type jupyter notebook to launch Jupyter Notebook within that specific folder.

How do processes pools work in multiprocessing?

Pool allows multiple jobs per process, which may make it easier to parallel your program. If you have a numbers jobs to run in parallel, you can make a Pool with number of processes the same number of as CPU cores and after that pass the list of the numbers jobs to pool. map.

What is the difference between pool and process in multiprocessing?

While the Process keeps all the processes in the memory, the Pool keeps only those that are under execution. Therefore, if you have a large number of tasks, and if they have more data and take a lot of space too, then using process class might waste a lot of memory. The overhead of creating a Pool is more.


1 Answers

I would post this as a comment since I don't have a full answer, but I'll amend as I figure out what is going on.

from multiprocessing import Pool

def f(x):
    return x**2

if __name__ == '__main__':
    pool = Pool(4)
    for res in pool.map(f,range(20)):
        print(res)

This works. I believe the answer to this question is here. In short, the subprocesses do not know they are subprocesses and are attempting to run the main script recursively.

This is the error I am given, which gives us the same solution:

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
like image 97
GRAYgoose124 Avatar answered Oct 18 '22 11:10

GRAYgoose124