Difference in behavior between os.fork and multiprocessing.Process

I have this code:

import os

pid = os.fork()

if pid == 0:
    os.environ['HOME'] = "rep1"
    external_function()
else:
    os.environ['HOME'] = "rep2"
    external_function()

and this code:

import os
from multiprocessing import Process, Pipe

def f(conn):
    # Runs in the child process
    os.environ['HOME'] = "rep1"
    external_function()
    conn.send(some_data)
    conn.close()

if __name__ == '__main__':
    # Runs in the parent process, before the child is started
    os.environ['HOME'] = "rep2"
    external_function()
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(child_conn,))
    p.start()
    print(parent_conn.recv())
    p.join()

The external_function initializes an external program by creating the necessary sub-directories in the directory named by the environment variable HOME. It does this work only once in each process.
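
To make the two snippets easier to reproduce, here is a minimal sketch of what external_function might look like. The real function is not shown, so everything here is an assumption for illustration: the module-level `_initialized` flag, the sub-directory names, and the True/False return value are all made up.

```python
import os

# Hypothetical stand-in: the real external_function is not shown in the question.
_initialized = False  # assumed module-level "already done" guard


def external_function():
    """Create sub-directories under $HOME, at most once per process."""
    global _initialized
    if _initialized:
        return False  # already initialized in this process: do nothing
    home = os.environ['HOME']
    for sub in ('config', 'cache'):  # sub-directory names are invented
        path = os.path.join(home, sub)
        if not os.path.isdir(path):
            os.makedirs(path)
    _initialized = True
    return True
```

With a guard like this, a second call in the same process returns without touching the filesystem, even if HOME has been changed in between.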

With the first example, which uses os.fork(), the directories are created as expected in both rep1 and rep2. But with the second example, which uses multiprocessing, only the directories in rep2 get created.

Why isn't the second example creating directories in both rep1 and rep2?