
Python: How does os.fork() work?

Tags:

python

fork

I am learning multiprocessing in Python. After trying the multiprocessing module and reading its source code, I found that it uses os.fork(), so I wrote some code to test os.fork(), but I am stuck. My code is as follows:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-

    import os
    import time

    for i in range(2):
        print '**********%d***********' % i
        pid = os.fork()
        print "Pid %d" % pid

I expected each print to be executed twice, but they are executed three times. I can't understand how this works. I read this: Need to know how fork works?
From what that article says, each print should also be executed twice, so I am stuck...

asked Nov 06 '15 at 06:11 by tudouya



1 Answer

First of all, remove that print '******...' line. It just confuses everyone. Instead, let's try this code...

    import os
    import time

    for i in range(2):
        print("I'm about to be a dad!")
        time.sleep(5)
        pid = os.fork()
        if pid == 0:
            print("I'm {}, a newborn that knows to write to the terminal!".format(os.getpid()))
        else:
            print("I'm the dad of {}, and he knows to use the terminal!".format(pid))
            os.waitpid(pid, 0)

Okay, first of all, what is "fork"? Fork is a feature of modern, standards-compliant operating systems (except M$ Windows: that joke of an OS is anything but modern and standards-compliant) that allows a process (a.k.a. a running "program", and that includes the Python interpreter!) to literally make an exact duplicate of itself, effectively creating a new process (another instance of the "program"). Once that magic is done, both processes are independent. Changing anything in one of them does not affect the other one.
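To make the "independent" part concrete, here is a minimal sketch, assuming a Unix-like system where os.fork() is available. The function name child_changes_do_not_leak is made up for this illustration. The child overwrites a variable and exits, and the parent then checks its own copy.

```python
import os


def child_changes_do_not_leak():
    """Fork once, modify a variable in the child, and return what the parent sees."""
    counter = 0
    pid = os.fork()
    if pid == 0:
        # Child: this assignment only touches the child's own copy of memory.
        counter = 100
        # os._exit() ends the child right away, so it never runs the
        # parent's code that follows the fork.
        os._exit(0)
    # Parent: wait for the child to finish, then look at our own copy.
    os.waitpid(pid, 0)
    return counter


if __name__ == "__main__":
    print(child_changes_do_not_leak())  # prints 0
```

The parent still sees 0, because the child's assignment happened in a separate copy of the process.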

The process responsible for spelling out this dark and ancient incantation is known as the parent process. The soulless result of this immoral abomination towards life itself is known as the child process.

As should be obvious to all, including those for whom it isn't, you can become a member of that select group of programmers who have sold their soul by means of os.fork(). This function performs a fork operation, and thus results in a second process being created out of thin air.

Now, what does this function return, or more importantly, how does it even return? If you don't want to go insane, please don't go and read the Linux kernel's kernel/fork.c file! Once the kernel does what we know it has to do, but we don't want to accept it, os.fork() returns in both processes! Yes, even the call stack is copied over!

So, if they are exact copies, how does one differentiate between parent and child? Simple. If the result of os.fork() is zero, then you're working in the child. Otherwise, you're working in the parent, and the return value is the PID (Process IDentifier) of the child. Anyway, the child can get its own PID from os.getpid(), no?
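A small sketch of that check, assuming a Unix-like system. The helper name fork_once and the exit code 7 are arbitrary choices for illustration and are not part of the original code.

```python
import os
import sys


def fork_once():
    """Fork once and return (child PID as seen by the parent, child's exit code)."""
    # Flush pending output first so the child does not inherit a buffer
    # with text in it and print that text a second time.
    sys.stdout.flush()
    pid = os.fork()
    if pid == 0:
        # Child branch: os.fork() returned 0 here.
        print("child: fork() returned 0, my PID is", os.getpid(), flush=True)
        os._exit(7)  # arbitrary exit code, picked for this example
    # Parent branch: os.fork() returned the child's PID, a positive integer.
    _, status = os.waitpid(pid, 0)
    return pid, os.WEXITSTATUS(status)


if __name__ == "__main__":
    child_pid, exit_code = fork_once()
    print("parent: fork() returned", child_pid, "and the child exited with", exit_code)
```

The PID printed by the child and the value returned to the parent are the same number, since both refer to the child process.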

Now, taking this into account, and the fact that calling fork() inside a loop is a recipe for a mess, this is what happens. Let's call the original process the "master" process...

  • Master: i = 0, forks into child-#1-of-master
    • Child-#1-of-master: i = 1, forks into child-#1-of-child-#1-of-master
      • Child-#1-of-child-#1-of-master: for loop over, exits
    • Child-#1-of-master: for loop over, exits
  • Master: i = 1, forks into child-#2-of-master
    • Child-#2-of-master: born during the last iteration, so it forks nothing; for loop over, exits
  • Master: for loop over, exits
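The process tree above can be checked with a rough sketch like the following, assuming a Unix-like system. The function name count_processes and the pipe-based counting are my own illustration, not something from the question. Every process that leaves the loop writes one byte to a shared pipe, and the original process counts the bytes once all descendants have exited.

```python
import os


def count_processes(iterations=2):
    """Fork inside a loop and return how many processes took part in total."""
    read_fd, write_fd = os.pipe()
    root_pid = os.getpid()
    children = []
    for _ in range(iterations):
        pid = os.fork()
        if pid == 0:
            # A newborn child has no children of its own yet, but it
            # keeps running the remaining iterations of the loop.
            children = []
        else:
            children.append(pid)
    # Every process (the original one included) gets here exactly once.
    os.write(write_fd, b"x")
    # Wait for our direct children, so nobody exits before its descendants.
    for pid in children:
        os.waitpid(pid, 0)
    if os.getpid() != root_pid:
        os._exit(0)
    # Only the original process continues. All other copies of the write
    # end are closed by now, so reading stops at end-of-file.
    os.close(write_fd)
    data = b""
    while True:
        chunk = os.read(read_fd, 1024)
        if not chunk:
            break
        data += chunk
    os.close(read_fd)
    return len(data)


if __name__ == "__main__":
    print(count_processes(2))  # prints 4
```

Each pass through the loop doubles the number of running processes, which is why two iterations end with 4 processes and 3 forks in total.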

As you can see, there are 3 forks among 4 unique processes, and each fork produces one parent print and one child print, resulting in 6 such lines of output (leaving out the "I'm about to be a dad!" lines), something like...

I'm the dad of 12120, and he knows to use the terminal!

I'm 12120, a newborn that knows to write to the terminal!

I'm the dad of 12121, and he knows to use the terminal!

I'm 12121, a newborn that knows to write to the terminal!

I'm the dad of 12122, and he knows to use the terminal!

I'm 12122, a newborn that knows to write to the terminal!

But that's just arbitrary; it could just as well have printed this instead...

I'm 12120, a newborn that knows to write to the terminal!

I'm the dad of 12120, and he knows to use the terminal!

I'm 12121, a newborn that knows to write to the terminal!

I'm the dad of 12121, and he knows to use the terminal!

I'm 12122, a newborn that knows to write to the terminal!

I'm the dad of 12122, and he knows to use the terminal!

Or some other interleaving. The OS (and your motherboard's funky clocks) is solely responsible for the order in which processes get timeslices, so go blame Torvalds (and expect no self-esteem when you get back) if you dislike how the kernel manages to organize your processes ;).

I hope this has shed some light on this for you!

answered Sep 17 '22 at 15:09 by 3442