
The only overhead incurred by fork is page table duplication and process ID creation

The only overhead incurred by fork() is the duplication of the parent’s page tables and the creation of a unique process descriptor for the child. In Linux, fork() is implemented through the use of copy-on-write pages. Copy-on-write (or COW) is a technique to delay or altogether prevent copying of the data.

So why do the page tables need to be copied? As long as the processes share the pages in read-only mode, or until one of them writes something, there seems to be no need to copy the page tables, because the translation is the same for both the parent and the child.

Can someone please explain?

Thanks in advance.

user1071758 asked May 23 '13 22:05




1 Answer

Because COW relies on the shared pages being mapped read-only, we need a copy of the page table in which every entry is read-only. When the new process writes somewhere, it takes a page fault as a consequence of writing to a read-only page. The page-fault handler looks at the status of the page and determines whether it is supposed to be writable. If not, the process gets a segfault, just as it would when writing to read-only memory in the original process. If so, the handler copies the relevant original page for the new process.
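The visible effect of this can be checked from user space. The following is a minimal sketch, assuming a POSIX system with 4 KiB pages; the helper name `parent_unchanged_after_child_write` is made up for illustration. It only shows the semantics that COW has to preserve (the child's writes never reach the parent's data), not the page-fault mechanism itself, which is not observable this way.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original answer).
 * Returns 1 if the parent's buffer is untouched after the child overwrote
 * its own view of it, 0 if the parent saw the child's writes, -1 on error.
 */
int parent_unchanged_after_child_write(void)
{
    const size_t len = 4 * 4096;   /* a few pages; 4 KiB page size assumed */
    char *buf = malloc(len);
    if (buf == NULL)
        return -1;
    memset(buf, 'P', len);         /* parent touches every page before fork */

    pid_t pid = fork();
    if (pid < 0) {
        free(buf);
        return -1;
    }
    if (pid == 0) {
        /* Child: with COW, the first store to each shared page faults and
         * the kernel hands the child a private copy of that page. */
        memset(buf, 'C', len);
        _exit(buf[0] == 'C' && buf[len - 1] == 'C' ? 0 : 1);
    }

    assert(pid > 0);               /* only the parent reaches this point */
    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    int child_ok = (waited == pid) && WIFEXITED(status)
                   && WEXITSTATUS(status) == 0;
    int unchanged = (buf[0] == 'P' && buf[len - 1] == 'P');
    free(buf);

    if (!child_ok)
        return -1;
    return unchanged;
}
```

Calling the helper from a `main` and checking for a return value of 1 confirms that the two processes ended up with separate copies of the written pages.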

The original page table is read-write for some of its entries, so at least those have to be copied. I believe the entire page table is copied, because it makes some other code simpler and a page-table entry is not very large: four or eight bytes per 4 KB page, plus one next-level entry for each full table of those (every 4096 KB with 4-byte entries, every 2048 KB with 8-byte entries), and so on up the hierarchy.
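To get a feel for the sizes involved, here is a rough back-of-the-envelope sketch. It assumes an x86-style layout (4 KiB pages, each table filling one page) and a single contiguous, aligned mapping; `page_table_bytes` is a hypothetical helper, and it counts only the entries in use, not whole table pages the kernel would allocate.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL   /* assumed page size: 4 KiB */

/*
 * Hypothetical helper: bytes of page-table entries needed, summed over
 * `levels` levels of the hierarchy, to map `mapped_bytes` of contiguous
 * memory. `entry_size` is 4 or 8 bytes, as in the answer above.
 */
uint64_t page_table_bytes(uint64_t mapped_bytes, uint64_t entry_size,
                          unsigned levels)
{
    assert(entry_size == 4 || entry_size == 8);

    /* Each table occupies one page: 1024 or 512 entries per table. */
    uint64_t entries_per_table = PAGE_SIZE / entry_size;

    /* Lowest level: one entry per mapped page. */
    uint64_t units = (mapped_bytes + PAGE_SIZE - 1) / PAGE_SIZE;

    uint64_t total = 0;
    for (unsigned level = 0; level < levels; level++) {
        total += units * entry_size;
        /* The next level needs one entry per table at this level. */
        units = (units + entries_per_table - 1) / entries_per_table;
    }
    return total;
}
```

Under these assumptions, mapping 1 GiB with 8-byte entries and four levels needs about 2 MiB of entries, nearly all of it at the lowest level, which is small compared with copying the 1 GiB of data itself.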

There are also some interesting aspects if, for example, we have some code that does:

    char *ptr = malloc(big_number);
    // Fill ptr[...] with some data.
    if (!fork())
    {
        // Child process works on ptr data.
        ...
    }
    else
    {
        // Parent process no longer needs the data.
        free(ptr);
    }

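Now the page-table entries for that memory will be removed in the parent process (a large allocation is typically unmapped when it is freed). If the parent were sharing those entries with the child, the kernel would need to know that they are shared, so that the child's mappings are not torn down along with the parent's.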
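A runnable variant of the scenario above might look like the following sketch. It assumes a POSIX system; the helper name `child_data_survives_parent_free`, the 8 MiB size, the fill byte, and the pipe used to order the two processes are all choices made for this illustration and are not part of the original answer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only).
 * Returns 1 if the child still reads intact data after the parent has
 * freed its copy, 0 if the data was damaged, -1 on error.
 */
int child_data_survives_parent_free(void)
{
    /* Assumption: 8 MiB is large enough that common allocators back the
     * allocation with its own mapping and unmap it again on free(). */
    const size_t big_number = 8u * 1024 * 1024;
    int go[2];
    if (pipe(go) != 0)
        return -1;

    char *ptr = malloc(big_number);
    if (ptr == NULL) {
        close(go[0]);
        close(go[1]);
        return -1;
    }
    memset(ptr, 0x5A, big_number);     /* fill ptr[...] with some data */

    pid_t pid = fork();
    if (pid < 0) {
        free(ptr);
        close(go[0]);
        close(go[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: wait until the parent reports that it has called free(). */
        char token;
        close(go[1]);
        if (read(go[0], &token, 1) != 1)
            _exit(2);
        /* The child's own mapping must still be valid and unchanged. */
        int intact = 1;
        for (size_t i = 0; i < big_number; i += 4096)
            if (ptr[i] != 0x5A)
                intact = 0;
        _exit(intact ? 0 : 1);
    }

    assert(pid > 0);                   /* only the parent reaches this point */
    close(go[0]);
    free(ptr);                         /* parent drops its mapping first */
    char token = 'x';
    ssize_t written = write(go[1], &token, 1);
    close(go[1]);

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    if (written != 1 || waited != pid || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 0)
        return 1;
    return WEXITSTATUS(status) == 1 ? 0 : -1;
}
```

The pipe guarantees that the child inspects its data only after the parent's `free()` has returned, which is the ordering the paragraph above is concerned with.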

Many similar problems arise when sending or receiving data over the network, writing to disk, swapping pages in and out, and so on.

Mats Petersson answered Nov 15 '22 10:11
