I was having a discussion about the relative cost of fork() Vs thread() for parallelization of a task. We understand the basic differences between processes Vs Thread Thread: <ul> <li>Easy to communicate between threads</li> <li>Fast context switching.</li> </ul> Processes: <ul> <li>Fault tolerance.</li> <li>Communicating with parent not a real problem (open a pipe)</li> <li>Communication with other child processes hard</li> </ul> But we disagreed on the start-up cost of processes Vs threads. So to test the theories I wrote the following code. My question: Is this a valid test of measuring the start-up cost or I am missing something. Also I would be interested in how each test performs on different platforms. <h3>fork.cpp</h3> <pre class="prettyprint"><code>#include <boost/lexical_cast.hpp> #include <vector> #include <unistd.h> #include <iostream> #include <stdlib.h> #include <time.h> extern "C" int threadStart(void* threadData) { return 0; } int main(int argc,char* argv[]) { int threadCount = boost::lexical_cast<int>(argv[1]); std::vector<pid_t> data(threadCount); clock_t start = clock(); for(int loop=0;loop < threadCount;++loop) { data[loop] = fork(); if (data[looo] == -1) { std::cout << "Abort\n"; exit(1); } if (data[loop] == 0) { exit(threadStart(NULL)); } } clock_t middle = clock(); for(int loop=0;loop < threadCount;++loop) { int result; waitpid(data[loop], &result, 0); } clock_t end = clock(); std::cout << threadCount << "\t" << middle - start << "\t" << end - middle << "\t"<< end - start << "\n"; } </code></pre> <h3>Thread.cpp</h3> <pre class="prettyprint"><code>#include <boost/lexical_cast.hpp> #include <vector> #include <iostream> #include <pthread.h> #include <time.h> extern "C" void* threadStart(void* threadData) { return NULL; } int main(int argc,char* argv[]) { int threadCount = boost::lexical_cast<int>(argv[1]); std::vector<pthread_t> data(threadCount); clock_t start = clock(); for(int loop=0;loop < threadCount;++loop) { if (pthread_create(&data[loop], NULL, threadStart, NULL) != 0) { std::cout << "Abort\n"; exit(1); } } clock_t middle = clock(); for(int loop=0;loop < threadCount;++loop) { void* result; pthread_join(data[loop], &result); } clock_t end = clock(); std::cout << threadCount << "\t" << middle - start << "\t" << end - middle << "\t"<< end - start << "\n"; } </code></pre> I expect Windows to do worse in processes creation. But I would expect modern Unix like systems to have a fairly light fork cost and be at least comparable to thread. On older Unix style systems (before fork() was implemented as using copy on write pages) that it would be worse. Anyway My timing results are: <pre class="prettyprint"><code>> uname -a Darwin Alpha.local 10.4.0 Darwin Kernel Version 10.4.0: Fri Apr 23 18:28:53 PDT 2010; root:xnu-1504.7.4~1/RELEASE_I386 i386 > gcc --version | grep GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5659) > g++ thread.cpp -o thread -I~/include > g++ fork.cpp -o fork -I~/include > foreach a ( 1 2 3 4 5 6 7 8 9 10 12 15 20 30 40 50 60 70 80 90 100 ) foreach? ./thread ${a} >> A foreach? end > foreach a ( 1 2 3 4 5 6 7 8 9 10 12 15 20 30 40 50 60 70 80 90 100 ) foreach? ./fork ${a} >> A foreach? end vi A Thread: Fork: C Start Wait Total C Start Wait Total ============================================================== 1 26 145 171 1 160 37 197 2 44 198 242 2 290 37 327 3 62 234 296 3 413 41 454 4 77 275 352 4 499 59 558 5 91 107 10808 5 599 57 656 6 99 332 431 6 665 52 717 7 130 388 518 7 741 69 810 8 204 468 672 8 833 56 889 9 164 469 633 9 1067 76 1143 10 165 450 615 10 1147 64 1211 12 343 585 928 12 1213 71 1284 15 232 647 879 15 1360 203 1563 20 319 921 1240 20 2161 96 2257 30 461 1243 1704 30 3005 129 3134 40 559 1487 2046 40 4466 166 4632 50 686 1912 2598 50 4591 292 4883 60 827 2208 3035 60 5234 317 5551 70 973 2885 3858 70 7003 416 7419 80 3545 2738 6283 80 7735 293 8028 90 1392 3497 4889 90 7869 463 8332 100 3917 4180 8097 100 8974 436 9410 </code></pre> <h3>Edit:</h3> Doing a 1000 children caused the fork version to fail. So I have reduced the children count. But doing a single test also seems unfair so here is a range of values.

mumble ... I do not like your solution for many reasons: <ol> <li>You are not taking in account the execution time of child processes/thread.</li> <li>You should compare cpu-usage not the bare elapsed time. This way your statistics will not depend from, e.g., disk access congestion.</li> <li>Let your child process do something. Remember that "modern" fork uses copy-on-write mechanisms to avoid to allocate memory to the child process until needed. It is too easy to exit immediately. This way you avoid quite all the disadvantages of fork.</li> <li>CPU time is not the only cost you have to account. Memory consumption and slowness of IPC are both disadvantages of fork solution.</li> </ol> You could use "rusage" instead of "clock" to measure real resource usage. P.S. I do not think you can really measure the process/thread overhead writing a simple test program. There are too many factors and, usually, the choice between threads and processes is driven by other reasons than mere cpu-usage.

Fair comparison of fork() Vs Thread [closed]

Tags:

I was having a discussion about the relative cost of fork() Vs thread() for parallelization of a task.

We understand the basic differences between processes Vs Thread

Thread:

Easy to communicate between threads
Fast context switching.

Processes:

Fault tolerance.
Communicating with parent not a real problem (open a pipe)
Communication with other child processes hard

But we disagreed on the start-up cost of processes Vs threads.
So to test the theories I wrote the following code. My question: Is this a valid test of measuring the start-up cost or I am missing something. Also I would be interested in how each test performs on different platforms.

fork.cpp

#include <boost/lexical_cast.hpp> #include <vector> #include <unistd.h> #include <iostream> #include <stdlib.h> #include <time.h>  extern "C" int threadStart(void* threadData) {     return 0; }  int main(int argc,char* argv[]) {     int threadCount =  boost::lexical_cast<int>(argv[1]);      std::vector<pid_t>   data(threadCount);     clock_t                 start   = clock();     for(int loop=0;loop < threadCount;++loop)     {         data[loop]  = fork();         if (data[looo] == -1)         {             std::cout << "Abort\n";             exit(1);         }         if (data[loop] == 0)         {             exit(threadStart(NULL));         }     }     clock_t                 middle   = clock();     for(int loop=0;loop < threadCount;++loop)     {         int   result;         waitpid(data[loop], &result, 0);     }     clock_t                 end   = clock();     std::cout << threadCount << "\t" << middle - start << "\t" << end - middle << "\t"<< end - start << "\n"; }

Thread.cpp

#include <boost/lexical_cast.hpp> #include <vector> #include <iostream> #include <pthread.h> #include <time.h>   extern "C" void* threadStart(void* threadData) {     return NULL; }     int main(int argc,char* argv[]) {     int threadCount =  boost::lexical_cast<int>(argv[1]);      std::vector<pthread_t>   data(threadCount);      clock_t                 start   = clock();     for(int loop=0;loop < threadCount;++loop)     {         if (pthread_create(&data[loop], NULL, threadStart, NULL) != 0)         {             std::cout << "Abort\n";             exit(1);         }     }        clock_t                 middle   = clock();     for(int loop=0;loop < threadCount;++loop)     {            void*   result;         pthread_join(data[loop], &result);     }     clock_t                 end   = clock();     std::cout << threadCount << "\t" << middle - start << "\t" << end - middle << "\t"<< end - start << "\n";  }

I expect Windows to do worse in processes creation.
But I would expect modern Unix like systems to have a fairly light fork cost and be at least comparable to thread. On older Unix style systems (before fork() was implemented as using copy on write pages) that it would be worse.

Anyway My timing results are:

> uname -a Darwin Alpha.local 10.4.0 Darwin Kernel Version 10.4.0: Fri Apr 23 18:28:53 PDT 2010; root:xnu-1504.7.4~1/RELEASE_I386 i386 > gcc --version | grep GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5659) > g++ thread.cpp -o thread -I~/include > g++ fork.cpp -o fork -I~/include > foreach a ( 1 2 3 4 5 6 7 8 9 10 12 15 20 30 40 50 60 70 80 90 100 ) foreach? ./thread ${a} >> A foreach? end > foreach a ( 1 2 3 4 5 6 7 8 9 10 12 15 20 30 40 50 60 70 80 90 100 ) foreach? ./fork ${a}  >> A foreach? end vi A   Thread:                             Fork: C   Start   Wait    Total           C   Start   Wait    Total ==============================================================  1    26     145     171             1   160     37      197  2    44     198     242             2   290     37      327  3    62     234     296             3   413     41      454  4    77     275     352             4   499     59      558  5    91     107   10808             5   599     57      656  6    99     332     431             6   665     52      717  7   130     388     518             7   741     69      810  8   204     468     672             8   833     56      889  9   164     469     633             9  1067     76     1143 10   165     450     615            10  1147     64     1211 12   343     585     928            12  1213     71     1284 15   232     647     879            15  1360    203     1563 20   319     921    1240            20  2161     96     2257 30   461    1243    1704            30  3005    129     3134 40   559    1487    2046            40  4466    166     4632 50   686    1912    2598            50  4591    292     4883 60   827    2208    3035            60  5234    317     5551 70   973    2885    3858            70  7003    416     7419 80  3545    2738    6283            80  7735    293     8028 90  1392    3497    4889            90  7869    463     8332 100 3917    4180    8097            100 8974    436     9410

Edit:

Doing a 1000 children caused the fork version to fail.
So I have reduced the children count. But doing a single test also seems unfair so here is a range of values.

529

asked Oct 14 '10 15:10

Martin York

1 Answers

mumble ... I do not like your solution for many reasons:

You are not taking in account the execution time of child processes/thread.
You should compare cpu-usage not the bare elapsed time. This way your statistics will not depend from, e.g., disk access congestion.
Let your child process do something. Remember that "modern" fork uses copy-on-write mechanisms to avoid to allocate memory to the child process until needed. It is too easy to exit immediately. This way you avoid quite all the disadvantages of fork.
CPU time is not the only cost you have to account. Memory consumption and slowness of IPC are both disadvantages of fork solution.

You could use "rusage" instead of "clock" to measure real resource usage.

P.S. I do not think you can really measure the process/thread overhead writing a simple test program. There are too many factors and, usually, the choice between threads and processes is driven by other reasons than mere cpu-usage.

167

answered Sep 28 '22 04:09

andcoz

Related questions
                            
                                Building 64-bit C Python extensions on Windows
                            
                                How do I drag multiple elements at once with JavaScript or jQuery?
                            
                                Can I do SendKeys in Javascript?
                            
                                How to manage CSS Stylesheet Assets in Rails 3.1?
                            
                                How to force a static member to be initialized?
                            
                                Is there a URI equivalent of Path.PathSeparator in .NET?
                            
                                What is the fastest way to unstage parts of a new file in git?
                            
                                mvn deploy:file to different repositories for snapshot and release version
                            
                                Pre-load values for a Guava Cache
                            
                                Raphael JS - Use a SVG file
                            
                                Interfaces that inherit from a base interface
                            
                                What events are fired by Bluetooth AVRCP commands in Windows

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With