
Different execution orders cause differences in performance of a Pthread program

This is my first post on Stack Overflow and my native language is not English, so please excuse any awkward phrasing. The post may be a little long; thanks in advance for your patience!

I have a C program whose job is to count the number of words in two files. I use pthreads to solve this problem, but I find that the order of these two statements

count_words(argv[1]);

pthread_create(&t1, NULL, count_words, (void *)argv[2]);

affects the program's performance, which is the opposite of what I expected. Here is the code:

#include <stdio.h>
#include <pthread.h>
#include <ctype.h>
#include <stdlib.h>

int total_words;

int main(int argc, char *argv[]) {
    pthread_t t1;
    void *count_words(void *);

    if (argc != 3) {
        printf("usage: %s file1 file2\n", argv[0]);
        exit(1);
    }
    total_words = 0;

    count_words(argv[1]); // program runs faster when executing this first
    pthread_create(&t1, NULL, count_words, (void *)argv[2]);

    pthread_join(t1, NULL);
    printf("%5d: total words\n", total_words);
    return 0;
}

void *count_words(void *f) {
    char *filename = (char *)f;
    FILE *fp;
    int c, prevc = '\0';

    if ((fp = fopen(filename, "r")) == NULL) {
        perror(filename);
        exit(1);
    }
    while ((c = getc(fp)) != EOF) {
        if (!isalnum(c) && isalnum(prevc))
            total_words++;
        prevc = c;
    }
    fclose(fp);
    return NULL;
}

Performance:

I ran the program with "time program_name file1 file2" on the command line to measure the running speed. The output is:

With this order:

count_words(argv[1]);

pthread_create(&t1, NULL, count_words, (void *)argv[2]);

the program runs fast: real 0.014s

With this order:

pthread_create(&t1, NULL, count_words, (void *)argv[2]);

count_words(argv[1]);

the program runs slow: real 0.026s

What I expected:

In case 1, the program runs count_words() first. Only after that counting job is finished does it go on to pthread_create(), and only then does the new thread start its counting job. So the new thread does its work after the original thread has completed its own, which is sequential execution rather than parallel execution. In case 2, the program calls pthread_create() before any counting, so after that point two threads do the counting in parallel. I therefore expected case 2 to be faster than case 1, but I was wrong: case 2 is slower. Could anybody give me some useful information about this?
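For comparison, here is a rough sketch (not the program I actually measured) of the fully symmetric variant I have in mind when I say "parallel": main() does no counting itself and simply creates and joins one thread per file. The rest of the program (the includes, count_words(), and total_words) is the same as in the code above.

int main(int argc, char *argv[]) {
    pthread_t t1, t2;
    void *count_words(void *);

    if (argc != 3) {
        printf("usage: %s file1 file2\n", argv[0]);
        exit(1);
    }
    total_words = 0;

    /* sketch: both files are handled by created threads, none by main() */
    pthread_create(&t1, NULL, count_words, (void *)argv[1]);
    pthread_create(&t2, NULL, count_words, (void *)argv[2]);

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("%5d: total words\n", total_words);
    return 0;
}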

Note

Please ignore the fact that I don't put a mutex lock on the global variable total_words; that is not the part I am concerned about. The program is just for testing, so please forgive its imperfections.


Edit 1

Below are a supplement and some improvements added after I read the suggestions.

a) Supplement: The processor is Intel® Celeron(R) CPU 420 @ 1.60GHz. One core.

b) Improvement: I have improved my example with two changes:

1) I enlarged the files. file1 is 2080651 bytes (about 2 MB) and file2 is a copy of file1.

2) I modified count_words(): when it reaches the end of the file, it uses fseek() to move fp back to the beginning and counts again, repeating COUNT times (COUNT is defined as 20). Below is the changed code:

#define COUNT 20

// other unchanged code ...

void *count_words(void *f) {
    // other unchanged code ...
    int i;
    for (i = 0; i < COUNT; i++) {
        while ((c = getc(fp)) != EOF) {
            if (!isalnum(c) && isalnum(prevc))
                total_words++;
            prevc = c;
        }
        fseek(fp, 0, SEEK_SET);
    }
    fclose(fp);
    return NULL;
}

Output of fast_version (count_words() first) and slow_version (pthread_create() first):

administrator@ubuntu:~$ time ./fast_version file1 file2
12241560: total words

real    0m5.057s
user    0m4.960s
sys     0m0.048s

administrator@ubuntu:~$ time ./slow_version file1 file2
12241560: total words

real    0m7.636s
user    0m7.280s
sys     0m0.048s

I tried the "time progname file1 file2" command a few times. There may be a difference of a tenth or a hundredth of a second from run to run, but the differences are not large.

Edit 2

This part was added after I did some experiments prompted by this hint:

When you launch the second thread after the first thread completes its execution, there is no context switching overhead.

-- by user315052

For the experiment, I modified count_words() once more:

void *count_words(void *f) {
    // unchanged code
    // ...
    for (i = 0; i < COUNT; i++) {
        while ((c = getc(fp)) != EOF) {
            if (!isalnum(c) && isalnum(prevc))
                total_words++;
            prevc = c;
        }
        fseek(fp, 0, SEEK_SET);
        printf("from %s\n", filename); // This statement is newly added.
    }
    // unchanged code
    // ...
}

I added the statement printf("from %s\n", filename); so that I can tell which file (or thread) is being processed at any moment. The output of the fast version is 20 lines of "from file1" followed by 20 lines of "from file2"; in the slow version, "from file1" and "from file2" lines are printed interleaved.

It looks like the fast version is faster because there is no context switching. But in fact, after count_words() finished, the original thread was not dead: it created a new thread and waited for it to terminate. Is there really no context switching while the new thread is running? I watched the screen closely and found that the "from file2" lines are printed noticeably more slowly than the "from file1" lines. Why? Is it because context switching happens while counting file2?

In the slow version, the output shows that "from file1" and "from file2" are printed even more slowly than "from file2" in the fast version, presumably because context switching costs more time when the two threads count in parallel, whereas in the fast version the context switching is lighter, since one of the threads has already finished its job and is just waiting.

So I think the main reason is that the fast version has much lighter context switching than the slow version. But the "printing speed" comes from my own observation and may not be rigorous, so I am not sure about it.
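One way to check this more rigorously than by watching the printing speed, I think, would be to ask the kernel how many context switches the process actually performed, using getrusage(); the ru_nvcsw and ru_nivcsw fields count voluntary and involuntary context switches. Below is a minimal sketch (assuming Linux/glibc) of a helper that could be added to the program and called at the end of main(), after pthread_join():

#include <sys/resource.h>

/* Sketch only: print the process-wide context-switch counts reported
 * by the kernel. ru_nvcsw counts voluntary switches (e.g. blocking on
 * I/O); ru_nivcsw counts involuntary ones (preemption by the scheduler). */
void print_context_switches(void) {
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) == 0) {
        printf("voluntary ctx switches:   %ld\n", ru.ru_nvcsw);
        printf("involuntary ctx switches: %ld\n", ru.ru_nivcsw);
    } else {
        perror("getrusage");
    }
}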

Asked by Yang Wenhao, Apr 06 '13


3 Answers

In a comment, you wrote:

The processor is Intel® Celeron(R) CPU 420 @ 1.60GHz. One core.

Since you only have one core, your thread executions are serialized anyway. Your program with two threads running concurrently pays the overhead of thread context switching as each performs blocking I/O.

When you launch the second thread after the first thread completes its execution, there is no context switching overhead.
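As an aside, if you want to confirm from within the program how many cores the scheduler actually has available, sysconf() can report the number of online processors. A minimal sketch (note that _SC_NPROCESSORS_ONLN is a common glibc/BSD extension rather than strict POSIX):

#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* On a single-core Celeron this should print 1, which means the two
     * threads can only ever run one at a time. */
    long ncpus = sysconf(_SC_NPROCESSORS_ONLN);

    if (ncpus < 1)
        perror("sysconf");
    else
        printf("online processors: %ld\n", ncpus);
    return 0;
}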

Answered by jxh


Try the same measurement, but run your program 100 times and compute the average time; with runtimes this short, the effect of caching, for example, is far from negligible.
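A minimal sketch of what that averaging could look like if done inside the program itself (rather than launching it 100 times from the shell), using clock_gettime(CLOCK_MONOTONIC). Here run_once() is a hypothetical stand-in for one complete counting run, and older glibc versions may need -lrt when linking:

#include <stdio.h>
#include <time.h>

#define RUNS 100

/* Hypothetical stand-in for one complete counting run; the real program
 * would do its word counting here. The dummy loop just keeps the sketch
 * self-contained. */
static void run_once(void) {
    volatile long sink = 0;
    long j;

    for (j = 0; j < 1000000; j++)
        sink += j;
}

int main(void) {
    struct timespec start, end;
    double total_ms = 0.0;
    int i;

    for (i = 0; i < RUNS; i++) {
        clock_gettime(CLOCK_MONOTONIC, &start);
        run_once();
        clock_gettime(CLOCK_MONOTONIC, &end);
        total_ms += (end.tv_sec - start.tv_sec) * 1000.0
                  + (end.tv_nsec - start.tv_nsec) / 1000000.0;
    }
    printf("average wall time: %.3f ms over %d runs\n", total_ms / RUNS, RUNS);
    return 0;
}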

Answered by Étienne


How did you measure?

Real time is not an indication of how long your program ran. You have to measure user+system time. What's more, meaningful timing at the millisecond level depends very much on the granularity of your timing clock. If it runs, say, at 60Hz, then you have a problem. Coming up with meaningful benchmarks is an art...

As a start, you should find a way to run your threads in a loop, say, 10,000 times and add up the numbers. That will at least get you out of the millisecond timing problem.
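For the user+system measurement, here is a minimal sketch of reading those figures from inside the process with getrusage() (POSIX; Linux assumed). Called just before main() returns, it gives roughly the same user/sys numbers that time(1) reports:

#include <stdio.h>
#include <sys/resource.h>

/* Report the CPU time the process has consumed so far, split into
 * user time and system time. Sketch only. */
void report_cpu_time(void) {
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0) {
        perror("getrusage");
        return;
    }
    printf("user: %ld.%06lds  sys: %ld.%06lds\n",
           (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec,
           (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
}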

Answered by Jens