Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What costs the extra execution time of the routine in a pthread program?

I wrote four different programs to count total words in two files. These four versions look mostly the same. First three versions use two threads to count and just the orders of three statements are different. The last version uses one thread to count. I will list the different part of each version and the common part first, then the output of each version and my question.

different part:

// version 1
count_words(&file1); 
pthread_create(&new_thread, NULL, count_words, &file2);
pthread_join(new_thread, NULL);

// version 2
pthread_create(&new_thread, NULL, count_words, &file2);
count_words(&file1);
pthread_join(new_thread, NULL);

// version 3
pthread_create(&new_thread, NULL, count_words, &file2);
pthread_join(new_thread, NULL);
count_words(&file1);

// version 4
count_words(&file1);
count_words(&file2);

common part: (Insert the different part into this common part to make a complete version)

#include <stdio.h>
#include <pthread.h>
#include <ctype.h>
#include <stdlib.h>
#include <time.h>

#define N 2000

typedef struct file_t {
    char *name;
    int words;
} file_t;

double time_diff(struct timespec *, struct timespec *);
void *count_words(void *);

// Usage: progname file1 file2
int main(int argc, char *argv[]) {
    pthread_t new_thread;
    file_t file1, file2;

    file1.name = argv[1];
    file1.words = 0;
    file2.name= argv[2];
    file2.words = 0;

    // Insert different part here

    printf("Total words: %d\n", file1.words+file2.words);
    return 0;
}

void *count_words(void *arg) {
    FILE *fp;
    file_t *file = (file_t *)arg;
    int i, c, prevc = '\0';
    struct timespec process_beg, process_end;
    struct timespec thread_beg, thread_end;
    double process_diff, thread_diff;

    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &process_beg);
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &thread_beg);

    fp = fopen(file->name, "r");
    for (i = 0; i < N; i++) {
        while ((c = getc(fp)) != EOF) {
            if (!isalnum(c) && isalnum(prevc))
                file->words++;
            prevc = c;
        }
        fseek(fp, 0, SEEK_SET);
    }
    fclose(fp);

    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &process_end);
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &thread_end);

    process_diff = time_diff(&process_beg, &process_end);
    thread_diff = time_diff(&thread_beg, &thread_end);
    printf("count_words() in %s takes %.3fs process time and" 
           "%.3fs thread time\n", file->name, process_diff, thread_diff);
    return NULL;
}

double time_diff(struct timespec *beg, struct timespec *end) {
    return ((double)end->tv_sec + (double)end->tv_nsec*1.0e-9)
         - ((double)beg->tv_sec + (double)beg->tv_nsec*1.0e-9);
}

NOTE

  1. file1 is a file with 10000 words of "word". file2 is a copy of file1, created by cp command.
  2. To make the execution time long enough, program repeatly counts the words. N is the number of loops. So the result is not the accurate number of total words, but that multiplies by N.
  3. Please don't put too much emphasis on the counting algorithm. I am just concerned about the execution time in this example.
  4. An important information: The machine is Intel® Celeron(R) CPU 420 @ 1.60GHz. one core. The OS is Linux 3.2.0. Maybe one core is the cause of this strange phenomenon like others said. But I still want to figure it out.

The program counts words and uses clock_gettime() to calculate the process cpu time and the thread cpu time of routine count_words() and then output the times and the word number. Below is the output and my comment with questions. I will be very appreciated if someone can explain the reason what the extra time is taken on.

// version 1
count_words() in file1 takes 2.563s process time and 2.563s thread time
count_words() in file2 takes 8.374s process time and 8.374s thread time
Total words: 40000000

Comment: The original thread finishes count_words() and waits for the new thread to die. When count_words() running in the new thread, no context switch happens (because process time == thread time). Why it takes so much time? What happens in count_words() in the new thread?

// version 2
count_words() in file1 takes 16.755s process time and 8.377s thread time
count_words() in file2 takes 16.753s process time and 8.380s thread time
Total words: 40000000

Comment: Two threads parallel runs here. Context switch happens, so the process time > thread time.

// version 3
count_words() in file2 takes 8.374s process time and 8.374s thread time
count_words() in file1 takes 8.365s process time and 8.365s thread time
Total words: 40000000

Comment: New thread counts first and original thread waits for it. After new thread is joined, original thread begins to count. Neither of them has context switching, why so much time taken, especially the count after the new thread joined?

// version 4
count_words() in file1 takes 2.555s process time and 2.555s thread time
count_words() in file2 takes 2.556s process time and 2.556s thread time
Total words: 40000000

Comment: Fastest version. No new thread created. Both count_words() runs in a single thread.

like image 527
Yang Wenhao Avatar asked Apr 08 '13 10:04

Yang Wenhao


People also ask

What happens when pthread_create is called?

The pthread_create() function starts a new thread in the calling process. The new thread starts execution by invoking start_routine(); arg is passed as the sole argument of start_routine().

Which function is used to wait for a Posix thread pthread to finish execution?

pthread_join() is used to wait for the thread to complete its execution. pthread_exit() is used to end the pthread.

What is pthread scheduling?

Scheduling. You use the Pthreads scheduling features to set up a policy that determines which thread the system first selects to run when CPU cycles become available, and how long each thread can run once it is given the CPU.

What are the arguments in pthread_create?

pthread_t is the data type used to uniquely identify a thread. It is returned by pthread_create() and used by the application in function calls that require a thread identifier. The thread is created running start_routine, with arg as the only argument.


1 Answers

It's probably because creation of any thread forces libc to use synchronization in getc. This makes this function significantly slower. Following example is for me as slow as version 3:

void *skip(void *p){ return NULL; };
pthread_create(&new_thread, NULL, skip, NULL);
count_words(&file1); 
count_words(&file2); 

To fix this problem you can use a buffer:

for (i = 0; i < N; i++) {
    char buffer[BUFSIZ];
    int read;
    do {
        read = fread(buffer, 1, BUFSIZ, fp);
        int j;
        for(j = 0; j < read; j++) {
            if (!isalnum(buffer[j]) && isalnum(prevc))
                file->words++;
            prevc = buffer[j];
        }
    } while(read == BUFSIZ);
    fseek(fp, 0, SEEK_SET);
}

In this solution, IO functions are called rarely enough to make synchronization overhead insignificant. This not only solves problem of weird timings, but also makes it several times faster. For me it's reduction from 0.54s (without threads) or 0.85s (with threads) to 0.15s (in both cases).

like image 68
zch Avatar answered Sep 20 '22 04:09

zch