Distinction between processes and threads in Linux

After reading up on this answer and "Linux Kernel Development" by Robert Love and, subsequently, on the clone() system call, I discovered that processes and threads in Linux are (almost) indistinguishable to the kernel. There are a few tweaks between them (discussed as being "more sharing" or "less sharing" in the quoted SO question), but I still have some unanswered questions.

I recently worked on a program involving a couple of POSIX threads and decided to experiment on this premise. In a process that creates two threads, each thread of course gets a unique value from pthread_self(), but not from getpid().

A sample program I created follows:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <pthread.h>

void* threadMethod(void* arg)
{
    int intArg = *((int*) arg);

    int32_t pid = getpid();
    uint64_t pti = pthread_self();

    printf("[Thread %d] getpid() = %d\n", intArg, pid);
    printf("[Thread %d] pthread_self() = %lu\n", intArg, pti);

    return NULL;  /* a thread start routine must return a void* */
}

int main()
{
    pthread_t threads[2];

    int thread1 = 1;

    if ((pthread_create(&threads[0], NULL, threadMethod, (void*) &thread1))
         != 0)
    {
        fprintf(stderr, "pthread_create: error\n");
        exit(EXIT_FAILURE);
    }

    int thread2 = 2;

    if ((pthread_create(&threads[1], NULL, threadMethod, (void*) &thread2))
         != 0)
    {
        fprintf(stderr, "pthread_create: error\n");
        exit(EXIT_FAILURE);
    }

    int32_t pid = getpid();
    uint64_t pti = pthread_self();

    printf("[Process] getpid() = %d\n", pid);
    printf("[Process] pthread_self() = %lu\n", pti);

    if ((pthread_join(threads[0], NULL)) != 0)
    {
        fprintf(stderr, "Could not join thread 1\n");
        exit(EXIT_FAILURE);
    }

    if ((pthread_join(threads[1], NULL)) != 0)
    {
        fprintf(stderr, "Could not join thread 2\n");
        exit(EXIT_FAILURE);
    }

    return 0;
}

(This was compiled [gcc -pthread -o thread_test thread_test.c] on 64-bit Fedora; due to the 64-bit types used for pthread_t sourced from <bits/pthreadtypes.h>, the code will require minor changes to compile on 32-bit editions.)

The output I get is as follows:

[bean@fedora ~]$ ./thread_test 
[Process] getpid() = 28549
[Process] pthread_self() = 140050170017568
[Thread 2] getpid() = 28549
[Thread 2] pthread_self() = 140050161620736
[Thread 1] getpid() = 28549
[Thread 1] pthread_self() = 140050170013440
[bean@fedora ~]$

By using scheduler locking in gdb, I can keep the program and its threads alive long enough to capture what top reports. When showing only processes, it says:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
28602 bean      20   0 15272 1112  820 R  0.4  0.0   0:00.63 top
 2036 bean      20   0  108m 1868 1412 S  0.0  0.0   0:00.11 bash
28547 bean      20   0  231m  16m 7676 S  0.0  0.4   0:01.56 gdb
28549 bean      20   0 22688  340  248 t  0.0  0.0   0:00.26 thread_test
28561 bean      20   0  107m 1712 1356 S  0.0  0.0   0:00.07 bash

And when showing threads, it says:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
28617 bean      20   0 15272 1116  820 R 47.2  0.0   0:00.08 top
 2036 bean      20   0  108m 1868 1412 S  0.0  0.0   0:00.11 bash
28547 bean      20   0  231m  16m 7676 S  0.0  0.4   0:01.56 gdb
28549 bean      20   0 22688  340  248 t  0.0  0.0   0:00.26 thread_test
28552 bean      20   0 22688  340  248 t  0.0  0.0   0:00.00 thread_test
28553 bean      20   0 22688  340  248 t  0.0  0.0   0:00.00 thread_test
28561 bean      20   0  107m 1860 1432 S  0.0  0.0   0:00.08 bash

It seems to be quite clear that programs, or perhaps the kernel, have a distinct way of defining threads in contrast to processes. Each thread has its own PID according to top - why?

884

asked Feb 06 '12 01:02

Doddy

1 Answer

These confusions all stem from the fact that the kernel developers originally held an irrational and wrong view that threads could be implemented almost entirely in userspace using kernel processes as the primitive, as long as the kernel offered a way to make them share memory and file descriptors. This led to the notoriously bad LinuxThreads implementation of POSIX threads, which was rather a misnomer because it did not give anything remotely resembling POSIX thread semantics. Eventually LinuxThreads was replaced (by NPTL), but a lot of the confusing terminology and misunderstandings persist.

The first and most important thing to realize is that "PID" means different things in kernel space and user space. What the kernel calls PIDs are actually kernel-level thread ids (often called TIDs), not to be confused with pthread_t which is a separate identifier. Each thread on the system, whether in the same process or a different one, has a unique TID (or "PID" in the kernel's terminology).

What's considered a PID in the POSIX sense of "process", on the other hand, is called a "thread group ID" or "TGID" in the kernel. Each process consists of one or more threads (kernel processes) each with their own TID (kernel PID), but all sharing the same TGID, which is equal to the TID (kernel PID) of the initial thread in which main runs.
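
The kernel's own naming can be seen in procfs. The following is a minimal sketch, assuming a Linux system with /proc mounted; status_field is a hypothetical helper name, not part of any library. It reads the "Tgid" and "Pid" lines from the calling thread's /proc/self/task/<tid>/status file, where "Pid" is the kernel's per-thread ID and "Tgid" is what getpid() reports.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: returns the numeric value of a field such as
   "Tgid" or "Pid" from the calling thread's status file in procfs,
   or -1 if the file or the field cannot be read. */
static long status_field(const char *name)
{
    char path[64];
    char line[256];
    long value = -1;
    size_t len = strlen(name);

    /* The raw syscall is used because older glibc has no gettid() wrapper. */
    snprintf(path, sizeof path, "/proc/self/task/%ld/status",
             (long) syscall(SYS_gettid));

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    while (fgets(line, sizeof line, f) != NULL) {
        /* Require the colon right after the name so that "Pid" does not
           match "PPid:" or "TracerPid:". */
        if (strncmp(line, name, len) == 0 && line[len] == ':') {
            sscanf(line + len + 1, "%ld", &value);
            break;
        }
    }

    fclose(f);
    return value;
}
```

Called from the initial thread, status_field("Tgid") and status_field("Pid") should both equal getpid(). Called from a thread started with pthread_create, "Tgid" should stay the same while "Pid" differs.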

When top shows you threads, it's showing TIDs (kernel PIDs), not PIDs (kernel TGIDs), and this is why each thread has a separate one.
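
The same distinction can be checked from inside a program. This is a rough sketch, assuming Linux and compilation with -pthread; current_tid, record_ids, ids_in_new_thread and struct ids are names made up for illustration. It fetches the TID through the raw gettid system call and records it next to getpid() in a freshly created thread.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel thread ID (TID) of the calling thread. The raw syscall is used
   because older glibc versions do not provide a gettid() wrapper. */
static pid_t current_tid(void)
{
    return (pid_t) syscall(SYS_gettid);
}

/* Illustrative container for the two identifiers seen by one thread. */
struct ids {
    pid_t pid;  /* POSIX process ID (the kernel's TGID) */
    pid_t tid;  /* kernel thread ID (the kernel's "PID") */
};

/* Thread start routine: stores the IDs as seen from the new thread. */
static void *record_ids(void *arg)
{
    struct ids *out = arg;
    out->pid = getpid();
    out->tid = current_tid();
    return NULL;
}

/* Runs record_ids in a new thread and waits for it.
   Returns 0 on success, -1 if the thread could not be created or joined. */
static int ids_in_new_thread(struct ids *out)
{
    pthread_t t;

    if (pthread_create(&t, NULL, record_ids, out) != 0)
        return -1;
    return pthread_join(t, NULL) == 0 ? 0 : -1;
}
```

After ids_in_new_thread(&worker) succeeds, worker.pid should match getpid() in the caller, while worker.tid should be a different number, corresponding to the extra rows top shows in thread mode. In the initial thread, current_tid() and getpid() should return the same value.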

With the advent of NPTL, most system calls that take a PID argument or act on the calling process were changed to treat the PID as a TGID and act on the whole "thread group" (POSIX process).

100

answered Nov 21 '22 20:11

R.. GitHub STOP HELPING ICE
