
How does the kernel separate threads from processes?

Suppose I have a browser process like Firefox that has pid = 123. Firefox has 5 open tabs, each running in a separate thread, so in total it has 5 threads.

  1. I want to know, in depth, how the kernel splits the process into threads for execution: is that recorded in struct task_struct or in thread_info?

  2. struct task_struct is the task descriptor in the task list. Where does struct task_struct hold a reference or a link to these five threads?

  3. Does the struct thread_struct of a process like Firefox contain references to all 5 threads,

    OR

    is each thread treated like a process inside the Linux kernel?

Dpk asked Dec 01 '22 17:12


1 Answer

Unlike Windows, Linux does not have a separate implementation of "threads" in the kernel. Instead, the kernel gives us what are sometimes called "lightweight processes" (the kernel's own word is "tasks"), which are a generalization of the concepts of "processes" and "threads", and can be used to implement either.

It may be confusing when you read kernel code and see things like thread_struct on the one hand, and pid (process ID) on the other. In reality, both belong to the same kind of object: a single lightweight process. Don't be confused by the terminology.

Each lightweight process has its own thread_info and its own task_struct (with embedded thread_struct). You seem to think that the task_struct of one (userspace) "process" should act as a container holding pointers to the task_structs of its (userspace) "threads". This is not the case: there is no separate per-process object that owns the threads. A task_struct does record some thread-group bookkeeping, such as the group_leader pointer, but inside the kernel each "thread" is a separate process, and the scheduler deals with each one separately.

Linux has a system call called clone which is used to create new lightweight processes. When you call clone, you must provide various flags which indicate what will be shared between the new process and the existing process. They can share their address space (CLONE_VM), or they can each have a different address space. They can share their open files (CLONE_FILES), or they can each have their own list of open files. They can share their signal handlers (CLONE_SIGHAND), or they can each have their own signal handlers. They can be in the same "thread group" (CLONE_THREAD), or they can be in different thread groups. And so on...

Although "threads" and "processes" are the same thing in Linux, you can implement what we normally think of as "processes" by using clone to create processes which do not share their address space, open files, signal handlers, etc.

You can also implement what we normally think of as "threads" by using clone to create processes which DO share their address space, open files, signal handlers, etc.
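To make the "it is all just flags" point concrete, here is a minimal sketch in Python (used here only as a calculator, not as kernel code). The flag values are copied from linux/sched.h; the two flag words are assumptions about what a typical pthread_create and a typical fork pass on a glibc system, so check strace on your own machine before relying on them.

```python
# Subset of clone flag values, as defined in <linux/sched.h>.
CLONE_FLAGS = {
    "CLONE_VM": 0x00000100,       # share the address space (mm_struct)
    "CLONE_FS": 0x00000200,       # share cwd, root and umask (fs_struct)
    "CLONE_FILES": 0x00000400,    # share the open-file table (files_struct)
    "CLONE_SIGHAND": 0x00000800,  # share signal handlers (sighand_struct)
    "CLONE_PARENT": 0x00008000,   # new task gets the caller's parent
    "CLONE_THREAD": 0x00010000,   # new task joins the caller's thread group
}


def shared_resources(flags):
    """Return the names of the flags listed above that are set in `flags`."""
    return [name for name, bit in CLONE_FLAGS.items() if flags & bit]


# ASSUMPTION: illustrative flag words, not taken from the original answer.
THREAD_LIKE = 0x3D0F00    # commonly seen for pthread_create under strace
PROCESS_LIKE = 0x1200011  # commonly seen for fork; the low byte is SIGCHLD

print(shared_resources(THREAD_LIKE))
print(shared_resources(PROCESS_LIKE))
```

The "thread" word sets every sharing flag in the table except CLONE_PARENT, while the "process" word sets none of them, which is the whole difference between the two as far as clone is concerned.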

If you look at the definition of task_struct, you will find that it has pointers to other structs such as mm_struct (address space), files_struct (open files), sighand_struct (signal handlers), and so on. When you clone a new "process", all of these structs will be copied. When you clone a new "thread", these structs will be shared between the new and old task_structs -- they will both point to the same mm_struct, the same files_struct, and so on. Either way, you are just providing different flags to clone to tell it what to copy, and what to share.
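The effect of sharing versus copying the mm_struct can be observed from user space. The following is a rough sketch, assuming a Linux system with Python 3, where threading.Thread is backed by a pthread and os.fork creates an ordinary child process; the variable names are made up for the illustration.

```python
import os
import threading

counter = [0]  # lives in this process's address space


def bump():
    counter[0] += 1


# "Thread": shares the address space, so the creator sees the change.
t = threading.Thread(target=bump)
t.start()
t.join()
after_thread = counter[0]

# "Process": the forked child gets its own copy of the address space.
pid = os.fork()
if pid == 0:
    bump()       # modifies only the child's copy
    os._exit(0)  # leave the child immediately, skipping parent cleanup
os.waitpid(pid, 0)
after_fork = counter[0]

print(after_thread, after_fork)  # 1 1
```

The thread's increment is visible to the creator because both tasks point at the same address space; the forked child's increment is not, because the child works on its own copy.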

I just mentioned "thread groups" above, so you might wonder about that. In short, each "thread" in a "process" has its own PID, but they all share the same TGID (thread group ID). The TGID is equal to the PID of the first thread in the group (the thread group leader). Userspace "PIDs", like those shown in ps, or in /proc, are actually "TGIDs" in the kernel; that is what getpid returns, while gettid returns the kernel's per-thread PID. Naturally, clone has a flag (CLONE_THREAD) to determine whether a new lightweight process will join the caller's thread group or get a new TGID (thus putting it in a new "thread group").
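The PID/TGID split is easy to see from user space. Here is a small sketch, assuming Linux and Python 3.8 or later, where os.getpid() reports the TGID and threading.get_native_id() reports the per-task ID that gettid returns. The barrier is only there to keep all workers alive at the same time so their IDs cannot be reused.

```python
import os
import threading

NUM_WORKERS = 3
barrier = threading.Barrier(NUM_WORKERS)
results = []  # (userspace PID, kernel task ID) pairs


def record():
    # os.getpid() is the TGID; get_native_id() is this task's own ID.
    results.append((os.getpid(), threading.get_native_id()))


def worker():
    record()
    barrier.wait()  # stay alive until every worker has recorded its IDs


record()  # main thread
workers = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for w in workers:
    w.start()
for w in workers:
    w.join()

tgids = {tgid for tgid, _ in results}
tids = [tid for _, tid in results]
print("TGIDs seen:", tgids)
print("kernel task IDs:", tids)
```

Every thread reports the same userspace PID, but each has a different kernel task ID. The same per-task IDs appear as directory names under /proc/<pid>/task.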

UNIX processes also have "parents" and "children". There are pointers in a Linux task_struct which implement the parent-child relationships. And, as you might have guessed, clone has a flag to determine what the parent of a new lightweight process will be. It can either be the process which called clone, OR the parent of the process which called clone. Can you figure out which is used when creating a "process", and which is used when creating a "thread"?
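One way to check your answer to that question from user space is sketched below, assuming Linux and Python 3. It asks for the parent PID from inside a new thread and from inside a forked child; the pipe is only a convenient way to get the child's answer back.

```python
import os
import threading

my_pid = os.getpid()
my_parent = os.getppid()

# "Thread": ask for the parent from inside a new thread.
seen = {}
t = threading.Thread(target=lambda: seen.update(thread_parent=os.getppid()))
t.start()
t.join()
thread_parent = seen["thread_parent"]

# "Process": ask for the parent from inside a forked child.
r, w = os.pipe()
pid = os.fork()
if pid == 0:
    os.write(w, str(os.getppid()).encode())
    os._exit(0)  # leave the child immediately
os.close(w)
child_parent = int(os.read(r, 32))
os.close(r)
os.waitpid(pid, 0)

print("thread's parent is our parent:", thread_parent == my_parent)
print("child's parent is us:", child_parent == my_pid)
```

Both lines should print True: the new "thread" is a sibling of its creator and shares the creator's parent, while the new "process" is a child of the task that created it.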

Look at the manpage for clone; it will be very educational. Also try strace on a program which uses pthreads to see clone in use.

(A lot of this was written from memory; others should feel free to edit in corrections as necessary)

Alex D answered Dec 11 '22 07:12