
Which Linux kernel function creates the 'process 0'?

Tags: c, x86, assembly, linux-kernel, boot

I'm trying to understand more about process 0: whether it has a memory descriptor (a non-NULL task_struct->mm field), and how it is related to the swapper or idle process. It seems to me that a single 'process 0' is created on the boot CPU, and then an idle thread is created for every other CPU by idle_threads_init, but I couldn't find where the first one (which I assume is process 0) is created.

Update

In light of the live book that tychen referenced, here is my most up-to-date understanding of process 0 (for x86_64). Can someone confirm or refute the items below?

1. A task_struct named init_task is statically defined, with the task's kernel stack init_task.stack = init_stack, memory descriptor init_task.mm = NULL, and init_task.active_mm = &init_mm, where the stack area init_stack and the mm_struct init_mm are both statically defined.
2. The fact that only active_mm is non-NULL means process 0 is a kernel process. Also, init_task.flags = PF_KTHREAD.
3. Not long after the uncompressed kernel image begins execution, the boot CPU starts to use init_stack as its kernel stack. This makes the current macro meaningful (for the first time since the machine booted), which makes fork() possible. After this point, the kernel literally runs in process 0's context.
4. start_kernel -> arch_call_rest_init -> rest_init, and inside this function, processes 1 and 2 are forked. Within the kernel_init function, which is scheduled as process 1, a new thread (with CLONE_VM) is made for every other logical CPU and hooked to that CPU's run queue as rq->idle.
5. Interestingly, all idle threads share the same tid 0 (not only the tgid). Usually threads share a tgid but have distinct tids, and the tid is really Linux's process id. I guess it doesn't break anything because idle threads are locked to their own CPUs.
6. kernel_init loads the init executable (typically /sbin/init), switches both current->mm and current->active_mm to a non-NULL mm_struct, and clears the PF_KTHREAD flag, which makes process 1 a legitimate user-space process. Process 2, by contrast, does not touch mm, meaning it remains a kernel process, the same as process 0.
7. At the end of rest_init, do_idle takes over, which means every CPU has an idle process.
8. Something confused me before, but is now clear: the init_* objects/labels such as init_task, init_mm, and init_stack are all used by process 0, and not by the init process, which is process 1.

asked Jun 04 '20 21:06

1 Answer

The Linux kernel really starts running in start_kernel, and process 0 (the idle task) starts there too.

At the beginning of start_kernel, we call set_task_stack_end_magic(&init_task). This function marks the end of the stack of init_task, which is process 0 (the idle task).

void set_task_stack_end_magic(struct task_struct *tsk)
{
    unsigned long *stackend;

    stackend = end_of_stack(tsk);
    *stackend = STACK_END_MAGIC;    /* for overflow detection */
}

This function takes the address of the end of the stack (the lowest address, since the stack grows down) and writes STACK_END_MAGIC there as a stack-overflow marker. Here is a diagram of the structure.

[Image: https://i.stack.imgur.com/Z04AU.png]
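To make the overflow-marker idea concrete, here is a minimal user-space sketch, not kernel code. The names canary_task, canary_end_of_stack, canary_set_end_magic and canary_end_corrupted are hypothetical stand-ins, and the sketch assumes a downward-growing stack whose end is simply the lowest address of the stack area.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Magic value as defined in the kernel's include/uapi/linux/magic.h. */
#define STACK_END_MAGIC 0x57AC6E9DUL

/* Hypothetical toy size; the real stack is THREAD_SIZE bytes. */
#define CANARY_STACK_WORDS 64

/* Hypothetical stand-in for task_struct: only the stack pointer matters. */
struct canary_task {
    unsigned long *stack;   /* lowest address of the stack area */
};

/* Assumption: the stack grows down, so its "end" is the lowest address. */
static unsigned long *canary_end_of_stack(const struct canary_task *tsk)
{
    assert(tsk != NULL && tsk->stack != NULL);
    return tsk->stack;
}

/* Same shape as set_task_stack_end_magic(): write the marker at the end. */
static void canary_set_end_magic(struct canary_task *tsk)
{
    *canary_end_of_stack(tsk) = STACK_END_MAGIC;
}

/* An overflow that reaches the end of the stack overwrites the marker. */
static bool canary_end_corrupted(const struct canary_task *tsk)
{
    return *canary_end_of_stack(tsk) != STACK_END_MAGIC;
}
```

If a caller backs canary_task.stack with an array of CANARY_STACK_WORDS words and calls canary_set_end_magic, then canary_end_corrupted stays false until something overwrites the first word of that array.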

Process 0 is statically defined. It is the only process that is created by neither kernel_thread nor fork.

/*
 * Set up the first task table, touch at your own risk!. Base=0,
 * limit=0x1fffff (=2MB)
 */
struct task_struct init_task
#ifdef CONFIG_ARCH_TASK_STRUCT_ON_STACK
    __init_task_data
#endif
= {
#ifdef CONFIG_THREAD_INFO_IN_TASK
    .thread_info    = INIT_THREAD_INFO(init_task),
    .stack_refcount = REFCOUNT_INIT(1),
#endif
    .state      = 0,
    .stack      = init_stack,
    .usage      = REFCOUNT_INIT(2),
    .flags      = PF_KTHREAD,
    .prio       = MAX_PRIO - 20,
    .static_prio    = MAX_PRIO - 20,
    .normal_prio    = MAX_PRIO - 20,
    .policy     = SCHED_NORMAL,
    .cpus_ptr   = &init_task.cpus_mask,
    .cpus_mask  = CPU_MASK_ALL,
    .nr_cpus_allowed= NR_CPUS,
    .mm     = NULL,
    .active_mm  = &init_mm,
    ......
    .thread_pid = &init_struct_pid,
    .thread_group   = LIST_HEAD_INIT(init_task.thread_group),
    .thread_node    = LIST_HEAD_INIT(init_signals.thread_head),
    ......
};
EXPORT_SYMBOL(init_task);

Here are some important things we need to make clear.

1. INIT_THREAD_INFO(init_task) initializes thread_info as shown in the figure above.
2. init_stack is declared as follows:

extern unsigned long init_stack[THREAD_SIZE / sizeof(unsigned long)];

where THREAD_SIZE is defined as

#ifdef CONFIG_KASAN
#define KASAN_STACK_ORDER 1
#else
#define KASAN_STACK_ORDER 0
#endif
#define THREAD_SIZE_ORDER   (2 + KASAN_STACK_ORDER)
#define THREAD_SIZE  (PAGE_SIZE << THREAD_SIZE_ORDER)

So the stack is four pages by default, or eight pages when KASAN is enabled.
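As a quick check of the arithmetic, the macros above can be mirrored as plain functions. This is only a sketch: stack_demo_thread_size and stack_demo_init_stack_words are hypothetical helper names, and the page size is passed in as a parameter rather than taken from the kernel's PAGE_SIZE.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors THREAD_SIZE = PAGE_SIZE << (2 + KASAN_STACK_ORDER). */
static unsigned long stack_demo_thread_size(unsigned long page_size, bool kasan)
{
    unsigned int kasan_stack_order = kasan ? 1 : 0;
    unsigned int thread_size_order = 2 + kasan_stack_order;

    assert(page_size != 0);
    return page_size << thread_size_order;
}

/* Number of unsigned long slots in an init_stack-style array. */
static unsigned long stack_demo_init_stack_words(unsigned long page_size,
                                                 bool kasan)
{
    return stack_demo_thread_size(page_size, kasan) / sizeof(unsigned long);
}
```

With 4096-byte pages, stack_demo_thread_size(4096, false) gives 16384 bytes and stack_demo_thread_size(4096, true) gives 32768 bytes.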

3. Process 0 only ever runs in kernel space, so it has no user address space of its own (mm is NULL). It still needs an address space to run on, though, so active_mm points to init_mm:

    .mm     = NULL,
    .active_mm  = &init_mm,
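The split between mm and active_mm is easier to see in a small model. The sketch below is a heavily simplified user-space illustration of the idea that a task without its own mm borrows the address space of whatever ran before it; lazy_mm, lazy_task and lazy_switch are hypothetical names, and the real context-switch code does considerably more (reference counting, TLB handling).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for mm_struct. */
struct lazy_mm {
    int id;
};

/* Hypothetical stand-in for the two task_struct fields discussed above. */
struct lazy_task {
    struct lazy_mm *mm;         /* own user address space, NULL if none */
    struct lazy_mm *active_mm;  /* address space actually in use */
};

/*
 * Simplified model of switching from prev to next:
 * a task with no mm of its own keeps running on prev's address space,
 * while a task with an mm switches to its own.
 */
static void lazy_switch(const struct lazy_task *prev, struct lazy_task *next)
{
    assert(prev != NULL && next != NULL && prev->active_mm != NULL);

    if (next->mm == NULL)
        next->active_mm = prev->active_mm;  /* borrow */
    else
        next->active_mm = next->mm;         /* use its own */
}
```

In this model a task set up like init_task (mm = NULL, active_mm = &init_mm) never gains an mm of its own; only its active_mm changes as it follows other tasks.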

Let's go back to start_kernel: rest_init creates the kernel_init and kthreadd threads.

noinline void __ref rest_init(void)
{
......
    pid = kernel_thread(kernel_init, NULL, CLONE_FS);
......
    pid = kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES);
......
}

kernel_init runs execve and switches to user space, becoming the init process (process 1) by running the first of these programs that can be executed:

if (!try_to_run_init_process("/sbin/init") || 
    !try_to_run_init_process("/etc/init")  || 
    !try_to_run_init_process("/bin/init")  || 
    !try_to_run_init_process("/bin/sh")) 
   return 0;
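The fallback chain above is just "try each candidate in order and stop at the first success". Here is a user-space sketch of that pattern. init_demo_first_runnable and the two stub callbacks are hypothetical, and the sketch assumes the same convention as the snippet above, where a return value of 0 means the attempt succeeded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Callback type: returns 0 on success, non-zero on failure. */
typedef int (*init_demo_try_fn)(const char *path);

/* Candidate paths in the order shown in the kernel snippet above. */
static const char *const init_demo_paths[] = {
    "/sbin/init", "/etc/init", "/bin/init", "/bin/sh",
};

#define INIT_DEMO_NPATHS (sizeof(init_demo_paths) / sizeof(init_demo_paths[0]))

/* Returns the first path for which try_run() reports success, else NULL. */
static const char *init_demo_first_runnable(const char *const *paths, size_t n,
                                            init_demo_try_fn try_run)
{
    size_t i;

    assert(paths != NULL && try_run != NULL);
    for (i = 0; i < n; i++) {
        if (try_run(paths[i]) == 0)
            return paths[i];
    }
    return NULL;
}

/* Hypothetical stub: pretends only /bin/sh can be executed. */
static int init_demo_only_sh(const char *path)
{
    return strcmp(path, "/bin/sh") == 0 ? 0 : -1;
}

/* Hypothetical stub: pretends nothing can be executed. */
static int init_demo_nothing(const char *path)
{
    (void)path;
    return -1;
}
```

Calling init_demo_first_runnable with init_demo_only_sh walks past the first three candidates and returns "/bin/sh"; with init_demo_nothing it returns NULL.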

kthreadd becomes process 2, the daemon that manages the other kernel threads.

After all this, process 0 becomes the idle process and leaves the run queue (rq), which means it only runs when the rq is empty.

noinline void __ref rest_init(void)
{
......
    /*
     * The boot idle thread must execute schedule()
     * at least once to get things moving:
     */
    schedule_preempt_disabled();
    /* Call into cpu_idle with preempt disabled */
    cpu_startup_entry(CPUHP_ONLINE);
}


void cpu_startup_entry(enum cpuhp_state state)
{
    arch_cpu_idle_prepare();
    cpuhp_online_idle(state);
    while (1)
        do_idle();
}
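The "only runs when the rq is empty" rule can be sketched with a toy run queue. This is not the real scheduler: idle_demo_rq is a hypothetical fixed-size FIFO, and idle_demo_pick_next only illustrates that the idle task is the fallback choice and is never itself queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define IDLE_DEMO_CAPACITY 8

/* Hypothetical stand-in for task_struct. */
struct idle_demo_task {
    int pid;
};

/* Hypothetical toy run queue: a FIFO of runnable tasks plus an idle task. */
struct idle_demo_rq {
    struct idle_demo_task *queue[IDLE_DEMO_CAPACITY];
    size_t nr_running;
    struct idle_demo_task *idle;    /* never stored in queue[] */
};

/* Adds a runnable task; returns false if the toy queue is full. */
static bool idle_demo_enqueue(struct idle_demo_rq *rq, struct idle_demo_task *t)
{
    assert(rq != NULL && t != NULL);
    if (rq->nr_running == IDLE_DEMO_CAPACITY)
        return false;
    rq->queue[rq->nr_running++] = t;
    return true;
}

/* Picks the oldest runnable task, or the idle task if none is runnable. */
static struct idle_demo_task *idle_demo_pick_next(struct idle_demo_rq *rq)
{
    struct idle_demo_task *next;
    size_t i;

    assert(rq != NULL && rq->idle != NULL);
    if (rq->nr_running == 0)
        return rq->idle;

    next = rq->queue[0];
    for (i = 1; i < rq->nr_running; i++)
        rq->queue[i - 1] = rq->queue[i];
    rq->nr_running--;
    return next;
}
```

With an empty queue idle_demo_pick_next returns the idle task; as soon as another task is enqueued, that task is picked first.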

Finally, if you want a deeper understanding of the Linux kernel, here is a good gitbook.

answered Oct 12 '22 14:10

tyChen
