My program has an RSS of 65 GB. When it calls fork, sys_clone->dup_mm->copy_page_range takes more than 2 seconds. During that time one CPU is at 100% sys executing the fork, and one thread of the program cannot get any CPU time until the fork finishes. The machine has 16 CPUs, and the other CPUs are idle.
So my question is: while one CPU is busy with the fork, why doesn't the scheduler migrate the thread waiting on that CPU to another, idle CPU? In general, when and how does the scheduler migrate processes between CPUs?
I searched this site, and the existing threads do not answer my question.
RSS is 65 GB; when calling fork, sys_clone->dup_mm->copy_page_range takes more than 2 seconds
During fork (or clone), the VMAs of the existing process have to be copied into the VMAs of the new process. The dup_mm function (kernel/fork.c) creates the new mm and does the actual copy. It does not call copy_page_range directly, but I think the static function dup_mmap may be inlined into dup_mm, and dup_mmap does call copy_page_range.
dup_mmap takes several locks, both in the new mm and in the old oldmm:
    down_write(&oldmm->mmap_sem);
After taking the mmap_sem reader/writer semaphore, there is a loop over all mmaps to copy their metainformation:
    for (mpnt = oldmm->mmap; mpnt; mpnt = mpnt->vm_next)
Only after the loop (which is long in your case) is mmap_sem unlocked:
    out:
    up_write(&oldmm->mmap_sem);
While the mmap_sem rwlock is held by a writer, no other reader or writer can do anything with the mmaps in oldmm.
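To get a feel for how fork latency grows with resident memory, here is a minimal userspace sketch. It touches a buffer so that its pages are resident and then times a single fork call from the parent side. The name fork_latency_ms and the buffer sizes are illustrative assumptions, not part of the kernel code discussed above; absolute numbers depend on the kernel version, the page size, and whether transparent huge pages are in use.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: make `mb` MiB resident, then time one fork().
 * Returns the parent-side duration of fork() in milliseconds,
 * or -1.0 if the allocation or the fork fails. */
double fork_latency_ms(size_t mb)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    size_t len = mb << 20;
    char *buf = malloc(len);
    if (buf == NULL)
        return -1.0;

    /* Write one byte per page through a volatile pointer so the compiler
     * cannot drop the stores and every page really becomes resident. */
    for (size_t i = 0; i < len; i += (size_t)page)
        ((volatile char *)buf)[i] = 1;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    pid_t pid = fork();              /* the page tables are copied in here */
    if (pid == 0)
        _exit(0);                    /* child: leave immediately */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (pid < 0) {
        free(buf);
        return -1.0;
    }
    waitpid(pid, NULL, 0);           /* reap the child */
    free(buf);
    return (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```

Calling it with increasing sizes, for example fork_latency_ms(64) and then fork_latency_ms(1024), should show the time spent inside fork growing with the amount of resident memory.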
One thread cannot get CPU time until the fork finishes. So my question is: while one CPU is busy with the fork, why doesn't the scheduler migrate the thread waiting on that CPU to another, idle CPU?
Are you sure that the other thread is ready to run and is not trying to do something with its mmaps, for example changing its heap size (brk)?
Actually, the wait-cpu thread is my IO thread, which sends and receives packets from clients. In my observation the packets are always there, but the IO thread cannot receive them.
You should check the stack of your wait-cpu thread (there is even a SysRq for this) and what kind of I/O it does. mmap-ing a file is one variant of I/O that will be blocked on mmap_sem by fork.
You can also check the "last used CPU" of the wait-cpu thread, e.g. in the top monitoring utility, by enabling the thread view (H key) and adding the "Last used CPU" column to the output (fj in older versions; f, scroll to P, Enter in newer ones). I think it is possible that your wait-cpu thread was already on another CPU, just not allowed (not ready) to run.
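The same information that top shows can be read programmatically. The sketch below parses /proc/<pid>/task/<tid>/stat and extracts the scheduler state (field 3) and the CPU the thread last ran on (field 39, as described in proc(5)). A state of R means the thread is runnable, while D (uninterruptible sleep) is what I would typically expect for a thread blocked on a kernel lock such as mmap_sem. The function names are hypothetical helpers introduced for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read state (field 3) and last-used CPU (field 39)
 * of one thread from /proc. Returns 0 on success, -1 on any failure. */
int thread_state_and_cpu(pid_t pid, pid_t tid, char *state, int *cpu)
{
    assert(state != NULL && cpu != NULL);

    char path[64], line[1024];
    snprintf(path, sizeof path, "/proc/%ld/task/%ld/stat",
             (long)pid, (long)tid);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    if (ok == NULL)
        return -1;

    /* Field 2 (comm) is wrapped in parentheses and may contain spaces,
     * so parsing starts after the last ')'. */
    char *p = strrchr(line, ')');
    if (p == NULL)
        return -1;

    int field = 3;                   /* first token after ')' is field 3 */
    for (char *tok = strtok(p + 1, " \n"); tok != NULL;
         tok = strtok(NULL, " \n"), field++) {
        if (field == 3)
            *state = tok[0];
        if (field == 39) {
            *cpu = (int)strtol(tok, NULL, 10);
            return 0;
        }
    }
    return -1;                       /* fewer than 39 fields found */
}

/* Convenience wrapper for the calling process's main thread,
 * whose tid is equal to its pid. */
int main_thread_state_and_cpu(char *state, int *cpu)
{
    return thread_state_and_cpu(getpid(), getpid(), state, cpu);
}
```

Polling this for the wait-cpu thread while the fork is in progress would show whether it is runnable on the busy CPU or sleeping in the kernel on some other CPU.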
If you are using fork only to call exec, it can be useful to use vfork+exec (or just posix_spawn). vfork will suspend your process until the new process calls exec or exit (but it may not suspend your other threads, which is dangerous), but exec-ing may be faster than waiting for 65 GB of mmaps to be copied.
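As a minimal sketch of the posix_spawn variant, the helper below starts a program and waits for it without calling fork in the application code. The name spawn_and_wait is a hypothetical helper, and passing NULL for the file actions and attributes is a simplifying assumption; a real program may need both. As far as I know, glibc on Linux implements posix_spawn with a vfork-like clone, so the parent's page tables are not copied, but treat that as an assumption to verify on your system.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run argv[0] (looked up in PATH) with the given
 * arguments and wait for it. Returns the child's exit status, or -1 if
 * spawning or waiting failed or the child was killed by a signal. */
int spawn_and_wait(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid;
    /* NULL file actions and NULL attributes: inherit everything as is. */
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, with char *args[] = {"sh", "-c", "exit 3", NULL}, the call spawn_and_wait(args) returns 3.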