How to increase CPU frequency of newly spawned process

I've been working on a hobby project for a while (written in C), and it's still far from complete. It's very important that it be fast, so I recently did some benchmarking to verify that my approach to the problem isn't inefficient.

$ time ./old
real 1m55.92
user 0m54.29
sys 0m33.24

I redesigned parts of the program to remove unnecessary operations and to reduce cache misses and branch mispredictions. The wonderful Callgrind tool was showing me more and more impressive numbers. Most of the benchmarking was done without forking external processes.

$ time ./old --dry-run
real 0m00.75
user 0m00.28
sys 0m00.24

$ time ./new --dry-run
real 0m00.15
user 0m00.12
sys 0m00.02

Clearly I was at least doing something right. Yet running the program for real told a different story.

$ time ./new
real 2m00.29
user 0m53.74
sys 0m36.22

As you might have noticed, the time is mostly dependent on the external processes. I don't know what caused the regression. There's nothing really weird about it; just a traditional vfork/execve/waitpid done by a single thread, running the same programs in the same order.

Something had to be causing forking to be slow, so I made a small test (similar to the one below) that would only spawn the new processes and have none of the overhead associated with my program. Obviously this had to be the fastest.

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, const char **argv)
{
    static const char *const _argv[] = {"/usr/bin/md5sum", "test.c", 0};

    /* Send stdout to /dev/null so md5sum's output doesn't affect the timing. */
    int fd = open("/dev/null", O_WRONLY);
    dup2(fd, STDOUT_FILENO);
    close(fd);

    for (int i = 0; i < 100000; i++)
    {
        pid_t pid = vfork();
        int status;
        if (pid < 0)
            return 1; /* vfork failed */
        if (!pid)
        {
            /* Child: only exec or _exit is safe after vfork. */
            execve("/usr/bin/md5sum", (char*const*)_argv, environ);
            _exit(1);
        }
        waitpid(pid, &status, 0);
    }
    return 0;
}

$ time ./test
real 1m58.63
user 0m68.05
sys 0m30.96

I guess not.
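
To see whether the cost per spawn changes over the course of a run, each vfork/execve/waitpid round trip could be timed individually. The following is only a sketch: `spawn_and_time` is a hypothetical helper name, not part of the original test, and it measures wall-clock time with `CLOCK_MONOTONIC`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: spawn argv[0] once with vfork/execve, wait for it,
 * and store the elapsed wall-clock time in nanoseconds in *elapsed_ns.
 * Returns the child's exit status, or -1 if spawning or waiting failed
 * or the child did not exit normally. */
int spawn_and_time(char *const argv[], int64_t *elapsed_ns)
{
    struct timespec start, end;
    int status;

    assert(argv != NULL && argv[0] != NULL && elapsed_ns != NULL);

    clock_gettime(CLOCK_MONOTONIC, &start);
    pid_t pid = vfork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: only exec or _exit is safe after vfork. */
        execve(argv[0], argv, environ);
        _exit(127);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    clock_gettime(CLOCK_MONOTONIC, &end);

    *elapsed_ns = (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
                + (end.tv_nsec - start.tv_nsec);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling this in the loop and printing `elapsed_ns` for every spawn (to stderr, since stdout goes to /dev/null in the test) would show whether the early spawns are slower than the later ones.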

At this point I decided to vote performance for governor (that is, to switch the cpufreq scaling governor to "performance"), and times got better:

$ for i in 0 1 2 3 4 5 6 7; do sudo sh -c "echo performance > /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor";done
$ time ./test
real 1m03.44
user 0m29.30
sys 0m10.66

It seems like every new process gets scheduled on a separate core, and it takes a while for that core to switch to a higher frequency. I can't say why the old version ran faster. Maybe it was lucky. Maybe it (due to its inefficiency) caused the CPU to choose a higher frequency earlier.

A nice side effect of changing governor was that compile times improved too. Apparently compiling requires forking many new processes. It's not a workable solution though, as this program will have to run on other people's desktops (and laptops).
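
Reading the governor, unlike changing it, does not need root. As a rough sketch, a program could check what it is running under before deciding on a workaround. The helper names `read_first_line` and `read_governor` are made up for this example, and the path is the same sysfs file written to in the shell loop above; it will be missing on systems without cpufreq support.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the first line of a small text file (such as a
 * sysfs attribute) into buf, stripping the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
int read_first_line(const char *path, char *buf, size_t size)
{
    assert(path != NULL && buf != NULL && size > 0);

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *ok = fgets(buf, (int)size, f);
    fclose(f);
    if (!ok)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Read the scaling governor of one CPU into buf.
 * Returns 0 on success, -1 if the sysfs file is not available. */
int read_governor(int cpu, char *buf, size_t size)
{
    char path[128];
    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor", cpu);
    return read_first_line(path, buf, size);
}
```

The same `read_first_line` helper can be pointed at `scaling_cur_freq` in the same directory, which reports the current frequency in kHz, to watch how quickly a core ramps up while the test runs.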

The only way I found to improve the original times was to restrict the program (and child processes) to a single CPU by adding this code at the beginning:

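/* Needs #define _GNU_SOURCE and #include <sched.h>. */
/* Pin this process, and the children that inherit its mask, to CPU 0. */
cpu_set_t mask;
CPU_ZERO(&mask);
CPU_SET(0, &mask);
sched_setaffinity(0, sizeof(mask), &mask);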

This was actually the fastest, despite using the default "ondemand" governor:

$ time ./test
real 0m59.74
user 0m29.02
sys 0m10.67

Not only is it a hackish solution, but it doesn't work well in case the launched program uses multiple threads. There's no way for my program to know that.
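
A slightly less intrusive variant of the affinity hack, sketched below under assumptions, pins to whichever CPU the program is already running on instead of hard-coding CPU 0, and remembers the original mask so it can be restored afterwards. `pin_to_current_cpu` and `restore_affinity` are hypothetical names for this example. It does not solve the multithreading problem: a child inherits the single-CPU mask at the moment it is spawned.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Hypothetical helper: save the calling thread's current affinity mask in
 * *saved, then restrict it to the CPU it is running on right now.
 * Returns 0 on success, -1 on failure. */
int pin_to_current_cpu(cpu_set_t *saved)
{
    assert(saved != NULL);

    if (sched_getaffinity(0, sizeof *saved, saved) != 0)
        return -1;
    int cpu = sched_getcpu();
    if (cpu < 0)
        return -1;

    cpu_set_t one;
    CPU_ZERO(&one);
    CPU_SET(cpu, &one);
    return sched_setaffinity(0, sizeof one, &one);
}

/* Restore a mask previously saved by pin_to_current_cpu().
 * Returns 0 on success, -1 on failure. */
int restore_affinity(const cpu_set_t *saved)
{
    assert(saved != NULL);
    return sched_setaffinity(0, sizeof *saved, saved);
}
```

The idea would be to call `pin_to_current_cpu(&saved)` before the spawn loop and `restore_affinity(&saved)` once the last child has been reaped, so the rest of the program is not confined to one core.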

Does anyone have an idea for how to get the spawned processes to run at a high CPU clock frequency? It has to be automated and must not require superuser privileges. Though I've only tested this on Linux so far, I intend to port it to more or less all popular and unpopular desktop OSes (and it will also run on servers). Any idea on any platform is welcome.

asked May 26 '13 16:05

torso

1 Answer

CPU frequency is treated by most OSes as a system-wide property, so you can't change it without root rights. There is some research on extensions that would allow it to be adapted for specific programs; however, since the energy/performance model differs even within the same general architecture, you will hardly find a general solution.

In addition, be aware that in order to guarantee fairness, the Linux scheduler shares the execution time of the parent and child processes for the first epoch of the child. This might have an impact on your problem.

answered Oct 11 '22 14:10

Matthias

Tags: performance, linux, fork, scheduling