
Run an external command and get the amount of CPU it consumed

Tags:

python

Pretty simple: I'd like to run an external command/program from within a Python script, and once it has finished I'd also like to know how much CPU time it consumed.

Hard mode: running multiple commands in parallel must not cause inaccuracies in the CPU-time results.

asked Dec 15 '12 by Noah McIlraith

4 Answers

On UNIX: either (a) use the resource module (see also the answer by icktoofay), (b) use the time command and parse its output, or (c) use the /proc filesystem and parse the utime and stime fields out of /proc/[pid]/stat. The last of these is Linux-specific.
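For option (c), here is a minimal sketch, assuming Linux and the field layout documented in proc(5); the values are in clock ticks, and the file only exists while the process is still alive:

import os

def proc_cpu_seconds(pid):
    # /proc/[pid]/stat is a single line; field 2 (comm) is parenthesised and
    # may contain spaces, so split on the last ')' before splitting on spaces.
    with open("/proc/%d/stat" % pid) as f:
        rest = f.read().rpartition(")")[2].split()
    # Relative to the full line, utime is field 14 and stime is field 15,
    # which land at indices 11 and 12 of `rest`.
    utime, stime = int(rest[11]), int(rest[12])
    return (utime + stime) / float(os.sysconf("SC_CLK_TCK"))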

Example of using resource:

import subprocess, resource

# RUSAGE_CHILDREN accumulates usage over all terminated child processes,
# so take a snapshot before and after the call and use the difference.
usage_start = resource.getrusage(resource.RUSAGE_CHILDREN)
subprocess.call(["yourcommand"])
usage_end = resource.getrusage(resource.RUSAGE_CHILDREN)
cpu_time = usage_end.ru_utime - usage_start.ru_utime  # user CPU seconds

Note: it is not necessary to fork/execvp yourself; subprocess.call() or the other subprocess functions are fine here and much easier to use.

Note: you could run multiple commands from the same Python script simultaneously, using either subprocess.Popen or subprocess.call with threads, but resource won't return their correct individual CPU times; it will return the sum of their times in between the calls to getrusage. To get the individual times, run one little Python wrapper per command that times it as above (you could launch those wrappers from your main script; see the sketch below), or use the time method below, which works correctly with multiple simultaneous commands (time is basically just such a wrapper).
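A minimal sketch of such a wrapper (the file name cpu_wrapper.py is just an example): because each wrapper runs exactly one command, RUSAGE_CHILDREN only sees that command, and several wrappers can run in parallel without mixing up their totals:

# cpu_wrapper.py (example name): run one command and print its CPU time.
import subprocess, resource, sys

returncode = subprocess.call(sys.argv[1:])
usage = resource.getrusage(resource.RUSAGE_CHILDREN)
# Only this wrapper's own children are counted here, so several wrappers
# timing different commands in parallel do not disturb each other.
sys.stderr.write("cpu %f\n" % (usage.ru_utime + usage.ru_stime))
sys.exit(returncode)

From the main script you could then start each wrapper with something like subprocess.Popen([sys.executable, "cpu_wrapper.py", "yourcommand", "youroptions"], stderr=subprocess.PIPE) and read the reported figure from its stderr.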

Example of using time:

import subprocess

# time writes its report to stderr (and this needs an external time binary,
# e.g. /usr/bin/time, not the shell built-in), so capture stderr here.
proc = subprocess.Popen(["time", "yourcommand", "youroptions"],
                        stderr=subprocess.PIPE)
_, time_output = proc.communicate()
# parse time_output

On Windows: you need to use performance counters (a.k.a. "performance data helpers"). There is a C example of the underlying API. To get at them from Python, you can use one of two modules: win32pdh (part of pywin32; sample code available) or pyrfcon (cross-platform, also works on Unix; sample code available).

All of these methods actually meet the "hard mode" requirement above: they should be accurate even with multiple instances of different processes running on a busy system. They may not produce exactly the same results in that case as running a single process on an idle system, because process switching has some overhead, but they will be very close, since they ultimately get their data from the OS scheduler.

answered by Alex I


On platforms where it's available, the resource module may provide what you need. If you need to time multiple commands simultaneously, you may want to fork for each command you want to run and create the subprocess from the forked child, so that you get information for only that process. Here's one way you might do this:

import os, resource, struct, sys

def start_running(command):
    # One pipe carries the measured time back to the parent; the other
    # signals when the parent wants the result.
    time_read_pipe, time_write_pipe = os.pipe()
    want_read_pipe, want_write_pipe = os.pipe()
    runner_pid = os.fork()
    if runner_pid != 0:
        # Parent: return a function that, when called, collects the time.
        os.close(time_write_pipe)
        os.close(want_read_pipe)
        def finish_running():
            os.write(want_write_pipe, 'x')
            os.close(want_write_pipe)
            time = os.read(time_read_pipe, struct.calcsize('f'))
            os.close(time_read_pipe)
            time = struct.unpack('f', time)[0]
            return time
        return finish_running
    # Child ("runner"): fork again, exec the command in the grandchild, then
    # report the child's CPU time once the parent asks for it.
    os.close(time_read_pipe)
    os.close(want_write_pipe)
    sub_pid = os.fork()
    if sub_pid == 0:
        os.close(time_write_pipe)
        os.close(want_read_pipe)
        os.execvp(command[0], command)
    os.wait()
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    os.read(want_read_pipe, 1)
    os.write(time_write_pipe, struct.pack('f', usage.ru_utime))
    sys.exit(0)

You can then use it to run a few commands:

get_ls_time = start_running(['ls'])
get_work_time = start_running(['python', '-c', 'print (2 ** 512) ** 200'])

After that code has executed, both of those commands should be running in parallel. When you want to wait for them to finish and get the time they took to execute, call the function returned by start_running:

ls_time = get_ls_time()
work_time = get_work_time()

Now ls_time will contain the CPU time ls took to execute and work_time will contain the CPU time python -c "print (2 ** 512) ** 200" took to execute.

answered by icktoofay


You can do the timing within Python, but if all you want to know is the overall CPU consumption of your program, it is kind of silly to do it yourself. The best thing is to just use the GNU time program, which comes standard on most operating systems.
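For example, a minimal sketch of driving GNU time from Python with an explicit format string (this assumes /usr/bin/time is the GNU variant; the BSD one on macOS does not support -f, and "yourcommand" is the placeholder from the question):

import subprocess

# -f "%U %S" makes GNU time print user and system CPU seconds on its own
# line of stderr, after anything the command itself wrote there.
proc = subprocess.Popen(["/usr/bin/time", "-f", "%U %S", "yourcommand"],
                        stderr=subprocess.PIPE)
_, err = proc.communicate()
user_s, sys_s = map(float, err.splitlines()[-1].split())
cpu_seconds = user_s + sys_s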

answered by Swiss


Python's timeit module is very useful for benchmarking/profiling purposes. In addition, you can even call it from the command-line interface. To benchmark an external command, you would go like this:

>>> import timeit
>>> timeit.timeit("call(['ls','-l'])",setup="from subprocess import call",number=1) #number defaults to 1 million
total 16
-rw-rw-r-- 1 nilanjan nilanjan 3675 Dec 17 08:23 icon.png
-rw-rw-r-- 1 nilanjan nilanjan  279 Dec 17 08:24 manifest.json
-rw-rw-r-- 1 nilanjan nilanjan  476 Dec 17 08:25 popup.html
-rw-rw-r-- 1 nilanjan nilanjan 1218 Dec 17 08:25 popup.js
0.02114391326904297

The last line is the returned execution time. Here, the first argument to timeit.timeit() is the code that calls the external command, and the setup argument specifies the code to run before the timing starts. The number argument is the number of times you wish to run the specified code; you can then divide the returned time by number to get the average time per run.

You can also use the timeit.repeat() method, which takes the same arguments as timeit.timeit() plus an additional repeat argument specifying the number of times timeit.timeit() should be called; it returns a list with the execution time of each run.

Note: the execution time returned by timeit.timeit() is wall-clock time, not CPU time, so other processes running on the machine may interfere with the timing. For that reason, when using timeit.repeat() you should take the minimum value rather than trying to compute an average or a standard deviation.
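A minimal sketch of that pattern, reusing the ls -l call from above:

import timeit

# Three trials of five calls each; keep the fastest trial and average within it.
times = timeit.repeat("call(['ls', '-l'])",
                      setup="from subprocess import call",
                      repeat=3, number=5)
best_per_call = min(times) / 5  # wall-clock seconds per call, not CPU time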

answered by Nilanjan Basu