Estimating process energy usage on PCs (x86)

I'm trying to come up with a heuristic to estimate how much energy (say, in Joules) a process or a thread has consumed between two time points. This is on a PC (Linux/x86), not mobile, so the statistics will be used to compare the relative energy efficiency of computations that take similar wall-clock time.

The idea is to collect or sample hardware statistics such as the cycle counter, P-/C-states or dynamic frequency, bus accesses, etc., and come up with a reasonable formula for the energy used between measurements. What I'm asking is whether this is possible, and what such a formula might look like.
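For concreteness, the kind of per-interval measurement I have in mind would look something like the sketch below: read the CPU package energy counter at two points in time and take the difference. This assumes a fairly recent Intel (or AMD) CPU that exposes RAPL through the Linux powercap sysfs interface, and reading energy_uj typically needs root:

```python
# Rough sketch: read CPU package energy via the Linux powercap/RAPL interface.
# Assumes the intel_rapl (or equivalent) powercap driver is loaded; on current
# kernels energy_uj is usually only readable as root.
import time

RAPL = "/sys/class/powercap/intel-rapl:0"   # package-0 domain

def read_uj(path):
    with open(path) as f:
        return int(f.read())

def package_energy_joules(interval_s):
    """Energy consumed by the CPU package over interval_s, in joules."""
    wrap = read_uj(f"{RAPL}/max_energy_range_uj")
    e0 = read_uj(f"{RAPL}/energy_uj")
    time.sleep(interval_s)
    e1 = read_uj(f"{RAPL}/energy_uj")
    delta = (e1 - e0) % wrap          # handle counter wrap-around
    return delta / 1e6                # microjoules -> joules

if __name__ == "__main__":
    joules = package_energy_joules(1.0)
    print(f"package energy over 1 s: {joules:.2f} J (~{joules:.2f} W average)")
```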

Some challenges that come to mind:

1) Properly accounting for context switches to other processes (or threads). (A crude attribution sketch follows this list.)

2) Properly accounting for the energy used outside the CPU. If we assume negligible I/O, that means mostly RAM. How do the amount allocated and/or the access pattern affect energy usage? (That is, assuming I have a way to measure dynamic memory allocation to begin with, e.g., with a modified allocator.)

3) Using CPU time as an estimate gives only coarse-grained, often-wrong accounting, covers CPU energy alone, and assumes a fixed clock frequency. It includes, but doesn't account well for, time spent waiting on RAM.
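For challenge 1, the crudest heuristic I can think of is to split the measured package energy among processes in proportion to the CPU time each one consumed during the sampling window. A rough sketch, reusing a package-energy helper like the one above; the attribution rule is my own assumption and clearly mishandles shared, idle, and off-CPU power, which is exactly the part I'm unsure about:

```python
# Rough sketch: attribute the package energy measured over a window to one PID
# in proportion to its share of total CPU time in that window. This is a
# heuristic, not exact accounting: it ignores idle/base power, RAM and uncore
# energy, and anything the process causes outside its own timeslice.

def proc_cpu_ticks(pid):
    # utime + stime are the 14th and 15th fields of /proc/<pid>/stat;
    # split after ")" so a comm containing spaces doesn't shift the fields
    with open(f"/proc/{pid}/stat") as f:
        fields = f.read().rsplit(")", 1)[1].split()
    return int(fields[11]) + int(fields[12])

def total_cpu_ticks():
    # first line of /proc/stat: aggregate jiffies across all CPUs and states
    with open("/proc/stat") as f:
        return sum(int(x) for x in f.readline().split()[1:])

def estimate_process_energy(pid, interval_s, package_energy_joules):
    """package_energy_joules: a callable like the RAPL sketch above."""
    p0, t0 = proc_cpu_ticks(pid), total_cpu_ticks()
    joules = package_energy_joules(interval_s)
    p1, t1 = proc_cpu_ticks(pid), total_cpu_ticks()
    share = (p1 - p0) / max(t1 - t0, 1)
    return joules * share
```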

asked Dec 19 '10 by Eitan

2 Answers

You may be able to get a figure for the power consumption of your process, but it will only be correct in isolation. For example, if you ran two processes in parallel, you're unlikely to fit a straight line with good accuracy.

This is hard enough to do on embedded platforms with a complete break-out of every voltage rail, let alone on a PC where your one data point is the wattage from the outlet. Things you'll need to measure and bear in mind:

  • Base load ain't so base. A system idle for many seconds will be in a deeper sleep state than one which isn't. Do you measure 'deep' sleep or just idle? How do you know which you're measuring?
  • Load isn't always linear. Variable voltage: some components shift voltage up/down depending on load and frequency. Temperature: can go either way these days (not just thermal runaway).
  • Power supplies aren't equally efficient at all loads. If you're measuring outlet wattage, you need to bear this in mind. For example, a PSU could be 50% efficient below 100 W, 90% from 100-300 W, and back down to 80% above 300 W.
  • Additional processes won't necessarily add linearly. For example, once DDR is out of idle, its base load increases, but additional processes won't make that any worse. This is even more unpredictable with multiple cores and variable frequencies.

The basic way to measure it is the obvious way: record the wattage at idle, record the wattage under load, subtract. You can then run at 25%, 50%, 75% duty cycle and so on to draw a graph (linear or otherwise), which will show up any non-linearity. Unfortunately, conversion efficiency vs. load for both the CPU voltage regulator and the PSU will be the dominant source of non-linearity. There's not much you can do to eliminate that without a development version of the motherboard you're playing with (unlikely), or a PSU that comes with a graph of efficiency vs. load.
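To sweep those duty cycles, something as dumb as a busy/sleep loop per core is enough. A rough single-threaded sketch (the 100 ms period and the duty steps are arbitrary choices):

```python
# Rough sketch: generate CPU load at a chosen duty cycle so wattage can be
# plotted against load. Single-threaded; run one instance per core for more.
import time

def burn(duty=0.5, period_s=0.1, duration_s=60.0):
    """Busy-spin for duty*period, sleep the rest, for duration_s seconds."""
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        busy_until = time.monotonic() + duty * period_s
        while time.monotonic() < busy_until:
            pass                      # busy wait
        time.sleep((1.0 - duty) * period_s)

if __name__ == "__main__":
    for duty in (0.25, 0.50, 0.75, 1.00):
        print(f"running at {duty:.0%} duty cycle; note the wattmeter reading")
        burn(duty=duty, duration_s=30.0)
```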

However, it's important to realize that these data points are only correct in isolation. You can do a pretty good job of modeling how these things will sum up in the system, but be very aware that it's only a good approximation at best. Think of it as being equivalent to looking at some C code for an audio codec and estimating how fast it'll run. You can get a good general idea, but expect to be wildly inaccurate when measured in reality.

Edit - Expanding a little as the above doesn't really answer how you might go about it.

Measuring power consumption: get yourself an accurate wattage meter. As I mentioned, unless you have a way to break out the individual voltage rails and measure current, the only measurement you can make is at the outlet. Alternatively, if you have access to the health monitoring status on the motherboard, and that has current (amps) reporting (rare), that can give you good accuracy and fast response times.

So, measure base wattage - pick whatever situation you think of as "base". Run your test, and measure "peak". Subtract; done. Yes, that's fairly obvious. If the difference is so small that it's lost in the noise, try measuring energy usage over time instead (e.g. kWh): measure an hour at idle vs. an hour with your process running flat out, and compare the total energy. Repeat for each kind of test you want to perform.
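If you'd rather script the integration than trust the meter's kWh display, summing periodic wattage samples into joules is trivial. In the sketch below, read_watts() is a placeholder for whatever query interface your particular meter actually provides:

```python
# Rough sketch: integrate periodic power readings (watts) into energy (joules).
# read_watts() is a placeholder for your wattmeter's actual interface.
import time

def read_watts():
    raise NotImplementedError("replace with your meter's query, e.g. serial/USB")

def measure_energy(duration_s, sample_s=1.0):
    """Riemann-sum the power samples: joules = sum(watts * dt)."""
    joules = 0.0
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        joules += read_watts() * sample_s
        time.sleep(sample_s)
    return joules

# energy difference = measure_energy(3600) with the workload running
#                   - measure_energy(3600) at idle
# kWh = joules / 3.6e6
```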

You will get noticeable wattage differences for heavy CPU, DDR and GPU users. You might notice the difference between L1 vs L2 vs DDR constrained algorithms (DDR uses much more power), if you're careful to note that the L1/L2 constrained algorithms are running faster - you need to account for energy used per "task" not continuous power. You probably won't notice hard disk access (it's actually just a watt or two and lost in the noise in a PC) other than the performance hit. One extra data point worth recording is how much "base" load increases if you have a task waking up every 100ms or so, using 1% of CPU. That's basically what non-deep-sleep idle looks like. (This is a hack and 100ms is a guess)

Beware that 1% load at one time may not cost the same as 1% at another time if your CPU has frequency-scaling policies enabled.

One final big note: it's of course energy you should be measuring, just as you titled the question. It's very easy to make the mistake of benchmarking power consumption of one task vs another and to conclude one is more expensive... if you forget about the relative performance of them. This always happens with bad tech journalists benchmarking hard disk vs SSD, for example.
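A made-up example of why energy per task, not power, is the right comparison:

```python
# Illustration with made-up numbers: lower power does not mean less energy.
task_a = {"watts": 80.0, "seconds": 10.0}   # faster but hungrier
task_b = {"watts": 60.0, "seconds": 20.0}   # "low power" but slower

for name, t in (("A", task_a), ("B", task_b)):
    print(f"task {name}: {t['watts'] * t['seconds']:.0f} J")
# task A: 800 J, task B: 1200 J -- the lower-wattage task costs more energy
```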

On embedded platforms with current monitoring across many rails, I've done measurements down to nanojoules per instruction. It's still difficult to account for energy usage by thread/process because there's a lot of load that's shared by many tasks, and it can increase/decrease outside of its timeslice. On a PC, I'm not sure you'll manage to get as fine grained as that :)

answered by John Ripley


This is a topic of ongoing research, so don't expect any definitive answers. Some publications you might find interesting:

  • Chunling Hu, Daniel A. Jiménez, and Ulrich Kremer, "Efficient Program Power Behavior Characterization," Proceedings of the 2007 International Conference on High Performance Embedded Architectures & Compilers (HiPEAC 2007), pp. 183-197, January 2007.

  • Adam Lewis, Soumik Ghosh, and N.-F. Tzeng, "Run-time Energy Consumption Estimation Based on Workload in Server Systems," USENIX Workshop on Power Aware Computing and Systems (HotPower '08), 2008.

But you can easily find many more using Google Scholar and Citeseer.

answered by Mackie Messer