Is there a quick/easy way to do this (for at least a rough estimate) ? I'm benchmarking algorithms and I thought it would be cool to know the absolute speed at which my computer is executing instructions and compare that with my asymptotic analysis.

If you want to know what your CPU can do, then look at the documentation. Your CPU vendor specifies the latency and throughput of all instructions, as well as a variety of other information (how many instructions can be issued or retired per cycle, cache latencies and much more). Based on this, you can compute the theoretical peak throughput. If you want to do what your CPU is actually doing then run your own code and measure its performance. However, keep in mind that modern CPUs are really complex beasts, and their performance depends on a wide variety of factors, and you will very rarely be able to come anywhere near maxing out your CPU, and understanding why, or what exactly is holding your code back requires a fairly thorough understanding of the hardware. (My usual rule of thumb is that you're doing very good if you get a sustained 30-40% of the theoretical peak FLOPS)

Modern CPUs are pipelining instruction processing, so there is no constant as such. You could however, read out number of CPU ticks at start of your algo and at the end. I think this is as low level as you can get with such measuring. http://en.wikipedia.org/wiki/Time_Stamp_Counter Note: There are a lot of problems why this won't be 100% accurate, I can mention few, but I am sure the community will be able to add to the list: -OS pre-empting you process -cache misses (algo will run slower the first time, faster if it's ran subsequently) -on older CPUs, the CPU ticks are not invariant to the CPU frequency

How can you test how many instructions per second your computer can do?

4 Answers

If you want to know what your CPU can do, then look at the documentation. Your CPU vendor specifies the latency and throughput of all instructions, as well as a variety of other information (how many instructions can be issued or retired per cycle, cache latencies and much more). Based on this, you can compute the theoretical peak throughput.

If you want to do what your CPU is actually doing then run your own code and measure its performance.

However, keep in mind that modern CPUs are really complex beasts, and their performance depends on a wide variety of factors, and you will very rarely be able to come anywhere near maxing out your CPU, and understanding why, or what exactly is holding your code back requires a fairly thorough understanding of the hardware. (My usual rule of thumb is that you're doing very good if you get a sustained 30-40% of the theoretical peak FLOPS)

142

answered Oct 23 '22 16:10

jalf

This is a typical case of "In theory, theory and practice is the same, in practice they are not".

Modern CPU's have very sophisticated logic in them, which means that the ACTUAL number of operations performed is different from what you'd think from just looking at code or thinking about the problem [unless you have a brain the size of a small planet and know how that particular CPU works]. For example, a processor may speculatively execute instructions on one or another side of a branch, even if it hasn't quite got to the branch - if that's the "wrong" side, then it will discard the results of those instructions - but of course it took time to execute them.

Instructions are also executed out of order, which means that it's hard to predict exactly which instruction will execute when. There are some exceptions.

You will only get (anywhere near) the theoretical throughput if you are pushing data and instructions through all the available execution units at once - this means having the right mix of instructions, and of course ALL of the code and data in caches.

So, in theory we could stuff the processor full of instructions that maxes it out, by writing very clever code. In practice, that turns very very very quickly into a hard task.

However, the question is about measuring the throughput of instructions, and on modern CPU's, this is very much possible with the right extra software. On linux perftool or oprofile, for windows there is Intel's VTune and AMD's Code Analyst. These will allow you (subject to sufficient privileges) to fetch the "performance counters" in the processor, which has counters for "number of instructions", "number of float operatons", "number of cache misses", "branch mispredicted" and many, many other measurements of processor performance. So given a sufficient length of runtime (at least a few seconds, preferrably more), you can measure the actual count or clock cycles that a processor performs.

answered Oct 23 '22 16:10

Mats Petersson

In practice these days, the effective number of instructions depends primarily on memory latency, which is the main bottleneck on performance. Waiting for data is bad. Processors can alleviate this problem somewhat with techniques such as caching, pipelining and concurrency, but the issue remains and will only get worse over time.

Proper implementation can make a huge difference. You may want to check out this question about cache-friendly code.

answered Oct 23 '22 18:10

Marc Claesen

Modern CPUs are pipelining instruction processing, so there is no constant as such.

You could however, read out number of CPU ticks at start of your algo and at the end. I think this is as low level as you can get with such measuring.

http://en.wikipedia.org/wiki/Time_Stamp_Counter

Note: There are a lot of problems why this won't be 100% accurate, I can mention few, but I am sure the community will be able to add to the list: -OS pre-empting you process -cache misses (algo will run slower the first time, faster if it's ran subsequently) -on older CPUs, the CPU ticks are not invariant to the CPU frequency

answered Oct 23 '22 17:10

Patrik Beck

Related questions
                            
                                Using Robot Framework for ATDD
                            
                                How do I properly implement a "minimize to tray" function in Qt?
                            
                                How to use a C++ string in a structure when malloc()-ing the same structure?
                            
                                std::vector iterator invalidation
                            
                                Why can we delete arrays, but not know the length in C/C++?
                            
                                why call operator new explicitly
                            
                                Easiest way to build a cross-platform application
                            
                                Replacement for MS _vscprintf on MacOS/Linux?
                            
                                QSQLITE driver not loaded - where to put qt database driver plugins
                            
                                Usage of union inside a class
                            
                                How to use high and low bytes?
                            
                                Is it not possible to call C++ operators manually?
                            
                                How to get screenshot of a window as bitmap object in C++?
                            
                                Read floats from a .txt file
                            
                                How to align face images c++ opencv
                            
                                How can I execute two threads asynchronously using boost?
                            
                                Avoid newline in qDebug()
                            
                                Why does chrono have its own namespace?
                            
                                How to run application with Admin privileges using Manifest file in Visual Studio 2005?
                            
                                Why will std::sort crash if the comparison function is not as operator <?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How can you test how many instructions per second your computer can do?

Tags:

c++

c

algorithm

sorting

benchmarking

ordinary

People also ask