
How to compare performance of two pieces of codes

I have a friendly competition with a couple of guys in the field of programming, and recently we have become very interested in writing efficient code. Our challenge is to optimize code (in terms of CPU time and complexity) at any cost (readability, reusability, etc.).

The problem is, now we need to compare our code and see which approach is better compared to the others, but we don't know of any tools for this purpose.

My question is: are there any tools that take a piece of code as input and calculate the number of FLOPs or CPU instructions needed to run it? Is there any tool that can measure how optimal a piece of code is?

P.S. The target language is C++, but it would be nice to know whether such tools also exist for Java.

Pouya, asked Sep 09 '12



3 Answers

Here's a little C++11 stopwatch I like to roll out when I need to time something:

#include <chrono>
#include <ctime>

// Generic stopwatch: T is a clock type (a std::chrono clock, or the c_clock wrapper below).
template <typename T> class basic_stopwatch
{
    typedef T clock;
    typename clock::time_point p;   // time of the last tick()
    typename clock::duration   d;   // accumulated elapsed time

public:
    void tick()  { p  = clock::now();            }   // start (or resume) timing
    void tock()  { d += clock::now() - p;        }   // stop timing and accumulate
    void reset() { d  = clock::duration::zero(); }

    template <typename S> unsigned long long int report() const
    {
        return std::chrono::duration_cast<S>(d).count();
    }

    unsigned long long int report_ms() const
    {
        return report<std::chrono::milliseconds>();
    }

    basic_stopwatch() : p(), d() { }
};

// Adapter that makes std::clock() look like a chrono-style clock, so CPU time
// (rather than wall-clock time) can be measured through the same interface.
struct c_clock
{
    typedef std::clock_t time_point;
    typedef std::clock_t duration;
    static time_point now() { return std::clock(); }
};

// std::clock_t holds raw ticks, so convert to milliseconds via CLOCKS_PER_SEC.
template <> unsigned long long int basic_stopwatch<c_clock>::report_ms() const
{
    return 1000. * double(d) / double(CLOCKS_PER_SEC);
}

typedef basic_stopwatch<std::chrono::high_resolution_clock> stopwatch;
typedef basic_stopwatch<c_clock> cstopwatch;

Usage:

stopwatch sw;
sw.tick();

run_long_code();

sw.tock();
std::cout << "This took " << sw.report_ms() << "ms.\n";

On any decent implementation, the default high_resolution_clock should give very accurate timing information.
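
If you want to check what your implementation actually provides, here is a small sketch (my addition, not part of the original answer) that prints the clock's nominal tick period and whether it is steady:

#include <chrono>
#include <iostream>

int main()
{
    typedef std::chrono::high_resolution_clock hrc;
    // Nominal tick period in seconds, expressed as the ratio period::num / period::den.
    std::cout << "tick period: " << hrc::period::num << "/" << hrc::period::den << " s\n";
    std::cout << "steady: " << (hrc::is_steady ? "yes" : "no") << "\n";
}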

Kerrek SB, answered Oct 21 '22


There is the std::clock() function from <ctime>, which returns how much CPU time was spent on the current process (meaning it doesn't count the time the program was idle while the CPU was executing other tasks). This function can be used to measure the execution time of an algorithm fairly accurately. Use the constant CLOCKS_PER_SEC (also from <ctime>) to convert the return value into seconds.
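
A minimal sketch of that approach (the do_work function is just a placeholder workload I made up; substitute the code you want to measure):

#include <cstdio>
#include <ctime>

// Placeholder workload; volatile keeps the compiler from optimizing the loop away.
static volatile long sink = 0;
void do_work()
{
    for (long i = 0; i < 100000000L; ++i)
        sink = sink + i;
}

int main()
{
    std::clock_t start = std::clock();
    do_work();
    std::clock_t end = std::clock();

    // CLOCKS_PER_SEC (a macro from <ctime>) converts clock ticks to seconds of CPU time.
    std::printf("CPU time: %.3f s\n", double(end - start) / CLOCKS_PER_SEC);
}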

Philipp, answered Oct 21 '22


Using inline assembly, you can issue the rdtsc instruction to read the CPU's timestamp counter: the lower 32 bits go into eax and the upper 32 bits into edx. If the code you are measuring is short, the eax register alone gives an approximate cycle count; once the count exceeds the 32-bit maximum, edx is incremented for each wraparound of the lower half.

int cpu_clk1a=0;
int cpu_clk1b=0;
int cpu_clk2a=0;
int cpu_clk2b=0;
int max=0;
std::cin>>max; //loop limit

__asm
{
    push eax
    push edx
    rdtsc    //gets current cpu-clock-counter into eax&edx
    mov [cpu_clk1a],eax
    mov [cpu_clk1b],edx
    pop edx
    pop eax

}

long temp=0;
for(int i=0;i<max;i++)
{

    temp += clock(); // needed to defeat optimization so there is actually
                     // something to measure; even the smartest compiler
                     // cannot know what the clock value will be
}

__asm
{
    push eax
    push edx
    rdtsc     //gets current cpu-clock-counter into eax & edx
    mov [cpu_clk2a],eax
    mov [cpu_clk2b],edx
    pop edx
    pop eax

}
std::cout<<(cpu_clk2a-cpu_clk1a)<<std::endl;
   //if your loop takes more than ~2 billion cpu-clocks, use cpu_clk1b and cpu_clk2b
getchar();
getchar();

Output: 74000 CPU cycles for 1000 iterations and 800000 CPU cycles for 10000 iterations on my machine, because clock() is itself time-consuming.

The CPU-cycle resolution on my machine is about 1000 cycles. Yes, you need more than several thousand additions/subtractions (fast instructions) to measure them with reasonable accuracy.

Assuming the CPU's working frequency is constant, 1000 CPU cycles is roughly equal to 1 microsecond on a 1 GHz CPU. You should warm the CPU up before doing this measurement.
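
If the measured region does run longer than ~2 billion cycles, the two 32-bit halves have to be combined into a 64-bit count. A small sketch of that arithmetic (my addition, using the variable names from the code above; the casts to unsigned int avoid sign extension since the variables are declared as int):

unsigned long long start = ((unsigned long long)(unsigned int)cpu_clk1b << 32)
                         | (unsigned int)cpu_clk1a;
unsigned long long end   = ((unsigned long long)(unsigned int)cpu_clk2b << 32)
                         | (unsigned int)cpu_clk2a;
std::cout << (end - start) << " cpu-cycles" << std::endl;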

huseyin tugrul buyukisik, answered Oct 21 '22