I have the following test program with a simple function that finds primes which I am trying to run in multiple threads (just as an example).
#include <cstdio>
#include <iostream>
#include <ctime>
#include <thread>

void primefinder(void)
{
    int n = 300000;
    int i, j;
    int lastprime = 0;
    for(i = 2; i <= n; i++) {
        for(j = 2; j <= i; j++) {
            if((i % j) == 0) {
                if(i == j)
                    lastprime = i;
                else {
                    break;
                }
            }
        }
    }
    std::cout << "Prime: " << lastprime << std::endl;
}

int main(void)
{
    std::clock_t start;
    start = std::clock();
    std::thread t1(primefinder);
    t1.join();
    std::cout << "Time: " << (std::clock() - start) / (double)(CLOCKS_PER_SEC / 1000) << " ms" << std::endl;
    start = std::clock();
    std::thread t2(primefinder);
    std::thread t3(primefinder);
    t2.join();
    t3.join();
    std::cout << "Time: " << (std::clock() - start) / (double)(CLOCKS_PER_SEC / 1000) << " ms" << std::endl;
    return 0;
}
As shown, I run the function once in 1 thread and then once in each of 2 separate threads. I compile it with g++ using -O3 and -pthread. I am running it on Linux Mint 18 on a Core i5-4670. I know scheduling comes down to the OS, but I would very much expect these threads to run more or less in parallel. When I run the program, top shows 100% CPU when using 1 thread and 200% CPU when using 2 threads. Despite this, the second run takes almost exactly twice as long.
The CPU is doing nothing else while running the program. Why doesn't this get executed in parallel?
Edit: I know both threads are doing the exact same thing. I chose the primefinder function simply as an example of something embarrassingly parallel, so when I run it in multiple threads it should take just as long in real time.
Using std::clock to time a parallel program in C++ is very deceptive. There are two types of time that we care about when timing a program: wall time and CPU time. Wall time is what we are all used to (think clock on a wall). CPU time is essentially how much processor time your program used, summed over all of its threads. std::clock returns CPU time (this is why you are dividing by CLOCKS_PER_SEC), and CPU time is only equal to wall time when there is one thread of execution. If a program can be run 100% in parallel (like yours), then CPU time = (number of threads) * (wall time). So seeing the two-thread run report almost exactly twice the time is just what you would expect.
For measuring wall time (which is what you want to do), I don't know of a way to do that using the STL. You can measure it using OpenMP or Boost.
omp_get_wtime()
Boost Timer
Since you are on Linux, the version of g++ that you are using more than likely has OpenMP support built in.
#include <omp.h>
const double startTime = omp_get_wtime();
..... //Work goes here
const double time = omp_get_wtime() - startTime;
You will have to compile with -fopenmp.
EDIT:
As johnbakers pointed out, the chrono library does have a wall clock
#include <chrono>
auto start = std::chrono::system_clock::now();
.... //Do some work
auto end = std::chrono::system_clock::now();
std::chrono::duration<double> diff = end - start;
std::cout << "Time: " << diff.count() << "(s)" << std::endl;
Output of that vs. boost timer:
Boost: 121.685972s wall, 724.940000s user + 67.660000s system = 792.600000s CPU (651.3%)
Chrono: 121.683(s)
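To make the wall time vs. CPU time difference concrete, here is a minimal sketch that records both around a group of threads. The names Timing, count_primes and time_threads are hypothetical helpers made up for this illustration, not part of any library. It uses std::chrono::steady_clock rather than system_clock, on the assumption that a monotonic clock is the safer choice for measuring intervals, and it assumes a platform such as Linux where std::clock reports process CPU time.

```cpp
#include <chrono>
#include <ctime>
#include <thread>
#include <vector>

// Hypothetical result type: both measurements in milliseconds.
struct Timing {
    double wall_ms;  // elapsed real time
    double cpu_ms;   // processor time as reported by std::clock
};

// Hypothetical CPU-bound work: counts the primes in [2, n] by trial division.
int count_primes(int n) {
    int count = 0;
    for (int i = 2; i <= n; ++i) {
        bool is_prime = true;
        for (int j = 2; j * j <= i; ++j) {
            if (i % j == 0) { is_prime = false; break; }
        }
        if (is_prime) ++count;
    }
    return count;
}

// Runs `work` once in each of `num_threads` threads and measures the
// whole group with both a wall clock and std::clock.
template <typename Work>
Timing time_threads(unsigned num_threads, Work work) {
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        threads.emplace_back(work);  // each thread gets its own copy of work
    }
    for (auto& th : threads) {
        th.join();
    }

    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    return Timing{
        std::chrono::duration<double, std::milli>(wall_end - wall_start).count(),
        1000.0 * static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC};
}
```

Calling time_threads(1, work) and then time_threads(2, work) with the same CPU-bound work (for example a lambda that calls count_primes(300000) and stores the result in a volatile variable so the loop is not optimized away) should, on an otherwise idle multi-core machine, give a similar wall_ms for both runs while cpu_ms roughly doubles for the second. As in the question, compile with -pthread.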