
Why is my C++ code so much slower than R?

Tags: c++, r

I have written the following code in R and C++; both versions perform the same algorithm:

a) Simulate the random variable X 500 times. (X takes the value 0.9 with probability 0.5 and 1.1 with probability 0.5.)

b) Multiply these 500 simulated values together to get a single value, and save that value in a container.

c) Repeat 10000000 times so that the container holds 10000000 values.

R:

ptm <- proc.time()
steps <- 500
MCsize <- 10000000
a <- rbinom(MCsize,steps,0.5)
b <- rep(500,times=MCsize) - a
result <- rep(1.1,times=MCsize)^a*rep(0.9,times=MCsize)^b
proc.time()-ptm

C++:

#include <numeric>
#include <vector>
#include <iostream>
#include <random>
#include <thread>
#include <mutex>
#include <cmath>
#include <algorithm>
#include <chrono>

const size_t MCsize = 10000000;
std::mutex mutex1;
std::mutex mutex2;
unsigned seed_;
std::vector<double> cache;

void generatereturns(size_t steps, int RUNS){
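    // Derive a per-thread seed from the shared seed_ while holding mutex2.
    // A lock_guard releases the mutex exactly once, even if an exception is thrown.
    unsigned localSeed;
    {
        std::lock_guard<std::mutex> seedGuard(mutex2);
        std::mt19937 tmpgenerator(seed_);
        seed_ = tmpgenerator();
        localSeed = seed_;
        std::cout << "SEED : " << seed_ << std::endl;
    }

    // Creating generator (seeded from the local copy, so seed_ is never read unlocked)
    std::binomial_distribution<int> distribution(steps,0.5);
    std::mt19937 generator(localSeed);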

    for(int i = 0; i!= RUNS; ++i){
        double power;
        double returns;
        power = distribution(generator);
        returns = pow(0.9,power) * pow(1.1,(double)steps - power);
        std::lock_guard<std::mutex> guard(mutex1);
        cache.push_back(returns);
    }
}    


int main(){
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    size_t steps = 500;
    seed_ = 777;    

    unsigned concurentThreadsSupported = std::max(std::thread::hardware_concurrency(),(unsigned)1);
    int remainder = MCsize % concurentThreadsSupported;

    std::vector<std::thread> threads;
    // starting sub-thread simulations
    if(concurentThreadsSupported != 1){
        for(unsigned i = 0 ; i != concurentThreadsSupported - 1; ++i){
            if(remainder != 0){
                threads.push_back(std::thread(generatereturns,steps,MCsize / concurentThreadsSupported + 1));
                remainder--;
            }else{
                threads.push_back(std::thread(generatereturns,steps,MCsize / concurentThreadsSupported));
            }
            }
        }
    }

    //starting main thread simulation
    if(remainder != 0){
        generatereturns(steps, MCsize / concurentThreadsSupported + 1);
        remainder--;
    }else{
        generatereturns(steps, MCsize / concurentThreadsSupported);
    }

    for (auto& th : threads) th.join();

    std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now() ;
    typedef std::chrono::duration<int,std::milli> millisecs_t ;
    millisecs_t duration( std::chrono::duration_cast<millisecs_t>(end-start) ) ;
    std::cout << "Time elapsed : " << duration.count() << " milliseconds.\n" ;

    return 0;
}

I can't understand why my R code is so much faster than my C++ code (3.29 s vs 12 s), even though the C++ code uses four threads. Can anyone enlighten me, please? How should I improve my C++ code to make it run faster?

EDIT:

Thanks for all the advice! I pre-sized my vectors and reduced the amount of locking in my code. The crucial update to the generatereturns() function is:

std::vector<double> cache(MCsize);
std::vector<double>::iterator currit = cache.begin();
//.....

// Creating generator
std::binomial_distribution<int> distribution(steps,0.5);
std::mt19937 generator(seed_);
std::vector<double> tmpvec(RUNS);
for(int i = 0; i!= RUNS; ++i){
    double power;
    double returns;
    power = distribution(generator);
    returns = pow(0.9,power) * pow(1.1,(double)steps - power);
    tmpvec[i] = returns;
}
std::lock_guard<std::mutex> guard(mutex1);
std::move(tmpvec.begin(),tmpvec.end(),currit);
currit += RUNS;

Instead of locking on every iteration, I now fill a temporary vector and then use std::move to shift the elements of tmpvec into cache under a single lock. The elapsed time has dropped to 1.9 seconds.
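A further variation, offered only as a sketch: if every thread is handed its own slice of the pre-sized result vector, no two threads ever write to the same element, so the mutex around the copy is not needed at all. The names simulate_chunk and simulate_all, and the base_seed + t seeding scheme, are assumptions made for illustration; they are not part of the code above, and I have not timed this version.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <thread>
#include <vector>

// Fill out[0..n) with simulated products; each call owns its own generator.
void simulate_chunk(double* out, std::size_t n, int steps, unsigned seed) {
    std::mt19937 generator(seed);
    std::binomial_distribution<int> distribution(steps, 0.5);
    for (std::size_t i = 0; i != n; ++i) {
        int power = distribution(generator);
        out[i] = std::pow(0.9, power) * std::pow(1.1, steps - power);
    }
}

// Split `total` draws over `nthreads` workers that write to disjoint slices.
std::vector<double> simulate_all(std::size_t total, int steps,
                                 unsigned nthreads, unsigned base_seed) {
    if (nthreads == 0) nthreads = 1;
    std::vector<double> cache(total);
    std::vector<std::thread> threads;
    const std::size_t chunk = total / nthreads;
    const std::size_t remainder = total % nthreads;
    std::size_t offset = 0;
    for (unsigned t = 0; t != nthreads; ++t) {
        // The first `remainder` workers take one extra element each.
        const std::size_t n = chunk + (t < remainder ? 1 : 0);
        // Hypothetical seeding scheme: one distinct seed per worker.
        threads.emplace_back(simulate_chunk, cache.data() + offset, n, steps,
                             base_seed + t);
        offset += n;
    }
    for (auto& th : threads) th.join();
    return cache;
}
```

A call such as simulate_all(10000000, 500, 4, 777u) would correspond to the setup in the question. Because the slices are disjoint and the vector is sized before any thread starts, the only synchronization left is the join at the end.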

user22119 asked Sep 30 '22 21:09


2 Answers

First of all, are you running it in release mode? Switching from debug to release reduced the running time from ~15 s to ~4.5 s on my laptop (Windows 7, i5-3210M).

Also, reducing the number of threads from 4 to 2 (I have only 2 cores, with hyperthreading) further reduced the running time to ~2.4 s.

Changing the variable power to int (as jimifiki also suggested) also offered a slight boost, reducing the time to ~2.3s.
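Once power is an int, it can only take the values 0 to steps, so one possible next step is to precompute every result once and index into a table inside the loop. This is a swapped-in technique that I have not timed, so treat it as a sketch; build_return_table and simulate_with_table are names I made up for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// table[k] = 0.9^k * 1.1^(steps - k) for every possible draw k in [0, steps].
std::vector<double> build_return_table(int steps) {
    std::vector<double> table(static_cast<std::size_t>(steps) + 1);
    for (int k = 0; k <= steps; ++k)
        table[static_cast<std::size_t>(k)] =
            std::pow(0.9, k) * std::pow(1.1, steps - k);
    return table;
}

// Single-threaded simulation that looks each result up instead of calling pow.
std::vector<double> simulate_with_table(int steps, std::size_t runs,
                                        unsigned seed) {
    const std::vector<double> table = build_return_table(steps);
    std::mt19937 generator(seed);
    std::binomial_distribution<int> distribution(steps, 0.5);
    std::vector<double> results(runs);
    for (std::size_t i = 0; i != runs; ++i) {
        const int power = distribution(generator);
        results[i] = table[static_cast<std::size_t>(power)];
    }
    return results;
}
```

With steps = 500 the table holds 501 doubles, so it is cheap to build relative to ten million draws.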

George answered Oct 17 '22 10:10


I really enjoyed your question and tried the code at home. I also tried changing the random number generator: my implementation of std::binomial_distribution requires, on average, about 9.6 calls to generator() per sample.
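If you want to check that figure on your own standard library, one way (a sketch, not necessarily how I measured it) is to wrap the engine in a small counting adaptor. CountingEngine and average_calls_per_draw are names invented for this example, and the number you get depends on your implementation.

```cpp
#include <cstdint>
#include <random>

// Wraps std::mt19937 and counts how often the distribution asks it for bits.
struct CountingEngine {
    using result_type = std::mt19937::result_type;
    std::mt19937 engine;
    std::uint64_t calls = 0;

    explicit CountingEngine(unsigned seed) : engine(seed) {}
    static constexpr result_type min() { return std::mt19937::min(); }
    static constexpr result_type max() { return std::mt19937::max(); }
    result_type operator()() { ++calls; return engine(); }
};

// Average number of engine calls per binomial draw (implementation-dependent).
double average_calls_per_draw(int steps, int draws, unsigned seed) {
    CountingEngine counting(seed);
    std::binomial_distribution<int> distribution(steps, 0.5);
    for (int i = 0; i != draws; ++i) distribution(counting);
    return static_cast<double>(counting.calls) / draws;
}
```

Calling average_calls_per_draw(500, 1000000, 777u) gives the average for the parameters in the question.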

I know the question is more about comparing R and C++ performance, but since you ask "How should I improve my C++ code to make it run faster?", I will insist on optimizing the pow calls. You can easily avoid half of them by precomputing either 0.9^steps or 1.1^steps before the for loop. This makes your code run a bit faster:

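double power1 = pow(0.9,steps);   // 0.9^steps, computed once before the loop
double ratio = 1.1/0.9;
for(int i = 0; i!= RUNS; ++i){
  ... 
  // 0.9^power * 1.1^(steps-power) == 0.9^steps * (1.1/0.9)^(steps-power)
  returns = power1 * pow(ratio, (double)steps - power); 
}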
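To convince yourself that the factored form gives the same numbers, here is a small self-contained sketch that puts the two-pow and one-pow versions side by side. The names returns_two_pows and FactoredReturns are mine, chosen for illustration.

```cpp
#include <cmath>

// Original form: two pow calls per simulated value.
double returns_two_pows(int steps, int power) {
    return std::pow(0.9, power) * std::pow(1.1, steps - power);
}

// Factored form: 0.9^steps is hoisted out, leaving one pow call per value.
struct FactoredReturns {
    double base;   // 0.9^steps, computed once
    double ratio;  // 1.1 / 0.9
    int steps;

    explicit FactoredReturns(int s)
        : base(std::pow(0.9, s)), ratio(1.1 / 0.9), steps(s) {}

    // 0.9^power * 1.1^(steps - power) == 0.9^steps * (1.1/0.9)^(steps - power)
    double operator()(int power) const {
        return base * std::pow(ratio, steps - power);
    }
};
```

Construct FactoredReturns once before the loop and call it with each draw; the two forms should agree up to floating-point rounding.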

Analogously you can improve the R code:

...
ratio <- 1.1/0.9
pow1 <- 0.9^steps
result <- pow1 * ratio^a
...

jimifiki answered Oct 17 '22 09:10