Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

c++ sieve of Eratosthenes my code is slow

I'm trying to find the number of prime numbers below 400 million but even with just 40 million my code is taking 8 secs to run. what am i doing wrong?

what can i do to make it faster?

#include<iostream>
#include<math.h>
#include<vector>
using namespace std;
int main()
{
    vector<bool> k;                         
    vector<long long int> c;                
    for (int i=2;i<40000000;i++)
    {
        k.push_back(true);                  
        c.push_back(i);
    }

    for ( int i=0;i<sqrt(40000000)+1;i++)                            
    {                                                               
        if (k[i]==true)                                              
       {                                                            
           for (int j=i+c[i];j<40000000;j=j+c[i])                  
           {                                                       
               k[j]=false; 
           }
       }
    }
    vector <long long int> arr;
    for ( int i=0;i<40000000-2;i++)
    {
        if (k[i]==true)
        {
            arr.push_back(c[i]);
        }
    }
    cout << arr.size() << endl ;
    return 0;
}
like image 338
Nikhil Avatar asked Dec 10 '22 16:12

Nikhil


2 Answers

I profiled your code as well as a simple tweak, below. The tweak is more than twice as fast:

    auto start = std::chrono::high_resolution_clock::now();

    //original version
    vector<bool> k;                         
    vector<long long int> c;                
    for (int i=2;i<40000000;i++)
      {
        k.push_back(true);                  
        c.push_back(i);
      }

    for ( int i=0;i<sqrt(40000000)+1;i++)                            
      {                                                               
        if (k[i]==true)                                              
          {                                                            
            for (int j=i+c[i];j<40000000;j=j+c[i])                  
              {                                                       
                k[j]=false; 
              }
          }
      }
    vector <long long int> arr;
    for ( int i=0;i<40000000-2;i++)
      {
        if (k[i]==true)
          {
            arr.push_back(c[i]);
          }
      }
    cout << arr.size() << endl ;


    auto end1 = std::chrono::high_resolution_clock::now();
    std::cout << "Elapsed = " << 
      std::chrono::duration_cast<std::chrono::milliseconds>(end1 - start).count() <<
      std::endl;

  }

  {

    auto begin = std::chrono::high_resolution_clock::now();

    //new version

    const long limit{40000000};
    vector<bool> k(limit-1,true);  

    //k[0] is the number 0
    k[0]=false; k[1]=false;

    auto sq = sqrt(limit) + 1;

    //start at the number 2
    for ( int i=2;i<sq;i++)                            
      {                                                               
        if (k[i]==true)                                              
          {                                                            
            for (int j=i+i;j<limit;j+=i)                  
              {                                                       
                k[j]=false; 
              }
          }
      }


    vector <long long int> arr;
    for ( int i=0;i<limit-2;i++)
      {
        if (k[i]==true)
          {
            arr.push_back(i);
          }
      }
    cout << arr.size() << endl ;



    auto stop = std::chrono::high_resolution_clock::now();
    std::cout << "Elapsed = " << 
      std::chrono::duration_cast<std::chrono::milliseconds>(stop - begin).count() <<
      std::endl;

  }

Here is the output (elapsed in milliseconds), in Debug mode:

2433654
Elapsed = 5787
2433654
Elapsed = 2432

Both have same results, second is much faster.

Here is another version using some nice C++ features (requiring less code), and it is about 11% faster than the second version above:

    auto begin = std::chrono::high_resolution_clock::now();

    const long limit{40000000};
    vector<int> k(limit-1,0);

    //fill with sequence of integers
    std::iota(k.begin(),k.end(),0);

    //k[0] is the number 0
    //integers reset to 0 are not prime
    k[0]=0; k[1]=0;

    auto sq = sqrt(limit) + 1;

    //start at the number 2
    for (int i=2;i<sq;i++)                            
      {                                                               
        if (k[i])
          {                                                            
            for (int j=i+i;j<limit;j+=i)                  
              {                                                       
                k[j]=0; 
              }
          }
      }

    auto results = std::remove(k.begin(),k.end(),0);

    cout << results - k.begin() << endl ;


    auto stop = std::chrono::high_resolution_clock::now();
    std::cout << "Elapsed = " << 
      std::chrono::duration_cast<std::chrono::milliseconds>(stop - begin).count() <<
      std::endl;

  }

Note that in your original version, you push_back in three different places, while this use of modern idioms never uses push_back at all when operating on the vectors.

In this example, the vector is of ints so that you have the actual list of prime numbers when you are finished.

Output:

2433654
Elapsed = 2160

These above are all Debug mode numbers.

In Release mode, the best is a combination of the second and third techniques above, using the numeric with a vector of bools, if you don't care what the actual prime numbers are in the end:

2433654
Elapsed = 1098
2433654
Elapsed bool remove= 410
2433654
Elapsed = 779

Note that your original code only takes about 1 second on my 5 year-old laptop in Release mode, so you are probably running in Debug mode.

like image 65
johnbakers Avatar answered Jan 01 '23 07:01

johnbakers


I got it down from taking 10 seconds to run to just half a second on my computer by changing two things. First, I'm guessing you didn't compile it with optimization enabled. That brought it from 10 seconds down to 1 second for me. Second, the vector c is unnecessary. Everywhere you have c[i] in your code you can replace it with i+2. This will make it run twice as fast.

like image 37
Aaron Avatar answered Jan 01 '23 06:01

Aaron