Unusual Speed Difference between Python and C++

Question

I recently wrote a short algorithm to calculate happy numbers in python. The program allows you to pick an upper bound and it will determine all the happy numbers below it. For a speed comparison I decided to make the most direct translation of the algorithm I knew of from python to c++.

Surprisingly, the c++ version runs significantly slower than the python version. Accurate speed tests between the execution times for discovering the first 10,000 happy numbers indicate the python program runs on average in 0.59 seconds and the c++ version runs on average in 8.5 seconds.

I would attribute this speed difference to the fact that I had to write helper functions for parts of the calculations (for example determining if an element is in a list/array/vector) in the c++ version which were already built in to the python language.

Firstly, is this the true reason for such an absurd speed difference, and secondly, how can I change the c++ version to execute more quickly than the python version (the way it should be in my opinion).

The two pieces of code, with speed testing are here: Python Version, C++ Version. Thanks for the help.

#include <iostream>
#include <vector>
#include <string>
#include <ctime>
#include <windows.h>

using namespace std;

bool inVector(int inQuestion, vector<int> known);
int sum(vector<int> given);
int pow(int given, int power);
void calcMain(int upperBound);

int main()
{
    while(true)
    {
        int upperBound;
        cout << "Pick an upper bound: ";
        cin >> upperBound;
        long start, end;
        start = GetTickCount();
        calcMain(upperBound);
        end = GetTickCount();
        double seconds = (double)(end-start) / 1000.0;
        cout << seconds << " seconds." << endl << endl;
    }
    return 0;
}

void calcMain(int upperBound)
{
    vector<int> known;
    for(int i = 0; i <= upperBound; i++)
    {
        bool next = false;
        int current = i;
        vector<int> history;
        while(!next)
        {
            char* buffer = new char[10];
            itoa(current, buffer, 10);
            string digits = buffer;
            delete buffer;
            vector<int> squares;
            for(int j = 0; j < digits.size(); j++)
            {
                char charDigit = digits[j];
                int digit = atoi(&charDigit);
                int square = pow(digit, 2);
                squares.push_back(square);
            }
            int squaresum = sum(squares);
            current = squaresum;
            if(inVector(current, history))
            {
                next = true;
                if(current == 1)
                {
                    known.push_back(i);
                    //cout << i << "	";
                }
            }
            history.push_back(current);
        }
    }
    //cout << "

";
}

bool inVector(int inQuestion, vector<int> known)
{
    for(vector<int>::iterator it = known.begin(); it != known.end(); it++)
        if(*it == inQuestion)
            return true;
    return false;
}

int sum(vector<int> given)
{
    int sum = 0;
    for(vector<int>::iterator it = given.begin(); it != given.end(); it++)
        sum += *it;
    return sum;
}

int pow(int given, int power)
{
    int original = given;
    int current = given;
    for(int i = 0; i < power-1; i++)
        current *= original;
    return current;
}

#!/usr/bin/env python

import timeit

upperBound = 0

def calcMain():
    known = []
    for i in range(0,upperBound+1):
        next = False
        current = i
        history = []
        while not next:
            digits = str(current)
            squares = [pow(int(digit), 2) for digit in digits]
            squaresum = sum(squares)
            current = squaresum
            if current in history:
                next = True
                if current == 1:
                    known.append(i)
                    ##print i, "	",
            history.append(current)
    ##print "
end"

while True:    
    upperBound = input("Pick an upper bound: ")
    result = timeit.Timer(calcMain).timeit(1)
    print result, "seconds.
"

Asik · Accepted Answer

For 100000 elements, the Python code took 6.9 seconds while the C++ originally took above 37 seconds.

I did some basic optimizations on your code and managed to get the C++ code above 100 times faster than the Python implementation. It now does 100000 elements in 0.06 seconds. That is 617 times faster than the original C++ code.

The most important thing is to compile in Release mode, with all optimizations. This code is literally orders of magnitude slower in Debug mode.

Next, I will explain the optimizations I did.

Moved all vector declarations outside of the loop; replaced them by a clear() operation, which is much faster than calling the constructor.
Replaced the call to pow(value, 2) by a multiplication : value * value.
Instead of having a squares vector and calling sum on it, I sum the values in-place using just an integer.
Avoided all string operations, which are very slow compared to integer operations. For instance, it is possible to compute the squares of each digit by repeatedly dividing by 10 and fetching the modulus 10 of the resulting value, instead of converting the value to a string and then each character back to int.
Avoided all vector copies, first by replacing passing by value with passing by reference, and finally by eliminating the helper functions completely.
Eliminated a few temporary variables.
And probably many small details I forgot. Compare your code and mine side-by-side to see exactly what I did.

It may be possible to optimize the code even more by using pre-allocated arrays instead of vectors, but this would be a bit more work and I'll leave it as an exercise to the reader. :P

Here's the optimized code :

#include <iostream> #include <vector> #include <string> #include <ctime> #include <algorithm> #include <windows.h>  using namespace std;  void calcMain(int upperBound, vector<int>& known);  int main() {     while(true)     {         vector<int> results;         int upperBound;         cout << "Pick an upper bound: ";         cin >> upperBound;         long start, end;         start = GetTickCount();         calcMain(upperBound, results);         end = GetTickCount();         for (size_t i = 0; i < results.size(); ++i) {             cout << results[i] << ", ";         }         cout << endl;         double seconds = (double)(end-start) / 1000.0;         cout << seconds << " seconds." << endl << endl;     }     return 0; }  void calcMain(int upperBound, vector<int>& known) {     vector<int> history;     for(int i = 0; i <= upperBound; i++)     {         int current = i;         history.clear();         while(true)         {                 int temp = current;                 int sum = 0;                 while (temp > 0) {                     sum += (temp % 10) * (temp % 10);                     temp /= 10;                 }                 current = sum;                 if(find(history.begin(), history.end(), current) != history.end())                 {                         if(current == 1)                         {                                 known.push_back(i);                         }                         break;                 }                 history.push_back(current);         }     } }

ilya n. · Answer

There's a new, radically faster version as a separate answer, so this answer is deprecated.

I rewrote your algorithm by making it cache whenever it finds the number to be happy or unhappy. I also tried to make it as pythonic as I could, for example by creating separate functions digits() and happy(). Sorry for using Python 3, but I get to show off a couple a useful things from it as well.

This version is much faster. It runs at 1.7s which is 10 times faster than your original program that takes 18s (well, my MacBook is quite old and slow :) )

#!/usr/bin/env python3  from timeit import Timer from itertools import count  print_numbers = False upperBound = 10**5  # Default value, can be overidden by user.   def digits(x:'nonnegative number') -> "yields number's digits":     if not (x >= 0): raise ValueError('Number should be nonnegative')     while x:         yield x % 10         x //= 10   def happy(number, known = {1}, happies = {1}) -> 'True/None':     '''This function tells if the number is happy or not, caching results.      It uses two static variables, parameters known and happies; the     first one contains known happy and unhappy numbers; the second      contains only happy ones.      If you want, you can pass your own known and happies arguments. If     you do, you should keep the assumption commented out on the 1 line.      '''  #        assert 1 in known and happies <= known  # <= is expensive      if number in known:         return number in happies      history = set()     while True:         history.add(number)         number = sum(x**2 for x in digits(number))         if number in known or number in history:             break      known.update(history)     if number in happies:         happies.update(history)         return True   def calcMain():     happies = {x for x in range(upperBound) if happy(x) }     if print_numbers:         print(happies)   if __name__ == '__main__':     upperBound = eval(             input("Pick an upper bound [default {0}]: "                     .format(upperBound)).strip()             or repr(upperBound))     result = Timer(calcMain).timeit(1)     print ('This computation took {0} seconds'.format(result))

John · Answer

It looks like you're passing vectors by value to other functions. This will be a significant slowdown because the program will actually make a full copy of your vector before it passes it to your function. To get around this, pass a constant reference to the vector instead of a copy. So instead of:

int sum(vector<int> given)

Use:

int sum(const vector<int>& given)

When you do this, you'll no longer be able to use the vector::iterator because it is not constant. You'll need to replace it with vector::const_iterator.

You can also pass in non-constant references, but in this case, you don't need to modify the parameter at all.

Unusual Speed Difference between Python and C++

Tags:

c++

performance

python

algorithm

Keegan Jay

3 Answers

Asik

ilya n.

John

Recent Activity

Donate For Us

Unusual Speed Difference between Python and C++

Tags:

c++

performance

python

algorithm

Keegan Jay

3 Answers

Asik

ilya n.

John

Related questions

Recent Activity

Donate For Us