I recently wrote a short algorithm to calculate happy numbers in python. The program allows you to pick an upper bound and it will determine all the happy numbers below it. For a speed comparison I decided to make the most direct translation of the algorithm I knew of from python to c++.
Surprisingly, the c++ version runs significantly slower than the python version. Accurate speed tests between the execution times for discovering the first 10,000 happy numbers indicate the python program runs on average in 0.59 seconds and the c++ version runs on average in 8.5 seconds.
I would attribute this speed difference to the fact that I had to write helper functions for parts of the calculations (for example determining if an element is in a list/array/vector) in the c++ version which were already built in to the python language.
Firstly, is this the true reason for such an absurd speed difference, and secondly, how can I change the c++ version to execute more quickly than the python version (the way it should be in my opinion).
The two pieces of code, with speed testing are here: Python Version, C++ Version. Thanks for the help.
#include <iostream>
#include <vector>
#include <string>
#include <ctime>
#include <windows.h>
using namespace std;
bool inVector(int inQuestion, vector<int> known);
int sum(vector<int> given);
int pow(int given, int power);
void calcMain(int upperBound);
int main()
{
while(true)
{
int upperBound;
cout << "Pick an upper bound: ";
cin >> upperBound;
long start, end;
start = GetTickCount();
calcMain(upperBound);
end = GetTickCount();
double seconds = (double)(end-start) / 1000.0;
cout << seconds << " seconds." << endl << endl;
}
return 0;
}
void calcMain(int upperBound)
{
vector<int> known;
for(int i = 0; i <= upperBound; i++)
{
bool next = false;
int current = i;
vector<int> history;
while(!next)
{
char* buffer = new char[10];
itoa(current, buffer, 10);
string digits = buffer;
delete buffer;
vector<int> squares;
for(int j = 0; j < digits.size(); j++)
{
char charDigit = digits[j];
int digit = atoi(&charDigit);
int square = pow(digit, 2);
squares.push_back(square);
}
int squaresum = sum(squares);
current = squaresum;
if(inVector(current, history))
{
next = true;
if(current == 1)
{
known.push_back(i);
//cout << i << "\t";
}
}
history.push_back(current);
}
}
//cout << "\n\n";
}
bool inVector(int inQuestion, vector<int> known)
{
for(vector<int>::iterator it = known.begin(); it != known.end(); it++)
if(*it == inQuestion)
return true;
return false;
}
int sum(vector<int> given)
{
int sum = 0;
for(vector<int>::iterator it = given.begin(); it != given.end(); it++)
sum += *it;
return sum;
}
int pow(int given, int power)
{
int original = given;
int current = given;
for(int i = 0; i < power-1; i++)
current *= original;
return current;
}
#!/usr/bin/env python
import timeit
upperBound = 0
def calcMain():
known = []
for i in range(0,upperBound+1):
next = False
current = i
history = []
while not next:
digits = str(current)
squares = [pow(int(digit), 2) for digit in digits]
squaresum = sum(squares)
current = squaresum
if current in history:
next = True
if current == 1:
known.append(i)
##print i, "\t",
history.append(current)
##print "\nend"
while True:
upperBound = input("Pick an upper bound: ")
result = timeit.Timer(calcMain).timeit(1)
print result, "seconds.\n"
For 100000 elements, the Python code took 6.9 seconds while the C++ originally took above 37 seconds.
I did some basic optimizations on your code and managed to get the C++ code above 100 times faster than the Python implementation. It now does 100000 elements in 0.06 seconds. That is 617 times faster than the original C++ code.
The most important thing is to compile in Release mode, with all optimizations. This code is literally orders of magnitude slower in Debug mode.
Next, I will explain the optimizations I did.
It may be possible to optimize the code even more by using pre-allocated arrays instead of vectors, but this would be a bit more work and I'll leave it as an exercise to the reader. :P
Here's the optimized code :
#include <iostream> #include <vector> #include <string> #include <ctime> #include <algorithm> #include <windows.h> using namespace std; void calcMain(int upperBound, vector<int>& known); int main() { while(true) { vector<int> results; int upperBound; cout << "Pick an upper bound: "; cin >> upperBound; long start, end; start = GetTickCount(); calcMain(upperBound, results); end = GetTickCount(); for (size_t i = 0; i < results.size(); ++i) { cout << results[i] << ", "; } cout << endl; double seconds = (double)(end-start) / 1000.0; cout << seconds << " seconds." << endl << endl; } return 0; } void calcMain(int upperBound, vector<int>& known) { vector<int> history; for(int i = 0; i <= upperBound; i++) { int current = i; history.clear(); while(true) { int temp = current; int sum = 0; while (temp > 0) { sum += (temp % 10) * (temp % 10); temp /= 10; } current = sum; if(find(history.begin(), history.end(), current) != history.end()) { if(current == 1) { known.push_back(i); } break; } history.push_back(current); } } }
There's a new, radically faster version as a separate answer, so this answer is deprecated.
I rewrote your algorithm by making it cache whenever it finds the number to be happy or unhappy. I also tried to make it as pythonic as I could, for example by creating separate functions digits()
and happy()
. Sorry for using Python 3, but I get to show off a couple a useful things from it as well.
This version is much faster. It runs at 1.7s which is 10 times faster than your original program that takes 18s (well, my MacBook is quite old and slow :) )
#!/usr/bin/env python3 from timeit import Timer from itertools import count print_numbers = False upperBound = 10**5 # Default value, can be overidden by user. def digits(x:'nonnegative number') -> "yields number's digits": if not (x >= 0): raise ValueError('Number should be nonnegative') while x: yield x % 10 x //= 10 def happy(number, known = {1}, happies = {1}) -> 'True/None': '''This function tells if the number is happy or not, caching results. It uses two static variables, parameters known and happies; the first one contains known happy and unhappy numbers; the second contains only happy ones. If you want, you can pass your own known and happies arguments. If you do, you should keep the assumption commented out on the 1 line. ''' # assert 1 in known and happies <= known # <= is expensive if number in known: return number in happies history = set() while True: history.add(number) number = sum(x**2 for x in digits(number)) if number in known or number in history: break known.update(history) if number in happies: happies.update(history) return True def calcMain(): happies = {x for x in range(upperBound) if happy(x) } if print_numbers: print(happies) if __name__ == '__main__': upperBound = eval( input("Pick an upper bound [default {0}]: " .format(upperBound)).strip() or repr(upperBound)) result = Timer(calcMain).timeit(1) print ('This computation took {0} seconds'.format(result))
It looks like you're passing vectors by value to other functions. This will be a significant slowdown because the program will actually make a full copy of your vector before it passes it to your function. To get around this, pass a constant reference to the vector instead of a copy. So instead of:
int sum(vector<int> given)
Use:
int sum(const vector<int>& given)
When you do this, you'll no longer be able to use the vector::iterator because it is not constant. You'll need to replace it with vector::const_iterator.
You can also pass in non-constant references, but in this case, you don't need to modify the parameter at all.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With