I'm trying to write a Python script that finds all integers (N) where a certain power of the sum of the digits of N is equal to N. For example, N=81 qualifies, because 8 + 1 = 9, and a certain power of 9 (namely 2) = 81.
The ranges I chose are arbitrary. My script works but it is very, very slow. Ideally, I'd want to find the first 30 such integers in about 6000ms.
My first solution:
def powerOfSum1():
listOfN = []
arange = [a for a in range(11, 1000000)] #range of potential Ns
prange = [a for a in range(2, 6)] # a range for the powers to calculate
for num in arange:
sumOfDigits = sum(map(int, str(num)))
powersOfSum = [sumOfDigits**p for p in prange]
if num in powersOfSum:
listOfN.append(num)
return listOfN
In my second solution, I tried storing all the powers for each sumOfDigits, but that did not improve the performance much.
def powerOfSum2():
listOfN = []
powers= {}
for num in range(11, 1000000):
sumOfDigits = sum(map(int, str(num)))
summ = str(sumOfDigits)
if summ in powers:
if num in powers[summ]:
listOfN.append(num)
else:
powersOfSum = [sumOfDigits**p for p in range(2,6)]
powers[summ] = powersOfSum
if num in powers[summ]:
listOfN.append(num)
return listOfN
I haven't studied data structures and algorithms yet, so I'd appreciate any pointers on making this script more efficient.
This is a great time to break out a profiler and see where your code is spending all its time. To do that, I put a little cProfiler
wrapper around your code:
#!/usr/bin/env python
import cProfile
import math
def powerOfSum1():
listOfN = []
arange = [a for a in range(11, 1000000)] #range of potential Ns
prange = [a for a in range(2, 6)] # a range for the powers to calculate
for num in arange:
sumOfDigits = sum(map(int, str(num)))
powersOfSum = [sumOfDigits**p for p in prange]
if num in powersOfSum:
listOfN.append(num)
return listOfN
def main():
cProfile.run('powerOfSum1()')
if __name__ == "__main__":
main()
Running this, here's what I got:
⌁ [alex:/tmp] 44s $ python powers.py
1999993 function calls in 4.089 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.005 0.005 4.089 4.089 <string>:1(<module>)
1 0.934 0.934 4.084 4.084 powers.py:7(powerOfSum1)
999989 2.954 0.000 2.954 0.000 {map}
10 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
2 0.017 0.009 0.017 0.009 {range}
999989 0.178 0.000 0.178 0.000 {sum}
If you look, it seems like most of the time is being spend on that map
call, where you turn the num
to a string, then make each digit an int, which is then summed.
It makes a lot of sense that this would be the slow part of the program. Not only do you do it a lot, but it's a slow operation: On that one line, you do a string-parsing operation, and then map an int conversion function over each char in the string, and then you sum them up.
I bet that if you can compute the sum of digits without doing string conversion first, it'll be a lot faster.
Let's try that. I made some other changes, like removing the redundant list comprehensions at the start. Here's what I got:
#!/usr/bin/env python
#started at 47.56
import cProfile
import math
MAXNUM = 10000000
powersOf10 = [10 ** n for n in range(0, int(math.log10(MAXNUM)))]
def powerOfSum1():
listOfN = []
arange = range(11, MAXNUM) #range of potential Ns
prange = range(2, 6) # a range for the powers to calculate
for num in arange:
sumOfDigits = 0
for p in powersOf10:
sumOfDigits += num / p % 10
powersOfSum = []
curr = sumOfDigits
for p in prange:
curr = curr * sumOfDigits
if num < curr:
break
if num == curr:
listOfN.append(num)
return listOfN
def main():
cProfile.run('powerOfSum1()')
if __name__ == "__main__":
main()
What does cProfile
have to say?
⌁ [alex:/tmp] 3m42s $ python powers.py
15 function calls in 0.959 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.006 0.006 0.959 0.959 <string>:1(<module>)
1 0.936 0.936 0.953 0.953 powers.py:13(powerOfSum1)
10 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
2 0.017 0.009 0.017 0.009 {range}
4 seconds to 0.9 seconds? Much better.
If you want to really see the effect, add an extra zero to the upper bound of arange
. On my box, the original code takes ~47 seconds. The modified code takes ~10.
The profiler is your friend, and string conversion isn't free when you do it hundreds of thousands of times :)
Your solution examines every possible integer to see it could be a solution. It's more efficient to only examine the integers that are actually powers to see if they are valid answers -- because there are fewer of those. Here is something that does this. But it will probably take longer than 6s to find 30 -- they get scarce fast.
EDIT -- updated to do the faster digit-summing suggested in the comments by Padraic Cunningham and fjarri, and then updated to add a couple of more tweaks, make it a generator, and make it Python-3 friendly.
It still gets slow fast, but the algorithm may be parallelizable -- maybe you could put the digit-summing in a separate process.
EDIT 2 -- Shaved off some time by doing a quick check of whether the base and the result value are equal modulo 9. There may be further number theory tricks afoot...
import heapq
import functools
def get_powers():
heap = []
push = functools.partial(heapq.heappush, heap)
pop = functools.partial(heapq.heappop, heap)
nextbase = 3
nextbasesquared = nextbase ** 2
push((2**2, 2, 2))
while 1:
value, base, power = pop()
if base % 9 == value % 9:
r = 0
n = value
while n:
r, n = r + n % 10, n // 10
if r == base:
yield value, base, power
power += 1
value *= base
push((value, base, power))
if value > nextbasesquared:
push((nextbasesquared, nextbase, 2))
nextbase += 1
nextbasesquared = nextbase ** 2
for i, info in enumerate(get_powers(), 1):
print(i, info)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With