Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Certain Power of Sum of Digits of N == N (running too slowly)

I'm trying to write a Python script that finds all integers (N) where a certain power of the sum of the digits of N is equal to N. For example, N=81 qualifies, because 8 + 1 = 9, and a certain power of 9 (namely 2) = 81.

The ranges I chose are arbitrary. My script works but it is very, very slow. Ideally, I'd want to find the first 30 such integers in about 6000ms.

My first solution:

def powerOfSum1():
    listOfN = []
    arange = [a for a in range(11, 1000000)] #range of potential Ns
    prange = [a for a in range(2, 6)] # a range for the powers to calculate
    for num in arange:
        sumOfDigits = sum(map(int, str(num)))
        powersOfSum = [sumOfDigits**p for p in prange]
        if num in powersOfSum:
            listOfN.append(num)
    return listOfN

In my second solution, I tried storing all the powers for each sumOfDigits, but that did not improve the performance much.

def powerOfSum2():
    listOfN = []
    powers= {}
    for num in range(11, 1000000):
        sumOfDigits = sum(map(int, str(num)))
        summ = str(sumOfDigits)
        if summ in powers:
            if num in powers[summ]:
                listOfN.append(num)
        else:
            powersOfSum = [sumOfDigits**p for p in range(2,6)]
            powers[summ] = powersOfSum
            if num in powers[summ]:
                listOfN.append(num)
    return listOfN

I haven't studied data structures and algorithms yet, so I'd appreciate any pointers on making this script more efficient.

like image 903
5u2ie Avatar asked Sep 14 '15 01:09

5u2ie


2 Answers

This is a great time to break out a profiler and see where your code is spending all its time. To do that, I put a little cProfiler wrapper around your code:

#!/usr/bin/env python

import cProfile

import math


def powerOfSum1():
    listOfN = []
    arange = [a for a in range(11, 1000000)] #range of potential Ns
    prange = [a for a in range(2, 6)] # a range for the powers to calculate
    for num in arange:
        sumOfDigits = sum(map(int, str(num)))
        powersOfSum = [sumOfDigits**p for p in prange]
        if num in powersOfSum:
            listOfN.append(num)
    return listOfN


def main():
    cProfile.run('powerOfSum1()')

if __name__ == "__main__":
    main()

Running this, here's what I got:

⌁ [alex:/tmp] 44s $ python powers.py
         1999993 function calls in 4.089 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.005    0.005    4.089    4.089 <string>:1(<module>)
        1    0.934    0.934    4.084    4.084 powers.py:7(powerOfSum1)
   999989    2.954    0.000    2.954    0.000 {map}
       10    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    0.017    0.009    0.017    0.009 {range}
   999989    0.178    0.000    0.178    0.000 {sum}

If you look, it seems like most of the time is being spend on that map call, where you turn the num to a string, then make each digit an int, which is then summed.

It makes a lot of sense that this would be the slow part of the program. Not only do you do it a lot, but it's a slow operation: On that one line, you do a string-parsing operation, and then map an int conversion function over each char in the string, and then you sum them up.

I bet that if you can compute the sum of digits without doing string conversion first, it'll be a lot faster.

Let's try that. I made some other changes, like removing the redundant list comprehensions at the start. Here's what I got:

#!/usr/bin/env python

#started at 47.56

import cProfile

import math

MAXNUM = 10000000

powersOf10 = [10 ** n for n in range(0, int(math.log10(MAXNUM)))]

def powerOfSum1():
    listOfN = []
    arange = range(11, MAXNUM) #range of potential Ns
    prange = range(2, 6) # a range for the powers to calculate
    for num in arange:
        sumOfDigits = 0
        for p in powersOf10:
            sumOfDigits += num / p % 10
        powersOfSum = []
        curr = sumOfDigits
        for p in prange:
            curr = curr * sumOfDigits
            if num < curr:
                break
            if num == curr:
                listOfN.append(num)
    return listOfN

def main():
    cProfile.run('powerOfSum1()')

if __name__ == "__main__":
    main()

What does cProfile have to say?

⌁ [alex:/tmp] 3m42s $ python powers.py
         15 function calls in 0.959 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.006    0.006    0.959    0.959 <string>:1(<module>)
        1    0.936    0.936    0.953    0.953 powers.py:13(powerOfSum1)
       10    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    0.017    0.009    0.017    0.009 {range}

4 seconds to 0.9 seconds? Much better.

If you want to really see the effect, add an extra zero to the upper bound of arange. On my box, the original code takes ~47 seconds. The modified code takes ~10.

The profiler is your friend, and string conversion isn't free when you do it hundreds of thousands of times :)

like image 107
Alex Avatar answered Nov 07 '22 04:11

Alex


Your solution examines every possible integer to see it could be a solution. It's more efficient to only examine the integers that are actually powers to see if they are valid answers -- because there are fewer of those. Here is something that does this. But it will probably take longer than 6s to find 30 -- they get scarce fast.

EDIT -- updated to do the faster digit-summing suggested in the comments by Padraic Cunningham and fjarri, and then updated to add a couple of more tweaks, make it a generator, and make it Python-3 friendly.

It still gets slow fast, but the algorithm may be parallelizable -- maybe you could put the digit-summing in a separate process.

EDIT 2 -- Shaved off some time by doing a quick check of whether the base and the result value are equal modulo 9. There may be further number theory tricks afoot...

import heapq
import functools


def get_powers():
    heap = []
    push = functools.partial(heapq.heappush, heap)
    pop = functools.partial(heapq.heappop, heap)
    nextbase = 3
    nextbasesquared = nextbase ** 2
    push((2**2, 2, 2))
    while 1:
        value, base, power = pop()
        if base % 9 == value % 9:
            r = 0
            n = value
            while n:
                r, n = r + n % 10, n // 10
            if r == base:
                yield value, base, power
        power += 1
        value *= base
        push((value, base, power))
        if value > nextbasesquared:
            push((nextbasesquared, nextbase, 2))
            nextbase += 1
            nextbasesquared = nextbase ** 2


for i, info in enumerate(get_powers(), 1):
    print(i, info)
like image 43
Patrick Maupin Avatar answered Nov 07 '22 03:11

Patrick Maupin