Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimizing a Partition Function

Here is the code, in python:

# function for pentagonal numbers
def pent (n):     return int((0.5*n)*((3*n)-1))

# function for generalized pentagonal numbers
def gen_pent (n): return pent(int(((-1)**(n+1))*(round((n+1)/2))))

# array for storing partitions - first ten already stored
partitions = [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]

# function to generate partitions 
def partition (k):
 if (k < len(partitions)): return partitions[k]

 total, sign, i = 0, 1, 1

 while (k - gen_pent(i)) >= 0:
  sign   = (-1)**(int((i-1)/2))
  total += sign*(partition(k - gen_pent(i)))
  i     +=  1

 partitions.insert(k,total)
 return total

It uses this method to calculate partitions:

p(k) = p(k − 1) + p(k − 2) − p(k  − 5) − p(k − 7) + p(k − 12) + p(k  − 15) ...

(source: Wikipedia)

However, the code is taking quite some time when it comes to large numbers (over p(10^3)). I want to ask you if I can optimize my code, or hint me to a different but faster algorithm or approach. Any optimization suggestions are welcome.

And no, before you ask, this is not for homework or Project Euler, its for the experience value.

like image 805
Khaled Avatar asked Jul 02 '10 08:07

Khaled


People also ask

What do you want to optimize in the partition problem?

There is an optimization version of the partition problem, which is to partition the multiset S into two subsets S1, S2 such that the difference between the sum of elements in S1 and the sum of elements in S2 is minimized. The optimization version is NP-hard, but can be solved efficiently in practice.

Can partitioning improve performance?

Partitioning can also improve the performance of multi-table joins by using a technique known as partition-wise joins.

How does partitioning help scalability?

Partitioning helps to scale a data warehouse by dividing database objects into smaller pieces, enabling access to smaller, more manageable objects. Having direct access to smaller objects addresses the scalability requirements of data warehouses.

What is a partition algorithm?

The Partition Algorithm executes in two phases: > Phase I: the algorithm logically divides the database into a number of. non-overlapping partitions. The partitions are considered one at a time and all large itemsets for that partition are generated.


1 Answers

Here are some comments. Note that I am no expert on this stuff, by I too like messing about with maths (and Project Euler).

I have redefined the pentagonal number functions as follows:

def pent_new(n):
    return (n*(3*n - 1))/2

def gen_pent_new(n):
    if n%2:
        i = (n + 1)/2
    else:
        i = -n/2
    return pent_new(i)

I have written them in such a way that I don't introduce floating point calculations - for large n using floats will introduce errors (compare the results for n = 100000001).

You can use the timeit module to test small snippets of code. When I tested all pentagonal functions (yours and mine), the results for mine were quicker. The following is an example which tests your gen_pent function.

# Add this code to your script
t = Timer("for i in xrange(1, 100): gen_pent(i)", "from __main__ import gen_pent")
print t.timeit()

Here is a slight modification of your partition function. Again, testing with timeit shows that it is faster than your partition function. However, this may be due to the improvements made to the pentagonal number functions.

def partition_new(n):
    try:
        return partitions_new[n]
    except IndexError:
        total, sign, i = 0, 1, 1
        k = gen_pent_new(i)
        while n - k >= 0:
            total += sign*partition_new(n - k)

            i += 1
            if i%2: sign *= -1
            k = gen_pent_new(i)

        partitions_new.insert(n, total)
        return total

In your partition function, you calculate the general pentagonal number twice for each loop. Once to test in the while condition, and the other to update total. I store the result in a variable, so only make the calculation once per loop.

Using the cProfile module (for python >= 2.5, otherwise the profile module) you can see where your code spends most of its time. Here is an example based on your partition function.

import cProfile
import pstats

cProfile.run('partition(70)', 'partition.test')
p = pstats.Stats('partition.test')
p.sort_stats('name')
p.print_stats()

This produced the following output in the console window:

C:\Documents and Settings\ags97128\Desktop>test.py
Fri Jul 02 12:42:15 2010    partition.test

         4652 function calls (4101 primitive calls) in 0.015 CPU seconds

   Ordered by: function name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      552    0.001    0.000    0.001    0.000 {len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
       60    0.000    0.000    0.000    0.000 {method 'insert' of 'list' objects}
        1    0.000    0.000    0.015    0.015 <string>:1(<module>)
     1162    0.002    0.000    0.002    0.000 {round}
     1162    0.006    0.000    0.009    0.000 C:\Documents and Settings\ags97128\Desktop\test.py:11(gen_pent)
    552/1    0.005    0.000    0.015    0.015 C:\Documents and Settings\ags97128\Desktop\test.py:26(partition)
     1162    0.002    0.000    0.002    0.000 C:\Documents and Settings\ags97128\Desktop\test.py:5(pent)

Now profiling my partition function:

Fri Jul 02 12:50:10 2010    partition.test

         1836 function calls (1285 primitive calls) in 0.006 CPU seconds

   Ordered by: function name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
       60    0.000    0.000    0.000    0.000 {method 'insert' of 'list' objects}
        1    0.000    0.000    0.006    0.006 <string>:1(<module>)
      611    0.002    0.000    0.003    0.000 C:\Documents and Settings\ags97128\Desktop\test.py:14(gen_pent_new)
    552/1    0.003    0.000    0.006    0.006 C:\Documents and Settings\ags97128\Desktop\test.py:40(partition_new)
      611    0.001    0.000    0.001    0.000 C:\Documents and Settings\ags97128\Desktop\test.py:8(pent_new)

You can see in my results there are no calls to the len and round builtin functions. And I have nearly halved the number of calls to the pentagonal functions (gen_pent_new and pent_new)

There are probably other tricks to improving the speed of python code. I would search here for 'python optimization' to see what you can find.

Finally, one of the links at the bottom of the wikipedia page you mentioned is an academic paper on fast algorithms for calculating partition numbers. A quick glance shows it contains pseudocode for the algorithms. These algorithms will probably be faster than anything you or I could produce.

like image 70
Gary Kerr Avatar answered Oct 15 '22 07:10

Gary Kerr