Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Algorithm Challenge?

I have a python function (call it myFunction) that gets as input a list of numbers, and, following a complex calculation, returns back the result of the calculation (which is a number).

The function looks like this:

def myFunction( listNumbers ):
    # initialize the result of the calculation
    calcResult = 0

    # looping through all indices, from 0 to the last one
    for i in xrange(0, len(listNumbers), 1):
        # some complex calculation goes here, changing the value of 'calcResult'

    # let us now return the result of the calculation
    return calcResult

I tested the function, and it works as expected.

Normally, myFunction is provided a listNumbers argument that contains 5,000,000 elements in it. As you may expect, the calculation takes time. I need this function to run as fast as possible

Here comes the challenge: assume that the time now is 5am, and that listNumbers contains just 4,999,999 values in it. Meaning, its LAST VALUE is not yet available. This value will only be available at 6am.

Obviously, we can do the following (1st mode): wait until 6am. Then, append the last value into listNumbers, and then, run myFunction. This solution works, BUT it will take a while before myFunction returns our calculated result (as we need to process the entire list of numbers, from the first element on). Remember, our goal is to get the results as soon as possible past 6am.

I was thinking about a more efficient way to solve this (2nd mode): since (at 5am) we have listNumbers with 4,999,999 values in it, let us immediately start running myFunction. Let us process whatever we can (remember, we don't have the last piece of data yet), and then -- exactly at 6am -- 'plug in' the new data piece -- and generate the computed result. This should be significantly faster, as most of the processing will be done BEFORE 6am, hence, we will only have to deal with the new data -- which means the computed result should be available immediately after 6am.

Let's suppose that there's no way for us to inspect the code of myFunction or modify it. Is there ANY programming technique / design idea that will allow us to take myFunction AS IS, and do something with it (without changing its code) so that we can have it operate in the 2nd mode, rather than the 1st one?

Please do not suggest using c++ / numpy + cython / parallel computing etc to solve this problem. The goal here is to see if there's any programming technique or design pattern that can be easily used to solve such problems.

like image 275
user3262424 Avatar asked Jul 20 '11 22:07

user3262424


1 Answers

You could use a generator as an input. The generator will only return when there is data available to process.

Update: thanks for the brilliant comment, I wanted to remove this entry :)

class lazylist(object):
    def __init__(self):
        self.cnt = 0
        self.length = 5000000

    def __iter__(self):
        return self

    def __len__(self):
        return self.length

    def next(self):
        if self.cnt < self.length:
            self.cnt += 1
            #return data here or wait for it
            return self.cnt #just return a counter for this example
        else:
            raise StopIteration()

    def __getitem__(self, i):
        #again, block till you have data.
        return i+1 #simple counter

myFunction(lazylist())

Update: As you can see from the comments and other solutions your loop construct and len call causes a lot of headaches, if you can eliminate it you can use a lot more elegant solution. for e in li or enumerate is the pythonic way to go.

like image 139
Karoly Horvath Avatar answered Oct 04 '22 20:10

Karoly Horvath