Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is the speed difference between these 2 functions so large?

I've been reading through some of the MIT opencourseware quizes and they had a question that goes like this:

6) Consider the two functions specified below that are used to play a “guess a number game.”

def cmpGuess(guess):
"""Assumes that guess is an integer in range(maxVal). returns -1 if guess is < than the magic number, 0 if it is equal to the magic number and 1 if it is greater than the magic number."""
def findNumber(maxVal):
"""Assumes that maxVal is a positive integer. Returns a number, num, such that cmpGuess(num) == 0."""
Write a Python implementation of findNumber that guesses the magic number defined by cmpGuess. Your program should have the lowest time complexity possible. """

Here's my implementation:

def find(maxVal):
    ceiling = maxVal
    floor = 0

    while 1:
        med = ((ceiling + floor) / 2)
        res = cmp(med)
        if res == 1:
            ceiling = med
        elif res == -1:
            floor = med
        else:
            return med

And here's the answer sheet implementation provided by the teacher:

def findNumber(maxVal): 
    """ Assumes that maxVal is a positive integer. Returns a number, num, such that cmpGuess(num) == 0 """ 
    s = range(0, maxVal) 
    return bsearch(s, 0, len(s) -1)

def bsearch(s, first, last): 
    if (last-first) < 2: 
        if cmp(s[first]) == 0: 
            return first 
    else:
        return last

    mid = first + (last -first)/2
    if cmp(s[mid]) == 0:
        return s[mid]
    if cmp(s[mid]) == -1:
        return bsearch(s, first, mid -1)
    return bsearch(s, mid + 1, last)

This is the cmp function used by both our functions, according to spec:

def cmp(guess):
    if guess > num:
        return 1
    elif guess < num:
        return -1
    else: return 0

One major difference is that my solution is iterative and that of the teacher is recursive. I timed 1000 runs of both our functions with a maxVal = 1,000,000. Here's the timing snippet:

t = timeit.Timer('find(1000000)', 'from __main__ import find,cmp')
t1 = timeit.Timer('findNumber(1000000)', 'from __main__ import findNumber,bsearch')
print str(t.timeit(1000))
print str(t1.timeit(1000))

My run took: 0.000621605333677s The teachers run took: 29.627s
This can't possibly be right. I timed this several times in a row, and in all instances the second function came in with ridiculous 30s results. I copy pasted the solution function straight from the document provided by MIT. Any ideas?

like image 581
Steve Avatar asked Jan 05 '11 08:01

Steve


2 Answers

The most obvious thing I can see is that every time you call the teacher's function it creates a list of 1,000,000 integers (assuming Python 2.x), and then when it returns it destroys that list again.

That is going to take a while.

like image 164
Duncan Avatar answered Oct 18 '22 09:10

Duncan


You said that you tested both scripts to make sure that they give the same answer, but they don't. The teacher's script, as you have it written, will always return the last element. This is because of these lines:

def bsearch(s, first, last): 
    if (last-first) < 2: 
        if cmp(s[first]) == 0: 
            return first 
    else:
        return last

should be

def bsearch(s, first, last): 
    if (last-first) < 2: 
        if cmp(s[first]) == 0: 
            return first 
        else:
            return last

In other words, there is an indentation error, which I realize is also in the pdf on MIT's site. Second, the teacher's script still won't work correctly (it will always return the last number) because his binary search is going in the wrong direction with the result of cmp. This is a fault with the teacher's solution according to the spec and easily fixed by changing the line

    if cmp(s[mid]) == -1:

to

    if cmp(s[mid]) == 1:

This wasn't really an answer to the question of slowness which is obviously (as has been pointed out) due to the allocation of O(N) memory (this is the big one), recursion, list look-ups, calling cmp potentially twice for each bsearch call rather than once and storing the result, and having to check explicitly if (last - first) < 2 for each call to bsearch because he is using the variable last as being included in the range of possible values rather than it being 1 more than the highest possible value. By the way, your own code can be made a tad bit faster by changing the line:

            floor = med

to

            floor = med + 1

so that you are excluding med from the search range. You already know that it isn't the value based on the cmp result. By the way, I wouldn't use cmp as my own function name because it is a Python built-in function (I know that it is cmpGuess in the spec).

like image 31
Justin Peel Avatar answered Oct 18 '22 09:10

Justin Peel