Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python recursive function exceeds recursion limit. How can I convert it to iteration

I have created a function that reads lists of ID pairs (i.e. [("A","B"),("B","C"),("C","D"),...] and sequences the ID's from start to finish including any branches.

Each list of ordered ID's is held in a class called an Alignment and this function uses recursion to handle branches by creating a new alignment starting at the ID at which the branch splits from the main list.

I have found that with certain inputs it is possible to reach the maximum recursion limit set by Python. I know I could just increase this limit using sys.setrecursionlimit(), but as I do not know how many combinations of branches are possible, I'd like to avoid this tactic.

I have been reading several articles about converting recursive functions to iterative functions, but I have not been able to determine the best way to handle this particular function because the recursion takes place in the middle of the function and can be exponential.

Could any of you offer any suggestions?

Thanks, Brian

Code is posted below:

def buildAlignments(alignment, alignmentList, endIDs):
    while alignment.start in endIDs:

        #If endID only has one preceding ID: add preceding ID to alignment
        if len(endIDs[alignment.start]) == 1:
            alignment.add(endIDs[alignment.start][0])

        else:

            #List to hold all branches that end at spanEnd
            branches = []

            for each in endIDs[alignment.start]:

                #New alignment for each branch
                al = Alignment(each)

                #Recursively process each new alignment
                buildAlignments(al, branches, endIDs)

                branches.append(al)
            count = len(branches)
            i = 0
            index = 0

            #Loop through branches by length
            for branch in branches:
                if i < count - 1:

                    #Create copy of original alignment and add branch to alignment
                    al = Alignment(alignment)
                    al += branch #branches[index]
                    alignmentList.append(al)
                    i += 1

                #Add single branch to existing original alignment
                else: alignment += branch #branches[index]
                index += 1

def main():
    IDs = [("L", "G"), ("A", "B"), ("B", "I"), ("B", "H"), ("B", "C"), ("F", "G"), ("D", "E"), ("D", "J"), ("E", "L"), ("C", "D"), ("E", "F"), ("J", "K")]

    #Gather all startIDs with corresponding endIDs and vice versa
    startIDs = {}
    endIDs = {}
    for pair in IDs:
        if not pair[0] in startIDs: startIDs[pair[0]] = []
        startIDs[pair[0]].append(pair[1])
        if not pair[1] in endIDs: endIDs[pair[1]] = []
        endIDs[pair[1]].append(pair[0])

    #Create Alignment objects from any endID that does not start another pair (i.e. final ID in sequence)
    alignments = [Alignment(end) for end in endIDs if not end in startIDs]

    #Build build sequences in each original Alignment
    i = len(alignments)
    while i:
        buildAlignments(alignments[i-1], alignments, endIDs)
        i -= 1

EDIT: I should point out that the provided IDs are just a small sample I used for testing this algorithm. In actuality, the sequences of IDs may be several thousand long with many branches and branches off of branches.

RESOLUTION: Thanks to Andrew Cooke. The new method seems to be much simpler and much easier on the call stack. I did make some minor adjustments to his code to better suit my purposes. I have included the completed solution below:

from collections import defaultdict

def expand(line, have_successors, known):
    #print line
    known.append(line)
    for child in have_successors[line[-1]]:
        newline = line + [child]
        if line in known: known.remove(line)
        yield expand(newline, have_successors, known)

def trampoline(generator):
    stack = [generator]
    while stack:
        try:
            generator = stack.pop()
            child = next(generator)
            stack.append(generator)
            stack.append(child)
        except StopIteration:
            pass

def main(pairs):
    have_successors = defaultdict(lambda: set())
    links = set()
    for (start, end) in pairs:
        links.add(end)
        have_successors[start].add(end)
    known = []
    for node in set(have_successors.keys()):
        if node not in links:
            trampoline(expand([node], have_successors, known))
    for line in known:
        print line

if __name__ == '__main__':
    main([("L", "G"), ("A", "B"), ("B", "I"), ("B", "H"), ("B", "C"), ("F", "G"), ("D", "E"), ("D", "J"), ("E", "L"), ("C", "D"), ("E", "F"), ("J", "K")])

SUMMARY OF CHANGES: swapped links and have_successors to create list from start to end added if line in known: known.remove(line) to expand in order to retain only the complete series changed line variable from string to list in order to handle multiple characters in a single ID.

UPDATE: So I just discovered the reason I was having an issue with all this in the first place is do to circular references in the list of IDs I was provided. Now that the circular reference is fixed, either method works as expected. - Thanks again for all your help.

like image 789
Brian Avatar asked Aug 16 '11 18:08

Brian


People also ask

How do I bypass recursion limit in Python?

The maximum recursion depth in Python is 1000. You can change the limit by calling sys. setrecursionlimit() method.

How do you solve maximum recursion depth exceeded in Python?

The “maximum recursion depth exceeded in comparison” error is raised when you try to execute a function that exceeds Python's built in recursion limit. You can fix this error by rewriting your program to use an iterative approach or by increasing the recursion limit in Python.

What error do you get when you provide a large input to the recursive function in Python?

Stack Overflow Error in Recursive Function A recursive function that is called with an input that requires too many iterations will cause the call stack to get too large, resulting in a stack overflow error.

How do you convert iteration to recursion in Python?

Initialize the accumulator before the while-loop. Use the negation of the base-case condition as the loop's condition. Use the recursive function's body (except the recursive call) as the body of the while-loop. After the loop, apply the base-case update of the accumulator and return its value.


1 Answers

your code is a disorganised muddle. i can't tell what it is supposed to be doing in detail. if you were more careful (neater, clearer) then you would probably also find it easier to refactor.

anyway, this may do something like what you want:

from collections import defaultdict

def expand(line, links, known):
    print 'expand'
    known.append(line)
    for child in links[line[-1]]:
        newline = line + child
        yield expand(newline, links, known)

def trampoline(generator):
    stack = [generator]
    while stack:
        try:
            generator = stack.pop()
            print 'next'
            child = next(generator)
            stack.append(generator)
            stack.append(child)
        except StopIteration:
            pass

def main(pairs):
    have_successors = set()
    links = defaultdict(lambda: set())
    for (start, end) in pairs:
        have_successors.add(start)
        links[end].add(start)
    known = []
    for node in set(links.keys()):
        if node not in have_successors:
            trampoline(expand(node, links, known))
    for line in known:
        print line

if __name__ == '__main__':
    main([("L", "G"), ("A", "B"), ("B", "I"), ("B", "H"), ("B", "C"), ("F", "G"), ("D", "E"), ("D", "J"), ("E", "L"), ("C", "D"), ("E", "F"), ("J", "K")])

i used python2.7 - with earlier versions you may need to replace next(foo) with foo.__next__() or similar.


on writing cleaner code

first, i too am a self-taught programmer who started out as an academic (an astronomer), so you have my sympathy. and if you keep going, you can catch up and pass many "taught" programmers. it's not as hard as you might think...

second there's a difference between using "tricks" like defaultdict, which is just a matter of experience / practice, and being "tidy". i don't expect you to know about defaultdict - that will come with time.

but what you should be able to do now is write clean, simple code:

  • i think that you have comments that are about earlier versions of the code. one mentions "max length" but i don't see any length calculations. so either the comment is out of date (in which case why is it there) or it needs to be clearer (why are those things of max length?). in general, you should comment as little as possible, because otherwise it does get out of date. but at the same time you should use comments where it's not clear what the "ideas" are behind the code. the code should speak for itself, so don't say "i am adding two numbers here", but do say "fragments here must be maximum length because ..." if there is some "hidden" logic.

  • be careful with the pictures you use. for some reason your code starts with known terminals. so you're building things from the end back towards the start. why? that's a strange way of looking at the problem. wouldn't it be clearer to start with points that are in start, but not in end? and then use "startIDs" to grow them? that way you are "walking forwards". it might seem like a little thing, but it makes reading the code confusing.

  • use the right tools for the job. you didn't use startIDs, so why are you building a map? all you needed there was a set. perhaps you didn't know about sets, in which case OK (but you do now! :o). but otherwise, that too is confusing - someone reading your code expects you to be doing things for a reason. so when you do more than is necessary they wonder why.

  • avoid counting things when you don't need to. you have i and index and count. are they all needed? these kinds of counters are the easiest way to have bugs, because they can have silly little logic errors. and they make the code unclear. is if i < count - 1: really saying "is this the last branch"? if so, it would be much nicer to write if branch == branches [-1]: because that's clear about what you're thinking.

  • similarly with looping over alignments in main. using i just complicates things. you are processing each alignment, so just say for each alignment in alignments. if that gives an error because alignments is changing, make an unchanging copy: for each alignment in list(alignments).

  • avoid special cases if they are not needed. in buildAlignment you have a test right at the start for a special case. but was it really needed? would you have got the same result without it? often, when you write the code simply, it turns out that you don't need special cases. in my code i don't need to test whether there is one or no "links" because it works ok in all those cases. this gives you less code and less things to worry about and less chance for bugs.

more than all these things - you have to be obsessively tidy and methodical. you have lots of ideas, but rather than try out half of one then jump to another, write them down and work through them one by one. otherwise you end up with a mess and code that you don't understand. at first it feels like you are wasting time, but you start to see that as a result you are becoming faster because you spend less time confused...


on generators

[i modified the code slightly to separate out newline and to add print in a couple of places.]

first, did you run the code? is it doing the kind of thing you want? can you see how it connects with what you had before? my expand is similar to your buildAlignment (i hope).

if you run it (latest version) you'll see:

: python2.7 recurse.py
next
expand
next
expand
next
expand
next
expand
next
expand
next
expand
next
expand
next
next
...

which may give a clue about what is happening. the "trick" is the yield statement - the python compiler sees that and, instead of making an ordinary function, makes a generator.

a generator is a very strange thing. it's basically your function (in this case, expand), "bundled up" so that it can be run in stages. the running is done by next() and the function stops again each time a yield is reached.

so trampoline is passed this strange bundle. and it calls next(). that's the "magic" function that starts the function. so when next is called the function starts running, until it reaches the yield, where it returns a new bundle. the trampoline() command then saves the old bundle and starts working on the new one, calling next() on that, starting it... etc etc.

when a generator "runs out of things to do" it raises StopIteration. so when we reach a point where the expression cannot grow any more then we see that exception in trampoline(). at that point we return to the last "old" bundle (stored in our stack) and call next() again on that. this bundle restarts from where it was (just after yield) and continues, probably doing another loop in the while, until it hits yield again (or runs out and raises StopIteration).

so in the end, the code does the same as if the yield was not there! the only difference is that we keep making these bundles, and returning them. which seems pointless. except that we are no longer using the stack! because the bundle is returned the stack is not being "used up"! that is why we need to manage out own stack (the list stack) - otherwise there's no way to know what the previous call was.

ok, ok, i don't expect you to understand this at all. yes, it's kind-of crazy. now you need to go away and google for "python generators". and write some code of your own to test it out. but hopefully this points the way.


oh, also, i was thinking last night. and i suspect that if you were exhausting the stack it was actually because you have loops, not because the chains are so long. have you considered loops? A->B, B->C, C->A, ....

like image 197
andrew cooke Avatar answered Oct 21 '22 08:10

andrew cooke