Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: maximum recursion depth exceeded while calling a Python object

I've built a crawler that had to run on about 5M pages (by increasing the url ID) and then parses the pages which contain the info' I need.

after using an algorithm which run on the urls (200K) and saved the good and bad results I found that the I'm wasting a lot of time. I could see that there are a a few returning subtrahends which I can use to check the next valid url.

you can see the subtrahends quite fast (a little ex' of the few first "good IDs") -

510000011 # +8 510000029 # +18 510000037 # +8 510000045 # +8 510000052 # +7 510000060 # +8 510000078 # +18 510000086 # +8 510000094 # +8 510000102 # +8 510000110 # etc' 510000128 510000136 510000144 510000151 510000169 510000177 510000185 510000193 510000201 

after crawling about 200K urls which gave me only 14K good results I knew I was wasting my time and need to optimize it, so I run some statistics and built a function that will check the urls while increasing the id with 8\18\17\8 (top returning subtrahends ) etc'.

this is the function -

def checkNextID(ID):     global numOfRuns, curRes, lastResult     while ID < lastResult:         try:             numOfRuns += 1             if numOfRuns % 10 == 0:                 time.sleep(3) # sleep every 10 iterations             if isValid(ID + 8):                 parseHTML(curRes)                 checkNextID(ID + 8)                 return 0             if isValid(ID + 18):                 parseHTML(curRes)                 checkNextID(ID + 18)                 return 0             if isValid(ID + 7):                 parseHTML(curRes)                 checkNextID(ID + 7)                 return 0             if isValid(ID + 17):                 parseHTML(curRes)                 checkNextID(ID + 17)                 return 0             if isValid(ID+6):                 parseHTML(curRes)                 checkNextID(ID + 6)                 return 0             if isValid(ID + 16):                 parseHTML(curRes)                 checkNextID(ID + 16)                 return 0             else:                 checkNextID(ID + 1)                 return 0         except Exception, e:             print "somethin went wrong: " + str(e) 

what is basically does is -checkNextID(ID) is getting the first id I know that contain the data minus 8 so the first iteration will match the first "if isValid" clause (isValid(ID + 8) will return True).

lastResult is a variable which saves the last known url id, so we'll run until numOfRuns is

isValid() is a function that gets an ID + one of the subtrahends and returns True if the url contains what I need and saves a soup object of the url to a global varibale named - 'curRes', it returns False if the url doesn't contain the data I need.

parseHTML is a function that gets the soup object (curRes), parses the data I need and then saves the data to a csv, then returns True.

if isValid() returns True, we'll call parseHTML() and then try to check the next ID+the subtrahends (by calling checkNextID(ID + subtrahends), if none of them will return what I'm looking for I'll increase it with 1 and check again until I'll find the next valid url.

you can see the rest of the code here

after running the code I got about 950~ good results and suddenly an exception had raised -

"somethin went wrong: maximum recursion depth exceeded while calling a Python object"

I could see on WireShark that the scipt stuck on id - 510009541 (I started my script with 510000003), the script tried getting the url with that ID a few times before I noticed the error and stopped it.

I was really exciting to see that I got the same results but 25x-40x times faster then my old script, with fewer HTTP requests, it's very precise, I have missed only 1 result for 1000 good results, which is find by me, it's impossible to rum 5M times, I had my old script running for 30 hours and got 14-15K results when my new script gave me 960~ results in 5-10 minutes.

I read about stack limitations, but there must be a solution for the algorithm I'm trying to implement in Python (I can't go back to my old "algorithm", it will never end).

Thanks!

like image 593
YSY Avatar asked Jul 24 '11 20:07

YSY


People also ask

How do you fix maximum recursion depth exceeded while calling a Python object?

The recursion depth limit in Python is by default 1000 . You can change it using sys. setrecursionlimit() function.

How do you solve maximum recursion depth exceeded?

The “maximum recursion depth exceeded in comparison” error is raised when you try to execute a function that exceeds Python's built in recursion limit. You can fix this error by rewriting your program to use an iterative approach or by increasing the recursion limit in Python.

What RecursionError maximum recursion depth exceeded while calling a Python object?

The RecursionError occurs because the Python interpreter has exceeded the recursion limit allowed. The reason why the Python interpreter limits the number of times recursion can be performed is to avoid infinite recursion and hence avoid a stack overflow.

What is the maximum depth of recursive calls a function in Python?

Python's default recursion limit is 1000, meaning that Python won't let a function call on itself more than 1000 times, which for most people is probably enough. The limit exists because allowing recursion to occur more than 1000 times doesn't exactly make for lightweight code.


1 Answers

Python don't have a great support for recursion because of it's lack of TRE (Tail Recursion Elimination).

This means that each call to your recursive function will create a function call stack and because there is a limit of stack depth (by default is 1000) that you can check out by sys.getrecursionlimit (of course you can change it using sys.setrecursionlimit but it's not recommended) your program will end up by crashing when it hits this limit.

As other answer has already give you a much nicer way for how to solve this in your case (which is to replace recursion by simple loop) there is another solution if you still want to use recursion which is to use one of the many recipes of implementing TRE in python like this one.

N.B: My answer is meant to give you more insight on why you get the error, and I'm not advising you to use the TRE as i already explained because in your case a loop will be much better and easy to read.

like image 112
mouad Avatar answered Sep 23 '22 14:09

mouad