BeautifulSoup, maximum recursion depth exceeded

This is a BeautifulSoup procedure that grabs the content within all <p> HTML tags. After grabbing content from some web pages, I get an error saying that the maximum recursion depth was exceeded.

def printText(tags):
    for tag in tags:
        if tag.__class__ == NavigableString:
            print tag,
        else:
            printText(tag)
    print ""
#loop over urls, send soup to printText procedure

The bottom of the traceback:

  File "web_content.py", line 16, in printText
    printText(tag)
  File "web_content.py", line 16, in printText
    printText(tag)
  File "web_content.py", line 16, in printText
    printText(tag)
  File "web_content.py", line 16, in printText
    printText(tag)
  File "web_content.py", line 16, in printText
    printText(tag)
  File "web_content.py", line 13, in printText
    if tag.__class__ == NavigableString:
RuntimeError: maximum recursion depth exceeded in cmp
asked Apr 12 '12 by yayu


2 Answers

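Your printText() calls itself recursively whenever it encounters anything whose class is not exactly NavigableString. That includes subclasses of NavigableString, such as Comment. Calling printText() on a Comment iterates over the characters of the comment; each character is a plain string, which also fails the check, so printText() recurses on it, and iterating a one-character string yields that same string again. That is the infinite recursion you see.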
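To illustrate why the exact-class comparison misses comments, here is a small Python 3 sketch. It does not import BeautifulSoup; NavigableString and Comment are hypothetical stand-ins that only mirror the assumption that Comment is a subclass of NavigableString, which in turn subclasses str.

```python
class NavigableString(str):
    """Hypothetical stand-in for BeautifulSoup's NavigableString."""


class Comment(NavigableString):
    """Hypothetical stand-in for BeautifulSoup's Comment (a subclass)."""


comment = Comment(" a comment ")

# The question's check compares classes exactly, so a subclass fails it
# and printText() would recurse into the comment instead of printing it.
exact_match = comment.__class__ == NavigableString

# isinstance() also accepts subclasses, so the comment counts as text.
subclass_match = isinstance(comment, NavigableString)

print(exact_match, subclass_match)  # False True
```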

I recommend using isinstance() in your if statement instead of comparing class objects:

if isinstance(tag, basestring):
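In Python 3, basestring no longer exists and str covers all text, so the same fix would test isinstance(tag, str). The sketch below assumes, as BeautifulSoup does, that iterating a tag yields its children; to stay runnable without BeautifulSoup installed, NavigableString, Comment and Tag are hypothetical stand-ins, and collect_text is a hypothetical name that returns the text pieces instead of printing them. Like the isinstance(tag, basestring) check, it treats comments as ordinary text.

```python
class NavigableString(str):
    """Hypothetical stand-in for BeautifulSoup's NavigableString."""


class Comment(NavigableString):
    """Hypothetical stand-in for BeautifulSoup's Comment."""


class Tag:
    """Hypothetical stand-in for a tag: iterating it yields its children."""

    def __init__(self, *children):
        self.children = list(children)

    def __iter__(self):
        return iter(self.children)


def collect_text(tags):
    """Return every string under tags, treating any str (including
    NavigableString and Comment) as a leaf so the recursion always stops."""
    parts = []
    for tag in tags:
        if isinstance(tag, str):  # Python 3 counterpart of basestring
            parts.append(tag)
        else:
            parts.extend(collect_text(tag))
    return parts


page = Tag(Tag("Hello, ", Tag("world")), Comment(" hidden "), "!")
print("".join(collect_text(page)))  # Hello, world hidden !
```

Collecting into a list rather than printing keeps the traversal separate from output, so the caller can join, filter or print the pieces as needed.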

I diagnosed this problem by inserting a print statement before the recursion:

print "recursing on", tag, type(tag)
printText(tag)
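A Python 3 version of that diagnostic, applied to the question's exact-class check, might look like this. NavigableString and Comment are hypothetical stand-ins, and the max_depth cap is a hypothetical addition (the original has none) so the demo stops instead of raising; the indented trace shows the recursion descending from the Comment into the same one-character strings repeatedly.

```python
class NavigableString(str):
    """Hypothetical stand-in for BeautifulSoup's NavigableString."""


class Comment(NavigableString):
    """Hypothetical stand-in for BeautifulSoup's Comment."""


def print_text_traced(tags, depth=0, max_depth=3, trace=None):
    """The question's exact-class check, with a diagnostic line printed
    before each recursive call, indented by recursion depth."""
    if trace is None:
        trace = []
    for tag in tags:
        if tag.__class__ == NavigableString:
            continue  # the original prints this; skipped to keep the trace short
        line = "  " * depth + "recursing on %r %s" % (tag, type(tag).__name__)
        print(line)
        trace.append(line)
        if depth + 1 < max_depth:  # hypothetical cap so the demo terminates
            print_text_traced(tag, depth + 1, max_depth, trace)
    return trace


trace = print_text_traced([Comment("hi")])
# recursing on 'hi' Comment
#   recursing on 'h' str
#     recursing on 'h' str
#   recursing on 'i' str
#     recursing on 'i' str
```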
answered Sep 28 '22 by Leonard Richardson


You probably hit a string. Iterating over a string yields length-1 strings. Iterating over a length-1 string yields that same length-1 string. Iterating over THAT string yields it again, so the recursion never bottoms out.
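A minimal Python 3 demonstration: a walker with no base case for strings (naive_walk is a hypothetical name) recurses on "x" until it hits the interpreter's recursion limit. Python 3 raises RecursionError, a subclass of RuntimeError, where Python 2 raised the plain RuntimeError shown in the question's traceback.

```python
def naive_walk(node):
    """Recurse into every child, with no base case for strings."""
    for child in node:
        naive_walk(child)


# A one-character string iterates to itself, so the recursion never ends.
print(list("x") == ["x"])  # True

try:
    naive_walk("x")
    hit_limit = False
except RecursionError:
    hit_limit = True

print(hit_limit)  # True
```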

answered Sep 28 '22 by Ignacio Vazquez-Abrams