This is a beautifulsoup
procedure that grabs content within all <p>
html tags. After grabbing content from some web pages, I get an error that says maximum recursion depth exceeded.
def printText(tags):
for tag in tags:
if tag.__class__ == NavigableString:
print tag,
else:
printText(tag)
print ""
#loop over urls, send soup to printText procedure
The bottom of trace:
File "web_content.py", line 16, in printText
printText(tag)
File "web_content.py", line 16, in printText
printText(tag)
File "web_content.py", line 16, in printText
printText(tag)
File "web_content.py", line 16, in printText
printText(tag)
File "web_content.py", line 16, in printText
printText(tag)
File "web_content.py", line 13, in printText
if tag.__class__ == NavigableString:
RuntimeError: maximum recursion depth exceeded in cmp
Your printText() calls itself recursively if it encounters anything other than a NavigableString. This includes subclasses of NavigableString, such as Comment. Calling printText() on a Comment iterates over the text of the comment, and causes the infinite recursion you see.
I recommend using isinstance() in your if statement instead of comparing class objects:
if isinstance(tag, basestring):
I diagnosed this problem by inserting a print statement before the recursion:
print "recursing on", tag, type(tag)
printText(tag)
You probably hit a string. Iterating over a string yields 1-length strings. Iterating over that 1-length string yields a 1-length string. Iterating over THAT 1-length string...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With