Catch exception gets UnboundLocalError

I wrote a crawler to fetch information from a Q&A website. Since not all the fields are present on every page, I used multiple try/except blocks to handle the missing ones.

def answerContentExtractor( loginSession, questionLinkQueue , answerContentList) :
    while True:
        URL = questionLinkQueue.get()
        try:
            response   = loginSession.get(URL,timeout = MAX_WAIT_TIME)
            raw_data   = response.text

            #These fields must exist, or something went wrong...
            questionId = re.findall(REGEX,raw_data)[0]
            answerId   = re.findall(REGEX,raw_data)[0]
            title      = re.findall(REGEX,raw_data)[0]

        except requests.exceptions.Timeout ,IndexError:
            print >> sys.stderr, URL + " extraction error..."

        try:
            questionInfo = re.findall(REGEX,raw_data)[0]
        except IndexError:
            questionInfo = ""

        try:
            answerContent = re.findall(REGEX,raw_data)[0]
        except IndexError:
            answerContent = ""

        result = {
                  'questionId'   : questionId,
                  'answerId'     : answerId,
                  'title'        : title,
                  'questionInfo' : questionInfo,
                  'answerContent': answerContent
        }

This code intermittently raises the following exception at runtime:

UnboundLocalError: local variable 'IndexError' referenced before assignment

The line number indicates the error occurs at the second except IndexError:

Thanks, everyone, for your suggestions. I would love to give all of you the marks you deserve; too bad I can only mark one answer as correct.

1 Answer

I think the problem is this line:

except requests.exceptions.Timeout ,IndexError

In Python 2, this is equivalent to:

except requests.exceptions.Timeout  as IndexError:

So you're binding the caught requests.exceptions.Timeout instance to the name IndexError, not catching IndexError. The error can be reproduced with this code:

try:
    true
except NameError, IndexError:
    print IndexError
    #name 'true' is not defined

To catch multiple exceptions use a tuple:

except (requests.exceptions.Timeout, IndexError):
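
As a rough illustration of the tuple form, here is a small self-contained sketch. It is written for Python 3 (the question's code is Python 2) so it can be run as-is. FetchTimeout, extract_first and safe_extract are hypothetical names made up for this example; FetchTimeout only stands in for requests.exceptions.Timeout so that the snippet has no third-party dependency.

```python
import re


class FetchTimeout(Exception):
    """Hypothetical stand-in for requests.exceptions.Timeout."""


def extract_first(pattern, text):
    # re.findall returns a list; [0] raises IndexError when nothing matched.
    return re.findall(pattern, text)[0]


def safe_extract(pattern, text, fail_with_timeout=False):
    try:
        if fail_with_timeout:
            raise FetchTimeout("simulated timeout")
        return extract_first(pattern, text)
    except (FetchTimeout, IndexError) as exc:
        # The tuple lists the exception types to catch;
        # "as exc" is the only part that binds a name.
        return "error: " + type(exc).__name__
```

Because the types sit inside a tuple, nothing is assigned to the name IndexError, so later except IndexError: clauses in the same function still refer to the built-in.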

The UnboundLocalError occurs because that assignment makes IndexError a local variable of your function, so referencing it before it has been assigned raises UnboundLocalError.

>>> 'IndexError' in answerContentExtractor.func_code.co_varnames
True

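So if the first handler (except requests.exceptions.Timeout ,IndexError) is never executed at runtime, the local name IndexError is never assigned. The later except IndexError: is only evaluated when its own try block raises, and at that point the lookup of the unassigned local raises UnboundLocalError, which is why the failure shows up only sometimes. Sample code to reproduce the error: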

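def func():
    try: pass                            # nothing raised, handler never runs
    except NameError, IndexError: pass   # still makes IndexError a local name
    try: [][1]                           # raises the built-in IndexError
    except IndexError: pass              # looks up the unassigned local
func()
#UnboundLocalError: local variable 'IndexError' referenced before assignment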
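
For anyone who wants to try this on Python 3, where the comma syntax is a SyntaxError, the same shadowing can be sketched with the explicit "as" spelling. This is only an illustrative sketch: shadowing_demo, is_local and outcome are hypothetical names, and the exact wording of the error message differs between Python versions, so the snippet checks only the exception type.

```python
def shadowing_demo():
    try:
        pass  # nothing is raised, so the handler below never runs
    except NameError as IndexError:  # still compiles IndexError as a local name
        pass
    try:
        [][1]  # raises the built-in IndexError
    except IndexError:  # evaluates the local name, which was never assigned
        return "caught"


# Python 3 spelling of the func_code.co_varnames check used above.
is_local = "IndexError" in shadowing_demo.__code__.co_varnames

try:
    shadowing_demo()
    outcome = "no error"
except UnboundLocalError:
    outcome = "UnboundLocalError"
```

Here is_local ends up True and outcome ends up "UnboundLocalError", which mirrors the Python 2 behaviour described above.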
