Why is 'ord' seen as an unassigned variable here?

I hope this isn't a duplicate (it's hard to tell, given how many questions about this error turn out to be basic mistakes), but I don't understand what happens here.

def f():
    c = ord('a')

f()

This runs with no error (ord is a built-in that converts a character to its integer code, e.g. 97 for 'a'). Now:

if False:
    ord = None
def f():
    c = ord('a')

f()

This also runs with no error (ord isn't overwritten, because the condition is always false). Now:

def f():
    if False:
        ord = None
    c = ord('a')

f()

I get this error on the line c = ord('a'):

UnboundLocalError: local variable 'ord' referenced before assignment

It seems that merely having ord as the target (left-hand side) of an assignment makes it a local variable, even if that assignment never runs.

Obviously I can work around this, but I was very surprised, given that Python's dynamic nature lets you bind a variable to an integer on one line and to a string on the next.

It seems related to "What's the scope of a variable initialized in an if statement?"

Apparently the interpreter still takes note of unreached branches when compiling to bytecode, but what exactly happens?

(tested on Python 2.7 and Python 3.4)

Jean-François Fabre asked Mar 28 '18 15:03


2 Answers

It's not about the compiler doing a static analysis based on unrelated branches when compiling to bytecode; it's much simpler.

Python has a rule for distinguishing global, closure, and local variables. All variables that are assigned to in the function (including parameters, which are assigned to implicitly) are local variables, unless they are declared with a global or nonlocal statement. This is explained in the "Naming and binding" section (and the sections after it) of the Execution model chapter in the reference documentation.
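To see the rule exactly as the compiler applies it, the standard-library symtable module exposes the symbol tables CPython builds before it generates any bytecode. This is a minimal sketch assuming CPython 3.6+; the functions f, g and h and the helpers scope_of and classify are hypothetical names made up for the illustration.

```python
import symtable

# Hypothetical functions, compiled here only so we can inspect their symbol tables.
SOURCE = """
def f(x):
    if False:
        ord = None   # never executed, but still a binding of 'ord'
    return ord(x)

def g(x):
    global ord       # explicit opt-out from the "assigned means local" rule
    if False:
        ord = None
    return ord(x)

def h(x):
    return ord(x)    # 'ord' is never bound in h
"""


def scope_of(sym):
    """Describe how the compiler classified one symbol."""
    if sym.is_declared_global():
        return "global (declared)"
    if sym.is_local():
        return "local"
    if sym.is_global():
        return "global (implicit)"
    return "other"


def classify(source):
    """Map each top-level function to (scope of 'ord', whether 'x' is a parameter)."""
    module = symtable.symtable(source, "<example>", "exec")
    return {
        func.get_name(): (scope_of(func.lookup("ord")),
                          func.lookup("x").is_parameter())
        for func in module.get_children()
    }


for name, (ord_scope, x_is_param) in classify(SOURCE).items():
    print(f"{name}: ord is {ord_scope}; x is a parameter: {x_is_param}")
```

Only f, which contains a dead-but-present assignment, gets a local ord; h, which merely reads the name, gets an implicit global that later falls back to builtins.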

This isn't about keeping the interpreter simple; it's about keeping the rule simple enough that it's usually intuitive to human readers, and easy for them to work out when it isn't. (That's especially important for cases like this one. The behavior can't be intuitive everywhere, so Python keeps the rule simple enough that, once you learn it, cases like this are still obvious. But you definitely do have to learn the rule first. And, of course, most people learn the rule by being surprised by it the first time…)

Even if an optimizer were smart enough to remove all bytecode for if False: ord = None, ord would still have to be a local variable under the language's rules.

So: there's an ord = in your function, therefore every reference to ord in that function is a reference to a local variable, not to any global or nonlocal that happens to have the same name, and therefore your code raises UnboundLocalError.
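A small reproduction makes the consequence concrete. This sketch reuses the question's function; f_fixed is a hypothetical variant that adds a global statement, which is the opt-out the rule mentions. It is shown to illustrate the rule, not as a recommended fix (renaming the variable is usually cleaner).

```python
def f():
    if False:
        ord = None          # never runs, yet makes 'ord' local to f
    return ord('a')         # reads the empty local slot


def f_fixed():
    global ord              # 'ord' now refers to the module-level name
    if False:
        ord = None
    return ord('a')         # not in globals, so lookup falls through to builtins


try:
    f()
except UnboundLocalError as exc:
    print("f() failed:", exc)

print("f_fixed() returned", f_fixed())
```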


Many people get by without knowing the actual rule, and instead use an even simpler rule: a variable is

  • Local if it possibly can be, otherwise
  • Enclosing if it possibly can be, otherwise
  • Global if it's in globals, otherwise
  • Builtin if it's in builtins, otherwise
  • an error
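For the common case the simplified rule does give the right answer. In this sketch (outer, inner, x, y and z are illustrative names), each name returned by inner is found in a different one of the four scopes.

```python
x = "global x"


def outer():
    y = "enclosing y"

    def inner():
        z = "local z"
        # z -> Local, y -> Enclosing, x -> Global, len -> Builtin
        return z, y, x, len

    return inner()


print(outer())
```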

While this works for most cases, it can be misleading in some, like this one. A language with LEGB scoping done Lisp-style would see that ord isn't bound in the local namespace and fall back to the global (or builtin) one, but Python doesn't do that. You could say that ord is in the local namespace but bound to a special "undefined" value, and that's actually close to what happens under the covers. But that's not what the rules of Python say, and while it may be more intuitive for simple cases, it's harder to reason through.
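The "local but unbound" picture can be observed directly. In this sketch (unbound_demo is a hypothetical name), locals() does not list ord until it is actually assigned, and after del the name is unbound again; lookup does not fall back to the builtin.

```python
def unbound_demo():
    before = "ord" in locals()   # the compiler reserved a slot, but it is empty
    ord = None
    during = "ord" in locals()   # now bound to an object
    del ord                      # unbinds the local; no fallback to the builtin
    try:
        ord("a")
    except UnboundLocalError:
        after_del = "UnboundLocalError"
    else:
        after_del = "no error"
    return before, during, after_del


print(unbound_demo())
```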


If you're curious how this works under the covers:

In CPython, the compiler scans your function for every name it binds (assignment targets, plus parameters and other binding statements), leaves out names declared global or nonlocal, and stores the rest in an array. This array ends up as your code object's co_varnames, so let's say your ord is co_varnames[1]. Every use of that variable then gets compiled to LOAD_FAST 1 or STORE_FAST 1, instead of LOAD_NAME, LOAD_GLOBAL, STORE_GLOBAL, or some other operation. When interpreted, LOAD_FAST 1 just pushes slot 1 of the frame's fast-locals array (not the f_locals dictionary) onto the stack. That array starts off as NULL pointers instead of pointers to Python objects, and if LOAD_FAST loads a NULL pointer, it raises UnboundLocalError.
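You can check both halves of that description from Python with the dis module. This is a sketch assuming CPython 3.4+ (f_local, f_global and loads_of are hypothetical names). Newer releases use variants such as LOAD_FAST_CHECK for locals that may be unbound, so the sketch matches on the LOAD_FAST prefix rather than the exact opcode name.

```python
import dis


def f_local():
    if False:
        ord = None
    c = ord('a')


def f_global():
    c = ord('a')


def loads_of(func, name):
    """Opnames of the instructions in func that load `name`."""
    return [ins.opname for ins in dis.get_instructions(func)
            if ins.opname.startswith("LOAD") and ins.argval == name]


for func in (f_local, f_global):
    # co_varnames lists the names the compiler decided are local
    print(func.__name__, func.__code__.co_varnames, loads_of(func, "ord"))
```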

abarnert answered Sep 23 '22 13:09


Just to demonstrate what's going on with the compiler, here is the function followed by its dis.dis(f) output:

def f():
    if False:
        ord = None
    c = ord('a')

  4           0 LOAD_FAST                0 (ord)
              3 LOAD_CONST               1 ('a')
              6 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
              9 STORE_FAST               1 (c)
             12 LOAD_CONST               0 (None)
             15 RETURN_VALUE

Access to ord uses LOAD_FAST, the opcode for local variables.

If the never-executed assignment to ord is outside the function instead, the compiler uses LOAD_GLOBAL:

if False:
    ord = None
def f():
    c = ord('a')

  4           0 LOAD_GLOBAL              0 (ord)
              3 LOAD_CONST               1 ('a')
              6 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
              9 STORE_FAST               0 (c)
             12 LOAD_CONST               0 (None)
             15 RETURN_VALUE
user3483203 answered Sep 26 '22 13:09