Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Module Initialization Order?

I am a Python newbie coming from a C++ background. While I know it's not Pythonic to try to find a matching concept using my old C++ knowledge, I think this question is still a general question to ask:

Under C++, there is a well known problem called global/static variable initialization order fiasco, due to C++'s inability to decide which global/static variable would be initialized first across compilation units, thus a global/static variable depending on another one in different compilation units might be initialized earlier than its dependency counterparts, and when dependant started to use the services provided by the dependency object, we would have undefined behavior. Here I don't want to go too deep on how C++ solves this problem. :)

On the Python world, I do see uses of global variables, even across different .py files, and one typycal usage case I saw was: initialize one global object in one .py file, and on other .py files, the code just fearlessly start using the global object, assuming that it must have been initialized somewhere else, which under C++ is definitely unaccept by myself, due to the problem I specified above.

I am not sure if the above use case is common practice in Python (Pythonic), and how does Python solve this kind of global variable initialization order problem in general?

like image 464
Lin Avatar asked Jun 21 '10 03:06

Lin


2 Answers

Python import executes new Python modules from beginning to end. Subsequent imports only result in a copy of the existing reference in sys.modules, even if still in the middle of importing the module due to a circular import. Module attributes ("global variables" are actually at the module scope) that have been initialized before the circular import will exist.

main.py:

import a

a.py:

var1 = 'foo'
import b
var2 = 'bar'

b.py:

import a
print a.var1 # works
print a.var2 # fails
like image 62
Ignacio Vazquez-Abrams Avatar answered Oct 17 '22 23:10

Ignacio Vazquez-Abrams


Under C++, there is a well known problem called global/static variable initialization order fiasco, due to C++'s inability to decide which global/static variable would be initialized first across compilation units,

I think that statement highlights a key difference between Python and C++: in Python, there is no such thing as different compilation units. What I mean by that is, in C++ (as you know), two different source files might be compiled completely independently from each other, and thus if you compare a line in file A and a line in file B, there is nothing to tell you which will get placed first in the program. It's kind of like the situation with multiple threads: you cannot say whether a particular statement in thread 1 will be executed before or after a particular statement in thread 2. You could say C++ programs are compiled in parallel.

In contrast, in Python, execution begins at the top of one file and proceeds in a well-defined order through each statement in the file, branching out to other files at the points where they are imported. In fact, you could almost think of the import directive as an #include, and in that way you could identify the order of execution of all the lines of code in all the source files in the program. (Well, it's a little more complicated than that, since a module only really gets executed the first time it's imported, and for other reasons.) If C++ programs are compiled in parallel, Python programs are interpreted serially.

Your question also touches on the deeper meaning of modules in Python. A Python module - which is everything that is in a single .py file - is an actual object. Everything declared at "global" scope in a single source file is actually an attribute of that module object. There is no true global scope in Python. (Python programmers often say "global" and in fact there is a global keyword in the language, but it always really refers to the top level of the current module.) I could see that being a bit of a strange concept to get used to coming from a C++ background. It took some getting used to for me, coming from Java, and in this respect Java is a lot more similar to Python than C++ is. (There is also no global scope in Java)

I will mention that in Python it is perfectly normal to use a variable without having any idea whether it has been initialized/defined or not. Well, maybe not normal, but at least acceptable under appropriate circumstances. In Python, trying to use an undefined variable raises a NameError; you don't get arbitrary behavior as you might in C or C++, so you can easily handle the situation. You may see this pattern:

try:
    duck.quack()
except NameError:
    pass

which does nothing if duck does not exist. Actually, what you'll more commonly see is

try:
    duck.quack()
except AttributeError:
    pass

which does nothing if duck does not have a method named quack. (AttributeError is the kind of error you get when you try to access an attribute of an object, but the object does not have any attribute by that name.) This is what passes for a type check in Python: we figure that if all we need the duck to do is quack, we can just ask it to quack, and if it does, we don't care whether it's really a duck or not. (It's called duck typing ;-)

like image 26
David Z Avatar answered Oct 17 '22 22:10

David Z