Given a package:
package/
├── __init__.py
└── module.py
__init__.py:
from .module import function
module.py:
def function():
    pass
One can import the package and print its namespace.
python -c 'import package; print(dir(package))'
['__builtins__', ..., 'function', 'module']
Question:
Why does the namespace of package contain module when only function was imported in the __init__.py?
I would have expected that the package's namespace would only contain function and not module. This mechanism is also mentioned in the documentation,
"When a submodule is loaded using any mechanism (e.g. importlib APIs, the import or import-from statements, or built-in __import__()) a binding is placed in the parent module’s namespace to the submodule object."
but is not really motivated. For me this choice seems odd, as I think of sub-modules as implementation detail to structure packages and do not expect them to be part of the API as the structure can change.
Also, I know "Python is for consenting adults" and one cannot truly hide anything from a user. But I would argue that binding the sub-module names in the package's namespace makes it less obvious to a user what is actually part of the API and what can change.
Why not use a __sub_modules__ attribute or so to make sub-modules accessible to a user? What is the reason for this design decision?
You say you think of submodules as implementation details. This is not the design intent behind submodules; they can be, and extremely commonly are, part of the public interface of a package. The import system was designed to facilitate access to submodules, not to prevent access.
Loading a submodule places a binding into the parent's namespace because this is necessary for access to the module. For example, after the following code:
import package.submodule
the expression package.submodule must evaluate to the module object for the submodule. package evaluates to the module object for the package, so this module object must have a submodule attribute referring to the module object for the submodule.
At this point, you are almost certainly thinking, "hey, there's no reason from .submodule import function has to do the same thing!" It does the same thing because this attribute binding is part of submodule initialization, which only happens on the first import, and which needs to do the same setup regardless of what kind of import triggered it.
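To see the binding happen, here is a minimal sketch that recreates the question's layout in a temporary directory. The package name demo_pkg and the temporary location are made up for the demonstration; only the file contents come from the question.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk (demo_pkg is a hypothetical name).
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("from .module import function\n")
(pkg_dir / "module.py").write_text("def function():\n    pass\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_pkg = importlib.import_module("demo_pkg")

# 'function' was imported explicitly in __init__.py; 'module' was bound
# on the package as a side effect of loading the submodule.
has_function = hasattr(demo_pkg, "function")
has_module = hasattr(demo_pkg, "module")
same_object = demo_pkg.module is sys.modules["demo_pkg.module"]
print(has_function, has_module, same_object)  # True True True
```

The attribute on the package and the entry in sys.modules are the same module object, which is what makes package.submodule work after a plain import package.submodule.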
This is not an extremely strong reason. With enough changes and rejiggering, the import system definitely could have been designed the way you expect. It was not designed that way because the designers had different priorities than you. Python's design cares very little about hiding things or supporting any notion of privacy.
You have to understand that Python is a runtime language: def, class and import are all executable statements that, when executed, create (respectively) a function, class or module object and bind it in the current namespace.
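A small sketch of what "executable statement" means in practice: the names below only exist if the branch containing the statements actually runs. The function bound_names and the names inside it are invented for the illustration.

```python
def bound_names(flag):
    # import, def and class run like any other statement, so each one
    # only binds its name if execution reaches it.
    if flag:
        import json

        def helper():
            return None

        class Thing:
            pass
    # locals() lists the names bound so far in this call.
    return sorted(locals())


print(bound_names(False))  # ['flag']
print(bound_names(True))   # ['Thing', 'flag', 'helper', 'json']
```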
With regard to modules (packages being modules too - at least at runtime), the very first time a module is imported (directly or indirectly) in a given process, the matching .py (well, usually its compiled .pyc version) is executed (all statements at the top level are executed in order), and the resulting namespace is used to populate the module instance. Only once this has been done can any name defined in the module be accessed (you cannot access something that doesn't exist yet, can you?). The module object is also cached in sys.modules, so subsequent imports reuse it instead of executing the file again. In this process, when a sub-module is loaded, it is bound as an attribute of its parent module.
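The "executed once, then cached" behaviour can be checked with a rough sketch like the following. The module name demo_once and the DEMO_ONCE_RUNS environment variable are assumptions made up for the demonstration; the variable is only used as a counter that survives outside the module.

```python
import importlib
import os
import sys
import tempfile
from pathlib import Path

# A throwaway module whose top-level code counts how often it runs.
root = Path(tempfile.mkdtemp())
(root / "demo_once.py").write_text(
    "import os\n"
    "runs = int(os.environ.get('DEMO_ONCE_RUNS', '0')) + 1\n"
    "os.environ['DEMO_ONCE_RUNS'] = str(runs)\n"
)
sys.path.insert(0, str(root))
importlib.invalidate_caches()
os.environ.pop("DEMO_ONCE_RUNS", None)

first = importlib.import_module("demo_once")   # executes the file
second = importlib.import_module("demo_once")  # served from sys.modules

executed = os.environ["DEMO_ONCE_RUNS"]
print(executed, first is second, sys.modules["demo_once"] is first)
# 1 True True
```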
For me this choice seems odd, as I think of sub-modules as implementation detail to structure packages and do not expect them to be part of the API as the structure can change
Actually, Python's designers considered things the other way round: a "package" (note that there's no 'package' type at runtime) is mostly a convenience to organize a collection of related modules - IOW, the module is the real building block - and as a matter of fact, at runtime, when what you import is technically a "package", it still materializes as a module object.
Now, regarding the "do not expect them to be part of the API as the structure can change", this has of course been taken into account. It's actually quite a common pattern to start out with a single module and then turn it into a package as the code base grows - without impacting client code, of course. The key here is to make proper use of your package's initializer - the __init__.py file - which is actually what your package's module instance is built from. This lets the package act as a "facade", masking the "implementation details" of which submodule actually defines which function, class or whatever.
So the solution here is simply to, in your package's __init__.py, 1/ import the names you want to make public (so the client code can import directly from your package instead of having to go through the submodule) and 2/ define the __all__ attribute with the names that should be considered public, so the interface is clearly documented.
FWIW, this last operation should be done for all your submodules too, and you can also use the _single_leading_underscore naming convention for things that are really really "implementation details".
None of this will of course prevent anyone from importing even "private" names directly from your submodules, but then they are on their own when something breaks ("we are all consenting adults" etc).
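As an illustration of the facade approach, here is a sketch that builds a package whose __init__.py re-exports one name and declares __all__. The names demo_facade, _impl and _helper are hypothetical, chosen only to show the conventions described above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package: the public name lives in an underscore-prefixed
# submodule and is re-exported by __init__.py.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_facade"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text(
    "from ._impl import function\n"
    "__all__ = ['function']\n"
)
(pkg_dir / "_impl.py").write_text(
    "__all__ = ['function']\n"
    "def function():\n"
    "    return 'public'\n"
    "def _helper():\n"
    "    return 'private'\n"
)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Run a star-import into a scratch namespace to see what __all__ exposes.
star = {}
exec("from demo_facade import *", star)
exported = sorted(name for name in star if not name.startswith("__"))
demo_facade = sys.modules["demo_facade"]

print(exported)                       # ['function']
print(hasattr(demo_facade, "_impl"))  # True
```

Note that __all__ only controls what a star-import exposes and documents the intended interface; the submodule is still bound on the package, it is just marked as private by its leading underscore.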