TL;DR: What's the cleanest way to keep implementation details out of a module's namespace?
There are a number of similar questions on this topic already, but none seems to have a satisfactory answer relative to modern tools and language features.
I'm designing a Python package, and I'd like to keep each module's public interface clean, exposing only what's intended and keeping implementation details (especially imports) hidden.
Over the years, I've seen a number of techniques:
Leaving everything exposed is just horrible, in my opinion. A well-designed interface should be easily discoverable. Having the implementation details publicly visible makes the interface much more confusing. Even as the author of a package, I don't want to use it when it exposes too much, as it makes autocompletion less useful.
Prefixing private names with an underscore is a well-understood convention, and most development tools are smart enough to at least sort underscore-prefixed names to the bottom of autocomplete lists. It works fine if you have a small number of names to treat this way, but as the number of names grows, it becomes more and more tedious and ugly.
Take for example this relatively simple list of imports:
import struct
from abc import abstractmethod, ABC
from enum import Enum
from typing import BinaryIO, Dict, Iterator, List, Optional, Type, Union
Applying the underscore technique, this relatively small list of imports becomes this monstrosity:
import struct as _struct
from abc import abstractmethod as _abstractmethod, ABC as _ABC
from enum import Enum as _Enum
from typing import (
    BinaryIO as _BinaryIO,
    Dict as _Dict,
    Iterator as _Iterator,
    List as _List,
    Optional as _Optional,
    Type as _Type,
    Union as _Union
)
Now, I know this problem can be partially mitigated by never doing from imports, just importing the entire package and package-qualifying everything. While that does help, and I realize that some people prefer to do this anyway, it doesn't eliminate the problem, and it's not my preference. There are some packages I prefer to import directly, but I usually prefer to import type names and decorators explicitly so that I can use them unqualified.
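For comparison, the module-qualified variant described above could be sketched roughly like this. The aliases `_abc`, `_struct`, and `_t` are hypothetical names chosen for illustration, and `parse_ints` is a simplified stand-in rather than the implementation shown later in the question:

```python
import abc as _abc
import struct as _struct
import typing as _t


class Widget(_abc.ABC):
    @_abc.abstractmethod
    def implement_me(self, input: _t.List[int]) -> _t.Dict[str, object]:
        ...


def parse_ints(r: _t.BinaryIO) -> _t.List[int]:
    # Simplified stand-in: read a 4-byte little-endian count,
    # then that many 4-byte little-endian integers.
    count = _struct.unpack("<i", r.read(4))[0]
    return list(_struct.unpack(f"<{count}i", r.read(4 * count)))
```

Only three underscore-prefixed names are added to the namespace, but every use site now has to be qualified, and the aliases still show up in the public signatures.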
There's an additional small problem with the underscore prefix. Take the following publicly exposed class:
class Widget(_ABC):
    @_abstractmethod
    def implement_me(self, input: _List[int]) -> _Dict[str, object]:
        ...
A consumer of this package writing his own Widget implementation will see that he needs to implement the implement_me method, and that it needs to take a _List and return a _Dict. Those aren't actual type names, and now the implementation-hiding mechanism has leaked into my public interface. It's not a big problem, but it does contribute to the ugliness of this solution.
Wrapping the whole module in a function is definitely hacky, and it doesn't play well with most development tools.
Here's an example:
def module():
    import struct
    from abc import abstractmethod, ABC
    from typing import BinaryIO, Dict, List

    def fill_list(r: BinaryIO, count: int, lst: List[int]) -> None:
        while count > 16:
            lst.extend(struct.unpack("<16i", r.read(16 * 4)))
            count -= 16
        while count > 4:
            lst.extend(struct.unpack("<4i", r.read(4 * 4)))
            count -= 4
        for _ in range(count):
            lst.append(struct.unpack("<i", r.read(4))[0])

    def parse_ints(r: BinaryIO) -> List[int]:
        count = struct.unpack("<i", r.read(4))[0]
        rtn: List[int] = []
        fill_list(r, count, rtn)
        return rtn

    class Widget(ABC):
        @abstractmethod
        def implement_me(self, input: List[int]) -> Dict[str, object]:
            ...

    return (parse_ints, Widget)

parse_ints, Widget = module()
del module
This works, but it's super hacky, and I don't expect it to operate cleanly in all development environments. ptpython, for example, fails to provide method signature information for the parse_ints function. Also, the type of Widget becomes my_package.module.<locals>.Widget instead of my_package.Widget, which is weird and confusing to consumers.
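The `<locals>` naming is easy to reproduce. The sketch below uses a trimmed-down, hypothetical version of the wrapped module and then shows one possible patch, overwriting `__qualname__`. The patch is an assumption on my part, not something tried in the question, and it does nothing for the tooling problems:

```python
def module():
    # Hypothetical, trimmed-down version of the wrapped module.
    class Widget:
        pass

    def parse_ints():
        return []

    return parse_ints, Widget


parse_ints, Widget = module()
del module

# The qualified names record where the objects were defined.
leaked_class_name = Widget.__qualname__     # 'module.<locals>.Widget'
leaked_func_name = parse_ints.__qualname__  # 'module.<locals>.parse_ints'

# Possible patch: overwrite __qualname__ so reprs show the public name.
Widget.__qualname__ = "Widget"
parse_ints.__qualname__ = "parse_ints"
```

This only changes how the objects describe themselves; the hack still needs one manual fix-up per exported name.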
__all__.
This is a commonly given solution to this problem: list the "public" members in the global __all__ variable:
import struct
from abc import abstractmethod, ABC
from typing import BinaryIO, Dict, List

__all__ = ["parse_ints", "Widget"]

def fill_list(r: BinaryIO, count: int, lst: List[int]) -> None:
    ...  # You've seen this.

def parse_ints(r: BinaryIO) -> List[int]:
    ...  # This, too.

class Widget(ABC):
    ...  # And this.
This looks nice and clean, but unfortunately, the only thing __all__ affects is what happens when you use a wildcard import (from my_package import *), which most people don't do anyway.
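A small experiment illustrates the point. It builds a throwaway module in memory (`demo_module` and its contents are hypothetical names used only for this sketch) and compares what a wildcard import copies with what `dir()` reports:

```python
import sys
import types

# Source of a throwaway module that declares __all__.
source = '''
import struct

__all__ = ["parse_ints"]

def fill_list(): ...
def parse_ints(): ...
'''

demo = types.ModuleType("demo_module")
exec(source, demo.__dict__)
sys.modules["demo_module"] = demo  # make it importable by name

# A wildcard import copies only the names listed in __all__ ...
star_namespace = {}
exec("from demo_module import *", star_namespace)
star_names = sorted(n for n in star_namespace if not n.startswith("__"))

# ... but attribute access and dir() still expose everything.
visible = [n for n in dir(demo) if not n.startswith("__")]

print(star_names)  # ['parse_ints']
print(visible)     # ['fill_list', 'parse_ints', 'struct']
```

So `demo_module.struct` and `demo_module.fill_list` remain reachable, and they are what an autocomplete list built from `dir()` would show.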
__init__.py.
This is what I'm currently doing, and it's pretty clean for most cases, but it can get ugly if I'm exposing multiple modules instead of flattening everything:
my_package/
+--__init__.py
+--_widget.py
+--shapes/
   +--__init__.py
   +--circle/
   |  +--__init__.py
   |  +--_circle.py
   +--square/
   |  +--__init__.py
   |  +--_square.py
   +--triangle/
      +--__init__.py
      +--_triangle.py
Then my __init__.py files look kind of like this:
# my_package/__init__.py
from my_package._widget import parse_ints, Widget

# my_package/shapes/circle/__init__.py
from my_package.shapes.circle._circle import Circle, Sphere

# my_package/shapes/square/__init__.py
from my_package.shapes.square._square import Square, Cube

# my_package/shapes/triangle/__init__.py
from my_package.shapes.triangle._triangle import Triangle, Pyramid
This makes my interface clean, and works well with development tools, but it makes my directory structure pretty messy if my package isn't completely flat.
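To see what this layout exposes, here is a self-contained sketch that writes a tiny package to a temporary directory and imports it. The name `demo_package` and the simplified `parse_ints` body are assumptions made for the sketch, not the real package:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a minimal package on disk: a private module plus a re-exporting __init__.py.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_package"
pkg.mkdir()
(pkg / "_widget.py").write_text(
    "import struct\n"
    "from abc import ABC\n"
    "\n"
    "def parse_ints(r):\n"
    "    count = struct.unpack('<i', r.read(4))[0]\n"
    "    return list(struct.unpack(f'<{count}i', r.read(4 * count)))\n"
    "\n"
    "class Widget(ABC):\n"
    "    pass\n"
)
(pkg / "__init__.py").write_text(
    "from demo_package._widget import parse_ints, Widget\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_package = importlib.import_module("demo_package")

# Only the re-exported names appear without an underscore prefix.
public_names = [n for n in dir(demo_package) if not n.startswith("_")]
print(public_names)                    # ['Widget', 'parse_ints']
print(hasattr(demo_package, "struct"))  # False
print(demo_package.Widget.__module__)   # demo_package._widget
```

The implementation imports stay inside `_widget`. The private submodule itself is still bound on the package as `_widget`, and `Widget.__module__` reports the private module path, so the hiding is not complete.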
Is there a better technique?
prefix: This has the benefit of avoiding long import statements and the prefix helps avoid namespace collisions.
In Python-speak, a module is a namespace: a place where names are created. Names that live in a module are called its attributes. Technically, modules correspond to files, and Python creates a module object to contain all the names defined in the file; but in simple terms, modules are just namespaces.
Namespace pollution is a lot like pollution in general. It means that something is misplaced. In programming that means that code that should really live in separate namespaces is added to a common namespace (in some cases the global namespace).
Namespaces help us uniquely identify all the names inside a program. However, this doesn't imply that we can use a variable name anywhere we want. A name also has a scope, which defines the parts of the program where you can use that name without any prefix.
Convert to subpackages to limit the number of classes in one place and to separate concerns. If a class or constant is not needed outside of its module, prefix it with an underscore. Import the module itself if you do not want to explicitly import many classes from it. You have already laid out all the solutions.