Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Avoiding module namespace pollution in Python

TL;DR: What's the cleanest way to keep implementation details out of a module's namespace?

There are a number of similar questions on this topic already, but none seems to have a satisfactory answer relative to modern tools and language features.

I'm designing a Python package, and I'd like to keep each module's public interface clean, exposing only what's intended, keeping implementation details (especially imports) hidden.

Over the years, I've seen a number of techniques:

Don't worry about it. Just document how to use your package and let consumers of it just ignore the implementation details.

This is just horrible, in my opinion. A well-designed interface should be easily discoverable. Having the implementation details publicly visible makes the interface much more confusing. Even as the author of a package, I don't want to use it when it exposes too much, as it makes autocompletion less useful.

Add an underscore to the beginning of all implementation details.

This is a well-understood convention, and most development tools are smart enough to at least sort underscore-prefixed names to the bottom of autocomplete lists. It works fine if you have a small number of names to treat this way, but as the number of names grows, it becomes more and more tedious and ugly.

Take for example this relatively simple list of imports:

import struct

from abc    import abstractmethod, ABC
from enum   import Enum
from typing import BinaryIO, Dict, Iterator, List, Optional, Type, Union

Applying the underscore technique, this relatively small list of imports becomes this monstrosity:

import struct as _struct

from abc    import abstractmethod as _abstractmethod, ABC as _ABC
from enum   import Enum as _Enum
from typing import (
    BinaryIO as _BinaryIO,
    Dict     as _Dict,
    Iterator as _Iterator,
    List     as _List,
    Optional as _Optional,
    Type     as _Type,
    Union    as _Union
)

Now, I know this problem can be partially mitigated by never doing from imports, and just importing the entire package, and package-qualifying everything. While that does help this situation, and I realize that some people prefer to do this anyway, it doesn't eliminate the problem, and it's not my preference. There are some packages I prefer to import directly, but I usually prefer to import type names and decorators explicitly so that I can use them unqualified.

There's an additional small problem with the underscore prefix. Take the following publicly exposed class:

class Widget(_ABC):
    @_abstractmethod
    def implement_me(self, input: _List[int]) -> _Dict[str, object]:
        ...

A consumer of this package implementing his own Widget implementation will see that he needs to implement the implement_me method, and it needs to take a _List and return a _Dict. Those aren't actual type names, and now the implementation-hiding mechanism has leaked into my public interface. It's not a big problem, but it does contribute to the ugliness of this solution.

Hide the implementation details inside a function.

This one's definitely hacky, and it doesn't play well with most development tools.

Here's an example:

def module():
    import struct

    from abc    import abstractmethod, ABC
    from typing import BinaryIO, Dict, List

    def fill_list(r: BinaryIO, count: int, lst: List[int]) -> None:
        while count > 16:
            lst.extend(struct.unpack("<16i", r.read(16 * 4)))
            count -= 16
        while count > 4:
            lst.extend(struct.unpack("<4i", r.read(4 * 4)))
            count -= 4
        for _ in range(count):
            lst.append(struct.unpack("<i", r.read(4))[0])

    def parse_ints(r: BinaryIO) -> List[int]:
        count = struct.unpack("<i", r.read(4))[0]
        rtn: List[int] = []
        fill_list(r, count, rtn)
        return rtn

    class Widget(ABC):
        @abstractmethod
        def implement_me(self, input: List[int]) -> Dict[str, object]:
            ...

    return (parse_ints, Widget)

parse_ints, Widget = module()
del module

This works, but it's super hacky, and I don't expect it to operate cleanly in all development environments. ptpython, for example, fails to provide method signature information for the parse_ints function. Also, the type of Widget becomes my_package.module.<locals>.Widget instead of my_package.Widget, which is weird and confusing to consumers.

Use __all__.

This is a commonly given solution to this problem: list the "public" members in the global __all__ variable:

import struct

from abc    import abstractmethod, ABC
from typing import BinaryIO, Dict, List

__all__ = ["parse_ints", "Widget"]

def fill_list(r: BinaryIO, count: int, lst: List[int]) -> None:
    ...  # You've seen this.

def parse_ints(r: BinaryIO) -> List[int]:
    ...  # This, too.

class Widget(ABC):
    ...  # And this.

This looks nice and clean, but unfortunately, the only thing __all__ affects is what happens when you use wildcard imports from my_package import *, which most people don't do, anyway.

Convert the module to a subpackage, and expose the public interface in __init__.py.

This is what I'm currently doing, and it's pretty clean for most cases, but it can get ugly if I'm exposing multiple modules instead of flattening everything:

my_package/
+--__init__.py
+--_widget.py
+--shapes/
   +--__init__.py
   +--circle/
   |  +--__init__.py
   |  +--_circle.py
   +--square/
   |  +--__init__.py
   |  +--_square.py
   +--triangle/
      +--__init__.py
      +--_triangle.py

Then my __init__.py files look kind of like this:

# my_package.__init__.py

from my_package._widget.py import parse_ints, Widget
# my_package.shapes.circle.__init__.py

from my_package.shapes.circle._circle.py import Circle, Sphere
# my_package.shapes.square.__init__.py

from my_package.shapes.square._square.py import Square, Cube
# my_package.shapes.triangle.__init__.py

from my_package.shapes.triangle._triangle.py import Triangle, Pyramid

This makes my interface clean, and works well with development tools, but it makes my directory structure pretty messy if my package isn't completely flat.

Is there a better technique?

like image 650
P Daddy Avatar asked Aug 30 '19 14:08

P Daddy


People also ask

Which features in Python helps to avoid namespace collision?

prefix: This has the benefit of avoiding long import statements and the prefix helps avoid namespace collisions.

What is a module namespace in Python?

In Python-speak, modules are a namespace—a place where names are created. And names that live in a module are called its attributes. Technically, modules correspond to files, and Python creates a module object to contain all the names defined in the file; but in simple terms, modules are just namespaces.

What is namespace pollution?

Namespace pollution is a lot like pollution in general. It means that something is misplaced. In programming that means that code that should really live in separate namespaces is added to a common namespace (in some cases the global namespace).

What is a module namespace How is it useful?

Namespaces help us uniquely identify all the names inside a program. However, this doesn't imply that we can use a variable name anywhere we want. A name also has a scope that defines the parts of the program where you could use that name without using any prefix.


1 Answers

Convert to subpackages to limit the number of classes in a place and to separate concerns. If a class or constant is not needed outside of its module, prefix it with a double underscore. Import the module name if you do not want to explicitly import many classes from it. You have laid out all the solutions.

like image 189
spacether Avatar answered Oct 19 '22 00:10

spacether