Best practices for top level __init__.py imports

When a package becomes large, it can be hard to remember where things are, and the dot-paths to the objects we want can be cumbersome. One way authors seem to address this is to re-export references to the "best of" objects at the top level, even though their code may actually live several package levels below.

This allows one to say:

from pandas import wide_to_long

instead of

from pandas.core.reshape.melt import wide_to_long

But what are the ins and outs of doing this, and the best practices around the method? Doesn't loading the top __init__.py with many imports (in order to make them available at the top level) mean that importing even a single object from the package suddenly takes much more memory than needed - since everything mentioned in the __init__.py is automatically loaded?
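To make the pattern concrete, here is a minimal sketch of the re-export mechanism, using a throwaway package built in a temp directory. The package name `mypkg` and its contents are made up for illustration; pandas does the real thing in its own __init__.py.

```python
# Sketch of the top-level re-export pattern (mypkg is illustrative).
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "mypkg"
(pkg / "core").mkdir(parents=True)
(pkg / "core" / "__init__.py").write_text("")
(pkg / "core" / "reshape.py").write_text(
    "def wide_to_long(df):\n"
    "    return df  # placeholder body\n"
)
# The top-level __init__.py lifts the deeply nested function to the root.
(pkg / "__init__.py").write_text(
    "from mypkg.core.reshape import wide_to_long\n"
)

sys.path.insert(0, str(root))
from mypkg import wide_to_long                       # short path
from mypkg.core.reshape import wide_to_long as deep  # long path
print(wide_to_long is deep)  # True: both names refer to the same object
```

Both import paths resolve to the same function object; the top-level name is just a reference, not a copy.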

Yet, packages do it. See, for example, what can be imported from top level numpy or pandas below (code to run your own diagnosis can be found in this gist).

$ python print_top_level_diagnosis.py numpy
--------- numpy ---------
599 objects can be imported from top level numpy:
  19 modules
  300 functions
  104 types

depth   count
0   162
1   406
2   2
3   29
4   1

$ python print_top_level_diagnosis.py pandas
--------- pandas ---------
115 objects can be imported from top level pandas:
  12 modules
  55 functions
  40 types

depth   count
0   12
3   37
4   65
5   1
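The counts above come from a script along the lines of the following minimal sketch. This is not the linked gist; it uses the stdlib `json` module as a stand-in so it runs anywhere, but any installed package name (e.g. `numpy`) can be swapped in.

```python
# Minimal top-level diagnosis: count what a package exposes at its root.
import inspect
import json as pkg  # stand-in; swap in any installed package

names = [n for n in dir(pkg) if not n.startswith("_")]
objs = [getattr(pkg, n) for n in names]
print(f"--------- {pkg.__name__} ---------")
print(f"{len(objs)} objects can be imported from top level {pkg.__name__}:")
print(f"  {sum(inspect.ismodule(o) for o in objs)} modules")
print(f"  {sum(inspect.isroutine(o) for o in objs)} functions")
print(f"  {sum(inspect.isclass(o) for o in objs)} types")
```

The depth tables in the output additionally group objects by how many dots deep their defining module sits, which the linked gist computes from each object's `__module__` attribute.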

asked Dec 19 '25 by thorwhalen
1 Answer

This is definitely a more philosophical question, since there's no hard and fast ruleset in use, so take my opinion with a grain of salt.

I like the phrase "API encourages behavior". Part of the behavior you want to encourage is the primary use case, the main ins and outs of your API. So for something like pandas, DataFrame and Series are naturally exposed in the root __init__.py file. For larger packages like scipy, __init__.py files are used in sub-packages because the use case for the package is too broad to expose everything at the top.

Keep in mind that not everything needs to be exposed in an __init__.py file. Keeping your more power-user features out of an __init__.py file both protects new users from tumbling down a rabbit hole and communicates that there's something special going on when you see an import from deep within a package.

For example:

# Anyone will immediately associate this import with a standard use case
from pandas import DataFrame

# Most people, even seasoned pandas users, will take a second to question why this import exists
from pandas.core.internals.ops import operate_blockwise

Lastly, using __init__.py files is helpful for keeping your code organized and protecting users (including internal users!) from restructuring. It doesn't matter where DataFrame is actually defined, because the established usage promises that DataFrame will be a top-level pandas import. This can be really handy for growing projects, but just like any interface, you need to be diligent about honoring it, even if it's just the location of an import.
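On the import-cost concern raised in the question: PEP 562 module-level `__getattr__` lets a package keep the convenient top-level names without importing everything eagerly. The sketch below builds a throwaway package in a temp directory (`lazypkg` and its contents are made up for illustration); the heavy submodule is only imported when the name is first accessed.

```python
# Sketch of PEP 562 lazy re-exports: top-level convenience without
# eager import cost. All names here are illustrative.
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "lazypkg"
pkg.mkdir()
(pkg / "heavy.py").write_text(
    "def expensive_function():\n"
    "    return 'result'\n"
)
# Module-level __getattr__ (PEP 562) defers the submodule import until
# someone actually asks for the name.
(pkg / "__init__.py").write_text(
    "import importlib\n"
    "def __getattr__(name):\n"
    "    if name == 'expensive_function':\n"
    "        mod = importlib.import_module('lazypkg.heavy')\n"
    "        return mod.expensive_function\n"
    "    raise AttributeError(name)\n"
)

sys.path.insert(0, str(root))
import lazypkg

print('lazypkg.heavy' in sys.modules)  # False: nothing loaded yet
print(lazypkg.expensive_function())    # first access triggers the import
print('lazypkg.heavy' in sys.modules)  # True: now it's loaded
```

This is the approach some large packages have adopted for exactly this reason, though it trades a bit of introspectability (tools that scan `__init__.py` won't see the names statically) for faster startup.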

In summary:

  • Encourage behavior by exposing the main parts of your API
  • Protect novices from more advanced features
  • Organize your code
answered Dec 21 '25 by Thomas Satterly

