Best practices for top level __init__.py imports

When a package becomes large, it can be hard to remember where things are, and the dot-paths to the objects we want can be cumbersome. One way authors seem to address this is to re-export references to the "best of" objects at the top level, even though their code may actually live several package levels below.

This allows one to say:

from pandas import wide_to_long

instead of

from pandas.core.reshape.melt import wide_to_long

But what are the ins and outs of doing this, and the best practices around the method? Doesn't loading the top __init__.py with many imports (in order to make them available at the top level) mean that importing even a single object from the package suddenly takes much more memory than needed - since everything mentioned in the __init__.py is automatically loaded?
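To make the pattern concrete, here is a minimal sketch of the re-export mechanism, using a throwaway package built in a temp directory. The package name `mypkg` and its contents are made up for illustration; pandas does the real thing in its own __init__.py.

```python
# Sketch of the top-level re-export pattern (mypkg is illustrative).
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "mypkg"
(pkg / "core").mkdir(parents=True)
(pkg / "core" / "__init__.py").write_text("")
(pkg / "core" / "reshape.py").write_text(
    "def wide_to_long(df):\n"
    "    return df  # placeholder body\n"
)
# The top-level __init__.py lifts the deeply nested function to the root.
(pkg / "__init__.py").write_text(
    "from mypkg.core.reshape import wide_to_long\n"
)

sys.path.insert(0, str(root))
from mypkg import wide_to_long                       # short path
from mypkg.core.reshape import wide_to_long as deep  # long path
print(wide_to_long is deep)  # True: both names refer to the same object
```

Both import paths resolve to the same function object; the top-level name is just a reference, not a copy.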

Yet, packages do it. See, for example, what can be imported from top level numpy or pandas below (code to run your own diagnosis can be found in this gist).

$ python print_top_level_diagnosis.py numpy
--------- numpy ---------
599 objects can be imported from top level numpy:
  19 modules
  300 functions
  104 types

depth   count
0   162
1   406
2   2
3   29
4   1

$ python print_top_level_diagnosis.py pandas
--------- pandas ---------
115 objects can be imported from top level pandas:
  12 modules
  55 functions
  40 types

depth   count
0   12
3   37
4   65
5   1
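The counts above come from a script along the lines of the following minimal sketch. This is not the linked gist; it uses the stdlib `json` module as a stand-in so it runs anywhere, but any installed package name (e.g. `numpy`) can be swapped in.

```python
# Minimal top-level diagnosis: count what a package exposes at its root.
import inspect
import json as pkg  # stand-in; swap in any installed package

names = [n for n in dir(pkg) if not n.startswith("_")]
objs = [getattr(pkg, n) for n in names]
print(f"--------- {pkg.__name__} ---------")
print(f"{len(objs)} objects can be imported from top level {pkg.__name__}:")
print(f"  {sum(inspect.ismodule(o) for o in objs)} modules")
print(f"  {sum(inspect.isroutine(o) for o in objs)} functions")
print(f"  {sum(inspect.isclass(o) for o in objs)} types")
```

The depth tables in the output additionally group objects by how many dots deep their defining module sits, which the linked gist computes from each object's `__module__` attribute.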

asked Dec 19 '25 by thorwhalen
1 Answer

This is definitely a more philosophical question, since there's no hard and fast ruleset in use, so take my opinion with a grain of salt.

I like the phrase "API encourages behavior". Part of the behavior you want to encourage is the primary use case, the main ins and outs of your API. So for something like pandas, DataFrame and Series are naturally exposed in the root __init__.py file. For larger packages like scipy, __init__.py files are used in sub-packages because the use case for the package is too broad to expose everything at the top.

Keep in mind that not everything needs to be exposed in an __init__.py file. Keeping your more power-user features out of an __init__.py file both protects new users from tumbling down a rabbit hole and communicates that there's something special going on when you see an import from deep within a package.

For example:

# Anyone will immediately associate this import with a standard use case
from pandas import DataFrame

# Most people, even seasoned pandas users, will take a second to question why this import exists
from pandas.core.internals.ops import operate_blockwise

Lastly, using __init__.py files is helpful for keeping your code organized and protecting users (including internal users!) from restructuring. It doesn't matter where DataFrame is actually defined, because the established usage promises that DataFrame will be a top-level pandas import. This can be really handy for growing projects, but just like any interface, you need to be diligent about honoring it, even if it's just the location of an import.
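On the import-cost concern raised in the question: PEP 562 module-level `__getattr__` lets a package keep the convenient top-level names without importing everything eagerly. The sketch below builds a throwaway package in a temp directory (`lazypkg` and its contents are made up for illustration); the heavy submodule is only imported when the name is first accessed.

```python
# Sketch of PEP 562 lazy re-exports: top-level convenience without
# eager import cost. All names here are illustrative.
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "lazypkg"
pkg.mkdir()
(pkg / "heavy.py").write_text(
    "def expensive_function():\n"
    "    return 'result'\n"
)
# Module-level __getattr__ (PEP 562) defers the submodule import until
# someone actually asks for the name.
(pkg / "__init__.py").write_text(
    "import importlib\n"
    "def __getattr__(name):\n"
    "    if name == 'expensive_function':\n"
    "        mod = importlib.import_module('lazypkg.heavy')\n"
    "        return mod.expensive_function\n"
    "    raise AttributeError(name)\n"
)

sys.path.insert(0, str(root))
import lazypkg

print('lazypkg.heavy' in sys.modules)  # False: nothing loaded yet
print(lazypkg.expensive_function())    # first access triggers the import
print('lazypkg.heavy' in sys.modules)  # True: now it's loaded
```

This is the approach some large packages have adopted for exactly this reason, though it trades a bit of introspectability (tools that scan `__init__.py` won't see the names statically) for faster startup.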

In summary:

  • Encourage behavior by exposing the main parts of your API
  • Protect novices from more advanced features
  • Organize your code
answered Dec 21 '25 by Thomas Satterly

