How to structure imports in a large python project

I have read a lot of 'how-to' articles on Python imports (and related SO questions), however I'm struggling to figure out what the 'best practice' is for managing imports in a large Python project. For example, say I have a project structure like the below (this is an oversimplification):

test/                     
    packA/                 
        subA/             
            __init__.py
            sa1.py
            sa2.py
        __init__.py
        a1.py
        a2.py
    packB/                 
        b1.py
        b2.py
    main.py

and say inside packA/subA/sa1.py I want to import code from packB/b1.py. More generally I want to be able to freely import between packages/subpackages inside the project.

Based on my current understanding, there are four ways to go about doing this:

Option 1

Add my project root to the PYTHONPATH permanently and use absolute imports everywhere in the project. So inside packA/subA/sa1.py I would have

from packB import b1

This could get a bit messy as the project tree gets larger e.g.

from packB.subC.subD.subE import f1
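For reference, a rough sketch of what Option 1 looks like on Linux/macOS (the project path here is an example, not my real one):

```shell
# Add the project root to PYTHONPATH for the current shell session
# (put this in ~/.bashrc or similar to make it permanent).
# "$HOME/projects/test" is a placeholder path -- substitute your own root.
export PYTHONPATH="$PYTHONPATH:$HOME/projects/test"

# Entries on PYTHONPATH are added to sys.path, so Python now sees the root:
python3 -c "import sys; assert '$HOME/projects/test' in sys.path"
```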

Option 2

Same as above, but instead of modifying PYTHONPATH to include the project root, I simply insist on only executing python from the project root (so that the root is always the working directory).
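To make Option 2 concrete, here's a sketch (the module contents are invented for illustration) that rebuilds the example tree and runs a module from the project root with `python -m`, which puts the current directory on sys.path:

```shell
# Recreate a minimal version of the tree from the question
# (file contents are hypothetical).
mkdir -p test/packA/subA test/packB
touch test/packA/__init__.py test/packA/subA/__init__.py
echo "GREETING = 'hello from b1'" > test/packB/b1.py
printf 'from packB import b1\nprint(b1.GREETING)\n' > test/packA/subA/sa1.py

# Run from the project root; -m adds the current directory to sys.path,
# so the absolute import "from packB import b1" resolves.
cd test
python3 -m packA.subA.sa1   # prints: hello from b1
```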

Option 3

Use relative imports

from ...packB import b1

I don't like this as it's not easy to read, and most of what I've read generally says relative imports are a bad idea.

Option 4

Use setuptools/setup.py script and install my packages using pip so I can import everywhere.

This seems like overkill since all of the code is already in the project folder (and would have to re-install each time a package changes) and could also cause headaches with dependency/version management.
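For completeness, the setup.py I'd be writing for Option 4 would look something like this (the name and version are placeholders):

```python
# A minimal, hypothetical setup.py for the project layout above.
from setuptools import setup, find_packages

setup(
    name="test-project",    # placeholder project name
    version="0.1.0",        # placeholder version
    packages=find_packages(),  # discovers packA, packA.subA, etc.
)
```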


So my question is, which of the above (if any) is considered best practice? I'm undecided between 1 and 2 at the moment, but would be great to hear of a more elegant/Pythonic approach.

Note: I am using Python 3.6

asked May 12 '20 by user3050446
1 Answer

Redesign

The first thing you can do is redesign your package. Seriously, straight from PEP 20, the Zen of Python:

Flat is better than nested.

So, you should try to reduce the number of nested folders in your package. For example, from mypackage.module1.module2.module3 import foo should not exist. The only time you'll see modules nested that deeply is in very large and mature packages like django or tensorflow. Even then, you'll notice their APIs are still super short (e.g. from django.test import TestCase), even if their internal modules are complex and nested. Generally, long and nested imports are a sign of bad package design in Python.

Also, if package A and package B rely on each other (co-dependent), they should really be under the same package, and you need to rethink your design choices.


My Opinion

I always like to follow the standards from the flask or keras libraries. The keras API in particular has always been intuitive to me, and I often go to that repo to inspire my own coding practices. Personally, I try to use relative imports where possible, since most of the stuff I build is small (either no nesting, or one layer of nesting). However, I know many larger projects opt for absolute imports since they have more nesting layers.

One thing I've adopted from the keras library is the balance between specificity and availability of imports. To import a Conv2D layer, the import is not as general as:

from keras import Conv2D

but it's also not as specific as:

from keras.layers.convolutional import Conv2D

There's a nice in-between with:

from keras.layers import Conv2D

However, if you take a closer look at keras, you'll notice that the Conv2D class is still contained in its own file, convolutional.py. They reduce the specificity of the import by adding the following line to the __init__.py of the layers module:

from .convolutional import Conv2D

This allows you to retain package/module structure for development, but keep the client api simple and intuitive.
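As a self-contained sketch of that pattern (the "shapes" package below is invented for illustration), re-exporting from __init__.py flattens the public import path while the class stays in its own deeply nested file:

```python
# Build a tiny throwaway package on disk to demonstrate the re-export
# pattern; "shapes", "geometry", and "Circle" are hypothetical names.
import os
import sys
import tempfile

root = tempfile.mkdtemp()
geo = os.path.join(root, "shapes", "geometry")
os.makedirs(geo)

# shapes/geometry/circle.py -- the class lives deep in the tree
with open(os.path.join(geo, "circle.py"), "w") as f:
    f.write("class Circle:\n    def area(self, r):\n        return 3.14159 * r * r\n")

# shapes/geometry/__init__.py -- re-export to flatten the public API
with open(os.path.join(geo, "__init__.py"), "w") as f:
    f.write("from .circle import Circle\n")

# shapes/__init__.py -- empty package marker
with open(os.path.join(root, "shapes", "__init__.py"), "w") as f:
    f.write("")

sys.path.insert(0, root)

# Clients write the shorter import, mirroring `from keras.layers import Conv2D`:
from shapes.geometry import Circle
print(round(Circle().area(2), 2))  # -> 12.57
```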


Use Pip

DEFINITELY DO NOT go for option 1. The purpose of pip packages is that you don't need to add random paths to your PYTHONPATH. That's not scalable, and it means the code won't work on other machines or as a standalone package without editing the PYTHONPATH. It's better to have a setup.py in each package and install things via pip.

It's super annoying to keep having to pip uninstall . and pip install . every time you make a change to your package, which is why pip has the -e flag:

pip install -e .

The -e flag performs an editable install, which installs the package such that any changes you make to the code take effect immediately, so you don't have to keep uninstalling and reinstalling after every change.

answered Sep 28 '22 by Jay Mody