
How do I structure my Python project to allow named modules to be imported from sub directories

This is my directory structure:

Projects
    + Project_1
    + Project_2
    - Project_3
        - Lib1
            __init__.py # empty
            moduleA.py
        - Tests
            __init__.py # empty
            foo_tests.py
            bar_tests.py
            setpath.py
        __init__.py     # empty
        foo.py
        bar.py

Goals:

  1. Have an organized project structure
  2. Be able to independently run each .py file when necessary
  3. Be able to reference/import both sibling and cousin modules
  4. Keep all import/from statements at the beginning of each file.

I achieved #1 by using the above structure.

I've mostly achieved 2, 3, and 4 by doing the following (as recommended by this excellent guide)

In any package that needs to access parent or cousin modules (such as the Tests directory above) I include a file called setpath.py which has the following code:

import os
import sys
sys.path.insert(0, os.path.abspath('..'))

sys.path.insert(0, os.path.abspath('.'))
sys.path.insert(0, os.path.abspath('...'))

Then, in each module that needs parent/cousin access, such as foo_tests.py, I can write a nice clean list of imports like so:

import setpath      # Annoyingly, PyCharm warns me that this is an unused import statement
import foo

Inside setpath.py, the second and third inserts are not strictly necessary for this example, but are included as a troubleshooting step.

My problem is that this only works for imports that reference the module name directly, and not for imports that reference the package. For example, inside bar_tests.py, neither of the two statements below works when bar_tests.py is run directly.

import setpath

import Project_3.foo  # Error
from Project_3 import foo  # Error

I receive the error "ImportError: No module named 'Project_3'".

What is odd is that I can run the file directly from within PyCharm and it works fine. I know that PyCharm is doing some behind-the-scenes magic with the Python path to make everything work, but I can't figure out what it is. As PyCharm simply runs python.exe and sets some environment variables, it should be possible to clone this behavior from within a Python script itself.

For reasons not really germane to this question, I have to reference bar using the Project_3 qualifier.

I'm open to any solution that accomplishes the above while still meeting my earlier goals. I'm also open to an alternate directory structure if there is one that works better. I've read the Python documentation on imports and packages but am still at a loss. I think one possible avenue might be manually setting the __path__ variable, but I'm not sure which one needs to be changed or what to set it to.

BrianHVB asked Mar 02 '16 04:03


2 Answers

Questions like this qualify as "primarily opinion based", so let me share how I would do it.

First, "be able to independently run each .py file when necessary": either the file is a module, in which case it should not be run directly, or it is a standalone executable, in which case it should import its dependencies starting from the top level. (You can keep that out of the code, or rather move it to a common place, by using setup.py entry_points, but then your former executable effectively becomes a module.) And yes, this is one of the weak points of Python's module model, and it causes misunderstandings.

Second, use virtualenv (or venv in Python 3) and put each Project_x into a separate environment. That way the project's name won't be part of the Python module path.

Third, the link you've provided mentions setup.py, and you can make use of it. Put your custom code into Project_x/src/mylib1, create src/mylib1/setup.py, and put your modules into src/mylib1/mylib1/module.py. Then you can install your code with pip like any other package (or with pip install -e, so you can work on the code directly without reinstalling it, though that unfortunately has some limitations).

And finally, as you've already confirmed in a comment ;), the problem with your current model was that in sys.path.insert(0, os.path.abspath('...')) you mistakenly used Python's module notation, which is incorrect for file system paths and should be replaced with '../..' to work as expected.
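
To make the difference concrete, here is a small sketch of how the two strings are normalized. The directory /Projects/Project_3/Tests is a made-up stand-in for wherever your Tests folder lives, and posixpath is used so the result is the same on every platform; os.path.abspath additionally joins the string onto the current working directory first. The last two lines are only a suggestion of mine, not something from the guide: deriving the folder from the file's own location avoids depending on the working directory at all.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical location of the Tests folder, for illustration only.
tests_dir = "/Projects/Project_3/Tests"

# '...' is not a path operator; it is treated as an ordinary directory name.
three_dots = posixpath.normpath(posixpath.join(tests_dir, "..."))

# '../..' really does climb two levels.
two_up = posixpath.normpath(posixpath.join(tests_dir, "../.."))

print(three_dots)  # /Projects/Project_3/Tests/...
print(two_up)      # /Projects

# Alternative that ignores the working directory: walk up from the file itself.
test_file = PurePosixPath(tests_dir) / "foo_tests.py"
projects_dir = test_file.parents[2]
print(projects_dir)  # /Projects
```

In a real setpath.py the equivalent of the last step would be something like Path(__file__).resolve().parents[2], under the assumption that the file sits two levels below the Projects folder.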

RobertT answered Sep 28 '22 09:09


I think your goals are not reasonable. Specifically, goal number 2 is a problem:

  2. Be able to independently run each .py file when necessary

This doesn't work well for modules in a package. At least, not if you're running the .py files naively (e.g. with python foo_tests.py on the command line). When you run the files that way, Python can't tell where the package hierarchy should start.

There are two alternatives that can work. The first option is to run your scripts from the top-level folder (e.g. Projects) using the -m flag to the interpreter to give it a dotted path to the main module, and to use explicit relative imports to get the sibling and cousin modules. So rather than running python foo_tests.py directly, run python -m Project_3.Tests.foo_tests from the Projects folder (or perhaps python -m Tests.foo_tests from within Project_3), and have foo_tests.py use from .. import foo.

The other (less good) option is to add a top-level folder to your Python installation's module search path on a system-wide basis (e.g. add the Projects folder to the PYTHONPATH environment variable), and then use absolute imports for all your modules (e.g. import Project_3.foo). This is effectively what your setpath module does, but doing it system-wide as part of your system's configuration, rather than at run time, is much cleaner. It also avoids the multiple names that setpath allows you to use to import a module (e.g. try import foo_tests and import Tests.foo_tests and you'll get two separate copies of the same module).
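
As a rough illustration of the first option, the sketch below rebuilds a stripped-down copy of the Project_3 layout in a temporary folder and runs foo_tests both ways. The contents of foo.py (a greet() function) are invented for the demo; only the directory and file names come from the question.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def build_tree(root: Path) -> None:
    """Create a minimal copy of the Project_3 layout under root."""
    pkg = root / "Project_3"
    tests = pkg / "Tests"
    tests.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    (tests / "__init__.py").write_text("")
    # Stand-in for the real foo.py; greet() is a made-up function.
    (pkg / "foo.py").write_text("def greet():\n    return 'hello from foo'\n")
    # The test module reaches its cousin through an explicit relative import.
    (tests / "foo_tests.py").write_text(
        "from .. import foo\n"
        "print(foo.greet())\n"
    )


def run(args, cwd):
    """Run the current interpreter with args and return the finished process."""
    return subprocess.run(
        [sys.executable, *args], cwd=cwd, capture_output=True, text=True
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_tree(root)
    # Works: with -m, Python knows the package hierarchy starts at root.
    as_module = run(["-m", "Project_3.Tests.foo_tests"], cwd=root)
    # Fails: run as a plain script, the file has no parent package.
    as_script = run(["foo_tests.py"], cwd=root / "Project_3" / "Tests")

print(as_module.stdout.strip())   # hello from foo
print(as_script.returncode != 0)  # True
```

The second run fails because a relative import needs to know which package the file belongs to, and a file started by name has none.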
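
Here is a sketch of both points in this paragraph, done in-process so it is easy to try. Inserting a folder into sys.path stands in for listing it in the environment variable before Python starts, and the one-line foo.py is made up for the demo. The second insert mimics what setpath does when it is run from the Tests folder, which is my reading of the question rather than something I have checked against your machine.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway Project_3 package with a single module.
tmp = tempfile.TemporaryDirectory()
root = Path(tmp.name)
pkg = root / "Project_3"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "foo.py").write_text("VALUE = []\n")  # made-up module contents

# Top-level folder on the search path: the package-qualified import works.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
qualified = importlib.import_module("Project_3.foo")

# Package folder itself on the search path, as setpath effectively does:
# the same file is now also importable under the bare name "foo".
sys.path.insert(0, str(pkg))
importlib.invalidate_caches()
bare = importlib.import_module("foo")

same_file = Path(qualified.__file__).resolve() == Path(bare.__file__).resolve()
same_module = qualified is bare
print(same_file, same_module)  # True False

# Undo the search path changes and remove the temporary files.
sys.path.remove(str(pkg))
sys.path.remove(str(root))
tmp.cleanup()
```

Because the two names produce two module objects, any module-level state (VALUE here) exists twice, which is the kind of surprise a single top-level entry on the search path avoids.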

Blckknght answered Sep 28 '22 10:09