How to properly structure internal scripts in a Python project?

Consider the following Python project skeleton:

proj/
├── foo
│   └── __init__.py
├── README.md
└── scripts
    └── run.py

In this case foo holds the main project files, for example:

# foo/__init__.py
class Foo():
    def run(self):
        print('Running...')

And scripts holds auxiliary scripts that need to import from foo; the scripts are invoked via:

[~/proj]$ python scripts/run.py

There are two ways of importing Foo, both of which fail:

  1. A relative import, from ..foo import Foo, fails with ValueError: attempted relative import beyond top-level package
  2. An absolute import, from foo import Foo, fails with ModuleNotFoundError: No module named 'foo'

My current workaround is to append the current working directory to sys.path:

import sys
sys.path.append('.')

from foo import Foo
Foo().run()

But this feels like a hack, and has to be added to every new script in scripts/.

Is there a better way to structure scripts in such projects?

asked Sep 01 '19 by Yuval Adam

2 Answers

Best practice? Put a single entry point in the root

I know this might sound absurd if you have lots of scripts you want to be able to execute... But it's actually the cleanest option, and it's the one most often used in big Python projects, like manage.py in Django. It also doesn't need to be a huge undertaking. Even more importantly, it is generally more secure to have a single entry point than several smaller ones.

proj/
├── run.py
├── foo
│   └── __init__.py
├── README.md
└── scripts
    └── my_script.py

When run.py lives in the root directory, it can be very lightweight... Basically just a wrapper that calls the function you need from scripts/my_script.py. It ties everything together, so now all of your imports just work.

Just keep in mind that your entry point is your root. The parent of the root doesn't exist (that's exactly what the ValueError in the question is complaining about). So put your entry point in the root, and import packages relative to the root, i.e. a plain import foo from within scripts, as the sketch below shows.
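A minimal sketch of that wrapper, assuming scripts/my_script.py defines a main() function (that name is my assumption, not part of the original post):

# proj/run.py
# Runs from the project root, so absolute imports of foo and scripts resolve.
# On Python 3.3+, scripts/ needs no __init__.py (namespace packages).
from scripts.my_script import main

if __name__ == "__main__":
    main()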

But how do I call multiple scripts!?

If you need to be able to call multiple scripts, that's a good argument for... well... arguments! Keep run.py as your single entry point/command, and use subcommands to dispatch to the script you care about, as in the sketch below.
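For example, a sketch using the standard library's argparse; the subcommand name and my_script.main() are assumptions:

# proj/run.py
import argparse

from scripts import my_script  # assumes scripts/my_script.py defines main()

parser = argparse.ArgumentParser(prog="run.py")
subparsers = parser.add_subparsers(dest="command", required=True)
subparsers.add_parser("my-script", help="run scripts/my_script.py")

args = parser.parse_args()
if args.command == "my-script":
    my_script.main()

which you invoke from the project root as python run.py my-script.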

Reinventing the wheel?

Generally, frameworks have already built this architecture for you, letting you add your own subcommands: Django does, for example, and so does Flask for a smaller footprint.
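For instance, if the project were a Django app (an assumption; the original project has no Django dependency), a custom manage.py subcommand would look roughly like this, with the name run_foo made up for illustration:

# foo/management/commands/run_foo.py
# (management/ and commands/ each need an empty __init__.py)
from django.core.management.base import BaseCommand

from foo import Foo

class Command(BaseCommand):
    help = "Run Foo"

    def handle(self, *args, **options):
        Foo().run()

after which python manage.py run_foo dispatches to it.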

You can easily wrap up a small project without that help, though, as I've illustrated.

Security

After a few years of working with a codebase, no one ever wishes it were less refactorable, and no one ever wishes it had less security. As we drive toward more secure systems in general, it makes sense to create a gatekeeper script that determines what is and isn't a safe operation, and by whom. Moving to an LDAP-based system and need to lock things down by group? No problem. You can change that single file, or add LDAP checks to your codebase, even building your own internal API.
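As a minimal sketch of such a gatekeeper inside run.py, with a hard-coded allow-list standing in for a real LDAP lookup (the command and user names are hypothetical):

# proj/run.py (excerpt)
import getpass
import sys

ALLOWED = {"my-script": {"alice", "bob"}}  # assumed command -> allowed users

def check_authorized(command):
    # Swap this lookup for an LDAP group query when moving to LDAP auth.
    if getpass.getuser() not in ALLOWED.get(command, set()):
        sys.exit("not authorized to run: " + command)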

With distributed scripts, security options are much less flexible and much harder to maintain, and a single vulnerability could leave you wide open to exploit.

Bonus advantage

You're adding abstraction to your script base. If you ever want to change the structure of your codebase (maybe you want scripts to have subfolders for more organization), neither you nor your users need to refactor any dependencies or change paths to longer, more verbose names. Your package is self-contained, and the only thing a user ever needs to touch is your proj/run.py entry point.

And, obviously, you don't need to play with Python paths as much!

answered Sep 24 '22 by smcjones


There are two ways you could resolve this.

(1) Turn your project into an installable package

Add a proj/setup.py file with the following contents:

import setuptools

setuptools.setup(
    name="my-project",
    version="1.0.0",
    author="You",
    author_email="[email protected]",
    description="This is my project",
    packages=["foo"],
)

Create a virtualenv:

python3 -m venv virtualenv  # this creates a directory "virtualenv" in your project
source ./virtualenv/bin/activate  # this switches you into the new environment
python setup.py develop  # this places your "foo" package in the environment
# (with a modern pip, "pip install -e ." does the same thing)

Inside the virtualenv, foo behaves as an installed package and is importable via import foo.

So you can use absolute imports in your scripts.

To make the scripts run from anywhere, without needing to activate the virtualenv, you can point a shebang line at the virtualenv's interpreter.

In scripts/run.py (the first line is important):

#!/path/to/proj/virtualenv/bin/python

from foo import Foo

Foo().run()
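One more step that the shebang needs: mark the script executable once (chmod +x scripts/run.py) and it can then be invoked directly, e.g. ./scripts/run.py, from anywhere.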

(2) Make the scripts part of the foo package

Instead of a separate subdirectory scripts, make a subpackage. In proj/foo/commands/run.py:

from .. import Foo

def main():
    Foo().run()

if __name__ == "__main__":
    main()

Then execute the script from the top-level proj/ directory with:

python -m foo.commands.run

If you combine this with (1) and install your package, you can then run python -m foo.commands.run from anywhere.
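As an optional extra (not part of the original answer), setuptools can also generate a real console command from that same main() via an entry point in setup.py; the command name foo-run here is made up:

setuptools.setup(
    # ... same arguments as above ...
    entry_points={
        "console_scripts": [
            "foo-run = foo.commands.run:main",
        ],
    },
)

After installing, foo-run on the command line calls foo.commands.run.main().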

answered Sep 22 '22 by matejcik