Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Check if package is imported from within the source tree

Users should install our python package via pip or it can be cloned from a github repo and installed from source. Users should not be running import Foo from within the source tree directory for a number of reasons, e.g. C extensions are missing (numpy has the same issue: read here). So, we want to check if the user is running import Foo from within the source tree, but how to do this cleanly, efficiently, and robustly with support for Python 3 and 2?

Edit: Note the source tree here is defined as where the code is downloaded too (e.g. via git or from the source archive) and it contrasts with the installation directory where the code is installed too.

We considered the following:

  • Check for setup.py, or other file like PKG-INFO, which should only be present in the source. It’s not that elegant and checking for the presence of a file is not very cheap, given this check will happen every time someone import Foo. Also there is nothing to stop someone from putting a setup.py outside to the source tree in their lib/python3.X/site-packages/ directory or similar.
  • Parsing the contents of setup.py for the package name, but it also adds overhead and is not that clean to parse.
  • Create a dummy flag file that is only present in the source tree.
  • Some clever, but likely overcomplicated and error-prone, ideas like modifying Foo/__init__.py during installation to note that we are now outside of the source tree.
like image 516
Chris_Rands Avatar asked Apr 29 '19 09:04

Chris_Rands


People also ask

How do you check if a package is already installed in Python?

The Pip, Pipenv, Anaconda Navigator, and Conda Package Managers can all be used to list installed Python packages. You can also use the ActiveState Platform's command line interface (CLI), the State Tool to list all installed packages using a simple “state packages” command.

How do I list installed packages in pip?

If you want to list all the Python packages installed in an environment, pip list command is what you are looking for. The command will return all the packages installed, along with their specific version and location. If a package is installed from a remote host (for example PyPI or Nexus) the location will be empty.

Where are Python packages installed?

When a package is installed globally, it's made available to all users that log into the system. Typically, that means Python and all packages will get installed to a directory under /usr/local/bin/ for a Unix-based system, or \Program Files\ for Windows.


1 Answers

Since you mention numpy in your comments and wanting to do it like they do but not fully understanding it, I figured I would break that down and see if you could implement a similar process.


__init__.py

The error you are seeking starts here which is what you linked to in your comments and answers so you already know that. It's just attempting an import of __config__.py and failing if it isn't there or can't be imported.

    try:
        from numpy.__config__ import show as show_config
    except ImportError:
        msg = """Error importing numpy: you should not try to import numpy from
        its source directory; please exit the numpy source tree, and relaunch
        your python interpreter from there."""
        raise ImportError(msg)

So where does the __config__.py file come from then and how does that help? Let's follow below...

setup.py

When the package is installed, setup is called to run and it in turn does some configuration actions. This is essentially what ensures that the package is properly installed rather than being run from the download directory (which I think is what you are wanting to ensure).

The key here is this line:

config.make_config_py() # installs __config__.py

misc_util.py

That is imported from distutils/misc_util.py which we can follow all the way down to here.

    def make_config_py(self,name='__config__'):
        """Generate package __config__.py file containing system_info
        information used during building the package.
        This file is installed to the
        package installation directory.
        """
        self.py_modules.append((self.name, name, generate_config_py))

Which is then running here which is writing in that __config__.py file with some system information and your show() function.


Summary
The import of __config__.py is attempted and fails which generates the error you are wanting to raise if setup.py wasn't run, which is what triggers that file to be properly created. That ensures not only that a file check is being done but that the file only exists in the installation directory. It is still some overhead of importing an additional file on every import but no matter what you do you're adding some amount of overhead making this check in the first place.


Suggestions

I think that you could implement a much lighter weight version of what numpy is doing while accomplishing the same thing.

Remove the distutils subfunction and create the checked file within your setup.py file as part of the standard install. It would only exist in the installed directory after installation and never elsewhere unless a user faked that (in which case they could get around just about anything you try probably).

As an alternative (without knowing your application and what your setup file is doing) possibly you have a function that is normally imported anyway that isn't key to the running of the application but is good to have available (in numpy's case the functions are information about the installation like version(). Instead of keeping those functions where you put them now, you make them part of this file that is created. Then you are at least loading something that you would otherwise load anyway from somewhere else.

Using this method you are importing something no matter what, which has some overhead, or raising the error. I think as far as methods to raise an error because they aren't working out of the installed directory, it's a pretty clean and straightforward way to do it. No matter what method you use, you have some overhead of using that method so I would focus on keeping the overhead low, simple, and not going to cause errors.

I wouldn't do something that is complicated like parsing the setup file or modifying necessary files like __init__.py somewhere. I think you are right that those methods would be more error prone.

Checking if setup.py exists could work but I would consider it less clean than attempting to import which is already optimized as a standard Python function. They accomplish similar things but I think implemented the numpy style is going to be more straight forward.

like image 103
MyNameIsCaleb Avatar answered Oct 18 '22 10:10

MyNameIsCaleb