 

How to use sys.path_hooks for customized loading of modules?

I hope the following question is not too long, but otherwise I cannot explain my problem and what I want:

Based on what I learned from "How to use importlib to import modules from arbitrary sources?" (my question from yesterday), I have written a specific loader for a new file type (.xxx). (In fact, .xxx is an encrypted version of a .pyc, to protect the code from being stolen.)

I would like to add an import hook for the new file type "xxx" only, without affecting the other types (.py, .pyc, .pyd) in any way.

The loader, ModuleLoader, inherits from importlib.machinery.SourcelessFileLoader.

The loader is added as a hook via sys.path_hooks:

import importlib.machinery
import sys

myFinder = importlib.machinery.FileFinder
loader_details = (ModuleLoader, ['.xxx'])
sys.path_hooks.append(myFinder.path_hook(loader_details))

Note: This is activated once by calling modloader.activateLoader().

When importing a module named test (stored as test.xxx), I get:

>>> import modloader
>>> modloader.activateLoader()
>>> import test
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'test'
>>>

However, when I clear sys.path_hooks before adding the hook:

sys.path_hooks = []
sys.path.insert(0, '.') # current directory
sys.path_hooks.append(myFinder.path_hook(loader_details))

it works:

>>> modloader.activateLoader()
>>> import test
using xxx class

in xxxLoader exec_module
in xxxLoader get_code: .\test.xxx
ANALYZING ...

GENERATE CODE OBJECT ...

  2           0 LOAD_CONST               0
              3 LOAD_CONST               1 ('foo2')
              6 MAKE_FUNCTION            0
              9 STORE_NAME               0 (foo2)
             12 LOAD_CONST               2 (None)
             15 RETURN_VALUE
>>> test
<module 'test' from '.\\test.xxx'>

The module is imported correctly after the file's content has been converted to a code object.

However, I cannot load the same module from a package: import pack.test

Note: the pack directory does contain an empty __init__.py, of course.

>>> import pack.test
Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 2218, in _find_and_load_unlocked
AttributeError: 'module' object has no attribute '__path__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'pack.test'; 'pack' is not a package
>>>

Worse, I can no longer load plain *.py modules from that package either; I get the same error as above:

>>> import pack.testpy
Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 2218, in _find_and_load_unlocked
AttributeError: 'module' object has no attribute '__path__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'pack.testpy'; 'pack' is not a package
>>>

As I understand it, sys.path_hooks is traversed until its last entry has been tried. So why does the first variant (without clearing sys.path_hooks) not recognize the new extension "xxx", while the second variant (clearing sys.path_hooks) does? It looks as if the machinery raises an exception instead of moving on to the next entry when an entry of sys.path_hooks cannot recognize "xxx".

And why does the second version work for py, pyc and xxx modules in the current directory, but not in the package pack? I would expect py and pyc not to work even in the current directory, because sys.path_hooks contains only a hook for "xxx"...

asked Feb 01 '17 21:02 by MichaelW



1 Answer

The short answer is that the default PathFinder in sys.meta_path isn't meant to have new file extensions and importers added in the same paths it already supports. But there's still hope!

Quick Breakdown

sys.path_hooks is consumed by the importlib._bootstrap_external.PathFinder class.
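To see that wiring in a running interpreter, one can simply print the two lists. This is a small sketch; the exact entries differ between Python versions and environments (a test runner, for instance, may install extra finders or hooks), so only the presence of PathFinder is checked.

```python
import sys
from importlib.machinery import PathFinder

# Meta path finders are asked first; PathFinder is the one that walks sys.path.
for finder in sys.meta_path:
    print("meta_path:", finder)

# Hook factories that PathFinder tries, in order, for each sys.path entry.
for hook in sys.path_hooks:
    print("path_hook:", hook)

print("PathFinder installed:", PathFinder in sys.meta_path)
```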

When an import happens, each entry in sys.meta_path is asked to find a matching spec for the requested module. PathFinder, in particular, walks sys.path and passes each entry, in order, to the factory functions in sys.path_hooks. Each factory can either raise ImportError (basically the factory saying "nope, I don't support this path entry") or return a finder instance for that entry. The first finder returned is cached in sys.path_importer_cache under that entry, and from then on PathFinder asks only that cached finder whether it can provide a requested module from that entry.
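The accept-or-reject contract can be observed directly with a hook built by FileFinder.path_hook, the same factory the question uses. This is a minimal sketch assuming a CPython 3.x interpreter; the temporary directory and the nonexistent archive.zip path are purely illustrative.

```python
import os
import tempfile
from importlib.machinery import FileFinder, SourceFileLoader

# A hook factory of the same kind as the default entry in sys.path_hooks,
# here configured for plain .py sources only.
hook = FileFinder.path_hook((SourceFileLoader, [".py"]))

# A directory is accepted: the factory returns a finder for it.
tmpdir = tempfile.mkdtemp()
finder = hook(tmpdir)
print("accepted directory:", type(finder).__name__)

# A non-directory entry (here a path that does not exist) is rejected with
# ImportError, which tells PathFinder to try the next hook.
missing = os.path.join(tmpdir, "archive.zip")
try:
    hook(missing)
    rejected = False
except ImportError:
    rejected = True
print("rejected non-directory:", rejected)
```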

If you look at the contents of sys.path_importer_cache, you'll see that all of the directory entries from sys.path have been mapped to FileFinder instances. Non-directory entries (zip files, etc.) are mapped to other finders, and entries that no hook accepted are mapped to None.
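A sketch of that caching, assuming nothing beyond the standard library: it puts a fresh temporary directory on sys.path, imports a throwaway module from it (cache_demo_mod is a hypothetical name), and then looks the directory up in sys.path_importer_cache.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical throwaway module in a fresh directory on sys.path.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "cache_demo_mod.py"), "w") as f:
    f.write("VALUE = 42\n")
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

import cache_demo_mod  # PathFinder offers tmpdir to sys.path_hooks

# The finder chosen for tmpdir is cached under the sys.path entry itself.
cached = sys.path_importer_cache[tmpdir]
print(tmpdir, "->", type(cached).__name__)

# What every cached entry maps to (NoneType means no hook accepted it).
for entry, finder in sys.path_importer_cache.items():
    print(f"{entry!r}: {type(finder).__name__}")
```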

Thus, if you append a new factory created via FileFinder.path_hook to sys.path_hooks, your factory will only be invoked if the previous FileFinder hook didn't accept the path. This is unlikely, since FileFinder will work on any existing directory.

Alternatively, if you insert your new factory into sys.path_hooks ahead of the existing factories, the default hook will only be used if your new factory doesn't accept the path. And again, since FileFinder is so liberal in what it accepts, this leads to only your loader being used, as you've already observed.
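Both orderings can be reproduced without touching the real sys.path_hooks. The sketch below models PathFinder's hook loop with a hypothetical helper, first_accepting, and uses two FileFinder hooks as stand-ins: one for .py files (playing the default hook) and one for .xxx files. The loader attached to .xxx is never invoked here, so SourcelessFileLoader is only a placeholder.

```python
import os
import tempfile
from importlib.machinery import FileFinder, SourceFileLoader, SourcelessFileLoader


def first_accepting(hooks, entry):
    """Simplified model of PathFinder's hook loop: the first hook that
    does not raise ImportError supplies the finder for this entry."""
    for hook in hooks:
        try:
            return hook(entry)
        except ImportError:
            continue
    return None


# A directory holding a single module stored in the new format.
tmpdir = tempfile.mkdtemp()
open(os.path.join(tmpdir, "test.xxx"), "wb").close()

# Stand-ins: the default hook (here reduced to .py) and a hook for .xxx.
default_hook = FileFinder.path_hook((SourceFileLoader, [".py"]))
xxx_hook = FileFinder.path_hook((SourcelessFileLoader, [".xxx"]))

# Appended: the default hook accepts the directory first, so .xxx is never seen.
appended = first_accepting([default_hook, xxx_hook], tmpdir)
print("appended:", appended.find_spec("test"))

# Prepended: the .xxx hook wins and the default loaders are never consulted.
prepended = first_accepting([xxx_hook, default_hook], tmpdir)
print("prepended:", prepended.find_spec("test").origin)
```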

Making it Work

So you can either try to adjust that existing factory to also support your file extension and importer (which is difficult as the importers and extension string tuples are held in a closure), or do what I ended up doing, which is add a new meta path finder.
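For the first option, it may not be necessary to reach into the closure at all: a replacement hook can be built from the public loader classes and suffix lists in importlib.machinery, with .xxx appended, and inserted ahead of the stock hook. This is a sketch under assumptions, not the answer's method: XxxLoader is a hypothetical stand-in for the question's ModuleLoader (it reads .xxx as plain source instead of decrypting), and the loader table only approximates the interpreter's built-in one. Clearing sys.path_importer_cache matters because finders already created by the stock hook stay cached per path entry.

```python
import importlib
import os
import sys
import tempfile
from importlib.machinery import (
    BYTECODE_SUFFIXES,
    EXTENSION_SUFFIXES,
    SOURCE_SUFFIXES,
    ExtensionFileLoader,
    FileFinder,
    SourceFileLoader,
    SourcelessFileLoader,
)


class XxxLoader(SourceFileLoader):
    """Hypothetical stand-in for the question's ModuleLoader: it treats
    .xxx files as plain Python source instead of decrypting them."""


# Rebuild a loader table like the default one and append the new suffix.
loader_details = [
    (ExtensionFileLoader, EXTENSION_SUFFIXES),
    (SourceFileLoader, SOURCE_SUFFIXES),
    (SourcelessFileLoader, BYTECODE_SUFFIXES),
    (XxxLoader, [".xxx"]),
]
# Inserted first, so this hook (not the stock one) claims every directory.
sys.path_hooks.insert(0, FileFinder.path_hook(*loader_details))
# Finders built by the old hook are cached per sys.path entry; drop them
# so every entry is offered to the new hook on the next import.
sys.path_importer_cache.clear()
importlib.invalidate_caches()

# Demo: a package mixing a plain .py module and an .xxx module.
tmpdir = tempfile.mkdtemp()
pkg_dir = os.path.join(tmpdir, "pkg_demo")
os.mkdir(pkg_dir)
for name, text in [
    ("__init__.py", ""),
    ("plain.py", "KIND = 'py'\n"),
    ("secret.xxx", "KIND = 'xxx'\n"),
]:
    with open(os.path.join(pkg_dir, name), "w") as f:
        f.write(text)
sys.path.insert(0, tmpdir)

import pkg_demo.plain
import pkg_demo.secret

print(pkg_demo.plain.KIND, pkg_demo.secret.KIND)
```

The trade-off is that every directory on sys.path is now served by this one hook, so the table has to be kept in line with what the interpreter itself supports.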

For example, here is the meta path finder from my own project:


import sys

from importlib.abc import FileLoader
from importlib.machinery import FileFinder, PathFinder
from os import getcwd
from os.path import basename

from sibilant.module import prep_module, exec_module


SOURCE_SUFFIXES = [".lspy", ".sibilant"]


_path_importer_cache = {}
_path_hooks = []


class SibilantPathFinder(PathFinder):
    """
    An overridden PathFinder which will hunt for sibilant files in
    sys.path. Uses storage in this module to avoid conflicts with the
    original PathFinder
    """


    @classmethod
    def invalidate_caches(cls):
        for finder in _path_importer_cache.values():
            if hasattr(finder, 'invalidate_caches'):
                finder.invalidate_caches()


    @classmethod
    def _path_hooks(cls, path):
        for hook in _path_hooks:
            try:
                return hook(path)
            except ImportError:
                continue
        else:
            return None


    @classmethod
    def _path_importer_cache(cls, path):
        if path == '':
            try:
                path = getcwd()
            except FileNotFoundError:
                # Don't cache the failure as the cwd can easily change to
                # a valid directory later on.
                return None
        try:
            finder = _path_importer_cache[path]
        except KeyError:
            finder = cls._path_hooks(path)
            _path_importer_cache[path] = finder
        return finder


class SibilantSourceFileLoader(FileLoader):


    def create_module(self, spec):
        return None


    def get_source(self, fullname):
        return self.get_data(self.get_filename(fullname)).decode("utf8")


    def exec_module(self, module):
        name = module.__name__
        source = self.get_source(name)
        filename = basename(self.get_filename(name))

        prep_module(module)
        exec_module(module, source, filename=filename)


def _get_lspy_file_loader():
    return (SibilantSourceFileLoader, SOURCE_SUFFIXES)


def _get_lspy_path_hook():
    return FileFinder.path_hook(_get_lspy_file_loader())


def _install():
    done = False

    def install():
        nonlocal done
        if not done:
            _path_hooks.append(_get_lspy_path_hook())
            sys.meta_path.append(SibilantPathFinder)
            done = True

    return install


_install = _install()
_install()

SibilantPathFinder subclasses PathFinder and overrides only the methods that reference sys.path_hooks and sys.path_importer_cache, replacing them with near-identical implementations that use the module-local _path_hooks and _path_importer_cache instead.

Because SibilantPathFinder is appended to sys.meta_path, the existing PathFinder still gets the first attempt at every import. Only if it cannot find the module does SibilantPathFinder re-traverse sys.path (or the parent package's __path__) and try to find a match with one of my own file extensions.
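The same meta path idea can be sketched for the question's .xxx files without the sibilant dependencies. It is a minimal sketch, not the code above: XxxLoader and XxxPathFinder are hypothetical names, XxxLoader treats .xxx as plain source rather than decrypting it, and instead of overriding PathFinder's private _path_hooks and _path_importer_cache methods (CPython internals that could change between versions) it implements find_spec directly with its own per-directory FileFinder cache.

```python
import importlib
import importlib.abc
import os
import sys
import tempfile
from importlib.machinery import FileFinder, SourceFileLoader


class XxxLoader(SourceFileLoader):
    """Hypothetical stand-in for the question's ModuleLoader: it treats
    .xxx files as plain Python source instead of decrypting them."""


class XxxPathFinder(importlib.abc.MetaPathFinder):
    """Looks for .xxx modules along sys.path (or a package's __path__).

    Keeps a private hook and a private per-directory finder cache, so
    sys.path_hooks and sys.path_importer_cache are never touched.
    """

    _hook = staticmethod(FileFinder.path_hook((XxxLoader, [".xxx"])))
    _cache = {}

    @classmethod
    def find_spec(cls, fullname, path=None, target=None):
        for entry in (sys.path if path is None else path):
            if not isinstance(entry, str):
                continue
            entry = entry or os.getcwd()  # '' means the current directory
            finder = cls._cache.get(entry)
            if finder is None:
                try:
                    finder = cls._cache[entry] = cls._hook(entry)
                except ImportError:
                    continue  # not a directory: leave it to other finders
            spec = finder.find_spec(fullname, target)
            # Skip namespace-package portions (loader is None); the
            # default PathFinder already deals with those.
            if spec is not None and spec.loader is not None:
                return spec
        return None

    @classmethod
    def invalidate_caches(cls):
        for finder in cls._cache.values():
            finder.invalidate_caches()


# Appended, so the stock PathFinder always gets the first attempt.
sys.meta_path.append(XxxPathFinder)

# Demo: a top-level .xxx module plus a package mixing .py and .xxx.
tmpdir = tempfile.mkdtemp()
os.mkdir(os.path.join(tmpdir, "mp_pkg"))
files = {
    "xxx_top.xxx": "KIND = 'top-level xxx'\n",
    os.path.join("mp_pkg", "__init__.py"): "",
    os.path.join("mp_pkg", "plain.py"): "KIND = 'py'\n",
    os.path.join("mp_pkg", "inner.xxx"): "KIND = 'packaged xxx'\n",
}
for rel_path, text in files.items():
    with open(os.path.join(tmpdir, rel_path), "w") as f:
        f.write(text)
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

import xxx_top
import mp_pkg.inner
import mp_pkg.plain

print(xxx_top.KIND, mp_pkg.plain.KIND, mp_pkg.inner.KIND)
```

Because the finder only returns specs that have a loader, namespace-package handling stays with the default PathFinder, and ordinary .py and .pyc files in the same package keep loading as before.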

Figuring More Out

I ended up delving into the source of the _bootstrap_external module: https://github.com/python/cpython/blob/master/Lib/importlib/_bootstrap_external.py

The _install function and the PathFinder.find_spec method are the best starting points to seeing why things work the way they do.

answered Nov 14 '22 22:11 by obriencj