I hope the following question is not too long. But otherwise I cannot explain by problem and what I want:
Learned from How to use importlib to import modules from arbitrary sources? (my question of yesterday) I have written a specfic loader for a new file type (.xxx). (In fact the xxx is an encrypted version of a pyc to protect code from being stolen).
I would like just to add an import hook for the new file type "xxx" without affecting the other types (.py, .pyc, .pyd) in any way.
Now, the loader is ModuleLoader
, inheriting from mportlib.machinery.SourcelessFileLoader
.
Using sys.path_hooks
the loader shall be added as a hook:
myFinder = importlib.machinery.FileFinder
loader_details = (ModuleLoader, ['.xxx'])
sys.path_hooks.append(myFinder.path_hook(loader_details))
Note: This is activated once by calling modloader.activateLoader()
Upon loading a module named test
(which is a test.xxx
) I get:
>>> import modloader
>>> modloader.activateLoader()
>>> import test
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named 'test'
>>>
However, when I delete content of sys.path_hooks
before adding the hook:
sys.path_hooks = []
sys.path.insert(0, '.') # current directory
sys.path_hooks.append(myFinder.path_hook(loader_details))
it works:
>>> modloader.activateLoader()
>>> import test
using xxx class
in xxxLoader exec_module
in xxxLoader get_code: .\test.xxx
ANALYZING ...
GENERATE CODE OBJECT ...
2 0 LOAD_CONST 0
3 LOAD_CONST 1 ('foo2')
6 MAKE_FUNCTION 0
9 STORE_NAME 0 (foo2)
12 LOAD_CONST 2 (None)
15 RETURN_VALUE
>>>>>> test
<module 'test' from '.\\test.xxx'>
The module is imported correctly after conversion of the files content to a code object.
However I cannot load the same module from a package: import pack.test
Note: __init__.py
is of course as an empty file in pack directory.
>>> import pack.test
Traceback (most recent call last):
File "<frozen importlib._bootstrap>", line 2218, in _find_and_load_unlocked
AttributeError: 'module' object has no attribute '__path__'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named 'pack.test'; 'pack' is not a package
>>>
Not enough, I cannot load plain *.py modules from that package anymore: I get the same error as above:
>>> import pack.testpy
Traceback (most recent call last):
File "<frozen importlib._bootstrap>", line 2218, in _find_and_load_unlocked
AttributeError: 'module' object has no attribute '__path__'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named 'pack.testpy'; 'pack' is not a package
>>>
For my understanding sys.path_hooks
is traversed until the last entry is tried. So why is the first variant (without deleting sys.path_hooks
) not recognizing the new extension "xxx" and the second variant (deleting sys.path_hooks
) do?
It looks like the machinery is throwing an exception rather than traversing further to the next entry, when an entry of sys.path_hooks
is not able to recognize "xxx".
And why is the second version working for py, pyc and xxx modules in the current directory, but not working in the package pack
? I would expect that py and pyc is not even working in the current dir, because sys.path_hooks
contains only a hook for "xxx"...
__loader__ is an attribute that is set on an imported module by its loader. Accessing it should return the loader object itself. In Python versions before 3.3, __loader__ was not set by the built-in import machinery. Instead, this attribute was only available on modules that were imported using a custom loader.
So there's four different ways to import: Import the whole module using its original name: pycon import random. Import specific things from the module: pycon from random import choice, randint. Import the whole module and rename it, usually using a shorter variable name: pycon import pandas as pd.
The sys module in Python provides various functions and variables that are used to manipulate different parts of the Python runtime environment. It allows operating on the interpreter as it provides access to the variables and functions that interact strongly with the interpreter.
The short answer is that the default PathFinder in sys.meta_path
isn't meant to have new file extensions and importers added in the same paths it already supports. But there's still hope!
Quick Breakdown
sys.path_hooks
is consumed by the importlib._bootstrap_external.PathFinder
class.
When an import happens, each entry in sys.meta_path
is asked to find a matching spec for the requested module. The PathFinder in particular will then take the contents of sys.path
and pass it to the factory functions in sys.path_hooks
. Each factory function has a chance to either raise an ImportError (basically the factory saying "nope, I don't support this path entry") or return a finder instance for that path. The first successfully returned finder is then cached in sys.path_importer_cache
. From then on PathFinder will only ask those cached finder instances if they can provide the requested module.
If you look at the contents of sys.path_importer_cache
, you'll see all of the directory entries from sys.path
have been mapped to FileFinder instances. Non-directory entries (zip files, etc) will be mapped to other finders.
Thus, if you append a new factory created via FileFinder.path_hook
to sys.path_hooks
, your factory will only be invoked if the previous FileFinder hook didn't accept the path. This is unlikely, since FileFinder will work on any existing directory.
Alternatively, if you insert your new factory to sys.path_hooks ahead of the existing factories, the default hook will only be used if your new factory doesn't accept the path. And again, since FileFinder is so liberal with what it will accept, this would lead to only your loader being used, as you've already observed.
Making it Work
So you can either try to adjust that existing factory to also support your file extension and importer (which is difficult as the importers and extension string tuples are held in a closure), or do what I ended up doing, which is add a new meta path finder.
So eg. from my own project,
import sys
from importlib.abc import FileLoader
from importlib.machinery import FileFinder, PathFinder
from os import getcwd
from os.path import basename
from sibilant.module import prep_module, exec_module
SOURCE_SUFFIXES = [".lspy", ".sibilant"]
_path_importer_cache = {}
_path_hooks = []
class SibilantPathFinder(PathFinder):
"""
An overridden PathFinder which will hunt for sibilant files in
sys.path. Uses storage in this module to avoid conflicts with the
original PathFinder
"""
@classmethod
def invalidate_caches(cls):
for finder in _path_importer_cache.values():
if hasattr(finder, 'invalidate_caches'):
finder.invalidate_caches()
@classmethod
def _path_hooks(cls, path):
for hook in _path_hooks:
try:
return hook(path)
except ImportError:
continue
else:
return None
@classmethod
def _path_importer_cache(cls, path):
if path == '':
try:
path = getcwd()
except FileNotFoundError:
# Don't cache the failure as the cwd can easily change to
# a valid directory later on.
return None
try:
finder = _path_importer_cache[path]
except KeyError:
finder = cls._path_hooks(path)
_path_importer_cache[path] = finder
return finder
class SibilantSourceFileLoader(FileLoader):
def create_module(self, spec):
return None
def get_source(self, fullname):
return self.get_data(self.get_filename(fullname)).decode("utf8")
def exec_module(self, module):
name = module.__name__
source = self.get_source(name)
filename = basename(self.get_filename(name))
prep_module(module)
exec_module(module, source, filename=filename)
def _get_lspy_file_loader():
return (SibilantSourceFileLoader, SOURCE_SUFFIXES)
def _get_lspy_path_hook():
return FileFinder.path_hook(_get_lspy_file_loader())
def _install():
done = False
def install():
nonlocal done
if not done:
_path_hooks.append(_get_lspy_path_hook())
sys.meta_path.append(SibilantPathFinder)
done = True
return install
_install = _install()
_install()
The SibilantPathFinder overrides PathFinder and replaces only those methods which reference sys.path_hook
and sys.path_importer_cache
with similar implementations which instead look in a _path_hook
and _path_importer_cache
which are local to this module.
During import, the existing PathFinder will try to find a matching module. If it cannot, then my injected SibilantPathFinder will re-traverse the sys.path
and try to find a match with one of my own file extensions.
Figuring More Out
I ended up delving into the source for the _bootstrap_external module https://github.com/python/cpython/blob/master/Lib/importlib/_bootstrap_external.py
The _install
function and the PathFinder.find_spec
method are the best starting points to seeing why things work the way they do.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With