
Import vendored dependencies in Python package without modifying sys.path or 3rd party packages

Summary

I am working on a series of add-ons for Anki, an open-source flashcard program. Anki add-ons are shipped as Python packages, with the basic folder structure looking as follows:

anki_addons/
    addon_name_1/
        __init__.py
    addon_name_2/
        __init__.py

anki_addons is appended to sys.path by the base app, which then imports each add-on with import <addon_name>.

The problem I have been trying to solve is to find a reliable way to ship packages and their dependencies with my add-ons while not polluting global state or falling back to manual edits of the vendored packages.

Specifics

Specifically, given an add-on structure like this...

addon_name_1/
    __init__.py
    _vendor/
        __init__.py
        library1
        library2
        dependency_of_library2
        ...

...I would like to be able to import any arbitrary package that is included in the _vendor directory, e.g.:

from ._vendor import library1 

The main difficulty with relative imports like this is that they do not work for packages that in turn import their own dependencies through absolute references (e.g. import dependency_of_library2 in the source code of library2).
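The failure can be reproduced in isolation. Here is a runnable sketch using throwaway names (addon_demo, library2 and dependency_of_library2 are hypothetical stand-ins for the layout above):

```python
import importlib
import os
import sys
import tempfile

# Recreate the layout above with hypothetical names: a vendored library2
# whose upstream source uses an absolute import for its own dependency.
root = tempfile.mkdtemp()
lib2 = os.path.join(root, 'addon_demo', '_vendor', 'library2')
os.makedirs(lib2)
for pkg in ('addon_demo', os.path.join('addon_demo', '_vendor')):
    open(os.path.join(root, pkg, '__init__.py'), 'w').close()
with open(os.path.join(lib2, '__init__.py'), 'w') as f:
    f.write('import dependency_of_library2\n')  # absolute import, as shipped upstream

sys.path.insert(0, root)
try:
    importlib.import_module('addon_demo._vendor.library2')
except ModuleNotFoundError as err:
    msg = str(err)
    print(msg)  # No module named 'dependency_of_library2'
finally:
    sys.path.remove(root)
```

The relative import into _vendor finds library2 itself just fine; it is library2's own absolute import that is looked up on sys.path and fails.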

Solution attempts

So far I have explored the following options:

  1. Manually updating the third-party packages, so that their import statements point to the fully qualified module path within my Python package (e.g. import addon_name_1._vendor.dependency_of_library2). But this is tedious work that is not scalable to larger dependency trees and not portable to other packages.
  2. Adding _vendor to sys.path via sys.path.insert(1, <path_to_vendor_dir>) in my package init file. This works, but it introduces a global change to the module look-up path which will affect other add-ons and even the base app itself. It just seems like a hack that could open a Pandora's box of issues later down the line (e.g. conflicts between different versions of the same package, etc.).
  3. Temporarily modifying sys.path for my imports; but this fails to work for third-party modules with method-level imports.
  4. Writing a PEP 302-style custom importer based on an example I found in setuptools, but I just couldn't make head or tail of that.
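For reference, option #3 can be sketched as a context manager (a minimal sketch; the path in the commented usage is illustrative), which also shows exactly why the approach breaks down for deferred imports:

```python
import sys
from contextlib import contextmanager

@contextmanager
def temporary_sys_path(path):
    """Prepend `path` to sys.path for the duration of the with-block."""
    sys.path.insert(0, path)
    try:
        yield
    finally:
        try:
            sys.path.remove(path)
        except ValueError:
            pass  # someone else already removed it

# with temporary_sys_path('/path/to/addon_name_1/_vendor'):
#     import library1          # top-level imports resolve here...
# ...but any `import dependency_of_library2` executed later, inside one of
# library1's methods, runs after sys.path has been restored and fails.
```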

I've been stuck on this for quite a few hours now and I'm beginning to think that I'm either completely missing an easy way to do this, or that there is something fundamentally wrong with my entire approach.

Is there no way I can ship a dependency tree of third-party packages with my code, without having to resort to sys.path hacks or modifying the packages in question?


Edit:

Just to clarify: I don't have any control over how add-ons are imported from the anki_addons folder. anki_addons is just the directory provided by the base app where all add-ons are installed into. It is added to sys.path, so the add-on packages therein pretty much behave like any other Python package located in Python's module look-up paths.

Asked Sep 27 '18 by Glutanimate




2 Answers

First of all, I'd advise against vendoring; a few major packages did use vendoring before but have switched away to avoid the pain of having to handle it. One such example is the requests library. If you are relying on people using pip install to install your package, then just declare dependencies and tell people about virtual environments. Don't assume you need to shoulder the burden of keeping dependencies untangled, or need to stop people from installing dependencies in the global Python site-packages location.

At the same time, I appreciate that a plug-in environment for a third-party tool is something different, and if adding dependencies to the Python installation used by that tool is cumbersome or impossible, vendoring may be a viable option. I see that Anki distributes extensions as .zip files without setuptools support, so that's certainly such an environment.

So if you choose to vendor dependencies, then use a script to manage your dependencies and update their imports. This is your option #1, but automated.

This is the path that the pip project has chosen, see their tasks subdirectory for their automation, which builds on the invoke library. See the pip project vendoring README for their policy and rationale (chief among those is that pip needs to bootstrap itself, e.g. have their dependencies available to be able to install anything).

You should not use any of the other options; you already enumerated the issues with #2 and #3.

The issue with option #4, using a custom importer, is that you still need to rewrite imports. Put differently, the custom importer hook used by setuptools doesn't solve the vendorized namespace problem at all, it instead makes it possible to dynamically import top-level packages if the vendorized packages are missing (a problem that pip solves with a manual debundling process). setuptools actually uses option #1, where they rewrite the source code for vendorized packages. See for example these lines in the packaging project in the setuptools vendored subpackage; the setuptools.extern namespace is handled by the custom import hook, which then redirects either to setuptools._vendor or the top-level name if importing from the vendorized package fails.
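To make that mechanism concrete, here is a minimal sketch of such an extern-style alias hook (all names are hypothetical, and setuptools' real VendorImporter differs in detail): it serves an aliased name from the vendored copy first and falls back to the top-level name.

```python
import importlib
import importlib.abc
import importlib.util
import sys

class VendorAliasFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve imports of '<alias_root>.<name>' from '<vendor_pkg>.<name>',
    falling back to the top-level '<name>' if no vendored copy exists."""

    def __init__(self, alias_root, vendor_pkg, vendored_names):
        self.alias_root = alias_root          # e.g. 'addon_name_1.extern'
        self.vendor_pkg = vendor_pkg          # e.g. 'addon_name_1._vendor'
        self.vendored_names = set(vendored_names)

    def find_spec(self, fullname, path=None, target=None):
        prefix = self.alias_root + '.'
        if not fullname.startswith(prefix):
            return None
        if fullname[len(prefix):].split('.')[0] not in self.vendored_names:
            return None
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        name = spec.name[len(self.alias_root) + 1:]
        for real_name in (f'{self.vendor_pkg}.{name}', name):
            try:
                # hand the already-imported real module to the import
                # system, which registers it under the alias as well
                return importlib.import_module(real_name)
            except ImportError:
                continue
        raise ImportError(f'cannot find vendored module {name!r}')

    def exec_module(self, module):
        pass  # the real module has already been executed

# Demo with hypothetical names: no vendored copy of 'json' exists under
# 'addon_demo._vendor', so the alias falls back to the stdlib module.
import json
import types
for name in ('addon_demo', 'addon_demo.extern'):
    stub = types.ModuleType(name)
    stub.__path__ = []                    # mark the stubs as packages
    sys.modules[name] = stub
sys.meta_path.append(
    VendorAliasFinder('addon_demo.extern', 'addon_demo._vendor', {'json'}))
aliased = importlib.import_module('addon_demo.extern.json')
print(aliased is json)  # True
```

Note that even with such a hook in place, the vendored sources still need their imports rewritten to use the alias namespace, which is the point made above.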

The pip automation to update vendored packages takes the following steps:

  • Delete everything in the _vendor/ subdirectory except the documentation, the __init__.py file and the requirements text file.
  • Use pip to install all vendored dependencies into that directory, using a dedicated requirements file named vendor.txt, avoiding compilation of .pyc bytecode files and ignoring transitive dependencies (these are assumed to be listed in vendor.txt already); the command used is pip install -t pip/_vendor -r pip/_vendor/vendor.txt --no-compile --no-deps.
  • Delete everything that was installed by pip but not needed in a vendored environment, i.e. *.dist-info, *.egg-info, the bin directory, and a few things from installed dependencies that pip would never use.
  • Collect all installed directories and added files sans .py extension (so anything not in the whitelist); this is the vendored_libs list.
  • Rewrite imports; this is simply a series of regexes, where every name in vendored_libs is used to replace import <name> occurrences with import pip._vendor.<name> and every from <name>(.*) import occurrence with from pip._vendor.<name>(.*) import.
  • Apply a few patches to mop up the remaining changes needed; from a vendoring perspective, only the pip patch for requests is interesting here in that it updates the requests library backwards compatibility layer for the vendored packages that the requests library had removed; this patch is quite meta!

So in essence, the most important part of the pip approach, the rewriting of vendored package imports, is quite simple; paraphrased to simplify the logic and removing the pip-specific parts, it is simply the following process:

import re
import shutil
import subprocess

from functools import partial
from itertools import chain
from pathlib import Path

WHITELIST = {'README.txt', '__init__.py', 'vendor.txt'}

def delete_all(*paths, whitelist=frozenset()):
    for item in paths:
        if item.is_dir():
            shutil.rmtree(item, ignore_errors=True)
        elif item.is_file() and item.name not in whitelist:
            item.unlink()

def iter_subtree(path):
    """Recursively yield all files in a subtree, depth-first"""
    if not path.is_dir():
        if path.is_file():
            yield path
        return
    for item in path.iterdir():
        if item.is_dir():
            yield from iter_subtree(item)
        elif item.is_file():
            yield item

def patch_vendor_imports(file, replacements):
    text = file.read_text('utf8')
    for replacement in replacements:
        text = replacement(text)
    file.write_text(text, 'utf8')

def find_vendored_libs(vendor_dir, whitelist):
    vendored_libs = []
    paths = []
    for item in vendor_dir.iterdir():
        if item.is_dir():
            vendored_libs.append(item.name)
        elif item.is_file() and item.name not in whitelist:
            vendored_libs.append(item.stem)  # without extension
        else:  # a file in the whitelist; skip it
            continue
        paths.append(item)
    return vendored_libs, paths

def vendor(vendor_dir):
    # target package is <parent>.<vendor_dir>; foo/_vendor -> foo._vendor
    pkgname = f'{vendor_dir.parent.name}.{vendor_dir.name}'

    # remove everything
    delete_all(*vendor_dir.iterdir(), whitelist=WHITELIST)

    # install with pip
    subprocess.run([
        'pip', 'install', '-t', str(vendor_dir),
        '-r', str(vendor_dir / 'vendor.txt'),
        '--no-compile', '--no-deps'
    ])

    # delete stuff that's not needed
    delete_all(
        *vendor_dir.glob('*.dist-info'),
        *vendor_dir.glob('*.egg-info'),
        vendor_dir / 'bin')

    vendored_libs, paths = find_vendored_libs(vendor_dir, WHITELIST)

    replacements = []
    for lib in vendored_libs:
        replacements += (
            partial(  # import bar -> from foo._vendor import bar
                re.compile(r'(^\s*)import {}\n'.format(lib), flags=re.M).sub,
                r'\1from {} import {}\n'.format(pkgname, lib)
            ),
            partial(  # from bar -> from foo._vendor.bar
                re.compile(r'(^\s*)from {}(\.|\s+)'.format(lib), flags=re.M).sub,
                r'\1from {}.{}\2'.format(pkgname, lib)
            ),
        )

    for file in chain.from_iterable(map(iter_subtree, paths)):
        patch_vendor_imports(file, replacements)

if __name__ == '__main__':
    # this assumes this script lives next to the foo package, i.e. foo/_vendor
    here = Path(__file__).resolve().parent
    vendor_dir = here / 'foo' / '_vendor'
    assert (vendor_dir / 'vendor.txt').exists(), '_vendor/vendor.txt file not found'
    assert (vendor_dir / '__init__.py').exists(), '_vendor/__init__.py file not found'
    vendor(vendor_dir)
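As a quick sanity check on the rewriting step in isolation, the two substitutions produced for a single vendored name behave like this (foo._vendor and bar are placeholder names):

```python
import re
from functools import partial

pkgname, lib = 'foo._vendor', 'bar'   # placeholder names
replacements = (
    partial(  # import bar -> from foo._vendor import bar
        re.compile(r'(^\s*)import {}\n'.format(lib), flags=re.M).sub,
        r'\1from {} import {}\n'.format(pkgname, lib)),
    partial(  # from bar... -> from foo._vendor.bar...
        re.compile(r'(^\s*)from {}(\.|\s+)'.format(lib), flags=re.M).sub,
        r'\1from {}.{}\2'.format(pkgname, lib)),
)

source = 'import bar\nfrom bar.baz import qux\n'
for replace in replacements:
    source = replace(source)
print(source)
# from foo._vendor import bar
# from foo._vendor.bar.baz import qux
```

Note that `import bar` is rewritten to `from foo._vendor import bar` rather than `import foo._vendor.bar`, so the module is still bound to the short name `bar` that the surrounding code expects.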
Answered Oct 19 '22 by Martijn Pieters


How about making your anki_addons folder a package and importing the required libraries in __init__.py in the main package folder?

So it'd be something like

anki/
    __init__.py

In anki/__init__.py:

from anki_addons import library1 

In anki/anki_addons/__init__.py:

from addon_name_1 import * 

I'm new at this, so please bear with me here.

Answered Oct 19 '22 by gavin