Python: Good way to get function and all dependencies in a single file?

I'm working on a big Python code base that grows and grows and grows. It's not a single application - more of a bunch of experiments that share some common code.

Every so often, I want to make a public release of a given experiment. I don't want to release my entire awful codebase, just the parts required to run a given experiment. So basically I'd like something to crawl through all the imports and copy whatever functions are called (or at least all the modules imported) into a single file, which I can release as a demo. I'd of course like to only do this for files defined in the current project (not a dependent package like numpy).

I'm using PyCharm now, and haven't been able to find that functionality. Is there any tool that does this?

Edit: I created the public-release package to solve this problem. Given a main module, it crawls through dependent modules and copies them into a new repo.
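
For a rough idea of what that crawl involves, here is a minimal sketch using only the standard library's modulefinder (this is not the package's actual implementation; the paths my_project/demo_main.py and release/ are placeholders, and the sketch flattens any package structure for brevity):

import os
import shutil
from modulefinder import ModuleFinder

PROJECT_ROOT = os.path.abspath("my_project")   # placeholder: where the project's code lives
OUTPUT_DIR = "release"                          # placeholder: where copied modules should go

# Walk every import reachable from the demo's entry script.
finder = ModuleFinder()
finder.run_script(os.path.join(PROJECT_ROOT, "demo_main.py"))

os.makedirs(OUTPUT_DIR, exist_ok=True)
for name, module in finder.modules.items():
    path = getattr(module, "__file__", None)
    # Copy only modules defined inside the project, not dependencies like numpy.
    if path and os.path.abspath(path).startswith(PROJECT_ROOT + os.sep):
        shutil.copy(path, OUTPUT_DIR)  # note: this flattens the package layout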

asked Nov 07 '16 by Peter



2 Answers

If you just want the modules, you could run the code in a fresh session and go through sys.modules for any module in your package.
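
A minimal sketch of that idea, assuming your project's code lives in a package called my_project and experiment_main is its entry module (both placeholder names):

import sys

import my_project.experiment_main  # placeholder entry point; importing it loads its dependencies

# Every loaded module whose dotted name falls under the project package.
for name in sorted(sys.modules):
    if name == "my_project" or name.startswith("my_project."):
        print(name, getattr(sys.modules[name], "__file__", "<built-in>"))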

To move all the dependencies with PyCharm, you could make a macro that moves a highlighted object to a predefined file, attach the macro to a keyboard shortcut, and then quickly move any in-project imports recursively. For instance, I made a macro called export_func that moves the highlighted function to to_export.py and bound it to F10:

(Screenshot: the macro's recorded actions)

Given a function that I want to move, defined in a file like

from utils import factorize


def my_func():
    print(factorize(100))

and utils.py looking something like

import numpy as np
from collections import Counter
import sys
if sys.version_info.major >= 3:
    from functools import lru_cache
else:
    from functools32 import lru_cache


PREPROC_CAP = int(1e6)


@lru_cache(10)
def get_primes(n):
    n = int(n)
    sieve = np.ones(n // 3 + (n % 6 == 2), dtype=bool)
    for i in range(1, int(n ** 0.5) // 3 + 1):
        if sieve[i]:
            k = 3 * i + 1 | 1
            sieve[k * k // 3::2 * k] = False
            sieve[k * (k - 2 * (i & 1) + 4) // 3::2 * k] = False
    return list(map(int, np.r_[2, 3, ((3 * np.nonzero(sieve)[0][1:] + 1) | 1)]))


@lru_cache(10)
def _get_primes_set(n):
    return set(get_primes(n))


@lru_cache(int(1e6))
def factorize(value):
    if value == 1:
        return Counter()
    if value < PREPROC_CAP and value in _get_primes_set(PREPROC_CAP):
        return Counter([value])
    for p in get_primes(PREPROC_CAP):
        if p ** 2 > value:
            break
        if value % p == 0:
            factors = factorize(value // p).copy()
            factors[p] += 1
            return factors
    for p in range(PREPROC_CAP + 1, int(value ** .5) + 1, 2):
        if value % p == 0:
            factors = factorize(value // p).copy()
            factors[p] += 1
            return factors
    return Counter([value])

I can highlight my_func and press F10 to create to_export.py:

from utils import factorize


def my_func():
    print(factorize(100))

Highlighting factorize in to_export.py and hitting F10 gets

from collections import Counter
from functools import lru_cache

from utils import PREPROC_CAP, _get_primes_set, get_primes


def my_func():
    print(factorize(100))


@lru_cache(int(1e6))
def factorize(value):
    if value == 1:
        return Counter()
    if value < PREPROC_CAP and value in _get_primes_set(PREPROC_CAP):
        return Counter([value])
    for p in get_primes(PREPROC_CAP):
        if p ** 2 > value:
            break
        if value % p == 0:
            factors = factorize(value // p).copy()
            factors[p] += 1
            return factors
    for p in range(PREPROC_CAP + 1, int(value ** .5) + 1, 2):
        if value % p == 0:
            factors = factorize(value // p).copy()
            factors[p] += 1
            return factors
    return Counter([value])

Then highlighting each of PREPROC_CAP, _get_primes_set, and get_primes and pressing F10 gets

from collections import Counter
from functools import lru_cache

import numpy as np


def my_func():
    print(factorize(100))


@lru_cache(int(1e6))
def factorize(value):
    if value == 1:
        return Counter()
    if value < PREPROC_CAP and value in _get_primes_set(PREPROC_CAP):
        return Counter([value])
    for p in get_primes(PREPROC_CAP):
        if p ** 2 > value:
            break
        if value % p == 0:
            factors = factorize(value // p).copy()
            factors[p] += 1
            return factors
    for p in range(PREPROC_CAP + 1, int(value ** .5) + 1, 2):
        if value % p == 0:
            factors = factorize(value // p).copy()
            factors[p] += 1
            return factors
    return Counter([value])


PREPROC_CAP = int(1e6)


@lru_cache(10)
def _get_primes_set(n):
    return set(get_primes(n))


@lru_cache(10)
def get_primes(n):
    n = int(n)
    sieve = np.ones(n // 3 + (n % 6 == 2), dtype=bool)
    for i in range(1, int(n ** 0.5) // 3 + 1):
        if sieve[i]:
            k = 3 * i + 1 | 1
            sieve[k * k // 3::2 * k] = False
            sieve[k * (k - 2 * (i & 1) + 4) // 3::2 * k] = False
    return list(map(int, np.r_[2, 3, ((3 * np.nonzero(sieve)[0][1:] + 1) | 1)]))

It goes pretty fast even if you have a lot of code that you're copying over.

answered Oct 12 '22 by Colin



Jamming all your code into a single module isn't a good idea. One good example of why: suppose one of your experiments depends on two modules that each define a function with the same name. With separate modules, it's easy for your code to distinguish between them; to stuff them into the same module, the editor would have to do some kind of hacky renaming (e.g. prepend the old module name), and the situation gets even worse if some other function in the module calls the one with the conflicting name. You would effectively have to replicate the entire module scoping mechanism to make this work.

Building a list of module dependencies is also a non-trivial task. Consider an experiment that depends on a module that in turn depends on numpy. You almost certainly want your end users to install the numpy package rather than have you bundle it, so the editor would need some way of distinguishing the modules to include from the ones you expect to be installed separately. On top of this, you have to consider things like a function that imports a module inline rather than at the top of the module, and other out-of-the-ordinary cases.
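
To make that distinction concrete, the kind of heuristic such a tool would need might look like the sketch below (the project_root convention is an assumption, and it still says nothing about inline imports or compiled extensions):

import importlib.util
import os


def should_bundle(module_name, project_root):
    """Rough heuristic: bundle a module only if its source file lives under project_root."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin in (None, "built-in"):
        return False  # built-ins, namespace packages, or missing modules: never bundle
    return os.path.abspath(spec.origin).startswith(os.path.abspath(project_root) + os.sep)


print(should_bundle("numpy", "."))  # False: users should install numpy themselves
print(should_bundle("utils", "."))  # True if utils.py sits in the project root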

You're asking too much of your editor. You really have two problems:

  1. Separate your experimental code from your release-ready code.
  2. Package your stable code.

Separating your experimental code

Source control is the answer to your first problem. It lets you write whatever experimental code you wish on your local machine, and as long as you don't commit it, you won't pollute your code base. If you do want to commit this code for backup, tracking, or sharing purposes, you can use branching: identify a branch as your stable branch (typically trunk in SVN and master in git), and only commit experimental code to other branches. You can then merge experimental feature branches into the stable branch as they become mature enough to publish. Such a branching setup has the added benefit of letting you segregate your experiments from each other, if you choose.

A server-hosted source control system generally makes things simpler and safer, but if you're the sole developer, you can still use git locally without a server. A hosted repository mainly makes it easier to coordinate with others.

Packaging your stable code

One very simple option is to just tell your users to check out the stable branch from the repository. Distributing this way is far from unheard of. It's still a little better than your current situation, since you no longer need to gather the files manually; you may need to write a little documentation, though. Alternatively, if you don't want to make your repository publicly available, you can use your source control's built-in feature to export an entire commit as a zip file or similar (export in SVN, archive in git); the result can be uploaded anywhere.

If that doesn't seem like enough and you can spare the time right now, setuptools is probably a good answer to this problem. This will allow you to generate a wheel containing your stable code. You can have a setup.py script for each package of code you want to release; the setup.py script will identify which packages and modules to include. You do have to manage this script manually, but if you configure it to include whole packages and directories and then establish good project conventions for organizing your code, you shouldn't have to change it very often. This also has the benefit of giving your end users a standard install mechanism for your code. You could even publish it on PyPI if you wish to share it broadly.
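
For instance, a minimal setup.py along these lines might look like the sketch below (the distribution name, version, package name stable_experiment, and the dependency list are all placeholders):

from setuptools import find_packages, setup

setup(
    name="stable-experiment",                # placeholder distribution name
    version="0.1.0",                         # placeholder version
    packages=find_packages(include=["stable_experiment", "stable_experiment.*"]),
    install_requires=["numpy"],              # third-party packages users should get from PyPI
)

With the wheel package installed, running python setup.py bdist_wheel then drops a .whl into dist/ that your users can install with pip.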

If you go so far as to use setuptools, you may also want to consider a build server, which can pick up on new commits and can run scripts to repackage and potentially publish your code.

answered Oct 12 '22 by jpmc26