Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

py2app picking up .git subdir of a package during build

We use py2app extensively at our facility to produce self contained .app packages for easy internal deployment without dependency issues. Something I noticed recently, and have no idea how it began, is that when building an .app, py2app started including the .git directory of our main library.

commonLib, for instance, is our root python library package, which is a git repo. Under this package are the various subpackages such as database, utility, etc.

commonLib/
    |- .git/ # because commonLib is a git repo
    |- __init__.py
    |- database/
        |- __init__.py
    |- utility/
        |- __init__.py
    # ... etc

In a given project, say Foo, we will do imports like from commonLib import xyz to use our common packages. Building via py2app looks something like: python setup.py py2app

So the recent issue I am seeing is that when building an app for project Foo, I will see it include everything in commonLib/.git/ into the app, which is extra bloat. py2app has an excludes option but that only seems to be for python modules. I cant quite figure out what it would take to exclude the .git subdir, or in fact, what is causing it to be included in the first place.

Has anyone experienced this when using a python package import that is a git repo? Nothing has changed in our setup.py files for each project, and commonLib has always been a git repo. So the only thing I can think of being a variable is the version of py2app and its deps which have obviously been upgraded over time.

Edit

I'm using the latest py2app 0.6.4 as of right now. Also, my setup.py was first generated from py2applet a while back, but has been hand configured since and copied over as a template for every new project. I am using PyQt4/sip for every single one of these projects, so it also makes me wonder if its an issue with one of the recipes?

Update

From the first answer, I tried to fix this using various combinations of exclude_package_data settings. Nothing seems to force the .git directory to become excluded. Here is a sample of what my setup.py files generally look like:

from setuptools import setup
from myApp import VERSION

appname = 'MyApp'
APP = ['myApp.py']
DATA_FILES = []
OPTIONS = {
    'includes': 'atexit, sip, PyQt4.QtCore, PyQt4.QtGui',
    'strip': True, 
    'iconfile':'ui/myApp.icns', 
    'resources':['src/myApp.png'], 
    'plist':{
        'CFBundleIconFile':'ui/myApp.icns',
        'CFBundleIdentifier':'com.company.myApp',
        'CFBundleGetInfoString': appname,
        'CFBundleVersion' : VERSION,
        'CFBundleShortVersionString' : VERSION
        }
    }

setup(
    app=APP,
    data_files=DATA_FILES,
    options={'py2app': OPTIONS},
    setup_requires=['py2app'],
)

I have tried things like:

setup(
    ...
    exclude_package_data = { 'commonLib': ['.git'] },
    #exclude_package_data = { '': ['.git'] },
    #exclude_package_data = { 'commonLib/.git/': ['*'] },
    #exclude_package_data = { '.git': ['*'] },
    ...
)

Update #2

I have posted my own answer which does a monkeypatch on distutils. Its ugly and not preferred, but until someone can offer me a better solution, I guess this is what I have.

like image 942
jdi Avatar asked Mar 23 '12 19:03

jdi


4 Answers

I am adding an answer to my own question, to document the only thing I have found to work thus far. My approach was to monkeypatch distutils to ignore certain patterns when creating a directory or copying a file. This is really not what I wanted to do, but like I said, its the only thing that works so far.

## setup.py ##

import re

# file_util has to come first because dir_util uses it
from distutils import file_util, dir_util

def wrapper(fn):
    def wrapped(src, *args, **kwargs):
        if not re.search(r'/\.git/?', src):
            fn(src, *args, **kwargs) 
    return wrapped       

file_util.copy_file = wrapper(file_util.copy_file)
dir_util.mkpath = wrapper(dir_util.mkpath)

# now import setuptools so it uses the monkeypatched methods
from setuptools import setup

Hopefully someone will comment on this and tell me a higher level approach to avoid doing this. But as of now, I will probably wrap this into a utility method like exclude_data_patterns(re_pattern) to be reused in my projects.

like image 151
jdi Avatar answered Oct 16 '22 04:10

jdi


I can see two options for excluding the .git directory.

  1. Build the application from a 'clean' checkout of the code. When deploying a new version, we always build from a fresh svn export based on a tag to ensure we don't pick up spurious changes/files. You could try the equivalent here - although the git equivalent seems somewhat more involved.

  2. Modify the setup.py file to massage the files included in the application. This might be done using the exclude_package_data functionality as described in the docs, or build the list of data_files and pass it to setup.

As for why it has suddenly started happening, knowing the version of py2app you are using might help, as will knowing the contents of your setup.py and perhaps how this was made (by hand or using py2applet).

like image 35
Mark Streatfield Avatar answered Oct 16 '22 02:10

Mark Streatfield


I have a similar experience with Pyinstaller, so I'm not sure it applies directly.

Pyinstaller creates a "manifest" of all files to be included in the distribution, before running the export process. You could "massage" this manifest, as per Mark's second suggestion, to exclude any files you want. Including anything within .git or .git itself.

In the end, I stuck with checking out my code before producing a binary as there was more than just .git being bloat (such as UML documents and raw resource files for Qt). A checkout guaranteed a clean result and I experienced no issues automating that process along with the process of creating the installer for the binary.

like image 20
Marcus Ottosson Avatar answered Oct 16 '22 04:10

Marcus Ottosson


There is a good answer to this, but I have a more elaborate answer to solve the problem mentioned here with a white-list approach. To have the monkey patch also work for packages outside site-packages.zip I had to monkey patch also copy_tree (because it imports copy_file inside its function), this helps in making a standalone application.

In addition, I create a white-list recipe to mark certain packages zip-unsafe. The approach makes it easy to add filters other than white-list.

import pkgutil
from os.path import join, dirname, realpath
from distutils import log

# file_util has to come first because dir_util uses it
from distutils import file_util, dir_util
# noinspection PyUnresolvedReferences
from py2app import util


def keep_only_filter(base_mod, sub_mods):
    prefix = join(realpath(dirname(base_mod.filename)), '')
    all_prefix = [join(prefix, sm) for sm in sub_mods]
    log.info("Set filter for prefix %s" % prefix)

    def wrapped(mod):
        name = getattr(mod, 'filename', None)
        if name is None:
            # ignore anything that does not have file name
            return True
        name = join(realpath(dirname(name)), '')
        if not name.startswith(prefix):
            # ignore those that are not in this prefix
            return True
        for p in all_prefix:
            if name.startswith(p):
                return True
        # log.info('ignoring %s' % name)
        return False
    return wrapped

# define all the filters we need
all_filts = {
    'mypackage': (keep_only_filter, [
        'subpackage1', 'subpackage2',
    ]),
}


def keep_only_wrapper(fn, is_dir=False):
    filts = [(f, k[1]) for (f, k) in all_filts.iteritems()
             if k[0] == keep_only_filter]
    prefixes = {}
    for f, sms in filts:
        pkg = pkgutil.get_loader(f)
        assert pkg, '{f} package not found'.format(f=f)
        p = join(pkg.filename, '')
        sp = [join(p, sm, '') for sm in sms]
        prefixes[p] = sp

    def wrapped(src, *args, **kwargs):
        name = src
        if not is_dir:
            name = dirname(src)
        name = join(realpath(name), '')
        keep = True
        for prefix, sub_prefixes in prefixes.iteritems():
            if name == prefix:
                # let the root pass
                continue
            # if it is a package we have a filter for
            if name.startswith(prefix):
                keep = False
                for sub_prefix in sub_prefixes:
                    if name.startswith(sub_prefix):
                        keep = True
                        break
        if keep:
            return fn(src, *args, **kwargs)
        return []

    return wrapped

file_util.copy_file = keep_only_wrapper(file_util.copy_file)
dir_util.mkpath = keep_only_wrapper(dir_util.mkpath, is_dir=True)
util.copy_tree = keep_only_wrapper(util.copy_tree, is_dir=True)


class ZipUnsafe(object):
    def __init__(self, _module, _filt):
        self.module = _module
        self.filt = _filt

    def check(self, dist, mf):
        m = mf.findNode(self.module)
        if m is None:
            return None

        # Do not put this package in site-packages.zip
        if self.filt:
            return dict(
                packages=[self.module],
                filters=[self.filt[0](m, self.filt[1])],
            )
        return dict(
            packages=[self.module]
        )

# Any package that is zip-unsafe (uses __file__ ,... ) should be added here 
# noinspection PyUnresolvedReferences
import py2app.recipes
for module in [
        'sklearn', 'mypackage',
]:
    filt = all_filts.get(module)
    setattr(py2app.recipes, module, ZipUnsafe(module, filt))
like image 22
dashesy Avatar answered Oct 16 '22 03:10

dashesy