Best way to have a python script copy itself?

I am using Python for scientific applications. I run simulations with various parameters, and my script outputs the data to an appropriate directory for each parameter set. Later I use that data. However, I sometimes edit my script, so to be able to reproduce my results if needed, I would like a copy of whichever version of the script generated the data to live right in the directory with the data. So basically I would like my Python script to copy itself to the data directory. What's the best way to do this?

Thanks!

Kai Sikorski asked Dec 14 '22 23:12


2 Answers

I stumbled across this question as I wanted to do the same thing. Although I agree with the comments that git/VCS with revision and everything would be the cleanest solution, sometimes you just want something quick and dirty that does the job. So if anyone is still interested:

With __file__ you can access the running script's filename (including its path), and, as already suggested, you can use a high-level file-manipulation library like shutil to copy it somewhere else. In one line:

shutil.copy(__file__, 'experiment_folder_path/copied_script_name.py') 

With the corresponding imports and some bells and whistles:

import shutil
import os     # optional: for extracting basename / creating new filepath
import time   # optional: for appending time string to copied script

# generate filename with timestring
copied_script_name = time.strftime("%Y-%m-%d_%H%M") + '_' + os.path.basename(__file__)

# copy script
shutil.copy(__file__, os.path.join('my_experiment_folder_path', copied_script_name))
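
A slightly more defensive variant is sketched below. It assumes Python 3.2+ (for `exist_ok`), creates the destination directory if it does not exist yet (copying into a missing directory raises an error), resolves the script path to an absolute path, and uses `shutil.copy2`, which also preserves the file's modification time. The helper name `archive_script` and its `timestamp` parameter are hypothetical, not part of any library.

```python
import os
import shutil
import time


def archive_script(script_path, dest_dir, timestamp=True):
    """Copy script_path into dest_dir and return the path of the copy.

    Hypothetical helper: dest_dir is created if needed, and by default the
    copy gets a time-stamped name so repeated runs do not overwrite each other.
    """
    script_path = os.path.abspath(script_path)
    os.makedirs(dest_dir, exist_ok=True)  # no error if it already exists
    name = os.path.basename(script_path)
    if timestamp:
        name = time.strftime("%Y-%m-%d_%H%M%S") + "_" + name
    dest = os.path.join(dest_dir, name)
    # copy2 copies the contents plus metadata such as the modification time
    shutil.copy2(script_path, dest)
    return dest
```

In the simulation script itself you would call something like `archive_script(__file__, output_dir)` right after creating the output directory for a parameter set.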
Honeybear answered Dec 31 '22 05:12


Copying the script can be done with shutil.copy().

But you should consider keeping your script under revision control. That enables you to retain a revision history.

For example, I keep my scripts under revision control with git. In Python files I tend to keep a version string like this:

__version__ = '$Revision: a42ef58 $'[11:-2]

This version string is updated with the git short hash every time the file in question is changed. (This is done by running a script called update-modified-keywords.py from git's post-commit hook.)

If you have a version string like this, you can embed that in the output, so you always know which version has produced the output.
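
One way to do the embedding is to write a small metadata file next to the data. The sketch below assumes JSON is an acceptable format and uses hypothetical helper names (`write_provenance`, `git_short_hash`). `git rev-parse --short HEAD` is shown as an alternative source for the hash that does not depend on keyword expansion; note that it reports the last commit, so uncommitted edits to the script are not reflected in it.

```python
import json
import os
import subprocess


def git_short_hash(repo_dir="."):
    """Return the short hash of HEAD in repo_dir, or None if unavailable."""
    try:
        out = subprocess.check_output(
            ["git", "rev-parse", "--short", "HEAD"],
            cwd=repo_dir, stderr=subprocess.DEVNULL)
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or repo_dir is not a repository with commits
        return None
    return out.decode().strip()


def write_provenance(dest_dir, version, params, filename="provenance.json"):
    """Write the code version and run parameters next to the output data."""
    os.makedirs(dest_dir, exist_ok=True)
    record = {"version": version, "params": params}
    path = os.path.join(dest_dir, filename)
    with open(path, "w") as f:
        json.dump(record, f, indent=2, sort_keys=True)
    return path
```

For example, `write_provenance(outdir, __version__, {'dt': 0.01, 'n': 1000})` at the start of a run, or pass `git_short_hash()` as the version.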

Edit:

The update-modified-keywords script is shown below:

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
#
# Author: R.F. Smith <[email protected]>
# $Date: 2013-11-24 22:20:54 +0100 $
# $Revision: 3d4f750 $
#
# To the extent possible under law, Roland Smith has waived all copyright and
# related or neighboring rights to update-modified-keywords.py. This work is
# published from the Netherlands.
# See http://creativecommons.org/publicdomain/zero/1.0/

"""Remove and check out those files that contain keywords and have
changed in the last commit in the current working directory."""

from __future__ import print_function, division
import os
import mmap
import sys
import subprocess


def checkfor(args):
    """Make sure that a program necessary for using this script is
    available.

    Arguments:
    args -- string or list of strings of commands. A single string may
            not contain spaces.
    """
    if isinstance(args, str):
        if ' ' in args:
            raise ValueError('No spaces in single command allowed.')
        args = [args]
    try:
        with open(os.devnull, 'w') as bb:
            subprocess.check_call(args, stdout=bb, stderr=bb)
    except subprocess.CalledProcessError:
        print("Required program '{}' not found! exiting.".format(args[0]))
        sys.exit(1)


def modifiedfiles():
    """Find files that have been modified in the last commit.

    :returns: A list of filenames.
    """
    fnl = []
    try:
        args = ['git', 'diff-tree', 'HEAD~1', 'HEAD', '--name-only', '-r',
                '--diff-filter=ACMRT']
        with open(os.devnull, 'w') as bb:
            fnl = subprocess.check_output(args, stderr=bb).splitlines()
            # Deal with unmodified repositories
            if len(fnl) == 1 and fnl[0] == 'clean':
                return []
    except subprocess.CalledProcessError as e:
        if e.returncode == 128:  # new repository
            args = ['git', 'ls-files']
            with open(os.devnull, 'w') as bb:
                fnl = subprocess.check_output(args, stderr=bb).splitlines()
    # Only return regular files.
    fnl = [i for i in fnl if os.path.isfile(i)]
    return fnl


def keywordfiles(fns):
    """Filter those files that have keywords in them

    :fns: A list of filenames
    :returns: A list of filenames for files that contain keywords.
    """
    # These lines are encoded otherwise they would be mangled if this file
    # is checked in my git repo!
    datekw = 'JERhdGU='.decode('base64')
    revkw = 'JFJldmlzaW9u'.decode('base64')
    rv = []
    for fn in fns:
        with open(fn, 'rb') as f:
            try:
                mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
                if mm.find(datekw) > -1 or mm.find(revkw) > -1:
                    rv.append(fn)
                mm.close()
            except ValueError:
                pass
    return rv


def main(args):
    """Main program.

    :args: command line arguments
    """
    # Check if git is available.
    checkfor(['git', '--version'])
    # Check if .git exists
    if not os.access('.git', os.F_OK):
        print('No .git directory found!')
        sys.exit(1)
    print('{}: Updating modified files.'.format(args[0]))
    # Get modified files
    files = modifiedfiles()
    if not files:
        print('{}: Nothing to do.'.format(args[0]))
        sys.exit(0)
    files.sort()
    # Find files that have keywords in them
    kwfn = keywordfiles(files)
    for fn in kwfn:
        os.remove(fn)
    args = ['git', 'checkout', '-f'] + kwfn
    subprocess.call(args)


if __name__ == '__main__':
    main(sys.argv)
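
The script above targets Python 2: `str.decode('base64')` does not exist in Python 3, where subprocess output is also `bytes`. Assuming you want to run the keyword check under Python 3, the `keywordfiles` step could look roughly like this. Like the original, it assembles the keyword markers from pieces so that the file does not match itself. Reading each file in full instead of using `mmap` is a simplification; it also sidesteps the `ValueError` that `mmap` raises for empty files, which is presumably why the original catches that exception.

```python
import base64

# Assemble the markers at run time (the original uses base64 for the same
# purpose) so this source file is not itself rewritten by the keyword filters.
DATE_KW = b"$" + b"Date"
REV_KW = b"$" + b"Revision"
# Under Python 3 the original's encoded constants decode to the same bytes:
assert base64.b64decode("JERhdGU=") == DATE_KW
assert base64.b64decode("JFJldmlzaW9u") == REV_KW


def keywordfiles(fns):
    """Return the subset of filenames whose contents contain a keyword."""
    rv = []
    for fn in fns:
        with open(fn, "rb") as f:
            data = f.read()  # empty files simply yield b""
        if DATE_KW in data or REV_KW in data:
            rv.append(fn)
    return rv
```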

If you don't want keyword expansion to clutter up your git history, you can use git's smudge and clean filters instead. I have the following set in my ~/.gitconfig:

[filter "kw"]
    clean = kwclean
    smudge = kwset

Both kwclean and kwset are Python scripts.

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
#
# Author: R.F. Smith <[email protected]>
# $Date: 2013-11-24 22:20:54 +0100 $
#
# To the extent possible under law, Roland Smith has waived all copyright and
# related or neighboring rights to kwset.py. This work is published from
# the Netherlands. See http://creativecommons.org/publicdomain/zero/1.0/

"""Fill in the Date and Revision keywords from the latest git commit and tag
   and substitute them in the standard input."""

import os
import sys
import subprocess
import re


def gitdate():
    """Get the date from the latest commit in ISO8601 format.
    """
    args = ['git', 'log',  '-1', '--date=iso']
    dline = [l for l in subprocess.check_output(args).splitlines()
             if l.startswith('Date')]
    try:
        dat = dline[0][5:].strip()
        return ''.join(['$', 'Date: ', dat, ' $'])
    except IndexError:
        raise ValueError('Date not found in git output')


def gitrev():
    """Get the latest tag and use it as the revision number. This presumes the
    habit of using numerical tags. Use the short hash if no tag available.
    """
    args = ['git', 'describe',  '--tags', '--always']
    try:
        with open(os.devnull, 'w') as bb:
            r = subprocess.check_output(args, stderr=bb)[:-1]
    except subprocess.CalledProcessError:
        return ''.join(['$', 'Revision', '$'])
    return ''.join(['$', 'Revision: ', r, ' $'])


def main():
    """Main program.
    """
    dre = re.compile(''.join([r'\$', r'Date:?\$']))
    rre = re.compile(''.join([r'\$', r'Revision:?\$']))
    currp = os.getcwd()
    if not os.path.exists(currp+'/.git'):
        print >> sys.stderr, 'This directory is not controlled by git!'
        sys.exit(1)
    date = gitdate()
    rev = gitrev()
    for line in sys.stdin:
        line = dre.sub(date, line)
        print rre.sub(rev, line),


if __name__ == '__main__':
    main()

and

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
#
# Author: R.F. Smith <[email protected]>
# $Date: 2013-11-24 22:20:54 +0100 $
#
# To the extent possible under law, Roland Smith has waived all copyright and
# related or neighboring rights to kwclean.py. This work is published from the
# Netherlands. See http://creativecommons.org/publicdomain/zero/1.0/

"""Remove the Date and Revision keyword contents from the standard input."""

import sys
import re

## This is the main program ##
if __name__ == '__main__':
    dre = re.compile(''.join([r'\$', r'Date[^$]*\$']))
    drep = ''.join(['$', 'Date', '$'])
    rre = re.compile(''.join([r'\$', r'Revision[^$]*\$']))
    rrep = ''.join(['$', 'Revision', '$'])
    for line in sys.stdin:
        line = dre.sub(drep, line)
        print rre.sub(rrep, line),

Both of these scripts are installed (without an extension at the end of the filename, as usual for executables) in a directory that is in my $PATH, and have their executable bit set.
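
The installation step can also be scripted. The following is a hypothetical helper (`install_executable` is not part of any tool) that copies, for example, `kwset.py` to a bin directory as `kwset` and sets the executable bits. It assumes a POSIX system and that the target directory (for example `~/bin`) is already on `$PATH`.

```python
import os
import shutil
import stat


def install_executable(src, bin_dir):
    """Copy src into bin_dir without its extension and make it executable."""
    os.makedirs(bin_dir, exist_ok=True)
    name = os.path.splitext(os.path.basename(src))[0]  # kwset.py -> kwset
    dest = os.path.join(bin_dir, name)
    shutil.copy(src, dest)
    mode = os.stat(dest).st_mode
    # Equivalent of "chmod +x": add execute permission for user, group, other
    os.chmod(dest, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return dest
```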

In the .gitattributes file of my repository, I choose which files get keyword expansion. For example, for Python files:

*.py filter=kw
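
With this attribute set, git runs the smudge command (kwset) when it writes a file to the working tree and the clean command (kwclean) when the file is staged. The pair only works well if cleaning a smudged line gives back the original line, so the repository always stores the bare keywords. As a hedged illustration, here is that logic as pure Python 3 functions; `kw_smudge` and `kw_clean` are hypothetical names, and the date and revision are passed in rather than read from git. The patterns use `[^$]*`, so matching stops at the first closing `$` and a line containing both keywords keeps both.

```python
import re

# Patterns are assembled from pieces, as in kwset/kwclean, so this file is not
# rewritten by the filters if it is itself checked in with filter=kw.
DATE_RE = re.compile(r"\$" + r"Date(?::[^$]*)?\$")
REV_RE = re.compile(r"\$" + r"Revision(?::[^$]*)?\$")
BARE_DATE = "$" + "Date" + "$"
BARE_REV = "$" + "Revision" + "$"


def kw_clean(line):
    """Collapse expanded keywords back to their bare form (the clean filter)."""
    # Replacement functions avoid backslash processing in the replacement text
    line = DATE_RE.sub(lambda m: BARE_DATE, line)
    return REV_RE.sub(lambda m: BARE_REV, line)


def kw_smudge(line, date, rev):
    """Fill in keywords (bare or already expanded) with date and rev."""
    line = DATE_RE.sub(lambda m: "$" + "Date: " + date + " $", line)
    return REV_RE.sub(lambda m: "$" + "Revision: " + rev + " $", line)
```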
Roland Smith answered Dec 31 '22 04:12