First off, apologies for what I'm sure will be obvious: my rudimentary understanding of bash, shells, and subprocesses.
I am trying to use Python to automate calls to a program called Freesurfer (actually, the subprogram I'm calling is called recon-all.)
If I were doing this directly at the command line, I'd "source" a script called mySetUpFreeSurfer.sh that does nothing but set three environment variables, and then "source" another script, FreeSurferEnv.sh. FreeSurferEnv.sh doesn't seem to me to do anything but set a lot of environment variables and echo some stuff to the terminal, but it's more complicated than the other bash script, so I'm not sure of that.
Here is what I have right now:
from subprocess import Popen, PIPE, call, check_output
import os

root = "/media/foo/"

#I got this function from another Stack Overflow question.
def source(script, update=1):
    pipe = Popen(". %s; env" % script, stdout=PIPE, shell=True)
    data = pipe.communicate()[0]
    env = dict((line.split("=", 1) for line in data.splitlines()))
    if update:
        os.environ.update(env)
    return env

source('~/scripts/mySetUpFreeSurfer.sh')
source('/usr/local/freesurfer/FreeSurferEnv.sh')

for sub_dir in os.listdir(root):
    sub = "s" + sub_dir[0:4]
    anat_dir = os.path.join(root, sub_dir, "anatomical")
    for directory in os.listdir(anat_dir):
        time_dir = os.path.join(anat_dir, directory)
        for d in os.listdir(time_dir):
            dicoms_dir = os.path.join(time_dir, d, 'dicoms')
            dicom_list = os.listdir(dicoms_dir)
            dicom = dicom_list[0]
            path = os.path.join(dicoms_dir, dicom)
            cmd1 = "recon-all -i " + path + " -subjid " + sub
            check_output(cmd1, shell=True)
            call(cmd1, shell=True)
            cmd2 = "recon-all -all -subjid " + sub,
            call(cmd2, shell=True)
This is failing:
Traceback (most recent call last):
  File "/home/katie/scripts/autoReconSO.py", line 28, in <module>
    check_output(cmd1, shell=True)
  File "/usr/lib/python2.7/subprocess.py", line 544, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
CalledProcessError: Command 'recon-all -i /media/foo/bar -subjid s1001' returned non-zero exit status 127
I maybe understand why this is. My "calls" later in the script are raising new subprocesses that do not inherit environment variables from the processes that are raised by invocation of the source() function. I have done a number of things to try to confirm my understanding. One example -- I put these lines:
mkdir ~/testFreeSurferEnv
export TEST_ENV_VAR=~/testFreeSurferEnv
in the FreeSurferEnv.sh script. The directory gets made just fine, but in the Python script this:
cmd = 'mkdir $TEST_ENV_VAR/test'
check_output(cmd, shell=True)
fails like this:
  File "/usr/lib/python2.7/subprocess.py", line 544, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
CalledProcessError: Command 'mkdir $TEST_ENV_VAR/test' returned non-zero exit status 1
QUESTION:
How can I make the subprocess that runs "recon-all" inherit the environment variables it needs? Or how can I do everything I need to do -- run the scripts to set the environment variables, and call recon-all, in the same process? Or should I approach the problem another way? Or do I likely misunderstand the problem?
Subprocesses inherit only environment variables. They are available automatically, without the subprocess having to take any explicit action. All the other "things" -- shell options, aliases, and functions -- must be made explicitly available. The environment file is how you do this.
Popen is more general than subprocess.call. Popen doesn't block, allowing you to interact with the process while it's running, or continue with other things in your Python program. The call to Popen returns a Popen object.
Using /usr/bin/env:
path = '/dir1:/dir2'
subprocess.Popen(['/usr/bin/env', '-P', path, 'progtorun', other, args], ...)
This lets you pass a different PATH to the env process (using the option -P), which will use it to find the program.
To start a new process (in other words, a subprocess) from Python, you create a Popen object. Its first argument is the program to run, given either as a list containing the program and its arguments or as a single string. Further optional keyword arguments, such as env, control how the process is started.
If you look at the docs for Popen, it takes an env parameter:
If env is not None, it must be a mapping that defines the environment variables for the new process; these are used instead of inheriting the current process’ environment, which is the default behavior.
You've written a function that extracts the environment you want from your sourced scripts and puts it into a dict. Just pass the result as the env argument to the subprocess calls that need it. For example:
env = {}
env.update(os.environ)
env.update(source('~/scripts/mySetUpFreeSurfer.sh'))
env.update(source('/usr/local/freesurfer/FreeSurferEnv.sh'))
# …
check_output(cmd, shell=True, env=env)
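To see the effect of the env parameter in isolation, here is a minimal sketch. The helper name run_with_env and the value given to TEST_ENV_VAR are illustrative assumptions, not part of the original script, and the example assumes a POSIX shell is available.

```python
import os
import subprocess


def run_with_env(cmd, extra_env):
    """Run a shell command with os.environ plus some extra variables.

    run_with_env is a hypothetical helper written for this illustration.
    """
    env = dict(os.environ)   # start from the current environment
    env.update(extra_env)    # add the variables the child needs
    out = subprocess.check_output(cmd, shell=True, env=env)
    return out.decode("utf8").strip()


# The child shell expands $TEST_ENV_VAR because it was passed in via env,
# even though the parent never exported it.
result = run_with_env('echo "$TEST_ENV_VAR"',
                      {"TEST_ENV_VAR": "/tmp/testFreeSurferEnv"})
print(result)
```

The parent's own os.environ is left untouched here; only the child process sees the extra variable.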
Regarding
If I were doing this directly at the command line, I'd "source" a script called mySetUpFreeSurfer.sh that does nothing but set three environment variables, and then "source" another script, FreeSurferEnv.sh.
I think you would be better off using Python to automate the process of writing a shell script newscript.sh, and then calling this script with one call to subprocess.check_output (instead of many calls to Popen, check_output, call, etc.):
newscript.sh:
#!/bin/bash
source ~/scripts/mySetUpFreeSurfer.sh
source /usr/local/freesurfer/FreeSurferEnv.sh
recon-all -i /media/foo/bar -subjid s1001
...
and then calling
subprocess.check_output(['newscript.sh'])
import subprocess
import tempfile
import os
import stat

with tempfile.NamedTemporaryFile(mode='w', delete=False) as f:
    f.write('''\
#!/bin/bash
source ~/scripts/mySetUpFreeSurfer.sh
source /usr/local/freesurfer/FreeSurferEnv.sh
''')
    root = "/media/foo/"
    for sub_dir in os.listdir(root):
        sub = "s" + sub_dir[0:4]
        anat_dir = os.path.join(root, sub_dir, "anatomical")
        for directory in os.listdir(anat_dir):
            time_dir = os.path.join(anat_dir, directory)
            for d in os.listdir(time_dir):
                dicoms_dir = os.path.join(time_dir, d, 'dicoms')
                dicom_list = os.listdir(dicoms_dir)
                dicom = dicom_list[0]
                path = os.path.join(dicoms_dir, dicom)
                cmd1 = "recon-all -i {} -subjid {}\n".format(path, sub)
                f.write(cmd1)
                cmd2 = "recon-all -all -subjid {}\n".format(sub)
                f.write(cmd2)

filename = f.name
os.chmod(filename, stat.S_IRUSR | stat.S_IXUSR)
subprocess.call([filename])
os.unlink(filename)
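If writing a temporary file feels heavy for a single command, a variant of the same idea is to source the setup scripts and run the command inside one bash -c invocation. The following is only a sketch: run_after_sourcing and the DEMO_HOME variable are made-up names for illustration, the throwaway script stands in for FreeSurferEnv.sh, and it assumes bash is installed and Python 3.3+ (for shlex.quote).

```python
import os
import shlex
import subprocess
import tempfile


def run_after_sourcing(setup_scripts, command):
    """Source each setup script, then run `command`, all in one bash process.

    Hypothetical helper: because everything happens in a single shell,
    variables exported by the scripts are visible to `command`.
    """
    parts = ["source {}".format(shlex.quote(s)) for s in setup_scripts]
    parts.append(command)
    return subprocess.check_output(["bash", "-c", " && ".join(parts)])


# A throwaway setup script standing in for the real environment scripts.
with tempfile.NamedTemporaryFile(mode="w", suffix=".sh", delete=False) as f:
    f.write("export DEMO_HOME=/opt/demo\n")
    setup = f.name

try:
    output = run_after_sourcing([setup], 'echo "$DEMO_HOME"')
    output = output.decode("utf8").strip()
finally:
    os.unlink(setup)

print(output)
```

Because shlex.quote prevents tilde expansion, a path such as ~/scripts/mySetUpFreeSurfer.sh would need os.path.expanduser applied before it is passed in.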
By the way,
def source(script, update=1):
    pipe = Popen(". %s; env" % script, stdout=PIPE, shell=True)
    data = pipe.communicate()[0]
    env = dict((line.split("=", 1) for line in data.splitlines()))
    if update:
        os.environ.update(env)
    return env
is broken. For example, if script contains something like
VAR=`ls -1`
export VAR
then
. script; env
may return output like
VAR=file1
file2
file3
which will result in source(script) raising a ValueError:
    env = dict((line.split("=", 1) for line in data.splitlines()))
ValueError: dictionary update sequence element #21 has length 1; 2 is required
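The failure can be reproduced without any shell at all by feeding the same line-based parsing a string shaped like the output above. This is a small sketch; parse_env_by_lines and the sample text are illustrative assumptions, not the asker's real environment.

```python
def parse_env_by_lines(data):
    """Parse `env` output the way the original source() does: one line per variable."""
    return dict(line.split("=", 1) for line in data.splitlines())


# Simulated `env` output in which VAR holds a three-line value.
sample = "HOME=/home/katie\nVAR=file1\nfile2\nfile3\n"

try:
    parse_env_by_lines(sample)
    failed = False
    message = ""
except ValueError as exc:
    # "file2" has no "=", so split() yields a single-element list
    # and dict() cannot build a key/value pair from it.
    failed = True
    message = str(exc)

print(failed, message)
```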
There is a way to fix source: have env separate environment variables with a zero byte instead of the ambiguous newline:
def source(script, update=True):
    """
    http://pythonwise.blogspot.fr/2010/04/sourcing-shell-script.html (Miki Tebeka)
    http://stackoverflow.com/questions/3503719/#comment28061110_3505826 (ahal)
    """
    import subprocess
    import os
    proc = subprocess.Popen(
        ['bash', '-c', 'set -a && source {} && env -0'.format(script)],
        stdout=subprocess.PIPE, shell=False)
    output, err = proc.communicate()
    output = output.decode('utf8')
    env = dict((line.split("=", 1) for line in output.split('\x00') if line))
    if update:
        os.environ.update(env)
    return env
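The parsing step of that fix can be checked on its own with hand-written bytes, which avoids depending on bash or on an env that supports -0. A minimal sketch follows; parse_null_separated_env and the sample bytes are assumptions made for illustration.

```python
def parse_null_separated_env(raw):
    """Parse bytes shaped like `env -0` output: NUL-terminated NAME=value entries."""
    text = raw.decode("utf8")
    # Split on the zero byte, not on newlines, so multi-line values stay whole.
    return dict(entry.split("=", 1) for entry in text.split("\x00") if entry)


# Simulated `env -0` output; VAR contains embedded newlines.
sample = b"HOME=/home/katie\x00VAR=file1\nfile2\nfile3\x00"
env = parse_null_separated_env(sample)

print(sorted(env))
print(repr(env["VAR"]))
```

The multi-line value that broke the line-based parser comes through as a single entry, because only the zero byte is treated as a separator.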
Fixable or not, however, you are still probably better off constructing a conglomerate shell script (as shown above) than you would be parsing env and passing env dicts to subprocess calls.