How does subprocess.call() work with shell=False?

Tags: python, linux, bash, shell, subprocess

I am using Python's subprocess module to call some Linux command line functions. The documentation explains the shell=True argument as:

> If shell is True, the specified command will be executed through the shell

There are two examples, which seem the same to me from a descriptive viewpoint (i.e. both of them call some command-line command), but one of them uses shell=True and the other does not:

```
>>> subprocess.call(["ls", "-l"])
0
>>> subprocess.call("exit 1", shell=True)
1
```

My question is:

What does running the command with shell=False do, in contrast to shell=True?
I was under the impression that subprocess.call and check_call and check_output all must execute the argument through the shell. In other words, how can it possibly not execute the argument through the shell?

It would also be helpful to get some examples of:

Things that can be done with shell=True that can't be done with shell=False and why they can't be done.
Vice versa (although it seems that there are no such examples)
Things for which it does not matter whether shell=True or False and why it doesn't matter

asked May 15 '17 23:05

dkv

2 Answers

UNIX programs start each other with the following three calls, or derivatives/equivalents thereto:

fork() - Create a new copy of yourself.
exec() - Replace yourself with a different program (do this if you're the copy!).
wait() - Wait for another process to finish (optional, if not running in background).

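Thus, with shell=False, you do just that. The sketch below uses Python's os module; omit the wait step if the invocation is not a blocking one such as subprocess.call():

```
import os

pid = os.fork()
if pid == 0:
    # We're the child process, not the parent: replace ourselves with ls.
    os.execlp("ls", "ls", "-l")
else:
    # We're the parent: wait for the child to exit and get its exit status.
    _, status = os.waitpid(pid, 0)
    retval = os.waitstatus_to_exitcode(status)  # Python 3.9+
```

whereas with shell=True, you do this:

```
import os

pid = os.fork()
if pid == 0:
    # The child becomes sh, which parses and runs the command string.
    os.execlp("sh", "sh", "-c", "ls -l")
else:
    _, status = os.waitpid(pid, 0)
    retval = os.waitstatus_to_exitcode(status)  # Python 3.9+
```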

Note that with shell=False, the command we executed was ls, whereas with shell=True, the command we executed was sh.

That is to say:

```
subprocess.Popen(foo, shell=True)
```

is, on POSIX systems, equivalent to:

```
subprocess.Popen(
    ["/bin/sh", "-c"] + ([foo] if isinstance(foo, str) else list(foo)),
    shell=False)
```

In other words, you execute a copy of /bin/sh and direct that copy to parse the string into an argument list and run the resulting command (ls -l in the example above) itself.
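
One way to convince yourself of this equivalence on a POSIX system is to run the same command string both ways and compare the results. This is only a minimal sketch: the helper name run_both_ways and the command string are made up for illustration, and it assumes /bin/sh exists.

```python
import subprocess


def run_both_ways(command):
    """Run a command string via shell=True and via an explicit /bin/sh -c."""
    via_flag = subprocess.run(command, shell=True,
                              capture_output=True, text=True)
    via_sh = subprocess.run(["/bin/sh", "-c", command],
                            capture_output=True, text=True)
    return via_flag, via_sh


# The string uses shell syntax (;) and a shell builtin (exit).
via_flag, via_sh = run_both_ways("echo hello; exit 3")
print(via_flag.stdout == via_sh.stdout)
print(via_flag.returncode == via_sh.returncode)
```

Both comparisons should come out True, since in both cases it is sh that parses the string and runs it.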

So, why would you use shell=True?

You're invoking a shell builtin.

For instance, the exit command is actually part of the shell itself, rather than an external command. That said, this is a fairly small set of commands, and it's rare for them to be useful in the context of a shell instance that only exists for the duration of a single subprocess.call() invocation.
You have some code with shell constructs (i.e. redirections) that would be difficult to emulate without it.

If, for instance, your command is cat one two >three, the syntax >three is a redirection: It's not an argument to cat, but an instruction to the shell to set stdout=open('three', 'w') when running the command ['cat', 'one', 'two']. If you don't want to deal with redirections and pipelines yourself, you need a shell to do it.

A slightly trickier case is cat foo bar | baz. To do that without a shell, you need to start both sides of the pipeline yourself: p1 = Popen(['cat', 'foo', 'bar'], stdout=PIPE), p2=Popen(['baz'], stdin=p1.stdout).
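
Both the redirection and the pipeline can be handled from Python without a shell. The following is a rough sketch rather than a canonical recipe: the temporary files stand in for one, two and three, and sort -r is an assumed stand-in for the placeholder command baz.

```python
import os
import subprocess
import tempfile

# Set up throwaway input files (illustrative names only).
workdir = tempfile.mkdtemp()
paths = {name: os.path.join(workdir, name) for name in ("one", "two", "three")}
with open(paths["one"], "w") as f:
    f.write("first\n")
with open(paths["two"], "w") as f:
    f.write("second\n")

# Equivalent of: cat one two >three
with open(paths["three"], "w") as out:
    subprocess.run(["cat", paths["one"], paths["two"]], stdout=out, check=True)
with open(paths["three"]) as f:
    redirected = f.read()

# Equivalent of: cat one two | sort -r
p1 = subprocess.Popen(["cat", paths["one"], paths["two"]],
                      stdout=subprocess.PIPE)
p2 = subprocess.Popen(["sort", "-r"], stdin=p1.stdout,
                      stdout=subprocess.PIPE, text=True)
p1.stdout.close()  # let cat receive SIGPIPE if sort exits early
pipeline_output, _ = p2.communicate()
p1.wait()

print(redirected)
print(pipeline_output)
```

Closing p1.stdout in the parent matters: otherwise the parent keeps the read end of the pipe open, and the first process would never notice that the second one has gone away.
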
You don't give a damn about security bugs.

...okay, that's a little bit too strong, but not by much. Using shell=True is dangerous. You can't do this: Popen('cat -- %s' % (filename,), shell=True) without a shell injection vulnerability: If your code were ever invoked with a filename containing $(rm -rf ~), you'd have a very bad day. On the other hand, ['cat', '--', filename] is safe with all possible filenames: The filename is purely data, not parsed as source code by a shell or anything else.
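
To see the difference without risking anything, here is a small sketch with a harmless payload. The value $(echo INJECTED) stands in for something hostile like $(rm -rf ~), and printf is used in place of cat so that no files are needed; both choices are assumptions made purely for the demonstration.

```python
import subprocess

# Harmless stand-in for a hostile, user-supplied value.
filename = "$(echo INJECTED)"

# Unsafe: the value is pasted into shell source code, so the shell executes it.
unsafe = subprocess.run("printf '%s\\n' " + filename, shell=True,
                        capture_output=True, text=True)

# Safe: the value is a single argv entry and is never parsed by a shell.
safe = subprocess.run(["printf", "%s\\n", filename],
                      capture_output=True, text=True)

print(repr(unsafe.stdout))
print(repr(safe.stdout))
```

In the unsafe call the shell performs the command substitution and prints INJECTED; in the safe call the text $(echo INJECTED) is printed as-is.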

It is possible to write safe scripts in shell, but you need to be careful about it. Consider the following:
```
filenames = ['file1', 'file2'] # these can be user-provided
subprocess.Popen(['cat -- "$@" | baz', '_'] + filenames, shell=True)
```
That code is safe (well -- as safe as letting a user read any file they want ever is), because it's passing your filenames out-of-band from your script code -- but it's safe only because the string being passed to the shell is fixed and hardcoded, and the parameterized content is external variables (the filenames list). And even then, it's "safe" only to a point -- a bug like Shellshock that triggers on shell initialization would impact it as much as anything else.
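
As a rough check of that claim, the sketch below feeds deliberately awkward filenames through the same fixed script. The file names and contents are invented for the test, and sort replaces the placeholder baz.

```python
import os
import subprocess
import tempfile

# Create files whose names contain a space and shell metacharacters.
workdir = tempfile.mkdtemp()
names = ["plain.txt", "with space.txt", "$(echo INJECTED).txt"]
filenames = []
for index, name in enumerate(names):
    path = os.path.join(workdir, name)
    with open(path, "w") as f:
        f.write("line %d\n" % index)
    filenames.append(path)

# The script text is a fixed literal; '_' becomes $0 and the paths become "$@".
result = subprocess.run(['cat -- "$@" | sort', "_"] + filenames,
                        shell=True, capture_output=True, text=True)
print(result.stdout)
```

Every file is read under its literal name, and nothing inside the names is executed, because the names never become part of the script text.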

answered Oct 18 '22 22:10

Charles Duffy

> I was under the impression that subprocess.call and check_call and check_output all must execute the argument through the shell.

No, subprocess is perfectly capable of starting a program directly (via an operating system call). It does not need a shell.
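
A small sketch of what "directly" means in practice, using the question's own exit 1 example. It assumes a POSIX system; sys.executable is used as the directly started program only because it is guaranteed to exist wherever the script runs.

```python
import subprocess
import sys

# shell=False (the default): the list is the argv of the program to execute.
direct_status = subprocess.call([sys.executable, "-c", "raise SystemExit(0)"])

# exit is a shell builtin, so it only means something when a shell runs it.
shell_status = subprocess.call("exit 1", shell=True)

# Without a shell, Python looks for an executable file literally named "exit".
try:
    subprocess.call(["exit", "1"])
    found_exit_program = True
except FileNotFoundError:
    found_exit_program = False

print(direct_status, shell_status, found_exit_program)
```

On a typical Linux system there is no program called exit on the PATH, so the last call is expected to raise FileNotFoundError, though that depends on what is installed.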

> Things that can be done with shell=True that can't be done with shell=False

You can use shell=False for any command that simply runs some executable optionally with some specified arguments.

You must use shell=True if your command string relies on shell features. These include pipelines (|), redirections, and compound statements combined with ;, &&, or ||.

Thus, one can use shell=False for a command like grep string file. But a command like grep string file | xargs something will, because of the |, require shell=True.

Because the shell has powerful features that Python programmers do not always find intuitive, it is considered better practice to use shell=False unless you truly need a shell feature. Pipelines, for example, are not truly needed, because they can also be built using subprocess's PIPE feature.
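
The same goes for compound statements. As an illustration, something like first && second can be approximated in plain Python by checking the exit status yourself. The helper run_and below is a hypothetical name, not part of subprocess, and the two commands are throwaway Python one-liners chosen so the sketch runs anywhere.

```python
import subprocess
import sys


def run_and(first, second):
    """Rough equivalent of `first && second`: run second only if first succeeds."""
    status = subprocess.call(first)
    if status != 0:
        return status
    return subprocess.call(second)


ok = [sys.executable, "-c", "raise SystemExit(0)"]
fail = [sys.executable, "-c", "raise SystemExit(3)"]

print(run_and(ok, fail))   # second command runs and fails
print(run_and(fail, ok))   # second command is skipped
```

A first || second variant would simply invert the check and run the second command only when the first one returns a nonzero status.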

answered Oct 18 '22 23:10

John1024
