
Why avoid subshells?

Tags:

bash

subshell

I've seen a lot of answers and comments on Stack Overflow that mention doing something to avoid a subshell. In some cases, a functional reason is given (most often, the need to read, outside the subshell, a variable that was assigned inside it), but in other cases the avoidance seems to be treated as an end in itself. For example:

  • union of two columns of a tsv file
    suggesting { ... ; } | ... rather than ( ... ) | ..., so there's a subshell either way.

  • unhide hidden files in unix with sed and mv commands

  • Linux bash script to copy files
    explicitly stating, "the goal is just to avoid a subshell"

Why is this? Is it for style/elegance/beauty? For performance (avoiding a fork)? For preventing likely bugs? Something else?

asked Feb 24 '14 by ruakh



2 Answers

There are a few things going on.

First, forking a subshell might be unnoticeable when it happens only once, but if you do it in a loop, it adds up to a measurable performance impact. The impact is also greater on platforms such as Windows, where forking is not as cheap as it is on modern Unix-likes.

Second, forking a subshell means you have more than one context, and information is lost in switching between them -- if you change your code to set a variable in a subshell, that variable is lost when the subshell exits. Thus, the more your code has subshells in it, the more careful you have to be when modifying it later to be sure that any state changes you make will actually persist.
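A minimal illustration of that state loss: the classic case is a pipeline feeding a while loop, since the loop body runs in a subshell. The here-string workaround shown afterward is one common fix, not the only one.

```shell
#!/usr/bin/env bash
count=0
echo "one two three" | while read -r _; do
  count=$((count+1))          # runs in a subshell created by the pipeline
done
echo "after pipeline: $count"   # still 0 -- the increment was lost

# Workaround: feed the loop without a pipe, so it runs in the current shell.
count=0
while read -r _; do
  count=$((count+1))
done <<< "one two three"
echo "after here-string: $count"  # 1 -- the change persisted
```

(In bash, `shopt -s lastpipe` in a non-interactive shell is another way to keep the last pipeline stage in the current shell.)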

See BashFAQ #24 for some examples of surprising behavior caused by subshells.

answered Nov 16 '22 by Charles Duffy


Sometimes examples are helpful.

f='fred';y=0;time for ((i=0;i<1000;i++));do if [[ -n "$( grep 're' <<< $f )" ]];then ((y++));fi;done;echo $y

real    0m3.878s
user    0m0.794s
sys 0m2.346s
1000

f='fred';y=0;time for ((i=0;i<1000;i++));do if [[ -z "${f/*re*/}" ]];then ((y++));fi;done;echo $y

real    0m0.041s
user    0m0.027s
sys 0m0.001s
1000

f='fred';y=0;time for ((i=0;i<1000;i++));do if grep -q 're' <<< $f ;then ((y++));fi;done >/dev/null;echo $y

real    0m2.709s
user    0m0.661s
sys 0m1.731s
1000

As you can see, in this case, the difference between using grep in a subshell and parameter expansion to do the same basic test is close to 100x in overall time.
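For a like-for-like comparison outside the timing loop, here is a sketch of the same substring test done three ways; the glob and regex forms are bash builtins and fork nothing at all:

```shell
#!/usr/bin/env bash
f='fred'

# Subshell version: forks grep once per test.
if [[ -n "$(grep 're' <<< "$f")" ]]; then sub=yes; else sub=no; fi

# Builtin glob match: no fork.
if [[ $f == *re* ]]; then glob=yes; else glob=no; fi

# Builtin regex match: also no fork.
if [[ $f =~ re ]]; then regex=yes; else regex=no; fi

echo "$sub $glob $regex"   # yes yes yes -- all three agree on the result
```

The three tests are equivalent for this input; only the first pays for a fork and an exec on every iteration.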

Following up on the question, and prompted by the discussion in the comments, I also checked the example from this post: https://unix.stackexchange.com/questions/284268/what-is-the-overhead-of-using-subshells

time for((i=0;i<10000;i++)); do echo "$(echo hello)"; done >/dev/null 
real    0m12.375s
user    0m1.048s
sys 0m2.822s

time for((i=0;i<10000;i++)); do echo hello; done >/dev/null 
real    0m0.174s
user    0m0.165s
sys 0m0.004s

This is actually far worse than I expected: almost two orders of magnitude slower in overall time, and almost three orders of magnitude slower in system-call time, which is absolutely incredible. Note that echo is a bash builtin (https://www.gnu.org/software/bash/manual/html_node/Bash-Builtins.html), so the entire difference comes from the fork for the command substitution.

The point of demonstrating this is that subshell-based tests using grep, sed, or gawk (or even a builtin like echo wrapped in a command substitution) are an easy habit to fall into when hacking quickly; for me it's a bad habit. It's worth realizing that this carries a significant performance hit, and it's probably worth avoiding these constructs whenever bash builtins can handle the job natively.
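As a sketch of what "builtins handling the job natively" can look like, two common subshell habits, `dirname` and `basename` in command substitutions, have parameter-expansion equivalents (illustrative example; the variable names are mine):

```shell
#!/usr/bin/env bash
path='/usr/local/bin/bash'

# Habitual subshell versions (each forks an external command):
#   dir=$(dirname "$path");  base=$(basename "$path")

# Builtin parameter-expansion equivalents (no fork):
dir=${path%/*}     # strip shortest trailing /component  -> like dirname
base=${path##*/}   # strip longest leading  */           -> like basename

echo "$dir $base"  # /usr/local/bin bash
```

(The expansions don't handle every edge case `dirname`/`basename` do, such as a bare `/` or a path with no slash, so they are a drop-in only when the input shape is known.)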

By carefully reviewing a large program's use of subshells and replacing them with other methods where possible, I was able to cut about 10% off the overall execution time in a just-completed round of optimizations. (This is not the first or the last time I have done this; the program has already been optimized several times, so gaining another 10% is actually quite significant.)

So it's worth being aware of.

Because I was curious, I wanted to confirm what 'time' is telling us here: https://en.wikipedia.org/wiki/Time_(Unix)

The total CPU time is the combination of the amount of time the CPU or CPUs spent performing some action for a program and the amount of time they spent performing system calls for the kernel on the program's behalf. When a program loops through an array, it is accumulating user CPU time. Conversely, when a program executes a system call such as exec or fork, it is accumulating system CPU time.

As you can see, particularly in the echo loop test, the cost of the forks is very high in terms of system calls to the kernel; those forks really add up (roughly 700x more time spent on sys calls).

I'm in an ongoing process of resolving some of these issues, so these questions are quite relevant to me and to the global community of users of the program in question. In other words, this is not an arcane academic point for me; it's real-world, with real impacts.

answered Nov 16 '22 by Lizardx