 

Why test for equality in sh scripts in an indirect way?

Tags: bash, shell, sh

I often see this construct in sh scripts:

if [ "z$x" = z ]; then echo x is empty; fi

Why don't they just write it like this?

if [ "$x" = "" ]; then echo x is empty; fi
asked Dec 20 '22 at 01:12 by Dog


2 Answers

TL;DR short answer

In this construct:

if [ "z$x" = z ]; then echo x is empty; fi

the z is a guard against funny content in $x, and against a few other problems described below.

If you write it without the z:

if [ "$x" = "" ]; then echo x is empty; fi

and $x contains the string -x, you will get:

if [ "-x" = "" ]; then echo x is empty; fi

and that confuses the hell out of some older implementations of [.

If you further omit the quotes around $x, and $x contains the string -f foo -o x, you will get:

if [ -f foo -o x = "" ]; then echo x is empty; fi

and now it silently checks for something completely different.

The guard prevents these errors, whether honest human mistakes or possibly malicious attacks, from falling through silently. With the guard you either get the correct result or an error message. Read on for an elaborate explanation.


Elaborate explanation

The z in

if [ "z$x" = z ]; then echo x is empty; fi

is called a guard.

To explain why you want the guard, I first want to explain the syntax of the bash conditional if. It is important to understand that [ is not part of the syntax. It is a command: another name for the test command (not an alias in the sense of the alias builtin). In most current shells it is also a builtin command.

The grammar rule for if is roughly as follows:

if command; then morecommands; else evenmorecommands; fi

(the else part is optional)

command can be any command. Really any command. What bash does when it encounters an if is roughly as follows:

  1. Execute command.
  2. Check the exit status of command.
  3. If exit status is 0 then execute morecommands. If exit status is anything else, and the else part exists, then execute evenmorecommands.
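To see that if only cares about the exit status, here is a minimal sketch where a small shell function stands in for the command. The name exit_with is hypothetical; return sets the function's exit status the same way a program's exit code would.

```shell
#!/bin/sh
# exit_with (hypothetical helper) does nothing except return
# the status given as its first argument.
exit_with() {
    return "$1"
}

# if runs the command, then branches purely on its exit status.
for status in 0 1 3; do
    if exit_with "$status"; then
        echo "status $status: then-branch"
    else
        echo "status $status: else-branch"
    fi
done
```

Only status 0 selects the then-branch; 1 and 3 both select the else-branch.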

Let's try that:

$ if true; then echo yay; else echo boo; fi
yay
$ if wat; then echo yay; else echo boo; fi
bash: wat: command not found
boo
$ if echo foo; then echo yay; else echo boo; fi
foo
yay
$ if cat foo; then echo yay; else echo boo; fi
cat: foo: No such file or directory
boo

Let's try the test command:

$ if test z = z; then echo yay; else echo boo; fi
yay

And the alias [:

$ if [ z = z ]; then echo yay; else echo boo; fi
yay

As you can see, [ is not part of the syntax. It is just a command.

Note that the z here has no special meaning. It is just a string.

Let's try the [ command outside of an if:

$ [ z = z ]

Nothing happens? Actually, [ did run and returned an exit status. You can check the exit status with echo $?.

$ [ z = z ]
$ echo $?
0

Let's try unequal strings:

$ [ z = x ]
$ echo $?
1

Because [ is a command, it accepts parameters just like any other command. In fact, the closing ] is also a parameter, a mandatory one that must come last. If it is missing, the command will complain:

$ [ z = z
bash: [: missing `]'

The message makes it look as if bash is complaining, but it is actually the builtin command [ that complains. We can see more clearly who does the complaining when we invoke the system [:

$ /usr/bin/[ z = z
/usr/bin/[: missing `]'

Interestingly, the system [ doesn't always insist on a closing ]:

$ /usr/bin/[ --version
[ (GNU coreutils) 7.4
...

You need a space before the closing ]; otherwise it will not be recognized as a separate parameter:

$ [ z = z]
bash: [: missing `]'

You also need a space after the [; otherwise bash will think you want to execute a different command:

$ [z = z]
bash: [z: command not found

This is much more obvious when you use test:

$ testz = z
bash: testz: command not found

Remember, [ is just another name for test.
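To see the word list that [ actually receives, you can load the same words into the positional parameters with set --, which only splits them into words and runs nothing. A minimal sketch:

```shell
#!/bin/sh
# Store the words of the command line [ z = z ] as positional parameters,
# split exactly as the shell would split them; nothing is executed here.
set -- [ z = z ]

echo "word count: $#"          # 5: the command name [ plus 4 arguments
echo "command name: $1"        # [
shift
printf 'argument: %s\n' "$@"   # z, =, z and the closing ]

# test and [ are the same command; [ just insists on the final ] argument.
test z = z && echo "test z = z succeeded"
[ z = z ] && echo "[ z = z ] succeeded"
```

This is also why the spaces matter: without them, [z or z] would be a single word, and the shell would pass (or try to run) the wrong thing.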

[ can do more than just compare strings. It can compare numbers:

$ [ 1 -eq 1 ]
$ [ 42 -gt 0 ]

It can also check for the existence of files or directories:

$ [ -f filename ]
$ [ -d dirname ]

See help [ or man [ for more information about the capabilities of [ (or test). man will show you the documentation for the system command. help will show you the documentation for the bash builtin command.
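A small sketch of the numeric and file tests, run against a scratch directory. It assumes a mktemp that supports -d (GNU coreutils, the BSDs and BusyBox do); the file name filename is only a placeholder.

```shell
#!/bin/sh
# Numeric comparisons: -eq (equal), -gt (greater than).
[ 1 -eq 1 ] && echo "1 -eq 1 is true"
[ 42 -gt 0 ] && echo "42 -gt 0 is true"

# -eq compares numbers, = compares strings.
[ 01 -eq 1 ] && echo "01 -eq 1 is true (numbers)"
[ 01 = 1 ] || echo "01 = 1 is false (strings)"

# File tests against a scratch directory (assumes mktemp supports -d).
tmpdir=$(mktemp -d) || exit 1
touch "$tmpdir/filename"

[ -f "$tmpdir/filename" ] && echo "-f: regular file exists"
[ -d "$tmpdir" ] && echo "-d: directory exists"
[ -f "$tmpdir" ] || echo "-f is false for a directory"

# Clean up the scratch directory.
rm -r "$tmpdir"
```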

Now that I have covered the bases I can answer your question:

Why do people write this:

if [ "z$x" = z ]; then echo x is empty; fi

and not this:

if [ "$x" = "" ]; then echo x is empty; fi

For brevity, I will strip off the if, because this is only about [.

The z in this construct:

[ "z$x" = z ]

is a guard against funny content of $x in combination with older implementations of [, and/or a guard against human error like forgetting to quote $x.

What happens when $x has funny content like -f?

This

[ "$x" = "" ]

will become

[ "-f" = "" ]

Some older implementations of [ get confused when the first parameter starts with a -. The z makes sure that the first parameter never starts with a -, regardless of the content of $x.

[ "z$x" = "z" ]

will become

[ "z-f" = "z" ]
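Wrapped in a function, the guarded comparison can be tried against the kind of content discussed here. The name is_empty_guarded is hypothetical; the point is that [ always receives a first operand starting with z, so it can never be mistaken for an operator such as -f, ! or (.

```shell
#!/bin/sh
# is_empty_guarded (hypothetical helper): succeeds if $1 is empty.
# The z guard keeps the first operand of = from ever beginning
# with -, ! or (, even on old implementations of [.
is_empty_guarded() {
    [ "z$1" = z ]
}

for x in "" "-f" "-x" "!" "(" "hello"; do
    if is_empty_guarded "$x"; then
        printf '%s\n' "'$x' is empty"
    else
        printf '%s\n' "'$x' is not empty"
    fi
done
```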

What happens when you forget to quote $x? Funny content like -f foo -o x can change the entire meaning of the test.

[ $x = "" ]

will become

[ -f foo -o x = "" ]

The test now checks whether the file foo exists OR whether the literal string x equals the empty string, which is always false, so it effectively becomes [ -f foo ]. The worst part is that you won't even notice, because there is no error message, only an exit status. If $x comes from user input, this can even be used for malicious attacks.
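The meaning changes because of word splitting: the shell breaks an unquoted $x into separate arguments before [ ever sees them. A sketch using a hypothetical show_args helper, which prints each argument on its own line, makes that visible.

```shell
#!/bin/sh
# show_args (hypothetical helper) prints every argument it receives
# on its own line, wrapped in <...> so empty strings stay visible.
show_args() {
    printf '<%s>\n' "$@"
}

x='-f foo -o x'

echo "unquoted:"
show_args $x = ""     # $x is split into four words: -f foo -o x
echo "quoted:"
show_args "$x" = ""   # $x stays one word: -f foo -o x
```

In the unquoted case the helper receives six arguments (-f, foo, -o, x, = and an empty string); [ would receive the same six and parse them as two tests joined by -o.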

With the guarding z, even still without quotes,

[ z$x = z ]

will become

[ z-f foo -o x = z ]

At least you will now get an error message:

$ [ z-f foo -o x = z ]; echo $?
bash: [: too many arguments
2

The guard also helps when a variable is undefined rather than set to the empty string. Some older shells behaved differently for an undefined variable and an empty string. This problem is basically solved, because in modern shells an undefined variable behaves like an empty string (the main exception is set -u, which makes expanding an undefined variable an error).
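If you do need to tell an undefined variable apart from an empty one, POSIX parameter expansion can do it: ${x+set} expands to set when x is defined (even if empty) and to nothing when it is not. A minimal sketch; describe_x is a hypothetical name.

```shell
#!/bin/sh
# describe_x (hypothetical helper) reports the state of the variable x.
describe_x() {
    if [ -z "${x+set}" ]; then
        echo "x is unset"
    elif [ -z "$x" ]; then
        echo "x is set but empty"
    else
        echo "x is set and non-empty"
    fi
}

unset x; describe_x
x=;      describe_x
x=-f;    describe_x
```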

Summary:

The quotes around $x keep it a single argument and help make the undefined case behave like the empty-string case.

The guard before $x helps prevent the other problems mentioned above:

  • Funny content of $x (code injection by a malicious user)
  • Old implementations of [ (which get confused if the string begins with -)
  • Forgetting to quote $x (which allows -f foo -o x to subvert the meaning of the test)
  • Undefined $x (older implementations behave differently if it is undefined)

The guard will either do the right thing or raise an error message.

Modern implementations of [ have fixed some of these problems, and modern shells have solutions for the other cases, but those solutions have pitfalls of their own. The guarding z is not necessary if you are otherwise careful, but it makes avoiding mistakes while writing simple tests much simpler.

See also:

  • bash pitfalls about quoting in tests
  • bash FAQ more details about test
  • more about test
  • more about quoting
  • "test" operator robustness in various shells
answered Jan 12 '23 at 16:01 by Lesmana


For testing zero length, use -z:

if [ -z "$x" ] ; then
    echo x is empty
fi
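As a quick check, here is a sketch that runs -z (and its opposite, -n) over the same kind of awkward values discussed in the other answer. On a POSIX-conforming [, a two-argument test such as [ -z "$x" ] is decided by argument count, so values like -f or ( are treated as plain strings. The loop values are only illustrative.

```shell
#!/bin/sh
# Run -z and -n over a few awkward values. With exactly two arguments,
# a POSIX [ treats the second one as a plain string.
for x in "" "-f" "(" "!" "hello"; do
    if [ -z "$x" ]; then
        printf '%s\n' "'$x': -z is true (empty)"
    fi
    if [ -n "$x" ]; then
        printf '%s\n' "'$x': -n is true (non-empty)"
    fi
done
```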

With bash, you can use its [[ keyword, which does not need quotes around $x:

if [[ -z $x ]] ; then
    echo x is empty
fi

I just found the following in man 1p sh, the documentation of the POSIX shell:

Historical systems have also been unreliable given the common construct:

    test "$response" = "expected string"

One of the following is a more reliable form:

    test "X$response" = "Xexpected string"
    test "expected string" = "$response"

Note that the second form assumes that expected string could not be confused with any unary primary. If expected string starts with '-', '(', '!', or even '=', the first form should be used instead.
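The two more reliable forms from that quote can be wrapped as small predicates. The function names and the expected string are hypothetical; the second form is only safe here because "expected string" does not start with -, (, ! or =.

```shell
#!/bin/sh
# Form 1: prefix both operands with X so neither can look like an operator.
matches_prefixed() {
    test "X$1" = "Xexpected string"
}

# Form 2: put the known-safe literal first and the variable second.
matches_literal_first() {
    test "expected string" = "$1"
}

for response in "expected string" "-f" "(" "nope"; do
    if matches_prefixed "$response"; then r1=yes; else r1=no; fi
    if matches_literal_first "$response"; then r2=yes; else r2=no; fi
    printf '%s\n' "'$response': form 1 -> $r1, form 2 -> $r2"
done
```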

answered Jan 12 '23 at 17:01 by choroba