Is chaining interpreters via shebang lines portable?

Question

Tying a script to a specific interpreter via a so-called shebang line is a well-known practice on POSIX operating systems. For example, if the following script is executed (given sufficient file-system permissions), the operating system launches the /bin/sh interpreter with the file name of the script as its first argument. The shell then executes the commands in the script, skipping over the shebang line, which it treats as a comment.

#! /bin/sh

date -R
echo hello world

Possible output:

Sat, 01 Apr 2017 12:34:56 +0100
hello world

I used to believe that the interpreter (/bin/sh in this example) must be a native executable and cannot be a script itself that, in turn, would require yet another interpreter to be launched.

However, I went ahead and tried the following experiment nonetheless.

Using the following dumb shell saved as /tmp/interpreter.py, …

#! /usr/bin/python3

import sys
import subprocess

for script in sys.argv[1:]:
    with open(script) as istr:
        status = any(
            map(
                subprocess.call,
                map(
                    str.split,
                    filter(
                        lambda s : s and not s.startswith('#'),
                        map(str.strip, istr)
                    )
                )
            )
        )
        if status:
            sys.exit(status)

… and the following script saved as /tmp/script.xyz,

#! /tmp/interpreter.py

date -R
echo hello world

… I was able (after making both files executable) to execute script.xyz.

5gon12eder:/tmp> ls -l
total 8
-rwxr-x--- 1 5gon12eder 5gon12eder 493 Jun 19 01:01 interpreter.py
-rwxr-x--- 1 5gon12eder 5gon12eder  70 Jun 19 01:02 script.xyz
5gon12eder:/tmp> ./script.xyz
Mon, 19 Jun 2017 01:07:19 +0200
hello world

This surprised me. I was even able to launch script.xyz via another script.

So, what I am asking is this:

Is the behavior observed by my experiment portable?
Was the experiment even conducted correctly or are there situations where this doesn't work? How about different (Unix-like) operating systems?
If this is supposed to work, is it true that there is no observable difference between a native executable and an interpreted script as far as invocation is concerned?
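To check a given system quickly, the experiment above can be repeated in a throwaway directory. The following is only a minimal sketch: the function name chained_shebang_works and the contents of the two generated files are made up for illustration, and it assumes the temporary directory allows executing files and that the paths involved contain no spaces and fit within the system's shebang length limit.

```python
import os
import stat
import subprocess
import sys
import tempfile


def chained_shebang_works():
    """Return True if a script whose interpreter is itself a script runs here."""
    with tempfile.TemporaryDirectory() as tmp:
        interpreter = os.path.join(tmp, "interpreter.py")
        script = os.path.join(tmp, "script.xyz")

        # First level: a Python script acting as interpreter. It only reports
        # which file it was asked to interpret (hypothetical stand-in).
        with open(interpreter, "w") as f:
            f.write("#!" + sys.executable + "\n"
                    "import sys\n"
                    "print('interpreting', sys.argv[1])\n")

        # Second level: a script whose shebang names the script above.
        with open(script, "w") as f:
            f.write("#!" + interpreter + "\n"
                    "this body is never parsed by the stand-in\n")

        for path in (interpreter, script):
            os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)

        try:
            result = subprocess.run([script], capture_output=True, text=True)
        except OSError:
            # The exec itself was refused (for example "Exec format error").
            return False
        return result.returncode == 0 and result.stdout.startswith("interpreting")


if __name__ == "__main__":
    print("chained shebang works:", chained_shebang_works())
```

A False result does not by itself prove that the kernel rejects nested interpreters; a temporary directory mounted noexec would produce the same outcome.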

mpez0 · Accepted Answer

New executables in Unix-like operating systems are started by the system call execve(2). The man page for execve includes:

Interpreter scripts
    An interpreter script is  a  text  file  that  has  execute
    permission enabled and whose first line is of the form:

       #! interpreter [optional-arg]

    The interpreter must be a valid pathname for an executable which
    is not itself a script.  If the filename argument  of  execve()
    specifies  an interpreter script, then interpreter will be invoked
    with the following arguments:

       interpreter [optional-arg] filename arg...

    where arg...  is the series of words pointed to by the argv
    argument of execve().

    For portable use, optional-arg should either be absent, or be
    specified as a single word (i.e., it should not contain white
    space);  see  NOTES below.

So within those constraints (Unix-like, optional-arg at most one word), yes, shebang scripts are portable. Read the man page for more details, including other differences in invocation between binary executables and scripts.
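The invocation form quoted from the man page, interpreter [optional-arg] filename arg..., can be sketched in a few lines. This is an illustration rather than what any kernel literally does: the helper name interpreter_argv is made up, and it assumes the Linux behaviour of passing everything after the interpreter name as one single argument. Other systems may split that text differently, which is why a single word is the portable choice.

```python
def interpreter_argv(shebang_line, filename, args):
    """Build the list `interpreter [optional-arg] filename arg...`."""
    if not shebang_line.startswith("#!"):
        raise ValueError("not an interpreter script")
    rest = shebang_line[2:].strip()
    if not rest:
        raise ValueError("missing interpreter")
    # Assumption: Linux-style handling, where the text after the interpreter
    # name becomes ONE argument, embedded spaces included.
    parts = rest.split(None, 1)
    interpreter = parts[0]
    optional = [parts[1]] if len(parts) > 1 else []
    return [interpreter] + optional + [filename] + list(args)


print(interpreter_argv("#! /tmp/interpreter.py", "./script.xyz", []))
# prints ['/tmp/interpreter.py', './script.xyz']
```

With an optional-arg of more than one word, such as "#!/usr/bin/env python3 -u", this sketch hands "python3 -u" to the interpreter as a single argument, which is exactly the case the man page warns about.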

agc · Answer

See boldfaced text below:

This mechanism allows scripts to be used in virtually any context normal compiled programs can be, including as full system programs, and even as interpreters of other scripts. As a caveat, though, some early versions of kernel support limited the length of the interpreter directive to roughly 32 characters (just 16 in its first implementation), would fail to split the interpreter name from any parameters in the directive, or had other quirks. Additionally, some modern systems allow the entire mechanism to be constrained or disabled for security purposes (for example, set-user-id support has been disabled for scripts on many systems). -- WP
And this output from COLUMNS=75 man execve | grep -nA 23 " Interpreter scripts" | head -39 on an Ubuntu 17.04 box, particularly lines #186-#189, which tell us what works on Linux (i.e. scripts can be interpreters, up to four levels deep):

166:   Interpreter scripts
167-       An interpreter script is a text file that has  execute  permission
168-       enabled and whose first line is of the form:
169-
170-           #! interpreter [optional-arg]
171-
172-       The  interpreter  must be a valid pathname for an executable file.
173-       If the filename argument  of  execve()  specifies  an  interpreter
174-       script,  then interpreter will be invoked with the following argu‐
175-       ments:
176-
177-           interpreter [optional-arg] filename arg...
178-
179-       where arg...  is the series of words pointed to by the argv  argu‐
180-       ment of execve(), starting at argv[1].
181-
182-       For  portable  use,  optional-arg  should  either be absent, or be
183-       specified as a single word (i.e.,  it  should  not  contain  white
184-       space); see NOTES below.
185-
186-       Since Linux 2.6.28, the kernel permits the interpreter of a script
187-       to itself be a script.  This permission  is  recursive,  up  to  a
188-       limit  of four recursions, so that the interpreter may be a script
189-       which is interpreted by a script, and so on.
--
343:   Interpreter scripts
344-       A  maximum  line length of 127 characters is allowed for the first
345-       line in an interpreter scripts.
346-
347-       The semantics of  the  optional-arg  argument  of  an  interpreter
348-       script  vary  across implementations.  On Linux, the entire string
349-       following the interpreter name is passed as a single  argument  to
350-       the  interpreter,  and  this string can include white space.  How‐
351-       ever, behavior differs on some other systems.   Some  systems  use
352-       the first white space to terminate optional-arg.  On some systems,
353-       an interpreter script can have multiple arguments, and white  spa‐
354-       ces in optional-arg are used to delimit the arguments.
355-
356-       Linux ignores the set-user-ID and set-group-ID bits on scripts.

Tags: linux, shell, unix, posix, executable

5gon12eder

2 Answers

mpez0

agc
