
How to Iterate Null Separated Results in non-Bash Shell

Tags:

bash

shell

Okay, so I have a script processing the null-separated output of find, and I can easily process this using a bash shell like so:

#!/bin/sh
find "$1" -print0 | while read -rd '' path; do echo "$path"; done

A fairly silly example, since it just converts the results to newlines anyway, but it should give you an idea of what I'm looking to do. This basic method works great, and avoids potential issues caused by file names containing newlines on various file systems.

However, I need to do the same thing in a non-bash shell, which means I lose support for read -d. So, without resorting to bash-specific (or other shell-specific) features, is there a way I can process null-separated results similarly to the above?

If not, what is the best way to protect myself against newlines in results? I was thinking I could perhaps use the -exec option of find to replace newlines in file names with some kind of escaped value, but I'm not sure of the best way to find and replace the newlines (I can't use tr, for example) or what replacement to use, which is why null characters are the best option if available.

asked Mar 31 '14 by Haravikk

2 Answers

See How can I find and safely handle file names containing newlines, spaces or both?.

You can e.g. use find -exec:

find [...] -exec <command> {} \;

or xargs -0:

find [...] -print0 | xargs -r0 <command>
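For instance, here is a self-contained sketch (the temporary directory and file name are made up for illustration) showing that the null-separated route survives a newline embedded in a file name:

```shell
# Create a throwaway directory containing a file whose name spans two lines.
dir=$(mktemp -d)
touch "$dir/bad
name.txt"
# With -print0/-0, the whole name reaches the command as one argument:
find "$dir" -type f -print0 | xargs -r0 -I{} echo "found: {}"
rm -rf "$dir"
```

On GNU xargs, -r prevents running the command on empty input; BSD xargs behaves that way by default.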

Note that in your above example you still need to clear IFS, or you will trim off leading/trailing whitespace:

while IFS= read -rd '' file; do
   do_something_with "${file}"
done
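A quick demonstration of the trimming (using plain read so it also runs in non-bash shells; the input string is made up):

```shell
# With IFS cleared, surrounding whitespace in the input survives:
printf '  padded name  \n' | { IFS= read -r f; printf '[%s]\n' "$f"; }
# prints "[  padded name  ]"

# With the default IFS, read strips leading and trailing whitespace:
printf '  padded name  \n' | { read -r f; printf '[%s]\n' "$f"; }
# prints "[padded name]"
```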

You are right, it's a real bummer that this use of read only works properly in bash. I usually don't give a damn about possible newlines in filenames and just make sure that otherwise-portable code doesn't break if they occur (as opposed to ignoring the problem and having your script explode), which I believe suffices for most scenarios, e.g.

while IFS= read -r file; do
    [ -e "${file}" ] || continue # skip over truncated filenames due to newlines
    do_something_file "${file}"
done < <(find [...])

or use globbing (when possible) which behaves correctly:

for file in *.foo; do
    [ -e "${file}" ] || continue # or use nullglob
    do_something_file "${file}"
done
answered Oct 18 '22 by Adrian Frühwirth


1. Use zsh

The simplest solution is to use zsh, a non-bash shell that supports reading null-separated values via read -d "" (since version 4.2, released in 2004) and is the only mainstream shell that can store null bytes in variables. Moreover, in zsh the last component of a pipeline is not run in a subshell, so variables set there are not lost. We can simply write:

#!/usr/bin/env zsh
find . -print0 |while IFS="" read -r -d "" file; do
  echo "$file"
done

With zsh we can also easily avoid the problem of null separators altogether (at least in the case of find . -print) by using setopt globdots, which makes globs match hidden files, and **, which recurses into subdirectories. This works in basically all versions of zsh, even those older than 4.2:

#!/usr/bin/env zsh
setopt globdots
for file in **/*; do
  echo "$file"
done

2. Use a POSIX shell and od

2.1 Use pipes

A general, POSIX-compatible solution for iterating over null-separated values needs to convert the input in such a way that no information is lost and the nulls become something easier to process. We can use od to dump the octal values of all input bytes, and easily convert the data back using printf:

#!/usr/bin/env sh

find . -print0 |od -An -vto1 |xargs printf ' %s' \
               |sed 's/ 000/@/g' |tr @ '\n' \
               |while IFS="" read -r file; do
  file=`printf '\134%s' $file`
  file=`printf "$file@"`
  file="${file%@}"
  echo "$file"
done
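To make the od encoding less opaque, here is a minimal round-trip on a made-up string: od dumps every byte as a three-digit octal value, printf '\134%s' puts a backslash in front of each value, and a second printf decodes the resulting escape sequences:

```shell
# Encode a string (including an embedded newline) as space-separated octal bytes:
s='line one
line two'
enc=$(printf '%s' "$s" | od -An -vto1 | xargs printf ' %s')
# enc now holds values like " 154 151 ... 012 ...", one per input byte.

# Decode: turn " 154" into "\154" etc., then let printf interpret the escapes.
fmt=$(printf '\134%s' $enc)   # intentionally unquoted: split on spaces
decoded=$(printf "$fmt")
[ "$decoded" = "$s" ] && echo "round trip OK"
```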

2.2 Use a variable to store intermediate results

Note that the while loop will run in a subshell (at least in shells other than zsh and the original, non-public-domain Korn shell), which means that variables set in that loop won't be visible in the rest of the code. If that's unacceptable, the while loop can be run in the main shell, with its input stored in a variable:

#!/usr/bin/env sh

VAR=`find . -print0 |od -An -vto1 |xargs printf ' %s' \
                     |sed 's/ 000/@/g' |tr @ '\n'`
while IFS="" read -r file; do
  file=`printf '\134%s' $file`
  file=`printf "$file@"`
  file="${file%@}"
  echo "$file"
done <<EOF
$VAR
EOF

2.3 Use a temporary file to store intermediate results

If the output of the find command is very long, the script may be unable to store it in a variable and fail. Moreover, most shells use temporary files to implement heredocs anyway, so instead of going through a variable we might as well write to a temporary file explicitly, and avoid the problems of storing intermediate results in variables.

#!/usr/bin/env sh

TMPFILE="/tmp/$$_`awk 'BEGIN{srand(); print rand()}'`"
find . -print0 |od -An -vto1 |xargs printf ' %s' \
               |sed 's/ 000/@/g' |tr @ '\n' >"$TMPFILE"
while IFS="" read -r file; do
  file=`printf '\134%s' $file`
  file=`printf "$file@"`
  file="${file%@}"
  echo "$file"
done <"$TMPFILE"
rm -f "$TMPFILE"

2.4 Use named pipes

We may use named pipes to solve the above two problems: now reading and writing can be done in parallel, and we don't need to store intermediate results in variables. Note, however, that this might not work in Cygwin.

#!/usr/bin/env sh

TMPFILE="/tmp/$$_`awk 'BEGIN{srand(); print rand()}'`"
mknod "$TMPFILE" p
{
  exec 3>"$TMPFILE"
  find . -print0 |od -An -vto1 |xargs printf ' %s' \
                 |sed 's/ 000/@/g' |tr @ '\n' >&3
} &
while IFS="" read -r file; do
  file=`printf '\134%s' $file`
  file=`printf "$file@"`
  file="${file%@}"
  echo "$file"
done <"$TMPFILE"
rm -f "$TMPFILE"

3. Modify the above solutions to work with the original Bourne shell

The above solutions should work in any POSIX shell, but they fail in the original Bourne shell, which is the default /bin/sh in Solaris 10 and older. That shell doesn't support the ${var%pattern} substitution, so trailing newlines in filenames need to be preserved in another way, e.g.:

#!/usr/bin/env sh

TMPFILE="/tmp/$$_`awk 'BEGIN{srand(); print rand()}'`"
mknod "$TMPFILE" p
{
  exec 3>"$TMPFILE"
  find . -print0 |od -An -vto1 |xargs printf ' %s' \
                 |sed 's/ 000/@/g' |tr @ '\n' >&3
} &
while read -r file; do
  trailing_nl=""
  for char in $file; do
    if [ X"$char" = X"012" ]; then
      trailing_nl="${trailing_nl}
"
    else
      trailing_nl=""
    fi
  done
  file=`printf '\134%s' $file`
  file=`printf "$file"`
  file="$file$trailing_nl"
  echo "$file"
done <"$TMPFILE"
rm -f "$TMPFILE"

4. Use a separator other than null

As pointed out in the comments, Haravikk's answer is not completely correct. Here is a modified version of his code that handles all sorts of strange situations, such as paths beginning with ~:/\/: and trailing newlines in filenames. Note that it only works for relative pathnames; a similar trick can be done with absolute pathnames by prepending them with /./, but read_path() needs to be changed to handle that. This method is inspired by Rich’s sh (POSIX shell) tricks.

#!/usr/bin/env sh

read_path() {
    path=
    IFS=
    read -r path || return $?
    read -r path_next || return 0
    if [ X"$path" = X"././" ]; then
        path="./"
        read -r path_next || return 0
        return
    fi
    path="./$path"
    while [ X"$path_next" != X"././" ]; do
        path=`printf '%s\n%s' "$path" "$path_next"`
        read -r path_next || return 0
    done
}

find ././ |sed 's,^\./\./,&\n,' |while read_path; do
  echo "$path"
done
answered Oct 18 '22 by michau