Okay, so I have a script processing the null-separated output of find, and I can easily process this using a bash shell like so:
#!/bin/bash
find "$1" -print0 | while read -rd '' path; do echo "$path"; done
Fairly silly example since it just converts the results to newlines anyway, but it's just to give you an idea of what I'm looking to do. This basic method works great, and avoids potential issues with file names that contain newlines, which most file systems permit.
However, I need to do the same thing in a non-bash shell, which means I lose support for read -d. So, without resorting to bash-specific (or other shell-specific) features, is there a way that I can process null-separated results similarly to the above?
If not, what is the best way to protect myself against newlines in results? I was thinking I could perhaps use the -exec option of find to replace newlines in file names with some kind of escaped value, but I'm not sure of the best way to find and replace the newlines (I can't use tr, for example) or what replacement to use, which is why null characters are the best option if available.
See How can I find and safely handle file names containing newlines, spaces or both?.
You can e.g. use find -exec:
find [...] -exec <command> {} \;
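If the command accepts multiple files at once, terminating -exec with + instead of \; batches arguments much like xargs does:
find [...] -exec <command> {} +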
or xargs -0:
find [...] -print0 | xargs -r0 <command>
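For example, a cleanup that removes matching files safely (the *.bak pattern is just an illustration):
find . -name '*.bak' -print0 | xargs -r0 rm --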
Note that in your above example you still need to set IFS or you will trim off leading/trailing whitespace:
while IFS= read -rd '' file; do
    do_something_with "${file}"
done
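To see what that trimming looks like, compare the two variants on a made-up record with leading whitespace (bash):
printf '  spaced name\0' | while read -rd '' f; do printf '[%s]\n' "$f"; done      # [spaced name]
printf '  spaced name\0' | while IFS= read -rd '' f; do printf '[%s]\n' "$f"; done # [  spaced name]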
You are right, it's a real bummer that this read only properly works in bash. I usually don't give a damn about possible newlines in filenames and just make sure that otherwise-portable code doesn't break if they occur (as opposed to ignoring the problem and having your script explode), which I believe suffices for most scenarios, e.g.
while IFS= read -r file; do
    [ -e "${file}" ] || continue # skip over truncated filenames due to newlines
    do_something_file "${file}"
done < <(find [...])
or use globbing (when possible), which behaves correctly:
for file in *.foo; do
    [ -e "${file}" ] || continue # or use nullglob
    do_something_file "${file}"
done
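In bash the existence guard can be replaced by the shell option the comment alludes to, a minimal sketch:
shopt -s nullglob
for file in *.foo; do
    do_something_file "${file}"
done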
zsh
The simplest solution is to use zsh, a non-bash shell that supports reading null-separated values via read -d "" (since version 4.2, released in 2004) and the only mainstream shell that can store nulls in variables. Moreover, the last component of a pipeline is not run in a subshell in zsh, so variables set there are not lost. We can simply write:
#!/usr/bin/env zsh
find . -print0 |while IFS="" read -r -d "" file; do
    echo "$file"
done
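A quick way to confirm that zsh really keeps the null in a variable (a minimal sketch):
#!/usr/bin/env zsh
var=$'a\0b'
print -r -- ${#var}    # prints 3: the NUL byte counts as a stored character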
With zsh we can also easily avoid the problem of null separators altogether (at least in the case of find . -print) by using setopt globdots, which makes globs match hidden files, and **, which recurses into subdirectories. This works in basically all versions of zsh, even those older than 4.2:
#!/usr/bin/env zsh
setopt globdots
for file in **/*; do
    echo "$file"
done
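The (D) glob qualifier is an alternative to setopt globdots that makes only a single pattern match dotfiles:
#!/usr/bin/env zsh
for file in **/*(D); do
    echo "$file"
done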
od
A general, POSIX-compatible solution for iterating over null-separated values needs to convert the input so that no information is lost and the nulls become something easier to process. We can use od to dump the octal values of all input bytes and convert the data back with printf:
#!/usr/bin/env sh
find . -print0 |od -An -vto1 |xargs printf ' %s' \
    |sed 's/ 000/@/g' |tr @ '\n' \
    |while IFS="" read -r file; do
    file=`printf '\134%s' $file`   # prefix each octal triplet with a backslash (octal 134)
    file=`printf "$file@"`         # decode the escapes; the @ protects trailing newlines
    file="${file%@}"               # drop the guard character
    echo "$file"
done
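To see what the conversion stage produces, here is a minimal demo with two made-up records, "a b" and "c":
printf 'a b\0c\0' |od -An -vto1 |xargs printf ' %s' |sed 's/ 000/@/g' |tr @ '\n'
It prints one line of octal byte values per record:
 141 040 142
 143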
Note that the while loop will be run in a subshell (at least in shells other than zsh and the original, non-public-domain Korn shell), which means that variables set in that loop won't be visible in the rest of the code. If that's unacceptable, the while loop can be run in the main shell, with its input stored in a variable:
#!/usr/bin/env sh
VAR=`find . -print0 |od -An -vto1 |xargs printf ' %s' \
    |sed 's/ 000/@/g' |tr @ '\n'`
while IFS="" read -r file; do
    file=`printf '\134%s' $file`
    file=`printf "$file@"`
    file="${file%@}"
    echo "$file"
done <<EOF
$VAR
EOF
If the output of the find command is very long, the script may be unable to store it in a variable and can fail. Moreover, most shells use temporary files to implement heredocs anyway, so instead of going through a variable we might as well write to a temporary file explicitly, avoiding the problems of storing intermediate results in variables.
#!/usr/bin/env sh
TMPFILE="/tmp/$$_`awk 'BEGIN{srand(); print rand()}'`"
find . -print0 |od -An -vto1 |xargs printf ' %s' \
    |sed 's/ 000/@/g' |tr @ '\n' >"$TMPFILE"
while IFS="" read -r file; do
    file=`printf '\134%s' $file`
    file=`printf "$file@"`
    file="${file%@}"
    echo "$file"
done <"$TMPFILE"
rm -f "$TMPFILE"
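Where it is available, mktemp (widespread, though not a POSIX utility) is a more robust way to create the temporary file than composing a name by hand, e.g.:
TMPFILE=`mktemp` || exit 1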
We can use a named pipe to solve both of the above problems: reading and writing then happen in parallel, and no intermediate results need to be stored in variables or regular files. Note, however, that this might not work on Cygwin.
#!/usr/bin/env sh
TMPFILE="/tmp/$$_`awk 'BEGIN{srand(); print rand()}'`"
mknod "$TMPFILE" p
{
    exec 3>"$TMPFILE"    # open the FIFO for writing on fd 3
    find . -print0 |od -An -vto1 |xargs printf ' %s' \
        |sed 's/ 000/@/g' |tr @ '\n' >&3
} &
while IFS="" read -r file; do
    file=`printf '\134%s' $file`
    file=`printf "$file@"`
    file="${file%@}"
    echo "$file"
done <"$TMPFILE"
rm -f "$TMPFILE"
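If mknod does not accept the p type on your system, mkfifo is the POSIX-specified way to create the named pipe and can replace that line:
mkfifo "$TMPFILE"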
The above solutions should work in any POSIX shell, but they fail in the original Bourne shell, which is the default /bin/sh on Solaris 10 and older. That shell doesn't support the ${var%pattern} substitution, so trailing newlines in filenames need to be preserved another way, e.g.:
#!/usr/bin/env sh
TMPFILE="/tmp/$$_`awk 'BEGIN{srand(); print rand()}'`"
mknod "$TMPFILE" p
{
    exec 3>"$TMPFILE"
    find . -print0 |od -An -vto1 |xargs printf ' %s' \
        |sed 's/ 000/@/g' |tr @ '\n' >&3
} &
while read file; do    # the Bourne shell's read has no -r; the octal input contains no backslashes anyway
    trailing_nl=""
    for char in $file; do    # record the run of trailing newlines (octal 012)
        if [ X"$char" = X"012" ]; then
            trailing_nl="${trailing_nl}
"
        else
            trailing_nl=""
        fi
    done
    file=`printf '\134%s' $file`
    file=`printf "$file"`        # command substitution strips trailing newlines...
    file="$file$trailing_nl"     # ...so put them back
    echo "$file"
done <"$TMPFILE"
rm -f "$TMPFILE"
As pointed out in the comments, Haravikk's answer is not completely correct. Here is a modified version of his code that handles all sorts of strange situations, such as paths beginning with ~:/\/: and trailing newlines in filenames. Note that it only works for relative pathnames; a similar trick can be done with absolute pathnames by prepending them with /./, but read_path() would need to be changed to handle that. This method is inspired by Rich's sh (POSIX shell) tricks.
#!/usr/bin/env sh
read_path() {
    path=
    IFS=
    read -r path || return $?               # EOF: no more paths
    read -r path_next || return 0
    if [ X"$path" = X"././" ]; then         # first record: the search root itself
        path="./"
        read -r path_next || return 0       # consume the next ././ marker
        return
    fi
    path="./$path"
    while [ X"$path_next" != X"././" ]; do  # rejoin lines split by newlines in the name;
        path=`printf '%s\n%s' "$path" "$path_next"`
        read -r path_next || return 0       # the loop stops at the next ././ marker
    done
}
# note: \n in the sed replacement is a GNU extension; portable sed needs a
# backslash followed by a literal newline instead
find ././ |sed 's,^\./\./,&\n,' |while read_path; do
    echo "$path"
done