~ ls
A B C
On bash (looks wrong)
~IFS=$'\x00' read -a vars < <(find -type f -print0); echo "${vars}"
ABC
On zsh (looks good)
~IFS=$'\x00' read -A vars < <(find -type f -print0); echo "${vars}"
A B C
Is it a bash bug?
The null character is special: POSIX and bash do not allow it inside strings. It marks the end of a C string, so $'\x00' and $'\000' pretty much never work. Inian's answer here even links to a workaround for entering the null character, but you still cannot expect it to be preserved when you assign it to a variable. It looks like zsh doesn't mind it, but bash does.
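A minimal sketch of what bash does with the question's command, assuming bash and using printf in place of find so the input is predictable (the names nul and nul_len are only for illustration):

```shell
#!/usr/bin/env bash
# In bash, an ANSI-C quoted NUL truncates the string at that point.
nul=$'\x00'
nul_len=${#nul}    # bash stores an empty string here

# So IFS=$'\x00' is just an empty IFS, which disables field splitting.
# read drops the NUL bytes from its input and returns non-zero because it
# reaches end of input before seeing a newline, hence the "|| true".
IFS=$'\x00' read -r -a vars < <(printf 'A\0B\0C\0') || true

echo "nul length: $nul_len"
echo "elements:   ${#vars[@]}"
echo "first:      ${vars[0]}"
```

This reproduces the single ABC element from the question: nothing was split, and the separators were silently discarded.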
Here's a test that illustrates the problems representing space, tab, and newline characters in filenames:
$ touch 'two words' tabbed$'\t'words "two
lines"
$ ls # GNU coreutils ls displays using bash's $'string' notation
'tabbed'$'\t''words' 'two'$'\n''lines' 'two words'
$ ls |cat # … except when piped elsewhere
tabbed words
two
lines
two words
$ find * # GNU findutils find displays tabs & newlines as question marks
tabbed?words
two?lines
two words
$ find * |cat # … except when piped elsewhere
tabbed words
two
lines
two words
$ touch a b c # (more tests for later)
The GNU tools are very smart and know this is a problem, so they come up with creative ways around it, but they aren't even consistent. ls assumes you're using bash or zsh (the $'…' syntax for a literal was not in POSIX before the 2024 edition) and find gives you a question mark (itself a valid filename character, but it's a file glob that matches any character, so e.g. rm two?lines tabbed?words will delete both files, just like rm 'two'$'\n''lines' 'tabbed'$'\t''words'). Both present the truth when piped to another command like cat.
I see you're using GNU extensions: POSIX and BSD/OSX find don't allow an implicit path, and POSIX find did not support -print0 before the 2024 edition, though the older POSIX find spec does mention it:
Other implementations have added other ways to get around this problem, notably a -print0 primary that wrote filenames with a null byte terminator. This was considered here, but not adopted. Using a null terminator meant that any utility that was going to process find's -print0 output had to add a new option to parse the null terminators it would now be reading.
The pre-2024 POSIX xargs spec similarly lacks support for -0 (and does not mention it at all), though it is supported by xargs in GNU, BSD/OSX, and busybox.
Therefore, you can probably do this:
$ find . -type f -print0 |xargs -0
./c ./b ./a ./two
lines ./tabbed words ./two words
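The bare xargs -0 output above makes it hard to see where one name ends and the next begins. As a rough sketch, assuming your xargs supports -0 and your system has mktemp -d (neither is guaranteed everywhere), you can hand each name to a small sh loop and bracket it. The scratch directory and the demo_dir and listing names are made up for this example:

```shell
#!/usr/bin/env bash
# Build a scratch directory with awkward names (hypothetical demo data).
demo_dir=$(mktemp -d)
touch "$demo_dir/two words" "$demo_dir/two"$'\n'"lines" "$demo_dir/a"

# xargs -0 splits on NUL only, so every name arrives as exactly one argument.
# The trailing "sh" fills $0 for the inline script; "$@" holds the file names.
listing=$(find "$demo_dir" -type f -print0 |
  xargs -0 sh -c 'for f in "$@"; do printf "<%s>\n" "$f"; done' sh)

printf '%s\n' "$listing"
rm -rf "$demo_dir"
```

Each file shows up inside exactly one pair of angle brackets, including the one with a newline in its name. The order depends on find and the filesystem.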
However, you might actually want the array, so perhaps I'm overfitting to your simplified question.
You can use mapfile with its -d option, available in Bash 4.4 and later:
$ mapfile -d '' vars < <(find . -type f -print0)
$ printf '<%s>\n' "${vars[@]}"
<./c>
<./b>
<./a>
<./two
lines>
<./tabbed words>
<./two words>
Some commands, including mapfile, read, and readarray (a synonym of mapfile), accept -d '' as if it were -d $'\0', likely [citation needed] as a workaround for POSIX shell's aforementioned inability to deal with null characters in strings.
This mapfile command merely reads an input file (standard input in this case) into the $vars array, splitting on null characters. Standard input is redirected from a file descriptor created by the <(…) process substitution at the end of the line, which carries the output of our find command.
A short aside: You'd think you could simply do find … |mapfile … but bash runs each part of a pipeline in a subshell, so any variables you set or modify in there are lost when the pipeline command completes. The process substitution trick doesn't trap you in the same way.
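Here is a small sketch of that difference, assuming bash 4.4 or later running as a script with default options (in particular, with the lastpipe shell option left off). printf stands in for find, and the variable names are only illustrative:

```shell
#!/usr/bin/env bash
# Pipeline: mapfile runs in a subshell, so its array never reaches this shell.
piped=()
printf 'a\0b\0' | mapfile -d '' piped
piped_count=${#piped[@]}

# Process substitution: mapfile runs in the current shell and the array sticks.
substituted=()
mapfile -d '' substituted < <(printf 'a\0b\0')
substituted_count=${#substituted[@]}

echo "pipeline: $piped_count, process substitution: $substituted_count"
```

Under those assumptions the pipeline version leaves the array empty while the process substitution version holds both items. Bash also has shopt -s lastpipe, which runs the last pipeline stage in the current shell when job control is inactive, but the process substitution form does not depend on that setting.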
The printf command simply demonstrates the contents of the array. The angle brackets denote the start and end of each item so you aren't confused by the newline, space, or tab.
There are a lot of misconceptions in the logic of both attempts above. In the bash shell you simply cannot store a NULL byte (\x00) in a variable, be it the special IFS or any other user-defined variable. So your attempt to split the result of find on the NULL byte would never work. Because of this, the results from find are stored at the first index of the array as one long entry, concatenated together with the NULL bytes dropped.
You can get around the problem of using the NULL byte in a variable with a few tricks described in How to pass \x00 as argument to program?. You could simply use any other character for your IFS, though, as in
IFS=: read -r -a splitList <<<"foo:bar:dude"
declare -p splitList
The ideal way to read NULL-delimited input is to set the delimiter of the read command so that it reads until a NULL byte is encountered.
But then if you simply do
IFS= read -r -d '' -a files < <(find -type f -print0)
you only read the first filename, up to the first NULL byte, and the array "${files[@]}" would contain just that one filename. You need to read in a loop until the last NULL byte has been read and there are no more characters to read:
declare -a array=()
while IFS= read -r -d '' file; do
array+=( "$file" )
done < <(find -type f -print0)
which stores each file in a separate array entry. You can print the entries with
printf '%s\n' "${array[@]}"
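To see the two behaviours side by side, here is a small sketch that swaps find for a fixed printf stream, so the result does not depend on your directory contents. The names single, looped, and item are just for illustration:

```shell
#!/usr/bin/env bash
# A single read stops at the first NUL, so only the first item is stored.
IFS= read -r -d '' -a single < <(printf 'one\0two words\0three\0')

# The loop calls read once per NUL-terminated item until input runs out.
looped=()
while IFS= read -r -d '' item; do
  looped+=( "$item" )
done < <(printf 'one\0two words\0three\0')

declare -p single looped
```

declare -p shows single holding only "one", while looped holds all three items with "two words" intact as one entry.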