A POSIX compliant shell shall provide mechanisms like this to iterate over collections of strings:
for x in $(seq 1 5); do
echo $x
done
But, how do I iterate over each character of a word?
Create a bash file named 'for_list1.sh' and add the following script. A string value with spaces is used within for loop. By default, string value is separated by space. For loop will split the string into words and print each word by adding a newline.
Another way to iterate over a string is to use for item of str . The variable item receives the character directly so you do not have to use the index. If your code does not need the index value of each character, this loop format is even simpler.
It's a little circuitous, but I think this'll work in any posix-compliant shell. I've tried it in dash
, but I don't have busybox handy to test with.
var='ab * cd'
tmp="$var" # The loop will consume the variable, so make a temp copy first
while [ -n "$tmp" ]; do
rest="${tmp#?}" # All but the first character of the string
first="${tmp%"$rest"}" # Remove $rest, and you're left with the first character
echo "$first"
tmp="$rest"
done
Output:
a
b
*
c
d
Note that the double-quotes around the right-hand side of assignments are not needed; I just prefer to use double-quotes around all expansions rather than trying to keep track of where it's safe to leave them off. On the other hand, the double-quotes in [ -n "$tmp" ]
are absolutely necessary, and the inner double-quotes in first="${tmp%"$rest"}"
are needed if the string contains "*".
Use getopts to process input one character at a time. The :
instructs getopts to ignore illegal options and set OPTARG. The leading -
in the input makes getopts treat the string as a options.
If getopts encounters a colon, it will not set OPTARG
, so the script uses parameter expansion to return :
when OPTARG
is not set/null.
#!/bin/sh
IFS='
'
split_string () {
OPTIND=1;
while getopts ":" opt "-$1"
do echo "'${OPTARG:-:}'"
done
}
while read -r line;do
split_string "$line"
done
As with the accepted answer, this processes strings byte-wise instead of character-wise, corrupting multibyte codepoints. The trick is to detect multibyte codepoints, concatenate their bytes and then print them:
#!/bin/sh
IFS='
'
split_string () {
OPTIND=1;
while getopts ":" opt "$1";do
case "${OPTARG:=:}" in
([[:print:]])
[ -n "$multi" ] && echo "$multi" && multi=
echo "$OPTARG" && continue
esac
multi="$multi$OPTARG"
case "$multi" in
([[:print:]]) echo "$multi" && multi=
esac
done
[ -n "$multi" ] && echo "$multi"
}
while read -r line;do
split_string "-$line"
done
Here the extra case "$multi"
is used to detect when the multi buffer contains a printable character. This works on shells like Bash and Zsh but Dash and busybox ash do not pattern match multibyte codepoints, ignoring locale.
This degrades somewhat nicely: Dash/ash treat sequences of multibyte codepoints as one character, but handle multibyte characters surrounded by single byte characters fine.
Depending on your requirements it may be preferable not to split consecutive multibyte codepoints anyway, as the next codepoint may be a combining character which modifies the character before it.
This won't handle the case where a single byte character is followed by a combining character.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With