
How to iterate over the characters of a string in a POSIX shell script?

A POSIX-compliant shell provides mechanisms like this for iterating over a collection of strings:

for x in $(seq 1 5); do
    echo "$x"
done

But how do I iterate over each character of a word?

Luis Lavaire. asked Jun 26 '18 23:06



2 Answers

It's a little circuitous, but I think this will work in any POSIX-compliant shell. I've tried it in dash, but I don't have BusyBox handy to test with.

var='ab * cd'

tmp="$var"    # The loop will consume the variable, so make a temp copy first
while [ -n "$tmp" ]; do
    rest="${tmp#?}"    # All but the first character of the string
    first="${tmp%"$rest"}"    # Remove $rest, and you're left with the first character
    echo "$first"
    tmp="$rest"
done

Output:

a
b

*

c
d

Note that the double-quotes around the right-hand side of the assignments are not needed; I just prefer to use double-quotes around all expansions rather than trying to keep track of where it's safe to leave them off. On the other hand, the double-quotes in [ -n "$tmp" ] are absolutely necessary, and the inner double-quotes in first="${tmp%"$rest"}" are needed if the string contains pattern characters such as "*" or "[", because an unquoted $rest is interpreted as a pattern rather than as literal text.
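
The effect of the inner double-quotes can be checked with a small sketch. The helper names first_quoted and first_unquoted are made up for this illustration and are not part of the answer; the string 'a*c' is just a sample input.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Inner quotes: $rest is removed as a literal suffix.
first_quoted () {
    tmp=$1
    rest=${tmp#?}
    printf '%s\n' "${tmp%"$rest"}"
}

# No inner quotes: the value of $rest is interpreted as a pattern.
first_unquoted () {
    tmp=$1
    rest=${tmp#?}
    printf '%s\n' "${tmp%$rest}"
}

first_quoted 'a*c'
first_unquoted 'a*c'
```

With this input the quoted version should print a. The unquoted version treats *c as a pattern whose shortest matching suffix is the final c, so it should print a* instead of the first character.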

Gordon Davisson answered Oct 02 '22 04:10



Use getopts to process the input one character at a time. The optstring ":" puts getopts in silent mode: unrecognized options are not reported as errors, and the offending option character is stored in OPTARG. The leading - in the input makes getopts treat the string as a group of options.

If getopts encounters a colon, it will not set OPTARG, so the script uses the parameter expansion ${OPTARG:-:} to return : when OPTARG is unset or null.
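
A minimal sketch of what getopts reports in this mode, assuming the shell follows the POSIX description of a leading colon in the optstring: for an unrecognized option character, the variable named in the call (opt here) is set to ? and OPTARG holds the character itself. The function name show_getopts is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print what getopts sets for each character of $1.
show_getopts () {
    OPTIND=1    # restart option parsing on every call
    while getopts ":" opt "-$1"; do
        printf 'opt=%s OPTARG=%s\n' "$opt" "${OPTARG:-:}"
    done
}

show_getopts abc
```

Every character is "unrecognized" because the optstring declares no options at all, which is what makes OPTARG usable as a per-character iterator.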

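#!/bin/sh
# Set IFS to a newline so that read does not strip leading or trailing blanks
IFS='
'
split_string () {
  OPTIND=1    # reset so getopts starts from the first character on every call
  while getopts ":" opt "-$1"; do
    echo "'${OPTARG:-:}'"    # OPTARG is unset for ":", so substitute it
  done
}

while read -r line; do
  split_string "$line"
done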

As with the accepted answer, this processes strings byte-wise instead of character-wise, corrupting multibyte codepoints. The trick is to detect multibyte codepoints, concatenate their bytes and then print them:

#!/bin/sh
IFS='
'
split_string () {
  OPTIND=1
  multi=    # buffer for the bytes of an incomplete multibyte character
  while getopts ":" opt "$1"; do
    case "${OPTARG:=:}" in
      ([[:print:]])
        # Printable on its own: flush any buffered bytes, then print it
        [ -n "$multi" ] && echo "$multi" && multi=
        echo "$OPTARG" && continue
        ;;
    esac
    multi="$multi$OPTARG"
    case "$multi" in
      ([[:print:]]) echo "$multi" && multi= ;;
    esac
  done
  [ -n "$multi" ] && echo "$multi"
}

while read -r line; do
  split_string "-$line"
done

Here the extra case "$multi" is used to detect when the multi buffer contains a complete printable character. This works in shells like Bash and Zsh, but Dash and BusyBox ash ignore the locale and do not pattern-match multibyte codepoints.

This degrades fairly gracefully: Dash and ash treat a run of consecutive multibyte codepoints as one character, but handle a multibyte character surrounded by single-byte characters correctly.

Depending on your requirements it may be preferable not to split consecutive multibyte codepoints anyway, as the next codepoint may be a combining character which modifies the character before it.

This won't handle the case where a single-byte character is followed by a combining character.
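
To see the byte-wise behaviour in isolation, the following sketch counts how many units a ${tmp#?} loop consumes when the locale is forced to C, where ? matches a single byte. The function name count_units is hypothetical, and the octal escapes \303\251 are the UTF-8 encoding of é.

```shell
#!/bin/sh
# Hypothetical helper: count the units that ${tmp#?} strips in the C locale.
# The body runs in a subshell so the LC_ALL change does not leak out.
count_units () (
    LC_ALL=C
    tmp=$1
    n=0
    while [ -n "$tmp" ]; do
        tmp=${tmp#?}    # drop one unit (one byte in the C locale)
        n=$((n + 1))
    done
    printf '%s\n' "$n"
)

count_units abc
count_units "$(printf '\303\251')"
```

This should print 3 for abc and 2 for é, since the single character é is stored as two bytes.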

David Farrell answered Oct 02 '22 04:10
