How to iterate over the characters of a string in a POSIX shell script?

Tags: shell, posix, sh, dash-shell

A POSIX-compliant shell provides mechanisms like this to iterate over collections of strings:

for x in $(seq 1 5); do
    echo $x
done

But, how do I iterate over each character of a word?

asked Jun 26 '18 23:06

Luis Lavaire.

2 Answers

It's a little circuitous, but I think this will work in any POSIX-compliant shell. I've tried it in dash, but I don't have busybox handy to test with.

var='ab * cd'

tmp="$var"    # The loop will consume the variable, so make a temp copy first
while [ -n "$tmp" ]; do
    rest="${tmp#?}"    # All but the first character of the string
    first="${tmp%"$rest"}"    # Remove $rest, and you're left with the first character
    echo "$first"
    tmp="$rest"
done

Output:

a
b

*

c
d

Note that the double-quotes around the right-hand side of assignments are not needed; I just prefer to use double-quotes around all expansions rather than trying to keep track of where it's safe to leave them off. On the other hand, the double-quotes in [ -n "$tmp" ] are absolutely necessary, and the inner double-quotes in first="${tmp%"$rest"}" are needed if the string contains "*" or another pattern character such as "?" or "[", because an unquoted $rest would be treated as a pattern rather than as literal text.
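The loop above can be wrapped in a function so it is easy to reuse. This is only a minimal sketch: the name each_char is made up for illustration, and printf is used instead of echo because echo's handling of backslashes differs between shells. POSIX sh has no local variables, so tmp, rest and first remain global.

```shell
#!/bin/sh
# each_char is a hypothetical helper name, not part of the answer above.
# It prints every character of its first argument on a separate line.
each_char() {
    tmp=$1                       # work on a copy; the loop consumes it
    while [ -n "$tmp" ]; do
        rest=${tmp#?}            # everything after the first character
        first=${tmp%"$rest"}     # strip $rest literally, leaving the first character
        printf '%s\n' "$first"
        tmp=$rest
    done
}

each_char 'a*b'
```

Because $rest is quoted inside the expansion, the "*" in the example is removed as literal text and comes out as its own line.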

answered Oct 02 '22 04:10

Gordon Davisson

Use getopts to process input one character at a time. The : optstring puts getopts into silent mode: it prints no error for an unrecognized option and stores the option character in OPTARG instead. The leading - in the input makes getopts treat the string as a group of options.

If getopts encounters a colon, it will not set OPTARG, so the script uses parameter expansion to return : when OPTARG is unset or null.

#!/bin/sh
IFS='
'
split_string () {
  OPTIND=1;
  while getopts ":" opt "-$1"
    do echo "'${OPTARG:-:}'"
  done
}

while read -r line;do
  split_string "$line"
done
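To see what getopts actually hands back on each iteration, a small sketch like the following can help. The function name show_getopts_state is invented for this illustration; the input deliberately avoids ":" and "-", which getopts treats specially.

```shell
#!/bin/sh
# show_getopts_state is a hypothetical name used only for this illustration.
show_getopts_state() {
    OPTIND=1                      # restart option parsing on every call
    while getopts ":" opt "-$1"; do
        # The optstring ":" declares no valid options, so every character is
        # "unknown": opt is set to "?" and OPTARG holds the character read.
        printf 'opt=%s OPTARG=%s\n' "$opt" "$OPTARG"
    done
}

show_getopts_state 'abc'
```

Each character should be reported with opt set to "?" and OPTARG set to the character itself, which is why the script only ever looks at OPTARG.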

As with the accepted answer, this processes strings byte-wise instead of character-wise, corrupting multibyte codepoints. The trick is to detect multibyte codepoints, concatenate their bytes and then print them:

#!/bin/sh
IFS='
'
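split_string () {
  OPTIND=1
  multi=    # Start with an empty buffer so leftover bytes from a previous call are not printed again
  while getopts ":" opt "$1";do
    case "${OPTARG:=:}" in
      ([[:print:]])
        # A printable character: flush any buffered bytes first, then print it
        [ -n "$multi" ] && echo "$multi" && multi=
        echo "$OPTARG" && continue
    esac
    # Not printable on its own: collect it as part of a multibyte character
    multi="$multi$OPTARG"
    case "$multi" in
      ([[:print:]]) echo "$multi" && multi=
    esac
  done
  [ -n "$multi" ] && echo "$multi"
}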
while read -r line;do
  split_string "-$line"
done

Here the extra case "$multi" is used to detect when the multi buffer holds a complete printable character. This works in shells like Bash and Zsh, but Dash and busybox ash ignore the locale when pattern matching, so [[:print:]] does not match multibyte characters there.

This degrades somewhat nicely: Dash/ash treat sequences of multibyte codepoints as one character, but handle multibyte characters surrounded by single byte characters fine.

Depending on your requirements it may be preferable not to split consecutive multibyte codepoints anyway, as the next codepoint may be a combining character which modifies the character before it.

The script won't handle the case where a single-byte character is followed by a combining character.
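For shells whose pattern matching ignores the locale, one alternative is to group bytes by their numeric value rather than by [[:print:]]. This is a different technique from the getopts script above, sketched under the assumption that the input is valid UTF-8 and that od and printf '%b' behave as POSIX describes. The name utf8_chars is hypothetical. Like the script above, it does not join combining characters to their base character.

```shell
#!/bin/sh
# utf8_chars is a hypothetical helper: it prints each UTF-8 encoded code
# point of its first argument on its own line, without consulting the locale.
utf8_chars() {
    buf=
    # od prints every byte of the string as an unsigned decimal number
    for byte in $(printf '%s' "$1" | od -An -v -tu1); do
        # In UTF-8, bytes 128..191 continue a sequence; anything else starts one
        if [ "$byte" -lt 128 ] || [ "$byte" -ge 192 ]; then
            [ -n "$buf" ] && printf '%b\n' "$buf"   # flush the previous code point
            buf=
        fi
        # Store the byte as a \0ooo octal escape understood by printf '%b'
        buf="$buf\\0$(printf '%03o' "$byte")"
    done
    [ -n "$buf" ] && printf '%b\n' "$buf"
    return 0
}

# "a", then the two-byte sequence for e-acute, then "b"
utf8_chars "$(printf 'a\303\251b')"
```

Since the grouping relies only on byte values, it should behave the same in Dash, busybox ash, Bash and Zsh, at the cost of one od pipeline per string.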

answered Oct 02 '22 04:10

David Farrell
