
How to split a string on a multi-character delimiter in bash?

Why doesn't the following bash code work?

for i in $( echo "emmbbmmaaddsb" | split -t "mm"  )
do
    echo "$i"
done

Expected output:

e
bb
aaddsb
asked Nov 18 '16 22:11 by v217




2 Answers

Since you're expecting newlines, you can simply replace all instances of mm in your string with a newline. In pure native bash:

in='emmbbmmaaddsb'
sep='mm'
# quoting "$sep" makes it match literally rather than as a glob pattern
printf '%s\n' "${in//"$sep"/$'\n'}"
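If the pieces are needed one at a time in a loop, as in the question, one option is to collect them into an array first. The following is a minimal sketch in pure bash, not the only way to do it; the function name split_on_sep and the array name parts are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash

# split_on_sep STRING SEP
# Hypothetical helper: stores the pieces of STRING, split on the literal
# (possibly multi-character) SEP, in the global array "parts".
split_on_sep() {
  local in=$1 sep=$2
  parts=()

  # an empty separator would never shrink the input, so reject it
  [[ $sep ]] || return 1

  # while the separator still occurs in the remaining input...
  while [[ $in == *"$sep"* ]]; do
    parts+=( "${in%%"$sep"*}" )   # keep everything before the first separator
    in=${in#*"$sep"}              # drop that piece and the separator itself
  done

  # whatever is left after the last separator is the final piece
  parts+=( "$in" )
}

split_on_sep 'emmbbmmaaddsb' 'mm'
for i in "${parts[@]}"; do
  echo "$i"
done
```

Because the separator is quoted inside the expansions, it is compared literally, so characters such as * or ? in the separator are not treated as wildcards.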

If you wanted to do such a replacement on a longer input stream, you might be better off using awk, as bash's built-in string manipulation doesn't scale well to more than a few kilobytes of content. The gsub_literal shell function given in BashFAQ #21, which delegates the work to awk, is applicable:

# Taken from http://mywiki.wooledge.org/BashFAQ/021

# usage: gsub_literal STR REP
# replaces all instances of STR with REP. reads from stdin and writes to stdout.
gsub_literal() {
  # STR cannot be empty
  [[ $1 ]] || return

  # string manip needed to escape '\'s, so awk doesn't expand '\n' and such
  awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
    # get the length of the search string
    BEGIN {
      len = length(str);
    }

    {
      # empty the output string
      out = "";

      # continue looping while the search string is in the line
      while (i = index($0, str)) {
        # append everything up to the search string, and the replacement string
        out = out substr($0, 1, i-1) rep;

        # remove everything up to and including the first instance of the
        # search string from the line
        $0 = substr($0, i + len);
      }

      # append whatever is left
      out = out $0;

      print out;
    }
  '
}

...used, in this context, as:

gsub_literal "mm" $'\n' <your-input-file.txt >your-output-file.txt
answered Dec 30 '22 16:12 by Charles Duffy


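A common tool for this kind of substitution is sed. Its s/regexp/replacement/ command replaces the first match on each line, and the global form s/regexp/replacement/g replaces every match. You do not even need a loop or variables.

Pipe your echo output into sed and substitute each occurrence of mm with the newline character \n (GNU sed interprets \n in the replacement as a newline; other sed implementations may not):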

echo "emmbbmmaaddsb" | sed 's/mm/\n/g'

The output is:

e
bb
aaddsb
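On systems where sed does not expand \n in the replacement text, a more portable form is a backslash followed by a literal line break inside the sed script. The sketch below assumes a POSIX-style sed and stores the result in a variable named out, which is a name chosen here only for illustration.

```shell
#!/usr/bin/env bash

# A backslash followed by an actual line break in the replacement
# inserts a newline without relying on the \n escape.
out=$(printf '%s\n' 'emmbbmmaaddsb' | sed 's/mm/\
/g')

printf '%s\n' "$out"
```

Note that sed treats the search text as a regular expression, so a delimiter containing characters such as . or * would need escaping to be matched literally.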
answered Dec 30 '22 16:12 by John Goofy