Bash Script - split string using regex delimiter

I want to split a string such as 'substring1 substring2 ONCE[0,10s] substring3'. With the delimiter 'ONCE[0,10s]', the expected result is:

substring1 substring2
substring3

The problem is that the number and unit in the delimiter vary, for example 'ONCE[0,1s]', 'ONCE[0,3m]', 'ONCE[0,10d]' and so on.

How can I do this in a bash script? Any ideas?

Thank you

user3541984 asked Apr 16 '14 16:04


3 Answers

The example provided in the OP (as well as the two answers provided by @GlennJackman and @devnull) suggests that the actual question could have been:

In bash, how do I replace the match for a regular expression in a string with a newline?

That's not actually the same as "split a string using a regular expression", unless you add the constraint that the string does not contain any newline characters. And even then, it's not actually "splitting" the string; the presumption is that some other process will use the newline to split the result.

Once the question has been reformulated, the solution is not challenging. You could use any tool which supports regular expressions, such as sed:

sed 's/ *ONCE\[[^]]*] */\n/g' <<<"$variable"

(Remove the g if you only want to replace the first sequence; you may need to adjust the regular expression, since it wasn't quite clear what the desired constraints are.)
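As a minimal sketch of that "some other process" step, the sed output can be read into a bash array with mapfile. This assumes bash 4 or later (for mapfile) and GNU sed (which interprets \n in the replacement as a newline); the array name parts is only illustrative.

```shell
#!/usr/bin/env bash
variable='substring1 substring2 ONCE[0,10s] substring3'

# Replace each delimiter (with surrounding spaces) by a newline,
# then read the resulting lines into the array "parts".
# Assumption: GNU sed, which treats \n in the replacement as a newline.
mapfile -t parts < <(sed 's/ *ONCE\[[^]]*] */\n/g' <<<"$variable")

# Show each piece in angle brackets so stray whitespace is visible.
printf '<%s>\n' "${parts[@]}"
```

This only works as a split if the original string contains no newlines of its own.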

bash itself does not provide a replace-all primitive using regular expressions, although it does have glob "patterns". If the extglob option is set (which is the default on some distributions), those patterns are powerful enough to express this delimiter, so you could use:

echo "${variable//*( )ONCE\[*([^]])]*( )/$'\n'}"

Again, you can make the substitution only happen once by changing // to / and you may need to change the pattern to meet your precise needs.
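Since extglob may not be enabled in a given script, a self-contained sketch would turn it on explicitly. Here the newline is kept in a variable (nl, an illustrative name) to keep the quoting simple, and the result is read into an array as above; mapfile again assumes bash 4 or later.

```shell
#!/usr/bin/env bash
shopt -s extglob   # enable extended patterns such as *( ... )

variable='substring1 substring2 ONCE[0,10s] substring3'
nl=$'\n'           # newline held in a variable for use as the replacement

# Same substitution as above: optional spaces, ONCE[...], optional spaces.
replaced="${variable//*( )ONCE\[*([^]])]*( )/$nl}"

# Split the result on newlines into the array "parts".
mapfile -t parts <<<"$replaced"
printf '<%s>\n' "${parts[@]}"
```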

That leaves open the question of how to actually split a bash variable using a delimiter specified by a regular expression, for some definition of "split". One possible definition is "call a function with the parts of the string as arguments"; that's the one which we use here:

# Usage:
# call_with_split <pattern> <string> <cmd> <args>...
# Splits string according to regular expression pattern and then invokes
# cmd args string-pieces
call_with_split () { 
  if [[ $2 =~ ($1).* ]]; then
    call_with_split "$1" \
                    "${2:$((${#2} - ${#BASH_REMATCH[0]} + ${#BASH_REMATCH[1]}))}" \
                    "${@:3}" \
                    "${2:0:$((${#2} - ${#BASH_REMATCH[0]}))}"
  else
    "${@:3}" "$2"
  fi
}

Example:

$ var="substring1 substring2 ONCE[0,10s] substring3"
$ call_with_split " ONCE\[[^]]*] " "$var" printf "%s\n"
substring1 substring2
substring3
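
If an array is more convenient than a callback, a different definition of "split" is possible. The following is only a sketch of an iterative variant, not the function above: split_regex and SPLIT_PARTS are hypothetical names, and it assumes the regular expression has no anchors and never matches the empty string.

```shell
#!/usr/bin/env bash
# split_regex <regex> <string>
# Fills the global array SPLIT_PARTS with the pieces of <string>
# found between successive matches of <regex>.
split_regex () {
  local re=$1 rest=$2 match
  SPLIT_PARTS=()
  while [[ $rest =~ $re ]]; do
    match=${BASH_REMATCH[0]}
    [[ -z $match ]] && break               # guard against an endless loop
    SPLIT_PARTS+=("${rest%%"$match"*}")    # text before the first match
    rest=${rest#*"$match"}                 # text after the first match
  done
  SPLIT_PARTS+=("$rest")                   # whatever is left is the last piece
}

split_regex ' ONCE\[[^]]*] ' 'a ONCE[0,1s] b ONCE[0,3m] c'
printf '<%s>\n' "${SPLIT_PARTS[@]}"
```

Unlike the newline-based approaches, this keeps working when the pieces themselves contain newlines.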

rici answered Oct 15 '22 17:10


bash:

s='substring1 substring2 ONCE[0,10s] substring3'

if [[ $s =~ (.+)" ONCE["[0-9]+,[0-9]+[smhd]"] "(.+) ]]; then
    echo "${BASH_REMATCH[1]}"
    echo "${BASH_REMATCH[2]}"
else 
    echo no match
fi

Output:

substring1 substring2
substring3
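
One caveat worth illustrating: because the first (.+) is greedy, a string containing several delimiters is split at the last one only. The sketch below uses a made-up input with two delimiters and stores the regular expression in a variable (re, an illustrative name) to avoid quoting problems.

```shell
#!/usr/bin/env bash
s='a ONCE[0,1s] b ONCE[0,3m] c'
re='(.+) ONCE\[[0-9]+,[0-9]+[smhd]\] (.+)'

if [[ $s =~ $re ]]; then
    first=${BASH_REMATCH[1]}    # everything before the LAST delimiter
    second=${BASH_REMATCH[2]}   # everything after the LAST delimiter
    printf '<%s>\n' "$first" "$second"
else
    echo "no match"
fi
```

Splitting at every delimiter would need a loop rather than a single match.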

glenn jackman answered Oct 15 '22 16:10


You could use awk. Specify the field separator as:

'ONCE[[]0,[^]]*[]] *'

For example, using your sample input:

$ awk -F 'ONCE[[]0,[^]]*[]] *' '{for(i=1;i<=NF;i++){print $i}}' <<< "substring1 substring2 ONCE[0,10s] substring3"
substring1 substring2 
substring3

devnull answered Oct 15 '22 16:10