I am looking for a way to split a string in bash over a delimiter string, and place the parts in an array.
Simple case:
#!/bin/bash
b="aaaaa/bbbbb/ddd/ffffff"
echo "simple string: $b"
IFS='/' b_split=($b)
echo ;
echo "split"
for i in ${b_split[@]}
do
echo "------ new part ------"
echo "$i"
done
Gives output
simple string: aaaaa/bbbbb/ddd/ffffff
split
------ new part ------
aaaaa
------ new part ------
bbbbb
------ new part ------
ddd
------ new part ------
ffffff
More complex case:
#!/bin/bash
c=$(echo "AA=A"; echo "B=BB"; echo "======="; echo "C==CC"; echo "DD=D"; echo "======="; echo "EEE"; echo "FF";)
echo "more complex string"
echo "$c";
echo ;
echo "split";
IFS='=======' c_split=($c) ;# <---- LINE TO BE CHANGED
for i in ${c_split[@]}
do
echo "------ new part ------"
echo "$i"
done
Gives output:
more complex string
AA=A
B=BB
=======
C==CC
DD=D
=======
EEE
FF
split
------ new part ------
AA
------ new part ------
A
B
------ new part ------
BB
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
C
------ new part ------
------ new part ------
CC
DD
------ new part ------
D
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
EEE
FF
I would like the second output to be like
------ new part ------
AA=A
B=BB
------ new part ------
C==CC
DD=D
------ new part ------
EEE
FF
I.e. to split the string on a sequence of characters, instead of one. How can I do this?
I am looking for an answer that would only modify this line in the second script:
IFS='=======' c_split=($c) ;# <---- LINE TO BE CHANGED
IFS disambiguation
IFS means Input Field Separators: a list of characters that can be used as separators.
By default, it is set to space, tab and newline ($' \t\n'), meaning that any run of one or more spaces, tabs and/or newlines counts as one separator.
So in the string:
" blah foo=bar
baz "
leading and trailing separators are ignored, and the string contains only 3 parts: blah, foo=bar and baz.
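As a small illustration of the two points above, here is a sketch (the variable names are only for this example) showing that default splitting collapses runs of whitespace, and that IFS is a set of characters, so IFS='=======' behaves exactly like IFS='=':

```bash
#!/bin/bash
# Default IFS (space, tab, newline): any run of whitespace is one separator,
# and leading/trailing whitespace is dropped.
s=$' blah  foo=bar\nbaz '
unset IFS                 # an unset IFS behaves like the default
parts=($s)                # unquoted on purpose, so word splitting applies
printf '<%s>' "${parts[@]}"; echo     # <blah><foo=bar><baz>

# IFS is a set of characters: repeating '=' seven times changes nothing.
t='AA=A'
IFS='=======' read -r -a multi  <<< "$t"
IFS='='       read -r -a single <<< "$t"
echo "${#multi[@]} ${#single[@]}"     # 2 2
```

This is why the second script in the question splits on every single = instead of on the whole ======= line.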
Splitting a string using IFS is possible if you know a valid field separator that is not used in your string.
OIFS="$IFS"
IFS='§'
c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
c_split=(${c//=======/§})
IFS="$OIFS"
printf -- "------ new part ------\n%s\n" "${c_split[@]}"
------ new part ------
AA=A
B=BB
------ new part ------
C==CC
DD=D
------ new part ------
EEE
FF
But this works only as long as the string does not contain §.
You could use another character, like IFS=$'\026';c_split=(${c//=======/$'\026'}), but this may still lead to further bugs.
You could scan the character map to find a character that is not in your string:
myIfs=""
for i in {1..255};do
printf -v char "$(printf "\\\%03o" $i)"
[ "$c" == "${c#*"$char"}" ] && myIfs="$char" && break
done
if ! [ "$myIfs" ] ;then
echo no split char found, could not do the job, sorry.
exit 1
fi
but I find this solution a little overkill.
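A possible way to limit the side effects of the placeholder approach is to wrap it in a function. The sketch below is only an illustration: the function name split_via_placeholder is hypothetical, and the placeholder \037 (ASCII unit separator) is simply assumed not to occur in the input. It keeps the IFS change local and turns off pathname expansion while the unquoted expansion is split:

```bash
#!/bin/bash
# Hypothetical helper: split $1 on the string $2 into the global array c_split.
split_via_placeholder() {
    local str=$1 sep=$2
    local ph=$'\037'          # assumption: this character never occurs in $str
    local IFS=$ph             # local, so the caller's IFS is left untouched
    local restore_glob=0
    [[ $- == *f* ]] || restore_glob=1
    set -f                    # no pathname expansion on the unquoted expansion
    c_split=(${str//"$sep"/$ph})
    (( restore_glob )) && set +f
    return 0
}

c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
split_via_placeholder "$c" '======='
echo "${#c_split[@]}"         # 3
```

The newlines around each ======= stay attached to the elements; passing $'\n=======\n' as the separator would avoid that.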
Under bash, we could use this bashism:
b="aaaaa/bbbbb/ddd/ffffff"
b_split=(${b//// })
In fact, the syntax ${varname//pattern/replacement} performs a substitution (delimited by /) that replaces every occurrence of / with a space before the result is assigned to the array b_split.
Of course, this still uses IFS and splits the array on spaces.
This is not the best way, but it can work in specific cases.
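To make the limitation concrete, here is a short sketch comparing the substitution trick with a commonly used alternative, read -a with a temporary IFS. The alternative is not part of the answer above, and the variable names are illustrative:

```bash
#!/bin/bash
b='aaa bbb/ccc'
unset IFS                              # make sure default splitting applies

# Substitution trick: spaces already present in $b also split the string.
via_subst=(${b//// })

# Temporary IFS for read only: just '/' separates fields, IFS is not changed afterwards.
IFS=/ read -r -a via_read <<< "$b"

echo "${#via_subst[@]} ${#via_read[@]}"   # 3 2
```

Note that read stops at the first newline, so this variant only suits single-line strings.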
You could even drop unwanted spaces before splitting:
b='12 34 / 1 3 5 7 / ab'
b1=${b// }
b_split=(${b1//// })
printf "<%s>, " "${b_split[@]}" ;echo
<1234>, <1357>, <ab>,
or temporarily swap them for another character:
b1=${b// /§}
b_split=(${b1//// })
printf "<%s>, " "${b_split[@]//§/ }" ;echo
<12 34 >, < 1 3 5 7 >, < ab>,
Splitting on strings:
So you cannot use IFS for this purpose, but bash does have nice features:
#!/bin/bash
c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
echo "more complex string"
echo "$c";
echo ;
echo "split";
mySep='======='
while [ "$c" != "${c#*$mySep}" ];do
echo "------ new part ------"
echo "${c%%$mySep*}"
c="${c#*$mySep}"
done
echo "------ last part ------"
echo "$c"
Let's see:
more complex string
AA=A
B=BB
=======
C==CC
DD=D
=======
EEE
FF
split
------ new part ------
AA=A
B=BB
------ new part ------
C==CC
DD=D
------ last part ------
EEE
FF
Note: leading and trailing newlines are not deleted. If this is needed, you could use:
mySep=$'\n=======\n'
instead of simply =======.
Or you could rewrite the split loop to strip them explicitly:
mySep=$'======='
while [ "$c" != "${c#*$mySep}" ];do
echo "------ new part ------"
part="${c%%$mySep*}"
part="${part##$'\n'}"
echo "${part%%$'\n'}"
c="${c#*$mySep}"
done
echo "------ last part ------"
c=${c##$'\n'}
echo "${c%%$'\n'}"
In any case, this matches what the SO question asked for (and its sample):
------ new part ------
AA=A
B=BB
------ new part ------
C==CC
DD=D
------ last part ------
EEE
FF
Splitting on strings, into an array:
#!/bin/bash
c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
echo "more complex string"
echo "$c";
echo ;
echo "split";
mySep=$'======='
export -a c_split
while [ "$c" != "${c#*$mySep}" ];do
part="${c%%$mySep*}"
part="${part##$'\n'}"
c_split+=("${part%%$'\n'}")
c="${c#*$mySep}"
done
c=${c##$'\n'}
c_split+=("${c%%$'\n'}")
for i in "${c_split[@]}"
do
echo "------ new part ------"
echo "$i"
done
This does the job nicely:
more complex string
AA=A
B=BB
=======
C==CC
DD=D
=======
EEE
FF
split
------ new part ------
AA=A
B=BB
------ new part ------
C==CC
DD=D
------ new part ------
EEE
FF
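export -a var defines var as an array. Note that bash cannot actually pass arrays to child processes (the man page lists this under BUGS), so declare -a var has the same practical effect here.
${variablename%string*} and ${variablename%%string*} return the left part of variablename, up to but not including string. A single % removes the shortest match, so it cuts at the last occurrence of string; %% removes the longest match, so it cuts at the first occurrence. The full value is returned if string is not found.
${variablename#*string} and ${variablename##*string} do the same from the other side: they return the right part of variablename, after string. A single # cuts at the first occurrence and ## at the last.
Note: in these patterns, the character * is a wildcard meaning any number of any characters.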
The command echo "${c%%$'\n'}" echoes variable c without its trailing newline, if there is one (the pattern matches a single newline only).
So if variable contains Hello WorldZorGluBHello youZorGluBI'm happy,
$ variable="Hello WorldZorGluBHello youZorGluBI'm happy"
$ echo ${variable#*ZorGluB}
Hello youZorGluBI'm happy
$ echo ${variable##*ZorGluB}
I'm happy
$ echo ${variable%ZorGluB*}
Hello WorldZorGluBHello you
$ echo ${variable%%ZorGluB*}
Hello World
$ echo ${variable%%ZorGluB}
Hello WorldZorGluBHello youZorGluBI'm happy
$ echo ${variable%happy}
Hello WorldZorGluBHello youZorGluBI'm
$ echo ${variable##* }
happy
All this is explained in the manpage:
$ man -Len -Pless\ +/##word bash
$ man -Len -Pless\ +/%%word bash
$ man -Len -Pless\ +/^\\\ *export\\\ .*word bash
The separator:
mySep=$'======='
Declare c_split as an array (the export attribute has no practical effect, because bash does not pass arrays to child processes):
export -a c_split
While variable c contains at least one occurrence of mySep:
while [ "$c" != "${c#*$mySep}" ];do
Truncate c from the first mySep to the end of the string and assign the result to part:
part="${c%%$mySep*}"
Remove a leading newline:
part="${part##$'\n'}"
Remove a trailing newline and add the result as a new array element to c_split:
c_split+=("${part%%$'\n'}")
Reassign c with the rest of the string, once everything up to and including mySep is removed:
c="${c#*$mySep}"
Done ;-)
done
Remove a leading newline:
c=${c##$'\n'}
Remove a trailing newline and add the result as a new array element to c_split:
c_split+=("${c%%$'\n'}")
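Because the pattern $'\n' matches exactly one character, the ## and %% forms used above strip at most one newline on each side. If several consecutive newlines need to go, one option is bash's extglob patterns. This is a sketch under that assumption, with illustrative variable names:

```bash
#!/bin/bash
shopt -s extglob              # enables the +(...) pattern used below
nl=$'\n'
c=$'\n\nEEE\nFF\n\n\n'

one=${c%%"$nl"}               # removes a single trailing newline only
all=${c##+("$nl")}            # removes every leading newline
all=${all%%+("$nl")}          # removes every trailing newline

printf '[%s]\n' "$all"        # prints [EEE and FF] on two lines, no blank lines
```

The inner newline between EEE and FF is preserved, since only the ends of the string are matched.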
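# ssplit STRING [ARRAY_NAME] [DELIMITER]: split STRING on DELIMITER into ARRAY_NAME
ssplit() {
local string="$1" array=${2:-ssplited_array} delim="${3:- }" pos=0
# Loop while the delimiter still occurs in the remaining string
while [ "$string" != "${string#*"$delim"}" ];do
printf -v "$array[pos++]" "%s" "${string%%"$delim"*}"
string="${string#*"$delim"}"
done
# Whatever is left is the last element
printf -v "$array[pos]" "%s" "$string"
}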
Usage:
ssplit "<quoted string>" [array name] [delimiter string]
where the array name is ssplited_array by default and the delimiter is a single space.
You could use:
c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
ssplit "$c" c_split $'\n=======\n'
printf -- "--- part ----\n%s\n" "${c_split[@]}"
--- part ----
AA=A
B=BB
--- part ----
C==CC
DD=D
--- part ----
EEE
FF
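A variant of the same idea, written with a nameref, might look like the sketch below. It assumes bash 4.3 or later, the name ssplit_ref is hypothetical, and unlike ssplit it empties the target array first and treats the delimiter literally even if it contains glob characters such as *:

```bash
#!/bin/bash
# Hypothetical variant of ssplit: ssplit_ref STRING [ARRAY_NAME] [DELIMITER]
ssplit_ref() {
    local string=$1 delim=${3:- }
    local -n _out=${2:-ssplited_array}   # nameref; do not pass an array called _out
    _out=()                              # start from an empty array
    while [[ $string == *"$delim"* ]]; do
        _out+=("${string%%"$delim"*}")   # text before the first delimiter
        string=${string#*"$delim"}       # keep the remainder
    done
    _out+=("$string")                    # last part
}

c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
ssplit_ref "$c" c_split $'\n=======\n'
printf -- '--- part ----\n%s\n' "${c_split[@]}"
```

Quoting "$delim" inside the patterns is what makes the delimiter literal.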
Do it with awk (this relies on an awk that treats a multi-character RS as a regular expression, such as gawk):
awk -vRS='\n=*\n' '{print "----- new part -----";print}' <<< "$c"
output:
kent$ awk -vRS='\n=*\n' '{print "----- new part -----";print}' <<< "$c"
----- new part -----
AA=A
B=BB
----- new part -----
C==CC
DD=D
----- new part -----
EEE
FF