
Bash Regex Capture Groups

I have a single string that is this kind of format:

"Mike H<[email protected]>" [email protected] "Mike H<[email protected]>"

If I were writing a normal regex in JS, C#, etc., I'd do this:

(?:"(.+?)"|'(.+?)'|(\S+))

And iterate the match groups to grab each string, ideally without the quotes. I ultimately want to add each value to an array, so in the example, I'd end up with 3 items in an array as follows:

Mike H<[email protected]>
[email protected] 
Mike H<[email protected]>

I can't figure out how to replicate this functionality with grep, sed, or Bash regexes. I've tried things like:

echo "$email" | grep -oP "\"\K(.+?)(?=\")|'\K(.+?)(?=')|(\S+)"

The problem is that while this roughly mimics capture groups, it doesn't work with multiple matches, so I get captures like:

"Mike
H<[email protected]>"
 [email protected] 

If I remove the look ahead/behind logic, I at least get the 3 strings, but the first and last are still wrapped in quotes. In that approach, I pipe the output to read so I can individually add each string to the array, but I'm open to other options.

EDIT:

I think my input example may have been confusing; it's just one possible input. The real input could be double-quoted, single-quoted, or unquoted (without spaces) strings, in any order and any quantity. The JavaScript/C# regex I provided is the real behavior I'm trying to achieve.

mhaken asked Jan 30 '23 12:01



2 Answers

You can use Perl:

$ email='"Mike H<[email protected]>" [email protected] "Mike H<[email protected]>"'
$ echo "$email" | perl -lne 'while (/"([^"]+)"|(\S+)/g) {print $1 // $2}'
Mike H<[email protected]>
[email protected]
Mike H<[email protected]>

Or in pure Bash, it gets kinda wordy:

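re='"([^"]+)"[[:space:]]*|([^[:space:]]+)[[:space:]]*'
while [[ $email =~ $re ]]; do
    # Only one of the two groups is set per match; the other is empty
    echo "${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
    # Length of the whole match, then drop that much from the front
    i=${#BASH_REMATCH[0]}
    email=${email:i}
done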
# same output
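
If the goal is an array, and single-quoted strings have to be handled too (as in the question's EDIT), the same consume-from-the-front loop can be wrapped in a function. A minimal sketch under some assumptions: `split_tokens` and `tokens` are made-up names, the addresses are placeholders, and the three patterns are tried in order so that a quoted form wins over the bare-word form, much like the ordered alternation in the JS/C# regex.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is illustrative): fills the global array "tokens"
# with every double-quoted, single-quoted, or bare token in $1, quotes removed.
split_tokens() {
    local rest=$1
    # Patterns live in variables so the quote characters reach the regex
    # engine instead of being interpreted by the shell.
    local dq='^[[:space:]]*"([^"]+)"'
    local sq="^[[:space:]]*'([^']+)'"
    local bare='^[[:space:]]*([^[:space:]]+)'
    tokens=()
    # || short-circuits, so BASH_REMATCH holds the first pattern that matched.
    while [[ $rest =~ $dq || $rest =~ $sq || $rest =~ $bare ]]; do
        tokens+=("${BASH_REMATCH[1]}")
        # Drop the consumed prefix (leading whitespace plus the token).
        rest=${rest:${#BASH_REMATCH[0]}}
    done
}

split_tokens "\"Mike H<mike@example.com>\" bob@example.com 'Ann B<ann@example.com>'"
printf '[%s]\n' "${tokens[@]}"
```

Anchoring each pattern with `^[[:space:]]*` means the match always starts at the front of the remaining string, so cutting `${#BASH_REMATCH[0]}` characters never removes unmatched text, even when the input has leading whitespace.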
dawg answered Feb 01 '23 02:02



You can use sed for input in exactly this shape (a quoted string, an unquoted string, then another quoted string):

$ sed -r 's/"(.*)" (.*)"(.*)"/\1\n\2\n\3/g' <<< "$email"
Mike H<[email protected]>
[email protected] 
Mike H<[email protected]>
CWLiu answered Feb 01 '23 01:02
