Bash regex ungreedy match

Question

I have a regex pattern that is supposed to match at multiple places in a string. I want to get all the match groups into one array and then print every element.

So, I've been trying this:

#!/bin/bash

f=$'\n\tShare1   Disk\n\tShare2  Disk\n\tPrnt1  Printer'
regex=$'\n\t(.+?)\s+Disk'
if [[ $f =~ $regex ]]
then
    for match in "${BASH_REMATCH[@]}"
    do
        echo "New match: $match"
    done
else
    echo "No matches"
fi

Result:

New match: 
    Share1   Disk
    Share2  Disk
New match: Share1   Disk
    Share2

The expected result would have been

New match: Share1
New match: Share2

I think it doesn't work because my .+? is matching greedily. So I looked up how this could be accomplished with bash regex, but everyone seems to suggest using grep with Perl regex.

But surely there has to be another way. I was thinking maybe something like [^\s]+, but the output for that was:

New match: 
    Share1   Disk
New match: Share1

... Any ideas?

Eric Renouf · Accepted Answer

There are a couple of issues here. First, the first element of BASH_REMATCH is the entire text that matched the pattern, not the capture group, so use ${BASH_REMATCH[@]:1} to get only what the capture groups matched.

However, bash regex doesn't support repeating the match at multiple places in the string, so bash probably isn't the right tool for this job. Since each entry is on its own line, though, you could split the input on newlines and apply the pattern to each line, like:

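f=$'\n\tShare1   Disk\n\tShare2  Disk\n\tPrnt1  Printer'
# Bash uses POSIX ERE: there are no lazy quantifiers, and \S / \s are not
# portable, so bracket classes are used instead.
regex=$'\t([^[:space:]]+)[[:space:]]+Disk'
while IFS=$'\n' read -r line; do
    if [[ $line =~ $regex ]]
    then
        printf 'New match: %s\n' "${BASH_REMATCH[@]:1}"
    else
        echo "No matches"
    fi
done <<<"$f"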
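Since the question asks for all captures in one array, the same per-line loop can append to an array instead of printing right away. A minimal sketch, assuming the sample input from the question; the array name matches is made up for illustration, and the pattern is anchored to the whole line, which is an extra assumption about the input format:

```bash
#!/bin/bash
f=$'\n\tShare1   Disk\n\tShare2  Disk\n\tPrnt1  Printer'
regex=$'^\t([^[:space:]]+)[[:space:]]+Disk$'
matches=()   # hypothetical name; collects capture group 1 from every matching line

while IFS= read -r line; do
    if [[ $line =~ $regex ]]; then
        matches+=("${BASH_REMATCH[1]}")
    fi
done <<<"$f"

# Print every collected element, as in the original script.
for match in "${matches[@]}"; do
    echo "New match: $match"
done
```

Lines that do not match (the empty first line and the Printer entry) are simply skipped here rather than reported.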

tripleee · Answer

As the accepted answer already states, the solution here is not really a non-greedy regex, because Bash doesn't support the .*? notation (it was introduced in Perl 5 and is available in languages whose regex implementation derives from that, but Bash is not one of them; its =~ operator uses POSIX extended regular expressions). But for visitors who find this question through Google, the answer to the actual question in the title is sometimes simply to use a more restrictive regex than .* to get the non-greedy behavior you are looking for.

For example,

re='(Disk.*)'
if [[ $f =~ $re ]]; then
    # ${BASH_REMATCH[0]} contains everything from the first occurrence of Disk onward
    printf '%s\n' "${BASH_REMATCH[0]}"
fi

This is just a building block; you would have to take it from there with additional regex matches or a loop. A non-regex variation that does roughly this appears further down.
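One way to continue with regex alone is to re-run the match on whatever follows the previous hit. A rough sketch, assuming each share name is a run of non-whitespace followed by whitespace and Disk; the names rest and names are invented for this example and do not come from either answer:

```bash
#!/bin/bash
f=$'\n\tShare1   Disk\n\tShare2  Disk\n\tPrnt1  Printer'
re='([^[:space:]]+)[[:space:]]+Disk'
rest=$f      # hypothetical name: the part of the input not yet scanned
names=()     # hypothetical name: collected capture groups

while [[ $rest =~ $re ]]; do
    names+=("${BASH_REMATCH[1]}")
    # Drop everything up to and including this match, then try again.
    # Quoting the expansion makes the matched text a literal, not a glob.
    rest=${rest#*"${BASH_REMATCH[0]}"}
done

printf 'New match: %s\n' "${names[@]}"
```

The loop terminates because every successful match is non-empty, so rest gets strictly shorter on each pass.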

If the thing you don't want to match is a specific character, using a negated character class is simple, elegant, convenient, and compatible back to the dark beginnings of Ken Thompson's original regular expression library. In the OP's example, it looks like you want to skip over a newline and a tab, then match any characters which are not literal spaces.

re=$'\n\t([^ ]+)'
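To see why the negated class stands in for a non-greedy match, it can help to compare it with .+ on a single line of the input. This is only an illustration; the variable names are arbitrary, and the trailing " +Disk" is added to both patterns so that the difference shows up in the capture group:

```bash
#!/bin/bash
line=$'\tShare1   Disk'

# Greedy: .+ takes as much as it can, leaving one space for " +".
greedy=$'\t(.+) +Disk'
[[ $line =~ $greedy ]] && greedy_capture=${BASH_REMATCH[1]}

# Negated class: the capture cannot contain a space at all.
negated=$'\t([^ ]+) +Disk'
[[ $line =~ $negated ]] && negated_capture=${BASH_REMATCH[1]}

printf 'greedy:  [%s]\n' "$greedy_capture"
printf 'negated: [%s]\n' "$negated_capture"
```

The brackets in the output make it visible that the .+ capture swallows the extra spaces after the name, while the negated class stops at the first one.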

But probably in this case a better solution is to actually use parameter expansions in a loop.

f=$'\n\tShare1   Disk\n\tShare2  Disk\n\tPrnt1  Printer'
result=()
f=${f#$'\n\t'}      # trim any newline + tab prefix
while true; do
  case $f in
    *\ Disk*)
        d=${f%% *}           # capture up to just before first space
        result+=("$d")
        [[ $f == *$'\n\t'* ]] || break   # last line reached; avoid looping forever
        f=${f#*$'\n\t'}     # trim up to next newline + tab
        ;;
    *)
        break ;;
  esac
done
echo "${result[@]}"
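The loop above assumes that the first remaining line is always a Disk line; if a Printer entry came first, its name would be captured instead. A per-line variant avoids that assumption. This is a sketch only, still without regex; the names shares and name are illustrative, and the sample input is reordered on purpose to show the difference:

```bash
#!/bin/bash
f=$'\n\tPrnt1  Printer\n\tShare1   Disk\n\tShare2  Disk'
shares=()   # hypothetical name for the result array

while IFS= read -r line; do
    case $line in
        *' Disk')
            name=${line#$'\t'}      # drop the leading tab
            name=${name%% *}        # keep everything before the first space
            shares+=("$name")
            ;;
    esac
done <<<"$f"

echo "${shares[@]}"
```

Because each line is tested on its own, entries of other types are skipped wherever they appear in the input.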

Tags: regex, bash, regex-greedy

Forivin
