
Using Interval expressions with bash extended globbing

Tags:

bash

shell

I know that bash supports extended globs with regular-expression-like operators such as @(foo|bar), *(foo) and ?(foo). This syntax differs from that of EREs: extended globs use a prefix notation (the operator appears before its operands) rather than the postfix notation of EREs.

I'm wondering whether it also supports interval expressions of the form {n,m}, i.e. with one number in the braces the preceding regexp is repeated n times, and with two numbers separated by a comma it is repeated n to m times. I couldn't find any documentation suggesting that extended globs support this.

Actual Question

I came across a requirement in one of the questions today: remove only one pair of trailing zeroes from a string. I am trying to solve this with the extended glob support in bash.

Given some sample strings like

foobar0000
foobar00
foobar000

should produce

foobar00
foobar
foobar0

I tried using extended glob with parameter expansion, starting from

x='foobar000'

I used the interval expression below, although it seemed obvious to me that it wouldn't work:

echo ${x%%+([0]{2})}

i.e. something similar to sed in ERE mode, sed -E 's/[0]{2}$//', or in BRE mode, sed 's/[0]\{2\}$//'.

So my question is: is this possible using any of the extended glob operators? I'm looking for answers specific to the extended glob support in bash, and I would accept "No" if it is not possible.

asked May 31 '18 18:05 by Inian


1 Answer

I managed to find a way to do this within the confines of bash.

Are interval glob-expressions implemented in bash?

No! In contrast to other shells such as ksh93 and zsh, bash does not implement interval expressions for globbing.
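
As a small sketch of what this means in practice: with extglob enabled, bash treats the braces in a pattern such as +([0]{2}) as literal characters, so the pattern only matches the literal text 0{2}. The variable names pat, x, y, unchanged and stripped are made up for this illustration.

```shell
#!/usr/bin/env bash
shopt -s extglob

# Keep the pattern in a variable so it is interpreted when it is expanded.
pat='+([0]{2})'

x='foobar000'
y='foobar0{2}'

# No interval semantics: the trailing zeroes are left alone.
unchanged=${x%%$pat}
# The braces are literal, so only the text "0{2}" is stripped.
stripped=${y%%$pat}

echo "$unchanged"
echo "$stripped"
```

This should print foobar000 followed by foobar, which shows that {2} is not read as a repetition count.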

Can we mimic interval expressions in bash?

Yes! However, it is not very practical, and for larger repetition counts it helps to generate the pattern with printf. The idea is to build a glob expression that mimics the {n,m} interval using the ksh-globs @(pattern) and ?(pattern).

In the explanation below, we assume that the pattern is stored in the variable p.

  • Match n occurrences of the given pattern ({n}):

    The idea is to repeat the pattern n times. For large n you can use printf

    $ var="foobar01010"
    $ echo ${var%%@(0|1)@(0|1)}
    foobar010
    

    or

    $ var="foobar01010"
    $ p=$(printf "@(0|1)%.0s" {1..4})
    $ echo ${var%%$p}
    foobar0
    
  • Match at least n occurrences of the given pattern ({n,}):

    It is the same as before, but with an additional *(pattern)

    $ var="foobar01010"
    $ echo ${var%%@(0|1)@(0|1)*(0|1)}
    foobar
    

    or

    $ var="foobar01010"
    $ p="(0|1)"
    $ q=$(printf "@$p%.0s" {1..4})
    $ echo ${var%%$q*$p}
    foobar
    
  • Match from n to m occurrences of the given pattern ({n,m}):

    The interval expression {n,m} means n mandatory occurrences followed by up to m-n optional ones. These can be constructed using the ksh-globs @(pat) n times and ?(pat) m-n times. For n=2 and m=3, this leads to:

    $ var="foobar01010"
    $ echo ${var%%@(0|1)@(0|1)?(0|1)}
    foobar01
    

    or

    $ n=2 m=3
    $ p="(0|1)"
    $ q=$(printf "@$p%.0s" $(seq 1 $n))$(printf "?$p%.0s" $(seq $((n+1)) $m))
    $ var="foobar01010"; echo ${var%%$q}
    foobar01
    $ var="foobar00200"; echo ${var%%$q}
    foobar002
    $ var="foobar00020"; echo ${var%%$q}
    foobar00020
    

    Another way to construct the interval expression {n,m} is with the ksh-glob !(pat), "anything but the pattern", which lets us say: give me everything, except...

    man bash: !(pattern-list): Matches anything except one of the given patterns

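    This way, for n=2 and m=3, we can exclude everything that is not a run of the pattern, every run of more than m occurrences, and every run of fewer than n occurrences: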

    $ echo ${var%%!(!(*$p)|@$p@$p@$p+$p|?$p)}
    

    or

    $ n=2 m=3
    $ p="(0|1)"
    $ pn=$(printf "?$p%.0s" $(seq 1 $((n-1))))   # at most n-1 occurrences
    $ pm=$(printf "@$p%.0s" $(seq 1 $m))         # m occurrences; with +$p: more than m
    $ echo ${var%%!(!(*$p)|$pm+$p|$pn)}
    

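    Note: the double exclusion is needed because of the or (|) in the pattern list. Without the inner !(*$p), the outer !(...) would also match strings that are not repetitions of the pattern at all.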
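
The three cases above can be wrapped in one small function. This is only a sketch under stated assumptions: interval_glob is a hypothetical helper name (not a bash builtin), it builds the pattern in a loop instead of with printf and brace expansion so that the counts can come from variables, and it relies on extglob being enabled when the pattern is used.

```shell
#!/usr/bin/env bash
shopt -s extglob

# interval_glob PATTERN-LIST MIN [MAX]
# Hypothetical helper: prints an extglob pattern that mimics
# {MIN,MAX}(PATTERN-LIST). If MAX is omitted, there is no upper bound.
interval_glob() {
  local list=$1 min=$2 max=${3-} out='' i
  for ((i = 0; i < min; i++)); do
    out+="@($list)"        # mandatory occurrences
  done
  if [[ -z $max ]]; then
    out+="*($list)"        # {min,}: any number of further occurrences
  else
    for ((i = min; i < max; i++)); do
      out+="?($list)"      # optional occurrences up to max
    done
  fi
  printf '%s' "$out"
}

var='foobar01010'

exact=$(interval_glob '0|1' 4 4)     # mimics {4}
atleast=$(interval_glob '0|1' 2)     # mimics {2,}
range=$(interval_glob '0|1' 2 3)     # mimics {2,3}

echo "${var%%$exact}"
echo "${var%%$atleast}"
echo "${var%%$range}"

# The case from the question: remove exactly one pair of trailing zeroes.
pair=$(interval_glob '0' 2 2)
for x in foobar0000 foobar00 foobar000; do
  echo "${x%$pair}"
done
```

With the sample string, the first three lines printed should be foobar0, foobar and foobar01, and the loop should print foobar00, foobar and foobar0, matching the output asked for in the question.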

What about other shells?

KSH93

The interval expression {n,m} has been implemented in ksh93:

man ksh:

  • {n}(pattern-list) Matches n occurrences of the given patterns.
  • {m,n}(pattern-list) Matches from m to n occurrences of the given patterns. If m is omitted, 0 will be used. If n is omitted at least m occurrences will be matched.
$ echo ${var%%{2,3}(0|1)}

ZSH

Also zsh has a form of interval expression. It is a globbing flag which is part of the EXTENDED_GLOB option:

man zshall:

(#cN,M) The flag (#cN,M) can be used anywhere that the # or ## operators can be used except in the expressions (*/)# and (*/)## in filename generation, where / has special meaning; it cannot be combined with other globbing flags and a bad pattern error occurs if it is misplaced. It is equivalent to the form {N,M} in regular expressions. The previous character or group is required to match between N and M times, inclusive. The form (#cN) requires exactly N matches; (#c,M) is equivalent to specifying N as 0; (#cN,) specifies that there is no maximum limit on the number of matches.

$ echo ${var%%(0|1)(#c2,3)}
answered Oct 25 '22 18:10 by kvantour