
Bash Regular Expression -- Can't seem to match any of \s \S \d \D \w \W etc

Tags: regex, bash

I have a script that is trying to get blocks of information from gparted.

My data looks like:

Disk /dev/sda: 42.9GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End     Size    Type     File system     Flags
 1      1049kB  316MB   315MB   primary  ext4            boot
 2      316MB   38.7GB  38.4GB  primary  ext4
 3      38.7GB  42.9GB  4228MB  primary  linux-swap(v1)

log4net.xml
Model: VMware Virtual disk (scsi)
Disk /dev/sdb: 42.9GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End     Size    Type     File system     Flags
 1      1049kB  316MB   315MB   primary  ext4            boot
 5      316MB   38.7GB  38.4GB  primary  ext4
 6      38.7GB  42.9GB  4228MB  primary  linux-swap(v1)

I use a regex to break this into two Disk blocks:

^Disk (/dev[\S]+):((?!Disk)[\s\S])*

This works with multiline on.

When I test this in a bash script, I can't seem to match \s, or \S -- What am I doing wrong?

I am testing this through a script like:

data=`cat disks.txt`
morematches=1
x=0
regex="^Disk (/dev[\S]+):((?!Disk)[\s\S])*"

if [[ $data =~ $regex ]]; then
echo "Matched"
while [ $morematches == 1 ]
do
        x=$[x+1]
        if [[ ${BASH_REMATCH[x]} != "" ]]; then
                echo $x "matched" ${BASH_REMATCH[x]}
        else
                echo $x "Did not match"
                morematches=0;
        fi

done

fi

However, when I test parts of the regex one at a time, any part that contains \s or \S fails to match. What am I doing wrong?

Yablargo asked Aug 29 '13 14:08




2 Answers

Perhaps \S and \s are not supported, or you cannot place them inside [ ] (in a POSIX bracket expression a backslash is just a literal character). Try the following regex instead:

^Disk[[:space:]]+/dev[^[:space:]]+:[[:space:]]+[^[:space:]]+

EDIT

It seems you actually want the matched fields, so I simplified the script to this:

#!/bin/bash 

regex='^Disk[[:space:]]+(/dev[^[:space:]]+):[[:space:]]+(.*)'

while read -r line; do
    [[ $line =~ $regex ]] && echo "${BASH_REMATCH[1]} matches ${BASH_REMATCH[2]}."
done < disks.txt

Produces:

/dev/sda matches 42.9GB.
/dev/sdb matches 42.9GB.
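
If whole Disk blocks are wanted rather than only the header fields, the same line-by-line idea can be extended so that no lookahead is needed. The following is only a sketch under stated assumptions: the sample data is inlined and shortened so it runs on its own (the real script would read disks.txt), and the names data, header, devices and blocks are made up for illustration.

```shell
#!/bin/bash
# Shortened sample input, inlined for illustration; normally this comes from disks.txt.
data='Disk /dev/sda: 42.9GB
Partition Table: msdos
 1      1049kB  316MB   315MB   primary  ext4            boot

Disk /dev/sdb: 42.9GB
Partition Table: msdos
 5      316MB   38.7GB  38.4GB  primary  ext4'

# Header pattern written with POSIX classes only (no \S, no lookahead).
header='^Disk[[:space:]]+(/dev/[^[:space:]]+):'

devices=()   # device name of each block
blocks=()    # full text of each block, same index as devices
n=-1         # index of the block currently being collected

while IFS= read -r line; do
    if [[ $line =~ $header ]]; then
        # A new "Disk" header starts a new block.
        n=$((n + 1))
        devices[n]=${BASH_REMATCH[1]}
        blocks[n]=$line
    elif (( n >= 0 )); then
        # Any other line belongs to the block that is currently open.
        blocks[n]+=$'\n'$line
    fi
done <<< "$data"

for i in "${!devices[@]}"; do
    printf '== %s ==\n%s\n' "${devices[i]}" "${blocks[i]}"
done
```

Lines that appear before the first Disk header are dropped here; whether that is the right behaviour depends on the real input.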
konsolebox answered Sep 27 '22 18:09


Because this is a common FAQ, let me list a few constructs which are not supported in Bash, and how to work around them, where there is a simple workaround.

There are multiple dialects of regular expressions in common use. The one supported by Bash is a variant of Extended Regular Expressions. This is different from e.g. what many online regex testers support, which is often the more modern Perl 5 / PCRE variant.

  • Bash doesn't portably support \d \D \s \S \w \W (some C libraries, glibc for one, accept a few of these as an extension, but never inside [ ]) -- these can be replaced with POSIX character class equivalents [[:digit:]], [^[:digit:]], [[:space:]], [^[:space:]], [_[:alnum:]], and [^_[:alnum:]], respectively. (Notice the last case, where the [:alnum:] POSIX character class is augmented with underscore to be exactly equivalent to the Perl \w shorthand.)
  • Bash doesn't support non-greedy matching. You can sometimes replace a.*?b with something like a[^ab]*b to get a similar effect in practice, though the two are not exactly equivalent.
  • Bash doesn't support non-capturing parentheses (?:...). In the trivial case, just use capturing parentheses (...) instead; though of course, if you use capture groups and/or backreferences, this will renumber your capture groups.
  • Bash doesn't support lookarounds like (?<=before) or (?!after) and in fact anything with (? is a Perl extension. There is no simple general workaround for these, though you can often rephrase your problem into one where lookarounds can be avoided.
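
As a rough illustration of the first two workarounds, here is a minimal sketch. The sample strings and variable names are invented for the example, and the patterns are kept in variables so that the shell does not interpret the regex metacharacters.

```shell
#!/bin/bash
# Hypothetical sample line, modeled on the data in the question.
line='Disk /dev/sda: 42.9GB'

# \S+ becomes [^[:space:]]+, \s+ becomes [[:space:]]+, \d becomes [[:digit:]].
regex='^Disk[[:space:]]+(/dev/[^[:space:]]+):[[:space:]]+([[:digit:].]+)([[:alpha:]]+)$'

if [[ $line =~ $regex ]]; then
    device=${BASH_REMATCH[1]}
    size=${BASH_REMATCH[2]}
    unit=${BASH_REMATCH[3]}
    echo "device=$device size=$size unit=$unit"
fi

# Non-greedy workaround: a negated class stops at the first closing
# delimiter, where PCRE would use <.*?>.
text='<a><b>'
greedy='<.*>'
lazy='<[^<>]*>'

[[ $text =~ $greedy ]] && longest=${BASH_REMATCH[0]}
[[ $text =~ $lazy ]] && shortest=${BASH_REMATCH[0]}
echo "greedy=$longest lazy-like=$shortest"
```

The negated-class pattern only mimics lazy matching when the delimiter is a single character, which is the "not exactly equivalent" caveat from the list above.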
tripleee answered Sep 27 '22 18:09