 

Bash script pattern matching

I need to find patterns that are 6 digits long, where the first 3 digits are specific digits and the remaining 3 can be any digit. For example, 6-digit strings starting with 123 followed by any 3 digits.

var1="abc,123111,"
var2="abcdefg,123222,"
var3="xyzabc,987111,"

if [[ $var1 == *",123ddd,"* ]] ; then echo "Pattern matched"; fi

Here ddd stands for any three digits. var1 and var2 should match the pattern, but var3 should not. I can't seem to get it quite right.

Bdgisme asked Jun 22 '17 01:06




2 Answers

Use a character class: [0-9] matches 0, 9, and every character between them in the character set. At least in Unicode (e.g. UTF-8) and its subset character sets (e.g. US-ASCII, Latin-1), the characters between 0 and 9 are exactly the digits 1 through 8, so [0-9] matches any one of the 10 Latin digits.

if [[ $var1 == *,123[0-9][0-9][0-9],* ]] ; then echo "Pattern matched"; fi
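A minimal sketch that runs the glob test above against the three sample strings from the question. The helper name `has_123_code` is hypothetical, introduced only to make the check reusable:

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds if $1 contains ",123" followed by exactly
# three digits and a comma. A glob must match the whole string, hence the
# leading and trailing *.
has_123_code() {
    [[ $1 == *,123[0-9][0-9][0-9],* ]]
}

var1="abc,123111,"
var2="abcdefg,123222,"
var3="xyzabc,987111,"

for v in "$var1" "$var2" "$var3"; do
    if has_123_code "$v"; then
        echo "$v: Pattern matched"
    else
        echo "$v: no match"
    fi
done
```

With these inputs, var1 and var2 report a match and var3 does not, which is the behavior the question asks for.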

Using =~ instead of == changes the pattern type from standard shell "glob" patterns to POSIX extended regular expressions ("regexes" for short). You can make an equivalent regex a little shorter:

if [[ $var1 =~ ,123[0-9]{3}, ]] ; then echo "Pattern matched"; fi

The first shortening comes from the fact that a regex only has to match any part of the string, not the whole thing. Therefore you don't need the equivalent of the leading and trailing *s that you find in the glob pattern.
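A small sketch of that difference, using two hypothetical helper names. The glob without surrounding `*`s fails because it must match the entire string, while the unanchored regex succeeds on a substring. Bash also stores the matched substring in the `BASH_REMATCH` array, which is a convenient way to see what the regex actually matched:

```shell
#!/usr/bin/env bash

# Hypothetical helpers for illustration.
# Glob: must match the ENTIRE string, so this has no leading/trailing *.
glob_whole() { [[ $1 == ,123[0-9][0-9][0-9], ]]; }
# Regex: only has to match some PART of the string.
regex_part() { [[ $1 =~ ,123[0-9]{3}, ]]; }

s="abc,123111,"

if glob_whole "$s"; then echo "glob: matched"; else echo "glob: no match"; fi

if regex_part "$s"; then
    # BASH_REMATCH[0] holds the portion of $s matched by the whole regex.
    echo "regex: matched '${BASH_REMATCH[0]}'"
else
    echo "regex: no match"
fi
```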

The second length reduction is due to the {n} syntax, which lets you specify an exact number of repetitions of the previous pattern instead of actually repeating the pattern itself in the regex. (You can also match any of a range of repetition counts by specifying a minimum and maximum, such as [0-9]{2,4} to match either two, three, or four digits in a row.)
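To illustrate the `{min,max}` form, here is a sketch with a hypothetical helper `is_2_to_4_digits`. It adds the anchors `^` and `$`, which are not part of the pattern above: without them, any string containing two adjacent digits would match, since a regex only needs to match part of the string.

```shell
#!/usr/bin/env bash

# Hypothetical helper: the whole string must be 2, 3, or 4 digits.
# ^ and $ anchor the regex to the start and end of the string.
is_2_to_4_digits() {
    [[ $1 =~ ^[0-9]{2,4}$ ]]
}

for s in 7 42 123 2017 12345; do
    if is_2_to_4_digits "$s"; then
        echo "$s: matches"
    else
        echo "$s: does not match"
    fi
done
```

Here 42, 123, and 2017 match, while 7 (too short) and 12345 (too long) do not.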

It's worth noting that you could also use a named character class to match digits. Depending on your locale, [[:digit:]] may be exactly equivalent to [0-9], or it may include characters from other scripts with the Unicode "Number, Decimal Digit" property.

if [[ $var1 =~ ,123[[:digit:]]{3}, ]] ; then echo "Pattern matched"; fi
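A quick sketch checking the `[[:digit:]]` form against the sample strings from the question. The helper name is hypothetical; with plain ASCII input like this, it behaves the same as the `[0-9]{3}` version:

```shell
#!/usr/bin/env bash

# Hypothetical helper: same test as ,123[0-9]{3}, but written with
# the named character class [[:digit:]].
has_123_code_digit_class() {
    [[ $1 =~ ,123[[:digit:]]{3}, ]]
}

for v in "abc,123111," "abcdefg,123222," "xyzabc,987111,"; do
    if has_123_code_digit_class "$v"; then
        echo "$v: Pattern matched"
    else
        echo "$v: no match"
    fi
done
```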
Mark Reed answered Oct 31 '22 17:10


In Bash glob pattern matching, [0-9] can be used to match a digit:

if [[ $var1 == *,123[0-9][0-9][0-9],* ]] ; then echo "Pattern matched"; fi

Alternatively, you can use regex pattern matching with =~:

if [[ $var1 =~ .*,123[0-9]{3},.* ]] ; then echo "Pattern matched"; fi
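The leading and trailing `.*` in this regex are harmless but not required, because `=~` already matches anywhere in the string. A small sketch comparing the two forms on the sample strings (the function names are hypothetical):

```shell
#!/usr/bin/env bash

# Hypothetical helpers: the same regex with and without surrounding .*
with_dots()    { [[ $1 =~ .*,123[0-9]{3},.* ]]; }
without_dots() { [[ $1 =~ ,123[0-9]{3}, ]]; }

for v in "abc,123111," "abcdefg,123222," "xyzabc,987111,"; do
    a=no; with_dots "$v" && a=yes
    b=no; without_dots "$v" && b=yes
    echo "$v: with .* -> $a, without .* -> $b"
done
```

Both forms agree on every input: yes, yes, and no for the three strings.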
zhenguoli answered Oct 31 '22 17:10