
Using match to find substrings in strings with only bash

Tags: regex, bash, match

Although I am almost sure this has been covered, I can't seem to find anything specific to this. As I continue my journey on learning bash I keep finding parts where I am baffled as to why things happen the way they do.

Searching and replacing, or just matching sub-strings in strings, is most likely one of the first things you do when writing scripts. But sticking to one single language or set of tools is difficult in bash, as you can solve most problems in multiple ways. I am doing my best to stay as low level as possible with bash, and I have run into a snag that I need someone to explain to me.

Doing a sub-string search in bash with match gives me different results depending on the regular expression I use, and I am not sure why.

#!/bin/bash
Stext="Hallo World"
echo `expr "$Stext" : '^\(.[a-z]*\)'` # Hallo
echo `expr "$Stext" : '.*World'`      # 11

Both expressions search for a word, I think, yet only the first returns what it finds; the second returns a number. Why?

Adesso asked Mar 07 '12 08:03




2 Answers

You can use the BASH_REMATCH variable in bash to get the matched string:

$ Stext="Hallo World"
$ [[ $Stext =~ ^.[a-z]* ]] && echo "$BASH_REMATCH"
Hallo
$ [[ $Stext =~ ^(.[a-z]*) ]] && echo "${BASH_REMATCH[1]}"
Hallo

Substrings matched by parenthesized subexpressions within the regular expression are saved in the array variable BASH_REMATCH. The element of BASH_REMATCH with index 0 is the portion of the string matching the entire regular expression. The element of BASH_REMATCH with index n is the portion of the string matching the nth parenthesized subexpression.
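
As a rough sketch of how those indices line up, here is a pattern with two groups. The regex and the variable names re, whole, first and second are my own assumptions for illustration, not something from the question:

```shell
#!/bin/bash
# Illustrative only: the pattern and the variable names are assumptions.
Stext="Hallo World"
re='^([A-Z][a-z]*) ([A-Z][a-z]*)$'   # regex kept in a variable so quoting stays simple

if [[ $Stext =~ $re ]]; then
    whole=${BASH_REMATCH[0]}    # portion matching the entire regular expression
    first=${BASH_REMATCH[1]}    # first parenthesized subexpression
    second=${BASH_REMATCH[2]}   # second parenthesized subexpression
    echo "whole:  $whole"
    echo "first:  $first"
    echo "second: $second"
    echo "length: ${#whole}"    # length of the match, without calling expr
fi
```

With the sample string this should print "Hallo World", "Hallo", "World" and 11. The last line is one way to get the match length in bash itself rather than through an external command.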

kev answered Oct 11 '22 15:10


Both commands work the same way; the difference is in the regular expression you use:

$ echo `expr "$Stext" : '^\(.[a-z]*\)'`
Hallo
$ echo `expr "$Stext" : '^.[a-z]*'`
5
$ echo `expr "$Stext" : '\(.*World\)'`
Hallo World
$ echo `expr "$Stext" : '.*World'`
11

As you can see, the parentheses make the difference: with \( \) expr returns the matched text itself, and without them it returns the length of the match.
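
To make that easier to use in a script, here is a small sketch that stores each result in a variable. The variable names (len, str, tail, miss) and the last two patterns are my own additions for illustration. It also shows a detail that may be surprising: expr anchors the pattern at the start of the string, so a pattern that only fits the end of the string does not match at all.

```shell
#!/bin/bash
# Illustrative only: variable names and the last two patterns are assumptions.
Stext="Hallo World"

len=$(expr "$Stext" : '.*World')        # no \( \): number of characters matched
str=$(expr "$Stext" : '\(.*World\)')    # with \( \): the text captured by the group
tail=$(expr "$Stext" : '.* \(.*\)')     # the group may capture only part of the match

# The pattern is anchored at the start of the string, so 'World' alone
# matches nothing: expr prints 0 and exits with a non-zero status.
miss=$(expr "$Stext" : 'World') || true

echo "len:  $len"
echo "str:  $str"
echo "tail: $tail"
echo "miss: $miss"
```

With the sample string this should give 11, "Hallo World", "World" and 0. The `|| true` is only there so the failed match does not abort a script running under `set -e`.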

You can find more examples in Chapter 10 of the Advanced Bash-Scripting Guide.

jcollado answered Oct 11 '22 14:10