Shell script: how to extract a string using regular expressions

Tags: regex, shell, curl

I am new to shell scripting. I want to send an HTTP request using curl and then extract a string from the response using regular expressions. For example, how can I extract a domain name from an HTTP response? (The example is for learning purposes only.)

    #!/bin/bash
    name=$(curl -s google.com | grep "www\..*com")
    echo "domain name is"
    echo "$name"


asked Nov 02 '13 00:11

ashim

1 Answer

Using bash regular expressions:

    re="http://([^/]+)/"
    if [[ $name =~ $re ]]; then echo "${BASH_REMATCH[1]}"; fi
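Putting the question and the answer together, here is a minimal sketch that wraps the match in a function. The name extract_domain and the sample string are made up for illustration; the sample only loosely resembles a redirect page and is not claimed to be Google's actual response. The live curl call is left commented out because it needs network access.

```shell
#!/usr/bin/env bash
# extract_domain is a hypothetical helper, not part of bash.
# It prints the host from the first "http://host/" in its argument,
# or returns 1 if there is no such substring.
extract_domain() {
  local text=$1
  local re='http://([^/]+)/'   # regex kept in a variable, as in the answer
  if [[ $text =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
    return 0
  fi
  return 1
}

# Offline demo with a made-up response body:
sample='<A HREF="http://www.google.com/">here</A>'
extract_domain "$sample"

# With a live request (needs network), something like:
#   name=$(curl -s google.com)
#   extract_domain "$name" || echo "no domain found"
```

Returning a status code lets the caller decide what to do when nothing matches, instead of silently printing an empty line.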

Edit: the OP asked for an explanation of the syntax. Regular expression syntax is a large topic that I can't cover in full here, but I will try to explain enough to understand the example.

    re="http://([^/]+)/"

This is the regular expression, stored in a bash variable named re. It describes what your input string must match and, hopefully, the substring you want to extract. Breaking it down:

http:// is a literal string: the input must contain this substring for the regular expression to match.
[]: square brackets normally mean "match any one character within the brackets", so c[ao]t matches both "cat" and "cot". A ^ as the first character inside the brackets changes this to "match any character except those within the brackets", so [^/] matches any character other than "/".
+: a square bracket expression matches only one character. Adding + after it means "match 1 or more of the preceding sub-expression", so [^/]+ matches a run of one or more characters, none of which is "/".
(): putting parentheses around a subexpression says that you want to save whatever matched it for later processing. If your language supports this, it provides some mechanism to retrieve these submatches; in bash, it is the BASH_REMATCH array.
/: finally, an exact match on "/" makes sure we match all the way to the end of the fully qualified domain name and the "/" that follows it.
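To see each piece of the breakdown in isolation, here is a small sketch using bash's =~ operator. first_match is a hypothetical helper that prints the whole match (BASH_REMATCH[0]) or fails when there is none; the test strings are made up.

```shell
#!/usr/bin/env bash
# first_match is a hypothetical helper: prints the whole match,
# or returns 1 when the pattern does not match the input.
first_match() {
  local re=$1 input=$2
  if [[ $input =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[0]}"
  else
    return 1
  fi
}

first_match 'c[ao]t' 'cat'                       # [ao] allows "a"...
first_match 'c[ao]t' 'cot'                       # ...or "o"
first_match 'c[ao]t' 'cut' || echo 'no match'    # "u" is not in the set
first_match '[^/]+' 'www.google.com/search'      # the run stops before the first "/"
first_match 'http://([^/]+)/' 'see http://www.google.com/ here'
first_match 'http://([^/]+)/' 'http://www.google.com' || echo 'no match'  # no trailing "/"
```

The last line shows why the closing "/" matters: without it after the host, the whole pattern fails.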

Next, we have to test the input string against the regular expression to see if it matches. We can use a bash conditional to do that:

    if [[ $name =~ $re ]]; then
        echo "${BASH_REMATCH[1]}"
    fi

In bash, [[ ]] denotes an extended conditional test, which may contain the =~ regular expression operator. Here we test whether the input string $name matches the regular expression $re. If it does, the construction of the regular expression guarantees a submatch (from the parentheses), which we can access through the BASH_REMATCH array:

Element 0, ${BASH_REMATCH[0]}, holds the entire string matched by the regular expression, e.g. "http://www.google.com/".
The following elements hold the submatches. A regular expression can contain several parenthesized () submatches, and the BASH_REMATCH elements correspond to them in the order their opening parentheses appear. So in this case ${BASH_REMATCH[1]} contains "www.google.com", which I think is the string you want.
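A sketch with two capture groups makes the numbering concrete. The pattern adds (https?) in front of the answer's host group; the input text is made up.

```shell
#!/usr/bin/env bash
# Two capture groups: (https?) for the scheme, ([^/]+) for the host.
text='see http://www.google.com/ for details'
re='(https?)://([^/]+)/'

if [[ $text =~ $re ]]; then
  # Index 0 is the whole match; 1 and 2 follow the order of the "(" groups.
  for i in "${!BASH_REMATCH[@]}"; do
    printf '%s: %s\n' "$i" "${BASH_REMATCH[$i]}"
  done
fi
```

This prints the full match "http://www.google.com/" at index 0, "http" at index 1 and "www.google.com" at index 2.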

Note that the BASH_REMATCH array reflects only the most recent use of the =~ operator. If you go on to do more regular expression matches, you must save the contents you need from this array before each new match.
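One place this matters is collecting every domain in a string, since bash's =~ only finds the first match. A minimal sketch, assuming a made-up input string: copy each submatch into a regular array right away, then trim the matched text and test again.

```shell
#!/usr/bin/env bash
# Collect every "http://host/" domain in a string. BASH_REMATCH is
# overwritten by each =~ test, so each submatch is copied out immediately.
text='http://www.google.com/ then http://example.org/ and http://a.b.c/'
re='http://([^/]+)/'
domains=()
rest=$text
while [[ $rest =~ $re ]]; do
  domains+=("${BASH_REMATCH[1]}")       # save before the next =~ replaces it
  rest=${rest#*"${BASH_REMATCH[0]}"}   # drop text up to and including this match
done
printf '%s\n' "${domains[@]}"
```

The quotes inside ${rest#*"..."} make the matched text a literal pattern, so characters like "." in it are not treated specially.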

This may seem like a lengthy description, but I have glossed over many of the intricacies of regular expressions. They are powerful and, I believe, perform reasonably well, but the syntax is complex. Implementations also vary, so different languages support different features and may differ subtly in syntax. Escaping characters within a regular expression is a particularly thorny issue, especially when those characters have a different meaning in the surrounding language (here, the shell).
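As a small escaping example: in a regular expression "." matches any single character, so matching a literal dot needs "\.". A sketch with made-up inputs; single quotes keep the backslashes intact so bash hands them to the regex engine unchanged.

```shell
#!/usr/bin/env bash
# "." matches any character; "\." matches only a literal dot.
# Single quotes stop the shell from interpreting the backslashes.
re_loose='^www.google.com$'
re_strict='^www\.google\.com$'

for s in 'www.google.com' 'wwwXgoogleYcom'; do
  [[ $s =~ $re_loose ]]  && loose=match  || loose='no match'
  [[ $s =~ $re_strict ]] && strict=match || strict='no match'
  printf '%-16s loose: %-9s strict: %s\n' "$s" "$loose" "$strict"
done
```

Both strings match the loose pattern, but only "www.google.com" matches the strict one.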

Instead of setting $re on a separate line and referring to the variable in the condition, you can put the regular expression directly into the condition. However, bash 3.2 changed how quotes around such a literal regular expression are treated: from 3.2 on, any quoted part of the pattern is matched as a literal string rather than as a regular expression. Putting the regular expression in a separate variable is a straightforward way around this, so that the condition works as expected in all bash versions that support the =~ operator.
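A short sketch of the quoting difference, assuming bash 3.2 or later with the compat31 shell option left off (the default). The pattern c[ao]t is reused from the breakdown above.

```shell
#!/usr/bin/env bash
# Assumes bash >= 3.2 without "shopt -s compat31".
re='c[ao]t'

# Unquoted variable: its contents are used as a regular expression.
[[ cat =~ $re ]] && echo 'unquoted $re: "cat" matches the regex c[ao]t'

# Quoted: the same text is matched as a literal string instead.
[[ cat =~ "$re" ]] || echo 'quoted "$re": "cat" does not contain the text c[ao]t'
[[ 'c[ao]t' =~ "$re" ]] && echo 'quoted "$re": only the literal text c[ao]t matches'
```

Leaving $re unquoted on the right of =~ is what makes the separate-variable approach behave the same way across versions.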


answered Sep 22 '22 03:09

Digital Trauma


