
Extract string from string using RegEx in the Terminal [duplicate]

Tags: regex, grep, bash

I have a string like first url, second url, third url and would like to extract only the url after the word second in the OS X Terminal (only the first occurrence). How can I do it?

In my favorite editor I used the regex /second (url)/ and referred to the match with $1, but I don't know how to do the same in the Terminal.

Keep in mind that url stands for an actual URL. I'll be using one of the expressions from "Regex to match URL" to match it.

fregante asked Aug 20 '10 16:08


3 Answers

echo 'first url, second url, third url' | sed 's/.*second//'

Edit: I misunderstood. Better:

echo 'first url, second url, third url' | sed 's/.*second \([^, ]*\).*/\1/'

or:

echo 'first url, second url, third url' | perl -nle 'print $1 if /second ([^, ]*)/'
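
One caveat with the sed form: when the input does not contain the word second, the substitution does not happen and sed prints the line unchanged. A minimal sketch, assuming you would rather get no output in that case, uses -n to suppress the default printing and the p flag to print only lines where the substitution succeeded. The function name extract_second and the url1/url2/url3 placeholders are made up for illustration.

```shell
# Hypothetical helper: print the token after "second ", or nothing if the
# keyword is absent. Reads its input from stdin.
extract_second() {
  # -n: do not print lines by default.
  # Trailing p: print only lines where the substitution actually happened.
  sed -n 's/.*second \([^, ]*\).*/\1/p'
}

echo 'first url1, second url2, third url3' | extract_second   # prints: url2
echo 'no keyword here' | extract_second                       # prints nothing
```

Because the leading .* is greedy, this picks the last "second " on a line if the word appears more than once.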
Sjoerd answered Oct 21 '22 13:10


Piping to an external process (like the sed and perl commands suggested above) spawns a new process on every call, which can get expensive when you need to run the operation many times, for example inside a loop. Bash has built-in regular expression matching:

[[ "string" =~ regex ]]

Similar to the way you extract matches in your favourite editor with $1, $2, etc., Bash fills the BASH_REMATCH array: ${BASH_REMATCH[0]} holds the whole match, and ${BASH_REMATCH[1]}, ${BASH_REMATCH[2]}, etc. hold the capture groups.

In your particular example:

str="first url1, second url2, third url3"
if [[ $str =~ (second )([^,]*) ]]; then
  echo "match: '${BASH_REMATCH[2]}'"
else
  echo "no match found"
fi

Output:

match: 'url2'

Specifically, =~ supports extended regular expressions as defined by POSIX, plus platform-specific extensions (which vary in extent and can be incompatible between platforms).
On Linux platforms (GNU userland), see man grep; on macOS/BSD platforms, see man re_format.
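
Putting the pieces above together, here is a minimal sketch of a reusable function. The name extract_after_second is hypothetical, and it assumes the URL ends at the next comma. The pattern is kept in a variable and used unquoted on the right-hand side of =~, because quoting it there would make Bash match it as a literal string. It sticks to plain POSIX constructs (a group and a bracket expression), so it should not depend on the platform-specific extensions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the text that follows "second " in the first
# argument, up to the next comma. Returns 1 if there is no match.
extract_after_second() {
  # Keeping the pattern in a variable avoids quoting surprises.
  local re='second ([^,]*)'
  # $re is left unquoted so it is treated as a regex, not a literal string.
  [[ $1 =~ $re ]] || return 1
  # BASH_REMATCH[0] is the whole match; [1] is the first capture group.
  printf '%s\n' "${BASH_REMATCH[1]}"
}

extract_after_second 'first url1, second url2, third url3'   # prints: url2
```

Since no external process is started, calling this in a loop over many strings avoids the per-call cost mentioned at the top of this answer.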

Dmitry Shevkoplyas answered Oct 21 '22 14:10


With the first command in the other answer, you are still left with everything after the desired URL, so I propose the following solution instead.

echo 'first url, second url, third url' | sed 's/.*second \(url\)*.*/\1/'

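In sed, you group an expression by escaping the parentheses around it; this is the POSIX basic regular expression (BRE) syntax. Here url stands in for whatever pattern matches the real URL.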
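
To illustrate the grouping syntax, here is a small sketch that writes the same extraction twice: once with escaped parentheses (sed's default basic syntax) and once with -E, which switches to extended regular expressions where groups are written without backslashes. It assumes a sed that accepts -E (both GNU sed and the BSD sed shipped with macOS do). The function names and the [^, ]* placeholder for the URL pattern are made up for illustration.

```shell
# Hypothetical helpers; both read a line from stdin.

# Basic regular expressions (sed's default): groups are written \( ... \).
bre_extract() {
  sed 's/.*second \([^, ]*\).*/\1/'
}

# Extended regular expressions (-E): groups are written ( ... ) unescaped.
ere_extract() {
  sed -E 's/.*second ([^, ]*).*/\1/'
}

echo 'first url1, second url2, third url3' | bre_extract   # prints: url2
echo 'first url1, second url2, third url3' | ere_extract   # prints: url2
```

The extended form is usually easier to read once the URL pattern itself contains groups or alternation.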

mhitza answered Oct 21 '22 13:10