
Non-greedy (reluctant) regex matching in sed?

I'm trying to use sed to clean up lines of URLs to extract just the domain.

So from:

http://www.suepearson.co.uk/product/174/71/3816/ 

I want:

http://www.suepearson.co.uk/ 

(either with or without the trailing slash, it doesn't matter)

I have tried:

 sed 's|\(http:\/\/.*?\/\).*|\1|' 

and (escaping the non-greedy quantifier)

sed 's|\(http:\/\/.*\?\/\).*|\1|' 

but I cannot get the non-greedy quantifier (?) to work, so it always ends up matching the whole string.

Joel asked Jul 09 '09 10:07


People also ask

Can you use regex in sed?

The sed command has a long list of supported operations that ease the process of editing text files. It lets users apply expressions commonly used in programming languages; one of the core supported expression types is the regular expression (regex).

Is regex matching greedy?

The standard quantifiers in regular expressions are greedy, meaning they match as much as they can, only giving back as necessary to match the remainder of the regex. By using a lazy quantifier, the expression tries the minimal match first.

What is greedy and non-greedy in regex?

Greedy quantifiers match their preceding element as many times as possible, returning the longest possible match. Non-greedy quantifiers do the opposite: they match as few times as possible, returning the shortest possible match.

What is greedy and non-greedy matching in Python?

So the difference between the greedy and the non-greedy match is the following: The greedy match will try to match as many repetitions of the quantified pattern as possible. The non-greedy match will try to match as few repetitions of the quantified pattern as possible.


1 Answer

Neither basic nor extended POSIX/GNU regex recognizes the non-greedy quantifier; you need a more modern regex flavor. Fortunately, Perl regex for this context is pretty easy to get:

perl -pe 's|(http://.*?/).*|$1|' 
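
If Perl is not an option, a negated character class can often stand in for the non-greedy quantifier in plain sed. The following is a minimal sketch, assuming the host part of the URL contains no slash; the variable names `url` and `domain` are illustrative and the sample URL is the one from the question:

```shell
# Sample input taken from the question (variable names are illustrative).
url='http://www.suepearson.co.uk/product/174/71/3816/'

# [^/]* cannot cross a slash, so the group ends at the first "/" after the host.
# The trailing .* then consumes the rest of the line, which is dropped.
domain=$(printf '%s\n' "$url" | sed 's|\(http://[^/]*/\).*|\1|')

printf '%s\n' "$domain"
```

This uses only basic regular expression syntax, so it should behave the same in GNU and BSD sed. It works here because the delimiter is a single character; a multi-character stop sequence would still call for a true non-greedy match.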
chaos answered Sep 23 '22 23:09