Repeating a regex pattern

Tags: regex, sed

First, I don't know if this is actually possible but what I want to do is repeat a regex pattern. The pattern I'm using is:

sed 's/[^-\t]*\t[^-\t]*\t\([^-\t]*\).*/\1/' films.txt

An input of

250.    7.9    Shutter Island (2010)    110,675

Will return:

Shutter Island (2010)

I'm matching all non-tab characters (250.), then a tab, then all non-tab characters (7.9), then a tab. Next I capture the film title for the backreference, then match all the remaining characters (110,675).

It works fine, but I'm learning regex and this looks ugly: the regex [^-\t]*\t is repeated right after itself. Is there any way to repeat it, the way you can repeat a character with a{2,2}?

I've tried ([^-\t]*\t){2,2} (and variations) but I'm guessing that is trying to match [^-\t]*\t\t?

Also, if there is any way to make my code above shorter and cleaner, any help would be greatly appreciated.

akd5446 asked Dec 01 '22 09:12


2 Answers

This works for me:

sed 's/\([^\t]*\t\)\{2\}\([^\t]*\).*/\2/' films.txt

If your sed supports -r you can get rid of most of the escaping:

sed -r 's/([^\t]*\t){2}([^\t]*).*/\2/' films.txt

Change the first 2 to select different fields (0-3).
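
To see how the repeat count maps to fields, here is a small sketch. It assumes GNU sed (for \t inside the pattern and for -r); the field helper and the sample line are only there for illustration and are not part of the original command.

```shell
# Assumes GNU sed: \t in the pattern and the -r flag are GNU features.
# Sample input line with real tab characters between the fields.
line=$(printf '250.\t7.9\tShutter Island (2010)\t110,675')

# Hypothetical helper: skip $1 tab-terminated fields, then print the next one.
field() {
    printf '%s\n' "$line" | sed -r "s/([^\t]*\t){$1}([^\t]*).*/\2/"
}

field 0   # 250.
field 1   # 7.9
field 2   # Shutter Island (2010)
field 3   # 110,675
```

The count is the number of fields to skip, which is why the range starts at 0.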

This will also work:

sed 's/[^\t]\+/\n&/3;s/.*\n//;s/\t.*//' films.txt

Change the 3 to select different fields (1-4).
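
A rough breakdown of how the three substitutions cooperate, again assuming GNU sed (\t, \+ and \n used this way are GNU extensions). The nth_field helper is a hypothetical wrapper added only for this sketch.

```shell
# Assumes GNU sed: \t, \+ and \n in these positions are GNU extensions.
line=$(printf '250.\t7.9\tShutter Island (2010)\t110,675')

# Stage 1 only: put a newline in front of the 3rd run of non-tab characters.
marked=$(printf '%s\n' "$line" | sed 's/[^\t]\+/\n&/3')
printf '%s\n' "$marked"   # two lines; the second one starts with the title

# All three stages: mark the Nth field, delete everything up to the marker,
# then delete everything from the next tab onwards.
nth_field() {
    printf '%s\n' "$line" | sed "s/[^\t]\+/\n&/$1;s/.*\n//;s/\t.*//"
}

nth_field 1   # 250.
nth_field 3   # Shutter Island (2010)
nth_field 4   # 110,675
```

Here the number is the position of the field itself, so the range starts at 1.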

Dennis Williamson answered Dec 05 '22 16:12


To use curly brackets for repetition and round brackets for grouping in sed, you may have to escape them with backslashes, like this:

sed 's/\([^-\t]*\t\)\{3\}.*/\1/' films.txt

Yes, this command works with your example, although \1 holds the last repetition of the group, so the output keeps the tab that follows the title.

If the backslashes annoy you, you can add the -r option, which enables extended regex mode and lets you drop the backslash escapes on the brackets.

sed -r 's/([^-\t]*\t){3}.*/\1/' films.txt
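
A quick way to check what \1 actually captures when a group is repeated. This is a sketch assuming GNU sed; the sample line, the variable names, and the second command (which moves the title into its own group) are illustrative additions.

```shell
# Assumes GNU sed (for \t in the pattern and the -r flag).
line=$(printf '250.\t7.9\tShutter Island (2010)\t110,675')

# \1 refers to the last repetition of the group, so the tab comes along.
with_tab=$(printf '%s\n' "$line" | sed -r 's/([^-\t]*\t){3}.*/\1/')
printf '[%s]\n' "$with_tab"      # the closing bracket appears after a tab

# One possible variant: repeat the group twice and capture the title separately.
without_tab=$(printf '%s\n' "$line" | sed -r 's/([^-\t]*\t){2}([^-\t]*).*/\2/')
printf '[%s]\n' "$without_tab"   # [Shutter Island (2010)]
```

The brackets in the output make the trailing tab visible, which is easy to miss in a terminal.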

I found that this is almost the same as Dennis Williamson's answer, but I'm leaving it because it's a shorter expression that does the same thing.

Ch.Idea answered Dec 05 '22 15:12