First, I don't know if this is actually possible, but what I want to do is repeat a regex pattern. The pattern I'm using is:
sed 's/[^-\t]*\t[^-\t]*\t\([^-\t]*\).*/\1/' films.txt
An input of (the fields are separated by tabs):
250. 7.9 Shutter Island (2010) 110,675
Will return:
Shutter Island (2010)
I'm matching all non-tabs (250.), then a tab, then all non-tabs (7.9), then a tab. Next I capture the film title in a group for the \1 backreference, then match all remaining characters (110,675).
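To see the command above in action, here is a small sketch. It assumes GNU sed (which understands \t inside bracket expressions) and builds a one-line sample films.txt with real tabs between the fields; the real file is assumed to look like this.

```shell
#!/bin/sh
# Hypothetical one-line sample: rank, rating, title, votes, separated by tabs.
printf '250.\t7.9\tShutter Island (2010)\t110,675\n' > films.txt

# The original command: two "non-tabs then tab" fields are matched literally,
# the third field is captured with \( \), and .* swallows the rest of the line.
sed 's/[^-\t]*\t[^-\t]*\t\([^-\t]*\).*/\1/' films.txt
```

On this sample it prints Shutter Island (2010).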
It works fine, but I'm learning regex and this looks ugly: the regex [^-\t]*\t is repeated right after itself. Is there any way to repeat it, the way you can repeat a character with a{2,2}?
I've tried ([^-\t]*\t){2,2} (and variations), but I'm guessing that is trying to match [^-\t]*\t\t?
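A quick way to test that guess, sketched under the assumption of GNU sed and the same tab-separated sample line: in sed's default (basic) regex syntax, unescaped ( ) and { } are ordinary characters, so the attempt is not repeating anything. Escaping them turns them into a group and an interval.

```shell
#!/bin/sh
line=$(printf '250.\t7.9\tShutter Island (2010)\t110,675')

# Basic regex: ( ) { } are literal here, so sed looks for a literal "(",
# non-tabs, a tab, a literal ")" and the text "{2,2}". Nothing matches,
# and the line is printed unchanged.
printf '%s\n' "$line" | sed 's/([^-\t]*\t){2,2}//'

# Escaped: \( \) groups "non-tabs then a tab" and \{2\} (same as \{2,2\})
# repeats that group exactly twice, deleting the first two fields.
printf '%s\n' "$line" | sed 's/\([^-\t]*\t\)\{2\}//'
```

The first command prints the whole line as-is; the second prints Shutter Island (2010), a tab, and 110,675.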
Also, if there is any way to make my code above shorter and cleaner, any help would be greatly appreciated.
This works for me:
sed 's/\([^\t]*\t\)\{2\}\([^\t]*\).*/\2/' films.txt
If your sed supports -r, you can get rid of most of the escaping:
sed -r 's/([^\t]*\t){2}([^\t]*).*/\2/' films.txt
Change the 2 in {2} to select a different field: 0 through 3 select the first through fourth fields.
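A sketch of that field selection, assuming GNU sed with -r and the same hypothetical tab-separated films.txt; the loop variable n is only for illustration and stands in for the repeat count.

```shell
#!/bin/sh
printf '250.\t7.9\tShutter Island (2010)\t110,675\n' > films.txt

# n = how many leading "field + tab" pairs to skip, so 0..3 picks field 1..4.
# Double quotes let the shell substitute $n; \t and \2 reach sed untouched.
for n in 0 1 2 3; do
  sed -r "s/([^\t]*\t){$n}([^\t]*).*/\2/" films.txt
done
```

This prints 250., 7.9, Shutter Island (2010) and 110,675, one per line.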
This will also work:
sed 's/[^\t]\+/\n&/3;s/.*\n//;s/\t.*//' films.txt
Change the 3 to select a different field (1 through 4).
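Here is the same command broken down step by step, as a sketch assuming GNU sed (\+ and \n in the replacement are GNU extensions) and the hypothetical sample file; n again stands in for the 3.

```shell
#!/bin/sh
printf '250.\t7.9\tShutter Island (2010)\t110,675\n' > films.txt

# s/[^\t]\+/\n&/N  puts a newline in front of the Nth run of non-tabs,
# s/.*\n//         deletes everything up to and including that newline,
# s/\t.*//         deletes the first remaining tab and everything after it.
for n in 1 2 3 4; do
  sed "s/[^\t]\+/\n&/$n;s/.*\n//;s/\t.*//" films.txt
done
```

Like the counted-group version, this prints the four fields one per line, with n=3 giving Shutter Island (2010).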
In sed's default (basic) regex syntax, the grouping parentheses and the repetition braces must be escaped with backslashes, like this:
sed 's/\([^-\t]*\t\)\{3\}.*/\1/' films.txt
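Yes, this command works with your example: when a group is repeated, \1 refers to its last repetition, which here is the title (along with the tab that follows it).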
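A sketch that makes the trailing tab visible, assuming GNU sed and the hypothetical sample file. sed's l command prints a tab as \t and marks the end of the line with $. The extra s/\t$// expression in the last command is an addition for illustration, not part of the answer.

```shell
#!/bin/sh
printf '250.\t7.9\tShutter Island (2010)\t110,675\n' > films.txt

# \1 holds only the last of the three repetitions: "Shutter Island (2010)<TAB>".
# Piping through sed -n l shows it as: Shutter Island (2010)\t$
sed 's/\([^-\t]*\t\)\{3\}.*/\1/' films.txt | sed -n l

# One way to drop the tab: a second expression that strips a tab at line end.
sed 's/\([^-\t]*\t\)\{3\}.*/\1/; s/\t$//' films.txt
```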
If the backslashes bother you, add the -r option, which enables extended regular expressions, so parentheses and braces need no escaping:
sed -r 's/([^-\t]*\t){3}.*/\1/' films.txt
I found that this is almost the same as Dennis Williamson's answer, but I'm leaving it because it's a shorter expression that does the same thing.