Using the grep Linux command with Perl regex + capturing groups

Tags:

regex

linux

perl

So I've done some research on the subject, but I haven't found a perfect solution. For example, I have a string inside a variable:

var="a1b1c2"

Now I want to match only "a" followed by any digit, but I only want it to return the digit after "a". To match it, I can use a rule such as

'a\d'

Since I only need the digit, I tried

'a(\d)'

Maybe it did capture it somewhere, but I don't know where; the output is still "a1".

I also tried a non-capturing group to keep the "a" out of the output, but it had no effect with the Perl regex:

'(?:a)\d'

For reference, this is the full command in my terminal:

[root@host ~]# var="a1b1c2"
[root@host ~]# echo $var |grep -oP "a(\d)"
a1 <--output

It's probably also possible without -P (using a non-Perl regex flavor). I'm thankful for every answer :)
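Regarding the non -P route: grep -o only ever prints the whole match, never an individual capture group. A minimal sketch, not taken from the answers below, that prints capture group 1 with sed's \1 back-reference instead (assuming a sed that supports -E, such as GNU or BSD sed):

```shell
var="a1b1c2"
# -n suppresses sed's default output; the trailing p prints only lines
# where the substitution succeeded.
# [0-9] is used because POSIX ERE (sed -E) has no \d shorthand.
# The whole line is replaced by capture group 1 (the digit after "a").
digit=$(printf '%s\n' "$var" | sed -nE 's/.*a([0-9]).*/\1/p')
echo "$digit"   # prints 1
```

Because the leading .* is greedy, this picks the last "a" followed by a digit when a line contains several, and it yields one result per line rather than every match.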

EDIT: Using \K is not really the solution, since I don't necessarily need the last part of the match.

EDIT2: I need to be able to get any part of the match, for instance:

[root@host ~]# var="a1b1c2"
[root@host ~]# echo $var |grep -oP "(a)\d"
a1 <--output
but the desired output in this case would be "a".
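For this particular case, one option (a sketch, not from the answers) is to move the digit into a lookahead. A lookahead is zero-width like a look-behind, so the digit is required but not printed, and unlike look-behinds, PCRE lookaheads have no fixed-length restriction:

```shell
var="a1b1c2"
# (?=\d) asserts that a digit follows "a" without consuming it,
# so grep -o reports only the "a".
letter=$(printf '%s\n' "$var" | grep -oP 'a(?=\d)')
echo "$letter"   # prints a
```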

EDIT3: The problem is nearly solved using "look-behind assertions" such as:

(?<=a)\d

does not return the letter "a", only the digit following it, but the look-behind must have a fixed length; for example, it cannot be used as:

(?<=\w+)\d
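One detail that softens this restriction slightly: in PCRE (which grep -P uses), the fixed-length rule applies to each top-level alternative of a look-behind separately, so alternatives of different fixed lengths are accepted. A sketch with a hypothetical input string:

```shell
s="a1bb2c3"   # hypothetical input, not from the question
# Each alternative has a fixed length (1 and 2 characters), so PCRE accepts
# the look-behind; (?<=\w+) is still rejected because \w+ has no fixed length.
digits=$(printf '%s\n' "$s" | grep -oP '(?<=a|bb)\d')
printf '%s\n' "$digits"   # prints 1 and 2 on separate lines
```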

EDIT4: The best approach so far is either using Perl or combining look-behind assertions with \K, but this still seems to have some limitations. For example:

1234_foo_1234_bar
1234567_foo_123456789_bar
1_foo_12345_bar

If "foo" and "bar" are placeholders for words that don't always have the same length,
there is no way to match all of the above examples while outputting "foobar": a look-behind
can't be used because the number between the words doesn't have a fixed length, and \K can't be used because we need "foo".

Any further suggestions are still appreciated :)

shiro asked Jul 10 '14 01:07




2 Answers

After some testing I found out that the pattern inside the look-behind assertion needs to be fixed length (something like (?<=\w+)something will not work). Any suggestions?

As in the answer I posted earlier (and deleted, because you stated it did not fit your needs):

Most of the time, you can avoid variable-length lookbehinds by using \K. It resets the starting point of the reported match, so any previously consumed characters are no longer included (it throws away everything matched up to that point).
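Applied to the string from the question, a minimal sketch (assuming GNU grep built with PCRE support, as the question uses):

```shell
var="a1b1c2"
# "a" is consumed, then \K drops it from the reported match,
# so grep -o prints only the digit.
digit=$(printf '%s\n' "$var" | grep -oP 'a\K\d')
echo "$digit"   # prints 1
```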

The key difference between \K and a lookbehind is that a lookbehind does not allow variable-length quantifiers such as + or *: the length of what you are looking for must be fixed. \K, on the other hand, can be placed anywhere in a pattern, so you can use any quantifier.
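As a sketch of the quantifier point, using the sample lines from the question's EDIT4: the prefix \d+_ varies in length, so it could not go inside a look-behind, but it can precede \K. Note that this prints "foo" and "bar" as separate matches on separate lines; grep -o cannot concatenate them.

```shell
input='1234_foo_1234_bar
1234567_foo_123456789_bar
1_foo_12345_bar'
# \d+_ is consumed and then discarded by \K, so only the word is reported.
# The look-behind form (?<=\d+_) would be rejected as not fixed length.
words=$(printf '%s\n' "$input" | grep -oP '\d+_\K[a-z]+')
printf '%s\n' "$words"   # prints foo, bar, foo, bar, foo, bar (one per line)
```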

As the example below shows, using a quantifier in the lookbehind does not work.

echo 'foosomething' | grep -Po '(?<=\w+)something'
#=> grep: lookbehind assertion is not fixed length

So you could do:

echo 'foosomething' | grep -Po '\w+\Ksomething'
#=> something

To get only the substring between two patterns, you can add a positive lookahead to the mix.

echo 'foosomethingbar' | grep -Po 'foo\K.*?(?=bar)'
#=> something

Or use a fixed-length lookbehind combined with a lookahead.

echo 'foosomethingbar' | grep -Po '(?<=foo).*?(?=bar)'
#=> something
hwnd answered Nov 05 '22 12:11


The pattern (?<=a)\d uses a look-behind assertion to only print a digit following the letter 'a'. This works with GNU grep -Po, ack -o, and pcregrep -o. The assertion is zero width, so it isn't included in the match.
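A minimal sketch of this on the string from the question (assuming GNU grep with -P support):

```shell
var="a1b1c2"
# (?<=a) checks for a preceding "a" without consuming it,
# so only the digit is part of the match that grep -o prints.
digit=$(printf '%s\n' "$var" | grep -oP '(?<=a)\d')
echo "$digit"   # prints 1
```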

Slade answered Nov 05 '22 12:11