
ERE - adding quantifier to group with inner group and back-reference

I was trying to match words that contain two or three pairs of consecutive repeated letters. I could not find a way, using ERE, to apply a quantifier to a group that contains an inner capture group and its back-reference.

$ grep --version | head -n1
grep (GNU grep) 2.25

$ # consecutive repeated letters occurring twice
$ grep -m5 -xiE '[a-z]*([a-z])\1[a-z]*[a-z]*([a-z])\2[a-z]*' /usr/share/dict/words
Abbott
Annabelle
Annette
Appaloosa
Appleseed

$ # no output for this, why?
$ grep -m5 -xiE '([a-z]*([a-z])\2[a-z]*){2}' /usr/share/dict/words


It works with -P, though:

$ grep -m5 -xiP '([a-z]*([a-z])\2[a-z]*){2}' /usr/share/dict/words
Abbott
Annabelle
Annette
Appaloosa
Appleseed

$ grep -m5 -xiP '([a-z]*([a-z])\2[a-z]*){3}' /usr/share/dict/words
Chattahoochee
McConnell
Mississippi
Mississippian
Mississippians


Thanks to Casimir et Hippolyte for coming up with a simpler input and regex to test this behavior:

$ echo 'aazbb' | grep -E '(([a-z])\2[a-z]*){2}' || echo 'No match'
aazbb
$ echo 'aazbbycc' | grep -E '(([a-z])\2[a-z]*){2}([a-z])\3[a-z]*' || echo 'No match'
aazbbycc
$ echo 'aazbbycc' | grep -P '(([a-z])\2[a-z]*){3}' || echo 'No match'
aazbbycc

$ # failing case
$ echo 'aazbbycc' | grep -E '(([a-z])\2[a-z]*){3}' || echo 'No match'
No match
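
To make the comparison easier to repeat on other grep versions, the checks above could be wrapped in a small helper. This is only a sketch: `compare_modes` is a made-up name, not part of grep, and the result for the `{3}` pattern depends on the installed grep, so no expected output is shown for it.

```shell
# Hypothetical helper: try one pattern against one input under both -E and -P.
# A grep built without -P support is reported as "no match" here, because
# the error message is discarded.
compare_modes() {
    input=$1
    pattern=$2
    for mode in E P; do
        if printf '%s\n' "$input" | grep -q "-$mode" -- "$pattern" 2>/dev/null; then
            printf '%s: match\n' "-$mode"
        else
            printf '%s: no match\n' "-$mode"
        fi
    done
}

# The two cases from the question: the quantified group with {2} and with {3}.
compare_modes 'aazbb'    '(([a-z])\2[a-z]*){2}'
compare_modes 'aazbbycc' '(([a-z])\2[a-z]*){3}'
```

Each call prints one line per mode, so a difference between -E and -P for the same pattern and input is visible at a glance.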

The same behavior is seen with sed as well:

$ sed --version | head -n1
sed (GNU sed) 4.2.2

$ echo 'aazbb' | sed -E '/(([a-z])\2[a-z]*){2}/! s/.*/No match/'
aazbb
$ echo 'aazbbycc' | sed -E '/(([a-z])\2[a-z]*){2}([a-z])\3[a-z]*/! s/.*/No match/'
aazbbycc

$ # failing case
$ echo 'aazbbycc' | sed -E '/(([a-z])\2[a-z]*){3}/! s/.*/No match/'
No match


Related search links: I checked some of these, but didn't find anything close to this question.

  • https://savannah.gnu.org/bugs/?group=grep
  • http://lists.gnu.org/archive/html/bug-sed/

If this is solved in a newer version of grep or sed, let me know. Also let me know if the issue is seen in non-GNU implementations.

Sundeep asked Apr 23 '17 15:04



1 Answer

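I suppose the problem is how -E handles a quantifier applied to a group that contains a back-reference (plain quantifiers such as {2} are supported, as the 'aazbb' example in the question shows), which is why it works only with -P.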


To match words with two or more doubled letters, with any letters in between:

grep -P '(?:([a-z])\1[a-z]*){2}' /usr/share/dict/words

To match words with three or more doubled letters:

grep -P '(?:([a-z])\1[a-z]*){3}' /usr/share/dict/words

Options:

-P, --perl-regexp         PATTERN is a Perl regular expression
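
If -P is not available, one possible workaround is to avoid quantifying the group at all and write each doubled letter out with its own capture group, as the working `\2 ... \3` command in the question does. A minimal sketch, assuming a POSIX shell; `unrolled_ere` is a hypothetical helper name, not a grep feature:

```shell
# Hypothetical helper: print an ERE with N doubled-letter groups written out
# one by one, so no quantifier is applied to a group holding a back-reference.
unrolled_ere() {
    n=$1
    pattern=''
    i=1
    while [ "$i" -le "$n" ]; do
        # Back-references only go up to \9, so keep n <= 9.
        pattern="${pattern}([a-z])\\${i}[a-z]*"
        i=$((i + 1))
    done
    printf '%s\n' "$pattern"
}

unrolled_ere 3
# prints: ([a-z])\1[a-z]*([a-z])\2[a-z]*([a-z])\3[a-z]*

echo 'aazbbycc' | grep -E "$(unrolled_ere 3)" || echo 'No match'
```

The generated pattern grows with N, but every back-reference points at a group that is matched exactly once, which is the form that already works with -E in the question.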
Pedro Lobito answered Sep 18 '22 16:09