
ERE - adding quantifier to group with inner group and back-reference

Tags: regex, grep, sed, gnu, pcre

I was trying to match words that contain two or three pairs of consecutive repeated letters, but I could not find a way to apply a quantifier to a group that contains an inner capture group and back-reference in ERE.

$ grep --version | head -n1
grep (GNU grep) 2.25

$ # consecutive repeated letters occurring twice
$ grep -m5 -xiE '[a-z]*([a-z])\1[a-z]*[a-z]*([a-z])\2[a-z]*' /usr/share/dict/words
Abbott
Annabelle
Annette
Appaloosa
Appleseed

$ # no output for this, why?
$ grep -m5 -xiE '([a-z]*([a-z])\2[a-z]*){2}' /usr/share/dict/words

It works with -P, though:

$ grep -m5 -xiP '([a-z]*([a-z])\2[a-z]*){2}' /usr/share/dict/words
Abbott
Annabelle
Annette
Appaloosa
Appleseed

$ grep -m5 -xiP '([a-z]*([a-z])\2[a-z]*){3}' /usr/share/dict/words
Chattahoochee
McConnell
Mississippi
Mississippian
Mississippians

Thanks to Casimir et Hippolyte for coming up with a simpler input and regex to test this behavior:

$ echo 'aazbb' | grep -E '(([a-z])\2[a-z]*){2}' || echo 'No match'
aazbb
$ echo 'aazbbycc' | grep -E '(([a-z])\2[a-z]*){2}([a-z])\3[a-z]*' || echo 'No match'
aazbbycc
$ echo 'aazbbycc' | grep -P '(([a-z])\2[a-z]*){3}' || echo 'No match'
aazbbycc

$ # failing case
$ echo 'aazbbycc' | grep -E '(([a-z])\2[a-z]*){3}' || echo 'No match'
No match
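Since the manually unrolled form above matches where the `{3}` form does not, one possible workaround is to generate the unrolled pattern instead of quantifying the group. The following is a minimal sketch, not a fix for the underlying behavior; `build_ere` is a hypothetical helper name, and it assumes at most 9 repetitions because back-references only go up to `\9`.

```shell
#!/bin/sh
# build_ere N: print an ERE made of N unrolled copies of ([a-z])\k[a-z]*,
# where k is the number of the capture group in that copy.
# build_ere is a hypothetical helper, not part of grep or sed.
build_ere() {
    n=$1
    # Back-references only go up to \9, so refuse larger counts.
    if [ "$n" -lt 1 ] || [ "$n" -gt 9 ]; then
        echo 'build_ere: count must be between 1 and 9' >&2
        return 1
    fi
    pattern=''
    i=1
    while [ "$i" -le "$n" ]; do
        # Inside double quotes, \\ yields a single backslash before the digit.
        pattern="${pattern}([a-z])\\${i}[a-z]*"
        i=$((i + 1))
    done
    printf '%s\n' "$pattern"
}

build_ere 3
# prints: ([a-z])\1[a-z]*([a-z])\2[a-z]*([a-z])\3[a-z]*

echo 'aazbbycc' | grep -E "$(build_ere 3)" || echo 'No match'
```

For the whole-line dictionary search with `-x`, the generated pattern would need a leading `[a-z]*`, for example `grep -m5 -xiE "[a-z]*$(build_ere 3)" /usr/share/dict/words`.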

The same behavior is seen with sed as well:

$ sed --version | head -n1
sed (GNU sed) 4.2.2

$ echo 'aazbb' | sed -E '/(([a-z])\2[a-z]*){2}/! s/.*/No match/'
aazbb
$ echo 'aazbbycc' | sed -E '/(([a-z])\2[a-z]*){2}([a-z])\3[a-z]*/! s/.*/No match/'
aazbbycc

$ # failing case
$ echo 'aazbbycc' | sed -E '/(([a-z])\2[a-z]*){3}/! s/.*/No match/'
No match

Related search links. I checked some of them but didn't find anything close to this question:

https://savannah.gnu.org/bugs/?group=grep
http://lists.gnu.org/archive/html/bug-sed/

If this is fixed in a newer version of grep or sed, let me know. I would also like to know whether the issue is seen in non-GNU implementations.
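To make it easier to compare versions and implementations, the test cases can be wrapped in a small script. This is only a sketch: `check_ere` is a hypothetical helper name, and it assumes the local `grep` supports `-q`, `-E` and back-references in ERE (the `--version` line is skipped silently where that option is not available).

```shell
#!/bin/sh
# check_ere INPUT PATTERN: print "match" if INPUT matches the ERE PATTERN,
# otherwise "no match". check_ere is a hypothetical helper for this test.
check_ere() {
    if printf '%s\n' "$1" | grep -qE "$2"; then
        echo 'match'
    else
        echo 'no match'
    fi
}

# Show which grep is being tested (ignored if --version is unsupported).
grep --version 2>/dev/null | head -n1

# Manually unrolled pattern with three separate groups and back-references.
printf 'unrolled:   %s\n' \
    "$(check_ere 'aazbbycc' '([a-z])\1[a-z]*([a-z])\2[a-z]*([a-z])\3[a-z]*')"

# Quantified group with an inner back-reference (the failing case).
printf 'quantified: %s\n' \
    "$(check_ere 'aazbbycc' '(([a-z])\2[a-z]*){3}')"
```

If the two lines disagree, the implementation under test shows the behavior described here. With GNU grep 2.25 the quantified line reports no match, as in the failing case above.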


asked Apr 23 '17 15:04

Sundeep

1 Answer

~~I suppose -E doesn't allow Quantifiers, that's why it works only with -P~~

To match 2 or more consecutive groups of repeated letters:

grep -P '(?:([a-z])\1*([a-z])\2){1}' /usr/share/dict/words

To match 3 or more consecutive groups of repeated letters:

grep -P '(?:([a-z])\1*([a-z])\2){2}' /usr/share/dict/words

Options:

-P, --perl-regexp         PATTERN is a Perl regular expression

answered Sep 18 '22 16:09

Pedro Lobito
