Regex question: Match sequence only n times on a random place

Tags:

grep

I have a regex question. Take, for example:

...AAABZBZBCCCDDD...
...BZBZBDDDBZBZBCCC...

I am looking for a regular expression that matches BZBZB exactly n times in a line. So, if I wanted to match the sequence only once, I should get only the first line as output.

The string occurs at random places in the text, and the regex should be compatible with grep or egrep.

Thanks in advance.


asked Jan 07 '11 21:01

3sdmx

2 Answers

grep '\(.*BZBZB\)\{5\}' matches lines in which BZBZB appears 5 times, but it also matches lines in which it appears more than 5 times, because grep only checks whether some substring of a line matches. grep's regular expressions have no way to negate a string (only a character, via a negated bracket expression), so this cannot be done with a single command unless, for example, you know that the characters in the string to be matched are not used elsewhere in the line.

However, you can do this in two grep commands:

grep '\(.*BZBZB\)\{5\}' temp.txt | grep -v '\(.*BZBZB\)\{6\}'

will return lines in which BZBZB appears exactly 5 times. (Basically, it's doing a positive check for 5 or more times and then a negative check for six or more times.)
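The same pair of checks can be wrapped in a small function so that n is a parameter. This is only a sketch: the name exactly_n is made up for illustration, it reads from standard input, and it assumes the pattern is a plain string (or a basic regular expression) that is safe to paste into the larger expression.

```shell
#!/bin/sh
# exactly_n PATTERN N
# Hypothetical helper: reads lines from standard input and prints those
# in which PATTERN occurs N times but not N+1 times (non-overlapping).
exactly_n() {
    pattern=$1
    n=$2
    # First grep: keep lines with at least N occurrences.
    # Second grep: drop lines with at least N+1 occurrences.
    grep "\(.*$pattern\)\{$n\}" | grep -v "\(.*$pattern\)\{$((n + 1))\}"
}
```

With the two sample lines from the question, `exactly_n BZBZB 1 < temp.txt` should keep only the line containing a single BZBZB, and `exactly_n BZBZB 2 < temp.txt` only the other one.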

answered Oct 13 '22 03:10

Keith Irwin

From the grep man page:

   -m NUM, --max-count=NUM
    Stop  reading  a file after NUM matching lines.  If the input is
    standard input from a regular file, and NUM matching  lines  are
    output,  grep  ensures  that the standard input is positioned to
    just after the last matching line before exiting, regardless  of
    the  presence of trailing context lines.  This enables a calling
    process to resume a search.  When grep stops after NUM  matching
    lines,  it  outputs  any trailing context lines.  When the -c or
    --count option is also  used,  grep  does  not  output  a  count
    greater  than NUM.  When the -v or --invert-match option is also
    used, grep stops after outputting NUM non-matching lines.

So we need two grep expressions:

grep -e "BZ" -o
grep -e "BZ" -m n

The first one finds all instances of "BZ" in the input and, because of -o, prints only the matched text, without the content around it. Each instance is printed on its own line. The second one reads those lines and stops once n matching lines have been found.

$ echo "ABZABZABX" | grep -e "BZ" -o | grep -e "BZ" -m 1
BZ
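The -o option can also be used to count occurrences per line, which is closer to the "exactly n times" requirement in the question. A rough sketch, assuming a grep that supports -o (a GNU/BSD extension, not required by POSIX); the name count_exactly is invented here, and matches are counted without overlap.

```shell
#!/bin/sh
# count_exactly PATTERN N
# Hypothetical helper: reads lines from standard input and prints each
# line in which PATTERN matches exactly N times.
count_exactly() {
    pattern=$1
    n=$2
    while IFS= read -r line; do
        # grep -o prints one match per output line; wc -l counts them.
        # The arithmetic expansion strips any padding wc may add.
        count=$(( $(printf '%s\n' "$line" | grep -o -e "$pattern" | wc -l) ))
        if [ "$count" -eq "$n" ]; then
            printf '%s\n' "$line"
        fi
    done
}
```

This starts one grep per input line, so it is slower than a single pipeline over the whole file, but the count is explicit.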

Hopefully that is what you needed.

answered Oct 13 '22 03:10

mklauber

