I have a regex question, take for example:
I am looking for a regular expression that matches BZBZB exactly n times in a line. So, if I wanted to match the sequence only once, I should get only the first line as output.
The string occurs at random places in the text, and the regex should be compatible with grep or egrep.
Thanks in advance.
(?=...) is a positive lookahead, a type of zero-width assertion. It says that the match must be followed by whatever is within the parentheses, but that part is not included in the match. Your example means the match must be followed by zero or more characters and then a digit (again, that part is not captured).
How do you match a character sequence in regex? To match a character that has a special meaning in regex, escape it with a backslash ( \ ). E.g., \. matches “.”, \+ matches “+”, and \( matches “(”. (This holds for extended and Perl-style syntax; in grep's basic syntax, \( opens a group and GNU grep treats \+ as a repetition operator.)
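To make the escaping rule concrete, here is a minimal sketch with grep. The sample lines and variable names are made up for illustration, and the last command assumes a grep that supports -E (extended syntax).

```shell
#!/bin/sh
# Three sample lines; only the first contains a literal "a.b".
lines=$(printf '%s\n' 'a.b' 'axb' 'a+b')

# Unescaped: "." matches any single character, so all three lines match.
unescaped=$(printf '%s\n' "$lines" | grep -c 'a.b')

# Escaped: "\." matches only a literal dot.
escaped_dot=$(printf '%s\n' "$lines" | grep -c 'a\.b')

# With -E (extended syntax), "\+" matches a literal plus sign.
escaped_plus=$(printf '%s\n' "$lines" | grep -E 'a\+b')

printf 'unescaped=%s escaped_dot=%s escaped_plus=%s\n' \
    "$unescaped" "$escaped_dot" "$escaped_plus"
```

The first two commands use -c only to count matching lines, which makes the difference between . and \. easy to see.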
Follow it with an * (asterisk), and it will match everything. \s (the whitespace metacharacter) matches any whitespace character (space, tab, line break, ...), and \S (the opposite of \s) matches any character that is not whitespace.
A regular expression followed by an asterisk ( * ) matches zero or more occurrences of the regular expression. If there is any choice, the first matching string in a line is used.
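As a rough illustration of the asterisk rule, the sketch below runs the pattern ab*c over a made-up input string. It assumes a grep that supports -o (GNU and BSD grep do) and uses sed to show that a single substitution acts on the first match in the line.

```shell
#!/bin/sh
# Hypothetical input: "a", then zero, one, or three "b"s, then "c".
input='ac abc abbbc'

# b* allows zero or more "b"s, so all three words match ab*c.
# -o prints each match on its own line.
matches=$(printf '%s\n' "$input" | grep -o 'ab*c')

# Without the g flag, sed replaces only the first match in the line.
first_replaced=$(printf '%s\n' "$input" | sed 's/ab*c/X/')

printf '%s\n' "$matches"
printf '%s\n' "$first_replaced"
```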
grep '\(.*BZBZB\)\{5\}'
will match lines that contain BZBZB 5 times, but it also matches lines where the string appears more than 5 times, because grep reports a line as soon as any substring of it matches. grep's regular expressions can negate only single characters (with bracket expressions), not whole strings, so this cannot be done with a single command unless, for example, you know that the characters in the string to be matched are not used elsewhere.
However, you can do this in two grep commands:
cat temp.txt | grep '\(.*BZBZB\)\{5\}' | grep -v '\(.*BZBZB\)\{6\}'
will return lines in which BZBZB appears exactly 5 times. (Basically, it does a positive check for 5 or more occurrences and then a negative check for 6 or more.)
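The same two-step idea can be wrapped up for any n. This is only a sketch: match_exactly is a hypothetical helper name, the sample lines are invented, and it uses grep -E (the egrep syntax) so the backslashes around the group and the interval are not needed. The pattern argument is interpreted as a regular expression, and occurrences are counted without overlap.

```shell
#!/bin/sh
# match_exactly PATTERN N (hypothetical helper): read lines on stdin and
# print those in which PATTERN occurs exactly N times.
match_exactly() {
    pattern=$1
    n=$2
    next=$((n + 1))
    # Keep lines with at least N occurrences, then drop those with N+1 or more.
    grep -E "(.*${pattern}){${n}}" | grep -Ev "(.*${pattern}){${next}}"
}

# Invented sample data with 1, 2, 3, and 0 occurrences of BZBZB.
sample=$(printf '%s\n' \
    'xxBZBZBxx' \
    'BZBZB and BZBZB' \
    'a BZBZB b BZBZB c BZBZB' \
    'no match here')

once=$(printf '%s\n' "$sample" | match_exactly 'BZBZB' 1)
twice=$(printf '%s\n' "$sample" | match_exactly 'BZBZB' 2)

printf 'once:  %s\n' "$once"
printf 'twice: %s\n' "$twice"
```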
From the grep man page:
-m NUM, --max-count=NUM Stop reading a file after NUM matching lines. If the input is standard input from a regular file, and NUM matching lines are output, grep ensures that the standard input is positioned to just after the last matching line before exiting, regardless of the presence of trailing context lines. This enables a calling process to resume a search. When grep stops after NUM matching lines, it outputs any trailing context lines. When the -c or --count option is also used, grep does not output a count greater than NUM. When the -v or --invert-match option is also used, grep stops after outputting NUM non-matching lines.
So we need two grep expressions:
grep -e "BZ" -o
grep -e "BZ" -m n
The first one finds every instance of "BZ" in the input and prints only the matched text, without the rest of the line, so each instance ends up on its own line. The second one reads those lines and stops after n of them have matched.
$ echo "ABZABZABX" | grep -e "BZ" -o | grep -e "BZ" -m 1
BZ
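For a slightly longer input, the pipeline could be sketched as below. The input string and variable names are made up, and it assumes a grep with the -o and -m options (GNU and BSD grep have both). The last command uses -c on the -o output to count the occurrences instead of limiting them.

```shell
#!/bin/sh
# Hypothetical input containing "BZ" three times.
text='ABZABZABZ'

# -o prints every occurrence on its own line.
all=$(printf '%s\n' "$text" | grep -e 'BZ' -o)

# -m 2 stops after the first two of those lines.
first_two=$(printf '%s\n' "$text" | grep -e 'BZ' -o | grep -e 'BZ' -m 2)

# -c counts the lines produced by -o, i.e. the number of occurrences.
count=$(printf '%s\n' "$text" | grep -e 'BZ' -o | grep -c -e 'BZ')

printf 'occurrences: %s\n' "$count"
printf '%s\n' "$first_two"
```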
Hopefully that is what you needed.