Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

grep - How would I match a regex using only two characters, but with each character occuring the same number of times?

Tags:

regex

grep

perl

Using grep, I am trying to match lines which are comprised of two characters, one followed repeated followed by the other, but only match when the number of first character occurances is equal to the occurrences of the second character.

As an example, imagine that I can only match two characters like '0' and '1'. Now imagine that if there are n '0' characters then there must be n '1' characters following directly afterward. For example:

  • ''
  • '0011'
  • '000111'
  • '00000000001111111111'

would all match. But:

  • '011'
  • '1100'
  • '110001'

wouldn't match.

I've been playing around with capture groups and trawling through perldoc for more info on grep -P but haven't found any leads to solve my problem - with grep at least.

How could I make a grep command to match strings given these constraints?

EDIT:

  • In this example, the 0s should come before the 1s as per the restriction "following directly afterward"
  • The empty string should also be a match case because by the example restrictions, when there are n 0s there should be n 1s, so with zero 0s there should be zero 1s.
like image 205
Matthew Brian Avatar asked Feb 27 '21 05:02

Matthew Brian


People also ask

How do you match a character sequence in regex?

To match a character having special meaning in regex, you need to use a escape sequence prefix with a backslash ( \ ). E.g., \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" . You also need to use regex \\ to match "\" (back-slash).

Can you use regex with grep?

GNU grep supports three regular expression syntaxes, Basic, Extended, and Perl-compatible. In its simplest form, when no regular expression type is given, grep interpret search patterns as basic regular expressions. To interpret the pattern as an extended regular expression, use the -E ( or --extended-regexp ) option.

How do you grep a character?

To match a character that is special to grep –E, put a backslash ( \ ) in front of the character. It is usually simpler to use grep –F when you don't need special pattern matching.

Which option would you choose to force grep to use a basic regular expression Bre?

Basic and extended regular expressions are two variations on the syntax of the specified pattern. Basic Regular Expression (BRE) syntax is the default in sed (and similarly in grep ). Use the POSIX-specified -E option ( -r , --regexp-extended ) to enable Extended Regular Expression (ERE) syntax.


3 Answers

See EDIT below for update to clarifications


Here's a Perl one-liner instead of grep

perl -wne'print if /^((.)\g{-1}+)((.)\g{-1}+)$/ and length $1 == length $3' file

The length comparison of matches is clearly done outside of regex; I don't see that it can be done inside nicely, and I don't see anything wrong with using code which isn't regex :)

This doesn't match with a single character (ab), what wouldn't really make sense and what seems excluded from the question. The anchors (^ and $) make it so that it can match only strings with two characters, what seems to be specified.

That \g{-1} is a relative backreference. It matches the same subpattern that was captured last, which is what we need instead of a simple backreference (\g1).

This is need because \g1 refers to the first capture, the set of parens started first (leftmost), which is the capture of the whole pattern. (We can use \g2 but it is bad practice to count them off.)

This can be made nicer by using named references, but then it would be more elaborate as well.


EDIT   Following the clarifications, whereby it must be 0s first then the same number of 1s, and 0-repetitions count (so an empty line), as well as 1-repetition of course (so 01). This simplifies matters greatly, for just

perl -wne'print if /^(0*)(1*)$/ and length $1 == length $2' file

The 0 and 1 can be made into variables which can be supplied as external arguments, if desired (so it can be any grammar, a and b etc).

It prints as expected on the example input from the question, so on input file

0011

000111
00000000001111111111
01

011
1100
110001

it prints

0011

000111
00000000001111111111
01

(the last empty line in output being the empty line in the middle, after which no more lines match)


That is, without employing tricky features that run code inside regex, which would make it far more complex. If you still wish to play with that see it in perlre and in perlretut.

Or, this can also be done using recursion in regex, with similar (or little lesser?) complexity.

like image 185
zdim Avatar answered Oct 18 '22 02:10

zdim


This awk one line should do the job:

cat file

0011

000111
00000000001111111111
011
1100
11000
awk '/^0*1*$/ && gsub(/0/, "&") == gsub(/1/, "&")' file

0011
000111
00000000001111111111

Or if you want to print numbers that may have 1s followed by 0s then use:

# awk command
awk '/^(0*1*|1*0*)$/ && gsub(/0/, "&") == gsub(/1/, "&")' file

0011
000111
00000000001111111111
1100

gsub function returns number of replacements.


Since you've used grep tag, here is one gnu grep command with -P (PCRE recursive) regex:

grep -P '^(0(?1)?1|1(?1)?0)?$' file

0011
000111
00000000001111111111
1100

grep RegEx Demo

like image 3
anubhava Avatar answered Oct 18 '22 03:10

anubhava


With your shown samples only, in case you are ok with awk you could try following.

awk 'match($0,/^0+/){num1=RLENGTH;match($0,/1+/);if(num1==RLENGTH){print}}' Input_file

Explanation: Adding detailed explanation for above.

awk '                          ##Starting awk program from here.
match($0,/^0+/){               ##Using match function to match starting zeroes here.
  num1=RLENGTH                 ##Creating num1 here with rlength.
  match($0,/1+/)               ##Matching all ones now.
  if(num1==RLENGTH){ print }   ##Checking condition if num1 is equal to current length then print the line.
}
' Input_file                   ##mentioning Input_file name here.
like image 3
RavinderSingh13 Avatar answered Oct 18 '22 01:10

RavinderSingh13