
grep - How would I match a regex using only two characters, but with each character occurring the same number of times?

Tags: regex, grep, perl

Using grep, I am trying to match lines that consist of two characters, the first one repeated and then followed by repetitions of the other, but only when the number of occurrences of the first character equals the number of occurrences of the second.

As an example, imagine that I can only match two characters like '0' and '1'. Now imagine that if there are n '0' characters then there must be n '1' characters following directly afterward. For example:

''
'0011'
'000111'
'00000000001111111111'

would all match. But:

'011'
'1100'
'110001'

wouldn't match.

I've been playing around with capture groups and trawling through perldoc for more info on grep -P but haven't found any leads to solve my problem - with grep at least.

How could I make a grep command to match strings given these constraints?

EDIT:

In this example, the 0s should come before the 1s as per the restriction "following directly afterward"
The empty string should also be a match case because by the example restrictions, when there are n 0s there should be n 1s, so with zero 0s there should be zero 1s.


asked Feb 27 '21 05:02

Matthew Brian

3 Answers

See the EDIT below for an update that follows the clarifications.

Here's a Perl one-liner instead of grep:

perl -wne'print if /^((.)\g{-1}+)((.)\g{-1}+)$/ and length $1 == length $3' file

The length comparison of matches is clearly done outside of regex; I don't see that it can be done inside nicely^†, and I don't see anything wrong with using code which isn't regex :)

This doesn't match when each character appears only once (ab), which wouldn't really make sense and seems to be excluded by the question. The anchors (^ and $) ensure that the whole line consists of exactly these two runs of characters, which seems to be what is specified.

That \g{-1} is a relative backreference. It matches the same text as the immediately preceding capture group, which is what we need instead of a simple absolute backreference (\g1).

This is needed because \g1 refers to the first capture group, the set of parentheses opened first (leftmost), which here wraps the whole first run of characters. (We could use \g2, but counting groups off by hand is bad practice.)

This can be made nicer by using named references, but then it would be more elaborate as well.

EDIT: Following the clarifications, the 0s must come first, followed by the same number of 1s; zero repetitions count (so an empty line matches), as does a single repetition of each (so 01). This simplifies matters greatly, to just

perl -wne'print if /^(0*)(1*)$/ and length $1 == length $2' file

The 0 and 1 can be made into variables that are supplied as external arguments, if desired (so it can work for any pair of characters, a and b etc.).

It prints as expected on the example input from the question, so on input file

0011
000111
00000000001111111111
01

011
1100
110001

it prints

0011
000111
00000000001111111111
01

(the empty line at the end of the output is the empty line from the middle of the input, after which no more lines match)

^† That is, without employing tricky features that run code inside the regex, which would make it far more complex. If you still wish to play with that, see perlre and perlretut.

Alternatively, this can be done using recursion in the regex, with similar (or perhaps slightly lower) complexity.


answered Oct 18 '22 02:10

zdim

This awk one-liner should do the job:

cat file

0011
000111
00000000001111111111
011
1100
11000

awk '/^0*1*$/ && gsub(/0/, "&") == gsub(/1/, "&")' file

0011
000111
00000000001111111111

Or, if you also want to print lines that have 1s followed by 0s, use:

# awk command
awk '/^(0*1*|1*0*)$/ && gsub(/0/, "&") == gsub(/1/, "&")' file

0011
000111
00000000001111111111
1100

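The gsub function returns the number of replacements made. Replacing each match with itself ("&") leaves the line unchanged, so the call serves purely as a counter.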
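To see the counting in isolation, here is a small sketch that prints the two counts for each input line instead of comparing them. The function name count_zeros_ones is made up for illustration.

```shell
# Hypothetical helper: for each input line, print "<number of 0s> <number of 1s>".
count_zeros_ones() {
    awk '{
        zeros = gsub(/0/, "&")   # each 0 is replaced by itself; gsub returns how many
        ones  = gsub(/1/, "&")   # same for 1; the line itself is left unchanged
        print zeros, ones
    }' "$@"
}
```

For example, `printf '000111\n011\n' | count_zeros_ones` prints 3 3 and then 1 2, so only the first line would pass the equality test.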

Since you've used the grep tag, here is a GNU grep command using a recursive PCRE regex (-P):

grep -P '^(0(?1)?1|1(?1)?0)?$' file

0011
000111
00000000001111111111
1100

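Since the question's EDIT says that the 0s must come before the 1s (so 1100 should not match), the recursive pattern can presumably be narrowed by dropping the second alternative. This is only a sketch along the same lines; the function name match_zeros_then_ones is hypothetical, and it assumes a GNU grep built with PCRE support.

```shell
# Hypothetical wrapper around the narrowed pattern.
# Group 1 is "0, optionally group 1 again, then 1"; the outer ? lets the empty line match.
match_zeros_then_ones() {
    grep -P '^(0(?1)?1)?$' "$@"
}
```

For example, `printf '%s\n' 0011 011 1100 | match_zeros_then_ones` prints only 0011.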

answered Oct 18 '22 03:10

anubhava

Going by your shown samples only, and in case you are OK with awk, you could try the following.

awk 'match($0,/^0+/){num1=RLENGTH;match($0,/1+/);if(num1==RLENGTH){print}}' Input_file

Explanation: a detailed, commented version of the above.

awk '                          ##Starting awk program from here.
match($0,/^0+/){               ##Using match function to match starting zeroes here.
  num1=RLENGTH                 ##Creating num1 here with rlength.
  match($0,/1+/)               ##Matching all ones now.
  if(num1==RLENGTH){ print }   ##Checking condition if num1 is equal to current length then print the line.
}
' Input_file                   ##mentioning Input_file name here.
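Note that the command above only compares the leading run of 0s with the first run of 1s, so a line such as 0100 would also be printed, and an empty line is never printed. A stricter variant, sketched here under the assumption that the question's EDIT rules apply (only 0s followed by only 1s, empty line allowed), adds a guard in front of the same match/RLENGTH logic. The function name match_strict is made up for illustration.

```shell
# Hypothetical stricter variant of the match/RLENGTH approach.
match_strict() {
    awk '
        $0 == "" { print; next }            # empty line: zero 0s and zero 1s
        /^0+1+$/ && match($0, /^0+/) {      # guard: only 0s followed by only 1s
            num1 = RLENGTH                  # length of the leading run of 0s
            match($0, /1+/)                 # RLENGTH now holds the length of the run of 1s
            if (num1 == RLENGTH) print
        }
    ' "$@"
}
```

For example, `printf '%s\n' 0011 0100 011 | match_strict` prints only 0011.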

answered Oct 18 '22 01:10

RavinderSingh13
