
Regex should not match numbers inside parentheses

Tags: regex, perl

I have a passage of numbered verses. I want each numbered verse on a separate line, so I add a newline before each number. However, some parentheses in the text also contain numbers, and those get a newline too. I don't want to match the numbers inside parentheses. I used

$_=~s/(\d+)/\n$1 /gs;

with this input:

1Hello2Hai (in 2:3) 3hi 4 bye

but it also replaces the numbers inside the parentheses.

Required output :

1 Hello
2 Hai (in 2:3)
3 hi
4 bye

Actual output:

1 Hello
2 Hai (in
2:
3)
3 hi
4 bye

How do I construct the regex so that it doesn't match inside parentheses? I am using Perl for the regex.

xtreak asked Dec 14 '13 12:12



2 Answers

You can try this:

#!/usr/bin/perl 
use strict;
use warnings;

my $stro = <<'END';
1Hello2Hai (in 2:3) 3hi 4 bye
END

$stro =~s/(\((?>[^()]++|(?1))*\))(*SKIP)(*FAIL)|\s*(\d+)\s*/\n$2 /g;

print $stro;

Pattern details:

The idea is to skip the content inside parentheses. To do that, I first try to match a parenthesized group with this recursive subpattern: (\((?>[^()]++|(?1))*\)). I then make the match fail and, with the backtracking control verbs (*SKIP) and (*FAIL), prevent the regex engine from retrying that substring with the other alternative.

(*SKIP): if the pattern fails later, the engine does not retry at any position inside the text matched to its left; it resumes the search after that text.

(*FAIL) forces the pattern to fail at that point.
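To make the pieces easier to follow, here is a minimal sketch of the same substitution written with the /x flag so each part can carry a comment. The helper name split_verses is hypothetical and only there for illustration; the pattern itself is the one from the answer, and it assumes Perl 5.10 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): wraps the substitution
# shown above so it can be called on any string.
sub split_verses {
    my ($text) = @_;
    $text =~ s{
        (                 # group 1: one balanced parenthesized chunk
          \(
          (?> [^()]++     #   a run of characters that are not parentheses
            | (?1)        #   or a nested chunk, by recursing into group 1
          )*
          \)
        )
        (*SKIP)(*FAIL)    # give up on the chunk and resume searching after it
      |
        \s* (\d+) \s*     # group 2: a number found outside parentheses
    }{\n$2 }gx;
    return $text;
}

print split_verses("1Hello2Hai (in 2:3) 3hi 4 bye"), "\n";
```

Note that the result starts with a newline, because the first number gets one as well. Nested parentheses such as (x (2) y) are also skipped as a whole, thanks to the recursion.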

Another way:

As you can read in the Perl documentation, backtracking control verbs are an experimental regex feature, and their use in production code should be noted. (However, the feature has existed for several years.)

Here is a simple way without these features: match everything that precedes a number and drop it from the match result with the \K feature:

s/(?:(\((?>[^()]++|(?1))*\))|[^\d(]+)*\K\s*(\d+)\s*/\n$2 /g
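A minimal runnable sketch of the \K variant, using the sample input from the question. The helper name split_verses_k and the two cleanup substitutions at the end are my own additions, not part of the answer: because the text before \K is kept as is, a space that preceded a number stays at the end of the previous line, and the first number still gets a leading newline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the answer) around the \K substitution.
sub split_verses_k {
    my ($text) = @_;

    # Everything matched before \K (plain text and balanced parentheses)
    # is left untouched; only the number and its surrounding blanks
    # are replaced.
    $text =~ s/(?:(\((?>[^()]++|(?1))*\))|[^\d(]+)*\K\s*(\d+)\s*/\n$2 /g;

    # Assumed cleanup steps, added for illustration only:
    $text =~ s/[ \t]+$//mg;   # trim blanks left at the end of each line
    $text =~ s/\A\n//;        # drop the newline before the first verse

    return $text;
}

print split_verses_k("1Hello2Hai (in 2:3) 3hi 4 bye"), "\n";
```

With the cleanup applied, the sample input should come out as the four lines listed under "Required output" in the question.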
12 revs answered Oct 12 '22 00:10


Use this pattern with the /g option:
(\D+)(\d+)(?=((?!\)).)*\(|[^()]*$)
and replace with $1\n$2

Or, to adjust the indentation, use this pattern with the /g option:
(\d+)\s*(?=((?!\)).)*\(|[^()]*$)
and replace with \n$1
In that case you have to get rid of the first blank line.
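As a rough sketch, the second pattern can be tried in Perl like this. The helper name split_verses_lookahead is hypothetical. Two things differ from the answer and are assumptions on my part: the replacement adds a space after the number so the result matches the required output, and trailing blanks are trimmed in a separate step. The lookahead accepts a number only if the next parenthesis after it is an opening one, or if no parenthesis follows at all.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the answer) around the lookahead pattern.
sub split_verses_lookahead {
    my ($text) = @_;

    # Pattern copied from the answer; the space after $1 is an assumed tweak.
    $text =~ s/(\d+)\s*(?=((?!\)).)*\(|[^()]*$)/\n$1 /g;

    $text =~ s/\A\n//;        # get rid of the first blank line
    $text =~ s/[ \t]+$//mg;   # assumed cleanup: trim blanks at line ends

    return $text;
}

print split_verses_lookahead("1Hello2Hai (in 2:3) 3hi 4 bye"), "\n";
```

This approach seems to assume that parentheses are not nested: in a string like (1 (a) b), the 1 is followed by an opening parenthesis, so the lookahead accepts it even though it sits inside parentheses.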

alpha bravo answered Oct 11 '22 23:10