I have a passage of numbered verses. I want each numbered verse on a separate line, so I insert a newline before each number. However, some parenthesized references also contain numbers, and those get a newline too. I don't want to match the numbers inside parentheses. I used
$_=~s/(\d+)/\n$1 /gs;
with this input:
1Hello2Hai (in 2:3) 3hi 4 bye
but it also replaces the numbers inside the parentheses.
Required output:
1 Hello
2 Hai (in 2:3)
3 hi
4 bye
Actual output:
1 Hello
2 Hai (in
2:
3)
3 hi
4 bye
How do I construct the regex so that it doesn't match inside parentheses? I use Perl for the regex.
By placing part of a regular expression inside round brackets or parentheses, you can group that part of the regular expression together. This allows you to apply a quantifier to the entire group or to restrict alternation to part of the regex. Only parentheses can be used for grouping.
From the link you quoted: Using parentheses around a pattern “captures” the text matched by that pattern and sends it as an argument to the view function.
[^\(]* matches everything that isn't an opening bracket at the beginning of the string, (\(.*\)) captures the required substring enclosed in brackets, and [^\)]* matches everything that isn't a closing bracket at the end of the string.
You can try this:
#!/usr/bin/perl
use strict;
use warnings;
my $stro = <<'END';
1Hello2Hai (in 2:3) 3hi 4 bye
END
$stro =~s/(\((?>[^()]++|(?1))*\))(*SKIP)(*FAIL)|\s*(\d+)\s*/\n$2 /g;
print $stro;
Pattern details:
The idea is to skip content in parentheses. To do that, I first match a parenthesized block with this recursive subpattern: (\((?>[^()]++|(?1))*\))
Then I make that branch fail and stop the regex engine from retrying the same substring with another alternative, using the backtracking control verbs (*SKIP) and (*FAIL).
(*SKIP) tells the engine not to retry at any position inside the text matched to its left if the pattern fails later.
(*FAIL) forces the pattern to fail at that point.
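To make the individual pieces easier to see, here is a minimal sketch of the same substitution written with the /x modifier so that each part can carry a comment. The helper name split_verses and the final step that drops the leading blank line are illustrative additions, not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration; the name is not from the original answer.
sub split_verses {
    my ($text) = @_;
    $text =~ s{
        ( \( (?> [^()]++ | (?1) )* \) )  # group 1: a balanced parenthesized block, recursing for nesting
        (*SKIP)(*FAIL)                   # give up on this branch and resume after the block
      |
        \s* (\d+) \s*                    # group 2: a verse number outside parentheses
    }{\n$2 }gx;
    $text =~ s/\A\n//;    # added step: drop the blank line before the first verse
    return $text;
}

print split_verses("1Hello2Hai (in 2:3) 3hi 4 bye"), "\n";
```

With the sample input this prints the four verses on separate lines, leaving (in 2:3) untouched.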
Another way:
As you can read in the Perl documentation, backtracking control verbs are an experimental regex feature, and their use in production code should be noted. (However, the feature has existed for several years.)
Here is a simple way without these features: match everything that precedes a number and remove it from the match result with the \K feature:
s/(?:(\((?>[^()]++|(?1))*\))|[^\d(]+)*\K\s*(\d+)\s*/\n$2 /g
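As a rough, runnable sketch of the \K variant: the substitution below is copied from the line above, while the helper name split_verses_k and the two cleanup steps (trailing spaces before each newline, and the leading blank line) are assumptions added here to reach the required output.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration; the name is not from the original answer.
sub split_verses_k {
    my ($text) = @_;
    # Consume balanced parentheses and non-digit text, then discard them
    # from the match with \K so that only the number is replaced.
    $text =~ s/(?:(\((?>[^()]++|(?1))*\))|[^\d(]+)*\K\s*(\d+)\s*/\n$2 /g;
    $text =~ s/[ \t]+(?=\n)//g;    # added step: trim spaces left before each newline
    $text =~ s/\A\n//;             # added step: drop the leading blank line
    return $text;
}

print split_verses_k("1Hello2Hai (in 2:3) 3hi 4 bye"), "\n";
```

Because the text before \K is kept rather than replaced, any space that preceded a number stays at the end of the previous line, which is why the sketch trims it afterwards.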
Use this pattern: (\D+)(\d+)(?=((?!\)).)*\(|[^()]*$)
with the /g option
and replace with $1\n$2
Or, to adjust the spacing, use this pattern: (\d+)\s*(?=((?!\)).)*\(|[^()]*$)
with the /g option
and replace with \n$1
except that you then have to remove the leading blank line.
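A minimal sketch of the lookahead approach in Perl might look like the following. Three things are assumptions added here rather than taken from the answer: the helper name split_verses_lookahead, the leading \s* (so spaces before a number are not left at the end of the previous line), and the trailing space in the replacement (so the result reads "1 Hello" as in the required output). The lookahead only checks what follows the number up to the next parenthesis, so it appears to rely on the parentheses not being nested.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration; the name is not from the original answer.
sub split_verses_lookahead {
    my ($text) = @_;
    # A number qualifies only if the next parenthesis after it is an opening
    # one, or if no parenthesis follows before the end of the string.
    $text =~ s/\s*(\d+)\s*(?=((?!\)).)*\(|[^()]*$)/\n$1 /g;
    $text =~ s/\A\n//;    # remove the leading blank line
    return $text;
}

print split_verses_lookahead("1Hello2Hai (in 2:3) 3hi 4 bye"), "\n";
```

For input where parentheses can nest, the recursive subpattern from the earlier answer is probably the safer choice.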